Compare commits

30 Commits

Author SHA1 Message Date
307b003e11 Non-enum opcodes 2022-02-01 14:28:24 +01:00
85211b127d Impl missing operators for bytecode 2022-01-31 23:55:44 +01:00
02f258415d Impl dbgprint bytecode 2022-01-31 23:50:52 +01:00
5eae0712bf Add bytecode vm interpreter 2022-01-31 22:14:05 +01:00
e28b3c4f37 Partial refactoring of parser 2022-01-29 19:20:51 +01:00
9e3a642810 Refactor lexer 2022-01-29 14:55:22 +01:00
e62121c75b Implement for loop 2022-01-29 12:29:02 +01:00
ffdce64df8 Update README 2022-01-29 12:28:59 +01:00
d2daa7ae6d Implement non-debug print 2022-01-29 12:28:50 +01:00
abf9eb73c8 Implement strings 2022-01-29 12:28:46 +01:00
39b55b51da Improve runtime performance by 7x 2022-01-28 20:42:21 +01:00
5bf989a640 Implement simple cli 2022-01-28 20:22:50 +01:00
3dacee0be4 Implement debug print 2022-01-28 20:09:15 +01:00
d035724d20 Implement if else 2022-01-28 19:47:07 +01:00
24f5aa30ea Update README 2022-01-28 19:34:57 +01:00
2a014fd210 Implement while loop 2022-01-28 19:34:31 +01:00
788c4a8e82 Implement assignment as binop 2022-01-28 18:56:16 +01:00
7646177030 Implement variable declaration 2022-01-28 18:49:30 +01:00
b128b3357a Update grammar definition 2022-01-28 15:11:46 +01:00
4d5188d9d6 Implement relational binops
- Gt: Greater than
- Ge: Greater or equal
- Lt: Less than
- Le: Less or equal
2022-01-28 15:07:28 +01:00
e28a990b85 Update README 2022-01-28 14:58:21 +01:00
1f1f589dd4 Lex true/false as 1/0 2022-01-28 14:55:10 +01:00
6816392173 Implement equ, neq comparison 2022-01-28 14:46:55 +01:00
3c6fb5466e Implement unary negation 2022-01-28 14:21:57 +01:00
74dbf724a5 Implement parentheses grouping 2022-01-28 14:11:39 +01:00
807482583a Update grammar definition 2022-01-28 14:00:51 +01:00
7b86fecc6f Update README 2022-01-28 12:20:59 +01:00
6b91264f84 Implement more operators
- Mod
- Bitwise Or
- Bitwise And
- Bitwise Xor
- Shift Left
- Shift Right
2022-01-27 23:15:16 +01:00
d9246c7ea1 Implement div & sub 2022-01-27 22:29:06 +01:00
1c4943828f Number separator _ 2022-01-27 21:38:58 +01:00
25 changed files with 1119 additions and 2966 deletions

@ -4,4 +4,3 @@ version = "0.1.0"
edition = "2021" edition = "2021"
[dependencies] [dependencies]
thiserror = "1.0.30"

429
README.md
@ -1,400 +1,38 @@
# NEK-Lang # NEK-Lang
## Table of contents
- [NEK-Lang](#nek-lang)
- [Table of contents](#table-of-contents)
- [Variables](#variables)
- [Declaration](#declaration)
- [Assignment](#assignment)
- [Datatypes](#datatypes)
- [I64](#i64)
- [String](#string)
- [Array](#array)
- [Expressions](#expressions)
- [General](#general)
- [Mathematical Operators](#mathematical-operators)
- [Bitwise Operators](#bitwise-operators)
- [Logical Operators](#logical-operators)
- [Equality & Relational Operators](#equality--relational-operators)
- [Control-Flow](#control-flow)
- [Loop](#loop)
- [If / Else](#if--else)
- [Block Scopes](#block-scopes)
- [Functions](#functions)
- [Function definition](#function-definition)
- [Function calls](#function-calls)
- [IO](#io)
- [Print](#print)
- [Comments](#comments)
- [Line comments](#line-comments)
- [Feature Tracker](#feature-tracker)
- [High level Components](#high-level-components)
- [Language features](#language-features)
- [Parsing Grammar](#parsing-grammar)
- [Expressions](#expressions-1)
- [Statements](#statements)
- [Examples](#examples)
- [Extras](#extras)
- [Visual Studio Code Language Support](#visual-studio-code-language-support)
## Variables
Variables are all contained in scopes. Variables defined in an outer scope can be accessed in
inner scopes. Variables defined in a scope that has ended no longer exist and can't be
accessed.
### Declaration
- Declare and initialize a new variable
- Declaring a previously declared variable again will shadow the previous variable
- Declaration is needed before assignment or other usage
- The variable name is on the left side of the `<-` operator
- The assigned value is on the right side and can be any expression
```
a <- 123;
```
Create a new variable named `a` and assign the value `123` to it.
### Assignment
- Assigning a value to a previously declared variable
- The variable name is on the left side of the `=` operator
- The assigned value is on the right side and can be any expression
```
a = 123;
```
The value `123` is assigned to the variable named `a`. `a` needs to be declared before this.
## Datatypes
The available variable datatypes are `i64` (64-bit signed integer), `string` (`"this is a string"`) and `array` (`[10]`)
### I64
- The normal default datatype is `i64` which is a 64-bit signed integer
- Can be created by just writing an integer literal like `546`
- Inside the number literal `_` can be inserted for visual separation `100_000`
- The i64 values can be used as expected in calculations, conditions and so on
```
my_i64 <- 123_456;
```
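As a rough illustration of how `_` separators could be handled, here is a minimal Rust sketch. The function name `parse_i64_literal` is hypothetical and not taken from the project's lexer; it simply strips the separators before delegating to the standard library parser.

```rust
/// Parse an integer literal that may contain `_` as a visual separator.
/// Hypothetical helper for illustration, not the project's lexer code.
/// Returns `None` for anything that `str::parse::<i64>` rejects once the
/// separators are removed (empty text, stray letters, values outside i64).
fn parse_i64_literal(text: &str) -> Option<i64> {
    // Drop every `_` so that "100_000" becomes "100000".
    let digits: String = text.chars().filter(|c| *c != '_').collect();
    digits.parse::<i64>().ok()
}
```

Under these assumptions, `parse_i64_literal("123_456")` yields `Some(123456)`, while a literal made only of separators yields `None`.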
### String
- Strings mainly exist for formatting the text output of a program
- Strings can be created by using double quotes like in other languages `"Hello world"`
- There is no way to access or change the characters of the string
- Unicode characters are supported `"Hello 🌎"`
- Escape characters `\n`, `\r`, `\t`, `\"`, `\\` are supported
- Strings can be assigned to variables, just like i64
```
world <- "🌎";
print "Hello ";
print world;
print "\n";
```
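The escape sequences listed above could be resolved roughly as in the following Rust sketch. The function `unescape` is a hypothetical helper, and rejecting unknown escapes with `None` is an assumption; the README does not say how the real lexer treats them.

```rust
/// Replace the escape sequences `\n`, `\r`, `\t`, `\"` and `\\` in the raw
/// contents of a string literal (the text between the quotes).
/// Hypothetical helper: returns `None` for an unknown escape or a trailing
/// backslash, which is an assumption and not documented behaviour.
fn unescape(raw: &str) -> Option<String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary characters (including Unicode) are copied unchanged.
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the supported characters.
        match chars.next()? {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            _ => return None,
        }
    }
    Some(out)
}
```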
### Array
- Arrays can contain any other datatypes and don't need to have the same type in all cells
- Arrays can be created by using brackets with the size in between `[size]`
- Arrays must be assigned to a variable in order to be used
- All cells will be initialized with i64 0 values
- The size can be any expression that results in a positive i64 value
- The array size can't be changed after creation
- The array's data is always allocated on the heap
- The array cells can be accessed by using the variable name and specifying the index in brackets
`my_arr[index]`
- The index can be any expression that results in a positive i64 value in the range of the array's
indices
- The indices start with 0
- When an array is passed to a function, it is passed by reference
```
width <- 5;
height <- 5;
// Initialize array of size 25, initialized with 25x 0
my_array <- [width * height];
// Modify first value
my_array[0] = 5;
// Print first value
// Outputs `5`
print my_array[0];
```
## Expressions
The operator precedence is the same order as in `C` for all implemented operators.
Refer to the
[C Operator Precedence Table](https://en.cppreference.com/w/c/language/operator_precedence)
to see the different precedences.
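To make the precedence rule concrete, here is a small precedence-climbing evaluator in Rust. It is only a sketch: the `Tok` type and the functions `precedence`, `apply` and `eval` are hypothetical and cover just a few operators, with the numeric levels chosen so that higher values bind tighter, as in C. Malformed input (for example a trailing operator) panics, which is acceptable for an illustration but not for a real parser.

```rust
/// Minimal token type for the sketch (hypothetical, not the project's lexer).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    Minus,
    Star,
    Slash,
    Shl,
    Shr,
}

/// Binding power of a binary operator; higher binds tighter (C ordering).
fn precedence(tok: Tok) -> Option<u8> {
    match tok {
        Tok::Shl | Tok::Shr => Some(9),
        Tok::Plus | Tok::Minus => Some(10),
        Tok::Star | Tok::Slash => Some(11),
        Tok::Num(_) => None,
    }
}

/// Apply a binary operator to two already evaluated operands.
fn apply(op: Tok, lhs: i64, rhs: i64) -> i64 {
    match op {
        Tok::Plus => lhs + rhs,
        Tok::Minus => lhs - rhs,
        Tok::Star => lhs * rhs,
        Tok::Slash => lhs / rhs,
        Tok::Shl => lhs << rhs,
        Tok::Shr => lhs >> rhs,
        Tok::Num(_) => unreachable!("numbers are not operators"),
    }
}

/// Evaluate `tokens` starting at `*pos`, consuming only operators whose
/// precedence is at least `min_prec` (precedence climbing).
fn eval(tokens: &[Tok], pos: &mut usize, min_prec: u8) -> i64 {
    let mut lhs = match tokens[*pos] {
        Tok::Num(n) => n,
        other => panic!("expected number, found {:?}", other),
    };
    *pos += 1;
    while *pos < tokens.len() {
        let op = tokens[*pos];
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        *pos += 1;
        // `prec + 1` makes operators of equal precedence left-associative.
        let rhs = eval(tokens, pos, prec + 1);
        lhs = apply(op, lhs, rhs);
    }
    lhs
}
```

With this sketch, the token sequence for `1 + 2 * 3` evaluates the multiplication first, and `8 - 2 - 1` groups from the left.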
### General
- Parentheses `(` and `)` can be used to modify evaluation order just like in any other
programming language.
- For example `(a + b) * c` will evaluate the addition before the multiplication, despite the multiplication having higher binding power
### Mathematical Operators
Supported mathematical operations:
- Addition `a + b`
- Subtraction `a - b`
- Multiplication `a * b`
- Division `a / b`
- Modulo `a % b`
- Negation `-a`
### Bitwise Operators
- And `a & b`
- Or `a | b`
- Xor `a ^ b`
- Bitshift left (by `b` bits) `a << b`
- Bitshift right (by `b` bits) `a >> b`
- "Bit flip" (One's complement) `~a`
### Logical Operators
The logical operators evaluate the operands as `false` if they are equal to `0` and `true` if they are not equal to `0`.
Note that logical operators like AND / OR do not support short-circuit evaluation, so both sides of
the logical operation will be evaluated, even if it might not be necessary.
- And `a && b`
- Or `a || b`
- Not `!a` (if `a` is equal to `0`, the result is `1`, otherwise the result is `0`)
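The missing short-circuit behaviour can be illustrated with a short Rust sketch. The functions `eager_and`, `eager_or` and `logical_not` are hypothetical and model each operand as a closure, so that it is visible that both sides run before the result is computed; the actual interpreter code is not shown here.

```rust
/// Logical AND without short-circuiting: both operands are evaluated first,
/// then combined. Operands are closures so that side effects stay visible.
/// Hypothetical helper for illustration only.
fn eager_and(lhs: impl FnOnce() -> i64, rhs: impl FnOnce() -> i64) -> i64 {
    let l = lhs();
    let r = rhs(); // evaluated even when `l` is 0
    ((l != 0) && (r != 0)) as i64
}

/// Logical OR without short-circuiting.
fn eager_or(lhs: impl FnOnce() -> i64, rhs: impl FnOnce() -> i64) -> i64 {
    let l = lhs();
    let r = rhs(); // evaluated even when `l` is non-zero
    ((l != 0) || (r != 0)) as i64
}

/// Logical NOT: `1` if the operand is `0`, otherwise `0`.
fn logical_not(a: i64) -> i64 {
    (a == 0) as i64
}
```

In practice this means that an operand with side effects, such as a function call that modifies an outer variable, always runs, regardless of the value of the other operand.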
### Equality & Relational Operators
The equality and relational operations result in `1` if the condition is evaluated as `true` and in `0` if the condition is evaluated as `false`.
- Equality `a == b`
- Inequality `a != b`
- Greater than `a > b`
- Greater than or equal `a >= b`
- Less than `a < b`
- Less than or equal `a <= b`
## Control-Flow
For conditions like in if or loops, every non-zero value is equal to `true`, and `0` is `false`.
### Loop
- The `loop` keyword can be used as an infinite loop, as a while loop or as a while loop with
advancement (an expression that is executed after each loop)
- If only `loop` is used, directly followed by the body, it is an infinite loop that needs to be
terminated by using the `break` keyword
- The `loop` keyword can be followed by the condition (an expression) without needing parentheses
- *Optional:* If there is a `;` after the condition, there must be another expression which is used as the advancement
- The loop's body is wrapped in braces (`{ }`) just like in C/C++
- The `continue` keyword can be used to end the current loop iteration early
- The `break` keyword can be used to fully break out of the current loop
```
// Print the numbers from 0 to 9
// With endless loop
i <- 0;
loop {
if i >= 10 {
break;
}
print i;
i = i + 1;
}
// Without advancement
i <- 0;
loop i < 10 {
print i;
i = i + 1;
}
// With advancement
k <- 0;
loop k < 10; k = k + 1 {
print k;
}
```
### If / Else
- The language supports `if` and an optional `else`
- After the `if` keyword must be the deciding condition, parentheses are not needed
- The blocks are wrapped in braces (`{ }`)
- *Optional:* If there is an `else` after the *if-block*, there must be a following *if-false*, aka. else block
- NOTE: Logical operators like AND / OR do not support short-circuit evaluation, so both sides of
the logical operations will be evaluated, even if it might not be necessary
```
a <- 1;
b <- 2;
if a == b {
// a is equal to b
print 1;
} else {
// a is not equal to b
print 0;
}
```
### Block Scopes
- It is possible to create a limited scope for local variables that will no longer exist once the
scope ends
- Shadowing variables by redefining a variable in an inner scope is supported
```
var_in_outer_scope <- 5;
{
var_in_inner_scope <- 3;
// Inner scope can access both vars
print var_in_outer_scope;
print var_in_inner_scope;
}
// Outer scope is still valid
print var_in_outer_scope;
// !!! THIS DOES NOT WORK !!!
// The inner scope has ended
print var_in_inner_scope;
```
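One common way to model these scoping rules is a stack of name tables, sketched below in Rust. This is an assumption made for illustration: the `Scopes` type and its methods are hypothetical, and the interpreter itself may resolve variables differently (for example via stack positions computed during parsing).

```rust
use std::collections::HashMap;

/// A stack of scopes; the last entry is the innermost scope.
/// Hypothetical type for illustration, restricted to i64 values.
struct Scopes {
    stack: Vec<HashMap<String, i64>>,
}

impl Scopes {
    /// Start with a single top-level scope.
    fn new() -> Self {
        Scopes { stack: vec![HashMap::new()] }
    }

    /// Enter a new block `{ ... }`.
    fn enter(&mut self) {
        self.stack.push(HashMap::new());
    }

    /// Leave the innermost block; its variables cease to exist.
    /// The top-level scope is never removed.
    fn leave(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// Declare a variable in the innermost scope (`name <- value`).
    /// Declaring an existing name shadows the outer variable.
    fn declare(&mut self, name: &str, value: i64) {
        self.stack
            .last_mut()
            .expect("at least one scope exists")
            .insert(name.to_string(), value);
    }

    /// Look a variable up from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Option<i64> {
        self.stack.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}
```

After `leave`, a lookup of a variable declared only in the inner scope returns `None`, which mirrors the failing `print var_in_inner_scope;` in the example above.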
## Functions
### Function definition
- Functions can be defined by using the `fun` keyword, followed by the function name and the
parameters in parentheses. After the parentheses, the body is specified inside a braces block
- The function parameters are specified by only their names
- The function body has its own scope
- Parameters are only accessible inside the body
- Variables from the outer scope can be accessed and modified if they are defined before the function
- Variables from the outer scope are shadowed by parameters or local variables with the same name
- The `return` keyword can be used to return a value from the function and exit it immediately
- If no return is specified, a special `void` value is returned. That value can't be used in
calculations or comparisons, but can be stored in a variable (even though that is not useful)
- Functions can only be defined at the top-level. So defining a function inside of any other scoped
block (like inside another function, if, loop, ...) is invalid
- Functions can only be used after definition and there is no forward declaration right now
- However a function can be called recursively inside of itself
- Functions can't be redefined, so defining a function with an existing name is invalid
```
fun add_maybe(a, b) {
if a < 100 {
return a;
} else {
return a + b;
}
}
fun println(val) {
print val;
print "\n";
}
```
### Function calls
- Function calls are primary expressions, so they can be directly used in calculations (if they
return appropriate values)
- Function calls are performed by writing the function name, followed by the arguments in parentheses
- The arguments can be any expressions, separated by commas
```
b <- 100;
result <- add_maybe(250, b);
// Prints 350 + new-line
println(result);
```
## IO
### Print
Printing is implemented via the `print` keyword
- The `print` keyword is followed by an expression, the value of which will be printed to the terminal
- To add a line break a string print can be used `print "\n";`
```
a <- 1;
// Outputs `1` to the terminal
print a;
// Outputs a new-line to the terminal
print "\n";
```
## Comments
### Line comments
Line comments can be initiated by using `//`
- Everything after `//` up to the end of the current line is ignored and not parsed
```
// This is a comment
```
# Feature Tracker
## High level Components ## High level Components
- [x] Lexer: Transforms text into Tokens - [x] Lexer: Transforms text into Tokens
- [x] Parser: Transforms Tokens into Abstract Syntax Tree - [x] Parser: Transforms Tokens into Abstract Syntax Tree
- [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements - [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements
- [x] Simple optimizer: Apply trivial optimizations to the Ast - [ ] Abstract Syntax Tree Optimizer
- [x] Precalculate binary ops / unary ops that have only literal operands
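The precalculation item above describes constant folding. A minimal Rust sketch of the idea follows; the `Expr` type and the `fold` function are hypothetical and much smaller than the project's AST, and the use of wrapping arithmetic is an assumption made so that the sketch cannot panic on overflow.

```rust
/// Tiny expression tree for the sketch (hypothetical, not the project's AST).
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Lit(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

/// Fold operations whose operands are all literals into a single literal.
/// Sub-expressions are folded first, so nested literal-only expressions
/// collapse bottom-up. Wrapping arithmetic is an assumption of this sketch.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Lit(a), Expr::Lit(b)) => Expr::Lit(a.wrapping_add(b)),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Lit(a), Expr::Lit(b)) => Expr::Lit(a.wrapping_mul(b)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        Expr::Neg(e) => match fold(*e) {
            Expr::Lit(a) => Expr::Lit(a.wrapping_neg()),
            e => Expr::Neg(Box::new(e)),
        },
        // Literals and variables are already in their simplest form.
        other => other,
    }
}
```

An expression such as `x + 3 * 4` keeps the variable but replaces the literal-only subtree with `12`, so that work is not repeated at runtime.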
## Language features ## Language features
- [x] General expressions - [x] Math expressions
- [x] Arithmetic operations - [x] Unary operators
- [x] Addition `a + b` - [x] Negate `-X`
- [x] Subtraction `a - b` - [x] Parentheses `(X+Y)*Z`
- [x] Multiplication `a * b`
- [x] Division `a / b`
- [x] Modulo `a % b`
- [x] Negate `-a`
- [x] Parentheses `(a + b) * c`
- [x] Logical boolean operators - [x] Logical boolean operators
- [x] Equal `a == b`
- [x] Not equal `a != b`
- [x] Greater than `a > b`
- [x] Less than `a < b`
- [x] Greater than or equal `a >= b`
- [x] Less than or equal `a <= b`
- [x] Logical operators
- [x] And `a && b`
- [x] Or `a || b`
- [x] Not `!a`
- [x] Bitwise operators
- [x] Bitwise AND `a & b`
- [x] Bitwise OR `a | b`
- [x] Bitwise XOR `a ^ b`
- [x] Bitwise NOT `~a`
- [x] Bitwise left shift `a << b`
- [x] Bitwise right shift `a >> b`
- [x] Variables - [x] Variables
- [x] Declaration - [x] Declaration
- [x] Assignment - [x] Assignment
- [x] Local variables (for example inside loop, if, else, functions) - [x] While loop `while X { ... }`
- [x] Scoped block for specific local vars `{ ... }`
- [x] Statements with semicolon & Multiline programs
- [x] Control flow
- [x] Loops
- [x] While-style loop `loop X { ... }`
- [x] For-style loop with `X` as condition and `Y` as advancement `loop X; Y { ... }`
- [x] Infinite loop `loop { ... }`
- [x] Break `break`
- [x] Continue `continue`
- [x] If else statement `if X { ... } else { ... }` - [x] If else statement `if X { ... } else { ... }`
- [x] If Statement - [x] If Statement
- [x] Else statement - [x] Else statement
- [x] Line comments `//` - [ ] Line comments `//`
- [x] Strings - [x] Strings
- [x] Arrays - [x] For loops `for X; Y; Z { ... }`
- [x] Creating array with size `X` as a variable `arr <- [X]` - [ ] IO Intrinsics
- [x] Accessing arrays by index `arr[X]`
- [x] IO Intrinsics
- [x] Print - [x] Print
- [x] Functions - [ ] ReadLine
- [x] Function declaration `fun f(X, Y, Z) { ... }`
- [x] Function calls `f(1, 2, 3)`
- [x] Function returns `return X`
- [x] Local variables
- [x] Pass arrays by-reference, i64 by-value, string is a const ref
# Parsing Grammar ## Grammar
### Expressions
## Expressions
``` ```
ARRAY_LITERAL = "[" expr "]" LITERAL = I64 | Str
ARRAY_ACCESS = IDENT "[" expr "]" expr_primary = LITERAL | IDENT | "(" expr ")" | "-" expr_primary
FUN_CALL = IDENT "(" (expr ",")* expr? ")"
LITERAL = I64_LITERAL | STR_LITERAL | ARRAY_LITERAL
expr_primary = LITERAL | IDENT | FUN_CALL | ARRAY_ACCESS | "(" expr ")" | "-" expr_primary
| "~" expr_primary
expr_mul = expr_primary (("*" | "/" | "%") expr_primary)* expr_mul = expr_primary (("*" | "/" | "%") expr_primary)*
expr_add = expr_mul (("+" | "-") expr_mul)* expr_add = expr_mul (("+" | "-") expr_mul)*
expr_shift = expr_add ((">>" | "<<") expr_add)* expr_shift = expr_add ((">>" | "<<") expr_add)*
@ -403,38 +41,17 @@ expr_equ = expr_rel (("==" | "!=") expr_rel)*
expr_band = expr_equ ("&" expr_equ)* expr_band = expr_equ ("&" expr_equ)*
expr_bxor = expr_band ("^" expr_band)* expr_bxor = expr_band ("^" expr_band)*
expr_bor = expr_bxor ("|" expr_bxor)* expr_bor = expr_bxor ("|" expr_bxor)*
expr_land = expr_bor ("&&" expr_bor)* expr = expr_bor
expr_lor = expr_land ("||" expr_land)*
expr = expr_lor
``` ```
## Statements ## Statements
``` ```
stmt_return = "return" expr ";" stmt_expr = expr
stmt_break = "break" ";" stmt_let = "let" IDENT "=" expr
stmt_continue = "continue" ";" stmt_while = "while" expr "{" (stmt)* "}"
stmt_var_decl = IDENT "<-" expr ";" stmt_for = "for" stmt_let ";" expr ";" expr "{" (stmt)* "}"
stmt_fun_decl = "fun" IDENT "(" (IDENT ",")* IDENT? ")" "{" stmt* "}" stmt_if = "if" expr "{" (stmt)* "}" ( "else" "{" (stmt)* "}" )
stmt_expr = expr ";" stmt_dbgprint = "$$" expr
stmt_block = "{" stmt* "}" stmt_print = "$" expr
stmt_loop = "loop" (expr (";" expr)?)? "{" stmt* "}" stmt = stmt_expr | stmt_let | stmt_while | stmt_for | stmt_if | stmt_dbgprint | stmt_print
stmt_if = "if" expr "{" stmt* "}" ("else" "{" stmt* "}")?
stmt_print = "print" expr ";"
stmt = stmt_return | stmt_break | stmt_continue | stmt_var_decl | stmt_fun_decl
| stmt_expr | stmt_block | stmt_loop | stmt_if | stmt_print
``` ```
# Examples
There are a bunch of examples in the [examples](examples/) directory. Those include (non-optimal) solutions to the first five project euler problems, as well as a [simple Game of Life implementation](examples/game_of_life.nek).
To run an example via `cargo-run`, use:
```
cargo run --release -- examples/[NAME]
```
# Extras
## Visual Studio Code Language Support
A VSCode extension that provides simple syntax highlighting for nek is also available on
[gitlab](https://code.fbi.h-da.de/advanced-systems-programming-ws21/x4/nek-lang-vscode). Since this
is a very small scale project, the extension was not published and instructions on how to install it
can be found in the mentioned repository.

@ -1,15 +0,0 @@
// If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9.
// The sum of these multiples is 23.
// Find the sum of all the multiples of 3 or 5 below 1000.
//
// Correct Answer: 233168
sum <- 0;
i <- 0;
loop i < 1_000; i = i + 1 {
if i % 3 == 0 || i % 5 == 0 {
sum = sum + i;
}
}
print sum;

@ -1,24 +0,0 @@
// Each new term in the Fibonacci sequence is generated by adding the previous two terms.
// By starting with 1 and 2, the first 10 terms will be:
// 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
// By considering the terms in the Fibonacci sequence whose values do not exceed four million,
// find the sum of the even-valued terms.
//
// Correct Answer: 4613732
sum <- 0;
a <- 0;
b <- 1;
loop a < 4_000_000 {
if a % 2 == 0 {
sum = sum + a;
}
tmp <- a;
a = b;
b = b + tmp;
}
print sum;

@ -1,29 +0,0 @@
// The prime factors of 13195 are 5, 7, 13 and 29.
// What is the largest prime factor of the number 600851475143 ?
//
// Correct Answer: 6857
number <- 600_851_475_143;
result <- 0;
div <- 2;
loop number > 1 {
loop number % div == 0 {
if div > result {
result = div;
}
number = number / div;
}
div = div + 1;
if div * div > number {
if number > 1 && number > result {
result = number;
}
break;
}
}
print result;

@ -1,31 +0,0 @@
// A palindromic number reads the same both ways. The largest palindrome made from the product of
// two 2-digit numbers is 9009 = 91 × 99.
// Find the largest palindrome made from the product of two 3-digit numbers.
//
// Correct Answer: 906609
fun reverse(n) {
rev <- 0;
loop n {
rev = rev * 10 + n % 10;
n = n / 10;
}
return rev;
}
res <- 0;
i <- 100;
loop i < 1_000; i = i + 1 {
k <- i;
loop k < 1_000; k = k + 1 {
num <- i * k;
num_rev <- reverse(num);
if num == num_rev && num > res {
res = num;
}
}
}
print res;

@ -1,24 +0,0 @@
# A palindromic number reads the same both ways. The largest palindrome made from the product of
# two 2-digit numbers is 9009 = 91 × 99.
# Find the largest palindrome made from the product of two 3-digit numbers.
#
# Correct Answer: 906609
def reverse(n):
rev = 0
while n:
rev = rev * 10 + n % 10
n //= 10
return rev
res = 0
for i in range(100, 1_000):
for k in range(i, 1_000):
num = i * k
num_rev = reverse(num)
if num == num_rev and num > res:
res = num
print(res)

@ -1,23 +0,0 @@
// 2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
// What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
//
// Correct Answer: 232_792_560
fun gcd(x, y) {
loop y {
tmp <- x;
x = y;
y = tmp % y;
}
return x;
}
result <- 1;
i <- 1;
loop i <= 20; i = i + 1 {
result = result * (i / gcd(i, result));
}
print result;

@ -1,15 +0,0 @@
# 2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
# What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
#
# Correct Answer: 232_792_560
def gcd(x, y):
while y:
x, y = y, x % y
return x
result = 1
for i in range(1, 21):
result *= i // gcd(i, result)
print(result)

@ -1,134 +0,0 @@
fun print_field(field, width, height) {
y <- 0;
loop y < height; y = y+1 {
x <- 0;
loop x < width; x = x+1 {
if field[y*width + x] {
print "# ";
} else {
print ". ";
}
}
print "\n";
}
print "\n";
}
fun count_neighbours(field, x, y, width, height) {
neighbours <- 0;
if y > 0 {
if x > 0 {
if field[(y-1)*width + (x-1)] {
// Top left
neighbours = neighbours + 1;
}
}
if field[(y-1)*width + x] {
// Top
neighbours = neighbours + 1;
}
if x < width-1 {
if field[(y-1)*width + (x+1)] {
// Top right
neighbours = neighbours + 1;
}
}
}
if x > 0 {
if field[y*width + (x-1)] {
// Left
neighbours = neighbours + 1;
}
}
if x < width-1 {
if field[y*width + (x+1)] {
// Right
neighbours = neighbours + 1;
}
}
if y < height-1 {
if x > 0 {
if field[(y+1)*width + (x-1)] {
// Bottom left
neighbours = neighbours + 1;
}
}
if field[(y+1)*width + x] {
// Bottom
neighbours = neighbours + 1;
}
if x < width-1 {
if field[(y+1)*width + (x+1)] {
// Bottom right
neighbours = neighbours + 1;
}
}
}
return neighbours;
}
fun copy(from, to, len) {
i <- 0;
loop i < len; i = i + 1 {
to[i] = from[i];
}
}
// Set the width and height of the field
width <- 10;
height <- 10;
// Create the main and temporary field
field <- [width*height];
field2 <- [width*height];
// Preset the main field with a glider
field[1] = 1;
field[12] = 1;
field[20] = 1;
field[21] = 1;
field[22] = 1;
fun run_gol(num_rounds) {
runs <- 0;
loop runs < num_rounds; runs = runs + 1 {
// Print the field
print_field(field, width, height);
// Calculate next stage from field and store into field2
y <- 0;
loop y < height; y = y+1 {
x <- 0;
loop x < width; x = x+1 {
// Get the neighbours of the current cell
neighbours <- count_neighbours(field, x, y, width, height);
// Set the new cell according to the neighbour count
if neighbours < 2 || neighbours > 3 {
field2[y*width + x] = 0;
} else {
if neighbours == 3 {
field2[y*width + x] = 1;
} else {
field2[y*width + x] = field[y*width + x];
}
}
}
}
// Transfer from field2 to field
copy(field2, field, width*height);
}
}
run_gol(32);

@ -1,9 +0,0 @@
fun fib(n) {
if n <= 1 {
return n;
} else {
return fib(n-1) + fib(n-2);
}
}
print fib(30);

@ -1,6 +0,0 @@
def fib(n):
if n <= 1:
return n
return fib(n-1) + fib(n-2)
print(fib(30))

@ -1,31 +0,0 @@
fun square(a) {
return a * a;
}
fun add(a, b) {
return a + b;
}
fun mul(a, b) {
return a * b;
}
// Function with multiple args & nested calls to different functions
fun addmul(a, b, c) {
return mul(add(a, b), c);
}
a <- 10;
b <- 20;
c <- 3;
result <- addmul(a, b, c) + square(c);
// Access and modify outer variable. Argument `a` must not be used from outer var
fun sub_from_result(a) {
result = result - a;
}
sub_from_result(30);
print result;

@ -1,211 +1,104 @@
use std::rc::Rc; use std::rc::Rc;
use crate::stringstore::{Sid, StringStore}; /// Types for binary operators
/// Types for binary operations
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum BinOpType { pub enum BinOpType {
/// Addition ("+") /// Addition
Add, Add,
/// Subtraction ("-") /// Subtraction
Sub, Sub,
/// Multiplication ("*") /// Multiplication
Mul, Mul,
/// Division ("/") /// Divide
Div, Div,
/// Modulo / Remainder ("%") /// Modulo
Mod, Mod,
/// Compare Equal ("==") /// Bitwise OR (inclusive or)
EquEqu,
/// Compare Not Equal ("!=")
NotEqu,
/// Compare Less than ("<")
Less,
/// Compare Less than or Equal ("<=")
LessEqu,
/// Compare Greater than (">")
Greater,
/// Compare Greater than or Equal (">=")
GreaterEqu,
/// Bitwise Or ("|")
BOr, BOr,
/// Bitwise And ("&") /// Bitwise And
BAnd, BAnd,
/// Bitwise Xor / Exclusive Or ("^") /// Bitwise Xor (exclusive or)
BXor, BXor,
/// Logical And ("&&") /// Shift Left
LAnd,
/// Logical Or ("||")
LOr,
/// Bitwise Shift Left ("<<")
Shl, Shl,
/// Bitwise Shift Right (">>") /// Shift Right
Shr, Shr,
/// Assign value to variable ("=") /// Check equality
Equ,
/// Check inequality
Neq,
/// Check greater than
Gt,
/// Check greater or equal
Ge,
/// Check less than
Lt,
/// Check less or equal
Le,
/// Assign to a variable
Assign, Assign,
} }
/// Types for unary operations /// Types for unary operators
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum UnOpType { pub enum UnOpType {
/// Unary Negation ("-") /// Negation
Negate, Neg,
/// Bitwise Not / Bitflip ("~")
BNot,
/// Logical Not ("!")
LNot,
} }
/// Ast Node for possible Expression variants /// A full program abstract syntax tree. This consists of zero or more statements that represents
/// a program.
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum Expression { pub struct Ast {
pub prog: Vec<Stmt>,
}
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Stmt {
/// Just a simple expression. This might be an assignment, a function call or a calculation.
Expr(Expr),
/// A variable declaration and assignment. (variable name, assigned value)
Let(String, Expr),
/// A while loop consisting of a condition and a body. (condition, body)
While(Expr, Ast),
/// A for loop consisting of an initialization declaration, a condition, an advancement and a
/// body. ((variable name, initial value), condition, advancement, body)
For((String, Expr), Expr, Expr, Ast),
/// If statement consisting of a condition, a true_body and a false_body.
/// (condition, true_body, false_body)
If(Expr, Ast, Ast),
/// Debug print the value of an expression (show the internal type together with the value)
DbgPrint(Expr),
/// Print the value of an expression
Print(Expr),
}
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Expr {
/// Integer literal (64-bit) /// Integer literal (64-bit)
I64(i64), I64(i64),
/// String literal /// String literal
String(Sid), Str(Rc<String>),
/// Identifier (variable name)
/// Array with size as an expression Ident(String),
ArrayLiteral(Box<Expression>),
/// Array access with name, stackpos and position as expression
ArrayAccess(Sid, usize, Box<Expression>),
/// Function call with name, stackpos and the arguments as a vec of expressions
FunCall(Sid, usize, Vec<Expression>),
/// Variable with name and the stackpos from behind. This means that stackpos 0 refers to the
/// last variable on the stack and not the first
Var(Sid, usize),
/// Binary operation. Consists of type, left hand side and right hand side /// Binary operation. Consists of type, left hand side and right hand side
BinOp(BinOpType, Box<Expression>, Box<Expression>), BinOp(BinOpType, Box<Expr>, Box<Expr>),
/// Unary operation. Consists of type and operand /// Unary operation. Consists of type and the value that is operated on
UnOp(UnOpType, Box<Expression>), UnOp(UnOpType, Box<Expr>),
}
/// Ast Node for a loop
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct Loop {
/// The condition that determines if the loop should continue
pub condition: Option<Expression>,
/// This is executed after each loop to advance the condition variables
pub advancement: Option<Expression>,
/// The loop body that is executed each loop
pub body: BlockScope,
}
/// Ast Node for an if
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct If {
/// The condition
pub condition: Expression,
/// The body that is executed when condition is true
pub body_true: BlockScope,
/// The if body that is executed when the condition is false
pub body_false: BlockScope,
}
/// Ast Node for a function declaration
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct FunDecl {
/// The function name as StringID, stored in the stringstore
pub name: Sid,
/// The absolute position on the function stack where the function is stored
pub fun_stackpos: usize,
/// The argument names as StringIDs
pub argnames: Vec<Sid>,
/// The function body
pub body: Rc<BlockScope>,
}
/// Ast Node for a variable declaration
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct VarDecl {
/// The variable name as StringID, stored in the stringstore
pub name: Sid,
/// The absolute position on the variable stack where the variable is stored
pub var_stackpos: usize,
/// The right hand side that generates the initial value for the variable
pub rhs: Expression,
}
/// Ast Node for the possible Statement variants
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Statement {
/// Return from a function with the given result value as an expression
Return(Expression),
/// Break out of the current loop
Break,
/// End the current loop iteration early and continue with the next loop iteration
Continue,
/// A variable declaration
Declaration(VarDecl),
/// A function declaration
FunDeclare(FunDecl),
/// A simple expression. This could be a function call or an assignment for example
Expr(Expression),
/// A freestanding block scope
Block(BlockScope),
/// A loop
Loop(Loop),
/// An if
If(If),
/// A print statement that will output the value of the given expression to the terminal
Print(Expression),
}
/// A number of statements that form a block of code together
pub type BlockScope = Vec<Statement>;
/// A full abstract syntax tree
#[derive(Clone, Default)]
pub struct Ast {
/// The stringstore contains the actual string values which are replaced with StringIDs in the
/// Ast. So this is needed to get the actual strings later
pub stringstore: StringStore,
/// The main (top-level) code given as a number of statements
pub main: BlockScope,
}
impl BinOpType {
/// Get the precedence for a binary operator. Higher value means the OP is stronger binding.
/// For example Multiplication is stronger than addition, so Mul has higher precedence than Add.
///
/// The operator precedences are derived from the C language operator precedences. While not all
/// C operators are included or the exact same, the precedence order is the same.
/// See: https://en.cppreference.com/w/c/language/operator_precedence
pub fn precedence(&self) -> u8 {
match self {
BinOpType::Assign => 1,
BinOpType::LOr => 2,
BinOpType::LAnd => 3,
BinOpType::BOr => 4,
BinOpType::BXor => 5,
BinOpType::BAnd => 6,
BinOpType::EquEqu | BinOpType::NotEqu => 7,
BinOpType::Less | BinOpType::LessEqu | BinOpType::Greater | BinOpType::GreaterEqu => 8,
BinOpType::Shl | BinOpType::Shr => 9,
BinOpType::Add | BinOpType::Sub => 10,
BinOpType::Mul | BinOpType::Div | BinOpType::Mod => 11,
}
}
} }

@ -1,116 +0,0 @@
use crate::ast::{Ast, BlockScope, Expression, If, Loop, Statement, BinOpType, UnOpType, VarDecl};
/// A trait for optimizing an abstract syntax tree
pub trait AstOptimizer {
/// Consume an abstract syntax tree and return an ast that has the same functionality but with
/// optional optimizations.
fn optimize(ast: Ast) -> Ast;
}
/// A very simple optimizer that applies trivial optimizations like precalculating expressions that
/// have only literals as operands
pub struct SimpleAstOptimizer;
impl AstOptimizer for SimpleAstOptimizer {
fn optimize(mut ast: Ast) -> Ast {
Self::optimize_block(&mut ast.main);
ast
}
}
impl SimpleAstOptimizer {
fn optimize_block(block: &mut BlockScope) {
for stmt in block {
match stmt {
Statement::Expr(expr) => Self::optimize_expr(expr),
Statement::Block(block) => Self::optimize_block(block),
Statement::Loop(Loop {
condition,
advancement,
body,
}) => {
if let Some(condition) = condition {
Self::optimize_expr(condition);
}
if let Some(advancement) = advancement {
Self::optimize_expr(advancement)
}
Self::optimize_block(body);
}
Statement::If(If {
condition,
body_true,
body_false,
}) => {
Self::optimize_expr(condition);
Self::optimize_block(body_true);
Self::optimize_block(body_false);
}
Statement::Print(expr) => Self::optimize_expr(expr),
Statement::Declaration(VarDecl { name: _, var_stackpos: _, rhs}) => Self::optimize_expr(rhs),
Statement::FunDeclare(_) => (),
Statement::Return(expr) => Self::optimize_expr(expr),
Statement::Break | Statement::Continue => (),
}
}
}
fn optimize_expr(expr: &mut Expression) {
match expr {
Expression::BinOp(bo, lhs, rhs) => {
Self::optimize_expr(lhs);
Self::optimize_expr(rhs);
// Precalculate binary operations that consist of 2 literals. No need to do this at
// runtime, as all parts of the calculation are known at *compiletime* / parsetime.
match (lhs.as_mut(), rhs.as_mut()) {
(Expression::I64(lhs), Expression::I64(rhs)) => {
let new_expr = match bo {
BinOpType::Add => Expression::I64(*lhs + *rhs),
BinOpType::Mul => Expression::I64(*lhs * *rhs),
BinOpType::Sub => Expression::I64(*lhs - *rhs),
BinOpType::Div => Expression::I64(*lhs / *rhs),
BinOpType::Mod => Expression::I64(*lhs % *rhs),
BinOpType::BOr => Expression::I64(*lhs | *rhs),
BinOpType::BAnd => Expression::I64(*lhs & *rhs),
BinOpType::BXor => Expression::I64(*lhs ^ *rhs),
BinOpType::LAnd => Expression::I64(if (*lhs != 0) && (*rhs != 0) { 1 } else { 0 }),
BinOpType::LOr => Expression::I64(if (*lhs != 0) || (*rhs != 0) { 1 } else { 0 }),
BinOpType::Shr => Expression::I64(*lhs >> *rhs),
BinOpType::Shl => Expression::I64(*lhs << *rhs),
BinOpType::EquEqu => Expression::I64(if lhs == rhs { 1 } else { 0 }),
BinOpType::NotEqu => Expression::I64(if lhs != rhs { 1 } else { 0 }),
BinOpType::Less => Expression::I64(if lhs < rhs { 1 } else { 0 }),
BinOpType::LessEqu => Expression::I64(if lhs <= rhs { 1 } else { 0 }),
BinOpType::Greater => Expression::I64(if lhs > rhs { 1 } else { 0 }),
BinOpType::GreaterEqu => Expression::I64(if lhs >= rhs { 1 } else { 0 }),
BinOpType::Assign => unreachable!(),
};
*expr = new_expr;
},
_ => ()
}
}
Expression::UnOp(uo, operand) => {
Self::optimize_expr(operand);
// Precalculate unary operations just like binary ones
match operand.as_mut() {
Expression::I64(val) => {
let new_expr = match uo {
UnOpType::Negate => Expression::I64(-*val),
UnOpType::BNot => Expression::I64(!*val),
UnOpType::LNot => Expression::I64(if *val == 0 { 1 } else { 0 }),
};
*expr = new_expr;
}
_ => (),
}
}
_ => (),
}
}
}
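
Note: the removed optimizer folds literals with plain `/`, `%` and `+`, so a literal expression such as `1 / 0` would presumably panic while optimizing rather than at runtime. The following is a minimal sketch of the same bottom-up folding that uses checked arithmetic and simply skips folding when the operation would fail. The reduced `Expr` type and the `fold` function are hypothetical and are not part of this repository.

```rust
// Hypothetical, reduced expression type; the real `Expression` has more variants.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    I64(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// Fold literal-only subtrees bottom-up. Operations that would overflow or divide by zero
/// are left untouched, so the error still surfaces at runtime instead of during optimization.
fn fold(expr: &mut Expr) {
    let folded = match expr {
        Expr::Add(lhs, rhs) => {
            // Fold the operands first so nested literal expressions collapse as well
            fold(lhs);
            fold(rhs);
            match (&**lhs, &**rhs) {
                (Expr::I64(a), Expr::I64(b)) => a.checked_add(*b),
                _ => None,
            }
        }
        Expr::Div(lhs, rhs) => {
            fold(lhs);
            fold(rhs);
            match (&**lhs, &**rhs) {
                (Expr::I64(a), Expr::I64(b)) => a.checked_div(*b),
                _ => None,
            }
        }
        Expr::I64(_) | Expr::Var(_) => None,
    };
    // Replace the whole node only if a constant result could be computed
    if let Some(val) = folded {
        *expr = Expr::I64(val);
    }
}
```

With this approach `1 + (2 + 3)` becomes the literal `6`, while `1 / 0` stays a `Div` node and is reported by whatever runs the Ast later.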

src/bytecode.rs Normal file

@ -0,0 +1,197 @@
use std::collections::HashMap;
use crate::ast::{Ast, Expr, Stmt, BinOpType};
pub mod op {
type OpSize = u32;
pub const PUSH: OpSize = 0;
pub const POP: OpSize = 1;
pub const LOAD: OpSize = 2;
pub const STORE: OpSize = 3;
pub const ADD: OpSize = 4;
pub const SUB: OpSize = 5;
pub const MUL: OpSize = 6;
pub const DIV: OpSize = 7;
pub const MOD: OpSize = 8;
pub const BOR: OpSize = 9;
pub const BAND: OpSize = 10;
pub const BXOR: OpSize = 11;
pub const SHL: OpSize = 12;
pub const SHR: OpSize = 13;
pub const EQ: OpSize = 14;
pub const NEQ: OpSize = 15;
pub const GT: OpSize = 16;
pub const GE: OpSize = 17;
pub const LT: OpSize = 18;
pub const LE: OpSize = 19;
pub const JUMP: OpSize = 20;
pub const JUMP_TRUE: OpSize = 21;
pub const JUMP_FALSE: OpSize = 22;
pub const PRINT: OpSize = 23;
pub const DBG_PRINT: OpSize = 24;
}
#[derive(Debug, Default)]
pub struct Compiler {
ops: Vec<u32>,
global_vars: HashMap<String, u16>,
}
impl Compiler {
pub fn new() -> Self {
Compiler::default()
}
pub fn compile(&mut self, ast: &Ast) {
for stmt in &ast.prog {
match stmt {
Stmt::Expr(expr) => {
self.compile_expr(expr);
self.ops.push(op::POP);
}
Stmt::Let(name, rhs) => {
let id = self.global_vars.len() as u16;
self.global_vars.insert(name.clone(), id);
self.compile_expr(rhs);
self.gen_store(id);
}
Stmt::While(cond, body) => {
self.compile_expr(cond);
self.ops.push(op::JUMP_FALSE);
let idx_jmp = self.ops.len();
self.gen_i64(0);
let idx_start = self.ops.len();
self.compile(body);
// check condition before loop jump
self.compile_expr(cond);
self.ops.push(op::JUMP_TRUE);
self.gen_i64(idx_start as i64);
self.overwrite_i64(idx_jmp, self.ops.len() as i64);
}
Stmt::For(_, _, _, _) => todo!(),
Stmt::If(cond, if_block, else_block) => {
self.compile_expr(cond);
self.ops.push(op::JUMP_FALSE);
let idx_if = self.ops.len();
self.gen_i64(0);
self.compile(if_block);
self.ops.push(op::JUMP);
let idx_else = self.ops.len();
self.gen_i64(0);
self.overwrite_i64(idx_if, self.ops.len() as i64);
self.compile(else_block);
self.overwrite_i64(idx_else, self.ops.len() as i64);
},
Stmt::DbgPrint(expr) => {
self.compile_expr(expr);
self.ops.push(op::DBG_PRINT);
}
Stmt::Print(expr) => {
self.compile_expr(expr);
self.ops.push(op::PRINT);
}
}
}
}
pub fn into_ops(self) -> Vec<u32> {
self.ops
}
pub fn compile_expr(&mut self, expr: &Expr) {
match expr {
Expr::I64(val) => {
self.ops.push(op::PUSH);
self.gen_i64(*val)
}
Expr::Ident(name) => {
match self.global_vars.get(name).copied() {
Some(addr) => self.gen_load(addr),
None => panic!("Variable '{}' used before declaration", name),
}
},
Expr::BinOp(bo, lhs, rhs) => self.compile_binop(bo, lhs, rhs),
Expr::UnOp(_, _) => todo!(),
Expr::Str(_) => todo!(),
}
}
fn compile_binop(&mut self, bo: &BinOpType, lhs: &Expr, rhs: &Expr) {
if matches!(bo, BinOpType::Assign) {
self.compile_expr(rhs);
if let Expr::Ident(name) = lhs {
let addr = *self.global_vars.get(name).expect("Trying to assign var before decl");
self.gen_store(addr);
} else {
panic!("Trying to assign value to rvalue");
}
return;
}
self.compile_expr(lhs);
self.compile_expr(rhs);
match bo {
BinOpType::Add => self.ops.push(op::ADD),
BinOpType::Sub => self.ops.push(op::SUB),
BinOpType::Mul => self.ops.push(op::MUL),
BinOpType::Div => self.ops.push(op::DIV),
BinOpType::Mod => self.ops.push(op::MOD),
BinOpType::BOr => self.ops.push(op::BOR),
BinOpType::BAnd => self.ops.push(op::BAND),
BinOpType::BXor => self.ops.push(op::BXOR),
BinOpType::Shl => self.ops.push(op::SHL),
BinOpType::Shr => self.ops.push(op::SHR),
BinOpType::Equ => self.ops.push(op::EQ),
BinOpType::Neq => self.ops.push(op::NEQ),
BinOpType::Gt => self.ops.push(op::GT),
BinOpType::Ge => self.ops.push(op::GE),
BinOpType::Lt => self.ops.push(op::LT),
BinOpType::Le => self.ops.push(op::LE),
BinOpType::Assign => unreachable!(),
}
}
fn gen_i64(&mut self, val: i64) {
self.ops.push((val & u32::MAX as i64) as u32);
self.ops.push((val >> 32) as u32);
}
fn overwrite_i64(&mut self, idx: usize, val: i64) {
self.ops[idx] = (val & u32::MAX as i64) as u32;
self.ops[idx+1] = (val >> 32) as u32;
}
fn gen_load(&mut self, addr: u16) {
// Widen to u32 before shifting, otherwise addresses above 255 lose their upper bits
self.ops.push(op::LOAD | ((addr as u32) << 8));
// self.gen_i64(addr as i64)
}
fn gen_store(&mut self, addr: u16) {
// Widen to u32 before shifting, otherwise addresses above 255 lose their upper bits
self.ops.push(op::STORE | ((addr as u32) << 8));
// self.gen_i64(addr as i64)
}
}
pub fn compile(ast: &Ast) -> Vec<u32> {
let mut compiler = Compiler::new();
compiler.compile(ast);
compiler.into_ops()
}
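
Note: the VM that consumes these ops is not part of this hunk. The following is a minimal sketch of how the encoding produced by `gen_i64`, `gen_load` and `gen_store` could be decoded again. The opcode values are copied from the `op` module above. The helpers `split_i64`, `join_i64`, `pack` and `run`, as well as the assumption that `STORE` pops its operand, are hypothetical and may differ from the actual interpreter.

```rust
// Opcode values copied from the `op` module above.
const PUSH: u32 = 0;
const LOAD: u32 = 2;
const STORE: u32 = 3;
const ADD: u32 = 4;

/// Split an i64 into its low and high 32-bit words, in the order `gen_i64` emits them.
fn split_i64(val: i64) -> [u32; 2] {
    [val as u32, (val >> 32) as u32]
}

/// Reassemble an i64 from its low and high 32-bit words.
fn join_i64(lo: u32, hi: u32) -> i64 {
    ((hi as u64) << 32 | lo as u64) as i64
}

/// Pack an opcode into the low 8 bits and a variable address into the bits above.
fn pack(opcode: u32, addr: u16) -> u32 {
    opcode | ((addr as u32) << 8)
}

/// Execute the ops and return the final variable slots.
/// Assumption: STORE pops the value it stores.
fn run(ops: &[u32], num_vars: usize) -> Vec<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut vars = vec![0i64; num_vars];
    let mut ip = 0;
    while ip < ops.len() {
        let word = ops[ip];
        ip += 1;
        // The low byte selects the operation, the remaining bits carry the address
        match word & 0xff {
            PUSH => {
                // The two words after PUSH hold the immediate value
                stack.push(join_i64(ops[ip], ops[ip + 1]));
                ip += 2;
            }
            LOAD => stack.push(vars[(word >> 8) as usize]),
            STORE => vars[(word >> 8) as usize] = stack.pop().expect("stack underflow"),
            ADD => {
                let rhs = stack.pop().expect("stack underflow");
                let lhs = stack.pop().expect("stack underflow");
                stack.push(lhs.wrapping_add(rhs));
            }
            other => panic!("unsupported opcode {}", other),
        }
    }
    vars
}
```

Widening the address before the shift matters: `pack(LOAD, 300) >> 8` yields `300`, whereas shifting the `u16` first would drop the upper bits of the address.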


@ -1,553 +1,175 @@
use std::{cell::RefCell, rc::Rc}; use std::{collections::HashMap, fmt::Display, rc::Rc};
use thiserror::Error;
use crate::{ use crate::{
ast::{Ast, BinOpType, BlockScope, Expression, FunDecl, If, Statement, UnOpType}, ast::{Ast, BinOpType, Expr, Stmt, UnOpType},
astoptimizer::{AstOptimizer, SimpleAstOptimizer},
lexer::lex, lexer::lex,
nice_panic,
parser::parse, parser::parse,
stringstore::{Sid, StringStore},
}; };
/// Runtime errors that can occur during execution
#[derive(Debug, Error)]
pub enum RuntimeError {
#[error("Invalid array Index: {0:?}")]
InvalidArrayIndex(Value),
#[error("Variable used but not declared: {0}")]
VarUsedNotDeclared(String),
#[error("Can't index into non-array variable: {0}")]
TryingToIndexNonArray(String),
#[error("Invalid value type for unary operation: {0:?}")]
UnOpInvalidType(Value),
#[error("Incompatible binary operations. Operands don't match: {0:?} and {1:?}")]
BinOpIncompatibleTypes(Value, Value),
#[error("Array access out of bounds: Accessed {0}, size is {1}")]
ArrayOutOfBounds(usize, usize),
#[error("Division by zero")]
DivideByZero,
#[error("Invalid number of arguments for function {0}. Expected {1}, got {2}")]
InvalidNumberOfArgs(String, usize, usize),
}
/// Possible variants for the values
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum Value { pub enum Value {
/// 64-bit integer value
I64(i64), I64(i64),
/// String value Str(Rc<String>),
String(Sid),
/// Array value
Array(Rc<RefCell<Vec<Value>>>),
/// Void value
Void,
} }
/// The exit type of a block. When a block ends, the exit type specifies why the block ended.
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum BlockExit {
/// Normal exit when the block just ends normally (no returns / breaks / continues / etc.)
Normal,
/// The block ended through a break statement. This will be propagated up to the next loop
/// and cause it to fully terminate
Break,
/// The block ended through a continue statement. This will be propagated up to the next loop
/// and cause it to start the next iteration
Continue,
/// The block ended through a return statement. This will propagate up to the next function
/// body end
Return(Value),
}
#[derive(Default)]
pub struct Interpreter { pub struct Interpreter {
/// Run the SimpleAstOptimizer over the Ast before executing /// The variable table maps all variables by their names to their values
pub optimize_ast: bool, vartable: HashMap<String, Value>,
/// Print the tokens after lexing
pub print_tokens: bool,
/// Print the ast after parsing
pub print_ast: bool,
/// Capture the output values of print statements instead of printing them to the terminal
pub capture_output: bool,
/// The stored values that were captured
output: Vec<Value>,
/// Variable table stores the runtime values of variables as a stack
vartable: Vec<Value>,
/// Function table stores the functions during runtime as a stack
funtable: Vec<FunDecl>,
/// The stringstore contains all strings used throughout the program
stringstore: StringStore,
} }
impl Interpreter { impl Interpreter {
/// Create a new Interpreter
pub fn new() -> Self { pub fn new() -> Self {
Self { let vartable = HashMap::new();
optimize_ast: true, Self { vartable }
..Self::default()
}
} }
/// Get the captured output pub fn run_text(&mut self, code: &str, print_tokens: bool, print_ast: bool) {
pub fn output(&self) -> &[Value] { let tokens = lex(code);
&self.output if print_tokens {
}
/// Try to retrieve a variable value from the varstack. The idx is the index from the back of
/// the stack. So 0 is the last value, not the first
fn get_var(&self, idx: usize) -> Option<Value> {
self.vartable.get(self.vartable.len() - idx - 1).cloned()
}
/// Try to retrieve a mutable reference to a variable value from the varstack. The idx is the
/// index from the back of the stack. So 0 is the last value, not the first
fn get_var_mut(&mut self, idx: usize) -> Option<&mut Value> {
let idx = self.vartable.len() - idx - 1;
self.vartable.get_mut(idx)
}
/// Lex, parse and then run the given sourcecode. This will terminate the program when an error
/// occurs and print an appropriate error message.
pub fn run_str(&mut self, code: &str) {
// Lex the tokens
let tokens = match lex(code) {
Ok(tokens) => tokens,
Err(e) => nice_panic!("Lexing error: {}", e),
};
if self.print_tokens {
println!("Tokens: {:?}", tokens); println!("Tokens: {:?}", tokens);
} }
let ast = parse(tokens);
// Parse the ast if print_ast {
let ast = match parse(tokens) { println!("Ast:\n{:#?}", ast);
Ok(ast) => ast,
Err(e) => nice_panic!("Parsing error: {}", e),
};
// Run the ast
match self.run_ast(ast) {
Ok(_) => (),
Err(e) => nice_panic!("Runtime error: {}", e),
} }
self.run(&ast);
} }
/// Execute the given Ast within the interpreter pub fn run(&mut self, prog: &Ast) {
pub fn run_ast(&mut self, mut ast: Ast) -> Result<(), RuntimeError> { for stmt in &prog.prog {
// Optimize the ast
if self.optimize_ast {
ast = SimpleAstOptimizer::optimize(ast);
}
if self.print_ast {
println!("{:#?}", ast.main);
}
// Take over the stringstore of the given ast
self.stringstore = ast.stringstore;
// Run the top level block (the main)
self.run_block(&ast.main)?;
Ok(())
}
/// Run all statements in the given block
pub fn run_block(&mut self, prog: &BlockScope) -> Result<BlockExit, RuntimeError> {
self.run_block_fp_offset(prog, 0)
}
/// Same as run_block, but with an additional framepointer offset. This allows to free more
/// values from the stack than normally and can be used when passing arguments inside a
/// function body scope from the outside
pub fn run_block_fp_offset(
&mut self,
prog: &BlockScope,
framepointer_offset: usize,
) -> Result<BlockExit, RuntimeError> {
let framepointer = self.vartable.len() - framepointer_offset;
let mut block_exit = BlockExit::Normal;
'blockloop: for stmt in prog {
match stmt { match stmt {
Statement::Break => return Ok(BlockExit::Break), Stmt::Expr(expr) => {
Statement::Continue => return Ok(BlockExit::Continue), self.resolve_expr(expr);
Statement::Return(expr) => {
let val = self.resolve_expr(expr)?;
block_exit = BlockExit::Return(val);
break 'blockloop;
} }
Stmt::DbgPrint(expr) => {
Statement::Expr(expr) => { let result = self.resolve_expr(expr);
self.resolve_expr(expr)?; println!("{:?}", result);
} }
Stmt::Print(expr) => {
Statement::Declaration(decl) => { let result = self.resolve_expr(expr);
let rhs = self.resolve_expr(&decl.rhs)?; print!("{}", result);
self.vartable.push(rhs);
} }
Stmt::Let(name, rhs) => {
Statement::Block(block) => match self.run_block(block)? { let result = self.resolve_expr(rhs);
// Propagate return, continue and break self.vartable.insert(name.clone(), result);
be @ (BlockExit::Return(_) | BlockExit::Continue | BlockExit::Break) => {
block_exit = be;
break 'blockloop;
} }
_ => (), Stmt::For(init, condition, advance, body) => {
}, // Execute initial let instruction
let init_val = self.resolve_expr(&init.1);
self.vartable.insert(init.0.clone(), init_val);
Statement::Loop(looop) => {
// loop runs as long condition != 0
loop { loop {
// Check the loop condition // Check condition
if let Some(condition) = &looop.condition { match self.resolve_expr(condition) {
if matches!(self.resolve_expr(condition)?, Value::I64(0)) { Value::I64(val) if val == 0 => break,
break; Value::I64(_) => (),
}
Value::Str(text) if text.is_empty() => break,
Value::Str(_) => (),
} }
// Run the body // Execute loop body
let be = self.run_block(&looop.body)?; self.run(body);
match be {
// Propagate return // Execute advancement
be @ BlockExit::Return(_) => { self.resolve_expr(advance);
block_exit = be;
break 'blockloop;
} }
BlockExit::Break => break, }
BlockExit::Continue | BlockExit::Normal => (), Stmt::While(condition, body) => {
loop {
// Check condition
match self.resolve_expr(condition) {
Value::I64(val) if val == 0 => break,
Value::I64(_) => (),
Value::Str(text) if text.is_empty() => break,
Value::Str(_) => (),
} }
// Run the advancement // Execute loop body
if let Some(adv) = &looop.advancement { self.run(body);
self.resolve_expr(&adv)?;
} }
} }
} Stmt::If(condition, body_if, body_else) => {
if matches!(self.resolve_expr(condition), Value::I64(0)) {
Statement::Print(expr) => { self.run(body_else);
let result = self.resolve_expr(expr)?;
if self.capture_output {
self.output.push(result)
} else { } else {
print!("{}", self.value_to_string(&result)); self.run(body_if);
}
}
}
} }
} }
Statement::If(If { fn resolve_expr(&mut self, expr: &Expr) -> Value {
condition, match expr {
body_true, Expr::I64(val) => Value::I64(*val),
body_false, Expr::Str(name) => Value::Str(name.clone()),
}) => { Expr::BinOp(bo, lhs, rhs) => self.resolve_binop(bo, &lhs, &rhs),
// Run the right block depending on the conditions result being 0 or not Expr::UnOp(uo, val) => self.resolve_unop(uo, &val),
let exit = if matches!(self.resolve_expr(condition)?, Value::I64(0)) { Expr::Ident(name) => match self.vartable.get(name) {
self.run_block(body_false)? None => panic!("Runtime error: Use of undeclared variable '{}'", name),
} else { Some(val) => val.clone(),
self.run_block(body_true)? },
};
match exit {
// Propagate return, continue and break
be @ (BlockExit::Return(_) | BlockExit::Continue | BlockExit::Break) => {
block_exit = be;
break 'blockloop;
}
_ => (),
} }
} }
Statement::FunDeclare(fundec) => { fn resolve_binop(&mut self, bo: &BinOpType, lhs: &Expr, rhs: &Expr) -> Value {
self.funtable.push(fundec.clone()); // Treat assignment separate from the other expressions
if matches!(bo, BinOpType::Assign) {
match lhs {
Expr::Ident(name) => {
let rhs = self.resolve_expr(rhs);
self.vartable.get_mut(name).map(|var| *var = rhs.clone());
return rhs;
} }
_ => panic!("Runtime error: Left hand side of assignment must be an identifier"),
} }
} }
self.vartable.truncate(framepointer); let lhs = self.resolve_expr(lhs);
let rhs = self.resolve_expr(rhs);
Ok(block_exit)
}
/// Execute the given expression to retrieve the resulting value
fn resolve_expr(&mut self, expr: &Expression) -> Result<Value, RuntimeError> {
let val = match expr {
Expression::I64(val) => Value::I64(*val),
Expression::ArrayLiteral(size) => {
let size = match self.resolve_expr(size)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
Value::Array(Rc::new(RefCell::new(vec![Value::I64(0); size as usize])))
}
Expression::String(text) => Value::String(text.clone()),
Expression::BinOp(bo, lhs, rhs) => self.resolve_binop(bo, lhs, rhs)?,
Expression::UnOp(uo, operand) => self.resolve_unop(uo, operand)?,
Expression::Var(name, idx) => self.resolve_var(*name, *idx)?,
Expression::ArrayAccess(name, idx, arr_idx) => {
self.resolve_array_access(*name, *idx, arr_idx)?
}
Expression::FunCall(fun_name, fun_stackpos, args) => {
let args_len = args.len();
// All of the arg expressions must be resolved before pushing the vars on the stack,
// otherwise the stack positions are incorrect while resolving
let args = args
.iter()
.map(|arg| self.resolve_expr(arg))
.collect::<Vec<_>>();
for arg in args {
self.vartable.push(arg?);
}
// Function existence has been verified in the parser, so unwrap here shouldn't fail
let expected_num_args = self.funtable.get(*fun_stackpos).unwrap().argnames.len();
// Check if the number of provided arguments matches the number of expected arguments
if expected_num_args != args_len {
let fun_name = self
.stringstore
.lookup(*fun_name)
.cloned()
.unwrap_or("<unknown>".to_string());
return Err(RuntimeError::InvalidNumberOfArgs(
fun_name,
expected_num_args,
args_len,
));
}
// Run the function body and return the BlockExit type
match self.run_block_fp_offset(
&Rc::clone(&self.funtable.get(*fun_stackpos).unwrap().body),
expected_num_args,
)? {
BlockExit::Normal | BlockExit::Continue | BlockExit::Break => Value::Void,
BlockExit::Return(val) => val,
}
}
};
Ok(val)
}
/// Retrieve the value of a given array at the specified index from the varstack. The name is
/// given as a StringID and is used to reference the variable name in case of an error. The
/// idx is the stackpos where the array variable should be located and the arr_idx is the
/// actual array access index, given as an expression.
fn resolve_array_access(
&mut self,
name: Sid,
idx: usize,
arr_idx: &Expression,
) -> Result<Value, RuntimeError> {
// Resolve the array index into a value and check if it is a valid array index
let arr_idx = match self.resolve_expr(arr_idx)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
// Get the array value
let val = match self.get_var(idx) {
Some(val) => val,
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
// Make sure it is an array
let arr = match val {
Value::Array(arr) => arr,
_ => {
return Err(RuntimeError::TryingToIndexNonArray(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
// Get the value of the requested cell inside the array
let arr = arr.borrow();
arr.get(arr_idx as usize)
.cloned()
.ok_or(RuntimeError::ArrayOutOfBounds(arr_idx as usize, arr.len()))
}
/// Retrieve the value of a given variable from the varstack. The name is given as a StringID
/// and is used to reference the variable name in case of an error. The idx is the stackpos
/// where the variable should be located
fn resolve_var(&mut self, name: Sid, idx: usize) -> Result<Value, RuntimeError> {
match self.get_var(idx) {
Some(val) => Ok(val),
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
}
}
/// Execute a unary operation and get the resulting value
fn resolve_unop(&mut self, uo: &UnOpType, operand: &Expression) -> Result<Value, RuntimeError> {
// Recursively resolve the operands expression into an actual value
let operand = self.resolve_expr(operand)?;
// Perform the correct operation, considering the operation and value type
Ok(match (operand, uo) {
(Value::I64(val), UnOpType::Negate) => Value::I64(-val),
(Value::I64(val), UnOpType::BNot) => Value::I64(!val),
(Value::I64(val), UnOpType::LNot) => Value::I64(if val == 0 { 1 } else { 0 }),
(val, _) => return Err(RuntimeError::UnOpInvalidType(val)),
})
}
/// Execute a binary operation and get the resulting value
fn resolve_binop(
&mut self,
bo: &BinOpType,
lhs: &Expression,
rhs: &Expression,
) -> Result<Value, RuntimeError> {
let rhs = self.resolve_expr(rhs)?;
// Handle assignments separate from the other binary operations
match (&bo, &lhs) {
// Normal variable assignment
(BinOpType::Assign, Expression::Var(name, idx)) => {
// Get the variable mutably and assign the right hand side value
match self.get_var_mut(*idx) {
Some(val) => *val = rhs.clone(),
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
}
return Ok(rhs);
}
// Array index assignment
(BinOpType::Assign, Expression::ArrayAccess(name, idx, arr_idx)) => {
// Calculate the array index
let arr_idx = match self.resolve_expr(arr_idx)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
// Get the mutable ref to the array variable match (lhs, rhs) {
let val = match self.get_var_mut(*idx) {
Some(val) => val,
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
// Verify that it actually is an array
match val {
// Assign the right hand side value to the array at the given index
Value::Array(arr) => arr.borrow_mut()[arr_idx as usize] = rhs.clone(),
_ => {
return Err(RuntimeError::TryingToIndexNonArray(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
}
return Ok(rhs);
}
_ => (),
}
// This code is only executed if the binop is not an assignment as the assignments return
// early
// Resolve the left hand side to the value
let lhs = self.resolve_expr(lhs)?;
// Perform the appropriate calculations considering the operation type and datatypes of the
// two values
let result = match (lhs, rhs) {
(Value::I64(lhs), Value::I64(rhs)) => match bo { (Value::I64(lhs), Value::I64(rhs)) => match bo {
BinOpType::Add => Value::I64(lhs + rhs), BinOpType::Add => Value::I64(lhs + rhs),
BinOpType::Mul => Value::I64(lhs * rhs), BinOpType::Mul => Value::I64(lhs * rhs),
BinOpType::Sub => Value::I64(lhs - rhs), BinOpType::Sub => Value::I64(lhs - rhs),
BinOpType::Div => { BinOpType::Div => Value::I64(lhs / rhs),
Value::I64(lhs.checked_div(rhs).ok_or(RuntimeError::DivideByZero)?) BinOpType::Mod => Value::I64(lhs % rhs),
}
BinOpType::Mod => {
Value::I64(lhs.checked_rem(rhs).ok_or(RuntimeError::DivideByZero)?)
}
BinOpType::BOr => Value::I64(lhs | rhs), BinOpType::BOr => Value::I64(lhs | rhs),
BinOpType::BAnd => Value::I64(lhs & rhs), BinOpType::BAnd => Value::I64(lhs & rhs),
BinOpType::BXor => Value::I64(lhs ^ rhs), BinOpType::BXor => Value::I64(lhs ^ rhs),
BinOpType::LAnd => Value::I64(if (lhs != 0) && (rhs != 0) { 1 } else { 0 }),
BinOpType::LOr => Value::I64(if (lhs != 0) || (rhs != 0) { 1 } else { 0 }),
BinOpType::Shr => Value::I64(lhs >> rhs), BinOpType::Shr => Value::I64(lhs >> rhs),
BinOpType::Shl => Value::I64(lhs << rhs), BinOpType::Shl => Value::I64(lhs << rhs),
BinOpType::EquEqu => Value::I64(if lhs == rhs { 1 } else { 0 }), BinOpType::Equ => Value::I64(if lhs == rhs { 1 } else { 0 }),
BinOpType::NotEqu => Value::I64(if lhs != rhs { 1 } else { 0 }), BinOpType::Neq => Value::I64(if lhs != rhs { 1 } else { 0 }),
BinOpType::Less => Value::I64(if lhs < rhs { 1 } else { 0 }), BinOpType::Gt => Value::I64(if lhs > rhs { 1 } else { 0 }),
BinOpType::LessEqu => Value::I64(if lhs <= rhs { 1 } else { 0 }), BinOpType::Ge => Value::I64(if lhs >= rhs { 1 } else { 0 }),
BinOpType::Greater => Value::I64(if lhs > rhs { 1 } else { 0 }), BinOpType::Lt => Value::I64(if lhs < rhs { 1 } else { 0 }),
BinOpType::GreaterEqu => Value::I64(if lhs >= rhs { 1 } else { 0 }), BinOpType::Le => Value::I64(if lhs <= rhs { 1 } else { 0 }),
BinOpType::Assign => unreachable!(), BinOpType::Assign => unreachable!(),
}, },
(lhs, rhs) => return Err(RuntimeError::BinOpIncompatibleTypes(lhs, rhs)), _ => panic!("Value types are not compatible"),
}; }
Ok(result)
} }
/// Get a string representation of the given value. This uses the interpreters StringStore to fn resolve_unop(&mut self, uo: &UnOpType, val: &Expr) -> Value {
/// retrieve the text values of Strings let val = self.resolve_expr(val);
fn value_to_string(&self, val: &Value) -> String {
match val { match val {
Value::I64(val) => format!("{}", val), Value::I64(val) => match uo {
Value::Array(val) => format!("{:?}", val.borrow()), UnOpType::Neg => Value::I64(-val),
Value::String(text) => format!( },
"{}", _ => panic!("Invalid unary operation for type"),
self.stringstore }
.lookup(*text) }
.unwrap_or(&"<invalid string>".to_string()) }
),
Value::Void => format!("void"), impl Display for Value {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Value::I64(val) => write!(f, "{}", val),
Value::Str(text) => write!(f, "{}", text),
} }
} }
} }
@ -555,34 +177,27 @@ impl Interpreter {
#[cfg(test)] #[cfg(test)]
mod test { mod test {
use super::{Interpreter, Value}; use super::{Interpreter, Value};
use crate::ast::{BinOpType, Expression}; use crate::ast::{BinOpType, Expr};
/// Simple test to check if a simple expression is executed properly.
/// Full system tests from lexing to execution can be found in `lib.rs`
#[test] #[test]
fn test_interpreter_expr() { fn test_interpreter_expr() {
// Expression: 1 + 2 * 3 + 4 // Expression: 1 + 2 * 3 + 4
// With precedence: (1 + (2 * 3)) + 4 // With precedence: (1 + (2 * 3)) + 4
let ast = Expression::BinOp( let ast = Expr::BinOp(
BinOpType::Add, BinOpType::Add,
Expression::BinOp( Expr::BinOp(
BinOpType::Add, BinOpType::Add,
Expression::I64(1).into(), Expr::I64(1).into(),
Expression::BinOp( Expr::BinOp(BinOpType::Mul, Expr::I64(2).into(), Expr::I64(3).into()).into(),
BinOpType::Mul,
Expression::I64(2).into(),
Expression::I64(3).into(),
) )
.into(), .into(),
) Expr::I64(4).into(),
.into(),
Expression::I64(4).into(),
); );
let expected = Value::I64(11); let expected = Value::I64(11);
let mut interpreter = Interpreter::new(); let mut interpreter = Interpreter::new();
let actual = interpreter.resolve_expr(&ast).unwrap(); let actual = interpreter.resolve_expr(&ast);
assert_eq!(expected, actual); assert_eq!(expected, actual);
} }
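
Note: in the new interpreter the loops treat both `0` and the empty string as false, while `Stmt::If` only checks for `Value::I64(0)`, and `Div` / `Mod` use the plain operators. The following is a minimal sketch of two helpers that could keep these rules in one place. The names `is_truthy` and `checked_div` are hypothetical, the `Value` type is repeated here only to make the sketch self-contained, and returning a `String` error is an assumption made for brevity.

```rust
use std::rc::Rc;

// Same shape as the `Value` enum in the new interpreter, repeated for a standalone example.
#[derive(Debug, PartialEq, Eq, Clone)]
enum Value {
    I64(i64),
    Str(Rc<String>),
}

/// A value counts as true unless it is the integer 0 or the empty string.
fn is_truthy(val: &Value) -> bool {
    match val {
        Value::I64(n) => *n != 0,
        Value::Str(text) => !text.is_empty(),
    }
}

/// Divide without panicking. `i64::checked_div` returns None for a zero divisor and for
/// the overflowing case `i64::MIN / -1`.
fn checked_div(lhs: i64, rhs: i64) -> Result<Value, String> {
    lhs.checked_div(rhs)
        .map(Value::I64)
        .ok_or_else(|| "Division by zero or overflow".to_string())
}
```

A condition check in `Stmt::If`, `Stmt::While` and `Stmt::For` could then share `is_truthy`, so that all three statements agree on how strings are treated.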


@ -1,131 +1,115 @@
use std::{iter::Peekable, str::Chars}; use std::{iter::Peekable, str::Chars};
use thiserror::Error;
use crate::{token::Token, T}; use crate::token::{Keyword, Literal, Token};
/// Errors that can occur while lexing a given string
#[derive(Debug, Error)]
pub enum LexErr {
#[error("Failed to parse '{0}' as i64")]
NumericParse(String),
#[error("Invalid escape character '\\{0}'")]
InvalidStrEscape(char),
#[error("Lexer encountered unexpected char: '{0}'")]
UnexpectedChar(char),
#[error("Missing closing string quote '\"'")]
MissingClosingString,
}
/// Lex the provided code into a Token Buffer /// Lex the provided code into a Token Buffer
pub fn lex(code: &str) -> Result<Vec<Token>, LexErr> { pub fn lex(code: &str) -> Vec<Token> {
let lexer = Lexer::new(code); let mut lexer = Lexer::new(code);
lexer.lex() lexer.lex()
} }
/// The lexer is created from a reference to a sourcecode string and is consumed to create a token
/// buffer from that sourcecode.
struct Lexer<'a> { struct Lexer<'a> {
/// The sourcecode text as a peekable iterator over the chars. Peekable allows for look-ahead
/// and the use of the Chars iterator allows to support unicode characters
code: Peekable<Chars<'a>>, code: Peekable<Chars<'a>>,
/// The lexed tokens
tokens: Vec<Token>,
/// The sourcecode character that is currently being lexed
current_char: char,
} }
impl<'a> Lexer<'a> { impl<'a> Lexer<'a> {
/// Create a new lexer from the given sourcecode
fn new(code: &'a str) -> Self { fn new(code: &'a str) -> Self {
let code = code.chars().peekable(); let code = code.chars().peekable();
let tokens = Vec::new(); Self { code }
let current_char = '\0';
Self {
code,
tokens,
current_char,
}
} }
/// Consume the lexer and try to lex the contained sourcecode into a token buffer /// Advance to next character and return the removed char. If there is no next char, '\0'
fn lex(mut self) -> Result<Vec<Token>, LexErr> { /// is returned.
fn next(&mut self) -> char {
self.code.next().unwrap_or('\0')
}
/// Get the next character without removing it. If there is no next char, '\0' is returned.
fn peek(&mut self) -> char {
self.code.peek().copied().unwrap_or('\0')
}
fn lex(&mut self) -> Vec<Token> {
let mut tokens = Vec::new();
loop { loop {
self.current_char = self.next(); match self.next() {
// Match on the current and next character. This gives a 1-char look-ahead and // End of text
// can be used to directly match 2-char tokens '\0' => break,
match (self.current_char, self.peek()) {
// Stop lexing at EOF
('\0', _) => break,
// Skip / ignore whitespace // Skip whitespace
(' ' | '\t' | '\n' | '\r', _) => (), ' ' | '\r' | '\n' | '\t' => (),
// Line comment. Consume every char until linefeed (next line) // Handle tokens that span two characters
('/', '/') => while !matches!(self.next(), '\n' | '\0') {}, '>' if matches!(self.peek(), '>') => {
self.next();
tokens.push(Token::Shr);
}
'<' if matches!(self.peek(), '<') => {
self.next();
tokens.push(Token::Shl);
}
'=' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::Equ);
}
'!' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::Neq);
}
'<' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::Le);
}
'>' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::Ge);
}
'$' if matches!(self.peek(), '$') => {
self.next();
tokens.push(Token::DoubleDollar);
}
// Double character tokens // Handle tokens that span one character
('>', '>') => self.push_tok_consume(T![>>]), '+' => tokens.push(Token::Add),
('<', '<') => self.push_tok_consume(T![<<]), '-' => tokens.push(Token::Sub),
('=', '=') => self.push_tok_consume(T![==]), '*' => tokens.push(Token::Mul),
('!', '=') => self.push_tok_consume(T![!=]), '/' => tokens.push(Token::Div),
('<', '=') => self.push_tok_consume(T![<=]), '%' => tokens.push(Token::Mod),
('>', '=') => self.push_tok_consume(T![>=]), '|' => tokens.push(Token::BOr),
('<', '-') => self.push_tok_consume(T![<-]), '&' => tokens.push(Token::BAnd),
('&', '&') => self.push_tok_consume(T![&&]), '^' => tokens.push(Token::BXor),
('|', '|') => self.push_tok_consume(T![||]), '(' => tokens.push(Token::LParen),
')' => tokens.push(Token::RParen),
'<' => tokens.push(Token::Lt),
'>' => tokens.push(Token::Gt),
'=' => tokens.push(Token::Assign),
';' => tokens.push(Token::Semicolon),
'{' => tokens.push(Token::LBrace),
'}' => tokens.push(Token::RBrace),
'$' => tokens.push(Token::Dollar),
// Single character tokens // Handle special multicharacter tokens
(',', _) => self.push_tok(T![,]),
(';', _) => self.push_tok(T![;]),
('+', _) => self.push_tok(T![+]),
('-', _) => self.push_tok(T![-]),
('*', _) => self.push_tok(T![*]),
('/', _) => self.push_tok(T![/]),
('%', _) => self.push_tok(T![%]),
('|', _) => self.push_tok(T![|]),
('&', _) => self.push_tok(T![&]),
('^', _) => self.push_tok(T![^]),
('(', _) => self.push_tok(T!['(']),
(')', _) => self.push_tok(T![')']),
('~', _) => self.push_tok(T![~]),
('<', _) => self.push_tok(T![<]),
('>', _) => self.push_tok(T![>]),
('=', _) => self.push_tok(T![=]),
('{', _) => self.push_tok(T!['{']),
('}', _) => self.push_tok(T!['}']),
('!', _) => self.push_tok(T![!]),
('[', _) => self.push_tok(T!['[']),
(']', _) => self.push_tok(T![']']),
// Special tokens with variable length // Lex numbers
ch @ '0'..='9' => tokens.push(self.lex_number(ch)),
// Lex multiple characters together as numbers // Lex strings
('0'..='9', _) => self.lex_number()?, '"' => tokens.push(self.lex_string()),
// Lex multiple characters together as a string // Lex identifiers
('"', _) => self.lex_str()?, ch @ ('a'..='z' | 'A'..='Z' | '_') => tokens.push(self.lex_ident(ch)),
// Lex multiple characters together as identifier or keyword // Any other character is unexpected
('a'..='z' | 'A'..='Z' | '_', _) => self.lex_identifier()?, ch => panic!("Lexer encountered unexpected char: '{}'", ch),
// Any character that was not handled otherwise is invalid
(ch, _) => Err(LexErr::UnexpectedChar(ch))?,
} }
} }
Ok(self.tokens) tokens
} }
/// Lex multiple characters as a number until encountering a non numeric digit. The fn lex_number(&mut self, first_char: char) -> Token {
/// successfully lexed i64 literal token is appended to the stored tokens. let mut sval = String::from(first_char);
fn lex_number(&mut self) -> Result<(), LexErr> {
// String representation of the integer value
let mut sval = String::from(self.current_char);
// Do as long as a next char exists and it is a numeric char // Do as long as a next char exists and it is a numeric char
loop { loop {
@ -143,171 +127,112 @@ impl<'a> Lexer<'a> {
} }
} }
// Try to convert the string representation of the value to i64. The error is mapped to // TODO: We only added numeric chars to the string, but the conversion could still fail
// the appropriate LexErr Token::Literal(Literal::I64(sval.parse().unwrap()))
let i64val = sval.parse().map_err(|_| LexErr::NumericParse(sval))?;
self.push_tok(T![i64(i64val)]);
Ok(())
} }
/// Lex characters as a string until encountering an unescaped closing doublequote char '"'. /// Lex an identifier from the character stream. The first char has to have been consumed
/// The successfully lexed string literal token is appended to the stored tokens. /// from the stream already and is passed as an argument instead.
fn lex_str(&mut self) -> Result<(), LexErr> { fn lex_ident(&mut self, first_char: char) -> Token {
// The opening " was consumed in match, so a fresh string can be used let mut ident = String::from(first_char);
// Do as long as a next char exists and it is a valid ident char
while let 'a'..='z' | 'A'..='Z' | '_' | '0'..='9' = self.peek() {
// The peeked char is a valid ident char, so consume it and append it
ident.push(self.next());
}
// Check if the identifier is a keyword
match ident.as_str() {
"true" => Token::Literal(Literal::I64(1)),
"false" => Token::Literal(Literal::I64(0)),
"let" => Token::Keyword(Keyword::Let),
"while" => Token::Keyword(Keyword::While),
"if" => Token::Keyword(Keyword::If),
"else" => Token::Keyword(Keyword::Else),
"for" => Token::Keyword(Keyword::For),
_ => Token::Ident(ident),
}
}
/// Lex a string token from the character stream. This requires the initial quote '"' to be
/// consumed before.
fn lex_string(&mut self) -> Token {
let mut text = String::new(); let mut text = String::new();
// Read all chars until encountering the closing " let mut escape = false;
// Do as long as a next char exists and it is not '"'
loop { loop {
match self.peek() { if escape {
// An unescaped doublequote ends the current string escape = false;
'"' => break,
// If the end of file is reached while still waiting for '"', error out // Escape characters
'\0' => Err(LexErr::MissingClosingString)?, match self.next() {
'\\' => text.push('\\'),
_ => match self.next() {
// Backslash indicates an escaped character, so consume one more char and
// treat it as the escaped char
'\\' => match self.next() {
'n' => text.push('\n'), 'n' => text.push('\n'),
'r' => text.push('\r'), 'r' => text.push('\r'),
't' => text.push('\t'), 't' => text.push('\t'),
'\\' => text.push('\\'), ch => panic!("Invalid string escape: '{:?}'", ch),
'"' => text.push('"'),
// If the escaped char is not handled, it is unsupported and an error
ch => Err(LexErr::InvalidStrEscape(ch))?,
},
// All other characters are simply appended to the string
ch => text.push(ch),
},
} }
} } else {
// Consume closing "
self.next();
self.push_tok(T![str(text)]);
Ok(())
}
/// Lex characters from the text as an identifier. The successfully lexed ident or keyword
/// token is appended to the stored tokens.
fn lex_identifier(&mut self) -> Result<(), LexErr> {
let mut ident = String::from(self.current_char);
// Do as long as a next char exists and it is a valid char for an identifier
loop {
match self.peek() { match self.peek() {
// In the middle of an identifier numbers are also allowed // Doublequote '"' ends the string lexing
'a'..='z' | 'A'..='Z' | '0'..='9' | '_' => { '"' => {
ident.push(self.next());
}
// Next char is not valid, so stop and finish the ident token
_ => break,
}
}
// Check for pre-defined keywords
let token = match ident.as_str() {
"loop" => T![loop],
"print" => T![print],
"if" => T![if],
"else" => T![else],
"fun" => T![fun],
"return" => T![return],
"break" => T![break],
"continue" => T![continue],
// If it doesn't match a keyword, it is a normal identifier
_ => T![ident(ident)],
};
self.push_tok(token);
Ok(())
}
/// Push the given token into the stored tokens
fn push_tok(&mut self, token: Token) {
self.tokens.push(token);
}
/// Same as `push_tok` but also consumes the next char, removing it from the code iter. This
/// is useful when lexing double char tokens where the second char has only been peeked.
fn push_tok_consume(&mut self, token: Token) {
self.next(); self.next();
self.tokens.push(token); break;
}
// Backslash '\' escapes the next character
'\\' => {
self.next();
escape = true;
} }
/// Advance to next character and return the removed char. When the end of the code is reached, // Reached end of text but didn't encounter closing doublequote '"'
/// `'\0'` is returned. This is used instead of an Option::None since it allows for much '\0' => panic!("String is never terminated (missing '\"')"),
/// shorter and cleaner code in the main loop. The `'\0'` character would not be valid anyways
fn next(&mut self) -> char { _ => text.push(self.next()),
self.code.next().unwrap_or('\0') }
}
} }
/// Get the next character without removing it. When the end of the code is reached, Token::Literal(Literal::Str(text))
/// `'\0'` is returned. This is used instead of an Option::None since it allows for much
/// shorter and cleaner code in the main loop. The `'\0'` character would not be valid anyways
fn peek(&mut self) -> char {
self.code.peek().copied().unwrap_or('\0')
} }
} }
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use crate::{lexer::lex, T}; use crate::token::Literal;
use super::{lex, Token};
/// A general test to check if the lexer actually lexes tokens correctly
#[test] #[test]
fn test_lexer() { fn test_lexer() {
let code = r#"53+1-567_000 * / % | ~ ! < > & ^ ({[]});= <- >= <= let code = "33 +5*2 + 4456467*2334+3 % - / << ^ | & >>";
== != && || << >> loop if else print my_123var "hello \t world\r\n\"\\""#;
let expected = vec![ let expected = vec![
T![i64(53)], Token::Literal(Literal::I64(33)),
T![+], Token::Add,
T![i64(1)], Token::Literal(Literal::I64(5)),
T![-], Token::Mul,
T![i64(567_000)], Token::Literal(Literal::I64(2)),
T![*], Token::Add,
T![/], Token::Literal(Literal::I64(4456467)),
T![%], Token::Mul,
T![|], Token::Literal(Literal::I64(2334)),
T![~], Token::Add,
T![!], Token::Literal(Literal::I64(3)),
T![<], Token::Mod,
T![>], Token::Sub,
T![&], Token::Div,
T![^], Token::Shl,
T!['('], Token::BXor,
T!['{'], Token::BOr,
T!['['], Token::BAnd,
T![']'], Token::Shr,
T!['}'],
T![')'],
T![;],
T![=],
T![<-],
T![>=],
T![<=],
T![==],
T![!=],
T![&&],
T![||],
T![<<],
T![>>],
T![loop],
T![if],
T![else],
T![print],
T![ident("my_123var".to_string())],
T![str("hello \t world\r\n\"\\".to_string())],
]; ];
let actual = lex(code).unwrap(); let actual = lex(code);
assert_eq!(expected, actual); assert_eq!(expected, actual);
} }
} }
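The one-character look-ahead used in one side of the lexer diff above, matching on a `(current, peek)` tuple, can be illustrated in isolation. The following is only a minimal sketch, not the project's lexer: `MiniLexer`, `Tok` and the `String` error type are hypothetical names introduced for illustration, and only a handful of tokens are covered. Skipping `_` inside numbers is an assumption based on the `567_000` case in the test above.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical, reduced token set (not the project's `Token` type).
#[derive(Debug, PartialEq)]
enum Tok {
    Shr,
    Ge,
    Gt,
    Div,
    Num(i64),
}

/// Hypothetical lexer that only demonstrates the look-ahead pattern.
struct MiniLexer<'a> {
    code: Peekable<Chars<'a>>,
}

impl<'a> MiniLexer<'a> {
    fn new(code: &'a str) -> Self {
        Self { code: code.chars().peekable() }
    }

    /// Consume the next char; '\0' stands in for end of input.
    fn next(&mut self) -> char {
        self.code.next().unwrap_or('\0')
    }

    /// Look at the next char without consuming it; '\0' at end of input.
    fn peek(&mut self) -> char {
        self.code.peek().copied().unwrap_or('\0')
    }

    fn lex(mut self) -> Result<Vec<Tok>, String> {
        let mut tokens = Vec::new();
        loop {
            let current = self.next();
            // Two-char patterns must come before their one-char prefixes
            match (current, self.peek()) {
                ('\0', _) => break,
                (' ' | '\t' | '\n' | '\r', _) => (),
                // Line comment: consume everything up to the linefeed or EOF
                ('/', '/') => while !matches!(self.next(), '\n' | '\0') {},
                ('>', '>') => {
                    self.next();
                    tokens.push(Tok::Shr);
                }
                ('>', '=') => {
                    self.next();
                    tokens.push(Tok::Ge);
                }
                ('>', _) => tokens.push(Tok::Gt),
                ('/', _) => tokens.push(Tok::Div),
                ('0'..='9', _) => {
                    let mut sval = String::from(current);
                    // Assumption: '_' is a digit separator and is dropped
                    while let '0'..='9' | '_' = self.peek() {
                        let ch = self.next();
                        if ch != '_' {
                            sval.push(ch);
                        }
                    }
                    let val = sval
                        .parse::<i64>()
                        .map_err(|_| format!("invalid number: {}", sval))?;
                    tokens.push(Tok::Num(val));
                }
                (ch, _) => return Err(format!("unexpected char: {:?}", ch)),
            }
        }
        Ok(tokens)
    }
}
```

Calling `MiniLexer::new("1 >> 2").lex()` would walk the input once; because the tuple arms are ordered, `>>` is recognized before the single `>` arm gets a chance to match.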


@ -1,68 +1,7 @@
pub mod ast;
pub mod interpreter;
pub mod lexer; pub mod lexer;
pub mod parser; pub mod parser;
pub mod interpreter;
pub mod token; pub mod token;
pub mod stringstore; pub mod ast;
pub mod astoptimizer; pub mod bytecode;
pub mod util; pub mod vm;
/// A bunch of full program tests using the example code programs as test subjects.
#[cfg(test)]
mod tests {
use crate::interpreter::{Interpreter, Value};
use std::fs::read_to_string;
/// Run a nek program with the given filename from the examples directory and assert the
/// captured output with the expected result. This only works if the program just outputs one
/// value as the result
fn run_example_check_single_i64_output(filename: &str, correct_result: i64) {
let mut interpreter = Interpreter::new();
// Enable output capturing. This captures all calls to `print`
interpreter.capture_output = true;
// Load and run the given program
let code = read_to_string(format!("examples/{filename}")).unwrap();
interpreter.run_str(&code);
// Compare the captured output with the expected value
let expected_output = [Value::I64(correct_result)];
assert_eq!(interpreter.output(), &expected_output);
}
#[test]
fn test_euler1() {
run_example_check_single_i64_output("euler1.nek", 233168);
}
#[test]
fn test_euler2() {
run_example_check_single_i64_output("euler2.nek", 4613732);
}
#[test]
fn test_euler3() {
run_example_check_single_i64_output("euler3.nek", 6857);
}
#[test]
fn test_euler4() {
run_example_check_single_i64_output("euler4.nek", 906609);
}
#[test]
fn test_euler5() {
run_example_check_single_i64_output("euler5.nek", 232792560);
}
#[test]
fn test_recursive_fib() {
run_example_check_single_i64_output("recursive_fib.nek", 832040);
}
#[test]
fn test_functions() {
run_example_check_single_i64_output("test_functions.nek", 69);
}
}

View File

@ -1,59 +1,63 @@
use std::{env::args, fs, process::exit}; use std::{env::args, io::Write};
use nek_lang::{interpreter::Interpreter, nice_panic}; use nek_lang::{interpreter::Interpreter, lexer::lex, parser::parse, bytecode::compile, vm::Vm};
/// Cli configuration flags and arguments. This could be done with `clap`, but since only so few
/// arguments are supported this seems kind of overkill.
#[derive(Debug, Default)] #[derive(Debug, Default)]
struct CliConfig { struct CliConfig {
print_tokens: bool, print_tokens: bool,
print_ast: bool, print_ast: bool,
no_optimizations: bool, interactive: bool,
file: Option<String>, file: Option<String>,
} }
fn main() { fn main() {
let mut conf = CliConfig::default(); let mut cfg = CliConfig::default();
// Go through all commandline arguments except the first (filename)
for arg in args().skip(1) { for arg in args().skip(1) {
match arg.as_str() { match arg.as_str() {
"--token" | "-t" => conf.print_tokens = true, "--tokens" | "-t" => cfg.print_tokens = true,
"--ast" | "-a" => conf.print_ast = true, "--ast" | "-a" => cfg.print_ast = true,
"--no-opt" | "-n" => conf.no_optimizations = true, "--interactive" | "-i" => cfg.interactive = true,
"--help" | "-h" => print_help(), file if cfg.file.is_none() => cfg.file = Some(file.to_string()),
file if !arg.starts_with("-") && conf.file.is_none() => { _ => panic!("Invalid argument: '{}'", arg),
conf.file = Some(file.to_string())
}
_ => nice_panic!("Error: Invalid argument '{}'", arg),
} }
} }
let mut interpreter = Interpreter::new(); let mut interpreter = Interpreter::new();
interpreter.print_tokens = conf.print_tokens; if let Some(file) = &cfg.file {
interpreter.print_ast = conf.print_ast; let code = std::fs::read_to_string(file).expect(&format!("File not found: '{}'", file));
interpreter.optimize_ast = !conf.no_optimizations; let tokens = lex(&code);
let ast = parse(tokens);
if let Some(file) = &conf.file { let prog = compile(&ast);
let code = match fs::read_to_string(file) {
Ok(code) => code, // println!("{:?}", prog);
Err(_) => nice_panic!("Error: Could not read file '{}'", file),
}; let mut vm = Vm::new(prog);
// Lex, parse and run the program
interpreter.run_str(&code); vm.run();
} else {
println!("Error: No file given\n"); // interpreter.run_text(&code, cfg.print_tokens, cfg.print_ast);
print_help();
}
} }
fn print_help() { if cfg.interactive || cfg.file.is_none() {
println!("Usage nek-lang [FLAGS] [FILE]");
println!("FLAGS: "); let mut code = String::new();
println!("-t, --token Print the lexed tokens");
println!("-a, --ast Print the abstract syntax tree"); loop {
println!("-n, --no-opt Disable the AST optimizations"); print!(">> ");
println!("-h, --help Show this help screen"); std::io::stdout().flush().unwrap();
exit(0);
code.clear();
std::io::stdin().read_line(&mut code).unwrap();
let code = code.trim();
if code == "exit" {
break;
}
interpreter.run_text(&code, cfg.print_tokens, cfg.print_ast);
}
}
} }
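The two `main` versions above differ in how a positional file argument is told apart from an unknown flag: one side guards with `!arg.starts_with("-")`, the other accepts any unmatched argument as the file. A minimal sketch of the guarded variant follows, written as a testable function rather than inline in `main`. The function `parse_args` and the `String` error are assumptions made for illustration; the diff itself reports errors via `panic!` or `nice_panic!`.

```rust
/// Reduced configuration, mirroring a subset of the fields shown in the diff.
#[derive(Debug, Default, PartialEq)]
struct CliConfig {
    print_tokens: bool,
    print_ast: bool,
    file: Option<String>,
}

/// Hypothetical helper: build a config from the arguments (program name
/// already skipped). Returns an error message instead of panicking.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliConfig, String> {
    let mut conf = CliConfig::default();
    for arg in args {
        match arg.as_str() {
            "--token" | "-t" => conf.print_tokens = true,
            "--ast" | "-a" => conf.print_ast = true,
            // Only the first argument that does not look like a flag is the file
            file if !file.starts_with('-') && conf.file.is_none() => {
                conf.file = Some(file.to_string())
            }
            // Unknown flags and a second file both end up here
            _ => return Err(format!("Invalid argument '{}'", arg)),
        }
    }
    Ok(conf)
}
```

In a real `main`, this would be called as `parse_args(std::env::args().skip(1))`. Without the `starts_with('-')` guard, a mistyped flag such as `--tokn` would silently be taken as the file name.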

View File

@ -1,376 +1,186 @@
use thiserror::Error; use std::iter::Peekable;
use crate::{ use crate::{
ast::{Ast, BlockScope, Expression, FunDecl, If, Loop, Statement, VarDecl}, ast::{Ast, BinOpType, Expr, Stmt, UnOpType},
stringstore::{Sid, StringStore}, token::{Keyword, Literal, Token},
token::Token,
util::{PutBackIter, PutBackableExt},
T,
}; };
/// Errors that can occur while parsing
#[derive(Debug, Error)]
pub enum ParseErr {
#[error("Unexpected Token \"{0:?}\", expected \"{1}\"")]
UnexpectedToken(Token, String),
#[error("Left hand side of declaration is not a variable")]
DeclarationOfNonVar,
#[error("Use of undefined variable \"{0}\"")]
UseOfUndeclaredVar(String),
#[error("Use of undefined function \"{0}\"")]
UseOfUndeclaredFun(String),
#[error("Redeclaration of function \"{0}\"")]
RedeclarationFun(String),
#[error("Function not declared at top level \"{0}\"")]
FunctionOnNonTopLevel(String),
}
/// A result that can either be Ok, or a ParseErr
type ResPE<T> = Result<T, ParseErr>;
/// This macro can be used to quickly and easily assert if the next token is matching the expected
/// token and return an appropriate error if not. Since this is intended to be used inside the
/// parser, the first argument should always be `self`.
macro_rules! validate_next {
($self:ident, $expected_tok:pat, $expected_str:expr) => {
match $self.next() {
$expected_tok => (),
tok => return Err(ParseErr::UnexpectedToken(tok, format!("{}", $expected_str))),
}
};
}
/// Parse the given tokens into an abstract syntax tree
pub fn parse<T: Iterator<Item = Token>, A: IntoIterator<IntoIter = T>>(tokens: A) -> ResPE<Ast> {
let parser = Parser::new(tokens);
parser.parse()
}
/// A parser that takes in a Token Stream and can create a full abstract syntax tree from it.
struct Parser<T: Iterator<Item = Token>> { struct Parser<T: Iterator<Item = Token>> {
tokens: PutBackIter<T>, tokens: Peekable<T>,
string_store: StringStore,
var_stack: Vec<Sid>,
fun_stack: Vec<Sid>,
nesting_level: usize,
} }
impl<T: Iterator<Item = Token>> Parser<T> { impl<T: Iterator<Item = Token>> Parser<T> {
/// Create a new parser to parse the given Token Stream /// Create a new parser to parse the given Token Stream
pub fn new<A: IntoIterator<IntoIter = T>>(tokens: A) -> Self { fn new<A: IntoIterator<IntoIter = T>>(tokens: A) -> Self {
let tokens = tokens.into_iter().putbackable(); let tokens = tokens.into_iter().peekable();
let string_store = StringStore::new(); Self { tokens }
let var_stack = Vec::new();
let fun_stack = Vec::new();
Self {
tokens,
string_store,
var_stack,
fun_stack,
nesting_level: 0,
}
} }
/// Consume the parser and try to create the abstract syntax tree from the token stream /// Get the next Token without removing it
pub fn parse(mut self) -> ResPE<Ast> { fn peek(&mut self) -> &Token {
let main = self.parse_scoped_block()?; self.tokens.peek().unwrap_or(&Token::EoF)
Ok(Ast {
main,
stringstore: self.string_store,
})
} }
/// Parse a series of statements together as a BlockScope. This will continuously parse /// Advance to next Token and return the removed Token
/// statements until encountering end-of-file or a block end '}' . fn next(&mut self) -> Token {
fn parse_scoped_block(&mut self) -> ResPE<BlockScope> { self.tokens.next().unwrap_or(Token::EoF)
self.parse_scoped_block_fp_offset(0)
} }
/// Same as parse_scoped_block, but an offset to the framepointer can be specified to allow fn parse(&mut self) -> Ast {
/// for easily passing variables into scopes from the outside. This is used when parsing
/// function calls
fn parse_scoped_block_fp_offset(&mut self, framepointer_offset: usize) -> ResPE<BlockScope> {
self.nesting_level += 1;
let framepointer = self.var_stack.len() - framepointer_offset;
let mut prog = Vec::new(); let mut prog = Vec::new();
loop { loop {
match self.peek() {
// Just a semicolon is an empty statement. So just consume it
T![;] => {
self.next();
}
// '}' end the current block and EoF ends everything, as the end of the tokenstream
// is reached
T![EoF] | T!['}'] => break,
// Create a new scoped block
T!['{'] => {
self.next();
prog.push(Statement::Block(self.parse_scoped_block()?));
validate_next!(self, T!['}'], "}");
}
// By default try to parse statements
_ => prog.push(self.parse_stmt()?),
}
}
// Reset the stack to where it was before entering the scope
self.var_stack.truncate(framepointer);
self.nesting_level -= 1;
Ok(prog)
}
/// Parse a single statement from the tokens
fn parse_stmt(&mut self) -> ResPE<Statement> {
let stmt = match self.peek() { let stmt = match self.peek() {
// Break statement Token::Semicolon => {
T![break] => {
self.next(); self.next();
continue;
// After the statement, there must be a semicolon
validate_next!(self, T![;], ";");
Statement::Break
} }
Token::EoF => break,
Token::RBrace => break,
// Continue statement Token::Keyword(keyword) => match keyword {
T![continue] => { Keyword::Let => self.parse_let_stmt(),
Keyword::While => self.parse_while(),
Keyword::If => self.parse_if(),
Keyword::For => self.parse_for(),
Keyword::Else => panic!("Unexpected else keyword"),
},
Token::Dollar => {
self.next(); self.next();
Stmt::Print(self.parse_expr())
// After the statement, there must be a semicolon
validate_next!(self, T![;], ";");
Statement::Continue
} }
Token::DoubleDollar => {
// Loop statement
T![loop] => Statement::Loop(self.parse_loop()?),
// Print statement
T![print] => {
self.next(); self.next();
Stmt::DbgPrint(self.parse_expr())
let expr = self.parse_expr()?;
// After the statement, there must be a semicolon
validate_next!(self, T![;], ";");
Statement::Print(expr)
} }
// By default try to parse an expression
// Return statement _ => Stmt::Expr(self.parse_expr()),
T![return] => {
self.next();
let stmt = Statement::Return(self.parse_expr()?);
// After a statement, there must be a semicolon
validate_next!(self, T![;], ";");
stmt
}
// If statement
T![if] => Statement::If(self.parse_if()?),
// Function definition statement
T![fun] => {
self.next();
// Expect an identifier as the function name
let fun_name = match self.next() {
T![ident(fun_name)] => fun_name,
tok => return Err(ParseErr::UnexpectedToken(tok, "<ident>".to_string())),
}; };
// Only allow function definitions on the top level prog.push(stmt);
if self.nesting_level > 1 {
return Err(ParseErr::FunctionOnNonTopLevel(fun_name));
} }
// Intern the function name Ast { prog }
let fun_name = self.string_store.intern_or_lookup(&fun_name);
// Check if the function name already exists
if self.fun_stack.contains(&fun_name) {
return Err(ParseErr::RedeclarationFun(
self.string_store
.lookup(fun_name)
.cloned()
.unwrap_or("<unknown>".to_string()),
));
} }
// Put the function name on the function stack for precalculating the stack fn parse_for(&mut self) -> Stmt {
// positions if !matches!(self.next(), Token::Keyword(Keyword::For)) {
let fun_stackpos = self.fun_stack.len(); panic!("Error parsing for: Expected for token");
self.fun_stack.push(fun_name); }
let init = match self.parse_let_stmt() {
let mut arg_names = Vec::new(); Stmt::Let(name, rhs) => (name, rhs),
validate_next!(self, T!['('], "(");
// Parse the optional arguments inside the parentheses
while matches!(self.peek(), T![ident(_)]) {
let var_name = match self.next() {
T![ident(var_name)] => var_name,
_ => unreachable!(), _ => unreachable!(),
}; };
// Intern argument names if !matches!(self.next(), Token::Semicolon) {
let var_name = self.string_store.intern_or_lookup(&var_name); panic!("Error parsing for: Expected semicolon token");
arg_names.push(var_name);
// Push the variable onto the varstack
self.var_stack.push(var_name);
// If there are more args skip the comma so that the loop will read the argname
if self.peek() == &T![,] {
self.next();
}
} }
validate_next!(self, T![')'], ")"); let condition = self.parse_expr();
validate_next!(self, T!['{'], "{"); if !matches!(self.next(), Token::Semicolon) {
panic!("Error parsing for: Expected semicolon token");
// Create the scoped block with a stack offset. This will pop the args that are
// added to the stack while parsing args
let body = self.parse_scoped_block_fp_offset(arg_names.len())?;
validate_next!(self, T!['}'], "}");
Statement::FunDeclare(FunDecl {
name: fun_name,
fun_stackpos,
argnames: arg_names,
body: body.into(),
})
} }
// Either a variable declaration statement or an expression statement let advance = self.parse_expr();
_ => {
// To decide if it is a declaration or an expression, a lookahead is needed
let first = self.next();
let stmt = match (first, self.peek()) { if !matches!(self.next(), Token::LBrace) {
// Identifier and "<-" is a declaration panic!("Error parsing for: Expected '{{' token");
(T![ident(name)], T![<-]) => { }
let body = self.parse();
if !matches!(self.next(), Token::RBrace) {
panic!("Error parsing for: Expected '}}' token");
}
Stmt::For(init, condition, advance, body)
}
fn parse_if(&mut self) -> Stmt {
if !matches!(self.next(), Token::Keyword(Keyword::If)) {
panic!("Error parsing if: Expected if token");
}
let condition = self.parse_expr();
if !matches!(self.next(), Token::LBrace) {
panic!("Error parsing if: Expected '{{' token");
}
let body_if = self.parse();
if !matches!(self.next(), Token::RBrace) {
panic!("Error parsing if: Expected '}}' token");
}
let mut body_else = Ast { prog: Vec::new() };
if matches!(self.peek(), Token::Keyword(Keyword::Else)) {
self.next(); self.next();
let rhs = self.parse_expr()?; if !matches!(self.next(), Token::LBrace) {
panic!("Error parsing else: Expected '{{' token");
let sid = self.string_store.intern_or_lookup(&name);
let sp = self.var_stack.len();
self.var_stack.push(sid);
Statement::Declaration(VarDecl {
name: sid,
var_stackpos: sp,
rhs,
})
} }
// Anything else must be an expression
(first, _) => { body_else = self.parse();
// Put the first token back in order for the parse_expr to see it
self.putback(first); if !matches!(self.next(), Token::RBrace) {
Statement::Expr(self.parse_expr()?) panic!("Error parsing else: Expected '}}' token");
} }
}
Stmt::If(condition, body_if, body_else)
}
fn parse_while(&mut self) -> Stmt {
if !matches!(self.next(), Token::Keyword(Keyword::While)) {
panic!("Error parsing while: Expected while token");
}
let condition = self.parse_expr();
if !matches!(self.next(), Token::LBrace) {
panic!("Error parsing while: Expected '{{' token");
}
let body = self.parse();
if !matches!(self.next(), Token::RBrace) {
panic!("Error parsing while: Expected '}}' token");
}
Stmt::While(condition, body)
}
fn parse_let_stmt(&mut self) -> Stmt {
if !matches!(self.next(), Token::Keyword(Keyword::Let)) {
panic!("Error parsing let: Expected let token");
}
let name = match self.next() {
Token::Ident(name) => name,
_ => panic!("Error parsing let: Expected identifier after let"),
}; };
// After a statement, there must be a semicolon if !matches!(self.next(), Token::Assign) {
validate_next!(self, T![;], ";"); panic!("Error parsing let: Expected assignment token");
stmt
}
};
Ok(stmt)
} }
/// Parse an if statement from the tokens let rhs = self.parse_expr();
fn parse_if(&mut self) -> ResPE<If> {
validate_next!(self, T![if], "if");
let condition = self.parse_expr()?; Stmt::Let(name, rhs)
validate_next!(self, T!['{'], "{");
let body_true = self.parse_scoped_block()?;
validate_next!(self, T!['}'], "}");
let mut body_false = BlockScope::default();
// Optionally parse the else part
if self.peek() == &T![else] {
self.next();
validate_next!(self, T!['{'], "{");
body_false = self.parse_scoped_block()?;
validate_next!(self, T!['}'], "}");
} }
Ok(If { fn parse_expr(&mut self) -> Expr {
condition, let lhs = self.parse_primary();
body_true,
body_false,
})
}
/// Parse a loop statement from the tokens
fn parse_loop(&mut self) -> ResPE<Loop> {
validate_next!(self, T![loop], "loop");
let mut condition = None;
let mut advancement = None;
// Check if the optional condition is present
if !matches!(self.peek(), T!['{']) {
condition = Some(self.parse_expr()?);
// Check if the optional advancement is present
if matches!(self.peek(), T![;]) {
self.next();
advancement = Some(self.parse_expr()?);
}
}
validate_next!(self, T!['{'], "{");
let body = self.parse_scoped_block()?;
validate_next!(self, T!['}'], "}");
Ok(Loop {
condition,
advancement,
body,
})
}
/// Parse a single expression from the tokens
fn parse_expr(&mut self) -> ResPE<Expression> {
let lhs = self.parse_primary()?;
self.parse_expr_precedence(lhs, 0) self.parse_expr_precedence(lhs, 0)
} }
/// Parse binary expressions with a precedence equal to or higher than min_prec. /// Parse binary expressions with a precedence equal to or higher than min_prec
/// This uses the precedence climbing method for dealing with the operator precedences: fn parse_expr_precedence(&mut self, mut lhs: Expr, min_prec: u8) -> Expr {
/// https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
fn parse_expr_precedence(&mut self, mut lhs: Expression, min_prec: u8) -> ResPE<Expression> {
while let Some(binop) = &self.peek().try_to_binop() { while let Some(binop) = &self.peek().try_to_binop() {
// Stop if the next operator has a lower binding power // Stop if the next operator has a lower binding power
if !(binop.precedence() >= min_prec) { if !(binop.precedence() >= min_prec) {
@ -381,211 +191,117 @@ impl<T: Iterator<Item = Token>> Parser<T> {
// valid // valid
let binop = self.next().try_to_binop().unwrap(); let binop = self.next().try_to_binop().unwrap();
let mut rhs = self.parse_primary()?; let mut rhs = self.parse_primary();
while let Some(binop2) = &self.peek().try_to_binop() { while let Some(binop2) = &self.peek().try_to_binop() {
if !(binop2.precedence() > binop.precedence()) { if !(binop2.precedence() > binop.precedence()) {
break; break;
} }
rhs = self.parse_expr_precedence(rhs, binop.precedence() + 1)?; rhs = self.parse_expr_precedence(rhs, binop.precedence() + 1);
} }
lhs = Expression::BinOp(binop, lhs.into(), rhs.into()); lhs = Expr::BinOp(binop, lhs.into(), rhs.into());
} }
Ok(lhs) lhs
} }
/// Parse a primary expression. A primary can be a literal value, variable, function call, /// Parse a primary expression (for now only number)
/// array indexing, parentheses grouping or a unary operation fn parse_primary(&mut self) -> Expr {
fn parse_primary(&mut self) -> ResPE<Expression> { match self.next() {
let primary = match self.next() { Token::Literal(Literal::I64(val)) => Expr::I64(val),
// Literal i64
T![i64(val)] => Expression::I64(val),
// Literal String Token::Literal(Literal::Str(text)) => Expr::Str(text.into()),
T![str(text)] => Expression::String(self.string_store.intern_or_lookup(&text)),
// Array literal. Square brackets containing the array size as expression Token::Ident(name) => Expr::Ident(name),
T!['['] => {
let size = self.parse_expr()?;
validate_next!(self, T![']'], "]"); Token::LParen => {
// The token was an opening parenthesis, so parse a full expression again as the
// expression inside the parentheses `"(" expr ")"`
let inner = self.parse_expr();
Expression::ArrayLiteral(size.into()) // If there is no closing parenthesis after the expression, it is a syntax error
if !matches!(self.next(), Token::RParen) {
panic!("Error parsing primary expr: Missing closing parenthesis ')'");
} }
// Array access, aka indexing. An ident followed by square brackets containing the inner
// index as an expression
T![ident(name)] if self.peek() == &T!['['] => {
// Get the stack position of the array variable
let sid = self.string_store.intern_or_lookup(&name);
let stackpos = self.get_stackpos(sid)?;
self.next();
let index = self.parse_expr()?;
validate_next!(self, T![']'], "]");
Expression::ArrayAccess(sid, stackpos, index.into())
} }
// Identifier followed by parenthesis is a function call Token::Sub => Expr::UnOp(UnOpType::Neg, self.parse_primary().into()),
T![ident(name)] if self.peek() == &T!['('] => {
// Skip the opening parenthesis
self.next();
let sid = self.string_store.intern_or_lookup(&name); tok => panic!("Error parsing primary expr: Unexpected Token '{:?}'", tok),
let mut args = Vec::new();
// Parse the arguments as expressions
while !matches!(self.peek(), T![')']) {
let arg = self.parse_expr()?;
args.push(arg);
// If there are more args skip the comma so that the loop will read the argname
if self.peek() == &T![,] {
self.next();
} }
} }
validate_next!(self, T![')'], ")");
// Find the function stack position
let fun_stackpos = self.get_fun_stackpos(sid)?;
Expression::FunCall(sid, fun_stackpos, args)
} }
// Just an identifier is a variable pub fn parse<T: Iterator<Item = Token>, A: IntoIterator<IntoIter = T>>(tokens: A) -> Ast {
T![ident(name)] => { let mut parser = Parser::new(tokens);
// Find the variable stack position parser.parse()
let sid = self.string_store.intern_or_lookup(&name);
let stackpos = self.get_stackpos(sid)?;
Expression::Var(sid, stackpos)
} }
// Parentheses grouping impl BinOpType {
T!['('] => { /// Get the precedence for a binary operator. Higher value means the OP is stronger binding.
// Contained in between the parentheses can be any other expression /// For example Multiplication is stronger than addition, so Mul has higher precedence than Add.
let inner_expr = self.parse_expr()?; ///
/// The operator precedences are derived from the C language operator precedences. While not all
// Verify that there is a closing parenthesis /// C operators are included or the exact same, the precedence order is the same.
validate_next!(self, T![')'], ")"); /// See: https://en.cppreference.com/w/c/language/operator_precedence
fn precedence(&self) -> u8 {
inner_expr match self {
BinOpType::Assign => 0,
BinOpType::BOr => 1,
BinOpType::BXor => 2,
BinOpType::BAnd => 3,
BinOpType::Equ | BinOpType::Neq => 4,
BinOpType::Gt | BinOpType::Ge | BinOpType::Lt | BinOpType::Le => 5,
BinOpType::Shl | BinOpType::Shr => 6,
BinOpType::Add | BinOpType::Sub => 7,
BinOpType::Mul | BinOpType::Div | BinOpType::Mod => 8,
} }
// Unary operations or invalid token
tok => match tok.try_to_unop() {
// If the token is a valid unary operation, parse it as such
Some(uot) => Expression::UnOp(uot, self.parse_primary()?.into()),
// Otherwise it's an unexpected token
None => return Err(ParseErr::UnexpectedToken(tok, "primary".to_string())),
},
};
Ok(primary)
}
/// Try to get the position of a variable on the variable stack. This is needed to precalculate
/// the stackpositions in order to save time when executing
fn get_stackpos(&self, varid: Sid) -> ResPE<usize> {
self.var_stack
.iter()
.rev()
.position(|it| *it == varid)
.ok_or(ParseErr::UseOfUndeclaredVar(
self.string_store
.lookup(varid)
.map(String::from)
.unwrap_or("<unknown>".to_string()),
))
}
/// Try to get the position of a function on the function stack. This is needed to precalculate
/// the stackpositions in order to save time when executing
fn get_fun_stackpos(&self, varid: Sid) -> ResPE<usize> {
self.fun_stack
.iter()
.rev()
.position(|it| *it == varid)
.map(|it| self.fun_stack.len() - it - 1)
.ok_or(ParseErr::UseOfUndeclaredFun(
self.string_store
.lookup(varid)
.map(String::from)
.unwrap_or("<unknown>".to_string()),
))
}
/// Get the next Token without removing it. If there are no more tokens left, the EoF token is
/// returned. This follows the same reasoning as in the Lexer
fn peek(&mut self) -> &Token {
self.tokens.peek().unwrap_or(&T![EoF])
}
/// Put a single token back into the token stream
fn putback(&mut self, tok: Token) {
self.tokens.putback(tok);
}
/// Advance to next Token and return the removed Token. If there are no more tokens left, the
/// EoF token is returned. This follows the same reasoning as in the Lexer
fn next(&mut self) -> Token {
self.tokens.next().unwrap_or(T![EoF])
} }
} }
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::{parse, BinOpType, Expr};
use crate::{ use crate::{
ast::{BinOpType, Expression, Statement}, parser::{Ast, Stmt},
parser::parse, token::{Literal, Token},
T,
}; };
/// A very simple test to check if the parser correctly parses a simple expression
#[test] #[test]
fn test_parser() { fn test_parser() {
// Expression: 1 + 2 * 3 - 4 // Expression: 1 + 2 * 3 + 4
// With precedence: (1 + (2 * 3)) - 4 // With precedence: (1 + (2 * 3)) + 4
let tokens = [ let tokens = [
T![i64(1)], Token::Literal(Literal::I64(1)),
T![+], Token::Add,
T![i64(2)], Token::Literal(Literal::I64(2)),
T![*], Token::Mul,
T![i64(3)], Token::Literal(Literal::I64(3)),
T![-], Token::Sub,
T![i64(4)], Token::Literal(Literal::I64(4)),
T![;],
]; ];
let expected = Statement::Expr(Expression::BinOp( let expected = Expr::BinOp(
BinOpType::Sub, BinOpType::Sub,
Expression::BinOp( Expr::BinOp(
BinOpType::Add, BinOpType::Add,
Expression::I64(1).into(), Expr::I64(1).into(),
Expression::BinOp( Expr::BinOp(BinOpType::Mul, Expr::I64(2).into(), Expr::I64(3).into()).into(),
BinOpType::Mul,
Expression::I64(2).into(),
Expression::I64(3).into(),
) )
.into(), .into(),
) Expr::I64(4).into(),
.into(), );
Expression::I64(4).into(),
));
let expected = vec![expected]; let expected = Ast {
prog: vec![Stmt::Expr(expected)],
};
let actual = parse(tokens).unwrap(); let actual = parse(tokens);
assert_eq!(expected, actual.main); assert_eq!(expected, actual);
} }
} }
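
The `precedence` table in the parser above is the kind of table a precedence-climbing loop consumes. The following is a minimal sketch of that technique, not the repository's parser: the `Tok` type and the functions `precedence`, `parse_primary`, and `parse_expr` are hypothetical stand-ins, only three operators are covered, and the expression is evaluated directly instead of building an AST. The precedence values 7 and 8 are taken from the table above.

```rust
use std::iter::Peekable;

// Hypothetical, reduced token type used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Num(i64),
    Add,
    Sub,
    Mul,
}

/// Precedence of a binary operator token, or None if the token is not an operator.
/// The values mirror the table above (Add/Sub = 7, Mul = 8).
fn precedence(tok: Tok) -> Option<u8> {
    match tok {
        Tok::Add | Tok::Sub => Some(7),
        Tok::Mul => Some(8),
        Tok::Num(_) => None,
    }
}

/// A primary is just a number in this sketch.
fn parse_primary<I: Iterator<Item = Tok>>(tokens: &mut Peekable<I>) -> i64 {
    match tokens.next() {
        Some(Tok::Num(n)) => n,
        other => panic!("expected a number, got {:?}", other),
    }
}

/// Parse and evaluate an expression whose operators all have at least `min_prec`.
fn parse_expr<I: Iterator<Item = Tok>>(tokens: &mut Peekable<I>, min_prec: u8) -> i64 {
    let mut lhs = parse_primary(tokens);
    while let Some(&op) = tokens.peek() {
        // Stop at non-operators and at operators that bind weaker than allowed.
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        tokens.next();
        // All operators here are left-associative, so the right-hand side may
        // only contain operators that bind strictly stronger.
        let rhs = parse_expr(tokens, prec + 1);
        lhs = match op {
            Tok::Add => lhs + rhs,
            Tok::Sub => lhs - rhs,
            Tok::Mul => lhs * rhs,
            Tok::Num(_) => unreachable!(),
        };
    }
    lhs
}

/// Evaluate a complete token sequence.
fn eval(tokens: Vec<Tok>) -> i64 {
    parse_expr(&mut tokens.into_iter().peekable(), 0)
}
```

With the token sequence from `test_parser` (`1 + 2 * 3 - 4`), this groups as `(1 + (2 * 3)) - 4`, the same shape as the expected AST in the test.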


@ -1,104 +0,0 @@
use std::collections::HashMap;
/// A StringID that identifies a String inside the StringStore. This is only valid for the
/// StringStore that created the ID. These StringIDs can be trivially and cheaply copied
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Sid(usize);
/// A data structure that stores strings, handing out StringIDs that can be used to retrieve the
/// real strings at a later point. This is called interning.
#[derive(Clone, Default)]
pub struct StringStore {
/// The actual strings that are stored in the StringStore. The StringIDs match the index of the
/// string inside of this strings vector
strings: Vec<String>,
/// A HashMap that maps already interned Strings to their StringID. This allows for
/// deduplication since the same string won't be stored twice
sids: HashMap<String, Sid>,
}
impl StringStore {
/// Create a new empty StringStore
pub fn new() -> Self {
Self { strings: Vec::new(), sids: HashMap::new() }
}
/// Put the given string into the StringStore and get a StringID in return. If the string is
/// not yet stored, it will be after this.
///
/// Note: The generated StringIDs are only valid for the StringStore that created them. Using
/// the IDs with another StringStore is a logic error (not undefined behavior in the Rust sense):
/// it might return wrong Strings or None.
pub fn intern_or_lookup(&mut self, text: &str) -> Sid {
self.sids.get(text).copied().unwrap_or_else(|| {
let sid = Sid(self.strings.len());
self.strings.push(text.to_string());
self.sids.insert(text.to_string(), sid);
sid
})
}
/// Lookup and retrieve a string by the StringID. If the String is not found, None is returned.
///
/// Note: The generated StringIDs are only valid for the StringStore that created them. Using
/// the IDs with another StringStore is a logic error (not undefined behavior in the Rust sense):
/// it might return wrong Strings or None.
pub fn lookup(&self, sid: Sid) -> Option<&String> {
self.strings.get(sid.0)
}
}
#[cfg(test)]
mod tests {
use super::StringStore;
#[test]
fn test_stringstore_intern_lookup() {
let mut ss = StringStore::new();
let s1 = "Hello";
let s2 = "World";
let id1 = ss.intern_or_lookup(s1);
assert_eq!(ss.lookup(id1).unwrap().as_str(), s1);
let id2 = ss.intern_or_lookup(s2);
assert_eq!(ss.lookup(id2).unwrap().as_str(), s2);
assert_eq!(ss.lookup(id1).unwrap().as_str(), s1);
}
#[test]
fn test_stringstore_no_duplicates() {
let mut ss = StringStore::new();
let s1 = "Hello";
let s2 = "World";
let id1_1 = ss.intern_or_lookup(s1);
assert_eq!(ss.lookup(id1_1).unwrap().as_str(), s1);
let id1_2 = ss.intern_or_lookup(s1);
assert_eq!(ss.lookup(id1_2).unwrap().as_str(), s1);
// Check that both calls returned the same StringID
assert_eq!(id1_1, id1_2);
// Check that only one string is actually stored
assert_eq!(ss.strings.len(), 1);
assert_eq!(ss.sids.len(), 1);
let id2_1 = ss.intern_or_lookup(s2);
assert_eq!(ss.lookup(id2_1).unwrap().as_str(), s2);
let id2_2 = ss.intern_or_lookup(s2);
assert_eq!(ss.lookup(id2_2).unwrap().as_str(), s2);
// Check that both calls returned the same StringID
assert_eq!(id2_1, id2_2);
assert_eq!(ss.strings.len(), 2);
assert_eq!(ss.sids.len(), 2);
}
}


@ -1,379 +1,147 @@
use crate::{ use crate::ast::BinOpType;
ast::{BinOpType, UnOpType},
T,
};
/// Language keywords
#[derive(Debug, PartialEq, Eq)]
pub enum Keyword {
/// Loop keyword ("loop")
Loop,
/// Print keyword ("print")
Print,
/// If keyword ("if")
If,
/// Else keyword ("else")
Else,
/// Function declaration keyword ("fun")
Fun,
/// Return keyword ("return")
Return,
/// Break keyword ("break")
Break,
/// Continue keyword ("continue")
Continue,
}
/// Literal values
#[derive(Debug, PartialEq, Eq)] #[derive(Debug, PartialEq, Eq)]
pub enum Literal { pub enum Literal {
/// Integer literal (64-bit) /// Integer literal (64-bit)
I64(i64), I64(i64),
/// String literal
String(String), /// String literal ("Some string")
Str(String),
} }
/// Combined tokens that consist of a combination of characters
#[derive(Debug, PartialEq, Eq)] #[derive(Debug, PartialEq, Eq)]
pub enum Combo { pub enum Keyword {
/// Equal Equal ("==") /// Let identifier (let)
Equal2, Let,
/// Exclamation mark Equal ("!=") /// While (while)
ExclamationMarkEqual, While,
/// Ampersand Ampersand ("&&") /// For (for)
Ampersand2, For,
/// Pipe Pipe ("||") /// If (if)
Pipe2, If,
/// LessThan LessThan ("<<") /// Else (else)
LessThan2, Else,
/// GreaterThan GreaterThan (">>")
GreaterThan2,
/// LessThan Equal ("<=")
LessThanEqual,
/// GreaterThan Equal (">=")
GreaterThanEqual,
/// LessThan Minus ("<-")
LessThanMinus,
} }
/// Tokens are a group of one or more sourcecode characters that have a meaning together
#[derive(Debug, PartialEq, Eq)] #[derive(Debug, PartialEq, Eq)]
pub enum Token { pub enum Token {
/// Literal value token /// Literal values
Literal(Literal), Literal(Literal),
/// Keyword token /// Identifier (variable / function / ... name)
Keyword(Keyword),
/// Identifier token (names for variables, functions, ...)
Ident(String), Ident(String),
/// Combined tokens consisting of multiple characters /// Specific identifiers that have a special meaning as keywords
Combo(Combo), Keyword(Keyword),
/// Comma (",") /// Left parenthesis ('(')
Comma,
/// Equal Sign ("=")
Equal,
/// Semicolon (";")
Semicolon,
/// End of file (This is not generated by the lexer, but the parser uses this to find the
/// end of the token stream)
EoF,
/// Left Bracket ("[")
LBracket,
/// Right Bracket ("]")
RBracket,
/// Left Parenthesis ("(")
LParen, LParen,
/// Right Parenthesis (")") /// Right parenthesis (')')
RParen, RParen,
/// Left curly braces ("{") /// Left brace ({)
LBraces, LBrace,
/// Right curly braces ("}") /// Right brace (})
RBraces, RBrace,
/// Plus ("+") /// Dollar sign ($)
Plus, Dollar,
/// Minus ("-") /// Double Dollar sign ($$)
Minus, DoubleDollar,
/// Asterisk ("*") /// Assignment (single equal) (=)
Asterisk, Assign,
/// Slash ("/") /// Plus (+)
Slash, Add,
/// Percent ("%") /// Minus (-)
Percent, Sub,
/// Pipe ("|") /// Asterisk (*)
Pipe, Mul,
/// Tilde ("~") /// Slash (/)
Tilde, Div,
/// Logical not ("!") /// Percent (%)
Exclamationmark, Mod,
/// Left angle bracket ("<") /// Pipe (|)
LessThan, BOr,
/// Right angle bracket (">") /// Ampersand (&)
GreaterThan, BAnd,
/// Ampersand ("&") /// Circumflex (^)
Ampersand, BXor,
/// Circumflex ("^") /// Shift Left (<<)
Circumflex, Shl,
/// Shift Right (>>)
Shr,
/// Equal sign (==)
Equ,
/// Not Equal sign (!=)
Neq,
/// Greater than (>)
Gt,
/// Greater or equal (>=)
Ge,
/// Less than (<)
Lt,
/// Less or equal (<=)
Le,
/// Semicolon (;)
Semicolon,
/// End of file
EoF,
} }
impl Token { impl Token {
/// If the Token can be used as a binary operation type, get the matching BinOpType. Otherwise
/// return None.
pub fn try_to_binop(&self) -> Option<BinOpType> { pub fn try_to_binop(&self) -> Option<BinOpType> {
Some(match self { Some(match self {
T![+] => BinOpType::Add, Token::Add => BinOpType::Add,
T![-] => BinOpType::Sub, Token::Sub => BinOpType::Sub,
T![*] => BinOpType::Mul, Token::Mul => BinOpType::Mul,
T![/] => BinOpType::Div, Token::Div => BinOpType::Div,
T![%] => BinOpType::Mod, Token::Mod => BinOpType::Mod,
T![&] => BinOpType::BAnd, Token::BAnd => BinOpType::BAnd,
T![|] => BinOpType::BOr, Token::BOr => BinOpType::BOr,
T![^] => BinOpType::BXor, Token::BXor => BinOpType::BXor,
T![&&] => BinOpType::LAnd, Token::Shl => BinOpType::Shl,
T![||] => BinOpType::LOr, Token::Shr => BinOpType::Shr,
T![<<] => BinOpType::Shl, Token::Equ => BinOpType::Equ,
T![>>] => BinOpType::Shr, Token::Neq => BinOpType::Neq,
T![==] => BinOpType::EquEqu, Token::Gt => BinOpType::Gt,
T![!=] => BinOpType::NotEqu, Token::Ge => BinOpType::Ge,
Token::Lt => BinOpType::Lt,
Token::Le => BinOpType::Le,
T![<] => BinOpType::Less, Token::Assign => BinOpType::Assign,
T![<=] => BinOpType::LessEqu,
T![>] => BinOpType::Greater,
T![>=] => BinOpType::GreaterEqu,
T![=] => BinOpType::Assign,
_ => return None,
})
}
/// If the token can be used as a unary operation type, get the matching UnOpType. Otherwise
/// return None
pub fn try_to_unop(&self) -> Option<UnOpType> {
Some(match self {
T![-] => UnOpType::Negate,
T![!] => UnOpType::LNot,
T![~] => UnOpType::BNot,
_ => return None, _ => return None,
}) })
} }
} }
/// Macro to quickly create a token of the specified kind. As this is implemented as a macro, it
/// can be used anywhere including in patterns.
///
/// An implementation should exist for each token, so that there is no need to ever write out the
/// long token definitions.
#[macro_export]
macro_rules! T {
// Keywords
[loop] => {
crate::token::Token::Keyword(crate::token::Keyword::Loop)
};
[print] => {
crate::token::Token::Keyword(crate::token::Keyword::Print)
};
[if] => {
crate::token::Token::Keyword(crate::token::Keyword::If)
};
[else] => {
crate::token::Token::Keyword(crate::token::Keyword::Else)
};
[fun] => {
crate::token::Token::Keyword(crate::token::Keyword::Fun)
};
[return] => {
crate::token::Token::Keyword(crate::token::Keyword::Return)
};
[break] => {
crate::token::Token::Keyword(crate::token::Keyword::Break)
};
[continue] => {
crate::token::Token::Keyword(crate::token::Keyword::Continue)
};
// Literals
[i64($($val:tt)*)] => {
crate::token::Token::Literal(crate::token::Literal::I64($($val)*))
};
[str($($val:tt)*)] => {
crate::token::Token::Literal(crate::token::Literal::String($($val)*))
};
// Ident
[ident($($val:tt)*)] => {
crate::token::Token::Ident($($val)*)
};
// Combo Tokens
[==] => {
crate::token::Token::Combo(crate::token::Combo::Equal2)
};
[!=] => {
crate::token::Token::Combo(crate::token::Combo::ExclamationMarkEqual)
};
[&&] => {
crate::token::Token::Combo(crate::token::Combo::Ampersand2)
};
[||] => {
crate::token::Token::Combo(crate::token::Combo::Pipe2)
};
[<<] => {
crate::token::Token::Combo(crate::token::Combo::LessThan2)
};
[>>] => {
crate::token::Token::Combo(crate::token::Combo::GreaterThan2)
};
[<=] => {
crate::token::Token::Combo(crate::token::Combo::LessThanEqual)
};
[>=] => {
crate::token::Token::Combo(crate::token::Combo::GreaterThanEqual)
};
[<-] => {
crate::token::Token::Combo(crate::token::Combo::LessThanMinus)
};
// Normal Tokens
[,] => {
crate::token::Token::Comma
};
[=] => {
crate::token::Token::Equal
};
[;] => {
crate::token::Token::Semicolon
};
[EoF] => {
crate::token::Token::EoF
};
['['] => {
crate::token::Token::LBracket
};
[']'] => {
crate::token::Token::RBracket
};
['('] => {
crate::token::Token::LParen
};
[')'] => {
crate::token::Token::RParen
};
['{'] => {
crate::token::Token::LBraces
};
['}'] => {
crate::token::Token::RBraces
};
[+] => {
crate::token::Token::Plus
};
[-] => {
crate::token::Token::Minus
};
[*] => {
crate::token::Token::Asterisk
};
[/] => {
crate::token::Token::Slash
};
[%] => {
crate::token::Token::Percent
};
[|] => {
crate::token::Token::Pipe
};
[~] => {
crate::token::Token::Tilde
};
[!] => {
crate::token::Token::Exclamationmark
};
[<] => {
crate::token::Token::LessThan
};
[>] => {
crate::token::Token::GreaterThan
};
[&] => {
crate::token::Token::Ampersand
};
[^] => {
crate::token::Token::Circumflex
};
}
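
The doc comment on `T!` notes that the macro can be used anywhere, including in patterns. A reduced sketch of that idea follows. The three-variant `Token` enum and the `describe` function are hypothetical and exist only for this example; the macro arms follow the same shape as the ones above, without the `crate::token::` paths.

```rust
// Hypothetical, reduced token type used only for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Token {
    Plus,
    Minus,
    I64(i64),
}

// Same shape as the T! macro above: each arm expands to a path or a
// tuple-variant form, which is valid both as an expression and as a pattern.
macro_rules! T {
    [+] => { Token::Plus };
    [-] => { Token::Minus };
    [i64($($val:tt)*)] => { Token::I64($($val)*) };
}

/// Uses the macro in pattern position inside a match.
fn describe(tok: &Token) -> &'static str {
    match tok {
        T![+] => "plus",
        T![-] => "minus",
        T![i64(_)] => "integer",
    }
}
```

In expression position, `T![i64(5)]` builds `Token::I64(5)`. In pattern position, `T![i64(_)]` matches any integer token, which is how the parser above can write arms such as `T![ident(name)] => ...`.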


@ -1,167 +0,0 @@
/// Exit the program with error code 1 and format-print the given text on stderr. This pretty much
/// works like panic, but doesn't show the additional information that panic adds. Those can be
/// interesting for debugging, but don't look that great when building a release executable for an
/// end user.
/// When running tests or running in debug mode, panic is used to ensure that the tests work
/// correctly.
#[macro_export]
macro_rules! nice_panic {
($fmt:expr) => {
{
if cfg!(test) || cfg!(debug_assertions) {
panic!($fmt);
} else {
eprintln!($fmt);
std::process::exit(1);
}
}
};
($fmt:expr, $($arg:tt)*) => {
{
if cfg!(test) || cfg!(debug_assertions) {
panic!($fmt, $($arg)*);
} else {
eprintln!($fmt, $($arg)*);
std::process::exit(1);
}
}
};
}
/// The PutBackIter allows for items to be put back and to be peeked. Putting an item back
/// will cause it to be the next item returned by `next`. Peeking an item will get a reference to
/// the next item in the iterator without removing it.
///
/// The whole PutBackIter behaves analogous to `std::iter::Peekable` with the addition of the
/// `putback` function. This is slightly slower than `Peekable`, but allows for an unlimited number
/// of putbacks and therefore an unlimited look-ahead range.
pub struct PutBackIter<T: Iterator> {
iter: T,
putback_stack: Vec<T::Item>,
}
impl<T> PutBackIter<T>
where
T: Iterator,
{
/// Make the given iterator putbackable, wrapping it in the PutBackIter type. This effectively
/// adds the `peek` and `putback` functions.
pub fn new(iter: T) -> Self {
Self {
iter,
putback_stack: Vec::new(),
}
}
/// Put the given item back into the iterator. This causes the put-back items to be returned by
/// next in last-in-first-out order (aka. stack order). Only after all previously put-back items
/// have been returned, the actual underlying iterator is used to get items.
/// The number of items that can be put back is unlimited.
pub fn putback(&mut self, it: T::Item) {
self.putback_stack.push(it);
}
/// Peek the next item, getting a reference to it without removing it from the iterator. This
/// also includes items that were previously put back and not yet removed.
pub fn peek(&mut self) -> Option<&T::Item> {
if self.putback_stack.is_empty() {
let it = self.next()?;
self.putback(it);
}
self.putback_stack.last()
}
}
impl<T> Iterator for PutBackIter<T>
where
T: Iterator,
{
type Item = T::Item;
fn next(&mut self) -> Option<Self::Item> {
match self.putback_stack.pop() {
Some(it) => Some(it),
None => self.iter.next(),
}
}
}
pub trait PutBackableExt {
/// Make the iterator putbackable, wrapping it in the PutBackIter type. This effectively
/// adds the `peek` and `putback` functions.
fn putbackable(self) -> PutBackIter<Self>
where
Self: Iterator + Sized,
{
PutBackIter::new(self)
}
}
impl<T: Iterator> PutBackableExt for T {}
#[cfg(test)]
mod tests {
use super::PutBackableExt;
#[test]
fn putback_iter_next() {
let mut iter = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut pb_iter = iter.clone().putbackable();
// Check if next works
for _ in 0..iter.len() {
assert_eq!(pb_iter.next(), iter.next());
}
}
#[test]
fn putback_iter_peek() {
let mut iter_orig = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut iter = iter_orig.clone();
let mut pb_iter = iter.clone().putbackable();
for _ in 0..iter.len() {
// Check if peek gives a preview of the actual next element
assert_eq!(pb_iter.peek(), iter.next().as_ref());
// Check if next still returns the next (just peeked) element and not the one after
assert_eq!(pb_iter.next(), iter_orig.next());
}
}
#[test]
fn putback_iter_putback() {
let mut iter_orig = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut iter = iter_orig.clone();
let mut pb_iter = iter.clone().putbackable();
// Get the first 5 items with next and check if they match
let it0 = pb_iter.next();
assert_eq!(it0, iter.next());
let it1 = pb_iter.next();
assert_eq!(it1, iter.next());
let it2 = pb_iter.next();
assert_eq!(it2, iter.next());
let it3 = pb_iter.next();
assert_eq!(it3, iter.next());
let it4 = pb_iter.next();
assert_eq!(it4, iter.next());
// Put one value back and check if `next` works as expected, returning the just put back
// item
pb_iter.putback(it0.unwrap());
assert_eq!(pb_iter.next(), it0);
// Put all values back
pb_iter.putback(it4.unwrap());
pb_iter.putback(it3.unwrap());
pb_iter.putback(it2.unwrap());
pb_iter.putback(it1.unwrap());
pb_iter.putback(it0.unwrap());
// After all values have been put back, the iter should match the original again
for _ in 0..iter.len() {
assert_eq!(pb_iter.next(), iter_orig.next());
}
}
}
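
The doc comment on `PutBackIter` mentions an unlimited look-ahead range, which the tests above do not exercise directly. The sketch below shows one way to build an n-item look-ahead on top of `putback`. It re-declares a trimmed copy of `PutBackIter` so that it stands alone, and the `lookahead` helper is hypothetical, not part of the repository.

```rust
// Trimmed copy of the PutBackIter above, repeated so the sketch is self-contained.
struct PutBackIter<I: Iterator> {
    iter: I,
    putback_stack: Vec<I::Item>,
}

impl<I: Iterator> PutBackIter<I> {
    fn new(iter: I) -> Self {
        Self { iter, putback_stack: Vec::new() }
    }

    /// Put an item back; it becomes the next item returned by `next`.
    fn putback(&mut self, item: I::Item) {
        self.putback_stack.push(item);
    }
}

impl<I: Iterator> Iterator for PutBackIter<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Put-back items take priority over the underlying iterator.
        self.putback_stack.pop().or_else(|| self.iter.next())
    }
}

/// Hypothetical helper: look at up to `n` upcoming items without consuming them.
fn lookahead<I>(iter: &mut PutBackIter<I>, n: usize) -> Vec<I::Item>
where
    I: Iterator,
    I::Item: Clone,
{
    // Take the items out of the iterator...
    let items: Vec<I::Item> = iter.by_ref().take(n).collect();
    // ...and put clones back in reverse, so the first item ends up on top of the stack.
    for item in items.iter().rev().cloned() {
        iter.putback(item);
    }
    items
}
```

Because the put-back stack is last-in-first-out, the items have to be pushed in reverse order to restore the original sequence, which is the same order the `putback_iter_putback` test uses.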

src/vm.rs

@ -0,0 +1,208 @@
use crate::{bytecode::op::*, interpreter::Value};
#[derive(Debug, Default)]
pub struct Vm {
prog: Vec<u32>,
ip: usize,
stack: Vec<Value>,
/// Despite the name, this isn't actually a heap. It behaves more like a second stack that is indexed by address
heap: Vec<Value>,
}
macro_rules! binop_stack {
($self:ident, $op:tt) => {
{
let rhs = $self.stack.pop().unwrap();
let lhs = $self.stack.last_mut().unwrap();
match (lhs, rhs) {
(Value::I64(lhs), Value::I64(rhs)) => *lhs = *lhs $op rhs,
_ => panic!("Invalid data for binary operation"),
}
}
};
}
impl Vm {
pub fn new(prog: Vec<u32>) -> Self {
Self {
prog,
..Default::default()
}
}
pub fn run(&mut self) {
while let Some(op) = self.prog.get(self.ip).copied() {
self.ip += 1;
match op & 0xff {
PUSH => {
let val = self.read_i64();
self.stack.push(Value::I64(val));
}
POP => {
self.stack.pop();
}
LOAD => {
// let addr = self.read_i64() as usize;
let addr = (op >> 8) as usize;
if let Some(val) = self.heap.get(addr) {
self.stack.push(val.clone());
} else {
panic!("Trying to load from uninitialized heap");
}
}
STORE => {
let val = self
.stack
.pop()
.expect("Trying to pop value from stack for storing");
// let addr = self.read_i64() as usize;
let addr = (op >> 8) as usize;
if self.heap.len() == addr {
self.heap.push(val);
} else {
self.heap[addr] = val;
}
}
PRINT => {
let val = self
.stack
.pop()
.expect("Trying to pop value from stack for printing");
print!("{}", val);
}
DBG_PRINT => {
let val = self
.stack
.pop()
.expect("Trying to pop value from stack for printing");
print!("{:?}", val);
}
ADD => {
binop_stack!(self, +);
// self.stack.push(Value::I64(vals.0 + vals.1))
}
SUB => {
binop_stack!(self, -);
// let vals = self.pop2_i64();
// self.stack.push(Value::I64(vals.0 - vals.1))
}
MUL => {
binop_stack!(self, *);
// let vals = self.pop2_i64();
// self.stack.push(Value::I64(vals.0 * vals.1))
}
DIV => {
binop_stack!(self, /);
// let vals = self.pop2_i64();
// self.stack.push(Value::I64(vals.0 / vals.1))
}
MOD => {
binop_stack!(self, %);
// let vals = self.pop2_i64();
// self.stack.push(Value::I64(vals.0 % vals.1))
}
EQ => {
let vals = self.pop2_i64();
self.stack
.push(Value::I64(if vals.0 == vals.1 { 1 } else { 0 }))
}
NEQ => {
let vals = self.pop2_i64();
self.stack
.push(Value::I64(if vals.0 != vals.1 { 1 } else { 0 }))
}
GT => {
let vals = self.pop2_i64();
self.stack
.push(Value::I64(if vals.0 > vals.1 { 1 } else { 0 }))
}
GE => {
let vals = self.pop2_i64();
self.stack
.push(Value::I64(if vals.0 >= vals.1 { 1 } else { 0 }))
}
LT => {
let vals = self.pop2_i64();
self.stack
.push(Value::I64(if vals.0 < vals.1 { 1 } else { 0 }))
}
LE => {
let vals = self.pop2_i64();
self.stack
.push(Value::I64(if vals.0 <= vals.1 { 1 } else { 0 }))
}
BOR => {
let vals = self.pop2_i64();
self.stack.push(Value::I64(vals.0 | vals.1))
}
BAND => {
let vals = self.pop2_i64();
self.stack.push(Value::I64(vals.0 & vals.1))
}
BXOR => {
let vals = self.pop2_i64();
self.stack.push(Value::I64(vals.0 ^ vals.1))
}
SHL => {
let vals = self.pop2_i64();
self.stack.push(Value::I64(vals.0 << vals.1))
}
SHR => {
let vals = self.pop2_i64();
self.stack.push(Value::I64(vals.0 >> vals.1))
}
JUMP => {
self.ip = self.read_i64() as usize;
}
JUMP_TRUE => {
let jmp_target = self.read_i64() as usize;
if !matches!(self.stack.pop(), Some(Value::I64(0))) {
self.ip = jmp_target;
}
}
JUMP_FALSE => {
let jmp_target = self.read_i64() as usize;
if matches!(self.stack.pop(), Some(Value::I64(0))) {
self.ip = jmp_target;
}
}
_ => panic!("Invalid opcode")
}
}
}
fn pop2_i64(&mut self) -> (i64, i64) {
let rhs = self.stack.pop();
let lhs = self.stack.pop();
match (lhs, rhs) {
(Some(Value::I64(lhs)), Some(Value::I64(rhs))) => (lhs, rhs),
_ => panic!("Invalid data for binary operation"),
}
}
fn read_i64(&mut self) -> i64 {
let mut val = if let Some(val) = self.prog.get(self.ip).copied() {
val
} else {
panic!("Expected Value as next OP")
} as i64;
self.ip += 1;
val |= (if let Some(val) = self.prog.get(self.ip).copied() {
val
} else {
panic!("Expected Value as next OP")
} as i64)
<< 32;
self.ip += 1;
val
}
}
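
The VM above implies two encodings: `read_i64` assembles an `i64` from two consecutive `u32` words (low word first), and `LOAD`/`STORE` keep the opcode in the low 8 bits of a word with the address in the upper 24 bits. The sketch below shows matching encoders and decoders under those assumptions. All four function names are hypothetical, and the actual bytecode compiler is not part of this section, so it may emit the words differently.

```rust
/// Split an i64 into two u32 words, low word first, matching the order in
/// which `read_i64` above reads them. (Hypothetical helper.)
fn encode_i64(val: i64) -> [u32; 2] {
    let bits = val as u64;
    [(bits & 0xffff_ffff) as u32, (bits >> 32) as u32]
}

/// Reassemble the i64 the same way `read_i64` does: the low word is
/// zero-extended, the high word is shifted into the upper 32 bits.
fn decode_i64(words: [u32; 2]) -> i64 {
    (words[0] as i64) | ((words[1] as i64) << 32)
}

/// Pack an opcode (low 8 bits) and an address (upper 24 bits) into one word,
/// as `LOAD` and `STORE` expect. (Hypothetical helper.)
fn encode_op_addr(opcode: u32, addr: u32) -> u32 {
    assert!(opcode <= 0xff, "opcode must fit into 8 bits");
    assert!(addr < (1 << 24), "address must fit into 24 bits");
    opcode | (addr << 8)
}

/// Inverse of `encode_op_addr`: `op & 0xff` and `op >> 8`, as in `Vm::run`.
fn decode_op_addr(word: u32) -> (u32, u32) {
    (word & 0xff, word >> 8)
}
```

One consequence of this layout is that `LOAD` and `STORE` can address at most 2^24 slots, while `PUSH` and the jump instructions spend two extra words on a full 64-bit operand.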