Compare commits

43 Commits

Author SHA1 Message Date
80d9b36901 Add test for stringstore 2022-02-11 19:08:08 +01:00
70c9d073f9 Add a few more comments 2022-02-11 18:34:46 +01:00
5f720ad7c3 Update README 2022-02-11 16:07:12 +01:00
748bd10dd9 Fix runtime error msg 2022-02-11 16:05:05 +01:00
1ade6cae50 Mention vsc extension 2022-02-11 13:31:48 +01:00
3e4ed82dc4 Update README 2022-02-11 13:04:45 +01:00
e5edc6b2ba Fix UB non top-level functions 2022-02-11 13:00:41 +01:00
f4286db21d Remove anyhow dependency 2022-02-11 12:36:36 +01:00
3892ea46e0 Update examples 2022-02-11 01:19:45 +01:00
8b7ed96e15 Update nice_panic macro 2022-02-11 01:19:34 +01:00
67b07dfd72 Fix typo 2022-02-11 01:01:31 +01:00
6c0867143b Add toc to README 2022-02-11 00:12:36 +01:00
abefe32300 Update README 2022-02-10 23:02:40 +01:00
742d6706b0 Array values are now pass-by-reference 2022-02-10 21:27:05 +01:00
3806a61756 Allow endless loops with no condition 2022-02-10 20:36:26 +01:00
2880ba81ab Implement break & continue
- Fix return propagation inside loops
2022-02-10 13:13:15 +01:00
4e92a416ed Improve CLI
- Remove unused flags
- Show more helpful error messages
2022-02-10 12:58:09 +01:00
c1bee69fa6 Simplify general program tests 2022-02-10 12:24:20 +01:00
f2331d7de9 Add general test for functions as example 2022-02-10 12:19:01 +01:00
c4d2f89d35 Fix function args 2022-02-10 12:13:30 +01:00
ab059ce18c Add recursive fibonacci as test 2022-02-10 01:32:07 +01:00
aeedfb4ef2 Implement functions
- Implement function declaration and call
- Change the precalculated variable stack positions to contain the
  offset from the end instead of the absolute position. This is
  important for passing fun args on the stack
- Add the ability to offset the stackframes. This is used to delete the
  stack where the fun args have been stored before the block executes
- Implement exit type for blocks in interpreter. This is used to get the
  return values and propagate them where needed
- Add recursive fibonacci examples
2022-02-10 01:26:11 +01:00
f0c2bd8dde Remove panics from interpreter, worse performance
- Replaced the interpreters panics with actual errors and results
- Added a few extra checks for arrays and div-by-zero
- These changes significantly reduced runtime performance, even without
  the extra checks
2022-02-09 18:18:21 +01:00
421fbbc873 Update euler5 example 2022-02-09 17:12:47 +01:00
383da4ae05 Rewrite declaration as statement instead of binop
- Declarations are now separate statements
- Generate unknown var errors when vars are not declared
- Replace Peekable by new custom PutBackIter type that allows for
  unlimited putback and therefore look-ahead
2022-02-09 16:54:06 +01:00
7ea5f67f9c Cleaner unop parsing 2022-02-09 14:23:24 +01:00
235eb460dc Replace panics with errors in parser 2022-02-09 13:49:14 +01:00
2312deec5b Small refactoring for parser 2022-02-09 01:13:22 +01:00
948d41fb45 Update lexer tests 2022-02-09 00:20:56 +01:00
fdef796440 Update token macros 2022-02-08 23:26:23 +01:00
926bdeb2dc Refactor lexer match loop 2022-02-08 22:54:41 +01:00
726dd62794 Big token refactoring
- Extract keywords, literals and combo tokens into separate sub-enums
- Add a macro for quickly generating all tokens including the sub-enum
  tokens. This also takes less chars to write
2022-02-08 18:56:17 +01:00
c723b1c2cb Rename var in parser 2022-02-06 15:31:41 +01:00
e7b67d85a9 Add game of life example 2022-02-05 11:53:01 +01:00
cf2e5348bb Implement arrays 2022-02-04 18:48:45 +01:00
8b67c4d59c Implement block scopes (code inside braces)
- Putting code in between braces will create a new scope
2022-02-04 17:30:23 +01:00
cbf31fa513 Implement simple AST optimizer
- Precalculate operations only containing literals
2022-02-04 17:06:38 +01:00
56665af233 Update examples 2022-02-04 14:25:25 +01:00
22634af554 Precalculate stack positions for variables
- Parser calculates positions for the variables
- This removes the lookup time during runtime
- Consistent high performance
2022-02-04 14:25:25 +01:00
d4c6f3d5dc Implement string interning 2022-02-04 14:25:23 +01:00
4dbc3adfd5 Refactor Ast to ScopedBlock 2022-02-04 14:24:03 +01:00
cbea567d65 Implement vec based scopes
- Replaced vartable hashmap with vec
- Use linear search in reverse to find the variables by name
- This is really fast with a small number of variables but tanks fast
  with more vars due to O(n) lookup times
- Implemented scopes by dropping all elements from the vartable at the
  end of a scope
2022-02-04 14:24:00 +01:00
e4977da546 Use euler examples as tests 2022-02-04 12:45:34 +01:00
23 changed files with 2483 additions and 607 deletions
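
The commit `383da4ae05` mentions replacing `Peekable` with a custom `PutBackIter` that allows unlimited putback. The actual implementation is not part of this excerpt, so the following is only a minimal sketch of how such an adapter could look. The field names and the `putback` method name are assumptions.

```rust
/// Hypothetical sketch of an iterator adapter with unlimited put-back.
/// The real `PutBackIter` in the repository may be structured differently.
pub struct PutBackIter<I: Iterator> {
    /// The wrapped iterator (for example the token stream of the lexer)
    iter: I,
    /// Items that were put back. The last element is returned first
    stack: Vec<I::Item>,
}

impl<I: Iterator> PutBackIter<I> {
    pub fn new(iter: I) -> Self {
        Self {
            iter,
            stack: Vec::new(),
        }
    }

    /// Put an item back so that the next call to `next` returns it.
    /// This can be called any number of times, which allows arbitrary look-ahead.
    pub fn putback(&mut self, item: I::Item) {
        self.stack.push(item);
    }
}

impl<I: Iterator> Iterator for PutBackIter<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Prefer items that were put back, then fall back to the inner iterator
        self.stack.pop().or_else(|| self.iter.next())
    }
}
```

To look ahead by two tokens, a parser could call `next` twice and then put both items back in reverse order.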


@ -4,5 +4,4 @@ version = "0.1.0"
edition = "2021" edition = "2021"
[dependencies] [dependencies]
anyhow = "1.0.53"
thiserror = "1.0.30" thiserror = "1.0.30"

280
README.md

@ -1,13 +1,49 @@
# NEK-Lang # NEK-Lang
## Table of contents
- [NEK-Lang](#nek-lang)
- [Table of contents](#table-of-contents)
- [Variables](#variables)
- [Declaration](#declaration)
- [Assignment](#assignment)
- [Datatypes](#datatypes)
- [I64](#i64)
- [String](#string)
- [Array](#array)
- [Expressions](#expressions)
- [General](#general)
- [Mathematical Operators](#mathematical-operators)
- [Bitwise Operators](#bitwise-operators)
- [Logical Operators](#logical-operators)
- [Equality & Relational Operators](#equality--relational-operators)
- [Control-Flow](#control-flow)
- [Loop](#loop)
- [If / Else](#if--else)
- [Block Scopes](#block-scopes)
- [Functions](#functions)
- [Function definition](#function-definition)
- [Function calls](#function-calls)
- [IO](#io)
- [Print](#print)
- [Comments](#comments)
- [Line comments](#line-comments)
- [Feature Tracker](#feature-tracker)
- [High level Components](#high-level-components)
- [Language features](#language-features)
- [Parsing Grammar](#parsing-grammar)
- [Expressions](#expressions-1)
- [Statements](#statements)
- [Examples](#examples)
- [Extras](#extras)
- [Visual Studio Code Language Support](#visual-studio-code-language-support)
## Variables ## Variables
Currently all variables are global and completely unscoped. That means no matter where a variable is declared, it remains over the whole remaining runtime of the program. The variables are all contained in scopes. Variables defined in an outer scope can be accessed in
inner scopes. All variables defined in a scope that has ended no longer exist and can't be
All variables are currently of type `i64` (64-bit signed integer) accessed.
### Declaration ### Declaration
- Declare and initialize a new variable - Declare and initialize a new variable
- Declaring a previously declared variable again is currently equivalent to an assignment - Declaring a previously declared variable again will shadow the previous variable
- Declaration is needed before assignment or other usage - Declaration is needed before assignment or other usage
- The variable name is on the left side of the `<-` operator - The variable name is on the left side of the `<-` operator
- The assigned value is on the right side and can be any expression - The assigned value is on the right side and can be any expression
@ -25,6 +61,62 @@ a = 123;
``` ```
The value `123` is assigned to the variable named `a`. `a` needs to be declared before this. The value `123` is assigned to the variable named `a`. `a` needs to be declared before this.
## Datatypes
The available variable datatypes are `i64` (64-bit signed integer), `string` (`"this is a string"`) and `array` (`[10]`)
### I64
- The normal default datatype is `i64` which is a 64-bit signed integer
- Can be created by just writing an integer literal like `546`
- Inside the number literal `_` can be inserted for visual separation `100_000`
- The i64 values can be used as expected in calculations, conditions and so on
```
my_i64 <- 123_456;
```
### String
- Strings mainly exist for formatting the text output of a program
- Strings can be created by using doublequotes like in other languages `"Hello world"`
- There is no way to access or change the characters of the string
- Unicode characters are supported `"Hello 🌎"`
- Escape characters `\n`, `\r`, `\t`, `\"`, `\\` are supported
- Strings can be assigned to variables, just like i64
```
world <- "🌎";
print "Hello ";
print world;
print "\n";
```
### Array
- Arrays can contain any other datatypes and don't need to have the same type in all cells
- Arrays can be created by using brackets with the size in between `[size]`
- Arrays must be assigned to a variable in order to be used
- All cells will be initialized with i64 0 values
- The size can be any expression that results in a positive i64 value
- The array size can't be changed after creation
- The array's data is always allocated on the heap
- The array cells can be accessed by using the variable name and specifying the index in brackets
`my_arr[index]`
- The index can be any expression that results in a non-negative i64 value in the range of the array's
indices
- The indices start with 0
- When an array is passed to a function, it is passed by reference
```
width <- 5;
height <- 5;
// Initialize array of size 25, initialized with 25x 0
my_array <- [width * height];
// Modify first value
my_array[0] = 5;
// Print first value
// Outputs `5`
print my_array[0];
```
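
The pass-by-reference behaviour of arrays can be illustrated with a small Rust sketch. It assumes a hypothetical `Value` type where arrays are stored behind a shared, reference-counted handle. The interpreter's real value type is not shown in this diff, so the names here are illustrative only.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical runtime value type. The real interpreter's type may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    I64(i64),
    /// Cloning this variant only clones the handle, not the cells
    Array(Rc<RefCell<Vec<Value>>>),
}

/// Create an array with `size` cells, each initialized with `I64(0)`
pub fn new_array(size: usize) -> Value {
    Value::Array(Rc::new(RefCell::new(vec![Value::I64(0); size])))
}

/// Stand-in for a NEK function that receives an array argument and writes to it
pub fn set_first(arr: Value, val: i64) {
    if let Value::Array(cells) = arr {
        cells.borrow_mut()[0] = Value::I64(val);
    }
}
```

Because the callee only receives a clone of the handle, a write inside `set_first` is visible to the caller afterwards, while an `I64` argument would be copied.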
## Expressions ## Expressions
The operator precedence is the same order as in `C` for all implemented operators. The operator precedence is the same order as in `C` for all implemented operators.
Refer to the Refer to the
@ -54,7 +146,9 @@ Supported mathematical operations:
- "Bit flip" (One's complement) `~a` - "Bit flip" (One's complement) `~a`
### Logical Operators ### Logical Operators
The logical operators evaluate the operands as `false` if they are equal to `0` and `true` if they are not equal to `0` The logical operators evaluate the operands as `false` if they are equal to `0` and `true` if they are not equal to `0`.
Note that logical operators like AND / OR do not support short-circuit evaluation, so both sides of
the logical operation will be evaluated, even if it might not be necessary.
- And `a && b` - And `a && b`
- Or `a || b` - Or `a || b`
- Not `!a` (if `a` is equal to `0`, the result is `1`, otherwise the result is `0`) - Not `!a` (if `a` is equal to `0`, the result is `1`, otherwise the result is `0`)
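
The missing short-circuit evaluation can be sketched in Rust as follows. The function names are hypothetical and the operands are modelled as closures so that side effects become observable. This is not the interpreter's actual code.

```rust
/// NEK truthiness: `0` is false, every other value is true
fn truthy(v: i64) -> bool {
    v != 0
}

/// Logical AND without short-circuiting: both operands are always evaluated
pub fn eval_land(mut lhs: impl FnMut() -> i64, mut rhs: impl FnMut() -> i64) -> i64 {
    let l = lhs();
    // Evaluated even if `l` is already 0
    let r = rhs();
    (truthy(l) && truthy(r)) as i64
}

/// Logical OR without short-circuiting: both operands are always evaluated
pub fn eval_lor(mut lhs: impl FnMut() -> i64, mut rhs: impl FnMut() -> i64) -> i64 {
    let l = lhs();
    // Evaluated even if `l` is already non-zero
    let r = rhs();
    (truthy(l) || truthy(r)) as i64
}

/// Logical NOT: `0` becomes `1`, everything else becomes `0`
pub fn eval_lnot(v: i64) -> i64 {
    (!truthy(v)) as i64
}
```

In practice this means that a function call on the right-hand side of `&&` or `||` always runs, including all of its side effects.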
@ -69,37 +163,53 @@ The equality and relational operations result in `1` if the condition is evaluat
- Less or equal than `a <= b` - Less or equal than `a <= b`
## Control-Flow ## Control-Flow
For conditions like in if or loops, every non zero value is equal to `true`, and `0` is `false`. For conditions like in if or loops, every non-zero value is equal to `true`, and `0` is `false`.
### Loop ### Loop
- There is currently only the `loop` keyword that can act like a `while` with optional advancement (an expression that is executed after the loop body) - The `loop` keyword can be used as an infinite loop, as a while loop or as a while loop with
- The `loop` keyword is followed by the condition (an expression) without needing parentheses advancement (an expression that is executed after each loop)
- If only `loop` is used, directly followed by the body, it is an infinite loop that needs to be
terminated by using the `break` keyword
- The `loop` keyword can be followed by the condition (an expression) without needing parentheses
- *Optional:* If there is a `;` after the condition, there must be another expression which is used as the advancement - *Optional:* If there is a `;` after the condition, there must be another expression which is used as the advancement
- The loop's body is wrapped in braces (`{ }`) just like in C/C++ - The loop's body is wrapped in braces (`{ }`) just like in C/C++
- The `continue` keyword can be used to end the current loop iteration early
- The `break` keyword can be used to fully break out of the current loop
``` ```
// Print the numbers from 0 to 9 // Print the numbers from 0 to 9
// With endless loop
i <- 0;
loop {
if i >= 10 {
break;
}
print i;
i = i + 1;
}
// Without advancement // Without advancement
i <- 0; i <- 0;
loop i < 10 { loop i < 10 {
print i; print i;
i = i - 1; i = i + 1;
} }
// With advancement // With advancement
k <- 0; k <- 0;
loop k < 10; k = k - 1 { loop k < 10; k = k + 1 {
print k; print k;
} }
``` ```
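
The three loop forms can be modelled by one driver with an optional condition and an optional advancement. The following Rust sketch uses a hypothetical `BlockExit` type, loosely modelled on the "exit type for blocks" mentioned in the commit messages. Whether `continue` still runs the advancement in NEK is an assumption here (it does in C-style `for` loops).

```rust
/// How a loop body finished executing (hypothetical type)
#[derive(Debug, PartialEq)]
pub enum BlockExit {
    Normal,
    Break,
    Continue,
}

/// Run a loop over some interpreter state `S`
pub fn run_loop<S>(
    state: &mut S,
    condition: Option<fn(&S) -> bool>,
    advancement: Option<fn(&mut S)>,
    body: fn(&mut S) -> BlockExit,
) {
    loop {
        // A missing condition means an infinite loop that only `break` can end
        if let Some(cond) = condition {
            if !cond(&*state) {
                break;
            }
        }
        match body(&mut *state) {
            BlockExit::Break => break,
            // Assumption: `continue` skips the rest of the body but the
            // advancement below is still executed
            BlockExit::Normal | BlockExit::Continue => {}
        }
        if let Some(adv) = advancement {
            adv(&mut *state);
        }
    }
}
```

With `condition` and `advancement` both set to `None`, this corresponds to the plain `loop { ... }` form that relies on `break`.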
### If / Else ### If / Else
- The language supports `if` and an optional `else` - The language supports `if` and an optional `else`
- After the `if` keyword must be the deciding condition, parentheses are not needed - After the `if` keyword must be the deciding condition, parentheses are not needed
- The block *if-true* block is wrapped in braces (`{ }`) - The blocks are wrapped in braces (`{ }`)
- *Optional:* If there is an `else` after the *if-block*, there must be a following *if-false*, aka. else block - *Optional:* If there is an `else` after the *if-block*, there must be a following *if-false*, aka. else block
- NOTE: Logical operators like AND / OR do not support short-circuit evaluation. So Both sides of
the logical operations will be evaluated, even if it might not be necessary
``` ```
a <- 1; a <- 1;
b <- 2; b <- 2;
@ -112,15 +222,88 @@ if a == b {
} }
``` ```
### Block Scopes
- It is possible to create a limited scope for local variables that will no longer exist once the
scope ends
- Shadowing variables by redefining a variable in an inner scope is supported
```
var_in_outer_scope <- 5;
{
var_in_inner_scope <- 3;
// Inner scope can access both vars
print var_in_outer_scope;
print var_in_inner_scope;
}
// Outer scope is still valid
print var_in_outer_scope;
// !!! THIS DOES NOT WORK !!!
// The inner scope has ended
print var_in_inner_scope;
```
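
Scoping and shadowing as described above can be sketched with a flat vector that is searched from the back, similar to what the commit `cbea567d65` describes. The `VarTable` type and its method names are assumptions for illustration. Note that later commits replaced the name lookup with precalculated stack positions.

```rust
/// Hypothetical variable table: a flat vec that is searched from the back
#[derive(Default)]
pub struct VarTable {
    vars: Vec<(String, i64)>,
}

impl VarTable {
    /// Remember the current length so `end_scope` can drop everything declared after it
    pub fn begin_scope(&self) -> usize {
        self.vars.len()
    }

    /// Drop all variables that were declared since the matching `begin_scope`
    pub fn end_scope(&mut self, mark: usize) {
        self.vars.truncate(mark);
    }

    /// Declare a new variable. An existing variable with the same name is shadowed
    pub fn declare(&mut self, name: &str, value: i64) {
        self.vars.push((name.to_string(), value));
    }

    /// Reverse linear search: the innermost (most recent) declaration wins
    pub fn get(&self, name: &str) -> Option<i64> {
        self.vars
            .iter()
            .rev()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, v)| *v)
    }
}
```

After `end_scope`, a lookup of an inner variable fails, which corresponds to the `!!! THIS DOES NOT WORK !!!` case in the example above.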
## Functions
### Function definition
- Functions can be defined by using the `fun` keyword, followed by the function name and the
parameters in parentheses. After the parentheses, the body is specified inside a braces block
- The function parameters are specified by only their names
- The function body has its own scope
- Parameters are only accessible inside the body
- Variables from the outer scope can be accessed and modified if they are defined before the function
- Variables from the outer scope are shadowed by parameters or local variables with the same name
- The `return` keyword can be used to return a value from the function and exit it immediately
- If no return is specified, a special `void` value is returned. That value can't be used in
calculations or comparisons, but can be stored in a variable (even though it doesn't make sense)
- Functions can only be defined at the top-level. So defining a function inside of any other scoped
block (like inside another function, if, loop, ...) is invalid
- Functions can only be used after definition and there is no forward declaration right now
- However a function can be called recursively inside of itself
- Functions can't be redefined, so defining a function with an existing name is invalid
```
fun add_maybe(a, b) {
if a < 100 {
return a;
} else {
return a + b;
}
}
fun println(val) {
print val;
print "\n";
}
```
### Function calls
- Function calls are primary expressions, so they can be directly used in calculations (if they
return appropriate values)
- Function calls are performed by writing the function name, followed by the arguments in parentheses
- The arguments can be any expressions, separated by commas
```
b <- 100;
result <- add_maybe(250, b);
// Prints 350 + new-line
println(result);
```
## IO ## IO
### Print ### Print
Printing is implemented via the `print` keyword Printing is implemented via the `print` keyword
- The `print` keyword is followed by an expression, the value of which will be printed to the terminal. - The `print` keyword is followed by an expression, the value of which will be printed to the terminal
- Print currently automatically adds a linebreak - To add a line break a string print can be used `print "\n";`
``` ```
a <- 1; a <- 1;
print a; // Outputs `"1\n"` to the terminal // Outputs `1` to the terminal
print a;
// Outputs a new-line to the terminal
print "\n";
``` ```
## Comments ## Comments
@ -140,6 +323,8 @@ Line comments can be initiated by using `//`
- [x] Lexer: Transforms text into Tokens - [x] Lexer: Transforms text into Tokens
- [x] Parser: Transforms Tokens into Abstract Syntax Tree - [x] Parser: Transforms Tokens into Abstract Syntax Tree
- [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements - [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements
- [x] Simple optimizer: Apply trivial optimizations to the Ast
- [x] Precalculate binary ops / unary ops that have only literal operands
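
Precalculating operations with only literal operands is commonly called constant folding. A minimal sketch on a reduced, hypothetical `Expr` type (not the `Expression` enum from `src/ast.rs`) could look like this:

```rust
/// Reduced expression type for illustration only
#[derive(Debug, PartialEq, Clone)]
pub enum Expr {
    I64(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Recursively replace operations on two literals with the resulting literal
pub fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::I64(a), Expr::I64(b)) => Expr::I64(a.wrapping_add(b)),
            // At least one side is not a literal, so keep the operation
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::I64(a), Expr::I64(b)) => Expr::I64(a.wrapping_mul(b)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Literals and variables are already as simple as they can be
        other => other,
    }
}
```

Folding the children first lets nested literal expressions like `(1 + 2) * 4` collapse into a single literal.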
## Language features ## Language features
@ -149,7 +334,7 @@ Line comments can be initiated by using `//`
- [x] Subtraction `a - b` - [x] Subtraction `a - b`
- [x] Multiplication `a * b` - [x] Multiplication `a * b`
- [x] Division `a / b` - [x] Division `a / b`
- [x] Modulo `a % b - [x] Modulo `a % b`
- [x] Negate `-a` - [x] Negate `-a`
- [x] Parentheses `(a + b) * c` - [x] Parentheses `(a + b) * c`
- [x] Logical boolean operators - [x] Logical boolean operators
@ -173,23 +358,43 @@ Line comments can be initiated by using `//`
- [x] Variables - [x] Variables
- [x] Declaration - [x] Declaration
- [x] Assignment - [x] Assignment
- [x] Local variables (for example inside loop, if, else, functions)
- [x] Scoped block for specific local vars `{ ... }`
- [x] Statements with semicolon & Multiline programs - [x] Statements with semicolon & Multiline programs
- [x] Control flow - [x] Control flow
- [x] While loop `while X { ... }` - [x] Loops
- [x] While-style loop `loop X { ... }`
- [x] For-style loop with `X` as condition and `Y` as advancement `loop X; Y { ... }`
- [x] Infinite loop `loop { ... }`
- [x] Break `break`
- [x] Continue `continue`
- [x] If else statement `if X { ... } else { ... }` - [x] If else statement `if X { ... } else { ... }`
- [x] If Statement - [x] If Statement
- [x] Else statement - [x] Else statement
- [x] Line comments `//` - [x] Line comments `//`
- [x] Strings - [x] Strings
- [x] Arrays
- [x] Creating array with size `X` as a variable `arr <- [X]`
- [x] Accessing arrays by index `arr[X]`
- [x] IO Intrinsics - [x] IO Intrinsics
- [x] Print - [x] Print
- [x] Functions
- [x] Function declaration `fun f(X, Y, Z) { ... }`
- [x] Function calls `f(1, 2, 3)`
- [x] Function returns `return X`
- [x] Local variables
- [x] Pass arrays by-reference, i64 by-value, string is a const ref
## Grammar # Parsing Grammar
### Expressions ## Expressions
``` ```
LITERAL = I64_LITERAL | STR_LITERAL ARRAY_LITERAL = "[" expr "]"
expr_primary = LITERAL | IDENT | "(" expr ")" | "-" expr_primary | "~" expr_primary ARRAY_ACCESS = IDENT "[" expr "]"
FUN_CALL = IDENT "(" (expr ",")* expr? ")"
LITERAL = I64_LITERAL | STR_LITERAL | ARRAY_LITERAL
expr_primary = LITERAL | IDENT | FUN_CALL | ARRAY_ACCESS | "(" expr ")" | "-" expr_primary
| "~" expr_primary
expr_mul = expr_primary (("*" | "/" | "%") expr_primary)* expr_mul = expr_primary (("*" | "/" | "%") expr_primary)*
expr_add = expr_mul (("+" | "-") expr_mul)* expr_add = expr_mul (("+" | "-") expr_mul)*
expr_shift = expr_add ((">>" | "<<") expr_add)* expr_shift = expr_add ((">>" | "<<") expr_add)*
@ -203,10 +408,33 @@ expr_lor = expr_land ("||" expr_land)*
expr = expr_lor expr = expr_lor
``` ```
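
Each grammar rule above maps naturally to one function of a recursive descent parser. The following sketch covers only `expr_primary`, `expr_mul` and `expr_add`, and it evaluates directly instead of building an Ast. The `Tok` and `Parser` types are hypothetical and not taken from the repository.

```rust
/// Hypothetical token type covering only what this sketch needs
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    Minus,
    Star,
    Slash,
    LParen,
    RParen,
}

pub struct Parser<'a> {
    toks: &'a [Tok],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(toks: &'a [Tok]) -> Self {
        Self { toks, pos: 0 }
    }

    fn peek(&self) -> Option<Tok> {
        self.toks.get(self.pos).copied()
    }

    fn bump(&mut self) -> Option<Tok> {
        let tok = self.peek();
        self.pos += 1;
        tok
    }

    // expr_primary = LITERAL | "(" expr ")" | "-" expr_primary
    fn primary(&mut self) -> Option<i64> {
        match self.bump()? {
            Tok::Num(n) => Some(n),
            Tok::Minus => Some(-self.primary()?),
            Tok::LParen => {
                let val = self.add()?;
                if self.bump()? == Tok::RParen { Some(val) } else { None }
            }
            _ => None,
        }
    }

    // expr_mul = expr_primary (("*" | "/") expr_primary)*
    fn mul(&mut self) -> Option<i64> {
        let mut lhs = self.primary()?;
        while let Some(op @ (Tok::Star | Tok::Slash)) = self.peek() {
            self.pos += 1;
            let rhs = self.primary()?;
            // `checked_div` turns a division by zero into a parse failure here
            lhs = if op == Tok::Star { lhs * rhs } else { lhs.checked_div(rhs)? };
        }
        Some(lhs)
    }

    // expr_add = expr_mul (("+" | "-") expr_mul)*
    pub fn add(&mut self) -> Option<i64> {
        let mut lhs = self.mul()?;
        while let Some(op @ (Tok::Plus | Tok::Minus)) = self.peek() {
            self.pos += 1;
            let rhs = self.mul()?;
            lhs = if op == Tok::Plus { lhs + rhs } else { lhs - rhs };
        }
        Some(lhs)
    }
}
```

Because `add` calls `mul`, which in turn calls `primary`, the tighter-binding operators are grouped first, which gives the C-like precedence without an explicit precedence table.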
### Statements ## Statements
``` ```
stmt_if = "if" expr "{" stmt* "}" ("else" "{" stmt* "}")? stmt_return = "return" expr ";"
stmt_loop = "loop" expr (";" expr)? "{" stmt* "}" stmt_break = "break" ";"
stmt_continue = "continue" ";"
stmt_var_decl = IDENT "<-" expr ";"
stmt_fun_decl = "fun" IDENT "(" (IDENT ",")* IDENT? ")" "{" stmt* "}"
stmt_expr = expr ";" stmt_expr = expr ";"
stmt = stmt_expr | stmt_loop stmt_block = "{" stmt* "}"
``` stmt_loop = "loop" (expr (";" expr)?)? "{" stmt* "}"
stmt_if = "if" expr "{" stmt* "}" ("else" "{" stmt* "}")?
stmt_print = "print" expr ";"
stmt = stmt_return | stmt_break | stmt_continue | stmt_var_decl | stmt_fun_decl
| stmt_expr | stmt_block | stmt_loop | stmt_if | stmt_print
```
# Examples
There are a bunch of examples in the [examples](examples/) directory. Those include (non-optimal) solutions to the first five project euler problems, as well as a [simple Game of Life implementation](examples/game_of_life.nek).
To run an example via `cargo-run`, use:
```
cargo run --release -- examples/[NAME]
```
# Extras
## Visual Studio Code Language Support
A VSCode extension that provides simple syntax highlighting for nek is also available on
[GitLab](https://code.fbi.h-da.de/advanced-systems-programming-ws21/x4/nek-lang-vscode). Since this
is a very small scale project, the extension was not published and instructions on how to install it
can be found in the mentioned repository.


@ -7,7 +7,7 @@
sum <- 0; sum <- 0;
i <- 0; i <- 0;
loop i < 1_000; i = i + 1 { loop i < 1_000; i = i + 1 {
if i % 3 == 0 | i % 5 == 0 { if i % 3 == 0 || i % 5 == 0 {
sum = sum + i; sum = sum + i;
} }
} }


@ -10,14 +10,12 @@ sum <- 0;
a <- 0; a <- 0;
b <- 1; b <- 1;
tmp <- 0;
loop a < 4_000_000 { loop a < 4_000_000 {
if a % 2 == 0 { if a % 2 == 0 {
sum = sum + a; sum = sum + a;
} }
tmp = a; tmp <- a;
a = b; a = b;
b = b + tmp; b = b + tmp;
} }


@ -18,10 +18,10 @@ loop number > 1 {
div = div + 1; div = div + 1;
if div * div > number { if div * div > number {
if number > 1 & number > result { if number > 1 && number > result {
result = number; result = number;
} }
number = 0; break;
} }
} }


@ -4,30 +4,25 @@
// //
// Correct Answer: 906609 // Correct Answer: 906609
fun reverse(n) {
rev <- 0;
loop n {
rev = rev * 10 + n % 10;
n = n / 10;
}
return rev;
}
res <- 0; res <- 0;
tmp <- 0;
num <- 0;
num_rev <- 0;
i <- 100; i <- 100;
k <- 100;
loop i < 1_000; i = i + 1 { loop i < 1_000; i = i + 1 {
k = 100; k <- i;
loop k < 1_000; k = k + 1 { loop k < 1_000; k = k + 1 {
num_rev = 0; num <- i * k;
num_rev <- reverse(num);
num = i * k; if num == num_rev && num > res {
tmp = num;
loop tmp {
num_rev = num_rev*10 + tmp % 10;
tmp = tmp / 10;
}
if num == num_rev & num > res {
res = num; res = num;
} }
} }


@ -4,19 +4,19 @@
# #
# Correct Answer: 906609 # Correct Answer: 906609
def reverse(n):
rev = 0
while n:
rev = rev * 10 + n % 10
n //= 10
return rev
res = 0 res = 0
for i in range(100, 999): for i in range(100, 1_000):
for k in range(100, 999): for k in range(i, 1_000):
num = i * k num = i * k
tmp = num num_rev = reverse(num)
num_rev = 0
while tmp != 0:
num_rev = num_rev*10 + tmp % 10
tmp = tmp // 10
if num == num_rev and num > res: if num == num_rev and num > res:
res = num res = num
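
For comparison, the same refactored approach (a separate `reverse` helper and an inner loop starting at `i`) can be written in Rust. This is only an illustrative port of the NEK and Python versions above and is not part of the repository.

```rust
/// Reverse the decimal digits of `n`, for example 123 becomes 321
pub fn reverse(mut n: u64) -> u64 {
    let mut rev = 0;
    while n != 0 {
        rev = rev * 10 + n % 10;
        n /= 10;
    }
    rev
}

/// Largest palindrome that is a product of two 3-digit numbers
pub fn largest_palindrome_product() -> u64 {
    let mut res = 0;
    for i in 100..1_000 {
        // Starting at `i` skips the mirrored pairs (k, i) that were already checked
        for k in i..1_000 {
            let num = i * k;
            if num == reverse(num) && num > res {
                res = num;
            }
        }
    }
    res
}
```

Starting the inner loop at `i` roughly halves the number of products that need to be checked, since multiplication is commutative.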


@ -3,26 +3,21 @@
// //
// Correct Answer: 232_792_560 // Correct Answer: 232_792_560
num <- 20; fun gcd(x, y) {
should_continue <- 1; loop y {
i <- 2; tmp <- x;
x = y;
loop should_continue { y = tmp % y;
should_continue = 0;
i = 20;
loop i >= 2; i = i - 1 {
if num % i != 0 {
should_continue = 1;
// break
i = 0;
}
} }
if should_continue == 1 { return x;
num = num + 20;
}
} }
print num; result <- 1;
i <- 1;
loop i <= 20; i = i + 1 {
result = result * (i / gcd(i, result));
}
print result;

15
examples/euler5.py Normal file

@ -0,0 +1,15 @@
# 2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
# What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
#
# Correct Answer: 232_792_560
def gcd(x, y):
while y:
x, y = y, x % y
return x
result = 1
for i in range(1, 21):
result *= i // gcd(i, result)
print(result)
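
The new solution builds the least common multiple step by step, using `lcm(result, i) = result * (i / gcd(i, result))`. An illustrative Rust port of the same idea, which is not part of the repository, could look like this:

```rust
/// Greatest common divisor via the Euclidean algorithm
pub fn gcd(mut x: u64, mut y: u64) -> u64 {
    while y != 0 {
        let tmp = x;
        x = y;
        y = tmp % y;
    }
    x
}

/// Smallest positive number that is evenly divisible by all numbers from 1 to `n`
pub fn smallest_multiple(n: u64) -> u64 {
    let mut result = 1;
    for i in 1..=n {
        // Dividing before multiplying keeps the intermediate values small
        result *= i / gcd(i, result);
    }
    result
}
```

Compared to the previous brute-force search in steps of 20, this needs only one `gcd` call per number in the range.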

134
examples/game_of_life.nek Normal file

@ -0,0 +1,134 @@
fun print_field(field, width, height) {
y <- 0;
loop y < height; y = y+1 {
x <- 0;
loop x < width; x = x+1 {
if field[y*width + x] {
print "# ";
} else {
print ". ";
}
}
print "\n";
}
print "\n";
}
fun count_neighbours(field, x, y, width, height) {
neighbours <- 0;
if y > 0 {
if x > 0 {
if field[(y-1)*width + (x-1)] {
// Top left
neighbours = neighbours + 1;
}
}
if field[(y-1)*width + x] {
// Top
neighbours = neighbours + 1;
}
if x < width-1 {
if field[(y-1)*width + (x+1)] {
// Top right
neighbours = neighbours + 1;
}
}
}
if x > 0 {
if field[y*width + (x-1)] {
// Left
neighbours = neighbours + 1;
}
}
if x < width-1 {
if field[y*width + (x+1)] {
// Right
neighbours = neighbours + 1;
}
}
if y < height-1 {
if x > 0 {
if field[(y+1)*width + (x-1)] {
// Bottom left
neighbours = neighbours + 1;
}
}
if field[(y+1)*width + x] {
// Bottom
neighbours = neighbours + 1;
}
if x < width-1 {
if field[(y+1)*width + (x+1)] {
// Bottom right
neighbours = neighbours + 1;
}
}
}
return neighbours;
}
fun copy(from, to, len) {
i <- 0;
loop i < len; i = i + 1 {
to[i] = from[i];
}
}
// Set the width and height of the field
width <- 10;
height <- 10;
// Create the main and temporary field
field <- [width*height];
field2 <- [width*height];
// Preset the main field with a glider
field[1] = 1;
field[12] = 1;
field[20] = 1;
field[21] = 1;
field[22] = 1;
fun run_gol(num_rounds) {
runs <- 0;
loop runs < num_rounds; runs = runs + 1 {
// Print the field
print_field(field, width, height);
// Calculate next stage from field and store into field2
y <- 0;
loop y < height; y = y+1 {
x <- 0;
loop x < width; x = x+1 {
// Get the neighbours of the current cell
neighbours <- count_neighbours(field, x, y, width, height);
// Set the new cell according to the neighbour count
if neighbours < 2 || neighbours > 3 {
field2[y*width + x] = 0;
} else {
if neighbours == 3 {
field2[y*width + x] = 1;
} else {
field2[y*width + x] = field[y*width + x];
}
}
}
}
// Transfer from field2 to field
copy(field2, field, width*height);
}
}
run_gol(32);


@ -0,0 +1,9 @@
fun fib(n) {
if n <= 1 {
return n;
} else {
return fib(n-1) + fib(n-2);
}
}
print fib(30);


@ -0,0 +1,6 @@
def fib(n):
if n <= 1:
return n
return fib(n-1) + fib(n-2)
print(fib(30))


@ -0,0 +1,31 @@
fun square(a) {
return a * a;
}
fun add(a, b) {
return a + b;
}
fun mul(a, b) {
return a * b;
}
// Function with multiple args & nested calls to different functions
fun addmul(a, b, c) {
return mul(add(a, b), c);
}
a <- 10;
b <- 20;
c <- 3;
result <- addmul(a, b, c) + square(c);
// Access and modify outer variable. The parameter `a` shadows the outer variable `a` inside the function
fun sub_from_result(a) {
result = result - a;
}
sub_from_result(30);
print result;


@ -1,126 +1,188 @@
use std::rc::Rc; use std::rc::Rc;
/// Types for binary operators use crate::stringstore::{Sid, StringStore};
/// Types for binary operations
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum BinOpType { pub enum BinOpType {
/// Addition /// Addition ("+")
Add, Add,
/// Subtraction /// Subtraction ("-")
Sub, Sub,
/// Multiplication /// Multiplication ("*")
Mul, Mul,
/// Divide /// Division ("/")
Div, Div,
/// Modulo /// Modulo / Remainder ("%")
Mod, Mod,
/// Compare Equal /// Compare Equal ("==")
EquEqu, EquEqu,
/// Compare Not Equal /// Compare Not Equal ("!=")
NotEqu, NotEqu,
/// Less than /// Compare Less than ("<")
Less, Less,
/// Less than or Equal /// Compare Less than or Equal ("<=")
LessEqu, LessEqu,
/// Greater than /// Compare Greater than (">")
Greater, Greater,
/// Greater than or Equal /// Compare Greater than or Equal (">=")
GreaterEqu, GreaterEqu,
/// Bitwise OR (inclusive or) /// Bitwise Or ("|")
BOr, BOr,
/// Bitwise And /// Bitwise And ("&")
BAnd, BAnd,
/// Bitwise Xor (exclusive or) /// Bitwise Xor / Exclusive Or ("^")
BXor, BXor,
/// Logical And /// Logical And ("&&")
LAnd, LAnd,
/// Logical Or /// Logical Or ("||")
LOr, LOr,
/// Shift Left /// Bitwise Shift Left ("<<")
Shl, Shl,
/// Shift Right /// Bitwise Shift Right (">>")
Shr, Shr,
/// Assign value to variable /// Assign value to variable ("=")
Assign, Assign,
/// Declare new variable with value
Declare,
} }
/// Types for unary operations
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum UnOpType { pub enum UnOpType {
/// Unary Negate /// Unary Negation ("-")
Negate, Negate,
/// Bitwise Not /// Bitwise Not / Bitflip ("~")
BNot, BNot,
/// Logical Not /// Logical Not ("!")
LNot, LNot,
} }
/// Ast Node for possible Expression variants
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum Expression { pub enum Expression {
/// Integer literal (64-bit) /// Integer literal (64-bit)
I64(i64), I64(i64),
/// String literal /// String literal
String(Rc<String>), String(Sid),
/// Variable
Var(String), /// Array with size as an expression
ArrayLiteral(Box<Expression>),
/// Array access with name, stackpos and position as expression
ArrayAccess(Sid, usize, Box<Expression>),
/// Function call with name, stackpos and the arguments as a vec of expressions
FunCall(Sid, usize, Vec<Expression>),
/// Variable with name and the stackpos from behind. This means that stackpos 0 refers to the
/// last variable on the stack and not the first
Var(Sid, usize),
/// Binary operation. Consists of type, left hand side and right hand side /// Binary operation. Consists of type, left hand side and right hand side
BinOp(BinOpType, Box<Expression>, Box<Expression>), BinOp(BinOpType, Box<Expression>, Box<Expression>),
/// Unary operation. Consists of type and operand /// Unary operation. Consists of type and operand
UnOp(UnOpType, Box<Expression>), UnOp(UnOpType, Box<Expression>),
} }
/// Ast Node for a loop
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub struct Loop { pub struct Loop {
/// The condition that determines if the loop should continue /// The condition that determines if the loop should continue
pub condition: Expression, pub condition: Option<Expression>,
/// This is executed after each loop to advance the condition variables /// This is executed after each loop to advance the condition variables
pub advancement: Option<Expression>, pub advancement: Option<Expression>,
/// The loop body that is executed each loop /// The loop body that is executed each loop
pub body: Ast, pub body: BlockScope,
} }
/// Ast Node for an if
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub struct If { pub struct If {
/// The condition /// The condition
pub condition: Expression, pub condition: Expression,
/// The body that is executed when condition is true /// The body that is executed when condition is true
pub body_true: Ast, pub body_true: BlockScope,
/// The if body that is executed when the condition is false /// The if body that is executed when the condition is false
pub body_false: Ast, pub body_false: BlockScope,
} }
/// Ast Node for a function declaration
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct FunDecl {
/// The function name as StringID, stored in the stringstore
pub name: Sid,
/// The absolute position on the function stack where the function is stored
pub fun_stackpos: usize,
/// The argument names as StringIDs
pub argnames: Vec<Sid>,
/// The function body
pub body: Rc<BlockScope>,
}
/// Ast Node for a variable declaration
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct VarDecl {
/// The variable name as StringID, stored in the stringstore
pub name: Sid,
/// The absolute position on the variable stack where the variable is stored
pub var_stackpos: usize,
/// The right hand side that generates the initial value for the variable
pub rhs: Expression,
}
/// Ast Node for the possible Statement variants
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum Statement { pub enum Statement {
/// Return from a function with the given result value as an expression
Return(Expression),
/// Break out of the current loop
Break,
/// End the current loop iteration early and continue with the next loop iteration
Continue,
/// A variable declaration
Declaration(VarDecl),
/// A function declaration
FunDeclare(FunDecl),
/// A simple expression. This could be a function call or an assignment for example
Expr(Expression), Expr(Expression),
/// A freestanding block scope
Block(BlockScope),
/// A loop
Loop(Loop), Loop(Loop),
/// An if
If(If), If(If),
/// A print statement that will output the value of the given expression to the terminal
Print(Expression), Print(Expression),
} }
#[derive(Debug, PartialEq, Eq, Clone, Default)] /// A number of statements that form a block of code together
pub type BlockScope = Vec<Statement>;
/// A full abstract syntax tree
#[derive(Clone, Default)]
pub struct Ast { pub struct Ast {
pub prog: Vec<Statement>, /// The stringstore contains the actual string values which are replaced with StringIDs in the
/// Ast. So this is needed to get the actual strings later
pub stringstore: StringStore,
/// The main (top-level) code given as a number of statements
pub main: BlockScope,
} }
impl BinOpType { impl BinOpType {
@ -133,7 +195,6 @@ impl BinOpType {
pub fn precedence(&self) -> u8 { pub fn precedence(&self) -> u8 {
match self { match self {
BinOpType::Declare => 0,
BinOpType::Assign => 1, BinOpType::Assign => 1,
BinOpType::LOr => 2, BinOpType::LOr => 2,
BinOpType::LAnd => 3, BinOpType::LAnd => 3,

116
src/astoptimizer.rs Normal file

@ -0,0 +1,116 @@
use crate::ast::{Ast, BlockScope, Expression, If, Loop, Statement, BinOpType, UnOpType, VarDecl};
/// A trait that allows to optimize an abstract syntax tree
pub trait AstOptimizer {
/// Consume an abstract syntax tree and return an ast that has the same functionality but with
/// optional optimizations.
fn optimize(ast: Ast) -> Ast;
}
/// A very simple optimizer that applies trivial optimizations like precalculating expressions that
/// have only literals as operands
pub struct SimpleAstOptimizer;
impl AstOptimizer for SimpleAstOptimizer {
fn optimize(mut ast: Ast) -> Ast {
Self::optimize_block(&mut ast.main);
ast
}
}
impl SimpleAstOptimizer {
fn optimize_block(block: &mut BlockScope) {
for stmt in block {
match stmt {
Statement::Expr(expr) => Self::optimize_expr(expr),
Statement::Block(block) => Self::optimize_block(block),
Statement::Loop(Loop {
condition,
advancement,
body,
}) => {
if let Some(condition) = condition {
Self::optimize_expr(condition);
}
if let Some(advancement) = advancement {
Self::optimize_expr(advancement)
}
Self::optimize_block(body);
}
Statement::If(If {
condition,
body_true,
body_false,
}) => {
Self::optimize_expr(condition);
Self::optimize_block(body_true);
Self::optimize_block(body_false);
}
Statement::Print(expr) => Self::optimize_expr(expr),
Statement::Declaration(VarDecl { name: _, var_stackpos: _, rhs}) => Self::optimize_expr(rhs),
Statement::FunDeclare(_) => (),
Statement::Return(expr) => Self::optimize_expr(expr),
Statement::Break | Statement::Continue => (),
}
}
}
fn optimize_expr(expr: &mut Expression) {
match expr {
Expression::BinOp(bo, lhs, rhs) => {
Self::optimize_expr(lhs);
Self::optimize_expr(rhs);
// Precalculate binary operations that consist of 2 literals. No need to do this at
// runtime, as all parts of the calculation are known at *compiletime* / parsetime.
match (lhs.as_mut(), rhs.as_mut()) {
(Expression::I64(lhs), Expression::I64(rhs)) => {
let new_expr = match bo {
BinOpType::Add => Expression::I64(*lhs + *rhs),
BinOpType::Mul => Expression::I64(*lhs * *rhs),
BinOpType::Sub => Expression::I64(*lhs - *rhs),
BinOpType::Div => Expression::I64(*lhs / *rhs),
BinOpType::Mod => Expression::I64(*lhs % *rhs),
BinOpType::BOr => Expression::I64(*lhs | *rhs),
BinOpType::BAnd => Expression::I64(*lhs & *rhs),
BinOpType::BXor => Expression::I64(*lhs ^ *rhs),
BinOpType::LAnd => Expression::I64(if (*lhs != 0) && (*rhs != 0) { 1 } else { 0 }),
BinOpType::LOr => Expression::I64(if (*lhs != 0) || (*rhs != 0) { 1 } else { 0 }),
BinOpType::Shr => Expression::I64(*lhs >> *rhs),
BinOpType::Shl => Expression::I64(*lhs << *rhs),
BinOpType::EquEqu => Expression::I64(if lhs == rhs { 1 } else { 0 }),
BinOpType::NotEqu => Expression::I64(if lhs != rhs { 1 } else { 0 }),
BinOpType::Less => Expression::I64(if lhs < rhs { 1 } else { 0 }),
BinOpType::LessEqu => Expression::I64(if lhs <= rhs { 1 } else { 0 }),
BinOpType::Greater => Expression::I64(if lhs > rhs { 1 } else { 0 }),
BinOpType::GreaterEqu => Expression::I64(if lhs >= rhs { 1 } else { 0 }),
BinOpType::Assign => unreachable!(),
};
*expr = new_expr;
},
_ => ()
}
}
Expression::UnOp(uo, operand) => {
Self::optimize_expr(operand);
// Precalculate unary operations just like binary ones
match operand.as_mut() {
Expression::I64(val) => {
let new_expr = match uo {
UnOpType::Negate => Expression::I64(-*val),
UnOpType::BNot => Expression::I64(!*val),
UnOpType::LNot => Expression::I64(if *val == 0 { 1 } else { 0 }),
};
*expr = new_expr;
}
_ => (),
}
}
_ => (),
}
}
}
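
The folding arms in `optimize_expr` use plain `/`, `%` and similar operators on literals, so an input such as `1 / 0` would panic inside the optimizer instead of reaching the interpreter's `DivideByZero` error. A minimal sketch of a more defensive fold follows. It assumes a simplified stand-in type `Expr` and helpers `fold` / `fold_bin`, all hypothetical and not part of this crate; operations that cannot be computed safely are simply left unfolded.

```rust
/// Simplified stand-in for the crate's `Expression` (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    I64(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// Fold both operands first, then try to combine two literals with a checked operation.
/// If the operands are not both literals, or the checked operation returns `None`
/// (overflow, division by zero), the node is rebuilt unchanged.
fn fold_bin(
    lhs: Expr,
    rhs: Expr,
    op: fn(i64, i64) -> Option<i64>,
    rebuild: fn(Box<Expr>, Box<Expr>) -> Expr,
) -> Expr {
    let (lhs, rhs) = (fold(lhs), fold(rhs));
    if let (Expr::I64(a), Expr::I64(b)) = (&lhs, &rhs) {
        if let Some(v) = op(*a, *b) {
            return Expr::I64(v);
        }
    }
    rebuild(Box::new(lhs), Box::new(rhs))
}

/// Fold literal-only subtrees bottom-up.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => fold_bin(*l, *r, i64::checked_add, Expr::Add),
        Expr::Div(l, r) => fold_bin(*l, *r, i64::checked_div, Expr::Div),
        other => other,
    }
}
```

With this shape, `fold` turns `(1 + 2) / 3` into the literal `1`, while `1 / 0` stays a `Div` node, so the error can still be reported at runtime.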


@ -1,85 +1,253 @@
use std::{fmt::Display, rc::Rc}; use std::{cell::RefCell, rc::Rc};
use thiserror::Error;
use crate::{ use crate::{
ast::{Ast, BinOpType, Expression, If, Statement, UnOpType}, ast::{Ast, BinOpType, BlockScope, Expression, FunDecl, If, Statement, UnOpType},
astoptimizer::{AstOptimizer, SimpleAstOptimizer},
lexer::lex, lexer::lex,
nice_panic,
parser::parse, parser::parse,
stringstore::{Sid, StringStore},
}; };
#[derive(Debug, PartialEq, Eq, Clone)] /// Runtime errors that can occur during execution
pub enum Value { #[derive(Debug, Error)]
I64(i64), pub enum RuntimeError {
String(Rc<String>), #[error("Invalid array Index: {0:?}")]
InvalidArrayIndex(Value),
#[error("Variable used but not declared: {0}")]
VarUsedNotDeclared(String),
#[error("Can't index into non-array variable: {0}")]
TryingToIndexNonArray(String),
#[error("Invalid value type for unary operation: {0:?}")]
UnOpInvalidType(Value),
#[error("Incompatible binary operations. Operands don't match: {0:?} and {1:?}")]
BinOpIncompatibleTypes(Value, Value),
#[error("Array access out of bounds: Accessed {0}, size is {1}")]
ArrayOutOfBounds(usize, usize),
#[error("Division by zero")]
DivideByZero,
#[error("Invalid number of arguments for function {0}. Expected {1}, got {2}")]
InvalidNumberOfArgs(String, usize, usize),
} }
/// Possible variants for the values
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Value {
/// 64-bit integer value
I64(i64),
/// String value
String(Sid),
/// Array value
Array(Rc<RefCell<Vec<Value>>>),
/// Void value
Void,
}
/// The exit type of a block. When a block ends, the exit type specifies why the block ended.
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum BlockExit {
/// Normal exit when the block just ends normally (no returns / breaks / continues / etc.)
Normal,
/// The block ended through a break statement. This will be propagated up to the next loop
/// and cause it to fully terminate
Break,
/// The block ended through a continue statement. This will be propagated up to the next loop
/// and cause it to start the next iteration
Continue,
/// The block ended through a return statement. This will propagate up to the next function
/// body end
Return(Value),
}
#[derive(Default)]
pub struct Interpreter { pub struct Interpreter {
// Variable table stores the runtime values of variables /// Run the SimpleAstOptimizer over the Ast before executing
vartable: Vec<(String, Value)>, pub optimize_ast: bool,
/// Print the tokens after lexing
pub print_tokens: bool,
/// Print the ast after parsing
pub print_ast: bool,
/// Capture the output values of print statements instead of printing them to the terminal
pub capture_output: bool,
/// The stored values that were captured
output: Vec<Value>,
/// Variable table stores the runtime values of variables as a stack
vartable: Vec<Value>,
/// Function table stores the functions during runtime as a stack
funtable: Vec<FunDecl>,
/// The stringstore contains all strings used throughout the program
stringstore: StringStore,
} }
impl Interpreter { impl Interpreter {
/// Create a new Interpreter
pub fn new() -> Self { pub fn new() -> Self {
Self { Self {
vartable: Vec::new(), optimize_ast: true,
..Self::default()
} }
} }
fn get_var(&self, name: &str) -> Option<Value> { /// Get the captured output
self.vartable pub fn output(&self) -> &[Value] {
.iter() &self.output
.rev()
.find(|it| it.0 == name)
.map(|it| it.1.clone())
} }
fn get_var_mut(&mut self, name: &str) -> Option<&mut Value> { /// Try to retrieve a variable value from the varstack. The idx is the index from the back of
self.vartable /// the stack. So 0 is the last value, not the first
.iter_mut() fn get_var(&self, idx: usize) -> Option<Value> {
.rev() self.vartable.get(self.vartable.len() - idx - 1).cloned()
.find(|it| it.0 == name)
.map(|it| &mut it.1)
} }
pub fn run_str(&mut self, code: &str, print_tokens: bool, print_ast: bool) { /// Try to retrieve a mutable reference to a variable value from the varstack. The idx is the
let tokens = lex(code).unwrap(); /// index from the back of the stack. So 0 is the last value, not the first
if print_tokens { fn get_var_mut(&mut self, idx: usize) -> Option<&mut Value> {
let idx = self.vartable.len() - idx - 1;
self.vartable.get_mut(idx)
}
/// Lex, parse and then run the given sourcecode. This will terminate the program when an error
/// occurs and print an appropriate error message.
pub fn run_str(&mut self, code: &str) {
// Lex the tokens
let tokens = match lex(code) {
Ok(tokens) => tokens,
Err(e) => nice_panic!("Lexing error: {}", e),
};
if self.print_tokens {
println!("Tokens: {:?}", tokens); println!("Tokens: {:?}", tokens);
} }
let ast = parse(tokens); // Parse the ast
if print_ast { let ast = match parse(tokens) {
println!("{:#?}", ast); Ok(ast) => ast,
} Err(e) => nice_panic!("Parsing error: {}", e),
};
self.run(&ast); // Run the ast
match self.run_ast(ast) {
Ok(_) => (),
Err(e) => nice_panic!("Runtime error: {}", e),
}
} }
pub fn run(&mut self, prog: &Ast) { /// Execute the given Ast within the interpreter
let vartable_len = self.vartable.len(); pub fn run_ast(&mut self, mut ast: Ast) -> Result<(), RuntimeError> {
for stmt in &prog.prog { // Optimize the ast
if self.optimize_ast {
ast = SimpleAstOptimizer::optimize(ast);
}
if self.print_ast {
println!("{:#?}", ast.main);
}
// Take over the stringstore of the given ast
self.stringstore = ast.stringstore;
// Run the top level block (the main)
self.run_block(&ast.main)?;
Ok(())
}
/// Run all statements in the given block
pub fn run_block(&mut self, prog: &BlockScope) -> Result<BlockExit, RuntimeError> {
self.run_block_fp_offset(prog, 0)
}
/// Same as run_block, but with an additional framepointer offset. The offset makes the block
/// also free that many values which were already on the stack before it started. This is used
/// to pop the arguments that the caller pushed for a function body
pub fn run_block_fp_offset(
&mut self,
prog: &BlockScope,
framepointer_offset: usize,
) -> Result<BlockExit, RuntimeError> {
let framepointer = self.vartable.len() - framepointer_offset;
let mut block_exit = BlockExit::Normal;
'blockloop: for stmt in prog {
match stmt { match stmt {
Statement::Expr(expr) => { Statement::Break => return Ok(BlockExit::Break),
self.resolve_expr(expr); Statement::Continue => return Ok(BlockExit::Continue),
Statement::Return(expr) => {
let val = self.resolve_expr(expr)?;
block_exit = BlockExit::Return(val);
break 'blockloop;
} }
Statement::Expr(expr) => {
self.resolve_expr(expr)?;
}
Statement::Declaration(decl) => {
let rhs = self.resolve_expr(&decl.rhs)?;
self.vartable.push(rhs);
}
Statement::Block(block) => match self.run_block(block)? {
// Propagate return, continue and break
be @ (BlockExit::Return(_) | BlockExit::Continue | BlockExit::Break) => {
block_exit = be;
break 'blockloop;
}
_ => (),
},
Statement::Loop(looop) => { Statement::Loop(looop) => {
// loop runs as long as condition != 0 // loop runs as long as condition != 0
loop { loop {
if matches!(self.resolve_expr(&looop.condition), Value::I64(0)) { // Check the loop condition
break; if let Some(condition) = &looop.condition {
if matches!(self.resolve_expr(condition)?, Value::I64(0)) {
break;
}
} }
self.run(&looop.body); // Run the body
let be = self.run_block(&looop.body)?;
match be {
// Propagate return
be @ BlockExit::Return(_) => {
block_exit = be;
break 'blockloop;
}
BlockExit::Break => break,
BlockExit::Continue | BlockExit::Normal => (),
}
// Run the advancement
if let Some(adv) = &looop.advancement { if let Some(adv) = &looop.advancement {
self.resolve_expr(&adv); self.resolve_expr(&adv)?;
} }
} }
} }
Statement::Print(expr) => { Statement::Print(expr) => {
let result = self.resolve_expr(expr); let result = self.resolve_expr(expr)?;
print!("{}", result);
if self.capture_output {
self.output.push(result)
} else {
print!("{}", self.value_to_string(&result));
}
} }
Statement::If(If { Statement::If(If {
@ -87,73 +255,264 @@ impl Interpreter {
body_true, body_true,
body_false, body_false,
}) => { }) => {
if matches!(self.resolve_expr(condition), Value::I64(0)) { // Run the right block depending on the condition's result being 0 or not
self.run(body_false); let exit = if matches!(self.resolve_expr(condition)?, Value::I64(0)) {
self.run_block(body_false)?
} else { } else {
self.run(body_true); self.run_block(body_true)?
};
match exit {
// Propagate return, continue and break
be @ (BlockExit::Return(_) | BlockExit::Continue | BlockExit::Break) => {
block_exit = be;
break 'blockloop;
}
_ => (),
} }
} }
Statement::FunDeclare(fundec) => {
self.funtable.push(fundec.clone());
}
} }
} }
self.vartable.truncate(vartable_len); self.vartable.truncate(framepointer);
Ok(block_exit)
} }
fn resolve_expr(&mut self, expr: &Expression) -> Value { /// Execute the given expression to retrieve the resulting value
match expr { fn resolve_expr(&mut self, expr: &Expression) -> Result<Value, RuntimeError> {
let val = match expr {
Expression::I64(val) => Value::I64(*val), Expression::I64(val) => Value::I64(*val),
Expression::ArrayLiteral(size) => {
let size = match self.resolve_expr(size)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
Value::Array(Rc::new(RefCell::new(vec![Value::I64(0); size as usize])))
}
Expression::String(text) => Value::String(text.clone()), Expression::String(text) => Value::String(text.clone()),
Expression::BinOp(bo, lhs, rhs) => self.resolve_binop(bo, lhs, rhs), Expression::BinOp(bo, lhs, rhs) => self.resolve_binop(bo, lhs, rhs)?,
Expression::UnOp(uo, operand) => self.resolve_unop(uo, operand), Expression::UnOp(uo, operand) => self.resolve_unop(uo, operand)?,
Expression::Var(name) => self.resolve_var(name), Expression::Var(name, idx) => self.resolve_var(*name, *idx)?,
Expression::ArrayAccess(name, idx, arr_idx) => {
self.resolve_array_access(*name, *idx, arr_idx)?
}
Expression::FunCall(fun_name, fun_stackpos, args) => {
let args_len = args.len();
// All of the arg expressions must be resolved before pushing the vars on the stack,
// otherwise the stack positions are incorrect while resolving
let args = args
.iter()
.map(|arg| self.resolve_expr(arg))
.collect::<Vec<_>>();
for arg in args {
self.vartable.push(arg?);
}
// Function existence has been verified in the parser, so unwrap here shouldn't fail
let expected_num_args = self.funtable.get(*fun_stackpos).unwrap().argnames.len();
// Check if the number of provided arguments matches the number of expected arguments
if expected_num_args != args_len {
let fun_name = self
.stringstore
.lookup(*fun_name)
.cloned()
.unwrap_or("<unknown>".to_string());
return Err(RuntimeError::InvalidNumberOfArgs(
fun_name,
expected_num_args,
args_len,
));
}
// Run the function body and return the BlockExit type
match self.run_block_fp_offset(
&Rc::clone(&self.funtable.get(*fun_stackpos).unwrap().body),
expected_num_args,
)? {
BlockExit::Normal | BlockExit::Continue | BlockExit::Break => Value::Void,
BlockExit::Return(val) => val,
}
}
};
Ok(val)
}
/// Retrieve the value of a given array at the specified index from the varstack. The name is
/// given as a StringID and is used to reference the variable name in case of an error. The
/// idx is the stackpos where the array variable should be located and the arr_idx is the
/// actual array access index, given as an expression.
fn resolve_array_access(
&mut self,
name: Sid,
idx: usize,
arr_idx: &Expression,
) -> Result<Value, RuntimeError> {
// Resolve the array index into a value and check if it is a valid array index
let arr_idx = match self.resolve_expr(arr_idx)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
// Get the array value
let val = match self.get_var(idx) {
Some(val) => val,
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
// Make sure it is an array
let arr = match val {
Value::Array(arr) => arr,
_ => {
return Err(RuntimeError::TryingToIndexNonArray(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
// Get the value of the requested cell inside the array
let arr = arr.borrow();
arr.get(arr_idx as usize)
.cloned()
.ok_or(RuntimeError::ArrayOutOfBounds(arr_idx as usize, arr.len()))
}
/// Retrieve the value of a given variable from the varstack. The name is given as a StringID
/// and is used to reference the variable name in case of an error. The idx is the stackpos
/// where the variable should be located
fn resolve_var(&mut self, name: Sid, idx: usize) -> Result<Value, RuntimeError> {
match self.get_var(idx) {
Some(val) => Ok(val),
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
} }
} }
fn resolve_var(&mut self, name: &str) -> Value { /// Execute a unary operation and get the resulting value
match self.get_var(name) { fn resolve_unop(&mut self, uo: &UnOpType, operand: &Expression) -> Result<Value, RuntimeError> {
Some(val) => val.clone(), // Recursively resolve the operands expression into an actual value
None => panic!("Variable '{}' used but not declared", name), let operand = self.resolve_expr(operand)?;
}
}
fn resolve_unop(&mut self, uo: &UnOpType, operand: &Expression) -> Value { // Perform the correct operation, considering the operation and value type
let operand = self.resolve_expr(operand); Ok(match (operand, uo) {
match (operand, uo) {
(Value::I64(val), UnOpType::Negate) => Value::I64(-val), (Value::I64(val), UnOpType::Negate) => Value::I64(-val),
(Value::I64(val), UnOpType::BNot) => Value::I64(!val), (Value::I64(val), UnOpType::BNot) => Value::I64(!val),
(Value::I64(val), UnOpType::LNot) => Value::I64(if val == 0 { 1 } else { 0 }), (Value::I64(val), UnOpType::LNot) => Value::I64(if val == 0 { 1 } else { 0 }),
_ => panic!("Value type is not compatible with unary operation"), (val, _) => return Err(RuntimeError::UnOpInvalidType(val)),
} })
} }
fn resolve_binop(&mut self, bo: &BinOpType, lhs: &Expression, rhs: &Expression) -> Value { /// Execute a binary operation and get the resulting value
let rhs = self.resolve_expr(rhs); fn resolve_binop(
&mut self,
bo: &BinOpType,
lhs: &Expression,
rhs: &Expression,
) -> Result<Value, RuntimeError> {
let rhs = self.resolve_expr(rhs)?;
// Handle assignments separate from the other binary operations
match (&bo, &lhs) { match (&bo, &lhs) {
(BinOpType::Declare, Expression::Var(name)) => { // Normal variable assignment
self.vartable.push((name.clone(), rhs.clone())); (BinOpType::Assign, Expression::Var(name, idx)) => {
return rhs; // Get the variable mutably and assign the right hand side value
} match self.get_var_mut(*idx) {
(BinOpType::Assign, Expression::Var(name)) => {
match self.get_var_mut(name) {
Some(val) => *val = rhs.clone(), Some(val) => *val = rhs.clone(),
None => panic!("Runtime Error: Trying to assign value to undeclared variable"), None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
} }
return rhs;
return Ok(rhs);
}
// Array index assignment
(BinOpType::Assign, Expression::ArrayAccess(name, idx, arr_idx)) => {
// Calculate the array index
let arr_idx = match self.resolve_expr(arr_idx)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
// Get the mutable ref to the array variable
let val = match self.get_var_mut(*idx) {
Some(val) => val,
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
// Verify that it actually is an array
match val {
// Assign the right hand side value to the array at the given index
Value::Array(arr) => arr.borrow_mut()[arr_idx as usize] = rhs.clone(),
_ => {
return Err(RuntimeError::TryingToIndexNonArray(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
}
return Ok(rhs);
} }
_ => (), _ => (),
} }
let lhs = self.resolve_expr(lhs); // This code is only executed if the binop is not an assignment as the assignments return
// early
match (lhs, rhs) { // Resolve the left hand side to the value
let lhs = self.resolve_expr(lhs)?;
// Perform the appropriate calculations considering the operation type and datatypes of the
// two values
let result = match (lhs, rhs) {
(Value::I64(lhs), Value::I64(rhs)) => match bo { (Value::I64(lhs), Value::I64(rhs)) => match bo {
BinOpType::Add => Value::I64(lhs + rhs), BinOpType::Add => Value::I64(lhs + rhs),
BinOpType::Mul => Value::I64(lhs * rhs), BinOpType::Mul => Value::I64(lhs * rhs),
BinOpType::Sub => Value::I64(lhs - rhs), BinOpType::Sub => Value::I64(lhs - rhs),
BinOpType::Div => Value::I64(lhs / rhs), BinOpType::Div => {
BinOpType::Mod => Value::I64(lhs % rhs), Value::I64(lhs.checked_div(rhs).ok_or(RuntimeError::DivideByZero)?)
}
BinOpType::Mod => {
Value::I64(lhs.checked_rem(rhs).ok_or(RuntimeError::DivideByZero)?)
}
BinOpType::BOr => Value::I64(lhs | rhs), BinOpType::BOr => Value::I64(lhs | rhs),
BinOpType::BAnd => Value::I64(lhs & rhs), BinOpType::BAnd => Value::I64(lhs & rhs),
BinOpType::BXor => Value::I64(lhs ^ rhs), BinOpType::BXor => Value::I64(lhs ^ rhs),
@ -168,18 +527,27 @@ impl Interpreter {
BinOpType::Greater => Value::I64(if lhs > rhs { 1 } else { 0 }), BinOpType::Greater => Value::I64(if lhs > rhs { 1 } else { 0 }),
BinOpType::GreaterEqu => Value::I64(if lhs >= rhs { 1 } else { 0 }), BinOpType::GreaterEqu => Value::I64(if lhs >= rhs { 1 } else { 0 }),
BinOpType::Declare | BinOpType::Assign => unreachable!(), BinOpType::Assign => unreachable!(),
}, },
_ => panic!("Value types are not compatible"), (lhs, rhs) => return Err(RuntimeError::BinOpIncompatibleTypes(lhs, rhs)),
} };
}
}
impl Display for Value { Ok(result)
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { }
match self {
Value::I64(val) => write!(f, "{}", val), /// Get a string representation of the given value. This uses the interpreter's StringStore to
Value::String(text) => write!(f, "{}", text), /// retrieve the text values of Strings
fn value_to_string(&self, val: &Value) -> String {
match val {
Value::I64(val) => format!("{}", val),
Value::Array(val) => format!("{:?}", val.borrow()),
Value::String(text) => format!(
"{}",
self.stringstore
.lookup(*text)
.unwrap_or(&"<invalid string>".to_string())
),
Value::Void => format!("void"),
} }
} }
} }
@ -189,6 +557,8 @@ mod test {
use super::{Interpreter, Value}; use super::{Interpreter, Value};
use crate::ast::{BinOpType, Expression}; use crate::ast::{BinOpType, Expression};
/// Simple test to check if a simple expression is executed properly.
/// Full system tests from lexing to execution can be found in `lib.rs`
#[test] #[test]
fn test_interpreter_expr() { fn test_interpreter_expr() {
// Expression: 1 + 2 * 3 + 4 // Expression: 1 + 2 * 3 + 4
@ -212,7 +582,7 @@ mod test {
let expected = Value::I64(11); let expected = Value::I64(11);
let mut interpreter = Interpreter::new(); let mut interpreter = Interpreter::new();
let actual = interpreter.resolve_expr(&ast); let actual = interpreter.resolve_expr(&ast).unwrap();
assert_eq!(expected, actual); assert_eq!(expected, actual);
} }
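
The interplay of the framepointer and `BlockExit` in `run_block_fp_offset` can be illustrated in isolation. The following is a minimal sketch under simplifying assumptions: `Stmt`, `Exit` and this free-standing `run_block` are hypothetical, reduced stand-ins (no expressions, no `Continue`, no error handling), not the crate's types. Note that, unlike the early `return` used for `Break` and `Continue` in the diff above, every exit path in this sketch passes through the stack truncation.

```rust
/// Hypothetical, reduced statement set (not the crate's `Statement`).
enum Stmt {
    /// Push a value onto the variable stack (stands in for a declaration)
    Declare(i64),
    Break,
    Return(i64),
    Block(Vec<Stmt>),
    /// Endless loop; only `Break` or `Return` leave it
    Loop(Vec<Stmt>),
}

/// Why a block ended (reduced version of `BlockExit`).
#[derive(Debug, PartialEq)]
enum Exit {
    Normal,
    Break,
    Return(i64),
}

fn run_block(stack: &mut Vec<i64>, block: &[Stmt]) -> Exit {
    // Remember the stack height so everything declared inside is freed on exit
    let framepointer = stack.len();
    let mut exit = Exit::Normal;
    for stmt in block {
        match stmt {
            Stmt::Declare(v) => stack.push(*v),
            Stmt::Break => {
                exit = Exit::Break;
                break;
            }
            Stmt::Return(v) => {
                exit = Exit::Return(*v);
                break;
            }
            // A nested block passes any non-normal exit on to its parent
            Stmt::Block(inner) => match run_block(stack, inner) {
                Exit::Normal => (),
                other => {
                    exit = other;
                    break;
                }
            },
            Stmt::Loop(body) => {
                // A loop absorbs `Break`, but lets `Return` travel further up
                let result = loop {
                    match run_block(stack, body) {
                        Exit::Normal => (),
                        Exit::Break => break Exit::Normal,
                        ret => break ret,
                    }
                };
                if result != Exit::Normal {
                    exit = result;
                    break;
                }
            }
        }
    }
    // Free all values that were pushed inside this block
    stack.truncate(framepointer);
    exit
}
```

Because each nested call restores its own stack height before returning, the caller always sees the stack exactly as it left it, which is what keeps precalculated offsets from the end of the stack valid.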


@ -1,8 +1,9 @@
use crate::token::Token;
use anyhow::Result;
use std::{iter::Peekable, str::Chars}; use std::{iter::Peekable, str::Chars};
use thiserror::Error; use thiserror::Error;
use crate::{token::Token, T};
/// Errors that can occur while lexing a given string
#[derive(Debug, Error)] #[derive(Debug, Error)]
pub enum LexErr { pub enum LexErr {
#[error("Failed to parse '{0}' as i64")] #[error("Failed to parse '{0}' as i64")]
@ -20,116 +21,111 @@ pub enum LexErr {
/// Lex the provided code into a Token Buffer /// Lex the provided code into a Token Buffer
pub fn lex(code: &str) -> Result<Vec<Token>, LexErr> { pub fn lex(code: &str) -> Result<Vec<Token>, LexErr> {
let mut lexer = Lexer::new(code); let lexer = Lexer::new(code);
lexer.lex() lexer.lex()
} }
/// The lexer is created from a reference to a sourcecode string and is consumed to create a token
/// buffer from that sourcecode.
struct Lexer<'a> { struct Lexer<'a> {
/// The sourcecode text as an iterator over the chars /// The sourcecode text as a peekable iterator over the chars. Peekable allows for look-ahead
/// and the use of the Chars iterator allows to support unicode characters
code: Peekable<Chars<'a>>, code: Peekable<Chars<'a>>,
/// The lexed tokens
tokens: Vec<Token>,
/// The sourcecode character that is currently being lexed
current_char: char,
} }
impl<'a> Lexer<'a> { impl<'a> Lexer<'a> {
/// Create a new lexer from the given sourcecode
fn new(code: &'a str) -> Self { fn new(code: &'a str) -> Self {
let code = code.chars().peekable(); let code = code.chars().peekable();
Self { code } let tokens = Vec::new();
let current_char = '\0';
Self {
code,
tokens,
current_char,
}
} }
fn lex(&mut self) -> Result<Vec<Token>, LexErr> { /// Consume the lexer and try to lex the contained sourcecode into a token buffer
let mut tokens = Vec::new(); fn lex(mut self) -> Result<Vec<Token>, LexErr> {
loop { loop {
match self.next() { self.current_char = self.next();
// Match on the current and next character. This gives a 1-char look-ahead and
// can be used to directly match 2-char tokens
match (self.current_char, self.peek()) {
// Stop lexing at EOF // Stop lexing at EOF
'\0' => break, ('\0', _) => break,
// Skip whitespace // Skip / ignore whitespace
' ' | '\t' | '\n' | '\r' => (), (' ' | '\t' | '\n' | '\r', _) => (),
// Line comment. Consume every char until linefeed (next line) // Line comment. Consume every char until linefeed (next line)
'/' if matches!(self.peek(), '/') => while !matches!(self.next(), '\n' | '\0') {}, ('/', '/') => while !matches!(self.next(), '\n' | '\0') {},
// Double character tokens // Double character tokens
'>' if matches!(self.peek(), '>') => { ('>', '>') => self.push_tok_consume(T![>>]),
self.next(); ('<', '<') => self.push_tok_consume(T![<<]),
tokens.push(Token::Shr); ('=', '=') => self.push_tok_consume(T![==]),
} ('!', '=') => self.push_tok_consume(T![!=]),
'<' if matches!(self.peek(), '<') => { ('<', '=') => self.push_tok_consume(T![<=]),
self.next(); ('>', '=') => self.push_tok_consume(T![>=]),
tokens.push(Token::Shl); ('<', '-') => self.push_tok_consume(T![<-]),
} ('&', '&') => self.push_tok_consume(T![&&]),
'=' if matches!(self.peek(), '=') => { ('|', '|') => self.push_tok_consume(T![||]),
self.next();
tokens.push(Token::EquEqu);
}
'!' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::NotEqu);
}
'<' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::LAngleEqu);
}
'>' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::RAngleEqu);
}
'<' if matches!(self.peek(), '-') => {
self.next();
tokens.push(Token::LArrow);
}
'&' if matches!(self.peek(), '&') => {
self.next();
tokens.push(Token::LAnd);
}
'|' if matches!(self.peek(), '|') => {
self.next();
tokens.push(Token::LOr);
}
// Single character tokens // Single character tokens
';' => tokens.push(Token::Semicolon), (',', _) => self.push_tok(T![,]),
'+' => tokens.push(Token::Add), (';', _) => self.push_tok(T![;]),
'-' => tokens.push(Token::Sub), ('+', _) => self.push_tok(T![+]),
'*' => tokens.push(Token::Mul), ('-', _) => self.push_tok(T![-]),
'/' => tokens.push(Token::Div), ('*', _) => self.push_tok(T![*]),
'%' => tokens.push(Token::Mod), ('/', _) => self.push_tok(T![/]),
'|' => tokens.push(Token::BOr), ('%', _) => self.push_tok(T![%]),
'&' => tokens.push(Token::BAnd), ('|', _) => self.push_tok(T![|]),
'^' => tokens.push(Token::BXor), ('&', _) => self.push_tok(T![&]),
'(' => tokens.push(Token::LParen), ('^', _) => self.push_tok(T![^]),
')' => tokens.push(Token::RParen), ('(', _) => self.push_tok(T!['(']),
'~' => tokens.push(Token::Tilde), (')', _) => self.push_tok(T![')']),
'<' => tokens.push(Token::LAngle), ('~', _) => self.push_tok(T![~]),
'>' => tokens.push(Token::RAngle), ('<', _) => self.push_tok(T![<]),
'=' => tokens.push(Token::Equ), ('>', _) => self.push_tok(T![>]),
'{' => tokens.push(Token::LBraces), ('=', _) => self.push_tok(T![=]),
'}' => tokens.push(Token::RBraces), ('{', _) => self.push_tok(T!['{']),
'!' => tokens.push(Token::LNot), ('}', _) => self.push_tok(T!['}']),
('!', _) => self.push_tok(T![!]),
('[', _) => self.push_tok(T!['[']),
(']', _) => self.push_tok(T![']']),
// Special tokens with variable length // Special tokens with variable length
// Lex multiple characters together as numbers // Lex multiple characters together as numbers
ch @ '0'..='9' => tokens.push(self.lex_number(ch)?), ('0'..='9', _) => self.lex_number()?,
// Lex multiple characters together as a string // Lex multiple characters together as a string
'"' => tokens.push(self.lex_str()?), ('"', _) => self.lex_str()?,
// Lex multiple characters together as identifier // Lex multiple characters together as identifier or keyword
ch @ ('a'..='z' | 'A'..='Z' | '_') => tokens.push(self.lex_identifier(ch)?), ('a'..='z' | 'A'..='Z' | '_', _) => self.lex_identifier()?,
ch => Err(LexErr::UnexpectedChar(ch))?, // Any character that was not handled otherwise is invalid
(ch, _) => Err(LexErr::UnexpectedChar(ch))?,
} }
} }
Ok(tokens) Ok(self.tokens)
} }
/// Lex multiple characters as a number until encountering a non numeric digit. This includes /// Lex multiple characters as a number until encountering a non numeric digit. The
/// the first character /// successfully lexed i64 literal token is appended to the stored tokens.
fn lex_number(&mut self, first_char: char) -> Result<Token, LexErr> { fn lex_number(&mut self) -> Result<(), LexErr> {
// String representation of the integer value // String representation of the integer value
let mut sval = String::from(first_char); let mut sval = String::from(self.current_char);
// Do as long as a next char exists and it is a numeric char // Do as long as a next char exists and it is a numeric char
loop { loop {
@ -147,31 +143,40 @@ impl<'a> Lexer<'a> {
} }
} }
// Try to convert the string representation of the value to i64 // Try to convert the string representation of the value to i64. The error is mapped to
// the appropriate LexErr
let i64val = sval.parse().map_err(|_| LexErr::NumericParse(sval))?; let i64val = sval.parse().map_err(|_| LexErr::NumericParse(sval))?;
Ok(Token::I64(i64val))
self.push_tok(T![i64(i64val)]);
Ok(())
} }
/// Lex characters as a string until encountering an unescaped closing doublequote char '"' /// Lex characters as a string until encountering an unescaped closing doublequote char '"'.
fn lex_str(&mut self) -> Result<Token, LexErr> { /// The successfully lexed string literal token is appended to the stored tokens.
// Opening " was consumed in match fn lex_str(&mut self) -> Result<(), LexErr> {
// The opening " was consumed in match, so a fresh string can be used
let mut text = String::new(); let mut text = String::new();
// Read all chars until encountering the closing " // Read all chars until encountering the closing "
loop { loop {
match self.peek() { match self.peek() {
// An unescaped doublequote ends the current string
'"' => break, '"' => break,
// If the end of file is reached while still waiting for '"', error out // If the end of file is reached while still waiting for '"', error out
'\0' => Err(LexErr::MissingClosingString)?, '\0' => Err(LexErr::MissingClosingString)?,
_ => match self.next() { _ => match self.next() {
// Backshlash indicates an escaped character // Backslash indicates an escaped character, so consume one more char and
// treat it as the escaped char
'\\' => match self.next() { '\\' => match self.next() {
'n' => text.push('\n'), 'n' => text.push('\n'),
'r' => text.push('\r'), 'r' => text.push('\r'),
't' => text.push('\t'), 't' => text.push('\t'),
'\\' => text.push('\\'), '\\' => text.push('\\'),
'"' => text.push('"'), '"' => text.push('"'),
// If the escaped char is not handled, it is unsupported and an error
ch => Err(LexErr::InvalidStrEscape(ch))?, ch => Err(LexErr::InvalidStrEscape(ch))?,
}, },
// All other characters are simply appended to the string // All other characters are simply appended to the string
@ -183,12 +188,15 @@ impl<'a> Lexer<'a> {
// Consume closing " // Consume closing "
self.next(); self.next();
Ok(Token::String(text)) self.push_tok(T![str(text)]);
Ok(())
} }
/// Lex characters from the text as an identifier. This includes the first character passed in /// Lex characters from the text as an identifier. The successfully lexed ident or keyword
fn lex_identifier(&mut self, first_char: char) -> Result<Token, LexErr> { /// token is appended to the stored tokens.
let mut ident = String::from(first_char); fn lex_identifier(&mut self) -> Result<(), LexErr> {
let mut ident = String::from(self.current_char);
// Do as long as a next char exists and it is a valid char for an identifier // Do as long as a next char exists and it is a valid char for an identifier
loop { loop {
@ -204,24 +212,46 @@ impl<'a> Lexer<'a> {
// Check for pre-defined keywords // Check for pre-defined keywords
let token = match ident.as_str() { let token = match ident.as_str() {
"loop" => Token::Loop, "loop" => T![loop],
"print" => Token::Print, "print" => T![print],
"if" => Token::If, "if" => T![if],
"else" => Token::Else, "else" => T![else],
"fun" => T![fun],
"return" => T![return],
"break" => T![break],
"continue" => T![continue],
// If it doesn't match a keyword, it is a normal identifier // If it doesn't match a keyword, it is a normal identifier
_ => Token::Ident(ident), _ => T![ident(ident)],
}; };
Ok(token) self.push_tok(token);
Ok(())
} }
/// Advance to next character and return the removed char /// Push the given token into the stored tokens
fn push_tok(&mut self, token: Token) {
self.tokens.push(token);
}
/// Same as `push_tok` but also consumes the next char, removing it from the code iter. This
/// is useful when lexing double char tokens where the second char has only been peeked.
fn push_tok_consume(&mut self, token: Token) {
self.next();
self.tokens.push(token);
}
/// Advance to next character and return the removed char. When the end of the code is reached,
/// `'\0'` is returned. This is used instead of an Option::None since it allows for much
/// shorter and cleaner code in the main loop. The `'\0'` character would not be valid anyways
fn next(&mut self) -> char { fn next(&mut self) -> char {
self.code.next().unwrap_or('\0') self.code.next().unwrap_or('\0')
} }
/// Get the next character without removing it /// Get the next character without removing it. When the end of the code is reached,
/// `'\0'` is returned. This is used instead of an Option::None since it allows for much
/// shorter and cleaner code in the main loop. The `'\0'` character would not be valid anyways
fn peek(&mut self) -> char { fn peek(&mut self) -> char {
self.code.peek().copied().unwrap_or('\0') self.code.peek().copied().unwrap_or('\0')
} }
@ -229,31 +259,52 @@ impl<'a> Lexer<'a> {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::{lex, Token}; use crate::{lexer::lex, T};
/// A general test to check if the lexer actually lexes tokens correctly
#[test] #[test]
fn test_lexer() { fn test_lexer() {
let code = "33 +5*2 + 4456467*2334+3 % - / << ^ | & >>"; let code = r#"53+1-567_000 * / % | ~ ! < > & ^ ({[]});= <- >= <=
== != && || << >> loop if else print my_123var "hello \t world\r\n\"\\""#;
let expected = vec![ let expected = vec![
Token::I64(33), T![i64(53)],
Token::Add, T![+],
Token::I64(5), T![i64(1)],
Token::Mul, T![-],
Token::I64(2), T![i64(567_000)],
Token::Add, T![*],
Token::I64(4456467), T![/],
Token::Mul, T![%],
Token::I64(2334), T![|],
Token::Add, T![~],
Token::I64(3), T![!],
Token::Mod, T![<],
Token::Sub, T![>],
Token::Div, T![&],
Token::Shl, T![^],
Token::BXor, T!['('],
Token::BOr, T!['{'],
Token::BAnd, T!['['],
Token::Shr, T![']'],
T!['}'],
T![')'],
T![;],
T![=],
T![<-],
T![>=],
T![<=],
T![==],
T![!=],
T![&&],
T![||],
T![<<],
T![>>],
T![loop],
T![if],
T![else],
T![print],
T![ident("my_123var".to_string())],
T![str("hello \t world\r\n\"\\".to_string())],
]; ];
let actual = lex(code).unwrap(); let actual = lex(code).unwrap();
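
The string lexing and the `'\0'` end-of-input sentinel shown in this file can be tried in isolation. The following is a minimal sketch, not the project's lexer: `StrErr` and the free functions `peek`, `next` and `lex_str_body` are hypothetical stand-ins for `LexErr` and the `Lexer` methods, and the catch-all arm that appends ordinary characters is an assumption, since that part of the hunk is collapsed in the diff.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical error type for this sketch; the real `LexErr` has more variants.
#[derive(Debug, PartialEq)]
enum StrErr {
    MissingClosingString,
    InvalidStrEscape(char),
}

/// Peek at the next char, mapping end-of-input to the `'\0'` sentinel.
fn peek(code: &mut Peekable<Chars<'_>>) -> char {
    code.peek().copied().unwrap_or('\0')
}

/// Consume the next char, mapping end-of-input to the `'\0'` sentinel.
fn next(code: &mut Peekable<Chars<'_>>) -> char {
    code.next().unwrap_or('\0')
}

/// Lex the body of a string literal. The opening `"` must already be consumed.
fn lex_str_body(code: &mut Peekable<Chars<'_>>) -> Result<String, StrErr> {
    let mut text = String::new();
    loop {
        match peek(code) {
            // An unescaped double quote ends the string
            '"' => break,
            // End of input before the closing quote
            '\0' => return Err(StrErr::MissingClosingString),
            _ => match next(code) {
                // A backslash escapes exactly one following char
                '\\' => match next(code) {
                    'n' => text.push('\n'),
                    'r' => text.push('\r'),
                    't' => text.push('\t'),
                    '\\' => text.push('\\'),
                    '"' => text.push('"'),
                    ch => return Err(StrErr::InvalidStrEscape(ch)),
                },
                // Assumption: everything else is appended unchanged
                ch => text.push(ch),
            },
        }
    }
    // Consume the closing quote
    next(code);
    Ok(text)
}
```

Because the sentinel turns "no more input" into an ordinary `char`, every arm of the `match` stays a plain character pattern and no `Option` unwrapping is needed inside the loop.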


@ -1,5 +1,68 @@
pub mod lexer;
pub mod token;
pub mod parser;
pub mod ast; pub mod ast;
pub mod interpreter; pub mod interpreter;
pub mod lexer;
pub mod parser;
pub mod token;
pub mod stringstore;
pub mod astoptimizer;
pub mod util;
/// A bunch of full program tests using the example code programs as test subjects.
#[cfg(test)]
mod tests {
use crate::interpreter::{Interpreter, Value};
use std::fs::read_to_string;
/// Run a nek program with the given filename from the examples directory and assert that the
/// captured output matches the expected result. This only works if the program outputs exactly
/// one value as the result
fn run_example_check_single_i64_output(filename: &str, correct_result: i64) {
let mut interpreter = Interpreter::new();
// Enable output capturing. This captures all calls to `print`
interpreter.capture_output = true;
// Load and run the given program
let code = read_to_string(format!("examples/{filename}")).unwrap();
interpreter.run_str(&code);
// Compare the captured output with the expected value
let expected_output = [Value::I64(correct_result)];
assert_eq!(interpreter.output(), &expected_output);
}
#[test]
fn test_euler1() {
run_example_check_single_i64_output("euler1.nek", 233168);
}
#[test]
fn test_euler2() {
run_example_check_single_i64_output("euler2.nek", 4613732);
}
#[test]
fn test_euler3() {
run_example_check_single_i64_output("euler3.nek", 6857);
}
#[test]
fn test_euler4() {
run_example_check_single_i64_output("euler4.nek", 906609);
}
#[test]
fn test_euler5() {
run_example_check_single_i64_output("euler5.nek", 232792560);
}
#[test]
fn test_recursive_fib() {
run_example_check_single_i64_output("recursive_fib.nek", 832040);
}
#[test]
fn test_functions() {
run_example_check_single_i64_output("test_functions.nek", 69);
}
}


@ -1,16 +1,14 @@
use std::{ use std::{env::args, fs, process::exit};
env::args,
fs,
io::{stdin, stdout, Write},
};
use nek_lang::interpreter::Interpreter; use nek_lang::{interpreter::Interpreter, nice_panic};
/// Cli configuration flags and arguments. This could be done with `clap`, but since only so few
/// arguments are supported this seems kind of overkill.
#[derive(Debug, Default)] #[derive(Debug, Default)]
struct CliConfig { struct CliConfig {
print_tokens: bool, print_tokens: bool,
print_ast: bool, print_ast: bool,
interactive: bool, no_optimizations: bool,
file: Option<String>, file: Option<String>,
} }
@ -22,34 +20,40 @@ fn main() {
match arg.as_str() { match arg.as_str() {
"--token" | "-t" => conf.print_tokens = true, "--token" | "-t" => conf.print_tokens = true,
"--ast" | "-a" => conf.print_ast = true, "--ast" | "-a" => conf.print_ast = true,
"--interactive" | "-i" => conf.interactive = true, "--no-opt" | "-n" => conf.no_optimizations = true,
file if conf.file.is_none() => conf.file = Some(file.to_string()), "--help" | "-h" => print_help(),
_ => panic!("Invalid argument: '{}'", arg), file if !arg.starts_with("-") && conf.file.is_none() => {
conf.file = Some(file.to_string())
}
_ => nice_panic!("Error: Invalid argument '{}'", arg),
} }
} }
let mut interpreter = Interpreter::new(); let mut interpreter = Interpreter::new();
interpreter.print_tokens = conf.print_tokens;
interpreter.print_ast = conf.print_ast;
interpreter.optimize_ast = !conf.no_optimizations;
if let Some(file) = &conf.file { if let Some(file) = &conf.file {
let code = fs::read_to_string(file).expect(&format!("File not found: '{}'", file)); let code = match fs::read_to_string(file) {
interpreter.run_str(&code, conf.print_tokens, conf.print_ast); Ok(code) => code,
} Err(_) => nice_panic!("Error: Could not read file '{}'", file),
};
if conf.interactive || conf.file.is_none() { // Lex, parse and run the program
let mut code = String::new(); interpreter.run_str(&code);
} else {
loop { println!("Error: No file given\n");
print!(">> "); print_help();
stdout().flush().unwrap();
code.clear();
stdin().read_line(&mut code).unwrap();
if code.trim() == "exit" {
break;
}
interpreter.run_str(&code, conf.print_tokens, conf.print_ast);
}
} }
} }
fn print_help() {
println!("Usage: nek-lang [FLAGS] [FILE]");
println!("FLAGS: ");
println!("-t, --token Print the lexed tokens");
println!("-a, --ast Print the abstract syntax tree");
println!("-n, --no-opt Disable the AST optimizations");
println!("-h, --help Show this help screen");
exit(0);
}
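
The hand-rolled argument matching in `main` can be pulled into a function that returns a `Result`, which makes it testable without exiting the process. This is only a sketch under assumptions: `parse_args` is a hypothetical helper that does not exist in the repository, `CliConfig` is copied from the diff, and `--help` is left out because it exits immediately in the original.

```rust
/// Mirror of the `CliConfig` struct from the diff, with `PartialEq` added for testing.
#[derive(Debug, Default, PartialEq)]
struct CliConfig {
    print_tokens: bool,
    print_ast: bool,
    no_optimizations: bool,
    file: Option<String>,
}

/// Hypothetical helper: parse the arguments (without the program name) and
/// return an error message instead of panicking or exiting.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliConfig, String> {
    let mut conf = CliConfig::default();
    for arg in args {
        match arg.as_str() {
            "--token" | "-t" => conf.print_tokens = true,
            "--ast" | "-a" => conf.print_ast = true,
            "--no-opt" | "-n" => conf.no_optimizations = true,
            // The first argument that is not a flag is taken as the file
            file if !file.starts_with('-') && conf.file.is_none() => {
                conf.file = Some(file.to_string())
            }
            // Unknown flags and a second file name are both rejected
            _ => return Err(format!("Error: Invalid argument '{}'", arg)),
        }
    }
    Ok(conf)
}
```

A caller such as `main` would pass `std::env::args().skip(1)` and decide there how to report the error.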


@ -1,168 +1,376 @@
use std::iter::Peekable; use thiserror::Error;
use crate::ast::*; use crate::{
use crate::token::Token; ast::{Ast, BlockScope, Expression, FunDecl, If, Loop, Statement, VarDecl},
stringstore::{Sid, StringStore},
token::Token,
util::{PutBackIter, PutBackableExt},
T,
};
/// Errors that can occur while parsing
#[derive(Debug, Error)]
pub enum ParseErr {
#[error("Unexpected Token \"{0:?}\", expected \"{1}\"")]
UnexpectedToken(Token, String),
#[error("Left hand side of declaration is not a variable")]
DeclarationOfNonVar,
#[error("Use of undefined variable \"{0}\"")]
UseOfUndeclaredVar(String),
#[error("Use of undefined function \"{0}\"")]
UseOfUndeclaredFun(String),
#[error("Redeclaration of function \"{0}\"")]
RedeclarationFun(String),
#[error("Function not declared at top level \"{0}\"")]
FunctionOnNonTopLevel(String),
}
/// A result that can either be Ok, or a ParseErr
type ResPE<T> = Result<T, ParseErr>;
/// This macro can be used to quickly and easily assert that the next token matches the expected
/// token and return an appropriate error if not. Since this is intended to be used inside the
/// parser, the first argument should always be `self`.
macro_rules! validate_next {
($self:ident, $expected_tok:pat, $expected_str:expr) => {
match $self.next() {
$expected_tok => (),
tok => return Err(ParseErr::UnexpectedToken(tok, format!("{}", $expected_str))),
}
};
}
/// Parse the given tokens into an abstract syntax tree /// Parse the given tokens into an abstract syntax tree
pub fn parse<T: Iterator<Item = Token>, A: IntoIterator<IntoIter = T>>(tokens: A) -> Ast { pub fn parse<T: Iterator<Item = Token>, A: IntoIterator<IntoIter = T>>(tokens: A) -> ResPE<Ast> {
let mut parser = Parser::new(tokens); let parser = Parser::new(tokens);
parser.parse() parser.parse()
} }
/// A parser that takes in a Token Stream and can create a full abstract syntax tree from it.
struct Parser<T: Iterator<Item = Token>> { struct Parser<T: Iterator<Item = Token>> {
tokens: Peekable<T>, tokens: PutBackIter<T>,
string_store: StringStore,
var_stack: Vec<Sid>,
fun_stack: Vec<Sid>,
nesting_level: usize,
} }
impl<T: Iterator<Item = Token>> Parser<T> { impl<T: Iterator<Item = Token>> Parser<T> {
/// Create a new parser to parse the given Token Stream /// Create a new parser to parse the given Token Stream
fn new<A: IntoIterator<IntoIter = T>>(tokens: A) -> Self { pub fn new<A: IntoIterator<IntoIter = T>>(tokens: A) -> Self {
let tokens = tokens.into_iter().peekable(); let tokens = tokens.into_iter().putbackable();
Self { tokens } let string_store = StringStore::new();
let var_stack = Vec::new();
let fun_stack = Vec::new();
Self {
tokens,
string_store,
var_stack,
fun_stack,
nesting_level: 0,
}
} }
/// Parse tokens into an abstract syntax tree. This will continuously parse statements until /// Consume the parser and try to create the abstract syntax tree from the token stream
/// encountering end-of-file or a block end '}' . pub fn parse(mut self) -> ResPE<Ast> {
fn parse(&mut self) -> Ast { let main = self.parse_scoped_block()?;
Ok(Ast {
main,
stringstore: self.string_store,
})
}
/// Parse a series of statements together as a BlockScope. This will continuously parse
/// statements until encountering end-of-file or a block end '}' .
fn parse_scoped_block(&mut self) -> ResPE<BlockScope> {
self.parse_scoped_block_fp_offset(0)
}
/// Same as parse_scoped_block, but an offset to the framepointer can be specified to allow
/// for easily passing variables into scopes from the outside. This is used when parsing
/// function declarations, where the arguments are pushed before the body block is parsed
fn parse_scoped_block_fp_offset(&mut self, framepointer_offset: usize) -> ResPE<BlockScope> {
self.nesting_level += 1;
let framepointer = self.var_stack.len() - framepointer_offset;
let mut prog = Vec::new(); let mut prog = Vec::new();
loop { loop {
match self.peek() { match self.peek() {
Token::Semicolon => { // Just a semicolon is an empty statement. So just consume it
T![;] => {
self.next(); self.next();
} }
Token::EoF | Token::RBraces => break,
// By default try to lex a statement // '}' ends the current block and EoF ends everything, as the end of the token stream
_ => prog.push(self.parse_stmt()), // is reached
T![EoF] | T!['}'] => break,
// Create a new scoped block
T!['{'] => {
self.next();
prog.push(Statement::Block(self.parse_scoped_block()?));
validate_next!(self, T!['}'], "}");
}
// By default try to parse a statement
_ => prog.push(self.parse_stmt()?),
} }
} }
Ast { prog } // Reset the stack to where it was before entering the scope
self.var_stack.truncate(framepointer);
self.nesting_level -= 1;
Ok(prog)
} }
/// Parse a single statement from the tokens. /// Parse a single statement from the tokens
fn parse_stmt(&mut self) -> Statement { fn parse_stmt(&mut self) -> ResPE<Statement> {
match self.peek() { let stmt = match self.peek() {
Token::Loop => Statement::Loop(self.parse_loop()), // Break statement
T![break] => {
Token::Print => {
self.next(); self.next();
let expr = self.parse_expr(); // After the statement, there must be a semicolon
validate_next!(self, T![;], ";");
// After a statement, there must be a semicolon Statement::Break
if !matches!(self.next(), Token::Semicolon) { }
panic!("Expected semicolon after statement");
} // Continue statement
T![continue] => {
self.next();
// After the statement, there must be a semicolon
validate_next!(self, T![;], ";");
Statement::Continue
}
// Loop statement
T![loop] => Statement::Loop(self.parse_loop()?),
// Print statement
T![print] => {
self.next();
let expr = self.parse_expr()?;
// After the statement, there must be a semicolon
validate_next!(self, T![;], ";");
Statement::Print(expr) Statement::Print(expr)
} }
Token::If => Statement::If(self.parse_if()), // Return statement
T![return] => {
// If it is not a loop, try to lex as an expression self.next();
_ => { let stmt = Statement::Return(self.parse_expr()?);
let stmt = Statement::Expr(self.parse_expr());
// After a statement, there must be a semicolon // After a statement, there must be a semicolon
if !matches!(self.next(), Token::Semicolon) { validate_next!(self, T![;], ";");
panic!("Expected semicolon after statement");
}
stmt stmt
} }
}
// If statement
T![if] => Statement::If(self.parse_if()?),
// Function definition statement
T![fun] => {
self.next();
// Expect an identifier as the function name
let fun_name = match self.next() {
T![ident(fun_name)] => fun_name,
tok => return Err(ParseErr::UnexpectedToken(tok, "<ident>".to_string())),
};
// Only allow function definitions on the top level
if self.nesting_level > 1 {
return Err(ParseErr::FunctionOnNonTopLevel(fun_name));
}
// Intern the function name
let fun_name = self.string_store.intern_or_lookup(&fun_name);
// Check if the function name already exists
if self.fun_stack.contains(&fun_name) {
return Err(ParseErr::RedeclarationFun(
self.string_store
.lookup(fun_name)
.cloned()
.unwrap_or("<unknown>".to_string()),
));
}
// Put the function name on the function stack for precalculating the stack
// positions
let fun_stackpos = self.fun_stack.len();
self.fun_stack.push(fun_name);
let mut arg_names = Vec::new();
validate_next!(self, T!['('], "(");
// Parse the optional arguments inside the parentheses
while matches!(self.peek(), T![ident(_)]) {
let var_name = match self.next() {
T![ident(var_name)] => var_name,
_ => unreachable!(),
};
// Intern argument names
let var_name = self.string_store.intern_or_lookup(&var_name);
arg_names.push(var_name);
// Push the variable onto the varstack
self.var_stack.push(var_name);
// If there are more args skip the comma so that the loop will read the argname
if self.peek() == &T![,] {
self.next();
}
}
validate_next!(self, T![')'], ")");
validate_next!(self, T!['{'], "{");
// Create the scoped block with a stack offset. This will pop the args that are
// added to the stack while parsing args
let body = self.parse_scoped_block_fp_offset(arg_names.len())?;
validate_next!(self, T!['}'], "}");
Statement::FunDeclare(FunDecl {
name: fun_name,
fun_stackpos,
argnames: arg_names,
body: body.into(),
})
}
// Either a variable declaration statement or an expression statement
_ => {
// To decide if it is a declaration or an expression, a lookahead is needed
let first = self.next();
let stmt = match (first, self.peek()) {
// Identifier and "<-" is a declaration
(T![ident(name)], T![<-]) => {
self.next();
let rhs = self.parse_expr()?;
let sid = self.string_store.intern_or_lookup(&name);
let sp = self.var_stack.len();
self.var_stack.push(sid);
Statement::Declaration(VarDecl {
name: sid,
var_stackpos: sp,
rhs,
})
}
// Anything else must be an expression
(first, _) => {
// Put the first token back in order for the parse_expr to see it
self.putback(first);
Statement::Expr(self.parse_expr()?)
}
};
// After a statement, there must be a semicolon
validate_next!(self, T![;], ";");
stmt
}
};
Ok(stmt)
} }
/// Parse an if statement from the tokens /// Parse an if statement from the tokens
fn parse_if(&mut self) -> If { fn parse_if(&mut self) -> ResPE<If> {
if !matches!(self.next(), Token::If) { validate_next!(self, T![if], "if");
panic!("Error lexing if: Expected if token");
}
let condition = self.parse_expr(); let condition = self.parse_expr()?;
if !matches!(self.next(), Token::LBraces) { validate_next!(self, T!['{'], "{");
panic!("Error lexing if: Expected '{{'")
}
let body_true = self.parse(); let body_true = self.parse_scoped_block()?;
if !matches!(self.next(), Token::RBraces) { validate_next!(self, T!['}'], "}");
panic!("Error lexing if: Expected '}}'")
}
let mut body_false = Ast::default(); let mut body_false = BlockScope::default();
if matches!(self.peek(), Token::Else) { // Optionally parse the else part
if self.peek() == &T![else] {
self.next(); self.next();
if !matches!(self.next(), Token::LBraces) { validate_next!(self, T!['{'], "{");
panic!("Error lexing if: Expected '{{'")
}
body_false = self.parse(); body_false = self.parse_scoped_block()?;
if !matches!(self.next(), Token::RBraces) { validate_next!(self, T!['}'], "}");
panic!("Error lexing if: Expected '}}'")
}
} }
If { Ok(If {
condition, condition,
body_true, body_true,
body_false, body_false,
} })
} }
/// Parse a loop statement from the tokens /// Parse a loop statement from the tokens
fn parse_loop(&mut self) -> Loop { fn parse_loop(&mut self) -> ResPE<Loop> {
if !matches!(self.next(), Token::Loop) { validate_next!(self, T![loop], "loop");
panic!("Error lexing loop: Expected loop token");
}
let condition = self.parse_expr(); let mut condition = None;
let mut advancement = None; let mut advancement = None;
let body; // Check if the optional condition is present
if !matches!(self.peek(), T!['{']) {
condition = Some(self.parse_expr()?);
match self.next() { // Check if the optional advancement is present
Token::LBraces => { if matches!(self.peek(), T![;]) {
body = self.parse(); self.next();
advancement = Some(self.parse_expr()?);
} }
Token::Semicolon => {
advancement = Some(self.parse_expr());
if !matches!(self.next(), Token::LBraces) {
panic!("Error lexing loop: Expected '{{'")
}
body = self.parse();
}
_ => panic!("Error lexing loop: Expected ';' or '{{'"),
} }
if !matches!(self.next(), Token::RBraces) { validate_next!(self, T!['{'], "{");
panic!("Error lexing loop: Expected '}}'")
}
Loop { let body = self.parse_scoped_block()?;
validate_next!(self, T!['}'], "}");
Ok(Loop {
condition, condition,
advancement, advancement,
body, body,
} })
} }
/// Parse a single expression from the tokens /// Parse a single expression from the tokens
fn parse_expr(&mut self) -> Expression { fn parse_expr(&mut self) -> ResPE<Expression> {
let lhs = self.parse_primary(); let lhs = self.parse_primary()?;
self.parse_expr_precedence(lhs, 0) self.parse_expr_precedence(lhs, 0)
} }
/// Parse binary expressions with a precedence equal to or higher than min_prec /// Parse binary expressions with a precedence equal to or higher than min_prec.
fn parse_expr_precedence(&mut self, mut lhs: Expression, min_prec: u8) -> Expression { /// This uses the precedence climbing method for dealing with operator precedence:
/// https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
fn parse_expr_precedence(&mut self, mut lhs: Expression, min_prec: u8) -> ResPE<Expression> {
while let Some(binop) = &self.peek().try_to_binop() { while let Some(binop) = &self.peek().try_to_binop() {
// Stop if the next operator has a lower binding power // Stop if the next operator has a lower binding power
if !(binop.precedence() >= min_prec) { if !(binop.precedence() >= min_prec) {
@ -173,99 +381,190 @@ impl<T: Iterator<Item = Token>> Parser<T> {
// valid // valid
let binop = self.next().try_to_binop().unwrap(); let binop = self.next().try_to_binop().unwrap();
let mut rhs = self.parse_primary(); let mut rhs = self.parse_primary()?;
while let Some(binop2) = &self.peek().try_to_binop() { while let Some(binop2) = &self.peek().try_to_binop() {
if !(binop2.precedence() > binop.precedence()) { if !(binop2.precedence() > binop.precedence()) {
break; break;
} }
rhs = self.parse_expr_precedence(rhs, binop.precedence() + 1); rhs = self.parse_expr_precedence(rhs, binop.precedence() + 1)?;
} }
lhs = Expression::BinOp(binop, lhs.into(), rhs.into()); lhs = Expression::BinOp(binop, lhs.into(), rhs.into());
} }
lhs Ok(lhs)
} }
/// Parse a primary expression (for now only number) /// Parse a primary expression. A primary can be a literal value, variable, function call,
fn parse_primary(&mut self) -> Expression { /// array indexing, parentheses grouping or a unary operation
match self.next() { fn parse_primary(&mut self) -> ResPE<Expression> {
let primary = match self.next() {
// Literal i64 // Literal i64
Token::I64(val) => Expression::I64(val), T![i64(val)] => Expression::I64(val),
// Literal String // Literal String
Token::String(text) => Expression::String(text.into()), T![str(text)] => Expression::String(self.string_store.intern_or_lookup(&text)),
Token::Ident(name) => Expression::Var(name), // Array literal. Square brackets containing the array size as expression
T!['['] => {
let size = self.parse_expr()?;
validate_next!(self, T![']'], "]");
Expression::ArrayLiteral(size.into())
}
// Array access, aka indexing. An ident followed by square brackets containing the
// index as an expression
T![ident(name)] if self.peek() == &T!['['] => {
// Get the stack position of the array variable
let sid = self.string_store.intern_or_lookup(&name);
let stackpos = self.get_stackpos(sid)?;
self.next();
let index = self.parse_expr()?;
validate_next!(self, T![']'], "]");
Expression::ArrayAccess(sid, stackpos, index.into())
}
// Identifier followed by parenthesis is a function call
T![ident(name)] if self.peek() == &T!['('] => {
// Skip the opening parenthesis
self.next();
let sid = self.string_store.intern_or_lookup(&name);
let mut args = Vec::new();
// Parse the arguments as expressions
while !matches!(self.peek(), T![')']) {
let arg = self.parse_expr()?;
args.push(arg);
// If there are more args skip the comma so that the loop will read the next argument
if self.peek() == &T![,] {
self.next();
}
}
validate_next!(self, T![')'], ")");
// Find the function stack position
let fun_stackpos = self.get_fun_stackpos(sid)?;
Expression::FunCall(sid, fun_stackpos, args)
}
// Just an identifier is a variable
T![ident(name)] => {
// Find the variable stack position
let sid = self.string_store.intern_or_lookup(&name);
let stackpos = self.get_stackpos(sid)?;
Expression::Var(sid, stackpos)
}
// Parentheses grouping // Parentheses grouping
Token::LParen => { T!['('] => {
let inner_expr = self.parse_expr(); // Any other expression can be contained in between the parentheses
let inner_expr = self.parse_expr()?;
// Verify that there is a closing parenthesis // Verify that there is a closing parenthesis
if !matches!(self.next(), Token::RParen) { validate_next!(self, T![')'], ")");
panic!("Error parsing primary expr: Exepected closing parenthesis ')'");
}
inner_expr inner_expr
} }
// Unary negation // Unary operations or invalid token
Token::Sub => { tok => match tok.try_to_unop() {
let operand = self.parse_primary(); // If the token is a valid unary operation, parse it as such
Expression::UnOp(UnOpType::Negate, operand.into()) Some(uot) => Expression::UnOp(uot, self.parse_primary()?.into()),
}
// Otherwise it's an unexpected token
None => return Err(ParseErr::UnexpectedToken(tok, "primary".to_string())),
},
};
// Unary bitwise not (bitflip) Ok(primary)
Token::Tilde => {
let operand = self.parse_primary();
Expression::UnOp(UnOpType::BNot, operand.into())
}
// Unary logical not
Token::LNot => {
let operand = self.parse_primary();
Expression::UnOp(UnOpType::LNot, operand.into())
}
tok => panic!("Error parsing primary expr: Unexpected Token '{:?}'", tok),
}
} }
/// Get the next Token without removing it /// Try to get the position of a variable on the variable stack. This is needed to precalculate
/// the stackpositions in order to save time when executing
fn get_stackpos(&self, varid: Sid) -> ResPE<usize> {
self.var_stack
.iter()
.rev()
.position(|it| *it == varid)
.ok_or(ParseErr::UseOfUndeclaredVar(
self.string_store
.lookup(varid)
.map(String::from)
.unwrap_or("<unknown>".to_string()),
))
}
/// Try to get the position of a function on the function stack. This is needed to precalculate
/// the stackpositions in order to save time when executing
fn get_fun_stackpos(&self, varid: Sid) -> ResPE<usize> {
self.fun_stack
.iter()
.rev()
.position(|it| *it == varid)
.map(|it| self.fun_stack.len() - it - 1)
.ok_or(ParseErr::UseOfUndeclaredFun(
self.string_store
.lookup(varid)
.map(String::from)
.unwrap_or("<unknown>".to_string()),
))
}
/// Get the next Token without removing it. If there are no more tokens left, the EoF token is
/// returned. This follows the same reasoning as in the Lexer
fn peek(&mut self) -> &Token { fn peek(&mut self) -> &Token {
self.tokens.peek().unwrap_or(&Token::EoF) self.tokens.peek().unwrap_or(&T![EoF])
} }
/// Advance to next Token and return the removed Token /// Put a single token back into the token stream
fn putback(&mut self, tok: Token) {
self.tokens.putback(tok);
}
/// Advance to next Token and return the removed Token. If there are no more tokens left, the
/// EoF token is returned. This follows the same reasoning as in the Lexer
fn next(&mut self) -> Token { fn next(&mut self) -> Token {
self.tokens.next().unwrap_or(Token::EoF) self.tokens.next().unwrap_or(T![EoF])
} }
} }
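
The parser relies on `PutBackIter` and `PutBackableExt` from `util`, which are not part of this diff. The sketch below shows one way such an adapter could work; `PutBack` is a hypothetical name and the real implementation may differ. It keeps a small stash of items that is drained before the wrapped iterator is asked again, which is enough to support the one-token lookahead used when deciding between a declaration and an expression statement.

```rust
/// Hypothetical sketch of an iterator adapter with put-back support; the real
/// `util::PutBackIter` is not shown in this diff and may differ.
struct PutBack<I: Iterator> {
    iter: I,
    /// Items that were put back; the most recently put back item is yielded first.
    stash: Vec<I::Item>,
}

impl<I: Iterator> PutBack<I> {
    fn new(iter: I) -> Self {
        Self { iter, stash: Vec::new() }
    }

    /// Look at the next item without consuming it.
    fn peek(&mut self) -> Option<&I::Item> {
        if self.stash.is_empty() {
            // Pull one item from the wrapped iterator and park it in the stash
            if let Some(item) = self.iter.next() {
                self.stash.push(item);
            }
        }
        self.stash.last()
    }

    /// Push an item back so that it is returned by the next call to `next`.
    fn putback(&mut self, item: I::Item) {
        self.stash.push(item);
    }
}

impl<I: Iterator> Iterator for PutBack<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Stashed items take priority over the wrapped iterator
        self.stash.pop().or_else(|| self.iter.next())
    }
}
```

Compared to `std::iter::Peekable`, which was used before this change, the stash allows an already consumed token to be returned to the stream after a second token has been peeked.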
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::{parse, BinOpType, Expression};
use crate::{ use crate::{
parser::{Ast, Statement}, ast::{BinOpType, Expression, Statement},
token::Token, parser::parse,
T,
}; };
/// A very simple test to check if the parser correctly parses a simple expression
#[test] #[test]
fn test_parser() { fn test_parser() {
// Expression: 1 + 2 * 3 + 4 // Expression: 1 + 2 * 3 - 4
// With precedence: (1 + (2 * 3)) + 4 // With precedence: (1 + (2 * 3)) - 4
let tokens = [ let tokens = [
Token::I64(1), T![i64(1)],
Token::Add, T![+],
Token::I64(2), T![i64(2)],
Token::Mul, T![*],
Token::I64(3), T![i64(3)],
Token::Sub, T![-],
Token::I64(4), T![i64(4)],
Token::Semicolon, T![;],
]; ];
let expected = Statement::Expr(Expression::BinOp( let expected = Statement::Expr(Expression::BinOp(
@ -284,11 +583,9 @@ mod tests {
Expression::I64(4).into(), Expression::I64(4).into(),
)); ));
let expected = Ast { let expected = vec![expected];
prog: vec![expected],
};
let actual = parse(tokens); let actual = parse(tokens).unwrap();
assert_eq!(expected, actual); assert_eq!(expected, actual.main);
} }
} }
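
The precedence climbing used by `parse_expr_precedence` can be illustrated with a much smaller grammar. The sketch below is an assumption-laden miniature, not the project's parser: `Tok`, `precedence`, `apply` and `eval` are hypothetical, only three operators are supported, and the expression is evaluated directly instead of building `Expression::BinOp` nodes.

```rust
use std::iter::Peekable;

/// Hypothetical miniature token type, only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Add,
    Sub,
    Mul,
}

/// Binding power of a binary operator token, `None` for non-operators.
fn precedence(tok: Tok) -> Option<u8> {
    match tok {
        Tok::Add | Tok::Sub => Some(0),
        Tok::Mul => Some(1),
        Tok::Num(_) => None,
    }
}

/// Evaluate a single binary operation.
fn apply(op: Tok, lhs: i64, rhs: i64) -> i64 {
    match op {
        Tok::Add => lhs + rhs,
        Tok::Sub => lhs - rhs,
        Tok::Mul => lhs * rhs,
        Tok::Num(_) => unreachable!("only operators are passed to apply"),
    }
}

/// A primary is just a number in this sketch.
fn parse_primary<I: Iterator<Item = Tok>>(toks: &mut Peekable<I>) -> Option<i64> {
    match toks.next()? {
        Tok::Num(n) => Some(n),
        _ => None,
    }
}

/// Fold in all operators with a precedence of at least `min_prec`.
fn parse_expr_precedence<I: Iterator<Item = Tok>>(
    toks: &mut Peekable<I>,
    mut lhs: i64,
    min_prec: u8,
) -> Option<i64> {
    while let Some(op) = toks.peek().copied() {
        // Stop on non-operators and on operators that bind weaker than min_prec
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        toks.next();
        let mut rhs = parse_primary(toks)?;
        // Let tighter binding operators claim the right hand side first
        while let Some(next_prec) = toks.peek().copied().and_then(precedence) {
            if next_prec <= prec {
                break;
            }
            rhs = parse_expr_precedence(toks, rhs, prec + 1)?;
        }
        lhs = apply(op, lhs, rhs);
    }
    Some(lhs)
}

/// Evaluate a token sequence, returning `None` on malformed input.
fn eval<I: IntoIterator<Item = Tok>>(toks: I) -> Option<i64> {
    let mut toks = toks.into_iter().peekable();
    let lhs = parse_primary(&mut toks)?;
    parse_expr_precedence(&mut toks, lhs, 0)
}
```

For the token sequence from the test above, `1 + 2 * 3 - 4`, the inner loop hands `2 * 3` to the recursive call, so the result is grouped as `(1 + (2 * 3)) - 4`.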

src/stringstore.rs (new file, 104 lines)

@ -0,0 +1,104 @@
use std::collections::HashMap;
/// A StringID that identifies a String inside the stringstore. This is only valid for the
/// StringStore that created the ID. These StringIDs can be trivially and cheaply copied
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Sid(usize);
/// A data structure that stores strings, handing out StringIDs that can be used to retrieve the
/// real strings at a later point. This is called interning.
#[derive(Clone, Default)]
pub struct StringStore {
/// The actual strings that are stored in the StringStore. The StringIDs match the index of the
/// string inside of this strings vector
strings: Vec<String>,
/// A HashMap that maps already interned Strings to their StringID. This allows for
/// deduplication since the same string won't be stored twice
sids: HashMap<String, Sid>,
}
impl StringStore {
/// Create a new empty StringStore
pub fn new() -> Self {
Self { strings: Vec::new(), sids: HashMap::new() }
}
/// Put the given string into the StringStore and get a StringID in return. If the string is
/// not yet stored, it will be after this.
///
/// Note: The generated StringIDs are only valid for the StringStore that created them. Using
/// the IDs with another StringStore gives unspecified results (not undefined behavior in the
/// Rust sense): it might return wrong Strings or None.
pub fn intern_or_lookup(&mut self, text: &str) -> Sid {
self.sids.get(text).copied().unwrap_or_else(|| {
let sid = Sid(self.strings.len());
self.strings.push(text.to_string());
self.sids.insert(text.to_string(), sid);
sid
})
}
/// Lookup and retrieve a string by the StringID. If the String is not found, None is returned.
///
/// Note: The generated StringIDs are only valid for the StringStore that created them. Using
/// the IDs with another StringStore gives unspecified results (not undefined behavior in the
/// Rust sense): it might return wrong Strings or None.
pub fn lookup(&self, sid: Sid) -> Option<&String> {
self.strings.get(sid.0)
}
}
#[cfg(test)]
mod tests {
use super::StringStore;
#[test]
fn test_stringstore_intern_lookup() {
let mut ss = StringStore::new();
let s1 = "Hello";
let s2 = "World";
let id1 = ss.intern_or_lookup(s1);
assert_eq!(ss.lookup(id1).unwrap().as_str(), s1);
let id2 = ss.intern_or_lookup(s2);
assert_eq!(ss.lookup(id2).unwrap().as_str(), s2);
assert_eq!(ss.lookup(id1).unwrap().as_str(), s1);
}
#[test]
fn test_stringstore_no_duplicates() {
let mut ss = StringStore::new();
let s1 = "Hello";
let s2 = "World";
let id1_1 = ss.intern_or_lookup(s1);
assert_eq!(ss.lookup(id1_1).unwrap().as_str(), s1);
let id1_2 = ss.intern_or_lookup(s1);
assert_eq!(ss.lookup(id1_2).unwrap().as_str(), s1);
// Check that both calls returned the same StringID
assert_eq!(id1_1, id1_2);
// Check that only one string is actually stored
assert_eq!(ss.strings.len(), 1);
assert_eq!(ss.sids.len(), 1);
let id2_1 = ss.intern_or_lookup(s2);
assert_eq!(ss.lookup(id2_1).unwrap().as_str(), s2);
let id2_2 = ss.intern_or_lookup(s2);
assert_eq!(ss.lookup(id2_2).unwrap().as_str(), s2);
// Check that both calls returned the same StringID
assert_eq!(id2_1, id2_2);
assert_eq!(ss.strings.len(), 2);
assert_eq!(ss.sids.len(), 2);
}
}
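
To show why interning pays off for the parser, the sketch below pairs a compact copy of the store with a variable stack lookup in the style of `get_stackpos`. It is an illustration under assumptions: `stack_offset` is a hypothetical free function, and `lookup` returns `Option<&str>` here for brevity, while the version in this file returns `Option<&String>`.

```rust
use std::collections::HashMap;

/// Cheap, copyable handle for an interned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Sid(usize);

/// Compact copy of the interner from this file, for illustration only.
#[derive(Default)]
struct StringStore {
    strings: Vec<String>,
    sids: HashMap<String, Sid>,
}

impl StringStore {
    /// Return the existing id for `text`, or store it and hand out a new id.
    fn intern_or_lookup(&mut self, text: &str) -> Sid {
        if let Some(sid) = self.sids.get(text) {
            return *sid;
        }
        let sid = Sid(self.strings.len());
        self.strings.push(text.to_string());
        self.sids.insert(text.to_string(), sid);
        sid
    }

    /// Resolve an id back to its string, if it belongs to this store.
    fn lookup(&self, sid: Sid) -> Option<&str> {
        self.strings.get(sid.0).map(String::as_str)
    }
}

/// Hypothetical helper: offset of a variable from the top of the variable
/// stack. Searching from the end means the innermost declaration wins.
fn stack_offset(var_stack: &[Sid], var: Sid) -> Option<usize> {
    var_stack.iter().rev().position(|it| *it == var)
}
```

Once names are interned, every comparison on the variable stack is an integer comparison, and the string itself is only needed again when an error message is built.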


@ -1,146 +1,379 @@
use crate::ast::BinOpType; use crate::{
ast::{BinOpType, UnOpType},
T,
};
/// Language keywords
#[derive(Debug, PartialEq, Eq)] #[derive(Debug, PartialEq, Eq)]
pub enum Token { pub enum Keyword {
/// Loop keyword ("loop")
Loop,
/// Print keyword ("print")
Print,
/// If keyword ("if")
If,
/// Else keyword ("else")
Else,
/// Function declaration keyword ("fun")
Fun,
/// Return keyword ("return")
Return,
/// Break keyword ("break")
Break,
/// Continue keyword ("continue")
Continue,
}
/// Literal values
#[derive(Debug, PartialEq, Eq)]
pub enum Literal {
/// Integer literal (64-bit) /// Integer literal (64-bit)
I64(i64), I64(i64),
/// String literal /// String literal
String(String), String(String),
}
/// Identifier (name for variables, functions, ...) /// Combined tokens that consist of a combination of characters
#[derive(Debug, PartialEq, Eq)]
pub enum Combo {
/// Equal Equal ("==")
Equal2,
/// Exclamation mark Equal ("!=")
ExclamationMarkEqual,
/// Ampersand Ampersand ("&&")
Ampersand2,
/// Pipe Pipe ("||")
Pipe2,
/// LessThan LessThan ("<<")
LessThan2,
/// GreaterThan GreaterThan (">>")
GreaterThan2,
/// LessThan Equal ("<=")
LessThanEqual,
/// GreaterThan Equal (">=")
GreaterThanEqual,
/// LessThan Minus ("<-")
LessThanMinus,
}
/// Tokens are a group of one or more sourcecode characters that have a meaning together
#[derive(Debug, PartialEq, Eq)]
pub enum Token {
/// Literal value token
Literal(Literal),
/// Keyword token
Keyword(Keyword),
/// Identifier token (names for variables, functions, ...)
Ident(String), Ident(String),
/// Loop keyword (loop) /// Combined tokens consisting of multiple characters
Loop, Combo(Combo),
/// Print keyword (print) /// Comma (",")
Print, Comma,
/// If keyword (if) /// Equal Sign ("=")
If, Equal,
/// Else keyword (else) /// Semicolon (";")
Else,
/// Left Parenthesis ('(')
LParen,
/// Right Parenthesis (')')
RParen,
/// Left curly braces ({)
LBraces,
/// Right curly braces (})
RBraces,
/// Plus (+)
Add,
/// Minus (-)
Sub,
/// Asterisk (*)
Mul,
/// Slash (/)
Div,
/// Percent (%)
Mod,
/// Equal Equal (==)
EquEqu,
/// Exclamationmark Equal (!=)
NotEqu,
/// Pipe (|)
BOr,
/// Ampersand (&)
BAnd,
/// Circumflex (^)
BXor,
/// Logical AND (&&)
LAnd,
/// Logical OR (||)
LOr,
/// Shift Left (<<)
Shl,
/// Shift Right (>>)
Shr,
/// Tilde (~)
Tilde,
/// Logical not (!)
LNot,
/// Left angle bracket (<)
LAngle,
/// Right angle bracket (>)
RAngle,
/// Left angle bracket Equal (<=)
LAngleEqu,
/// Left angle bracket Equal (>=)
RAngleEqu,
/// Left arrow (<-)
LArrow,
/// Equal Sign (=)
Equ,
/// Semicolon (;)
Semicolon, Semicolon,
/// End of file /// End of file (This is not generated by the lexer, but the parser uses this to find the
/// end of the token stream)
EoF, EoF,
/// Left Bracket ("[")
LBracket,
/// Right Bracket ("]")
RBracket,
/// Left Parenthesis ("(")
LParen,
/// Right Parenthesis (")")
RParen,
/// Left curly braces ("{")
LBraces,
/// Right curly braces ("}")
RBraces,
/// Plus ("+")
Plus,
/// Minus ("-")
Minus,
/// Asterisk ("*")
Asterisk,
/// Slash ("/")
Slash,
/// Percent ("%")
Percent,
/// Pipe ("|")
Pipe,
/// Tilde ("~")
Tilde,
/// Logical not ("!")
Exclamationmark,
/// Left angle bracket ("<")
LessThan,
/// Right angle bracket (">")
GreaterThan,
/// Ampersand ("&")
Ampersand,
/// Circumflex ("^")
Circumflex,
} }
impl Token { impl Token {
/// If the Token can be used as a binary operation type, get the matching BinOpType. Otherwise
/// return None.
pub fn try_to_binop(&self) -> Option<BinOpType> { pub fn try_to_binop(&self) -> Option<BinOpType> {
Some(match self { Some(match self {
Token::Add => BinOpType::Add, T![+] => BinOpType::Add,
Token::Sub => BinOpType::Sub, T![-] => BinOpType::Sub,
Token::Mul => BinOpType::Mul, T![*] => BinOpType::Mul,
Token::Div => BinOpType::Div, T![/] => BinOpType::Div,
Token::Mod => BinOpType::Mod, T![%] => BinOpType::Mod,
Token::BAnd => BinOpType::BAnd, T![&] => BinOpType::BAnd,
Token::BOr => BinOpType::BOr, T![|] => BinOpType::BOr,
Token::BXor => BinOpType::BXor, T![^] => BinOpType::BXor,
Token::LAnd => BinOpType::LAnd, T![&&] => BinOpType::LAnd,
Token::LOr => BinOpType::LOr, T![||] => BinOpType::LOr,
Token::Shl => BinOpType::Shl, T![<<] => BinOpType::Shl,
Token::Shr => BinOpType::Shr, T![>>] => BinOpType::Shr,
Token::EquEqu => BinOpType::EquEqu, T![==] => BinOpType::EquEqu,
Token::NotEqu => BinOpType::NotEqu, T![!=] => BinOpType::NotEqu,
Token::LAngle => BinOpType::Less, T![<] => BinOpType::Less,
Token::LAngleEqu => BinOpType::LessEqu, T![<=] => BinOpType::LessEqu,
Token::RAngle => BinOpType::Greater, T![>] => BinOpType::Greater,
Token::RAngleEqu => BinOpType::GreaterEqu, T![>=] => BinOpType::GreaterEqu,
Token::LArrow => BinOpType::Declare, T![=] => BinOpType::Assign,
Token::Equ => BinOpType::Assign,
_ => return None,
})
}
/// If the token can be used as a unary operation type, get the matching UnOpType. Otherwise
/// return None.
pub fn try_to_unop(&self) -> Option<UnOpType> {
Some(match self {
T![-] => UnOpType::Negate,
T![!] => UnOpType::LNot,
T![~] => UnOpType::BNot,
_ => return None, _ => return None,
}) })
} }
} }
/// Macro to quickly create a token of the specified kind. As this is implemented as a macro, it
/// can be used anywhere including in patterns.
///
/// An implementation should exist for each token, so that there is no need to ever write out the
/// long token definitions.
#[macro_export]
macro_rules! T {
// Keywords
[loop] => {
crate::token::Token::Keyword(crate::token::Keyword::Loop)
};
[print] => {
crate::token::Token::Keyword(crate::token::Keyword::Print)
};
[if] => {
crate::token::Token::Keyword(crate::token::Keyword::If)
};
[else] => {
crate::token::Token::Keyword(crate::token::Keyword::Else)
};
[fun] => {
crate::token::Token::Keyword(crate::token::Keyword::Fun)
};
[return] => {
crate::token::Token::Keyword(crate::token::Keyword::Return)
};
[break] => {
crate::token::Token::Keyword(crate::token::Keyword::Break)
};
[continue] => {
crate::token::Token::Keyword(crate::token::Keyword::Continue)
};
// Literals
[i64($($val:tt)*)] => {
crate::token::Token::Literal(crate::token::Literal::I64($($val)*))
};
[str($($val:tt)*)] => {
crate::token::Token::Literal(crate::token::Literal::String($($val)*))
};
// Ident
[ident($($val:tt)*)] => {
crate::token::Token::Ident($($val)*)
};
// Combo Tokens
[==] => {
crate::token::Token::Combo(crate::token::Combo::Equal2)
};
[!=] => {
crate::token::Token::Combo(crate::token::Combo::ExclamationMarkEqual)
};
[&&] => {
crate::token::Token::Combo(crate::token::Combo::Ampersand2)
};
[||] => {
crate::token::Token::Combo(crate::token::Combo::Pipe2)
};
[<<] => {
crate::token::Token::Combo(crate::token::Combo::LessThan2)
};
[>>] => {
crate::token::Token::Combo(crate::token::Combo::GreaterThan2)
};
[<=] => {
crate::token::Token::Combo(crate::token::Combo::LessThanEqual)
};
[>=] => {
crate::token::Token::Combo(crate::token::Combo::GreaterThanEqual)
};
[<-] => {
crate::token::Token::Combo(crate::token::Combo::LessThanMinus)
};
// Normal Tokens
[,] => {
crate::token::Token::Comma
};
[=] => {
crate::token::Token::Equal
};
[;] => {
crate::token::Token::Semicolon
};
[EoF] => {
crate::token::Token::EoF
};
['['] => {
crate::token::Token::LBracket
};
[']'] => {
crate::token::Token::RBracket
};
['('] => {
crate::token::Token::LParen
};
[')'] => {
crate::token::Token::RParen
};
['{'] => {
crate::token::Token::LBraces
};
['}'] => {
crate::token::Token::RBraces
};
[+] => {
crate::token::Token::Plus
};
[-] => {
crate::token::Token::Minus
};
[*] => {
crate::token::Token::Asterisk
};
[/] => {
crate::token::Token::Slash
};
[%] => {
crate::token::Token::Percent
};
[|] => {
crate::token::Token::Pipe
};
[~] => {
crate::token::Token::Tilde
};
[!] => {
crate::token::Token::Exclamationmark
};
[<] => {
crate::token::Token::LessThan
};
[>] => {
crate::token::Token::GreaterThan
};
[&] => {
crate::token::Token::Ampersand
};
[^] => {
crate::token::Token::Circumflex
};
}
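
As the doc comment on `T!` says, the macro can be used in patterns as well as in expressions. The following is a reduced sketch of that idea, not the full macro: it covers only a handful of tokens, and it uses unqualified `Token::...` paths instead of `crate::token::...` so that it fits in a single file. The helper `describe` is hypothetical and only exists for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Combo {
    Equal2,
    LessThanEqual,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Token {
    Ident(String),
    Combo(Combo),
    Equal,
    Plus,
    Minus,
    LessThan,
}

// Reduced version of the `T!` macro. Assumption: unqualified paths are used
// here, whereas the macro in the diff spells out `crate::token::...`.
macro_rules! T {
    [ident($($val:tt)*)] => { Token::Ident($($val)*) };
    [==] => { Token::Combo(Combo::Equal2) };
    [<=] => { Token::Combo(Combo::LessThanEqual) };
    [=] => { Token::Equal };
    [+] => { Token::Plus };
    [-] => { Token::Minus };
    [<] => { Token::LessThan };
}

/// Hypothetical helper: classify a token, using `T!` in pattern position.
fn describe(tok: &Token) -> &'static str {
    match tok {
        T![+] | T![-] => "additive operator",
        T![==] | T![<=] | T![<] => "comparison operator",
        T![=] => "assignment",
        T![ident(_)] => "identifier",
    }
}

fn main() {
    // In expression position the macro builds token values.
    let tokens = vec![
        T![ident("a".to_string())],
        T![<=],
        T![ident("b".to_string())],
    ];
    for tok in &tokens {
        println!("{:?}: {}", tok, describe(tok));
    }

    // The short form and the written-out form are the same value.
    assert_eq!(T![==], Token::Combo(Combo::Equal2));
}
```

This works because a `macro_rules!` invocation may expand to either an expression or a pattern, depending on where it is written. That is what lets `try_to_binop` and `try_to_unop` above match on `T![+]` rather than on the long variant paths.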

167
src/util.rs Normal file

@ -0,0 +1,167 @@
/// Exit the program with error code 1 and format-print the given text on stderr. This pretty much
/// works like panic, but doesn't show the additional information that panic adds. Those can be
/// interesting for debugging, but don't look that great when building a release executable for an
/// end user.
/// When running tests or in debug mode, panic is used instead so that the tests work
/// correctly.
#[macro_export]
macro_rules! nice_panic {
($fmt:expr) => {
{
if cfg!(test) || cfg!(debug_assertions) {
panic!($fmt);
} else {
eprintln!($fmt);
std::process::exit(1);
}
}
};
($fmt:expr, $($arg:tt)*) => {
{
if cfg!(test) || cfg!(debug_assertions) {
panic!($fmt, $($arg)*);
} else {
eprintln!($fmt, $($arg)*);
std::process::exit(1);
}
}
};
}
/// The PutBackIter allows for items to be put back and to be peeked. Putting an item back
/// will cause it to be the next item returned by `next`. Peeking an item will get a reference to
/// the next item in the iterator without removing it.
///
/// The whole PutBackIter behaves analogously to `std::iter::Peekable` with the addition of the
/// `putback` function. This is slightly slower than `Peekable`, but allows for an unlimited number
/// of putbacks and therefore an unlimited look-ahead range.
pub struct PutBackIter<T: Iterator> {
iter: T,
putback_stack: Vec<T::Item>,
}
impl<T> PutBackIter<T>
where
T: Iterator,
{
/// Make the given iterator putbackable, wrapping it in the PutBackIter type. This effectively
/// adds the `peek` and `putback` functions.
pub fn new(iter: T) -> Self {
Self {
iter,
putback_stack: Vec::new(),
}
}
/// Put the given item back into the iterator. Items that were put back are returned by `next`
/// in last-in-first-out order (i.e. stack order). Only after all previously put-back items
/// have been returned is the underlying iterator used to get further items.
/// The number of items that can be put back is unlimited.
pub fn putback(&mut self, it: T::Item) {
self.putback_stack.push(it);
}
/// Peek the next item, getting a reference to it without removing it from the iterator. This
/// also includes items that were previously put back and not yet removed.
pub fn peek(&mut self) -> Option<&T::Item> {
if self.putback_stack.is_empty() {
let it = self.next()?;
self.putback(it);
}
self.putback_stack.last()
}
}
impl<T> Iterator for PutBackIter<T>
where
T: Iterator,
{
type Item = T::Item;
fn next(&mut self) -> Option<Self::Item> {
match self.putback_stack.pop() {
Some(it) => Some(it),
None => self.iter.next(),
}
}
}
pub trait PutBackableExt {
/// Make the iterator putbackable, wrapping it in the PutBackIter type. This effectively
/// adds the `peek` and `putback` functions.
fn putbackable(self) -> PutBackIter<Self>
where
Self: Iterator + Sized,
{
PutBackIter::new(self)
}
}
impl<T: Iterator> PutBackableExt for T {}
#[cfg(test)]
mod tests {
use super::PutBackableExt;
#[test]
fn putback_iter_next() {
let mut iter = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut pb_iter = iter.clone().putbackable();
// Check if next works
for _ in 0..iter.len() {
assert_eq!(pb_iter.next(), iter.next());
}
}
#[test]
fn putback_iter_peek() {
let mut iter_orig = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut iter = iter_orig.clone();
let mut pb_iter = iter.clone().putbackable();
for _ in 0..iter.len() {
// Check if peek gives a preview of the actual next element
assert_eq!(pb_iter.peek(), iter.next().as_ref());
// Check if next still returns the next (just peeked) element and not the one after
assert_eq!(pb_iter.next(), iter_orig.next());
}
}
#[test]
fn putback_iter_putback() {
let mut iter_orig = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut iter = iter_orig.clone();
let mut pb_iter = iter.clone().putbackable();
// Get the first 5 items with next and check if they match
let it0 = pb_iter.next();
assert_eq!(it0, iter.next());
let it1 = pb_iter.next();
assert_eq!(it1, iter.next());
let it2 = pb_iter.next();
assert_eq!(it2, iter.next());
let it3 = pb_iter.next();
assert_eq!(it3, iter.next());
let it4 = pb_iter.next();
assert_eq!(it4, iter.next());
// Put one value back and check if `next` works as expected, returning the just put back
// item
pb_iter.putback(it0.unwrap());
assert_eq!(pb_iter.next(), it0);
// Put all values back
pb_iter.putback(it4.unwrap());
pb_iter.putback(it3.unwrap());
pb_iter.putback(it2.unwrap());
pb_iter.putback(it1.unwrap());
pb_iter.putback(it0.unwrap());
// After all values have been put back, the iter should match the original again
for _ in 0..iter_orig.len() {
assert_eq!(pb_iter.next(), iter_orig.next());
}
}
}
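
The look-ahead that `putback` enables can be sketched with a two-character check of the kind a lexer might need for tokens such as `<-`. The `PutBackIter` below is a condensed copy of the type above so that the example compiles on its own. The helper `eat_pair` is hypothetical and not part of the diff.

```rust
pub struct PutBackIter<T: Iterator> {
    iter: T,
    putback_stack: Vec<T::Item>,
}

impl<T: Iterator> PutBackIter<T> {
    pub fn new(iter: T) -> Self {
        Self { iter, putback_stack: Vec::new() }
    }

    /// Push an item so that it becomes the next one returned by `next`.
    pub fn putback(&mut self, it: T::Item) {
        self.putback_stack.push(it);
    }

    /// Look at the next item without consuming it.
    pub fn peek(&mut self) -> Option<&T::Item> {
        if self.putback_stack.is_empty() {
            let it = self.iter.next()?;
            self.putback_stack.push(it);
        }
        self.putback_stack.last()
    }
}

impl<T: Iterator> Iterator for PutBackIter<T> {
    type Item = T::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Put-back items take priority over the underlying iterator.
        self.putback_stack.pop().or_else(|| self.iter.next())
    }
}

/// Hypothetical helper: consume the next two chars only if they are `first`
/// followed by `second`. Otherwise the input is left unchanged.
fn eat_pair<I>(it: &mut PutBackIter<I>, first: char, second: char) -> bool
where
    I: Iterator<Item = char>,
{
    let a = it.next();
    let b = it.next();
    if a == Some(first) && b == Some(second) {
        return true;
    }
    // Put back in reverse order, so that `a` is returned first again.
    if let Some(b) = b {
        it.putback(b);
    }
    if let Some(a) = a {
        it.putback(a);
    }
    false
}

fn main() {
    let mut it = PutBackIter::new("<-x".chars());

    // "<=" does not match, so both characters are put back.
    assert!(!eat_pair(&mut it, '<', '='));
    assert_eq!(it.peek(), Some(&'<'));

    // "<-" matches and is consumed, leaving only 'x'.
    assert!(eat_pair(&mut it, '<', '-'));
    assert_eq!(it.next(), Some('x'));
    assert_eq!(it.next(), None);
}
```

`std::iter::Peekable` only exposes one item of look-ahead, so undoing a two-item read like this is where the put-back stack becomes useful. The order of the `putback` calls matters: the item that should come out first has to be pushed last.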