1 Commits

Author SHA1 Message Date
21228ff3d7 Implement vec based scopes
- Replaced vartable hashmap with vec
- Use linear search in reverse to find the variables by name
- This is really fast with a small number of variables but tanks fast
  with more vars due to O(n) lookup times
- Implemented scopes by dropping all elements from the vartable at the
  end of a scope
2022-02-03 22:09:58 +01:00
23 changed files with 572 additions and 2149 deletions
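
The commit message above describes the idea in prose: variables live in a `Vec`, lookups search it in reverse, and a scope ends by dropping everything declared since it began. A minimal sketch of that idea follows. `VarTable`, `enter_scope` and `exit_scope` are hypothetical names used for illustration only, and values are simplified to `i64`; the interpreter in this commit stores `(String, Value)` pairs directly in its `vartable` field.

```rust
/// Hypothetical stand-in for the interpreter's variable table (illustration only).
struct VarTable {
    vars: Vec<(String, i64)>,
}

impl VarTable {
    fn new() -> Self {
        Self { vars: Vec::new() }
    }

    /// Declaring always pushes, so a redeclaration shadows the older entry.
    fn declare(&mut self, name: &str, value: i64) {
        self.vars.push((name.to_string(), value));
    }

    /// Reverse linear search: the most recent declaration wins. O(n) per lookup.
    fn get(&self, name: &str) -> Option<i64> {
        self.vars
            .iter()
            .rev()
            .find(|entry| entry.0 == name)
            .map(|entry| entry.1)
    }

    /// Remember the current length when a scope starts.
    fn enter_scope(&self) -> usize {
        self.vars.len()
    }

    /// Drop everything that was declared since the scope started.
    fn exit_scope(&mut self, mark: usize) {
        self.vars.truncate(mark);
    }
}

fn main() {
    let mut table = VarTable::new();
    table.declare("a", 1);

    let mark = table.enter_scope();
    table.declare("a", 2); // shadows the outer `a`
    table.declare("b", 3);
    assert_eq!(table.get("a"), Some(2));
    table.exit_scope(mark);

    // The outer `a` is visible again and `b` no longer exists.
    assert_eq!(table.get("a"), Some(1));
    assert_eq!(table.get("b"), None);
}
```

The reverse search is what makes shadowing work without extra bookkeeping, and it is also the source of the O(n) lookup cost the commit message mentions.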


@@ -4,4 +4,5 @@ version = "0.1.0"
 edition = "2021"
 [dependencies]
+anyhow = "1.0.53"
 thiserror = "1.0.30"

278
README.md

@@ -1,49 +1,13 @@
 # NEK-Lang
-## Table of contents
-- [NEK-Lang](#nek-lang)
-- [Table of contents](#table-of-contents)
-- [Variables](#variables)
-- [Declaration](#declaration)
-- [Assignment](#assignment)
-- [Datatypes](#datatypes)
-- [I64](#i64)
-- [String](#string)
-- [Array](#array)
-- [Expressions](#expressions)
-- [General](#general)
-- [Mathematical Operators](#mathematical-operators)
-- [Bitwise Operators](#bitwise-operators)
-- [Logical Operators](#logical-operators)
-- [Equality & Relational Operators](#equality--relational-operators)
-- [Control-Flow](#control-flow)
-- [Loop](#loop)
-- [If / Else](#if--else)
-- [Block Scopes](#block-scopes)
-- [Functions](#functions)
-- [Function definition](#function-definition)
-- [Function calls](#function-calls)
-- [IO](#io)
-- [Print](#print)
-- [Comments](#comments)
-- [Line comments](#line-comments)
-- [Feature Tracker](#feature-tracker)
-- [High level Components](#high-level-components)
-- [Language features](#language-features)
-- [Parsing Grammar](#parsing-grammar)
-- [Expressions](#expressions-1)
-- [Statements](#statements)
-- [Examples](#examples)
-- [Extras](#extras)
-- [Visual Studio Code Language Support](#visual-studio-code-language-support)
 ## Variables
-The variables are all contained in scopes. Variables defined in an outer scope can be accessed in
-inner scoped. All variables defined in a scope that has ended do no longer exist and can't be
-accessed.
+Currently all variables are global and completely unscoped. That means no matter where a variable is declared, it remains over the whole remaining runtime of the progam.
+All variables are currently of type `i64` (64-bit signed integer)
 ### Declaration
 - Declare and initialize a new variable
-- Declaring a previously declared variable again will shadow the previous variable
+- Declaring a previously declared variable again is currently equivalent to an assignment
 - Declaration is needed before assignment or other usage
 - The variable name is on the left side of the `<-` operator
 - The assigned value is on the right side and can be any expression
@@ -61,62 +25,6 @@ a = 123;
``` ```
The value `123` is assigned to the variable named `a`. `a` needs to be declared before this. The value `123` is assigned to the variable named `a`. `a` needs to be declared before this.
## Datatypes
The available variable datatypes are `i64` (64-bit signed integer), `string` (`"this is a string"`) and `array` (`[10]`)
### I64
- The normal default datatype is `i64` which is a 64-bit signed integer
- Can be created by just writing an integer literal like `546`
- Inside the number literal `_` can be inserted for visual separation `100_000`
- The i64 values can be used as expected in calculations, conditions and so on
```
my_i64 <- 123_456;
```
### String
- Strings mainly exist for formatting the text output of a program
- Strings can be created by using doublequotes like in other languages `"Hello world"`
- There is no way to access or change the characters of the string
- Unicode characters are supported `"Hello 🌎"`
- Escape characters `\n`, `\r`, `\t`, `\"`, `\\` are supported
- String can be assigned to variables, just like i64
```
world <- "🌎";
print "Hello ";
print world;
print "\n";
```
### Array
- Arrays can contain any other datatypes and don't need to have the same type in all cells
- Arrays can be created by using brackets with the size in between `[size]`
- Arrays must be assigned to a variable in order to be used
- All cells will be initialized with i64 0 values
- The size can be any expression that results in a positive i64 value
- The array size can't be changed after creation
- The arrays data is always allocated on the heap
- The array cells can be accessed by using the variable name and specifying the index in brackets
`my_arr[index]`
- The index can be any expression that results in a positive i64 value in the range of the arrays
indices
- The indices start with 0
- When an array is passed to a function, it is passed by reference
```
width <- 5;
heigt <- 5;
// Initialize array of size 25, initialized with 25x 0
my_array = [width * height];
// Modify first value
my_array[0] = 5;
// Print first value
// Outputs `5`
print my_array[0];
```
## Expressions ## Expressions
The operator precedence is the same order as in `C` for all implemented operators. The operator precedence is the same order as in `C` for all implemented operators.
Refer to the Refer to the
@@ -146,9 +54,7 @@ Supported mathematical operations:
 - "Bit flip" (One's complement) `~a`
 ### Logical Operators
-The logical operators evaluate the operands as `false` if they are equal to `0` and `true` if they are not equal to `0`.
-Note that logical operators like AND / OR do not support short-circuit evaluation. So Both sides of
-the logical operation will be evaluated, even if it might not be necessary.
+The logical operators evaluate the operands as `false` if they are equal to `0` and `true` if they are not equal to `0`
 - And `a && b`
 - Or `a || b`
 - Not `!a` (if `a` is equal to `0`, the result is `1`, otherwise the result is `0`)
@@ -163,53 +69,37 @@ The equality and relational operations result in `1` if the condition is evaluat
- Less or equal than `a <= b` - Less or equal than `a <= b`
## Control-Flow ## Control-Flow
For conditions like in if or loops, every non-zero value is equal to `true`, and `0` is `false`. For conditions like in if or loops, every non zero value is equal to `true`, and `0` is `false`.
### Loop ### Loop
- The `loop` keyword can be used as an infinite loop, as a while loop or as a while loop with - There is currently only the `loop` keyword that can act like a `while` with optional advancement (an expression that is executed after the loop body)
advancement (an expression that is executed after each loop) - The `loop` keyword is followed by the condition (an expression) without needing parentheses
- If only `loop` is used, directly followed by the body, it is an infinite loop that needs to be
terminated by using the `break` keyword
- The `loop` keyword can be followed by the condition (an expression) without needing parentheses
- *Optional:* If there is a `;` after the condition, there must be another expression which is used as the advancement - *Optional:* If there is a `;` after the condition, there must be another expression which is used as the advancement
- The loops body is wrapped in braces (`{ }`) just like in C/C++ - The loops body is wrapped in braces (`{ }`) just like in C/C++
- The `continue` keyword can be used to end the current loop iteration early
- The `break` keyword can be used to fully break out of the current loop
``` ```
// Print the numbers from 0 to 9 // Print the numbers from 0 to 9
// With endless loop
i <- 0;
loop {
if i >= 10 {
break;
}
print i;
i = i + 1;
}
// Without advancement // Without advancement
i <- 0; i <- 0;
loop i < 10 { loop i < 10 {
print i; print i;
i = i + 1; i = i - 1;
} }
// With advancement // With advancement
k <- 0; k <- 0;
loop k < 10; k = k + 1 { loop k < 10; k = k - 1 {
print k; print k;
} }
``` ```
### If / Else ### If / Else
- The language supports `if` and an optional `else` - The language supports `if` and an optional `else`
- After the `if` keyword must be the deciding condition, parentheses are not needed - After the `if` keyword must be the deciding condition, parentheses are not needed
- The blocks are wrapped in braces (`{ }`) - The block *if-true* block is wrapped in braces (`{ }`)
- *Optional:* If there is an `else` after the *if-block*, there must be a following *if-false*, aka. else block - *Optional:* If there is an `else` after the *if-block*, there must be a following *if-false*, aka. else block
- NOTE: Logical operators like AND / OR do not support short-circuit evaluation. So Both sides of
the logical operations will be evaluated, even if it might not be necessary
``` ```
a <- 1; a <- 1;
b <- 2; b <- 2;
@@ -222,88 +112,15 @@ if a == b {
} }
``` ```
### Block Scopes
- It is possible to create a limited scope for local variables that will no longer exist once the
scope ends
- Shadowing variables by redefining a variable in an inner scope is supported
```
var_in_outer_scope <- 5;
{
var_in_inner_scope <- 3;
// Inner scope can access both vars
print var_in_outer_scope;
print var_in_inner_scope;
}
// Outer scope is still valid
print var_in_outer_scope;
// !!! THIS DOES NOT WORK !!!
// The inner scope has ended
print var_in_inner_scope;
```
## Functions
### Function definition
- Functions can be defined by using the `fun` keyword, followed by the function name and the
parameters in parentheses. After the parentheses, the body is specified inside a braces block
- The function parameters are specified by only their names
- The function body has its own scope
- Parameters are only accessible inside the body
- Variables from the outer scope can be accessed and modified if the are defined before the function
- Variables from the outer scope are shadowed by parameters or local variables with the same name
- The `return` keyword can be used to return a value from the function and exit it immediately
- If no return is specified, a special `void` value is returned. That value can't be used in
calculations or comparisons, but can be stored in a variable (even tho it doesn't make sense)
- Functions can only be defined at the top-level. So defining a function inside of any other scoped
block (like inside another function, if, loop, ...) is invalid
- Functions can only be used after definition and there is no forward declaration right now
- However a function can be called recursively inside of itself
- Functions can't be redefined, so defining a function with an existing name is invalid
```
fun add_maybe(a, b) {
if a < 100 {
return a;
} else {
return a + b;
}
}
fun println(val) {
print val;
print "\n";
}
```
### Function calls
- Function calls are primary expressions, so they can be directly used in calculations (if they
return appropriate values)
- Function calls are performed by writing the function name, followed by the arguments in parentheses
- The arguments can be any expressions, separated by commas
```
b <- 100;
result <- add_maybe(250, b);
// Prints 350 + new-line
println(result);
```
## IO ## IO
### Print ### Print
Printing is implemented via the `print` keyword Printing is implemented via the `print` keyword
- The `print` keyword is followed by an expression, the value of which will be printed to the terminal - The `print` keyword is followed by an expression, the value of which will be printed to the terminal.
- To add a line break a string print can be used `print "\n";` - Print currently automatically adds a linebreak
``` ```
a <- 1; a <- 1;
// Outputs `1` to the terminal print a; // Outputs `"1\n"` to the terminal
print a;
// Outputs a new-line to the terminal
print "\n";
``` ```
## Comments ## Comments
@@ -323,8 +140,6 @@ Line comments can be initiated by using `//`
- [x] Lexer: Transforms text into Tokens - [x] Lexer: Transforms text into Tokens
- [x] Parser: Transforms Tokens into Abstract Syntax Tree - [x] Parser: Transforms Tokens into Abstract Syntax Tree
- [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements - [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements
- [x] Simple optimizer: Apply trivial optimizations to the Ast
- [x] Precalculate binary ops / unary ops that have only literal operands
## Language features ## Language features
@@ -334,7 +149,7 @@ Line comments can be initiated by using `//`
- [x] Subtraction `a - b` - [x] Subtraction `a - b`
- [x] Multiplication `a * b` - [x] Multiplication `a * b`
- [x] Division `a / b` - [x] Division `a / b`
- [x] Modulo `a % b` - [x] Modulo `a % b
- [x] Negate `-a` - [x] Negate `-a`
- [x] Parentheses `(a + b) * c` - [x] Parentheses `(a + b) * c`
- [x] Logical boolean operators - [x] Logical boolean operators
@@ -358,43 +173,23 @@ Line comments can be initiated by using `//`
- [x] Variables - [x] Variables
- [x] Declaration - [x] Declaration
- [x] Assignment - [x] Assignment
- [x] Local variables (for example inside loop, if, else, functions)
- [x] Scoped block for specific local vars `{ ... }`
- [x] Statements with semicolon & Multiline programs - [x] Statements with semicolon & Multiline programs
- [x] Control flow - [x] Control flow
- [x] Loops - [x] While loop `while X { ... }`
- [x] While-style loop `loop X { ... }`
- [x] For-style loop without with `X` as condition and `Y` as advancement `loop X; Y { ... }`
- [x] Infinite loop `loop { ... }`
- [x] Break `break`
- [x] Continue `continue`
- [x] If else statement `if X { ... } else { ... }` - [x] If else statement `if X { ... } else { ... }`
- [x] If Statement - [x] If Statement
- [x] Else statement - [x] Else statement
- [x] Line comments `//` - [x] Line comments `//`
- [x] Strings - [x] Strings
- [x] Arrays
- [x] Creating array with size `X` as a variable `arr <- [X]`
- [x] Accessing arrays by index `arr[X]`
- [x] IO Intrinsics - [x] IO Intrinsics
- [x] Print - [x] Print
- [x] Functions
- [x] Function declaration `fun f(X, Y, Z) { ... }`
- [x] Function calls `f(1, 2, 3)`
- [x] Function returns `return X`
- [x] Local variables
- [x] Pass arrays by-reference, i64 by-vale, string is a const ref
# Parsing Grammar ## Grammar
## Expressions ### Expressions
``` ```
ARRAY_LITERAL = "[" expr "]" LITERAL = I64_LITERAL | STR_LITERAL
ARRAY_ACCESS = IDENT "[" expr "]" expr_primary = LITERAL | IDENT | "(" expr ")" | "-" expr_primary | "~" expr_primary
FUN_CALL = IDENT "(" (expr ",")* expr? ")"
LITERAL = I64_LITERAL | STR_LITERAL | ARRAY_LITERAL
expr_primary = LITERAL | IDENT | FUN_CALL | ARRAY_ACCESS | "(" expr ")" | "-" expr_primary
| "~" expr_primary
expr_mul = expr_primary (("*" | "/" | "%") expr_primary)* expr_mul = expr_primary (("*" | "/" | "%") expr_primary)*
expr_add = expr_mul (("+" | "-") expr_mul)* expr_add = expr_mul (("+" | "-") expr_mul)*
expr_shift = expr_add ((">>" | "<<") expr_add)* expr_shift = expr_add ((">>" | "<<") expr_add)*
@@ -408,33 +203,10 @@ expr_lor = expr_land ("||" expr_land)*
expr = expr_lor expr = expr_lor
``` ```
## Statements ### Statements
``` ```
stmt_return = "return" expr ";"
stmt_break = "break" ";"
stmt_continue = "continue" ";"
stmt_var_decl = IDENT "<-" expr ";"
stmt_fun_decl = "fun" IDENT "(" (IDENT ",")* IDENT? ")" "{" stmt* "}"
stmt_expr = expr ";"
stmt_block = "{" stmt* "}"
stmt_loop = "loop" (expr (";" expr)?)? "{" stmt* "}"
stmt_if = "if" expr "{" stmt* "}" ("else" "{" stmt* "}")? stmt_if = "if" expr "{" stmt* "}" ("else" "{" stmt* "}")?
stmt_print = "print" expr ";" stmt_loop = "loop" expr (";" expr)? "{" stmt* "}"
stmt = stmt_return | stmt_break | stmt_continue | stmt_var_decl | stmt_fun_decl stmt_expr = expr ";"
| stmt_expr | stmt_block | stmt_loop | stmt_if | stmt_print stmt = stmt_expr | stmt_loop
``` ```
# Examples
There are a bunch of examples in the [examples](examples/) directory. Those include (non-optimal) solutions to the first five project euler problems, as well as a [simple Game of Life implementation](examples/game_of_life.nek).
To run an example via `cargo-run`, use:
```
cargo run --release -- examples/[NAME]
```
# Extras
## Visual Studio Code Language Support
A VSCode extension that provides simple syntax highlighing for nek is also available on
[gitlab](https://code.fbi.h-da.de/advanced-systems-programming-ws21/x4/nek-lang-vscode). Since this
is a very small scale project, the extension was not published and instuctions on how to install it
can be found in the mentioned repository.
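
The README text in this diff states that the logical operators treat `0` as `false` and any other value as `true`, and that AND / OR do not short-circuit. The sketch below illustrates that behaviour under simplifying assumptions: `logical_and`, `logical_or` and `logical_not` are hypothetical helper names, not functions taken from the interpreter, and operands are plain `i64` values.

```rust
use std::cell::Cell;

/// AND on integer truth values: the result is 1 or 0.
fn logical_and(lhs: i64, rhs: i64) -> i64 {
    ((lhs != 0) && (rhs != 0)) as i64
}

/// OR on integer truth values: the result is 1 or 0.
fn logical_or(lhs: i64, rhs: i64) -> i64 {
    ((lhs != 0) || (rhs != 0)) as i64
}

/// NOT: 1 if the operand is 0, otherwise 0.
fn logical_not(val: i64) -> i64 {
    (val == 0) as i64
}

fn main() {
    // Count how many operands get evaluated.
    let evaluated = Cell::new(0);
    let operand = |v: i64| {
        evaluated.set(evaluated.get() + 1);
        v
    };

    // Both operands are resolved to values *before* the operator is applied,
    // so the right-hand side runs even though the left-hand side is already 0.
    let lhs = operand(0);
    let rhs = operand(7);
    assert_eq!(logical_and(lhs, rhs), 0);
    assert_eq!(evaluated.get(), 2);

    assert_eq!(logical_or(0, 7), 1);
    assert_eq!(logical_not(0), 1);
    assert_eq!(logical_not(5), 0);
}
```

Evaluating both operands first keeps a tree-walking evaluator simple, at the cost of running right-hand sides whose result cannot change the outcome.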


@@ -7,7 +7,7 @@
 sum <- 0;
 i <- 0;
 loop i < 1_000; i = i + 1 {
-    if i % 3 == 0 || i % 5 == 0 {
+    if i % 3 == 0 | i % 5 == 0 {
         sum = sum + i;
     }
 }
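
The hunk above swaps the logical `||` for the bitwise `|`. Because the comparison operators produce only `0` or `1`, both forms should select the same numbers here. The following sketch checks that claim outside of nek; `truthy` and `sum_multiples` are hypothetical helpers written for this illustration.

```rust
/// Map a Rust bool to the 0 / 1 integers the language uses for conditions.
fn truthy(cond: bool) -> i64 {
    cond as i64
}

/// Sum all numbers below `limit` that are divisible by 3 or 5.
/// `use_bitwise` selects between `|` and a logical OR on the 0 / 1 values.
fn sum_multiples(limit: i64, use_bitwise: bool) -> i64 {
    let mut sum = 0;
    for i in 0..limit {
        let by_three = truthy(i % 3 == 0);
        let by_five = truthy(i % 5 == 0);
        let hit = if use_bitwise {
            by_three | by_five
        } else {
            truthy(by_three != 0 || by_five != 0)
        };
        if hit != 0 {
            sum += i;
        }
    }
    sum
}

fn main() {
    // On 0 / 1 operands the two operators agree.
    assert_eq!(sum_multiples(1_000, true), sum_multiples(1_000, false));
    println!("{}", sum_multiples(1_000, true));
}
```

The equivalence only holds for operands that are already `0` or `1`; for arbitrary integers `&` in particular can differ from `&&` (for example `1 & 2` is `0`).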


@@ -10,12 +10,14 @@ sum <- 0;
 a <- 0;
 b <- 1;
+tmp <- 0;
 loop a < 4_000_000 {
     if a % 2 == 0 {
         sum = sum + a;
     }
-    tmp <- a;
+    tmp = a;
     a = b;
     b = b + tmp;
 }


@@ -18,10 +18,10 @@ loop number > 1 {
     div = div + 1;
     if div * div > number {
-        if number > 1 && number > result {
+        if number > 1 & number > result {
             result = number;
         }
-        break;
+        number = 0;
     }
 }


@@ -4,25 +4,30 @@
 //
 // Correct Answer: 906609
-fun reverse(n) {
-    rev <- 0;
-    loop n {
-        rev = rev * 10 + n % 10;
-        n = n / 10;
-    }
-    return rev;
-}
 res <- 0;
-i <- 100;
-loop i < 1_000; i = i + 1 {
-    k <- i;
-    loop k < 1_000; k = k + 1 {
-        num <- i * k;
-        num_rev <- reverse(num);
-        if num == num_rev && num > res {
+tmp <- 0;
+num <- 0;
+num_rev <- 0;
+i <- 100;
+k <- 100;
+loop i < 1_000; i = i + 1 {
+    k = 100;
+    loop k < 1_000; k = k + 1 {
+        num_rev = 0;
+        num = i * k;
+        tmp = num;
+        loop tmp {
+            num_rev = num_rev*10 + tmp % 10;
+            tmp = tmp / 10;
+        }
+        if num == num_rev & num > res {
             res = num;
         }
     }
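
Both versions of this example reverse the decimal digits of a product and compare the result with the original number. For readers who want to check the arithmetic outside of nek, here is a rough Rust equivalent of the function-based variant. `reverse_digits` and `largest_palindrome_product` are names chosen for this sketch only.

```rust
/// Reverse the decimal digits of `n`, e.g. 123 becomes 321.
fn reverse_digits(mut n: u64) -> u64 {
    let mut rev = 0;
    while n != 0 {
        rev = rev * 10 + n % 10;
        n /= 10;
    }
    rev
}

/// Largest palindrome that is a product of two 3-digit numbers.
fn largest_palindrome_product() -> u64 {
    let mut res = 0;
    for i in 100..1_000 {
        // Starting `k` at `i` skips the mirrored pairs (i, k) / (k, i).
        for k in i..1_000 {
            let num = i * k;
            if num == reverse_digits(num) && num > res {
                res = num;
            }
        }
    }
    res
}

fn main() {
    println!("{}", largest_palindrome_product());
}
```

Starting the inner loop at `i` roughly halves the work compared with restarting it at `100`, which is what the variant without functions does.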


@@ -4,19 +4,19 @@
 #
 # Correct Answer: 906609
-def reverse(n):
-    rev = 0
-    while n:
-        rev = rev * 10 + n % 10
-        n //= 10
-    return rev
 res = 0
-for i in range(100, 1_000):
-    for k in range(i, 1_000):
+for i in range(100, 999):
+    for k in range(100, 999):
         num = i * k
-        num_rev = reverse(num)
+        tmp = num
+        num_rev = 0
+        while tmp != 0:
+            num_rev = num_rev*10 + tmp % 10
+            tmp = tmp // 10
         if num == num_rev and num > res:
             res = num


@@ -3,21 +3,26 @@
 //
 // Correct Answer: 232_792_560
-fun gcd(x, y) {
-    loop y {
-        tmp <- x;
-        x = y;
-        y = tmp % y;
+num <- 20;
+should_continue <- 1;
+i <- 2;
+loop should_continue {
+    should_continue = 0;
+    i = 20;
+    loop i >= 2; i = i - 1 {
+        if num % i != 0 {
+            should_continue = 1;
+            // break
+            i = 0;
+        }
     }
-    return x;
+    if should_continue == 1 {
+        num = num + 20;
+    }
 }
-result <- 1;
-i <- 1;
-loop i <= 20; i = i + 1 {
-    result = result * (i / gcd(i, result));
-}
-print result;
+print num;
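
The two sides of this hunk solve the same problem in different ways: one accumulates a least common multiple through `gcd`, the other steps through candidates and tests every divisor. A sketch of both approaches in Rust is given below for comparison. The function names are hypothetical, and stepping by `limit` generalises the hard-coded step of `20` in the nek program.

```rust
/// Euclid's algorithm, written with the same temporary-variable swap as the nek version.
fn gcd(mut x: u64, mut y: u64) -> u64 {
    while y != 0 {
        let tmp = x;
        x = y;
        y = tmp % y;
    }
    x
}

/// Smallest number divisible by every integer in 1..=limit, built up as a running LCM.
fn smallest_multiple(limit: u64) -> u64 {
    let mut result = 1;
    for i in 1..=limit {
        result *= i / gcd(i, result);
    }
    result
}

/// Brute-force variant: try multiples of `limit` until every divisor fits.
fn smallest_multiple_brute(limit: u64) -> u64 {
    let mut num = limit;
    while (2..=limit).any(|i| num % i != 0) {
        num += limit;
    }
    num
}

fn main() {
    // The problem statement quoted in the example file gives 2520 for 1..=10.
    assert_eq!(smallest_multiple(10), 2_520);
    assert_eq!(smallest_multiple_brute(10), 2_520);
    println!("{}", smallest_multiple(20));
}
```

The running-LCM version needs only `limit` iterations, while the brute-force loop has to visit every multiple of `limit` up to the answer.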


@@ -1,15 +0,0 @@
# 2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
# What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?
#
# Correct Answer: 232_792_560
def gcd(x, y):
while y:
x, y = y, x % y
return x
result = 1
for i in range(1, 21):
result *= i // gcd(i, result)
print(result)


@@ -1,134 +0,0 @@
fun print_field(field, width, height) {
y <- 0;
loop y < height; y = y+1 {
x <- 0;
loop x < width; x = x+1 {
if field[y*height + x] {
print "# ";
} else {
print ". ";
}
}
print "\n";
}
print "\n";
}
fun count_neighbours(field, x, y, width, height) {
neighbours <- 0;
if y > 0 {
if x > 0 {
if field[(y-1)*width + (x-1)] {
// Top left
neighbours = neighbours + 1;
}
}
if field[(y-1)*width + x] {
// Top
neighbours = neighbours + 1;
}
if x < width-1 {
if field[(y-1)*width + (x+1)] {
// Top right
neighbours = neighbours + 1;
}
}
}
if x > 0 {
if field[y*width + (x-1)] {
// Left
neighbours = neighbours + 1;
}
}
if x < width-1 {
if field[y*width + (x+1)] {
// Right
neighbours = neighbours + 1;
}
}
if y < height-1 {
if x > 0 {
if field[(y+1)*width + (x-1)] {
// Bottom left
neighbours = neighbours + 1;
}
}
if field[(y+1)*width + x] {
// Bottom
neighbours = neighbours + 1;
}
if x < width-1 {
if field[(y+1)*width + (x+1)] {
// Bottom right
neighbours = neighbours + 1;
}
}
}
return neighbours;
}
fun copy(from, to, len) {
i <- 0;
loop i < len; i = i + 1 {
to[i] = from[i];
}
}
// Set the width and height of the field
width <- 10;
height <- 10;
// Create the main and temporary field
field <- [width*height];
field2 <- [width*height];
// Preset the main field with a glider
field[1] = 1;
field[12] = 1;
field[20] = 1;
field[21] = 1;
field[22] = 1;
fun run_gol(num_rounds) {
runs <- 0;
loop runs < num_rounds; runs = runs + 1 {
// Print the field
print_field(field, width, height);
// Calculate next stage from field and store into field2
y <- 0;
loop y < height; y = y+1 {
x <- 0;
loop x < width; x = x+1 {
// Get the neighbours of the current cell
neighbours <- count_neighbours(field, x, y, width, height);
// Set the new cell according to the neighbour count
if neighbours < 2 || neighbours > 3 {
field2[y*width + x] = 0;
} else {
if neighbours == 3 {
field2[y*width + x] = 1;
} else {
field2[y*width + x] = field[y*width + x];
}
}
}
}
// Transfer from field2 to field
copy(field2, field, width*height);
}
}
run_gol(32);


@@ -1,9 +0,0 @@
fun fib(n) {
if n <= 1 {
return n;
} else {
return fib(n-1) + fib(n-2);
}
}
print fib(30);


@@ -1,6 +0,0 @@
def fib(n):
if n <= 1:
return n
return fib(n-1) + fib(n-2)
print(fib(30))


@@ -1,31 +0,0 @@
fun square(a) {
return a * a;
}
fun add(a, b) {
return a + b;
}
fun mul(a, b) {
return a * b;
}
// Funtion with multiple args & nested calls to different functions
fun addmul(a, b, c) {
return mul(add(a, b), c);
}
a <- 10;
b <- 20;
c <- 3;
result <- addmul(a, b, c) + square(c);
// Access and modify outer variable. Argument `a` must not be used from outer var
fun sub_from_result(a) {
result = result - a;
}
sub_from_result(30);
print result;


@@ -1,7 +1,5 @@
use std::rc::Rc; use std::rc::Rc;
use crate::stringstore::{StringStore, Sid};
/// Types for binary operators /// Types for binary operators
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum BinOpType { pub enum BinOpType {
@@ -61,6 +59,9 @@ pub enum BinOpType {
/// Assign value to variable /// Assign value to variable
Assign, Assign,
/// Declare new variable with value
Declare,
} }
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
@@ -80,18 +81,9 @@ pub enum Expression {
/// Integer literal (64-bit) /// Integer literal (64-bit)
I64(i64), I64(i64),
/// String literal /// String literal
String(Sid), String(Rc<String>),
/// Array with size
ArrayLiteral(Box<Expression>),
/// Array access with name, stackpos and position
ArrayAccess(Sid, usize, Box<Expression>),
FunCall(Sid, usize, Vec<Expression>),
/// Variable /// Variable
Var(Sid, usize), Var(String),
/// Binary operation. Consists of type, left hand side and right hand side /// Binary operation. Consists of type, left hand side and right hand side
BinOp(BinOpType, Box<Expression>, Box<Expression>), BinOp(BinOpType, Box<Expression>, Box<Expression>),
/// Unary operation. Consists of type and operand /// Unary operation. Consists of type and operand
@@ -101,11 +93,11 @@ pub enum Expression {
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub struct Loop { pub struct Loop {
/// The condition that determines if the loop should continue /// The condition that determines if the loop should continue
pub condition: Option<Expression>, pub condition: Expression,
/// This is executed after each loop to advance the condition variables /// This is executed after each loop to advance the condition variables
pub advancement: Option<Expression>, pub advancement: Option<Expression>,
/// The loop body that is executed each loop /// The loop body that is executed each loop
pub body: BlockScope, pub body: Ast,
} }
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
@@ -113,46 +105,22 @@ pub struct If {
/// The condition /// The condition
pub condition: Expression, pub condition: Expression,
/// The body that is executed when condition is true /// The body that is executed when condition is true
pub body_true: BlockScope, pub body_true: Ast,
/// The if body that is executed when the condition is false /// The if body that is executed when the condition is false
pub body_false: BlockScope, pub body_false: Ast,
}
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct FunDecl {
pub name: Sid,
pub fun_stackpos: usize,
pub argnames: Vec<Sid>,
pub body: Rc<BlockScope>,
}
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct VarDecl {
pub name: Sid,
pub var_stackpos: usize,
pub rhs: Expression,
} }
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum Statement { pub enum Statement {
Return(Expression),
Break,
Continue,
Declaration(VarDecl),
FunDeclare(FunDecl),
Expr(Expression), Expr(Expression),
Block(BlockScope),
Loop(Loop), Loop(Loop),
If(If), If(If),
Print(Expression), Print(Expression),
} }
pub type BlockScope = Vec<Statement>; #[derive(Debug, PartialEq, Eq, Clone, Default)]
#[derive(Clone, Default)]
pub struct Ast { pub struct Ast {
pub stringstore: StringStore, pub prog: Vec<Statement>,
pub main: BlockScope,
} }
impl BinOpType { impl BinOpType {
@@ -165,6 +133,7 @@ impl BinOpType {
pub fn precedence(&self) -> u8 { pub fn precedence(&self) -> u8 {
match self { match self {
BinOpType::Declare => 0,
BinOpType::Assign => 1, BinOpType::Assign => 1,
BinOpType::LOr => 2, BinOpType::LOr => 2,
BinOpType::LAnd => 3, BinOpType::LAnd => 3,


@@ -1,111 +0,0 @@
use crate::ast::{Ast, BlockScope, Expression, If, Loop, Statement, BinOpType, UnOpType, VarDecl};
pub trait AstOptimizer {
fn optimize(ast: Ast) -> Ast;
}
pub struct SimpleAstOptimizer;
impl AstOptimizer for SimpleAstOptimizer {
fn optimize(mut ast: Ast) -> Ast {
Self::optimize_block(&mut ast.main);
ast
}
}
impl SimpleAstOptimizer {
fn optimize_block(block: &mut BlockScope) {
for stmt in block {
match stmt {
Statement::Expr(expr) => Self::optimize_expr(expr),
Statement::Block(block) => Self::optimize_block(block),
Statement::Loop(Loop {
condition,
advancement,
body,
}) => {
if let Some(condition) = condition {
Self::optimize_expr(condition);
}
if let Some(advancement) = advancement {
Self::optimize_expr(advancement)
}
Self::optimize_block(body);
}
Statement::If(If {
condition,
body_true,
body_false,
}) => {
Self::optimize_expr(condition);
Self::optimize_block(body_true);
Self::optimize_block(body_false);
}
Statement::Print(expr) => Self::optimize_expr(expr),
Statement::Declaration(VarDecl { name: _, var_stackpos: _, rhs}) => Self::optimize_expr(rhs),
Statement::FunDeclare(_) => (),
Statement::Return(expr) => Self::optimize_expr(expr),
Statement::Break | Statement::Continue => (),
}
}
}
fn optimize_expr(expr: &mut Expression) {
match expr {
Expression::BinOp(bo, lhs, rhs) => {
Self::optimize_expr(lhs);
Self::optimize_expr(rhs);
// Precalculate binary operations that consist of 2 literals. No need to do this at
// runtime, as all parts of the calculation are known at *compiletime* / parsetime.
match (lhs.as_mut(), rhs.as_mut()) {
(Expression::I64(lhs), Expression::I64(rhs)) => {
let new_expr = match bo {
BinOpType::Add => Expression::I64(*lhs + *rhs),
BinOpType::Mul => Expression::I64(*lhs * *rhs),
BinOpType::Sub => Expression::I64(*lhs - *rhs),
BinOpType::Div => Expression::I64(*lhs / *rhs),
BinOpType::Mod => Expression::I64(*lhs % *rhs),
BinOpType::BOr => Expression::I64(*lhs | *rhs),
BinOpType::BAnd => Expression::I64(*lhs & *rhs),
BinOpType::BXor => Expression::I64(*lhs ^ *rhs),
BinOpType::LAnd => Expression::I64(if (*lhs != 0) && (*rhs != 0) { 1 } else { 0 }),
BinOpType::LOr => Expression::I64(if (*lhs != 0) || (*rhs != 0) { 1 } else { 0 }),
BinOpType::Shr => Expression::I64(*lhs >> *rhs),
BinOpType::Shl => Expression::I64(*lhs << *rhs),
BinOpType::EquEqu => Expression::I64(if lhs == rhs { 1 } else { 0 }),
BinOpType::NotEqu => Expression::I64(if lhs != rhs { 1 } else { 0 }),
BinOpType::Less => Expression::I64(if lhs < rhs { 1 } else { 0 }),
BinOpType::LessEqu => Expression::I64(if lhs <= rhs { 1 } else { 0 }),
BinOpType::Greater => Expression::I64(if lhs > rhs { 1 } else { 0 }),
BinOpType::GreaterEqu => Expression::I64(if lhs >= rhs { 1 } else { 0 }),
BinOpType::Assign => unreachable!(),
};
*expr = new_expr;
},
_ => ()
}
}
Expression::UnOp(uo, operand) => {
Self::optimize_expr(operand);
// Precalculate unary operations just like binary ones
match operand.as_mut() {
Expression::I64(val) => {
let new_expr = match uo {
UnOpType::Negate => Expression::I64(-*val),
UnOpType::BNot => Expression::I64(!*val),
UnOpType::LNot => Expression::I64(if *val == 0 { 1 } else { 0 }),
};
*expr = new_expr;
}
_ => (),
}
}
_ => (),
}
}
}
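
The optimizer file above precalculates operations whose operands are all literals. The sketch below shows the same constant-folding idea on a deliberately tiny expression type. `Expr` and `fold` are hypothetical and much smaller than the project's `Expression` enum. Two choices here are assumptions of this sketch, not behaviour of the file above: arithmetic uses wrapping operations, and a division by a literal zero is left unfolded so that the error surfaces at runtime.

```rust
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    I64(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

/// Fold children first, then collapse the node if both sides became literals.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::I64(a), Expr::I64(b)) => Expr::I64(a.wrapping_add(b)),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::I64(a), Expr::I64(b)) => Expr::I64(a.wrapping_mul(b)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        Expr::Div(l, r) => match (fold(*l), fold(*r)) {
            // Division by a literal zero is not folded (assumption of this sketch).
            (Expr::I64(a), Expr::I64(b)) if b != 0 => Expr::I64(a.wrapping_div(b)),
            (l, r) => Expr::Div(Box::new(l), Box::new(r)),
        },
        // Literals and variables are already as simple as they can get.
        other => other,
    }
}

fn main() {
    // 2 + 3 * 4 collapses to a single literal.
    let expr = Expr::Add(
        Box::new(Expr::I64(2)),
        Box::new(Expr::Mul(Box::new(Expr::I64(3)), Box::new(Expr::I64(4)))),
    );
    assert_eq!(fold(expr), Expr::I64(14));

    // x + 2 * 3 keeps the variable but folds the literal subtree.
    let expr = Expr::Add(
        Box::new(Expr::Var("x".to_string())),
        Box::new(Expr::Mul(Box::new(Expr::I64(2)), Box::new(Expr::I64(3)))),
    );
    let expected = Expr::Add(Box::new(Expr::Var("x".to_string())), Box::new(Expr::I64(6)));
    assert_eq!(fold(expr), expected);
}
```

Folding bottom-up means a single pass is enough: by the time a node is inspected, its operands are already in their simplest form.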


@@ -1,208 +1,85 @@
use std::{cell::RefCell, rc::Rc}; use std::{fmt::Display, rc::Rc};
use thiserror::Error;
use crate::{ use crate::{
ast::{Ast, BinOpType, BlockScope, Expression, FunDecl, If, Statement, UnOpType}, ast::{Ast, BinOpType, Expression, If, Statement, UnOpType},
astoptimizer::{AstOptimizer, SimpleAstOptimizer},
lexer::lex, lexer::lex,
nice_panic,
parser::parse, parser::parse,
stringstore::{Sid, StringStore},
}; };
#[derive(Debug, Error)]
pub enum RuntimeError {
#[error("Invalid array Index: {0:?}")]
InvalidArrayIndex(Value),
#[error("Variable used but not declared: {0}")]
VarUsedNotDeclared(String),
#[error("Can't index into non-array variable: {0}")]
TryingToIndexNonArray(String),
#[error("Invalid value type for unary operation: {0:?}")]
UnOpInvalidType(Value),
#[error("Incompatible binary operations. Operands don't match: {0:?} and {1:?}")]
BinOpIncompatibleTypes(Value, Value),
#[error("Array access out of bounds: Accessed {0}, size is {1}")]
ArrayOutOfBounds(usize, usize),
#[error("Division by zero")]
DivideByZero,
#[error("Invalid number of arguments for function {0}. Expected {1}, got {2}")]
InvalidNumberOfArgs(String, usize, usize),
}
#[derive(Debug, PartialEq, Eq, Clone)] #[derive(Debug, PartialEq, Eq, Clone)]
pub enum Value { pub enum Value {
I64(i64), I64(i64),
String(Sid), String(Rc<String>),
Array(Rc<RefCell<Vec<Value>>>),
Void,
} }
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum BlockExit {
Normal,
Break,
Continue,
Return(Value),
}
#[derive(Default)]
pub struct Interpreter { pub struct Interpreter {
pub optimize_ast: bool,
pub print_tokens: bool,
pub print_ast: bool,
pub capture_output: bool,
output: Vec<Value>,
// Variable table stores the runtime values of variables // Variable table stores the runtime values of variables
vartable: Vec<Value>, vartable: Vec<(String, Value)>,
funtable: Vec<FunDecl>,
stringstore: StringStore,
} }
impl Interpreter { impl Interpreter {
pub fn new() -> Self { pub fn new() -> Self {
Self { Self {
optimize_ast: true, vartable: Vec::new(),
..Self::default()
} }
} }
pub fn output(&self) -> &[Value] { fn get_var(&self, name: &str) -> Option<Value> {
&self.output self.vartable
.iter()
.rev()
.find(|it| it.0 == name)
.map(|it| it.1.clone())
} }
fn get_var(&self, idx: usize) -> Option<Value> { fn get_var_mut(&mut self, name: &str) -> Option<&mut Value> {
self.vartable.get(self.vartable.len() - idx - 1).cloned() self.vartable
.iter_mut()
.rev()
.find(|it| it.0 == name)
.map(|it| &mut it.1)
} }
fn get_var_mut(&mut self, idx: usize) -> Option<&mut Value> { pub fn run_str(&mut self, code: &str, print_tokens: bool, print_ast: bool) {
let idx = self.vartable.len() - idx - 1; let tokens = lex(code).unwrap();
self.vartable.get_mut(idx) if print_tokens {
}
pub fn run_str(&mut self, code: &str) {
let tokens = match lex(code) {
Ok(tokens) => tokens,
Err(e) => nice_panic!("Lexing error: {}", e),
};
if self.print_tokens {
println!("Tokens: {:?}", tokens); println!("Tokens: {:?}", tokens);
} }
let ast = match parse(tokens) { let ast = parse(tokens);
Ok(ast) => ast, if print_ast {
Err(e) => nice_panic!("Parsing error: {}", e), println!("{:#?}", ast);
};
match self.run_ast(ast) {
Ok(_) => (),
Err(e) => nice_panic!("Runtime error: {}", e),
}
} }
pub fn run_ast(&mut self, mut ast: Ast) -> Result<(), RuntimeError> { self.run(&ast);
if self.optimize_ast {
ast = SimpleAstOptimizer::optimize(ast);
} }
if self.print_ast { pub fn run(&mut self, prog: &Ast) {
println!("{:#?}", ast.main); let vartable_len = self.vartable.len();
} for stmt in &prog.prog {
self.stringstore = ast.stringstore;
self.run_block(&ast.main)?;
Ok(())
}
pub fn run_block(&mut self, prog: &BlockScope) -> Result<BlockExit, RuntimeError> {
self.run_block_fp_offset(prog, 0)
}
pub fn run_block_fp_offset(
&mut self,
prog: &BlockScope,
framepointer_offset: usize,
) -> Result<BlockExit, RuntimeError> {
let framepointer = self.vartable.len() - framepointer_offset;
for stmt in prog {
match stmt { match stmt {
Statement::Break => return Ok(BlockExit::Break),
Statement::Continue => return Ok(BlockExit::Continue),
Statement::Return(expr) => {
let val = self.resolve_expr(expr)?;
self.vartable.truncate(framepointer);
return Ok(BlockExit::Return(val));
}
Statement::Expr(expr) => { Statement::Expr(expr) => {
self.resolve_expr(expr)?; self.resolve_expr(expr);
} }
Statement::Declaration(decl) => {
let rhs = self.resolve_expr(&decl.rhs)?;
self.vartable.push(rhs);
}
Statement::Block(block) => match self.run_block(block)? {
// Propagate return, continue and break
be @ (BlockExit::Return(_) | BlockExit::Continue | BlockExit::Break) => {
self.vartable.truncate(framepointer);
return Ok(be);
}
_ => (),
},
Statement::Loop(looop) => { Statement::Loop(looop) => {
// loop runs as long as condition != 0 // loop runs as long as condition != 0
loop { loop {
if let Some(condition) = &looop.condition { if matches!(self.resolve_expr(&looop.condition), Value::I64(0)) {
if matches!(self.resolve_expr(condition)?, Value::I64(0)) {
break; break;
} }
}
let be = self.run_block(&looop.body)?; self.run(&looop.body);
match be {
// Propagate return
be @ BlockExit::Return(_) => {
self.vartable.truncate(framepointer);
return Ok(be);
}
BlockExit::Break => break,
BlockExit::Continue | BlockExit::Normal => (),
}
if let Some(adv) = &looop.advancement { if let Some(adv) = &looop.advancement {
self.resolve_expr(&adv)?; self.resolve_expr(&adv);
} }
} }
} }
Statement::Print(expr) => { Statement::Print(expr) => {
let result = self.resolve_expr(expr)?; let result = self.resolve_expr(expr);
print!("{}", result);
if self.capture_output {
self.output.push(result)
} else {
print!("{}", self.value_to_string(&result));
}
} }
Statement::If(If { Statement::If(If {
@@ -210,230 +87,73 @@ impl Interpreter {
body_true, body_true,
body_false, body_false,
}) => { }) => {
let exit = if matches!(self.resolve_expr(condition)?, Value::I64(0)) { if matches!(self.resolve_expr(condition), Value::I64(0)) {
self.run_block(body_false)? self.run(body_false);
} else { } else {
self.run_block(body_true)? self.run(body_true);
};
match exit {
// Propagate return, continue and break
be @ (BlockExit::Return(_) | BlockExit::Continue | BlockExit::Break) => {
self.vartable.truncate(framepointer);
return Ok(be);
} }
_ => (),
}
}
Statement::FunDeclare(fundec) => {
self.funtable.push(fundec.clone());
} }
} }
} }
self.vartable.truncate(framepointer); self.vartable.truncate(vartable_len);
Ok(BlockExit::Normal)
} }
fn resolve_expr(&mut self, expr: &Expression) -> Result<Value, RuntimeError> { fn resolve_expr(&mut self, expr: &Expression) -> Value {
let val = match expr { match expr {
Expression::I64(val) => Value::I64(*val), Expression::I64(val) => Value::I64(*val),
Expression::ArrayLiteral(size) => {
let size = match self.resolve_expr(size)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
Value::Array(Rc::new(RefCell::new(vec![Value::I64(0); size as usize])))
}
Expression::String(text) => Value::String(text.clone()), Expression::String(text) => Value::String(text.clone()),
Expression::BinOp(bo, lhs, rhs) => self.resolve_binop(bo, lhs, rhs)?, Expression::BinOp(bo, lhs, rhs) => self.resolve_binop(bo, lhs, rhs),
Expression::UnOp(uo, operand) => self.resolve_unop(uo, operand)?, Expression::UnOp(uo, operand) => self.resolve_unop(uo, operand),
Expression::Var(name, idx) => self.resolve_var(*name, *idx)?, Expression::Var(name) => self.resolve_var(name),
Expression::ArrayAccess(name, idx, arr_idx) => {
self.resolve_array_access(*name, *idx, arr_idx)?
}
Expression::FunCall(fun_name, fun_stackpos, args) => {
let args_len = args.len();
// All of the arg expressions must be resolved before pushing the vars on the stack,
// otherwise the stack positions are incorrect while resolving
let args = args
.iter()
.map(|arg| self.resolve_expr(arg))
.collect::<Vec<_>>();
for arg in args {
self.vartable.push(arg?);
}
// Function existence has been verified in the parser, so unwrap here shouldn't fail
let expected_num_args = self.funtable.get(*fun_stackpos).unwrap().argnames.len();
if expected_num_args != args_len {
let fun_name = self
.stringstore
.lookup(*fun_name)
.cloned()
.unwrap_or("<unknown>".to_string());
return Err(RuntimeError::InvalidNumberOfArgs(
fun_name,
expected_num_args,
args_len,
));
}
match self.run_block_fp_offset(
&Rc::clone(&self.funtable.get(*fun_stackpos).unwrap().body),
expected_num_args,
)? {
BlockExit::Normal | BlockExit::Continue | BlockExit::Break => Value::Void,
BlockExit::Return(val) => val,
} }
} }
};
Ok(val) fn resolve_var(&mut self, name: &str) -> Value {
} match self.get_var(name) {
Some(val) => val.clone(),
fn resolve_array_access( None => panic!("Variable '{}' used but not declared", name),
&mut self,
name: Sid,
idx: usize,
arr_idx: &Expression,
) -> Result<Value, RuntimeError> {
let arr_idx = match self.resolve_expr(arr_idx)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
let val = match self.get_var(idx) {
Some(val) => val,
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
let arr = match val {
Value::Array(arr) => arr,
_ => {
return Err(RuntimeError::TryingToIndexNonArray(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
let arr = arr.borrow_mut();
arr.get(arr_idx as usize)
.cloned()
.ok_or(RuntimeError::ArrayOutOfBounds(arr_idx as usize, arr.len()))
}
fn resolve_var(&mut self, name: Sid, idx: usize) -> Result<Value, RuntimeError> {
match self.get_var(idx) {
Some(val) => Ok(val),
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
} }
} }
fn resolve_unop(&mut self, uo: &UnOpType, operand: &Expression) -> Result<Value, RuntimeError> { fn resolve_unop(&mut self, uo: &UnOpType, operand: &Expression) -> Value {
let operand = self.resolve_expr(operand)?; let operand = self.resolve_expr(operand);
Ok(match (operand, uo) { match (operand, uo) {
(Value::I64(val), UnOpType::Negate) => Value::I64(-val), (Value::I64(val), UnOpType::Negate) => Value::I64(-val),
(Value::I64(val), UnOpType::BNot) => Value::I64(!val), (Value::I64(val), UnOpType::BNot) => Value::I64(!val),
(Value::I64(val), UnOpType::LNot) => Value::I64(if val == 0 { 1 } else { 0 }), (Value::I64(val), UnOpType::LNot) => Value::I64(if val == 0 { 1 } else { 0 }),
(val, _) => return Err(RuntimeError::UnOpInvalidType(val)), _ => panic!("Value type is not compatible with unary operation"),
}) }
} }
fn resolve_binop( fn resolve_binop(&mut self, bo: &BinOpType, lhs: &Expression, rhs: &Expression) -> Value {
&mut self, let rhs = self.resolve_expr(rhs);
bo: &BinOpType,
lhs: &Expression,
rhs: &Expression,
) -> Result<Value, RuntimeError> {
let rhs = self.resolve_expr(rhs)?;
match (&bo, &lhs) { match (&bo, &lhs) {
(BinOpType::Assign, Expression::Var(name, idx)) => { (BinOpType::Declare, Expression::Var(name)) => {
match self.get_var_mut(*idx) { self.vartable.push((name.clone(), rhs.clone()));
return rhs;
}
(BinOpType::Assign, Expression::Var(name)) => {
match self.get_var_mut(name) {
Some(val) => *val = rhs.clone(), Some(val) => *val = rhs.clone(),
None => { None => panic!("Runtime Error: Trying to assign value to undeclared variable"),
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
} }
} return rhs;
return Ok(rhs);
}
(BinOpType::Assign, Expression::ArrayAccess(name, idx, arr_idx)) => {
let arr_idx = match self.resolve_expr(arr_idx)? {
Value::I64(size) if !size.is_negative() => size,
val => return Err(RuntimeError::InvalidArrayIndex(val)),
};
let val = match self.get_var_mut(*idx) {
Some(val) => val,
None => {
return Err(RuntimeError::VarUsedNotDeclared(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
};
match val {
Value::Array(arr) => arr.borrow_mut()[arr_idx as usize] = rhs.clone(),
_ => {
return Err(RuntimeError::TryingToIndexNonArray(
self.stringstore
.lookup(*name)
.cloned()
.unwrap_or_else(|| "<unknown>".to_string()),
))
}
}
return Ok(rhs);
} }
_ => (), _ => (),
} }
let lhs = self.resolve_expr(lhs)?; let lhs = self.resolve_expr(lhs);
let result = match (lhs, rhs) { match (lhs, rhs) {
(Value::I64(lhs), Value::I64(rhs)) => match bo { (Value::I64(lhs), Value::I64(rhs)) => match bo {
BinOpType::Add => Value::I64(lhs + rhs), BinOpType::Add => Value::I64(lhs + rhs),
BinOpType::Mul => Value::I64(lhs * rhs), BinOpType::Mul => Value::I64(lhs * rhs),
BinOpType::Sub => Value::I64(lhs - rhs), BinOpType::Sub => Value::I64(lhs - rhs),
BinOpType::Div => { BinOpType::Div => Value::I64(lhs / rhs),
Value::I64(lhs.checked_div(rhs).ok_or(RuntimeError::DivideByZero)?) BinOpType::Mod => Value::I64(lhs % rhs),
}
BinOpType::Mod => {
Value::I64(lhs.checked_rem(rhs).ok_or(RuntimeError::DivideByZero)?)
}
BinOpType::BOr => Value::I64(lhs | rhs), BinOpType::BOr => Value::I64(lhs | rhs),
BinOpType::BAnd => Value::I64(lhs & rhs), BinOpType::BAnd => Value::I64(lhs & rhs),
BinOpType::BXor => Value::I64(lhs ^ rhs), BinOpType::BXor => Value::I64(lhs ^ rhs),
@@ -448,25 +168,18 @@ impl Interpreter {
BinOpType::Greater => Value::I64(if lhs > rhs { 1 } else { 0 }), BinOpType::Greater => Value::I64(if lhs > rhs { 1 } else { 0 }),
BinOpType::GreaterEqu => Value::I64(if lhs >= rhs { 1 } else { 0 }), BinOpType::GreaterEqu => Value::I64(if lhs >= rhs { 1 } else { 0 }),
BinOpType::Assign => unreachable!(), BinOpType::Declare | BinOpType::Assign => unreachable!(),
}, },
(lhs, rhs) => return Err(RuntimeError::BinOpIncompatibleTypes(lhs, rhs)), _ => panic!("Value types are not compatible"),
}; }
}
Ok(result)
} }
fn value_to_string(&self, val: &Value) -> String { impl Display for Value {
match val { fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
Value::I64(val) => format!("{}", val), match self {
Value::Array(val) => format!("{:?}", val.borrow()), Value::I64(val) => write!(f, "{}", val),
Value::String(text) => format!( Value::String(text) => write!(f, "{}", text),
"{}",
self.stringstore
.lookup(*text)
.unwrap_or(&"<invalid string>".to_string())
),
Value::Void => format!("void"),
} }
} }
} }
@@ -499,7 +212,7 @@ mod test {
let expected = Value::I64(11); let expected = Value::I64(11);
let mut interpreter = Interpreter::new(); let mut interpreter = Interpreter::new();
let actual = interpreter.resolve_expr(&ast).unwrap(); let actual = interpreter.resolve_expr(&ast);
assert_eq!(expected, actual); assert_eq!(expected, actual);
} }
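The vec based scoping described in the commit message (push on declaration, reverse linear search on lookup, truncate at the end of a scope) can be sketched in isolation as follows. This is a minimal illustration, not the interpreter's code: `VarTable`, `declare`, `get`, `set` and `scoped` are hypothetical names, and values are plain `i64` instead of the interpreter's `Value`.

```rust
/// Hypothetical stand-in for the interpreter's `vartable`.
#[derive(Default)]
struct VarTable {
    /// Variables in declaration order; the newest entry is last.
    vars: Vec<(String, i64)>,
}

impl VarTable {
    /// Declare a variable. An earlier variable with the same name is shadowed, not replaced.
    fn declare(&mut self, name: &str, val: i64) {
        self.vars.push((name.to_string(), val));
    }

    /// Search from newest to oldest so that inner declarations win. This is O(n).
    fn get(&self, name: &str) -> Option<i64> {
        self.vars
            .iter()
            .rev()
            .find(|(n, _)| n == name)
            .map(|(_, v)| *v)
    }

    /// Assign to the innermost variable with that name. Returns false if it is undeclared.
    fn set(&mut self, name: &str, val: i64) -> bool {
        match self.vars.iter_mut().rev().find(|(n, _)| n == name) {
            Some((_, slot)) => {
                *slot = val;
                true
            }
            None => false,
        }
    }

    /// Run `body` inside a new scope. Everything declared by `body` is dropped afterwards.
    fn scoped<R>(&mut self, body: impl FnOnce(&mut Self) -> R) -> R {
        let len_before = self.vars.len();
        let result = body(self);
        self.vars.truncate(len_before);
        result
    }
}
```

Because lookups start at the end of the vec, shadowing falls out for free. The cost is the linear search the commit message mentions, which grows with the number of live variables.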


@@ -1,8 +1,8 @@
use crate::token::Token;
use anyhow::Result;
use std::{iter::Peekable, str::Chars}; use std::{iter::Peekable, str::Chars};
use thiserror::Error; use thiserror::Error;
use crate::{token::Token, T};
#[derive(Debug, Error)] #[derive(Debug, Error)]
pub enum LexErr { pub enum LexErr {
#[error("Failed to parse '{0}' as i64")] #[error("Failed to parse '{0}' as i64")]
@@ -20,101 +20,116 @@ pub enum LexErr {
/// Lex the provided code into a Token Buffer /// Lex the provided code into a Token Buffer
pub fn lex(code: &str) -> Result<Vec<Token>, LexErr> { pub fn lex(code: &str) -> Result<Vec<Token>, LexErr> {
let lexer = Lexer::new(code); let mut lexer = Lexer::new(code);
lexer.lex() lexer.lex()
} }
struct Lexer<'a> { struct Lexer<'a> {
/// The sourcecode text as an iterator over the chars /// The sourcecode text as an iterator over the chars
code: Peekable<Chars<'a>>, code: Peekable<Chars<'a>>,
/// The lexed tokens
tokens: Vec<Token>,
/// The sourcecode character that is currently being lexed
current_char: char,
} }
impl<'a> Lexer<'a> { impl<'a> Lexer<'a> {
fn new(code: &'a str) -> Self { fn new(code: &'a str) -> Self {
let code = code.chars().peekable(); let code = code.chars().peekable();
let tokens = Vec::new(); Self { code }
let current_char = '\0';
Self {
code,
tokens,
current_char,
}
} }
fn lex(mut self) -> Result<Vec<Token>, LexErr> { fn lex(&mut self) -> Result<Vec<Token>, LexErr> {
let mut tokens = Vec::new();
loop { loop {
self.current_char = self.next(); match self.next() {
match (self.current_char, self.peek()) {
// Stop lexing at EOF // Stop lexing at EOF
('\0', _) => break, '\0' => break,
// Skip whitespace // Skip whitespace
(' ' | '\t' | '\n' | '\r', _) => (), ' ' | '\t' | '\n' | '\r' => (),
// Line comment. Consume every char until linefeed (next line) // Line comment. Consume every char until linefeed (next line)
('/', '/') => while !matches!(self.next(), '\n' | '\0') {}, '/' if matches!(self.peek(), '/') => while !matches!(self.next(), '\n' | '\0') {},
// Double character tokens // Double character tokens
('>', '>') => self.push_tok_consume(T![>>]), '>' if matches!(self.peek(), '>') => {
('<', '<') => self.push_tok_consume(T![<<]), self.next();
('=', '=') => self.push_tok_consume(T![==]), tokens.push(Token::Shr);
('!', '=') => self.push_tok_consume(T![!=]), }
('<', '=') => self.push_tok_consume(T![<=]), '<' if matches!(self.peek(), '<') => {
('>', '=') => self.push_tok_consume(T![>=]), self.next();
('<', '-') => self.push_tok_consume(T![<-]), tokens.push(Token::Shl);
('&', '&') => self.push_tok_consume(T![&&]), }
('|', '|') => self.push_tok_consume(T![||]), '=' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::EquEqu);
}
'!' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::NotEqu);
}
'<' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::LAngleEqu);
}
'>' if matches!(self.peek(), '=') => {
self.next();
tokens.push(Token::RAngleEqu);
}
'<' if matches!(self.peek(), '-') => {
self.next();
tokens.push(Token::LArrow);
}
'&' if matches!(self.peek(), '&') => {
self.next();
tokens.push(Token::LAnd);
}
'|' if matches!(self.peek(), '|') => {
self.next();
tokens.push(Token::LOr);
}
// Single character tokens // Single character tokens
(',', _) => self.push_tok(T![,]), ';' => tokens.push(Token::Semicolon),
(';', _) => self.push_tok(T![;]), '+' => tokens.push(Token::Add),
('+', _) => self.push_tok(T![+]), '-' => tokens.push(Token::Sub),
('-', _) => self.push_tok(T![-]), '*' => tokens.push(Token::Mul),
('*', _) => self.push_tok(T![*]), '/' => tokens.push(Token::Div),
('/', _) => self.push_tok(T![/]), '%' => tokens.push(Token::Mod),
('%', _) => self.push_tok(T![%]), '|' => tokens.push(Token::BOr),
('|', _) => self.push_tok(T![|]), '&' => tokens.push(Token::BAnd),
('&', _) => self.push_tok(T![&]), '^' => tokens.push(Token::BXor),
('^', _) => self.push_tok(T![^]), '(' => tokens.push(Token::LParen),
('(', _) => self.push_tok(T!['(']), ')' => tokens.push(Token::RParen),
(')', _) => self.push_tok(T![')']), '~' => tokens.push(Token::Tilde),
('~', _) => self.push_tok(T![~]), '<' => tokens.push(Token::LAngle),
('<', _) => self.push_tok(T![<]), '>' => tokens.push(Token::RAngle),
('>', _) => self.push_tok(T![>]), '=' => tokens.push(Token::Equ),
('=', _) => self.push_tok(T![=]), '{' => tokens.push(Token::LBraces),
('{', _) => self.push_tok(T!['{']), '}' => tokens.push(Token::RBraces),
('}', _) => self.push_tok(T!['}']), '!' => tokens.push(Token::LNot),
('!', _) => self.push_tok(T![!]),
('[', _) => self.push_tok(T!['[']),
(']', _) => self.push_tok(T![']']),
// Special tokens with variable length // Special tokens with variable length
// Lex multiple characters together as numbers // Lex multiple characters together as numbers
('0'..='9', _) => self.lex_number()?, ch @ '0'..='9' => tokens.push(self.lex_number(ch)?),
// Lex multiple characters together as a string // Lex multiple characters together as a string
('"', _) => self.lex_str()?, '"' => tokens.push(self.lex_str()?),
// Lex multiple characters together as identifier // Lex multiple characters together as identifier
('a'..='z' | 'A'..='Z' | '_', _) => self.lex_identifier()?, ch @ ('a'..='z' | 'A'..='Z' | '_') => tokens.push(self.lex_identifier(ch)?),
(ch, _) => Err(LexErr::UnexpectedChar(ch))?, ch => Err(LexErr::UnexpectedChar(ch))?,
} }
} }
Ok(self.tokens) Ok(tokens)
} }
/// Lex multiple characters as a number until encountering a non numeric digit. The /// Lex multiple characters as a number until encountering a non numeric digit. This includes
/// successfully lexed i64 literal token is appended to the stored tokens. /// the first character
fn lex_number(&mut self) -> Result<(), LexErr> { fn lex_number(&mut self, first_char: char) -> Result<Token, LexErr> {
// String representation of the integer value // String representation of the integer value
let mut sval = String::from(self.current_char); let mut sval = String::from(first_char);
// Do as long as a next char exists and it is a numeric char // Do as long as a next char exists and it is a numeric char
loop { loop {
@@ -134,15 +149,11 @@ impl<'a> Lexer<'a> {
// Try to convert the string representation of the value to i64 // Try to convert the string representation of the value to i64
let i64val = sval.parse().map_err(|_| LexErr::NumericParse(sval))?; let i64val = sval.parse().map_err(|_| LexErr::NumericParse(sval))?;
Ok(Token::I64(i64val))
self.push_tok(T![i64(i64val)]);
Ok(())
} }
/// Lex characters as a string until encountering an unescaped closing double quote char '"'. /// Lex characters as a string until encountering an unescaped closing double quote char '"'
/// The successfully lexed string literal token is appended to the stored tokens. fn lex_str(&mut self) -> Result<Token, LexErr> {
fn lex_str(&mut self) -> Result<(), LexErr> {
// Opening " was consumed in match // Opening " was consumed in match
let mut text = String::new(); let mut text = String::new();
@@ -172,15 +183,12 @@ impl<'a> Lexer<'a> {
// Consume closing " // Consume closing "
self.next(); self.next();
self.push_tok(T![str(text)]); Ok(Token::String(text))
Ok(())
} }
/// Lex characters from the text as an identifier. The successfully lexed ident or keyword /// Lex characters from the text as an identifier. This includes the first character passed in
/// token is appended to the stored tokens. fn lex_identifier(&mut self, first_char: char) -> Result<Token, LexErr> {
fn lex_identifier(&mut self) -> Result<(), LexErr> { let mut ident = String::from(first_char);
let mut ident = String::from(self.current_char);
// Do as long as a next char exists and it is a valid char for an identifier // Do as long as a next char exists and it is a valid char for an identifier
loop { loop {
@@ -196,33 +204,16 @@ impl<'a> Lexer<'a> {
// Check for pre-defined keywords // Check for pre-defined keywords
let token = match ident.as_str() { let token = match ident.as_str() {
"loop" => T![loop], "loop" => Token::Loop,
"print" => T![print], "print" => Token::Print,
"if" => T![if], "if" => Token::If,
"else" => T![else], "else" => Token::Else,
"fun" => T![fun],
"return" => T![return],
"break" => T![break],
"continue" => T![continue],
// If it doesn't match a keyword, it is a normal identifier // If it doesn't match a keyword, it is a normal identifier
_ => T![ident(ident)], _ => Token::Ident(ident),
}; };
self.push_tok(token); Ok(token)
Ok(())
}
/// Push the given token into the stored tokens
fn push_tok(&mut self, token: Token) {
self.tokens.push(token);
}
/// Same as `push_tok` but also consumes the next char, removing it from the code iter
fn push_tok_consume(&mut self, token: Token) {
self.next();
self.tokens.push(token);
} }
/// Advance to next character and return the removed char /// Advance to next character and return the removed char
@@ -238,51 +229,31 @@ impl<'a> Lexer<'a> {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use crate::{lexer::lex, T}; use super::{lex, Token};
#[test] #[test]
fn test_lexer() { fn test_lexer() {
let code = r#"53+1-567_000 * / % | ~ ! < > & ^ ({[]});= <- >= <= let code = "33 +5*2 + 4456467*2334+3 % - / << ^ | & >>";
== != && || << >> loop if else print my_123var "hello \t world\r\n\"\\""#;
let expected = vec![ let expected = vec![
T![i64(53)], Token::I64(33),
T![+], Token::Add,
T![i64(1)], Token::I64(5),
T![-], Token::Mul,
T![i64(567_000)], Token::I64(2),
T![*], Token::Add,
T![/], Token::I64(4456467),
T![%], Token::Mul,
T![|], Token::I64(2334),
T![~], Token::Add,
T![!], Token::I64(3),
T![<], Token::Mod,
T![>], Token::Sub,
T![&], Token::Div,
T![^], Token::Shl,
T!['('], Token::BXor,
T!['{'], Token::BOr,
T!['['], Token::BAnd,
T![']'], Token::Shr,
T!['}'],
T![')'],
T![;],
T![=],
T![<-],
T![>=],
T![<=],
T![==],
T![!=],
T![&&],
T![||],
T![<<],
T![>>],
T![loop],
T![if],
T![else],
T![print],
T![ident("my_123var".to_string())],
T![str("hello \t world\r\n\"\\".to_string())],
]; ];
let actual = lex(code).unwrap(); let actual = lex(code).unwrap();
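Both lexer variants in this diff decide between one and two character tokens by peeking at the next char. A reduced sketch of the tuple-matching form, limited to the angle bracket tokens, could look like the following. `Tok` and `lex_angles` are hypothetical names for illustration, and the error type is simplified to the offending `char` rather than `LexErr`.

```rust
use std::{iter::Peekable, str::Chars};

/// Hypothetical, reduced token set (the real `Token` enum has many more variants).
#[derive(Debug, PartialEq)]
enum Tok {
    Shl,
    Shr,
    LAngle,
    RAngle,
    LAngleEqu,
    RAngleEqu,
    LArrow,
}

/// Lex only whitespace and angle bracket tokens. Returns the first unexpected char as error.
fn lex_angles(code: &str) -> Result<Vec<Tok>, char> {
    let mut chars: Peekable<Chars<'_>> = code.chars().peekable();
    let mut tokens = Vec::new();

    while let Some(ch) = chars.next() {
        // Match on the current char together with a copy of the following one
        let tok = match (ch, chars.peek().copied()) {
            // Skip whitespace
            (' ' | '\t' | '\n' | '\r', _) => continue,

            // Double character tokens must come first and consume the second char
            ('<', Some('<')) => {
                chars.next();
                Tok::Shl
            }
            ('>', Some('>')) => {
                chars.next();
                Tok::Shr
            }
            ('<', Some('=')) => {
                chars.next();
                Tok::LAngleEqu
            }
            ('>', Some('=')) => {
                chars.next();
                Tok::RAngleEqu
            }
            ('<', Some('-')) => {
                chars.next();
                Tok::LArrow
            }

            // Single character fallbacks
            ('<', _) => Tok::LAngle,
            ('>', _) => Tok::RAngle,

            (other, _) => return Err(other),
        };
        tokens.push(tok);
    }

    Ok(tokens)
}
```

The order of the arms matters: if `('<', _)` came before `('<', Some('<'))`, the input `<<` would lex as two `LAngle` tokens.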


@@ -1,61 +1,5 @@
pub mod lexer;
pub mod token;
pub mod parser;
pub mod ast; pub mod ast;
pub mod interpreter; pub mod interpreter;
pub mod lexer;
pub mod parser;
pub mod token;
pub mod stringstore;
pub mod astoptimizer;
pub mod util;
#[cfg(test)]
mod tests {
use crate::interpreter::{Interpreter, Value};
use std::fs::read_to_string;
fn run_example_check_single_i64_output(filename: &str, correct_result: i64) {
let mut interpreter = Interpreter::new();
interpreter.capture_output = true;
let code = read_to_string(format!("examples/{filename}")).unwrap();
interpreter.run_str(&code);
let expected_output = [Value::I64(correct_result)];
assert_eq!(interpreter.output(), &expected_output);
}
#[test]
fn test_euler1() {
run_example_check_single_i64_output("euler1.nek", 233168);
}
#[test]
fn test_euler2() {
run_example_check_single_i64_output("euler2.nek", 4613732);
}
#[test]
fn test_euler3() {
run_example_check_single_i64_output("euler3.nek", 6857);
}
#[test]
fn test_euler4() {
run_example_check_single_i64_output("euler4.nek", 906609);
}
#[test]
fn test_euler5() {
run_example_check_single_i64_output("euler5.nek", 232792560);
}
#[test]
fn test_recursive_fib() {
run_example_check_single_i64_output("recursive_fib.nek", 832040);
}
#[test]
fn test_functions() {
run_example_check_single_i64_output("test_functions.nek", 69);
}
}


@@ -1,12 +1,16 @@
use std::{env::args, fs, process::exit}; use std::{
env::args,
fs,
io::{stdin, stdout, Write},
};
use nek_lang::{interpreter::Interpreter, nice_panic}; use nek_lang::interpreter::Interpreter;
#[derive(Debug, Default)] #[derive(Debug, Default)]
struct CliConfig { struct CliConfig {
print_tokens: bool, print_tokens: bool,
print_ast: bool, print_ast: bool,
no_optimizations: bool, interactive: bool,
file: Option<String>, file: Option<String>,
} }
@@ -18,39 +22,34 @@ fn main() {
match arg.as_str() { match arg.as_str() {
"--token" | "-t" => conf.print_tokens = true, "--token" | "-t" => conf.print_tokens = true,
"--ast" | "-a" => conf.print_ast = true, "--ast" | "-a" => conf.print_ast = true,
"--no-opt" | "-n" => conf.no_optimizations = true, "--interactive" | "-i" => conf.interactive = true,
"--help" | "-h" => print_help(), file if conf.file.is_none() => conf.file = Some(file.to_string()),
file if !arg.starts_with("-") && conf.file.is_none() => { _ => panic!("Invalid argument: '{}'", arg),
conf.file = Some(file.to_string())
}
_ => nice_panic!("Error: Invalid argument '{}'", arg),
} }
} }
let mut interpreter = Interpreter::new(); let mut interpreter = Interpreter::new();
interpreter.print_tokens = conf.print_tokens;
interpreter.print_ast = conf.print_ast;
interpreter.optimize_ast = !conf.no_optimizations;
if let Some(file) = &conf.file { if let Some(file) = &conf.file {
let code = match fs::read_to_string(file) { let code = fs::read_to_string(file).expect(&format!("File not found: '{}'", file));
Ok(code) => code, interpreter.run_str(&code, conf.print_tokens, conf.print_ast);
Err(_) => nice_panic!("Error: Could not read file '{}'", file),
};
interpreter.run_str(&code);
} else {
println!("Error: No file given\n");
print_help();
}
} }
fn print_help() { if conf.interactive || conf.file.is_none() {
println!("Usage nek-lang [FLAGS] [FILE]"); let mut code = String::new();
println!("FLAGS: ");
println!("-t, --token Print the lexed tokens"); loop {
println!("-a, --ast Print the abstract syntax tree"); print!(">> ");
println!("-n, --no-opt Disable the AST optimizations"); stdout().flush().unwrap();
println!("-h, --help Show this help screen");
exit(0); code.clear();
stdin().read_line(&mut code).unwrap();
if code.trim() == "exit" {
break;
}
interpreter.run_str(&code, conf.print_tokens, conf.print_ast);
}
}
} }
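The interactive loop on the right-hand side reads one line at a time, stops at `exit`, and otherwise hands the line to `run_str`. A testable sketch of that loop, assuming the reader, writer and evaluator are passed in, might look like this. `repl` and `eval` are hypothetical names. Unlike the code in the diff, this sketch also stops on end of input, which is an added assumption and not behavior confirmed by the source.

```rust
use std::io::{self, BufRead, Write};

/// Prompt, read a line, and evaluate it until the user types `exit` or input ends.
/// `eval` stands in for `Interpreter::run_str`.
fn repl<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    mut eval: impl FnMut(&str),
) -> io::Result<()> {
    let mut line = String::new();

    loop {
        write!(output, ">> ")?;
        // Flush so the prompt is visible before blocking on input
        output.flush()?;

        line.clear();
        // read_line returns Ok(0) at end of input (assumption: treat that like `exit`)
        if input.read_line(&mut line)? == 0 {
            break;
        }

        if line.trim() == "exit" {
            break;
        }

        eval(&line);
    }

    Ok(())
}
```

With real I/O this would be called as `repl(io::stdin().lock(), io::stdout(), |code| { /* run the code */ })`.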


@@ -1,328 +1,168 @@
use thiserror::Error; use std::iter::Peekable;
use crate::{ use crate::ast::*;
ast::{Ast, BlockScope, Expression, FunDecl, If, Loop, Statement, VarDecl}, use crate::token::Token;
stringstore::{Sid, StringStore},
token::Token,
util::{PutBackIter, PutBackableExt},
T,
};
#[derive(Debug, Error)]
pub enum ParseErr {
#[error("Unexpected Token \"{0:?}\", expected \"{1}\"")]
UnexpectedToken(Token, String),
#[error("Left hand side of declaration is not a variable")]
DeclarationOfNonVar,
#[error("Use of undefined variable \"{0}\"")]
UseOfUndeclaredVar(String),
#[error("Use of undefined function \"{0}\"")]
UseOfUndeclaredFun(String),
#[error("Redeclaration of function \"{0}\"")]
RedeclarationFun(String),
#[error("Function not declared at top level \"{0}\"")]
FunctionOnNonTopLevel(String),
}
type ResPE<T> = Result<T, ParseErr>;
macro_rules! validate_next {
($self:ident, $expected_tok:pat, $expected_str:expr) => {
match $self.next() {
$expected_tok => (),
tok => return Err(ParseErr::UnexpectedToken(tok, format!("{}", $expected_str))),
}
};
}
/// Parse the given tokens into an abstract syntax tree /// Parse the given tokens into an abstract syntax tree
pub fn parse<T: Iterator<Item = Token>, A: IntoIterator<IntoIter = T>>(tokens: A) -> ResPE<Ast> { pub fn parse<T: Iterator<Item = Token>, A: IntoIterator<IntoIter = T>>(tokens: A) -> Ast {
let parser = Parser::new(tokens); let mut parser = Parser::new(tokens);
parser.parse() parser.parse()
} }
struct Parser<T: Iterator<Item = Token>> { struct Parser<T: Iterator<Item = Token>> {
tokens: PutBackIter<T>, tokens: Peekable<T>,
string_store: StringStore,
var_stack: Vec<Sid>,
fun_stack: Vec<Sid>,
nesting_level: usize,
} }
impl<T: Iterator<Item = Token>> Parser<T> { impl<T: Iterator<Item = Token>> Parser<T> {
/// Create a new parser to parse the given Token Stream /// Create a new parser to parse the given Token Stream
pub fn new<A: IntoIterator<IntoIter = T>>(tokens: A) -> Self { fn new<A: IntoIterator<IntoIter = T>>(tokens: A) -> Self {
let tokens = tokens.into_iter().putbackable(); let tokens = tokens.into_iter().peekable();
let string_store = StringStore::new(); Self { tokens }
let var_stack = Vec::new();
let fun_stack = Vec::new();
Self {
tokens,
string_store,
var_stack,
fun_stack,
nesting_level: 0,
}
}
pub fn parse(mut self) -> ResPE<Ast> {
let main = self.parse_scoped_block()?;
Ok(Ast {
main,
stringstore: self.string_store,
})
}
fn parse_scoped_block(&mut self) -> ResPE<BlockScope> {
self.parse_scoped_block_fp_offset(0)
} }
/// Parse tokens into an abstract syntax tree. This will continuously parse statements until /// Parse tokens into an abstract syntax tree. This will continuously parse statements until
/// encountering end-of-file or a block end '}' . /// encountering end-of-file or a block end '}' .
fn parse_scoped_block_fp_offset(&mut self, framepoint_offset: usize) -> ResPE<BlockScope> { fn parse(&mut self) -> Ast {
self.nesting_level += 1;
let framepointer = self.var_stack.len() - framepoint_offset;
let mut prog = Vec::new(); let mut prog = Vec::new();
loop { loop {
match self.peek() { match self.peek() {
T![;] => { Token::Semicolon => {
self.next(); self.next();
} }
Token::EoF | Token::RBraces => break,
T![EoF] | T!['}'] => break,
T!['{'] => {
self.next();
prog.push(Statement::Block(self.parse_scoped_block()?));
validate_next!(self, T!['}'], "}");
}
// By default try to parse a statement // By default try to parse a statement
_ => prog.push(self.parse_stmt()?), _ => prog.push(self.parse_stmt()),
} }
} }
self.var_stack.truncate(framepointer); Ast { prog }
self.nesting_level -= 1;
Ok(prog)
} }
/// Parse a single statement from the tokens. /// Parse a single statement from the tokens.
fn parse_stmt(&mut self) -> ResPE<Statement> { fn parse_stmt(&mut self) -> Statement {
let stmt = match self.peek() { match self.peek() {
T![break] => { Token::Loop => Statement::Loop(self.parse_loop()),
Token::Print => {
self.next(); self.next();
validate_next!(self, T![;], ";"); let expr = self.parse_expr();
Statement::Break
}
T![continue] => {
self.next();
validate_next!(self, T![;], ";");
Statement::Continue
}
T![loop] => Statement::Loop(self.parse_loop()?),
T![print] => {
self.next();
let expr = self.parse_expr()?;
// After a statement, there must be a semicolon // After a statement, there must be a semicolon
validate_next!(self, T![;], ";"); if !matches!(self.next(), Token::Semicolon) {
panic!("Expected semicolon after statement");
}
Statement::Print(expr) Statement::Print(expr)
} }
T![return] => { Token::If => Statement::If(self.parse_if()),
self.next();
let stmt = Statement::Return(self.parse_expr()?);
// After a statement, there must be a semicolon
validate_next!(self, T![;], ";");
stmt
}
T![if] => Statement::If(self.parse_if()?),
T![fun] => {
self.next();
let fun_name = match self.next() {
T![ident(fun_name)] => fun_name,
tok => return Err(ParseErr::UnexpectedToken(tok, "<ident>".to_string())),
};
if self.nesting_level > 1 {
return Err(ParseErr::FunctionOnNonTopLevel(fun_name));
}
let fun_name = self.string_store.intern_or_lookup(&fun_name);
if self.fun_stack.contains(&fun_name) {
return Err(ParseErr::RedeclarationFun(
self.string_store
.lookup(fun_name)
.cloned()
.unwrap_or("<unknown>".to_string()),
));
}
let fun_stackpos = self.fun_stack.len();
self.fun_stack.push(fun_name);
let mut arg_names = Vec::new();
validate_next!(self, T!['('], "(");
while matches!(self.peek(), T![ident(_)]) {
let var_name = match self.next() {
T![ident(var_name)] => var_name,
_ => unreachable!(),
};
let var_name = self.string_store.intern_or_lookup(&var_name);
arg_names.push(var_name);
// Push the variable onto the varstack
self.var_stack.push(var_name);
// If there are more args skip the comma so that the loop will read the argname
if self.peek() == &T![,] {
self.next();
}
}
validate_next!(self, T![')'], ")");
validate_next!(self, T!['{'], "{");
// Create the scoped block with a stack offset. This will pop the args that are
// added to the stack while parsing args
let body = self.parse_scoped_block_fp_offset(arg_names.len())?;
validate_next!(self, T!['}'], "}");
Statement::FunDeclare(FunDecl {
name: fun_name,
fun_stackpos,
argnames: arg_names,
body: body.into(),
})
}
// If it is not a loop, try to parse as an expression
_ => { _ => {
let first = self.next(); let stmt = Statement::Expr(self.parse_expr());
let stmt = match (first, self.peek()) {
(T![ident(name)], T![<-]) => {
self.next();
let rhs = self.parse_expr()?;
let sid = self.string_store.intern_or_lookup(&name);
let sp = self.var_stack.len();
self.var_stack.push(sid);
Statement::Declaration(VarDecl {
name: sid,
var_stackpos: sp,
rhs,
})
}
(first, _) => {
self.putback(first);
Statement::Expr(self.parse_expr()?)
}
};
// After a statement, there must be a semicolon // After a statement, there must be a semicolon
validate_next!(self, T![;], ";"); if !matches!(self.next(), Token::Semicolon) {
panic!("Expected semicolon after statement");
}
stmt stmt
} }
}; }
Ok(stmt)
} }
/// Parse an if statement from the tokens /// Parse an if statement from the tokens
fn parse_if(&mut self) -> ResPE<If> { fn parse_if(&mut self) -> If {
validate_next!(self, T![if], "if"); if !matches!(self.next(), Token::If) {
panic!("Error lexing if: Expected if token");
let condition = self.parse_expr()?;
validate_next!(self, T!['{'], "{");
let body_true = self.parse_scoped_block()?;
validate_next!(self, T!['}'], "}");
let mut body_false = BlockScope::default();
if self.peek() == &T![else] {
self.next();
validate_next!(self, T!['{'], "{");
body_false = self.parse_scoped_block()?;
validate_next!(self, T!['}'], "}");
} }
Ok(If { let condition = self.parse_expr();
if !matches!(self.next(), Token::LBraces) {
panic!("Error lexing if: Expected '{{'")
}
let body_true = self.parse();
if !matches!(self.next(), Token::RBraces) {
panic!("Error lexing if: Expected '}}'")
}
let mut body_false = Ast::default();
if matches!(self.peek(), Token::Else) {
self.next();
if !matches!(self.next(), Token::LBraces) {
panic!("Error lexing if: Expected '{{'")
}
body_false = self.parse();
if !matches!(self.next(), Token::RBraces) {
panic!("Error lexing if: Expected '}}'")
}
}
If {
condition, condition,
body_true, body_true,
body_false, body_false,
}) }
} }
/// Parse a loop statement from the tokens /// Parse a loop statement from the tokens
fn parse_loop(&mut self) -> ResPE<Loop> { fn parse_loop(&mut self) -> Loop {
validate_next!(self, T![loop], "loop"); if !matches!(self.next(), Token::Loop) {
panic!("Error lexing loop: Expected loop token");
}
let mut condition = None; let condition = self.parse_expr();
let mut advancement = None; let mut advancement = None;
if !matches!(self.peek(), T!['{']) { let body;
condition = Some(self.parse_expr()?);
if matches!(self.peek(), T![;]) { match self.next() {
self.next(); Token::LBraces => {
advancement = Some(self.parse_expr()?); body = self.parse();
}
} }
validate_next!(self, T!['{'], "{"); Token::Semicolon => {
advancement = Some(self.parse_expr());
let body = self.parse_scoped_block()?; if !matches!(self.next(), Token::LBraces) {
panic!("Error lexing loop: Expected '{{'")
}
validate_next!(self, T!['}'], "}"); body = self.parse();
}
Ok(Loop { _ => panic!("Error lexing loop: Expected ';' or '{{'"),
}
if !matches!(self.next(), Token::RBraces) {
panic!("Error lexing loop: Expected '}}'")
}
Loop {
condition, condition,
advancement, advancement,
body, body,
}) }
} }
/// Parse a single expression from the tokens /// Parse a single expression from the tokens
fn parse_expr(&mut self) -> ResPE<Expression> { fn parse_expr(&mut self) -> Expression {
let lhs = self.parse_primary()?; let lhs = self.parse_primary();
self.parse_expr_precedence(lhs, 0) self.parse_expr_precedence(lhs, 0)
} }
/// Parse binary expressions with a precedence equal to or higher than min_prec /// Parse binary expressions with a precedence equal to or higher than min_prec
fn parse_expr_precedence(&mut self, mut lhs: Expression, min_prec: u8) -> ResPE<Expression> { fn parse_expr_precedence(&mut self, mut lhs: Expression, min_prec: u8) -> Expression {
while let Some(binop) = &self.peek().try_to_binop() { while let Some(binop) = &self.peek().try_to_binop() {
// Stop if the next operator has a lower binding power // Stop if the next operator has a lower binding power
if !(binop.precedence() >= min_prec) { if !(binop.precedence() >= min_prec) {
@@ -333,170 +173,99 @@ impl<T: Iterator<Item = Token>> Parser<T> {
// valid // valid
let binop = self.next().try_to_binop().unwrap(); let binop = self.next().try_to_binop().unwrap();
let mut rhs = self.parse_primary()?; let mut rhs = self.parse_primary();
while let Some(binop2) = &self.peek().try_to_binop() { while let Some(binop2) = &self.peek().try_to_binop() {
if !(binop2.precedence() > binop.precedence()) { if !(binop2.precedence() > binop.precedence()) {
break; break;
} }
rhs = self.parse_expr_precedence(rhs, binop.precedence() + 1)?; rhs = self.parse_expr_precedence(rhs, binop.precedence() + 1);
} }
lhs = Expression::BinOp(binop, lhs.into(), rhs.into()); lhs = Expression::BinOp(binop, lhs.into(), rhs.into());
} }
Ok(lhs) lhs
} }
/// Parse a primary expression (for now only number) /// Parse a primary expression (for now only number)
fn parse_primary(&mut self) -> ResPE<Expression> { fn parse_primary(&mut self) -> Expression {
let primary = match self.next() { match self.next() {
// Literal i64 // Literal i64
T![i64(val)] => Expression::I64(val), Token::I64(val) => Expression::I64(val),
// Literal String // Literal String
T![str(text)] => Expression::String(self.string_store.intern_or_lookup(&text)), Token::String(text) => Expression::String(text.into()),
// Array literal. Square brackets containing the array size as expression Token::Ident(name) => Expression::Var(name),
T!['['] => {
let size = self.parse_expr()?;
validate_next!(self, T![']'], "]");
Expression::ArrayLiteral(size.into())
}
// Array access, aka indexing. An ident followed by square brackets containing the
// index as an expression
T![ident(name)] if self.peek() == &T!['['] => {
let sid = self.string_store.intern_or_lookup(&name);
let stackpos = self.get_stackpos(sid)?;
self.next();
let index = self.parse_expr()?;
validate_next!(self, T![']'], "]");
Expression::ArrayAccess(sid, stackpos, index.into())
}
T![ident(name)] if self.peek() == &T!['('] => {
// Skip the opening parenthesis
self.next();
let sid = self.string_store.intern_or_lookup(&name);
let mut args = Vec::new();
while !matches!(self.peek(), T![')']) {
let arg = self.parse_expr()?;
args.push(arg);
// If there are more args, skip the comma so that the loop will read the next argument
if self.peek() == &T![,] {
self.next();
}
}
validate_next!(self, T![')'], ")");
let fun_stackpos = self.get_fun_stackpos(sid)?;
Expression::FunCall(sid, fun_stackpos, args)
}
T![ident(name)] => {
let sid = self.string_store.intern_or_lookup(&name);
let stackpos = self.get_stackpos(sid)?;
Expression::Var(sid, stackpos)
}
// Parentheses grouping // Parentheses grouping
T!['('] => { Token::LParen => {
let inner_expr = self.parse_expr()?; let inner_expr = self.parse_expr();
// Verify that there is a closing parenthesis // Verify that there is a closing parenthesis
validate_next!(self, T![')'], ")"); if !matches!(self.next(), Token::RParen) {
panic!("Error parsing primary expr: Expected closing parenthesis ')'");
}
inner_expr inner_expr
} }
// Unary operations or invalid token // Unary negation
tok => match tok.try_to_unop() { Token::Sub => {
Some(uot) => Expression::UnOp(uot, self.parse_primary()?.into()), let operand = self.parse_primary();
None => return Err(ParseErr::UnexpectedToken(tok, "primary".to_string())), Expression::UnOp(UnOpType::Negate, operand.into())
},
};
Ok(primary)
} }
fn get_stackpos(&self, varid: Sid) -> ResPE<usize> { // Unary bitwise not (bitflip)
self.var_stack Token::Tilde => {
.iter() let operand = self.parse_primary();
.rev() Expression::UnOp(UnOpType::BNot, operand.into())
.position(|it| *it == varid)
.map(|it| it)
.ok_or(ParseErr::UseOfUndeclaredVar(
self.string_store
.lookup(varid)
.map(String::from)
.unwrap_or("<unknown>".to_string()),
))
} }
fn get_fun_stackpos(&self, varid: Sid) -> ResPE<usize> { // Unary logical not
self.fun_stack Token::LNot => {
.iter() let operand = self.parse_primary();
.rev() Expression::UnOp(UnOpType::LNot, operand.into())
.position(|it| *it == varid) }
.map(|it| self.fun_stack.len() - it - 1)
.ok_or(ParseErr::UseOfUndeclaredFun( tok => panic!("Error parsing primary expr: Unexpected Token '{:?}'", tok),
self.string_store }
.lookup(varid)
.map(String::from)
.unwrap_or("<unknown>".to_string()),
))
} }
/// Get the next Token without removing it /// Get the next Token without removing it
fn peek(&mut self) -> &Token { fn peek(&mut self) -> &Token {
self.tokens.peek().unwrap_or(&T![EoF]) self.tokens.peek().unwrap_or(&Token::EoF)
}
fn putback(&mut self, tok: Token) {
self.tokens.putback(tok);
} }
/// Advance to next Token and return the removed Token /// Advance to next Token and return the removed Token
fn next(&mut self) -> Token { fn next(&mut self) -> Token {
self.tokens.next().unwrap_or(T![EoF]) self.tokens.next().unwrap_or(Token::EoF)
} }
} }
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::{parse, BinOpType, Expression};
use crate::{ use crate::{
ast::{BinOpType, Expression, Statement}, parser::{Ast, Statement},
parser::parse, token::Token,
T,
}; };
#[test] #[test]
fn test_parser() { fn test_parser() {
// Expression: 1 + 2 * 3 - 4 // Expression: 1 + 2 * 3 + 4
// With precedence: (1 + (2 * 3)) - 4 // With precedence: (1 + (2 * 3)) + 4
let tokens = [ let tokens = [
T![i64(1)], Token::I64(1),
T![+], Token::Add,
T![i64(2)], Token::I64(2),
T![*], Token::Mul,
T![i64(3)], Token::I64(3),
T![-], Token::Sub,
T![i64(4)], Token::I64(4),
T![;], Token::Semicolon,
]; ];
let expected = Statement::Expr(Expression::BinOp( let expected = Statement::Expr(Expression::BinOp(
@@ -515,9 +284,11 @@ mod tests {
Expression::I64(4).into(), Expression::I64(4).into(),
)); ));
let expected = vec![expected]; let expected = Ast {
prog: vec![expected],
};
let actual = parse(tokens).unwrap(); let actual = parse(tokens);
assert_eq!(expected, actual.main); assert_eq!(expected, actual);
} }
} }
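The `parse_expr_precedence` loop above follows the precedence-climbing pattern. The following is a minimal, self-contained sketch of that pattern, not the project's parser: `Tok`, `precedence`, `apply`, `primary`, `climb` and `eval` are hypothetical names, the operator set is reduced to `+`, `-` and `*`, and the sketch evaluates directly instead of building an `Expression` tree.

```rust
use std::iter::Peekable;

/// Hypothetical, simplified token type used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    Minus,
    Star,
}

/// Binding power of a binary operator, or None if the token is not an operator.
fn precedence(tok: &Tok) -> Option<u8> {
    match tok {
        Tok::Plus | Tok::Minus => Some(1),
        Tok::Star => Some(2),
        Tok::Num(_) => None,
    }
}

/// Apply a binary operator to two already evaluated operands.
fn apply(op: Tok, lhs: i64, rhs: i64) -> i64 {
    match op {
        Tok::Plus => lhs + rhs,
        Tok::Minus => lhs - rhs,
        Tok::Star => lhs * rhs,
        Tok::Num(_) => unreachable!("not an operator"),
    }
}

/// A primary expression is just a number in this sketch.
fn primary<I: Iterator<Item = Tok>>(tokens: &mut Peekable<I>) -> Option<i64> {
    match tokens.next()? {
        Tok::Num(n) => Some(n),
        _ => None,
    }
}

/// Fold all operators with a precedence of at least `min_prec` into `lhs`.
fn climb<I: Iterator<Item = Tok>>(
    tokens: &mut Peekable<I>,
    mut lhs: i64,
    min_prec: u8,
) -> Option<i64> {
    while let Some(prec) = tokens.peek().and_then(precedence) {
        // Stop if the next operator binds weaker than the caller allows
        if prec < min_prec {
            break;
        }
        let op = tokens.next()?;
        let mut rhs = primary(tokens)?;
        // Let stronger operators on the right claim `rhs` first
        while let Some(next_prec) = tokens.peek().and_then(precedence) {
            if next_prec <= prec {
                break;
            }
            rhs = climb(tokens, rhs, prec + 1)?;
        }
        lhs = apply(op, lhs, rhs);
    }
    Some(lhs)
}

/// Evaluate a full token sequence, returning None on malformed input.
fn eval(tokens: &[Tok]) -> Option<i64> {
    let mut iter = tokens.iter().copied().peekable();
    let lhs = primary(&mut iter)?;
    climb(&mut iter, lhs, 0)
}
```

With the token sequence from `test_parser` (`1 + 2 * 3 - 4`), `eval` groups the input as `(1 + (2 * 3)) - 4`, which is the same shape the test expects from the parser.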


@@ -1,31 +0,0 @@
use std::collections::HashMap;
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Sid(usize);
#[derive(Clone, Default)]
pub struct StringStore {
strings: Vec<String>,
sids: HashMap<String, Sid>,
}
impl StringStore {
pub fn new() -> Self {
Self { strings: Vec::new(), sids: HashMap::new() }
}
pub fn intern_or_lookup(&mut self, text: &str) -> Sid {
self.sids.get(text).copied().unwrap_or_else(|| {
let sid = Sid(self.strings.len());
self.strings.push(text.to_string());
self.sids.insert(text.to_string(), sid);
sid
})
}
pub fn lookup(&self, sid: Sid) -> Option<&String> {
self.strings.get(sid.0)
}
}
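To illustrate how interned ids combine with the vec-based scopes described in the commit message, here is a small sketch under stated assumptions. `Id`, `Interner` and `VarStack` are hypothetical stand-ins and not the types from this repository. Note that `resolve` returns an absolute index from the bottom of the stack, like `get_fun_stackpos`, whereas `get_stackpos` in the diff returns the offset from the top.

```rust
use std::collections::HashMap;

/// Hypothetical id type standing in for `Sid` in this sketch.
type Id = usize;

/// Maps every distinct name to a small integer id.
#[derive(Default)]
struct Interner {
    names: Vec<String>,
    ids: HashMap<String, Id>,
}

impl Interner {
    /// Return the existing id for `text` or create a new one.
    fn intern(&mut self, text: &str) -> Id {
        if let Some(&id) = self.ids.get(text) {
            return id;
        }
        let id = self.names.len();
        self.names.push(text.to_string());
        self.ids.insert(text.to_string(), id);
        id
    }

    /// Get the name back for an id, if it exists.
    fn name(&self, id: Id) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }
}

/// Variable stack: declarations are pushed, scopes are closed by truncating.
#[derive(Default)]
struct VarStack {
    vars: Vec<Id>,
}

impl VarStack {
    fn declare(&mut self, id: Id) {
        self.vars.push(id);
    }

    /// Index of the innermost declaration of `id`, found by searching from the top.
    /// This is a linear scan, so the cost grows with the number of live variables.
    fn resolve(&self, id: Id) -> Option<usize> {
        self.vars.iter().rposition(|it| *it == id)
    }

    /// Remember the current stack height when a scope is entered.
    fn open_scope(&self) -> usize {
        self.vars.len()
    }

    /// Drop every variable declared since the matching `open_scope`.
    fn close_scope(&mut self, mark: usize) {
        self.vars.truncate(mark);
    }
}
```

Searching in reverse means that a variable declared in an inner scope shadows one with the same name from an outer scope until the inner scope is closed.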


@@ -1,371 +1,146 @@
use crate::{ use crate::ast::BinOpType;
ast::{BinOpType, UnOpType},
T,
};
/// Language keywords
#[derive(Debug, PartialEq, Eq)]
pub enum Keyword {
/// Loop keyword ("loop")
Loop,
/// Print keyword ("print")
Print,
/// If keyword ("if")
If,
/// Else keyword ("else")
Else,
/// Function declaration keyword ("fun")
Fun,
/// Return keyword ("return")
Return,
/// Break keyword ("break")
Break,
/// Continue keyword ("continue")
Continue,
}
/// Literal values
#[derive(Debug, PartialEq, Eq)]
pub enum Literal {
/// Integer literal (64-bit)
I64(i64),
/// String literal
String(String),
}
/// Combined tokens that consist of a combination of characters
#[derive(Debug, PartialEq, Eq)]
pub enum Combo {
/// Equal Equal ("==")
Equal2,
/// Exclamation mark Equal ("!=")
ExclamationMarkEqual,
/// Ampersand Ampersand ("&&")
Ampersand2,
/// Pipe Pipe ("||")
Pipe2,
/// LessThan LessThan ("<<")
LessThan2,
/// GreaterThan GreaterThan (">>")
GreaterThan2,
/// LessThan Equal ("<=")
LessThanEqual,
/// GreaterThan Equal (">=")
GreaterThanEqual,
/// LessThan Minus ("<-")
LessThanMinus,
}
#[derive(Debug, PartialEq, Eq)] #[derive(Debug, PartialEq, Eq)]
pub enum Token { pub enum Token {
/// Literal value token /// Integer literal (64-bit)
Literal(Literal), I64(i64),
/// Keyword token /// String literal
Keyword(Keyword), String(String),
/// Identifier (name for variables, functions, ...) /// Identifier (name for variables, functions, ...)
Ident(String), Ident(String),
/// Combined tokens consisting of multiple characters /// Loop keyword (loop)
Combo(Combo), Loop,
/// Comma (",") /// Print keyword (print)
Comma, Print,
/// Equal Sign ("=") /// If keyword (if)
Equal, If,
/// Semicolon (";") /// Else keyword (else)
Else,
/// Left Parenthesis ('(')
LParen,
/// Right Parenthesis (')')
RParen,
/// Left curly braces ({)
LBraces,
/// Right curly braces (})
RBraces,
/// Plus (+)
Add,
/// Minus (-)
Sub,
/// Asterisk (*)
Mul,
/// Slash (/)
Div,
/// Percent (%)
Mod,
/// Equal Equal (==)
EquEqu,
/// Exclamationmark Equal (!=)
NotEqu,
/// Pipe (|)
BOr,
/// Ampersand (&)
BAnd,
/// Circumflex (^)
BXor,
/// Logical AND (&&)
LAnd,
/// Logical OR (||)
LOr,
/// Shift Left (<<)
Shl,
/// Shift Right (>>)
Shr,
/// Tilde (~)
Tilde,
/// Logical not (!)
LNot,
/// Left angle bracket (<)
LAngle,
/// Right angle bracket (>)
RAngle,
/// Left angle bracket Equal (<=)
LAngleEqu,
/// Right angle bracket Equal (>=)
RAngleEqu,
/// Left arrow (<-)
LArrow,
/// Equal Sign (=)
Equ,
/// Semicolon (;)
Semicolon, Semicolon,
/// End of file /// End of file
EoF, EoF,
/// Left Bracket ("[")
LBracket,
/// Right Bracket ("]")
RBracket,
/// Left Parenthesis ("(")
LParen,
/// Right Parenthesis (")")
RParen,
/// Left curly braces ("{")
LBraces,
/// Right curly braces ("}")
RBraces,
/// Plus ("+")
Plus,
/// Minus ("-")
Minus,
/// Asterisk ("*")
Asterisk,
/// Slash ("/")
Slash,
/// Percent ("%")
Percent,
/// Pipe ("|")
Pipe,
/// Tilde ("~")
Tilde,
/// Logical not ("!")
Exclamationmark,
/// Left angle bracket ("<")
LessThan,
/// Right angle bracket (">")
GreaterThan,
/// Ampersand ("&")
Ampersand,
/// Circumflex ("^")
Circumflex,
} }
impl Token { impl Token {
/// If the Token can be used as a binary operation type, get the matching BinOpType. Otherwise
/// return None.
pub fn try_to_binop(&self) -> Option<BinOpType> { pub fn try_to_binop(&self) -> Option<BinOpType> {
Some(match self { Some(match self {
T![+] => BinOpType::Add, Token::Add => BinOpType::Add,
T![-] => BinOpType::Sub, Token::Sub => BinOpType::Sub,
T![*] => BinOpType::Mul, Token::Mul => BinOpType::Mul,
T![/] => BinOpType::Div, Token::Div => BinOpType::Div,
T![%] => BinOpType::Mod, Token::Mod => BinOpType::Mod,
T![&] => BinOpType::BAnd, Token::BAnd => BinOpType::BAnd,
T![|] => BinOpType::BOr, Token::BOr => BinOpType::BOr,
T![^] => BinOpType::BXor, Token::BXor => BinOpType::BXor,
T![&&] => BinOpType::LAnd, Token::LAnd => BinOpType::LAnd,
T![||] => BinOpType::LOr, Token::LOr => BinOpType::LOr,
T![<<] => BinOpType::Shl, Token::Shl => BinOpType::Shl,
T![>>] => BinOpType::Shr, Token::Shr => BinOpType::Shr,
T![==] => BinOpType::EquEqu, Token::EquEqu => BinOpType::EquEqu,
T![!=] => BinOpType::NotEqu, Token::NotEqu => BinOpType::NotEqu,
T![<] => BinOpType::Less, Token::LAngle => BinOpType::Less,
T![<=] => BinOpType::LessEqu, Token::LAngleEqu => BinOpType::LessEqu,
T![>] => BinOpType::Greater, Token::RAngle => BinOpType::Greater,
T![>=] => BinOpType::GreaterEqu, Token::RAngleEqu => BinOpType::GreaterEqu,
T![=] => BinOpType::Assign, Token::LArrow => BinOpType::Declare,
Token::Equ => BinOpType::Assign,
_ => return None,
})
}
pub fn try_to_unop(&self) -> Option<UnOpType> {
Some(match self {
T![-] => UnOpType::Negate,
T![!] => UnOpType::LNot,
T![~] => UnOpType::BNot,
_ => return None, _ => return None,
}) })
} }
} }
/// Macro to quickly create a token of the specified kind
#[macro_export]
macro_rules! T {
// Keywords
[loop] => {
crate::token::Token::Keyword(crate::token::Keyword::Loop)
};
[print] => {
crate::token::Token::Keyword(crate::token::Keyword::Print)
};
[if] => {
crate::token::Token::Keyword(crate::token::Keyword::If)
};
[else] => {
crate::token::Token::Keyword(crate::token::Keyword::Else)
};
[fun] => {
crate::token::Token::Keyword(crate::token::Keyword::Fun)
};
[return] => {
crate::token::Token::Keyword(crate::token::Keyword::Return)
};
[break] => {
crate::token::Token::Keyword(crate::token::Keyword::Break)
};
[continue] => {
crate::token::Token::Keyword(crate::token::Keyword::Continue)
};
// Literals
[i64($($val:tt)*)] => {
crate::token::Token::Literal(crate::token::Literal::I64($($val)*))
};
[str($($val:tt)*)] => {
crate::token::Token::Literal(crate::token::Literal::String($($val)*))
};
// Ident
[ident($($val:tt)*)] => {
crate::token::Token::Ident($($val)*)
};
// Combo Tokens
[==] => {
crate::token::Token::Combo(crate::token::Combo::Equal2)
};
[!=] => {
crate::token::Token::Combo(crate::token::Combo::ExclamationMarkEqual)
};
[&&] => {
crate::token::Token::Combo(crate::token::Combo::Ampersand2)
};
[||] => {
crate::token::Token::Combo(crate::token::Combo::Pipe2)
};
[<<] => {
crate::token::Token::Combo(crate::token::Combo::LessThan2)
};
[>>] => {
crate::token::Token::Combo(crate::token::Combo::GreaterThan2)
};
[<=] => {
crate::token::Token::Combo(crate::token::Combo::LessThanEqual)
};
[>=] => {
crate::token::Token::Combo(crate::token::Combo::GreaterThanEqual)
};
[<-] => {
crate::token::Token::Combo(crate::token::Combo::LessThanMinus)
};
// Normal Tokens
[,] => {
crate::token::Token::Comma
};
[=] => {
crate::token::Token::Equal
};
[;] => {
crate::token::Token::Semicolon
};
[EoF] => {
crate::token::Token::EoF
};
['['] => {
crate::token::Token::LBracket
};
[']'] => {
crate::token::Token::RBracket
};
['('] => {
crate::token::Token::LParen
};
[')'] => {
crate::token::Token::RParen
};
['{'] => {
crate::token::Token::LBraces
};
['}'] => {
crate::token::Token::RBraces
};
[+] => {
crate::token::Token::Plus
};
[-] => {
crate::token::Token::Minus
};
[*] => {
crate::token::Token::Asterisk
};
[/] => {
crate::token::Token::Slash
};
[%] => {
crate::token::Token::Percent
};
[|] => {
crate::token::Token::Pipe
};
[~] => {
crate::token::Token::Tilde
};
[!] => {
crate::token::Token::Exclamationmark
};
[<] => {
crate::token::Token::LessThan
};
[>] => {
crate::token::Token::GreaterThan
};
[&] => {
crate::token::Token::Ampersand
};
[^] => {
crate::token::Token::Circumflex
};
}
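The `T!` macro above works both where a token value is constructed and where a token is matched, because a `macro_rules!` macro may expand to an expression or to a pattern. The sketch below shows this with a deliberately tiny, hypothetical `Token` enum and three macro arms. It is not the macro from this diff, and `describe` is an illustrative helper.

```rust
/// Hypothetical, reduced token type used only in this sketch.
#[derive(Debug, PartialEq)]
enum Token {
    I64(i64),
    Plus,
    Semicolon,
}

/// Reduced version of a token shorthand macro with only three arms.
macro_rules! T {
    [i64($($val:tt)*)] => {
        Token::I64($($val)*)
    };
    [+] => {
        Token::Plus
    };
    [;] => {
        Token::Semicolon
    };
}

/// Uses the macro in pattern position. `val` is bound by the expanded pattern.
fn describe(tok: &Token) -> String {
    match tok {
        T![i64(val)] => format!("integer {}", val),
        T![+] => "plus".to_string(),
        T![;] => "semicolon".to_string(),
    }
}
```

In expression position `T![i64(7)]` builds `Token::I64(7)`, while in a `match` arm `T![i64(val)]` expands to the pattern `Token::I64(val)` and binds the payload.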


@@ -1,167 +0,0 @@
/// Exit the program with error code 1 and format-print the given text on stderr. This pretty much
/// works like panic, but doesn't show the additional information that panic adds. Those can be
/// interesting for debugging, but don't look that great when building a release executable for an
/// end user.
/// When running tests or running in debug mode, panic is used to ensure the tests work
/// correctly.
#[macro_export]
macro_rules! nice_panic {
($fmt:expr) => {
{
if cfg!(test) || cfg!(debug_assertions) {
panic!($fmt);
} else {
eprintln!($fmt);
std::process::exit(1);
}
}
};
($fmt:expr, $($arg:tt)*) => {
{
if cfg!(test) || cfg!(debug_assertions) {
panic!($fmt, $($arg)*);
} else {
eprintln!($fmt, $($arg)*);
std::process::exit(1);
}
}
};
}
/// The PutBackIter allows for items to be put back and to be peeked. Putting an item back
/// will cause it to be the next item returned by `next`. Peeking an item will get a reference to
/// the next item in the iterator without removing it.
///
/// The whole PutBackIter behaves analogously to `std::iter::Peekable` with the addition of the
/// `putback` function. This is slightly slower than `Peekable`, but allows for an unlimited number
/// of putbacks and therefore an unlimited look-ahead range.
pub struct PutBackIter<T: Iterator> {
iter: T,
putback_stack: Vec<T::Item>,
}
impl<T> PutBackIter<T>
where
T: Iterator,
{
/// Make the given iterator putbackable, wrapping it in the PutBackIter type. This effectively
/// adds the `peek` and `putback` functions.
pub fn new(iter: T) -> Self {
Self {
iter,
putback_stack: Vec::new(),
}
}
/// Put the given item back into the iterator. Items that were put back are returned by
/// `next` in last-in-first-out order (aka. stack order). Only after all previously put back
/// items have been returned is the actual underlying iterator used to get items.
/// The number of items that can be put back is unlimited.
pub fn putback(&mut self, it: T::Item) {
self.putback_stack.push(it);
}
/// Peek the next item, getting a reference to it without removing it from the iterator. This
/// also includes items that were previously put back and not yet removed.
pub fn peek(&mut self) -> Option<&T::Item> {
if self.putback_stack.is_empty() {
let it = self.next()?;
self.putback(it);
}
self.putback_stack.last()
}
}
impl<T> Iterator for PutBackIter<T>
where
T: Iterator,
{
type Item = T::Item;
fn next(&mut self) -> Option<Self::Item> {
match self.putback_stack.pop() {
Some(it) => Some(it),
None => self.iter.next(),
}
}
}
pub trait PutBackableExt {
/// Make the iterator putbackable, wrapping it in the PutBackIter type. This effectively
/// adds the `peek` and `putback` functions.
fn putbackable(self) -> PutBackIter<Self>
where
Self: Iterator + Sized,
{
PutBackIter::new(self)
}
}
impl<T: Iterator> PutBackableExt for T {}
#[cfg(test)]
mod tests {
use super::PutBackableExt;
#[test]
fn putback_iter_next() {
let mut iter = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut pb_iter = iter.clone().putbackable();
// Check if next works
for _ in 0..iter.len() {
assert_eq!(pb_iter.next(), iter.next());
}
}
#[test]
fn putback_iter_peek() {
let mut iter_orig = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut iter = iter_orig.clone();
let mut pb_iter = iter.clone().putbackable();
for _ in 0..iter.len() {
// Check if peek gives a preview of the actual next element
assert_eq!(pb_iter.peek(), iter.next().as_ref());
// Check if next still returns the next (just peeked) element and not the one after
assert_eq!(pb_iter.next(), iter_orig.next());
}
}
#[test]
fn putback_iter_putback() {
let mut iter_orig = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
let mut iter = iter_orig.clone();
let mut pb_iter = iter.clone().putbackable();
// Get the first 5 items with next and check if they match
let it0 = pb_iter.next();
assert_eq!(it0, iter.next());
let it1 = pb_iter.next();
assert_eq!(it1, iter.next());
let it2 = pb_iter.next();
assert_eq!(it2, iter.next());
let it3 = pb_iter.next();
assert_eq!(it3, iter.next());
let it4 = pb_iter.next();
assert_eq!(it4, iter.next());
// Put one value back and check if `next` works as expected, returning the just put back
// item
pb_iter.putback(it0.unwrap());
assert_eq!(pb_iter.next(), it0);
// Put all values back
pb_iter.putback(it4.unwrap());
pb_iter.putback(it3.unwrap());
pb_iter.putback(it2.unwrap());
pb_iter.putback(it1.unwrap());
pb_iter.putback(it0.unwrap());
// After all values have been put back, the iter should match the original again
for _ in 0..iter.len() {
assert_eq!(pb_iter.next(), iter_orig.next());
}
}
}
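The parser's declaration check (consume the first token, peek at the second, then put the first one back) relies on this put-back behaviour. The sketch below reproduces the idea in a self-contained form. `PutBack` is a compact stand-in written for this example, and `starts_declaration` with string tokens is a hypothetical helper, not code from this repository.

```rust
/// Compact stand-in for `PutBackIter`, kept only to make the sketch self-contained.
struct PutBack<I: Iterator> {
    iter: I,
    stack: Vec<I::Item>,
}

impl<I: Iterator> PutBack<I> {
    fn new(iter: I) -> Self {
        Self {
            iter,
            stack: Vec::new(),
        }
    }

    /// Push an item so that it becomes the next one returned by `next`.
    fn putback(&mut self, item: I::Item) {
        self.stack.push(item);
    }

    /// Look at the next item without consuming it.
    fn peek(&mut self) -> Option<&I::Item> {
        if self.stack.is_empty() {
            let item = self.iter.next()?;
            self.stack.push(item);
        }
        self.stack.last()
    }
}

impl<I: Iterator> Iterator for PutBack<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // Items that were put back take priority over the underlying iterator
        self.stack.pop().or_else(|| self.iter.next())
    }
}

/// Two-token look-ahead: returns true if the second token is `<-`.
/// The token stream is left exactly as it was found.
fn starts_declaration<'a, I>(tokens: &mut PutBack<I>) -> bool
where
    I: Iterator<Item = &'a str>,
{
    let first = match tokens.next() {
        Some(tok) => tok,
        None => return false,
    };
    let is_decl = tokens.peek() == Some(&"<-");
    tokens.putback(first);
    is_decl
}
```

Because put-back items are returned in stack order, the peeked second token stays below the re-inserted first token, so the caller sees the original order again.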