nek-lang/README.md
2022-02-11 00:12:36 +01:00

# NEK-Lang
## Table of contents
- [NEK-Lang](#nek-lang)
  - [Table of contents](#table-of-contents)
  - [Variables](#variables)
    - [Declaration](#declaration)
    - [Assignment](#assignment)
  - [Datatypes](#datatypes)
    - [I64](#i64)
    - [String](#string)
    - [Array](#array)
  - [Expressions](#expressions)
    - [General](#general)
    - [Mathematical Operators](#mathematical-operators)
    - [Bitwise Operators](#bitwise-operators)
    - [Logical Operators](#logical-operators)
    - [Equality & Relational Operators](#equality--relational-operators)
  - [Control-Flow](#control-flow)
    - [Loop](#loop)
    - [If / Else](#if--else)
    - [Block Scopes](#block-scopes)
  - [Functions](#functions)
    - [Function definition](#function-definition)
    - [Function calls](#function-calls)
  - [IO](#io)
    - [Print](#print)
  - [Comments](#comments)
    - [Line comments](#line-comments)
- [Feature Tracker](#feature-tracker)
  - [High level Components](#high-level-components)
  - [Language features](#language-features)
- [Parsing Grammar](#parsing-grammar)
  - [Expressions](#expressions-1)
  - [Statements](#statements)
- [Examples](#examples)
## Variables
All variables live in scopes. Variables defined in an outer scope can be accessed in
inner scopes. Once a scope has ended, the variables defined in it no longer exist and can't be
accessed.
### Declaration
- Declare and initialize a new variable
- Declaring a previously declared variable again is currently equivalent to an assignment
- Declaration is needed before assignment or other usage
- The variable name is on the left side of the `<-` operator
- The assigned value is on the right side and can be any expression
```
a <- 123;
```
Create a new variable named `a` and assign the value `123` to it.
### Assignment
- Assigning a value to a previously declared variable
- The variable name is on the left side of the `=` operator
- The assigned value is on the right side and can be any expression
```
a = 123;
```
The value `123` is assigned to the variable named `a`. `a` needs to be declared before this.
## Datatypes
The available variable datatypes are `i64` (64-bit signed integer), `string` (`"this is a string"`) and `array` (`[10]`)
### I64
- The normal default datatype is `i64` which is a 64-bit signed integer
- Can be created by just writing an integer literal like `546`
- Inside the number literal `_` can be inserted for visual separation `100_000`
- The i64 values can be used as expected in calculations, conditions and so on
```
my_i64 <- 123_456;
```
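How a literal with `_` separators could be turned into a value is sketched below in Rust. This is only an illustration and not the actual lexer code: the function name `parse_i64_literal` is hypothetical, and the sketch assumes that underscores are simply skipped and that out-of-range literals are rejected.

```rust
/// Parse an integer literal such as `100_000` into an i64.
/// Assumption: underscores are skipped wherever they appear; the real lexer may be stricter.
fn parse_i64_literal(text: &str) -> Option<i64> {
    // Reject input without any digit (for example "" or "_").
    if !text.chars().any(|c| c.is_ascii_digit()) {
        return None;
    }
    // Only digits and underscores are allowed in this sketch.
    if !text.chars().all(|c| c.is_ascii_digit() || c == '_') {
        return None;
    }
    // Drop the separators, then let the standard library do the conversion.
    // `parse` returns an error on overflow, which becomes `None` here.
    let digits: String = text.chars().filter(|c| *c != '_').collect();
    digits.parse::<i64>().ok()
}
```

With this sketch, `parse_i64_literal("123_456")` yields `Some(123456)`, while `parse_i64_literal("12a")` yields `None`.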
### String
- Strings mainly exist for formatting the text output of a program
- Strings can be created by using double quotes like in other languages `"Hello world"`
- There is no way to access or change the characters of the string
- Unicode characters are supported `"Hello 🌎"`
- Escape characters `\n`, `\r`, `\t`, `\"`, `\\` are supported
- Strings can still be assigned to variables, just like i64
```
world <- "🌎";
print "Hello ";
print world;
print "\n";
```
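The escape characters listed above could be resolved roughly as in the following Rust sketch. It is not taken from the interpreter: the function name `unescape` is hypothetical, and rejecting unknown escapes (instead of passing them through) is an assumption.

```rust
/// Replace the escape sequences \n, \r, \t, \" and \\ in the raw text between the quotes.
/// Assumption: any other escape, or a trailing backslash, is treated as an error (`None`).
fn unescape(raw: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary characters, including Unicode such as 🌎, are copied unchanged.
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the supported escape characters.
        match chars.next()? {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            _ => return None,
        }
    }
    Some(out)
}
```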
### Array
- Arrays can contain any other datatypes and don't need to have the same type in all cells
- Arrays can be created by using brackets with the size in between `[size]`
- Arrays must be assigned to a variable to be used
- All cells will be initialized with i64 0 values
- The size can be any expression that results in a positive i64 value
- The array size can't be changed after creation
- The array's data is always allocated on the heap
- The array cells can be accessed by using the variable name and brackets `my_arr[index]`
- The index can be any expression that results in a non-negative i64 value in the range of the
array's indices
- The indices start with 0
- When an array is passed to a function, it is passed by reference
```
width <- 5;
height <- 5;
// Initialize array of size 25 with 25x 0
my_array <- [width * height];
// Modify first value
my_array[0] = 5;
// Print first value
print my_array[0];
```
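The array rules above (mixed cell types, zero initialization, fixed size, pass by reference) can be modelled with a shared, reference-counted vector. The Rust sketch below is one possible model and not necessarily how the interpreter represents values: the names `Value`, `new_array`, `get_cell` and `set_cell` are hypothetical, and accepting a size of `0` is an assumption.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical runtime value. Cloning an `Array` clones only the `Rc` handle,
/// so both handles refer to the same cells (pass by reference).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    I64(i64),
    Str(String),
    Array(Rc<RefCell<Vec<Value>>>),
}

/// Create an array of `size` cells, all initialized with i64 0.
fn new_array(size: i64) -> Option<Value> {
    if size < 0 {
        return None;
    }
    let cells = vec![Value::I64(0); size as usize];
    Some(Value::Array(Rc::new(RefCell::new(cells))))
}

/// Read the cell at `index`; `None` if `array` is not an array or the index is out of range.
fn get_cell(array: &Value, index: i64) -> Option<Value> {
    match array {
        Value::Array(cells) => {
            if index < 0 {
                return None;
            }
            let cell = cells.borrow().get(index as usize).cloned();
            cell
        }
        _ => None,
    }
}

/// Overwrite the cell at `index`; returns false if that is not possible.
fn set_cell(array: &Value, index: i64, value: Value) -> bool {
    match array {
        Value::Array(cells) => {
            if index < 0 {
                return false;
            }
            let mut cells = cells.borrow_mut();
            match cells.get_mut(index as usize) {
                Some(cell) => {
                    *cell = value;
                    true
                }
                None => false,
            }
        }
        _ => false,
    }
}
```

In this model, a function that receives a clone of the `Value::Array` handle and calls `set_cell` on it changes the cells seen by the caller as well, which matches the by-reference behaviour described above.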
## Expressions
The operator precedence is the same order as in `C` for all implemented operators.
Refer to the
[C Operator Precedence Table](https://en.cppreference.com/w/c/language/operator_precedence)
to see the different precedences.
### General
- Parentheses `(` and `)` can be used to modify evaluation order just like in any other
programming language.
- For example `(a + b) * c` will evaluate the addition before the multiplication, despite the multiplication having higher binding power
### Mathematical Operators
Supported mathematical operations:
- Addition `a + b`
- Subtraction `a - b`
- Multiplication `a * b`
- Division `a / b`
- Modulo `a % b`
- Negation `-a`
### Bitwise Operators
- And `a & b`
- Or `a | b`
- Xor `a ^ b`
- Bitshift left (by `b` bits) `a << b`
- Bitshift right (by `b` bits) `a >> b`
- "Bit flip" (One's complement) `~a`
### Logical Operators
The logical operators evaluate the operands as `false` if they are equal to `0` and `true` if they are not equal to `0`
- And `a && b`
- Or `a || b`
- Not `!a` (if `a` is equal to `0`, the result is `1`, otherwise the result is `0`)
### Equality & Relational Operators
The equality and relational operations result in `1` if the condition is evaluated as `true` and in `0` if the condition is evaluated as `false`.
- Equality `a == b`
- Inequality `a != b`
- Greater than `a > b`
- Greater than or equal to `a >= b`
- Less than `a < b`
- Less than or equal to `a <= b`
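
Since the language has no separate boolean type, the logical and relational operators work on i64 values. The Rust sketch below models that convention. All function names are hypothetical, and two points are assumptions rather than documented behaviour: that `&&` and `||` produce exactly `1` or `0`, and that nothing needs to be said here about short-circuit evaluation.

```rust
/// An operand counts as true if it is not equal to 0.
fn truthy(value: i64) -> bool {
    value != 0
}

/// Conditions result in 1 for true and 0 for false.
fn from_bool(condition: bool) -> i64 {
    if condition { 1 } else { 0 }
}

// Logical operators (assumed to yield 1 or 0).
fn logical_and(a: i64, b: i64) -> i64 {
    from_bool(truthy(a) && truthy(b))
}

fn logical_or(a: i64, b: i64) -> i64 {
    from_bool(truthy(a) || truthy(b))
}

fn logical_not(a: i64) -> i64 {
    from_bool(!truthy(a))
}

// Two of the relational operators; the others follow the same pattern.
fn equal(a: i64, b: i64) -> i64 {
    from_bool(a == b)
}

fn less_than(a: i64, b: i64) -> i64 {
    from_bool(a < b)
}
```

Under these assumptions `logical_and(2, 1)` is `1`, whereas the bitwise `2 & 1` is `0`, so the two kinds of operators are not interchangeable in conditions.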
## Control-Flow
In conditions, such as those of `if` statements and loops, every non-zero value counts as `true` and `0` counts as `false`.
### Loop
- The `loop` keyword can be used as an infinite loop, as a while loop or as a while loop with advancement (an expression that is executed after the loop body)
- If only `loop` is used, directly followed by the body, it is an infinite loop that needs to be
terminated by using the `break` keyword
- The `loop` keyword is followed by the condition (an expression) without needing parentheses
- *Optional:* If there is a `;` after the condition, there must be another expression which is used as the advancement
- The loop's body is wrapped in braces (`{ }`) just like in C/C++
- The `continue` keyword can be used to end the current loop iteration early
- The `break` keyword can be used to fully break out of the current loop
```
// Print the numbers from 0 to 9

// With endless loop
i <- 0;
loop {
    if i >= 10 {
        break;
    }
    print i;
    i = i + 1;
}

// Without advancement
i <- 0;
loop i < 10 {
    print i;
    i = i + 1;
}

// With advancement
k <- 0;
loop k < 10; k = k + 1 {
    print k;
}
```
### If / Else
- The language supports `if` and an optional `else`
- After the `if` keyword must be the deciding condition, parentheses are not needed
- The *if-true* block is wrapped in braces (`{ }`)
- *Optional:* If there is an `else` after the *if-block*, it must be followed by the *if-false* block, also known as the else block
```
a <- 1;
b <- 2;
if a == b {
    // a is equal to b
    print 1;
} else {
    // a is not equal to b
    print 0;
}
```
### Block Scopes
- It is possible to create a limited scope for local variables that will no longer exist once the
scope ends
- Shadowing variables by redefining a variable in an inner scope is supported
```
var_in_outer_scope <- 5;
{
    var_in_inner_scope <- 3;
    // Inner scope can access both vars
    print var_in_outer_scope;
    print var_in_inner_scope;
}
// Outer scope is still valid
print var_in_outer_scope;
// !!! THIS DOES NOT WORK !!!
// The inner scope has ended
print var_in_inner_scope;
```
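The scoping rules described above (lookup from inner to outer scope, shadowing, variables vanishing when their scope ends) are commonly implemented with a stack of maps. The Rust sketch below shows that technique under simplifying assumptions: it is not the interpreter's actual code, the type `Scopes` and its methods are hypothetical, and values are restricted to i64.

```rust
use std::collections::HashMap;

/// Hypothetical scope stack: the last map is the innermost scope.
struct Scopes {
    stack: Vec<HashMap<String, i64>>,
}

impl Scopes {
    /// Start with a single global scope.
    fn new() -> Self {
        Scopes { stack: vec![HashMap::new()] }
    }

    /// Called when a `{` block begins.
    fn enter(&mut self) {
        self.stack.push(HashMap::new());
    }

    /// Called when the block ends; its variables are dropped. The global scope is kept.
    fn exit(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// `name <- value`: always writes to the innermost scope. This shadows an outer
    /// variable and overwrites a variable already declared in the same scope.
    fn declare(&mut self, name: &str, value: i64) {
        self.stack
            .last_mut()
            .expect("there is always at least one scope")
            .insert(name.to_string(), value);
    }

    /// `name = value`: updates the nearest declared variable, false if none exists.
    fn assign(&mut self, name: &str, value: i64) -> bool {
        for scope in self.stack.iter_mut().rev() {
            if let Some(slot) = scope.get_mut(name) {
                *slot = value;
                return true;
            }
        }
        false
    }

    /// Look the variable up from the innermost to the outermost scope.
    fn get(&self, name: &str) -> Option<i64> {
        self.stack.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}
```

After `exit`, a lookup of a variable declared in the inner block returns `None`, which corresponds to the failing `print var_in_inner_scope;` in the example above.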
## Functions
### Function definition
- Functions can be defined by using the `fun` keyword, followed by the function name and the
parameters in parentheses. After the parentheses, the body is specified inside a braces block
- The function parameters are specified by only the names
- The function body has its own scope
- Parameters are only accessible inside the body
- Variables from the outer scope can be accessed and modified if they are defined before the function
- Variables from the outer scope are shadowed by parameters with the same name
- The `return` keyword can be used to return a value from the function and exit it immediately
- If no return is specified, a `void` value is returned
```
fun add_maybe(a, b) {
    if a < 100 {
        return a;
    } else {
        return a + b;
    }
}

fun println(val) {
    print val;
    print "\n";
}
```
### Function calls
- Function calls are primary expressions, so they can be directly used in calculations (if they
return appropriate values)
- Function calls are performed by writing the function name, followed by the arguments in parentheses
- The arguments can be any expressions, separated by commas
```
b <- 100;
result <- add_maybe(250, b);
// Prints 350 + new-line
println(result);
```
## IO
### Print
Printing is implemented via the `print` keyword
- The `print` keyword is followed by an expression, the value of which will be printed to the terminal.
- Print currently automatically adds a linebreak
```
a <- 1;
// Outputs `"1"` to the terminal
print a;
```
## Comments
### Line comments
Line comments can be initiated by using `//`
- Everything after `//` up to the end of the current line is ignored and not parsed
```
// This is a comment
```
# Feature Tracker
## High level Components
- [x] Lexer: Transforms text into Tokens
- [x] Parser: Transforms Tokens into Abstract Syntax Tree
- [x] Interpreter (tree-walk-interpreter): Walks the tree and evaluates the expressions / statements
- [x] Simple optimizer: Apply trivial optimizations to the AST
  - [x] Precalculate binary ops / unary ops that have only literal operands
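
Precalculating operations on literals is usually called constant folding. The Rust sketch below illustrates the idea on a tiny, hypothetical AST (`Expr` and `fold` are not the interpreter's real types). It folds bottom-up, so nested literal-only subtrees collapse too, and it uses wrapping arithmetic on overflow; both choices are assumptions.

```rust
/// Hypothetical, heavily reduced expression tree.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Var(String),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Replace every operation whose operands are all literals by its result.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Neg(inner) => match fold(*inner) {
            Expr::Lit(v) => Expr::Lit(v.wrapping_neg()),
            other => Expr::Neg(Box::new(other)),
        },
        Expr::Add(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Lit(a), Expr::Lit(b)) => Expr::Lit(a.wrapping_add(b)),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Lit(a), Expr::Lit(b)) => Expr::Lit(a.wrapping_mul(b)),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Literals and variables are already as simple as they can be.
        leaf => leaf,
    }
}
```

For example, the tree for `a + 3 * 4` becomes the tree for `a + 12`: the multiplication is evaluated ahead of time, while the addition stays because `a` is only known at runtime.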
## Language features
- [x] General expressions
  - [x] Arithmetic operations
    - [x] Addition `a + b`
    - [x] Subtraction `a - b`
    - [x] Multiplication `a * b`
    - [x] Division `a / b`
    - [x] Modulo `a % b`
    - [x] Negate `-a`
    - [x] Parentheses `(a + b) * c`
  - [x] Logical boolean operators
    - [x] Equal `a == b`
    - [x] Not equal `a != b`
    - [x] Greater than `a > b`
    - [x] Less than `a < b`
    - [x] Greater than or equal `a >= b`
    - [x] Less than or equal `a <= b`
  - [x] Logical operators
    - [x] And `a && b`
    - [x] Or `a || b`
    - [x] Not `!a`
  - [x] Bitwise operators
    - [x] Bitwise AND `a & b`
    - [x] Bitwise OR `a | b`
    - [x] Bitwise XOR `a ^ b`
    - [x] Bitwise NOT `~a`
    - [x] Bitwise left shift `a << b`
    - [x] Bitwise right shift `a >> b`
- [x] Variables
  - [x] Declaration
  - [x] Assignment
  - [x] Local variables (for example inside loop, if, else, functions)
  - [x] Scoped block for specific local vars `{ ... }`
- [x] Statements with semicolon & Multiline programs
- [x] Control flow
  - [x] Loops
    - [x] While-style loop `loop X { ... }`
    - [x] For-style loop with `X` as condition and `Y` as advancement `loop X; Y { ... }`
    - [x] Infinite loop `loop { ... }`
    - [x] Break `break`
    - [x] Continue `continue`
  - [x] If else statement `if X { ... } else { ... }`
    - [x] If Statement
    - [x] Else statement
- [x] Line comments `//`
- [x] Strings
- [x] Arrays
  - [x] Creating array with size `X` as a variable `arr <- [X]`
  - [x] Accessing arrays by index `arr[X]`
- [x] IO Intrinsics
  - [x] Print
- [x] Functions
  - [x] Function declaration `fun f(X, Y, Z) { ... }`
  - [x] Function calls `f(1, 2, 3)`
  - [x] Function returns `return X`
  - [x] Local variables
  - [x] Pass arrays by-reference, i64 by-value, string is a const ref
# Parsing Grammar
## Expressions
```
ARRAY_LITERAL = "[" expr "]"
ARRAY_ACCESS = IDENT "[" expr "]"
FUN_CALL = IDENT "(" (expr ",")* expr? ")"
LITERAL = I64_LITERAL | STR_LITERAL | ARRAY_LITERAL
expr_primary = LITERAL | IDENT | FUN_CALL | ARRAY_ACCESS | "(" expr ")" | "-" expr_primary
             | "~" expr_primary
expr_mul = expr_primary (("*" | "/" | "%") expr_primary)*
expr_add = expr_mul (("+" | "-") expr_mul)*
expr_shift = expr_add ((">>" | "<<") expr_add)*
expr_rel = expr_shift ((">" | ">=" | "<" | "<=") expr_shift)*
expr_equ = expr_rel (("==" | "!=") expr_rel)*
expr_band = expr_equ ("&" expr_equ)*
expr_bxor = expr_band ("^" expr_band)*
expr_bor = expr_bxor ("|" expr_bxor)*
expr_land = expr_bor ("&&" expr_bor)*
expr_lor = expr_land ("||" expr_land)*
expr = expr_lor
```
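
Each grammar rule above maps naturally to one function of a recursive descent parser, with the lower-precedence rule calling the higher-precedence one. The Rust sketch below demonstrates this for a small subset only (`expr_primary` with integer literals, parentheses and negation, plus `expr_mul` and `expr_add`) and evaluates directly instead of building an AST. The names `Parser` and `eval` are hypothetical, and the wrapping arithmetic and the `None` result for division by zero are assumptions.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical evaluator working directly on characters (no separate lexer).
struct Parser<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { chars: src.chars().peekable() }
    }

    fn skip_ws(&mut self) {
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
    }

    /// Consume `expected` if it is the next non-whitespace character.
    fn eat(&mut self, expected: char) -> bool {
        self.skip_ws();
        if self.chars.peek() == Some(&expected) {
            self.chars.next();
            true
        } else {
            false
        }
    }

    /// expr_primary = I64_LITERAL | "(" expr ")" | "-" expr_primary
    fn primary(&mut self) -> Option<i64> {
        if self.eat('-') {
            return self.primary().map(|v| v.wrapping_neg());
        }
        if self.eat('(') {
            let value = self.add()?;
            return if self.eat(')') { Some(value) } else { None };
        }
        let mut digits = String::new();
        while let Some(&c) = self.chars.peek() {
            if !c.is_ascii_digit() {
                break;
            }
            digits.push(c);
            self.chars.next();
        }
        digits.parse().ok()
    }

    /// expr_mul = expr_primary (("*" | "/" | "%") expr_primary)*
    fn mul(&mut self) -> Option<i64> {
        let mut lhs = self.primary()?;
        loop {
            if self.eat('*') {
                lhs = lhs.wrapping_mul(self.primary()?);
            } else if self.eat('/') {
                lhs = lhs.checked_div(self.primary()?)?;
            } else if self.eat('%') {
                lhs = lhs.checked_rem(self.primary()?)?;
            } else {
                return Some(lhs);
            }
        }
    }

    /// expr_add = expr_mul (("+" | "-") expr_mul)*
    fn add(&mut self) -> Option<i64> {
        let mut lhs = self.mul()?;
        loop {
            if self.eat('+') {
                lhs = lhs.wrapping_add(self.mul()?);
            } else if self.eat('-') {
                lhs = lhs.wrapping_sub(self.mul()?);
            } else {
                return Some(lhs);
            }
        }
    }
}

/// Evaluate a whole expression; `None` on a syntax error or trailing input.
fn eval(src: &str) -> Option<i64> {
    let mut parser = Parser::new(src);
    let value = parser.add()?;
    parser.skip_ws();
    if parser.chars.peek().is_none() { Some(value) } else { None }
}
```

Because `add` calls `mul` for its operands, `eval("1 + 2 * 3")` yields `Some(7)`, and the loops make the operators left-associative, so `eval("10 - 4 - 3")` yields `Some(3)`.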
## Statements
```
stmt_return = "return" expr ";"
stmt_break = "break" ";"
stmt_continue = "continue" ";"
stmt_var_decl = IDENT "<-" expr ";"
stmt_fun_decl = "fun" IDENT "(" (IDENT ",")* IDENT? ")" "{" stmt* "}"
stmt_expr = expr ";"
stmt_block = "{" stmt* "}"
stmt_loop = "loop" (expr (";" expr)?)? "{" stmt* "}"
stmt_if = "if" expr "{" stmt* "}" ("else" "{" stmt* "}")?
stmt_print = "print" expr ";"
stmt = stmt_return | stmt_break | stmt_continue | stmt_var_decl | stmt_fun_decl
     | stmt_expr | stmt_block | stmt_loop | stmt_if | stmt_print
```
# Examples
There are a number of examples in the [examples](examples/) directory. They include (non-optimal) solutions to the first five Project Euler problems, as well as a [simple Game of Life implementation](examples/game_of_life.nek).
To run an example via `cargo run`, use:
```
cargo run --release -- examples/[NAME]
```