Getting Started
This guide walks you through building a BrainFuck interpreter with Alpaca. By the end, you will have a working lexer, parser, and evaluator — roughly 60 lines of code.
BrainFuck is a minimal language with eight single-character commands. That makes it an ideal first project: the grammar is small enough to fit on screen, but rich enough to exercise lexing, parsing, loops, and AST construction.
Prerequisites
- JDK 21 or later
- Mill 1.1.3+ (or SBT — see Installation for SBT/Scala CLI setup)
- Scala 3.8.3-RC1 or later
Project Setup
Create a Mill project with Alpaca as a dependency:
//| mill-version: 1.1.3
//| mill-jvm-version: 21
import mill._
import mill.scalalib._
object brainfuck extends ScalaModule {
def scalaVersion = "3.8.3-RC1"
def scalacOptions = Seq("-Yretain-trees")
def mvnDeps = Seq(
mvn"io.github.halotukozak::alpaca:0.1.0"
)
}
The -Yretain-trees flag is required. Alpaca's macros inspect the AST of your lexer and parser definitions at compile time, and this flag tells the compiler to preserve that information.
Step 1: The Lexer
BrainFuck> has the eight standard BrainFuck commands plus function definitions (name(body)) and function calls (name!). Everything else is a comment.
| Command | Meaning |
|---|---|
> |
Move the data pointer to the next cell |
< |
Move the data pointer to the previous cell |
+ |
Increment the byte at the data pointer |
- |
Decrement the byte at the data pointer |
. |
Output the byte at the data pointer as a character |
, |
Read one byte of input into the current cell |
[ |
Jump forward past the matching ] if the current cell is zero |
] |
Jump back to the matching [ if the current cell is non-zero |
name(body) |
Define a function |
name! |
Call a function |
import alpaca.*
case class BrainLexContext(
var brackets: Int = 0,
var squareBrackets: Int = 0,
) extends LexerCtx
val BrainLexer = lexer[BrainLexContext]:
case ">" => Token["next"]
case "<" => Token["prev"]
case "\\+" => Token["inc"]
case "-" => Token["dec"]
case "\\." => Token["print"]
case "," => Token["read"]
case "\\[" =>
ctx.squareBrackets += 1
Token["jumpForward"]
case "\\]" =>
require(ctx.squareBrackets > 0, "Mismatched brackets")
ctx.squareBrackets -= 1
Token["jumpBack"]
case name @ "[A-Za-z]+" => Token["functionName"](name)
case "\\(" =>
ctx.brackets += 1
Token["functionOpen"]
case "\\)" =>
require(ctx.brackets > 0, "Mismatched brackets")
ctx.brackets -= 1
Token["functionClose"]
case "!" => Token["functionCall"]
case "." => Token.Ignored
case "\n" => Token.Ignored
Each case maps a regex pattern to a token. Token["next"] creates a named token. Token["functionName"](name) creates a value-bearing token — the matched text is available as the token's value. Token.Ignored matches but produces no output.
The custom context BrainLexContext tracks bracket depth. Inside rule bodies, ctx gives access to the context — the lexer increments and decrements counters and uses require to catch mismatched brackets at lex time.
Pattern order matters: "\\." (literal dot — the print command) must appear before "." (any character — the catch-all). Otherwise the catch-all would shadow the print command.
Try it:
val (ctx, lexemes) = BrainLexer.tokenize("foo(++)")
require(ctx.brackets == 0 && ctx.squareBrackets == 0, "Mismatched brackets")
println(lexemes.map(_.name))
// List(functionName, functionOpen, inc, inc, functionClose)
Step 2: The AST
Before writing the parser, define the tree structure it will produce. BrainFuck> programs are sequences of instructions — some contain nested lists (loops, function bodies):
enum BrainAST:
case Root(ops: List[BrainAST])
case While(ops: List[BrainAST])
case FunctionDef(name: String, ops: List[BrainAST])
case FunctionCall(name: String)
case Next, Prev, Inc, Dec, Print, Read
Root wraps the top-level program. While represents a [...] loop. FunctionDef and FunctionCall handle named functions.
Step 3: The Parser
The parser turns a flat list of lexemes into a nested BrainAST:
import alpaca.*
import scala.collection.mutable
case class BrainParserCtx(
functions: mutable.Set[String] = mutable.Set.empty,
) extends ParserCtx
object BrainParser extends Parser[BrainParserCtx]:
val root: Rule[BrainAST] = rule:
case Operation.List(stmts) => BrainAST.Root(stmts)
val While: Rule[BrainAST] = rule:
case (BrainLexer.jumpForward(_), Operation.List(stmts), BrainLexer.jumpBack(_)) =>
BrainAST.While(stmts)
val Operation: Rule[BrainAST] = rule(
{ case BrainLexer.next(_) => BrainAST.Next },
{ case BrainLexer.prev(_) => BrainAST.Prev },
{ case BrainLexer.inc(_) => BrainAST.Inc },
{ case BrainLexer.dec(_) => BrainAST.Dec },
{ case BrainLexer.print(_) => BrainAST.Print },
{ case BrainLexer.read(_) => BrainAST.Read },
{ case While(whl) => whl },
{ case FunctionDef(fdef) => fdef },
{ case FunctionCall(call) => call },
)
val FunctionDef: Rule[BrainAST] = rule:
case (BrainLexer.functionName(name), BrainLexer.functionOpen(_),
Operation.List(ops), BrainLexer.functionClose(_)) =>
require(ctx.functions.add(name.value), s"Function ${name.value} is already defined")
BrainAST.FunctionDef(name.value, ops)
val FunctionCall: Rule[BrainAST] = rule:
case (BrainLexer.functionName(name), BrainLexer.functionCall(_)) =>
require(ctx.functions.contains(name.value), s"Function ${name.value} is not defined")
BrainAST.FunctionCall(name.value)
Things to note:
rootis required. It defines the grammar's start symbol. Here it matches zero or moreOperations via.List.Operation.List(stmts)is an EBNF operator. It matches zero or more occurrences and returns aList[BrainAST].BrainLexer.next(_)matches a lexeme whose token name is"next". The_discards the lexeme — we only care that the token appeared.name.valueaccesses the value from aLexeme. AfterBrainLexer.functionName(name),nameis aLexemeandname.valueis the matchedString.Parser[BrainParserCtx]carries state through reductions.ctx.functionstracks which functions have been defined, soFunctionCallcan reject undefined names.
Step 4: The Evaluator
BrainFuck operates on an array of 256 bytes with a movable pointer. Functions are stored by name and called by looking them up:
import scala.collection.mutable
class Memory(
val cells: Array[Int] = new Array(256),
var pointer: Int = 0,
val functions: mutable.Map[String, List[BrainAST]] = mutable.Map.empty,
)
extension (ast: BrainAST)
def eval(mem: Memory): Unit = ast match
case BrainAST.Root(ops) => ops.foreach(_.eval(mem))
case BrainAST.Next => mem.pointer = (mem.pointer + 1) & 0xff
case BrainAST.Prev => mem.pointer = (mem.pointer - 1) & 0xff
case BrainAST.Inc => mem.cells(mem.pointer) = (mem.cells(mem.pointer) + 1) & 0xff
case BrainAST.Dec => mem.cells(mem.pointer) = (mem.cells(mem.pointer) - 1) & 0xff
case BrainAST.Print => print(mem.cells(mem.pointer).toChar)
case BrainAST.Read => mem.cells(mem.pointer) = scala.io.StdIn.readChar() & 0xff
case BrainAST.While(ops) => while mem.cells(mem.pointer) != 0 do ops.foreach(_.eval(mem))
case BrainAST.FunctionDef(name, ops) => mem.functions += (name -> ops)
case BrainAST.FunctionCall(name) =>
mem.functions.get(name) match
case Some(ops) => ops.foreach(_.eval(mem))
case _ => throw RuntimeException(s"Undefined function: $name")
Step 5: Run It
Wire the three stages together:
import alpaca.*
@main def run(): Unit =
// Standard BrainFuck: Hello World
val hello = "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
val (ctx1, lexemes1) = BrainLexer.tokenize(hello)
require(ctx1.squareBrackets == 0, "Mismatched brackets")
val (_, ast1) = BrainParser.parse(lexemes1)
ast1.nn.eval(Memory())
// prints: Hello World!
// BrainFuck>: function definition and call
val withFunctions = "foo(+++)foo!foo!"
val (ctx2, lexemes2) = BrainLexer.tokenize(withFunctions)
require(ctx2.brackets == 0, "Mismatched brackets")
val (_, ast2) = BrainParser.parse(lexemes2)
val mem = Memory()
ast2.nn.eval(mem)
// mem.cells(0) == 6 (two calls to foo, each incrementing 3 times)
The pipeline is always the same: tokenize produces lexemes, parse produces an AST (or null on failure — .nn asserts non-null), and you evaluate the result however you want.
What's Next
This BrainFuck interpreter uses the simplest form of every Alpaca feature. The rest of the documentation extends it:
- Lexer — regex patterns, value extraction, token naming rules
- Lexer Context — tracking state during tokenization (we add bracket matching to BrainLexer)
- Parser — rules, named productions, EBNF operators
- Parser Context — shared state during parsing (we add a function registry)
- Extractors — pattern matching on terminals and non-terminals
- Conflict Resolution — resolving shift/reduce and reduce/reduce conflicts
- Theory — formal foundations behind everything above
