The Compilation Pipeline

Source text is just a string. A compiler pipeline is a sequence of transformations that turns that string into something structured and meaningful. Each stage takes the output of the previous one, narrowing the representation from raw text to a typed result.

This model applies to every Alpaca program — not just calculators, but any language you define with the library.

The Four Stages

Most compilers share the same four-stage structure:

  1. Source text — the raw input string, e.g., "3 + 4 * 2"
  2. Lexical analysis — groups characters into tokens: NUMBER(3.0), PLUS, NUMBER(4.0), TIMES, NUMBER(2.0)
  3. Syntactic analysis — arranges tokens into a parse tree (concrete syntax tree) that encodes grammatical structure
  4. Semantic analysis / evaluation — extracts meaning from the tree, producing a typed result (in a calculator: Double)
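To make the four stages concrete, here is a minimal hand-rolled sketch for "3 + 4 * 2". This is illustrative only: the `Token`, `Expr`, `tokenize`, and `eval` names are invented for this example, not part of Alpaca, which generates the lexer and parser for you.

```scala
// Stage 2: lexical analysis -- group characters into tokens.
enum Token:
  case Number(value: Double)
  case Plus, Times

def tokenize(src: String): List[Token] =
  src.split("\\s+").toList.map {
    case "+" => Token.Plus
    case "*" => Token.Times
    case n   => Token.Number(n.toDouble)
  }

// Stage 3: a tree that encodes grammatical structure
// (* binds tighter than +, so 4 * 2 is a subtree).
enum Expr:
  case Num(value: Double)
  case Add(l: Expr, r: Expr)
  case Mul(l: Expr, r: Expr)

// Stage 4: evaluation -- extract a typed Double from the tree.
def eval(e: Expr): Double = e match
  case Expr.Num(v)    => v
  case Expr.Add(l, r) => eval(l) + eval(r)
  case Expr.Mul(l, r) => eval(l) * eval(r)

@main def demo(): Unit =
  val tokens = tokenize("3 + 4 * 2") // stage 1 → 2
  // Tree built by hand for brevity; a parser would build it from `tokens`.
  val tree = Expr.Add(Expr.Num(3), Expr.Mul(Expr.Num(4), Expr.Num(2)))
  println(eval(tree)) // 11.0
```

Note that the tree, not the token list, is what makes `*` take precedence over `+`: the multiplication sits in its own subtree, so it is evaluated first.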

Some compilers add a fifth stage, code generation, which emits machine code or bytecode. Alpaca stops at stage 4: its pipeline produces a typed Scala value rather than machine code, though you can implement a code-generation stage yourself on top of the AST.

Alpaca's Pipeline

With Alpaca, running the full pipeline takes two calls:

// Full pipeline: source text → typed result
val (_, lexemes) = BrainLexer.tokenize("++[>+<-].")
// lexemes: List[Lexeme] — inc, inc, jumpForward, next, inc, prev, dec, jumpBack, print

val (_, ast) = BrainParser.parse(lexemes)
// ast: BrainAST | Null — the parsed abstract syntax tree

BrainLexer.tokenize handles stages 1–2: source string to List[Lexeme]. BrainParser.parse handles stage 3: lexemes to an AST. You then evaluate or interpret that AST as stage 4, producing the final result.

Both BrainLexer and BrainParser are generated by Alpaca's macros at compile time.
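Stage 4 is yours to write. As a sketch of what interpreting the parsed program might look like, here is a tiny tape-machine interpreter. The `Op` enum and `run` function are hypothetical: the real `BrainAST` shape is determined by how you define the grammar, so treat this as one possible target you could map the AST onto.

```scala
// Hypothetical AST for illustration -- not Alpaca's generated BrainAST.
enum Op:
  case Inc, Dec, Next, Prev, Print
  case Loop(body: List[Op])

// Runs a program on a tape; returns the final pointer position so a
// nested loop body can continue where it left off.
def run(ops: List[Op], tape: Array[Int], ptr: Int = 0): Int =
  var p = ptr
  for op <- ops do
    op match
      case Op.Inc        => tape(p) += 1
      case Op.Dec        => tape(p) -= 1
      case Op.Next       => p += 1
      case Op.Prev       => p -= 1
      case Op.Print      => print(tape(p).toChar)
      case Op.Loop(body) => while tape(p) != 0 do p = run(body, tape, p)
  p

@main def demoRun(): Unit =
  val tape = Array.fill(30000)(0)
  // ++[>+<-]  : moves the value 2 from cell 0 to cell 1
  val prog = List(Op.Inc, Op.Inc,
                  Op.Loop(List(Op.Next, Op.Inc, Op.Prev, Op.Dec)))
  run(prog, tape)
  println((tape(0), tape(1))) // (0, 2)
```

Whatever shape your AST takes, the pattern is the same: a recursive traversal that gives each node its meaning.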

What Comes Next

The rest of the Compiler Theory Tutorial builds on this mental model.

For the full API, see the reference pages:

  • See Lexer for how BrainLexer is defined.
  • See Parser for how BrainParser is defined and how grammar rules produce a typed result.