Contextual Lexing
This guide covers stateful tokenization: tracking nesting depth, maintaining counters, passing information from the lexer to the parser, and handling errors gracefully.
What you'll learn: custom LexerCtx, ParserCtx, the OnTokenMatch hook, ErrorHandling strategies, and how lexer context flows into parser rules.
Tracking State During Lexing
The BrainFuck> lexer tracks bracket depth to catch mismatched brackets at lex time:
import alpaca.*
case class BrainLexContext(
var brackets: Int = 0,
var squareBrackets: Int = 0,
) extends LexerCtx
val BrainLexer = lexer[BrainLexContext]:
case "\\[" =>
ctx.squareBrackets += 1
Token["jumpForward"]
case "\\]" =>
require(ctx.squareBrackets > 0, "Mismatched brackets")
ctx.squareBrackets -= 1
Token["jumpBack"]
case "\\(" =>
ctx.brackets += 1
Token["functionOpen"]
case "\\)" =>
require(ctx.brackets > 0, "Mismatched brackets")
ctx.brackets -= 1
Token["functionClose"]
case name @ "[A-Za-z]+" => Token["functionName"](name)
case "!" => Token["functionCall"]
case "\\+" => Token["inc"]
case "-" => Token["dec"]
case ">" => Token["next"]
case "<" => Token["prev"]
case "\\." => Token["print"]
case "," => Token["read"]
case "." => Token.Ignored
case "\n" => Token.Ignored
After tokenization, check the final context:
val (ctx, lexemes) = BrainLexer.tokenize(input)
require(ctx.squareBrackets == 0 && ctx.brackets == 0, "Mismatched brackets")
Accessing Lexer Context in the Parser
Every Lexeme carries a snapshot of the lexer context at match time. Inside parser rules, use the binding to access positional info:
import alpaca.*
val FunctionCall: Rule[BrainAST] = rule:
case (BrainLexer.functionName(name), BrainLexer.functionCall(_)) =>
// name.value: String -- the function name
// name.position: Int -- 1-based column within the current line (if lexer uses PositionTracking)
// name.line: Int -- line number (if lexer uses LineTracking)
BrainAST.FunctionCall(name.value)
To get position and line numbers, extend your context with the tracking traits:
case class BrainLexContext(
var brackets: Int = 0,
var squareBrackets: Int = 0,
var position: Int = 1,
var line: Int = 1,
) extends LexerCtx with PositionTracking with LineTracking
Parser-Level Context
ParserCtx is for state that evolves during parsing -- symbol tables, function registries, type environments. The BrainFuck> parser uses it to track defined functions:
import alpaca.*
import scala.collection.mutable
case class BrainParserCtx(
functions: mutable.Set[String] = mutable.Set.empty,
) extends ParserCtx
object BrainParser extends Parser[BrainParserCtx]:
val FunctionDef: Rule[BrainAST] = rule:
case (BrainLexer.functionName(name), BrainLexer.functionOpen(_),
Operation.List(ops), BrainLexer.functionClose(_)) =>
require(ctx.functions.add(name.value), s"Function ${name.value} is already defined")
BrainAST.FunctionDef(name.value, ops)
val FunctionCall: Rule[BrainAST] = rule:
case (BrainLexer.functionName(name), BrainLexer.functionCall(_)) =>
require(ctx.functions.contains(name.value), s"Function ${name.value} is not defined")
BrainAST.FunctionCall(name.value)
// ... other rules
ctx is shared across all reductions in a single parse() call. A function defined in FunctionDef is immediately visible in FunctionCall.
Error Handling Strategies
By default, the lexer throws on unmatched input. You can customize this with an ErrorHandling instance:
import alpaca.*
import alpaca.internal.lexer.ErrorHandling
// Option A: skip unrecognized characters silently
given ErrorHandling[BrainLexContext] = _ =>
ErrorHandling.Strategy.IgnoreChar
import alpaca.*
import alpaca.internal.lexer.ErrorHandling
// Option B: stop gracefully, returning what was tokenized so far
given ErrorHandling[BrainLexContext] = _ =>
ErrorHandling.Strategy.Stop
Four strategies are available:
| Strategy | Behavior |
|---|---|
Throw(ex) |
Abort with the given exception |
IgnoreChar |
Skip one character and continue |
IgnoreToken |
Skip to the next match and continue |
Stop |
Return lexemes collected so far |
An alternative to custom ErrorHandling is a catch-all pattern at the end of your lexer:
case x @ "." =>
println(s"Unexpected character: $x")
Token.Ignored // skip and continue
This is simpler and often sufficient. The BrainFuck lexer uses this approach -- "." => Token.Ignored catches all non-command characters.
Data Flow Summary
- Input flows into the lexer
OnTokenMatchupdates theLexerCtxafter every matchLexemes are produced, each carrying a context snapshotList[Lexeme]flows into the parserParserCtxis initialized and updated as rules are reduced- Result is produced, along with the final
ParserCtx
