alpaca/Theory/Operator Precedence Grammars

Operator Precedence Grammars

When a grammar has binary operators at multiple precedence levels, the parser must decide which operations to perform first. There are two approaches: encode precedence in the grammar structure (stratification), or use a flat grammar with explicit conflict resolution. Alpaca supports both.

The Problem

The expression 1 + 2 * 3 has two valid parse trees:

    +               *
   / \             / \
  1   *           +   3
     / \         / \
    2   3       1   2

  = 1 + 6 = 7   = 3 * 3 = 9

The grammar Expr → Expr + Expr | Expr * Expr | NUMBER is ambiguous — it does not specify which tree to build. Precedence rules resolve this: * binds tighter than +, so 1 + 2 * 3 = 7.

Approach 1: Grammar Stratification

Encode precedence by splitting Expr into layers, one per precedence level:

Expr   → Expr + Term | Expr - Term | Term
Term   → Term * Factor | Term / Factor | Factor
Factor → ( Expr ) | NUMBER

This grammar is unambiguous. The structure forces * and / to bind tighter than + and -: a Term must be fully reduced before it participates in an Expr-level operation.

In Alpaca:

import alpaca.*

object CalcParser extends Parser:
  val Expr: Rule[Double] = rule(
    { case (Expr(a), CalcLexer.PLUS(_), Term(b))  => a + b },
    { case (Expr(a), CalcLexer.MINUS(_), Term(b)) => a - b },
    { case Term(t) => t },
  )
  val Term: Rule[Double] = rule(
    { case (Term(a), CalcLexer.TIMES(_), Factor(b)) => a * b },
    { case (Term(a), CalcLexer.DIVIDE(_), Factor(b)) => a / b },
    { case Factor(f) => f },
  )
  val Factor: Rule[Double] = rule(
    { case CalcLexer.NUMBER(n) => n.value },
    { case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
  )
  val root: Rule[Double] = rule:
    case Expr(e) => e

No resolutions needed — the grammar itself is unambiguous.

Approach 2: Flat Grammar + Conflict Resolution

Keep all operators at the same grammar level and use before/after to declare precedence:

import alpaca.*

object CalcParser extends Parser:
  val Expr: Rule[Double] = rule(
    "plus"  { case (Expr(a), CalcLexer.PLUS(_), Expr(b))  => a + b },
    "minus" { case (Expr(a), CalcLexer.MINUS(_), Expr(b)) => a - b },
    "times" { case (Expr(a), CalcLexer.TIMES(_), Expr(b))  => a * b },
    "div"   { case (Expr(a), CalcLexer.DIVIDE(_), Expr(b)) => a / b },
    { case CalcLexer.NUMBER(n) => n.value },
    { case (CalcLexer.LPAREN(_), Expr(e), CalcLexer.RPAREN(_)) => e },
  )
  val root: Rule[Double] = rule:
    case Expr(e) => e

  override val resolutions = Set(
    production.times.before(CalcLexer.TIMES, CalcLexer.DIVIDE),
    production.div.before(CalcLexer.TIMES, CalcLexer.DIVIDE),
    production.plus.before(CalcLexer.PLUS, CalcLexer.MINUS),
    production.minus.before(CalcLexer.PLUS, CalcLexer.MINUS),
    production.plus.after(CalcLexer.TIMES, CalcLexer.DIVIDE),
    production.minus.after(CalcLexer.TIMES, CalcLexer.DIVIDE),
  )

The grammar is ambiguous, but the resolutions tell the parser exactly how to handle every conflict.

Trade-offs

For grammars with 2-3 precedence levels, stratification is straightforward. For grammars with many levels (C has 15), flat grammar + resolution is more maintainable.

Extending BrainFuck with Arithmetic

Standard BrainFuck has no operator precedence — + always means "increment by 1." But suppose we extend the language with arithmetic expressions for cell values: +(3*2) adds 6 to the current cell, -(1+2) subtracts 3. Now the BrainFuck lexer needs number and operator tokens, and the parser needs expression rules with precedence.

The extended lexer adds arithmetic tokens:

import alpaca.*

val ExtendedLexer = lexer:
  // Standard BrainFuck
  case ">" => Token["next"]
  case "<" => Token["prev"]
  case "\\." => Token["print"]
  case "," => Token["read"]
  case "\\[" => Token["jumpForward"]
  case "\\]" => Token["jumpBack"]
  // Arithmetic extension
  case "\\+" => Token["+"]
  case "-" => Token["-"]
  case "\\*" => Token["*"]
  case "/" => Token["/"]
  case "\\(" => Token["("]
  case "\\)" => Token[")"]
  case n @ "[0-9]+" => Token["NUM"](n.toInt)
  case "\\s+" => Token.Ignored

The parser uses a flat grammar with conflict resolution for precedence:

import alpaca.*

object ExtendedParser extends Parser:
  val root: Rule[BrainAST] = rule:
    case Operation.List(ops) => BrainAST.Root(ops)

  // Arithmetic expressions with precedence
  val Expr: Rule[Int] = rule(
    "add"  { case (Expr(a), ExtendedLexer.`+`(_), Expr(b)) => a + b },
    "sub"  { case (Expr(a), ExtendedLexer.`-`(_), Expr(b)) => a - b },
    "mul"  { case (Expr(a), ExtendedLexer.`*`(_), Expr(b)) => a * b },
    "div"  { case (Expr(a), ExtendedLexer.`/`(_), Expr(b)) => a / b },
    { case ExtendedLexer.NUM(n) => n.value },
    { case (ExtendedLexer.`(`(_), Expr(e), ExtendedLexer.`)`(_)) => e },
  )

  // BrainFuck operations, now with arithmetic amounts
  val Operation: Rule[BrainAST] = rule(
    { case (ExtendedLexer.`+`(_), ExtendedLexer.`(`(_), Expr(n), ExtendedLexer.`)`(_)) =>
        BrainAST.Add(n) },        // +(3*2) adds 6
    { case (ExtendedLexer.`-`(_), ExtendedLexer.`(`(_), Expr(n), ExtendedLexer.`)`(_)) =>
        BrainAST.Sub(n) },        // -(1+2) subtracts 3
    { case ExtendedLexer.`+`(_) => BrainAST.Inc },  // bare + is increment by 1
    { case ExtendedLexer.`-`(_) => BrainAST.Dec },
    // ... other operations
  )

  override val resolutions = Set(
    // * and / bind tighter than + and -
    production.mul.before(ExtendedLexer.`*`, ExtendedLexer.`/`),
    production.div.before(ExtendedLexer.`*`, ExtendedLexer.`/`),
    production.add.before(ExtendedLexer.`+`, ExtendedLexer.`-`),
    production.sub.before(ExtendedLexer.`+`, ExtendedLexer.`-`),
    production.add.after(ExtendedLexer.`*`, ExtendedLexer.`/`),
    production.sub.after(ExtendedLexer.`*`, ExtendedLexer.`/`),
  )

Without the resolutions, the compiler reports:

Shift "*" vs Reduce Expr -> Expr + Expr
In situation like:
Expr + Expr * ...
Consider marking production Expr -> Expr + Expr to be before or after "*"

The resolutions establish: *// bind tighter than +/-, and all operators are left-associative. Now +(3+2*4) correctly evaluates to +(11) — adding 11 to the current cell.

Transitivity and Cycle Detection

Alpaca treats before and after constraints as a partial order over productions. The compiler computes the transitive closure: if A.before(B) and B.before(C) are both declared, then A.before(C) holds implicitly — you do not need to state it.

This matters for grammars with many precedence levels. For C-like operators (*, +, <, &&, ||), declaring mul.before(add).before(cmp).before(and).before(or) is enough; pairwise constraints between non-adjacent levels are derived.

Cycles in the constraint graph are contradictions. If the closure ever produces both A.before(B) and A.after(B) (directly or indirectly through other productions), the compiler rejects the resolution set with an InconsistentConflictResolution error showing the full cycle path. This catches mistakes like declaring mul.before(add) together with add.before(mul) — even when the contradiction is not direct.

The detection runs at compile time, so a grammar that compiles is guaranteed to have a consistent precedence ordering.

Cross-links

See Conflict Resolution for the full before/after DSL reference.
See Conflicts and Disambiguation for the formal theory of shift/reduce and reduce/reduce conflicts.
See the Expression Evaluator for a complete calculator with precedence resolution.

In this article

Generated with

made with ❤️ and coffee