Justin Pombrio

What we preceive as reality is a construct of the mind.

Parser-Printers

Why are we still specifying parsers and printers separately? I would like to be able to write down a BNF grammar with rules like

<prog> ::= [Text| e:<expr>] <-> [Expr| e]
<expr> ::= '(' e:<expr> ')' <-> e
         | e1:<expr> "*" e2:<expr> <-> [Prod| e1 e2]
         | e:<list> <-> e
         | n:<int> <-> [Num| n]
<list> ::= '[' (t:<term>)* ']' <-> [List| t*]

(where [Prod| e1 e2] is syntax for a node of label “Prod”), and get both a parser and a printer.

Lenses solve this problem, but in a heavy-weight way. They come with three operations:

  • get: C -> A
  • put: A -> C -> C
  • create: A -> C with three laws:
  • put (get c) c = c
  • get (put a c) = a
  • get (create a) = a

We often do not need the ability to reflect changes in A back onto C, and merely having get and create (a.k.a. parse and print) and the third law would suffice. It would also be nice to have a grammar-like syntax for specifying parsers, instead of relying on lens combinators (though there is some awareness (http://www.seas.upenn.edu/~cse400/CSE400_2008_2009/websites/magee_puller/report.pdf) of this).

Some further thoughts:

  • I have thought enough about this to be comfortable claiming that its straightforward to solve in a dynamic language.

  • Things get a little more interesting in a typed language. You’ll need to abstract over data constructors. In the example above, you might have,

      data Text = Text [Char]
      data Expr = Prod Expr Expr
                | List [Expr]
                | Num Int

    But again, there’s nothing monumental here.

  • ’*’s act like lisp macro ’…’s

  • You will need to be able to inject user-defined printer/parsers (e.g. int) by supplying a “print” and “parse” function and swearing that “forall x, parse(print(x)) = x”.

  • Parsing AST nodes is no harder than dealing with text. See Ometa paper.

  • Some restrictions must be made to ensure that the parser is actually invertible:

    • Disjunctions must be distinct.
    • Sequences must be unambiguous (just like for Lenses).
    • All variables bound on the LHS of “A <-> B” must also appear on the RHS.