Why are we still specifying parsers and printers separately? I would like to be able to write down a BNF grammar with rules like
<prog> ::= [Text| e:<expr>] <-> [Expr| e]
<expr> ::= '(' e:<expr> ')' <-> e
       | e1:<expr> "*" e2:<expr> <-> [Prod| e1 e2]
       | e:<list> <-> e
       | n:<int> <-> [Num| n]
<list> ::= '[' (t:<term>)* ']' <-> [List| t*]
(where [Prod| e1 e2] is syntax for a node labeled “Prod”), and get both a parser and a printer.
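To make the two readings of a rule concrete, here is a hand-written sketch (in Python; all names are mine, not part of any proposed system) of what the generated code for two of the rules above would look like: `n:<int> <-> [Num| n]` and `e1:<expr> "*" e2:<expr> <-> [Prod| e1 e2]`, each read once as a parser and once as a printer. Nodes are modeled as tuples whose first element is the label.

```python
import re

def parse_num(text):
    """The rule  n:<int> <-> [Num| n]  read left-to-right: text -> node."""
    m = re.match(r"\d+", text)
    if m is None:
        raise ValueError("expected an integer")
    return ("Num", int(m.group())), text[m.end():]

def print_num(node):
    """The same rule read right-to-left: node -> text."""
    label, n = node
    assert label == "Num"
    return str(n)

def parse_expr(text):
    """e1:<expr> "*" e2:<expr> <-> [Prod| e1 e2], right-associated for simplicity."""
    left, rest = parse_num(text)
    if rest.startswith("*"):
        right, rest = parse_expr(rest[1:])
        return ("Prod", left, right), rest
    return left, rest

def print_expr(node):
    """Invert parse_expr by emitting children separated by "*"."""
    if node[0] == "Prod":
        return print_expr(node[1]) + "*" + print_expr(node[2])
    return print_num(node)

node, rest = parse_expr("2*3")
assert node == ("Prod", ("Num", 2), ("Num", 3)) and rest == ""
assert print_expr(node) == "2*3"   # the printer inverts the parser
```

The point of the grammar notation is that nobody should have to write this pair by hand: both functions are mechanically derivable from the one rule.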
Lenses solve this problem, but in a heavyweight way. They come with three operations:
get: C -> A
put: A -> C -> C
create: A -> C
with three laws:
put (get c) c = c
get (put a c) = a
get (create a) = a
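For concreteness, here is a minimal lens satisfying all three laws (the choice of C and A is mine, picked only for illustration): C is a record with two fields, A is its "name" field, and create fills in a default for the information C carries but A lacks.

```python
def get(c):
    """Extract the abstract view A from the concrete structure C."""
    return c["name"]

def put(a, c):
    """Push an updated view back into C, preserving C's other fields."""
    return {**c, "name": a}

def create(a):
    """Build a C from scratch, defaulting the fields A doesn't determine."""
    return {"name": a, "age": 0}

c = {"name": "x", "age": 3}
assert put(get(c), c) == c        # put (get c) c = c
assert get(put("y", c)) == "y"    # get (put a c) = a
assert get(create("y")) == "y"    # get (create a) = a
```

Notice that put is exactly the operation the parser/printer use case does not need: it exists to merge an edited view back into a concrete structure without discarding hidden state.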
We often do not need the ability to reflect changes in A back onto C; merely having get and create (a.k.a. parse and print) and the third law would suffice. It would also be nice to have a grammar-like syntax for specifying parsers, instead of relying on lens combinators (though there is some awareness (http://www.seas.upenn.edu/~cse400/CSE400_2008_2009/websites/magee_puller/report.pdf) of this).
Some further thoughts:
I have thought enough about this to be comfortable claiming that it’s straightforward to solve in a dynamic language.
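To back up the dynamic-language claim, here is a hedged sketch (representation and names are mine): each production is written once, as plain data, and a single generic interpreter reads that description in either direction — left-to-right as a parser, right-to-left as a printer.

```python
import re

def parse_int(text):
    m = re.match(r"\d+", text)
    if m is None:
        raise ValueError("expected an integer")
    return int(m.group()), text[m.end():]

# Primitive leaves: name -> (parse, print).
PRIM = {"int": (parse_int, str)}

# One description per production: (node label, sequence of pieces), where a
# piece is a rule name, a primitive name, or a literal string.
GRAMMAR = {
    "expr": [("Prod", ["term", "*", "expr"]),
             ("Num",  ["int"])],
    "term": [("Num",  ["int"])],
}

def parse(rule, text):
    """Read a description left-to-right: text -> (node, remaining text)."""
    for label, pieces in GRAMMAR[rule]:
        try:
            children, rest = [], text
            for p in pieces:
                if p in GRAMMAR:
                    node, rest = parse(p, rest)
                    children.append(node)
                elif p in PRIM:
                    value, rest = PRIM[p][0](rest)
                    children.append(value)
                elif rest.startswith(p):
                    rest = rest[len(p):]
                else:
                    raise ValueError("expected " + p)
            return (label, *children), rest
        except ValueError:
            continue  # this alternative failed; try the next one
    raise ValueError("no alternative of " + rule + " matched")

def pretty(node):
    """Read the same description right-to-left: node -> text."""
    label, *children = node
    for alts in GRAMMAR.values():
        for lab, pieces in alts:
            if lab == label:
                out, kids = "", list(children)
                for p in pieces:
                    if p in GRAMMAR:
                        out += pretty(kids.pop(0))
                    elif p in PRIM:
                        out += PRIM[p][1](kids.pop(0))
                    else:
                        out += p
                return out
    raise ValueError("unknown label " + label)

node, _ = parse("expr", "2*3*4")
assert pretty(node) == "2*3*4"   # both directions from one description
```

Because the label names the alternative uniquely, the printer can recover which pieces to emit; that is essentially why no static typing machinery is needed in the dynamic setting.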
Things get a little more interesting in a typed language. You’ll need to abstract over data constructors. In the example above, you might have,
data Text = Text [Char]
data Expr = Prod Expr Expr
          | List [Expr]
          | Num Int
But again, there’s nothing monumental here.
The ’*’s act like the ’...’ ellipses in Lisp macros (as in Scheme’s syntax-rules).
You will need to be able to inject user-defined parser/printer pairs (e.g. for int) by supplying a “print” and a “parse” function and swearing that “forall x, parse(print(x)) = x”.
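A sketch of what that injection point could look like (the `register`/`PRIMS` names and the spot-check are mine): the universal law can't be verified mechanically, but property-testing the supplied pair on sample values catches many broken pairs before they poison the grammar.

```python
import re

PRIMS = {}  # name -> (parse, print)

def parse_int(text):
    """User-supplied parse for int: consume a leading (possibly signed) integer."""
    m = re.match(r"-?\d+", text)
    if m is None:
        raise ValueError("expected an int")
    return int(m.group()), text[m.end():]

def print_int(n):
    """User-supplied print for int."""
    return str(n)

def register(name, parse, print_, samples):
    """Accept a parse/print pair, spot-checking parse(print(x)) = x first."""
    for x in samples:
        parsed, rest = parse(print_(x))
        assert parsed == x and rest == "", "round-trip law violated at %r" % (x,)
    PRIMS[name] = (parse, print_)

register("int", parse_int, print_int, range(-100, 100))
```

Note the check runs in one direction only; the other composition, print(parse(s)) = s, is exactly the third lens law and is what the generated printer/parser pair must guarantee overall.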
Parsing AST nodes is no harder than dealing with text. See the OMeta paper.
Some restrictions must be made to ensure that the parser is actually invertible: