I present a more expressive variant of Wadler’s “Prettier Printer”.
You’ve probably used a code formatter like gofmt
or rustfmt
or JS
prettier
. These tools work in two steps: (i) parse the source code in a file,
and (ii) print it out nicely. Step (ii) is pretty printing.
Pretty printing is the reverse of parsing. Parsing turns linear text into a tree (a parse tree, which after a bit of post-processing becomes an abstract syntax tree (AST)). Pretty printing is the reverse: it turns the tree-shaped AST into linear text.
One of the most commonly used pretty printing algorithms is Philip Wadler’s “A Prettier Printer” [1]. Hereafter I’ll call it Prettier. Prettier is reasonably expressive while being extremely fast (linear time) and simple (50 lines of code), which probably accounts for its popularity.
In this post, I’ll:
Say we have a program we want to print. For simplicity of exposition, just a
short list: [hello, world]
(imagine that “hello” and “world” are variables).
There are a couple ways you might want to print this. Either all on one line:
[ hello, world ]
or on separate lines:
[
hello,
world
]
Each of these possibilities is called a layout. A pretty printer will have (i) a language in which you can express a variety of possible layouts, and (ii) an algorithm for picking the “best” possible layout from that set. What “best” means depends on the printer, but for Prettier it’s simply “try not to make lines too long” (canonically 80 characters, but you can set that to whatever width you want).
In our tiny example, there are only two possible layouts, but in general there can be many more. Exponentially more, in fact, since alternatives can be nested inside each other. This gives a very large space of possibilities to explore. Prettier’s algorithm explores it in linear time by greedily resolving alternatives one line at a time. We’ll see that in detail later, but for now let’s return to the easier problem of how to express the two layouts above.
Those two layouts can be expressed in Prettier using a combination of:
txt
for the text of each token like “[” and “hello”. I’ll typically omit
the txt
and just write the text directly in quotes, so the examples don’t
get too noisy.nl
for newlines.indent(i, x)
to indent every newline in x
by i
spaces.x & y
to concatenate things together.I’ll call a mix of these operations a notation.
We can write a notation that prints as the first layout like so:
⇨
[ hello, world ]
And a notation that prints as the second layout:
⇨
[
hello,
world
]
Notice how much these two notations have in common! If you replace every nl
with a space " "
, the second notation becomes almost identical to the first:
The only difference is the indent
, which no longer matters because there are
no newlines to indent.
This is the key to how Prettier represents choices. There’s just one additional notation operator:
group(x)
means “Display x
with every nl
in it replaced by a space,
if that would fit on the current line. Otherwise, display x
as-is.”Using group
, we can express a choice between the two above layouts with a
single notation:
Then you can print this notation with various max line widths. If the max width
is 80 chars, the group
sees that you can replace newlines with spaces and
things will fit on one line, so it prints:
[ hello, world ]
but if the max width is 10 it sees that things don’t fit on one line so it prints:
[
hello,
world
]
And that’s all! Prettier is basically these five operations, plus a fairly simple and fast algorithm for printing a notation.
I’ve found a slight variation on this that’s more expressive, while being able to use essentially the same simple and fast printing algorithm.
Instead of group
, there are two operations:
x | y
for a choice between arbitrary notations x
and y
. The printer
will display x
if it fits on the current line, or y
otherwise.
There are some rules about the properties of x
and y
that you should
follow when writing choices, which will be described later!flat(x)
forces every choice inside of it to pick its leftmost option. It’s
called flat
because this will typically “flatten” the layout to a single line.This is better than group
because it’s more powerful. It allows you to specify
choices between layouts that differ in more than just whitespace. For example,
rustfmt
would format our “hello world” example with a trailing comma:
[
hello,
world,
]
[
hello,
world,
turtle,
]
then the diff is only one line:
+ turtle,
But if you were to add that third line while not using trailing commas, then the
diff is two lines:
- world
+ world,
+ turtle
The point is it's at least sensible to put a trailing comma on lists that are
split across multiple lines, so it would be nice to support that.
But it would be stupid to put a trailing comma on single line lists!
// stupid:
[ hello, world, ]
So we’d like a printer that can express choices between layouts that differ in more complex ways than simply “newline → space”. There are various ways to slightly extend what Prettier can express, piecemeal. Or we can go all in and use the two operations I describe above, which give a whole lot of power all at once. I like to go all in 😃.
The choice operator will let us eliminate the trailing comma from the single-line layout. As a bonus, we can also remove the spaces just inside the brackets of the single-line layout, giving:
[hello, world]
or
[
hello,
world,
]
To express these two alternatives, all we do is take the single-line notation we
want and the multi-line notation we want, and join them with |
:
This example doesn’t need flat
, but flat
comes in handy when we’re writing a
notation but don’t know what might appear inside of it. For example, if we’re
writing a notation for a generic list containing unknown notations $X
and
$Y
, the single-line layout would want to enforce that the things inside it
don’t span multiple lines. It could do so using flat
:
With great power comes great responsibility. The choice |
operator gives great
power, in that you can make a choice between arbitrary notations. But the
printer can’t feasibly check all exponentially many possibilities (remember
that choices can be deeply nested), so it must rely on the choices obeying a
rule.
The Rule. In every choice x | y
, the shortest possible first line of y
must be at least as short as every possible first line of x
.
The printing algorithm, which I’ll start describing soon, will rely on this choice for correctly determining whether things fit on a line.
There’s another softer rule, which is that if you have a choice x | y
then x
should not contain forced newlines. If you obey this rule, then flat
does what
its name says and collapses the entire layout onto a single line. If you don’t
the printer will continue to work and flat
will continue to pick the first
option, but that first option might span multiple lines.
Let’s walk through a quick prototype of the printing algorithm in Rust. Then we’ll use it to print Json.
We’ll initialize a rust project with cargo init --lib
, then add two
dependencies to Cargo.toml
:
[dependencies]
unicode-width = "0.1.5"
[dev-dependencies]
serde_json = "1.0"
unicode-width
is to get accurate string widths for monospaced terminal fonts
including full-width Unicode characters. I’m pretty sure this is an impossible
problem but we can at least try.serde_json
is for parsing Json. It’s only a dev-dependency because the
pretty printing library won’t rely on it. It’s only needed for the Json parser
we’ll build on top of the library later.The library’s going to be split into two modules: one defines a data type for notations, and the other gives the pretty printing algorithm:
You might notice that we didn’t export any operations for concat or choice. This
is because they’ll be defined with the bitwise operators &
and |
. (If you
really don’t like overloaded operators, feel free to write functions for these
instead. And then update the use sites, where you’ll see how clunky it can get.)
The notation enum
has exactly the operators described in this post, though
with a couple interesting details I’ll discuss after showing the code:
Sub-notations that are used in multiple places must be shared, so they’re stored
in an Rc
(ref-counted). This sharing is very important: without it the notation
becomes exponentially larger. For example, notice that the final “hello world”
notation repeated “hello” and “world” twice. If you start nesting notations like
this, these repetitions multiply together. So it’s important to share them by
having multiple references to the same notation on the heap, which is what
Rc
does. Prettier was written in Haskell, which shares by default, so it
didn’t need anything like Rc
.
The Text
variant contains the width of the string in columns. It makes sense
to compute this up front because it will accessed multiple times when
determining whether things fit on a line.
The model we're relying on here is: "when you print characters in a monospace font they'll all be printed in that font, that font will only contain single-width and double-width glyphs, and these widths can be determined from the character alone without knowing the OS, terminal, and installed Unicode version". This model is false, but we'll pray it's sufficiently true to be useful and that the `unicode_width` library does a good job at approximating it.
Here’s the core algorithmic idea from Wadler. When printing a document, we’re
going to break it up into a stack of chunks containing everything we haven’t
yet printed. Each chunk keeps a reference to a notation, together with the
accumulated indentation at that notation and whether it’s inside a flat
:
A few methods for modifying chunks will come in handy later:
The printer needs just three things:
To initialize a Printer
, set the starting column to 0 and the chunk stack to
contain a single chunk whose notation represents the entire document to print:
Here’s the method for printing:
Walking through each case:
Text
prints the text and adds the text’s width to col
.Newline
prints a newline, followed by spaces for the current indentation
level. It also updates col
.Flat
and Indent
set the chunk’s flat/indentation and push it back onto the
stack so that we’ll recur on it.Concat(x, y)
splits the chunk into two chunks and pushes them onto the
stack. It has to push first y
and then x
so that x
is on the top of the
stack and we process it next.Choice(x, y)
is the one interesting case. If we’re inside flat
, we pick
x
. Otherwise we pick x
if it fits on the current line or y
otherwise.
This calls self.fits()
which does the hard work.The fits
method is where the sausage gets made. We’re going to essentially
scan ahead to see whether things will fit on the current line, but there are
some important details.
Let’s walk through this piece by piece.
The amount of space remaining on the current line is the max line width
self.width
minus the current column self.col
, though if we’re already
past the width limit we can return false
because things really don’t fit.
Just like in print()
, we want to keep a stack of chunks to process. Though now
we’re just going to scan ahead to see if stuff fits on the current line, without
actually printing anything. We therefore don’t want to modify self.chunks
because modifications should not persist after the fits()
method finishes.
However, we don’t want to copy self.chunks
because that would be
inefficient. So instead we’ll keep a working stack called stack
and a
reference to the self.chunks
that we haven’t yet processed called chunks
:
The first thing the loop will do is “get the next chunk”. Since there are two
places it might come from this is a little complicated, but all this code
says is “take a chunk from the stack
, or if that’s empty take it from
chunks
, or if that’s empty too return true
”:
Then the body of the loop processes the chunk to check whether it fits on the line:
true
.Text
, we check if it fits in the remaining space. If it
does, we subtract its width from the remaining space. If it doesn’t, we
return false
.Flat
, Indent
, and Concat
just like in the print()
method,
though here we’re pushing to stack
instead of self.chunks
.Choice(x, y)
case is the crux of the whole algorithm. When we encounter
this choice, it means that we were already checking whether the first option
of a choice fits on the current line, because that’s the only situation where
fits()
is invoked. While doing so, we started looking ahead and encountered
this second choice Choice(x, y)
. There might be a tradeoff between these two
choices: perhaps either one can fit on one line but not both at once. We are
greedily resolving this tradeoff in favor of the first choice, by asking
whether there’s any way the second choice can be resolved such that the
first choice fits on the line. This is where The Rule comes in! We ignore x
and only check y
, by the reasoning that “if y
doesn’t fit, then x
wouldn’t fit either, so it’s safe to only check if y
fits.”And that’s it! That’s the whole printer.
To test our newly completed pretty printing library, let’s add an example usage
in example/json.rs
for printing Json. It will give functions for constructing
Notation
s for each kind of Json value (null, number, list, etc.):
We could call these methods manually:
but it would be tedious to build a large enough example to be interesting.
Instead let’s make a Json code formatter that uses serde_json
to parse Json
and then re-prints it with our pretty printer. serde_json
has a representation
of Json
called serde_json::Value
. We’ll need a function to convert that to a
Notation
:
And now we can glue everything together to re-format a Json file. Let’s record timing info to tell how long each phase takes:
Running this on a big Json file of Pokémon on my laptop:
gives:
{
"abilities": [
{
"ability": {"name": "overgrow", "url": "https://pokeapi.co/api/v2/ability/65/"},
"is_hidden": false,
"slot": 1
},
{
"ability": {"name": "chlorophyll", "url": "https://pokeapi.co/api/v2/ability/34/"},
"is_hidden": true,
"slot": 3
}
],
"base_experience": 64,
"cries": {
"latest": "https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/1.ogg",
"legacy": "https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/1.ogg"
},
"forms": [{"name": "bulbasaur", "url": "https://pokeapi.co/api/v2/pokemon-form/1/"}],
"game_indices": [
{"game_index": 153, "version": {"name": "red", "url": "https://pokeapi.co/api/v2/version/1/"}},
{"game_index": 153, "version": {"name": "blue", "url": "https://pokeapi.co/api/v2/version/2/"}},
{"game_index": 153, "version": {"name": "yellow", "url": "https://pokeapi.co/api/v2/version/3/"}},
{"game_index": 1, "version": {"name": "gold", "url": "https://pokeapi.co/api/v2/version/4/"}},
...
{"game_index": 1, "version": {"name": "white-2", "url": "https://pokeapi.co/api/v2/version/22/"}}
],
"height": 7,
"held_items": [],
"id": 1,
"is_default": true,
"location_area_encounters": "https://pokeapi.co/api/v2/pokemon/1/encounters",
"moves": [
{
"move": {"name": "razor-wind", "url": "https://pokeapi.co/api/v2/move/13/"},
"version_group_details": [
{
"level_learned_at": 0,
"move_learn_method": {"name": "egg", "url": "https://pokeapi.co/api/v2/move-learn-method/2/"},
"version_group": {"name": "gold-silver", "url": "https://pokeapi.co/api/v2/version-group/3/"}
},
{
"level_learned_at": 0,
"move_learn_method": {"name": "egg", "url": "https://pokeapi.co/api/v2/move-learn-method/2/"},
"version_group": {"name": "crystal", "url": "https://pokeapi.co/api/v2/version-group/4/"}
}
]
},
...
6737 lines of Json
...
Time to parse file as Json: 2 ms
Time to construct Notation: 8 ms
Time to pretty print Notation: 2 ms
Time to print to terminal: 36 ms
(This is printed at width 120 to make the output more interesting by having more things fit on a line, then hackily squeezed into my website’s rather narrow code blocks.)
So:
serde_json
is fast.Notation
than to print it! I suspect that
most of the time comes from the Rc
s. There are faster techniques for
sharing, but they’re pretty heavy handed so I won’t show them here. But here’s
a branch that does
arena allocation.So there it is: a fairly simple, fairly fast, fairly expressive pretty printer. It’s Wadler’s Prettier Printer, with a twist.
The full code is on Github.
[1] Philip Wadler. “A Prettier Printer”. The Fun of Programming, Cornerstones of Computing, 2003. link.
[2] Christian Lindig. “Strictly Pretty”. Informally published, 2000. Link.