A Twist on Wadler's Printer

I present a more expressive variant of Wadler’s “Prettier Printer”.

What’s a pretty printer?

You’ve probably used a code formatter like gofmt or rustfmt or JS prettier. These tools work in two steps: (i) parse the source code in a file, and (ii) print it out nicely. Step (ii) is pretty printing.

Pretty printing is the reverse of parsing. Parsing turns linear text into a tree (a parse tree, which after a bit of post-processing becomes an abstract syntax tree (AST)). Pretty printing is the reverse: it turns the tree-shaped AST into linear text.

One of the most commonly used pretty printing algorithms is Philip Wadler’s “A Prettier Printer” [1]. Hereafter I’ll call it Prettier. Prettier is reasonably expressive while being extremely fast (linear time) and simple (50 lines of code), which probably accounts for its popularity.

Implementing Prettier in a strict language. (Click to expand.)

Prettier is written in Haskell and relies on the language being lazily evaluated. If you code it directly in another language, you'll get something that runs in exponential time! But there's an easy modification to the algorithm for strict languages described in "Strictly Pretty" [2]. The code in this post will do essentially the same thing.

In this post, I’ll:

describe how Prettier behaves
describe a variation of it I came up with that’s just as fast while being more expressive
walk through implementing it in Rust, and writing a Json printer using it

Wadler’s Prettier Printer

Say we have a program we want to print. For simplicity of exposition, just a short list: [hello, world] (imagine that “hello” and “world” are variables). There are a couple ways you might want to print this. Either all on one line:

[ hello, world ]

or on separate lines:

[
    hello,
    world
]

Each of these possibilities is called a layout. A pretty printer will have (i) a language in which you can express a variety of possible layouts, and (ii) an algorithm for picking the “best” possible layout from that set. What “best” means depends on the printer, but for Prettier it’s simply “try not to make lines too long” (canonically 80 characters, but you can set that to whatever width you want).

In our tiny example, there are only two possible layouts, but in general there can be many more. Exponentially more, in fact, since alternatives can be nested inside each other. This gives a very large space of possibilities to explore. Prettier’s algorithm explores it in linear time by greedily resolving alternatives one line at a time. We’ll see that in detail later, but for now let’s return to the easier problem of how to express the two layouts above.

Those two layouts can be expressed in Prettier using a combination of:

txt for the text of each token like “[” and “hello”. I’ll typically omit the txt and just write the text directly in quotes, so the examples don’t get too noisy.
nl for newlines.
indent(i, x) to indent every newline in x by i spaces.
x & y to concatenate things together.

I’ll call a mix of these operations a notation.

I'm using different words than Wadler does in his paper.

If you're referencing his paper, he calls the above `text`, `line`, `nest`, and `<>`. I say "indent" instead of "nest" because that's a better word for what it does; "nl" (newline) instead of "line" because "line" could mean a lot of other things; and "&" because you can use that as an operator in Rust. Additionally, what I call a "notation" Wadler calls a "document". I find that a particularly unpleasant word choice because "document" sounds like the thing you wanted to print in the first place, not the pretty printing expression you've converted it to to display it. Unfortunately, "document" is pretty well embedded as a term in the literature now. Fortunately I'm writing a blog post and can call it whatever I please.

We can write a notation that prints as the first layout like so:

"[" & " " & "hello" & "," & " " & "world" & " " & "]"

⇨

[ hello, world ]

And a notation that prints as the second layout:

"[" & indent(4, nl & "hello" & "," & nl & "world") & nl & "]"

⇨

[
    hello,
    world
]

Notice how much these two notations have in common! If you replace every nl with a space " ", the second notation becomes almost identical to the first:

"[" & indent(4, " " & "hello" & "," & " " & "world") & " " & "]"

The only difference is the indent, which no longer matters because there are no newlines to indent.

This is the key to how Prettier represents choices. There’s just one additional notation operator:

group(x) means “Display x with every nl in it replaced by a space, if that would fit on the current line. Otherwise, display x as-is.”

Using group, we can express a choice between the two above layouts with a single notation:

group("[" & indent(4, nl & "hello" & "," & nl & "world") & nl & "]")

Then you can print this notation with various max line widths. If the max width is 80 chars, the group sees that you can replace newlines with spaces and things will fit on one line, so it prints:

[ hello, world ]

but if the max width is 10 it sees that things don’t fit on one line so it prints:

[
    hello,
    world
]

And that’s all! Prettier is basically these five operations, plus a fairly simple and fast algorithm for printing a notation.

My Twist on the Printer

I’ve found a slight variation on this that’s more expressive, while being able to use essentially the same simple and fast printing algorithm.

Instead of group, there are two operations:

x | y for a choice between arbitrary notations x and y. The printer will display x if it fits on the current line, or y otherwise. There are some rules about the properties of x and y that you should follow when writing choices, which will be described later!
flat(x) forces every choice inside of it to pick its leftmost option. It’s called flat because this will typically “flatten” the layout to a single line.

This is only a little bit novel.

Wadler's paper also describes this choice operation and defines `group` in terms of it, but purposefully doesn't expose it to the user because they could break the invariants the printer relies on. It's only the `flat` operation that's new. Wadler's paper has "flatten", which replaces newlines with spaces, while my "flat" picks the first option of each choice.

This is better than group because it’s more powerful. It allows you to specify choices between layouts that differ in more than just whitespace. For example, rustfmt would format our “hello world” example with a trailing comma:

[
    hello,
    world,
]

The trailing comma looks a little silly, but it's nice for git diffs.

If you were to add a third line to the list while using a trailing comma:

[
    hello,
    world,
    turtle,
]

then the diff is only one line:

+    turtle,

But if you were to add that third line while not using trailing commas, then the diff is two lines:

-    world
+    world,
+    turtle

The point is it's at least sensible to put a trailing comma on lists that are split across multiple lines, so it would be nice to support that.

But it would be stupid to put a trailing comma on single line lists!

// stupid:
[ hello, world, ]

So we’d like a printer that can express choices between layouts that differ in more complex ways than simply “newline → space”. There are various ways to slightly extend what Prettier can express, piecemeal. Or we can go all in and use the two operations I describe above, which give a whole lot of power all at once. I like to go all in 😃.

The choice operator will let us eliminate the trailing comma from the single-line layout. As a bonus, we can also remove the spaces just inside the brackets of the single-line layout, giving:

[hello, world]

[
    hello,
    world,
]

To express these two alternatives, all we do is take the single-line notation we want and the multi-line notation we want, and join them with |:

"[" & "hello" & "," & " " & "world" & "]"
|
"[" & indent(4, nl & "hello" & "," & nl & "world" & ",") & nl & "]"

This example doesn’t need flat, but flat comes in handy when we’re writing a notation but don’t know what might appear inside of it. For example, if we’re writing a notation for a generic list containing unknown notations $X and $Y, the single-line layout would want to enforce that the things inside it don’t span multiple lines. It could do so using flat:

"[" & flat($X) & "," & " " & flat($Y) & "]"
|
"[" & indent(4, nl & $X & "," & nl & $Y & ",") & nl & "]"

The Rule

With great power comes great responsibility. The choice | operator gives great power, in that you can make a choice between arbitrary notations. But the printer can’t feasibly check all exponentially many possibilities (remember that choices can be deeply nested), so it must rely on the choices obeying a rule.

The Rule. In every choice x | y, the shortest possible first line of y must be at least as short as every possible first line of x.

The printing algorithm, which I’ll start describing soon, will rely on this choice for correctly determining whether things fit on a line.

There’s another softer rule, which is that if you have a choice x | y then x should not contain forced newlines. If you obey this rule, then flat does what its name says and collapses the entire layout onto a single line. If you don’t the printer will continue to work and flat will continue to pick the first option, but that first option might span multiple lines.

Implementation

Let’s walk through a quick prototype of the printing algorithm in Rust. Then we’ll use it to print Json.

Dependencies

We’ll initialize a rust project with cargo init --lib, then add two dependencies to Cargo.toml:

[dependencies]
unicode-width = "0.1.5"

[dev-dependencies]
serde_json = "1.0"

unicode-width is to get accurate string widths for monospaced terminal fonts including full-width Unicode characters. I’m pretty sure this is an impossible problem but we can at least try.
serde_json is for parsing Json. It’s only a dev-dependency because the pretty printing library won’t rely on it. It’s only needed for the Json parser we’ll build on top of the library later.

src/lib.rs

The library’s going to be split into two modules: one defines a data type for notations, and the other gives the pretty printing algorithm:

mod notation;
mod print;

pub use notation::{flat, indent, nl, txt, Notation};
pub use print::pretty_print;

You might notice that we didn’t export any operations for concat or choice. This is because they’ll be defined with the bitwise operators & and |. (If you really don’t like overloaded operators, feel free to write functions for these instead. And then update the use sites, where you’ll see how clunky it can get.)

src/notation.rs

The notation enum has exactly the operators described in this post, though with a couple interesting details I’ll discuss after showing the code:

use std::ops::{BitAnd, BitOr};
use std::rc::Rc;

#[derive(Debug, Clone)]
pub struct Notation(pub(crate) Rc<NotationInner>);

#[derive(Debug, Clone)]
pub enum NotationInner {
    Newline,
    Text(String, u32),
    Flat(Notation),
    Indent(u32, Notation),
    Concat(Notation, Notation),
    Choice(Notation, Notation),
}

/// Display a newline
pub fn nl() -> Notation {
    Notation(Rc::new(NotationInner::Newline))
}  

/// Display text exactly as-is. The text should not contain a newline!
pub fn txt(s: impl ToString) -> Notation {
    let string = s.to_string();
    let width = unicode_width::UnicodeWidthStr::width(&string as &str) as u32;
    Notation(Rc::new(NotationInner::Text(string, width)))
}

/// Use the leftmost option of every choice in the contained Notation.
/// If the contained Notation follows the recommendation of not
/// putting newlines in the left-most options of choices, then this
/// `flat` will be displayed all on one line.
pub fn flat(notation: Notation) -> Notation {
    Notation(Rc::new(NotationInner::Flat(notation)))
}

/// Increase the indentation level of the contained notation by the
/// given width. The indentation level determines the number of spaces
/// put after `Newline`s. (It therefore doesn't affect the first line
/// of a notation.)
pub fn indent(indent: u32, notation: Notation) -> Notation {
    Notation(Rc::new(NotationInner::Indent(indent, notation)))
}

impl BitAnd<Notation> for Notation {
    type Output = Notation;

    /// Display both notations. The first character of the right
    /// notation immediately follows the last character of the
    /// left notation.
    fn bitand(self, other: Notation) -> Notation {
        Notation(Rc::new(NotationInner::Concat(self, other)))
    }
}

impl BitOr<Notation> for Notation {
    type Output = Notation;

    /// If inside a `flat`, _or_ the first line of the left notation
    /// fits within the required width, then display the left
    /// notation. Otherwise, display the right notation.
    fn bitor(self, other: Notation) -> Notation {
        Notation(Rc::new(NotationInner::Choice(self, other)))
    }
}

Sub-notations that are used in multiple places must be shared, so they’re stored in an Rc (ref-counted). This sharing is very important: without it the notation becomes exponentially larger. For example, notice that the final “hello world” notation repeated “hello” and “world” twice. If you start nesting notations like this, these repetitions multiply together. So it’s important to share them by having multiple references to the same notation on the heap, which is what Rc does. Prettier was written in Haskell, which shares by default, so it didn’t need anything like Rc.

The Text variant contains the width of the string in columns. It makes sense to compute this up front because it will accessed multiple times when determining whether things fit on a line.

Dive into the rabbit hole of string widths.

The width of a string in a monospace font is different from how many bytes it is, because `á` is multiple bytes but width 1, and it's different from the number of Unicode code points or grapheme clusters because "漢" is one character/code-point/grapheme-cluster but has width 2.

The model we're relying on here is: "when you print characters in a monospace font they'll all be printed in that font, that font will only contain single-width and double-width glyphs, and these widths can be determined from the character alone without knowing the OS, terminal, and installed Unicode version". This model is false, but we'll pray it's sufficiently true to be useful and that the `unicode_width` library does a good job at approximating it.

src/print.rs

Here’s the core algorithmic idea from Wadler. When printing a document, we’re going to break it up into a stack of chunks containing everything we haven’t yet printed. Each chunk keeps a reference to a notation, together with the accumulated indentation at that notation and whether it’s inside a flat:

use crate::notation::{Notation, NotationInner};

#[derive(Debug, Clone, Copy)]
struct Chunk<'a> {
    notation: &'a Notation,
    indent: u32,
    flat: bool,
}

A few methods for modifying chunks will come in handy later:

impl<'a> Chunk<'a> {
    fn with_notation(self, notation: &'a Notation) -> Chunk<'a> {
        Chunk {
            notation,
            indent: self.indent,
            flat: self.flat,
        }
    }
    
    fn indented(self, indent: u32) -> Chunk<'a> {
        Chunk {
            notation: self.notation,
            indent: self.indent + indent,
            flat: self.flat,
        }
    }
    
    fn flat(self) -> Chunk<'a> {
        Chunk {
            notation: self.notation,
            indent: self.indent,
            flat: true,
        }
    }
}

The printer needs just three things:

struct PrettyPrinter<'a> {
    /// Maximum line width that we'll try to stay within
    width: u32,
    /// Current column position
    col: u32,
    /// A stack of chunks to print. The _top_ of the stack is the
    /// _end_ of the vector, which represents the _earliest_ part
    /// of the document to print.
    chunks: Vec<Chunk<'a>>,
}

To initialize a Printer, set the starting column to 0 and the chunk stack to contain a single chunk whose notation represents the entire document to print:

impl<'a> PrettyPrinter<'a> {
    fn new(notation: &'a Notation, width: u32) -> PrettyPrinter<'a> {
        let chunk = Chunk {
            notation,
            indent: 0,
            flat: false,
        };
        PrettyPrinter {
            width,
            col: 0,
            chunks: vec![chunk],
        }
    }
}

Here’s the method for printing:

impl<'a> PrettyPrinter<'a> {
fn print(&mut self) -> String {
    use NotationInner::*;

    let mut output = String::new();
    while let Some(chunk) = self.chunks.pop() {
        match chunk.notation.0.as_ref() {
            Text(text, width) => {
                output.push_str(text);
                self.col += width;
            }
            Newline => {
                output.push('\n');
                for _ in 0..chunk.indent {
                    output.push(' ');
                }
                self.col = chunk.indent;
            }
            Flat(x) =>
                self.chunks.push(chunk.with_notation(x).flat()),
            Indent(i, x) =>
                self.chunks.push(chunk.with_notation(x).indented(*i)),
            Concat(x, y) => {
                self.chunks.push(chunk.with_notation(y));
                self.chunks.push(chunk.with_notation(x));
            }
            Choice(x, y) => {
                if chunk.flat || self.fits(chunk.with_notation(x)) {
                    self.chunks.push(chunk.with_notation(x));
                } else {
                    self.chunks.push(chunk.with_notation(y));
                }
            }
        }
    }
    output
}
}

Walking through each case:

Text prints the text and adds the text’s width to col.
Newline prints a newline, followed by spaces for the current indentation level. It also updates col.
Flat and Indent set the chunk’s flat/indentation and push it back onto the stack so that we’ll recur on it.
Concat(x, y) splits the chunk into two chunks and pushes them onto the stack. It has to push first y and then x so that x is on the top of the stack and we process it next.
Choice(x, y) is the one interesting case. If we’re inside flat, we pick x. Otherwise we pick x if it fits on the current line or y otherwise. This calls self.fits() which does the hard work.

The fits method is where the sausage gets made. We’re going to essentially scan ahead to see whether things will fit on the current line, but there are some important details.

impl<'a> PrettyPrinter<'a> {
    fn fits(&self, chunk: Chunk<'a>) -> bool {
        use NotationInner::*;
  
        let mut remaining = if self.col <= self.width {         
            self.width - self.col
        } else {
            return false;
        };
        let mut stack = vec![chunk];
        let mut chunks = &self.chunks as &[Chunk];

        loop {
            let chunk = match stack.pop() {
                Some(chunk) => chunk,
                None => match chunks.split_last() {
                    None => return true,
                    Some((chunk, more_chunks)) => {
                        chunks = more_chunks;
                        *chunk
                    }
                },
            };

            match chunk.notation.0.as_ref() {
                Newline => return true,
                Text(_text, text_width) => {
                    if *text_width <= remaining {
                        remaining -= *text_width;
                    } else {
                        return false;
                    }
                }
                Flat(x) => stack.push(chunk.with_notation(x).flat()),
                Indent(i, x) =>
                    stack.push(chunk.with_notation(x).indented(*i)),
                Concat(x, y) => {
                    stack.push(chunk.with_notation(y));
                    stack.push(chunk.with_notation(x));
                }
                Choice(x, y) => {
                    if chunk.flat {
                        stack.push(chunk.with_notation(x));
                    } else {
                        // Relies on the rule that for every choice
                        // `x | y`, the first line of `y` is no longer
                        // than the first line of `x`.
                        stack.push(chunk.with_notation(y));
                    }
                }
            }
        }
    }
}

Let’s walk through this piece by piece.

The amount of space remaining on the current line is the max line width self.width minus the current column self.col, though if we’re already past the width limit we can return false because things really don’t fit.

let mut remaining = if self.col <= self.width {         
    self.width - self.col
} else {
    return false;
};

Just like in print(), we want to keep a stack of chunks to process. Though now we’re just going to scan ahead to see if stuff fits on the current line, without actually printing anything. We therefore don’t want to modify self.chunks because modifications should not persist after the fits() method finishes. However, we don’t want to copy self.chunks because that would be inefficient. So instead we’ll keep a working stack called stack and a reference to the self.chunks that we haven’t yet processed called chunks:

let mut stack = vec![chunk];
let mut chunks = &self.chunks as &[Chunk];

The first thing the loop will do is “get the next chunk”. Since there are two places it might come from this is a little complicated, but all this code says is “take a chunk from the stack, or if that’s empty take it from chunks, or if that’s empty too return true”:

loop {
    let chunk = match stack.pop() {
        Some(chunk) => chunk,
        None => match chunks.split_last() {
            None => return true,
            Some((chunk, more_chunks)) => {
                chunks = more_chunks;
                *chunk
            }
        },
    };

Then the body of the loop processes the chunk to check whether it fits on the line:

match chunk.notation.0.as_ref() {
    Newline => return true,
    Text(_text, text_width) => {
        if *text_width <= remaining {
            remaining -= *text_width;
        } else {
            return false;
        }
    }
    Flat(x) => stack.push(chunk.with_notation(x).flat()),
    Indent(i, x) => stack.push(chunk.with_notation(x).indented(*i)),
    Concat(x, y) => {
        stack.push(chunk.with_notation(y));
        stack.push(chunk.with_notation(x));
    }
    Choice(x, y) => {
        if chunk.flat {
            stack.push(chunk.with_notation(x));
        } else {
            // Relies on the rule that for every choice
            // `x | y`, the first line of `y` is no longer
            // than the first line of `x`.
            stack.push(chunk.with_notation(y));
        }
    }
}

If we hit a newline, there’s nothing more on the line so things must have fit and we can return true.
When we encounter Text, we check if it fits in the remaining space. If it does, we subtract its width from the remaining space. If it doesn’t, we return false.
We handle Flat, Indent, and Concat just like in the print() method, though here we’re pushing to stack instead of self.chunks.
The Choice(x, y) case is the crux of the whole algorithm. When we encounter this choice, it means that we were already checking whether the first option of a choice fits on the current line, because that’s the only situation where fits() is invoked. While doing so, we started looking ahead and encountered this second choice Choice(x, y). There might be a tradeoff between these two choices: perhaps either one can fit on one line but not both at once. We are greedily resolving this tradeoff in favor of the first choice, by asking whether there’s any way the second choice can be resolved such that the first choice fits on the line. This is where The Rule comes in! We ignore x and only check y, by the reasoning that “if y doesn’t fit, then x wouldn’t fit either, so it’s safe to only check if y fits.”

And that’s it! That’s the whole printer.

examples/json.rs

To test our newly completed pretty printing library, let’s add an example usage in example/json.rs for printing Json. It will give functions for constructing Notations for each kind of Json value (null, number, list, etc.):

use pretty::{flat, indent, nl, pretty_print, txt, Notation};

pub fn json_null() -> Notation {
    txt("null")
}

pub fn json_bool(b: bool) -> Notation {
    if b {
        txt("true")
    } else {
        txt("false")
    }
}

pub fn json_string(s: &str) -> Notation {
    // TODO: escape sequences
    txt(format!("\"{}\"", s))
}

pub fn json_number(n: impl ToString) -> Notation {
    txt(n)
}

fn comma_sep_single_line(elems: &[Notation]) -> Notation {
    let mut list = flat(elems[0].clone());
    for elem in &elems[1..] {
        list = list & txt(", ") & flat(elem.clone());
    }
    list
}

fn comma_sep_multi_line(elems: &[Notation]) -> Notation {
    let mut list = elems[0].clone();
    for elem in &elems[1..] {
        list = list & txt(", ") & nl() & elem.clone();
    }
    list
}

fn surrounded(open: &str, elems: &[Notation], closed: &str) -> Notation {
    if elems.is_empty() {
        return txt(open) & txt(closed);
    }

    let single_line = txt(open) & comma_sep_single_line(elems) & txt(closed);
    let multi_line = txt(open) & indent(4, nl() & comma_sep_multi_line(elems)) & nl() & txt(closed);
    single_line | multi_line
}

pub fn json_array(
    elems: impl IntoIterator<Item = Notation>
) -> Notation {
    let elems = elems.into_iter().collect::<Vec<_>>();
    surrounded("[", &elems, "]")
}

fn json_object_entry(key: String, value: Notation) -> Notation {
    json_string(&key) & txt(": ") & value
}

pub fn json_object(
    entries: impl IntoIterator<Item = (String, Notation)>
) -> Notation {
    let entries = entries
        .into_iter()
        .map(|(key, val)| json_object_entry(key, val))
        .collect::<Vec<_>>();
    surrounded("{", &entries, "}")
}

We could call these methods manually:

json_array([json_string("hello"), json_string("world")])

but it would be tedious to build a large enough example to be interesting.

Instead let’s make a Json code formatter that uses serde_json to parse Json and then re-prints it with our pretty printer. serde_json has a representation of Json called serde_json::Value. We’ll need a function to convert that to a Notation:

fn json_to_notation(json: serde_json::Value) -> Notation {
    use serde_json::Value::{Array, Bool, Null, Number, Object, String};

    match json {
        Null => json_null(),
        Bool(b) => json_bool(b),
        Number(n) => json_number(n),
        String(s) => json_string(&s),
        Array(elems) =>
            json_array(elems.into_iter().map(json_to_notation)),
        Object(entries) => json_object(
            entries
                .into_iter()
                .map(|(key, val)| (key, json_to_notation(val))),
        ),
    }
}

And now we can glue everything together to re-format a Json file. Let’s record timing info to tell how long each phase takes:

const WIDTH: u32 = 120;

fn main() {
    use std::env;
    use std::fs::File;
    use std::io::BufReader;
    use std::time::Instant;

    // Get the filename to parse from the command line args
    let env_args = env::args().collect::<Vec<_>>();
    if env_args.len() != 2 {
        panic!("Usage: cargo run --release --example json FILENAME.json");
    }
    let filename = &env_args[1];

    // Parse the file into json using serde
    let start = Instant::now();
    let file = File::open(filename).unwrap();
    let reader = BufReader::new(file);
    let json = serde_json::from_reader(reader).unwrap();
    let ms_to_parse = start.elapsed().as_millis();

    // Convert it to a Notation
    let start = Instant::now();
    let notation = json_to_notation(json);
    let ms_to_construct = start.elapsed().as_millis();

    // Pretty print the Notation
    let start = Instant::now();
    let output = pretty_print(&notation, WIDTH);
    let ms_to_pretty_print = start.elapsed().as_millis();

    // Print it to the terminal
    let start = Instant::now();
    println!("{}", output);
    let ms_to_output = start.elapsed().as_millis();

    // Print timing info to stderr
    eprintln!("Time to parse file as Json:    {} ms", ms_to_parse);
    eprintln!("Time to construct Notation:    {} ms", ms_to_construct);
    eprintln!("Time to pretty print Notation: {} ms", ms_to_pretty_print);
    eprintln!("Time to print to terminal:     {} ms", ms_to_output);
}

Running this on a big Json file of Pokémon on my laptop:

cargo run --release --example json pokemon.json

gives:

{
    "abilities": [
        {
            "ability": {"name": "overgrow", "url": "https://pokeapi.co/api/v2/ability/65/"}, 
            "is_hidden": false, 
            "slot": 1
        }, 
        {
            "ability": {"name": "chlorophyll", "url": "https://pokeapi.co/api/v2/ability/34/"}, 
            "is_hidden": true, 
            "slot": 3
        }
    ], 
    "base_experience": 64, 
    "cries": {
        "latest": "https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/1.ogg", 
        "legacy": "https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/1.ogg"
    },
    "forms": [{"name": "bulbasaur", "url": "https://pokeapi.co/api/v2/pokemon-form/1/"}], 
    "game_indices": [
        {"game_index": 153, "version": {"name": "red", "url": "https://pokeapi.co/api/v2/version/1/"}}, 
        {"game_index": 153, "version": {"name": "blue", "url": "https://pokeapi.co/api/v2/version/2/"}}, 
        {"game_index": 153, "version": {"name": "yellow", "url": "https://pokeapi.co/api/v2/version/3/"}}, 
        {"game_index": 1, "version": {"name": "gold", "url": "https://pokeapi.co/api/v2/version/4/"}}, 
        ...
        {"game_index": 1, "version": {"name": "white-2", "url": "https://pokeapi.co/api/v2/version/22/"}}
    ], 
    "height": 7, 
    "held_items": [], 
    "id": 1,    
    "is_default": true, 
    "location_area_encounters": "https://pokeapi.co/api/v2/pokemon/1/encounters", 
    "moves": [
        {
            "move": {"name": "razor-wind", "url": "https://pokeapi.co/api/v2/move/13/"}, 
            "version_group_details": [
                {
                    "level_learned_at": 0, 
                    "move_learn_method": {"name": "egg", "url": "https://pokeapi.co/api/v2/move-learn-method/2/"}, 
                    "version_group": {"name": "gold-silver", "url": "https://pokeapi.co/api/v2/version-group/3/"}
                }, 
                {
                    "level_learned_at": 0, 
                    "move_learn_method": {"name": "egg", "url": "https://pokeapi.co/api/v2/move-learn-method/2/"}, 
                    "version_group": {"name": "crystal", "url": "https://pokeapi.co/api/v2/version-group/4/"}
                }
            ]
        },
...
6737 lines of Json
...
Time to parse file as Json:    2 ms
Time to construct Notation:    8 ms
Time to pretty print Notation: 2 ms
Time to print to terminal:     36 ms

(This is printed at width 120 to make the output more interesting by having more things fit on a line, then hackily squeezed into my website’s rather narrow code blocks.)

So:

Only 2ms to parse the file. serde_json is fast.
It takes longer to construct the Notation than to print it! I suspect that most of the time comes from the Rcs. There are faster techniques for sharing, but they’re pretty heavy handed so I won’t show them here. But here’s a branch that does arena allocation.
The parsing and printing together is much faster than outputting to the terminal.

So there it is: a fairly simple, fairly fast, fairly expressive pretty printer. It’s Wadler’s Prettier Printer, with a twist.

The full code is on Github.

Citations

[1] Philip Wadler. “A Prettier Printer”. The Fun of Programming, Cornerstones of Computing, 2003. link.

[2] Christian Lindig. “Strictly Pretty”. Informally published, 2000. Link.

February 23, 2024

Justin Pombrio

About Me

Archive

Notes

Puzzles

Tree Editors

Programming

Physics

Math

All Posts