# Algebra and Data Types

Addition, multiplication, and exponentiation model data types.

In math class you’ve done algebra, with addition and multiplication and exponentiation and polynomials like `1 + x + x²`. And while programming, you’ve worked with `enum`s and `struct`s and functions and lists. You probably thought these things were unrelated.

Surprise! They’re deeply related, and by the end of this post you’ll see how to use algebra to refactor your data types. The crowning example in this post will be finding an equivalent representation of red-black trees.

## Algebraic Data Types

If you already know why algebraic data types are called “algebraic”, and how they relate to addition and multiplication, feel free to skip this section and jump to “Refactoring with Algebra”. Otherwise, don’t be afraid: algebraic data types are very simple, and I’ll show examples below. All the code in this post will be in Rust.

To have algebraic data types, you need two things: “product types” and “sum types”.

### Product Types

`struct`s in Rust are product types:

`Rectangle` is called a product type because to calculate the number of possible Rectangles, you multiply the number of possible values of its fields. There are `2^32` possible values for `x`, for `y`, for `width`, and for `height`, so there are `2^32 * 2^32 * 2^32 * 2^32 = 2^128` possible `Rectangle`s. Let’s call this number, `2^128`, the cardinality of `Rectangle`.

The same thing is true for tuples: the number of possible values of a tuple is the product of the number of possible values of its elements. So tuples are also considered product types:

### Sum Types

We used a product type for `Pos` because it contains an x-coordinate and a y-coordinate. On the other hand, you use a sum type when you have one thing or another thing. So Rust `enum`s are sum types.

For example, a vending machine has multiple states, and needs to store different information depending on which state it is in:

(This is not meant to be a full featured implementation of a vending machine. For example, it doesn’t handle “someone put too much money in” or “help I’m out of quarters”.)

`VendingMachineState` is called a sum type because its number of possible values is the sum of the number of possible values of each of its options. There is `1` `Idle` value, `2^32` possible “amounts of money” in the `MoneyInserted` state, and `2^32` possible `char`s in the `Dispensing` state (because unlike in C, a Rust `char` is 4 bytes). So altogether, `VendingMachineState` has a cardinality of `1 + 2^32 + 2^32 = 1 + 2^33`.

Notice that this is not a count of the number of plausible values of `VendingMachineState`. For example, the `char` will probably only ever be an ascii letter, and the “amount of money inserted in cents” should always be much less than `u32::MAX`. Instead, we are counting the number of possible values allowed by the type system.

#### Result and Option

Let’s look at the cardinality of two common built-in sum types in Rust: `Result` and `Option`.

A `Result<T, E>` contains either a value of type `T` or a value of type `E`. Thus its cardinality is the sum of the cardinalities of `T` and `E`, which I’ll write as `T + E`.

Since an `Option<T>` contains either a `T` or no data, its cardinality is `1 + T`: all the possible values of `T`, plus the extra `None` option.

### The Baffling Lack of Sum Types

To reiterate:

• You use a product type when you have one thing and another thing.
• You use a sum type when you have one thing or another thing.

<rant>

How, then, are you supposed to represent X or Y in a language that lacks sum types?

There are several different workarounds, depending on the language. Here are a couple:

• Many languages allow values to be `null` (think objects in Java, or pointers in C). If you have a type `T` that could also be `null`, its cardinality will be `1 + T` (just like `Option`). This is a wimpy version of a sum type that you can use to emulate real sum types. To represent the sum type `A + B`, you store a nullable `A` and a nullable `B`, and you take care to ensure that at all times, exactly one of the two is `null`. The downside is that the language will allow you to accidentally set 0 or 2 of the values to `null`, thereby constructing nonsensical data.
• Many languages have abstract classes and inheritance (or the like). You can use this to emulate the sum type `A + B` by making an abstract class for the sum, plus concrete classes for `A` and `B`. One downside is that this tends to be verbose, and to split what is logically a single function on `A + B` into multiple method implementations. It also invokes machinery that’s vastly over-complicated for the task at hand.

As a programming languages person, this drives me bonkers. Do you know how long we’ve had sum types? Since the 70s! They are very easy to implement and to type check. And they’re both safer and ergonomically nicer than the alternatives.

If you’re ever designing a language, please, I beg you, give it sum types.

</rant>

That’s my only rant this post, I promise.

## Refactoring with Algebra

Now for the magic!

Just as you can refactor code by rewriting it in a way that looks different but does the same thing, you can refactor data type(s) by arranging their contents in a different way. For example, we could replace the `x` and `y` in our `Rectangle` type with a `Pos` tuple:

This probably isn’t news to you. What you may not know is that you can use algebra to verify that this refactoring is correct!

The key insight is that for the refactoring to be correct, the information contained in the `NewRectangle` type must be the same as the information contained in the old `Rectangle` type. Therefore, the total number of possible values must remain the same. We can verify this with algebra:

``````Rectangle
= i32 * i32 * u32 * u32
= (i32 * i32) * u32 * u32
= Pos * u32 * u32
= NewRectangle
``````

In general, two data types contain the same information if and only if the algebraic expressions for those data types are equal. There are a couple ways to make use of this:

• You can verify that a refactoring doesn’t accidentally gain or lose information, like we did above for `Rectangle` and `NewRectangle`.
• You can derive refactoring techniques from laws of algebra. Every algebraic law gives a different refactoring technique!

In the rest of this post, we’ll work through the refactorings implied by a couple dozen laws of algebra, and see many applications of them.

### Commutativity

The commutative law for multiplication says `A * B = B * A`. Remember that `A * B` as a data type is a pair `(A, B)` or alternatively a `struct` with fields of type `A` and `B`. Thus commutativity suggests that you can refactor a type by re-ordering its elements (thus winning the contest for most boring refactoring ever):

There’s also a commutative law for addition: `A + B = B + A`. Remember that `A + B` is an enum with two variants, one containing an `A` and one containing a `B`. So this law says, for example:

(Note that this isn’t good programming practice: semantically, the `Ok` and `Err` variants of a `Result` are not symmetric. For example, the `?` operator treats them differently, and programmers expect that if there’s an error it goes in the `Err` variant and not in the `Ok` variant. However, the algebra only cares about the information content, and flipping a `Result` does keep the same information content.)

Similarly:

### Associativity

There are also associative laws. One for multiplication:

``````(A * B) * C = A * (B * C) = A * B * C
``````

Which gives some equivalences between tuples:

``````(A + B) + C = A + (B + C) = A + B + C
``````

Which gives some equivalences between enums:

### Distributivity

So far we’ve only looked at boring algebraic laws that involved addition or multiplication. But distributivity involves both:

``````A * (B + C) = (A * B) + (A * C)
``````

In terms of data types, we get a law involving both tuples/structs and enums:

Or a more realistic example:

This is a common refactoring. It is typically better to use the first form, since it doesn’t logically duplicate `Data`.

### Subtraction as a Constraint

One interesting thing we can do is use subtraction to constrain a type.

A Go function that produces an answer `T` or an error `E` will return a pair ```(Option<T>, Option<E>)```. (I’m writing Go’s `nil`-able values as Rust-like `Option`s for convenience.) In algebra:

``````  (Option<T>, Option<E>)
= (1 + T) * (1 + E)
``````

However, programmers are expected to maintain the convention that exactly one of these two `Option`s should be filled. Thus these two cases are invalid:

• `1`: Neither an answer nor an error is present
• `T*E`: Both an answer and an error are present

Let’s start with our full type and subtract out these two invalid cases, to see what’s left:

``````  (1 + T) * (1 + E) - 1 - T*E
= 1 + T + E + T*E - 1 - T*E
= T + E
``````

This is the cardinality of Rust’s `Result<T, E>`! So Rust’s `Result` type allows exactly the valid Go results and no more, using the type system to enforce what is only a convention in Go.

## Numbers

So far we’ve seen big numbers like `i32 = 2^32`. But what about small numbers?

### Counting Down: Two

`2` would be the number for a type that has two possible values. Either one thing, or another thing, but nothing else. You know a type like this! It’s the friendly boolean:

What can you say in algebra about the number two? One thing is `A + A = 2A`. This means:

This is relevant to Rust’s standard library binary search function:

The docs for this say:

If the value is found then `Result::Ok` is returned, containing the index of the matching element. […] If the value is not found then `Result::Err` is returned, containing the index where a matching element could be inserted while maintaining sorted order.

Since `Result<usize, usize>` is equivalent to `(bool, usize)`, this function could also have returned both an index and a boolean specifying the meaning of that index. Though in this case I think the `Result` type is more clear.

### Counting Down: One

`1` would be the cardinality of a type that has one possible value.

In Rust, this is the unit type, written `()`. It’s like a tuple with 0 elements. It’s the most boring type. It only has one possible value, which is also written `()`. Rust stores it in literally 0 bytes:

#### Multiplicative Identity

One is the multiplicative identity. This is a fancy way of saying:

``````1 * A = A
``````

In terms of types, this means:

In other words, there’s no reason to include the `()`. It doesn’t add any information.

(Why, then, does Rust even have `()`? One reason is that it’s the return type for functions that “don’t return anything”. In C or Java this would be written “`void`”.)

### Counting Down: Zero?

What about zero? Zero is the sum of no things. It’s an enum with no variants:

Since there are no variants for it, you cannot construct a `Zero`. It has 0 possible values.

In Rust, this type is written `!` and pronounced “the never type”.

In algebra, zero is the additive identity of multiplication:

``````A + 0 = A
``````

In data types, this means:

Since the `Err` variant contains a `!`, it can never occur. If the `exhaustive_patterns` feature is present this means we can exhaustively match on `Result<T, !>` by just taking the `Ok` variant. This illustrates another behaviour of `!` - it can be used to “delete” certain enum variants from generic types like `Result`.

#### Multiplicative Absorbative

Zero plays a second role in algebra too. It’s the “absorbative” element of multiplication (aren’t these names great?):

``````A * 0 = 0
``````

In data types, this means:

In other words, `!` is contagious. There’s no way to construct it, so there’s no way to construct a tuple or struct that contains it.

### Many

We just saw some very small numbers: 2, 1, 0. And we’ve seen intermediate numbers, like ```char = 2^32```. How about very large numbers?

Let’s take `String`. `String` has infinitely many possible values. We could say that the cardinality of `String` is `∞`, but `∞` isn’t really a number, and our algebra would fall apart if we tried to use it.

Instead we’re going to just… not simplify `String`. In the algebra, we’ll continue to call it `String`, and treat it as an indeterminate, never replacing it with a concrete number.

In fact, it’s useful to do this for more than just `String`. For example, you may want to avoid treating `char` and `u32` as identical, even if they have the same number of possible values, since one is meant to store a unicode character, and the other a number. You can achieve this just by leaving `char` as `char` and not setting it equal to `2^32`.

Summing up the numbers we’ve seen:

``````Data Type   Algebraic Expression
------------------------------------
!           0
()          1
bool        2
u8          2^8, or just u8
char        2^32, or just char
String      String
``````

## Arrays

What’s the algebraic expression for an array? Well, the array `[A; n]` contains `n` `A`s. So its expression is `A * A * ... * A = A^n`.

### Exponentiation Laws

And now we can get some refactoring rules for arrays from the laws of exponents!

#### Exponentiation is Repeated Multiplication

An array can be stored in a product type instead:

``````A^3 = A * A * A
``````

#### Matrix Storage

``````(A^m)^n = (A^n)^m
``````

This gives the equivalence between row-major and column-major storage for matrices:

#### Array Flattening

Of course, you can always just jam your matrix into one big array:

``````(A^m)^n = A^(m*n)
``````

(I probably got some of these indices backwards… you get the idea, though.)

#### Array Splitting

If you have a long array, you can split it into a first part and a second part:

``````A^(m+n) = A^m * A^n
``````

#### More Array Splitting

An array of pairs is equivalent to a pair of arrays:

``````(A*B)^n = A^n * B^n
``````

This is the famous array of structs vs. struct of arrays equivalence.

### Exponentiation with Small Numbers

We can find more rules by asking what happens if either the base or the exponent is a small number.

#### Short Arrays

An array of length 1 might as well not be an array:

``````A^1 = A
``````

#### Shorter Arrays

An array of length 0 contains no information at all:

``````A^0 = 1
``````

The Rust compiler confirms:

#### Array of Units

An array of units also contains no information:

``````1^n = 1
``````

Rustc confirms again:

## Functions

There’s one last data type we will model: functions!

Functions are totally data. You can stick one in a variable:

(There is a variety of function types in Rust, due to its ownership model. Let’s completely ignore this fact.)

The whole trick of this blog post is to count the number of possible values of a data type. So how many distinct functions from `u8` to `bool` are there?

There are infinitely many. For example:

• The function that returns `true` if its input is zero, and `false` otherwise.
• The function that ignores its input, prints “Hi Mom!”, then returns `true`.
• The function that mines bitcoins for 5 hours, stores them in a wallet, then returns `true`.
• The function that prints “Haha!”, then runs forever.

This is, uh, not really what we were looking for. The `u8`-to-`bool`-ness is getting buried beneath the side effects.

The better question to ask is, how many pure, terminating functions from `u8` to `bool` are there?

This is easier to answer: to specify such a function, you have to say what boolean it returns on input `0` and what boolean it returns on input `1`and what boolean it returns on input `255`. Altogether, that’s `2*2*...*2 = 2^256` possible functions.

In general, the cardinality of a function from `A` to `B` is `B^A`.

### Exponentiation: now for Functions

Let’s go through some of the algebraic laws for exponentiation again, this time applying them to functions rather than arrays.

#### Identity

First off:

``````A^n = A^n
``````

Well no duh. What am I getting at? While the two `A^n`s look the same, I mean the first one to be interpreted as the cardinality of `fn(n) -> A`, and the second one as the cardinality of `[A; n]`. This is saying that a function whose input has `n` possible values and whose output is `A`, can be represented as an array of `n` `A`s.

This was the answer to a Google interview question! The question was to find the fastest way to count the number of set bits in a byte (i.e., the number of 1s in its binary representation). The answer was to realize that:

``````  fn count_ones(byte: u8) -> u8
= u8^u8
= u8^256
= [u8; 256]
``````

In other words: if you really care about the speed of a function, you can store all 256 of its possible return values in a lookup table. Memory is cheap these days!

#### Currying

Currying is the idea that a function that takes two arguments can also be expressed as a function that takes the first argument and returns a function that takes the second argument:

``````A^(m*n) = (A^m)^n
``````

#### Function Splitting

If you have a function that takes an `enum` (like `Result`), you can split it into functions that each handle one of the variants of the `enum` (like one for the `Ok` and one for the `Err`):

``````C^(A+B) = C^A * C^B
``````

(This is analogous to the way you can emulate sum types in OOP: instead of having a single function that acts on the sum `A + B`, you have two methods—one for handling `A`, and one for handling `B`—that live in different classes.)

#### More Function Splitting

A function that returns a pair is equivalent to a pair of functions:

``````(B*C)^A = B^A * C^A
``````

### Exponentiation with Small Numbers

Again, we can find more rules by asking what happens if either the base or the exponent is a small number.

#### Short Functions

A function that just takes the unit value (or equivalently no arguments) might as well not be a function:

``````A^1 = A
``````

Remember that exponentiation models pure, terminating functions. So this law is saying that a function that takes a unit argument can only be useful (beyond its single return value) if it has a side effect.

#### Shorter Functions

A function that takes in a “never” argument is equivalent to unit:

``````A^0 = 1
``````

[UPDATE] I wasn’t really sure about this, but a friend clarified it. We’re classifying functions based on their behavior. How many functions are there from `!` to `A`, counting functions as equal if they have the same behavior? Why, there’s just one. They’re all the same, because you can’t call such a function.

#### Void Functions

A function that doesn’t return anything is sometimes called a void function (because it returns “`void`” in C). Void functions can’t do anything unless they have a side effect:

``````1^A = 1
``````

## Lists

We’ve looked at arrays, but how about lists? Unfortunately, there’s a lot of confusing terminology around lists and their representations in various languages, so let me be clear. By “list”, I mean a finite but arbitrarily long (and resizable) sequence of elements of the same type. Rust’s most commonly used list type is called `Vec<A>`. It’s backed by an array that’s re-allocated as needed, but that’s an implementation detail.

How many possible values does a `Vec<A>` have? This is easiest to answer if you think of a `Vec<A>` like a linked list (it’s the same information content either way). It’s either empty, or it has an element and another list:

``````Vec<A> = 1 + A*Vec<A>
``````

Let’s substitute in the right hand side of the equation… repeatedly:

``````Vec<A> = 1 + A*Vec<A>
= 1 + A*(1 + A*Vec<A>)
= 1 + A + A^2*Vec<A>
= 1 + A + A^2*(1 + A*Vec<A>)
= 1 + A + A^2 + A^3*Vec<A>
= ...
= 1 + A + A^2 + A^3 + ...
``````

This is saying that a list of `A`s can have 0 `A`s (because it’s empty), or 1 `A` (because it has length 1), or 2 `A`s (because it has length 2), etc.

It’s tedious to work with infinite polynomials like this, though, so let’s try to find a closed-form solution:

``````Vec<A> = 1 + A*Vec<A>
Vec<A> - A*Vec<A> = 1
(1 - A)*Vec<A> = 1
Vec<A> = 1/(1 - A)
``````

There we go! A list of `A`s has `1/(1-A)` possible values.

“But Justin,” you say, “what does division mean? You haven’t shown any data type that it models.”

“And Justin,” you continue, “what if `A` is `bool`”? Then `1/(1-A) = 1/(1-2) = -1`. What are negative numbers supposed to be?”

Shhh child, do not fear. One need not understand the formula for it to be true.

“You don’t know either, do you?”

Fear leads to anger. Anger leads to hate. Hate leads to algebraically manipulating infinite polynomials because you weren’t willing to simplify them.

#### Short Lists

And the `1/(1-A)` list formula does work.

Say we have a short list, with fewer than 3 elements. Equivalently, this is a list subject to the constraint that it does not have 3 or more elements. We can represent all lists with 3 or more elements as `A^3 * Vec<A>`: you store the first three elements, plus a list for the rest. Therefore our short list has cardinality:

``````Vec<A> - A^3 * Vec<A>
``````

I.e., it’s a list (`Vec<A>`) subject to the constraint (`-`) that it does not have 3 or more elements (`A^3 * Vec<A>`).

Simplifying with algebra:

``````  Vec<A> - A^3 * Vec<A>
= 1/(1 - A) - A^3/(1 - A)
= (1 - A^3) / (1 - A)
= (1 - A) * (1 + A + A^2) / (1 - A)
= 1 + A + A^2
``````

In other words, it has length 0, 1, or 2. Not a startling insight, but it shows that the list formula works.

#### Zip

Here’s a more interesting example. The zip function takes two iterators and produces an iterator of pairs, by consuming from both iterators at once until one of them stops producing values. If we assume the iterators are finite (the methods in this blog post don’t work on infinite data structures), then they’re equivalent to lists, so we can use the `1/(1-A)` list formula for them.

Since the `zip` function might not fully consume its input iterators, it pretty clearly throws some information away. Let’s figure out what is lost, exactly. `zip(Vec<A>, Vec<B>)` returns `Vec<A*B>`, but if it were lossless, it would have to return some additional information. Let’s call this extra information `X`. Then `Vec<A>*Vec<B> = X*Vec<A*B>`. Solving for `X` and simplifying:

``````X = Vec<A>*Vec<B> / Vec<A*B>
= [1/(1-A) * 1/(1-B)] / [1/(1-A*B)]
= [1/(1-A)(1-B)] / [1/(1-A*B)]
= (1-A*B) / (1-A)(1-B)
= 1 + A/(1-A) + B/(1-B)   // this step's tricky
= 1 + A*Vec<A> + B*Vec<B>
``````

So if you wanted `zip` to be a lossless function, then not only would it have to produce `Vec<A*B>`, it would also have to preserve the unused elements of its inputs:

• `1` in case its inputs are the same length so all elements have been paired up;
• `A*Vec<A>` (i.e., a non-empty list of `A`s) in case its first input has unused elements; or
• `B*Vec<B>` (i.e., a non-empty list of `B`s) in case its second input has unused elements.

#### Alternating Lists

Here’s another example of refactoring with algebra.

Say we have a list of alternating elements `A` and `B`, that could start with either `A` or `B`. We could represent that with this Rust type:

There’s one subtlety here. I missed it the first time I wrote this section, but the algebra made it visible.

These definitions give two ways to express an empty list: either an empty `StartWithA` or an empty `StartWithB`. To avoid having two ways to represent the same information, we should pick one representation, and declare the other one illegal. So let’s say that an empty list should be represented by `Alternating::StartWithA(ListA::Empty)`, and that `Alternating::StartWithB(ListB::Empty)` is illegal.

Then the cardinalities are:

``````Alternating = ListA + ListB - 1  // minus 1 for the illegal state
ListA = 1 + A*ListB
ListB = 1 + B*ListA
``````

Now to do some algebra, to find another representation! First, substitute `ListB` into the equation for `ListA`, so that we can solve for `ListA`:

``````ListA = 1 + A*ListB
ListA = 1 + A*(1 + B*ListA)
ListA = 1 + A + A*B*ListA
ListA(1 - A*B) = 1 + A
ListA = (1 + A)/(1 - A*B)
``````

Then use that to solve for `Alternating`:

``````Alternating
= ListA + ListB - 1
= ListA + 1 + B*ListA - 1
= ListA + B*ListA
= (1 + B) * ListA
= (1 + B) * (1 + A) / (1 - A*B)
= (1 + B) * (1 + A) * 1/(1 - A*B)
``````

Wow! What is this type? Remember that `1 + X` means `Option<X>` and `1/(1 - X)` means `Vec<X>`. Putting this together, and helpfully describing the fields, gives:

That’s a nicer representation, isn’t it? Three types became one, the linked-lists became a `Vec`, and the illegal state vanished.

## Binary Trees

One last magic trick. A Red-black tree is a kind of balanced binary tree. To stay balanced, it considers every node to be either red or black, and maintains these invariants:

• The root is black.
• All leaves are black.
• A red node has two black nodes as children.
• Every path from the root to a leaf crosses the same number of black nodes.

Say that the leaves have type `X`, and the non-leaf nodes have type `Y`. Then we can capture the first three invariants in Rust types:

We can use algebra to find an alternate form for red-black trees.

First, translate the Rust types to algebraic equations:

``````// N: Node
// R: RedNode
// B: BlackNode

N = R + B
B = x + y*N*N
= x + y*(R + B)*(R + B)
R = y*B*B
``````

But remember that this didn’t capture the last invariant, that “every path from the root to a leaf crosses the same number of black nodes”. In order to capture it, we’ll need to index the types `R` and `B` by the number of black nodes crossed. So I’ll write `Bn` for “the type of a black node, such that every path from it to its leaves crosses `n` black nodes (including itself). Using this, we can get the full equations for a red-black tree:

``````B1 = x    // n=1 -> must be a leaf
B{n+1} = y*(Bn + Rn)*(Bn + Rn)
Rn = y*Bn*Bn
``````

Now that the equations are complete, we can simplify them. With a little algebra we can substitute away `Rn`, so that there’s only one remaining type:

``````B1 = x  // this stays the same

B{n+1} = y * (Bn + Rn) * (Bn + Rn)
= y * (Bn + y*Bn*Bn) * (Bn + y*Bn*Bn)
= y * Bn * (1 + y*Bn) * Bn * (1 + y*Bn)
= y * Bn * Bn * (1 + y*Bn) * (1 + y*Bn)
``````

We can translate this back into Rust. There’s no good way to keep the `n` parameter, so we’ll erase it but remember that, per the last equation, the `n` value of a child is always 1 less than its parent’s. Thus the tree now has constant depth (given by `n`). So erasing `n` we have:

So we’ve found another representation of red-black trees!

Is it useful? It at least warranted a prominent mention on the Wikipedia page for red-black trees. This section starts out:

A red–black tree is similar in structure to a B-tree of order 4, where each node can contain between 1 and 3 values and (accordingly) between 2 and 4 child pointers.

By “values”, it means “data in nodes”, i.e. our `Y`s. And indeed the `Branch` variant has 1-3 `Y`s and (accordingly) 2-4 `Node<X, Y>`s.

So that’s how you can use algebra to find another representation of a data structure, even one with complicated invariants like a red-black tree.

## Summary

I don’t know about you, but I was excited to discover just how extensively you can model data types with algebra. This post covered a lot, so here’s a cheat sheet to remember all the algebraic laws and what they say about types:

Cheat Sheet

To read more in this vein, you might want to check out:

Discussion on lobste.rs

March 11, 2021 