Addition, multiplication, and exponentiation model data types.
In math class you’ve done algebra, with addition and multiplication and exponentiation and polynomials like 1 + x + x². And while programming, you’ve worked with enums and structs and functions and lists. You probably thought these things were unrelated.
Surprise! They’re deeply related, and by the end of this post you’ll see how to use algebra to refactor your data types. The crowning example in this post will be finding an equivalent representation of red-black trees.
If you already know why algebraic data types are called “algebraic”, and how they relate to addition and multiplication, feel free to skip this section and jump to “Refactoring with Algebra”. Otherwise, don’t be afraid: algebraic data types are very simple, and I’ll show examples below. All the code in this post will be in Rust.
To have algebraic data types, you need two things: “product types” and “sum types”.
structs in Rust are product types:
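Here’s a minimal sketch of such a struct (the field types match the cardinality count below):

```rust
struct Rectangle {
    x: i32,      // 2^32 possible values
    y: i32,      // 2^32 possible values
    width: u32,  // 2^32 possible values
    height: u32, // 2^32 possible values
}

fn main() {
    let r = Rectangle { x: 0, y: 0, width: 800, height: 600 };
    // Four 4-byte fields: all 2^128 possibilities fit in 16 bytes
    assert_eq!(std::mem::size_of::<Rectangle>(), 16);
    assert_eq!(r.width, 800);
}
```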
Rectangle is called a product type because to calculate the number of possible Rectangles, you multiply the number of possible values of its fields. There are 2^32 possible values for x, for y, for width, and for height, so there are 2^32 * 2^32 * 2^32 * 2^32 = 2^128 possible Rectangles. Let’s call this number, 2^128, the cardinality of Rectangle.
The same thing is true for tuples: the number of possible values of a tuple is the product of the number of possible values of its elements. So tuples are also considered product types:
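For example, a position could be a pair of coordinates:

```rust
// A tuple product type: 2^32 * 2^32 = 2^64 possible values
type Pos = (i32, i32);

fn main() {
    let p: Pos = (3, -7);
    assert_eq!(p.0, 3);
    assert_eq!(std::mem::size_of::<Pos>(), 8);
}
```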
We used a product type for Pos because it contains an x-coordinate and a y-coordinate. On the other hand, you use a sum type when you have one thing or another thing. So Rust enums are sum types.
For example, a vending machine has multiple states, and needs to store different information depending on which state it is in:
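A sketch of such a state enum, with the payloads described in the cardinality count below:

```rust
enum VendingMachineState {
    Idle,               // 1 value
    MoneyInserted(u32), // cents inserted so far: 2^32 possible values
    Dispensing(char),   // which item to dispense: 2^32 possible values
}

fn main() {
    let state = VendingMachineState::MoneyInserted(125);
    match state {
        VendingMachineState::Idle => println!("waiting"),
        VendingMachineState::MoneyInserted(cents) => assert_eq!(cents, 125),
        VendingMachineState::Dispensing(slot) => println!("dispensing {slot}"),
    }
}
```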
(This is not meant to be a full featured implementation of a vending machine. For example, it doesn’t handle “someone put too much money in” or “help I’m out of quarters”.)
VendingMachineState is called a sum type because its number of possible values is the sum of the number of possible values of each of its options. There is 1 Idle value, 2^32 possible “amounts of money” in the MoneyInserted state, and 2^32 possible chars in the Dispensing state (because unlike in C, a Rust char is 4 bytes). So altogether, VendingMachineState has a cardinality of 1 + 2^32 + 2^32 = 1 + 2^33.
Notice that this is not a count of the number of plausible values of VendingMachineState. For example, the char will probably only ever be an ASCII letter, and the “amount of money inserted in cents” should always be much less than u32::MAX. Instead, we are counting the number of possible values allowed by the type system.
Let’s look at the cardinality of two common built-in sum types in Rust: Result and Option.
A Result<T, E> contains either a value of type T or a value of type E. Thus its cardinality is the sum of the cardinalities of T and E, which I’ll write as T + E.
Since an Option<T> contains either a T or no data, its cardinality is 1 + T: all the possible values of T, plus the extra None option.
To reiterate: the cardinality of a product type is the product of its fields’ cardinalities, and the cardinality of a sum type is the sum of its variants’ cardinalities.
<rant>
How, then, are you supposed to represent X or Y in a language that lacks sum types?
There are several different workarounds, depending on the language. Here are a couple:
- Nullable values (think objects in Java, or pointers in C). If you have a type T that could also be null, its cardinality will be 1 + T (just like Option). This is a wimpy version of a sum type that you can use to emulate real sum types. To represent the sum type A + B, you store a nullable A and a nullable B, and you take care to ensure that at all times, exactly one of the two is null. The downside is that the language will allow you to accidentally set 0 or 2 of the values to null, thereby constructing nonsensical data.
- Class hierarchies. You can represent A + B by making an abstract class for the sum, plus concrete classes for A and B. One downside is that this tends to be verbose, and to split what is logically a single function on A + B into multiple method implementations. It also invokes machinery that’s vastly over-complicated for the task at hand.

As a programming languages person, this drives me bonkers. Do you know how long we’ve had sum types? Since the 70s! They are very easy to implement and to type check. And they’re both safer and ergonomically nicer than the alternatives.
If you’re ever designing a language, please, I beg you, give it sum types.
</rant>
That’s my only rant this post, I promise.
Now for the magic!
Just as you can refactor code by rewriting it in a way that looks different but does the same thing, you can refactor data types by arranging their contents in a different way. For example, we could replace the x and y in our Rectangle type with a Pos tuple:
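A sketch of the refactored type:

```rust
type Pos = (i32, i32);

// Same information as Rectangle, with x and y grouped into a Pos
struct NewRectangle {
    pos: Pos,
    width: u32,
    height: u32,
}

fn main() {
    let r = NewRectangle { pos: (0, 0), width: 800, height: 600 };
    assert_eq!(r.pos, (0, 0));
}
```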
This probably isn’t news to you. What you may not know is that you can use algebra to verify that this refactoring is correct!
The key insight is that for the refactoring to be correct, the information contained in the NewRectangle type must be the same as the information contained in the old Rectangle type. Therefore, the total number of possible values must remain the same. We can verify this with algebra:
Rectangle
= i32 * i32 * u32 * u32
= (i32 * i32) * u32 * u32
= Pos * u32 * u32
= NewRectangle
In general, two data types contain the same information if and only if the algebraic expressions for those data types are equal. There are a couple ways to make use of this:
- Check that a refactoring preserves information, as we did with Rectangle and NewRectangle.
- Manipulate a type’s algebraic expression to discover new, equivalent representations.

In the rest of this post, we’ll work through the refactorings implied by a couple dozen laws of algebra, and see many applications of them.
The commutative law for multiplication says A * B = B * A. Remember that A * B as a data type is a pair (A, B), or alternatively a struct with fields of type A and B. Thus commutativity suggests that you can refactor a type by re-ordering its elements (thus winning the contest for most boring refactoring ever):
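A sketch of this (admittedly thrilling) refactoring, using an illustrative Point type:

```rust
struct Point { x: i32, y: i32 }
struct FlippedPoint { y: i32, x: i32 }

// A * B = B * A: the two types carry exactly the same information
fn flip(p: Point) -> FlippedPoint {
    FlippedPoint { y: p.y, x: p.x }
}

fn main() {
    let f = flip(Point { x: 1, y: 2 });
    assert_eq!((f.x, f.y), (1, 2));
}
```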
There’s also a commutative law for addition: A + B = B + A. Remember that A + B is an enum with two variants, one containing an A and one containing a B. So this law says, for example:
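That is, a Result can be flipped without losing information. A sketch:

```rust
// A + B = B + A: swapping the variants of a Result keeps the same information
fn flip<T, E>(r: Result<T, E>) -> Result<E, T> {
    match r {
        Ok(t) => Err(t),
        Err(e) => Ok(e),
    }
}

fn main() {
    let r: Result<u32, String> = Ok(5);
    assert_eq!(flip(r), Err(5));
}
```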
(Note that this isn’t good programming practice: semantically, the Ok and Err variants of a Result are not symmetric. For example, the ? operator treats them differently, and programmers expect that if there’s an error it goes in the Err variant and not in the Ok variant. However, the algebra only cares about the information content, and flipping a Result does keep the same information content.)
Similarly:
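The same law applies to any hand-written enum; variant order carries no information. A sketch with a hypothetical Shape type:

```rust
enum Shape {
    Circle(f64), // radius
    Square(f64), // side length
}

enum ShapeReordered {
    Square(f64),
    Circle(f64),
}

fn reorder(s: Shape) -> ShapeReordered {
    match s {
        Shape::Circle(r) => ShapeReordered::Circle(r),
        Shape::Square(side) => ShapeReordered::Square(side),
    }
}

fn main() {
    match reorder(Shape::Circle(1.0)) {
        ShapeReordered::Circle(r) => assert_eq!(r, 1.0),
        ShapeReordered::Square(_) => unreachable!(),
    }
}
```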
There are also associative laws. One for multiplication:
(A * B) * C = A * (B * C) = A * B * C
Which gives some equivalences between tuples:
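A sketch of the tuple regrouping:

```rust
// (A * B) * C = A * (B * C) = A * B * C
fn regroup<A, B, C>(((a, b), c): ((A, B), C)) -> (A, (B, C)) {
    (a, (b, c))
}

fn flatten<A, B, C>(((a, b), c): ((A, B), C)) -> (A, B, C) {
    (a, b, c)
}

fn main() {
    assert_eq!(regroup(((1, 2), 3)), (1, (2, 3)));
    assert_eq!(flatten(((1, 2), 3)), (1, 2, 3));
}
```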
And one for addition:
(A + B) + C = A + (B + C) = A + B + C
Which gives some equivalences between enums:
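A sketch of the enum regrouping, flattening a nested sum into a three-variant enum:

```rust
// (A + B) + C = A + B + C
enum Nested<A, B, C> {
    AOrB(Result<A, B>),
    C(C),
}

enum Flat<A, B, C> {
    A(A),
    B(B),
    C(C),
}

fn flatten<A, B, C>(n: Nested<A, B, C>) -> Flat<A, B, C> {
    match n {
        Nested::AOrB(Ok(a)) => Flat::A(a),
        Nested::AOrB(Err(b)) => Flat::B(b),
        Nested::C(c) => Flat::C(c),
    }
}

fn main() {
    let n: Nested<u8, u8, u8> = Nested::AOrB(Ok(1));
    match flatten(n) {
        Flat::A(a) => assert_eq!(a, 1),
        _ => unreachable!(),
    }
}
```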
So far we’ve only looked at boring algebraic laws that involved addition or multiplication. But distributivity involves both:
A * (B + C) = (A * B) + (A * C)
In terms of data types, we get a law involving both tuples/structs and enums:
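A sketch, writing the sum B + C as a Result:

```rust
// A * (B + C) = (A * B) + (A * C)
fn distribute<A, B, C>((a, bc): (A, Result<B, C>)) -> Result<(A, B), (A, C)> {
    match bc {
        Ok(b) => Ok((a, b)),
        Err(c) => Err((a, c)),
    }
}

fn main() {
    let pair: (u8, Result<bool, char>) = (7, Ok(true));
    assert_eq!(distribute(pair), Ok((7, true)));
}
```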
Or a more realistic example:
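A sketch using a hypothetical Connection type: the first form factors the shared Data out; the second duplicates it in every variant.

```rust
struct Data {
    name: String,
}

// First form: Data * (Connecting + Connected)
struct Connection {
    data: Data,
    state: ConnectionState,
}

enum ConnectionState {
    Connecting,
    Connected,
}

// Second form: (Data * Connecting) + (Data * Connected)
enum ConnectionDistributed {
    Connecting { data: Data },
    Connected { data: Data },
}

fn main() {
    let c = Connection {
        data: Data { name: "db".to_string() },
        state: ConnectionState::Connected,
    };
    assert_eq!(c.data.name, "db");
}
```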
This is a common refactoring. It is typically better to use the first form, since it doesn’t logically duplicate Data.
One interesting thing we can do is use subtraction to constrain a type.
A Go function that produces an answer T or an error E will return a pair (Option<T>, Option<E>). (I’m writing Go’s nil-able values as Rust-like Options for convenience.)
In algebra:
(Option<T>, Option<E>)
= (1 + T) * (1 + E)
However, programmers are expected to maintain the convention that exactly one of these two Options should be filled. Thus these two cases are invalid:

- 1: Neither an answer nor an error is present
- T*E: Both an answer and an error are present

Let’s start with our full type and subtract out these two invalid cases, to see what’s left:
(1 + T) * (1 + E) - 1 - T*E
= 1 + T + E + T*E - 1 - T*E
= T + E
This is the cardinality of Rust’s Result<T, E>! So Rust’s Result type allows exactly the valid Go results and no more, using the type system to enforce what is only a convention in Go.
So far we’ve seen big numbers like i32 = 2^32. But what about small numbers?

2 would be the number for a type that has two possible values. Either one thing, or another thing, but nothing else. You know a type like this! It’s the friendly boolean:
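```rust
fn main() {
    // bool has exactly two values, so its cardinality is 2
    let values = [false, true];
    assert_eq!(values.len(), 2);
    // (It occupies a whole byte in memory, but only 2 of the 256
    // bit patterns are legal values.)
    assert_eq!(std::mem::size_of::<bool>(), 1);
}
```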
What can you say in algebra about the number two? One thing is A + A = 2A. This means:
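In types: an either-of-two-As, like Result<A, A>, is the same information as a bool tagging a single A. A sketch:

```rust
// A + A = 2 * A
fn to_tagged<A>(r: Result<A, A>) -> (bool, A) {
    match r {
        Ok(a) => (true, a),
        Err(a) => (false, a),
    }
}

fn main() {
    assert_eq!(to_tagged::<u8>(Err(9)), (false, 9));
    assert_eq!(to_tagged::<u8>(Ok(1)), (true, 1));
}
```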
This is relevant to Rust’s standard library binary search function:
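That is, the binary_search method on slices, which returns a Result<usize, usize>:

```rust
fn main() {
    let sorted = [1, 3, 5, 7];
    // Found: Ok holds the index of the match
    assert_eq!(sorted.binary_search(&5), Ok(2));
    // Not found: Err holds the index where 4 could be inserted
    assert_eq!(sorted.binary_search(&4), Err(2));
}
```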
The docs for this say:
If the value is found then Result::Ok is returned, containing the index of the matching element. […] If the value is not found then Result::Err is returned, containing the index where a matching element could be inserted while maintaining sorted order.
Since Result<usize, usize> is equivalent to (bool, usize), this function could also have returned both an index and a boolean specifying the meaning of that index. Though in this case I think the Result type is more clear.
1 would be the cardinality of a type that has one possible value.

In Rust, this is the unit type, written (). It’s like a tuple with 0 elements. It’s the most boring type. It only has one possible value, which is also written (). Rust stores it in literally 0 bytes:
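```rust
fn main() {
    // The unit type takes no space at all
    assert_eq!(std::mem::size_of::<()>(), 0);
    let unit: () = ();
    assert_eq!(unit, ());
}
```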
One is the multiplicative identity. This is a fancy way of saying:
1 * A = A
In terms of types, this means:
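A sketch: a pair of () and A carries no more information than A alone.

```rust
// 1 * A = A: pairing with unit adds nothing
fn drop_unit<A>(((), a): ((), A)) -> A {
    a
}

fn main() {
    assert_eq!(drop_unit(((), 42)), 42);
    // The compiler agrees: the unit field occupies no space
    assert_eq!(std::mem::size_of::<((), u64)>(), std::mem::size_of::<u64>());
}
```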
In other words, there’s no reason to include the (). It doesn’t add any information.

(Why, then, does Rust even have ()? One reason is that it’s the return type for functions that “don’t return anything”. In C or Java this would be written “void”.)
What about zero? Zero is the sum of no things. It’s an enum with no variants:
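```rust
// An enum with no variants: there is no way to construct a Zero
enum Zero {}

fn main() {
    // We can name the type, but never produce a value of it
    assert_eq!(std::mem::size_of::<Zero>(), 0);
}
```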
Since there are no variants for it, you cannot construct a Zero. It has 0 possible values. In Rust, this type is written ! and pronounced “the never type”.
In algebra, zero is the additive identity:
A + 0 = A
In data types, this means:
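A sketch, using std::convert::Infallible (the stable stand-in for !) so the code compiles on stable Rust:

```rust
use std::convert::Infallible;

// A + 0 = A: an error type with no values can simply be matched away
fn unwrap_infallible<T>(r: Result<T, Infallible>) -> T {
    match r {
        Ok(t) => t,
        Err(never) => match never {}, // no variants: nothing to handle
    }
}

fn main() {
    let r: Result<u32, Infallible> = Ok(10);
    assert_eq!(unwrap_infallible(r), 10);
}
```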
As explained in the Rust docs:
Since the Err variant contains a !, it can never occur. If the exhaustive_patterns feature is present this means we can exhaustively match on Result<T, !> by just taking the Ok variant. This illustrates another behaviour of ! - it can be used to “delete” certain enum variants from generic types like Result.
Zero plays a second role in algebra too. It’s the “absorbing” element of multiplication (aren’t these names great?):
A * 0 = 0
In data types, this means:
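A sketch: a tuple containing an uninhabited type is itself uninhabited, so a function consuming one never actually runs.

```rust
enum Never {}

// A * 0 = 0: no caller can ever produce this argument,
// so the match needs no cases
fn absurd(pair: (u32, Never)) -> u32 {
    match pair.1 {}
}

fn main() {
    // We can define absurd, but never call it
    let _f: fn((u32, Never)) -> u32 = absurd;
    assert_eq!(std::mem::size_of::<Never>(), 0);
}
```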
In other words, ! is contagious. There’s no way to construct it, so there’s no way to construct a tuple or struct that contains it.
We just saw some very small numbers: 2, 1, 0. And we’ve seen intermediate numbers, like char = 2^32. How about very large numbers?
Let’s take String. String has infinitely many possible values. We could say that the cardinality of String is ∞, but ∞ isn’t really a number, and our algebra would fall apart if we tried to use it.
Instead we’re going to just… not simplify String. In the algebra, we’ll continue to call it String, and treat it as an indeterminate, never replacing it with a concrete number.
In fact, it’s useful to do this for more than just String. For example, you may want to avoid treating char and u32 as identical, even if they have the same number of possible values, since one is meant to store a unicode character, and the other a number. You can achieve this just by leaving char as char and not setting it equal to 2^32.
Summing up the numbers we’ve seen:
Data Type Algebraic Expression
------------------------------------
! 0
() 1
bool 2
u8 2^8, or just u8
char 2^32, or just char
String String
What’s the algebraic expression for an array? Well, the array [A; n] contains n As. So its expression is A * A * ... * A = A^n.
And now we can get some refactoring rules for arrays from the laws of exponents!
An array can be stored in a product type instead:
A^3 = A * A * A
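A sketch:

```rust
// A^3 = A * A * A: a fixed-size array is the same as a homogeneous tuple
fn to_tuple(a: [u8; 3]) -> (u8, u8, u8) {
    (a[0], a[1], a[2])
}

fn main() {
    assert_eq!(to_tuple([1, 2, 3]), (1, 2, 3));
}
```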
(A^m)^n = (A^n)^m
This gives the equivalence between row-major and column-major storage for matrices:
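A sketch with a 2x3 matrix:

```rust
// (A^m)^n = (A^n)^m: the same matrix stored as rows or as columns
fn transpose(rows: [[f64; 3]; 2]) -> [[f64; 2]; 3] {
    let mut cols = [[0.0; 2]; 3];
    for r in 0..2 {
        for c in 0..3 {
            cols[c][r] = rows[r][c];
        }
    }
    cols
}

fn main() {
    let rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
    assert_eq!(transpose(rows)[2], [3.0, 6.0]);
}
```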
Of course, you can always just jam your matrix into one big array:
(A^m)^n = A^(m*n)
(I probably got some of these indices backwards… you get the idea, though.)
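A sketch, flattening the same 2x3 matrix in row-major order:

```rust
// (A^m)^n = A^(m*n)
fn flatten(rows: [[f64; 3]; 2]) -> [f64; 6] {
    let mut flat = [0.0; 6];
    for r in 0..2 {
        for c in 0..3 {
            flat[r * 3 + c] = rows[r][c];
        }
    }
    flat
}

fn main() {
    let rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
    assert_eq!(flatten(rows), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
}
```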
If you have a long array, you can split it into a first part and a second part:
A^(m+n) = A^m * A^n
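A sketch, splitting a length-5 array into length-2 and length-3 parts:

```rust
// A^(m+n) = A^m * A^n
fn split(a: [u8; 5]) -> ([u8; 2], [u8; 3]) {
    ([a[0], a[1]], [a[2], a[3], a[4]])
}

fn main() {
    assert_eq!(split([1, 2, 3, 4, 5]), ([1, 2], [3, 4, 5]));
}
```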
An array of pairs is equivalent to a pair of arrays:
(A*B)^n = A^n * B^n
This is the famous array of structs vs. struct of arrays equivalence.
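A sketch of the conversion:

```rust
// (A*B)^n = A^n * B^n: array of structs vs. struct of arrays
fn to_soa(aos: [(u8, bool); 3]) -> ([u8; 3], [bool; 3]) {
    (
        [aos[0].0, aos[1].0, aos[2].0],
        [aos[0].1, aos[1].1, aos[2].1],
    )
}

fn main() {
    let aos = [(1, true), (2, false), (3, true)];
    assert_eq!(to_soa(aos), ([1, 2, 3], [true, false, true]));
}
```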
We can find more rules by asking what happens if either the base or the exponent is a small number.
An array of length 1 might as well not be an array:
A^1 = A
An array of length 0 contains no information at all:
A^0 = 1
The Rust compiler confirms:
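```rust
fn main() {
    // A^0 = 1: a zero-length array has one possible value, stored in zero bytes
    let empty: [u64; 0] = [];
    assert_eq!(std::mem::size_of_val(&empty), 0);
}
```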
An array of units also contains no information:
1^n = 1
Rustc confirms again:
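```rust
fn main() {
    // 1^n = 1: even a thousand units take up no space
    assert_eq!(std::mem::size_of::<[(); 1000]>(), 0);
}
```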
There’s one last data type we will model: functions!
Functions are totally data. You can stick one in a variable:
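A sketch:

```rust
fn is_zero(n: u8) -> bool {
    n == 0
}

fn main() {
    // Functions are data: stick one in a variable and pass it around
    let f: fn(u8) -> bool = is_zero;
    assert!(f(0));
    assert!(!f(7));
}
```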
(There is a variety of function types in Rust, due to its ownership model. Let’s completely ignore this fact.)
The whole trick of this blog post is to count the number of possible values of a data type. So how many distinct functions from u8 to bool are there?
There are infinitely many. For example:

- A function that returns true if its input is zero, and false otherwise.
- A function that prints a message, then returns true.
- A function that prints a different message, then returns true.

This is, uh, not really what we were looking for. The u8-to-bool-ness is getting buried beneath the side effects.
The better question to ask is, how many pure, terminating functions from u8 to bool are there?
This is easier to answer: to specify such a function, you have to say what boolean it returns on input 0, and what boolean it returns on input 1… and what boolean it returns on input 255. Altogether, that’s 2*2*...*2 = 2^256 possible functions.
In general, the cardinality of a function from A to B is B^A.
Let’s go through some of the algebraic laws for exponentiation again, this time applying them to functions rather than arrays.
First off:
A^n = A^n
Well no duh. What am I getting at? While the two A^ns look the same, I mean the first one to be interpreted as the cardinality of a function type, and the second one as the cardinality of the array [A; n]. This is saying that a function whose input has n possible values and whose output is A, can be represented as an array of n As.
This was the answer to a Google interview question! The question was to find the fastest way to count the number of set bits in a byte (i.e., the number of 1s in its binary representation). The answer was to realize that:
fn count_ones(byte: u8) -> u8
= u8^u8
= u8^256
= [u8; 256]
In other words: if you really care about the speed of a function, you can store all 256 of its possible return values in a lookup table. Memory is cheap these days!
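A sketch of the lookup-table trick (using the standard count_ones method to fill the table):

```rust
// Precompute all 256 return values once; afterwards, "calling" the
// function is a single array read.
fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for i in 0..256 {
        table[i] = (i as u8).count_ones() as u8;
    }
    table
}

fn main() {
    let table = build_table();
    assert_eq!(table[0b1011_0001], 4);
    assert_eq!(table[0], 0);
    assert_eq!(table[255], 8);
}
```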
Currying is the idea that a function that takes two arguments can also be expressed as a function that takes the first argument and returns a function that takes the second argument:
A^(m*n) = (A^m)^n
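A sketch of both forms:

```rust
// A^(m*n): a two-argument function...
fn add(x: u32, y: u32) -> u32 {
    x + y
}

// ...(A^m)^n: vs. a function that returns a function
fn add_curried(x: u32) -> impl Fn(u32) -> u32 {
    move |y| x + y
}

fn main() {
    assert_eq!(add(2, 3), 5);
    assert_eq!(add_curried(2)(3), 5);
}
```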
If you have a function that takes an enum (like Result), you can split it into functions that each handle one of the variants of the enum (like one for the Ok and one for the Err):
C^(A+B) = C^A * C^B
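A sketch:

```rust
// C^(A+B) = C^A * C^B: one function on a Result splits into two functions
fn describe(r: Result<u32, bool>) -> String {
    match r {
        Ok(n) => describe_ok(n),
        Err(b) => describe_err(b),
    }
}

fn describe_ok(n: u32) -> String {
    format!("got {n}")
}

fn describe_err(b: bool) -> String {
    format!("failed: {b}")
}

fn main() {
    assert_eq!(describe(Ok(3)), "got 3");
    assert_eq!(describe(Err(true)), "failed: true");
}
```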
(This is analogous to the way you can emulate sum types in OOP: instead of having a single function that acts on the sum A + B, you have two methods, one for handling A and one for handling B, that live in different classes.)
A function that returns a pair is equivalent to a pair of functions:
(B*C)^A = B^A * C^A
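A sketch with an illustrative min/max pair:

```rust
// (B*C)^A = B^A * C^A: a pair-returning function is a pair of functions
fn min_max(xs: &[i32]) -> (i32, i32) {
    (min(xs), max(xs))
}

fn min(xs: &[i32]) -> i32 {
    *xs.iter().min().unwrap()
}

fn max(xs: &[i32]) -> i32 {
    *xs.iter().max().unwrap()
}

fn main() {
    assert_eq!(min_max(&[3, 1, 4]), (1, 4));
}
```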
Again, we can find more rules by asking what happens if either the base or the exponent is a small number.
A function that just takes the unit value (or equivalently no arguments) might as well not be a function:
A^1 = A
Remember that exponentiation models pure, terminating functions. So this law is saying that a function that takes a unit argument can only be useful (beyond its single return value) if it has a side effect.
A function that takes in a “never” argument is equivalent to unit:
A^0 = 1
[UPDATE] I wasn’t really sure about this, but a friend clarified it. We’re classifying functions based on their behavior. How many functions are there from ! to A, counting functions as equal if they have the same behavior? Why, there’s just one. They’re all the same, because you can’t call such a function.
A function that doesn’t return anything is sometimes called a void function (because it returns “void” in C). Void functions can’t do anything unless they have a side effect:
1^A = 1
We’ve looked at arrays, but how about lists? Unfortunately, there’s a lot of confusing terminology around lists and their representations in various languages, so let me be clear. By “list”, I mean a finite but arbitrarily long (and resizable) sequence of elements of the same type. Rust’s most commonly used list type is called Vec<A>. It’s backed by an array that’s re-allocated as needed, but that’s an implementation detail.
How many possible values does a Vec<A> have? This is easiest to answer if you think of a Vec<A> like a linked list (it’s the same information content either way). It’s either empty, or it has an element and another list:
Vec<A> = 1 + A*Vec<A>
Let’s substitute in the right hand side of the equation… repeatedly:
Vec<A> = 1 + A*Vec<A>
= 1 + A*(1 + A*Vec<A>)
= 1 + A + A^2*Vec<A>
= 1 + A + A^2*(1 + A*Vec<A>)
= 1 + A + A^2 + A^3*Vec<A>
= ...
= 1 + A + A^2 + A^3 + ...
This is saying that a list of As can have 0 As (because it’s empty), or 1 A (because it has length 1), or 2 As (because it has length 2), etc.
It’s tedious to work with infinite polynomials like this, though, so let’s try to find a closed-form solution:
Vec<A> = 1 + A*Vec<A>
Vec<A> - A*Vec<A> = 1
(1 - A)*Vec<A> = 1
Vec<A> = 1/(1 - A)
There we go! A list of As has 1/(1-A) possible values.
“But Justin,” you say, “what does division mean? You haven’t shown any data type that it models.”
“And Justin,” you continue, “what if A is bool? Then 1/(1-A) = 1/(1-2) = -1. What are negative numbers supposed to be?”
Shhh child, do not fear. One need not understand the formula for it to be true.
“You don’t know either, do you?”
Fear leads to anger. Anger leads to hate. Hate leads to algebraically manipulating infinite polynomials because you weren’t willing to simplify them.
And the 1/(1-A) list formula does work.
Say we have a short list, with fewer than 3 elements. Equivalently, this is a list subject to the constraint that it does not have 3 or more elements. We can represent all lists with 3 or more elements as A^3 * Vec<A>: you store the first three elements, plus a list for the rest. Therefore our short list has cardinality:
Vec<A> - A^3 * Vec<A>
I.e., it’s a list (Vec<A>) subject to the constraint (-) that it does not have 3 or more elements (A^3 * Vec<A>).
Simplifying with algebra:
Vec<A> - A^3 * Vec<A>
= 1/(1 - A) - A^3/(1 - A)
= (1 - A^3) / (1 - A)
= (1 - A) * (1 + A + A^2) / (1 - A)
= 1 + A + A^2
In other words, it has length 0, 1, or 2. Not a startling insight, but it shows that the list formula works.
Here’s a more interesting example. The zip function takes two iterators and produces an iterator of pairs, by consuming from both iterators at once until one of them stops producing values. If we assume the iterators are finite (the methods in this blog post don’t work on infinite data structures), then they’re equivalent to lists, so we can use the 1/(1-A) list formula for them.
Since the zip function might not fully consume its input iterators, it pretty clearly throws some information away. Let’s figure out what is lost, exactly. zip(Vec<A>, Vec<B>) returns Vec<A*B>, but if it were lossless, it would have to return some additional information. Let’s call this extra information X. Then Vec<A>*Vec<B> = X*Vec<A*B>. Solving for X and simplifying:
X = Vec<A>*Vec<B> / Vec<A*B>
= [1/(1-A) * 1/(1-B)] / [1/(1-A*B)]
= [1/(1-A)(1-B)] / [1/(1-A*B)]
= (1-A*B) / (1-A)(1-B)
= 1 + A/(1-A) + B/(1-B) // this step's tricky
= 1 + A*Vec<A> + B*Vec<B>
So if you wanted zip to be a lossless function, then not only would it have to produce Vec<A*B>, it would also have to preserve the unused elements of its inputs:

- 1, in case its inputs are the same length so all elements have been paired up;
- A*Vec<A> (i.e., a non-empty list of As), in case its first input has unused elements; or
- B*Vec<B> (i.e., a non-empty list of Bs), in case its second input has unused elements.

Here’s another example of refactoring with algebra.
Say we have a list of alternating elements A and B, that could start with either A or B. We could represent that with this Rust type:
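A sketch, reconstructed from the names used in the discussion below (StartWithA, StartWithB, ListA, ListB):

```rust
enum Alternating<A, B> {
    StartWithA(ListA<A, B>),
    StartWithB(ListB<A, B>),
}

// A list that, if non-empty, starts with an A and then alternates
enum ListA<A, B> {
    Empty,
    Cons(A, Box<ListB<A, B>>),
}

// A list that, if non-empty, starts with a B and then alternates
enum ListB<A, B> {
    Empty,
    Cons(B, Box<ListA<A, B>>),
}

fn main() {
    // The alternating list [A, B]
    let list: Alternating<char, u8> = Alternating::StartWithA(ListA::Cons(
        'a',
        Box::new(ListB::Cons(1, Box::new(ListA::Empty))),
    ));
    if let Alternating::StartWithA(ListA::Cons(first, _)) = list {
        assert_eq!(first, 'a');
    }
}
```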
There’s one subtlety here. I missed it the first time I wrote this section, but the algebra made it visible.
These definitions give two ways to express an empty list: either an empty StartWithA or an empty StartWithB. To avoid having two ways to represent the same information, we should pick one representation, and declare the other one illegal. So let’s say that an empty list should be represented by Alternating::StartWithA(ListA::Empty), and that Alternating::StartWithB(ListB::Empty) is illegal.
Then the cardinalities are:
Alternating = ListA + ListB - 1 // minus 1 for the illegal state
ListA = 1 + A*ListB
ListB = 1 + B*ListA
Now to do some algebra, to find another representation! First, substitute ListB into the equation for ListA, so that we can solve for ListA:
ListA = 1 + A*ListB
ListA = 1 + A*(1 + B*ListA)
ListA = 1 + A + A*B*ListA
ListA(1 - A*B) = 1 + A
ListA = (1 + A)/(1 - A*B)
Then use that to solve for Alternating:
Alternating
= ListA + ListB - 1
= ListA + 1 + B*ListA - 1
= ListA + B*ListA
= (1 + B) * ListA
= (1 + B) * (1 + A) / (1 - A*B)
= (1 + B) * (1 + A) * 1/(1 - A*B)
Wow! What is this type? Remember that 1 + X means Option<X> and 1/(1 - X) means Vec<X>. Putting this together, and helpfully describing the fields, gives:
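Reading (1 + B) * (1 + A) * 1/(1 - A*B) off term by term (field names are my own):

```rust
struct Alternating<A, B> {
    leading_b: Option<B>,  // 1 + B: a B at the front, if the list starts with B
    trailing_a: Option<A>, // 1 + A: an unpaired A at the end, if any
    pairs: Vec<(A, B)>,    // 1/(1 - A*B): the alternating (A, B) pairs in between
}

fn main() {
    // The list [B, A, B]: a leading B, then the pair (A, B)
    let list: Alternating<char, u8> = Alternating {
        leading_b: Some(1),
        trailing_a: None,
        pairs: vec![('a', 2)],
    };
    assert_eq!(list.pairs.len(), 1);
}
```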
That’s a nicer representation, isn’t it? Three types became one, the linked lists became a Vec, and the illegal state vanished.
One last magic trick. A red-black tree is a kind of balanced binary tree. To stay balanced, it considers every node to be either red or black, and maintains these invariants:

- Every node is either red or black.
- Every leaf is black.
- A red node cannot have a red child.
- Every path from the root to a leaf crosses the same number of black nodes.
Say that the leaves have type X, and the non-leaf nodes have type Y. Then we can capture the first three invariants in Rust types:
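A sketch matching the equations below (N = R + B, R = y*B*B, B = x + y*N*N):

```rust
// Every node is red or black
enum Node<X, Y> {
    Red(RedNode<X, Y>),
    Black(BlackNode<X, Y>),
}

// R = y * B * B: a red node's children must both be black
struct RedNode<X, Y> {
    data: Y,
    left: Box<BlackNode<X, Y>>,
    right: Box<BlackNode<X, Y>>,
}

// B = x + y * N * N: a black node is a leaf, or has children of either color
enum BlackNode<X, Y> {
    Leaf(X),
    Branch(Y, Box<Node<X, Y>>, Box<Node<X, Y>>),
}

fn main() {
    let tree: Node<u8, u8> = Node::Black(BlackNode::Branch(
        1,
        Box::new(Node::Black(BlackNode::Leaf(0))),
        Box::new(Node::Black(BlackNode::Leaf(2))),
    ));
    if let Node::Black(BlackNode::Branch(data, _, _)) = tree {
        assert_eq!(data, 1);
    }
}
```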
We can use algebra to find an alternate form for red-black trees.
First, translate the Rust types to algebraic equations:
// N: Node
// R: RedNode
// B: BlackNode
N = R + B
B = x + y*N*N
= x + y*(R + B)*(R + B)
R = y*B*B
But remember that this didn’t capture the last invariant, that “every path from the root to a leaf crosses the same number of black nodes”. In order to capture it, we’ll need to index the types R and B by the number of black nodes crossed. So I’ll write Bn for “the type of a black node, such that every path from it to its leaves crosses n black nodes (including itself)”. Using this, we can get the full equations for a red-black tree:
B1 = x // n=1 -> must be a leaf
B{n+1} = y*(Bn + Rn)*(Bn + Rn)
Rn = y*Bn*Bn
Now that the equations are complete, we can simplify them. With a little algebra we can substitute away Rn, so that there’s only one remaining type:
B1 = x // this stays the same
B{n+1} = y * (Bn + Rn) * (Bn + Rn)
= y * (Bn + y*Bn*Bn) * (Bn + y*Bn*Bn)
= y * Bn * (1 + y*Bn) * Bn * (1 + y*Bn)
= y * Bn * Bn * (1 + y*Bn) * (1 + y*Bn)
We can translate this back into Rust. There’s no good way to keep the n parameter, so we’ll erase it but remember that, per the last equation, the n value of a child is always 1 less than its parent’s. Thus the tree now has constant depth (given by n). So erasing n we have:
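A sketch reading off B = x + y * B * B * (1 + y*B) * (1 + y*B), term by term (field names are my own):

```rust
enum Node<X, Y> {
    Leaf(X), // x
    Branch {
        data: Y,                                   // y
        left: Box<Node<X, Y>>,                     // B
        right: Box<Node<X, Y>>,                    // B
        left_extra: Option<(Y, Box<Node<X, Y>>)>,  // 1 + y*B
        right_extra: Option<(Y, Box<Node<X, Y>>)>, // 1 + y*B
    },
}

fn main() {
    // The minimal branch: 1 value and 2 children (up to 3 values and 4 children)
    let tree: Node<u8, u8> = Node::Branch {
        data: 1,
        left: Box::new(Node::Leaf(0)),
        right: Box::new(Node::Leaf(2)),
        left_extra: None,
        right_extra: None,
    };
    if let Node::Branch { data, .. } = tree {
        assert_eq!(data, 1);
    }
}
```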
So we’ve found another representation of red-black trees!
Is it useful? It at least warranted a prominent mention on the Wikipedia page for red-black trees. This section starts out:
A red–black tree is similar in structure to a B-tree of order 4, where each node can contain between 1 and 3 values and (accordingly) between 2 and 4 child pointers.
By “values”, it means “data in nodes”, i.e. our Ys. And indeed the Branch variant has 1-3 Ys and (accordingly) 2-4 Node<X, Y>s.
So that’s how you can use algebra to find another representation of a data structure, even one with complicated invariants like a red-black tree.
I don’t know about you, but I was excited to discover just how extensively you can model data types with algebra. This post covered a lot, so here’s a cheat sheet to remember all the algebraic laws and what they say about types:
To read more in this vein, you might want to check out:
- Curious why treating 1/(1-A) as the cardinality of a list wasn’t total nonsense? That’s called a generating function, and you learn to wield those in a class on combinatorics. UPDATE: book recommendation: Concrete Mathematics.