Traffic Engineering with Portals, Part II

In part I, I introduced the question of how best to arrange of network of portals for efficient travel, and proposed a (to my mind) satisfying bit shifting solution.

(This post is going to go faster with less detailed explanation than the last. If parts are confusing, try working the math out for yourself!)

Slightly more efficient portal network

But you can do better than bit-shifting! Alex Elsayed points out that my “bit shifting” approach has been studied, and is called De-Bruijn graphs. And further, there’s a better approach, called Kautz graphs! For portal networks with out-degree 2, Kautz graphs let you have 50% more nodes, or alternatively reduce the route length by about 1/2.

To repeat from last time, my bit-shifting solution—a.k.a. De-Bruijn graphs—was to label each hub with a 20-bit address, and have two outgoing portals from each hub, labeled 0 and 1. Exiting through the 0 portal shifts the bits in the address left and adds a zero at the end:

a1 a2 ... a19 a20   ->   a2 a3 ... a19 a20 0

a1 a2 ... a19 a20   ->   a2 a3 ... a19 a20 1

If you follow the sequence of portals that spells out your destination address in binary, you’ll get there in 20 hops. And this is exactly what De-Bruijn graphs are.

Kautz graphs are similar, with one modification: adjacent digits of the address are required to be different. This makes a binary addresses pretty useless (it would have to alternate 010101…!), so let’s assume a 20-digit ternary address instead (larger bases also work). How many such addresses are there? There are three possibilities for the first digit, and two possibilities for each subsequent digit (since it can’t be the same as the previous digit). Thus a 20-digit ternary Kautz address has 3*2^19 possibilities.

And how about portals and routes? There are two exit portals from each hub (there would be three, but one of the digits would make an illegal address so you can skip that one). Typically, if you want to get from address abcd to address WXYZ (assuming that addresses are length 4 to keep this example short) you can proceed as with a DeBruijn network:

abcd
portal_W: bcdW
portal_X: cdWX
portal_Y: dWXY
portal_Z: WXYZ

But if W is the same digit as d, then this route would go through illegal addresses! So in that case, to get from abcd to WXYZ you have to take a different route, one that is one hop shorter:

abcd
portal_X: bcdX
portal_Y: cdXY
portal_Z: dXYZ

There’s a 1/3 chance that W is the same digit as d, so the average route length for 20 digit addresses is 19 2/3.

Putting this all together, we can compare bit-shifting, a.k.a. De-Bruijn, with Kautz:

Method   | Num hubs | Deg | Avg. route len
---------+----------+-----+---------------
DeBruijn | 2^20     | 2   | 20
Kautz    | 3*2^19   | 2   | 19.67

(“degree” means “number of exit portals per hub”.) So the Kautz approach connects more hubs while having slightly shorter routes.

Much more efficient portal network

But you can do even better than that! My friend David Meierfrankenfeld designed a portal network that’s better than I thought was possible.

It works like this. We’ll split the hubs into two categories: destination hubs and transit hubs, and arrange the network like this:

• There are 8^6 = 2^18 transit hubs. These transit hubs are connected to each other via my “bit shifting” (a.k.a. DeBruijn) arrangement but in octal instead of binary. Thus you can get from any transit hub to any other in 6 hops.
• Each transit hub is “assigned” 8 destination hubs. Thus there are 8^6*8 = 2^21 destination hubs, and 8/9 of all hubs are destination hubs. The eight destination hubs assigned to a transit hub h are connected in two strings of four, like this: h->1->2->3->4->h and h->5->6->7->8->h. Thus each transit hub has degree ten: 8 portals to other transit hubs, and 2 to destination hubs. And each destination hub has degree 1.
• An address takes the form:

T[6 digit octal code][a or b repeated 0 to 4 times]

For example, 221357aa is a valid address.

To navigate from one destination hub to another, you walk along the string of destination hubs until you get to the transit hub, then use the octal address to get to the target transit hub, then walk along the appropriate string of destination hubs according to the (a or b repeated 0 to 4 times) portion of the address.

This portal network is surprisingly efficient:

• The average degree is 2. This is because 1/9 of the hubs have degree 10 and 8/9 have degree 1, so the average degree is 1/9 * 10 + 8/9 * 1 = 18/9 = 2.
• The average route length is 11. To prove this, first notice that the route between two transit hubs is always 6. And the average extra route length following the a/b portion of an address is 5 (2.5 going out plus 2.5 going in), so the average route length is 6 + 5 = 11. (One nice property is that the round trip route length is always exactly 22.)

This is a good deal more efficient than bit shifting! The thing to compare it to is bit shifting (i.e. a De Bruijn graph) using a binary address of length 21:

Method           | Num hubs   | Avg. deg | Avg. route len
-----------------+------------+----------+---------------
DeBruijn         |       2^21 | 2        | 21
Meierfrankenfeld | (9/8)*2^21 | 2        | 11

That nearly halves the average route length!

Question for the reader: in the last post, I proved that for 2^20 hubs and degree 2, you couldn’t have route lengths smaller than 2^19. Which is very wrong! Meierfrankenfeld’s approach has routes of length 11 for even more hubs than that. What assumption(s) was the proof making that are invalidated in Meierfrankenfeld’s approach?

This raises another question. There must be some upper bound on how well you could do. What is it?

I have a hypothesis, which I will humbly call the:

Fundamental Theorem of Portal Traffic Engineering

Suppose we have a portal network (i.e. graph), with N hubs (i.e. nodes) and exactly one designated route from any hub to any other hub. That is, while there may be more than one path from one hub to another, exactly one of these paths is a route.

Furthermore, define:

• len is a random variable giving the length of a route chosen uniformly at random.
• H[len] is the entropy of len. For example, if half the routes are one length and half another, then H[len] is log 2, i.e. 1 bit. Let’s use base 2 for the logs.
• Avg[len] is the average route length.
• flow(h) is the fraction of all traffic that proceeds through an exit portal at hub h. In other words, if you snip each route into individual hops from one portal to another, it’s the fraction of all hops (among all routes) that start at h. So Sum_h flow(h) = 1 by definition.
• deg(h) is the degree of hub h, i.e. the number of exit portals at h.
• N is the number of hubs.

Then:

H[len] + Avg[len] * Sum_h (flow(h) * log deg(h)) >= log N

Call the left hand side of this inequality E, because it’s an upper bound of the entropy of the routes.

Proof Sketch

Place a person at a source hub selected uniformly at random. We’re going to have them follow a route at random, and compute (an upper bound of) the entropy of the decisions they make along the way. This entropy must be at least the log of the number of destinations (log N), or else there would be fewer routes than destinations. So:

H[travel decisions] >= log N

We can ask them to first decide on a route length L (from among the valid route lengths starting from their source hub), and then make L decisions of which exit portal to take. If we assume that several things are independent—the source hub, the route length, and the exit portal choices—then we have:

H[len] + Avg[len] * Avg[entropy of which portal to take] >= log N

Why can we assume that all of these things are independent? They very well might not be. But entropy is maximized if they are, and we’re computing an upper bound, so we can conservatively assume that they are independent.

That last inequality isn’t quite right, though. The Avg[entropy of which portal to take] part is assuming that the person is equally likely to traverse any hub. But they’re more likely to traverse some hubs than others. So we should take a weighted average: log deg(h) weighted by flow(h) (the chance that a hop is leaving h, of all places). As before, we’re conservatively assuming that several things are independent. This gives:

H[len] + Avg[len] * Sum_h (flow(h) * log deg(h)) >= log N

Which is the theorem.

Place a person who knows the navigation rules for the portal network at a source hub selected uniformly at random. Hand them a uniformly random destination address, and tell them to travel there. The entropy of the decisions they make while traveling must be, in expectation, at least as large as the entropy of the destination address. Otherwise there would be more destinations than routes. The entropy of the destination address is log N, so:

H[travel decisions] >= log N

The decisions they make can be split into the decision of how long a route to take, plus the sequence of decisions about what exit portal to take at each step of the route. The entropy of the route length is simply H[len]. And while traveling the route, they must make a decision of which exit portal to go through at each hub. So:

H[len] + Avg[len] * Avg[entropy of which portal to take] >= log N

(Notice that I took the average of the path length and entropy. I think this works out due to the linearity of expectations.)

What’s the entropy of the decision of which portal to take at a hub h? There are deg(h) exit portals, so it’s at most log deg(h):

H[len] + Avg[len] * Avg[log deg(h)] >= log N

But wait! This is assuming that we’re equally likely to be at any hub. But we’re more likely to be at some hubs than others. So we need to take a weighted average: log deg(h) weighted by flow(h) (the chance that we’re at h, of all places). This gives:

H[len] + Avg[len] * Sum_h (flow(h) * log deg(h)) >= log N

Which is the theorem.

What it Means

What is this saying?

We want to minimize Avg[len] and flow(h) and deg(h). But they all contribute positively to the left hand side of the inequality, which must be at least log N, so we can’t minimize them all at once! Instead, there’s a tradeoff. I would phrase it as a tradeoff between:

• The average route length
• The average degree
• How much traffic goes through higher-degree nodes

I’ll leave off with a table of entropy stats for various portal networks:

Method           | hubs |H[len]|Avg[len]| flow |Avg[deg]| E    |E-log N
-----------------+------+------+--------+----- +--------+------+-------
ring             | 2^20 | 20   | 2^19   | 1    | 1      | 20   | 0
star             | 2^20 | 0    | 2      | 1    | 2^20   | ~40  | 20
bit flipping     | 2^20 | 3.2  | 10     | 1    | 20     | ~46  | 26
DeBruijn base 2  | 2^20 | 0    | 20     | 1    | 2      | 20   | 0
DeBruijn base 16 | 2^20 | 0    | 5      | 1    | 16     | 20   | 0
Meierfrankenfeld |~2^21 | 2.8  | 11     | 0.55 | 2      | 22.7 | 1.7
Kautz graph      |3*2^19| 1    | 19.67  | 1    | 2      | 20.5 | 0.08

All these portal networks have a particular form that simplifies the formula: some of their hubs have degree 1 and thus log deg(h) 0 and can be ignored, and the rest have a fixed degree d. This allows the formula to be simplified to:

H[len] + Avg[len] * flow * log d >= log N

where flow is the fraction of traffic that flows through hubs with degree d.

You can see from this table what kind of tradeoffs different portal networks make:

• A ring portal network decreases average degree at the expense of increasing average route length.
• A DeBruijn portal network maximizes traffic uniformity, putting all traffic through hubs of the same degree.
• A Kautz portal network is similar, but increases the H[len] term to decrease the average route length a bit.
• A Meierfrankenfeld portal network routes more traffic through higher degree nodes, in exchange for greatly decreasing route length.
• Star and bit-flipping are just bad.

The last column shows how “tight” E is to what the theorem says is possible (0 means it couldn’t possibly be any smaller). The fact that many of these different approaches are so close to the boundary of what the theorem says is possible, gives me some confidence that it might be correct.

October 17, 2021 