This is archived content, mostly untouched since 2003. For newer content/updated versions, see netfuture.ch.


The Fastest Lookups in the West?

By George Varghese in 1998; minimal (link, typo) updates by Marcel Waldvogel in 2003.

With everyone and everything (from babysitters' cooperatives to your favorite drink) building web sites, Internet usage has been expanding at a rate more commonly associated with nuclear reactions. Internet traffic has been increasing not only because of the growing number of users but also because of the demand for bandwidth-intensive data. While email contributes only small change, big-ticket items such as video and images can easily require megabytes of data to be transferred. If one thinks of the Internet as a highway system, increased traffic demand can lead to congestion and unhappy users --- unless the Internet speeds up to meet this increased demand.

Messages on the Internet are ferried by a system of automated post offices, called routers, that are interconnected by communication links. When you send a message it goes across a link to a router. The router looks at the destination address in your message and decides which of its outgoing links it will switch the message to. This process repeats until the message reaches its destination.

To keep up with increased traffic, the speed of links in the Internet core has been increased from 45 megabits per second (a megabit is one million bits) into the gigabit (1,000 million bits per second) range. To avoid bottlenecks, however, the routers must also keep pace. Today's fastest routers forward messages at a maximum rate of 100,000 to 500,000 messages a second. To keep up with communication link speeds in the gigabit range, a router has to forward five million messages a second. At these speeds, routers must be able to look up the route associated with a message in a few hundred nanoseconds.

Given this need and the popularity of the Internet, a shootout is taking place in the Wild West of Internet country, where established network vendors and a flurry of startups are all vying to provide the fastest Internet message forwarding rates. Among the Wild Bill Hickoks and Wyatt Earps are computer scientists at Washington University in St. Louis, who entered this shooting match with two major inventions for Internet address lookup, for which they filed patents early this year.

Internet address lookup would be simple if we could look up an IP destination address in a table that lists the output link for each Internet address in the world. In this case, lookup could be done the way we do dictionary lookup (for which computer scientists have well-known procedures that they give barbaric names such as hashing). The only catch is that a router would have to keep millions of entries in its database, corresponding to the millions of computers on the Internet. To reduce database size, a router database consists of a smaller set of prefixes. This reduces database size, but at the cost of requiring a more complex lookup called longest matching prefix.
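To make the contrast concrete, here is a minimal Python sketch of the simple-but-infeasible approach (the addresses and link names are made up for illustration). A hash table, a Python dictionary here, answers in a single exact-match probe:

    # Hypothetical full table: one entry per complete 32-bit address.
    # With millions of hosts, such a table would be far too large for a router.
    forwarding_table = {
        "11000000101010000000000100000111": "link-1",  # made-up address
        "00001010000000000000000000000001": "link-2",  # made-up address
    }

    def lookup(address):
        # One hash-table probe: fast, but only works if every address is listed.
        return forwarding_table.get(address)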

A metaphor can explain why prefixes are so popular. Consider a flight database in London. We could list the flights to a thousand U.S. cities in our database. However, suppose most flights to the U.S. hub through Boston, except flights to California, which hub through LA. We can then reduce the flight database from a thousand entries to two prefix entries (USA* --> Boston; USA.CA.* --> LA). We use '*' to denote a wildcard that can match any number of characters. The flip side is that a destination city like USA.CA.Fresno will now match both the USA* and USA.CA.* prefixes; we must return the longest match (USA.CA.*).
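In code, the longest-match rule from the flight metaphor might look like this Python sketch (the two entries are the article's; the matching loop is an illustration of mine):

    # Prefix table from the metaphor: USA* -> Boston, USA.CA.* -> LA.
    prefixes = {"USA": "Boston", "USA.CA": "LA"}

    def longest_match(city):
        best = None
        for prefix in prefixes:
            # A prefix matches if the city name starts with it;
            # among all matching prefixes, keep the longest.
            if city.startswith(prefix) and (best is None or len(prefix) > len(best)):
                best = prefix
        return prefixes[best] if best else None

    print(longest_match("USA.CA.Fresno"))  # both prefixes match; longest wins -> LA
    print(longest_match("USA.MO.SL"))      # only USA* matches -> Boston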

The Internet address lookup problem is similar, except that you can forget your quirky e-mail address of letters and numbers. The Internet works in a binary world: each destination address is a string of 32 characters (called bits), each of which can only be a 1 or a 0. Thus routers in the Internet core today hold around 40,000 prefix entries instead of possibly millions of individual Internet addresses. A naive lookup solution would compare the address in each packet to each of these 40,000 prefixes to find the longest match, which would be much too slow.

The best existing ways of finding prefix matches essentially boil down to breaking up the large database into sub-databases of prefixes with the same length. Thus all prefixes of the form USA.MO.* and USA.CA.* are stored in a length 5 database, while a prefix of the form USA.* would be stored in a length 3 database. The search then proceeds by extracting the appropriate number of characters of the destination address (e.g., 5 for the length 5 database) and trying for a match in each sub-database. Each lookup in a sub-database is a simpler dictionary lookup that can be done in one access to memory. Unfortunately, since there are 32 possible prefix lengths and each search in a sub-database takes one access to slow memory that runs at 100 nsec, these schemes are too slow to work with the next generation of routers.
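A Python sketch of this classical per-length approach, with a made-up three-length database of binary prefixes, might look as follows; each probe of a sub-database stands for one access to slow memory:

    # One dictionary ("sub-database") per prefix length; entries are made up.
    tables = {
        1: {"1": "link-A"},
        3: {"101": "link-B"},
        5: {"10111": "link-C"},
    }

    def longest_prefix_match(address):
        best = None
        for length in sorted(tables):           # one memory access per length...
            hop = tables[length].get(address[:length])
            if hop is not None:
                best = hop                      # ...keeping the longest match seen
        return best

    # With 32 real lengths at 100 nsec per access, a lookup could take over
    # 3 microseconds: far from the few hundred nanoseconds routers need.
    print(longest_prefix_match("10111010"))     # -> link-C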

The Wash U researchers have added two new techniques to this twenty-year-old problem. The first scheme (due to Professor George Varghese and graduate student V. Srinivasan) recognizes that the existing schemes will work faster if they can search through databases that have a smaller number of distinct prefix lengths. Using a technique with a fancy name, "controlled prefix expansion," their scheme takes an existing database with 32 possible prefix lengths and transforms it like the Genie in Aladdin. The old database is transformed into a new database with a much smaller number of prefix lengths. We can then run the existing schemes faster on the new database.

So what's the trick? Let's start by supposing we wake up one morning with an aversion to odd prefix lengths. If we can get rid of all such lengths (1, 3, 5, ..., 31), we are left with only 16 possible prefix lengths, a factor of two reduction. But what do we do with a prefix like 101* (the asterisk means any arbitrary combination of bits following 101) of length 3 that we now dislike? 101* is really equivalent to everything that begins with 101, and so it is also equivalent to everything that begins with 1010* or 1011* (because the next bit can be either 0 or 1). Thus we replace 101* (of length 3) with two prefixes of length 4 (1010* and 1011*). It's possible that one of these expanded prefixes collides with an existing prefix at the new length. But that's easy to handle: we throw out the expanded prefix and let the existing prefix take priority.
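Here is a small Python sketch of one expansion step under the same made-up binary encoding: every prefix of the unwanted length is rewritten as its two one-bit-longer equivalents, and an expanded prefix that collides with an existing one is simply dropped:

    def expand_length(table, length):
        # table maps binary-string prefixes to next hops; rewrite every
        # prefix of the given length into two prefixes one bit longer.
        expanded = dict(table)
        for prefix, hop in table.items():
            if len(prefix) == length:
                del expanded[prefix]
                for bit in "01":
                    # setdefault keeps the existing (more specific) entry
                    # if the expanded prefix collides with it.
                    expanded.setdefault(prefix + bit, hop)
        return expanded

    table = {"101": "link-B", "1011": "link-C"}
    print(expand_length(table, 3))  # {'1011': 'link-C', '1010': 'link-B'}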

While the example only shows how we get rid of odd lengths, it can easily be applied to getting rid of any lengths we dislike. If we think of prefixes as eggs and prefix lengths as baskets, we are essentially increasing the number of eggs but putting these eggs in fewer baskets. Since speed depends on the number of baskets, we are trading memory (more eggs) for speed (fewer baskets). While this is a simple trick, we use mathematical optimization techniques to pick which lengths to expand to, reducing the memory requirements while retaining the speed improvement.

The second idea (due to Professors Jon Turner, George Varghese, and visiting graduate student Marcel Waldvogel) uses a completely different approach. Instead of searching for prefixes one length at a time, requiring up to 32 attempts, the new scheme divides the problem in half at each stage. Instead of asking whether the longest prefix is at length 1, length 2, and so on (the naive way), the new scheme is a little like playing twenty questions efficiently.

Suppose we could start by asking the question "Is the longest prefix length in the range 16..32?" If the answer is no, we know the answer must be in the range 0..16, so we can cut down the range further by asking whether the longest prefix length is in the range 8..16. If the answer is yes, the answer must be in the range 16..32, so our next question asks whether the longest match is in the range 24..32. If we repeat this process, we will find the correct prefix length in a very small number of steps. More precisely, for 32 possible prefix lengths, we need only 5 questions, because 2^5 = 32.
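The question-asking skeleton fits in a few lines of Python. In this sketch, found(length) stands in for probing the sub-database of that length; the signposts described below are what make such probes trustworthy:

    def best_length(found, low=1, high=32):
        best = None
        while low <= high:
            mid = (low + high) // 2
            if found(mid):          # "is there a match at the middle length?"
                best = mid          # yes: a longer match may still exist
                low = mid + 1
            else:
                high = mid - 1      # no: only shorter lengths remain
        return best

    # Toy oracle: matches exist at every length up to 21 (the property the
    # signposts guarantee along the search path).
    print(best_length(lambda n: n <= 21))  # -> 21, after only 5 probes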

The process of playing twenty questions with prefixes, however, is trickier than it may appear. It is necessary to add signposts or markers to guide the search to longer lengths. Consider an example. For clarity, we go back to our original metaphor of prefixes in terms of countries, states, and cities. Once again, we have broken up the database by prefix length. We have four real prefixes: USA.* (which is stored in the Length 3 database); USA.CA.* (which is stored in the Length 5 database); and USA.MO.SL (for St. Louis) and USA.CA.LA (which are stored in the Length 7 database). The naive search would require searching through 3 databases. We would like to halve this (a small reduction in our example, but more significant with 32 possible lengths) to searching in two databases.

Initially, the longest prefix could reside at length 3, 5, or 7. Suppose we start with a search in the Length 5 database, looking for USA.CA.LA. We take the first 5 characters and get a match with USA.CA, so we can immediately discard the Length 3 database (we have already found a longer match), but we certainly need to search the Length 7 database, as it may contain a longer match still. This gives us a simple rule. We start by looking for a match in the database corresponding to the middle length in our current set. If we get a match, we try lengths greater than the middle; if we do not, we try lengths smaller than the middle. This halves the set of lengths we must examine at each step.

To make this work, we must introduce the idea of signpost entries that lead the search on to longer lengths. Consider a search for USA.MO.SL in our example. Without a signpost, we would not get a match in the Length 5 database, so the search would fall back to the Length 3 database and settle for USA.*, missing the longer match USA.MO.SL. Thus we add a fictitious signpost entry corresponding to the first 5 characters of USA.MO.SL (i.e., USA.MO.*) that ensures the search will proceed to the Length 7 database. By adding signposts only at crucial search points, it is possible to minimize the memory needed for signposts.

However, signposts can also send the search off on a wild goose chase. Suppose that, in the same example database, we search for the destination address USA.MO.KC (for Kansas City). Since we have a signpost in the Length 5 database, we get a match and go on to the Length 7 database. But there we don't get a match, since we only have USA.MO.SL. The problem is that the signpost USA.MO.*, which was added for USA.MO.SL, has led us down a false trail when searching for USA.MO.KC: the signpost USA.MO.* is not a real prefix, and the real longest prefix of USA.MO.KC is USA.* in the Length 3 database.

The simple idea used to avoid backtracking after following a signpost is as follows: we store, together with each signpost, the longest real prefix that matches the signpost. When we follow a signpost, we carry this stored answer along; if the search then fails at the longer lengths, we can simply fall back to the stored answer. Despite these conceptual intricacies, the final procedure is simple and fast. This scheme has been described in a paper presented at the ACM SIGCOMM networking conference in Cannes, France. More details can be found elsewhere on this site.
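Putting the pieces together, here is a simplified Python sketch of the whole scheme, again with made-up binary prefixes (addresses shortened for readability). For brevity it plants a signpost at every shorter stored length rather than only at the lengths the binary search can actually probe, and each signpost carries the next hop of its own longest real matching prefix, so a dead end never forces backtracking:

    def build(real_prefixes):
        # real_prefixes: dict mapping binary-string prefix -> next hop.
        lengths = sorted({len(p) for p in real_prefixes})

        def best_real_hop(bits):                # naive match, build time only
            match = None
            for p in real_prefixes:
                if bits.startswith(p) and (match is None or len(p) > len(match)):
                    match = p
            return real_prefixes.get(match)

        tables = {length: {} for length in lengths}
        for p, hop in real_prefixes.items():
            tables[len(p)][p] = hop             # the real prefix itself
            for length in lengths:
                if length < len(p):             # signpost with its stored answer
                    tables[length].setdefault(p[:length], best_real_hop(p[:length]))
        return lengths, tables

    def lookup(address, lengths, tables):
        best, lo, hi = None, 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            entry = tables[lengths[mid]].get(address[:lengths[mid]])
            if entry is not None:
                best = entry                    # remember the stored answer
                lo = mid + 1                    # a longer match may exist
            else:
                hi = mid - 1                    # dead end: fall back to best
        return best

    # Analogue of the USA example: the signpost "10110" plays the role of
    # USA.MO.*; the second lookup hits it, dead-ends at length 7, and still
    # returns the right answer (hopA) without backtracking.
    lengths, tables = build({"101": "hopA", "10111": "hopB", "1011010": "hopD"})
    print(lookup("1011010", lengths, tables))   # real length-7 match -> hopD
    print(lookup("1011011", lengths, tables))   # signpost dead end -> hopA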

Almost every major router vendor has signed nondisclosure agreements with Washington University to examine the Washington University techniques. Two companies have already nearly finalized license agreements; a third is working on the legal details of the agreement, and a fourth has tentatively implemented one of the schemes and is waiting for a final design review.

On the horizon, Internet routing protocols will soon have to look up 128-bit addresses instead of 32-bit ones, as the Internet retools to serve a global community of users and user devices. In the future, your toaster or your Nike shoes could have an Internet address. Some vendors like the twenty questions scheme because it scales well to the next-generation 128-bit addresses, using only 7 questions (2^7 = 128) to determine an answer. Other vendors like the eggs-in-fewer-baskets scheme because it is simple and fast for the existing Internet.

The Washington University licensing policy has been to price the schemes fairly cheaply (in the low hundreds of thousands of dollars) so that licensing does not become an obstacle for companies wanting to use the technology rather than working around the patents. The policy also allows startups to pay in installments and will potentially allow free licensing for research and other non-commercial use. The license includes access to software packages written by graduate student V. Srinivasan. These packages have been optimized to run approximately ten times faster than existing publicly available routing lookup software when run on a Pentium PC.

So who's winning the big Internet shootout? One will have to wait a few years for the smoke to clear. Inventors of lookup schemes can be thought of as being in the business of designing faster guns. Besides lookups, there is also the problem of implementing router firewalls, which are sets of security checks ensuring that only authorized messages can enter an organization's network. Washington University researchers are firing away at these new problems, and letting the market take care of itself.