This is archived content, mostly untouched since 2003. For newer content/updated versions, see netfuture.ch.

Memory Usage and Layout: Optimizing for Backing Store

Even as memory continuously gets cheaper and computers ship with ever larger memory capacities, the available memory never seems to be enough. Virtual memory (i.e., paging to disk) may help somewhat, but often performance degrades beyond human tolerance. In our investigations, we found dynamic data structures to be one of the main causes of slow-down. Frequently, structures that logically belong together and are accessed together are spread over many virtual-memory pages. Each access to such a data structure then requires many pages to be resident or newly activated (paged in from disk, evicting other pages currently in core). Each of these pages occupies a significant amount of memory even though only a few bytes of it may actually be used: a massive waste of RAM.

We concentrated mostly on one particular data structure: hash tables, or associative arrays, a common feature of modern programming languages, especially scripting languages. For example, all arrays in GNU awk (gawk) are associative. Therefore, even when a programmer tries to access an array in a linear fashion, the accesses are in fact random. Furthermore, both the key and the value of a hash table entry are generic gawk variables. This means that each variable is in fact only a container for some control information plus a pointer to the actual string. Together with the deep hash chains (linked lists) used by gawk, many additional pages may be referenced, heavily increasing memory pressure and disk I/O. Even though we discuss gawk here, a similar scheme is used by many other languages supporting hash tables.

While one option would be to change the data structures to colocate the string buffer with the control information, this may not be feasible: changing the variable layout may

  • break reference-counting schemes used for garbage collection, or
  • cause increased overhead, as modifications may then require data copying (or waste more memory, a bad choice under tight memory conditions).

In addition, such changes would often require major modifications throughout the code base or incur significant performance penalties during "normal" operation, when memory is not tight. Sane developers will try to avoid these impacts like the plague.

Instead, we proposed the following changes, which were also implemented:

  1. Reduce the average depth of hash chains.
    This was implemented in gawk with just a single-line change and resulted in significant performance gains under tight memory, at a negligible increase in memory use (with shallower chains, more hash buckets remain empty, but each empty bucket wastes only a single pointer, small compared to the other overheads involved).
  2. Enable locality-aware memory allocation.
    This was implemented in GNU malloc and is available to application programmers. Whenever applications are aware that multiple memory blocks will generally be used together, they can give appropriate hints to the memory management subsystems. The big advantage of this method is its ability to be implemented incrementally. All memory blocks, whether allocated traditionally or with known affinity, can be managed the same. Functions such as free() and realloc() continue to work as usual on both kinds of blocks. Compare this to a change in data structure (e.g., merging two data types into one), which would require every access to them to be inspected and changed, anywhere in the code. The new functions we proposed were named palloc(), which tries to allocate a new block in the same page as another data item, and mmmalloc(), which allows for the allocation of an entire list of blocks, which will be laid out contiguously. The latter function is now available under the names of independent_comalloc() and independent_calloc() in recent versions of the GNU C library (glibc).

Related Publications

MMCN 2003 Efficient Buffer Management for Scalable Media-on-Demand

Participants