For a long time already, the view that random memory access is uniformly fast has been too simplistic. Modern processors employ a complicated hierarchy of caches to hide memory latency issues, due to the gap between processor speed and memory speed. As a result, memory access patterns of an algorithm influence its run-time to a large degree. Tuning an algorithm for maximum performance across different machines, and even for a single machine, has turned out to be challenging due to the large number of parameters which influence how data is moved between caches.
Cache-oblivious algorithms and data structures have been proposed as a solution. Their goal is to eliminate hardware parameters like cache size and cache-line length from the picture by moving data between different levels of the storage hierarchy in an optimal way. However, despite an increasing number of papers on the subject, little implementations are available.
The aim of this project is to close this gap by implementing well-known cache-oblivious (and cache-aware) data structures, and assess their behavior in real-world situations.
The idea is to extend previous work and continue with questions which have been identified in past experiments.