About Oak: Efficient ordered in-memory key-value (KV) maps are paramount for the scalability of modern data platforms. In managed languages like Java, KV-maps face unique challenges due to the high overhead of garbage collection (GC). We present Oak, a scalable concurrent KV-map for environments with managed memory. Oak offloads data from the managed heap, thereby reducing GC overheads and improving memory utilization. An important consideration in this context is the programming model, since a standard object-based API entails moving data between the on- and off-heap spaces. To avoid the cost associated with such movement, we introduce a novel zero-copy (ZC) API. It provides atomic get, put, remove, and various conditional put operations such as compute (in-situ update). We have released an open-source Java version of Oak. We further present a prototype Oak-based implementation of the internal multidimensional index in Apache Druid. Our experiments show that Oak is often 2x faster than Java's state-of-the-art concurrent skip-list.
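To make the zero-copy idea concrete, here is a minimal sketch of a map whose values live in off-heap buffers and are read through views rather than copied into on-heap objects. It is an illustration only and is not Oak's actual API or implementation: the class name OffHeapMapSketch and the methods getView and computeIfPresent are hypothetical, keys stay on-heap for brevity, and the sketch reuses Java's ConcurrentSkipListMap as the index. Readers of a view are not synchronized with updaters here, so the sketch does not offer the atomicity guarantees described above.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;

// Hypothetical sketch; NOT Oak's actual API or implementation.
public class OffHeapMapSketch {
    // Keys stay on-heap for simplicity; values live in direct (off-heap) buffers.
    private final ConcurrentSkipListMap<String, ByteBuffer> index =
            new ConcurrentSkipListMap<>();

    // Writes the value off-heap once; a fixed 8-byte long keeps the sketch short.
    public void put(String key, long value) {
        ByteBuffer buf = ByteBuffer.allocateDirect(Long.BYTES);
        buf.putLong(0, value);
        index.put(key, buf);
    }

    // Zero-copy style read: a read-only view over the same off-heap bytes.
    public ByteBuffer getView(String key) {
        ByteBuffer buf = index.get(key);
        return buf == null ? null : buf.asReadOnlyBuffer();
    }

    // In-situ update: mutate the stored bytes in place under a per-value lock.
    public boolean computeIfPresent(String key, Consumer<ByteBuffer> updater) {
        ByteBuffer buf = index.get(key);
        if (buf == null) {
            return false;
        }
        synchronized (buf) {
            updater.accept(buf);
        }
        return true;
    }

    public boolean remove(String key) {
        return index.remove(key) != null;
    }

    public static void main(String[] args) {
        OffHeapMapSketch map = new OffHeapMapSketch();
        map.put("a", 1L);
        ByteBuffer view = map.getView("a");
        // Update in place; the existing view observes the new bytes without a copy.
        map.computeIfPresent("a", b -> b.putLong(0, b.getLong(0) + 41));
        System.out.println(view.getLong(0)); // prints 42
    }
}
```

The view returned by getView shares memory with the stored value, which is why the in-place update is visible through it without any data moving back onto the managed heap.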
About the Talk: The Oak paper and this talk were part of PPoPP '20, one of the top conferences in parallel programming. Oak combines practical and academic novelties, including (1) techniques for working with off-heap memory, (2) a new zero-copy API, (3) concurrency techniques for maps and off-heap accesses, (4) scalability and performance, and (5) backward and forward scanning. The talk may also cover further work done on Oak since PPoPP '20. It provides enough background to be accessible to anyone with some computer science background.
Program committee comment
The paper was presented at PPoPP '20 and impressed the Program Committee. The talk covers applying concurrency techniques to off-heap memory and the problems that arise.