SMALL INDEX LARGE TABLE (SILT)




Wayne State University
ECE 7650: Scalable and Secure Internet Services and Architecture
SMALL INDEX LARGE TABLE (SILT): A Memory-Efficient, High-Performance Key-Value Store
QA REPORT
Instructor: Dr. Song Jiang
Amith Nagaraja (Fs7045)

Summary: SILT (Small Index Large Table): A Memory-Efficient, High-Performance Key-Value Store

SILT (Small Index Large Table) is a memory-efficient, high-performance key-value store built on flash storage. SILT mainly focuses on:

1. The design and implementation of three basic key-value stores (LogStore, HashStore, and SortedStore) that use new fast and compact indexing data structures (partial-key cuckoo hashing and entropy-coded tries), each of which places a different emphasis on memory efficiency and write-friendliness.
2. The synthesis of these basic stores to build SILT.
3. An analytic model that enables an explicit and careful balance between memory, storage, and computation, and provides an accurate prediction of system performance, flash lifetime, and memory efficiency.

LogStore: The LogStore writes PUTs and DELETEs sequentially to flash to achieve high write throughput. Its in-memory partial-key cuckoo hash index efficiently maps keys to their locations in the flash log (a sketch of this index follows the summary).

HashStore: Once a LogStore fills up (e.g., the insertion algorithm terminates without finding any vacant slot after a maximum number of displacements in the hash table), SILT freezes the LogStore and converts it into a HashStore, a more memory-efficient data structure.

SortedStore: The SortedStore is a static key-value store with a very low memory footprint. It stores (key, value) entries sorted by key on flash, indexed by a new entropy-coded trie data structure that is fast to construct, uses 0.4 bytes of index memory per key on average, and keeps read amplification low (exactly 1) by pointing directly to the correct location on flash. Once SILT accumulates a configurable number of HashStores, it performs a bulk merge to incorporate them into the SortedStore.
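The LogStore's in-memory index can be illustrated with a minimal partial-key cuckoo hashing sketch in Python. This is not SILT's implementation: the table size, hash function, and displacement limit below are assumptions, and real SILT packs a compact tag and flash offset into each entry. Following the paper's partial-key idea, the tag stored for an entry is taken to be its other candidate bucket, so entries can be displaced without reading the full key from flash.

import hashlib

NUM_BUCKETS = 1 << 16        # assumed table size (one slot per bucket here)
MAX_DISPLACEMENTS = 128      # assumed limit; exceeding it means the LogStore is full

def _bucket(key: bytes, seed: int) -> int:
    """Illustrative hash function: one of the two candidate buckets for a key."""
    digest = hashlib.sha1(bytes([seed]) + key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

class LogStoreIndex:
    def __init__(self):
        # Each slot is (tag, flash_offset) or None; the tag is the key's *other*
        # candidate bucket, so displacement never needs the full key from flash.
        self.slots = [None] * NUM_BUCKETS

    def insert(self, key: bytes, flash_offset: int) -> bool:
        b1, b2 = _bucket(key, 1), _bucket(key, 2)
        bucket, entry = b1, (b2, flash_offset)
        for _ in range(MAX_DISPLACEMENTS):
            if self.slots[bucket] is None:
                self.slots[bucket] = entry
                return True
            # Kick the occupant to its alternate bucket (stored as its tag).
            victim_tag, victim_offset = self.slots[bucket]
            self.slots[bucket] = entry
            entry = (bucket, victim_offset)   # victim's new tag = bucket it just left
            bucket = victim_tag
        return False   # table is full: freeze this LogStore and convert it to a HashStore

    def lookup(self, key: bytes):
        """Return candidate flash offsets; the caller verifies the full key on flash."""
        b1, b2 = _bucket(key, 1), _bucket(key, 2)
        candidates = []
        if self.slots[b1] is not None and self.slots[b1][0] == b2:
            candidates.append(self.slots[b1][1])
        if self.slots[b2] is not None and self.slots[b2][0] == b1:
            candidates.append(self.slots[b2][1])
        return candidates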

(Q1) Use Figure 4 to explain how a LogStore is converted into a HashStore.

Four keys K1, K2, K3, and K4 are inserted into the LogStore. The layout of the on-flash file is simply their insertion order, and the in-memory index keeps the offset of each key on flash. During conversion, SILT turns the LogStore into an immutable HashStore: the on-flash data now forms a hash table in which the keys appear in the same order as in the in-memory filter. The HashStore saves memory over the LogStore by eliminating the offset index and reordering the key-value pairs from insertion order to hash order. A sketch of this conversion is given below.
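A minimal sketch of the conversion, assuming the LogStore index layout from the previous sketch (a list with one (tag, offset) entry or None per bucket) and a dict standing in for the on-flash log. The function and variable names are illustrative, not SILT's.

def convert_to_hashstore(index_slots, log_entries):
    """
    index_slots : list indexed by bucket; each entry is (tag, flash_offset) or None
                  (the LogStore's in-memory partial-key cuckoo hash table)
    log_entries : dict mapping flash_offset -> (key, value), i.e. the on-flash log
    Returns (tag_filter, hash_ordered_entries) for the new, immutable HashStore.
    """
    tag_filter = []        # in-memory: only the tags remain, no per-key offsets
    hash_ordered = []      # new on-flash layout: data rewritten in bucket (hash) order
    for slot in index_slots:
        if slot is None:
            tag_filter.append(None)
            hash_ordered.append(None)
            continue
        tag, offset = slot
        tag_filter.append(tag)
        hash_ordered.append(log_entries[offset])   # insertion order -> hash order
    return tag_filter, hash_ordered

# Example with a tiny 4-bucket index (illustrative data, not Figure 4's values):
index_slots = [(2, 8), None, (0, 0), (1, 16)]       # bucket -> (tag, log offset)
log_entries = {0: ("K1", "V1"), 8: ("K2", "V2"), 16: ("K3", "V3")}
tags, flash_layout = convert_to_hashstore(index_slots, log_entries)
# flash_layout now lists the (key, value) pairs in bucket order: K2, -, K1, K3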

(Q2) Once a LogStore fills up (e.g., the insertion algorithm terminates without finding any vacant slot after a maximum number of displacements in the hash table), SILT freezes the LogStore and converts it into a more memory-efficient data structure. Compared to the LogStore, what is the advantage of the HashStore? Why doesn't SILT create a HashStore at the beginning (without first creating a LogStore)?

The HashStore saves memory over the LogStore by eliminating the offset index and reordering the on-flash key-value pairs from insertion order to hash order. Writing data directly into a HashStore would be inefficient, because placing each entry at its hash-determined position requires small random writes to flash. Appending to a LogStore keeps every write sequential, and the frozen LogStore is then converted into an immutable HashStore in a single bulk operation.

(Q3) When fixed-length key-value entries are sorted by key on flash, a trie for the shortest unique prefixes of the keys serves as an index for these sorted data. While a SortedStore is fully sorted, could you comment on the cost of merging a HashStore with a SortedStore? Compare this cost to the major compaction cost for LevelDB.

Keeping the data sorted allows efficient bulk insertion of new data: the keys of the accumulated HashStores are sorted and then sequentially merged with the existing SortedStore, so the merge is a single sequential pass over flash rather than a series of random rewrites, which keeps the cost low (see the sketch after this answer). In LevelDB, there are many levels and each level is roughly 10 times larger than the one above it; data is rewritten repeatedly as it moves down through compactions, so the cumulative compaction cost is higher than SILT's, which has only the LogStore, HashStore, and SortedStore.
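A minimal sketch of the bulk merge discussed in Q3, under simplifying assumptions: each frozen HashStore's contents are modeled as a dict, DELETE tombstones are omitted, and heapq.merge stands in for the sequential multi-way merge; the newest store wins on duplicate keys.

import heapq

def bulk_merge(sorted_store, hash_stores):
    """
    sorted_store : list of (key, value) already sorted by key (the old SortedStore)
    hash_stores  : list of dicts (frozen HashStores), oldest first, newest last
    Returns the new SortedStore contents as a list of (key, value) sorted by key.
    """
    # Convert each HashStore from hash order to key order, then do one
    # sequential multi-way merge. Higher 'age' means newer data.
    runs = [((k, 0, v) for k, v in sorted_store)]
    for age, hs in enumerate(hash_stores, start=1):
        runs.append((k, age, v) for k, v in sorted(hs.items()))
    merged, last_key = [], object()
    # For duplicate keys, the newest version sorts first and is the one we keep.
    for key, age, value in heapq.merge(*runs, key=lambda t: (t[0], -t[1])):
        if key != last_key:
            merged.append((key, value))
            last_key = key
    return merged

# Usage example (illustrative data):
old = [("a", 1), ("c", 3)]
hs1 = {"b": 2, "c": 30}                      # newer than the old SortedStore
assert bulk_merge(old, [hs1]) == [("a", 1), ("b", 2), ("c", 30)]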

(Q4) Figure 5 shows an example of using a trie to index sorted data. Please use Figures 5 and 6 to explain how the index of a SortedStore is produced.

The unshaded key prefixes are the shortest unique prefixes, which are the parts used for indexing; the shaded parts are ignored, since those suffix bits cannot change a key's location. For instance, to look up the key 10010, the lookup follows the trie down to the leaf node that represents the prefix 100. As there are three preceding leaf nodes, the index of the key is 3. To eliminate pointers, the SortedStore uses a compact recursive representation of the trie (the counting idea is sketched just below, and the pointer-less representation is sketched after the next paragraph).
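A small sketch of this counting idea, using an explicit pointer-based trie over an assumed toy key set (not the paper's Figure 5 keys): the index of a key equals the number of leaves to the left of the leaf holding its shortest unique prefix.

def build_trie(sorted_keys, depth=0):
    """Binary trie over bit-string keys: an internal node is a (left, right) pair,
    a single remaining key is a leaf, and an empty subtree is None."""
    if not sorted_keys:
        return None
    if len(sorted_keys) == 1:
        return sorted_keys[0]
    left = [k for k in sorted_keys if k[depth] == "0"]
    right = [k for k in sorted_keys if k[depth] == "1"]
    return (build_trie(left, depth + 1), build_trie(right, depth + 1))

def count_leaves(node):
    if node is None:
        return 0
    if isinstance(node, str):
        return 1
    return count_leaves(node[0]) + count_leaves(node[1])

def trie_index(node, key, depth=0):
    """Index of `key` in the sorted array = number of leaves left of its leaf."""
    if node is None or isinstance(node, str):
        return 0                          # reached the leaf for this key's unique prefix
    left, right = node
    if key[depth] == "0":
        return trie_index(left, key, depth + 1)
    # Going right: every leaf in the left subtree precedes this key.
    return count_leaves(left) + trie_index(right, key, depth + 1)

# Toy key set (assumed for illustration):
keys = ["00010", "00101", "01110", "10010", "10101", "11001", "11010"]
trie = build_trie(keys)
assert trie_index(trie, "10010") == 3     # three keys starting with 0 precede it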

The lookup function is supplied with the lookup key and the trie representation string. By decoding the next encoded number, the SortedStore knows whether the current node is an internal node that it can recurse into. If the lookup key goes to the left subtree, the SortedStore recurses into the left subtree, whose representation immediately follows the current number; if the key goes to the right subtree, the SortedStore recursively decodes and discards the entire left subtree's representation and then recurses into the right subtree. For instance, to look up 10010, the SortedStore obtains 3 from the representation. Since the first bit of the key is 1, it skips the next numbers 2 and 1 (they encode the left subtree) and proceeds to the right subtree. It then reads the next number, 1, and arrives at the leaf node by taking the left subtree, giving index 3. A sketch of this encode/decode scheme follows.
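A minimal sketch of the pointer-less representation and the decode-and-discard lookup: a subtrie with at most one leaf has an empty representation; otherwise the representation is the left subtrie's leaf count followed by the representations of the left and right subtries. The real SortedStore entropy-codes these counts; here they stay plain integers, and the key set is the same assumed toy example as in the previous sketch.

def encode(sorted_keys, depth=0):
    """Representation of the trie over `sorted_keys` as a flat list of counts."""
    if len(sorted_keys) <= 1:
        return []
    left = [k for k in sorted_keys if k[depth] == "0"]
    right = [k for k in sorted_keys if k[depth] == "1"]
    return [len(left)] + encode(left, depth + 1) + encode(right, depth + 1)

def skip(counts, n_leaves, pos):
    """Advance `pos` past the representation of a subtrie with `n_leaves` leaves."""
    if n_leaves <= 1:
        return pos
    n_left = counts[pos]
    pos = skip(counts, n_left, pos + 1)
    return skip(counts, n_leaves - n_left, pos)

def lookup(counts, n_leaves, key, depth=0, pos=0):
    """Index of `key` among the sorted keys, decoding the representation lazily."""
    if n_leaves <= 1:
        return 0                                    # reached this key's leaf
    n_left = counts[pos]
    pos += 1
    if key[depth] == "0":
        return lookup(counts, n_left, key, depth + 1, pos)
    # Key goes right: decode and discard the whole left subtrie, then recurse right.
    pos = skip(counts, n_left, pos)
    return n_left + lookup(counts, n_leaves - n_left, key, depth + 1, pos)

# Same assumed toy key set as above:
keys = ["00010", "00101", "01110", "10010", "10101", "11001", "11010"]
counts = encode(keys)                               # [3, 2, 1, 2, 1, 2, 1]
assert lookup(counts, len(keys), "10010") == 3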