Skimpy-Stash: RAM Space Skimpy Key-Value Store on Flash-based Storage
YASHWANTH BODDU FQ9316
Outline
- Introduction
- Flash Memory Overview
- KV Store Applications
- Skimpy-Stash Design
- Architectural Components
- Hash Table Directory Design
Introduction
Skimpy-Stash is designed to achieve high throughput and low latency. Its distinguishing feature is the design goal of an extremely low RAM footprint, about 1 (± 0.5) byte per key-value pair, which is more aggressive than earlier designs. It uses a hash table directory in RAM to index key-value pairs stored in a log-structured manner on flash.
Q. Our base design uses less than 1 byte in RAM per key-value pair and our enhanced design takes slightly more than 1 byte per key-value pair. In FAWN, even a pointer to a KV pair needs 4 bytes. How can Skimpy-Stash possibly achieve such a low memory cost for metadata?
Skimpy-Stash moves most of the pointers that locate each key-value pair from RAM to the flash itself: it resolves hash table collisions using linear chaining and stores the linked lists on flash, with a pointer in each hash table bucket in RAM pointing to the first record of the chain on flash. The enhanced design adds two-choice load balancing to reduce wide variations in bucket sizes, and a Bloom filter in each hash table directory slot in RAM that summarizes the records in the bucket.
Q. Skimpy-Stash uses a hash table directory in RAM to index key-value pairs stored in a log structure on flash. Why are key-value pairs on the flash organized as a log?
The key benefit of the log-structured data organization on flash is high write throughput: all updates to the data and the metadata are written sequentially to the log, avoiding the random writes that are expensive on flash.
Q. The average bucket size is the critical design parameter that serves as a powerful knob for making a continuum of tradeoffs between low RAM usage and low lookup latencies. Please explain this statement.
If the average bucket size k is small, the chains on flash are short, so a lookup needs few flash reads and latency is low; but more buckets are required, so more RAM is spent on bucket pointers. If k is large, fewer buckets (and hence fewer RAM pointers) are needed, but the chains are longer and each lookup must follow more records on flash, which increases latency. Tuning k thus trades RAM space against lookup latency.
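To make the knob concrete, here is a back-of-the-envelope sketch; the 4-byte pointer size and the read-count formulas are illustrative assumptions, not the paper's measured numbers.

```python
# Tradeoff knob: average bucket size k, assuming one 4-byte RAM pointer
# per bucket (illustrative; real per-bucket overheads differ).
def tradeoff(k, pointer_bytes=4):
    ram_bytes_per_key = pointer_bytes / k  # one pointer amortized over k keys
    avg_reads = (k + 1) / 2                # successful lookup walks half the chain
    worst_reads = k                        # a miss walks the whole chain
    return ram_bytes_per_key, avg_reads, worst_reads

for k in (1, 2, 4, 8, 16):
    ram, avg, worst = tradeoff(k)
    print(f"k={k:2d}: {ram:.2f} B/key, ~{avg:.1f} avg / {worst} worst flash reads")
```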
Flash Memory Overview
KV Store Applications
Online multi-player gaming: There is a need to maintain server-side state so as to track player actions on each client machine and to update global game state and make it visible to other players as quickly as possible. Server-side game state must also be stored to (i) resume the game from an interrupted state if and when crashes occur, (ii) support offline analysis of game popularity, progression, and dynamics with the objective of improving the game, and (iii) verify player actions for fairness when outcomes are associated with monetary rewards.
Storage deduplication
Skimpy-Stash Design
Coping with flash constraints:
- Random writes
- Writes smaller than a flash page
Design Goals
- Support low-latency, high-throughput operations
- Use flash-aware data structures and algorithms: random writes and in-place updates are expensive on flash memory, so they must be avoided and sequential writes used to the extent possible
- Low RAM footprint per key, independent of key-value pair size
Architectural Components
- RAM write buffer
- RAM hash table (HT) directory
- Flash store
Q. The client [write] call returns only after the write buffer is flushed to flash. Why can't such a call be acknowledged earlier?
It cannot be acknowledged earlier because the data is still sitting in the volatile RAM write buffer; until it reaches flash, a crash or power failure would lose the update, so durability cannot yet be guaranteed.
Q. Basic functions: Store, Lookup, Delete. Explain how these basic functions are executed.
Store: A key insert (or update) operation (set) writes the key-value pair into the RAM write buffer. When there are enough key-value pairs in the RAM write buffer to fill a flash page (or a configurable timeout, say 1 msec, has expired since the client call), these entries are appended to the log on flash and inserted into the RAM HT directory.
Lookup: It first looks in the RAM write buffer; upon a miss there, it looks up the hash table directory in RAM and searches the chained key-value pair records on flash in the respective bucket.
Delete: A delete operation on a key is supported through insertion of a null value for that key. Eventually the null entry and the earlier inserted values of the key on flash will be garbage collected. A minimal sketch of the three operations follows.
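The following in-memory model sketches the three operations; the "flash log" is a Python list standing in for append-only flash pages, and all names, the bucket count, and the page-size threshold are illustrative, not the paper's API.

```python
FLASH_PAGE_RECORDS = 4          # records per "page": flush threshold
NUM_BUCKETS = 64

flash_log = []                  # append-only log of (key, value, next_offset) records
directory = [-1] * NUM_BUCKETS  # RAM HT directory: offset of each chain's head
write_buffer = {}               # RAM write buffer of not-yet-flushed pairs

def bucket(key):
    return hash(key) % NUM_BUCKETS

def flush():
    """Append buffered pairs to the log and link them into their bucket chains."""
    for key, value in write_buffer.items():
        b = bucket(key)
        flash_log.append((key, value, directory[b]))  # next ptr = old chain head
        directory[b] = len(flash_log) - 1             # RAM now points at new head
    write_buffer.clear()

def store(key, value):
    write_buffer[key] = value
    if len(write_buffer) >= FLASH_PAGE_RECORDS:       # enough to fill a page
        flush()                                       # (or on a timeout, per above)

def lookup(key):
    if key in write_buffer:                           # 1) RAM write buffer first
        return write_buffer[key]
    off = directory[bucket(key)]                      # 2) then walk the flash chain
    while off != -1:
        k, v, nxt = flash_log[off]
        if k == key:
            return v                                  # newest record is found first
        off = nxt
    return None                                       # absent (or deleted: v is None)

def delete(key):
    store(key, None)                                  # a null value marks deletion
```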
Q. The chain of records on flash pointed to by each slot comprises the bucket of records corresponding to this slot in the HT directory. Please use the figure to describe Skimpy-Stash's data structure. Also explain how lookup, insert, and delete operations are executed.
Hash Table Directory Design
Base design:
- Resolve hash table collisions using linear chaining, where multiple keys that resolve (collide) to the same hash table bucket are chained in a linked list, and
- Store the linked lists on flash itself, with a pointer in each hash table bucket in RAM pointing to the first record of the chain on flash.
Each key-value pair record on flash contains, in addition to the key and value fields, a pointer to the next record (in the order of its respective chain) on flash, as sketched below.
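A sketch of the on-flash record layout, with fixed field sizes chosen here only for simplicity (real layouts may differ); the point is that the chain pointer lives in the record on flash, while RAM keeps just one head pointer per bucket.

```python
import struct

# One chain record: key, value, and the flash offset of the next record
# in the same bucket's chain (field sizes are illustrative).
RECORD = struct.Struct("16s 32s q")   # 16 B key, 32 B value, 8 B next offset

def pack_record(key: bytes, value: bytes, next_offset: int) -> bytes:
    """Serialize one chain record for appending to the flash log."""
    return RECORD.pack(key.ljust(16, b"\0"), value.ljust(32, b"\0"), next_offset)

# RAM keeps only one head-of-chain pointer per bucket, so per-key RAM cost
# is (pointer size) / (average bucket size).
ram_directory = [-1] * 1024           # flash offset of each bucket's newest record
```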
Enhanced Design
- Load balancing across buckets
- Bloom filter per bucket
Q. Because we store the chain of key-value pairs in each bucket on flash, we incur multiple flash reads upon lookup of a key in the store. Please explain how this issue can be alleviated.
This can be alleviated by periodically compacting the chain of a bucket on flash: the valid keys in the chain are placed contiguously on one or more flash pages that are appended to the tail of the log, so a later lookup touches far fewer pages.
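A self-contained sketch of compacting one bucket's chain: keep only the newest record per key, drop null (deleted) entries, and re-append the survivors contiguously at the log tail. The function name and chain encoding are ours, matching the model above.

```python
def compact_bucket(flash_log, directory, b):
    """Rewrite bucket b's live records contiguously at the tail of the log."""
    seen, live = set(), []
    off = directory[b]
    while off != -1:                       # walk the chain, newest record first
        key, value, nxt = flash_log[off]
        if key not in seen:
            seen.add(key)
            if value is not None:          # skip deletion markers
                live.append((key, value))
        off = nxt
    head = -1                              # rebuild the chain contiguously
    for key, value in reversed(live):
        flash_log.append((key, value, head))
        head = len(flash_log) - 1
    directory[b] = head                    # RAM now points at the compact chain
```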
Q. ...a two-choice based load balancing strategy is used to reduce variations in the number of keys assigned to each bucket. Explain how this is achieved.
With the load-balanced design for the HT directory, each key is hashed to two candidate HT directory buckets using two hash functions h1 and h2, and is actually inserted into the one that currently has fewer elements.
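A sketch of two-choice insertion; the per-bucket size counters and the salted-hash construction of h1/h2 are illustrative devices, not the paper's exact bookkeeping.

```python
import hashlib

NUM_BUCKETS = 64
bucket_size = [0] * NUM_BUCKETS   # current number of keys in each bucket

def h(key: bytes, salt: bytes) -> int:
    """Model a family of hash functions by salting SHA-1."""
    digest = hashlib.sha1(salt + key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def choose_bucket(key: bytes) -> int:
    b1, b2 = h(key, b"h1"), h(key, b"h2")
    b = b1 if bucket_size[b1] <= bucket_size[b2] else b2   # pick the emptier one
    bucket_size[b] += 1
    return b
```

On lookup, a key may now live in either candidate bucket; the Bloom filter kept in each RAM directory slot summarizes that bucket's records, so typically only the one chain that may contain the key is walked on flash.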
Q. ...when the last record in a bucket chain is encountered in the log during garbage collection, all valid records in that chain are compacted and relocated to the tail of the log. Please explain how garbage is collected.
When a configurable fraction of garbage (in terms of space occupied) accumulates in the log, pages on flash are recycled from the head of the log: valid entries at the head are written back to the tail of the log, while invalid entries are skipped. Because a bucket's records are chained across the log, this effectively leads to the design decision of garbage collecting an entire bucket chain on flash at a time.
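A sketch of the head-of-log reclamation step, reusing compact_bucket from the sketch above; the reachability walk is our illustrative stand-in for the store's actual validity bookkeeping.

```python
NUM_BUCKETS = 64

def reclaim_head(flash_log, directory, head_offset):
    """Process one record at the log head during garbage collection."""
    key = flash_log[head_offset][0]
    b = hash(key) % NUM_BUCKETS
    off = directory[b]                     # is this record still on its chain?
    while off != -1 and off != head_offset:
        off = flash_log[off][2]
    if off == head_offset:
        # Still part of a live chain: compact the whole chain to the log
        # tail (compact_bucket above), after which this head page is garbage.
        compact_bucket(flash_log, directory, b)
    # Records unreachable from any chain are stale and simply skipped;
    # the freed pages at the head are then recycled.
```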
Thank You