SMALL INDEX LARGE TABLE (SILT)




Wayne State University
ECE 7650: Scalable and Secure Internet Services and Architecture
SMALL INDEX LARGE TABLE (SILT): A Memory-Efficient, High-Performance Key-Value Store
QA REPORT
Instructor: Dr. Song Jiang
Amith Nagaraja (Fs7045)

Summary: SILT (Small Index Large Table): A Memory-Efficient, High-Performance Key-Value Store

SILT (Small Index Large Table) is a memory-efficient, high-performance key-value store built on flash storage. SILT mainly focuses on:

1. The design and implementation of three basic key-value stores (LogStore, HashStore, and SortedStore) that use new fast and compact indexing data structures (partial-key cuckoo hashing and entropy-coded tries), each of which places a different emphasis on memory efficiency and write-friendliness.
2. The synthesis of these basic stores to build SILT.
3. An analytic model that enables an explicit and careful balance between memory, storage, and computation, and provides an accurate prediction of system performance, flash lifetime, and memory efficiency.

LogStore: The LogStore writes PUTs and DELETEs sequentially to flash to achieve high write throughput. Its in-memory partial-key cuckoo hash index efficiently maps keys to their locations in the flash log (a sketch of this index follows the summary).

HashStore: Once a LogStore fills up (e.g., the insertion algorithm terminates without finding any vacant slot after a maximum number of displacements in the hash table), SILT freezes the LogStore and converts it into a HashStore, a more memory-efficient data structure.

SortedStore: The SortedStore is a static key-value store with a very low memory footprint. It stores (key, value) entries sorted by key on flash, indexed by a new entropy-coded trie data structure that is fast to construct, uses 0.4 bytes of index memory per key on average, and keeps read amplification low (exactly 1) by pointing directly to the correct location on flash. Once SILT accumulates a configurable number of HashStores, it performs a bulk merge to incorporate them into the SortedStore.
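The LogStore's in-memory index can be illustrated with a minimal partial-key cuckoo hashing sketch in Python. This is not SILT's implementation: the table size, hash function, and displacement limit below are assumptions, and real SILT packs a compact tag and flash offset into each entry. Following the paper's partial-key idea, the tag stored for an entry is taken to be its other candidate bucket, so entries can be displaced without reading the full key from flash.

import hashlib

NUM_BUCKETS = 1 << 16        # assumed table size (one slot per bucket here)
MAX_DISPLACEMENTS = 128      # assumed limit; exceeding it means the LogStore is full

def _bucket(key: bytes, seed: int) -> int:
    """Illustrative hash function: one of the two candidate buckets for a key."""
    digest = hashlib.sha1(bytes([seed]) + key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

class LogStoreIndex:
    def __init__(self):
        # Each slot is (tag, flash_offset) or None; the tag is the key's *other*
        # candidate bucket, so displacement never needs the full key from flash.
        self.slots = [None] * NUM_BUCKETS

    def insert(self, key: bytes, flash_offset: int) -> bool:
        b1, b2 = _bucket(key, 1), _bucket(key, 2)
        bucket, entry = b1, (b2, flash_offset)
        for _ in range(MAX_DISPLACEMENTS):
            if self.slots[bucket] is None:
                self.slots[bucket] = entry
                return True
            # Kick the occupant to its alternate bucket (stored as its tag).
            victim_tag, victim_offset = self.slots[bucket]
            self.slots[bucket] = entry
            entry = (bucket, victim_offset)   # victim's new tag = bucket it just left
            bucket = victim_tag
        return False   # table is full: freeze this LogStore and convert it to a HashStore

    def lookup(self, key: bytes):
        """Return candidate flash offsets; the caller verifies the full key on flash."""
        b1, b2 = _bucket(key, 1), _bucket(key, 2)
        candidates = []
        if self.slots[b1] is not None and self.slots[b1][0] == b2:
            candidates.append(self.slots[b1][1])
        if self.slots[b2] is not None and self.slots[b2][0] == b1:
            candidates.append(self.slots[b2][1])
        return candidates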

(Q1) Use Figure 4 to explain how a LogStore is converted into a HashStore.

Four keys K1, K2, K3, and K4 are inserted into the LogStore. The layout of the on-flash file is simply their insertion order, and the in-memory index keeps the offset of each key on flash. During conversion, SILT turns the LogStore into an immutable HashStore: the on-flash data now forms a hash table in which the keys appear in the same order as in the in-memory filter. The HashStore saves memory over the LogStore by eliminating the offset index and reordering the key-value pairs from insertion order to hash order. A sketch of this conversion is given below.
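A minimal sketch of the conversion, assuming the LogStore index layout from the previous sketch (a list with one (tag, offset) entry or None per bucket) and a dict standing in for the on-flash log. The function and variable names are illustrative, not SILT's.

def convert_to_hashstore(index_slots, log_entries):
    """
    index_slots : list indexed by bucket; each entry is (tag, flash_offset) or None
                  (the LogStore's in-memory partial-key cuckoo hash table)
    log_entries : dict mapping flash_offset -> (key, value), i.e. the on-flash log
    Returns (tag_filter, hash_ordered_entries) for the new, immutable HashStore.
    """
    tag_filter = []        # in-memory: only the tags remain, no per-key offsets
    hash_ordered = []      # new on-flash layout: data rewritten in bucket (hash) order
    for slot in index_slots:
        if slot is None:
            tag_filter.append(None)
            hash_ordered.append(None)
            continue
        tag, offset = slot
        tag_filter.append(tag)
        hash_ordered.append(log_entries[offset])   # insertion order -> hash order
    return tag_filter, hash_ordered

# Example with a tiny 4-bucket index (illustrative data, not Figure 4's values):
index_slots = [(2, 8), None, (0, 0), (1, 16)]       # bucket -> (tag, log offset)
log_entries = {0: ("K1", "V1"), 8: ("K2", "V2"), 16: ("K3", "V3")}
tags, flash_layout = convert_to_hashstore(index_slots, log_entries)
# flash_layout now lists the (key, value) pairs in bucket order: K2, -, K1, K3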

(Q2) Once a LogStore fills up (e.g., the insertion algorithm terminates without finding any vacant slot after a maximum number of displacements in the hash table), SILT freezes the LogStore and converts it into a more memory-efficient data structure. Compared to the LogStore, what is the advantage of the HashStore? Why doesn't SILT create a HashStore at the beginning (without first creating a LogStore)?

The HashStore saves memory over the LogStore by eliminating the offset index and reordering the on-flash key-value pairs from insertion order to hash order. Writing data directly into a HashStore would be inefficient, because placing each entry at its hash-determined position requires small random writes to flash. Appending to a LogStore keeps every write sequential, and the frozen LogStore is then converted into an immutable HashStore in a single bulk operation.

(Q3) When fixed-length key-value entries are sorted by key on flash, a trie for the shortest unique prefixes of the keys serves as an index for these sorted data. While a SortedStore is fully sorted, could you comment on the cost of merging a HashStore with a SortedStore? Compare this cost to the major compaction cost for LevelDB.

Keeping the data sorted allows efficient bulk insertion of new data: the keys of the accumulated HashStores are sorted and then sequentially merged with the existing SortedStore, so the merge is a single sequential pass over flash rather than a series of random rewrites, which keeps the cost low (see the sketch after this answer). In LevelDB, there are many levels and each level is roughly 10 times larger than the one above it; data is rewritten repeatedly as it moves down through compactions, so the cumulative compaction cost is higher than SILT's, which has only the LogStore, HashStore, and SortedStore.
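A minimal sketch of the bulk merge discussed in Q3, under simplifying assumptions: each frozen HashStore's contents are modeled as a dict, DELETE tombstones are omitted, and heapq.merge stands in for the sequential multi-way merge; the newest store wins on duplicate keys.

import heapq

def bulk_merge(sorted_store, hash_stores):
    """
    sorted_store : list of (key, value) already sorted by key (the old SortedStore)
    hash_stores  : list of dicts (frozen HashStores), oldest first, newest last
    Returns the new SortedStore contents as a list of (key, value) sorted by key.
    """
    # Convert each HashStore from hash order to key order, then do one
    # sequential multi-way merge. Higher 'age' means newer data.
    runs = [((k, 0, v) for k, v in sorted_store)]
    for age, hs in enumerate(hash_stores, start=1):
        runs.append((k, age, v) for k, v in sorted(hs.items()))
    merged, last_key = [], object()
    # For duplicate keys, the newest version sorts first and is the one we keep.
    for key, age, value in heapq.merge(*runs, key=lambda t: (t[0], -t[1])):
        if key != last_key:
            merged.append((key, value))
            last_key = key
    return merged

# Usage example (illustrative data):
old = [("a", 1), ("c", 3)]
hs1 = {"b": 2, "c": 30}                      # newer than the old SortedStore
assert bulk_merge(old, [hs1]) == [("a", 1), ("b", 2), ("c", 30)]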

(Q4) Figure 5 shows an example of using a trie to index sorted data. Please use Figures 5 and 6 to explain how the index of a SortedStore is produced.

The unshaded key prefixes are the shortest unique prefixes, which are the parts used for indexing; the shaded parts are ignored, since those suffix bits cannot change a key's location. For instance, to look up the key 10010, the lookup follows the trie down to the leaf node that represents the prefix 100. As there are three preceding leaf nodes, the index of the key is 3. To eliminate pointers, the SortedStore uses a compact recursive representation of the trie (the counting idea is sketched just below, and the pointer-less representation is sketched after the next paragraph).
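A small sketch of this counting idea, using an explicit pointer-based trie over an assumed toy key set (not the paper's Figure 5 keys): the index of a key equals the number of leaves to the left of the leaf holding its shortest unique prefix.

def build_trie(sorted_keys, depth=0):
    """Binary trie over bit-string keys: an internal node is a (left, right) pair,
    a single remaining key is a leaf, and an empty subtree is None."""
    if not sorted_keys:
        return None
    if len(sorted_keys) == 1:
        return sorted_keys[0]
    left = [k for k in sorted_keys if k[depth] == "0"]
    right = [k for k in sorted_keys if k[depth] == "1"]
    return (build_trie(left, depth + 1), build_trie(right, depth + 1))

def count_leaves(node):
    if node is None:
        return 0
    if isinstance(node, str):
        return 1
    return count_leaves(node[0]) + count_leaves(node[1])

def trie_index(node, key, depth=0):
    """Index of `key` in the sorted array = number of leaves left of its leaf."""
    if node is None or isinstance(node, str):
        return 0                          # reached the leaf for this key's unique prefix
    left, right = node
    if key[depth] == "0":
        return trie_index(left, key, depth + 1)
    # Going right: every leaf in the left subtree precedes this key.
    return count_leaves(left) + trie_index(right, key, depth + 1)

# Toy key set (assumed for illustration):
keys = ["00010", "00101", "01110", "10010", "10101", "11001", "11010"]
trie = build_trie(keys)
assert trie_index(trie, "10010") == 3     # three keys starting with 0 precede it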

The lookup function is supplied with the lookup key and the trie representation string. By decoding the next encoded number, the SortedStore knows whether the current node is an internal node that it can recurse into. If the lookup key goes to the left subtree, the SortedStore recurses into the left subtree, whose representation immediately follows the current number; if the key goes to the right subtree, the SortedStore recursively decodes and discards the entire left subtree's representation and then recurses into the right subtree. For instance, to look up 10010, the SortedStore obtains 3 from the representation. Since the first bit of the key is 1, it skips the next numbers 2 and 1 (they encode the left subtree) and proceeds to the right subtree. It then reads the next number, 1, and arrives at the leaf node by taking the left subtree, giving index 3. A sketch of this encode/decode scheme follows.
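A minimal sketch of the pointer-less representation and the decode-and-discard lookup: a subtrie with at most one leaf has an empty representation; otherwise the representation is the left subtrie's leaf count followed by the representations of the left and right subtries. The real SortedStore entropy-codes these counts; here they stay plain integers, and the key set is the same assumed toy example as in the previous sketch.

def encode(sorted_keys, depth=0):
    """Representation of the trie over `sorted_keys` as a flat list of counts."""
    if len(sorted_keys) <= 1:
        return []
    left = [k for k in sorted_keys if k[depth] == "0"]
    right = [k for k in sorted_keys if k[depth] == "1"]
    return [len(left)] + encode(left, depth + 1) + encode(right, depth + 1)

def skip(counts, n_leaves, pos):
    """Advance `pos` past the representation of a subtrie with `n_leaves` leaves."""
    if n_leaves <= 1:
        return pos
    n_left = counts[pos]
    pos = skip(counts, n_left, pos + 1)
    return skip(counts, n_leaves - n_left, pos)

def lookup(counts, n_leaves, key, depth=0, pos=0):
    """Index of `key` among the sorted keys, decoding the representation lazily."""
    if n_leaves <= 1:
        return 0                                    # reached this key's leaf
    n_left = counts[pos]
    pos += 1
    if key[depth] == "0":
        return lookup(counts, n_left, key, depth + 1, pos)
    # Key goes right: decode and discard the whole left subtrie, then recurse right.
    pos = skip(counts, n_left, pos)
    return n_left + lookup(counts, n_leaves - n_left, key, depth + 1, pos)

# Same assumed toy key set as above:
keys = ["00010", "00101", "01110", "10010", "10101", "11001", "11010"]
counts = encode(keys)                               # [3, 2, 1, 2, 1, 2, 1]
assert lookup(counts, len(keys), "10010") == 3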