The Impact of Solid State Drive on Search Engine Cache Management




The Impact of Solid State Drive on Search Engine Cache Management Jiancong Tong Ph.D. candidate at Nankai University Visiting student at University of Melbourne lingfenghx@gmail.com with Gang Wang, Xiaoguang Liu (Nankai University) and Jianguo Wang, Eric Lo, Man Lung Yiu (Hong Kong Polytechnic University) Monash University May 12, 2014

Outline 1 Background and Motivation 2 RQ1: What is the impact of SSD on buffer management? RQ2: How could we deal with that? 1 / 30

Hard Disk Drive Magnetic head Hard Disk Drive (HDD) How does HDD work? [Garcia et al., 2000] Random read latency of HDD = seek time + rotational latency + transfer time. Caching technology is used to reduce this latency. 2 / 30
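The three latency components above can be sketched as a simple model. All parameter values below are illustrative assumptions for a 7200 rpm disk, not measurements from the talk:

```python
# A minimal sketch of HDD random-read latency = seek + rotation + transfer.
# Parameter values are assumed for illustration, not measured.
AVG_SEEK_MS = 8.5                        # average head seek time (assumed)
RPM = 7200
AVG_ROTATION_MS = 0.5 * 60_000 / RPM     # half a revolution on average
TRANSFER_MB_PER_S = 150                  # sequential transfer rate (assumed)

def hdd_random_read_ms(size_kb: float) -> float:
    """Latency of one random read of `size_kb` kilobytes."""
    transfer_ms = size_kb / 1024 / TRANSFER_MB_PER_S * 1000
    return AVG_SEEK_MS + AVG_ROTATION_MS + transfer_ms

# A small random read is dominated by seek + rotation, not transfer time.
print(hdd_random_read_ms(4))
```

Note that for a 4 KB read, seek plus rotation account for over 99% of the latency, which is exactly why caching pays off so well on HDD.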

What is a Cache? Small, fast memory used to improve average access time to large, slow storage media. Exploits locality: both spatial and temporal. Almost everything is a cache in computer architecture... 3 / 30

What is a Cache? (Cont.) Cache Hit: the requested data is found in the memory. Cache Miss: the requested data is not found in the memory. Hit ratio = #Hits / #Memory accesses; Miss ratio = #Misses / #Memory accesses; Hit ratio + Miss ratio = 1. 4 / 30
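The hit/miss accounting above can be sketched over an access trace. The LRU policy and the trace below are illustrative choices, not the policy discussed in the talk:

```python
# A minimal sketch of hit-ratio accounting, using a simple LRU cache
# (illustrative; not the caching policy the talk evaluates).
from collections import OrderedDict

def lru_hit_ratio(trace, capacity):
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # cache hit: refresh recency
        else:
            cache[key] = True              # cache miss: admit the item
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

trace = ["a", "b", "a", "c", "a", "b"]
hit = lru_hit_ratio(trace, capacity=2)
print(hit, 1 - hit)  # hit ratio and miss ratio sum to 1
```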

What is Solid State Drive? Hard Disk Drive (HDD): magnetic moving head. Solid State Drive (SSD): based on semiconductor chips. SSD: a new, faster (10-100x) drive with an HDD-compatible interface. Strong technical merits: [Chen et al., 2009] 1 Lower power consumption 2 More compact size 3 Better shock resistance 4 Extraordinarily faster random data access 5 / 30

The Rise of SSD Words of a Pioneer (Jim Gray, 2006): Tape is dead; disk is tape; flash is disk; RAM locality is king. SSD in Large-Scale System Architectures: Google 2008 (or later), Baidu 2008, Facebook 2010, Myspace 2010, Oracle 2011, Microsoft Azure 2012. Trend: flash-memory-based SSDs are replacing, and are going to completely replace, HDDs as the major storage medium! 6 / 30

Challenges Existing caching policies were originally designed for HDD. HDD: very slow random read (compared to sequential read). Cache design principle: minimize random reads. However... What now? [Tong et al., 2013] 7 / 30

Research Questions RQ1: What is the impact of SSD on buffer management? Are the existing cache techniques designed for HDD-based search engine still good for SSD-based search engine? What measure(s) should be used to define a good cache policy in this case? If the performance of caching is improved or degraded, what does that mean to the entire system? RQ2: How could we deal with that? What to do if the efficiency of the entire system is affected by such impact? Can we propose better cache policies for SSD-based systems? 8 / 30

1 Background and Motivation 2 RQ1: What is the impact of SSD on buffer management? RQ2: How could we deal with that? 9 / 30

Large-scale experimental study RQ1 can be answered by evaluating the effectiveness of existing caching policies on an SSD-based search engine. Settings Datasets Web documents: 12,000,000 (approx. 100GB) Queries: 1,000,000 Device SSD: ADATA 256GB SSD HDD: Seagate 3TB 7200rpm System: Apache Lucene Measure: Query time (NOT hit ratio) 10 / 30

Overview of search engine architecture [Wang et al., 2013] 11 / 30

Overview of search engine architecture [Wang et al., 2013] 12 / 30

Index Server and List (Index) Cache Posting Lists (Inverted Lists) Clayton (2, 4, 5, 10, 100, 107,..., 1,000,000) Monash (1, 2, 3, 23,...) Index Server 13 / 30

Which lists should be kept in the cache? Metrics to consider High Frequency First? High Frequency/Size First? 14 / 30

Previous conclusion [Baeza-Yates et al., 2007] Frequency/Size is a better metric (for HDD). The reason: on HDD, random read is 130-170x slower than sequential read, so list access time is (almost) a constant! The number of accesses dominates, not the amount of data accessed, so short lists are favored (more lists can be held): high benefit is gained when size is taken into account. 15 / 30
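The frequency/size admission rule can be sketched as a greedy fill of the cache budget, in the spirit of Baeza-Yates et al. [2007]. The term names, counts, and sizes below are made up for illustration:

```python
# A minimal sketch of static list caching by a frequency/size metric
# (greedy knapsack-style fill). Data values are illustrative assumptions.
def select_lists(terms, cache_bytes):
    """terms: dict term -> (query_frequency, list_size_bytes).
    Admit lists by descending frequency/size until the cache is full."""
    ranked = sorted(terms, key=lambda t: terms[t][0] / terms[t][1],
                    reverse=True)
    chosen, used = [], 0
    for t in ranked:
        size = terms[t][1]
        if used + size <= cache_bytes:   # admit only if it still fits
            chosen.append(t)
            used += size
    return chosen

terms = {"monash": (900, 300), "clayton": (500, 100), "ssd": (100, 400)}
print(select_lists(terms, cache_bytes=500))
```

The short, frequently queried list wins even though a longer list has a higher absolute frequency, which is the HDD-era intuition the next slides revisit.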

How does SSD change the story? [Wang et al., 2013] Differences: on HDD, random read is 130-170x slower than sequential read, so list access time is (almost) a constant; on SSD, random read is only 2-10x slower than sequential read, so list access time varies a lot! Long lists have slower access time, so admitting too many short lists offers less benefit. 16 / 30
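This contrast can be sketched with a one-seek-plus-sequential-reads access model. The timing parameters below are assumed for illustration, not measured values from the talk:

```python
# A minimal sketch of why list access time is ~constant on HDD but
# length-dependent on SSD. All timing parameters are assumed.
def access_ms(list_blocks, seek_ms, read_ms_per_block):
    # 1 random seek to the list head, then sequential reads of its blocks
    return seek_ms + list_blocks * read_ms_per_block

HDD = dict(seek_ms=10.0, read_ms_per_block=0.06)  # seek dominates
SSD = dict(seek_ms=0.1, read_ms_per_block=0.05)   # transfer dominates

for blocks in (1, 100):
    print(blocks, access_ms(blocks, **HDD), access_ms(blocks, **SSD))
```

Under these assumed numbers a 100-block list costs under 2x a 1-block list on HDD, but over 30x on SSD, which is the asymmetry that breaks the old size-based admission rule.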

How does SSD change the story? (Cont.) Read access latency of posting lists of varying lengths Conjecture 1 The best cache policy may have changed 17 / 30

Evaluation on Index Server Static List Cache Policy (on HDD vs. on SSD) Conjecture 1: The best cache policy may have changed. Confirmed. Do not rely on your knowledge acquired in the HDD era. 18 / 30

Is Hit Ratio Reliable? Previously, on HDD: reliable, as list access time is a constant. Now, on SSD: NO. So, can hit ratio measure the effectiveness of the cache? Conjecture 2: Cache hit ratio might not be reliable. 19 / 30

Is Hit Ratio Reliable? (Cont.) Static List Cache Policy (on SSD) Conjecture 2 Cache hit ratio might not be reliable Confirmed Use query time, not hit ratio 20 / 30
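Why the hit ratio misleads on SSD can be sketched directly: two caches with the same number of misses save very different amounts of time if their misses touch lists of different lengths. The timing parameters are illustrative assumptions:

```python
# A minimal sketch of why hit ratio can mislead on SSD: equal hit
# ratios, unequal query time. Timing parameters are assumed.
def total_miss_time_ms(miss_lengths, seek_ms=0.1, read_ms_per_block=0.05):
    """miss_lengths: lengths (in blocks) of the lists that missed the
    cache. Each miss costs one seek plus sequential block reads."""
    return sum(seek_ms + blocks * read_ms_per_block
               for blocks in miss_lengths)

# Same number of misses (same hit ratio), very different total time:
print(total_miss_time_ms([1, 1, 1]))        # the cache kept the long lists
print(total_miss_time_ms([200, 200, 200]))  # the cache kept the short lists
```

Since misses are not equally expensive on SSD, end-to-end query time is the honest measure, which is the slide's conclusion.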

Document Server and Its Cache Document Server 21 / 30

Document Server and Its Cache (Cont.) 22 / 30

Evaluation on Document Server Document retrieval is no longer the bottleneck on document server Conjecture 3 Is it the same case for the whole system? 23 / 30

The Impact on the Entire System Efficiency Time break down for query processing (on SSD) All caches are enabled Conjecture 3 Is it the same case for the whole system? Confirmed Disk accessing is no longer the bottleneck! 24 / 30

1 Background and Motivation 2 RQ1: What is the impact of SSD on buffer management? RQ2: How could we deal with that? 25 / 30

The Impact on the Entire System Efficiency (Cont.) Time break down for query processing (on SSD) All caches are enabled Problems The time spent on intersection and ranking should be reduced Solution Adopting full-term-ranking-cache (FTRC) [Altingövde et al., 2011] and two-term-intersection-cache (TTIC) [Long and Suel, 2005] 26 / 30

Incorporating PLC+FTRC+TTIC 27 / 30

New Static List Caching Policy Can we convert the latencies of random seek and sequential read into a uniform measure? 28 / 30

New Static List Caching Policy (Cont.) Latency-Aware Caching (Static): BLOCK [Tong et al., 2013] For term t, B(t) represents the I/O cost that can be saved during the whole query-evaluation process if its posting list l(t) is kept in the cache. Cost model of the new method, BLOCK (versus existing static caching policies): 1 access = 1 random seek + several sequential reads = 1 random seek + a few equivalent random seeks. 29 / 30
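The cost model above can be sketched as follows. This is a simplified reading of the BLOCK idea, with assumed timing parameters and a hypothetical per-block ranking rule, not the paper's exact formulation:

```python
# A minimal sketch of the BLOCK cost model: one list access is one
# random seek plus sequential reads expressed as "equivalent random
# seeks". Parameter values and the ranking rule are assumptions.
SEEK_MS = 0.10           # random-seek latency on SSD (assumed)
SEQ_MS_PER_BLOCK = 0.05  # sequential read time per block (assumed)

def saved_cost(freq, list_blocks):
    """B(t): I/O cost saved over the workload if l(t) is cached."""
    seq_as_seeks = list_blocks * SEQ_MS_PER_BLOCK / SEEK_MS
    per_access_ms = (1 + seq_as_seeks) * SEEK_MS
    return freq * per_access_ms

def rank_by_block(terms):
    """terms: term -> (frequency, list_blocks).
    Rank candidates by saved cost per cached block."""
    return sorted(terms,
                  key=lambda t: saved_cost(*terms[t]) / terms[t][1],
                  reverse=True)

terms = {"short": (100, 2), "long": (40, 50)}
print(rank_by_block(terms))
```

Because the saved cost now grows with list length, long lists are no longer penalized as heavily as under the pure frequency/size metric.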

New Static List Caching Policy (Cont.) Experimental Results [Figure: average disk access latency (ms) vs. cache size (512-3584 MB), comparing QTFDF, QTF/MECH, and BLOCK, on SSD and on HDD] BLOCK outperforms the other methods on SSD: 1 BLOCK beats QTFDF (by far the best policy on HDD) by 20%-60% 2 BLOCK outperforms QTF/MECH (identical on SSD) too. BLOCK is better even on HDD (though not as significantly). 30 / 30

References
Altingövde, I. S., Ozcan, R., Cambazoglu, B. B., and Ulusoy, Ö. (2011). Second chance: A hybrid approach for dynamic result caching in search engines. In ECIR, pages 510-516.
Baeza-Yates, R. A., Gionis, A., Junqueira, F., Murdock, V., Plachouras, V., and Silvestri, F. (2007). The impact of caching on search engines. In SIGIR, pages 183-190.
Chen, F., Koufaty, D. A., and Zhang, X. (2009). Understanding intrinsic characteristics and system implications of flash memory based solid state drives. In SIGMETRICS/Performance, pages 181-192.
Garcia, J. M., Frohock, M., Reding, E., Reding, J. A., Garcia, M. F., DeLuca, S. A., and Whalen, E. (2000). Microsoft SQL Server 2000 Administrator's Companion. Microsoft Press.
Long, X. and Suel, T. (2005). Three-level caching for efficient query processing in large web search engines. In WWW, pages 257-266.
Tong, J., Wang, G., and Liu, X. (2013). Latency-aware strategy for static list caching in flash-based web search engines. In CIKM, pages 1209-1212.
Wang, J., Lo, E., Yiu, M. L., Tong, J., Wang, G., and Liu, X. (2013). The impact of solid state drive on search engine cache management. In SIGIR, pages 693-702.

Thanks for your time! Questions & Comments? For more details of RQ1, please refer to: J. Wang, E. Lo, M. L. Yiu, J. Tong, G. Wang, X. Liu, The Impact of Solid State Drive on Search Engine Cache Management, In SIGIR, 2013, pp. 693-702. For more details of RQ2, please refer to: J. Tong, G. Wang, X. Liu, Latency-Aware Strategy for Static List Caching in Flash-based Web Search Engines, In CIKM, 2013, pp. 1209-1212. Some slides were taken from a presentation ("The Impact of Solid State Drive on Search Engine Cache Management") by Jianguo Wang.