Data Deduplication in a Hybrid Architecture for Improving Write Performance
1 Data Deduplication in a Hybrid Architecture for Improving Write Performance
Data-Intensive Scalable Computing Laboratory
Department of Computer Science
Texas Tech University
Lubbock, Texas
June 10th, 2013
2 Outline
1 Background and Motivation
2 Data Deduplication in a Hybrid Architecture (DDiHA)
3 Evaluation
4 Conclusion and Future Work
3 Big Data Problem
Many scientific simulations become highly data intensive
Simulation resolution desires finer granularity, both spatial and temporal
- e.g. climate models: 250KM → 20KM; 6 hours → 30 minutes
The output data volume reaches tens of terabytes in a single simulation; the entire system deals with petabytes of data
The pressure on the I/O system capability substantially increases

PI | Project | On-Line Data | Off-Line Data
Lamb, Don | FLASH: Buoyancy-Driven Turbulent Nuclear Burning | 75TB | 300TB
Fischer, Paul | Reactor Core Hydrodynamics | 2TB | 5TB
Dean, David | Computational Nuclear Structure | 4TB | 40TB
Baker, David | Computational Protein Structure | 1TB | 2TB
Worley, Patrick H. | Performance Evaluation and Analysis | 1TB | 1TB
Wolverton, Christopher | Kinetics and Thermodynamics of Metal and Complex Hydride Nanoparticles | 5TB | 100TB
Washington, Warren | Climate Science | 10TB | 345TB
Tsigelny, Igor | Parkinson's Disease | 2.5TB | 50TB
Tang, William | Plasma Microturbulence | 2TB | 10TB
Sugar, Robert | Lattice QCD | 1TB | 44TB
Siegel, Andrew | Thermal Striping in Sodium Cooled Reactors | 4TB | 8TB
Roux, Benoit | Gating Mechanisms of Membrane Proteins | 10TB | 10TB

Figure 3: Data volume of current simulations
Figure 4: Climate Model Evolution: FAR (1990), SAR (1996), TAR (2001), AR4 (2007)
4 Motivation - Gap between Application Demands and I/O System Capability
Gyrokinetic Toroidal Code (GTC)
- Outputs particle data that consists of two 2D arrays, for electrons and ions respectively
- The two arrays are distributed among all cores; particles can move across cores in a random manner as the simulation evolves
A production run at the scale of 16,384 cores:
- Each core outputs roughly two million particles, 260GB in total
- Desires O(100MB/s) for efficient output
The average I/O throughput of Jaguar (now Titan) is around 4.7MB/s per node
There is a large and growing gap between the application's requirements and the system's capability
5 Motivation - Reducing Data Movement
For large-scale simulations, reducing data movement over the network is essential for performance improvement
E.g. active storage has demonstrated potential for analysis applications (read-intensive):
- Moving the analysis kernel near to the data
- Transferring small-sized results over the I/O network
What about scientific simulations?
- They write a large volume of data
- The data needs to be stored for future processing and analysis
- High storage capacity and bandwidth demands
6 Motivation - Data Deduplication
Widely used in backup storage systems to reduce storage requirements
Eliminates duplicated copies of data using a signature representation
Identical segment deduplication is the most widely used approach (sketched in code below)
- Breaks data or a stream into contiguous segments
- Utilizes the Secure Hash Algorithm (SHA) to calculate a fingerprint for each segment
Impressive performance in terms of both compression ratio and throughput, e.g. up to a 20:1 compression ratio
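To make the mechanism concrete, here is a minimal C sketch of identical segment deduplication (an illustration, not the presenters' code; the toy in-memory fingerprint table, linear lookup, and 8KB segment size are assumptions):

```c
/* Minimal identical segment deduplication sketch. Build: gcc dedup.c -lcrypto */
#include <openssl/sha.h>
#include <stdio.h>
#include <string.h>

#define SEG_SIZE 8192                    /* fixed segment size, e.g. 8KB    */
#define MAX_FPS  1024                    /* toy in-memory fingerprint table */

static unsigned char table[MAX_FPS][SHA_DIGEST_LENGTH];
static int nfps = 0;

/* Return 1 if fp is new (and record it), 0 if it is a duplicate. */
static int lookup_or_insert(const unsigned char *fp)
{
    for (int i = 0; i < nfps; i++)
        if (memcmp(table[i], fp, SHA_DIGEST_LENGTH) == 0)
            return 0;                    /* duplicate: keep only the fingerprint */
    memcpy(table[nfps++], fp, SHA_DIGEST_LENGTH);
    return 1;                            /* new segment: its data must be stored */
}

int main(void)
{
    unsigned char data[SEG_SIZE * 10], fp[SHA_DIGEST_LENGTH];
    memset(data, 'x', sizeof(data));     /* a highly repetitive stream */

    int unique = 0, total = 0;
    for (size_t off = 0; off < sizeof(data); off += SEG_SIZE, total++) {
        SHA1(data + off, SEG_SIZE, fp);  /* 20-byte fingerprint per segment */
        unique += lookup_or_insert(fp);
    }
    printf("%d unique of %d segments\n", unique, total);  /* prints: 1 unique of 10 */
    return 0;
}
```

Only unique segments reach storage; each duplicate contributes just a 20-byte fingerprint, which is where the compression ratio comes from.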
7 Data Deduplication in a Hybrid Architecture (DDiHA)
We explore utilizing data deduplication to reduce the data volume for write-intensive applications on the data path:
- Using dedicated nodes (deduplication nodes) for deduplication operations
- Building a deduplication framework before data is transferred over the network
- Evaluating the performance with both theoretical analysis and prototyping
8 Decoupled Execution Paradigm
DDiHA is designed based on a decoupled system architecture
Explores how to use the compute-side nodes to improve the write performance
Figure: Decoupled HPC System Architecture (compute nodes, compute-side data nodes, and storage-side data nodes)
9 DDiHA Overview
Dedicated deduplication nodes serve the compute nodes and are connected through a high-speed network
Deployed with a global address space library to support a shared data deduplication approach
Fingerprints are organized in a globally shared table (global index) for a systematic view and improved deduplication ratios
Writes go through the streaming deduplication API (SD API, an enhanced MPI-IO API)
Deduplicated data is written to storage through a PFS API
10 DDiHA Architecture
Compute nodes: applications issue writes through MPI-IO, with both an FS API and the SD API, into data chunk buffers
Deduplication nodes: perform data deduplication against a global index based on a global address space, buffer data on SSDs, and write to storage over the network through the FS API
Abbreviations: FS - file system; SSD - solid state drive; SD - streaming deduplication
Figure 5: Architecture of DDiHA
11 DDiHA API
Two APIs: SD_Open and SD_Write (no read path)
MPI-IO open/write was revised
Users add the prefix "sd" to the file name when opening a file
Data is directed to the deduplication nodes until the file is closed
Sample code:
    MPI_Comm comm;
    char *filename = "sd:pvfs2:/path/data";
    MPI_Info info;
    MPI_File fh;
    MPI_File_open(comm, filename, ...);
    ...
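For context, a fuller usage sketch under the convention above (hypothetical: the buffer size, access flags, and the write call are illustrative additions; the slide only shows the open):

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int count = 8192 * 1024;             /* illustrative output size */
    char *buf = malloc(count);
    memset(buf, 0, count);

    MPI_File fh;
    MPI_Status status;
    /* The "sd:" prefix routes this file through the deduplication nodes. */
    MPI_File_open(MPI_COMM_WORLD, "sd:pvfs2:/path/data",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write(fh, buf, count, MPI_BYTE, &status);  /* intercepted by the SD API */
    MPI_File_close(&fh);                       /* data flows to dedup nodes until close */

    free(buf);
    MPI_Finalize();
    return 0;
}
```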
12 DDiHA Runtime
Uses the identical segment deduplication algorithm
- Divides the data stream into fixed-size segments, e.g. 8KB
- Computes a fingerprint for each segment with the SHA-1 algorithm
- Each fingerprint is compared against the stored fingerprints, or inserted as a new one
Deduplication nodes maintain a global fingerprint table
The fingerprint table can be implemented as a binary search tree distributed across the deduplication nodes (one possible realization is sketched below)
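One plausible realization of the distributed fingerprint table (an assumption for illustration, not the presenters' implementation): a digest's leading byte picks the owning deduplication node, and each node keeps its share in a binary search tree ordered by digest.

```c
#include <stdlib.h>
#include <string.h>

#define FP_LEN 20                        /* SHA-1 digest length */

struct fp_node {                         /* BST node keyed on the full digest */
    unsigned char fp[FP_LEN];
    struct fp_node *left, *right;
};

/* Pick the dedup node that owns this fingerprint (hypothetical routing rule). */
int owner(const unsigned char *fp, int ndedup_nodes)
{
    return fp[0] % ndedup_nodes;
}

/* Insert fp into the owning node's local tree; return 1 if new, 0 if duplicate. */
int insert(struct fp_node **root, const unsigned char *fp)
{
    while (*root) {
        int c = memcmp(fp, (*root)->fp, FP_LEN);
        if (c == 0)
            return 0;                    /* fingerprint already in the global index */
        root = (c < 0) ? &(*root)->left : &(*root)->right;
    }
    *root = calloc(1, sizeof(**root));
    memcpy((*root)->fp, fp, FP_LEN);
    return 1;
}
```

Routing by digest bytes keeps the global index partitioned without coordination, since SHA-1 digests are effectively uniformly distributed.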
13 Theoretical Modeling
Aims to answer the following questions:
- What is the performance gain/loss?
- What are the desired configurations for deduplication nodes?
Methodology:
- Assume deduplication nodes are identical to compute nodes
- Assume the task of an application can be divided into two parts: computation and I/O
- Compare the performance with a traditional system given the same resources
Figure 6: Execution Model of Scientific Applications (each compute node alternates between simulation/computation that generates the data and I/O operations to the file system)
14 Theoretical Modeling
Execution model with DDiHA:
- Case 1: deduplication cycle longer than the computing cycle
- Case 2: deduplication cycle shorter than the computing cycle
  - The expected case, overlapping computation, deduplication, and I/O
  - Imposes a requirement on deduplication efficiency
Figure 7: Execution Model with DDiHA (compute nodes: simulation/computation and data transferring; data nodes: data processing and I/O operations to the file system)
Notations:
- W: workload, the data generated by the simulation, measured in MB
- W': output data volume of deduplication
- C: simulation/computation capability of each compute node for a given operation, measured in MB/s
- C': deduplication efficiency of each deduplication node for a given operation, measured in MB/s
- b: average I/O bandwidth of each compute node in the current architecture
- b_h: bandwidth of the high-speed network between compute nodes and deduplication nodes
- r: ratio of data nodes compared to compute nodes
- n: number of nodes
- k: the number of compute-I/O phases of an application
- λ = b_h / b
- φ = W' / W
15 Theoretical Modeling
Traditional performance:
$P = \frac{1}{T}$  (1)
$T = \left(\frac{W}{nC} + \frac{W}{nb}\right) k$  (2)
DDiHA performance:
$P' = \frac{1}{T'}$  (3)
$T' = \left(\frac{W}{n(1-r)C} + \frac{W}{n(1-r)b_h}\right) k$  (4)
Overlapping requirement (the deduplication nodes must finish processing and writing within one compute-node cycle):
$\frac{W}{nrC'} + \frac{W'}{nrb} \le \frac{W}{n(1-r)C} + \frac{W}{n(1-r)b_h}$  (5)
Performance gain:
$\Delta = \frac{P'}{P} = \frac{\left(\frac{W}{nC} + \frac{W}{nb}\right) k}{\left(\frac{W}{n(1-r)C} + \frac{W}{n(1-r)b_h}\right) k} = \frac{\lambda(C+b)(1-r)}{\lambda b + C}$  (6)
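The last step of (6) follows by cancelling the common factors $W$, $n$, and $k$ and substituting $b_h = \lambda b$:

$\Delta = \frac{T}{T'} = (1-r)\,\frac{\frac{1}{C} + \frac{1}{b}}{\frac{1}{C} + \frac{1}{\lambda b}} = (1-r)\,\frac{(b+C)/(Cb)}{(\lambda b + C)/(\lambda b C)} = \frac{\lambda(C+b)(1-r)}{\lambda b + C}$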
16 Theoretical Analysis
Baseline:
- Platform: Jaguar XT5, I/O bandwidth per node: 4.67MB/s
- Application: GTC, compute capacity on Jaguar XT5: 1.08MB/s
r-λ analysis (b = 4.7MB/s, C = 1.1MB/s):
Figure 8: Theoretical analysis of performance gain. Assuming deduplication is efficient enough for overlapping, DDiHA can achieve better performance by configuring an appropriate share of deduplication nodes (around 10%)
Condition analysis (b = 4.7MB/s, C = 1.1MB/s, r = 0.1, λ = 4):
Figure 9: Theoretical analysis of the required deduplication efficiency C'. If deduplication can reduce the data size by more than 10%, the required deduplication efficiency is no more than 20MB/s, which is easy to achieve
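A quick numerical check of Eq. (6) at this baseline (taking λ = 4 and r = 0.1 from the condition analysis above; these specific values are assumptions read off the slide):

```c
#include <stdio.h>

int main(void)
{
    double b = 4.7;       /* per-node I/O bandwidth on Jaguar XT5 (MB/s)    */
    double C = 1.1;       /* per-node data generation rate for GTC (MB/s)   */
    double lambda = 4.0;  /* b_h / b: high-speed network over I/O bandwidth */
    double r = 0.1;       /* fraction of nodes dedicated to deduplication   */

    /* Eq. (6): performance gain of DDiHA over the traditional paradigm. */
    double gain = lambda * (C + b) * (1.0 - r) / (lambda * b + C);
    printf("performance gain = %.3f\n", gain);  /* ~1.05: DDiHA comes out ahead */
    return 0;
}
```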
17 Platform and Setup
Platform:
- Name: DISCFarm cluster
- Number of nodes: 16
- CPU: dual quad-core 2.6GHz
- Memory: 4GB per node
Setup (Traditional / DDiHA):
- Computing cores:
- Deduplication cores: 0 / 2
- Total cores:
Methodology: simulating GTC application behavior
Data size: 260MB to 8GB
18 Deduplication Bandwidth
65MB/s processing capacity/bandwidth without any optimization
Meets the basic requirement from the theoretical modeling (20MB/s)
Figure 10: Efficiency of data deduplication (deduplication bandwidth in MB/s vs. data size in MB)
19 Compression Ratio
Analyzed and compared the compression ratio (raw : deduped)
Observed different compression ratios for different data sets, from 0 (dataset4) to 16 (dataset3)
Figure 11: Compression ratio of data deduplication for different data sets (dataset1 through dataset5)
20 I/O Improvement
For the traditional paradigm, we measure the time of writing data to storage
For DDiHA, we measure the time of writing data to the deduplication nodes (the I/O from the application's view)
Figure 12: I/O improvement from the application's view (speedup vs. data size in MB)
21 Overall Performance Comparison
Measures the total execution time of an application (a simulation of GTC in this work)
Both computation and I/O are included; data is transferred all the way to storage
DDiHA reduced the execution time more than 3x compared with the traditional paradigm
Figure 13: Overall performance comparison between DDiHA and the traditional paradigm (execution time in seconds vs. data size in MB, DEDUP vs. CONVENTIONAL)
22 Conclusion and Future Work
Big data computing brings new opportunities but also poses big challenges
Trading part of the computing resources for data reduction can be helpful and critical for system performance
An initial investigation of data deduplication on the write path for write-intensive applications
- Beneficial because of its efficiency and compression ratio
Theoretical analysis and prototyping were conducted to evaluate the potential of DDiHA
We plan to further explore deduplication algorithms and how to leverage features of scientific datasets to achieve higher compression ratios
23 Any Questions? Thank You! Welcome to visit:
