PETASCALE DATA STORAGE INSTITUTE. SciDAC @ Petascale storage issues. 3 universities, 5 labs, G. Gibson, CMU, PI

Similar documents
SciDAC Petascale Data Storage Institute

Update on Petascale Data Storage Institute

EOFS Workshop Paris Sept, Lustre at exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

CS848 Project Presentation Cloud Personal Data Management. Robert Robinson

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Storage Architectures for Big Data in the Cloud

Physical Data Organization

Parallel Large-Scale Visualization

In Memory Accelerator for MongoDB

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

Design and Evolution of the Apache Hadoop File System(HDFS)

W I S E. SQL Server 2008/2008 R2 Advanced DBA Performance & WISE LTD.

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

DualFS: A New Journaling File System for Linux

Using RDBMS, NoSQL or Hadoop?

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

Distributed File Systems

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Data storage Tree indexes

THE EXPAND PARALLEL FILE SYSTEM A FILE SYSTEM FOR CLUSTER AND GRID COMPUTING. José Daniel García Sánchez ARCOS Group University Carlos III of Madrid

Oracle Database 11 g Performance Tuning. Recipes. Sam R. Alapati Darl Kuhn Bill Padfield. Apress*

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Storage Management. in a Hybrid SSD/HDD File system

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

Implementing Network Attached Storage. Ken Fallon Bill Bullers Impactdata

Sun Storage Perspective & Lustre Architecture. Dr. Peter Braam VP Sun Microsystems

How to Choose Between Hadoop, NoSQL and RDBMS

Apache Hadoop FileSystem and its Usage in Facebook

The Google File System

COS 318: Operating Systems

Running a Workflow on a PowerCenter Grid

Petascale Software Challenges. Piyush Chaudhary High Performance Computing

Architectures for massive data management

Use Cases for Large Memory Appliance/Burst Buffer

The World According to the OS. Operating System Support for Database Management. Today s talk. What we see. Banking DB Application

In-Memory Databases MemSQL

Design: Metadata Cache Logging

Introduction. Part I: Finding Bottlenecks when Something s Wrong. Chapter 1: Performance Tuning 3

InfiniteGraph: The Distributed Graph Database

BabuDB: Fast and Efficient File System Metadata Storage

The Methodology Behind the Dell SQL Server Advisor Tool

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Planning Domain Controller Capacity

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Archival Storage At LANL Past, Present and Future

Understanding Disk Storage in Tivoli Storage Manager

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies

Cray: Enabling Real-Time Discovery in Big Data

Network File System (NFS) Pradipta De

Availability Digest. Raima s High-Availability Embedded Database December 2011

Hadoop & its Usage at Facebook

LLNL s Parallel I/O Testing Tools and Techniques for ASC Parallel File Systems

Windows NT File System. Outline. Hardware Basics. Ausgewählte Betriebssysteme Institut Betriebssysteme Fakultät Informatik

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Tivoli Storage Manager Explained

Parallel Processing and Software Performance. Lukáš Marek

Outline. Windows NT File System. Hardware Basics. Win2K File System Formats. NTFS Cluster Sizes NTFS

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

SEIZE THE DATA SEIZE THE DATA. 2015

PostgreSQL Backup Strategies

XenData Archive Series Software Technical Overview

Tuning Your GlassFish Performance Tips. Deep Singh Enterprise Java Performance Team Sun Microsystems, Inc.

Demand Attach / Fast-Restart Fileserver

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Performance Analysis of Mixed Distributed Filesystem Workloads

Recovery Principles in MySQL Cluster 5.1

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

THE HADOOP DISTRIBUTED FILE SYSTEM

Database storage management with object-based storage devices

Distributed storage for structured data

SQL Server 2008 Performance and Scale

Beyond Embarrassingly Parallel Big Data. William Gropp

Speeding Up Cloud/Server Applications Using Flash Memory

Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments

Distributed File Systems

File System Management

DataBlitz Main Memory DataBase System

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

CSE 120 Principles of Operating Systems

Benchmarking Cassandra on Violin

InterBase SMP: Safeguarding Your Data from Disaster

Lecture 11. RFS A Network File System for Mobile Devices and the Cloud

MongoDB Developer and Administrator Certification Course Agenda

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Microsoft SQL Server Beginner course content (3-day)

Google File System. Web and scalability

Configuring Apache Derby for Performance and Durability Olav Sandstå

Ryusuke KONISHI NTT Cyberspace Laboratories NTT Corporation

File Systems Management and Examples

Quantcast Petabyte Storage at Half Price with QFS!

Transcription:

PETASCALE DATA STORAGE INSTITUTE 3 universities, 5 labs, G. Gibson, CMU, PI SciDAC @ Petascale storage issues www.pdsi-scidac.org Community building: ie. PDSW-SC07 (Sun 11th) APIs & standards: ie., Parallel NFS, POSIX Failure data collection, analysis: ie., CFDR Performance trace collection & benchmark publication IT automation applied to HEC systems & problems Novel mechanisms for core (esp. metadata, wide area) www.pdl.cmu.edu & www.pdsi-scidac.org 1

Earthquake simulation c/o Dave O Hallaron, www.cs.cmu.edu/~quake Parallel supercomputer w/1-10ks PEs DFS Mesh generation Solving x20k timesteps Analysis/ visualization Velocity model Unstructured mesh 4D vel/dis wavefield 10s MB 10-100s GB 1-10s TB www.pdl.cmu.edu & www.pdsi-scidac.org 2

Motivation: Earthquake analysis Serialization of model, mesh and wavefield into one database file Total is generally much too large to be entirely loaded into memory Multiple regions of the file contain data of different types Different types of data see different access patterns B-tree on mesh gets random tiny accesses Mesh descriptors get random medium fetches Wavefields get larger accesses, but concurrency makes it look random Basic problem: file system sees no internal structure information Prefetching, allocation, caching strategies assume all data is of same type Mixed access patterns strongly confuse file system policies Fall back on safe, probably slower policies: no prefetching, serial locking What if file system knew more about compound, out-of-core files? Hints? Tradition is too few codes issue them, too few systems use them, too hard to debug performance implications of hints What alternatives? Partial schema definitions: regions that have different types of access Access Methods: embedded some structures in FS (B-trees, arrays ) www.pdl.cmu.edu & www.pdsi-scidac.org 3

Motivation: file count scales w/ FLOPS? Understanding File Systems at Rest (www.pdsi-scidac.org/fsstats/) Just getting started with data collection, but already see lots of tiny files Manipulation of small files, and attributes of small files is a growing problem Small things cost mechanical positioning -- orders of magnitude slower Large files may in fact be collections of small things (HDF, netcdf, etc) If sequentially loaded into memory and written from memory, cool But if accessed out-of-core, really much the same problem Gets worse as numbers of things gets larger, if not sequential access www.pdl.cmu.edu & www.pdsi-scidac.org 4

Motivation: HugeDirs -> AttributeDB CMU adding directories of 1B to 1T small files to PVFS Partitioning entries into buckets distributed over all storage servers Building index structure for insert(), delete(), lookup() no range query, but unordered complete scan must be fast (readdir) Making highly parallel, minimal bottlenecks, minimal coherency Believe this is extensible to range queries (concurrent B-trees) More challenging to fully load balance if key access patterns arbitrary Probable next steps to do this however Simple extension to not create actual files, just manipulate records Special directory is key, opaque data tuples Use for extended attributes of small files to optimize to access patterns Ie., type: mp3, code, pdf, checkpoint,. Find all by type Ie., timestamp of last modification, find N oldest Ie,. Size, find N largest Apply powerful aggregation structures as well as indices: ie., fastbits Could this be used for variable stores? www.pdl.cmu.edu & www.pdsi-scidac.org 5

Motivation: ChangeLog Representation Fastest checkpoint might be a sequential series of variable=value Instead of seeking to serialized location, just append operation to log Each thread writes strictly sequential log of operations Meaning of set of logs is applying log to (possibly null) initial database Decouple writing logs from applying logs to serialized database Optimize each separately; pipeline from compute to IO nodes Defer serializing by just storing changelogs for later application Some checkpoints never read, so never serialized If read before serialized, trigger serialization (or something smarter) Represent logs as attributes of database; that is, hide in FS (directory?) Important tricks: block logs so each can be applied in parallel later; type logs so no overwrite can be known and any order allowed Not always best representation If data sequential anyway, operation encoding probably larger If read intermingled with write, might force inefficient serialization www.pdl.cmu.edu & www.pdsi-scidac.org 6

Issues: Library vs File System Library is user code bound into application File has structure if the right library is bound into app touching it Rapid development of new ease of programming APIs Optimization for actual file system is best guess of library writers Especially a problem if ease of programming was big goal File system is system code independent of application File structure known to file system, facilitating optimization Slower development and proliferation as file system is a broad service FS is always-there service; easy delayed processing, rebalancing, recovery FS is inherently distributed, aware of actual servers, changes in servers Deployment of new file system APIs? Changes in parallel file systems in progress: pnfs, HEC Posix extensions Mitigation for deployment issues: canoncial representations Any file system offering a specialized API also offers: Canonicalization (tar, untar) routines and library implementation written to canonical representation, which does not have to be (as) fast [special case of the backup problem & NDMP solution] www.pdl.cmu.edu & www.pdsi-scidac.org 7

Next steps? Select and work through a few examples www.pdl.cmu.edu & www.pdsi-scidac.org 8