PASIG May 12, 2012. Jacob Farmer, CTO Cambridge Computer




Adding Intelligence to Conventional NAS and File Systems: Metadata, Backups, and Data Life Cycle Management
PASIG, May 12, 2012
Presented by: Jacob Farmer, CTO, Cambridge Computer
Copyright 2009-2011, Cambridge Computer Services, Inc. All Rights Reserved
www.cambridgecomputer.com
781-250-3000

My Background and My Company
- Jacob Farmer, CTO, Cambridge Computer
  - 25 years of experience with data storage
- My company: Cambridge Computer
  - Founded in 1991 (20 years this July)
  - Roughly 70 people, spread around the country
  - Expertise in data storage
  - Unusual business model: like a broker or agent
    - We help our clients select and deploy the appropriate storage technologies
    - There are typically no fees or additional costs for our service
    - Popular business model for higher education and research!

Who Are My Clients: People Who Like Free Help and Special Deals
- Universities
- Research institutions
  - Independent labs
  - Divisions in the big government labs
- Libraries, museums, cultural institutions
- Some government agencies
- Industry
  - Manufacturing
  - Pharmaceutical
  - Finance
  - Healthcare
  - Oil and gas
  - Etc.

Focus Areas for Novel Ways of Managing Data
- Scientific research, in particular Life Sciences
  - University labs
  - Independent labs
  - Research divisions in pharmaceutical companies
- Digital asset management
  - Especially with home-grown software applications
  - Especially for institutions with multiple stove-piped DAM systems

Our Work: Defining and Refining Use Cases for SRB and IRODS
- The Cambridge Computer team is working with SRB (Storage Resource Broker) and IRODS (Integrated Rule-Oriented Data System) to solve common storage management problems:
  - Backup
  - Life Cycle Management
  - Collaboration
- Our goal is to make these platforms easy to deploy and to solve low-hanging-fruit problems.
- We are looking for collaborators, potential guinea pigs, and general feedback.
- Ultimately (later this year or next) we are looking for customers!

SRB / IRODS History
- 1995/97: Storage Resource Broker (SRB) developed and deployed at the San Diego Supercomputer Center
  - DICE Group: Data Intensive Cyber Infrastructure
  - Academic license used by roughly 200 government and academic applications
- 2001: SRB forks into a commercial version (Nirvana Storage) developed by General Atomics
  - Available as commercial software
  - 10 years of commercial-grade development and deployment
- 2008: IRODS (Integrated Rule-Oriented Data System) replaces the academic SRB
  - Features an integrated rules engine
  - Open source under the Berkeley License

Fundamental Concepts of IRODS
- Inventory your files by crawling the file system and making a database entry for each file and directory
- Associate metadata with directories
- Apply storage management rules based on file system and extended metadata
- Federate storage devices and user directories with a virtual global file system
- The rules engine runs in real time or in batch
- Micro-services: routines that are called by the rules engine
  - Make them simple and discrete
  - Run a bunch of small micro-services together to carry out the full suite of functionality that you seek
  - Replace or update functionality at a micro-level
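
The first two concepts on this slide (crawl the file system into a database, then hang metadata on directories) can be sketched in a few lines of Python. This is a minimal illustration using SQLite, not how IRODS implements its catalog; the table names `entries` and `dir_meta` and the functions `build_inventory`, `tag_directory`, and `files_in_tagged_dirs` are hypothetical.

```python
import os
import sqlite3
import tempfile
import time


def build_inventory(root: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Crawl `root` and make one catalog row per directory and per file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "path TEXT PRIMARY KEY, kind TEXT, size INTEGER, mtime REAL, scanned_at REAL)"
    )
    # Descriptive metadata is attached to directories as attribute/value pairs.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dir_meta ("
        "path TEXT, attr TEXT, value TEXT, PRIMARY KEY (path, attr))"
    )
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        st = os.stat(dirpath)
        conn.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, 'dir', 0, ?, ?)",
            (dirpath, st.st_mtime, now),
        )
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            conn.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, 'file', ?, ?, ?)",
                (full, st.st_size, st.st_mtime, now),
            )
    conn.commit()
    return conn


def tag_directory(conn: sqlite3.Connection, path: str, attr: str, value: str) -> None:
    """Associate one attribute/value pair with a directory."""
    conn.execute("INSERT OR REPLACE INTO dir_meta VALUES (?, ?, ?)", (path, attr, value))
    conn.commit()


def files_in_tagged_dirs(conn: sqlite3.Connection, attr: str, value: str) -> list:
    """Files anywhere beneath a directory tagged attr=value (exact prefix match)."""
    rows = conn.execute(
        "SELECT DISTINCT e.path FROM entries e JOIN dir_meta m"
        " ON substr(e.path, 1, length(m.path) + 1) = m.path || ?"
        " WHERE e.kind = 'file' AND m.attr = ? AND m.value = ?"
        " ORDER BY e.path",
        (os.sep, attr, value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        run = os.path.join(root, "experiment3_run23")
        os.makedirs(run)
        with open(os.path.join(run, "raw.dat"), "w") as f:
            f.write("data")
        conn = build_inventory(root)
        tag_directory(conn, run, "retention", "keep")
        print(files_in_tagged_dirs(conn, "retention", "keep"))
```

Because directory metadata is resolved by path prefix, tagging one directory covers every file beneath it. The `substr` comparison is used instead of `LIKE` because underscores, common in research directory names, are wildcards in `LIKE`.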

What Can You Do with IRODS? Everything and nothing!!!
- It is not an application; it is middleware
- It is a framework for how to manage data
- It is not commercial-grade software
  - The core of the system is well-written
  - It is not fully documented
  - It is missing features that are critical for most enterprise IT shops
  - The grant money pays for new ideas and new features, not for refining code or adding ho-hum features

Common Pain Points in Storage Management for Research Data
- Migrating files between storage systems
- Data protection (backup, replication, data integrity)
- Satisfying the NSF requirement for Data Management Plans
- Separating important data from not-so-important data and ensuring preservation of important data
- Finding data: machines, users, applications
  - Especially after it moves
- Disposing of data
- Collaboration
- Cost: leveraging lower-cost storage devices
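
For the migration and data-integrity items, one common pattern is to checksum a file before and after a copy, delete the source only when the two match, and keep the checksum for later audits. A minimal sketch, assuming SHA-256 and local paths; `migrate_with_fixity` and `audit` are hypothetical helper names, not part of SRB or IRODS.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity(src: str, dst: str) -> str:
    """Copy src to dst, verify the copy, and only then delete src.

    Returns the checksum so it can be recorded in a catalog for later audits.
    """
    before = sha256_of(src)
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copy2(src, dst)
    if sha256_of(dst) != before:
        os.remove(dst)  # discard the bad copy; the source is still intact
        raise OSError(f"checksum mismatch migrating {src} -> {dst}")
    os.remove(src)
    return before


def audit(path: str, expected: str) -> bool:
    """Periodic fixity check: re-hash and compare with the recorded value."""
    return sha256_of(path) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "tier1", "run23.dat")
        os.makedirs(os.path.dirname(src))
        with open(src, "wb") as f:
            f.write(b"instrument output")
        dst = os.path.join(root, "tier2", "run23.dat")
        checksum = migrate_with_fixity(src, dst)
        print("migrated; audit passes:", audit(dst, checksum))
```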

Problem: Conventional File System Metadata Is Insufficiently Descriptive
- Problem: Conventional file systems are not descriptive enough for defining policies or for describing data beyond a single individual's memory.
- Solution: Associate descriptive metadata with files, and apply data management policies based on that metadata.
- Typical directory names:
  \\myserver\mydirectory\stuff\
  \\myserver\mydirectory\copy_of_stuff\
  \\myserver\mydirectory\more_stuff_do_not_delete\
  \\myserver\mydirectory\yet_more_stuff_save_for_comparison\
  \\myserver\mydirectory\raw_results_from_experiment3_run23_march-10\
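
To make "apply data management policies based on that metadata" concrete, here is a hypothetical sketch in which each policy is a condition on a record's metadata plus a list of small micro-service steps, echoing the micro-services idea from the IRODS concepts slide. The attribute names (`data_type`, `retention`, `tier`) and the steps are invented for illustration; real steps would copy or move bytes rather than append to a log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """A catalog entry: a path plus descriptive metadata (hypothetical)."""
    path: str
    meta: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


# Micro-services: tiny single-purpose steps. These only record what they
# would do; real ones would copy, move, or notify.
def replicate_offsite(rec: Record) -> None:
    rec.log.append("replicate:offsite")


def move_to_tier(tier: str) -> Callable[[Record], None]:
    def step(rec: Record) -> None:
        rec.meta["tier"] = tier
        rec.log.append(f"tier:{tier}")
    return step


def flag_for_review(rec: Record) -> None:
    rec.log.append("review:owner")


@dataclass
class Policy:
    name: str
    condition: Callable[[Dict[str, str]], bool]
    steps: List[Callable[[Record], None]]


POLICIES = [
    Policy(
        "preserve raw results",
        lambda m: m.get("data_type") == "raw" and m.get("retention") == "permanent",
        [replicate_offsite, move_to_tier("archive")],
    ),
    Policy(
        "review scratch copies",
        lambda m: m.get("data_type") == "scratch",
        [flag_for_review],
    ),
]


def apply_policies(rec: Record, policies: List[Policy] = POLICIES) -> List[str]:
    """Run every matching policy's micro-services in order; return the action log."""
    for policy in policies:
        if policy.condition(rec.meta):
            for step in policy.steps:
                step(rec)
    return rec.log


if __name__ == "__main__":
    rec = Record(
        r"\\myserver\mydirectory\raw_results_from_experiment3_run23_march-10",
        {"data_type": "raw", "retention": "permanent"},
    )
    print(apply_policies(rec))  # ['replicate:offsite', 'tier:archive']
```

Keeping each step small means a policy change is a change to a list, not to the steps themselves, which is the "replace or update functionality at a micro-level" point in practice.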

Problem: When Data Moves, Things Break
- If you move someone else's data, they may never find it again
  - At the very least, they will complain vocally
- If you move data, you may break essential links to metadata
  - Some content management applications know how to move files or can be updated when files are moved. Others cannot.
  - Data tracking applications are often written by amateur programmers who are focused 100% on algorithms, not on data management.
- Users may have created applications as simple as spreadsheets that reference UNC paths
- Complex files often contain hard links to other files or objects
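
One way to keep references from breaking is the logical-namespace idea used by SRB and IRODS: applications hold a stable logical name, and only a catalog knows the current physical location. The class below is a minimal, hypothetical sketch of that indirection, with a Python dictionary standing in for the catalog database.

```python
import os
import shutil
import tempfile
from typing import IO, Dict


class LogicalNamespace:
    """Hypothetical catalog mapping stable logical names to physical paths.

    Applications keep the logical name; a migration changes one catalog
    entry instead of every reference to the file.
    """

    def __init__(self) -> None:
        self._location: Dict[str, str] = {}  # stand-in for a catalog database

    def register(self, logical: str, physical: str) -> None:
        self._location[logical] = physical

    def resolve(self, logical: str) -> str:
        return self._location[logical]

    def move(self, logical: str, new_physical: str) -> None:
        """Relocate the bytes, then repoint the logical name at the new copy."""
        old_physical = self._location[logical]
        os.makedirs(os.path.dirname(new_physical) or ".", exist_ok=True)
        shutil.move(old_physical, new_physical)
        self._location[logical] = new_physical

    def open(self, logical: str, mode: str = "r") -> IO:
        return open(self.resolve(logical), mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        fast = os.path.join(root, "fast", "run23.csv")
        os.makedirs(os.path.dirname(fast))
        with open(fast, "w") as f:
            f.write("a,b\n")
        ns = LogicalNamespace()
        ns.register("/lab/experiment3/run23.csv", fast)
        ns.move("/lab/experiment3/run23.csv", os.path.join(root, "cheap", "run23.csv"))
        with ns.open("/lab/experiment3/run23.csv") as f:
            print(f.read())
```

Spreadsheets that hard-code UNC paths still break under this scheme; the indirection only protects applications that resolve names through the catalog.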

Problem: How Do You Get the Metadata?
- Can you extract it from tags embedded in the files?
- Can you infer it from the way or the frequency with which data is used?
- Can you pick it up at the point of creation?
- Can you capture it at various stages in a pipeline?
- Can you get the users to do data entry? How?
  - Can you beat it out of them? (the stick)
  - Can you give them incentives? (the carrot)
  - Some combination of both?
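
For the "can you infer it" question, a cheap first pass is to mine the naming conventions people already use, such as directory names like `raw_results_from_experiment3_run23_march-10`. The patterns and keyword table below are hypothetical and would need tuning per lab; the substring matching is deliberately naive (for example, `raw` would also match `drawing`).

```python
import re
from typing import Dict

# Hypothetical conventions modeled on names like
# raw_results_from_experiment3_run23_march-10
PATTERNS = {
    "experiment": re.compile(r"experiment(\d+)"),
    "run": re.compile(r"run(\d+)"),
    "date": re.compile(r"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*-(\d{1,2})"),
}

# Keyword -> (attribute, value). Naive substring matching.
KEYWORDS = {
    "raw": ("data_type", "raw"),
    "copy_of": ("data_type", "copy"),
    "do_not_delete": ("retention", "keep"),
    "save": ("retention", "keep"),
}


def harvest(name: str) -> Dict[str, str]:
    """Infer attribute/value pairs from a file or directory name."""
    lower = name.lower()
    meta: Dict[str, str] = {}
    for attr, pattern in PATTERNS.items():
        match = pattern.search(lower)
        if match:
            meta[attr] = "-".join(match.groups())
    for keyword, (attr, value) in KEYWORDS.items():
        if keyword in lower:
            meta[attr] = value
    return meta


if __name__ == "__main__":
    for name in ["raw_results_from_experiment3_run23_march-10", "more_stuff_do_not_delete"]:
        print(name, harvest(name))
```

Inferred values like these are best treated as suggestions for users to confirm, which is one form of the carrot: the system does most of the data entry for them.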

File System Middleware

Typical Content Management Stack

Typical Content Management Stack with Conventional Data Protection

Inserting File System Middleware

Where Do We Live: In-Band or Out-of-Band?
- In-Band: If the solution sits in the data path, it will introduce latency. This is okay for:
  - Archiving solutions
  - Desktop file access
  - WAN file access, where WAN latency dwarfs the virtualization layer's latency
  - Other applications where performance does not matter
- Out-of-Band: No impact on performance, but:
  - Some lag time for the system to synchronize
  - Lots of file system crawling
  - Need really slick user interfaces to entice users to embrace the system
  - Need some kind of carrot/stick mechanism to get users to do your bidding
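
The out-of-band approach means the catalog learns about changes only when it re-crawls. A minimal sketch of that synchronization step, assuming size and modification time are good enough change signals (content hashing would be stricter but far slower on large trees); `take_snapshot` and `diff_snapshots` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time)
Snapshot = Dict[str, Tuple[int, float]]


def take_snapshot(root: str) -> Snapshot:
    """One crawl of the tree: record size and mtime for every file."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # deleted between listing and stat: a crawl is never instantaneous
            snap[full] = (st.st_size, st.st_mtime)
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Compare two crawls and return (added, removed, changed) paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, changed
```

In use, the previous snapshot is kept between crawls and only the diff is pushed into the catalog. A plain diff reports a moved file as one removal plus one addition, which is the "when data moves, things break" problem again; pairing them up needs extra evidence such as matching checksums.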

Where Do We Sit: In-Band or Out-of-Band?
- Our initial goal is to sit outside of the data path
  - Unobtrusive
  - If our product breaks, we don't take systems down
  - Quality assurance is also a lot easier
  - Research computing will not tolerate in-band latency
- Someday we hope to sit in the data path
  - FUSE
  - NFS/CIFS
- Ideally, we would be a hybrid of both