Data Management using Hierarchical Storage Management (HSM) with 3-Tier Storage Architecture



Similar documents
OPTIMIZING EXCHANGE SERVER IN A TIERED STORAGE ENVIRONMENT WHITE PAPER NOVEMBER 2006

The Archival Upheaval Petabyte Pandemonium Developing Your Game Plan Fred Moore President

Archive Data Retention & Compliance. Solutions Integrated Storage Appliances. Management Optimized Storage & Migration

EaseTag Cloud Storage Solution

Next Generation NAS: A market perspective on the recently introduced Snap Server 500 Series

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

THE CASE FOR ACTIVE DATA ARCHIVING

Hitachi NAS Platform and Hitachi Content Platform with ESRI Image

Keys to Successfully Architecting your DSI9000 Virtual Tape Library. By Chris Johnson Dynamic Solutions International

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

The Microsoft Large Mailbox Vision

XenData Video Edition. Product Brief:

Save Time and Money with Quantum s Integrated Archiving Solution

Understanding Disk Storage in Tivoli Storage Manager

W H I T E P A P E R T h e C r i t i c a l N e e d t o P r o t e c t M a i n f r a m e B u s i n e s s - C r i t i c a l A p p l i c a t i o n s

DATA PROGRESSION. The Industry s Only SAN with Automated. Tiered Storage STORAGE CENTER DATA PROGRESSION

Implementing a Digital Video Archive Using XenData Software and a Spectra Logic Archive

EMC CLARiiON Backup Storage Solutions

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Reducing Storage TCO With Private Cloud Storage

Long term retention and archiving the challenges and the solution

Virtualization s Evolution

Protect Microsoft Exchange databases, achieve long-term data retention

Using HP StoreOnce Backup systems for Oracle database backups

IBM System Storage DS5020 Express

Total Cost of Ownership Analysis

Implementing a Digital Video Archive Based on XenData Software

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication

Many government agencies are requiring disclosure of security breaches. 32 states have security breach similar legislation

EMC Virtual Infrastructure for Microsoft Applications Data Center Solution

DAS, NAS or SAN: Choosing the Right Storage Technology for Your Organization

Considerations when Choosing a Backup System for AFS

IBM Global Technology Services September NAS systems scale out to meet growing storage demand.

Introduction to Optical Archiving Library Solution for Long-term Data Retention

Data Protection Report 2008 Best Practices in Data Backup & Recovery

Pacific Life Insurance Company

XenData Archive Series Software Technical Overview

The future of Storage and Storage Management Using Virtualization to Increase Productivity. Storyflex VISION 2007 Hans Lamprecht NetApp SEE Vienna

Storage Backup and Disaster Recovery: Using New Technology to Develop Best Practices

Protecting enterprise servers with StoreOnce and CommVault Simpana

EMC Backup and Recovery for Microsoft SQL Server 2008 Enabled by EMC Celerra Unified Storage

Storage Switzerland White Paper Storage Infrastructures for Big Data Workflows

STORAGETEK VIRTUAL STORAGE MANAGER SYSTEM

W H I T E P A P E R R e a l i z i n g t h e B e n e f i t s o f Deduplication in a Backup and Restore System

Building a Successful Strategy To Manage Data Growth

How To Back Up A Computer To A Backup On A Hard Drive On A Microsoft Macbook (Or Ipad) With A Backup From A Flash Drive To A Flash Memory (Or A Flash) On A Flash (Or Macbook) On

White Paper. What is IP SAN?

Tier 2 Nearline. As archives grow, Echo grows. Dynamically, cost-effectively and massively. What is nearline? Transfer to Tape

Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup

Overcoming Backup & Recovery Challenges in Enterprise VMware Environments

巨 量 資 料 分 層 儲 存 解 決 方 案

Deduplication has been around for several

D2D2T Backup Architectures and the Impact of Data De-duplication

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief

Implementing an Automated Digital Video Archive Based on the Video Edition of XenData Software

Time Value of Data. Creating an active archive strategy to address both archive and backup in the midst of data explosion.

A Virtual Tape Library Architecture & Its Benefits

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise

IBM Tivoli Storage Manager

Disaster Recovery Strategies: Business Continuity through Remote Backup Replication

SAP Running on an EMC Virtualized Infrastructure and SAP Deployment of Fully Automated Storage Tiering

Acme Corporation Enterprise Storage Assessment Prepared by:

AFS Usage and Backups using TiBS at Fermilab. Presented by Kevin Hill

Microsoft Azure Cloud on your terms. Start your cloud journey.

Uncompromised business agility with Oracle, NetApp and VMware

The Modern Virtualized Data Center

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Comparison of Native Fibre Channel Tape and SAS Tape Connected to a Fibre Channel to SAS Bridge. White Paper

XenData Product Brief: SX-550 Series Servers for LTO Archives

IBM System Storage Executive Briefing Center Topics (Tucson)

White paper 200 Camera Surveillance Video Vault Solution Powered by Fujitsu

Every organization has critical data that it can t live without. When a disaster strikes, how long can your business survive without access to its

Scala Storage Scale-Out Clustered Storage White Paper

DATA ARCHIVING. The first Step toward Managing the Information Lifecycle. Best practices for SAP ILM to improve performance, compliance and cost

How To Protect Data On Network Attached Storage (Nas) From Disaster

Leveraging EMC Fully Automated Storage Tiering (FAST) and FAST Cache for SQL Server Enterprise Deployments

Data center virtualization

Effective Storage Management for Cloud Computing

Protect Data... in the Cloud

A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data centre cost savings exercise

Backup and Recovery Redesign with Deduplication

XenData Product Brief: SX-250 Archive Server for LTO

Application Brief: Using Titan for MS SQL

Oracle Database Backup Service. Secure Backup in the Oracle Cloud

IBM CommonStore Archiving Preload Solution

Assessing the OpEx Savings Driven by Nexsan Management Software

A Comparative TCO Study: VTLs and Physical Tape. With a Focus on Deduplication and LTO-5 Technology

Virtual Tape Systems for IBM Mainframes A comparative analysis

Transcription:

P - 388 Data Management using Hierarchical Storage Management (HSM) with 3-Tier Storage Architecture Manoj Kumar Kujur, B G Chaurasia, Sompal Singh, D P Singh GEOPIC, ONGC, Dehradun, E-Mail kujur_mk@ongc.co.in Summary With growing volumes of data in seismic data processing and interpretation for oil exploration, the efficient management of this data has become major factor determining the productivity of this industry. The trade-off between data availability and cost of storage necessitates implementing efficient solutions based on the hierarchy of the data usage and requirements. The paper aims at emphasizing the need of efficient and cost effective method of data storage and management in enhancing efficiency of API cycle of oil exploration. It introduces a solution based on Hierarchical Storage Management based on three tiered architecture of data storage and retrieval, based on hierarchy of online, near-on-line and off-line data to achieve the same. Using this three-tiered architecture, we are able to present a seamless, integrated data availability solution to the Geoscientists at reasonable costs and without much complexity in the method of usage. The HSM shall improve efficiency of data access, storage and backup mechanism. Introduction Oil and Gas Exploration is one of the most data intensive businesses in the world. Seismic data volume is growing exponentially every year with raw data sets as large as several terabytes. Seismic Data Processing & Interpretation have also become complex as more sophisticated and bigger algorithms are used to improve the resolution. Due to this large growth of big data volumes, the expenses of managing the big volumes of data. continue to increase. Up to 80% of exploration data of Oil and Gas Company is unstructured data. One of the biggest challenges they face is managing the vast quantity of data involved in oil and gas prospecting. Today Geoscientist spends lot of their time (up to 60 percent) finding appropriate data from this voluminous collection for use in analysis resulting in wastage of highly costly and productive time of the geoscientists and programmers. Situation warrants some technological solutions to improve this environment In brief the Oil and Gas Companies are facing Challenges in API Cycle arising from Exponential Data Growth, Dat Management Complexity, Ownership Cost, Information Security and Continuity and Regulatory Compliance requirements. 1.1. Key Challenges in API Cycle The amount of seismic data to be managed is growing exponentially because of advancements in exploration, acquisition systems with large number of channels and associated computer technology. Managing, backing up and copying seismic data and migrating it to new media types are expensive and labour- intensive process. The Device and Media obsolescence are very fast. The media degradation is also a very big concern. What is needed, what is not needed (Data Life Cycle management) is a big concern. Backup/Archival policies are not in place. Hard Disk being expensive media, there is a need to create an organization-wide

policy for disk space allocation. There is also a wastage of space due to data redundancy. Current Scenario Isolated pools of storage. Manual Backup to Tape / Robotics tape library. Redundant copies of data. Improper disk space management. Manual handling of data tapes. Most frequent accessed and less frequent accessed data is parked on performance disks. More time spent in data searching. Issues Resulting from Current Scenario Disk is always full everywhere. Demand for more and more storage. Low Disk utilization. Reduced productivity. Delayed project due to non-availability of free disk space. Delay in decision-making. Loss of data due to poor reliability backups. Slow data backup/restore. 1.2. Need for Data Management With volumes of stored data growing seemingly without limits, organizations are struggling to meet their ongoing storage demands. This needs better approach of data management. Effective Exploration Data Management is Critical to Success and leads to faster recovery and has a direct impact on operating efficiency and profitability of oil and gas companies. While the acquisition cost of high-performance disk storage continues to decrease, the cost of maintaining and managing large volumes of primary storage makes a storage strategy based solely on disk increasingly undesirable. The only alternative for many has been manually archiving data from primary disk to tape or other forms of storage a time consuming and error prone process that can inhibit or even prevent access to critical data at the time of its requirement of the geoscientists. For example, as data set ages in an archive, it can be automatically moved to a slower but less expensive form of storage. Using an HSM, an administrator can establish and state guidelines for how often different kinds of files are to be copied to a Tiered Storage Device. Once the guideline has been set up, the HSM software manages everything automatically. HSM adds to archiving and file protection for disaster recovery the capability to manage storage devices efficiently, especially in large-scale user environments where storage costs can mount rapidly. It also enables the automation of backup, archiving, and migration to the hierarchy of tiered storage devices in a way that frees users from having to be aware of the storage policies. Older files can automatically be moved to less expensive storage. If needed, they appear to be immediately accessible and can be restored transparently from the backup storage medium. 2.2. Tiered Storage Architecture: Tiered Storage is the assignment of different categories of data to different types of storage media in order to reduce total storage cost. It combines mix of primary storage, less expensive, lower- performance secondary storage, and least- expensive tape (archival) storage. They are pooled in some way to facilitate management of storage as a single set of resources. The Tiered Storage Architecture provides Restorability, Scalability, Data Integrity, Ease of Administration and Cost of Ownership. 2. Technological Solutions Available 2.1. Hierarchical Storage Management (HSM) is policy- based management of data storage, backup and archiving in a way that it uses Tiered Storage Devices economically and without the user needing to be aware of when files are being backed up or retrieved from Tiered Storage Architecture and/or backup storage media Cost comparison shows typical % difference in cost of a configuration of 10 or 20 TB of all Fibre Channel RAID v/s Configuration of same capacity consisting Fibre Channel RAID, SATA and TAPE. 3. Proposed Solution The proposed scheme is based on Hierarchical storage management (HSM) with Tiered Storage Architecture. It aims at hierarchical classification of data used in seismic data processing into three categories

based on their usage requirements storing / automatically migrating them and accordingly HSM (Tivoli Storage Manager) is being procured with PC Cluster system. HSM feature is also available in Western Geco Application Software. Interpretation Environment: Existing 42 TB online Disk Storage System. 20TB Online CXFS Disk Storage. Tier-2 storage ( Capacity disk storage system is to be planned) Tier-3-04 Nos of LTO-1 RTL is being upgraded with LTO-3 Drives. Capacity = 112TB. amongst the tiered storages as mentioned above. Some of the policies for the classification may be based on considerations such as Criticality, Frequency of access, Age of the data etc. 3.1. Example Tier 1 Data (such as mission-critical, recently accessed ongoing projects) might be stored on High Performance Enterprise Storage 146/300 GB @15Krpm (FC Disks) for Online Data Access (Costly storage). Tier 2 Data (such as less critical, less accessed) might be stored on less expensive media in conventional Mid-Range Storage Systems 300/500 GB @ 7.2Krpm FATA/Serial ATA for backups (Cheaper storage). As the tier number increased, cheaper media could be used. Tier 3 In a 3-tier system, the Data might contain event-driven, rarely used, or unclassified files on stored on tapes. Solution Conceived for GEOPIC Processing Environment: Tier-1 Storage Existing 42 TB Online Disk storage system. 20TB Online Disk Storage procured with Altix 3700 BX2 server and 52 TB Online Disk Storage with PC Cluster system. Tier-1 Storage = 42TB+20TB+52TB=114TB. Tier-2 storage Capacity Disk Storage System is planned to be installed after induction of PC Cluster system and Altix 3700 BX2 Server. Tier-3 Storage Existing 26 TB Robotic Tape Library. 1250 TB Robotics Tape Library is also being procured with PC Cluster system. (Tier-3 storage) Tier-3 Storage =1250TB for HSM Solution. Software for Solution

Tier-1 : 42TB+20TB=62TB. Tier-3 : 112TB. Implementation solution for Interpretation will be taken up after Processing Solution is implemented. Tired Storage Technology available in the Market 1. IBM Total Storage Solution 2. EMC DMX Series Tiered Storage solution 3. Hitachi Universal Storage Platform (USP) Tiered Storage Solution 4. HP Tiered storage solution 5. NetApp Tiered Storage Solution The above mentioned technologies are available in the market with storage consolidation feature. The existing disk storage systems shall also be consolidated with Tiered storage. Benefits Automated system-managed storage system and data base administration. Eliminates lost volumes and minimize storage management costs. Proper balance between cost and performance by placing most frequently data on high cost, high performance disks and less frequently accessed data on lower cost, slower performance storage. A single view of storage resources/pool. Concurrent access by multiple applications running on a variety of platforms. Reduced cost Save money with Tiered Storage. Improved Quality of Services. Virtually Unlimited Scalability 3.2. Important Considerations in Implementation Some of the process steps and metrics that should be considered when setting up a tiered storage strategy are: a. User Requirements should cover primary storage capacity, performance and availability and recovery time. e. Data Migration Plans should start with a gap assessment, comparing current data placement with the optimized placement according to the defined data classes and the available storage tiers. Then, determine what can be done with existing infrastructure, and plan new storage tier deployments to support needed capacity growth and functionality. 4. Conclusions If the HSM with Tiered Storage Architecture is properly planned and executed, can increase service levels for critical applications and data sets, while reducing the overall cost of data storage. The proposed solution shall contribute to reduction in API cycle time by: Providing faster data access to the geoscientists. Enlarging the availability of disk spaces. Improving the efficiency of storage utilization. Providing the facility of automated backups. Cope with the phenomenal growth in data through scalability. It may be mentioned that the HSM Solution, initially proposed for implementation at GEOPIC is equally relevant and useful to other Seismic Data Processing Centers (RCC s) and the same can be rolled over to the other locations with experiencing the gains of implementation at GEOPIC. References: SGI : SGI InfiniteStorage Data Migration Facility(DMF). EMC : White Paper No: H2254 Solution for Oil Gas Industry. NetApp : A Three Tiered Storage Architecture. White Paper by Energy Insights : Information Life Cycle Management in Upstream Oil & Gas Industry. b. Data Classes / Storage Tiers should be designed to reflect the user requirements and operating environments of different applications and data sets. It to define a number of data classes, depending on the environment. c. Data Management Policies should support the user requirements for each data class. d. Storage Architecture for each tier may change over time, as technologies evolve. But the architectural design and change control processes must ensure that each tier continues to support the minimum requirements of the data classes it stores, at the lowest practical TCO.

Acknowledgements The authors are thankful to ONGC for providing an opportunity to take up this study. The authors are deeply thankful to Shri S. K. Das, GGM (GP), Head, GEOPIC, ONGC, Dehradun, for providing opportunity and resources to take up the work. Sincere acknowledgement is due to Shri N.Raman, CE(E&T), Computer Services(Hardware), GEOPIC, ONGC, Dehradun, for valuable suggestions during the drafting of this paper. NB: The views expressed in the paper are solely of the authors and do not necessarily reflect the views of the organization which they belong to.