ILM: Tiered Services & The Need For Classification
Edgar StPierre, EMC². SNW San Diego, April 2007
SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions: any slide or slides used must be reproduced without modification; the SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee.
About SNIA and the DMF
About the Storage Networking Industry Association (SNIA): SNIA's primary goal is to ensure that storage networks become complete and trusted solutions across the IT community. For additional information about SNIA, see www.snia.org. SNIA's Dictionary of Storage Networking Terminology is online at www.snia.org/dictionary.
About the Data Management Forum (DMF): Founded in 2004, the Data Management Forum is a sub-group of SNIA specializing in data management and protection throughout the lifecycle of information. More information about the DMF, including resources on data and information lifecycle management, can be found at www.snia-dmf.org.
Agenda
Introduction: What is ILM? Tiers of service with service level management. Classification.
Section 1 - Tiers of Service: What are tiers of service? What technologies compose a tier? What tiers do you need in your data center? What is the business value of tiered storage?
Section 2 - Data Classification: Why classify data? Using information to drive classification. Aligning classified data with available resources.
Section 3 - Implementation Considerations: Unstructured (files in file systems). Semi-structured (email messages in email systems). Structured (database records).
Section 4 - Automated Classification
Abstract: Tiered Services & The Need For Classification
Establishing an ILM strategy is a high priority for many organizations as they struggle to deal with data growth and regulatory compliance. Defining tiered services is an important part of that ILM strategy. Two keys to a successful tiered-service implementation are:
Resource classification: organizing your storage-related services into tiers that meet the requirements of your business.
Data classification: if you don't know what data you have or how valuable it is, it's very hard to decide which data should be placed on a particular service tier.
In this tutorial we look at how to organize your storage-related service tiers around your needs, and at data classification techniques and emerging technologies that can help you manage the placement of data across different tiers of service, or the movement of data across storage within the same service tier.
Information Growth
Save everything: 161 exabytes in 2006, 988 exabytes in 2010 (Source: IDC, The Expanding Digital Universe).
Major new drivers: digital images, voice, and TV, with growth in both size and number of units.
Beyond size: 70% of this information is created by individuals, yet organizations are responsible for managing the security, privacy, reliability, and compliance of 85% of it.
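As a rough check on how fast these figures imply information was growing, the short calculation below derives the compound annual growth rate (CAGR) from the 2006 and 2010 values. The arithmetic is ours, applied to the IDC numbers quoted above; it is not taken from the IDC report itself.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Figures from the slide: 161 EB (2006) and 988 EB (2010).
rate = cagr(161, 988, 2010 - 2006)
print(f"Implied growth: {rate:.1%} per year")
```

The result is a little over 57% per year, which means the total roughly doubles about every year and a half.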
Increased Requirements on IT for Service Delivery
Senior IT management: optimize storage costs with a flat budget.
Legal counsel: find the right information at the right time for legal and patent issues.
Security officer: manage enterprise-wide information security and risk.
Compliance officer / Records Information Manager (RIM): comply with government and corporate regulations for retention and access.
Chief risk officer: address corporate risk management.
Business users: the average knowledge worker spends six hours per week searching for information.*
*Sources: IDC; Kahn/Blair
Mitigation: ILM and Tiers of Service
Key characteristics of ILM: standard configurations that form tiers of service (a service catalog), and data classification that organizes data based on information requirements.
[Figure: Information Requirements, Standard Configurations, and a Service Catalog of Capabilities (Gold, Silver, Bronze; e.g., DR), connected by the labels "Guides", "Defines", and "Align".]
Section 1 - Tiers of Service
What are tiers of service? What technologies compose a tier? What tiers do you need in your data center? What is the business value of tiered services?
Service Catalog: a customer sample
Alignment scheme, with the attributes and specifications defined for each service:
Primary storage: guaranteed performance (performance throughput per port, in I/O per second; response time, in ms); availability (maximum unplanned downtime per year, in minutes).
Secondary storage: performance (response time; throughput); availability (maximum downtime per year); retention & disposition (retention period; data-shredding compliance); accessibility (read-access frequency); data integrity (guarantee of authenticity); offsite (recovery point objective).
Operational Recovery (OR): recovery granularity; Recovery Point Objective, RPO (amount of data loss); Recovery Time Objective, RTO (time to restore data); recoverability (ability to recover backed-up data); retention period (time data is retained).
Disaster Recovery (DR): RPO (amount of data loss); RTO (time to restore data).
Tier 1 to Tier 4 values: 5,000+ 3,500 5,000 1,500 3,500 < 8ms 7-14ms 12-30ms < 26.5 < 1 second < 1 second < 24 hours <= 300 Mbps <= 700 Mbps <= 280 Mbps <5.25 mins <52.56 mins < 175.2 hours < 30 years < 10 years < 3 years Yes No No < Hourly > Hourly Daily Yes No No < 1 minute < 28 hours < 38 hours Complete app. restore < 26.5 Complete app. restore < 52.5 File or file sys. restore 1 hour 24 hours 24 hours < 30 minutes < 30 minutes 7 GB/minute 100% 100% 98% 2 hours 24 hours 3 Weeks 0 minutes < 4 hours 24-48 hours < 2 hours <12 hours < 48 hours 500 1,500 12-30ms < 263 File or file sys. restore 30 days .5 GB/minute 95% 15 months 24-48 hours <72 hours
Courtesy of EMC² Consulting
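Several of the downtime limits in the sample catalog (5.25 minutes, 52.56 minutes, and 175.2 hours per year) line up closely with familiar availability percentages. The sketch below shows the conversion in both directions; it assumes a 365-day year, and the function names are ours, chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def max_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget for an availability given as a fraction (0.9999 = 99.99%)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_for(downtime_minutes: float) -> float:
    """Availability fraction implied by a yearly downtime budget in minutes."""
    return 1.0 - downtime_minutes / MINUTES_PER_YEAR


for target in (0.99999, 0.9999, 0.98):
    minutes = max_downtime_minutes(target)
    print(f"{target:.3%} available -> {minutes:,.2f} min/yr ({minutes / 60:,.1f} h)")
```

Read this way, "five nines" (99.999%) allows about 5.26 minutes per year, 99.99% allows 52.56 minutes, and 98% allows 175.2 hours, which is a useful sanity check when negotiating availability targets for each tier.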
Storage: which technology to use?
Storage technology recommendations made today are obsolete tomorrow: the field is technology-driven and constantly changing. Some capabilities may be critical to decision-making, and best practices have a significant influence on success. Your data and lifecycle requirements, together with your existing capabilities, must drive how many storage stages you need and which technologies you choose.
Data Protection Technologies
Technology recommendations made today are still obsolete tomorrow: the field is technology-driven and constantly changing, and best practices have a significant influence on success.
Critical best-practice strategy: separate your requirements for copies into OR vs. DR vs. archive.
Operational Recovery (OR): save the company money with fast recovery from logical errors. Use multiple layers of OR, and define when to fall back to DR copies.
Disaster Recovery (DR): Is your business continuity plan in place? Leverage OR copies for DR.
Archive: the SNIA Online Dictionary defines "archive" as "A collection of data that is maintained as a long-term record of a business, application, or information state. Archives are typically kept for auditing, regulatory, analysis or reference purposes rather than for application or data recovery." (http://www.snia.org/education/dictionary/a/#archive). Not all archive definitions are the same! "Archive" may mean data moved to secondary or tertiary online storage, an offline/offsite copy of data, or a copy of data on backup tapes. What are your stakeholders' requirements for archive access and protection?
What tiers do you need?
Start from your requirements! Bring in ILM professional services to facilitate, and include all the information stakeholders.
Determine how many tiers of service you need: the fewer, the better! Some services will be specific to your needs.
Then determine the solutions to deliver them, including how many stages each data lifecycle has, the requirements for storage and protection at each stage, and the solutions that manage transitions between stages.
Iterate and negotiate: pilot the process with one application, then add applications in subsequent iterations.
Business value of tiered storage
Improve efficiency by aligning data with the most appropriate resources: delay or reduce purchases of high-end storage, and delete data if and when appropriate.
Improve service delivery by managing to requirements (no more and no less than what's needed) and by focusing on fewer configurations.
Improve the scalability of personnel.
Establish a baseline for measurement.
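"No more and no less than what's needed" can be read as: pick the cheapest tier that still meets every stated requirement. The sketch below illustrates that selection rule under stated assumptions: the `Tier` and `Requirement` types, the cost ranking, and the assignment of response-time and downtime values to tiers are hypothetical, loosely echoing the sample service catalog rather than reproducing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_rank: int                # lower = cheaper to provide
    response_time_ms: float       # worst-case response time the tier promises
    downtime_min_per_year: float  # maximum unplanned downtime the tier allows


@dataclass(frozen=True)
class Requirement:
    max_response_time_ms: float
    max_downtime_min_per_year: float


# Illustrative catalog: the numbers echo the sample service catalog, but which
# value belongs to which tier is an assumption made for this sketch.
CATALOG = (
    Tier("Tier 1", cost_rank=3, response_time_ms=8, downtime_min_per_year=5.25),
    Tier("Tier 2", cost_rank=2, response_time_ms=14, downtime_min_per_year=52.56),
    Tier("Tier 3", cost_rank=1, response_time_ms=30, downtime_min_per_year=175.2 * 60),
)


def select_tier(req: Requirement, catalog=CATALOG) -> Tier:
    """Return the cheapest tier that still satisfies every stated requirement."""
    fits = [
        t for t in catalog
        if t.response_time_ms <= req.max_response_time_ms
        and t.downtime_min_per_year <= req.max_downtime_min_per_year
    ]
    if not fits:
        raise ValueError("no tier in the catalog meets this requirement")
    return min(fits, key=lambda t: t.cost_rank)


print(select_tier(Requirement(max_response_time_ms=20, max_downtime_min_per_year=60)).name)
```

A requirement of 20 ms and one hour of downtime per year lands on Tier 2 rather than Tier 1: both tiers meet it, but Tier 2 is cheaper, so choosing Tier 1 would deliver more than is needed.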
Section 2 - Data Classification
Why classify data? Using information to drive data classification. Aligning classified data with available resources.
Information Classification: Gathering requirements from stakeholders
Corporate information is simply data to the data center. Data is what IT manages: files, volumes, bits and bytes. Information is data with context: decisions are based on information.
Use a collaborative process to identify information service requirements, and use these requirements to define an SLA.
Line of business (LOB) information stakeholders: application performance, availability, recoverability, etc.; staff response time, asset reporting, etc.; cost.
Corporate information stakeholders: security officer (secret, confidential, proprietary, etc.); records manager (retention time, etc.); compliance officer (authorization, retention, etc.).
This process enables the IT organization to create a service catalog, create data classification policies, and match data with appropriate resources based on its service requirements.
[Figure: Information Classification, linking the SLA and data requirements with stakeholders: business process analyst, application owner, DBA, security officer, records manager, legal, data administrator, IT architect, IT administrators.]
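One way to picture turning several stakeholders' requirements into a draft SLA is to keep, field by field, the most demanding value anyone asked for. This is a simplifying assumption, not the collaborative process itself: in practice conflicting requirements are negotiated. The field names and the stakeholder inputs below are hypothetical, loosely based on the recoverability, retention, and confidentiality concerns listed above.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Requirements:
    rpo_minutes: float      # maximum tolerable data loss
    rto_minutes: float      # maximum time allowed to restore
    retention_years: float  # how long the data must be kept
    confidential: bool      # must be handled as confidential


def stricter(a: Requirements, b: Requirements) -> Requirements:
    """Combine two requirement sets, keeping the more demanding value of each field."""
    return Requirements(
        rpo_minutes=min(a.rpo_minutes, b.rpo_minutes),
        rto_minutes=min(a.rto_minutes, b.rto_minutes),
        retention_years=max(a.retention_years, b.retention_years),
        confidential=a.confidential or b.confidential,
    )


# Hypothetical inputs gathered from three stakeholders for one application.
gathered = {
    "application owner": Requirements(60, 240, 1, False),
    "records manager": Requirements(24 * 60, 48 * 60, 7, False),
    "security officer": Requirements(24 * 60, 48 * 60, 0, True),
}

sla_draft = reduce(stricter, gathered.values())
print(sla_draft)
```

The draft combines the application owner's recovery objectives, the records manager's seven-year retention, and the security officer's confidentiality flag; it is a starting point for negotiation, since the strictest combination may cost more than the business is willing to pay.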
Classifying Data
Benefits it will provide: identifies the alignment of IT with business priorities; identifies gaps in alignment, utilization, and management; identifies and organizes data for regulatory compliance.
Benefits it could provide: a reduced footprint; improved environmental resource utilization; data that is deleted is no longer an expense and no longer a liability; simpler introduction of ITIL practices.
How is Data Classified?
Data classification methods:
Classify by business process or application: all data is assigned the same classification. Simple; a good start and a first approximation. Net effect: a ranking of applications to service tiers.
Classify by metadata: time last accessed, owner, file name, path, etc. Useful for aligning data to tiers of service (e.g., the CEO's email receives different service than yours), or for placing data in the appropriate stage within a service tier (e.g., Hierarchical Storage Management (HSM) for a file server).
Classify by content: content-driven alignment of data to service level requirements, with added value for business intelligence, compliance, and e-discovery.
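To make classification by metadata concrete, the sketch below maps a file's owner and last-access time to a tier name. The owner list, the 30- and 180-day windows, and the tier names are hypothetical policy values. On a real file server the access time would come from the file system (for example `datetime.fromtimestamp(os.stat(path).st_atime)`), which this self-contained sketch does not read.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileMeta:
    path: str
    owner: str
    last_accessed: datetime  # on a real system, derived from the file's atime


# Hypothetical policy parameters, chosen only for illustration.
PRIORITY_OWNERS = {"ceo"}
HOT = timedelta(days=30)
WARM = timedelta(days=180)


def classify_by_metadata(meta: FileMeta, now: datetime) -> str:
    """Map a file's metadata (owner, last access time) to a service tier name."""
    if meta.owner in PRIORITY_OWNERS:
        return "Tier 1"           # owner-based rule: e.g. the CEO's data
    idle = now - meta.last_accessed
    if idle <= HOT:
        return "Tier 1"
    if idle <= WARM:
        return "Tier 2"
    return "Tier 3"               # long-idle data: a candidate for HSM migration


now = datetime(2007, 4, 16)
files = [
    FileMeta("/home/ceo/board-minutes.doc", "ceo", now - timedelta(days=400)),
    FileMeta("/share/sales/q1-forecast.xls", "alice", now - timedelta(days=3)),
    FileMeta("/share/eng/2005-specs.pdf", "bob", now - timedelta(days=500)),
]
for f in files:
    print(f.path, "->", classify_by_metadata(f, now))
```

The owner rule is checked first so that it overrides age, mirroring the slide's example of the CEO's email receiving different service; the age rules then place everything else within the tiers.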
Section 3 - Classification Implementation Considerations
Three primary data types: unstructured (files in file systems); semi-structured (email messages in email systems); structured (database records).
Each type presents different challenges, and each requires a different form of virtualization.
Unstructured Data (Files)
[Figure: direct access, file system access, redirect, and migrate/recall paths; a policy engine; file systems 2 and 3; Tier 1 to Tier 4.]
Caveat: be sure your data protection solution is integrated with your storage tiers! It's more than just HSM. It's file virtualization!
Access methods: direct access (network or NAS); file redirect (typically NAS); traditional migrate/recall.
Policy-driven movement: internal or external policy engines, driven by age/owner-type metadata or by file content.
Movement can be: 1. tier-to-tier (bi-directional); 2. to a compliance archive; 3. to trash; 4. to/from SAN or NAS; 5. to offsite archive vaults.
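A policy engine of the kind described above evaluates rules against each file and schedules a movement. The sketch below is one minimal way to express that, assuming ordered rules where the first match wins; the `ManagedFile` fields, the 7- and 90-day thresholds, and the three-tier range are hypothetical, and it covers only movements 1 to 3 from the list (tier-to-tier in both directions, compliance archive, trash).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FASTEST_TIER, SLOWEST_TIER = 1, 3
# Hypothetical thresholds; a real policy engine would take these from policy.
RECALL_IF_USED_WITHIN = timedelta(days=7)
MIGRATE_IF_IDLE_FOR = timedelta(days=90)


@dataclass
class ManagedFile:
    path: str
    tier: int
    last_accessed: datetime
    regulated: bool = False             # subject to a retention regulation
    archived: bool = False              # already copied to the compliance archive
    expires: Optional[datetime] = None  # end of the required retention period


def decide(f: ManagedFile, now: datetime) -> str:
    """Return the movement a policy engine might schedule; first matching rule wins."""
    if f.expires is not None and now >= f.expires:
        return "move to trash"
    if f.regulated and not f.archived:
        return "copy to compliance archive"
    idle = now - f.last_accessed
    if idle <= RECALL_IF_USED_WITHIN and f.tier > FASTEST_TIER:
        return f"recall to tier {f.tier - 1}"   # movement is bi-directional
    if idle >= MIGRATE_IF_IDLE_FOR and f.tier < SLOWEST_TIER:
        return f"migrate to tier {f.tier + 1}"
    return "no action"


now = datetime(2007, 4, 16)
samples = [
    ManagedFile("/share/old-report.doc", tier=1, last_accessed=now - timedelta(days=200)),
    ManagedFile("/share/reopened.ppt", tier=3, last_accessed=now - timedelta(days=1)),
    ManagedFile("/finance/ledger.xls", tier=2, last_accessed=now, regulated=True),
    ManagedFile("/tmp/expired.log", tier=2, last_accessed=now, expires=now - timedelta(days=1)),
]
for f in samples:
    print(f.path, "->", decide(f, now))
```

Putting expiration and compliance rules ahead of the age-based tiering rules keeps regulatory actions from being masked by routine migration; a production engine would also check legal holds before anything goes to trash.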
Semi-Structured Data (Email)
[Figure: typical email deployment; email server, MAPI, policy engine, Tier 1 to Tier 3, trash.]
Selection criteria include to/from-type metadata, age and access metadata, and content and attachments.
Actions might include: 1. create a single-instance store and migrate attachments; 2. send appropriate email to a compliance archive; 3. delete email upon expiration.
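Action 1, a single-instance store for attachments, can be sketched by keying each attachment on a hash of its content, so identical copies sent to many recipients are stored once. Content hashing with SHA-256 and reference counting are our assumptions here (one common way to do this, not necessarily how any particular product works), and the class and method names are hypothetical. Releasing a reference corresponds to action 3, deleting an expired message.

```python
import hashlib


class SingleInstanceStore:
    """Keep one copy of each distinct attachment, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._refs: dict[str, int] = {}

    def put(self, data: bytes) -> str:
        """Store an attachment (or add a reference to an identical one); return its key."""
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def release(self, key: str) -> None:
        """Drop one reference, e.g. when a message expires; free the blob at zero."""
        self._refs[key] -= 1
        if self._refs[key] == 0:
            del self._refs[key]
            del self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = SingleInstanceStore()
report = b"%PDF-1.4 quarterly report"    # stand-in for a real attachment
# The same attachment sent to three recipients: each message keeps only a stub key.
stubs = [store.put(report) for _ in range(3)]
print(len(store), "stored copy for", len(stubs), "messages")
```

Each message keeps only the key as a stub, and the attachment itself disappears only when the last message referring to it has expired.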
Structured Data (RDBMS)
Structured data represents significant challenges because content and organization are a function of the application, each application is different, and transactional integrity must be maintained.
Three basic approaches:
[Figure: three configurations, each combining the application, a transparency + policy layer, and RDBMS instances (Tier 1, Tier 2).]
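The transactional-integrity point can be illustrated by moving aged rows from a Tier 1 database to a Tier 2 database inside a single transaction, so a failure leaves no row duplicated or lost, and by querying both tiers together so the application still sees every row. This is a sketch under assumptions: SQLite stands in for the RDBMS, the `orders` table and date cutoff are hypothetical, and a `UNION ALL` query is only a crude stand-in for the "transparency" layer in the figure. SQLite guarantees crash-safe atomicity across attached databases only when the main database is an on-disk file in rollback-journal mode; with the in-memory databases used here, the sketch demonstrates rollback on error only.

```python
import sqlite3

# One in-memory SQLite connection stands in for the Tier 1 RDBMS; an attached
# in-memory database stands in for Tier 2.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS tier2")
for schema in ("main", "tier2"):
    conn.execute(
        f"CREATE TABLE {schema}.orders (id INTEGER PRIMARY KEY, placed TEXT, amount REAL)"
    )
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, "2005-01-10", 120.0), (2, "2006-06-01", 75.5), (3, "2007-03-01", 310.0)],
)
conn.commit()


def move_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move rows placed before `cutoff` from Tier 1 to Tier 2 in one transaction."""
    with conn:  # commits if both statements succeed, rolls back if either fails
        conn.execute(
            "INSERT INTO tier2.orders SELECT * FROM main.orders WHERE placed < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM main.orders WHERE placed < ?", (cutoff,))
    return cur.rowcount


def all_orders(conn: sqlite3.Connection) -> list:
    """Query both tiers together so the application still sees every row."""
    return conn.execute(
        "SELECT * FROM main.orders UNION ALL SELECT * FROM tier2.orders ORDER BY id"
    ).fetchall()


moved = move_older_than(conn, "2007-01-01")
print(moved, "rows moved;", len(all_orders(conn)), "rows still visible")
```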
Structured Data Archiving
Archiving structured data represents significant challenges: records must be converted to files in a long-term, re-usable format; those files must allow import into current and future releases of the RDBMS; and they must allow import into different database technologies/products.
Caveat: archiving loses database transparency.
[Figure: application, RDBMS, transparency + policy layer, Tier 1 to Tier 3.]
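The conversion and re-import requirements can be sketched with CSV as the long-term format. CSV is our assumption (the slide does not name a format), as are the table names. The round trip also shows a limitation: CSV carries no types, so every column comes back as text, which is one reason you might also archive a description of the schema alongside the data.

```python
import csv
import io
import sqlite3
from typing import TextIO


def export_table(conn: sqlite3.Connection, table: str, out: TextIO) -> int:
    """Write a table to CSV with a header row; `table` must be a trusted name."""
    cur = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


def import_csv(conn: sqlite3.Connection, table: str, src: TextIO) -> int:
    """Load an archived CSV into a new table; every column comes back as TEXT."""
    reader = csv.reader(src)
    header = next(reader)
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, placed TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "2005-01-10", 120.0), (2, "2006-06-01", 75.5)])

archive = io.StringIO()  # stands in for a file on archive storage
exported = export_table(source, "orders", archive)

archive.seek(0)
target = sqlite3.connect(":memory:")  # e.g. a later release or a different product
import_csv(target, "orders_restored", archive)
print(target.execute("SELECT * FROM orders_restored").fetchall())
```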
Tools: emerging product areas
Automated data classification products: create classification schemes based on metadata, content, or both.
Policy engines: service level management; data lifecycle management (managing data based on its classification).
Catalog for metadata and content: extracted and maintained externally, either independently or as extensions of file system metadata.
Leverage data movement capabilities: OS, HSM, backup, etc., both in-band and out-of-band.
Interfaces to archive devices to set retention, etc.
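The pieces above (classification by content, with results kept in a catalog maintained outside the file system) can be combined in a small sketch. Several assumptions apply: regular expressions stand in for real content analysis, an in-memory SQLite database stands in for the metadata repository, and the tag names and the US-SSN-style pattern are hypothetical examples. A regex match is only a candidate hit, which is why the tag is named "possible-ssn".

```python
import re
import sqlite3

# Hypothetical content rules: tag name -> pattern searched for in the text.
CONTENT_RULES = {
    "possible-ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "confidential": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


def content_tags(text: str) -> set[str]:
    """Return the tags whose pattern appears anywhere in the text."""
    return {tag for tag, pattern in CONTENT_RULES.items() if pattern.search(text)}


class Catalog:
    """Metadata and content tags kept in a repository outside the file system."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(
            """
            CREATE TABLE items (path TEXT PRIMARY KEY, owner TEXT, size INTEGER);
            CREATE TABLE tags (path TEXT REFERENCES items(path), tag TEXT);
            """
        )

    def index(self, path: str, owner: str, text: str) -> None:
        """(Re)index one item: store its metadata and the tags found in its content."""
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO items VALUES (?, ?, ?)",
                (path, owner, len(text.encode("utf-8"))),
            )
            self.db.execute("DELETE FROM tags WHERE path = ?", (path,))
            self.db.executemany(
                "INSERT INTO tags VALUES (?, ?)",
                [(path, tag) for tag in sorted(content_tags(text))],
            )

    def paths_with_tag(self, tag: str) -> list[str]:
        rows = self.db.execute(
            "SELECT path FROM tags WHERE tag = ? ORDER BY path", (tag,)
        )
        return [row[0] for row in rows]


catalog = Catalog()
catalog.index("/hr/payroll.txt", "hr", "Employee 123-45-6789, salary band B")
catalog.index("/legal/merger.txt", "legal", "CONFIDENTIAL: draft terms")
catalog.index("/eng/readme.txt", "bob", "Build instructions")
print(catalog.paths_with_tag("possible-ssn"))
```

A policy engine could then query the catalog (for example, every item tagged "confidential") instead of rescanning the file systems, which is the main reason to maintain the catalog externally.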
Issues to consider with automated data classification
Metadata repository; scalability; automation of data movement; defining policies; information classification.
Continue Your SNIA Education Experience at SNW
Attend Hands-On Labs in: Data Classification: Key to Service Level Management; Data Security and Protection: Data Assurance Solutions to Meet Corporate Requirements; IP Storage: iSCSI, Your IP SAN; Storage Management: Manage Storage or Be Managed By It; Storage Virtualization: Increasing Productivity; Zero to SAN: Fibre Channel Connectivity in No Time.
Sessions begin Monday afternoon, April 16, and continue through Wednesday, April 18. All sessions are in Emma/Maggie/Annie, 3rd floor of the Hyatt Manchester. Register at the SNW registration area.
DMF's ILM Framework for the Datacenter
For more information on SNIA's Data Management Forum (DMF), visit the DMF website at http://www.snia-dmf.org.
At SNW, see: the Data Management Solutions Center in the show exhibit area; Information Classification: The Cornerstone to Information Management; The Secret Sauce of ILM; The Professional ILM Assessment; the Data Classification Hands-On Lab.
[Figure: ILM Framework, with labels: Business Framework, Define Business Process, Business Requirements, Goals, Requirements, Information Management Services, Data Management Services, Management Policies, Instrumentation, Filters, Applications, Information, Network Infrastructure, Compute Infrastructure, Storage Infrastructure, IT Infrastructure.]
Please send any comments on this tutorial to SNIA: trackdatamgmt@snia.org
Many thanks to the following individuals for their contributions to this tutorial: Edgar StPierre, Bob Rogers, Nik Simpson, Bill Pierce.