True Data Disaster Recovery. PRESENTATION TITLE GOES HERE Matthew Kinderwater icube Development (Calgary) Ltd.



Similar documents
NSS Volume Data Recovery

Acronis True Image 2015 REVIEWERS GUIDE

MRT Advanced HDD Repair and Data Recovery Training Course

Intel RAID Controllers

Taurus Super-S3 LCM. Dual-Bay RAID Storage Enclosure for two 3.5-inch Serial ATA Hard Drives. User Manual March 31, 2014 v1.2

Taurus - RAID. Dual-Bay Storage Enclosure for 3.5 Serial ATA Hard Drives. User Manual

RAID Utility User Guide. Instructions for setting up RAID volumes on a computer with a Mac Pro RAID Card or Xserve RAID Card

NEC ESMPRO Manager RAID System Management Guide for VMware ESXi 5 or later

Chapter 12 Network Administration and Support

Hydra Super-S Combo. 4-Bay RAID Storage Enclosure (3.5 SATA HDD) User Manual July 29, v1.3

How to Start a Data Recovery Business?

is605 Dual-Bay Storage Enclosure for 3.5 Serial ATA Hard Drives FW400 + FW800 + USB2.0 Combo External RAID 0, 1 Subsystem User Manual

1 of :01

RAID Utility User s Guide Instructions for setting up RAID volumes on a computer with a MacPro RAID Card or Xserve RAID Card.

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

2.6.1 Creating an Acronis account Subscription to Acronis Cloud Creating bootable rescue media... 12

Hard Drive Diagnostics With Scott Moulton. Hard Drive Data Recovery Forensics SANS

Perforce Backup Strategy & Disaster Recovery at National Instruments

This chapter explains how to update device drivers and apply hotfix.

MFR IT Technical Guides

Chapter 2 Array Configuration [SATA Setup Utility] This chapter explains array configurations using this array controller.

Chapter 7 Types of Storage. Discovering Computers Your Interactive Guide to the Digital World

ATOLA INSIGHT A New Standard in Data Recovery Technology

data recovery specialists

Difference between Enterprise SATA HDDs and Desktop HDDs. Difference between Enterprise Class HDD & Desktop HDD

User Manual. For more information visit

ATOLA INSIGHT. A New Standard In Data Recovery Equipment

SATA RAID SIL 3112 CONTROLLER USER S MANUAL

Case Study: Quick data recovery using HOT SWAP trick in Data Compass

Silent data corruption in SATA arrays: A solution

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University. (

How to recover a failed Storage Spaces

2.8.1 Creating an Acronis account Subscription to Acronis Cloud Creating bootable rescue media... 16

Open Source and License Source Information

Chapter 10: Mass-Storage Systems

Availability and Disaster Recovery: Basic Principles

Backup & Disaster Recovery Options

XL-RAID-SATA2-USB. User Manual. v.1.2 (January, 2010)

MFR IT Technical Guides

Areas Covered. Chapter 1 Features (Overview/Note) Chapter 2 How to Use WebBIOS. Chapter 3 Installing Global Array Manager (GAM)

UTC3100 and 3170 POS RAID Information

Dual/Quad 3.5 SATA to USB 3.0 & esata External Hard Drive RAID/Non-RAID Enclosure w/fan. User s Manual

June Blade.org 2009 ALL RIGHTS RESERVED

Data recovery Data management Electronic Evidence

Practical issues in DIY RAID Recovery

Q & A From Hitachi Data Systems WebTech Presentation:

Security+ Guide to Network Security Fundamentals, Fourth Edition. Chapter 13 Business Continuity

Pleiades USB/LAN. User Manual. & Installation Guide. External Storage Enclosure for 3.5 Hard Drive. v1.1

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Xanadu 130. Business Class Storage Solution. 8G FC Host Connectivity and 6G SAS Backplane. 2U 12-Bay 3.5 Form Factor

Cloud Attached Storage

Distribution One Server Requirements

This user guide describes features that are common to most models. Some features may not be available on your computer.

Data recovery from a drive with physical defects within the firmware Service Area, unrecoverable using

Business Continuity and Capacity Building

QuickSpecs. Models HP Smart Array E200 Controller. Upgrade Options Cache Upgrade. Overview

Crash Proof - Data Loss Prevention

Kroll Ontrack Data Recovery. Oracle Data Loss: When the best of plans fail

Benefits of Intel Matrix Storage Technology

BlackArmor NAS 110 User Guide

2» 10» 18» 26» PD »

NETWORK SERVICES WITH SOME CREDIT UNIONS PROCESSING 800,000 TRANSACTIONS ANNUALLY AND MOVING OVER 500 MILLION, SYSTEM UPTIME IS CRITICAL.

Thecus N2100 FAQ. 1. NAS Management

Version : 1.0. SR3620-2S-SB2 User Manual. SOHORAID Series

Hydra esata. 4-Bay RAID Storage Enclosure. User Manual January 16, v1.0

Manual IB-3620 Series

Storage Technologies for Video Surveillance

Forensic Imaging of Hard Disk Drives

Price/performance Modern Memory Hierarchy

Deploying a File Server Lesson 2

**Please read this manual carefully before you use the product**

Trends in Application Recovery. Andreas Schwegmann, HP

RAID Implementation for StorSimple Storage Management Appliance

USER S MANUAL.

CASPER SECURE DRIVE BACKUP

Chapter 12: Mass-Storage Systems

DELL TM PowerEdge TM T Mailbox Resiliency Exchange 2010 Storage Solution

Disaster Recovery for Small Businesses

About Backing Up a Cisco Unity System

Reboot the ExtraHop System and Test Hardware with the Rescue USB Flash Drive

2 Bay USB 3.0 RAID 3.5in HDD Enclosure

USB3.0/eSATA/1394b-to-SATA II RAID SUBSYSTEM

3.5 Dual Bay USB 3.0 RAID HDD Enclosure

Planning for and Surviving a Data Disaster

USER S GUIDE. MegaRAID SAS Software. June 2007 Version , Rev. B

SAN Conceptual and Design Basics

DELL RAID PRIMER DELL PERC RAID CONTROLLERS. Joe H. Trickey III. Dell Storage RAID Product Marketing. John Seward. Dell Storage RAID Engineering

PARALLELS CLOUD STORAGE

Microsoft Exchange 2010 on Dell Systems. Simple Distributed Configurations

Mini-EPic System of RAID

AMD SP5100 Dot Hill SATA RAID Users Manual V100

How To Fix A Fault Fault Fault Management In A Vsphere 5 Vsphe5 Vsphee5 V2.5.5 (Vmfs) Vspheron 5 (Vsphere5) (Vmf5) V

DISASTER RECOVERY. Omniture Disaster Plan. June 2, 2008 Version 2.0

Intel Matrix Storage Console

RAID. User Manual. For more information visit

How to register. Who should attend Services, both internal HP and external

BlackArmor NAS 440/420 User Guide

January 28, Re: Seagate HDD Firmware Update. To all of our customers and partners;

Data Integrity: Backups and RAID

Enterprise-class versus Desktopclass

Transcription:

PRESENTATION TITLE GOES HERE Matthew Kinderwater icube Development (Calgary) Ltd.

Who We Are: Provide data recovery / data redundancy / disaster plan consulting services worldwide. Operate one of the largest data recovery labs in Western Canada. Recipient of multiple data recovery awards. Published in mainstream media, featured educator for eforensics magazine, and received Global News coverage. 2

Presentation Overview: Review of device / array failures. Data Labs vs. Users / Administrators. Real world case study and how labs work. Avoiding / preventing data recovery situations. Data recovery facts and fiction. How to deal with disaster. 3

Failures, Signs, and Symptoms: DEVICE & ARRAY FAILURES 4

Common causes of Single Disk Failures: 40% Bad Sectors (Mechanical & Firmware): Common Cause(s): Heat and age. 20% Firmware: Common Cause(s): Error recording, sector reallocation, performance tweaks, garbage collection (SSD), and inaccurate translation data. 5

Common causes of Single Disk Failures: 20% Mechanical: Common Cause(s): Heat, physical impact and bad sector self-recovery attempts ( head-burn ). 10% Electrical: Common Cause(s): PSU function and age. 10% Logical: Common Cause(s): User and / or software errors. 6

Common RAID Failures: Dependent on a disk failure (previous slide). Another drive member in the array fails during a rebuild process (common in RAID 5/50). RAID controller failures (firmware & cache). Power failures (battery backups). Operating system error(s) (open handles). User error(s) (accidental deletion). 7

Common Mechanical Failures: Excessive amount of bad sectors (freezing). Non-functional read / write head(s) (clicking). Parking Ramp ( Sticktion ) (buzzing drive). Spindle Failure (buzzing drive). Bearing Failure (buzzing drive). Surface Damage (clicking and scratching). 8

Mechanical Assessment: Includes the reading and writing heads, actuator arm, and the platters. Visual Inspection / Specification Assessment Digital High Resolution Microscope Fog (Clean Box) Tests Ring Formation Analysis (RFA) Laser Diffraction Tests Shows the quantity of pitting (surface damage) Shows the quantity of scratches (surface damage) 9

Mechanical Failure: (Head Failure) Client states the drive is now making a clicking noise. 10

Mechanical Failure: (Head Crash) Client states the drive is now making a clicking noise. 11

Mechanical Failure: (Parking Ramp) Cause: This drive was dropped by the client. 12

Mechanical Failure: (Bearing Failure) Cause: This drive was dropped by the client. 13

Mechanical Failure: (Head Crash) Cause: Spindle Seizure, suspect physical impact 14

Mechanical Failure: (Fire & Water) Massive amounts of mechanical and electrical failures. 15

Mechanical Failure: (Fire & Water) Drive involved in a fire (heat and water damage). Drive has been prepped for DP after 4 hours of cleaning. 16

Mechanical Repair: Averages 6 to 12 hours to complete. Availability of donor parts. Quantity and quality of heads and platters. Drive design (parking ramp vs. landing zone). Storage capacity of the hard drive. Location, amount, and type of surface damage. Adaptive parameters. Previous recovery attempts. 17

Mechanical Donors: Drives include numerous sub-models. Each manufacturer site (production factory) typically uses slightly different parts. Can be very expensive depending on the brand, size, and rarity of the hard drive. Often more than a single donor is required to complete a recovery. 18

Mechanical Repair: (Head Assembly) This drive uses a parking zone (no parking ramp). 19

Common Electrical Failures: TVS Diode(s) Motor IC and MCU (Processor) Failures PreAMP Failure USB Bridge Controllers Hardware Encryption: Examples: Initio, Symwave, JMicron, ASMedia ROM Corruption 20

Electrical Failure: (MCU / Processor) Note center of chip is slightly darker. 21

Electrical Failure: (Pre-Amplifier Chip) Note chip bubbles on the bottom right corner. Required a mechanical donor to facilitate recovery. 22

Electrical Assessment: Includes the external PCB and the internal head stack assembly components. Can be performed even if a mechanical failure is present. Thermal camera imaging. Isolated circuit / dependencies circuit test. ROM dump. 23

Electrical Repair: Averages 1 to 2 hours to complete. May involve a mechanical repair. Availability of donor parts. Previous recovery attempts. 24

Common Firmware Failures: Extremely Brand & Model Specific SMART Logging Full Pending / Reallocation Mapping Full P-List, G-List, and T-List Translation Failures Physical to Digital geometric conversion. Corrupt / Unreadable Modules 25

Firmware Assessment: Ability to read all copies of modules. Comparison of unique module copies (SA-A, SA-B, SA-C) and those from a donor. Firmware response (read and process) time. Validity of module content. 26

F3 Firmware Failure Brand: Seagate Problem: Pending Sector Reallocation Cause: Excessive Bad Sectors / Weak Heads 27

ROYL Firmware Failure Brand: Western Digital Problem: Corrupt Modules Cause: Bad Sector (Module 11) 28

Firmware Repair: Averages 1 to 4 hours to complete. May require a working donor drive (hot swap). Commonly requires the use of donor firmware. Complicated by interface type and brand. Requires proper operation of both electrical and mechanical components. 29

Unique Drive Attributes: Commonly known as Track 0, System Area, and sometimes firmware. Defect lists are completely unique to each drive in the production run. Adaptive parameters are specific to a drives operational conditions. 30

Logical Assessment: Shares a close relationship with any Pre- Assessment Variables. Includes manual (offline) assembly of RAID. Identification of the partition and file systems with the quality of file index elements. 31

Logical Repair: Impossible to estimate completion time. Heavily dependent on the Pre-Recovery Assessment Variables. Complicated by degraded RAID volumes and read / write activities. Often results in the same file being recovered numerous times (quality of data varies). Nameless recovery based on file signatures. Used when file and folder asset data is corrupted. 32

Logical (File System) Inclusions: Priority is placed on the recovery of MFS (PC), Catalog (Mac), or Superblocks (Linux). Most file systems contain a backup of the file and folder names (alternate locations). Disregard security (ACL) NTFS or Metadata (Mac) extended attributes. Closed source file systems complicate the recovery process. 33

Limitations of what you can do yourself: DATA LABS VS. USERS 34

Data Labs vs. Users / Administrators: Use specialized hardware and software to assemble an array without the use of a RAID controller. Use DE, DD, ID, and supporting evidence to make proper decisions regarding a recovery. Attempt to reduce and / or remove variables. Always assume a worst case scenario. Never work from source media. 35

Users / Administrators vs. Data Labs: Are limited by readily available hardware and software solutions. Typically unable to use DE, DD, and ID to make proper recovery decisions. Often create more variables. Attempt to solve symptoms not causes. Typically change / manipulate the source medium. 36

How we recover data: REAL WORK CASE EXAMPLE 37

Real Case Example: Client has RAID 5 volume with x4 300GB 320UW SCSI HDD s. Single drive will not power on. RAID volume was offline, clients attempts to rebuild the array to a new drive had failed. Client physically removed array members from the server and did not record / replace the drives back in the correct order. 38

Drive Evaluation Process (DE): Mechanical: A moving part is not working correctly. Electrical: Power, electrical shorts, integrated circuits. Firmware: Manufacturer programming not fully functional. Logical: Partition and file system errors. 39

Drive Duplication Process (DP): Combination of hardware and software technologies. Software is typically used on healthy drives. Specialized hardware is used to obtain partial images from drives which contain bad sectors. Target 1:1 images are stored on a high capacity, high speed storage array. 40

Image Identification Process (ID): Determine offsets (reserved area). Search for partition and file system signatures. Localize and read back RAID controller Disk Control Blocks (DCB s). Allows the lab to confirm all members are present, disk order, and RAID type. 41

ID Process: Typically contains physical drive identification assets. Includes other RAID indicators such as disk order, stripe size, and parity rotation. Eg: DELL (PERC) RAID 1: 42

Pre-Recovery Evidence Assessment: Only relates to information supported by the DE, DD, and ID processes. This assessment determines 80% of what actions are needed and will be taken by the data recovery lab. 43

Pre-Recovery Variables Assessment: Involves information provided by the client. Includes any post-failure recovery attempts. Is often inaccurate and lacks important details which delays or complicates recovery. Is the greatest cost influencer. Directly affects the outcome of data recovery. 44

Major Influential Variables: A manual or automated rebuild attempt was performed and has failed / did not complete. Post-failure read and write operations have been performed. Drive order has been compromised or conflicts with DCB information. File system type. 45

Case Example Summary: A drive had an electrical failure. The array rebuild failed due to bad sectors on existing RAID member. Data recovery services required on the electrically failed drive. Manual RAID assembly written to a new storage array mounted in ESXi to view datastore (closed file system). 46

Keeping your data safe: DATA RECOVERY PREVENTION 47

Why the need for data recovery? Backups are missing, never tested, out of date, or produce corrupt data. Backup restoration time exceeds recovery time (often seen with cloud backup users). Hardware and storage capacity limitations. Limited IT resources (dishonesty, outsourcing, inventory management, and inadequate maintenance). 48

Understanding Data Endurance: Avoid the use of large 2TB + drives in a RAID array. More platters = more heads and therefore increases chance of disk failures. More smaller drives are better than less larger drives (both performance and redundancy). Consider SAS vs. SATA vs. SSD based on the application(s) and speed requirements. Always have both a hot and cold spare. 49

Recommended IT Policies: Choose correct hardware. Proper controller configuration. External audits. Written schedule. Assign IT personal liability. Test backups (virtualization). Standby equipment (hardware). 50

Choosing Correct Hardware: The most critical decision: Avoid green (energy efficient) hard drives for any backup or high-use data storage functions. Understand Enterprise vs. Desktop storage. More drives vs. high density drives preferred. Ensure backplane fully supports SPGIO. RAID controller should support hardware polling and SMART schedule testing using own NIC. Use temperature monitors in your server room. 51

Proper Rack Layout Concepts: Organize according to heat generation: Heat generating devices should be at the top of your rack (UPS, Switch & Router exempt). If possible use 1U spacers between servers and 2U spacers between SAN or NAS devices. Cable management is critical to ensure proper air intake, circulation, and ventilation. Equipment accessible for media replacement with clear view of all LED indicator lights. 52

Proper Controller Configuration: Performance is not everything: Choose the best RAID type which suits the application(s). Challenge manufacturer defaults. Always test new equipment by failing a drive and then document the replacement / rebuild process. Use (local) ISP SMTP server and e-mail to text message address for any / all alerting functions. Always utilize both a hot and cold spare. 53

External Audits: Provides management insight: Determines what is really backed up. Includes virtualization of your backup data. Determine fragility and / or condition of the backup. Calculates an accurate restoration time. Introduces data disaster restoration practices and optimizes recovery workflow. Adds security and privacy controls. 54

Written Schedule: Includes an offline copy of the following: Initial purchase, replacement, and warranty expiration dates. Drive manufacturer, model, serial number, and member assignment(s). Last SMART (or RAID polling) test date. RAID controller, array type, and configuration. 55

IT Liability: Includes an offline copy of the following: Initial purchase, replacement, and warranty expiration dates (both server and drives). Drive manufacturer, model, serial number, and member assignment(s). Last SMART (or RAID polling) test date. RAID controller, array type(s), and configuration(s). Ensure accuracy by manager or third party. 56

What s right and what s wrong: DATA RECOVERY FACTS AND FICTION 57

Data Recovery Facts (Accreditations): The data recovery industry does not have international standards or a governing body (despite any association claims). Data recovery professionals are self-taught. Data recovery is the reverse engineering of what the manufacturer doesn t want you to know. Real world experience is everything. 58

Data Recovery Facts (Yearly Costs): SATA Hardware Imager: $5,000 - $10,000. SA / Firmware Programmer: $10,000 Storage Array: $50,000 Clean Room / Clean Box: $5,000 - $35,000 Insurance: $25,000+ Research & Development: $50,000+ Staff (individual) $125,000+ 59

Data Recovery Facts (Summary): There are only 3 lab recognized providers of data recovery hardware and software. Successful data recovery labs develop their own procedures and tools. New hardware requires lab to continuously purchase new equipment. Data recovery is expensive. 60

Data Recovery Fiction: Freezing a drive may have worked for older, lower density hard drives. It will not work for newer drives / causes data loss. Hitting or knocking a drive never works. Once data is overwritten it s not recoverable (regardless of what the TV show CSI tells us). Clean boxes vs. clean rooms. Wearing a lab coat make us look smarter. 61

Know your options before a lab is involved: DEALING WITH A DATA RECOVERY SITUATION 62

Dealing with Disaster: Immediately reduce and / or remove any read and write operations. Write down what the last execution(s) were. Consult your backup systems to determine last backup date / time. Immediately establish a working data recovery budget before you call a lab. 63

Choosing a Data Recovery Company: Ensure they do not outsource. Do they ship to another lab (referral)? Or perform the actual recovery at their office? Inquire about success rates (by model number) and the time needed to evaluate the storage device. Group or individual? Insist on a no data, no charge policy. Research industry reputation and history. 64

Thanks, and enjoy the conference! MORE INFORMATION: WWW.ICUBEDEV.COM 65