LVM2 data recovery. Milan Brož mbroz@redhat.com. LinuxAlt 2009, Brno



Similar documents
LVM and Raid. Scott Gilliland

Typing some stupidities in text files, databases or whatever, where does it fit? why does it fit there, and how do you access there?

Cloning Complex Linux Servers

Setup software RAID1 array on running CentOS 6.3 using mdadm. (Multiple Device Administrator) 1. Gather information about current system.

Chip Coldwell Senior Software Engineer, Red Hat

The Logical Volume Manager (LVM)

Linux Software Raid. Aug Mark A. Davis

How To Set Up Software Raid In Linux (Amd64)

Ubuntu bit + 64 bit Server Software RAID Recovery and Troubleshooting.

Shared Storage Setup with System Automation

Support for Storage Volumes Greater than 2TB Using Standard Operating System Functionality

Installing Debian with SATA based RAID

Enterprise Storage Management with Red Hat Enterprise Linux for Reduced TCO. Dave Wysochanski

Encrypting Your Files. Because nobody else will And would you trust them if they did?

Creating a Cray System Management Workstation (SMW) Bootable Backup Drive

How To Manage Your Volume On Linux (Evms) On A Windows Box (Amd64) On A Raspberry Powerbook (Amd32) On An Ubuntu Box (Aes) On Linux

Backtrack 4 Bootable USB Thumb Drive with Full Disk Encryption

Adaptec ASR-7805/ASR-7805Q/ASR-71605/ASR-71605Q/ASR-71605E/ASR-71685/ASR SAS/SATA RAID Controllers AFM-700 Flash Backup Unit

Softraid: OpenBSD s virtual HBA, with benefits. Marco Peereboom OpenBSD

Red Hat Linux Administration II Installation, Configuration, Software and Troubleshooting

Disk encryption... (not only) in Linux. Milan Brož

Red Hat Enterprise Linux 6 Logical Volume Manager Administration. LVM Administrator Guide

Practical Online Filesystem Checking and Repair

GL-250: Red Hat Linux Systems Administration. Course Outline. Course Length: 5 days

Module 4 - Introduction to XenServer Storage Repositories

Choosing Storage Systems

Redundant Array of Inexpensive/Independent Disks. RAID 0 Disk striping (best I/O performance, no redundancy!)

Red Hat System Administration 1(RH124) is Designed for IT Professionals who are new to Linux.

How you configure Iscsi target using starwind free Nas software & configure Iscsi initiator on Oracle Linux 6.4

ENTERPRISE LINUX SYSTEM ADMINISTRATION

Linux Powered Storage:

Veritas Storage Foundation UMI error messages for HP-UX

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Navigating the Rescue Mode for Linux

Data storage, backup and restore

These application notes are intended to be a guide to implement features or extend the features of the Elastix IP PBX system.

Configuring Linux to Enable Multipath I/O

Drobo How-To Guide. What You Will Need. Use a Drobo iscsi Array with a Linux Server

2.6.1 Creating an Acronis account Subscription to Acronis Cloud Creating bootable rescue media... 12

Outline. Failure Types

Solaris For The Modern Data Center. Taking Advantage of Solaris 11 Features

Unix System Administration

Snapshot User s Manual (Installation and Operation Guide for Linux)

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization

Taking Linux File and Storage Systems into the Future. Ric Wheeler Director Kernel File and Storage Team Red Hat, Incorporated

Oracle VM Server Recovery Guide. Version 8.2

Bare Metal Backup And Restore

README.TXT

Intel Rapid Storage Technology (Intel RST) in Linux*

Acronis Backup & Recovery 11.5 Server for Linux. User Guide

10 Red Hat Linux Tips and Tricks

TELE 301 Lecture 7: Linux/Unix file

Add Disk Space to a VM when a Partition is Full

Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER

Storage Fusion Architecture. Multipath (v1.8) User Guide

XenServer Storage Overview. Worldwide Field Readiness. XenServer Storage Overview. Target Audience

Intel Rapid Storage Technology enterprise (Intel RSTe) for Linux OS

CS3210: Crash consistency. Taesoo Kim

Chapter 2 Array Configuration [SATA Setup Utility] This chapter explains array configurations using this array controller.

RedHat (RHEL) System Administration Course Summary

4 Backing Up and Restoring System Software

How To Fix A Fault Fault Fault Management In A Vsphere 5 Vsphe5 Vsphee5 V2.5.5 (Vmfs) Vspheron 5 (Vsphere5) (Vmf5) V

LBNC and IBM Corporation Document: LBNC-Install.doc Date: Path: D:\Doc\EPFL\LNBC\LBNC-Install.doc Version: V1.0

Acronis Backup & Recovery 11.5 Server for Linux. Update 2. User Guide

Replacing a Laptop Hard Disk On Linux. Khalid Baheyeldin KWLUG, September 2015

WES 9.2 DRIVE CONFIGURATION WORKSHEET

Capturing a Forensic Image. By Justin C. Klein Keane <jukeane@sas.upenn.edu> 12 February, 2013

Red Hat Enterprise Linux 7 High Availability Add-On Administration. Configuring and Managing the High Availability Add-On

ovirt and Gluster hyper-converged! HA solution for maximum resource utilization

When Good Disks Go Bad: Dealing with Disk Failures Under LVM

Restoring a Suse Linux Enterprise Server 9 64 Bit on Dissimilar Hardware with CBMR for Linux 1.02

Virtualization and Performance NSRC

Open Source Data Recovery

Using iscsi with BackupAssist. User Guide

Linux Disaster Recovery best practices with rear

Reboot the ExtraHop System and Test Hardware with the Rescue USB Flash Drive

Oracle Linux Advanced Administration

EMC DATA DOMAIN DATA INVULNERABILITY ARCHITECTURE: ENHANCING DATA INTEGRITY AND RECOVERABILITY

File Systems Management and Examples

On Disk Encryption with Red Hat Enterprise Linux

How To Align A File System In Virtual Environments

COMPUTER FORENSICS. DAVORY: : DATA RECOVERY

NI Real-Time Hypervisor for Windows

COS 318: Operating Systems

File System Forensics FAT and NTFS. Copyright Priscilla Oppenheimer 1

Red Hat Certifications: Red Hat Certified System Administrator (RHCSA)

Backup and Disaster Recovery Restoration Guide

Filesystems Performance in GNU/Linux Multi-Disk Data Storage

Storage Management. in a Hybrid SSD/HDD File system

ucloud server User Guide v3.0 ( )

LifeKeeper for Linux. Software RAID (md) Recovery Kit v7.2 Administration Guide

PARALLELS CLOUD STORAGE

PADS GPFS Filesystem: Crash Root Cause Analysis. Computation Institute

Veritas CommandCentral Disaster Recovery Advisor Release Notes 5.1

NSS Volume Data Recovery

Symantec Backup Exec 2014 Icon List

Acronis Backup & Recovery 10 Server for Linux. Command Line Reference

Transcription:

LVM2 data recovery Milan Brož mbroz@redhat.com LinuxAlt 2009, Brno

Linux IO storage stack [ VFS ] filesystem [ volumes ] MD / LVM / LUKS / MPATH... [ partitions ] legacy partition table recovery from the bottom up (also benchmark and optimize performance this way) driver / IO scheduler block device layer, iscsi,... HW

Common storage problems Storage failures missing disk (cable, power, network iscsi) bad sectors intermittent HW failures Accidental changes Overwritten metadata Bugs firmware drivers volume manager filesystem

Planning storage (failures) Failures are inevitable Losing data - problem? redundancy (RAID, replication) backups (RAID is NOT a backup) TEST all changes first most problems can be solved without data loss data loss is very often caused by operator error when trying to "fix" the problem

Houston, we have a problem... Don't panic! Think, try to understand the problem. read manual, error messages, logs, Don't make changes before the problem is understood. Backup. Test recovery strategy. Seek advice. paid support, mailing list, IRC

HW failures intermittent / permanent Disks, cables, connectors, power Firmware bugs Operator error (again) wrong cable connection 1) Fix HW 2) recover data Bad sectors use binary backups dd, dd_rescue,...

Driver & disk partition problems Driver / kernel version what changed an update? which version works, which not Legacy partitions fdisk, parted legacy partition table, GPT partprobe refresh in-kernel metadata Gpart guess & recover partition table Check device sizes blockdev fdisk / parted sysfs

MD / LUKS / multipath some pointers MD multiple device (RAID) metadata, configuration mdadm, mdadm.conf cat /proc/mdstat, sysfs LUKS disk encryption cryptsetup luksdump, status Multipath... multipath -ll, multipath.conf

Volume management - metadata Metadata "how to construct device" Where are metadata stored? - on-disk (in first or last sectors) - in configuration file MD on-disk, persistent superblock, handled in kernel LVM2, LUKS on disk, handled in userspace multipath multipath.conf, userspace daemon Recover metadata then data

LVM2 - overview LV logical volume... LV /dev/vg/lv logical volume extent extent extent extent extent... extent Volume Group PV physical volume PV physical volume... PV physical volume /dev/sda /dev/sdb /dev/...

LVM2 on-disk PV format Label 1 sector: signature, PV UUID, metadata area position Metadata Area 1 sector: metadata area header pointer to metadata circular buffer, text format (at least 2 versions of metadata) atomic update 1) write new version 2) update pointer SEQNO sequential number checksum, redundancy, autorepair label metadata area DATA (extents) [padding] LVM2 Physical Volume

LVM2 text metadata example creation_time = description = "Created *after* executing "... vg_test { id = "xxxxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx" seqno = 25... physical_volumes { pv0 { id = "xxxxxx-xxxx-xxxx-xxxx-xxxx-xx... device = "/dev/sdb1" # Hint only... pe_start = 384 pe_count = 50 # 200 Megabytes } pv1 {... } } logical_volumes { lv1 { id = "xxxxxx-xxxx-xxxx-xxxx-xx......

Metadata backup Archive & backup in /etc/lvm vgcfgbackup, vgcfgrestore Partial mode e.g. vgchange --partial Test mode no metadata updates --test LVM2 & system info / debug lvmdump -vvvv (all commands)

Example: [1/6] missing PV Rescan all devices on system # vgscan Reading all physical volumes. This may take a while... Couldn't find device with uuid 'DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp'. Found volume group "vg_test" using metadata type lvm2 Let's check what is on the missing device: # pvs -o +uuid Couldn't find device with uuid 'DhmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp'. PV VG Fmt Attr PSize PFree PV UUID /dev/sdb vg_test lvm2 a- 200.00m 0 5KjGmZ-vhc6-u62q-YcSZ-aKX0-bCYP-EXtbzd unknown device vg_test lvm2 a- 200.00m 0 DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp # lvs -o +devices Couldn't find device with uuid 'DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp'. LV VG Attr LSize Devices lv1 vg_test -wi--- 100.00m /dev/sdb(0) lv2 vg_test -wi--- 100.00m unknown device(0) lv3 vg_test -wi--- 200.00m /dev/sdb(25) lv3 vg_test -wi--- 200.00m unknown device(25) Resume: lv1 is OK, lv2 is lost, lv3 half is lost.

Example: [2/6] missing PV You can activate only full available LVs now. # vgchange -a y vg_test Couldn't find device with uuid 'DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp'. Refusing activation of partial LV lv2. Use --partial to override. Refusing activation of partial LV lv3. Use --partial to override. 1 logical volume(s) in volume group "vg_test" now active for --partial, missing parts are replaced by device specified in /etc/lvm.conf: missing_stripe_filler = "error" default is error (returns IO error on access)

Example: [3/6] missing PV Let's try to recover at least something from lv3. Prepare block "zero" device # dmsetup create zero_missing --table "0 10000000 zero" missing_stripe_filler = "/dev/mapper/zero_missing" # vgchange -a y vg_test --partial... 3 logical volume(s) in volume group "vg_test" now active Always copy volume to another disk, "zero" is replacement, not real disk writes are ignored.

Example: [4/6] missing PV Remove all LVs on missing disk. # vgreduce --removemissing vg_test Couldn't find device with uuid 'DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp'. WARNING: Partial LV lv2 needs to be repaired or removed. WARNING: Partial LV lv3 needs to be repaired or removed. WARNING: There are still partial LVs in VG vg_test. To remove them unconditionally use: vgreduce --removemissing --force. Proceeding to remove empty missing PVs. # vgreduce --removemissing vg_test --force Couldn't find device with uuid 'DhmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp'.... Wrote out consistent volume group vg_test # pvs PV VG Fmt Attr PSize PFree /dev/sdb vg_test lvm2 a- 200.00m 100.00m # lvs -o +devices LV VG Attr LSize Devices lv1 vg_test -wi--- 100.00m /dev/sdb(0) Done.

Example: [5/6] missing PV And now... something completely different. "operator error" - old device magically reappears! # vgscan Reading all physical volumes. This may take a while... WARNING: Inconsistent metadata found for VG vg_test - updating to use version 18 Removing PV /dev/sdc (DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp) that no longer belongs to VG vg_test Found volume group "vg_test" using metadata type lvm2 # pvs PV VG Fmt Attr PSize PFree /dev/sdb vg_test lvm2 a- 200.00m 100.00m /dev/sdc lvm2 -- 204.00m 204.00m Fixed automagically SEQNO 18 new metadata version.

Example: [6/6] missing PV Recover the old metadata from backup. # vgcfgrestore -f /etc/lvm/archive/vg_test_01564.vg vg_test Cannot restore Volume Group vg_test with 1 PVs marked as missing. Restore failed. Device marked 'missing' remove flag manually. Edit the backup file:... pv1 {... id = "DHmMDP-bqQy-TalG-2GLa-sh6o-fyVW-3XQ3gp" device = "unknown device" flags = ["MISSING"] # vgcfgrestore -f vg_test_edited.vg vg_test Restored volume group vg_test Done.

Example: [1/3] overwritten PV header Original system configuration (live example) # pvs PV VG Fmt Attr PSize PFree /dev/sda2 system_vg lvm2 a- 7.89G 0 /dev/sdb system_vg lvm2 a- 8.00G 0 # lvs LV VG Attr LSize root system_vg -wi-ao 13.86G swap system_vg -wi-ao 2.04G

Example: [2/3] overwritten PV header

Example: [3/3] overwritten PV header Rescue boot from DVD (log from live example) 1) Check the system state, volumes, partitions... # lvm pvs # lvm lvs -o name,devices # lvm pvs -a # fdisk -l /dev/sdb # lvm pvs /dev/sdb1 # lvm pvs /dev/sdb Resume: there is new partition table on /dev/sdb and new swap partition on sdb1 but correct lvm2 label on sdb is still present (but maybe metadata corrupted). Task: remove partition table and recreate lvm metadata. 2) recover LVM metadata # fdisk /dev/sdb # lvm vgcfgbackup -f /vg system_vg # (remove MISSING flag from /vg file) # lvm pvcreate -u <missing PV UUID> --restorefile /vg system_vg # lvm vgcfgrestore -f /vg system_vg # lvm vgchange -a y system_vg # fsck -f -C /dev/system_vg/root...

Some common problems Duplicated volume UUID Checksum error pvmove & suspended devices udev initrd missing drivers

Storage performance IO scheduler CFQ / deadline / noop per device / default (elevator= ) CFQ for desktop vs avoid for NFS, iscsi, KVM SSD no seek time (noop, deadline) Partitions offset alignment (for SSD, fdisk -u to use sector units) Beware of magic 63 sector offset (legacy DOS) MD & LVM MD chunk size aligned to LVM data offset Readahead

LVM2 storage performance Most values automatically detected now automatic alignment for MD device automatic readahead by default detects IO topology layer (kernel > 2.6.31) Large VG (many PVs) only several metadata areas needed pvcreate metadatacopies 0 Read more in Red Hat Summit presentation by Mike Snitzer https://www.redhat.com/f/pdf/summit/msnitzer_1120_optimize_storage.pdf