Copyright Tom Kline, Ryan Whyms 2007. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Investigating the Use of Virtual Servers to Improve the Restoration Process of an Active Directory Forest
MARC07
Tom Kline & Ryan Whyms
University of Maryland, Baltimore
Center for Information Technology Services
Overview
  - Strategy to Improve the Process
  - Virtualization Explained
  - Active Directory Concepts
  - Forest Disaster Recovery Procedure
Recommendations & Disclaimer
  - Forest Corruption: the first step is to call Microsoft
  - Continue to Back Up via Tapes
  - Focused on total Forest Restore
  - Is it appropriate for your Environment? Do the Research
  - Proposition has not been tested; it is a work in progress (WIP)
  - Questions?
Forest-Wide Failure
  What?
    - No Domain Controller (DC) replication
    - Cannot make changes to AD
    - Cannot install new DCs
  How?
    - Intentional/unintentional Forest Schema Corruption
    - Intentional/unintentional deletion of critical objects
    - Hacking/security breach, worm/virus
  How likely?
Can we Improve the Process?
  - THE Reference: MS doc, W2k3 Planning for Active Directory Forest Recovery
  1) Introduce Lag Site
  2) Replace Tape restore with Virtualization
  - Improve = significantly speed up
Why so long?
Server Virtualization Ryan Whyms
Server Virtualization
  - What is server virtualization?
  - How does virtualization work?
  - Pros and cons of virtualization
  - Myths of virtualization
  - Real world applications
What is Server Virtualization?
  - Server virtualization replaces physical hardware with software
  - Biggest players in the industry:
      VMware: VMware ESX Server
      SWsoft: Virtuozzo
      Microsoft: Microsoft Virtual Server
How Does Virtualization Work?
  - Software masquerades as hardware
  - A base (or host) operating system hosts multiple guest operating systems
  - Guest operating systems cannot tell the difference between physical hardware and virtualized hardware
Host & Guest Operating Systems
  [Diagram: layers, top to bottom]
    Application
    Guest OS
    Virtualized hardware
    Host OS
    Actual hardware
  Each guest OS is completely segregated.
Pros of Virtualization
  - More efficient hardware utilization
      Many servers only utilize an estimated 5-15% of their full computing potential. Virtualization allows upwards of 85% utilization.
  - Reduced total cost of ownership (TCO)
      Virtualization allows for a reduction in the use of electricity, cooling requirements, space requirements, and network drop connections.
  - New virtual server rollout is FAST
      New virtual servers can be put into production in as little as 15 minutes with the proper tools.
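The utilization figures on this slide suggest a rough consolidation estimate. The sketch below is illustrative only: the function name and the assumptions (identical servers, load that adds linearly, no memory or I/O limits) are not from the presentation, and real sizing needs measured peak loads.

```python
import math

def hosts_needed(num_servers, avg_utilization, target_utilization=0.85):
    """Estimate how many virtualization hosts could carry num_servers.

    Hypothetical model: every server has the same capacity and load adds
    linearly. Memory, disk, and network constraints are ignored.
    """
    if not 0 < avg_utilization <= 1 or not 0 < target_utilization <= 1:
        raise ValueError("utilization values must be in (0, 1]")
    # Total work, expressed in units of one fully used server.
    total_load = num_servers * avg_utilization
    # Each host is filled only up to the target utilization.
    return math.ceil(total_load / target_utilization)

# Example: 20 servers that each average 10% utilization.
print(hosts_needed(20, 0.10))
```

Under these assumptions the estimate is a lower bound; the redundancy called for under "Cons of Virtualization" would add hosts on top of it.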
Cons of Virtualization
  - Increased Initial Investment
      The virtualization server will require more basic resources than a typical server, the biggest of which is system memory.
      Redundancy must be built into a virtualization server to help minimize the chance the server will go down due to failures (e.g. power supply or hard disk drive failure).
  - Single point of failure
      If the virtualization server suffers a critical failure and goes down, so do all the virtual servers it hosts.
Myths of Virtualization
  - Fewer servers means reduced staff needs! MYTH!
      Just because one cannot physically see a server does not mean it needs no regular maintenance; a virtual server needs the same upkeep as a physical one.
  - Virtualization is too risky to use! MYTH!
      Tools exist to minimize the chances of a complete meltdown of the virtualization environment.
  - Vendors won't support a virtualized server! MYTH!
      A virtual server is no different than a physical server. As far as software vendors are concerned, the only difference is the hardware.
Real World Applications
  - Development Environment
      Instead of purchasing double hardware, virtualize the development server and reduce operating costs & TCO.
  - Simulation Lab Environment
      Use virtualization to create a simulation lab in which large, complex computing environments can be replicated on a small scale. Useful for testing computing environment changes, such as firewall rules, to see how the environment reacts.
  - Low Use Business Critical Applications
      Sometimes business critical applications don't see heavy use and waste resources if on dedicated hardware. Placing the application on a virtual server reduces operating costs & TCO.
Active Directory Concepts
  Active Directory Flexible Single Master Operations (FSMO) Roles:
    - Schema Master
    - Domain Naming Master
    - RID Master
    - PDC Emulator
    - Infrastructure Master
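When documenting roles before a forest recovery, it helps to remember which FSMO roles exist once per forest and which exist once per domain. The following sketch records that scope in a small table; the dictionary and function names are illustrative and not part of any Microsoft tool.

```python
# Scope of each FSMO role: one holder per forest, or one per domain.
FSMO_ROLES = {
    "Schema Master": "forest",
    "Domain Naming Master": "forest",
    "RID Master": "domain",
    "PDC Emulator": "domain",
    "Infrastructure Master": "domain",
}

def role_instances(num_domains):
    """Count the FSMO role instances to document in a forest."""
    forest_wide = sum(1 for scope in FSMO_ROLES.values() if scope == "forest")
    per_domain = sum(1 for scope in FSMO_ROLES.values() if scope == "domain")
    return forest_wide + per_domain * num_domains

# Example: a root domain plus two child domains.
print(role_instances(3))
```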
Active Directory Concepts
  - AD Directory: NTDS file
  - Forest/Domain Partitions:
      Schema
      Configuration
      Domain Naming
  - Global Catalog
  - Replication pulls changes
Active Directory Concepts
  - Non-authoritative Restore
  - Authoritative Restore
  - Directory Services Restore Mode (DSRM)
  - NTDSUTIL
  - Update Sequence Number (USN)
  - Invocation ID
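The USN and Invocation ID matter because a VM copy that is simply powered back on reuses old USNs under the same Invocation ID, so partners ignore its new changes (a USN rollback). The toy model below illustrates the idea only; the class and method names are invented for this sketch, and real AD replication is far more involved.

```python
import uuid

class DomainController:
    """Toy DC: a change log stamped with ever-increasing USNs."""
    def __init__(self):
        self.invocation_id = uuid.uuid4()
        self.usn = 0
        self.changes = []  # list of (usn, description)

    def write(self, description):
        self.usn += 1
        self.changes.append((self.usn, description))

    def snapshot(self):
        return (self.invocation_id, self.usn, list(self.changes))

    def restore(self, snap, reset_invocation_id):
        self.invocation_id, self.usn, saved = snap
        self.changes = list(saved)
        if reset_invocation_id:
            # A proper restore presents the database as a new source.
            self.invocation_id = uuid.uuid4()

class Partner:
    """Toy replication partner tracking the highest USN seen per Invocation ID."""
    def __init__(self):
        self.high_watermark = {}

    def pull(self, dc):
        seen = self.high_watermark.get(dc.invocation_id, 0)
        new = [desc for usn, desc in dc.changes if usn > seen]
        self.high_watermark[dc.invocation_id] = max(seen, dc.usn)
        return new

def demo(reset_invocation_id):
    dc, partner = DomainController(), Partner()
    dc.write("a"); dc.write("b")
    snap = dc.snapshot()                  # image taken at USN 2
    dc.write("c"); dc.write("d"); dc.write("e")
    partner.pull(dc)                      # partner has now seen USN 5
    dc.restore(snap, reset_invocation_id)
    dc.write("f")                         # reuses USN 3 after the restore
    return partner.pull(dc)

print(demo(reset_invocation_id=False))  # change "f" is not pulled
print(demo(reset_invocation_id=True))   # change "f" is pulled
```

In this simplified model the partner re-pulls older changes after the Invocation ID is reset; that duplication is harmless here and only serves to show that the new change is no longer skipped.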
Procedure for Deployment
  - Get Departmental/School buy-in
  - Develop and share Plan
  - Document all DC Roles and Functions
  - Maintain current Backup Scenario as Fallback
  - Create server VMs for each domain (W2k3 SP1)
Procedure for Deployment Create a Lag Site in Sites & Services
Procedure for Deployment Create/enable DC Locator disable DNS
Procedure for Deployment Ensure that VMs do Not Register in WINS
Procedure for Deployment
  - Promote VM server to a DC for each domain
  - Move VMs to the Lag Site
  - Manually rebuild Replication Links for each VM to pull Replication from one physical DC within its Domain
Procedure for Deployment
  - Manually Create Replication Links from the Root VM DC to each Child VM DC
  - Change Schedule for Replication Links
Procedure for Deployment
  - Let Replication occur between Domain physical DCs and their respective VM DCs
  - Let Replication occur between VMs in Lag Site
  - Replication is Disabled until Next Scheduled Time
Procedure for Deployment
  - Make Copy/Backup of each VM
      Back up System State (can use ntbackup)
      Shut down VM
      Copy file(s), label day/month
      Restart VMs, wait for next scheduled replication cycle
  - Test VM Forest Restores in Lab
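The "copy file(s), label day/month" step could be scripted along the following lines. This is a minimal sketch, not the presenters' tooling: the function name, the file paths in the usage comment, and the choice to put a full ISO date (including the year) in the file name are all assumptions, and the VM must already be shut down before its files are copied.

```python
import shutil
from datetime import date
from pathlib import Path

def labelled_copy(vm_file, backup_dir, day=None):
    """Copy one VM file into backup_dir with the date added to its name.

    Assumes the VM is shut down so the file is not changing during the copy.
    """
    day = day or date.today()
    src = Path(vm_file)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # e.g. rootdc.vhd -> rootdc_2007-03-05.vhd
    dest = dest_dir / f"{src.stem}_{day:%Y-%m-%d}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest

# Hypothetical usage (paths are placeholders):
# labelled_copy("D:/vms/rootdc.vhd", "E:/vm-backups")
```

Dated names sort chronologically, which makes it easier to pick the copy from a given day during a restore.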
Procedure for Restore
  - Identify Problem and When it Occurred
  - Call Microsoft: Determine how to Proceed
  - Shut down all DCs
  - Best Practice: Choose Backup-VMs at least Two Days prior to Incident
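The "at least two days prior" rule can be expressed as a small selection function. The sketch below assumes backups are identified by calendar date and that the newest eligible copy is preferred; both the function name and that preference are illustrative assumptions.

```python
from datetime import date, timedelta

def choose_backup(backup_dates, incident_date, min_gap_days=2):
    """Return the newest backup taken at least min_gap_days before the incident.

    Returns None when no backup is old enough.
    """
    cutoff = incident_date - timedelta(days=min_gap_days)
    eligible = [d for d in backup_dates if d <= cutoff]
    return max(eligible) if eligible else None

# Example: daily copies exist for 1, 3, 5 and 6 March; the incident is 7 March.
backups = [date(2007, 3, 1), date(2007, 3, 3), date(2007, 3, 5), date(2007, 3, 6)]
print(choose_backup(backups, date(2007, 3, 7)))
```

The same date must then be used for every domain's VM, since the restore procedure picks child-domain copies from the same point in time.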
Procedure for Restore
  - Restart Root VM in Directory Services Restore Mode (DSRM): F8 at startup
  - Need DS Recovery Password
  - Start NTDSUTIL
  - Perform Non-Auth Restore of the System State that was Saved for that VM on that Day
Procedure for Restore
  - NTDSUTIL: Perform Metadata Cleanup on Root VM to remove all Physical Domain DCs, their Replication Links, and DNS entries in AD
  - Note: Roles will be automatically seized. Recall that DNS is still running on this DC.
Procedure for Restore
  - Reboot Root VM (one DC now functioning)
  - Change Replication Schedule to all the time
  - Patch and apply/update AV software if necessary
  - Make it a GC
  - As per MS:
      Raise RID value
      Reset Computer Acct of DC twice
      Reset krbtgt password twice
      Reset Trust Passwords
  - Check Logs
Procedure for Restore
  - Re-install W2k3 SP1 on the other Physical Root Servers and DCpromo; they will now acquire Schema, Configuration, Naming DBs from Root VM
  - May have to Recreate Default First Site
  - Return DCs to your Default First Site
  - Patch and apply AV software if necessary
  - Install and Configure AD Integrated DNS
  - Check Sites & Services: Ensure Replication Works
  - Verify SRV records register in AD DNS
Procedure for Restore
  - If No Replication:
      Check for USN Rollback in Logs (Event ID 2095)
      Troubleshoot Replication: support tools
      More info in KB 875495
  - No Issues: all DCs in Root Domain are Now Functioning
Procedure for Restore
  - Choose Second Domain VM DC from Same Point in Time
  - Copy/Move VM
  - Restart in DSRM Mode
  - System State and Metadata Cleanup
  - Reboot, apply patches/AV software if necessary
Procedure for Restore
  - As per MS:
      Raise RID value
      Reset Computer Acct of DC twice
      Reset krbtgt password twice
      Reset Trust Passwords
  - Check Logs
  - Change Replication Schedule to All the Time
Procedure for Restore
  (One Child VM-DC up and One Child Domain Functioning)
  - Contact Domain Admins: They can now DCpromo remaining Physical DCs without Enterprise Admin Intervention
  - Repeat Procedure for Remaining Domains
Procedure for Restore
  Post-Recovery Clean up:
    - Backlinks: restore objects that link to objects in other domains
    - Re-distribute Domain Roles that were seized
    - Delete WINS records for any DCs not restored
Procedure for Restore
  Post-Recovery Clean up:
    - Restore External trusts
    - Restore or Re-install any Software apps
    - Understand that all Additions, Deletions, and Modifications made after the chosen Restore Date are Lost
    - Admins should keep logs
Conclusion
References
  - MS Windows Server 2003: Planning for Active Directory Forest Recovery
    http://www.microsoft.com/downloads/details.aspx?familyid=afe436fa-8e8a-443a-9027-c522dee35d85&displaylang=en
  - MS Windows Server 2003: Running a Domain Controller in Virtual Server 2005
    http://www.microsoft.com/downloads/details.aspx?familyid=64db845d-f7a3-4209-8ed2-e261a117fc6b&displaylang=en
  - Definitive Guide to Active Directory Disaster Recovery (NetPro Computing)
    http://www.netpro.com/media/pdf/netpro_addr_guide.pdf
  - Detect/Recover USN Rollback in W2k3, KB 875495
    http://support.microsoft.com/kb/875495
  - Considerations when hosting AD DC in Virtual Hosting environments, KB 888794
    http://support.microsoft.com/kb/888794
  - Definitive Guide to Rapid Windows Recovery (Moskowitz and Jones)
    http://nexus.realtimepublishers.com/dgrwr.htm
  - MS TechNet: Performing a Non-authoritative Restore of a Domain Controller
    http://technet2.microsoft.com/windowsserver/en/library/f3bfb611-dcbe-4365-8f1d-3321916aeb631033.mspx?mfr=true
Questions
  Tom Kline: tklin001@umaryland.edu
  Ryan Whyms: rwhyms@umaryland.edu