Design Considerations for Citrix XenApp/XenDesktop 7.6 Disaster Recovery

Citrix Solutions Lab White Paper

This paper examines the issues and concerns around building a disaster recovery plan and solution, the possible use cases that may occur, and how a team of engineers within the Citrix Solutions Lab approaches building a disaster recovery solution.

November 2015
Prepared by: Citrix Solutions Lab
Table of contents

Section 1: Overview
  Executive summary
  Audience
  Disaster Recovery vs. High Availability
  Defining types of Disaster Recovery
  Defining what is critical
Section 2: Defining the Environment
  Service Descriptions
  User to Site Assignment
  User Counts by Region
  Regional Site 1 Network Diagram
  Regional Site 2 Network Diagram
  Cold DR Site Network Diagram
  Software
  Hardware
  Servers
  Network
  Storage
  Use Cases
Section 3: Deployment
  Configuration Considerations
  Region Server Pools
  Failover Process
Section 4: Conclusion
Section 5: Appendices
  Appendix A
    References
  Appendix B
    High Level Regional Diagrams
  Appendix C
    Identifying Services and Applications for DR/HA
Section 1: Overview

Executive summary

There is much conversation around executing disaster recovery for a data center and utilizing high availability wherever possible. However, what are the requirements around disaster recovery, and how does it differ from high availability? How do they work together to ensure your systems and applications are up and available, no matter what? This white paper looks at understanding disaster recovery and high availability.

As with most things in life, there are trade-offs. The more resilient to failure you want to be, the more it is going to cost. How do these trade-offs affect you? There is the old-fashioned approach of writing everything of importance to tape, storing the tape off-site, and waiting for a disaster to occur. Tape is a very low cost option, but it could take days or weeks to rebuild your environment. The other end of the spectrum comes from utilizing today's technology and making everything active/active, essentially running two complete data centers in two different locations. The two data centers option is extremely resilient, but also extremely costly. Simply put, you are betting you are going to have a disaster that affects at least one of your sites.

What exactly needs to be up and running as quickly as possible after a failure of your data center? Where does high availability come into play to help? This document looks at some of these questions, and asks a few more, to help you understand and make good decisions in building a disaster recovery plan. This project is not looking at sizing, scaling, or performance, but at design considerations for disaster recovery.

In the Solutions Lab, a team of engineers including lab hardware specialists, network specialists, storage specialists, architects, and Citrix experts was challenged to build a disaster recovery solution for a fictitious company defined by Solutions Lab Management. This document shows how the company was defined, how the team architected and then implemented a solution, and some of the issues and problems they uncovered as flaws in their plan or things they did not expect or anticipate. The resulting plan was compared to how companies such as Citrix handle disaster recovery, and it was found to be very similar. The team had an advantage in that they were able to build the company data center to fit their design, rather than fit a design to an existing data center. Hopefully what they learned and uncovered will assist you as you think about building your own disaster recovery plan.

Note that a major component of any disaster recovery solution is the storage and storage vendor used. The concerns are around the amount of data to be moved between the sites and the acceptable delta between data synchronizations. For this paper, we worked with EMC, utilizing their storage solution to achieve our defined goals.

Audience

This paper was written for IT experts, consultants, and architects tasked with designing a disaster recovery plan.
Disaster Recovery vs. High Availability

Before we can proceed, we need to align on some definitions and terms. For this paper, High Availability (HA) is focused more on the server level, and is configured in such a manner that the end user experiences little to no downtime. Recovery can be automatic, simply failing over to another host, server, or instance of the application. HA is often thought of in terms of N+1: the addition of one more server (physical or virtual) or application than is required. If five physical servers were required to support the workload, then six would be configured, with the load distributed across all six servers. If any single server fails, the remaining five can pick up the workload without significantly affecting the user experience. (A minimal sizing sketch appears at the end of this section.) With software like Citrix XenDesktop, the same approach applies. If one delivery controller/broker, provisioning server, or SQL server is not sufficient to support the workload, a second one is deployed. Depending on the software, this can be either Active/Active, where all instances are actively processing, or Active/Passive, where the standby only becomes active on failure of the first system. In XenDesktop, we always recommend an Active/Active deployment.

Disaster Recovery (DR) implies a complete disaster: no access to the site or region, total failure. The recovery will require manual intervention at some point, and the response times for being operational again are defined by the disaster recovery specifics. We will talk more about this later in this paper.

Defining types of Disaster Recovery

For HA, we talked in terms of Active/Active and Active/Passive, where these terms define how the HA components act: either all are up and supporting users, or one is awaiting a failure event and then proceeds to pick up the load. These terms can be applied to DR as well:

Active/Passive (A/P) - referred to as planned or cold
- Once a disaster strikes, the second site must be brought up entirely
- Only as current as the last backup
- Could have hardware sitting idle waiting for a disaster

Active/Active (A/A) - referred to as hot sites
- Everything replicated in the disaster site
- Duplicate hardware
- Everything that occurs on the primary site also occurs on the secondary site
- Load balanced

Active/Warm (A/W) - referred to as reactive or warm
- Some components online and ready
- Must define priority recovery
- When disaster occurs, provision capacity as needed

In A/P, depending on how quickly you need to be back up and running, it may be as simple as backing up to tape and, in a disaster, restoring from tape to available hardware. This is the lowest cost solution, but not very resilient or quick to recover. A/A has duplicate hardware and software running and supporting users. In a multi-site scenario, each site must have enough additional hardware to support the user failover. A/A is much quicker to recover from a disaster, but much more expensive from a Capital Expenditure (CAPEX) standpoint. Essentially, each site has a complete duplicate set of underutilized hardware waiting for a disaster. With A/W, the plan is to define what is critical to the company and what must be recovered as quickly as possible, and to have enough capacity at the other site(s) to support the requirement. Once the most critical environment is defined, the rest of the company can be dealt with. This does require some extra hardware in each region, but the resources and costs can be better managed.
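To make the N+1 arithmetic concrete, here is a minimal sketch. The user count and per-host density are illustrative assumptions, not measurements from this environment; only the five-becomes-six example comes from the text above.

```python
import math

def hosts_required(users: int, users_per_host: int, spares: int = 1) -> int:
    """Hosts needed to carry the workload, plus spare capacity (N+1 by default)."""
    return math.ceil(users / users_per_host) + spares

# The example from the text: a workload that needs five servers is deployed on six.
# The 500-user / 100-per-host figures are assumed for illustration.
print(hosts_required(users=500, users_per_host=100))             # -> 6
print(hosts_required(users=500, users_per_host=100, spares=2))   # -> 7 (a more conservative N+2)
```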
Defining what is critical

In an A/A deployment, the thought is that everything is critical and must be up and running. In an A/P deployment, critical uptime is not important. However, for A/W we must define what is critical, and which users are critical. The following terms are used going forward:

Mission Critical (MC) - Highest Priority
- Requires continuous availability
- Breaks in service are very impactful on the company business
- Availability required at almost any price
- Mission critical users are the highest priority in the event of a failure

Business Critical (BC) - High Priority
- Requires continuous availability, though short breaks in service are not catastrophic
- Availability required for effective business operation
- Business critical users have a less stringent recovery time

Business Operational / Productivity (PR) - Medium Priority
- Contributes to efficient business operation but doesn't greatly affect the business
- Regular users; may not fail over, or do so as a final step

As stated earlier, we created a fictitious company for this disaster recovery plan scenario. This company has a single Mission Critical application and a single Business Critical application, and associated users. The company president defined the acceptable response times and requirements, including a desire to have a warm failover for mission- and business-critical users, and a passive failover for the rest of the company. The following sections highlight the development and implementation of the plan.
Section 2: Defining the Environment

For this setup, the fictitious business was structured as one business with two regional sites. The business requires both company database (which is considered Mission Critical) and Exchange (which is considered Business Critical) availability. Region 1 focuses on company infrastructure and Region 2 focuses on a call center. MC and BC users are spread across multiple groups in each region. This setup must also be able to handle the total failure of both Regions 1 and 2 at the same time.

In a single region failure, the recovery goals for our setup are for MC applications and users to be back up and running within two hours with minimal data loss. BC applications and users must be back up and running within four hours with up to 60 minutes of acceptable data loss. If Regions 1 and 2 both fail, the third site must be up and running within five days with no more than 24 hours of acceptable data loss. (A sketch expressing these targets as code follows below.)

For a closer look at this diagram by region, see Appendix B at the end of the paper.
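These recovery goals are, in effect, recovery time objectives (RTO) and recovery point objectives (RPO). Here is a minimal sketch of encoding and checking them; the dataclass is illustrative, and the MC RPO of 15 minutes is an assumption, since the text only says "minimal data loss."

```python
from dataclasses import dataclass

@dataclass
class RecoveryTarget:
    tier: str
    rto_hours: float  # maximum time until operational again
    rpo_hours: float  # maximum acceptable data loss

# Targets from the plan above; the MC RPO value is assumed for illustration.
TARGETS = [
    RecoveryTarget("Mission Critical", rto_hours=2, rpo_hours=0.25),
    RecoveryTarget("Business Critical", rto_hours=4, rpo_hours=1.0),
    RecoveryTarget("Full DR (both regions lost)", rto_hours=5 * 24, rpo_hours=24.0),
]

def meets_target(target: RecoveryTarget, measured_rto_h: float, measured_rpo_h: float) -> bool:
    """True when a failover drill lands inside the agreed objectives."""
    return measured_rto_h <= target.rto_hours and measured_rpo_h <= target.rpo_hours

# Example drill result: operational in 1.5 h with 0.5 h of data loss.
for t in TARGETS:
    print(t.tier, meets_target(t, measured_rto_h=1.5, measured_rpo_h=0.5))
```

Checking drill results against targets like this is a simple way to prove, rather than assume, that a DR plan meets what was promised.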
Service Descriptions

The following defines our MC, BC, and PR services and applications and our considerations in handling them in our setup.

Mission Critical - Microsoft SQL Sample Database (Northwind)
Description: The SQL Sample Northwind database is used along with a web server. This represents the call center mission critical application database.
Configuration Requirements: The SQL Sample database is deployed at all locations. Replication is handled by the storage backend. In case of major failure, the database must be delivered from the DR data center. A maintenance message must be presented to external users when the database is not available.

Business Critical - Microsoft Exchange / Outlook
Description: Access to email data for business critical users from the Exchange database.
Configuration Requirements: The database is replicated between primary and secondary locations using Exchange database copies. The database is backed up every 4 hours to the storage in the DR location.

Business Operational / Productivity - Microsoft Office and file shares
Description: All users use Microsoft Office to create and review documents. Documents are stored on file server shares synced between regions.
Configuration Requirements: Microsoft Office is published on XenApp. DFS Replication is configured between the primary sites, and a file-based backup is performed to the DR location every 8 hours. In case of disaster, a limited set of users must have access to the DR file share location. Published Microsoft Office must be unavailable to users when the file share is not available.
User to Site Assignment

Each regional site in our setup has different types of users. Region 1 is focused on HR and engineering. Region 2 is focused on call center users. A majority of the users use hosted shared desktops; the remaining users are VDI users, either pooled or dedicated.

User Counts by Region

The tables below show the breakdown of users by region and how they are organized within the regions.

Region 1 User Counts

Group                  Mission Critical   Business Critical   Business Operational / PR
Engineering                  30                  60                   560
HR                           10                  10                    20
Management                    5                   5                     -
Region 1 Grand Total         45                  75                   580

Region 2 User Counts

Group                  Mission Critical   Business Critical   Business Operational / PR
Call Center                  20                  60                   520
Engineering                  10                   -                    50
HR                            5                  25                     -
Management                    5                   5                     -
Region 2 Grand Total         40                  90                   570
Regional Site 1 Network Diagram

For Region 1, the server configuration consisted of:
- Three physical servers running XenServer, hosting infrastructure VMs.
- Four physical XenApp hosts in a single delivery group, as a 3+1 HA model supporting the business operational users.
- Four physical hosts running XenServer configured as a pool, in a 3+1 HA model supporting the mission- and business-critical users. This pool supported the following configuration:
  - 30 Windows 8.1 Dedicated VDI VMs
  - 90 Windows 8.1 Random Pooled VDI VMs
  - 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

The Region 2 failover pool in Region 1 is four XenServer hosts in a 3+1 model supporting the following configuration:
- 25 Windows 8.1 Dedicated VDI VMs
- 25 Windows 8.1 Random Pooled VDI VMs
- 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users
- 3 SQL 2014 VMs in a cluster (call center database failover)
Regional Site 2 Network Diagram

For Region 2, the server configuration consisted of:
- Three physical servers running XenServer, hosting infrastructure VMs, including the SQL call center cluster.
- Four physical XenApp hosts in a single delivery group, as a 3+1 HA model supporting the business operational users.
- Four physical hosts running XenServer configured as a pool, in a 3+1 HA model supporting the mission- and business-critical users. This pool supported the following configuration:
  - 25 Windows 8.1 Dedicated VDI VMs
  - 95 Windows 8.1 Random Pooled VDI VMs
  - 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users

The Region 1 failover pool in Region 2 is four XenServer hosts in a 3+1 model supporting the following configuration:
- 30 Windows 8.1 Dedicated VDI VMs
- 10 Windows 8.1 Random Pooled VDI VMs
- 5 Windows 2012 R2 Multi-user XA/HSD VMs supporting 80 users
Cold DR Site Network Diagram

For the DR site, the Region 1 disaster recovery site was set up with four XenServer hosts in a 3+1 HA model supporting the following configuration:
- Windows 8.1 Dedicated VDI VMs
- Windows 2012 R2 Multi-user XA/HSD VMs
- Infrastructure VMs

The Region 2 disaster recovery site was set up with four XenServer hosts in a 3+1 HA model supporting:
- Windows 8.1 Dedicated VDI VMs
- Windows 2012 R2 Multi-user XA/HSD VMs
- Infrastructure VMs

Note: The networks for Region 1 and Region 2 in this site are set up with the same IP ranges as in the original regional sites.
Software

The following is a list of software components deployed in the environment:

Component                                     Version
Virtual Desktop Broker                        XenDesktop 7.6 Platinum Edition FP2
VDI Desktop Provisioning                      Provisioning Services 7.6
Endpoint Client                               Citrix Receiver for Windows 4.2 (ICA)
Web Portal                                    Citrix StoreFront 3.0
License Server                                Citrix License Server 11.12.1
Office                                        Microsoft Office 2013
Virtual Desktop OS (Pooled VDI)               Microsoft Windows 8.1 x64
Virtual Desktop OS (Hosted Shared Desktops)   Microsoft Windows Server 2012 R2 Datacenter
Database Server                               Microsoft SQL Server 2014
Hypervisor                                    XenServer 6.5 SP1
Network Appliance                             NetScaler VPX, NS11.0: Build 62.10.nc
WAN Optimization                              CloudBridge WAN Accelerator CBVPX 7.4.1
Storage Network                               Brocade 5100 switch
Storage DR                                    For XtremIO: EMC RecoverPoint 4.1 SP2 P1
                                              For Isilon: OneFS 7.2 SyncIQ

Note: All software is updated to run the latest hotfixes and patches.

Hardware

Servers

The hardware used in this configuration was blade servers with 2-socket Intel Xeon E5-2670 @ 2.60GHz, 192 GB of RAM, and two internal hard drives.
Network

VMs were utilized as site edge devices that helped route traffic between regions. The perimeter network (also known as a DMZ) had a firewall between itself and the internet, and another firewall between the perimeter network and the production network.

NetScaler Global Site Load Balancing (GSLB) was used to determine which region the user is sent to. If available, users are sent to their primary region. When the primary region is not available, users are sent to their secondary region.

A pair of NetScaler VPX appliances per region was utilized for authentication, access, and VPN communications. Additionally, a pair of NetScaler Gateway VPX appliances was utilized per region to allow connectivity into the XenApp/XenDesktop environment. CloudBridge VPX appliances were utilized for traffic acceleration and optimization between regions. NetScaler CloudBridge Connector was configured for IPsec tunneling.

The following diagram is a detailed architectural design of our network implementation.
Storage

Storage was configured using EMC XtremIO All-Flash Storage and Isilon Clustered NAS systems. The storage network for EMC XtremIO was configured with Brocade Fibre Channel SAN switches. The following diagram gives a high level view for Region 1.

As stated previously, failover to a DR site requires manual intervention, so the concern in syncing data comes down to a math problem. How much data do you need to sync between sites, and what size pipe connects the sites? That determines how long it will take to sync. Can you sync in the time allowed? If not, what can you do to correct the problem: reduce the amount of data or increase the pipe speed? (A quick feasibility calculation follows below.) One thing to look at is the LUNs, or storage repositories. Our design created multiple volumes for mission critical data and business critical data, and scheduled syncs accordingly. It is crucial that you work with the storage vendor to get the proper configuration.
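Here is a minimal sketch of that math. The delta size, link speed, efficiency factor, and replication window are illustrative assumptions, not figures measured in this environment.

```python
# Rough feasibility check for site-to-site sync: can the changed data
# replicate within the allowed window over the available pipe?
delta_gb = 500       # data changed since the last sync (GB) - assumed
link_mbps = 1000     # effective WAN bandwidth (Mbps) - assumed
efficiency = 0.7     # protocol/compression overhead factor - assumed
window_hours = 4     # replication window, e.g., the BC backup interval

seconds = (delta_gb * 8 * 1000) / (link_mbps * efficiency)  # decimal units
hours = seconds / 3600
print(f"Sync takes ~{hours:.1f} h against a {window_hours} h window")
if hours > window_hours:
    print("Reduce the delta (smaller MC/BC-only volumes) or increase the pipe speed")
```

Splitting mission critical and business critical data onto their own volumes, as our design did, is exactly what keeps the delta small enough for the window.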
Use Cases

The following use cases define the possible scenarios that must be considered and, for our case study, the users that must be supported. The minimum implies the mission critical and business critical users that need to be supported.

Use Case 1
The sites are configured as Active/Active using NetScaler GSLB. If the Region 1 site fails, mission- and business-critical users will be able to connect and log on to the Region 2 site with the same data resources as were available in the Region 1 site. With the Region 1 site back online, NetScaler GSLB will direct users to the correct site as Region 1 site users log off from the Region 2 site and then log back into the Region 1 site. A maximum of 120 users will have warm HA failover capability from Region 1 to Region 2.

Use Case 2
The sites are configured as Active/Active using NetScaler GSLB. If the Region 2 site fails, mission- and business-critical users will be able to connect and log on to the Region 1 site with the same data resources as were available in the Region 2 site. With the Region 2 site back online, NetScaler GSLB will direct users to the correct site as Region 2 site users log off from the Region 1 site and then log back into the Region 2 site. A maximum of 130 users will have warm HA failover capability from Region 2 to Region 1.

Use Case 3 - Cold DR
The sites are configured as Active/Passive, with the goal of failing over only the mission critical users from the Region 1/Region 2 sites to the DR site. This site will be based on backup data from Region 1 and Region 2 and will go live within 5 days.
- A manual process is used to switch to the DR site.
- When users log in to the DR site, they should have any changes/modifications from their dedicated environment in the DR site environment. There is potential for data loss between the last site-to-site copy and the failover.
- Once failed over to the DR site, when Region 1/Region 2 return online, and after allowing appropriate time for replication between sites, login should connect to Region 1/Region 2 and the changes should be reflected there.
- The cold DR site will contain a subset of the regional sites, including networking, infrastructure, and dedicated VDIs. This approach allows us to both easily recover from disaster with backups and later rebuild the regional sites from the DR site data.
- Mission Critical users will have primary access to the cold DR site, followed by Business Critical, and then the rest of the company, depending on timelines and disaster impact.
- A maximum of 45 users will have cold DR access from Region 1.
- A maximum of 40 users will have cold DR access from Region 2.

A quick capacity sanity check for the warm failover use cases appears below.
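One check worth automating is whether the warm failover pools can actually absorb the other region's MC and BC users. This minimal sketch uses the counts from the pool descriptions and user tables above; the assumption that each VDI VM seats one user, with 80 users across the five HSD VMs, follows the earlier pool descriptions.

```python
# Seats available in each warm failover pool (from the network diagram sections):
# dedicated VDI + pooled VDI (one user each, assumed) + HSD capacity (80 users).
failover_seats = {
    "R1 hosts R2 users": 25 + 25 + 80,  # -> 130
    "R2 hosts R1 users": 30 + 10 + 80,  # -> 120
}
# MC + BC users that must be absorbed (from the user count tables):
failover_demand = {
    "R1 hosts R2 users": 40 + 90,  # Region 2 MC + BC
    "R2 hosts R1 users": 45 + 75,  # Region 1 MC + BC
}
for scenario, seats in failover_seats.items():
    demand = failover_demand[scenario]
    verdict = "OK" if seats >= demand else "UNDERSIZED"
    print(f"{scenario}: {demand} users into {seats} seats -> {verdict}")
```

Both scenarios land exactly at capacity (130 and 120), which matches the stated use case maximums.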
Section 3: Deployment

This document is not a step-by-step manual for building this configuration, but a guide to help understand what needs to be done. Wherever possible, Citrix documentation was followed for deployment and configuration. The following configuration sections highlight any deviations or areas of importance to help with a successful deployment.

Implementing the software breaks down to two major areas: first, putting the correct software into each region; second, configuring NetScaler for GSLB.

The process followed for deployment was:
1. Deploy XenServer pools.
2. Create the required AD groups and DHCP scopes.
3. Prepare the SQL environment (SQL AlwaysOn). PVS 7.6 adds support for AlwaysOn.
4. Deploy the XenDesktop environment.
5. Deploy StoreFront servers and connect to XenDesktop.
6. Deploy the PVS environment and create the required vDisks.
7. Configure NetScaler GSLB; create the site and service.
8. Configure NetScaler Gateway in Active/Passive mode and update the StoreFront configuration.
9. Deploy the Microsoft Exchange environment.

The NetScaler configurations are straightforward; there was nothing special done with configuring StoreFront. This was a typical XenDesktop and NetScaler Gateway configuration. Two StoreFront servers were configured to be load balanced by NetScaler. NetScaler GSLB is where the focus is (a minimal sketch of the resolution logic appears at the end of this section):
- Using LB Method StaticProximity: Region 1 users will be sent to Region 1 if it is online, otherwise the users will be sent to Region 2, and vice versa.
- Using location settings in NetScaler to define the primary regions of the clients' local DNS servers and for the GSLB sites and services.
- Users, regardless of region, use the same Fully Qualified Domain Name (FQDN) (i.e., desktop.domain.com). NetScaler running ADNS will answer authoritatively with the IP of the primary site.
- Once the user is redirected to the proper site, the user authenticates at the Access Gateway, and is then redirected to the local StoreFront to get access to resources.

Additionally, NetScaler CloudBridge Connector is configured for IPsec tunneling:
- An IPsec tunnel for AD replication and server/client communication is created using the outbound connection.
- A second IPsec tunnel is created for site-to-site data replication.
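The following sketch models the StaticProximity decision described above in plain Python: map the client's local DNS server to a primary region, answer with that region's VIP when it is healthy, otherwise answer with the other region. The subnets, VIP addresses, and health flags are hypothetical placeholders, not values from this deployment; the real decision is made by NetScaler's location database and monitors.

```python
import ipaddress

# Hypothetical subnet-to-region mapping standing in for the NetScaler
# location database; GSLB static proximity keys off the client's LDNS address.
PRIMARY_REGION = {
    ipaddress.ip_network("10.1.0.0/16"): "region1",
    ipaddress.ip_network("10.2.0.0/16"): "region2",
}
REGION_VIP = {"region1": "203.0.113.10", "region2": "198.51.100.10"}  # assumed VIPs

def resolve_desktop_fqdn(ldns_ip: str, region_up: dict) -> str:
    """Return the VIP the ADNS responder should answer for desktop.domain.com."""
    ip = ipaddress.ip_address(ldns_ip)
    primary = next((r for net, r in PRIMARY_REGION.items() if ip in net), "region1")
    if region_up.get(primary, False):
        return REGION_VIP[primary]
    # Primary down: send the user to the other region, mirroring StaticProximity failover.
    backup = "region2" if primary == "region1" else "region1"
    return REGION_VIP[backup]

# A Region 2 user while Region 2 is down resolves to the Region 1 VIP:
print(resolve_desktop_fqdn("10.2.44.7", {"region1": True, "region2": False}))
```

Because all users share one FQDN, failover and fail-back are invisible at the client: only the authoritative DNS answer changes.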
Configuration Considerations

The following defines some of the specific configurations applied to the environment:

XenApp/XenDesktop Regional Sites (R1 and R2)
- 2 Delivery Controllers per primary regional site
- FMA services have SSL on the Controllers, and the XML Service ports were changed from HTTP to HTTPS to secure traffic communication
- XD/XA database on the AlwaysOn SQL group; must have unique site database naming
- SSL to VDA feature of XenApp and XenDesktop 7.6

Hosted shared desktops
- 5 Machine Catalogs:
  - Physical XA HSD
  - XA HSD MC
  - XA HSD BC
  - XA HSD MC Failover
  - XA HSD BC Failover
- 5 Delivery Groups matching the catalogs

Pooled VDI desktops
- 4 Machine Catalogs: PR, BC, PR Failover, BC Failover
- 4 Delivery Groups

Dedicated VDI Desktops
- 4 Machine Catalogs: MC, BC, MC Failover, BC Failover
- 4 Delivery Groups
VDI Virtual Desktops
- Pooled Random VDI Desktops: VDI VMs streamed from a PVS vDisk
- Dedicated VDI Desktops: static VMs
- My Documents must be redirected to a network location on the file share

XenApp / HSD
Deployed in two models:
- Physical hosts in an N+1 HA model, manually installed on hardware
- Virtualized XA HSD VMs in an N+1 HA model, streamed from PVS vDisks

User Profile Manager
- Hosted Shared Desktop user profile data: \\FS01\ProfileData\HSD\#SAMAccountName#
- Hosted Virtual Desktop user profile data: \\FS01\ProfileData\HVD\#SAMAccountName#
- Hosted Virtual Desktop (MC) user profile data: \\FS01\ProfileData\MC\#SAMAccountName#
- User profile and folder redirection policies

StoreFront VMs
- SSL configured to secure traffic communication
- 2 StoreFront servers (HA), load balanced by NetScaler VPX
- Authentication is configured on NetScaler Gateway

License Server VM
- 2 HA license servers
- SSL configured to secure traffic communication
- Windows 2012 RDS licenses
- Citrix Licensing Server
Isilon Scale-out NAS for each Site
- 4 X410 nodes
- 34 TB HDD + 1.6 TB SSD
- 128 GB RAM (8 x 16 GB)

Provisioning Services
- 2 PVS server VMs in HA
- PVS DB server configured on SQL AlwaysOn
- Utilizing a remote storage location for vDisks on each PVS; remote storage attached to the PVS VMs as a 2nd drive via file server and SMB/CIFS
- Separate vDisk store locations for Mission Critical and Business Critical vDisks on the file server via SMB/CIFS
- Regular vDisks located on local file servers
- Multi-homed: utilizing the Guest VLAN as the management interface and the PXE VLAN as the streaming interface
- DHCP for the PVS network/PXE VLAN
- Cache in device RAM with overflow on hard disk: 256 MB for Windows 8.1 VDI, 2048 MB for XA HSD

NetScaler VPX VMs
- 2 LB VPX in HA mode
- LDAP authentication
- AG VIP, VPN
- GSLB for the regional sites
- 2 VPXs for load balancing of StoreFront and XML services
Region Server Pools

The following defines the VM breakdown per region for the different pools required within the infrastructure environment. In all cases, the VMs were balanced across XenServer hosts, and VMs were configured in an HA model: a minimum of two VMs for each required application.

Region 1:

Infrastructure Pool
- 2 XenDesktop Brokers
- 2 StoreFront VMs
- 2 License Server VMs
- 2 Provisioning Services VMs
- 2 File Server VMs
- 3 SQL 2014 Database Server VMs (AlwaysOn)
- 2 AD DC VMs
- 4 Exchange server VMs (2 Mailbox, 2 Client Access)

Perimeter Network
- 1 Firewall / Router VM
- 2 NetScaler VPX VMs (HA model) - user access
- 2 CloudBridge VPX VMs (HA model, Active/Passive) - site-to-site user access WAN optimization
- 2 CloudBridge VPX VMs (HA model, Active/Passive) - site-to-site data replication
- 2 NetScaler VPX VMs (HA model) - data replication

R2 HA Fail-Over Pool
- 5 XA HSD VMs
- 25 Pooled VDI VMs
- 25 Dedicated VDI VMs
- 3 SQL Server VMs (call center cluster)
Region 2:

Infrastructure Pool
- 2 XenDesktop Brokers
- 2 StoreFront VMs
- 2 License Server VMs
- 2 Provisioning Services VMs
- 3 SQL 2014 Database Server VMs (SQL cluster)
- 3 SQL 2014 Database Server VMs (AlwaysOn)
- 2 File Server VMs
- 2 AD DC VMs
- 4 Exchange server VMs (2 Mailbox, 2 Client Access)

Perimeter Network
- 1 Firewall / Router VM
- 2 NetScaler VPX VMs (HA model) - user access
- 2 CloudBridge VPX VMs (HA model, Active/Passive) - site-to-site user access WAN optimization
- 2 CloudBridge VPX VMs (HA model, Active/Passive) - site-to-site data replication
- 2 NetScaler VPX VMs (HA model) - data replication

R1 HA Fail-Over Pool
- 5 XA HSD VMs
- 10 Pooled VDI VMs
- 30 Dedicated VDI VMs
Region 3:

Infrastructure Pool to support Region 3
- 2 AD DC VMs
- 1 VM to handle backups from Regions 1 and 2

Region 1 Infrastructure Pool
- 2 AD DC VMs
- 2 Delivery Controllers
- 2 StoreFront VMs
- 2 License Server VMs
- 1 File Server VM
- 2 SQL 2012 Database Server VMs (AlwaysOn)
- 4 Exchange server VMs (2 Mailbox, 2 Client Access)

Region 2 Infrastructure Pool
- 2 AD DC VMs
- 2 Delivery Controllers
- 2 StoreFront VMs
- 2 License Server VMs
- 1 File Server VM
- 2 SQL 2014 Database Server VMs (AlwaysOn)
- 2 SQL 2014 Database Server VMs (SQL cluster)
- 4 Exchange server VMs (2 Mailbox, 2 Client Access)

Perimeter Network
- 1 Firewall / Router VM
- 2 NetScaler VPX VMs - R1/R2 access (one VIP per region; 2 VIPs)

Note: The infrastructure VMs for Regions 1 and 2 were duplicated in Region 3 for networking purposes. By setting the networks correctly in Region 3, once Regions 1 and 2 were brought up, no network changes were required in their infrastructure or VHD files.
Failover Process

The dedicated VMs present the biggest challenge in a failure. To address this, VMs are created in both regions for the failover dedicated VMs from the other region; however, no storage is attached to these VMs. In the event of a failure, these VMs are assigned the proper VHD file from the backup storage location, as sketched below. It should also be noted that for fail-back, after the failed region is back online, the dedicated VM VHD files are deleted in the failed region, copied back from the failover region, and attached to the proper VM. This ensures the latest version of the dedicated VMs will be restarted after the fail-back.

Note: In dealing with dedicated VMs, we realized that we had to carefully name the VHD files and associated files to ensure we connected the correct VHD file to the correct VM during failover and fail-back.

If there is a failure in either Region 1 or Region 2 (what is called a warm failover), a few steps need to be taken. The actions differ depending on the failure. If it is a network access issue, or the Internet is down, the dedicated VMs in the failed region are placed in Maintenance Mode in Citrix Studio and shut down. The latest storage backup of the dedicated VMs in the new region must be made available, and the storage for each VM needs to be attached individually to the pre-created VMs already present. Group policy is applied to the dedicated VMs' OU, which imports the registry value listing the delivery controller host names, allowing VDA registration with the local delivery controllers. The pooled VDI and XA HSD VMs on the local delivery site are also taken out of Maintenance Mode and brought online. For Region 2, the SQL database for the call center application is brought online as well. Depending on the type of failure, you may need to power down the failed region's firewall to force failover to the other region.

Once those steps are completed, you boot the Mission Critical user VMs and Business Critical user VMs. Mission- and business-critical data is kept in sync between the sites. You can then communicate the availability to your users. The end users use the same URL as always, with GSLB redirecting as required.

For fail-back after recovery of the failed region has completed, the steps are to sync all storage back to the failed site, perform the necessary steps for the dedicated VMs, bring the applications back online, and bring up the users.
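As one illustration of the per-VM attach step, here is a minimal sketch using the XenServer XenAPI Python bindings. The pool master address, credentials, and VM/VHD naming convention are assumptions for illustration; a production runbook would add error handling and verify each attach before booting.

```python
import XenAPI  # XenServer SDK Python bindings

# Hypothetical pool master and credentials for the failover region.
POOL_MASTER = "https://r2-pool-master.example.com"
USER, PASSWORD = "root", "secret"

def attach_and_boot(vm_name: str, vhd_name: str) -> None:
    """Attach a replicated dedicated-VM disk to its pre-created shell VM and boot it."""
    session = XenAPI.Session(POOL_MASTER)
    session.xenapi.login_with_password(USER, PASSWORD)
    try:
        vm = session.xenapi.VM.get_by_name_label(vm_name)[0]
        vdi = session.xenapi.VDI.get_by_name_label(vhd_name)[0]
        session.xenapi.VBD.create({
            "VM": vm, "VDI": vdi,
            "userdevice": "0", "bootable": True,
            "mode": "RW", "type": "Disk",
            "unpluggable": False, "empty": False,
            "other_config": {},
            "qos_algorithm_type": "", "qos_algorithm_params": {},
        })
        session.xenapi.VM.start(vm, False, False)  # not paused, not forced
    finally:
        session.xenapi.session.logout()

# The naming discipline noted above is what makes this pairing safe; the
# names here follow an assumed <region>-<tier>-<id> convention.
attach_and_boot("R1-MC-Dedicated-017", "R1-MC-Dedicated-017-vhd")
```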
In a full loss of both Region 1 and Region 2, the DR site, or Region 3, needs to be brought online. The physical servers are powered up, making the XenServer pools accessible. The latest database and Exchange information is imported, and the infrastructure for the user VDI VMs should be restored and brought online. A new URL is required to log in. Once the site has been brought online, any new information, like the new URL for access, needs to be given to your users.

The following defines the steps required to recover and bring Region 3, as defined, back online (a sketch of running them in order follows the list):

Active Directory, DNS, and DHCP
- Import Domain Controllers from backup and restore Active Directory functionality
- Update DNS records for StoreFront / Access Gateway / Exchange MX
- Create DHCP scopes

NetScaler
- Rebuild NetScaler components, NetScaler Gateway

XenServer
- Turn the existing XenServer pools on

File Services
- Restore access to file services, user data, and UPM

XenDesktop Environment
- Import SQL VMs and restore the XenDesktop, PVS, and call center application databases
- Import StoreFront, XenDesktop, and PVS VMs and test connectivity to the databases

Exchange Environment
- Import Client Access and Mailbox servers and restore databases

External DNS
- Update external DNS records for Access Gateway URLs
- Update external MX records for email
- Update Outlook Anywhere, ActiveSync, etc. DNS records
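Several of these steps depend on the ones before them; for example, the XenDesktop controllers cannot come up before their SQL databases are restored. A minimal sketch of encoding the runbook as an ordered list with a verification gate per step; the action and check bodies are placeholders standing in for the real restore procedures, not actual automation from this lab.

```python
from typing import Callable, List, Tuple

def run_step(name: str, action: Callable[[], None], verify: Callable[[], bool]) -> None:
    """Run one recovery step and stop the runbook if its check fails."""
    print(f"== {name} ==")
    action()
    if not verify():
        raise RuntimeError(f"Verification failed at: {name}; remediate before continuing")

# Ordered to match the recovery list above; lambdas are placeholders.
RUNBOOK: List[Tuple[str, Callable[[], None], Callable[[], bool]]] = [
    ("Restore AD, DNS, and DHCP",      lambda: None, lambda: True),
    ("Rebuild NetScaler and Gateway",  lambda: None, lambda: True),
    ("Start XenServer pools",          lambda: None, lambda: True),
    ("Restore file services and UPM",  lambda: None, lambda: True),
    ("Restore SQL, then XD and PVS",   lambda: None, lambda: True),
    ("Restore Exchange",               lambda: None, lambda: True),
    ("Update external DNS and MX",     lambda: None, lambda: True),
]

for name, action, verify in RUNBOOK:
    run_step(name, action, verify)
print("Region 3 online; communicate the new access URL to users")
```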
Section 4: Conclusion

As stated in the beginning, the goal of this project was to challenge a group of engineers with creating a disaster recovery plan for a fictitious company. This meant understanding what was mission critical, business critical, and normal day-to-day work, and what applications and data needed to be ready in case of a disaster. This also meant understanding user needs for issues like dedicated VMs.

This paper highlights and defines some of the issues around creating a disaster recovery environment. This is not a how-to, step-by-step manual, but a guide to help you understand the issues and concerns in doing disaster recovery, and things to consider when defining your disaster plan. It shows you how the Citrix Solutions Lab team of engineers defined, designed, and implemented a DR plan for a fictitious company. This may not be the optimal solution for your company, but it is one that you can utilize as a baseline of considerations and operational steps when you create the disaster recovery plan for your company.

During the process of deploying and testing, there were some realizations and changes made. One of the first was around failing back after a failover: how to handle the data. Do you sync back, or delete and copy back? Our decision was to delete and copy back, ensuring the original site is clean and up to date. Another realization was around the configuration of GSLB and the failed site. Since preparing the failover site for access requires manual intervention, there is potential for GSLB to redirect users to the failover site before it is ready: users could hit a StoreFront before any personal desktops or applications are available for them, and they would only have access to the common applications or desktops. We used two different SQL approaches, AlwaysOn for our infrastructure environment and clustering for our database application. This was done by design in the lab to show the issues and considerations around both.

To support high availability between the two main regions while keeping a third region for total failover, the one thing that our company president was less than thrilled with was the CapEx cost of hardware not being fully utilized. This is a cost of doing business. However, with the recent introduction of Citrix Workspace Cloud, an alternative has come up that we are reworking our fictitious company toward. Rather than having additional hardware in Regions 1 and 2, what if there was a cloud site running at a minimum, waiting for a region to fail, that could spin up what is needed to support the failure? Essentially, what is needed in the cloud is a NetScaler VPX for connectivity, an AD server, a SQL AlwaysOn server, and an Exchange server. This keeps the mission critical and business critical environments in sync. You can then determine what else may be required to support each region. The one current caveat is that no cloud currently supports desktop operating systems; VDI users get server operating systems running in a desktop mode. This is not a major issue for pooled VDI users, but it does become something to be solved for dedicated VDI users.

Will the cloud work for you? Should you use additional hardware in your regions? What are your recovery times? How much of your environment is actually mission critical? These are questions we hope you are now considering as you build a disaster recovery plan for your company.
Section 5: Appendices

Appendix A

References

EMC Storage: http://www.emc.com/en-us/storage/storage.htm?nav=1
Brocade Storage Network: http://www.brocade.com/en/products-services/storage-networking/fibre-channel.html
XenApp: http://www.citrix.com/products/xenapp/overview.html
XenDesktop: http://www.citrix.com/products/xendesktop/overview.html
NetScaler: http://www.citrix.com/products/netscaler-application-delivery-controller/overview.html
CloudBridge: http://www.citrix.com/products/cloudbridge/overview.html
Citrix CloudBridge Data Sheet: https://www.citrix.com/content/dam/citrix/en_us/documents/products-solutions/cloudbridge-data-sheet.pdf
Appendix B

High Level Regional Diagrams
Appendix C

Identifying Services and Applications for DR/HA

This section identifies all the applications, services, and data items for planning within our setup.

Call Center
Type: Database and App
Description: Main application for call center activity, required for company mission critical function
Level: Mission Critical
Primary Location: Region 2 (West Coast); Region 1 or R3/DR in case of failover or disaster
Access Methods: Local Web Browser, Published App Web Browser
Data: SQL Database (actual test database: Microsoft SQL Sample)
Data Location: SQL 2014 Cluster
Systems: SQL 2014 Database servers, Web Servers
Notes:
- Database servers and the database must be made accessible in R1 and R3/DR in case of failover or disaster
- Both the database and its web site would need to be created
- http://businessimpactinc.com/install-northwind-database/
- https://msdn.microsoft.com/en-us/library/vstudio/tw738475%28v=vs.100%29.aspx

Exchange
Type: Service
Description: Email service, required for internal and external communication
Level: Business Critical
Primary Location: Regions 1 & 2; R3/DR in case of disaster. Some Exchange databases are region specific.
Access Methods: Local Outlook Application, Published Outlook Application, Web Outlook
Data: Exchange Databases
Data Location: Exchange Servers
Systems: Exchange Mailbox Servers, Exchange Client Access Servers
Notes: Exchange will need to be accessible in a DR scenario in R3/DR for mission-critical users

Microsoft Office
Type: Application
Description: Productivity applications for regular office work
Level: Outlook - Business Critical; other Office apps - Productivity
Primary Location: Regions 1 & 2; R3/DR in case of disaster
Access Methods: Local Outlook Application, Published Outlook Application, Web Outlook
Data: Outlook Data File, Outlook Address Book, Exchange Mailbox, Exchange Address Book
Data Location: Exchange Servers; user Outlook file location (redirected from My Documents to UPM storage?)
Systems: Exchange Mailbox Servers, Exchange Client Access Servers
Notes: Outlook needs to be available in all regions in case of failover for business critical users.

XenDesktop
Type: Service
Description: Virtual desktop brokering and management system, required for virtual desktop access and assignment
Level: Mission Critical
Primary Location: Regions 1 & 2; R3/DR in case of disaster
Data: XD Site Databases, region specific
Data Location: SQL AlwaysOn HA Group
Systems: XD Delivery Broker Server VMs, Citrix Licensing Server VMs
Notes:
- Must be available in all regions for mission- and business-critical users to be able to access desktops
- For R3/DR, the XenDesktop database and the SQL servers supporting it are required to be brought up before the XD Delivery Controllers
- The licensing server must be available for XenDesktop functionality to allow user connections

StoreFront
Type: Service
Description: Web portal into the XenDesktop environment, required for user session access
Level: Mission Critical
Primary Location: Regions 1 & 2; R3/DR in case of disaster
Access Methods: Web Browser, Citrix Receiver
Data: SF configuration
Data Location: SF servers
Systems: StoreFront Server VMs
Notes: Must be available in all regions for mission- and business-critical users to be able to access desktops

Provisioning Services
Type: Service
Description: Virtual desktop VM streaming and deployment system, required for virtual desktop VM launch
Level: Mission Critical
Primary Location: Regions 1 & 2; R3/DR in case of disaster
Access Methods: PXE and DHCP for the virtual desktop VMs
Data: PVS Farm Databases, vDisks
Data Location: Farm Database - SQL AlwaysOn HA Group; vDisks - File Servers
Systems: PVS Server VMs, File Servers (for vDisks)
Notes: The licensing server must be available for PVS functionality to allow virtual desktop launch

User Profiles
Type: Data
Description: User data required for all users' work on virtual desktops
Level: Mission Critical
Primary Location: Regions 1 & 2; R3/DR in case of disaster
Access Methods: SMB
Data: User personal data, including redirected My Documents
Data Location: UPM File Servers
Systems: File Server VMs
Corporate Headquarters: Fort Lauderdale, FL, USA
India Development Center: Bangalore, India
Latin America Headquarters: Coral Gables, FL, USA
Silicon Valley Headquarters: Santa Clara, CA, USA
Online Division Headquarters: Santa Barbara, CA, USA
UK Development Center: Chalfont, United Kingdom
EMEA Headquarters: Schaffhausen, Switzerland
Pacific Headquarters: Hong Kong, China

About Citrix
Citrix (NASDAQ:CTXS) is leading the transition to software-defining the workplace, uniting virtualization, mobility management, networking and SaaS solutions to enable new ways for businesses and people to work better. Citrix solutions power business mobility through secure, mobile workspaces that provide people with instant access to apps, desktops, data and communications on any device, over any network and cloud. With annual revenue in 2014 of $3.14 billion, Citrix solutions are in use at more than 330,000 organizations and by over 100 million users globally. Learn more at www.citrix.com
Copyright 2015 Citrix Systems, Inc. All rights reserved. XenApp, XenDesktop, XenServer, CloudBridge, and NetScaler are trademarks of Citrix Systems, Inc. and/or one of its subsidiaries, and may be registered in the U.S. and other countries. Other product and company names mentioned herein may be trademarks of their respective companies.