Data Center Management Didik Partono Rudiarto CDCP, ITIL-F, COBIT5, CITM
A physical place that houses a computer network's most critical systems, including back-up power supplies, air conditioning, and security applications
It is vital that the mission-critical Data Centre is designed, maintained and operated with high availability and efficiency in mind
The fact is that most Data Centres DO NOT MEET the full availability, capacity, safety or efficiency requirements often demanded
Standards and Guidelines
- ANSI/TIA-942: Data Center Standard (Telecommunications)
- BICSI-002: Data Center Standard
- EN-50173-5: Data Center Cabling Standard
- ISO/IEC 24764: Data Center Cabling Standard
- The Uptime Institute (TUI): Tier Standard
- ISO 14000: Environmental Management System
- ISO 38500: IT Governance
- ISO 27001: Information Security
- ISO 22301: Business Continuity
- EN50600-2-6: Management and Operational Information
Power Cooling Infrastructure Computing Building system (HVAC, lighting, transport, fire, water supply), Environment Management System, Standard operating procedure, Manuals and documentation, etc.
?
DOWN TIME!!
Availability
Uptime SLA | Downtime per year
90.0%      | 36 days, 12 hours
95.0%      | 18 days, 6 hours
99.0%      | 87 hours, 36 minutes
99.50%     | 43 hours, 48 minutes
99.90%     | 8 hours, 45 minutes, 36 seconds
99.99%     | 52 minutes, 33 seconds
99.999%    | 5 minutes, 15 seconds
99.9999%   | 32 seconds
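The downtime figures follow from multiplying the unavailable fraction by the hours in a year. A minimal Python sketch, assuming a 365-day year (which matches the figures above); the function name `downtime_per_year` is illustrative and not taken from the slides:

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, as the table implies


def downtime_per_year(uptime_percent: float) -> timedelta:
    """Return the yearly downtime allowed by an uptime SLA given in percent."""
    unavailable_fraction = 1 - uptime_percent / 100
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


if __name__ == "__main__":
    for sla in (90.0, 95.0, 99.0, 99.5, 99.9, 99.99, 99.999, 99.9999):
        print(f"{sla}% uptime -> {downtime_per_year(sla)} downtime per year")
```

Each extra "nine" divides the allowed downtime by ten, which is why the step from 99.9% to 99.99% is so much harder to deliver than it looks.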
EQUATION #1 (single component):
  A_s = A_{c_1} \times A_{c_2} \times A_{c_3} \times \dots \times A_{c_n}
EQUATION #2 (single redundant):
  A_s = A_{c_1} + ((1 - A_{c_1}) \times A_{c_2})
EQUATION #3 (multi-redundant):
  A_s = A_{c_{(n-1)}} + ((1 - A_{c_{(n-1)}}) \times A_{c_n})
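A short worked derivation may help with EQUATION #3. It assumes every redundant copy has the same availability A and reads As as the system availability and Ac as a component availability. The symbol A^{(n)}, the availability of n redundant copies, is notation introduced here for illustration only. Under those assumptions, applying EQUATION #3 repeatedly collapses to a closed form:

```latex
\begin{align*}
A^{(1)} &= A \\
A^{(n)} &= A^{(n-1)} + \bigl(1 - A^{(n-1)}\bigr)\,A \\
1 - A^{(n)} &= \bigl(1 - A^{(n-1)}\bigr)(1 - A) \\
\Rightarrow\quad A^{(n)} &= 1 - (1 - A)^{n}
\end{align*}
```

For example, with A = 85% and n = 2 this gives 1 - 0.15^2 = 97.75%.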
Redundancy
- N: base requirement
- N+1 redundancy
- N+2 redundancy
- 2N redundancy
- 2(N+1) redundancy
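One way to read these labels is as a count of installed units, where N is the number of units needed to carry the load. A minimal Python sketch under that assumption; the function name `units_installed` and the example load of four units are hypothetical:

```python
def units_installed(n: int, scheme: str) -> int:
    """Return how many units are installed for a base requirement of n units."""
    schemes = {
        "N": n,                    # base requirement, no spare
        "N+1": n + 1,              # one spare unit
        "N+2": n + 2,              # two spare units
        "2N": 2 * n,               # two full sets
        "2(N+1)": 2 * (n + 1),     # two full sets, each with one spare
    }
    return schemes[scheme]


if __name__ == "__main__":
    base = 4  # hypothetical: four units are needed to carry the load
    for scheme in ("N", "N+1", "N+2", "2N", "2(N+1)"):
        print(f"{scheme}: {units_installed(base, scheme)} units installed")
```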
Sample #1: single component

Component          | Availability
Web                | 85%
Application        | 90%
Database           | 99.9%
DNS                | 98%
Firewall           | 85%
Switch             | 99%
Structured Cabling | 99.99%
ISP                | 95%

Availability: 85%*90%*99.9%*98%*85%*99%*99.99%*95% = 59.87%
Sample #2: redundancy

Redundant Web: 85% + (1-85%)*85% = 97.75%.
Availability: 97.75%*90%*99.9%*98%*85%*99%*99.99%*95% = 68.85%

+ Redundant Firewall: 85% + (1-85%)*85% = 97.75%.
Availability with every component except structured cabling made redundant: 97.75%*99%*99.9999%*99.96%*97.75%*99.99%*99.99%*99.75% = 94.30%
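The sample figures can be recomputed from the component list so that each step is checkable. A minimal Python sketch, assuming redundant copies are identical and fail independently; the helper names (`redundant_pair`, `with_redundancy`, `system_availability`) are illustrative, not from the slides:

```python
from math import prod

# Component availabilities from Sample #1, expressed as fractions.
components = {
    "Web": 0.85,
    "Application": 0.90,
    "Database": 0.999,
    "DNS": 0.98,
    "Firewall": 0.85,
    "Switch": 0.99,
    "Structured cabling": 0.9999,
    "ISP": 0.95,
}


def redundant_pair(a: float) -> float:
    """EQUATION #2 applied to two identical components."""
    return a + (1 - a) * a


def system_availability(avail: dict) -> float:
    """EQUATION #1: multiply the availability of every component in the chain."""
    return prod(avail.values())


def with_redundancy(avail: dict, names: set) -> dict:
    """Return a copy in which the named components are duplicated."""
    return {k: redundant_pair(v) if k in names else v for k, v in avail.items()}


baseline = system_availability(components)
web_only = system_availability(with_redundancy(components, {"Web"}))
web_and_firewall = system_availability(
    with_redundancy(components, {"Web", "Firewall"})
)
all_but_cabling = system_availability(
    with_redundancy(components, set(components) - {"Structured cabling"})
)

if __name__ == "__main__":
    print(f"No redundancy:              {baseline:.2%}")
    print(f"Redundant web:              {web_only:.2%}")
    print(f"Redundant web and firewall: {web_and_firewall:.2%}")
    print(f"All but cabling redundant:  {all_but_cabling:.2%}")
```

Because the chain is a product, the weakest components (web and firewall at 85%) dominate the result, so duplicating them first gives the largest gain.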
Sample #3: Multiple redundancy

   | A                  | B           | C                | D                | E
1  | Avail %            | 1 Component | 2 Components     | 3 Components     | 4 Components
2  | Web                | 85%         | =B2+((1-B2)*$B2) | =C2+((1-C2)*$B2) | =D2+((1-D2)*$B2)
3  | Application        | 90%         | =B3+((1-B3)*$B3) | =C3+((1-C3)*$B3) | =D3+((1-D3)*$B3)
4  | Database           | 99.9%       | =B4+((1-B4)*$B4) | =C4+((1-C4)*$B4) | =D4+((1-D4)*$B4)
5  | DNS                | 98%         | =B5+((1-B5)*$B5) | =C5+((1-C5)*$B5) | =D5+((1-D5)*$B5)
6  | Firewall           | 85%         | =B6+((1-B6)*$B6) | =C6+((1-C6)*$B6) | =D6+((1-D6)*$B6)
7  | Switch             | 99%         | =B7+((1-B7)*$B7) | =C7+((1-C7)*$B7) | =D7+((1-D7)*$B7)
8  | Structured cabling | 99.99%      |                  | =B8+((1-B8)*$B8) |
9  | ISP                | 95%         | =B9+((1-B9)*$B9) | =C9+((1-C9)*$B9) | =D9+((1-D9)*$B9)

System Avail % (row 10):
  B10: =B2*B3*B4*B5*B6*B7*B8*B9
  C10: =C2*C3*C4*C5*C6*C7*B8*C9
  D10: =D2*D3*D4*D5*D6*D7*D8*D9
  E10: =E2*E3*E4*E5*E6*E7*D8*E9
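The column-by-column spreadsheet formula can also be written as a loop. A minimal Python sketch, assuming identical and independent copies and, as a simplification that differs from the spreadsheet's cabling row, the same number of copies for every component; the names `n_redundant` and `MAX_COPIES` are illustrative:

```python
from math import prod

# Single-component availabilities, as fractions.
components = {
    "Web": 0.85,
    "Application": 0.90,
    "Database": 0.999,
    "DNS": 0.98,
    "Firewall": 0.85,
    "Switch": 0.99,
    "Structured cabling": 0.9999,
    "ISP": 0.95,
}

MAX_COPIES = 4  # the spreadsheet goes up to four components


def n_redundant(a: float, copies: int) -> float:
    """Apply EQUATION #3 once per extra copy, like moving one column right."""
    total = a
    for _ in range(copies - 1):
        total = total + (1 - total) * a
    return total


if __name__ == "__main__":
    for copies in range(1, MAX_COPIES + 1):
        # Simplification: every component, cabling included, gets `copies` units.
        system = prod(n_redundant(a, copies) for a in components.values())
        print(f"{copies} component(s) each: system availability {system:.4%}")
```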
Data Center Ratings

Standard             | Basic Infrastructure | Basic Infrastructure w/ Redundancy | Concurrently Maintainable | Fault Tolerant
The Uptime Institute | Tier-1               | Tier-2                             | Tier-3                    | Tier-4
ANSI/TIA-942         | Rating 1             | Rating 2                           | Rating 3                  | Rating 4
BICSI-002            | Class F1             | Class F2                           | Class F3                  | Class F4
EN50600-1            | Availability Class 1 | Availability Class 2               | Availability Class 3      | Availability Class 4

Note: Ratings are separated between design and operation
Design & Build Consideration
Standard and Guidelines
There is no world-wide recognized standard for the data center environment
Best practices (semi-standards):
- Uptime Institute
- ANSI/TIA-942 (Telecomm)
- SS507 (ISO 24762 International guideline for Business Continuity/Disaster Recovery)
- Vendors (white papers)
Site Selection
- Location and infrastructure: potential hazard, availability, proximity, selection, capacity
- Build or Rent?
- Building: slab-to-slab height, building codes, floor loading, space requirements
- Security & safety
Critical Infrastructure
- Power: main power, genset, UPS
- Fire suppression: detection, prevention, suppression
- Cooling: cooling infrastructure, air flow
What Type of Requirement?

Lease or Purchase an Existing Data Center
- Keep apprised of any existing Data Center inventory
- Understand the level of infrastructure

Convert a Building to a Data Center
- Know which buildings have the most potential for conversion: those with some level of Data Center-centric attributes (close to substation, access to fiber, bunker-type construction, etc.)
- Have an understanding of the cost to convert

Construct a New Data Center
- Know which sites are best located for a Data Center (close to substation, access to fiber, affordably priced, away from freeways and rail lines, etc.)
- Have an understanding of the cost to build
Building a New Data Center: Cost Ranges

Tier1: $400/sf, 99.67% availability
- Basic Infrastructure with No Redundancy
- Basic power & cooling
- Unplanned outage disrupts systems & users
- Scheduled maintenance disrupts availability

Tier2: $600-800/sf, 99.75% availability
- Basic Infrastructure with Redundant Components
- Single, non-redundant distribution path
- Unplanned outage can disrupt systems & users
- Maintenance doesn't disrupt availability

Tier3: $900-1200/sf, 99.98% availability
- Concurrently Maintainable
- Sufficient MEP redundancy even when one of the MEP components has been removed from the infrastructure
- Unplanned & maintenance outages don't interrupt availability

Tier4: $1500/sf, 99.99% availability
- Fault Tolerant
- Multiple independent & physically separate systems
- Each system has redundant components & multiple, independent, diverse & active distribution paths
- Unplanned & maintenance outages don't interrupt availability
Some of the Data Center COST Components..
Structure Raised Floor Electrical Security Chilled Water Fire Suppression Cooling System Water Storage Fuel Storage Dual Feeds High Reliability Generators UPS System Cooling Tower Switch Gear
Data Center Operations
Data Center Organization Enterprise Infrastructure services Technical services Data Center operations End User services Service Center Production control IT Security services System administration Desktop services Database administration End User training Web Messaging Network services
Source: Organization structure for large IT organizations, Harris Kern's Enterprise Computing Institute
Data Center Organization

Data Center Director
Architecture | Design   | Implement | Operate
Compute      | Compute  | Compute   | Compute
Network      | Network  | Network   | Network
Storage      | Storage  | Storage   | Storage
Facility     | Facility | Facility  | Facility

Source: Trend in IT organization transform Data Center, Cisco
Functions & Layout: holding area, staging area, control room, server room (production)
Equipment Lifecycle: installation, commissioning, removal, de-commissioning
Maintenance: documentation, succession planning, labeling, training, cleaning, periodic assessment, testing, equipment lifecycle
Monitoring and Automation

Monitoring status:
- Environmental conditions: temperature, humidity, water leak detection, cooling systems
- Network infrastructure: routers, switches, overall network performance
- Power infrastructure: generator, UPS system, batteries
- IT systems: mainframes, servers, storage and backup
- Safety and security systems: fire detection panels, CCTV cameras

Objective: to detect potential issues before they turn into problems
Reporting & Documentation
- Outage reports
- Capacity reports: power capacity & redundancy, cooling capacity, physical space capacity, network capacity, IT/computing capacity
Operational Safety
- Daily operations: governed by SOPs
- Emergency situations: addressed by CERT
- Special works: PTW (permit to work) must be present at all times
Shift Hand-over
HR Qualification & Certification
Certification Roadmap
- Certified Data Centre Professional - CDCP
- Certified Data Centre Specialist - CDCS
- Certified Data Centre Expert - CDCE
- Certified Data Centre Facilities Operations Manager - CDFOM