Large Computing Systems
  Server Farm
    Networked cluster of interchangeable file/application servers
    Provides load balancing for availability and reliability
  Blade Server
    Server farm in a single cabinet providing I/O, power, cooling
    Blade = hot-swappable single-board file/application server
  Big Iron
    Large, expensive computers
    Multi-processor systems
    Complex inter-processor architecture
  Supercomputer
    Fast numerical processing (number crunching)
    Specialized, highly parallel user programming interface
  Mainframe
    Enormous I/O capability and reliability
    Standard single-user interface

[Figure: backplane connecting a CPU card, two memory cards, a bus adaptor card, and three I/O controllers]

Supercomputer Systems
  Oriented to problems limited by calculation speed
    Weather modeling
    Global warming forecasts
    DNA and protein analysis
    Digital video processing
  Complicated to program
    High degree of programmer-visible parallelism
    Special "parallelized" high-level language
    Require specialized, application-specific software
  Typical systems
    SMP assembly of 64 to 256,000 Alpha, Itanium 2, or PowerPC CPUs
    Proprietary OS assigns tasks to CPUs

Mainframe Systems
  Oriented to problems limited by I/O and reliability
  Optimized for business-oriented "heterogeneous workload"
    Simple transaction-oriented computations
    Enormous volume of accesses to external databases
    Bank account management
    Credit card processing
    Market trading
    Insurance processing
    Airline reservations
  Built for reliability and availability
    Mean Time Between Failure (MTBF) measured in years
    Automatic swapping of failed hardware/software components
    Constant self-testing and error correction
    No reboots for decades

Quality of Service (QoS)
  Mainframes are the "Rolls-Royce" of computer systems
    Quality always outweighs cost
    Highest quality hardware engineering
    Most reliable software techniques
    Highest level security and authentication
    Guaranteed backward compatibility
    High level technical support
  Off-site redundancy
    Backup system run by vendor
    Instantaneous transparent switch-over on failure

Architecture Overview
  [Figure: four OS instances running above the Systems Manager, which runs above the Hardware]
  Hardware
    CPUs, I/O system, internal communication network
  Systems manager (hypervisor)
    Operator console for partitioning/configuring CPUs and I/O
  OS
    Each partition runs a separate instance of an operating system
    Can run Unix, Windows, z/OS, MVS, VM instances in parallel
  User
    User sees single-user interface provided by OS
    User is assigned to an OS according to I/O configuration of terminal/network interface

Scalability
  Systems Manager sees all hardware as a single unit
    Holistic approach to large system
    Multiple CPUs in a single physical cluster
    Multiple physical clusters in a single hardware cabinet
    Multiple cabinets in a system complex (Sysplex)
  Hot swap
    Change hardware configuration without shutdown
    Add/Remove processors and I/O systems
    Reassign processors and I/O systems to groups
  On-demand computing
    Configuration allocates default resource partition
    Dynamically reassign resources for load balancing

Marketing Perspective
  Mainframe can replace 10 to 1000 smaller servers
    Multiprocessor system provides equivalent power
    Partitioning provides equivalent flexibility
  Reliable infrastructure replaces multiple small systems
    Centralized power supply, cooling system, backup
    RAS (Reliability, Availability, Serviceability) and compatibility
  Reduced administrative, management, and service costs
    Lower TCO (Total Cost of Ownership)
    Higher ROI (Return on Investment)
  Advantage to organizations that cannot afford risk
  Prices dropping but IBM's zSeries profits are growing

Traditional IBM Mainframes
  Traditional non-pipelined CPU implements CISC architecture
    IBM System/360, System/370, System/390, zSeries 890
  Business-oriented transaction-based application load
    85% of programs written in COBOL
    15% written in Assembler, C, C++, Java and other languages
    IBM SNA (Systems Network Architecture) networking
  Logical Partitions (LPARs)
    Partitioned multiprocessor assembly organization
    One instance of an OS per LPAR
  IBM operating systems
    MVS (Multiple Virtual Storage)
      JCL (Job Control Language) batch processing interface
      TSO (Time Sharing Option) time-sharing via dumb terminals
    VM/CMS
      Virtual Machine provides virtual mainframe environment per user
      Conversational Monitor System user shell running under VM

Contemporary IBM Mainframe
  zEnterprise EC12
    EC = enterprise class
  Modern architecture
    64-bit superscalar pipelined CPUs
    SMP multicore configuration
    Advanced ILP
      Out-of-order instruction scheduling
      Cache hierarchy + branch prediction
  Modern operating system support
    IBM z/OS (MVS replacement)
      Optimized (at assembly level) for zEC12 mainframes
      Native support for UNIX programs (z/OS is a certified UNIX system)
      TCP/IP
      Java (z/OS provides full Java execution environment)
      Encryption + security protocols
    IBM z/VM (VM/CMS replacement)
      User sees virtual machine running Linux

zEC12 Hardware Arrangement
  [Figure: Frame Z and Frame A]

zEC12 Architecture Overview
  HCA
    InfiniBand host channel adapter
  Hardware Management Console (HMC)
    Operator console (stand-alone computer)
  Support Element (SE)
    Laptop issues HMC instructions
  Flexible Service Processor (FSP)
    Dedicated CPU implements communication + control
  Book
    Processor cluster + memory + I/O interface + power supply interface

zEC12 Book Structure (Maximum System)
  Frame A
    120 active cores
    4 books
    30 active cores per book
  Book
    Multi-Chip Module (MCM)
      36 cores = 6 PU chips × 6 cores per PU
      2 storage control chips + 384 MB of L4 cache
    Physical memory = 960 GB per book
      3840 GB per Frame A
    8 PCIe fanouts
      8 GB/s links to PCIe I/O drawers
    3 Distributed Converter Assemblies (DCA)
      Power connection
      n+1 redundancy: continue operation after 1 DCA failure
      Permits hot maintenance
    2 Flexible Service Processor (FSP) cards
    Fabric Book Connectivity (FBC)
      High speed point-to-point connectivity

zEC12 Processing Unit (PU)
  6 core PU chip
    2.75 billion transistors
    5.5 GHz clock speed
  48 MB unified L3 cache
    Unified interface to 6 cores + I/O buses + memory controllers
    160 GB/s to each core
  Storage control (SC) implements L3 to L4 communication
  GX I/O bus to PCIe
  Memory controller (MC) access to main memory
  POWER7 64-bit superscalar core
    Dynamic scheduling
    6 EUs: 2 integer ALU, 2 load/store, 1 FPU, 1 decimal FPU
    Cache
      64 KB I + 96 KB D private L1 cache
      1 MB I + 1 MB D private L2 cache

zEC12 I/O System
  I/O cage
    Holds communications controllers
    28 I/O card slots
  I/O controllers
    Handle network connections
    Users, terminals, peripherals
  Coupling controllers
    Handle connections between mainframe systems

Processor Resource/Systems Manager (PR/SM)
  System Manager between hardware and OS layers
  PR/SM functions control all system aspects
  Responsible for physical topology knowledge
    Hardware information handled by OS in smaller computers
    PR/SM is aware of (physical) book structure
    Manages work dispatch on physical topology
  PR/SM implements Logical Partitioning (LPAR)
    zEC12 only runs in LPAR mode
  Logical partitions (LPAR)
    Allocated physical resources by PR/SM
    Not aware of (physical) book structure
    Have no control over systems aspect of physical resources
  [Figure: four LPAR-OS instances above the Systems Manager (PR/SM), above Hardware (PUs, RAM, Books, I/O)]

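
The maximum-system figures quoted for the book structure (cores per MCM, active cores per frame, memory per frame) can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on those numbers; the constant and function names are illustrative assumptions, and the difference between 36 physical and 30 active cores per book is taken from the slide as given, with no assumption about what the remaining cores do.

```python
# Consistency check on the zEC12 maximum-system figures quoted in the slides.
# All names are illustrative; only the numbers come from the slides.

BOOKS_PER_FRAME = 4
PU_CHIPS_PER_MCM = 6
CORES_PER_PU_CHIP = 6
ACTIVE_CORES_PER_BOOK = 30
MEMORY_GB_PER_BOOK = 960


def max_system_totals():
    """Return the per-book and per-frame totals implied by the slide figures."""
    physical_cores_per_book = PU_CHIPS_PER_MCM * CORES_PER_PU_CHIP
    return {
        "physical_cores_per_book": physical_cores_per_book,
        "cores_not_active_per_book": physical_cores_per_book - ACTIVE_CORES_PER_BOOK,
        "active_cores_per_frame": BOOKS_PER_FRAME * ACTIVE_CORES_PER_BOOK,
        "memory_gb_per_frame": BOOKS_PER_FRAME * MEMORY_GB_PER_BOOK,
    }


if __name__ == "__main__":
    for name, value in max_system_totals().items():
        print(f"{name}: {value}")
```

The per-frame totals come out to the 120 active cores and 3840 GB stated for Frame A, so the slide's figures agree with one another.
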
LPAR Allocation Rules
  PUs, memory and communication channels allocated to LPARs
    PR/SM attempts to minimize hardware allocated to a logical partition
    Resources can be dedicated to an LPAR or shared by LPARs
    Resources can be shared between LPARs by weight (priority)
  PR/SM attempts to group PUs for a logical partition within one book
  PR/SM attempts to group memory for a logical partition within one book
  PR/SM attempts to group logical PUs and memory within one book
    If not possible, groups in adjacent books
  PR/SM re-allocates PUs to logical partitions for load balancing
    PR/SM attempts to re-allocate a logical PU on the same physical PU
    Permits reuse of L1 cache content

Parallel Sysplex
  Parallel Sysplex
    Merge 2 to 32 instances of z/OS into a single system
    Applications divide work and data among LPARs
  Coupling facility (CF)
    Coordinates shared LPAR resources
    Manages process coordination among z/OS instances
    Manages data coherence
    Manages time synchronization
    Implemented independently or in a zEC12 LPAR
  Geographical diversity
    Coupled LPARs can be on remote physical systems
    Provides physical backup for disaster recovery

Parallel Sysplex Model
  [Figure: two systems, each with four LPAR-OS instances above the Systems Manager (PR/SM) and Hardware (PUs, RAM, Books, I/O), linked through a Coupling Facility]

Advantages of Parallel Sysplex
  High capacity for large workloads
    Applications see all resources on all LPARs as one system
  Resource sharing
    Applications can access all resources on all LPARs
  Dynamic workload balancing
    Software can increase resources without reconfiguring LPARs
  Automatic failure recovery
    Remote LPARs continue working if local LPAR fails
    System z server groups designed for 99.999 percent availability
  Continuous application availability
    Applications continue on one LPAR during service on another LPAR

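
The 99.999 percent availability target is easier to judge as a downtime budget. The sketch below converts an availability percentage into allowed minutes of downtime per year; the function name and the 365.25-day year are assumptions made for illustration, not part of the slides.

```python
# Downtime budget implied by an availability target.
# The function name and the 365.25-day year are illustrative assumptions.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a system may be down and still meet the target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

Under these assumptions, "five nines" leaves roughly 5.26 minutes of downtime per year, which is why service has to proceed on one LPAR while applications continue on another.
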
Integrated Hardware and System Assists
  System z Application Assist Processors (zAAPs)
    Execute Java programs
    Under IBM Java Virtual Machine (JVM)
    Works in LPARs running z/OS
    Reduce capacity requirements on CPUs
  CP Assist for Cryptographic Function (CPACF)
    Cryptographic support on every PU
    DES and TDES data encryption/decryption
  Integrated Facility for Linux (IFL)
    Supports Linux and open standards
    Real or virtual environment within System z
    zEC12 configurable as Linux-only server

Unified Resource Manager
  Integrated management fabric (package)
    Runs on Hardware Management Console and Support Element
    Sees all workload from one uniform point of control
  Fast + agile for reconfiguration
    Growth, load balance, disaster recovery
  Management areas
    General system management
    Virtual server management + provisioning
    Hypervisor management + support for application deployment
    Energy management + monitoring
      Power + cooling control
    Network management
      Virtual networks + access control
    Workload Awareness
      Manage CPU resource across virtual servers hosted in same hypervisor
      Balance workload performance policy objectives

System z BladeCenter Extension (zBX)
  IBM blade server systems
    Optimized for standard OLTP + web-oriented services
  zBX
    Optional machine incorporates System z services into zEC12
    Managed transparently by Unified Resource Manager
  Optional blades
    IBM WebSphere DataPower Integration Appliance
      Offloads web-based workloads from core applications
      Front end server to optimize XML processing
      XML hardware acceleration for service-oriented architecture (SOA)
        HTTP format
        SOAP (Simple Object Access Protocol) format
      Seamless integration of distributed and System z platforms
    POWER7 blades
      Virtualized running AIX / Red Hat Enterprise Linux / Windows Server

"Managed in Cloud"
  Cloud = virtualization management infrastructure
    Eliminate traditional fixed-hardware boundaries
      CPU, memory, network, storage
    Deliver infrastructure / platform / application as service
  zEC12 as private cloud infrastructure
    Centrally managed + controlled set of IT resources
    Rapid and flexible service delivery

Capacity on Demand (CoD)
  Multiple configuration definitions available for temporary requirements
    Up to 200 staged definitions
    8 installed at given time
  Manual invocation by operator
  Automatic invocation
    Workload Manager (WLM) sets policy thresholds
    Capacity Provisioning Manager invokes on specific thresholds

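
The Capacity on Demand rules (a pool of up to 200 staged definitions, at most 8 installed at once, and automatic invocation when a policy threshold is crossed) can be pictured with a toy model. Everything below is hypothetical: the class, method, and parameter names are invented for illustration and do not correspond to any IBM interface, and the threshold is assumed to be a CPU utilization fraction, which the slides do not specify. The `policy` mapping stands in for thresholds set by WLM, and `auto_invoke` stands in for the Capacity Provisioning Manager.

```python
# Toy model of Capacity on Demand as outlined in the slides.
# Class, method, and field names are hypothetical; this is not an IBM API.
from dataclasses import dataclass, field

MAX_STAGED = 200    # staged definitions (from the slide)
MAX_INSTALLED = 8   # definitions installed at a given time (from the slide)


@dataclass
class CapacityOnDemand:
    staged: list = field(default_factory=list)
    installed: list = field(default_factory=list)

    def stage(self, name):
        """Add a configuration definition to the staged pool."""
        if len(self.staged) >= MAX_STAGED:
            raise RuntimeError("staged definition limit reached")
        self.staged.append(name)

    def install(self, name):
        """Install a staged definition (what a manual invocation would do)."""
        if name not in self.staged:
            raise KeyError(f"{name!r} is not staged")
        if name in self.installed:
            return
        if len(self.installed) >= MAX_INSTALLED:
            raise RuntimeError("installed definition limit reached")
        self.installed.append(name)


def auto_invoke(cod, utilization, policy):
    """Install each staged, not-yet-installed definition whose threshold
    (assumed to be a utilization fraction) is met or exceeded."""
    activated = []
    for name, threshold in policy.items():
        if utilization >= threshold and name in cod.staged and name not in cod.installed:
            cod.install(name)
            activated.append(name)
    return activated


if __name__ == "__main__":
    cod = CapacityOnDemand()
    cod.stage("extra-cores")
    cod.stage("extra-memory")
    print(auto_invoke(cod, 0.95, {"extra-cores": 0.90, "extra-memory": 0.99}))
```

Keeping the limits as module constants makes the two numbers from the slide easy to find; a real provisioning policy would carry far more state than a single utilization value.
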
Environmental Requirements
  Power    27.6 kW
  Cooling  Water / Air Cooled
  Width    1568 mm
  Depth    1806 mm
  Height   2013 mm