Tyrant: A High Performance Storage over IP Switch Engine Stuart Oberman, Rodney Mullendore, Kamran Malik, Anil Mehta, Keith Schakel, Michael Ogrinc, Dane Mrazek Hot Chips 13, August 2001 1
Background: Direct-Attached Storage LAN / NAS Direct-Attached Storage Expansion beyond server s internal drive capacity Tape backup Often used with Network- Attached Storage (NAS) High performance SCSI or Fibre Channel connections Utilization Utilization Hot Chips 13, August 2001 2
Storage Area Networks SANs Utilization LAN / NAS SAN Utilization Storage Area Networks Pooling of external storage devices for better utilization and availability LAN-free backup Serverless backup Non-disruptive expansion and maintenance Leverage existing staff to manage three or four times more storage SAN ROI estimates* range from 65-297 percent *Source: CSFB, June 2001 Hot Chips 13, August 2001 3
No Native Fibre Channel Extension Wide area networks are based on IP or Gigabit Ethernet, not Fibre Channel FC SAN FC Switch FC Switch FC Switch FC Switch FC SAN Hot Chips 13, August 2001 4
Integrated IP SAN, MAN, and WAN Fabric Optimal routing decisions are made by the integrated IP SAN-MAN-WAN fabric IP SAN Nishan Nishan Unified IP SAN-MAN-WAN Fabric No more islands Nishan Nishan IP SAN Familiar IP and Ethernet management tools Hot Chips 13, August 2001 5
Nishan Systems Framework for IP Storage OPEN SAN, LAN, MAN, WAN IP Routers and Gigabit Ethernet Switches 10 Gigabit Ethernet Future Interfaces SCSI Disk, RAID, Tape SCSI Host Bus Adapters Network Attached Storage FC Disk, RAID, Tape FC Host Bus Adapters FC SANs Standards-Based Hot Chips 13, August 2001 6
Tyrant: An Introduction Tyrant: a switch engine used to build many sizes of IP Storage Area Network switches Two categories of switches: Low density: <=32 ports High density: > 32 ports Tyrant integrates hardware to build both categories supporting Fibre Channel switch ports, 1Gbps and 2Gbps Fibre Channel arbitrated loop ports, 1Gbps and 2Gbps Gigabit Ethernet Switching between all port types Hot Chips 13, August 2001 7
Tyrant: Major Features Four 1Gbps Ethernet MACs Four 1Gbps / 2Gbps Fibre Channel MACs 1Gbps or 2Gbps FC-AL and FC-SW Four Gigabit network processor interfaces Four flexible packet schedulers Four programmable encapsulation engines Distributed shared-memory controller supporting 2MB SRAM per chip Chip-to-chip fabric interface Hot Chips 13, August 2001 8
Tyrant Block Diagram To Network Processors Fibre channel Gigabit Ethernet FC MAC 0 GE MAC 0 NP IF 0 ENCAP ENG 0 FABRIC IF 0 To Other Tyrants (Ingress Flow) Fibre channel Gigabit Ethernet FC MAC 1 GE MAC 1 NP IF 1 ENCAP ENG 1 FABRIC IF 1 MEMORY CONTROLLER AND CROSSBAR To Other Tyrants (Egress Flow) SRAM Fibre channel Gigabit Ethernet FC MAC 2 GE MAC 2 NP IF 2 ENCAP ENG 2 FABRIC IF 2 From Other Tyrants (Ingress Flow) Fibre channel Gigabit Ethernet FC MAC 3 GE MAC 3 NP IF 3 ENCAP ENG 3 FABRIC IF 3 From Other Tyrants (Egress Flow) Hot Chips 13, August 2001 9
Cut-thru Each Tyrant has side-band cut-thru request / grant interface Allows for each ingress port to request cut-thru to any egress port If cut-thru is granted, packets are switched in distributed fashion, with no writes to SRAM First-byte-in to MAC to first-byte-out of MAC latency: Less than 2us Any port to any port FC packets using no external network processing Hot Chips 13, August 2001 10
Early Forwarding Reduced switch latency possible even when packet stored to SRAM Entire store-and-forward not required Store only minimum amount of packet: Programmable stored amount based upon link speed difference of ingress and egress ports (GE, FC, or 2G FC) Ensures no underrun of egress port 4-5us latency possible Hot Chips 13, August 2001 11
Packet Scheduler True output queues at each egress port Requires fast and wide shared memory Guarantees maximum switching throughput Single trip to memory 256 unique output queues per port Scheduler features: Three level hierarchy Each level may separately use strict priority or weighted-fairqueuing (WFQ) / weighted-round-robin (WRR) Virtual time based scheduling => very low jitter Minimum and maximum bandwidth guarantees Hot Chips 13, August 2001 12
Scheduler Organization 8 Q s L1 REG L2 REG 8 Q s L1 REG 32 L1 Schedulers L3 REG Selected Queue 8 Q s L1 REG L2 REG 8 Q s L1 REG Hot Chips 13, August 2001 13
Full-Duplex FC-AL FC arbitrated loop supports 126 devices Loop operates either half-duplex or full-duplex In full-duplex configuration, target device may open switch port to send it data If switch port can reorganize its output data while receiving target device s data, bandwidth is maximized Hot Chips 13, August 2001 14
Optimized Full-Duplex FC-AL Performance Bin packets destined for different target devices using one output queue per loop device Separate packet scheduler interface allows FC MAC to directly select particular queue for transmission Switch port opened by loop device: MAC pulls packet from correct queue No head-of-line blocking Hot Chips 13, August 2001 15
Programmable Encapsulation Engine Programmable encapsulation engine per egress port used to implement IP storage protocols Allows flexibility for evolving IP storage standards Network processor on ingress FC port adds Encapsulation Instructions to header of packet Encapsulation engine at egress IP port executes microcode to form encapsulated IP storage packets Encapsulation engine performs: IP address lookups Checksum and CRC computations GE and IP header creation Hot Chips 13, August 2001 16
Application Example 1: 16-port Switch Each port may be configured to be GE, 1G FC, or 2G FC 0 1 2 3 4 5 6 7 SRAM Tyrant0 Tyrant1 SRAM NP NP SRAM Tyrant2 Tyrant3 SRAM NP NP 8 9 10 11 12 13 14 15 Hot Chips 13, August 2001 17
Application Example 2: Chassis Line Card SRAM NP Each port may be configured to be GE, 1G FC, or 2G FC 0 1 Tyrant0 SRAM NP Generic Fabric Interface 10+ Gbps pipe across backplane to Fabric Card using High Speed Serial Links 2 3 Tyrant1 SRAM NP External Fabric Transceiver 4 5 Tyrant2 SRAM NP 6 7 Tyrant3 Hot Chips 13, August 2001 18
Physical Details IBM 0.18um ASIC process: SA-27E 25 million transistors 15mm x 15mm die size 928 signal pins Nominal power consumption: 16W Three major clock domains 125 MHz 104 MHz 53.125 MHz Implementation status: internal testing Hot Chips 13, August 2001 19
Hot Chips 13, August 2001 20 Chip Layout
Hot Chips 13, August 2001 21 Tyrant Photo