Data Information and Management System (DIMS)
The New DIMS Hardware
Wilhelm Wildegger, DFD-IT; Jens Pollex, DFD-BN
4 May 2007
Viewgraph 1
The New DIMS Hardware > W. Wildegger > 2006-12-08
Outline
- DIMS Overview
- Reasons for Renewal of DIMS Hardware
- New Servers
- Disk Storage Area Network (Disk SAN) and Storage System
- Tape SAN and Tape Library
- New Archive Tape Drives and Media
- Data Migration to New Media
- High Availability: Sun Cluster and High Availability Server
- Administration Network and SunRay Network
- DIMS at Oberpfaffenhofen, Neustrelitz and Birlinghoven
- Status
Data Information and Management System (DIMS): Sites OP and NZ
[Architecture diagram: DIMS OP and DIMS NZ each comprise EOWEB User Information Services (incl. Pickup Point), Online/Offline Product Generation & Delivery, Order Management, and a Product Library with Operating Tool, Archive and Inventory, exchanging requests, orders, request trees and reports; post-processing and ingestion systems feed raw data and level products from the receiving systems into the archive.]
Reasons for Renewal
- old hardware (servers, disk systems, tape drives) approaching end of service life
- old archive tape media / drives start having read problems; necessary to copy data to new media for long-term archiving
- old hardware not powerful enough to support new projects like TerraSAR-X with respect to data throughput and disk / tape capacity
- old hardware didn't fulfill availability requirements of TerraSAR-X
New Servers
- old servers were Sun E6500, E3500, E450
- new servers are Sun Fire V4900, Sun Fire V890 and Sun Fire V490
  - V4900: 8 CPUs at 1.8 GHz, 32 GByte main memory
  - V890: 4 or 8 CPUs at 1.8 GHz, 16 or 32 GByte main memory
  - V490: 2 or 4 CPUs at 1.8 GHz, 8 or 16 GByte main memory
- ca. 10 times more CPU power than the old servers
- ca. 10 times more main memory than the old servers
- all servers connected to the network with 1 Gbit/s Ethernet; the archive server with 2 × 1 Gbit/s Ethernet
- Solaris 10, SAM-FS 4.5.33
- up to 20 I/O ports and 4 Ethernet ports
Servers (Oberpfaffenhofen & Neustrelitz)
[Server overview diagram: Order Management / Ordering, Name/Web/SW Service, DIMS Operations, Product Library / Inventory, Archive, User Information Services (UIS), Post-Processing and Online/Offline Product Generation & Delivery (with CD/DVD writing system) servers on the C-AF network behind the CAF firewall, connected to the DLR backbone switch (Oberpfaffenhofen only).]
Servers (Oberpfaffenhofen) (photo)
Servers (Neustrelitz) (photo)
Disk SAN
- servers only have local disks for the operating system and COTS SW installation
- all data partitions are located on one central storage system per site (OP and NZ)
- Sun/HDS StorEdge 9985: 25 TByte in OP, 13 TByte in NZ
- interconnection between servers and storage system via 2 SAN switches (Brocade 4100) and 2 Gbit/s fibre channel links ("Disk SAN")
- redundancy / high availability:
  - RAID 5 and RAID 6 used
  - every server is connected to both SAN switches using different host bus adapters (HBAs), with one or several parallel links per switch
  - the storage system is connected to each switch with 8 links
  - even if one SAN switch, one HBA, one link or one controller of the SAN storage system fails, all partitions remain visible; only performance is reduced
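The single-failure tolerance claimed above can be checked with a tiny model of the dual-fabric topology. This is a sketch under the assumption of one HBA per fabric and two storage controllers; all names are placeholders, not actual device identifiers:

```python
from itertools import product

# Hypothetical model of the dual-fabric Disk SAN:
# each I/O path runs server HBA -> SAN switch -> storage controller.
hbas = ["hba1", "hba2"]            # assumed: one HBA per fabric
switches = ["switch1", "switch2"]  # the two Brocade 4100s
controllers = ["ctrl1", "ctrl2"]   # assumed: two storage-system controllers

# hba1 is cabled to switch1, hba2 to switch2; each switch reaches both controllers.
paths = [(h, s, c) for (h, s), c in product(zip(hbas, switches), controllers)]

def surviving_paths(failed):
    """Paths that avoid the single failed component."""
    return [p for p in paths if failed not in p]

# Any single failure (HBA, switch, or controller) still leaves at least one
# path, so all partitions stay visible -- at reduced performance.
for component in hbas + switches + controllers:
    assert surviving_paths(component), f"no path survives failure of {component}"
print("all single-component failures tolerated")
```

A double failure hitting both fabrics (e.g. hba1 plus switch2) would cut off the storage system, which is why each component class is duplicated across the two fabrics.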
Disk SAN (diagram)
[Diagram: all DIMS servers (Order Management / Ordering, Name/Web/SW Service, DIMS Operations, Product Library / Inventory, Archive, User Information Services, Post-Processing, Online/Offline Product Generation & Delivery with CD/DVD writing system) attached via the two SAN switches to the central SAN storage system; C-AF network behind the CAF firewall to the DLR backbone switch (Oberpfaffenhofen only).]
Tape SAN and Second Copy Tape Library
- with the old hardware, both primary and secondary copies were written in the same tape library (AML/2 in OP, AML/J in NZ); secondary copy tapes were off-loaded once full
- now secondary copy tapes are written in a separate secondary copy tape library (Quantum / ADIC i2000); this library is located at some distance from the primary copy library, in another building
- interconnection between the servers and the tape drives within the tape libraries is via 2 SAN switches (Brocade 4100) and 2 Gbit/s fibre channel links ("Tape SAN")
- redundancy / high availability:
  - DIMS archive server connected to both Tape SAN switches with several links using several HBAs (host bus adapters)
  - 2 extra Tape SAN switches (Brocade 4100) are co-located with the secondary copy library
  - half of the tape drives are connected to one pair of switches, the other half to the other pair
  - the local switches are connected to the remote switches with 2 × 5 single-mode fibre links
Tape SAN and Tape Library (diagram)
[Diagram: the DIMS servers and the SAN storage system with two switches; the primary copy robot library (9940B and 2 × LTO-2 drives, AMU) and the remote secondary copy robot library (LTO-3 drives), linked over 2 × 5 single-mode fibres; C-AF network behind the CAF firewall to the DLR backbone switch (Oberpfaffenhofen only).]
New Archive Tape Drives and Media

  Purpose                         Type                              Native capacity       Read/write speed   Connection
  old 1st copy                    DLT4000                           20 GByte              2 MByte/s          SCSI
                                  DLT7000                           35 GByte              3 MByte/s          SCSI
                                  Sony AIT-2                        50 GByte              5 MByte/s          SCSI
                                  Sony MOD                          5 GByte               1 MByte/s          SCSI
  old 2nd copy                    DLT4000                           20 GByte              2 MByte/s          SCSI
                                  DLT7000                           35 GByte              3 MByte/s          SCSI
  new 1st copy (where necessary)  SAN disk system (disk archiving)  as much as necessary  80 MByte/s         SAN
  new 2nd/1st copy                StorageTek 9940B                  200 GByte             30 MByte/s         SAN
  new 2nd/3rd copy                IBM LTO-3                         400 GByte             up to 80 MByte/s   SAN

  Library capacity (OP / NZ):
  9940B: > 11,000 slots ≈ 2.2 PByte (OP) / 540 (1,500) slots ≈ 110 (300) TByte (NZ)
  LTO-3: > 1,100 slots ≈ 0.4 PByte (OP) / 400 (720) slots ≈ 160 (300) TByte (NZ)
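The library capacity figures follow directly from slot count × native cartridge capacity; a quick sanity check, using decimal units as on the slide:

```python
# Sanity check of the library capacity figures on this slide:
# capacity = slot count x native cartridge capacity.
def capacity_tbyte(slots, gbyte_per_tape):
    return slots * gbyte_per_tape / 1000  # decimal TByte, as used on the slide

# 9940B library (200 GByte native per cartridge)
assert round(capacity_tbyte(11_000, 200) / 1000, 1) == 2.2  # > 11,000 slots ~ 2.2 PByte
assert round(capacity_tbyte(540, 200), -1) == 110           # 540 slots ~ 110 TByte
assert capacity_tbyte(1_500, 200) == 300                    # 1,500 slots = 300 TByte

# LTO-3 library (400 GByte native per cartridge)
assert round(capacity_tbyte(1_100, 400) / 1000, 1) == 0.4   # > 1,100 slots ~ 0.4 PByte
assert capacity_tbyte(400, 400) == 160                      # 400 slots = 160 TByte
assert round(capacity_tbyte(720, 400), -2) == 300           # 720 slots ~ 288 ~ 300 TByte
print("capacity figures consistent")
```

All figures on the slide are thus native (uncompressed) capacities rounded to two significant digits.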
The New DIMS HW, OP, AML/2, before conversion (photo)
The New DIMS HW, OP, AML/2, after conversion (photo)
The New DIMS HW, OP, ADIC i2000 (photo)
The New DIMS HW, ADIC i2000, backside drives (photo)
The New DIMS HW, ADIC i2000, view inside (photo)
Data on Old Tape Drives/Media and Migration to New Media
- new data (archived from Oct. 2006) are written to new media
- old data (archived until Sept. 2006) are still on old media (and operationally accessible)
- to enable reading the old data, half of the old tape drives are still connected to the new servers
- to preserve the old data, they will be migrated (re-archived) to new media
  - ca. 100 TByte in OP
  - ca. 45 TByte in NZ
- migration in OP started April 2007
- once finished, the old tape drives will be removed
Migration to New Media in Neustrelitz
- migration was started in December 2006
- split into two steps: DIMS and the old world
- using FMRT (HMK) in 19 different streams (one stream per new file system)
- ca. 45 TByte
- migration of first and second copy at the same time
- using 4 AIT-2 drives for reading
- cache for migration: 7 GByte
- max. 2 streams in parallel
- first part (approx. 15 TByte) finished before Christmas 2006
- completely finished end of February 2007
- some problems with reading the first copy, but all data could be read from either the first or the second copy
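A rough back-of-the-envelope estimate (my own calculation, not from the slides) shows the stated drive count and speed are consistent with the roughly three-month duration; the 50 % duty cycle allowing for tape mounts, positioning and retries is an assumption:

```python
# Rough estimate: wall-clock time to read a given archive volume
# with a few old tape drives in parallel.
def migration_days(volume_tbyte, drives, mbyte_per_s, duty_cycle=0.5):
    """Days needed to read `volume_tbyte` with `drives` parallel drives.

    duty_cycle (assumed 50 %) accounts for tape mounts, positioning,
    retries and idle time between streams.
    """
    volume_mbyte = volume_tbyte * 1_000_000
    rate = drives * mbyte_per_s * duty_cycle  # effective aggregate MByte/s
    return volume_mbyte / rate / 86_400

# Neustrelitz: ca. 45 TByte, max. 2 parallel streams, AIT-2 reads at ~5 MByte/s.
days = migration_days(45, drives=2, mbyte_per_s=5)
print(f"~{days:.0f} days")  # on the order of three months, matching Dec 2006 - Feb 2007
```

The same arithmetic suggests the ca. 100 TByte in OP would take proportionally longer unless more drives or faster streams are used.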
High Availability (1): Sun Cluster
- vital functions for DIMS run on the DIMS Operation server, e.g. DNS slave, Werum CORBA Name Service
- if one of these services fails, all DIMS functions are inoperable
- to avoid losing the vital functions due to a DIMS Operation server failure, the DIMS Operation server and a second server are configured as a Sun Cluster, i.e. if the DIMS Operation server fails, the second server provides the DIMS vital functions
- switching to the backup server and starting the services is fully automatic (without human intervention); it happens within a few minutes
- (the second server's own functions are not taken over by the DIMS Operation server if that server fails)
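The automatic one-way failover can be sketched as a heartbeat check. This is an illustration only, not the actual Sun Cluster mechanism, and the service names are placeholders:

```python
# Minimal sketch of one-way failover: if the primary (DIMS Operation server)
# stops answering, the backup node starts the vital services itself.
VITAL_SERVICES = ["corba-name-service", "dns-slave"]  # placeholder names

def heartbeat_ok(now, last_heartbeat, timeout=30.0):
    """True while the primary's heartbeat is recent enough (times in seconds)."""
    return now - last_heartbeat < timeout

def backup_node_step(now, last_heartbeat, running):
    """One monitoring step on the backup node; returns services running there."""
    if heartbeat_ok(now, last_heartbeat):
        return running  # primary healthy -> backup keeps only its own services
    # primary silent -> take over the vital functions (one-way only)
    return sorted(set(running) | set(VITAL_SERVICES))

# Primary silent for 60 s -> backup brings up the vital services.
print(backup_node_step(now=100.0, last_heartbeat=40.0, running=[]))
```

The one-way nature on the slide corresponds to the fact that only the backup node runs this monitoring loop; the DIMS Operation server never adopts the backup's own functions.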
High Availability (diagram)
[Diagram: the DIMS Operations server paired in a Sun Cluster; the High Availability / Upgrade server attached to the Disk SAN (25 TByte storage system, two switches), the primary copy robot library (13 × 9940B, 2 × LTO-2, AMU) and the remote secondary copy robot library (8 × LTO-3); C-AF network behind the CAF firewall to the DLR backbone switch (Oberpfaffenhofen only).]
High Availability (2): High Availability Server
- DIMS functions must have an availability of 98% per month, i.e. max. 14 h of unavailability per month (TerraSAR-X requirement)
- hardware problems or upgrade procedures of the OS or COTS SW might last longer than 14 hours
- therefore there is a High Availability server (HA server); it can serve any DIMS function (including the archive)
- thanks to the Disk SAN it is possible to mount the disk partitions / file systems of the failed server on the HA server (prerequisite: all data being modified reside on the SAN storage system, not on local disks)
- thanks to the Tape SAN it is possible to attach the tape drives of the failed archive server to the HA server (of course not the old SCSI-connected drives)
- no physical change of any connection is necessary
- switch-over is performed manually (with the help of a written procedure)
- bringing up a normal DIMS service on the HA server takes ca. 20 min, bringing up the archive service ca. 40 min
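The 14-hour figure follows directly from the 98 % requirement applied to a 30-day month:

```python
# Check of the availability requirement quoted above: 98 % per month
# corresponds to roughly 14 hours of allowed downtime.
hours_per_month = 30 * 24  # 720 h in a 30-day month
allowed_downtime = (1 - 0.98) * hours_per_month
print(f"{allowed_downtime:.1f} h")  # 14.4 h -> "max. 14 h" on the slide

# A manual HA switch-over (ca. 20 min normal service, ca. 40 min archive)
# therefore consumes only a small fraction of the monthly budget:
print(f"{40 / 60 / allowed_downtime:.1%} of the budget per archive switch-over")
```

Even several switch-overs per month would stay well inside the budget; the risk the HA server addresses is the multi-day outages from hardware failures or OS/COTS upgrades.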
Administration Network and SunRay Network
- administration network and ALOM (Advanced Lights Out Management):
  - no console terminals are connected to the servers
  - all servers are equipped with a service processor for advanced lights-out management
  - the administration network allows access to the ALOM service processors
  - the console of each server can be viewed on any workstation / PC
  - allows issuing all commands, monitoring the boot process and even switching the server on and off
  - fibre channel switches, the storage system and the secondary copy library are also connected to the administration net to allow remote administration
  - access to the administration net is controlled by the C-AF firewall plus a password on every service processor
- standard graphic access:
  - no graphics cards in the servers
  - access via SunRay server and SunRay clients, interconnected via the SunRay network
Administration Network and SunRay Network (diagram)
[Diagram: all DIMS servers, the SAN switches, the storage system and the secondary copy library attached to the administration network; access from personal workstations and SunRay clients via the SunRay server and SunRay network (Oberpfaffenhofen only); CAF firewall between the C-AF network and the DLR backbone switch.]
Servers with Processing Systems (Oberpfaffenhofen)
[Diagram: processing systems (MSG, ERS-2 GOME L2/L3, SRTM X-SAR, Terra/Aqua MODIS, AIR OS, TerraSAR-X TMSP and TVSP) connected via the C-AF network to the DIMS servers, the SAN storage system (25 TByte) and the primary (13 × 9940B, 2 × LTO-2, AMU) and remote secondary (8 × LTO-3) copy robot libraries; CAF firewall to the DLR backbone switch.]
Servers with Processing Systems (Neustrelitz)
[Diagram: processing systems (ENVISAT MERIS, CHAMP, BIRD, GRACE, TerraSAR-X TMSP) connected via the C-AF network to the DIMS servers, the SAN storage system (13 TByte) and the primary (10 × 9940B, AMU) and remote secondary (6 × LTO-3) copy robot libraries; CAF firewall to the DLR backbone switch.]
DIMS in Oberpfaffenhofen, Neustrelitz and Birlinghoven
[Network diagram: at OP and NZ, the CAF production net with the DIMS servers sits behind the CAF firewall, next to the CAF DMZ and CAF infrastructure net (1G and 100M Ethernet switches), attached to the local DLR backbone switch; OP and NZ are linked by 155 Mbit/s VPN tunnels over X-WIN, with further VPN tunnels to St. Augustin / Birlinghoven (Bih) via GÉANT; packet shapers, VPN + WAN routers and the protected DLR LAN sit in between; the EOWEB frontend and an FTP server are reachable from the Internet through the DLR firewall and the protected and public DMZs (public DMZ in Köln-Porz, KP).]
Status
- all new hardware operational (including all connections and all COTS software)
- almost all DIMS services already migrated to the new hardware:
  - Werum CORBA Name Service (OP and NZ)
  - Product Library (incl. Archive) (OP and NZ)
  - … (OP and NZ)
  - User Information Services Interface / Loading (UI / UL) (OP)
  - Order … (OP)
  - Online / Offline Product Generation (OPG) (OP)
  - EOWEB backend, EOWEB UI/UL (on new HW from the very beginning) (OP)
  - EOWEB frontend (Bih)
  - Post-Processing (NZ)
- still to be migrated: Post-Processing (OP)
- HA switch-over was tested in the operational environment
SQFS for TerraSAR-X
Thanks to all involved persons
- Willi's team in OP
- DIMS team and HeJoe in NZ
- company Dignum
- DIMS development team