Software Defined Whatever @SURFsara
Ron Trompert
About SURFsara
Supports research in the Netherlands (and abroad) by offering advanced ICT infrastructure, services and expertise:
- National Supercomputer Cartesius
- LISA compute cluster
- Visualisation
- Grid
- BigData
- HPC Cloud
- Data Ingest
- SURFdrive (together with SURFnet)
Customers
Where does it all come from?
It's all about scale.
- After the internet became all-pervasive in the 90s, companies emerged that provided services to the whole planet
- Technologies at the time were not suitable to operate at planet scale
- New technologies were developed for this purpose
  - Google (GFS, BigTable)
  - Facebook (Cassandra)
- Cost-effective technologies
  - Commodity hardware
- New technologies providing fault tolerance
  - Services are always there. Google is never down for scheduled maintenance.
  - At this scale, failure is not an exceptional situation; it is part of the normal daily routine
How does this affect us?
OK, we don't have a datacenter as big as an IKEA store, but are these developments still interesting for us?
- Need cost-efficient investments (maximum bang for the buck)
- (Long) downtimes for maintenance are becoming less acceptable
- We need scalability too. Computing and data storage needs are continuously increasing
Anything else?
- Choice for open source products
  - Costs
  - Open standards
- Hardware procurement by means of tenders
  - Typically three generations of hardware in your infrastructure, supplied by multiple vendors
- Still need reliable hardware and support
Software Defined Storage
- Responsibility for the storage system lies with the software
- Not trying to prevent failures, but working around them. Recovery is the responsibility of the software, not the hardware.
- Mix-and-match hardware configurations from multiple vendors
- Low-cost commodity hardware
- Minimum effort required to manage
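The idea that recovery belongs to the software can be illustrated with a toy model. The sketch below is not how Ceph or any other product places data (Ceph uses its own CRUSH algorithm); the node names, the replication factor of 3 and the functions `place` and `recover` are all assumptions made for illustration.

```python
from collections import defaultdict

REPLICAS = 3  # assumed replication factor, not taken from the slides


def place(objects, nodes, replicas=REPLICAS):
    """Assign each object to `replicas` distinct nodes, round-robin."""
    placement = {}
    for i, obj in enumerate(objects):
        placement[obj] = {nodes[(i + k) % len(nodes)] for k in range(replicas)}
    return placement


def recover(placement, failed, healthy, replicas=REPLICAS):
    """Forget the failed node and re-replicate onto the least-loaded healthy nodes."""
    # Count how many replicas each surviving node already holds.
    load = defaultdict(int)
    for holders in placement.values():
        for node in holders:
            if node != failed:
                load[node] += 1
    for holders in placement.values():
        holders.discard(failed)
        while len(holders) < replicas:
            candidates = [n for n in healthy if n not in holders]
            if not candidates:
                break  # too few healthy nodes: the object stays degraded
            target = min(candidates, key=lambda n: (load[n], n))
            holders.add(target)
            load[target] += 1
    return placement


# Hypothetical five-node cluster in which one node fails.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place([f"obj-{i}" for i in range(10)], nodes)
survivors = [n for n in nodes if n != "node-b"]
recover(placement, "node-b", survivors)
```

After `recover` runs, every object is back at three replicas on the surviving nodes without anyone touching the hardware; the failed node can be replaced later, by a machine from any vendor.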
Grid Server Park
- Used to run services needed just to be part of the Grid
- Now also running non-grid services for user communities
- Has been around for years, first running services on bare metal ("iron"). Later on, services were more and more virtualised.
- Turn the GSP into a private cloud based on OpenStack
Grid Server Park -> Service Cloud
Until recently: SAN storage-based
Service Cloud
- 23 compute nodes
- 14 Ceph storage nodes
- Ceph
  - Block devices
  - Object storage
  - File system
HPC Cloud
- Compute nodes
- Huge switch
- DDN storage (NFS)
New HPC Cloud
- 12 GPU nodes
- 32 32-core compute nodes
- Huge switch
- High-memory system (2 TB memory)
- Compute with SSDs
- Ceph storage cluster
  - 34 nodes, 60 TB each
  - 16 nodes, 40 TB each
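The storage figures above can be added up as follows. This is a back-of-the-envelope sketch that assumes both node groups form one cluster and that the per-node sizes are raw capacity. The slides do not state the redundancy scheme, so the 3-way replication used for the usable figure is an assumption.

```python
# Node counts and per-node sizes are taken from the slide.
node_groups = [(34, 60), (16, 40)]  # (number of nodes, TB per node)

raw_tb = sum(count * tb for count, tb in node_groups)

# Assumption: 3-way replication; the real redundancy scheme is not given.
ASSUMED_REPLICAS = 3
usable_tb = raw_tb / ASSUMED_REPLICAS

print(f"raw: {raw_tb} TB, usable at {ASSUMED_REPLICAS}x replication: {usable_tb:.0f} TB")
```

That is 34 × 60 + 16 × 40 = 2680 TB raw, or roughly 893 TB usable under the assumed replication factor.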
Apache Mesos
Warehouse Scale Computer (WSC)/Datacenter Computing
- A datacenter is just a form factor. Look at it as just one computer. No difference between a smartphone, tablet, laptop, server or a datacenter
- High utilization
- Failures are dealt with in application software
- No more silos. No more dedicated hardware for special purposes. Just run a variety of workloads on the same hardware.
  - VMs
  - Batch jobs
  - Containers
  - Long-running services
- Datacenter with Mesos as OS
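A toy model can make the "one computer, many workload types" idea concrete. The sketch below is not the Mesos API and does not reproduce its resource-offer mechanism; it is a plain first-fit placement over a shared pool, and the `Node`, `Task` and `schedule` names as well as all sizes are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpus: float
    mem_gb: float
    tasks: list = field(default_factory=list)

    def fits(self, task):
        return task.cpus <= self.cpus and task.mem_gb <= self.mem_gb

    def run(self, task):
        # Reserve the resources and remember which task runs here.
        self.cpus -= task.cpus
        self.mem_gb -= task.mem_gb
        self.tasks.append(task.name)


@dataclass
class Task:
    name: str
    kind: str  # "vm", "batch", "container" or "service"
    cpus: float
    mem_gb: float


def schedule(tasks, nodes):
    """First-fit: place each task on the first node with enough free resources."""
    pending = []
    for task in tasks:
        node = next((n for n in nodes if n.fits(task)), None)
        if node is None:
            pending.append(task.name)  # no room anywhere; keep it queued
        else:
            node.run(task)
    return pending


# Hypothetical pool: two identical nodes, no hardware dedicated to one workload type.
nodes = [Node("n1", cpus=16, mem_gb=64), Node("n2", cpus=16, mem_gb=64)]
tasks = [
    Task("vm-1", "vm", 8, 32),
    Task("batch-1", "batch", 4, 8),
    Task("web", "service", 2, 4),
    Task("ctr-1", "container", 4, 16),
    Task("vm-2", "vm", 8, 32),
]
pending = schedule(tasks, nodes)
for node in nodes:
    print(node.name, node.tasks)
```

The scheduler never looks at `kind`: VMs, batch jobs, containers and services compete for the same CPUs and memory, which is what drives utilization up compared with one silo per workload type.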
Apache Mesos
SDN: Network as a Service (NaaS)
- Redesign of the operations support systems (OSS) of the SURFsara network.
- Choice of network management system (NMS) software. (OpenDaylight, OpenNaaS and Tail-f are likely candidates.)
- Replaces in-house developed custom scripts and tools.
- Goal: allow non-network engineers to configure the network
  - Server ops can set firewall rules;
  - Building facilities can update port descriptions of the office network;
  - IaaS users can set up lightpaths.
  - Lightpaths can be extended through the SURFsara network (API for external access: NSI standard, developed in OGF).
- Status: implementation in progress
[Diagram: NMS architecture. Labels shown: Network Operations Team and stepping-stones (ssh); Operations Support Systems (OSS) layer with resource management (Racktables), monitoring (Cacti), configuration management (Rancid, websvn), Netflow (NfSen) and alarms (Icinga); DNS, NTP, syslog, TFTP and TACACS authentication; NMS-internal frontend (IT Department); read-only network management frontends and dashboard; NMS-backend network controller (ssh, NETCONF, REST/JSON); NMS-External frontend (ssl-only, read-write) with the lightpath service (OpenNSI); network layer. Interfaces are labelled https/REST/JSON, SNMP and ICMP.]
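The delegation goal above amounts to a per-role permission check in front of the network controller. The sketch below is purely illustrative: the role names, action names and response format are hypothetical and do not come from OpenDaylight, OpenNaaS, Tail-f or the NSI standard, and the `network-engineer` role is an added assumption.

```python
# Hypothetical roles and actions, chosen to mirror the three examples on the slide.
PERMISSIONS = {
    "server-ops": {"set_firewall_rule"},
    "building-facilities": {"update_port_description"},
    "iaas-user": {"create_lightpath"},
    # Assumption: network engineers keep full access.
    "network-engineer": {
        "set_firewall_rule",
        "update_port_description",
        "create_lightpath",
    },
}


def authorize(role, action):
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())


def handle_request(role, action, params):
    """Return a JSON-like response dict, as a REST frontend might."""
    if not authorize(role, action):
        return {"status": 403, "error": f"{role} may not {action}"}
    # A real frontend would hand the change to the NMS backend here.
    return {"status": 200, "action": action, "params": params}


ok = handle_request("server-ops", "set_firewall_rule", {"port": 443, "allow": True})
denied = handle_request("building-facilities", "create_lightpath", {})
```

Keeping the permission table in the frontend means each group sees only the operations it is trusted with, while the backend remains the single place that talks to the devices.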
SURFdrive
We want our own
Why is the solution attractive to end users?
1. 100 GB capacity (free of charge)
2. Allows for collaboration with colleagues, other higher education and research institutions
3. Files can be accessed from any device
4. Allows for sharing with guest users in any location around the world
5. Secure and protected against the invasion of privacy, within the Dutch legal framework
SURFdrive
[Architecture diagram. Labels shown: external network; Loadbalance-01 and Loadbalance-02; Appserv-01 to Appserv-05; Database-01 to Database-03; storage network; DB replication network; management network; install server; monitoring.]