Big Data, Cloud & Virtualization
Tokyo, 2014
Vik Nagjee, Product Manager, Database Platforms

Big Data
What's Big about {Big} Data?
The 3 V's: Volume, Variety, Velocity
The {Big} Data Challenge
Image credit: Diya Soubra [http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data]
What's the Real {Big} Data Challenge?
The 4th Dimension of Big Data: VALUE (alongside Volume, Velocity, and Variety)
The {Big} Data Journey: A Data Platform for Just-In-Time Action
Volume, Velocity, Variety, VALUE
Big Data Case Study
ESA: The Gaia Mission
Source: http://upload.wikimedia.org/wikipedia/commons/a/a8/nasa-apollo8-dec24-earthrise.jpg
Gaia: Complete, Faint, Accurate (Hipparcos vs. Gaia)
- Magnitude limit: 12 mag vs. 20 mag
- Completeness: 7.3–9.0 mag vs. 20 mag
- Bright limit: 0 mag vs. 6 mag
- Number of objects: 120,000 vs. 47 million to G = 15 mag, 360 million to G = 18 mag, 1,192 million to G = 20 mag
- Effective distance limit: 1 kpc vs. 50 kpc
- Quasars: 1 (3C 273) vs. 500,000
- Galaxies: none vs. 1,000,000
- Accuracy: 1 milliarcsec vs. 7 µarcsec at G = 10 mag, 26 µarcsec at G = 15 mag, 333 µarcsec at G = 20 mag
- Photometry: 2-colour (B and V) vs. low-res. spectra to G = 20 mag
- Radial velocity: none vs. 15 km/s to G_RVS = 16 mag
- Observing: pre-selected vs. complete and unbiased
Source: http://www.cosmos.esa.int/web/gaia/presentations
One Billion Stars in 3D will provide:
in our Galaxy
- the distance and velocity distributions of all stellar populations
- the spatial and dynamic structure of the disk and halo
- its formation history
- a detailed mapping of the Galactic dark-matter distribution
- a rigorous framework for stellar-structure and evolution theories
- a large-scale survey of extra-solar planets (~7,000)
- a large-scale survey of Solar-system bodies (~250,000)
and beyond
- definitive distance standards out to the LMC/SMC
- rapid reaction alerts for supernovae and burst sources (~6,000)
- quasar detection, redshifts, microlensing structure (~500,000)
- fundamental quantities to unprecedented accuracy: to 2 × 10⁻⁶ (2 × 10⁻⁵ at present)
Source: http://www.cosmos.esa.int/web/gaia/presentations
Source: http://n.pr/1p7vyxv
Core Processing Powered by InterSystems Caché
Source: http://www.cosmos.esa.int/web/gaia/data-processing

Data volumes
- ~1,200,000,000 stars observed by Gaia
- In 5 years, Gaia will observe each star, on average, 80 times: 80 × 1,200,000,000 = 96,000,000,000 transits
- 96,000,000,000 transits / 5 years ≈ 52,600,000 transits / day
- On a nominal day: ~50,000,000 transits ≈ 285,000 MB of data
- On a heavy day: ~350,000,000 transits ≈ 1,995,000 MB of data

Data growth patterns
- Per day: ~285,000 MB ≈ 280 GB ≈ 0.28 TB
- First 4 months (commissioning period): all daily data is kept, so total growth = 0.28 TB/day × 120 days ≈ 34 TB
- In the 5th month, cleanup occurs; remaining data ≈ 3 TB
- From the 5th month onwards, steady-state size ≈ 3 TB
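The data-volume arithmetic on this slide can be re-derived in a few lines. This is a minimal sketch that takes the slide's inputs (1.2 billion stars, 80 transits per star, a 5-year mission, 285,000 MB on a 50-million-transit day, 0.28 TB/day over a 120-day commissioning period) and adds one assumption of our own: a 365.25-day year. Under that assumption the mean rate comes out at roughly 52.6 million transits per day. The function names are illustrative only.

```python
# Back-of-the-envelope check of the Gaia data-volume figures.
# Inputs are taken from the slide; DAYS_PER_YEAR = 365.25 is our assumption.

STARS = 1_200_000_000          # stars observed by Gaia
OBS_PER_STAR = 80              # average observations (transits) per star
MISSION_YEARS = 5
DAYS_PER_YEAR = 365.25         # assumption: Julian year

NOMINAL_TRANSITS_PER_DAY = 50_000_000
NOMINAL_MB_PER_DAY = 285_000   # megabytes on a nominal day
COMMISSIONING_DAYS = 120       # ~4 months, all data kept
TB_PER_DAY = 0.28              # rounded daily volume used on the slide


def total_transits() -> int:
    """Total transits over the mission."""
    return STARS * OBS_PER_STAR


def mean_transits_per_day() -> float:
    """Average transits per day over the whole mission."""
    return total_transits() / (MISSION_YEARS * DAYS_PER_YEAR)


def mb_per_transit() -> float:
    """Implied average payload per transit, from the nominal-day figures."""
    return NOMINAL_MB_PER_DAY / NOMINAL_TRANSITS_PER_DAY


def mb_for(transits: int) -> float:
    """Estimated daily volume (MB) for a given number of transits."""
    return transits * mb_per_transit()


def commissioning_growth_tb() -> float:
    """Data accumulated while every day's data is retained."""
    return TB_PER_DAY * COMMISSIONING_DAYS


if __name__ == "__main__":
    print(f"Total transits:       {total_transits():,}")
    print(f"Mean transits/day:    {mean_transits_per_day():,.0f}")
    print(f"Heavy day (350M):     {mb_for(350_000_000):,.0f} MB")
    print(f"Commissioning growth: {commissioning_growth_tb():.1f} TB")
```

Deriving the per-transit size (about 5.7 KB) from the nominal day is what makes the heavy-day figure of ~1,995,000 MB consistent with the nominal one.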
Core Processing Powered by InterSystems Caché
Delightfully Parsimonious Architecture

IDT/FL DB
- One 16-CPU, 1.2 TB RAM server
- Storage: 1x NetApp FAS3160, 160 SATA disks, iSCSI; 16x internal SSDs
- List price: ~$200,000 (storage + server)

Asynchronous Mirror
- One 16-CPU, 1.2 TB RAM server
- Storage: 1x NetApp FAS3250, 35 SATA disks, NFS interconnect; 16x SSDs internal to each server
- List price: ~$90,000 (storage + server)

Application Access
- Java application(s) across ~20 application servers, connecting to Caché via JDBC
- List price: ~$10,000 / server = ~$200,000 for the Java application tier

HA / DR Configuration
- No hot HA: the 95% uptime SLA is guaranteed by rebuilding the server, or by failing over to DR
- DR: Caché Database Mirroring

Mapping the Galaxy for less than $500,000 in hardware (database-specific ≈ $300,000)
$500,000 / 1 billion stars = $0.0005 per star
Answering the formation history of the galaxy = Priceless!

Data Platform for Just-In-Time Action
"One unexpected characteristic we have noticed during commissioning concerns stray light. In our test images, an excess of diffuse illumination is sometimes seen on some of the detectors, repeating in a cycle that relates to Gaia's spin period of 6 hours."
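The hardware roll-up behind the "$0.0005 per star" figure can be checked directly. This sketch uses only the approximate list prices quoted on the slide; the constant and function names are our own. Summing the tiers gives about $490,000, which is why the slide can say "less than $500,000" and round the database-specific part (primary plus mirror, $290,000) to about $300,000.

```python
# Hardware cost roll-up for the Gaia core-processing platform.
# Prices are the approximate list prices quoted on the slide (USD).

PRIMARY_DB_USD = 200_000      # IDT/FL DB: server + NetApp FAS3160 storage
ASYNC_MIRROR_USD = 90_000     # asynchronous mirror: server + NetApp FAS3250
APP_SERVER_COUNT = 20         # ~20 Java application servers
APP_SERVER_USD = 10_000       # list price per application server
STARS = 1_000_000_000         # ~1 billion stars


def database_cost() -> int:
    """Database-specific hardware: primary plus async mirror."""
    return PRIMARY_DB_USD + ASYNC_MIRROR_USD


def application_cost() -> int:
    """Java application tier that connects to Caché via JDBC."""
    return APP_SERVER_COUNT * APP_SERVER_USD


def total_cost() -> int:
    """All hardware: database tier plus application tier."""
    return database_cost() + application_cost()


def cost_per_star(total_usd: float) -> float:
    """Spread a hardware budget across the star catalogue."""
    return total_usd / STARS


if __name__ == "__main__":
    print(f"Database tier:    ${database_cost():,}")      # $290,000
    print(f"Application tier: ${application_cost():,}")   # $200,000
    print(f"Total:            ${total_cost():,}")         # $490,000
    print(f"Per star (at $500,000): ${cost_per_star(500_000):.4f}")
```

Passing the rounded $500,000 budget reproduces the slide's $0.0005 per star; passing `total_cost()` instead gives a slightly lower figure.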
Source: http://www.esa.int/spaceinimages/images/2013/12/farewell_to_gaia
Gaia: Unraveling the chemical and dynamical history of our Galaxy

Cloud & Virtualization
Virtualization & Cloud: Intertwined!
The NIST Definition of Cloud Computing
"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction."
Source: National Institute of Standards and Technology, Special Publication 800-145
My definition of Cloud Computing
Harnessing advances in information technology to accelerate value delivery to customers
Types of Cloud Computing
- Public Cloud: an infrastructure-as-a-service (IaaS) provider such as Amazon EC2, Rackspace, or Azure
- Private Cloud: either provided by an IaaS provider, or hosted internally at the customer site (using something like OpenStack or CloudStack)
- Virtualization-based Cloud: something like a VMware vCloud environment, or even a fully virtualized environment
- Customer SaaS offering: a partner has built a solution based on our products and delivers that solution on a SaaS basis
Cloud. The Enabler.
Deploy Breakthrough Applications in the Cloud
How?
- Pay-as-you-go
- Virtually *infinite* computing resources
- Elastic: provision on demand
- Stay lean and agile
CAUTION! Better to go in with your eyes wide open
Amazon EC2 SLA
- Amazon EC2 commitment: 99.95% monthly uptime, i.e. up to ~22 minutes of downtime per month before the SLA threshold is breached
- Finer print: at 99.95% down to 99% monthly uptime (~22 minutes to 7.2 HOURS of downtime per month), the compensation is a service credit
Other considerations?
- Regulatory compliance
- Cost
- Where's the data?
- How secure is my data?
- Etc.
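The downtime figures follow from simple percentage arithmetic over a month. This minimal sketch assumes a 30-day month (43,200 minutes); a 31-day month allows slightly more. The function name is our own. It also includes the 95% uptime SLA mentioned for the Gaia configuration, for comparison.

```python
# Convert a monthly uptime percentage into the downtime it permits.
# Assumes a 30-day month (43,200 minutes); a 31-day month allows slightly more.

MINUTES_PER_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(uptime_pct: float,
                             minutes_per_month: int = MINUTES_PER_MONTH) -> float:
    """Minutes of downtime a month can contain while still meeting uptime_pct."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    return minutes_per_month * (100.0 - uptime_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.95, 99.0, 95.0):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct:6.2f}% uptime -> {minutes:7.1f} min "
              f"({minutes / 60:.1f} h) downtime/month")
```

At 99.95% this gives 21.6 minutes (the "~22 minutes"), at 99% it gives 432 minutes (7.2 hours), and at 95% it gives 36 hours per month.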
Cloud Case Studies: Providing a Cloud-Enabled Data Platform

3M Health Information Systems
Breakthrough: Enterprise Service Bus (ESB) in the cloud

Ensemble ESB in the Cloud: Goals
- To simplify inter-application communication
- To reduce maintenance costs
- To increase scalability
- To improve governance of application access
- To increase the flexibility with which new software applications can be added to the overall system
- To automate system operations

- Scalable: auto-scale based on demand
- Elastic: grow and shrink based on demand
- Stateless: no-persistence model
- Automated: single-click, automated deployment
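The slide lists Scalable, Elastic, and Stateless as properties of the cloud ESB but does not say how scaling decisions are made. As a purely hypothetical illustration (not Ensemble's API and not any cloud provider's API), the sketch below shows a simple demand-based rule: because stateless instances are interchangeable, capacity can be sized by instance count alone. The names `desired_instances` and `scaling_action`, and the default bounds of 1 and 20 instances, are invented for this example.

```python
# Hypothetical demand-based scaling rule for a pool of stateless workers.
# Not an InterSystems or cloud-provider API; names and bounds are illustrative.
import math


def desired_instances(load: float, capacity_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Enough instances to carry the load, clamped to [min_instances, max_instances]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(current: int, desired: int) -> str:
    """Describe the grow/shrink step needed to reach the desired pool size."""
    if desired > current:
        return f"grow by {desired - current}"
    if desired < current:
        return f"shrink by {current - desired}"
    return "hold"


if __name__ == "__main__":
    for load in (0, 950, 5000):           # e.g. messages per second
        target = desired_instances(load, capacity_per_instance=100)
        print(load, target, scaling_action(current=4, desired=target))
```

Statelessness is what makes the rule this simple: no session data has to be migrated when an instance is added or removed, so "grow" and "shrink" are just changes to the pool size.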
Ontario Systems
Breakthrough: Software-as-a-Service (SaaS) Receivables Management Software for Third-Party Collection Agencies
- New regulatory burden for collection agencies: monitor customer complaints, or else!
- Built a cloud-based Complaint Tracker application
- Built and deployed the breakthrough application as a SaaS offering in less than six weeks using Caché, Ensemble, and DeepSee
- Eased the burden on existing customers; gained several new customers
"Building on InterSystems technologies, we went from initial concept to delivering a functional product in just 35 days." (Chris Cochran, Product Director, Ontario Systems)

Eventsforce
Breakthrough: Software-as-a-Service (SaaS) end-to-end event planning and management solution
- Modular, flexible SaaS offering
- Extremely scalable model: events from tens to thousands of users!
- Extremely elastic model: scale up or down during an event; add or remove functional modules on a live system
- Breakthrough web-based SaaS offering, including a mobile app
Press Computer Systems (PCS) Social Knowledge
Breakthrough: Real-time Perception Management SaaS
- Listen
- Understand
- Engage

Wrap-Up
Questions?
You can reach me @ Vik@InterSystems.com
Thank you!