Big Data; Old News or New Hype? Marcel den Hartog, June 2012
One of the first Big Data projects, in 1964. The Ranger series of spacecraft was designed solely to take high-quality pictures of the Moon and transmit them back to Earth in real time. The images were to be used for scientific study, as well as for selecting landing sites for the Apollo Moon missions. Ranger 7 was the first of the Ranger series to be entirely successful. It transmitted 4,308 high-quality images over the last 17 minutes of flight, the final image having a resolution of 0.5 meter/pixel. One shot, one chance to get it right.
How Big is Big REALLY?!? "Better store a date without the 2 century digits. It saves thousands of bytes in VSAM space." And then came 2000, and Mod 54 storage devices. "Convert your music to 98 kbps MP3 files, the USB stick can hold almost 4 CDs!" Then the iPod came with 50 GB of memory. "Let's only keep 2 months of DB2 performance history, it takes soooo much disk space!" "How do I know what to keep from the 500 GB of log files my ERP application generates EVERY SINGLE DAY? I constantly reach my maximum of 32 TB of log data."
It depends. Yesterday's BIG is today's normal.
Big Data is just data. The history of data. [Diagram: the data landscape growing layer by layer, from the mainframe (3270 terminals; IMS, VSAM, DB2), via distributed systems (GUI; Oracle, SQL Server, MySQL), outsourcing, partners and web front ends, to cloud, external services, iPad and other devices.]
What does history teach us? The amount of data grows. Data gets duplicated. Reliability? More duplication, transformation, transportation. Reliability? Standard applications like ERP, CRM. Reliability? Cloud services. Reliability? RFIDs. Reliability?
Business Intelligence. Google search: 54,000,000 results. Only entries from the last MONTH!
Just some predictions. A retailer using big data to the full could increase its operating margin by more than 60%. If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US healthcare expenditure by about 8%. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions. Source: McKinsey, "Big data: The next frontier for innovation, competition, and productivity", May 2011.
Predictions or promises? We have been wrong before... The fact that there is a theoretical potential for something good does not automatically mean it is practically possible. How reliable is your data TODAY? How reliable is reliable, really?
The world we live in...
Some issues we already have today. Who owns that data? Is this the source, or already second-hand data? What is it for? Who is it for? What decisions are going to be based on this data? There is too much of it. It doesn't always match... It actually never matches.
There is too much of it already. And it will only get more... Data is richer (high-res pictures, more details, more links). Data is linked (social network adverts). Visitors of websites.
So, what are the challenges really? Size? By 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data (twice the size of US retailer Wal-Mart's data warehouse in 1999) per company with more than 1,000 employees. Complexity of the data? Everybody in this room knows how hard it is to combine data from different sources you own, let alone combining it with what you DO NOT own. Complexity of management? Big Data means BIG management challenges. Remember when storage was cheap?
Scary stuff. "[The] new analytics taps the expertise of the broad business ecosystem to address the lack of responsiveness from central analytics units," xxx noted in its report. "The challenge for centralized analytics was to respond to business needs when the business units themselves weren't sure what findings they wanted or clues they were seeking." The new analytics wave "does that by giving access and tools to those who act on the findings." If you don't know what you are looking for, any answer looks OK.
An example. A company sells expensive cars. The price range is 80k to 400k. In 2009, they sold 120 cars for a total of 10,000,000. In 2010, they sold 150 for a total of 20,000,000. In 2011, they sold 30 for a total of 11,000,000. The average selling price is 41,000,000 / 300 = 136,666. To make the numbers for 2012 (12,000,000) we need to sell 88 cars, right? Wrong. If we sell only low-priced cars, it's 150; if we sell only high-priced ones, only 30. The marketing cost to sell 1 car is 1,000 euros. Do we allocate 150,000 euros, or only 30,000, for marketing?
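The arithmetic behind the car example can be checked with a small sketch. It is an illustration only: the function names `cars_needed` and `plan` are made up here, and the 2012 target is assumed to be 12,000,000, the figure that matches the 150 and 30 cars and the 150,000 and 30,000 euro marketing budgets quoted on the slide.

```python
import math

# Sales history from the slide: year -> (cars sold, total revenue).
SALES = {2009: (120, 10_000_000), 2010: (150, 20_000_000), 2011: (30, 11_000_000)}
PRICE_LOW, PRICE_HIGH = 80_000, 400_000   # price range from the slide
MARKETING_COST_PER_CAR = 1_000            # euros, from the slide


def cars_needed(target, price):
    """Smallest whole number of cars that reaches the revenue target."""
    return math.ceil(target / price)


def plan(target):
    """Cars needed under three pricing assumptions (hypothetical helper)."""
    total_cars = sum(cars for cars, _ in SALES.values())
    total_revenue = sum(revenue for _, revenue in SALES.values())
    average_price = total_revenue / total_cars  # 41,000,000 / 300
    return {
        "average": cars_needed(target, average_price),
        "all_low": cars_needed(target, PRICE_LOW),
        "all_high": cars_needed(target, PRICE_HIGH),
    }


if __name__ == "__main__":
    target = 12_000_000  # assumed 2012 target, see the note above
    for scenario, cars in plan(target).items():
        budget = cars * MARKETING_COST_PER_CAR
        print(f"{scenario}: {cars} cars, marketing budget {budget} euros")
```

The same average selling price gives three very different plans. The "average" scenario hides the fact that the sales mix shifted heavily between 2009 and 2011, which is exactly the trap the slide describes.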
Despite common belief, not every businessperson knows how to interpret data
The role of the mainframe. We are/should be part of any Big Data initiative.
Before I continue: HOW EXPENSIVE IS THE MAINFRAME? Get it? People can manipulate. And they will.
Mainframe as supplier of data. Normal DW/BI stuff we have done over the years. More data, better planning needed. Look for products with zIIP support to reduce the load for transfer and compression. Look for technology in your database that helps you do this in a smart and efficient way (replication, smart stored procedures). Prepare for questions like: "Just give me everything, my tool will help me figure it out." Appoint 1 designated person to deal with the data analyst(s).
Mainframe as host. Why not store the BI cube extracted from all Big Data ON the mainframe? Advantages: zEnterprise architecture; channel path speed; system and database (DB2) can be tuned better than anything else; management (DR, storage management, version management, granular security); mainframe management staff.
Mainframe as a host II. Host the BI cube extracted from all Big Data ON zLinux on the mainframe. Advantages: zEnterprise architecture; reduced license cost; channel path speed when moving data from DB2; management (DR, storage management, version management, granular security); mainframe management staff.
No matter what Big Data IS: it has to be managed.
Storage functionality. Manage storage resources: backup and archive, catalog management, RAID and storage volumes, Unix for z-series. Performance statistics: online DASD volumes, catalog, catalog cache, catalog I/O, application data set.
Storage workspace
Storage: visualize relationships
Storage performance analysis
Conclusions. Big Data is just Bigger Data. Same challenges as before, but exponentially more difficult. You can make numbers say ANYTHING. You can even choose how to pick the data that you want people to start analyzing. On the mainframe, knowledge is key. Collecting data from MORE data requires insight, discipline, awareness and downright common sense. We fulfill all requirements. A MAINFRAME database running ON a mainframe is the ideal environment to host, manage and protect mission-critical data?