How We Handle Challenges in the Field of Big Data. Jan Östling, Enterprise Technologies, Intel Corporation, NER
Legal Disclaimers: All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel Virtualization Technology requires a computer system with an enabled Intel processor, BIOS, virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization No computer system can provide absolute security under all conditions. Intel Trusted Execution Technology (Intel TXT) requires a computer system with Intel Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit http://www.intel.com/technology/security Intel, Intel Xeon, Intel Atom, Intel Xeon Phi, Intel Itanium, the Intel Itanium logo, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. Copyright 2013, Intel Corporation. All rights reserved.
Agenda: Big Data trends and opportunities; Evolution of Data Management & Analytics; Intel provides foundation for Big Data; Intel Compute Platforms Optimized for Big Data; Intel Storage & Network Technology; Intel Software Optimization; Summary
Big Data Trends
Billions of connected users sharing: Skype 663m; Facebook 629m; Hotmail 364m; Yahoo 273m; 5.3 bn cell phones.
>1500 exabytes of cloud traffic; 1400 exabytes of new integrated systems data; 690% growth in storage capacity by 2015.
[Chart: data volume over time. Labels: Corporate Data, Structured Data, Unstructured Data, Big Web Data, Big Corp Data, Big Sensed Data.]
What insights can we derive? [Chart: business value against complexity. Stages: reporting, monitoring, analysis, prediction.]
IT survey: Are you looking at Big Data? How are you approaching the opportunity? Yes 75%; No, but on radar 20%; No 5%. Source: Intel
What Are Enterprises Doing with Big Data?
From experts:
Only business model Tech has left. (Forbes, 2011)
Data are becoming the new raw material of business: an economic input almost on a par with capital and labor. (The Economist, 2010)
Information will be the oil of the 21st century. (Gartner, 2010)
Retail: increase margins 60%. Manufacturing: 50% decrease in production costs. Cellular: $150B to providers. Public sector: $250B growth. (McKinsey, 2010)
From customers:
Retail: real-time social trend analysis to identify the hottest products to offer.
Financial services: real-time fraud detection, prevention & recovery.
Provider billing: real-time access to subscriber billing records to offer new services and prevent customer churn.
Smart city: predictive traffic forecasting.
Telco: new customer segmentation for real-time campaigns.
Utility: load-balance energy grids through real-time monitoring of customer energy usage.
Evolution to Big Data Processing (columns: Date, Paradigm, Processing Style, Form Factor)
90s. Paradigm: data reporting / mining; high cost, departmental use. Processing style: batch (e.g. sales reports); sequential SQL queries (e.g. retrieve sales reports). Form factor: RDBMS, scale up.
2000s. Paradigm: model-based discovery; high cost, departmental use. Processing style: batch (e.g. correlated buying patterns); NoSQL, parallel analysis; shared disk/memory. Form factor: proprietary MPP / DW appliance.
Today. Paradigm: low cost, enterprise use; arrival of vast amounts of unstructured data. Processing style: near real-time (e.g. recommendation engine); process at the storage node; built-in data replication/reliability; shared nothing, in memory. Form factor: open source software loosely coupled on standards-based hardware; unlimited linear scale through distributed node addition; in-memory analytics (e.g. Exalytics).
Future. Real-world modeling; real-time predictive analytics; HPC simulation; machine learning.
What is Different about Big Data?
Traditional data analysis [diagram labels: Transaction, Relational Database, Batch, Data Warehouse, Analyze].
Big Data analysis [diagram labels: Structured, Unstructured, Streaming; Devices; Cluster of nodes; Organize, Analyze; SQL, MapReduce, R, Hive].
Volume: gigabytes to terabytes (traditional); petabytes and beyond (Big Data).
Velocity: batch (traditional); CEP, real-time data analytics (Big Data).
Variety: centralized, data moves to analytics (traditional); distributed, analytics moves to data (Big Data).
Value: reactive, query, reporting, proprietary (traditional); predictive analytics, machine learning, graph algorithms, statistical modeling (Big Data).
Big Data augments traditional Business Intelligence.
Right Data Methods for the Right Data Structure
Unstructured, multi-format data: emerging technologies and analytical paradigms.
Structured data: relational database (e.g. Exalytics).
*Other brands and names are the property of their respective owners.
Technology driving Big Data innovation
Intel's Role in the Big Data Era
Distribute analytics to the edge sensors/devices and drive a standards-based, connected, managed and secure architecture.
Accelerate big data analytics through faster and more effective CPU, storage, I/O and network architectures.
Drive innovation in big data applications by providing optimized software stacks and services.
Foster the growth of big data through partner collaboration, focused on usage model examples and reference deployment architectures.
Invest in solution research and academia collaboration.
Choice of Compute Platforms Optimized for Big Data
In-Memory Analytics are Game Changing
A 20-node VoltDB system can do what a 1000-node Hadoop cluster can do. (Michael Stonebraker, Architecting for In Memory Model)
In-memory products shown: SAP HANA*, VoltDB, Objectivity GraphDB, TimesTen In-Memory Database, Business Intelligence Enterprise Edition, SolidDB.
[Chart: Low Cost Memory Technology. $/TB on a $0 to $50,000 axis for Q4 2010 (DRAM), Q4 2016 (DRAM) and 2016 (CR); 20x reduction.]
[Chart: SAP HANA* Scalability. Running time (s) against socket count (1, 2, 4, 8; 8S glueless), customer workload compared with ideal. Near-perfect scaling on Intel Xeon processor E7 family.]
Near real-time insight enabled by in-memory solutions.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: http://www.intel.com/content/www/us/en/highperformance-computing/high-performance-computing-xeon-e7-analyze-business-as-it-happens-with-sap-hana-software-brief.html
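One way to make "near-perfect scaling" concrete is to compute speedup and parallel efficiency from running times at each socket count. The sketch below is illustrative only: the function names and the timings are hypothetical and are not taken from the SAP HANA measurements referenced on this slide.

```python
# Illustrative sketch: speedup and parallel efficiency across socket counts.
# All timings below are hypothetical placeholders, not measured results.

def speedup(t1: float, tn: float) -> float:
    """Speedup of an n-socket run relative to the 1-socket run."""
    return t1 / tn

def efficiency(t1: float, tn: float, sockets: int) -> float:
    """Parallel efficiency: 1.0 means ideal (linear) scaling."""
    return t1 / (sockets * tn)

# Hypothetical running times in seconds, keyed by socket count.
hypothetical_times = {1: 800.0, 2: 410.0, 4: 210.0, 8: 108.0}

if __name__ == "__main__":
    t1 = hypothetical_times[1]
    for sockets, tn in sorted(hypothetical_times.items()):
        print(sockets, round(speedup(t1, tn), 2), round(efficiency(t1, tn, sockets), 2))
```

An efficiency close to 1.0 at every socket count is what a chart would show as the workload curve tracking the ideal line.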
Big Data Transforming Storage
Storage Models Evolving for Big Data
Traditional storage management [diagram: VMs on separate compute, storage and network]: designed for structured data; longer time to deployment; restricted to a single site; forklift add of new discrete storage for capacity.
Distributed storage architecture [diagram: storage client, metadata servers providing metadata services, storage servers providing storage services]: designed for unstructured data growth; faster time to deployment; multiple, distributed locations managed as a single device; scale capacity & performance by adding nodes.
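The split between metadata services and storage services can be sketched as a toy in-process model. This is a minimal illustration under stated assumptions, not a description of any specific Intel or third-party product: the class names `MetadataService` and `Cluster`, the hash-based placement, and the server names are all hypothetical.

```python
import hashlib
from typing import Dict, List

class MetadataService:
    """Hypothetical metadata server: remembers which storage server holds each object."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.locations: Dict[str, str] = {}

    def add_server(self, name: str) -> None:
        # New nodes become candidates for future placements only.
        self.servers.append(name)

    def place(self, key: str) -> str:
        # Assumption: new objects are spread by hashing the key; once placed,
        # the recorded location is authoritative.
        if key not in self.locations:
            digest = hashlib.sha256(key.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self.servers)
            self.locations[key] = self.servers[index]
        return self.locations[key]

class Cluster:
    """Hypothetical storage client view: one logical device backed by many nodes."""

    def __init__(self, server_names: List[str]) -> None:
        self.metadata = MetadataService(server_names)
        self.storage: Dict[str, Dict[str, bytes]] = {n: {} for n in server_names}

    def add_node(self, name: str) -> None:
        # Scale capacity by adding a node; existing objects stay where they are.
        self.storage[name] = {}
        self.metadata.add_server(name)

    def put(self, key: str, data: bytes) -> None:
        self.storage[self.metadata.place(key)][key] = data

    def get(self, key: str) -> bytes:
        return self.storage[self.metadata.place(key)][key]
```

In this sketch the client never addresses a storage server directly; it asks the metadata service where an object lives, which is what lets many nodes be managed as a single device.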
Big Data Visibly Mobile
Performance, Responsiveness, Insight & Productivity
Workstation Performance: for right deep model generation for analytics processes.
Collaboration: secure media, data, & assets.
Visibly Mobile Data Productivity: flexible end point solutions with client application support that allow fast and efficient data modeling, scoring and direct data access from any location.
Intel Virtualization Technology requires a computer system with an enabled Intel processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain computer system software enabled for it.
Building on the Ecosystem
Database and compute infrastructure: relational, nonrelational, and analytics engines (e.g. VoltDB, Exalytics).
No matter the choice, all are optimized, some exclusively, on Xeon.
Intel's Contribution to Open Source
UPSTREAM (code, capital): enable open source operating environments to run best on Intel architecture.
DOWNSTREAM (alliances, foundations; OEM, service provider, enterprise): foster open source ecosystems and develop new markets for Intel and its partners.
Intel HiTune: The Hadoop performance analyzer
Users develop their applications based on the MapReduce model. The Hadoop framework dynamically maps them to the underlying cluster.
HiTune automatically instruments Hadoop tasks (at binary level) to collect runtime information: low overhead (<2%); no source code changes; various runtime information (JVM information, system statistics, Hadoop log information).
See the Intel paper "HiTune: Dataflow-Based Performance Analysis for Big Data Cloud" in the 2011 USENIX Annual Technical Conference.
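For readers unfamiliar with the MapReduce model mentioned above, a minimal single-process sketch of its three phases follows. It is plain Python and does not use Hadoop or HiTune APIs; the function names are chosen for illustration only. In a real cluster the framework, not the application, decides where the map and reduce tasks run.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big insight"])` counts "big" twice and the other two words once each.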
Driving Big Data Usages & Requirements
Vertical Deployments & Lab Innovations (Telco, Retail, Science, Mfg, Finance, Healthcare; Science and Technology Centers for Big Data): drive field usage models and cutting edge enhancements.
Open Standards (Intel Cloud Builders Ref Architectures & Adoption; Big Data Security Working Group; Hadoop Enhancements): define and prioritize IT requirements & accelerate industry standards.
Ecosystem Contributions & Distro Innovation (Benchmarking; ISV/OEM Designs): craft enterprise ready software contribution for OEM/ISV to build solutions.
Work with industry partners to identify and deliver usage examples and reference architectures for a variety of Big Data solutions.
Summary
1. Big Data is here and growing rapidly.
2. Intel is well positioned from a software stack and platform basis.
3. Intel is committed to investing in new technology to address more demanding big data requirements of the future.
Want more information?
hadoop.intel.com: Learn how to deploy Hadoop. Downloads, tutorials, deployment guides.
www.intel.com/bigdata: Information for IT managers. Case studies, analyst reviews & complementary research.
Thank You!