In March 2012, the Obama administration announced the Big Data Research and Development Initiative. Leading IT companies such as Software AG, Oracle, IBM, Microsoft, SAP and HP have spent more than $15 billion on buying data management and analytics software firms. This industry on its own is worth more than $100 billion.
1. How big is big data?
2. Where does the data come from?
3. Where is it stored?
4. How is it analyzed?
5. How is it visualized? (later)
6. Who needs it?
- Google was processing 20 PB a day (2008)
- Wayback Machine had 3 PB + 100 TB/month (3/2009)
- Facebook had 2.5 PB of user data + 15 TB/day (4/2009)
- eBay had 6.5 PB of user data + 50 TB/day (5/2009)
"640K ought to be enough for anybody."
- Large Hadron Collider: 40 TB/s
- Airbus A380: generates 640 TB per flight
- Twitter: generates 12 TB of data per day
- New York Stock Exchange: 1 TB of data every day
- Walmart alone had 30 billion RFID sensors in 2012
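The rates above mix units (per second, per day, per month, per flight). A quick conversion puts a few of them on a common scale. This sketch assumes decimal (SI) units, where 1 PB = 1000 TB, and treats each rate as sustained around the clock, which is a simplification.

```python
# Decimal (SI) storage units: 1 PB = 1000 TB.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s
DAYS_PER_YEAR = 365


def tb_per_second_to_pb_per_day(tb_per_s: float) -> float:
    """Convert a sustained rate in TB/s into PB per day."""
    return tb_per_s * SECONDS_PER_DAY / 1000


def tb_per_day_to_pb_per_year(tb_per_day: float) -> float:
    """Convert a daily volume in TB into PB per year."""
    return tb_per_day * DAYS_PER_YEAR / 1000


lhc = tb_per_second_to_pb_per_day(40)     # LHC at 40 TB/s
twitter = tb_per_day_to_pb_per_year(12)   # Twitter at 12 TB/day
facebook = tb_per_day_to_pb_per_year(15)  # Facebook at 15 TB/day (2009)

print(f"LHC:      {lhc:,.0f} PB/day")
print(f"Twitter:  {twitter:.2f} PB/year")
print(f"Facebook: {facebook:.3f} PB/year")
```

At a sustained 40 TB/s, the LHC figure works out to 3,456 PB (about 3.5 EB) per day, while Twitter's 12 TB/day adds up to about 4.38 PB per year.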
The model of generating/consuming data has changed
- Old model: a few companies generate data; everyone else consumes it.
- New model: all of us generate data, and all of us consume data.
Lots of data is being collected: web data and e-commerce, department/grocery stores, bank/credit card transactions, social networks, health, genetics. Big Data everywhere!
Source: Avanade Global Survey: The Business Impact of Big Data, November 2010
- Science: databases from astronomy, genomics, environmental data, transportation data
- Humanities and social sciences: scanned books, historical documents, social interaction data, new technology like GPS
- Business & commerce: corporate sales, stock market transactions, census
- Entertainment: Hollywood movies, MP3 files
- Medicine: MRI & CT scans, patient records
HP envisions 1 trillion sensors in use around the world. There are many types of sensors: temperature, pressure, level, humidity, speed, motion, distance, light, or presence/absence.
Rodolfo Milito
IoT: the expansion of connectivity, using IP networking, of things into public and private IP networks, linking computing and storage resources, and also people.
- The industrialization of IP networks reaches domains previously characterized by application-specific, often non-IP networks.
- Smart objects include:
  - Smart tags (RFID)
  - Sensors: measure power quality/voltage, pressure, mechanical constraints, video, pollution, gas/water leaks, motion
  - Actuators: act on devices (e.g., turn an engine or a light on/off, close a valve, or even trigger a complex set of actions)
- Organized into: vehicles, intelligent traffic controls and lighting elements, industrial automation, healthcare, etc.
Today's dominant endpoints: a person behind every device.
Dominant endpoints in 2025: devices clustered in systems (industrial automation, transportation and connected vehicles, precision agriculture, healthcare, intelligent buildings).
Fog computing: a non-trivial extension of cloud computing from the core to the edge that enables a whole new wave of services and applications.
- Virtualization, multi-tenancy, and some distinctive features
- Suites of use cases:
  - (Mobile) content delivery; low-latency apps (gaming, streaming, augmented reality...)
  - Geo-distributed apps: sensor/actuator networks, smart cities
  - Large-scale distributed control systems: connected vehicle, intelligent transportation, smart grid
Fog = cloud close to the ground. Fog is the platform where the Internet meets the physical world.
Fog/cloud interplay
[Figure: grid data latency hierarchy (taken from Jeff Taft)]
Multiple uses of the same datum (different latency requirements/destinations)
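The idea that one datum has several uses with different latency requirements can be made concrete with a toy placement rule: send each use to the most centralized tier that still meets its latency budget, since central resources are larger and shared. The tier names and round-trip latencies below are hypothetical figures for illustration, not measurements from the slides.

```python
from dataclasses import dataclass

# Hypothetical tiers ordered from closest to the sensor to farthest away,
# with illustrative round-trip latencies in milliseconds.
TIERS = [
    ("edge device", 1),
    ("fog node", 10),
    ("cloud data center", 200),
]


@dataclass
class Consumer:
    name: str
    max_latency_ms: float  # latency budget for this particular use of the datum


def place(consumer: Consumer) -> str:
    """Return the most centralized tier whose latency fits the consumer's budget."""
    # Scan from the farthest (largest, shared) tier toward the edge.
    for tier, latency_ms in reversed(TIERS):
        if latency_ms <= consumer.max_latency_ms:
            return tier
    raise ValueError(f"no tier can serve {consumer.name!r} within {consumer.max_latency_ms} ms")


# One traffic-sensor reading, three different uses.
uses = [
    Consumer("traffic light control loop", 5),
    Consumer("neighborhood congestion map", 50),
    Consumer("city-wide trend analytics", 60_000),
]
for use in uses:
    print(f"{use.name}: {place(use)}")
```

The control loop lands on the edge device, the congestion map on the fog node, and the long-term analytics in the cloud, which is the fog/cloud split described above.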
Where is it stored? What makes big data different? Why isn't saving/moving/copying big data as simple as using the tools we already have?
Big data storage
- Difficult/slow transfers
- Expense of storage/backup
- Difficult to share and publish
Big data analytics: the process of examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other valuable information.
- Predictive power of big data analytics in healthcare
- Analysis of farm soil
- Improving oil and gas operations
- Retailers are using big data analytics to outperform others
Why process data at the edge?
- Need for immediate response time: we can't afford the latency of sending data up and back down the chain.
- Closed-loop control: when controlling physical systems, we cannot depend on the speed and availability of resources back at the data center (e.g., a smart traffic light system).
- Privacy and data-ownership considerations: regulatory and business concerns may not allow moving the data.
- Improved scale and aggregate throughput via parallelism: data sources are often naturally distributed.
- Avoid sending unnecessary data: offload centralized resources that would otherwise have to filter through volumes of uninteresting/useless data.
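One simple way to avoid sending unnecessary data upstream is report-by-exception (deadband) filtering at the edge: a reading is forwarded only when it differs meaningfully from the last value sent. A minimal sketch, assuming readings arrive as (timestamp, value) pairs; the function name and the 0.5 deadband are hypothetical choices for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Reading = Tuple[int, float]  # (timestamp, value)


def report_by_exception(readings: Iterable[Reading], deadband: float) -> Iterator[Reading]:
    """Forward a reading only if it moved more than `deadband` since the last forwarded one.

    The first reading is always forwarded so the upstream system has a baseline.
    """
    last_sent: Optional[float] = None
    for ts, value in readings:
        if last_sent is None or abs(value - last_sent) > deadband:
            last_sent = value
            yield ts, value


# Hypothetical temperature readings from one sensor, in degrees Celsius.
readings = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.0), (4, 21.1), (5, 19.5)]
forwarded = list(report_by_exception(readings, deadband=0.5))
print(f"forwarded {len(forwarded)} of {len(readings)} readings: {forwarded}")
```

Only three of the six readings cross the network. The trade-off is that changes smaller than the deadband are never reported, so the threshold has to be chosen per sensor.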
[Figure: emerging footprint for distributed intelligence & processing. Network services shown: NGN, mobility, video, cloud, security.]
- Data centers (central or distributed):
  - Data at rest: aggregated collection and storage
  - ETL and analytics for structured & unstructured data: what-if analytics, predictive analytics, streaming/CEP (complex event processing) analytics
  - Applications: visualization & reporting
- Core and multi-service edge:
  - Networked data collection; processing at the edge: streaming ETL (e.g., filtering, transformation, aggregation) and streaming/CEP analytics
  - Real-time sensing, alerts and actions
  - Applications: analysis, execution, localized visualization & reporting, action
- Edge (embedded systems and sensors):
  - Networked data collection; processing at the edge: streaming ETL (e.g., cleansing, filtering, transformation, aggregation)
  - Skinny streaming/CEP analytics, alerts and actions
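The edge-side streaming ETL stages named in the diagram (cleansing, filtering, transformation, aggregation) can be chained as Python generators, so each reading flows through the pipeline without the stream ever being stored in full. This sketch assumes time-ordered (timestamp, Celsius) readings; the stage names, the plausible-range bounds and the 60-second window are illustrative choices, not part of the original architecture.

```python
from typing import Iterable, Iterator, Optional, Tuple

RawReading = Tuple[int, Optional[float]]  # (timestamp in seconds, Celsius or None)
Reading = Tuple[int, float]


def cleanse(stream: Iterable[RawReading]) -> Iterator[Reading]:
    """Cleansing: drop readings with a missing value."""
    for ts, value in stream:
        if value is not None:
            yield ts, value


def filter_range(stream: Iterable[Reading], lo: float, hi: float) -> Iterator[Reading]:
    """Filtering: drop physically implausible values outside [lo, hi]."""
    for ts, value in stream:
        if lo <= value <= hi:
            yield ts, value


def to_fahrenheit(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Transformation: convert Celsius to Fahrenheit."""
    for ts, value in stream:
        yield ts, value * 9 / 5 + 32


def tumbling_average(stream: Iterable[Reading], window_s: int) -> Iterator[Reading]:
    """Aggregation: mean per non-overlapping window; assumes time-ordered input."""
    current: Optional[int] = None
    total, count = 0.0, 0
    for ts, value in stream:
        start = ts - ts % window_s
        if current is not None and start != current:
            yield current, total / count  # window closed: emit its mean
            total, count = 0.0, 0
        current = start
        total += value
        count += 1
    if count:
        yield current, total / count  # flush the last open window


raw = [(0, 20.0), (10, None), (20, 22.0), (30, 500.0), (65, 25.0), (70, 15.0)]
summary = list(tumbling_average(to_fahrenheit(filter_range(cleanse(raw), -40, 60)), 60))
print(summary)
```

Six raw readings become two window averages, which is the kind of reduction that lets the edge send summaries instead of raw data toward the data center.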
It is not just lots of data (structured?). It is not just exponential growth of data. It is new ways of making sense of data that require changes to existing architectures. Big Data, the term, in its current use, implies many other things, like:
- The Apache Hadoop framework
- Commodity hardware leveraging Moore's law
- Infinite scalability
- No data temples
No single standard definition. Big Data is unstructured data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.
Volume: data volume is increasing exponentially.
Variety: various formats, types, and structures: text, numerical data, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc. Static data vs. streaming data.
Velocity: data is being generated fast and needs to be processed fast. Online data analytics.
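Processing data fast, as it arrives, usually means single-pass (online) algorithms that never store the whole stream. A standard example is Welford's algorithm for a running mean and variance; this is a minimal sketch, with a hypothetical class name, of the kind of computation online analytics performs, not a description of any specific product.

```python
import math


class RunningStats:
    """Single-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # uses both the old and the new mean

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stdev(self) -> float:
        return math.sqrt(self.variance)


stats = RunningStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:  # imagine these arriving one at a time
    stats.update(x)
print(stats.n, stats.mean, round(stats.variance, 4))
```

Memory use stays constant no matter how long the stream runs, and Welford's update is numerically more stable than keeping a running sum of squares.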
The challenges include:
- how to capture the most important data as it happens and deliver it to the right people in real time
- how to store the data
- how to analyze and understand it, given its size and our computational capacity
- other challenges, from privacy and security to access
Greater than the challenges are the opportunities. We can:
- extract insight and knowledge
- identify trends
- use the data to improve productivity
- gain competitive advantage
- create substantial value for the world economy
Big data provides an opportunity to find insight in new and emerging types of data. Argentina can take advantage of these opportunities.
Data mining:
- Discovery of useful, possibly unexpected, patterns in data
- Non-trivial extraction of implicit, previously unknown and potentially useful information from data
- Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
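A classic form of pattern discovery is finding items that frequently occur together (market-basket analysis). A minimal sketch that counts co-occurring item pairs and keeps those above a support threshold; it is only the first counting step of algorithms such as Apriori, not a full implementation, and the sample baskets are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def frequent_pairs(transactions: List[List[str]], min_support: float) -> Dict[Tuple[str, str], float]:
    """Return item pairs whose support (fraction of transactions containing both) >= min_support."""
    counts: Counter = Counter()
    for basket in transactions:
        # set() ignores duplicates within a basket; sorting makes each pair canonical.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Made-up shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
for pair, support in sorted(frequent_pairs(baskets, min_support=0.6).items()):
    print(pair, support)
```

With a 60% threshold, four pairs survive, among them ("beer", "diapers"): a co-occurrence nobody asked about explicitly, which is the "possibly unexpected pattern" the definitions above refer to.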
- Aggregation and statistics: data warehouse and OLAP
- Indexing, searching, and querying: keyword-based search; pattern matching (XML/RDF)
- Knowledge discovery: data mining; statistical modeling
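Keyword-based search is typically built on an inverted index, which maps each term to the documents containing it so a query never has to scan every document. A minimal in-memory sketch; the helper names and sample documents are hypothetical, and real search engines add ranking, stemming and on-disk index structures.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: map each term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Keyword AND-search: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Hadoop stores big data on commodity hardware",
    2: "Streaming analytics at the edge",
    3: "Big data analytics in healthcare",
}
index = build_index(docs)
print(sorted(search(index, "big data")))
print(sorted(search(index, "big analytics")))
```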
- Healthcare: deep analytics (pattern recognition); assisted living, home care, athletics apps
- Precision agriculture
- Oil and gas
- Transportation
- Smart cities: smart traffic lights, pollution monitoring, infrastructure health monitoring
- Connected vehicle & rail
- Smart grid
- Retail industry
[Figure: precision agriculture network. Mobile endpoints (tractors, implements) and fixed endpoints (environmental sensors: water, nitrogen), each with an IPv6 stack, connect through an IPv6-enabled g/e RF mesh, Wi-Fi, cellular small and macro cells, or satellite to an aggregation point (e.g., a farm house). The aggregation point uses a Wi-Fi or Ethernet LAN and an IP WAN backhaul (cellular, broadband, Ethernet, serial) to reach the Internet / cloud / VPN.]
Opportunities in precision farming
- Intelligent irrigation system. Requirements: sensor network and access; edge and core integration (sensor information + weather forecast). Pluses: better yields; water savings; sensors can also measure soil conditions. Minuses: cost of deployment. Comments: Wi-Fi infrastructure helps.
- Produce tracking. Requirements: tagging & tracking system. Pluses: provenance guarantees.
IoT for smart & connected communities (S+CC): smart traffic light system
[Figure: layered S+CC architecture]
- Services: public cloud subscription-based services; private cloud (security, ITS, lighting, water)
- Operation: NMS; S+CC service delivery platform
- Infrastructure: Ethernet, Wi-Fi, P, Wave2M, low-power RF, PLC, etc.
- End points: smart water, structural health, intelligent transportation, public lighting, environmental monitoring, safety & security
[Figure: connected vehicle architecture, organized into end points, infrastructure, operation and services. Elements shown include: public cloud, private (OEM) cloud, enterprise cloud, subscription-based services, data center/virtual servers, enterprise video/voice/data; VNO policy enforcement, flow-based management, DPI; energy service providers (smart grid); mobile SP communications service providers; fog; mobile Wi-Fi offload (Wi-Fi hotspots, 3G/4G); DSRC roadside infrastructure (V2I); consumer network (home/dealership Wi-Fi hotspots, femtocells); electrical charging network (charging stations); other services (802.11p?); V2I/upstream communication (Wi-Fi, 3G/4G, 802.11p, etc.); software; V2V communication (802.11p).]
[Figure: intelligent transportation system architecture: travelers, centers, vehicles, field]
Roadside multi-purpose equipment based on the convergence of routing, computing and wireless technologies:
- Distributed, multi-tenancy computing model
- Supports multiple wireless technologies
- Located with other traffic control equipment
Purpose (managed service):
- Regulate traffic (traffic router: cars and IP packets alike)
- Collect tolls and taxes (per-transaction fee collection)
- E-commerce support
- Content delivery
- Traffic sensor management (e.g., Sensys)
Big Data integration is multidisciplinary
- Less than 10% of the world's data is genuinely relational.
- Meaningful data integration in the real, messy, schema-less and complex Big Data world of databases and the semantic web calls for multidisciplinary and multi-technology methods.
- The Billion Triple Challenge: the web of data contains 31 billion RDF triples, of which 446 million are RDF links; 13 billion are government data, 6 billion geographic data, 4.6 billion publication and media data, and 3 billion life science data (BTC 2011, Sindice 2011).
- Demonstrate the value of semantics: let data integration drive DBMS technology.
- Large volumes of heterogeneous data, like linked data and RDF.
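RDF represents data as (subject, predicate, object) triples, and querying it is essentially pattern matching with wildcards plus joins on shared variables. A toy in-memory sketch of that idea; the identifiers and the `match` helper are hypothetical, and a real system would use a triple store queried with SPARQL rather than a Python list.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Hypothetical identifiers; "owl:sameAs" links a local resource to another dataset.
triples: List[Triple] = [
    ("ex:BuenosAires", "rdf:type", "ex:City"),
    ("ex:BuenosAires", "ex:locatedIn", "ex:Argentina"),
    ("ex:Cordoba", "rdf:type", "ex:City"),
    ("ex:Cordoba", "ex:locatedIn", "ex:Argentina"),
    ("ex:Santiago", "rdf:type", "ex:City"),
    ("ex:Santiago", "ex:locatedIn", "ex:Chile"),
    ("ex:Argentina", "owl:sameAs", "other:Argentina"),
]

# Join two patterns on the shared subject:
#   ?city rdf:type ex:City .  ?city ex:locatedIn ex:Argentina
cities = {s for s, _, _ in match(triples, p="rdf:type", o="ex:City")}
in_argentina = {s for s, _, _ in match(triples, p="ex:locatedIn", o="ex:Argentina")}
print(sorted(cities & in_argentina))
print(match(triples, p="owl:sameAs"))
```

The single `owl:sameAs` triple plays the role of an RDF link: it is what lets data from two independently published datasets be integrated without a shared schema.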
Jobs
- The U.S. could face a shortage by 2018 of 140,000 to 190,000 people with "deep analytical talent" and of 1.5 million people capable of analyzing data in ways that enable business decisions (McKinsey & Co.).
- The Big Data industry is worth more than $100 billion and growing at almost 10% a year (roughly twice as fast as the software business).
In 2008, the paper "Big-Data Computing: Creating revolutionary breakthroughs in commerce, science, and society" stated: "Just as search engines have transformed how we access information, other forms of big-data computing can and will transform the activities of companies, scientific researchers, medical practitioners, and our nation's defense and intelligence operations." In 2012, the Obama administration announced the Big Data Research and Development Initiative.
Let's catch the wave, Argentina! You can be a leader in Big Data. What must we do to get on board?
Government
- In 2012, the Obama administration announced the Big Data Research and Development Initiative: 84 different big data programs spread across six departments.
Private sector
- Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
- Facebook handles 40 billion photos from its user base.
- The Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide.
Science
- The Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.
- Large Hadron Collider: 13 petabytes of data produced.
- Medical computation, like decoding the human genome.
- A social science revolution.
- A new way of doing science (microscope example).
- Crawler: ingestion processes for the data
- Custom, highly specialized processing
- Users accessing and using data
- Transaction processing (storage processing, GFS); capture through interaction; Spanner
- Processing and analysis: MapReduce, Hadoop (batch mode)
- Machine learning
- Smart querying
Solving this required many engineering teams.
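The MapReduce model mentioned above splits batch processing into a map step, a shuffle that groups intermediate pairs by key, and a reduce step. A single-process sketch of the canonical word-count example; in Hadoop the phases run in parallel across a cluster, and this code only mirrors the programming model, not Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


docs = ["big data big ideas", "data at rest data in motion"]
counts = word_count(docs)
print(counts)
```

Because each map call sees one document and each reduce call sees one key, both phases can be spread over many machines; only the shuffle needs to move data between them.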
Data from everywhere (you should not care where it comes from):
- Medical/health: genome, genetic mapping and tracking
- Consumer-related: Kmart, Target, Walmart
- Auto industry: car status
The Internet plays a key role: enterprise, health, retail, government, financial. New databases, new storage.
What is new:
- 3V: volume, velocity, variety
- 4S: source, size, speed, structure
Typical data operations used to be Create, Read, Update, Delete (CRUD); now they are Create, Replicate, Append (no delete, just append), Process.
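The shift from CRUD to Create, Replicate, Append, Process can be illustrated with an append-only log: records are never updated or deleted, a newer record for the same key supersedes an older one, and the current state is computed by processing the log. A minimal sketch; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Record = Tuple[int, str, Any]  # (offset, key, value)


@dataclass
class AppendOnlyLog:
    """A store that supports Create/Append, Replicate and Process, but no Update/Delete."""

    records: List[Record] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        """Create/append: an 'update' is simply a newer record for the same key."""
        offset = len(self.records)
        self.records.append((offset, key, value))
        return offset

    def replicate_to(self, replica: "AppendOnlyLog") -> None:
        """Replicate: ship only the records the replica has not seen yet.

        Assumes the replica is a prefix of this log, which holds because
        records are never modified or removed.
        """
        replica.records.extend(self.records[len(replica.records):])

    def latest(self) -> Dict[str, Any]:
        """Process: fold the full history into the current value per key."""
        state: Dict[str, Any] = {}
        for _, key, value in self.records:
            state[key] = value
        return state


log = AppendOnlyLog()
log.append("sensor-1", 20.5)
log.append("sensor-2", 18.0)
log.append("sensor-1", 21.0)  # supersedes, but does not erase, the first reading

replica = AppendOnlyLog()
log.replicate_to(replica)
print(replica.latest())       # current view
print(len(replica.records))   # full history is kept
```

Forbidding in-place updates is what makes replication this simple: a replica only ever needs the suffix it is missing.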
Retailing, financial services, healthcare, data from video, IoT.
Hadoop is the leader in two key elements:
- a distributed file system
- MapReduce
Big data on the cloud.
Opportunities:
- Farmers: weather, crop failures
- Pandemics
- Healthcare: 150B in savings
- IoT: Cisco predicts billions of terabytes of traffic
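Hadoop's distributed file system stores a file as fixed-size blocks, each replicated on several machines. A simplified sketch of that layout step: the 128 MB block size and replication factor of 3 mirror HDFS defaults in recent Hadoop versions, but the round-robin placement is an assumption for illustration. Real HDFS placement is rack-aware and decided by the NameNode.

```python
from typing import Dict, List

MB = 1024 * 1024


def plan_blocks(file_size: int, block_size: int, nodes: List[str],
                replication: int) -> Dict[int, List[str]]:
    """Split a file into fixed-size blocks and assign each block to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    plan: Dict[int, List[str]] = {}
    for b in range(n_blocks):
        # Rotate the starting node so blocks, and load, spread across the cluster.
        plan[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return plan


nodes = ["n1", "n2", "n3", "n4"]
plan = plan_blocks(300 * MB, 128 * MB, nodes, replication=3)
for block, replicas in plan.items():
    print(f"block {block}: {replicas}")
```

A 300 MB file becomes three blocks (the last one only 44 MB), and losing any single node still leaves two copies of every block, which is what lets MapReduce jobs keep running on commodity hardware that fails.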