1 BIG DATA POSSIBILITIES AND CHALLENGES PROFESSOR AND CENTER DIRECTOR
2 WHY BIG DATA? "In God we trust - all others must bring data" W. Edwards Deming (US engineer and statistician)
3 WHAT IS BIG DATA? Wikipedia (en.wikipedia.org/wiki/big_data) All-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications Big Data characteristics Volume (often very large) Velocity (often arrives very fast) Variety (often varied/complex format/type/meaning) Veracity (often uncertain or imprecise)
4 BIG DATA AVAILABILITY Pervasive use of computers and sensors Ability to acquire/store/process data Big Data collected everywhere Society increasingly data driven Today as much data is created in two days as was created up until 2003!
5 BIG DATA EXAMPLE: THE INTERNET What happens in an internet minute?
6 BIG DATA EXAMPLE: TERRAIN DATA Previously meter data E.g. Shuttle Radar Topography Mission (SRTM) near-global 90-meter dataset Now accurate meter or sub-meter data (e.g. LiDAR) Europe: Denmark, Sweden, Netherlands; USA: NC, OH, PA, DE, IA, LA Denmark: At 30-meter: ~46 million data points (GB) Current 2-meter model: ~12 billion data points (TB) Upcoming ½-meter model: ~168 billion data points
7 BIG DATA IMPORTANCE Nature/Science: Paradigm shift; Science will be about mining data The Economist: Managing data deluge difficult; doing so will transform business and public life Value is not in data creation but in data analysis!
8 BIG DATA ANALYSIS IMPORTANCE New York Times, 11/2/2012: The age of Big Data What is Big Data? A meme and a marketing term, for sure, but also shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions.
9 BIG DATA ANALYSIS IMPORTANCE Dan Ariely: Big Data is like teenage sex: Everyone talks about it Nobody really knows how to do it Everyone thinks everyone is doing it So everyone claims they are doing it And like sex, the ones getting the most are smart enough not to talk about it
10 BIG DATA INCREASING IMPORTANCE Increasing government awareness of importance of Big Data analysis Big Data as a driver for growth Governments are increasingly supporting use of data through free data programs
11 POPULAR BIG DATA ANALYSIS EXAMPLES Google: Power of statistical methods on Big Data from the web Google Flu Trends - Statistically, certain search terms are good indicators of flu activity Google Translate - Not: Linguistic analysis to extract the meaning from syntax and vocabulary - Instead: Statistically most probable translation based on similar translations on web
12 POPULAR BIG DATA ANALYSIS EXAMPLES Netflix: The power of recommendation systems Analysis of subscriber preferences created hit series House of Cards - Old (1990) British TV series still popular - Films featuring Kevin Spacey had always done well - Movies directed by David Fincher (The Social Network) had a healthy share
13 BIG DATA ANALYSIS CHALLENGES What questions should be asked What questions can be answered How can questions be answered How is Big Data processed efficiently How can different data be combined How is uncertainty handled What about legal issues What about privacy issues Researcher-industry-society collaboration important!
14 AFTERNOON CASES Many interesting projects/collaborations, including on Releasing and exploiting government, social media and newspaper data and how they are accessed Utilizing health care data to help mothers, newborns, school kids and hip patients alike, including in Africa Improving indoor service logistics, recycling systems and personal product offerings - as well as national and global markets Collecting data to model, analyze and improve air quality, traffic behavior, food perception - as well as animal farming Many good Big Data Big Impact examples involving researchers, industry and government
15 MADALGO CASES MADALGO cases involve efficient processing of big terrain data Cleaning ocean floor scanning data Flood risk screening both strong research/publications and new/improved products Important for success MADALGO algorithms research Domain and market knowledge of industry Startup SCALGO as development glue
16 CENTER FOR MASSIVE DATA ALGORITHMICS Established 2007 funded by Danish National Research Foundation 5 year renewal in 2012 (10 year budget > $25 million) - International evaluation: MADALGO is the world-leading center in the area of massive dataset algorithmics High level objectives Advance algorithmic knowledge in massive data algorithms area Train researchers in world-leading international environment Be catalyst for multidisciplinary collaboration
17 CENTER FOR MASSIVE DATA ALGORITHMICS Building on: Algorithms research focus areas: - I/O-efficient, cache-oblivious and streaming - Algorithm engineering Strong international team/environment Multidisciplinary and industry collaboration
18 I/O-EFFICIENT ALGORITHMS Problems involving Big Data on disk Disk access is 10^6 times slower than main memory access Large access time amortized by transferring large blocks of data Important to store/access data to take advantage of blocks I/O-efficient algorithms: Move as few disk blocks as possible to solve problem "The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one's desk or by taking an airplane to the other side of the world and using a sharpener on someone else's desk." (D. Comer)
19 I/O-EFFICIENT ALGORITHMS MATTER Example: Visiting data in order Array size N = 10 elements Disk block size B = 2 elements Main memory size M = 4 elements Algorithm 1: N = 10 disk accesses Algorithm 2: N/B = 5 disk accesses Difference between N and N/B huge N = 256 x 10^6, B = 8000, 1 ms disk access time N accesses take 256 x 10^3 sec = 4266 min = 71 hours N/B accesses take 256/8 sec = 32 seconds
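The numbers on this slide can be reproduced with a small sketch. The slide does not say in which order Algorithm 1 touches the elements, so the strided order below is only an assumed illustration, and the LRU cache, `count_block_reads` and `scan_time_seconds` are hypothetical helpers rather than anything from MADALGO's software.

```python
from collections import OrderedDict


def count_block_reads(order, block_size, memory_size):
    """Count disk block reads when visiting elements in `order`.

    Memory holds memory_size // block_size blocks; the least recently
    used block is evicted when memory is full (an assumed policy).
    """
    capacity = memory_size // block_size
    cache = OrderedDict()  # block number -> True, oldest first
    reads = 0
    for index in order:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)  # block already in memory
        else:
            reads += 1  # one disk access fetches the whole block
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return reads


def scan_time_seconds(n, block_size, access_ms=1.0):
    """Time to read n elements when each disk access fetches block_size elements."""
    accesses = -(-n // block_size)  # ceiling division
    return accesses * access_ms / 1000.0


N, B, M = 10, 2, 4
# Assumed "bad" order: every access lands in a block that is no longer in memory.
strided = list(range(0, N, 2)) + list(range(1, N, 2))
sequential = list(range(N))

scattered_reads = count_block_reads(strided, B, M)      # N disk accesses
sequential_reads = count_block_reads(sequential, B, M)  # N/B disk accesses

if __name__ == "__main__":
    print(scattered_reads, sequential_reads)
    print(scan_time_seconds(256_000_000, 1))     # one access per element
    print(scan_time_seconds(256_000_000, 8000))  # one access per block
```

With the slide's large example, one access per element costs 256,000 seconds (about 71 hours), while one access per block of 8000 elements costs 32 seconds.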
20 ALGORITHM ENGINEERING & COLLABORATION Much of the center's collaboration driven by algorithm engineering Design/implementation of practical algorithms & experimentation - Often provides valuable input to theoretical research work - Sometimes leads to practical breakthroughs MADALGO, COWI and SCALGO flood risk collaboration Started in 2006 as part of Strategic Research Council project Builds on MADALGO I/O-efficient algorithms research Unique big terrain data solutions and establishment of SCALGO Collaboration continues, including in Innovation Fund project Unique flood risk products
21 FLOOD RISK ANALYSIS IMPORTANCE Important to screen extreme rain or sea-level rise flood risk 50% of Danes worry about their homes being flooded (Userneeds) 90% of Danes say high flood risk affects the decision to buy a house Cost of 2011 Copenhagen flood over 6 billion kroner (Swiss Re) Potential to do so using detailed national elevation model Elevation for roughly every 2x2 meter and soon ½x½ meter: hundreds or even thousands of points in a family home lot!
22 DETAILED (BIG) TERRAIN DATA ESSENTIAL Mandø 2 meter sea-level rise 90 meter terrain model 2 meter terrain model
23 DETAILED (BIG) TERRAIN DATA ESSENTIAL Drainage network (flow accumulation) 90 meter terrain model 2 meter terrain model
24 SURFACE FLOW MODELING Flow accumulation on grid terrain model: Initially one unit of water in each grid cell Water (initial and received) distributed from each cell to lowest lower neighbor cell Flow accumulation of cell is total flow through it Note Flow accumulation of cell = size of upstream area Drainage network = cells with high flow accumulation Flow stops/disappears in depressions -> model often filled
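The flow accumulation rule on this slide can be sketched in memory as follows. This is a minimal illustration, not the I/O-efficient algorithm: it assumes the terrain fits in RAM, uses the 8 surrounding cells as neighbors (the slide only says "neighbor"), and breaks ties between equally low neighbors by scan order. The function name `flow_accumulation` is hypothetical.

```python
def flow_accumulation(heights):
    """Return the flow accumulation grid for a small in-memory height grid.

    Each cell starts with one unit of water and passes all of its water
    (initial plus received) to its lowest strictly lower neighbor.
    """
    rows, cols = len(heights), len(heights[0])
    flow = [[1] * cols for _ in range(rows)]

    # Visit cells from highest to lowest, so every cell has received all
    # upstream water before it passes its own total on.
    cells = sorted(
        ((heights[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for height, r, c in cells:
        best = None  # lowest strictly lower neighbor found so far
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and heights[nr][nc] < height:
                    if best is None or heights[nr][nc] < heights[best[0]][best[1]]:
                        best = (nr, nc)
        if best is not None:
            flow[best[0]][best[1]] += flow[r][c]
        # Otherwise the cell is a local minimum: flow stops here, which is
        # why depressions are often filled before running the model.
    return flow


if __name__ == "__main__":
    terrain = [
        [9, 8, 7],
        [8, 5, 4],
        [7, 4, 1],
    ]
    for row in flow_accumulation(terrain):
        print(row)
```

On this 3x3 example every cell drains to the lower-right corner, so that cell's flow accumulation equals the size of its upstream area: all 9 cells.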
25 FLOW ACCUMULATION PERFORMANCE Natural algorithm accesses disk for each grid cell Push flow down the terrain by visiting cells in height order Problem since cells of same height scattered over terrain Performance of commercial systems often not satisfactory Cannot handle Denmark at 2-meter resolution We developed I/O-optimal algorithms Now handle Denmark 2-meter model in a day on limited 4GB desktop!
26 FLOW ACCUMULATION SUCCESS STORY Shuttle Radar Topography Mission (SRTM) Near-global dataset 3 arc-second (90-meter at equator) raster ~60 billion cells stored in roughly files Large USGS Hydrosheds project produced hydrologically conditioned 90-meter data But upscaled to 500-meter to compute flow accumulation Using I/O-efficient algorithms: One day on standard 4GB workstation!
27 FLASH FLOOD MAPPING Models how surface water gathers in depressions as it rains Water from watershed of depression gathers in the depression Depressions fill, leading to (dramatic) increase in neighbor depression watershed size (Figure labels: watershed area, volume) Flash Flood Mapping: Amount of rain before any given raster cell is below water
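For a single isolated depression, the relation between watershed area, depression volume and rain amount can be sketched as below. This is a simplified model under stated assumptions, not the Flash Flood Mapping product's algorithm: all rain falling on the watershed is assumed to end up in the depression, nothing infiltrates, and the depression never spills into a neighbor (the slide notes that in reality filled depressions enlarge neighboring watersheds). The function name and example numbers are hypothetical.

```python
def rain_until_flooded_mm(depression_heights_m, cell_area_m2, watershed_area_m2):
    """Rain depth (mm) needed before the water level reaches each depression cell.

    depression_heights_m: heights of the raster cells inside one depression.
    cell_area_m2: ground area covered by one raster cell.
    watershed_area_m2: area draining into the depression (assumed constant).
    """
    result = []
    for target in depression_heights_m:
        # Volume of water needed to raise the water surface to this cell's height.
        volume_m3 = sum(
            (target - h) * cell_area_m2
            for h in depression_heights_m
            if h < target
        )
        # Rain depth that delivers this volume when spread over the watershed.
        result.append(1000.0 * volume_m3 / watershed_area_m2)
    return result


if __name__ == "__main__":
    # Three 2x2 meter cells at 0 m, 1 m and 2 m, draining a 400 m2 watershed.
    print(rain_until_flooded_mm([0.0, 1.0, 2.0], 4.0, 400.0))
```

In this toy setup the lowest cell is wet immediately, the middle cell needs 4 m3 of water (10 mm of rain over 400 m2), and the highest needs 12 m3 (30 mm).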
28 FLASH FLOOD MAPPING EXAMPLE After 10mm rain After 50mm rain After 100mm rain After 150mm rain
29 FLASH FLOOD MAPPING SUCCESS STORY Based on collaborative research, COWI markets the SCALGO-produced Flash Flood Mapping product in Denmark under the name Skybrudskort Produced for entire country Sold to over half of local governments Jones Edmunds compared Flash Flood Mapping to result of advanced dynamic model (ICPR) for Marion County, Florida Results very close Significantly more detailed Cost under 5%
30 AFTERNOON: ONLINE DEMONSTRATION
31 CONCLUSIONS Hope to have convinced you that Big Data has huge potential - in research, industry and society Exploiting Big Data challenging - research-industry-society collaboration one way to success Thanks!