Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems
Volker Markl | volker.markl@tu-berlin.de | dima.tu-berlin.de | dfki.de/web/research/iam/ | bbdc.berlin
Based on my 2014 vision paper "On Declarative Data Analysis and Data Independence in the Big Data Era", PVLDB 7(13): 1730-1733
Presentation to the European Competitiveness Council on March 3rd, 2015
2013 Berlin Big Data Center. All Rights Reserved.
More and more data is available to science and business!
Data sources: sensor data, web archives, video streams, audio streams, RFID data, simulation data
Drivers: Cloud Computing, Internet of Services, Internet of Things, Cyber-Physical Systems
Underlying trends: Connectivity, Collaboration, Computer-Generated Data
Data & Analysis: Increasingly Complex!
Data: Volume (data volume too large), Velocity (data rate too fast), Variability (data too heterogeneous), Veracity (data too uncertain)
Analysis: Reporting, Ad-Hoc Queries, ETL/Integration (aggregation, selection; SQL, XQuery, Map/Reduce); Data Mining (MATLAB, R, Python); Predictive/Prescriptive Analysis (MATLAB, R, Python)
[Figure: data complexity and analysis complexity both increase; labels: scalability, algorithms, ML, DM]
Data-driven applications will revolutionize decision-making in business and the sciences, and they have great economic potential!
Examples: lifecycle management, home automation, health, e-sciences, water management, market research, information marketplaces, transportation, energy management
Opportunities in Individual Sectors
Sectors/Domains | Big Data Value | Source
Public Administration | EUR 150 billion to EUR 300 billion in new value (considering the 23 larger EU governments) | OECD, 2013
Healthcare & Social Care | EUR 90 billion, considering only the reduction of national healthcare expenditure in the EU | McKinsey Global Institute, 2011
Utilities | Reduce CO2 emissions by more than 2 gigatonnes, equivalent to EUR 79 billion (global figure) | OECD, 2013
Transport and Logistics | USD 500 billion in value worldwide in the form of time and fuel savings, or 380 megatonnes of CO2 emissions saved | OECD, 2013
Retail & Trade | 60% potential increase in retailers' operating margins possible with Big Data | McKinsey Global Institute, 2011
Geospatial | USD 800 billion in revenue to service providers and value to consumer and business end users | McKinsey Global Institute, 2011
Applications & Services | USD 51 billion worldwide directly associated with the Big Data market (services and applications) | Various
Data Value Chains will succeed only when individual links operate with the needed capabilities
[Figure: the data value chain, leading to Social & Economic Benefits]
Several European companies, and in particular research institutions and startups, have created interesting technologies and services along the data value chain. However, in both business and science, data use is handled in a fragmented way. SMEs in particular lack the skills to capitalize on data assets in order to improve their competitiveness. Actors along the data value chain should cooperate and form the basis of a strong and vibrant data-driven ecosystem to maximise big data value creation.
Data Scientist: Jack of All Trades!
Application: Domain Expertise (e.g., Industry 4.0, Medicine, Physics, Engineering, Energy, Logistics)
Data Science: Mathematical Programming, Linear Algebra, Stochastic Gradient Descent, Error Estimation, Active Sampling, Regression, Monte Carlo, Statistics, Sketches, Hashing, Relational Algebra/SQL, Data Warehouse/OLAP, NF²/XQuery, Resource Management, Hardware Adaptation, Fault Tolerance, Memory Management, Parallelization, Scalability, Memory Hierarchy, Convergence, Decoupling, Iterative Algorithms, Curse of Dimensionality, Control Flow, Data Analysis Language, Compiler, Query Optimization, Indexing, Data Flow, Real-Time
Data Science Requires Systems Programming!
Data analysis: Statistics, Algebra, Optimization, Machine Learning, NLP, Signal Processing, Image Analysis, Audio and Video Analysis, Information Integration, Information Extraction
Systems programming: Indexing, Parallelization, Communication, Memory Management, Query Optimization, Efficient Algorithms, Resource Management, Fault Tolerance, Numerical Stability
[Figure: Data Value Chain; Data Analysis Process; Predictive Analytics]
People with big data analytics skills: R/Matlab: 3 million users; Hadoop: 100,000 users
We cannot address the complexity of data science merely by teaching it. We need new technologies to empower more people to conduct deep analysis on big data!
Deep Analysis of Big Data is Key to Competitiveness!
[Figure: analysis depth (Simple Analysis to Deep Analytics) versus data size (Small Data to Big Data (3V))]
The established vendors and existing products are falling short of the needs; new technologies, systems, platforms, and services for deep analytics are emerging.
The cards are dealt anew!
[Figure: products positioned by analysis depth (Simple Analysis to Deep Analytics) and data size (Small Data to Big Data (3V)), including Apache Flink and IBM BigInsights]
Many new companies and products are emerging to enable deep big data analysis; strong European contenders include Apache Flink, SAP HANA, ParStream, and Exasol.
The Five Dimensions of the Data Economy
Application Dimension: Competitive Intelligence, Industry 4.0/IoT, Energy, Healthcare, Transportation, Digital Humanities
Technology Dimension: Scalable Data Processing, Data Management, Signal Processing, Statistics/ML, Linguistics/Text & Speech, Novel Computer Architectures, HCI/Visualization
Legal Dimension: Ownership, Copyright/IPR, Liability, Insolvency, Privacy
Economic Dimension: Business Models, Benchmarking, Open Source & Open Data, Deployment Models, Information Pricing, Information Marketplaces
Social Dimension: User Behaviour, Societal Impact, Collaboration
Systems, Frameworks, Skills, Best Practices, Tools
PPP (Public-Private Partnership): Uniting the Actors
Main industry drivers: ATOS (ES), Engineering (IT), DFKI (DE), Fraunhofer (DE), Nokia Solutions and Networks (FI), Orange (FR), SAP (DE), SIEMENS (DE), Software AG (DE), Thales (FR), TIE Kinetix (NL)
These drivers have worked on a Strategic Research & Innovation Agenda (SRIA) for the period 2016-2020, with regular updates while the PPP is running.
Lighthouse projects (e.g., on health, logistics, energy)
Innovation spaces will offer secure environments for experimenting with both private and open data; they will also act as business incubators and hubs for the development of skills, competence, and best practices.
Call to Action: Data Ecosystem for Europe
Educate Data Scientists to Create the Required Talent
- Information literacy
- -shaped students (computer science/data management and mathematics/data analysis skills, combined with application, legal, and social skills)
- Enhance the e-competencies framework with data skills and job profiles
Research Data Analytics Technologies, Systems, and Platforms
- Simplified programming, large-scale data management, and novel hardware
- Scalable machine learning, statistical methods, and mathematical programming
- Information marketplaces, large-scale data stream processing, and visual analytics
Innovate to Maintain Competitiveness
- Create networks of national centers of excellence in big and open data
- Provide data, processing, and analytics capabilities through information marketplaces
- Demonstrate flagship use cases to raise awareness and solve real-world problems
- Startups are key innovation drivers in this field: promote startups in the area of data analytics technologies, information marketplaces, and applications
- Raise awareness of the value of data and analysis in enterprises and governments (Chief Data Scientist), and transfer technologies to enterprises, in particular SMEs
- Determine legal frameworks and business models
Create a data ecosystem
We need synchronized national and European data strategies to ensure a European technological leadership role in the Data Economy from a technology, analysis, and application perspective, addressing all five dimensions in the Data Value Chain!