Big Data Analysis
John Domingue (STI International and The Open University)
Project co-funded by the European Commission within the 7th Framework Program (Grant Agreement No. 257943)
The Data Landscape
Coping with data variety and verifiability is a central challenge and opportunity for Big Data
The long tail of data variety is a major shift in the data landscape
Need for scalable approaches to cope with data under different format and semantic assumptions
The Data Value Chain
Data Acquisition: Structured data; Unstructured data; Event processing; Sensor networks; Streams; Multimodality
Data Analysis: Data preprocessing; Semantic analysis; Sentiment analysis; Data correlation; Pattern recognition; Real-time analysis; Machine learning
Data Curation: Trust; Provenance; Data augmentation; Annotation; Data validation; Redundancy elimination; Keep up-to-date; Consistency
Data Storage: Resources to store and process data; In-Memory technologies; Efficient data; NoSQL; NewSQL; Cloud; Security and privacy
Data Usage: Decision support; Predictions; Simulation; Exploration; Modelling; Control; Domain-specific usage
Value Chain focus: Data Analysis
Heterogeneous data at speed
75 Exabytes/year of Mobile Data by 2015
Data analysis: Overview
Big Data Analysis is concerned with making acquired raw data amenable to use. It supports decision making as well as domain-specific usage. Typically, data analysis entails transforming data into a richer, possibly semantics-based, representation.
Core techniques
The techniques associated with Big Data Analysis encompass data mining and machine learning, information extraction, and new forms of data processing and reasoning, for example stream data processing and large-scale reasoning.
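As one illustration of stream data processing, the sketch below computes a running mean over the most recent readings of a stream without storing the whole stream. The function name `sliding_mean` and the window size are assumptions made for this example; the slides do not prescribe any particular algorithm.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values after each new reading."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # the oldest value is evicted automatically
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # subtract the value that is about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


# Illustrative readings only; any iterable or generator can act as the stream.
means = list(sliding_mean([1.0, 2.0, 3.0, 4.0], window=2))
```

Because the function is a generator, it consumes one reading at a time and keeps only `window` values in memory, which is the property that matters when the input is unbounded.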
Meaning of "Paris"
Capital of France; Socialite; Star Trek character; Mythical character; Plant; James Bond character; Song; Asteroid
RDF = Subject, Property, Value Triples
Triples combine to make Graphs
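A minimal sketch of these two ideas in plain Python: each statement is a (subject, property, value) tuple, and indexing the tuples by subject yields a graph. The `ex:` identifiers and the helper names `build_graph` and `subjects_with_label` are hypothetical, chosen only for illustration; a real system would more likely use an RDF library and full URIs.

```python
from collections import defaultdict

# Each triple is (subject, property, value). All identifiers are illustrative.
triples = [
    ("ex:Paris_city", "rdfs:label", "Paris"),
    ("ex:Paris_city", "ex:capitalOf", "ex:France"),
    ("ex:Paris_myth", "rdfs:label", "Paris"),
    ("ex:Paris_myth", "rdf:type", "ex:MythicalCharacter"),
    ("ex:France", "rdfs:label", "France"),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (property, value) edges."""
    graph = defaultdict(list)
    for subject, prop, value in triples:
        graph[subject].append((prop, value))
    return dict(graph)


def subjects_with_label(triples, label):
    """Return every subject whose rdfs:label equals `label`, sorted by name."""
    return sorted(s for s, p, v in triples if p == "rdfs:label" and v == label)


graph = build_graph(triples)
paris_candidates = subjects_with_label(triples, "Paris")
```

Here the same label "Paris" is attached to two distinct subjects, so the ambiguity of the word is made explicit: each meaning has its own node, and the edges leaving that node (for example `ex:capitalOf`) tell the meanings apart.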
Linked data successes
I Like Casablanca
People, photos, friends and the Web
Big data analysis exemplars
Data analysis case study: Real-time radiation monitoring
Crowd-sourced real-time radiation monitoring in Fukushima
Combine official data with unofficial data contributed by concerned citizens
Community analysis and collection:
Number of data collection points can be dramatically increased
Communities will often create bespoke tools for the particular situation and to handle any problems in data collection (developer ecosystem)
Citizen engagement is increased significantly
London riots, summer 2011
Data journalism (award-winning): London Riots 2011
Map key: Offence; Accused address; Richer / Poorer
Future scenario
Health and big data: Virtual Physiological Human
Figure labels: Patient Avatar; Personalised Model; Cardiovascular Workflow
Summary
The Data Landscape
Coping with data variety and verifiability is a central challenge and opportunity for Big Data
The long tail of data variety is a major shift in the data landscape
Need for scalable approaches to cope with data under different format and semantic assumptions
The Solution Space
Lowering the usability barrier for data tools is a major requirement across all sectors. Users should be able to directly manipulate the data
Solutions based on large communities (crowd-based approaches) are emerging as a trend to cope with Big Data challenges
Principled semantic and standardized data representation models are central to coping with data heterogeneity
Significant increase in the use of new data models (e.g. graph-based), valued for their expressivity and flexibility
Subject Matter Expert Interviews
Interviews available at: http://big-project.eu/video-interviews