ICT Perspectives on Big Data: Well Sorted Materials


3 March 2015

Contents
Introduction
Dendrogram
Tree Map
Heat Map
Raw Group Data

For an online, interactive version of the visualisations in this document, go here: www.well-sorted.org/output/ictperspectivesonbigdata

Introduction

Dear participant,

Thank you for taking part in submitting and sorting your ideas. This document contains several visualisations of your ideas, grouped by the average of your online sorts. They are:

Dendrogram - This tree shows each submitted idea and its similarity to the others. The lower two ideas 'join', the more people grouped those two ideas together. For example, if two ideas join at the bottom, every person grouped those two together.

Tree Map - This visualisation presents an 'average' grouping. It is calculated by 'cutting' the Dendrogram at the dashed line, so that any items which join lower than that line are placed in the same group. In addition, rectangles which share a side of the same length are more similar to each other than to their peers.

Heat Map - This visualisation shows a similarity matrix in which each idea is given a colour at its intersection with every other idea, showing how similar the two are. This is useful for seeing how well formed a group is: the more red there is in a group (shown by the black lines), the more similar the ideas inside it were judged to be.

Raw Group Data - This table shows every submitted idea and its longer description. The ideas are shown in the same order as in the Dendrogram (so similar ideas are close to each other) and are split into the coloured groups used in the Tree Map. In addition, each idea has been given a unique number so that it is easier to find.
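The document does not say exactly how Well Sorted computes these views. As a minimal sketch, assuming that the similarity of two ideas is the fraction of participants who put them in the same group, and that the Dendrogram is built with average-linkage clustering (both are assumptions, not confirmed by the source), the Heat Map matrix, the Dendrogram merge heights and the Tree Map cut could be computed like this. The function names and the toy sorts are hypothetical.

```python
from itertools import combinations


def similarity_matrix(sorts, n_items):
    """Fraction of participants who placed each pair of items in the same group.

    `sorts` holds one entry per participant; each entry is a list of groups,
    and each group is a list of item indices in range(n_items).
    """
    counts = [[0] * n_items for _ in range(n_items)]
    for groups in sorts:
        for group in groups:
            for i, j in combinations(sorted(group), 2):
                counts[i][j] += 1
                counts[j][i] += 1
    return [
        [1.0 if i == j else counts[i][j] / len(sorts) for j in range(n_items)]
        for i in range(n_items)
    ]


def average_linkage(sim):
    """Agglomerative clustering on distance = 1 - similarity (average linkage).

    Returns the merge history as (cluster_a, cluster_b, height) tuples, which is
    what a dendrogram draws: the lower the height, the more participants put
    the two clusters' items together. Merged clusters get ids n, n+1, ...
    """
    n = len(sim)
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            height = sum(1.0 - sim[i][j] for i, j in pairs) / len(pairs)
            if best is None or height < best[2]:
                best = (a, b, height)
        a, b, height = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges


def cut(n_items, merges, threshold):
    """Tree Map-style grouping: keep only the merges that happen below `threshold`."""
    clusters = {i: [i] for i in range(n_items)}
    next_id = n_items
    for a, b, height in merges:
        if height >= threshold:
            break  # average-linkage heights never decrease, so later merges are higher
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sorts: three participants, four ideas numbered 0-3.
sorts = [
    [[0, 1], [2, 3]],
    [[0, 1], [2, 3]],
    [[0, 1, 2], [3]],
]
sim = similarity_matrix(sorts, 4)  # the matrix a Heat Map would colour
merges = average_linkage(sim)      # the merge heights a Dendrogram would draw
print(cut(4, merges, threshold=0.5))  # [[0, 1], [2, 3]]
```

Average linkage is only an assumption here; it is convenient because its merge heights never decrease, so the cut can simply stop at the first merge above the line.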

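Dendrogram
[Figure not reproduced in this transcription.]

Tree Map
[Figure not reproduced in this transcription.]

Heat Map
[Figure not reproduced in this transcription.]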

Raw Group Data

Red

1. Data privacy issue
Big data offers a great opportunity to tackle societal challenges, but data privacy is one of the barriers to maximizing such an opportunity. One key research question: novel data processing approaches that enable efficient analysis without violating privacy.

2. Security
The scale and impact of a big data breach is likely to dwarf the scale and impact of data breaches in more traditional systems. The distributed nature of big data systems makes them more vulnerable. Can existing approaches to big data security scale?

Blue

3. How can Big Data better improve manufacturing processes?
Big Data will become more prominent as more and more IoT devices are introduced into manufacturing. Advanced analytics helps decode complex manufacturing processes to improve yield through data prioritization, root cause analysis and process modelling.

4. Building Performance Determinants
Research shows higher than expected consumption of energy and resources to create and maintain healthy & comfortable indoor environments. Disparate data sources need to be linked and analysed to identify the key determinants of excessive resource consumption.
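Idea 1 in the Red group asks for analysis that does not violate privacy. Differential privacy is one established framework for this; it is named here purely as an illustration and is not something the submitter proposes. The minimal Python sketch below applies the Laplace mechanism to a single count query. The records are synthetic and hypothetical, and a real deployment would also need to track the privacy budget spent across repeated queries.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample: the difference of two independent exponentials
    with mean `scale` is Laplace-distributed with that scale."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=random):
    """Count of records matching `predicate`, released with epsilon-differential privacy.

    Adding or removing one record changes the count by at most 1 (its
    sensitivity), so Laplace noise with scale 1/epsilon is sufficient.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical usage: 70 synthetic patient records, 25 of them aged 65 or over.
rng = random.Random(42)
patients = [{"age": age} for age in range(20, 90)]
noisy = private_count(patients, lambda p: p["age"] >= 65, epsilon=0.5, rng=rng)
print(f"noisy count: {noisy:.1f}")
```

Smaller values of `epsilon` give stronger privacy at the cost of noisier answers.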

Green

5. Managing Volume and Velocity of data
Complex sensor networks and the like will output actionable data at volumes and velocities that we have not worked with yet. There is huge potential in machine learning and automated control systems that take informed next best actions.

6. Big Data is growing faster than Moore's Law!
The volume and velocity of data generation are increasing faster than Moore's law (and other laws for scaling the performance of computing systems). How do we engineer compute platforms and software infrastructure to analyse big data workloads?

7. Scalable and Expressive Platforms for Big Data
We need new types of distributed software platforms for Big Data processing that are scalable, expressive and performant, but remain intuitive to use. They need to offer a real-time unified view of large sets of streaming and historic data.

8. Capacity Planning for converging digital Networks
- dealing with data volumes from multi-sensors
- capacity planning from the network
- cloud computing
- sufficient statistics
- energy-aware data centres and proxy-clouds

9. Research Data Management (RDM) at scale
The costs of storing, curating, sharing and publishing research data at scale demand new techniques for sharing data without copying it and for applying analytics without direct access to the data. New levels of collaboration and resource sharing are needed.

10. Reliable software
We need software to analyse large quantities of data, but its functionality is quite different from that of 'normal' software. This creates a challenge: how can we have confidence in the software used?

Orange

11. Data Wrangling
Data wrangling (DW) is the process of collecting and preparing data before analysis can take place. It has emerged that data scientists spend 50%-80% of their time on data wrangling. Current approaches to DW are rather ad hoc; a methodology is sorely needed.

12. Determining what data are valuable
Data size and rate of creation are predicted to rise rapidly. Much of the data will be of low quality, have limited provenance and be subject to errors. Determining rapidly and efficiently which subsets of the data have value, and which to keep, will be essential.

13. Integration of heterogeneous data
Data is now taking many complex forms, from time series, images and video to massive-scale sequences of amino acids, clinical images and so forth. How to integrate these in a coherent manner that improves the signal-to-noise ratio (SNR) is a major open issue.

14. Merging Heterogeneous Big Data Sources
In my opinion, the key research challenge in the area of big data is the problem of merging heterogeneous sources of information while at the same time quantifying the quality of the different data sources and the uncertainty associated with them.

15. Heterogeneous data - mixed data types
Objects are described by multiple data types; e.g. a patient may be described by a mixture of structured data, time series representing the results of a particular test over time, images, text reports and more. Analysis of such complex data is under-researched.

16. Merging Disparate Data Sets (Data Fusion)
Developing computational tools to merge and contrast, at large scale, related but distinct data sources. Can we find patterns that are (a) shared or (b) distinct? Can we quantify which data sources provide the most insight and which are contradictory?

17. Big Data Integration
Value is increasingly going to be found in latent connections amongst datasets that originate from independent sources. Diversity is both an opportunity and a challenge for predictive analytics techniques that aim to distill knowledge from data.

18. Multi-source information fusion for big data
Application of AI techniques and development of data fusion methods for the extraction of salient data for decision support in cluttered, congested and conflicted big data environments.

19. Tools for addressing heterogeneity of big data
Scale is not our biggest challenge; heterogeneity is, both in form (structured/unstructured data, numbers/text) and in format (numerous haphazardly organised sources). Tools are urgently needed that are generalised but powerful enough to be useful analytically.

20. Reference Data
It is challenging to create a single database with all data from different intrusion systems. The data in different formats needs to be combined so that the security analyst can make an informed decision.

21. Modelling ill-structured data
Ill-structured data include unstructured crowd-sourced data (e.g. social media) and irregular samples (e.g. cross-section surveys) at different spatial and/or temporal densities/frequencies. Most existing analytical tools are suited to well-structured data.

22. Flexible, scalable and semantic-based approaches
Techniques will be needed for analysing, modelling and reasoning over heterogeneous and dynamic datasets from multiple sources whose structure may change over time. This calls for highly flexible, scalable and semantic-based approaches.
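Ideas 14, 16 and 18 ask how to merge sources while quantifying each source's quality and uncertainty. One textbook starting point, shown here only as an illustration and not as the submitters' proposal, is inverse-variance weighting: if several independent, unbiased sources estimate the same quantity with known standard deviations, weighting each by 1/σ² gives the minimum-variance unbiased linear combination. The source names and numbers below are hypothetical, and the discordance check is a simple heuristic rather than a formal test.

```python
import math


def fuse(estimates):
    """Inverse-variance weighted fusion of independent estimates of one quantity.

    `estimates` maps a source name to (value, standard_deviation). Sources with
    smaller uncertainty receive more weight, and the fused standard deviation is
    never larger than that of the best single source.
    """
    weights = {name: 1.0 / sd ** 2 for name, (_, sd) in estimates.items()}
    total = sum(weights.values())
    mean = sum(weights[name] * value for name, (value, _) in estimates.items()) / total
    return mean, math.sqrt(1.0 / total)


def discordant_sources(estimates, fused_mean, k=3.0):
    """Heuristic: sources lying more than k of their own SDs from the fused mean."""
    return [name for name, (value, sd) in estimates.items()
            if abs(value - fused_mean) > k * sd]


# Hypothetical readings of the same quantity from two sources.
readings = {"sensor_a": (10.0, 1.0), "sensor_b": (12.0, 2.0)}
mean, sd = fuse(readings)
print(round(mean, 2), round(sd, 3))         # 10.4 0.894
print(discordant_sources(readings, mean))   # []
```

The assumptions of independence and known, honest standard deviations are exactly what heterogeneous big data sources tend to violate, which is why ideas 14 and 16 remain open problems.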

Purple

23. Algorithmic Breakthroughs in Big Data Analytics
While MapReduce and similar frameworks simplify parallelism, most big data computations are just sums and counts. New directions promising breakthroughs: parallel optimization and approximation algorithms; streaming and summary techniques; efficient graph/matrix algorithms.

24. New algorithmic frameworks
Shifting attention in algorithm design and analysis towards (1) relational rather than numerical data, (2) data streams rather than sets, (3) approximate solutions to imperfectly posed problems, and (4) randomised methods for partial data.

25. Robust and Theory Driven Statistical Modeling
Big Data Analytics can produce statistics and features of interest when studying socio-technical phenomena, but what is the theory behind the analytics? Can we incorporate theory from different disciplines, and are we sure the methods used are appropriate?

26. New methods for exploratory data analysis
We need to understand the properties of microscopic events in large information spaces, where pieces of information meet by coincidence rather than being causally determined. This becomes important for exploratory data analysis that generates hypotheses in big data scenarios.

27. How can we accurately quantify uncertainty?
There are established methods for quantifying uncertainty, but these will not be appropriate for big data. For example, some methods involve modelling assumptions; for big data, model errors will lead to substantial underestimation of uncertainty.

28. Combining Big Data with Prior Knowledge
Often in machine learning, we want to leverage prior knowledge about the data to improve the analysis. Current approaches include feature design and probabilistic modelling, but it can be hard to express vague knowledge in the language of these formalisms.

29. Impedance Mismatch: Advanced Analyses v BD Systems
Mathematical/statistical principles versus algorithms and engineering: we need to know how both sides work and how one is mapped onto the other. Effects guaranteed in the analyses may be lost in the underlying infrastructure. Scalable big data analytics depends on the above.
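Ideas 23 and 24 point to streaming and summary techniques and randomised methods. The Count-Min sketch is one well-known example of such a summary; it is named here as an illustration and is not mentioned by the submitters. It answers approximate "how many times have we seen x?" queries over a stream using a fixed table of counters and never undercounts. The minimal Python sketch below assumes salted BLAKE2b hashes stand in for independent hash functions; the width and depth defaults are illustrative choices.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts over a stream using depth x width counters.

    Estimates never undercount. Under the usual Count-Min analysis, an estimate
    exceeds the true count by more than (e / width) * (total items added) with
    probability at most exp(-depth).
    """

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        data = str(item).encode("utf-8")
        for row in range(self.depth):
            # A differently salted hash per row approximates independent hash functions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Every row overcounts only through collisions, so the minimum is tightest.
        return min(self.table[row][col] for row, col in self._buckets(item))


# Hypothetical stream of 4000 tokens, 2000 of which are "data".
sketch = CountMinSketch(width=200, depth=4)
for token in ["sensor", "data", "data", "stream"] * 1000:
    sketch.add(token)
print(sketch.estimate("data"))  # at least 2000
```

Memory stays at `width * depth` counters however long the stream runs, which is the trade the "sums and counts" observation in idea 23 invites: exactness is given up for bounded space.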

Yellow

30. Human-Centred Analytics
The black-box nature of mining algorithms, the need for parameter tuning and the difficulty of coping with outliers/bad data mean that experts are required to mine data. We need a human-centred approach using effective visualisation and visual analytics.

31. Big data HCI
How can we visualise, summarise, or otherwise expose what is contained in big datasets (or what we learn from analysing them), such that people can understand and exploit this knowledge?

32. User oriented visualisation and configuration
How do we develop methodologies and technology that allow users to explore and configure large complex data spaces and their visualisations? In other words, can we take the data analytics expert out of the loop?

33. Interacting with data
Humans are often left out when talking about making sense of data. Users will need to interact with complex data in cognitively demanding situations. Understanding the needs of users in these situations is key to fully exploiting the new data-rich lands.

34. Remove barriers in the use and understanding
Massive volumes of data that could potentially be relevant to decision-makers are generated every second from corporate and public sources. How can we build interfaces to support human decision-making?

35. The Emotions of Data
What is humans' emotional connection with data, and how does it affect data-based decision-making processes? Data-based decision making implicitly assumes rational actors, but humans are not rational. How can analytics support emotional decision making?

36. Delivery of Big Data Analytics Impacts
The challenges for big data analytics do not come only from the purely analytic parts, e.g. improving learning algorithms. The key issue is how ordinary people would benefit from those analytic results, given the huge knowledge gap in between.

37. Targeted decision support
The key challenge is to understand what could be extracted from big data that would allow us to make more informed decisions for particular uses. This would involve collaboration between data scientists and experts in the field from which the data originate.

38. Scale of the results of big data analysis
We do not only need methods that scale to cope with big data; we also need to consider how to scale down the results so that we (humans) are able to understand them and use all of their potential.

39. Visual Big Data: Images and Video
The evolution of the internet from a text to a visual medium is well recognised and especially evident in social media (CiscoTF: by 2018, 70% of the internet will be video). How do we make sense of this deluge of visual data, often with little or no metadata?