EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials


5th August 2015

Contents
Introduction
Dendrogram
Tree Map
Heat Map
Raw Group Data

For an online, interactive version of the visualisations in this document, go here: www.well-sorted.org/output/epsrcbigdata

Introduction

Dear participant,

Thank you for taking part in submitting and sorting your ideas. This document contains several visualisations of your ideas, grouped by the average of your online sorts. They are:

Dendrogram - This tree shows each submitted idea and its similarity to the others. The lower two ideas 'join', the more people grouped those two ideas together. For example, if two ideas join at the bottom, every person grouped those two together.

Tree Map - This visualisation presents an 'average' grouping. It is calculated by 'cutting' the Dendrogram at the dashed line so that any items which join lower than that line are placed in the same group. In addition, rectangles which share a side of the same length are more similar to each other than their peers.

Heat Map - This visualisation shows a similarity matrix where each idea is given a colour at the intersection with another idea, showing how similar the two are. This is useful to see how well formed a group is. The more red there is in a group (shown by the black lines), the more similar the ideas inside it were judged to be.

Raw Group Data - This table shows every submitted idea and its longer description. The ideas are shown in the same order as the Dendrogram (so similar ideas are close to each other) and split into the coloured groups used in the Tree Map. In addition, each idea has been given a unique number so that it is easier to find.
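The grouping procedure described above can be illustrated with a small sketch. It is not the Well Sorted implementation: the linkage rule (average linkage), the function names `similarity_matrix` and `cut_dendrogram`, the example sorts and the threshold value are all assumptions made for illustration. Similarity between two ideas is taken to be the fraction of participants who placed them in the same group, which is the quantity the Heat Map is described as showing, and "cutting" the tree is modelled by refusing to merge clusters whose average similarity falls below a threshold.

```python
from itertools import combinations


def similarity_matrix(sorts, n_ideas):
    """Fraction of participants who put ideas i and j in the same group."""
    counts = [[0] * n_ideas for _ in range(n_ideas)]
    for groups in sorts:              # one participant's sort
        for group in groups:          # one group within that sort
            for i, j in combinations(sorted(group), 2):
                counts[i][j] += 1
                counts[j][i] += 1
    sim = [[c / len(sorts) for c in row] for row in counts]
    for i in range(n_ideas):
        sim[i][i] = 1.0               # every idea is fully similar to itself
    return sim


def cut_dendrogram(sim, threshold):
    """Average-linkage merging (an assumed rule), stopped at `threshold`.

    The two most similar clusters are merged repeatedly. Merging stops
    once the best remaining pair is less similar than `threshold`,
    which plays the role of the dashed cut line.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]),
        )
        if avg_sim(clusters[x], clusters[y]) < threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical example: three participants sorting five ideas (0 to 4).
example_sorts = [
    [{0, 1}, {2, 3}, {4}],
    [{0, 1, 2}, {3}, {4}],
    [{0, 1}, {2, 3, 4}],
]
example_sim = similarity_matrix(example_sorts, 5)
print(cut_dendrogram(example_sim, threshold=0.5))  # [[0, 1], [2, 3], [4]]
```

In this made-up example ideas 0 and 1 were grouped together by every participant, so they 'join at the bottom'; ideas 2 and 3 were grouped by two of three participants and still fall under the cut, while idea 4 stays on its own.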

Dendrogram [figure]

Tree Map [figure]

Heat Map [figure]

Raw Group Data

Red

1 Anonymity and Privacy - Pulling together data to gain greater insights has an impact on privacy. Even anonymisation or pseudonymisation have challenges, as data can be worked back to the source. Mathematical techniques to prevent deduction of data would be useful.

2 Data privacy for Big Data - Understanding how to achieve adequate levels of privacy given the difficulty in using traditional methods of cleaning and delinking.

3 Data personalisation and deidentification - Finding robust, scalable, practical ways of reconciling the tension between the need to identify/own personal data (and data derived from personal data) while also being able to contribute to population level analyses of data without being identified.

4 Cybersecurity - Analysis of how big data can be used to track activity in an organisation and detect possible cyber attacks, track threats and campaigns and provide context and insight into the form of attacks. This could enable early warning and prevention.

5 Trusted Crowd Sourced Information - How can we develop methods that ensure crowdsourced information is accurate and trustworthy?

Blue

6 Applying techniques to new areas - For many big data problems there is a lack of expertise in the field to understand the correct approaches to employ, and this needs data scientists and application scientists to work together.

7 EPS big data problems - There is a huge number of areas in the EPS disciplines where big data problems exist, but the expertise to tackle them does not. This is especially important for EPSRC engagement with industry.

8 Innovation ready data scientists - How do we produce the multidisciplinary researchers and entrepreneurs who are commercially aware, can talk/engage with people, and yet also have the data analytic and visualisation skills?

9 Develop the skill set - There is a significant lack of people with proper experience and knowledge of handling big / distributed datasets. Those people with academic experience often don't have experience with the tools used in industry.

10 Raise the profile of Statistical Science in UK - At the heart of Big Data are the statistical methods required to make sane and rational inferences that lead to actionable knowledge. Statistical Science (SS) is central to realising the potential of Big Data, and this is a good thing for EPSRC supporting SS.

Green

11 Taking the data analyst out of the loop - How do we create data exploration interfaces and associated methodologies that enable (and guide) non-experts to explore their own data to discover and then exploit value? Is this partly an education issue?

12 Improving accessibility to big data - Research to create methods & tools that non-experts can use to explore the potential of big data, enabling wider uptake and new kinds of innovation and impact driven by a wide range of people.

13 Complex Data Visualisation - How can we develop visualisations that make complex data sets easier to understand and analyse?

Orange

14 Defining the physiological envelope - Sophisticated physics-based simulations of individual patients are performed using exquisitely accurate anatomical data from medical images. The boundary conditions are equally important and should be personalised from information in the clinical record.

15 Uncertainty and variation in physiological models - We need to learn how to characterise and to represent the uncertainty and variation in information that comes from clinical data, and to develop methods for its propagation into physiological models for diagnosis and interventional planning.

16 Dynamic modelling of complex real-life systems - Synthetic and playable data-driven models of complex interacting systems in biology, engineering, health, environment, transport, robotics, manufacturing and public policy; unsupervised learning of emergent phenomena capable of driving decision making.

17 Define a connectedness in research landscape - Almost all sciences are becoming more reliant on the sensible analysis and production of Big Data. Many of these disparate disciplines are being tied together by the need for common computational statistical methods to make the advances Big Data (BD) promises.

18 Application-specific research - Work in support of applications in the Digital Economy and PaCCS programmes. Specifically, there will be emerging "big data" challenges from the ESRC Research and Evidence Hub and also the new EPSRC IoT Research Hub.

19 Opportunity to deliver impact in number of areas - There is no doubt that BD presents many opportunities to the physical sciences. How this is harnessed to deliver impact can be facilitated by EPSRC, ATI being one example of many.

Purple

20 Economic and social models for data - Much of the impact from big data should come from the new economies and social structures that it engenders. Can we model this in a way that gives useful, predictive abilities for economies and societies of the future?

21 Social and real-time data - Making use of new forms of data coming from social networks and sensor networks to augment curated data from longitudinal studies.

22 Symbiotic Human-Machine Collaboration - Understand the science of how humans and machines can work together in the most effective way that makes the best use of their complementary data analysis skills.

Yellow

23 Closing loops - Creating new and automated ways of collecting in-use, through-life and end-of-life data and feeding it back to the early stages of product development processes, to improve existing products and inform the development of new ones.

24 Archiving software is essential for big data - Software often generates big data. If this is the case, it is not always necessary to keep the big data, but it is important to make sure the software generating the data is archived in a sustainable and recoverable manner.

25 When do we no longer need a dataset? - The increase in speed of software can mean it is better to reproduce data than store it. A life expectancy for this data might therefore be four times as long as it takes to generate it. Understanding this life cycle is key to an effective big data policy.

26 Lots of little bits of data make big data - A lot, if not most, of big data is made up of many bits of small data. Engendering a culture of sustainably documenting and archiving small data is a critical component of many scientific areas where EPSRC can contribute by promoting best practice.

27 Sharing Big Data from Many Producers - Understanding how to move big data when the model is not simply one large instrument but many producers of large amounts of data who need to share and combine subsets. This places different stresses on the infrastructure we have in place.

28 Data discovery and aggregation - Within large organisations data is often distributed in separate IT systems, many using commercial enterprise software. How do we apply map-reduce type operations at a meta-level? How do we securely aggregate data that may be commercially sensitive?

29 Methods/infrastructure to integrate models/data - The model-based interpretation of Big Data requires effective and efficient integration of the data with the modelling and simulation tools. We need to develop the whole area of physics-based Reduced-Order Modelling, and the infrastructure to integrate.

Pink

30 Creation of Synthetic Benchmark Datasets - Based on an understanding of application areas, the creation of large scale, realistic data sets that can be used to benchmark potential solutions.

Silver

31 Data Analytics for Disruptive Business Models - New ways of collecting, analysing and visualising heterogeneous data (especially that of new double-sided markets and platforms). What are the best ways to encourage the two sides to provide value?

32 Landing decisions - Businesses don't need more data or insights; they need better decisions based on data. How to translate big data into business changes and impact (beyond good-sounding case studies) is difficult.

33 Asset optimisation - How do we organise assets to provide optimised service offerings? E.g. real-time asset tracking and health monitoring; portfolio management; predicting customer behaviour and usage patterns.

34 Product optimisation - How do we optimise products using all relevant product data? E.g. physics-based design simulations; manufacturing process data; service data from current products in the field; knowledge of the market place and competitor products.

Brown

35 Intelligent simulations - Big data could be used to inform the design, validation and verification of computational simulations, bringing process simulations closer to real-world processes, akin to CAD for 3D products.

Cyan

36 Mathematics of Information - Mathematics is the language of information and data. Some mathematical areas (harmonic analysis, optimization, computation, topology) are already engaged in Big Data research; the challenge is to create intellectual space for further developments.

37 Transformation at the maths / CS interface - Using problems of challenging data (big, heterogeneous, streaming, soft, uncertain, partial, garbled) to generate novel research at the interface between mathematics and computer science, especially in algorithms, complexity, computability and reasoning.

38 Algorithms - Theoretical Computer Science is fairly weak in the UK; the challenge is expanding capacity in the general area of algorithms (deterministic, random, combinatorial, mixed...) and their design, analysis and complexity.

39 Data algorithms at scale - Developing robust algorithms to give good enough decisions/predictions in situations of very high data volumes/velocities and/or at very low levels of data integrity (through heterogeneity or measurement uncertainty).

40 Development of new algorithms for big data - Given the size and complexity of some big data problems, new approaches are needed that combine statistics, computer science and high performance computing to tackle the analysis.

41 Analytics Efficiency - As data grows exponentially, there comes a point when the cost of analysing it exceeds the value gained, even though advances in processing it are also speeding up. Research into efficient analytic techniques will help to continue getting the benefit from this.

42 Cross-cutting machine learning - Machine learning is the machine room of Big Data. It spans themes in Statistics, Functional Analysis, Approximation Theory, Optimization, Computer Science and Engineering. The challenge is to plan research there as truly inter-disciplinary.

43 Novel Machine Learning Methods - New algorithms to support real-time analysis of uncertain, incomplete, inconsistent and possibly corrupted data.

44 Scalable Machine Learning Frameworks - Interactive programming notebooks could be the new Excel. If these get linked to distributed computing and easy-to-use programming frameworks, businesses can get faster access to insights without specialised staff.

45 Multi-scale data analysis - Applications involving very large data sets and streams being mined for structure, pattern and community detection on a variety of scales simultaneously. Examples: social, economic, behavioural as well as systems biology, gen- and proteomics, cosmology.