BIG DATA EUROPE. Integrating Big Data, Software & Communities for Addressing Europe's Societal Challenges




Partners

Mission
- Lower the barrier for using big data technologies
  - Required effort and resources
  - Required data science skills
- Assist in establishing cross-lingual/organizational/domain Data Value Chains
- Show the societal value of Big Data
www.big-data-europe.eu, 16 March 2015

cross-lingual / cross-organizational / cross-domain
Societal domains, with preliminary Big Data focus area and selected key data assets:
- Life Sciences & Health
  Focus area: Heterogeneous data linking & integration; biomedical semantic indexing & QA
  Key data assets: ACD Labs / ChemSpider, ChEBI, ChEMBL, ConceptWiki, DrugBank, ENZYME, Gene Ontology, GO Annotation, SwissProt, UniProt, WikiPathways, PubMed, MeSH, Disease Ontology (DO), Joint Chemical Dictionary (Jochem), BioASQ datasets
- Food & Agriculture
  Focus area: Large-scale distributed data integration
  Key data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AGRIS, AquaMaps, Fishbase
- Energy
  Focus area: Real-time monitoring, stream processing, data analytics, and decision support
  Key data assets: European Energy Exchange data, smart meter measurement data, gas/fuels/energy market/price data, consumption statistics, equipment condition monitoring data
- Transport
  Focus area: Streaming sensor network & geo-spatial data integration
  Key data assets: GTFS data, OSM / LinkedGeoData, MobilityMaps, transport sensor data, ROSATTE road safety attributes, European Road Data Infrastructure - EuroRoadS
- Climate
  Focus area: Real-time monitoring, stream processing, and data analytics
  Key data assets: European Grid Infrastructure (EGI), databases hosting atmospheric data, several software frameworks for simulation, calibration and reconstruction
- Social Sciences
  Focus area: Statistical and research data linking & integration
  Key data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. Eurostat, UNESCO, World Bank)
- Security
  Focus area: Real-time monitoring, stream processing, and data analytics; image data analysis
  Key data assets: Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations, databases hosting atmospheric data, experimental and simulation data concerning dispersion of hazardous substances

Project Summary
Two clearly defined coordination and support measures:
- Coordination: Engaging with a diverse range of stakeholder groups representing particularly the Horizon 2020 societal challenges Health, Food & Agriculture, Energy, Transport, Climate, Social Sciences and Security; collecting requirements for the ICT infrastructure needed by data-intensive science practitioners tackling a wide range of societal challenges; covering all aspects of publishing and consuming semantically interoperable, large-scale data and knowledge assets.
- Support: Designing, realizing and evaluating a Big Data Aggregator platform infrastructure that meets requirements, minimises disruption to current workflows, and maximises the opportunities to take advantage of the latest European RTD developments (incl. multilingual data harvesting, data analytics & visualisation).
BigDataEurope will implement and apply two main instruments to successfully realize these measures:
- Build Societal Big Data Interest Groups, following the W3C interest group scheme and involving a large number of stakeholders from the Horizon 2020 societal challenges as well as technical Big Data experts.
- Design, integrate and deploy a cloud-deployment-ready Big Data aggregator platform comprising key open-source Big Data technologies for real-time and batch processing, such as Hadoop, Cassandra and Storm.

Orthogonal Dimensions of Big Data Ecosystems [diagram]
- Societal Challenges (domain-specific data assets & technology): Healthcare, Food Security, Energy, Intelligent Transport, Climate & Environment, Inclusive & Reflective Societies, Secure Societies
- Data Value Chain: Data Generation & Acquisition, Data Analysis & Processing, Data Storage & Curation, Data Visualization & Usage, Data-driven Services
- Generic Big Data Enabling Technologies

BigDataEurope Platform

Work Packages & Implementation Phases
Phases: M1-M12, M13-M24, M25-M36
- Community Building: WP2 Community Building & Requirements
- Enabling Technologies: WP3 Big Data Generic Enabling Technologies & Architecture
- Component Integration: WP4 Big Data Integrator Platform
- Integrator Deployment: WP5 Big Data Integrator Instances
- Community Assessment: WP6 Real-life Deployment & User Evaluation
- Uptake: WP7 Dissemination & Communication

BDE platform covers the complete data landscape
- Data processing with human-organized information
- Similar data processing steps applied to a large quantity of data
- Similar data processing steps applied to a stream of data
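The last two modes listed above differ mainly in how the data arrives, not in the processing step itself. A minimal Python sketch of that idea follows. The function names (`normalise`, `process_batch`, `process_stream`) and the sample records are hypothetical illustrations and are not part of the BDE platform.

```python
from typing import Dict, Iterable, Iterator, List


def normalise(record: Dict[str, str]) -> Dict[str, str]:
    """One processing step: trim and lower-case the 'name' field."""
    return {**record, "name": record["name"].strip().lower()}


def process_batch(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Bulk mode: apply the step to a finite collection in one pass."""
    return [normalise(r) for r in records]


def process_stream(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Streaming mode: apply the same step lazily, one record at a time."""
    for r in records:
        yield normalise(r)


# Hypothetical sample input, used for both modes.
batch = [{"name": " ChEMBL "}, {"name": "DrugBank"}]

bulk_result = process_batch(batch)
stream_result = list(process_stream(iter(batch)))
print(bulk_result)
print(stream_result)
```

Both calls reuse the same `normalise` step; only the driver around it changes.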

Blueprint BDE platform [diagram]
- Components: Reporting API, Dissemination API, Dissemination storage, Background knowhow, Background aggregator, Bulk database, Bulk data aggregator, Real time aggregator, aggregated data, Search index, Dataset, Meta data
- Access interfaces: SPARQL, JSON-LD, JSON, LOD search
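The blueprint names both JSON and JSON-LD among the access interfaces. As a rough illustration of the difference, the sketch below wraps a plain JSON metadata record in a JSON-LD context. The choice of the DCAT and Dublin Core vocabularies, the `to_json_ld` helper, the example.org identifier and the sample metadata are all assumptions made for this example; the slides do not specify the vocabulary or API the platform uses.

```python
import json
from typing import Any, Dict

# Assumed vocabulary mapping: DCAT and Dublin Core terms (not stated in the slides).
CONTEXT: Dict[str, str] = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "title": "dct:title",
    "keyword": "dcat:keyword",
}


def to_json_ld(dataset_id: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap plain JSON metadata so its keys resolve to vocabulary terms."""
    doc: Dict[str, Any] = {
        "@context": CONTEXT,
        "@id": dataset_id,
        "@type": "dcat:Dataset",
    }
    doc.update(metadata)
    return doc


# Hypothetical dataset description in plain JSON form.
plain = {"title": "Smart meter measurements", "keyword": ["energy", "stream"]}

linked = to_json_ld("http://example.org/dataset/smart-meters", plain)
print(json.dumps(linked, indent=2))
```

The plain record stays usable by ordinary JSON clients, while the added `@context` lets linked-data tools interpret the same keys as shared vocabulary terms.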

Blueprint BDE platform: Deployment [diagram]
- Components: Reporting API, Dissemination API, Dissemination storage, Background knowhow, Background aggregator, Bulk database, Bulk data aggregator, Real time aggregator, aggregated data, Search index, Dataset, Meta data
- Access interfaces: SPARQL, JSON-LD, JSON, LOD search

Coordination

Networking partners
- Health, demographic change and wellbeing
- Food, Agriculture, Forestry, Water and Bioeconomy
- Inclusive, innovative and reflective Societies
- Secure, clean and efficient energy
- Climate, environment, resource efficiency and raw materials
- Smart, green and integrated transport
- Secure Societies

Envisioned societal stakeholder engagement cycle

Community building and supporting
Establish 7 Societal Big Data Interest Groups
- modelled after the W3C interest groups
- involving a large number of stakeholders from the H2020 societal challenges as well as technical Big Data experts
- each group has a domain and a technical chair
Building a European network and multiplier organization per societal challenge to
- engage with stakeholders in the particular societal challenge area and raise awareness
- support the requirements elicitation, definition and prioritization
- assemble a library of data sources and datasets
- provide a comprehensive test bed for the evaluation of the BDE Aggregator Platform
- select pilot use cases, across different domains
- promote the showcase developed for the societal domain and support the dissemination of the BDE results
- provide appropriate academic and training curricula for training future researchers and practitioners
www.big-data-europe.eu, 27 February 2015

Workshops
7 x 3 Workshops (at least 3 per Societal Challenge)
- The first series of workshops, in the next months, will focus on requirements definition
  - analyse workshop results and create a first draft per societal challenge
  - also examine the use of other tools, such as surveys (asking a broad audience about (big) data management needs), interviews with Big Data experts, and interviews with the EC representative per societal challenge
- The second series of workshops, in the 2nd year, will focus on a review of the architecture and first prototype implementation
- The third series of workshops, in the 3rd year, will focus on the platform evaluation and showcases for the societal domains

Big Data Europe
Contact: Bert.Van.Nuffelen@tenforce.com, y.barnard@mail.ertico.com
www.big-data-europe.eu

OPEN PHACTS - BIG DATA AND DRUG DISCOVERY
Bryn Williams-Jones, CEO, The Open PHACTS Foundation
Big Data Europe

Pre-competitive Informatics
Pharma companies are all accessing, processing, storing & re-processing external open research data.
[Diagram: Literature, Patents, PubChem, Genbank, Databases and Downloads feed Data Integration and Data Analysis into Firewalled Databases, repeated at each company]
Source: "Lowering industry firewalls: pre-competitive informatics in drug discovery", Nature Reviews Drug Discovery (2009) 8, 701-708, doi:10.1038/nrd2944

The Innovative Medicines Initiative
- EC funded public-private partnership for pharmaceutical research
- Focus on key problems: Efficacy, Safety, Education & Training, Knowledge Management
The Open PHACTS Project
- Create a semantic integration hub ("Open Pharmacological Space")
- Runs 2011-2014, ENSO till 2016
- Deliver services to support on-going drug discovery programs in pharma and public domain
- Leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements
- 10 EFPIA companies, 15 academics, 6 SMEs
- Focus on sustainability and long term impact of the Open PHACTS infrastructure

Open PHACTS Mission Integrate Multiple Research Biomedical Data Resources Into A Single Open & Free Access Point

What do research scientists want to know?
Data sources shown: ChEMBL, DrugBank, Gene Ontology, WikiPathways, GeneGo, ChEBI, UniProt, UMLS, GVKBio, ConceptWiki, ChemSpider, TrialTrove, TR Integrity

Business Questions
Columns: Number; sum; Nr of 1; Question
- 15; 12; 9; All oxidoreductase inhibitors active <100 nM in both human and mouse
- 18; 14; 8; Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
- 24; 13; 8; Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
- 32; 13; 8; For a given interaction profile, give me compounds similar to it.
- 37; 13; 8; The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
- 38; 13; 8; Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
- 41; 13; 8; A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? I.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC), both from structured assay databases and the literature.
- 44; 13; 8; Give me all active compounds on a given target with the relevant assay data
- 46; 13; 8; Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
- 59; 14; 8; Identify all known protein-protein interaction inhibitors
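To make the first business question concrete, the sketch below answers "all oxidoreductase inhibitors active <100 nM in both human and mouse" over a tiny in-memory list. The record layout, the field names, the compound labels and the activity values are all invented for illustration; this is not the Open PHACTS API or data model, only a sketch of the filtering logic such a query implies once the data is integrated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical, already-integrated activity records (values are made up).
records = [
    {"compound": "cmpd-A", "target_class": "oxidoreductase", "species": "human", "activity_nm": 40.0},
    {"compound": "cmpd-A", "target_class": "oxidoreductase", "species": "mouse", "activity_nm": 85.0},
    {"compound": "cmpd-B", "target_class": "oxidoreductase", "species": "human", "activity_nm": 20.0},
    {"compound": "cmpd-B", "target_class": "oxidoreductase", "species": "mouse", "activity_nm": 450.0},
    {"compound": "cmpd-C", "target_class": "kinase", "species": "human", "activity_nm": 5.0},
    {"compound": "cmpd-C", "target_class": "kinase", "species": "mouse", "activity_nm": 7.0},
]


def active_in_all_species(
    rows: Iterable[Dict], target_class: str, species: Set[str], threshold_nm: float
) -> List[str]:
    """Return compounds below the threshold in every requested species."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row["target_class"] == target_class and row["activity_nm"] < threshold_nm:
            seen[row["compound"]].add(row["species"])
    # Keep only compounds whose qualifying species cover the full requested set.
    return sorted(c for c, found in seen.items() if species <= found)


hits = active_in_all_species(records, "oxidoreductase", {"human", "mouse"}, 100.0)
print(hits)
```

In this sample, `cmpd-B` is excluded because its mouse activity is above the threshold, and `cmpd-C` because it targets a different class.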

The Open PHACTS Discovery Platform [architecture diagram]
- Apps
- Core Platform: Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Identifier Management Service, Semantic Workflow Engine, Data Cache (Virtuoso Triple Store)
- Chemistry Registration, Normalisation & Q/C; Indexing
- Data sources (Db) described with VoID and Nanopub; Public Ontologies; User Annotations
- Example labels shown in the diagram: Adenosine receptor 2a, P12374, EC2.43.4, CS4532
Source: http://dx.doi.org/10.1016/j.websem.2014.03.003

Sustaining Impact
Software is free like puppies are free: they both need money for maintenance and more resource for future development

How do we move data about and integrate it?

Data Standardisation is vital http://imgs.xkcd.com/comics/standards.png

Yet the bioscience world really struggles to agree on names: GB:29384, P12047, X31045
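One common way to cope with competing names is to record pairwise "same entity" links and resolve them transitively. The sketch below does this with a small union-find structure. It is not the Open PHACTS Identity Resolution Service; the `IdentifierResolver` class is hypothetical, and the links asserted between the three identifiers from the slide (plus the unrelated placeholder `EX:0001`) are assumptions made for the example.

```python
from typing import Dict, List


class IdentifierResolver:
    """Groups identifiers that were linked as naming the same entity."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, ident: str) -> str:
        # Unknown identifiers start as their own group.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b name the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def aliases(self, ident: str) -> List[str]:
        """All known identifiers in the same group, sorted."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


resolver = IdentifierResolver()
# Assumed links: the slide implies these three name one entity.
resolver.link("GB:29384", "P12047")
resolver.link("P12047", "X31045")

print(resolver.same_entity("GB:29384", "X31045"))
print(resolver.aliases("X31045"))
```

Only two links were recorded, yet the first and third identifiers resolve to the same group because equivalence is followed transitively.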

Acknowledgements
Open PHACTS Practical Semantics
Contact: bryn@openphactsfoundation.org, info@openphactsfoundation.org, @Open_PHACTS
Partners: GlaxoSmithKline (Coordinator); Universität Wien (Managing entity); Technical University of Denmark; University of Hamburg, Center for Bioinformatics; BioSolveIT GmbH; Consorci Mar Parc de Salut de Barcelona; Leiden University Medical Centre; Royal Society of Chemistry; Vrije Universiteit Amsterdam; Novartis; Merck Serono; H. Lundbeck A/S; Eli Lilly; Netherlands Bioinformatics Centre; Swiss Institute of Bioinformatics; ConnectedDiscovery; EMBL-European Bioinformatics Institute; Janssen; Esteve; Almirall; OpenLink; Scibite; The Open PHACTS Foundation; Spanish National Cancer Research Centre; University of Manchester; Maastricht University; Aqnowledge; University of Santiago de Compostela; Rheinische Friedrich-Wilhelms-Universität Bonn; AstraZeneca; Pfizer
