INRA's Big Data perspectives and implementation challenges. Pascal Neveu UMR MISTEA INRA - Montpellier



Similar documents
Big Data: Challenges in Agriculture. Big Data Summit, November 2014 Moorea Brega: Agronomic Modeling Lead The Climate Corporation

The growing challenges of big data in the agricultural and ecological sciences

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Data deluge (and it s applications) Gianluigi Zanetti. Data deluge. (and its applications) Gianluigi Zanetti

Big Data: Challenges and Opportunities

PODD. An Ontology Driven Architecture for Extensible Phenomics Data Management

Research Roadmap for the Future. National Grape and Wine Initiative March 2013

Data collection architecture for Big Data

AppSymphony White Paper

A case study in cloud computing

Trustworthiness of Big Data

The agro-ecological transition at INRA

A Strategy for Plant Breeding Data Management in International Agricultural Research

High-Throughput Computing for HPC

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: Vol. 1, Issue 6, October Big Data and Hadoop

Big Data and Natural Language: Extracting Insight From Text

HPC technology and future architecture

Big Data Europe

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Linked Science as a producer and consumer of big data in the Earth Sciences

Parameter Sweep and Resources Scaling Automation in Scalarm Data Farming Platform

How can information technology play a role in primary industries climate resilience?

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Zenith: a hybrid P2P/cloud for Big Scientific Data

Data Refinery with Big Data Aspects

Conquering the Astronomical Data Flood through Machine

1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

Environmental Research and Innovation ( ERIN )

Introduction to Data Mining

A Service for Data-Intensive Computations on Virtual Clusters

Modelling neutral agricultural landscapes with tessellation methods: the GENEXP-LANDSITES software - Application to the simulation of gene flow

The Global Digital Revolution and Canadian Agriculture and Mining

Industry 4.0 and Big Data

Workflow Tools at NERSC. Debbie Bard NERSC Data and Analytics Services

Manufacturing Analytics: Uncovering Secrets on Your Factory Floor

Precision Agriculture Using SAP HANA and F4F Cloud Integration to Improve Agribusiness. Dr. Lauren McCallum May, 2015

Report of the DTL focus meeting on Life Science Data Repositories

Big Data Explained. An introduction to Big Data Science.

The challenge of managing research data. Axel Berg

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Big Data Effects on Weather and Climate

Cloud computing based big data ecosystem and requirements

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Trafodion Operational SQL-on-Hadoop

The Liaison ALLOY Platform

UAS as a platform for integrated sensing and big data

locuz.com Big Data Services

Remote sensing information cloud service: research and practice

Enabling Cross- Species Computation on Phenotype in Plants. Carolyn J. Lawrence, Iowa State University

Big Data and Transactional Databases Exploding Data Volume is Creating New Stresses on Traditional Transactional Databases

Dr Alexander Henzing

Information Management course

URGI and ELIXIR France for plants and food

Towards innovative cropping systems : High Throughput Phenotyping of plant biotic interactions.

Scale-out NAS Unifies the Technical Enterprise

Optimizing IT Deployment Issues

High Performance Computing and Big Data: The coming wave.

DESIGNING WEATHER INSURANCE CONTRACTS FOR FARMERS

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Building Platform as a Service for Scientific Applications

Reversing Statistics for Scalable Test Databases Generation

AgroPortal. a proposition for ontologybased services in the agronomic domain

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum

Cloud Computing and Advanced Relationship Analytics

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

ICT Perspectives on Big Data: Well Sorted Materials

HPC and Big Data technologies for agricultural information and sensor systems

White Paper. Version 1.2 May 2015 RAID Incorporated

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Instructional Model for Building effective Big Data Curricula for Online and Campus Education

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

SERIES Y: GLOBAL INFORMATION INFRASTRUCTURE, INTERNET PROTOCOL ASPECTS AND NEXT-GENERATION NETWORKS Cloud Computing

Technology Shift 1: Ubiquitous Telecommunications Infrastructure

Transforming the Telecoms Business using Big Data and Analytics

San Diego Supercomputer Center, UCSD. Institute for Digital Research and Education, UCLA

Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University of Liverpool)

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

Comprehensive Data Collection by Sensor Network and Optical Sensor for Agricultural Big Data

Elastic Application Platform for Market Data Real-Time Analytics. for E-Commerce

Enabling the Big Data Commons through indexing of data and their interactions

Lecture 11 Data storage and LIMS solutions. Stéphane LE CROM

Graduated Student: José O. Nogueras Colón Adviser: Yahya M. Masalmah, Ph.D.

Chapter 7. Using Hadoop Cluster and MapReduce

GOBII. Genomic & Open-source Breeding Informatics Initiative

Publishing Linked Data Requires More than Just Using a Tool

Land Use/Land Cover Map of the Central Facility of ARM in the Southern Great Plains Site Using DOE s Multi-Spectral Thermal Imager Satellite Images

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Big Data a threat or a chance?

"LET S MIGRATE OUR ENTERPRISE APPLICATION TO BIG DATA TECHNOLOGY IN THE CLOUD" - WHAT DOES THAT MEAN IN PRACTICE?

Performance Testing of Big Data Applications

This vision will be accomplished by targeting 3 Objectives that in time are further split is several lower level sub-objectives:

Transcription:

INRA's Big Data perspectives and implementation challenges UMR MISTEA INRA - Montpellier

Agronomic Sciences Raises integrated issues and challenges: How to adapt agriculture to climate change? How agriculture impacts environment? Agroecology «producing and supplying food in a different way» Global food security and needs of adaptation Plant treatment and food safety... / AG MIA 2014 2

Data challenges in Science Modern science must deal: More data production A lot of experimental datasets available on the Web More collaborative and integrative approaches Management, sharing and data analysis play an increasing role in research Discover, combine and analyse these data Big Data Challenges 3

Big data? A buzz word A definition: data sets grow so large and complex that it becomes impossible to process using traditional data processing methods (Management and Analysis) / AG MIA 2014 4

Big data Why data is big? Devices, sensors, simulations, etc. Collaborative and participative Storage capacity, Internet access, etc. Make data valuable (information and knowledge) But Less than 1 % of big data is analyzed Less than 20 % of big data is protected (New Digital Universe Study) Big Data vs Survey Sample theory / AG MIA 2014 5

Agronomic Big data V characteristics Volume: massive data and growing size hard to store, manage and analyze Variety and Complexity: different sources, scales, disciplines different semantics, schemas and formats etc. hard to understand, combine, integrate, Velocity: speed of data generation have to be process on line Veracity Validity,, Vulnerability, Volatility, Visibility, Visualisation, etc. 6

Why Big Data is important in Agronomic Sciences? Production of a lot of heterogeneous data for understanding Open new insights Allow to know: Which theories are consistent and which ones are not! When data did not quite match what we expect Decision support: Combine, transform, analyse, design models, predictive approache needs 7

Illustration: High throughpout phenotyping High throughput? High frequency and many trait observations of Phenotypes Many Plant Genotypes Interactions Various Environments 8

Why high throughput phenotyping is important for agriculture? Adaptation to climate change More efficient use of natural resources (including water and soil) in our farming practices Sustainable management and equity Food security Crop performance (yields are globally decreasing) Genotyping and Phenotyping Plant phenotyping has become a bottleneck for progress in plant science and plant breeding 9

What to measure? 10

Phenome High throughput plant phenotyping French Infrastructure 9 multi-species plateforms 2 controlled platforms 5 field platforms 2 high throughout omics 11

5 Field Platforms Various scales and data types Cell, organ, plant, canopy, population time Images, hyperspectral, spectral, sensors, actuators, human readings... Thousands of micro-plots 12

2 Controlled Platforms 60 Various scales and data types Plant biomass g 40-1 time 50 30 20 10 0 0 20 40 60 80 100 Time (d) 13

2 «Omics» platforms Various data complex types composition and the structure of biopolymers Quantification of metabolites and enzyme activities Grinding weighting (-80 C) Extraction Fractionation Pipetting Incubation Reading 14

Data management challenges in Phenome: Volume growth 40 Tbytes in 2013, 100 Tbytes in 2014, Volume is a relative concept Exponential growth makes hard Storage Management Analysis Phenome HPC and Storage Cloud (FranceGrille, EGI) Easy to use with a sort of «unlimited scalability» On-demand infrastructure and Elasticity (season) Virtualization technologies Data-Based parallelism (same operation on different data) 15

Data management challenges in Phenome: Variety Can be produce by differents communities (geneticians, ecophysiologists, farmers, breeders, etc) Data integration needs extensive connections to other types of data (genotypes, environments, experimental methods, etc.) Different semantics, data schemas, Can be associated in many ways (environments, individuals, populations, etc.) Extremely diverse data Web API, Ontology sets, NoSQL and Semantic Web methods 16

Data management in Phenome: Velocity Controlled platforms produce tens of thousands images/day (200 days per year) Field platforms produce tens of thousands images/day (100 days per year) Omics platforms produce tens of Gbytes/day (300 days per year) Scientific Workflow Galaxy OpenAlea /provenance module (Virtual Plant INRIA team) Scifloware (Zenith INRIA team) 17

Data management challenges in Phenome: Validity Data cleaning Automatically diagnose and manage: Consistency?, duplicate? Wrong? annotation consistency? Outliers? Disguised missing data?... Some approaches Unsupervised Curve clustering (Zenith INRIA team) Curve fitting over dynamic constrains Clustering of Image histograms 18

Conclusion High throughput phenotyping data: Hard to produce Hard to manage Also hard to analyse Thank you for your attention 19