Understanding Big Data Analytics Applications in Earth Science Morris Riedel, Rahul Ramachandran/Kuo Kwo-Sen, Peter Baumann Big Data Analytics



Similar documents
High Productivity Data Processing Analytics Methods with Applications

On Establishing Big Data Breakwaters

Classification Techniques in Remote Sensing Research using Smart Data Analytics

European Data Infrastructure - EUDAT Data Services & Tools

Selected Parallel and Scalable Methods for Scientific Big Data Analytics

Selected Parallel and Scalable Methods for Scientific Big Data Analytics

Scalable Developments for Big Data Analytics in Remote Sensing

Big Data Governance Certification Self-Study Kit Bundle

Agile Analytics on Extreme-Size Earth Science Data

Big Data Governance Certification Self-Study Kit Bundle

The Matsu Wheel: A Cloud-based Scanning Framework for Analyzing Large Volumes of Hyperspectral Data

CUSTOMER Presentation of SAP Predictive Analytics

EDISON Education for Data Intensive Science to Open New science frontiers

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

CRISP - DM. Data Mining Process. Process Standardization. Why Should There be a Standard Process? Cross-Industry Standard Process for Data Mining

Performing a data mining tool evaluation

Workprogramme

Database Marketing, Business Intelligence and Knowledge Discovery

Big Data in the context of Preservation and Value Adding

SCIENCE DATA ANALYSIS ON THE CLOUD

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Science: what is possible. Engineering: turn science into an everyday commodity (cheap, safe, reliable, resilient, )

Azure Machine Learning, SQL Data Mining and R

Cleaned Data. Recommendations

SAP Predictive Analytics: An Overview and Roadmap. Charles Gadalla, SESSION CODE: 603

Data Mining Applications in Higher Education

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University

CASC Spring Meeting 2014 Federal Agency Panel Update on Big Data

IBM's Fraud and Abuse, Analytics and Management Solution

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

OGC s Big Data Initiative

Big Data Analytics. Tools and Techniques

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

DEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY. MASTER in Informatics Engineering

Introduction to Data Mining

Conquering the Astronomical Data Flood through Machine

An Introduction to Advanced Analytics and Data Mining

Hadoop Cluster Applications

9360/15 FMA/AFG/cb 1 DG G 3 C

Information Management course

Studying Code Development for High Performance Computing: The HPCS Program

The Challenges of Geospatial Analytics in the Era of Big Data

Maschinelles Lernen mit MATLAB

Selected Big Data Analytics Methods in Health Research

HPC & Visualization. Visualization and High-Performance Computing

Satellite and ground-based remote sensing for rapid seismic vulnerability assessment M. Wieland, M. Pittore

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Architect Certification Self-Study Kit Bundle

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

Workspaces Concept and functional aspects

The? Data: Introduction and Future

RDA. RDA: Co-chairing - Big Data IG - Geospatial IG. OGC co-chairing: - editor, Big Geo Data stds - BigData.DWG

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Business Intelligence. Data Mining and Optimization for Decision Making

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems

Introduction to Big Data in HPC, Hadoop and HDFS Part One

Introduction. A. Bellaachia Page: 1

Using D2K Data Mining Platform for Understanding the Dynamic Evolution of Land-Surface Variables

8970/15 FMA/AFG/cb 1 DG G 3 C

Smart specialization: an agenda for scientific and policy action

Make Better Decisions Through Predictive Intelligence

The challenge of managing research data. Axel Berg

Outils pour l'analyse prédictive parallèle de multiples sources de données non structurées

Transforming the Telecoms Business using Big Data and Analytics

Integrating a Big Data Platform into Government:

SIMCA 14 MASTER YOUR DATA SIMCA THE STANDARD IN MULTIVARIATE DATA ANALYSIS

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

not possible or was possible at a high cost for collecting the data.

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

High-Throughput Computing for HPC

How To Understand And Understand A Problem

Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing

Data Isn't Everything

POSTGRAD PLACEMENTS. Placements are an integral part of the Masters programmes, so international students will not require additional work visas.

Big Data Infrastructures for Processing Sentinel Data

Search and Data Mining: Techniques. Introduction Anna Yarygina Boris Novikov

Elephant Meeting on Big Data Services

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Handling Heterogeneous EO Datasets via the Web Coverage Processing Service

HPC Wales Skills Academy Course Catalogue 2015

Financial Services and Technology Forum 30 September TOPIC: Economic Value of Data

Las Tecnologías de la Información y de la Comunicación en el HORIZONTE 2020

USE OF GEOSPATIAL AND WEB DATA FOR OECD STATISTICS

Manifest for Big Data Pig, Hive & Jaql

Turning Data into Actionable Insights: Predictive Analytics with MATLAB WHITE PAPER

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Big Data Executive Survey

Research Data Alliance: Current Activities and Expected Impact. SGBD Workshop, May 2014 Herman Stehouwer

Politecnico di Torino. Porto Institutional Repository

Standards for Big Data in the Cloud

Information Processing, Big Data, and the Cloud

Silviu Panica, Marian Neagul, Daniela Zaharie and Dana Petcu (Romania)

Scientific Big Data Analytics by HPC - Parallel and Scalable Machine Learning on JURECA

Data Analysis Bootcamp - What To Expect. Damian Herrick Founder, Principal Consultant Lake Hill Analytics, LLC

Leveraging Big Data Technologies to Support Research in Unstructured Data Analytics

Transcription:

Understanding Big Data Applications in Earth Science Morris Riedel, Rahul Ramachandran/Kuo Kwo-Sen, Peter Baumann Big Data Interest Group Co Chairs

are Needed in Big Data-driven Scientific Research The challenge is to understand which analytics make sense Terms & Motivation Understanding climate change, finding alternative energy sources, and preserving the health of an ageing population are all cross-disciplinary problems that require high-performance data storage, smart analytics, transmission and mining to solve. In the data-intensive scientific world, new skills are needed for creating, handling, manipulating, analysing, and making available large amounts of data for re-use by others. [2] A Surfboard for Riding the Wave Report [1] Riding the Wave Report Big Data How do we enable high productivity processing How do we find a message in the bottle

Understanding concepts & terminologies There are different views on the different terms so lets be concrete and show evidence and running code Data Analysis supports the search for causality Describing exactly WHY something is happening science Understanding causality is hard and time-consuming, but is necessary Searching it often leads us down the wrong paths Big Data is focussed on correlation Not focussed on causality enough THAT it is happening money/events Discover novel patterns and WHAT is happening more quickly Using correlations for invaluable insights often data speaks for itself Analysis is the in-depth interpretation of big data are powerful techniques to work on big data Parameter/event space exploration may use (1) analytics, then (2) analysis Pre-/Post-Process data with (1) analytics for deeper/faster (2) data analysis processing Morris Riedel et al., Understanding Big Data Applications in Earth Sciences, EGU 2014, Vienna, Austria 3/ 9

Understanding Applications & Technology Lighthouse goal : High Productivity Processing of Research Data Regression++ Scientific Computing Scientific Applications using Big Data Traditional Scientific Computing Methods HPC and HTC Paradigms & Parallelization Emerging Data Approaches Statistical Data Mining & Machine Learning Big Data Big Data Focus of our work Classification++ 4/ 9 Clustering++

RDA Group: Understand Concrete Solutions Develops community based recommendations on feasible data analytics approaches to address scientific community needs/problems of utilizing large quantities of data. Work with different scientific domain applications and their use of concrete big data analytics techniques what really works and runs to solve the solution IG: Bottom-up, dispersed and only slightly coordinated Sharing knowledge of analysis algorithms, analytical tools, data and resource characteristics and running code that works will be part of the recommendations. Morris Riedel et al., Understanding Big Data Applications in Earth Sciences, EGU 2014, Vienna, Austria 5/ 9

PRACE/XSEDE Earth Science Problem: Quality control via outlier detection with PANGAEA data Key PI: Dr. Robert Huber, MARUM, Bremen, Germany PANGAEA Problem: longitude/latitude/altitude correlations with IAGOS data Key PI: Dr. Owen Cooper, NOAA ESRL, US IAGOS Problem: Event tracking analytics with spatial computing datasets Key PI: Dr. Rahul Ramachandran, NASA MSFC, US EVENTS Problem: Continuous seismic waveforms analysis for earthquakes monitoring Key PI: Alberto Michelini, INGV, Italy SEISMIC Problem: Projecting & transforming geospatial big data into a common coordinate reference framework Key PI: Shaowen Wang, NCSA, US SCALE GIS Take advantage of e-infrastructure for automation and sharing of methods/data Morris Riedel et al., Understanding Big Data Applications in Earth Sciences, EGU 2014, Vienna, Austria 6/ 9

Increase Understanding with Applications Contributions Tackles bottom-up use cases that require big data analytics Provides a systematic classification of technology combinations Develops recommendations on feasible analytics approaches Offers best practice guides for researchers & concrete problems Concrete Application Implementations Selected use cases with concrete problems Event Outlier Detection Problem: event tracking analytics (e.g. understanding somali jets) Data sets from satellites ( events with changing geolocations ) Technologies: HPC/HTC (map-reduce), data-bases, several algorithms Status: review existing event tracking literature & algorithms Problem: automatic outlier detection in big data (PANGAEA) Data sets from time-series measurements (e.g. Koljoefjords, Sweden ) Technologies: HPC/HTC (map-reduce), R (outliers, RMPI) Status: CRISP-DM, investigating running code for outlier algorithms Morris Riedel et al., Understanding Big Data Applications in Earth Sciences, EGU 2014, Vienna, Austria 7/ 9

Findings: Parallelization & Big Data is Hard Problem: Automatic outlier detection for data quality Tailor solution for community Scalability towards Big Data Design and improve data analytics approaches Problem: Classification of buildings from multi-spectral images Enable smooth transition from manual Matlab SVM scripts Research on parallel SVM methods (map-reduce, HPC) Clustering++ Classification++ Regression++ Algorithm A Implementation Systematic guided by CRISP DM Algorithm Extension A Implementation Parallelization of Algorithm Extension A A implementations available implementations rare and/or not stable Morris Riedel et al., Understanding Big Data Applications in Earth Sciences, EGU 2014, Vienna, Austria 8/ 9

References [1] J. Wood et al., Riding the Wave How Europe can gain from the rising tide of scientific data, report to the European Commission, 2010 [2] Knowledge Exchange Partner, A Surfboard for Riding the Wave Towards a Four Action Country Programme on Research Data, 2011 [3] Research Data Alliance (RDA) Web Page, Online: https://rd-alliance.org/node [4] G. Fox, MPI and Map-Reduce, Talk at CCGSC 2010 Flat Rock,NC, 2010 Morris Riedel et al., Understanding Big Data Applications in Earth Sciences, EGU 2014, Vienna, Austria 9/ 9