Big Learning Data Management and Data Analysis

1 Big Learning: Data Management and Data Analysis ... for Industrial Applications
Thomas Natschläger

2 SCCH Key Facts
- application-oriented research organization initiated by institutes of the Johannes Kepler University Linz
- cooperation between science and industry
- non-profit organization constituted as a limited company (Ltd)
- owners: Johannes Kepler University Linz, Upper Austrian Research GmbH, Association of Company Partners of SCCH
- ~60 employees (>80 with partners)
- EUR 5.7 million income incl. subsidies in business year 2010/2011
- founded in July 1999 within the K plus program
- COMET competence center since 2008

3 Research Topics
- Process and Quality Engineering: software engineering; software quality; process and approaches
- Rigorous Methods in Software Engineering: software specification, verification, validation; formal methods (ASM, Event-B, etc.); process modeling, workflows
- Models, Architectures and Tools: software architecture; model-based development; integration of architecture in development
- Knowledge-Based Vision Systems: machine vision; object recognition; object tracking
- Data Analysis Systems: automated and intelligent data analysis; prediction and optimization; knowledge discovery

4 Application Domains: DAS (Data Analysis Systems)
Topics: Computational Models, Semantic Knowledge Models, Knowledge Discovery, Machine Learning, Stream Data Analysis, Data Warehousing, Data Management

6 Overview
Temporal Analytics on Big Data
- Applications: fault detection
- Proposed Architecture
- Related Work
Learning Big Models
- Causal Inference
- Enabled by parallelization
- Prediction and optimal control

8 Domain: Industrial Production
(Diagram: system 1, system 2, ..., system i, ..., system n, PIMS)
- subsystems generate streams of sensor data
- stored in a Production Information Management System (PIMS)
- Analysis tasks: quality assurance, process optimization, fault detection, fault diagnosis, ...

9 Selected References
- voestalpine Stahl GmbH: analysis of the continuous casting process; integration of expert knowledge; visual data mining, interpretation
- Böhler Edelstahl: quality analysis of high-grade steel production
- uni software plus: machine learning framework (mlf), the basis for many projects in the area of process analysis
- Siemens Transformers Austria: optimization of power transformer cores
- Voith Paper, SCA Laakirchen: analysis and optimization in paper production; analysis tool PaperMiner
- AMS Engineering: knowledge discovery in discrete manufacturing; analysis of standstills, fault detection

10 Domain: Machine Manufacturer
- machines at different locations generate streams of sensor data
- stored in a data center
- Analysis tasks: usage monitoring, profile analysis, condition monitoring, fault detection, fault diagnosis, ...

11 Domain: Decentralized Renewable Energy, Home Automation
- sensors of different kinds at each building generate streams of sensor data: temperature, solar radiation, energy production, ...
- stored in a data center
- Analysis tasks: usage monitoring, profile analysis, condition monitoring, fault detection, fault diagnosis

12 Application: Fault Detection for Renewable Energy Units
- (near) real-time detection of unit faults: a naturally temporal task => data stream processing
- profile analysis of units: needs access to all units => central application
- large number of devices => Big Data
- a low false positive rate requires a good model, which needs a considerable amount of historical data, especially to capture long-term drifts => Big Data

13 Fault Detection Algorithms
A) Compare measured channels to a model
- deviations indicate a fault and its type
- a good model needs to be identified (learned), typically from historical data recorded during fault-free operation
B) Fit a known model type, e.g. ARX: $y(t) = \sum_{k} a_k\, y(t-k) + \sum_{i,k} b_{i,k}\, x_i(t-k)$
- a poor fit (low coefficient of fitness) indicates a fault
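A minimal sketch of method B for a single input channel, assuming k ≥ 1 in both sums of the ARX model, ordinary least squares for the fit, and the coefficient of determination R² as the "coefficient of fitness". The function names, the simulated unit and the alarm threshold of 0.9 are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ theta = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * theta[c] for c in range(r + 1, n))
        theta[r] = (m[r][n] - s) / m[r][r]
    return theta


def fit_arx(y: Sequence[float], x: Sequence[float], na: int = 1, nb: int = 1) -> Tuple[List[float], float]:
    """Fit y(t) = sum_k a_k y(t-k) + sum_k b_k x(t-k), k >= 1, by least squares.

    Returns the coefficients [a_1..a_na, b_1..b_nb] and R^2 as the fitness."""
    lag = max(na, nb)
    rows, targets = [], []
    for t in range(lag, len(y)):
        rows.append([y[t - k] for k in range(1, na + 1)] + [x[t - k] for k in range(1, nb + 1)])
        targets.append(y[t])
    p = na + nb
    # Normal equations: (Phi^T Phi) theta = Phi^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(p)]
    theta = solve(ata, atb)
    residuals = [yt - sum(c * v for c, v in zip(theta, r)) for r, yt in zip(rows, targets)]
    mean = sum(targets) / len(targets)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((yt - mean) ** 2 for yt in targets)
    return theta, 1.0 - ss_res / ss_tot


# Simulated healthy unit that follows y(t) = 0.5 y(t-1) + 2.0 x(t-1) exactly.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.0]
for t in range(1, len(x)):
    y.append(0.5 * y[t - 1] + 2.0 * x[t - 1])
theta, fitness = fit_arx(y, x)

# Simulated faulty unit: same input, output corrupted by a large disturbance.
y_faulty = [v + rng.gauss(0.0, 5.0) for v in y]
_, fitness_faulty = fit_arx(y_faulty, x)

FITNESS_THRESHOLD = 0.9  # hypothetical alarm threshold
fault_detected = fitness_faulty < FITNESS_THRESHOLD
print(theta, fitness, fitness_faulty, fault_detected)
```

For many inputs or higher orders, a numerical library's least-squares solver would replace the hand-written `solve`.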

14 Evaluated Solution
Combination of
- Big Data Storage (BDS) for off-line processing with MapReduce, and
- a Stream Processing Engine (SPE) for on-line, real-time processing
(Diagram: units 1 ... n, a multiplexer (MUX), the SPE and the BDS)
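The slide does not say how the MUX combines the unit streams. A minimal sketch, assuming every unit emits time-ordered (timestamp, unit_id, value) records, merges them lazily into one time-ordered stream with `heapq.merge`, which is the kind of input an SPE operator would consume. The record layout and function names are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, str, float]  # (timestamp, unit_id, value); hypothetical layout


def unit_stream(unit_id: str, samples: List[Tuple[float, float]]) -> Iterator[Record]:
    """Wrap one unit's time-ordered (timestamp, value) samples as records."""
    for ts, value in samples:
        yield (ts, unit_id, value)


def mux(streams: Iterable[Iterator[Record]]) -> Iterator[Record]:
    """Lazily merge time-ordered unit streams into one time-ordered stream."""
    # heapq.merge holds only one pending record per input stream at a time.
    return heapq.merge(*streams, key=lambda rec: rec[0])


streams = [
    unit_stream("unit1", [(0.0, 20.1), (2.0, 20.4), (4.0, 20.3)]),
    unit_stream("unit2", [(1.0, 55.0), (3.0, 54.8)]),
]
merged = list(mux(streams))
for ts, unit, value in merged:
    print(ts, unit, value)
```

If a unit can deliver records out of order, a real MUX would additionally need a buffer or watermark, which this sketch leaves out.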

15 Fault Detection Method A: Compare measured channels to a model
- MapReduce is used to calibrate the model on historical data
- the SPE applies the model in a user-defined operator (UDO)
- REPLAY for testing, e.g. reading historical data from an RDBMS
(Diagram: units 1 ... n, MUX, SPE, REPLAY, model, BDS, MapReduce)
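A minimal sketch of method A, assuming the "model" is just a per-channel normal operating band (mean ± k standard deviations) learned from historical fault-free data. The map step summarizes each partition as (count, sum, sum of squares) per channel, the reduce step adds these partial sums, and a UDO applies the calibrated band to live records. `map_partition`, `reduce_stats`, `calibrate`, `band_udo` and k = 3 are hypothetical; a real deployment would run these functions inside a MapReduce job and an SPE rather than in plain Python.

```python
import math
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple

Stats = Tuple[int, float, float]  # (count, sum, sum of squares)


def map_partition(records: Iterable[Tuple[str, float]]) -> Dict[str, Stats]:
    """Map: summarize one partition of historical (channel, value) records."""
    acc: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for channel, value in records:
        s = acc[channel]
        s[0] += 1
        s[1] += value
        s[2] += value * value
    return {ch: (int(s[0]), s[1], s[2]) for ch, s in acc.items()}


def reduce_stats(a: Dict[str, Stats], b: Dict[str, Stats]) -> Dict[str, Stats]:
    """Reduce: partial sums are additive, so partitions combine in any order."""
    out = dict(a)
    for ch, (n, s, sq) in b.items():
        n0, s0, sq0 = out.get(ch, (0, 0.0, 0.0))
        out[ch] = (n0 + n, s0 + s, sq0 + sq)
    return out


def calibrate(stats: Dict[str, Stats]) -> Dict[str, Tuple[float, float]]:
    """Turn the combined sums into a (mean, std) model per channel."""
    model = {}
    for ch, (n, s, sq) in stats.items():
        mean = s / n
        var = max(sq / n - mean * mean, 0.0)  # population variance
        model[ch] = (mean, math.sqrt(var))
    return model


def band_udo(model: Dict[str, Tuple[float, float]], k: float = 3.0) -> Callable[[Tuple[float, str, float]], bool]:
    """UDO: flag a live record whose value leaves mean +/- k*std of its channel."""
    def apply(record: Tuple[float, str, float]) -> bool:
        _, channel, value = record
        mean, std = model[channel]
        return abs(value - mean) > k * std
    return apply


partitions = [
    [("temp", 20.0), ("temp", 21.0), ("flow", 5.0)],
    [("temp", 19.0), ("temp", 20.0), ("flow", 5.2), ("flow", 4.8)],
]
model = calibrate(reduce(reduce_stats, map(map_partition, partitions)))
is_fault = band_udo(model)
live = [(100.0, "temp", 20.5), (101.0, "temp", 35.0), (102.0, "flow", 5.1)]
alarms = [rec for rec in live if is_fault(rec)]
print(model, alarms)
```

The sum-of-squares formula can lose precision for large values with small spread; combining (count, mean, M2) triples with Chan et al.'s parallel update is a more robust choice.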

16 Fault Detection Method B: Fit a known model structure to the data
- the BDS supplies historical data for testing via REPLAY
- the SPE incrementally fits a certain kind of regression model
(Diagram: units 1 ... n, MUX, SPE, model, REPLAY, BDS)
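The slide leaves the incremental regression method open. One common choice, assumed here rather than taken from the talk, is recursive least squares (RLS) with a forgetting factor: each new sample updates the coefficients in O(p²) time and memory without revisiting history. The sketch fits a first-order ARX structure y(t) = a·y(t−1) + b·x(t−1); the class name, the forgetting factor 0.99 and the simulated data are hypothetical.

```python
import random
from typing import List


class RecursiveLeastSquares:
    """Incrementally fit y ~ theta . phi, one sample at a time (RLS with forgetting)."""

    def __init__(self, p: int, forgetting: float = 0.99, delta: float = 1000.0) -> None:
        self.lam = forgetting
        self.theta = [0.0] * p
        # P tracks the inverse of the (exponentially weighted) regressor covariance.
        self.P = [[delta if i == j else 0.0 for j in range(p)] for i in range(p)]

    def update(self, phi: List[float], y: float) -> float:
        """Consume one sample and return the a-priori prediction error."""
        p = len(phi)
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(p)) for i in range(p)]
        denom = self.lam + sum(phi[i] * P_phi[i] for i in range(p))
        gain = [v / denom for v in P_phi]
        error = y - sum(t * f for t, f in zip(self.theta, phi))
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P <- (P - gain * phi^T P) / lambda; P is symmetric, so phi^T P = (P phi)^T.
        self.P = [[(self.P[i][j] - gain[i] * P_phi[j]) / self.lam for j in range(p)]
                  for i in range(p)]
        return error


# Simulated unit: y(t) = 0.7 y(t-1) + 1.5 x(t-1) + small noise, processed as a stream.
rng = random.Random(1)
rls = RecursiveLeastSquares(p=2)
y_prev, x_prev = 0.0, 0.0
for _ in range(2000):
    x_now = rng.gauss(0.0, 1.0)
    y_now = 0.7 * y_prev + 1.5 * x_prev + rng.gauss(0.0, 0.01)
    rls.update([y_prev, x_prev], y_now)
    y_prev, x_prev = y_now, x_now
a_hat, b_hat = rls.theta
print(a_hat, b_hat)
```

Coefficients that drift away from their calibrated values, or persistently large prediction errors, can then serve as a fault indicator. With a forgetting factor below 1, P can grow without bound when the input is not persistently exciting, so production code usually bounds it.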

17 Stream Data Mining: Incremental Algorithms
1. Process one example at a time, and inspect it only once
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any time
Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen, Thomas Seidl. Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings, Volume 11: Workshop on Applications of Pattern Analysis (2010).
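As an illustration of the four requirements (not an algorithm from the cited work), Welford's online algorithm for mean and variance inspects each example once, keeps three numbers regardless of stream length, does constant work per example, and can report its current estimate at any time. The class and method names are hypothetical.

```python
import math


class RunningStats:
    """Welford's one-pass estimator of mean and variance."""

    def __init__(self) -> None:
        self.n = 0       # number of examples seen
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # sum of squared deviations from the current mean

    def learn_one(self, x: float) -> None:
        """Requirements 1-3: look at x once, O(1) memory and O(1) time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Requirement 4: an estimate (sample variance) is available at any time."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        """Score a new example against what has been learned so far."""
        sd = math.sqrt(self.variance())
        return (x - self.mean) / sd if sd > 0 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.learn_one(value)
print(stats.mean, stats.variance(), stats.zscore(9.0))
```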

18 Stream Data Mining: Open Source Framework MOA
- MOA: Massive Online Analysis
- from the WEKA community, written in Java
- Big Data stream mining (classification, regression, and clustering) in real time
- can easily be used with, e.g., Hadoop
- extendable with new mining algorithms
- Goal: provide a benchmark suite for the stream mining community

19 Discussion
General setting:
- units generate streams of sensor data (time, value)
- data are stored centrally for analysis tasks
- many analysis tasks are temporal in nature, e.g. fault detection
- this can be implemented with current technology without much effort
- REPLAY partially solves the problem of implementing algorithms for both MapReduce and the SPE
Issues:
- usage of multiple SPEs per machine or combiner
- integration of existing incremental learning tools such as MOA
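REPLAY is described only at slide level. A minimal sketch of the idea, assuming REPLAY simply re-emits stored (timestamp, unit_id, value) records in time order: the same stateful operator then runs on historical data (for testing) and on the live stream, so the algorithm is written once. `replay`, `JumpDetector`, `run` and the jump limit are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[float, str, float]  # (timestamp, unit_id, value); hypothetical layout


class JumpDetector:
    """Stateful operator: flag a value that jumps by more than `limit` from the unit's previous value."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.last: Dict[str, float] = {}

    def process(self, record: Record) -> bool:
        _, unit, value = record
        prev = self.last.get(unit)
        self.last[unit] = value
        return prev is not None and abs(value - prev) > self.limit


def replay(stored: Iterable[Record]) -> Iterator[Record]:
    """Re-emit stored records in timestamp order, as if they were arriving live.

    Sorting in memory is a simplification; a real REPLAY would stream from storage."""
    yield from sorted(stored, key=lambda rec: rec[0])


def run(operator: JumpDetector, stream: Iterable[Record]) -> List[Record]:
    """Drive an operator over any stream and collect the flagged records."""
    return [rec for rec in stream if operator.process(rec)]


# Off-line: test the operator on historical records pulled from storage (unordered here).
history = [(2.0, "u1", 10.4), (0.0, "u1", 10.0), (1.0, "u1", 10.2), (3.0, "u1", 14.0)]
offline_alarms = run(JumpDetector(limit=1.0), replay(history))

# On-line: the unchanged operator consumes live records, here a plain iterator.
live = iter([(4.0, "u1", 14.1), (5.0, "u1", 9.0)])
online_alarms = run(JumpDetector(limit=1.0), live)
print(offline_alarms, online_alarms)
```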

20 Related Work: TiMR Framework
- combination of MapReduce (M-R) and an SPE (DSMS)
- temporal queries for off-line and on-line processing
- implemented using StreamInsight and SCOPE/Dryad
Badrish Chandramouli, Jonathan Goldstein, and Songyun Duan. Temporal Analytics on Big Data for Web Advertising. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering (ICDE '12). IEEE Computer Society, Washington, DC, USA.

21 Overview
Temporal Analytics on Big Data
- Applications: failure detection
- Proposed Architecture
- Related Work
Learning Big Models
- Causal Inference
- Enabled by parallelization
- Prediction and optimal control

22 Causal Models for Prediction and Fault Detection
Setting:
- complex industrial process
- limited knowledge about interdependencies
Goal:
- e.g. predict the amount of TOC (total organic carbon) in wastewater for the next 48 h
Challenges:
- robustness of the model
- precision of the model
- several thousand sensors => computational complexity
Approach:
- identify the causal model structure
- use parallelization to tackle the computational complexity

23 Base: Gaussian Graphical Models
- linear model
- various methods to estimate the parameters
- prominent method to estimate the structure: Graphical Lasso (Friedman 2007, 2012), based on L1-regularized maximization of the log-likelihood
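For reference, the graphical lasso estimates a sparse precision (inverse covariance) matrix Θ from the empirical covariance S of the sensor data by maximizing the L1-penalized Gaussian log-likelihood. λ ≥ 0 controls sparsity, and a zero entry Θ_jk means that sensors j and k are conditionally independent given all other sensors, i.e. there is no edge between them in the graph:

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \log\det\Theta \;-\; \operatorname{tr}(S\Theta) \;-\; \lambda \lVert \Theta \rVert_1,
\qquad \lVert \Theta \rVert_1 = \sum_{j,k} \lvert \Theta_{jk} \rvert
```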

24 Extension to Time: Granger Causality
- X Granger-causes Y if past values of X contain information useful for forecasting Y beyond that contained in the past values of Y itself
- implemented by a graphical lasso on time-lagged variables
Work in progress:
- Grouped Granger Graphical Lasso
- detection of control loops
- non-linear extensions => increased computational complexity
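The slide implements Granger causality with a graphical lasso on time-lagged variables. A simpler, related sketch (a swapped-in technique, not the method from the talk) regresses the current value of one target series on the lagged values of all series, including the target's own past, with an L1 penalty solved by cyclic coordinate descent: a lagged series whose coefficient stays non-zero is a candidate Granger cause of the target. One lag, the penalty λ = 0.1, the simulated series and all function names are assumptions.

```python
import random
from typing import Dict, List


def standardize(col: List[float]) -> List[float]:
    """Scale a column to mean 0 and (population) variance 1."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def soft_threshold(z: float, lam: float) -> float:
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float, sweeps: int = 100) -> List[float]:
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is given column-wise; every column must be standardized."""
    n, p = len(y), len(X)
    beta = [0.0] * p
    residual = y[:]  # y - X beta, with beta = 0
    for _ in range(sweeps):
        for j in range(p):
            col = X[j]
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                shift = new - beta[j]
                residual = [r - shift * c for r, c in zip(residual, col)]
                beta[j] = new
    return beta


def granger_lasso(series: Dict[str, List[float]], target: str, lam: float = 0.1) -> Dict[str, float]:
    """Regress target(t) on every series at t-1; return one coefficient per lagged series."""
    names = sorted(series)
    X = [standardize(series[name][:-1]) for name in names]  # values at t-1
    y_raw = series[target][1:]                                # values at t
    mean = sum(y_raw) / len(y_raw)
    y = [v - mean for v in y_raw]  # centering replaces an intercept
    return dict(zip(names, lasso_cd(X, y, lam)))


# Simulated sensors: y is driven by x one step earlier, z is unrelated.
rng = random.Random(42)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.8 * x[t - 1] + rng.gauss(0.0, 0.1))
coefs = granger_lasso({"x": x, "y": y, "z": z}, target="y")
causes = [name for name, b in coefs.items() if b != 0.0]
print(coefs, causes)
```

Repeating the regression with every series as the target yields a directed graph of candidate Granger causes; each target is independent, which makes this step easy to parallelize.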

26 Parallelization of Machine Learning Algorithms
MapReduce (see first part of talk)
- good for data-parallel problems
- problems with iterative algorithms and complex dependencies in the data
GraphLab (graphlab.org)
- intuitively expresses computational dependencies
- applied to dependent records, which are stored as vertices in a large distributed data graph
GPGPU
- complex low-level code (kernels), or
- high-level languages: SAC, Matlab, Mathematica, ...
- meta-programming: PyCUDA / CL, ...
Hardware-agnostic parallel patterns
- esp. parallel patterns for machine learning

27 ParaPhrase
- high-level design and implementation patterns
- useful parallelism for a wide range of parallel applications on heterogeneous multicore/manycore systems
- hardware abstraction; basis: FastFlow framework (Turin, Pisa)
General-purpose patterns:
- master-slave, farm, pipeline, work queue, data dependency
Domain-specific patterns (SCCH, HLR Stuttgart):
- suitability of generic patterns for machine learning
- ML patterns: pool-oriented, graphical-model patterns, time series, ...
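The farm pattern listed above (a pool of identical workers processing independent tasks) maps naturally onto machine-learning workloads such as fitting one model per production unit. A minimal sketch with Python's `concurrent.futures`, which is neither FastFlow nor a ParaPhrase API: the emitter submits one task per unit, workers fit a model independently, and the collector gathers the results. The per-unit "model" (a least-squares slope through the origin) and all names are hypothetical; for CPU-bound work in CPython a `ProcessPoolExecutor` would replace the thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Task = Tuple[str, List[float], List[float]]  # (unit_id, inputs, outputs)


def fit_slope(task: Task) -> Tuple[str, float]:
    """Worker: fit y ~ slope * x through the origin for one unit (independent task)."""
    unit, xs, ys = task
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return unit, slope


def farm(tasks: List[Task], workers: int = 4) -> Dict[str, float]:
    """Emitter -> pool of identical workers -> collector."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_slope, tasks))


tasks = [
    ("unit1", [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    ("unit2", [1.0, 2.0], [0.5, 1.0]),
    ("unit3", [2.0, 4.0], [6.0, 12.0]),
]
slopes = farm(tasks)
print(slopes)
```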

28 Relevant Use Cases / Project Competencies (selection)
- TRUMPF Austria: improving the precision of bending machines
- K-Project SoftNet (I + II): fault prediction in software systems; mining repositories
- K-Project PAC (Process Analytical Chemistry): virtual sensors for chemical process analysis and control
- BlueSky: locally optimized weather predictions
Application: energy efficiency
- Verbund: prediction of available water flow to optimize renewable energy usage, based on the machine learning framework

29 Use Case: Local Weather Prediction
(Map of Austria: Bregenz, Innsbruck, Salzburg, Linz, St. Pölten, Wien, Eisenstadt, Graz, Klagenfurt)
Stages: data collection, analysis, prediction
- data sources: global weather models; local sensors (weather stations, power plants, ...); topography; expert knowledge
- models
- goal: planning of events, maintenance, ...; basis for optimization of energy usage

30 Optimization of Renewable Energy Usage
- flow values, precipitation / temperature and forecasts
- snow melt, ground humidity (Holzmann & Nachtnebel 2002)
- data-driven models (e.g. ridge regression, neural networks)
- rainfall-runoff model (Hebenstreit 2000)
- HYSIM II (Drabek et al. 2002)
- SAMBA: optimal weighting of all models
(Map of the hydropower plants on the Inn, Salzach, Drau, Enns, Mur and Danube (Donau). Legend: run-of-river plants of AHP, storage plants of AHP, joint plants of AHP, Verbund holdings.)
Goal (short term): inclusion of the availability of renewable energy (water, wind, solar) in energy planning and trading
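The slides do not describe how SAMBA weights the models. As a hedged illustration of "optimal weighting of all models", the sketch below combines flow forecasts with weights proportional to the inverse of each model's mean squared error on a validation period, one simple and common choice. Model names and numbers are illustrative values, not data from the talk.

```python
from typing import Dict, List


def inverse_mse_weights(forecasts: Dict[str, List[float]], observed: List[float]) -> Dict[str, float]:
    """Weight each model by 1/MSE on a validation period, normalized to sum to 1."""
    inv = {}
    for name, pred in forecasts.items():
        mse = sum((p - o) ** 2 for p, o in zip(pred, observed)) / len(observed)
        inv[name] = 1.0 / mse
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def combine(weights: Dict[str, float], new_forecasts: Dict[str, float]) -> float:
    """Weighted average of the models' forecasts for a new time step."""
    return sum(weights[name] * value for name, value in new_forecasts.items())


# Illustrative validation data (hypothetical), e.g. river flow in m^3/s.
observed = [100.0, 110.0, 120.0]
validation = {
    "ridge": [102.0, 108.0, 122.0],           # errors 2, -2, 2 -> MSE 4
    "rainfall_runoff": [96.0, 114.0, 116.0],  # errors -4, 4, -4 -> MSE 16
}
weights = inverse_mse_weights(validation, observed)
flow_forecast = combine(weights, {"ridge": 130.0, "rainfall_runoff": 125.0})
print(weights, flow_forecast)
```

A stacked regression (least squares on the models' outputs) is a common alternative when the models' errors are strongly correlated.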

31 Summary
Temporal Analytics on Big Data
- Applications: failure detection
- Proposed Architecture
- Related Work (MOA, TiMR)
Learning Big Models
- Causal Inference
- Enabled by parallelization
- Prediction and optimal control
- Use cases

32 Event Tip! With the Right Strategy to Sustainable Software Quality: TRUST-IT
18 April, 09:00-14:00, Österreichische Computergesellschaft, Vienna
Target audience: software development managers, process owners, project managers, software quality engineers and those responsible for architecture.

33 Contact
DI Michael Zwick, Dr. Thomas Natschläger, Dr. Holger Schöner


More information

The Database Systems and Information Management Group at Technische Universität Berlin

The Database Systems and Information Management Group at Technische Universität Berlin Group at Technische Universität Berlin 1 Introduction Group, in German known by the acronym DIMA, is part of the Department of Software Engineering and Theoretical Computer Science at the TU Berlin. It

More information

Executive Briefing White Paper Plant Performance Predictive Analytics

Executive Briefing White Paper Plant Performance Predictive Analytics Executive Briefing White Paper Plant Performance Predictive Analytics A Data Mining Based Approach Abstract The data mining buzzword has been floating around the process industries offices and control

More information

Data Mining Analysis of a Complex Multistage Polymer Process

Data Mining Analysis of a Complex Multistage Polymer Process Data Mining Analysis of a Complex Multistage Polymer Process Rolf Burghaus, Daniel Leineweber, Jörg Lippert 1 Problem Statement Especially in the highly competitive commodities market, the chemical process

More information

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur

Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur Echtzeittesten mit MathWorks leicht gemacht Simulink Real-Time Tobias Kuschmider Applikationsingenieur 2015 The MathWorks, Inc. 1 Model-Based Design Continuous Verification and Validation Requirements

More information

Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better."

Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better. Big Data, Physics, and the Industrial Internet! How Modeling & Analytics are Making the World Work Better." Matt Denesuk! Chief Data Science Officer! GE Software! October 2014! Imagination at work. Contact:

More information

Big Data Means at Least Three Different Things. Michael Stonebraker

Big Data Means at Least Three Different Things. Michael Stonebraker Big Data Means at Least Three Different Things. Michael Stonebraker The Meaning of Big Data - 3 V s Big Volume With simple (SQL) analytics With complex (non-sql) analytics Big Velocity Drink from a fire

More information

Industrial Roadmap for Connected Machines. Sal Spada Research Director ARC Advisory Group sspada@arcweb.com

Industrial Roadmap for Connected Machines. Sal Spada Research Director ARC Advisory Group sspada@arcweb.com Industrial Roadmap for Connected Machines Sal Spada Research Director ARC Advisory Group sspada@arcweb.com Industrial Internet of Things (IoT) Based upon enhanced connectivity of this stuff Connecting

More information

On a Hadoop-based Analytics Service System

On a Hadoop-based Analytics Service System Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network

More information

UPS battery remote monitoring system in cloud computing

UPS battery remote monitoring system in cloud computing , pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

More information

Azure Data Lake Analytics

Azure Data Lake Analytics Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data

More information

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Analyst @ Expedia The Impact of Big Data on Classic Machine Learning Algorithms Thomas Jensen, Senior Business Analyst @ Expedia Who am I? Senior Business Analyst @ Expedia Working within the competitive intelligence unit

More information

Master of Science in Computer Science

Master of Science in Computer Science Master of Science in Computer Science Background/Rationale The MSCS program aims to provide both breadth and depth of knowledge in the concepts and techniques related to the theory, design, implementation,

More information

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant Arvind.Hosagrahara@mathworks.

MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant Arvind.Hosagrahara@mathworks. MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant Arvind.Hosagrahara@mathworks.com 310-819-3970 2014 The MathWorks, Inc. 1 Outline Problem Statement The Big Picture

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

Holger Eichelberger, Cui Qin, Klaus Schmid, Claudia Niederée

Holger Eichelberger, Cui Qin, Klaus Schmid, Claudia Niederée Adaptive Application Performance Management for Holger Eichelberger, Cui Qin, Klaus Schmid, Claudia Niederée {eichelberger, schmid,qin}@sse.uni-hildesheim.de niederee@l3s.de Contents Contents Motivation

More information

ebook Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry.

ebook Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry. Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry. www.persistent.com 3 4 5 5 7 9 10 11 12 13 From the Vantage Point

More information

Building Energy Management: Using Data as a Tool

Building Energy Management: Using Data as a Tool Building Energy Management: Using Data as a Tool Issue Brief Melissa Donnelly Program Analyst, Institute for Building Efficiency, Johnson Controls October 2012 1 http://www.energystar. gov/index.cfm?c=comm_

More information

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco.

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner. kwaehner@tibco. April 2016 JPoint Moscow, Russia How to Apply Big Data Analytics and Machine Learning to Real Time Processing Kai Wähner kwaehner@tibco.com @KaiWaehner www.kai-waehner.de LinkedIn / Xing Please connect!

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Prerequisites. Course Outline

Prerequisites. Course Outline MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,

More information

AA Automated Attendant is a device connected to voice mail systems that answers and may route incoming calls or inquiries.

AA Automated Attendant is a device connected to voice mail systems that answers and may route incoming calls or inquiries. CRM Glossary Guide AA Automated Attendant is a device connected to voice mail systems that answers and may route incoming calls or inquiries. ABANDON RATE Abandon Rate refers to the percentage of phone

More information

RiMONITOR. Monitoring Software. for RIEGL VZ-Line Laser Scanners. Ri Software. visit our website www.riegl.com. Preliminary Data Sheet

RiMONITOR. Monitoring Software. for RIEGL VZ-Line Laser Scanners. Ri Software. visit our website www.riegl.com. Preliminary Data Sheet Monitoring Software RiMONITOR for RIEGL VZ-Line Laser Scanners for stand-alone monitoring applications by autonomous operation of all RIEGL VZ-Line Laser Scanners adaptable configuration of data acquisition

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction

Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction Solar Irradiance Forecasting Using Multi-layer Cloud Tracking and Numerical Weather Prediction Jin Xu, Shinjae Yoo, Dantong Yu, Dong Huang, John Heiser, Paul Kalb Solar Energy Abundant, clean, and secure

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Big Data - Infrastructure Considerations

Big Data - Infrastructure Considerations April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

More information

MEng, BSc Computer Science with Artificial Intelligence

MEng, BSc Computer Science with Artificial Intelligence School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give

More information

Big Data in Subsea Solutions

Big Data in Subsea Solutions Big Data in Subsea Solutions Subsea Valley Conference 2014 Telenor Arena, Fornebu, April 2-3 Roar Fjellheim, Computas AS Computas AS - Brief company profile Norwegian IT consulting company providing services

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA ABSTRACT Current trends in data mining allow the business community to take advantage of

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information