Big Learning Data Management and Data Analysis



Big Learning Data Management and Data Analysis... for industrial applications
Thomas Natschläger, +43 7236 3343 868, thomas.natschlaeger@scch.at, www.scch.at
SCCH is an initiative of ... SCCH is located in ...

SCCH Key Facts
- Application-oriented research organization
- Initiated by institutes of the Johannes Kepler University Linz
- Cooperation between science and industry
- Non-profit organization constituted as Ltd
- Owners: Johannes Kepler University Linz, Upper Austrian Research GmbH, Association of Company Partners of SCCH
- ~60 employees (>80 with partners)
- EUR 5.7 million income incl. subsidies in business year 2010/2011
- Founded in July 1999 in the realm of the K plus Program
- Since 2008 COMET competence center

Research Topics
- Process and Quality Engineering: software engineering; software quality; process and approaches
- Rigorous Methods in Software Engineering: software specification, verification, validation; formal methods (ASM, Event-B, etc.); process modeling, workflows
- Models, Architectures and Tools: software architecture; model-based development; integration of architecture in development
- Knowledge-Based Vision Systems: machine vision; object recognition; object tracking
- Data Analysis Systems: automated and intelligent data analysis; prediction and optimization; knowledge discovery

Application Domains: DAS - Data Analysis Systems
Topics:
- Computational Models
- Semantic Knowledge Models
- Knowledge Discovery
- Machine Learning
- Stream Data Analysis
- Data Warehousing
- Data Management

Overview
- Temporal Analytics on Big Data
  - Applications
  - Fault Detection
  - Proposed Architecture
  - Related Work
- Learning Big Models
  - Causal Inference
  - Enabled by parallelization
  - Prediction and optimal control

Domain: Industrial Production
[Figure: system 1, system 2, ..., system i, ..., system n feeding into a PIMS]
- Subsystems generate streams of sensor data
- Stored in a Production Information Management System (PIMS)
- Analysis tasks: Quality Assurance, Process Optimization, Fault Detection, Fault Diagnosis, ...

Selected References
- voestalpine Stahl GmbH: analysis of continuous casting process; integration of expert knowledge; visual data mining, interpretation
- Böhler Edelstahl: quality analysis of high-grade steel production
- unisoftware plus: machine learning framework (mlf); basis for many projects in the area of process analysis
- Siemens Transformers Austria: optimization of power transformer cores
- Voith Paper, SCA Laakirchen: analysis and optimization in paper production; analysis tool PaperMiner
- AMS Engineering: knowledge discovery in discrete manufacturing; analysis of standstills, fault detection

Domain: Machine Manufacturer
[Figure: machines connected to a data center]
- Machines at different locations generate streams of sensor data
- Stored in a data center
- Analysis tasks: Usage Monitoring, Profile Analysis, Condition Monitoring, Fault Detection, Fault Diagnosis, ...

Domain: Decentralized Renewable Energy, Home Automation
[Figure: buildings connected to a data center]
- Sensors of different kinds at each building generate streams of sensor data: temperature, solar radiation, energy production, ...
- Analysis tasks: Usage Monitoring, Profile Analysis, Condition Monitoring, Fault Detection, Fault Diagnosis

Application: Fault Detection for Renewable Energy Units
- (Near) real-time detection of faults of units
  - naturally a temporal task => Data Stream Processing
- Profile analysis of units
  - needs access to all units => central application
  - large number of devices => Big Data
- Low false positive rate, i.e. a good model
  - needs a considerable amount of historical data, especially for long-term drifts => Big Data

Fault Detection Algorithms
A) Compare measured channels to a model
  - Deviations indicate a fault and its type
  - A good model needs to be identified (learned), typically using historical good data
B) Fit a known model type, e.g. ARX:
  $$y_t = \sum_k a_k\, y_{t-k} + \sum_{i,k} b_{i,k}\, x_i(t-k)$$
  - A poor goodness-of-fit coefficient indicates faults
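As an illustration of approach B, the following minimal sketch fits an ARX model by ordinary least squares and uses the coefficient of determination R^2 as the fit coefficient. It assumes a single input channel x, model order 1, synthetic data, a noise burst as the hypothetical fault, and an arbitrary threshold of 0.8; none of these choices come from the slides.

```python
import numpy as np

def fit_arx(y, x, order=1):
    """Least-squares fit of y_t = sum_k a_k*y_{t-k} + sum_k b_k*x_{t-k}.

    Returns (theta, r2) with theta = [a_1..a_order, b_1..b_order].
    """
    T = len(y)
    lags = range(1, order + 1)
    # Regressor matrix: lagged outputs first, then lagged inputs.
    Phi = np.column_stack(
        [y[order - k : T - k] for k in lags] + [x[order - k : T - k] for k in lags]
    )
    target = y[order:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    resid = target - Phi @ theta
    r2 = 1.0 - np.sum(resid**2) / np.sum((target - target.mean()) ** 2)
    return theta, r2

# Synthetic healthy unit (assumed dynamics: a_1 = 0.6, b_1 = 0.4).
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + 0.4 * x[t - 1] + 0.05 * rng.normal()

theta_ok, r2_ok = fit_arx(y, x)

# Hypothetical fault: the sensor becomes very noisy, so no ARX model fits well.
y_faulty = y + rng.normal(size=T)
_, r2_fault = fit_arx(y_faulty, x)

R2_THRESHOLD = 0.8  # assumed threshold, would need tuning on real data
fault_detected = r2_fault < R2_THRESHOLD
```

On healthy data the fitted coefficients recover the simulated dynamics and R^2 stays close to 1; on the noisy channel R^2 drops sharply, which is the signal approach B relies on.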

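Evaluated Solution
- Combination of Big Data Storage (BDS) for off-line MapReduce and a Stream Processing Engine (SPE) for on-line, real-time processing
[Figure: unit 1, unit 2, ..., unit i, ..., unit n feed a multiplexer (MUX), which forwards the data to both the SPE and the BDS]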

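Fault Detection Method A
- Compare measured channels to a model
- MapReduce is used to calibrate the model on historical data
- SPE applies the model in a user-defined operator (UDO)
- REPLAY for testing
[Figure: units 1..n feed the MUX, which forwards data to SPE and BDS; MapReduce on the BDS produces the model, which the SPE reads e.g. from an RDBMS; REPLAY feeds stored data from the BDS back to the SPE]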
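The division of labour in Method A can be sketched in plain Python. This is only an illustration under assumptions: the "model" is a per-unit mean and standard deviation, the map and reduce steps are ordinary functions rather than a real MapReduce job, and the user-defined operator is a generator with a hypothetical 3-sigma rule.

```python
import math

def map_record(record):
    """Map step: emit (unit, sufficient statistics) for one historical reading."""
    unit, value = record
    return unit, (1, value, value * value)

def combine(a, b):
    """Reduce step: sufficient statistics add component-wise."""
    return tuple(p + q for p, q in zip(a, b))

def calibrate(records):
    """Off-line calibration: per-unit mean and standard deviation."""
    stats = {}
    for unit, s in map(map_record, records):
        stats[unit] = combine(stats[unit], s) if unit in stats else s
    model = {}
    for unit, (n, s1, s2) in stats.items():
        mean = s1 / n
        var = max(s2 / n - mean * mean, 0.0)  # guard against rounding below zero
        model[unit] = (mean, math.sqrt(var))
    return model

def detect(stream, model, k=3.0):
    """On-line operator: yield readings deviating more than k sigma from the model."""
    for unit, value in stream:
        mean, std = model[unit]
        if abs(value - mean) > k * std:
            yield unit, value

# Historical "good" data per unit (made-up values).
history = [("u1", v) for v in (9.8, 10.0, 10.2, 9.9, 10.1)]
history += [("u2", v) for v in (50.0, 52.0, 48.0)]
model = calibrate(history)

# Live stream; replaying stored data through the same operator allows testing.
live = [("u1", 10.1), ("u1", 12.0), ("u2", 51.0), ("u2", 60.0)]
faults = list(detect(live, model))
```

Because the statistics are additive, the reduce step can run in any order and on any partitioning of the historical data, which is what makes the calibration suitable for MapReduce.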

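Fault Detection Method B
- Fit a known model structure to data
- BDS supplies historical data for testing via REPLAY
- SPE incrementally fits a certain kind of regression model
[Figure: units 1..n feed the MUX, which forwards data to SPE and BDS; the model is fitted inside the SPE; REPLAY feeds stored data from the BDS back to the SPE]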
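One common way to fit a regression model incrementally is recursive least squares (RLS). The slides do not say which algorithm was used, so the following is a sketch of the general idea only; the class name, the forgetting factor and the initialisation constant `delta` are assumptions.

```python
import numpy as np

class RecursiveLeastSquares:
    """Incremental linear regression: one O(d^2) update per sample."""

    def __init__(self, n_features, forgetting=1.0, delta=1000.0):
        self.w = np.zeros(n_features)        # current coefficient estimate
        self.P = np.eye(n_features) * delta  # scaled inverse covariance estimate
        self.lam = forgetting                # < 1.0 discounts old samples

    def update(self, x, y):
        """Process one sample and return the prediction error before the update."""
        x = np.asarray(x, dtype=float)
        err = y - self.w @ x
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        self.w = self.w + gain * err
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return err

# Synthetic stream with assumed true coefficients [2, -1].
rng = np.random.default_rng(0)
rls = RecursiveLeastSquares(n_features=2)
for _ in range(500):
    x = rng.normal(size=2)
    y = 2.0 * x[0] - 1.0 * x[1] + 0.01 * rng.normal()
    rls.update(x, y)
```

The error returned by `update` can serve as a running fit indicator: persistently large errors suggest that the assumed model structure no longer describes the unit.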

Stream Data Mining: Incremental Algorithms
1. Process an example at a time, and inspect it only once
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any time
Source: Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Philipp Kranen, Hardy Kremer, Timm Jansen, Thomas Seidl. Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings, Volume 11: Workshop on Applications of Pattern Analysis (2010).
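A very small example of an estimator that meets these four requirements is a running mean and variance computed with Welford's algorithm. It is given here only to make the requirements concrete; it is not taken from MOA, and the class name is hypothetical.

```python
class RunningStats:
    """Welford's algorithm: single pass, constant memory, usable at any time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        # Each example is inspected exactly once, in O(1) time.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; valid after any number of updates >= 1.
        return self._m2 / self.n if self.n else 0.0

stats = RunningStats()
for v in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(v)
```

Only three numbers are stored regardless of the stream length, and `mean` and `variance` can be read between any two updates.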

Stream Data Mining: Open Source Framework MOA
- MOA: Massive Online Analysis
- From the WEKA community, written in Java
- Big Data stream mining (classification, regression, and clustering) in real time
- Can easily be used with e.g. Hadoop
- Extendable with new mining algorithms
- Goal: provide a benchmark suite for the stream mining community
- http://moa.cms.waikato.ac.nz

Discussion
General setting:
- Units generate streams of sensor data (time, value)
- Central storage of data for analysis tasks
- Many analysis tasks are temporal in nature, e.g. fault detection
- Can be implemented with current technology without much effort
- REPLAY partially solves the problem of implementing algorithms for both MapReduce and SPE
Issues:
- Usage of multiple SPEs per machine, or a combiner
- Integration of existing incremental learning tools such as MOA

Related Work: TiMR Framework
- Combination of MapReduce (M-R) and SPE (DSMS)
- Temporal queries for off-line and on-line analysis
- Implemented using StreamInsight and SCOPE/Dryad
Reference: Badrish Chandramouli, Jonathan Goldstein, and Songyun Duan. 2012. Temporal Analytics on Big Data for Web Advertising. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering (ICDE '12). IEEE Computer Society, Washington, DC, USA.

Overview
- Temporal Analytics on Big Data
  - Applications
  - Failure detection
  - Proposed Architecture
  - Related Work
- Learning Big Models
  - Causal Inference
  - Enabled by parallelization
  - Prediction and optimal control

Causal Models for Prediction and Fault Detection
- Setting
  - Complex industrial process
  - Limited knowledge about interdependencies
- Goal
  - E.g. predict the amount of TOC (total organic carbon) in wastewater for the next 48 h
- Challenges
  - Robustness of model
  - Precision of model
  - Several thousand sensors => computational complexity
- Approach
  - Identify causal model structure
  - Use parallelization to tackle computational complexity

Base: Gaussian Graphical Models
- Linear model
- Various methods to estimate parameters
- Prominent method to estimate structure: Graphical Lasso (Friedman 2007, 2012), based on L1-regularized minimization of the negative log-likelihood
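For reference, the graphical lasso objective is usually written as follows. The notation is an assumption, not taken from the slides: S is the empirical covariance matrix of the sensor data, \Theta the precision (inverse covariance) matrix to be estimated, and \lambda the regularization strength.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda\,\lVert\Theta\rVert_1
```

A zero entry \Theta_{ij} means that variables i and j are conditionally independent given all others, so the sparsity pattern of \hat{\Theta} is the estimated graph structure; a larger \lambda yields a sparser graph.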

Extension to Time: Granger Causality
- X Granger-causes Y if it contains information useful in forecasting Y
- Implemented by graphical lasso on time-lagged variables
- Work in progress
  - Grouped Granger graphical lasso
  - Detection of control loops
  - Non-linear extensions => increased computational complexity
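The idea of sparse estimation on time-lagged variables can be sketched with a simplification: instead of the full graphical lasso, the example below runs one L1-regularized regression per target variable on the lagged values of all variables and reads non-zero coefficients as candidate Granger-causal links. The coordinate-descent solver, the synthetic data and the regularization strength are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial = y - X @ w + X[:, j] * w[j]  # residual ignoring feature j
            rho = X[:, j] @ partial / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

def lagged_design(Z, max_lag):
    """Features [Z(t-1), ..., Z(t-max_lag)] and targets Z(t) for a T x d series."""
    T = Z.shape[0]
    cols = [Z[max_lag - k : T - k] for k in range(1, max_lag + 1)]
    return np.hstack(cols), Z[max_lag:]

# Synthetic example: x drives y with a delay of one step, not vice versa.
rng = np.random.default_rng(1)
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=T - 1)

Z = np.column_stack([x, y])
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardise before penalising
F, targets = lagged_design(Z, max_lag=2)  # columns: x(t-1), y(t-1), x(t-2), y(t-2)

w_y = lasso_cd(F, targets[:, 1], lam=0.1)  # what helps to forecast y?
w_x = lasso_cd(F, targets[:, 0], lam=0.1)  # what helps to forecast x?
```

In this toy setting only the coefficient of x(t-1) in the regression for y survives the penalty, which corresponds to the single link "x Granger-causes y". With several thousand sensors and multiple lags, the per-target regressions are independent of each other and can be run in parallel.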

Parallelization of Machine Learning Algorithms
- MapReduce (see first part of talk)
  - good for data-parallel tasks; problems with iterative algorithms and complex dependencies in the data
- GraphLab (graphlab.org)
  - intuitively expresses computational dependencies
  - applied to dependent records, which are stored as vertices in a large distributed data graph
- GPGPU
  - complex low-level code (kernels), or:
  - high-level languages: SAC, Matlab, Mathematica, ...
  - meta-programming: PyCUDA / CL, ...
- Hardware-agnostic parallel patterns
  - esp. parallel patterns for machine learning

ParaPhrase
- High-level design and implementation patterns
  - useful parallelism for a wide range of parallel applications
  - heterogeneous multicore/manycore systems
- Hardware abstraction; basis: FastFlow framework (Turin, Pisa)
- General-purpose patterns: master-slave, farm, pipeline, work queue, data dependency
- Domain-specific patterns (SCCH, HLR Stuttgart)
  - suitability of generic patterns for machine learning
  - ML patterns: pool-oriented, graphical-model patterns, time series, ...
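To give a feeling for what such patterns look like to the programmer, here is a minimal Python sketch of a farm and a pipeline. FastFlow itself is a C++ framework with its own API; the function names below are hypothetical and only mimic the shape of the patterns, and the pipeline shown runs its stages sequentially rather than concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def farm(worker, tasks, n_workers=4):
    """Farm: apply `worker` to independent tasks in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, tasks))

def pipeline(*stages):
    """Pipeline: pass every item through the stages in order (lazily)."""
    def run(items):
        for stage in stages:
            items = map(stage, items)
        return items
    return run

# Two made-up processing stages.
def normalise(v):
    return v / 10.0

def score(v):
    return v * v

readings = [10, 20, 30]
sequential = list(pipeline(normalise, score)(readings))
parallel = farm(lambda v: score(normalise(v)), readings)
```

The application code only names the pattern and supplies the stage functions; how the work is mapped onto cores or accelerators is left to the pattern implementation, which is the point of hardware-agnostic patterns.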

Relevant Use Cases / Project Competencies (selection)
- TRUMPF Austria: improving precision of bending machines
- K-Projekt SoftNet (I + II): fault prediction in software systems; mining repositories
- K-Projekt PAC (Process Analytical Chemistry): virtual sensors for chemical process analysis and control
- BlueSky: locally optimized weather predictions; application: energy efficiency
- Verbund: prediction of available water flow to optimize renewable energy usage
- Based on the machine learning framework

Use Case: Local Weather Prediction
[Figure: map of Austria (Bregenz, Innsbruck, Salzburg, Linz, St. Pölten, Wien, Eisenstadt, Graz, Klagenfurt) with example plots for the steps data collection, analysis, models and prediction]
Data sources:
- Global weather models
- Local sensors: weather stations, power plants, ...
- Topography, expert knowledge
Goal:
- Planning of events, maintenance, ...
- Basis for optimization of energy usage

Optimization of Renewable Energy Usage
Inputs and models:
- Flow values, precipitation / temperature and forecast
- Snow melt, ground humidity (Holzmann & Nachtnebel 2002)
- Data-driven models (e.g. Ridge Regression, Neural Networks)
- Rainfall-Runoff-Model (Hebenstreit 2000)
- HYSIM II (Drabek et al. 2002)
- SAMBA: optimal weighting of all models
[Figure: map of hydropower plants along the Inn, Salzach, Drau, Enns, Mur and Danube rivers. Legend: run-of-river plants of AHP, storage plants of AHP, jointly operated plants of AHP, shareholdings of Verbund]
Goals (short term):
- Inclusion of the availability of renewable energy (water, wind, solar) in energy planning and trading
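The slides do not describe how SAMBA computes its weights. As a generic illustration of weighting several forecast models, the sketch below estimates combination weights by ordinary least squares on past forecasts; the function name and the synthetic data are assumptions.

```python
import numpy as np

def combination_weights(predictions, observed):
    """Weights w minimising ||observed - predictions @ w||^2.

    predictions: array of shape (n_times, n_models), one column per model.
    observed:    array of shape (n_times,), the measured flow values.
    """
    w, *_ = np.linalg.lstsq(predictions, observed, rcond=None)
    return w

# Made-up history: two models whose forecasts explain the flow 70/30.
rng = np.random.default_rng(0)
model_a = rng.normal(loc=100.0, scale=10.0, size=200)
model_b = rng.normal(loc=100.0, scale=10.0, size=200)
observed = 0.7 * model_a + 0.3 * model_b

weights = combination_weights(np.column_stack([model_a, model_b]), observed)
combined_forecast = np.column_stack([model_a, model_b]) @ weights
```

In practice one would typically constrain the weights (for example non-negative and summing to one) and re-estimate them as new observations arrive; both are omitted here for brevity.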

Summary
- Temporal Analytics on Big Data
  - Applications
  - Failure detection
  - Proposed Architecture
  - Related Work (MOA, TiMR)
- Learning Big Models
  - Causal Inference
  - Enabled by parallelization
  - Prediction and optimal control
- Use Cases

Event tip! Towards sustainable software quality with a suitable strategy: TRUST-IT
18 April, 09:00-14:00, Österreichische Computergesellschaft (Austrian Computer Society), Vienna
Target audience: software development managers, process owners, project managers, software quality engineers and software architects.
www.scch.at/de/trust-it-wien-programm

Contact
- DI Michael Zwick, +43 7236 3343 843, michael.zwick@scch.at, www.scch.at
- Dr. Thomas Natschläger, +43 7236 3343 868, thomas.natschlaeger@scch.at, www.scch.at
- Dr. Holger Schöner, +43 7236 3343 816, holger.schoener@scch.at, www.scch.at