Big Data Application Examples from Industry and Research
Dr. Patrick Traxler
+43 7236 3343 898
Patrick.traxler@scch.at
www.scch.at
Organizational Frame
Non-profit organization constituted as a limited liability company (Ltd)
Owners
  Johannes Kepler University Linz
  Upper Austrian Research GmbH
  Association of Company Partners of SCCH
~65 employees (>100 with partners)
5.6 million euros income incl. subsidies in business year 2012
Founded in July 1999 in the realm of the K plus Program
COMET competence center since January 2008
2007 Software Competence Center Hagenberg GmbH
Research Topics
Process and Quality Engineering
  software engineering, software quality, process and approaches
Rigorous Methods in Software Engineering
  formal methods, modeling, critical software components
Software Analytics and Evolution
  software architecture, model-based development, integration of architecture in development
Data Analysis Systems
  automated and intelligent data analysis, prediction, knowledge discovery
Knowledge-Based Vision Systems
  machine vision, object recognition, object tracking
Application Domains
Data Analysis Systems
Topics
  Computational Models
  Semantic Knowledge Models
  Knowledge Discovery
  Machine Learning
  Stream Data Analysis
  Data Warehousing
  Data Management
Overview
Internet of Things in industry
Industrial production
Machines & devices
A pattern for processing and analyzing industrial big data
Disaster management: NoSQL DWH integration
Research in computer science
Summary
Internet of Things in Industry
Coined as "Industrial Internet" by Evans & Annunziata, 2012 (General Electric)
Industrial Production
[Figure: Subsystem 1, Subsystem 2, Subsystem 3, ..., Subsystem i, ..., Subsystem n; PIMS]
Subsystems generate streams of sensor data
Complex interaction of subsystems
Stored in a production information management system (PIMS)
Analysis tasks
  Quality assurance
  Process optimization
  Fault detection
  Fault diagnosis
Machines & Devices
[Figure: Big data storage]
Machines at different locations generate streams of sensor data
Many machines or devices (spatial-temporal context)
Stored in big data storage
Analysis tasks
  Usage monitoring
  Condition monitoring
  Fault detection
  Fault diagnosis
  ...
A Pattern for Processing and Analyzing Industrial Big Data
Machines & devices, industrial production, building automation, smart grid (renewable energy)
Data = # Machines, devices, (sub)systems * # Time points * # Features
# Time points = Time period * Frequency
General Setting
  Units generate streams of sensor data (time, value)
  Central storage of data for analysis tasks
    E.g. model learning once a week
  Near real-time processing of data [using a learned model]
    E.g. immediately check for faults
    E.g. optimize control every second
[Figure: unit 1, unit 2, ..., unit i, ..., unit n; central storage]
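The two sizing formulas on this slide can be worked through with concrete numbers. The sketch below is only an illustration: the function names and the example figures (1,000 machines, 1 Hz sampling, 50 features, one year) are assumptions, not values from the talk.

```python
def time_points(period_seconds: float, frequency_hz: float) -> int:
    """# Time points = Time period * Frequency."""
    return round(period_seconds * frequency_hz)


def data_volume(units: int, period_seconds: float,
                frequency_hz: float, features: int) -> int:
    """Data = # Units * # Time points * # Features (number of stored values)."""
    return units * time_points(period_seconds, frequency_hz) * features


# Hypothetical fleet: 1,000 machines, one reading per second,
# 50 features per reading, observed for one (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 3600
values_per_year = data_volume(1000, SECONDS_PER_YEAR, 1.0, 50)
print(f"{values_per_year:,} values per year")
```

Even this modest configuration yields on the order of 10^12 values per year, which is why the number of units, the sampling frequency, and the feature count all have to be considered together.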
A Pattern for Processing and Analyzing Industrial Big Data
Combining Big Data Storages (BDS) and Stream Processing Engines (SPE)
  BDS: for offline data processing and analysis
  SPE: for online, near real-time data processing and analysis
[Figure labels: unit 1, unit 2, unit i, unit n, MUX, SPE, BDS, REPLAY, Model, MapReduce, "Read e.g. from RDBMS"]
Implemented by current technology without much effort
REPLAY partially solves the problem of different programming paradigms for SPEs (CQL) vs. BDSs (MapReduce)
Open problem: Usage of multiple SPEs per machine or combiner
Open problem: Integration of existing incremental learning tools such as MOA
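A minimal in-memory sketch of how the MUX, BDS, SPE, REPLAY and Model pieces could fit together. Everything here is an assumption made for illustration: the class names, the threshold-based fault model, and the use of a plain Python list in place of a real big data storage or stream processing engine.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Optional, Tuple

Reading = Tuple[str, float, float]  # (unit id, time, value)


@dataclass
class ThresholdModel:
    """Hypothetical model: flag values far from the historical mean."""
    center: float
    spread: float
    k: float = 3.0

    def is_fault(self, value: float) -> bool:
        return abs(value - self.center) > self.k * self.spread


class BigDataStorage:
    """Stand-in for the BDS: an in-memory list instead of a real store."""

    def __init__(self) -> None:
        self.rows: List[Reading] = []

    def append(self, reading: Reading) -> None:
        self.rows.append(reading)

    def learn_model(self) -> ThresholdModel:
        # Offline step (e.g. once a week): fit the model on all stored values.
        values = [value for _, _, value in self.rows]
        return ThresholdModel(center=mean(values), spread=pstdev(values))

    def replay(self, sink: Callable[[Reading], None]) -> None:
        # REPLAY: push stored readings, in time order, into a stream operator.
        for reading in sorted(self.rows, key=lambda r: r[1]):
            sink(reading)


class StreamProcessor:
    """Stand-in for the SPE: applies the current model to each reading."""

    def __init__(self, model: Optional[ThresholdModel] = None) -> None:
        self.model = model
        self.alerts: List[Reading] = []

    def on_reading(self, reading: Reading) -> None:
        if self.model is not None and self.model.is_fault(reading[2]):
            self.alerts.append(reading)


def mux(reading: Reading, sinks: List[Callable[[Reading], None]]) -> None:
    """MUX: deliver every incoming reading to all sinks (BDS and SPE)."""
    for sink in sinks:
        sink(reading)


bds = BigDataStorage()
spe = StreamProcessor()
sinks = [bds.append, spe.on_reading]

# Normal operation: values around 10.0, no model deployed yet.
for i in range(30):
    mux(("unit-1", float(i), 10.0 + (i % 3) * 0.1), sinks)

spe.model = bds.learn_model()       # offline learning on the BDS
mux(("unit-1", 30.0, 15.0), sinks)  # online check flags this outlier

# REPLAY: run the same stream logic over the stored history.
replayed = StreamProcessor(model=spe.model)
bds.replay(replayed.on_reading)
print(spe.alerts, replayed.alerts)
```

The point of the replay step in this sketch is that the stream operator (`on_reading`) is written once and used both online and on historical data, so the analysis logic does not have to be reimplemented in a second paradigm.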
Disaster Management: NoSQL DWH Integration
Use Case: INDYCO
National research project (FFG)
Development of a dynamic disaster management system
Situations derived from incoming sensor data
Dynamic workflows started depending on the current situation
Disaster Management: NoSQL DWH Integration
Motivation
Technology
  Increasing popularity of NoSQL (big) data storages
    Tabular/columnar data storages
    Document storages
    Graph storages
    Key/value storages
Problem
  Fast reaction to live data
  Wish to integrate into traditional business intelligence (BI) environments
Disaster Management: NoSQL DWH Integration
Use Case: INDYCO
Complex Event Processing (CEP) engine (Drools) to interpret (live) sensor data
Aggregated sensor data, detected situations, and executed workflows are loaded into the data warehouse (DWH) for ad-hoc analysis and possibly for long-term learning
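To make the idea of deriving a situation from live sensor data concrete, here is a small sketch of one CEP-style rule. It is written in plain Python and is not Drools syntax; the rule shape (a threshold exceeded for several consecutive readings), the situation name `high_water`, and the workflow name `notify_responders` are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional


class ConsecutiveThresholdRule:
    """Report a situation once `count` consecutive readings exceed `threshold`."""

    def __init__(self, situation: str, threshold: float, count: int) -> None:
        self.situation = situation
        self.threshold = threshold
        self.window: Deque[float] = deque(maxlen=count)

    def feed(self, value: float) -> Optional[str]:
        self.window.append(value)
        window_full = len(self.window) == self.window.maxlen
        if window_full and all(v > self.threshold for v in self.window):
            return self.situation
        return None


# Hypothetical mapping from detected situation to the workflow it starts.
WORKFLOWS: Dict[str, str] = {"high_water": "notify_responders"}


def process(values: Iterable[float], rule: ConsecutiveThresholdRule) -> List[str]:
    """Start a workflow each time the situation newly becomes active."""
    started: List[str] = []
    active = False
    for value in values:
        situation = rule.feed(value)
        if situation is not None and not active:
            started.append(WORKFLOWS[situation])
            active = True
        elif situation is None:
            active = False
    return started


rule = ConsecutiveThresholdRule("high_water", threshold=5.0, count=3)
started_workflows = process([1.0, 5.2, 5.4, 5.1, 5.6, 2.0, 5.3], rule)
print(started_workflows)
```

The `active` flag keeps the workflow from being started again on every reading while the situation persists, which is one simple way to separate "situation detected" from "workflow started".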
Disaster Management: NoSQL DWH Integration
Use Case: INDYCO
NoSQL database (MongoDB) to store live sensor data (and situations & workflows)
MapReduce to build aggregates
Aggregated sensor data, situations, and workflows are transferred into the DWH (SQL Server Integration Services)
Basis for traditional BI (Analysis/Reporting Services) and advanced analytics (RapidMiner, KNIME)
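The aggregation step can be sketched as a map function and a reduce function over sensor documents. This is plain Python that mimics the MapReduce structure; it does not call MongoDB, and the document fields (`sensor`, `ts`, `value`) and the hourly grouping are assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]        # (sensor id, hour bucket)
Partial = Dict[str, float]   # count, sum, min, max


def map_reading(doc: dict) -> Tuple[Key, Partial]:
    """Map: emit one partial aggregate per document, keyed by sensor and hour."""
    key = (doc["sensor"], doc["ts"] // 3600)
    value = doc["value"]
    return key, {"count": 1, "sum": value, "min": value, "max": value}


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce: merge two partial aggregates; associative and commutative."""
    return {
        "count": a["count"] + b["count"],
        "sum": a["sum"] + b["sum"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
    }


def aggregate(docs: Iterable[dict]) -> Dict[Key, Partial]:
    groups: Dict[Key, List[Partial]] = defaultdict(list)
    for doc in docs:
        key, partial = map_reading(doc)
        groups[key].append(partial)
    return {key: reduce(reduce_partials, partials)
            for key, partials in groups.items()}


docs = [
    {"sensor": "s1", "ts": 10, "value": 2.0},
    {"sensor": "s1", "ts": 20, "value": 4.0},
    {"sensor": "s1", "ts": 3700, "value": 6.0},
]
hourly = aggregate(docs)
for (sensor, hour), agg in sorted(hourly.items()):
    print(sensor, hour, agg["sum"] / agg["count"], agg["min"], agg["max"])
```

Each resulting row (sensor, hour, count, sum, min, max) has a fixed shape, which is what makes it straightforward to load into a relational DWH fact table.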
Research in computer science
> High redundancy
  > Conditional mean & median
  > Major progress in (MapReduce) algorithms
> Almost no redundancy
  > Web, knowledge, social graph
  > Very limited (MapReduce) algorithms
> Moderate redundancy
  > Flow & sensor networks
  > Mixed results???
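As a small illustration of why some statistics fit MapReduce more easily than others: a mean can be merged exactly from per-partition (sum, count) pairs, whereas a median cannot be recovered by simply combining per-partition medians and needs a dedicated algorithm. The partitioning and the numbers below are made up for the example.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical split of nine values across two workers.
partitions: List[List[float]] = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]


def partial_mean(values: List[float]) -> Tuple[float, int]:
    """Map side: a (sum, count) pair is enough to merge means exactly."""
    return sum(values), len(values)


def merge_means(partials: List[Tuple[float, int]]) -> float:
    """Reduce side: add up sums and counts, then divide once."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


all_values = [v for part in partitions for v in part]

merged_mean = merge_means([partial_mean(p) for p in partitions])
true_mean = sum(all_values) / len(all_values)

# Naive attempt: take the median of the per-partition medians.
naive_median = median(median(p) for p in partitions)
true_median = median(all_values)

print(merged_mean, true_mean)     # identical
print(naive_median, true_median)  # differ
```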
Summary
Big data in industrial applications
  Many machines or devices
  Complex facilities
  Temporal context (sensor data)
  Complex feature set
  Data = # Machines, devices, systems * # Time points * # Features
Industrial big data processing: Usually achievable with current big data technologies.
Industrial big data analysis: Depends strongly on the application. Does an efficient MapReduce algorithm exist?
Contact
Dr. Reinhard Stumptner
  Reinhard.stumptner@scch.at
  www.scch.at
Dr. Thomas Natschläger
  Thomas.natschlaeger@scch.at
  www.scch.at
Dr. Patrick Traxler
  Patrick.traxler@scch.at
  www.scch.at