KNIME Open Source Days 2012. Sep 3-7, Konstanz, Germany

Mind Era - who are we?
Mind Eratosthenes Kft., Budapest, Hungary, mind-era.com
Katalin Bakos: CEO (sister)
Gábor Bakos: mathematician, software engineer (brother)
KOS Days 2012

RapidMiner, HiTS - what is it?
RapidMiner: another open source framework for data mining. We integrated it into KNIME; it works like a metanode.
HiTS: a set of nodes to help with data analysis of High Throughput/Content Screenings. It contains nodes to perform cellHTS2 transformations, visualize data, and transform data, plus a failed experiment to handle/search images using Bio-Formats.

RapidMiner, HiTS - highlights
RapidMiner Node: allows executing and editing RapidMiner workflows (processes).
RapidMiner Viewer Node: helps visualize data.
HiTS nodes: Leaf Ordering, Reverse Order, Sort by Cluster, Dendrogram with Heatmap, Simple Heatmap, Rank, Direct Product, Merge (a kind of antisort), Pivot, Unpivot, Subsets

STARK
Joint initiative: KNIME + PASCAL2
Prof. José L. Balcázar (UC, now UPC): proposer and part-time programmer
Personnel from Universidad de Cantabria:
Javier de la Dehesa (senior undergrad, now grad student; coded most of it)
Diego García-Sáiz (grad student)
Cristina Tîrnauca (post-doc)

STARK - what is it?
Self-Tuning Association Rules for KNIME
A KNIME node that performs association rule mining with very low configuration needs.
Tuning support and choosing rule interest measures are very difficult tasks for end users.
We proposed a self-tuning approach: decreasing support traversal, confidence boost.
Prototype in Python: yacaree.sf.net
Now: porting it into KNIME.
Will try to sell it to you all these days...
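To make concrete what an end user normally has to tune, here is a minimal sketch of the two classical thresholds in association rule mining, support and confidence. It is not the yacaree or STARK algorithm (the slides give no details of the decreasing support traversal or the confidence boost); the function names, the toy transactions, and the restriction to single-item rules are all assumptions made for illustration.

```python
from itertools import permutations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    joint = support(frozenset(antecedent) | frozenset(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base > 0 else 0.0

def rules(transactions, min_support, min_confidence):
    """Enumerate rules a -> b between single items that pass both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        c = confidence({a}, {b}, transactions)
        if c >= min_confidence:
            found.append((a, b, s, c))
    return found

# Hypothetical toy data, not taken from the slides.
TRANSACTIONS = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

if __name__ == "__main__":
    for a, b, s, c in rules(TRANSACTIONS, min_support=0.5, min_confidence=0.9):
        print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Both min_support and min_confidence have to be chosen by hand here, and small changes to either can change the rule set considerably. That is the configuration burden a self-tuning approach aims to remove.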

Current status
The Yacaree node exists now.
The confidence boost handling needs a bit of improvement.
The usage is a bit complicated.
BUT: the Python version went ahead, so the KNIME node is a bit behind.
Algorithms have advanced even further conceptually.
Trying to catch up this week!

GenericWorkflowNodes for SeqAn and OpenMS
Freie Universität Berlin
Prof. Knut Reinert: head of the Algorithmic Bioinformatics group
Stephan Aiche: Research Associate
Björn Kahlert: Research Associate

GenericWorkflowNodes for SeqAn/OpenMS - what is it?
GenericWorkflowNodes: wrap existing tools into KNIME nodes.
SeqAn/OpenMS: open source frameworks for sequence analysis and analysis of mass spectrometry data.
Developed at Freie Universität Berlin (SeqAn, OpenMS) and Universität Tübingen (OpenMS).

GenericWorkflowNodes - highlights
SeqAn/OpenMS nodes: most OpenMS and SeqAn apps are available in KNIME.
CTD (Common Tool Description): a generic XML-based description of command line tools.
Translate any tool you need into a KNIME node based on a CTD for the tool.

Cortana - who are we?
Leiden University
Arno Knobbe: post-doc, occasional programmer
Marvin Meeng: main developer
Wouter Duivesteijn, Michael Mampaey, Rob Konijn

Cortana - what is it?
Modern Subgroup Discovery tool, developed at Leiden University.
Research vehicle to address problems in Subgroup Discovery.
Analysis tool used in many domains:
Bank transaction data
Bioinformatics (genomics/metabolomics)
Chemical drug compound efficacy

Cortana - highlights
Generic SD algorithm: target type/quality measure, search conditions/search strategy.
Visualisation and manipulation of both data and results:
Table, histogram, scatter plot, DAG
Change data type, missing values
Subgroup inspection, ROC plots
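As an illustration of what a quality measure does in subgroup discovery, the sketch below scores one candidate subgroup with weighted relative accuracy (WRAcc), a commonly used measure for binary targets. The slides do not say which measures Cortana offers, so the choice of WRAcc, the function name, and the toy bank-style records are assumptions, not Cortana's implementation.

```python
def wracc(rows, condition, target):
    """Weighted relative accuracy of the subgroup selected by condition.

    WRAcc = coverage * (P(target | subgroup) - P(target)), where coverage
    is the fraction of all rows that fall into the subgroup.
    """
    n = len(rows)
    covered = [r for r in rows if condition(r)]
    if n == 0 or not covered:
        return 0.0
    coverage = len(covered) / n
    p_target_subgroup = sum(1 for r in covered if target(r)) / len(covered)
    p_target_overall = sum(1 for r in rows if target(r)) / n
    return coverage * (p_target_subgroup - p_target_overall)

# Hypothetical records, invented for this example.
ROWS = [
    {"age": 25, "fraud": True},
    {"age": 30, "fraud": True},
    {"age": 50, "fraud": False},
    {"age": 60, "fraud": False},
]

if __name__ == "__main__":
    score = wracc(ROWS, lambda r: r["age"] < 40, lambda r: r["fraud"])
    print(f"WRAcc(age < 40) = {score:.2f}")
```

A search strategy would generate many such conditions and keep the best-scoring ones. The measure rewards subgroups that are both large and unusual with respect to the target, and a subgroup whose target rate equals the overall rate scores zero.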

Palladian
KNIME Open Source Days, Konstanz, 03.09.2012
Klemens Muthmann, TU Dresden

About Us
Information retrieval team, Lehrstuhl Rechnernetze (Chair of Computer Networks), TU Dresden
Klemens Muthmann, Philipp Katz, David Urbansky (not pictured)

About Palladian
Java-based toolkit for information retrieval.
Provides users with a basic set of tools.
Palladian's strengths: text classification, feed reading, named entity recognition, date recognition, keyword extraction, content scraping.

Highlights
Palladian text classifier for sentiment analysis

Highlights

PMM Lab - who are we?
Federal Institute for Risk Assessment, Germany
Christian Thöns: programmer and research assistant
And others (Matthias Filter, Jörgen Brandt, Armin Weiser, Alexander Falenski)

PMM Lab - what is it?
Collection of KNIME nodes for Predictive Microbiology.
Developed at the Federal Institute for Risk Assessment since 2011.
Provides nodes for fitting and visualizing Predictive Microbiology models.

KNIME - highlights
Views for PMM models and data.
Users can enter new models (model equations are parsed with JEP).

Who are we?
University of Tübingen / Applied Bioinformatics group
Expertise in proteomics/metabolomics, drug design, molecular modelling, sequence analysis, systems biology and immunoinformatics
Prof. Oliver Kohlbacher: head of the Applied Bioinformatics Group
Luis de la Garza: PhD student
Kohlbacher, de la Garza, Applied Bioinformatics Group

What do we do?
Workflows on grid systems
Integration of the Computer Aided Drug Design Suite (CADDSuite) and OpenMS as KNIME nodes
GenericKnimeNodes development together with FU Berlin

Highlights - CADDSuite
Flexible and open workflow-enabled framework for computer-aided drug design.
Part of the Biochemical Algorithms Library (BALL) project.
Offers solutions to common tasks in drug design such as file format conversion, molecule preparation, docking, etc.

Highlights - OpenMS
Open mass spectrometry / liquid chromatography C++ library.
Offers visualization of data, proteomics pipelining, workflow modeling engine, signal processing, feature finding, etc.

Who are we - Robert Bosch GmbH, DS/ETM
Alexander Warta: Test Engineer, Student Tutor
Robert Bosch GmbH, Diesel Systems, Engineering Test Methods (DS/ETM1), alexander.warta@de.bosch.com
Computer Science students (Master): Markus John (05/2012-10/2012) and two other students
Diesel Systems DS/ETM1-Wr, -Jo, 03.09.2012, 209-2283. Robert Bosch GmbH 2012. All rights reserved, also regarding any disposal, exploitation, reproduction, editing, distribution, as well as in the event of applications for industrial property rights.

Why KNIME - Context and Challenge
To design diesel fuel injection systems for global markets, Robert Bosch GmbH considers many market-specific diesel fuel quality parameters.
For this, fuel samples from almost all countries are regularly analyzed chemically by a service provider (so-called fuel surveys).
One survey sample record contains up to 140 attributes, e.g. date, town, country, supplier, and the results of chemical and physical analysis such as sulfur content, density, viscosity, biodiesel content, etc.
About 10,000 records are currently of relevance.
The previous process combined Microsoft Excel (plots, histograms, etc.) and PowerPoint (world map) in a non-automated succession.
This procedure is quite time consuming, not interactive, inflexible and not scalable.

Why KNIME - Catalog of Requirements
Extract knowledge through interactive exploration.
Easy access to all fuel surveys with filter methods.
Generate choropleth maps and cartograms: show country names, show additional diagrams for each country, show only selected countries, enrich the map with external data (such as cities of the fuel survey records, locations of oil refineries, etc.).
Generate star plots, parallel coordinates, scatterplots.
Apply data mining algorithms for finding new patterns between instances and features (such as association rule learning, hierarchical clustering, multidimensional scaling).
Enrich fuel survey data with external data (such as new diesel car registrations, failure count of the common rail system, etc.).

Highlights - KNIME Node GenericWorldMap
Generating world maps based on statistical attributes, with additional dimensions shown as bars and scalable icons.

Highlights - KNIME Node FuelSurveyVisualizer
Generating boxplots, star plots, etc. interactively by integrating R.

Highlights - KNIME Node FuelSurveyStandardAnalysis
Creating standard presentation slides automatically by integrating Apache POI and R.

Highlights - KNIME Node FuelSurveyWarnSystem
Early warning system to quickly identify worsening fuel quality by integrating JBoss Drools (rule-based system) and Apache POI (generating Excel and Word file output). Ongoing.

Developed KNIME Nodes
Selection Preprocessing Transformation FuelSurveyReader FuelSurveyDeleter StandardAnalysisXML Modeling Neighbors LocalOutlierDetection DistanceBasedkMeans Percentizer RefineryReader Visualization GenericWorldMap FuelSurveyVisualizer StandardAnalysis LocationTransformer DynamicColumnFilter MultipleReference RowFilter LoopColumnToVariable Elbow FuelSurveyWarnSystem FuelSurveyWarnSystemXML

Who are we?
Christian Dietz: Image Processing
Martin Horn: Image Processing
Tobias Kötter: Network Mining
Michael Zinsmaier: Image Processing

Our Projects (1/2)
Network Mining: framework to process attributed graphs. Supports (un)directed, (un)weighted (hyper/multi/k-partite) graphs.
Indexing & Searching: high-performance indexing and advanced querying. Based on Apache Lucene.

Our Projects (2/2)
Image Processing and Analysis: extension to process and analyse multidimensional images.
Integrates state-of-the-art libraries: ImgLib2, BioFormats, ImageJ, ImageJ2, OMERO.

KNIME
Iris Adä: Modular Data Generation, Ensemble Methods, JFreeChart
Zaenal Akbar: Parallel Data Mining
Violeta Ivanova: Parallel Data Mining
Sebastian Peter: Web Analytics, JFreeChart
05.09.2012, KNIME Open Source Days

KNIME
Dawid Piatek: Statistics Guru
Thorsten Meinl: Optimization & Build System
Thomas Gabriel: Database Connectors & R
Peter Ohl: File Reader & Server Development

KNIME
Bernd Wiswedel: Data Handling
Aaron Hart: (Magic) Support
Michael Berthold: The Godfather

KNIME
Heather Fyson: Keeps everything running
Peter Burger: System Administrator & BBQ master