Data Mining Ice Cubes Tim Ruhe, Katharina Morik ADASS XXI, Paris 2011

Size: px
Start display at page:

Download "Data Mining Ice Cubes Tim Ruhe, Katharina Morik ADASS XXI, Paris 2011"

Transcription

1 Data Mining Ice Cubes Tim Ruhe, Katharina Morik ADASS XXI, Paris 2011

2 Outline: - IceCube - RapidMiner - Feature Selection - Random Forest training and application - Summary and outlook

3 The IceCube detector: - Completed in December Located at the geographic South Pole Digital Optical Modules on 86 strings - Instrumented volume of 1 km 3 - Has taken data in various string configurations (this work: 59 strings)

4 The IceCube detector: - Detection principle: Cherenkov light - Look for events of the form: ν + X e,µ,τ - Dominant background of atm. µ Use earth as a filter (select upgoing events only)

5 Fakultät Physik

6 Data Mining in IceCube: - App reconstructed attributes - Data and MC do not necessarily agree - Signal/background ratio ~ 10-3 Interesting for studies within the scope of machine learning

7 RapidMiner: - Data Mining environment, Open Source, Java - Developed at the Department of Computer Science at TU Dortmund (group of K. Morik) - Operator based - Quite intuitive to handle (personal opinion)

8 Preselection of parameters: (After application of precuts) 1. Check for consistency (data vs. nu MC vs. background MC ) Eliminate if missing in one (reduction ~ out of ~2600) 2. Check for missing values (nans, infs) Eliminate if number of missing values exceeds 30% (reduction to 1408 attributes) 3. Eliminate the obvious (Azimuth, DelAng, GalLong, Time...) (reduction to 612 attributes) 4. Eliminate highly correlated (ρ = 1.0 ) and constant parameters Final set of 477 parameters

9 Mininmum Redundancy Maximum Relevance (MRMR): - Iteratively add features with biggest relevance and least redundancy - Quality criterion Q: 1 Q = R( x, y) D( x, x) j x in F j R: Relevance; D: Redundancy; F j = already selected features

10 Stability of the MRMR Selection: Jaccard Index: Kuncheva s Index: B A B A J = ) ( ), ( 2 B A r k B A k n k k rn B A I C = = = =

11 Fakultät Physik

12 Random Forest output: Forest parameters: trees x 10 5 backgr. events x 10 4 signal events - 5 fold X-Validation - 28 x 10 4 of each class used for training Data/MC mismatch underestimation of background

13 Change the Scaling of the Background: such that it matches data for Signalness > 0.2 Fakultät Physik

14 Expected Numbers: With Rescaled Background Cut Nugen Corsika Sum Data ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± 33 5 ± ±

15 Summary and Outlook: - IceCube is well suited for a detailed study within machine learning - Random Forest outperforms simpler classifiers - Feature Selection shows stable performance - Application on data matches MC expectations - Increase in performance expected for full optimization

16 Backup Slides

Monday Morning Data Mining

Monday Morning Data Mining Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik

More information

Data mining on the rocks T. Ruhe for the IceCube collaboration, K. Morik GREAT workshop on Astrostatistics and data mining 2011

Data mining on the rocks T. Ruhe for the IceCube collaboration, K. Morik GREAT workshop on Astrostatistics and data mining 2011 Data mining on the rocks T. Ruhe for the IceCube collaboration, K. Morik GREAT workshop on Astrostatistics and data mining 2011 Outline: - IceCube, detector and detection principle - Signal and Background

More information

2 Decision tree + Cross-validation with R (package rpart)

2 Decision tree + Cross-validation with R (package rpart) 1 Subject Using cross-validation for the performance evaluation of decision trees with R, KNIME and RAPIDMINER. This paper takes one of our old study on the implementation of cross-validation for assessing

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

Welcome to this presentation on Driving LEDs AC-DC Power Supplies, part of OSRAM Opto Semiconductors LED Fundamentals series. In this presentation we

Welcome to this presentation on Driving LEDs AC-DC Power Supplies, part of OSRAM Opto Semiconductors LED Fundamentals series. In this presentation we Welcome to this presentation on Driving LEDs AC-DC Power Supplies, part of OSRAM Opto Semiconductors LED Fundamentals series. In this presentation we will look at: - the typical circuit structure of AC-DC

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Ensuring Collective Availability in Volatile Resource Pools via Forecasting

Ensuring Collective Availability in Volatile Resource Pools via Forecasting Ensuring Collective Availability in Volatile Resource Pools via Forecasting Artur Andrzejak andrzejak[at]zib.de Derrick Kondo David P. Anderson Zuse Institute Berlin (ZIB) INRIA UC Berkeley Motivation

More information

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

More information

Realization of Inventory Databases and Object-Relational Mapping for the Common Information Model

Realization of Inventory Databases and Object-Relational Mapping for the Common Information Model Realization of Inventory Databases and Object-Relational Mapping for the Common Information Model Department of Physics and Technology, University of Bergen. November 8, 2011 Systems and Virtualization

More information

PoS(ICRC2015)707. FACT - First Energy Spectrum from a SiPM Cherenkov Telescope

PoS(ICRC2015)707. FACT - First Energy Spectrum from a SiPM Cherenkov Telescope from a SiPM Cherenkov Telescope F. Temme a, M. L. Ahnen b, M. Balbo c, M. Bergmann d, A. Biland b, C. Bockermann e, T. Bretz b, K. A. Brügge a, J. Buss a, D. Dorner d, S. Einecke a, J. Freiwald a, C. Hempfling

More information

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION

CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION MATIJA STEVANOVIC PhD Student JENS MYRUP PEDERSEN Associate Professor Department of Electronic Systems Aalborg University,

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

Big Data Analytics in Astrophysics. Katharina Morik, Informatik 8, TU Dortmund

Big Data Analytics in Astrophysics. Katharina Morik, Informatik 8, TU Dortmund Big Data Analytics in Astrophysics Katharina Morik, Informatik 8, TU Dortmund Overview Short introduction to the Collaborative Research Center SFB 876 Zooming into one of its projects: C3 Astrophysical

More information

Note: This App is under development and available for testing on request. Note: This App is under development and available for testing on request. Note: This App is under development and available for

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

Feature Selection with Monte-Carlo Tree Search

Feature Selection with Monte-Carlo Tree Search Feature Selection with Monte-Carlo Tree Search Robert Pinsler 20.01.2015 20.01.2015 Fachbereich Informatik DKE: Seminar zu maschinellem Lernen Robert Pinsler 1 Agenda 1 Feature Selection 2 Feature Selection

More information

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo 178627 Database And Data Mining Research Group

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo 178627 Database And Data Mining Research Group RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo 178627 Database And Data Mining Research Group Summary RapidMiner project Strengths How to use RapidMiner Operator

More information

Scalable Machine Learning to Exploit Big Data for Knowledge Discovery

Scalable Machine Learning to Exploit Big Data for Knowledge Discovery Scalable Machine Learning to Exploit Big Data for Knowledge Discovery Una-May O Reilly MIT MIT ILP-EPOCH Taiwan Symposium Big Data: Technologies and Applications Lots of Data Everywhere Knowledge Mining

More information

testo dello schema Secondo livello Terzo livello Quarto livello Quinto livello

testo dello schema Secondo livello Terzo livello Quarto livello Quinto livello Extracting Knowledge from Biomedical Data through Logic Learning Machines and Rulex Marco Muselli Institute of Electronics, Computer and Telecommunication Engineering National Research Council of Italy,

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

More information

Neural Networks and Support Vector Machines

Neural Networks and Support Vector Machines INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines

More information

Didacticiel Études de cas

Didacticiel Études de cas 1 Theme Data Mining with R The rattle package. R (http://www.r project.org/) is one of the most exciting free data mining software projects of these last years. Its popularity is completely justified (see

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

Easy and efficient asset management for your business

Easy and efficient asset management for your business Easy and efficient asset management for your business PERFORMANCE MADE SMARTER Featuring Monitoring of process values Diagnostics Device programming Simulation of process values TEMPERATURE I.S. INTERFACES

More information

RapidMiner. Business Analytics Applications. Data Mining Use Cases and. Markus Hofmann. Ralf Klinkenberg. Rapid-I / RapidMiner.

RapidMiner. Business Analytics Applications. Data Mining Use Cases and. Markus Hofmann. Ralf Klinkenberg. Rapid-I / RapidMiner. RapidMiner Data Mining Use Cases and Business Analytics Applications Edited by Markus Hofmann Institute of Technology Blanchardstown, Dublin, Ireland Ralf Klinkenberg Rapid-I / RapidMiner Dortmund, Germany

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

MHI3000 Big Data Analytics for Health Care Final Project Report

MHI3000 Big Data Analytics for Health Care Final Project Report MHI3000 Big Data Analytics for Health Care Final Project Report Zhongtian Fred Qiu (1002274530) http://gallery.azureml.net/details/81ddb2ab137046d4925584b5095ec7aa 1. Data pre-processing The data given

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Server Virtualization with Windows Server Hyper-V and System Center

Server Virtualization with Windows Server Hyper-V and System Center Course 20409B: Server Virtualization with Windows Server Hyper-V and System Center Course Details Course Outline Module 1: Evaluating the Environment for Virtualization This module provides an overview

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

WebSphere Business Monitor

WebSphere Business Monitor WebSphere Business Monitor Monitor models 2010 IBM Corporation This presentation should provide an overview of monitor models in WebSphere Business Monitor. WBPM_Monitor_MonitorModels.ppt Page 1 of 25

More information

SharePoint as a Business Platform

SharePoint as a Business Platform Jean-François Saint-Pierre Evolusys SA SharePoint as a Business Platform Why, What and How? No Code Evolusys SA DataMining Business Intelligence Business Workflows & Forms Social & Geo Intelligence Mobility

More information

Electronics for Analog Signal Processing - II Prof. K. Radhakrishna Rao Department of Electrical Engineering Indian Institute of Technology Madras

Electronics for Analog Signal Processing - II Prof. K. Radhakrishna Rao Department of Electrical Engineering Indian Institute of Technology Madras Electronics for Analog Signal Processing - II Prof. K. Radhakrishna Rao Department of Electrical Engineering Indian Institute of Technology Madras Lecture - 18 Wideband (Video) Amplifiers In the last class,

More information

Model Combination. 24 Novembre 2009

Model Combination. 24 Novembre 2009 Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Measurement Study of Wuala, a Distributed Social Storage Service

Measurement Study of Wuala, a Distributed Social Storage Service Measurement Study of Wuala, a Distributed Social Storage Service Thomas Mager - Master Thesis Advisors: Prof. Ernst Biersack Prof. Thorsten Strufe Prof. Pietro Michiardi Illustration: Maxim Malevich 15.12.2010

More information

WELCOME to The Land Use Database :

WELCOME to The Land Use Database : Demo prepared by : Demo sponsored by: WELCOME to The Land Use Database : Demo-2 : Data Entry Filter and Backup/Restore Shown is : - how to define a Data Entry Filter - how to Backup/Restore Data Emphasis

More information

Welcome to this presentation on Switch Mode Drivers, part of OSRAM Opto Semiconductors LED Fundamentals series. In this presentation we will look at:

Welcome to this presentation on Switch Mode Drivers, part of OSRAM Opto Semiconductors LED Fundamentals series. In this presentation we will look at: Welcome to this presentation on Switch Mode Drivers, part of OSRAM Opto Semiconductors LED Fundamentals series. In this presentation we will look at: How switch mode drivers work, switch mode driver topologies,

More information

Geographic redundant VoIP systems with Kamailio

Geographic redundant VoIP systems with Kamailio Geographic redundant VoIP systems with Kamailio Welcome! Linuxtag 2010, 09.06.2010 Outline 1. Kamailio SIP Server use cases and differentiation 2. 1&1 VoIP backend purpose and scale setup and design 3.

More information

Classification using Logistic Regression

Classification using Logistic Regression Classification using Logistic Regression Ingmar Schuster Patrick Jähnichen using slides by Andrew Ng Institut für Informatik This lecture covers Logistic regression hypothesis Decision Boundary Cost function

More information

Logistic Regression for Spam Filtering

Logistic Regression for Spam Filtering Logistic Regression for Spam Filtering Nikhila Arkalgud February 14, 28 Abstract The goal of the spam filtering problem is to identify an email as a spam or not spam. One of the classic techniques used

More information

Increasing Marketing ROI with Optimized Prediction

Increasing Marketing ROI with Optimized Prediction Increasing Marketing ROI with Optimized Prediction Yottamine s Unique and Powerful Solution Smart marketers are using predictive analytics to make the best offer to the best customer for the least cost.

More information

AUTOMATED TESTING and SPI. Brian Lynch

AUTOMATED TESTING and SPI. Brian Lynch AUTOMATED TESTING and SPI Brian Lynch 1 Introduction The following document explains the purpose and benefits for having an Automation Test Team, the strategy/approach taken by an Automation Test Team

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

Self-Adjusting Condition Monitoring of PV Power Plant Production

Self-Adjusting Condition Monitoring of PV Power Plant Production Self-Adjusting Condition Monitoring of PV Power Plant Production Mining Measurement Data Dipl.-Inf. Steffen Dienst M.Sc. Soheila Sahami M.Sc. Johannes Schmidt {sdienst sahami}@informatik.uni-leipzig.de

More information

Radar Systems Engineering Lecture 6 Detection of Signals in Noise

Radar Systems Engineering Lecture 6 Detection of Signals in Noise Radar Systems Engineering Lecture 6 Detection of Signals in Noise Dr. Robert M. O Donnell Guest Lecturer Radar Systems Course 1 Detection 1/1/010 Block Diagram of Radar System Target Radar Cross Section

More information

Course Outline. Create and configure virtual hard disks. Create and configure virtual machines. Install and import virtual machines.

Course Outline. Create and configure virtual hard disks. Create and configure virtual machines. Install and import virtual machines. Server Virtualization with Windows Server Hyper-V and System Centre Duration 5 days Course Code SSM20409 Format Instructor Led Overview This five day course will provide you with the knowledge and skills

More information

New Ensemble Combination Scheme

New Ensemble Combination Scheme New Ensemble Combination Scheme Namhyoung Kim, Youngdoo Son, and Jaewook Lee, Member, IEEE Abstract Recently many statistical learning techniques are successfully developed and used in several areas However,

More information

Server-Virtualisierung mit Windows Server Hyper-V und System Center MOC 20409

Server-Virtualisierung mit Windows Server Hyper-V und System Center MOC 20409 Server-Virtualisierung mit Windows Server Hyper-V und System Center MOC 20409 Course Outline Module 1: Evaluating the Environment for Virtualization This module provides an overview of Microsoft virtualization

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

TURKISH ORACLE USER GROUP

TURKISH ORACLE USER GROUP TURKISH ORACLE USER GROUP Data Mining in 30 Minutes Husnu Sensoy Global Maksimum Data & Information Tech. Founder VLDB Expert Agenda Who am I? Different problems of Data Mining In database data mining?!?

More information

Project Planning and Project Estimation Techniques. Naveen Aggarwal

Project Planning and Project Estimation Techniques. Naveen Aggarwal Project Planning and Project Estimation Techniques Naveen Aggarwal Responsibilities of a software project manager The job responsibility of a project manager ranges from invisible activities like building

More information

Server Virtualization with Windows Server Hyper-V and System Center

Server Virtualization with Windows Server Hyper-V and System Center Server Virtualization with Windows Server Hyper-V and System Center About this Course This five day course will provide you with the knowledge and skills required to design and implement Microsoft Server

More information

Performance Data Management with TAU PerfExplorer

Performance Data Management with TAU PerfExplorer Performance Data Management with TAU PerfExplorer Sameer Shende Performance Research Laboratory, University of Oregon http://tau.uoregon.edu TAU Analysis 2 TAUdb: Performance Data Management Framework

More information

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics

Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)

More information

IceCube Project Monthly Report July 2007

IceCube Project Monthly Report July 2007 IceCube Project Monthly Report July 2007 Accomplishments The Data Acquisition (DAQ) software processed 1.50 billion events during 2.69 million live seconds in the month of July. The DAQ uptime percentage

More information

Server Virtualization with Windows Server Hyper-V and System Center

Server Virtualization with Windows Server Hyper-V and System Center Course Code: M20409 Vendor: Microsoft Course Overview Duration: 5 RRP: 2,025 Server Virtualization with Windows Server Hyper-V and System Center Overview This five day course will provide you with the

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems

Impact of Feature Selection on the Performance of Wireless Intrusion Detection Systems 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Impact of Feature Selection on the Performance of ireless Intrusion Detection Systems

More information

Module 1: Sensor Data Acquisition and Processing in Android

Module 1: Sensor Data Acquisition and Processing in Android Module 1: Sensor Data Acquisition and Processing in Android 1 Summary This module s goal is to familiarize students with acquiring data from sensors in Android, and processing it to filter noise and to

More information

DBMS / Business Intelligence, SQL Server

DBMS / Business Intelligence, SQL Server DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.

More information

Profile Data Mining with PerfExplorer. Sameer Shende Performance Reseaerch Lab, University of Oregon http://tau.uoregon.edu

Profile Data Mining with PerfExplorer. Sameer Shende Performance Reseaerch Lab, University of Oregon http://tau.uoregon.edu Profile Data Mining with PerfExplorer Sameer Shende Performance Reseaerch Lab, University of Oregon http://tau.uoregon.edu TAU Analysis PerfDMF: Performance Data Mgmt. Framework 3 Using Performance Database

More information

Operational excellence for Oracle applications

Operational excellence for Oracle applications Operational excellence for Oracle applications Sebastiaan Vingerhoed, specialist region EE&CIS October 20th, 2010 HROUG Agenda Welcome & Introduction Application Life Cycle Automate

More information

Objective Criteria of Job Scheduling Problems. Uwe Schwiegelshohn, Robotics Research Lab, TU Dortmund University

Objective Criteria of Job Scheduling Problems. Uwe Schwiegelshohn, Robotics Research Lab, TU Dortmund University Objective Criteria of Job Scheduling Problems Uwe Schwiegelshohn, Robotics Research Lab, TU Dortmund University 1 Jobs and Users in Job Scheduling Problems Independent users No or unknown precedence constraints

More information

Better credit models benefit us all

Better credit models benefit us all Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

Implementation of Breiman s Random Forest Machine Learning Algorithm

Implementation of Breiman s Random Forest Machine Learning Algorithm Implementation of Breiman s Random Forest Machine Learning Algorithm Frederick Livingston Abstract This research provides tools for exploring Breiman s Random Forest algorithm. This paper will focus on

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet)

HUAWEI Advanced Data Science with Spark Streaming. Albert Bifet (@abifet) HUAWEI Advanced Data Science with Spark Streaming Albert Bifet (@abifet) Huawei Noah s Ark Lab Focus Intelligent Mobile Devices Data Mining & Artificial Intelligence Intelligent Telecommunication Networks

More information

Introduction to Learning & Decision Trees

Introduction to Learning & Decision Trees Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing

More information

An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients

An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients An innovative application of a constrained-syntax genetic programming system to the problem of predicting survival of patients Celia C. Bojarczuk 1, Heitor S. Lopes 2 and Alex A. Freitas 3 1 Departamento

More information

Social Media Mining. Network Measures

Social Media Mining. Network Measures Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users

More information

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015 Clustering Big Data Efficient Data Mining Technologies J Singh and Teresa Brooks June 4, 2015 Hello Bulgaria (http://hello.bg/) A website with thousands of pages... Some pages identical to other pages

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

Roulette Sampling for Cost-Sensitive Learning

Roulette Sampling for Cost-Sensitive Learning Roulette Sampling for Cost-Sensitive Learning Victor S. Sheng and Charles X. Ling Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7 {ssheng,cling}@csd.uwo.ca

More information

Server Virtualization with Windows Server Hyper-V and System Center

Server Virtualization with Windows Server Hyper-V and System Center Course 20409 Server Virtualization with Windows Server Hyper-V and System Center Length: Language(s): Audience(s): 5 Days English IT Professionals Level: 300 Technology: Windows Server 2012 Type: Delivery

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Server Virtualization with Windows Server Hyper-V and System Center

Server Virtualization with Windows Server Hyper-V and System Center Server Virtualization with Windows Server Hyper-V and System Center MS20409 DESCRIZIONE: This five day course will provide you with the knowledge and skills required to design and implement Microsoft Server

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

More information

Copyright 2011 - Bizagi. CRM Sales Opportunity Management Construction Document Bizagi Process Modeler

Copyright 2011 - Bizagi. CRM Sales Opportunity Management Construction Document Bizagi Process Modeler Copyright 2011 - Bizagi CRM Sales Opportunity Management Bizagi Process Modeler Table of Contents CRM- Sales Opportunity Management... 4 Description... 4 Main Facts in the Process Construction... 5 Data

More information

FIRE ALARM SYSTEM. reliable fire detection and alarm signalling system

FIRE ALARM SYSTEM. reliable fire detection and alarm signalling system FIRE ALARM SYSTEM reliable fire detection and alarm signalling system Satel fire alarm devices allow to implement reliable and modern systems responsible for fire safety. A combination of 20-year experience

More information

Professional fire alarm panel for small and medium-sized objects. Compact

Professional fire alarm panel for small and medium-sized objects. Compact Compact Professional fire alarm panel for small and medium-sized objects ESSER by Honeywell: The experts for safety in buildings With a comprehensive and future-oriented portfolio in the fields of fire

More information

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

More information

How To Understand Business Intelligence

How To Understand Business Intelligence An Introduction to Advanced PREDICTIVE ANALYTICS BUSINESS INTELLIGENCE DATA MINING ADVANCED ANALYTICS An Introduction to Advanced. Where Business Intelligence Systems End... and Predictive Tools Begin

More information

Application of Honeywell MasterLogic-200 Controller in Highway Tunnel Monitoring Systems

Application of Honeywell MasterLogic-200 Controller in Highway Tunnel Monitoring Systems White Paper Application of Honeywell MasterLogic-200 Controller in Highway Tunnel Monitoring Systems Executive Summary This article introduces the applications and features of the latest Honeywell programmable

More information

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib

More information

Study of the B D* ℓ ν with the Partial Reconstruction Technique

Study of the B D* ℓ ν with the Partial Reconstruction Technique Study of the B D* ℓ ν with the Partial Reconstruction Technique + University of Ferrara / INFN Ferrara Dottorato di Ricerca in Fisica Ciclo XVII Mirco Andreotti 4 March 25 Measurement of B(B D*ℓν) from

More information

Core Solutions of Microsoft Exchange Server 2013 MOC 20341

Core Solutions of Microsoft Exchange Server 2013 MOC 20341 Core Solutions of Microsoft Exchange Server 2013 MOC 20341 Course Outline Module 1: Deploying and Managing Exchange Server 2013 This module explains how to plan and perform deployment and management of

More information

Worldclass Recruiting Software for successful Enterprises

Worldclass Recruiting Software for successful Enterprises Worldclass Recruiting Software for successful Enterprises Transform your business with staffitpro WEB. We have been supporting recruiting agencies for over 15 years. staffitpro WEB is the top market leading

More information

Didacticiel Études de cas. Association Rules mining with Tanagra, R (arules package), Orange, RapidMiner, Knime and Weka.

Didacticiel Études de cas. Association Rules mining with Tanagra, R (arules package), Orange, RapidMiner, Knime and Weka. 1 Subject Association Rules mining with Tanagra, R (arules package), Orange, RapidMiner, Knime and Weka. This document extends a previous tutorial dedicated to the comparison of various implementations

More information

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Audit Logging. Overall Goals

Audit Logging. Overall Goals Audit Logging Security Training by Arctec Group (www.arctecgroup.net) 1 Overall Goals Building Visibility In Audit Logging Domain Model 2 1 Authentication, Authorization, and Auditing 3 4 2 5 6 3 Auditing

More information