Monday Morning Data Mining


1 Monday Morning Data Mining. Tim Ruhe. Statistische Methoden der Datenanalyse (Statistical Methods of Data Analysis)
2 Outline:
- Data mining
- IceCube
- Data mining in IceCube
3 Computer Scientists are different... Fakultät Physik
6 Building a model and predicting the outcome:
7 Can be broken down into 4 (simple) steps:
1. Find a representation of the data
2. Find a good algorithm
3. Validate your results
4. Apply to data
8 IceCube in a nutshell:
- completed in December
- located at the geographic South Pole
- Digital Optical Modules on 86 strings
- instrumented volume of 1 km³
- subdetectors DeepCore and IceTop
9 IceCube in a nutshell:
- Detection principle: Cherenkov light
- Look for events of the form: ν + X → e, µ, τ
- Dominant background of atmospheric µ
- Use the Earth as a filter (select upgoing events only)
10 IceCube: Scientific goals
- detection of astrophysical neutrinos
- atmospheric neutrino energy spectrum
- neutrino oscillations
- cosmic-ray (CR) anisotropy
- exotic stuff
18 Data Mining in IceCube:
- app reconstructed attributes
- Data and MC do not necessarily agree
- signal/background ratio ~ 10⁻³
- interesting for studies within the scope of machine learning
19 1. Finding a good representation of your data
20 Make sure you understand your input:
Attributes can be:
- nominal: green, blue, red, yellow
- ordinal: cool, mild, hot (cool < mild < hot)
- numerical: 1, 2, 3, 4, ...
Labels can be:
- polynominal: red, green, yellow, blue
- binominal: signal, background
- numerical: 1, 2, 3, ..., 5000, ...
21 Data Preprocessing: Preselection of parameters
1. Check for consistency (data vs. signal MC vs. background MC)
2. Check for missing values (NaNs, infs). How to handle the NaNs? (see next slide)
3. Eliminate the obvious (azimuth angle, timing information, ...)
4. Eliminate highly correlated and constant parameters
22 Data and MC preprocessing: How to handle NaNs? Several possibilities:
- Exclude attributes that exceed a certain number of NaNs
- Replace by:
  - minimum
  - maximum
  - average
  - nothing at all
  - (median, ...)
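The replacement options listed on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `replace_nans` and `nan_fraction` and the toy column are assumptions, not part of the talk or of RapidMiner.

```python
import math
from statistics import mean, median

def nan_fraction(values):
    """Fraction of non-finite entries (NaN or inf) in one attribute column."""
    return sum(not math.isfinite(v) for v in values) / len(values)

def replace_nans(values, strategy="mean"):
    """Replace NaN/inf entries by a summary of the finite entries of the column."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("attribute has no finite values; consider excluding it")
    summary = {"min": min, "max": max, "mean": mean, "median": median}[strategy]
    fill = summary(finite)
    return [v if math.isfinite(v) else fill for v in values]

# Hypothetical attribute column with one NaN and one inf.
column = [1.0, float("nan"), 3.0, float("inf"), 5.0]
print(nan_fraction(column))           # 0.4
print(replace_nans(column, "mean"))   # [1.0, 3.0, 3.0, 3.0, 5.0]
```

The fraction of non-finite entries can serve as the exclusion criterion from the first bullet; the threshold itself is a choice the slide leaves open.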
23 Data and MC preprocessing: Feature Selection
1. Forward Selection
- start with an empty selection
- add each unused attribute in turn and estimate the performance
- keep the attribute with the highest increase in performance
- start a new round
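A minimal sketch of forward selection, assuming a user-supplied `score` function that returns a performance estimate for a list of attributes. The stopping rule (stop when no attribute improves the estimate) and the toy scoring function are assumptions made for illustration; the slide does not specify them.

```python
def forward_selection(attributes, score, min_gain=0.0):
    """Greedy forward selection over a list of attribute names."""
    selected = []
    best = score(selected)
    while len(selected) < len(attributes):
        candidates = [a for a in attributes if a not in selected]
        # Estimate the performance for each unused attribute added to the selection.
        trials = {a: score(selected + [a]) for a in candidates}
        winner = max(trials, key=trials.get)
        if trials[winner] - best <= min_gain:
            break  # no remaining attribute increases the performance
        selected.append(winner)
        best = trials[winner]
    return selected

# Hypothetical performance model: two useful attributes, one useless one,
# and a small penalty per selected attribute.
USEFUL = {"zenith": 0.3, "n_hits": 0.2, "azimuth": 0.0}

def toy_score(subset):
    return 0.5 + sum(USEFUL[a] for a in subset) - 0.01 * len(subset)

print(forward_selection(list(USEFUL), toy_score))  # ['zenith', 'n_hits']
```

In practice `score` would wrap a learner and a validation scheme, which makes each round expensive: every round trains one model per unused attribute.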
24 Data and MC preprocessing: Feature Selection
2. Backward Elimination
- start with the full set of attributes
- remove each attribute in turn and estimate the performance
- permanently remove the attribute whose removal gives the least decrease in performance
- start a new round
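Backward elimination can be sketched in the same way. As before, `score` is a user-supplied performance estimate; the stopping rule (`max_loss`, the largest tolerated drop in performance) and the toy scoring function are illustrative assumptions.

```python
def backward_elimination(attributes, score, max_loss=0.0):
    """Greedy backward elimination over a list of attribute names."""
    selected = list(attributes)
    best = score(selected)
    while len(selected) > 1:
        # Estimate the performance with each attribute removed in turn.
        trials = {a: score([b for b in selected if b != a]) for a in selected}
        victim = max(trials, key=trials.get)  # its removal hurts the least
        if best - trials[victim] > max_loss:
            break  # every removal costs more performance than tolerated
        selected.remove(victim)
        best = trials[victim]
    return selected

# Hypothetical performance model, as for a forward selection toy example.
USEFUL = {"zenith": 0.3, "n_hits": 0.2, "azimuth": 0.0}

def toy_score(subset):
    return 0.5 + sum(USEFUL[a] for a in subset) - 0.01 * len(subset)

print(backward_elimination(list(USEFUL), toy_score))  # ['zenith', 'n_hits']
```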
25 Backward Elimination in RapidMiner:
26 Data and MC preprocessing: Feature Selection
3. Minimum Redundancy Maximum Relevance (MRMR)
- iteratively add the features with the biggest relevance and the least redundancy
- Quality criterion Q: $Q = R(x, y) - \frac{1}{j} \sum_{x' \in F_j} D(x, x')$
- R: relevance; D: redundancy; F_j: set of the j already selected features
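The MRMR iteration (relevance minus the average redundancy with the already selected features) can be sketched as follows. The slide does not say how relevance and redundancy are measured; using the absolute Pearson correlation for both is an assumption made here for illustration, and the toy features are hypothetical.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def mrmr(features, label, k):
    """features: dict name -> values; label: values; k: number of features to select."""
    k = min(k, len(features))
    relevance = {f: abs(pearson(x, label)) for f, x in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < k:
        def quality(f):
            # Q = relevance minus mean redundancy with the selected features.
            redundancy = sum(abs(pearson(features[f], features[s])) for s in selected)
            return relevance[f] - redundancy / len(selected)
        rest = [f for f in features if f not in selected]
        selected.append(max(rest, key=quality))
    return selected

label = [0, 0, 0, 1, 1, 1]
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],         # relevant
    "a_copy": [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],  # relevant, but redundant with "a"
    "b": [1.0, 0.0, 0.0, 1.0, 0.0, 1.0],         # less relevant, but not redundant
}
print(mrmr(features, label, 2))  # ['a', 'b']
```

The near-duplicate `a_copy` is skipped in the second round even though it is more relevant than `b`, because its redundancy with `a` cancels its relevance.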
27 MRMR in RapidMiner:
28 Evaluating the Stability of the Parameter Selection: Data and MC are subject to a certain variance, and this variance influences the parameter selection!
29 Stability of the MRMR Selection:
- Jaccard index: $J = \frac{|A \cap B|}{|A \cup B|}$
- Kuncheva's index: $I_C(A, B) = \frac{r n - k^2}{k (n - k)}$, with $k = |A| = |B|$, $r = |A \cap B|$, and $n$ the total number of features
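Both stability indices compare two feature subsets A and B, for example the selections obtained on two different subsamples. A short sketch, with hypothetical feature names:

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def kuncheva(a, b, n):
    """Kuncheva's index for two subsets of equal size k out of n features."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < n:
        raise ValueError("need |A| == |B| == k with 0 < k < n")
    r = len(a & b)
    return (r * n - k * k) / (k * (n - k))

A = {"f1", "f2", "f3", "f4"}
B = {"f1", "f2", "f3", "f5"}
print(jaccard(A, B))         # 0.6
print(kuncheva(A, B, n=10))  # (3*10 - 16) / (4*6), roughly 0.583
```

Unlike the Jaccard index, Kuncheva's index corrects for the overlap expected by chance, which is why it needs the total number of features n.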
31 2. Learning algorithms
32 Learners:
1. Decision Trees
2. Naive Bayes
3. k-Nearest Neighbours
4. Random Forests
5. Boosting
33 A bit more technically speaking:
- set of vectors x = (x_1, x_2, ..., x_n); x_i = attribute (attributes = features, variables, parameters)
- labels y_1, y_2, ..., y_n; labels = target class
- create a model f from your examples that predicts a y for a given x
34 Constructing a simple model:
35 Decision Trees: Simple Classifier!
36 Naive Bayes:
- based on Bayes' theorem: $\Pr[H \mid E] = \frac{\Pr[E \mid H]\,\Pr[H]}{\Pr[E]}$
- assumes all attributes are independent
37 Naive Bayes: Golf data
38 Naive Bayes: Play? outlook = sunny, temperature = cool, humidity = high, windy = true
39 Naive Bayes: $\Pr[\text{yes} \mid E] = \frac{2/9 \times 3/9 \times 3/9 \times 3/9 \times 9/14}{\Pr[E]}$
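40 Naive Bayes: $\Pr[\text{yes} \mid E] = \frac{2/9 \times 3/9 \times 3/9 \times 3/9 \times 9/14}{\Pr[E]}$, and Pr[no | E] is obtained in the same way from the "no" examples. Pr[E] is not known, so the two results need to be normalized! Pr[yes | E] = 0.205, Pr[no | E] = 0.795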
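The normalization step can be checked with exact fractions. The factors for "yes" are the ones shown on the slides; the factors for "no" (3/5, 1/5, 4/5, 3/5 and the prior 5/14) are not legible in this transcript and are assumed here from the standard 14-example golf/weather data set.

```python
from fractions import Fraction as F

# Conditional probabilities for outlook=sunny, temperature=cool,
# humidity=high, windy=true, followed by the class priors.
likelihood = {
    "yes": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],
    "no": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],  # assumed, see text above
}
prior = {"yes": F(9, 14), "no": F(5, 14)}

def posterior(likelihood, prior):
    """Multiply prior and likelihoods per class, then normalize over the classes."""
    unnormalized = {}
    for cls, p in prior.items():
        for factor in likelihood[cls]:
            p *= factor
        unnormalized[cls] = p
    evidence = sum(unnormalized.values())  # plays the role of Pr[E]
    return {cls: p / evidence for cls, p in unnormalized.items()}

post = posterior(likelihood, prior)
print({cls: round(float(p), 3) for cls, p in post.items()})
# {'yes': 0.205, 'no': 0.795}
```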
41 Naive Bayes: What if Pr[E_i | yes] = 0? Let's assume we do not have positive examples for outlook = rainy. Use the Laplace correction!
- Pr[sunny | yes] = 4/9 becomes 5/12
- Pr[overcast | yes] = 5/9 becomes 6/12
- Pr[rainy | yes] = 0/9 becomes 1/12
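The Laplace correction adds one count to every attribute value, so the denominator grows by the number of distinct values (9 + 3 = 12 here). A small sketch reproducing the numbers on this slide; the function name `laplace` and the `alpha` parameter are illustrative choices.

```python
from fractions import Fraction

def laplace(counts, alpha=1):
    """Smoothed probabilities: (count + alpha) / (total + alpha * number of values)."""
    total = sum(counts.values()) + alpha * len(counts)
    return {value: Fraction(c + alpha, total) for value, c in counts.items()}

# Counts of the outlook values among the 9 "yes" examples, as assumed on the slide.
counts = {"sunny": 4, "overcast": 5, "rainy": 0}
smoothed = laplace(counts)
print(smoothed["sunny"], smoothed["overcast"], smoothed["rainy"])  # 5/12 1/2 1/12
```

`Fraction` reduces 6/12 to 1/2 automatically; the value is the same as on the slide.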
42 k-Nearest Neighbours (kNN)
- memory-based classifier
- supervised, but without an explicit training phase (lazy learner)
- find the k neighbours closest to x and classify by majority vote
- all features should be normalized
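A minimal kNN sketch, including the normalization step. Min-max scaling and the Euclidean distance are assumptions (the slide only says that features should be normalized), and the toy events are hypothetical.

```python
from collections import Counter
from math import dist

def min_max_normalize(rows):
    """Scale every feature to [0, 1]; also return the scaling function for new points."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )

    return [scale(r) for r in rows], scale

def knn_predict(train_x, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Two features on very different scales; without normalization the second
# feature would dominate the distance.
raw = [(1.0, 1000.0), (1.2, 1100.0), (0.9, 900.0),
       (3.0, 5000.0), (3.2, 5200.0), (2.9, 4800.0)]
labels = ["background"] * 3 + ["signal"] * 3
train, scale = min_max_normalize(raw)
print(knn_predict(train, labels, scale((3.1, 5100.0)), k=3))  # signal
```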
43 Random Forests:
- ensemble of decision trees
- developed by Leo Breiman (2001)
- no boosting between individual trees
- events are classified by the individual trees
- uses the average for the final classification: $s = \frac{1}{n_\text{trees}} \sum_{i=1}^{n_\text{trees}} s_i$
44 Random Forests: Output
- MC scaled to data expectations
- choose final cut on signalness
45 Random Forests in RapidMiner
46 Weka Random Forest:
47 Boosting:
- uses an ensemble of weak classifiers (decision trees)
- weights are increased for misclassified events
- a weighted vote is applied
- each classifier depends on the performance of the previous ones
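The boosting ideas listed here (reweighting of misclassified events, weighted vote) can be illustrated with a small AdaBoost sketch. It uses one-dimensional threshold classifiers (decision stumps) as weak learners and labels +1/-1; these choices and the toy data are assumptions for illustration, not the configuration used in the talk.

```python
import math

def train_stump(xs, ys, w):
    """Find the threshold classifier with the smallest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (+1, -1):
            pred = [sign if x >= thr else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, pred, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, thr, sign))
        # Increase the weights of misclassified events, decrease the others.
        w = [wi * math.exp(-alpha * y * (sign if x >= thr else -sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    vote = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if vote >= 0 else -1

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, -1, 1, 1, 1]  # no single threshold separates these labels
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

No single stump can describe this label pattern, but the weighted vote of three stumps can, because each new stump concentrates on the events the previous ones got wrong.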
49 AdaBoost in RapidMiner
51 3. Validating the results
52 Split Validation:
53 Cross Validation:
54 Split Validation vs. Cross Validation:
55 Cross-validated predictions: [table with columns Cut, Nugen, Corsika, Sum]
56 Cross Validation for a limited number of examples? YES! Leave One Out!
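A sketch of how the index sets for k-fold cross validation can be generated; leave-one-out is the special case k = n. The interleaved fold assignment and the `fit_and_score` callback are illustrative assumptions, not what RapidMiner does internally.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross validation on n examples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test

def cross_validate(n, k, fit_and_score):
    """Return mean and standard deviation of the per-fold scores (requires k > 1)."""
    scores = [fit_and_score(train, test) for train, test in k_fold_indices(n, k)]
    mean = sum(scores) / k
    variance = sum((s - mean) ** 2 for s in scores) / (k - 1)
    return mean, variance ** 0.5

folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print(train, test)
# [1, 4, 2, 5] [0, 3]
# [0, 3, 2, 5] [1, 4]
# [0, 3, 1, 4] [2, 5]

# Leave-one-out: every example is the test set exactly once.
leave_one_out = list(k_fold_indices(4, 4))
```

Every example is used for testing exactly once, and the spread of the per-fold scores gives the kind of uncertainty quoted as "±" in cross-validated predictions.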
57 4. Application on data
58 Change the scaling of Corsika such that it matches data for Signalness > 0.2
59 Data/MC mismatch: Underestimation of Background
60 Application of RF on 10% of data: [table with columns Cut, Nugen, Corsika, Sum, Data]
61 Possible Improvements: Ensembles
62 Hierarchical Clustering: Agglomerative
63 Hierarchical Clustering: Divisive
64 k-means Clustering:
- Pick k means at random
- Calculate the distance of the examples to each mean
- Assign each example to the cluster of its closest mean
- Recalculate the mean of each cluster
- Reiterate until the means no longer change
Significantly faster than hierarchical clustering, but k has to be known in advance.
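The k-means loop described on this slide can be sketched in pure Python. Picking the initial means from the examples themselves, the Euclidean distance, and the toy points are assumptions made for illustration.

```python
import random
from math import dist

def k_means(points, k, seed=0, max_iter=100):
    """points: list of equal-length tuples; returns (means, clusters)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # pick k initial means at random
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each example to the cluster of its closest mean.
            nearest = min(range(k), key=lambda i: dist(p, means[i]))
            clusters[nearest].append(p)
        # Recalculate the mean of each cluster (keep the old mean if it is empty).
        new_means = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_means == means:  # the means no longer change
            break
        means = new_means
    return means, clusters

# Two well separated groups of hypothetical examples.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, clusters = k_means(points, k=2)
print(sorted(means))
```

For data with features on different scales, the points should be normalized before clustering, since the distance is otherwise dominated by the feature with the largest range.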
65 Careful when using clusters: Normalize!!!
66 Summary:
- IceCube is interesting for detailed studies in machine learning
- studies can be carried out using RapidMiner
- MRMR for Feature Selection
- Simple learners are good for benchmarks
- Cross Validation is good for you!
- Signal/Background separation using data mining is possible!