Introduction Predictive Analytics Tools: Weka, R
- Logan Thornton
1 Introduction Predictive Analytics Tools: Weka, R
Predictive Analytics Center of Excellence, San Diego Supercomputer Center, University of California, San Diego
2 Available Data Mining Tools
COTS (commercial off-the-shelf):
- IBM Intelligent Miner
- SAS Enterprise Miner
- Oracle ODM
- Microstrategy
- Microsoft DBMiner
- Pentaho
- Matlab
- Teradata
Open Source:
- WEKA
- KNIME
- Orange
- RapidMiner
- NLTK
- R
- Rattle
3 Agenda
WEKA
- Intro and background
- Data Preparation
- Creating Models / Applying Algorithms
- Evaluating Results
R
- R Background
- R Basics
- Outline
- R-Studio Overview
- Hands On (homework)
4 WEKA
5 Download and Install WEKA
- Website:
7/1/14
6 What is WEKA?
Waikato Environment for Knowledge Analysis
- WEKA is a data mining/machine learning application developed by the Department of Computer Science, University of Waikato, New Zealand
- WEKA is open source software written in Java
- WEKA is a collection of machine learning algorithms and tools for data mining tasks
  - data pre-processing, classification, regression, clustering, association, and visualization
- WEKA is well suited for developing new machine learning schemes
- The weka is also a bird found only in New Zealand
7 Advantages of Weka
- Free availability
  - under the GNU General Public License
- Portability
  - fully implemented in the Java programming language and thus runs on almost any modern computing platform
  - Windows, Mac OS X and Linux
- Comprehensive collection of data preprocessing and modeling techniques
  - Supports standard data mining tasks: data preprocessing, clustering, classification, regression, visualization, and feature selection
- Easy-to-use GUI
- Provides access to SQL databases
  - uses Java Database Connectivity (JDBC) and can process the result returned by a database query
8 Disadvantages
- Sequence modeling is not covered by the algorithms included in the Weka distribution
- Not capable of multi-relational data mining
9 WEKA Walk Through: Main GUI
Three graphical user interfaces
- The Explorer (exploratory data analysis)
  - pre-process data
  - build classifiers
  - cluster data
  - find associations
  - attribute selection
  - data visualization
- The Experimenter (experimental environment)
  - used to compare the performance of different learning schemes
- The KnowledgeFlow (new process-model-inspired interface)
  - Java-Beans-based interface for setting up and running machine learning experiments
Command line interface ("Simple CLI")
- More at:
11 WEKA:: Explorer: Preprocess
Importing data
- Data format
  - Uses flat text files to describe the data
- Data can be imported from a file in various formats:
  - ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
12 WEKA:: ARFF file
  @relation ...
  @attribute age numeric
  @attribute sex { female, male }
  @attribute chest_pain_type { typ_angina, asympt, non_anginal, ... }
  @attribute cholesterol numeric
  @attribute exercise_induced_angina { no, yes }
  @attribute class { present, not_present }
  @data
  63,male,typ_angina,233,no,not_present
  67,male,asympt,286,yes,present
  67,male,asympt,229,yes,present
  38,female,non_anginal,?,no,not_present
  ...
A more thorough description is available here
15 Weka: Explorer: Preprocess
Preprocessing data
- Visualization
- Filtering algorithms
  - filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria
- Removing Noisy Data
- Adding Additional Attributes
- Removing Attributes
18 WEKA:: Explorer: Preprocess
- Used to define filters that transform the data
- WEKA contains filters for:
  - discretization, normalization, resampling, attribute selection, transforming, combining attributes, etc.
22 Explorer: Visualize
- Visualization is very useful in practice
  - helps determine the difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
- Color-coded class values
- Jitter option to deal with nominal attributes (and to detect hidden data points)
- Zoom-in function
25 Explorer: Attribute Selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
26 WEKA:: Explorer: building classifiers
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ...
- "Meta"-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, ...
29 WEKA:: Explorer: building clusters
- WEKA contains "clusterers" for finding groups of similar instances in a dataset
- Implemented schemes are:
  - k-means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to "true" clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
30 Explorer: Finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter => bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
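As a rough illustration of what "support" and "confidence" mean for a rule such as milk, butter => bread, eggs, the sketch below counts them by hand in base R. The five transactions are made-up toy data (not from the slides), and this is only the counting step, not the Apriori search itself.

```r
# Toy transactions (hypothetical data, for illustration only)
transactions <- list(
  c("milk", "butter", "bread", "eggs"),
  c("milk", "butter", "bread"),
  c("milk", "butter", "bread", "eggs"),
  c("milk", "eggs"),
  c("bread", "eggs")
)

# TRUE for each transaction that contains every item in `items`
contains_all <- function(items) {
  vapply(transactions, function(tx) all(items %in% tx), logical(1))
}

lhs <- c("milk", "butter")   # left-hand side of the rule
rhs <- c("bread", "eggs")    # right-hand side of the rule

# Support (as a count): transactions containing both sides of the rule
support_count <- sum(contains_all(c(lhs, rhs)))

# Confidence: among transactions containing the left side,
# the fraction that also contain the right side
confidence <- support_count / sum(contains_all(lhs))

print(support_count)
print(confidence)
```

Apriori applies the same two measures, but searches the space of item sets efficiently instead of checking one hand-picked rule.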
31 References and Resources
References:
- WEKA website:
- WEKA Tutorial:
  - Machine Learning with WEKA: a presentation demonstrating all graphical user interfaces (GUI) in Weka
  - A presentation which explains how to use Weka for exploratory data mining
- WEKA Data Mining Book:
  - Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
- WEKA Wiki: Main_Page
32 R Environment: R Studio
33 Downloading R / R Studio
34 What is R?
An Environment
- R is an integrated suite of software facilities for data manipulation, calculation, and graphical display for data analysis
  - Effective data handling and storage
  - Suite of operators for calculations on arrays
  - Large, coherent, integrated collection of intermediate tools for data analysis
  - Programming language and run-time environment
- Based on the S language, which was developed at Bell Labs
- GNU open source software
  - Under the terms of the Free Software Foundation's GNU General Public License
- Open source implementation of the S language (S-PLUS is the commercial implementation)
  - Well-developed, simple and effective programming language
- Highly extensible
35 R Features
- Software package designed for data analysis and graphical representation
- Interactive, but may also be used programmatically
- Platform independence
  - Compiles and runs on a wide variety of platforms: Unix-based systems, Windows and MacOS
- Free, open source code
- Engaged community
  - over 4,200 user-contributed packages
- Extendable
  - User-defined functions
  - > 4000 packages available in the CRAN package repository
  - Supports extensions / add-ons (e.g., rApache)
  - Compatible with other languages (e.g., SQL, Perl, C)
- Data Import
  - Pre-processing data from different sources
- Scalability
  - Parallel R packages
36 R packages for DM
- Clustering
- Classification
- Association Rules
- Sequential patterns
- Time Series
- Statistics
- Graphics
- Data manipulation
37 Data Mining
- linear models (lm)
- generalized linear models (glm)
- generalized additive models (gam)
- linear mixed effects models (lme)
- quantile regression (rq)
- vector generalized additive models (vgam)
- lasso, ridge, and elastic net models (glmnet)
- non-linear models (nls; nlm for general non-linear minimization)
- non-linear mixed effects models (nlmer)
- linear discriminant analysis (lda)
- quadratic discriminant analysis (qda)
- trees (tree)
- random forests (randomForest)
- support vector machines (svm)
- neural networks (nnet)
- k-nearest neighbors (knn)
- kmeans
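Three of the functions listed above (lm, glm, kmeans) ship with base R, so they can be tried without installing anything. The sketch below is only a quick illustration; the built-in datasets (cars, mtcars, iris), the model formulas, and the choice of 3 clusters are assumptions made for the example, not part of the slides.

```r
# Linear model: stopping distance as a function of speed (built-in `cars` data)
fit_lm <- lm(dist ~ speed, data = cars)

# Generalized linear model: logistic regression of transmission type on weight
# (built-in `mtcars` data; `am` is 0/1)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# k-means with 3 clusters on the four numeric columns of `iris`
set.seed(42)  # k-means starts from random centers, so fix the seed
fit_km <- kmeans(iris[, 1:4], centers = 3)

print(coef(fit_lm))
print(coef(fit_glm))

# Cross-tabulate cluster assignments against the known species
print(table(fit_km$cluster, iris$Species))
```

Most of the other entries in the list live in add-on packages, which must be installed and loaded before use.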
38 Big Data Options
- lapply-based parallelism
  - multicore library
  - snow library
- foreach-based parallelism
  - doMC backend
  - doSNOW backend
  - doMPI backend
- Map/Reduce- (Hadoop-) based parallelism
  - Hadoop streaming with R mappers/reducers
  - RHadoop (rmr, rhdfs, rhbase)
  - RHIPE
- Poor-man's parallelism
  - lots of Rs running
  - lots of input files
- Hands-off parallelism
  - OpenMP support compiled into R build
  - Dangerous!
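A minimal sketch of lapply-based parallelism follows. It uses the parallel package that ships with R (which absorbed the multicore and snow interfaces) rather than the older multicore library named on the slide. The function slow_square and the choice of 2 cores are assumptions made for the example.

```r
library(parallel)

# A stand-in for an expensive computation (hypothetical example function)
slow_square <- function(x) {
  x^2
}

inputs <- 1:8

# Serial version: one element at a time
serial_result <- lapply(inputs, slow_square)

# Parallel version: mclapply forks worker processes on Unix-like systems.
# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
parallel_result <- mclapply(inputs, slow_square, mc.cores = n_cores)

# Both approaches return a list with the same contents
print(identical(serial_result, parallel_result))
```

The appeal of this style is that the parallel call is a drop-in replacement for lapply; whether it is actually faster depends on how expensive each element is relative to the overhead of starting workers.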
39 R Considerations/Limitations
- Command line interface
- Performance
- Memory limits
  - memory limits depend on the build (32-bit vs. 64-bit)
  - the limit for a 32-bit build of R on Windows depends on the underlying OS version
- Syntax curiosities
- Learning curve
40 R-Studio Overview
R-Studio is an integrated development environment to support R code. R-Studio runs in two ways:
- Desktop version for Linux, Mac, Windows: single user, perfect for a laptop or desktop machine
- Server version for Linux: allows any number of remote users to run R-Studio within a web browser, and facilitates sharing of code and data among team members
41 General View of R-Studio
- Editor window
- Project window (pop-up): currently loaded workspace and history
- Multi-tab display: shows graphics, current directory and loaded packages
- Console: run R commands
42 The Fundamentals
- Launch R
- Quit R
  - q()
- Getting help
  - help(package_name) or ?package_name, or help.start()
  - example(package_name)
  - ??keyword
  - library(help = "package_name")
43 The Basics: R environmental commands
- list objects: ls(), objects()
- list files in current directory: list.files()
- show current working directory: getwd()
- set working directory: setwd()
- remove objects: rm()
Workspace versus console
- Clear workspace: rm(list = ls())
- Clear console: Ctrl+L
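The commands above can be tried in a short session like the following sketch. The object names x and y are made up for the example, and the working directory is switched to R's temporary directory only so that nothing on disk is disturbed.

```r
# Create two objects, list them, then remove one
x <- 1
y <- 2
print(ls())   # lists the objects currently in the workspace
rm(x)         # remove a single object

# Remember the current working directory, move to a scratch one, then move back
old_dir <- getwd()
setwd(tempdir())
print(list.files())  # files in the (temporary) working directory
setwd(old_dir)
```

Clearing the whole workspace with rm(list = ls()) is deliberately not run here, since it would also delete any objects you already had loaded.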
44 The Basics (Naming Variables)
- Requirements
  - Case sensitive; names must start with a letter or a period (.)
  - Only letters, numbers, underscores and periods
- Reserved keywords (cannot be used as names)
  - break, else, FALSE, for, function, if, Inf, NA, NaN, next, repeat, return, TRUE, while
- Names not limited in length
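One way to check the naming rules in practice is base R's make.names(), which rewrites a string into a syntactically valid name and leaves it untouched if it already is one. The candidate names below are made up for the example.

```r
# Candidate variable names (hypothetical examples)
candidates <- c("my.age", ".hidden", "age_2", "2fast", "_tmp", "while", "My.Age")

# make.names() returns its input unchanged only when it is already a valid name,
# so comparing input and output gives a validity check
is_valid <- candidates == make.names(candidates)

print(data.frame(name = candidates, valid = is_valid))
```

Names starting with a digit or an underscore, and reserved keywords such as while, come back as invalid. my.age and My.Age are both valid and, because R is case sensitive, refer to different objects.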
45 The Basics
- All entities in R are called objects
  - arrays, vectors, matrices, functions, lists, data frames, factors
- Expressions vs. assignments
  - 10 + 10
  - my.age <- 23
  - my.age < - 23 (note the added space: this compares my.age with -23 instead of assigning)
  - age <- c(my.age, 14, 59, 32)
  - my.age == 40
- Data types
  - Numeric, Integer, Complex, Logical, Character
- Function call
  - > mean(weight)
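The difference between an expression, an assignment, and the accidental comparison caused by a stray space can be seen by running the slide's lines directly. The variable name spaced is an addition for this sketch, used only to capture the comparison result.

```r
10 + 10                  # expression: the value is printed but not stored

my.age <- 23             # assignment: 23 is stored in my.age

# With a space between "<" and "-", R reads this as "my.age < (-23)",
# a comparison that returns a logical value and changes nothing
spaced <- my.age < - 23
print(spaced)

age <- c(my.age, 14, 59, 32)   # combine values into a numeric vector
print(my.age == 40)            # comparison, not assignment
print(class(age))
print(mean(age))               # function call
```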
46 Summary of Data Structures
                 Linear     Rectangular
Homogeneous      Vectors    Matrices
Heterogeneous    Lists      Data Frames
- Vectors and matrices must contain a single data type
- Character type trumps numeric: mixed values are coerced to character
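The coercion rule is easy to see in a short sketch. The values and column names below are made up for the example; stringsAsFactors = FALSE is set explicitly so the result is the same on older and newer versions of R.

```r
# Homogeneous, linear: mixing numbers and text forces everything to character
v <- c(1, 2, "three")
print(class(v))

# Homogeneous, rectangular: a matrix holds a single type
m <- matrix(1:6, nrow = 2)
print(dim(m))

# Heterogeneous, linear: a list keeps each element's own type
l <- list(1, "a", TRUE)
print(sapply(l, class))

# Heterogeneous, rectangular: a data frame allows a different type per column
df <- data.frame(age = c(23, 14), name = c("Ann", "Bob"),
                 stringsAsFactors = FALSE)
print(sapply(df, class))
```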
47 The Basics (Functions)
- Basic functions
  - mean(age)
  - sd(age)
  - sqrt(var(age))
  - TIP: to list all functions in the search path: sapply(search(), ls, all.names = TRUE)
- User-defined functions
  - score <- function(age) age * 10
- Using the correct functions for the given data type
  - apply() family
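A short sketch tying these pieces together: built-in summary functions, a user-defined function, and one member of the apply() family. The age values reuse the numbers from the earlier slide; the function name score is an assumption for the example.

```r
age <- c(23, 14, 59, 32)

# Built-in functions; sd() is the square root of var()
print(mean(age))
print(sd(age))
print(sqrt(var(age)))

# A user-defined function that multiplies its argument by 10
score <- function(x) {
  x * 10
}

# sapply() applies the function to every element and simplifies to a vector
scores <- sapply(age, score)
print(scores)
```

Because score() is already vectorized, score(age) gives the same result; sapply() is shown only to illustrate the apply() family.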
48 Function Components
Signature: writeLines(text, con = stdout(), sep = "\n", useBytes = FALSE)
Example call: writeLines("146.6", "poprate.txt", sep = "\n")
- function name: writeLines
- parentheses: enclose the argument list
- commas: separate the arguments
- first argument: "146.6" (the text to write)
- second argument: "poprate.txt" (the connection; here a file name)
- optional argument: sep = "\n", which can also be passed by position, as in writeLines("146.6", "poprate.txt", "\n")
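The call from the slide can be run as is. The sketch below writes to R's temporary directory instead of the current directory (an assumption made so the example leaves no stray files) and reads the file back to confirm what was written.

```r
# Write to a scratch location rather than the working directory
out_file <- file.path(tempdir(), "poprate.txt")

# First argument: the text; second: where to write; sep: the line terminator
writeLines("146.6", out_file, sep = "\n")

# Read the file back to check its contents
contents <- readLines(out_file)
print(contents)
```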
49 Importing Data/Exporting Data
- Flat files
  - Import: > AHW <- read.csv("AHW_1.csv", header = TRUE)
  - Import: > weatherdata <- read.table(file = "c:/work/dm1/weather.csv", header = TRUE, sep = ",")
  - Import (interactive file chooser): > USTemps <- read.table(file = file.choose(), header = TRUE)
  - Export: write.csv() or write.table()
- Databases
  - Import
    - > connection <- dbConnect(driver, user, password, host, dbname)
    - > AHW <- dbSendQuery(connection, "SELECT * FROM AHW")
  - Export
    - > connection <- dbConnect(driver, user, password, host, dbname)
    - > dbWriteTable(connection, "AHW", AHW)
- R objects
  - Import: > load("AHW.Rdata")
  - Export: > save(AHW, file = "New_AHW.Rdata")
- Web
  - > connection <- url( )
  - > AHW <- read.csv(connection, header = TRUE)
- Plots
  - > png(filename = "c:/r/figure.png", height = 295, width = 300, bg = "white")
  - > pdf(file = "c:/r/figure.pdf", height = 3.5, width = 5)
  - > dev.off() # turn off device driver (to flush output to png/pdf)
50 Loading dataset to R-Studio (Simple text file)
- Name of data frame to be created with imported data
- Options for parsing the text data into fields and values
- How the data frame will look once the data are imported
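A self-contained round trip through the flat-file, R-object, and plot routes can be sketched as follows. It uses the built-in cars dataset and files in R's temporary directory; those choices, and all file names below, are assumptions for the example rather than the AHW data from the slide. The database and web routes are left out because they need a live connection.

```r
# Flat file: export a data frame to CSV, then import it again
csv_path <- file.path(tempdir(), "cars_export.csv")
write.csv(cars, csv_path, row.names = FALSE)
cars_back <- read.csv(csv_path, header = TRUE)
print(head(cars_back))

# R objects: save to an .RData file, then load into a separate environment
rdata_path <- file.path(tempdir(), "cars.RData")
save(cars, file = rdata_path)
loaded <- new.env()
load(rdata_path, envir = loaded)
print(ls(loaded))

# Plots: open a PDF device, draw, then close the device to flush the file
pdf_path <- file.path(tempdir(), "figure.pdf")
pdf(file = pdf_path, height = 3.5, width = 5)
plot(cars)
invisible(dev.off())
print(file.exists(pdf_path))
```

Loading into a separate environment is optional; a plain load(rdata_path) restores the objects directly into the workspace, overwriting any objects with the same names.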
51 Extending R
- Install a package
  - from command line: > install.packages("name_of_package")
  - from GUI: Packages & Data > Package Installer
- Load library (to use installed package)
  - > library(name_of_package)
  - Example: > library(markdown)
- Use library function
  - > function_name(parameters)
  - Example: > markdownToHTML("example.md")
52 More Information
- The R Manuals
- An Introduction to R
- Books
53 Other Resources
- IRC: /server irc.freenode.net, then /join #R
54 The end
More informationtesto dello schema Secondo livello Terzo livello Quarto livello Quinto livello
Extracting Knowledge from Biomedical Data through Logic Learning Machines and Rulex Marco Muselli Institute of Electronics, Computer and Telecommunication Engineering National Research Council of Italy,
More informationWEKA A Machine Learning Workbench for Data Mining
Chapter 1 WEKA A Machine Learning Workbench for Data Mining Eibe Frank, Mark Hall, Geoffrey Holmes, Richard Kirkby, Bernhard Pfahringer, Ian H. Witten Department of Computer Science, University of Waikato,
More informationWhat s Cooking in KNIME
What s Cooking in KNIME Thomas Gabriel Copyright 2015 KNIME.com AG Agenda Querying NoSQL Databases Database Improvements & Big Data Copyright 2015 KNIME.com AG 2 Querying NoSQL Databases MongoDB & CouchDB
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationDistance Learning and Examining Systems
Lodz University of Technology Distance Learning and Examining Systems - Theory and Applications edited by Sławomir Wiak Konrad Szumigaj HUMAN CAPITAL - THE BEST INVESTMENT The project is part-financed
More informationBig Data Analytics and Optimization
Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e C e r t i f i c a t e P r o g r a m s i n A c c e l e r a t e d E n g i n e e r i n
More informationWROX Certified Big Data Analyst Program by AnalytixLabs and Wiley
WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or
More informationBOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
More informationIntroduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.
Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus
More informationMachine Learning with MATLAB David Willingham Application Engineer
Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More informationOrigins, Evolution, and Future Directions of MATLAB Loren Shure
Origins, Evolution, and Future Directions of MATLAB Loren Shure 2015 The MathWorks, Inc. 1 Agenda Origins Peaks 5 Evolution 0-5 Tomorrow 2 0 y -2-3 -2-1 x 0 1 2 3 2 Computational Finance Workflow Access
More informationIntroduction to Data Mining
Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association
More informationIs a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
More informationImporting Data into R
1 R is an open source programming language focused on statistical computing. R supports many types of files as input and the following tutorial will cover some of the most popular. Importing from text
More informationCourse Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationIBM SPSS Modeler Professional
IBM SPSS Modeler Professional Make better decisions through predictive intelligence Highlights Create more effective strategies by evaluating trends and likely outcomes. Easily access, prepare and model
More informationWEKA Explorer User Guide for Version 3-4-3
WEKA Explorer User Guide for Version 3-4-3 Richard Kirkby Eibe Frank November 9, 2004 c 2002, 2004 University of Waikato Contents 1 Launching WEKA 2 2 The WEKA Explorer 2 Section Tabs................................
More informationHow To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn
More informationFortgeschrittene Computerintensive Methoden
Fortgeschrittene Computerintensive Methoden Einheit 5: mlr - Machine Learning in R Bernd Bischl Matthias Schmid, Manuel Eugster, Bettina Grün, Friedrich Leisch Institut für Statistik LMU München SoSe 2015
More informationProgramming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
More informationCHAPTER 6 IMPLEMENTATION OF CONVENTIONAL AND INTELLIGENT CLASSIFIER FOR FLAME MONITORING
135 CHAPTER 6 IMPLEMENTATION OF CONVENTIONAL AND INTELLIGENT CLASSIFIER FOR FLAME MONITORING 6.1 PROPOSED SETUP FOR FLAME MONITORING IN BOILERS The existing flame monitoring system includes the flame images
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationData Mining and Business Intelligence CIT-6-DMB. http://blackboard.lsbu.ac.uk. Faculty of Business 2011/2012. Level 6
Data Mining and Business Intelligence CIT-6-DMB http://blackboard.lsbu.ac.uk Faculty of Business 2011/2012 Level 6 Table of Contents 1. Module Details... 3 2. Short Description... 3 3. Aims of the Module...
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More information