How To Understand Data Mining In R And Rattle
|
|
|
- Michael Malone
- 5 years ago
- Views:
Transcription
1 http: // togaware. com Copyright 2014, 1/40 Data Analytics and Business Intelligence (8696/8697) Introducing Data Science with R and Rattle [email protected] Chief Data Scientist Australian Taxation Office Adjunct Professor, Australian National University Adjunct Professor, University of Canberra Fellow, Institute of Analytics Professionals of Australia [email protected]
2 An Introduction to Data Mining http: // togaware. com Copyright 2014, 3/40 Overview 1 An Introduction to Data Mining 2 The Rattle Package for Data Mining 3 Moving Into R
3 An Introduction to Data Mining Big Data and Big Business http: // togaware. com Copyright 2014, 4/40 Data Mining A data driven analysis to uncover otherwise unknown but useful patterns in large datasets, to discover new knowledge and to develop predictive models, turning data and information into knowledge and (one day perhaps) wisdom, in a timely manner.
4 An Introduction to Data Mining Big Data and Big Business http: // togaware. com Copyright 2014, 5/40 Data Mining Application of Machine Learning Statistics Software Engineering and Programming with Data Effective Communications and Intuition... to Datasets that vary by Volume, Velocity, Variety, Value, Veracity... to discover new knowledge... to improve business outcomes... to deliver better tailored services
5 An Introduction to Data Mining Big Data and Big Business http: // togaware. com Copyright 2014, 6/40 Data Mining in Research Health Research Adverse reactions using linked Pharmaceutical, General Practitioner, Hospital, Pathology datasets. Astronomy Microlensing events in the Large Magellanic Cloud of several million observed stars (out of 10 billion). Psychology Investigation of age-of-onset for Alzheimer s disease from 75 variables for 800 people. Social Sciences Survey evaluation. Social network analysis - identifying key influencers.
6 An Introduction to Data Mining Big Data and Big Business http: // togaware. com Copyright 2014, 7/40 Data Mining in Government Australian Taxation Office Lodgment ($110M) Tax Havens ($150M) Tax Fraud ($250M) Immigration and Border Control Check passengers before boarding Health and Human Services Doctor shoppers Over servicing
7 An Introduction to Data Mining Big Data and Big Business http: // togaware. com Copyright 2014, 8/40 The Business of Data Mining SAS has annual revenues of $3B (2013) IBM bought SPSS for $1.2B (2009) Analytics is >$100B business and >$320B by 2020 Amazon, ebay/paypal, Google, Facebook, LinkedIn,... Shortage of 180,000 data scientists in US in 2018 (McKinsey)...
8 An Introduction to Data Mining Algorithms http: // togaware. com Copyright 2014, 9/40 Basic Tools: Data Mining Algorithms Cluster Analysis (kmeans, wskm) Association Analysis (arules) Linear Discriminant Analysis (lda) Logistic Regression (glm) Decision Trees (rpart, wsrpart) Random Forests (randomforest, wsrf) Boosted Stumps (ada) Neural Networks (nnet) Support Vector Machines (kernlab)... That s a lot of tools to learn in R! Many with different interfaces and options.
9 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 10/40 Overview 1 An Introduction to Data Mining 2 The Rattle Package for Data Mining 3 Moving Into R
10 The Rattle Package for Data Mining A GUI for Data Mining http: // togaware. com Copyright 2014, [email protected] 11/40 Why a GUI? Statistics can be complex and traps await So many tools in R to deliver insights Effective analyses should be scripted Scripting also required for repeatability R is a language for programming with data How to remember how to do all of this in R? How to skill up 150 data analysts with Data Mining?
11 The Rattle Package for Data Mining A GUI for Data Mining http: // togaware. com Copyright 2014, [email protected] 12/40 Users of Rattle Today, Rattle is used world wide in many industries Health analytics Customer segmentation and marketing Fraud detection Government It is used by Universities to teach Data Mining Within research projects for basic analyses Consultants and Analytics Teams across business It is and will remain freely available. CRAN and
12 The Rattle Package for Data Mining Setting Things Up http: // togaware. com Copyright 2014, 13/40 Installation Rattle is built using R Need to download and install R from cran.r-project.org Recommend also install RStudio from Then start up RStudio and install Rattle: install.packages("rattle") Then we can start up Rattle: rattle() Required packages are loaded as needed.
13 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 14/40 Tour A Tour Thru Rattle: Startup
14 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 15/40 Tour A Tour Thru Rattle: Loading Data
15 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 16/40 Tour A Tour Thru Rattle: Explore Distribution
16 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 17/40 Tour A Tour Thru Rattle: Explore Correlations
17 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 18/40 Tour A Tour Thru Rattle: Hierarchical Cluster
18 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 19/40 Tour A Tour Thru Rattle: Decision Tree
19 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 20/40 Tour A Tour Thru Rattle: Decision Tree Plot
20 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 21/40 Tour A Tour Thru Rattle: Random Forest
21 The Rattle Package for Data Mining http: // togaware. com Copyright 2014, 22/40 Tour A Tour Thru Rattle: Risk Chart Risk Chart Random Forest weather.csv [test] RainTomorrow Risk Scores Lift Performance (%) % 1 RainTomorrow (92%) Rain in MM (97%) 0 Precision Caseload (%)
22 Moving Into R http: // togaware. com Copyright 2014, [email protected] 23/40 Overview 1 An Introduction to Data Mining 2 The Rattle Package for Data Mining 3 Moving Into R
23 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 24/40 Data Scientists are Programmers of Data But... Data scientists are programmers of data A GUI can only do so much R is a powerful statistical language Data Scientists Desire... Scripting Transparency Repeatability Sharing
24 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 25/40 From GUI to CLI Rattle s Log Tab
25 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 26/40 From GUI to CLI Rattle s Log Tab
26 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 27/40 Step 1: Load the Dataset dsname <- "weather" ds <- get(dsname) dim(ds) ## [1] names(ds) ## [1] "Date" "Location" "MinTemp" "... ## [5] "Rainfall" "Evaporation" "Sunshine" "... ## [9] "WindGustSpeed" "WindDir9am" "WindDir3pm" "... ## [13] "WindSpeed3pm" "Humidity9am" "Humidity3pm" "......
27 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 28/40 Step 2: Observe the Data Observations head(ds) ## Date Location MinTemp MaxTemp Rainfall Evapora... ## Canberra ## Canberra ## Canberra tail(ds) ## Date Location MinTemp MaxTemp Rainfall Evapo... ## Canberra ## Canberra ## Canberra
28 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 29/40 Step 2: Observe the Data Structure str(ds) ## 'data.frame': 366 obs. of 24 variables: ## $ Date : Date, format: " " " ## $ Location : Factor w/ 49 levels "Adelaide","Alba... ## $ MinTemp : num ## $ MaxTemp : num ## $ Rainfall : num ## $ Evaporation : num ## $ Sunshine : num ## $ WindGustDir : Ord.factor w/ 16 levels "N"<"NNE"<"N... ## $ WindGustSpeed: num ## $ WindDir9am : Ord.factor w/ 16 levels "N"<"NNE"<"N... ## $ WindDir3pm : Ord.factor w/ 16 levels "N"<"NNE"<"N......
29 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 30/40 Step 2: Observe the Data Summary summary(ds) ## Date Location MinTemp... ## Min. : Canberra :366 Min. : ## 1st Qu.: Adelaide : 0 1st Qu.: ## Median : Albany : 0 Median : ## Mean : Albury : 0 Mean : ## 3rd Qu.: AliceSprings : 0 3rd Qu.: ## Max. : BadgerysCreek: 0 Max. : ## (Other) : 0... ## Rainfall Evaporation Sunshine Wind... ## Min. : 0.00 Min. : 0.20 Min. : 0.00 NW... ## 1st Qu.: st Qu.: st Qu.: 5.95 NNW... ## Median : 0.00 Median : 4.20 Median : 8.60 E......
30 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 31/40 Step 2: Observe the Data Variables id <- c("date", "Location") target <- "RainTomorrow" risk <- "RISK_MM" (ignore <- union(id, risk)) ## [1] "Date" "Location" "RISK_MM" (vars <- setdiff(names(ds), ignore)) ## [1] "MinTemp" "MaxTemp" "Rainfall" "... ## [5] "Sunshine" "WindGustDir" "WindGustSpeed" "... ## [9] "WindDir3pm" "WindSpeed9am" "WindSpeed3pm" "... ## [13] "Humidity3pm" "Pressure9am" "Pressure3pm" "......
31 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 32/40 Step 3: Clean the Data Remove Missing dim(ds) ## [1] sum(is.na(ds[vars])) ## [1] 47 ds <- ds[-attr(na.omit(ds[vars]), "na.action"),]
32 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 33/40 Step 3: Clean the Data Remove Missing dim(ds) ## [1] sum(is.na(ds[vars])) ## [1] 0
33 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 34/40 Step 3: Clean the Data Target as Categoric summary(ds[target]) ## RainTomorrow ## Min. :0.000 ## 1st Qu.:0.000 ## Median :0.000 ## Mean :0.183 ## 3rd Qu.:0.000 ## Max. : ds[target] <- as.factor(ds[[target]]) levels(ds[target]) <- c("no", "Yes")
34 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 35/40 Step 3: Clean the Data Target as Categoric summary(ds[target]) ## RainTomorrow ## 0:268 ## 1: count RainTomorrow
35 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 36/40 Step 4: Prepare for Modelling (form <- formula(paste(target, "~."))) ## RainTomorrow ~. (nobs <- nrow(ds)) ## [1] 328 train <- sample(nobs, 0.70*nobs) length(train) ## [1] 229 test <- setdiff(1:nobs, train) length(test) ## [1] 99
36 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 37/40 Step 5: Build the Model Random Forest library(randomforest) model <- randomforest(form, ds[train, vars], na.action=na.omit) model ## ## Call: ## randomforest(formula=form, data=ds[train, vars],... ## Type of random forest: classification ## Number of trees: 500 ## No. of variables tried at each split: 4...
37 Moving Into R Programming with Data http: // togaware. com Copyright 2014, [email protected] 38/40 Step 6: Evaluate the Model Risk Chart pr <- predict(model, ds[test,], type="prob")[,2] riskchart(pr, ds[test, target], ds[test, risk], title="random Forest - Risk Chart", risk=risk, recall=target, thresholds=c(0.35, 0.15)) Random Forest Risk Chart 100 Risk Scores Lift 5 Performance (%) RainTomorrow (98%) RISK_MM (97%) Precision Caseload (%) 19%
38 Moving Into R Resources http: // togaware. com Copyright 2014, [email protected] 39/40 Resources and References OnePageR: Tutorial Notes Rattle: Guides: Practise: Book: Data Mining using Rattle/R Chapter: Rattle and Other Tales Paper: A Data Mining GUI for R R Journal, Volume 1(2)
39 Summary Starting Data Science with Rattle and R http: // togaware. com Copyright 2014, [email protected] 40/40 Lecture Summary Data Science Analytics Data Mining; Rattle as a GUI for Quickly Analysing Data; The Power is with R. This document, sourced from StartL.Rnw revision 430, was processed by KnitR version 1.6 of and took 3.6 seconds to process. It was generated by gjw on nyx running Ubuntu LTS with Intel(R) Xeon(R) CPU 2.67GHz having 4 cores and 12.3GB of RAM. It completed the processing :16:29.
Data Science with R. Introducing Data Mining with Rattle and R. [email protected]
http: // togaware. com Copyright 2013, [email protected] 1/35 Data Science with R Introducing Data Mining with Rattle and R [email protected] Senior Director and Chief Data Miner,
Data Science with R Ensemble of Decision Trees
Data Science with R Ensemble of Decision Trees [email protected] 3rd August 2014 Visit http://handsondatascience.com/ for more Chapters. The concept of building multiple decision trees to produce
Data Analytics and Business Intelligence (8696/8697)
http: // togaware. com Copyright 2014, [email protected] 1/34 Data Analytics and Business Intelligence (8696/8697) Introducing and Interacting with R [email protected] Chief Data
1 R: A Language for Data Mining. 2 Data Mining, Rattle, and R. 3 Loading, Cleaning, Exploring Data in Rattle. 4 Descriptive Data Mining
A Data Mining Workshop Excavating Knowledge from Data Introducing Data Mining using R [email protected] Data Scientist Australian Taxation Office Adjunct Professor, Australian National University
Hands-On Data Science with R Dealing with Big Data. [email protected]. 27th November 2014 DRAFT
Hands-On Data Science with R Dealing with Big Data [email protected] 27th November 2014 Visit http://handsondatascience.com/ for more Chapters. In this module we explore how to load larger datasets
Data Analytics and Business Intelligence (8696/8697)
http: // togaware. com Copyright 2014, [email protected] 1/39 Data Analytics and Business Intelligence (8696/8697) Introducing Data Mining [email protected] Chief Data Scientist Australian
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
Data Science Using Open Souce Tools Decision Trees and Random Forest Using R
Data Science Using Open Souce Tools Decision Trees and Random Forest Using R Jennifer Evans Clickfox [email protected] January 14, 2014 Jennifer Evans (Clickfox) Twitter: JenniferE CF January
Data Science with R Cluster Analysis
Data Science with R Cluster Analysis [email protected] 22nd June 2014 Visit http://onepager.togaware.com/ for more OnePageR s. We focus on the unsupervised method of cluster analysis in this
Didacticiel Études de cas
1 Theme Data Mining with R The rattle package. R (http://www.r project.org/) is one of the most exciting free data mining software projects of these last years. Its popularity is completely justified (see
Azure Machine Learning, SQL Data Mining and R
Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:
R Tools Evaluation. A review by Analytics @ Global BI / Local & Regional Capabilities. Telefónica CCDO May 2015
R Tools Evaluation A review by Analytics @ Global BI / Local & Regional Capabilities Telefónica CCDO May 2015 R Features What is? Most widely used data analysis software Used by 2M+ data scientists, statisticians
Predictive Modeling and Big Data
Predictive Modeling and Presented by Eileen Burns, FSA, MAAA Milliman Agenda Current uses of predictive modeling in the life insurance industry Potential applications of 2 1 June 16, 2014 [Enter presentation
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R
Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be
2015 Workshops for Professors
SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market
MS1b Statistical Data Mining
MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to
Hands-On Data Science with R Exploring Data with GGPlot2. [email protected]. 22nd May 2015 DRAFT
Hands-On Data Science with R Exploring Data with GGPlot2 [email protected] 22nd May 215 Visit http://handsondatascience.com/ for more Chapters. The ggplot2 (Wickham and Chang, 215) package implements
Data Mining. Dr. Saed Sayad. University of Toronto 2010 [email protected]. http://chem-eng.utoronto.ca/~datamining/
Data Mining Dr. Saed Sayad University of Toronto 2010 [email protected] http://chem-eng.utoronto.ca/~datamining/ 1 Data Mining Data mining is about explaining the past and predicting the future by
Big Data and Marketing
Big Data and Marketing Professor Venky Shankar Coleman Chair in Marketing Director, Center for Retailing Studies Mays Business School Texas A&M University http://www.venkyshankar.com [email protected]
COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Deep dive into Haven Predictive Analytics Powered by HP Distributed R and
Data Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
The R pmmltransformations Package
The R pmmltransformations Package Tridivesh Jena Alex Guazzelli Wen-Ching Lin Michael Zeller Zementis, Inc.* Zementis, Inc. Zementis, Inc. Zementis, Inc. Tridivesh.Jena@ Alex.Guazzelli@ Wenching.Lin@ Michael.Zeller@
Index Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate
Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Description The Helzberg School of Management has launched two graduate-level certificates: one in Data
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
KnowledgeSTUDIO HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES
HIGH-PERFORMANCE PREDICTIVE ANALYTICS USING ADVANCED MODELING TECHNIQUES Translating data into business value requires the right data mining and modeling techniques which uncover important patterns within
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
What is R? R s Advantages R s Disadvantages Installing and Maintaining R Ways of Running R An Example Program Where to Learn More
Bob Muenchen, Author R for SAS and SPSS Users, Co-Author R for Stata Users [email protected], http://r4stats.com What is R? R s Advantages R s Disadvantages Installing and Maintaining R Ways of Running
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important
Health Spring Meeting May 2008 Session # 42: Dental Insurance What's New, What's Important Floyd Ray Martin, FSA, MAAA Thomas A. McInteer, FSA, MAAA Jonathan P. Polon, FSA Dental Insurance Fraud Detection
Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III
www.cognitro.com/training Predicitve DATA EMPOWERING DECISIONS Data Mining & Predicitve Training (DMPA) is a set of multi-level intensive courses and workshops developed by Cognitro team. it is designed
Chapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
Predictive modelling around the world 28.11.13
Predictive modelling around the world 28.11.13 Agenda Why this presentation is really interesting Introduction to predictive modelling Case studies Conclusions Why this presentation is really interesting
IBM SPSS Modeler Premium
IBM SPSS Modeler Premium Improve model accuracy with structured and unstructured data, entity analytics and social network analysis Highlights Solve business problems faster with analytical techniques
ElegantJ BI. White Paper. The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis
ElegantJ BI White Paper The Competitive Advantage of Business Intelligence (BI) Forecasting and Predictive Analysis Integrated Business Intelligence and Reporting for Performance Management, Operational
Working with telecommunications
Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature
Course Syllabus. Purposes of Course:
Course Syllabus Eco 5385.701 Predictive Analytics for Economists Summer 2014 TTh 6:00 8:50 pm and Sat. 12:00 2:50 pm First Day of Class: Tuesday, June 3 Last Day of Class: Tuesday, July 1 251 Maguire Building
Sunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets
Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification
Understanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
SEIZE THE DATA. 2015 SEIZE THE DATA. 2015
1 Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. BIG DATA CONFERENCE 2015 Boston August 10-13 Predicting and reducing deforestation
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Data are everywhere. IBM projects that every day we generate 2.5 quintillion bytes of data. In relative terms, this means 90
FREE echapter C H A P T E R1 Big Data and Analytics Data are everywhere. IBM projects that every day we generate 2.5 quintillion bytes of data. In relative terms, this means 90 percent of the data in the
Oracle Advanced Analytics Oracle R Enterprise & Oracle Data Mining
Oracle Advanced Analytics Oracle R Enterprise & Oracle Data Mining R The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated
Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
Predictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar
An In-Depth Look at In-Memory Predictive Analytics for Developers
September 9 11, 2013 Anaheim, California An In-Depth Look at In-Memory Predictive Analytics for Developers Philip Mugglestone SAP Learning Points Understand the SAP HANA Predictive Analysis library (PAL)
extreme Datamining mit Oracle R Enterprise
extreme Datamining mit Oracle R Enterprise Oliver Bracht Managing Director eoda Matthias Fuchs Senior Consultant ISE Information Systems Engineering GmbH extreme Datamining with Oracle R Enterprise About
CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19
PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations
How To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző [email protected] 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
Role of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
High Performance Predictive Analytics in R and Hadoop:
High Performance Predictive Analytics in R and Hadoop: Achieving Big Data Big Analytics Presented by: Mario E. Inchiosa, Ph.D. US Chief Scientist August 27, 2013 1 Polling Questions 1 & 2 2 Agenda Revolution
Our Raison d'être. Identify major choice decision points. Leverage Analytical Tools and Techniques to solve problems hindering these decision points
Analytic 360 Our Raison d'être Identify major choice decision points Leverage Analytical Tools and Techniques to solve problems hindering these decision points Empowerment through Intelligence Our Suite
BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar
Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com [email protected]
Anomaly and Fraud Detection with Oracle Data Mining 11g Release 2
Oracle 11g DB Data Warehousing ETL OLAP Statistics Anomaly and Fraud Detection with Oracle Data Mining 11g Release 2 Data Mining Charlie Berger Sr. Director Product Management, Data
IBM SPSS Modeler Professional
IBM SPSS Modeler Professional Make better decisions through predictive intelligence Highlights Create more effective strategies by evaluating trends and likely outcomes. Easily access, prepare and model
Data Analysis Bootcamp - What To Expect. Damian Herrick Founder, Principal Consultant Lake Hill Analytics, LLC
Data Analysis Bootcamp - What To Expect Damian Herrick Founder, Principal Consultant Lake Hill Analytics, LLC Why Are Companies Using Data and Analytics Today? Data + Predictive Ability + Optimization
Easily Identify Your Best Customers
IBM SPSS Statistics Easily Identify Your Best Customers Use IBM SPSS predictive analytics software to gain insight from your customer database Contents: 1 Introduction 2 Exploring customer data Where do
Big Data Analytics. Benchmarking SAS, R, and Mahout. Allison J. Ames, Ralph Abbey, Wayne Thompson. SAS Institute Inc., Cary, NC
Technical Paper (Last Revised On: May 6, 2013) Big Data Analytics Benchmarking SAS, R, and Mahout Allison J. Ames, Ralph Abbey, Wayne Thompson SAS Institute Inc., Cary, NC Accurate and Simple Analysis
Machine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Easily Identify the Right Customers
PASW Direct Marketing 18 Specifications Easily Identify the Right Customers You want your marketing programs to be as profitable as possible, and gaining insight into the information contained in your
«The Five Myths of Predictive Analytics» 1
The Five Myths of Predictive Analytics @AnalyticsQueen #PAWGov email: [email protected] White paper: www.aryng.com Piyanka Jain President & CEO, Aryng.com «The Five Myths of Predictive Analytics» 1 Analytics
Make Better Decisions Through Predictive Intelligence
IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly
Package acrm. R topics documented: February 19, 2015
Package acrm February 19, 2015 Type Package Title Convenience functions for analytical Customer Relationship Management Version 0.1.1 Date 2014-03-28 Imports dummies, randomforest, kernelfactory, ada Author
IBM SPSS Direct Marketing
IBM Software IBM SPSS Statistics 19 IBM SPSS Direct Marketing Understand your customers and improve marketing campaigns Highlights With IBM SPSS Direct Marketing, you can: Understand your customers in
Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
GETTING STARTED WITH R AND DATA ANALYSIS
GETTING STARTED WITH R AND DATA ANALYSIS [Learn R for effective data analysis] LEARN PRACTICAL SKILLS REQUIRED FOR VISUALIZING, TRANSFORMING, AND ANALYZING DATA IN R One day course for people who are just
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Using Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics
An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
Machine Learning and Data Mining. Fundamentals, robotics, recognition
Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,
In this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
Text Mining with R Twitter Data Analysis 1
Text Mining with R Twitter Data Analysis 1 Yanchang Zhao http://www.rdatamining.com R and Data Mining Workshop for the Master of Business Analytics course, Deakin University, Melbourne 28 May 2015 1 Presented
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
Get to Know the IBM SPSS Product Portfolio
IBM Software Business Analytics Product portfolio Get to Know the IBM SPSS Product Portfolio Offering integrated analytical capabilities that help organizations use data to drive improved outcomes 123
