Practical Data Science with R




Instructor: Matthew Renze
Twitter: @matthewrenze
Email: matthew@matthewrenze.com
Web: http://www.matthewrenze.com

Course Description

Data science is the practice of transforming data into actionable insight. R is the most popular open-source programming language currently in use by data scientists. In our data-driven economy, this combination of skills is in extremely high demand, commands significant increases in salary, and is revolutionizing the world as we know it.

In this workshop, we'll learn about the practice of data science, the R programming language, and how they can be used to answer day-to-day questions about your business. In addition, we'll learn how to transform and clean our data and how to create and interpret descriptive statistics, data visualizations, and statistical models. We'll also learn how to handle Big Data, make predictions using machine learning algorithms, and deploy R to production.

Prerequisites

Please bring your own Windows laptop and complete 0 to install all of the necessary software before the workshop begins.

Module Descriptions

1. Introduction: introduce the practice of data science and the R programming language
2. Transforming Data: learn how to import, transform, clean, and export data
3. Descriptive Statistics: learn how to create and interpret univariate and bivariate statistics
4. Data Visualization: learn how to create univariate, bivariate, and multivariate data visualizations
5. Statistical Modeling: learn to create Gaussian models and simple linear regression models
6. Handling Big Data: learn about big data and how to handle it with tools in R
7. Machine Learning: learn about ML and how to train, test, and implement ML models
8. R in Practice: learn about R in production, reproducible research, and industry best practices

Learning Objectives

When students are finished with this workshop, they should understand the following:

Introduction
  What data science is, why it is important, and how the process of data science works
  What R is and why it has become so popular for data science
  How to create data types, data structures, subset data tables, and find help on R topics

Transforming and Cleaning Data
  What data munging is, what clean data are, and the steps involved in the data munging process
  How to import, transform, clean, and export data
  How to use the dplyr package in R

Descriptive Statistics
  What descriptive statistics are and how they can be used to make sense of data
  What types of variables exist and the corresponding types of data analysis we can perform
  How to create standard univariate and bivariate descriptive statistics

Data Visualization
  What data visualization is and how we can use it to identify patterns in data
  What types of data visualization we can create based on the question we are trying to answer
  How to create and interpret univariate, bivariate, and multivariate data visualizations

Statistical Modeling
  What a statistical model is and how it can be used for statistical inference
  How to create and generate data with a Gaussian distribution model
  How to create and predict with a simple linear regression model

Handling Big Data
  What Big Data is and what the limitations of R are
  How to work around these limitations with sampling and third-party tools

Machine Learning
  What machine learning is and how it can be used to make predictions
  How to train, test, and implement a machine learning algorithm
  How to predict with k-means cluster analysis, decision trees, naïve Bayes, and neural networks

R in Practice
  How to use R in production with tools like R Server and shiny
  What industry best practices exist for using R for data science
  How to create reproducible research with R markdown
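As a rough illustration of the Statistical Modeling objectives above, the following sketch generates data from a Gaussian distribution and then fits and predicts with a simple linear regression model using base R. It is not taken from the workshop materials; the variable names, the simulated relationship (intercept 3, slope 2), and the sample size are all assumptions chosen for illustration.

```r
# Illustrative sketch only: names and simulated values are assumptions,
# not part of the workshop materials.
set.seed(42)

# Generate data from a Gaussian (normal) distribution
x <- rnorm(100, mean = 50, sd = 10)

# Simulate a response that depends linearly on x, plus Gaussian noise
y <- 3 + 2 * x + rnorm(100, mean = 0, sd = 5)
sim_data <- data.frame(x = x, y = y)

# Fit a simple linear regression model
model <- lm(y ~ x, data = sim_data)

# Predict the response for two new values of x
new_data <- data.frame(x = c(40, 60))
pred <- predict(model, newdata = new_data)

print(coef(model))
print(pred)
```

Calling `summary(model)` afterwards would show the estimated coefficients together with their standard errors, which is one way to connect the fitted model back to statistical inference.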

Course Outline

Introduction to Data Science and R
  Introduction to Data Science
  What is data science?
  Why is data science important?
  The data science process
  Introduction to R
  What is R?
  Why is R so popular for data science?
  R language basics
  Installation and setup
  Hello World
  Working with data types
  Working with data structures
  Working with data frames
  Miscellaneous topics

Transforming and Cleaning Data
  What is data munging?
  What are clean data?
  The data munging process
  Data munging tools
  Importing data
  Transforming data
  Cleaning data
  Exporting data
  Using dplyr

Descriptive Statistics
  What are descriptive statistics?
  Types of data analysis
  Univariate descriptive statistics
  Bivariate descriptive statistics
  Creating univariate descriptive statistics
  Creating bivariate descriptive statistics

Data Visualization
  What is data visualization?
  Univariate data visualizations
  Bivariate data visualizations
  Multivariate data visualizations
  Creating univariate data visualizations
  Creating bivariate data visualizations
  Creating multivariate data visualizations

Statistical Modeling
  What are statistical models?
  Gaussian distribution models
  Linear regression models
  Creating Gaussian distribution models
  Creating linear regression models

Handling Big Data
  What is Big Data?
  How to handle big data?
  No

Machine Learning
  What is machine learning?
  Types of machine learning
  The machine learning process
  Predicting with k-means cluster analysis
  Creating training and test data sets
  Predicting with decision trees
  Predicting with naïve Bayes classifiers
  Predicting with neural networks

R in Practice
  Using R in production
  Best practices
  Reproducible research
  Exporting charts
  Using shiny
  Creating R markdown
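To make two of the Machine Learning topics above more concrete, here is a minimal sketch of creating training and test data sets and predicting with k-means cluster analysis in base R. It is not drawn from the workshop itself: the use of the built-in iris data set, the 70/30 split, the choice of three clusters, and the helper function `nearest` are all assumptions made for illustration.

```r
# Illustrative sketch only: data set, split ratio, and cluster count
# are assumptions, not part of the workshop materials.
set.seed(1)

# Use the four numeric measurement columns of the built-in iris data set
features <- iris[, 1:4]

# Create training and test data sets with a random 70/30 split
train_idx <- sample(nrow(features), size = round(0.7 * nrow(features)))
train <- features[train_idx, ]
test <- features[-train_idx, ]

# Train a k-means model with three clusters on the training set
fit <- kmeans(train, centers = 3, nstart = 10)

# Hypothetical helper: index of the closest cluster center
# (squared Euclidean distance) for one observation
nearest <- function(row) {
  which.min(colSums((t(fit$centers) - row)^2))
}

# "Predict" by assigning each test row to its nearest cluster center
test_cluster <- apply(test, 1, nearest)

print(table(test_cluster))
```

Base R's `kmeans` does not provide a `predict` method, which is why the sketch assigns test rows to the nearest center by hand.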