How to perform predictive analysis on your web analytics tool data
FREE Webinar by Tatvic, June 19th, 2013
Before we start... Q&A?
Our speakers
Carolina Araripe, Inbound Marketing Strategist @Tatvic, http://linkd.in/yazvvn
Amar Gondaliya, Data Model Engineer @Tatvic, http://linkd.in/16cpdqi
Kushan Shah, Web Analyst @Tatvic, http://linkd.in/18rfffv
Talking about Analytics
Descriptive: What has happened?
Predictive: Predicts the outcome or future
Prescriptive: What should happen?
In other words: predictive analytics is technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.
Source: Siegel, E. (2013). Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.
Outline of this webinar: Predictive Analytics
Tool: R
Data: Google Analytics
Model: Logistic Regression
Visualization
Introduction to R
What: an open-source statistical computing language, widely used by organizations to solve business problems.
Applications: data analysis, data visualization, statistical tests, predictive modeling, forecasting.
Why: easy to integrate, data frames, pre-developed packages.
How to get started: download and install R, then choose and download a user-friendly GUI such as RStudio.
R Packages
Categories of packages: data extraction, data visualization, time series, machine learning.
For this webinar:
RGoogleAnalytics (data extraction). Usage: to extract Google Analytics data into R. Contributors: Michael Pearmain, Nick Mihailovski, Amar Gondaliya and Vignesh Prajapati.
ggplot2 (data visualization). Usage: to build plots and charts. Contributor: Hadley Wickham.
Google Analytics Data: Extracting Your GA Data into R
The user performing the data extraction:
1. Sends an access token request to the Google OAuth2 authorization server.
2. Receives the access token response.
3. Calls the Google Analytics API for the list of profiles.
4. Calls the Google Analytics API for the query.
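In the webinar this flow is handled inside R by RGoogleAnalytics. As an illustration only, the Python sketch below builds (without sending) the profile-list call and the query call. It assumes the legacy Google Analytics Management and Core Reporting API v3 endpoints, which have since been retired together with Universal Analytics, and an access token already obtained from the OAuth2 server; the token, profile ID, dates and metric names are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed legacy v3 endpoints (Management API and Core Reporting API).
PROFILES_URL = ("https://www.googleapis.com/analytics/v3/management/"
                "accounts/~all/webproperties/~all/profiles")
DATA_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def profiles_request(access_token):
    """Build (but do not send) the call listing the profiles the token can read."""
    return Request(PROFILES_URL,
                   headers={"Authorization": f"Bearer {access_token}"})


def query_request(access_token, profile_id, start_date, end_date,
                  metrics, dimensions=""):
    """Build (but do not send) a Core Reporting API query for one profile."""
    params = {
        "ids": f"ga:{profile_id}",
        "start-date": start_date,
        "end-date": end_date,
        "metrics": metrics,
    }
    if dimensions:  # dimensions are optional in a query
        params["dimensions"] = dimensions
    url = f"{DATA_URL}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

For example, `query_request("ACCESS_TOKEN", "12345678", "2013-05-01", "2013-05-31", "ga:visits,ga:transactions", "ga:date")` builds a daily visits-and-transactions query; passing the result to `urllib.request.urlopen` would send it.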
Business Problem: Projected Growth of Retail Ecommerce in the US
US retail ecommerce sales, 2011-2016 (billion $): 2011: $194.70; 2012: $225.50; 2013: $258.90; 2014: $296.70; 2015: $338.90; 2016: $384.90
Source: http://www.emarketer.com/article/retail-ecommerce-set-keep-strong-pace-through-2017/1009836
Business Problem: Product Returns
Returns are on the rise: up 19% since 2007. For every US$1 spent on merchandise, 9 cents are returned. The average return rate for ecommerce retailers ranges from 3% to 12%.
Source: Time Magazine, Sept. 4th, 2012
Product return impact (per day), at average return rates of 9% and 7%:
Average order value: $100 | $100
Orders per day: 500 | 500
Total income: $50,000 | $50,000
Loss due to returns: $4,500 | $3,500
Revenue after loss: $45,500 | $46,500
Increase in revenue per day: n/a | $1,000
Increase in revenue with recovered returns in the long run: $30,000 per month (× 30 days), $365,000 per year (× 365 days)
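The impact figures follow from two formulas: total income = average order value × orders per day, and loss = total income × return rate. A minimal Python sketch reproducing them, with the return rate passed as a whole-number percentage so the arithmetic stays exact:

```python
def return_impact(return_rate_pct, avg_order_value=100, orders_per_day=500):
    """Return (total income, loss due to returns, revenue after loss) for one day."""
    total_income = avg_order_value * orders_per_day
    loss = total_income * return_rate_pct / 100
    return total_income, loss, total_income - loss


# Compare the current 9% return rate with a reduced 7% rate.
revenue_at_9 = return_impact(9)[2]
revenue_at_7 = return_impact(7)[2]
daily_gain = revenue_at_7 - revenue_at_9  # 1000.0
monthly_gain = daily_gain * 30            # 30000.0
yearly_gain = daily_gain * 365            # 365000.0
```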
Data Introduction
Transactional data:
Pre-purchase data: browsing behavior up to the shopping cart
In-purchase data: purchase behavior from the shopping cart to the thank-you page
Modeling
1. Loading input data
2. Introducing model variables
3. Model creation
4. Model performance
5. Applying model to test data
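One common way to handle the "Loading Input Data" and "Model Performance" steps is to load the labeled transactions and hold back part of them for checking the model. The sketch below assumes the data arrives as CSV text with a header row; the 30% test fraction and the fixed seed are illustrative choices, not taken from the webinar. Note that `csv.DictReader` yields every value as a string, so numeric columns still need converting before modeling.

```python
import csv
import io
import random


def load_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per transaction."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out test_fraction of them as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```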
Machine Learning Techniques: Supervised Learning
Supervised learning generates a function that maps inputs to desired outputs, learned from labeled data (e.g., spam detection).
Training data: variables plus labels. Labels are the right answers from historical data; for spam detection, the input data contains emails marked Spam/Not Spam.
A machine learning algorithm learns a predictive model from the training data.
Test data: variables only. The predictive model turns them into predicted outcome labels.
Feature engineering: going beyond algorithms and using domain knowledge to add new variables to the model. For example:
Products purchased as gifts are less likely to be returned. Create a new binary variable: 1 if the product was purchased as a gift, 0 otherwise.
Products purchased in the holiday season are more likely to be returned. Based on the purchase date, create a new binary variable: 1 if the product was purchased in November or December, 0 otherwise.
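A minimal sketch of the two engineered variables, assuming each transaction is a Python dict with a hypothetical boolean field `purchased_as_gift` and a `datetime.date` field `purchase_date`:

```python
from datetime import date


def add_engineered_features(row):
    """Return a copy of a transaction with two binary variables added.

    The input field names (purchased_as_gift, purchase_date) are hypothetical.
    """
    out = dict(row)
    # 1 if the product was purchased as a gift, 0 otherwise
    out["is_gift"] = 1 if row["purchased_as_gift"] else 0
    # 1 if the purchase date falls in November or December, 0 otherwise
    out["holiday_season"] = 1 if row["purchase_date"].month in (11, 12) else 0
    return out


example = add_engineered_features(
    {"purchased_as_gift": True, "purchase_date": date(2012, 12, 3)}
)
```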
Predictor and Response Variables
[Figure: scatter plot of Price of House ($), the response variable, against Size of House (sq ft), the predictor variable]
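For a single predictor such as house size, the response (price) can be summarized by an ordinary least-squares line, price = intercept + slope × size. The sketch below computes that line in closed form; the four houses are made-up illustrative values, not data from the webinar. In R, the equivalent fit would be `lm(price ~ size)`.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical houses: size in sq ft (predictor) and price in $ (response)
sizes = [1000, 2000, 3000, 4000]
prices = [200000, 350000, 500000, 650000]
intercept, slope = fit_simple_linear_regression(sizes, prices)  # (50000.0, 150.0)
```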
Generalized Linear Models: glm(formula, family, data)
formula: Response ~ Predictors. Specifies which variable is the dependent (response) variable and which variables are the independent (predictor) variables.
family: binomial. Since the output variable (product return) is binary (0 or 1), we use the binomial family; with this family, glm fits a logistic regression.
data: the training data set. It contains the values of all 18 variables, i.e., both the dependent and the independent variables are given. This data set is also called labeled data.
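R's glm fits a binomial model by iteratively reweighted least squares. To show what is being estimated, the Python sketch below instead uses plain batch gradient ascent on the logistic log-likelihood, which converges to the same maximum-likelihood weights when the data is not perfectly separable. The two binary predictors mirror the engineered gift and holiday-season variables, and the twelve labeled rows are invented for illustration. In R the corresponding call would look like `glm(returned ~ ., family = binomial, data = train)`, where `returned` and `train` are hypothetical names.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """Probability of a return for one row; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic_regression(X, y, lr=0.5, epochs=3000):
    """Maximize the binomial log-likelihood by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            err = target - predict_proba(w, x)  # label minus predicted probability
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Invented training rows: [is_gift, holiday_season] -> returned (1) or not (0)
X_train = [[0, 1]] * 4 + [[1, 0]] * 4 + [[0, 0]] * 2 + [[1, 1]] * 2
y_train = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]
weights = fit_logistic_regression(X_train, y_train)
```

Applied to new transactions, `predict_proba` gives each one's probability of return, playing the role of `predict(model, newdata = test, type = "response")` in R.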
Summary: Number of Transactions by Probability of Product Return
[Figure: number of transactions with a predicted probability of product return above 60% versus 60% or below]
Suggested actions based on the predicted probability of product return: call the customer before shipping; send a discount coupon to encourage the customer to make future purchases.
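The summary amounts to splitting transactions at a 60% predicted probability of return. A minimal sketch, assuming the predictions are held in a dict keyed by a hypothetical transaction ID; it only counts and flags transactions, leaving the choice between the suggested actions as a business decision.

```python
from collections import Counter


def flag_high_risk(predictions, threshold=0.60):
    """IDs whose predicted probability of return is above the threshold."""
    return [tid for tid, p in predictions.items() if p > threshold]


def summarize(predictions, threshold=0.60):
    """Number of transactions above the threshold and at or below it."""
    counts = Counter("above" if p > threshold else "at_or_below"
                     for p in predictions.values())
    return {"above": counts["above"], "at_or_below": counts["at_or_below"]}


# Hypothetical predicted probabilities keyed by transaction ID
predictions = {"T1001": 0.82, "T1002": 0.35, "T1003": 0.60, "T1004": 0.61}
```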
ggplot2: geometric shapes, scales and coordinate systems, plot annotations
Q&A Round
Thank you! Carolina Araripe carolina@tatvic.com +91 7600-515-354 +1 276-644-0456