Predictive Analytics: Extracts from the Red Olive foundational course
For more details, or to discuss a tailored course for your organisation, please contact:
Jefferson Lynch: jefferson.lynch@red-olive.co.uk, +44 1256 831100
December 2014
Analytics and Data Management
Contents
What makes a great analysis?
Measuring relationships between variables
Profiling
What is data mining?
The data mining process
Data mining techniques
Discussion: next steps for data mining
Back-up slides: Introduction to descriptive statistics
Copyright 2014 Red Olive Ltd, All Rights Reserved.
Some examples
Monitoring Trends
[Chart: Traffic disruption in London, annotated with: Christmas 2011; road works ban starts (1st July 2012); winter-time road works / end of financial year; Queen's Diamond Jubilee; London 2012 Olympic and Paralympic Games]
Information from Transport for London, Oracle Day presentation, 6 Nov 2012
Geographical Mash-ups
Visualising: connections between businesses in East London
Based on: streams of Twitter data, tracking relationships, mentions and retweets
Source: http://www.techcitymap.com/index.html#/
Census Analysis
Census 2011: Explore population changes in your area
Source: The Telegraph online. An interactive tool for comparing areas on their 2001 and 2011 demographic profiles: http://www.telegraph.co.uk/earth/greenpolitics/population/9403239/Census-2011-Explore-the-population-changes-in-your-area.html
Original source: ONS data visualisation centre, http://www.ons.gov.uk/ons/interactive/index.html
Measuring relationships between variables
To start making connections, we need to investigate the relationships between variables.
Start point: relationships between two variables at a time.
Multivariate techniques allow us to investigate relationships between many variables.
The appropriate measure of relationship depends on the type of data you're analysing, primarily whether it is scale (numeric) or nominal (categorical).
Copyright 2012 Red Olive Ltd, All Rights Reserved.
Measures of relationship: scale (numeric) data
Correlation quantifies the strength of the linear relationship between two variables, as seen in a scatter plot:
+1 = exact positive relationship
0 = no (linear) relationship
-1 = exact negative relationship
[Figure: example scatter plots for each case]
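For reference, the coefficient described here is, in its standard form (assumed to be Pearson's product-moment correlation, which is what Excel's CORREL computes), defined for n paired observations (x_i, y_i) as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how x and y move together about their means; the denominator rescales it so that r always lies between -1 and +1. If every point lies exactly on a line y_i = a + b x_i, then r = +1 when b > 0 and r = -1 when b < 0.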
The correlation coefficient takes values between -1 and +1.
The correlation will rarely be exactly +1 or -1, since this would mean the variables were exactly (linearly) dependent on each other.
Likewise, the correlation is rarely exactly 0, because a slight relationship can occur by chance.
Correlation measures only the extent of a linear relationship, so it needs to be handled with care.
[Figure: four sets of data with the same correlation of 0.816]
For correlation in Excel, use the function CORREL.
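The four data sets in the figure are presumably Anscombe's quartet (1973), the classic demonstration of this point; the values below are the standard published ones, typed in here rather than read off the slide. A minimal sketch that computes the Pearson coefficient (the same quantity as Excel's CORREL) for each set, using only the standard library (Python 3.10+ also offers `statistics.correlation`):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet: sets I to III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson(x, y) for name, (x, y) in quartet.items()}
for name, r in correlations.items():
    print(f"Set {name}: r = {r:.3f}")
```

All four coefficients agree to about three decimal places, yet a scatter plot shows set I is roughly linear, set II is curved, set III is linear apart from one outlier, and set IV is driven by a single extreme point. This is why the correlation should always be read alongside the plot.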
What is data mining?
Two main types of data mining model
Type 1: Models driven by a target variable, e.g. which site visitors are likely to subscribe?
- Implies building a predictive model
- Directed data mining techniques
Type 2: Models with no target variable, e.g. how does the subscriber base segment?
- Implies a descriptive model
- Undirected data mining techniques
Gains chart based on a representative evaluation sample
[Figure: Gains Chart, Churn Model. Cumulative % of respondents (y-axis, 0% to 100%) against cumulative % of base (x-axis, 0% to 100%), comparing the prediction, random and optimal curves]
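A gains chart of this kind is built by ranking the evaluation sample on the model's score and tracking what share of all respondents (here, churners) has been captured as you move down the ranked list. A minimal sketch with made-up scores and outcomes; the function names `gains_curve` and `optimal_curve` are hypothetical, chosen for illustration:

```python
def gains_curve(scores, responded):
    """Return (cumulative share of base, cumulative share of respondents) points.

    scores: model scores, higher = more likely to respond (churn)
    responded: 1 if the customer actually responded, else 0
    """
    n = len(scores)
    total = sum(responded)
    # Rank customers from highest to lowest predicted score (ties keep input order)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += responded[i]
        points.append((rank / n, captured / total))
    return points

def optimal_curve(n, total, share_of_base):
    """Best possible gains: every respondent ranked ahead of every non-respondent."""
    return min(1.0, share_of_base * n / total)

# Hypothetical evaluation sample of 10 customers, 4 of whom churned
scores = [0.95, 0.91, 0.80, 0.72, 0.60, 0.44, 0.35, 0.20, 0.15, 0.05]
churned = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for share, gained in gains_curve(scores, churned):
    # The random line is simply y = x
    print(f"{share:4.0%} of base -> {gained:4.0%} of churners "
          f"(random {share:4.0%}, optimal {optimal_curve(10, 4, share):4.0%})")
```

The further the prediction curve bows above the random diagonal, and the closer it gets to the optimal curve, the better the model concentrates churners at the top of the list.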
Data mining techniques and where they can be applied
Techniques to be discussed
Predictive: forecasting, decision trees, regression models
Descriptive: factor analysis, cluster analysis, affinity analysis
Techniques on individual-level data
Data mining methods
Example Decision Tree
Target variable: good/bad credit rating
Highly significant
Best predictor: income
2nd best predictor: number of credit cards
Final predictor: age
End nodes: no further splits
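The slide does not say which algorithm grew the tree. The "highly significant" annotation suggests a significance-based splitting rule such as CHAID's chi-square test, so, as an assumption, the sketch below chooses the first split by comparing chi-square statistics between each candidate predictor and the good/bad rating. The applicant records are invented, and each predictor is pre-banded into two groups so that every test has one degree of freedom and the raw statistics are directly comparable:

```python
from collections import Counter

def chi_square(records, predictor, target):
    """Pearson chi-square statistic for the predictor-by-target contingency table."""
    n = len(records)
    observed = Counter((r[predictor], r[target]) for r in records)
    row_totals = Counter(r[predictor] for r in records)
    col_totals = Counter(r[target] for r in records)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat

# Hypothetical credit applicants: (income band, credit cards, age band, rating)
rows = [
    ("high", "0-2", "30+", "good"), ("high", "0-2", "<30", "good"),
    ("high", "3+", "30+", "good"), ("high", "3+", "<30", "good"),
    ("high", "0-2", "30+", "good"), ("high", "3+", "<30", "bad"),
    ("low", "0-2", "30+", "bad"), ("low", "3+", "<30", "bad"),
    ("low", "0-2", "<30", "bad"), ("low", "3+", "30+", "bad"),
    ("low", "0-2", "30+", "good"), ("low", "3+", "<30", "bad"),
]
records = [dict(zip(("income", "cards", "age", "rating"), row)) for row in rows]

predictors = ["income", "cards", "age"]
scores = {p: chi_square(records, p, "rating") for p in predictors}
best = max(scores, key=scores.get)  # predictor with the strongest association
print(scores, "-> first split on", best)
```

A full tree repeats the same test inside each child node (for example, on credit cards within each income band) and stops when no remaining split is significant, which produces the end nodes. Real CHAID implementations also merge categories and adjust p-values for multiple testing, which this sketch leaves out.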
Regression Example
[Figure: regression model]
Source: The Times, 24/11/2012
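The slide reproduces a published regression model without its details. As a generic illustration only (not the model in the article), ordinary least squares with a single predictor fits the line y = intercept + slope * x that minimises the sum of squared residuals. The data and the function names `fit_line` and `r_squared` are hypothetical:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: advertising spend (x) against sales (y)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.1]
intercept, slope = fit_line(spend, sales)
fit = r_squared(spend, sales, intercept, slope)
print(f"sales ~ {intercept:.2f} + {slope:.2f} * spend, R^2 = {fit:.3f}")
```

With one predictor, R squared equals the square of the correlation coefficient between x and y, which links this model back to the correlation measure introduced earlier.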
The affinity tile map
Strengths of affinities are displayed using a hot-cold colour palette.
Clicking on a tile reveals details of the pair of products and their affinity.
Source: Teradata
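The slide does not say how affinity strength is measured. Lift is one common choice: it compares how often two products are bought together with how often that would happen by chance. As an assumption, the sketch below computes lift for every co-occurring pair of products in some invented baskets and maps it onto a simple hot/neutral/cold scale; the cut-offs in `tile_colour` are hypothetical and purely illustrative:

```python
from collections import Counter
from itertools import combinations

def pair_lifts(baskets):
    """Lift for every pair of products that appear together in at least one basket."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    lifts = {}
    for (a, b), together in pair_counts.items():
        support_ab = together / n
        # lift = P(A and B) / (P(A) * P(B)); a value of 1 means no association
        lifts[(a, b)] = support_ab / ((item_counts[a] / n) * (item_counts[b] / n))
    return lifts

def tile_colour(lift):
    """Map a lift value onto a simple hot/neutral/cold scale (illustrative cut-offs)."""
    if lift > 1.2:
        return "hot"
    if lift < 0.8:
        return "cold"
    return "neutral"

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]
for pair, lift in sorted(pair_lifts(baskets).items()):
    print(pair, round(lift, 2), tile_colour(lift))
```

Pairs that never appear together are absent from the result (their lift is 0); a full tile map would fill those tiles at the cold end of the palette.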