Machine Learning for Understanding User Behaviours. Semi-Supervised Learning Applied to Click Streams

Similar documents
ANALYTICS CENTER LEARNING PROGRAM

1 Which of the following questions can be answered using the goal flow report?

Big Data Unlock the mystery and see what the future holds. Philip Sow SE Manager, SEA

Speaker Monique Sherrett

Certified Digital Marketing Professional VS-1217

ROULARTA MEDIA GROUP S BIG DATA STRATEGY GETS BIGGER WITH THE HELP OF SELLIGENT TARGET

Comprehensive Analytics on the Hortonworks Data Platform

Google Analytics Health Check Laying the foundations for successful analytics and optimisation

Digital Marketing Success for Websites FERDINAND WEPS SHHARED, APR 23, 2015

Social Media Marketing. Hours 45

Facebook Management BENEFITS OF SOCIAL MEDIA MANAGEMENT

Impressive Analytics

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

LEAD GENERATION IN 2013

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Professional Diploma in Digital Marketing

Social Media Marketing Analytics ( 社 群 媒 體 行 銷 分 析 )

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

oneforty. Social Media KPIs and You: A Love Story Janet Aronica Directory, Marketing & Community oneforty

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

Google Product. Google Module 1

Transforming the Telecoms Business using Big Data and Analytics

Case Study : 3 different hadoop cluster deployments

SAP and Hortonworks Reference Architecture

Here s your full marketing OS. Reimagined.

Journée Thématique Big Data 13/03/2015

KNIME & Avira, or how I ve learned to love Big Data

Getting Started with Google Analytics 7 Easy but comprehensive steps

Bruhati Technologies. About us. ISO 9001:2008 certified. Technology fit for Business

DIGITAL MARKETING. The Page Title Meta Descriptions & Meta Keywords

Google Analytics Guide. for BUSINESS OWNERS. By David Weichel & Chris Pezzoli. Presented By

BIG DATA FOR MEDIA SIGMA DATA SCIENCE GROUP MARCH 2ND, OSLO

How To Make Money Using Facebook in Q4. Presented By Jason Roussos Living Direct

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

SOCIAL MEDIA ADVERTISING STRATEGIES THAT WORK

Customer Case Study. Sharethrough

Maximize Revenues on your Customer Loyalty Program using Predictive Analytics

HOW DOES GOOGLE ANALYTICS HELP ME?

The Future of Data Management

SmarterStats vs. Google Analytics

A digital Creative Company

Digital Marketing Solutions

DIGITAL MARKETING TRAINING

Open source Google-style large scale data analysis with Hadoop

Online Reputation Management Services

Online and Social Media Marketing Certificate Program. Syllabus

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Pay Per Click (PPC) Drive Calls To Your Business

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Everything You Need to Know About Digital Marketing

Concepts. Help Documentation

Graylog2 Lennart Koopmann, OSDC /

How To Learn Marketing Skills

Digital Marketing Program

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

The Forgotten Majority: Nurturing Unknown Prospects

Is your website generating leads for your business?

THE RECRUITMENT INDUSTRY ONLINE

10 Essential Google Analytics Reports And How They Matter to B2B Executives

Big Data Web Analytics Platform on AWS for Yottaa

4 ecommerce challenges solved using analytics

Sunnie Chung. Cleveland State University

Data Discovery and Systems Diagnostics with the ELK stack. Rittman Mead - BI Forum 2015, Brighton. Robin Moffatt, Principal Consultant Rittman Mead

Using Twitter for Business

DO YOU YOU HAVE THE. Our experts can get you there REPUTATION MANAGEMENT

Improve your International Recruitment Results with a Digital Marketing Assessment

Our Services Are Designed To Support Online Publishers, Ad Networks, Agencies and Digital Marketing Companies Globally

Infinity Call Tracking

TTI Summer Forum London, Barbara Pezzi, Director Web Analytics & Search Optimization

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

More Enquiries, Same Budget: Solving the B2B Marketer s Challenge

A quick guide to. Social Media

Inbound Marketing Overview. January 26, 2015 BEC 382

LDmobile launch «Track it» at the MWC 2012

Google Analytics & Social Media Monitoring Jeremy Coates

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Digital Marketing Training Institute

Title/Description/Keywords & Various Other Meta Tags Development

Teradata Unified Big Data Architecture

Easy Strategies for using Content (Ctrl) in your Marketing Today

1. Introduction to SEO (Search Engine Optimization)

SKoolAide Privacy Policy

8 Killer Tips: How to Use Facebook for Event Marketing

Marko Grobelnik Jozef Stefan Institute Ljubljana, Slovenia

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A 7-Step Analytics Reporting Framework

Advanced In-Database Analytics

Transcription:

Machine Learning for Understanding User Behaviours Semi-Supervised Learning Applied to Click Streams

Goals Motivation for semi-supervised learning and log analytics Overall Methodology Leveraging Hadoop / Java to create datasets Machine Learning Applied

Online Retail Upsales / Cross Sales Personalized Offers Newsletter Targeting Competition Optimized Price Optimized Display Campaigns Funnel Optimization.

Publishing & Media Understand and monetize audience Acquire and sell customer data Personalize User Experience Optimize Conversion Funnels

Product Manager Questions Different Customer Behaviours Subscribers Newcomers How do behaviour evolve in time? Do customer search & find relevant content?

What kind of session? Fan that loves to rate and comment Search for a specific Topic Newcomer from Google News Foreigner Discovering the Web Site Home Page Wanderer

Product Manager Available Data Click Streams Topics / Content Referential Competitors / Outside Data Customer Feedbacks Server Logs

Structured? Number of events / pages Number of Users Click Stream + 100 M 10 M Content Hierarchy ++ 10 K - Web / RSS +- 100 M User Ratings + ~ 100 K ~ 10K Server Logs +- 100M 10M

Methodology Overview Raw Data User Sessions Session Analytics Hadoop! Aggregation Sampling /! Clustering /! Viz Define Session Model Detailed Session Stats Reports Supervised! Learning Manual Session Tagging

Add Signals to the Sessions Click Stream Sessions GeoIP + Geo/ISP/Device + Channel + Page View Marketing Campaigns Page Level Analytics + Customer history Session History + Generated Revenue + Main topic CRM Articles Topics + Error / Codes Sessions To Analyze = ~ 50 dimensions

ClickStream Click Stream UserID TimeStamp Page Refererer.. 94390940 2013/11/11-12:32:43 43653729 2013/11/11-12:32:43 30418239 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 99666658 2013/11/11-12:32:43 43653729 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 99666658 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 43653729 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43 94390940 2013/11/11-12:32:43

Build Sessions Sessions Click Stream UserID TimeStamp Page Refererer.. SessionID 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1 94390940 2013/11/11-12:32:43 94390940-1

Add Signals to the Sessions Sessions Click Stream UserID TimeStamp Page Referer,,,, SessionID 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1 94390940 2013/11/11-12:32:43 1

Add Signals to the Sessions Click Stream Sessions + Geo/ISP/Device/Time GeoIP / ISP Database UserID TimeStamp Page User Agent IP,,,, SessionID City ISP Device OS! Local Hour 94390940 2013/11/11-12:32:43 Mozilla 4.1 ( ) 193.4.1.3 1 PARIS FREE SAS Mobile ios 14

Channel Signal Click Stream Sessions + Geo/ISP/Device + Channel GeoIP / ISP Database Marketing Campaigns UserID TimeStam p Page User Agent IP,,,, SessionID City ISP Device OS Marketing Source Camp. 94390940 2013/11/1 1-12:32:4 3 Mozilla 4.1 ( ) 193.4.1.3 1 PARIS FREE SAS Mobile ios AdWords ad-213 E-Mailing Retarg. e-2013 crteo-2 DIsplay.. None.. SEO..

Channel Signal Click Stream Sessions + Geo/ISP/Device + Channel GeoIP / ISP Database Marketing Campaigns UserID TimeStam p Page User Agent IP,,,, SessionID City ISP Device OS Marketing Source Camp. 94390940 2013/11/1 1-12:32:4 3 Mozilla 4.1 ( ) 193.4.1.3 1 PARIS FREE SAS Mobile ios AdWords ad-213 E-Mailing Retarg. e-2013 crteo-2 DIsplay.. None.. SEO..

Channel Signal Click Stream Sessions + Geo/ISP/Device + Channel GeoIP / ISP Database Marketing Campaigns + Customer history UserID TimeStamp Page User Agent IP Marketing Source Camp.. First Contact N days ago Last Contact N days ago N Contacts Last 30 Days 94390940 2013/11/11-12:32:43 Mozilla 4.1 ( ) 193.4.1.3 AdWords ad-213 231 2 19 E-Mailing Retarg. e-2013 crteo-2 DIsplay.. None.. SEO..

Channel Signal Click Stream Sessions + Geo/ISP/Device + Channel + Customer history + PageView +. Sessions To Analyze = ~ 50 dimensions UserID TimeStamp Page User Agent IP N Contacts Last 30 Days. N PageViews N Page < 5 sec N Search Page 2013/11/11-1 Mozilla 4.1

How to do these aggregations? Hive Job Pig Job Spark Hadoop Streaming + Python Scalding Cascading + Java JUG READY!

Methodology Overview Raw Data User Sessions Session Analytics Hadoop! Aggregation Sampling /! Clustering /! Viz Define Session Model Detailed Session Stats Reports Supervised! Learning Manual Session Tagging

Sessions Data

Clustering & Cluster Sampling

Tag ~ 1000 Sessions Search for a specific Topic Newcomer from Google News Foreigner Discovering the Web Site Fan that loves to rate and comment Home Page Wanderer Dark Bot (From A Competitor?)

Methodology Overview Raw Data User Sessions Session Analytics Hadoop! Aggregation Sampling /! Clustering /! Viz Define Session Model Detailed Session Stats Reports Supervised! Learning Manual Session Tagging

Supervised Learning Mahout (Logistic Regression?) Python Scikit MLI + Spark (Logistic Regression?) R Hadoop

Methodology Overview Raw Data User Sessions Session Analytics Hadoop! Aggregation Sampling /! Clustering /! Viz Define Session Model Detailed Session Stats Reports Supervised! Learning Manual Session Tagging

Reporting Part Hive + Desktop Reporting Tool Python (Pandas) ElasticSearch (Optionally with Kibana)

Sessions Statistics Search for a specific Topic 938k sessions 0.3 `` per session 0.23 ` acquisition costs Newcomer from Google News 738k sessions 0.83 per session 0.73 acquisition costs Fan that loves to rate and comment 13k sessions 1.3 per session 0.23 acquisition costs Foreigner Discovering the Web Site 68k sessions 0.3 per session 1.23 acquisition costs Home Page Wanderer 938k sessions 0.3 per session 0.23 acquisition costs 1k sessions 0 per session 0 acquisition costs Dark Bot (From A Competitor?)

Next: Follow Transitions Search for a specific Topic 938k sessions 0.3 `` per session 0.23 ` acquisition costs Newcomer from Google News 738k sessions 0.83 per session 0.73 acquisition costs Fan that loves to rate and comment 13k sessions 1.3 per session 0.23 acquisition costs Foreigner Discovering the Web Site 68k sessions 0.3 per session 1.23 acquisition costs Home Page Wanderer 938k sessions 0.3 per session 0.23 acquisition costs 1k sessions 0 per session 0 acquisition costs Dark Bot (From A Competitor?)

Thank you! Tweet #ParisJUG @fdouetteau Semisupervised learning for user sessions We re hiring jobs@dataiku.com