Big Data & Analytics @ Netflix
Paul Ellwood
February 9th, 2015
Who Am I?
- Director, Data Science & Engineering
- Also Leader, DataKind San Francisco chapter
- Formerly: Director, Product Analytics @ Netflix
- Formerly: Associate Partner, Analytics & Optimization @ Rosetta
- MS in Predictive Analytics from Northwestern
- BS in Systems & Control Eng. from CWRU
2007!!!
The Future of TV is Finally Here
Competition: Winning the Moment of Truth
Stay focused and run fast
Culture of Freedom & Responsibility
- Context, not control
- Highly aligned, loosely coupled
- No brilliant jerks
- Professional sports team
- No time or expense policies
Netflix Expense Policy: Act in Netflix's Best Interest
QA Team?
Accept things WILL break
Safety Nets
Give folks what they need
Eliminate Rules & Process
Tell vs. Ask
Conrad Gessner: The modern world overwhelms people with data and this overabundance is both "confusing and harmful" to the mind.
Francis Bacon: If a man will begin with certainties, he shall end in doubts; but if he will be content to begin with doubts, he shall end in certainties.
Innovation Cycle
- Formulate Hypothesis
- Productize
- Offline Experiment
- A/B Test
Analytics Teams
- Data Science & Engineering
- Science & Algorithms
- Business Stakeholders
- Consumer Insights
- Financial Planning & Analysis
(Mostly Cloud) Data Platform: Source Systems, S3, Sting
Data Science & Engineering: Enabling a data-driven culture
Metrics: What does success look like?
Data Model: How will people use it?
Data Integration: How can we stitch it all together?
Data Visualization: How should we present it?
Insights: What does it mean?
Vertically Aligned
- Product: Messaging, Sign-up Flow, Social, Search, Device UIs, Rec Algos
- Streaming: Content Delivery, Device Partners, Streaming Infrastructure
- Content: Content Buying, Originals, Supply Chain
- Marketing: Digital, TV, Out-of-Home, Consumer Insights
- Finance: Customer Service, Billing, Payments
Cross-functional Teams
Horizontal: Insights
- Roles: Data Scientists, Data Analysts
- Skills: Metrics, Analysis, Data Mining, Modeling
- Technologies: SQL, R, Python
Horizontal: Data Visualization
- Roles: Data Visualization Engineers
- Skills: Design & Build Reports, Interactive Dashboards
- Technologies: MicroStrategy, Tableau, D3
Horizontal: Data Engineering
- Roles: Data Engineers
- Skills: Data Modeling, ETL
- Technologies: Amazon S3, Hadoop, MapReduce, Pig, Python, Hive, Teradata, Redshift, SQL
Data Privacy
VPPA
- Judge Robert Bork (1927-2012)
- 1988 Video Privacy Protection Act: ...A video tape service provider who knowingly discloses, to any person, personally identifiable information concerning any consumer shall be liable for
EU DPA
- Protecting personal data is a fundamental right
- Personal data can only be kept in the context of the legitimate ordinary business activities
- Right to be forgotten:
Marketing
- City-level metrics
- LTV modeling
- Media mix modeling
- Pioneering use of Google's digital stack
- Audience Targeting
- Originals Marketing
- Real Time Bidding (RTB) Optimization
- Looking for a new leader!
Content
- Content performance on service
- Target price for content
- Catalog coverage for current customers
- Catalog coverage for future customers
- Digital Supply Chain
Product
- Multi-channel Messaging
- Sign-up flow optimization
- Deeper personalization of recommendations
- New UI designs
- Kids
- Impact of social
- Search
Streaming
- Optimizing the Streaming experience
- ISP content delivery performance
- Device partnerships
FinOps
- Fraud Reduction
- Global payment processing
- Gift Cards
- Bill on Behalf Of
Talent & Recruiting Analytics
Customer Service -> Insights
Open Source Analytic Tools
- ScorePMML
- Time Series Anomaly Detection (coming)
Questions?
APPENDIX
Homepage callouts: Star Ratings, Row Order, Row Ranking, Evidence
Over 50% of hours from homepage
Smart TV
Extract, Transform, Load (ETL)
- Logging (Raw)
- Searches/clicks/plays (Clean Fact)
- Sums, Avgs (Clean Aggregate)
- Signals (Join Aggregate)
Offline Analysis & Signal Discovery:
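The ETL stages on this slide (raw logging, clean facts, clean aggregates, joined signals) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log format, the field names, and the "plays per search" signal are hypothetical and are not taken from the Netflix pipeline.

```python
from collections import defaultdict

# Hypothetical raw log lines: "timestamp,user_id,event,title"
RAW_LOGS = [
    "2015-02-01T10:00:00,u1,search,",
    "2015-02-01T10:01:00,u1,play,Title A",
    "2015-02-01T10:05:00,u2,play,Title A",
    "bad line",
    "2015-02-01T11:00:00,u2,click,Title B",
]

VALID_EVENTS = {"search", "click", "play"}


def to_clean_facts(lines):
    """Raw -> clean fact: parse each line and drop malformed ones."""
    facts = []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 4:
            continue  # wrong number of fields
        ts, user, event, title = parts
        if event not in VALID_EVENTS:
            continue  # unknown event type
        facts.append({"ts": ts, "user": user, "event": event, "title": title})
    return facts


def to_aggregates(facts):
    """Clean fact -> clean aggregate: count events per (user, event type)."""
    counts = defaultdict(int)
    for fact in facts:
        counts[(fact["user"], fact["event"])] += 1
    return dict(counts)


def to_signals(aggregates):
    """Clean aggregate -> signal: join play and search counts per user."""
    signals = {}
    for user in {u for u, _ in aggregates}:
        plays = aggregates.get((user, "play"), 0)
        searches = aggregates.get((user, "search"), 0)
        # Hypothetical signal: plays per search; None when the user never searched.
        signals[user] = plays / searches if searches else None
    return signals


facts = to_clean_facts(RAW_LOGS)
aggregates = to_aggregates(facts)
signals = to_signals(aggregates)
```

Keeping each stage as its own function mirrors the slide's layering: each layer can be stored and reused on its own, so analysts can work from clean facts or aggregates without reparsing raw logs.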
A/B Testing:
- Randomly allocate users to offline strategies
- Schedule daily job to update model(s) data
- Measure changes in key metrics over time vs. production
- Primary: Retention, streaming hours
- Secondary: Hours from titles discovered in search
Genie:
Lipstick:
Quinto:
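The allocate-and-measure steps above can be sketched as follows. This is a minimal illustration, not the Netflix implementation: hash-based allocation and a two-proportion z-test are common general techniques assumed here, and the function names, cell names, and counts are all hypothetical.

```python
import hashlib
import math


def allocate(user_id, test_name, cells=("production", "treatment")):
    """Map a user to a test cell via a hash of (test, user).

    The hash spreads users roughly evenly across cells, and the same
    user always lands in the same cell for a given test.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    return cells[int(digest, 16) % len(cells)]


def two_proportion_z(retained_a, n_a, retained_b, n_b):
    """z statistic for the difference in retention rates (cell b minus cell a)."""
    p_a, p_b = retained_a / n_a, retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up counts, purely for illustration: retained members out of allocated members.
z = two_proportion_z(retained_a=8000, n_a=10000, retained_b=8150, n_b=10000)
p = two_sided_p(z)
```

The same comparison would be repeated for each primary and secondary metric; continuous metrics such as streaming hours would need a different test than the proportion test shown here.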
- Determine what data is good enough
- Expect the unexpected (edge cases)
- Be vigilant (everyone involved with A/B test is looking at website/UI)
- Teamwork: highly aligned, loosely coupled
- Data-driven decision making
The Challenge of Choice
Personalized Recommendations
Adaptive Row Ordering
Predicted Ratings
My List
Facebook Integration
User Experience
Price Tiers
Often product decisions are made by the most passionate person in the room
Sheer volume can get in the way of making the right decisions
Core Metrics: What are we trying to accomplish?
Our Core Metrics
Non-member
- Conversion rates
- Rate of paying for second month
Member
- Retention (now revenue weighted)
- Streaming hours (a great proxy for retention)
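The slide does not define "revenue weighted" retention. One plausible reading, assumed here purely for illustration, is that each member counts in proportion to their monthly price rather than counting equally. The function names and the example prices below are hypothetical.

```python
def retention(members):
    """Plain retention: share of members retained.

    members: list of (monthly_price, retained) pairs.
    """
    if not members:
        return 0.0
    return sum(1 for _, retained in members if retained) / len(members)


def revenue_weighted_retention(members):
    """Assumed definition: retained revenue divided by total revenue."""
    total = sum(price for price, _ in members)
    if total == 0:
        return 0.0
    kept = sum(price for price, retained in members if retained)
    return kept / total


# Made-up example: two lower-priced members stay, one higher-priced member leaves.
example = [(8, True), (8, True), (12, False)]
plain = retention(example)
weighted = revenue_weighted_retention(example)
```

In this example the weighted figure comes out lower than the plain one because the departing member pays more, which is the kind of difference a revenue-weighted metric is meant to surface once price tiers exist.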
Hypothesis Formulation: Back to that Freedom & Responsibility thing
Qualitative Studies: Voice of the Customer
Data Exploration: Digging deep
Data Visualization Tool Trade-offs (chart: Time Investment vs. Robustness; plotted: Custom Web Products, Sting)
Handling the raw data
Analytics: What does it mean?
Example Requests
- Launch Reporting & Analysis
- Reach Projections
- Usage Patterns
- Viewing Patterns
- Feature Mining for Algorithmic Models
- Predictive Models
Hypothesis Evaluation
Large Scale A/B Testing: Let the customer decide
Product Strat Meetings: A meeting, really??