Introduction to Predictive Analytics Dr. Ronen Meiri
Outline: From big data to predictive analytics; Predictive analytics vs. BI; Intelligent platforms; What we can do with it; The modeling process; Example: lifetime value; How DMWay makes predictive analytics easy.
The digital revolution: Printing revolution (Gutenberg's press, ~1450); Scientific revolution (~1550): mechanics, medicine, chemistry, optics, electricity; Industrial revolution (~1800): textiles, chemicals, agriculture, transportation, mass production; Digital revolution (~1950): accounting, a man on the moon, signal processing, information retrieval, wearable computing.
"Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination." (attributed to Albert Einstein)
Big Data is all over.
~4 zettabytes of data (2013), http://en.wikipedia.org/wiki/zettabyte
Major players:
  Facebook: 1,150 million users
  Gmail: 425 million users
  Skype: 300 million users
  Twitter: 500 million users (200 million active)
  WhatsApp: 300+ million users
  YouTube: 1,000 million users (4 billion views a day)
  Instagram: 150 million users
  Many others: Google, Waze, Amazon, eBay, PayPal
Value (bytes)   Symbol   Name
1000            kB       kilobyte
1000^2          MB       megabyte
1000^3          GB       gigabyte
1000^4          TB       terabyte
1000^5          PB       petabyte
1000^6          EB       exabyte
1000^7          ZB       zettabyte
1000^8          YB       yottabyte
Sources: http://www.calcalist.co.il/local/articles/1,7340,l-3602417,00.html (Calcalist, May 2013); http://expandedramblings.com/index.php/resource-how-many-people-use-the-top-social-media/ (September 15, 2013)
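To make the unit table on this slide concrete, here is a minimal sketch that converts between the decimal (SI) byte units. The helper names `to_bytes` and `convert` are hypothetical and exist only for this illustration.

```python
# Decimal (SI) byte units from the slide: the n-th symbol stands for 1000**n bytes.
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, symbol):
    """Convert a value expressed in the given SI unit to bytes."""
    power = UNITS.index(symbol) + 1
    return value * 1000 ** power


def convert(value, from_symbol, to_symbol):
    """Convert a value between two SI byte units."""
    return to_bytes(value, from_symbol) / to_bytes(1, to_symbol)


# The ~4 ZB estimate for 2013, expressed in terabytes.
four_zb_in_tb = convert(4, "ZB", "TB")
print(four_zb_in_tb)
```

In other words, 4 ZB is 4 x 1000^3 TB, or about four billion one-terabyte drives.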
Emphasis so far: How to store and manage big data? Should one host the data on premises or in the cloud? Multiple technologies: Hadoop, HBase, MongoDB, NoSQL databases, others. How to gain benefits from big data?
From big data to data science: Data is a strategic asset (competitive advantage); extract the value buried in the data for decision making; the hottest buzzword these days is "data science"; Harvard Business Review called data scientist the sexiest job of the 21st century; McKinsey estimates a shortage of 140,000-190,000 data scientists in the US by 2018. http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
Data Science - Venn diagram
Business Analytics: Descriptive Analytics, Predictive Analytics, Prescriptive Analytics
Descriptive Analytics (BI): Good for reporting; interactive analytics; measuring business performance (KPIs); ad-hoc reporting; works well with huge amounts of data (big data); relatively easy to use. Value?
Prescriptive Analytics: Proactive (cannot do without it); maximizes business performance; combines business rules with modeling (descriptive, predictive) to drive actions.
Predictive Analytics: Looks at past events to predict future outcomes (targeting, churn); complex modeling techniques (statistics, math, ML, computer science); proactive; value.
Predictive Analytics vs. Descriptive (Gartner)
Predictive Analytics vs. BI: "Sorry, business intelligence gurus, but BI is no longer good enough. Business intelligence reports and dashboards describe what has already happened; they are not proactive." Ian A. Bertram, Gartner Business Intelligence & Analytics Summit, March 18, 2013. http://data-informed.com/gartner-researchers-predictiveanalytics-to-gain-traction-in-business/
Intelligent Platform Cycle: Collect Data, Analyze, Evaluate, Deploy
Predictive analytics: what can we do with it? Prediction: estimate the lifetime value of a new customer; estimate the expected losses or the number of claims on an insurance policy; estimate expected deposits. Classification: who is likely to churn; who is likely to respond to an offer; who is likely to default on a loan in the next period.
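As a rough illustration of the lifetime value prediction task, the sketch below uses one common simplified definition: the discounted sum of expected per-period margins, where the customer stays from one period to the next with a fixed retention probability. The formula, the function name `lifetime_value`, and all numbers are assumptions made for this example, not the method used in the talk.

```python
def lifetime_value(margin, retention, discount, periods):
    """Expected discounted margin over a fixed horizon.

    margin:    profit earned per period while the customer is active
    retention: probability the customer is still active one period later
    discount:  per-period discount rate
    periods:   number of periods in the horizon
    """
    total = 0.0
    for t in range(periods):
        survival = retention ** t            # chance the customer is active in period t
        present_value = (1 + discount) ** t  # discounting back to period 0
        total += margin * survival / present_value
    return total


# Illustrative numbers only.
ltv = lifetime_value(margin=100.0, retention=0.8, discount=0.1, periods=12)
print(round(ltv, 2))
```

In practice the retention probability itself is usually the output of a classification model (for example, a churn model), which is how the prediction and classification tasks on this slide connect.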
Predictive analytics: what can we do with it? Forecasting: stock prices; KPI forecasts (expected sales in the next month, quarter, year); seasonality. Collaborative analytics (wisdom of the crowd): which product to offer to which user.
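A minimal sketch of forecasting a seasonal KPI, assuming the simplest possible baseline: the seasonal naive forecast, which repeats the value observed in the same season one cycle earlier. The function name and the quarterly sales figures are invented for illustration; real forecasting models are typically compared against a baseline like this one.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future period with the value seen in the same season of the last cycle."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season_start = len(history) - season_length
    forecast = []
    for step in range(horizon):
        # Wrap around so horizons longer than one season keep repeating the last cycle.
        forecast.append(history[last_season_start + step % season_length])
    return forecast


# Two years of made-up quarterly sales; forecast the next six quarters.
quarterly_sales = [100, 120, 90, 150, 110, 130, 95, 160]
next_quarters = seasonal_naive_forecast(quarterly_sales, season_length=4, horizon=6)
print(next_quarters)
```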
Modeling Process: Model the business problem, Data/ETL, Analyze, Deploy, Evaluate
Modeling the business problem: What is the business problem? (What is churn? What is the definition of LTV? How do we define success in a process?) It must be relevant and actionable (churn, for example). What are the data components? (User activities?) How can the solution be integrated within the organization's operational systems? (Batch process or near real time.)
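Answering "what is churn?" means turning a business notion into a rule that can be computed from the data. The sketch below assumes one possible definition: a user has churned if they show no activity for more than a fixed number of days before a reference date. The 30-day window, the function name `is_churned`, and the sample users are assumptions for illustration; the right definition depends on the business.

```python
from datetime import date, timedelta


def is_churned(last_activity, as_of, inactivity_days=30):
    """Label a user as churned if their last activity is older than the inactivity window."""
    return (as_of - last_activity) > timedelta(days=inactivity_days)


# Hypothetical last-activity dates per user.
last_seen = {
    "u1": date(2013, 9, 1),
    "u2": date(2013, 7, 15),
}
as_of = date(2013, 9, 15)

labels = {user: is_churned(seen, as_of) for user, seen in last_seen.items()}
print(labels)
```

Changing `inactivity_days` changes who counts as churned, which is why the definition has to be agreed on before any modeling starts.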
Data Modeling: Map the data sources; map the relations between the data sources; create the analysis dataset (structure the data in 2D, as a flat table of rows and columns).
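The step of structuring the data in 2D can be sketched as follows, assuming two hypothetical sources (a customer table and a transaction log) related by `customer_id`. The transactions are aggregated so that the result has exactly one row per customer and one column per feature. All table and field names are invented for this example.

```python
from collections import defaultdict

# Two hypothetical data sources, related through customer_id.
customers = [
    {"customer_id": 1, "country": "IL"},
    {"customer_id": 2, "country": "US"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def build_analysis_dataset(customers, transactions):
    """Flatten the one-to-many relation into one row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        totals[tx["customer_id"]] += tx["amount"]
        counts[tx["customer_id"]] += 1

    rows = []
    for customer in customers:
        cid = customer["customer_id"]
        rows.append({
            "customer_id": cid,
            "country": customer["country"],
            "n_transactions": counts[cid],   # 0 for customers with no transactions
            "total_amount": totals[cid],
        })
    return rows


analysis_rows = build_analysis_dataset(customers, transactions)
for row in analysis_rows:
    print(row)
```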
Analyze: Relevant modeling algorithms (linear regression, trees, logistic regression); transformations; feature selection; validation.
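To illustrate the fit-then-validate part of this step, here is a minimal sketch of logistic regression trained by batch gradient descent and checked on a holdout set. It is written in plain Python so it runs without extra libraries; the synthetic data, the learning rate, and the function names are assumptions for this example, and transformations and feature selection are left out for brevity.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the average log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Share of records where the 0.5-threshold prediction matches the label."""
    hits = sum((predict_proba(w, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(X)


# Synthetic one-feature data: the label is 1 when the feature is positive.
rng = random.Random(0)
X = [[rng.uniform(-5, 5)] for _ in range(200)]
y = [1 if xi[0] > 0 else 0 for xi in X]

# Hold out the last 50 records for validation.
X_train, y_train = X[:150], y[:150]
X_valid, y_valid = X[150:], y[150:]

w = fit_logistic(X_train, y_train)
valid_accuracy = accuracy(w, X_valid, y_valid)
print(w, valid_accuracy)
```

The key design point is that accuracy is measured on records the model never saw during fitting, which is what "validation" refers to on this slide.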
Deployment: Write pseudocode; code the predictive model in SQL, Java, etc.; embed the code in the operational systems; add business rules (Prescriptive Analytics).
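The deployment steps can be sketched as follows, assuming the model is a logistic regression with already fitted coefficients. The same coefficients are used both for a scoring function and to render a SQL expression for in-database scoring, and a simple business rule is layered on top of the score. The coefficient values, column names, table name, and rule are all hypothetical.

```python
import math

# Hypothetical fitted coefficients; names and values are illustrative only.
INTERCEPT = -1.5
COEFFICIENTS = {"days_since_last_login": 0.08, "support_tickets": 0.4}


def score(row):
    """Churn probability for one record, given as a dict of feature values."""
    z = INTERCEPT + sum(coef * row[name] for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def to_sql(table):
    """Render the same model as a SQL SELECT so it can run inside the database."""
    terms = " + ".join(f"{coef} * {name}" for name, coef in COEFFICIENTS.items())
    return (
        f"SELECT customer_id, 1.0 / (1.0 + EXP(-({INTERCEPT} + {terms}))) "
        f"AS churn_score FROM {table}"
    )


def action(row, threshold=0.5):
    """Business rule on top of the score (the prescriptive step)."""
    if score(row) < threshold:
        return "none"
    return "call" if row["is_premium"] else "email"


customer = {"days_since_last_login": 10, "support_tickets": 2, "is_premium": True}
print(score(customer))
print(action(customer))
print(to_sql("customers"))
```

Keeping the coefficients in one place and generating the SQL from them avoids the scoring code and the database query drifting apart when the model is updated.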
Evaluate: Collect performance data; measure the model performance; recalibrate/update the model; set alerts when model performance degrades.
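A minimal sketch of the monitoring loop, assuming accuracy as the performance measure and a fixed tolerance below the accuracy recorded at deployment time. The metric, the tolerance, the function names, and the sample data are assumptions for illustration; other measures (lift, AUC, calibration) would plug into the same structure.

```python
def accuracy(predictions, outcomes):
    """Share of predictions that matched the observed outcomes."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def check_model(predictions, outcomes, baseline, tolerance=0.05):
    """Return (current_accuracy, alert); alert is True when accuracy fell more than
    `tolerance` below the baseline measured at deployment time."""
    current = accuracy(predictions, outcomes)
    return current, current < baseline - tolerance


# Hypothetical performance data collected after deployment.
recent_predictions = [1, 0, 1, 1, 0, 0, 1, 0]
recent_outcomes = [1, 0, 0, 1, 1, 0, 0, 0]

current, alert = check_model(recent_predictions, recent_outcomes, baseline=0.8)
if alert:
    print(f"ALERT: accuracy dropped to {current:.3f}; consider recalibrating the model")
```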