Analyze It: Use Cases in Telecom & Healthcare
Chung Min Chen, VP of Data Science
The views and opinions expressed in this presentation are those of the author and do not necessarily reflect the position of the company.
Company History
1984: Rooted in AT&T Bell Labs
1999: Telcordia, most trusted provider of communications technology and services
January 2012: Acquired by Ericsson
February 2013: Wholly owned by Ericsson, doing business as iconectiv
Four Pillars of Data Analytics
Describe: summaries, reports, aggregates, trends
Diagnose: anomalies, correlation, causality
Predict: machine learning, data mining, math
Prescribe: real-time rule engine, data-driven campaigns
Data Analysis Now and Then
Trend: integrate and analyze dynamic data from diverse data sources
Then
  Data: user demography
  Analysis: OLAP, customer segmentation
  Outcome: target user list for churn in
Now
  Data: user demography, user experiences, network data
  Analysis: big data analytics
  Outcomes: precision marketing with cross-sell and up-sell; more accurate network monitoring; BI-driven network planning
Telecom Support Systems (layers, top to bottom)
BSS (Business Support Systems): billing, marketing, CRM
OSS (Operations Support Systems): provisioning, assurance, inventory, configuration
NMS (Network Management Systems)
EMS (Element Management Systems)
Telecom networks
Use Case: Network KPI Dashboard
Network health dashboard: KPIs with SLA thresholding
Which KPIs are most indicative?
Severity: weighted sum of KPIs. How should the weights be determined?
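A minimal sketch of the weighted-sum severity, assuming each KPI is first divided by its SLA threshold (so a KPI sitting exactly at its threshold contributes 1.0 before weighting) and that the score is bucketed into levels 1 to 5. The KPI names, thresholds, weights, and bucket edges are all hypothetical; picking the weights is exactly the open question on this slide.

```python
# Hypothetical KPIs, SLA thresholds, and weights (illustration only).
SLA_THRESHOLDS = {"latency_ms": 100.0, "packet_loss_pct": 1.0, "dropped_call_pct": 2.0}
WEIGHTS = {"latency_ms": 0.3, "packet_loss_pct": 0.5, "dropped_call_pct": 0.2}


def severity_score(kpis: dict) -> float:
    """Weighted sum of KPIs, each divided by its SLA threshold (1.0 = at threshold)."""
    return sum(w * kpis[name] / SLA_THRESHOLDS[name] for name, w in WEIGHTS.items())


def severity_level(score: float, edges=(0.5, 1.0, 1.5, 2.0)) -> int:
    """Bucket a score into levels 1..5 using hypothetical cut points."""
    return 1 + sum(score >= e for e in edges)


kpis = {"latency_ms": 120.0, "packet_loss_pct": 1.5, "dropped_call_pct": 1.0}
score = severity_score(kpis)   # 0.36 + 0.75 + 0.10
level = severity_level(score)
```

Hand-tuned weights like these are what the logistic regression on the next slides replaces with weights learned from labeled incidents.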
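Logistic Regression
KPIs: $x_1, x_2, \ldots, x_n$; weights: $w_1, w_2, \ldots, w_n$; severity levels $s \in \{1, 2, \ldots, 5\}$
Given $x$, its estimated severity score is $h_w(x) = g(w^\top x)$, where $g(z) = 1/(1 + e^{-z})$
Training set: $(x^{(1)}, s^{(1)}), (x^{(2)}, s^{(2)}), \ldots, (x^{(m)}, s^{(m)})$
For each severity level $j$, train a logistic regression predictor $h_j(x) = g(w_j^\top x)$ that minimizes $\sum_{i=1}^{m} \left( h_j(x^{(i)}) - y_j^{(i)} \right)^2$, where $y_j^{(i)} = 1$ if $s^{(i)} = j$ and $0$ otherwise
At run time, given $x_1, \ldots, x_n$, predict the severity to be the level $j$ that maximizes $h_j(x)$, $j = 1, \ldots, 5$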
[Figure: for j = 1, 2, 3, 4, 5, the training set is relabeled with binary targets (1 for level j, 0 otherwise) and fed to one LR model per level]
Training: each predictor $h_j$ is fit by gradient descent on the squared-error objective above.
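The one-vs-rest scheme on these slides can be sketched in plain Python. Following the slides, it minimizes the squared error between $h_j(x)$ and the binary target by batch gradient descent (cross-entropy is the more common loss for logistic regression). The bias term, learning rate, epoch count, and toy data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def h(model, x):
    """Logistic predictor h(x) = g(w.x + b); the bias b is an added assumption."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_level(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on sum_i (h(x_i) - y_i)^2 for one severity level.
    The gradient is averaged over examples, which only rescales the step size."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            p = h((w, b), x)
            # Chain rule: d/dz (g(z) - t)^2 = 2 (g(z) - t) g(z) (1 - g(z))
            g = 2.0 * (p - t) * p * (1.0 - p)
            gw = [gi + g * xi for gi, xi in zip(gw, x)]
            gb += g
        w = [wi - lr * gi / len(X) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def train_all(X, s, levels=range(1, 6)):
    """One-vs-rest: for level j the target is 1 if s == j, else 0."""
    return {j: train_level(X, [1.0 if sj == j else 0.0 for sj in s]) for j in levels}


def predict_severity(models, x):
    """Run time: choose the level j whose h_j(x) is largest."""
    return max(models, key=lambda j: h(models[j], x))


# Hypothetical toy data: 5 KPIs, each severity level driven by a different KPI.
X, s = [], []
for j in range(1, 6):
    for scale in (1.0, 0.8):
        x = [0.0] * 5
        x[j - 1] = scale
        X.append(x)
        s.append(j)
models = train_all(X, s)
```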
Use Case: Network Catastrophe Prediction
Goal: find catastrophe precursors
Methods: multi-scale time series analysis (wavelets, product coefficients), moving windows, clustering
[Figure: queue-length deviation, latency, and packet loss plotted over time, with a precursor marked before the point where network meltdown starts]
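One hedged reading of "wavelet, product coefficient" is the multiscale product: compute wavelet detail coefficients of a KPI series (for example queue-length deviation) at several scales and multiply them pointwise. A level shift shows up strongly at every scale, so its product stays large, while a one-sample blip is damped at coarse scales. The centered Haar-style filter, the scales (1, 2, 4), and the alarm threshold are assumptions, not the method from the talk; the centered windows also mean an alarm lags by the largest scale.

```python
def haar_detail(x, s):
    """Haar-style detail at scale s centred on t: mean(x[t:t+s]) - mean(x[t-s:t]).
    Entries whose window falls off either end of the series are left at 0."""
    d = [0.0] * len(x)
    for t in range(s, len(x) - s + 1):
        d[t] = sum(x[t:t + s]) / s - sum(x[t - s:t]) / s
    return d


def multiscale_product(x, scales=(1, 2, 4)):
    """Pointwise product of the detail coefficients across all scales."""
    prod = [1.0] * len(x)
    for s in scales:
        prod = [p * d for p, d in zip(prod, haar_detail(x, s))]
    return prod


def precursor_alarms(x, threshold, scales=(1, 2, 4)):
    """Indices where |multiscale product| exceeds a (hypothetical) threshold."""
    return [t for t, p in enumerate(multiscale_product(x, scales)) if abs(p) > threshold]


# Hypothetical series: a persistent shift at t = 50 versus a one-sample blip at t = 30.
step = [0.0] * 50 + [1.0] * 50
blip = [0.0] * 100
blip[30] = 1.0
```

For a shift of height 1 the product at the onset is 1 × 1 × 1, while a blip of the same height yields only 1 × 0.5 × 0.25, which is why a fixed threshold can separate the two.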
Marketing and Sales
Then: market/customer segmentation based on static customer profiles
Now: precision marketing and cross-sell/up-sell recommendations based on sophisticated algorithms using detailed user activity data
Use Case: Product Recommendation
Predict which products/services a user may need or be interested in
For newly onboarded users: predict based on the user profile (age, gender, income, geolocation, family status, education level)
Methods: clustering, classification trees, SVMs
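Of the three methods listed, clustering is the shortest to sketch. A minimal version, assuming profile features are already numeric and scaled to [0, 1] (categorical fields such as gender or family status would need encoding first): cluster existing users with k-means, place the new user in the nearest cluster, and suggest the product bought most often there. The farthest-point seeding, the feature choice, and the data are assumptions.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def recommend_new_user(profile, profiles, purchases, k=2):
    """Put a new user in the nearest cluster of existing users and suggest
    the product bought most often within that cluster."""
    centroids, labels = kmeans(profiles, k)
    c = nearest(profile, centroids)
    counts = Counter(item for items, lab in zip(purchases, labels) if lab == c
                     for item in items)
    return counts.most_common(1)[0][0]


# Hypothetical existing users: (scaled age, scaled income) and their products.
profiles = [(0.2, 0.2), (0.8, 0.9), (0.25, 0.15), (0.7, 0.85), (0.1, 0.3), (0.9, 0.8)]
purchases = [["prepaid", "data_pack"], ["family_plan"], ["data_pack"],
             ["family_plan", "tv"], ["data_pack", "music"], ["tv", "family_plan"]]
suggestion = recommend_new_user((0.15, 0.25), profiles, purchases)
```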
Use Case: Product Recommendation
After a while: collaborative filtering based on the user-product interest matrix
[Figure: users × products interest matrix with unknown entries marked "?"]
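The slides do not say which collaborative-filtering variant is meant; the latent factor model that follows is one option. For contrast, here is a minimal neighborhood-style sketch, assuming user-based filtering with cosine similarity over co-rated products. An unknown entry is filled with a similarity-weighted average of other users' interest in that product. The matrix values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity over the products both users have rated (None = unknown)."""
    common = [j for j in range(len(u)) if u[j] is not None and v[j] is not None]
    dot = sum(u[j] * v[j] for j in common)
    nu = math.sqrt(sum(u[j] ** 2 for j in common))
    nv = math.sqrt(sum(v[j] ** 2 for j in common))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_interest(R, i, j):
    """Similarity-weighted average of other users' known interest in product j."""
    num = den = 0.0
    for other, row in enumerate(R):
        if other == i or row[j] is None:
            continue
        w = cosine(R[i], row)
        num += w * row[j]
        den += abs(w)
    return num / den if den else None


# Hypothetical interest matrix: rows = users, columns = products.
R = [[5, 4, None, 1],
     [4, 5, 2, 1],
     [1, 1, 5, 5]]
p = predict_interest(R, 0, 2)
```

User 0 looks far more like user 1 than user 2, so the estimate (about 2.9) leans toward user 1's value of 2 rather than the plain average of 3.5.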
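Latent Factor Model: Matrix Factorization
Given a large matrix $R$ ($n \times m$), find small matrices $P$ ($n \times k$) and $Q$ ($k \times m$) such that $R \approx PQ$, i.e. $r_{ij} \approx \hat{r}_{ij} = p_i \cdot q_j = \sum_{c=1}^{k} p_{ic}\, q_{cj}$, where $k \ll n, m$
$\hat{r}_{ij}$: estimate of user $i$'s interest in product $j$
$p_i$ (row $i$ of $P$): user $i$'s taste
$q_j$ (column $j$ of $Q$): product $j$'s appeal
[Figure: $R$ ($n \times m$, with an unknown entry marked "?") approximated by $P$ ($n \times k$) times $Q$ ($k \times m$)]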
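As a worked illustration with $k = 2$, where user $i$'s taste vector is $p_i$ and product $j$'s appeal vector is $q_j$ (the numbers are hypothetical, chosen only for the arithmetic), the estimated interest is the dot product of the two:

```latex
p_i = (0.8,\ 0.5), \qquad q_j = (0.5,\ 0.6)^\top
\hat{r}_{ij} = p_i \cdot q_j = 0.8 \times 0.5 + 0.5 \times 0.6 = 0.4 + 0.3 = 0.7
```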
Matrix Factorization
$R$: user-product matrix. Find $P, Q$ that minimize
$\sum_{(i,j):\, r_{ij} \text{ known}} \left( r_{ij} - p_i \cdot q_j \right)^2$
where $r_{ij}$ is the observed value and $p_i \cdot q_j$ is the estimate.
Regression through alternating projections:
Hold $P$ constant and solve for $Q$ by regression
Hold $Q$ constant and solve for $P$ by regression
Repeat until $P$ and $Q$ converge (to a local optimum)
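The alternating procedure on this slide can be sketched with the standard library alone. Each half-step is an ordinary least-squares regression over the known entries of one row or column. A tiny ridge term `lam` is an added assumption that keeps each small linear system solvable (for example when a product has no observations). Q is stored transposed (one row per product) for convenience, and the random initialization and iteration count are also assumptions.

```python
import random


def solve(A, b):
    """Solve A x = b (small, square) by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(fixed, obs, k, lam):
    """Regress one factor vector on the fixed factors.
    obs: list of (index into fixed, observed r_ij)."""
    A = [[lam if a == c else 0.0 for c in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for idx, r in obs:
        v = fixed[idx]
        for a in range(k):
            rhs[a] += v[a] * r
            for c in range(k):
                A[a][c] += v[a] * v[c]
    return solve(A, rhs)


def als(R, k, iters=200, lam=1e-6, seed=0):
    """Alternate regressions on the known entries of R (None = unknown).
    Returns P (n x k) and Q transposed (m x k), so r_ij ~ P[i] . Q[j]."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    Q = [[0.0] * k for _ in range(m)]
    for _ in range(iters):
        # Hold P constant: solve each q_j from column j's known entries.
        Q = [ridge_fit(P, [(i, R[i][j]) for i in range(n) if R[i][j] is not None], k, lam)
             for j in range(m)]
        # Hold Q constant: solve each p_i from row i's known entries.
        P = [ridge_fit(Q, [(j, R[i][j]) for j in range(m) if R[i][j] is not None], k, lam)
             for i in range(n)]
    return P, Q


def predict(P, Q, i, j):
    return sum(a * b for a, b in zip(P[i], Q[j]))


# Hypothetical rank-1 interest matrix with one unknown entry (true value 6).
R = [[1, 2, 1, 3],
     [2, 4, 2, None],
     [3, 6, 3, 9]]
P, Q = als(R, k=1)
```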
Use Case: Network Engineering
Goal: identify regions with poor wireless quality where users are churning out
Data: Number Portability database; Call Detail Records (which include cell IDs)
Challenges:
  Phone numbers are no longer linked to geolocations
  User addresses do not necessarily reflect usage zones
  Users may roam across multiple cell-ID zones
  Hypotheses must be statistically significant (t-distribution); simple reports and summary statistics are not enough
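To make "statistically significant" concrete, here is a hedged sketch that compares per-cell churn rates inside a suspect region against the remaining cells using Welch's two-sample t-test. This is one reasonable choice; the slide only says t distribution. The two-sided p-value comes from numerically integrating the Student t density with Simpson's rule, so only the standard library is needed. The churn figures are made up.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_pdf(x, df):
    """Student t density, using log-gamma to avoid overflow for large df."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|].
    Simpson's rule; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


# Hypothetical monthly churn rates (%) per cell: suspect region vs. all other cells.
region = [3.1, 2.8, 3.5, 3.9, 3.3, 3.6]
others = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 2.3]
t, df = welch_t(region, others)
p = two_sided_p(t, df)
```

Welch's variant avoids assuming equal variances in the two groups, which matters when a handful of cells in a region is compared against many heterogeneous cells elsewhere.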
Conclusions
Data analytics = data mining + machine learning + statistics + math
New twist: big data
Data-driven business intelligence and operations optimization
Many interesting use cases in vertical industries: telecom, healthcare, energy, finance, national security