20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns




John Aogon and Patrick J. Ogao

Advances in Systems Modelling and ICT Applications

Telecommunications operators in developing countries face the problem of not knowing the usage patterns of their prepaid subscribers. They are challenged to reduce prepaid churn and to maximize the lifetime value of their subscribers. The prepaid subscriber is anonymous, and the only information a prepaid subscriber gives the operator is the record of call events on the telecommunications network. However, these call details in their raw form do not provide useful information, and they amount to an overwhelming volume of data that is not easy to analyze. To assist telecommunications operators, this study developed a visualization framework that helps them discover prepaid subscriber usage patterns. An exploratory approach was used to unravel subscriber usage patterns from call data records obtained from a local telecommunication operator in Uganda. Five visualization tools were selected based on their functionality. Based on the findings, a visualization framework for discovering subscriber usage patterns is presented. The framework is evaluated using call data from the local telecommunication operator, together with knowledge about that data supplied by the operator. The results outline the strengths of various visualization techniques with regard to specific prepaid usage patterns.

Introduction

The number of prepaid mobile subscribers in the telecommunication industry is growing rapidly; up to 90 percent of mobile subscribers are prepaid (Meso et al., 2005). Equally, the turnover of these subscribers, as they stop using mobile services or switch to a competitor, is quite high. This migration is referred to as churn in the mobile telephone industry. Available information indicates that half or more of prepaid customers are likely to change operators in a 12-month period.

To reduce churn, operators are starting to rethink their prepaid strategies. In this respect, loyalty and customer care programmes specifically tailored for prepaid customers are critical in the prepaid mobile market. To minimize churn, it is imperative that potential churn is identified in advance (Karen, 2004). Tailoring specific loyalty and customer care programmes for these subscribers is therefore an important business requirement for all mobile telecommunication operators. The challenge is that most telecommunication companies do not have sufficient information about their prepaid subscribers (Shearer, 2004). This is despite the fact that most prepaid subscribers give the operator information about themselves through recorded events on the use of the telecommunication network, the Call Detail Records (CDRs). The raw CDRs do not provide immediately useful information to the operators, and turning them into significant insights about customers and markets is the main challenge. This study therefore aims to use the CDRs to enable telecommunications operators to overcome the anonymity of prepaid subscribers, discover valuable information and gain insight into their subscribers.

Methodology

An exploratory research approach was used, applying visualization techniques and interactive functionality to unravel hidden patterns in call detail data from a local telecommunication company.

Visual Data Mining

Visual data mining aims at integrating the human in the data analysis process, applying human perceptual abilities to the analysis of the large data sets available in today's computer systems (Bustos et al., 2003). The basic idea of visual data mining is to present the data in some visual form, allowing the user to gain insight into the data, draw conclusions, and directly interact with the data (Chung, 1999). Visual presentations can be very powerful in revealing trends, highlighting outliers, showing clusters, and exposing gaps. Visual data exploration often follows a three-step process: overview first, zoom and filter, and then details-on-demand (Shneiderman, 2002). In other words, in the exploratory analysis of a data set, an analyst first obtains an overview. The analyst then interacts directly with the visualizations and changes them dynamically according to the exploration objectives. This may reveal potentially interesting patterns or certain subsets of the data that deserve further investigation.
The analyst then focuses on one or more of these, inspecting the details of the data (Shneiderman, 1998). Five different visualization tools were selected for this study based on the breadth of their capabilities. The tools included both two-dimensional and multidimensional display techniques: scatterplots, suitable for identifying trends and outliers; parallel coordinates, suitable for viewing subscriber mobility, outliers and relationships; and bar charts and pie charts, which are good for ranking patterns such as high- and low-usage subscribers.

Requirements Analysis

The business requirements were analysed by reviewing existing reports and interviewing various users in the local telecommunication company. This was done to identify and clarify the user needs, which were then used to identify the data required to meet them. The billing system proved to be a rich and convenient source of data for this study because it stores up to four months of online historical data in an Oracle database. Table 1 shows the statistical text reports that are currently generated and the questions that need to be answered.

Table 1. Reports generated from the call records.

Text reports currently generated:
- Calls by tariff plan
- Calls by subscriber
- Calls by location
- Calls by destination
- Calls by time of day
- First and last call by subscriber
- Daily call summary

Questions that need to be answered:
- Which is the most used tariff plan?
- Which customer group is highly profitable, and which one is not?
- Which subscribers are high users?
- Which subscribers are low-end users?
- To which customers should we advertise special offers?
- What is the likely home location of the subscriber?
- Which locations have low and high traffic?
- What is the most called destination?
- Which are the least called destinations?
- Which subscribers are business users?
- Which subscribers are personal users?
- Which subscribers have just joined the network?
- Which subscribers have left the network?
- Which subscribers are about to leave the network?
- What is the call usage trend?
- Which days have low and high usage?
- Is there anything unusual?

Developing the Data Mart

The requirements analysis stage identified the following key call detail attributes:
- Originating number - the calling number.
- Tariff plan - the tariff plan associated with the subscriber at the time of the call.
- Terminating number - the call destination (called number).
- Call time - the date and time the call started.
- Location - the network location where the call was made.
- Duration - the duration of the call in seconds.
- Charge - the charge for the call.
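As an illustration only, the seven attributes above map onto a flat record, and loading the data mart with call data summarized per day (the day being the lowest grain) amounts to a group-by over those records. The Python sketch below is hypothetical: the class and field names are assumptions, and so is the choice to key the summary by subscriber, day, tariff plan, location and destination; the actual dimensions of the study's data mart (Figure 1) are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One call detail record holding the seven attributes from the requirements analysis."""
    originating_number: str  # calling number (the prepaid subscriber)
    tariff_plan: str         # tariff plan at the time of the call
    terminating_number: str  # called number (destination)
    call_time: datetime      # date and time the call started
    location: str            # network location where the call was made
    duration_s: int          # call duration in seconds
    charge: float            # charge for the call


def summarize_daily(records):
    """Roll calls up to one row per (subscriber, day, tariff plan, location, destination).

    The grouping key is an assumption for illustration. Each value is
    [call_count, total_duration_s, total_charge].
    """
    summary = defaultdict(lambda: [0, 0, 0.0])
    for r in records:
        key = (r.originating_number, r.call_time.date(), r.tariff_plan,
               r.location, r.terminating_number)
        row = summary[key]
        row[0] += 1
        row[1] += r.duration_s
        row[2] += r.charge
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical sample data: two calls by one subscriber on the same day.
    calls = [
        CallRecord("S001", "TP4", "D100", datetime(2006, 3, 1, 9, 15), "LOC-A", 120, 300.0),
        CallRecord("S001", "TP4", "D100", datetime(2006, 3, 1, 18, 40), "LOC-A", 60, 150.0),
    ]
    print(summarize_daily(calls))
    # {('S001', datetime.date(2006, 3, 1), 'TP4', 'LOC-A', 'D100'): [2, 180, 450.0]}
```

Summarizing before loading trades per-call detail for a much smaller table, which mirrors the study's reason for storing summarized data: the raw call detail volume in the billing system was large.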

The data requirements identified above were transformed into dimensions that constitute a model for the data mart (Figure 1).

Fig 1. Data Mart Dimension Model

Creation of the Database

Oracle RDBMS was used to implement the above model. SQL scripts were written to extract, transform and load the data into the data mart. The data was extracted from the billing system (OLTP system) and loaded into a staging area, where cleaning and transformation took place (Figure 2). The following details were extracted for each call: calling number, called destination, location, call date, tariff plan, call duration and call charge. After cleaning and transformation, the data was loaded into the data mart in summarized form, with the day chosen as the lowest level of granularity. Summarized data was used because the volume of call detail data stored in the billing system was large.

Fig 2: Data Mart Architecture (Billing System, Staging Area and Data Mart; data extracted from the OLTP system is transformed and loaded by ETL)

Developing a Visualization Framework

Fayyad et al. (1996) describe data mining as a collection of powerful analysis techniques for making sense of very large datasets. There is no single data mining approach, but rather a set of techniques that can often be used in combination to extract the most insight from the data. This study used a visual data mining approach to gain insight into the usage patterns hidden in the subscriber call detail records.

Visualization Tools

The following five visualization tools were used in this study:
a) Eureka (http://infovis.cs.vt.edu/demos/)
b) Advizor (http://www.visualinsights.com/)
c) GGobi (http://www.ggobi.org/)
d) XmdvTool (Multivariate Data Visualization Tool) (http://davis.wpi.edu/xmdv/)
e) Omniscope (http://www.visokio.com/)

The interactive techniques built into these tools included filtering, brushing, sorting, identification, grouping and linked (multiple) views. Each tool was used to explore the CDRs and, by observing the views it presented, to identify the technique suitable for visualizing particular kinds of patterns. The suitability of a technique for revealing specific patterns in the CDR data was established using the guidelines below (Pillat et al., 2005), and each tool and its techniques were assessed against them for use in discovering telecommunications subscriber usage patterns.
a) Ability to reveal patterns in call records data - The purpose of the visualization tool is primarily to gain knowledge through the recognition of patterns in the data; in this respect, a technique is judged by its ability to answer questions such as those in Table 1.
b) Ability to visualize massive telecommunications call data.
c) Support for multidimensional data - Call data typically contains several dimensions, such as subscriber, tariff plan, time, location and destination, so the visualization technique must be able to show many dimensions in a single display.
d) User interaction - To help the user gain insight into the displayed data, the technique should provide the ability to manipulate the display dynamically.
e) Ease of use - The visualization tool should be easy to use, both in loading data in different formats and in presenting clear displays that are easily perceived by users.

The figures that follow are examples of exploring the CDRs for patterns using the five visualization tools. They are meant to highlight the output of the techniques used by the tools.

Fig 3: Table Lens Shows the Focused Details While the Rest of the Display Remains in Context

Figure 3 shows the table lens display of the data using the focusing interaction technique: details of a particular point of interest in the display can be revealed.

Fig 4: XmdvTool Parallel Coordinates Display of Call Data with Brushed Data Points

Figure 4 shows the XmdvTool brushing technique used for linking multiple displays of the same data. One display is used to select the data elements of interest by brushing the points with color, and the corresponding elements are immediately highlighted on the other displays.

Fig 5: GGobi Scatterplot of Subscribers by Call Date

Figure 5 shows the GGobi scatterplot display for identifying churners and joiners. Each dot represents a subscriber who made calls on a given day. The display is useful for telling which subscribers left the network and which joined it during the period under consideration.
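The visual reading of Figure 5 can be approximated numerically from each subscriber's first and last call dates, the same information behind the "First and last call by subscriber" report in Table 1. The sketch below is a hypothetical illustration, not the study's method: a subscriber whose first call falls well after the start of the observation period is labelled a joiner, and one whose last call falls well before its end is labelled a churner. The 14-day `grace_days` threshold is an assumption, and since prepaid subscribers can simply be dormant, "churner" here only means apparently inactive.

```python
from datetime import date, timedelta


def first_last_calls(call_days):
    """Map each subscriber to (first call date, last call date).

    call_days is an iterable of (subscriber, date) pairs, for example one pair
    per subscriber per day on which the subscriber made at least one call.
    """
    spans = {}
    for subscriber, day in call_days:
        first, last = spans.get(subscriber, (day, day))
        spans[subscriber] = (min(first, day), max(last, day))
    return spans


def classify_subscribers(spans, period_start, period_end, grace_days=14):
    """Label subscribers as 'joiner', 'churner', 'joiner+churner' or 'active'.

    grace_days is an assumed inactivity threshold, not a value from the study.
    """
    grace = timedelta(days=grace_days)
    labels = {}
    for subscriber, (first, last) in spans.items():
        joined = first > period_start + grace  # no calls early in the period
        left = last < period_end - grace       # no calls late in the period
        if joined and left:
            labels[subscriber] = "joiner+churner"
        elif joined:
            labels[subscriber] = "joiner"
        elif left:
            labels[subscriber] = "churner"
        else:
            labels[subscriber] = "active"
    return labels


if __name__ == "__main__":
    # Hypothetical observation window and call days.
    days = [("A", date(2006, 1, 1)), ("A", date(2006, 4, 28)),
            ("B", date(2006, 1, 3)), ("B", date(2006, 2, 1)),
            ("C", date(2006, 3, 15)), ("C", date(2006, 4, 30))]
    spans = first_last_calls(days)
    print(classify_subscribers(spans, date(2006, 1, 1), date(2006, 4, 30)))
    # {'A': 'active', 'B': 'churner', 'C': 'joiner'}
```

In the scatterplot, these labels correspond to rows of dots that start late (joiners) or stop early (churners); the threshold plays the role of the analyst's judgement about how large a gap counts as meaningful.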

Fig 6: Omniscope Filtering Technique Used to Select Data

Figure 6 shows the Omniscope filtering interaction technique used to select data of interest graphically. The display shows the data filtered by selecting only tariff plan 4 in the chart view; the table view is automatically updated to show only tariff plan 4 details.

Fig 7: Advizor Multiscape (3D) Display of Calls for each Subscriber by Tariff Plan and Location

Figure 7 shows the Advizor tool's ability to present multiple displays on one page. The 3D displays are useful for analyzing the relationship between three variables. For example, by displaying subscriber calls by location and tariff plan on the same page, it is possible to tell the most likely home location of each subscriber and the most common tariff plan used by subscribers.

Validation of the Framework

Validation of the framework was a testing phase in which the developed visualization framework was applied to real-world telecommunications call data obtained from the local telecommunication company.

Table 2. Summary of Visualization Tools: Display and Interaction Techniques

Table 3 relates the techniques to the patterns. The rows represent visualization techniques and each column represents a pattern of interest in the call records data. Again, a Y in a column signifies that the visualization technique can be used to reveal the pattern satisfactorily, and a blank signifies not applicable. The results show that techniques such as parallel coordinates and multiscape can reveal the highest number of patterns in the call records. Scatterplots, time series and multiscape can reveal churn, one of the most interesting patterns in telecommunications.

Table 3. Visualization Techniques and Patterns

Results

The proposed framework is based on the premise that a visualization tool comprises two kinds of technique: display techniques and interaction techniques. Display techniques present the graphical display of data and range from common visual displays to special-purpose displays. Interaction techniques manipulate the data in the displays; they are used to pose graphical queries on the data so that further insight can be gained about it. In this study, display techniques have been divided into two categories: specialized and traditional (common) techniques. Based on the number of attributes that can be displayed, the specialized techniques are further divided into multidimensional and two-dimensional techniques.

Multidimensional display techniques can display more than two variables (attributes) at the same time, for example multiscape (3D), parallel coordinates and table lens. With telecommunications call data, multidimensional display techniques can reveal two types of patterns:
a) Ranking patterns: patterns that rank or categorize the attributes, for example the most called destination, the tariff plans with the highest and lowest numbers of subscribers, locations with high and low numbers of calls, and high- and low-using subscribers.
b) Unique patterns: patterns that are not easily seen with text data or reports. These include churn (subscribers who have left the network), joiners (new subscribers who have just joined the network), changes in subscriber profiles (subscribers who keep changing tariff plans), and subscriber mobility from one location to another.

Two-dimensional techniques can display only two variables (attributes) at the same time, for example time series plots, scatterplots and line graphs. Time series techniques reveal trends or changes over time, so in telecommunications they are useful for identifying churn (subscribers who have left or are about to leave the network), joiners (subscribers who have just joined the network) and outliers (exceptions that should not be ignored but investigated). Line graphs can reveal trends; for example, subscribers who have stayed longer on the network are generally high users compared with new subscribers.
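Several of the ranking patterns listed above, together with the home-location reading of Figure 7 and the tariff-plan changes counted among the unique patterns, reduce to counting and sorting once the calls are in tabular form. The following sketch is illustrative only and is not the study's implementation: it assumes each call is a hypothetical tuple (subscriber, tariff plan, destination, location, charge), uses total charge as the measure of "high" and "low" use, takes the location with the most calls as the likely home location, and treats any subscriber seen on two or more tariff plans as a tariff-plan changer.

```python
from collections import Counter, defaultdict

# Each call is a hypothetical tuple: (subscriber, tariff_plan, destination, location, charge)


def most_called_destinations(calls, n=3):
    """The n destinations receiving the most calls, with their call counts."""
    return Counter(dest for _, _, dest, _, _ in calls).most_common(n)


def subscribers_per_tariff_plan(calls):
    """Tariff plans ordered by number of distinct subscribers, largest first."""
    plans = defaultdict(set)
    for sub, plan, _, _, _ in calls:
        plans[plan].add(sub)
    return sorted(((plan, len(subs)) for plan, subs in plans.items()),
                  key=lambda item: item[1], reverse=True)


def usage_ranking(calls):
    """Subscribers ordered by total charge: high users first, low users last."""
    spend = Counter()
    for sub, _, _, _, charge in calls:
        spend[sub] += charge
    return spend.most_common()


def likely_home_location(calls):
    """For each subscriber, the location from which most calls were made."""
    per_sub = defaultdict(Counter)
    for sub, _, _, loc, _ in calls:
        per_sub[sub][loc] += 1
    return {sub: locs.most_common(1)[0][0] for sub, locs in per_sub.items()}


def tariff_plan_changers(calls, min_plans=2):
    """Subscribers seen on at least min_plans distinct tariff plans."""
    plans = defaultdict(set)
    for sub, plan, _, _, _ in calls:
        plans[sub].add(plan)
    return sorted(sub for sub, seen in plans.items() if len(seen) >= min_plans)


if __name__ == "__main__":
    calls = [("S1", "TP1", "D1", "L1", 500.0), ("S1", "TP1", "D2", "L1", 300.0),
             ("S2", "TP2", "D1", "L3", 50.0), ("S3", "TP1", "D3", "L3", 20.0),
             ("S3", "TP2", "D3", "L3", 10.0)]
    print(usage_ranking(calls))         # [('S1', 800.0), ('S2', 50.0), ('S3', 30.0)]
    print(likely_home_location(calls))  # {'S1': 'L1', 'S2': 'L3', 'S3': 'L3'}
```

A visualization shows these rankings at a glance and lets the analyst spot exceptions; the computed versions are mainly useful for checking what a chart appears to show.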
Scatterplots can reveal relationships between attributes that need further investigation; in addition, they can be used to identify churn, joiners and outliers. The regular or traditional techniques (bar chart, pie chart) are also display techniques that can reveal ranking patterns in telecommunications.

Interaction techniques are used to manipulate the displays presented by the display techniques. Although display techniques can immediately reveal facts about the data being displayed, interaction techniques are used to uncover patterns that cannot easily be seen on a single display of data, that is, to gain insight into hidden patterns. These techniques include brushing, focusing, filtering, labeling and sorting. To benefit from them, more than one display of the data is used. Interaction techniques help to pose questions on the displayed data and to generate answers to them. Typical questions about telecommunications call data records that can be answered with the help of interaction techniques include:
- Where do high-using subscribers originate their calls from?
- Which subscribers should be targeted with advertising?
- Where do the high users call most?
- Which tariff plan has the greatest number of low users?

After exploring the call data records for patterns using the visualization tools as described in the previous section, a framework for visualizing subscriber usage patterns in telecommunications is presented in Figure 7.

Figure 7. Framework for Visualizing Subscriber Usage Patterns in Telecommunications

Discussion

The visualization approach is a preferred alternative to automated data mining approaches. The results show that, with the exception of Eureka, the visualization tools offer almost the same techniques; the only significant differences are the size of data set and the data formats each tool supports. Consequently, a framework for visualizing usage patterns in telecommunications is described. The framework shows the techniques and the patterns in telecommunications subscriber call records that each technique is suitable for. It classifies the visualization techniques into two groups, specialized techniques and common techniques, and divides the patterns in the call records into three: trending patterns, ranking patterns and unique patterns. The results also show that Omniscope, XmdvTool and GGobi are limited in the size of data set they can support and are not as robust as Advizor and Eureka.

Conclusion

In this study we have proposed a visualization framework that serves as a model for discovering prepaid usage patterns using existing visualization techniques. The paper has described a range of existing visualization techniques: simple patterns can be seen on a single display, but hidden patterns can be discovered by using multiple displays together with interaction techniques. This work has further demonstrated that the visualization techniques cut across all the application domains with varying degrees of strengths and weaknesses.

An interesting area for further study is the relationship between subscribers' calling patterns and their airtime loading patterns. This study did not use subscriber demographic data because it was unavailable from the local telecommunications operator whose data was used. Calling patterns supported by demographic data could give telecommunications operators more insight into their subscribers.

References

Bustos, B., Keim, D. A., Panse, C., Schneidewind, J., Schreck, T., Sips, M. and Wawryniuk, M. (2003). Pattern Visualization. http://dke.cti.gr/panda/tasks/deliverables/dlv-2-3.pdf (Accessed: March 29, 2006).
Chung, W. P. (1999). Visual Data Mining. http://www.pnl.gov/infoviz/visual_data_mining.pdf (Accessed: March 30, 2006).
Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, Fall 1996, pp. 37-54.
Meso, P., Musa, P. and Mbarika, V. (2005). Towards a model of consumer use of mobile information and communication technology in LDCs: the case of sub-Saharan Africa. Information Systems Journal, 15(2), pp. 119-146.
Karen, G. S. (2004). Customer-Centered Telecommunications Services Marketing. London: Artech House.
Pillat, R. M., Valiati, E. A. and Freitas, C. M. D. S. (2005). Experimental Study on Evaluation of Multidimensional Information Visualization Techniques. Proceedings of the 2005 Latin American Conference on Human-Computer Interaction, Cuernavaca, Mexico.
Shearer, C. (2004). Anticipating Consumer Behavior with Analytics. http://www.crm2day.com/library/ (Accessed: February 8, 2006).
Shneiderman, B. (1998). The Eyes Have It: User Interfaces for Information Visualization. http://hci.stanford.edu/cs547/abstracts/97-98/980220-shneiderman.html (Accessed: March 28, 2006).
Shneiderman, B. (2002). Inventing Discovery Tools: Combining Information Visualization with Data Mining. Journal of Information Visualization, 1(1), pp. 5-12.