A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING

Ahmet Selman BOZKIR
Hacettepe University Computer Engineering Department, Ankara, Turkey
selman@cs.hacettepe.edu.tr

Ebru Akcapinar SEZER
Hacettepe University Computer Engineering Department, Ankara, Turkey
ebru@hacettepe.edu.tr

ABSTRACT
In recent years, many companies have begun to use data mining and decision support systems (DSS) for decision making activities. Although their use is increasing continuously, DSSs are generally built as desktop applications and designed for data mining experts. The purpose of the present study is the design and implementation of a web-based data mining exploration and reporting tool named ASMiner. ASMiner supports exploration and reporting for three data mining techniques (decision trees, clustering and association rule mining) through a scalable, fully web-based thin-client tool aimed at both decision makers and knowledge workers.

KEYWORDS
DSS, Decision-making, Web-based data mining

1. INTRODUCTION
Data mining is a useful tool for decision makers in predicting and planning the future. In the near future, data mining methods may become crucially important among the existing approaches to the forecasting problems encountered in engineering, medicine and the applied sciences. Web-based technologies have revolutionized the design, development and implementation stages of decision support systems (Ba & Kalakota, 1995; Bhargava & Power, 2001). Moreover, the Web environment is expanding as a very important DSS development and delivery platform (Shim et al., 2002). Compared with traditional batch-based or client-server oriented tools, the key advantages of web-based tools include ease of use, universal access across information technology platforms, and single minute response and feedback based upon dynamic and real-time data (Heinrichs & Him, 2003).
The main purposes of the present study are to develop a completely web-based data mining exploration and reporting tool that saves time during the exploration and reporting phases of data mining applications, and to enable even typical users to be effective decision makers. To this end, a tool named ASMiner was developed. ASMiner employs Microsoft SQL Server Analysis Services behind the scenes as its data mining engine and currently supports three data mining techniques: decision trees, clustering and association rules.

2. WEB BASED DATA MINING TOOL DEVELOPED
Numerous data mining tools and applications are available on the market, but they require a professional data mining background and practice. Owing to these requirements, such software packages are used only by data mining experts. Moreover, most commercial data mining solutions are implemented with non-web-based approaches. Furthermore, report production in many data

mining software still requires exhaustive and time-consuming processes. To cope with these difficulties, ASMiner targets knowledge workers and helps them become data miners. To achieve this, it presents easy-to-understand, user-friendly and clear interfaces for exploring mining models created in Microsoft Analysis Services.

Several data mining platforms exist on the market, such as Oracle, MS SQL Server and WEKA. When developing ASMiner, Microsoft Analysis Services was preferred owing to its low cost and widespread use. Microsoft Analysis Services has been the business intelligence component of Microsoft SQL Server since 2000. For decision trees, Microsoft developed its own algorithm, named Microsoft Decision Trees. Like CART and CHAID, this algorithm can handle both categorical and continuous variables. In addition, it supports entropy and Bayesian score as splitting strategies and, unlike the other well-known algorithms, it has no pruning phase. In Analysis Services, as soon as a decision tree model is created, a corresponding dependency network is also formed. For clustering models, Microsoft Analysis Services offers two algorithms, K-Means and EM (Expectation-Maximization), each in scalable and non-scalable versions. For association rule mining, the well-known Apriori algorithm is employed.

ASMiner uses the client connectivity interfaces of SQL Server for both OLTP and data mining access. ADOMD.NET and AMO have been used as the entry points to Analysis Services. ADOMD.NET is mainly focused on retrieving the metadata of mining models, whereas AMO provides management options for server objects in Analysis Services. Thus, model training/processing operations and model settings can only be made via AMO. Domain experts can load, create and manage data mining models on Analysis Services by using a reduced version of Visual Studio that ships with Microsoft SQL Server.
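The entropy splitting strategy mentioned above can be illustrated with a small, generic sketch. It is not the Microsoft Decision Trees implementation; it only shows the textbook idea of scoring a candidate split by how much it reduces the entropy of the target variable. The attribute names and the sample cases are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    parent = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate split attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the child groups.
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

# Hypothetical training cases (not taken from the paper).
CASES = [
    {"owns_home": "yes", "region": "north", "buys": "yes"},
    {"owns_home": "yes", "region": "south", "buys": "yes"},
    {"owns_home": "no", "region": "north", "buys": "no"},
    {"owns_home": "no", "region": "south", "buys": "no"},
]

if __name__ == "__main__":
    for attr in ("owns_home", "region"):
        print(attr, information_gain(CASES, attr, "buys"))
```

In this toy data set, splitting on owns_home separates the two classes completely, while splitting on region leaves them evenly mixed, so an entropy-based strategy would prefer the first attribute.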
As soon as a domain expert creates a data mining model in Analysis Services, the model is saved with its metadata, and this metadata can be retrieved through ADOMD.NET. Working with AMO and ADOMD.NET, ASMiner accesses the metadata of data mining models and composes the viewers that users request.

Figure 1. Modules and sub-components chart of ASMiner

ASMiner consists of five main modules: authentication mechanism, decision tree subsystem, clustering subsystem, association rules subsystem and management tools (Fig. 1). The authentication subsystem authorizes every request and validates whether the user has the right to access the requested page and operation. The decision tree, clustering and association rules mining subsystems each have their own types of mining model viewers. The charting and visualization components in these viewers are either third-party open source components or were developed in this study. ASMiner also has a management tool developed for various purposes. These characteristics of ASMiner are explained in the subsequent paragraphs.

The decision tree module of ASMiner contains three types of tree viewer: general tree viewer, discrete tree viewer and radial tree viewer. The general tree viewer can draw both regression trees and discrete decision trees. To increase speed and interactivity, JavaScript client-side scripting is used when drawing a tree. For tree layout, the Walker tree drawing algorithm is employed to produce clear and aesthetic trees. Users can navigate trees by expanding or collapsing nodes with the buttons on each node. In addition,

Visifire (Visifire, 2008) charting solutions are employed in the node histogram display. Another important feature of the general tree viewer (Fig. 2) is its drill-through support. Drill-through data can be exported in CSV or Excel format.

Figure 2. General tree viewer of ASMiner

The discrete tree viewer has some special properties specific to discrete decision trees. Additionally, a radial tree viewer was implemented experimentally to let users view the tree structure from a different perspective.

Dependency network graphs are produced for exploring correlations. ASMiner has two types of dependency network viewers. By using these graphs, users can navigate the overall graph and explore its content. A sample dependency network graph displayed with ZGRViewer is presented in Fig. 3. The other dependency network graph viewer is based on Flash technology. When the Java Runtime is not available, this Flash-based viewer serves users instead. In addition, this viewer can highlight and show the nearest neighbors of selected nodes, alongside features such as zooming, rotating and unique coloring of nodes.

Figure 3. The ZGRViewer powered dependency graph

To complete decision tree based decision making and to offer decision tree based prediction, ASMiner includes a web-based online prediction tool. In fact, decision tree based prediction is no more than hopping from one decision node to the next in the appropriate direction. At the last step of this recursive process, the value of the target variable (attribute) becomes clear or, in the worst case, a distribution table is given. In the case of regression trees, the value of the target variable is calculated by the formula of the decision node. Analysis Services offers two types of prediction queries, batch and singleton. However, only singleton querying is supported by ASMiner.
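The node-hopping view of prediction described above can be sketched as follows. This is a minimal illustration for discrete trees only, not the Analysis Services prediction engine: the Node structure, the attribute names and the sample tree are hypothetical, and the walk is written as a loop rather than as a recursion. When a case has no matching branch, the sketch stops at the current node and reports its distribution table.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    # Attribute tested at this node; None marks a leaf.
    split_attribute: Optional[str] = None
    # Child nodes keyed by the value of the split attribute.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Distribution of the target variable among the cases at this node.
    distribution: Dict[str, float] = field(default_factory=dict)

def predict(node, case):
    """Hop from node to node, following the branch that matches the case."""
    while node.split_attribute is not None:
        value = case.get(node.split_attribute)
        child = node.children.get(value)
        if child is None:
            # Missing or unseen value: stop here and use this node's table.
            break
        node = child
    best = max(node.distribution, key=node.distribution.get)
    return best, node.distribution

# Hypothetical tree (not taken from the paper).
TREE = Node(
    split_attribute="owns_home",
    distribution={"yes": 0.5, "no": 0.5},
    children={
        "yes": Node(distribution={"yes": 0.9, "no": 0.1}),
        "no": Node(
            split_attribute="age_band",
            distribution={"yes": 0.2, "no": 0.8},
            children={
                "young": Node(distribution={"yes": 0.4, "no": 0.6}),
                "old": Node(distribution={"yes": 0.05, "no": 0.95}),
            },
        ),
    },
)

if __name__ == "__main__":
    print(predict(TREE, {"owns_home": "no", "age_band": "young"}))
```

A regression tree would differ only at the last step, where the leaf would evaluate its formula on the input variables instead of returning the most probable value.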

Because online prediction is an iterative process, it can be repeated many times to reach the best decision. Fig. 4 shows the stages of the ASMiner web-based prediction tool. In the first and second stages, the decision maker selects the predictable variable(s) and attaches each of them to the required member of the predict function family. The plain Predict() function returns the value of the target variable, whereas the PredictSupport() function returns the support value of the predicted target variable. In the third stage, the decision maker enters the case to be predicted, that is, the values of the input variables. In the last stage, the results are returned on the fly and evaluated by the decision maker. If needed, the predictable or input variables may be changed in different combinations, and the overall scenario repeats until the decision maker is satisfied.

(1) Select a predictable variable
(2) Attach a suitable predict function
(3) Input the independent variables of the case
(4) Get the prediction results in a table

Figure 4. The stages in web-based prediction of ASMiner

The ASMiner clustering subsystem focuses on describing and presenting the discovered clusters from different points of view. Most of the viewers implemented in the clustering subsystem aim to inform users about the characteristics, statistical differences and discriminating features of clusters. Furthermore, a distribution based cluster dominancy exploration method and viewer (Fig. 5) were developed experimentally to show which clusters are highly dominant or recessive at the intersection of the values of discrete variables in a two-dimensional space. ASMiner implements six types of clustering viewer: value distribution, cluster distribution, general cluster profiles, specific cluster characteristics, cluster comparison and, lastly, cluster neighborhood + distribution.
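The paper does not spell out how cluster dominancy is computed, so the following sketch shows only one plausible reading and should not be taken as ASMiner's actual method. It assumes that every case already carries a cluster label, cross-tabulates two discrete variables, and reports for each cell (each pair of values) the cluster holding the largest share of cases. The function name, attribute names and sample cases are hypothetical.

```python
from collections import Counter, defaultdict

def dominancy_table(cases, x_attr, y_attr, cluster_key="cluster"):
    """Map each (x value, y value) cell to its dominant cluster and its share."""
    cells = defaultdict(Counter)
    for case in cases:
        # Count cluster memberships per intersection of the two variables.
        cells[(case[x_attr], case[y_attr])][case[cluster_key]] += 1
    table = {}
    for cell, counts in cells.items():
        total = sum(counts.values())
        cluster, n = counts.most_common(1)[0]
        table[cell] = (cluster, n / total)
    return table

# Hypothetical clustered cases (not taken from the paper).
CASES = [
    {"gender": "F", "region": "north", "cluster": "C1"},
    {"gender": "F", "region": "north", "cluster": "C1"},
    {"gender": "F", "region": "north", "cluster": "C2"},
    {"gender": "M", "region": "south", "cluster": "C2"},
]

if __name__ == "__main__":
    for cell, (cluster, share) in dominancy_table(CASES, "gender", "region").items():
        print(cell, cluster, round(share, 2))
```

Under this reading, a cluster with a share close to 1 in a cell would be drawn as highly dominant there, and clusters with small shares as recessive.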
Moreover, ASMiner presents two new viewers for value-variable distribution and cluster distribution. With these viewers, decision makers can gain statistical insight into the clusters. In the cluster properties viewer, the properties of a selected cluster are listed in order of decreasing support value.

Figure 5. Cluster dominancy distribution viewer

Association rules mining is one of the most important data mining tasks. For this reason, an association rules mining module is implemented in ASMiner, and three viewers are designed for it: an itemsets viewer, a rules viewer and a rules dependency network viewer. ASMiner includes a comprehensive rules viewer. Unlike other implementations of the Apriori algorithm, Analysis Services focuses on the Importance (namely lift) score for measuring the usefulness of a rule (Maclennan et al., 2008). The rule importance score ranges from -1 to 1. Due to its potential

advantages, ASMiner focuses on ways of filtering and saving the important rules that decision makers need to report. Therefore, the rule viewer is equipped with minimum importance, minimum confidence and textual search controls.

A flexible web-based management system for system administrators is developed for ASMiner. With this tool, users, roles, active mining models and the relationships among them can be managed. With the help of the Analysis Services AMO programming interfaces, model information can be retrieved and updates are reflected directly. Additionally, the system administrator can specifically allow or ban users from exploring selected models. Finally, even a user who has the right to access a model can be prohibited from training it or making predictions over it.

Up to now, data mining oriented decision making processes have taken too much time because of the barriers between decision makers and data mining experts. In addition, each change during the generation of reports requires an alteration in the data mining models, which results in new loops between the decision makers and system administrators. However, this limitation can be reduced if the decision makers perform these operations themselves. This shows that current data mining software packages assume their target users to be data mining experts. This assumption is the fundamental barrier to the common use of data mining. For example, researchers from different disciplines, officers of banks and insurance companies, market managers and decision makers can use ASMiner easily. Decision tree based risk assessment for all incoming requests can be carried out by these persons without needing a data mining expert.

3. CONCLUSION
In this study, a web-based DSS named ASMiner was developed. It is designed and implemented to take full advantage of the latest technologies in the Internet and in DSS.
In the design stage, some viewers were designed by taking inspiration from the original Analysis Services viewers. For this reason, ASMiner can be regarded as a web-based version of the Analysis Services viewers. In addition, although Analysis Services offers some features for connecting to it over HTTP, ASMiner provides a pure three-tiered web-based data mining platform. Moreover, AJAX-based techniques and controls were used to enhance performance and user interaction. Owing to these characteristics, ASMiner has some advantages over the other reporting and exploration tools used in practice.

As future work, extending the management capabilities for data mining models and enhancing the system for administrative use is planned. Another important feature planned for the future is support for batch queries against live data sources. Furthermore, implementing a web-based naïve Bayesian model viewer and a sequence clustering model viewer are important milestones in the development roadmap of ASMiner. In addition, ASMiner is expected to become a fully automated and more comprehensive web-based DSS for both decision makers and data mining experts.

REFERENCES
Ba, S. and Kalakota, A. B., 1995. Executable Documents DSS. Proc. 3rd International Conference on DSS. Hong Kong.
Bhargava, H.K. and Power, D.J., 2001. Decision Support Systems and Web Technologies. AMCIS 2001 Proceedings.
Heinrichs, J.H. and Him, J., 2003. Integrating Web Based Data Mining Tools with Business Models for Knowledge Management. Decision Support Systems, Vol. 35, No. 1, pp. 103-112.
Maclennan, J. et al., 2008. Data Mining with SQL Server 2008. Wiley, Indianapolis, USA.
Shim, J.P. et al., 2002. Past, Present, and Future of Decision Support Technology. Decision Support Systems, Vol. 33, No. 2, pp. 111-126.
Visifire, 2009. Available: http://www.visifire.com