A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING



Ahmet Selman BOZKIR
Hacettepe University, Computer Engineering Department, Ankara, Turkey
selman@cs.hacettepe.edu.tr

Ebru Akcapinar SEZER
Hacettepe University, Computer Engineering Department, Ankara, Turkey
ebru@hacettepe.edu.tr

ABSTRACT

In recent years, many companies have begun to use data mining and decision support systems (DSS) for decision-making activities. Although their use is increasing continuously, DSSs are generally built as desktop applications and designed for data mining experts. The purpose of the present study is the design and implementation of a web-based data mining exploration and reporting tool named ASMiner. ASMiner supports exploration and reporting for three data mining techniques (decision trees, clustering and association rule mining) and is a scalable, fully web-based, thin-client data mining tool for both decision makers and knowledge workers.

KEYWORDS

DSS, Decision-making, Web-based data mining

1. INTRODUCTION

Data mining is a useful tool for decision makers in predicting and planning the future. In the near future, data mining methods may well become crucial among the approaches used to solve forecasting problems in engineering, medicine, the applied sciences and other fields. Web-based technologies have revolutionized the design, development and implementation of decision support systems (Ba & Kalakota, 1995; Bhargava & Power, 2001). Moreover, the Web is becoming a very important DSS development and delivery platform (Shim et al., 2002). Compared with traditional batch-based or client-server tools, the key advantages of web-based tools include ease of use, universal access across information technology platforms, and fast response and feedback based on dynamic, real-time data (Heinrichs & Him, 2003).
The main purposes of the present study are to develop a completely web-based data mining exploration and reporting tool that saves time during the exploration and reporting phases of data mining applications, and to enable even typical users to become effective decision makers. For this purpose, a tool named ASMiner was developed. ASMiner uses Microsoft SQL Server Analysis Services behind the scenes as its data mining engine and currently supports three data mining techniques: decision trees, clustering and association rules.

2. THE DEVELOPED WEB-BASED DATA MINING TOOL

Numerous data mining tools and applications are available on the market, and most of them require a professional data mining background and practice. Because of these requirements, packaged data mining solutions are used only by data mining experts. Moreover, most commercial data mining solutions are not web-based. Furthermore, producing reports in many data mining packages still requires exhaustive and time-consuming processes. To cope with these difficulties, ASMiner targets knowledge workers and helps them become data miners; to this end, it presents easy-to-understand, user-friendly and clear user interfaces for exploring mining models created in Microsoft Analysis Services.

Several data mining platforms exist, such as Oracle, MS SQL Server and WEKA. However, Microsoft Analysis Services was chosen for ASMiner because of its low cost and wide use. Analysis Services has been the business intelligence component of Microsoft SQL Server since 2000. For decision trees, Microsoft developed its own algorithm, Microsoft Decision Trees. Like CART and CHAID, this algorithm can handle both categorical and continuous variables. In addition, it supports entropy and Bayesian scores as splitting criteria and, unlike other well-known algorithms, it has no pruning phase. In Analysis Services, as soon as a decision tree model is created, a corresponding dependency network is also formed. For clustering, Analysis Services offers two algorithms, K-Means and EM (Expectation-Maximization), each in scalable and non-scalable versions. For association rule mining, the well-known Apriori algorithm is employed.

ASMiner uses the SQL Server client connectivity interfaces for both OLTP and data mining access. ADOMD.NET and AMO (Analysis Management Objects) are used as the entry points to Analysis Services. ADOMD.NET is mainly used to retrieve mining model metadata, whereas AMO provides management operations on server objects in Analysis Services. Thus, model training/processing operations and model settings can be performed only via AMO. Domain experts can load, create and manage data mining models on Analysis Services using a reduced version of Visual Studio that ships with Microsoft SQL Server.
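The entropy splitting criterion mentioned above can be illustrated with a minimal sketch. It is not the Microsoft Decision Trees implementation, whose internals are not described here; it simply scores candidate splits on categorical attributes by information gain (the classic entropy-based criterion) and picks the best one. The function names and the toy data set are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one categorical attribute."""
    parent = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


def best_split(rows, attributes, target):
    """Return the attribute whose split yields the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data set: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
print(best_split(rows, ["outlook", "windy"], "play"))  # → outlook
```

A real engine would also score continuous attributes (for example by testing threshold splits) and, in the Bayesian case, use a different score; the sketch covers only the categorical, entropy-based case.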
As soon as a domain expert creates a data mining model in Analysis Services, the model is saved together with its metadata, which can be retrieved through ADOMD.NET. Working with AMO and ADOMD.NET, ASMiner accesses the metadata of data mining models and composes the viewers that users request.

Figure 1. Modules and sub-components chart of ASMiner

ASMiner consists of five main modules: an authentication mechanism, a decision tree subsystem, a clustering subsystem, an association rules subsystem and management tools (Fig. 1). The authentication subsystem authorizes every request and validates whether the user has access rights to the requested page and operation. The decision tree, clustering and association rules subsystems each have their own mining model viewers. Some of these viewers use third-party open source charting and visualization components, while others were developed in this study. ASMiner also includes a management tool developed for various purposes. These characteristics of ASMiner are explained in the following paragraphs.

The decision tree module of ASMiner contains three types of tree viewer: a general tree viewer, a discrete tree viewer and a radial tree viewer. The general tree viewer can draw both regression trees and discrete decision trees. To increase speed and interactivity, client-side JavaScript is used to draw the trees, and Walker's tree drawing algorithm is employed to produce clear and aesthetically pleasing layouts. Users can navigate a tree by expanding or collapsing nodes with the buttons on each node. In addition, Visifire (Visifire, 2008) charting components are employed for the node histogram display. Another important feature of the general tree viewer (Fig. 2) is drill-through support; drill-through data can be saved in CSV or Excel format.

Figure 2. General tree viewer of ASMiner

The discrete tree viewer has special properties designed for discrete decision trees. Additionally, a radial tree viewer was implemented experimentally to let users view the tree structure from a different perspective.

Dependency network graphs are produced for exploring correlations. ASMiner has two types of dependency network viewer; with both, users can navigate the overall graph and explore its content. A sample dependency network graph displayed with ZGRViewer is presented in Fig. 3. The other dependency network viewer is based on Flash technology and is intended to serve users who do not have the Java Runtime installed. This viewer can also highlight the nearest neighbors of selected nodes, in addition to providing zooming, rotation and unique node coloring.

Figure 3. The ZGRViewer-powered dependency graph

To support decision-tree-based decision making and prediction, ASMiner provides a web-based online prediction tool. In fact, decision-tree-based prediction is no more than hopping from one decision node to the next along the appropriate branches. At the last step of this recursive process, the value of the target variable (attribute) is determined or, in the worst case, a distribution table is returned. For regression trees, the value of the target variable is calculated with the regression formula of the node reached. Analysis Services supports two types of prediction query, batch and singleton; however, ASMiner supports only singleton queries.
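The node-hopping prediction described above can be sketched as follows. This is a minimal illustration, not ASMiner's or Analysis Services' code: the `Node` structure, its fields and the sample tree are hypothetical. Internal nodes test one attribute, leaves of a discrete tree hold a class distribution, and leaves of a regression tree hold a linear formula (an intercept plus one coefficient per continuous input).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Internal node: the attribute to test and one child per attribute value.
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Discrete leaf: class label -> number of training cases.
    distribution: Optional[Dict[str, int]] = None
    # Regression leaf: intercept plus a coefficient per continuous input.
    intercept: float = 0.0
    coefficients: Optional[Dict[str, float]] = None


def predict(node, case):
    """Hop from the root to a leaf, following the branch that matches the case.

    Assumes every attribute value in the case has a matching branch;
    an unseen value raises KeyError.
    """
    while node.attribute is not None:
        node = node.children[case[node.attribute]]
    if node.coefficients is not None:
        # Regression tree: evaluate the linear formula of the node reached.
        return node.intercept + sum(w * case[k] for k, w in node.coefficients.items())
    # Discrete tree: return the most frequent class and the full distribution table.
    best = max(node.distribution, key=node.distribution.get)
    return best, node.distribution


# Hypothetical discrete tree with two levels.
tree = Node(attribute="income", children={
    "high": Node(distribution={"buy": 8, "no": 2}),
    "low": Node(attribute="age", children={
        "young": Node(distribution={"buy": 1, "no": 5}),
        "old": Node(distribution={"buy": 4, "no": 3}),
    }),
})
print(predict(tree, {"income": "low", "age": "old"}))  # → ('buy', {'buy': 4, 'no': 3})
```

Returning the distribution alongside the winning class mirrors the "distribution table at worst case" behavior described in the text: the caller can show the table when the top class is not clearly dominant.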

Because online prediction is an iterative process, it can be repeated many times to reach the best decision. Fig. 4 shows the stages of the ASMiner web-based prediction tool. In the first and second stages, the decision maker selects the predictable variable(s) and attaches to each the required member of the predict function family. The plain Predict() function returns the value of the target variable, whereas PredictSupport() returns the support of the predicted value. In the third stage, the decision maker enters the case to be predicted, that is, the values of the input variables. In the last stage, the results are produced on the fly and evaluated by the decision maker. If needed, the predictable or input variables can be changed in different combinations, and the whole scenario is repeated until the decision maker is satisfied.

(1) Select a predictable variable
(2) Attach a suitable predict function
(3) Input the independent variables of the case
(4) Get the prediction results in a table

Figure 4. The stages in web-based prediction of ASMiner

The ASMiner clustering subsystem focuses on describing and presenting the discovered clusters from different points of view. Most of the viewers implemented in the clustering subsystem aim to inform users about the characteristics, statistical differences and distinguishing features of clusters. Furthermore, a distribution-based cluster dominancy exploration method and viewer (Fig. 5) were developed experimentally to show which clusters are highly dominant or recessive at the intersections of values of two discrete variables in a two-dimensional space. ASMiner implements six types of clustering viewer: value distribution, cluster distribution, general cluster profiles, specific cluster characteristics, cluster comparison and, lastly, cluster neighborhood + distribution viewers.
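The text does not give the formula behind the cluster dominancy viewer, so the following is only one plausible interpretation, written as a hypothetical sketch: for every combination of values of two discrete variables, it counts how many cases fall into each cluster and reports the most frequent (dominant) and least frequent (recessive) cluster in that cell, with its share of the cell. The function name, the `cluster_key` parameter and the record layout are assumptions.

```python
from collections import Counter, defaultdict


def dominancy_grid(cases, var_x, var_y, cluster_key="cluster"):
    """Map each (x value, y value) cell to its dominant and recessive cluster.

    Hypothetical interpretation: dominance = highest share of the cell's cases,
    recessiveness = lowest share among the clusters that occur in the cell
    (clusters absent from a cell are not considered).
    """
    cells = defaultdict(Counter)
    for case in cases:
        cells[(case[var_x], case[var_y])][case[cluster_key]] += 1
    grid = {}
    for cell, counts in cells.items():
        total = sum(counts.values())
        ranked = counts.most_common()  # (cluster, count) pairs, highest count first
        dominant, d_count = ranked[0]
        recessive, r_count = ranked[-1]
        grid[cell] = {
            "dominant": (dominant, d_count / total),
            "recessive": (recessive, r_count / total),
        }
    return grid


# Hypothetical cases already labeled with their cluster membership.
cases = [
    {"gender": "F", "region": "north", "cluster": "C1"},
    {"gender": "F", "region": "north", "cluster": "C1"},
    {"gender": "F", "region": "north", "cluster": "C2"},
    {"gender": "M", "region": "south", "cluster": "C3"},
]
grid = dominancy_grid(cases, "gender", "region")
print(grid[("F", "north")]["dominant"][0])  # → C1
```

A viewer built on such a grid could color each cell by its dominant cluster and shade it by the dominant share, which gives the two-dimensional picture the text describes.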
Moreover, ASMiner presents two new viewers for value-variable distribution and cluster distribution, which give decision makers statistical insight into the clusters. In the cluster properties viewer, the properties of a selected cluster are listed in decreasing order of support.

Figure 5. Cluster dominancy distribution viewer

Association rule mining is one of the most important data mining tasks. For this reason, an association rules module was implemented in ASMiner with three viewers: an itemsets viewer, a rules viewer and a rules dependency network viewer. The rules viewer of ASMiner is comprehensive. Unlike other implementations of the Apriori algorithm, Analysis Services relies on the Importance (namely lift) score to measure the usefulness of a rule (Maclennan et al., 2008). The rule importance score ranges from -1 to 1. Because of these potential advantages, ASMiner focuses on ways of filtering and saving the important rules that decision makers need to report. Therefore, the rules viewer is equipped with minimum importance, minimum confidence and textual search controls.

A flexible, web-based management system was developed for the administrators of ASMiner. With this tool, users, roles, active mining models and the relationships among them can be managed. Through the Analysis Services AMO programming interfaces, model information can be retrieved and updates are reflected directly. Additionally, the system administrator can specifically allow or forbid selected users to explore selected models. Finally, even a user who has the right to access a model can be prevented from training it or making predictions with it.

Until now, data-mining-oriented decision-making processes have taken too much time because of the barriers between decision makers and data mining experts. In addition, each change made while generating reports requires changes to the data mining models, which results in new loops between decision makers and system administrators. This limitation can be reduced by letting decision makers perform these operations themselves. It also shows that current data mining software packages assume their target users to be data mining experts, and this assumption is the fundamental barrier to the widespread use of data mining. For example, researchers from different disciplines, officers of banks and insurance companies, marketing managers and other decision makers can use ASMiner easily. Decision-tree-based risk assessment of all incoming requests can be carried out by these people without the need for a data mining expert.

3. CONCLUSION

In this study, a web-based DSS named ASMiner was developed. It was designed and implemented to take full advantage of the latest Internet and DSS technologies.
During the design stage, some viewers were inspired by the original Analysis Services viewers. For this reason, ASMiner can be regarded as a web-based version of Analysis Services. Although Analysis Services offers some features for connecting to it over HTTP, ASMiner provides a pure three-tiered web-based data mining platform. In addition, AJAX-based techniques and controls were used to enhance performance and user interaction. Owing to these characteristics, ASMiner has some advantages over other reporting and exploration tools used in practice.

As future work, extending the management capabilities for data mining models and enhancing the system for administrative use are planned. Another feature planned for the future is support for batch queries against live data sources. Furthermore, web-based naïve Bayes and sequence clustering model viewers are important milestones in the ASMiner development roadmap. With these additions, ASMiner would become a fully automated and more comprehensive web-based DSS for both decision makers and data mining experts.

REFERENCES

Ba, S. and Kalakota, A. B., 1995. Executable Documents DSS. Proc. 3rd International Conference on DSS. Hong Kong.
Bhargava, H.K. and Power, D.J., 2001. Decision Support Systems and Web Technologies. AMCIS 2001 Proceedings.
Heinrichs, J.H. and Him, J., 2003. Integrating Web Based Data Mining Tools with Business Models for Knowledge Management. Decision Support Systems, Vol. 35, No. 1, pp. 103-112.
Maclennan, J. et al., 2008. Data Mining with SQL Server 2008. Wiley, Indianapolis, USA.
Shim, J.P. et al., 2002. Past, Present, and Future of Decision Support Technology. Decision Support Systems, Vol. 4, No. 2, pp. 111-126.
Visifire, 2009. Available: http://www.visifire.com