A Web-based Interactive Data Visualization System for Outlier Subspace Analysis


Dong Liu, Qigang Gao
Computer Science, Dalhousie University, Halifax, NS, B3H 1W5, Canada

Hai Wang
Sobey School of Business, Saint Mary's University, Halifax, NS, B3H 3C3, Canada

Ji Zhang
Mathematics & Computing, University of Southern Queensland, Toowoomba, QLD, 4350, Australia

Abstract

Detecting outliers in high-dimensional data is a challenging task because outliers mainly reside in various low-dimensional subspaces of the data. To tackle this challenge, outlier detection approaches based on subspace analysis have been proposed recently. Detecting the outlying subspaces in which a given data point is an outlier gives a better characterization of outliers in high-dimensional data streams and makes outlier mining on large high-dimensional data sets more manageable. In this paper, to support outlier subspace analysis from the perspective of human perception, and thereby the development of efficient solutions for high-dimensional data, we propose a web-based interactive data visualization system that displays various low-dimensional outlier subspaces so that users can observe and analyze the distributions of outliers. The proposed visualization tool helps developers of outlier detection applications directly examine the distributions of outliers in various low-dimensional subspaces to validate their experimental results.

1 Introduction

Outliers in a database or data stream are data objects that are grossly different from or inconsistent with the rest of the data, and they reflect abnormal behaviours in the real world. Outliers may stand for toxin spills in chemical sensor data, network intrusions in network log data, cancers in medical data, or simply errors or noise caused by human mistakes or sensor damage [11, 12, 13].
Outliers should be treated differently in different situations: outliers that are errors or noise should be removed, whereas outliers that are intrusions or cancers are targets and should be detected for analysis and event prevention. In other situations, outliers must be detected and classified properly.

Traditional outlier detection methods were mainly designed to analyze the full dimensionality of the data. They work well for low-dimensional data sets. However, more and more real applications now involve high-dimensional data. Detecting outliers in high-dimensional data is a challenging task because traditional methods become infeasible due to the curse of dimensionality: outliers hidden in low-dimensional subsets of the data disappear as the dimensionality increases when full-dimensionality analysis methods are used [2]. The new strategy for high-dimensional data is to detect outliers in lower-dimensional subspaces of the data, as introduced in [1]. The idea is to convert the problem of outlier detection in the high-dimensional data space into the problem of detecting low-dimensional outlying subspaces, since an exhaustive search of all subspaces of a high-dimensional data space is not tractable.

In this paper, we propose a data visualization system to facilitate analysis and solution development for finding projected outlier subspaces, and to help developers and users gain insight by observing the data distributions in various low-dimensional outlier subspaces of the data.

Visualization has proved to be a useful tool for data analysis. With the development of computer hardware and software, visualization techniques can use computer graphics to create visual images that aid the understanding of complex, often massive representations of data. A number of visualization tools are available, such as SequoiaView [3], GGobi [6], OpenViz [7], VisuMap [8] and ADVIZOR [9]. Some tools, such as Manyeyes [4] and Drillet [5], are web-based so that broad user groups can access them conveniently. However, there is no data visualization system for directly analyzing projected outlier subspaces. In this paper, we present a visualization system for outlier subspace analysis whose features and interface tools are specially designed to help users observe and explore large volumes of high-dimensional data and gain insight into outlier detection on such complex data sets.

2 System Design and Implementation

The proposed visualization system is designed to support outlier analysis on high-dimensional data, where human perception can play a role in gaining insight into outlier subspaces. It is based on the concept of the Stream Projected Outlier Detector (SPOT) [1]. In the SPOT system, the problem of detecting projected outliers from high-dimensional data streams is formulated as follows. Given a data stream $D$ with a potentially unbounded number of $\phi$-dimensional data points, each data point $p_i = \{p_{i1}, p_{i2}, \ldots, p_{i\phi}\}$ in $D$ is labeled as either a projected outlier or a regular data point. If $p_i$ is a projected outlier, its associated outlying subspace(s) are given as well. The result is a set of projected outliers together with their associated outlying subspace(s), which indicate the context in which these projected outliers exist.
The results, denoted by $A$, can be formally expressed as $A = \{\langle o, S \rangle \mid o \in O \text{ and } S \text{ is the outlying subspace set of } o\}$, where $O$ denotes the set of outliers detected.

The visualization system aims to help users examine the detected outlying subspaces of a high-dimensional data set. Users can adjust the parameters of the outlier detection algorithms and visualize the intermediate detection results. A set of visualization tools is designed to support human exploration in projected outlier subspace analysis.

2.1 System Architecture

The architecture of the visualization system is illustrated in Figure 1. The data to be displayed can include both the original high-dimensional data set and the outlier detection results, after data pre-processing, which includes the standard steps of data cleaning, data transformation and data normalization. Data cleaning removes incorrect records from the dataset. Data transformation corrects inconsistent data formats and converts continuous attribute values into a finite set of intervals with minimal loss of information. Data normalization finds the minimum and maximum value of each dimension and rescales the values to lie between 0 and 1.

Figure 1 System Architecture

For the prepared high-dimensional data, one data point may be considered an outlier in many subspaces, so the outlier detection result may be very large. To handle large outlier detection results, the system uses a database to store the datasets and the information about outlying subspaces. After the data preparation stage, the datasets and the outliers are stored in two tables in the database. The database server can then quickly retrieve the selected data and feed it into the visualization system for display.

With the prepared data sets, the user accesses the system over the internet with a web browser. The system allows the user to select different subspaces and views to display. According to the user's subspace selection, the system connects to the database server with JDBC and sends queries to it. The retrieved data and outlier information for the selected subspaces are transmitted to the client machine over the internet and displayed in the user's web browser. The database and web application services are on the server side. On the client side, the user can access the web services and visualize data and outliers for the selected subspaces from the web browser. The system also allows the user to visualize different datasets by reading a data file, specified by name, from the user's local machine.

The system is implemented in Java. The client machine needs J2SE 5 and Java 3D 1.5 or higher to run the system.

2.2 Synthetic Datasets

In the experiments, both synthetic and real data sets are used. The synthetic data is generated randomly by a high-dimensional data generator used in the SPOT research [1]. The nature of the data is close to real-life data: it exhibits different data characteristics in projections onto different subsets of features. It consists of 15 attributes and 10,000 lines of data. The outlier detection result produced directly by the SPOT method [1] consists of 426,513 outliers in one-dimensional to three-dimensional subspaces.
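As an aside on the pre-processing described in Section 2.1, the normalization step applied to each attribute of such a dataset can be sketched as follows. This is a minimal illustration and not the authors' code: the class and method names are hypothetical, the formula (v - min) / (max - min) is the usual min-max rescaling that matches the description "find the minimum and maximum and convert to between 0 and 1", and mapping a constant column to 0.0 is an assumption of this sketch.

```java
import java.util.Arrays;

/** Minimal sketch of per-dimension min-max normalization (names are illustrative). */
final class MinMaxNormalizer {

    private MinMaxNormalizer() {}

    /**
     * Rescales every column of data to the range [0, 1].
     * Rows are data points and columns are dimensions. A constant column
     * (max == min) is mapped to 0.0, which is an assumption of this sketch.
     */
    static double[][] normalize(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int dims = data[0].length;
        double[] min = new double[dims];
        double[] max = new double[dims];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the minimum and maximum of each dimension.
        for (double[] row : data) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], row[d]);
                max[d] = Math.max(max[d], row[d]);
            }
        }

        // Second pass: rescale each value with (v - min) / (max - min).
        double[][] out = new double[data.length][dims];
        for (int i = 0; i < data.length; i++) {
            for (int d = 0; d < dims; d++) {
                double range = max[d] - min[d];
                out[i][d] = range == 0.0 ? 0.0 : (data[i][d] - min[d]) / range;
            }
        }
        return out;
    }
}
```

Calling `MinMaxNormalizer.normalize` on a 10,000 x 15 array would return an array of the same shape in which every attribute spans [0, 1], so that all dimensions share one scale when plotted.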
Below is a sample of the first two detected outlying subspaces in the result file.

    Outlierness Threshold: 3
    *****************************************
    Top outlier: data #1
    In subspace: 11
    Cell index: 1
    Outlier-ness:
    Top outlier: data #2
    In subspace: 1 6
    Cell index: 15 6
    Outlier-ness:

Since the outlier detection result contains only one-, two- and three-dimensional outlying subspaces, the corresponding data table and outlier table are created in the database. The detailed schema of the data table is illustrated in Table 1. The detailed schema of the outlier table is given in Table 2. The attributes of each outlying subspace are stored in ascending order. For one-dimensional outlying subspaces, the values of dimension2 and dimension3 are NULL. Similarly, for two-dimensional outlying subspaces, dimension3 is NULL. For three-dimensional outlying subspaces, none of the dimension values is NULL.

    Field        Type      Description
    linenumber   int(11)   Primary Key. Row number of data.
    valume1      double    Attribute 1
    valum2       double    Attribute 2
    ...          ...       ...
    valume15     double    Attribute 15

Table 1 Schema of Data in Database

    Field        Type      Description
    id           int(11)   Primary Key; identifies each outlying subspace.
    linenumber   int(11)   Row number of data.
    dimension1   int(11)   Attribute 1 of outlying subspace.
    dimension2   int(11)   Attribute 2 of outlying subspace.
    dimension3   int(11)   Attribute 3 of outlying subspace.
    outlierness  double    Outlierness of outlier.

Table 2 Schema of Outlier Information in Database

2.3 Real-life Datasets

The experiments also include a real-life data set, the KDD Cup 1999 data [10], which is a network connection log data set from MIT Lincoln Lab. It contains details of the connections in the network, such as protocol type, duration, service used and much related information. We use the first 5000 lines of the corrected, labelled data for our visualization. In the pre-processing stage, we separate the label information from the dataset into a separate file. The label names are transformed into numbers: each type of network intrusion is mapped to one number. There are four types of network intrusion (shown in Table 3) labelled in the first 5000 lines of data. We use the number of the outlier type as the outlierness value. In this way, we can visualize the distribution of the different kinds of network intrusion.

Table 3 Label Mapping

3 Experiments and System Demonstration

The experiments are developed based on sample cases for both the synthetic datasets and the KDD 1999 network log data. The visualization system can help answer questions about outlier detection. For example:

1. In a two-dimensional subspace of the synthetic datasets, find out whether a selected outlier data point is also an outlier in other two-dimensional subspaces.
2. What is the distribution of smurf network attacks in the KDD 1999 data?

Case 1: In a two-dimensional subspace, find out whether a selected outlier data point is also considered an outlier in other two-dimensional subspaces.

To answer this question, we visualize four two-dimensional subspaces (shown in Figure 2): (Dim 4, Dim 6), (Dim 3, Dim 6), (Dim 12, Dim 10) and (Dim 2, Dim 4). We click one outlier (index #174) in subspace (Dim 4, Dim 6) and then click the Concurrent button in the other two-dimensional subspace display windows. We can easily observe that the outlier data point (index #174) in (Dim 4, Dim 6) is also considered an outlier in (Dim 3, Dim 6) and (Dim 2, Dim 4). Moreover, we may change the outlierness threshold by moving the slide bar in these two windows. In this way we can obtain the outlierness value of the data point (index #174) in both (Dim 3, Dim 6) and (Dim 2, Dim 4).

Case 2: Visualize the distribution of smurf network attacks in the KDD 1999 data.

An example of visualizing the distribution of outliers in a three-dimensional subspace is shown in Figure 3. We find that the smurf network attacks are mainly located close together in the marked area of the selected three-dimensional subspace. Figure 4 is an example of the concurrent display of two-dimensional subspaces. The system reports that the outlier selected in the subspace in the left window is also marked as an outlier in the other subspace in the right window.

Figure 2 Case 1: Two-Dimensional Subspaces Concurrent Display

Figure 3 Case 2: 3D Display

Figure 4 Case 2: 2D Concurrent Display
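The check behind the Concurrent display in Case 1 can be illustrated with a small in-memory sketch. It is not the system's actual implementation: the classes `OutlierRow` and `ConcurrentOutlierCheck` are hypothetical, the rows merely mirror the fields of the outlier table (Table 2), the rule "outlierness greater than or equal to the threshold" is an assumed reading of the outlierness threshold slider, and the outlierness values in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** One row shaped like the outlier table (Table 2); unused dimensions are null. */
final class OutlierRow {
    final int lineNumber;
    final Integer dimension1;
    final Integer dimension2;
    final Integer dimension3;
    final double outlierness;

    OutlierRow(int lineNumber, Integer dimension1, Integer dimension2,
               Integer dimension3, double outlierness) {
        this.lineNumber = lineNumber;
        this.dimension1 = dimension1;
        this.dimension2 = dimension2;
        this.dimension3 = dimension3;
        this.outlierness = outlierness;
    }

    /** Two-dimensional subspaces have dimension1 and dimension2 set and dimension3 null. */
    boolean isTwoDimensional() {
        return dimension1 != null && dimension2 != null && dimension3 == null;
    }
}

final class ConcurrentOutlierCheck {

    private ConcurrentOutlierCheck() {}

    /**
     * Returns the two-dimensional subspaces, other than (selDim1, selDim2), in which
     * the data point lineNumber is recorded as an outlier with outlierness at or
     * above the threshold (the threshold rule is an assumption of this sketch).
     */
    static List<int[]> otherTwoDimSubspaces(List<OutlierRow> rows, int lineNumber,
                                            int selDim1, int selDim2, double threshold) {
        List<int[]> found = new ArrayList<int[]>();
        for (OutlierRow row : rows) {
            if (row.lineNumber != lineNumber || !row.isTwoDimensional()) {
                continue; // different data point, or not a 2-D subspace
            }
            if (row.outlierness < threshold) {
                continue; // filtered out by the outlierness threshold
            }
            boolean isSelected = row.dimension1 == selDim1 && row.dimension2 == selDim2;
            if (!isSelected) {
                found.add(new int[] {row.dimension1, row.dimension2});
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative rows only; dimensions are stored in ascending order.
        List<OutlierRow> rows = Arrays.asList(
                new OutlierRow(174, 4, 6, null, 5.0),
                new OutlierRow(174, 3, 6, null, 4.0),
                new OutlierRow(174, 2, 4, null, 4.0),
                new OutlierRow(174, 10, 12, null, 1.0),
                new OutlierRow(200, 3, 6, null, 9.0));
        for (int[] s : otherTwoDimSubspaces(rows, 174, 4, 6, 3.0)) {
            System.out.println("Also an outlier in (Dim " + s[0] + ", Dim " + s[1] + ")");
        }
    }
}
```

In a deployment like the one described in Section 2.1, the same filter would more likely be expressed as a query against the outlier table over JDBC; the in-memory form is used here only so that the logic can be read and run on its own.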

4 Conclusion and Future Work

The proposed web-based visualization system helps users observe subspaces of high-dimensional datasets interactively. The system enables the user to evaluate the performance of an outlier detection algorithm by visually verifying the correctness of the results and determining proper parameter values for better outlier detection results. By visualizing datasets and their labelled results, the user can gain visual insight into the nature of the data distribution and the outlier distribution. The system is also useful for comparing the effectiveness of different algorithms. The user may also adjust the values of different parameters of the algorithms to compare the resulting changes in performance.

The system can currently visualize datasets and their labelled outlier information. It interacts with the user and helps to explore the datasets and outlier subspaces. In future work, we may extend the system to allow users to label outliers directly in selected subspaces. Users may also manually adjust the outlierness value of selected outlier data points to observe the sensitivity of the data. Moreover, the system may be integrated with different outlier detection algorithms such as the SPOT algorithm in [1].

References

[1] J. Zhang, Q. Gao and H. Wang. SPOT: A System for Detecting Projected Outliers from High-dimensional Data Streams. IEEE 24th International Conference on Data Engineering (ICDE 08), Cancun, Mexico.
[2] R. Bellman. Adaptive Control Processes: A Guided Tour. Princeton University Press.
[3] Data Visualization system: SequoiaView.
[4] Data Visualization system: Manyeyes.
[5] Drillet Visual Tool for interactive data analysis.
[6] Data Visualization system: GGobi.
[7] Data Visualization system: OpenViz.
[8] Data Visualization system: VisuMap.
[9] Data Visualization system: ADVIZOR.
[10] KDD data source.
[11] E. Aleskerov, B. Freisleben and B. Rao. Cardwatch: A Neural Network Based Database Mining System for Credit Card Fraud Detection. Computational Intelligence for Financial Engineering (CIFEr).
[12] J. F. Costa. Reducing the Impact of Outliers in Ore Reserves Estimation. Mathematical Geology, 35(3).
[13] J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann Publishers.

Business Lead Generation for Online Real Estate Services: A Case Study

Business Lead Generation for Online Real Estate Services: A Case Study Business Lead Generation for Online Real Estate Services: A Case Study Md. Abdur Rahman, Xinghui Zhao, Maria Gabriella Mosquera, Qigang Gao and Vlado Keselj Faculty Of Computer Science Dalhousie University

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Application of Data Mining Techniques in Intrusion Detection

Application of Data Mining Techniques in Intrusion Detection Application of Data Mining Techniques in Intrusion Detection LI Min An Yang Institute of Technology leiminxuan@sohu.com Abstract: The article introduced the importance of intrusion detection, as well as

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Specific Usage of Visual Data Analysis Techniques

Specific Usage of Visual Data Analysis Techniques Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Application Tool for Experiments on SQL Server 2005 Transactions

Application Tool for Experiments on SQL Server 2005 Transactions Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 2006 30 Application Tool for Experiments on SQL Server 2005 Transactions ŞERBAN

More information

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns 20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns John Aogon and Patrick J. Ogao Telecommunications operators in developing countries are faced with a problem of knowing

More information

KEITH LEHNERT AND ERIC FRIEDRICH

KEITH LEHNERT AND ERIC FRIEDRICH MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They

More information

A Review on Network Intrusion Detection System Using Open Source Snort

A Review on Network Intrusion Detection System Using Open Source Snort , pp.61-70 http://dx.doi.org/10.14257/ijdta.2016.9.4.05 A Review on Network Intrusion Detection System Using Open Source Snort Sakshi Sharma and Manish Dixit Department of CSE& IT MITS Gwalior, India Sharmasakshi1009@gmail.com,

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Preprocessing Web Logs for Web Intrusion Detection

Preprocessing Web Logs for Web Intrusion Detection Preprocessing Web Logs for Web Intrusion Detection Priyanka V. Patil. M.E. Scholar Department of computer Engineering R.C.Patil Institute of Technology, Shirpur, India Dharmaraj Patil. Department of Computer

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS

A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS A FRAMEWORK FOR MANAGING RUNTIME ENVIRONMENT OF JAVA APPLICATIONS Abstract T.VENGATTARAMAN * Department of Computer Science, Pondicherry University, Puducherry, India. A.RAMALINGAM Department of MCA, Sri

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

Detecting false users in Online Rating system & Securing Reputation

Detecting false users in Online Rating system & Securing Reputation Detecting false users in Online Rating system & Securing Reputation ABSTRACT: With the rapid development of reputation systems in various online social networks, manipulations against such systems are

More information

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT?

WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT? WHAT MOTIVATED DATA MINING? WHY IS IT IMPORTANT? Data mining is mainly used for decision making in business. The abundance of data, coupled with the need for powerful data analysis tools, has been described

More information

WVSOM. BDMS webxtender Tutorial

WVSOM. BDMS webxtender Tutorial WVSOM BDMS webxtender Tutorial Chris Hollandsworth 2/16/2012 Introduction The Banner Document Management System (BDMS) is a software application that interfaces with the Banner database system at WVSOM.

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Arti Tyagi Sunita Choudhary

Arti Tyagi Sunita Choudhary Volume 5, Issue 3, March 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Web Usage Mining

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

COURSE DESCRIPTION. Queries in Microsoft Access. This course is designed for users with a to create queries in Microsoft Access.

COURSE DESCRIPTION. Queries in Microsoft Access. This course is designed for users with a to create queries in Microsoft Access. COURSE DESCRIPTION Course Name Queries in Microsoft Access Audience need This course is designed for users with a to create queries in Microsoft Access. Prerequisites * Keyboard and mouse skills * An understanding

More information

AN EFFICIENT PREPROCESSING AND POSTPROCESSING TECHNIQUES IN DATA MINING

AN EFFICIENT PREPROCESSING AND POSTPROCESSING TECHNIQUES IN DATA MINING INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 AN EFFICIENT PREPROCESSING AND POSTPROCESSING TECHNIQUES IN DATA MINING R.Tamilselvi 1, B.Sivasakthi 2, R.Kavitha

More information

Prediction of Heart Disease Using Naïve Bayes Algorithm

Prediction of Heart Disease Using Naïve Bayes Algorithm Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,

More information

The Data Quality Continuum*

The Data Quality Continuum* The Data Quality Continuum* * Adapted from the KDD04 tutorial by Theodore Johnson e Tamraparni Dasu, AT&T Labs Research The Data Quality Continuum Data and information is not static, it flows in a data

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

Performance Improvement of Web Server through Log files cleaning

Performance Improvement of Web Server through Log files cleaning International Journal of Latest Trends in Engineering and Technology Vol.(7)Issue(3), pp. 211-216 DOI: http://dx.doi.org/10.21172/1.73.030 e-issn:2278-621x Performance Improvement of Web Server through

More information

COURSE RECOMMENDER SYSTEM IN E-LEARNING

COURSE RECOMMENDER SYSTEM IN E-LEARNING International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

Mario Guarracino. Data warehousing

Mario Guarracino. Data warehousing Data warehousing Introduction Since the mid-nineties, it became clear that the databases for analysis and business intelligence need to be separate from operational. In this lecture we will review the

More information

Report on the Train Ticketing System

Report on the Train Ticketing System Report on the Train Ticketing System Author: Zaobo He, Bing Jiang, Zhuojun Duan 1.Introduction... 2 1.1 Intentions... 2 1.2 Background... 2 2. Overview of the Tasks... 3 2.1 Modules of the system... 3

More information

ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search

ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Project for Michael Pitts Course TCSS 702A University of Washington Tacoma Institute of Technology ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Under supervision of : Dr. Senjuti

More information

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier

Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,

More information

Data Mining Governance for Service Oriented Architecture

Data Mining Governance for Service Oriented Architecture Data Mining Governance for Service Oriented Architecture Ali Beklen Software Group IBM Turkey Istanbul, TURKEY alibek@tr.ibm.com Turgay Tugay Bilgin Dept. of Computer Engineering Maltepe University Istanbul,

More information

1 File Processing Systems

1 File Processing Systems COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.

More information
