# Action Reducts


## Transcription

Action Reducts

Seunghyun Im (1), Zbigniew Ras (2, 3), Li-Shiang Tsay (4)

(1) Computer Science Department, University of Pittsburgh at Johnstown, Johnstown, PA 15904, USA; (2) Computer Science Department, University of North Carolina, Charlotte, NC 28223, USA; (3) Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland; (4) ECIT Department, NC A&T State University, Greensboro, NC 27411, USA

Abstract. An action is defined as controlling or changing some of the attribute values in an information system to achieve a desired result. An action reduct is a minimal set of attribute values distinguishing a favorable object from other objects. We use action reducts to formulate necessary actions. The action suggested by an action reduct induces changes of decision attribute values by changing the condition attribute values to the distinct patterns in action reducts.

Keywords: Reduct, Action Reduct, Prime Implicant, Rough Set.

### 1 Introduction

Suppose that the customers of a bank can be classified into several groups according to their satisfaction levels, such as satisfied, neutral, or unsatisfied. One thing the bank can do to improve its business is to find a way to make the customers more satisfied, so that they continue to do business with the bank. The algorithm described in this paper tries to solve this kind of problem using existing data.

Assume that a bank maintains a database of customer information in a table. The table has a number of columns describing the characteristics of the customers, such as personal information, account data, and survey results. We divide the customers into two groups based on the satisfaction level (the decision value). The first group consists of satisfied customers who will most likely keep their accounts active for an extended period of time. The second group consists of neutral or unsatisfied customers. We find a set of distinct values, or unique patterns, in the first group that does not exist in the second group. The bank can use these unique characteristics of the satisfied customers to improve the satisfaction of the customers in the second group. Clearly, some of the attribute values describing the customers can be controlled or changed, which is defined as an action.

In this paper, we propose the concept of an action reduct to formulate necessary actions. An action reduct has the following properties: (1) it is obtained from objects having favorable decision values; (2) it is a distinct set of values not found in the other group, that is, the group not having favorable decision values; (3) it is the minimal set of differences between the separate groups of objects. The minimal set is an advantage when formulating actions because smaller changes are easier to undertake.

The rest of this paper is organized as follows. Section 2 describes the algorithm. Implementation and experimental results are shown in Section 3. Related work is presented in Section 4. Section 5 concludes the paper.

Table 1. Information system S. The decision attribute is D. The condition attributes are B, C, and E. There are eight objects, referred to as $x_1$ to $x_8$. Blank cells are missing values.

| | B | C | E | D |
|---|---|---|---|---|
| $x_1$ | $b_2$ | $c_1$ | $e_1$ | $d_2$ |
| $x_2$ | $b_1$ | $c_3$ | $e_2$ | $d_2$ |
| $x_3$ | $b_1$ | $c_1$ | | $d_2$ |
| $x_4$ | $b_1$ | $c_3$ | $e_1$ | $d_2$ |
| $x_5$ | $b_1$ | $c_1$ | $e_1$ | $d_1$ |
| $x_6$ | $b_1$ | $c_1$ | $e_1$ | $d_1$ |
| $x_7$, $x_8$ | $b_2$, $b_1$ | $c_2$ | $e_2$, $e_2$ | $d_1$, $d_1$ |

### 2 Algorithm

#### 2.1 Notations

We will use the following notations throughout the paper. By an information system [2] we mean a triple $S = (X, A, V)$, where $X = \{x_1, x_2, \ldots, x_i\}$ is a finite set of objects, $A = \{a_1, a_2, \ldots, a_j\}$ is a finite set of attributes, defined as partial functions from $X$ into $V$, and $V = \{v_1, v_2, \ldots, v_k\}$ is a finite set of attribute values.

We also assume that $V = \bigcup \{V_a : a \in A\}$, where $V_a$ is the domain of attribute $a$. For instance, in the information system S presented in Table 1, $x_1$ refers to the first row and $B(x_1) = b_2$. There are four attributes, $A = \{B, C, E, D\}$. We classify the attributes into two types: condition and decision. The condition attributes are B, C, and E, and the decision attribute is D. We assume that the set of condition attributes is further partitioned into stable attributes, $A_S$, and flexible attributes, $A_F$. An attribute is called stable if the values assigned to objects do not change over time. Otherwise, the attribute is flexible. Birth date is an example of a stable attribute. Interest rate is a flexible attribute. In Table 1:

$$A_F = \{B, C\}, \qquad A_S = \{E\}, \qquad D = \text{decision attribute}$$

The values in D are divided into two sets:

$$d_\alpha = \{v_i \in V_D ;\ v_i \text{ is a desired decision value}\}, \qquad d_\beta = V_D - d_\alpha$$

For simplicity of presentation, we use an example having only one element, $d_2$, in $d_\alpha$ and one element, $d_1$, in $d_\beta$ ($d_\alpha = \{d_2\}$, $d_\beta = \{d_1\}$). However, the algorithm described in the next section carries over directly to the general case where $|d_\alpha| \geq 2$ and $|d_\beta| \geq 2$. The objects are partitioned into two groups based on their decision values:

$$X_\alpha = \{x_i \in X ;\ D(x_i) \in d_\alpha\}, \text{ i.e., objects whose decision values are in } d_\alpha$$

$$X_\beta = \{x_j \in X ;\ D(x_j) \in d_\beta\}, \text{ i.e., objects whose decision values are in } d_\beta$$

In Table 1, $X_\alpha = \{x_1, x_2, x_3, x_4\}$ and $X_\beta = \{x_5, x_6, x_7, x_8\}$.

#### 2.2 Algorithm Description

We want to provide the user with a list of attribute values that can be used to make changes to some of the objects in order to steer an unfavorable decision value to a more favorable one. We use the reduct [2][9] to create that list. By a reduct relative to an object $x$ we mean a minimal set of attribute values distinguishing $x$ from all other objects in the information system. For example, $\{b_2, e_2\}$ is a reduct relative to $x_7$, since $\{b_2, e_2\}$ differentiates $x_7$ from the other objects in S (see Table 1).
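The partition by decision value and the notion of a reduct relative to an object can be illustrated with a small brute-force sketch. This is not the authors' implementation. The dictionary `S`, the names `x_alpha` and `x_beta` for the favorable and unfavorable groups, and the helpers `partition`, `distinguishes`, and `object_reducts` are all hypothetical. The flattened Table 1 does not show which of $x_7$ and $x_8$ carries $c_2$; the sketch assumes that $x_7$ has no C value and $x_8$ has $c_2$.

```python
from itertools import combinations

# Table 1 as partial functions: a missing value is simply an absent key.
# ASSUMPTION: x7 has no C value and x8 has C = c2 (the source table is ambiguous).
S = {
    "x1": {"B": "b2", "C": "c1", "E": "e1", "D": "d2"},
    "x2": {"B": "b1", "C": "c3", "E": "e2", "D": "d2"},
    "x3": {"B": "b1", "C": "c1", "D": "d2"},
    "x4": {"B": "b1", "C": "c3", "E": "e1", "D": "d2"},
    "x5": {"B": "b1", "C": "c1", "E": "e1", "D": "d1"},
    "x6": {"B": "b1", "C": "c1", "E": "e1", "D": "d1"},
    "x7": {"B": "b2", "E": "e2", "D": "d1"},
    "x8": {"B": "b1", "C": "c2", "E": "e2", "D": "d1"},
}
CONDITION = ("B", "C", "E")


def partition(system, favorable):
    """Split object names into (favorable group, remaining group) by decision value."""
    x_alpha = [x for x, row in system.items() if row["D"] in favorable]
    x_beta = [x for x, row in system.items() if row["D"] not in favorable]
    return x_alpha, x_beta


def distinguishes(pairs, other_row):
    """True if other_row fails to match at least one (attribute, value) pair."""
    return any(other_row.get(a) != v for a, v in pairs)


def object_reducts(system, x):
    """Minimal sets of x's condition values that distinguish x from every other object."""
    pairs = [(a, system[x][a]) for a in CONDITION if a in system[x]]
    others = [row for y, row in system.items() if y != x]
    found = []
    for size in range(1, len(pairs) + 1):  # smallest candidates first
        for subset in combinations(pairs, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if all(distinguishes(subset, row) for row in others):
                found.append(subset)
    return found


x_alpha, x_beta = partition(S, {"d2"})
print(x_alpha, x_beta)
print(object_reducts(S, "x7"))
```

Under the stated assumption, the only set found for `x7` is the pair of values $b_2$ and $e_2$. Objects `x5` and `x6` are identical, so no set of values can distinguish either of them from all other objects and the function returns an empty list for both.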

Now we extend the concept of a reduct relative to an object to the $\alpha$-reduct. We partitioned the objects in S into two groups by their decision values. Objects in $X_\alpha$ have $d_2$, which is the favorable decision value. Our goal is to identify which sets of condition attribute values describing objects in $X_\alpha$ make them different from the objects in $X_\beta$. Although there are several ways to find distinct condition attribute values (e.g. association rules), reducts have clear advantages: they do not require the user to specify rule extraction criteria, such as support and confidence values, and they generate minimal distinguishing sets. As a result, the algorithm is much easier to use and produces consistent results across different users.

Table 5 shows the $\alpha$-reducts for S. These are the smallest sets of condition attribute values that differ from the condition attribute values representing objects in $X_\beta$. We obtained the first two $\alpha$-reducts relative to $x_1$. These are the prime implicants [1] (see the next section for details) of the differences between $x_1$ and $\{x_5, x_6, x_7, x_8\}$. Subsequent $\alpha$-reducts are extracted using objects $x_2$, $x_3$, and $x_4$.

We need a method to measure the usability of the $\alpha$-reducts. Two factors, frequency and hit ratio, determine the usability.

(1) Frequency: More than one object can have the same $\alpha$-reduct. The frequency of an $\alpha$-reduct in $X_\alpha$ is denoted by $f$.

(2) Hit ratio: The hit ratio, denoted by $h$, is the ratio between the number of applicable objects and the total number of objects in $X_\beta$. An applicable object is an object whose attribute values differ from those in the $\alpha$-reduct, where the differing values are not stable.

An $\alpha$-reduct may not be usable for changing the attribute values of some $x \in X_\beta$, for two reasons. First, some objects in $X_\beta$ do not differ from the $\alpha$-reduct in their attribute values, so no change can be made. Second, we cannot modify stable attribute values, and it does not make sense to suggest a change of unmodifiable values.

We define the following function to measure the usability. The weight of an $\alpha$-reduct $k$ is

$$w_k = \frac{f_k \cdot h_k}{\sum (f \cdot h)},$$

where $f_k$ and $h_k$ are the frequency and hit ratio of $k$, and $\sum (f \cdot h)$ is the sum of $f \cdot h$ over all $\alpha$-reducts. The weight provides a way to prioritize the $\alpha$-reducts using a normalized value.
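The weight formula above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `weights` and the string labels for the reducts are hypothetical, and exact fractions are used so that the normalized values are easy to compare with hand calculations. The frequencies and hit ratios passed in are the ones stated for the example of Section 2.3.

```python
from fractions import Fraction


def weights(stats):
    """Map each reduct label to (f * h) divided by the sum of f * h over all reducts.

    `stats` maps a label to a (frequency, hit_ratio) pair. The sum must be
    non-zero, otherwise the division raises ZeroDivisionError.
    """
    raw = {label: Fraction(f) * Fraction(h) for label, (f, h) in stats.items()}
    total = sum(raw.values())
    return {label: value / total for label, value in raw.items()}


# Frequency and hit ratio of the three reducts in the worked example.
example = {
    "{b2, c1}": (1, 1),
    "{b2, e1}": (1, Fraction(1, 2)),
    "{c3}": (2, 1),
}
w = weights(example)
for label, value in w.items():
    print(label, value, round(float(value), 2))
```

Because the weights are normalized, they always add up to 1, which makes reducts from the same information system directly comparable.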

Table 2. $X_\alpha$. The objects classified as $d_2$. We assume that $d_2$ is the favorable decision.

| | B | C | E | D |
|---|---|---|---|---|
| $x_1$ | $b_2$ | $c_1$ | $e_1$ | $d_2$ |
| $x_2$ | $b_1$ | $c_3$ | $e_2$ | $d_2$ |
| $x_3$ | $b_1$ | $c_1$ | | $d_2$ |
| $x_4$ | $b_1$ | $c_3$ | $e_1$ | $d_2$ |

Table 3. $X_\beta$. The objects classified as $d_1$.

| | B | C | E | D |
|---|---|---|---|---|
| $x_5$ | $b_1$ | $c_1$ | $e_1$ | $d_1$ |
| $x_6$ | $b_1$ | $c_1$ | $e_1$ | $d_1$ |
| $x_7$, $x_8$ | $b_2$, $b_1$ | $c_2$ | $e_2$, $e_2$ | $d_1$, $d_1$ |

#### 2.3 Example

Step 1. Finding $\alpha$-reducts

Using the partitions in Tables 2 and 3, we find the distinct attribute values of each $x \in X_\alpha$ against each $x \in X_\beta$. The following matrix shows the discernible attribute values for $\{x_1, x_2, x_3, x_4\}$ against $\{x_5, x_6, x_7, x_8\}$.

Table 4. Discernible attribute values for $\{x_1, x_2, x_3, x_4\}$ against $\{x_5, x_6, x_7, x_8\}$.

| | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|---|---|---|---|---|
| $x_5$ | $b_2$ | $c_3 + e_2$ | | $c_3$ |
| $x_6$ | $b_2$ | $c_3 + e_2$ | | $c_3$ |
| $x_7$ | $c_1 + e_1$ | $b_1 + c_3$ | $b_1 + c_1$ | $b_1 + c_3 + e_1$ |
| $x_8$ | $b_2 + c_1 + e_1$ | $c_3$ | $c_1$ | $c_3 + e_1$ |

For example, $b_2$ in $x_1$ is different from the B value of $x_5$, so we need $b_2$ to discern $x_1$ from $x_5$. Either $c_1$ or $e_1$ (or is denoted by the + sign) can be used to distinguish $x_1$ from $x_7$. In order to find the minimal set of values that distinguishes $x_1$ from all objects in $X_\beta = \{x_5, x_6, x_7, x_8\}$, we multiply all discernible values: $(b_2) \times (b_2) \times (c_1 + e_1) \times (b_2 + c_1 + e_1)$. That is, ($b_2$) and ($c_1$ or $e_1$) and ($b_2$ or $c_1$ or $e_1$) should be different to make $x_1$ distinct from all other objects. The process is known as finding prime implicants by converting the conjunctive normal form (CNF) to the disjunctive normal form (DNF) [2][9]. The $\alpha$-reduct $r(x_1)$ relative to $x_1$ is computed using the conversion and the absorption laws:

$$r(x_1) = (b_2) \times (b_2) \times (c_1 + e_1) \times (b_2 + c_1 + e_1) = (b_2) \times (b_2) \times (c_1 + e_1) = (b_2 \wedge c_1) + (b_2 \wedge e_1)$$

A missing attribute value of an object in $X_\alpha$ does not qualify to discern the object from the objects in $X_\beta$, because it is undefined. A missing value in $X_\beta$, however, is regarded as a different value if a value is present in $x \in X_\alpha$. When a discernible value does not exist, we do not include it in the calculation of the prime implicant. We acquired the following $\alpha$-reducts:

$$r(x_1) = (b_2 \wedge c_1) + (b_2 \wedge e_1), \qquad r(x_2) = (c_3), \qquad r(x_3) = \text{NIL}, \qquad r(x_4) = (c_3)$$

Step 2. Measuring the usability of $\alpha$-reducts

Table 5 shows the $\alpha$-reducts for information system S. The frequency of the $\alpha$-reduct $\{b_2, c_1\}$ is 1 because it appears in $X_\alpha$ once. Its hit ratio is 4/4 = 1, meaning that we can use the reduct for all objects in $X_\beta$. Its weight is 2/7 (about 0.29), which is obtained by dividing $f \cdot h = 1$ by the sum over all $\alpha$-reducts, $\sum (f \cdot h) = (1 \times 1) + (1 \times 0.5) + (2 \times 1) = 3.5$. The values of the stable attribute E cannot be modified. Therefore, the hit ratio for the $\alpha$-reduct $\{b_2, e_1\}$ is 2/4 = 0.5, because the stable value $e_2$ in $x_7$ and $x_8$ cannot be converted to $e_1$.

Table 5. $\alpha$-reducts for information system S. * indicates a stable attribute value.

| $\alpha$-reduct | Weight (w) | Frequency (f) | Hit Ratio (h) |
|---|---|---|---|
| $\{b_2, c_1\}$ | 2/7 (0.29) | 1 | 1 |
| $\{b_2, e_1^*\}$ | 1/7 (0.14) | 1 | 0.5 |
| $\{c_3\}$ | 4/7 (0.57) | 2 | 1 |

Using the $\alpha$-reduct $\{c_3\}$, which has the highest weight, we can make a recommendation: change the value of C in $X_\beta$ to $c_3$ in order to induce the decision value in $X_\beta$ to change to $d_2$.
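The two steps of the example can be reproduced with a short sketch. It is an illustration under stated assumptions rather than the authors' implementation: all names (`S`, `X_ALPHA`, `X_BETA`, `absorb`, `alpha_reducts`, `hit_ratio`) are hypothetical, the flattened source table is read as $x_7$ having no C value and $x_8$ having $c_2$ (the discernible values come out the same if the two are swapped), and an object is treated as applicable when it differs from the reduct in at least one attribute and every differing attribute is flexible.

```python
from collections import Counter

# Table 1; a missing value is an absent key.
# ASSUMPTION: x7 has no C value and x8 has C = c2 (the source table is ambiguous).
S = {
    "x1": {"B": "b2", "C": "c1", "E": "e1"},
    "x2": {"B": "b1", "C": "c3", "E": "e2"},
    "x3": {"B": "b1", "C": "c1"},
    "x4": {"B": "b1", "C": "c3", "E": "e1"},
    "x5": {"B": "b1", "C": "c1", "E": "e1"},
    "x6": {"B": "b1", "C": "c1", "E": "e1"},
    "x7": {"B": "b2", "E": "e2"},
    "x8": {"B": "b1", "C": "c2", "E": "e2"},
}
FLEXIBLE, STABLE = ("B", "C"), ("E",)
CONDITION = FLEXIBLE + STABLE
X_ALPHA = ["x1", "x2", "x3", "x4"]  # favorable decision d2
X_BETA = ["x5", "x6", "x7", "x8"]   # decision d1


def absorb(terms):
    """Absorption law: drop every product that strictly contains another product."""
    return {t for t in terms if not any(o < t for o in terms)}


def alpha_reducts(x):
    """Prime implicants of the product of x's discernibility clauses against X_BETA.

    Returns an empty set (NIL) when x cannot be discerned from some object.
    """
    terms = {frozenset()}
    for y in X_BETA:
        # Values of x that differ from y; a value missing in x never discerns,
        # a value missing in y counts as different.
        clause = {(a, S[x][a]) for a in CONDITION
                  if a in S[x] and S[y].get(a) != S[x][a]}
        if not clause:
            return set()
        # Multiply the running DNF by the clause, then simplify.
        terms = absorb({t | {lit} for t in terms for lit in clause})
    return terms


def hit_ratio(reduct):
    """Share of X_BETA objects that can be moved to the reduct's values."""
    def applicable(y):
        differing = [a for a, v in reduct if S[y].get(a) != v]
        return bool(differing) and all(a in FLEXIBLE for a in differing)
    return sum(applicable(y) for y in X_BETA) / len(X_BETA)


frequency = Counter(r for x in X_ALPHA for r in alpha_reducts(x))
for reduct, f in frequency.items():
    print(sorted(v for _, v in reduct), "f =", f, "h =", hit_ratio(reduct))
```

Multiplying clause by clause and applying absorption after each step keeps the intermediate DNF small, which matters because the fully expanded product can grow exponentially with the number of objects in the second group.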

### 3 Implementation and Experiment

We implemented the algorithm in Python 2.6 on a MacBook computer running OS X, and tested it using a sample data set (lenses) obtained from [8]. The data set contains information for fitting contact lenses. Table 6 shows the attribute names, descriptions, and partitions. The decision attribute, lenses, has three classes. We set the second class (soft lenses) as the favorable decision value. The data set has four condition attributes. We assumed that the age of the patient is a stable attribute, and that prescription, astigmatic, and tear production rate are flexible attributes. The favorable decision value and the attribute partition are defined only for this experiment. Their actual definitions might be different. All attributes in the data set are categorical.

Table 6. Data set used for the experiment.

| Attribute | Values | Type |
|---|---|---|
| (1) age of the patient * | young, pre-presbyopic, presbyopic | Stable |
| (2) spectacle prescription | myope, hypermetrope | Flexible |
| (3) astigmatic | no, yes | Flexible |
| (4) tear production rate | reduced, normal | Flexible |
| (5) lenses | the patient fitted with hard contact lenses; the patient fitted with soft contact lenses ($= d_\alpha$); the patient not fitted with contact lenses | Decision |

Table 7 shows the $\alpha$-reducts generated during the experiment. We interpret them as action reducts. For instance, the second $\alpha$-reduct can be read as: change the values of attributes 2, 3, and 4 to the suggested values (hypermetrope, no, normal) in order to change the decision to 'soft lenses'. Because there is no stable attribute value in the $\alpha$-reduct and the same pattern has not been found in $X_\beta$, its hit ratio is 1.

Table 7. $\alpha$-reducts for $d_\alpha$ = soft lenses. * indicates that the attribute value is stable. The number in parentheses is the attribute number in Table 6.

| $\alpha$-reduct | Weight (w) | Frequency (f) | Hit Ratio (h) |
|---|---|---|---|
| young(1)*, no(3), normal(4) | | | |
| hypermetrope(2), no(3), normal(4) | | | |
| pre-presbyopic(1)*, no(3), normal(4) | | | |

### 4 Related Work and Contribution

The procedure for formulating an action from an existing database has been discussed extensively in the literature. A definition of an action as a form of a rule was given in [4]. A method of action rule discovery from certain pairs of association rules was proposed in [3]. A concept similar to action rules, known as interventions, was introduced in [5]. The action rules introduced in [3] have been investigated further. In [12], the authors present a new agglomerative strategy for constructing action rules from single classification rules. Algorithm ARAS, proposed in [13], is also an agglomerative strategy for generating action rules. The method generates sets of terms (built from values of attributes) around classification rules and constructs action rules directly from them. In [10], the authors proposed a method that extracts action rules directly from attribute values in an incomplete information system without using pre-existing conditional rules.

In these earlier works, action rules are constructed from classification rules. This means that they use pre-existing classification rules, or generate rules using a rule discovery algorithm such as LERS [6], and then construct action rules either from certain pairs of the rules or from a single classification rule. The methods in [10], [13], [14] and [7] do not formulate actions directly from existing classification rules. However, the extraction of classification rules during the formulation of an action rule is inevitable, because actions are built as the effect of possible changes in different rules.

Action rules constructed from classification rules provide a complete list of actions to be taken. However, we want to develop a method that provides a simple set of attribute values to be modified, without using a typical action rule form (e.g. [condition1 → condition2] ⇒ [decision1 → decision2]), for decision makers who want simple recommendations. The recommendations made by an $\alpha$-reduct are typically quite simple, i.e. change a couple of values to obtain a better outcome. In addition, the method does not require the user to define two sets of support and confidence values, one set for classification rules and another for action rules.

### 5 Summary

This paper discussed an algorithm for finding $\alpha$-reducts (also called action reducts) from an information system, and presented an experimental result. An action reduct is a minimal set of attribute values distinguishing a favorable object from other objects, and is used to formulate necessary actions. The action suggested by an action reduct aims to induce a change of the decision attribute value by changing the condition attribute values to the unique pattern in the action reduct. The algorithm was developed as part of an ongoing research project seeking a solution to the problem of reducing college freshman dropouts. We plan to run the algorithm on real-world data in the near future.

Acknowledgments. This research has been partially supported by the President's Mentorship Fund at the University of Pittsburgh Johnstown.

References

1. Gimpel, J. F.: A Reduction Technique for Prime Implicant Tables. In: The Fifth Annual Symposium on Switching Circuit Theory and Logical Design, pp IEEE, Washington (1964)
2. Pawlak, Z.: Rough Sets-Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht (1991)
3. Ras, Z.W., Wieczorkowska, A.: Action-Rules: How to Increase Profit of a Company. In: Zighed, D., Komorowski, J., Zytkow, J. (eds.) Principles of Data Mining and Knowledge Discovery, PADD LNCS, vol. 1920, pp Springer, Heidelberg (2000)
4. Geffner, H., Wainer, J.: Modeling Action, Knowledge and Control. In: ECAI, pp John Wiley & Sons (1998)
5. Greco, S., Matarazzo, B., Pappalardo, N., Slowinski, R.: Measuring Expected Effects of Interventions based on Decision Rules. Journal of Experimental and Theoretical Artificial Intelligence, Vol. 17, No. 1-2 (2005)
6. Grzymala-Busse, J.: A New Version of the Rule Induction System LERS. Fundamenta Informaticae, Vol. 31, No. 1 (1997)
7. He, Z., Xu, X., Deng, S., Ma, R.: Mining Action Rules from Scratch. Expert Systems with Applications, Vol. 29, No. 3 (2005)
8. Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases, mlearn/mlrepository.html, University of California, Irvine, Dept. of Information and Computer Sciences (1998)
9. Skowron, A.: Rough Sets and Boolean Reasoning. In: Pedrycz, W. (ed.) Granular Computing: An Emerging Paradigm, pp Springer, Heidelberg (2001)
10. Im, S., Ras, Z.W., Wasyluk, H.: Action Rule Discovery from Incomplete Data. Knowledge and Information Systems, Vol. 25, No. 1 (2010)
11. Qiao, Y., Zhong, K., Wang, H., Li, X.: Developing Event-Condition-Action Rules in Realtime Active Database. In: ACM Symposium on Applied Computing 2007, pp ACM, New York (2007)
12. Ras, Z.W., Dardzinska, A.: Action Rules Discovery, a New Simplified Strategy. In: F. Esposito et al. (eds.), Foundations of Intelligent Systems, ISMIS LNAI, Vol. 4203, pp , Springer, Heidelberg (2006)
13. Ras, Z.W., Tzacheva, A., Tsay, L., Gurdal, O.: Mining for Interesting Action Rules. In: IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp , IEEE, Washington (2005)
14. Ras, Z.W., Dardzinska, A.: Action Rules Discovery based on Tree Classifiers and Meta-Actions. In: J. Rauch et al. (eds.), Foundations of Intelligent Systems, ISMIS LNAI, Vol. 5722, pp , Springer, Heidelberg (2009)

### SEEM Introduction to Knowledge Systems and Machine Learning

SEEM 5470 Introduction to Knowledge Systems and Machine Learning Data vs. Knowledge Society produces huge amounts of data sources: business, science, medicine, economics, geography, environment, sports,

### Data mining techniques: decision trees

Data mining techniques: decision trees 1/39 Agenda Rule systems Building rule systems vs rule systems Quick reference 2/39 1 Agenda Rule systems Building rule systems vs rule systems Quick reference 3/39

### Rough Sets and Fuzzy Rough Sets: Models and Applications

Rough Sets and Fuzzy Rough Sets: Models and Applications Chris Cornelis Department of Applied Mathematics and Computer Science, Ghent University, Belgium XV Congreso Español sobre Tecnologías y Lógica

### Towards a Practical Approach to Discover Internal Dependencies in Rule-Based Knowledge Bases

Towards a Practical Approach to Discover Internal Dependencies in Rule-Based Knowledge Bases Roman Simiński, Agnieszka Nowak-Brzezińska, Tomasz Jach, and Tomasz Xiȩski University of Silesia, Institute

### Encyclopedia of Information Ethics and Security

Encyclopedia of Information Ethics and Security Marian Quigley Monash University, Australia InformatIon ScIence reference Hershey New York Acquisitions Editor: Development Editor: Senior Managing Editor:

### Enhanced data mining analysis in higher educational system using rough set theory

African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review

### A ROUGH SET-BASED KNOWLEDGE DISCOVERY PROCESS

Int. J. Appl. Math. Comput. Sci., 2001, Vol.11, No.3, 603 619 A ROUGH SET-BASED KNOWLEDGE DISCOVERY PROCESS Ning ZHONG, Andrzej SKOWRON The knowledge discovery from real-life databases is a multi-phase

### HANDLING MISSING ATTRIBUTE VALUES

Chapter 1 HANDLING MISSING ATTRIBUTE VALUES Jerzy W. Grzymala-Busse University of Kansas Witold J. Grzymala-Busse FilterLogix Abstract In this chapter methods of handling missing attribute values in data

### Inclusion degree: a perspective on measures for rough set data analysis

Information Sciences 141 (2002) 227 236 www.elsevier.com/locate/ins Inclusion degree: a perspective on measures for rough set data analysis Z.B. Xu a, *, J.Y. Liang a,1, C.Y. Dang b, K.S. Chin b a Faculty

### Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

### Modelling Complex Patterns by Information Systems

Fundamenta Informaticae 67 (2005) 203 217 203 IOS Press Modelling Complex Patterns by Information Systems Jarosław Stepaniuk Department of Computer Science, Białystok University of Technology Wiejska 45a,

### An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

### Data Mining: A Preprocessing Engine

Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

### Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

### Pattern Mining and Querying in Business Intelligence

Pattern Mining and Querying in Business Intelligence Nittaya Kerdprasop, Fonthip Koongaew, Phaichayon Kongchai, Kittisak Kerdprasop Data Engineering Research Unit, School of Computer Engineering, Suranaree

### Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

### Roulette Sampling for Cost-Sensitive Learning

Roulette Sampling for Cost-Sensitive Learning Victor S. Sheng and Charles X. Ling Department of Computer Science, University of Western Ontario, London, Ontario, Canada N6A 5B7 {ssheng,cling}@csd.uwo.ca

### Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

### TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

### Analysis of Software Process Metrics Using Data Mining Tool -A Rough Set Theory Approach

Analysis of Software Process Metrics Using Data Mining Tool -A Rough Set Theory Approach V.Jeyabalaraja, T.Edwin prabakaran Abstract In the software development industries tasks are optimized based on

### FACTORIZATION METHODS OF BINARY, TRIADIC, REAL AND FUZZY DATA

STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVI, Number 2, 2011 FACTORIZATION METHODS OF BINARY, TRIADIC, REAL AND FUZZY DATA CYNTHIA VERA GLODEANU Abstract. We compare two methods regarding the factorization

### A ROUGH CLUSTER ANALYSIS OF SHOPPING ORIENTATION DATA. Kevin Voges University of Canterbury. Nigel Pope Griffith University

Abstract A ROUGH CLUSTER ANALYSIS OF SHOPPING ORIENTATION DATA Kevin Voges University of Canterbury Nigel Pope Griffith University Mark Brown University of Queensland Track: Marketing Research and Research

### The Application of Association Rule Mining in CRM Based on Formal Concept Analysis

The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com

### On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

### DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

### Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

### DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES

DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 lakshmi.mahanra@gmail.com

### Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering. PhD Thesis. Khurum Nazir Junejo 2004-03-0018

Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar

### Equational Reasoning as a Tool for Data Analysis

AUSTRIAN JOURNAL OF STATISTICS Volume 31 (2002), Number 2&3, 231-239 Equational Reasoning as a Tool for Data Analysis Michael Bulmer University of Queensland, Brisbane, Australia Abstract: A combination

### Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

### Clustering in Machine Learning. By: Ibrar Hussain Student ID:

Clustering in Machine Learning By: Ibrar Hussain Student ID: 11021083 Presentation An Overview Introduction Definition Types of Learning Clustering in Machine Learning K-means Clustering Example of k-means

### ROUGH SETS AND DATA MINING. Zdzisław Pawlak

ROUGH SETS AND DATA MINING Zdzisław Pawlak Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, ul. altycka 5, 44 100 Gliwice, Poland ASTRACT The paper gives basic ideas of rough

### Controlflow-based Coverage Criteria

Controlflow-based Coverage Criteria W. Eric Wong Department of Computer Science The University of Texas at Dallas ewong@utdallas.edu http://www.utdallas.edu/~ewong Dataflow-based Coverage Criteria ( 2012

### Content-Based Recommendation

Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

### Marcin Szczuka The University of Warsaw Poland

Marcin Szczuka The University of Warsaw Poland Why KDD? We are drowning in the sea of data, but what we really want is knowledge PROBLEM: How to retrieve useful information (knowledge) from massive data

### INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal

INTERNATIONAL JOURNAL OF ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY An International online open access peer reviewed journal Research Article ISSN 2277 9140 ABSTRACT Web page categorization based

### Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe

Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe Abstract Effective website personalization is at the heart of many e-commerce applications. To ensure that customers

### Reducing multiclass to binary by coupling probability estimates

Reducing multiclass to inary y coupling proaility estimates Bianca Zadrozny Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92093-0114 zadrozny@cs.ucsd.edu

### An Outline of a Theory of Three-way Decisions

An Outline of a Theory of Three-way Decisions Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca Abstract. A theory of three-way

### Discernibility Thresholds and Approximate Dependency in Analysis of Decision Tables

Discernibility Thresholds and Approximate Dependency in Analysis of Decision Tables Yu-Ru Syau¹, En-Bing Lin²*, Lixing Jia³ ¹Department of Information Management, National Formosa University, Yunlin, 63201,

### Incorporating Evidence in Bayesian networks with the Select Operator

Incorporating Evidence in Bayesian networks with the Select Operator C.J. Butz and F. Fang Department of Computer Science, University of Regina Regina, Saskatchewan, Canada SAS 0A2 {butz, fang11fa}@cs.uregina.ca

### WEB PAGE CATEGORISATION BASED ON NEURONS

WEB PAGE CATEGORISATION BASED ON NEURONS Shikha Batra Abstract: Contemporary web is comprised of trillions of pages and everyday tremendous amount of requests are made to put more web pages on the WWW.

### MONITORING CUSTOMER SATISFACTION IN SERVICE INDUSTRY: A CLUSTER ANALYSIS APPROACH

KVALITA INOVÁCIA PROSPERITA / QUALITY INNOVATION PROSPERITY XVI/1 2012 49 MONITORING CUSTOMER SATISFACTION IN SERVICE INDUSTRY: A CLUSTER ANALYSIS APPROACH MATUS HORVATH, ALEXANDRA MICHALKOVA 1 INTRODUCTION

### Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

### Requirements-based Test Generation for Predicate Testing

Requirements-based Test Generation for Predicate Testing W. Eric Wong Department of Computer Science The University of Texas at Dallas ewong@utdallas.edu http://www.utdallas.edu/~ewong 1 Speaker Biographical

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Probability Using Dice

Using Dice One Page Overview By Robert B. Brown, The Ohio State University Topics: Levels:, Statistics Grades 5 8 Problem: What are the probabilities of rolling various sums with two dice? How can you

### Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

### Feature Selection with Rough Sets for Web Page Classification

Feature Selection with Rough Sets for Web Page Classification Aijun An 1, Yanhui Huang 2, Xiangji Huang 1, and Nick Cercone 3 1 York University, Toronto, Ontario, M3J 1P3, Canada 2 University of Waterloo,

### Cluster analysis and Association analysis for the same data

Cluster analysis and Association analysis for the same data Huaiguo Fu Telecommunications Software & Systems Group Waterford Institute of Technology Waterford, Ireland hfu@tssg.org Abstract: Both cluster

### IT services for analyses of various data samples

IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

### Efficient Integration of Data Mining Techniques in Database Management Systems

Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

### Data Mining. Practical Machine Learning Tools and Techniques. Classification, association, clustering, numeric prediction

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Input: Concepts, instances, attributes Terminology What's a concept? Classification,

### Domain Classification of Technical Terms Using the Web

Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470-2482 Domain Classification of Technical Terms Using

### A Comparison of Variable Selection Techniques for Credit Scoring

A Comparison of Variable Selection Techniques for Credit Scoring K. Leung and F. Cheong and C. Cheong School of Business Information Technology, RMIT University, Melbourne, Victoria, Australia E-mail:

### Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

### Ensemble Data Mining Methods

Ensemble Data Mining Methods Nikunj C. Oza, Ph.D., NASA Ames Research Center, USA INTRODUCTION Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods

### AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM

AUTO CLAIM FRAUD DETECTION USING MULTI CLASSIFIER SYSTEM ABSTRACT Luis Alexandre Rodrigues and Nizam Omar Department of Electrical Engineering, Mackenzie Presbyterian University, Brazil, São Paulo 71251911@mackenzie.br,nizam.omar@mackenzie.br

### DATA MINING PROCESS FOR UNBIASED PERSONNEL ASSESSMENT IN POWER ENGINEERING COMPANY

International Scientific Conference MANAGEMENT 2012, Mladenovac, Serbia, 20-21 April 2012 DATA MINING PROCESS FOR UNBIASED

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

### Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 2 of Data Mining by I. H. Witten and E. Frank Outline Terminology What's a concept Classification, association, clustering, numeric

### CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition

CNFSAT: Predictive Models, Dimensional Reduction, and Phase Transition Neil P. Slagle College of Computing Georgia Institute of Technology Atlanta, GA npslagle@gatech.edu Abstract CNFSAT embodies the P

### T3: A Classification Algorithm for Data Mining

T3: A Classification Algorithm for Data Mining Christos Tjortjis and John Keane Department of Computation, UMIST, P.O. Box 88, Manchester, M60 1QD, UK {christos, jak}@co.umist.ac.uk Abstract. This paper

### Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

### 11/14/2010 Intelligent Systems and Soft Computing 1

Lecture 9 Evolutionary Computation: Genetic algorithms Introduction, or can evolution be intelligent? Simulation of natural evolution Genetic algorithms Case study: maintenance scheduling with genetic

### Grid Density Clustering Algorithm

Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2

### A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

### Interactive Fuzzy Interval Reasoning for smart Web shopping

Applied Soft Computing 5 (2005) 433-439 www.elsevier.com/locate/asoc Interactive Fuzzy Interval Reasoning for smart Web shopping Fuyu Liu, Hongli Geng, Yan-Qing Zhang* Department of Computer Science, Georgia

### Data mining and complex telecommunications problems modeling

Paper Data mining and complex telecommunications problems modeling Janusz Granat Abstract The telecommunications operators have to manage one of the most complex systems developed by human beings. Moreover,

### Overview. Essential Questions. Precalculus, Quarter 4, Unit 4.5 Build Arithmetic and Geometric Sequences and Series

Sequences and Series Overview Number of instruction days: 4-6 (1 day = 53 minutes) Content to Be Learned Write arithmetic and geometric sequences both recursively and with an explicit formula, use them

### EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

### METHODOLOGICAL CONSIDERATIONS OF DRIVE SYSTEM SIMULATION, WHEN COUPLING FINITE ELEMENT MACHINE MODELS WITH THE CIRCUIT SIMULATOR MODELS OF CONVERTERS.

SEDM 24 June 16th - 18th, CPRI (Italy) METHODOLOGICAL CONSIDERATIONS OF DRIVE SYSTEM SIMULATION, WHEN COUPLING FINITE ELEMENT MACHINE MODELS WITH THE CIRCUIT SIMULATOR MODELS OF CONVERTERS. Áron Szûcs BB Electrical

### MADR Algorithm to Recover Authenticity from Damage of the Important Data

pp. 443-452 http://dx.doi.org/10.14257/ijmue.2014.9.12.39 MADR Algorithm to Recover Authenticity from Damage of the Important Data Seong-Ho An 1, * Kihyo Nam 2, Mun-Kweon Jeong 2 and Yong-Rak Choi 1

### Performance Analysis of Clustering using Partitioning and Hierarchical Clustering Techniques. Karunya University, Coimbatore, India. India.

Vol.7, No.6 (2014), pp.233-240 http://dx.doi.org/10.14257/ijdta.2014.7.6.21 Performance Analysis of Clustering using Partitioning and Hierarchical Clustering Techniques S. C. Punitha 1, P. Ranjith Jeba

### Cracking a Logical Puzzle with ROSETTA

Cracking a Logical Puzzle with ROSETTA Aleksander Øhrn Knowledge Systems Group Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim, Norway aleks@idi.ntnu.no

### In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

### Modeling and Design of Intelligent Agent System

International Journal of Control, Automation, and Systems Vol. 1, No. 2, June 2003 257 Modeling and Design of Intelligent Agent System Dae Su Kim, Chang Suk Kim, and Kee Wook Rim Abstract: In this study,

### Data Integration: Financial Domain-Driven Approach

Data Integration: Financial Domain-Driven Approach Caslav Bozic 1, Detlef Seese 2, and Christof Weinhardt 3 1 IME Graduate School, Karlsruhe Institute of Technology (KIT) bozic@kit.edu 2 Institute AIFB,

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor

### Representation of Electronic Mail Filtering Profiles: A User Study

Representation of Electronic Mail Filtering Profiles: A User Study Michael J. Pazzani Department of Information and Computer Science University of California, Irvine Irvine, CA 92697 +1 949 824 5888 pazzani@ics.uci.edu

### Section 7.1 First-Order Predicate Calculus predicate Existential Quantifier Universal Quantifier.

Section 7.1 First-Order Predicate Calculus Predicate calculus studies the internal structure of sentences where subjects are applied to predicates existentially or universally. A predicate describes a

### Probabilistic Rough Set Approximations

Probabilistic Rough Set Approximations Yiyu (Y.Y.) Yao Department of Computer Science University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail: yyao@cs.uregina.ca Abstract Probabilistic approaches

### ON THE USE OF ASHENHURST DECOMPOSITION CHART AS AN ALTERNATIVE TO ALGORITHMIC TECHNIQUES IN THE SYNTHESIS OF MULTIPLEXER-BASED LOGIC CIRCUITS

Nigerian Journal of Technology, Vol. 18, No. 1, September, 1997 OSUAGWU 28 ON THE USE OF ASHENHURST DECOMPOSITION CHART AS AN ALTERNATIVE TO ALGORITHMIC TECHNIQUES IN THE SYNTHESIS OF MULTIPLEXER-BASED

### New inference algorithms based on rules partition

New inference algorithms based on rules partition Agnieszka Nowak-Brzezińska and Roman Simiński Department of Computer Science, Institute of Computer Science, Silesian University, Poland http://ii.us.edu.pl

### APPLYING GMDH ALGORITHM TO EXTRACT RULES FROM EXAMPLES

Systems Analysis Modelling Simulation Vol. 43, No. 10, October 2003, pp. 1311-1319 APPLYING GMDH ALGORITHM TO EXTRACT RULES FROM EXAMPLES KOJI FUJIMOTO* and SAMPEI NAKABAYASHI Financial Engineering Group,

### Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

### Visualizing e-government Portal and Its Performance in WEBVS

Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR ccfong@umac.mo Abstract An e-government

### SVM Ensemble Model for Investment Prediction

SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of

### Performance Analysis of Decision Trees

Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India

### CS 6220: Data Mining Techniques Course Project Description

CS 6220: Data Mining Techniques Course Project Description College of Computer and Information Science Northeastern University Spring 2013 General Goal In this project, you will have an opportunity to

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277-128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### Hybrid Lossless Compression Method For Binary Images

M.F. TALU AND İ. TÜRKOĞLU/ IU-JEEE Vol. 11(2), (2011), 1399-1405 Hybrid Lossless Compression Method For Binary Images M. Fatih TALU, İbrahim TÜRKOĞLU Inonu University, Dept. of Computer Engineering, Engineering

### Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

### Investigation of Adherence Degree of Agile Requirements Engineering Practices in Non-Agile Software Development Organizations

Investigation of Adherence Degree of Agile Requirements Engineering Practices in Non-Agile Software Development Organizations Mennatallah H. Ibrahim Department of Computers and Information Sciences Institute

### INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION

INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION Upasna Attri C.S.E. Department, DAV Institute of Engineering and Technology, Jalandhar (India) upasnaa.8@gmail.com Harpreet Kaur

### Ensemble Methods for Noise Elimination in Classification Problems

Ensemble Methods for Noise Elimination in Classification Problems Sofie Verbaeten and Anneleen Van Assche Department of Computer Science Katholieke Universiteit Leuven Celestijnenlaan 200 A, B-3001 Heverlee,