Fuzzy sets in Data mining- A Review
MUNTAHA AHMAD
Assistant Professor
Birla Institute of Technology, Mesra, Ranchi, Extension Centre NOIDA

Prof. (Dr.) AJAY RANA
Program Director
Amity School of Engineering and Technology, Amity University Uttar Pradesh, NOIDA

Abstract

Data mining, also called knowledge discovery in databases, is regarded as a non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable knowledge in large-scale data. This paper briefly reviews some typical applications and highlights potential contributions that fuzzy set theory can make to data mining. In this connection, some advantages of fuzzy methods for representing and mining vague patterns in data are especially emphasized. The aim of this paper is to convey an impression of the current status and prospects of fuzzy sets in data mining, especially highlighting potential features and advantages of fuzzy approaches in comparison with non-fuzzy approaches.

Keywords: Fuzzy set; Data mining; Association rules; Fuzzy rule bases; Clustering; Decision trees; Gradual nature

1. Introduction

Data mining techniques (DMT) have formed a branch of applied artificial intelligence (AI) since the 1960s. During the intervening decades, important innovations in computer systems have led to the introduction of new technologies (Ha, Bae, & Park, 2000) for web-based education. Data mining allows a search for valuable information in large volumes of data (Weiss & Indurkhya, 1998). The explosive growth in databases has created a need to develop technologies that use information and knowledge intelligently. Therefore, DMT has become an increasingly important research area (Fayyad, Djorgovski, & Weir, 1996) [1].

Data mining is of an exploratory nature and can also be seen as exploratory data analysis with a special focus on large data collections. It is quite possible that the questions we want to answer with data mining methods are not clear from the beginning.
During the analysis process new questions may arise, and we may have to repeat it several times, possibly applying different methods each time. Some well-known analysis methods and tools that are used in data mining are, for example, statistics (regression analysis, discriminant analysis, etc.), time series analysis, decision trees, (fuzzy) cluster analysis, neural networks, inductive logic programming, and association rules. In this paper we concentrate on fuzzy methods in data mining and show where and how they can be used.

Fuzzy set theory provides excellent means to model the fuzzy boundaries of linguistic terms by introducing gradual memberships. In contrast to classical set theory, in which an object or a case either is a member of a given set (defined, e.g., by some property) or is not, fuzzy set theory makes it possible for an object or a case to belong to a set only to a certain degree, thus modeling the penumbra of the linguistic term describing the property that defines the set [2]. Interpretations of membership degrees include similarity, preference, and uncertainty: they can state how similar an object or case is to a prototypical one, they can indicate preferences between suboptimal solutions to a problem, or they can model uncertainty about the true situation, if this situation is described in imprecise terms. In general, due to their closeness to human reasoning, solutions obtained using fuzzy approaches are easy to understand and to apply. Due to these strengths, fuzzy systems are the method of choice if linguistic, vague, or imprecise information has to be modeled.

1.1 Fuzzy Logic Fundamentals

The concept of Fuzzy Logic (FL) was conceived by Lotfi Zadeh, a professor at the University of California at Berkeley, and presented not as a control methodology, but as a way of processing data by allowing partial set membership rather than crisp set membership or non-membership. This approach to set theory was not applied to control systems until the 1970s due to insufficient small-computer capability prior to that time. Professor Zadeh reasoned that people do not require precise, numerical information input, and yet they are capable of highly adaptive control. If feedback controllers could be programmed to accept noisy, imprecise input, they would be much more effective and perhaps easier to implement.

2. Typical Applications of fuzzy set theory

The tools and technologies that have been developed in fuzzy set theory (FST) have the potential to support all of the steps that comprise a process of model induction or knowledge discovery. In particular, FST can already be used in the data selection and preparation phase, e.g., for modeling vague data in terms of fuzzy sets, for condensing several crisp observations into a single fuzzy one, or for creating fuzzy summaries of the data. As the data to be analyzed thus becomes fuzzy, one subsequently faces a problem of analyzing fuzzy data, i.e., of fuzzy data analysis [3].

The problem of analyzing fuzzy data can be approached in at least two principally different ways. First, standard methods of data analysis can be extended in a rather generic way by means of an extension principle, that is, by fuzzifying the mapping from data to models. A second, often more sophisticated approach is based on embedding the data into more complex mathematical spaces, such as fuzzy metric spaces [4], and carrying out data analysis in these spaces [5].
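
As a concrete illustration of gradual membership and of modeling a crisp observation in terms of fuzzy sets, the following minimal Python sketch may help. The triangular shape, the 0 to 100 scale, and the names `triangular`, `TERMS`, and `fuzzify` are assumptions made for illustration only; they are not taken from the works cited above.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set.

    a and c are the feet (membership 0), b is the peak (membership 1).
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for an attribute measured on a 0..100 scale.
# The outer sets extend beyond the scale so that 0 is fully "low"
# and 100 is fully "high".
TERMS = {
    "low": (-50.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 150.0),
}


def fuzzify(x):
    """Map a crisp value to its membership degree in each linguistic term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}


if __name__ == "__main__":
    for value in (0.0, 25.0, 50.0, 80.0):
        print(value, fuzzify(value))
```

With this particular partition the degrees of each value sum to one, so a value such as 25 is described as partly "low" and partly "medium" rather than being forced into a single interval.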
If fuzzy methods are not used in the data preparation phase, they can still be employed at a later stage in order to analyze the original data. Thus, it is not the data to be analyzed that is fuzzy, but rather the methods used for analyzing the data (in the sense of resorting to tools from FST). Subsequently, we shall focus on this type of fuzzy data analysis (where the adjective "fuzzy" refers to the term "analysis", not to the term "data"), which is predominant in data mining (DM). In the following, we focus on fuzzy extensions of some well-known data mining methods without repeating the original methods themselves; thus, we assume basic familiarity with these methods.

2.1 Learning Fuzzy Rule Bases

One possible application of fuzzy systems in data mining is the induction of fuzzy rules in order to interpret the underlying data linguistically. To describe a fuzzy system completely we need to determine a rule base (structure) and fuzzy partitions (parameters) for all variables. If we apply such techniques, we must be aware of the trade-off between precision and interpretability. A solution is judged not only by its accuracy, but also, if not primarily, by its simplicity and readability: a user of a fuzzy system must be able to comprehend the rule base. For the interpretability of a fuzzy system it is important that the rule base contains only a few fuzzy rules, that each rule uses only a few variables, and that the variables are partitioned by a few meaningful fuzzy sets. It is also important that no linguistic label is represented by more than one fuzzy set.

The complexity of the learning task obviously leads to a problem: when learning from information, one must choose between (often quantitative) methods that achieve good performance and (often qualitative) models that explain what is going on to a user. This is another good example of Zadeh's principle of the incompatibility between precision and meaning. Of course, precision and high performance are important goals.
However, in the most successful fuzzy applications in industry, such as intelligent control and pattern classification, the introduction of fuzzy sets was motivated by the need for more human-friendly computerized devices that help users to formulate their knowledge and to clarify, process, retrieve, and exploit the available information in the simplest possible way. In order to achieve this user friendliness, certain (limited) reductions in performance and solution quality are often accepted.

In fuzzy set theory (FST), one of the cornerstones of soft computing, aspects of knowledge representation and reasoning have dominated research for a long time, at least in that part of the theory which lends itself to intelligent systems design and applications in AI. Yet, problems of automated learning and knowledge acquisition have more and more come to the fore in recent years, and numerous contributions to DM have been made in the meantime.

2.2 Fuzzy cluster analysis

Many conventional clustering algorithms, such as the prominent k-means algorithm, produce a clustering structure in which every object is assigned to one cluster in an unequivocal way. Consequently, the individual clusters are separated by sharp boundaries. In practice, such boundaries are often not very natural or even counterintuitive. Rather, the boundary of single clusters and the transition between different clusters are usually smooth. This is the main motivation underlying fuzzy extensions to clustering algorithms [6]. In fuzzy clustering, an object may belong to different clusters at the same time, at least to some extent, and the degree to which it belongs to a particular cluster is expressed in terms of a fuzzy membership. The membership functions of the different clusters (defined on the set of observed data points) are usually assumed to form a partition of unity. This version, often called probabilistic clustering, can be generalized further by weakening this constraint as, e.g., in possibilistic clustering [7]. Fuzzy clustering has proved to be extremely useful in practice and is now routinely applied also outside the fuzzy community (e.g., in recent bioinformatics applications).

2.3 Fuzzy decision tree induction

Fuzzy variants of decision tree induction have been developed for quite a while (e.g., [8,9]) and seem to remain a topic of interest even today. In the case of decision trees, it is primarily the crisp thresholds used for defining splitting predicates (constraints), such as size ≤ 181, at inner nodes that motivate a fuzzy extension. Such thresholds lead to hard decision boundaries in the input space, which means that a slight variation of an attribute (e.g., size = 182 instead of size = 181) can entail a completely different classification of an object (e.g., of a person characterized by size, weight, gender, etc.).
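
The threshold effect just described can be made concrete with a small Python sketch. It assumes that the crisp split is the test `size <= 181` and contrasts it with a hypothetical piecewise-linear fuzzy set TALL; the breakpoints 167 and 187 are illustrative assumptions, chosen only so that a height of 181 cm receives the degree 0.7.

```python
def crisp_split(size_cm, threshold=181.0):
    """Crisp splitting predicate, assumed here to be size <= threshold."""
    return size_cm <= threshold


def tall_degree(size_cm, lower=167.0, upper=187.0):
    """Degree of membership in a hypothetical fuzzy set TALL.

    Membership is 0 up to `lower`, 1 from `upper` on, and linear in between.
    The breakpoints are illustrative assumptions, not values from the paper.
    """
    if size_cm <= lower:
        return 0.0
    if size_cm >= upper:
        return 1.0
    return (size_cm - lower) / (upper - lower)


if __name__ == "__main__":
    for size in (181.0, 182.0):
        # The crisp predicate flips between the two sizes,
        # whereas the fuzzy degree changes only slightly.
        print(size, crisp_split(size), round(tall_degree(size), 2))
```

Under these assumptions, one additional centimetre moves the object to the other branch of the crisp split, while its degree of membership in TALL changes by only 0.05.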
Moreover, the learning process becomes unstable in the sense that a slight variation of the training examples can change the induced decision tree drastically. In order to make the decision boundaries soft, an obvious idea is to apply fuzzy predicates at the inner nodes of a decision tree, such as height ∈ TALL, where TALL is a fuzzy set (rather than an interval). In other words, a fuzzy partition instead of a crisp one is used for the splitting attribute (here size, i.e., body height) at an inner node. Since an example can satisfy a fuzzy predicate to a certain degree, the examples are partitioned in a fuzzy manner as well. That is, an object is not assigned to exactly one successor node in a unique way, but perhaps to several successors with a certain degree. For example, a person whose height is 181 cm could be an element of the TALL-group to the degree, say, 0.7 and of the complementary group to the degree 0.3.

2.4 Fuzzy association analysis

The use of fuzzy sets in connection with association analysis has been proposed by numerous authors (see [10,11] for recent overviews), with motivations closely resembling those in the case of rule learning and decision tree induction. Again, by allowing for soft rather than crisp boundaries of intervals, fuzzy sets can avoid certain undesirable threshold effects, this time concerning the quality measures of association rules (like support and confidence) rather than the classification of objects. Moreover, identifying fuzzy sets with linguistic terms allows for a comprehensible and user-friendly presentation of rules discovered in a database. Many standard techniques for association rule mining have been transferred to the fuzzy case, sometimes in a rather ad hoc manner. Indeed, publications on this topic are often more concerned with issues of data pre-processing, e.g., the problem of finding good fuzzy partitions for the quantitative attributes, than with the rule mining process itself.
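
To make the notion of fuzzy quality measures more tangible, here is a minimal Python sketch of one possible generalization of support and confidence. It assumes that each record already carries a membership degree for the antecedent A and the consequent B, and that the two are combined conjunctively with the minimum; the function names and the sample degrees are hypothetical, and other generalizations are equally conceivable.

```python
def fuzzy_support(a_degrees, b_degrees):
    """Fuzzy support of the rule A -> B.

    Each record contributes min(A(x), B(x)), i.e. the minimum t-norm is
    used as the conjunction (an assumption made for this sketch).
    """
    joint = sum(min(a, b) for a, b in zip(a_degrees, b_degrees))
    return joint / len(a_degrees)


def fuzzy_confidence(a_degrees, b_degrees):
    """Fuzzy confidence: joint degree relative to the antecedent degree."""
    joint = sum(min(a, b) for a, b in zip(a_degrees, b_degrees))
    antecedent = sum(a_degrees)
    return joint / antecedent if antecedent > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical membership degrees of four records in A and in B.
    a = [1.0, 0.5, 0.0, 0.5]
    b = [1.0, 1.0, 0.5, 0.0]
    print("support:", fuzzy_support(a, b))
    print("confidence:", fuzzy_confidence(a, b))
```

With crisp degrees (0 or 1 only) both functions reduce to the usual support and confidence of an association rule.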
For example, the existence of different types of fuzzy rules suggests that fuzzy associations can be interpreted in different ways and, hence, that the evaluation of an association cannot be independent of its interpretation. In particular, one can raise the question of which generalized logical operators can reasonably be applied in order to evaluate fuzzy associations, e.g., whether the antecedent part and the consequent part should be combined in a conjunctive way (à la Mamdani rules) or by means of a generalized implication (as in implication-based fuzzy rules) [12]. Moreover, since standard evaluation measures for association rules can be generalized in many ways, it is interesting to investigate properties of particular generalizations and to look for an axiomatic basis that supports the choice of specific measures [13].

3. Potential contributions of fuzzy set theory
In the following, we highlight and critically comment on some potential contributions that fuzzy set theory can make to data mining.

3.1 Gradual concept

The ability to represent gradual concepts and fuzzy properties in a thorough way is one of the key features of fuzzy sets. In data mining, the patterns of interest are often vague and have boundaries that are non-sharp in the sense of fuzzy set theory. To illustrate, consider the concept of a "peak": it is usually not possible to decide in an unequivocal way whether a temporally ordered sequence of measurements has a peak (a particular kind of pattern) or not. Rather, there is a gradual transition between having a peak and not having a peak. Taking graduality into account is also important if one must decide whether a certain property is frequent among a set of objects, e.g., whether a pattern occurs frequently in a data set. In fact, if the pattern is specified in an overly restrictive manner, it might easily happen that none of the objects matches the specification, even though many of them can be seen as approximate matches.

Unfortunately, the representation of graduality is often foiled in machine learning applications, especially in connection with the learning of predictive models. In such applications, a fuzzy prediction is usually not desired; rather, one is forced to come up with a definite final decision. Classification is an obvious example: eventually, a decision in favour of one particular class label has to be made, even if the object under consideration seems to have partial membership in several classes simultaneously. This is the case both in theory and in practice: in practice, the bottom line is the course of action one takes on the basis of a prediction, not the prediction itself. In theory, a problem concerns the performance evaluation of a fuzzy classifier: the standard benchmark data sets have crisp rather than fuzzy labels.
Moreover, a fuzzy classifier cannot be compared with a standard (non-fuzzy) classifier unless it eventually outputs crisp predictions. Needless to say, if a fuzzy predictor is supplemented with a defuzzification mechanism (like a winner-takes-all strategy in classification), many of its merits are lost. In the classification setting, for instance, a defuzzified fuzzy classifier again produces hard decision boundaries in the input space. Thereby, it is actually reduced to a standard classifier. Moreover, if a classifier is solely evaluated on the basis of its predictive accuracy, then all that matters is the decision boundaries it produces in the input space. Since a defuzzified fuzzy classifier does not produce a decision boundary that is principally different from the boundaries produced by alternative classifiers (such as decision trees or neural networks), fuzzy machine learning methods do not have much to offer with regard to generalization performance. And indeed, fuzzy approaches to classification usually do not improve predictive accuracy.

3.2 Granularity

Granular computing, including fuzzy set theory as one of its main constituents, is an emerging paradigm of information processing in which information granules are considered as key components of knowledge representation. A central idea is that information can be processed on different levels of abstraction, and that the choice of the most suitable level depends on the problem at hand. As a means to trade off accuracy against efficiency and interpretability, granular computing is also relevant for data mining, not only for the model induction or pattern discovery process itself, but also for data pre- and post-processing, such as data compression and dimensionality reduction.
For example, one of the most important data analysis methods, cluster analysis, can be seen as a process of information granulation, in which data objects are combined into meaningful groups so as to convey a useful idea of the main structure of a data set.

3.3 Interpretability

A primary motivation for the development of fuzzy sets was to provide an interface between a numerical scale and a symbolic scale, which is usually composed of linguistic terms. Thus, fuzzy sets have the capability to interface quantitative patterns with qualitative knowledge structures expressed in terms of natural language. This makes the application of fuzzy technology very appealing from a knowledge representational point of view. For example, it allows association rules discovered in a database to be presented in a linguistic and hence comprehensible way. In fact, the user-friendly representation of models and patterns is often emphasized as one of the key features of fuzzy methods.

The use of linguistic modelling techniques also has some disadvantages, however. A first problem concerns the ambiguity of fuzzy models: linguistic terms and, hence, models are highly subjective and context-dependent. It is true that the imprecision of natural language is not necessarily harmful and can even be advantageous. A fuzzy controller, for example, can be quite insensitive to the concrete mathematical translation of a linguistic model. One should realize, however, that in fuzzy control the information flows in the reverse direction: the linguistic model is not the end product, as in DM; rather, it stands at the beginning. It is of course possible to disambiguate a model by complementing it with the semantics of the fuzzy concepts it involves (including the specification of membership functions). Then, however, the complete model, consisting of a qualitative (linguistic) and a quantitative part, becomes cumbersome and will not be easily understandable.

3.4 Robustness

It is often claimed that fuzzy methods are more robust than non-fuzzy methods. Of course, the term "robustness" can refer to many things, e.g., to the sensitivity of an induction method toward violations of the model assumptions. In connection with fuzzy methods, the most relevant type of robustness concerns sensitivity toward variations of the data. Generally, a learning or data mining method is considered robust if a small variation of the observed data hardly alters the induced model or the evaluation of a pattern. A common argument supporting the claim that fuzzy models are in this sense more robust than non-fuzzy models refers to a boundary effect which occurs in several variants and is arguably an obvious drawback of interval-based methods. This effect refers to the fact that a variation of the boundary points of an interval can have a strong influence on a model or a pattern.
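
The boundary effect can be sketched in a few lines of Python. The data values, the interval, and the ramp width below are hypothetical, and the "soft interval" with linear ramps centred on the boundary points is only one assumed way of fuzzifying an interval.

```python
def crisp_support(values, lo, hi):
    """Fraction of values that fall into the crisp interval [lo, hi]."""
    return sum(1 for v in values if lo <= v <= hi) / len(values)


def soft_interval(v, lo, hi, width):
    """Membership of v in a fuzzy interval with linear ramps of the given
    width centred on the boundary points lo and hi (an assumed shape)."""
    half = width / 2.0
    rising = (v - (lo - half)) / width
    falling = ((hi + half) - v) / width
    return max(0.0, min(1.0, rising, falling))


def soft_support(values, lo, hi, width):
    """Average membership of the values in the fuzzy interval."""
    return sum(soft_interval(v, lo, hi, width) for v in values) / len(values)


if __name__ == "__main__":
    values = [28, 29, 29, 30, 35]  # hypothetical observations
    for lo in (30, 29):            # shift the lower boundary by one unit
        print(
            "lower bound", lo,
            "crisp:", crisp_support(values, lo, 40),
            "soft:", soft_support(values, lo, 40, width=4),
        )
```

For this toy data, shifting the lower boundary from 30 to 29 doubles the crisp support (0.4 to 0.8), whereas the soft support moves from 0.4 to 0.6. The fuzzy variant is not immune to the shift, but it reacts more gradually, and a wider ramp dampens the effect further.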
In fact, it is not difficult to construct convincing demonstrations of this effect.

3.5 Representation of uncertainty

To begin with, the data presented to learning algorithms is imprecise, incomplete, or noisy most of the time, a problem that can badly mislead a learning procedure. But even if observations are perfect, the generalization beyond that data, the process of induction, is still afflicted with uncertainty. For example, observed data can generally be explained by more than one candidate theory, which means that one can never be sure of the truth of a particular model. Fuzzy sets and possibility theory have made important contributions to the representation and processing of uncertainty. In DM, as in other fields, related uncertainty formalisms can complement probability theory in a reasonable way, because not all types of uncertainty relevant to machine learning are probabilistic and because other formalisms are more expressive than probability.

3.6 Incorporation of background knowledge

Roughly speaking, inductive (machine) learning can be seen as searching the space of candidate hypotheses for a most suitable model. The corresponding search process, regardless of whether it is carried out in an explicit or implicit way, is usually biased in various ways, and each bias usually originates from a sort of background knowledge. For example, the representation bias restricts the hypothesis space to certain types of input-output relations, such as linear or polynomial relationships. Incorporating background knowledge is extremely important, because the data by itself would be totally meaningless if considered from an unbiased point of view. As demonstrated by other application fields such as fuzzy control, fuzzy set-based modelling techniques provide a convenient tool for making expert knowledge accessible to computational methods and, hence, for incorporating background knowledge in the learning process. This can be done in various ways and on different levels.
One very obvious approach is to combine modelling and learning in rule-based systems. For example, an expert can describe an input-output relation in terms of a fuzzy rule base (as in fuzzy control). Afterward, the membership functions specifying the linguistic terms that have been employed by the expert can be adapted to the data in an optimal way. In other words, the expert specifies the rough structure of the rule-based model, while the fine-tuning ("model calibration") is done in a data-driven way. An alternative approach, called constraint-regularized learning, aims at exploiting fuzzy set-based modelling techniques within the context of the regularization (penalization) framework of inductive learning. Here, the idea is to express vague, partial knowledge about an input-output relation in terms of fuzzy constraints and to let such constraints play the role of a penalty term within the regularization approach. The optimal model is one that achieves an optimal trade-off between fitting the data and satisfying the constraints.

4. Conclusions

The previous sections have shown that FST can contribute to machine learning and data mining in various ways. Needless to say, for most of the issues that were addressed, a fuzzy approach will not be the only solution. Still, FST provides a relatively flexible framework in which different aspects of machine learning and data mining systems can be handled in a coherent way. In this regard, let us again highlight the following points:

1. FST has the potential to produce models that are more comprehensible, less complex, and more robust; fuzzy information granulation appears to be an ideal tool for trading off accuracy against complexity and understandability.
2. In data mining, fuzzy methods appear to be especially useful for representing vague patterns, a point of critical importance in many fields of application.
3. FST, in conjunction with possibility theory, can contribute considerably to the modelling and processing of various forms of uncertain and incomplete information.
4. Fuzzy methods appear to be particularly useful for data pre- and post-processing.

Despite the fact that substantial contributions have already been made to all of the aforementioned points, there is still space for improvement and a high potential for further developments. For example, concerning the first point, we already mentioned that notions like comprehensibility, simplicity, or robustness still lack an underlying formal theory including a quantification of their intuitive meaning in terms of universally accepted measures. Likewise, the fourth point has not received enough attention so far.

References

[1] Shu-Hsien Liao, Pei-Hui Chu, Pei-Yuan Hsiao, Data mining techniques and applications: A decade review from 2000 to 2011, Department of Management Sciences, Tamkang University, No. 151, Yingzhuan Rd., Tamsui Dist., New Taipei City 25137, Taiwan, ROC.
[2] Rudolf Kruse, Detlef Nauck, Christian Borgelt, Data Mining with Fuzzy Methods: Status and Perspectives, Department of Knowledge Processing and Language Engineering, Otto-von-Guericke-University of Magdeburg, Universitätsplatz 2, Magdeburg, Germany.
[3] H. Bandemer, W. Näther, Fuzzy Data Analysis, Kluwer Academic Publishers, Dordrecht.
[4] P. Diamond, P. Kloeden, Metric Spaces of Fuzzy Sets: Theory and Applications, World Scientific, Singapore.
[5] P. Diamond, H. Tanaka, Fuzzy regression analysis, in: R. Slowinski (Ed.), Fuzzy Sets in Decision Analysis, Operations Research and Statistics, Kluwer, 1998.
[6] F. Höppner, F. Klawonn, R. Kruse, T. Runkler, Fuzzy Cluster Analysis, Wiley, Chichester.
[7] R. Krishnapuram, J.M. Keller, A possibilistic approach to clustering, IEEE Trans. Fuzzy Syst. 1 (2) (1993).
[8] R. Weber, Fuzzy-ID3: a class of methods for automatic knowledge acquisition, in: IIZUKA-92, Proceedings of the 2nd International Conference on Fuzzy Logic, vol. 1, 1992.
[9] C.Z. Janikow, Fuzzy decision trees: issues and methods, IEEE Trans. Syst. Man Cybern. 28 (1) (1998).
[10] G. Chen, Q. Wei, E. Kerre, G. Wets, Overview of fuzzy associations mining, in: Proceedings of the ISIS-2003, 4th International Symposium on Advanced Intelligent Systems, Jeju, Korea, September.
[11] M. Delgado, N. Marin, D. Sanchez, M.A. Vila, Fuzzy association rules: general model and applications, IEEE Trans. Fuzzy Syst. 11 (2) (2003).
[12] E. Hüllermeier, Implication-based fuzzy association rules, in: Proceedings of the PKDD-01, 5th European Conference on Principles and Practice of Knowledge Discovery in Databases, Freiburg, Germany, 2001.
[13] D. Dubois, E. Hüllermeier, H. Prade, A note on quality measures for fuzzy association rules, in: Proceedings of the IFSA-03, 10th International Fuzzy Systems Association World Congress, number 2715 in LNAI, Springer-Verlag, Istanbul, 2003.
[14] T.Y. Lin, Y.Y. Yao, L.A. Zadeh (Eds.), Data Mining, Rough Sets and Granular Computing, Physica-Verlag, Heidelberg.
[15] A. Laurent, Generating fuzzy summaries: a new approach based on fuzzy multidimensional databases, Intell. Data Anal. J. 7 (2) (2003).
[16] T.Y. Lin, Y.Y. Yao, L.A. Zadeh (Eds.), Data Mining, Rough Sets and Granular Computing, Physica-Verlag, Heidelberg.
[17] D. Dubois, H. Prade, What are fuzzy rules and how to use them, 84 (1996).
[18] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery: an overview, in: Advances in Knowledge Discovery and Data Mining, MIT Press.
[19] A.P. Gasch, M.B. Eisen, Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering, Genome Biol. 3 (11) (2002).
[20] A.P. Gasch, M.B. Eisen, Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering, Genome Biol. 3 (11) (2002).
Fuzzy Methods in Machine Learning and Data Mining: Status and Prospects
Fuzzy Methods in Machine Learning and Data Mining: Status and Prospects Eyke Hüllermeier University of Magdeburg, Faculty of Computer Science Universitätsplatz 2, 39106 Magdeburg, Germany [email protected]
A FUZZY LOGIC APPROACH FOR SALES FORECASTING
A FUZZY LOGIC APPROACH FOR SALES FORECASTING ABSTRACT Sales forecasting proved to be very important in marketing where managers need to learn from historical data. Many methods have become available for
Knowledge Based Descriptive Neural Networks
Knowledge Based Descriptive Neural Networks J. T. Yao Department of Computer Science, University or Regina Regina, Saskachewan, CANADA S4S 0A2 Email: [email protected] Abstract This paper presents a
Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment,
Uncertainty Problems often have a certain amount of uncertainty, possibly due to: Incompleteness of information about the environment, E.g., loss of sensory information such as vision Incorrectness in
DESIGN AND STRUCTURE OF FUZZY LOGIC USING ADAPTIVE ONLINE LEARNING SYSTEMS
Abstract: Fuzzy logic has rapidly become one of the most successful of today s technologies for developing sophisticated control systems. The reason for which is very simple. Fuzzy logic addresses such
Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Rule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
Introduction to Fuzzy Control
Introduction to Fuzzy Control Marcelo Godoy Simoes Colorado School of Mines Engineering Division 1610 Illinois Street Golden, Colorado 80401-1887 USA Abstract In the last few years the applications of
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining
Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining by Ashish Mangalampalli, Vikram Pudi Report No: IIIT/TR/2008/127 Centre for Data Engineering International Institute of Information Technology
DATA MINING IN FINANCE
DATA MINING IN FINANCE Advances in Relational and Hybrid Methods by BORIS KOVALERCHUK Central Washington University, USA and EVGENII VITYAEV Institute of Mathematics Russian Academy of Sciences, Russia
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Linguistic Preference Modeling: Foundation Models and New Trends. Extended Abstract
Linguistic Preference Modeling: Foundation Models and New Trends F. Herrera, E. Herrera-Viedma Dept. of Computer Science and Artificial Intelligence University of Granada, 18071 - Granada, Spain e-mail:
Standardization of Components, Products and Processes with Data Mining
B. Agard and A. Kusiak, Standardization of Components, Products and Processes with Data Mining, International Conference on Production Research Americas 2004, Santiago, Chile, August 1-4, 2004. Standardization
Data Mining and Soft Computing. Francisco Herrera
Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: [email protected] http://sci2s.ugr.es
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,
Introduction to Pattern Recognition
Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University [email protected] CS 551, Spring 2009 © 2009, Selim Aksoy (Bilkent University)
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern KTI, TU Graz 2015-03-05 Outline 1 Introduction 2 Classification
A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining
A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining Binu Thomas and Rau G 2, Research Scholar, Mahatma Gandhi University,Kerala, India. [email protected] 2 SCMS School of Technology
Classification of Fuzzy Data in Database Management System
Classification of Fuzzy Data in Database Management System Deval Popat, Hema Sharda, and David Taniar 2 School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia Phone: +6 3
Project Management Efficiency A Fuzzy Logic Approach
Project Management Efficiency A Fuzzy Logic Approach Vinay Kumar Nassa, Sri Krishan Yadav Abstract Fuzzy logic is a relatively new technique for solving engineering control problems. This technique can
Product Selection in Internet Business, A Fuzzy Approach
Product Selection in Internet Business, A Fuzzy Approach Submitted By: Hasan Furqan (241639) Submitted To: Prof. Dr. Eduard Heindl Course: E-Business In Business Consultancy Masters (BCM) Of Hochschule
Data Mining and Database Systems: Where is the Intersection?
Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: [email protected] 1 Introduction The promise of decision support systems is to exploit enterprise
Prototype-based classification by fuzzification of cases
Prototype-based classification by fuzzification of cases Parisa KordJamshidi Dep.Telecommunications and Information Processing Ghent university [email protected] Bernard De Baets Dep. Applied Mathematics
Data Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
Data Mining with Fuzzy Methods: Status and Perspectives
Data Mining with Fuzzy Methods: Status and Perspectives Rudolf Kruse, Detlef Nauck, and Christian Borgelt Department of Knowledge Processing and Language Engineering Otto-von-Guericke-University of Magdeburg
DEVELOPMENT OF FUZZY LOGIC MODEL FOR LEADERSHIP COMPETENCIES ASSESSMENT CASE STUDY: KHOUZESTAN STEEL COMPANY
DEVELOPMENT OF FUZZY LOGIC MODEL FOR LEADERSHIP COMPETENCIES ASSESSMENT CASE STUDY: KHOUZESTAN STEEL COMPANY 1 MOHAMMAD-ALI AFSHARKAZEMI, 2 DARIUSH GHOLAMZADEH, 3 AZADEH TAHVILDAR KHAZANEH 1 Department
Data Mining and KDD: A Shifting Mosaic. Joseph M. Firestone, Ph.D. White Paper No. Two. March 12, 1997
Data Mining and KDD: A Shifting Mosaic By Joseph M. Firestone, Ph.D. White Paper No. Two March 12, 1997 The Idea of Data Mining Data Mining is an idea based on a simple analogy.
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data
A Data Mining Study of Weld Quality Models Constructed with MLP Neural Networks from Stratified Sampled Data T. W. Liao, G. Wang, and E. Triantaphyllou Department of Industrial and Manufacturing Systems
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and
Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table
Data mining and official statistics
Quinta Conferenza Nazionale di Statistica Data mining and official statistics Gilbert Saporta, President of the Société française de statistique, Roma, 15, 16, 17 November 2000 Palazzo dei Congressi Piazzale
On Development of Fuzzy Relational Database Applications
On Development of Fuzzy Relational Database Applications Srdjan Skrbic Faculty of Science Trg Dositeja Obradovica 3 21000 Novi Sad Serbia [email protected] Aleksandar Takači Faculty of Technology Bulevar
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Analyzing Customer Churn in the Software as a Service (SaaS) Industry
Analyzing Customer Churn in the Software as a Service (SaaS) Industry Ben Frank, Radford University Jeff Pittges, Radford University Abstract Predicting customer churn is a classic data mining problem.
College information system research based on data mining
2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) © (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei
Fuzzy regression model with fuzzy input and output data for manpower forecasting
Fuzzy Sets and Systems 9 (200) 205 23 www.elsevier.com/locate/fss Fuzzy regression model with fuzzy input and output data for manpower forecasting Hong Tau Lee, Sheu Hua Chen Department of Industrial Engineering
BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL
The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University
Adding New Level in KDD to Make the Web Usage Mining More Efficient
Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoo.com Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoo.com
QOS Based Web Service Ranking Using Fuzzy C-means Clusters
Research Journal of Applied Sciences, Engineering and Technology 10(9): 1045-1050, 2015 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2015 Submitted: March 19, 2015 Accepted: April
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier
A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Design of Prediction System for Key Performance Indicators in Balanced Scorecard
Design of Prediction System for Key Performance Indicators in Balanced Scorecard Ahmed Mohamed Abd El-Mongy. Faculty of Systems and Computers Engineering, Al-Azhar University Cairo, Egypt. Alaa el-deen
Optimization of Fuzzy Inventory Models under Fuzzy Demand and Fuzzy Lead Time
Tamsui Oxford Journal of Management Sciences, Vol. 0, No. (-6) Optimization of Fuzzy Inventory Models under Fuzzy Demand and Fuzzy Lead Time Chih-Hsun Hsieh (Received September 9, 00; Revised October,
Healthcare Data Mining: Prediction Inpatient Length of Stay
3rd International IEEE Conference Intelligent Systems, September 2006 Healthcare Data Mining: Prediction Inpatient Length of Stay Peng Liu, Lei Lei, Junjie Yin, Wei Zhang, Wu Naijun, Elia El-Darzi 1 Abstract
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress)
DATA MINING, DIRTY DATA, AND COSTS (Research-in-Progress) Leo Pipino University of Massachusetts Lowell [email protected] David Kopcso Babson College [email protected] Abstract: A series of simulations
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM
A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM MS. DIMPI K PATEL Department of Computer Science and Engineering, Hasmukh Goswami college of Engineering, Ahmedabad, Gujarat ABSTRACT The Internet
ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support
DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information
Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
Three Perspectives of Data Mining
Three Perspectives of Data Mining Zhi-Hua Zhou * National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China Abstract This paper reviews three recent books on data mining
Using Semantic Data Mining for Classification Improvement and Knowledge Extraction
Using Semantic Data Mining for Classification Improvement and Knowledge Extraction Fernando Benites and Elena Sapozhnikova University of Konstanz, 78464 Konstanz, Germany. Abstract. The objective of this
Comparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
EMPLOYEE PERFORMANCE APPRAISAL SYSTEM USING FUZZY LOGIC
EMPLOYEE PERFORMANCE APPRAISAL SYSTEM USING FUZZY LOGIC ABSTRACT Adnan Shaout* and Mohamed Khalid Yousif** *The Department of Electrical and Computer Engineering The University of Michigan Dearborn, MI,
DATA MINING CLASSIFICATION
DATA MINING CLASSIFICATION FABRICIO VOZNIKA, LEONARDO VIANA INTRODUCTION Nowadays there is a huge amount of data being collected and stored in databases everywhere across the globe.
Nine Common Types of Data Mining Techniques Used in Predictive Analytics
Nine Common Types of Data Mining Techniques Used in Predictive Analytics By Laura Patterson, President, VisionEdge Marketing Predictive analytics enable you to develop mathematical models to help better
Principles of Data Mining. Pham Tho Hoan [email protected]
Principles of Data Mining Pham Tho Hoan [email protected] References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,
Improving Computer Supported Environmental Friendly Product Development by Analysis of Data
ECIT/SISCS 2002-07-04 European Conferences on Intelligent Systems and Technologies IASI, ROMANIA Improving Computer Supported Environmental Friendly Product Development by Analysis of Data Ileana Hamburg
Introduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
Chapter 6. The stacking ensemble approach
This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
Data Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
Intuitionistic fuzzy load balancing in cloud computing
8 th Int. Workshop on IFSs, Banská Bystrica, 9 Oct. 2012 Notes on Intuitionistic Fuzzy Sets Vol. 18, 2012, No. 4, 19 25 Intuitionistic fuzzy load balancing in cloud computing Marin Marinov European Polytechnical
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Grid Density Clustering Algorithm
Grid Density Clustering Algorithm Amandeep Kaur Mann 1, Navneet Kaur 2, Scholar, M.Tech (CSE), RIMT, Mandi Gobindgarh, Punjab, India 1 Assistant Professor (CSE), RIMT, Mandi Gobindgarh, Punjab, India 2
Towards applying Data Mining Techniques for Talent Mangement
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) © (2011) IACSIT Press, Singapore Towards applying Data Mining Techniques for Talent Mangement Hamidah Jantan 1,
REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION
REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety
Meta-learning. Synonyms. Definition. Characteristics
Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore [email protected] (or search
CHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Mining the Software Change Repository of a Legacy Telephony System
Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,
Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining
Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa [email protected] skype, gtalk: avellido Tels.:
Computational Intelligence in Data Mining and Prospects in Telecommunication Industry
Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 2 (4): 601-605 Scholarlink Research Institute Journals, 2011 (ISSN: 2141-7016) jeteas.scholarlinkresearch.org Journal of Emerging
