Subjective Measures and their Role in Data Mining Process


Ahmed Sultan Al-Hegami
Department of Computer Science, University of Delhi, Delhi, INDIA

Abstract

Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from the huge amounts of data stored in databases. Data mining is the stage of the KDD process in which a particular data mining algorithm is applied to extract interesting knowledge. One of the most important aspects of any data mining task is the evaluation of the discovered knowledge. A major issue facing the data mining community is how to use existing knowledge about the domain to evaluate the discovered patterns. For patterns to be interesting, the user has to be involved by providing his/her prior knowledge about the domain. While objective measures can be quantified using statistical methods, subjective measures are determined by the user's understanding of the domain. Using objective measures of interestingness in popular data mining algorithms often leads to another data mining problem, although one of reduced complexity. Reducing the volume of the discovered patterns is desirable in order to improve the efficiency of the overall KDD process, and subjective measures of interestingness are required to achieve this. In this paper we study the subjective interestingness of the discovered patterns and show its role in extracting novel and interesting knowledge.

Keywords: Knowledge discovery in databases, data mining, subjective measures, objective measures, domain knowledge, classification, machine learning, decision tree.

1 Introduction

It is no exaggeration to say that the amount of information doubles every year due to the mechanical production of texts [2]. These potentially large datasets are rich in information, but it is difficult to find the meaningful facts we seek unless there are methods for developing models that exploit this wealth.
Researchers in different areas, such as Artificial Intelligence, Expert Systems, Statistics, Machine Learning and Databases, are struggling to find new mechanisms, methods and techniques to turn this ocean of data into useful, effective, meaningful and interesting information that can play an effective role in decision support systems [26].

Knowledge Discovery in Databases (KDD) is a new area of research that attempts to address the complexity mentioned above. It is the process of extracting previously unknown, hidden, novel and interesting knowledge from massive volumes of data stored in databases [18,44,16,7,11]. It is an iterative process carried out in three stages. The KDD process begins with understanding the problem and ends with the analysis and evaluation of the results. It includes preprocessing the data (Data Preparation stage), extracting information from the data (Data Mining stage), and analyzing the discovered knowledge (Analysis stage) [18,16].

The actual extraction of patterns is preceded by a preliminary analysis of the data, followed by the selection of a relevant horizontal or vertical subset and appropriate data transformations. This is the preprocessing stage of KDD, and it is considered to be the most time-consuming stage [45]. Often, the preparation of the data is influenced by the extraction algorithms used during the mining (second) stage.

Data mining algorithms are applied during the second stage of the KDD process, which is considered to be the core stage. It involves selecting and applying an appropriate mining algorithm to search for patterns in the data. Sometimes a combination of mining algorithms may be required to extract interesting patterns from the pre-processed data [15,54]. The outcome of this stage is the discovery of models/patterns hidden in databases, which are interpreted and analyzed during the third stage.

The final stage of the KDD process is the analysis and evaluation of the knowledge discovered in the second stage.
Obtaining patterns/models is not the end of the KDD process. Evaluation and analysis are equally important (if not more so), particularly in view of the proliferation of KDD techniques being used in real-life applications. It is common knowledge that the volume of patterns discovered by data mining algorithms becomes huge because of the large size of the target database [33,41,48,30,36]. Identifying interesting patterns within the vast set of discovered patterns remains fundamentally a mining problem, though one of reduced complexity. The time required to generate rules, the space required to store them, and the effort end users need to maintain and understand them are some of the practical issues that need attention.

2 Integrating Subjective Measures with Data Mining

The problem of reducing the volume of the discovered knowledge has been attacked at all three stages of the KDD process. Psaila proposed analyzing the data to identify meta-patterns during the pre-processing stage [43]. During the data mining stage, researchers commonly use either constraints or appropriate measures of interestingness to reduce the number of discovered rules. The third stage of the KDD process, which aims at analyzing and interpreting the discovered knowledge, is carried out by the end user. Post-analysis of the discovered patterns, as proposed in [33,32,30], helps the user focus on a small subset of the discovered patterns. Because users' needs vary widely and are subjective, end users design and develop need-based filters in an ad hoc manner. We briefly discuss the three approaches in the following subsections.

2.1 Interestingness Measures

The use of interestingness measures is one of the primary techniques for reducing the number of discovered rules. Interestingness measures guide the KDD process in both the mining stage and the analysis stage, restricting it to rules that are of interest to the user [12]. Two types of rule interestingness measures have been studied in the data mining literature, namely objective and subjective measures. Objective measures are based on the structure and statistical significance of the patterns [33,34,41]. Subjective measures are based on the subjectivity of the user, who evaluates the patterns on the basis of novelty, actionability, unexpectedness, etc. [32,48,49,58,59].

Novelty, unexpectedness and actionability are some of the subjective measures that are of immense importance to the end user of the KDD endeavor [12,36,59]. Novelty of a rule¹ is the extent to which the rule adds to the prior knowledge of the user [6,59]. Unexpectedness [33,36] is the extent to which the rule is surprising to the user.
Actionability indicates the benefit that the rule can bring to the user. It is implicitly captured by novelty and unexpectedness. It is important to distinguish between the novelty and unexpectedness measures. While the former implies discovering knowledge that is, to some extent, new, the latter implies discovering knowledge that would increase or decrease the user's expectation about the domain.

2.2 Constrained Mining

Constraint-based mining allows users to specify the rules to be discovered according to their background knowledge, thereby making the data mining process more effective. Han and Kamber elaborate on various types of constraints, viz. knowledge type constraints, data constraints and dimension/level constraints [22]. Although not all of the specified constraints can be pushed into the data mining algorithm [40], constraints have recently been used successfully to restrict the search space [8,9,10].

2.3 Post-processing Filters

After the rules have been discovered by the mining algorithm, further focusing is possible by using post-analysis filters. Because of their inherently subjective needs, end users design and develop their own post-processing filters in an ad hoc manner. Consequently, these filters are highly specific and have not been an active area of research. To the best of our knowledge, the only generalized work on post-analysis is reported in [31,32].

3 Interestingness Measures and Data Mining

3.1 Overview

One of the major issues facing the data mining community is how to use existing knowledge about the domain to discover novel and interesting rules. For this reason, researchers place great emphasis on interestingness measures as one of the most important ways of reducing the number of discovered rules. Such measures can help to limit the number of uninteresting patterns discovered. This issue is crucial, and it is also the most complicated one.
The interestingness measures can guide the analysis stage toward the rules that are of interest to the user [12]. Two aspects of rule interestingness have been studied in the data mining literature: objective and subjective measures [46,22,33,32,48,49,30,36,23,25,47,31]. Objective measures are data-driven and domain-independent. Generally, these measures evaluate the rules based on their quality and on the similarity between them, rather than considering the user's beliefs about the domain. Subjective measures, by contrast, are user-driven and domain-dependent. For example, the user may be asked to specify a rule template indicating which attribute(s) must occur in a rule for it to be interesting from his/her point of view [27]. As another example, the user may be asked to give a general, high-level description of his/her expectations about the domain, after which the system searches only for the rules that are unexpected to the user [33,32,30,31].

¹ Rules are one of the commonly used forms of representing the discovered models/patterns [13,46].

3.2 Objective Measures

Objective measures play a critical role in the different stages of the KDD process. In the data mining stage, quantitative measures can be used to reduce the search space. For instance, the support measure is used to reduce the number of itemsets to be examined [3,4,5]. In the evaluation stage of KDD, objective measures are used to select interesting rules from a set of discovered rules [48,49,20,21,24,42,50,51,53,56,57]. For instance, the confidence measure of association rules is used to select only strong rules from a set of discovered association rules [4]. Furthermore, objective measures are used when consolidating and acting on discovered rules. In this phase, they quantify the effectiveness and usefulness of the discovered rules. For instance, cost, classification error and classification accuracy are used in this role [19].

Objective measures are based on the structure and statistics of the patterns [46,33,41,49,14]. Many measures, such as confidence, support and classification error, are defined on the statistical characteristics of rules. Statistical methods are usually easier to use. They are applied to data or rules in order to determine the nature of the relationship between variables (attributes). Used as constraints during mining, such measures allow users to determine the interestingness of the rules to be discovered.
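The support and confidence measures mentioned above can be illustrated with a short sketch. It is not taken from the paper: the function names and the toy transactions are assumptions chosen for illustration, and the definitions used are the standard ones for association rules (support as the fraction of transactions containing an itemset, confidence as the support of the whole rule divided by the support of its antecedent).

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    s_a = support(a, transactions)
    return support(both, transactions) / s_a if s_a > 0 else 0.0


# Hypothetical toy data, not from the paper.
transactions: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support({"bread", "milk"}, transactions))       # 2 of 4 transactions
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions
```

A minimum-support threshold would prune itemsets during mining, while a minimum-confidence threshold would act as a filter over the rules produced, matching the two roles described in the text.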
It should be noted that objective and subjective measures are complementary. While objective measures can be used as a first filter to select potentially interesting rules, subjective measures can then be used as a final filter to select the rules that are truly of interest. Objective measures will not be studied further, as they are beyond the scope of this paper.

4 Subjective Measures

Subjective measures, such as actionability and unexpectedness, are based on the subjectivity of the user who examines the patterns [33,32,48,30,36,31]. This paper studies subjective interestingness. Two main subjective measures have been studied in the data mining literature, namely unexpectedness and actionability. A rule is unexpected if it contradicts the user's beliefs about the domain and therefore surprises the user [33,32,37]. A rule is actionable if the user can act on it to his/her advantage [33,32,1]. Another important subjective measure, which has received less attention in the data mining community, is the novelty of the discovered rules [18,17,49]. A rule is novel if it contributes, to some extent, to new knowledge.

4.1 Unexpectedness

Unexpectedness of the discovered rules has been studied extensively in the literature [41,48,49,30,36,37,38,52,27,23,25,47]. However, [41], [48,49], [30], [36], [37,38], [52] and [39] present different approaches to this measure.

[41] studied the interestingness of discovered rules in the context of a health care application. Their system, KEFIR, looks for deviations in the data and considers how a relevant action may affect a deviation. The system analyzes health care information to uncover key findings. Interesting rules are reported after their degree of interestingness has been measured, which is estimated by the amount of benefit obtained when an action is taken. The analyst provides recommendations based on his/her prior knowledge.
The system ranks all the rules by the interestingness of the deviation. It is considered a good method for incorporating domain knowledge into an application system. However, the system is domain-dependent and cannot be used for any other application.

[48,49] studied subjective interestingness by providing a framework for measuring the unexpectedness of rules with respect to the user's beliefs. They proposed using probabilistic beliefs and belief revision methods. The beliefs are used to define unexpectedness, and a revision method is used to modify the confidence in a belief when new evidence arrives. A rule is considered unexpected if it causes some change in this belief. In practice, it is difficult to obtain belief information, especially specific domain knowledge.

The approach presented in [30] is based on a syntactic comparison between a discovered rule and a rule in the domain knowledge. The two rules are considered dissimilar if either the consequents are similar but the antecedents are far apart, or the consequents are far apart but the antecedents are similar, where similarity and dissimilarity are defined on the structure of the rules. The problem with this approach is that it does not specify the degree of unexpectedness and does not consider the case in which both the antecedents and the consequents of the discovered rule and the domain-knowledge rule are dissimilar.

[36,37,38] proposed a new definition of unexpectedness in terms of a logical contradiction between a rule and a belief. Given a rule A → B and a belief X → Y, where A and X are antecedents and B and Y are single atomic conditions, if the rule A → B is unexpected with respect to the belief X → Y, then the rule A, X → B also holds.
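The syntactic comparison described above can be sketched as follows. This is an illustrative reading, not the procedure of [30]: the rule representation, the use of Jaccard similarity over attribute-value pairs, the thresholds `high` and `low`, and the returned labels are all assumptions made for the example.

```python
from typing import Dict, NamedTuple, Tuple


class Rule(NamedTuple):
    antecedent: Dict[str, str]   # attribute -> value conditions
    consequent: Tuple[str, str]  # single (attribute, value) pair


def antecedent_similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Jaccard similarity over attribute-value pairs (an assumed measure)."""
    pa, pb = set(a.items()), set(b.items())
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


def compare(discovered: Rule, belief: Rule,
            high: float = 0.5, low: float = 0.0) -> str:
    """Classify a discovered rule against one domain-knowledge rule."""
    sim = antecedent_similarity(discovered.antecedent, belief.antecedent)
    same_consequent = discovered.consequent == belief.consequent
    if sim >= high and not same_consequent:
        return "unexpected consequent"   # similar conditions, different outcome
    if sim <= low and same_consequent:
        return "unexpected condition"    # same outcome, distant conditions
    if sim >= high and same_consequent:
        return "conforming"
    return "unrelated"                   # both parts differ: not handled


# Hypothetical belief and discovered rules.
belief = Rule({"age": "old", "sex": "female"}, ("loan", "no"))
r1 = Rule({"age": "old", "sex": "female"}, ("loan", "yes"))
r2 = Rule({"job": "student"}, ("loan", "no"))
r3 = Rule({"job": "student"}, ("loan", "yes"))

for r in (r1, r2, r3):
    print(compare(r, belief))
```

The `"unrelated"` outcome corresponds to the limitation noted in the text: when both antecedent and consequent differ, a purely syntactic comparison of this kind has nothing to say about the rule.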

An alternative approach is presented in [52], which proposes an autonomous probabilistic estimation method that can discover all rule pairs (i.e., an exception rule associated with a common sense rule) with high confidence. The approach discovers pairs consisting of a rule A → B and its corresponding exception A, C → B', where A and C are conjunctions of <attribute, value> pairs and B and B' are <attribute, value> pairs corresponding to the same attribute but with different values. In addition, the unexpectedness of the exception rule is defined by the additional constraint that the reference rule C → B' has low confidence. Neither user evaluation nor domain knowledge is required in this approach.

Another approach to measuring subjective interestingness requires the user to specify which types of rules are interesting and which are uninteresting. Matching techniques are then applied to generate rules, taking the user's beliefs into consideration. [27] proposes this kind of user belief and uses a template-based approach in which the user specifies sets of interesting and uninteresting rules using templates. A template describes a set of rules in terms of the items occurring in the antecedent and consequent parts. Finally, the system retrieves the matching rules from the set of discovered rules.

Other methods of identifying subjectively interesting rules are query-based [23,25,47], for example M-SQL in [25], DMQL in [23] and Metaqueries in [47]. These methods treat the search for subjectively interesting rules as a query-based process. The user specifies a set of rules, or constraints on the rules, using a data mining query, and the system then finds the rules that satisfy this query [29]. The drawback of query-based approaches is that they find only the expected rules, those that match the query specified by the user. The truly interesting rules, which are unexpected or novel, can never be found by these methods.
Furthermore, the user may not be able to determine what is interesting to him/her.

4.2 Actionability

The actionability measure is based on the benefit of a rule to the user, that is, whether the user can do something to his/her advantage [12,33,32,48,49,30,31,29]. This measure is very important for rules to be interesting, in the sense that users are always looking for patterns that help them improve their performance and do better work. Users can take actions in response to actionable knowledge. It is important to remember that one of the primary application domains of most data mining algorithms is business. From a business point of view, information is not desired purely for its own sake. The practical purpose of obtaining information is to improve the business, that is, the information must support decision-making that makes the business succeed. However, in practice, it is not an easy task to determine which information is actionable.

[48,49] quantify actionability in terms of unexpectedness. They define unexpectedness as a subjective measure of interestingness. They show that most actionable knowledge is unexpected and most unexpected knowledge is actionable. Since actionability is a subjective measure that is hard to define, they propose that unexpectedness is a good approximation for actionability. Furthermore, they argue that actionability is a good measure for unexpectedness. Since unexpectedness is easier to measure than actionability, unexpectedness is the measure used to address actionability. In [48,49] subjective interestingness is divided into three categories:

1. Rules that are both unexpected and actionable,
2. Rules that are unexpected and not actionable, and
3. Rules that are expected and actionable.
They argue that a new metric is not needed, since categories 1 and 2 can be handled by finding rules that are unexpected, and category 3 can be handled by finding rules that conform to the user's existing knowledge about the domain. In fact, that process does not solve the problem of determining how actionability affects interestingness. The actionability and unexpectedness measures must be addressed individually, and each must be represented separately in the interestingness measure. A rule that is unexpected but not actionable is not as interesting as a rule that is both unexpected and actionable, so the two rules must be presented with different degrees of interestingness.

In order to measure actionability, the user needs to be involved. The user can rank each attribute based on his/her ability to act on that attribute [12]. It may also be possible to rank a rule based on its actionability. In addition, actionability can be measured with respect to different data mining algorithms, that is, the set of rules discovered by one algorithm may be more actionable than the rules discovered by another. Therefore, actionability has to be measured independently, not through unexpectedness.

4.3 Novelty Measure

A key factor in determining whether a KDD process is successful is whether it provides the user with previously unknown, useful and interesting knowledge [17,48,49]. The term "previously unknown" has been argued to imply "interesting" [48,49]. This implies that interestingness increases as the newness of the knowledge increases, and vice versa. For instance, consider the discovered rule age > 50 ∧ sex = female → loan = no. If the user does not know this rule and it has not been discovered previously, then a novel rule is provided to the user. This is interesting, since it increases the user's knowledge. However, if the user already knows this rule, no novel information is provided and the rule is considered uninteresting to the user.
Since the novelty measure is based on the user's perception of, and subjective judgment about, the discovered rules, it is considered a subjective measure.
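The loan example above can be written down as a minimal sketch in which a rule counts as novel only if it appears neither in the user's domain knowledge nor among previously discovered rules. The exact-match test and all names here are simplifying assumptions for illustration; the paper speaks of novelty as a matter of degree, which this binary check does not capture.

```python
from typing import FrozenSet, NamedTuple, Set


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # conditions, e.g. "age>50"
    consequent: str             # e.g. "loan=no"


def is_novel(rule: Rule, domain_knowledge: Set[Rule],
             previously_discovered: Set[Rule]) -> bool:
    """True if the rule is unknown to the user and not seen in earlier runs."""
    return rule not in domain_knowledge and rule not in previously_discovered


# The rule from the text: age > 50 and sex = female -> loan = no.
loan_rule = Rule(frozenset({"age>50", "sex=female"}), "loan=no")

# Case 1: the user does not know the rule and it was never discovered before.
print(is_novel(loan_rule, domain_knowledge=set(), previously_discovered=set()))

# Case 2: the user already knows the rule, so nothing new is provided.
print(is_novel(loan_rule, domain_knowledge={loan_rule},
               previously_discovered=set()))
```

Using a frozen set for the antecedent makes the comparison independent of the order in which the conditions are written.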

Novelty is a very important aspect of the KDD process, and it can be applied at the different stages of that process. In the pre-processing stage, the novelty measure can be used as a filter to select and concentrate on a set of instances that should be given more attention. It can also be used to determine which features are more important to the learning algorithms, and hence to focus attention when something new arrives. In the second stage of the KDD process, the novelty measure can guide the mining process by forming a constraint so that only novel rules are discovered. In the post-processing stage of KDD, this measure can be used to analyze the discovered knowledge objectively and/or subjectively, forming a filter that reduces the number of discovered rules so that they are easier for the user to understand.

Many proposals have studied novelty in other disciplines, such as robotics, machine learning and statistical outlier detection. Generally, these methods build a model of a training set that is selected to contain no examples of the important (i.e. novel) class. Once the model is built, deviations from it are detected in some way. For instance, Kohonen and Oja proposed a novelty filter based on computing the bit-wise difference between the current input and the closest match in the training set [28]. In [55], a sample application of association rule learning is presented. By monitoring the variance of the confidence of particular rules inferred by association rule learning on training data, the method provides information on how such parameters differ before and after the test data enters the system. Hence, with a pre-defined threshold, abnormalities can be detected fairly reliably. The techniques proposed in the statistical literature focus on modeling the support of the dataset and then detecting inputs that do not belong to that support.
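The idea of measuring the bit-wise difference between an input and its closest match in the training set can be sketched as below. This follows only the one-sentence description given above and is not a reconstruction of the filter in [28]; the Hamming-distance scoring, the `threshold` parameter and the sample patterns are assumptions for illustration.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def novelty_score(x: Sequence[int], training: List[Sequence[int]]) -> int:
    """Bit-wise difference between x and its closest match in the training set."""
    if not training:
        raise ValueError("training set must not be empty")
    return min(hamming(x, t) for t in training)


def is_novel(x: Sequence[int], training: List[Sequence[int]],
             threshold: int = 1) -> bool:
    """Flag x as novel when it differs from every training pattern by more
    than `threshold` bits (the threshold is a hypothetical tuning knob)."""
    return novelty_score(x, training) > threshold


# Hypothetical "normal" patterns and two test inputs.
training = [(0, 0, 1, 1), (0, 1, 1, 1)]
close_input = (0, 0, 1, 0)   # one bit away from the first pattern
far_input = (1, 1, 0, 0)     # at least three bits away from both

print(novelty_score(close_input, training), is_novel(close_input, training))
print(novelty_score(far_input, training), is_novel(far_input, training))
```

The same structure, a model of what is already known plus a deviation score and a threshold, carries over to rules if a distance between rules is substituted for the Hamming distance.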
The choice between statistical methods and machine learning methods depends on the data that is available, the application and the domain knowledge [35].

To our knowledge, no concrete work has been conducted on the novelty measure in data mining. The only work that has been proposed detects the novelty of rules mined from text [6]. In [6], novelty is estimated based on the lexical knowledge in WordNet. The proposed approach defines a measure of semantic distance between two words in WordNet, determined by the length of the shortest path between the two words (w_i, w_j). Novelty is then defined as the average of this distance across all pairs of words (w_i, w_j), where w_i is a word in the antecedent and w_j is a word in the consequent. In [59], we proposed a framework that quantifies novelty by computing the deviation of currently discovered knowledge from domain knowledge and previously discovered knowledge. The approach presented in [59] is used as a post-analysis filter in order to discover only novel rules.

5 COMPARISON OF SUBJECTIVE MEASURES

Most existing approaches to measuring subjective interestingness require the user to state explicitly what type of knowledge he/she expects. The system then applies search techniques to select rules according to the user's prior expectations. Most of these approaches concentrate on unexpectedness and actionability as the most influential aspects of rule interestingness. However, no general approach has been proposed for handling novelty. Although actionability and unexpectedness are important, a rule must also be novel to be interesting. We assume that if a rule is novel, this implies, explicitly or implicitly, that the rule may also be unexpected and/or actionable.
Even though the unexpectedness measure may resemble novelty in some respects, most of the research on unexpectedness has focused on generating rules that contradict the user's beliefs about the domain [33,32,48,49,30,36]. Table 1 shows the subjective measures and their relative importance.

               | Unexpected Rule  | Expected Rule   | Actionable Rule  | Non-Actionable Rule
Novel Rule     | Most Interesting | Not Interesting | Most Interesting | More Interesting
Non-Novel Rule | Less Interesting | Not Interesting | Less Interesting | Not Interesting

Table 1. Subjective interestingness measures categories

References

1. Adomavicius, G., Tuzhilin, A., Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach, In Proceedings of the Third International Conference of Knowledge Discovery & Data Mining, The AAAI Press.
2. Adriaans, P., Zantinge, D., Data Mining, 1st edition, Addison Wesley, Longman.
3. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Inkeri Verkamo, A., Fast Discovery of Association Rules, In Advances in Knowledge Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P., Menlo Park, CA: AAAI/MIT Press.
4. Agrawal, R., Imielinski, T., Swami, A., Mining Association Rules between Sets of Items in Large Databases, In ACM SIGMOD Conference of Management of Data, Washington D.C.
5. Agrawal, R., Srikant, R., Fast Algorithms for Mining Association Rules in Large Databases, In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994.

6. Basu, S., Mooney, R. J., Pasupuleti, K. V., Ghosh, J., Using Lexical Knowledge to Evaluate the Novelty of Rules Mined from Text, In Proceedings of the NAACL workshop and other Lexical Resources: Applications, Extensions and Customizations.
7. Brachman, R. J., Anand, T., The Process of Knowledge Discovery in Databases, In Advances in Knowledge Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P., Menlo Park, CA: AAAI/MIT Press.
8. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., Adaptive Constraint Pushing in Frequent Pattern Mining, In Proceedings of the 17th European Conference on PAKDD03.
9. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., ExAMiner: Optimized Level-wise Frequent Pattern Mining with Monotone Constraints, In Proceedings of the 3rd International Conference on Data Mining (ICDM03).
10. Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D., Exante: Anticipated Data Reduction in Constrained Pattern Mining, In Proceedings of the 7th PAKDD03, 2003.
11. Cabena, P., Hadjinian, P., Stadler, R., Verhess, J., Zanasi, A., Discovering Data Mining from Concepts to Implementation, New Jersey, Prentice Hall.
12. Clair, C., A Usefulness Metric and its Application to Decision Tree Based Classification, Ph.D. thesis, School of Computer Science, USA.
13. Clark, P., Niblett, T., The CN2 Induction Algorithm, In Machine Learning 3(4).
14. Dhar, V., Tuzhilin, A., Abstract-Driven Pattern Discovery in Databases, In IEEE Transactions on Knowledge and Data Engineering 5(6).
15. Duda, R. O., Hart, P. E., Stork, D. G., Pattern Classification, 2nd Edition, John Wiley & Sons (Asia) PV. Ltd.
16. Dunham, M. H., Data Mining: Introductory and Advanced Topics, 1st Edition, Pearson Education (Singapore) Pte. Ltd.
17. Fayyad, U. M., Djorgovski, S. G., Weir, N., Automating the Analysis and Cataloging of Sky Surveys, In Advances in Knowledge Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P., Menlo Park, CA: AAAI/MIT Press.
18. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., From Data Mining to Knowledge Discovery, In Advances in Knowledge Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P., Menlo Park, CA: AAAI/MIT Press.
19. Freitas, A. A., On Rule Interestingness Measures, Knowledge-Based Systems 12.
20. Gray, B., Orlowska, M. E., CCAIIA: Clustering Categorical Attributes into Interesting Association Rules, In Proceedings of the 2nd Pacific-Asia Conference, PAKDD-98, Lecture Notes in Artificial Intelligence.
21. Guillaume, S., Guillet, F., Philippé, J., Improving the Discovery of Association Rules with Intensity of Implication, In Proceedings of the 2nd European Symposium, PKDD98, Lecture Notes in Artificial Intelligence.
22. Han, J., Kamber, M., Data Mining: Concepts and Techniques, 1st Edition, Harcourt India Private Limited.
23. Han, J., Fu, Y., Wang, W., Koperski, K., Zaiane, O., DMQL: A Data Mining Query Language for Relational Databases, In Proceedings of the SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.
24. Hong, J., Mao, C., Incremental Discovery of Rules and Structure by Hierarchical and Parallel Clustering, In Knowledge Discovery in Databases.
25. Imielinski, T., Virmani, A., Abdulghani, A., DataMine: Application Programming Interface and Query Language for Database Mining, KDD-96.
26. Janakiramn, V. K., Saurukesi, K., Decision Support Systems, 2nd edition, Prentice-Hall, India.
27. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A. I., Finding Interesting Rules from Large Sets of Discovered Association Rules, In Proceedings of the 3rd International Conference on Information and Knowledge Management, Gaithersburg, Maryland.
28. Kohonen, T., Self-Organization and Associative Memory, 3rd edition, Springer, Berlin.
29. Liu, B., Hsu, W., Chen, S., Ma, Y., Analyzing the Subjective Interestingness of Association Rules, IEEE Intelligent Systems.
30. Liu, B., Hsu, W., Post Analysis of Learned Rules, In Proceedings of the 13th National Conference on AI (AAAI 96).
31. Liu, B., Hsu, W., Lee, H-Y., Mum, L-F., Tuple-Level Analysis for Identification of Interesting Rules, In Technical Report TRA5/95, SoC., National University of Singapore, Singapore.
32. Liu, B., Hsu, W., Finding Interesting Patterns Using User Expectations, DISCS Technical Report.
33. Liu, B., Hsu, W., Chen, S., Using General Impressions to Analyze Discovered Classification Rules, In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD 97).
34. Luger, G. F., Artificial Intelligence: Structure and Strategies for Complex Problem Solving, 4th Edition, Pearson Education Ltd., Delhi, India.
35. Marsland, S., On-Line Novelty Detection Through Self-Organization, with Application to Robotics, Ph.D. Thesis, Department of Computer Science, University of Manchester.
36. Padmanabhan, B., Tuzhilin, A., Unexpectedness as a Measure of Interestingness in Knowledge Discovery, Working paper # IS-97-. Dept. of Information Systems, Stern School of Business, NYU.
37. Padmanabhan, B., Tuzhilin, A., A Belief-Driven Method for Discovering Unexpected Patterns, KDD-98.
38. Padmanabhan, B., Tuzhilin, A., Small is Beautiful: Discovering the Minimal Set of Unexpected Patterns, KDD-2000.
39. Patterson, D. W., Introduction to Artificial Intelligence and Expert Systems, 8th Edition, Prentice-Hall, India.
40. Pei, J., Han, J., Can We Push More Constraints into Frequent Pattern Mining, In Proceeding of the 6th ACM SIGKDD.
41. Piatetsky-Shapiro, G., Matheus, C. J., The Interestingness of Deviations, In Proceedings of AAAI Workshop on Knowledge Discovery in Databases.
42. Piatetsky-Shapiro, G., Discovery, Analysis, and Presentation of Strong Rules, In Knowledge Discovery in Databases, The AAAI Press.
43. Psaila, G., Discovery of Association Rules Meta-Patterns, In Proceedings of 2nd International Conference on Data Warehousing and Knowledge Discovery (DAWAK99), 1999.
44. Pujari, A. K., Data Mining Techniques, 1st Edition, Universities Press (India) Limited.
45. Pyle, D., Data Preparation for Data Mining, Morgan Kaufmann, San Francisco, CA, USA.
46. Quinlan, J. R., C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993.

7 47. Shen, W-M., Ong, K-L., Mitbander, B., Zaniolo, C., Metaqueries for Data Mining, In Advances in Knowledge Discovery and Data Mining, Edited by Fayyad, U. M. & Piatetsky-Shapiro, G. & Uthurusamy, P. Menlo Park, CA:AAAI/MIT Press, Silberschatz, A., Tuzhilin, A., On Subjective Measures of Interestingness in Knowledge Discovery, In Proceedings of the 1 st International Conference on Knowledge Discovery and Data Mining, Silberschatz, A., Tuzhilin, A., What Makes Patterns Interesting in Knowledge Discovery Systems, IEEE Trans. and Data Engineering. V.5, no.6, Smyth, P., Goodman, R. M., Rule Induction Using Information Theory, In Knowledge Discovery in Databases, Suzuki, E., Kodratoff, Y., Discovery of Surprising Exception Rules Based on Intensity of Implication, In Proceedings of the 2 nd European Symposium, PKDD98, Lecture Notes in Artificial Intelligence, Suzuki, E., Autonomous Discovery of Reliable Exception Rules, In Proceedings of The 3 rd International Conference on Knowledge Discovery and Data Mining, Newport Beach, CA, USA, Wang, K., Tay, S.H.W., Liu, B., Interestingness-Based Interval Merger for Numeric Association Rules, In Proceedings of the 4 th International Conference on Knowledge and Data Mining, Williams, G. J., Evolutionary Hot Spot Data Mining: An Architecture for Exploring For Interesting Discoveries, In Proceeding of the 3 rd PAKDD99, Yairi, T., Kato, Y., Hori K., Fault Detection by Mining Association Rules from House-keeping Data, In Proceedings of International Symposium on Artificial Intelligence, Robotics and Automation in Space (SAIRAS 2001), Yao, Y. Y., Liau, C. J., A generalized Decision Logic Language for Granular Computing, FUZZ-IEE on Computational Intelligence, Yao, Y. Y., Zhong, N., An Analysis of Quantitative Measures Associated with Rules, In Proceedings of PAKDD, Al-Hegami, A. 
S., Interestingness Measures for KDD: A Comparative Analysis, In Proceedings of the 11 th International Conference on Concurrent Engineering: Research and Applications, Beijing, China, 2004, pp Al-Hegami, A. S., Kumar, N., Bhatnagar, V., Novelty Framework for Knowledge Discovery in Databases, In Proceedings of the 6 th International Conference on Data warehousing and Knowledge Discovery (DaWak 2004), Zaragoza, Spain, 2004, pp


More information

An Overview of Temporal Data Mining

An Overview of Temporal Data Mining An Overview of Temporal Data Mining Weiqiang Lin Department of Computing I.C.S., Macquarie University Sydney, NSW 2109, Australia wlin@ics.mq.edu.au Mehmet A. Orgun Department of Computing I.C.S., Macquarie

More information

Privacy Preserved Association Rule Mining For Attack Detection and Prevention

Privacy Preserved Association Rule Mining For Attack Detection and Prevention Privacy Preserved Association Rule Mining For Attack Detection and Prevention V.Ragunath 1, C.R.Dhivya 2 P.G Scholar, Department of Computer Science and Engineering, Nandha College of Technology, Erode,

More information

Knowledge-Based Visualization to Support Spatial Data Mining

Knowledge-Based Visualization to Support Spatial Data Mining Knowledge-Based Visualization to Support Spatial Data Mining Gennady Andrienko and Natalia Andrienko GMD - German National Research Center for Information Technology Schloss Birlinghoven, Sankt-Augustin,

More information

II. OLAP(ONLINE ANALYTICAL PROCESSING)

II. OLAP(ONLINE ANALYTICAL PROCESSING) Association Rule Mining Method On OLAP Cube Jigna J. Jadav*, Mahesh Panchal** *( PG-CSE Student, Department of Computer Engineering, Kalol Institute of Technology & Research Centre, Gujarat, India) **

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK

HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK HYBRID INTRUSION DETECTION FOR CLUSTER BASED WIRELESS SENSOR NETWORK 1 K.RANJITH SINGH 1 Dept. of Computer Science, Periyar University, TamilNadu, India 2 T.HEMA 2 Dept. of Computer Science, Periyar University,

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com

INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR. ankitanandurkar2394@gmail.com IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY DATA MINING IN HEALTHCARE SECTOR Bharti S. Takey 1, Ankita N. Nandurkar 2,Ashwini A. Khobragade 3,Pooja G. Jaiswal 4,Swapnil R.

More information

Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region

Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region International Journal of Computational Engineering Research Vol, 03 Issue, 8 Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region 1, Salim Diwani, 2, Suzan Mishol, 3, Daniel

More information

Mining Generalized Query Patterns from Web Logs

Mining Generalized Query Patterns from Web Logs Mining Generalized Query Patterns from Web Logs Charles X. Ling* Dept. of Computer Science, Univ. of Western Ontario, Canada ling@csd.uwo.ca Jianfeng Gao Microsoft Research China jfgao@microsoft.com Huajie

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information