A Case Study in Knowledge Acquisition for Insurance Risk Assessment using a KDD Methodology

Graham J. Williams and Zhexue Huang
CSIRO Division of Information Technology
GPO Box 664, Canberra ACT 2601, Australia
[email protected]

Abstract

We describe some initial experiences in dealing with the task of acquiring knowledge where a very large collection of case histories is available. A Knowledge Discovery in Databases (KDD) approach is taken. KDD is the process of extracting novel information and knowledge from large databases. It consists of many interacting stages, each performing specific data manipulation and transformation operations, with an information flow from one stage to the next (and usually with feedback into previous stages). We characterise our experiences of this process for the task of acquiring knowledge for the domain of motor vehicle insurance premium setting for NRMA Insurance Limited.

Keywords: Knowledge acquisition, knowledge discovery in databases, data mining, insurance premiums, risk analysis, fraud.

(Presented at PKAW96, the Pacific Rim Knowledge Acquisition Workshop, Sydney, Australia, October 1996.)

1 Introduction

Knowledge Discovery in Databases (KDD) is commonly defined as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro and Smyth 1996). KDD technology combines techniques from a variety of related disciplines, notably Databases, Artificial Intelligence, Statistics, Scientific Discovery, and Visualisation. KDD is identical to none of them, but draws upon the research and technology of all of them. A particular focus of KDD is on extremely large real world training datasets, often of sizes measured in gigabytes and terabytes. The sheer size alone eliminates many existing techniques (from statistics, machine learning, and knowledge acquisition) for automatically, let alone manually, analysing the data. With such a wealth of data available, a knowledge engineering exercise in extracting interesting and useful knowledge needs to be carefully guided by the domain experts.

We relate in this paper our experience in using a KDD process to assist in the task of knowledge acquisition when very large training datasets are available. The KDD process is recognised as consisting of a number of interacting, iterative stages involving various data manipulation and transformation operations. Information flows from one stage to the next, and also backwards to previous stages. Fayyad, Piatetsky-Shapiro and Smyth (1996), for instance, identify 9 steps in the KDD process, whilst Williams and Huang (1996) identify a simpler, yet still sufficient, model of the process involving 4 primary stages. Most attention within the KDD community has focused on the Data Mining (algorithmic exploration of the data) stage of the process. It is critical, however, to recognise that Data Mining is only one part of the whole process (Brachman and Anand 1996).
Anecdotal evidence (and our own experience) suggests that the other stages account for up to 95% of the effort; it is important, then, to understand the whole KDD process.

Our project brings together a team of researchers in Databases, Machine Learning, Statistics, and Visualisation to perform KDD research and development with industry partners who provide domain expertise and real world databases. A particular focus of the project is the use of high performance computers and parallel algorithms. Domains ranging from insurance fraud to astronomy have been studied.

In this paper we first present a model of the KDD process which identifies the necessary and sufficient elements of the process and supports the variety of operations required for knowledge acquisition using KDD. Section 2 identifies some of the issues that characterise KDD. We then present a view of the KDD process in Section 3. Sections 4 to 7 then illustrate the various aspects of the model in the context of an actual case study in insurance risk analysis performed with NRMA Insurance Limited, one of Australia's largest general insurers.

2 Characteristics of KDD Applications

What distinguishes KDD from related research areas such as Knowledge Acquisition, Machine Learning, and Statistics is the size of the training dataset available, the complexity of that data, and the expected results, rather than the particular methods and algorithms used. But more than this, KDD is viewed as an all-encompassing process for the discovery of knowledge in databases.

KDD attempts to deal with real world problems and hence real world data. The source data for KDD is often extracted from databases which are generally not built with KDD in mind. The data must be cleansed and moulded into a form suitable for the particular KDD task. It must then be transformed into a format that the particular knowledge acquisition or machine learning tools can work with. Some of the issues which need to be addressed within the KDD context include noise, redundant information, missing values and attributes, large data sets, and sparse data (Frawley, Piatetsky-Shapiro and Matheus 1992, Matheus, Chan and Piatetsky-Shapiro 1993).

Redundancy can result from the inclusion of attributes and records in the source data which are irrelevant or superfluous to the data mining. Determining which attributes and records are redundant is usually difficult. Removing apparently irrelevant attributes can lead to a reduction in the types of new knowledge that can be discovered. In insurance risk analysis, for example, the office at which an insurance application was lodged may seem irrelevant, yet could be a particularly interesting risk factor. Leaving truly redundant attributes in the data, though, can lead to aberrant results or can significantly increase the time taken to run the data mining algorithms.

Redundancy can also occur when multiple instances in the database represent the same object, each perhaps recording different information. For many data mining algorithms this is not a desirable situation; the algorithms often assume independence between records. A typical example is when the source database simply records transactions performed by clients. While some data mining algorithms, such as the association algorithms of Agrawal and Srikant (1994), work directly with such data, many require the data to be entity (in our case, insurance policy) oriented.
The problem of missing attributes often arises because of the desire to use more information in data mining than the original designers of the database considered necessary. The missing data can only sometimes be obtained, and then at a cost.

The volume of data available for a KDD task is an obvious problem, but one which often leads to the related problem of sparse data. Data can be sparse, for example, when there are many missing values for various attributes. It can also be sparse when the concept of particular interest does not occur very often in the data. This is the case, for example, in using KDD to assist in fraud detection in large insurance databases, where instances of fraud are usually relatively rare.

Another issue which complicates KDD relates to its exploratory nature. Often in starting a KDD task it will not be clear exactly what is to be discovered. In general there is a broad idea of what is expected, but as the process proceeds and results are generated the directions taken may change quite dramatically.
Although the nature of the data may lead us to use particular techniques, it is not always clear a priori which tools will produce the best results. Indeed, the use of a variety of tools and techniques often leads to an interesting variety of results.

3 The KDD Process

We view the process of knowledge discovery from databases as consisting of the four stages identified in Figure 1. This collapses the 9 steps identified by Fayyad, Piatetsky-Shapiro and Smyth (1996) into a smaller number of higher-level stages, to more simply identify the whole process. It is important, though, to also appreciate the complexity of the process.

[Figure 1: The four stage KDD process. Database --(Selection)--> Source Data --(Preprocess)--> Working Data --(Extraction)--> Patterns --(Evaluation)--> Knowledge.]

In most cases we begin with an original Database, usually developed for tasks other than KDD. After due consideration (which can require initial exploratory runs through the KDD process to gain better insights into what is required from the Database), an appropriate collection of Source Data is extracted from the Database. This Source Data forms the basis for the rest of the KDD process.

3.1 Data Preprocessing

The purpose of the Preprocessing stage is to cleanse the data as much as possible and to put it into a form that is suitable for use in later stages. The types of Preprocessing that a KDD environment needs to support include traditional database projection and selection, value mapping, and classification functions. Data cleansing operations, such as those that resolve fuzzy matches between records that actually represent the same entity, are also useful.

The original Database is generally stored and maintained separately from the KDD process. Control of, and access to, this original Database is usually restricted (by legislation and commercial concerns) due to the confidential nature of the material of interest. The Selection stage, then, is usually performed by the data owners, with input on data requirements from the KDD team. The Source Data often requires cleansing as part of the KDD process and may be transformed in many ways to turn it into suitable Working Data. These operations are part of the arsenal represented by the set of operations S (a sketch of some typical operations follows at the end of this subsection).

Meta-knowledge about the Source Data is also important in KDD. This includes semantic information provided by the schema, the domains of attributes (including types and value ranges), the distributions of attribute values, and the relations between attributes. This meta-knowledge is usually obtained from domain experts or can be calculated directly from the data. It is often considered as part of the background knowledge.
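As a concrete illustration of the kinds of operations collected in S, the following minimal sketch (in Python with pandas, using entirely hypothetical field names) shows projection, selection, value mapping, and a simple classification function:

```python
import pandas as pd

def preprocess(source: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing operations from the set S (hypothetical fields)."""
    # Projection: keep only the attributes deemed relevant.
    working = source[["policy_id", "birth_date", "postcode", "claim_cost"]].copy()

    # Value mapping: transform birth dates into ages at the start of the period.
    period_start = pd.Timestamp("1994-07-01")
    working["age"] = (period_start - pd.to_datetime(working["birth_date"])).dt.days // 365
    working = working.drop(columns=["birth_date"])

    # Selection: drop records with critical missing values.
    working = working.dropna(subset=["postcode"])

    # Classification function: band a continuous attribute into categories.
    working["age_band"] = pd.cut(working["age"], bins=[0, 20, 25, 40, 60, 120])
    return working
```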
3.2 Knowledge Extraction

Once a suitable Working Dataset has been produced, the knowledge extraction phase begins. A variety of tools can be used to explore different types of patterns in the Working Data. Some tools produce information that is used by other tools (e.g., statistical tools used to determine characteristics of the data required as input parameters for machine learning). The extraction stage itself is often cyclic. In Statistics it is often represented as a cycle between three stages: model identification; parameter estimation; and diagnostic checking of the fitted model.

The variety of techniques used in extraction includes clustering, decision tree induction, neural networks, genetic algorithms, and a variety of statistical algorithms. These operations are usually oriented towards data description (as in clustering) or model fitting (as in classification). Classification rules are often extracted using these techniques and can be used for prediction on unseen objects. Visualisation tools, which include methods and algorithms for viewing the internal representation of the data in various human-interpretable forms, are also useful. These tools enable close involvement of the domain experts in all of the KDD stages; domain experts are often more sensitive to subtleties of the data than automated algorithms.

3.3 Pattern Evaluation

Using a variety of tools, and even tuning a single tool in a variety of ways, leads to the discovery of many different patterns. The fourth stage of KDD involves an evaluation of the patterns discovered in order to find those that give rise to useful and novel knowledge. Evaluation is a non-trivial task. A Pattern Evaluation function is used to evaluate the interestingness of discovered knowledge from the user's viewpoint. KDD requires that the discovered knowledge be useful in some sense. The Pattern Evaluation function governs the selection of the patterns which are of interest to the user, and has been identified by others as an integral part of KDD (Frawley et al. 1992, Matheus et al. 1993, Fayyad, Piatetsky-Shapiro and Smyth 1996). Pattern evaluation is also important in reducing the search space (Holsheimer, Kersten, Mannila and Toivonen 1995).

Formally, a Pattern Evaluation function F is a function that maps from a set of statements expressed in some language L (e.g., production rules) to a set of (usually) numeric values. The evaluation might consider each pattern in the context of all discovered patterns, or patterns might be evaluated for their usefulness, novelty, and validity in the context of the application domain. Evaluation functions are often quite complex and application specific.

Domain experts often provide the most effective form of evaluation of discovered patterns. Visualisation tools are a critical component of this Evaluation, providing effective presentations of the results of mining. In such cases there is little need, nor desire, to implement algorithmic evaluation functions. Where evaluation can be automated, however, it is advantageous to do so, to assist in automatically processing the large collection of discoveries common in the KDD context. Evaluation functions can be generic, expressed in terms of the data mining tools being used (e.g., a good pattern is one that minimises the misclassification cost of the discovered rules). Alternatively they can be specific to an application, where the usefulness of a rule might be measured in terms of the associated claim costs in an insurance application (a sketch of such a function follows below).
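The following sketch illustrates what an application-specific evaluation function F might look like for this domain. The Rule structure and its fields are hypothetical, anticipating the rule annotations described in Section 7.2 (claims, exposure in days, and total claim cost):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A discovered pattern: a conjunction of conditions with observed outcomes."""
    conditions: list[str]
    n_claims: int
    exposure_days: float
    total_claim_cost: float

def evaluate(rule: Rule) -> float:
    """Application-specific F: score a rule by claim cost per unit of exposure,
    so that high-cost, low-exposure risk areas rank highest."""
    if rule.exposure_days == 0:
        return 0.0
    return rule.total_claim_cost / rule.exposure_days

# Rank discovered rules, most "interesting" (costly per day of exposure) first.
rules = [
    Rule(["age <= 20", "sex = Male"], n_claims=15, exposure_days=2000.0, total_claim_cost=90000.0),
    Rule(["vehicle_class = A"], n_claims=3, exposure_days=5000.0, total_claim_cost=4500.0),
]
ranked = sorted(rules, key=evaluate, reverse=True)
```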
3.4 The KDD Model

The KDD process as described above represents the typical scenario for those involved in knowledge discovery from databases, even though the details of the process may differ between KDD applications. The key elements of the process are: the data; the background knowledge and the discovered patterns and knowledge; the method(s) for evaluating discovered patterns; and the collection of operations associated with the different stages of the process. The basic model is thus identified as a four-tuple < D, L, F, S > consisting of: a database D; a knowledge representation language L; evaluation functions F; and operations S.

It is important to emphasise that for the successful application of the KDD process both domain expertise and KDD expertise are crucial, and that the KDD process is very much an iterative process where earlier stages often need to be refined to address discoveries made in later stages.
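One way to make the four-tuple concrete is as a small data structure. The sketch below is illustrative only, with pandas standing in for the database D and Python callables for F and the operations in S:

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

import pandas as pd

@dataclass
class KDDModel:
    """An illustrative rendering of the four-element model <D, L, F, S>."""
    D: pd.DataFrame                                   # the database / source data
    L: str                                            # representation language, e.g. "production rules"
    F: Callable[[Any], float]                         # pattern evaluation function
    S: list[Callable[[pd.DataFrame], pd.DataFrame]]   # data manipulation operations

    def one_pass(self, extract: Callable[[pd.DataFrame], Iterable[Any]]) -> list[Any]:
        """A single pass: preprocess with S, extract patterns, rank them with F."""
        data = self.D
        for op in self.S:                  # Preprocessing stage
            data = op(data)
        patterns = list(extract(data))     # Extraction (data mining) stage
        return sorted(patterns, key=self.F, reverse=True)  # Evaluation stage
```

In practice the process iterates: the ranked results of one pass feed back into revised operations in S and refined source data.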
4 Case Study: Insurance Risk Analysis

Assessing the performance of an insurance portfolio requires both overall and within-portfolio analyses. The overall analysis demonstrates whether the portfolio was profitable in the past, whereas a detailed within-portfolio analysis reveals which particular areas within the portfolio made significant profits and losses (Coutts 1984). The latter information is particularly important for a company in order to maintain both its competitiveness in the market and its profitability. Without knowing the areas of significant risk in its portfolio, an insurer will be unable to sustain operation, even though the overall portfolio may perform well temporarily. A portfolio should balance its exposure to risk.

The overall analysis of an insurance portfolio can be performed using straightforward statistical techniques, based on the total premiums earned and the total claims paid. More sophisticated approaches are required for within-portfolio analysis (Brockman and Wright 1992, Coutts 1984). A common approach to within-portfolio analysis is to partition the risk of the whole portfolio into small areas of risk, identified by a set of risk rating factors and usually represented by a contingency table. Only a few variables are usually considered at a time (the claim frequency, the claim cost, and the exposure, for example) for each cell of the contingency table (a minimal sketch of such a partition appears at the end of this section). When a portfolio is partitioned, a model can be fitted to each cell, relating the levels of the risk rating factors to the claim frequency and claim cost. Historical data is used to estimate the parameters of the model. The model can then be used to predict the expected claim frequencies and claim costs for different risk rating factor levels. With such an approach, though, the risk rating factors must be categorical: a continuous factor has to be categorised into a small number of levels, and a trade-off is needed between the detail of the analysis and the fit of the model. Also, the interactions between factors are often ignored because of the difficulty of handling them; with many variables, and with categorical variables with many values, the number of possible interactions is very large.

Alternative approaches to the task of insurance risk analysis have primarily been explored within the statistical/actuarial domains. Siebes (1994), however, considered the problem of insurance risk analysis in the context of data mining by employing probability theory. Insurance claims on policies in any one year were viewed in terms of Bernoulli experiments. This work led to the idea of developing equal-probability, homogeneous descriptions of classes of the insurance portfolio. An important goal in our exploration of the KDD process for real world knowledge acquisition tasks is to involve expert intuition as an integral aspect of the process, supported by formal analyses.
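A minimal sketch of such a contingency-table partition, using pandas and invented rating factors, might look as follows; each cell aggregates claim counts, claim cost, and exposure:

```python
import pandas as pd

# Hypothetical policy-level data: two categorical rating factors plus outcomes.
policies = pd.DataFrame({
    "age_band":      ["<25", "<25", "25-60", "25-60", "60+", "60+"],
    "vehicle":       ["sports", "sedan", "sports", "sedan", "sports", "sedan"],
    "n_claims":      [2, 1, 1, 0, 0, 1],
    "claim_cost":    [8000.0, 1500.0, 2000.0, 0.0, 0.0, 900.0],
    "exposure_days": [340, 360, 365, 365, 300, 365],
})

# One cell per combination of rating factor levels; within each cell we
# summarise the claim count, claim cost, and exposure.
cells = pd.pivot_table(
    policies,
    index="age_band",
    columns="vehicle",
    values=["n_claims", "claim_cost", "exposure_days"],
    aggfunc="sum",
)

# Claim frequency per day of exposure in each cell.
frequency = cells["n_claims"] / cells["exposure_days"]
```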
5 Insurance Portfolio Databases

A portfolio database in an insurance company contains a set of insurance policies purchased by customers. A policy insures a particular entity (motor vehicle, household contents, buildings) against loss, up to a specified value. When damage or loss occurs to the insured entity, a claim is made against the policy for compensation. A policy is valid for a certain time period and is usually renewed upon payment of the premium. The period during which a policy is active is referred to as the period of exposure (as in the insurer's exposure to risk) and is usually measured in days. At any particular time, the exposures associated with the active policies in the portfolio database will differ, depending upon their dates of validity.

A key factor in the success of an insurance portfolio is balancing the trade-off between setting competitive premiums and covering the risk associated with the exposure. The insurance market is very competitive: setting premiums too high leads to a loss of market share, yet setting premiums too low leads to a loss of profit. Premiums are generally based on a small number of key factors (such as age of driver, type of vehicle, and address of owner) identified by various analyses and intuitions. Because of the size of many insurance portfolios, analysis is often restricted to straightforward techniques.
The performance of a portfolio is usually assessed in the context of the previous year's data. The analyses performed on the data are used by the underwriter to envisage the near-future performance of the portfolio and to adjust the policy rating structure to reflect changes in the market and in the behaviour of the insured. Essentially, the analyses are used each year to tune the rules which set premiums for the coming year.

There are two extremes in setting premiums: all policies attract the same premium; or each individual policy has an individual premium determined for it, based on all of the details of the policy. Neither extreme is particularly useful or practical. The former would penalise those who are less likely to lodge claims in favour of those who are more likely to do so, and would probably drive the lower-risk customers to the competitors! The latter would generally be impractical because of the complexity of the rules that would be required. However, the goal of good insurance premium setting is likely to lean more towards the latter than the former: an insurer would prefer to fine-tune its premium setting more than is often currently done, identifying more factors that affect the risk and taking more information into account in setting premiums.

6 Preprocessing

Starting with the data extracted from the source database maintained by NRMA Insurance Limited (consisting of many millions of policies), a number of transformations had to be performed before a suitable Working Dataset was built. This involved many iterations, some of which were performed after the initial knowledge extraction tasks. (The initial runs indicated different directions to take, different data to obtain, and different transformations to perform.)

The Source Dataset was extracted from the NRMA's motor vehicle insurance portfolio database. An initial trial dataset of some 125,000 records, covering a time period of only 6 months (1 July 1994 to 31 December 1994), was used prior to working with the full dataset. Each record contained 93 fields of data. Figure 2 summarises the preprocessing performed.

[Figure 2: Preprocessing. Database (58MB) --(Selection)--> Source Data (18MB) --(Cleanse)--> Working Data (15MB). Cleansing deals with same, disjoint, and extended periods, with noise, and with multiple records for the same or different persons.]
The first task was to remove from the dataset those fields/attributes/variables which were irrelevant to the task at hand. This process was complicated by the fact that some apparently irrelevant attributes (e.g., the office at which the application for insurance was lodged) could contain surprising information. On the other hand, leaving irrelevant attributes in the dataset can lead to aberrant results. A panel of KDD experts (from Machine Learning, Statistics, and Databases) and domain experts (from the NRMA) considered each attribute in turn, removing only those that were seen as clearly irrelevant by all.

The second task, labelled Cleansing in Figure 2, involved performing various transformations on the data (e.g., birth dates transformed into ages). Various computed fields were also added to the data at this stage. One, for example, calculated the period of exposure associated with the policy in the context of the period of interest (the last six months of 1994).

The third task involved collapsing the transaction-oriented data that was supplied into a policy-oriented dataset, as required for the types of analyses intended to be performed (a sketch of such a merge appears at the end of this section). The original database was oriented towards managing the insurance portfolio; it was not designed with Data Mining in mind. A transaction in the original database records some change made to, or incident associated with, a policy. Hence, in the 125,000 records, there were only about 83,000 policies. Some policies had multiple owners (hence multiple records), while others had a variety of changes, such as renewals, cancellations, etc. Other policies appeared more than once because more than a single claim was made on the policy in the period of interest. A careful study of the dataset had to be made to understand all of its nuances. This finally resulted in the definition of a set of rules to merge the multi-record policies into single policy records. These rules were implemented as a variety of operations and were tuned iteratively as the data and the problems became better understood.

An intermediate dataset was created with some 78,000 individual policies, each described by 43 attributes, some of which were generated in the merging process to record, for example, the number of owners of each individual policy and the total amount of all claims made against a policy. Policies that were cancelled or met other criteria (e.g., contained critical but missing data) were also removed from the dataset.

The intermediate dataset was intended to be used for multiple data mining objectives (e.g., fraud detection). Consequently, some of its attributes were still irrelevant to the risk analysis, and further processing was performed to build the final risk analysis Working Dataset. This dataset contained some 75,000 unique policies, each represented by 23 attributes, including exposure and total claims cost.

An initial observation with some impact upon the type of analysis performed was that, of the 75,000 policies, only about 4,000 actually had claims against them. While not particularly surprising, such a low relative occurrence (about 5%) of an important aspect of the data does require attention (e.g., appropriate use of prior probabilities), but is beyond the immediate scope of this paper. A second observation of relevance is that different policies have different exposures, even though the majority have exposure for the whole period. This indicates that the exposure should be used to weight the examples in some way. This important aspect of the data is also left for future exposition.
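A minimal sketch of such a transaction-to-policy merge, with hypothetical fields and pandas as the implementation vehicle, might look as follows:

```python
import pandas as pd

# Hypothetical transaction-oriented records: one row per transaction on a policy.
transactions = pd.DataFrame({
    "policy_id":     [1001, 1001, 1002, 1003, 1003, 1003],
    "owner_id":      [1,    2,    7,    9,    9,    9],
    "claim_cost":    [0.0,  0.0,  1200.0, 0.0, 450.0, 300.0],
    "exposure_days": [184,  184,  92,   184,  184,  184],
})

# Collapse to one row per policy, generating new attributes in the merge,
# e.g. the number of owners and the total of all claims against the policy.
policies = transactions.groupby("policy_id").agg(
    n_owners=("owner_id", "nunique"),
    total_claim_cost=("claim_cost", "sum"),
    n_claims=("claim_cost", lambda c: int((c > 0).sum())),
    exposure_days=("exposure_days", "max"),  # assume one exposure per policy
).reset_index()
```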
7 Risk Analysis

We now describe the process of extracting knowledge for use in this domain.

7.1 Decision Trees

An implementation of the classification and regression tree software CART (Breiman, Friedman, Olshen and Stone 1984) was used in the initial experiments to extract interesting patterns. Due to the large number of training examples, a parallel implementation by Thinking Machines Corporation, called StarTree and part of the Darwin toolkit (Thinking Machines Corporation 1995), was used.
CART and StarTree are similar in many ways to the traditional machine learning decision tree induction algorithm C4.5 (Quinlan 1993). StarTree consists of three operations: grow tree, evaluate tree, and apply tree. The grow tree stage implements the usual divide-and-conquer strategy in a parallel architecture, building a full decision tree from a training set. The evaluate tree stage evaluates the generated tree in the context of a test dataset. It is here that pruning is performed, and a variety of test datasets can be used to explore the performance of the tree under pruning. The apply tree stage simply applies any resulting trees to new, unseen data. StarTree provides a variety of output options, including a LaTeX-formatted decision tree or a collection of rules.

7.2 Methodology

The claim cost attribute in the final dataset was selected as the target attribute (or dependent variable) and the remainder (excluding exposure) as the independent variables. All policies were classified into two classes, those with claims and those with no claims; the claim cost was effectively reclassified as 1 or 0. The dataset was processed with grow tree to produce a complete decision tree.

Using StarTree, a rule base can be built from a generated decision tree. A rule base consists of a set of rules, each corresponding to a different path through the tree and representing the conjunction of the conditions associated with the nodes along that path. An example rule might be:

    If age <= 20 and sex = Male and insured amount >= 5000 and insured amount < ...
    Then insurance claimed = 1, cost = 0, (0, 15)

The rule indicates that under the given conditions an insurance policy is claimed against (and the cost of misclassification associated with this rule is 0). There are 15 true examples from which this rule was generated.

The primary rule base generated by StarTree is then extended with information derived from other sources, including the Source Dataset and the original database. Significant areas of risk were explored by viewing those leaf nodes containing claims. The frequency of claims at any leaf, together with the total of the claim costs at the leaf, provides important information. However, since claim cost had been factored out of the training set used by StarTree to build the decision tree, this information was incorporated as an extra step in the process. The rules are thus extended with information recording the total exposure and the total claim cost for each rule (a sketch of this follows at the end of this subsection). Other information, such as the total of the premiums associated with the area of risk, could also have been determined.

The generation of this extra information is an important step, particularly in assessing the usefulness and appropriateness of the discovered knowledge. The additional information can be used to determine whether the information embodied in a rule is useful as knowledge. Such an exploration of the discovered rules turns this data mining operation, in a sense, back onto itself. This is particularly the case where we are dealing with very large datasets with many attributes, from which very large trees (and hence very many rules) are generated without pruning (and hence with over-fitting). The generated rules themselves need to be data mined (in the context of the evaluation stage).
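StarTree itself is proprietary, so the sketch below uses scikit-learn's DecisionTreeClassifier as a rough stand-in, on synthetic data with hypothetical field names. It grows a full tree on the binary claim target, enumerates root-to-leaf paths as rules, and extends each rule with the total exposure and claim cost of the policies it covers:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic working data: two independent variables plus claim cost and
# exposure (the real dataset had 21 independent variables among F01..F23).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "F01": rng.integers(18, 80, 1000),
    "F02": rng.integers(0, 30000, 1000),
    "exposure_days": rng.integers(1, 185, 1000),
})
data["claim_cost"] = np.where(rng.random(1000) < 0.05,
                              rng.integers(500, 20000, 1000), 0)

# Reclassify claim cost as a binary target: claims versus no claims.
features = ["F01", "F02"]
X, y = data[features], (data["claim_cost"] > 0).astype(int)

# "Grow tree": build a complete, unpruned tree, as StarTree's grow stage does.
tree = DecisionTreeClassifier().fit(X, y)

def leaf_rules(tree, names):
    """Enumerate root-to-leaf paths as conjunctions of node conditions."""
    t = tree.tree_
    def walk(node, conds):
        if t.children_left[node] == -1:      # leaf node
            yield node, " and ".join(conds) or "True"
        else:
            name, thr = names[t.feature[node]], t.threshold[node]
            yield from walk(t.children_left[node], conds + [f"{name} <= {thr:.1f}"])
            yield from walk(t.children_right[node], conds + [f"{name} > {thr:.1f}"])
    yield from walk(0, [])

# Extend each rule with the claims, exposure, and claim cost it covers
# (the extra step described above), then rank to surface high-risk areas.
summary = data.assign(leaf=tree.apply(X)).groupby("leaf").agg(
    n_claims=("claim_cost", lambda c: int((c > 0).sum())),
    exposure=("exposure_days", "sum"),
    cost=("claim_cost", "sum"),
)
summary["rule"] = summary.index.map(dict(leaf_rules(tree, features)))
hot_spots = (summary[summary["n_claims"] > 0]
             .assign(cost_per_day=lambda s: s["cost"] / s["exposure"])
             .sort_values("cost_per_day", ascending=False))
```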
7.3 Initial Results

Some initial results from the analysis of the NRMA Insurance Limited sample database are presented here. Due to the commercial nature of the data, the actual variables will be referred to as F01, F02, ..., F23. For illustration, three decision trees were generated using different selection measures for the partitions; we will refer to the three trees as the entropy, gini, and error trees. (Refer to Mingers (1989) for a discussion of selection measures.)
An idea of the size and organisation of the trees produced from the sample data can be gained from the statistics of the various trees grown (Table 1). It is clear from Table 1 that very large trees are generated, given the large number of attributes (and possible values) and the large number of examples. The row of the table giving the total number of leaves also corresponds to the number of rules generated from the tree, and similarly for the positive and negative leaves. The Entropy tree, for example, generates 1967 rules for claims.

[Table 1: Using alternative decrease functions. For each tree (Entropy, Gini, Error): the decrease function used (entropy, gini, error), the number of nodes, depth, total leaves, positive leaves (Pl), negative leaves (Nl), and the numbers of policies covered by Pl and by Nl.]

Although 21 independent variables were used in the input data, only 19 were selected by the Entropy and Gini trees, and 20 by the Error tree. Table 2 lists the selected variables and the frequencies with which they appear in each tree; the variable F01, for example, appears 111 times in the Error tree. Such information can be of use in gaining some insight into which variables appear to be important. (The location of a variable within the tree, and the type of the variable, categorical versus continuous, are other important indicators.)

[Table 2: Attribute frequencies: the number of times each selected variable (from F01 to F23) appears in the Entropy, Gini, and Error trees.]

Table 3 records the number of claims associated with the rules generated from each of the three trees, aggregated by the number of claims. There are 1090 rules from the Entropy tree, for example, which have just a single claim. On the other hand, there are 36 claims associated with one of the rules from the Gini tree. Rules associated with a higher number of claims tend to indicate areas of high risk.

[Table 3: Claims associated with each risk area: for each number of claims, the number of rules from the Entropy, Gini, and Error trees.]

Table 4 summarises the rules from each tree with more than 14 claims against them, or with a high average claim cost with respect to the exposure. For each such rule the table records the number of claims, the total amount of exposure, and the sum of the claim costs; each row represents a single rule. In the Gini sub-table, for example, there are 36 claims associated with a single rule. These claims cover 442 days of exposure and have a total claim cost of $159,472. Such information can again be used to target the areas described by the rules for further exploration. Armed with such post-learning analyses, areas of significant insurance risk can be identified and further investigated. Such investigations could be expected to lead to a better understanding of insurance risk and to a finer tuning of insurance premium setting.

[Table 4: High claim risk areas: for each of the Gini, Entropy, and Error trees, the number of claims, total exposure, and total claim cost associated with each high-risk rule.]
8 Summary

In this paper we have followed the KDD process and highlighted its application to the domain of motor vehicle insurance risk assessment as an adjunct to the usual knowledge acquisition process. StarTree was used to analyse the large insurance dataset, generally building very large decision trees (and hence large rule sets). The rules were used only as an aid in the discovery of interesting areas of the data. By understanding such hot spots, better insights into policy premium setting can be gained.

We have identified a simple yet sufficient breakdown of the KDD process involving four interacting and iterative stages, and have presented an associated model of KDD identifying the key elements of the process. An example knowledge acquisition exercise using the KDD methodology for insurance risk analysis has been used to illustrate the process and the model.

Our ongoing work is exploring alternative approaches for dealing with the very large datasets that are generally available for applications such as insurance. An approach whereby large datasets are first clustered, and the resulting clusters are then subjected to decision tree induction, has been found to simplify the knowledge acquisition task (a sketch follows below). Even with high performance computers which can handle larger problems, such considerations remain necessary.
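A rough sketch of this cluster-then-induce approach; the paper does not specify the algorithms used, so KMeans and per-cluster decision trees are illustrative stand-ins:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cluster_then_trees(X: pd.DataFrame, y: pd.Series, k: int = 5) -> dict:
    """Cluster the large dataset first, then induce a decision tree within
    each cluster. k=5 is an arbitrary choice; categorical insurance data
    would need a suitable clustering algorithm and encoding."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    trees = {}
    for c in range(k):
        mask = labels == c
        if y[mask].nunique() > 1:  # skip clusters where the target is constant
            trees[c] = DecisionTreeClassifier().fit(X[mask], y[mask])
    return trees
```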
Acknowledgements

The authors acknowledge the support provided by the Cooperative Research Centre for Advanced Computational Systems (ACSys) established under the Australian Government's Cooperative Research Centres Program. This work has been performed within the Divisions of Information Technology (DIT) and Mathematics and Statistics (DMS) of the Australian Government's Commonwealth Scientific and Industrial Research Organisation (CSIRO). Contributions made by Peter Milne of DIT and Murray Cameron, Glenn Stone, Petra Kuhnert, and David Chan of DMS are also acknowledged. Industry collaborators from NRMA Insurance Limited include Keith Forster, Philip Woods, and Hong Ooi.

References

Agrawal, R. and Srikant, R.: 1994, Fast algorithms for mining association rules in large databases, VLDB '94.

Brachman, R. J. and Anand, T.: 1996, The process of knowledge discovery in databases: A human-centered approach, in Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy (1996), chapter 2.

Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J.: 1984, Classification and Regression Trees, Wadsworth, Belmont, CA.

Brockman, M. J. and Wright, T. S.: 1992, Statistical motor rating: Making effective use of your data, Journal of the Institute of Actuaries 119.

Coutts, S. M.: 1984, Motor insurance rating: An actuarial approach, Journal of the Institute of Actuaries 111.

Fayyad, U. M., Piatetsky-Shapiro, G. and Smyth, P.: 1996, From data mining to knowledge discovery: An overview, in Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy (1996), chapter 1.

Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P. and Uthurusamy, R. (eds): 1996, Advances in Knowledge Discovery and Data Mining, AAAI Press.

Frawley, W. J., Piatetsky-Shapiro, G. and Matheus, C. J.: 1992, Knowledge discovery in databases: An overview, AI Magazine 13(3).

Holsheimer, M., Kersten, M., Mannila, H. and Toivonen, H.: 1995, A perspective on databases and data mining, Proc. of the First International Conference on Knowledge Discovery and Data Mining, AAAI Press.

Matheus, C. J., Chan, P. K. and Piatetsky-Shapiro, G.: 1993, Systems for knowledge discovery in databases, IEEE Transactions on Knowledge and Data Engineering 5(6).

Mingers, J.: 1989, An empirical comparison of selection measures for decision-tree induction, Machine Learning 3(4).

Quinlan, J. R.: 1993, C4.5: Programs for Machine Learning, Morgan Kaufmann.

Siebes, A.: 1994, Homogeneous discoveries contain no surprises: Inferring risk profiles from large databases, Technical Report CS-R9430, CWI.

Thinking Machines Corporation: 1995, The Darwin solution: A family of prediction and classification tools for large databases, Technical report, Thinking Machines Corporation.

Williams, G. J. and Huang, Z.: 1996, Modelling the KDD process: A four stage process and four element model, Technical Report TR-DM-96013, CSIRO Division of Information Technology.