Data Mining as an Automated Service


P. S. Bradley
Apollo Data Technologies, LLC
February 16, 2003

Abstract

An automated data mining service offers an outsourced, cost-effective analysis option for clients desiring to leverage their data resources for decision support and operational improvement. In the context of the service model, the client typically provides the service with data and other information likely to aid in the analysis process (e.g. domain knowledge). In return, the service provides analysis results to the client. We describe the required processes, issues, and challenges in automating the data mining and analysis process when the high-level goals are: (1) to provide the client with a high-quality, pertinent analysis result; and (2) to automate the data mining service, minimizing the amount of human analyst effort required and the cost of delivering the service. We argue that by focusing on client problems within market sectors, both of these goals may be realized.

1 Introduction

The amount spent by organizations in implementing and supporting database technology is considerable. Global 3500 enterprises spend, typically, $664,000 on databases annually, primarily focusing on transactional systems and web-based access [25]. Unfortunately, the ability of organizations to effectively utilize this information for decision support typically lags behind their ability to collect and store it. But organizations that can leverage their data for decision support are more likely to have a competitive edge in their sector of the market [19].

An organization may choose from a number of options when implementing the data analysis and mining technology needed to support decision-making processes or to optimize operations. One option is to perform the analysis within the organization (i.e. in-house). It is costly to obtain the ingredients required for a successful project: analytical and data mining experience on the part of the team performing the work, software tools, and hardware. Hence the in-house data mining investment is substantial.

Another option that an organization may pursue is to outsource the data analysis and mining project to a third party. Data mining consultants have gone into business to address this market need. Consultants typically have data analysis and mining experience within one or more market sectors (e.g. banking, health care), have analysis and mining software at their disposal, have the required hardware to process large datasets, and are able to produce customized analysis solutions. The cost of an outsourced data mining project depends upon a number of factors, including project complexity, data availability, and dataset size. Outsourcing such projects to a third-party consulting firm has proven to be successful; however, the cost is on the order of tens to hundreds of thousands of dollars.

Unfortunately, a number of organizations simply do not have the resources available to perform data mining and analysis in-house or to outsource these tasks to large consulting firms. But these organizations often realize that they may be able to gain a competitive advantage by utilizing their data for decision support and/or operational improvement. Hence, there is an opportunity to deliver data mining and analysis services at a reduced cost. We argue that an automated data mining service can provide such analysis at a reduced cost by targeting organizations (or data mining clients) within a given market sector (or market vertical) and automating the knowledge discovery and data mining process [12, 9] to the extent possible.

We note that there are similarities between the design of an automated data mining service and data mining software packages focusing on specific business needs within given vertical markets.
By focusing on specific problems in a vertical market, both the software designer and the service designer are able to address data format and preparation issues and choose appropriate modeling, evaluation and deployment strategies. An organization may find a software solution appealing if it addresses their specific analysis needs, has the proper interfaces to the data and end-user (who may not be an analyst), and produces analysis results that are easily deployable. Organizations that have problems not specifically addressed by software solutions, or are attracted to a low-cost, outsourced alternative, may find an automated data mining service to be their best option.

We next present the steps of the knowledge discovery and data mining process to provide context for the discussion on automating various steps. Data mining project engagements typically follow the process below (note that the sequence is not strict, and moving back and forth between steps is often required) [9]:

1. Problem understanding: This initial step primarily focuses on data mining problem definition and specification of the project objectives.

2. Data understanding: This step includes data extraction, data quality measurement, and detection of interesting data subsets.

3. Data preparation: This step includes all activities required to construct the final dataset for modeling from the raw data, including data transformation, merging of multiple data sources, data cleaning, and record and attribute selection.

4. Modeling: In this step, different data mining algorithms and/or tools are selected and applied to the table constructed in the data preparation step. Algorithm parameters are tuned to optimal values.

5. Evaluation: The goal of this step is to verify that the data mining or analysis models generated in the modeling stage are robust and achieve the objectives specified in the problem understanding step.

6. Deployment: In this step, the results are put in a form to be utilized for decision support (e.g. a report) or the data mining models may be integrated with one or more of the organization's IT systems (e.g. scoring prospective customers in real time).

Fig. 1 shows these analysis steps and the party that is primarily responsible for each individual step. We next specifically define the data resources and parties that form the basis of the relationship between the automated data mining service and the client.

1.1 Definitions

Definition 1.1 (Raw Data) The base electronic data sources that contain or are believed to contain information relevant to the data mining problem of interest.

Definition 1.2 (Metadata) Additional information that is necessary to properly clean, join and aggregate the raw data into the form of a dataset that is suitable for analysis. Metadata may consist of data schemas, data dictionaries (including the number of records in each file, and the names, types and range of values for each field), and aggregation rules.
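As an illustration of Definition 1.2, the sketch below shows one way a data dictionary could be represented and used to check a raw file. It is a minimal example under stated assumptions, not a format prescribed by the paper: the class names `FieldSpec` and `FileSpec`, the function `check_records`, and the sample order-line fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldSpec:
    """One data dictionary entry: field name, type, and allowed range."""
    name: str
    type: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class FileSpec:
    """Metadata for one raw file: expected record count and its fields."""
    file_name: str
    expected_records: int
    fields: List[FieldSpec]


def check_records(spec, records):
    """Return a list of human-readable violations of the data dictionary."""
    problems = []
    if len(records) != spec.expected_records:
        problems.append(
            f"{spec.file_name}: expected {spec.expected_records} records, "
            f"got {len(records)}"
        )
    for i, rec in enumerate(records):
        for f in spec.fields:
            value = rec.get(f.name)
            if not isinstance(value, f.type):
                problems.append(f"row {i}: {f.name} is not {f.type.__name__}")
                continue
            if f.min_value is not None and value < f.min_value:
                problems.append(f"row {i}: {f.name} below {f.min_value}")
            if f.max_value is not None and value > f.max_value:
                problems.append(f"row {i}: {f.name} above {f.max_value}")
    return problems


# Hypothetical metadata for an order-line file (names and ranges assumed).
spec = FileSpec(
    file_name="order_lines.csv",
    expected_records=3,
    fields=[
        FieldSpec("product_id", str),
        FieldSpec("quantity", int, -100, 100),
        FieldSpec("unit_price", float, 0.0, 10000.0),
    ],
)
records = [
    {"product_id": "P1", "quantity": 2, "unit_price": 9.99},
    {"product_id": "P2", "quantity": 500, "unit_price": 4.50},
    {"product_id": "P3", "quantity": 1, "unit_price": "9.99"},
]
problems = check_records(spec, records)
for p in problems:
    print(p)
```

Keeping the metadata in a declarative structure like this means the same checking routine can be reused across clients whose files follow similar schemas.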
Definition 1.3 (Domain-specific Information) Additional information on specific rules, conventions, practices, etc. that are limited to the problem domain.

[Figure 1: diagram of the six project steps (Problem Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment) with the two parties, Data Mining Client and Data Mining Service.]

Figure 1: Data Mining Project Steps: Problem understanding, data understanding, data preparation, modeling, evaluation and deployment. Dashed lines indicate that primary responsibility is on the part of the data mining client. Dotted lines indicate that primary responsibility is on the part of the service. Notice that both the client and service play central roles for the data understanding step.

As an example of domain-specific information, a retail chain may indicate product returns in their transactional record with negative sales and quantity values. This information may be extremely helpful in the data preparation, evaluation and possibly the modeling steps.

Definition 1.4 (Third-party data sources) Additional raw data that will likely improve the analysis result, but is not collected or owned by the data mining client.

Examples of third-party data sources include address information from the postal service and demographic information from third-party data collection companies.

Definition 1.5 (Automated Data Mining Service) An organization that has implemented or obtained processes, tools, algorithms and infrastructure to perform the work of data understanding, data preparation, modeling, and evaluation with minimal input from a human analyst.

The high-level goals of the automated data mining service are the following.

1. Provide the data mining client with a high-quality, pertinent result. Achieving this goal ensures that the organization receiving the data mining/analysis results is satisfied (and hopefully a return customer!).

2. Remove or minimize the amount of human intervention required to produce a high-quality, pertinent result. Achieving the second goal allows the data mining/analysis service to scale to a large number of concurrent customers, allowing it to amortize the cost of offering the service across a large number of customers. In addition to automating as much of the operational aspects of the analysis process as possible, this goal is achievable by focusing on problems common in a few market verticals or problem domains.

For this discussion, we will assume that the set of data mining analysis problems that the automated service addresses is fixed. For example, the service may be able to offer data clustering analysis services, but is not able to offer association-rule services.
We will also drop the word "automated" for the remaining discussion: when referring to the data mining service, we are referring to an automated data mining service.

Definition 1.6 (Data Mining Client) An organization that possesses two items: (1) a data mining analysis problem; (2) the raw data, metadata, and domain-specific information that, possibly in combination with third-party data sources, are needed to solve the data mining analysis problem.

We next specify the context of the relationship between the data mining client and the data mining service.

1.2 Relationship between Consumer and Service

The data mining service receives raw data, metadata, and possibly domain-specific information from the client. The service then performs data understanding, data preparation, modeling, and evaluation for the analyses specified by the client, for an agreed-upon fee. When these steps are completed, the results and possibly the provided data and other information (reports on analyses tried, intermediate aggregations, etc.) are returned to the client.

The combination of offering a fixed set of data mining and analysis solutions and focusing on clients from similar domains or market verticals enables the data mining service to perform the data understanding and deployment tasks with minimal intervention from the human data analyst working for the service. Offering a fixed set of data mining and analysis solutions allows the service to templatize problem definition documents and related information. Solution deployment is similarly constrained by the problem domain and the focused vertical market, again allowing the service to templatize deliverables. Additionally, by focusing on a particular problem domain, the data mining service analyst gains invaluable domain knowledge, increasing the likelihood of successful solution delivery in future engagements.

Example: Consider the e-commerce domain. A data mining service may offer the following solutions: determining the most common paths followed by website visitors, determining the most common products purchased together, and ranking products that are most likely to be purchased together. For the e-commerce domain, problem definition and specification document templates can help make the problem understanding phase clear and efficient. Additionally, the deployment steps may be nearly automated.
Depending on the analysis performed, the results may take the form of an automatically generated report, a file, an executable, etc. Analyses of common paths and of common products purchased together are typically best delivered in a standard report form. Product rankings may best be delivered to the client in the form of a file or an executable that takes a given product ID and returns the ranked list of product IDs.

There are often legal, security, and privacy concerns regarding data extraction on the part of the data mining client that should be addressed by the service. For more detail on these issues, please see [22].

The remainder of the paper is organized as follows. Sections 2, 3 and 4 focus on issues involved in automating and scaling the data preparation, modeling, and evaluation steps in the general KDD process. Section 5 concludes the paper.

2 Data Understanding and Preparation

Tasks involved in the data understanding step are data extraction, data quality measurement and the detection of interesting data subsets. The data preparation step consists of all activities required to construct the final dataset for modeling.

Responsibility for tasks in the data understanding step is typically split between the client and the service. The data mining client is responsible for extracting and providing the required raw data sources to the service for analysis. The data mining service may also augment the raw data provided by the client with third-party data sources. Data quality measurement and detection of interesting data subsets are performed by the service.

To efficiently address data understanding and preparation tasks, the data mining service needs to rely upon the fact that its clients come from certain specific domain areas. Ideally, these domains are chosen so that the organizations within a particular domain have similar data schemas and data dictionaries characterizing the data sources that are to be analyzed. Given a priori knowledge about the data schemas and data dictionaries for a given domain, the data mining service can automate the operational steps of joining the appropriate raw data sources and possibly integrating third-party data sources.

Similar data formats within a market vertical or domain also justify the building of automated domain-specific data cleaning and data quality measurement tools. In the perfect setting, domain-specific rules for data cleaning can be completely automated. In practice, a useful solution automatically captures and fixes a majority of data cleaning issues and flags only a small fraction of violations for human intervention and triage.
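The "fix most, flag the rest" pattern described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's tooling: the record layout, the function name `clean_order_lines`, the specific rules (normalizing product-ID formatting, filling a missing price from the catalog) and the 50% price tolerance are all hypothetical choices made for the example.

```python
def clean_order_lines(order_lines, catalog, price_tolerance=0.5):
    """Apply domain rules; return (cleaned_lines, flagged_lines).

    catalog maps product_id -> list price. Lines that can be repaired
    automatically are fixed; the rest are flagged for an analyst.
    """
    cleaned, flagged = [], []
    for line in order_lines:
        fixed = dict(line)  # never mutate the client's raw record

        # Automatic fix: normalize product-ID formatting.
        fixed["product_id"] = str(fixed["product_id"]).strip().upper()

        # Rule: every ordered product must exist in the catalog.
        if fixed["product_id"] not in catalog:
            flagged.append((line, "unknown product_id"))
            continue
        list_price = catalog[fixed["product_id"]]

        if fixed["unit_price"] is None:
            # Automatic fix: fill a missing price from the catalog.
            fixed["unit_price"] = list_price
        elif abs(fixed["unit_price"] - list_price) > price_tolerance * list_price:
            # Rule: line price must stay within tolerance of list price.
            flagged.append((line, "price deviates from catalog"))
            continue

        cleaned.append(fixed)
    return cleaned, flagged


catalog = {"P1": 10.0, "P2": 20.0}
order_lines = [
    {"order_id": 1, "product_id": " p1 ", "unit_price": 10.0},
    {"order_id": 1, "product_id": "P2", "unit_price": None},
    {"order_id": 2, "product_id": "P9", "unit_price": 5.0},
    {"order_id": 3, "product_id": "P2", "unit_price": 1.0},
    {"order_id": 4, "product_id": "p2", "unit_price": 19.0},
]
cleaned, flagged = clean_order_lines(order_lines, catalog)
print(len(cleaned), "lines cleaned;", len(flagged), "flagged for review")
for line, reason in flagged:
    print(line["order_id"], reason)
```

Returning the flagged records together with a reason keeps the analyst's triage queue small and self-explanatory, which is the point of automating only the rules that are safe to apply without review.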
The goal of data quality measurement is to justify the potential that the data may provide the required solution to the given problem [22]. Since the solutions offered by the service and the data sources themselves tend to be domain-specific, automating the data quality measurement may be done with minimal effort, requiring only configuration information.

Example: Suppose a data mining service is offering market-basket analysis for e-commerce companies. The typical data sources of interest include the order header, order line item information, and product catalog data. Since these sources tend to have similar schemas, it is possible to automate some data cleaning processes (ensuring that product IDs in the order information correlate with product IDs in the catalog, that rules governing the line item price information with respect to the catalog prices are respected, etc.). Additionally, initial data quality measurements may include the number of 1-itemsets that have sufficient support.

Similarly, when restricted to a small number of domains and a small set of data mining problems to address, with knowledge and expertise, it is possible to automatically apply a number of data transformations and feature selection techniques that have been shown to be useful for the given domain (possibly consisting of a combination of domain knowledge and automated feature selection methods). These can be automatically executed and the resulting models may be automatically scored, yielding a system that constructs a number of models with little intervention from the human analyst. For a more detailed discussion on automating the feature selection/variable transformation task, see [2].

3 Modeling

Modeling is the step in the data mining process that includes the application of one or more data mining algorithms to the prepared dataset, including selection of algorithm parameter values. The result of the modeling step is a series of predictive models, descriptive models, or both. Descriptive models are useful when the data mining client is attempting to get a better understanding of the dataset at hand. Predictive models summarize trends in the data so that, if future data has the same or similar distribution to past data, the model will predict trends with some degree of accuracy.

In this section we assume that the data understanding and data preparation phases have produced a dataset from which the desired data mining solution can be derived. In practice, however, results of the modeling step often motivate revisiting the data understanding and data preparation steps.
For example, after building a series of models it may become apparent that a different data transformation would greatly aid in the modeling step.

When evaluating the utility of a given data mining algorithm for possible use in the service, the following considerations should be taken into account:

1. Assuming the prepared dataset is informative, is it possible to obtain high-quality models consistently using the given algorithm?

2. Is the algorithm capable of optimizing objectives specific to the client's organization (e.g. total monetary cost/return)?

3. Is the algorithm efficient?

There are two factors influencing the likelihood of obtaining a high-quality, useful model using a given algorithm. The first factor relates to the robustness of the computed solution with respect to small changes in the input parameters or slight changes in the data. Ideally, for a majority of datasets that are encountered in a given domain, the data mining service prefers robust algorithms, since model quality with respect to small parameter and data changes is then predictable (and hence the algorithm is amenable to automation).

The second factor is the ease with which the insight gained by analyzing the model can be communicated to the data mining client. Typically, prior to deploying a model or utilizing the model in organizational decision processes, the data mining client desires to understand the insights gleaned from the model. This process is typically difficult to automate, but developing intuitive, easy-to-understand user interfaces aids greatly. Additionally, a process that identifies a fraction of interesting rules or correlations and reports these to a data mining service analyst is a very useful quality assurance tool. These are primarily concerns during the deployment step, but the choice of modeling technique does affect this later phase. We note that there are some data mining applications in which the client often does not analyze the model, but analyzes the computed results (e.g. results produced by product recommender systems are often analyzed, rather than attempting to understand the underlying model).

Industry standards for data mining model storage such as PMML [13] and OLE DB for Data Mining [10] enable consultants and third-party vendors to build effective model browsers for specific industry problems and domains. These standards provide a basis for data mining platforms that enable data mining clients to more easily deploy and understand the models built by the service.
From the viewpoint of the data mining client, model maintenance tends to be an important issue. The data mining client may not have or may not want to invest resources to ensure that the data mining models they've received from the service are maintained and accurately model their underlying organizational processes, which may be changing over time. Techniques for incrementally maintaining data mining models are discussed in [4]. Additionally, work on identifying the fit of data mining models to data changing over time includes [7]. It may be possible for the service to incorporate these techniques into the client deliverable so that the model may maintain itself or notify the client that it is not sufficiently modeling recently collected data.

We briefly discuss some popular algorithms used in developing data mining solutions. Note that this list is not exhaustive.

3.1 Decision Trees

Decision tree construction algorithms are appealing from the perspective of the data mining service for a number of reasons. The tree's hierarchical structure enables non-technical clients to understand and effectively explore the model after a short learning period. Decision tree construction typically requires the service to provide few, if any, parameters; hence tuning the algorithm is relatively easy. Typically, small data changes result in small, if any, changes to the resulting model (although this is not true for all datasets and possible changes in data) [8]. For excellent discussions on decision tree algorithms, please see [5, 18]. For a discussion on techniques used to scale decision tree algorithms to large datasets, see [4].

3.2 Association Rules

Association rule algorithms identify implications of the form X ⇒ Y, where X and Y are sets of items. The association rule model consists of a listing of all such implications existing in the given dataset. These implications are useful for data exploration and may be very useful in predictive applications (e.g. see [17]). Association rules are often presented to the user in priority order, with the most interesting rules occurring first. Alternatively, or in addition to the list of interesting rules, a browser allowing the data mining client to filter rules with specified items occurring in the set X or Y is typically useful. For an overview of association rule algorithms, see [1, 16]. Approaches used to scale association rule algorithms to large databases are discussed in [4].

3.3 Clustering

Clustering algorithms aim to partition the given dataset into several groups such that records in the same group are similar to each other, identifying subpopulations in the data. Typically, the data mining client is not interested in the particular clustering strategy employed by the service, but is interested in a concise, effective summary of the groups that occur in their data.
Although there are numerous clustering approaches available, we will focus the discussion on two methods: iterative and hierarchical methods.

Iterative clustering methods are well known and typically straightforward to implement. But from the perspective of the data mining service, there are two challenges to automating them: obtaining a robust model that accurately characterizes the underlying data, and determining the correct number of clusters existing in the underlying data. Iterative clustering methods require the specification of initial clusters, and the computed solution is dependent upon the quality of this initial partition. Hence, to ensure a quality solution, the data mining service must implement a search for good initial clusters [3]. This is typically done by re-running the iterative clustering algorithm from multiple random initial clusters and taking the best model, or by utilizing sampling strategies. Additionally, determining the correct number of clusters is challenging, but strategies such as those discussed in [23] are useful. For a general overview of iterative clustering methods, see [15, 11].

Hierarchical clustering methods build a tree-based hierarchical taxonomy (dendrogram) summarizing similarity relationships between subsets of the data at different levels of granularity. The hierarchical nature of the resulting model is a benefit for the data mining service since this structure is typically easily browsed and understood by the client. Additionally, these clustering methods are very flexible when it comes to the distance metric employed to group together similar items, making hierarchical methods applicable to a number of problems that require the use of non-standard distance metrics. The main caveat to standard hierarchical clustering implementations is their computational complexity, requiring either O(m^2) memory or O(m^2) time for m data points, but automating these standard implementations is straightforward. Work on scaling these methods to large datasets includes [14, 20]. For a detailed discussion of hierarchical clustering methods, see [15].

3.4 Support Vector Machines

Support Vector Machines (SVMs) are powerful and popular solutions to predictive modeling problems. SVM algorithms are typically stable and robust with respect to small changes in the underlying data. The algorithms require the specification of a parameter that effectively balances the predictive performance on the available training data with the complexity of the predictive model computed.
Tuning set strategies are typically used to automate the selection of optimal values of this parameter. The SVM predictive model is a function in the space of the input or predictor attributes of the underlying data. Since this space tends to be high-dimensional, presenting the SVM model to the data mining client for the purpose of gaining insight is often a difficult proposition. The SVM is a very good predictive model, but is somewhat of a black box with respect to understanding and extracting information from its functional form. For an overview of SVMs, see [6]. For strategies on scaling SVM computation to large datasets, see [4].
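The tuning set strategy mentioned above can be sketched as follows: train one model per candidate parameter value on a training split, score each on a held-out tuning split, and keep the best. This is a minimal, self-contained illustration, not the implementation used by any particular service. The linear SVM here is trained with a simple subgradient method chosen for brevity, and the synthetic two-class data, the candidate values, and the names `train_linear_svm`, `accuracy` and `select_c` are all assumptions made for the example.

```python
import random


def train_linear_svm(rows, labels, c, epochs=50, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + c * sum(hinge losses).

    Labels must be +1 or -1. Returns (weights, bias).
    """
    n, dim = len(rows), len(rows[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink w: per-example share of the regularization term.
            w = [wi * (1.0 - lr / n) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move toward classifying x correctly.
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y
    return w, b


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted sign matches the label."""
    w, b = model
    hits = 0
    for x, y in zip(rows, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += (1 if score >= 0 else -1) == y
    return hits / len(rows)


def select_c(train, tune, candidates):
    """Pick the candidate with the best tuning-set accuracy.

    Ties go to the earliest candidate in the list.
    """
    best_c, best_acc = None, -1.0
    for c in candidates:
        model = train_linear_svm(train[0], train[1], c)
        acc = accuracy(model, tune[0], tune[1])
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc


# Synthetic data (assumed for illustration): two Gaussian blobs.
rng = random.Random(0)
rows, labels = [], []
for _ in range(100):
    y = rng.choice([-1, 1])
    rows.append([rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0)])
    labels.append(y)

train = (rows[:70], labels[:70])   # used to fit each candidate model
tune = (rows[70:], labels[70:])    # held out, used only to compare them
candidates = [0.01, 0.1, 1.0, 10.0]
best_c, best_acc = select_c(train, tune, candidates)
print("selected c:", best_c, "tuning accuracy:", best_acc)
```

Because the loop needs only a candidate list and a data split, it can run without analyst input, which is what makes this parameter amenable to automation.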

4 Evaluation

Prior to delivering a data mining model or solution to the client, the service will evaluate the model or solution to ensure that it has sufficient predictive power or provides adequate insight into trends existing in the underlying data. The primary focus in the evaluation phase is ensuring that the client is being handed a high-quality result from the service.

Depending upon the project, model evaluation may involve one or two components. The first is an objective, empirical measurement of the performance of the model or of the ability of the model to accurately characterize the underlying data. The second, which may not be needed for some projects, involves a discussion or presentation of the model with the client to collect feedback and ensure that the delivered model satisfies the client's needs. This second component is typical for projects or models in which the goal is data understanding or data exploration. We discuss the empirical measurement component of the evaluation phase in more detail for two high-level data mining tasks: predictive applications and data exploration.

By exposing intuitive, well-designed model browsers to the client, the service may automate the process of presenting the model and collecting client feedback to the extent possible. As the service focuses on clients in particular domains or verticals, model browsers or report templates may be created that draw attention to important or interesting results for the specific domain or market.

4.1 Evaluating Predictive Models

The primary focus of predictive model evaluation is to estimate the predictive performance of a given model when it is applied to future or unseen data instances. Algorithms discussed that produce predictive models include decision trees (Section 3.1), association rules (Section 3.2) and support vector machines (Section 3.4).
The basic assumption underlying different predictive performance estimation strategies is that the distribution of future or unseen data is the same as (or similar to) the distribution of the training data used to construct the model. Popular methods for estimating the performance of predictive models include cross-validation [24] and ROC curves [21]. Cross-validation provides an overall average predictive performance value, given that the data distribution assumption above is satisfied. ROC curves provide a more detailed analysis of the frequency of false positives and false negatives for predictors constructed from a given classification algorithm.
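Both estimates can be computed with a few lines of code. The sketch below is illustrative only: it uses a toy one-dimensional threshold classifier and hand-made data (both assumptions), and the helper names `k_fold_indices`, `cross_validate` and `roc_points` are hypothetical. The ROC routine treats every example as its own threshold and does not merge tied scores.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(rows, labels, k, train_fn, predict_fn):
    """Average held-out accuracy over k folds."""
    accs = []
    for fold in k_fold_indices(len(rows), k):
        held = set(fold)
        tr_x = [r for i, r in enumerate(rows) if i not in held]
        tr_y = [l for i, l in enumerate(labels) if i not in held]
        model = train_fn(tr_x, tr_y)
        hits = sum(predict_fn(model, rows[i]) == labels[i] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / k


def roc_points(scores, labels):
    """(false positive rate, true positive rate) pairs, one per example,
    obtained by lowering the decision threshold through the sorted scores."""
    pos = sum(1 for l in labels if l == 1)
    neg = len(labels) - pos
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, label in sorted(zip(scores, labels), key=lambda t: -t[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


# Toy classifier (assumed): threshold halfway between the class means.
def train_threshold(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


rows = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

cv_accuracy = cross_validate(rows, labels, 5, train_threshold, predict_threshold)
roc = roc_points(rows, labels)  # the raw value serves as the score
print("5-fold accuracy:", cv_accuracy)
print("ROC points:", roc)
```

Passing the training and prediction routines in as functions keeps the evaluation code independent of the modeling algorithm, so one evaluation harness can serve every predictive model the service offers.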

From the viewpoint of the data mining service, automating cross-validation and ROC computations is straightforward. Running these evaluation techniques requires little (if any) input from a human analyst on the part of the service. But computation of these values may be time-consuming, especially when the predictive modeling algorithm used has a lengthy running time on the client's data.

In addition to evaluating a given model (or set of models) with respect to predictive performance, the data mining service may implement evaluation metrics that are more informative for clients within a given domain or vertical.

Example: Consider again the data mining service that caters to e-commerce clients. The service may provide product recommendations for e-commerce companies (e.g. when a customer is viewing sheets at the e-commerce site, also recommend pillows to them). In this case, although the recommender system is a predictive modeling system, the data mining service may evaluate different predictive models with respect to the amount of revenue expected to be generated when the recommender is placed in production.

4.2 Evaluating Data Exploration Models

The primary goal in evaluation of models that support data exploration tasks is ensuring that the model is accurately summarizing and characterizing patterns and trends in the underlying dataset. Algorithms discussed that address data exploration tasks are the clustering methods mentioned in Section 3.3. To some extent, association rules (Section 3.2) are also used as data exploration tools.

Objective measures for evaluating clustering models to ensure that the model accurately captures data characteristics include Monte Carlo cross-validation [23]. This method is straightforward to automate on the part of the data mining service.

Given the nature of association rule discovery algorithms, the set of association rules found is, by definition, accurately derived from the data.
So there is no need to empirically measure the fit of the set of association rules to the underlying dataset.

The quality of data exploration models is related to the utility of the extracted patterns and trends with respect to the client's organization. When the data mining service focuses on a particular client domain or vertical market, effective model browsers and templates can be constructed that focus the client's attention on information that is frequently deemed useful in the domain or vertical. Hence the quality of the model with respect to the particular domain is then easily evaluated by the client. Additionally, the service can use these browsers and templates to evaluate model quality prior to exposing the model to the client.

5 Conclusion

The goal of the data mining service is to effectively and efficiently produce high-quality data mining results for the data mining client for a reasonable (low) cost. We argued that a quality, cost-effective data mining result may be delivered by automating the operational aspects of the data mining process and focusing on specific client domains. Upon successful execution of these tasks, the service is then an attractive option for small, medium and large organizations to capitalize on and leverage their data investment to improve the delivery of their products and services to their customers.

References

[1] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pages , Washington, D.C., May
[2] J. D. Becher, P. Berkhin, and E. Freeman. Automating exploratory data analysis for efficient mining. In Proc. of the Sixth ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD-2000), pages , Boston, MA,
[3] P. S. Bradley and U. M. Fayyad. Refining initial points for K-Means clustering. In Proc. 15th International Conf. on Machine Learning, pages Morgan Kaufmann, San Francisco, CA,
[4] P. S. Bradley, J. Gehrke, R. Ramakrishnan, and R. Srikant. Scaling mining algorithms to large databases. Comm. of the ACM, 45(8):38-43,
[5] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont,
[6] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2): ,
[7] I. V. Cadez and P. S. Bradley. Model based population tracking and automatic detection of distribution changes. In Proc. Neural Information Processing Systems 2001,
[8] D. M. Chickering. Personal communication, January

15 [9] CRISP-DM Consortium. Cross industry standard process for data mining (crisp-dm). [10] Microsoft Corp. Introduction to ole db for data mining. [11] R. Duda, P. Hart, and D. Stork. Pattern classification. John Wiley & Sons, New York, [12] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurasamy. Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge, MA, [13] Data Mining Group. Pmml version [14] S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, pages 73 84, New York, ACM Press. [15] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, [16] Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discovering association rules. In Usama M. Fayyad and Ramasamy Uthurusamy, editors, AAAI Workshop on Knowledge Discovery in Databases (KDD-94), pages , Seattle, Washington, AAAI Press. [17] Nimrod Megiddo and Ramakrishnan Srikant. Discovering predictive association rules. In Knowledge Discovery and Data Mining, pages , [18] Sreerama K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4): , [19] M. T. Oguz. Strategic intelligence: Business intelligence in competitive strategy. DM Review, August [20] Clark F. Olson. Parallel algorithms for hierarchical clustering. Parallel Computing, 21(8): , [21] Foster J. Provost and Tom Fawcett. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In Knowledge Discovery and Data Mining, pages 43 48, [22] D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann, San Francisco, CA, [23] Padhraic Smyth. Clustering using monte carlo cross-validation. In Knowledge Discovery and Data Mining, pages ,

16 [24] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, 36: , [25] D. E. Weisman and C. Buss. Database functionality high, analytics lags, September 28, Forrester Brief: Business Technographics North America. 16

Mining an Online Auctions Data Warehouse

Mining an Online Auctions Data Warehouse Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Philosophies and Advances in Scaling Mining Algorithms to Large Databases

Philosophies and Advances in Scaling Mining Algorithms to Large Databases Philosophies and Advances in Scaling Mining Algorithms to Large Databases Paul Bradley Apollo Data Technologies paul@apollodatatech.com Raghu Ramakrishnan UW-Madison raghu@cs.wisc.edu Johannes Gehrke Cornell

More information

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York Imberman@postbox.csi.cuny.edu Abstract

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Postprocessing in Machine Learning and Data Mining

Postprocessing in Machine Learning and Data Mining Postprocessing in Machine Learning and Data Mining Ivan Bruha A. (Fazel) Famili Dept. Computing & Software Institute for Information Technology McMaster University National Research Council of Canada Hamilton,

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

A Case Study in Knowledge Acquisition for Insurance Risk Assessment using a KDD Methodology

A Case Study in Knowledge Acquisition for Insurance Risk Assessment using a KDD Methodology A Case Study in Knowledge Acquisition for Insurance Risk Assessment using a KDD Methodology Graham J. Williams and Zhexue Huang CSIRO Division of Information Technology GPO Box 664 Canberra ACT 2601 Australia

More information

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS

A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

How To Use Data Mining For Loyalty Based Management

How To Use Data Mining For Loyalty Based Management Data Mining for Loyalty Based Management Petra Hunziker, Andreas Maier, Alex Nippe, Markus Tresch, Douglas Weers, Peter Zemp Credit Suisse P.O. Box 100, CH - 8070 Zurich, Switzerland markus.tresch@credit-suisse.ch,

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis , 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,

More information

Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1

Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1 Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1 Salvatore J. Stolfo, David W. Fan, Wenke Lee and Andreas L. Prodromidis Department of Computer Science Columbia University

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

More information

DATA MINING TECHNIQUES FOR CRM

DATA MINING TECHNIQUES FOR CRM International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013 509 DATA MINING TECHNIQUES FOR CRM R.Senkamalavalli, Research Scholar, SCSVMV University, Enathur, Kanchipuram

More information

Knowledge Mining for the Business Analyst

Knowledge Mining for the Business Analyst Knowledge Mining for the Business Analyst Themis Palpanas 1 and Jakka Sairamesh 2 1 University of Trento 2 IBM T.J. Watson Research Center Abstract. There is an extensive literature on data mining techniques,

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

Data Mining and Database Systems: Where is the Intersection?

Data Mining and Database Systems: Where is the Intersection? Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: surajitc@microsoft.com 1 Introduction The promise of decision support systems is to exploit enterprise

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

Data Mining. Nonlinear Classification

Data Mining. Nonlinear Classification Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

More information

Data Mining Techniques and Opportunities for Taxation Agencies

Data Mining Techniques and Opportunities for Taxation Agencies Data Mining Techniques and Opportunities for Taxation Agencies Florida Consultant In This Session... You will learn the data mining techniques below and their application for Tax Agencies ABC Analysis

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

Emerging Trends in Business Analytics

Emerging Trends in Business Analytics Appears in the Communications of the ACM, Volume 45, Number 8, Aug 2002, pages 45-48 Note: This is the final draft prior to copy-editing done by CACM Emerging Trends in Business Analytics Ron Kohavi Blue

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification

Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde

More information

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Bayesian Predictive Profiles with Applications to Retail Transaction Data

Bayesian Predictive Profiles with Applications to Retail Transaction Data Bayesian Predictive Profiles with Applications to Retail Transaction Data Igor V. Cadez Information and Computer Science University of California Irvine, CA 92697-3425, U.S.A. icadez@ics.uci.edu Padhraic

More information

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

Working with telecommunications

Working with telecommunications Working with telecommunications Minimizing churn in the telecommunications industry Contents: 1 Churn analysis using data mining 2 Customer churn analysis with IBM SPSS Modeler 3 Types of analysis 3 Feature

More information

Gold. Mining for Information

Gold. Mining for Information Mining for Information Gold Data mining offers the RIM professional an opportunity to contribute to knowledge discovery in databases in a substantial way Joseph M. Firestone, Ph.D. During the late 1980s,

More information

A Brief Tutorial on Database Queries, Data Mining, and OLAP

A Brief Tutorial on Database Queries, Data Mining, and OLAP A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

analytics stone Automated Analytics and Predictive Modeling A White Paper by Stone Analytics

analytics stone Automated Analytics and Predictive Modeling A White Paper by Stone Analytics stone analytics Automated Analytics and Predictive Modeling A White Paper by Stone Analytics 3665 Ruffin Road, Suite 300 San Diego, CA 92123 (858) 503-7540 www.stoneanalytics.com Page 1 Automated Analytics

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining José Hernández ndez-orallo Dpto.. de Systems Informáticos y Computación Universidad Politécnica de Valencia, Spain jorallo@dsic.upv.es Horsens, Denmark, 26th September 2005

More information

Measuring Lift Quality in Database Marketing

Measuring Lift Quality in Database Marketing Measuring Lift Quality in Database Marketing Gregory Piatetsky-Shapiro Xchange Inc. One Lincoln Plaza, 89 South Street Boston, MA 2111 gps@xchange.com Sam Steingold Xchange Inc. One Lincoln Plaza, 89 South

More information

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA

Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA Digging for Gold: Business Usage for Data Mining Kim Foster, CoreTech Consulting Group, Inc., King of Prussia, PA ABSTRACT Current trends in data mining allow the business community to take advantage of

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

A Knowledge Management Framework Using Business Intelligence Solutions

A Knowledge Management Framework Using Business Intelligence Solutions www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For

More information

Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results

Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results From: AAAI Technical Report WS-97-07. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results Salvatore 2 J.

More information

Data Mining Analysis of a Complex Multistage Polymer Process

Data Mining Analysis of a Complex Multistage Polymer Process Data Mining Analysis of a Complex Multistage Polymer Process Rolf Burghaus, Daniel Leineweber, Jörg Lippert 1 Problem Statement Especially in the highly competitive commodities market, the chemical process

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.

More information

IBM's Fraud and Abuse, Analytics and Management Solution

IBM's Fraud and Abuse, Analytics and Management Solution Government Efficiency through Innovative Reform IBM's Fraud and Abuse, Analytics and Management Solution Service Definition Copyright IBM Corporation 2014 Table of Contents Overview... 1 Major differentiators...

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:

More information

CRISP-DM: Towards a Standard Process Model for Data Mining

CRISP-DM: Towards a Standard Process Model for Data Mining CRISP-DM: Towards a Standard Process Model for Mining Rüdiger Wirth DaimlerChrysler Research & Technology FT3/KL PO BOX 2360 89013 Ulm, Germany ruediger.wirth@daimlerchrysler.com Jochen Hipp Wilhelm-Schickard-Institute,

More information

Clustering Marketing Datasets with Data Mining Techniques

Clustering Marketing Datasets with Data Mining Techniques Clustering Marketing Datasets with Data Mining Techniques Özgür Örnek International Burch University, Sarajevo oornek@ibu.edu.ba Abdülhamit Subaşı International Burch University, Sarajevo asubasi@ibu.edu.ba

More information

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support

Overview. Background. Data Mining Analytics for Business Intelligence and Decision Support Mining Analytics for Business Intelligence and Decision Support Chid Apte, PhD Manager, Abstraction Research Group IBM TJ Watson Research Center apte@us.ibm.com http://www.research.ibm.com/dar Overview

More information

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in

Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad Email: Prasad_vungarala@yahoo.co.in 96 Business Intelligence Journal January PREDICTION OF CHURN BEHAVIOR OF BANK CUSTOMERS USING DATA MINING TOOLS Dr. U. Devi Prasad Associate Professor Hyderabad Business School GITAM University, Hyderabad

More information

Hospital Billing Optimizer: Advanced Analytics Solution to Minimize Hospital Systems Revenue Leakage

Hospital Billing Optimizer: Advanced Analytics Solution to Minimize Hospital Systems Revenue Leakage Hospital Billing Optimizer: Advanced Analytics Solution to Minimize Hospital Systems Revenue Leakage Profit from Big Data flow 2 Tapping the hidden assets in hospitals data Revenue leakage can have a major

More information

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

DATA PREPARATION FOR DATA MINING

DATA PREPARATION FOR DATA MINING Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI

More information

Grow Revenues and Reduce Risk with Powerful Analytics Software

Grow Revenues and Reduce Risk with Powerful Analytics Software Grow Revenues and Reduce Risk with Powerful Analytics Software Overview Gaining knowledge through data selection, data exploration, model creation and predictive action is the key to increasing revenues,

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

Visual Data Mining with Pixel-oriented Visualization Techniques

Visual Data Mining with Pixel-oriented Visualization Techniques Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 mihael.ankerst@boeing.com Abstract Pixel-oriented visualization

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information
