Data Mining as an Automated Service
P. S. Bradley
Apollo Data Technologies, LLC
February 16, 2003

Abstract

An automated data mining service offers an outsourced, cost-effective analysis option for clients desiring to leverage their data resources for decision support and operational improvement. In the context of the service model, the client typically provides the service with data and other information likely to aid in the analysis process (e.g., domain knowledge). In return, the service provides analysis results to the client. We describe the required processes, issues, and challenges in automating the data mining and analysis process when the high-level goals are: (1) to provide the client with a high-quality, pertinent analysis result; and (2) to automate the data mining service, minimizing the amount of human analyst effort required and the cost of delivering the service. We argue that by focusing on client problems within market sectors, both of these goals may be realized.

1 Introduction

The amount spent by organizations in implementing and supporting database technology is considerable. Global 3500 enterprises typically spend $664,000 on databases annually, primarily focusing on transactional systems and web-based access [25]. Unfortunately, the ability of organizations to effectively utilize this information for decision support typically lags behind their ability to collect and store it. Yet organizations that can leverage their data for decision support are more likely to have a competitive edge in their sector of the market [19].

An organization may choose from a number of options when implementing the data analysis and mining technology needed to support decision-making processes or to optimize operations. One option is to perform the analysis within the organization (i.e., in-house). It is costly to obtain the ingredients required for a successful project: analytical and data mining experience on the part of the team performing the work, software tools, and hardware. Hence the in-house data mining investment is substantial.

Another option that an organization may pursue is to outsource the data analysis and mining project to a third party. Data mining consultants have gone into business to address this market need. Consultants typically have data analysis and mining experience within one or more market sectors (e.g., banking, health care), have analysis and mining software at their disposal, have the required hardware to process large datasets, and are able to produce customized analysis solutions. The cost of an outsourced data mining project depends upon a number of factors, including project complexity, data availability, and dataset size. Outsourcing such projects to a third-party consulting firm has proven to be successful; however, the cost is on the order of tens to hundreds of thousands of dollars.

Unfortunately, a number of organizations simply do not have the resources available to perform data mining and analysis in-house or to outsource these tasks to large consulting firms. These organizations nevertheless often realize that they may be able to gain a competitive advantage by utilizing their data for decision support and/or operational improvement. Hence, there is an opportunity to deliver data mining and analysis services at a reduced cost. We argue that an automated data mining service can provide such analysis at a reduced cost by targeting organizations (or data mining clients) within a given market sector (or market vertical) and automating the knowledge discovery and data mining process [12, 9] to the extent possible.

We note that there are similarities between the design of an automated data mining service and data mining software packages focusing on specific business needs within given vertical markets.
By focusing on specific problems in a vertical market, both the software designer and the service designer are able to address data format and preparation issues and choose appropriate modeling, evaluation, and deployment strategies. An organization may find a software solution appealing if it addresses their specific analysis needs, has the proper interfaces to the data and to the end user (who may not be an analyst), and produces analysis results that are easily deployable. Organizations that have problems not specifically addressed by software solutions, or that are attracted to a low-cost, outsourced alternative, may find an automated data mining service to be their best option.

We next present the steps of the knowledge discovery and data mining process to provide context for the discussion on automating various steps. Data mining project engagements typically proceed through the following steps (the sequence is not strict, and moving back and forth between steps is often required) [9]:
1. Problem understanding: This initial step primarily focuses on data mining problem definition and specification of the project objectives.
2. Data understanding: This step includes data extraction, data quality measurement, and detection of interesting data subsets.
3. Data preparation: This step includes all activities required to construct the final dataset for modeling from the raw data, including data transformation, merging of multiple data sources, data cleaning, and record and attribute selection.
4. Modeling: In this step, different data mining algorithms and/or tools are selected and applied to the table constructed in the data preparation step. Algorithm parameters are tuned to optimal values.
5. Evaluation: The goal of this step is to verify that the data mining or analysis models generated in the modeling stage are robust and achieve the objectives specified in the problem understanding step.
6. Deployment: In this step, the results are put in a form to be utilized for decision support (e.g., a report), or the data mining models may be integrated with one or more of the organization's IT systems (e.g., scoring prospective customers in real time).

Fig. 1 shows these analysis steps and the party that is primarily responsible for each individual step. We next specifically define the data resources and parties that form the basis of the relationship between the automated data mining service and the client.

1.1 Definitions

Definition 1.1 (Raw Data) The base electronic data sources that contain or are believed to contain information relevant to the data mining problem of interest.

Definition 1.2 (Metadata) Additional information that is necessary to properly clean, join and aggregate the raw data into the form of a dataset that is suitable for analysis. Metadata may consist of data schemas, data dictionaries (including the number of records in each file, and the names, types and range of values for each field), and aggregation rules.
Definition 1.3 (Domain-specific Information) Additional information on specific rules, conventions, practices, etc. that are limited to the problem domain. For example, a retail chain may indicate product returns in its transactional records with negative sales and quantity values. This information may be extremely helpful in the data preparation, evaluation, and possibly the modeling steps.

Figure 1: Data Mining Project Steps: problem understanding, data understanding, data preparation, modeling, evaluation and deployment. Dashed lines indicate that primary responsibility is on the part of the data mining client. Dotted lines indicate that primary responsibility is on the part of the service. Notice that both the client and the service play central roles in the data understanding step.

Definition 1.4 (Third-party data sources) Additional raw data that will likely improve the analysis result, but that is not collected or owned by the data mining client. Examples of third-party data sources include address information from the postal service and demographic information from third-party data collection companies.

Definition 1.5 (Automated Data Mining Service) An organization that has implemented or obtained processes, tools, algorithms and infrastructure to perform the work of data understanding, data preparation, modeling, and evaluation with minimal input from a human analyst.

The high-level goals of the automated data mining service are the following.

1. Provide the data mining client with a high-quality, pertinent result. Achieving this goal ensures that the organization receiving the data mining/analysis results is satisfied (and hopefully a return customer!).
2. Remove or minimize the amount of human intervention required to produce a high-quality, pertinent result.

Achieving the second goal allows the data mining/analysis service to scale to a large number of concurrent customers, allowing it to amortize the cost of offering the service across those customers. In addition to automating as many of the operational aspects of the analysis process as possible, this goal is achievable by focusing on problems common in a few market verticals or problem domains.

For this discussion, we will assume that the set of data mining analysis problems that the automated service addresses is fixed. For example, the service may be able to offer data clustering analysis services, but not association-rule services.
We will also drop the word "automated" for the remainder of the discussion; when referring to the data mining service, we mean an automated data mining service.

Definition 1.6 (Data Mining Client) An organization that possesses two items: (1) a data mining analysis problem; and (2) the raw data, metadata, and domain-specific information that, possibly in combination with third-party data sources, are needed to solve the data mining analysis problem.
We next specify the context of the relationship between the data mining client and the data mining service.

1.2 Relationship between Consumer and Service

The data mining service receives raw data, metadata, and possibly domain-specific information from the client. The service then performs data understanding, data preparation, modeling, and evaluation for the analyses specified by the consumer, for an agreed-upon fee. When these steps are completed, the results and possibly the provided data and other information (reports on analyses tried, intermediate aggregations, etc.) are returned to the client.

The combination of offering a fixed set of data mining and analysis solutions and focusing on clients from similar domains or market verticals enables the data mining service to perform the data understanding and deployment tasks with minimal intervention from the human data analyst working for the service. Offering a fixed set of data mining and analysis solutions allows the service to templatize problem definition documents and related information. Solution deployment is similarly constrained by the problem domain and the focused vertical market, again allowing the service to templatize deliverables. Additionally, by focusing on a particular problem domain, the data mining service analyst gains invaluable domain knowledge, increasing the likelihood of successful solution delivery in future engagements.

Example: Consider the e-commerce domain. A data mining service may offer the following solutions: determining the most common paths followed by website visitors, determining the most common products purchased together, and ranking products that are most likely to be purchased together. For the e-commerce domain, problem definition and specification document templates can help make the problem understanding phase clear and efficient. Additionally, the deployment steps may be nearly automated.
Depending on the analysis performed, the results may take the form of an automatically generated report, a file, an executable, etc. Analyses of common paths and of common products purchased together are typically best delivered in a standard report form. Product rankings may best be delivered to the client in the form of a file or an executable that takes a given product ID and returns the ranked list of product IDs.

There are often legal, security, and privacy concerns regarding data extraction on the part of the data mining client that should be addressed by the service. For more detail on these issues, please see [22].

The remainder of the paper is organized as follows. Sections 2, 3 and 4 focus on issues involved in automating and scaling the data preparation, modeling, and evaluation steps in the general KDD process. Section 5 concludes the paper.

2 Data Understanding and Preparation

Tasks involved in the data understanding step are data extraction, data quality measurement, and the detection of interesting data subsets. The data preparation step consists of all activities required to construct the final dataset for modeling.

Responsibility for tasks in the data understanding step is typically split between the client and the service. The data mining client is responsible for extracting and providing the required raw data sources to the service for analysis. The data mining service may also augment the raw data provided by the client with third-party data sources. Data quality measurement and detection of interesting data subsets are performed by the service.

To efficiently address data understanding and preparation tasks, the data mining service needs to rely upon the fact that its clients come from certain specific domain areas. Ideally, these domains are chosen so that the organizations within a particular domain have similar data schemas and data dictionaries characterizing the data sources that are to be analyzed. Given a priori knowledge about the data schemas and data dictionaries for a given domain, the data mining service can automate the operational steps of joining the appropriate raw data sources and possibly integrating third-party data sources.

Similar data formats within a market vertical or domain also justify the building of automated domain-specific data cleaning and data quality measurement tools. In the perfect setting, domain-specific rules for data cleaning can be completely automated. Short of that, a useful solution automatically captures and fixes a majority of data cleaning issues and flags only a small fraction of violations for human intervention and triage.
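The "fix most issues automatically, flag the rest" pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (`product_id`, `quantity`, `unit_price`), the `catalog` mapping of product ID to list price, and the three rules themselves are hypothetical and are not taken from any particular client schema.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def clean_line_items(
    line_items: List[Record], catalog: Dict[str, float]
) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Apply hypothetical domain rules: fix what is safe, flag the rest."""
    cleaned: List[Record] = []
    flagged: List[Tuple[Record, str]] = []
    for item in line_items:
        rec = dict(item)  # work on a copy; the client's raw data is left untouched
        # Auto-fix: normalize the formatting of the product ID.
        rec["product_id"] = str(rec.get("product_id", "")).strip().upper()
        # Flag: an ID missing from the catalog cannot be repaired automatically.
        if rec["product_id"] not in catalog:
            flagged.append((rec, "unknown product_id"))
            continue
        # Auto-fix: fill a missing price from the catalog.
        if rec.get("unit_price") is None:
            rec["unit_price"] = catalog[rec["product_id"]]
        # Flag: a price above the catalog price (assumed rule) goes to an analyst.
        if rec["unit_price"] > catalog[rec["product_id"]]:
            flagged.append((rec, "price exceeds catalog price"))
            continue
        cleaned.append(rec)
    return cleaned, flagged
```

In a sketch like this, the ratio of flagged to cleaned records is itself a simple indicator of how much analyst time a new client's data is likely to require.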
The goal of data quality measurement is to justify the potential that the data may provide the required solution to the given problem [22]. Since the solutions offered by the service and the data sources themselves tend to be domain-specific, data quality measurement may be automated with minimal effort, requiring only configuration information.

Example: Suppose a data mining service is offering market-basket analysis for e-commerce companies. The typical data sources of interest include the order header, order line item information, and product catalog data. Since these sources tend to have similar schemas, it is possible to automate some data cleaning processes (ensuring that product IDs in the order information correspond to product IDs in the catalog, that rules governing the line item price information with respect to the catalog prices are respected, etc.). Additionally, initial data quality measurements may include the number of 1-itemsets that have sufficient support.

Similarly, when restricted to a small number of domains and a small set of data mining problems to address, with knowledge and expertise, it is possible to automatically apply a number of data transformations and feature selection techniques that have been shown to be useful for the given domain (possibly consisting of a combination of domain knowledge and automated feature selection methods). These can be automatically executed and the resulting models may be automatically scored, yielding a system that constructs a number of models with little intervention from the human analyst. For a more detailed discussion on automating the feature selection/variable transformation task, see [2].

3 Modeling

Modeling is the step in the data mining process that includes the application of one or more data mining algorithms to the prepared dataset, including selection of algorithm parameter values. The result of the modeling step is a series of predictive models, descriptive models, or both. Descriptive models are useful when the data mining client is attempting to get a better understanding of the dataset at hand. Predictive models summarize trends in the data so that, if future data has the same or a similar distribution to past data, the model will predict trends with some degree of accuracy.

In this section we assume that the data understanding and data preparation phases have produced a dataset from which the desired data mining solution can be derived. In practice, however, results of the modeling step often motivate revisiting the data understanding and data preparation steps.
For example, after building a series of models it may become apparent that a different data transformation would greatly aid in the modeling step.

When evaluating the utility of a given data mining algorithm for possible use in the service, the following considerations should be taken into account:

1. Assuming the prepared dataset is informative, is it possible to obtain high-quality models consistently using the given algorithm?
2. Is the algorithm capable of optimizing objectives specific to the client's organization (e.g., total monetary cost/return)?
3. Is the algorithm efficient?
There are two factors influencing the likelihood of obtaining a high-quality, useful model using a given algorithm. The first factor relates to the robustness of the computed solution with respect to small changes in the input parameters or slight changes in the data. Ideally, for a majority of datasets that are encountered in a given domain, the data mining service prefers robust algorithms, since model quality with respect to small parameter and data changes is then predictable (and hence, the algorithm is amenable to automation).

The second factor is the ease with which the insight gained by analyzing the model can be communicated to the data mining client. Typically, prior to deploying a model or utilizing the model in organizational decision processes, the data mining client desires to understand the insights gleaned from the model. This process is typically difficult to automate, but developing intuitive, easy-to-understand user interfaces aids greatly. Additionally, a process that identifies a fraction of interesting rules or correlations and reports these to a data mining service analyst is a very useful quality assurance tool. These are primarily concerns during the deployment step, but the choice of modeling technique does affect this later phase. We note that there are some data mining applications in which the client often does not analyze the model, but analyzes the computed results (e.g., results produced by product recommender systems are often analyzed, rather than attempting to understand the underlying model).

Industry standards for data mining model storage such as PMML [13] and OLE DB for Data Mining [10] enable consultants and third-party vendors to build effective model browsers for specific industry problems and domains. These standards provide a basis for data mining platforms that enable data mining clients to more easily deploy and understand the models built by the service.
From the viewpoint of the data mining client, model maintenance tends to be an important issue. The data mining client may not have or may not want to invest resources to ensure that the data mining models they have received from the service are maintained and accurately model their underlying organizational processes, which may be changing over time. Techniques for incrementally maintaining data mining models are discussed in [4]. Additionally, work on identifying the fit of data mining models to data changing over time includes [7]. It may be possible for the service to incorporate these techniques into the client deliverable so that the model may maintain itself or notify the client that it is not sufficiently modeling recently collected data.

We briefly discuss some popular algorithms used in developing data mining solutions. Note that this list is not exhaustive.
3.1 Decision Trees

Decision tree construction algorithms are appealing from the perspective of the data mining service for a number of reasons. The tree's hierarchical structure enables non-technical clients to understand and effectively explore the model after a short learning period. Decision tree construction typically requires the service to provide few, if any, parameters; hence tuning the algorithm is relatively easy. Typically, small data changes result in small, if any, changes to the resulting model (although this is not true for all datasets and possible changes in data) [8]. For excellent discussions on decision tree algorithms, please see [5, 18]. For a discussion on techniques used to scale decision tree algorithms to large datasets, see [4].

3.2 Association Rules

Association rule algorithms identify implications of the form X → Y, where X and Y are sets of items. The association rule model consists of a listing of all such implications existing in the given dataset. These implications are useful for data exploration and may be very useful in predictive applications (e.g., see [17]). Association rules are often presented to the user in priority order, with the most interesting rules occurring first. Alternatively, or in addition to the list of interesting rules, a browser allowing the data mining client to filter rules with specified items occurring in the set X or Y is typically useful. For an overview of association rule algorithms, see [1, 16]. Approaches used to scale association rule algorithms to large databases are discussed in [4].

3.3 Clustering

Clustering algorithms aim to partition the given dataset into several groups such that records in the same group are similar to each other, identifying subpopulations in the data. Typically, the data mining client is not interested in the particular clustering strategy employed by the service, but is interested in a concise, effective summary of the groups that occur in their data.
Although there are numerous clustering approaches available, we will focus the discussion on two classes of methods: iterative and hierarchical.

Iterative clustering methods are well known and typically straightforward to implement. But from the perspective of the data mining service, there are two challenges to automating them: obtaining a robust model that accurately characterizes the underlying data, and determining the correct number of clusters existing in the underlying data. Iterative clustering methods require the specification of initial clusters, and the computed solution is dependent upon the quality of this initial partition. Hence, to ensure a quality solution, the data mining service must implement a search for good initial clusters [3]. This is typically done by re-running the iterative clustering algorithm from multiple random initial clusters and taking the best model, or by utilizing sampling strategies. Additionally, determining the correct number of clusters is challenging, but strategies such as those discussed in [23] are useful. For a general overview of iterative clustering methods, see [15, 11].

Hierarchical clustering methods build a tree-based hierarchical taxonomy (dendrogram) summarizing similarity relationships between subsets of the data at different levels of granularity. The hierarchical nature of the resulting model is a benefit for the data mining service, since this structure is typically easily browsed and understood by the client. Additionally, these clustering methods are very flexible when it comes to the distance metric employed to group together similar items, making hierarchical methods applicable to a number of problems that require the use of non-standard distance metrics. The main caveat to standard hierarchical clustering implementations is their computational complexity, requiring either O(m^2) memory or O(m^2) time for m data points, but automating these standard implementations is straightforward. Work on scaling these methods to large datasets includes [14, 20]. For a detailed discussion of hierarchical clustering methods, see [15].

3.4 Support Vector Machines

Support Vector Machines (SVMs) are powerful and popular solutions to predictive modeling problems. SVM algorithms are typically stable and robust with respect to small changes in the underlying data. The algorithms require the specification of a parameter that effectively balances the predictive performance on the available training data with the complexity of the predictive model computed.
Tuning set strategies are typically used to automate the selection of optimal values of this parameter. The SVM predictive model is a function in the space of the input or predictor attributes of the underlying data. Since this space tends to be high-dimensional, presenting the SVM model to the data mining client for the purpose of gaining insight is often a difficult proposition. The SVM is a very good predictive model, but is somewhat of a black box with respect to understanding and extracting information from its functional form. For an overview of SVMs, see [6]. For strategies on scaling SVM computation to large datasets, see [4].
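A tuning set strategy for the SVM trade-off parameter might be sketched as follows: hold out part of the available data, train one model per candidate value, and keep the value with the best held-out accuracy. The sketch makes several assumptions that are not from the paper: the solver is a small Pegasos-style stochastic sub-gradient method for a linear SVM (swapped in so the example stays self-contained), the bias is handled as a regularized constant feature, labels are +1/-1, and the names `train_linear_svm`, `select_c`, and the candidate grid are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[Sequence[float]], y: List[int], c: float,
                     steps: int = 2000, seed: int = 0) -> List[float]:
    """Pegasos-style training of a linear SVM; labels must be +1 or -1.

    A larger c puts more weight on training error and less on model simplicity.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant feature acts as a bias term
    lam = 1.0 / (c * len(data))          # regularization strength implied by c
    w = [0.0] * len(data[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)            # decaying step size
        violated = y[i] * _dot(w, data[i]) < 1.0
        w = [(1.0 - eta * lam) * wj for wj in w]
        if violated:                     # hinge loss is active for this record
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def accuracy(w: Sequence[float], X: List[Sequence[float]], y: List[int]) -> float:
    hits = sum((1 if _dot(w, list(x) + [1.0]) >= 0 else -1) == t
               for x, t in zip(X, y))
    return hits / len(y)


def select_c(train_X: List[Sequence[float]], train_y: List[int],
             tune_X: List[Sequence[float]], tune_y: List[int],
             grid: Sequence[float]) -> Tuple[float, float]:
    """Return (best c, tuning-set accuracy); earlier grid values win ties."""
    best_c, best_acc = grid[0], -1.0
    for c in grid:
        acc = accuracy(train_linear_svm(train_X, train_y, c), tune_X, tune_y)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc
```

Ordering the grid from small to large values means that ties are resolved in favor of the more heavily regularized (simpler) model, which is one reasonable default for an unattended service.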
4 Evaluation

Prior to delivering a data mining model or solution to the client, the service will evaluate the model or solution to ensure that it has sufficient predictive power or provides adequate insight into trends existing in the underlying data. The primary focus in the evaluation phase is ensuring that the client is being handed a high-quality result from the service.

Depending upon the project, model evaluation may involve one or two components. The first is an objective, empirical measurement of the performance of the model or of the ability of the model to accurately characterize the underlying data. The second, which may not be needed for some projects, involves a discussion or presentation of the model with the client to collect feedback and ensure that the delivered model satisfies the client's needs. This second component is typical for projects or models in which the goal is data understanding or data exploration.

We discuss the empirical measurement component of the evaluation phase in more detail for two high-level data mining tasks: predictive applications and data exploration. By exposing intuitive, well-designed model browsers to the client, the service may automate the process of presenting the model and collecting client feedback to the extent possible. As the service focuses on clients in particular domains or verticals, model browsers or report templates may be created that draw attention to important or interesting results for the specific domain or market.

4.1 Evaluating Predictive Models

The primary focus of predictive model evaluation is to estimate the predictive performance of a given model when it is applied to future or unseen data instances. Algorithms discussed that produce predictive models include decision trees (Section 3.1), association rules (Section 3.2) and support vector machines (Section 3.4).
The basic assumption underlying different predictive performance estimation strategies is that the distribution of future or unseen data is the same as (or similar to) the distribution of the training data used to construct the model. Popular methods for estimating the performance of predictive models include cross-validation [24] and ROC curves [21]. Cross-validation provides an overall average predictive performance value, given that the data distribution assumption above is satisfied. ROC curves provide a more detailed analysis of the frequency of false positives and false negatives for predictors constructed from a given classification algorithm.
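Both estimates can be computed without analyst input once a training routine is available. The following is a minimal sketch, not a prescribed implementation: `cross_validate` performs k-fold cross-validation for any trainer that returns a prediction function, and `roc_points` traces ROC points from classifier scores. The toy trainer `train_midpoint` is hypothetical and exists only to make the example runnable; labels are assumed to be 0/1, scores are assumed distinct, and both classes are assumed to be present.

```python
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Sequence[float]], List[int]], Predictor]


def cross_validate(train: Trainer, X: List[Sequence[float]], y: List[int],
                   k: int = 5, seed: int = 0) -> float:
    """Return the mean held-out accuracy over k folds (requires len(X) >= k)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all records
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in idx if i not in held]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0).

    The threshold is lowered past one record at a time, highest score first.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def train_midpoint(X: List[Sequence[float]], y: List[int]) -> Predictor:
    """Toy trainer (hypothetical): threshold feature 0 midway between class means."""
    mean1 = sum(x[0] for x, t in zip(X, y) if t == 1) / sum(y)
    mean0 = sum(x[0] for x, t in zip(X, y) if t == 0) / (len(y) - sum(y))
    cut = (mean0 + mean1) / 2
    return lambda x: int(x[0] > cut)
```

Because the trainer is passed in as a function, the same evaluation harness can be reused unchanged across the algorithms a service offers; the cost is k training runs per estimate.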
From the viewpoint of the data mining service, automating cross-validation and ROC computations is straightforward. Running these evaluation techniques requires little (if any) input from a human analyst on the part of the service. But computation of these values may be time consuming, especially when the predictive modeling algorithm used has a lengthy running time on the client's data.

In addition to evaluating a given model (or set of models) with respect to predictive performance, the data mining service may implement evaluation metrics that are more informative for clients within a given domain or vertical.

Example: Consider again the data mining service that caters to e-commerce clients. The service may provide product recommendations for e-commerce companies (e.g., when a customer is viewing sheets at the e-commerce site, recommend pillows to them also). In this case, although the recommender system is a predictive modeling system, the data mining service may evaluate different predictive models with respect to the amount of revenue expected to be generated when the recommender is placed in production.

4.2 Evaluating Data Exploration Models

The primary goal in evaluation of models that support data exploration tasks is ensuring that the model accurately summarizes and characterizes patterns and trends in the underlying dataset. Algorithms discussed that address data exploration tasks are the clustering methods mentioned in Section 3.3. To some extent, association rules (Section 3.2) are also used as data exploration tools.

Objective measures for evaluating clustering models to ensure that the model accurately captures data characteristics include Monte Carlo cross-validation [23]. This method is straightforward to automate on the part of the data mining service. Given the nature of association rule discovery algorithms, the set of association rules found is, by definition, accurately derived from the data.
Hence there is no need to empirically measure the fit of the set of association rules to the underlying dataset.

The quality of data exploration models is related to the utility of the extracted patterns and trends with respect to the client's organization. When the data mining service focuses on a particular client domain or vertical market, effective model browsers and templates can be constructed that focus the client's attention on information that is frequently deemed useful in the domain or vertical. Hence the quality of the model with respect to the particular domain is then easily evaluated by the client. Additionally, the service can use these browsers and templates to evaluate model quality prior to exposing the model to the client.

5 Conclusion

The goal of the data mining service is to effectively and efficiently produce high-quality data mining results for the data mining client for a reasonable (low) cost. We argued that a quality, cost-effective data mining result may be delivered by automating the operational aspects of the data mining process and focusing on specific client domains. Upon successful execution of these tasks, the service is then an attractive option for small, medium and large organizations to capitalize on and leverage their data investment to improve the delivery of their products and services to their customers.

References

[1] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May.
[2] J. D. Becher, P. Berkhin, and E. Freeman. Automating exploratory data analysis for efficient mining. In Proc. of the Sixth ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD-2000), Boston, MA.
[3] P. S. Bradley and U. M. Fayyad. Refining initial points for K-Means clustering. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA.
[4] P. S. Bradley, J. Gehrke, R. Ramakrishnan, and R. Srikant. Scaling mining algorithms to large databases. Comm. of the ACM, 45(8):38-43.
[5] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Belmont.
[6] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2).
[7] I. V. Cadez and P. S. Bradley. Model based population tracking and automatic detection of distribution changes. In Proc. Neural Information Processing Systems 2001.
[8] D. M. Chickering. Personal communication, January.
15 [9] CRISP-DM Consortium. Cross industry standard process for data mining (crisp-dm). [10] Microsoft Corp. Introduction to ole db for data mining. [11] R. Duda, P. Hart, and D. Stork. Pattern classification. John Wiley & Sons, New York, [12] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurasamy. Advances in Knowledge Discovery and Data Mining. MIT Press, Cambridge, MA, [13] Data Mining Group. Pmml version [14] S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, pages 73 84, New York, ACM Press. [15] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, [16] Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discovering association rules. In Usama M. Fayyad and Ramasamy Uthurusamy, editors, AAAI Workshop on Knowledge Discovery in Databases (KDD-94), pages , Seattle, Washington, AAAI Press. [17] Nimrod Megiddo and Ramakrishnan Srikant. Discovering predictive association rules. In Knowledge Discovery and Data Mining, pages , [18] Sreerama K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4): , [19] M. T. Oguz. Strategic intelligence: Business intelligence in competitive strategy. DM Review, August [20] Clark F. Olson. Parallel algorithms for hierarchical clustering. Parallel Computing, 21(8): , [21] Foster J. Provost and Tom Fawcett. Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In Knowledge Discovery and Data Mining, pages 43 48, [22] D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann, San Francisco, CA, [23] Padhraic Smyth. Clustering using monte carlo cross-validation. In Knowledge Discovery and Data Mining, pages ,
16 [24] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, 36: , [25] D. E. Weisman and C. Buss. Database functionality high, analytics lags, September 28, Forrester Brief: Business Technographics North America. 16