using the analytic services data mining framework for classification: predicting the enrollment of students at a university (a case study)

Data Mining is the process of knowledge discovery involving finding hidden patterns and associations, constructing analytical models, performing classification and prediction, and presenting mining results. Data Mining is one of the functional groups offered with Hyperion System 9 BI+ Analytic Services, a highly scalable, enterprise-class analytic (OLAP) server. The Data Mining Framework within Analytic Services integrates data mining functions with OLAP and provides users with highly flexible and extensible on-line analytical mining capabilities. On-line analytical mining greatly enhances the power of exploratory data analysis by letting users mine different subsets of data at different levels of abstraction, in combination with core analytic operations like drill up, drill down, pivoting, filtering, and slicing and dicing, all performed on the same OLAP data source.

introduction

This paper focuses on using Naïve Bayes, one of the Data Mining algorithms shipped in-the-box with Analytic Services, to develop a model that solves a typical business problem in the admissions department of an academic institution, referred to in this paper as ABC University. The paper details the approach taken to solve the problem and explains the steps performed using Analytic Services in general, and the Analytic Services Data Mining Framework in particular, to arrive at the solution.

problem statement

One of the problems typical universities face in managing admissions is predicting, with reasonable accuracy, the likelihood that an applicant will eventually enroll in an academic program. Universities typically incur considerable expense in promoting their programs and in following up with prospective candidates. Identifying applicants with a higher likelihood of enrollment helps the university channel its promotional expenditure more gainfully. Candidates typically apply to more than one university to improve their chances of enrolling within that academic year, so universities that can quickly arrive at a decision on an applicant stand a higher chance of getting acceptance from candidates. ABC University collects a variety of data from applicants as part of the admissions process: demographic, geographic, test scores, financial information, etc.

In addition, the admissions department at ABC University also has acceptance information from the previous year's admissions process. The problem at hand is to use all this available data to predict whether an applicant will choose to enroll or not. ABC University is also interested in analyzing the composite factors influencing the enrollment decision. This additional analysis is useful in adjusting the university's admissions policy and in ensuring effective cost management in the admissions department.

available data

The admissions department currently gathers demographic, geographic, test score, financial, and other information from applicants as part of the admissions process. Historical data is also available indicating the actual enrollment status of past applicants, along with all the other attributes collected as part of the admissions process. The dataset made available has 33 different attributes for each applicant, inclusive of the decision result attribute. There are in all about records available.

Table 1: List of potential mining attributes available in the database

preparing for data mining

cube is the data source

The algorithms in the Data Mining Framework are designed to work on data present within an Analytic Services cube. The design of the cube should take into consideration the data needs of all the kinds of analyses (OLAP and Data Mining) that the user is interested in performing. Once the data is brought into the cube environment it can be accessed through the Data Mining Framework for predictive analytics. The Data Mining Framework uses MDX expressions to identify sections within the cube, both to obtain input data for the algorithm and to write back the results. The Data Mining Framework can only take regular dimension members as mining attributes. This implies that only data referenced through regular dimension members (not through attribute dimensions or user-defined attributes) can be presented as input data to the Data Mining Framework. Accordingly, the data required for predictive analytics should be modeled within the standard dimensions and measures of a cube. In the case study discussed in this paper, the primary business requirement was to build a classification model for prediction. Since there were no other accompanying business requirements, the design of the Analytic Services cube was driven primarily by the Data Mining analytics need; for example, we have not used any attribute dimension modeling in the case study. In the generic case, however, it is more likely that the cube caters to both regular OLAP analytics and predictive analytics within the same dimensional model.

preparing mining attributes

The available input data can broadly be of two data types: number or string. However, since measures in Analytic Services are stored in the database in a numerical format, string type input data has to be encoded into number type data before being stored in Analytic Services. For example, if gender information is available as a string stating Male or Female, it needs to be encoded into a numeric value like 1 or 0 before being stored as a measure in the Analytic Services OLAP database. Mining attributes can be of two types: categorical or numerical. Mining attributes that describe discrete information content, like gender (Male or Female), zip code (95054, 94304, 90210, etc.), customer category (Gold, Silver, Blue), or status information (Applied, Approved, Declined, On Hold), are termed categorical attribute types. Mining attributes that describe continuous information content, like sales, revenue, or income, are termed numerical attribute types. The Analytic Services Data Mining Framework can work with algorithms that handle both categorical and numerical attribute types. Among the algorithms shipped in the box with the Analytic Services Data Mining Framework, the Naïve Bayes and Decision Tree algorithms can handle both categorical and numerical mining attribute types and treat them accordingly. One of the key steps in Data Mining is the data auditing or data conditioning phase. This involves putting together, cleansing, categorizing, normalizing, and properly encoding the data. This step is usually performed outside the Data Mining tool. The effectiveness of the Data Mining algorithm is largely dependent on the quality and completeness of the source data.
In some cases, for various mathematical reasons, the available input data may also need to be transformed before it is brought into the Data Mining environment. Transformations may sometimes also include splitting or combining input data columns. Some of these transformations may be done on the input dataset outside the Data Mining Framework, using standard data manipulation techniques available in ETL tools or RDBMS environments. For the current case the input data did not need any mathematical transformation, but some encoding was needed to convert data into a format that can be processed within the Analytic Services OLAP environment. In the current problem at ABC University, the available input data consisted of both string and number data types. The list below gives some of the input data that needed encoding from string type into number type:

- Identity-related data like Gender, City, State, Ethnicity
- Data related to the application process like Application Status, Primary Source of contact, Applicant Type, etc.
- Date-related data like Application Date, Source Date, etc. (dates were available in the original dataset as strings, in two different formats, yymmdd and mm/dd/yy, and had to be encoded into a number)

In the current case study, these encodings were done outside the Analytic Services environment by constructing look-up master tables in which the string type inputs were listed in a tabular format and the records were sequentially numbered. Subsequently, each string type input was referred to by its corresponding numeric identifier during data load into Analytic Services. Table 2 shows samples of such mapping files: one look-up table mapping State ID to State Name (VT, CA, MA, MI, NH, NJ) and another mapping AppliedStatus ID to Application Status (Applied, Offered Admission, Paid Fees, Enrolled).

Table 2: Typical mapping of numeric identifiers
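The look-up encoding described above can be sketched in a few lines of code. The snippet below is an illustration only (plain Python, with invented sample values); the case study performed this step outside Analytic Services, in the relational staging environment. It builds sequentially numbered master tables and normalizes the two date formats mentioned above into a numeric key.

```python
from datetime import datetime

def build_lookup(values):
    """Assign a sequential numeric ID to each distinct string value."""
    lookup = {}
    for value in values:
        if value not in lookup:
            lookup[value] = len(lookup) + 1
    return lookup

def encode_date(text):
    """Normalize the two source formats (yymmdd and mm/dd/yy) to a numeric yyyymmdd key."""
    fmt = "%m/%d/%y" if "/" in text else "%y%m%d"
    return int(datetime.strptime(text, fmt).strftime("%Y%m%d"))

# Invented sample values; real data would come from the admissions system.
state_ids = build_lookup(["VT", "CA", "MA", "MI", "NH", "NJ"])
status_ids = build_lookup(["Applied", "Offered Admission", "Paid Fees", "Enrolled"])

print(state_ids["MA"], status_ids["Enrolled"])          # numeric IDs used at data load
print(encode_date("040915"), encode_date("09/15/04"))   # both encode to 20040915
```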

preparing the cube

After all the input data has been identified and made ready, the next step is to design an outline and load the data into an Analytic Services cube. In the context of the current case, the Analytic Services outline was created as follows: all the input data (measures, in the OLAP context) were organized into five groups (a two-level hierarchy created in the measures dimension) based on a logical grouping of measures. The details of each measure group are explained in Table 3 below. The five measure groups are:

- Measures related to information about the applicant's identity. Some of these measures were transformed from string type to number type to facilitate modeling within the Analytic Services database context.
- Measures related to various test scores and high school examination results.
- Measures related to the context of the applicant's application processing.
- Measures related to the applicant's academic background.
- Measures providing information about the financial support and funding associated with the applicant.

Table 3: Analytic Services outline expanded

Data load is performed just as it is normally done for any Analytic Services cube. At this stage we have:

- Designed an Analytic Services cube
- Loaded it with relevant data

It should be noted that the steps described so far are generic to Analytic Services cube building and did not need any specific support from the Analytic Services Data Mining Framework.
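As a purely illustrative sketch of the two-level measures hierarchy described above, the grouping could be written down as follows. Every group and measure name here is a hypothetical placeholder except FARecieved, AppStatus, Applicant Type, StudBudget and TotalAward, which appear later in Table 4.

```python
# Hypothetical sketch of the five measure groups in the outline. Apart from
# FARecieved, AppStatus, ApplicantType, StudBudget and TotalAward (named in
# Table 4), the member names below are invented placeholders.
measure_groups = {
    "Identity":    ["Gender", "City", "State", "Ethnicity"],
    "TestScores":  ["SATScore", "HighSchoolGPA"],
    "Application": ["AppStatus", "ApplicantType", "PrimarySource", "ApplicationDate"],
    "Academic":    ["AcademicLevel", "PriorCredits"],
    "Financial":   ["FARecieved", "StudBudget", "TotalAward"],
}

# Flattening the hierarchy gives the full list of measures loaded into the cube.
all_measures = [measure for group in measure_groups.values() for measure in group]
print(len(all_measures), "measures in", len(measure_groups), "groups")
```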

identifying the optimal set of mining attributes

It is necessary to reduce the number of attributes/variables presented to an algorithm so that the information content is enhanced and the noise minimized. This is usually performed using supporting mathematical techniques to ensure that the most significant attributes are retained within the dataset presented to the algorithm. It should be noted that the choice of significant attributes is driven more by the particular data than by the problem itself. Attribute analysis, or attribute conditioning, is one of the initial steps in the Data Mining process and is currently performed outside the Data Mining Framework. The main objective of this exercise is to identify a subset of mining attributes that are highly correlated with the predicted attribute, while ensuring that the correlation within the identified subset of attributes is as low as possible. The Analytic Services platform provides a wide variety of tools and techniques that can be used in the attribute selection process. One method to identify an optimal set of attributes is to use special data reduction techniques implemented within Analytic Services through Custom Defined Functions (CDFs). Additionally, users can use other data visualization tools like Hyperion Visual Explorer to judge the effectiveness of specific attributes in contributing to the overall predictive strength of the Data Mining algorithm. Depending on the nature of the problem, users may choose the appropriate tool and technique for deciding on the optimal set of attributes. One of the advantages of working with the Analytic Services Data Mining Framework is the inherent capability in Analytic Services to support customized methods for attribute selection through Custom Defined Functions. This is essential because the process of mining attribute selection can vary significantly across problems, and an extensible toolkit comes in very handy for customizing a method to suit a specific problem. In the current case at ABC University, a CDF was used to identify the correlation effects among the available set of mining attributes. A thorough analysis of various subsets of the available mining attributes was performed to identify a subset that is highly correlated with the predicted mining attribute and at the same time has low correlation scores within itself. Since some Data Mining algorithms (like Naïve Bayes and Neural Net) are quite sensitive to inter-attribute dependencies, an attempt was made to outline the clusters of mutually dependent attributes, with a certain degree of success. From each cluster a single, most convenient, attribute was selected. For this case study an expert made the decision, but this process can be generalized to a large degree. An optimal set of five mining attributes was identified after this exercise. Table 4, shown after the sketch below, lists the identified mining attributes grouped by input attribute type, categorical or numerical.
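The CDF used in the case study is not reproduced here; the sketch below is a simplified stand-in for the same screening idea (plain numpy on invented, already-encoded data): rank candidate attributes by their correlation with the predicted attribute and keep only one representative from each cluster of mutually correlated attributes. The thresholds are arbitrary illustration values.

```python
import numpy as np

def select_attributes(data, target, names, target_corr_min=0.2, mutual_corr_max=0.6):
    """Greedy correlation screen: rank candidate attributes by their correlation
    with the predicted attribute, then keep only one representative from each
    cluster of mutually correlated attributes."""
    selected = []
    target_corr = [abs(np.corrcoef(data[:, j], target)[0, 1]) for j in range(data.shape[1])]
    for j in np.argsort(target_corr)[::-1]:                 # strongest candidates first
        if target_corr[j] < target_corr_min:
            break
        if all(abs(np.corrcoef(data[:, j], data[:, k])[0, 1]) < mutual_corr_max
               for k in selected):                          # low correlation within the subset
            selected.append(j)
    return [names[j] for j in selected]

# Invented, already-encoded applicant data: one row per applicant, one column per candidate.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(500, 6))
enrolled = (candidates[:, 0] + 0.5 * candidates[:, 3] + rng.normal(size=500) > 0).astype(float)
print(select_attributes(candidates, enrolled,
                        ["FARecieved", "AppStatus", "ApplicantType",
                         "StudBudget", "TotalAward", "AppDate"]))
```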
Categorical type: FARecieved, AppStatus, Applicant Type
Numerical type: StudBudget, TotalAward

Table 4: Optimal set of mining attributes identified

At this stage we have:

- Designed an Analytic Services cube
- Loaded it with relevant data
- Identified the optimal subset of measures (mining attributes) modeling the problem

We will now use the Data Mining Framework to define an appropriate model for the business problem, based on the Analytic Services cube and the identified subset of mining attributes (measures). Setting up the model includes selecting the algorithm, defining algorithm parameters, and identifying the input and output data locations for the algorithm.

choosing the algorithm

The next step in the Data Mining process is to pick the appropriate algorithm. Six basic algorithms are provided in the Data Mining Framework: Naïve Bayes, Regression, Decision Tree, Neural Network, Clustering, and Association Rules. The Analytic Services Data Mining Framework also allows for the inclusion of new algorithms through a well-defined process described in the vendor guide that is part of the Data Mining SDK. The six basic algorithms are a sample set shipped with the product to provide a starting point for using the Data Mining Framework. Choosing an algorithm for a specific problem needs basic knowledge of the problem domain and of the applicability of specific mathematical techniques to efficiently solve problems in that domain. The specific problem discussed in this paper falls into a class of problems termed classification problems. The need here is to classify each applicant into a discrete set of classes on the basis of certain numerical and categorical information available about the applicant. The class referred to in this context is the status of the applicant's application looked at from an enrollment perspective: will enroll or will not enroll. Historical data is available indicating which kinds of applicants (with specific combinations of categorical and numerical factors associated with them) accepted offers from ABC University and subsequently enrolled in its programs. There is data available for the negative case as well, i.e., applicants who did not eventually enroll in the program.

Given that this problem can be looked at as a classification problem and that historical information is available, one of the algorithms suitable for the analysis is the Naïve Bayes classification algorithm. We chose Naïve Bayes for modeling this particular business problem.

deciding on the algorithm parameters

Every algorithm has a set of parameters that control its behavior. Algorithm users need to choose the parameters based on their knowledge of the problem domain and the characteristics of the input data. Analytic Services provides adequate support for such preliminary analysis of data using Hyperion Visual Explorer or the Analytic Services Spreadsheet Client; users are free to analyze the data with any convenient tool to determine their choices for the various algorithm parameters. Each algorithm has a set of parameters that determine how it will process the input data. For the current case, the algorithm chosen is Naïve Bayes and it has four parameters that need to be specified: Categorical, Numerical, RangeCount, and Threshold. The details of each parameter and the implications of setting them are described in the online help documentation. Among the selected attributes we have a few that are of categorical type, and hence our choice for the Categorical parameter is yes. Similarly, there are attributes of numerical type, and hence the choice for the Numerical parameter is also yes. The data was analyzed using a histogram plot to understand its distribution before deciding on the value for the RangeCount parameter. This parameter needs to be large enough to allow the algorithm to use all the variety available in the data, and at the same time small enough to prevent overfitting. From the analysis of the input data for this particular case, setting this parameter to 12 seemed reasonable. The RangeCount controls the binning process in the algorithm (see footnote 1). It should be emphasized that binning schemes (including the bin count) really depend on the specific circumstances and may vary to a great degree between problems.

At this stage we have:

- Designed an Analytic Services cube
- Loaded it with relevant data
- Identified the optimal subset of measures (mining attributes)
- Chosen the algorithm suitable for the problem
- Identified the parameter values for the chosen algorithm
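To make the binning discussion concrete, the sketch below (plain Python on invented values, not the framework's implementation) splits a numerical attribute into 12 equal-width ranges and shows the kind of count-based Naïve Bayes computation those binned values feed into. The small smoothing constant only prevents zero probabilities; no claim is made that it corresponds to the framework's Threshold parameter.

```python
import math
from collections import Counter, defaultdict

def make_bins(values, range_count=12):
    """Equal-width binning: split the observed range into range_count bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / range_count or 1.0
    return lambda v: min(int((v - lo) / width), range_count - 1)

def train_naive_bayes(rows, labels, smoothing=1e-4):
    """Count-based Naive Bayes over already-discretized (binned/categorical) attributes."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)          # (attribute, class) -> value counts
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1

    def predict(row):
        scores = {}
        for label, n in class_counts.items():
            score = math.log(n / len(labels))    # class prior
            for attr, value in row.items():
                count = value_counts[(attr, label)][value]
                score += math.log((count + smoothing) / (n + smoothing))
            scores[label] = score
        return max(scores, key=scores.get)

    return predict

# Invented example: bin a numerical attribute, then classify a new applicant.
budgets = [8000, 9500, 12000, 15000, 21000, 30000]
to_bin = make_bins(budgets, range_count=12)
rows = [{"StudBudgetBin": to_bin(b), "AppStatus": s}
        for b, s in zip(budgets, [1, 1, 2, 3, 3, 4])]
labels = ["no", "no", "no", "yes", "yes", "yes"]
predict = train_naive_bayes(rows, labels)
print(predict({"StudBudgetBin": to_bin(22000), "AppStatus": 3}))   # prints "yes"
```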
applying the data mining framework

Now that we have completed all the preparatory steps for Data Mining, the next step is to use the Data Mining Wizard in the Administration Services Console to build a Data Mining model for the business problem. There are three steps involved in effectively using the Data Mining functionality to provide predictive solutions to business problems:

1. Building the Data Mining model
2. Testing the Data Mining model
3. Applying the Data Mining model

Each of these steps, performed using the Data Mining Wizard in the Administration Services Console, uses MDX expressions to define the context within the cube in which to perform the data mining operation. Various accessors, specified as MDX expressions, identify data locations within the cube. The framework uses the data in those locations as input to the algorithm, or writes output to the specified location. Accessors need to be defined for each algorithm to give it the specific context for each of the following:

- the attribute domain: the expression identifying the factors of our analysis that will be used for prediction (in the current context, the mining attributes that we identified)
- the sequence domain: the expression identifying the cases/records that need to be analyzed (in the current context, the list of applicants)
- the external domain: the expression identifying whether multiple models need to be built (not relevant in the current context)
- the anchor: the expression specifying additional restrictions from dimensions that are not really participating in this data mining operation (in the current context all the dimensions of the cube are relevant to the problem, so the anchor only restricts the algorithm scope to the right measure in the Measures dimension)

Additional details for each of these expressions can be obtained from the online help documentation.
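Purely as a conceptual illustration of how these domains partition the work (this is not the framework's API; the function, its arguments and the sample keys are invented), the build step can be thought of as iterating the sequence domain and reading the attribute-domain members for each case, with the anchor pinning the remaining dimensions:

```python
# Conceptual sketch only, not the Data Mining Framework's API.
# 'cube' stands in for a lookup keyed by a tuple of member names.
def gather_training_rows(cube, sequence_members, attribute_members, anchor=()):
    """For every case in the sequence domain, read the values of the
    attribute-domain members, holding the anchor members fixed."""
    rows = []
    for case in sequence_members:                       # e.g. one member per applicant
        row = {attr: cube.get((case, attr) + tuple(anchor))
               for attr in attribute_members}           # the predictors for this case
        rows.append(row)
    return rows

# Tiny invented example.
cube = {("Applicant1", "AppStatus"): 3, ("Applicant1", "StudBudget"): 12000,
        ("Applicant2", "AppStatus"): 1, ("Applicant2", "StudBudget"): 9500}
print(gather_training_rows(cube, ["Applicant1", "Applicant2"],
                           ["AppStatus", "StudBudget"]))
```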

building the data mining model

To access the Data Mining Framework, bring up the Data Mining Wizard in the Administration Services Console and choose the appropriate application and database, as shown in Figure 1.

Figure 1: Choosing the application and database

In the next screen (Figure 2), you choose the appropriate task option, depending on whether you are building a new model or revising an existing one.

Figure 2: Creating a Build Task

This brings up the wizard screen for setting the algorithm parameters and the accessor information associated with the chosen algorithm, in this case Naïve Bayes. The user selects a node in the left pane to see and provide values for the appropriate options and fields displayed in the right pane. As shown in Figure 3, select Choose mining task settings to set how missing data in the cube should be handled. The choice in this case is As NaN, i.e., missing data is replaced with NaN (Not-a-Number).

Figure 3: Settings to handle missing data

The Naïve Bayes algorithm requires that we declare upfront whether we plan to use Categorical predictors, Numerical predictors, or both. In the context of the current case we have both categorical and numerical attribute types, and hence the choice is True for both of these parameters. RangeCount was set to 12. Threshold was fixed at 1e-4, a very small value. Figure 4 shows the completed screen for the parameter settings.

Figure 4: Setting parameters

The Naïve Bayes algorithm has two predictor accessors, Numerical Predictor and Categorical Predictor, and one target accessor. Figure 5 shows the various domains that need to be defined for the accessors, and Table 5 shows the values that were used for the case being discussed. All the information provided during this stage of model building is preserved in a template file to facilitate reuse if necessary.

Figure 5: Accessors associated with the Naïve Bayes algorithm

Table 5: Setting up accessors for the build mode while using the Naïve Bayes algorithm

Once the accessors are defined, the Data Mining Wizard prompts the user to provide names for the template and model that will be generated at this stage. Figure 6 shows the screen in which the model and template names are defined.

Figure 6: Generating the template and model

At this stage we have:

- Built a Data Mining model using the Naïve Bayes algorithm

testing the data mining model

The next step is to test the newly built model to verify that it satisfies the level of statistical significance needed for the model to be put to use. Ideally, a part of the input data (with valid known outcomes, i.e., historical data) is set aside as a test dataset to verify the goodness of the Data Mining model developed using the algorithm. Testing the model on this test dataset and comparing the outcomes predicted by the model against the known outcomes (historical data) is one of the processes supported by the Data Mining Wizard. A test mode template can be created by a process similar to creating a build mode template, as described in the previous section. While building the test mode template the user needs to provide a Confidence parameter to tell the Data Mining Framework the minimum confidence level necessary to declare the model valid. We specified a value of 0.95 for the Confidence parameter. The exact steps in the wizard and descriptions of the various parameters can be obtained from the online help documentation.
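Conceptually, the test compares the model's predictions on held-back historical records against the known outcomes. The sketch below (plain Python, invented data and a stand-in predictor) illustrates that idea outside the framework; treating the Confidence value as a minimum agreement rate is an assumption made only for this illustration, not a statement about how the framework computes its test.

```python
def holdout_accuracy(predict, test_rows, known_outcomes):
    """Fraction of held-back historical records that the model classifies correctly."""
    hits = sum(predict(row) == actual for row, actual in zip(test_rows, known_outcomes))
    return hits / len(test_rows)

# Invented stand-in predictor and a tiny holdout set with known outcomes.
predict = lambda row: "enroll" if row["TotalAward"] > 10000 else "not enroll"
test_rows = [{"TotalAward": 15000}, {"TotalAward": 4000}, {"TotalAward": 12000}]
known = ["enroll", "not enroll", "not enroll"]

required_confidence = 0.95            # the Confidence value used in the case study
accuracy = holdout_accuracy(predict, test_rows, known)
# Assumption for illustration only: treat the required confidence as a minimum agreement rate.
print(accuracy, "valid" if accuracy >= required_confidence else "needs rework")
```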

Once the process is completed, the results of the test (whose name was specified in the last step of the Data Mining Wizard) appear under the Model Results node. Figure 7 shows the Administration Services Console Enterprise View pane in which the Mining Results node is visible. The model can be queried within the Administration Services Console interface to obtain a list of the model accessors by using the Query Result functionality. Invoking Show Result for the Test accessor indicates the result of the test. Figure 8 shows the list of model accessors in the result set of a model based on the Naïve Bayes algorithm used in test mode. If the Test accessor has a value of 1.0, the test is deemed successful and the model is declared good, or valid for prediction. Figure 9 shows the result of the test for the case being discussed in this paper.

At this stage we have:

- Built a Data Mining model using the Naïve Bayes algorithm
- Verified the model as valid with 95% confidence

Figure 7: Model Results node in the Administration Services Console interface

Figure 8: Model accessors for the result set associated with a model based on the Naïve Bayes algorithm

Figure 9: Test results

applying the data mining model

The intent at this stage is to use the newly constructed Data Mining model to predict whether new applicants are likely to enroll in the program. Using the Data Mining model in the apply mode is similar to the earlier two steps. The Data Mining Wizard guides the user to provide the parameters appropriate to the apply mode. The Target domain is usually different in the apply mode, since data is written back to the cube. The details of the various accessors and the associated domains can be obtained from the online help documentation. Table 6 shows the values that were provided to the Data Mining Wizard to use the model in the apply mode.

Table 6: Setting up accessors for the apply mode while using the Naïve Bayes algorithm

Just as in the build mode, the names of the results model and template are specified in the wizard and the template is saved before the model is executed. The results of the prediction are written into the location specified by the Target accessor: the mining attribute referred to by the MDX expression {[ActualStatus]}. The results can be visualized either by querying the model results in the Administration Services Console using the Query Result functionality, as described in the previous section, or by accessing the cube and reviewing the data written back to it. One option for viewing the results is to use the Analytic Services Spreadsheet Client to connect to the database and view the cube data for the ActualStatus measure.

interpreting the results

The results of the Data Mining model need to be interpreted in the context of the business problem it is attempting to solve. Any transformation done to the input measures needs to be appropriately adjusted for when interpreting the results. In the context of the case being discussed in this paper, the intent was to predict whether applicants were likely to enroll at ABC University. The possible outcomes are that the applicant will enroll or the applicant will not enroll. The model was verified against the entire set of available data (over 11,000 records).

the confusion matrix

You can construct a confusion matrix by listing the false positives and false negatives in a tabular format. A false positive happens when the model predicts that an applicant will enroll and in reality the applicant does not enroll. A false negative happens when the model predicts that an applicant will not enroll and in reality the applicant does enroll. The results predicted by the model can be compared with the actual outcomes available in the historical data to build the confusion matrix. In general, for such classification problems, one of these (false positives or false negatives) is likely to be somewhat more important than the other in a business context. In the case being discussed in this paper, a false negative means lost revenue, whereas a false positive means additional promotional expenditure in following up on an applicant who will eventually not enroll. The importance of each should be analyzed in the context of the business, and the model rebuilt if necessary with a different training set (historical data) or with a different set of attributes. Figure 10 shows the confusion matrix constructed using the data set analyzed as part of this case study. It is evident from the confusion matrix that the model predicted that 1550 (1478 + 72) students will enroll. Of those, 1478 actually enrolled and 72 did not, which means there were 72 false positives. Similarly, the model predicted that 9805 (9356 + 449) students will not enroll. Of those, 9356 actually did not enroll, whereas 449 did, which means there were 449 false negatives.

Figure 10: Confusion matrix to analyze the model's effectiveness in prediction

analyzing the results

On further analysis of the results the following observations can be made:

Incorrect Predictions    # of Cases    Percentage of Cases
False positives          72            0.634%
False negatives          449           3.954%
Total                    521           4.59%

Success rate of the model: 95.41% (only 521 incorrect predictions in 11,355 cases)
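The arithmetic above is easy to reproduce. The short sketch below (plain Python) recomputes the false-positive, false-negative and success-rate figures directly from the four cell counts reported in Figure 10.

```python
# Cell counts from Figure 10: rows are the model's predictions, columns are actual outcomes.
predicted_enroll = {"enrolled": 1478, "not_enrolled": 72}       # 72 false positives
predicted_not_enroll = {"enrolled": 449, "not_enrolled": 9356}  # 449 false negatives

false_positives = predicted_enroll["not_enrolled"]
false_negatives = predicted_not_enroll["enrolled"]
total_cases = sum(predicted_enroll.values()) + sum(predicted_not_enroll.values())
incorrect = false_positives + false_negatives

print(f"false positives: {false_positives} ({false_positives / total_cases:.3%})")
print(f"false negatives: {false_negatives} ({false_negatives / total_cases:.3%})")
print(f"success rate: {(total_cases - incorrect) / total_cases:.2%} "
      f"({incorrect} incorrect predictions in {total_cases} cases)")
```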

additional functionality

The Analytic Services Data Mining Framework offers more functionality that can be used when deploying models in real business scenarios. Some of the further steps that can be considered include the following.

transformations

The Data Mining Framework offers the ability to apply a transform to the input data just before it is presented to the algorithm. Similarly, the output data can be transformed before being written into the Analytic Services cube. The Data Mining Framework offers a basic list of transformations (exp, log, pow, scale, shift, linear) that can be used through the Data Mining Wizard. The details of each of these transformations, what they do, and how to use them can be obtained from the Analytic Services online help documentation. The list of transformations is further extensible through the import of custom Java routines written specifically for the purpose. The details of how to write Java routines to be imported as additional transforms can be obtained from the vendor guide shipped as part of the Data Mining SDK.

mapping

When a model has been developed for one context and needs to be used in another, the Mapping functionality is useful. Through this functionality the user can tell the Data Mining Framework how to interpret the existing model accessors in the new context in which the model is being deployed. More information on using this functionality can be obtained from the online help documentation.

import/export of pmml models

The Data Mining Framework allows for portability through import and export of mining models using the PMML format.

setting up models for scoring

Data Mining models built using the Analytic Services Data Mining Framework can also be set up for scoring. In scoring mode the user interacts with the model in real time and the results are not written to the database. The input data can be sourced either from the cube or through data templates that the user fills in during execution. The scoring mode of deployment can be combined with custom applications built using the developer tools provided by Hyperion Application Builder to create applications that cater to a specific business process while leveraging the predictive analytic capability of the Analytic Services Data Mining Framework.
The online help documentation provides additional details on how to score a Data Mining model.

using the data mining framework in batch mode

There is also a batch mode interface to the functionality provided in the Data Mining Framework. Scripts written using the MaxL command interface can perform almost all of the functions exposed through the Data Mining Wizard. Details of the MaxL commands and their usage can be obtained from the online help documentation.

building custom applications

Custom applications can be developed using Analytic Services as the backend database and the developer tools provided with Hyperion Application Builder. The functionality provided by the Data Mining Framework can be invoked through APIs.

summary

Data Mining is one of the functional groups among the comprehensive, enterprise-class analytic functionality offered within Analytic Services. This case study focused on using the Naïve Bayes algorithm to solve a classification problem modeled on a real-life data set. It was possible to achieve a 95.41% success rate in the classification exercise using the Analytic Services Data Mining Framework. Some of the business benefits of Data Mining in the OLAP context that can be illustrated from the current case include:

- It can serve as a discovery tool in a critical decision-support process. This includes evaluating the critical parameters affecting customer (applicant) behavior. ABC University had initially assumed that certain time-related factors played the strongest role in influencing the decision to enroll. The Data Mining exercise showed this was not the case; in fact, certain financial attributes emerged as the strongest factors.

- The successful prediction mechanism can become the base for a full-blown risk-management application. In the case of ABC University, for example, the admissions office can devise a policy to invest more promotional expenditure in tracking applicants with distinctly higher academic credentials but a moderate probability of enrollment. Similarly, the prediction mechanism can help the admissions department make decisions on admission offers even before it has seen the entire applicant pool.

- It can serve as an operational control and reporting tool. Traditional OLAP reporting can provide visibility into the state of the admissions operations, the extent of funds utilization, and various other financial and operational indicators, in all providing better control over the conformance between planned and actual business positions.

suggested reading

1. Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber
2. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Michael J. A. Berry and Gordon S. Linoff
3. Data Mining Explained, Rhonda Delmater and Monte Hancock, Jr.
4. Data Mining: A Hands-On Approach for Business Professionals (Data Warehousing Institute Series), Robert Groth

footnote

1. Binning: breaking up a continuous range of data into discrete segments (bins).

Copyright 2005 Hyperion Solutions Corporation. All rights reserved. Hyperion, the Hyperion H logo, and Hyperion's product names are trademarks of Hyperion. References to other companies and their products use trademarks owned by the respective companies and are for reference purposes only.


More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Grow Revenues and Reduce Risk with Powerful Analytics Software

Grow Revenues and Reduce Risk with Powerful Analytics Software Grow Revenues and Reduce Risk with Powerful Analytics Software Overview Gaining knowledge through data selection, data exploration, model creation and predictive action is the key to increasing revenues,

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

Data Mining for Everyone

Data Mining for Everyone Page 1 Data Mining for Everyone Christoph Sieb Senior Software Engineer, Data Mining Development Dr. Andreas Zekl Manager, Data Mining Development Page 2 Executive Summary Contents 2 Data mining in the

More information

Using Data Mining to Detect Insurance Fraud

Using Data Mining to Detect Insurance Fraud IBM SPSS Modeler Using Data Mining to Detect Insurance Fraud Improve accuracy and minimize loss Highlights: Combine powerful analytical techniques with existing fraud detection and prevention efforts Build

More information

GETTING AHEAD OF THE COMPETITION WITH DATA MINING

GETTING AHEAD OF THE COMPETITION WITH DATA MINING WHITE PAPER GETTING AHEAD OF THE COMPETITION WITH DATA MINING Ultimately, data mining boils down to continually finding new ways to be more profitable which in today s competitive world means making better

More information

College Readiness LINKING STUDY

College Readiness LINKING STUDY College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

More information

Salesforce Certified Data Architecture and Management Designer. Study Guide. Summer 16 TRAINING & CERTIFICATION

Salesforce Certified Data Architecture and Management Designer. Study Guide. Summer 16 TRAINING & CERTIFICATION Salesforce Certified Data Architecture and Management Designer Study Guide Summer 16 Contents SECTION 1. PURPOSE OF THIS STUDY GUIDE... 2 SECTION 2. ABOUT THE SALESFORCE CERTIFIED DATA ARCHITECTURE AND

More information

Technology WHITE PAPER

Technology WHITE PAPER Technology WHITE PAPER What We Do Neota Logic builds software with which the knowledge of experts can be delivered in an operationally useful form as applications embedded in business systems or consulted

More information

Fluency With Information Technology CSE100/IMT100

Fluency With Information Technology CSE100/IMT100 Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999

More information

BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE. Prepared by:

BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE. Prepared by: BIG DATA COURSE 1 DATA QUALITY STRATEGIES - CUSTOMIZED TRAINING OUTLINE Cerulium Corporation has provided quality education and consulting expertise for over six years. We offer customized solutions to

More information

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing

More information

CROSS INDUSTRY PegaRULES Process Commander. Bringing Insight and Streamlining Change with the PegaRULES Process Simulator

CROSS INDUSTRY PegaRULES Process Commander. Bringing Insight and Streamlining Change with the PegaRULES Process Simulator CROSS INDUSTRY PegaRULES Process Commander Bringing Insight and Streamlining Change with the PegaRULES Process Simulator Executive Summary All enterprises aim to increase revenues and drive down costs.

More information

Business Intelligence & Product Analytics

Business Intelligence & Product Analytics 2010 International Conference Business Intelligence & Product Analytics Rob McAveney www. 300 Brickstone Square Suite 904 Andover, MA 01810 [978] 691 8900 www. Copyright 2010 Aras All Rights Reserved.

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

Java Metadata Interface and Data Warehousing

Java Metadata Interface and Data Warehousing Java Metadata Interface and Data Warehousing A JMI white paper by John D. Poole November 2002 Abstract. This paper describes a model-driven approach to data warehouse administration by presenting a detailed

More information

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved.

Data Mining with SAS. Mathias Lanner mathias.lanner@swe.sas.com. Copyright 2010 SAS Institute Inc. All rights reserved. Data Mining with SAS Mathias Lanner mathias.lanner@swe.sas.com Copyright 2010 SAS Institute Inc. All rights reserved. Agenda Data mining Introduction Data mining applications Data mining techniques SEMMA

More information

DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

More information

Integrating SAP and non-sap data for comprehensive Business Intelligence

Integrating SAP and non-sap data for comprehensive Business Intelligence WHITE PAPER Integrating SAP and non-sap data for comprehensive Business Intelligence www.barc.de/en Business Application Research Center 2 Integrating SAP and non-sap data Authors Timm Grosser Senior Analyst

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Oracle Data Miner (Extension of SQL Developer 4.0)

Oracle Data Miner (Extension of SQL Developer 4.0) An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

Chapter 5. Warehousing, Data Acquisition, Data. Visualization

Chapter 5. Warehousing, Data Acquisition, Data. Visualization Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives

More information

Visual Data Mining in Indian Election System

Visual Data Mining in Indian Election System Visual Data Mining in Indian Election System Prof. T. M. Kodinariya Asst. Professor, Department of Computer Engineering, Atmiya Institute of Technology & Science, Rajkot Gujarat, India trupti.kodinariya@gmail.com

More information