Working with Multidimensional Cubes in SQL Server Data Mining

MIS 5346 Foundations of Data Warehousing
G. Green
Student Notes 4/15/ :31 AM Page 1 of 38

Building a Multidimensional Cube
Last Updated
Data Mining
Example: Increasing Student Support Decision Tree, First Attempt
    Problem Definition
    Data Preparation
    Model Development/Training
Example: Increase Student Support Decision Tree
    Problem Definition
    Data Preparation
    Model Development/Training
    Model Validation/Evaluation
        Prepare Test Data
        Lift Chart with No Predict Value to Validate Model
        Lift Chart with Predict Value to Validate Model
        Classification Matrix
    Model Deployment/Use
        Singleton Query
        Prediction Join Query
Example: Increase Student Support Clustering
    Problem Definition
    Data Preparation
    Model Development/Training
    Model Validation/Evaluation
    Model Deployment/Use
Data Mining with Excel
Example: Course Mixture -- Association
    Data Preparation
    Configure a connection to SSAS
    Import data
    Explore Data
    Clean/Transform Data
    Model Development/Training
    Model Validation/Evaluation
    Model Deployment/Use

Data Mining
Data mining is a set of techniques for exploring large amounts of data to find patterns and make predictions. We use the results to help improve the organization's performance.

Data warehouses can assist in decision-making to improve organizational performance. For example, an OLTP system or data warehouse can easily tell us "How many units of productx were sold last month?" An OLAP cube can easily tell us "What is the difference in sales of productx over the last 5 years, by region?" However, questions such as "Which consumers should be targeted for future sales of productx?" are more easily addressed by data mining. Other examples of questions that data mining can help answer include: Why did a certain political candidate win or lose an election? Will this particular customer buy a product or not? Which of my loyalty card members are most likely to buy productx? What will the utilization of my hospital beds be over the next month? Based on my sales, how do I need to staff?

Data mining and data warehouses complement each other well. Data warehouses provide historical data that has been integrated and cleansed; data mining helps identify which data is most meaningful for decision-making and may therefore warrant further attention in the warehouse.

6 Broad Categories of Data Mining Tasks and Related Algorithms
1. Classification: prediction of a discrete attribute (an attribute with distinct values such as yes/no, good/bad, high/medium/low, or likely/unlikely) based on the other column/attribute values in the case.
   Algorithms: Decision Tree, Naïve Bayes, Neural Net
2. Estimation/Regression/Forecasting: prediction of a continuous attribute (such as a sales amount or a probability) based on the other column/attribute values in the case.
   Algorithms: Linear Regression, Logistic Regression, Time Series
3. Association: AKA market basket analysis; finds which items tend to occur together in the same case (e.g., products in the same shopping basket); requires past data that is already grouped into cases.
   Algorithm: Association
4. Segmentation/Clustering: AKA market segmentation; groups data into categories based on shared/similar attribute values.
   Algorithm: Clustering
5. Sequence Analysis: examines the ordering of events over time to predict future sequences, e.g., What products or services will a customer need next? How should our TV programs be sequenced?
   Algorithm: Sequence Clustering
6. Deviation Analysis: AKA fraud detection; finds exceptions or outliers in the data.
   Algorithm: Clustering

We will focus on three algorithms:
1. Decision Tree
   a. A tree-like model of decisions and their consequences/outcomes.
   b. Good for predicting.
   c. Non-parametric, so no specific data distribution is needed/assumed.
   d. Supervised, so you specify:
      i. A key column unique to each case/row
      ii. One or more input variables
      iii. The variable you're trying to predict (the target)
   e. Can handle unbalanced datasets (i.e., datasets with a large number of positive or negative targets and a small number of the opposite target type).
   f. Very simple to understand.
2. Clustering
   a. Groups similar objects together.
   b. Good for exploring data and gaining insight into cluster characteristics.
   c. Unsupervised, so no target variable is needed. But you do need:
      i. A key column that is unique to each record
      ii. One or more input columns that are used to form the clusters
3. Association
   a. Discovers relationships/regularities in data.
   b. Generates association rules (i.e., "if (antecedent), then (consequent)" statements).
   c. Rule strength is indicated by Support (how frequently the items in the rule appear together across all cases) and Confidence (of the cases that contain the antecedent, the proportion that also contain the consequent).
   d. The best rules can be used for marketing campaigns.
   e. The source data needs:
      i. Granular data
      ii. A key column that uniquely identifies each transaction/itemset/case (e.g., each basket); it cannot be a concatenated key
      iii. A predictable/target column, typically the key of the items grouped in an itemset, i.e., the key of a nested table (e.g., the item number of each item in a basket)
      iv. One or more input columns that have discrete values

NOTE: a good website for questions to ask before starting an analytics project: questions-first

Steps for Data Mining
1. Problem Definition
2. Data Preparation
3. Model Development/Training
4. Model Validation/Evaluation
5. Model Deployment/Use
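To make the Support and Confidence definitions concrete, here is a minimal Python sketch that computes both for one rule ("if bread, then butter") over a few made-up baskets. The baskets and the helper names `support` and `confidence` are hypothetical illustrations, not part of SSAS, which computes these measures internally.

```python
from typing import FrozenSet, List

# Hypothetical baskets: each frozenset is one case (transaction) and its items.
baskets: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: FrozenSet[str], cases: List[FrozenSet[str]]) -> float:
    """Fraction of all cases that contain every item in `itemset`."""
    hits = sum(1 for case in cases if itemset <= case)
    return hits / len(cases)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               cases: List[FrozenSet[str]]) -> float:
    """Of the cases containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, cases) / support(antecedent, cases)


antecedent, consequent = frozenset({"bread"}), frozenset({"butter"})
print(support(antecedent | consequent, baskets))    # 0.5  (2 of 4 baskets)
print(confidence(antecedent, consequent, baskets))  # 0.666...  (2 of the 3 bread baskets)
```

A rule can have low support but high confidence (rare but reliable) or the reverse, which is why both numbers are reported.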

A good, short tutorial for Microsoft SSAS Data Mining (includes link to sample file):

Example: Increasing Student Support Decision Tree, First Attempt

Problem Definition
Here we identify our business goal, followed by careful consideration of opportunities for data mining to help achieve the goal. As Kimball puts it:

"the overall business value goal [should be described] in as narrow and measurable way as possible ... A goal like 'reduce the monthly churn rate' is a bit more manageable. Next, think about what factors influence the goal. What might indicate that someone is likely to churn? How can we tell if someone would be interested in a given product? ... try to translate them into specific attributes and behaviors that are known to exist in a usable, accessible form ... the data miner should work with the business folks to prioritize the various opportunities based on the estimated potential for business impact and the difficulty of implementation." (The Microsoft Data Warehouse Toolkit, by Mundy, Thornthwaite, and Kimball, p. 442)

We have the following business goal:
Increase academic success by students
  o Identify potentially at-risk students so we can take proactive measures to increase their likelihood of academic success.
  o We will use classification with a decision tree algorithm.

Data Preparation
The case is the basic unit of analysis in data mining. The objective in data preparation is to build data mining case sets that can be used effectively by the data mining algorithms. A case set is a dataset that includes one row per instance, event, or customer; all the information about the instance/event/customer is included in one record. A common example of a case set is a set of customer records containing demographic data. However, for many events such as purchases, the case set may include one row for each product purchased by a customer. This is called a nested case set because it has two components for each case: one row representing a customer with customer attributes, and multiple product rows that represent the products purchased by that customer.

Data preparation typically involves creating at least two case sets: one to be used for training our data mining model, and another for subsequently testing it. We could split our source dataset into two datasets for this purpose. However, we will use a single dataset; then, when training our model, we will tell SSAS to set aside a percentage of cases for subsequent testing.

Another issue in data preparation is identifying the attributes needed in the case set. Not all attributes in our fact or dimension tables will be useful for data mining. There are many published approaches to eliminating attributes from case sets, many of them involving statistical measures such as the degree of multicollinearity or high p-values (i.e., low statistical significance). Other potential eliminators include a high percentage of missing values, single-valued fields, fields with personally identifying information, etc. However, from a practical standpoint, don't forget the importance of judging attributes by their importance to the business!
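Two of the mechanical eliminators mentioned above (a high share of missing values and single-valued fields) are easy to check before building a model. The following is a minimal Python sketch under stated assumptions: rows are plain dictionaries, `None` means missing, and the 50% missing threshold and the function name `screen_columns` are arbitrary, hypothetical choices. Business judgment still decides what is actually dropped.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def screen_columns(rows: List[Row], max_missing: float = 0.5) -> Dict[str, str]:
    """Flag candidate columns to drop: mostly missing or single-valued.

    Returns {column: reason}; columns that pass both checks are omitted.
    """
    flagged: Dict[str, str] = {}
    if not rows:
        return flagged
    for col in rows[0]:
        values = [row.get(col) for row in rows]
        missing_share = sum(1 for v in values if v is None) / len(values)
        distinct = {v for v in values if v is not None}
        if missing_share > max_missing:
            flagged[col] = f"{missing_share:.0%} missing"
        elif len(distinct) <= 1:
            flagged[col] = "single-valued"
    return flagged


# Hypothetical rows; None stands for a missing value.
rows = [
    {"gpa": 3.1, "country": "USA", "nickname": None},
    {"gpa": 2.7, "country": "USA", "nickname": "JJ"},
    {"gpa": 3.8, "country": "USA", "nickname": None},
    {"gpa": 3.4, "country": "USA", "nickname": None},
]
print(screen_columns(rows))  # {'country': 'single-valued', 'nickname': '75% missing'}
```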
When identifying attributes, we may also need one or more attributes in our case set that do not currently exist in the fact or dimension tables (e.g., the dependent variable being predicted). In these cases we will need to add these attributes to our case set. Rather than use the base data mart dimensions and facts as our case set, we will instead flatten the data from these sources for use in data mining. Flattening involves going into the source database or data warehouse and combining data from different tables into a single table (or view) using inner and/or outer joins. This flattened dataset is then used as the case set for data mining.

To analyze student performance in courses, we create a flattened dataset by creating the view below:

CREATE VIEW view_student_performance AS
SELECT s.[student_sk],
       [city],
       [state_abbreviation],
       [major],
       [classification],
       [gmat],
       [gpa] AS HighSchoolGPA,
       AVG([coursegrade]) AS AverageCourseGrade,
       CASE
           WHEN AVG(coursegrade) >= 3.3 THEN 'high'
           WHEN AVG(coursegrade) >= 2.8 THEN 'medium'
           ELSE 'low'
       END AS GradeCategory
FROM dimstudent s
     INNER JOIN factenrollment e ON s.student_sk = e.student_sk
     INNER JOIN dimlocation l ON l.location_sk = e.location_sk
GROUP BY s.[student_sk], [city], [state_abbreviation], [major], [classification], [gmat], [gpa];

Run the above statement in SSMS and view the data. Note that there is one row per student.
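The flattening can be tried end to end without SQL Server by using Python's built-in sqlite3 module. This is a minimal sketch: the table layouts are assumptions trimmed to the columns the view uses, and the rows (including the city and state) are made up. SQLite's dialect differs from T-SQL in places, but the flattening logic (join, GROUP BY, and a CASE over the aggregate) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimstudent (student_sk INTEGER PRIMARY KEY, major TEXT,
                         classification TEXT, gmat INTEGER, gpa REAL);
CREATE TABLE dimlocation (location_sk INTEGER PRIMARY KEY, city TEXT,
                          state_abbreviation TEXT);
CREATE TABLE factenrollment (student_sk INTEGER, location_sk INTEGER,
                             coursegrade REAL);

-- Hypothetical sample rows: three students, two enrollments each.
INSERT INTO dimstudent VALUES (1, 'MIS', 'Senior', 600, 3.5),
                              (2, 'ACCT', 'Junior', 550, 3.0),
                              (3, 'FIN', 'Senior', 580, 3.2);
INSERT INTO dimlocation VALUES (10, 'Philadelphia', 'PA');
INSERT INTO factenrollment VALUES (1, 10, 4.0), (1, 10, 3.0),
                                  (2, 10, 2.0), (2, 10, 2.5),
                                  (3, 10, 3.0), (3, 10, 3.0);

-- Flatten: one row per student, with a derived GradeCategory.
CREATE VIEW view_student_performance AS
SELECT s.student_sk, l.city, l.state_abbreviation, s.major, s.classification,
       s.gmat, s.gpa AS HighSchoolGPA,
       AVG(e.coursegrade) AS AverageCourseGrade,
       CASE WHEN AVG(e.coursegrade) >= 3.3 THEN 'high'
            WHEN AVG(e.coursegrade) >= 2.8 THEN 'medium'
            ELSE 'low'
       END AS GradeCategory
FROM dimstudent s
JOIN factenrollment e ON s.student_sk = e.student_sk
JOIN dimlocation l ON l.location_sk = e.location_sk
GROUP BY s.student_sk, l.city, l.state_abbreviation, s.major,
         s.classification, s.gmat, s.gpa;
""")

rows = conn.execute(
    "SELECT student_sk, AverageCourseGrade, GradeCategory "
    "FROM view_student_performance ORDER BY student_sk"
).fetchall()
for row in rows:
    print(row)
```

With these sample rows, student 1 averages 3.5 ('high'), student 2 averages 2.25 ('low'), and student 3 averages 3.0 ('medium'), one row per student.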

Model Development/Training
Model development involves creating the data mining structures that will be used to shed light on the business questions identified previously. The structures that need to be created include:
- An Analysis Services project
- Data Source
- Data Source View
- Mining Structure
- Mining Model(s)

1. Create an Analysis Services project in SSDT
   a. We will begin by using our existing ClassPerformanceAS project
2. Create a data source
   a. We will use the existing data source (ClassPerformanceDWDS) that points to our ClassPerformanceDW data mart in SQL Server
3. Create a data source view (DSV)
   a. We will modify the existing DSV
      i. Right-click in the DSV display area and add/remove tables
      ii. Select the new view(s) we created during data preparation
      iii. Save the DSV

Next we create mining models to help achieve our business goal, and supply data to the model algorithms to train them on a particular aspect of the organization. We start with a decision-tree-based model for predicting performance.

4. In Solution Explorer, right-click the Mining Structures subfolder; click New Mining Structure
5. Choose From existing relational database or data warehouse; Next
6. Choose Create mining structure with a mining model and select the Microsoft Decision Trees algorithm; Next
7. Select the ClassPerformanceDW DSV; Next
8. Check the Case box next to the view view_student_performance; Next
9. On the Specify the Training Data dialog page,
   a. Check the Key box next to student_sk if not already selected
   b. Check the Predictable box next to GradeCategory. This identifies that column as the column to be predicted.
   c. Click the Suggest button. This allows SSAS to recommend additional input columns based on whether they correlate with the Predictable variable at higher than .05; click OK.
   d. Remove AverageCourseGrade as a model column by unchecking the box to the left of the column name (GradeCategory is derived from it, so it would give away the answer)
   e. Ensure city, classification, gmat, HighSchoolGPA, major, and state_abbreviation are selected as input columns
   f. Click Next

10. On the Specify Columns' Content and Data Type dialog page,
   a. Ensure city, classification, GradeCategory, major, and state_abbreviation have Content Types of Discrete. You can click the dropdown and change the types manually, OR you can click the Detect button.
11. Next we set aside a portion of data that will be used for validating our model
   a. Change the Percentage of data for testing to 25%; Next
12. On the Completing the Wizard dialog page,
   a. Name the mining structure Predict Successful Students
   b. Name the mining model Predict Successful Students Decision Tree
   c. Check the Allow drill through box
   d. Click Finish
13. Redeploy the project

The next appropriate step is to create alternate mining models within this same mining structure that also predict GradeCategory, using algorithms such as Naïve Bayes and/or Neural Network. This is sometimes referred to as triangulation. Ideally the different algorithms would provide similar results. By trying several algorithms we (1) can compare models to determine the best predictor of student performance, and (2) give our users more confidence that our predictions are valid if the outcomes are consistent.

We will skip triangulation and proceed to explore the results of our decision tree model. We can view the resulting decision tree in the Mining Model Viewer. There are two ways to view a decision tree model: the Decision Tree view and the Dependency Network view. For additional information, refer to the following Microsoft website:

Decision Tree View
The decision tree view shows us the decision steps used to predict the target variable.
1. Click on the Mining Model Viewer tab.
2. Click on the Decision Tree tab.
As a result of running a decision tree analysis, one tree is created for each predictable column. Since we identified only one predictable column, there is only one tree to view in this tab.
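The 25% holdout configured in step 11 can be pictured with the following Python sketch: cases are shuffled and a fixed share is set aside before training. SSAS performs its own partitioning (controlled by mining structure properties such as HoldoutMaxPercent and HoldoutSeed); the function name `holdout_split` and the seed value here are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(cases: Sequence[T], test_fraction: float = 0.25,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly partition cases into (training, testing) lists."""
    if not 0.0 <= test_fraction < 1.0:
        raise ValueError("test_fraction must be in [0, 1)")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(1, 101), test_fraction=0.25)
print(len(train), len(test))  # 75 25
```

Fixing the seed matters for the same reason SSAS stores a holdout seed: reprocessing the structure should reproduce the same test cases so that accuracy comparisons between models stay fair.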

Ideally the resulting tree would have multiple levels; however, ours has only one. One issue is that our sample size is far too small: Analysis Services stops splitting, or fails to split, a node when the number of cases is too small (the decision tree algorithm's MINIMUM_SUPPORT parameter determines the minimum number of leaf cases required to generate a split). Another issue is that because our class performance data is fictitious, with randomly assigned values, Analysis Services likely could not find an input variable that differentiated successful students better than the other input variables. So we will use another dataset to demonstrate decision tree analysis.

Example: Increase Student Support Decision Tree

Problem Definition
We have the following business goals:
Anticipate financial needs of students
  o Predict whether a potential student will require financial aid
  o We will use classification with a decision tree algorithm
Increase academic success by students
  o Determine student profiles so that we can better serve students with common characteristics
  o We will use a clustering algorithm

Data Preparation
To predict student need for financial aid, we create a flattened dataset by creating the view below:

CREATE VIEW StudentInfo AS
SELECT s.student_ak,
       f.act,
       f.sat,
       f.high_school_gpa,
       f.highschoolrank,
       s.first_name,
       s.last_name,
       s.birth_date,
       s.marital_status,
       s.gender,
       s.[full_time/part_time],
       s.legacy_status,
       s.transfer_flag,
       s.state,
       s.zip_code,
       s.country,
       s.financial_aid
FROM dbo.dim_student AS s
     INNER JOIN dbo.fact_academic AS f ON s.student_sk = f.student_sk
WHERE s.student_sk <> -1;

Run the above statement in SSMS and view the data. Note that there is one row per student.

Model Development/Training
1. Create an Analysis Services project in SSDT
   a. We will begin by creating a new HigherEdDM Analysis Services project/solution (store it on the desktop)
2. Create a data source
   a. Create a data source (HigherEdDW DS) that points to our existing HigherEdDW data mart in SQL Server
3. Create a data source view (DSV)
   a. Create a DSV (HigherEdDW DSV) that contains only the new view(s) we created during data preparation
4. In Solution Explorer, right-click the Mining Structures subfolder; click New Mining Structure
5. Choose From existing relational database or data warehouse; Next
6. Choose Create mining structure with a mining model and select the Microsoft Decision Trees algorithm; Next
7. Select the HigherEdDW DSV; Next
8. Check the Case box next to the view StudentInfo; Next
9. On the Specify the Training Data dialog page,
   a. Check the Key box next to student_ak if not already selected
   b. Check the Predictable box next to Financial_Aid. This identifies that column as the column to be predicted.
   c. Click the Suggest button. This allows SSAS to recommend additional input columns based on whether they correlate with the Predictable variable at higher than .05. Click the Input cells for the columns that are visible (not grayed out); click OK
   d. Choose HighSchoolRank and SAT as additional input columns by checking the Input boxes to the right of the column names
   e. Click Next
10. On the Specify Columns' Content and Data Type dialog page,
   a. Click the Detect button
   b. Click Next
11. Next we set aside a portion of data that will be used for validating our model
   a. Keep the Percentage of data for testing at 30%; Next
12. On the Completing the Wizard dialog page,
   a. Name the mining structure Predict Financial Aid
   b. Name the mining model Predict Financial Aid Decision Tree
   c. Check the Allow drill through box
   d. Click Finish
13. Deploy the project (be sure that the OLAP login has the db_datareader and db_datawriter roles mapped in SSMS)

Decision Tree View
The decision tree view shows us the decision steps used to predict the target variable.
1. Go to the Mining Model Viewer tab
2. Click on the Decision Tree tab.
As a result of running a decision tree analysis, one tree is created for each predictable column. Since we identified only one predictable column, there is only one tree to view in this tab. If there are multiple levels in the tree, we can expand or reduce the number of tree levels shown by moving the Show Level slider.

A quick look at the tree tells us that a student's ACT score appears to be the primary predictor of whether the student will require financial aid, and that, after accounting for the ACT score, whether the student is a transfer student appears to be the next most important predictor.

The mining legend area gives us guidance on how to interpret the node and histogram bar colors. Each node represents a subset of cases in the decision tree. By default, the darkest nodes are the nodes with the largest number of cases; this default can be changed by setting the Background field (described below). Hovering over a node provides data about the cases in that node, including the condition required to reach that node from the node preceding it. For example, hovering over the leaf node (i.e., a lowest-level node) representing ACT >= 19 and Transfer Flag = 1 shows that 2,264 students meeting those criteria required financial aid, while 906 students meeting the same criteria did not.
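The leaf-node counts quoted above can be turned into the numbers a tree works with: the share of aid cases in the node, the majority prediction, and an impurity score. This minimal Python sketch uses entropy as the impurity measure because it is a standard, easy-to-read choice; it is an illustration, not the formula SSAS uses to choose splits (the Microsoft Decision Trees algorithm exposes a SCORE_METHOD parameter for that, with entropy as one of its options). The function name `node_summary` is hypothetical.

```python
import math
from typing import Dict


def node_summary(positives: int, negatives: int) -> Dict[str, float]:
    """Summarize a two-class tree node: case count, share of positives,
    majority prediction, and entropy (0 = pure node, 1 = 50/50 split)."""
    total = positives + negatives
    p = positives / total
    entropy = -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)
    return {
        "cases": total,
        "p_aid": p,
        "predicted": 1 if p >= 0.5 else 0,
        "entropy": entropy,
    }


# Counts from the leaf ACT >= 19 and Transfer Flag = 1 described in the notes.
leaf = node_summary(positives=2264, negatives=906)
print(leaf["cases"], round(leaf["p_aid"], 3), leaf["predicted"])  # 3170 0.714 1
print(round(leaf["entropy"], 2))                                  # 0.86
```

So about 71% of the students in that leaf required aid; the node predicts "aid", but an entropy near 0.86 shows it is far from pure.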

By examining the histogram bars within the nodes, you can visually see the approximate ratio of non-financial-aid students to financial aid students. According to the mining legend, the blue histogram bars indicate students who received financial aid (predicted value = 1); the red bars indicate students who did not receive financial aid (predicted value = 0). A visual examination of these bars tells us, for example, that all the students with ACT scores under 11 required financial aid, whether or not they were transfer students. Similarly, the majority of students with ACT scores between 11 and 15 did NOT require financial aid, regardless of their transfer status.

The Background field shows which predictable column cases are being highlighted: "yes" or "true" cases (the tree shows the darkest color where there are more cases of students receiving financial aid), "no" or "false" cases (the darkest color where there are more cases of students NOT receiving financial aid), cases missing prediction values, or All cases (the darkest color where there are more total cases).

Dependency Network View
The Dependency Network tab shows the relationships between the attributes that contributed to the predictive ability of the mining model.
1. Click on the Dependency Network tab
By default, all the predictive attributes are shown. However, by moving the slider on the left side, you can see which attributes are the strongest predictors; in this case, ACT scores. This can also be seen on the Decision Tree tab, where ACT score was the variable used for the first split of the tree.

Model Validation/Evaluation
After reviewing the results of the mining algorithm in the Mining Model Viewer, we validate the model. Validating the model involves checking the accuracy of the results produced by the algorithm and comparing the predictive ability of multiple models/algorithms. The Mining Accuracy Chart tab provides three tools for model validation:
- Lift Chart
- Classification Matrix
- Cross Validation
Model validation begins with using the test data that we set aside previously as input to our mining model(s). The model then predicts the predictable attribute for each case in this test set. We then validate the model by comparing its predictions against the known outcomes.

Prepare Test Data
1. Click on the Mining Accuracy Chart tab
2. Click on the Input Selection tab
3. Ensure Synchronize Prediction Columns and Values is checked, and that all mining models are selected
4. Ensure the Use mining model test cases radio button is selected at the bottom

Lift Chart with No Predict Value to Validate Model
5. Click on the Lift Chart tab
   a. The blue line represents the ideal model
   b. The red line represents the predictive ability of our model
   c. The dark gray vertical line indicates the percentage of the sample population for which the Mining Legend is displaying statistics

This version of the lift chart does not use a target (Predict) value; i.e., we did not specify on the Input Selection tab whether we wanted to see the accuracy of "yes" predictions or "no" predictions, so this lift chart shows ALL predictions. We can see that when 50% of the population is processed, our model correctly predicts the financial aid status (aid or no aid) of 37.2% of the population, whereas the ideal model (blue line) would correctly predict all 50%. If we had used multiple models, the chart would include additional lines showing the predictive ability of each model for comparison.

The Predict Probability is similar to a confidence level: it tells us that our decision tree model will correctly predict 37.2% of the current population IF we rely on the results that have a predict probability (i.e., confidence) of 62.49% or higher.
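One way to read a point on this no-target lift chart, consistent with the description above: rank the test cases by predict probability, take the top x% of the population, and count how many of those cases the model predicted correctly, as a share of the whole test population (the ideal model scores exactly x%). The Python sketch below is our illustration of that reading, not SSAS code; the function name `lift_point` and the test cases are hypothetical.

```python
from typing import List, Tuple

# One test case: (predicted value, predict probability, actual value).
Case = Tuple[int, float, int]


def lift_point(cases: List[Case], population_fraction: float) -> float:
    """Share of the whole test population that is correctly predicted within
    the top `population_fraction` of cases, ranked by predict probability."""
    ranked = sorted(cases, key=lambda c: c[1], reverse=True)
    n_top = round(len(ranked) * population_fraction)
    correct = sum(1 for predicted, _, actual in ranked[:n_top] if predicted == actual)
    return correct / len(ranked)


# Hypothetical test cases.
perfect = [(1, 0.95, 1), (0, 0.90, 0), (1, 0.80, 1), (0, 0.70, 0)]
ours = [(1, 0.95, 1), (0, 0.90, 1), (1, 0.80, 1), (0, 0.70, 0)]
print(lift_point(perfect, 0.5))  # 0.5  (ideal: x% of population, x% correct)
print(lift_point(ours, 0.5))     # 0.25
```

In the chart above, the model's point at 50% of the population sits at 37.2%, and the predict probability of the last case included at that cut-off is what the legend reports as 62.49%.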

Lift Chart with Predict Value to Validate Model
6. Click on the Input Selection tab
7. Change Predict Value to Yes (or 1)
8. Return to the Lift Chart tab
   a. The blue line now represents random guessing
   b. The green line represents the ideal model
   c. The red line still represents our model
Now we see that when 50% of the population is processed, our model correctly identifies approximately 63% of the students who will receive financial aid. This is better than random guessing, which would identify only 50% of them. Many consider this type of lift chart the more valuable one to analyze.

Classification Matrix
The classification matrix allows us to compare predicted values against actual values.
9. Click on the Classification Matrix tab
The numbers circled in green (the diagonal of the matrix) represent correct predictions, where the predicted financial aid value matches the actual value (true positives and true negatives); the numbers circled in red represent incorrect predictions (false positives and false negatives). Using the test data, our model made 3,824 correct predictions and 2,176 incorrect predictions, an accuracy of about 63.7% on the 6,000 test cases. Note that if we had used multiple models, we could have compared their accuracy using this matrix.
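The classification matrix can be summarized with a few standard ratios. This minimal Python sketch computes accuracy, precision, and recall from a matrix keyed by (predicted, actual). Because the notes give only the totals for our model, the per-cell counts in `toy` are hypothetical; only the final line uses the reported 3,824 correct and 2,176 incorrect predictions.

```python
from typing import Dict, Tuple

# Keys are (predicted, actual); values are case counts.
Matrix = Dict[Tuple[int, int], int]


def summarize(matrix: Matrix) -> Dict[str, float]:
    """Accuracy over all cells, plus precision/recall for the positive class (1)."""
    total = sum(matrix.values())
    correct = sum(n for (pred, actual), n in matrix.items() if pred == actual)
    tp = matrix.get((1, 1), 0)
    fp = matrix.get((1, 0), 0)
    fn = matrix.get((0, 1), 0)
    return {
        "accuracy": correct / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy counts (hypothetical, not the course data).
toy = {(1, 1): 40, (1, 0): 10, (0, 1): 5, (0, 0): 45}
print(summarize(toy))  # accuracy 0.85, precision 0.8, recall about 0.889

# Totals reported in the notes: 3824 correct and 2176 incorrect test predictions.
print(round(3824 / (3824 + 2176), 4))  # 0.6373
```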

Model Deployment/Use
At this point we have developed, compared, selected, and deployed a data mining model that is ready to support real-world prediction and decision-making. Users and/or client applications can issue Data Mining Extensions (DMX) queries against our model to retrieve results for a specific case or for a batch of cases; this is how client applications use our mining model to gain real-time insights and/or predictions. We can preview this functionality in the Mining Model Prediction tab of SSDT.

Two types of DMX prediction queries are supported via SSDT: a single-row query (AKA Singleton query) and a batch query (AKA Prediction Join query). SSDT also supports two ways to create a DMX query: using a GUI that generates the DMX for you, or entering the DMX manually. We will look at examples of both.

Singleton Query
Using a singleton query, we can feed a single case as input to a mining model in order to retrieve the predicted value for that case. This is how an application could use the mining model in real time.
1. With the mining model open, click on the Mining Model Prediction tab, and click on the Singleton Query button

2. Click the Select Model button
3. Highlight the Predict Financial Aid Decision Tree model and click OK
In the Singleton Query Input window, you can enter the specific input values for which you want the Decision Tree model to make a prediction. Then you can specify which values you want the DMX query to return in the bottom area of the Mining Model Prediction window. At a minimum, you'd want the query to return the predictable attribute; in our case, the Financial Aid attribute. In addition, we will ask the model to return the probability that the predicted attribute is true, as well as the probability that it is false.
4. Enter values as shown below, then click on the View dropdown and select Result:

The results of our prediction query are shown below. We aliased the two expressions so that they appear in the result set with user-friendly names (similar to aliasing columns in SQL).
5. Click on the Query view to see the DMX query that SSDT generated to produce our results.
6. Copy/paste the query into SSMS and run it from there as well. Be sure to open a DMX query window (connected to Analysis Services, not to the database engine). Note that we can modify the DMX code to do the aliasing if we didn't do it in the SSDT version of the DMX query:
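A client application typically builds a singleton query like the one SSDT generates and sends it to Analysis Services. The Python sketch below only composes the DMX text, using the documented DMX pieces Predict, PredictProbability, and NATURAL PREDICTION JOIN against an inline SELECT. Assumptions to check against your own model: the input column names [Act] and [Transfer Flag], the 1/0 values for Financial Aid, and the helper names `dmx_literal` and `singleton_query`, which are hypothetical.

```python
from typing import Dict, Union

Value = Union[int, float, str]


def dmx_literal(value: Value) -> str:
    """Render a Python value as a DMX literal (strings in single quotes)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def singleton_query(model: str, target: str, inputs: Dict[str, Value]) -> str:
    """Build a DMX singleton prediction query for one case."""
    case_columns = ", ".join(f"{dmx_literal(v)} AS [{name}]" for name, v in inputs.items())
    return (
        "SELECT\n"
        f"  Predict([{model}].[{target}]) AS [Predicted {target}],\n"
        f"  PredictProbability([{model}].[{target}], 1) AS [Probability Yes],\n"
        f"  PredictProbability([{model}].[{target}], 0) AS [Probability No]\n"
        f"FROM [{model}]\n"
        "NATURAL PREDICTION JOIN\n"
        f"(SELECT {case_columns}) AS t"
    )


# Hypothetical input column names; check the model's column names in SSDT.
print(singleton_query("Predict Financial Aid Decision Tree", "Financial Aid",
                      {"Act": 19, "Transfer Flag": 1}))
```

NATURAL PREDICTION JOIN matches the inline columns to model columns by name, which is why the aliases must match the model's column names exactly.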

The previous example illustrates how a user or application can dynamically retrieve a prediction for a single case. However, our user/application needs might require us to calculate values for a group of cases; in this case we need to create a Prediction Join query.

Prediction Join Query
A prediction join query (or batch query) allows us to feed multiple cases to a mining model and retrieve the resulting predictions for each case. The cases need to be contained in a table. Ideally we would have a new set of data to use as input in a prediction join query; however, for the purpose of our class example, we will use the same set of data that we used to create the model.
1. With the mining model still open in the Mining Model Prediction tab, click the Singleton Query button again to toggle back to Prediction Query mode (you do not need to save the previous query).
2. Click Select Case Table, choose the StudentInfo view, then click OK
Notice that SSDT automatically tries to map fields used in the mining model to fields in the input table. However, we do not want it to use the Financial_Aid field in the input table: this is the field we want the model to predict. Therefore, we need to remove the mapping for this field.
3. Click on the mapping line going from Financial Aid in the mining model to Financial_Aid in the input table; right-click on the line and Delete it.

4. Once again, we choose the data we want to appear in the results of our mining model. Set the fields as indicated below:
5. Click the View dropdown to see the Result of running the query.
We can save these results as a relational table so that users/applications will have access to them.
6. Click on the Save button on the Mining Model Prediction tab. We can save the results as a new table in the database of our data source.
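The batch version follows the same pattern, but the cases come from a table through OPENQUERY against a data source defined in the project, and an explicit ON clause maps model columns to input columns. The Python sketch below composes that DMX text and refuses to map the predicted column, mirroring step 3 where its mapping line is deleted. Assumptions: the data source name "HigherEdDW DS" matches the project's data source, the model/input column names in `mapping` are hypothetical, and `prediction_join_query` is an illustrative helper, not an SSAS API.

```python
from typing import Dict


def prediction_join_query(model: str, target: str, data_source: str,
                          source_sql: str, mapping: Dict[str, str],
                          key_column: str) -> str:
    """Build a DMX batch (PREDICTION JOIN) query.

    `mapping` pairs model columns with input-table columns. The target column
    must not be mapped, so the model predicts it instead of reading it.
    """
    if not mapping:
        raise ValueError("at least one input column must be mapped")
    if target in mapping:
        raise ValueError(f"{target!r} is the predicted column and must not be mapped")
    on_clause = "\n  AND ".join(f"[{model}].[{m}] = t.[{c}]" for m, c in mapping.items())
    quoted_sql = source_sql.replace("'", "''")  # escape quotes inside the DMX string literal
    return (
        f"SELECT t.[{key_column}],\n"
        f"  Predict([{model}].[{target}]) AS [Predicted {target}],\n"
        f"  PredictProbability([{model}].[{target}], 1) AS [Probability Yes]\n"
        f"FROM [{model}]\n"
        "PREDICTION JOIN\n"
        f"  OPENQUERY([{data_source}], '{quoted_sql}') AS t\n"
        f"ON {on_clause}"
    )


# Hypothetical column names and mapping; use the names shown in SSDT.
query = prediction_join_query(
    model="Predict Financial Aid Decision Tree",
    target="Financial Aid",
    data_source="HigherEdDW DS",
    source_sql="SELECT * FROM dbo.StudentInfo",
    mapping={"Act": "act", "Transfer Flag": "transfer_flag",
             "Sat": "sat", "Highschoolrank": "highschoolrank"},
    key_column="student_ak",
)
print(query)
```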

Example: Increase Student Support Clustering

Problem Definition
We have the following business goals at BT University:
Anticipate financial needs of students
  o Predict whether a potential student will require financial aid
  o We will use classification with a decision tree algorithm
Increase academic success by students
  o Determine student profiles so that we can better serve students with common characteristics
  o We will use a clustering algorithm

Data Preparation
To predict whether a student will graduate in fewer than 4 years, we create a flattened dataset as follows: <20,000 recs>

CREATE VIEW StudentInfoDetailed AS
SELECT s.student_ak,
       f.act,
       f.sat,
       f.high_school_gpa,
       f.highschoolrank,
       s.first_name,
       s.last_name,
       s.birth_date,
       s.marital_status,
       s.gender,
       s.race,
       s.[full_time/part_time],
       s.legacy_status,
       s.transfer_flag,
       s.state,
       s.zip_code,
       s.country,
       s.financial_aid,
       -- number of majors; '-1' is the placeholder for "no major"
       CASE
           WHEN Major_1_name <> CONVERT(varchar, -1) AND Major_2_name <> CONVERT(varchar, -1) THEN 2
           WHEN Major_1_name <> CONVERT(varchar, -1) AND Major_2_name = CONVERT(varchar, -1) THEN 1
           WHEN Major_1_name = CONVERT(varchar, -1) AND Major_2_name <> CONVERT(varchar, -1) THEN 1
           ELSE 0
       END AS num_majors,
       -- years from start to graduation; fewer than 4 marks an early graduate
       CASE
           WHEN YEAR(graduation_date) - YEAR(start_date) < 4 THEN 1
           WHEN YEAR(graduation_date) - YEAR(start_date) >= 4 THEN 0
       END AS early_graduate,
       YEAR(s.Start_Date) - YEAR(s.Birth_Date) AS age
FROM dbo.dim_student AS s
     INNER JOIN dbo.fact_academic AS f ON s.student_sk = f.student_sk
WHERE s.student_sk <> -1;

Run the above statements in SSMS and view the data. Note that there is one row per student.
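The early_graduate and age columns rest on calendar-year arithmetic, and the order of subtraction matters: graduation year minus start year. The Python sketch below mirrors that logic with made-up dates so the boundary cases are easy to check; the function names are hypothetical. It also shows why subtracting start minus graduation is wrong: the result is negative for every graduate, so everyone would land in the "< 4" branch.

```python
from datetime import date
from typing import Optional


def years_to_graduate(start: date, graduation: date) -> int:
    """Calendar-year difference, as in YEAR(graduation_date) - YEAR(start_date)."""
    return graduation.year - start.year


def early_graduate(start: date, graduation: Optional[date]) -> Optional[int]:
    """1 if graduated in fewer than 4 calendar years, 0 if not, None if no graduation date."""
    if graduation is None:
        return None
    return 1 if years_to_graduate(start, graduation) < 4 else 0


def age_at_start(birth: date, start: date) -> int:
    """Same approximation as the view: difference of calendar years only."""
    return start.year - birth.year


# Hypothetical dates.
start = date(2015, 8, 24)
print(early_graduate(start, date(2018, 12, 15)))  # 1  (3 years)
print(early_graduate(start, date(2019, 5, 10)))   # 0  (4 years)
# Subtracting in the reverse order is negative for every graduate:
print(start.year - date(2019, 5, 10).year)        # -4
```

Note that comparing calendar years only is an approximation: a student who starts in August 2015 and finishes in May 2019 has studied less than four full years but still counts as 4.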

Clustering attempts to group similar records together and does not need a target variable specified in advance. It is more exploratory in nature and is sometimes done before other, more predictive techniques to better understand the data.

We will use the same Analysis Services project (HigherEdDM) and the same data source (HigherEdDW DS). However, because we are using a different view, we'll need to add it to our current DSV.

Model Development/Training

1. Update the existing data source view (DSV)
   a. Double-click the HigherEdDW DSV to open it, if it is not already open
   b. Right-click in an empty area of the DSV design area and choose Add/Remove Tables
   c. Highlight the StudentInfoDetailed view and click the right arrow to move it to Included objects; OK
   d. Save the DSV
2. In Solution Explorer, right-click the Mining Structures subfolder; click New Mining Structure
3. Choose "From existing relational database or data warehouse"; Next
4. Choose the Microsoft Clustering technique
5. Choose the Higher Ed DW DSV
6. Check the Case box next to the StudentInfoDetailed view to indicate the case set we'll be using for the analysis
7. Choose Student_AK as the Key column; with the exception of Birth_Date, First_Name, Last_Name, and Zip_Code, set all other variables as Input variables
8. Click Detect to have SSAS detect the type of data stored in the input variables
9. Choose 30% for testing
10. Name the mining structure Student Info Detailed; name the mining model Student Info Detailed Clustering; check the Allow drill through box; Finish

You can guide how mining algorithms work by overriding their default parameters.

1. Click the Mining Models tab
2. Right-click the clustering model and choose Set Algorithm Parameters
3. Set the CLUSTER_COUNT parameter to 0, which lets the clustering algorithm determine the best number of clusters to create; click OK
4. Save and redeploy the project
5. Go to the Mining Model Viewer tab to view the resulting cluster diagram

There are 4 sub-tabs for viewing the results of clustering: Cluster Diagram, Cluster Profiles, Cluster Characteristics, and Cluster Discrimination.
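To make the idea of grouping cases without a target column concrete, here is a minimal K-means sketch in Python. It is an illustration only, not the Microsoft Clustering implementation: that algorithm defaults to a scalable EM method, and K-means is one of the alternatives selectable through its CLUSTERING_METHOD parameter. The `kmeans` function and the (ACT, GPA) cases are hypothetical, and unlike this sketch, a real analysis would normalize the inputs so that ACT does not dominate the distance.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal K-means sketch: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen cases
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each case joins the cluster whose centroid is nearest.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the cases assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, labels


# Hypothetical (ACT, high school GPA) cases: note there is no target column.
cases = [(18, 2.1), (19, 2.4), (17, 2.0), (30, 3.8), (31, 3.9), (29, 3.7)]
centroids, labels = kmeans(cases, k=2)
print(labels)
```

Here k is fixed by the caller, which corresponds to setting CLUSTER_COUNT to a specific number rather than 0.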

Cluster Diagram View

On the Cluster Diagram tab, we can see that 6 clusters have been created. The connecting lines show how closely related one cluster is to another: the darker the line, the stronger the relationship. The slider on the left adjusts which links are shown, from only the strongest links up to all links. In our case, the strongest link is between clusters 1 and 5.

By default, the Shading Variable is Population, so the clusters with the darkest shading contain the most cases. To examine how many cases in each cluster have a particular attribute value, change the shading variable.

1. Change the Shading Variable to State. The State box defaults to Alabama because values are listed alphabetically; change it to Texas.
2. Cluster 3 is the darkest. Hover over cluster 3 and notice that 3% of the cases in that cluster have a State value of Texas.
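The percentage shown when hovering over a cluster is the share of that cluster's cases that have the chosen value. A rough Python sketch of that calculation, assuming each case already carries a cluster label (for example, from a Cluster() prediction query); `share_by_cluster` and the sample data are hypothetical, and the viewer's own computation may differ in detail.

```python
from collections import Counter


def share_by_cluster(cluster_labels, values, target):
    """Fraction of each cluster's cases whose attribute value equals `target`."""
    totals = Counter(cluster_labels)
    # Counter returns 0 for clusters with no matching cases.
    hits = Counter(c for c, v in zip(cluster_labels, values) if v == target)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical cases: one cluster label and one State value per student.
labels = ["Cluster 1", "Cluster 1", "Cluster 2", "Cluster 2", "Cluster 2", "Cluster 3"]
states = ["Texas", "Oklahoma", "Texas", "Texas", "Alabama", "Alabama"]
print(share_by_cluster(labels, states, "Texas"))
```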

Cluster Profiles View

A challenge when using the clustering algorithm is determining what profile each cluster represents.

1. Click on the Cluster Profiles tab. The first profile shown is the entire population, as a reference. Notice that discrete values are presented as bars, whereas continuous values are presented as slider-like diamond charts that show the mean and standard deviation.

The Histogram Bars field sets how many bars are visible in each attribute profile. Each bar corresponds to a distinct value; if more values exist than the number of bars displayed, the remaining values are grouped together into a gray bucket.

One obvious observation is that all the Country values are the same. Age, Gender, Legacy Status, Marital Status, Num Majors, Race, and State also appear to vary little across clusters. We can go back to the Mining Models tab and change these fields from Input to Ignore; if you do this, redeploy the project.
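Deciding which attributes "vary little across clusters" is a judgment made by eye in the viewer. One way to put a number on it, sketched below in Python as an assumption rather than anything SSAS computes, is the largest total-variation distance between any cluster's value distribution and the population's: a score near 0 means the attribute looks the same in every cluster and is a candidate for Ignore. The function names, sample data, and any cutoff you pick are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Share of each distinct value in a list."""
    counts = Counter(values)
    return {v: n / len(values) for v, n in counts.items()}


def max_shift(cluster_labels, values):
    """Largest total-variation distance between any cluster's distribution and the population's."""
    population = distribution(values)
    worst = 0.0
    for cluster in set(cluster_labels):
        members = [v for label, v in zip(cluster_labels, values) if label == cluster]
        dist = distribution(members)
        keys = set(population) | set(dist)
        shift = 0.5 * sum(abs(population.get(k, 0.0) - dist.get(k, 0.0)) for k in keys)
        worst = max(worst, shift)
    return worst


# Hypothetical data: Country never changes, Financial Aid splits cleanly by cluster.
labels = [1, 1, 2, 2]
country = ["USA", "USA", "USA", "USA"]
aid = ["Yes", "Yes", "No", "No"]
for name, column in [("Country", country), ("Financial Aid", aid)]:
    print(name, max_shift(labels, column))
```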

It's still hard to distinguish between the 6 profiles, so we can reduce the number of clusters.

2. Change the CLUSTER_COUNT parameter to 4; redeploy and view the updated cluster profiles. Note that the cluster diagram has also changed to 4 nodes (the population column was manually hidden).

There are a few distinguishing cluster differences:
Cluster 1 has students almost all of whom are on financial aid, are transfers, and have low ACTs.
Cluster 4 has students who are not on financial aid, are not transfer students, and have above-average ACTs.
Cluster 2 has students some, but not all, of whom are on financial aid, with high ACTs and high GPAs.
Cluster 3 has a mix of financial aid and non-financial aid students, but with low GPAs.

We can rename the clusters to help us identify them.

3. Right-click each cluster's column heading and select Rename Cluster
   a. Rename Cluster 1 to Financial Aid Students
   b. Rename Cluster 2 to High Performers
   c. Rename Cluster 3 to Low GPAs
   d. Rename Cluster 4 to Non Financial Aid Students

Cluster Characteristics View

This view lets you examine the characteristics that make up a selected cluster. Attributes are shown in order of importance, along with the probability that each attribute value appears in the cluster.

1. Select the Financial Aid Students cluster in the Cluster: dropdown
2. Select different clusters and see if and how the important variables change

Cluster Discrimination View

This view helps determine which attributes most differentiate a selected cluster from other clusters.

1. Choose the Financial Aid Students cluster as Cluster 1; notice that the lack of high ACT scores is the primary factor distinguishing this cluster from the others, followed by Transfer status.
2. Choose the High Performers cluster as Cluster 1; notice that the primary factor distinguishing high performers from all other students appears to be whether they graduate early. Change Cluster 2 to Low GPAs; when High Performers are compared to the Low GPAs cluster, GPA and ACT scores set them apart.
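A simplified way to think about the discrimination view: for every attribute value, compare how often it occurs in one cluster versus another, then rank the values by the size of the gap. The Python sketch below does that; it is an assumption for illustration, not the scoring formula SSAS uses, and `discriminate` plus the sample cases are hypothetical.

```python
from collections import Counter


def value_probs(cases, cluster):
    """P(attribute = value | cluster) for every attribute value seen in the cluster."""
    members = [c for c in cases if c["cluster"] == cluster]
    counts = Counter((attr, val) for c in members for attr, val in c.items() if attr != "cluster")
    return {key: n / len(members) for key, n in counts.items()}


def discriminate(cases, cluster_a, cluster_b):
    """Rank attribute values by the gap in probability; a positive gap favors cluster_a."""
    pa, pb = value_probs(cases, cluster_a), value_probs(cases, cluster_b)
    gaps = {key: pa.get(key, 0.0) - pb.get(key, 0.0) for key in set(pa) | set(pb)}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical cases already labeled with their (renamed) clusters.
cases = [
    {"cluster": "High Performers", "gpa": "high", "early_graduate": "Y"},
    {"cluster": "High Performers", "gpa": "high", "early_graduate": "N"},
    {"cluster": "Low GPAs", "gpa": "low", "early_graduate": "N"},
    {"cluster": "Low GPAs", "gpa": "low", "early_graduate": "N"},
]
for (attr, val), gap in discriminate(cases, "High Performers", "Low GPAs"):
    print(f"{attr}={val}: {gap:+.2f}")
```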

Model Validation/Evaluation

Because we are not predicting a target attribute, examining model accuracy (lift charts and the classification matrix) does not apply to our use of clustering.

Model Deployment/Use

To deploy and use the resulting clusters, we can use the methods discussed earlier for decision trees.

1. Go to the Mining Model Prediction tab
2. Click Select Case Table and choose the StudentInfoDetailed (dbo) table
3. Design the DMX query as shown below:
4. Click on the Results view
5. Click the disk icon to save the results as a table (Segment Students), which can then be viewed in SSMS
6. Switch to the Query view to see the DMX query that produced the model results

SELECT
    Cluster(),
    t.[act],
    t.[early_graduate],
    t.[financial_aid],
    t.[transfer_flag],
    t.[high_school_gpa]
FROM [Student Info Detailed Clustering]
PREDICTION JOIN
    OPENQUERY([Higher Ed DW DS],
        'SELECT [act], [early_graduate], [Financial_Aid], [Transfer_Flag], [high_school_gpa],
                [sat], [highschoolrank], [Full_Time/Part_Time]
         FROM [dbo].[studentinfodetailed]') AS t
ON  [Student Info Detailed Clustering].[Act] = t.[act]
AND [Student Info Detailed Clustering].[Sat] = t.[sat]
AND [Student Info Detailed Clustering].[High School Gpa] = t.[high_school_gpa]
AND [Student Info Detailed Clustering].[Highschoolrank] = t.[highschoolrank]
AND [Student Info Detailed Clustering].[Full Time Part Time] = t.[full_time/part_time]
AND [Student Info Detailed Clustering].[Transfer Flag] = t.[transfer_flag]
AND [Student Info Detailed Clustering].[Financial Aid] = t.[financial_aid]
AND [Student Info Detailed Clustering].[Early Graduate] = t.[early_graduate]

7. Rather than use the DMX query generated in SSDT, enter the DMX query below in SSMS (ensure the HigherEdDM database and the Student Info Detailed Clustering model are selected):

SELECT t.*, Cluster()
FROM [Student Info Detailed Clustering]
NATURAL PREDICTION JOIN
    (SELECT * FROM [Student Info Detailed Clustering].CASES) AS t
ORDER BY Cluster()

Just as with decision trees, we can also write a singleton query to identify the cluster that an individual case most closely fits.

8. Enter the DMX query below in SSMS (it was generated in SSDT using a singleton query):

SELECT Cluster()
FROM [Student Info Detailed Clustering]
NATURAL PREDICTION JOIN
    (SELECT 30 AS [ACT], 0 AS [Transfer Flag], 200 AS [Highschoolrank]) AS t
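Cluster() returns the cluster a case most likely belongs to. For a centroid-based method that is roughly the nearest cluster center, which the Python sketch below illustrates for a singleton case like the one above. The centroid values and the `nearest_cluster` helper are hypothetical, and comparing only the attributes the case supplies is a simplification; the default EM method in Microsoft Clustering assigns clusters by probability rather than plain distance.

```python
import math


def nearest_cluster(case, centroids):
    """Name of the centroid closest to `case`, using only the attributes the case supplies."""
    def distance(center):
        return math.sqrt(sum((case[attr] - center[attr]) ** 2 for attr in case))
    return min(centroids, key=lambda name: distance(centroids[name]))


# Hypothetical cluster centers on two of the model's input columns.
centroids = {
    "Financial Aid Students": {"ACT": 19, "Transfer Flag": 1},
    "High Performers": {"ACT": 31, "Transfer Flag": 0},
}
print(nearest_cluster({"ACT": 30, "Transfer Flag": 0}, centroids))  # High Performers
```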

Data Mining with Excel

Excel can be used as a client to create and execute data mining models, in lieu of or in addition to SSDT; the models themselves are still processed by an SSAS server. Excel has two data mining interfaces: one developer- or power-user-oriented, the other end-user-oriented. A key difference between the two is that the developer-oriented interface can generally work directly with SQL Server or SSAS data, whereas the end-user-oriented interface requires a local copy of the data to be stored in Excel. For this example we will use the end-user-oriented interface.

Example: Course Mixture -- Association

Data Preparation

The Association algorithm analyzes groups of related items and predicts the likelihood of items occurring together. It is used frequently in retail as Market Basket Analysis, where an item is analogous to an individual product in a customer's shopping basket, and an itemset represents a combination of items purchased together in shopping basket transactions. This application of association is used to gain insights about customer purchases, such as which products are purchased together (for potential cross-selling) and which products benefit from promotions.

The association algorithm produces association rules about how items are related; for example, product X is ordered with product Y with Z degree of statistical confidence. Rules can include recommendations: rules that exceed a probability threshold that you specify.

The Association algorithm requires source data to have:
   A key column that uniquely identifies a transaction (market basket); it cannot be a concatenated key
   A column that serves as a predictable column (typically the key of a nested table)
   One or more input columns that have discrete values

Source data for the association algorithm is often a flattened dataset created from a table containing transactions. We will create a flattened dataset based on student course enrollments in the ClassPerformanceDW database.
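Turning transaction rows into baskets is essentially a group-by on the transaction key. A minimal Python sketch of that grouping, assuming rows shaped like (transactionid, coursename) pairs; `to_baskets` and the course names are hypothetical. Grouping into a dictionary this way does not depend on the rows arriving in any particular order.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group (transactionid, coursename) rows into one set of courses per transaction."""
    baskets = defaultdict(set)
    for transaction_id, course in rows:
        baskets[transaction_id].add(course)
    return dict(baskets)


# Hypothetical (transactionid, coursename) rows, deliberately out of order.
rows = [(1, "db"), (1, "stats"), (2, "db"), (2, "java"), (1, "java")]
print(to_baskets(rows))
```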
Each enrollment is identified by a combination of student_sk and class_sk. However, the association algorithm requires a key that identifies the market basket transaction. For our example, one student's set of enrolled classes represents a market basket. Therefore we have to identify (or create, if it doesn't exist) the atomic key that identifies the market basket transaction when we set up the association analysis. Note that the Excel Association mining algorithm cannot accept a concatenated key.

1. Execute the SQL command below in the ClassPerformanceDW database in SSMS:

create view view_enrollments as
select top 100 percent
       [student_sk] as transactionid,   -- one student's enrollments form one market basket
       fe.[class_sk],
       [coursename],
       [ActualDate]
from factenrollment fe
     join dimclass c on fe.class_sk = c.class_sk
     join dimtime t on fe.date_sk = t.datesk
order by transactionid;

Note that TOP 100 PERCENT is included only because SQL Server does not allow ORDER BY in a view definition without TOP (or OFFSET). The ORDER BY is meant to keep each student's registrations grouped together; however, SQL Server does not guarantee that rows are returned in that order when the view is queried, so add an ORDER BY to the query that reads from the view if the order matters.

Configure a connection to SSAS

1. Open Excel; open a Blank Workbook
2. Click on the Data Mining tab on the ribbon
3. Click on the <No Connection> button
4. Click New and choose the Server and Catalog names as indicated below, then click OK:
5. Click Close

Import data

1. Go to the Data tab
2. In the External Data group, click the From Other Sources dropdown, choose From SQL Server, then set the options as follows:
   a. Set the correct database Server Name
   b. Choose the ClassPerformanceDW database
   c. Accept the defaults
   d. Ensure cell $A$1 is selected

Explore Data

You can explore the data that will be used for mining. Exploring data in Excel gives you visual plots of the distribution of column values:

1. Click the Data Mining tab
2. Click Explore Data; Next; Next
3. On the Select Column page, select a column to explore (e.g., Coursename) by clicking on the column heading; Next
4. You should see a histogram showing the values of the column you chose and how many records have each value; click Finish

Clean/Transform Data

You can also clean or transform data that will be used for mining. For example, you can remove outlier values and expand data values (e.g., TX becomes Texas). We'll do an example of re-labeling.

1. Click the Data Mining tab
2. Click the Clean Data dropdown and choose Re-Label; Next; Next
3. On the Select Column page, select a column to re-label (e.g., Coursename); Next
4. Specify New Labels for one course (e.g., change db to database); Next
5. Select "Change data in place"; Finish

Model Development/Training

Like SSAS, Excel provides wizards that help you build data mining models without having to understand the details of the algorithms the models are built on. The Data Modeling group of the Data Mining tab shows that Excel supports the development of the following types of models:
   Classification (i.e., Decision Tree, discrete value prediction)
   Estimation (i.e., Decision Tree, continuous value prediction)
   Cluster
   Associate
   Forecast (i.e., Time Series)
   Advanced (the above plus Regression, Naïve Bayes, and Neural Networks)
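Behind the Explore Data histogram and the Re-Label step are two simple operations: counting how often each value occurs, and replacing values through a mapping. A Python sketch of both, assuming the Coursename column is held as a plain list; the function names and sample values are hypothetical and say nothing about how the Excel add-in is implemented.

```python
from collections import Counter


def explore(values):
    """(value, count) pairs, most frequent first: the numbers behind a value histogram."""
    return Counter(values).most_common()


def relabel(values, new_labels):
    """Replace values in place using a mapping; values without a new label are kept."""
    for i, value in enumerate(values):
        values[i] = new_labels.get(value, value)


courses = ["db", "stats", "db", "java"]
print(explore(courses))
relabel(courses, {"db": "database"})  # analogous to "Change data in place"
print(courses)
```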

We'll use the Shopping Basket Analysis wizard to create an association-based model to understand which courses tend to be taken together.

1. Click anywhere in the table of data to expose the Table Tools contextual tabs
2. On the Analyze tab, click Shopping Basket Analysis
3. Set fields as indicated below, then click Advanced
4. Set minimum support and minimum rule probability as indicated below; OK, then Run

Model Validation/Evaluation

The resulting itemsets appear on the Shopping Basket Bundled Item worksheet, in descending order of the number of times the grouping of items appears in transactions. Itemsets that didn't meet the minimum support criterion (the combination occurs in at least 10% of sales/transactions) are not shown here. Because we had only 11 baskets (students/transactions/sales), the minimum required was Truncate(11 sales * 0.10) = 1 sale, so essentially all our itemsets are shown.
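The support arithmetic above can be checked with a short Python sketch, which also keeps only the item pairs that meet the threshold. It follows the notes' truncation rule; whether Excel rounds exactly this way is an assumption, and the baskets and helper names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def min_support_count(num_baskets, min_support):
    """Minimum basket count for an itemset, truncating as in Truncate(11 * 0.10) = 1."""
    return math.trunc(num_baskets * min_support)


def frequent_pairs(baskets, min_count):
    """Count every 2-item combination and keep those found in at least min_count baskets."""
    counts = Counter(pair for basket in baskets for pair in combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(min_support_count(11, 0.10))  # 1

baskets = [{"db", "stats"}, {"db", "stats"}, {"db", "java"}]
print(frequent_pairs(baskets, min_count=2))
```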

A second worksheet, Shopping Basket Recommendations, shows the subset of itemsets that meet the minimum probability criterion; these are also known as the association rules. For our example the minimum probability criterion was 40% of baskets (students/transactions/sales). In the example below, class #2 appeared in 6 baskets, and class #8 appeared in 5 of those 6, resulting in 5/6 = 83.33% linked sales. The Importance value is a separate score of how useful the rule is: in Microsoft's Association algorithm it is based on the log likelihood of the recommended item given the selected item, so it is not simply the rule's probability.

Model Deployment/Use

If we had completed this association analysis in SSAS, we could have generated DMX queries that users and/or applications could issue to make product recommendations.
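The rule probability on the Shopping Basket Recommendations worksheet, and the kind of recommendation a deployed model would return, can be sketched in Python: score each candidate item by P(candidate | item already in the basket) and keep those at or above the 40% minimum. The history is hypothetical, built so that class 2 appears in 6 baskets and class 8 in 5 of them, matching the 5/6 = 83.33% example; this is not the DMX prediction query SSAS would run.

```python
from collections import Counter


def recommend(history, basket, min_probability=0.40):
    """Suggest items for `basket`, scored by P(other item | item already in basket)."""
    scores = {}
    for item in basket:
        with_item = [b for b in history if item in b]
        if not with_item:
            continue
        counts = Counter(other for b in with_item for other in b if other not in basket)
        for other, n in counts.items():
            probability = n / len(with_item)
            if probability >= min_probability:
                scores[other] = max(scores.get(other, 0.0), probability)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical history: class 2 in 6 baskets, class 8 in 5 of those 6.
history = [{2, 8} for _ in range(5)] + [{2, 5}, {8, 5}]
for item, p in recommend(history, {2}):
    print(f"class #{item}: {p:.2%}")  # class #8: 83.33%
```

Class 5 is dropped because it appears in only 1 of the 6 baskets containing class 2 (about 17%), below the 40% minimum.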


More information

Excel 2010: Create your first spreadsheet

Excel 2010: Create your first spreadsheet Excel 2010: Create your first spreadsheet Goals: After completing this course you will be able to: Create a new spreadsheet. Add, subtract, multiply, and divide in a spreadsheet. Enter and format column

More information

STC: Descriptive Statistics in Excel 2013. Running Descriptive and Correlational Analysis in Excel 2013

STC: Descriptive Statistics in Excel 2013. Running Descriptive and Correlational Analysis in Excel 2013 Running Descriptive and Correlational Analysis in Excel 2013 Tips for coding a survey Use short phrases for your data table headers to keep your worksheet neat, you can always edit the labels in tables

More information

Configuration Manager

Configuration Manager After you have installed Unified Intelligent Contact Management (Unified ICM) and have it running, use the to view and update the configuration information in the Unified ICM database. The configuration

More information

How To Create A Powerpoint Intelligence Report In A Pivot Table In A Powerpoints.Com

How To Create A Powerpoint Intelligence Report In A Pivot Table In A Powerpoints.Com Sage 500 ERP Intelligence Reporting Getting Started Guide 27.11.2012 Table of Contents 1.0 Getting started 3 2.0 Managing your reports 10 3.0 Defining report properties 18 4.0 Creating a simple PivotTable

More information

Microsoft SQL Server" Analysis Services 2008. ArtTennick

Microsoft SQL Server Analysis Services 2008. ArtTennick Microsoft SQL Server" Analysis Services 2008 ArtTennick Contents Acknowledgments Introduction xvii x'x Chapter 1 Cases Queries 1 Examining Source Data 2 Flattened Nested Case Table 3 Specific Source Columns

More information

Access Tutorial 1 Creating a Database. Microsoft Office 2013 Enhanced

Access Tutorial 1 Creating a Database. Microsoft Office 2013 Enhanced Access Tutorial 1 Creating a Database Microsoft Office 2013 Enhanced Objectives Session 1.1 Learn basic database concepts and terms Start and exit Access Explore the Microsoft Access window and Backstage

More information

ACCESS 2007. Importing and Exporting Data Files. Information Technology. MS Access 2007 Users Guide. IT Training & Development (818) 677-1700

ACCESS 2007. Importing and Exporting Data Files. Information Technology. MS Access 2007 Users Guide. IT Training & Development (818) 677-1700 Information Technology MS Access 2007 Users Guide ACCESS 2007 Importing and Exporting Data Files IT Training & Development (818) 677-1700 training@csun.edu TABLE OF CONTENTS Introduction... 1 Import Excel

More information

Microsoft Office 2010

Microsoft Office 2010 Access Tutorial 1 Creating a Database Microsoft Office 2010 Objectives Learn basic database concepts and terms Explore the Microsoft Access window and Backstage view Create a blank database Create and

More information

Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner A Beginner s Guide

Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner A Beginner s Guide Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner A Beginner s Guide Olivia Parr-Rud From Business Analytics Using SAS Enterprise Guide and SAS Enterprise Miner. Full book available

More information

Plotting: Customizing the Graph

Plotting: Customizing the Graph Plotting: Customizing the Graph Data Plots: General Tips Making a Data Plot Active Within a graph layer, only one data plot can be active. A data plot must be set active before you can use the Data Selector

More information

SAP BusinessObjects Business Intelligence (BI) platform Document Version: 4.1, Support Package 3-2014-04-03. Report Conversion Tool Guide

SAP BusinessObjects Business Intelligence (BI) platform Document Version: 4.1, Support Package 3-2014-04-03. Report Conversion Tool Guide SAP BusinessObjects Business Intelligence (BI) platform Document Version: 4.1, Support Package 3-2014-04-03 Table of Contents 1 Report Conversion Tool Overview.... 4 1.1 What is the Report Conversion Tool?...4

More information

EzyScript User Manual

EzyScript User Manual Version 1.4 Z Option 417 Oakbend Suite 200 Lewisville, Texas 75067 www.zoption.com (877) 653-7215 (972) 315-8800 fax: (972) 315-8804 EzyScript User Manual SAP Transaction Scripting & Table Querying Tool

More information

This document is provided "as-is". Information and views expressed in this document, including URLs and other Internet Web site references, may

This document is provided as-is. Information and views expressed in this document, including URLs and other Internet Web site references, may This document is provided "as-is". Information and views expressed in this document, including URLs and other Internet Web site references, may change without notice. Some examples depicted herein are

More information

Microsoft Access 2007

Microsoft Access 2007 How to Use: Microsoft Access 2007 Microsoft Office Access is a powerful tool used to create and format databases. Databases allow information to be organized in rows and tables, where queries can be formed

More information

Jet Data Manager 2012 User Guide

Jet Data Manager 2012 User Guide Jet Data Manager 2012 User Guide Welcome This documentation provides descriptions of the concepts and features of the Jet Data Manager and how to use with them. With the Jet Data Manager you can transform

More information

Tutorial 3 Maintaining and Querying a Database

Tutorial 3 Maintaining and Querying a Database Tutorial 3 Maintaining and Querying a Database Microsoft Access 2013 Objectives Session 3.1 Find, modify, and delete records in a table Hide and unhide fields in a datasheet Work in the Query window in

More information

EXCEL 2007. Using Excel for Data Query & Management. Information Technology. MS Office Excel 2007 Users Guide. IT Training & Development

EXCEL 2007. Using Excel for Data Query & Management. Information Technology. MS Office Excel 2007 Users Guide. IT Training & Development Information Technology MS Office Excel 2007 Users Guide EXCEL 2007 Using Excel for Data Query & Management IT Training & Development (818) 677-1700 Training@csun.edu http://www.csun.edu/training TABLE

More information

Excel -- Creating Charts

Excel -- Creating Charts Excel -- Creating Charts The saying goes, A picture is worth a thousand words, and so true. Professional looking charts give visual enhancement to your statistics, fiscal reports or presentation. Excel

More information

Microsoft Outlook 2007 Calendar Features

Microsoft Outlook 2007 Calendar Features Microsoft Outlook 2007 Calendar Features Participant Guide HR Training and Development For technical assistance, please call 257-1300 Copyright 2007 Microsoft Outlook 2007 Calendar Objectives After completing

More information

Figure 1. An embedded chart on a worksheet.

Figure 1. An embedded chart on a worksheet. 8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the

More information

Creating Dashboards for Microsoft Project Server 2010

Creating Dashboards for Microsoft Project Server 2010 Creating Dashboards for Microsoft Project Server 2010 Authors: Blaise Novakovic, Jean-Francois LeSaux, Steven Haden, Microsoft Consulting Services Information in the document, including URL and other Internet

More information

Access Tutorial 3 Maintaining and Querying a Database. Microsoft Office 2013 Enhanced

Access Tutorial 3 Maintaining and Querying a Database. Microsoft Office 2013 Enhanced Access Tutorial 3 Maintaining and Querying a Database Microsoft Office 2013 Enhanced Objectives Session 3.1 Find, modify, and delete records in a table Hide and unhide fields in a datasheet Work in the

More information

Snap 9 Professional s Scanning Module

Snap 9 Professional s Scanning Module Miami s Quick Start Guide for Using Snap 9 Professional s Scanning Module to Create a Scannable Paper Survey Miami s Survey Solutions Snap 9 Professional Scanning Module Overview The Snap Scanning Module

More information

Module One: Getting Started... 6. Opening Outlook... 6. Setting Up Outlook for the First Time... 7. Understanding the Interface...

Module One: Getting Started... 6. Opening Outlook... 6. Setting Up Outlook for the First Time... 7. Understanding the Interface... 2 CONTENTS Module One: Getting Started... 6 Opening Outlook... 6 Setting Up Outlook for the First Time... 7 Understanding the Interface...12 Using Backstage View...14 Viewing Your Inbox...15 Closing Outlook...17

More information

SQL Server Analysis Services Complete Practical & Real-time Training

SQL Server Analysis Services Complete Practical & Real-time Training A Unit of Sequelgate Innovative Technologies Pvt. Ltd. ISO Certified Training Institute Microsoft Certified Partner SQL Server Analysis Services Complete Practical & Real-time Training Mode: Practical,

More information

Microsoft Excel 2010 Pivot Tables

Microsoft Excel 2010 Pivot Tables Microsoft Excel 2010 Pivot Tables Email: training@health.ufl.edu Web Page: http://training.health.ufl.edu Microsoft Excel 2010: Pivot Tables 1.5 hours Topics include data groupings, pivot tables, pivot

More information

SAS BI Dashboard 4.3. User's Guide. SAS Documentation

SAS BI Dashboard 4.3. User's Guide. SAS Documentation SAS BI Dashboard 4.3 User's Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2010. SAS BI Dashboard 4.3: User s Guide. Cary, NC: SAS Institute

More information

Excel Database Management Microsoft Excel 2003

Excel Database Management Microsoft Excel 2003 Excel Database Management Microsoft Reference Guide University Technology Services Computer Training Copyright Notice Copyright 2003 EBook Publishing. All rights reserved. No part of this publication may

More information

CHAPTER 11: SALES REPORTING

CHAPTER 11: SALES REPORTING Chapter 11: Sales Reporting CHAPTER 11: SALES REPORTING Objectives Introduction The objectives are: Understand the tools you use to evaluate sales data. Use default sales productivity reports to review

More information

Search help. More on Office.com: images templates. Here are some basic tasks that you can do in Microsoft Excel 2010.

Search help. More on Office.com: images templates. Here are some basic tasks that you can do in Microsoft Excel 2010. Page 1 of 8 Excel 2010 Home > Excel 2010 Help and How-to > Getting started with Excel Search help More on Office.com: images templates Basic tasks in Excel 2010 Here are some basic tasks that you can do

More information

SAS BI Dashboard 3.1. User s Guide

SAS BI Dashboard 3.1. User s Guide SAS BI Dashboard 3.1 User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2007. SAS BI Dashboard 3.1: User s Guide. Cary, NC: SAS Institute Inc. SAS BI Dashboard

More information

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets

Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets Applied Data Mining Analysis: A Step-by-Step Introduction Using Real-World Data Sets http://info.salford-systems.com/jsm-2015-ctw August 2015 Salford Systems Course Outline Demonstration of two classification

More information

Developing Web and Mobile Dashboards with Oracle ADF

Developing Web and Mobile Dashboards with Oracle ADF Developing Web and Mobile Dashboards with Oracle ADF In this lab you ll build a web dashboard that displays data from the database in meaningful ways. You are going to leverage Oracle ADF the Oracle Application

More information

Chapter 3 ADDRESS BOOK, CONTACTS, AND DISTRIBUTION LISTS

Chapter 3 ADDRESS BOOK, CONTACTS, AND DISTRIBUTION LISTS Chapter 3 ADDRESS BOOK, CONTACTS, AND DISTRIBUTION LISTS 03Archer.indd 71 8/4/05 9:13:59 AM Address Book 3.1 What Is the Address Book The Address Book in Outlook is actually a collection of address books

More information

Hierarchical Clustering Analysis

Hierarchical Clustering Analysis Hierarchical Clustering Analysis What is Hierarchical Clustering? Hierarchical clustering is used to group similar objects into clusters. In the beginning, each row and/or column is considered a cluster.

More information

Salient Dashboard Designer 5.75. Training Guide

Salient Dashboard Designer 5.75. Training Guide Salient Dashboard Designer 5.75 Training Guide Salient Dashboard Designer Salient Dashboard Designer enables your team to create interactive consolidated visualizations of decision support intelligence,

More information

Core Essentials. Outlook 2010. Module 1. Diocese of St. Petersburg Office of Training Training@dosp.org

Core Essentials. Outlook 2010. Module 1. Diocese of St. Petersburg Office of Training Training@dosp.org Core Essentials Outlook 2010 Module 1 Diocese of St. Petersburg Office of Training Training@dosp.org TABLE OF CONTENTS Topic One: Getting Started... 1 Workshop Objectives... 2 Topic Two: Opening and Closing

More information

Outlook Email. User Guide IS TRAINING CENTER. 833 Chestnut St, Suite 600. Philadelphia, PA 19107 215-503-7500

Outlook Email. User Guide IS TRAINING CENTER. 833 Chestnut St, Suite 600. Philadelphia, PA 19107 215-503-7500 Outlook Email User Guide IS TRAINING CENTER 833 Chestnut St, Suite 600 Philadelphia, PA 19107 215-503-7500 This page intentionally left blank. TABLE OF CONTENTS Getting Started... 3 Opening Outlook...

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information