Data Mining in SQL Server 2005

SQL Server 2005 Data Mining structures
- SQL Server 2005 mining structures are created and managed using Visual Studio 2005
  - Business Intelligence Projects
  - Analysis Services Project
- The generated models are deployed into an Analysis Services database

Basic steps
- Loading the data into a SQL Server 2005 database
- Creating the project in Visual Studio 2005
  - Business Intelligence Projects
  - Analysis Services Project
- Defining a data source view
- Data mining algorithm choice
- Parameter settings
- Project deployment
- Model generation and browsing

Example: classification
- You want to create a classification model based on the Microsoft Decision Trees mining structure
- The data have been loaded into a SQL Server 2005 relational database
  - Database name: Iris
  - Training data: iristrain table
  - Test data: iristest table

Creating the project
- Create a new project in Visual Studio 2005
  - Business Intelligence Projects
  - Analysis Services Project

Data source
- Define a new data source
- Connect to the Iris database in SQL Server 2005
- Impersonation Information
  - Default option

Data source view
- Define a new data source view
- Choose only the tables you need to train and test the model
  - In this example, both the iristrain and iristest tables are needed
- Name matching
  - Disable the option "Create logical relationships by matching columns"

Analysis Server
- The models are stored in an Analysis Server database
- Define the Analysis Server and the database in which to store the models
  - Deployment project properties

Model definition
- Create a new Mining Structure
- Choose the data source type
  - Relational table
  - Cube
- Choose a data mining algorithm
- Choose the data source and the table with the training data
- Choose the table fields and their meaning
- Give a name to the model

Model generation
- To create the model, you have to deploy the model defined so far
- The model is saved into the Analysis Server database

Browsing the model
- Browse and analyze the generated model
  - Choose Browse from the right-click context menu of the Mining Structure

Model validation
- Model validation can be performed by classifying test data, i.e., new, previously unseen data
- Select the Mining Accuracy Chart tab
- Choose the test data
  - Click the Select Case Table button in the Select Input Table(s) window
  - In this example, choose the iristest table

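Validating a classifier on test data amounts to comparing the predicted class of each test row with its actual class. The following is a minimal Python sketch of that comparison, outside of Analysis Services. The class names and the two label lists are hypothetical values invented for illustration; they are not taken from the iristest table. Rows are actual classes and columns are predicted classes here, which may differ from the layout a given tool uses.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs.

    Returns the sorted label list and a nested dict such that
    matrix[a][p] is the number of rows whose actual class is `a`
    and whose predicted class is `p`.
    """
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    return labels, matrix


def accuracy(actual, predicted):
    """Fraction of test rows whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical test labels and model predictions (illustration only).
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

labels, matrix = confusion_matrix(actual, predicted)
for a in labels:
    print(a, [matrix[a][p] for p in labels])
print("accuracy:", accuracy(actual, predicted))
```

Off-diagonal cells are the misclassifications: in this made-up data, one versicolor row is predicted as virginica.
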
Model validation
- Confusion matrix
  - The Classification Matrix tab presents the confusion matrix obtained by classifying test data

Model validation
- Select the Lift Chart tab
- The Lift Chart lets you analyze the model performance on a particular class of test data
  - Choose the class you want to evaluate the model on in the Column Mapping tab (at the bottom)
  - Then, select the Lift Chart tab to see the result

Model validation
- Select Profit Chart from the combo box menu in the Lift Chart tab
- Useful for marketing campaigns
- Profit analysis for different numbers of people contacted, according to
  - Used model
  - Class (label) of the people to contact
  - Overall population cardinality
  - Fixed cost of the marketing campaign
  - Individual cost for each contacted person
  - Revenue for each individual

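The profit chart inputs listed above can be combined into a simple profit estimate. The sketch below shows one plausible reading of those inputs and is not necessarily the formula Analysis Services applies: test rows are ranked by the model's predicted probability for the target class, the counts are scaled from the test set to the overall population, and profit is revenue from responders minus per-contact cost minus the fixed cost. The function name, the scored rows, and all monetary values are hypothetical.

```python
def profit_curve(scored, target, population, fixed_cost,
                 cost_per_contact, revenue_per_response):
    """Estimate profit when contacting the top-ranked part of the population.

    scored: list of (predicted probability of `target`, actual label) pairs,
            one per test row.
    Returns a list of (fraction of population contacted, estimated profit).
    """
    # Contact the people the model considers most likely to respond first.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    scale = population / len(ranked)  # test-set rows -> population size
    curve = []
    hits = 0
    for contacted, (_, label) in enumerate(ranked, start=1):
        if label == target:
            hits += 1
        profit = (revenue_per_response * hits * scale
                  - cost_per_contact * contacted * scale
                  - fixed_cost)
        curve.append((contacted / len(ranked), profit))
    return curve


# Hypothetical scored test rows: (probability of "yes", actual label).
scored = [(0.9, "yes"), (0.8, "yes"), (0.6, "no"), (0.4, "yes"), (0.2, "no")]

curve = profit_curve(scored, target="yes", population=1000, fixed_cost=500,
                     cost_per_contact=3, revenue_per_response=15)
best = max(curve, key=lambda point: point[1])
for fraction, profit in curve:
    print(f"{fraction:.0%} contacted -> profit {profit:.0f}")
print("best:", best)
```

Under these assumptions the curve peaks before the whole population is contacted, because the last people reached add cost without adding responders.
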
Parameter settings
- Each model is characterized by some parameters
- The parameters set different features of the chosen mining structure, i.e., the data mining algorithm used by the model
- To set the mining structure parameters
  - Select the Mining Models tab
  - Select the Mining Structure, e.g., Microsoft_Decision_Trees
  - From the right-click context menu, choose Set algorithm parameters