9. Business Intelligence. 9.1 BI Overview. 9.1 BI Overview. 9.1 BI Overview. 9.1 BI Overview. 9. Business Intelligence

Size: px
Start display at page:

Download "9. Business Intelligence. 9.1 BI Overview. 9.1 BI Overview. 9.1 BI Overview. 9.1 BI Overview. 9. Business Intelligence"

Transcription

1 9. Business Intelligence Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig 9. Business Intelligence 9.1 Business Intelligence Overview 9.2 Principles of Data Mining DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 2 What is Business Intelligence (BI)? The process, technologies and tools needed to turn data into information, information into knowledge and knowledge into plans that drive profitable business action BI comprises data warehousing, business analytic tools, and content/knowledge management Typical BI applications are Customer segmentation Propensity to buy (customer disposition to buy) Customer profitability Fraud detection Customer attrition (loss of customers) Channel optimization (connecting with the customer) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 3 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 4 Customer segmentation What market segments do my customers fall into, and what are their characteristics? Personalize customer relationships for higher customer satisfaction and retention Propensity to buy Which customers are most likely to respond to my promotion? Target the right customers Increase campaign profitability by focusing on the customers most likely to buy DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 5 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 6

2 Customer profitability What is the lifetime profitability of my customer? Make individual business interaction decisions based on the overall profitability of customers Fraud detection How can I tell which transactions are likely to be fraudulent? If your wife has just proposed to increase your life insurance policy, you should probably order pizza for a while Quickly determine fraud and take immediate action to minimize damage DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 7 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 8 Customer attrition Which customer is at risk of leaving? Prevent loss of high-value customers and let go of lower-value customerss Channel optimization What is the best channel to reach my customer in each segment? Interact with customers based on their preference and your need to manage cost Automated decision tools Rule-based systems that provide a solution usually in one functional area to a specific repetitive management problem in one industry E.g., automated loan approval, intelligent price setting Business performance management (BPM) A framework for defining, implementing and managing an enterprise s business strategy by linking objectives with factual measures - key performance indicators DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 9 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 10 Dashboards Provide a comprehensive visual view of corporate performance measures, trends, and exceptions from multiple business areas Allows executives to see hot spots in seconds and explore the situation 9.2 Data Mining What is data mining (knowledge discovery in databases)? Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 11 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 12

3 9.2 Applications Market analysis Targeted marketing/ Customer profiling Find clusters of model customers who share the same characteristics: interest, income level, spending habits, etc. Determine customer purchasing patterns over time Cross-market analysis Associations/co-relations between product sales Prediction based on the association of information 9.2 Applications Corporate analysis and risk management Finance planning and asset evaluation Cash flow analysis and prediction Trend analysis, time series, etc. Resource planning Summarize and compare the resources and spending Competition Monitor competitors and market directions Group customers into classes and a class-based pricing procedure Set pricing strategy in a highly competitive market DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 13 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Data Mining Architecture of DM systemsste Graphical user interface ETL Pattern evaluation Data mining engine Database or data warehouse server Filtering Knowledge-base 9.2 Data Mining Techniques Association (correlation and causality) Multi-dimensional vs. single-dimensional association age(x, ), income(x, K ) buys(x, PC ) [support = 2%, confidence = 60%] contains(t, computer ) contains(x, software ) [1%, 75%] Classification and Prediction Finding models (functions) that describe and distinguish classes or concepts for future predictions Presentation: decision-tree, classification rule, neural network Prediction: predict some unknown or missing numerical values Databases Data Warehouse DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 15 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Data Mining Techniques Cluster analysis Class label is unknown: group data to form new classes, e.g., advertising based on client groups Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity Outlier analysis Outlier: a data object that does not comply with the general behavior of the data Can be considered as noise or exception, but is quite useful in fraud detection, rare events analysis Association rule mining has the objective of finding all co-occurrence relationships (called associations), among data items Classical application: market basket data analysis, which aims to discover how items are purchased by customers in a supermarket E.g., Cheese Wine [support = 10%, confidence = 80%] meaning that 10% of the customers buy cheese and wine together, and 80% of customers buying cheese also buy wine DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 17 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 18

4 Basic concepts of association rules Let be a set of items. Let be a set of transactions where each transaction is a set of items such that. An association rule is an implication of the form: where and Association rule mining market basket analysis example I set of all items sold in a store E.g., = Beef, = Chicken, = Cheese, T set of transactions The content of a customers basket E.g., : Beef, Chicken, Milk; : Beef, Cheese; : Cheese, Wine; : An association rule might be Beef, Chicken Milk, where {Beef, Chicken} is and {Milk} is DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 19 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 20 Rules can be weak or strong The strength of a rule is measured by its support and confidence The support of a rule, is the percentage of transactions in T that contains and Can be seen as an estimate of the probability With as number of transactions in T the support of the rule is: The confidence of a rule, is the percentage of transactions in T containing, that contain Can be seen as estimate of the probability DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 21 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 22 How do we interpret support and confidence? If support is too low, the rule may just occur due to chance Acting on a rule with low support may not be profitable since it covers too few cases If confidence is too low, we cannot reliably predict from Objective of mining association rules is to discover all associated rules in T that have support and confidence greater than a minimum threshold (minsup, minconf)! Finding rules based on support and confidence thresholds Transactions Let minsup = 30% and minconf = 80% T1 T2 Beef, Chicken, Milk Beef, Cheese Chicken, Clothes Milk is valid, [sup = 3/7 (42.84%), conf = 3/3 (100%)] Clothes Milk, Chicken is also valid, and there are more T3 T4 T5 T6 T7 Cheese, Boots Beef, Chicken, Cheese Beef, Chicken, Clothes, Cheese, Milk Clothes, Chicken, Milk Chicken, Milk, Clothes DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 23 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 24

5 This is rather a simplistic view of shopping baskets Some important information is not considered e.g. the quantity of each item purchased, the price paid, There are a large number of rule mining algorithms They use different strategies and data structures Their resulting sets of rules are all the same Approaches in association rule mining Apriori algorithm Mining with multiple minimum supports Mining class association rules The best known mining algorithm is the Apriori algorithm Step 1: find all frequent itemsets (set of items with support minsup) Step 2: use frequent itemsets to generate rules DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 25 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Apriori Algorithm: Step Apriori Algorithm: Step 1 Step 1: frequent itemset generation The key is the apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset E.g., for minsup = 30% Chicken, Clothes, Milk Chicken, Clothes Chicken, Milk Clothes, Milk Chicken Clothes Milk T1 T2 T3 T4 T5 T6 T7 Transactions Beef, Chicken, Milk Beef, Cheese Cheese, Boots Beef, Chicken, Cheese Beef, Chicken, Clothes, Cheese, Milk Clothes, Chicken, Milk Chicken, Milk, Clothes Finding frequent items Find all 1-item frequent itemsets; then all 2-item frequent itemsets, etc. In each iteration k, only consider itemsets that contain a k-1 frequent itemset Optimization: the algorithm assumes that items are sorted in lexicographic order The order is used throughout the algorithm in each itemset {w[1], w[2],, w[k]} represents a k-itemset w consisting of items w[1], w[2],, w[k], where w[1] < w[2] < < w[k] according to the lexicographic order DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 27 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Finding frequent items 9.3 Apriori Algorithm: Step 1 Initial step Find frequent itemsets of size 1: F 1 Generalization, k 2 C k = candidates of size k: those itemsets of size k that could be frequent, given F k-1 F k = those itemsets that are actually frequent, F k C k (need to scan the database once) Generalization of candidates uses F k-1 as input and returns a superset (candidates) of the set of all frequent k-itemsets. It has two steps: Join step: generate all possible candidate itemsets C k of length k, e.g., I k = join(a k-1, B k-1 ) A k-1 = {i 1, i 2,, i k-2, i k-1 } and B k-1 = {i 1, i 2,, i k-2, i k-1 } and i k-1 < i k-1 ; Then I k = {i 1, i 2,, i k-2, i k-1, i k-1 } Prune step: remove those candidates in C k that do not respect the downward closure property (include k-1 non-frequent subsets) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 29 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 30

6 9.3 Apriori Algorithm: Step Apriori Algorithm: Step 1 Generalization e.g., F 3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}} Try joining each 2 candidates from F 3 {1, 2, 3} {1, 3, 4} {1, 2, 4} {1, 3, 4} {1, 3, 5} {2, 3, 4} {2, 3, 4} {1, 2, 3, 4} {1, 3, 5} {1, 3, 4, 5} {1, 2, 4} {1, 3, 4} {1, 3, 5} {2, 3, 4} {1, 3, 5} {2, 3, 4} After join C 4 = {{1, 2, 3, 4}, {1, 3, 4, 5}} Pruning: {1, 2, 3, 4} {1, 3, 4, 5} {1, 2, 3} {1, 2, 4} {1, 3, 4} {2, 3, 4} {1, 3, 4} {1, 3, 5} {1, 4, 5} {3, 4, 5} {1, 2, 3, 4} After pruning C 4 = {{1, 2, 3, 4}} F 3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}} {1, 3, 4, 5} DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 31 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Apriori Algorithm: Step 1 Finding frequent items, example, minsup = 0.5 T300 1, 2, 3, 5 First T scan ({item}:count) T400 2, 5 C 1 : {1}:2, {2}:3, {3}:3, {4}:1, {5}:3 F 1 : {1}:2, {2}:3, {3}:3, {5}:3; {4} has a support of ¼ < 0.5 so it does not belong to the frequent items C 2 = prune(join(f 1 )) join : {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}; prune: C 2 : {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}; (all items belong to F 1 ) TID Items T100 1, 3, 4 T200 2, 3, 5 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Apriori Algorithm: Step 1 Second T scan T100 1, 3, 4 C 2 : {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, T200 2, 3, 5 {3,5}:2 T300 1, 2, 3, 5 T400 2, 5 F 2 : {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2 Join: we could join {1,3} only with {1,4} or {1,5}, but they are not in F 2. The only possible join in F 2 is {2, 3} with {2, 5} resulting in {2, 3, 5}; prune({2, 3, 5}): {2, 3}, {2, 5}, {3, 5} all belong to F 2, hence, C 3 : {2, 3, 5} Third T scan {2, 3, 5}:2, then sup({2, 3, 5}) = 50%, minsup condition is fulfilled. Then F 3 : {2, 3, 5} TID Items DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Apriori Algorithm: Step 2 Step 2: generating rules from frequent itemsets Frequent itemsets are not the same as association rules One more step is needed to generate association rules: for each frequent itemset for each proper nonempty subset of : Let = \ ; is an association rule if: Confidence( ) minconf, Support( ) := = support(i) Confidence( ) := = support( ) / support( ) 9.3 Apriori Algorithm: Step 2 Rule generation example, minconf = 50% Suppose {2, 3, 5} is a frequent itemset, with sup=50%, as calculated in step 1 Proper nonempty subsets: {2, 3}, {2, 5}, {3, 5}, {2}, {3}, {5}, with sup=50%, 75%, 50%, 75%, 75%, 75% respectively These generate the following association rules: All rules have support = support(i) = 50% TID T100 T200 T400 Items 1, 3, 4 2, 3, 5 2, 5 T300 1, 2, 3, 5 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 35 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 36

7 9.3 Apriori Algorithm: Step Apriori Algorithm Rule generation, summary In order to obtain, we need to know support( ) and support( ) All the required information for confidence computation has already been recorded in itemset generation No need to read the transactions data any more This step is not as time-consuming as frequent itemsets generation Apriori Algorithm, summary If k is the size of the largest itemset, then it makes at most k passes over data (in practice, k is bounded e.g., 10) The mining exploits sparseness of data, and high minsup and minconf thresholds High minsup threshold makes it impossible to find rules involving rare items in the data. The solution is a mining with multiple minimum supports approach DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 37 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 38 Mining with multiple minimum supports Single minimum support assumes that all items in the data are of the same nature and/or have similar frequencies, which is incorrect In practice, some items appear very frequently in the data, while others rarely appear E.g., in a supermarket, people buy cooking pans much less frequently than they buy bread and milk Rare item problem: if the frequencies of items vary significantly, we encounter two problems If minsup is set too high, those rules that involve rare items will not be found To find rules that involve both frequent and rare items, minsup has to be set very low. This may cause combinatorial explosion because those frequent items will be associated with one another in all possible ways DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 39 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 40 Multiple Minimum Supports Each item can have a minimum item support Different support requirements for different rules To prevent very frequent items and very rare items from appearing in the same itemset, we introduce a support difference constraint ( ) Minsup of a rule Let MIS(i) be the minimum item support (MIS) value of item i. The minsup of a rule R is the lowest MIS value of the items in the rule: Rule satisfies its minimum support if its actual support is E.g., the user-specified MIS values are as follows: MIS(bread) = 2%, MIS(shoes) = 0.1%, MIS(clothes) = 0.2% clothes bread [sup=0.15%,conf =70%] doesn t satisfy its minsup clothes shoes [sup=0.15%,conf =70%] satisfies its minsup DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 41 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 42

8 Downward closure property is not valid anymore E.g., consider four items 1, 2, 3 and 4 in a database Their minimum item supports are MIS(1) = 10%, MIS(2) = 20%, MIS(3) = 5%, MIS(4) = 6% {1, 2} with a support of 9% is infrequent since min(10%, 20%) > 9%, but {1, 2, 3} could be frequent, if it would have a support of e.g., 7% If applied, downward closure, eliminates {1, 2} so that {1, 2, 3} is never evaluated How do we solve the downward closure property problem? Sort all items in according to their MIS values (make it a total order) The order is used throughout the algorithm in each itemset Each itemset w is of the following form: {w[1], w[2],, w[k]}, consisting of items, w[1], w[2],, w[k], where MIS(w[1]) MIS(w[2]) MIS(w[k]) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 43 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 44 : Step 1 Multiple minimum supports is an extension of the Apriori algorithm Step 1: frequent itemset generation Initial step Produce the seeds for generating candidate itemsets Candidate generation For k = 2 Generalization For k 2, pruning step differs from the Apriori algorithm Step 2: rule generation Step 1: frequent itemset generation E.g., ={1, 2, 3, 4}, with given MIS(1)=10%, MIS(2)=20%, MIS(3)=5%, MIS(4)=6%, and consider n=100 transactions: Initial step Sort I according to the MIS value of each item. Let M represent the sorted items Sort, in = {3, 4, 1, 2} Scan the data once to record the support count of each item E.g., {3}:6, {4}:3, {1}:9 and {2}:25 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 45 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 46 : Step 1 MIS(1)=10%, MIS(2)=20%, MIS(3)=5%, MIS(4)=6%, n=100 {3}:6, {4}:3, {1}:9 {2}:25 Go through the items in M to find the first item i, that meets MIS(i). Insert it into a list of seeds L For each subsequent item j in M (after i), if sup(j) MIS(i), then insert j in L MIS(3) = 5%; sup ({3}) = 6%; sup(3) > MIS(3), so L={3} Sup({4}) = 3% < MIS(3), so L remains {3} Sup({1}) = 9% > MIS(3), L = {3, 1} Sup({2}) = 25% > MIS(3), L = {3, 1, 2} Calculate F 1 from L based on MIS of each item in L F 1 = {{3}, {2}}, since sup({1}) = 9% < MIS(1) Why not eliminate {1} directly? Why calculate L and not directly F? Downward closure property is not valid from F anymore due to multiple minimum supports : Step 1 Items Candidate generation, k = 2. Let = 10% (support difference) L {3, 1, 2} Take each item (seed) from L in order. Use L and not F 1 due to the downward closure property invalidity! Test the chosen item against its MIS: sup({3}) MIS(3) If true, then we can use this value to form a level 2 candidate If not, then go to the next element in L If true, e.g., then try to form a 2 level candidate together with each of the next items in L, e.g., {3, 1}, then {3, 2} MIS SUP DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 47 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 48

9 : Step 1 {3, 1} is a candidate : and sup({1}) = 9%; MIS(3) = 5%; sup({3}) = 6%; := 10% 9% > 5% and 6%-9% < 10%, thus C 2 = {3, 1} Now try {3, 2} sup({2}) = 25%; 25% > 5% but 6%-25% > 10% so this candidate will be rejected due to the support difference constraint Items MIS SUP L {3, 1, 2} : Step 1 Items Pick the next seed from L, i.e. 1 (needed to try {1,2}) L {3, 1, 2} sup({1}) < MIS(1) so we can not use 1 as seed! Candidate generation for k=2 remains C 2 = {{3, 1}} Now read the transaction list and calculate the support of each item in C 2. Let s assume sup({3, 1})=6, which is larger than min(mis(3), MIS(1)) Thus F 2 = {{3, 1}} MIS SUP DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 49 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 50 : Step 1 : Step 1 Generalization, k > 2 uses L k-1 as input and returns a superset (candidates) of the set of all frequent k- itemsets. It has two steps: Join step: same as in the case of k=2 I k = join(a k-1, B k-1 ) A k-1 = {i 1, i 2,, i k-2, i k-1 } and B k-1 = {i 1, i 2,, i k-2, i k-1 } and i k-1 < i k-1 and sup(i k-1 ) sup(i k-1 ) Then I k = {i 1, i 2,, i k-2, i k-1, i k-1 } Prune step: for each (k-1) subset s of I k, if s is not in F k-1, then I k can be removed from C k (it is not a good candidate). There is however one exception to this rule, when s does not include the first item from I k Generalization, k > 2 example: let s consider F3={{1, 2, 3}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {1, 4, 6}, {2, 3, 5}} After join we obtain {1, 2, 3, 5}, {1, 3, 4, 5} and {1, 4, 5, 6} (we do not consider the support difference constraint) After pruning we get C 4 = {{1, 2, 3, 5}, {1, 3, 4, 5}} {1, 2, 3, 5} is ok {1, 3, 4, 5} is not deleted although {3, 4, 5} F 3, because MIS(3) > MIS(1). If MIS(3) = MIS(1), it could be deleted {1, 4, 5, 6} is deleted because {1, 5, 6} F 3 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 51 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 52 : Step 2 Step 2: rule generation Downward closure property is not valid anymore, therefore we have frequent k order items, which contain (k-1) non-frequent sub-items For those non-frequent items we do not have the support value recorded This problem arises when we form rules of the form A,B C, where MIS(C) = min(mis(a), MIS(B), MIS(C)) Conf(A,B C) = sup({a,b,c}) / sup({a,b}) We have the frequency of {A, B, C} because it is frequent, but we don t have the frequency to calculate support of {AB} since it is not frequent by itself This is called head-item problem : Step 2 Items {Clothes},{Bread} {Shoes, Clothes, Bread} SUP Rule generation Items Bread Clothes Shoes example MIS {Shoes, Clothes, Bread} is a frequent itemset since MIS({Shoes, Clothes, Bread}) = 0.1 < sup({shoes, Clothes, Bread}) = 0.12 However {Clothes, Bread} is not since neither Clothes nor Bread can seed frequent itemsets So we may not calculate the confidence of all rules depending on Shoes, i.e. rules: Clothes, Bread Shoes Clothes Shoes, Bread Bread Shoes, Clothes DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 53 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 54

10 : Step 2 Head-item problem e.g.: Clothes, Bread Shoes; Clothes Shoes, Bread; Bread Shoes, Clothes. If we have some item on the right side of a rule, which has the minimum MIS (e.g. Shoes), we may not be able to calculate the confidence without reading the data again Advantages It is a more realistic model for practical applications The model enables us to find rare item rules, but without producing a huge number of meaningless rules with frequent items By setting MIS values of some items to 100% (or more), we can effectively instruct the algorithms not to generate rules only involving these items DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 55 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Class Association Rules Mining Class Association Rules (CAR) Normal association rule mining doesn t have a target It finds all possible rules that exist in data, i.e., any item can appear as a consequent or a condition of a rule However, in some applications, the user is interested in some targets E.g. the user has a set of text documents from some known topics. He wants to find out what words are associated or correlated with each topic CAR, example A text document data set doc 1: Student, Teach, School : Education doc 2: Student, School : Education doc 3: Teach, School, City, Game : Education doc 4: Baseball, Basketball : Sport doc 5: Basketball, Player, Spectator : Sport doc 6: Baseball, Coach, Game, Team : Sport doc 7: Basketball, Team, City, Game : Sport Let minsup = 20% and minconf = 60%. Examples of class association rules: Student, School Education [sup= 2/7, conf = 2/2] Game Sport [sup= 2/7, conf = 2/3] DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 57 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig Class Association Rules CAR can also be extended with multiple minimum supports The user can specify different minimum supports to different classes, which effectively assign a different minimum support to rules of each class E.g. a data set with two classes, Yes and No. We may want rules of class Yes to have the minimum support of 5% and rules of class No to have the minimum support of 10% By setting minimum class supports to 100% we can skip generating rules of those classes Tools Open source projects Weka RapidMiner Commercial Intelligent Miner, replaced by DB2 Data Warehouse Editions PASW Modeler, developed by SPSS Oracle Data Mining (ODM) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 59 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 60

11 Apriori algorithm, on CAR data-sets Class values: unacceptable, acceptable, good, very good And 6 attributes: Buying cost: vhigh, high, med, low Maintenance costs: vhigh, high, med, low Apriori algorithm Number of rules Support interval Upper and lower bound Class index Confidence DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 61 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 62 Apriori algorithm Largest frequent itemsets comprise 3 items Most powerful rules are simple rules Most of the people find 2 person cars unacceptable Lower confidence rule (62%) If 4 seat car, is found unacceptable, the it is because it s unsafe (rule 30) DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 63 DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 64 Open source projects also have their limits Car accidents data set rows 54 attributes Summary Business Intelligence Overview Customer segmentation, propensity to buy, customer profitability, attrition, etc. Data Mining Overview Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases Association Rule Mining Apriori algorithm, support, confidence, downward closure property Multiple minimum supports solve the rare-item problem Head-item problem DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 65 Data Warehousing & OLAP Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 66

12 Next lecture Data Mining Time Series data Trend and Similarity Search Analysis Sequence Patterns DW & DM Wolf-Tilo Balke Institut für Informationssysteme TU Braunschweig 67

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Association Rule Mining

Association Rule Mining Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India sk.obaidullah@gmail.com

More information

Data Mining Apriori Algorithm

Data Mining Apriori Algorithm 10 Data Mining Apriori Algorithm Apriori principle Frequent itemsets generation Association rules generation Section 6 of course book TNM033: Introduction to Data Mining 1 Association Rule Mining (ARM)

More information

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

More information

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 Chapter 4 Data Mining A Short Introduction 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Data Mining Introduction

Data Mining Introduction Data Mining Introduction Organization Lectures Mondays and Thursdays from 10:30 to 12:30 Lecturer: Mouna Kacimi Office hours: appointment by email Labs Thursdays from 14:00 to 16:00 Teaching Assistant:

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Winter Semester 2012/2013 Free University of Bozen, Bolzano DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html Organization

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Winter Semester 2010/2011 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

More information

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms Data Mining Techniques forcrm Data Mining The non-trivial extraction of novel, implicit, and actionable knowledge from large datasets. Extremely large datasets Discovery of the non-obvious Useful knowledge

More information

Data Mining Applications in Manufacturing

Data Mining Applications in Manufacturing Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent

More information

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical

More information

Class 10. Data Mining and Artificial Intelligence. Data Mining. We are in the 21 st century So where are the robots?

Class 10. Data Mining and Artificial Intelligence. Data Mining. We are in the 21 st century So where are the robots? Class 1 Data Mining Data Mining and Artificial Intelligence We are in the 21 st century So where are the robots? Data mining is the one really successful application of artificial intelligence technology.

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM

What is Customer Relationship Management? Customer Relationship Management Analytics. Customer Life Cycle. Objectives of CRM. Three Types of CRM Relationship Management Analytics What is Relationship Management? CRM is a strategy which utilises a combination of Week 13: Summary information technology policies processes, employees to develop profitable

More information

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1 Jens Teubner Data Warehousing Winter 2015/16 1 Data Warehousing Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Winter 2015/16 Jens Teubner Data Warehousing Winter 2015/16 13 Part II Overview

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.

More information

Data Mining: Foundation, Techniques and Applications

Data Mining: Foundation, Techniques and Applications Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Analytics on Big Data

Analytics on Big Data Analytics on Big Data Riccardo Torlone Università Roma Tre Credits: Mohamed Eltabakh (WPI) Analytics The discovery and communication of meaningful patterns in data (Wikipedia) It relies on data analysis

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Chapter 6 - Enhancing Business Intelligence Using Information Systems Chapter 6 - Enhancing Business Intelligence Using Information Systems Managers need high-quality and timely information to support decision making Copyright 2014 Pearson Education, Inc. 1 Chapter 6 Learning

More information

Project Report. 1. Application Scenario

Project Report. 1. Application Scenario Project Report In this report, we briefly introduce the application scenario of association rule mining, give details of apriori algorithm implementation and comment on the mined rules. Also some instructions

More information

A Survey on Association Rule Mining in Market Basket Analysis

A Survey on Association Rule Mining in Market Basket Analysis International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey

More information

Data Mining - Introduction

Data Mining - Introduction Data Mining - Introduction Peter Brezany Institut für Scientific Computing Universität Wien Tel. 4277 39425 Sprechstunde: Di, 13.00-14.00 Outline Business Intelligence and its components Knowledge discovery

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

Data Warehousing and Data Mining for improvement of Customs Administration in India. Lessons learnt overseas for implementation in India

Data Warehousing and Data Mining for improvement of Customs Administration in India. Lessons learnt overseas for implementation in India Data Warehousing and Data Mining for improvement of Customs Administration in India Lessons learnt overseas for implementation in India Participants Shailesh Kumar (Group Leader) Sameer Chitkara (Asst.

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

ETPL Extract, Transform, Predict and Load

ETPL Extract, Transform, Predict and Load ETPL Extract, Transform, Predict and Load An Oracle White Paper March 2006 ETPL Extract, Transform, Predict and Load. Executive summary... 2 Why Extract, transform, predict and load?... 4 Basic requirements

More information

Databases - Data Mining. (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25

Databases - Data Mining. (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25 Databases - Data Mining (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25 This lecture This lecture introduces data-mining through market-basket analysis. (GF Royle, N Spadaccini 2006-2010)

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, University of Indonesia Objectives

More information

Business Intelligence Solutions for Gaming and Hospitality

Business Intelligence Solutions for Gaming and Hospitality Business Intelligence Solutions for Gaming and Hospitality Prepared by: Mario Perkins Qualex Consulting Services, Inc. Suzanne Fiero SAS Objective Summary 2 Objective Summary The rise in popularity and

More information

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved

CHAPTER SIX DATA. Business Intelligence. 2011 The McGraw-Hill Companies, All Rights Reserved CHAPTER SIX DATA Business Intelligence 2011 The McGraw-Hill Companies, All Rights Reserved 2 CHAPTER OVERVIEW SECTION 6.1 Data, Information, Databases The Business Benefits of High-Quality Information

More information

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University

Bussiness Intelligence and Data Warehouse. Tomas Bartos CIS 764, Kansas State University Bussiness Intelligence and Data Warehouse Schedule Bussiness Intelligence (BI) BI tools Oracle vs. Microsoft Data warehouse History Tools Oracle vs. Others Discussion Business Intelligence (BI) Products

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Chapter 4 Getting Started with Business Intelligence

Chapter 4 Getting Started with Business Intelligence Chapter 4 Getting Started with Business Intelligence Learning Objectives and Learning Outcomes Learning Objectives Getting started on Business Intelligence 1. Understanding Business Intelligence 2. The

More information

Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar

Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar Data Mining: Association Analysis Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011 DATA MINING CONCEPTS AND TECHNIQUES Marek Maurizio E-commerce, winter 2011 INTRODUCTION Overview of data mining Emphasis is placed on basic data mining concepts Techniques for uncovering interesting data

More information

Introduction to Data Mining

Introduction to Data Mining Bioinformatics Ying Liu, Ph.D. Laboratory for Bioinformatics University of Texas at Dallas Spring 2008 Introduction to Data Mining 1 Motivation: Why data mining? What is data mining? Data Mining: On what

More information

Data Mining for Fun and Profit

Data Mining for Fun and Profit Data Mining for Fun and Profit Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. - Ian H. Witten, Data Mining: Practical Machine Learning Tools

More information

Data Mining for Business Analytics

Data Mining for Business Analytics Data Mining for Business Analytics Lecture 2: Introduction to Predictive Modeling Stern School of Business New York University Spring 2014 MegaTelCo: Predicting Customer Churn You just landed a great analytical

More information

Data Mining Techniques

Data Mining Techniques 15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses

More information

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

More information

Data Mining: An Introduction

Data Mining: An Introduction Data Mining: An Introduction Michael J. A. Berry and Gordon A. Linoff. Data Mining Techniques for Marketing, Sales and Customer Support, 2nd Edition, 2004 Data mining What promotions should be targeted

More information

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key

More information

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Data Mining. Vera Goebel. Department of Informatics, University of Oslo Data Mining Vera Goebel Department of Informatics, University of Oslo 2011 1 Lecture Contents Knowledge Discovery in Databases (KDD) Definition and Applications OLAP Architectures for OLAP and KDD KDD

More information

Mining Association Rules. Mining Association Rules. What Is Association Rule Mining? What Is Association Rule Mining? What is Association rule mining

Mining Association Rules. Mining Association Rules. What Is Association Rule Mining? What Is Association Rule Mining? What is Association rule mining Mining Association Rules What is Association rule mining Mining Association Rules Apriori Algorithm Additional Measures of rule interestingness Advanced Techniques 1 2 What Is Association Rule Mining?

More information

Hexaware Webinar Series Presents: The Presentation Will Begin Momentarily

Hexaware Webinar Series Presents: The Presentation Will Begin Momentarily Hexaware Webinar Series Presents: Data Mining - Seeing the future and knowing the patterns of your business using your organization data Rajesh Natarajan Senior Consultant, Hexaware Technologies Dec 5

More information

Mining Sequence Data. JERZY STEFANOWSKI Inst. Informatyki PP Wersja dla TPD 2009 Zaawansowana eksploracja danych

Mining Sequence Data. JERZY STEFANOWSKI Inst. Informatyki PP Wersja dla TPD 2009 Zaawansowana eksploracja danych Mining Sequence Data JERZY STEFANOWSKI Inst. Informatyki PP Wersja dla TPD 2009 Zaawansowana eksploracja danych Outline of the presentation 1. Realtionships to mining frequent items 2. Motivations for

More information

Fuzzy Association Rules

Fuzzy Association Rules Vienna University of Economics and Business Administration Fuzzy Association Rules An Implementation in R Master Thesis Vienna University of Economics and Business Administration Author Bakk. Lukas Helm

More information

How To Build A Decision Support System

How To Build A Decision Support System Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13. Decision Support Systems 13. Decision

More information

Users Interest Correlation through Web Log Mining

Users Interest Correlation through Web Log Mining Users Interest Correlation through Web Log Mining F. Tao, P. Contreras, B. Pauer, T. Taskaya and F. Murtagh School of Computer Science, the Queen s University of Belfast; DIW-Berlin Abstract When more

More information

Data Mining Approach in Security Information and Event Management

Data Mining Approach in Security Information and Event Management Data Mining Approach in Security Information and Event Management Anita Rajendra Zope, Amarsinh Vidhate, and Naresh Harale Abstract This paper gives an overview of data mining field & security information

More information

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

A Knowledge Management Framework Using Business Intelligence Solutions

A Knowledge Management Framework Using Business Intelligence Solutions www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For

More information

Five Predictive Imperatives for Maximizing Customer Value

Five Predictive Imperatives for Maximizing Customer Value Five Predictive Imperatives for Maximizing Customer Value Applying predictive analytics to enhance customer relationship management Contents: 1 Customers rule the economy 1 Many CRM initiatives are failing

More information

DATA MINING AND WAREHOUSING CONCEPTS

DATA MINING AND WAREHOUSING CONCEPTS CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz March 1, 2015 The Database Approach to Data Management Database: Collection of related files containing records on people, places, or things.

More information

MBA 8473 - Data Mining & Knowledge Discovery

MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 1 Learning Objectives 55. Explain what is data mining? 56. Explain two basic types of applications of data mining. 55.1. Compare and contrast various

More information

Association Analysis: Basic Concepts and Algorithms

Association Analysis: Basic Concepts and Algorithms 6 Association Analysis: Basic Concepts and Algorithms Many business enterprises accumulate large quantities of data from their dayto-day operations. For example, huge amounts of customer purchase data

More information

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management

Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Using reporting and data mining techniques to improve knowledge of subscribers; applications to customer profiling and fraud management Paper Jean-Louis Amat Abstract One of the main issues of operators

More information

CUSTOMER RELATIONSHIP MANAGEMENT (CRM) CII Institute of Logistics

CUSTOMER RELATIONSHIP MANAGEMENT (CRM) CII Institute of Logistics CUSTOMER RELATIONSHIP MANAGEMENT (CRM) CII Institute of Logistics Session map Session1 Session 2 Introduction The new focus on customer loyalty CRM and Business Intelligence CRM Marketing initiatives Session

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

Data Mining for Retail Website Design and Enhanced Marketing

Data Mining for Retail Website Design and Enhanced Marketing Data Mining for Retail Website Design and Enhanced Marketing Inaugural-Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultät der Heinrich-Heine-Universität Düsseldorf

More information

CONTEMPORARY DECISION SUPPORT AND KNOWLEDGE MANAGEMENT TECHNOLOGIES

CONTEMPORARY DECISION SUPPORT AND KNOWLEDGE MANAGEMENT TECHNOLOGIES I International Symposium Engineering Management And Competitiveness 2011 (EMC2011) June 24-25, 2011, Zrenjanin, Serbia CONTEMPORARY DECISION SUPPORT AND KNOWLEDGE MANAGEMENT TECHNOLOGIES Slavoljub Milovanovic

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),

More information

An Empirical Study of Application of Data Mining Techniques in Library System

An Empirical Study of Application of Data Mining Techniques in Library System An Empirical Study of Application of Data Mining Techniques in Library System Veepu Uppal Department of Computer Science and Engineering, Manav Rachna College of Engineering, Faridabad, India Gunjan Chindwani

More information

Data Warehousing and Data Mining. A.A. 04-05 Datawarehousing & Datamining 1

Data Warehousing and Data Mining. A.A. 04-05 Datawarehousing & Datamining 1 Data Warehousing and Data Mining A.A. 04-05 Datawarehousing & Datamining 1 Outline 1. Introduction and Terminology 2. Data Warehousing 3. Data Mining Association rules Sequential patterns Classification

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Data Mining as Part of Knowledge Discovery in Databases (KDD)

Data Mining as Part of Knowledge Discovery in Databases (KDD) Mining as Part of Knowledge Discovery in bases (KDD) Presented by Naci Akkøk as part of INF4180/3180, Advanced base Systems, fall 2003 (based on slightly modified foils of Dr. Denise Ecklund from 6 November

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Five Predictive Imperatives for Maximizing Customer Value

Five Predictive Imperatives for Maximizing Customer Value Executive Brief Five Predictive Imperatives for Maximizing Customer Value Applying Predictive Analytics to enhance customer relationship management Table of contents Executive summary...2 The five predictive

More information

STRATEGIC AND FINANCIAL PERFORMANCE USING BUSINESS INTELLIGENCE SOLUTIONS

STRATEGIC AND FINANCIAL PERFORMANCE USING BUSINESS INTELLIGENCE SOLUTIONS STRATEGIC AND FINANCIAL PERFORMANCE USING BUSINESS INTELLIGENCE SOLUTIONS Boldeanu Dana Maria Academia de Studii Economice Bucure ti, Facultatea Contabilitate i Informatic de Gestiune, Pia a Roman nr.

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

Starting Smart with Oracle Advanced Analytics

Starting Smart with Oracle Advanced Analytics Starting Smart with Oracle Advanced Analytics Great Lakes Oracle Conference Tim Vlamis Thursday, May 19, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed

More information

III JORNADAS DE DATA MINING

III JORNADAS DE DATA MINING III JORNADAS DE DATA MINING EN EL MARCO DE LA MAESTRÍA EN DATA MINING DE LA UNIVERSIDAD AUSTRAL PRESENTACIÓN TECNOLÓGICA IBM Alan Schcolnik, Cognos Technical Sales Team Leader, IBM Software Group. IAE

More information

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Data Mining: Introduction Lecture Notes for Chapter 1 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused - Web

More information

Retail POS Data Analytics Using MS Bi Tools. Business Intelligence White Paper

Retail POS Data Analytics Using MS Bi Tools. Business Intelligence White Paper Retail POS Data Analytics Using MS Bi Tools Business Intelligence White Paper Introduction Overview There is no doubt that businesses today are driven by data. Companies, big or small, take so much of

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

More information

Mining the Most Interesting Web Access Associations

Mining the Most Interesting Web Access Associations Mining the Most Interesting Web Access Associations Li Shen, Ling Cheng, James Ford, Fillia Makedon, Vasileios Megalooikonomou, Tilmann Steinberg The Dartmouth Experimental Visualization Laboratory (DEVLAB)

More information

Data Mining: Overview. What is Data Mining?

Data Mining: Overview. What is Data Mining? Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,

More information

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 Online Analytic Processing OLAP 2 OLAP OLAP: Online Analytic Processing OLAP queries are complex queries that Touch large amounts of data Discover

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK

How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK How Organisations Are Using Data Mining Techniques To Gain a Competitive Advantage John Spooner SAS UK Agenda Analytics why now? The process around data and text mining Case Studies The Value of Information

More information

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2 Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on

More information

How To Understand The Value Of Data Mining

How To Understand The Value Of Data Mining Getting Real Business Value out of Oracle BI and Oracle Data Mining Antony Heljula Brendan Tierney Peak Indicators Limited Agenda Aim of Presentation About Oracle Data-Mining What you need to know about

More information