Unit 4: DECISION ANALYSIS

Lesson 37: Decision Theory and Decision Trees

Learning objectives:

o To learn how to use decision trees.
o To structure complex decision-making problems.
o To analyze such problems.
o To find out the limitations and advantages of decision tree analysis.

Helping you to think your way to an excellent life!

Hello students,

Decision trees are excellent tools for making financial or number-based decisions where a lot of complex information needs to be taken into account. They provide an effective structure in which alternative decisions and the implications of taking those decisions can be laid down and evaluated.

They also help you to form an accurate, balanced picture of the risks and rewards that can result from a particular choice.

What is a decision tree?

A decision tree is a classifier in the form of a tree structure. The decision criteria discussed so far apply to situations where a single decision has to be made; a decision tree is a technique for analyzing decision situations that unfold in stages. It is a schematic diagram consisting of nodes and branches. There are some advantages of using this technique:

1. It provides a pictorial representation of the sequential decision process, where a payoff-table approach would require one payoff table per decision stage.
2. Expected values (or expected opportunity losses) are easier to compute; the calculations can be done directly on the tree diagram.
3. More than one decision maker can easily be involved in the decision process.

Only three rules govern the construction of a decision tree: branches emanating from a decision node (a square) represent the alternative decisions possible at that point; branches emanating from an event node (a circle) represent the possible states of nature, each carrying its probability of occurrence; and payoffs are recorded at the ends of the terminal branches. If a decision situation requires a series of decisions, a payoff-table approach cannot accommodate the multiple layers of decision making, and a decision tree becomes the best method. The tree diagram will simply contain more squares (decision nodes), and we can still apply the expected value criterion at each decision node, working from the right to the left of the diagram (the backward process).
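The backward process can be made concrete with a small sketch. The following Python is my own illustration (the nested-dictionary tree representation and all names are assumptions, not part of the lesson):

# A minimal rollback (backward induction) sketch.
# Representation (illustrative):
#   decision node: {"type": "decision", "options": {name: subtree}}
#   event node:    {"type": "event", "outcomes": [(probability, subtree)]}
#   leaf:          a plain number (the payoff)

def rollback(node):
    """Return the expected value of a node, evaluating right to left."""
    if isinstance(node, (int, float)):          # terminal payoff
        return node
    if node["type"] == "event":                 # circle: probability-weighted average
        return sum(p * rollback(sub) for p, sub in node["outcomes"])
    if node["type"] == "decision":              # square: pick the best option
        return max(rollback(sub) for sub in node["options"].values())
    raise ValueError("unknown node type")

# Tiny example: one decision with a risky and a safe branch.
tree = {"type": "decision", "options": {
    "risky": {"type": "event", "outcomes": [(0.5, 100), (0.5, -40)]},
    "safe": 20,
}}
print(rollback(tree))  # 30.0 -> the risky branch is preferred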

A decision tree can also be used to classify an example: start at the root of the tree and move through it until a leaf node is reached, which provides the classification of the instance. Decision tree induction is a typical inductive approach to learning classification knowledge. The key requirements for mining with decision trees are:

o Attribute-value description: the object or case must be expressible in terms of a fixed collection of properties or attributes. This means that continuous attributes must be discretized, either beforehand or by the algorithm itself.
o Predefined classes (target attribute values): the categories to which examples are to be assigned must have been established beforehand (supervised data).
o Discrete classes: a case does or does not belong to a particular class, and there must be more cases than classes.
o Sufficient data: usually hundreds or even thousands of training cases.

How to Draw a Decision Tree?

You start a decision tree with the decision that needs to be made, represented by a small square towards the left of a large piece of paper. From this box draw lines out towards the right, one for each possible solution, and write that solution along its line. Keep the lines as far apart as possible so that you have room to expand your thoughts.

At the end of each solution line, consider the results. If the result of taking that decision is uncertain, draw a small circle. If the result is another decision that needs to be made, draw another square. Squares represent decisions; circles represent uncertainty or random factors. Write the decision or factor to be considered above the square or circle. If the solution is complete at the end of the line, just leave it blank.

Starting from the new decision squares on your diagram, draw out lines representing the options that could be taken. From the circles, draw out lines representing the possible outcomes, again with a brief note on each line saying what it means. Keep doing this until you have drawn out as many of the possible outcomes and decisions as you can see leading on from your original decision.

An example of the sort of diagram you will end up with is shown below:

[Figure: portrayal of a decision tree]

Once you have done this, review your tree diagram. Challenge each square and circle to see if there are any solutions or outcomes you have not considered. If there are, draw them in. If necessary, redraft your tree if parts of it are too congested or untidy. You should now have a good understanding of the range of possible outcomes.

Starting to Evaluate Your Decision Tree

Now you are ready to evaluate the decision tree. This is where you can work out which decision has the greatest worth to you. Start by assigning a cash or numeric value to each possible outcome: how much you think it would be worth to you. Next look at each circle (representing an uncertainty point) and estimate the probability of each outcome. If you use percentages, the total must come to 100% at each circle; if you use fractions, they must add up to 1. If you have data on past events you may be able to make rigorous estimates of the probabilities; otherwise, write down your best guess.
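As a quick check of that rule, here is a short sketch (again my own illustration, reusing the nested-dictionary representation from the rollback sketch above) that verifies the probabilities at every circle sum to 1:

def check_probabilities(node, tol=1e-9):
    """Recursively verify that outcome probabilities at each event node sum to 1."""
    if isinstance(node, (int, float)):
        return True                              # leaves carry payoffs; nothing to check
    if node["type"] == "event":
        total = sum(p for p, _ in node["outcomes"])
        if abs(total - 1.0) > tol:
            return False
        return all(check_probabilities(sub) for _, sub in node["outcomes"])
    return all(check_probabilities(sub) for sub in node["options"].values())

# Example: probabilities 0.5 and 0.5 sum to 1, so this prints True.
tree = {"type": "decision", "options": {
    "risky": {"type": "event", "outcomes": [(0.5, 100), (0.5, -40)]},
    "safe": 20,
}}
print(check_probabilities(tree))  # True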

Calculating Tree Values

Once you have worked out the value of the outcomes and assessed the probabilities at each uncertainty point, it is time to start calculating the values that will help you make your decision. We start on the right-hand side of the decision tree and work back towards the left. As we complete the calculation for a node (decision square or uncertainty circle), all we need to do is record the result; the calculations that led to it can be ignored from then on. Effectively, that branch of the tree can be discarded. This is called 'pruning the tree'.

Calculating the Value of Decision Nodes

When you are evaluating a decision node, write down the cost of each option along its decision line, then subtract that cost from the outcome value you have already calculated. This gives you a value representing the benefit of that decision. Sunk costs, amounts already spent, do not count in this analysis. When you have calculated the benefit of each decision, select the decision with the largest benefit, and take that as both the decision made and the value of the node.

Now consider this with a simple example.

Example 1. An executive has to make a decision. He has four alternatives D1, D2, D3 and D4. Once the decision has been made, events may lead to any of four results R1, R2, R3 and R4. The probabilities of occurrence of these results are:

R1 = 0.5, R2 = 0.2, R3 = 0.2, R4 = 0.1

The matrix of pay-offs between the decisions and the results is indicated below:

          R1    R2    R3    R4
    D1    14     9    10     5
    D2    11    10     8     7
    D3     9    10    10    11
    D4     8    10    11    13

Show this decision situation in the form of a decision tree, and indicate the most preferred decision and the corresponding expected value.

Solution. A decision tree representing the possible courses of action and states of nature is shown in the figure below. To analyze the tree, we start working backward from the end branches. The most preferred decision at decision node 1 is found by calculating the expected value of each decision branch and selecting the course of action with the highest value. The expected monetary values of the event nodes A, B, C and D (corresponding to D1, D2, D3 and D4) are calculated as follows:

EMV (A) = 0.5 x 14 + 0.2 x 9 + 0.2 x 10 + 0.1 x 5 = 11.3
EMV (B) = 0.5 x 11 + 0.2 x 10 + 0.2 x 8 + 0.1 x 7 = 9.8
EMV (C) = 0.5 x 9 + 0.2 x 10 + 0.2 x 10 + 0.1 x 11 = 9.6
EMV (D) = 0.5 x 8 + 0.2 x 10 + 0.2 x 11 + 0.1 x 13 = 9.5
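These EMVs are easy to verify mechanically. A minimal sketch (my own illustration) using the probabilities and payoff matrix above:

# Verify the EMVs of Example 1 from the payoff matrix and probabilities.
probs = [0.5, 0.2, 0.2, 0.1]        # P(R1), P(R2), P(R3), P(R4)
payoffs = {
    "D1": [14, 9, 10, 5],
    "D2": [11, 10, 8, 7],
    "D3": [9, 10, 10, 11],
    "D4": [8, 10, 11, 13],
}

emv = {d: sum(p * v for p, v in zip(probs, row)) for d, row in payoffs.items()}
best = max(emv, key=emv.get)
print({d: round(v, 1) for d, v in emv.items()})
# {'D1': 11.3, 'D2': 9.8, 'D3': 9.6, 'D4': 9.5}
print(best)  # D1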

Since node A has the highest EMV, the decision at node 1 will be to choose the course of action D1.

Example 2. A farm owner is seriously considering drilling a farm well. In the past, only 70% of wells drilled in the area were successful at 200 feet of depth. Moreover, on finding no water at 200 ft., some persons drilled further, up to 250 feet, but only 20% of them struck water at 250 ft. The prevailing cost of drilling is Rs. 50 per foot. The farm owner has estimated that if he does not get his own well, he will have to pay Rs. 15,000 (in present-value terms) over the next 10 years to buy water from his neighbor. The following strategies are possible: (i) do not drill any well; (ii) drill up to 200 ft. and stop; (iii) if no water is found at 200 ft., drill further up to 250 ft. Draw an appropriate decision tree and determine the farm owner's strategy under the EMV approach.

Solution. The given data can be represented by the following decision tree diagram.

[Figure: decision tree showing the consequence of each branch as a cash outflow (Rs.)]

There are two decision points in the tree, indicated by D1 and D2. To decide between the alternatives at each point, we fold back the tree (backward induction) from decision point D2, using the EMV of cash outflow as the criterion (the lower the outflow, the better):

EVALUATION OF DECISION POINTS

Decision point D2 (reached if no water is found at 200 ft.)

  Option 1: Drill up to 250 ft.
    State of nature    Probability    Cash outflow    Expected outflow
    Water struck       0.2            Rs. 12,500      Rs.  2,500
    No water struck    0.8            Rs. 27,500      Rs. 22,000
                                      EMV (outflow) = Rs. 24,500

  Option 2: Do not drill up to 250 ft.
    Cash outflow = Rs. 10,000 (drilling) + Rs. 15,000 (buying water)
                                      EMV (outflow) = Rs. 25,000

  The decision at D2 is: drill up to 250 ft.

Decision point D1

  Option 1: Drill up to 200 ft.
    State of nature    Probability    Cash outflow    Expected outflow
    Water struck       0.7            Rs. 10,000      Rs.  7,000
    No water struck    0.3            Rs. 24,500      Rs.  7,350
                                      EMV (outflow) = Rs. 14,350
    (The Rs. 24,500 outflow is the EMV of the best decision at D2, folded back.)

  Option 2: Do not drill at all.
                                      EMV (outflow) = Rs. 15,000

  The decision at D1 is: drill up to 200 ft.

Thus the optimal strategy for the farm owner is to drill the well up to 200 ft. and, if no water is struck, drill further up to 250 ft.
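The same fold-back can be checked in a few lines. A minimal sketch (my own illustration; all figures are cash outflows in rupees, so we minimize rather than maximize):

# Example 2 rollback, working with cash outflows (smaller is better).
BUY_WATER = 15_000      # PV of buying water if no well is obtained
COST_200 = 200 * 50     # drilling to 200 ft. costs Rs. 10,000
COST_250 = 250 * 50     # drilling on to 250 ft. costs Rs. 12,500 in total

# Decision point D2: no water at 200 ft.
stop_at_200 = COST_200 + BUY_WATER                            # Rs. 25,000
drill_to_250 = 0.2 * COST_250 + 0.8 * (COST_250 + BUY_WATER)  # Rs. 24,500
d2 = min(stop_at_200, drill_to_250)

# Decision point D1: drill at all?
no_well = BUY_WATER                                # Rs. 15,000
drill_to_200 = 0.7 * COST_200 + 0.3 * d2           # Rs. 14,350
d1 = min(no_well, drill_to_200)

print(round(d2), round(d1))  # 24500 14350 -> drill to 200 ft., then to 250 ft. if dry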

Example 3. A businessman has two independent investments, A and B, available to him, but he lacks the capital to undertake both simultaneously. He can choose to take A first and then stop; or, if A is successful, then take B; or vice versa. The probability of success of A is 0.7, while for B it is 0.4. Both investments require an initial capital outlay of Rs. 2,000, and both return nothing if the venture is unsuccessful. Successful completion of A will return Rs. 3,000 (over cost), and successful completion of B will return Rs. 5,000 (over cost). Draw the decision tree and determine the best strategy.

Solution. The appropriate decision tree is shown below:

[Figure: decision tree for Example 3]

There are three decision points in the tree, indicated by D1, D2 and D3. The short sketch below reproduces the rollback arithmetic that the evaluation table then summarizes.
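A minimal sketch (my own illustration; values in rupees):

# Example 3 rollback: expected values (Rs.) at each decision point.
# D3: after a successful B, is taking A worthwhile?
d3 = max(0.7 * 3000 + 0.3 * (-2000), 0)   # accept A: 1500 vs. stop: 0
# D2: after a successful A, is taking B worthwhile?
d2 = max(0.4 * 5000 + 0.6 * (-2000), 0)   # accept B: 800 vs. stop: 0
# D1: which investment to take first (or none)?
a_first = 0.7 * (3000 + d2) + 0.3 * (-2000)   # 2060
b_first = 0.4 * (5000 + d3) + 0.6 * (-2000)   # 1400
d1 = max(a_first, b_first, 0)

print([round(v) for v in (d3, d2, a_first, b_first, d1)])
# [1500, 800, 2060, 1400, 2060] -> take A first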

EVALUATION OF DECISION POINTS

Decision point   Option             Outcome    Probability   Conditional value    Expected value
D3               (i) Accept A       Success    0.7           Rs. 3,000            Rs. 2,100
                                    Failure    0.3           -Rs. 2,000           -Rs.   600
                                                                                  Rs. 1,500
                 (ii) Stop                                                        0
D2               (i) Accept B       Success    0.4           Rs. 5,000            Rs. 2,000
                                    Failure    0.6           -Rs. 2,000           -Rs. 1,200
                                                                                  Rs.   800
                 (ii) Stop                                                        0
D1               (i) Accept A       Success    0.7           Rs. 3,000 + 800      Rs. 2,660
                                    Failure    0.3           -Rs. 2,000           -Rs.   600
                                                                                  Rs. 2,060
                 (ii) Accept B      Success    0.4           Rs. 5,000 + 1,500    Rs. 2,600
                                    Failure    0.6           -Rs. 2,000           -Rs. 1,200
                                                                                  Rs. 1,400
                 (iii) Do nothing                                                 0

Hence, since Rs. 2,060 exceeds Rs. 1,400, the best strategy is to accept A first and, if it is successful, then accept B.

Strengths and Weaknesses of Decision Tree Methods

The strengths of decision tree methods are:

o Decision trees are able to generate understandable rules.
o Decision trees perform classification without requiring much computation.
o Decision trees are able to handle both continuous and categorical variables.
o Decision trees provide a clear indication of which fields are most important for prediction or classification.

The weaknesses of decision tree methods are:

o Decision trees are less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
o Decision trees are prone to errors in classification problems with many classes and a relatively small number of training examples.

o Decision trees can be computationally expensive to train. At each node, each candidate splitting field must be sorted before its best split can be found. In some algorithms, combinations of fields are used, and a search must be made for optimal combining weights. Pruning algorithms can also be expensive, since many candidate sub-trees must be formed and compared.
o Decision trees do not handle non-rectangular regions well. Most decision-tree algorithms examine only a single field at a time. This leads to rectangular classification boxes that may not correspond well with the actual distribution of records in the decision space.

In this lecture you have seen how decision tree analysis is used to arrive at business decisions.

Summary

Decision trees provide an effective method of decision making because they:

o clearly lay out the problem so that all choices can be viewed, discussed and challenged;
o provide a framework to quantify the values of outcomes and the probabilities of achieving them;
o help us to make the best decisions on the basis of our existing information and best guesses.

As with all decision-making methods, though, decision tree analysis should be used in conjunction with common sense: decision trees are just one important part of your decision-making toolkit.