The Impact of No-Show Prediction Quality on Optimal Appointment Schedules



One of the challenges in appointment scheduling at outpatient clinics is dealing with patient no-shows. Recent approaches model no-show occurrences through an individual patient show probability, predicted from historical data with analytics techniques. However, these approaches have two shortcomings. First, they do not take into account day-dependent attributes, even though their impact on show probability is well known. Second, they do not study the impact of prediction quality on the optimal clinic configuration. We consider the case of individual, day-dependent show outcome predictions and develop a dynamic appointment scheduling procedure. An analysis of the resulting schedules suggests that a subset of appointment slots should be assigned at the last moment (e.g., as same-day appointments) to patients who are likely to show. The other slots should be assigned in advance to patients who are unlikely to show. Our results also suggest that the trade-off between prediction sensitivity and prediction specificity determines whether we should overbook or adopt a single-day scheduling horizon. We also find that our method is preferable to Open Access for a wide range of show rates. Our results are validated on the data set of a large mental health center.

1. Introduction

One of the biggest obstacles to efficient appointment scheduling in outpatient clinics is patient no-shows (Cayirli and Veral 2003; Gupta and Denton 2008), which lead to provider underutilization and delayed patient access. The problem persists despite efforts to understand why no-shows occur (Barron 1980; Bean and Talaga 1995; Sharp and Hamilton 2001). It also persists despite attempts to reduce no-shows through interventions such as charging patients no-show fees (Lowes 2005) and sending reminder cards (Hixon et al. 1999).
Short scheduling horizons, such as those achieved with open-access policies that offer same-day or next-day scheduling, help reduce no-shows for patients who need urgent attention or prefer immediate appointments (Galucci et al. 2005; Liu et al. 2010; Robinson and Chen 2010). However, for ongoing care of long-term conditions and for prescription renewals, many patients schedule farther ahead of time, which raises the risk of no-shows. A common solution is to overbook, that is, to schedule more appointments than the capacity would allow, in order to increase the provider's utilization. Although overbooking allows more patients to be seen, it also introduces patient waiting time and clinic overtime. The problem consists of finding the optimal trade-off between two goals: maximizing the revenue made from seeing patients, and minimizing the costs associated with patient waiting time and clinic overtime. The difference between revenue and costs is the clinic profit, which should be maximized.

Some recent works improve the performance of overbooking in two steps. First, they use analytics to predict the show outcome of individual appointments (i.e., whether the patient will show or not). Second, they use this prediction to guide the scheduling decision. These works assume that the show outcome is determined by the day-independent attributes of the appointment. Those are the characteristics that are not influenced by the scheduling decision, such as the type of medical procedure to perform, the patient's demographic information, and his or her no-show history. However, these works do not consider day-dependent attributes, even though such attributes are known to influence the show outcome. Examples are the day of the week (Monday, Tuesday, …) and the number of days to the appointment. In this work, we analyze the case where the show outcome is affected by both day-dependent and day-independent attributes.

Four day-dependent attributes are known to affect show probability:
- the lead time, i.e., the number of days between the arrival of the appointment request and the appointment day (Galucci et al. 2005);
- the appointment location (Bean and Talaga 1995), which is day-dependent in clinics where doctors work at different locations on different days;
- the day of the week (Glowacka et al. 2009), whose impact on no-shows stems from incompatibilities between the patient's personal schedule and the appointment time;
- weather conditions (Campbell et al. 2000), which are obviously related to the day.
Our goal is not only to develop a dynamic scheduling algorithm for the case of day-dependent individual show predictions. It is also to analyze the impact of prediction quality on clinic performance under different conditions, such as different show rates. In particular, we address the trade-off between prediction specificity (i.e., the proportion of shows correctly classified) and sensitivity (i.e., the proportion of no-shows correctly classified). Since it may be impossible to predict both shows and no-shows accurately, we give guidelines on when to favor specificity over sensitivity.

Our contributions are summarized as follows. (1) We develop a dynamic appointment scheduling method that incorporates show predictions based on both day-dependent and day-independent attributes. (2) Through a simulation procedure, we estimate how the optimal overbooking level and the optimal scheduling horizon length depend on prediction sensitivity and specificity. (3) We analyze the scheduling decisions made in these experiments. Guided by our findings, we develop a near-optimal scheduling heuristic that clinics can easily implement. Interestingly, we find that the ideal schedule consists of sequences of shows with one no-show inserted between them, so as to absorb any accumulated appointment delay. (4) We estimate the prediction quality obtainable on the database of a large mental health center with a high no-show rate. (5) Through a database exploration technique called propositionalization, we identify the causes of no-shows, some of which had not been identified by previous studies. Last, (6) we compare the performance of our simple heuristic to that of a popular scheduling policy, Open Access, and determine under which conditions each policy is preferable.

The paper is organized as follows. Section 2 briefly surveys the relevant literature. Section 3 introduces our appointment scheduling problem and assumptions. Section 4 develops a column-generation solution method, and Section 5 presents a simulation-based framework to test it. In Section 6, we study the impact of prediction quality on performance measures and determine when it is better to favor high specificity over high sensitivity. In Section 7, we analyze the optimal decisions made throughout our experiments and develop an easy-to-implement heuristic procedure that finds near-optimal solutions. In Section 8, we test our heuristic on a real-world mental health center with a low show rate:
- Section 8.1 analyzes the center's database to discover the causes of no-shows and to estimate the prediction quality.
- Section 8.2 extends the model proposed by Robinson and Chen (2010) to adapt it to the observed show rates.
- Section 8.3 compares the performance of our heuristic to that of Open Access.
- Section 8.4 generalizes our results to clinics with higher show rates by repeating the same comparison for different show rates, and provides guidelines to determine when our heuristic is preferable to Open Access.
- Section 8.5 shows that our assumption of constant service times does not affect our findings.

Section 9 concludes the paper by summarizing our main findings.

2. Literature Review

We extend previous work in the appointment scheduling literature and, in particular, in the research stream dealing with patient no-shows. Within this area, we consider two streams of research. In the former (Liu et al. 2010, Qu et al.
2007, Robinson and Chen 2010), no-show occurrences are modeled through a show probability that depends on day-dependent but not on day-independent attributes. In the latter (Glowacka et al. 2009, Huang and Zuniga 2012, Muthuraman and Lawley 2008, Zeng et al. 2010), the show probability depends on day-independent but not on day-dependent attributes. Our work merges these streams by considering show probabilities that depend on both the appointment characteristics and the appointment day.

We also contribute to each body of work individually. With regard to the first, we consider several day-dependent show attributes, not only the lead time. We also generalize the Open Access model proposed by Robinson and Chen (2010) by relaxing their assumption that same-day and next-day appointments have a 100% show rate. Our generalization is motivated by the show rates below 100% observed for such appointments in several studies, including this one and Liu et al. (2010). With regard to the second body of work, we analyze the impact of prediction quality on clinic performance and on the optimal clinic configuration, which has not been studied before. In particular, we determine under which combinations of prediction sensitivity and specificity it is better to overbook or to use a same-day scheduling horizon. Furthermore, we overbook by slot compression (LaGanga and Lawrence 2007b) rather than by the double-booking strategy adopted in existing works. Slot compression arguably offers a fairer balance between patient waiting time and provider utilization (LaGanga and Lawrence 2007a).

We also contribute to the existing empirical work that attempts to understand the causes of no-shows. Existing works have attributed no-shows to several factors:
- lack of transportation, scheduling problems, oversleeping or forgetfulness, and lack of child care (Campbell et al. 2000);
- patient age, gender, and number of previous no-shows (Shonick and Klein 1977);
- appointment lead time (Bean and Talaga 1995);
- Medicaid status (Rust et al. 1995).

These works analyze a carefully selected set of variables, obtained from data or through questionnaires, in the hope of finding causal relationships between them and the show outcome. By contrast, we build our variables by automatically exploring the database through the propositionalization approach proposed by Samorani et al. (2011). To the best of our knowledge, ours is the first application of propositionalization to appointment scheduling.

Finally, our work makes two methodological contributions. First, we make extensive use of cost-sensitive learning (Elkan 2001) to increase prediction specificity at the expense of sensitivity, or vice versa. In this way, the scheduling procedure favors either maximizing shows or minimizing waiting time and overtime. This is the first application of cost-sensitive classification to favor one of two conflicting objectives. Second, we employ association analysis (Tan et al.
2006) to study the output of our exact scheduling algorithm. The goal is to define a heuristic algorithm that is simple and explainable. Only Shaw et al. (1992), who use decision trees to learn the dispatching rules adopted by a scheduling algorithm, apply analytics to derive a heuristic scheduling procedure. However, their problem is easier than ours. Their set of scheduling decisions comprises only four dispatching rules, whereas ours comprises one possible decision for each appointment slot and day.

3. Clinic Model and Problem Definition

We model the clinic as a single-server, single-stage system. This model is also suitable for clinics with multiple providers, each serving a different set of patients. We assume a constant service time equal to $s$ (this assumption is relaxed in Section 8.5), patient punctuality, and no walk-ins. Let $\sigma_d$ be the show rate corresponding to a lead time of $d$ days. For notational convenience, let $\bar{\sigma}$ be the average of the components of the vector $\boldsymbol{\sigma}$. Let $C$ be the capacity of the clinic, i.e., the maximum number of patients that can be served in one clinic session without overbooking and without incurring overtime. The nominal length of the clinic session is therefore $Cs$.

Under the overbooking policy of slot compression (LaGanga and Lawrence 2007b), appointments are scheduled more frequently than the capacity would allow. The clinic session is then composed of $N \ge C$ slots of length $\Delta = Cs/N$. For example, suppose the capacity is $C = 9$ and the service time is $s = 40$ minutes. By shortening the slots to $\Delta = 30$ minutes, the clinic can schedule up to $N = Cs/\Delta = 12$ appointments without double-booking any patient into appointment slots. While overbooking increases the number of patients served, it introduces clinic overtime and patient waiting time. The overtime cost includes staff wages and the extra use of electricity and other resources. The waiting time cost of a patient represents monetary and nonmonetary costs, such as the patient's lost earnings while waiting or a loss of goodwill incurred by the clinic.

We assume that appointment requests (or, more simply, requests) arrive before the clinic session according to a Poisson process with parameter $\lambda$. In Section 4, we show that our method can be easily extended to the case where requests may arrive during the clinic session. The scheduling problem consists of optimally scheduling the arriving request in an empty slot of any day $d \in \{0, \dots, h-1\}$, where $d = 0$ is the current day and $h$ is the scheduling horizon. Alternatively, the request may be rejected. Some works on appointment scheduling allow rejections (e.g., Muthuraman and Lawley 2008), while others forbid them (e.g., Robinson and Chen 2010). For now, we allow rejections; in the experiments in Section 8, we forbid them.

Whenever a request arrives, an entity called a classifier (Tan et al. 2006) predicts its show outcome for each day of the horizon. We are more interested in the prediction quality obtained than in the prediction technique used. We therefore use the term classifier for any statistical, machine learning, or data mining technique that, given past appointment data, can predict the show outcomes of new, unseen requests.
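Sensitivity and specificity are defined in the Introduction: sensitivity is the share of actual no-shows predicted as no-shows, and specificity is the share of actual shows predicted as shows. The Python sketch below computes both from binary predictions, using the encoding show = 1, no-show = 0. It is only an illustration. The helper names, the toy probabilities, and the idea of obtaining binary predictions by thresholding an estimated show probability are assumptions, not the classifier used in this work. In this setup, raising the threshold is one simple way to trade specificity for sensitivity. That is the kind of trade-off that cost-sensitive learning controls.

```python
from typing import List, Sequence, Tuple


def binarize(show_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Predict show (1) if the estimated show probability reaches `threshold`, else no-show (0)."""
    return [1 if p >= threshold else 0 for p in show_probs]


def sensitivity_specificity(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) with 1 = show and 0 = no-show.

    Sensitivity: fraction of actual no-shows that are predicted as no-shows.
    Specificity: fraction of actual shows that are predicted as shows.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    on_no_shows = [p for p, a in zip(predicted, actual) if a == 0]
    on_shows = [p for p, a in zip(predicted, actual) if a == 1]
    if not on_no_shows or not on_shows:
        raise ValueError("need at least one actual show and one actual no-show")
    sensitivity = on_no_shows.count(0) / len(on_no_shows)
    specificity = on_shows.count(1) / len(on_shows)
    return sensitivity, specificity


# Hypothetical show probabilities and realized outcomes for eight appointments.
probs = [0.90, 0.60, 0.80, 0.30, 0.55, 0.70, 0.40, 0.95]
actual = [1, 1, 1, 0, 0, 1, 0, 1]

for threshold in (0.5, 0.65):
    sens, spec = sensitivity_specificity(binarize(probs, threshold), actual)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example, raising the threshold from 0.5 to 0.65 moves the classifier from sensitivity 0.67 and specificity 1.00 to sensitivity 1.00 and specificity 0.80.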
For the moment, let us assume that a classifier has been trained with the clinic data and has learned to predict the show outcomes of new requests. The classifier then provides a predicted show vector $\mathbf{p} = (p_0, \dots, p_{h-1})$. Its component $p_d$, for $d = 0, \dots, h-1$, is the predicted show outcome (show = 1, no-show = 0) if the arriving request is scheduled on day $d$ ($d = 0$ is the current day). Previous studies consider show probabilities. We instead consider binary predictions (show or no-show) in order to keep the problem computationally tractable. This approximation has been shown not to degrade the performance of the scheduling algorithm (Samorani and LaGanga 2011).

An optimization algorithm then schedules the request in a day and slot. It uses the show vector of the arriving request and the information on the appointments already scheduled. Its goal is to maximize the clinic profit, which is composed of one positive term (revenue) and two negative terms (waiting time cost and overtime cost). The clinic accrues a revenue $r_d$ for scheduling the patient on day $d$. Our method allows arbitrary revenue vectors. In this paper, however, we assume that a revenue $\rho$ is earned for each showing patient. It represents a monetary revenue (for for-profit clinics) or a measure of societal benefit (for non-profit clinics) earned for each patient seen. Furthermore, to encourage patient access, the clinic accrues a revenue $\beta$ for any appointment that is scheduled (i.e., not rejected). Finally, to reward short lead times, the revenue earned on day $d$ is decreased by a penalty $\gamma d$ proportional to the lead time. In summary, the revenue vector $\mathbf{r}$ is built from the show outcome vector $\mathbf{p}$ as follows:

$$r_d = \beta + \rho\, p_d - \gamma\, d, \qquad d = 0, \dots, h-1. \qquad (1)$$

The two costs are the waiting time cost and the overtime cost. In previous works, the waiting time cost is equal for all patients. We instead assume that each patient belongs to a waiting cost category $k$, characterized by a cost $c^w_k$ per unit of waiting time. By assigning priorities, we try to minimize the waiting time of certain subsets of patients. Examples are very sick people, children, or those who complained in the past about excessive waiting times. Finally, an overtime cost $c^o$ is paid for each unit of overtime. The notation is reported in Table 1.

Table 1. Notation
$s$: service time of an appointment
$\Delta$: interval between scheduled appointments, $\Delta = Cs/N$
$C$: clinic capacity (i.e., the number of slots obtained by setting $\Delta = s$)
$N$: number of appointments scheduled (i.e., number of slots), $N \ge C$
$h$: scheduling horizon, in days
$\lambda$: daily arrival rate of appointment requests
$\sigma_d$: show rate of day $d$ (i.e., for a lead time of $d$ days)
$p_d$: 1 if the appointment request is predicted to show on day $d$, 0 otherwise
$r_d$: predicted revenue earned if the appointment request is scheduled on day $d$
$\beta$: benefit of scheduling (i.e., not rejecting) any appointment request
$\gamma$: penalty for each day of delay in scheduling any appointment request
$\rho$: additional benefit of scheduling a showing appointment request
$c^w_k$: cost for each unit of waiting time of a patient of category $k$
$c^o$: cost for each unit of overtime
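The revenue construction can be sketched in Python. The sketch assumes that equation (1) reads $r_d = \beta + \rho\,p_d - \gamma\,d$, where:
- $\beta$ is the benefit of scheduling any request;
- $\rho$ is the additional benefit of a predicted show;
- $\gamma$ is the penalty per day of delay;
- day $d = 0$ is the current day.

The function and parameter names are hypothetical, and the numeric values in the example are arbitrary.

```python
from typing import List, Sequence


def revenue_vector(show_vector: Sequence[int], benefit_schedule: float,
                   benefit_show: float, delay_penalty: float) -> List[float]:
    """Assumed form of (1): r_d = beta + rho * p_d - gamma * d for each day d of the horizon.

    show_vector[d] is the predicted show outcome p_d (1 = show, 0 = no-show);
    d = 0 is the current day, so a same-day appointment pays no delay penalty.
    """
    return [benefit_schedule + benefit_show * p - delay_penalty * d
            for d, p in enumerate(show_vector)]


# Illustrative values: beta = 0.2, rho = 1.0, gamma = 0.05, horizon h = 5.
print(revenue_vector([1, 0, 1, 1, 0], benefit_schedule=0.2,
                     benefit_show=1.0, delay_penalty=0.05))
```

Under this reading, a day on which the request is predicted to be a no-show earns only $\beta - \gamma d$. With the values above, a no-show prediction on day 4 is worth nothing, so the optimizer is steered towards days on which the request is predicted to show.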

LaGanga and Lawrence (2007b) maximize the clinic profit in a single day. Extending their model, our objective is to maximize the clinic profit over the next $h$ days:

$$\max \sum_{d=0}^{h-1} \left[ R_d - O_d - W_d \right], \qquad (2)$$

where $R_d$ is the sum of the predicted revenues made on day $d$, $O_d$ is the overtime cost experienced by the clinic on day $d$, and $W_d$ is the average waiting time cost experienced by the showing patients on day $d$. We use the average waiting time instead of the total patient waiting time because it measures the waiting time that a showing patient can expect. The formulae for the overtime and the waiting time are straightforward but lengthy and are omitted.

Our objective is myopic, because it considers only the next $h$ days and not the whole (possibly infinitely long) horizon. However, this objective is a substantial improvement upon most of the literature on no-shows, which considers only one day (Robinson and Chen 2010; LaGanga and Lawrence 2007b; Glowacka et al. 2009; Muthuraman and Lawley 2008; Zeng et al. 2009). In our experiments, we use a rolling-horizon approach (Rohleder and Klassen 2002), in which the scheduling horizon shifts through time. A more formal approach would be to model the problem as a stochastic process. Its objective would be to maximize the long-run average profit or, possibly, the total discounted profit over an infinite horizon. However, Alden and Smith (1992) show that the performances of the two approaches converge for a sufficiently long scheduling horizon. Moreover, we also cast the problem as a Markov decision process (Puterman 1994) and tackled it through backward induction. Within 1 hour, we could solve only small, simplified instances of the problem in which revenues and costs were the same for all appointments. These considerations motivate our myopic objective (2). Our results in Section 8, however, suggest that the solutions we find are close to optimal. Next, we present a solution method for the scheduling problem.

4. The Column-Generation Formulation

Let $q$ be the current request. The scheduling procedure starts by generating $K$ scenarios to capture the uncertainty about the arrival of future appointment requests (or future requests). Each scenario $k$ is described by the sets $F_{dk}$, which include the future requests forecasted to arrive during day $d$ ($d = 0, \dots, h-1$) under scenario $k$. These future requests are fictitious and should be generated according to historical data (see Section 5). Like $q$, future requests are characterized by a predicted show vector and belong to a waiting time cost category. Obviously, future requests can be scheduled only on or after their arrival day. So, let $A_{dk} = \bigcup_{d' \le d} F_{d'k}$ be the set of future requests that can be scheduled on day $d$ under scenario $k$.

After generating the scenarios, we solve a stochastic optimization problem. It simultaneously schedules the future requests and $q$ into the empty slots of the existing schedule so as to maximize (2). It also ensures that $q$ is scheduled in the same slot (or rejected) under all scenarios, so that the solution is unique. Since the future requests are fictitious, the only request that will actually be scheduled (or rejected) is $q$. The scheduling decision is determined by the variables $x_{dj}$, which indicate whether $q$ is scheduled in slot $j$ ($j = 0, \dots, N$) of day $d$ ($d = 0, \dots, h-1$):
- if $x_{dj} = 1$ for some $j \ge 1$, then $q$ is scheduled in the $j$-th slot of day $d$;
- if $x_{d0} = 1$, then $q$ is not scheduled on day $d$;
- if $x_{d0} = 1$ for all $d$, then $q$ is rejected.

Let us focus on modeling the objective function and, in particular, one of its components, the average waiting time. To compute this expression, we need to divide the total waiting time by the number of showing patients. This ratio could be modeled with two kinds of variables: some that track the patients' waiting times, and others that count how many patients show on each day. However, such variables would make the expression for the average waiting time nonlinear. Furthermore, tracking the patients' waiting times would require additional constraints. Each would compute the waiting time of a patient from the waiting time of the patient scheduled in the previous slot. To keep the model linear and the number of constraints low, we tackle this problem with column generation. This technique is commonly used to solve large linear programs (Barnhart et al. 1998, Desrosiers and Lübbecke 2005). In this framework, the problem formulation includes a large number of primal variables (columns). These are added iteratively during the execution of the simplex method, so that usually only a small fraction is explicitly considered.
At each iteration, a subproblem is solved to identify a subset of violated dual constraints. The corresponding primal variables are then included in the model at the following iteration.

We introduce the variables $y_{idk}$ to indicate whether the future request $i$ is assigned to day $d$ under scenario $k$. The slot assigned to each future request is not explicitly considered. However, this information is implicitly determined once we also know the slot where $q$ is scheduled. For example, suppose that after considering existing appointment reservations and after scheduling $q$, the partial schedule of day $d$ is SN_S_S (show, no-show, empty slot, show, empty slot, show). Suppose also that the variables $y_{idk}$ indicate that the future requests scheduled on day $d$ under scenario $k$ are predicted to be N (a no-show) and S (a show). Then the complete schedule of day $d$ under scenario $k$ must be SNSSNS. The alternative complete sequence SNNSSS, obtained by switching the slots assigned to S and N, is not optimal: it leads to longer waiting time and more overtime than SNSSNS. A complete sequence determines the patients' waiting times, the clinic overtime, and, therefore, the clinic profit. For example, assume $s = 30$ and $\Delta = 20$ and consider the sequence SSNSSS:
- the patients in the 1st and 4th slots wait 0 minutes;
- those in the 2nd and 5th slots wait 10 minutes;
- the 6th patient waits 20 minutes;
- the overtime is 30 minutes.

In summary, two pieces of information determine the schedule obtained and the clinic profit made on day $d$ under scenario $k$. The first is the slot assigned to $q$ (if $j = 0$, $q$ is not scheduled on day $d$). The second is the set of future requests (but not their order) assigned to day $d$ under scenario $k$. So, we introduce the column variables $z^{jP}_{dk}$, which are equal to 1 if and only if:
1. $q$ is scheduled in slot $j$ of day $d$, that is, $x_{dj} = 1$ ($j = 0, \dots, N$; $d = 0, \dots, h-1$), and
2. exactly the future requests in the set $P$ are scheduled on day $d$ under scenario $k$ ($y_{idk} = 1$ if and only if $i \in P$).

Let $\pi^{jP}_{dk}$ be the clinic profit made on day $d$ under scenario $k$ if $z^{jP}_{dk} = 1$. In our notation, the set $P$ is an index over the subsets of $A_{dk}$, including the empty set. We consider $N+1$ slot choices ($j = 0, \dots, N$) and $h$ days. The entire model therefore includes $(N+1)\sum_{d=0}^{h-1} 2^{|A_{dk}|}$ column variables for each scenario $k$. The column-generation (CG) formulation is the following:

$$
\begin{aligned}
\text{(CG)}\quad \max\ & \frac{1}{K} \sum_{k=1}^{K} \sum_{d=0}^{h-1} \sum_{j=0}^{N} \sum_{P \subseteq A_{dk}} \pi^{jP}_{dk}\, z^{jP}_{dk} \\
\text{s.t.}\ & \sum_{d:\, i \in A_{dk}} y_{idk} \le 1 && \forall k,\ \forall i \in A_{h-1,k} && (3)\\
& \sum_{d=0}^{h-1} \sum_{j=1}^{N} x_{dj} \le 1 && && (4)\\
& \sum_{j=0}^{N} \sum_{P \subseteq A_{dk}} z^{jP}_{dk} = 1 && \forall d, k && (5)\\
& \sum_{j=0}^{N} \sum_{P \subseteq A_{dk}:\, i \in P} z^{jP}_{dk} = y_{idk} && \forall d, k,\ \forall i \in A_{dk} && (6)\\
& \sum_{P \subseteq A_{dk}} z^{jP}_{dk} = x_{dj} && \forall d, k,\ j = 0, \dots, N && (7)\\
& x_{dj},\ y_{idk},\ z^{jP}_{dk} \in \{0, 1\}
\end{aligned}
$$

The objective (CG) is to maximize the expected clinic profit over the next $h$ days across the $K$ considered scenarios. The constraints work as follows:
- Constraint set (3) ensures that any future request is assigned to at most one of the days on or after its arrival.
- Constraint (4) forces $q$ to be assigned to at most one slot.
- Constraint set (5) forces the selection of exactly one column for each day under each scenario.
- Constraint set (6) sets $y_{idk}$ to 1 if and only if the set $P$ scheduled on day $d$ under scenario $k$ contains $i$.
- Constraint set (7) assigns $q$ to slot $j$ of day $d$ if and only if the column variable selected for day $d$ and scenario $k$ assigns $q$ to the $j$-th slot.

When the model is initialized, it includes only the column variables $z^{0\emptyset}_{dk}$, which correspond to the decision of scheduling no patient on day $d$ under scenario $k$. Let $\alpha_{dk}$ be the dual variables corresponding to constraint set (5), $\theta_{idk}$ those corresponding to constraint set (6), and $\eta_{dkj}$ those corresponding to constraint set (7). It is easy to verify that the dual constraints corresponding to the primal variables $z^{jP}_{dk}$ are satisfied if and only if

$$\max_{P \subseteq A_{dk}} \left\{ \frac{\pi^{jP}_{dk}}{K} - \sum_{i \in P} \theta_{idk} \right\} \le \alpha_{dk} + \eta_{dkj}.$$

Thus, at each iteration we need to solve the maximization on the left-hand side for each scenario $k = 1, \dots, K$, day $d = 0, \dots, h-1$, and slot $j = 0, \dots, N$. We call this problem the One-Day Scheduling Problem. It consists of finding the set $P$ that maximizes the profit obtained by scheduling $q$ in slot $j$ and the requests in $P$ in the remaining empty slots (if $j = 0$, $q$ is not scheduled on day $d$). In this problem, the original revenues of the future requests are modified by the dual values $\theta_{idk}$. Details on the solution method for the One-Day Scheduling Problem are reported in the Appendix. After solving all the one-day scheduling problems, we add to the model the column variables corresponding to the violated dual constraints.

The column-generation procedure solves the linear relaxation of CG, in which the $x$, $y$, and $z$ variables are continuous. Therefore, when optimality is reached, we keep the column variables added during the procedure, make the $x$, $y$, and $z$ variables binary, and solve the resulting binary program. If $x_{dj} = 1$ for some $d$ and some $j \ge 1$, the optimal decision is to schedule $q$ on day $d$ in slot $j$; otherwise, the optimal decision is to reject $q$. In our experiments, scheduling the current request takes 0.19 seconds on average and never took more than 1 second. The average gap between the optimal value of the LP relaxation and that of the integer solution is 3.7%, with a standard deviation of 2.8%.

Finally, to allow patients' preferences on the appointment time, it is sufficient to add simple constraints on the $x$ variables. Similar constraints allow requests to arrive during the clinic session; such requests cannot be scheduled in past slots. Forbidding request rejections is also straightforward and takes two changes. First, replace the inequality sign of (4) with an equality sign, so that the current request must be scheduled in exactly one slot. Second, slightly modify the setup of the column-generation procedure.
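The sequence-based reasoning above can be sketched in Python. In that reasoning, the order of shows, no-shows, and empty slots determines waiting times and overtime. The sketch is not the One-Day Scheduling Problem algorithm from the Appendix, which is not shown here. It is a brute-force stand-in that tries every placement of any subset of candidate requests into the empty slots. It assumes:
- punctual patients;
- a constant service time (`service`) and slot length (`slot_len`);
- a nominal session end of (number of slots) × `slot_len`;
- a day profit equal to the revenues, minus `overtime_cost` per minute of overtime, minus the average waiting cost of the showing patients, in the spirit of (2).

All names are hypothetical. In a column-generation setting, the revenues passed in would be the dual-adjusted ones. Enumeration is exponential, so the sketch only suits tiny examples.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

# A slot holds None (empty) or (predicted_show, revenue, waiting_cost_per_minute).
Entry = Optional[Tuple[bool, float, float]]


def simulate_day(slots: Sequence[Entry], service: float,
                 slot_len: float) -> Tuple[List[Optional[float]], float]:
    """Waiting time of each showing patient (None otherwise) and clinic overtime."""
    free_at = 0.0  # time at which the provider becomes available
    waits: List[Optional[float]] = []
    for j, entry in enumerate(slots):
        if entry is None or not entry[0]:  # empty slot or predicted no-show
            waits.append(None)
            continue
        arrival = float(j * slot_len)  # punctual patient arrives at the slot start
        start = max(arrival, free_at)
        waits.append(start - arrival)
        free_at = start + service
    session_end = len(slots) * slot_len  # nominal session length
    return waits, max(0.0, free_at - session_end)


def day_profit(slots: Sequence[Entry], service: float, slot_len: float,
               overtime_cost: float) -> float:
    """Revenue minus overtime cost minus average waiting cost of showing patients."""
    waits, overtime = simulate_day(slots, service, slot_len)
    revenue = sum(e[1] for e in slots if e is not None)
    wait_costs = [e[2] * w for e, w in zip(slots, waits) if w is not None]
    avg_wait_cost = sum(wait_costs) / len(wait_costs) if wait_costs else 0.0
    return revenue - overtime_cost * overtime - avg_wait_cost


def best_fill(partial: Sequence[Entry], candidates: Sequence[Entry], service: float,
              slot_len: float, overtime_cost: float) -> Tuple[float, List[Entry]]:
    """Brute force: try every placement of any subset of candidates into the empty slots."""
    empty = [j for j, e in enumerate(partial) if e is None]
    pool = list(candidates) + [None] * len(empty)  # None = leave the slot empty
    best_value, best_schedule = float("-inf"), list(partial)
    for choice in permutations(pool, len(empty)):
        schedule = list(partial)
        for j, entry in zip(empty, choice):
            schedule[j] = entry
        value = day_profit(schedule, service, slot_len, overtime_cost)
        if value > best_value:
            best_value, best_schedule = value, schedule
    return best_value, best_schedule


def pattern(schedule: Sequence[Entry]) -> str:
    """Render a schedule as a string of S (show), N (no-show), and _ (empty)."""
    return "".join("_" if e is None else ("S" if e[0] else "N") for e in schedule)


S = (True, 1.2, 0.01)   # hypothetical showing request
N = (False, 0.2, 0.01)  # hypothetical no-show request

# The worked example from the text: service time 30, slot length 20, sequence SSNSSS.
print(simulate_day([S, S, N, S, S, S], service=30, slot_len=20))

# Partial day SN_S_S plus one predicted show and one predicted no-show.
value, schedule = best_fill([S, N, None, S, None, S], [S, N], service=30,
                            slot_len=20, overtime_cost=0.02)
print(pattern(schedule), value)
```

The first call reproduces the SSNSSS example: waits of 0, 10, 0, 10, and 20 minutes for the showing patients, and 30 minutes of overtime. In the second call, the best completion of SN_S_S is SNSSNS, as argued in the text. The alternative SNNSSS scores lower because it produces longer waits and more overtime.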
Our experiments suggest that the case where rejections are forbidden is much harder to solve and may require up to 71 seconds of computation time with 12 slots and a 5-day scheduling horizon.

5. Clinic Simulation

We evaluate the performance of the scheduling procedure through a rolling-horizon simulation, where requests arrive according to a Poisson process with a given daily rate and need to be scheduled in one of the clinic sessions of the following days. We consider two categories of patients with different waiting time costs; the average waiting cost is a parameter of the simulation. An arriving

request is equally likely to belong to category 1 or 2. We fix the clinic capacity to 8 patients per session, which is a realistic value for afternoon-only clinics if the service time is, for example, 30 minutes.

In our simulation, requests are also characterized by their realized show vector, which specifies, for each day of the scheduling horizon, whether the request would actually show if scheduled in that day. Unlike the predicted show vector, which is revealed upon the request's arrival and guides the scheduling decision, the realized show vector is revealed only when the appointment takes place (or results in a no-show) and is used to compute the realized clinic profit. The realized show vector is randomly generated by setting each component to 1 with probability equal to the show rate for the corresponding day. We generate the realized show vectors randomly rather than from real data because the show outcome of a real appointment is known for only one day of the horizon, that is, the day in which it was scheduled; we cannot know whether the same appointment would have had the same show outcome if scheduled in another day. The realized revenue vector is built from the realized show vector using (1). In our experiments, the show rates are set according to Table 2.

Table 2. Show rates considered, by number of days between request and appointment

| Show rates | 0 (same day) | 1 | 2 | 3 | 4 | Average |
|---|---|---|---|---|---|---|
| Low | .74 | .64 | .65 | .62 | .61 | .65 |
| Observed | .87 | .74 | .75 | .72 | .71 | .76 |
| High | 1.00 | .85 | .87 | .83 | .81 | .87 |

The second row of Table 2 reports the show rates observed at the health center under study, together with their average (.76). Our analysis, though, is not limited to these values: it also considers a case where the show rates are lower and a case where they are higher. The high show rates are computed by multiplying the observed show rates by a factor such that the show rate for same-day appointments is equal to 1.00. The low show rates are computed by dividing the observed show rates by the factor that results in an average show rate of .65. In this way, the three average show rates (.65, .76, and .87) are equidistant. The generation of the predicted show vector depends on the prediction quality.
We measure the prediction quality by its sensitivity, which is the proportion of actual no-shows that are correctly predicted, and its specificity, which is the proportion of actual shows that are correctly predicted. If a component of the realized show vector is a show, the corresponding element of the predicted show vector is set to show with probability equal to the specificity and to no-show otherwise. Similarly, components of the realized show vector that are no-shows are classified as no-shows with probability equal to the sensitivity and as shows otherwise. The predicted revenue vector is built from the predicted show vector using (1).
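The generation of the simulation inputs described above can be sketched as follows. The scaling factors follow the text (same-day rate of the high scenario equal to 1.00, average of the low scenario equal to .65), and predictions are drawn from the realized outcomes using the specificity for actual shows and the sensitivity for actual no-shows. The function names are hypothetical, and because the inputs are the rounded observed rates printed in Table 2, the scaled values can differ from the table in the last digit.

```python
import random
from typing import List, Tuple

# Observed show rates by days between request and appointment (Table 2).
OBSERVED = [0.87, 0.74, 0.75, 0.72, 0.71]


def scaled_show_rates(observed: List[float],
                      low_avg: float = 0.65) -> Tuple[List[float], List[float]]:
    """Low and high show-rate vectors derived from the observed ones."""
    high = [p / observed[0] for p in observed]          # same-day rate becomes 1.00
    low_factor = (sum(observed) / len(observed)) / low_avg
    low = [p / low_factor for p in observed]            # average becomes low_avg
    return low, high


def realized_show_vector(show_rates: List[float], rng: random.Random) -> List[int]:
    """Component d is 1 (show) with probability show_rates[d], else 0 (no-show)."""
    return [1 if rng.random() < p else 0 for p in show_rates]


def predicted_show_vector(realized: List[int], sensitivity: float,
                          specificity: float, rng: random.Random) -> List[int]:
    """Noisy prediction of a realized show vector."""
    predicted = []
    for outcome in realized:
        if outcome == 1:   # actual show: predicted as show with prob. = specificity
            predicted.append(1 if rng.random() < specificity else 0)
        else:              # actual no-show: predicted as no-show with prob. = sensitivity
            predicted.append(0 if rng.random() < sensitivity else 1)
    return predicted


rng = random.Random(42)
low, high = scaled_show_rates(OBSERVED)
realized = realized_show_vector(OBSERVED, rng)
predicted = predicted_show_vector(realized, sensitivity=0.7, specificity=0.7, rng=rng)
```

Setting the sensitivity to 0 and the specificity to 1 makes every prediction a show, which corresponds to making no effort to predict no-shows.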

We prefer sensitivity and specificity to the better-known accuracy measure (i.e., the proportion of correctly predicted requests) because they can be controlled to some extent through a technique called cost-sensitive learning, which consists of varying the weight of misclassifying shows and no-shows, so that the classifier favors one type of prediction or the other. We show that shifting the prediction quality towards sensitivity favors the objective of maximizing revenue, whereas shifting it towards specificity favors the objective of minimizing costs.

We set the number of scenarios to 10. Under each scenario, the arrival rate of future requests is set to the actual arrival rate, and the characteristics of each request are generated in the same way as those of the actual requests, as described above. In practice, the requests arriving under the scenarios are different from the ones arriving during the simulation, but their characteristics follow the same patterns. At the end of each day of the simulation, we consider the realized show outcome and revenue of the requests scheduled in that day, and collect the following statistics: average clinic profit, average waiting time, average overtime, and average number of shows. The results of the first days of the simulation are excluded because during this start-up period the schedule is mostly empty; similarly, the results of the last days are excluded because the scheduling problem solved on those days may have a horizon shorter than the nominal one.

6. Impact of the Prediction Quality

We measure the impact of prediction quality through a full factorial experiment, where the simulation procedure is executed for the different parameter values reported in Table 3.

Table 3. Factors of the full factorial experiment

| Factor | Number of levels | Levels |
|---|---|---|
| Arrival rate | 2 | 12, 15 requests/day |
| Average show rate | 2 | .65, .87 |
| Revenue for shows | 2 | 1, 3 per show |
| Overtime cost | 2 | 1, 3 per unit of overtime |
| Average waiting time cost | 2 | 1, 3 per unit of waiting time |
| Scheduling horizon | 3 | 1, 3, 5 days |
| Number of slots | 2 | 9, 12 slots |
| Sensitivity | 3 | .4, .7, .9 |
| Specificity | 3 | .4, .7, .9 |

Some simulation parameters can be fixed or controlled by the clinic, while others cannot. The controllable parameters are the scheduling horizon, the number of slots, the sensitivity, and the specificity. The non-controllable parameters are the request arrival rate, the average show rate vector, the revenue for a show, the overtime cost per time unit, and the waiting time cost vector. We use 3 levels for the controllable factors scheduling horizon, sensitivity, and specificity, which have not been studied in detail, and 2 levels for the other factors, which have been studied extensively by LaGanga and Lawrence (2007b). For now, we consider a scheduling horizon of up to 5 days; in Sections 7 and 8, we also consider longer ones. The two show rate vectors considered are those corresponding to rows 1 (average .65) and 3 (average .87) of Table 2. Here, the two parameters that govern the revenue for timely scheduling are set to 0.05 and 0.005. These small values ensure that the revenue made for timely scheduling is secondary to that made for seeing patients. For every combination of parameters (2^6 × 3^3 = 1,728 combinations), we run a 100-day simulation and record the measures of clinic performance. The same random seed is used, so that the request characteristics are the same for every parameter combination. All main effects and the majority of the two-factor interactions are significant at the 0.01 level. However, here we focus only on the impact of the prediction quality (sensitivity and specificity) on clinic performance (Figure 1) and on its interaction with the other controllable parameters (Figure 2). In our discussion, we consider the impact on overtime but not that on waiting time, because the two are very similar.

Figure 1. Impact of sensitivity (1a) and specificity (1b) on clinic performance
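The size of the design follows directly from Table 3: six factors with two levels and three factors with three levels give 2^6 × 3^3 = 1,728 combinations. A minimal sketch that enumerates them is shown below; the dictionary keys are descriptive labels chosen for illustration, not the paper's notation.

```python
from itertools import product
from typing import Dict, Iterator, List

# Levels of Table 3 (keys are illustrative labels).
FACTORS: Dict[str, List[float]] = {
    "arrival_rate": [12, 15],
    "avg_show_rate": [0.65, 0.87],
    "revenue_per_show": [1, 3],
    "overtime_cost": [1, 3],
    "waiting_cost": [1, 3],
    "horizon_days": [1, 3, 5],
    "n_slots": [9, 12],
    "sensitivity": [0.4, 0.7, 0.9],
    "specificity": [0.4, 0.7, 0.9],
}


def design_points(factors: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield one parameter setting per cell of the full factorial design."""
    names = list(factors)
    for values in product(*(factors[name] for name in names)):
        yield dict(zip(names, values))


points = list(design_points(FACTORS))
print(len(points))  # 1728
```

Each of these settings would then drive one 100-day simulation run with the same random seed.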

Figure 2. Interactions of sensitivity and specificity with other controllable parameters (panels 2a to 2d)

Unsurprisingly, both sensitivity and specificity are positively correlated with the clinic profit (bars and labels in Figure 1). However, their impact on shows (continuous line) and overtime (dashed line) differs: while sensitivity is positively correlated with shows and overtime (Figure 1a), specificity is not (Figure 1b). To understand why, consider, for example, the case of high sensitivity and low specificity. Under these settings, when a request is predicted to be a show, it is rarely a no-show (i.e., the prediction is correct), because the no-shows are mostly classified correctly. Conversely, when a request is predicted to be a no-show, this prediction is more likely to be wrong (i.e., the request is actually a show). Since the procedure, guided by the predictions, aims at optimally balancing shows and no-shows (see Section 7), this case may result in a greater-than-optimal number of shows, which is likely to lead to overtime (Figure 1a). This example suggests that high sensitivity tends to favor the objective of maximizing revenue over the objective of minimizing overbooking costs. By the same token, high specificity tends to favor the objective of minimizing overbooking costs over that of maximizing revenue, which results in fewer shows and less overtime (Figure 1b).

Because sensitivity is positively correlated with the number of shows and overtime, a high sensitivity is particularly beneficial when the average show rate is low, when the overbooking costs are low, or when the revenue per show is large. Interestingly, as suggested by Figure 2a, a high sensitivity is more beneficial if combined with light overbooking (9 slots) than if combined with heavy overbooking (12 slots), in which case the number of shows is more likely to exceed the capacity, causing long overtime. On the other hand, specificity is negatively correlated with overtime, but it also results in a small number of shows. So, a high specificity is particularly beneficial if the average show rate or the overbooking costs are large, or if the revenue per show is low. Unlike high sensitivity, high specificity is more beneficial if combined with heavy overbooking (12 slots) than if combined with light overbooking (9 slots), in which case there is little advantage in attempting to limit overtime (Figure 2c).

Finally, high sensitivity and high specificity make the scheduling horizon positively correlated with the profit. Figures 2b and 2d suggest that, when sensitivity or specificity is high, the profit increases as the scheduling horizon increases. In fact, same-day scheduling (a one-day horizon) is the best strategy only when the sensitivity or the specificity is 0.4; otherwise, longer horizons are preferred. We attribute the benefit of a longer horizon to the increased flexibility in choosing the appointment day that best suits the appointment's predicted show outcome: if, for example, a patient is predicted to show in one day and not to show in another, the scheduling algorithm may choose whether it is better to schedule a show in the first day or a no-show in the other, given the current schedule of both days. Obviously, this flexibility is useless if the prediction quality is low. Our results suggest two important guidelines for clinics. First, same-day scheduling is the best policy only if no effort is made to predict whether patients will show, i.e., if no-show predictions are not used; otherwise, it is the worst policy.
Second, if a policy of heavy overbooking is adopted, the prediction quality should be shifted towards high specificity; conversely, if light overbooking is adopted, the prediction quality should be shifted towards high sensitivity. Next, we analyze the decisions made throughout our experiments to develop a heuristic procedure that can be readily implemented by clinics.

7. Heuristic Procedure

We now define a heuristic procedure that is easier to implement than the column-generation approach (CG) developed in Section 4. Its design is driven by mining the set of decisions made by CG through association analysis (Tan et al. 2006). To make the interpretation easier and the computational times shorter, we embed CG in a simulation procedure with 6 slots, a capacity of 4, and a scheduling horizon of 5 days. We set the revenue earned for each showing patient to 1, the average waiting time cost to 0.5, and the overtime cost to 1.2, as in LaGanga and Lawrence (2007b). Whenever a scheduling decision is made, we record the chosen day and slot, the request's predicted show vector, and the predicted show outcome of every appointment currently scheduled. We then use association analysis to identify patterns in the decisions made.

Association analysis comprises a set of exploratory data mining techniques that find relationships between attributes. Its best-known application is market basket analysis, which finds rules related to customers' purchases, such as "A customer who buys product x also buys product y" (Tan et al. 2006). Here, we use it to identify the relationship between the predicted show outcome and the slot chosen. We use the default association rule algorithm in Weka (Hall et al. 2009), an open-source software package that implements several statistical, machine learning, and data mining techniques. After eliminating trivial rules, such as "If the request is scheduled in slot 1, it is not scheduled in slot 2", we are left with the rules summarized in Table 4.

Table 4. Scheduling rules that emerged from the association analysis

| If the predicted show outcome is | Then, schedule the appointment in | Support | Confidence | Conviction | Lift |
|---|---|---|---|---|---|
| No-show | Slot 3 or 6 | 12% | 88% | 6.74 | 2.31 |
| Show | Slot 1, 2, 4, or 5 | 83% | 95% | 3.30 | 35.93 |

The first two columns report the rule's antecedent (i.e., the "if" part of the rule) and consequent (i.e., the "then" part of the rule); the other columns report four popular interestingness measures. The support is the proportion of decisions that satisfy both the antecedent and the consequent; the confidence is the proportion of decisions satisfying the antecedent that also satisfy the consequent. Conviction and lift are two other popular interestingness measures (Bayardo and Agrawal 1999), whose values increase with the strength of the rule, a value of 1 corresponding to the antecedent and consequent being independent. The rules in Table 4 suggest that predicted no-shows are scheduled in slot 3 or 6 and predicted shows in slot 1, 2, 4, or 5.
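For readers who want to recompute such measures from a log of scheduling decisions, the sketch below uses the common textbook definitions: support = P(antecedent and consequent), confidence = P(consequent | antecedent), lift = confidence / P(consequent), and conviction = (1 − P(consequent)) / (1 − confidence). Weka's own implementation may differ in details; the predicate-based interface and the toy decision log are assumptions for illustration, not the data behind Table 4.

```python
from typing import Callable, Iterable, Set, Tuple

Decision = Set[str]


def rule_metrics(decisions: Iterable[Decision],
                 antecedent: Callable[[Decision], bool],
                 consequent: Callable[[Decision], bool]) -> Tuple[float, float, float, float]:
    """Support, confidence, lift, and conviction of the rule antecedent -> consequent."""
    decisions = list(decisions)
    n = len(decisions)
    n_a = sum(1 for d in decisions if antecedent(d))
    n_c = sum(1 for d in decisions if consequent(d))
    n_ac = sum(1 for d in decisions if antecedent(d) and consequent(d))
    support = n_ac / n
    confidence = n_ac / n_a
    p_c = n_c / n
    lift = confidence / p_c
    # Conviction is infinite for a rule that is never violated.
    conviction = float("inf") if n_ac == n_a else (1 - p_c) / (1 - confidence)
    return support, confidence, lift, conviction


# Toy log: each decision records the predicted outcome and the chosen slot.
log = ([{"pred=N", "slot=3"}] * 4 + [{"pred=N", "slot=2"}]
       + [{"pred=S", "slot=1"}] * 5)
metrics = rule_metrics(log,
                       antecedent=lambda d: "pred=N" in d,
                       consequent=lambda d: "slot=3" in d or "slot=6" in d)
print(metrics)  # support 0.4, confidence 0.8, lift 2.0, conviction about 3.0
```

Expressing the consequent as a predicate makes disjunctive consequents such as "slot 3 or 6" straightforward.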
In practice, the CG procedure attempts to obtain the sequence SSNSSN, which results in a larger profit than any other sequence with the same number of shows and no-shows. In general, the CG procedure tries to achieve a fixed target sequence of shows and no-shows by scheduling shows in certain show slots (S-slots) and no-shows in certain no-show slots (N-slots). Table 5 shows the sequences targeted by the algorithm for different values of the capacity and of the number of slots.
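The claim about SSNSSN can be checked by brute force under a simplified, deterministic reading that is our assumption: every predicted show actually shows up, the session lasts `capacity` service times split evenly among the slots, and the Section 7 costs apply (0.5 per unit of waiting and 1.2 per unit of overtime, in service-time units). Since all arrangements with the same number of shows earn the same revenue, maximizing profit amounts to minimizing waiting and overtime costs. This simplified model is only an illustration and is not claimed to reproduce every entry of Table 5.

```python
from itertools import combinations


def day_cost(pattern: str, capacity: int, service: float = 1.0,
             wait_cost: float = 0.5, overtime_cost: float = 1.2) -> float:
    """Waiting plus overtime cost of one day in which exactly the 'S' slots show."""
    session = capacity * service
    slot_len = session / len(pattern)
    free_at, total_wait = 0.0, 0.0
    for i, outcome in enumerate(pattern):
        if outcome == "S":
            arrival = i * slot_len
            start = max(arrival, free_at)
            total_wait += start - arrival
            free_at = start + service
    overtime = max(0.0, free_at - session)
    return wait_cost * total_wait + overtime_cost * overtime


def best_pattern(n_slots: int, n_shows: int, capacity: int) -> str:
    """Cheapest placement of n_shows shows among n_slots slots."""
    patterns = ("".join("S" if i in chosen else "N" for i in range(n_slots))
                for chosen in combinations(range(n_slots), n_shows))
    return min(patterns, key=lambda p: day_cost(p, capacity))


print(best_pattern(n_slots=6, n_shows=4, capacity=4))  # SSNSSN
```

In this model SSNSSN costs one third of a service time in waiting and no overtime, while the next-best arrangements of four shows in six slots cost noticeably more.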

Table 5. Target sequences for different combinations of capacity C and number of appointment slots N

| Capacity C | N = C | N = C+1 | N = C+2 | N = C+3 | N = C+4 | N = C+5 |
|---|---|---|---|---|---|---|
| 4 | SSSS | SSSSN | SSNSSN | SSNSNSN | SNSNSNSN | SNSNSNSNN |
| 5 | SSSSS | SSSSSN | SSSNSSN | SSNSSNSN | SSNSNSNSN | SNSNSNSNSN |
| 6 | SSSSSS | SSSSSSN | SSSNSSSN | SSNSSNSSN | SSNSNSSNSN | SSNSNSNSNSN |
| 7 | SSSSSSS | SSSSSSSN | SSSSNSSSN | SSSNSSNSSN | SSNSSNSSNSN | SSNSNSSNSNSN |
| 8 | SSSSSSSS | SSSSSSSSN | SSSSNSSSSN | SSSNSSSNSSN | SSNSSNSSNSSN | SSNSSNSNSSNSN |

It is easy to see that whenever we are overbooking (i.e., N > C), the last slot is an N-slot, so as to limit overtime, and the other N-slots are strategically positioned to break up long sequences of S-slots, so as to avoid long waiting times. Note that the sequences in Table 5 depend on the revenue made per patient seen, the waiting time cost, and the overtime cost, and may change if these parameters change. A further analysis shows that S-slots are almost 6 times more likely than N-slots to be assigned to same-day appointments. This suggests that predicted shows tend to be scheduled at the last moment, while predicted no-shows tend to be scheduled farther out. This counterintuitive scheduling decision is driven by the need to schedule shows in S-slots and no-shows in N-slots: to maximize the chance of obtaining a show, it is advisable to schedule the patient at the last moment, while to maximize the chance of obtaining a no-show, it is advisable to schedule the patient a few days in advance. We also ran experiments with a much longer scheduling horizon and noted that slots are allocated according to the same logic. Guided by these findings, we developed the data-driven heuristic procedure of Figure 3.

1. If possible, schedule the request in the earliest available S-slot where it is predicted to show; otherwise, proceed to step 2.
2. If possible, schedule the request in the latest available N-slot where it is predicted not to show; otherwise, proceed to step 3.
3. If rejections are allowed, reject the request.
Otherwise, schedule it in the first available slot, even beyond the predefined scheduling horizon.

Figure 3. Data-Driven Heuristic Procedure

In the case where rejections are allowed, the heuristic procedure leads to a clinic profit that is only 2.9% smaller than that obtained by the CG procedure; in the case where rejections are not allowed, it leads to a clinic profit that is only 2.1% smaller than that obtained by the CG procedure (modified to forbid rejections). Thus, a clinic can use this heuristic rather than solving CG at the cost of a modest decrease in performance.

The clinic profit of our heuristic heavily depends on the sensitivity and specificity of the classifier. Using HP(sensitivity, specificity) as shorthand for the heuristic procedure with the sensitivity and specificity as parameters, we observe the following. If the sensitivity is 0 and the specificity is 1, all requests are predicted to show. Therefore, the current request is scheduled in the earliest available S-slot (step 1 in Figure 3). If no S-slot is open, the request is most likely scheduled in the earliest available N-slot (step 3). Less likely, if all slots within the scheduling horizon are already taken, the request is scheduled in the earliest available slot beyond it. In other words, HP(0,1) attempts to fill the S-slots first. This configuration represents the lower-bound policy of our heuristic because no effort is made to predict show outcomes. The configuration HP(1,1), on the other hand, represents the upper-bound policy, where all show outcomes are correctly predicted.

8. A Real-World Example

In this section, we estimate the performance that our data-driven heuristic procedure would obtain at the health center under study. First, we measure the sensitivity and specificity attainable on its data set; our analysis also leads to identifying the causes of no-shows. Then, we compare our heuristic procedure to an open access policy, which attempts to maximize the show rate by scheduling only same-day or next-day appointments. To generalize our results, we perform this comparison for a variety of show rates.

8.1 Data and Prediction Quality

The center's database includes the details of about 50,000 appointments and 6,700 patients. All protected health information was removed or coded to protect patients' right to privacy. Figure 4 shows the Entity-Relationship diagram of the database (Chen 1976).
Here, each rectangle corresponds to a table in the database and contains the list of columns, or attributes, of the table (identifiers are not included for clarity); each arrow from a table A to a table B represents the relationship between the two tables, that is, the correspondence of one or more rows of B to each row of A. The number of rows of B involved is called the cardinality and is reported next to each arrow. For example, each appointment is performed by exactly one staff member and involves exactly one patient. However, each patient may have any number (0 to N) of appointments. Each patient is periodically evaluated through two types of outcome indicators proprietary to the health center under consideration: the Recovery Marker Inventory (RMI) and the Consumer Recovery Measures (CRM) (Olmos et al. 2012). RMI scores are evaluations of patient progress, assessed by the provider, on different aspects of the patient's life, such as his/her job, housing situation, and so

forth. CRM scores, on the other hand, are self-evaluations by the patients, who are asked how they cope with symptoms, what their level of hope is, and so on. At any given point, a patient may have an arbitrary number of RMI or CRM evaluations, but each evaluation refers to just one patient.

Figure 4: Entity-Relationship diagram of the database

In order to use any classification technique, we need to build a mining table, which has one row for each appointment and one column for each characteristic of the appointment. A final column holds the target attribute, i.e., the attribute we want to predict: the show outcome. To achieve good prediction quality, the attributes of the mining table should contain as much information as possible. They should also include information that is not in the table Appointment, such as the demographics of the patient associated with the appointment, a summary of his/her RMI and CRM scores, etc. Although these attributes could be constructed manually by a domain expert, an automatic way to do this has been proposed by Samorani et al. (2011), who developed a propositionalization procedure that attaches information from the other tables to the mining table. At the beginning of the propositionalization procedure, the mining table contains no attributes. Then, the procedure generates increasingly longer paths originating from the mining table. For each path, the so-called Roll-up algorithm adds to the mining table attributes that summarize the information present on the path. It does so by starting from the last table of the path and virtually adding attributes to the previous table, until it reaches the mining table. When considering a relationship A → B, the attributes added to A depend on the cardinality of the relationship. If it is one-to-one, such as in the case of Appointment → Patient, the attributes of B are simply added to A; if it is one-to-many, such

as in the case of Patient → RMI, the attributes of the second table (here, RMI) are aggregated and then added to the first (here, Patient). The aggregation is obtained by applying the following operators: average, sum, max, and min for numerical attributes, and the mode for categorical attributes. An example of aggregation on Patient → Appointment is the average lead time of the patient's appointments. Given any aggregation operation for a relationship, the procedure also generates all possible filters involving any attribute of the aggregated table. An example of aggregation with a filter on Patient → Appointment is the average lead time of the patient's appointments, computed only among the appointments that took place in a given location. The propositionalization procedure generates one attribute for each path and for each combination of aggregation and filter. For example, an attribute built by considering the path Mining table → Appointment → Patient → Appointment is the historical show rate. At the first step (Patient → Appointment), we virtually add to each patient the average show outcome among his/her appointments; then (Appointment → Patient), we virtually attach this attribute to the table Appointment and, finally (Mining table → Appointment), we add it to the mining table. Other attributes generated along the same path include the most frequent location and service type. Longer paths result in more complex attributes, such as the average age of the patients seen by the staff member, which is obtained on the path Mining table → Appointment → Staff → Appointment → Patient. As in Samorani et al. (2011), we allocate part of the allotted computation time (2 hours) to generating attributes on short paths (up to length 4) and part (1 hour) to generating attributes on long paths (up to length 7), in the hope of capturing complex patterns. In this way, we added 3,474 attributes to the mining table. To assess the prediction quality, half of the data set (training set) is used to train a classifier and the other half (test set) is used to evaluate the prediction quality.
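A toy version of the path Mining table → Appointment → Patient → Appointment can make the roll-up concrete. The sketch below is only an illustration under our own assumptions (in-memory lists of dictionaries, hypothetical field names, and only the average operator); the procedure of Samorani et al. (2011) explores many paths, operators, and filters automatically.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

# Hypothetical Appointment table.
appointments: List[Row] = [
    {"id": 1, "patient": "p1", "location": "clinic", "lead_time": 2, "show": 1},
    {"id": 2, "patient": "p1", "location": "community", "lead_time": 10, "show": 0},
    {"id": 3, "patient": "p1", "location": "clinic", "lead_time": 4, "show": 1},
    {"id": 4, "patient": "p2", "location": "clinic", "lead_time": 7, "show": 0},
]


def roll_up_mean(rows: List[Row], group_key: str, value_key: str,
                 where: Optional[Callable[[Row], bool]] = None) -> Dict[object, float]:
    """One-to-many step: average `value_key` over each group's rows, optionally filtered."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for r in rows:
        if where is None or where(r):
            groups[r[group_key]].append(float(r[value_key]))
    return {k: sum(v) / len(v) for k, v in groups.items()}


def attach(rows: List[Row], key: str, summary: Dict[object, float], name: str) -> List[Row]:
    """One-to-one step: copy each group's summary onto the rows that reference it."""
    return [{**r, name: summary.get(r[key])} for r in rows]


# Historical show rate: average per patient, then attached back to each appointment.
mining = attach(appointments, "patient",
                roll_up_mean(appointments, "patient", "show"), "patient_show_rate")
# Same aggregation with a filter: average lead time of the patient's clinic appointments.
mining = attach(mining, "patient",
                roll_up_mean(appointments, "patient", "lead_time",
                             where=lambda r: r["location"] == "clinic"),
                "patient_clinic_lead_time")
```

As written, the historical show rate of an appointment includes that appointment's own outcome; when such an attribute is used for prediction, the aggregate would normally be restricted to earlier appointments so that the target does not leak into the features.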
The classification techniques provided by Weka include logistic regression, Bayesian networks, SVMs, decision trees, neural networks, and others (Tan et al. 2006). Because showing appointments predominate, most of these techniques predict every request to be a show, which results in high accuracy (0.76) and perfect specificity (1), but zero sensitivity (0), because all no-shows are misclassified. One technique that does not exhibit this behavior is Bayesian networks, which predict some requests to be shows and some to be no-shows. On the training set, the Bayesian network classifier obtains an accuracy of 0.7, which is lower than that of other techniques, but it achieves a good balance between sensitivity and specificity, both equal to 0.7. We retain this balanced classifier. Next, we iteratively assign different costs to misclassifying no-shows and misclassifying shows, and train other Bayesian network classifiers with the new costs. This technique, called cost-sensitive learning, is used, for example, in diagnosis applications, where misclassifying positive objects (classifying sick people as healthy) is much worse than misclassifying negative objects (classifying healthy people as sick). Here, we use it to increase sensitivity at the expense of specificity, or vice versa. Considering only the training set, we train other Bayesian network classifiers using different cost configurations, with the goal of obtaining classifiers that are slightly biased

towards sensitivity or towards specificity. Through a trial-and-error procedure, two such classifiers are found. At this point, three classifiers have been identified: a balanced one, one biased towards high sensitivity, and one biased towards high specificity. Finally, we train the three classifiers on the whole training set and measure their prediction quality on the test set, so as to obtain the expected prediction quality for future requests. The balanced classifier obtains sensitivity = specificity = .7, the high-sensitivity classifier obtains sensitivity .9 and specificity .5, and the high-specificity classifier obtains sensitivity .6 and specificity .8.

Due to their complexity and number, most attributes retrieved on long paths are not likely to be included in a manually built mining table; however, they may be good predictors of no-shows. To verify this, we generated a limited set of attributes by exploring only paths of up to length 4. We then compared the prediction quality obtained using this limited attribute set to that obtained using the original attribute set. In particular, we measured the area under the receiver operating characteristic (ROC) curve, which, as suggested by Tan et al. (2006), is an appropriate measure when the classes (shows and no-shows) are imbalanced. The value obtained with the limited set of attributes is 4.2% smaller than the one obtained with the extended set, which suggests that propositionalization can find discriminating attributes that a manual analysis would most likely not consider. Propositionalization not only increases prediction quality but also allows new knowledge to emerge.
Among the attributes it generates, the best predictors of no-shows include the patient identification (coded to protect privacy and confidentiality), his/her no-show history, the lead time, and other information on the previous appointments held by the staff member, such as the number of times they performed particular services (individual or group therapy, case management, medication management, etc.) and the number of times they worked in a particular location (the regular clinic, a specialized clinic, or a community setting). Interestingly, it is not only the patient's service history and characteristics that matter but also those of the provider. Such attributes could be related to differences in practice models between providers, which could in turn be due to providers' assignments to different types of teams. For example, a high proportion of services delivered in the community and more tailored to patients' individualized needs could result in higher show rates for providers working in such teams than for those working in more standardized office-based teams. In addition, some providers specialize in particular populations of patients. For example, a clinician who works mostly with court-ordered patients might have higher show rates than other clinicians because there are legal consequences for those patients who do not show up. Or, if a clinician has been working for a long time with such patients, the clinician may have learned through experience how to engage them effectively, so that they return for their ongoing appointments rather than no-show.
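The effect of misclassification costs on sensitivity and specificity can be illustrated with a simple stand-in for the cost-sensitive Bayesian networks used above: threshold moving on predicted no-show probabilities. The rule below predicts a no-show when the expected cost of predicting a show, p × (cost of a missed no-show), exceeds that of predicting a no-show, (1 − p) × (cost of a false alarm). The sketch also computes the area under the ROC curve as the probability that a random no-show receives a higher score than a random show (ties count one half). The probabilities and labels are made-up toy values, and the function names are hypothetical.

```python
from typing import List, Tuple


def predict_no_show(p_no_show: List[float], cost_missed_no_show: float,
                    cost_false_alarm: float) -> List[int]:
    """1 = predicted no-show; threshold obtained by equating expected costs."""
    threshold = cost_false_alarm / (cost_false_alarm + cost_missed_no_show)
    return [1 if p > threshold else 0 for p in p_no_show]


def sensitivity_specificity(pred: List[int], actual: List[int]) -> Tuple[float, float]:
    """Labels: 1 = no-show (positive class), 0 = show."""
    tp = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(pred, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(pred, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(pred, actual) if p == 1 and a == 0)
    return tp / (tp + fn), tn / (tn + fp)


def auc(p_no_show: List[float], actual: List[int]) -> float:
    """Probability that a random no-show scores higher than a random show."""
    pos = [p for p, a in zip(p_no_show, actual) if a == 1]
    neg = [p for p, a in zip(p_no_show, actual) if a == 0]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in pos for y in neg)
    return wins / (len(pos) * len(neg))


probs = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.30, 0.55]
actual = [0, 0, 1, 0, 1, 1, 0, 0]
balanced = sensitivity_specificity(predict_no_show(probs, 1, 1), actual)
biased = sensitivity_specificity(predict_no_show(probs, 3, 1), actual)
print(balanced, biased, auc(probs, actual))
```

Raising the cost of a missed no-show lowers the threshold and trades specificity for sensitivity, much like the biased classifiers described above, while the AUC does not depend on the costs.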

8.2 Open Access Model for General Show Rates

Under the open access policy (Robinson and Chen 2010), appointments are scheduled on the day of the request or, at the latest, on the next day. They are scheduled sequentially, in consecutive slots whose length is equal to the service time; this setting corresponds to no overbooking. Let $C$ be the capacity. If the number of patients scheduled for today exceeds $C$, up to $K$ of these extra requests are deferred to tomorrow, where $K \le C$ is the deferral threshold. If more than $K$ requests would have to be deferred, the excess requests are scheduled today after hours, i.e., in overtime. When $K = 0$, only same-day appointments are allowed.

Our open access model is a generalization of the model developed by Robinson and Chen (2010). Unlike their model, ours does not assume that same-day and next-day appointments surely show up, so that we can measure the performance of open access on the observed show rates, which are less than 1 (see Table 2). Let $p_0$ be the show rate for same-day appointments and $p_1$ the show rate for next-day appointments, let $A$ be the number of requests arriving in a day, and let $\pi_k$ be the probability that $k$ patients are deferred, which can be computed by solving the system of transition equations of a Markov process, as described in detail by Robinson and Chen (2010). The expected number of shows is:

$$E[S] = \sum_{k=0}^{K} \pi_k \Big[ p_1 k + p_0 \Big( \sum_{a=0}^{C-k} a\,P(A=a) + (C-k)\,P(C-k < A \le C-k+K) + \sum_{a>C-k+K} (a-K)\,P(A=a) \Big) \Big] \qquad (8)$$

The appointments scheduled today include the $k$ appointments that have been deferred from yesterday, which will show up with probability $p_1$, and some or all of the appointments requested today, which will show up with probability $p_0$. If the appointments requested today are at most $C-k$, they are all scheduled for today. If they are between $C-k+1$ and $C-k+K$, then $C-k$ of them are scheduled for today and the others are deferred to tomorrow (they do not appear in (8)). If more than $C-k+K$ appointments are requested today, then all but $K$ (i.e., $A-K$) are scheduled for today, while $K$ are deferred to tomorrow. Overtime is experienced only if today's arrivals are so many that, even after deferring, more than $C-k$ of them are scheduled for today. If $k$ is the number of patients deferred from yesterday, this means that more than $C-k+K$ requests have arrived today. The probability that at least one patient is scheduled in overtime (i.e., after the $C$-th slot) is therefore:

$$P(\text{overtime}) = \sum_{k=0}^{K} \pi_k \, P(A > C-k+K) \qquad (9)$$

Note that the last patient is scheduled in the $n$-th slot, with $n = k + A - K > C$. However, this does not necessarily result in an overtime equal to $n - C$. In fact, if the patient scheduled in the last slot does not show up, the provider may leave at the beginning of that slot. In this case, the overtime is equal to $n - C - 1$. Therefore, the expected overtime can be computed as follows:

$$E[O] = \sum_{k=0}^{K} \pi_k \sum_{a>C-k+K} P(A=a) \big[ (k + a - K - C) - (1 - p_0) \big] \qquad (10)$$

The formula to compute the expected revenue $E[R]$ is easily derived from (8) by multiplying the number of scheduled patients by their expected revenue:

$$E[R] = \sum_{k=0}^{K} \pi_k \Big[ \bar r_1 k + \bar r_0 \Big( \sum_{a=0}^{C-k} a\,P(A=a) + (C-k)\,P(C-k < A \le C-k+K) + \sum_{a>C-k+K} (a-K)\,P(A=a) \Big) \Big] \qquad (11)$$

where $\bar r_0$ and $\bar r_1$, obtained from (1), are the expected revenue made for a same-day appointment and for a next-day appointment, respectively.

8.3 Comparison to Open Access

In the no-rejection case, the comparison between two scheduling policies is performed, as suggested by Robinson and Chen (2010), not only by ensuring that the two policies have the same capacity, but also by ensuring that the workloads (i.e., the numbers of patients seen) are the same and equal to $C$. To measure the performance of a given policy while ensuring workload equality, the request arrival rate is set to the target arrival rate, i.e., the rate that results in an expected number of showing patients per day equal to $C$ under that policy. The reason for manipulating the arrival rate lies in the assumption that non-showing patients will reschedule their appointment, thereby increasing the arrival rate. Although finding the target arrival rate is easy in the absence of no-shows (where it equals $C$) or for a constant show rate $p$ (where it equals $C/p$), it cannot be computed analytically in the case considered here, where show rates depend on the scheduling decision. For example, a policy that tends to schedule appointments far in advance requires a larger target arrival rate than a policy that tends to schedule appointments in the near future; otherwise, the patients seen under the first policy would be fewer than those seen under the second policy. Under open access, the target arrival rate is computed through a numerical procedure that increases the arrival rate from 8 to 12 in steps of 0.025, analytically computes the expected number of shows for each value using (8), and chooses the value that results in the number of shows closest to $C$.
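The open access computation can be sketched numerically. The sketch below is our reading of the verbal description, not the authors' code: the number k of requests deferred to the next day evolves as a Markov chain, k' = min(max(0, k + A − C), K) with Poisson arrivals A, and its stationary distribution is obtained by power iteration. In steady state the same number of patients is deferred into and out of each day on average, so the expected number of shows is E[k]·p1 + (λ − E[k])·p0, where λ is the arrival rate. The grid search mirrors the procedure just described (8 to 12 in steps of 0.025) and accepts any function from arrival rate to expected shows, so a simulation estimate could be plugged in for the heuristic policy. A capacity of 8 and the observed show rates of Table 2 (.87 same day, .74 next day) are assumptions for the example.

```python
import math
from typing import Callable, List


def poisson_pmf(n: int, lam: float) -> float:
    return math.exp(-lam) * lam ** n / math.factorial(n)


def poisson_cdf(n: int, lam: float) -> float:
    return sum(poisson_pmf(i, lam) for i in range(n + 1))


def deferral_distribution(lam: float, capacity: int, threshold: int,
                          iters: int = 500) -> List[float]:
    """Stationary distribution of k, the number of requests deferred to tomorrow."""
    C, K = capacity, threshold
    if K == 0:
        return [1.0]  # nothing is ever deferred
    P = []
    for k in range(K + 1):                                  # k deferred from yesterday
        row = [0.0] * (K + 1)
        row[0] = poisson_cdf(C - k, lam)                    # all of today's requests fit
        for j in range(1, K):
            row[j] = poisson_pmf(C - k + j, lam)            # exactly j requests overflow
        row[K] = 1.0 - poisson_cdf(C - k + K - 1, lam)      # K or more overflow: defer K
        P.append(row)
    pi = [1.0 / (K + 1)] * (K + 1)
    for _ in range(iters):                                  # power iteration
        pi = [sum(pi[k] * P[k][j] for k in range(K + 1)) for j in range(K + 1)]
    return pi


def expected_shows(lam: float, capacity: int, threshold: int,
                   p_same: float, p_next: float) -> float:
    pi = deferral_distribution(lam, capacity, threshold)
    mean_deferred = sum(k * w for k, w in enumerate(pi))
    # On average mean_deferred patients per day are next-day appointments;
    # the remaining lam - mean_deferred requests are seen on the same day.
    return mean_deferred * p_next + (lam - mean_deferred) * p_same


def target_arrival_rate(shows_at: Callable[[float], float], target: float,
                        lo: float = 8.0, hi: float = 12.0, step: float = 0.025) -> float:
    """Grid value of the arrival rate whose expected shows are closest to target."""
    n_steps = round((hi - lo) / step)
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda lam: abs(shows_at(lam) - target))


# Same-day only (threshold 0): shows = 0.87 * lam, so the target rate is about 8 / 0.87.
lam0 = target_arrival_rate(lambda lam: expected_shows(lam, 8, 0, 0.87, 0.74), 8)
print(round(lam0, 3))  # 9.2
```

For a zero deferral threshold the sketch returns 9.2, in line with the first row of Table 6; for positive thresholds its values depend on the modeling assumptions above and are not claimed to match the table exactly.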
Under the heuristic policy, the target arrival rate is computed through a similar numerical procedure, except that the expected number of shows is estimated through a 1,000-day simulation rather than computed analytically. This approach also provides the expected clinic profit, waiting time, and overtime obtained under each policy. We use 12 slots per day and a scheduling horizon of [ ] days. These values ensure that the capacity is greater than the arrival rate and that the heuristic procedure allocates most slots based on the predictions (steps 1-2 in Figure 3) and only a few slots otherwise (step 3 in Figure 3). Since we are comparing strategies that result in different arrival rates, we set the earliness bonus and the lead time penalty to 0, so that a no-show results in no revenue; otherwise, strategies resulting in a higher arrival rate would implicitly benefit because they would schedule more no-shows. The other parameter values are the same as in Section 6. Table 6 reports the performance of open access for each deferring threshold, while Table 7 reports the performance of the heuristic procedure for the combinations of sensitivity and specificity achieved on the health center data set (Section 8.1), along with the lower bound (0,1) and upper bound (1,1) configurations. The measures reported in these tables estimate the performance that would be obtained by implementing the open access policy or our heuristic at the mental health center under study.

Table 6: Estimated performance of Open Access

| Deferring threshold | Target arrival rate | Overall show rate | Exp. overtime | Exp. waiting time | Exp. profit | Profit deviation |
|---|---|---|---|---|---|---|
| 0 | 9.20 | 0.87 | 53.81 | 0.00 | 5.85 | -3% |
| 1 | 9.30 | 0.86 | 50.23 | 0.00 | 5.99 | -1% |
| 2 | 9.41 | 0.85 | 48.79 | 0.00 | 6.05 | 0% |
| 3 | 9.53 | 0.84 | 49.10 | 0.00 | 6.04 | 0% |
| 4 | 9.66 | 0.83 | 50.80 | 0.00 | 6.01 | -1% |
| 5 | 9.81 | 0.82 | 53.84 | 0.00 | 5.93 | -2% |
| 6 | 9.96 | 0.80 | 57.56 | 0.00 | 5.80 | -4% |
| 7 | 10.12 | 0.79 | 61.71 | 0.00 | 5.53 | -9% |
| 8 | 10.28 | 0.78 | 66.29 | 0.00 | 5.35 | -12% |

Table 7: Estimated performance of the Heuristic Procedure

| Setting (sensitivity, specificity) | Target arrival rate | Overall show rate | Exp. overtime | Exp. waiting time | Exp. profit | Profit deviation |
|---|---|---|---|---|---|---|
| 0.9, 0.5 | 9.1 | 0.88 | 15.22 | 14.74 | 7.14 | 18% |
| 0.7, 0.7 | 10.35 | 0.79 | 20.47 | 15.58 | 6.92 | 14% |
| 0.6, 0.8 | 10.8 | 0.75 | 22.97 | 15.42 | 6.82 | 13% |
| 1.0, 1.0 (UB) | 8.775 | 0.92 | 0.37 | 5.18 | 7.89 | 30% |
| 0.0, 1.0 (LB) | 11.05 | 0.73 | 31.11 | 18.14 | 6.44 | 6% |
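The profit-deviation column of Tables 6 and 7 is the relative difference from the best open-access profit. The short check below recomputes it using only the profits reported in the two tables; it assumes that deviations are rounded to the nearest percent, which reproduces every entry.

```python
# Expected clinic profit from Table 6 (open access, keyed by deferring threshold).
open_access_profit = {0: 5.85, 1: 5.99, 2: 6.05, 3: 6.04, 4: 6.01,
                      5: 5.93, 6: 5.80, 7: 5.53, 8: 5.35}
# Expected clinic profit from Table 7, keyed by (sensitivity, specificity).
heuristic_profit = {(0.9, 0.5): 7.14, (0.7, 0.7): 6.92, (0.6, 0.8): 6.82,
                    (1.0, 1.0): 7.89, (0.0, 1.0): 6.44}

# The benchmark is the most profitable open-access threshold.
best_threshold = max(open_access_profit, key=open_access_profit.get)
best_oa = open_access_profit[best_threshold]


def deviation(profit: float) -> int:
    """Percentage deviation from the best open-access profit, rounded to 1%."""
    return round(100 * (profit / best_oa - 1))


oa_dev = {t: deviation(p) for t, p in open_access_profit.items()}
hp_dev = {cfg: deviation(p) for cfg, p in heuristic_profit.items()}
print(best_threshold, best_oa)  # 2 6.05
print(hp_dev)
```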

In both tables, the first column reports the experimental setting under consideration, and the following columns report the target arrival rate needed to obtain 8 expected shows per day, the overall show rate, the expected overtime, waiting time, and clinic profit, and the profit deviation relative to the largest profit obtained by open access. We assume a service time of 30 minutes. Under open access, a larger deferral threshold is beneficial because it limits the variability of the daily workload, but it becomes detrimental if it is too large because, as more appointments are deferred to the next day, their show rate decreases. The deferring threshold that optimizes this trade-off is 2. Compared to open access, the heuristic procedure leads to much lower overtime at the cost of a modest amount of waiting time, resulting in a 13-18% increase in clinic profit. The largest profit is obtained by HP(.9,.5), which achieves a show rate (.88) slightly greater than the same-day appointment show rate (.87). The reason lies in the ability of this high-sensitivity classifier to detect the day on which patients are most likely to show, which is not necessarily today. If our predictions were perfect (as under the upper bound policy HP(1,1)), we would achieve an even larger show rate (.92). Interestingly, the lower bound policy HP(0,1), which simply assigns the S-slots first and the N-slots later without using no-show predictions, also outperforms open access. However, we show below that this is true only for low show rates. We also tested our heuristic with scheduling horizons of 10 and 30 days and observed a substantial decrease in both show rate and clinic profit.
An analysis of the scheduling decisions reveals that most S-slots are scheduled far in advance; in fact, many N-slots remain empty because, with such a long horizon, almost all requests are scheduled in the first step of the heuristic procedure (see Figure 3), and none is scheduled in the second step. Because many appointments are scheduled far in advance, the show rate is low. This result suggests that the scheduling horizon should be long enough to accommodate almost all requests, but not so long that it causes a low show rate.

8.4 Impact of Show Rate

Here, we repeat the experiments of Section 8.3 for different show rate vectors. In particular, we iteratively multiply the observed show rate vector (second row of Table 2) by an increasing factor, capping each show rate at 1. For a factor of 1, the show rate vector is the original one; for the largest factor, the same-day and next-day show rates become equal to 1. Note, though, that even for the largest factor, some of the show rates for appointments scheduled further in advance remain smaller than 1. The results are displayed in Figure 5, where, for clarity, the x-axis reports only two entries of the show-rate vector instead of the entire vector.
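The show-rate manipulation amounts to a capped element-wise scaling. A minimal sketch, assuming the show rates are stored as a list indexed by lead time (same day first) and that factors are spaced evenly; both the sweep design and the base vector are hypothetical, since Table 2 is not reproduced in this section and only the same-day value (.87) appears in the text.

```python
def scale_show_rates(show_rates: list[float], factor: float) -> list[float]:
    """Multiply each lead-time show rate by factor, capping the result at 1."""
    return [min(1.0, factor * p) for p in show_rates]


def factor_sweep(show_rates: list[float], n_points: int) -> list[float]:
    """Evenly spaced factors from 1 (original vector) up to the smallest factor
    that drives both the same-day (index 0) and next-day (index 1) rates to 1.
    This sweep design is a hypothetical choice, not the paper's exact grid."""
    f_max = 1.0 / min(show_rates[0], show_rates[1])
    return [1.0 + k * (f_max - 1.0) / (n_points - 1) for k in range(n_points)]


# Hypothetical show-rate vector indexed by lead time in days.
base = [0.87, 0.80, 0.75, 0.70]
for f in factor_sweep(base, 3):
    print(round(f, 3), [round(p, 3) for p in scale_show_rates(base, f)])
```

At the largest factor the first two entries reach 1 while the later ones stay below 1, mirroring the behavior described above.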

Figure 5. Effect of show rate on clinic profit (panels 5a and 5b)

In Figure 5a, the grey continuous line reports the largest clinic profit that can be obtained by the open access policy across the deferral thresholds considered; the black continuous line reports the largest clinic profit obtained by the heuristic procedure among the three configurations HP(.9,.5), HP(.7,.7), and HP(.6,.8); the grey dashed line reports the clinic profit obtained by the lower bound policy, HP(0,1); and the black dashed line reports the clinic profit obtained by the upper bound policy, HP(1,1). The gap between the lower bound policy and the heuristic procedure represents the extra profit obtained by adopting predictive analytics at the health center under study; the gap between the heuristic procedure and the upper bound represents the potential extra profit that could be obtained by improving our prediction quality. The open access policy performs poorly for low show rates because it schedules many non-showing appointments after hours, which dramatically increases overtime without increasing revenue. Conversely, the other policies schedule non-showing appointments during clinic hours, which affects overtime to a lesser degree. We also analyzed, for each show rate, the deferral threshold that leads to the largest profit under open access, and found that it is positively correlated with the show rate: the higher the show rate, the higher the best deferral threshold. At high show rates, the largest profit is obtained with a large deferral threshold, in accordance with the findings of Robinson and Chen (2010); at the show rates observed at the health center, it is obtained with a threshold of 2 (Table 6); and it can be easily shown that, for even lower show rates, it is obtained with a threshold of 0, i.e., without deferring appointments. This result suggests that while for high show rates it is important to limit the daily variability of the workload by deferring appointments, for low show rates it is more important to maximize the number of shows by not deferring appointments.
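At the show rates observed at the health center, the two gaps described above can be quantified from Table 7. The arithmetic below uses only the reported profits and takes the best of the three heuristic configurations, as the black continuous line does.

```python
# Clinic profit at the observed show rates (Table 7).
lower_bound = 6.44                      # HP(0,1): predictions not used
best_heuristic = max(7.14, 6.92, 6.82)  # HP(.9,.5), HP(.7,.7), HP(.6,.8)
upper_bound = 7.89                      # HP(1,1): perfect predictions

# Extra profit realized by adopting predictive analytics.
value_of_predictions = best_heuristic - lower_bound
# Extra profit still available from better predictions.
value_of_better_predictions = upper_bound - best_heuristic
print(f"{value_of_predictions:.2f} {value_of_better_predictions:.2f}")  # 0.70 0.75
```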
The heuristic procedure generally outperforms open access. However, its advantage is small for high show rates (1.0%) and large for low show rates (18.0%). Figure 5b shows that, among the three heuristic configurations, the largest profit is obtained by HP(.9,.5) for low show rates, by HP(.7,.7) for intermediate show rates, and by HP(.6,.8) for high show rates. The reason, once again, lies in the importance of maximizing the number of shows when the show rate is low, a task better performed by the high-sensitivity configuration, and in the importance of minimizing overtime when the show rate is high, a task better performed by the high-specificity configuration. If the sensitivity and specificity were both equal to 1, we would obtain the upper bound profit, which is almost constant throughout the interval. Interestingly, its value is only about 1% lower than 8, the theoretical upper bound of the clinic profit, obtained by seeing 8 patients per day without incurring any cost. This confirms the high quality of both our column-generation procedure and our heuristic procedure. Based on these findings, Table 8 summarizes our recommendations.

Table 8. Policy recommendations

| Prediction quality (rows) / Show rate (columns) | Very Low | Low-Medium | Medium-High | Very High |
|---|---|---|---|---|
| Low | Lower bound | Open Access | Open Access | Lower bound or Open Access |
| Medium | Heuristic ( ) | Heuristic ( ) | Heuristic ( ) or Open Access | Heuristic ( ) or Lower Bound or Open Access |
| High | Heuristic ( ) | Heuristic ( ) | Heuristic ( ) | Heuristic ( ) |

Clinics whose prediction quality is low (or that are simply not willing to implement the predictive logic of our framework) should adopt the lower bound policy for extremely low show rates, such as those observed in the clinic under study; the open access policy for intermediate show rates; and either the lower bound policy or the open access policy for extremely high show rates (i.e., if same-day and next-day appointments always show). Clinics achieving a prediction quality similar to ours, which we call Medium in Table 8, should adopt the heuristic procedure if the show rate is not high; otherwise, adopting open access (or the lower bound policy if the show rate is very high) results in a similar performance.
Clinics achieving a higher prediction quality than ours should always use the heuristic procedure. When implementing the heuristic procedure, one should shift the prediction quality towards high sensitivity (e.g., HP(.9,.5)) in case of low show rates and towards high specificity (e.g., HP(.6,.8)) in case of high show rates; for intermediate show rates, one should seek a balanced configuration (e.g., HP(.7,.7)).
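One standard way to shift a classifier towards sensitivity or specificity is threshold moving, a form of cost-sensitive learning (Elkan, 2001). The exact mechanism used in this study is not described in this section, so the sketch below is an assumption, not the authors' implementation. It treats a show as the positive class, consistent with the description of HP(.9,.5) as a high-sensitivity classifier that detects when patients are likely to show, and it uses hypothetical probabilities and outcomes.

```python
def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold on P(show) that minimizes expected misclassification cost when
    correct predictions cost nothing (threshold moving, as in Elkan 2001)."""
    return cost_fp / (cost_fp + cost_fn)


def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity and specificity with 'show' (label 1) as the positive class."""
    preds = [int(p >= threshold) for p in probs]
    pairs = list(zip(labels, preds))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted show probabilities and observed outcomes (1 = show).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for cost_fp, cost_fn in [(1, 1), (1, 3), (3, 1)]:
    t = cost_sensitive_threshold(cost_fp, cost_fn)
    sens, spec = sensitivity_specificity(probs, labels, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Making a missed show (false negative) costlier lowers the threshold and raises sensitivity (here from .67 to 1.00); making a false show prediction costlier raises the threshold and specificity instead.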

8.5 Variable Service Times

To relax the assumption of constant service times, we performed a sensitivity analysis that repeats the experiments of Section 8.3 with variable service times. To this end, as in LaGanga and Lawrence (2007b), we model service times with a Gamma distribution with parameter pairs (8.0, .125), (3.0, .333), (2.0, .5), and (1.0, 1.0), in order to study the impact of different levels of uncertainty. Table 9 shows the average percentage reduction of selected profits reported in Tables 6 and 7 under service time uncertainty.

Table 9: Clinic profit reduction due to service time variability

| Policy | Gamma (8, .125) | Gamma (3, .333) | Gamma (2, .5) | Gamma (1, 1) |
|---|---|---|---|---|
| HP(.9,.5) | 4% | 10% | 14% | 22% |
| HP(.7,.7) | 6% | 11% | 15% | 24% |
| HP(.6,.8) | 5% | 12% | 14% | 25% |
| Open access (deferral threshold = 2) | 9% | 16% | 21% | 31% |

The columns correspond to the different levels of uncertainty, from the least uncertain (left) to the most uncertain configuration (right). Unsurprisingly, the reduction in clinic profit grows with the level of service time uncertainty; for a given level, however, the reduction is of similar magnitude across all policies. This result suggests that our conclusions about the heuristic procedure and open access would not change in the case of variable service times.

9. Concluding Remarks

In this work, we studied the impact of no-show predictions on appointment scheduling in the case where each patient's show outcome depends on both day-dependent and day-independent attributes. To this end, we developed a column-generation procedure that uses the no-show predictions to optimally schedule appointments, and tested it via simulation for different parameter combinations. Our results suggest that:

1. Same-day scheduling is the worst policy if individual no-show predictions are considered, because a longer horizon increases the flexibility in matching each appointment's predicted show outcome to the needs of the current schedule.
2. The trade-off between specificity and sensitivity, which can be regulated by using cost-sensitive learning, determines which objective to favor: a higher sensitivity favors maximizing revenue and, therefore, the number of shows, whereas a higher specificity favors minimizing overtime and waiting time. For this reason, if heavy overbooking is adopted, the prediction quality should be shifted towards high specificity; conversely, if light overbooking (or no overbooking) is adopted, it should be shifted towards high sensitivity.

Then, by analyzing the decisions made by the exact procedure, we developed a near-optimal heuristic that is easy to implement. Our heuristic procedure consists of scheduling the predicted shows in S-slots in the near future and the predicted no-shows in N-slots farther into the future. The patterns that emerged from our analysis show that no-shows are used as buffers to avoid long sequences of shows, which would result in long waiting times and overtime. The value of scheduling patients far out even though they have a high probability of not showing up is that it satisfies the patient's immediate perception of needing nonurgent care, and the patient will not continue to call back seeking an appointment. If the patient does not show, the appointment time will be partly used by appointments assigned to the preceding S-slots. We then considered the case of a large mental health center with a high no-show rate. First, we analyzed its database with the goal of measuring the prediction quality (sensitivity and specificity). By exploring the database with propositionalization, we found that the causes of no-shows include not only the characteristics of the patients, but also those of the provider. This could be explained by differences in providers' amount of experience or in the practice approaches they use, due either to their working on different kinds of teams (more community-based versus more traditional office-based) or possibly to differences in their skills in engaging patients so that they return for their scheduled appointments. Second, we compared the estimated performance of the heuristic procedure to that of the open access policy proposed by Robinson and Chen (2010).
Our results show that, if implemented at the health center, our heuristic would outperform open access, leading to a profit increase of up to 18.0%. We also showed that for low show rates the prediction quality should be shifted towards high sensitivity, while for high show rates it should be shifted towards high specificity. We believe that these simple guidelines are of great interest to practitioners who are willing to integrate predictive analytics into their appointment scheduling systems. As medical practices rapidly adopt electronic medical record systems to qualify for federal payments and avoid penalties for noncompliance, the availability of electronic appointment data is increasing. This can help increase prediction accuracy in a growing number of clinics and provides valuable opportunities to apply our models to improve clinic performance even further.

Finally, although our work is applied to appointment scheduling, we believe that our methodological contributions can be easily generalized to other domains. First, propositionalization has the potential to become one of the most powerful tools in descriptive analytics, one of the few that truly find new knowledge. Second, our use of cost-sensitive learning to favor one of two objectives is just one example of how prediction quality can affect decisions made downstream. Third, the use of data mining to study the output of an optimization procedure is an exciting research avenue with several possible future directions. We showed one: data mining can be used to aid the design of a heuristic procedure. However, data mining can also find the structural differences between solutions obtained by two different methods (for example, a robust optimization approach and a classic stochastic optimization approach), or find clusters of similar solutions. All these applications of data mining have the potential to make optimization more interpretable and, therefore, more applicable.

Appendix: Solving the One-Day Scheduling Problem

The One-Day Scheduling Problem must be solved at each iteration of the column generation procedure for each scenario, day, and slot. The solution is found by optimally scheduling the requests of the scenario in the available slots of the day, while fixing the incoming request in the given slot (slot 0 corresponds to the case in which the request is not scheduled on that day). Provided that a data structure is built before the request arrives, i.e., offline, the One-Day Scheduling Problem can be solved in polynomial time by a procedure that we call online.

The offline procedure finds the best ways to fill the current schedule of each day in the scheduling horizon. The number of shows obtained after completing a partial schedule is at least equal to the number of shows currently scheduled and at most equal to the number of shows currently scheduled plus the number of empty slots. For example, the number of shows obtained after completing SN_S_S is at least 3 (if the complete sequence is SNNSNS) and at most 5 (if the complete sequence is SNSSSS). The offline procedure considers the partial schedule of each day and detects the optimal ways to complete it in order to achieve any possible number of shows. To this end, starting from the left of the sequence, the algorithm fills each empty slot (_) with S or N in order to generate a complete schedule that is not dominated by the complete schedules already generated.
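The completion step of the offline procedure can be illustrated on the SN_S_S example. The sketch below is a brute-force enumeration, not the authors' left-to-right generator: it lists every way to fill the empty slots and groups the complete schedules by their number of shows. The dominance test that prunes completions depends on cost parameters not given in this appendix, so it is deliberately omitted. The enumeration grows as 2 to the power of the number of empty slots, in line with the remark that the offline procedure is exponential.

```python
from collections import defaultdict
from itertools import product


def completions_by_shows(partial: str) -> dict[int, list[str]]:
    """Enumerate every way to fill the empty slots ('_') of a partial schedule
    with S (show) or N (no-show), grouped by the resulting number of shows.
    No dominance pruning is applied in this sketch."""
    holes = [i for i, c in enumerate(partial) if c == "_"]
    groups: dict[int, list[str]] = defaultdict(list)
    for fill in product("SN", repeat=len(holes)):
        slots = list(partial)
        for i, c in zip(holes, fill):
            slots[i] = c
        schedule = "".join(slots)
        groups[schedule.count("S")].append(schedule)
    return dict(groups)


groups = completions_by_shows("SN_S_S")
print(min(groups), max(groups))  # 3 5
print(groups[3], groups[5])      # ['SNNSNS'] ['SNSSSS']
```

The minimum and maximum agree with the bounds stated above: at least the 3 shows already scheduled, and at most 3 plus the 2 empty slots.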
Although the complexity of the offline procedure is exponential, clinic operations are not necessarily slowed down, because the procedure can be executed right after an appointment is scheduled. In our experiments, the average time to execute the offline procedure is only 0.04 seconds. The online procedure is executed for each scenario, for each day of the scheduling horizon, for each slot of that day (including slot 0), and for each non-dominated sequence found by the offline procedure. The problem consists of finding the optimal assignment of the shows and no-shows of the scenario's requests to the available S and N positions of the sequence, with the constraint that the incoming request is assigned to the given slot (or not scheduled on that day, if the slot is 0). It can be shown that this problem can be solved in time polynomial in the number of requests in the scenario and the number of waiting time cost categories. Therefore, the overall complexity of step 6 is this bound multiplied by the number of scenarios, days, slots, and non-dominated sequences.

References

Alden, J. M., R. L. Smith. 1992. Rolling horizon procedures in nonhomogeneous Markov decision processes. Oper. Res. 40(2) S183-S194.

Barnhart, C., E. L. Johnson, G. L. Nemhauser, M. W. P. Savelsbergh, P. H. Vance. 1998. Branch-and-price: Column generation for solving huge integer programs. Oper. Res. 46 316-329.

Barron, W. M. 1980. Failed appointments: Who misses them, why they are missed, and what can be done. Primary Care 7(4) 563-574.

Bayardo, R. J., R. Agrawal. 1999. Mining the most interesting rules. Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA.

Bean, A. G., J. Talaga. 1995. Predicting appointment breaking. Journal of Health Care Marketing 15(1) 29-34.

Campbell, J. D., R. A. Chez, T. Queen, A. Barcelo, E. Patron. 2000. The no-show rate in a high-risk obstetric clinic. Journal of Women's Health & Gender-Based Medicine 9(8) 891-895.

Cayirli, T., E. Veral. 2003. Outpatient scheduling in health care: A review of literature. Production Oper. Management 12(4) 519-549.

Chen, P. P. 1976. The entity-relationship model: Toward a unified view of data. ACM Trans. Database Syst. 1(1) 9-36.

Desrosiers, J., M. E. Lübbecke. 2005. A primer in column generation. G. Desaulniers, J. Desrosiers, M. M. Solomon, eds. Column Generation. Springer, US, 1-32.

Elkan, C. 2001. The foundations of cost-sensitive learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), Vol. 2, 973-978.

Galucci, G., W. Swartz, F. Hackerman. 2005. Impact of the wait for an initial appointment on the rate of kept appointments at a mental health center. Psychiatric Services 56(3) 344-346.

Glowacka, K. J., R. M. Henry, J. H. May. 2009. A hybrid data mining/simulation approach for modeling outpatient no-shows in clinic scheduling. Journal of the Operational Research Society 60 1056-1068.

Gupta, D., B. Denton. 2008. Appointment scheduling in health care: Challenges and opportunities. IIE Transactions 40(9) 800-819.

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations 11(1).

Hixon, A. L., R. W. Chapman, J. Nuovo. 1999. Failure to keep clinic appointments: Implications for residency education and productivity. Family Medicine 31(9) 627-630.

Huang, Y., P. Zuniga. 2012. Dynamic overbooking scheduling system to improve patient access. Journal of the Operational Research Society 63 810-820.

LaGanga, L. R., S. R. Lawrence. 2007a. Appointment scheduling with overbooking to mitigate productivity loss from no-shows. Proceedings of the Decision Sciences Institute Annual Conference, Phoenix, Arizona, November 17-20, 2007.

LaGanga, L. R., S. R. Lawrence. 2007b. Clinic overbooking to improve patient access and increase provider productivity. Decision Sciences 38(2) 251-276.

Liu, N., S. Ziya, V. G. Kulkarni. 2010. Dynamic scheduling of outpatient appointments under patient no-shows and cancellations. Manufacturing & Service Operations Management 12(2) 347-364.

Lowes, R. 2005. Practice pointers: How to handle no-shows. Medical Economics 82(8) 62-65.

Muthuraman, K., M. Lawley. 2008. A stochastic overbooking model for outpatient clinical scheduling with no-shows. IIE Transactions 40 820-837.

Olmos-Gallo, P. A., R. Starks, K. D. Lusczakoski, S. Huff, K. Mock. 2012. Seven key strategies that work together to create recovery based transformation. Community Mental Health Journal 48(3) 294-301.

Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, NY.

Qu, X., R. L. Rardin, J. A. N. Williams, D. R. Willis. 2007. Matching daily healthcare provider capacity to demand in advanced access scheduling systems. European Journal of Operational Research 2(1) 812-826.

Robinson, L. W., R. R. Chen. 2010. A comparison of traditional and open-access policies for appointment scheduling. Manufacturing & Service Operations Management 12(2) 330-346.

Rohleder, T. R., K. J. Klassen. 2002. Rolling horizon appointment scheduling: A simulation study. Health Care Management Science 5(3) 201-209.

Rust, C. T., N. H. Gallups, W. S. Clark, D. S. Jones, W. D. Wilcox. 1995. Patient appointment failures in pediatric resident continuity clinics. Archives of Pediatrics & Adolescent Medicine 149(6) 693-695.

Samorani, M., L. LaGanga. 2011. A stochastic programming approach to improve overbooking in clinic appointment scheduling. POMS Conference Proceedings.

Samorani, M., M. Laguna, K. R. DeLisle, D. Weaver. 2010. A randomized exhaustive propositionalization approach for molecule classification. INFORMS Journal on Computing 23(3) 331-345.

Sharp, D. J., W. Hamilton. 2001. Non-attendance at general practices and outpatient clinics: Local systems are needed to address local problems. British Medical Journal 323(7321) 1081-1082.

Shaw, M. J., S. Park, N. Raman. 1992. Intelligent scheduling with machine learning capabilities: The induction of scheduling knowledge. IIE Transactions 24(2) 156-168.

Shonick, W., B. W. Klein. 1977. An approach to reducing the adverse effects of broken appointments in primary care systems. Medical Care 15(5) 419-429.

Tan, P., M. Steinbach, V. Kumar. 2006. Introduction to Data Mining. Addison-Wesley.

Zeng, B., A. Turkcan, J. Lin, M. Lawley. 2010. Clinic scheduling models with overbooking for patients with heterogeneous no-show probabilities. Annals of Operations Research 178(1) 121-144.