Scheduling Algorithm for Delivery and Collection System
Kanwal Prakash Singh, Data Scientist, DSL, Housing.com
Abstract
Extreme teams, i.e. large-scale teams of agents operating in dynamic environments, are increasingly common. Task allocation is a crucial part of running such an operations team. Housing.com faces a particularly challenging task in optimizing the routes of its data collectors, who are required to collect and verify data for each of the company's listings. We discuss an approach for the optimal allocation of timed tasks (Listing Requests) to the workforce (Data Collectors), and also discuss the handling and distribution of real-time tasks. We use the Hungarian Algorithm to find a minimum cost maximum matching in a graph to achieve this.
Keywords: Scheduling, bipartite graphs, matching, Hungarian algorithm, operations research, decision science
Introduction
Route optimization poses a logistical challenge to organizations with large on-ground footprints. The objective of route optimization is to maximize the number of visits per agent while minimizing the distance travelled by that agent. Housing.com faces a particularly challenging task in optimizing the routes of its data collectors, who are required to collect and verify data for each of the company's listings. With over 1,000 new listings added to the website daily, each containing 90 associated data points for collection and verification, the company was likely to benefit significantly from the optimization of this process.

The process of data collection starts with an owner or broker raising a listing request (LR). The LR acts as an expression of interest from the owner/broker to list their property on the portal. Once the LR is raised, a data collector (DC) is given the responsibility of verifying the data submitted for the listing at a time that is convenient for both the DC and the listee. Once the DC has collected the data, it is transferred to the Data Quality team, who are entrusted with the responsibility of verifying the listing. Once the listing is thoroughly verified, it is uploaded to the portal. The challenge is to allot LRs to DCs such that their travelling time is minimized.

There are several direct and indirect benefits of route optimization. Firstly, DCs are able to process a larger number of requests daily, minimizing the lag between an LR being raised and the posting of the corresponding listing. An optimal schedule also prevents DCs from having to travel between locations that are far away from each other. Additionally, it saves fuel and associated costs. The solution outlined in this document has the potential to have far-reaching consequences not only for data collection optimization in real estate, but also for similar optimization problems in other industries such as the logistics sector.
Methodology
Background
The task assignment problem (Carpaneto and Toth, 1980) can be solved by creating a bipartite graph (Appendix 6) of persons and tasks and then finding a minimum cost maximum matching (Appendix 7, 8, 9). The Hungarian Algorithm (Appendix 10) is usually used to solve problems of this kind. An optimal task assignment is one in which the tasks are completed with minimum cost, measured either as time or as monetary cost.

To solve such problems, a graph is constructed with persons as one partition of nodes and jobs as the other. The weight of the edge between a person and a task is the cost incurred by that person to perform that task. Note that a person cannot perform two different jobs, i.e. in the chosen edge set a node in one partition can be connected to at most one node in the other partition and vice versa. An edge set therefore needs to be found such that no two edges have a common vertex. Out of the many possible edge sets, the one with the minimum sum of edge weights is chosen. This is the same as finding the minimum cost maximum matching in the bipartite graph.

Two cases are outlined in this report. In case 1, some owners/brokers ask for a specific time allotment on a day other than the current day. After checking whether such requests can be entertained, they are allocated to DCs in a way that leads to minimum deviation from their allocated paths. At the end of the day, the Hungarian Algorithm is run and makes fresh allocations of DCs to the LRs, keeping the start times of the LRs fixed. This is discussed in section 2.2.1.

In case 2, each LR has a slot number, and LRs are scheduled such that they are visited in order of increasing slot number. When an LR arrives in the system, a check is made to find out whether it can be serviced in the requested time slot. If DCs are available nearby, the LR is added to one of their computed paths such that it leads to minimum cost.
Inserting real-time LRs into the previously calculated optimal paths of DCs causes deviations from the optimum. If the arrival of LRs could be predicted, an optimal assignment could be reached. Since that isn't possible here, the Hungarian Algorithm is run repeatedly at intervals during the day to fix the deviations caused by incoming real-time requests. This is discussed further in Case 2.
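To make the minimum cost matching objective concrete, the following is a minimal sketch in Python. It is not the Hungarian Algorithm: it simply tries every possible assignment, which is only practical for very small instances but returns the same optimum. The function name `min_cost_assignment` and the travel times in `cost` are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (total_cost, assignment) for a square cost matrix.

    cost[p][t] is the cost for person p to perform task t.
    assignment[p] is the task index given to person p.
    Brute force over all permutations; illustration only.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[p][perm[p]] for p in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical travel times (in minutes) from 3 DCs (rows) to 3 LRs (columns).
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = min_cost_assignment(cost)
print(total, assignment)  # 5 [1, 0, 2]
```

Here DC 0 is sent to LR 1, DC 1 to LR 0 and DC 2 to LR 2, for a total of 5 minutes. The Hungarian Algorithm reaches the same result in polynomial time, which is what makes it usable at the scale described in this report.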
Model Implementation
Case 1: DC allotment when start time of LRs is known
Before the start of each day, a list of LRs scheduled for the next day is available. Based on heuristics and limited knowledge of the future, the LRs need to be optimally reassigned. This can be viewed as a task allotment problem where the start times of the tasks are known. Each task has a weight (cost) of being performed by a particular person. A graph is constructed in the following way.

LRs are numbered as $k_i$, where k is the slot number and i is the LR number, so LRs $1_3$ and $1_4$ have to be served in parallel (i.e. they lie in the same time slot). The number of LRs in a slot can never be greater than the number of DCs that can reach and serve the LRs in that slot, otherwise we won't be able to schedule the LRs. An edge between $k_i$ and $m_j$ exists if

$$T(k_i, m_j) \le (\mathrm{Start}(m_j) - b) - (\mathrm{End}(k_i) + b)$$

i.e.

$$T(k_i, m_j) + 2b \le \mathrm{Start}(m_j) - \mathrm{End}(k_i), \qquad m > k$$

where
T(i, j) is the time taken to travel between i and j,
b is the buffer,
Start(a) and End(a) are the start and end times of a respectively,
m, k are the slot numbers.

There is a buffer time after the end and before the start of every LR, to take care of random delays that may occur for various natural reasons. A DC's initial coordinates (nodes/vertices) are represented by $0_1$: DC_1, $0_2$: DC_2 and so on. This can be seen in figure [1].

[Figure 1: DC starting vertices $0_1, \dots, 0_L$ and LR vertices $1_1, \dots, m_{N_m}$, grouped by slot]
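The edge condition for Case 1 can be sketched as a small predicate. This is an illustrative sketch only: the function name `edge_exists`, its parameter names, and the convention of expressing all times in minutes since midnight are assumptions made for the example and are not part of the original system.

```python
def edge_exists(slot_k, slot_m, end_k, start_m, travel_time, buffer):
    """Return True if a DC who finishes LR k_i can still serve LR m_j.

    Implements T(k_i, m_j) + 2b <= Start(m_j) - End(k_i) with m > k.
    All times are assumed to be in minutes since midnight.
    """
    later_slot = slot_m > slot_k
    fits_in_gap = travel_time + 2 * buffer <= start_m - end_k
    return later_slot and fits_in_gap


# LR 1_1 ends at 10:00 (600) and LR 2_1 starts at 11:00 (660), buffer 10 min.
print(edge_exists(1, 2, end_k=600, start_m=660, travel_time=30, buffer=10))  # True
print(edge_exists(1, 2, end_k=600, start_m=660, travel_time=45, buffer=10))  # False
```

With a 60 minute gap and two 10 minute buffers, only journeys of 40 minutes or less produce an edge, so the second call returns False.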
Now a dummy vertex $k_i'$ (read as "k i prime") is attached corresponding to each LR $k_i$, signifying that a DC will have this coordinate once he has covered this particular LR. Each vertex with prime notation ($k_i'$) is then joined to the LR vertices ($a_j$) so that all nodes are connected. An exception occurs when $k \ge a$, in which case an edge between $k_i'$ and $a_j$ doesn't exist, because $a_j$ is an LR in the same or a previous time slot. The weight of the edge is the time taken to travel from $k_i$ to $a_j$. Note that the vertices with prime notation are never connected to each other (there is no edge between $k_i'$ and $j_i'$ under any circumstances). Also, each LR has an expected amount of time spent at it, so an edge between $k_i'$ and $a_j$ doesn't exist even when $k < a$ if the expected time spent at LR $k_i$ does not leave enough time to reach LR $a_j$. The resulting graph is shown in figure [2].

[Figure 2: The graph of figure 1 with a primed vertex $k_i'$ added for every LR vertex $k_i$]
The graph in figure 2 can be divided into two disjoint subsets, one containing all the LRs and the other containing the DCs' starting coordinates together with the coordinates a DC will have after he is sent to a particular LR, i.e. the vertices with prime notation, e.g. $k_i'$. Since there is no edge between vertices belonging to the same set, it is a bipartite graph. Refer to figure [3].

LRs have to be assigned to DCs such that the sum of the distances travelled by the DCs is minimum. The speed of each DC is assumed to be constant. As mentioned previously, the edge weights represent the travel time between locations. Thus all the LR vertices of the graph must be covered and the sum of the edge weights of this matching has to be minimum. The Hungarian Algorithm (Appendix 10) is used to find an optimal allocation of requests.

[Figure 3: Bipartite graph with DC coordinates ($0_1, \dots, 0_L$ and the primed vertices) on one side and LR coordinates ($1_1, \dots, m_{N_m}$) on the other]
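Appendix 10 notes that the cost matrix must be made square before the Hungarian Algorithm is run: extra columns (dummy requests) are filled with zeros and extra rows (fake Data Collectors) with a very high cost. A minimal sketch of that padding step follows. The function name `pad_to_square` and the default value of `high_cost` are assumptions made for illustration; any value larger than every real cost would serve.

```python
def pad_to_square(cost, high_cost=10**9):
    """Pad a rectangular cost matrix (list of rows) to a square one.

    Rows are DC/LR starting locations, columns are listing requests.
    Extra columns are dummy requests and cost 0; extra rows are fake
    Data Collectors and cost `high_cost` (an assumed stand-in for infinity).
    """
    rows = len(cost)
    cols = len(cost[0]) if rows else 0
    n = max(rows, cols)
    # More rows than columns: append zero-cost dummy request columns.
    padded = [list(row) + [0] * (n - cols) for row in cost]
    # More columns than rows: append high-cost fake collector rows.
    padded += [[high_cost] * n for _ in range(n - rows)]
    return padded


print(pad_to_square([[1, 2], [3, 4], [5, 6]]))
# [[1, 2, 0], [3, 4, 0], [5, 6, 0]]
print(pad_to_square([[1, 2, 3]], high_cost=99))
# [[1, 2, 3], [99, 99, 99], [99, 99, 99]]
```

Any real request matched to a fake collector row in the final assignment is one that could not be served and would need to be rescheduled.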
Case 2: Real time allotment of LRs when start time is unknown
For real-time requests, the model must be able to confirm whether the LR can be served at the requested time. For this, a DC must be free in the requested time slot. The DC at minimum distance from the assignment is picked. Suppose a DC is allocated requests $m_i$ and $n_j$. A new request $k_l$ that appears in the system in real time will be assigned to the DC given by

$$\underset{DC}{\arg\min}\ \big(T(m_i, k_l) + T(k_l, n_j)\big)$$

subject to

$$(T(m_i, k_l) + b) + (T(k_l, n_j) + b) + \big((\mathrm{End}(k_l) + b) - (\mathrm{Start}(n_j) - b)\big) \le \mathrm{Start}(n_j) - \mathrm{End}(k_l)$$

i.e.

$$T(m_i, k_l) + T(k_l, n_j) + (\mathrm{End}(k_l) - \mathrm{Start}(n_j)) + 4b \le \mathrm{Start}(n_j) - \mathrm{End}(k_l), \qquad n > k > m$$

where
T(i, j) is the time taken to travel between i and j,
Start(a) and End(a) are the start time and end time of a respectively,
b is the buffer time.

If the constraints above are not satisfied for any DC, a different time will have to be offered for the new request $k_l$ and the same procedure is repeated. If there is no possible scheduling of the new request $k_l$, it will have to be scheduled for the next day.

If the requested time for the new LR ($k_l$) is more than β away, the Hungarian Algorithm is rerun. In this case, the locations of the DCs' next LRs (or their current locations if they are free in the next slots) are treated as their starting locations, and the algorithm is rerun for the LRs falling after this time. A DC should have information about his next visit at least α time before starting his next job. This is done to avoid any hassle that reassignment may create.

Another important thing to note is that the algorithm has no concept of waiting time between the successive tasks assigned to a DC. Suppose a DC is allocated requests $m_i$ and $n_j$. If $|m - n| > \gamma$ and

$$T(m_i, \mathrm{home}) + T(\mathrm{home}, n_j) + 4b \le \mathrm{Start}(n_j) - \mathrm{End}(m_i)$$

or

$$T(m_i, \mathrm{office}) + T(\mathrm{office}, n_j) + 4b \le \mathrm{Start}(n_j) - \mathrm{End}(m_i)$$

then the DC can either go home or go to a nearby office. We can decide which of the two places the DC should go to using

$$\mathrm{Next} = \underset{a}{\arg\min}\ \big(T(m_i, a) + T(a, n_j)\big), \qquad a \in \{\mathrm{home}, \mathrm{office}\}$$

The value of γ signifies the time for which we wish to keep a DC idle at a particular location in case he has no task.
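The choice between home and office for an idle DC can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `choose_idle_location`, the dictionary layout of `travel`, and the use of minutes since midnight are all hypothetical, and the check of the slot gap against γ is assumed to have been done by the caller.

```python
def choose_idle_location(travel, end_m, start_n, buffer):
    """Pick where an idle DC should wait between LR m_i and LR n_j.

    travel maps a place name to (T(m_i, place), T(place, n_j)).
    A place is feasible if the round trip plus 4 buffers fits between
    End(m_i) and Start(n_j). Returns the feasible place with the
    smallest total travel time, or None if no place is feasible.
    """
    gap = start_n - end_m
    feasible = {
        place: to_place + to_next
        for place, (to_place, to_next) in travel.items()
        if to_place + to_next + 4 * buffer <= gap
    }
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Hypothetical travel times in minutes: (from m_i to place, from place to n_j).
travel = {"home": (40, 35), "office": (20, 30)}
print(choose_idle_location(travel, end_m=600, start_n=780, buffer=10))  # office
print(choose_idle_location(travel, end_m=600, start_n=650, buffer=10))  # None
```

When the function returns None, neither detour fits in the gap and the DC would have to wait near his current or next location instead.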
Results & Conclusions
The proposed solution is likely to improve organizational efficiency significantly. The optimization achieved through this solution is measured on two fronts. The first is the reduction in travel time (equivalently, distance in our case) of DCs. The second is whether DCs are able to service LRs as per schedule after implementation of the solution.

To benchmark the new algorithm against the existing one, we calculated the distance travelled under the existing algorithm (fetching the records from our database) and simulated the same LRs using the proposed algorithm. First we took all the LRs that had been scheduled for a particular date t, and at date t-1 we reassigned them using the approach described in section 2.2.1. Then the requests that came into the system in real time on date t were added to the pre-computed paths of the DCs. These simulations revealed a greater than 50% reduction in distance with no compromise on the servicing of LRs. Plot 1 and Plot 2 show the total distance travelled under the current algorithm and the proposed algorithm on different dates for Bangalore and Mumbai respectively.

[Plot 1: Total Distance vs Date (Bangalore); current and projected distance for 17/05/14 to 23/05/14]
From the simulations of the proposed algorithm we also observed that 27% of the DCs remained idle and were not given any LR. To see the limit of utilization, we took the LRs of two days and ran the allocation using the approach discussed in section 2.2.1. We found that 97% of these extra LRs could be scheduled in the same day, thus improving our resource utilization.

The relevance of route optimization was alluded to earlier in the document. Perhaps the most important feature of the solution is its applicability to the operations of several diverse industries. For instance, the solution can be extended to logistics companies, optimizing the routes of delivery personnel for a given number of package drop locations.

[Plot 2: Total Distance vs Date (Mumbai); current and projected distance for 17/05/14 to 23/05/14]
Acknowledgements
I was able to complete (and publish) this white paper with the support and encouragement of numerous people. I would first like to extend my gratitude to my managers, Mr. Abhishek Anand and Mr. Abhimanyu Dhamija, for their constant encouragement and technical support. I am also grateful to my colleague Mr. Shanu Vivek for identifying the problem and entrusting me with the responsibility of solving it. I am also indebted to my colleague Mr. Nitin Sangwan for providing me with the resources and organisational support I required to solve this problem, and to my intern Mr. Adit Rustagi for carrying out the simulations and tests highlighted in this report. I would also like to acknowledge my colleague Mr. Akhil Srivatsan for aiding me with the development of the content of this report.
Definitions / Appendix
1. Node/Vertex: an independent entity, the smallest building block of a graph
2. Edge: the connection between two vertices
3. Graph: a collection of edges and vertices
4. DC: data collector
5. LR: listing request
6. Bipartite Graph: a graph is said to be bipartite if its vertices can be divided into two disjoint sets such that no two nodes belonging to the same set have an edge between them
7. Matching: a set of edges such that no two edges have a common vertex
8. Maximum Matching: a matching of the largest possible size
9. Minimum Cost Maximum Matching: a maximum matching for which the sum of the edge weights is minimum
10. Hungarian Algorithm

Construction of Cost Matrix
Firstly, all the listing requests are divided into slots, say A, B, C and so on. The X-axis of the matrix consists of all the listing requests. The Y-axis of the matrix consists of all the Data Collectors followed by all the listing request locations, excluding the listing requests of the final slot (as the listing requests are also possible locations from which a Data Collector can go to his next allotment). The cost matrix is then constructed using the distances between locations as the costs.

The matrix is made square because the Hungarian Algorithm requires a square matrix as its input. If there are more rows than columns, we add extra columns with all elements set to zero. If there are more columns than rows, we add extra rows with all elements set to a very high cost. This is done because an extra column means a dummy request, and it takes zero effort to perform a dummy request. Similarly, an extra row means a fake Data Collector, and it takes an infinite amount of effort for a fake Data Collector to perform a real task.

Procedure
1. Subtract the row minimum from all the elements of the respective row. Then go to step 2.
2. Select a set of independent zeroes from the matrix and mark them with a star. This is done by selecting the leading zeroes from the matrix, marking them with a star and removing the row and column corresponding to each such zero. Then go to step 3.
3. Cover the columns containing the starred zeroes. If all the columns of the matrix are covered, the starred zeroes represent the optimal assignment; otherwise go to step 4.
4. Find an uncovered zero. If there is none, go to step 6. If there is at least one, prime it (mark it with a prime, ′) and call it P0. If there is a starred zero in the row of the primed zero, cover that row, uncover the column of the starred zero and repeat step 4. If there is no starred zero in that row, go to step 5.
5. Find a path of alternating primes and stars starting from the prime P0 of step 4. This is done by finding a starred zero in the column of P0 (if it exists) and then finding a primed zero (which is guaranteed to exist) in the row of that starred zero. Continue until you reach a primed zero whose column contains no starred zero. Then un-star all the starred zeroes found in the path and star all the primed zeroes found in the path. After that, un-prime all the primed zeroes, uncover all the rows and columns and go to step 3.
6. Find the minimum uncovered element. Subtract it from all the uncovered elements and add it to the twice-covered elements, i.e. those that are both row and column covered. Then return to step 4.

Source: http://archive.vector.org.uk/trad/v201/scholes_201_080.pdf

References
1. Carpaneto, Giorgio, and Paolo Toth. Algorithm 548: Solution of the assignment problem [H]. ACM Transactions on Mathematical Software (TOMS) 6.1 (1980): 104-111.
2. Giovanni Righini, Associate Professor of Operations Research at Università degli Studi di Milano, Facoltà di Scienze Matematiche, Fisiche e Naturali, Dipartimento di Tecnologie dell'Informazione. Lecture Notes on Matching. http://www.dti.unimi.it/~righini/didattica/complementiricercaoperativa/materialecro/Matching.pdf, last accessed 20 June 2014.
3. Lovász, László, and Michael D. Plummer. Matching theory. New York (1986).
4. Yi Cao. Munkres (Hungarian) Algorithm for Linear Assignment Problem. http://www.mathworks.in/matlabcentral/fileexchange/20652-hungarian-algorithmfor-linear-assignment-problems--v2-3-, last accessed 20 June 2014.