ALGORITHMS OF INFORMATICS
ALGORITHMS OF INFORMATICS
Volume 2: APPLICATIONS
mondat Kiadó, Budapest, 2007
The book appeared with the support of the Department of Mathematics of the Hungarian Academy of Sciences.

Editor: Antal Iványi

Authors: Zoltán Kása (Chapter 1), Zoltán Csörnyei (2), Ulrich Tamm (3), Péter Gács (4), Gábor Ivanyos, Lajos Rónyai (5), Antal Járai, Attila Kovács (6), Jörg Rothe (7, 8), Csanád Imreh (9), Ferenc Szidarovszky (10), Zoltán Kása (11), Aurél Galántai, András Jeney (12), István Miklós (13), László Szirmay-Kalos (14), Ingo Althöfer, Stefan Schwarz (15), Burkhard Englert, Dariusz Kowalski, Grzegorz Malewicz, Alexander Allister Shvartsman (16), Tibor Gyires (17), Antal Iványi, Claudia Leopold (18), Eberhard Zehendner (19), Ádám Balogh, Antal Iványi (20), János Demetrovics, Attila Sali (21, 22), Attila Kiss (23)

Validators: Zoltán Fülöp (1), Pál Dömösi (2), Sándor Fridli (3), Anna Gál (4), Attila Pethő (5), Gábor Ivanyos (6), János Gonda (7), Lajos Rónyai (8), Béla Vizvári (9), János Mayer (10), András Recski (11), Tamás Szántai (12), István Katsányi (13), János Vida (14), Tamás Szántai (15), István Majzik (16), János Sztrik (17), Dezső Sima (18, 19), László Varga (20), Attila Kiss (21, 22), András Benczúr (23)

Linguistic validators: Anikó Hörmann and Veronika Vöröss

Translators: Anikó Hörmann, László Orosz, Miklós Péter Pintér, Csaba Schneider, Veronika Vöröss

Cover art: Victor Vasarely, Kubtuz. With the permission of © HUNGART, Budapest. The film used is due to GOMA ZRt.
Cover design by Antal Iványi

© Ingo Althöfer, Viktor Belényesi, Zoltán Csörnyei, János Demetrovics, Pál Dömösi, Burkhard Englert, Péter Gács, Aurél Galántai, Anna Gál, János Gonda, Tibor Gyires, Anikó Hörmann, Csanád Imreh, Anna Iványi, Antal Iványi, Gábor Ivanyos, Antal Járai, András Jeney, Zoltán Kása, István Katsányi, Attila Kiss, Attila Kovács, Dariusz Kowalski, Claudia Leopold, Kornél Locher, Grzegorz Malewicz, János Mayer, István Miklós, Attila Pethő, András Recski, Lajos Rónyai, Jörg Rothe, Attila Sali, Stefan Schwarz, Alexander Allister Shvartsman, Dezső Sima, Tamás Szántai, Ferenc Szidarovszky, László Szirmay-Kalos, János Sztrik, Ulrich Tamm, László Varga, János Vida, Béla Vizvári, Veronika Vöröss, Eberhard Zehendner, 2007

ISBN of Volume 2: ; ISBN of Volume 1 and Volume 2: Ö

Published by mondat Kiadó, H-1158 Budapest, Jánoshida u. 18. Telephone/facsimile: Internet: [email protected]
Responsible publisher: ifj. László Nagy
Printed and bound by mondat Kft, Budapest
Contents

Preface
Introduction

IV. COMPUTER NETWORKS

13. Distributed Algorithms (Burkhard Englert, Dariusz Kowalski, Grzegorz Malewicz, Alexander Allister Shvartsman)
    Message passing systems and algorithms
        Modeling message passing systems
        Asynchronous systems
        Synchronous systems
    Basic algorithms
        Broadcast
        Construction of a spanning tree
    Ring algorithms
        The leader election problem
        The leader election algorithm
        Analysis of the leader election algorithm
    Fault-tolerant consensus
        The consensus problem
        Consensus with crash failures
        Consensus with Byzantine failures
        Lower bound on the ratio of faulty processors
        A polynomial algorithm
        Impossibility in asynchronous systems
    Logical time, causality, and consistent state
        Logical time
        Causality
        Consistent state
    Communication services
        Properties of broadcast services
        Ordered broadcast services
        Multicast services
    Rumor collection algorithms
        Rumor collection problem and requirements
        Efficient gossip algorithms
    Mutual exclusion in shared memory
        Shared memory systems
        The mutual exclusion problem
        Mutual exclusion using powerful primitives
        Mutual exclusion using read/write registers
        Lamport's fast mutual exclusion algorithm

14. Computer Graphics (László Szirmay-Kalos)
    Fundamentals of analytic geometry
        Cartesian coordinate system
        Description of point sets with equations
        Solids
        Surfaces
        Curves
        Normal vectors
        Curve modelling
        Surface modelling
        Solid modelling with blobs
        Constructive solid geometry
    Geometry processing and tessellation algorithms
        Polygon and polyhedron
        Vectorization of parametric curves
        Tessellation of simple polygons
        Tessellation of parametric surfaces
        Subdivision curves and meshes
        Tessellation of implicit surfaces
    Containment algorithms
        Point containment test
        Polyhedron-polyhedron collision detection
        Clipping algorithms
    Translation, distortion, geometric transformations
        Projective geometry and homogeneous coordinates
        Homogeneous linear transformations
    Rendering with ray tracing
        Ray surface intersection calculation
        Speeding up the intersection calculation
    Incremental rendering
        Camera transformation
        Normalizing transformation
        Perspective transformation
        Clipping in homogeneous coordinates
        Viewport transformation
        Rasterization algorithms
        Incremental visibility algorithms

Subject Index
Name Index
Preface

It is a special pleasure for me to recommend to the Readers the book Algorithms of Informatics, edited with great care by Antal Iványi. Computer algorithms form a very important and fast developing branch of computer science. Design and analysis of large computer networks, large scale scientific computations and simulations, economic planning, data protection and cryptography and many other applications require effective, carefully planned and precisely analysed algorithms.

Many years ago we wrote a small book with Péter Gács under the title Algorithms. The two volumes of the book Algorithms of Informatics show how this topic developed into a complex area that branches off into many exciting directions. It gives a special pleasure to me that so many excellent representatives of Hungarian computer science have cooperated to create this book. It is obvious to me that this book will be one of the most important reference books for students, researchers and computer users for a long time.

Budapest, July 2007
László Lovász
Introduction

The first volume of the book Informatikai algoritmusok (in English: Algorithms of Informatics) appeared in 2004, and the second volume of the book appeared later. The two volumes contained 31 chapters: the 23 chapters of the present book, and further chapters on clustering (author: András Lukács), frequent elements in data bases (author: Ferenc Bodon), geoinformatics (authors: István Elek, Csaba Sidló), inner-point methods (authors: Tibor Illés, Marianna Nagy, Tamás Terlaky), number theory (authors: Gábor Farkas, Imre Kátai), Petri nets (authors: Zoltán Horváth, Máté Tejfel), queueing theory (authors: László Lakatos, László Szeidl, Miklós Telek), and scheduling (author: Béla Vizvári).

The Hungarian version of the first volume contained those chapters which were finished by May 2004, and the second volume contained the chapters finished afterwards. The English version contains the chapters submitted before its publication. Volume 1 contains the chapters belonging to the fundamentals of informatics, while the second volume contains the chapters having closer connection with some applications.

The chapters of the first volume are divided into three parts. The chapters of Part 1 are connected with automata: Automata and Formal Languages (written by Zoltán Kása, Babes-Bolyai University of Cluj-Napoca), Compilers (Zoltán Csörnyei, Eötvös Loránd University), Compression and Decompression (Ulrich Tamm, Chemnitz University of Technology), Reliable Computations (Péter Gács, Boston University). The chapters of Part 2 have algebraic character: here are the chapters Algebra (written by Gábor Ivanyos, Lajos Rónyai, Budapest University of Technology and Economics), Computer Algebra (Antal Járai, Attila Kovács, Eötvös Loránd University), further Cryptology and Complexity Theory (Jörg Rothe, Heinrich Heine University).
The chapters of Part 3 have numeric character: Competitive Analysis (Csanád Imreh, University of Szeged), Game Theory (Ferenc Szidarovszky, The University of Arizona) and Scientific Computations (Aurél Galántai, András Jeney, University of Miskolc).
The second volume is also divided into three parts. The chapters of Part 4 are connected with computer networks: Distributed Algorithms (Burkhard Englert, California State University; Dariusz Kowalski, University of Liverpool; Grzegorz Malewicz, University of Alabama; Alexander Allister Shvartsman, University of Connecticut), Network Simulation (Tibor Gyires, Illinois State University), Parallel Algorithms (Antal Iványi, Eötvös Loránd University; Claudia Leopold, University of Kassel), and Systolic Systems (Eberhard Zehendner, Friedrich Schiller University). The chapters of Part 5 are Memory Management (Ádám Balogh, Antal Iványi, Eötvös Loránd University), Relational Databases and Query in Relational Databases (János Demetrovics, Eötvös Loránd University; Attila Sali, Alfréd Rényi Institute of Mathematics), Semi-structured Data Bases (Attila Kiss, Eötvös Loránd University). The chapters of Part 6 of the second volume have close connections with biology: Bioinformatics (István Miklós, Eötvös Loránd University), Human-Computer Interactions (Ingo Althöfer, Stefan Schwarz, Friedrich Schiller University), and Computer Graphics (László Szirmay-Kalos, Budapest University of Technology and Economics).

The chapters were validated by Gábor Ivanyos, Lajos Rónyai, András Recski, and Tamás Szántai (Budapest University of Technology and Economics), Sándor Fridli, János Gonda, and Béla Vizvári (Eötvös Loránd University), Pál Dömösi and Attila Pethő (University of Debrecen), Zoltán Fülöp (University of Szeged), Anna Gál (University of Texas), and János Mayer (University of Zürich). The validators of the chapters which appeared only in the Hungarian version: István Pataricza, Lajos Rónyai (Budapest University of Technology and Economics), András A.
Benczúr (Computer and Automation Research Institute), Antal Járai (Eötvös Loránd University), Attila Meskó (Hungarian Academy of Sciences), János Csirik (University of Szeged), and János Mayer (University of Zürich).

The book contains the verbal description, pseudocode and analysis of over 200 algorithms, and over 350 figures and 120 examples illustrating how the algorithms work. Each section ends with exercises and each chapter ends with problems. In the book you can find over 330 exercises and 70 problems. We have supplied an extensive bibliography in the section Chapter Notes of each chapter. The web site of the book contains the maintained living version of the bibliography, in which the names of authors, journals and publishers are usually links to the corresponding web sites.

The LaTeX style file was written by Viktor Belényesi. The figures were drawn or corrected by Kornél Locher. Anna Iványi transformed the bibliography into hypertext. The linguistic validators of the book are Anikó Hörmann and Veronika Vöröss. Some chapters were translated by Anikó Hörmann (Eötvös Loránd University), László Orosz (University of Debrecen), Miklós Péter Pintér (Corvinus University of Budapest), Csaba Schneider (Budapest University of Technology and Economics), and Veronika Vöröss (Eötvös Loránd University). The publication of the book was supported by the Department of Mathematics of the Hungarian Academy of Sciences.
We plan to publish the corrected and extended version of this book in printed and electronic form too. The book has a web site, where you can obtain a list of known errors, report errors, or make suggestions (using the data of the colophon page you can contact any of the creators of the book). The web site contains the maintained PDF version of the bibliography, in which the names of the authors, journals and publishers are usually active links to the corresponding web sites (the living elements are underlined in the printed bibliography). We welcome ideas for new exercises and problems.

Budapest, July 2007
Antal Iványi ([email protected])
IV. COMPUTER NETWORKS
13. Distributed Algorithms

We define a distributed system as a collection of individual computing devices that can communicate with each other. This definition is very broad: it includes anything from a VLSI chip, to a tightly coupled multiprocessor, to a local area cluster of workstations, to the Internet. Here we focus on more loosely coupled systems. In a distributed system as we view it, each processor has its semi-independent agenda, but for various reasons, such as sharing of resources, availability, and fault-tolerance, processors need to coordinate their actions.

Distributed systems are highly desirable, but it is notoriously difficult to construct efficient distributed algorithms that perform well in realistic system settings. These difficulties are not just of a practical nature; they are also fundamental. In particular, many of the difficulties are introduced by the three factors of asynchrony, limited local knowledge, and failures. Asynchrony means that global time may not be available, and that both absolute and relative times at which events take place at individual computing devices can often not be known precisely. Moreover, each computing device can only be aware of the information it receives; it has therefore an inherently local view of the global status of the system. Finally, computing devices and network components may fail independently, so that some remain functional while others do not.

We will begin by describing the models used to analyse distributed systems in the message-passing model of computation. We present and analyse selected distributed algorithms based on these models. We include a discussion of fault-tolerance in distributed systems and consider several algorithms for reaching agreement in the message-passing models for settings prone to failures.
Given that global time is often unavailable in distributed systems, we present approaches for providing logical time that allows one to reason about causality and consistent states in distributed systems. Moving on to more advanced topics, we present a spectrum of broadcast services often considered in distributed systems and present algorithms implementing these services. We also present advanced algorithms for rumor gathering. Finally, we consider the mutual exclusion problem in the shared-memory model of distributed computation.
13.1. Message passing systems and algorithms

We present our first model of distributed computation, for message passing systems without failures. We consider both synchronous and asynchronous systems and present selected algorithms for message passing systems with arbitrary network topology, in both synchronous and asynchronous settings.

Modeling message passing systems

In a message passing system, processors communicate by sending messages over communication channels, where each channel provides a bidirectional connection between two specific processors. We call the pattern of connections described by the channels the topology of the system. This topology is represented by an undirected graph, where each node represents a processor, and an edge is present between two nodes if and only if there is a channel between the two processors represented by the nodes. The collection of channels is also called the network. An algorithm for such a message passing system with a specific topology consists of a local program for each processor in the system. This local program provides the ability to the processor to perform local computations, and to send and receive messages from each of its neighbours in the given topology.

Each processor in the system is modeled as a possibly infinite state machine. A configuration is a vector C = (q_0, ..., q_{n-1}) where each q_i is the state of a processor p_i. Activities that can take place in the system are modeled as events (or actions) that describe indivisible system operations. Examples of events include local computation events and delivery events where a processor receives a message. The behaviour of the system over time is modeled as an execution, a (finite or infinite) sequence of configurations (C_i) alternating with events (a_i): C_0, a_1, C_1, a_2, C_2, .... Executions must satisfy a variety of conditions that are used to represent the correctness properties, depending on the system being modeled.
These conditions can be classified as either safety or liveness conditions. A safety condition for a system is a condition that must hold in every finite prefix of any execution of the system. Informally, it states that nothing bad has happened yet. A liveness condition is a condition that must hold a certain (possibly infinite) number of times. Informally, it states that eventually something good must happen. An important liveness condition is fairness, which requires that an (infinite) execution contains infinitely many actions by a processor, unless after some configuration no actions are enabled at that processor.

Asynchronous systems

We say that a system is asynchronous if there is no fixed upper bound on how long it takes for a message to be delivered or how much time elapses between consecutive steps of a processor. An obvious example of such an asynchronous system is the Internet. In an implementation of a distributed system there are often upper bounds on message delays and processor step times. But since these upper bounds are often very large and can change over time, it is often desirable to develop an algorithm
that is independent of any timing parameters, that is, an asynchronous algorithm.

In the asynchronous model we say that an execution is admissible if each processor has an infinite number of computation events, and every message sent is eventually delivered. The first of these requirements models the fact that processors do not fail. (It does not mean that a processor's local program contains an infinite loop. An algorithm can still terminate by having a transition function not change a processor's state after a certain point.) We assume that each processor's set of states includes a subset of terminated states. Once a processor enters such a state it remains in it. The algorithm has terminated if all processors are in terminated states and no messages are in transit.

The message complexity of an algorithm in the asynchronous model is the maximum, over all admissible executions of the algorithm, of the total number of (point-to-point) messages sent.

A timed execution is an execution that has a nonnegative real number associated with each event, the time at which the event occurs. To measure the time complexity of an asynchronous algorithm we first assume that the maximum message delay in any execution is one unit of time. Hence the time complexity is the maximum time until termination among all timed admissible executions in which every message delay is at most one. Intuitively, this can be viewed as taking any execution of the algorithm and normalising it in such a way that the longest message delay becomes one unit of time.

Synchronous systems

In the synchronous model processors execute in lock-step. The execution is partitioned into rounds so that every processor can send a message to each neighbour, the messages are delivered, and every processor computes based on the messages just received. This model is very convenient for designing algorithms.
Algorithms designed in this model can in many cases be automatically simulated to work in other, more realistic timing models. In the synchronous model we say that an execution is admissible if it is infinite. From the round structure it follows then that every processor takes an infinite number of computation steps and that every message sent is eventually delivered. Hence in a synchronous system with no failures, once a (deterministic) algorithm has been fixed, the only relevant aspect determining an execution that can change is the initial configuration. On the other hand, in an asynchronous system there can be many different executions of the same algorithm, even with the same initial configuration and no failures, since the interleaving of processor steps and the message delays are not fixed.

The notion of terminated states and the termination of the algorithm is defined in the same way as in the asynchronous model. The message complexity of an algorithm in the synchronous model is the maximum, over all admissible executions of the algorithm, of the total number of messages sent.
To measure time in a synchronous system we simply count the number of rounds until termination. Hence the time complexity of an algorithm in the synchronous model is the maximum number of rounds in any admissible execution of the algorithm until the algorithm has terminated.

13.2. Basic algorithms

We begin with some simple examples of algorithms in the message passing model.

Broadcast

We start with a simple algorithm Spanning-Tree-Broadcast for the (single message) broadcast problem, assuming that a spanning tree of the network graph with n nodes (processors) is already given. Later, we will remove this assumption. A processor p_r wishes to send a message M to all other processors. The spanning tree rooted at p_r is maintained in a distributed fashion: each processor has a distinguished channel that leads to its parent in the tree, as well as a set of channels that lead to its children in the tree. The root p_r sends the message M on all channels leading to its children. When a processor receives the message on a channel from its parent, it sends M on all channels leading to its children.

Spanning-Tree-Broadcast

Initially M is in transit from p_r to all its children in the spanning tree.

Code for p_r:
1  upon receiving no message:   // first computation event by p_r
2      terminate

Code for p_j, 0 ≤ j ≤ n−1, j ≠ r:
3  upon receiving M from parent:
4      send M to all children
5      terminate

The algorithm Spanning-Tree-Broadcast is correct whether the system is synchronous or asynchronous. Moreover, the message and time complexities are the same in both models. Using simple inductive arguments we will first prove a lemma that shows that by the end of round t, the message M reaches all processors at distance t (or less) from p_r in the spanning tree.

Lemma 13.1 In every admissible execution of the broadcast algorithm in the synchronous model, every processor at distance t from p_r in the spanning tree receives the message M in round t.
Proof We proceed by induction on the distance t of a processor from p_r. First let t = 1. It follows from the algorithm that each child of p_r receives the message in round 1.
Assume that each processor at distance t−1 received the message M in round t−1. We need to show that each processor p_t at distance t receives the message in round t. Let p_s be the parent of p_t in the spanning tree. Since p_s is at distance t−1 from p_r, by the induction hypothesis, p_s received M in round t−1. By the algorithm, p_t will hence receive M in round t.

By Lemma 13.1 the time complexity of the broadcast algorithm is d, where d is the depth of the spanning tree. Now since d is at most n−1 (when the spanning tree is a chain) we have:

Theorem 13.2 There is a synchronous broadcast algorithm for n processors with message complexity n−1 and time complexity d, when a rooted spanning tree with depth d is known in advance.

We now move to an asynchronous system and apply a similar analysis.

Lemma 13.3 In every admissible execution of the broadcast algorithm in the asynchronous model, every processor at distance t from p_r in the spanning tree receives the message M by time t.

Proof We proceed by induction on the distance t of a processor from p_r. First let t = 1. It follows from the algorithm that M is initially in transit to each processor p_i at distance 1 from p_r. By the definition of time complexity for the asynchronous model, p_i receives M by time 1.

Assume that each processor at distance t−1 received the message M by time t−1. We need to show that each processor p_t at distance t receives the message by time t. Let p_s be the parent of p_t in the spanning tree. Since p_s is at distance t−1 from p_r, by the induction hypothesis, p_s sends M to p_t when it receives M at time t−1. By the algorithm, p_t will hence receive M by time t.
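Both complexity bounds can be checked concretely: in the synchronous model the rounds correspond to depth levels of the tree, and the same numbers arise in the asynchronous model once every message delay is normalised to one time unit. The following sketch is our own illustrative simulation, not part of the book's pseudocode; the function name and the encoding of the tree as a children map are our assumptions. It runs the broadcast round by round and reports when each processor receives M and how many messages were sent.

```python
def synchronous_broadcast(children, root):
    """Simulate Spanning-Tree-Broadcast in the synchronous model.

    children maps each processor to the list of its children in the
    rooted spanning tree.  Returns (receive_round, messages_sent),
    where receive_round[p] is the round in which p receives M
    (0 for the root, which holds M from the start).
    """
    receive_round = {root: 0}
    messages = 0
    frontier = [root]           # processors that obtained M in the last round
    rnd = 0
    while frontier:
        rnd += 1
        next_frontier = []
        for p in frontier:      # each of them forwards M to its children
            for c in children.get(p, []):
                receive_round[c] = rnd
                messages += 1
                next_frontier.append(c)
        frontier = next_frontier
    return receive_round, messages

# A chain of n = 4 processors rooted at 0, so the depth is d = 3.
rounds, msgs = synchronous_broadcast({0: [1], 1: [2], 2: [3]}, 0)
print(rounds)   # {0: 0, 1: 1, 2: 2, 3: 3}: depth t is reached in round t
print(msgs)     # 3 messages, i.e. n - 1
```

On any tree the simulation sends exactly one message per non-root node (n−1 in total) and uses as many rounds as the depth d, matching Theorems 13.2 and 13.4.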
We immediately obtain:

Theorem 13.4 There is an asynchronous broadcast algorithm for n processors with message complexity n−1 and time complexity d, when a rooted spanning tree with depth d is known in advance.

Construction of a spanning tree

The asynchronous algorithm called Flood, discussed next, constructs a spanning tree rooted at a designated processor p_r. The algorithm is similar to the Depth First Search (DFS) algorithm. However, unlike DFS, where there is just one processor with global knowledge about the graph, in the Flood algorithm each processor has only local knowledge about the graph; processors coordinate their work by exchanging messages, and processors and messages may get delayed arbitrarily. This makes the design and analysis of the Flood algorithm challenging, because we need to show that the algorithm indeed constructs a spanning tree despite conspiratorial selection of these delays.

Algorithm description

Each processor has four local variables. The links adjacent to a processor are identified with distinct numbers starting from 1 and stored in a local variable called
neighbours. We will say that the spanning tree has been constructed when the variable parent stores the identifier of the link leading to the parent of the processor in the spanning tree (except that this variable is none for the designated processor p_r); children is a set of identifiers of the links leading to the children processors in the tree; and other is a set of identifiers of all other links. So the knowledge about the spanning tree may be distributed across processors.

The code of each processor is composed of segments. There is a segment (lines 1–4) that describes how the local variables of a processor are initialised. Recall that the local variables are initialised that way before time 0. The next three segments (lines 5–10, 11–14 and 15–18) describe the instructions that any processor executes in response to having received a message: <adopt>, <approved> or <rejected>. The last segment (lines 19–21) is only included in the code of processor p_r. This segment is executed only when the local variable parent of processor p_r is nil. At some point of time, it may happen that more than one segment can be executed by a processor (e.g., because the processor received <adopt> messages from two processors). Then the processor executes the segments serially, one by one (segments of any given processor are never executed concurrently). However, instructions of different processors may be arbitrarily interleaved during an execution. Every message that can be processed is eventually processed and every segment that can be executed is eventually executed (fairness).
Flood

Code for any processor p_k, 1 ≤ k ≤ n

 1  initialisation
 2      parent ← nil
 3      children ← ∅
 4      other ← ∅

 5  process message <adopt> that has arrived on link j
 6      if parent = nil
 7          then parent ← j
 8               send <approved> to link j
 9               send <adopt> to all links in neighbours \ {j}
10          else send <rejected> to link j

11  process message <approved> that has arrived on link j
12      children ← children ∪ {j}
13      if children ∪ other = neighbours \ {parent}
14          then terminate

15  process message <rejected> that has arrived on link j
16      other ← other ∪ {j}
17      if children ∪ other = neighbours \ {parent}
18          then terminate
Extra code for the designated processor p_r

19  if parent = nil
20      then parent ← none
21           send <adopt> to all links in neighbours

Let us outline how the algorithm works. The designated processor sends an <adopt> message to all its neighbours, and assigns none to the parent variable (nil and none are two distinguished values, different from any natural number), so that it never again sends the message to any neighbour.

When a processor processes message <adopt> for the first time, the processor assigns to its own parent variable the identifier of the link on which the message has arrived, responds with an <approved> message to that link, and forwards an <adopt> message to every other link. However, when a processor processes message <adopt> again, the processor responds with a <rejected> message, because the parent variable is no longer nil.

When a processor processes message <approved>, it adds the identifier of the link on which the message has arrived to the set children. It may turn out that the sets children and other combined form the identifiers of all links adjacent to the processor except for the identifier stored in the parent variable. In this case the processor enters a terminated state. When a processor processes message <rejected>, the identifier of the link is added to the set other. Again, when the union of children and other is large enough, the processor enters a terminated state.

Correctness proof

We now argue that Flood constructs a spanning tree. The key moments in the execution of the algorithm are when any processor assigns a value to its parent variable. These assignments determine the shape of the spanning tree. The facts that any processor eventually executes an instruction, any message is eventually delivered, and any message is eventually processed, ensure that the knowledge about these assignments spreads to neighbours. Thus the algorithm is expanding a subtree of the graph, albeit the expansion may be slow.
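This informal argument can be sanity-checked by simulation. The sketch below is our own illustration, not part of the book's pseudocode: the function name, the encoding of the graph as adjacency sets, and the use of a seeded pseudo-random scheduler to stand in for arbitrary delays are all our assumptions. It delivers pending messages one at a time in an arbitrary order and records the parent assignments, which should always end up encoding a spanning tree rooted at p_r.

```python
import random

def flood(adj, root, seed=0):
    """Simulate Flood on a connected undirected graph.

    adj maps each processor to the set of its neighbours.  Pending
    messages are delivered one at a time in a pseudo-random order to
    model arbitrary asynchronous delays; every message is eventually
    processed (fairness).  Returns the parent pointer of every
    processor; the root points to itself, standing in for 'none'.
    """
    rng = random.Random(seed)
    parent = {v: None for v in adj}      # None plays the role of nil
    parent[root] = root                  # extra code of p_r (lines 19-21)
    pending = [(root, v, "adopt") for v in adj[root]]
    while pending:
        i = rng.randrange(len(pending))  # adversarial-ish scheduling
        src, dst, kind = pending.pop(i)
        if kind == "adopt":
            if parent[dst] is None:      # first <adopt>: adopt the sender
                parent[dst] = src
                pending.append((dst, src, "approved"))
                pending += [(dst, v, "adopt") for v in adj[dst] if v != src]
            else:                        # parent already set: reject
                pending.append((dst, src, "rejected"))
        # <approved> / <rejected> only drive the termination
        # bookkeeping, which this sketch omits.
    return parent

# A graph with a cycle, rooted at processor 0.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
parent = flood(adj, 0)
assert all(parent[v] is not None for v in adj)   # every processor was adopted
```

Which neighbour a processor adopts depends on the delivery order (the seed), but in every run the parent pointers form a tree rooted at p_r, which is exactly what Lemma 13.5 and Theorem 13.6 assert.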
Eventually, a spanning tree is formed. Once a spanning tree has been constructed, eventually every processor will terminate, even though some processors may have terminated even before the spanning tree has been constructed.

Lemma 13.5 For any 1 ≤ k ≤ n, there is a time t_k which is the first moment when there are exactly k processors whose parent variables are not nil, and these processors and their parent variables form a tree rooted at p_r.

Proof We prove the statement of the lemma by induction on k. For the base case, assume that k = 1. Observe that processor p_r eventually assigns none to its parent variable. Let t_1 be the moment when this assignment happens. At that time, the parent variable of any processor other than p_r is still nil, because no <adopt> messages have been sent so far. Processor p_r and its parent variable form a tree with a single node and no arcs. Hence they form a rooted tree. Thus the inductive hypothesis holds for k = 1.
For the inductive step, suppose that 1 ≤ k < n and that the inductive hypothesis holds for k. Consider the time t_k which is the first moment when there are exactly k processors whose parent variables are not nil. Because k < n, there is a non-tree processor. But the graph G is connected, so there is a non-tree processor adjacent to the tree. (For any subset T of processors, a processor p_i is adjacent to T if and only if there is an edge in the graph G from p_i to a processor in T.) Recall that, by definition, the parent variable of such a processor is nil. By the inductive hypothesis, the k processors must have executed line 7 of their code, and so each either has already sent or will eventually send an <adopt> message to all its neighbours on links other than the parent link. So the non-tree processors adjacent to the tree have already received or will eventually receive <adopt> messages. Eventually, each of these adjacent processors will, therefore, assign a value other than nil to its parent variable. Let t_{k+1} > t_k be the first moment when any processor performs such an assignment, and let us denote this processor by p_i. This cannot be a tree processor, because such a processor never again assigns any value to its parent variable. Could p_i be a non-tree processor that is not adjacent to the tree? It could not, because such a processor does not have a direct link to a tree processor, so it cannot receive <adopt> directly from the tree, and so this would mean that at some time t between t_k and t_{k+1} some other non-tree processor p_j must have sent an <adopt> message to p_i, and so p_j would have to assign a value other than nil to its parent variable some time after t_k but before t_{k+1}, contradicting the fact that t_{k+1} is the first such moment. Consequently, p_i is a non-tree processor adjacent to the tree, such that, at time t_{k+1}, p_i assigns to its parent variable the index of a link leading to a tree processor.
Therefore, time t_{k+1} is the first moment when there are exactly k + 1 processors whose parent variables are not nil, and, at that time, these processors and their parent variables form a tree rooted at p_r. This completes the inductive step, and the proof of the lemma.

Theorem 13.6 Eventually each processor terminates, and when every processor has terminated, the subgraph induced by the parent variables forms a spanning tree rooted at p_r.

Proof By Lemma 13.5, we know that there is a moment t_n which is the first moment when all processors and their parent variables form a spanning tree. Is it possible that every processor has terminated before time t_n? By inspecting the code, we see that a processor terminates only after it has received <rejected> or <approved> messages from all its neighbours other than the one to which the parent link leads. A processor receives such messages only in response to <adopt> messages that the processor sends. At time t_n, there is a processor that still has not even sent <adopt> messages. Hence, not every processor has terminated by time t_n. Will every processor eventually terminate? We notice that by time t_n, each processor either has already sent or will eventually send an <adopt> message to all its neighbours other than the one to which the parent link leads. Whenever a processor receives an <adopt> message, the processor responds with <rejected> or <approved>, even if the processor has already terminated. Hence, eventually, each processor will receive either a <rejected> or an <approved> message on each link to which the processor has sent an <adopt> message. Thus, eventually, each processor terminates. We note that the fact that a processor has terminated does not mean that a
spanning tree has already been constructed. In fact, it may happen that processors in a different part of the network have not even received any message, let alone terminated.

Theorem 13.7 Message complexity of Flood is O(e), where e is the number of edges in the graph G.

The proof of this theorem is left as an exercise.

Exercises
It may happen that a processor has terminated even though it has not even received any message. Show a simple network, and show how to delay message delivery and processor computation to demonstrate that this can indeed happen.
It may happen that a processor has terminated but may still respond to a message. Show a simple network, and show how to delay message delivery and processor computation to demonstrate that this can indeed happen.

13.3. Ring algorithms

One often needs to coordinate the activities of processors in a distributed system. This can frequently be simplified when there is a single processor that acts as a coordinator. Initially, the system may not have any coordinator, or an existing coordinator may fail and so another may need to be elected. This creates the problem where processors must elect exactly one among them, a leader. In this section we study the problem for a special type of network: the ring. We will develop an asynchronous algorithm for the problem. As we shall demonstrate, the algorithm has asymptotically optimal message complexity. In the current section, we will see a distributed analogue of the well-known divide-and-conquer technique often used in sequential algorithms to keep their time complexity low. The technique used in distributed systems helps reduce the message complexity.

The leader election problem

The leader election problem is to elect exactly one leader among a set of processors. Formally, each processor has a local variable leader initially equal to nil. An algorithm is said to solve the leader election problem if it satisfies the following conditions:
1.
in any execution, exactly one processor eventually assigns true to its leader variable, and all other processors eventually assign false to their leader variables, and
2. in any execution, once a processor has assigned a value to its leader variable, the variable remains unchanged.

Ring model

We study the leader election problem on a special type of network: the ring. Formally, the graph G that models a distributed system consists of n nodes that form a simple cycle; no other edges exist in the graph. The two links adjacent to a
processor are labeled CW (clockwise) and CCW (counterclockwise). Processors agree on the orientation of the ring, i.e., if a message is passed on in the CW direction n times, then it visits all n processors and comes back to the one that initially sent the message; the same holds for the CCW direction. Each processor has a unique identifier that is a natural number, i.e., the identifier of each processor is different from the identifier of any other processor; the identifiers do not have to be the consecutive numbers 1, ..., n. Initially, no processor knows the identifier of any other processor. Also, processors do not know the size n of the ring.

The leader election algorithm

Bully elects a leader among asynchronous processors p_1, ..., p_n. Identifiers of processors are used by the algorithm in a crucial way. Briefly speaking, each processor tries to become the leader; the processor that has the largest identifier among all processors blocks the attempts of other processors, declares itself to be the leader, and forces the others to declare themselves not to be leaders.

Let us begin with a simpler version of the algorithm to exemplify some of its ideas. Suppose that each processor sends a message around the ring containing the identifier of the processor. Any processor passes on such a message only if the identifier that the message carries is strictly larger than the identifier of the processor. Thus the message sent by the processor that has the largest identifier among the processors of the ring will always be passed on, and so it will eventually travel around the ring and come back to the processor that initially sent it. The processor can detect that such a message has come back, because no other processor sends a message with this identifier (identifiers are distinct). We observe that no other message will make it all around the ring, because the processor with the largest identifier will not pass it on.
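This simpler version can be sketched concretely. The following Python simulation is our own illustration, not the book's pseudocode: the function name and the message bookkeeping are assumptions. It forwards every surviving message one hop clockwise per step and counts the messages sent.

```python
def simple_election(ids):
    """Simulate the simpler election on a ring of distinct identifiers.

    Each processor injects a message carrying its identifier; a message
    is forwarded only while it carries an identifier strictly larger
    than that of the receiving processor.  The processor that gets its
    own identifier back becomes the leader.
    """
    n = len(ids)
    messages = [[ids[i]] for i in range(n)]   # messages waiting at processor i
    msgs_sent = 0
    leader = None
    while leader is None:
        new = [[] for _ in range(n)]
        for i in range(n):
            for m in messages[i]:
                j = (i + 1) % n               # forward one hop clockwise
                msgs_sent += 1
                if m == ids[j]:
                    leader = ids[j]           # own message travelled all around
                elif m > ids[j]:
                    new[j].append(m)          # passed on; smaller ids are swallowed
        messages = new
    return leader, msgs_sent
```

On a descending ring such as 5, 4, 3, 2, 1 the message carrying identifier k makes k hops before being swallowed (or, for the maximum, returning home), so the total count is n(n+1)/2, illustrating the quadratic behaviour discussed in the text.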
We could say that the processor with the largest identifier swallows those messages that carry smaller identifiers. Then the processor becomes the leader and sends a special message around the ring forcing all the others to decide not to be leaders. The algorithm has Θ(n^2) message complexity, because each processor induces at most n messages, and the leader induces n extra messages; and one can assign identifiers to processors, and delay processors and messages, in such a way that the messages sent by a constant fraction of the n processors are passed on around the ring for a constant fraction of n hops. The algorithm can be improved so as to reduce the message complexity to O(n lg n), and such an improved algorithm is presented in the remainder of the section.

The key idea of the Bully algorithm is to make sure that not too many messages travel far, which will ensure O(n lg n) message complexity. Specifically, the activity of any processor is divided into phases. At the beginning of a phase, a processor sends probe messages in both directions: CW and CCW. These messages carry the identifier of the sender and a certain time-to-live value that limits the number of hops that each message can make. A probe message may be passed on by a processor provided that the identifier carried by the message is larger than the identifier of the processor. When the message reaches the hop limit without having been swallowed, it is bounced back. Hence when the initial sender receives two bounced-back messages, one from each direction, then the processor is certain
that there is no processor with a larger identifier within the limit in either the CW or the CCW direction, because otherwise such a processor would swallow a probe message. Only then does the processor enter the next phase by sending probe messages again, this time with the time-to-live value roughly doubled, in an attempt to find out whether there is a processor with a larger identifier in a neighbourhood twice as large. As a result, a probe message that the processor sends will make many hops only when there is no processor with a larger identifier in a large neighbourhood of the processor. Therefore, fewer and fewer processors send messages that can travel longer and longer distances. Consequently, as we will soon argue in detail, the message complexity of the algorithm is O(n lg n).

We now detail the Bully algorithm. Each processor has five local variables. The variable id stores the unique identifier of the processor. The variable leader stores true when the processor decides to be the leader, and false when it decides not to be the leader. The remaining three variables are used for bookkeeping: asleep determines if the processor has ever sent a <probe,id,0,0> message that carries the identifier id of the processor. Any processor may send a <probe,id,phase,2^phase-1> message in both directions (CW and CCW) for different values of phase. Each time such a message is sent, a <reply,id,phase> message may be sent back to the processor. The variables CWreplied and CCWreplied are used to remember whether the replies have already been processed by the processor.

The code of each processor is composed of five segments. The first segment (lines 1-5) initialises the local variables of the processor. The second segment (lines 6-8) can only be executed when the local variable asleep is true.
The remaining three segments (lines 9-17, 18-26, and 27-31) describe the actions that the processor takes when it processes each of the three types of messages: <probe,ids,phase,ttl>, <reply,ids,phase> and <terminate>, respectively. The messages carry parameters ids, phase and ttl that are natural numbers.

We now describe how the algorithm works. Recall that we assume that the local variables of each processor have been initialised before time 0 of the global clock. Each processor eventually sends a <probe,id,0,0> message carrying the identifier id of the processor. At that time we say that the processor enters phase number zero. In general, when a processor sends a message <probe,id,phase,2^phase-1>, we say that the processor enters phase number phase. The message <probe,id,0,0> is never sent again because false is assigned to asleep in line 7. It may happen that by the time this message is sent, some other messages have already been processed by the processor.

When a processor processes a message <probe,ids,phase,ttl> that has arrived on link CW (the link leading in the clockwise direction), then the actions depend on the relationship between the parameter ids and the identifier id of the processor. If ids is smaller than id, then the processor does nothing else (the processor swallows the message). If ids is equal to id and the processor has not yet decided, then, as we shall see, the probe message that the processor sent has circulated around the entire ring. Then the processor sends a <terminate> message, decides to be the leader, and terminates (the processor may still process messages after termination). If ids is larger than id, then the actions of the processor depend on the value of the parameter ttl (time-to-live). When the value is strictly larger than zero, then the processor
passes on the probe message with ttl decreased by one. If, however, the value of ttl is already zero, then the processor sends back (in the CW direction) a reply message. Symmetric actions are executed when the <probe,ids,phase,ttl> message has arrived on link CCW, in the sense that the directions of sending messages are respectively reversed; see the code for details.

Bully

       Code for any processor p_k, 1 ≤ k ≤ n

 1  initialisation
 2    asleep ← true
 3    CWreplied ← false
 4    CCWreplied ← false
 5    leader ← nil

 6  if asleep
 7    then asleep ← false
 8         send <probe,id,0,0> to links CW and CCW

 9  process message <probe,ids,phase,ttl> that has arrived on link CW (resp. CCW)
10  if id = ids and leader = nil
11    then send <terminate> to link CCW
12         leader ← true
13         terminate
14  if ids > id and ttl > 0
15    then send <probe,ids,phase,ttl-1> to link CCW (resp. CW)
16  if ids > id and ttl = 0
17    then send <reply,ids,phase> to link CW (resp. CCW)

18  process message <reply,ids,phase> that has arrived on link CW (resp. CCW)
19  if id ≠ ids
20    then send <reply,ids,phase> to link CCW (resp. CW)
21    else CWreplied ← true (resp. CCWreplied)
22         if CWreplied and CCWreplied
23           then CWreplied ← false
24                CCWreplied ← false
25                send <probe,id,phase+1,2^(phase+1)-1> to links CW and CCW
27  process message <terminate> that has arrived on link CW
28  if leader = nil
29    then send <terminate> to link CCW
30         leader ← false
31         terminate

When a processor processes a message <reply,ids,phase> that has arrived on link CW, then the processor first checks whether ids is different from the identifier id of the processor. If so, the processor merely passes on the message. However, if ids = id, then the processor records the fact that a reply has been received from direction CW, by assigning true to CWreplied. Next the processor checks if both the CWreplied and CCWreplied variables are true. If so, the processor has received replies from both directions. Then the processor assigns false to both variables. Next the processor sends a probe message. This message carries the identifier id of the processor, the next phase number phase + 1, and an increased time-to-live parameter 2^(phase+1) - 1. Symmetric actions are executed when <reply,ids,phase> has arrived on link CCW.

The last type of message that a processor can process is <terminate>. The processor checks if it has already decided to be or not to be the leader. When no decision has been made so far, the processor passes on the <terminate> message and decides not to be the leader. This message eventually reaches a processor that has already decided, and then the message is no longer passed on.

Analysis of the leader election algorithm

We begin the analysis by showing that the algorithm Bully solves the leader election problem.

Correctness proof

Theorem 13.8 Bully solves the leader election problem on any ring with asynchronous processors.

Proof We need to show that the two conditions listed at the beginning of the section are satisfied. The key idea that simplifies the argument is to focus on one processor. Consider the processor p_i with the maximum id among all processors in the ring. This processor eventually executes lines 6-8. Then the processor sends <probe,id,0,0> messages in the CW and CCW directions.
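The message flow described by the code can also be exercised mechanically. The following Python simulation is our own sketch, not the book's code: it wakes every processor at the start, delivers messages round-robin (one admissible asynchronous schedule), and numbers the links so that link 0 of processor i is its CW link, leading to processor (i+1) mod n; a message sent on a link arrives on the receiver's opposite link, as on a consistently oriented ring.

```python
from collections import deque

def elect_leader(ids):
    """Simulate Bully on a ring of distinct identifiers; return the
    list of leader variables (True for the leader, False otherwise)."""
    n = len(ids)
    leader = [None] * n
    cw_rep = [False] * n
    ccw_rep = [False] * n
    inbox = [[deque(), deque()] for _ in range(n)]   # inbox[i][link]

    def send(i, d, msg):
        j = (i + 1) % n if d == 0 else (i - 1) % n
        inbox[j][1 - d].append(msg)                  # arrives on opposite link

    for i in range(n):                               # lines 6-8: wake up
        send(i, 0, ("probe", ids[i], 0, 0))
        send(i, 1, ("probe", ids[i], 0, 0))

    while any(inbox[i][d] for i in range(n) for d in (0, 1)):
        for i in range(n):
            for d in (0, 1):
                if not inbox[i][d]:
                    continue
                msg = inbox[i][d].popleft()
                if msg[0] == "probe":
                    _, mid, phase, ttl = msg
                    if mid == ids[i] and leader[i] is None:
                        leader[i] = True             # lines 10-13
                        send(i, 1 - d, ("terminate",))
                    elif mid > ids[i] and ttl > 0:   # lines 14-15: pass on
                        send(i, 1 - d, ("probe", mid, phase, ttl - 1))
                    elif mid > ids[i]:               # lines 16-17: bounce back
                        send(i, d, ("reply", mid, phase))
                elif msg[0] == "reply":
                    _, mid, phase = msg
                    if mid != ids[i]:                # lines 19-20: pass on
                        send(i, 1 - d, msg)
                    else:                            # lines 21-25
                        if d == 0:
                            cw_rep[i] = True
                        else:
                            ccw_rep[i] = True
                        if cw_rep[i] and ccw_rep[i]:
                            cw_rep[i] = ccw_rep[i] = False
                            nxt = ("probe", ids[i], phase + 1,
                                   2 ** (phase + 1) - 1)
                            send(i, 0, nxt)
                            send(i, 1, nxt)
                else:                                # lines 27-31: terminate
                    if leader[i] is None:
                        leader[i] = False
                        send(i, 1 - d, msg)
    return leader
```

Running it on any ring of distinct identifiers, exactly the maximum-identifier processor ends with leader = true, matching the proof of Theorem 13.8.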
Note that whenever the processor sends <probe,id,phase,2^phase-1> messages, each such message is always passed on by the other processors, until the ttl parameter of the message drops down to zero, or the message travels around the entire ring and arrives at p_i. If the message never arrives at p_i, then a processor eventually receives the probe message with ttl equal to zero, and that processor sends a response back to p_i. Then, eventually, p_i receives <reply,id,phase> messages from both directions, and enters phase number phase + 1 by sending probe messages <probe,id,phase+1,2^(phase+1)-1> in both directions. These messages carry a larger time-to-live value compared to the value from the previous phase number phase. Since the ring is finite, eventually ttl becomes so large that
processor p_i receives a probe message that carries the identifier of p_i. Note that p_i will eventually receive two such messages. The first time p_i processes such a message, the processor sends a <terminate> message and terminates as the leader. The second time p_i processes such a message, lines 11-13 are not executed, because the variable leader is no longer nil. Note that no other processor p_j can execute lines 11-13, because a probe message originated at p_j cannot travel around the entire ring, since p_i is on the way, and p_i would swallow the message; and since identifiers are distinct, no other processor sends a probe message that carries the identifier of processor p_j. Thus no processor other than p_i can assign true to its leader variable. Any processor other than p_i will receive the <terminate> message, assign false to its leader variable, and pass on the message. Finally, the <terminate> message will arrive at p_i, and p_i will not pass it on anymore. The argument presented thus far ensures that eventually exactly one processor assigns true to its leader variable, all other processors assign false to their leader variables, and once a processor has assigned a value to its leader variable, the variable remains unchanged.

Our next task is to give an upper bound on the number of messages sent by the algorithm. The subsequent lemma shows that the number of processors that can enter a phase decays exponentially as the phase number increases.

Lemma 13.9 Given a ring of size n, the number k of processors that enter phase number i ≥ 0 is at most n/2^(i-1).

Proof There are exactly n processors that enter phase number i = 0, because each processor eventually sends a <probe,id,0,0> message. The bound stated in the lemma says that the number of processors that enter phase 0 is at most 2n, so the bound evidently holds for i = 0. Let us consider any of the remaining cases, i.e., let us assume that i ≥ 1.
Suppose that a processor p_j enters phase number i, and so by definition it sends the message <probe,id,i,2^i-1>. In order for a processor to send such a message, each of the two probe messages <probe,id,i-1,2^(i-1)-1> that the processor sent in the previous phase in both directions must have made 2^(i-1) hops, always arriving at a processor with a strictly lower identifier than the identifier of p_j (because otherwise, if a probe message arrives at a processor with a strictly larger or the same identifier, then the message is swallowed, so a reply message is not generated, and consequently p_j cannot enter phase number i). As a result, if a processor enters phase number i, then there is no other processor within 2^(i-1) hops in either direction that can ever enter the phase. Suppose that there are k ≥ 1 processors that enter phase i. We can associate with each such processor p_j the 2^(i-1) consecutive processors that follow p_j in the CW direction. This association assigns 2^(i-1) distinct processors to each of the k processors. So there must be at least k + k·2^(i-1) distinct processors in the ring. Hence k(1 + 2^(i-1)) ≤ n, and so we can weaken this bound by dropping the 1, and conclude that k·2^(i-1) ≤ n, as desired.

Theorem The algorithm Bully has O(n lg n) message complexity, where n is the size of the ring.

Proof Note that any processor in phase i sends messages that are intended to travel 2^i away and back in each direction (CW and CCW). This contributes at most 4·2^i
messages per processor that enters phase number i. The contribution may be smaller than 4·2^i if a probe message gets swallowed on the way away from the processor. Lemma 13.9 provides an upper bound on the number of processors that enter phase number i. What is the highest phase that a processor can ever enter? The number k of processors that can be in phase i is at most n/2^(i-1). So when n/2^(i-1) < 1, there can be no processor that ever enters phase i. Thus no processor can enter any phase beyond phase number h = 1 + ⌈lg n⌉, because n < 2^((h+1)-1). Finally, a single processor sends one termination message that travels around the ring once. So for the total number of messages sent by the algorithm we get the upper bound

    n + Σ_{i=0}^{1+⌈lg n⌉} (n/2^(i-1)) · 4·2^i = n + Σ_{i=0}^{1+⌈lg n⌉} 8n = O(n lg n).

Burns furthermore showed that the asynchronous leader election algorithm is asymptotically optimal: any uniform algorithm solving the leader election problem in an asynchronous ring must send a number of messages at least proportional to n lg n.

Theorem Any uniform algorithm for electing a leader in an asynchronous ring sends Ω(n lg n) messages.

The proof, for any algorithm, is based on constructing certain executions of the algorithm on rings of size n/2. Then two rings of size n/2 are pasted together in such a way that the constructed executions on the smaller rings are combined, and Θ(n) additional messages are received. This construction strategy yields the desired logarithmic multiplicative overhead.

Exercises
Show that the simplified Bully algorithm has Ω(n^2) message complexity, by appropriately assigning identifiers to processors on a ring of size n, and by determining how to delay processors and messages.
Show that the algorithm Bully has Ω(n lg n) message complexity.

13.4. Fault-tolerant consensus

The algorithms presented so far are based on the assumption that the system on which they run is reliable.
Here we present selected algorithms for unreliable distributed systems, where the active (or correct) processors need to coordinate their activities based on common decisions.

It is inherently difficult for processors to reach agreement in a distributed setting prone to failures. Consider the deceptively simple problem of two failure-free processors attempting to agree on a common bit using a communication medium where messages may be lost. This problem is known as the two generals problem. Here two generals must coordinate an attack using couriers that may be destroyed by the enemy. It turns out that it is not possible to solve this problem using a finite
number of messages. We prove this fact by contradiction. Assume that there is a protocol used by processors A and B involving a finite number of messages. Let us consider such a protocol that uses the smallest number of messages, say k messages. Assume without loss of generality that the last, k-th, message is sent from A to B. Since this final message is not acknowledged by B, A must determine the decision value whether or not B receives this message. Since the message may be lost, B must determine the decision value without receiving this final message. But then both A and B decide on a common value without needing the k-th message. In other words, there is a protocol that uses only k - 1 messages for the problem. But this contradicts the assumption that k is the smallest number of messages needed to solve the problem.

In the rest of this section we consider agreement problems where the communication medium is reliable, but where the processors are subject to two types of failures: crash failures, where a processor stops and does not perform any further actions, and Byzantine failures, where a processor may exhibit arbitrary, or even malicious, behaviour as the result of the failure. The algorithms presented deal with the so-called consensus problem, first introduced by Lamport, Pease, and Shostak. The consensus problem is a fundamental coordination problem that requires processors to agree on a common output, based on their possibly conflicting inputs.

The consensus problem

We consider a system in which each processor p_i has a special state component x_i, called the input, and y_i, called the output (also called the decision). The variable x_i initially holds a value from some well-ordered set of possible inputs, and y_i is undefined. Once an assignment to y_i has been made, it is irreversible.
Any solution to the consensus problem must guarantee:

Termination: In every admissible execution, y_i is eventually assigned a value, for every nonfaulty processor p_i.
Agreement: In every execution, if y_i and y_j are assigned, then y_i = y_j, for all nonfaulty processors p_i and p_j. That is, nonfaulty processors do not decide on conflicting values.
Validity: In every execution, if for some value v, x_i = v for all processors p_i, and if y_i is assigned for some nonfaulty processor p_i, then y_i = v. That is, if all processors have the same input value, then any value decided upon must be that common input.

Note that in the case of crash failures this validity condition is equivalent to requiring that every nonfaulty decision value is the input of some processor. Once a processor crashes, it is of no interest to the algorithm, and no requirements are put on its decision. We begin by presenting a simple algorithm for consensus in a synchronous message passing system with crash failures.
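The three conditions translate directly into a predicate over one finished execution. The following Python checker is our own sketch (the function name and data representation are assumptions, not from the book):

```python
def check_consensus(inputs, outputs, faulty=frozenset()):
    """Return True iff a finished execution satisfies the conditions.

    inputs[i] is x_i; outputs[i] is y_i, or None if p_i never decided;
    faulty is the set of indices of faulty processors, whose outputs
    are ignored.
    """
    nonfaulty = [i for i in range(len(inputs)) if i not in faulty]
    # Termination: every nonfaulty processor eventually decides.
    if any(outputs[i] is None for i in nonfaulty):
        return False
    # Agreement: nonfaulty processors do not decide on conflicting values.
    if len({outputs[i] for i in nonfaulty}) > 1:
        return False
    # Validity: a common input value forces that value as the decision.
    if len(set(inputs)) == 1 and nonfaulty and outputs[nonfaulty[0]] != inputs[0]:
        return False
    return True
```

For example, check_consensus([1, 1, 1], [0, 0, 0]) is False (validity violated), while check_consensus([0, 1, 0], [0, None, 0], faulty={1}) is True, since the crashed processor's missing decision is ignored.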
Consensus with crash failures

Since the system is synchronous, an execution of the system consists of a series of rounds. Each round consists of the delivery of all messages, followed by one computation event for every processor. The set of faulty processors can be different in different executions, that is, it is not known in advance. Let F be a subset of at most f processors, the faulty processors. Each round contains exactly one computation event for the processors not in F and at most one computation event for every processor in F. Moreover, if a processor in F does not have a computation event in some round, it does not have such an event in any further round. In the last round in which a faulty processor has a computation event, an arbitrary subset of its outgoing messages are delivered.

Consensus-with-Crash-Failures

       Code for processor p_i, 0 ≤ i ≤ n - 1.

    Initially V = {x}

       round k, 1 ≤ k ≤ f + 1
 1  send {v ∈ V : p_i has not already sent v} to all processors
 2  receive S_j from p_j, 0 ≤ j ≤ n - 1, j ≠ i
 3  V ← V ∪ ⋃_{j=0}^{n-1} S_j
 4  if k = f + 1 then y ← min(V)

In the previous algorithm, which is based on an algorithm by Dolev and Strong, each processor maintains a set of the values it knows to exist in the system. Initially, the set contains only its own input. In later rounds the processor updates its set by joining it with the sets received from other processors. It then broadcasts any new additions to the set to all processors. This continues for f + 1 rounds, where f is the maximum number of processors that can fail. At this point, the processor decides on the smallest value in its set of values.

To prove the correctness of this algorithm we first notice that the algorithm requires exactly f + 1 rounds. This implies termination. Moreover, the validity condition is clearly satisfied, since the decision value is the input of some processor. It remains to show that the agreement condition holds.
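The round structure of this algorithm is easy to simulate. The sketch below is ours, with a simplified crash model: a processor crashing in round k sends to a random subset of the processors in that round and is silent afterwards, which matches the requirement that an arbitrary subset of its last-round messages is delivered.

```python
import random

def crash_consensus(inputs, crash_round, f):
    """Simulate Consensus-with-Crash-Failures for f+1 synchronous rounds.

    crash_round[i] is the round in which p_i crashes (None if nonfaulty).
    Nonfaulty processors decide min(V) after round f+1; crashed
    processors are reported as None.
    """
    n = len(inputs)
    V = [{v} for v in inputs]          # values p_i knows to exist
    sent = [set() for _ in range(n)]   # values p_i has already broadcast
    for k in range(1, f + 2):          # rounds 1 .. f+1
        deliveries = []
        for i in range(n):
            if crash_round[i] is not None and crash_round[i] < k:
                continue               # crashed earlier: silent
            new = V[i] - sent[i]       # line 1: send only new values
            sent[i] |= new
            receivers = list(range(n))
            if crash_round[i] == k:    # crashes now: partial broadcast
                receivers = random.sample(receivers, random.randrange(n + 1))
            deliveries.extend((j, new) for j in receivers)
        for j, vals in deliveries:     # lines 2-3: receive and merge
            V[j] |= vals
    return [min(V[i]) if crash_round[i] is None else None
            for i in range(n)]
```

Across random executions with at most f crashes, all nonfaulty processors decide the same value, and that value is the input of some processor, as the lemma and validity argument promise.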
We prove the following lemma:

Lemma In every execution, at the end of round f + 1, V_i = V_j for every two nonfaulty processors p_i and p_j.

Proof We prove the claim by showing that if x ∈ V_i at the end of round f + 1, then x ∈ V_j at the end of round f + 1. Let r be the first round in which x is added to V_i, for any nonfaulty processor p_i. If x is initially in V_i, let r = 0. If r ≤ f then, in round r + 1 ≤ f + 1, p_i sends x to each p_j, causing p_j to add x to V_j, if it is not already present. Otherwise, suppose r = f + 1 and let p_j be a nonfaulty processor that receives x for the first time in round f + 1. Then there must be a chain of f + 1 processors
p_{i_1}, ..., p_{i_{f+1}} that transfers the value x to p_j. Hence p_{i_1} sends x to p_{i_2} in round one, and so on, until p_{i_{f+1}} sends x to p_j in round f + 1. But then p_{i_1}, ..., p_{i_{f+1}} is a chain of f + 1 processors. Hence at least one of them, say p_{i_k}, must be nonfaulty. Hence p_{i_k} adds x to its set in round k - 1 < r, contradicting the minimality of r.

This lemma, together with the observations made above, implies the following theorem.

Theorem The previous consensus algorithm solves the consensus problem in the presence of f crash failures in a message passing system in f + 1 rounds.

The following theorem was first proved by Fischer and Lynch for Byzantine failures. Dolev and Strong later extended it to crash failures. The theorem shows that the previous algorithm, assuming the given model, is optimal.

Theorem There is no algorithm which solves the consensus problem in less than f + 1 rounds in the presence of f crash failures, if n ≥ f + 2.

What if failures are not benign? That is, can the consensus problem be solved in the presence of Byzantine failures? And if so, how?

Consensus with Byzantine failures

In a computation step of a faulty processor in the Byzantine model, the new state of the processor and the message sent are completely unconstrained. As in the reliable case, every processor takes a computation step in every round and every message sent is delivered in that round. Hence a faulty processor can behave arbitrarily and even maliciously. For example, it could send different messages to different processors. It can even appear that the faulty processors coordinate with each other. A faulty processor can also mimic the behaviour of a crashed processor by failing to send any messages from some point on. In this case, the definition of the consensus problem is the same as in the message passing model with crash failures.
The validity condition in this model, however, is not equivalent to requiring that every nonfaulty decision value is the input of some processor. As in the crash case, no conditions are put on the output of faulty processors.

Lower bound on the ratio of faulty processors

Pease, Shostak and Lamport first proved the following theorem.

Theorem In a system with n processors and f Byzantine processors, there is no algorithm which solves the consensus problem if n ≤ 3f.

A polynomial algorithm

The following algorithm uses messages of constant size, takes 2(f + 1) rounds, and assumes that n > 4f. It was presented by Berman and Garay. This consensus algorithm for Byzantine failures contains f + 1 phases, each
taking two rounds. Each processor has a preferred decision for each phase, initially its input value. In the first round of each phase, processors send their preferences to each other. Let v_i^k be the majority value in the set of values received by processor p_i at the end of the first round of phase k. If no majority exists, a default value v is used. In the second round of the phase, processor p_k, called the king of the phase, sends its majority value v_k^k to all processors. If p_i receives more than n/2 + f copies of v_i^k (in the first round of the phase), then it sets its preference for the next phase to be v_i^k; otherwise it sets its preference to the phase king's preference, v_k^k, received in the second round of the phase. After f + 1 phases, the processor decides on its preference. Each processor maintains a local array pref with n entries.

We prove correctness using the following lemmas. Termination is immediate. We next note the persistence of agreement:

Lemma If all nonfaulty processors prefer v at the beginning of phase k, then they all prefer v at the end of phase k, for all k, 1 ≤ k ≤ f + 1.

Proof Since all nonfaulty processors prefer v at the beginning of phase k, they all receive at least n - f copies of v (including their own) in the first round of phase k. Since n > 4f, n - f > n/2 + f, implying that all nonfaulty processors will prefer v at the end of phase k.

Consensus-with-Byzantine-Failures

       Code for processor p_i, 0 ≤ i ≤ n - 1.
    Initially pref[i] = x, and pref[j] = v for any j ≠ i

       round 2k - 1, 1 ≤ k ≤ f + 1
 1  send pref[i] to all processors
 2  receive v_j from p_j and assign to pref[j], for all 0 ≤ j ≤ n - 1, j ≠ i
 3  let maj be the majority value of pref[0], ..., pref[n-1] (v if none)
 4  let mult be the multiplicity of maj

       round 2k, 1 ≤ k ≤ f + 1
 5  if i = k
 6    then send maj to all processors
 7  receive king-maj from p_k (v if none)
 8  if mult > n/2 + f
 9    then pref[i] ← maj
10    else pref[i] ← king-maj
11  if k = f + 1 then y ← pref[i]

This implies the validity condition: if all processors start with the same input v, they will continue to prefer v and finally decide on v in phase f + 1. Agreement is achieved by the king breaking ties. Since each phase has a different king and there are f + 1
phases, at least one phase has a nonfaulty king.

Lemma Let g be a phase whose king p_g is nonfaulty. Then all nonfaulty processors finish phase g with the same preference.

Proof Suppose all nonfaulty processors use the majority value received from the king for their preference. Since the king is nonfaulty, it sends the same message to all, and hence all the nonfaulty preferences are the same. Suppose instead that some nonfaulty processor p_i uses its own majority value v for its preference. Thus p_i receives more than n/2 + f messages for v in the first round of phase g. Hence every processor, including p_g, receives more than n/2 messages for v in the first round of phase g and sets its majority value to v. Hence every nonfaulty processor has v for its preference.

Hence at phase g + 1 all nonfaulty processors have the same preference, and by the persistence of agreement (the previous lemma) they will decide on the same value at the end of the algorithm. Hence the algorithm has the agreement property and solves consensus.

Theorem There exists an algorithm for n processors which solves the consensus problem in the presence of f Byzantine failures within 2(f + 1) rounds using constant size messages, if n > 4f.

Impossibility in asynchronous systems

As shown before, the consensus problem can be solved in synchronous systems in the presence of both crash (benign) and Byzantine (severe) failures. What about asynchronous systems? Under the assumption that the communication system is completely reliable, and the only possible failures are caused by unreliable processors, it can be shown that if the system is completely asynchronous then there is no consensus algorithm even in the presence of a single processor failure. The result holds even if processors only fail by crashing. The impossibility proof relies heavily on the system being asynchronous. This result was first shown in a breakthrough paper by Fischer, Lynch and Paterson. It is one of the most influential results in distributed computing.
The impossibility holds both for shared memory systems, if only read/write registers are used, and for message passing systems. The proof first shows it for shared memory systems. The result for message passing systems can then be obtained through simulation.

Theorem There is no consensus algorithm for a read/write asynchronous shared memory system that can tolerate even a single crash failure.

And through simulation the following assertion can be shown.

Theorem There is no algorithm for solving the consensus problem in an asynchronous message passing system with n processors, one of which may fail by crashing.

Note that these results do not mean that consensus can never be solved in
asynchronous systems. Rather, the results mean that there are no algorithms that guarantee termination, agreement, and validity in all executions. It is reasonable to assume that agreement and validity are essential, that is, if a consensus algorithm terminates, then agreement and validity are guaranteed. In fact there are efficient and useful algorithms for the consensus problem that are not guaranteed to terminate in all executions. In practice this is often sufficient because the special conditions that cause non-termination may be quite rare. Additionally, since in many real systems one can make some timing assumptions, it may not be necessary to provide a solution for asynchronous consensus.

Exercises
Prove the correctness of algorithm Consensus-Crash.
Prove the correctness of the consensus algorithm in the presence of Byzantine failures.
Prove Theorem

13.5. Logical time, causality, and consistent state

In a distributed system it is often useful to compute a global state that consists of the states of all processors. Having access to the global state allows us to reason about system properties that depend on all processors, for example to be able to detect a deadlock. One may attempt to compute a global state by stopping all processors and then gathering their states to a central location. Such a method is ill-suited for many distributed systems that must continue computation at all times. This section discusses how one can compute a global state that is quite intuitive, yet consistent, in a precise sense. We first discuss a distributed algorithm that imposes a global order on the instructions of processors. This algorithm creates the illusion of a global clock available to processors. Then we introduce the notion of one instruction causally affecting another instruction, and an algorithm for computing which instruction affects which.
The notion turns out to be very useful in defining a consistent global state of a distributed system. We close the section with distributed algorithms that compute a consistent global state of a distributed system.

Logical time

The design of distributed algorithms is easier when processors have access to a (Newtonian) global clock, because then each event that occurs in the distributed system can be labeled with the reading of the clock, processors agree on the ordering of any events, and this consensus can be used by algorithms to make decisions. However, the construction of a global clock is difficult. There exist algorithms that approximate the ideal global clock by periodically synchronising drifting local hardware clocks. However, it is possible to totally order events without using hardware clocks. This idea is called the logical clock.

Recall that an execution is an interleaving of instructions of the n programs. Each instruction can be either a computational step of a processor, or sending a message, or receiving a message. Any instruction is performed at a distinct point of
global time. However, the reading of the global clock is not available to processors. Our goal is to assign values of the logical clock to each instruction, so that these values appear to be readings of the global clock. That is, it is possible to postpone or advance the instants when instructions are executed in such a way that each instruction x that has been assigned a value t_x of the logical clock is executed exactly at the instant t_x of the global clock, and that the resulting execution is a valid one, in the sense that it can actually occur when the algorithm is run with the modified delays.

The Logical-Clock algorithm assigns logical time to each instruction. Each processor has a local variable called counter. This variable is initially zero and it gets incremented every time the processor executes an instruction. Specifically, when a processor executes any instruction other than sending or receiving a message, the variable counter gets incremented by one. When a processor sends a message, it increments the variable by one, and attaches the resulting value to the message. When a processor receives a message, then the processor retrieves the value attached to the message, calculates the maximum of this value and the current value of counter, increments the maximum by one, and assigns the result to the counter variable.

Note that every time an instruction is executed, the value of counter is incremented by at least one, and so it grows as the processor keeps on executing instructions. The value of logical time assigned to instruction x is defined as the pair (counter, id), where counter is the value of the variable counter right after the instruction has been executed, and id is the identifier of the processor. The values of logical time form a total order, where pairs are compared lexicographically. This logical time is also called Lamport time. We define t_x to be counter + 1/(id + 1), which is an equivalent way to represent the pair.
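The counter rules above are easy to express in code. The following Python class is an illustrative sketch of the Logical-Clock algorithm; the class and method names are ours, not from the text.

```python
class LamportClock:
    """One processor's logical clock: a counter plus the processor id."""
    def __init__(self, pid):
        self.pid = pid
        self.counter = 0

    def local_event(self):
        # any instruction other than send/receive: counter grows by one
        self.counter += 1
        return (self.counter, self.pid)

    def send(self):
        # increment, then attach the new counter value to the message
        self.counter += 1
        return self.counter

    def receive(self, attached):
        # maximum of the attached value and the local counter, plus one
        self.counter = max(self.counter, attached) + 1
        return (self.counter, self.pid)

p0, p1 = LamportClock(0), LamportClock(1)
t = p0.send()            # p0's counter becomes 1
x = p1.local_event()     # p1's counter becomes 1
y = p1.receive(t)        # p1's counter becomes max(1, 1) + 1 = 2
# pairs (counter, id) are compared lexicographically:
assert (t, 0) < x < y    # (1, 0) < (1, 1) < (2, 1)
```

The lexicographic comparison of (counter, id) pairs is exactly the total order described above; the processor id breaks ties between equal counters.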
Remark For any execution, logical time satisfies three conditions:
(i) if an instruction x is performed by a processor before an instruction y is performed by the same processor, then the logical time of x is strictly smaller than that of y,
(ii) any two distinct instructions of any two processors get assigned different logical times,
(iii) if instruction x sends a message and instruction y receives this message, then the logical time of x is strictly smaller than that of y.

Our goal now is to argue that the logical clock provides to processors the illusion of a global clock. Intuitively, the reason why such an illusion can be created is that we can take any execution of a deterministic algorithm, compute the logical time t_x of each instruction x, and run the execution again delaying or speeding up processors and messages in such a way that each instruction x is executed at the instant t_x of the global clock. Thus, without access to a hardware clock or other external measurements not captured in our model, the processors cannot distinguish the reading of the logical clock from the reading of a real global clock. Formally, the reason why the re-timed sequence is a valid execution that is indistinguishable from the original execution is summarised in the subsequent corollary, which follows directly from the Remark.

Corollary For any execution α, let T be the assignment of logical time to
instructions, and let β be the sequence of instructions ordered by their logical time in α. Then for each processor, the subsequence of instructions executed by the processor in α is the same as the subsequence in β. Moreover, each message is received in β after it is sent in β.

Causality

In a system execution, an instruction can affect another instruction by altering the state of the computation in which the second instruction executes. We say that one instruction can causally affect (or influence) another, if the information that one instruction produces can be passed on to the other instruction. Recall that in our model of a distributed system, each instruction is executed at a distinct instant of global time, but processors do not have access to the reading of the global clock.

Let us illustrate causality. If two instructions are executed by the same processor, then we could say that the instruction executed earlier can causally affect the instruction executed later, because it is possible that the result of executing the former instruction was used when the later instruction was executed. We stress the word possible, because in fact the later instruction may not use any information produced by the former. However, when defining causality, we simplify the problem of capturing how processors influence other processors, and focus on what is possible. If two instructions x and y are executed by two different processors, then we could say that instruction x can causally affect instruction y, when the processor that executes x sends a message when or after executing x, and the message is delivered before or during the execution of y at the other processor. It may also be the case that influence is passed on through intermediate processors or multiple instructions executed by processors, before reaching the second processor.
We will formally define the intuition that one instruction can causally affect another in terms of a relation called happens before, which relates pairs of instructions. The relation is defined for a given execution, i.e., we fix a sequence of instructions executed by the algorithm and the instants of the global clock when the instructions were executed, and define which pairs of instructions are related by the happens before relation. The relation is introduced in two steps. If instructions x and y are executed by the same processor, then we say that x happens before y if and only if x is executed before y. When x and y are executed by two different processors, then we say that x happens before y if and only if there is a chain of instructions and messages

snd_1, rcv_2, snd_2, ..., rcv_{k-1}, snd_{k-1}, rcv_k
for k ≥ 2, such that snd_1 is either equal to x or is executed after x by the same processor that executes x; rcv_k is either equal to y or is executed before y by the same processor that executes y; rcv_h is executed before snd_h by the same processor, 2 ≤ h < k; and snd_h sends a message that is received by rcv_{h+1}, 1 ≤ h < k. Note that no instruction happens before itself. We write x <_HB y when x happens before y. We omit the reference to the execution for which the relation is defined, because it will be clear from the context which execution we mean. We say that two instructions x and y are concurrent when neither x <_HB y nor y <_HB x.

The question is how processors can determine whether one instruction happens before another in a given execution according to our definition. This question can be answered through a generalisation of the Logical-Clock algorithm presented earlier. This generalisation is called vector clocks. The Vector-Clocks algorithm allows processors to relate instructions, and this relation is exactly the happens before relation. Each processor p_i maintains a vector V_i of n integers. The j-th coordinate of the vector is denoted by V_i[j]. The vector is initialised to the zero vector (0, ..., 0). The vector is modified each time the processor executes an instruction, in a way similar to the way counter was modified in the Logical-Clock algorithm. Specifically, when a processor p_i executes any instruction other than sending or receiving a message, the coordinate V_i[i] gets incremented by one, and the other coordinates remain intact. When a processor sends a message, it increments V_i[i] by one, and attaches the resulting vector V_i to the message. When a processor p_j receives a message, then the processor retrieves the vector V attached to the message, calculates the coordinatewise maximum of the current vector V_j and the vector V, except for coordinate V_j[j] that gets incremented by one, and assigns the result to the variable V_j:
V_j[j] ← V_j[j] + 1
for all k ∈ [n] \ {j}
   V_j[k] ← max{V_j[k], V[k]}

We label each instruction x executed by processor p_i with the value of the vector V_i right after the instruction has been executed. The label is denoted by VT(x) and is called the vector timestamp of instruction x. Intuitively, VT(x) represents the knowledge of processor p_i about how many instructions each processor has executed at the moment when p_i has executed instruction x. This knowledge may be obsolete.

Vector timestamps can be used to order instructions that have been executed. Specifically, given two instructions x and y, and their vector timestamps VT(x) and VT(y), we write x ≤_VT y when the vector VT(x) is majorised by the vector VT(y), i.e., for all k, the coordinate VT(x)[k] is at most the corresponding coordinate VT(y)[k]. We write x <_VT y when x ≤_VT y but VT(x) ≠ VT(y). The next theorem explains that the Vector-Clocks algorithm indeed implements the happens before relation, because we can decide whether or not two instructions happen before each other just by comparing the vector timestamps of the instructions.

Theorem For any execution and any two instructions x and y, x <_HB y if and only if x <_VT y.
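A direct transcription of these update rules into Python might look as follows (an illustrative sketch; the names are ours). The comparison functions implement the ≤_VT and <_VT orders and the concurrency test.

```python
class VectorClock:
    """Vector clock of processor pid in a system of n processors."""
    def __init__(self, pid, n):
        self.pid, self.v = pid, [0] * n

    def local_event(self):
        self.v[self.pid] += 1
        return tuple(self.v)          # vector timestamp VT(x)

    def send(self):
        self.v[self.pid] += 1
        return tuple(self.v)          # vector attached to the message

    def receive(self, attached):
        # coordinatewise maximum, with the own coordinate incremented
        self.v = [max(a, b) for a, b in zip(self.v, attached)]
        self.v[self.pid] += 1
        return tuple(self.v)

def majorised(x, y):                  # x <=_VT y
    return all(a <= b for a, b in zip(x, y))

def happens_before(x, y):             # x <_VT y, i.e. x <_HB y
    return majorised(x, y) and x != y

def concurrent(x, y):
    return not happens_before(x, y) and not happens_before(y, x)

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
x = p0.send()           # (1, 0)
y = p1.receive(x)       # (1, 1)
z = p0.local_event()    # (2, 0)
assert happens_before(x, y)
assert concurrent(y, z)
```

Taking the maximum including the own coordinate and then incrementing it gives the same result as the rule in the text, since the attached coordinate of the receiver never exceeds the receiver's own count.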
Proof We first show the forward implication. Suppose that x <_HB y. Hence x and y are two different instructions. If the two instructions are executed on the same processor, then x must be executed before y. Only a finite number of instructions have been executed by the time y has been executed. The Vector-Clocks algorithm increases a coordinate by one as it calculates vector timestamps of instructions from x until y inclusive, and no coordinate is ever decreased. Thus x <_VT y. If x and y were executed on different processors, then by the definition of the happens before relation, there must be a finite chain of instructions and messages leading from x to y. But then by the Vector-Clocks algorithm, the value of a coordinate of the vector timestamp gets increased at each move, as we move along the chain, and so again x <_VT y.

Now we show the reverse implication. Suppose that it is not the case that x <_HB y. We consider a few subcases, always concluding that it is not the case that x <_VT y. First, it could be the case that x and y are the same instruction. But then obviously the vector clocks assigned to x and y are the same, and so it cannot be the case that x <_VT y. Let us, therefore, assume that x and y are different instructions. If they are executed by the same processor, then x cannot be executed before y, and so x is executed after y. Thus, by monotonicity of vector timestamps, y <_VT x, and so it is not the case that x <_VT y. The final subcase is when x and y are executed by two distinct processors p_i and p_j. Let us focus on the component i of the vector clock V_i of processor p_i right after x was executed. Let its value be k. Recall that other processors can only increase the value of their components i by adopting the value sent by other processors.
Hence, in order for the value of component i of processor p_j to be k or more at the moment y is executed, there must be a chain of instructions and messages that passes a value at least k, originating at processor p_i. This chain starts at x or at an instruction executed by p_i subsequent to x. But the existence of such a chain would imply that x happens before y, which we assumed was not the case. So the component i of the vector clock VT(y) is strictly smaller than the component i of the vector clock VT(x). Thus it cannot be the case that x <_VT y.

This theorem tells us that we can decide whether two distinct instructions x and y are concurrent by checking that it is neither the case that VT(x) < VT(y) nor the case that VT(x) > VT(y).

Consistent state

The happens before relation can be used to compute a global state of a distributed system, such that this state is in some sense consistent. Shortly, we will formally define the notion of consistency. Each processor executes instructions. A cut K is defined as a vector K = (k_1, ..., k_n) of non-negative integers. Intuitively, the vector K denotes the states of processors. Formally, k_i denotes the number of instructions that processor p_i has executed.

Not all cuts correspond to collections of states of distributed processors that could be considered natural or consistent. For example, if a processor p_i has received a message from p_j, and we record the state of p_i in the cut by making k_i appropriately large, but make k_j so small that the cut contains the state of the sender before the moment when the message was sent, then we could say that such a cut is not natural: there are instructions recorded in the cut
that are causally affected by instructions that are not recorded in the cut. Such cuts we consider not consistent and hence undesirable. Formally, a cut K = (k_1, ..., k_n) is inconsistent when there are processors p_i and p_j such that instruction number k_i of processor p_i is causally affected by an instruction subsequent to instruction number k_j of processor p_j. So in an inconsistent cut there is a message that crosses the cut in a backward direction. Any cut that is not inconsistent is called a consistent cut.

The Consistent-Cut algorithm uses vector timestamps to find a consistent cut. We assume that each processor is given the same cut K = (k_1, ..., k_n) as an input. Then processors must determine a consistent cut K′ that is majorised by K. Each processor p_i has an infinite table VT_i[0, 1, 2, ...] of vectors. The processor executes instructions and stores vector timestamps in consecutive entries of the table. Specifically, entry m of the table is the vector timestamp VT_i[m] of the m-th instruction executed by the processor; we define VT_i[0] to be the zero vector. Processor p_i begins calculating a cut right after the moment when the processor has executed instruction number k_i. The processor determines the largest number k′_i ≥ 0 that is at most k_i, such that the vector VT_i[k′_i] is majorised by K. The vector K′ = (k′_1, ..., k′_n) that processors collectively find turns out to be a consistent cut.

Theorem For any cut K, the cut K′ computed by the Consistent-Cut algorithm is a consistent cut majorised by K.

Proof First observe that there is no need to consider entries of VT_i further than k_i. Each of these entries is not majorised by K, because the i-th coordinate of any of these vectors is strictly larger than k_i. So we can indeed focus on searching among the first k_i entries of VT_i. Let k′_i ≥ 0 be the largest entry such that the vector VT_i[k′_i] is majorised by the vector K. We know that such a vector exists, because VT_i[0] is the zero vector, and the zero vector is majorised by any cut K.

We argue that (k′_1, ..., k′_n) is a consistent cut by way of contradiction. Suppose that the vector (k′_1, ..., k′_n) is an inconsistent cut. Then, by definition, there are processors p_i and p_j such that there is an instruction x of processor p_i subsequent to instruction number k′_i, such that x happens before instruction number k′_j of processor p_j. Recall that k′_i is the furthest entry of VT_i majorised by K. So entry k′_i + 1 is not majorised by K, and since all subsequent entries, including the one for instruction x, can have only larger coordinates, these entries are not majorised by K either. But x happens before instruction number k′_j, so entry k′_j can only have larger coordinates than the respective coordinates of the entry corresponding to x, and so VT_j[k′_j] cannot be majorised by K either. This contradicts the assumption that VT_j[k′_j] is majorised by K. Therefore, (k′_1, ..., k′_n) must be a consistent cut.

There is a trivial algorithm for finding a consistent cut. The algorithm picks K′ = (0, ..., 0). However, the Consistent-Cut algorithm is better in the sense that the consistent cut found is maximal. That this is indeed true is left as an exercise.

There is an alternative way to find a consistent cut. The Consistent-Cut algorithm requires that we attach vector timestamps to messages and remember the vector timestamps of all instructions executed so far by the algorithm A whose consistent cut we want to compute. This may be too costly. The algorithm called Distributed-
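The search for the largest entry k′_i majorised by K can be sketched in Python as follows. This is an illustrative sketch: the tables of vector timestamps are given explicitly as lists rather than maintained by running processors.

```python
def majorised(x, y):
    """x is majorised by y: every coordinate of x is at most that of y."""
    return all(a <= b for a, b in zip(x, y))

def consistent_cut(vt_tables, K):
    """vt_tables[i][m] is the vector timestamp of the m-th instruction of
    processor i, with vt_tables[i][0] the zero vector.  Each processor
    scans its first K[i] entries for the largest one majorised by K."""
    cut = []
    for i, table in enumerate(vt_tables):
        best = 0
        for m in range(1, min(K[i], len(table) - 1) + 1):
            if majorised(table[m], K):
                best = m
        cut.append(best)
    return tuple(cut)

# p0 sends at its instruction 1; p1's instruction 1 receives that message
vt0 = [(0, 0), (1, 0)]
vt1 = [(0, 0), (1, 1)]
# K = (0, 1) records the receive but not the send, so the algorithm
# falls back to the consistent cut (0, 0); K = (1, 1) is already consistent
assert consistent_cut([vt0, vt1], (0, 1)) == (0, 0)
assert consistent_cut([vt0, vt1], (1, 1)) == (1, 1)
```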
Snapshot avoids this cost. In this algorithm, a processor initiates the calculation of a consistent cut by flooding the network with a special message that acts like a sword that cuts the execution of algorithm A consistently. In order to prove that the cut is indeed consistent, we require that messages are received by the recipient in the order they were sent by the sender. Such ordering can be implemented using sequence numbers.

In the Distributed-Snapshot algorithm, each processor p_i has a variable called counter that counts the number of instructions of algorithm A executed by the processor so far. In addition, the processor has a variable k_i that will store the i-th coordinate of the cut. This variable is initialised to ⊥. Since the counter variables only count the instructions of algorithm A, the instructions of the Distributed-Snapshot algorithm do not affect the counter variables. In some sense the snapshot algorithm runs in the background. Suppose that there is exactly one processor that can decide to take a snapshot of the distributed system. Upon deciding, the processor floods the network with a special message <Snapshot>. Specifically, the processor sends the message to all its neighbours and assigns counter to k_i. Whenever a processor p_j receives the message and the variable k_j is still ⊥, then the processor sends the <Snapshot> message to all its neighbours and assigns counter to k_j. The sending of <Snapshot> messages and the assignment are done by the processor without executing any instruction of A (we can think of the Distributed-Snapshot algorithm as an interrupt). The algorithm calculates a consistent cut.

Theorem Let, for any processors p_i and p_j, the messages sent from p_i to p_j be received in the order they are sent. The Distributed-Snapshot algorithm eventually finds a consistent cut (k_1, ..., k_n). The algorithm sends O(e) messages, where e is the number of edges in the graph.
Proof The fact that each variable k_i is eventually different from ⊥ follows from our model, because we assumed that instructions are eventually executed and messages are eventually received, so the <Snapshot> messages will eventually reach all nodes.

Suppose that (k_1, ..., k_n) is not a consistent cut. Then there is a processor p_j such that instruction number k_j + 1 or later sends a message <M> other than <Snapshot>, and the message is received on or before processor p_i executes instruction number k_i. So the message <M> must have been sent after the message <Snapshot> was sent from p_j to p_i. But messages are received in the order they are sent, so p_i processes <Snapshot> before it processes <M>. But then message <M> arrives after the snapshot was taken at p_i. This is the desired contradiction.

Exercises
Show that logical time preserves the happens before (<_HB) relation. That is, show that if for events x and y it is the case that x <_HB y, then LT(x) < LT(y), where LT(·) is the logical time of an event.
Show that any vector clock that captures concurrency between n processors must have at least n coordinates.
Show that the vector K′ calculated by the algorithm Consistent-Cut is in fact a maximal consistent cut majorised by K. That is, there is no K″ different from K′ that majorises K′ and is majorised by K.
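The flooding mechanism can be illustrated with a small deterministic simulation. This is our own toy sketch under strong assumptions: a three-node ring, FIFO delivery modelled by one shared queue, None standing in for ⊥, and no messages of the background algorithm A in flight.

```python
from collections import deque

class SnapNode:
    def __init__(self, pid, neighbours):
        self.pid, self.neighbours = pid, neighbours
        self.counter = 0   # instructions of the background algorithm A
        self.k = None      # None plays the role of the initial value ⊥

    def step(self):
        self.counter += 1  # one instruction of A

    def take_snapshot(self, channels):
        # on first receipt: record the counter, flood <Snapshot> onward
        if self.k is None:
            self.k = self.counter
            for j in self.neighbours:
                channels.append((j, 'Snapshot'))

nodes = {i: SnapNode(i, [(i + 1) % 3, (i - 1) % 3]) for i in range(3)}
channels = deque()                 # FIFO delivery of <Snapshot> messages
nodes[0].step(); nodes[0].step()
nodes[0].take_snapshot(channels)   # the initiator decides to take a snapshot
nodes[1].step()                    # an A-instruction runs concurrently
while channels:
    j, _ = channels.popleft()
    nodes[j].take_snapshot(channels)   # later receipts are ignored
cut = tuple(nodes[i].k for i in range(3))
assert cut == (2, 1, 0)            # every k_i is eventually set
```

Since no A-messages are in flight here, the resulting cut is trivially consistent; the point of the sketch is only the flood-on-first-receipt rule and the O(e) message count per edge direction.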
13.6. Communication services

Among the fundamental problems in distributed systems where processors communicate by message passing are the tasks of spreading and gathering information. Many distributed algorithms for communication networks can be constructed using building blocks that implement various broadcast and multicast services. In this section we present some basic communication services in the message-passing model. Such services typically need to satisfy some quality of service requirements dealing with the ordering of messages and reliability. We first focus on broadcast services, then we discuss more general multicast services.

Properties of broadcast services

In the broadcast problem, a selected processor p_i, called a source or a sender, has a message m, which must be delivered to all processors in the system (including the source). The interface of the broadcast service is specified as follows:

bc-send_i(m, qos): an event of processor p_i that sends a message m to all processors.
bc-recv_i(m, j, qos): an event of processor p_i that receives a message m sent by processor p_j.

In the above definitions, qos denotes the quality of service provided by the system. We consider two kinds of quality of service:

Ordering: how does the order of received messages depend on the order of messages sent by the source?
Reliability: how does the set of received messages depend on the failures in the system?

The basic model of a message-passing distributed system normally does not guarantee any ordering or reliability of messaging operations. In the basic model we only assume that each pair of processors is connected by a link, and message delivery is independent on each link: the order of received messages may not be related to the order of the sent messages, and messages may be lost in the case of crashes of senders or receivers. We present some of the most useful requirements for the ordering and reliability of broadcast services.
The main question we address is how to implement a stronger service on top of a weaker service, starting with the basic system model.

Variants of ordering requirements

Applying the definition of happens before to messages, we say that message m happens before message m′ if either m and m′ are sent by the same processor and m is sent before m′, or the bc-recv event for m happens before the bc-send event for m′. We identify four common broadcast services with respect to the message ordering properties:

Basic Broadcast: no order of messages is guaranteed.
Single-Source FIFO (first-in, first-out): messages sent by one processor are received by each processor in the same order as they were sent; more precisely, for all processors p_i, p_j and messages m, m′, if processor p_i sends m before it sends m′, then processor p_j does not receive message m′ before message m.
Causal Order: messages are received in the same order as they happen; more precisely, for all messages m, m′ and every processor p_i, if m happens before m′, then p_i does not receive m′ before m.
Total Order: the same order of received messages is preserved at each processor; more precisely, for all processors p_i, p_j and messages m, m′, if processor p_i receives m before it receives m′, then processor p_j does not receive message m′ before message m.

It is easy to see that Causal Order implies the Single-Source FIFO requirement (since the happens before relation for messages includes the order of messages sent by one processor), and each of the given services trivially implies Basic Broadcast. There are no additional relations between these four services. For example, there are executions that satisfy the Single-Source FIFO property, but not Causal Order. Consider two processors p_0 and p_1. In the first event p_0 broadcasts message m, next processor p_1 receives m, and then p_1 broadcasts message m′. It follows that m happens before m′. But if processor p_0 receives m′ before m, which may happen, then this execution violates Causal Order. Note that the Single-Source FIFO requirement is trivially preserved, since each processor broadcasts only one message.

We denote by bb the Basic Broadcast service, by ssf the Single-Source FIFO, by co the Causal Order and by to the Total Order service.

Reliability requirements

In the model without failures we would like to guarantee the following properties of broadcast services:

Integrity: each message m received in a bc-recv event has been sent in some bc-send event.
No-Duplicates: each processor receives a message not more than once.
Liveness: each message sent is received by all processors.
In the model with failures we define the notion of a reliable broadcast service, which satisfies Integrity, No-Duplicates and two kinds of Liveness properties:

Nonfaulty Liveness: each message m sent by a non-faulty processor p_i must be received by every non-faulty processor.
Faulty Liveness: each message sent by a faulty processor is either received by all non-faulty processors or by none of them.

We denote by rbb the Reliable Basic Broadcast service, by rssf the Reliable Single-Source FIFO, by rco the Reliable Causal Order, and by rto the Reliable Total Order service.

Ordered broadcast services

We now describe implementations of algorithms for various broadcast services.
Implementing basic broadcast on top of asynchronous point-to-point messaging

The bb service is implemented as follows. If event bc-send_i(m, bb) occurs, then processor p_i sends message m via every link from p_i to p_j, where 0 ≤ j ≤ n − 1. If a message m comes to processor p_j, then it enables event bc-recv_j(m, i, bb).

To provide reliability we do the following. We build reliable broadcast on top of the basic broadcast service. When bc-send_i(m, rbb) occurs, processor p_i enables event bc-send_i(⟨m, i⟩, bb). If event bc-recv_j(⟨m, i⟩, k, bb) occurs and the pair ⟨m, i⟩ appears for the first time, then processor p_j first enables event bc-send_j(⟨m, i⟩, bb) (to inform the other non-faulty processors about message m in case processor p_i is faulty), and next enables event bc-recv_j(m, i, rbb).

We prove that the above algorithm provides reliability for the basic broadcast service. First observe that the Integrity and No-Duplicates properties follow directly from the fact that each processor p_j enables bc-recv_j(m, i, rbb) only if the pair ⟨m, i⟩ is received for the first time. Nonfaulty Liveness is preserved since links between non-faulty processors enable events bc-recv_j(·, ·, bb) correctly. Faulty Liveness is guaranteed by the fact that if there is a non-faulty processor p_j which receives message m from the faulty source p_i, then before enabling bc-recv_j(m, i, rbb), processor p_j sends message m using a bc-send_j event. Since p_j is non-faulty, each non-faulty processor p_k gets message m in some bc-recv_k(⟨m, i⟩, ·, bb) event, and then accepts it (enabling event bc-recv_k(m, i, rbb)) during the first such event.

Implementing single-source FIFO on top of the basic broadcast service

Each processor p_i has its own counter (timestamp), initialised to 0. If event bc-send_i(m, ssf) occurs, then processor p_i sends message m with its current timestamp attached, using bc-send_i(⟨m, timestamp⟩, bb).
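The echo rule behind the rbb construction (re-broadcast a pair on first receipt, then accept it) can be sketched in Python. This is our own illustration: the Net class, the crash modelled by delivering the source's send to only one processor, and all names are assumptions of the example.

```python
from collections import deque

class Net:
    """Toy network: bb_broadcast queues a copy of msg for every node
    (or only for the nodes in `only`, modelling a sender that crashes
    mid-broadcast)."""
    def __init__(self):
        self.nodes, self.queue = [], deque()

    def bb_broadcast(self, sender, msg, only=None):
        for node in self.nodes:
            if only is None or node.pid in only:
                self.queue.append((node, msg, sender))

    def run(self):
        while self.queue:
            node, msg, sender = self.queue.popleft()
            node.on_bb_recv(msg, sender)

class RbbNode:
    def __init__(self, pid, net):
        self.pid, self.net = pid, net
        self.seen, self.delivered = set(), []
        net.nodes.append(self)

    def bc_send_rbb(self, m):
        self.net.bb_broadcast(self.pid, (m, self.pid))

    def on_bb_recv(self, tagged, _from):
        if tagged not in self.seen:           # first receipt of ⟨m, i⟩
            self.seen.add(tagged)
            self.net.bb_broadcast(self.pid, tagged)   # echo first ...
            self.delivered.append(tagged)             # ... then accept

net = Net()
nodes = [RbbNode(i, net) for i in range(4)]
# faulty source p0 crashes after its ⟨m, 0⟩ copy reaches only p1 ...
net.bb_broadcast(0, ('m', 0), only={1})
net.run()
# ... yet thanks to the echo every other processor delivers it too
assert all(('m', 0) in n.delivered for n in nodes[1:])
```

The seen-set gives No-Duplicates, and echoing before accepting gives Faulty Liveness: once any non-faulty processor accepts a pair, all of them do.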
If an event bc-recv_j(⟨m, t⟩, i, bb) occurs, then processor p_j enables event bc-recv_j(m, i, ssf) just after the events bc-recv_j(m_0, i, ssf), ..., bc-recv_j(m_{t−1}, i, ssf) have been enabled, where m_0, ..., m_{t−1} are the messages such that the events bc-recv_j(⟨m_0, 0⟩, i, bb), ..., bc-recv_j(⟨m_{t−1}, t − 1⟩, i, bb) have been enabled. Note that if we use Reliable Basic Broadcast instead of Basic Broadcast as the background service, the above implementation of Single-Source FIFO becomes a Reliable Single-Source FIFO service. We leave the proof to the reader as an exercise.

Implementing causal order and total order on top of the single-source FIFO service

We present an ordered broadcast algorithm which works in the asynchronous message-passing system providing the single-source FIFO broadcast service. It uses the idea of timestamps, but in a more advanced way than in the implementation of ssf. We denote by cto the service satisfying both the causal and total order requirements.

Each processor p_i maintains in a local array T its own increasing counter (timestamp), and the estimated values of the timestamps of the other processors. Timestamps are used to mark messages before sending: if p_i is going to broadcast a message, it increases its timestamp and uses it to tag this message (lines 11-13). During the execution processor p_i estimates the values of the timestamps of the other processors in the local vector T: if processor p_i receives a message from processor p_j with a tag t (timestamp of p_j), it puts t into T[j] (lines 23 and 32). Processor p_i sets its current timestamp to be the maximum of the estimated timestamps in the vector T plus one (lines 24-26). After updating the timestamp the processor sends an update message. A processor accepts a message m with associated timestamp t from processor p_j if the pair (t, j) is the smallest among the other received messages (line 42), and each processor has at least as large a timestamp as known by processor p_i (line 43). The details are given in the code below.

Ordered-Broadcast

Code for any processor p_i, 0 ≤ i ≤ n − 1

01  initialisation
02    T[j] ← 0 for every 0 ≤ j ≤ n − 1
11  if bc-send_i(m, cto) occurs
12    then T[i] ← T[i] + 1
13      enable bc-send_i(⟨m, T[i]⟩, ssf)
21  if bc-recv_i(⟨m, t⟩, j, ssf) occurs
22    then add the triple (m, t, j) to pending
23      T[j] ← t
24      if t > T[i]
25        then T[i] ← t
26          enable bc-send_i(⟨update, T[i]⟩, ssf)
31  if bc-recv_i(⟨update, t⟩, j, ssf) occurs
32    then T[j] ← t
41  if
42    (m, t, j) is the pending triple with the smallest (t, j)
43    and t ≤ T[k] for every 0 ≤ k ≤ n − 1
44    then enable bc-recv_i(m, j, cto)
45      remove the triple (m, t, j) from pending

Ordered-Broadcast satisfies the causal order requirement. We leave the proof to the reader as an exercise (in the latter part we show how to achieve the stronger reliable causal order service and provide the proof for that stronger case).

Theorem Ordered-Broadcast satisfies the total order requirement.

Proof Integrity follows from the fact that each processor can enable event bc-recv_i(m, j, cto) only if the triple (m, t, j) is pending (lines 41-45), which may happen after receiving a message m from processor p_j (lines 21-22). The No-Duplicates property is guaranteed by the fact that there is at most one pending triple containing message m sent by processor p_j (lines 13 and 21-22). Liveness follows from the fact that each pending triple satisfies the conditions in lines 42-43 at some moment of the execution.
The proof of this fact is by induction on the events of the execution. Suppose, to the contrary, that (m, t, j) is the triple with the smallest (t, j) which does not satisfy the conditions in lines 42-43 at any moment of the execution. It follows that there is a moment from which the triple (m, t, j) has
the smallest (t, j) coordinates among the pending triples in processor p_i. Hence, from this moment on, it must violate the condition in line 43 for some k. Note that k ≠ i, j, by the updating rules. It follows that processor p_i never receives a message from p_k with timestamp greater than t − 1, which by the updating rules means that processor p_k never receives the message <m, t> from p_j; this contradicts the liveness property of the ssf broadcast service.

To prove the Total Order property it is sufficient to prove that for every processor p_i and messages m, m' sent by processors p_k, p_l with timestamps t, t' respectively, the triples (m, t, k) and (m', t', l) are accepted according to the lexicographic order of (t, k) and (t', l). There are two cases.

Case 1. Both triples are pending in processor p_i at some moment of the execution. Then the condition in line 42 guarantees acceptance in the order of (t, k), (t', l).

Case 2. Triple (m, t, k) (without loss of generality) is accepted by processor p_i before triple (m', t', l) is pending. If (t, k) < (t', l) then the acceptance is still according to the order of (t, k), (t', l). Otherwise (t, k) > (t', l), and by the condition in line 43 we get in particular that t ≤ T[l], and consequently t' ≤ T[l]. This cannot happen because of the ssf requirement and the assumption that processor p_i has not yet received the message <m', t'> from p_l via the ssf broadcast service.

Now we address the reliable versions of the Causal Order and Total Order services. The Reliable Causal Order requirements can be implemented on top of the Reliable Basic Broadcast service in an asynchronous message-passing system with processor crashes using the following algorithm. It uses the same data structures as the previous Ordered-Broadcast.
The main differences between Reliable-Causally-Ordered-Broadcast and Ordered-Broadcast are as follows: instead of using integer timestamps, processors use vector timestamps T, and they do not estimate the timestamps of the other processors; they only compare, in lexicographic order, their own (vector) timestamps with the received ones. The intuition behind the vector timestamp of processor p_i is that it stores information on how many messages have been sent by p_i and how many have been accepted by p_i from every p_k, where k ≠ i. In the course of the algorithm processor p_i increases position i of its vector timestamp T before sending a new message (line 12), and increases the jth position of its vector timestamp after accepting a new message from processor p_j (line 36). After receiving a new message from processor p_j together with its vector timestamp T̂, processor p_i adds the triple (m, T̂, j) to pending and accepts this triple if it is the first not yet accepted message received from processor p_j (condition in line 32) and the number of messages accepted by processor p_j (from each processor p_k ≠ p_i) at the moment of sending m was not bigger than it is now in processor p_i (condition in line 33). The detailed code of the algorithm follows.
Reliable-Causally-Ordered-Broadcast

Code for any processor p_i, 0 ≤ i ≤ n − 1

01  initialisation
02    T[j] ← 0 for every 0 ≤ j ≤ n − 1
03    pending list is empty

11  if bc-send_i(m, rco) occurs
12    then T[i] ← T[i] + 1
13         enable bc-send_i(<m, T>, rbb)

21  if bc-recv_i(<m, T̂>, j, rbb) occurs
22    then add triple (m, T̂, j) to pending

31  if (m, T̂, j) is a pending triple, and
32     T̂[j] = T[j] + 1, and
33     T̂[k] ≤ T[k] for every k ≠ i
34  then enable bc-recv_i(m, j, rco)
35       remove triple (m, T̂, j) from pending
36       T[j] ← T[j] + 1

We argue that the algorithm Reliable-Causally-Ordered-Broadcast provides the Reliable Causal Order broadcast service on top of a system equipped with the Reliable Basic Broadcast service. The Integrity and No-Duplicates properties are guaranteed by the rbb broadcast service and by the facts that each message is added to pending at most once, and that a message never received is never added to pending. Non-faulty and Faulty Liveness can be proved by induction on the execution, using the fact that non-faulty processors receive all messages sent, which guarantees that the conditions in lines 31-33 are eventually satisfied. The Causal Order requirement holds since if message m happens before message m', then each processor p_i accepts the messages m, m' according to the lexicographic order of T̂, T̂', and these vector-arrays are comparable in this case. The details are left to the reader.
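The acceptance test of lines 31-36 can be sketched in Python as follows (the function names and the surrounding delivery machinery are illustrative, not from the text):

```python
# Sketch of the vector-timestamp acceptance test of
# Reliable-Causally-Ordered-Broadcast (lines 31-36 of the pseudocode).

def can_accept(T, That, j, i):
    """Processor i may accept a message from j carrying vector timestamp That
    iff it is the next not-yet-accepted message from j (That[j] == T[j] + 1)
    and j had, when sending, accepted no message that i still lacks."""
    if That[j] != T[j] + 1:
        return False
    return all(That[k] <= T[k] for k in range(len(T)) if k not in (i, j))

def accept(T, j):
    T[j] += 1   # line 36: one more message from j has been accepted

# Processor 2's view: nothing accepted yet.
T = [0, 0, 0]
# A message from processor 1 that causally depends on a message from
# processor 0 (That[0] == 1) cannot be accepted yet:
assert not can_accept(T, [1, 1, 0], j=1, i=2)
# The first message from processor 0 can:
assert can_accept(T, [1, 0, 0], j=0, i=2)
accept(T, 0)
# Now the causally dependent message from processor 1 is acceptable:
assert can_accept(T, [1, 1, 0], j=1, i=2)
```

The example shows the causal order in action: the message from p_1 stays pending until the message it depends on (from p_0) has been accepted.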
Note that the Reliable Total Order broadcast service cannot be implemented in the general asynchronous setting with processor crashes, since it would solve consensus in this model (the first accepted message would determine the agreement value), contradicting the fact that consensus is not solvable in the general model.

Multicast services

Multicast services are similar to the broadcast services, except that each multicast message is destined for a specified subset of all the processors. In the multicast service we provide two types of events, where qos denotes the required quality of service:

mc-send_i(m, D, qos): an event of processor p_i which sends a message m together with its id to all processors in a destination set D ⊆ {0, ..., n − 1}.

mc-recv_i(m, j, qos): an event of processor p_i which receives a message m sent by processor p_j.
Note that the event mc-recv is similar to bc-recv. As in the case of a broadcast service, we would like to provide useful ordering and reliability properties of the multicast services. We can adapt the ordering requirements from the broadcast services. Basic Multicast does not require any ordering properties. Single-Source FIFO requires that if one processor multicasts messages (possibly to different destination sets), then the messages received by each processor (if any) must be received in the same order as they were sent by the source. The definition of Causal Order remains the same. Instead of Total Order, which is difficult to achieve since the destination sets may be different, we define another ordering property:

Sub-Total Order: the orders of the received messages in all processors may be extended to a total order of the messages; more precisely, for any messages m, m' and processors p_i, p_j, if p_i and p_j both receive the messages m, m', then they are received in the same order by p_i and p_j.

The reliability conditions for multicast are somewhat different from the conditions for reliable broadcast.

Integrity: each message m received in an event mc-recv_i was sent in some mc-send event with a destination set containing processor p_i.

No-Duplicates: each processor receives a message not more than once.

Non-faulty Liveness: each message m sent by a non-faulty processor p_i must be received by every non-faulty processor in the destination set.

Faulty Liveness: each message sent by a faulty processor is either received by all non-faulty processors in the destination set or by none of them.

One way of implementing ordered and reliable multicast services is to use the corresponding broadcast services (for Sub-Total Order the corresponding broadcast requirement is Total Order). More precisely, if event mc-send_i(m, D, qos) occurs, processor p_i enables event bc-send_i(<m, D>, qos).
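The send side just described, together with the filtering receive side detailed next, amounts to a thin wrapper over the broadcast layer. A sketch (in Python; the function names are illustrative and the broadcast layer is simulated by plain calls):

```python
# Sketch of the multicast-over-broadcast wrapper: the destination set D is
# attached to the payload on send, and receivers outside D drop the message.

def mc_send(bc_send, m, D, qos):
    """mc-send_i(m, D, qos): wrap the destination set into the payload."""
    bc_send((m, frozenset(D)), qos)

def on_bc_recv(payload, sender, me, deliver):
    """bc-recv_me(<m, D>, sender, qos): deliver only if me is in D."""
    m, D = payload
    if me in D:
        deliver(m, sender)   # enables mc-recv_me(m, sender, qos)
    # otherwise the event is ignored

delivered = []
on_bc_recv(("hello", frozenset({0, 2})), sender=1, me=2,
           deliver=lambda m, j: delivered.append((m, j)))
on_bc_recv(("hello", frozenset({0, 2})), sender=1, me=1,
           deliver=lambda m, j: delivered.append((m, j)))
# only processor 2 delivered: delivered == [("hello", 1)]
```

The ordering guarantees come entirely from the underlying broadcast layer; the wrapper only restricts delivery to the destination set.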
When an event bc-recv_j(<m, D>, i, qos) occurs, processor p_j enables event mc-recv_j(m, i, qos) if p_j ∈ D, otherwise it ignores this event. The proof that this method provides the required multicast quality of service is left as an exercise.

13.7. Rumor collection algorithms

Reliable multicast services can be used as building blocks in constructing algorithms for more advanced communication problems. In this section we illustrate this method on the problem of collecting rumors by synchronous processors prone to crashes. (Since we consider only fair executions, we assume that at least one processor remains operational to the end of the computation.)

Rumor collection problem and requirements

The classic problem of collecting rumors, or gossip, is defined as follows:
At the beginning, each processor has its own distinct piece of information, called a rumor; the goal is to make every processor know all the rumors. In the model with processor crashes, however, we need to re-define the gossip problem to respect crash failures of processors. Both the Integrity and No-Duplicates properties are the same as in the reliable broadcast service; the only difference (which follows from the specification of the gossip problem) is in the Liveness requirements:

Non-faulty Liveness: the rumor of every non-faulty processor must be known by each non-faulty processor.

Faulty Liveness: if processor p_i has crashed during the execution, then each non-faulty processor either knows the rumor of p_i or knows that p_i has crashed.

The efficiency of gossip algorithms is measured in terms of time and message complexity. Time complexity measures the number of (synchronous) steps from the beginning to the termination. Message complexity measures the total number of point-to-point messages sent (more precisely, if a processor sends a message to three other processors in one synchronous step, it contributes three to the message complexity).

The following simple algorithm completes gossip in just one synchronous step: each processor broadcasts its rumor to all processors. The algorithm is correct, because each message received contains a rumor, and a message not received means the failure of its sender. A drawback of this solution is that a quadratic number of messages may be sent, which is quite inefficient.

We would like to perform gossip not only quickly, but also with few point-to-point messages. There is a natural trade-off between time and communication. Note that in a system without processor crashes such a trade-off may be achieved, e.g., by sending messages over an (almost) complete binary tree; then the time complexity is O(lg n), while the message complexity is O(n lg n).
Hence, by slightly increasing the time complexity we may achieve an almost linear improvement in message complexity. However, if the underlying communication network is prone to failures of components, then irregular failure patterns disturb the flow of information and make gossiping last longer. The question we address in this section is: what is the best trade-off between time and message complexity in the model with processor crashes?

Efficient gossip algorithms

In this part we describe a family of gossip algorithms, among which some efficient ones can be found. They are all based on the same generic code, and their efficiency depends on the quality of the two data structures plugged into the generic algorithm. Our goal is to prove that we can find such data structures that the obtained algorithm is always correct, and efficient if the number of crashes in the execution is at most f, where f ≤ n − 1 is a parameter.

We start with the description of these structures: communication graphs and communication schedules.

Communication graph

A graph G = (V, E) consists of a set V of vertices and a set E of edges. Graphs in this section are always simple, which means that edges are pairs of vertices, with
46 Distributed Algorithms no direction associated with them. Graphs are used to describe communication patterns. The set V of vertices of a graph consists of the processors of the underlying distributed system. Edges in E determine the pairs of processors that communicate directly by exchanging messages, but this does not necessarily mean an existence of a physical link between them. We abstract form the communication mechanism: messages that are exchanged between two vertices connected by an edge in E may need to be routed and traverse a possibly long path in the underlying physical communication network. Graph topologies we use, for a given number n of processors, vary depending on an upper bound f on the number of crashes we would like to tolerate in an execution. A graph that matters, at a given point in an execution, is the one induced by the processors that have not crashed till this step of the execution. To obtain an ecient gossip algorithm, communication graphs should satisfy some suitable properties, for example the following property R(n, f): Denition Let f < n be a pair of positive integers. Graph G is said to satisfy property R(n, f), if G has n vertices, and if, for each subgraph R G of size at least n f, there is a subgraph P (R) of G, such that the following hold: 1 : P (R) R 2 : P (R) = R /7 3 : The diameter of P (R) is at most ln n 4 : If R 1 R 2, then P (R 1 ) P (R 2 ) In the above denition, clause (1.) requires the existence of subgraphs P (R) whose vertices has the potential of (informally) inheriting the properties of the vertices of R, clause (2.) requires the subgraphs to be suciently large, linear in size, clause (3.) requires the existence of paths in the subgraphs that can be used for communication of at most logarithmic length, and clause (4.) imposes monotonicity on the required subgraphs. Observe that graph P (R) is connected, even if R is not, since its diameter is nite. 
The following result shows that graphs satisfying property R(n, f) can be constructed, and that their degree is not too large.

Theorem For each f < n, there exists a graph G(n, f) satisfying property R(n, f). The maximum degree of the graph G(n, f) is O(n/(n − f)).

Communication schedules

A local permutation is a permutation of all the integers in the range [0 .. n − 1]. We assume that prior to the computation a set Π of n local permutations is given. Each processor p_i has one such permutation π_i from Π. For simplicity we assume that π_i(0) = p_i. Local permutations are used to collect rumors in a systematic way, according to the order given by the permutation, while communication graphs are used to exchange already collected rumors within a large and compact non-faulty graph component.

Generic algorithm

We start with specifying the goal that gossiping algorithms need to achieve. We say
that processor p_i has heard about processor p_j if either p_i knows the original input rumor of p_j or p_i knows that p_j has already failed. We may reformulate the correctness of a gossiping algorithm in terms of hearing about other processors: the algorithm is correct if the Integrity and No-Duplicates properties are satisfied and if each processor has heard about every other processor by the termination of the algorithm.

The code of a gossiping algorithm includes objects that depend on the number n of processors in the system, and also on the bound f < n on the number of failures which are efficiently tolerated (if the number of failures is at most f, then the message complexity of the designed algorithm is small). An additional parameter is a termination threshold τ, which influences the time complexity of the specific implementation of the generic gossip scheme. Our goal is to construct a generic gossip algorithm which is correct for any choice of the parameters f, τ and of the communication graph and set of schedules, and which is efficient for some specific values f, τ and structures G(n, f) and Π.

Each processor starts gossiping as a collector. Collectors actively seek information about the rumors of the other processors, by sending direct inquiries to some of them. A collector becomes a disseminator after it has heard about all the processors. Processors with this status disseminate their knowledge by sending local views to selected other processors.

Local views. Each processor p_i starts with knowing only its ID and its input information rumor_i. To store incoming data, processor p_i maintains the following arrays: Rumors_i, Active_i and Pending_i, each of size n. All these arrays are initialised to store the value nil. For an array X_i of processor p_i, we denote its jth entry by X_i[j]; intuitively, this entry contains some information about processor p_j. The array Rumors is used to store all the rumors that a processor knows.
At the start, processor p_i sets Rumors_i[i] to its own rumor_i. Each time processor p_i learns some rumor_j, it immediately sets Rumors_i[j] to this value. The array Active is used to store the set of all the processors that the owner of the array knows to have crashed. Once processor p_i learns that some processor p_j has failed, it immediately sets Active_i[j] to failed. Notice that processor p_i has heard about processor p_j if one of the values Rumors_i[j] and Active_i[j] is not equal to nil.

The purpose of the array Pending is to facilitate dissemination. Each time processor p_i learns that some other processor p_j is fully informed, that is, it is either a disseminator itself or has been notified by a disseminator, it marks this information in Pending_i[j]. Processor p_i uses the array Pending_i to send dissemination messages in a systematic way, by scanning Pending_i to find those processors that possibly have not yet heard about some processor.

The following is useful terminology about the current contents of the arrays Active and Pending. Processor p_j is said to be active according to p_i if p_i has not yet received any information implying that p_j crashed, which is the same as having nil in Active_i[j]. Processor p_j is said to need to be notified by p_i if it is active according to p_i and Pending_i[j] is equal to nil.

Phases. An execution of a gossiping algorithm starts with the processors initialising all the local objects. Processor p_i initialises its list Rumors_i with nil at all the
locations, except for the ith one, which is set equal to rumor_i. The remaining part of the execution is structured as a loop, in which phases are iterated. Each phase consists of three parts: receiving messages, local computation, and multicasting messages. Phases are of two kinds: regular phases and ending phases. During a regular phase a processor: receives messages, updates its local knowledge, checks its status, and sends its knowledge to its neighbours in the communication graph as well as inquiries about rumors and replies about its own rumor. During an ending phase a processor: receives messages, sends inquiries to all the processors from which it has not heard yet, and replies about its own rumor.

The regular phases are performed τ times; the number τ is the termination threshold. After this, the ending phase is performed four times. This defines a generic gossiping algorithm.

Generic-Gossip

Code for any processor p_i, 0 ≤ i ≤ n − 1

01  initialisation
02    processor p_i becomes a collector
03    initialisation of the arrays Rumors_i, Active_i and Pending_i

11  repeat τ times
12    perform regular phase

20  repeat 4 times
21    perform ending phase

Now we describe the communication and the kinds of messages used in the regular and ending phases.

Graph and range messages used during regular phases. A processor p_i may send a message to its neighbour in the graph G(n, f), provided that the neighbour is still active according to p_i. Such a message is called a graph message. Sending these messages alone is not sufficient to complete gossiping, because the communication graph may become disconnected as a result of node crashes. Hence other messages are also sent, to cover all the processors in a systematic way. In this kind of communication processor p_i considers the processors in the order given by its local permutation π_i, that is, in the order π_i(0), π_i(1), ..., π_i(n − 1). Some of the additional messages sent in this process are called range messages.
During a regular phase processors send the following kinds of range messages: inquiring, reply and notifying messages. A collector p_i sends an inquiring message to the first processor about which p_i has not heard yet. Each recipient of such a message sends back a range message called a reply message. Disseminators also send range messages to subsets of processors. Such messages are called notifying messages. The target processor selected by a disseminator p_i is the first one that still needs to be notified by p_i. Notifying messages need not be replied to: the sender already knows the rumors of all the processors that are active according to it, and the purpose of the message is to disseminate this knowledge.
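The way a processor picks the target of its next range message can be sketched as follows (in Python; the helper names and the array encoding are illustrative, not from the text — nil entries are represented by None):

```python
# Sketch of range-message target selection, scanning the processors in the
# order of the local permutation pi, following the array conventions above.

def first_not_heard(pi, rumors, active):
    """Target of a collector's inquiring message: the first processor,
    in pi-order, about which the owner has not heard yet."""
    for j in pi:
        if rumors[j] is None and active[j] is None:
            return j
    return None  # heard about everybody: the collector becomes a disseminator

def first_to_notify(pi, active, pending):
    """Target of a disseminator's notifying message: the first processor,
    in pi-order, that is active and not yet marked in Pending."""
    for j in pi:
        if active[j] is None and pending[j] is None:
            return j
    return None

pi = [2, 0, 1, 3]                        # hypothetical local permutation of p_2
rumors = [None, None, "r2", None]        # p_2 knows only its own rumor
active = [None, "failed", None, None]    # processor 1 is known to have crashed
assert first_not_heard(pi, rumors, active) == 0

pending = [None, None, "done", None]
assert first_to_notify(pi, active, pending) == 0
```

Note how a crashed processor (here p_1) counts as "heard about" and is skipped as a notification target, exactly as the Active array prescribes.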
Regular-Phase

Code for any processor p_i, 0 ≤ i ≤ n − 1

01  receive messages

11  perform local computation
12    update the local arrays
13    if p_i is a collector that has already heard about all the processors
14      then p_i becomes a disseminator
15    compute the set of destination processors: for each processor p_j
16      if p_j is active according to p_i and p_j is a neighbour of p_i in graph G(n, f)
17        then add p_j to the destination set for a graph message
18      if p_i is a collector and p_j is the first processor about which p_i has not heard yet
19        then send an inquiring message to p_j
20      if p_i is a disseminator and p_j is the first processor that needs to be notified by p_i
21        then send a notifying message to p_j
22      if p_j is a collector from which an inquiring message was received in the receiving step of this phase
23        then send a reply message to p_j

30  send graph/inquiring/notifying/reply messages to the corresponding destination sets

Last-resort messages used during ending phases. Messages sent during the ending phases are called last-resort messages. These messages are categorised into inquiring, replying, and notifying messages, similarly to the corresponding range messages, because they serve a similar purpose. Collectors that have not heard about some processors yet send direct inquiries to all of these processors simultaneously. Such messages are called inquiring messages. They are replied to by the non-faulty recipients in the next step, by way of sending reply messages. This phase converts all the collectors into disseminators. In the next phase, each disseminator sends a message to all the processors that need to be notified by it. Such messages are called notifying messages.

The number of graph messages sent by a processor in a step of a regular phase is at most as large as the maximum node degree in the communication graph.
The number of range messages sent by a processor in a step of a regular phase is at most as large as the number of inquiries received plus a constant; hence the global number of point-to-point range messages sent by all the processors during the regular phases may be accounted as a constant times the number of inquiries sent (which is one per processor per phase). In contrast to this, there is no a priori upper bound on the number of messages sent during the ending phases. By choosing the termination threshold τ to be large enough, one may control how many rumors still need to be collected during the ending phases.

Updating the local view. A message sent by a processor carries its current local knowledge. More precisely, a message sent by processor p_i brings the following: the ID of p_i; the arrays Rumors_i, Active_i, and Pending_i; and a label to notify the
recipient about the character of the message. A label is selected from the following: graph_message, inquiry_from_collector, notification_from_disseminator, this_is_a_reply; their meaning is self-explanatory.

A processor p_i scans a newly received message from some processor p_j to learn about rumors, failures, and the current status of the other processors. It copies each rumor from the received copy of Rumors_j into Rumors_i, unless it is already there. It sets Active_i[k] to failed if this value is in Active_j[k]. It sets Pending_i[k] to done if this value is in Pending_j[k]. It sets Pending_i[j] to done if p_j is a disseminator and the received message is a range message. If p_i is itself a disseminator, then it sets Pending_i[j] to done immediately after sending a range message to p_j. If a processor p_i expects a message to come from processor p_j, for instance a graph message from a neighbour in the communication graph or a reply message, and the message does not arrive, then p_i knows that processor p_j has failed, and it immediately sets Active_i[j] to failed.

Ending-Phase

Code for any processor p_i, 0 ≤ i ≤ n − 1

01  receive messages

11  perform local computation
12    update the local arrays
13    if p_i is a collector that has already heard about all the processors
14      then p_i becomes a disseminator
15    compute the set of destination processors: for each processor p_j
16      if p_i is a collector and it has not heard about p_j yet
17        then send an inquiring message to p_j
18      if p_i is a disseminator and p_j needs to be notified by p_i
19        then send a notifying message to p_j
20      if an inquiring message was received from p_j in the receiving step of this phase
21        then send a reply message to p_j

30  send inquiring/notifying/reply messages to the corresponding destination sets

Correctness. The ending phases guarantee correctness, as stated in the next lemma.

Lemma Generic-Gossip is correct for every communication graph G(n, f) and set of schedules Π.
Proof The Integrity and No-Duplicates properties follow directly from the code and from the multicast service of the synchronous message-passing system. It remains to prove that each processor has heard about all the processors. Consider the step just before the first ending phase. If a processor p_i has not heard about some other processor p_j yet, then it sends a last-resort message to p_j in the first ending phase. It is replied to in the second ending phase, unless processor p_j has already crashed. In any case, in the third ending phase processor p_i either learns the input rumor of p_j or gets to know that p_j has failed. The fourth ending phase provides an opportunity for all the processors to which p_i sent notifying messages to receive them.
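The view-merging rules described in "Updating the local view" above can be sketched in Python as follows (the dictionary encoding and function name are illustrative; message transport itself is not modelled):

```python
# Sketch of the local-view update rules: merging a received message
# (sender's ID, Rumors, Active, Pending arrays, and a label) into the
# recipient's arrays. nil is represented by None.

def merge_view(my, msg):
    """my and msg hold 'rumors', 'active', 'pending' arrays;
    msg additionally carries 'sender', 'label' and 'is_disseminator'."""
    n = len(my["rumors"])
    for k in range(n):
        if my["rumors"][k] is None:           # copy rumors not yet known
            my["rumors"][k] = msg["rumors"][k]
        if msg["active"][k] == "failed":      # learn about crashes
            my["active"][k] = "failed"
        if msg["pending"][k] == "done":       # learn who is fully informed
            my["pending"][k] = "done"
    range_labels = ("inquiry_from_collector",
                    "notification_from_disseminator", "this_is_a_reply")
    if msg["is_disseminator"] and msg["label"] in range_labels:
        my["pending"][msg["sender"]] = "done"  # sender itself is informed

me = {"rumors": ["r0", None, None], "active": [None] * 3, "pending": [None] * 3}
msg = {"sender": 1, "label": "notification_from_disseminator",
       "is_disseminator": True,
       "rumors": ["r0", "r1", None], "active": [None, None, "failed"],
       "pending": [None, "done", None]}
merge_view(me, msg)
# me now knows rumor r1, that processor 2 failed, and that p_1 is informed
```

A single received message can thus advance all three arrays at once, which is what makes piggybacking whole local views on every message effective.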
The choice of the communication graph G(n, f), the set of schedules Π and the termination threshold τ influences, however, the time and message complexities of the specific implementation of the Generic Gossip algorithm. First consider the case when G(n, f) is a communication graph satisfying property R(n, f) from Definition 13.27, Π contains n random permutations, and τ = c log² n for a sufficiently large positive constant c. Using the above theorem we get the following result.

Theorem For every n and f ≤ cn, for some constant 0 ≤ c < 1, there is a graph G(n, f) such that the implementation of the generic gossip scheme with G(n, f) as communication graph and a set Π of random permutations completes gossip in expected time O(log² n) and with expected message complexity O(n log² n), if the number of crashes is at most f.

Consider a small modification of the Generic Gossip scheme: during a regular phase every processor p_i sends an inquiring message to the first Δ (instead of one) processors according to permutation π_i, where Δ is the maximum degree of the used communication graph G(n, f). Note that this does not influence the asymptotic message complexity, since besides the inquiring messages each processor p_i sends Δ graph messages in every regular phase.

Theorem For every n there are parameters f ≤ n − 1 and τ = O(log² n), and there is a graph G(n, f) such that the implementation of the modified Generic Gossip scheme with G(n, f) as communication graph and a set Π of random permutations completes gossip in expected time O(log² n) and with expected message complexity O(nΔ), for any number of crashes.

Since in the above theorem the set Π is selected prior to the computation, we obtain the following existential deterministic result.
Theorem For every n there are parameters f ≤ n − 1 and τ = O(lg² n), and there are a graph G(n, f) and a set of schedules Π such that the implementation of the modified Generic Gossip scheme with G(n, f) as communication graph and schedules Π completes gossip in time O(lg² n) and with message complexity O(nΔ), for any number of crashes.

Exercises

Design executions showing that there is no relation between Causal Order and Total Order, and none between Single-Source FIFO and Total Order broadcast services. For simplicity consider two processors and two messages sent.

Does a broadcast service satisfying the Single-Source FIFO and Causal Order requirements satisfy the Total Order property? Does a broadcast service satisfying the Single-Source FIFO and Total Order requirements satisfy the Causal Order property? If yes, provide a proof; if not, show a counterexample.

Show that using reliable Basic Broadcast instead of Basic Broadcast in the implementation of the Single-Source FIFO service, we obtain reliable Single-Source FIFO broadcast.
Prove that the Ordered-Broadcast algorithm implements the Causal Order service on top of a Single-Source FIFO one.

What is the total number of point-to-point messages sent in the algorithm Ordered-Broadcast in the case of k broadcasts?

Estimate the total number of point-to-point messages sent during the execution of Reliable-Causally-Ordered-Broadcast, if it performs k broadcasts and there are f < n processor crashes during the execution.

Show an execution of the algorithm Reliable-Causally-Ordered-Broadcast which violates the Total Order requirement.

Write the code of an implementation of a reliable Sub-Total Order multicast service.

Show that the described method of implementing multicast services on top of the corresponding broadcast services is correct.

Show that the random graph G(n, f), in which each node selects independently at random (n/(n − f)) log n edges from itself to other processors, satisfies property R(n, f) from Definition 13.27 and has degree O((n/(n − f)) lg n) with probability at least 1 − O(1/n).

The leader election problem is as follows: all non-faulty processors must elect one and the same non-faulty processor in the same synchronous step. Show that leader election cannot be solved faster than the gossip problem in a synchronous message-passing system with processor crashes.

13.8. Mutual exclusion in shared memory

We now describe the second main model used to describe distributed systems, the shared memory model. To illustrate the algorithmic issues in this model, we discuss solutions for the mutual exclusion problem.

Shared memory systems

The shared memory is modeled in terms of a collection of shared variables, commonly referred to as registers. We assume the system contains n processors, p_0, ..., p_{n−1}, and m registers R_0, ..., R_{m−1}. Each processor is modeled as a state machine. Each register has a type, which specifies:

1. the values it can hold,
2. the operations that can be performed on it,
3. the value (if any) to be returned by each operation, and
4.
the new register value resulting from each operation.

Each register can have an initial value. For example, an integer-valued read/write register R can take on all integer values and has operations read(R, v) and write(R, v). The read operation returns the value v of the last preceding write, leaving R unchanged. The write(R, v) operation has an integer parameter v, returns no value, and changes R's value to v.

A configuration is a vector C = (q_0, ..., q_{n−1}, r_0, ..., r_{m−1}), where q_i is a state of p_i and r_j is a value of register R_j. The events are computation steps at the processors, where the following happens atomically (indivisibly):
1. p_i chooses a shared variable to access with a specific operation, based on p_i's current state,
2. the specified operation is performed on the shared variable,
3. p_i's state changes based on its transition function, applied to its current state and the value returned by the shared memory operation performed.
A finite sequence of configurations and events that begins with an initial configuration is called an execution. In the asynchronous shared memory system, an infinite execution is admissible if it has an infinite number of computation steps.

The mutual exclusion problem

In this problem a group of processors need to access a shared resource that cannot be used simultaneously by more than a single processor. The solution needs to have the following two properties. (1) Mutual exclusion: each processor needs to execute a code segment called a critical section so that at any given time at most one processor is executing it (i.e., is in the critical section). (2) Deadlock freedom: if one or more processors attempt to enter the critical section, then one of them eventually succeeds as long as no processor stays in the critical section forever. These two properties do not provide any individual guarantees to any processor. A stronger property is (3) No lockout: a processor that wishes to enter the critical section eventually succeeds as long as no processor stays in the critical section forever. Original solutions to this problem relied on special synchronisation support such as semaphores and monitors. We will present some of the distributed solutions using only ordinary shared variables.
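The ordinary shared variables used in these solutions are read/write registers with exactly the semantics defined above; a minimal sketch (the class name is ours, for illustration only):

```python
class ReadWriteRegister:
    """A read/write register: read returns the value of the last
    preceding write, write changes the value and returns nothing."""

    def __init__(self, initial=0):
        self._value = initial      # each register can have an initial value

    def read(self):
        return self._value         # leaves the register unchanged

    def write(self, v):
        self._value = v            # returns no value


R = ReadWriteRegister()
R.write(7)
print(R.read())   # 7: the value of the last preceding write
```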
We assume the program of a processor is partitioned into the following sections:
Entry / Try: the code executed in preparation for entering the critical section.
Critical: the code to be protected from concurrent execution.
Exit: the code executed when leaving the critical section.
Remainder: the rest of the code.
A processor cycles through these sections in the order: remainder, entry, critical, and exit. A processor that wants to enter the critical section first executes the entry section. After that, if successful, it enters the critical section. The processor releases the critical section by executing the exit section and returning to the remainder section. We assume that a processor may transition any number of times from the remainder to the entry section. Moreover, variables, both shared and local, accessed in the entry and exit sections are not accessed in the critical and remainder sections. Finally, no processor stays in the critical section forever. An algorithm for a shared memory system solves the mutual exclusion problem with no deadlock (or no lockout) if the following hold:
Mutual exclusion: In every configuration of every execution at most one processor is in the critical section.
No deadlock: In every admissible execution, if some processor is in the entry section in a configuration, then there is a later configuration in which some processor is in the critical section.
No lockout: In every admissible execution, if some processor is in the entry section in a configuration, then there is a later configuration in which that same processor is in the critical section.
In the context of mutual exclusion, an execution is admissible if for every processor p_i, p_i either takes an infinite number of steps or p_i ends in the remainder section. Moreover, no processor is ever stuck in the exit section (unobstructed exit condition).

Mutual exclusion using powerful primitives

A single bit suffices to guarantee mutual exclusion with no deadlock if a powerful test&set register is used. A test&set variable V is a binary variable which supports two atomic operations, test&set and reset, defined as follows:

test&set(V : memory address) returns binary value:
    temp ← V
    V ← 1
    return (temp)

reset(V : memory address):
    V ← 0

The test&set operation atomically reads and updates the variable. The reset operation is merely a write. There is a simple mutual exclusion algorithm with no deadlock, which uses one test&set register.

Mutual exclusion using one test&set register
Initially V equals 0

Entry:
    1  wait until test&set(V) = 0
Critical Section
Exit:
    2  reset(V)
Remainder

Assume that the initial value of V is 0. In the entry section, processor p_i repeatedly tests V until it returns 0. The last such test will assign 1 to V, causing any following test by other processors to return 1, prohibiting any other processor from entering the critical section. In the exit section p_i resets V to 0; another processor
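This scheme is easy to simulate. The sketch below is our own illustration, not the book's code: a Python lock stands in for the hardware's atomic test&set, and several threads use the register to protect a shared counter:

```python
import threading

class TestAndSetRegister:
    """Simulated test&set register: real hardware performs the
    read-modify-write in one instruction; here a lock makes it atomic."""

    def __init__(self):
        self._value = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        with self._guard:                  # atomic: read old value, write 1
            temp, self._value = self._value, 1
            return temp

    def reset(self):
        self._value = 0                    # merely a write

V = TestAndSetRegister()
counter = 0                                # shared data guarded by the critical section

def process(iterations):
    global counter
    for _ in range(iterations):
        while V.test_and_set() != 0:       # Entry: wait until test&set(V) = 0
            pass
        counter += 1                       # Critical section
        V.reset()                          # Exit: reset(V)

threads = [threading.Thread(target=process, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 4000: no increment was lost
```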
waiting in the entry section can now enter the critical section.

Theorem The algorithm using one test&set register provides mutual exclusion without deadlock.

Mutual exclusion using read/write registers

If a powerful primitive such as test&set is not available, then mutual exclusion must be implemented using only read/write operations.

The bakery algorithm. Lamport's Bakery algorithm for mutual exclusion is an early, classical example of such an algorithm that uses only shared read/write registers. The algorithm guarantees mutual exclusion and no lockout for n processors using O(n) registers (but the registers may need to store integer values that cannot be bounded ahead of time). Processors wishing to enter the critical section behave like customers in a bakery. They all get a number, and the one with the smallest number in hand is the next one to be served. Any processor not standing in line has number 0, which is not counted as the smallest number. The algorithm uses the following shared data structures: Number is an array of n integers, holding in its i-th entry the current number of processor p_i. Choosing is an array of n boolean values such that Choosing[i] is true while p_i is in the process of obtaining its number. Any processor p_i that wants to enter the critical section attempts to choose a number greater than any number of any other processor and writes it into Number[i]. To do so, processors read the array Number and pick the greatest number read plus 1 as their own number. Since, however, several processors might be reading the array at the same time, symmetry is broken by choosing (Number[i], i) as p_i's ticket. An ordering on tickets is defined using the lexicographical ordering on pairs. After choosing its ticket, p_i waits until its ticket is minimal: for all other p_j, p_i waits until p_j is not in the process of choosing a number and then compares their tickets.
If p_j's ticket is smaller, p_i waits until p_j executes the critical section and leaves it.

Bakery
Code for processor p_i, 0 ≤ i ≤ n − 1.
Initially Number[i] = 0 and Choosing[i] = false, for 0 ≤ i ≤ n − 1

Entry:
    1  Choosing[i] ← true
    2  Number[i] ← max(Number[0], ..., Number[n − 1]) + 1
    3  Choosing[i] ← false
    4  for j ← 1 to n (j ≠ i) do
    5      wait until Choosing[j] = false
    6      wait until Number[j] = 0 or (Number[j], j) > (Number[i], i)
Critical Section
Exit:
    7  Number[i] ← 0
Remainder

We leave the proofs of the following theorems as exercises.

Theorem Bakery guarantees mutual exclusion.

Theorem Bakery guarantees no lockout.

A bounded mutual exclusion algorithm for n processors. Lamport's Bakery algorithm requires the use of unbounded values. We next present an algorithm that removes this requirement. In this algorithm, first presented by Peterson and Fischer, processors compete pairwise, using a two-processor algorithm, in a tournament tree arrangement. All pairwise competitions are arranged in a complete binary tree. Each processor is assigned to a specific leaf of the tree. At each level, the winner in a given node is allowed to proceed to the next higher level, where it will compete with the winner moving up from the other child of this node (if such a winner exists). The processor that finally wins the competition at the root node is allowed to enter the critical section.

Let k = ⌈lg n⌉ − 1. Consider a complete binary tree with 2^k leaves and a total of 2^(k+1) − 1 nodes. The nodes of the tree are numbered inductively in the following manner: the root is numbered 1; the left child of the node numbered m is numbered 2m and the right child is numbered 2m + 1. Hence the leaves of the tree are numbered 2^k, 2^k + 1, ..., 2^(k+1) − 1. With each node m, three binary shared variables are associated: Want_m[0], Want_m[1], and Priority_m. All variables have an initial value of 0. The algorithm is recursive. The code of the algorithm consists of a procedure Node(m, side) which is executed when a processor accesses node m, while assuming the role of processor side. Each node has a critical section: it includes the entry sections of all the nodes on the path from the node's parent to the root, the original critical section, and the exit code of all nodes on the path from the root to the node's parent.
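The node numbering just described, and the assignment of processors to leaves, can be sketched as follows (the helper names are ours):

```python
import math

def tree_layout(n):
    """Tournament-tree layout for n processors.

    Root is node 1; children of node m are 2m and 2m + 1; with
    k = ceil(log2(n)) - 1 the leaves are numbered 2^k .. 2^(k+1) - 1.
    Processor p_i starts at leaf 2^k + i//2, playing role i mod 2.
    """
    k = math.ceil(math.log2(n)) - 1
    leaves = list(range(2 ** k, 2 ** (k + 1)))
    start = {i: (2 ** k + i // 2, i % 2) for i in range(n)}
    return k, leaves, start

k, leaves, start = tree_layout(8)
print(k)          # 2
print(leaves)     # [4, 5, 6, 7]: two processors share each leaf
print(start[5])   # (6, 1): p_5 enters leaf 6 as side 1
```

On the way up, the winner at node m recurses to its parent ⌊m/2⌋, so each processor traverses a path of length k + 1 = O(lg n) to the root.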
To begin, processor p_i executes the code of node (2^k + ⌊i/2⌋, i mod 2).

Tournament-Tree
procedure Node(m: integer; side: 0..1)
    1  Want_m[side] ← 0
    2  wait until (Want_m[1 − side] = 0 or Priority_m = side)
    3  Want_m[side] ← 1
    4  if Priority_m = 1 − side
    5      then if Want_m[1 − side] = 1
    6              then goto line 1
    7      else wait until Want_m[1 − side] = 0
    8  if m = 1
    9      then Critical Section
   10      else Node(⌊m/2⌋, m mod 2)
   11  Priority_m ← 1 − side
   12  Want_m[side] ← 0
end procedure

This algorithm uses bounded values and, as the next theorems show, satisfies the mutual exclusion and no lockout properties.

Theorem The tournament tree algorithm guarantees mutual exclusion.

Proof Consider any execution. We begin at the nodes closest to the leaves of the tree. A processor enters the critical section of such a node if it reaches line 9 (it moves up to the next node). Assume we are at a node m that connects to the leaves where p_i and p_j start, and assume that two processors are in the critical section at some point. It follows from the code that then Want_m[0] = Want_m[1] = 1 at this point. Assume, without loss of generality, that p_i's last write to Want_m[0] before entering the critical section follows p_j's last write to Want_m[1] before entering the critical section. Note that p_i can enter the critical section (of m) either through line 5 or through line 7; in both cases p_i reads Want_m[1] = 0. However, p_i's read of Want_m[1] follows p_i's write to Want_m[0], which by assumption follows p_j's write to Want_m[1]. Hence p_i's read of Want_m[1] should return 1, a contradiction. The claim follows by induction on the levels of the tree.

Theorem The tournament tree algorithm guarantees no lockout.

Proof Consider any admissible execution. Assume that some processor p_i is starved. Hence from some point on p_i is forever in the entry section. We now show that p_i cannot be stuck forever in the entry section of a node m; the claim then follows by induction.
Case 1: Suppose p_j executes line 11, setting Priority_m to 0. Then Priority_m equals 0 forever after. Thus p_i passes the test in line 2 and skips line 5.
Hence p_i must be waiting in line 7 for Want_m[1] to become 0, which never occurs. Thus p_j is always executing between lines 3 and 11. But since p_j does not stay in the critical section forever, this would mean that p_j is stuck in the entry section forever, which is impossible: p_j would then execute line 5 and, returning to line 1, reset Want_m[1] to 0.
Case 2: Suppose that from some point on p_j never executes line 11. Then p_j must be waiting in line 7 or be in the remainder section. If p_j is in the entry section, it passes the test in line 2 (Priority_m is 1). Since p_i cannot pass line 2, it never reaches line 3, so p_i waits in line 2 with Want_m[0] = 0; hence p_j passes the test in line 7. So p_j cannot be forever in the entry section. If p_j is forever in the remainder section, Want_m[1]
equals 0 henceforth. So p_i cannot be stuck at line 2, 5 or 7, a contradiction. The claim follows by induction on the levels of the tree.

Lower bound on the number of read/write registers

So far, all deadlock-free mutual exclusion algorithms presented require the use of at least n shared variables, where n is the number of processors. Since it was possible to develop an algorithm that uses only bounded values, the question arises whether there is a way of reducing the number of shared variables used. Burns and Lynch first showed that any deadlock-free mutual exclusion algorithm using only shared read/write registers must use at least n shared variables, regardless of their size. The proof of this theorem allows the variables to be multi-writer variables. This means that each processor is allowed to write to each variable. Note that if the variables were single-writer, the theorem would be obvious, since each processor needs to write something to a (separate) variable before entering the critical section. Otherwise a processor could enter the critical section without any other processor knowing, allowing another processor to enter the critical section concurrently, a contradiction to the mutual exclusion property.

The proof by Burns and Lynch introduces a new proof technique, a covering argument: given any no-deadlock mutual exclusion algorithm A, it shows that there is some reachable configuration of A in which each of the n processors is about to write to a distinct shared variable. This is called a covering of the shared variables. The existence of such a configuration can be shown using induction, and it exploits the fact that any processor, before entering the critical section, must write to at least one shared variable. The proof constructs a covering of all shared variables. A processor then enters the critical section. Immediately thereafter the covering writes are released, so that no processor can detect the processor in the critical section.
Another processor now concurrently enters the critical section, a contradiction.

Theorem Any no-deadlock mutual exclusion algorithm using only read/write registers must use at least n shared variables.

Lamport's fast mutual exclusion algorithm

In all mutual exclusion algorithms presented so far, the number of steps taken by processors before entering the critical section depends on n, the number of processors, even in the absence of contention (where multiple processors attempt to concurrently enter the critical section), that is, even when a single processor is the only processor in the entry section. In most real systems, however, the expected contention is usually much smaller than n. A mutual exclusion algorithm is said to be fast if a processor enters the critical section within a constant number of steps when it is the only processor trying to enter the critical section. Note that a fast algorithm requires the use of multi-writer, multi-reader shared variables: if only single-writer variables were used, a processor would have to read at least n variables. Such a fast mutual exclusion algorithm was presented by Lamport.
Fast-Mutual-Exclusion
Code for processor p_i, 0 ≤ i ≤ n − 1.
Initially Fast-Lock and Slow-Lock are 0, and Want[i] is false for all i, 0 ≤ i ≤ n − 1

Entry:
    1  Want[i] ← true
    2  Fast-Lock ← i
    3  if Slow-Lock ≠ 0
    4      then Want[i] ← false
    5           wait until Slow-Lock = 0
    6           goto 1
    7  Slow-Lock ← i
    8  if Fast-Lock ≠ i
    9      then Want[i] ← false
   10           for all j, wait until Want[j] = false
   11           if Slow-Lock ≠ i
   12              then wait until Slow-Lock = 0
   13                   goto 1
Critical Section
Exit:
   14  Slow-Lock ← 0
   15  Want[i] ← false
Remainder

Lamport's algorithm is based on the correct combination of two mechanisms, one allowing fast entry when no contention is detected, and the other providing deadlock freedom in the case of contention. Two variables, Fast-Lock and Slow-Lock, are used for controlling access when there is no contention. In addition, each processor p_i has a boolean variable Want[i] whose value is true if p_i is interested in entering the critical section and false otherwise. A processor can enter the critical section either by finding Fast-Lock = i - in this case it enters the critical section on the fast path - or by finding Slow-Lock = i, in which case it enters the critical section along the slow path. Consider the case where no processor is in the critical section or in the entry section. In this case Slow-Lock is 0 and all Want entries are false. Once p_i enters the entry section, it sets Want[i] to true and Fast-Lock to i. Then it checks Slow-Lock, which is 0. Then it checks Fast-Lock again, and since no other processor is in the entry section, it reads i and enters the critical section along the fast path with three writes and two reads. If Fast-Lock ≠ i, then p_i waits until all Want flags are reset. After some processor executes the for loop in line 10, the value of Slow-Lock remains unchanged until some processor leaving the critical section resets it. Hence at most one processor p_j may find Slow-Lock = j, and this processor enters the critical section along the slow path.
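As a sanity check, the algorithm can be simulated with Python threads. This is our own sketch, not the book's code: we number processors from 1 so that the value 0 can keep its meaning "free" in Slow-Lock, and we busy-wait exactly as the pseudocode does.

```python
import sys
import threading

sys.setswitchinterval(1e-4)   # preempt busy-waiting threads more often

N = 3                         # processor ids 1..N; 0 is reserved to mean "free"
fast_lock, slow_lock = 0, 0
want = [False] * (N + 1)
counter = 0                   # shared data protected by the critical section

def enter(i):
    global fast_lock, slow_lock
    while True:
        want[i] = True                       # line 1
        fast_lock = i                        # line 2
        if slow_lock != 0:                   # line 3: slow path is busy
            want[i] = False                  # line 4
            while slow_lock != 0: pass       # line 5
            continue                         # line 6: goto 1
        slow_lock = i                        # line 7
        if fast_lock != i:                   # line 8: contention detected
            want[i] = False                  # line 9
            for j in range(1, N + 1):        # line 10
                while want[j]: pass
            if slow_lock != i:               # line 11
                while slow_lock != 0: pass   # line 12
                continue                     # line 13: goto 1
        return                               # enter the critical section

def leave(i):
    global slow_lock
    slow_lock = 0                            # line 14
    want[i] = False                          # line 15

def worker(i):
    global counter
    for _ in range(100):
        enter(i)
        counter += 1                         # critical section
        leave(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(1, N + 1)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 300: every increment survived, so mutual exclusion held
```

Note that in an uncontended run a thread returns after lines 1, 2, 3, 7, 8: three writes and two reads, matching the fast-path count given above.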
Note that Lamport's Fast-Mutual-Exclusion algorithm does not guarantee
lockout freedom.

Theorem Algorithm Fast-Mutual-Exclusion guarantees mutual exclusion without deadlock.

Exercises

An algorithm solves the 2-mutual exclusion problem if at any time at most two processors are in the critical section. Present an algorithm for solving the 2-mutual exclusion problem using test&set registers.

Prove that the Bakery algorithm satisfies the mutual exclusion property.

Prove that the Bakery algorithm provides no lockout.

Isolate a bounded mutual exclusion algorithm with no lockout for two processors from the tournament tree algorithm. Show that your algorithm has the mutual exclusion property. Show that it has the no lockout property.

Prove that algorithm Fast-Mutual-Exclusion has the mutual exclusion property.

Prove that algorithm Fast-Mutual-Exclusion has the no deadlock property.

Show that algorithm Fast-Mutual-Exclusion does not satisfy the no lockout property, i.e. construct an execution in which a processor is locked out of the critical section.

Construct an execution of algorithm Fast-Mutual-Exclusion in which two processors are in the entry section and both read at least Ω(n) variables before entering the critical section.

Problems

13-1 Number of messages of the algorithm Flood
Prove that the algorithm Flood sends O(e) messages in any execution, given a graph G with n vertices and e edges. What is the exact number of messages as a function of the number of vertices and edges in the graph?

13-2 Leader election in a ring
Assume that messages can only be sent in the clockwise direction, and design an asynchronous algorithm for leader election on a ring that has O(n lg n) message complexity. Hint. Let processors work in phases. Each processor begins in the active mode with a value equal to the identifier of the processor, and under certain conditions can enter the relay mode, where it just relays messages.
An active processor waits for messages from two active processors, then inspects the values sent by these processors and decides whether to become the leader, remain active and adopt one of the values, or start relaying. Determine how the decisions should be made so as to ensure that if there are three or more active processors, then at least one of them will remain active, and that no matter what values the active processors have in a phase, at most half of them will still be active in the next phase.
13-3 Validity condition in asynchronous systems
Show that the validity condition is equivalent to requiring that every nonfaulty processor decision be the input of some processor.

13-4 Single source consensus
An alternative version of the consensus problem requires that the input value of one distinguished processor (the general) be distributed to all the other processors (the lieutenants). This problem is also called the single source consensus problem. The conditions that need to be satisfied are:
Termination: every nonfaulty lieutenant must eventually decide,
Agreement: all the nonfaulty lieutenants must have the same decision,
Validity: if the general is nonfaulty, then the common decision value is the general's input.
So if the general is faulty, then the nonfaulty processors need not decide on the general's input, but they must still agree with each other. Consider the synchronous message passing system with Byzantine faults. Show how to transform a solution to the consensus problem (in Subsection ) into a solution to the general's problem and vice versa. What are the message and round overheads of your transformation?

13-5 Bank transactions
Imagine that there are n banks that are interconnected. Each bank i starts with an amount of money m_i. Banks do not remember the initial amount of money. Banks keep on transferring money among themselves by sending messages of type <10> that represent the value of a transfer. At some point of time a bank decides to find the total amount of money in the system. Design an algorithm for calculating m_1 + ... + m_n that does not stop monetary transactions.

Chapter notes

The definitions of the distributed systems presented in the chapter are derived from the book by Attiya and Welch [?]. The model of distributed computation, for message passing systems without failures, was proposed by Attiya, Dwork, Lynch and Stockmeyer [?].
Modeling the processors in distributed systems in terms of automata follows the paper of Lynch and Fischer [?]. The concept of the execution sequences is based on the papers of Fischer, Gries, Lamport and Owicki [?,?,?]. The definition of the asynchronous systems reflects the presentation in the papers of Awerbuch [?], and Peterson and Fischer [?]. The algorithm Spanning-Tree-Broadcast is presented after the paper due to Segall [?]. The leader election algorithm Bully was proposed by Hector Garcia-Molina in 1982 [?]. The asymptotic optimality of this algorithm was proved by Burns [?]. The two generals problem is presented as in the book of Gray [?].
The consensus problem was first studied by Lamport, Pease, and Shostak [?,?]. They proved that the Byzantine consensus problem is unsolvable if n ≤ 3f [?]. One of the basic results in the theory of asynchronous systems is that the consensus problem is not solvable even if we have reliable communication systems and a single faulty processor which fails by crashing. This result was first shown in a breakthrough paper by Fischer, Lynch and Paterson [?]. The algorithm Consensus-with-Crash-Failures is based on the paper of Dolev and Strong [?]. Berman and Garay [?] proposed an algorithm for the solution of the Byzantine consensus problem for the case n > 4f. Their algorithm needs 2(f + 1) rounds.

The Bakery algorithm for mutual exclusion using only shared read/write registers is due to Lamport [?]. This algorithm requires arbitrarily large values. This requirement is removed by Peterson and Fischer [?]. After this, Burns and Lynch proved that any deadlock-free mutual exclusion algorithm using only shared read/write registers must use at least n shared variables, regardless of their size [?]. The algorithm Fast-Mutual-Exclusion is presented by Lamport [?]. The source of the problems 13-3, 13-4, 13-5 is the book of Attiya and Welch [?].

Important textbooks on distributed algorithms include the monumental volume by Nancy Lynch [?] published in 1997, the book published by Gerard Tel [?] in 2000, and the book by Attiya and Welch [?]. Also of interest is the monograph by Claudia Leopold [?] published in 2001, and the book by Nicola Santoro [?]. Finally, several important open problems in distributed computing can be found in a recent paper of Aspnes et al. [?].
14. Computer Graphics

Computer graphics algorithms create and render virtual worlds stored in the computer memory. The virtual world model may contain shapes (points, line segments, surfaces, solid objects, etc.), which are represented by digital numbers. Rendering computes the displayed image of the virtual world from a given virtual camera. The image consists of small rectangles, called pixels. A pixel has a unique colour, thus it is sufficient to solve the rendering problem for a single point in each pixel. This point is usually the centre of the pixel. Rendering finds the shape which is visible through this point and writes its visible colour into the pixel. In this chapter we discuss the creation of virtual worlds and the determination of the visible shapes.

14.1. Fundamentals of analytic geometry

The base set of our examination is the Euclidean space. In computer algorithms the elements of this space should be described by numbers. The branch of geometry describing the elements of space by numbers is analytic geometry. The basic concepts of analytic geometry are the vector and the coordinate system.

Definition 14.1 A vector is a translation that is defined by its direction and length. A vector is denoted by v.

The length of a vector is also called its absolute value, and is denoted by |v|. Vectors can be added, resulting in a new vector that corresponds to the subsequent execution of the two translations. Addition is denoted by v_1 + v_2 = v. Vectors can be multiplied by scalar values, resulting also in a vector (λ · v_1 = v), which translates in the same direction as v_1, but the length of the translation is scaled by λ. The dot product of two vectors is a scalar that is equal to the product of the lengths of the two vectors and the cosine of their angle: v_1 · v_2 = |v_1| · |v_2| · cos α, where α is the angle between v_1 and v_2. Two vectors are said to be orthogonal if their dot product is zero.
On the other hand, the cross product of two vectors is a vector that is orthogonal to the plane of the two vectors and its length is equal to the product of the
lengths of the two vectors and the sine of their angle: v_1 × v_2 = v, where v is orthogonal to v_1 and v_2, and |v| = |v_1| · |v_2| · sin α. Of the two possible orthogonal directions, that alternative is selected where the middle finger of the right hand would point if the thumb pointed to the first vector and the forefinger to the second (right hand rule). Two vectors are said to be parallel if their cross product is zero.

Cartesian coordinate system

Any vector v of a plane can be expressed as the linear combination of two non-parallel vectors i, j in this plane, that is, v = x · i + y · j. Similarly, any vector v in the three-dimensional space can be unambiguously defined by the linear combination of three, not coplanar vectors: v = x · i + y · j + z · k. Vectors i, j, k are called basis vectors, while scalars x, y, z are referred to as coordinates. We shall assume that the basis vectors have unit length and are orthogonal to each other. Having defined the basis vectors, any other vector can unambiguously be expressed by three scalars, i.e. by its coordinates.

A point is specified by the vector which translates the reference point, called the origin, to the given point. In this case the translating vector is the place vector of the given point. The origin and the basis vectors constitute the Cartesian coordinate system, which is the basic tool to describe the points of the Euclidean plane or space by numbers. The Cartesian coordinate system is the algebraic basis of Euclidean geometry: scalar triplets of Cartesian coordinates can be paired with the points of the space, and having made a correspondence between algebraic and geometric concepts, the theorems of Euclidean geometry can be proven by algebraic means.
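The vector operations introduced so far translate directly into code; a small sketch on coordinate triplets (the function names are ours):

```python
import math

def dot(a, b):
    """Dot product: |a|·|b|·cos(angle), computed from coordinates."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product: orthogonal to both a and b, right-hand-rule oriented."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def length(a):
    """Absolute value |a| of a vector."""
    return math.sqrt(dot(a, a))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(dot(i, j))     # 0: the basis vectors are orthogonal
print(cross(i, j))   # (0, 0, 1) = k, by the right hand rule
print(length((3, 4, 0)))   # 5.0
```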
Exercises

Prove that there is a one-to-one mapping between Cartesian coordinate triplets and points of the three-dimensional space.

Prove that if the basis vectors have unit length and are orthogonal to each other, then (x_1, y_1, z_1) · (x_2, y_2, z_2) = x_1 x_2 + y_1 y_2 + z_1 z_2.

14.2. Description of point sets with equations

Coordinate systems provide means to specify points by numbers. Conditions on these numbers, on the other hand, may define sets of points. Conditions are formulated by
equations. The coordinates found as the solution of these equations define the point set. Let us now consider how these equations can be established.

    solid                                          f(x, y, z) implicit function
    sphere of radius R                             R^2 − x^2 − y^2 − z^2
    block of size 2a, 2b, 2c                       min{a − |x|, b − |y|, c − |z|}
    torus of axis z, radii r (tube) and R (hole)   r^2 − z^2 − (R − sqrt(x^2 + y^2))^2

Figure 14.1. Functions defining the sphere, the block, and the torus.

Solids

A solid is a subset of the three-dimensional Euclidean space. To define this subset, a continuous function f is used which maps the coordinates of points onto the set of real numbers. We say that a point belongs to the solid if the coordinates of the point satisfy the following implicit inequality: f(x, y, z) ≥ 0. Points satisfying inequality f(x, y, z) > 0 are the internal points, while points defined by f(x, y, z) < 0 are the external points. Because of the continuity of function f, points satisfying equality f(x, y, z) = 0 are between the external and internal points and are called the boundary surface of the solid. Intuitively, function f describes the signed distance between a point and the boundary surface. We note that we usually do not consider just any point set to be a solid, but also require that the point set have no lower dimensional degeneration (e.g. hanging lines or surfaces), i.e. that arbitrarily small neighbourhoods of each point of the boundary surface contain internal points. Figure 14.1 lists the defining functions of the sphere, the block, and the torus.

Surfaces

Points having coordinates that satisfy equation f(x, y, z) = 0 are on the boundary surface. Surfaces can thus be defined by this implicit equation. Since points can also be given by their place vectors, the implicit equation can be formulated for the place vectors as well: f(r) = 0. A surface may have many different equations.
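The implicit functions of Figure 14.1 translate directly into code. In the sketch below (the names are ours) a point is classified as internal, boundary or external by the sign of f; note that replacing f by, say, 2·f³ would classify every point identically, which illustrates that the same point set has many defining functions:

```python
import math

def f_sphere(x, y, z, R=1.0):
    """Sphere of radius R centred at the origin."""
    return R * R - x * x - y * y - z * z

def f_block(x, y, z, a=1.0, b=1.0, c=1.0):
    """Block of size 2a, 2b, 2c centred at the origin."""
    return min(a - abs(x), b - abs(y), c - abs(z))

def f_torus(x, y, z, r=0.5, R=2.0):
    """Torus of axis z with tube radius r and hole radius R."""
    return r * r - z * z - (R - math.sqrt(x * x + y * y)) ** 2

def classify(f, x, y, z):
    v = f(x, y, z)
    return "internal" if v > 0 else "boundary" if v == 0 else "external"

print(classify(f_sphere, 0, 0, 0))   # internal
print(classify(f_sphere, 1, 0, 0))   # boundary
print(classify(f_torus, 2, 0, 0))    # internal: the centre of the tube
print(classify(f_block, 2, 0, 0))    # external
```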
For example, equations f(x, y, z) = 0, f^2(x, y, z) = 0, and 2 · f^3(x, y, z) = 0 are algebraically different, but they have the same roots and thus define the same set of points.

A plane of normal vector n and place vector r_0 contains those points for which vector r − r_0 is perpendicular to the normal, thus their dot product is zero. Based on this, the points of a plane are defined by the following vector or scalar equations:

    (r − r_0) · n = 0,    n_x x + n_y y + n_z z + d = 0,    (14.1)
    solid                                          x(u, v)                y(u, v)                z(u, v)
    sphere of radius R                             R cos 2πu sin πv       R sin 2πu sin πv       R cos πv
    cylinder of radius R, axis z, and of height h  R cos 2πu              R sin 2πu              h · v
    cone of radius R, axis z, and of height h      R (1 − v) cos 2πu      R (1 − v) sin 2πu      h · v

Figure 14.2. Parametric forms of the sphere, the cylinder, and the cone, where u, v ∈ [0, 1].

where n_x, n_y, n_z are the coordinates of the normal and d = −r_0 · n. If the normal vector has unit length, then d expresses the signed distance between the plane and the origin of the coordinate system. Two planes are said to be parallel if their normals are parallel.

In addition to using implicit equations, surfaces can also be defined by parametric forms. In this case, the Cartesian coordinates of surface points are functions of two independent variables. Denoting these free parameters by u and v, the parametric equations of the surface are:

    x = x(u, v),    y = y(u, v),    z = z(u, v),    u ∈ [u_min, u_max],    v ∈ [v_min, v_max] .

The implicit equation of a surface can be obtained from the parametric equations by eliminating the free parameters u, v. Figure 14.2 includes the parametric forms of the sphere, the cylinder and the cone. Parametric forms can also be defined directly for the place vectors: r = r(u, v).

Points of a triangle are the convex combinations of points p_1, p_2, and p_3, that is,

    r(α, β, γ) = α · p_1 + β · p_2 + γ · p_3,  where α, β, γ ≥ 0 and α + β + γ = 1 .

From this definition we can obtain the usual two-variate parametric form of the triangle by substituting α by u, β by v, and γ by (1 − u − v):

    r(u, v) = u · p_1 + v · p_2 + (1 − u − v) · p_3,  where u, v ≥ 0 and u + v ≤ 1 .

Curves

By intersecting two surfaces, we obtain a curve that may be defined formally by the implicit equations of the two intersecting surfaces, f_1(x, y, z) = f_2(x, y, z) = 0, but this is needlessly complicated. Instead, let us consider the parametric forms of the two surfaces, given as r_1(u_1, v_1) and r_2(u_2, v_2), respectively.
The points of the intersection satisfy vector equation r_1(u_1, v_1) = r_2(u_2, v_2), which corresponds to three scalar equations, one for each coordinate of the three-dimensional space. Thus we can eliminate three of the four unknowns u_1, v_1, u_2, v_2, and obtain a one-variate parametric equation for the coordinates of the curve points:
curve                                                        | x(t)                | y(t)                | z(t)
ellipse of main axes 2a, 2b on plane z = 0                   | a·cos 2πt           | b·sin 2πt           | 0
helix of radius R, axis z, and elevation h                   | R·cos 2πt           | R·sin 2πt           | h·t
line segment between points (x_1, y_1, z_1), (x_2, y_2, z_2) | x_1·(1 − t) + x_2·t | y_1·(1 − t) + y_2·t | z_1·(1 − t) + z_2·t

Figure 14.3. Parametric forms of the ellipse, the helix, and the line segment, where t ∈ [0, 1].

x = x(t),  y = y(t),  z = z(t),  t ∈ [t_min, t_max].

Similarly, we can use the vector form: r = r(t), t ∈ [t_min, t_max]. Figure 14.3 includes the parametric equations of the ellipse, the helix, and the line segment.

Note that we can define curves on a surface by fixing one of the free parameters u, v. For example, by fixing v the parametric form of the resulting curve is r_v(u) = r(u, v). These curves are called iso-parametric curves.

Two points define a line. Let us select one point and call the place vector of this point the place vector of the line. On the other hand, the vector between the two points is the direction vector. Any other point of the line can be obtained by a translation of the point of the place vector parallel to the direction vector. Denoting the place vector by r_0 and the direction vector by v, the equation of the line is:

r(t) = r_0 + v·t,  t ∈ (−∞, ∞).  (14.2)

Two lines are said to be parallel if their direction vectors are parallel.

Instead of the complete line, we can also specify the points of a line segment if parameter t is restricted to an interval. For example, the equation of the line segment between points r_1 and r_2 is:

r(t) = r_1 + (r_2 − r_1)·t = r_1·(1 − t) + r_2·t,  t ∈ [0, 1].  (14.3)

According to this definition, the points of a line segment are the convex combinations of the endpoints.

Normal vectors

In computer graphics we often need the normal vectors of the surfaces (i.e. the normal vector of the tangent plane of the surface). Let us take an example.
A mirror reflects light in a way that the incident direction, the normal vector, and the reflection direction are in the same plane, and the angle between the normal and the incident direction equals the angle between the normal and the reflection direction. To carry out such and similar computations, we need methods to obtain the normal of the surface. The equation of the tangent plane is obtained as the first-order Taylor approximation of the implicit equation around point (x_0, y_0, z_0):

f(x, y, z) = f(x_0 + (x − x_0), y_0 + (y − y_0), z_0 + (z − z_0))
≈ f(x_0, y_0, z_0) + ∂f/∂x·(x − x_0) + ∂f/∂y·(y − y_0) + ∂f/∂z·(z − z_0).

Points (x_0, y_0, z_0) and (x, y, z) are on the surface, thus f(x_0, y_0, z_0) = 0 and f(x, y, z) = 0, resulting in the following equation of the tangent plane:

∂f/∂x·(x − x_0) + ∂f/∂y·(y − y_0) + ∂f/∂z·(z − z_0) = 0.

Comparing this equation to equation (14.1), we can see that the normal vector of the tangent plane is

n = (∂f/∂x, ∂f/∂y, ∂f/∂z) = grad f.  (14.4)

The normal vector of parametric surfaces can be obtained by examining the iso-parametric curves. The tangent of curve r_v(u), defined by fixing parameter v, is obtained by the first-order Taylor approximation:

r_v(u) = r_v(u_0 + (u − u_0)) ≈ r_v(u_0) + (d r_v / du)·(u − u_0) = r_v(u_0) + (∂r/∂u)·(u − u_0).

Comparing this approximation to equation (14.2) describing a line, we conclude that the direction vector of the tangent line is ∂r/∂u. The tangent lines of the curves running on a surface are in the tangent plane of the surface, making the normal vector perpendicular to the direction vectors of these lines. In order to find the normal vector, both the tangent line of curve r_v(u) and the tangent line of curve r_u(v) are computed, and their cross product is evaluated, since the result of the cross product is perpendicular to the multiplied vectors. The normal of surface r(u, v) is then

n = ∂r/∂u × ∂r/∂v.  (14.5)

Curve modelling

Parametric and implicit equations trace back the geometric design of the virtual world to the establishment of these equations. However, these equations are often not intuitive enough, thus they cannot be used directly during design. It would not be reasonable to expect the designer working on a human face or on a car to directly specify the equations of these objects. Clearly, indirect methods are needed which require intuitive data from the designer and define these equations automatically. One category of these indirect approaches applies control points.
Another category of methods works with elementary building blocks (box, sphere, cone, etc.) and with set operations.

Let us first discuss how the method based on control points can define curves. Suppose that the designer specified points r_0, r_1, ..., r_m, and that a parametric curve of equation r = r(t) should be found which follows these points. For the time being, the curve is not required to go through these control points. We use the analogy of the centre of mass of mechanical systems to construct our curve. Assume that we have sand of unit mass, which is distributed at the control
points. If a control point has most of the sand, then the centre of mass is close to this point. Controlling the distribution of the sand as a function of parameter t, giving the main influence to different control points one after the other, the centre of mass will travel along a curve running close to the control points.

Let us put weights B_0(t), B_1(t), ..., B_m(t) at the control points at parameter t. These weighting functions are also called the basis functions of the curve. Since unit weight is distributed, we require that for each t the following identity holds:

Σ_{i=0}^{m} B_i(t) = 1.

For some t, the respective point of the curve is the centre of mass of this mechanical system:

r(t) = (Σ_{i=0}^{m} B_i(t)·r_i) / (Σ_{i=0}^{m} B_i(t)) = Σ_{i=0}^{m} B_i(t)·r_i.

Note that the reason for distributing sand of unit mass is that this decision makes the denominator of the fraction equal to 1. To make the analogy complete, the basis functions cannot be negative since the mass is always non-negative. The centre of mass of a point system is always in the convex hull¹ of the participating points, thus if the basis functions are non-negative, then the curve remains in the convex hull of the control points.

The properties of the curves are determined by the basis functions. Let us now discuss two popular basis function systems, namely the basis functions of the Bézier curves and the B-spline curves.

Bézier curve

Pierre Bézier, a designer working at Renault, proposed the Bernstein polynomials as basis functions. Bernstein polynomials can be obtained as the expansion of 1^m = (t + (1 − t))^m according to the binomial theorem:

(t + (1 − t))^m = Σ_{i=0}^{m} C(m, i)·t^i·(1 − t)^{m−i},

where C(m, i) denotes the binomial coefficient. The basis functions of Bézier curves are the terms of this sum (i = 0, 1, ..., m):

B^Bezier_{i,m}(t) = C(m, i)·t^i·(1 − t)^{m−i}.
(14.6)

According to the introduction of Bernstein polynomials, it is obvious that they really meet condition Σ_{i=0}^{m} B_i(t) = 1 and B_i(t) ≥ 0 in t ∈ [0, 1], which guarantees that Bézier curves are always in the convex hulls of their control points. The basis functions and the shape of the Bézier curve are shown in Figure 14.4. At parameter value t = 0 the first basis function is 1, while the others are zero, therefore the curve

¹ The convex hull of a point system is by definition the minimal convex set containing the point system.
Figure 14.4. A Bézier curve defined by four control points and the respective basis functions (m = 3).

starts at the first control point. Similarly, at parameter value t = 1 the curve arrives at the last control point. At other parameter values, all basis functions are positive, thus they simultaneously affect the curve. Consequently, the curve usually does not go through the other control points.

B-spline

The basis functions of the B-spline can be constructed by applying a sequence of linear blending. A B-spline weights the m + 1 control points by (k − 1)-degree polynomials. Value k is called the order of the curve. Let us take a non-decreasing series of m + k + 1 parameter values, called the knot vector:

t = [t_0, t_1, ..., t_{m+k}],  t_0 ≤ t_1 ≤ ... ≤ t_{m+k}.

By definition, the ith first order basis function is 1 in the ith interval, and zero elsewhere (Figure 14.5):

B^BS_{i,1}(t) = 1 if t_i ≤ t < t_{i+1}, and 0 otherwise.

Using this definition, m + k first order basis functions are established, which are non-negative zero-degree polynomials that sum up to 1 for all parameters t ∈ [t_0, t_{m+k}). These basis functions have too low degree since the resulting centre of mass is not even a curve, but jumps from control point to control point.

The order of the basis functions, as well as the smoothness of the curve, can be increased by blending two consecutive basis functions with linear weighting (Figure 14.5). The first basis function is weighted by the linearly increasing factor (t − t_i)/(t_{i+1} − t_i) in domain t_i ≤ t < t_{i+1}, where the basis function is non-zero. The next basis function, on the other hand, is scaled by the linearly decreasing factor (t_{i+2} − t)/(t_{i+2} − t_{i+1}) in its domain t_{i+1} ≤ t < t_{i+2}, where it is non-zero. The two weighted basis
Figure 14.5. Construction of B-spline basis functions. A higher order basis function is obtained by blending two consecutive basis functions on the previous level using a linearly increasing and a linearly decreasing weighting, respectively. Here the number of control points is 5, i.e. m = 4. Arrows indicate the useful interval [t_{k−1}, t_{m+1}] where we can find m + 1 basis functions that add up to 1. The right side of the figure depicts control points with triangles and curve points corresponding to the knot values with circles.

functions are added to obtain the tent-like second order basis functions. Note that while a first order basis function is non-zero in a single interval, the second order basis functions expand to two intervals. Since the construction makes a new basis function from every pair of consecutive lower order basis functions, the number of new basis functions is one less than that of the original ones. We thus have m + k − 1 second order basis functions. Except for the first and the last first order basis functions, all of them are used once with linearly increasing and once with linearly decreasing weighting, thus with the exception of the first and the last intervals, i.e. in [t_1, t_{m+k−1}], the new basis functions also sum up to 1. The second order basis functions are first degree polynomials.

The degree of the basis functions, i.e. the order of the curve, can be arbitrarily increased by the recursive application of the presented blending method. The dependence of the next order basis functions on the previous order ones is as follows:

B^BS_{i,k}(t) = (t − t_i)·B^BS_{i,k−1}(t) / (t_{i+k−1} − t_i) + (t_{i+k} − t)·B^BS_{i+1,k−1}(t) / (t_{i+k} − t_{i+1}),  if k > 1.
Note that we always take two consecutive basis functions and weight them in their non-zero domain (i.e. in the interval where they are non-zero) with the linearly increasing factor (t − t_i)/(t_{i+k−1} − t_i) and with the linearly decreasing factor (t_{i+k} − t)/(t_{i+k} − t_{i+1}),
Figure 14.6. B-spline interpolation. Based on points p_0, ..., p_m to be interpolated, control points c_{−1}, ..., c_{m+1} are computed to make the start and end points of the segments equal to the interpolated points.

respectively. The two weighted functions are summed to obtain the higher order, and therefore smoother, basis function. Repeating this operation (k − 1) times, k-order basis functions are generated, which sum up to 1 in interval [t_{k−1}, t_{m+1}].

The knot vector may have elements that are the same, thus the length of the intervals may be zero. Such intervals result in 0/0-like fractions, which must be replaced by value 1 in the implementation of the construction. The value of the ith k-order basis function at parameter t can be computed with the following Cox–de Boor–Mansfield recursion:

B-spline(i, k, t, t)
 1  if k = 1                                          ▷ Trivial case.
 2    then if t_i ≤ t < t_{i+1}
 3      then return 1
 4      else return 0
 5  if t_{i+k−1} − t_i > 0
 6    then b_1 ← (t − t_i)/(t_{i+k−1} − t_i)          ▷ Previous with linearly increasing weight.
 7    else b_1 ← 1                                    ▷ Here: 0/0 = 1.
 8  if t_{i+k} − t_{i+1} > 0
 9    then b_2 ← (t_{i+k} − t)/(t_{i+k} − t_{i+1})    ▷ Next with linearly decreasing weight.
10    else b_2 ← 1                                    ▷ Here: 0/0 = 1.
11  B ← b_1 · B-spline(i, k − 1, t, t) + b_2 · B-spline(i + 1, k − 1, t, t)  ▷ Recursion.
12  return B

In practice, we usually use fourth-order basis functions (k = 4), which are third-degree polynomials and define curves that can be continuously differentiated twice. The reason is that bent rods and motion paths following Newton's laws also have this property.

When the number of control points is greater than the order of the curve, the basis functions are non-zero only in a part of the valid parameter set. This means that a control point affects just a part of the curve. Moving this control point, the change of the curve is local. Local control is a very important property since the designer can adjust the shape of the curve without destroying its general form.
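The recursion above translates almost line by line into Python. A minimal sketch (the function and variable names are ours, not from the text), together with a check that the k-order basis functions sum to 1 inside the useful interval [t_{k−1}, t_{m+1}]:

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion for the i-th k-order B-spline basis function,
    with 0/0-like fractions replaced by 1 as in the pseudocode."""
    if k == 1:  # trivial case
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    # previous basis function with linearly increasing weight
    b1 = (t - knots[i]) / (knots[i + k - 1] - knots[i]) \
        if knots[i + k - 1] > knots[i] else 1.0
    # next basis function with linearly decreasing weight
    b2 = (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) \
        if knots[i + k] > knots[i + 1] else 1.0
    return b1 * bspline_basis(i, k - 1, t, knots) + \
        b2 * bspline_basis(i + 1, k - 1, t, knots)

# m = 4 (five control points), order k = 4: the knot vector has m+k+1 = 9 values.
knots = [0, 1, 2, 3, 4, 5, 6, 7, 8]
# Partition of unity holds in the useful interval [t_3, t_5] = [3, 5]:
total = sum(bspline_basis(i, 4, 4.5, knots) for i in range(5))
```

With a uniform knot vector, as here, all five basis functions are shifted copies of the same cubic polynomial segment arrangement.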
A fourth-order B-spline usually does not go through its control points. If we wish to use it for interpolation, the control points should be calculated from the points to
be interpolated. Suppose that we need a curve which visits points p_0, p_1, ..., p_m at parameter values t_0 = 0, t_1 = 1, ..., t_m = m, respectively (Figure 14.6). To find such a curve, control points [c_{−1}, c_0, c_1, ..., c_{m+1}] should be found to meet the following interpolation criteria:

r(t_j) = Σ_{i=−1}^{m+1} c_i · B^BS_{i,4}(t_j) = p_j,  j = 0, 1, ..., m.

These criteria can be formalized as m + 1 linear equations with m + 3 unknowns, thus the solution is ambiguous. To make the solution unambiguous, two additional conditions should be imposed. For example, we can set the derivatives (for motion paths, the speed) at the start and end points.

B-spline curves can be further generalized by defining the influence of the ith control point as the product of B-spline basis function B_i(t) and an additional weight w_i of the control point. The curve obtained this way is called the Non-Uniform Rational B-Spline, abbreviated as NURBS, which is very popular in commercial geometric modelling systems. Using the mechanical analogy again, the mass put at the ith control point is w_i·B_i(t), thus the centre of mass is:

r(t) = (Σ_{i=0}^{m} w_i·B^BS_i(t)·r_i) / (Σ_{j=0}^{m} w_j·B^BS_j(t)) = Σ_{i=0}^{m} B^NURBS_i(t)·r_i.

The correspondence between B-spline and NURBS basis functions is as follows:

B^NURBS_i(t) = w_i·B^BS_i(t) / (Σ_{j=0}^{m} w_j·B^BS_j(t)).

Since B-spline basis functions are piece-wise polynomial functions, NURBS basis functions are piece-wise rational functions. NURBS can describe quadratic curves (e.g. circle, ellipse, etc.) without any approximation error.

Surface modelling

Parametric surfaces are defined by two-variate functions r(u, v). Instead of specifying this function directly, we can take a finite number of control points r_ij which are weighted with the basis functions to obtain the parametric function:

r(u, v) = Σ_{i=0}^{n} Σ_{j=0}^{m} r_ij · B_ij(u, v).  (14.7)

Similarly to curves, basis functions are expected to sum up to 1, i.e. Σ_{i=0}^{n} Σ_{j=0}^{m} B_ij(u, v) = 1 everywhere.
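A small sketch of equation (14.7) for one common choice of the two-dimensional basis, products of one-variate basis functions; here Bernstein polynomials are an assumed, illustrative choice, and any partition-of-unity basis would do:

```python
from math import comb

def bernstein(m, i, t):
    """Bernstein polynomial B_{i,m}(t) = C(m,i) t^i (1-t)^(m-i)."""
    return comb(m, i) * t**i * (1.0 - t)**(m - i)

def surface_point(ctrl, u, v):
    """Equation (14.7) with product basis B_ij(u, v) = B_i(u) * B_j(v):
    the centre of mass of a two-dimensional grid of control points."""
    n, m = len(ctrl) - 1, len(ctrl[0]) - 1
    p = [0.0, 0.0, 0.0]
    for i in range(n + 1):
        for j in range(m + 1):
            w = bernstein(n, i, u) * bernstein(m, j, v)
            for c in range(3):
                p[c] += w * ctrl[i][j][c]
    return tuple(p)

# 2x2 control grid spanning the unit square in the z = 0 plane:
grid = [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
        [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]]
point = surface_point(grid, 0.25, 0.5)
```

With a 2×2 grid the scheme degenerates to bilinear interpolation, so `surface_point(grid, u, v)` reproduces (u, v, 0) exactly.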
If this requirement is met, we can imagine that the control points have masses B_ij(u, v) depending on parameters u, v, and the centre of mass is the surface point corresponding to parameter pair (u, v).

Basis functions B_ij(u, v) are similar to those of curves. Let us fix parameter v. Changing parameter u, curve r_v(u) is obtained on the surface. This curve can be
Figure 14.7. Iso-parametric curves of a surface.

defined by the discussed curve definition methods:

r_v(u) = Σ_{i=0}^{n} B_i(u)·r_i,  (14.8)

where B_i(u) is the basis function of the selected curve type. Of course, fixing v differently we obtain another curve of the surface. Since a curve of a given type is unambiguously defined by the control points, control points r_i must depend on the fixed v value. As parameter v changes, control point r_i = r_i(v) also runs on a curve, which can be defined by control points r_{i,0}, r_{i,1}, ..., r_{i,m}:

r_i(v) = Σ_{j=0}^{m} B_j(v)·r_ij.

Substituting this into equation (14.8), the parametric equation of the surface is:

r(u, v) = r_v(u) = Σ_{i=0}^{n} B_i(u)·(Σ_{j=0}^{m} B_j(v)·r_ij) = Σ_{i=0}^{n} Σ_{j=0}^{m} B_i(u)·B_j(v)·r_ij.

Unlike curves, the control points of a surface form a two-dimensional grid. The two-dimensional basis functions are obtained as the product of one-variate basis functions parameterized by u and v, respectively.

Solid modelling with blobs

Free-form solids, similarly to parametric curves and surfaces, can also be specified by a finite number of control points. For each control point r_i, let us assign an influence function h_i(R), which expresses the influence of this control point at distance R_i = |r − r_i|. By definition, the solid contains those points where the total influence of the control points is not smaller than threshold T (Figure 14.8):

f(r) = Σ_{i=0}^{m} h_i(R_i) − T ≥ 0,  where R_i = |r − r_i|.
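As a quick illustration of the blob definition, a sketch with a Gaussian influence function; the constants a, b and the threshold T below are arbitrary illustrative choices, not values from the text:

```python
import math

def blob(r, centers, a=1.0, b=1.0, T=0.5):
    """Total influence minus threshold: f(r) = sum_i a*exp(-b*R_i^2) - T.
    The point r is inside the solid iff the returned value is >= 0."""
    f = -T
    for c in centers:
        R2 = sum((r[k] - c[k]) ** 2 for k in range(3))  # squared distance R_i^2
        f += a * math.exp(-b * R2)
    return f

centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
inside = blob((0.5, 0.0, 0.0), centers)    # between the two control points
outside = blob((5.0, 0.0, 0.0), centers)   # far from both control points
```

Between the two control points the influences add up, so the two spheres merge into one smooth solid, as described above.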
Figure 14.8. The influence decreases with the distance. Spheres of influence of similar signs increase, of different signs decrease each other.

With a single control point a sphere can be modelled. Spheres of multiple control points are combined together to result in an object having a smooth surface (Figure 14.8). The influence of a single point can be defined by an arbitrary decreasing function that converges to zero at infinity. For example, Blinn proposed the influence functions h_i(R) = a_i·e^{−b_i·R²} for his blob method.

Constructive solid geometry

Another type of solid modelling is constructive solid geometry (CSG for short), which builds complex solids from primitive solids applying set operations (e.g. union, intersection, difference, complement, etc.) (Figures 14.9 and 14.10). Primitives usually include the box, the sphere, the cone, the cylinder, the half-space, etc., whose functions are known.

Figure 14.9. The operations of constructive solid geometry for a cone of implicit function f and for a sphere of implicit function g: union (max(f, g)), intersection (min(f, g)), and difference (min(f, −g)).
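The min/max rules quoted in the caption of Figure 14.9 can be sketched directly on implicit functions (positive inside, negative outside); the helper names below are illustrative:

```python
def sphere(cx, cy, cz, R):
    """Implicit function of a sphere: positive inside, zero on the surface."""
    return lambda x, y, z: R * R - ((x - cx)**2 + (y - cy)**2 + (z - cz)**2)

def union(f, g):        return lambda x, y, z: max(f(x, y, z), g(x, y, z))
def intersection(f, g): return lambda x, y, z: min(f(x, y, z), g(x, y, z))
def difference(f, g):   return lambda x, y, z: min(f(x, y, z), -g(x, y, z))

def morph(f, g, t):
    """Blend of two implicit functions: t*f + (1-t)*g."""
    return lambda x, y, z: t * f(x, y, z) + (1 - t) * g(x, y, z)

a = sphere(0.0, 0.0, 0.0, 1.0)   # unit sphere at the origin
b = sphere(1.0, 0.0, 0.0, 1.0)   # unit sphere shifted along x
```

The same closures can then be nested to evaluate a whole CSG tree at any query point.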
Figure 14.10. Constructing a complex solid by set operations. The root and the leaves of the CSG tree represent the complex solid and the primitives, respectively. The other nodes define the set operations (∪: union, \: difference).

The results of the set operations can be obtained from the implicit functions of the solids taking part in the operation:

- intersection of f and g: min(f, g);
- union of f and g: max(f, g);
- complement of f: −f;
- difference of f and g: min(f, −g).

Implicit functions also allow us to morph between two solids. Suppose that two objects, for example a box of implicit function f_1 and a sphere of implicit function f_2, need to be morphed. To define a new object, which is similar to the first object with percentage t and to the second object with percentage (1 − t), the two implicit equations are weighted appropriately:

f_morph(x, y, z) = t·f_1(x, y, z) + (1 − t)·f_2(x, y, z).

Exercises
14.2-1. Find the parametric equation of a torus.
14.2-2. Prove that the fourth-order B-spline with knot vector [0, 0, 0, 0, 1, 1, 1, 1] is a Bézier curve.
14.2-3. Give the equations for the surface points and the normals of the waving flag and of waving water disturbed in a single point.
14.2-4. Prove that the tangents of a Bézier curve at the start and the end are the lines connecting the first two and the last two control points, respectively.
14.2-5. Give the algebraic forms of the basis functions of the second, the third, and the fourth-order B-splines.
Figure 14.11. Types of polygons: (a) simple; (b) complex, single connected; (c) multiply connected.

14.2-6. Develop an algorithm computing the path length of a Bézier curve and a B-spline. Based on the path length computation, move a point along the curve with uniform speed.

14.3. Geometry processing and tessellation algorithms

In Section 14.2 we met free-form surface and curve definition methods. During image synthesis, however, line segments and polygons play important roles. In this section we present methods that bridge the gap between these two types of representations. These methods convert geometric models to lines and polygons, or further process line and polygon models.

Line segments connected to each other in a way that the end point of a line segment is the start point of the next one are called polylines. Polygons connected at edges, on the other hand, are called meshes. Vectorization methods approximate free-form curves by polylines. A polyline is defined by its vertices. Tessellation algorithms, on the other hand, approximate free-form surfaces by meshes. For illumination computation, we often need the normal vector of the original surface, which is usually stored with the vertices. Consequently, a mesh contains a list of polygons, where each polygon is given by its vertices and the normal of the original surface at these vertices. Methods processing meshes use other topology information as well, for example, which polygons share an edge or a vertex.

Polygon and polyhedron

Definition 14.2 A polygon is a bounded part of the plane, i.e. it does not contain a line, and is bordered by line segments. A polygon is defined by the vertices of the bordering polylines.

Definition 14.3 A polygon is single connected if its border is a single closed polyline (Figure 14.11).

Definition 14.4 A polygon is simple if it is single connected and the bordering polyline does not intersect itself (Figure 14.11(a)).
Figure 14.12. Diagonal and ear of a polygon.

For a point of the plane, we can detect whether or not this point is inside the polygon by starting a half-line from this point and counting the number of intersections with the boundary. If the number of intersections is an odd number, then the point is inside, otherwise it is outside.

In the three-dimensional space we can form meshes, where different polygons are in different planes. In this case, two polygons are said to be neighbouring if they share an edge.

Definition 14.5 A polyhedron is a bounded part of the space, which is bordered by polygons.

Similarly to polygons, a point can be tested for polyhedron inclusion by casting a half-line from this point and counting the number of intersections with the face polygons. If the number of intersections is odd, then the point is inside the polyhedron, otherwise it is outside.

Vectorization of parametric curves

Parametric functions map interval [t_min, t_max] onto the points of the curve. During vectorization the parameter interval is discretized. The simplest discretization scheme generates N + 1 evenly spaced parameter values t_i = t_min + (t_max − t_min)·i/N (i = 0, 1, ..., N), and defines the approximating polyline by the points obtained by substituting these parameter values into the parametric equation, i.e. by points r(t_i).

Tessellation of simple polygons

Let us first consider the conversion of simple polygons to triangles. This is easy if the polygon is convex, since we can select an arbitrary vertex and connect it with all other vertices, which decomposes the polygon into triangles in linear time. Unfortunately, this approach does not work for concave polygons, since in this case the line segment connecting two vertices may go outside the polygon, thus it cannot be the edge of a decomposing triangle.
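The half-line parity test introduced above can be sketched in a few lines for polygons in the plane; the function and variable names are ours, not from the text, and the half-line is cast horizontally to the right:

```python
def point_in_polygon(p, poly):
    """Parity test: count how many polygon edges the horizontal half-line
    starting at p and going right crosses; odd count means p is inside."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            # x coordinate where the edge crosses that horizontal line
            xc = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if xc > x:            # crossing lies on the half-line, to the right
                inside = not inside
    return inside

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The strict/non-strict comparison pair `(y1 > y) != (y2 > y)` is one common way to handle the degenerate case of the half-line passing exactly through a vertex, so that each such vertex is counted once rather than twice.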
Let us start the discussion of triangle conversion algorithms with two definitions:

Definition 14.6 The diagonal of a polygon is a line segment connecting two vertices which is completely contained by the polygon (the line segment between r_0 and r_3 in Figure 14.12).

The diagonal property can be checked for a line segment connecting two vertices by
Figure 14.13. The proof of the existence of a diagonal for simple polygons.

trying to intersect the line segment with all edges and showing that intersection is possible only at the endpoints, and additionally showing that one internal point of the candidate is inside the polygon. For example, this test point can be the midpoint of the line segment.

Definition 14.7 A vertex of the polygon is an ear if the line segment between the previous and the next vertices is a diagonal (vertex r_4 of Figure 14.12).

Clearly, only those vertices may be ears where the inner angle is not greater than 180 degrees. Such vertices are called convex vertices. For simple polygons the following theorems hold:

Theorem 14.8 A simple polygon always has a diagonal.

Proof Let the vertex standing at the left end (having the minimal x coordinate) be r_i, and its two neighbouring vertices be r_{i−1} and r_{i+1}, respectively (Figure 14.13). Since r_i is standing at the left end, it is surely a convex vertex. If r_i is an ear, then line segment (r_{i−1}, r_{i+1}) is a diagonal (left of Figure 14.13), thus the theorem is proven for this case. Since r_i is a convex vertex, it is not an ear only if triangle r_{i−1}, r_i, r_{i+1} contains at least one polygon vertex (right of Figure 14.13). Let us select from the contained vertices that vertex p which is the farthest from the line defined by points r_{i−1} and r_{i+1}. Since there are no contained points which are farther from line (r_{i−1}, r_{i+1}) than p, no edge can be between points p and r_i, thus (p, r_i) must be a diagonal.

Theorem 14.9 A simple polygon can always be decomposed to triangles with its diagonals. If the number of vertices is n, then the number of triangles is n − 2.

Proof This theorem is proven by induction. The theorem is obviously true when n = 3, i.e. when the polygon is a triangle.
Let us assume that the statement is also true for polygons having m (m = 3, ..., n − 1) vertices, and consider a polygon with n vertices. According to Theorem 14.8, this polygon of n vertices has a diagonal, thus we can subdivide it into a polygon of n_1 vertices and a polygon of n_2 vertices, where n_1, n_2 < n, and n_1 + n_2 = n + 2 since the vertices at the ends of the diagonal participate in both polygons. According to the assumption of the induction, these two polygons can be separately decomposed to triangles. Joining the two sets of triangles, we can obtain the triangle decomposition of the
original polygon. The number of triangles is (n_1 − 2) + (n_2 − 2) = n − 2.

The discussed proof is constructive, thus it inspires a subdivision algorithm: let us find a diagonal, subdivide the polygon along this diagonal, and continue the same operation for the two new polygons. Unfortunately, the running time of such an algorithm is in Θ(n³), since the number of diagonal candidates is Θ(n²) and the time needed to check whether or not a line segment is a diagonal is in Θ(n).

We also present a better algorithm, which decomposes a convex or concave polygon defined by vertices r_0, r_1, ..., r_n. This algorithm is called ear cutting. The algorithm looks for ear triangles and cuts them until the polygon gets simplified to a single triangle. The algorithm starts at vertex r_2. When a vertex is processed, it is first checked whether or not the previous vertex is an ear. If it is not an ear, then the next vertex is chosen. If the previous vertex is an ear, then the current vertex together with the two previous ones forms a triangle that can be cut, and the previous vertex is deleted. If after the deletion the new previous vertex has index 0, then the next vertex is selected as the current vertex. The presented algorithm keeps cutting triangles until no more ears are left. The termination of the algorithm is guaranteed by the following two ears theorem:

Theorem 14.10 A simple polygon having at least four vertices always has at least two non-neighbouring ears that can be cut independently.

Proof The proof presented here has been given by Joseph O'Rourke. According to Theorem 14.9, every simple polygon can be subdivided to triangles such that the edges of these triangles are either the edges or the diagonals of the polygon. Let us make a correspondence between the triangles and the nodes of a graph, where two nodes are connected if and only if the two triangles corresponding to these nodes share an edge. The resulting graph is connected and cannot contain circles.
Graphs of these properties are trees. The name of this tree graph is the dual tree. Since the polygon has at least four vertices, the number of nodes in this tree is at least two. Any tree containing at least two nodes has at least two leaves². The leaves of this tree, on the other hand, correspond to triangles having an ear vertex.

According to the two ears theorem, the presented algorithm finds an ear in O(n) steps. Cutting an ear, the number of vertices is reduced by one, thus the algorithm terminates in O(n²) steps.

Tessellation of parametric surfaces

Parametric forms of surfaces map parameter rectangle [u_min, u_max] × [v_min, v_max] onto the points of the surface. In order to tessellate the surface, first the parameter rectangle is subdivided into triangles. Then, applying the parametric equations for the vertices of the parameter triangles, the approximating triangle mesh can be obtained. The simplest subdivision of the parametric rectangle decomposes the domain of parameter u to N parts, and

² A leaf is a node connected by exactly one edge.
Figure 14.14. Tessellation of parametric surfaces.

Figure 14.15. Estimation of the tessellation error.

the domain of parameter v to M intervals, resulting in the following parameter pairs:

[u_i, v_j] = [u_min + (u_max − u_min)·i/N,  v_min + (v_max − v_min)·j/M].

Taking these parameter pairs and substituting them into the parametric equations, point triplets r(u_i, v_j), r(u_{i+1}, v_j), r(u_i, v_{j+1}), and point triplets r(u_{i+1}, v_j), r(u_{i+1}, v_{j+1}), r(u_i, v_{j+1}) are used to define triangles.

The tessellation process can also be made adaptive, which uses small triangles only where the high curvature of the surface justifies them. Let us start with the parameter rectangle and subdivide it into two triangles. In order to check the accuracy of the resulting triangle mesh, surface points corresponding to the edge midpoints of the parameter triangles are compared to the edge midpoints of the approximating triangles. Formally, the following distance is computed (Figure 14.15):

| r((u_1 + u_2)/2, (v_1 + v_2)/2) − (r(u_1, v_1) + r(u_2, v_2))/2 |,

where (u_1, v_1) and (u_2, v_2) are the parameters of the two endpoints of the edge. A large distance value indicates that the triangle mesh poorly approximates the parametric surface, thus the triangles must be subdivided further. This subdivision can
be executed by cutting the triangle into two triangles by the line connecting the midpoint of the edge of the largest error and the opposing vertex. Alternatively, a triangle can be subdivided into four triangles with its halving lines. The adaptive tessellation is not necessarily robust, since it can happen that the distance at the midpoint is small, but at other points it is still quite large.

Figure 14.16. T vertices and their elimination with forced subdivision.

When the adaptive tessellation is executed, it may happen that one triangle is subdivided while its neighbour is not, which results in holes. Such problematic midpoints are called T vertices (Figure 14.16). If the subdivision criterion is based only on edge properties, then T vertices cannot show up. However, if other properties are also taken into account, then T vertices may appear. In such cases, T vertices can be eliminated by recursively forcing the subdivision also for those neighbouring triangles that share subdivided edges.

Subdivision curves and meshes

This section presents algorithms that smooth polyline and mesh models.

Figure 14.17. Construction of a subdivision curve: at each step midpoints are obtained, then the original vertices are moved to the weighted average of the neighbouring midpoints and of the original vertex.

Let us consider a polyline of vertices r_0, ..., r_m. A smoother polyline is generated by the following vertex doubling approach (Figure 14.17). Every line segment of the polyline is halved, and midpoints h_0, ..., h_{m−1} are added to the polyline as new vertices. Then the old vertices are moved taking into account their old position and the positions of the two enclosing midpoints, applying the following weighting:

r'_i = (1/2)·r_i + (1/4)·h_{i−1} + (1/4)·h_i = (3/4)·r_i + (1/8)·r_{i−1} + (1/8)·r_{i+1}.

The new polyline looks much smoother.
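One vertex-doubling step can be sketched as follows for a closed polyline in the plane; `smooth` and the other names are illustrative, not from the text:

```python
def smooth(points):
    """One vertex-doubling step on a closed polyline: insert edge midpoints,
    then move each old vertex to 1/2 itself + 1/4 each neighbouring midpoint."""
    n = len(points)
    mids = [tuple((points[i][k] + points[(i + 1) % n][k]) / 2 for k in range(2))
            for i in range(n)]
    out = []
    for i in range(n):
        prev_mid, next_mid = mids[i - 1], mids[i]
        moved = tuple(points[i][k] / 2 + prev_mid[k] / 4 + next_mid[k] / 4
                      for k in range(2))
        out.append(moved)      # the moved old vertex ...
        out.append(next_mid)   # ... followed by the midpoint after it
    return out

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rounded = smooth(square)   # 8 vertices, corners pulled inward
```

Repeating `smooth` doubles the vertex count at each step while the corners get progressively rounder, in line with the convergence to the B-spline curve mentioned below.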
If we are not yet satisfied with the smoothness, the same procedure can be repeated recursively. As can be shown, the
Figure 14.18. One smoothing step of the Catmull-Clark subdivision. First the face points are obtained, then the edge midpoints are moved, and finally the original vertices are refined according to the weighted sum of their neighbouring edge and face points.

result of the recursive process converges to the B-spline curve. The polyline subdivision approach can also be extended to smoothing three-dimensional meshes. This method is called the Catmull-Clark subdivision algorithm. Let us consider a three-dimensional quadrilateral mesh (Figure 14.18). In the first step the midpoints of the edges are obtained, which are called edge points. Then face points are generated as the average of the vertices of each face polygon. Connecting the edge points with the face points, we still have the original surface, but now defined by four times more quadrilaterals. The smoothing step first modifies the edge points, setting them to the average of the vertices at the ends of the edge and of the face points of those quads that share this edge. Then the original vertices are moved to the weighted average of the face points of those faces that share this vertex, and of the edge points of those edges that are connected to this vertex. The weight of the original vertex is 1/2, the weights of the edge and face points are 1/16. Again, this operation may be repeated until the surface looks smooth enough (Figure 14.19). If we do not want to smooth the mesh at an edge or around a vertex, then the averaging operation ignores the vertices on the other side of the edge to be preserved. The Catmull-Clark subdivision surface usually does not interpolate the original vertices. This drawback is eliminated by the butterfly subdivision, which works on triangle meshes.
First the butterfly algorithm puts new edge points close to the midpoints of the original edges, then the original triangle is replaced by four triangles defined by the original vertices and the new edge points (Figure 14.20). The position of a new edge point depends on the vertices of the two triangles incident to this edge, and on those four triangles which share edges with these two. The arrangement of the triangles affecting the edge point resembles a butterfly, hence the name of this algorithm. The edge point coordinates are obtained as a weighted sum of the edge endpoints multiplied by 1/2, of the third vertices of the triangles sharing this edge using weight 1/8 + 2w, and finally of the other vertices of the additional triangles with weight −1/16 − w. Parameter w can control the curvature of the resulting mesh.
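The edge-point computation can be sketched as follows. This is a minimal sketch with our own function name and argument order; the weights 1/2, 1/8 + 2w and −1/16 − w follow the description and Figure 14.20, and the two-dimensional example points are purely illustrative.

```python
def butterfly_edge_point(e1, e2, t1, t2, w1, w2, w3, w4, w=0.0):
    """New edge point of butterfly subdivision: edge endpoints e1, e2 get
    weight 1/2 each, the third vertices t1, t2 of the two triangles sharing
    the edge get 1/8 + 2w, and the four 'wing' vertices w1..w4 get -1/16 - w."""
    a, b, c = 0.5, 0.125 + 2.0 * w, -0.0625 - w
    pairs = [(a, e1), (a, e2), (b, t1), (b, t2),
             (c, w1), (c, w2), (c, w3), (c, w4)]
    return tuple(sum(coef * p[k] for coef, p in pairs) for k in range(len(e1)))
```

Note that the eight weights sum to 1 for every w, so the rule is an affine combination; with w = −1/16 the non-endpoint weights vanish and the edge point degenerates to the plain midpoint, which is why that setting keeps the faceted look.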
Figure 14.19. Original mesh and its subdivision applying the smoothing step once, twice and three times, respectively.

Figure 14.20. Generation of the new edge point with butterfly subdivision.

Setting w = −1/16, the mesh keeps its original faceted look, while w = 0 results in strong rounding.

Tessellation of implicit surfaces

A surface defined by implicit equation f(x, y, z) = 0 can be converted to a triangle mesh by densely finding points on the surface, i.e. generating points satisfying f(x, y, z) ≈ 0, then assuming close points to be vertices of triangles. First function f is evaluated at the grid points of the Cartesian coordinate system and the results are stored in a three-dimensional array, called a voxel array. Let us call two grid points neighbours if two of their coordinates are identical and the difference of their third coordinates is 1. The function is evaluated at the grid points and is assumed to be linear between them. The normal vectors needed for shading are obtained as the gradient of function f (equation 14.4), which are also interpolated between the grid points.

When we work with the voxel array, original function f is replaced by its tri-linear approximation (tri-linear means that fixing any two coordinates, the function is linear with respect to the third coordinate).

Figure 14.21. Possible intersections of the per-voxel tri-linear implicit surface and the voxel edges. From the possible 256 cases, these 15 topologically different cases can be identified, from which the others can be obtained by rotations. Grid points where the implicit function has the same sign are depicted by circles.

Due to the linear approximation, an edge connecting two neighbouring grid points can intersect the surface at most once, since linear equations may have at most one root. The density of the grid points should reflect this observation: we have to define them densely enough not to miss roots, that is, not to change the topology of the surface.

The method approximating the surface by a triangle mesh is called the marching cubes algorithm. This algorithm first decides whether a grid point is inside or outside of the solid by checking the sign of function f. If two neighbouring grid points are of different type, the surface must go between them. The intersection of the surface and the edge between the neighbouring points, as well as the normal vector at the intersection, are determined by linear interpolation. If one grid point is at r_1, the other is at r_2, and function f has different signs at these points, then the intersection of the tri-linear surface and line segment (r_1, r_2) is:

r_i = r_1 · f(r_2)/(f(r_2) − f(r_1)) − r_2 · f(r_1)/(f(r_2) − f(r_1)) .

The normal vector here is:

n_i = grad f(r_1) · f(r_2)/(f(r_2) − f(r_1)) − grad f(r_2) · f(r_1)/(f(r_2) − f(r_1)) .

Having found the intersection points, triangles are defined using these points
as vertices. When defining these triangles, we have to take into account that a tri-linear surface may intersect a voxel edge at most once. Such an intersection occurs if function f has different signs at the two grid points. The number of possible variations of positive/negative signs at the 8 vertices of a cube is 256, from which 15 topologically different cases can be identified (Figure 14.21). The algorithm inspects the grid points one by one and assigns the sign of the function to them, encoding negative sign by 0 and non-negative sign by 1. The resulting 8-bit code is a number between 0 and 255 which identifies the current case of intersection. If the code is 0, all voxel vertices are outside the solid, thus no voxel-surface intersection is possible. Similarly, if the code is 255, the voxel is completely inside the solid, making intersections impossible. To handle the other codes, a table can be built which describes where the intersections show up and how they form triangles.

Exercises
14.3-1. Prove the two ears theorem by induction.
14.3-2. Develop an adaptive curve tessellation algorithm.
14.3-3. Prove that the Catmull-Clark subdivision curve and surface converge to a B-spline curve and surface, respectively.
14.3-4. Build a table to control the marching cubes algorithm, which describes where the intersections show up and how they form triangles.
14.3-5. Propose a marching cubes algorithm that does not require the gradients of the function, but estimates these gradients from its values.

14.4. Containment algorithms

When geometric models are processed, we often have to determine whether or not one object contains points belonging to the other object. If only a yes/no answer is needed, we have a containment test problem. However, if the contained part also needs to be obtained, the applicable algorithm is called clipping. Containment test is also known as discrete time collision detection since if one object contains points from the other, then the two objects must have collided before.
Of course, checking collisions only at discrete time instances may miss certain collisions. To handle the collision problem robustly, continuous time collision detection is needed, which also computes the time of the collision. Continuous time collision detection may use ray tracing (Section 14.6). In this section we only deal with discrete time collision detection and the clipping of simple objects.

Point containment test

A solid defined by function f contains those (x, y, z) points which satisfy inequality f(x, y, z) ≥ 0. It means that the point containment test requires the evaluation of function f and the inspection of the sign of the result.
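This sign test can be sketched in a few lines. A minimal illustration: the function names `inside` and `unit_sphere` are our own, and the unit sphere merely stands in for an arbitrary implicit solid.

```python
def inside(f, point):
    """A point is contained in the solid f(x, y, z) >= 0 iff f is non-negative there."""
    return f(*point) >= 0.0

# Example solid: the unit ball, f(x, y, z) = 1 - x^2 - y^2 - z^2.
def unit_sphere(x, y, z):
    return 1.0 - (x * x + y * y + z * z)
```

Boundary points, where f is exactly zero, count as inside with this convention.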
Figure 14.22. Polyhedron-point containment test. A convex polyhedron contains a point if the point is on that side of each face plane where the polyhedron is. To test a concave polyhedron, a half line is cast from the point and the number of intersections is counted. If the result is an odd number, then the point is inside, otherwise it is outside.

Half space

Based on equation (14.1), points belonging to a half space are identified by inequality

(r − r_0) · n ≥ 0,   n_x · x + n_y · y + n_z · z + d ≥ 0 ,   (14.9)

where the normal vector is supposed to point inward.

Convex polyhedron

Any convex polyhedron can be constructed as the intersection of half spaces (left of Figure 14.22). The plane of each face subdivides the space into two parts: an inner part where the polyhedron can be found, and an outer part. Let us test the point against the planes of the faces. If the point is in the inner part with respect to all planes, then the point is inside the polyhedron. However, if the point is in the outer part with respect to at least one plane, then the point is outside of the polyhedron.

Concave polyhedron

As shown in Figure 14.22, let us cast a half line from the tested point and count the number of intersections with the faces of the polyhedron (the calculation of these intersections is discussed in Section 14.6). If the result is an odd number, then the point is inside, otherwise it is outside. Because of numerical inaccuracies we might have difficulties counting the number of intersections when the half line is close to the edges. In such cases, the simplest solution is to find another half line and carry out the test with that.

Polygon

The methods proposed to test a point against a polyhedron can also be used for polygons, limiting the space to the two-dimensional plane.
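The convex polyhedron test above, checking inequality (14.9) for every face plane, can be sketched as follows. A minimal sketch: the plane representation as (normal, d) pairs with inward-pointing normals and the name `inside_convex` are our own choices.

```python
def inside_convex(point, planes):
    """planes: list of ((nx, ny, nz), d) with inward-pointing normals.
    The point is inside iff nx*x + ny*y + nz*z + d >= 0 for every face plane."""
    x, y, z = point
    return all(nx * x + ny * y + nz * z + d >= 0.0
               for (nx, ny, nz), d in planes)

# The unit cube [0,1]^3 as the intersection of six half spaces.
cube = [((1, 0, 0), 0), ((-1, 0, 0), 1), ((0, 1, 0), 0),
        ((0, -1, 0), 1), ((0, 0, 1), 0), ((0, 0, -1), 1)]
```

The test can stop at the first violated plane, which `all` does automatically.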
For example, a point is in a general polygon if the half line originating at this point and lying in the plane of the polygon intersects the edges of the polygon an odd number of times. In addition to these methods, containment in convex polygons can be tested by adding up the angles subtended by the edges from the point. If the sum is 360 degrees,
then the point is inside, otherwise it is outside. For convex polygons, we can also test whether the point is on the same side of the edges as the polygon itself. This algorithm is examined in detail for a particularly important special case, when the polygon is a triangle.

Triangle

Let us consider a triangle of vertices a, b and c, and point p lying in the plane of the triangle. The point is inside the triangle if and only if it is on the same side of each boundary line as the third vertex. Note that cross product (b − a) × (p − a) has a different direction for point p lying on the different sides of oriented line ab, thus the direction of this vector can be used to classify points (should point p be on line ab, the result of the cross product is zero). During classification the direction of (b − a) × (p − a) is compared to the direction of vector n = (b − a) × (c − a), where tested point p is replaced by third vertex c. Note that vector n happens to be the normal vector of the triangle plane (Figure 14.23). We can determine whether two vectors have the same direction (their angle is zero) or opposite directions (their angle is 180 degrees) by computing their scalar product and looking at the sign of the result. The scalar product of vectors of similar directions is positive. Thus if scalar product ((b − a) × (p − a)) · n is positive, then point p is on the same side of oriented line ab as c. On the other hand, if this scalar product is negative, then p and c are on opposite sides. Finally, if the result is zero, then point p is on line ab. Point p is inside the triangle if and only if all of the following three conditions are met:

((b − a) × (p − a)) · n ≥ 0 ,
((c − b) × (p − b)) · n ≥ 0 ,
((a − c) × (p − c)) · n ≥ 0 .   (14.10)

This test is robust since it gives a correct result even if, due to numerical precision problems, point p is not exactly in the plane of the triangle, as long as point p is in the prism obtained by perpendicularly extruding the triangle from its plane.
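The three conditions of (14.10) can be sketched directly. A minimal illustration with our own helper names (`sub`, `cross`, `dot`, `point_in_triangle`); it also demonstrates the robustness remark, accepting points slightly off the plane but inside the extruded prism.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def point_in_triangle(p, a, b, c):
    """Point-in-triangle test of inequalities (14.10)."""
    n = cross(sub(b, a), sub(c, a))  # normal vector of the triangle plane
    return (dot(cross(sub(b, a), sub(p, a)), n) >= 0 and
            dot(cross(sub(c, b), sub(p, b)), n) >= 0 and
            dot(cross(sub(a, c), sub(p, c)), n) >= 0)
```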
The evaluation of the test can be sped up if we work in a two-dimensional projection plane instead of the three-dimensional space. Let us project point p as well as the triangle onto one of the coordinate planes. In order to increase numerical precision, that coordinate plane should be selected on which the area of the projected triangle is maximal. Let us denote the Cartesian coordinates of the normal vector by (n_x, n_y, n_z). If n_z has the maximum absolute value, then the projection of maximum area is on coordinate plane xy. If n_x or n_y had the maximum absolute value, then planes yz or xz would be the right choice. Here only the case of maximum n_z is discussed. First the order of vertices is changed in a way that when travelling from vertex a to vertex b, vertex c is on the left side. Let us examine the equation of line ab:

(b_y − a_y)/(b_x − a_x) · (x − b_x) + b_y = y .

According to Figure 14.24, point c is on the left of the line if c_y is above the line
Figure 14.23. Point in triangle containment test. The figure shows the case when point p is on the left of oriented lines ab and bc, and on the right of line ca, that is, when it is not inside the triangle.

Figure 14.24. Point in triangle containment test on coordinate plane xy. Third vertex c can be either on the left (case 1: b_x − a_x > 0) or on the right side (case 2: b_x − a_x < 0) of oriented line ab, which can always be traced back to the case of being on the left side by exchanging the vertices.

at x = c_x:

(b_y − a_y)/(b_x − a_x) · (c_x − b_x) + b_y < c_y .

Multiplying both sides by (b_x − a_x), we get:

(b_y − a_y) · (c_x − b_x) < (c_y − b_y) · (b_x − a_x) .

In the second case the denominator of the slope of the line is negative. Point c is on the left of the line if c_y is below the line at x = c_x:

(b_y − a_y)/(b_x − a_x) · (c_x − b_x) + b_y > c_y .

When the inequality is multiplied by the negative denominator (b_x − a_x), the relation is inverted:

(b_y − a_y) · (c_x − b_x) < (c_y − b_y) · (b_x − a_x) .
Figure 14.25. Polyhedron-polyhedron collision detection. Only a part of the collision cases can be recognized by testing the containment of the vertices of one object with respect to the other object. Collision can also occur when only edges meet, but vertices do not penetrate the other object.

Note that in both cases we obtained the same condition. If this condition is not met, then point c is not on the left of line ab, but is on the right. Exchanging vertices a and b in this case, we can guarantee that c will be on the left of the new line ab. It is also important to note that consequently point a will be on the left of line bc and point b will be on the left of line ca. In the second step the algorithm tests whether point p is on the left with respect to all three boundary lines, since this is the necessary and sufficient condition of being inside the triangle:

(b_y − a_y) · (p_x − b_x) ≤ (p_y − b_y) · (b_x − a_x) ,
(c_y − b_y) · (p_x − c_x) ≤ (p_y − c_y) · (c_x − b_x) ,
(a_y − c_y) · (p_x − a_x) ≤ (p_y − a_y) · (a_x − c_x) .   (14.11)

Polyhedron-polyhedron collision detection

Two polyhedra collide when a vertex of one of them meets a face of the other, and if they are not bounced off, the vertex goes into the internal part of the other object (Figure 14.25). This case can be recognized with the discussed containment test. All vertices of one polyhedron are tested for containment against the other polyhedron. Then the roles of the two polyhedra are exchanged. Apart from collisions between vertices and faces, two edges may also meet without vertex penetration (Figure 14.25). In order to recognize this edge penetration case, all edges of one polyhedron are tested against all faces of the other polyhedron. The test for an edge and a face is started by checking whether or not the two endpoints of the edge are on opposite sides of the plane, using inequality (14.9).
If they are, then the intersection of the edge and the plane is calculated, and finally it is decided whether the face contains the intersection point. Polyhedron collision detection tests each edge of one polyhedron against each face of the other polyhedron, which results in an algorithm of quadratic time complexity with respect to the number of vertices of the polyhedra. Fortunately, the algorithm can be sped up by applying bounding volumes (Subsection ). Let us assign a simple bounding object to each polyhedron. Popular choices for bounding volumes are the sphere and the box. When testing the collision of two objects, first their bounding volumes are examined. If the two bounding volumes do not collide, then
neither can the contained polyhedra collide. If the bounding volumes penetrate each other, then one polyhedron is tested against the other bounding volume. If this test is also positive, then finally the two polyhedra are tested. However, this last test is rarely required, and most of the collision cases can be resolved by the bounding volumes.

Clipping algorithms

Clipping takes an object defining the clipping region and removes those points of another object which are outside the clipping region. Clipping may alter the type of the object, which may not be specifiable by a similar equation after clipping. To avoid this, we allow only those kinds of clipping regions and objects for which the object type is not changed by clipping. Let us assume that the clipping region is a half space or a polyhedron, while the object to be clipped is a point, a line segment or a polygon. If the object to be clipped is a point, then containment can be tested with the algorithms of the previous subsection. Based on the result of the containment test, the point is either removed or preserved.

Clipping a line segment onto a half space

Let us consider a line segment of endpoints r_1 and r_2, and of equation

r(t) = r_1 · (1 − t) + r_2 · t,   (t ∈ [0, 1]) ,

and a half space defined by the following inequality derived from equation (14.1):

(r − r_0) · n ≥ 0,   n_x · x + n_y · y + n_z · z + d ≥ 0 .

Three cases need to be distinguished:
1. If both endpoints of the line segment are in the half space, then all points of the line segment are inside, thus the whole segment is preserved.
2. If both endpoints are out of the half space, then all points of the line segment are out, thus the line segment should be completely removed.
3. If one of the endpoints is out while the other is in, then the endpoint being out should be replaced by the intersection point of the line segment and the boundary plane of the half space.
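The three cases can be sketched as follows. A minimal sketch with our own function name `clip_segment`; the intersection parameter t_i = ((r_0 − r_1) · n) / ((r_2 − r_1) · n) comes from substituting the segment equation into the plane equation.

```python
def clip_segment(r1, r2, n, r0):
    """Clip segment (r1, r2) to the half space (r - r0) . n >= 0.
    Returns the clipped endpoint pair, or None if the segment is fully outside."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    def lerp(a, b, t):
        return tuple(x + (y - x) * t for x, y in zip(a, b))
    d1 = dot(tuple(a - b for a, b in zip(r1, r0)), n)  # signed value at r1
    d2 = dot(tuple(a - b for a, b in zip(r2, r0)), n)  # signed value at r2
    if d1 >= 0 and d2 >= 0:
        return r1, r2                # case 1: both endpoints inside
    if d1 < 0 and d2 < 0:
        return None                  # case 2: both endpoints outside
    t = d1 / (d1 - d2)               # intersection parameter t_i
    ri = lerp(r1, r2, t)
    return (ri, r2) if d1 < 0 else (r1, ri)   # case 3: replace the outer endpoint
```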
The intersection point can be calculated by substituting the equation of the line segment into the equation of the boundary plane and solving the resulting equation for the unknown parameter:

(r_1 · (1 − t_i) + r_2 · t_i − r_0) · n = 0   ⟹   t_i = ((r_0 − r_1) · n) / ((r_2 − r_1) · n) .

Substituting parameter t_i into the equation of the line segment, the coordinates of the intersection point can also be obtained.

Clipping a polygon onto a half space

This clipping algorithm first tests whether a vertex is inside or not. If the vertex is in, then it is also a vertex of the resulting polygon. However, if it is out, it can be ignored. On the other hand, the resulting polygon may have vertices other than the vertices of the original polygon. These new vertices are the intersections of the edges and the boundary plane of the half space. Such an intersection occurs when one
Figure 14.26. Clipping of simple convex polygon p[0], ..., p[5] results in polygon q[0], ..., q[4]. The vertices of the resulting polygon are the inner vertices of the original polygon and the intersections of the edges and the boundary plane.

endpoint is in, but the other is out. While we are testing the vertices one by one, we should also check whether or not the next vertex is on the same side as the current vertex (Figure 14.26). Suppose that the vertices of the polygon to be clipped are given in array p = p[0], ..., p[n − 1], and the vertices of the clipped polygon are expected in array q = q[0], ..., q[m − 1]. The number of the vertices of the resulting polygon is stored in variable m. Note that the vertex following the ith vertex usually has index (i + 1), except for the last, (n − 1)th vertex, which is followed by vertex 0. Handling the last vertex as a special case is often inconvenient. This can be eliminated by extending input array p by a new element p[n] = p[0], which holds the element of index 0 once again. Using these assumptions, the Sutherland-Hodgeman polygon clipping algorithm is:

Sutherland-Hodgeman-Polygon-Clipping(p)
 1  m ← 0
 2  for i ← 0 to n − 1
 3    do if p[i] is inside
 4         then q[m] ← p[i]   ▷ The ith vertex is a vertex of the resulting polygon.
 5              m ← m + 1
 6              if p[i + 1] is outside
 7                then q[m] ← Edge-Plane-Intersection(p[i], p[i + 1])
 8                     m ← m + 1
 9         else if p[i + 1] is inside
10                then q[m] ← Edge-Plane-Intersection(p[i], p[i + 1])
11                     m ← m + 1
12  return q
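The same loop can be sketched in runnable form. This is our own parametrisation, not the book's code: the `inside` and `intersect` callbacks stand for the half-space test and Edge-Plane-Intersection, and a wrap-around index replaces the extended array element p[n] = p[0]. The example clips the unit square to the half plane x ≤ 0.5.

```python
def clip_polygon(poly, inside, intersect):
    """Sutherland-Hodgman step against one half space. `inside` tests a
    vertex; `intersect` returns the intersection of an edge and the boundary."""
    q = []
    n = len(poly)
    for i in range(n):
        cur, nxt = poly[i], poly[(i + 1) % n]  # wrap-around replaces p[n] = p[0]
        if inside(cur):
            q.append(cur)                      # current vertex is kept
            if not inside(nxt):
                q.append(intersect(cur, nxt))  # edge leaves the half space
        elif inside(nxt):
            q.append(intersect(cur, nxt))      # edge enters the half space
    return q

def inside_h(p):                  # half plane x <= 0.5
    return p[0] <= 0.5

def isect(a, b):                  # intersection with the boundary line x = 0.5
    t = (0.5 - a[0]) / (b[0] - a[0])
    return (0.5, a[1] + t * (b[1] - a[1]))

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
clipped = clip_polygon(square, inside_h, isect)
```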
Figure 14.27. When concave polygons are clipped, the parts that should fall apart are connected by an even number of edges.

Let us apply this algorithm to a concave polygon which is expected to fall into several pieces during clipping (Figure 14.27). The algorithm, storing the polygon in a single array, is not able to separate the pieces, and introduces an even number of edges at parts where no edge could show up. These extra edges, however, pose no problems if the interior of the polygon is defined as follows: a point is inside the polygon if and only if a half line started from here intersects the boundary polyline an odd number of times. The presented algorithm is also suitable for clipping multiply connected polygons if the algorithm is executed separately for each closed polyline of the boundary.

Clipping line segments and polygons on a convex polyhedron

As stated, a convex polyhedron can be obtained as the intersection of the half spaces defined by the planes of the polyhedron's faces (left of Figure 14.22). It means that clipping on a convex polyhedron can be traced back to a series of clipping steps on half spaces. The result of one clipping step on a half space is the input of the clipping on the next half space. The final result is the output of the clipping on the last half space.

Clipping a line segment on an AABB

Axis aligned bounding boxes, abbreviated as AABBs, play an important role in image synthesis.

Definition. A box aligned parallel to the coordinate axes is called an AABB. An AABB is specified by the minimum and maximum Cartesian coordinates: [x_min, y_min, z_min, x_max, y_max, z_max].

Although when an object is clipped on an AABB the general algorithms that clip on a convex polyhedron could also be used, the importance of AABBs is acknowledged by developing algorithms specially tuned for this case.
When a line segment is clipped to a polyhedron, the algorithm would test the line segment against the plane of each face, and the calculated intersection points may turn out to be unnecessary later. We should thus find an appropriate order of planes which makes the number of unnecessary intersection calculations minimal. A simple method that implements this idea is the Cohen-Sutherland line clipping
Figure 14.28. The 4-bit codes of the points in a plane and the 6-bit codes of the points in space.

algorithm. Let us assign code bit 1 to a point that is outside with respect to a clipping plane, and code bit 0 if the point is inside with respect to this plane. Since an AABB has 6 sides, we get 6 bits forming a 6-bit code word (Figure 14.28). The interpretation of code bits C[0], ..., C[5] is the following:

C[0] = 1 if x < x_min, 0 otherwise.   C[1] = 1 if x > x_max, 0 otherwise.   C[2] = 1 if y < y_min, 0 otherwise.
C[3] = 1 if y > y_max, 0 otherwise.   C[4] = 1 if z < z_min, 0 otherwise.   C[5] = 1 if z > z_max, 0 otherwise.

Points of code word 000000 are obviously inside, points of other code words are outside (Figure 14.28). Let the code words of the two endpoints of the line segment be C_1 and C_2, respectively. If both of them are zero, then both endpoints are inside, thus the line segment is completely inside (trivial accept). If the two code words contain bit 1 at the same location, then neither endpoint is inside with respect to the plane associated with this code bit. This means that the complete line segment is outside with respect to this plane, and can be rejected (trivial reject). This examination can be executed by applying the bitwise AND operation on code words C_1 and C_2 (with the notation of the C programming language, C_1 & C_2), and checking whether or not the result is zero. If it is not zero, there is a bit where both code words have value 1. Finally, if neither of the two trivial cases holds, then there must be a bit which is 0 in one code word and 1 in the other. This means that one endpoint is inside and the other is outside with respect to the plane corresponding to this bit. The line segment should be clipped on this plane. Then the same procedure should be repeated, starting with the evaluation of the code bits. The procedure is terminated when the conditions of either the trivial accept or the trivial reject are met.
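The code-word computation and the two trivial tests can be sketched as follows. A minimal sketch with our own names and bit order; only the outcode and trivial accept/reject logic is shown, not the full clipping loop.

```python
def outcode(p, box):
    """6-bit Cohen-Sutherland code word of point p w.r.t. the AABB
    box = (xmin, ymin, zmin, xmax, ymax, zmax); bit set = outside that plane."""
    xmin, ymin, zmin, xmax, ymax, zmax = box
    x, y, z = p
    code = 0
    if x < xmin: code |= 1     # C[0]
    if x > xmax: code |= 2     # C[1]
    if y < ymin: code |= 4     # C[2]
    if y > ymax: code |= 8     # C[3]
    if z < zmin: code |= 16    # C[4]
    if z > zmax: code |= 32    # C[5]
    return code

def trivial_accept(c1, c2):
    """Both code words zero: the segment is completely inside."""
    return c1 == 0 and c2 == 0

def trivial_reject(c1, c2):
    """A common 1 bit: both endpoints outside the same plane."""
    return (c1 & c2) != 0
```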
The Cohen-Sutherland line clipping algorithm returns the endpoints of the clipped line by modifying the original vertices, and indicates with a true return value that the line is not completely rejected:
Cohen-Sutherland-Line-Clipping(r_1, r_2)
 1  C_1 ← code word of r_1   ▷ Code bits by checking the inequalities.
 2  C_2 ← code word of r_2
 3  while TRUE
 4    do if C_1 = 0 AND C_2 = 0
 5         then return TRUE   ▷ Trivial accept: inner line segment exists.
 6       if C_1 & C_2 ≠ 0
 7         then return FALSE   ▷ Trivial reject: no inner line segment exists.
 8       f ← index of the first bit where C_1 and C_2 differ
 9       r_i ← intersection of line segment (r_1, r_2) and the plane of index f
10       C_i ← code word of r_i
11       if C_1[f] = 1
12         then r_1 ← r_i
13              C_1 ← C_i   ▷ r_1 was outside w.r.t. plane f.
14         else r_2 ← r_i
15              C_2 ← C_i   ▷ r_2 was outside w.r.t. plane f.

Exercises
14.4-1. Propose approaches to reduce the quadratic complexity of polyhedron-polyhedron collision detection.
14.4-2. Develop a containment test to check whether a point is in a CSG-tree.
14.4-3. Develop an algorithm clipping one polygon onto a concave polygon.
14.4-4. Find an algorithm computing the bounding sphere and the bounding AABB of a polyhedron.
14.4-5. Develop an algorithm that tests the collision of two triangles in the plane.
14.4-6. Generalize the Cohen-Sutherland line clipping algorithm to a convex polyhedron clipping region.
14.4-7. Propose a method for clipping a line segment on a sphere.

14.5. Translation, distortion, geometric transformations

Objects in the virtual world may move, get distorted, grow or shrink, that is, their equations may also depend on time. To describe dynamic geometry, we usually apply two functions. The first function selects those points of space which belong to the object in its reference state. The second function maps these points onto points defining the object at an arbitrary time instance. Functions mapping the space onto itself are called transformations. A transformation maps point r to point r' = T(r). If the transformation is invertible, we can also find the original of a transformed point r' using inverse transformation T^{-1}(r'). If the object is defined in its reference state by inequality f(r) ≥ 0, then the
points of the transformed object are

{ r' : f(T^{-1}(r')) ≥ 0 } ,   (14.12)

since the originals belong to the set of points of the reference state. Parametric equations define the Cartesian coordinates of the points directly. Thus the transformation of parametric surface r = r(u, v) requires the transformation of its points:

r'(u, v) = T(r(u, v)) .   (14.13)

Similarly, the transformation of curve r = r(t) is:

r'(t) = T(r(t)) .   (14.14)

Transformation T may change the type of the object in the general case. It can happen, for example, that a simple triangle or a sphere becomes a complicated shape which is hard to describe and handle. Thus it is worth limiting the set of allowed transformations. Transformations mapping planes onto planes, lines onto lines and points onto points are particularly important. In the next subsection we consider the class of homogeneous linear transformations, which meet this requirement.

Projective geometry and homogeneous coordinates

So far the construction of the virtual world has been discussed using the means of Euclidean geometry, which gave us many important concepts such as distance, parallelism, angle, etc. However, when transformations are discussed in detail, many of these concepts are unimportant and can cause confusion. For example, parallelism is a relationship of two lines which can lead to singularities when the intersection of two lines is considered. Therefore, transformations are discussed in the context of another framework, called projective geometry. The axioms of projective geometry get around the problem of parallel lines by ignoring the concept of parallelism altogether, and state that two different lines always have an intersection. To cope with this requirement, every line is extended by a point at infinity such that two lines have the same extra point if and only if the two lines are parallel. The extra point is called the ideal point.
The projective space contains the points of the Euclidean space (these are the so-called affine points) and the ideal points. An ideal point glues together the ends of a Euclidean line, making it topologically similar to a circle. Projective geometry preserves that axiom of Euclidean geometry which states that two points define a line. In order to make it valid for ideal points as well, the set of lines of the Euclidean space is extended by a new line containing the ideal points. This new line is called the ideal line. Since the ideal points of two lines are the same if and only if the two lines are parallel, the ideal lines of two planes are the same if and only if the two planes are parallel. Ideal lines are on the ideal plane, which is added to the set of planes of the Euclidean space. Having made these extensions, no distinction is needed between the affine and ideal points. They are equal members of the projective space. Introducing analytic geometry, we noted that everything should be described by numbers in computer graphics. Cartesian coordinates used so far are in one to
Figure 14.29. The embedded model of the projective plane: the projective plane is embedded into a three-dimensional Euclidean space, and a correspondence is established between points of the projective plane and lines of the embedding three-dimensional Euclidean space by fitting the line to the origin of the three-dimensional space and the given point.

one relationship with the points of the Euclidean space, thus they are inappropriate to describe the points of the projective space. For the projective plane and space, we need a different algebraic base.

Projective plane

Let us consider first the projective plane and find a method to describe its points by numbers. To start, a Cartesian coordinate system x, y is set up in this plane. Simultaneously, another Cartesian system X_h, Y_h, h is established in the three-dimensional space embedding the plane in a way that axes X_h, Y_h are parallel to axes x, y, the plane is perpendicular to axis h, the origin of the Cartesian system of the plane is in point (0, 0, 1) of the three-dimensional space, and the points of the plane satisfy equation h = 1. The projective plane is thus embedded into a three-dimensional Euclidean space where points are defined by Cartesian coordinates (Figure 14.29). To describe a point of the projective plane by numbers, a correspondence is found between the points of the projective plane and the points of the embedding Euclidean space. An appropriate correspondence assigns to either affine or ideal point P of the projective plane that line of the Euclidean space which is defined by the origin of the coordinate system of the space and point P. Points of a Euclidean line crossing the origin can be defined by parametric equation [t · X_h, t · Y_h, t · h], where t is a free real parameter.
If point P is an affine point of the projective plane, then the corresponding line is not parallel to plane h = 1 (i.e. its h coordinate is not constant zero). Such a line intersects the plane of equation h = 1 at point [X_h/h, Y_h/h, 1], thus the Cartesian coordinates of point P in planar coordinate system x, y are (X_h/h, Y_h/h). On the other hand, if point P is ideal, then the corresponding line is parallel to the plane of equation h = 1 (i.e. h = 0). The direction of the ideal point is given by vector (X_h, Y_h). The presented approach assigns three-dimensional lines crossing the origin, and eventually [X_h, Y_h, h] triplets, to both the affine and the ideal points of the projective plane. These triplets are called the homogeneous coordinates of a point in the projective plane. Homogeneous coordinates are enclosed by brackets to distinguish them from Cartesian coordinates.

A three-dimensional line crossing the origin and describing a point of the projective plane can be defined by any of its points except the origin. Consequently, the three homogeneous coordinates cannot all be simultaneously zero, and homogeneous coordinates can be freely multiplied by the same non-zero scalar without changing the described point. This property justifies the name homogeneous. It is often convenient to select from the homogeneous coordinates of an affine point that triplet where the third homogeneous coordinate is 1, since in this case the first two homogeneous coordinates are identical to the Cartesian coordinates:

X_h = x,  Y_h = y,  h = 1 .  (14.15)

From another point of view, the Cartesian coordinates of an affine point can be converted to homogeneous coordinates by extending the pair with a third element of value 1.

The embedded model also provides means to define the equations of the lines and line segments of the projective space. Let us select two different points on the projective plane and specify their homogeneous coordinates. The two points are different if homogeneous coordinates [X_h^1, Y_h^1, h^1] of the first point cannot be obtained as a scalar multiple of homogeneous coordinates [X_h^2, Y_h^2, h^2] of the other point. In the embedding space, triplet [X_h, Y_h, h] can be regarded as Cartesian coordinates, thus the equation of the line fitted to points [X_h^1, Y_h^1, h^1] and [X_h^2, Y_h^2, h^2] is:

X_h(t) = X_h^1 · (1 − t) + X_h^2 · t ,
Y_h(t) = Y_h^1 · (1 − t) + Y_h^2 · t ,  (14.16)
h(t) = h^1 · (1 − t) + h^2 · t .

If h(t) ≠ 0, then the affine points of the projective plane can be obtained by projecting the three-dimensional space onto the plane of equation h = 1. Requiring the two points to be different, we excluded the case when the line would be projected to a single point, hence the projection maps lines to lines. Thus the presented equation really identifies the homogeneous coordinates defining the points of the line. If h(t) = 0, then the equation expresses the ideal point of the line.
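Equations (14.15) and (14.16) can be sketched in a few lines of Python; the function names are our own, not the book's:

```python
def to_homogeneous(x, y):
    """Extend Cartesian coordinates (x, y) with a third element of value 1 (14.15)."""
    return [x, y, 1.0]

def to_cartesian(Xh, Yh, h):
    """Project homogeneous triplet [Xh, Yh, h] back to the plane h = 1."""
    if h == 0:
        raise ValueError("ideal point: it has no Cartesian counterpart")
    return (Xh / h, Yh / h)

def line_point(p1, p2, t):
    """Point of the projective line through p1 and p2 at parameter t (14.16)."""
    return [a * (1.0 - t) + b * t for a, b in zip(p1, p2)]

# The homogeneous property: multiplying the triplet by a non-zero scalar
# keeps the same projective point.
p = to_homogeneous(2.0, 3.0)
q = [5.0 * c for c in p]
assert to_cartesian(*p) == to_cartesian(*q) == (2.0, 3.0)
```

For example, the midpoint (t = 1/2) of the segment between [0, 0, 1] and [4, 2, 1] projects back to the Cartesian point (2, 1).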
If parameter t takes an arbitrary real value, then the points of a line are defined. If parameter t is restricted to interval [0, 1], then we obtain the line segment defined by the two endpoints.

Projective space
We could apply the same method to introduce the homogeneous coordinates of the projective space as we used to define the homogeneous coordinates of the projective plane, but this approach would require the embedding of the three-dimensional projective space into a four-dimensional Euclidean space, which is not intuitive. We rather discuss another construction, which works in arbitrary dimensions. In this construction, a point is described as the centre of mass of a mechanical system. To identify a point, let us place weight X_h at reference point p1, weight Y_h at reference point p2, weight Z_h at reference point p3, and weight w at reference point p4. The centre of mass of this mechanical system is:

r = (X_h · p1 + Y_h · p2 + Z_h · p3 + w · p4) / (X_h + Y_h + Z_h + w) .
Let us denote the total weight by h = X_h + Y_h + Z_h + w. By definition, the elements of quadruple [X_h, Y_h, Z_h, h] are the homogeneous coordinates of the centre of mass. To find the correspondence between homogeneous and Cartesian coordinates, the relationship of the two coordinate systems (the relationship of the basis vectors and the origin of the Cartesian coordinate system and of the reference points of the homogeneous coordinate system) must be established. Let us assume, for example, that the reference points of the homogeneous coordinate system are in points (1, 0, 0), (0, 1, 0), (0, 0, 1), and (0, 0, 0) of the Cartesian coordinate system. The centre of mass (assuming that total weight h is not zero) is expressed in Cartesian coordinates as follows:

r[X_h, Y_h, Z_h, h] = (1/h) · (X_h · (1, 0, 0) + Y_h · (0, 1, 0) + Z_h · (0, 0, 1) + w · (0, 0, 0)) = (X_h/h, Y_h/h, Z_h/h) .

Hence the correspondence between homogeneous coordinates [X_h, Y_h, Z_h, h] and Cartesian coordinates (x, y, z) is (h ≠ 0):

x = X_h/h,  y = Y_h/h,  z = Z_h/h .  (14.17)

The equations of lines in the projective space can be obtained either by deriving them in the embedding four-dimensional Cartesian space, or by using the centre of mass analogy:

X_h(t) = X_h^1 · (1 − t) + X_h^2 · t ,
Y_h(t) = Y_h^1 · (1 − t) + Y_h^2 · t ,
Z_h(t) = Z_h^1 · (1 − t) + Z_h^2 · t ,  (14.18)
h(t) = h^1 · (1 − t) + h^2 · t .

If parameter t is restricted to interval [0, 1], then we obtain the equation of the projective line segment.

To find the equation of the projective plane, the equation of the Euclidean plane is considered (equation 14.1). The Cartesian coordinates of the points on a Euclidean plane satisfy the following implicit equation:

n_x · x + n_y · y + n_z · z + d = 0 .

Using the correspondence between the Cartesian and homogeneous coordinates (equation 14.17), we still describe the points of the Euclidean plane, but now with homogeneous coordinates:

n_x · X_h/h + n_y · Y_h/h + n_z · Z_h/h + d = 0 .
Let us multiply both sides of this equation by h, and add those points to the plane which have coordinate h = 0 and satisfy this equation. With this step the set of points of the Euclidean plane is extended with the ideal points, that is, we obtain the set of points belonging to the projective plane. Hence the equation of the projective plane is a homogeneous linear equation:

n_x · X_h + n_y · Y_h + n_z · Z_h + d · h = 0 ,  (14.19)
or in matrix form:

[X_h, Y_h, Z_h, h] · [n_x, n_y, n_z, d]^T = 0 .  (14.20)

Note that points and planes are described by row and column vectors, respectively. Both the quadruples of points and the quadruples of planes have the homogeneous property, that is, they can be multiplied by non-zero scalars without altering the solutions of the equation.

Homogeneous linear transformations
Transformations defined as the multiplication of the homogeneous coordinate vector of a point by a constant 4 × 4 matrix T are called homogeneous linear transformations:

[X_h', Y_h', Z_h', h'] = [X_h, Y_h, Z_h, h] · T .  (14.21)

Theorem. Homogeneous linear transformations map points to points.

Proof. A point can be defined by homogeneous coordinates in the form λ · [X_h, Y_h, Z_h, h], where λ is an arbitrary, non-zero constant. The transformation maps this to λ · [X_h', Y_h', Z_h', h'] = λ · [X_h, Y_h, Z_h, h] · T, that is, to λ-multiples of the same vector, thus the result is a single point in homogeneous coordinates.

Note that due to the homogeneous property, homogeneous transformation matrix T is not unambiguous, but can be freely multiplied by non-zero scalars without modifying the realized mapping.

Theorem. Invertible homogeneous linear transformations map lines to lines.

Proof. Let us consider the parametric equation of a line:

[X_h(t), Y_h(t), Z_h(t), h(t)] = [X_h^1, Y_h^1, Z_h^1, h^1] · (1 − t) + [X_h^2, Y_h^2, Z_h^2, h^2] · t ,  t ∈ (−∞, ∞) ,

and transform the points of this line by multiplying the quadruples with the transformation matrix:

[X_h'(t), Y_h'(t), Z_h'(t), h'(t)] = [X_h(t), Y_h(t), Z_h(t), h(t)] · T
  = [X_h^1, Y_h^1, Z_h^1, h^1] · T · (1 − t) + [X_h^2, Y_h^2, Z_h^2, h^2] · T · t
  = [X_h^1', Y_h^1', Z_h^1', h^1'] · (1 − t) + [X_h^2', Y_h^2', Z_h^2', h^2'] · t ,

where [X_h^1', Y_h^1', Z_h^1', h^1'] and [X_h^2', Y_h^2', Z_h^2', h^2'] are the transforms of [X_h^1, Y_h^1, Z_h^1, h^1] and [X_h^2, Y_h^2, Z_h^2, h^2], respectively. Since the transformation is invertible, the two transformed points are different. The resulting equation is the equation of a line fitted to the transformed points.

We note that if we had not required the invertibility of the transformation, then it could have happened that the transformation would have mapped the two
points to the same point, thus the line would have degenerated to a single point. If parameter t is limited to interval [0, 1], then we obtain the equation of the projective line segment, thus we can also state that a homogeneous linear transformation maps a line segment to a line segment. Even more generally, a homogeneous linear transformation maps convex combinations to convex combinations. For example, triangles are also mapped to triangles.

However, we have to be careful when we try to apply this theorem in the Euclidean plane or space. Let us consider a line segment as an example. If coordinate h has different signs at the two endpoints, then the line segment contains an ideal point. Such a projective line segment can be intuitively imagined as two half lines and an ideal point sticking the endpoints of these half lines together at infinity, that is, such a line segment is the complement of the line segment we are accustomed to. It may happen that before the transformation, the h coordinates of the endpoints have the same sign, that is, the line segment meets our intuitive image of Euclidean line segments, but after the transformation, the h coordinates of the endpoints will have different signs. Thus the transformation wraps around our line segment.

Theorem. Invertible homogeneous linear transformations map planes to planes.

Proof. The originals of transformed points [X_h', Y_h', Z_h', h'], defined by [X_h, Y_h, Z_h, h] = [X_h', Y_h', Z_h', h'] · T⁻¹, are on a plane, thus they satisfy the original equation of the plane:

[X_h, Y_h, Z_h, h] · [n_x, n_y, n_z, d]^T = [X_h', Y_h', Z_h', h'] · T⁻¹ · [n_x, n_y, n_z, d]^T = 0 .

Due to the associativity of matrix multiplication, the transformed points also satisfy the equation

[X_h', Y_h', Z_h', h'] · [n_x', n_y', n_z', d']^T = 0 ,

which is also a plane equation, where

[n_x', n_y', n_z', d']^T = T⁻¹ · [n_x, n_y, n_z, d]^T .

This result can be used to obtain the normal vector of a transformed plane.
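A minimal sketch of equation (14.21) in Python, using a translation as the example transformation (the helper names are ours; the translation matrix layout follows the 4 × 4 form discussed below, where the last row carries the translation):

```python
def vec_mat(v, M):
    """Row vector times 4x4 matrix: the homogeneous point transformation (14.21)."""
    return [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]

def to_cartesian(q):
    """Convert a homogeneous quadruple to Cartesian coordinates (14.17)."""
    Xh, Yh, Zh, h = q
    return (Xh / h, Yh / h, Zh / h)

# Translation by (2, 3, 4): the upper-left 3x3 block is the identity,
# the last row is the translation vector.
T = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [2, 3, 4, 1]]

p = [1.0, 1.0, 1.0, 1.0]   # the affine point (1, 1, 1)
q = [5.0, 5.0, 5.0, 5.0]   # the same point with scaled homogeneous coordinates
print(to_cartesian(vec_mat(p, T)))  # (3.0, 4.0, 5.0)
print(to_cartesian(vec_mat(q, T)))  # (3.0, 4.0, 5.0) -- the same point
```

The second call illustrates the homogeneous property: scaling the input quadruple by a non-zero scalar does not change the transformed point.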
An important subclass of homogeneous linear transformations is the set of affine transformations, where the Cartesian coordinates of the transformed point are linear functions of the original Cartesian coordinates:

[x', y', z'] = [x, y, z] · A + [p_x, p_y, p_z] ,  (14.22)
where vector p describes translation, and A is a 3 × 3 matrix that expresses rotation, scaling, mirroring, etc., and their arbitrary combinations. For example, the rotation around axis (t_x, t_y, t_z) (where |(t_x, t_y, t_z)| = 1) by angle φ is given by the following matrix:

A = | (1 − t_x²)·cos φ + t_x²           t_x·t_y·(1 − cos φ) + t_z·sin φ   t_x·t_z·(1 − cos φ) − t_y·sin φ |
    | t_y·t_x·(1 − cos φ) − t_z·sin φ   (1 − t_y²)·cos φ + t_y²           t_y·t_z·(1 − cos φ) + t_x·sin φ |
    | t_z·t_x·(1 − cos φ) + t_y·sin φ   t_z·t_y·(1 − cos φ) − t_x·sin φ   (1 − t_z²)·cos φ + t_z²         |

This expression is known as the Rodrigues formula.

Affine transformations map the Euclidean space onto itself and transform parallel lines to parallel lines. Affine transformations are also homogeneous linear transformations, since equation (14.22) can also be given as a 4 × 4 matrix operation, having changed the Cartesian coordinates to homogeneous coordinates by adding a fourth coordinate of value 1:

[x', y', z', 1] = [x, y, z, 1] · | A_11 A_12 A_13 0 |
                                | A_21 A_22 A_23 0 |
                                | A_31 A_32 A_33 0 |
                                | p_x  p_y  p_z  1 |  = [x, y, z, 1] · T .  (14.23)

A further specialization of affine transformations is the set of congruence transformations (isometries), which are distance and angle preserving.

Theorem. In a congruence transformation the rows of matrix A have unit length and are orthogonal to each other.

Proof. Let us use the property that a congruence is distance and angle preserving for the case when the origin and the basis vectors of the Cartesian system are transformed. The transformation assigns point (p_x, p_y, p_z) to the origin, and points (A_11 + p_x, A_12 + p_y, A_13 + p_z), (A_21 + p_x, A_22 + p_y, A_23 + p_z), and (A_31 + p_x, A_32 + p_y, A_33 + p_z) to points (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. Because the distance is preserved, the distances between the new points and the new origin are still 1, thus |(A_11, A_12, A_13)| = 1, |(A_21, A_22, A_23)| = 1, and |(A_31, A_32, A_33)| = 1.
On the other hand, because the angle is also preserved, vectors (A_11, A_12, A_13), (A_21, A_22, A_23), and (A_31, A_32, A_33) are also perpendicular to each other.

Exercises
14.5-1. Using the Cartesian coordinate system as an algebraic basis, prove the axioms of Euclidean geometry, for example, that two points define a line, and that two different lines may intersect each other in at most one point.
14.5-2. Using the homogeneous coordinates as an algebraic basis, prove an axiom of projective geometry stating that two different lines intersect each other in exactly one point.
14.5-3. Prove that homogeneous linear transformations map line segments to line segments using the centre of mass analogy.
14.5-4. How does an affine transformation modify the volume of an object?
14.5-5. Give the matrix of the homogeneous linear transformation which translates by vector p.
14.5-6. Prove the Rodrigues formula.
14.5-7. A solid defined by inequality f(r) ≥ 0 at time t = 0 moves with uniform constant velocity v. Find the inequality of the solid at an arbitrary time instance t.
14.5-8. Prove that if the rows of matrix A are of unit length and are perpendicular to each other, then the affine transformation is a congruence. Show that for such matrices A⁻¹ = Aᵀ.
14.5-9. Give the homogeneous linear transformation which projects the space from point c onto a plane of normal n and place vector r0.
14.5-10. Show that five point correspondences unambiguously identify a homogeneous linear transformation if no four points are co-planar.

14.6. Rendering with ray tracing

Figure 14.30. Ray tracing.

When a virtual world is rendered, we have to identify the surfaces visible in different directions from the virtual eye. The set of possible directions is defined by a rectangle-shaped window, which is decomposed into a grid corresponding to the pixels of the screen (Figure 14.30). Since a pixel has a unique colour, it is enough to solve the visibility problem in a single point of each pixel, for example, in the points corresponding to the pixel centres. The surface visible in a given direction from the eye can be identified by casting a half line, called a ray, and identifying its intersection closest to the eye position. This operation is called ray tracing. Ray tracing has many applications. For example, shadow computation tests whether or not a point is occluded from the light source, which requires a ray to be sent from the point in the direction of the light source and the determination of whether this ray intersects any surface closer than the light source. Ray tracing is also used by collision detection, since a point moving with constant and uniform speed collides with that surface which is first intersected by the ray describing the motion of the point.
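The decomposition of the window into pixels and the casting of one ray through each pixel centre can be sketched as follows; the eye position and the window geometry in the example are illustrative assumptions, and the function names are our own:

```python
import math

def normalize(v):
    """Return the unit-length vector pointing in the direction of v."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def pixel_rays(eye, window_corner, right, up, nx, ny):
    """Yield (origin, unit direction) pairs, one ray per pixel centre of an
    nx x ny grid. window_corner is the lower-left corner of the window
    rectangle; right and up span the whole rectangle."""
    for j in range(ny):
        for i in range(nx):
            # Centre of pixel (i, j) on the window rectangle.
            target = [window_corner[k]
                      + right[k] * (i + 0.5) / nx
                      + up[k] * (j + 0.5) / ny for k in range(3)]
            direction = normalize([t - e for t, e in zip(target, eye)])
            yield eye, direction

# Eye at the origin, a 2x2 window at distance 1 along the z axis, 2x2 pixels.
rays = list(pixel_rays([0.0, 0.0, 0.0], [-1.0, -1.0, 1.0],
                       [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], 2, 2))
```

Each yielded direction has unit length, so the ray parameter t of equation (14.24) below directly measures distance.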
A ray is defined by the following equation:

ray(t) = s + v · t ,  (t > 0) ,  (14.24)

where s is the place vector of the ray origin, v is the direction of the ray, and ray parameter t characterizes the distance from the origin. Let us suppose that direction vector v has unit length. In this case parameter t is the real distance, otherwise it would only be proportional to the distance. If parameter t is negative, then the point is behind the eye and is obviously not visible. The identification of the closest intersection with the ray means the determination of the intersection point having the smallest positive ray parameter. In order to find the closest intersection, the intersection calculation is tried with each surface, and the closest is retained. The algorithm obtaining the first intersection is:

Ray-First-Intersection(s, v)
 1  t ← t_max  ▷ Initialization to the maximum size in the virtual world.
 2  for each object o
 3    do t_o ← Ray-Surface-Intersection(s, v, o)  ▷ Negative if no intersection exists.
 4       if 0 ≤ t_o < t  ▷ Is the new intersection closer?
 5         then t ← t_o  ▷ Ray parameter of the closest intersection so far.
 6              o_visible ← o  ▷ Closest object so far.
 7  if t < t_max  ▷ Was there an intersection at all?
 8    then x ← s + v · t  ▷ Intersection point using the ray equation.
 9         return t, x, o_visible
10    else return no intersection

This algorithm inputs the ray defined by origin s and direction v, and outputs the ray parameter of the intersection in variable t, the intersection point in x, and the visible object in o_visible. The algorithm calls function Ray-Surface-Intersection for each object, which determines the intersection of the ray and the given object, and indicates with a negative return value if no intersection exists. Function Ray-Surface-Intersection should be implemented separately for each surface type.

Ray-surface intersection calculation
The identification of the intersection between a ray and a surface requires the solution of an equation.
The intersection point is both on the ray and on the surface, thus it can be obtained by inserting the ray equation into the equation of the surface and solving the resulting equation for the unknown ray parameter.

Intersection calculation for implicit surfaces
For implicit surfaces of equation f(r) = 0, the intersection can be calculated by solving the following scalar equation for t:

f(s + v · t) = 0 .

(Footnote: In collision detection v is not a unit vector, but the velocity of the moving point, since this makes ray parameter t express the collision time.)
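Since f is a general function, the equation f(s + v · t) = 0 is usually solved numerically. A minimal sketch: march along the ray until f changes sign, then refine the bracketed root by bisection. The step size, search range, and the sphere used as a demonstration surface are illustrative assumptions:

```python
def ray_implicit(f, s, v, t_max=100.0, step=0.01, eps=1e-9):
    """Smallest positive root of f(s + v*t) = 0 found by marching plus
    bisection, or None if no sign change occurs up to t_max."""
    g = lambda t: f([si + vi * t for si, vi in zip(s, v)])
    t_prev, f_prev = 0.0, g(0.0)
    t = step
    while t <= t_max:
        f_curr = g(t)
        if f_prev * f_curr <= 0.0:        # sign change: a root is bracketed
            lo, hi = t_prev, t
            while hi - lo > eps:          # bisection refinement
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        t_prev, f_prev = t, f_curr
        t += step
    return None                           # no intersection found

# Example implicit surface: unit sphere centred at (0, 0, 5).
sphere = lambda r: r[0]**2 + r[1]**2 + (r[2] - 5.0)**2 - 1.0
```

For a ray starting at the origin and heading along the +z axis, the sketch returns t ≈ 4, the distance to the near side of the sphere. Note that marching can miss roots thinner than the step size; robust versions bound the derivative of f.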
Let us take the example of quadrics, which include the sphere, the ellipsoid, the cylinder, the cone, the paraboloid, etc. The implicit equation of a general quadric contains a quadratic form:

[x, y, z, 1] · Q · [x, y, z, 1]^T = 0 ,

where Q is a 4 × 4 matrix. Substituting the ray equation into the equation of the surface, we obtain

[s_x + v_x·t, s_y + v_y·t, s_z + v_z·t, 1] · Q · [s_x + v_x·t, s_y + v_y·t, s_z + v_z·t, 1]^T = 0 .

Rearranging the terms, we get a second order equation for unknown parameter t:

t² · (v · Q · vᵀ) + t · (s · Q · vᵀ + v · Q · sᵀ) + (s · Q · sᵀ) = 0 ,

where v = [v_x, v_y, v_z, 0] and s = [s_x, s_y, s_z, 1]. This equation can be solved using the solution formula of second order equations. Now we are interested only in the real and positive roots. If two such roots exist, then the smaller one corresponds to the intersection closer to the origin of the ray.

Intersection calculation for parametric surfaces
The intersection of parametric surface r = r(u, v) and the ray is calculated by first solving the following equation for unknown parameters u, v, t:

r(u, v) = s + t · v ,

then checking whether or not t is positive and parameters u, v are inside the allowed parameter range of the surface. Roots of non-linear equations are usually found by numeric methods. On the other hand, the surface can also be approximated by a triangle mesh, which is intersected by the ray. Having obtained the intersection on the coarse mesh, the mesh around this point is refined, and the intersection calculation is repeated with the refined mesh.

Intersection calculation for a triangle
To compute the ray intersection for a triangle of vertices a, b, and c, first the ray intersection with the plane of the triangle is found. Then it is decided whether or not the intersection point with the plane is inside the triangle.
The normal vector and a place vector of the triangle plane are n = (b − a) × (c − a) and a, respectively, thus points r of the plane satisfy the following equation:

n · (r − a) = 0 .  (14.25)
The intersection of the ray and this plane is obtained by substituting the ray equation (equation (14.24)) into this plane equation and solving it for unknown parameter t. If root t is positive, then it is inserted into the ray equation to get the intersection point with the plane. However, if the root is negative, then the intersection is behind the origin of the ray, thus it is invalid. Having a valid intersection with the plane of the triangle, we check whether this point is inside the triangle. This is a containment problem, which is discussed in an earlier subsection.

Intersection calculation for an AABB
The surface of an AABB, that is, an axis-aligned block, can be subdivided into 6 rectangular faces, or alternatively into 12 triangles, thus its intersection can be computed by the algorithms discussed in the previous subsections. However, realizing that in this special case the three coordinates can be handled separately, we can develop more efficient approaches. In fact, an AABB is the intersection of an x-stratum defined by inequality x_min ≤ x ≤ x_max, a y-stratum defined by y_min ≤ y ≤ y_max, and a z-stratum of inequality z_min ≤ z ≤ z_max. For example, the ray parameters of the intersections with the boundaries of the x-stratum are:

t_x¹ = (x_min − s_x)/v_x ,  t_x² = (x_max − s_x)/v_x .

The smaller of the two parameter values corresponds to the entry into the stratum, the greater to the exit. Let us denote the ray parameter of the entry by t_in, and the ray parameter of the exit by t_out. The ray is inside the x-stratum while the ray parameter is in [t_in, t_out]. Repeating the same calculation for the y- and z-strata as well, three ray parameter intervals are obtained. The intersection of these intervals determines when the ray is inside the AABB. If parameter t_out obtained as the result of intersecting the intervals is negative, then the AABB is behind the eye, thus no ray-AABB intersection is possible.
If only t_in is negative, then the ray starts at an internal point of the AABB, and the first intersection is at t_out. Finally, if t_in is positive, then the ray enters the AABB from outside at parameter t_in.

The computation of unnecessary intersection points can be reduced by applying the Cohen-Sutherland line clipping algorithm (see the earlier subsection on clipping). First, the ray is replaced by a line segment, where one endpoint is the origin of the ray, and the other endpoint is an arbitrary point on the ray which is farther from the origin than any object of the virtual world. Then this line segment is clipped by the AABB. If the Cohen-Sutherland algorithm reports that the line segment has no internal part, then the ray has no intersection with the AABB.

Speeding up the intersection calculation
A naive ray tracing algorithm tests each object for a ray to find the closest intersection. If there are N objects in the space, the running time of the algorithm is Θ(N) both in the average and in the worst case. The storage requirement is also linear in terms of the number of objects. The method could be speeded up if we could exclude certain objects from the intersection test without testing them one by one. The reasons for such exclusion include that these objects are behind the ray or not in the direction of the ray.
Figure 14.31. Partitioning the virtual world by a uniform grid. The intersections of the ray and the coordinate planes of the grid are at regular distances c_x/v_x, c_y/v_y, and c_z/v_z, respectively.

Additionally, the speed is also expected to improve if we can terminate the search having found an intersection, supposing that even if other intersections exist, they are surely farther than the intersection point just found. To make such decisions safely, we need to know the arrangement of objects in the virtual world. This information is gathered during a pre-processing phase. Of course, pre-processing has its own computational cost, which is worth spending if we have to trace a lot of rays.

Bounding volumes
One of the simplest ray tracing acceleration techniques uses bounding volumes. The bounding volume is a shape of simple geometry, typically a sphere or an AABB, which completely contains a complex object. When a ray is traced, first the bounding volume is tested for intersection. If there is no intersection with the bounding volume, then neither can the contained object be intersected, thus the computation time of the ray intersection with the complex object is saved. The bounding volume should be selected in a way that the ray intersection is computationally cheap, and it is a tight container of the complex object. The application of bounding volumes does not alter the linear time complexity of the naive ray tracing. However, it can increase the speed by a scalar factor. On the other hand, bounding volumes can also be organized in a hierarchy, putting bounding volumes inside bigger bounding volumes recursively. In this case the ray tracing algorithm traverses this hierarchy, which is possible in sub-linear time.
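A bounding-volume quick reject with an AABB can be sketched by reusing the stratum (slab) method described above; the function names are our own:

```python
def ray_aabb(s, v, box_min, box_max):
    """Interval [t_in, t_out] during which the ray s + v*t is inside the
    box, or None if the ray misses it (the stratum method)."""
    t_in, t_out = float("-inf"), float("inf")
    for k in range(3):                       # x-, y- and z-strata separately
        if v[k] != 0.0:
            t1 = (box_min[k] - s[k]) / v[k]
            t2 = (box_max[k] - s[k]) / v[k]
            t_in = max(t_in, min(t1, t2))    # latest entry
            t_out = min(t_out, max(t1, t2))  # earliest exit
        elif not (box_min[k] <= s[k] <= box_max[k]):
            return None                      # parallel to the stratum, outside it
    if t_out < t_in or t_out < 0.0:
        return None   # empty interval, or the box is behind the ray origin
    return t_in, t_out

def may_hit(s, v, box_min, box_max):
    """Quick reject: the expensive ray-object intersection is worth
    computing only if the ray hits the object's bounding box."""
    return ray_aabb(s, v, box_min, box_max) is not None
```

For a ray started at (0, 0, -5) towards +z and the box [-1, 1]³, the interval is (4, 6); a ray from the same origin along +x is rejected without touching the contained object.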
Space subdivision with uniform grids
Let us find the AABB of the complete virtual world and subdivide it by an axis-aligned uniform grid of cell sizes (c_x, c_y, c_z) (Figure 14.31). In the pre-processing phase, for each cell we identify those objects that are at least
partially contained by the cell. The test of an object against a cell can be performed using a clipping algorithm, or simply by checking whether the cell and the AABB of the object overlap.

Uniform-Grid-Construction()
1  Compute the minimum corner of the AABB (x_min, y_min, z_min) and the cell sizes (c_x, c_y, c_z)
2  for each cell c
3    do object list of cell c ← empty
4       for each object o  ▷ Register objects overlapping with this cell.
5         do if cell c and the AABB of object o overlap
6              then add object o to the object list of cell c

During ray tracing, cells intersected by the ray are visited in the order of their distance from the ray origin. When a cell is processed, only those objects need to be tested for intersection which overlap with this cell, that is, which are registered in this cell. On the other hand, if an intersection is found in the cell, then intersections belonging to other cells cannot be closer to the ray origin than the found intersection, thus the cell marching can be terminated. Note that when an object registered in a cell is intersected by the ray, we should also check whether the intersection point is indeed in this cell, since we might meet an object again in other cells. The number of ray-surface intersection tests can be reduced if the results of ray-surface intersections are stored with the objects and are reused when needed again.

As long as no ray-surface intersection is found, the algorithm traverses those cells which are intersected by the ray. Indices X, Y, Z of the first cell are computed from ray origin s, minimum corner (x_min, y_min, z_min) of the grid, and sizes (c_x, c_y, c_z) of the cells:

Uniform-Grid-Enclosing-Cell(s)
1  X ← Integer((s_x − x_min)/c_x)
2  Y ← Integer((s_y − y_min)/c_y)
3  Z ← Integer((s_z − z_min)/c_z)
4  return X, Y, Z

The presented algorithm assumes that the origin of the ray is inside the subspace covered by the grid.
Should this condition not be met, the intersection of the ray and the scene AABB is computed first, and the ray origin is moved to this intersection point. The initial values of ray parameters t_x, t_y, t_z are computed as the intersections of the ray and the coordinate planes by the Uniform-Grid-Ray-Parameter-Initialization algorithm:
Uniform-Grid-Ray-Parameter-Initialization(s, v, X, Y, Z)
 1  if v_x > 0
 2    then t_x ← (x_min + (X + 1) · c_x − s_x)/v_x
 3    else if v_x < 0
 4      then t_x ← (x_min + X · c_x − s_x)/v_x
 5      else t_x ← t_max  ▷ The maximum distance.
 6  if v_y > 0
 7    then t_y ← (y_min + (Y + 1) · c_y − s_y)/v_y
 8    else if v_y < 0
 9      then t_y ← (y_min + Y · c_y − s_y)/v_y
10      else t_y ← t_max
11  if v_z > 0
12    then t_z ← (z_min + (Z + 1) · c_z − s_z)/v_z
13    else if v_z < 0
14      then t_z ← (z_min + Z · c_z − s_z)/v_z
15      else t_z ← t_max
16  return t_x, t_y, t_z

The next cell of the sequence of visited cells is determined by the 3D line drawing algorithm (3DDDA algorithm). This algorithm exploits the fact that the ray parameters of the intersection points with planes perpendicular to axis x (and similarly to axes y and z) are regularly placed at distance c_x/v_x (c_y/v_y and c_z/v_z, respectively), thus the ray parameter of the next intersection can be obtained with a single addition (Figure 14.31). Ray parameters t_x, t_y, and t_z are stored in global variables and are incremented by constant values. The smallest of the three ray parameters of the coordinate planes identifies the next intersection with a cell boundary. The following algorithm computes indices X, Y, Z of the next intersected cell and updates ray parameters t_x, t_y, t_z:

Uniform-Grid-Next-Cell(X, Y, Z, t_x, t_y, t_z)
1  if t_x = min(t_x, t_y, t_z)  ▷ Next intersection is on the plane perpendicular to axis x.
2    then X ← X + sgn(v_x)  ▷ Function sgn(x) returns the sign.
3         t_x ← t_x + c_x/|v_x|
4    else if t_y = min(t_x, t_y, t_z)  ▷ Next intersection is on the plane perpendicular to axis y.
5      then Y ← Y + sgn(v_y)
6           t_y ← t_y + c_y/|v_y|
7      else if t_z = min(t_x, t_y, t_z)  ▷ Next intersection is on the plane perpendicular to axis z.
8        then Z ← Z + sgn(v_z)
9             t_z ← t_z + c_z/|v_z|

To summarize, a complete ray tracing algorithm is presented, which exploits the uniform grid generated during preprocessing and computes the ray-surface intersection closest to the ray origin. The minimum of ray parameters (t_x, t_y, t_z) assigned to the coordinate planes, i.e. variable t_out, determines the distance within which the ray stays inside the cell. This parameter is used to decide whether or not a ray-surface intersection is really inside the cell.

Ray-First-Intersection-with-Uniform-Grid(s, v)
 1  (X, Y, Z) ← Uniform-Grid-Enclosing-Cell(s)
 2  (t_x, t_y, t_z) ← Uniform-Grid-Ray-Parameter-Initialization(s, v, X, Y, Z)
 3  while X, Y, Z are inside the grid
 4    do t_out ← min(t_x, t_y, t_z)  ▷ Here is the exit from the cell.
 5       t ← t_out  ▷ Initialization: no intersection yet.
 6       for each object o registered in cell (X, Y, Z)
 7         do t_o ← Ray-Surface-Intersection(s, v, o)  ▷ Negative: no intersection.
 8            if 0 ≤ t_o < t  ▷ Is the new intersection closer?
 9              then t ← t_o  ▷ The ray parameter of the closest intersection so far.
10                   o_visible ← o  ▷ The first intersected object.
11       if t < t_out  ▷ Was there an intersection in the cell?
12         then x ← s + v · t  ▷ The position of the intersection.
13              return t, x, o_visible  ▷ Termination.
14       Uniform-Grid-Next-Cell(X, Y, Z, t_x, t_y, t_z)  ▷ 3DDDA.
15  return no intersection

Time and storage complexity of the uniform grid algorithm
The preprocessing phase of the uniform grid algorithm tests each object with each cell, thus runs in Θ(N · C) time, where N and C are the numbers of objects and cells, respectively. In practice, the resolution of the grid is set to make C proportional to N, since in this case the average number of objects per cell becomes independent of the total number of objects. Such resolution makes the preprocessing time quadratic, that is, Θ(N²). We note that sorting objects before testing them against cells may reduce this complexity, but this optimization is not crucial, since not the preprocessing but the ray tracing time is critical. Since in the worst case all objects may overlap with each cell, the storage space is also in O(N²).
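The cell marching of the 3DDDA algorithm can be sketched as follows; the grid layout, the cell limit, and the names are our own, and the sketch only enumerates visited cells (the per-cell object tests of Ray-First-Intersection-with-Uniform-Grid would go inside the loop):

```python
import math

def grid_traversal(s, v, grid_min, cell, resolution, max_cells=64):
    """Yield the (X, Y, Z) indices of the grid cells visited by ray s + v*t,
    in the order of their distance from the ray origin (3DDDA)."""
    # Cell of the ray origin (Uniform-Grid-Enclosing-Cell).
    idx = [int((s[k] - grid_min[k]) // cell[k]) for k in range(3)]
    t_next, dt, step = [], [], []
    for k in range(3):
        # Ray parameter of the first boundary plane crossed along axis k
        # (Uniform-Grid-Ray-Parameter-Initialization).
        if v[k] > 0.0:
            t_next.append((grid_min[k] + (idx[k] + 1) * cell[k] - s[k]) / v[k])
        elif v[k] < 0.0:
            t_next.append((grid_min[k] + idx[k] * cell[k] - s[k]) / v[k])
        else:
            t_next.append(math.inf)
        dt.append(cell[k] / abs(v[k]) if v[k] != 0.0 else math.inf)
        step.append(1 if v[k] > 0.0 else (-1 if v[k] < 0.0 else 0))
    for _ in range(max_cells):
        if not all(0 <= idx[k] < resolution[k] for k in range(3)):
            return                     # the ray left the grid
        yield tuple(idx)
        k = min(range(3), key=lambda a: t_next[a])  # closest boundary plane
        idx[k] += step[k]              # step into the neighbouring cell
        t_next[k] += dt[k]             # single addition per step (3DDDA)
```

For a ray started at (0.5, 0.5, 0.5) along the +x axis in a 3 × 3 × 3 grid of unit cells, the visited cells are (0,0,0), (1,0,0), (2,0,0).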
The ray tracing time can be expressed by the following equation:

T = T_o + N_I · T_I + N_S · T_S ,  (14.26)

where T_o is the time needed to identify the cell containing the origin of the ray, N_I is the number of ray-surface intersection tests until the first intersection is found, T_I is the time required by a single ray-surface intersection test, N_S is the number of visited cells, and T_S is the time needed to step onto the next cell. To find the first cell, the coordinates of the ray origin are divided by the cell sizes, and the cell indices are obtained by rounding the results; this step thus runs in constant time. A single ray-surface intersection test also requires constant time. The next cell is determined by the 3DDDA algorithm in constant time as well.
111 14.6. Rendering with ray tracing 1061 Thus the complexity of the algorithm depends only on the number of intersection tests and the number of the visited cells. Considering a worst case scenario, a cell may contain all objects, requiring O(N) intersection test with N objects. In the worst case the ray tracing has linear complexity. This means that the uniform grid algorithm needs quadratic preprocessing time and storage, but solves the ray tracing problem still in linear time as the naive algorithm, which is quite disappointing. However, uniform grids are still worth using since worst case scenarios are very unlikely. The fact is that classic complexity measures describing the worst case characteristics are not appropriate to compare the naive algorithm and the uniform grid based ray tracing. For a reasonable comparison, the probabilistic analysis of the algorithms is needed. Probabilistic model of the virtual world To carry out the average case analysis, the scene model, i.e. the probability distribution of the possible virtual world models must be known. In practical situations, this probability distribution is not available, therefore it must be estimated. If the model of the virtual world were too complicated, we would not be able to analytically determine the average, i.e. the expected running time of the ray tracing algorithm. A simple, but also justiable model is the following: Objects are spheres of the same radius r, and sphere centres are uniformly distributed in space. Since we are interested in the asymptotic behavior when the number of objects is really high, uniform distribution in a nite space would not be feasible. On the other hand, the boundary of the space would pose problems. Thus, instead of dealing with a nite object space, the space should also be expanded as the number of objects grows to sustain constant average spatial object density. This is a classical method in probability theory, and its known result is the Poisson point process. 
Definition. A Poisson point process N(A) counts the number of points in subset A of space in a way that

- N(A) is a Poisson distribution of parameter ρV(A), where ρ is a positive constant called intensity and V(A) is the volume of A, thus the probability that A contains exactly k points is

  Pr{N(A) = k} = ((ρV(A))^k / k!) · e^(−ρV(A)),

  and the expected number of points in volume V(A) is ρV(A);
- for disjoint sets A_1, A_2, ..., A_n the random variables N(A_1), N(A_2), ..., N(A_n) are independent.

Using the Poisson point process, the probabilistic model of the virtual world is:
1. The object space consists of spheres of the same radius r.
2. The sphere centres are the realizations of a Poisson point process of intensity ρ.

Having constructed a probabilistic virtual world model, we can start the analysis of the candidate algorithms assuming that the rays are uniformly distributed in space.
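To make the definition concrete, the following Python sketch (our own illustration; the helper name `poisson_sample` is hypothetical) draws realizations of N(A) with Knuth's multiplication method and checks that the sample mean approaches the expected count ρV(A):

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's method: count how many uniform factors can be multiplied
    # before the product drops below e^(-lam); that count is Poisson(lam).
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

rng = random.Random(42)
rho, volume = 2.0, 1.5            # intensity and V(A); N(A) ~ Poisson(rho * V(A))
samples = [poisson_sample(rho * volume, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

With these (arbitrary) parameters the expected count is ρV(A) = 3, and the empirical mean of the 20000 samples is close to that value.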
Figure 14.32. Encapsulation of the intersection space by the cells of the data structure in a uniform subdivision scheme. The intersection space is a cylinder of radius r. The candidate space is the union of those spheres that may overlap a cell intersected by the ray.

Calculation of the expected number of intersections
Looking at Figure 14.32 we can see a ray that passes through certain cells of the space partitioning data structure. The collection of those sphere centres where the sphere would have an intersection with a cell is called the candidate space associated with this cell. Only those spheres of radius r can have an intersection with the ray whose centres are in a cylinder of radius r around the ray. This cylinder is called the intersection space (Figure 14.32). More precisely, the intersection space also includes two half spheres at the bottom and at the top of the cylinder, but these will be ignored. As the ray tracing algorithm traverses the data structure, it examines each cell that is intersected by the ray. If the cell is empty, then the algorithm does nothing. If the cell is not empty, then it contains, at least partially, a sphere which is tried to be intersected. This intersection succeeds if the centre of the sphere is inside the intersection space and fails if it is outside. The algorithm should try to intersect objects that are in the candidate space, but this intersection will be successful only if the object is also contained by the intersection space. The probability s of success is the ratio of the projected areas of the intersection space and the candidate space associated with this cell. From the probability of the successful intersection in a non-empty cell, the probability that the intersection is found in the first, second, etc. cells can also be computed. Assuming statistical independence, the probabilities that the first, second, third, etc.
intersection is the first successful intersection are s, (1−s)s, (1−s)²s, etc., respectively. This is a geometric distribution with expected value 1/s. Consequently, the expected number of the ray–object intersection tests is:

E[N_I] = 1/s.    (14.26)

Wait — keeping the book's numbering — the expected number of the ray–object intersection tests is:

E[N_I] = 1/s.    (14.27)

If the ray is parallel to one of the sides, then the projected size of the candidate space is c² + 4cr + r²π, where c is the edge size of a cell and r is the radius of the spheres. The other extreme case happens when the ray is parallel to the diagonal of the cubic cell, where the projection is a rounded hexagon having area √3·c² + 6cr + r²π. The
success probability is then:

r²π / (√3·c² + 6cr + r²π) ≤ s ≤ r²π / (c² + 4cr + r²π).

According to equation (14.27), the average number of intersection calculations is the reciprocal of this probability:

(1/π)·(c/r)² + (4/π)·(c/r) + 1 ≤ E[N_I] ≤ (√3/π)·(c/r)² + (6/π)·(c/r) + 1.    (14.28)

Note that if the size of the cell is equal to the diameter of the sphere (c = 2r), then 3.54 < E[N_I] < 7.03. This result has been obtained assuming that the number of objects converges to infinity. The expected number of intersection tests, however, remains finite and relatively small.

Calculation of the expected number of cell steps
In the following analysis the conditional expected value theorem will be used. An appropriate condition is the length of the ray segment between its origin and the closest intersection. Using its probability density p_t*(t) as a condition, the expected number of visited cells N_S can be written in the following form:

E[N_S] = ∫₀^∞ E[N_S | t* = t] · p_t*(t) dt,

where t* is the length of the ray and p_t* is its probability density. Since the intersection space is a cylinder if we ignore the half spheres around the beginning and the end, its total volume is r²πt. Thus the probability that an intersection occurs before t is:

Pr{t* < t} = 1 − e^(−ρr²πt).

Note that this function is the cumulative probability distribution function of t*. The probability density can be computed as its derivative, thus we obtain:

p_t*(t) = ρr²π · e^(−ρr²πt).

The expected length of the ray is then:

E[t*] = ∫₀^∞ t · ρr²π · e^(−ρr²πt) dt = 1/(ρr²π).    (14.29)

In order to simplify the analysis, we shall assume that the ray is parallel to one of the coordinate axes. Since all cells have the same edge size c, the number of cells intersected by a ray of length t can be estimated as E[N_S | t* = t] ≈ t/c + 1. This estimation is quite accurate. If the ray is parallel to one of the coordinate axes,
then the error is at most 1. In other cases the real value can be at most √3 times the given estimation. The estimated expected number of visited cells is then:

E[N_S] ≤ ∫₀^∞ (t/c + 1) · ρr²π · e^(−ρr²πt) dt = 1/(cρr²π) + 1.    (14.30)

For example, if the cell size is similar to the object size (c = 2r), and the expected number of sphere centres in a cell is 0.1, then E[N_S] ≈ 14. Note that the expected number of visited cells is also constant even for an infinite number of objects.

Expected running time and storage space
We concluded that the expected numbers of required intersection tests and visited cells are asymptotically constant, thus the expected time complexity of the uniform grid based ray tracing algorithm is constant after quadratic preprocessing time. The value of the running time can be controlled by cell size c according to equations (14.28) and (14.30). Smaller cell sizes reduce the average number of intersection tests, but increase the number of visited cells. According to the probabilistic model, the average number of objects overlapping with a cell is also constant, thus the storage is proportional to the number of cells. Since the number of cells is set proportional to the number of objects, the expected storage complexity is also linear, unlike the quadratic worst-case complexity. The expected constant running time means that asymptotically the running time is independent of the number of objects, which explains the popularity of the uniform grid based ray tracing algorithm, and also the popularity of the algorithms presented in the next subsections.

Octree
Uniform grids require many unnecessary cell steps. For example, the empty spaces are not worth partitioning into cells, and two cells are worth separating only if they contain different objects. Adaptive space partitioning schemes are based on these recognitions. The space can be partitioned adaptively following a recursive approach.
This results in a hierarchical data structure, which is usually a tree. The type of this tree is the base of the classication of such algorithms. The adaptive scheme discussed in this subsection uses an octal tree (octree for short), where non-empty nodes have 8 children. An octree is constructed by the following algorithm: For each object, an AABB is found, and object AABBs are enclosed by a scene AABB. The scene AABB is the cell corresponding to the root of the octree. If the number of objects overlapping with the current cell exceeds a predened threshold, then the cell is subdivided to 8 cells of the same size by halving the original cell along each coordinate axis. The 8 new cells are the children of the node corresponding to the original cell. The algorithm is recursively repeated for the child cells. The recursive tree building procedure terminates if the depth of the tree becomes too big, or when the number of objects overlapping with a cell is smaller than
Figure 14.33. A quadtree partitioning the plane, whose three-dimensional version is the octree. The tree is constructed by halving the cells along all coordinate axes until a cell contains just a few objects, or the cell size gets smaller than a threshold. Objects are registered in the leaves of the tree.

the threshold. The result of this construction is an octree (Figure 14.33). Overlapping objects are registered in the leaves of this tree. When a ray is traced, those leaves of the tree should be traversed which are intersected by the ray, and a ray-surface intersection test should be executed for the objects registered in these leaves:

Ray-First-Intersection-with-Octree(s, v)
 1  q ← intersection of the ray and the scene AABB
 2  while q is inside the scene AABB                     ▷ Traversal of the tree.
 3    do cell ← Octree-Cell-Search(octree root, q)
 4       t_out ← ray parameter of the intersection of the cell and the ray
 5       t ← t_out                                       ▷ Initialization: no ray-surface intersection yet.
 6       for each object o registered in cell
 7         do t_o ← Ray-Surface-Intersection(s, v, o)    ▷ Negative if no intersection exists.
 8            if 0 ≤ t_o < t                             ▷ Is the new intersection closer?
 9              then t ← t_o                             ▷ Ray parameter of the closest intersection so far.
10                   o_visible ← o                       ▷ First intersected object so far.
11       if t < t_out                                    ▷ Has been intersection at all?
12         then x ← s + v·t                              ▷ Position of the intersection.
13              return t, x, o_visible
14       q ← s + v·(t_out + ε)                           ▷ A point in the next cell.
15  return no intersection

The identification of the next cell intersected by the ray is more complicated for
116 Computer Graphics octrees than for uniform grids. The Octree-Cell-Search algorithm determines that leaf cell which contains a given point. At each level of the tree, the coordinates of the point are compared to the coordinates of the centre of the cell. The results of these comparisons determine which child contains the point. Repeating this test recursively, we arrive at a leaf sooner or later. In order to identify the next cell intersected by the ray, the intersection point of the ray and the current cell is computed. Then, ray parameter t out of this intersection point is increased a little (this little value is denoted by ε in algorithm Ray-First- Intersection-with-Octree). The increased ray parameter is substituted into the ray equation, resulting in point q that is already in the next cell. The cell containing this point can be identied with Octree-cell-search. Cells of the octree may be larger than the allowed minimal cell, therefore the octree algorithm requires less number of cell steps than the uniform grid algorithm working on the minimal cells. However, larger cells reduce the probability of the successful intersection tests since in a large cell it is less likely that a random ray intersecting the cell also intersects a contained object. Smaller successful intersection probability, on the other hand, results in greater expected number of intersection tests, which aects the performance negatively. It also means that non-empty octree cells are worth subdividing until the minimum cell size is reached even if the cell contains just a single object. Following this strategy, the size of the non-empty cells are similar, thus the results of the complexity analysis made for the uniform grid remain to be applicable to the octree as well. Since the probability of the successful intersection depends on the size of the non-empty cells, the expected number of needed intersection tests is still given by inequality (14.28). 
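The expected values in inequality (14.28) and estimate (14.30) are easy to evaluate numerically. The Python sketch below is our own illustration (the function names are hypothetical); it assumes the diagonal-ray projected candidate area √3c² + 6cr + r²π for the upper bound of E[N_I]:

```python
import math

def max_expected_intersections(c_over_r):
    # Upper bound of E[N_I] from inequality (14.28), diagonal-ray case:
    # (sqrt(3)/pi)(c/r)^2 + (6/pi)(c/r) + 1.
    q = c_over_r
    return math.sqrt(3.0) * q * q / math.pi + 6.0 * q / math.pi + 1.0

def expected_cell_steps(c, rho, r):
    # Estimate (14.30): E[N_S] <= 1 / (c * rho * r^2 * pi) + 1.
    return 1.0 / (c * rho * r * r * math.pi) + 1.0

r = 1.0
c = 2.0 * r                 # cell size equals the sphere diameter
rho = 0.1 / c ** 3          # 0.1 sphere centres per cell on average
ni_max = max_expected_intersections(c / r)   # upper bound for c = 2r
ns = expected_cell_steps(c, rho, r)          # about 14, as in the text
```

For c = 2r the upper bound evaluates to about 7.0 intersection tests, and the cell-step estimate gives roughly 14 visited cells, independently of the total number of objects.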
It also means that when the minimal cell size of an octree equals to the cell size of a uniform grid, then the expected number of intersection tests is equal in the two algorithms. The advantage of the ocree is the ability to skip empty spaces, which reduces the number of cell steps. Its disadvantage is, however, that the time of the next cell identication is not constant. This identication requires the traversal of the tree. If the tree construction is terminated when a cell contains small number of objects, then the number of leaf cells is proportional to the number of objects. The depth of the tree is in O(lg N), so is the time needed to step onto the next cell. kd-tree An octree adapts to the distribution of the objects. However, the partitioning strategy of octrees always halves the cells without taking into account where the objects are, thus the adaptation is not perfect. Let us consider a partitioning scheme which splits a cell into two cells to make the tree balanced. Such method builds a binary tree which is called binary space partitioning tree, abbreviated as BSP-tree. If the separating plane is always perpendicular to one of the coordinate axes, then the tree is called kd-tree. The separating plane of a kd-tree node can be placed in many dierent ways: the spatial median method halves the cell into two congruent cells. the object median method nds the separating plane to have the same number of objects in the two child cells.
Figure 14.34. A kd-tree. A cell containing many objects is recursively subdivided into two cells with a plane that is perpendicular to one of the coordinate axes.

the cost driven method estimates the average computation time needed when a cell is processed during ray tracing, and minimizes this value by placing the separating plane. An appropriate cost model suggests separating the cell to make the probabilities of the ray-surface intersection of the two cells similar. The probability of the ray-surface intersection can be computed using a fundamental theorem of integral geometry:

Theorem. If convex solid A contains another convex solid B, then the probability that a uniformly distributed line intersects solid B, provided that the line intersected A, equals the ratio of the surface areas of objects B and A.

According to this theorem, the cost driven method finds the separating plane to equalize the surface areas in the two children. Let us now present a general kd-tree construction algorithm. Parameter cell identifies the current cell, depth is the current depth of recursion, and coordinate stores the orientation of the current separating plane. A cell is associated with its two children (cell.right and cell.left), and its left-lower-closer and right-upper-farther corners (cell.min and cell.max). Cells also store the list of those objects which overlap with the cell. The orientation of the separating plane is determined by a round-robin scheme implemented by function Round-Robin, providing a sequence like (x, y, z, x, y, z, x, ...). When the following recursive algorithm is called first, it gets the scene AABB in variable cell, and the value of variable depth is zero:

Kd-Tree-Construction(cell, depth, coordinate)
 1  if the number of objects overlapping with cell is small or depth is large
 2    then return
 3  AABB of cell.left and AABB of cell.right ← AABB of cell
 4  if coordinate = x
 5    then cell.right.min.x ← x coordinate of the separating plane of cell
 6         cell.left.max.x ← x coordinate of the separating plane of cell
 7  else if coordinate = y
 8    then cell.right.min.y ← y coordinate of the separating plane of cell
 9         cell.left.max.y ← y coordinate of the separating plane of cell
10  else if coordinate = z
11    then cell.right.min.z ← z coordinate of the separating plane of cell
12         cell.left.max.z ← z coordinate of the separating plane of cell
13  for each object o of cell
14    do if object o is in the AABB of cell.left
15         then assign object o to the list of cell.left
16       if object o is in the AABB of cell.right
17         then assign object o to the list of cell.right
18  Kd-Tree-Construction(cell.left, depth + 1, Round-Robin(coordinate))
19  Kd-Tree-Construction(cell.right, depth + 1, Round-Robin(coordinate))

Now we discuss an algorithm that traverses the constructed kd-tree and finds the visible object. First we have to test whether the origin of the ray is inside the scene AABB. If it is not, the intersection of the ray and the scene AABB is computed, and the origin of the ray is moved there. The identification of the cell containing the ray origin requires the traversal of the tree. During the traversal the coordinates of the point are compared to the coordinates of the separating plane. This comparison determines which child should be processed recursively until a leaf node is reached. If the leaf cell is not empty, then objects overlapping with the cell are intersected with the ray, and the intersection closest to the origin is retained. The closest intersection is tested to see whether or not it is inside the cell (since an object may overlap more than one cell, it can also happen that the intersection is in another cell). If the intersection is in the current cell, then the needed intersection has been found, and the algorithm can be terminated.
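The recursive construction can be sketched in Python as follows. This is our own illustration: the `Cell` class, the representation of AABBs as (min, max) coordinate triples, the spatial-median split, and the termination constants are hypothetical choices, not the book's code.

```python
class Cell:
    def __init__(self, cmin, cmax, objects):
        self.min, self.max = list(cmin), list(cmax)   # AABB corners of the cell
        self.objects = objects      # list of object AABBs (omin, omax) overlapping it
        self.left = self.right = None

def overlaps(omin, omax, cmin, cmax):
    # Axis-aligned box overlap test on all three coordinates.
    return all(omin[i] <= cmax[i] and omax[i] >= cmin[i] for i in range(3))

def build_kd_tree(cell, depth=0, axis=0, max_depth=16, threshold=2):
    # Terminate if the cell holds few objects or the recursion is too deep.
    if len(cell.objects) <= threshold or depth >= max_depth:
        return cell                                   # leaf
    # Spatial-median split, perpendicular to the current axis (round robin).
    plane = 0.5 * (cell.min[axis] + cell.max[axis])
    lmax, rmin = list(cell.max), list(cell.min)
    lmax[axis] = plane
    rmin[axis] = plane
    left = [(a, b) for a, b in cell.objects if overlaps(a, b, cell.min, lmax)]
    right = [(a, b) for a, b in cell.objects if overlaps(a, b, rmin, cell.max)]
    cell.left = build_kd_tree(Cell(cell.min, lmax, left),
                              depth + 1, (axis + 1) % 3, max_depth, threshold)
    cell.right = build_kd_tree(Cell(rmin, cell.max, right),
                               depth + 1, (axis + 1) % 3, max_depth, threshold)
    return cell
```

Note that an object straddling the plane is assigned to both children, exactly as lines 13–17 of the pseudocode prescribe.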
If the cell is empty, or no intersection is found in the cell, then the algorithm should proceed with the next cell. To identify the next cell, the ray is intersected with the current cell identifying the ray parameter of the exit point. Then the ray parameter is increased a little to make sure that the increased ray parameter corresponds to a point in the next cell. The algorithm keeps repeating these steps as it processes the cells of the tree. This method has the disadvantage that the cell search always starts at the root, which results in the repetitive traversals of the same nodes of the tree. This disadvantage can be eliminated by putting the cells to be visited into a stack, and backtracking only to the point where a new branch should be followed. When the ray arrives at a node having two children, the algorithm decides the order of processing the two child nodes. Child nodes are classied as near and far depending on whether or not the child cell is on the same side of the separating plane as the origin of the ray. If the ray intersects only the near child, then the algorithm processes only that subtree which originates at this child. If the ray intersects both
children, then the algorithm pushes the far node onto the stack and starts processing the near node. If no intersection exists in the near node, then the stack is popped to obtain the next node to be processed. The notations of the ray tracing algorithm based on kd-tree traversal are shown in Figure 14.35. The algorithm is the following:

Ray-First-Intersection-with-kd-Tree(root, s, v)
 1  (t_in, t_out) ← Ray-AABB-Intersection(s, v, root)    ▷ Intersection with the scene AABB.
 2  if no intersection
 3    then return no intersection
 4  Push(root, t_in, t_out)
 5  while the stack is not empty                          ▷ Visit all nodes.
 6    do Pop(cell, t_in, t_out)
 7       while cell is not a leaf
 8         do coordinate ← orientation of the separating plane of the cell
 9            d ← cell.right.min[coordinate] − s[coordinate]
10            t ← d / v[coordinate]                       ▷ Ray parameter of the separating plane.
11            if d > 0                                    ▷ Is s on the left side of the separating plane?
12              then (near, far) ← (cell.left, cell.right)  ▷ Left.
13              else (near, far) ← (cell.right, cell.left)  ▷ Right.
14            if t > t_out or t < 0
15              then cell ← near                          ▷ The ray intersects only the near cell.
16              else if t < t_in
17                then cell ← far                         ▷ The ray intersects only the far cell.
18                else Push(far, t, t_out)                ▷ The ray intersects both cells.
19                     cell ← near                        ▷ First near is intersected.
20                     t_out ← t                          ▷ The ray exits at t from the near cell.
      ▷ Here the current cell is a leaf.
21       t ← t_out                                        ▷ Maximum ray parameter in this cell.
22       for each object o of cell
23         do t_o ← Ray-Surface-Intersection(s, v, o)     ▷ Negative if no intersection exists.
24            if t_in ≤ t_o < t                           ▷ Is the new intersection closer to the ray origin?
25              then t ← t_o                              ▷ The ray parameter of the closest intersection so far.
26                   o_visible ← o                        ▷ The object intersected closest to the ray origin.
27       if t < t_out                                     ▷ Has been intersection at all in the cell?
28         then x ← s + v·t                               ▷ The intersection point.
29              return t, x, o_visible                    ▷ Intersection has been found.
30  return no intersection                                ▷ No intersection.
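A Python port of this traversal can clarify the near/far bookkeeping. The sketch below is our own illustration under simplifying assumptions: a `KdNode` stores only the split axis, the plane coordinate, and for leaves a list of spheres, and the caller supplies the scene-AABB ray parameters t_in, t_out directly instead of calling Ray-AABB-Intersection.

```python
import math

class KdNode:
    # Leaf: axis is None and objects lists spheres (centre, radius).
    # Inner node: axis in {0,1,2}, plane is the separating coordinate.
    def __init__(self, axis=None, plane=None, left=None, right=None, objects=()):
        self.axis, self.plane = axis, plane
        self.left, self.right = left, right
        self.objects = list(objects)

def ray_sphere(s, v, centre, r):
    # Smallest non-negative ray parameter of the ray-sphere intersection; -1 if none.
    oc = [s[i] - centre[i] for i in range(3)]
    b = 2.0 * sum(v[i] * oc[i] for i in range(3))
    c = sum(x * x for x in oc) - r * r
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return -1.0
    sq = math.sqrt(disc)
    for t in ((-b - sq) / 2.0, (-b + sq) / 2.0):
        if t >= 0.0:
            return t
    return -1.0

def kd_first_hit(root, s, v, t_in, t_out):
    # Stack-based traversal: descend to the near child first and push the
    # far child whenever the ray crosses the separating plane (lines 7-20).
    stack = [(root, t_in, t_out)]
    while stack:
        cell, t_in, t_out = stack.pop()
        while cell.axis is not None:
            d = cell.plane - s[cell.axis]
            t = d / v[cell.axis] if v[cell.axis] != 0.0 else math.inf
            near, far = (cell.left, cell.right) if d > 0 else (cell.right, cell.left)
            if t > t_out or t < 0:
                cell = near              # only the near cell is intersected
            elif t < t_in:
                cell = far               # only the far cell is intersected
            else:
                stack.append((far, t, t_out))
                cell, t_out = near, t    # both: near first, far deferred
        t, hit = t_out, None             # leaf: test the registered objects
        for centre, r in cell.objects:
            t_o = ray_sphere(s, v, centre, r)
            if t_in <= t_o < t:
                t, hit = t_o, (centre, r)
        if hit is not None:
            return t, hit                # intersection inside this leaf
    return None, None
```

A one-split tree already exercises every branch: a ray crossing the plane visits the near leaf, and falls back to the pushed far leaf only when the near leaf yields no valid hit.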
Similarly to the octree algorithm, the likelihood of successful intersections can be increased by continuing the tree building process until all empty spaces are cut
Figure 14.35. Notations and cases of algorithm Ray-First-Intersection-with-kd-Tree. t_in, t_out, and t are the ray parameters of the entry, exit, and the separating plane, respectively. d is the signed distance between the ray origin and the separating plane.

Figure 14.36. Kd-tree based space partitioning with empty space cutting.

(Figure 14.36). Our probabilistic world model contains spheres of the same radius r, thus the non-empty cells are cubes of edge size c = 2r. Unlike in uniform grids or octrees, the separating planes of kd-trees are not independent of the objects. Kd-tree splitting planes are rather tangents of the objects. This means that we do not have to be concerned with partially overlapping spheres, since a sphere is completely contained by a cell in a kd-tree. The probability of the successful intersection is obtained by applying the theorem of integral geometry above. In the current case, the containing convex solid is a cube of edge size 2r, the contained solid is a sphere of radius r, thus the intersection probability is:

s = 4r²π / (6c²) = 4r²π / (24r²) = π/6.

The expected number of intersection tests is then:

E[N_I] = 6/π ≈ 1.91.
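This ratio can be checked with a few lines of Python (our own illustration; the function name is hypothetical). The containing solid is the cube of edge 2r and the contained solid is the inscribed sphere of radius r, so the surface-area ratio of the integral-geometry theorem is independent of r:

```python
import math

def kd_intersection_probability(r):
    # Surface-area ratio of the theorem: sphere of radius r inside a cube of edge 2r.
    sphere_area = 4.0 * math.pi * r * r       # contained convex solid B
    cube_area = 6.0 * (2.0 * r) ** 2          # containing convex solid A
    return sphere_area / cube_area

s = kd_intersection_probability(1.0)          # equals pi/6 for any r
expected_tests = 1.0 / s                      # E[N_I] = 6/pi
```

The probability evaluates to π/6 ≈ 0.524, giving fewer than two expected ray-surface intersection tests per ray.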
We can conclude that the kd-tree algorithm requires the smallest number of ray-surface intersection tests according to the probabilistic model.

Exercises
14.6-1. Prove that the expected number of intersection tests is constant in all those ray tracing algorithms which process objects in the order of their distance from the ray origin.
14.6-2. Propose a ray intersection algorithm for subdivision surfaces.
14.6-3. Develop a ray intersection method for B-spline surfaces.
14.6-4. Develop a ray intersection algorithm for CSG models, assuming that the ray-primitive intersection tests are already available.
14.6-5. Propose a ray intersection algorithm for transformed objects, assuming that the algorithm computing the intersection with the non-transformed objects is available (hint: transform the ray).

14.7. Incremental rendering
Rendering requires the identification of those surface points that are visible through the pixels of the virtual camera. Ray tracing solves this visibility problem for each pixel independently, thus it does not reuse visibility information gathered at other pixels. The algorithms of this section, however, exploit such information using the following simple techniques:
1. They simultaneously attack the visibility problem for all pixels, and handle larger parts of the scene at once.
2. Where feasible, they exploit the incremental principle, which is based on the recognition that the visibility problem becomes simpler to solve if the solution at the neighbouring pixel is taken into account.
3. They solve each task in that coordinate system which makes the solution easier. The scene is transformed from one coordinate system to the other by homogeneous linear transformations.
4. They minimize unnecessary computations, therefore they remove by clipping, in an early stage of rendering, those objects which cannot be projected onto the window of the camera.
Homogeneous linear transformations and clipping may change the type of the surface except for points, line segments and polygons 4. Therefore, before rendering is started, each shape is approximated by points, line segments, and meshes (Subsection 14.3). Steps of incremental rendering are shown in Figure Objects are dened in their reference state, approximated by meshes, and are transformed to the virtual world. The time dependence of this transformation is responsible for object animation. The image is taken from the camera about the virtual world, which requires 4 Although Bézier and B-Spline curves and surfaces are invariant to ane transformations, and NURBS is invariant even to homogeneous linear transformations, but clipping changes these object types as well.
122 Computer Graphics (a) Modelling (b) Tessellation (c) Modelling transformation (d) Camera transformation (e) Perspective transformation (f) Clipping (g) Hidden surface elimination (h) Projection and shading Figure Steps of incremental rendering. (a) Modelling denes objects in their reference state. (b) Shapes are tessellated to prepare for further processing. (c) Modelling transformation places the object in the world coordinate system. (d) Camera transformation translates and rotates the scene to get the eye to be at the origin and to look parallel with axis z. (e) Perspective transformation converts projection lines meeting at the origin to parallel lines, that is, it maps the eye position onto an ideal point. (f) Clipping removes those shapes and shape parts, which cannot be projected onto the window. (g) Hidden surface elimination removes those surface parts that are occluded by other shapes. (h) Finally, the visible polygons are projected and their projections are lled with their visible colours.
123 14.7. Incremental rendering 1073 up bp lookat fp v fov eye u w z Figure Parameters of the virtual camera: eye position eye, target lookat, and vertical direction up, from which camera basis vectors u, v, w are obtained, front f p and back b p clipping planes, and vertical eld of view fov (the horizontal eld of view is computed from aspect ratio aspect). y x the identication of those surface points that are visible from the camera, and their projection onto the window plane. The visibility and projection problems could be solved in the virtual world as happens in ray tracing, but this would require the intersection calculations of general lines and polygons. Visibility and projection algorithms can be simplied if the scene is transformed to a coordinate system, where the X, Y coordinates of a point equal to the coordinates of that pixel onto which this point is projected, and the Z coordinate can be used to decide which point is closer if more than one surfaces are projected onto the same pixel. Such coordinate system is called the screen coordinate system. In screen coordinates the units of axes X and Y are equal to the pixel size. Since it is usually not worth computing the image on higher accuracy than the pixel size, coordinates X, Y are integers. Because of performance reasons, coordinate Z is also often integer. Screen coordinates are denoted by capital letters. The transformation taking to the screen coordinate system is dened by a sequence of transformations, and the elements of this sequence are discussed separately. However, this transformation is executed as a single multiplication with a 4 4 transformation matrix obtained as the product of elementary transformation matrices Camera transformation Rendering is expected to generate an image from a camera dened by eye position ( eye) (the focal point of the camera), looking target ( lookat) where the camera looks at, and by vertical direction up (Figure 14.38). 
Camera parameter fov denes the vertical eld of view, aspect is the ratio of the width and the height of the window, f p and b p are the distances of the front and back clipping planes from the eye, respectively. These clipping planes allow to remove those objects that are behind, too close to, or too far from the eye.
Figure 14.39. The normalizing transformation sets the field of view to 90 degrees.

We assign a coordinate system, i.e. three orthogonal unit basis vectors, to the camera. Horizontal basis vector u = (u_x, u_y, u_z), vertical basis vector v = (v_x, v_y, v_z), and basis vector w = (w_x, w_y, w_z) pointing to the looking direction are obtained as follows:

w = (eye − lookat) / |eye − lookat|,    u = (up × w) / |up × w|,    v = w × u.

The camera transformation translates and rotates the space of the virtual world in order to get the camera to move to the origin, to look parallel with axis z, and to have vertical direction parallel to axis y, that is, this transformation maps unit vectors u, v, w to the basis vectors of the coordinate system. Transformation matrix T_camera can be expressed as the product of a matrix translating the eye to the origin and a matrix rotating basis vectors u, v, w of the camera to the basis vectors of the coordinate system:

[x′, y′, z′, 1] = [x, y, z, 1] · T_camera = [x, y, z, 1] · T_translation · T_rotation,    (14.31)

where

T_translation =
| 1       0       0       0 |
| 0       1       0       0 |
| 0       0       1       0 |
| −eye_x  −eye_y  −eye_z  1 |

T_rotation =
| u_x  v_x  w_x  0 |
| u_y  v_y  w_y  0 |
| u_z  v_z  w_z  0 |
| 0    0    0    1 |

Let us note that the columns of the rotation matrix are vectors u, v, w. Since these vectors are orthogonal, it is easy to see that this rotation maps them to the coordinate axes x, y, z. For example, the rotation of vector u is:

[u_x, u_y, u_z, 1] · T_rotation = [u·u, u·v, u·w, 1] = [1, 0, 0, 1].

Normalizing transformation
In the next step, the viewing pyramid containing those points which can be projected onto the window is normalized, making the field of view equal to 90 degrees (Figure 14.39).
Figure 14.40. The perspective transformation maps the finite frustum of pyramid defined by the front and back clipping planes, and the edges of the window onto an axis aligned, origin centred cube of edge size 2.

Normalization is a simple scaling transformation:

T_norm =
| 1/(tan(fov/2) · aspect)  0             0  0 |
| 0                        1/tan(fov/2)  0  0 |
| 0                        0             1  0 |
| 0                        0             0  1 |

The main reason of this transformation is to simplify the formulae of the next transformation step, called perspective transformation.

Perspective transformation
The perspective transformation distorts the virtual world to allow the replacement of the perspective projection by parallel projection during rendering. After the normalizing transformation, the potentially visible points are inside a symmetrical finite frustum of pyramid of 90 degree apex angle (Figure 14.39). The perspective transformation maps this frustum onto a cube, converting projection lines crossing the origin to lines parallel to axis z (Figure 14.40). Perspective transformation is expected to map point to point, line to line, but to map the eye position to infinity. It means that perspective transformation cannot be a linear transformation of Cartesian coordinates. Fortunately, homogeneous linear transforms also map point to point, line to line, and are able to handle points at infinity with finite coordinates. Let us thus try to find the perspective transformation in the form of a homogeneous linear transformation defined by a 4×4 matrix:

T_persp =
| t11  t12  t13  t14 |
| t21  t22  t23  t24 |
| t31  t32  t33  t34 |
| t41  t42  t43  t44 |

Figure 14.40 shows a line (projection ray) and its transform. Let m_x and m_y be the x/z and the y/z slopes of the line, respectively. This line is defined by equation [−m_x·z, −m_y·z, z] in the normalized camera space. The perspective transformation maps this line to a horizontal line crossing point [m_x, m_y, 0] and being parallel
to axis z. Let us examine the intersection points of this line with the front and back clipping planes, that is, let us substitute (−f_p) and (−b_p) into parameter z of the line equation. The transformation should map these points to [m_x, m_y, −1] and [m_x, m_y, 1], respectively.

The perspective transformation of the point on the first clipping plane is:

    [m_x · f_p, m_y · f_p, −f_p, 1] · T_persp = [m_x, m_y, −1, 1] · λ ,

where λ is an arbitrary, non-zero scalar since the point defined by homogeneous coordinates does not change if the homogeneous coordinates are simultaneously multiplied by a non-zero scalar. Setting λ to f_p, we get:

    [m_x · f_p, m_y · f_p, −f_p, 1] · T_persp = [m_x · f_p, m_y · f_p, −f_p, f_p] .    (14.32)

Note that the first coordinate of the transformed point equals the first coordinate of the original point on the clipping plane for arbitrary m_x, m_y, and f_p values. This is possible only if the first column of matrix T_persp is [1, 0, 0, 0]^T. Using the same argument for the second coordinate, we can conclude that the second column of the matrix is [0, 1, 0, 0]^T. Furthermore, in equation (14.32) the third and the fourth homogeneous coordinates of the transformed point are not affected by the first and the second coordinates of the original point, requiring t_13 = t_14 = t_23 = t_24 = 0. The conditions on the third and the fourth homogeneous coordinates can be formalized by the following equations:

    −f_p · t_33 + t_43 = −f_p ,    −f_p · t_34 + t_44 = f_p .

Applying the same procedure for the intersection point of the projection line and the back clipping plane, we can obtain two other equations:

    −b_p · t_33 + t_43 = b_p ,    −b_p · t_34 + t_44 = b_p .

Solving this system of linear equations, the matrix of the perspective transformation can be expressed as:

    T_persp = [ 1  0  0                          0  ]
              [ 0  1  0                          0  ]
              [ 0  0  −(f_p + b_p)/(b_p − f_p)  −1  ]
              [ 0  0  −2·f_p·b_p/(b_p − f_p)     0  ] .

Since the perspective transformation is not affine, the fourth homogeneous coordinate of the transformed point is usually not 1.
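As a quick check of the derived matrix, the following Python sketch (ours, not the book's; it uses the same row-vector-times-matrix convention as the text) builds T_persp and verifies that points on the front and back clipping planes land on the Z = −1 and Z = 1 faces after homogeneous division:

```python
def persp_matrix(fp, bp):
    # T_persp derived above, for the row-vector convention [x, y, z, 1] . T
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, -(fp + bp) / (bp - fp), -1.0],
            [0.0, 0.0, -2.0 * fp * bp / (bp - fp), 0.0]]

def transform(p, M):
    # row vector times 4x4 matrix
    return tuple(sum(p[i] * M[i][j] for i in range(4)) for j in range(4))

def cartesian(ph):
    # homogeneous division: divide the first three coordinates by the fourth
    return tuple(c / ph[3] for c in ph[:3])
```

For f_p = 1 and b_p = 5, point (0, 0, −1) on the front plane maps to (0, 0, −1) and point (0, 0, −5) on the back plane to (0, 0, 1), and the x, y coordinates of a front-plane point are preserved before the division, as the derivation requires.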
If we wish to express the coordinates of the transformed point in Cartesian coordinates, the first three homogeneous coordinates should be divided by the fourth coordinate. Homogeneous linear transforms map line segment to line segment and triangle to triangle, but it may happen that the resulting line segment or triangle contains ideal points (Subsection ). The intuition behind the homogeneous division is a travel from the projective space to the Euclidean space, which converts a line segment containing an ideal point to two half lines. If just the two endpoints of the line segment are transformed, then it is ambiguous whether the two transformed points need to be connected by
a line segment or the complement of this line segment should be considered as the result of the transformation. This ambiguity is called the wrap around problem.

The wrap around problem does not occur if we can somehow make sure that the original shape does not contain points that might be mapped onto ideal points. Examining the matrix of the perspective transformation we can conclude that the fourth homogeneous coordinate of the transformed point will be equal to the −z coordinate of the original point. Ideal points having zero fourth homogeneous coordinate (h = 0) may thus be obtained transforming the points of plane z = 0, i.e. the plane crossing the origin and parallel to the window. However, if the shapes are clipped onto a first clipping plane being in front of the eye, then these points are removed. Thus the solution of the wrap around problem is the execution of the clipping step before the homogeneous division.

Clipping in homogeneous coordinates

The purpose of clipping is to remove all shapes that either cannot be projected onto the window or are not between the front and back clipping planes. To solve the wrap around problem, clipping should be executed before the homogeneous division. The clipping boundaries in homogeneous coordinates can be obtained by transforming the screen coordinate AABB back to homogeneous coordinates. In screen coordinates, i.e. after homogeneous division, the points to be preserved by clipping meet the following inequalities:

    −1 ≤ X = X_h/h ≤ 1 ,    −1 ≤ Y = Y_h/h ≤ 1 ,    −1 ≤ Z = Z_h/h ≤ 1 .    (14.33)

On the other hand, points that are in front of the eye after the camera transformation have negative z coordinates, and the perspective transformation makes the fourth homogeneous coordinate h equal to −z in normalized camera space. Thus the fourth homogeneous coordinate of points in front of the eye is always positive. Let us thus add condition h > 0 to the set of conditions of inequalities (14.33).
If h is positive, then inequalities (14.33) can be multiplied by h, resulting in the definition of the clipping region in homogeneous coordinates:

    −h ≤ X_h ≤ h ,    −h ≤ Y_h ≤ h ,    −h ≤ Z_h ≤ h .    (14.34)

Points can be clipped easily, since we should only test whether or not the conditions of inequalities (14.34) are met. Clipping line segments and polygons, on the other hand, requires the computation of the intersection points with the faces of the clipping boundary, and only those parts should be preserved which meet inequalities (14.34).

Clipping algorithms using Cartesian coordinates were discussed in Subsection . Those methods can also be applied in homogeneous coordinates with two exceptions. Firstly, for homogeneous coordinates, inequalities (14.34) define whether a point is in or out. Secondly, intersections should be computed using the homogeneous coordinate equations of the line segments and the planes.

Let us consider a line segment with endpoints [X_h^1, Y_h^1, Z_h^1, h^1] and [X_h^2, Y_h^2, Z_h^2, h^2]. This line segment can be an independent shape or an edge of a
polygon. Here we discuss the clipping on the half space of equation X_h ≤ h (clipping methods on other half spaces are very similar). Three cases need to be distinguished:

1. If both endpoints of the line segment are inside, that is X_h^1 ≤ h^1 and X_h^2 ≤ h^2, then the complete line segment is in, thus is preserved.

2. If both endpoints are outside, that is X_h^1 > h^1 and X_h^2 > h^2, then all points of the line segment are out, thus it is completely eliminated by clipping.

3. If one endpoint is outside, while the other is in, then the intersection of the line segment and the clipping plane should be obtained. Then the endpoint being out is replaced by the intersection point.

Since the points of a line segment satisfy equation (14.19), while the points of the clipping plane satisfy equation X_h = h, parameter t_i of the intersection point is computed as:

    X_h(t_i) = h(t_i)  ⇒  X_h^1 · (1 − t_i) + X_h^2 · t_i = h^1 · (1 − t_i) + h^2 · t_i  ⇒  t_i = (X_h^1 − h^1) / (X_h^1 − X_h^2 + h^2 − h^1) .

Substituting parameter t_i into the equation of the line segment, homogeneous coordinates [X_h^i, Y_h^i, Z_h^i, h^i] of the intersection point are obtained.

Clipping may introduce new vertices. When the vertices have some additional features, for example, the surface colour or normal vector at these vertices, then these additional features should be calculated for the new vertices as well. We can use linear interpolation. If the values of a feature at the two endpoints are I^1 and I^2, then the feature value at new vertex [X_h(t_i), Y_h(t_i), Z_h(t_i), h(t_i)] generated by clipping is I^1 · (1 − t_i) + I^2 · t_i.

Viewport transformation

Having executed the perspective transformation, the Cartesian coordinates of the visible points are in [−1, 1]. These normalized device coordinates should be further scaled and translated according to the resolution of the screen and the position of the viewport where the image is expected.
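The three-case analysis of segment clipping on the half space X_h ≤ h can be sketched as follows (our own illustrative Python, not the book's code); the intersection parameter is the t_i formula derived above:

```python
def clip_to_plane(p1, p2):
    """Clip segment p1-p2 (homogeneous 4-tuples) to the half space X_h <= h.
    Returns the preserved segment as a pair of points, or None."""
    in1, in2 = p1[0] <= p1[3], p2[0] <= p2[3]
    if in1 and in2:
        return p1, p2                       # case 1: completely in
    if not in1 and not in2:
        return None                         # case 2: completely out
    # case 3: parameter of the intersection with plane X_h = h
    t = (p1[0] - p1[3]) / (p1[0] - p2[0] + p2[3] - p1[3])
    pi = tuple(a * (1.0 - t) + b * t for a, b in zip(p1, p2))
    return (p1, pi) if in1 else (pi, p2)
```

The new endpoint satisfies X_h = h up to rounding, so any additional vertex features can be interpolated with the same parameter t.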
Denoting the left-bottom corner pixel of the screen viewport by (Xmin, Ymin), the right-top corner by (Xmax, Ymax), and assuming that Z coordinates expressing the distance from the eye are expected in (Zmin, Zmax), the matrix of the viewport transformation is:

    T_viewport = [ (Xmax − Xmin)/2  0                0                0 ]
                 [ 0                (Ymax − Ymin)/2  0                0 ]
                 [ 0                0                (Zmax − Zmin)/2  0 ]
                 [ (Xmax + Xmin)/2  (Ymax + Ymin)/2  (Zmax + Zmin)/2  1 ] .

Coordinate systems after the perspective transformation are left handed, unlike the coordinate systems of the virtual world and the camera, which are right handed. Left handed coordinate systems seem to be unusual, but they meet our natural expectation that screen coordinate X grows from left to right, the Y coordinate from bottom to top, and the Z coordinate grows in the direction of the camera target.
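A sketch of the viewport transformation as a plain function (ours; it applies the matrix entries above directly instead of a full matrix product):

```python
def viewport_transform(X, Y, Z, Xmin, Ymin, Xmax, Ymax, Zmin=0.0, Zmax=1.0):
    # scale and translate [-1, 1] normalized device coordinates to the viewport
    return ((Xmax - Xmin) / 2.0 * X + (Xmax + Xmin) / 2.0,
            (Ymax - Ymin) / 2.0 * Y + (Ymax + Ymin) / 2.0,
            (Zmax - Zmin) / 2.0 * Z + (Zmax + Zmin) / 2.0)
```

The corners of the normalized cube land on the corners of the viewport, e.g. (−1, −1, −1) maps to (Xmin, Ymin, Zmin).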
Rasterization algorithms

After clipping, homogeneous division, and viewport transformation, shapes are in the screen coordinate system, where a point of coordinates (X, Y, Z) can be assigned to a pixel by extracting the first two Cartesian coordinates (X, Y). Rasterization works in the screen coordinate system and identifies those pixels which have to be coloured to approximate the projected shape. Since even simple shapes can cover many pixels, rasterization algorithms should be very fast and should be appropriate for hardware implementation.

Line drawing

Let the endpoints of a line segment be (X_1, Y_1) and (X_2, Y_2) in screen coordinates. Let us further assume that while we are going from the first endpoint towards the second, both coordinates are growing, and X is the faster changing coordinate, that is,

    ΔX = X_2 − X_1 ≥ ΔY = Y_2 − Y_1 ≥ 0 .

In this case the line segment is moderately ascending. We discuss only this case; other cases can be handled by exchanging the X, Y coordinates and replacing additions by subtractions.

Line drawing algorithms are expected to find pixels that approximate a line in a way that there are no holes and the approximation is not fatter than necessary. In case of moderately ascending line segments this means that in each pixel column exactly one pixel should be filled with the colour of the line. This coloured pixel is the one closest to the line in this column. Using the following equation of the line,

    y = m · X + b ,  where  m = (Y_2 − Y_1)/(X_2 − X_1)  and  b = Y_1 − X_1 · (Y_2 − Y_1)/(X_2 − X_1) ,    (14.35)

in the pixel column of coordinate X, the pixel closest to the line has the Y coordinate that is equal to the rounding of m · X + b. Unfortunately, the determination of Y requires a floating point multiplication, an addition, and a rounding operation, which are too slow. In order to speed up line drawing, we apply a fundamental trick of computer graphics, the incremental principle.
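The per-column rounding of equation (14.35) can be sketched in Python (our own code with illustrative names); the y update already anticipates the incremental principle by adding the slope instead of re-evaluating the line equation:

```python
def dda_line(x1, y1, x2, y2):
    """Pixels of a moderately ascending segment (assumes 0 <= dy <= dx)."""
    m = (y2 - y1) / (x2 - x1)
    y = float(y1)
    pixels = []
    for x in range(x1, x2 + 1):
        pixels.append((x, int(y + 0.5)))  # classic rounding; Python round() ties to even
        y += m                            # incremental principle: one addition per column
    return pixels
```

For the segment (0, 0)-(4, 2) this yields exactly one pixel per column, e.g. (0, 0), (1, 1), (2, 1), (3, 2), (4, 2).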
The incremental principle is based on the recognition that it is usually simpler to evaluate a function y(X + 1) using value y(X) than computing it from X. Since during line drawing the columns are visited one by one, when column (X + 1) is processed, value y(X) is already available. In case of a line segment we can write:

    y(X + 1) = m · (X + 1) + b = m · X + b + m = y(X) + m .

Note that the evaluation of this formula requires just a single floating point addition (m is less than 1). This fact is exploited in digital differential analyzer algorithms (DDA algorithms). The DDA line drawing algorithm is then:
DDA-Line-Drawing(X_1, Y_1, X_2, Y_2, colour)
 1  m ← (Y_2 − Y_1)/(X_2 − X_1)
 2  y ← Y_1
 3  for X ← X_1 to X_2
 4      do Y ← Round(y)
 5         Pixel-Write(X, Y, colour)
 6         y ← y + m

Further speedups can be obtained using fixed point number representation. This means that the product of the number and 2^T is stored in an integer variable, where T is the number of fractional bits. The number of fractional bits should be set to exclude cases when the rounding errors accumulate to an incorrect result during long iteration sequences. If the longest line segment covers L columns, then the minimum number of fractional bits guaranteeing that the accumulated error is less than 1 is log_2 L. Thanks to clipping, only lines fitting to the screen are rasterized, thus L is equal to the maximum screen resolution.

The performance and simplicity of the DDA line drawing algorithm can still be improved. On the one hand, the software implementation of the DDA algorithm requires shift operations to realize truncation and rounding operations. On the other hand, once for every line segment, the computation of slope m involves a division, which is computationally expensive. Both problems are solved in the Bresenham line drawing algorithm.

Figure 14.41. Notations of the Bresenham algorithm: s is the signed distance between the closest pixel centre and the line segment along axis Y, which is positive if the line segment is above the pixel centre. t is the distance along axis Y between the pixel centre just above the closest pixel and the line segment.

Let us denote the vertical, signed distance of the line segment and the closest pixel centre by s, and the vertical distance of the line segment and the pixel centre just above the closest pixel by t (Figure 14.41). As the algorithm steps onto the next pixel column, values s and t change and should be recomputed. While the new s and t values satisfy inequality s < t, that is, while the lower pixel is still closer
to the line segment, the shaded pixel of the next column is in the same row as in the previous column. Introducing error variable e = s − t, the row of the shaded pixel remains the same until this error variable is negative (e < 0). As the pixel column is incremented, variables s, t, e are updated using the incremental formulae (ΔX = X_2 − X_1, ΔY = Y_2 − Y_1):

    s(X + 1) = s(X) + ΔY/ΔX ,    t(X + 1) = t(X) − ΔY/ΔX    ⇒    e(X + 1) = e(X) + 2 · ΔY/ΔX .

These formulae are valid if the closest pixel in column (X + 1) is in the same row as in column X. If, stepping to the next column, the upper pixel gets closer to the line segment (error variable e becomes positive), then variables s, t, e should be recomputed for the new closest row and for the pixel just above it. The formulae describing this case are as follows:

    s(X + 1) = s(X) + ΔY/ΔX − 1 ,    t(X + 1) = t(X) − ΔY/ΔX + 1    ⇒    e(X + 1) = e(X) + 2 · (ΔY/ΔX − 1) .

Note that s is a signed distance which is negative if the line segment is below the closest pixel centre, and positive otherwise. We can assume that the line starts at a pixel centre, thus the initial values of the control variables are:

    s(X_1) = 0 ,    t(X_1) = 1    ⇒    e(X_1) = s(X_1) − t(X_1) = −1 .

This algorithm keeps updating error variable e and steps onto the next pixel row when the error variable becomes positive. In this case, the error variable is decreased to have a negative value again. The update of the error variable requires a non-integer addition, and the computation of its increment involves a division, similarly to the DDA algorithm. It seems that this approach is not better than the DDA. Let us note, however, that the sign changes of the error variable can also be recognized if we examine the product of the error variable and a positive number. Multiplying the error variable by ΔX we obtain decision variable E = e · ΔX. In case of moderately ascending lines the decision and error variables change their sign simultaneously.
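Scaling the error variable by ΔX as suggested above yields an integer-only routine. The following Python sketch (ours, not the book's pseudocode) starts from E(X_1) = e(X_1) · ΔX = −ΔX and uses the increments 2ΔY and 2(ΔY − ΔX):

```python
def bresenham_line(x1, y1, x2, y2):
    """Moderately ascending case (0 <= dy <= dx), integer arithmetic only."""
    dx, dy = x2 - x1, y2 - y1
    de_minus, de_plus = 2 * dy, 2 * (dy - dx)
    E = -dx                    # E(X1) = e(X1) * dx = -dx
    y = y1
    pixels = []
    for x in range(x1, x2 + 1):
        if E <= 0:
            E += de_minus      # the line stays in the current pixel row
        else:
            E += de_plus       # the line steps onto the next pixel row
            y += 1
        pixels.append((x, y))
    return pixels
```

The loop body uses only integer additions and comparisons; for (0, 0)-(4, 2) it visits one pixel per column and ends exactly at (4, 2).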
The incremental update formulae of the decision variable can be obtained by multiplying the update formulae of the error variable by ΔX:

    E(X + 1) = E(X) + 2 · ΔY ,           if Y is not incremented,
    E(X + 1) = E(X) + 2 · (ΔY − ΔX) ,    if Y needs to be incremented.

The initial value of the decision variable is E(X_1) = e(X_1) · ΔX = −ΔX. The decision variable starts at an integer value and is incremented by integers in each step, thus it remains an integer and does not require fractional numbers at all. The computation of the increments needs only integer additions or subtractions and multiplications by 2. The complete Bresenham line drawing algorithm is:
Bresenham-Line-Drawing(X_1, Y_1, X_2, Y_2, colour)
 1  ΔX ← X_2 − X_1
 2  ΔY ← Y_2 − Y_1
 3  (dE+, dE−) ← (2 · (ΔY − ΔX), 2 · ΔY)
 4  E ← −ΔX
 5  Y ← Y_1
 6  for X ← X_1 to X_2
 7      do if E ≤ 0
 8            then E ← E + dE−    The line stays in the current pixel row.
 9            else E ← E + dE+    The line steps onto the next pixel row.
10                 Y ← Y + 1
11         Pixel-Write(X, Y, colour)

The fundamental idea of the Bresenham algorithm was the replacement of the fractional error variable by an integer decision variable in a way that the conditions used by the algorithm remained equivalent. This approach is also called the method of invariants, which is useful in many rasterization algorithms.

Polygon fill

Figure 14.42. Polygon fill. Pixels inside the polygon are identified scan line by scan line.

The input of an algorithm filling single connected polygons is the array of vertices q[0], ..., q[m − 1] (this array is usually the output of the polygon clipping algorithm). Edge e of the polygon connects vertices q[e] and q[e + 1]. The last vertex needs not be treated in a special way if the first vertex is put again after the last vertex in the array. Multiply connected polygons are defined by more than one closed polyline, thus are specified by more than one vertex array.

The filling is executed by processing a horizontal pixel row called scan line at a time. For a single scan line, the pixels belonging to the interior of the polygon can be found by the following steps. First the intersections of the polygon edges and the scan line are calculated. Then the intersection points are sorted in the ascending order of their X coordinates. Finally, pixels between the first and the second intersection points, and between the third and the fourth intersection points,
or generally between the (2i + 1)th and the (2i + 2)th intersection points are set to the colour of the polygon (Figure 14.42). This algorithm fills those pixels which can be reached from infinity by crossing the polygon boundary an odd number of times.

Figure 14.43. Incremental computation of the intersections between the scan lines and the edges. Coordinate X always increases by the reciprocal of the slope of the line.

Figure 14.44. The structure of the active edge table.

The computation of the intersections between scan lines and polygon edges can be speeded up using the following observations:

1. An edge and a scan line can have intersection only if coordinate Y of the scan line is between the minimum and maximum Y coordinates of the edge. Such edges are the active edges. When implementing this idea, an active edge table (AET for short) is needed which stores the currently active edges.

2. The computation of the intersection point of a line segment and the scan line requires floating point multiplication, division, and addition, thus it is time consuming. Applying the incremental principle, however, we can also obtain the intersection point of the edge and a scan line from the intersection point with the previous scan line using a single, fixed-point addition (Figure 14.43).

When the incremental principle is exploited, we realize that coordinate X of the intersection with an edge always increases by the same amount when scan line Y is incremented. If the edge endpoint having the larger Y coordinate is (Xmax, Ymax) and the endpoint having the smaller Y coordinate is (Xmin, Ymin), then the increment of the X coordinate of the intersection is ΔX/ΔY, where ΔX = Xmax − Xmin and ΔY = Ymax − Ymin. This increment is usually not an integer, hence increment ΔX/ΔY and intersection coordinate X should be stored in non-integer, preferably fixed-point variables.
An active edge is thus represented by a fixed-point increment ΔX/ΔY, the fixed-point coordinate value of intersection X, and the maximum vertical coordinate of the edge (Ymax). The maximum vertical coordinate is needed to recognize when the edge becomes inactive. Scan lines are processed one after the other. First, the algorithm determines which edges become active for this scan line, that is, which edges have their minimum Y coordinate equal to the scan line coordinate. These edges are inserted into
the active edge table. The active edge table is also traversed and those edges whose maximum Y coordinate equals the scan line coordinate are removed (note that this way the lower end of an edge is supposed to belong to the edge, but the upper end is not). Then the active edge table is sorted according to the X coordinates of the edges, and the pixels between each pair of edges are filled. Finally, the X coordinates of the intersections in the edges of the active edge table are prepared for the next scan line by incrementing them by the reciprocal of the slope, ΔX/ΔY.

Polygon-Fill(polygon, colour)
 1  for Y ← 0 to Ymax
 2      do for each edge of polygon    Put activated edges into the AET.
 3             do if edge.Ymin = Y
 4                   then Put-AET(edge)
 5         for each edge of the AET    Remove deactivated edges from the AET.
 6             do if edge.Ymax = Y
 7                   then Delete-from-AET(edge)
 8         Sort-AET    Sort according to X.
 9         for each pair of edges (edge1, edge2) of the AET
10             do for X ← Round(edge1.X) to Round(edge2.X)
11                    do Pixel-Write(X, Y, colour)
12         for each edge in the AET    Incremental principle.
13             do edge.X ← edge.X + edge.ΔX/ΔY

The algorithm works scan line by scan line and first puts the activated edges (edge.Ymin = Y) into the active edge table. The active edge table is maintained by three operations. Operation Put-AET(edge) computes variables (Ymax, ΔX/ΔY, X) of an edge and inserts this structure into the table. Operation Delete-from-AET removes an item from the table when the edge is not active any more (edge.Ymax = Y). Operation Sort-AET sorts the table in the ascending order of the X value of the items. Having sorted the lists, every two consecutive items form a pair, and the pixels between the endpoints of each of these pairs are filled. Finally, the X coordinates of the items are updated according to the incremental principle.

Incremental visibility algorithms

The three-dimensional visibility problem is solved in the screen coordinate system.
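A compact scan-line fill in the spirit of the algorithm above can be sketched in Python (ours, not the book's; for brevity it recomputes the edge intersections for every scan line instead of maintaining an incremental active edge table):

```python
def fill_polygon(vertices):
    """Scan-line fill; vertices is a list of (x, y) pairs in order.
    Returns the set of interior pixels (even-odd rule)."""
    n = len(vertices)
    ys = [v[1] for v in vertices]
    filled = set()
    for y in range(min(ys), max(ys)):
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
            if y1 == y2:
                continue                      # horizontal edges are skipped
            # the lower endpoint belongs to the edge, the upper one does not
            if min(y1, y2) <= y < max(y1, y2):
                xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        xs.sort()
        for i in range(0, len(xs), 2):        # fill between pairs of intersections
            for x in range(round(xs[i]), round(xs[i + 1])):
                filled.add((x, y))
    return filled
```

The half-open treatment of the edges (lower endpoint in, upper endpoint out) keeps the number of intersections even on every scan line, exactly as the AET bookkeeping does.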
We can assume that the surfaces are given as triangle meshes.

Z-buffer algorithm

The z-buffer algorithm finds, for each pixel, the surface where the Z coordinate of the visible point is minimal. For each pixel we allocate a memory to store the minimum Z coordinate of those surfaces which have been processed so far. This memory is called the z-buffer or the depth-buffer.

When a triangle of the surface is rendered, all those pixels are identified which fall into the interior of the projection of the triangle by a triangle filling algorithm. As
the filling algorithm processes a pixel, the Z coordinate of the triangle point visible in this pixel is obtained. If this Z value is larger than the value already stored in the z-buffer, then there exists an already processed triangle that is closer than the current triangle in this given pixel. Thus the current triangle is obscured in this pixel and its colour should not be written into the raster memory. However, if the new Z value is smaller than the value stored in the z-buffer, then the current triangle is the closest so far, and its colour and Z coordinate should be written into the pixel and the z-buffer, respectively. The z-buffer algorithm is then:

Z-Buffer()
 1  for each pixel p    Clear screen.
 2      do Pixel-Write(p, background-colour)
 3         z-buffer[p] ← maximum value after clipping
 4  for each triangle o    Rendering.
 5      do for each pixel p of triangle o
 6             do Z ← coordinate Z of that point of o which projects onto pixel p
 7                if Z < z-buffer[p]
 8                   then Pixel-Write(p, colour of triangle o in this point)
 9                        z-buffer[p] ← Z

When the triangle is filled, the general polygon filling algorithm of the previous section could be used. However, it is worth exploiting the special features of the triangle. Let us sort the triangle vertices according to their Y coordinates and assign index 1 to the vertex of the smallest Y coordinate and index 3 to the vertex of the largest Y coordinate. The third vertex gets index 2. Then let us cut the triangle into two pieces with scan line Y_2. After cutting we obtain a lower triangle and an upper triangle. Let us realize that in such triangles the first (left) and the second (right) intersections of the scan lines are always on the same edges, thus the administration of the polygon filling algorithm can be significantly simplified. In fact, the active edge table management is not needed anymore; only the incremental intersection calculation should be implemented.
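The z-buffer decision itself can be sketched independently of the filling algorithm (our own Python, not the book's; a "triangle" is abstracted here as a precomputed map from pixels to depths, which a filling algorithm would normally produce incrementally):

```python
def render(width, height, triangles, background=0):
    """triangles: list of (pixels, colour), where pixels maps (x, y) -> depth Z."""
    frame = {(x, y): background for x in range(width) for y in range(height)}
    zbuf = {p: float('inf') for p in frame}   # clear to the maximum depth
    for pixels, colour in triangles:
        for p, z in pixels.items():
            if z < zbuf[p]:                   # visibility test
                zbuf[p] = z
                frame[p] = colour
    return frame
```

Note that the result is independent of the order in which the triangles are submitted; this is the key property that distinguishes the z-buffer from the painter's algorithm discussed below.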
The classification of left and right intersections depends on whether (X_2, Y_2) is on the right or on the left side of the oriented line segment from (X_1, Y_1) to (X_3, Y_3). If (X_2, Y_2) is on the left side, the projected triangle is called left oriented, and right oriented otherwise. When the details of the algorithm are introduced, we assume that the already re-indexed triangle vertices are

    r_1 = [X_1, Y_1, Z_1] ,    r_2 = [X_2, Y_2, Z_2] ,    r_3 = [X_3, Y_3, Z_3] .

The rasterization algorithm is expected to fill the projection of this triangle and also to compute the Z coordinate of the triangle in every pixel (Figure 14.45). The Z coordinate of the triangle point visible in pixel X, Y is computed using the equation of the plane of the triangle (equation (14.1)):

    n_X · X + n_Y · Y + n_Z · Z + d = 0 ,    (14.36)
where n = (r_2 − r_1) × (r_3 − r_1) and d = −n · r_1.

Figure 14.45. A triangle in the screen coordinate system. Pixels inside the projection of the triangle on plane XY need to be found. The Z coordinates of the triangle in these pixels are computed using the equation of the plane of the triangle.

Figure 14.46. Incremental Z coordinate computation for a left oriented triangle.

Whether the triangle is left oriented or right oriented depends on the sign of the Z coordinate of the normal vector of the plane. If n_Z is negative, then the triangle is left oriented. If it is positive, then the triangle is right oriented. Finally, when n_Z is zero, then the projection maps the triangle onto a line segment, which can be ignored during filling.

Using the equation of the plane, function Z(X, Y) expressing the Z coordinate corresponding to pixel X, Y is:

    Z(X, Y) = −(n_X · X + n_Y · Y + d) / n_Z .    (14.37)

According to the incremental principle, the evaluation of the Z coordinate can take advantage of the value of the previous pixel:

    Z(X + 1, Y) = Z(X, Y) − n_X/n_Z = Z(X, Y) + δZ_X .    (14.38)

Since increment δZ_X is constant for the whole triangle, it needs to be computed
only once. Thus the calculation of the Z coordinate in a scan line requires just a single addition per pixel. The Z coordinate values along the edges can also be obtained incrementally from the respective values at the previous scan line (Figure 14.46). The complete incremental algorithm which renders a lower left oriented triangle is as follows (the other cases are very similar):

Z-buffer-Lower-Triangle(X_1, Y_1, Z_1, X_2, Y_2, Z_2, X_3, Y_3, Z_3, colour)
 1  n ← ((X_2, Y_2, Z_2) − (X_1, Y_1, Z_1)) × ((X_3, Y_3, Z_3) − (X_1, Y_1, Z_1))    Normal vector.
 2  δZ_X ← −n_X/n_Z    Z increment.
 3  (δX^s_Y, δZ^s_Y, δX^e_Y) ← ((X_2 − X_1)/(Y_2 − Y_1), (Z_2 − Z_1)/(Y_2 − Y_1), (X_3 − X_1)/(Y_3 − Y_1))
 4  (Xleft, Xright, Zleft) ← (X_1, X_1, Z_1)
 5  for Y ← Y_1 to Y_2
 6      do Z ← Zleft
 7         for X ← Round(Xleft) to Round(Xright)    One scan line.
 8             do if Z < z-buffer[X, Y]    Visibility test.
 9                   then Pixel-Write(X, Y, colour)
10                        z-buffer[X, Y] ← Z
11                Z ← Z + δZ_X
12         (Xleft, Xright, Zleft) ← (Xleft + δX^s_Y, Xright + δX^e_Y, Zleft + δZ^s_Y)    Next scan line.

This algorithm simultaneously identifies the pixels to be filled and computes the Z coordinates with linear interpolation. Linear interpolation requires just a single addition when a pixel is processed. This idea can also be used for other features as well. For example, if the colours of the triangle vertices are available, the colour of the internal points can be set to provide smooth transitions by applying linear interpolation. Note also that the addition to compute the feature value can also be implemented by special purpose hardware. Graphics cards have a great number of such interpolation units.

Warnock algorithm

If a pixel of the image corresponds to a given object, then its neighbours usually correspond to the same object, that is, visible parts of objects appear as connected territories on the screen. This is a consequence of object coherence and is called image coherence.
If the situation is so fortunate (from a labor saving point of view) that a polygon in the object scene obscures all the others and its projection onto the image plane covers the image window completely, then we have to do no more than simply fill the image with the colour of the polygon. If no polygon edge falls into the window, then either there is no visible polygon, or some polygon covers it completely (Figure 14.47). The window is filled with the background colour in the first case, and with the colour of the closest polygon in the second case. If at least one polygon edge falls into the window, then the solution is not so simple. In this case, using a divide-and-conquer approach, the window is subdivided into four quarters, and each subwindow is searched recursively for a simple solution.
Figure 14.47. Polygon-window relations: (a) distinct; (b) surrounding; (c) intersecting; (d) contained.

The basic form of the algorithm called Warnock-algorithm rendering a rectangular window with screen coordinates X_1, Y_1 (lower left corner) and X_2, Y_2 (upper right corner) is this:

Warnock(X_1, Y_1, X_2, Y_2)
 1  if X_1 < X_2 or Y_1 < Y_2    Is the window larger than a pixel?
 2     then if at least one edge projects onto the window
 3        then    Non-trivial case: subdivision and recursion.
 4             Warnock(X_1, Y_1, (X_1 + X_2)/2, (Y_1 + Y_2)/2)
 5             Warnock(X_1, (Y_1 + Y_2)/2, (X_1 + X_2)/2, Y_2)
 6             Warnock((X_1 + X_2)/2, Y_1, X_2, (Y_1 + Y_2)/2)
 7             Warnock((X_1 + X_2)/2, (Y_1 + Y_2)/2, X_2, Y_2)
 8        else    Trivial case: window (X_1, Y_1, X_2, Y_2) is homogeneous.
 9             polygon ← the polygon visible in pixel ((X_1 + X_2)/2, (Y_1 + Y_2)/2)
10             if there is no visible polygon
11                then fill rectangle (X_1, Y_1, X_2, Y_2) with the background colour
12                else fill rectangle (X_1, Y_1, X_2, Y_2) with the colour of polygon

Note that the algorithm can handle non-intersecting polygons only. The algorithm can be accelerated by filtering out those distinct polygons which can definitely not be seen in a given subwindow at a given step. Furthermore, if a surrounding polygon appears at a given stage, then all the others behind it can be discarded, that is, all those which fall onto the opposite side of it from the eye. Finally, if there is only one contained or intersecting polygon, then the window does not have to be subdivided further, but the polygon (or rather the clipped part of it) is simply drawn. The price of saving further recursion is the use of a scan-conversion algorithm to fill the polygon.
Painter's algorithm

If we simply scan convert polygons into pixels and draw the pixels onto the screen without any examination of distances from the eye, then each pixel will contain the colour of the last polygon falling onto that pixel. If the polygons were ordered by their distance from the eye, and we took the farthest one first and the closest one last, then the final picture would be correct. Closer polygons would obscure farther
ones just as if they were painted an opaque colour. This method is known as the painter's algorithm.

The only problem is that the order of the polygons necessary for performing the painter's algorithm is not always simple to compute. We say that a polygon P does not obscure another polygon Q if none of the points of Q is obscured by P. To have this relation, one of the following conditions should hold:

1. Polygons P and Q do not overlap in Z range, and the minimum Z coordinate of polygon P is greater than the maximum Z coordinate of polygon Q.

2. The bounding rectangle of P on the XY plane does not overlap with that of Q.

3. Each vertex of P is farther from the viewpoint than the plane containing Q.

4. Each vertex of Q is closer to the viewpoint than the plane containing P.

5. The projections of P and Q do not overlap on the XY plane.

All these conditions are sufficient. The difficulty of their test increases in this order, thus it is worth testing the conditions in the above order until one of them proves to be true.

The first step is the calculation of an initial depth order. This is done by sorting the polygons according to their maximal Z value into a list. Let us first take the polygon P which is the last item on the resulting list. If the Z range of P does not overlap with any of the preceding polygons, then P is correctly positioned, and the polygon preceding P can be taken instead of P for a similar examination. Otherwise P overlaps a set {Q_1, ..., Q_m} of polygons. The next step is to try to check whether P does not obscure any of the polygons in {Q_1, ..., Q_m}, that is, that P is at its right position despite the overlapping. If it turns out that P obscures Q for a polygon Q in the set {Q_1, ..., Q_m}, then Q has to be moved behind P in the list, and the algorithm continues stepping back to Q. Unfortunately, this algorithm can run into an infinite loop in case of cyclic overlapping. Cycles can be resolved by cutting.
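The back-to-front drawing step can be sketched as follows (our own Python; polygons are abstracted as precomputed pixel sets with a single zmax key, so the subtle reordering and cutting cases above are not handled by this sketch):

```python
def painter(width, height, polygons, background=0):
    """polygons: list of (pixels, zmax, colour); pixels is a set of (x, y).
    Larger zmax means farther from the eye here; draw back to front."""
    frame = {(x, y): background for x in range(width) for y in range(height)}
    for pixels, zmax, colour in sorted(polygons, key=lambda q: -q[1]):
        for p in pixels:
            frame[p] = colour   # closer polygons simply overwrite farther ones
    return frame
```

Unlike the z-buffer, correctness here depends entirely on the drawing order, which is why the depth-order computation above is the hard part of the method.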
In order to accomplish this, whenever a polygon is moved to another position in the list, we mark it. If a marked polygon Q is about to be moved again, then, assuming that Q is part of a cycle, Q is cut into two pieces Q_1, Q_2 by the plane of P, so that Q_1 does not obscure P and P does not obscure Q_2, and only Q_1 is moved behind P.

BSP-tree Binary space partitioning divides the space first into two halfspaces, the second plane divides the first halfspace, the third plane divides the second halfspace, further planes split the resulting volumes, etc. The subdivision can well be represented by a binary tree, the so-called BSP-tree, illustrated in the figure. The kd-tree discussed in an earlier subsection is also a special version of BSP-trees, where the splitting planes are parallel with the coordinate planes. The BSP-tree of this subsection, however, uses general planes. The first splitting plane is associated with the root node of the BSP-tree, the second and third planes are associated with the two children of the root, etc. For our application, not so much the planes, but rather the polygons defining them, will be assigned to the nodes of the tree, and the set of polygons contained by the volume is also necessarily associated with each node. Each leaf node will then contain either
no polygon or one polygon in the associated set.

Figure: A BSP-tree. The space is subdivided by the planes of the contained polygons.

The BSP-Tree-Construction algorithm for creating the BSP-tree for a set S of polygons uses the following notations. A node of the binary tree is denoted by node, the polygon associated with the node by node.polygon, and the two child nodes by node.left and node.right, respectively. Let us consider a splitting plane of normal n and place vector r_0. Point r belongs to the positive (right) subspace of this plane if the sign of the scalar product n · (r − r_0) is positive, otherwise it is in the negative (left) subspace. The BSP construction algorithm is:

BSP-Tree-Construction(S)
 1  Create a new node
 2  if S is empty or contains just a single polygon
 3    then node.polygon ← S
 4         node.left ← null
 5         node.right ← null
 6    else node.polygon ← one polygon selected from list S
 7         Remove polygon node.polygon from list S
 8         S+ ← polygons of S which overlap with the positive subspace of node.polygon
 9         S- ← polygons of S which overlap with the negative subspace of node.polygon
10         node.right ← BSP-Tree-Construction(S+)
11         node.left ← BSP-Tree-Construction(S-)
12  return node

The size of the BSP-tree, i.e. the number of polygons stored in it, depends on the one hand on the nature of the object scene, and on the other hand on the strategy used to select the polygon from list S. Having constructed the BSP-tree, the visibility problem can be solved by traversing the tree in an order such that if a polygon obscures another, then it is processed later. During such a traversal, we determine at each node whether the eye is in the left or the right subspace, and continue the traversal in the child not containing the eye. Having processed the child not containing the eye, the polygon of the node is
drawn, and finally the child containing the eye is traversed recursively.

Exercises
- Implement the complete Bresenham algorithm that can handle not only moderately ascending but arbitrary line segments.
- The presented polygon filling algorithm tests each edge at a scan line whether it becomes active there. Modify the algorithm in a way that such tests are not executed at each scan line, but only once.
- Implement the complete z-buffer algorithm that renders left/right oriented, upper/lower triangles.
- Improve the presented Warnock algorithm and eliminate further recursions when only one edge is projected onto the subwindow.
- Apply the BSP-tree for discrete time collision detection.
- Apply the BSP-tree as a space partitioning structure for ray tracing.

Problems
14-1 Ray tracing renderer
Implement a rendering system applying the ray tracing algorithm. Objects are defined by triangle meshes and quadratic surfaces, and are associated with diffuse reflectivities. The virtual world also contains point light sources. The visible colour of a point is proportional to the diffuse reflectivity, the intensity of the light source and the cosine of the angle between the surface normal and the illumination direction (Lambert's law), and inversely proportional to the distance of the point and the light source. To detect whether or not a light source is occluded from a point, use the ray tracing algorithm as well.

14-2 Continuous time collision detection with ray tracing
Using ray tracing, develop a continuous time collision detection algorithm which computes the time of collision between a moving and rotating polyhedron and a still half space. Approximate the motion of a polygon vertex by a uniform, constant velocity motion in small intervals dt.

14-3 Incremental rendering system
Implement a three-dimensional renderer based on incremental rendering. The modelling and camera transforms can be set by the user.
The objects are given as triangle meshes, where each vertex has colour information as well. Having transformed and clipped the objects, the z-buffer algorithm should be used for hidden surface removal. The colour at the internal points is obtained by linear interpolation from the vertex colours.

Chapter notes
The elements of Euclidean, analytic and projective geometry are discussed in the books of Maxwell [?,?] and Coxeter [?]. The application of projective geometry in
computer graphics is presented in Herman's dissertation [?] and Krammer's paper [?]. Curve and surface modelling is the main focus of computer aided geometric design (CAD, CAGD), which is discussed by Gerald Farin [?], and Rogers and Adams [?]. Geometric models can also be obtained by measuring real objects, as proposed by reverse engineering methods [?]. Implicit surfaces can be studied by reading Bloomenthal's work [?]. Solid modelling with implicit equations is also booming thanks to the emergence of functional representation methods (F-Rep), which are surveyed on the web. Blobs have been first proposed by Blinn [?]. Later the exponential influence function has been replaced by polynomials [?], which are more appropriate when roots have to be found in ray tracing. Geometric algorithms give solutions to geometric problems such as the creation of convex hulls, clipping, containment test, tessellation, point location, etc. This field is discussed in the books of Preparata and Shamos [?] and of Marc de Berg [?,?]. The triangulation of general polygons is still a difficult topic despite a lot of research effort. Practical triangulation algorithms run in O(n lg n) time [?,?,?], but Chazelle [?] proposed an optimal algorithm having linear time complexity. The presented proof of the two ears theorem has originally been given by Joseph O'Rourke [?]. Subdivision surfaces have been proposed and discussed by Catmull and Clark [?], Warren and Weimer [?], and by Brian Sharp [?,?]. The butterfly subdivision approach has been published by Dyn et al. [?]. The Sutherland-Hodgman polygon clipping algorithm is taken from [?]. Collision detection is one of the most critical problems in computer games, since it prevents objects from flying through walls and it is used to decide whether a bullet hits an enemy or not. Collision detection algorithms are reviewed by Jiménez, Thomas and Torras [?]. Glassner's book [?] presents many aspects of ray tracing algorithms.
The 3D DDA algorithm has been proposed by Fujimoto et al. [?]. Many papers examined the complexity of ray tracing algorithms. It has been proven that for N objects, ray tracing can be solved in O(lg N) time [?,?], but this is a theoretical rather than a practical result, since it requires Ω(N^4) memory and preprocessing time, which is practically unacceptable. In practice, the discussed heuristic schemes are preferred, which are better than the naive approach only in the average case. Heuristic methods have been analysed by probabilistic tools by Márton [?], who also proposed the probabilistic scene model used in this chapter as well. We can read about heuristic algorithms, especially about the efficient implementation of kd-tree based ray tracing, in Havran's dissertation [?]. A particularly efficient solution is given in Szécsi's paper [?]. The probabilistic tools, such as the Poisson point process, can be found in the books of Karlin and Taylor [?] and Lamperti [?]. The cited fundamental law of integral geometry can be found in the book of Santaló [?]. The geoinformatics applications of quadtrees and octrees are also discussed in Chapter 16 of this book. The algorithms of incremental image synthesis are discussed in many computer graphics textbooks [?]. Visibility algorithms have been compared in [?,?]. The painter's algorithm has been proposed by Newell et al. [?]. Fuchs examined the construction of minimal depth BSP-trees [?]. The source of the Bresenham algorithm is
[?]. Graphics cards implement the algorithms of incremental image synthesis, including transformations, clipping and the z-buffer algorithm, which are accessible through graphics libraries (OpenGL, DirectX). Current graphics hardware includes two programmable processors, which enables the user to modify the basic rendering pipeline. Furthermore, this flexibility allows non-graphics problems to be solved on the graphics hardware. The reason for using the graphics hardware for non-graphics problems is that graphics cards have much higher computational power than CPUs. We can read about such algorithms in the ShaderX or the GPU Gems [?] series, or by visiting the web page. This bibliography is made by HBibTEX. The first key of the sorting is the name of the authors (first author, second author, etc.), the second key is the year of publication, and the third key is the title of the document. Underlining shows that the electronic version of the bibliography on the homepage of the book contains a link to the corresponding address.
Subject Index
This index uses the following conventions. Numbers are alphabetised as if spelled out; for example, "2-3-4-tree" is indexed as if it were "two-three-four-tree". When an entry refers to a place other than the main text, the page number is followed by a tag: ex for exercise, fig for figure, pr for problem and fn for footnote. The numbers of pages containing a definition are printed in italic font, e.g. time complexity, 583.
Name Index
This index uses the following conventions. If we know the full name of a cited person, then we print it. If the cited person is no longer living, and we know the correct data, then we also print the years of their birth and death.
Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina [email protected]
Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina [email protected] 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International
A Framework for Highly Available Services Based on Group Communication
A Framework for Highly Available Services Based on Group Communication Alan Fekete [email protected] http://www.cs.usyd.edu.au/ fekete Department of Computer Science F09 University of Sydney 2006,
