Flexible Agent Based Simulation for Pedestrian Modelling on GPU Hardware
Paul Richmond, The University of Sheffield, UK (www.dcs.shef.ac.uk/~paul)

Related publications:
- Richmond, P., Coakley, S., Romano, D., "Cellular Level Agent Based Modelling on the Graphics Processing Unit (with FLAME GPU)", selected for review in the special issue "Parallel and Ubiquitous Methods and Tools in Systems Biology" of the international journal Briefings in Bioinformatics
- Richmond, P., Coakley, S., Romano, D. (2009), "Cellular Level Agent Based Modelling on the Graphics Processing Unit", Proc. of HiBi09 (High Performance Computational Systems Biology), 14-16 October 2009, Trento, Italy
- Richmond, P., Coakley, S., Romano, D. (2009), "A High Performance Agent Based Modelling Framework on Graphics Card Hardware with CUDA", Proc. of the 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), May 10-15, 2009, Budapest, Hungary
- Richmond, P., Romano, D. (2008), "A High Performance Framework For Agent Based Pedestrian Dynamics On GPU Hardware", Proc. of EUROSIS ESM 2008 (European Simulation and Modelling), October 27-29, 2008, Université du Havre, Le Havre, France
Introduction and Scope

Agent Based Modelling (ABM)
- Emergence of complex natural behaviour from simple rules
- Individuals are agents with memory
- Agents update their own memory by considering neighbours

Of Pedestrian Behaviour
- Continuous space mobile agents
- Discrete time steps

On the GPU
- Why? Performance and real time visualisation
- Aim is flexibility: modellers should be able to harness the GPU's power without having to understand GPU programming
- Not continuum based (Treuille 06) and not restricted to mobile discrete agents (D'Souza 07)
Outline

FLAME and FLAME GPU
- About FLAME
- A simple example of a pedestrian model specification

Implementing FLAME on the GPU
- Brief overview of GPU technology
- Mapping agent data and functions to the GPU
- Agent communication patterns

Case Study
- Pedestrian modelling
- Discrete agents
- Performance results

Conclusions
What is FLAME?

What is FLAME (and what FLAME is not)?
- Flexible Large-scale Agent Modelling Environment
- XML model specification based on the X-Machine
- Template systems for generating simulation code: single CPU, grid, GPU
- Not a modelling application itself (dynamically generated API)

Why extend FLAME to the GPU?
- Complete modelling environment (beyond that of simple swarms)
- Formal and portable specification technique based on the X-Machine
- Many existing models to be used for benchmarking

What is FLAME GPU?
- Data parallel implementation of FLAME using CUDA
- Offers real time visualisation
- Cost effective solution for high performance ABM
FLAME and Formal Agent Specification

The X-Machine is formally defined by Eilenberg (Eilenberg 74) as an 8-tuple (Σ, Γ, Q, M, Φ, F, q0, m0), where:
- Σ and Γ are the finite input and output alphabets respectively;
- Q is the finite set of states;
- M is the (possibly) infinite set called memory;
- Φ is a finite set of partial functions ø that map an input and a memory state to an output and a new memory state, ø: Σ × M → Γ × M;
- F is the next state partial function that, given a state and a function from the type Φ, provides the next state, F: Q × Φ → Q (F is often described as a state transition diagram);
- q0 and m0 are the initial state and memory respectively.
Agents as Communicating X-Machines

- Each agent is a Communicating Stream X-Machine (Balanescu 99)
- Stream: input and output are streams of data
- Communicating: agents input and output messages
- State transitions (functions) describe agent behaviour:
  - Update agent memory
  - Output messages (and new agents) and process input messages
Specifying an Agent in XMML

<xagent>
  <name>pedestrian</name>
  <memory>
    <variable><type>float</type><name>x</name></variable>
    <variable><type>float</type><name>y</name></variable>
    <variable><type>float</type><name>velx</name></variable>
    <variable><type>float</type><name>vely</name></variable>
  </memory>
  <states>
    <state><name>start_state</name></state>
    <state><name>wait_input</name></state>
    <initialstate>start_state</initialstate>
  </states>
  <functions>
    <function>
      <name>output_location</name>
      <currentstate>start_state</currentstate>
      <nextstate>wait_input</nextstate>
      <outputs>
        <output><messagename>pedestrian_location</messagename></output>
      </outputs>
    </function>
    <function>
      <name>input_locations</name>
      <currentstate>wait_input</currentstate>
      <nextstate>start_state</nextstate>
      <inputs>
        <input><messagename>pedestrian_location</messagename></input>
      </inputs>
    </function>
  </functions>
  <type>continuous</type>
</xagent>
Specifying Agent Communication in XMML

<message>
  <name>pedestrian_location</name>
  <variables>
    <variable><type>float</type><name>x</name></variable>
    <variable><type>float</type><name>y</name></variable>
    <variable><type>float</type><name>velx</name></variable>
    <variable><type>float</type><name>vely</name></variable>
  </variables>
  <partitioningspatial>
    <radius>25</radius>
    <xmin>-100.0</xmin>
    <xmax>100.0</xmax>
    <ymin>-100.0</ymin>
    <ymax>100.0</ymax>
    <zmin>0.0</zmin>
    <zmax>25</zmax>
  </partitioningspatial>
</message>

Alternative partitioning schemes:

<partitioningnone/>

<partitioningdiscrete>
  <radius>0</radius>
</partitioningdiscrete>
Specifying the Function Order

<layers>
  <layer>
    <layerfunction><name>output_location</name></layerfunction>
  </layer>
  <layer>
    <layerfunction><name>input_locations</name></layerfunction>
  </layer>
</layers>

[Figure: state diagram showing output_location() moving the agent from start_state to wait_input while writing agent memory (x, y, vel_x, vel_y) out to the pedestrian_location message list, and input_locations() reading the message list and returning the agent to start_state]
Simulation Process and Code Generation

XMML Model File
- Syntax validated through an XML Schema
- Base XMML Schema describes the basic structure of an X-Machine agent
- GPU specific extensions (partitioning) available through an XMMLGPU Schema
- Object oriented approach to extension of the base model

XSLT Simulation Templates
- Translate an XMML model file into simulation source code
- Templates are written in XML (using the XSLT Schema) so can be syntax validated
- XSLT processors implement a W3C specification: any compliant processor can be used to generate code
- FLAME GPU is therefore not dependent on internal tools or parsers

C Function Files
- Agent transition function behaviour scripted in C

XML Input Data
- Defines the internal memory of an initial population of agents
Outline

FLAME and FLAME GPU
- About FLAME
- A simple example of a pedestrian model specification

Implementing FLAME on the GPU
- Brief overview of GPU technology
- Mapping agent data and functions to the GPU
- Agent communication patterns

Case Study
- Pedestrian modelling
- Discrete agents
- Performance results

Conclusions
Programming the GPU

Purpose of the GPU
- Data parallel device for operating on streams of data

Programming for general purpose use
- Graphics API technique: not ideal
- High level alternatives:
  - BrookGPU (Buck 04): SIMD stream programming extension for C
  - Sh (McCool 02): C++ language with a compiler for GPU backends
- Hardware specific:
  - Stream SDK: low level ATI specific native instruction set, with high level support through Brook+
  - CUDA: NVIDIA GPU programming using a compiler and C syntax with extensions
  - OpenCL: new standard, but support is currently limited
NVIDIA CUDA Programming Model

- GPU is a coprocessor to the CPU (with its own global memory)
- Many parallel threads of execution
- Each thread runs the same kernel program (SPMD)
- Threads are grouped into regular sized blocks
- Threads within a block can communicate through shared memory (with a simple synchronisation primitive)
- Threads across blocks cannot communicate

[Figure: a grid of blocks (Block 0 to Block 3), each block containing a set of threads (Thread 0 to Thread N)]
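The SPMD model above can be sketched serially on the CPU: every "thread" runs the same kernel body and derives a unique global index from its block and thread index. This is an illustrative emulation, not CUDA or FLAME GPU API code; all names (kernel_body, launch, BLOCK_DIM) are assumptions.

```c
#include <assert.h>

#define BLOCK_DIM 4  /* threads per block (illustrative) */

/* The per-thread kernel body: every thread runs this same code (SPMD)
 * and selects its data element via its block and thread index. */
static void kernel_body(int block_idx, int thread_idx,
                        float *x, const float *vel, int n) {
    int i = block_idx * BLOCK_DIM + thread_idx;  /* global thread index */
    if (i < n)                                   /* guard the partial last block */
        x[i] += vel[i];
}

/* A serial stand-in for a grid launch: iterate every block and thread. */
static void launch(float *x, const float *vel, int n) {
    int num_blocks = (n + BLOCK_DIM - 1) / BLOCK_DIM;
    for (int b = 0; b < num_blocks; b++)
        for (int t = 0; t < BLOCK_DIM; t++)
            kernel_body(b, t, x, vel, n);
}
```

On the GPU the blocks and threads execute concurrently; the guard on the global index is what makes a population size that is not a multiple of the block size safe.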
CUDA Hardware Model

- Thread blocks are mapped to multiprocessors (MPs)
- Multiprocessors are a set of SIMD thread (vector) processors
- Limited shared memory per MP (and hence per block)
- Limited cache and registers per MP

[Figure: GPU device containing N multiprocessors, each with an instruction unit, registers, shared memory, constant cache, texture cache and a set of vector processors, all attached to GPU DRAM device memory]
Mapping Agent Functions to the GPU

- Each transition function is wrapped by a GPU kernel
- Each agent is a thread performing the function
- Functions can input and output messages
- Functions can output new agents (agent birth)
- An agent can be removed (agent death) by returning a non-zero value

FLAME_GPU_FUNC int input_function(
    xmachine_memory_pedestrian* xmemory,
    xmachine_message_pedestrian_location_list* location_messages)
{
    /* Get the first message */
    xmachine_message_pedestrian_location* location_message =
        get_first_pedestrian_location_message(location_messages);

    /* Repeat until there are no more messages */
    while (location_message) {
        /* Process the message */
        if (distance_check(xmemory, location_message)) {
            updatesteervelocity(xmemory, location_message);
        }
        /* Get the next message */
        location_message = get_next_pedestrian_location_message(
            location_message, location_messages);
    }

    /* Update any other xmemory variables */
    xmemory->x += xmemory->vel_x * time_step;
    ...
    return 0;
}
Mapping X-Machine Agent Data to the GPU

- All data (agents and messages) is mapped to global memory on the GPU
- Lists are stored using a Structure of Arrays (SoA) rather than an Array of Structures (AoS)
- Data is read from global memory into registers
- Agents and messages are referenced as C structures within function code

/* Structure of Arrays (SoA): one array per agent variable */
typedef struct agent_list {
    float x[N];
    float y[N];
} xm_memory_agent_list;

/* Array of Structures (AoS): one struct per agent */
typedef struct agent {
    float x;
    float y;
} xm_memory_agent;
xm_memory_agent agent_list[N];
Use of Parallel Compaction

- Need to avoid divergence within thread blocks
- Agents are stored and processed in state lists to avoid conditional branching
- Sparse lists still occur as a result of:
  - Agent births
  - Function filters
  - Message outputs
- Sparse lists are compacted using a parallel prefix sum algorithm

[Figure: an agent function produces an agent list and a list of output flags; a prefix sum over the flags gives each surviving agent its output index, and a compact step scatters agents into a dense new agent list]
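The compaction step can be sketched serially: an exclusive prefix sum over the keep flags gives each surviving element its output slot, and the scan total gives the compacted length. FLAME GPU performs the scan in parallel on the GPU; the names here (compact, flags) are illustrative.

```c
#include <assert.h>

/* Stream compaction via an exclusive prefix sum over keep-flags:
 * the running total at element i is its output slot, and the final
 * total is the length of the compacted list. */
static int compact(const int *flags, const int *in, int *out, int n) {
    int running = 0;                 /* exclusive prefix sum accumulator */
    for (int i = 0; i < n; i++) {
        if (flags[i])
            out[running] = in[i];    /* scatter survivor to its dense slot */
        running += flags[i];
    }
    return running;                  /* number of surviving elements */
}
```

The same scatter pattern handles agent births (flags mark new agents) and function filters (flags mark agents passing the filter).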
Brute Force Message Communication

- Tile message lists into shared memory to reduce global memory access (Nyland 07)
- Each thread in the thread block loads a single message into shared memory on the load_first_message call
- Each call to load_next_message then iterates through the messages in shared memory
- When a call to load_next_message is made after every message in shared memory has been returned, a new batch of messages is tiled
- Repeat until all messages have been considered
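The tiling scheme can be sketched serially: messages are staged through a small buffer standing in for shared memory and iterated from there, so "global" memory is touched once per tile load rather than once per reader. The TILE size and function names are illustrative assumptions.

```c
#include <assert.h>

#define TILE 4  /* = thread block size: each thread loads one message per tile */

/* Iterate every message in TILE-sized batches staged through a buffer
 * that stands in for shared memory (here summing a float payload). */
static float process_all_messages(const float *messages, int n) {
    float shared[TILE];                  /* stand-in for the shared memory tile */
    float acc = 0.0f;
    for (int base = 0; base < n; base += TILE) {
        int count = (n - base < TILE) ? (n - base) : TILE;
        for (int j = 0; j < count; j++)  /* cooperative load of one tile */
            shared[j] = messages[base + j];
        for (int j = 0; j < count; j++)  /* read messages back from the tile */
            acc += shared[j];
    }
    return acc;
}
```

On the GPU every thread in a block reuses the same tile, so each message is fetched from global memory once per block rather than once per thread.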
Effect of Optimisations for Brute Force Message Communication

- Simple benchmarking model
- Efficient data access methods double performance
- Massive performance gain from using shared memory

[Figure: relative speedup over FLAME (CPU) against population size (1024 to 131072) for SoA with shared memory, SoA without shared memory, and AoS without shared memory]
Limited Range Message Communication

For each message output:
- The environment is split into discrete partitions equal in size to the message range (each with a unique identifier)
- The message list is sorted by the partition each message falls within
- A boundary matrix indicates how many messages are within each partition by giving the start and end index of messages within the sorted list
- To read all messages within a partition, the boundary matrix gives the range of the message list which needs to be iterated
- Each agent reads 27 partitions (for a 3D environment) including its own, which guarantees all messages within the range are processed; roughly 2/3 of the messages considered are outside the range, but this is much better than O(n²)
- The texture cache is used to read messages from global memory
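A serial 1D sketch of the partitioning scheme (FLAME GPU does this in 3D and sorts in parallel): a message's partition index comes from dividing its position by the message radius, and a boundary array over the partition-sorted message list records where each partition's run starts and ends. The constants and names are illustrative assumptions.

```c
#include <assert.h>

#define RADIUS 25.0f    /* partition width = message range */
#define XMIN  -100.0f   /* environment lower bound */

/* Partition (cell) index of a message at position x. */
static int partition_index(float x) {
    return (int)((x - XMIN) / RADIUS);
}

/* Given messages sorted by partition, record each partition's start index
 * and one-past-the-end index in the sorted list (-1 marks an empty cell). */
static void build_boundaries(const int *cells, int n, int num_cells,
                             int *start, int *end) {
    for (int c = 0; c < num_cells; c++) { start[c] = -1; end[c] = -1; }
    for (int i = 0; i < n; i++) {
        if (start[cells[i]] < 0) start[cells[i]] = i;
        end[cells[i]] = i + 1;
    }
}
```

Reading a radius then means iterating the runs of a cell and its immediate neighbours (27 cells in 3D), rather than the whole message list.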
Evaluation of Limited Range Communication

N        | 32     | 64     | 96     | 128    | 160    | 192    | 224    | 256
1024     | 0.94   | 1.05   | 0.90   | 0.86   | 0.93   | 0.89   | 0.95   | 0.88
4096     | 1.24   | 1.25   | 1.30   | 1.22   | 1.39   | 1.22   | 1.24   | 1.25
16384    | 2.45   | 2.48   | 2.62   | 2.53   | 2.76   | 2.81   | 2.77   | 2.60
65536    | 9.09   | 9.34   | 9.47   | 9.23   | 9.22   | 9.31   | 9.45   | 9.42
262144   | 33.74  | 37.99  | 36.88  | 37.39  | 36.61  | 36.83  | 37.81  | 38.12
1048576  | 136.28 | 169.73 | 147.39 | 172.98 | 145.21 | 165.34 | 151.26 | 177.06

[Figure: stacked percentage of GPU time against population size (1024 to 1048576), broken down by kernel: GPUFLAME_inputdata, GPUFLAME_move, GPUFLAME_outputdata, radixsort_kernel, merge_kernel, memcpyHtoD and other GPUFLAME kernels]
Discrete Agent Communication

Discrete agents reading discrete messages
- Load messages into shared memory
- 2D message output is loaded in batches, with the grid wrapping toroidally at the borders

[Figure: an 8x8 discrete message grid split across four thread blocks, showing the sequence of shared memory message loads (Message Load 1 to 9) including the wrapped border cells]

Continuous agents reading discrete messages
- Cannot ensure all messages are loaded into shared memory
- Use the texture cache instead
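A serial sketch of a discrete agent reading its 2D message neighbourhood with the wrapped borders shown in the figure; the grid size, payload and names are illustrative assumptions.

```c
#include <assert.h>

#define W 8  /* width and height of the discrete agent grid */

/* Sum the message payloads in the (2*range+1)^2 neighbourhood of cell
 * (x, y), wrapping toroidally at the grid borders (requires range < W). */
static int read_neighbourhood(const int *grid, int x, int y, int range) {
    int acc = 0;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            int nx = (x + dx + W) % W;   /* toroidal wrap in x */
            int ny = (y + dy + W) % W;   /* toroidal wrap in y */
            acc += grid[ny * W + nx];
        }
    return acc;
}
```

On the GPU the block cooperatively stages this neighbourhood (plus its wrapped halo) into shared memory before the threads read it, which is why shared memory only pays off for small ranges.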
Performance of Discrete Message Communication

- Cellular Automaton model (Game of Life)
- Over 1 million agents
- Shared memory is only suitable for very small interaction ranges

[Figure: GPU time (ms) against message range (1 to 8) for texture cache and shared memory implementations with block sizes of 64 and 256]
Outline

FLAME and FLAME GPU
- About FLAME
- A simple example of a pedestrian model specification

Implementing FLAME on the GPU
- Brief overview of GPU technology
- Mapping agent data and functions to the GPU
- Agent communication patterns

Case Study
- Pedestrian modelling
- Discrete agents
- Performance results

Conclusions
A Simple Pedestrian Model

Inter-agent interaction (using spatially partitioned messaging) is based on a hybrid of Reynolds style steering and social forces:

- Social repulsion force
  - Navigates pedestrians towards areas of low concentration
  - Limited forward vision: preference for agents in the direct line of sight
  - Scaled depending on distance to the neighbour
- Close range interaction force
  - Very short range with no limited vision
  - Acts as collision avoidance
Visualisation Technique

- Agent data is already on the GPU
- Agent positions are made available to OpenGL by mapping them to a Buffer Object
- We can also store geometry on the GPU to reduce draw calls

For complex models (lots of vertices)
- Store a single instance of the geometry in a Vertex Array
- Draw the array for each agent, setting a Vertex Attribute each time to indicate the agent index
- A GLSL vertex shader is used to displace the vertices in the same way

For simple models
- Use a single large Vertex Array holding a geometry instance for each agent
- Associate each vertex with an agent using a Vertex Attribute stored in a Vertex Attribute Array
- Only suitable for simple geometry, but very few draw calls
Animation and Level of Detail (LOD)

Animation (very simple)
- Interpolate between 2 key frames
- Rotate the model depending on velocity direction
- Performed in a vertex shader

LOD (all data is maintained on the GPU so the technique must remain parallel)
- Set the view position as a global simulation variable
- Use an agent script to calculate the viewing distance
- Save the LOD level in an agent variable
- Use a parallel reduction to count the number of agents per level
- Secondary sort of the agents by LOD level, then render in groups
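The LOD selection and counting steps can be sketched serially; the distance thresholds and names are illustrative assumptions. In FLAME GPU the selection runs per agent on the GPU and the per-level counts come from a parallel reduction.

```c
#include <assert.h>

/* Per-agent LOD selection from viewing distance (illustrative thresholds). */
static int lod_level(float view_distance) {
    if (view_distance < 50.0f)  return 0;  /* full-detail geometry */
    if (view_distance < 150.0f) return 1;  /* reduced geometry */
    return 2;                              /* billboard / lowest detail */
}

/* Serial analogue of the parallel per-level agent count. */
static void count_levels(const float *distances, int n, int counts[3]) {
    counts[0] = counts[1] = counts[2] = 0;
    for (int i = 0; i < n; i++)
        counts[lod_level(distances[i])]++;
}
```

The per-level counts give the offsets needed to sort agents by LOD level and render each level's group with its own geometry.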
Demo

Agents coloured by LOD level
Performance Results

Observations
- Performance is dependent on communication radius: a larger radius means fewer partitions, so more agents are considered per update
- The LOD technique has a cost: don't use it for small populations
- Very large population sizes are possible in real time

[Figure: frames per second (FPS) against pedestrian population (64 to 1048576) for billboards, detail levels 0 to 2, and dynamic LOD]
Environment Collision Avoidance

- A discrete grid of agents encodes the environment
- Static discrete agents
  - Repulsive forces direct pedestrians away from walls
  - Automatically generated in advance
- Continuous pedestrian agents read the discrete messages
  - Apply a collision force
  - Displace pedestrian agents by the height value
Long Range Navigation

- Many agents follow similar paths, so a global solution is used
- A fluid flow route is stored for each path through the environment
  - Calculated offline in advance by backtracking from the exit point
  - Smooth movement around obstacles
- Discrete agents are also responsible for pedestrian birth allocation
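The flow-grid lookup can be sketched as follows: each environment cell stores a precomputed goal direction, and a pedestrian simply samples the cell it stands in. The grid dimensions, cell size and names are illustrative assumptions.

```c
#include <assert.h>

#define GRID_W 4     /* flow grid width in cells (illustrative) */
#define CELL  10.0f  /* cell size in world units (illustrative) */

typedef struct { float x, y; } vec2;

/* Sample the precomputed flow direction at world position (px, py):
 * quantise the position to a cell and return that cell's stored vector. */
static vec2 sample_flow(const vec2 *flow, float px, float py) {
    int cx = (int)(px / CELL);
    int cy = (int)(py / CELL);
    return flow[cy * GRID_W + cx];
}
```

The expensive path computation (backtracking from the exit) happens once offline; at runtime every agent's navigation is a constant-time grid read, which is what makes a global solution cheap for many agents on similar paths.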
Conclusions and Future Work

Summary
- Flexible agent architecture for the GPU, suitable for force based models
- Easily extensible
- Massive performance/cost benefits

Scope for future work
- Multi GPU
  - Would enable extremely large populations to be simulated
  - For spatial partitioning, only partition boundaries would need to be communicated between GPU devices
- Improved pedestrian models
  - More accurate collision detection
  - Long range individual path planning without flow grids
  - Physically accurate animation and movement
  - Much larger models (need appropriate scenarios)
References

- A. Treuille, S. Cooper, and Z. Popović, "Continuum crowds", in SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, New York, NY, USA: ACM, 2006, pp. 1160-1168.
- R. M. D'Souza, M. Lysenko, and K. Rahmani, "SugarScape on steroids: simulating over a million agents at interactive rates", in Proceedings of Agent2007, 2007.
- S. Eilenberg, Automata, Languages, and Machines, Academic Press, Inc., Orlando, FL, USA, 1974.
- T. Balanescu, A. J. Cowling, H. Georgescu, M. Gheorghe, M. Holcombe, and C. Vertan, "Communicating stream X-machines systems are no more than X-machines", Journal of Universal Computer Science, 5(9):494-507, 1999. http://www.jucs.org/jucs_5_9/communicating_stream_x_machines
- I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware", ACM Trans. Graph., 23(3):777-786, 2004.
- M. D. McCool, Z. Qin, and T. S. Popa, "Shader metaprogramming", in HWWS '02: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 57-68, Aire-la-Ville, Switzerland, 2002. Eurographics Association.
- L. Nyland, M. Harris, and J. Prins, "Fast N-body simulation with CUDA", in H. Nguyen, editor, GPU Gems 3, chapter 31, Addison Wesley Professional, August 2007.