

Using a Computational Grid for Geographic Information Analysis: A Reconnaissance

Marc P. Armstrong, Mary Kathryn Cowles, and Shaowen Wang
The University of Iowa

High-performance computing has undergone a radical transformation during the past decade. Though monolithic supercomputers continue to be built with significantly increased computing power, geographically distributed computing resources are now routinely linked by high-speed networks to address a broad range of computationally complex problems. These confederated resources are referred to collectively as a computational Grid. Many geographical problems exhibit characteristics that make them candidates for this new model of computing. As an illustration, we describe a spatial statistics problem and demonstrate how it can be addressed using Grid computing strategies. A key element of this application is the development of middleware that handles domain decomposition and coordinates computational functions. We also discuss the development of Grid portals that are designed to help researchers and decision makers access and use geographic information analysis tools.

Key Words: Grid computing, parallel computing, spatial statistics, middleware, Grid portals.

Introduction

During the fifteen years that have elapsed since Griffith (1990) described an approach to geographic information analysis that required the use of then-current supercomputers, the characteristics of many geographic problems have evolved in two important ways. First, data sets have continued to increase in size as a consequence of improved data collection technologies and practices (Armstrong 2000). Gigabyte-sized datasets, almost unheard of a decade and a half ago, are now being eclipsed by terabyte- and petabyte-scale databases (Note 1; e.g., Thakar et al. 2003). Second, computationally intensive methods of analysis that were just becoming known in the early 1990s are now gaining wider acceptance.
Either of these changes could lead to computational intractability, but when they are taken together, and computationally intensive methods are applied to enormous disaggregated datasets, the burdens imposed can effectively preclude analyses unless one is willing to wait months, or even years, for a result (Armstrong, Xiao, and Bennett 2003; Cowles 2003). Geography, of course, is not alone in facing such dilemmas and, fortunately, computationally based geographic research can reap significant benefits by exploiting advances now being put in place by researchers in other disciplines such as computer science and engineering. During the past few years, these advances have begun to transform the landscape of high-performance computing. In particular, an emerging model of computation, called the Grid (Foster and Kesselman 1999), uses high-speed networks to link available, trusted computer systems into configurable, distributed virtual computing resources. The Grid represents a change in the way computer technology is used: owners relinquish a certain amount of control over local resources so that those resources can be used by other trusted applications serving a much wider audience. Though the focus of this article is on computation, these shared resources can take other forms, such as distributed data storage and collaborative visualization (Gao et al. 2002; Rajasekar, Wan, and Moore 2002). These changes are also not confined to hardware and software; they include a set of behavioral changes that enable the shared use of the emerging cyberinfrastructure to support collaborative research (Laforenza 2002), such as that being conducted in cognate disciplines like earth science (e.g., GEON) and ecology (e.g., SEEK EcoGrid).
This has required the definition and promulgation of new communication protocols, software that coordinates use (middleware), and the development of sociocomputational mores that enable sharing of distributed resources.

The Professional Geographer, Volume 57, Number 3, August 2005. Copyright 2005 by Association of American Geographers. Initial submission, October 2003; revised submission, July 2004; final acceptance, August. Published by Blackwell Publishing, 350 Main Street, Malden, MA 02148, and 9600 Garsington Road, Oxford OX4 2DQ, U.K.

The purpose of this article is to sketch the state of the practice in Grid computing and to present some results from a project that illustrates how this new approach to computation can be used to address a spatial problem. We conclude with some comments about our current research on the development of Grid portals. Such portals are designed to provide researchers with access to collections of easy-to-use distributed computing tools that can be configured into a domain-specific problem-solving environment (PSE; Gallopoulos, Houstis, and Rice 1994).

Computational Grids

The idea that gave rise to Grid computing is superficially simple: turn computation into a service delivered over a widely available infrastructure (e.g., cyberinfrastructure). Metaphors such as electrical power transmission and water distribution systems are typically invoked, though the problem of a computational Grid is somewhat different (see, e.g., Licklider and Taylor 1968; Smarr 1999). The concept of Grid computing was developed by computer scientists and engineers after a realistic appraisal of trends in computer technology. The well-known and oft-cited Moore's Law asserted that chip density (the number of transistors per integrated circuit) would double every eighteen months (Moore 1965; Borkar 2003). This observation has recently been extended to include processing speed and other aspects of computer performance (see Bell and Gray 1997, 13). In particular, when trends in storage and networks are considered, the doubling time turns out to be approximately twelve months for storage capacity and nine months for network speed (Foster 2002, 42). There is, as might be expected, pessimism in some circles about the likely long-term duration of such trends, but few researchers dispute that they will persist well into the coming decade.
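The doubling times quoted above compound quickly. As a rough illustration (our own arithmetic, not a figure from the cited sources), a quantity that doubles every $T$ months grows by a factor of $2^{t/T}$ after $t$ months. The short Python sketch below applies that formula to the eighteen-, twelve-, and nine-month doubling times; the five-year horizon is an arbitrary assumption chosen only for the example.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplicative growth after `months_elapsed` months for a quantity
    that doubles every `doubling_months` months: 2 ** (t / T)."""
    return 2.0 ** (months_elapsed / doubling_months)


# Doubling times as stated in the text; the horizon below is an
# arbitrary choice made only for this illustration.
DOUBLING_TIMES = {
    "chip density (Moore's Law)": 18,
    "storage capacity": 12,
    "network speed": 9,
}

if __name__ == "__main__":
    horizon = 60  # months (five years)
    for label, t_double in DOUBLING_TIMES.items():
        print(f"{label}: x{growth_factor(horizon, t_double):.0f} over {horizon} months")
```

Over five years this yields roughly a tenfold increase in chip density, a 32-fold increase in storage capacity, and about a 100-fold increase in network speed.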
Moore, for example, has subsequently predicted that his law will hold until approximately 2017, when other basic physical laws will take precedence. Computer scientists and engineers are well aware of these trends and have acted accordingly. The expenses associated with manufacturing high-density microcircuitry are astronomical. As a consequence, high sales volumes (or prohibitively high unit costs) are needed to maintain the economic viability of companies involved in the design and creation of new generations of computer processors. This has led to significant economic problems for most computer equipment manufacturers. Though vector supercomputers, such as those manufactured by Cray, represented the high end of scientific computing in the late 1980s and early 1990s, they were expensive to develop, acquire, and maintain and were typically tightly controlled by a centralized administrative structure. This is the type of computing environment described by Griffith (1990), for example, and with only one notable exception (Japan's Earth Simulator), vector supercomputers have receded from view. Around the same time, more explicitly parallel systems were being developed and used (see, e.g., Armstrong, Pavlik, and Marciano 1994; Cramer and Armstrong 1999), but because these systems were also expensive and difficult to maintain, and many used custom processors and interconnects, most of them became extinct during the mid-1990s.
The availability of high-speed Internet connections, however, has enabled the high level of connectivity that supports the decentralized computation that now represents the current state of the practice; this trend is likely to persist well into the future, given the gathering force of recent national-level, high-performance network initiatives such as Internet2 and LambdaRail. Since network speed and storage capacities are increasing at such a rapid pace, and since there are commensurate efforts to expand the reach of high-performance networks, a logical next step is to link together large numbers of existing distributed computational resources, many of which now sit idle, so that they can be used to address computationally intensive problems.

Most of the original work in this area has taken place using hardware that is under the specific control of a local administrator (e.g., a laboratory or university facility). Clusters such as those using the Beowulf architecture (Sterling et al. 1999) have become widespread; they exist on many campuses and are used in a variety of business applications. The cluster-computing approach does have two significant limitations, however. One pundit has compared this approach to amassing a large number of rowboats that, collectively, can never hope to fulfill the functions of a cruise ship. The argument here is that while large numbers of commodity-class processors can crunch numbers, they are inadequate for memory-intensive problems that require high-performance I/O and memory subsystems. The ability to link heterogeneous collections of processors that may vary, for example, in clock frequency, RAM, disk capacity, and architecture is precisely where Grid concepts become relevant. A second problem is that the purchase of needed resources can become quite expensive. Total costs are reduced if resources are shared cooperatively, and, again, this is a key element of Grid computing.

At the present time, a pattern in distributed computing resource use is beginning to emerge: problems that are relatively small can be addressed locally, using a Beowulf cluster, for example, while large problems that require a considerable expenditure of processing time are shifting to national (or international) Grid constellations, such as the TeraGrid (Reed 2003). The use of distributed, shared resources has required the establishment of protocols that govern their use; these protocols are also designed to support the secure movement of information and processing capabilities. Grid computing has encouraged the establishment of flexibly defined virtual organizations (VOs) that exist to support interdisciplinary and collaborative research. Problems of this kind, involving, for example, linked human-environment interaction and models in sustainability science, might require expertise in climatology, hydrology, GIScience, computer science, economics, sociology, agriculture, and industrial ecology.
Though sufficiently high levels of expertise in each of these areas might be co-located at a few places in the world, it is far more likely that scientific expertise will be geographically distributed, perhaps on different continents. Each VO creates and enforces rules that define group membership and rights of access to Grid resources (Foster et al. 2002). Based on these rules, VO members are allowed to access domain-specific problem-solving environments. In a later section of this article, we describe such an environment that is designed to support the analysis of point-based geographic information using the Grid.

The architecture of the Grid is typically conceptualized as a sequence of layers. At the lowest level, the fabric layer specifies protocols and software interfaces that enable access to required hardware resources. Above this layer, middleware consists of specialized software that handles tasks such as user authentication, authorization, encryption, and secure execution. In many ways, this middleware layer holds the Grid together, since it also links high-level applications software that performs needed analytical functions (e.g., spatial statistics software) to the low-level disk access and network protocols (the fabric). For example, VO members, such as a spatial statistician and an epidemiologist, could access programs and data that are then divided into tasks that are allocated to execute on distributed computing resources. Middleware handles the decomposition of data and allocates tasks based on an assessment of resource availability and capacity, as well as the requirements of each application. Results are then collected (also using middleware) and visualized, and, based on the domain knowledge of users, parameters for otherwise prohibitively time-consuming analyses can be modified in an exploratory fashion.
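Capacity-based allocation of the kind just described can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not the authors' middleware: each resource is summarized by a single relative capacity score (the resource names and scores are hypothetical), tasks are divided in proportion to those scores, and largest-remainder rounding (our choice) keeps the counts summing to the batch size. A production Grid scheduler would also weigh current availability, network characteristics, and middleware overhead.

```python
from math import floor


def allocate_tasks(n_tasks: int, capacity: dict[str, float]) -> dict[str, int]:
    """Split n_tasks among resources in proportion to relative capacity
    scores; largest-remainder rounding keeps the counts summing to n_tasks."""
    total = sum(capacity.values())
    if total <= 0:
        raise ValueError("at least one resource needs a positive capacity score")
    # Ideal fractional share of the batch for each resource.
    shares = {name: n_tasks * c / total for name, c in capacity.items()}
    counts = {name: floor(share) for name, share in shares.items()}
    # Give any leftover tasks to the resources with the largest remainders.
    leftover = n_tasks - sum(counts.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical resources and capacity scores, for illustration only.
    scores = {"resource_a": 4.0, "resource_b": 2.0, "resource_c": 1.0}
    print(allocate_tasks(100, scores))
```

With these hypothetical scores, 100 tasks are split 57, 29, and 14; the one task left over after flooring goes to resource_b, whose fractional share had the largest remainder.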
Implementing a Grid Application

Grid applications access distributed resources that may be linked by commodity-grade Internet connections with unknown and highly variable latencies. Consequently, they are normally based on a programming model that supports coarse-grained communication and problem decomposition. Two approaches can be used to accomplish such tasks (Laforenza 2002). The shared data approach can be implemented through the use of distributed virtual shared memory, as exemplified by Linda (see Carriero and Gelernter 1990; Rokos and Armstrong 1996). The efficiency of this approach, however, is affected by communication latency, and, consequently, it has not been widely adopted. The shared nothing approach, in contrast, has gained widespread acceptance and is used in the example application described in this article. This second approach relies on message passing to coordinate the execution of a balanced set of tasks that have been decomposed for execution on distributed processors.

Grid users do not need to know where a program is currently being executed or where intermediate results are being stored. The management of these tasks is handled by middleware, which sits between traditional operating systems (e.g., Linux or Windows) and Grid applications and acts as a metacomputing operating system (Smarr and Catlett 1992). It manages the complexity of securely using geographically distributed computing resources and hides this complexity from application developers and Grid end users. Indeed, middleware is sometimes referred to as plumbing, since it connects applications and passes data between them (Laforenza 2002, 1739).

Two Grid middleware architectures have been adopted: object oriented and bag of services. The object-oriented architecture takes advantage of object-oriented modeling capabilities such as polymorphism, encapsulation, overloading, and inheritance to address the complexity of designing and implementing Grid middleware. The bag-of-services model, on the other hand, aims to provide configurable middleware components that can be assembled to meet the customized needs of Grid users. Legion (legion.virginia.edu/) is an object-oriented implementation of Grid middleware, while the Globus Toolkit is an exemplar of the bag-of-services approach. We have adopted the Globus Toolkit for our research.

Globus Toolkit

The Globus Toolkit, a part of the National Science Foundation Middleware Initiative, is becoming a de facto standard for Grid middleware. Globus adopts the hourglass model (National Research Council 1994) to construct interfaces between different layers of Grid middleware components (Foster, Kesselman, and Tuecke 2001) (Figure 1). The hourglass neck consists of the resource and connectivity layers shown in Figure 1.
In these two layers, the Globus Toolkit integrates the following four protocols:

GSI (Grid Security Infrastructure), based on the Public-Key Infrastructure, is used for authentication, community protection, and authorization.

GRAM (Grid Resource Allocation and Management) is used to interact with local job schedulers, such as PBS and Condor, that allocate computational resources to applications; it is also used to monitor and control computing processes.

GridFTP extends the Internet File Transfer Protocol to support and manage secure data transfers in Grid environments.

GRIP (Grid Resource Information Protocol) is used to define and represent Grid resource information.

Higher-level Grid services are built upon these protocols to support application development. For example, DUROC (Dynamically Updated Request Online Co-allocator) is based on GRAM and is used to allocate a set of resources simultaneously.

Application Development

Application development requires domain-specific contextualization of the components in the top two (collective and application) Grid layers (Figure 1). In the collective layer, middleware development usually involves implementing resource discovery and broker utilities that are tailored to specific applications. Resource discovery utilities are constructed using monitoring and discovery services that are built using GRIP. Resource broker utilities include task scheduling as a major component, and their output normally takes the form of RSL (Resource Specification Language) scripts. These scripts describe resources and computing tasks and specify how to match them with each other. For example, the following is an RSL script fragment:

    &(resourcemanagercontact="rtgrid1.its.uiowa.edu/jobmanager-condor")
     (maxtime=2)
     (count=1)
     (memory>=256)
     (memory<=512)
     (executable="./GridGiStar")
     (arguments="uniform4000.txt")
     (directory="/home/swang/gridgistar/")

This RSL fragment states that the executable GridGiStar, located in the /home/swang/gridgistar/ directory, should be executed once (count=1) within two minutes (maxtime is expressed in minutes) on the resource rtgrid1.its.uiowa.edu, with a memory size between 256 MB and 512 MB, and that Condor should be used as the local job manager. The arguments statement specifies the particular dataset to be processed as well as the parameters that will control the execution of the distributed program. Additional functions, such as multicast I/O to distribute data across distributed systems and information collection about available resources, are also typically invoked using RSL scripts or other middleware components.

Figure 1. Grid architecture, after Foster, Kesselman, and Tuecke (2001).

Grid Testbed

We have implemented a campus Grid testbed to support the development of prototype Grid-based geographic analysis applications and to conduct computational experiments. The testbed comprises three Linux clusters that are located in different campus buildings; their individual hardware configurations are shown in Table 1. The three clusters and a Grid client desktop system are connected through the optical fiber infrastructure of The University of Iowa. A snapshot of peer-to-peer network bandwidth information is provided in Figure 2. The Globus Toolkit, Condor, and PBS are used as core Grid middleware components (Figure 3). In the next section we describe a spatial statistics problem that was addressed using this Grid testbed.

An Illustrative Example: $G_i^*(d)$

Spatial data typically consist of measurements of a variable of primary interest (here termed the response variable) at sites identified by spatial coordinates. A common research goal in analyzing such data is to identify clusters of points (sometimes called "hot spots") at which values of the response variable are unusually high or unusually low compared to the rest of the region. A related, but broader, goal is to identify pockets of spatial dependence.
Getis and Ord (1992) introduced a statistic, $G_i^*(d)$, that may be used for these purposes. Here we focus on the simplest version of the standardized form of this statistic, which was further elucidated by Ord and Getis (1995). Let $x_i$, $i = 1, \ldots, n$, denote measurements of the response variable of interest taken at $n$ distinct locations with known coordinates. One or more distances $d$ of interest are chosen such that most locations in the dataset have at least one neighbor point within distance $d$. Then a spatial weight matrix $\{w_{ij}(d)\}$ is defined such that $w_{ij}(d) = 1$ if location $i$ is within distance $d$ of location $j$, and $w_{ij}(d) = 0$ otherwise; note that in the computation of $G_i^*(d)$, $w_{ii}(d) = 1$. If we let $W_i = \sum_j w_{ij}(d)$ and $S_{1i} = \sum_j w_{ij}^2(d)$, and let $\bar{x}$ and $s^2$ denote the standard sample mean and sample variance, then the value of $G_i^*(d)$ is calculated as

$$G_i^*(d) = \frac{\sum_j w_{ij}(d)\,x_j - W_i\,\bar{x}}{s\left\{\left[n S_{1i} - W_i^2\right]/(n-1)\right\}^{1/2}} \qquad (1)$$

In preparation for calculating $G_i^*(d)$, all the pairwise distances between measurement locations must be computed and the $\{w_{ij}(d)\}$ derived. Then the calculation in Equation 1 must be performed for each point $i$ in the dataset. The worst-case time complexity of computing $G_i^*(d)$ sequentially is $O(n^3)$, because traversing a two-dimensional distance matrix has $O(n^2)$ complexity and computations involving the $d$ parameter contribute an additional $O(n)$ complexity.

Table 1. Resource Configurations of the Grid Testbed

Name (Department)                     Processors         Total Memory   Total Local Disk   Storage
cluster0.its.uiowa.edu (ITS)          32, 1 GHz          16 GB          640 GB             160 GB
rtgrid1.its.uiowa.edu (ITS)           6 Athlon, … GHz    4 GB           240 GB             0.5 TB
beowulf.stat.uiowa.edu (Statistics)   … GHz              14 GB          … GB               N/A

Figure 2. Network bandwidth of the Grid testbed.

Experimental dataset

A synthetic dataset of points (x, y, z) was generated in which each (x, y) pair represents the Euclidean coordinates of a point (i.e., its location) and z represents a measured value of a variable at that point. The dataset has 24,000 measured points contained in the range [0, 100] × [0, 100] that are distributed into twelve clusters (Figure 4). Several of these clusters overlap, but in each one of them, points are distributed using a two-dimensional normal probability density function with a standard deviation of two. The twelve clusters were created based on two density functions. One density function takes 4,000 points as input and uniformly distributes them into six clusters. The other density function takes the remaining 20,000 points as input and uniformly distributes them among the other six clusters.

Figure 3. Software configuration of the Grid testbed.

Figure 4. Experimental dataset.

Application-specific Grid middleware for computing $G_i^*(d)$

Specific components in the application and collective layers (Figure 3) must be developed to support the computation of $G_i^*(d)$ using Grid resources. These application-specific Grid middleware components must accomplish a set of required tasks with the overarching goal of optimal performance and efficient use of Grid resources. Issues such as load balancing, speed-up, and scalability are considered to achieve this goal. In our application, a domain decomposition algorithm was implemented that is based on a regular quadtree (Samet 1990) and a space-filling curve that uses a Morton ordering (Morton 1965; Tomlinson 1970, 75). A moving-window technique is applied to each quad cell so that distance computations are localized by the boundaries of the quad cell and its (8-connected) neighbors (cf. Goodchild and Mark 1987). The Morton ordering of cells is used to organize them into a number of local groups such that each group represents a task that is submitted to a Grid resource. The advantage of using a space-filling curve is that it maintains data locality: cells that are close together along the curve tend to be close together in space. This locality property can be used to reduce search and helps to define the set of tasks that are distributed to Grid resources. Task scheduling also takes account of the characteristics of the network (Figure 2) and Grid middleware (Figure 3) for load-balancing purposes.
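The decomposition described above can be sketched as follows. This is a simplified Python illustration, not the authors' implementation, and all function names are hypothetical. It assumes a regular quadtree refined uniformly to a $2^k \times 2^k$ grid of leaf cells, cuts the Morton-ordered sequence of cells into equal-sized contiguous groups (ignoring point counts and resource characteristics), and assumes cells are at least $d$ wide, so that the 8-connected window around a cell contains every point within distance $d$ of any point in that cell.

```python
def morton_code(col: int, row: int, bits: int) -> int:
    """Interleave the bits of (col, row) into a Morton (Z-order) key:
    column bits go to even bit positions, row bits to odd positions."""
    code = 0
    for b in range(bits):
        code |= ((col >> b) & 1) << (2 * b)
        code |= ((row >> b) & 1) << (2 * b + 1)
    return code


def cell_of(x: float, y: float, extent: float, bits: int) -> tuple[int, int]:
    """Leaf cell (col, row) containing a point in [0, extent] x [0, extent]."""
    side = 1 << bits
    size = extent / side
    return min(int(x // size), side - 1), min(int(y // size), side - 1)


def morton_tasks(bits: int, n_tasks: int) -> list[list[tuple[int, int]]]:
    """Order the 2**bits x 2**bits leaf cells along a Morton curve and cut
    the sequence into n_tasks contiguous groups (one group per task)."""
    side = 1 << bits
    cells = [(c, r) for r in range(side) for c in range(side)]
    cells.sort(key=lambda cell: morton_code(cell[0], cell[1], bits))
    size, extra = divmod(len(cells), n_tasks)
    tasks, start = [], 0
    for i in range(n_tasks):
        end = start + size + (1 if i < extra else 0)
        tasks.append(cells[start:end])
        start = end
    return tasks


def window_cells(col: int, row: int, side: int) -> list[tuple[int, int]]:
    """A cell plus its (up to 8) 8-connected neighbours: the moving window
    within which distances for points in (col, row) are computed."""
    return [
        (c, r)
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
        if 0 <= c < side and 0 <= r < side
    ]
```

With the [0, 100] × [0, 100] extent of the experimental dataset and $k = 4$, for example, cells are 6.25 units wide, which exceeds the largest $d$ (6) used in the experiments. Calling morton_tasks(4, 12) would then yield twelve groups of 21 or 22 cells; because consecutive Morton codes tend to be close in space, each group covers a fairly compact region.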
Performance

A set of computational experiments was designed to evaluate the application-specific Grid middleware that is used to compute $G_i^*(d)$ for the experimental dataset (Figure 4). These experiments emulate a situation in which the number of $d$ values considered is incremented to meet data exploration needs. In Table 2, for example, each column represents one independent exploratory scenario. To create a basis for comparison, a sequential algorithm for computing $G_i^*(d)$ was implemented first; its running time for each scenario is shown in Table 2. For the same exploratory scenarios, the running time of the parallel algorithm based on the application-specific Grid middleware is shown in Table 3.
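For readers who wish to experiment, the Python sketch below is one straightforward sequential implementation of Equation 1; it is not the authors' code. It assumes Euclidean distances, binary distance-band weights with $w_{ii}(d) = 1$ (so that $S_{1i} = W_i$), and an $s^2$ computed with divisor $n$; reading "sample variance" with divisor $n - 1$ is an alternative. Where the denominator vanishes, for example when every point lies within $d$ of point $i$, the sketch returns NaN. For a single $d$ it performs $n^2$ distance comparisons, and it must be rerun for each $d$ in an exploratory scenario.

```python
import math


def gi_star(points, values, d):
    """Return G_i*(d) for every point, using brute-force pairwise distances.

    points: sequence of (x, y) tuples; values: response values x_j, same length.
    Weights are binary: w_ij(d) = 1 if point j lies within distance d of
    point i (including j == i, so w_ii(d) = 1), and 0 otherwise.
    """
    n = len(points)
    if n < 2 or n != len(values):
        raise ValueError("need at least two points and one value per point")
    mean = sum(values) / n
    # Assumption: s^2 uses divisor n (n - 1 is another reading of "sample variance").
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    result = []
    for px, py in points:
        w_sum = 0     # W_i = sum_j w_ij(d)
        wx_sum = 0.0  # sum_j w_ij(d) * x_j
        for (qx, qy), xj in zip(points, values):
            if math.hypot(px - qx, py - qy) <= d:
                w_sum += 1
                wx_sum += xj
        s1 = w_sum  # S_1i = sum_j w_ij(d)^2, which equals W_i for 0/1 weights
        denom_sq = (n * s1 - w_sum ** 2) / (n - 1)
        if s == 0 or denom_sq <= 0:
            # Undefined: constant data, or every point is within d of point i.
            result.append(math.nan)
        else:
            result.append((wx_sum - w_sum * mean) / (s * math.sqrt(denom_sq)))
    return result
```

A scenario such as $d = 2, 3, 4$ would call the function once per distance, e.g. {d: gi_star(points, values, d) for d in (2, 3, 4)}; this repeated pairwise work is the cost that the Grid-based decomposition is designed to spread across resources.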

Table 2. Computing Time Required to Run a Sequential $G_i^*(d)$ Algorithm on a Single Processor of the Head Node of the rtgrid.its.uiowa.edu Cluster (the Best Computing Resource in the Grid Testbed) Using Several d Values as Input

d values used to compute $G_i^*(d)$   2    2, 3    2, 3, 4    2, 3, 4, 5    2, 3, 4, 5, 6
Computing time (seconds)              …    …       …          …             …

Table 3 also shows the number of tasks computed by individual Grid resources as well as the speed-up of the Grid-based algorithm over the sequential algorithm. The number of tasks allocated to each of the three resources in our Grid testbed depends not only on the resource hardware configurations (Table 1) but is also inversely proportional to the Grid middleware overhead that is added when remote computing and data transfers are involved. For example, although the resource beowulf.stat.uiowa.edu has better CPU performance than the resource cluster0.its.uiowa.edu, more tasks are allocated to cluster0 than to beowulf. This is because the PBS middleware used on cluster0 contributes less overhead than the Condor middleware used on the beowulf cluster; in our computational experiments, the difference in middleware overhead outweighed the difference in hardware configurations. It is evident that greater speed-ups are achieved for the more computationally intensive exploratory scenarios (i.e., as more d values are explored): as exploratory analyses become more computationally intensive, the overhead from Grid middleware components becomes proportionally smaller relative to the performance gain afforded by parallelism.

Future Work: Creating Problem-Solving Environments Using Grid Portals

During the mid-1990s, Gallopoulos, Houstis, and Rice (1994) developed the concept of a problem-solving environment (PSE): a computational system that provides a set of high-level tools for solving problems.
In particular, a PSE should provide transparent access to heterogeneous distributed computing resources (Laforenza 2002) and allow users to define and modify problems, choose solution strategies, visualize and analyze results, and record and coordinate extended problem-solving tasks. Though this view resembles many garden-variety GIS and statistical software environments, we are further concerned with geographical problems that require significant computing, high-end cartographic visualization tools, and interaction among groups of individuals (Armstrong 1994; Jankowski and Nyerges 2001; MacEachren and Brewer 2004). This general class of problems is ideally suited for Grid environments. Ramakrishnan et al. (2002), for example, describe five PSE examples, two of which have significant geographical components that involve computationally intensive models, mapped output, and input from domain experts in several fields.

The development of domain-specific PSEs is regarded as an important focus of Grid computing research and development. For example, the OGCE (Open Grid Computing Environments) project, supported by the National Science Foundation Middleware Initiative, was established in 2003 to foster collaborations and promote the development of sharable and reusable components within the Grid portal development community (Gannon et al. 2004). The OGCE focuses on establishing a repository of portal service components that can be used to develop domain-specific PSEs.

Table 3. Computing Time and Speedup of the $G_i^*(d)$ Algorithm Based on Application-Specific Grid Middleware

                                             d = 2   d = 2, 3   d = 2, 3, 4   d = 2, 3, 4, 5   d = 2, 3, 4, 5, 6
Number of tasks scheduled on each Grid resource:
  cluster0.its.uiowa.edu                     …       …          …             …                …
  beowulf.stat.uiowa.edu                     …       …          …             …                …
  rtgrid1.its.uiowa.edu                      …       …          …             …                …
Computing time (seconds)                     …       …          …             …                …
Speedup                                      …       …          …             …                …
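Table 3 reports speedup as the ratio of sequential to Grid running time. The trend described in the Performance section, in which fixed middleware overhead matters less as the workload grows, can be illustrated with a deliberately simple model that is our assumption rather than a fit to the reported timings: Grid running time equals a fixed overhead plus the sequential time divided by the number of resources, all of which are treated as equally fast.

```python
def speedup(t_seq: float, overhead: float, p: int) -> float:
    """Speedup under an assumed model T_par = overhead + t_seq / p, where
    `overhead` lumps middleware start-up and data-transfer costs together
    and the p resources are treated as equally fast."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return t_seq / (overhead + t_seq / p)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the trend, not measured values.
    for t_seq in (100.0, 1_000.0, 10_000.0):
        print(f"t_seq={t_seq:>8.0f} s  speedup={speedup(t_seq, 25.0, 4):.2f}")
```

With a hypothetical 25-second overhead and four resources, the modeled speedup rises from 2.00 to 3.64 and then 3.96 as the sequential workload grows from 100 to 10,000 seconds, approaching but never reaching 4.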

Increasingly, Grid applications software is accessed through high-level Grid portals that employ GUIs and Web service protocols to support software component composition. Portals hide implementation-level details so that users can concentrate on problem-solving tasks. Most computer users routinely access Web portals; for example, several news, shopping, and search portals are in wide use (e.g., Google), and, in the spatial domain, the Geodata portal is designed to serve as a data clearinghouse to help users find data resources with a minimum of effort. In fact, a design goal of some portals is that a user should be no more than two clicks away from needed information after entering the portal. Similarly, the general set of capabilities for a geographical problem-solving environment (GeoPSE) can be implemented using a Web portal. In the case of a GeoPSE, however, additional access to a set of specialized tools and data will be required. At the outset, a GeoPSE portal would need to be tuned narrowly to one or more applications, though, as time progresses, we envision generic GeoPSE portals that will provide access to a broad class of geographic information services. A point-based GeoPSE will employ generic domain decomposition middleware similar to the quadtree approach described by Wang and Armstrong (2003). Localized tasks will be processed on distributed resources using application-specific software that is organized into classes, such as spatial statistics and interpolation (see Figure 5). Each class is further refined to the point of a user selecting parameters such as the d used in our illustrative example, or the k parameter when k-closest neighbors are used by interpolation algorithms to create surfaces. We are currently developing a portal for a GeoPSE that is designed to support not only spatial statistics but also other types of computationally intensive analyses, such as interpolation and Markov chain Monte Carlo methods.

Figure 5A. GeoPSE (geographical problem-solving environment) computational service selection interface.
Figure 5B. Selection of domain decomposition and task scheduling services.

Concluding Discussion

Geographers and others involved in the use of computer technology to address complex environmental, social, and economic problems must be knowledgeable about changes in the state of the practice in computing. At present, a sea change is taking place in the way that computational resources are accessed. Grid computing, which is enabled by a class of software called middleware, is markedly different from the high-performance computing environments of the previous decade. Though middleware that is tuned to a specific class of applications in geography remains to be fully developed, our initial attempts demonstrate that Grid technologies can be employed successfully to address a spatial statistics problem that shares several basic characteristics with other point-based analyses such as interpolation and optimization. We expect that middleware can be designed and implemented to handle flexible allocations of point data to distributed processors in the near term, since our initial experience with quadtree-based decomposition appears to show considerable promise (Wang and Armstrong 2003). Though a considerable amount of work remains to be completed before a fully realized generic GeoPSE is available to support the analysis of point data, we are optimistic about its prognosis.
Note

1. A terabyte is a measure of computer storage capacity and is $2^{40}$ bytes, or approximately a thousand billion bytes (a thousand gigabytes). A petabyte is $2^{50}$ bytes, or approximately a thousand terabytes.

Literature Cited

Armstrong, Marc P. 1994. Requirements for the development of GIS-based group decision support systems. Journal of the American Society for Information Science 45 (9).
Armstrong, Marc P. 2000. Geography and computational science. Annals of the Association of American Geographers 90.
Armstrong, Marc P., Claire E. Pavlik, and Richard Marciano. 1994. Experiments in the measurement of spatial association using a parallel supercomputer. Geographical Systems 1 (4).
Armstrong, Marc P., Ningchuan Xiao, and David Bennett. 2003. Using genetic algorithms to create multicriteria class intervals for choropleth maps. Annals of the Association of American Geographers 93.
Bell, Gordon, and James Gray. 1997. The revolution yet to happen. In Beyond calculation: The next fifty years of computing, ed. Peter Denning and Robert Metcalfe. New York: Springer-Verlag.
Borkar, Shekhar. 2003. Getting gigascale chips: Challenges and opportunities in continuing Moore's law. ACM Queue 1 (7).
Carriero, Nicholas, and David Gelernter. 1990. How to write parallel programs: A first course. Cambridge, MA: MIT Press.
Cowles, M. K. 2003. Efficient model-fitting and model-comparison for high-dimensional Bayesian geostatistical models. Journal of Statistical Planning and Inference 112.
Cramer, Barton E., and Marc P. Armstrong. 1999. An evaluation of domain decomposition strategies for parallel spatial interpolation of surfaces. Geographical Analysis 31 (2).
Foster, Ian. 2002. The Grid: A new infrastructure for 21st century science. Physics Today, February.
Foster, Ian, and Carl Kesselman. 1999. The Grid: Blueprint for a new computing infrastructure. San Francisco: Morgan Kaufmann Publishers.
Foster, Ian, Carl Kesselman, Jeffrey Nick, and Steve Tuecke. 2002. Grid services for distributed system integration. IEEE Computer 35 (6).
Foster, Ian, Carl Kesselman, and Steve Tuecke. 2001. The anatomy of the Grid: Enabling scalable virtual organizations. International Journal of High-Performance Computing Applications 15 (3).
Gallopoulos, Efstratios, Elias Houstis, and John R. Rice. 1994. Computer as thinker/doer: Problem-solving environments for computational science. IEEE Computational Science and Engineering 2.
Gannon, Dennis, Geoffrey Fox, Marlon Pierce, Beth Plale, Gregor von Laszewski, Charles Severance, Joseph Hardin, Jay Alameda, Mary Thomas, and John Boisseau. 2004. Grid portals: A scientist's access point for Grid services (draft 1). Global Grid Forum technical document, available at: extreme.indiana.edu/gannon/ggf-portals-draft.pdf (last accessed 16 August 2004).
Gao, Guang R., Kevin B. Theobald, Ziang Hu, Haiping Wu, Jizhu Lu, Keshav Pingali, Paul Stodghill, Thomas L. Sterling, Rick Stevens, and Mark Hereld. 2002. Next generation system software for future high-end computing systems. International Parallel and Distributed Processing Symposium, Fort Lauderdale, FL.
Getis, Arthur, and J. K. Ord. 1992. The analysis of spatial association by use of distance statistics. Geographical Analysis 24.
Goodchild, Michael, and David Mark. 1987. The fractal nature of geographic phenomena. Annals of the Association of American Geographers 77.
Griffith, Daniel A. 1990. Supercomputing and spatial statistics: A reconnaissance. The Professional Geographer 42.
Jankowski, Piotr, and Timothy Nyerges. 2001. Geographic information systems for group decision making. London: Taylor & Francis.
Laforenza, Domenico. 2002. Grid programming: Some indications where we are headed. Parallel Computing 28.
Licklider, Joseph C. R. "Lick," and Robert Taylor. 1968. The computer as a communication device, available at: … (last accessed 16 August 2004).
MacEachren, Alan, and Isaac Brewer. 2004. Developing a conceptual framework for visually-enabled geocollaboration. International Journal of Geographical Information Science 18 (1).
Moore, Gordon E. 1965. Cramming more components onto integrated circuits. Electronics 38 (8).
Morton, Guy M. 1965. A computer oriented geodetic data base and a new technique in file sequencing. Ottawa, Canada: IBM.
National Research Council. 1994. Realizing the information future: The Internet and beyond. National Academy Press, available at: readingroom/books/rtif/ (last accessed 16 August 2004).
Ord, J. Keith, and Arthur Getis. 1995. Local spatial autocorrelation statistics: Distributional issues and an application.
Geographical Analysis 27: Rajasekar, Arcot, Michael Wan, and Reagan Moore MySRB & SRB Components of a data Grid. The 11th International Symposium on High Performance Distributed Computing (HPDC-11), July 24 26, Edinburgh, U.K. Ramakrishnan, Naren, Layne Watson, Dennis Kafura, Calvin Ribbens, and Clifford Shaffer Programming environments for multidisciplinary Grid communities. Concurrency and Computation: Practice and Experience 14: Reed, Daniel A Grids, the Teragrid, and beyond. IEEE Computer 36 (1): Rokos, Demitrius, and Marc P. Armstrong Using Linda to compute spatial autocorrelation in parallel. Computers & Geosciences 22 (5): Samet, Hanan Applications of spatial data structures. Reading, MA: Addison Wesley. Smarr, Larry Grids in context. In The Grid: Blueprint for a new computing infrastructure, ed. Ian Foster and Carl Kesselman, San Francisco: Morgan Kaufman. Smarr, Larry, and Charles Catlett Metacomputing. Communications of the Association for Computing Machinery 35 (6): Sterling, Thomas, John Salmon, Donald Becker, and Daniel Savarese How to build a Beowulf. Cambridge, MA: MIT Press. Thakar, Ani, Alexander Szalay, Peter Kunszt, and James Gray Migrating a multiterabyte archive from object to relational databases. Computing in Science and Engineering 5 (5): Tomlinson, Roger F Environment Information Systems. The Proceedings of the UNESCO/IGU First Symposium on Geographical Information Systems, Ottawa, September, 1970, a publication of the International Geographical Union Commission on Geographical Data Sensing and Processing. Wang, Shaowen, and Marc P. Armstrong A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Computing 29 (10): MARC P. ARMSTRONG is a Professor in the Department of Geography and the Program in Applied Mathematical and Computational Science at The University of Iowa, Iowa City, IA marc-armstrong@uiowa.edu. 
His research interests include computational geography, cartography, mobile computing, and spatial decision support systems. MARY KATHRYN COWLES is an Associate Professor in the Department of Statistics and Actuarial Science and the Department of Biostatistics at The University of Iowa, Iowa City, IA katecowles@uiowa.edu. Her research interests include Bayesian methods in biostatistics and environmental science, Markov chain Monte Carlo algorithms and convergence assessment, and statistical computing. SHAOWEN WANG is an Associate Research Scientist in the Academic Technologies Research Services Division of Information Technology Services and an adjunct Assistant Professor in the Department of Geography at The University of Iowa, Iowa City, IA shaowen-wang@uiowa.edu. His research interests include Grid computing, parallel and distributed computing, computationally intensive geographic analysis, and problem-solving environments.