
Databases: towards performance and scalability
Bibliographical study

Silviu-Marius Moldovan
Supervisors: Gabriel Antoniu, Luc Bougé
INSA, IFSIC, IRISA, Paris Project
February 2008

Keywords: databases, grids, scalability, performance, fault tolerance, consistency.

Contents

1 Introduction
2 Getting efficient: main-memory databases
  2.1 Main-memory databases: a few new issues
  2.2 Case study: The DERBY data storage system
3 Getting scalable: distributed main-memory databases
  3.1 Motivation
  3.2 Case study: Sprint
4 Databases and grids
  4.1 Motivation
  4.2 Grid data sharing services
    4.2.1 DSM systems
    4.2.2 P2P systems
    4.2.3 A prototype grid data-sharing service: the JuxMem platform
5 Conclusion

1 Introduction

The most common way to store data nowadays relies on databases. Almost every type of application uses them, ranging from those performing intense scientific computation to those storing the personal data of the employees of a factory. As time goes by, databases will get bigger and more difficult to manage. The amount of data to be stored will be larger, and we will face the problem of lacking storage space, even though disk storage capacity is growing at a reasonable rate. Because database data is disk-resident, access will become even slower as the database grows.

In-memory databases are one way to increase access performance. In this approach, the data is stored in main memory and a backup copy may be kept on disk. Since memory access time is smaller than disk access time, the overall access time can be reduced. But the storage capacity of the database will still be limited to the memory of a single machine.

A simple approach to extend the database and increase the storage capacity is to use distributed databases. Thus, one can use the storage capacity and the computing power of the different nodes in the system. Of course, supplementary problems might arise. For example, one might want, for efficiency, to replicate a piece of data and to keep several copies of it on different nodes. The coherence of the multiple copies must then be assured; for that, the underlying infrastructure of the database must provide efficient consistency protocols.

Distributed shared memory (DSM) systems only deal with unstructured data, in contrast to databases, but they provide consistency protocols for the mutable (modifiable) data that is shared. Also, data sharing and transfers are done in a transparent way, without involving the user. The resources in such a system are trusted and under a central authority. The main limitation of DSM systems is that they are designed for small-scale configurations (just a few tens of nodes) in static environments, where nodes are not supposed to leave or join the system. Peer-to-peer (P2P) systems, on the other hand, offer large-scale (millions of nodes) data sharing, in a completely decentralized way. The configuration is dynamic, with nodes joining and leaving the system all the time; thus, fault-tolerance mechanisms are provided. As opposed to DSM systems, only immutable data can be shared, in order not to affect the system's scalability.

A hybrid design between DSM and P2P systems could harness the advantages the two types of systems provide. Such an approach can be used for a grid data-sharing service. Grid systems, infrastructures of heterogeneous resources interconnected by networks, were born from the need for larger computing power and storage capacity. Even though grids were designed in the context of scientific applications, like numerical simulations, a data-sharing service for grids seems a good infrastructure for the design of a scalable, high-performance database.

The rest of the report is structured as follows: Section 2 presents main-memory databases and their advantages; Section 3 draws attention to the importance of distributed databases; Section 4 explores the possibility of using a grid infrastructure to implement a database; finally, Section 5 concludes on the main points of this report and defines the subject of the internship.

2 Getting efficient: main-memory databases

Storing large databases in memory has become a reality nowadays, since memory is getting cheaper and chip capacity keeps increasing. In main-memory database systems, data is stored in main physical memory, as opposed to the conventional approach in which data is disk-resident. Even if the data has a copy on disk, the primary copy lives permanently in memory, and this has a great influence on the design and performance of this type of database system.

2.1 Main-memory databases: a few new issues

The properties that distinguish main memory from magnetic disks lead to optimizations in most aspects of database management. Memory-resident data has an impact on several functional components of database management systems, as shown in [10]. These new issues are illustrated below.

Concurrency control. The most commonly used concurrency control methods in practice are lock-based. In conventional database systems, where data are disk-resident, small locking granules (fields or records) are chosen to reduce contention. In main-memory database systems, on the other hand, contention is already low because data are memory-resident. Thus, for these systems it has been suggested that very large lock granules (e.g., relations) are more suitable. In the extreme case, the lock granule can be the entire database, which leads to a desirable, serial execution of transactions. In a conventional system, locks are implemented through a hash table that contains entries for the locked objects; the objects themselves, located on disk, contain no lock information. In main-memory systems, in contrast, a few bits in the objects themselves can represent the lock status. The overhead of a small number of bits is not significant for records of reasonable length, and lookups through the hash table are avoided.

Persistence. The first step to achieve persistence in a database system is to keep a log of transaction activity. This log must reside on stable storage; in conventional systems, it is copied directly to disk. But logging can affect response time, as well as throughput if the log becomes a bottleneck. These two problems exist in main-memory systems too, but their impact on performance is bigger, since logging is the only disk operation required by each transaction. To eliminate the response-time problem, one solution is to use a small amount of stable main memory to hold a portion of the log (the log tail); a transaction is considered committed as soon as its log information is written into stable memory. If there is not enough stable memory for the log tail, the solution is to pre-commit transactions: a transaction's locks are released as soon as its log record is placed in the log, without waiting for the information to be propagated to disk. To relieve a log bottleneck, group commits can be used: the log records of several transactions are allowed to accumulate in memory, and they are flushed to disk in a single disk operation. The second step in achieving persistence is to keep the disk-resident copy of the database up to date. This is done by checkpointing, which is one of the few reasons to access the disk-resident copy of the database in a main-memory database system. Thus, disk access can be tailored to the needs of the checkpointer: disk I/O is performed using very large blocks, which are written more efficiently.
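Before turning to the third step (recovery), here is a minimal C sketch of the group-commit idea from the logging discussion above. It is an illustration only, with hypothetical names and sizes: an ordinary log file stands in for stable storage, and the acknowledgment bookkeeping a real system would need is reduced to a comment.

/* Group commit, minimal sketch: accumulate commit records in an
 * in-memory log tail and flush a whole batch with one disk write. */
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#define LOG_TAIL_SIZE (64 * 1024)  /* in-memory log tail */
#define GROUP_SIZE    16           /* flush after this many commits */

static char   log_tail[LOG_TAIL_SIZE];
static size_t tail_used;
static int    pending_txns;
static int    log_fd = -1;

static void flush_group(void)
{
    (void)write(log_fd, log_tail, tail_used);  /* one sequential write for the group */
    fsync(log_fd);
    /* ...acknowledge every transaction in the group here... */
    tail_used = 0;
    pending_txns = 0;
}

/* Append one transaction's commit record. If the tail were kept in
 * stable (battery-backed) memory, the transaction could be reported
 * committed right here; with pre-commit, its locks can be released
 * now, while the client acknowledgment waits for the flush. */
void log_commit_record(const void *rec, size_t len)
{
    if (tail_used + len > LOG_TAIL_SIZE)
        flush_group();                   /* never overflow the tail */
    memcpy(log_tail + tail_used, rec, len);
    tail_used += len;
    if (++pending_txns >= GROUP_SIZE)
        flush_group();
}

int main(void)
{
    log_fd = open("txn.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    for (int i = 0; i < 64; i++)
        log_commit_record("COMMIT;", 7);
    if (pending_txns > 0)
        flush_group();                   /* flush the final partial group */
    close(log_fd);
    return 0;
}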

The third step to achieve persistence is recovery from failures, which implies restoring the data from its disk-resident backup. In main-memory systems, one solution that increases performance is to load blocks of the database on demand until all data has been loaded. Another solution is to use disk striping: the database is spread across multiple disks and read in parallel.

Access methods. In conventional systems, index structures like B-Trees are used for data access. They have a short, bushy structure, and the data values on which the index is built need to be stored in the index itself. In main-memory systems, hashing is one way to access data: it provides fast lookup and update, but is not as space-efficient as a tree. Thus, tree structures such as the T-Tree have been designed explicitly for memory-resident databases [12]. As opposed to B-Trees, main-memory trees can have a deeper, less complicated structure, since traversing them is much faster. Also, because random access is fast in main memory, index structures can store pointers to the indexed data, rather than the data itself, which is more efficient.

Data representation. In main-memory databases, pointers are used for data representation: relational tuples can be represented as sets of pointers to data values. Using pointers has two main advantages: it is space-efficient (if large values appear more than once in the database) and it simplifies the handling of variable-length fields.

Query processing. In disk-resident systems, the techniques used for query processing take advantage of sequential access, which is faster on disk. In memory-resident databases, on the other hand, sequential access is not much faster than random access. The common approach to speed up queries, in this case, is to build appropriate, compact structures for the in-memory data. Query processors for conventional systems aim to minimize disk accesses, while query processors for memory-resident systems are designed to reduce processing costs.

Performance assessment. The metrics used in performance analysis differ between disk-resident and memory-resident systems. The performance of disk-based systems depends on the characteristics of the disk; these systems count I/O operations to determine the performance of an algorithm. The performance of main-memory databases, on the other hand, depends mostly on processing time. The components taken into account when assessing performance differ, too. For example, in conventional systems making backups has little impact on performance. This contrasts with memory-resident systems, where backups are more frequent and involve writing to devices slower than memory; the performance of checkpointing algorithms is therefore much more critical.

Application Programming Interface. In conventional systems, applications exchange data with the database through private buffers. In memory-resident systems, these buffers can be eliminated and transactions given direct access to objects. For security reasons, however, only transactions compiled by a special database system compiler (which checks for proper authorization) should be run. Another possible optimization is to give applications that want to read an object the actual memory position of the object, instead of the more general object identifier used in conventional systems. This provides more efficient object access.
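As a small illustration of the pointer-based data representation described above, the following C sketch (a hypothetical layout, not taken from any cited system) stores each distinct value once in a shared pool and represents tuples as arrays of pointers into that pool:

/* Pointer-based tuple representation (illustrative sketch). */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* A shared pool of values: each distinct value is stored once. */
static const char *value_pool[1024];
static size_t pool_len = 0;

/* Intern a value: return a pointer to the single stored copy. */
static const char *intern(const char *v)
{
    for (size_t i = 0; i < pool_len; i++)
        if (strcmp(value_pool[i], v) == 0)
            return value_pool[i];              /* reuse the existing copy */
    return (value_pool[pool_len++] = strdup(v));
}

/* A relational tuple is just an array of pointers to values;
 * a variable-length field costs nothing extra in the tuple itself. */
struct tuple { const char *field[4]; };

int main(void)
{
    struct tuple t1 = {{ intern("Smith"), intern("Paris"), intern("sales"), NULL }};
    struct tuple t2 = {{ intern("Jones"), intern("Paris"), intern("sales"), NULL }};

    /* "Paris" is stored once; both tuples point at the same bytes. */
    printf("shared value: %s\n", t1.field[1] == t2.field[1] ? "yes" : "no");
    return 0;
}

Because both tuples point at the same copy of "Paris", a value repeated across the database costs its storage only once, and an index over such tuples can likewise hold pointers rather than copies of the values.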

Data clustering and migration. In disk-resident systems, it is common to store together (cluster) data objects that are accessed together. In memory-resident systems, by contrast, not only is there no need to cluster objects, but the components of one object may be spread across memory. However, problems specific to main-memory databases arise, concerning when and where an object should be migrated to disk. As a solution, either the user specifies how objects are clustered, or the system determines the clusters automatically (dynamic clustering).

2.2 Case study: The DERBY data storage system

The DERBY data storage system, described in [11], is used to support a distributed memory-resident database system. DERBY consists of general-purpose workstations, connected through a high-speed network, and makes use of them to create a massive, cost-effective main-memory storage system. All applications (including the DERBY database application) executed by users of a workstation are clients of the DERBY storage management system. Servers execute on machines with idle resources and provide the storage space for the DERBY memory storage system; they have the lowest priority and are "guests" of the machines whose resources they borrow. One important advantage of DERBY is that, at any moment, each workstation can operate as a client, a server, both, or neither, and this may change over time. Another advantage is that the system configuration models a dynamic, realistic data processing environment where workstations come and go over time.

The primary role of a DERBY client is to forward read/write requests from the database application to the DERBY server where the record is located. DERBY's basic data storage abstraction assumes there is exactly one server responsible for each record. The server where a record is stored is called the primary location of the record; this location may change over time, due to the dynamic nature of the system. The primary role of a server in DERBY is to keep all records memory-resident and to avoid disk accesses when satisfying client requests. Servers guarantee long-term data persistence (they eventually propagate modified records to disk), but also short-term persistence without disk storage. The latter is achieved through a new and interesting approach: some of the nodes in the network are supplied with Uninterruptible Power Supplies (UPSs) which temporarily hold data. This is a cost-effective way to obtain reasonably large amounts of persistent storage. To provide high-speed persistent storage, each server is associated with a number of workstations equipped with these UPSs.

To guarantee data consistency and regulate concurrent access, DERBY provides a basic locking mechanism, with a lock granularity of a single record. This is quite unusual, since large granularities are considered most appropriate for memory-resident systems. Every client must hold locks for the records it operates on. For each record, the server maintains a list of the clients caching the record read-only and of those holding it write-locked.

One important contribution of DERBY is that, during initial testing, load balancing was shown to have little effect on the performance of memory-based systems, in contrast to disk-based storage systems. Memory-based systems need to consider migrating load only when a limited resource is near saturation. Thus, DERBY dynamically redistributes the load via a saturation-prevention algorithm that differs from the conventional approach to load balancing. The algorithm attempts to optimize processing load constrained by memory space availability; more exactly, it looks for the first acceptable distribution of load that does not exceed a predefined threshold of available memory space on any machine. This way, the servers are kept from reaching their saturation points.
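The details of DERBY's algorithm are beyond this summary, but the following C fragment sketches one reading of the stated idea (all names and thresholds are invented): instead of balancing load evenly, the system accepts the first placement that keeps every server below a memory-occupancy threshold.

/* Saturation-prevention sketch (a reading of the idea, not DERBY's code):
 * move load off a server only when it nears its memory threshold, and
 * accept the first target that stays below the threshold afterwards. */
#include <stdio.h>

#define NSERVERS  4
#define THRESHOLD 0.90   /* fraction of memory considered "saturated" */

struct server { double used, capacity; };

/* Return the index of the first server that can absorb `amount`
 * without crossing the threshold, or -1 if none can. */
int first_acceptable(const struct server s[], int n, double amount, int src)
{
    for (int i = 0; i < n; i++) {
        if (i == src) continue;
        if ((s[i].used + amount) / s[i].capacity < THRESHOLD)
            return i;   /* first acceptable placement, not the best one */
    }
    return -1;
}

int main(void)
{
    struct server s[NSERVERS] = {
        { 950, 1000 }, { 400, 1000 }, { 700, 1000 }, { 100, 1000 }
    };
    int src = 0;                 /* server 0 is near saturation      */
    double chunk = 200;          /* amount of records to migrate, MB */
    int dst = first_acceptable(s, NSERVERS, chunk, src);
    if (dst >= 0)
        printf("migrate %.0f MB from server %d to server %d\n", chunk, src, dst);
    return 0;
}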

Figure 1: The DERBY architecture (node 5 serves as the UPS for nodes 1, 2, 3 and 4).

3 Getting scalable: distributed main-memory databases

3.1 Motivation

Applications accessing an in-memory database (IMDB) are usually limited by the memory capacity of the machine hosting it. By distributing the database over a cluster of workstations, the application takes advantage of the aggregated memory of all the machines in the cluster, and thus the available storage capacity is larger. Also, by distributing the database over more sites, fault tolerance can be achieved: copies of the same piece of data can be replicated over multiple sites, so that if one node fails, the data can be recovered from the others. Failure recovery is also efficient, since restoring a failed node uses the backup copies of the data closest to that node.

3.2 Case study: Sprint

Sprint is a middleware infrastructure for high-performance and high-availability data management. It manages commodity in-memory databases running in a cluster of shared-nothing servers, as stated in [8]. Applications are then limited only by the aggregated capacity of the servers in the cluster. The hardware infrastructure of Sprint consists of its physical servers, while the software infrastructure consists of logical servers: edge servers, data servers and durability servers. The advantage of this decoupled design is that each type of server can handle a different aspect of database management. Edge servers receive client queries and execute them against data servers. Data servers run a local in-memory database and execute transactions without accessing the disk. Durability servers ensure transaction persistency and handle recovery. Sprint partitions and replicates the database into segments and stores them on several data servers.
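To make the partitioning concrete, here is a small, hypothetical C sketch of the kind of key-to-server mapping an edge server could apply. Sprint's actual fragmentation scheme is defined over SQL tables and is not detailed in this report; simple hash partitioning is assumed here for illustration.

/* Hypothetical key-to-data-server routing, as an edge server might do it. */
#include <stdio.h>

#define NDATASERVERS 4

/* Hash partitioning: each key belongs to exactly one segment,
 * and each segment has one primary data server. */
unsigned data_server_for(unsigned long key)
{
    return (unsigned)(key % NDATASERVERS);
}

int main(void)
{
    unsigned long keys[] = { 17, 42, 1003, 7 };
    for (int i = 0; i < 4; i++)
        printf("key %lu -> data server %u\n", keys[i], data_server_for(keys[i]));
    return 0;
}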

Figure 2: The Sprint architecture.

Physical servers communicate by message-passing only, while logical ones can use point-to-point or total-order multicast communication. Physical servers can fail by crashing and may recover after the failure, but lose all information stored in main memory before the crash. The failure of a physical server implies the failure of all the logical servers it hosts. Sprint tolerates unreliable failure detection. The advantages of this are fast reaction to failures and the certainty that failed servers are eventually detected by operational servers. The limitation of this approach is illustrated by the case of an operational server that is mistakenly suspected to have failed. But even if that happens, the system remains consistent: the falsely suspected server is replaced by another one, and the two may exist simultaneously for some time.

Sprint guarantees the traditional ACID (atomicity, consistency, isolation, durability) properties of transactions. The system distinguishes between two types of transactions: local transactions, which only access data stored on a single data server, and global transactions, which access data on multiple servers. Database tables are partitioned over the data servers. Data items can be replicated on multiple data servers, which brings several benefits. First of all, the failure-recovery mechanism is more efficient. Then, replication allows parallel execution of read operations, even though it makes write operations more costly, since all replicas of a data item must be modified.

One important contribution of Sprint is its approach to distributed query processing. In traditional parallel database architectures, high-level client queries are translated into lower-level internal requests. In Sprint, however, a middleware solution is adopted: external SQL queries are decomposed into internal ones, according to the way the database is fragmented and replicated. Also, the distributed query decomposition and merging are simple, since Sprint was designed for multi-tier architectures, where transactions are predefined and parameterized prior to execution.
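Building on the same hypothetical hash partitioning as in the previous sketch, the fragment below shows how a middleware layer might classify a transaction as local or global from the set of data servers its accesses touch. This is again an illustration, not Sprint's code: the real decomposition operates on predefined, parameterized SQL statements.

/* Classify a transaction as local or global from the keys it accesses. */
#include <stdio.h>

#define NDATASERVERS 4

static unsigned data_server_for(unsigned long key)
{
    return (unsigned)(key % NDATASERVERS);
}

/* A transaction is local iff all its keys map to one data server. */
int is_local(const unsigned long *keys, int nkeys, unsigned *server_out)
{
    unsigned first = data_server_for(keys[0]);
    for (int i = 1; i < nkeys; i++)
        if (data_server_for(keys[i]) != first)
            return 0;           /* global: needs ordered multicast  */
    *server_out = first;
    return 1;                   /* local: one point-to-point message */
}

int main(void)
{
    unsigned long t1[] = { 4, 8, 12 };   /* all map to server 0 */
    unsigned long t2[] = { 4, 5 };       /* servers 0 and 1     */
    unsigned s;
    printf("t1 is %s\n", is_local(t1, 3, &s) ? "local" : "global");
    printf("t2 is %s\n", is_local(t2, 2, &s) ? "local" : "global");
    return 0;
}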

Experiments conducted on clusters of up to 64 nodes (among which 32 data servers) showed that Sprint can provide very good performance and scalability.

4 Databases and grids

4.1 Motivation

As seen above, the development of database systems is oriented towards more storage capacity and better performance. The user of the database should not know, nor care, where his data are. Grid computing has emerged as a response to the growing demand for resources. A grid system seems to offer the necessary infrastructure for storing databases: grids offer much larger storage capacities, spread over many nodes, which could allow larger volumes of data to be stored. But new problems arise, related to grid infrastructures. A grid system is composed of many hosts, from many administrative domains, with heterogeneous resource capabilities (computing power, storage, operating system). Also, as the number of nodes grows, so does the number of failures. Thus, special care must be taken when managing data in such a system. Grids have been developed in the context of high-performance computing (HPC) applications, like numerical simulations; their use has not been explored in the context of databases. Our goal is to explore this possibility.

4.2 Grid data sharing services

A data-sharing service for grid computing opens an alternative approach to the problem of grid data management, traditionally based on explicit data transfers. This concept decouples data management from grid computation, by providing location transparency as well as data persistence in a dynamic environment. The ultimate goal of such a system, as stated in [4], is to provide transparent access to data. The most widely used approach to data management in grid environments is explicit data transfers, but this is a big limitation to using a large-scale computational grid efficiently, since the client has to specify where the input data is located and where it has to be transferred. With transparent access to remote data through an external data-sharing service, the client neither handles data transfers nor cares where his data are.

There are three other properties that a data-sharing service provides. First of all, it provides persistent data storage, to save data transfers: since large masses of data are handled, transfers between different grid components can be costly, and it is desirable to avoid repeating them. Solutions to this problem reuse previously produced data, trigger pre-fetching actions (to anticipate future accesses), or provide information on data location to the task scheduler. Second, the service is fault-tolerant: data remains available despite events caused by the dynamic character of the grid infrastructure, such as resources joining and leaving, or unexpected failures. Thus, replication techniques and failure-detection mechanisms must be applied. Finally, the consistency of data is ensured: since data manipulated by grid applications are mutable, and data are often replicated to enhance access locality, the consistency of the different replicas must be maintained. To achieve this, the service relies on consistency models, implemented by consistency protocols.

Building a data-sharing service for the grid requires a new approach to the design of consistency protocols, since previous work (in the context of DSM systems) assumes a small-scale, stable architecture without failures, an assumption which does not hold in the grid context. According to [5], a data-sharing service for grid computing can be thought of as a compromise between DSM systems and P2P systems. A hybrid system can benefit both from the advantages provided by DSM systems and from those provided by P2P systems, while having an architecture with intermediate features. These two classes of systems are discussed below.

4.2.1 DSM systems

Distributed shared memory (DSM) systems provide, as a central feature, transparent data sharing, via a unique address space accessible to physically distributed machines, according to [5]. First, these systems provide transparent access to data: all nodes can read and write any shared data, local or remote, in a uniform way; the system internally checks for data locality and takes the appropriate action to satisfy the access. Second, DSM systems provide transparent localization: if the client program accesses remote data, it is the responsibility of the system to localize, transfer or replicate it locally, according to the consistency protocol in use. Thus, the user does not have to handle the data explicitly and does not know, nor care, where the data he is working with are located. In these systems, the nodes are generally under the control of a single administration and the resources are trusted. Also, from the point of view of the resources they offer, the nodes in the system are homogeneous.

DSM systems are typically used to support complex numerical simulation applications, where data are accessed in parallel by multiple nodes. For efficiency, data are replicated on multiple nodes. Since one great advantage of DSM systems is that they share mutable data, the consistency between the replicas of the same piece of data, distributed on different nodes, must be ensured. Thus, a large variety of DSM consistency models and protocols have been defined, which provide different compromises between the strength of the consistency guarantees and the efficiency of the consistency actions. However, these systems have drawbacks, too. They have been designed for small-scale configurations (tens or hundreds of nodes), corresponding to a cluster of computers; the main reason for this is the lack of scalability of the algorithms that handle data consistency. Also, the majority of consistency protocols assume a static environment, where nodes do not leave (by disconnecting or failing) and new nodes do not join the system. Node failures are considered infrequent and are not regarded as normal behavior. That is why DSM systems generally have no integrated mechanism for fault tolerance.

4.2.2 P2P systems

Peer-to-peer (P2P) systems have proved to be an adequate approach for data sharing on highly dynamic, large-scale configurations (millions of nodes). The environment is dynamic, with new nodes joining or leaving (because of disconnection or failure) at any time, as opposed to DSM systems. The failure of a node is considered normal behavior and, thus, fault-tolerance mechanisms are provided.

Traditional distributed systems, based on the client-server model, have limited scalability, mostly because of the bottlenecks caused by the use of a single, centralized server. In the underlying model of P2P systems, on the other hand, the relations between machines are symmetrical: each node can be a client in one transaction and a server in another, and each client node can serve the other peers by providing the shared files. Such a model has the advantage that it scales very well, without any need for a centralized storage server. By removing the above-mentioned bottlenecks, the P2P model not only enhances the system's scalability, but also improves fault tolerance and data availability.

Most P2P systems have been designed for storing and sharing only immutable files (unlike DSM systems), which is their main drawback. Also, the files shared are generally media files (music, films), which is why these systems are so popular nowadays. As opposed to DSM systems, the shared resources are heterogeneous (in terms of processors, operating systems, network links, etc.) and, generally, they present no trust guarantee or control. Most P2P systems choose to favor scalability by sacrificing data mutability, as explained in [5]. If the shared data are read-only, they are easily replicated on multiple nodes; if the data were modifiable, some mechanism would have to be provided to keep the copies consistent. But previous experiments have shown that guaranteeing consistency is a serious problem for P2P systems, since all known solutions limit the system's scalability, which is highly undesirable.

4.2.3 A prototype grid data-sharing service: the JuxMem platform

As pointed out in the previous paragraphs, DSM systems provide transparent access to data and consistency protocols, but neglect fault tolerance and scalability. P2P systems, on the other hand, provide scalable protocols and cope with volatility, but generally deal with immutable data and, thus, do not address the consistency issue. In order to take advantage of these two types of systems simultaneously, an architecture for a data-sharing service is proposed in [5] and [4].

The software architecture of JuxMem (for Juxtaposed Memory) reflects the hardware architecture of a grid: a hierarchical model consisting of a federation of distributed clusters of computers. This architecture is made up of a network of peer groups (called cluster groups), which can correspond to clusters at the physical level, to a subset of the same physical cluster, or to nodes spread over several physical clusters. All the groups, together with all the peers that run the service, are included in a larger group: the juxmem group. Each cluster group is composed of nodes that provide memory for data storage, called providers. In each cluster group, one node makes up the backbone of the network of peers and manages the available memory: the cluster manager. Finally, all the nodes that simply use the service to allocate and to access data blocks are called clients. A node may play several of these roles at the same time.

Each block of data stored in the system is replicated and associated with a group of peers called a data group, each peer in the group hosting a copy of the same data block. The data group can be made up of providers from different cluster groups; thus, the data can be replicated on several physical clusters. The JuxMem architecture is dynamic, since cluster groups and data groups can be created at run time.
In order to allocate memory, the client must specify on how many clusters the data should be replicated, and on how many nodes in each cluster. In response, a set of data replicas is instantiated and an ID is returned. To read or write a data block, the client only needs to specify this ID: JuxMem transparently locates the corresponding data block and performs the necessary data transfers.
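The allocation and access model just described can be summarized by a small interface sketch in C. The names and signatures below are invented for illustration, with a trivial in-process mock behind them so the sketch runs; the real JuxMem API differs, and the data would live on remote providers rather than in a local allocation.

/* Hypothetical JuxMem-style interface (invented names), with a trivial
 * in-process mock behind it so the sketch compiles and runs. In the real
 * service the data lives on remote providers, not in a local buffer. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { void *bytes; size_t size; } data_id;  /* opaque to clients */

/* Allocate `size` bytes replicated on `nclusters` clusters, with
 * `nreplicas` providers per cluster (degrees ignored by the mock),
 * returning a global ID. */
data_id *gds_alloc(size_t size, int nclusters, int nreplicas)
{
    (void)nclusters; (void)nreplicas;
    data_id *id = malloc(sizeof *id);
    id->bytes = calloc(1, size);
    id->size  = size;
    return id;
}

/* Clients read and write by ID only; locating the replicas and keeping
 * them consistent is the service's job, invisible at this interface. */
int gds_write(data_id *id, const void *buf, size_t len, size_t off)
{
    if (off + len > id->size) return -1;
    memcpy((char *)id->bytes + off, buf, len);
    return 0;
}

int gds_read(data_id *id, void *buf, size_t len, size_t off)
{
    if (off + len > id->size) return -1;
    memcpy(buf, (char *)id->bytes + off, len);
    return 0;
}

int main(void)
{
    data_id *id = gds_alloc(64, 2, 3);   /* 2 clusters, 3 replicas each */
    char out[6] = {0};
    gds_write(id, "hello", 5, 0);
    gds_read(id, out, 5, 0);
    printf("%s\n", out);
    return 0;
}

The ID plays the role that a memory address plays in a DSM system: it is the only thing a client needs in order to reach the data, wherever its replicas happen to live.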

Figure 3: Hierarchy of the entities in the network overlay defined by JuxMem (the juxmem group containing cluster groups and a data group; managers, providers and clients mapped onto the physical clusters).

JuxMem uses replication within data groups to keep data available despite failures. To manage these groups in the presence of failures, group communication and group management protocols, studied in the context of fault-tolerant distributed systems, are employed. The consistency of the different copies of a given piece of data must also be ensured. Consistency protocols studied in the context of DSM systems cannot be used directly, since they assume a static configuration, which does not hold in a large-scale, dynamic grid infrastructure. The approach JuxMem takes starts from the observation that in many consistency protocols (e.g., home-based protocols), for each piece of data there is a node holding its most recent copy. This home node (assumed never to fail) is in charge of maintaining a reference copy of the data. Implementing the home entity as a self-organizing replication group (like JuxMem's data group) allows the consistency protocol to assume that this entity is stable. A protocol implementing the entry consistency model in a fault-tolerant way has been developed. To limit inter-cluster communications, the home entity is organized hierarchically: local homes, at cluster level, are the clients of a global home, at grid level ([6], [14]). A prototype of JuxMem offering these functionalities was implemented in the Paris project, using the generic JXTA P2P library ([1]).

5 Conclusion

The progress made in the design of databases has emerged from the need for larger storage space and more efficient operations. Main-memory databases have brought new optimizations to almost all design aspects of databases: concurrency control, commit processing, access methods, data representation, query processing, failure recovery, performance assessment and data clustering; the shorter access times and higher transaction throughput are probably the most important. DERBY is a typical example of an in-memory database system: it uses a saturation-prevention algorithm for load balancing and UPSs to ensure short-term persistence of data.

Also, this system models a dynamic, realistic environment where server/client roles may change. Distributed main-memory database systems make use of the aggregated memory capacities of all the nodes and improve failure recovery. Sprint, an in-memory database system running in a cluster of servers, supports unreliable failure detection and offers a middleware solution to query decomposition. It also has specialized logical servers and provides very good performance and scalability.

The systems mentioned above use clusters of computers; the next step in a natural evolution is represented by grids. A data-sharing service for grid computing would decouple data management from grid computation, while still providing huge storage and computing capacities. The concept can be designed as a hybrid approach between DSM and P2P systems that benefits from the advantages they offer: high scalability and dynamicity (P2P), transparency and consistency (DSM). The JuxMem software system illustrates that this hybrid approach can be implemented successfully.

On grids, numerical simulations are the typical applications. It is therefore interesting to explore how these infrastructures could be useful in the context of database applications. A first step in this direction has been taken ([3]) by extending the JuxMem grid data-sharing service with a database-oriented API, allowing basic operations such as table creation, record insertion and simple select queries. The approach relies on more sophisticated high-level layers built over JuxMem, each layer having a precise role in database management: data storage, indexing, table fragmentation and so on. The contribution of this work is that it proposes a grid computing infrastructure, as opposed to previous efforts, which provided distributed main-memory database management systems that only reached the scale of a cluster of computers. The implementation was realized in the C programming language, based on JuxMem 0.4.

The purpose of this internship is to further explore the possibility of designing a scalable, high-performance grid-enabled database management system, based on the concept of a grid data-sharing service. Various representations for index structures will be studied, with respect to the needs resulting from the access patterns exhibited by a few types of applications to be defined. The implementation will be performed by coupling the JuxMem platform with an existing open-source database engine. The Grid 5000 testbed ([9]) will be used for experimental evaluation.

References

[1] The JXTA project (juxtaposed).

[2] Fuat Akal, Klemens Böhm, and Hans-Jörg Schek. OLAP query evaluation in a database cluster: A performance study on intra-query parallelism. In ADBIS '02: Proceedings of the 6th East European Conference on Advances in Databases and Information Systems, London, UK, 2002. Springer-Verlag.

[3] Abdullah Almousa Almaksour, Gabriel Antoniu, Luc Bougé, Loïc Cudennec, and Stéphane Gançarski. Building a DBMS on top of the JuxMem grid data-sharing service. In Proc. HiPerGRID Workshop, Brasov, Romania, September 2007. Held in conjunction with Parallel Architectures and Compilation Techniques 2007 (PACT 2007).

[4] Gabriel Antoniu, Marin Bertier, Eddy Caron, Frédéric Desprez, Luc Bougé, Mathieu Jan, Sébastien Monnet, and Pierre Sens. Future Generation Grids, chapter GDS: An Architecture Proposal for a Grid Data-Sharing Service. CoreGRID series. Springer-Verlag.

[5] Gabriel Antoniu, Luc Bougé, and Mathieu Jan. Peer-to-peer distributed shared memory? In Proc. IEEE/ACM 12th Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT 2003), Work in Progress Session, pages 1-6, New Orleans, Louisiana, September 2003.

[6] Gabriel Antoniu, Jean-François Deverge, and Sébastien Monnet. How to bring together fault tolerance and data consistency to enable grid data sharing. Concurrency and Computation: Practice and Experience, 18(13), November 2006.

[7] P. M. G. Apers, C. A. van den Berg, J. Flokstra, P. W. P. J. Grefen, M. L. Kersten, and A. N. Wilschut. PRISMA/DB: A parallel, main memory relational DBMS. IEEE Transactions on Knowledge and Data Engineering, 4(6), 1992.

[8] Lásaro Camargos, Fernando Pedone, and Marcin Wieloch. Sprint: a middleware for high-performance transaction processing. In EuroSys '07: Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, New York, NY, USA, 2007. ACM.

[9] Franck Cappello, Eddy Caron, Michel Dayde, Frederic Desprez, Emmanuel Jeannot, Yvon Jegou, Stephane Lanteri, Julien Leduc, Nouredine Melab, Guillaume Mornet, Raymond Namyst, Pascale Primet, and Olivier Richard. Grid 5000: a large scale, reconfigurable, controlable and monitorable Grid platform. In Grid 2005 Workshop, Seattle, USA, November 2005. IEEE/ACM.

[10] H. Garcia-Molina and K. Salem. Main memory database systems: An overview. IEEE Transactions on Knowledge and Data Engineering, 4(6), 1992.

[11] Jim Griffioen, Radek Vingralek, Todd A. Anderson, and Yuri Breitbart. DERBY: A memory management system for distributed main memory databases. In RIDE-NDS.

[12] Tobin J. Lehman and Michael J. Carey. A study of index structures for main memory database management systems. In VLDB '86: Proceedings of the 12th International Conference on Very Large Data Bases, San Francisco, CA, USA, 1986. Morgan Kaufmann Publishers Inc.

[13] Kai Li and Paul Hudak. Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems, 7(4), 1989.

[14] Sébastien Monnet. Gestion des données dans les grilles de calcul : support pour la tolérance aux fautes et la cohérence des données (Data management in computing grids: support for fault tolerance and data consistency). Thèse de doctorat, Université de Rennes 1, IRISA, Rennes, France, November 2006. To appear.


More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

Virtual Full Replication for Scalable. Distributed Real-Time Databases

Virtual Full Replication for Scalable. Distributed Real-Time Databases Virtual Full Replication for Scalable Distributed Real-Time Databases Thesis Proposal Technical Report HS-IKI-TR-06-006 Gunnar Mathiason gunnar.mathiason@his.se University of Skövde June, 2006 1 Abstract

More information

Fig. 3. PostgreSQL subsystems

Fig. 3. PostgreSQL subsystems Development of a Parallel DBMS on the Basis of PostgreSQL C. S. Pan kvapen@gmail.com South Ural State University Abstract. The paper describes the architecture and the design of PargreSQL parallel database

More information

PARIS*: Programming parallel and distributed systems for large scale numerical simulation applications. Christine Morin IRISA/INRIA

PARIS*: Programming parallel and distributed systems for large scale numerical simulation applications. Christine Morin IRISA/INRIA PARIS*: Programming parallel and distributed systems for large scale numerical simulation applications Kerrighed, Vigne Christine Morin IRISA/INRIA * Common project with CNRS, ENS-Cachan, INRIA, INSA,

More information

STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM

STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM Albert M. K. Cheng, Shaohong Fang Department of Computer Science University of Houston Houston, TX, 77204, USA http://www.cs.uh.edu

More information

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju Chapter 7: Distributed Systems: Warehouse-Scale Computing Fall 2011 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note:

More information

Object Oriented Database Management System for Decision Support System.

Object Oriented Database Management System for Decision Support System. International Refereed Journal of Engineering and Science (IRJES) ISSN (Online) 2319-183X, (Print) 2319-1821 Volume 3, Issue 6 (June 2014), PP.55-59 Object Oriented Database Management System for Decision

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information

Advanced Peer to Peer Discovery and Interaction Framework

Advanced Peer to Peer Discovery and Interaction Framework Advanced Peer to Peer Discovery and Interaction Framework Peeyush Tugnawat J.D. Edwards and Company One, Technology Way, Denver, CO 80237 peeyush_tugnawat@jdedwards.com Mohamed E. Fayad Computer Engineering

More information

Distribution transparency. Degree of transparency. Openness of distributed systems

Distribution transparency. Degree of transparency. Openness of distributed systems Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science steen@cs.vu.nl Chapter 01: Version: August 27, 2012 1 / 28 Distributed System: Definition A distributed

More information

A survey of big data architectures for handling massive data

A survey of big data architectures for handling massive data CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - jordydomingos@gmail.com Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Principles of Distributed Database Systems

Principles of Distributed Database Systems M. Tamer Özsu Patrick Valduriez Principles of Distributed Database Systems Third Edition

More information

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases

Distributed Architectures. Distributed Databases. Distributed Databases. Distributed Databases Distributed Architectures Distributed Databases Simplest: client-server Distributed databases: two or more database servers connected to a network that can perform transactions independently and together

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Cassandra A Decentralized, Structured Storage System

Cassandra A Decentralized, Structured Storage System Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922

More information

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful.

Cluster Computing. ! Fault tolerance. ! Stateless. ! Throughput. ! Stateful. ! Response time. Architectures. Stateless vs. Stateful. Architectures Cluster Computing Job Parallelism Request Parallelism 2 2010 VMware Inc. All rights reserved Replication Stateless vs. Stateful! Fault tolerance High availability despite failures If one

More information

CHAPTER 1: OPERATING SYSTEM FUNDAMENTALS

CHAPTER 1: OPERATING SYSTEM FUNDAMENTALS CHAPTER 1: OPERATING SYSTEM FUNDAMENTALS What is an operating? A collection of software modules to assist programmers in enhancing efficiency, flexibility, and robustness An Extended Machine from the users

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

A Survey of Cloud Computing Guanfeng Octides

A Survey of Cloud Computing Guanfeng Octides A Survey of Cloud Computing Guanfeng Nov 7, 2010 Abstract The principal service provided by cloud computing is that underlying infrastructure, which often consists of compute resources like storage, processors,

More information

- An Essential Building Block for Stable and Reliable Compute Clusters

- An Essential Building Block for Stable and Reliable Compute Clusters Ferdinand Geier ParTec Cluster Competence Center GmbH, V. 1.4, March 2005 Cluster Middleware - An Essential Building Block for Stable and Reliable Compute Clusters Contents: Compute Clusters a Real Alternative

More information

Modular Communication Infrastructure Design with Quality of Service

Modular Communication Infrastructure Design with Quality of Service Modular Communication Infrastructure Design with Quality of Service Pawel Wojciechowski and Péter Urbán Distributed Systems Laboratory School of Computer and Communication Sciences Swiss Federal Institute

More information

Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas

Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas 3. Replication Replication Goal: Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas Problems: Partial failures of replicas and messages No

More information

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords

More information

MarkLogic Server Scalability, Availability, and Failover Guide

MarkLogic Server Scalability, Availability, and Failover Guide Scalability, Availability, and Failover Guide 1 MarkLogic 8 February, 2015 Last Revised: 8.0-1, February, 2015 Copyright 2015 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

More information

A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS

A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS Mihai Horia Zaharia, Florin Leon, Dan Galea (3) A Simulator for Load Balancing Analysis in Distributed Systems in A. Valachi, D. Galea, A. M. Florea, M. Craus (eds.) - Tehnologii informationale, Editura

More information

PARALLELS CLOUD STORAGE

PARALLELS CLOUD STORAGE PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS V. CHRISTOPHIDES Department of Computer Science & Engineering University of California, San Diego ICS - FORTH, Heraklion, Crete 1 I) INTRODUCTION 2

More information

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com

Cloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...

More information

Virtual machine interface. Operating system. Physical machine interface

Virtual machine interface. Operating system. Physical machine interface Software Concepts User applications Operating system Hardware Virtual machine interface Physical machine interface Operating system: Interface between users and hardware Implements a virtual machine that

More information