Principles of Distributed Data Management in 2020?
To cite this version: Patrick Valduriez. Principles of Distributed Data Management in 2020?. DEXA'11: International Conference on Database and Expert Systems Applications, 2011, Toulouse, France. Springer, Lecture Notes in Computer Science, vol. 6860, pp. 1-11, 2011. Submitted to HAL on 11 Nov 2011.

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Principles of Distributed Data Management in 2020?¹

Patrick Valduriez
INRIA and LIRMM, Montpellier, France
[email protected]

Abstract. With the advent of high-speed networks, fast commodity hardware, and the web, distributed data sources have become ubiquitous. The third edition of the Özsu-Valduriez textbook Principles of Distributed Database Systems [10] reflects the evolution of distributed data management and distributed database systems. In this new edition, the fundamental principles of distributed data management could still be presented based on the three dimensions of earlier editions: distribution, heterogeneity and autonomy of the data sources. In retrospect, the focus on fundamental principles and generic techniques has been useful not only for understanding and teaching the material, but also for enabling an endless number of variations. The primary application of these generic techniques has obviously been in distributed and parallel DBMSs. Today, to support the requirements of important data-intensive applications (e.g. social networks, web data analytics, scientific applications), new distributed data management techniques and systems (e.g. MapReduce, Hadoop, SciDB, PNUTS, Pig Latin) are emerging and receiving much attention from the research community. Although they do well in terms of consistency/flexibility/performance trade-offs for specific applications, they seem to be ad hoc and might hurt data interoperability. The key questions I discuss are: What are the fundamental principles behind the emerging solutions? Is there a generic architectural model to explain those principles? Do we need new foundations to look at data distribution?

1 Introduction

The 1980s were a very active period for the development of distributed relational database technology, and all commercial DBMSs today are distributed. The 1990s saw the development and maturation of client-server technology and the introduction of object orientation, both in stand-alone systems and in object-relational DBMSs. The Özsu-Valduriez textbook Principles of Distributed Database Systems [10] was first published in 1991, covering the fundamental distribution principles and techniques. The second edition was published in 1999 and included coverage of client-server systems and distributed object systems. The third edition of the book came out in April 2011. During the writing of the third edition, we have been evaluating the past and contemplating the future.

¹ Work partially funded by the DataRing project of the French ANR.
It has been almost twenty years since the first edition appeared, and ten years since the second edition. As one can imagine, in a fast-changing area such as this, there have been significant changes in the intervening period. As we wrote the third edition, we incorporated technologies that were developed in the late 1990s and in the 2000s: P2P systems, data integration, database clusters, web and XML data management, stream data management, and cloud data management. It is apparent that the last ten years have seen an accelerated investigation of distributed data management technologies, spurred by the advent of high-speed networks, fast commodity hardware, very heavy parallelization of hardware, and, of course, the increasing pervasiveness of the web. Now, the question is what is likely to happen in the next decade; or, to put it differently, if there were to be a fourth edition of our book in 2020, what would it contain? What would be new? This is the motivation for this paper².

In observing the changes that have taken place over the past twenty years of our involvement with this field, what has struck us as interesting is that the fundamental principles of distributed data management still hold, and distributed data management can be characterized along three dimensions: distribution, heterogeneity and autonomy of the data sources. What has changed much since, and made the problems much harder, is the scale of the dimensions: very large scale distribution (cluster, P2P, web and cloud); very high heterogeneity (web); and high autonomy (web, P2P). Also interesting to note is that the fundamental principles of database fragmentation (or partitioning), data integration, transaction management, replication and relational query processing have stood the test of time. In particular, new techniques and algorithms could be presented as extensions of earlier material, using relational concepts. Today, to support the requirements of important data-intensive applications (e.g. social networks, web data analytics), new distributed data management techniques (e.g. MapReduce, Hadoop, PNUTS, Pig Latin, SciDB) are emerging and receiving much attention from the research community. Although they do well in terms of consistency-flexibility-performance trade-offs for specific applications, they seem to be ad hoc and might hurt data interoperability. The key questions are: What are the fundamental principles behind the emerging solutions? Is there a generic architectural model to explain those principles? Do we need new foundations to look at data distribution?

In this paper, I discuss these questions. I first recall the fundamental principles of distributed data management (Section 2). Then, in Section 3, I illustrate the challenges introduced by new data-intensive applications in the context of scientific applications and cloud computing. In Section 4, I discuss emerging solutions and show that they can still be explained along the three main dimensions of distributed data management (distribution, autonomy, heterogeneity). Finally, in Section 5, I discuss what is likely to come next.

² This was also the basis for a panel at ICDE 2011 [11].
2 Principles of Distributed Data Management

The fundamental principle behind data management is data independence, which enables applications and users to share data at a high conceptual level while ignoring implementation details. This principle has been achieved by database systems, which provide advanced capabilities such as schema management, high-level query languages, access control, automatic query processing and optimization, transactions, data structures for supporting complex objects, etc.

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database system is defined as the software system that permits the management of the distributed database and makes the distribution transparent to the users. Distribution transparency extends the principle of data independence so that distribution is not visible to users. These definitions assume that each site logically consists of a single, independent computer. Therefore, each site has the capability to execute applications on its own. The sites are interconnected by a computer network, with loose connection between sites, which operate independently. Applications can then issue queries and transactions to the distributed database system, which transforms them into local queries and local transactions (see Figure 1) and integrates the results. The distributed database system can run at any site s, not necessarily distinct from the data (i.e. it can be site 1 or 2 in Figure 1).

Figure 1. A distributed database system with two data sites

The database is physically distributed across the data sites by fragmenting and replicating the data. Given a relational database schema, for instance, fragmentation subdivides each relation into partitions based on some function applied to the tuples' attributes. Based on the user access patterns, each of the fragments may also be replicated to improve locality of reference (and thus performance) and availability. The use of a set-oriented data model (like the relational model) has been crucial to define fragmentation, based on data subsets. The functions provided by a distributed database system could be those of a database system (schema management, access control, query processing, transaction support, etc.). But since they must deal with distribution, they are more complex to implement. Therefore, many systems support only a subset of these functions.
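To make the fragmentation and replication ideas concrete, here is a minimal sketch in Python, assuming a toy in-memory relation; the relation, the fragmentation attribute and the site names are hypothetical illustrations, not taken from the paper.

```python
# Horizontal fragmentation of a relation on an attribute value, followed by a
# simple replica placement; all names and data below are illustrative only.

EMPLOYEES = [
    {"id": 101, "name": "Ada", "city": "Paris"},
    {"id": 102, "name": "Ben", "city": "Toulouse"},
    {"id": 203, "name": "Cam", "city": "Paris"},
]

def fragment_by_city(relation):
    """Horizontal fragmentation: partition tuples on the 'city' attribute."""
    fragments = {}
    for t in relation:
        fragments.setdefault(t["city"], []).append(t)
    return fragments

def place_with_replicas(fragments, sites):
    """Give each fragment a home site plus one replica, for locality and availability."""
    placement = {site: [] for site in sites}
    for i, (key, fragment) in enumerate(sorted(fragments.items())):
        home = sites[i % len(sites)]
        backup = sites[(i + 1) % len(sites)]
        placement[home].append((key, fragment))
        placement[backup].append((key, fragment))
    return placement

if __name__ == "__main__":
    fragments = fragment_by_city(EMPLOYEES)
    print(place_with_replicas(fragments, ["site1", "site2"]))
```

A real distributed DBMS would derive the fragmentation predicates from the workload and keep replicas mutually consistent; the sketch only shows the placement idea.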
When the data and the databases already exist, one is faced with the problem of providing integrated access to heterogeneous data. This process is known as data integration: it consists of defining a global schema over the existing data and mappings between the global schema and the local database schemas. Data integration systems have received several names, such as federated database systems, multidatabase systems and, more recently, mediator systems. Standard protocols such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) ease data integration using SQL. In the context of the web, mediator systems [13] allow general access to autonomous data sources (such as files, databases, documents, etc.) in read-only mode. Thus, they typically do not support all database functions, such as transactions and replication.

When the architectural assumption of each site being a (logically) single, independent computer is relaxed, one gets a parallel database system [14], i.e. a database system implemented on a tightly-coupled multiprocessor or a cluster. The main difference with a distributed database system is that there is a single operating system, which eases implementation, and the network is typically faster and more reliable. The objective of parallel database systems is high performance and high availability. High performance (i.e. improving transaction throughput or query response time) is obtained by exploiting data partitioning and query parallelism, while high availability is obtained by exploiting replication. Again, this has been made possible by the use of a set-oriented data model, which eases parallelism, in particular independent parallelism between data subsets.

The distributed database approach has proved effective for applications that can benefit from static administration, centralized control and full-fledged database capabilities, e.g. information systems. For administrative reasons, the distributed database system typically runs on a separate server, which reduces scalability to tens of databases. Data integration systems achieve better scalability, to hundreds or thousands of data sources, by restricting functionality (i.e. read-only querying). Parallel database systems can also scale up to large configurations with thousands of processing nodes. However, both data integration systems and parallel database systems typically rely on a global schema that can be either centralized or replicated.

We now consider the possible ways in which a distributed DBMS may be architected. We use a classification (Figure 2) that organizes the systems with respect to three dimensions: (1) the autonomy of local systems, (2) their distribution, and (3) their heterogeneity. Autonomy, in this context, refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Whereas autonomy refers to the distribution (or decentralization) of control, the distribution dimension of the taxonomy deals with the physical distribution of data over multiple sites (or nodes in a parallel system). There are a number of ways DBMSs have been distributed. We distinguish between client/server (C/S) distribution and peer-to-peer (P2P) distribution (or full distribution). With a C/S DBMS, sites may be clients or servers, thus with different functionality, whereas with a homogeneous P2P DBMS (DDBMS in Figure 2), all sites provide the same functionality.
Note that DDBMS came before C/S DBMS, in the late 1970s. P2P data management struck back in the 2000s with modern variations to deal with very large scale, autonomy and decentralized control.
Heterogeneity refers to data models, query languages, and transaction management protocols. Multidatabase systems (MDBMS) deal with heterogeneity, in addition to autonomy and distribution.

Figure 2. Distributed DBMS Architectures (modified after [10])

3 New Challenges for Distributed Data Management

The pervasiveness of the web has spurred all kinds of data-intensive applications and introduced great challenges for distributed data management. New data-intensive applications such as social networks, web data analytics and scientific applications have requirements that are not met by the traditional distributed database systems in Figure 2. What has changed much, and made the problems much harder, is the scale of the dimensions: very large scale distribution, very high heterogeneity, and high autonomy. Let us illustrate the challenges for distributed data management with two important domains: scientific data management and cloud data management.

3.1 Scientific Data Management

Scientific data management has become a major challenge for the database and data management research community [7]. Modern science such as agronomy, bioinformatics, physics and environmental science must deal with overwhelming amounts of experimental data produced through empirical observation and simulation. Such data must be processed (cleaned, transformed, analyzed) in all kinds of ways in order to draw new conclusions, prove scientific theories and produce knowledge. However, constant progress in scientific observational instruments (e.g. satellites, sensors, the Large Hadron Collider) and simulation tools (which foster in silico experimentation, as opposed to traditional in situ or in vivo experimentation) creates a huge data overload.
For example, climate modeling data are growing so fast that they are expected to lead to collections of hundreds of exabytes by 2020. Scientific data is also very complex, in particular because of the heterogeneous methods used for producing data, the uncertainty of captured data, the inherently multi-scale nature (spatial scale, temporal scale) of many sciences and the growing use of imaging (e.g. satellite images), resulting in data with hundreds of attributes, dimensions or descriptors. Processing and analyzing such massive sets of complex scientific data is therefore a major challenge, since solutions must combine new data management techniques with large-scale parallelism in cluster, grid or cloud environments [12].

Furthermore, modern science research is a highly collaborative process, involving scientists from different disciplines (e.g. biologists, soil scientists, and geologists working on an environmental project), in some cases from different organizations distributed in different countries. Since each discipline or organization tends to produce and manage its own data, in specific formats, with its own processes, integrating distributed data and processes gets difficult as the amounts of heterogeneous data grow.

Despite their variety, we can identify common features of scientific data [1]: massive scale; manipulated through complex, distributed workflows; typically complex, e.g. multidimensional or graph-based; with uncertainty in the data values, e.g. to reflect data capture or observation; important metadata about experiments and their provenance; heavy floating-point computation; and mostly append-only (with rare updates).

3.2 Cloud Data Management

Cloud computing is the latest trend in distributed computing and has been the subject of much hype. The vision encompasses on-demand, reliable services provided over the Internet (typically represented as a cloud) with easy access to virtually infinite computing, storage and networking resources. Through very simple web interfaces and at small incremental cost, users can outsource complex tasks, such as data storage, system administration, or application deployment, to very large data centers operated by cloud providers. Thus, the complexity of managing the software/hardware infrastructure gets shifted from the users' organization to the cloud provider. From a technical point of view, the grand challenge is to support, in a cost-effective way, the very large scale of the infrastructure, which has to manage lots of users and resources with high quality of service.

However, not all data-intensive applications are good candidates for being "cloudified" [1]. To simplify, we can distinguish between the two main classes of data-intensive applications: On-Line Transaction Processing (OLTP) and On-Line Analytical Processing (OLAP). Let us recall their main characteristics. OLTP deals with operational databases of average sizes (up to a few terabytes), is write-intensive, and requires complete ACID transactional properties, strong data protection and response-time guarantees. On the other hand, OLAP deals with historical databases of very large sizes (up to petabytes), is read-intensive and thus can accept relaxed ACID properties. Furthermore, since OLAP data are typically extracted from operational OLTP databases, sensitive data can simply be hidden for analysis (e.g. using anonymization), so that data protection is not as crucial as in OLTP.
OLAP is more suitable than OLTP for the cloud, primarily because of two cloud characteristics (see the detailed discussion in [1]): elasticity and security. To support elasticity in a cost-effective way, the solution that most cloud providers adopt is a shared-nothing cluster. Shared-nothing provides high scalability but requires careful data partitioning. Since OLAP databases are very large and mostly read-only, data partitioning and parallel query processing are effective. However, it is much harder to support OLTP on shared-nothing because of the ACID guarantees, which require complex concurrency control. For these reasons, and because OLTP databases are not so large, shared-disk is the preferred architecture for OLTP. The second reason that OLTP is not so suitable for the cloud is that highly sensitive data get stored at an untrusted host (the provider site). Storing corporate data at an untrusted third party, even with a carefully negotiated Service Level Agreement (SLA) with a reliable provider, creates resistance from some customers because of security issues. However, this resistance is much reduced for historical data, with anonymized sensitive data.

There is much more variety in cloud data than in scientific data, since there are many different kinds of customers (individuals, SMEs, large corporations, etc.). However, we can identify common features. Cloud data can be very large, unstructured (e.g. text-based) or semi-structured, and typically append-only (with rare updates). And cloud users and application developers may be in high numbers, but not DBMS experts.

4 Emerging Solutions

Generic data management solutions (e.g. relational DBMSs) that have proved effective in many application domains (e.g. business transactions) are not efficient at dealing with these emerging applications, thereby forcing developers to build ad-hoc solutions that are labor-intensive and cannot scale. In particular, relational DBMSs have lately been criticized for their "one size fits all" approach. Although they have been able to integrate support for all kinds of data (e.g. multimedia objects, XML documents) and new functions, this has resulted in a loss of performance and flexibility for applications with specific requirements, because they provide both too much and too little. Therefore, it has been argued that more specialized DBMS engines are needed. For instance, column-oriented DBMSs, which store column data together rather than rows as in traditional row-oriented relational DBMSs, have been shown to perform more than an order of magnitude better on decision-support workloads. The "one size does not fit all" counter-argument generally applies to cloud data management as well.

Therefore, current data management solutions have traded consistency for scalability, simplicity and flexibility. As an alternative to relational DBMSs (which use the standard SQL language), these solutions have recently been dubbed Not Only SQL (NOSQL) by the database research community.
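To illustrate the column-store advantage mentioned above, here is a minimal sketch, assuming a tiny in-memory table; the table, attribute names and data are hypothetical, and the sketch models only how much data a scan must touch, not a real storage engine.

```python
# Contrasting row-oriented and column-oriented layouts for an aggregate
# query (conceptually: SELECT SUM(amount) FROM sales); data is illustrative.

rows = [
    {"order_id": 1, "customer": "ACME", "amount": 120.0},
    {"order_id": 2, "customer": "Globex", "amount": 75.5},
    {"order_id": 3, "customer": "ACME", "amount": 300.0},
]

# Column store: one contiguous array per attribute.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["ACME", "Globex", "ACME"],
    "amount": [120.0, 75.5, 300.0],
}

# Row layout: the scan touches every attribute of every tuple.
total_row = sum(r["amount"] for r in rows)

# Column layout: the scan reads only the 'amount' array, so on wide tables
# far less data moves through the memory hierarchy for the same answer.
total_col = sum(columns["amount"])

assert total_row == total_col
print(total_col)
```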
Many of these NOSQL solutions have been developed for the cloud or the grid, which both exploit large-scale parallelism, typically with shared-nothing clusters. Figure 3 positions the architectures of emerging solutions along the same three dimensions (distribution, heterogeneity and autonomy). Like cloud computing, grid computing enables access to very large compute and storage resources over the web. Compared with cloud computing, which deals with large-scale parallelism, the grid is characterized by high heterogeneity, large-scale distribution and large-scale parallelism. In addition to grid and cloud, we also position the recent P2P DBMSs, which target large-scale distribution, and data integration systems (like MDBMS), which deal with large-scale distribution, heterogeneity and autonomy.

Figure 3. Architectures of Emerging Solutions

Distributed data management for cloud applications emphasizes scalability, fault-tolerance and availability, sometimes at the expense of consistency or ease of development. Let us illustrate this approach with two popular solutions: Google Bigtable and Yahoo! PNUTS.

Bigtable is a database storage system for a shared-nothing cluster [4]. It uses a distributed file system (Google File System, GFS) for storing structured data in distributed files, with fault-tolerance and availability. It also uses a form of dynamic data partitioning for scalability. There are also open source implementations of Bigtable, such as Hadoop HBase, which runs on the Hadoop Distributed File System (HDFS). Bigtable supports a simple data model that resembles the relational model, with multi-valued, timestamped attributes. It provides a basic API for defining and manipulating tables, within a programming language such as C++, with various operators to write and update values, and to iterate over subsets of data produced by a scan operator. There are various ways to restrict the rows, columns and timestamps produced by a scan, as in a relational select operator. However, there is no complex operator such as join or union, which must be programmed using the scan operator. Transactional atomicity is supported for single-row updates only. To store a table in GFS, Bigtable uses range partitioning on the row key. Each table is divided into partitions, called tablets, each corresponding to a row range.
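Here is a minimal sketch of the tablet idea, assuming hypothetical split keys and an in-memory stand-in for each tablet; this is an illustration of range partitioning on a row key, not Bigtable's actual implementation.

```python
import bisect

# Bigtable-style range partitioning of a table into tablets by row key.
# The split keys below are hypothetical; real systems split tablets dynamically.

SPLIT_KEYS = ["g", "p"]  # tablet 0: keys < "g"; tablet 1: ["g", "p"); tablet 2: >= "p"
TABLETS = [dict() for _ in range(len(SPLIT_KEYS) + 1)]

def tablet_for(row_key):
    """Locate the tablet whose row range contains row_key."""
    return TABLETS[bisect.bisect_right(SPLIT_KEYS, row_key)]

def put(row_key, column, value):
    """Single-row write: the unit of transactional atomicity in this model."""
    tablet_for(row_key).setdefault(row_key, {})[column] = value

def scan(start, end):
    """Iterate rows with start <= key < end, as a scan operator would."""
    for tablet in TABLETS:
        for key in sorted(tablet):
            if start <= key < end:
                yield key, tablet[key]

put("alice", "city", "Paris")
put("zoe", "city", "Lyon")
print(list(scan("a", "m")))  # only the matching row range is returned
```

Because each tablet covers a contiguous key range, a scan only needs to visit the tablets whose ranges overlap the requested interval, which is what makes range partitioning attractive for this workload.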
PNUTS is a parallel and distributed database system for cloud applications at Yahoo! [5]. It is designed for serving web applications, which typically do not need complex queries, but require good response time, scalability and high availability, and can tolerate relaxed consistency guarantees for replicated data. PNUTS supports the relational data model, with arbitrary structures allowed within attributes of Blob type. Schemas are flexible: new attributes can be added at any time, even while the table is being queried or updated, and records need not have values for all attributes. PNUTS provides a simple query language with selection and projection on a single relation. Updates and deletes must specify the primary key. PNUTS provides a replica consistency model that is between strong consistency and eventual consistency, with several API operations offering different guarantees. Database tables are horizontally partitioned into tablets, through either range partitioning or hashing, and the tablets are distributed across many servers in a cluster (at a site).

To summarize, both Bigtable and PNUTS provide some variation of the relational model, a simple API or language for manipulating data, and relaxed consistency guarantees. They also rely on fragmentation (partitioning) and replication for fault-tolerance. Thus, they capitalize on the well-known principles of distributed data management.

Emerging solutions strive to be more generic than ad-hoc solutions, which are labor-intensive and cannot scale. For instance, the SciDB organization is building an open source database system for scientific data analytics. SciDB will certainly be effective for similar applications in which the data is well understood (with well-defined data structures). However, to prevent the "one size fits all" criticism from applying to SciDB as well, the key question is: how generic should scientific data management be, without hampering application-specific optimizations? For instance, to perform scientific data analysis efficiently, scientists typically resort to dedicated indexes, compression techniques and specific algorithms. Thus, generic techniques inspired by the database research community should be able to cope with these specific techniques.

Genericity in data management encompasses two dimensions: data structures (captured by the data model) and data processing (expressed by the query language). Relational DBMSs initially provided genericity through the relational data model (which subsumes earlier data models) and a high-level query language (SQL). However, successive object extensions to include new data structures such as lists and arrays, and to support user-defined functions in a programming language, have resulted in a still generic, but more complex, data model and language for developers. Therefore, emerging solutions tend to rely on a more specific data model (e.g. Bigtable, which is some kind of nested relational model) with a simple set of operators that are easy to use from a programming language. For instance, to address the requirements of social network applications, new solutions rely on a graph data model and graph-based operators. To address the requirements of scientific applications, SciDB supports an array data model, which generalizes the relational model, with array operators. User-defined functions also allow for more specific data processing. MapReduce [6] is a good example of a generic parallel data processing framework, built on top of a distributed file system (GFS).
It supports a simple data model (sets of (key, value) pairs), which allows user-defined functions (map and reduce). Although quite successful among developers, it is relatively low-level and rigid, leading to custom user code that is hard to maintain and reuse.
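The following minimal single-process sketch shows the programming model just described: user-defined map and reduce functions over (key, value) pairs. It is an illustration of the model only; the real framework distributes the map, shuffle and reduce phases across a shared-nothing cluster, and the word-count example is the customary one, not taken from the paper.

```python
from collections import defaultdict

def map_fn(_, line):
    """User-defined map: (key, value) -> list of intermediate (key, value) pairs."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """User-defined reduce: (key, [values]) -> final (key, value) pair."""
    return word, sum(counts)

def map_reduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):   # map phase
            groups[k].append(v)           # shuffle: group intermediate pairs by key
    return [reduce_fn(k, vs) for k, vs in groups.items()]  # reduce phase

print(map_reduce([(0, "to be or not to be")], map_fn, reduce_fn))
```

Everything outside the two user-defined functions is generic, which is precisely the framework's appeal; the rigidity the paper notes comes from having to express all processing through this fixed two-phase skeleton.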
Pig Latin [8] is an alternative data management solution that raises the level of abstraction with an algebraic query language. In emerging solutions, it is interesting to witness the development of algebras, with specific operators, to raise the level of abstraction in a way that enables optimization. For instance, in [9], we propose an algebraic approach that enables automatic optimization of scientific workflows that manipulate huge amounts of data through specific programs and files.

5 Conclusion

To support the requirements of important data-intensive applications (e.g. social networks, web data analytics, scientific applications), new distributed data management techniques and systems (e.g. MapReduce, Hadoop, SciDB, PNUTS, Pig Latin) have emerged and are receiving much attention from the research community. Now, the guiding question for this paper was what is likely to happen in the next decade; or, what will be the principles of distributed data management in 2020? I translated this into three questions: (1) What are the fundamental principles behind the emerging solutions? (2) Is there a generic architectural model to explain those principles? (3) Do we need new foundations to look at data distribution?

To address (1), I illustrated the challenges introduced by new data-intensive applications in the context of scientific applications and cloud computing. The emerging solutions typically provide some variation of the relational model, a simple API or language for manipulating data, and relaxed consistency guarantees. They also rely on fragmentation (partitioning) for large-scale parallelism and on replication for fault-tolerance. Thus, they capitalize on the well-known principles of distributed data management.

With respect to (2), I showed that emerging solutions can still be explained along the three main dimensions of distributed data management (distribution, autonomy, heterogeneity), yet pushing the scales of the dimensions high up. However, I raised the question of how generic distributed data management should be, without hampering application-specific optimizations. Emerging NOSQL solutions tend to rely on a specific data model (e.g. Bigtable, MapReduce) with a simple set of operators that are easy to use from or with a programming language. It is also interesting to witness the development of algebras, with specific operators, to raise the level of abstraction in a way that enables optimization [9]. What is missing to explain the principles of emerging solutions is one or more dimensions on generic/specific data models and data processing.

The hardest question is (3): Do we need new foundations to look at data distribution? It all depends on what we mean by data, and whether we consider the continuum between data, information and knowledge, with the increasing pervasiveness of data semantics. During the ICDE 2011 panel [11], one of us (S. Abiteboul) argued that accessing highly distributed, heterogeneous data in personal dataspaces on the web is beyond human expertise, and requires moving from data to knowledge, with automated reasoning. This is the motivation for the comeback of Datalog as a uniform language to deal with data, metadata, rules, distribution, time, etc. [2].
Acknowledgements

This paper has benefited from fruitful discussions with many colleagues: M. Tamer Özsu, of course, during the writing of [10]; Serge Abiteboul, Bettina Kemme, Ricardo Jiménez-Peris, and Beng Chin Ooi in the ICDE 2011 panel on the same topic [11]; the members of the Zenith team at INRIA-LIRMM, in particular Reza Akbarinia, Florent Masseglia and Esther Pacitti; and the members of the CNPq-INRIA project Datluge, in particular Marta Mattoso, Eduardo Osgarawa, and Fabio Porto.

References

1. D. J. Abadi. Data Management in the Cloud: Limitations and Opportunities. IEEE Data Engineering Bulletin, 32(1): 3-12, 2009.
2. S. Abiteboul, M. Bienvenu, A. Galland, E. Antoine. A Rule-based Language for Web Data Management. ACM Symposium on Principles of Database Systems (PODS), 2011.
3. A. Ailamaki, V. Kantere, D. Dash. Managing scientific data. Communications of the ACM, 53(6): 68-78, 2010.
4. F. Chang et al. Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems, 26(2), 2008.
5. B. F. Cooper et al. PNUTS: Yahoo!'s Hosted Data Serving Platform. Proceedings of the VLDB Endowment (PVLDB), 1(2), 2008.
6. J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Symposium on Operating System Design and Implementation (OSDI), 2004.
7. J. Gray, D. T. Liu, M. A. Nieto-Santisteban, A. S. Szalay, D. J. DeWitt, G. Heber. Scientific Data Management in the Coming Decade. Technical Report MSR-TR-2005-10, Microsoft Research, 2005.
8. C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins. Pig latin: a not-so-foreign language for data processing. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD), 2008.
9. E. Osgarawa, J. Dias, D. Oliveira, F. Porto, P. Valduriez, M. Mattoso. An Algebraic Approach for Data-Centric Scientific Workflows. Proceedings of the VLDB Endowment (PVLDB), to appear, 2011.
10. M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. 3rd Edition, Springer, 2011.
11. M. T. Özsu, P. Valduriez, S. Abiteboul, B. Kemme, R. Jiménez-Peris, B. Chin Ooi. Distributed data management in 2020? IEEE Int. Conf. on Data Engineering (ICDE), 1360, 2011.
12. E. Pacitti, P. Valduriez, M. Mattoso. Grid Data Management: open problems and new issues. Journal of Grid Computing, 5(3), 2007.
13. A. Tomasic, L. Raschid, P. Valduriez. Scaling access to heterogeneous data sources with DISCO. IEEE Transactions on Knowledge and Data Engineering, 10(5), 1998.
14. P. Valduriez. Parallel Database Systems: open problems and new issues. Distributed and Parallel Databases, 1(2), 1993.
15. P. Valduriez, E. Pacitti. Data Management in Large-scale P2P Systems. Int. Conf. on High Performance Computing for Computational Science (VecPar 2004), LNCS 3402, Springer, 2005.