Parallel Processing of JOIN Queries in OGSA-DAI


Parallel Processing of JOIN Queries in OGSA-DAI

Fan Zhu

Aug 21, 2009

MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2009

Abstract

The JOIN query is the most important and often the most expensive of all relational operations, especially when its input comes from large tables in a distributed heterogeneous database. Since parallel join processing is a well-understood technique for obtaining results as quickly as possible, one way to speed up query execution is to exploit parallelism; and because most real queries join several tables, efficient join execution is very important. This thesis focuses on query processing in a distributed heterogeneous database, not inside a single DBMS. The aims of the project are: a) to investigate methods for the parallel execution of join queries, which are usually used to optimize a single join operation; b) to analyze the difference in performance caused by different query plans, which can be used to speed up complex queries that contain multiple join operations. The main steps and achievements of this project are the following: a) The first step of the project was to study and extend my knowledge of relational algebra and the OGSA-DAI (Open Grid Service Architecture - Data Access and Integration) software. As OGSA-DAI middleware can process queries and transform data from distributed resources, the mechanisms and interfaces defined in OGSA and the primary components of the OGSA-DAI middleware are used in our experiments. b) The second step was to design efficient parallel approaches to optimize the join execution strategies currently used by OGSA-DAI. The most important work was to analyze and investigate the parallel mechanisms available when executing complex join queries on large tables: Independent Parallelism, Pipelined Parallelism, Partitioned Parallelism and Mixed Parallelism. Based on this parallelism analysis, two parallel join algorithms - the Hash Split Join algorithm and the Sorted Merge Join algorithm - were adopted in the project. All functional modules were divided into OGSA-DAI activities, and the function of each implemented activity is described in detail. c) The third step was to implement the parallel algorithms and to evaluate the performance of parallel queries. The thesis discusses and analyzes the performance of every functional activity, such as the SQL Query Activity, Tuple Sort Activity and Sorted Merge Activity. It analyzes the performance of queries based on two-table joins, multi-table joins and joins on a distributed heterogeneous database. Based on our experiments, it points out the effect of different query plans.

Keywords: SQL Query, Join Query, OGSA-DAI, Parallelization.

Contents

Chapter 1 Introduction
  1.1 Project Aims
  1.2 Research Methods
  1.3 Thesis Structure
Chapter 2 Background Knowledge
  2.1 SQL and Relational Theory
  2.2 Query Graphs and Query Plans
  2.3 OGSA-DAI
Chapter 3 Analysis and Design of Parallel Algorithms
  3.1 Requirements Capture
  3.2 Mechanisms of Parallel Query Execution
  3.3 Partitioning Algorithms
  3.4 Parallel Join Implementations
  3.5 User Side Workflow
Chapter 4 Performance Analysis
  4.1 Experimental Setup
  4.2 Performance Analysis for Single Activity
  4.3 Single Join
  4.4 Multiple Join
  4.5 Join on Distributed Heterogeneous Database
Chapter 5 Conclusions
Appendix A Source Code
Appendix B Submission Script
References

List of Tables

Table 1 Bandwidth of SQL Query Activity
Table 2 Bandwidth of Hash Split Activity
Table 2 Sorted Merge Activity and Union All Activity
Table 3 Overall Activity Performance
Table 4 Query Plan 1 vs. Query Plan 2
Table 5 Performance on Different Databases
Table 6 Performance of Heterogeneous Database

List of Figures

Figure 1 Logical Query Plan
Figure 2 Inner Join
Figure 3 Query Graph Example
Figure 4 Query Plan Example
Figure 5 OGSA Services Framework
Figure 6 The Architecture of OGSA-DAI
Figure 7 OGSA-DAI Runtime Overview
Figure 8 Independent Parallelism
Figure 9 Pipelined Parallelism
Figure 10 Partitioned Parallelism
Figure 11 Independent and Pipelined Mixed Parallelism
Figure 12 Serial Join Workflow
Figure 13 Hash Split Join
Figure 14 Sorted Merge Join
Figure 15 Query Graph
Figure 16 Query Tree
Figure 17 Running Time of Reproduced Test
Figure 18 Workflow without Swallow Activity
Figure 19 Workflow with Swallow Activity
Figure 20 Performance of Hash Split Activity
Figure 21 Array List vs. Linked List
Figure 22 Performance of Tuple Sort Activity
Figure 23 Query Plan 1
Figure 24 Query Plan 2
Figure 25 Re-use in Hash Split Join
Figure 26 DB2 vs. MySQL

Acknowledgements

First of all, I would like to express my deepest thanks to my supervisor, Mr. Bartosz Dobrzelecki, who has provided me with valuable suggestions and guidance throughout this dissertation. I also want to extend my thanks to all my friends for their encouragement and support.

Chapter 1 Introduction

1.1 Project Aims

With the wide application of digital technology, the amount of data to be processed increases at a higher rate than the speed of processing units. This means that traditional database query algorithms may no longer be well suited to the massive distributed data sets found on the internet. If a query operation takes a long time to produce its final result, the information it generates may already be obsolete in many application domains. Can we reduce the running time of a query by some technique? On the other hand, given a query on multiple tables in a database application system, there are many schemes that a database management system can follow to process it and produce its results. Although all schemes will produce equivalent results, their running costs vary: the amount of time two schemes need can differ, sometimes enormously. Which scheme needs the least amount of time?

The research motivation can also be stated as a problem statement: the join query is the most expensive operation executed by databases, and it has been shown that it can be optimized by parallelization, so parallelizing a join query is an important part of this project. Besides, different query plans may affect performance considerably, so matching a query to the most suitable plan is also very helpful.

The fundamental goal of this project is to investigate and address the join processing problem using parallelization techniques. The project develops OGSA-DAI implementations of several parallel join algorithms. We also want to gather experimental data that helps us understand which approaches to parallel join execution are most beneficial.

1.2 Research Methods

The research methods of this project are:

- To research and analyze efficient parallelization approaches that optimize existing implementations of join operators which take significant time to execute.

- To design and implement parallelization algorithms useful for querying distributed data based on OGSA-DAI.

1.3 Thesis Structure

The thesis is organised into five chapters. This chapter describes the project's purposes, roadmap and the methods adopted in the project research. The rest of the thesis is structured as follows:

Chapter 2 introduces the basics of SQL (Structured Query Language) and OGSA-DAI (Open Grid Service Architecture - Data Access and Integration), so that our work and the techniques used in the project are easier to understand. In Section 2.1 the four subsets of the declarative database language SQL are described and the SELECT query on multiple tables is introduced; then a basic set of relational operators is described. The query graph, which is used as a graphical tool for analysing query operations, is introduced in Section 2.2. A description of the mechanisms and interfaces defined in OGSA and the primary components of the OGSA-DAI architecture is given in Section 2.3. We consider the query requirements of database integration by OGSA applications, and the ways in which consumers make requests to an OGSA-DAI product are described in detail.

Chapter 3 discusses the design and implementation of the algorithms. As this project is implemented on the OGSA-DAI framework, the functional modules are divided into OGSA-DAI activities. The most important work is to analyze and investigate the parallel mechanisms available when executing complex join queries on large tables: Independent Parallelism, Pipelined Parallelism, Partitioned Parallelism and Mixed Parallelism. Based on our parallelism analysis for join queries, two parallel join algorithms - the Hash Split Join algorithm and the Sorted Merge Join algorithm - are used in the project. Section 3.4 shows the functionality of the implemented activities in detail. Finally, we discuss how the activities are assembled into OGSA-DAI workflows; in Section 3.5 we give the implementation details of the Hash Split Join and Sorted Merge Join workflows.

Chapter 4 contains the performance analysis of our parallel implementations. First of all, it describes the software and hardware test environment, the test data set and the test join query based on the TPC Benchmark H (TPC-H) [4]. Then it discusses and analyzes the performance of every functional activity, such as the SQL Query Activity, Tuple Sort Activity and Sorted Merge Activity. Sections 4.3 to 4.5 analyze the performance of queries based on two-table joins, multi-table joins and joins on a distributed heterogeneous database. The chapter explores the reasons for the performance differences by analysing the implementations, illustrates the overall workflow performance and how it behaves on different databases, and, based on our experiments, points out the effect of different query plans. It provides some conclusions about how to match a query to a plan.

In Chapter 5, the final part of the thesis, conclusions are presented based on our analysis and experiments. Our discussion in this thesis focuses on join query optimization for sequential processing via parallelization, and on query plan selection for complex requests.

It touches upon issues and techniques related to optimizing join queries in distributed heterogeneous database environments.

Chapter 2 Background Knowledge

2.1 SQL and Relational Theory

SQL (Structured Query Language) is a declarative database language designed for the management and retrieval of data in an RDBMS (Relational Database Management System). There are four important parts of the SQL language: Data Manipulation Language (DML), Data Definition Language (DDL), Data Control Language (DCL) and Transactional Control Language (TCL). This project is concerned with the DML part of SQL, which is used to retrieve, store, modify, delete, update and manage data in a database. DML allows users to describe the desired properties of the result without specifying how to obtain it; this is why SQL is called a declarative language. The most common operation in SQL is result retrieval, which is performed with the keyword SELECT. A SELECT query can retrieve data from one or more tables; join operations are needed in order to combine multiple tables. This project focuses on SELECT queries joining multiple tables.

2.1.1 Relational Algebra

In order to define the database structure and constraints, a data model must include a set of operations to manipulate the data. A basic set of relational model operations constitutes the relational algebra. Relational algebra is used to represent declarative SQL queries in a procedural form which can be executed; a sequence of relational algebra operations forms a relational algebra expression [5]. A SQL query is a relational algebra expression and can be performed with relational algebra operations such as SELECT, PROJECT, JOIN, UNION, INTERSECTION and CARTESIAN PRODUCT. Select and Join are used in this project and are explained in the next sections.

2.1.2 Select Statement and Logical Query Plan

The Select statement, which retrieves data from the specified table(s), is the most commonly used SQL expression. For example, here is a simple Select-From-Where query:

SELECT id, name, job FROM employee WHERE salary > 100

To be able to execute this declarative query, a logical query plan needs to be compiled. A SELECT query is translated into a relational expression using Projection, Selection and Table Scan. The simple query above translates to the following logical query plan (Figure 1):

Figure 1 Logical Query Plan

On execution, the system fetches all records stored in the employee table (TABLE SCAN), then filters the records, discarding all those for which salary is <= 100 (SELECT). Finally, the PROJECT operation keeps only three attributes of each record. The equivalent relational algebra expression is shown below.

However, if more tables are to be joined, the number of possible plans rapidly explodes. All these plans generate identical results but have different costs. Due to this combinatorial explosion it is not possible to perform an exhaustive search for the best query plan. In this project, we try to devise heuristic rules that help us choose the most promising plan in a limited time.
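Using the relational algebra operator names introduced above (PROJECT and SELECT), the plan in Figure 1 corresponds to the expression below; this rendering is implied by the figure rather than copied from it:

    PROJECT[id, name, job]( SELECT[salary > 100]( TABLE SCAN(employee) ) )

Reading the expression inside-out matches the bottom-up execution order of the plan: the table scan produces the employee records, the selection filters on salary, and the projection keeps the three requested attributes.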

2.1.3 Join Query

The JOIN query is the most important of all relational operations. A join clause combines tuples from two source tables. The SQL language supports four types of joins: INNER, OUTER, LEFT and RIGHT JOIN. This project focuses on the INNER JOIN, which is the most commonly used in applications and is also the default join type.

An INNER JOIN essentially combines the records of two tables (A and B) based on a given join predicate. The result of the join can be defined as the outcome of first taking the Cartesian product (or cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate [1].

People
Name   ID
Betty  100
Jones  101
Jack   102

Nationality
ID   Country
100  United Kingdom
101  Australia

SELECT * FROM People INNER JOIN Nationality ON People.ID = Nationality.ID;

People.Name  People.ID  Nationality.Country  Nationality.ID
Betty        100        United Kingdom       100
Jones        101        Australia            101

Figure 2 Inner Join

An EQUIJOIN is a specific type of comparator-based join which uses only equality comparisons (=) in the join predicate. Figure 2 is an example of an EQUIJOIN. Tuples with ID equal to 100 or 101 are accepted because these values appear in both tables. The tuple with People.ID = 102 is discarded, as there is no related value in the Nationality table.

SQL queries often include multiple joins. The SQL language allows joins to be defined explicitly using the JOIN keyword (see the example above). However, user queries usually contain implicit joins, with the join predicates given in the WHERE clause.

2.2 Query Graphs and Query Plans

A query graph is a single graph corresponding to a query. It does not specify the order in which the operations are performed. For example, the join query in the previous section can be translated into Figure 3.

Figure 3 Query Graph Example

A query plan (Figure 4) presents a specific order of operations for executing a query. It is a set of steps used to access and modify data in a SQL RDBMS. Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the different, correct possible plans for executing the query and returns what it considers the best alternative [2].

Figure 4 Query Plan Example

In this project, an SQL query is first analysed and parsed into a query graph. From this query graph, a query plan is then chosen based on our heuristic rules. More details are given in later sections.

2.3 OGSA-DAI

OGSA-DAI stands for Open Grid Services Architecture - Data Access and Integration. The aim of OGSA-DAI is to develop a standard interface for distributed data resources on the Grid. Nowadays there is a great deal of data available, but it is scattered over different databases that are often not linked together: these islands of data need a way to be integrated. An OGSA-DAI web service allows data to be queried, updated, transformed and delivered. OGSA-DAI web services offer data integration functionality to clients and can be deployed within a Grid environment; OGSA-DAI thereby provides a means for users to Grid-enable their data resources [3].

2.3.1 OGSA Grid Environment

The Grid is defined as an infrastructure consisting of multiple computers connected via network technologies that together give the impression of one computer system. In 2001, researchers led by Globus and IBM began developing new Grid standards and technology. The aim was to merge the understanding developed through the design of early Grid applications with Web Services middleware, allowing Grid developers to exploit the huge commercial investment in Web Services infrastructure. The result was the Open Grid Services Architecture (OGSA): a high-level framework designed to support dynamic virtual organizations that share independently administered data and resources seamlessly across a network of heterogeneous computers.

OGSA is used to identify the components needed in a grid system and defines a service-based structure for creating a grid computing environment. Still under development, this architecture defines the major functional components required to meet those requirements. Prof. Ian Foster gave a description of the mechanisms and interfaces defined in OGSA [10][11]. The OGSA services framework is shown in Figure 5. The services are built on Web service standards, with semantics, additions, extensions and modifications that are relevant to Grids [11].

Figure 5 OGSA Services Framework. Cylinders represent individual services.

The important points are the following: An important motivation for OGSA is the composition paradigm, or building block approach, where a set of functions is built or adapted as required. This provides the adaptability, flexibility and robustness to change that is required in the architecture.

The entire set of OGSA capabilities does not have to be present in a system; a system may choose to utilize or provide only a subset of services from any capability. OGSA specifies the services, their interfaces, and the semantics, behaviour and interaction of these services. The architecture is not layered, where the implementation of one service would be built upon another.

2.3.2 OGSA-DAI Software

With the increase of data produced in research and business environments, data management is increasingly challenging. Since 2002, the Open Grid Service Architecture - Data Access and Integration (OGSA-DAI) project, funded by the UK e-Science Programme, has been working to develop an effective solution to the data management challenge, and in particular to data access and integration problems. OGSA-DAI facilitates the data access and integration of data resources, such as relational databases, within a Grid. Paper [12] presents a status report on OGSA-DAI activities and announces future directions. Paper [13] describes a new architecture for future OGSA-DAI releases and its rationale. OGSA-DAI 3.0 is a complete top-to-bottom redesign and reimplementation of the OGSA-DAI product; paper [14] describes the motivation behind this redesign and provides an overview of OGSA-DAI 3.0, comparing and contrasting it with earlier OGSA-DAI releases. (Papers [12] and [13] discuss the old OGSA-DAI product, while paper [14] relates to the current one.)

2.3.3 OGSA-DAI Framework

OGSA-DAI is a framework that enables existing data resources to be integrated into a grid environment. It is middleware for interfacing with databases, allowing data resources such as file systems, relational or XML databases to be accessed, federated and integrated across the network [15]. As well as accessing and updating data in a database, OGSA-DAI offers an extensibility mechanism, making it possible to add further user-defined activities that can be executed in addition to the activities already offered by OGSA-DAI, such as SQL query and update.

The primary components of the new OGSA-DAI architecture are shown in Figure 6 [13]. (This figure is for OGSA-DAI 2.x; the architecture used by OGSA-DAI 3.x is slightly different.) The architecture looks forward to multiple data services administered through a consistent regime. There are three data services: one serves OGSA-DAI, one serves the WS-DAI standard (perhaps as a configuration of OGSA-DAI) and one serves Mobius.

Figure 6 The Architecture of OGSA-DAI

2.3.4 OGSA-DAI Activity

An activity is a workflow unit implementing a certain function linked with a specific name. Any data-related function can be encapsulated as an activity, and activities can be combined to provide complex functionality. OGSA-DAI comes with a default set of activities, such as an SQL query activity, data format transformation activities and a data set union activity. These activities fall into several categories, such as delivery activities and relational activities.

As shown in Figure 7, every activity has client side code and server side code, matched by a unique ID. There are actually three parts in an activity workflow:

1. User code: The client toolkit API allows the user to assemble workflows by connecting activities. It also provides methods for submitting workflows to OGSA-DAI services. It calls the client side code to fill required inputs and declare outputs. Note that one piece of user code can call more than one activity, and every activity can have multiple instances. Executed on the user side.

2. Client side: Manages the inputs and outputs of an activity. Inputs are sent to the server side code and outputs are forwarded to the user. Also executed on the user side.

3. Server side: The part that actually performs the functional task. It may connect to a database (for SQL-related activities). Executed on the server side.

Figure 7 OGSA-DAI Runtime Overview
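As a rough illustration of this split, the sketch below shows the general shape of the server side part of an activity: a function from an input tuple stream to an output tuple stream. Every name in it (TupleListActivity, UpperCaseActivity, the process method) is a hypothetical placeholder invented for illustration, not the actual OGSA-DAI API.

    import java.util.Iterator;

    // Hypothetical stand-in for a server side activity: NOT the real OGSA-DAI interface.
    interface TupleListActivity {
        Iterator<Object[]> process(Iterator<Object[]> input);
    }

    // Example: an activity that upper-cases one column of every tuple.
    class UpperCaseActivity implements TupleListActivity {
        private final int column;

        UpperCaseActivity(int column) { this.column = column; }

        @Override
        public Iterator<Object[]> process(final Iterator<Object[]> input) {
            // Stream tuples through one at a time, as OGSA-DAI pipes do,
            // instead of materialising the whole list in memory.
            return new Iterator<Object[]>() {
                public boolean hasNext() { return input.hasNext(); }
                public Object[] next() {
                    Object[] tuple = input.next();
                    tuple[column] = tuple[column].toString().toUpperCase();
                    return tuple;
                }
            };
        }
    }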

Chapter 3 Analysis and Design of Parallel Algorithms

This chapter contains the requirements capture and the system design. Some implementation details are also introduced in order to give a low-level overview of the main functions and of the solutions to common problems.

3.1 Requirements Capture

The main aim of this project is to use parallelization to optimize the processing of SQL join queries that have huge input tables on a distributed database system. The other aim of this project is to analyse how different query plans affect performance. OGSA-DAI is designed to enable remote access to data. It is a well-designed framework with advantages in the management of distributed database systems, and it simplifies building distributed data processing systems. By using OGSA-DAI in our project we can focus on the parallel algorithms and not worry about the details of distributed processing. The following sections present the basic and additional functional goals of this project.

3.1.1 Basic Functional Goals

This project has four main goals:

- Implementation of the Hash Join algorithm and the Sorted Merge algorithm for query execution on OGSA-DAI.
- Parallelisation of the above algorithms.
- Performance analysis of these two join algorithms.
- Performance analysis of different query plans.

3.1.2 Performance Goals

Because this project is about optimization, it focuses on performance. As a client/server framework, OGSA-DAI may introduce some overhead during execution. In this project we investigate whether this overhead is damaging, how bad it is, and where exactly the time is spent. We also try to find the bottlenecks and to determine whether parallelization reduces execution times.

3.2 Mechanisms of Parallel Query Execution

When executing query operations on large tables, poor performance may occur, especially for complex join operations. There are two limiting factors: the amount of available main memory and the computational complexity. Consequently, we use parallel mechanisms, which address both limitations, to improve runtime efficiency. In the distributed context, where queries may be executed by middleware sitting on top of an RDBMS, we cannot use the foreign-key-based indexes that are available to the local RDBMS. Besides, we have fewer plans to choose from, because the tables are in different locations. In this project we investigate three basic mechanisms for bringing parallelism into join execution. Taking the query R1 JOIN R2 JOIN R3 JOIN R4 (where Ri stands for an input table) as an example, the three mechanisms are:

3.2.1 Independent Parallelism

The above query could be executed in the following three steps:

Step 1: R1 JOIN R2 => R12
Step 2: R3 JOIN R4 => R34
Step 3: R12 JOIN R34 => R1234

Independent parallelism is illustrated in Figure 8. Here the independent steps (Steps 1 and 2 in this case) can be fully parallelised, which can lead to a large speedup. However, scalability is limited: to execute 4 joins independently you need at least 8 relations where every pair is joined independently, which is not a frequent scenario. We get some parallelism, but it will rarely allow us to use, say, 8 processors.

Figure 8 Independent Parallelism

3.2.2 Pipelined Parallelism

Another approach is to build a data processing pipeline with these steps:

Step 1: R1 JOIN R2 => R12
Step 2: R12 JOIN R3 => R123
Step 3: R123 JOIN R4 => R1234

Pipelined parallelism is illustrated in Figure 9. Here there are data dependencies between the steps, which means the steps have to execute one after another: the output of the first operation is used as input to the second. However, if the first operation can be carried out so that partial results are produced and immediately channelled to the second operation, then the first operation can produce its next partial result while the second operation processes the earlier ones. A sketch of this overlap is given below.

Figure 9 Pipelined Parallelism
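A minimal Java sketch of this producer/consumer overlap, using a bounded queue in place of an OGSA-DAI pipe. The two threads, the queue capacity and the stand-in tuples are illustrative assumptions, not project code:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelineSketch {
        // Sentinel marking the end of the tuple stream.
        private static final Object[] EOS = new Object[0];

        public static void main(String[] args) throws InterruptedException {
            // Bounded pipe: the first join pushes partial results here
            // while the second join is already consuming them.
            final BlockingQueue<Object[]> pipe = new ArrayBlockingQueue<>(1024);

            Thread firstJoin = new Thread(() -> {
                try {
                    for (int i = 0; i < 10000; i++) {
                        pipe.put(new Object[] { i, "r12-" + i }); // a partial result of R1 JOIN R2
                    }
                    pipe.put(EOS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            Thread secondJoin = new Thread(() -> {
                try {
                    long consumed = 0;
                    Object[] tuple;
                    while ((tuple = pipe.take()) != EOS) {
                        consumed++; // stand-in for probing R3 with the tuple
                    }
                    System.out.println("second join consumed " + consumed + " tuples");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            firstJoin.start();
            secondJoin.start();
            firstJoin.join();
            secondJoin.join();
        }
    }

The bounded capacity also models the blocking behaviour of a real pipe: if the second join falls behind, the first join stalls instead of flooding memory.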

3.2.3 Partitioned Parallelism

Partitioned parallelism is used for a single join operation (Ri JOIN Rj => Rij). There are three steps to join two tables together using partitioned parallelism:

Step 1: Split the input data into small sets.
Step 2: Join related sets together.
Step 3: Union the previous results together.

Figure 10 shows how partitioned parallelism works.

Figure 10 Partitioned Parallelism

3.2.4 Mixed Parallelism

Different parallelization mechanisms can be applied to different parts of a query plan; a query plan may be divided into parts that use different mechanisms. For example, Figure 11 presents one possible query plan for the following query:

R1 JOIN R2 JOIN R3 JOIN R4 JOIN R5 JOIN R6 JOIN R7

The following equivalence holds for EQUIJOIN:

(R1 JOIN R2) JOIN R3 = R1 JOIN (R2 JOIN R3)

Therefore we can rewrite our query as:

((R1 JOIN R2) JOIN (R3 JOIN R4)) JOIN ((R5 JOIN R6) JOIN R7)

In this plan, the inner bracketed pairs are executed with independent parallelism, while the remaining joins are executed with pipelined parallelism.

Figure 11 Independent and Pipelined Mixed Parallelism

3.3 Partitioning Algorithms

Data partitioning is used in the partitioned parallelism approach to distribute data over a number of processing elements. Each processing element then executes simultaneously with the other processing elements, thereby creating parallelism. It is the basic step of parallel query processing. When partitioning the workload, four partitioning algorithms are taken into consideration [3]; an implementation sketch of the hash variant follows in the next section.

1. Round-robin data partitioning. Data is partitioned by its record number: if the data is partitioned into n parts, the (xn+i)th record is put in the ith block. The biggest advantage of this algorithm is its perfect load balance; every part has the same amount of data (plus or minus one record).

2. Hash data partitioning. Data is partitioned by applying a hash function, so every new work set has its own specific set of attribute values. However, load balance will be poor if the distribution of values is skewed. For example, suppose we partition the work set {1, 2, 3, 4, 6, 7, 8, 11, 12, 16} into five parts with the hash function (x mod 5). The result of the partitioning is:

Work set 1: {1, 6, 11, 16}
Work set 2: {2, 7, 12}
Work set 3: {3, 8}
Work set 4: {4}
Work set 5: {}

This shows the potentially bad load balance of hash data partitioning. On the other hand, hash data partitioning is the best way to handle EQUIJOIN operations, while range data partitioning suits joins with greater-than / less-than predicates. If we use the same hash function to partition both join inputs, related tuples end up in buckets with the same index, so when processing an EQUIJOIN we can easily match the related hash-split work sets together.

3. Range data partitioning. A simple example makes this algorithm easy to understand: partition a set of discrete numbers into three subsets.

All numbers less than 100 go into set 1, numbers in the range [101, 1000] form set 2, and numbers larger than 1000 form the last set. Range data partitioning has pros and cons similar to those of hash data partitioning.

4. Random-unequal data partitioning. The partitioning function may be a hash or range function, or simply unknown; data is grouped randomly.

All these partitioning algorithms have their advantages and disadvantages, so the partitioning algorithm should be chosen based on the type of join algorithm. This project uses hash data partitioning and round-robin partitioning. The former is used for the Hash Split Join algorithm, because it splits data based on the values of the input tuples; the latter is used for the Sorted Merge Join algorithm, because when splitting the input for that algorithm we do not care about tuple values and split the data randomly in order to ensure a good load balance. Further information is available in the following sections.

3.4 Parallel Join Implementations

There are many parallel join algorithms, but this project focuses on two of them: the Hash Split Join algorithm and the Sorted Merge Join algorithm. This section describes the structure of the OGSA-DAI server side and client side code.

3.4.1 Hash Split Join

In this algorithm, both input tuple sets are first split on their join key using a default hash function. Given a value K, the hash value produced by the default hash function is:

hash(K) = K.hashCode() mod NUM

where NUM stands for the number of output subsets and hashCode() is the Java library method of the Object class that returns an integer. After splitting, related sets can be joined in parallel; every set contains one part of the final result. The last step is to union all join results into the final result.

The following activities were implemented to support the hash split join algorithm in OGSA-DAI. For each activity the inputs, outputs and behaviour are described, after a short sketch of the split function.
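A minimal Java sketch of this split step. Math.floorMod is used because hashCode() may return a negative value, a detail the formula above glosses over; representing tuples as Object[] and addressing the key by column index are simplifying assumptions of the sketch:

    import java.util.ArrayList;
    import java.util.List;

    public class HashSplitSketch {
        // Partition tuples into num buckets on the value of the key column.
        public static List<List<Object[]>> hashSplit(Iterable<Object[]> tuples,
                                                     int keyIndex, int num) {
            List<List<Object[]>> buckets = new ArrayList<>();
            for (int i = 0; i < num; i++) {
                buckets.add(new ArrayList<>());
            }
            for (Object[] tuple : tuples) {
                // hash(K) = K.hashCode() mod NUM, kept non-negative
                int bucket = Math.floorMod(tuple[keyIndex].hashCode(), num);
                buckets.get(bucket).add(tuple);
            }
            return buckets;
        }
    }

Because both join inputs are split with the same function, tuples with equal keys land in buckets with the same index, so bucket i of the left input only ever needs to be joined with bucket i of the right input.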

Hash Split Activity

Splits the input data, on the column with the given name, into a given number of sets using the hash function.

Activity inputs:
Data. Type: OGSA-DAI list of Tuples. A stream of tuples to be split.
Name. Type: String. The name of the column to split on.
Number. Type: Integer. The number of output sets.
Activity outputs:
Result. Type: Array of OGSA-DAI lists of Tuples.

Hash Join Activity

Joins two sets together as an inner join. Note that this is a generic activity: it can also be used to join un-split input sets. Its core build-and-probe loop is sketched after this list of activities.

Activity inputs:
Data1. Type: OGSA-DAI list of Tuples. The first dataset to be joined.
Data2. Type: OGSA-DAI list of Tuples. The second dataset to be joined.
Name1. Type: String. The name of the column to use for the join from the first dataset.
Name2. Type: String. The name of the column to use for the join from the second dataset.
Activity outputs:
Result. Type: OGSA-DAI list of Tuples.

Union All Activity

Unions the given array of tuple lists into one list. This activity is used to generate the final result.

Activity inputs:
Data. Type: Array of OGSA-DAI lists of Tuples. The datasets to be unioned together.
Number. Type: Integer. The number of datasets to be unioned together.
Activity outputs:
Result. Type: OGSA-DAI list of Tuples.
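The core of the Hash Join Activity can be sketched as a classic build-and-probe loop. This is a sketch only, with tuples as Object[] values held in Java lists rather than streamed OGSA-DAI tuple lists:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class HashJoinSketch {
        // Inner equi-join of two tuple sets on one column from each side.
        public static List<Object[]> hashJoin(List<Object[]> left, int leftKey,
                                              List<Object[]> right, int rightKey) {
            // Build phase: index the (ideally smaller) left input by its key.
            Map<Object, List<Object[]>> index = new HashMap<>();
            for (Object[] l : left) {
                index.computeIfAbsent(l[leftKey], k -> new ArrayList<>()).add(l);
            }
            // Probe phase: scan the right input and emit matching combinations.
            List<Object[]> result = new ArrayList<>();
            for (Object[] r : right) {
                List<Object[]> matches = index.get(r[rightKey]);
                if (matches == null) continue; // no partner, tuple is discarded
                for (Object[] l : matches) {
                    Object[] joined = new Object[l.length + r.length];
                    System.arraycopy(l, 0, joined, 0, l.length);
                    System.arraycopy(r, 0, joined, l.length, r.length);
                    result.add(joined);
                }
            }
            return result;
        }
    }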

Hash Split Join User Side Code

This is the user side function that manages all hash-join-related activities. It connects each activity's outputs to the appropriate inputs; it is the part that builds the entire workflow from single activities.

Inputs:
Query. Type: SQL query. The request to be executed.
Number. Type: Integer. The number of processors.
Outputs:
Result. Type: OGSA-DAI list of Tuples.

3.4.2 Sorted Merge Join

The sorted merge join algorithm is different from the hash split join algorithm. It needs four steps: split, sort the split sets, merge the split sets, and sorted join. The algorithm uses parallelization to sort the original input set and performs the low-complexity sorted join as the last step. First of all, the input tuples are split into subsets. These balanced sets can then be sorted in parallel and merged together. The last step is to join the two ordered sets into the final result. The following activities were used to form the sorted merge join algorithm in OGSA-DAI.

Random Split Activity

Splits the input data into subsets of equal size.

Activity inputs:
Data. Type: OGSA-DAI list of Tuples. A stream of tuples to be split.
Number. Type: Integer. The number of output sets.
Activity outputs:
Result. Type: Array of OGSA-DAI lists of Tuples.

Tuple Sort Activity

Sorts the input data by the given column.

Activity inputs:
Data. Type: OGSA-DAI list of Tuples. A stream of tuples to be sorted.
Name. Type: String. The name of the column to sort on.
Activity outputs:
Result. Type: OGSA-DAI list of Tuples.

Sorted Merge Activity

Merges sorted sets into one. This function only needs to scan every input set once, which leads to good performance; a sketch of the single-pass merge follows this list of activities.

Activity inputs:
Data. Type: Array of OGSA-DAI lists of Tuples. The ordered datasets to be merged together.
Number. Type: Integer. The number of sets.
Name. Type: String. The name of the column used for the merge.
Activity outputs:
Result. Type: OGSA-DAI list of Tuples.

Sorted Join Activity

Joins two ordered sets together. This function also needs to scan each input set only once.

Activity inputs:
Data1. Type: OGSA-DAI list of Tuples. The first dataset to be joined.
Data2. Type: OGSA-DAI list of Tuples. The second dataset to be joined.
Name1. Type: String. The name of the column to use for the join from the first dataset.
Name2. Type: String. The name of the column to use for the join from the second dataset.
Activity outputs:
Result. Type: OGSA-DAI list of Tuples.
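The single scan claimed for the Sorted Merge Activity is a standard k-way merge. Below is a minimal sketch using a priority queue keyed on the merge column; comparable key values and Object[] tuples are assumptions of the sketch:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class SortedMergeSketch {
        // Merge k tuple streams that are each already sorted on column keyIndex.
        @SuppressWarnings("unchecked")
        public static List<Object[]> merge(List<Iterator<Object[]>> inputs, final int keyIndex) {
            // Heap entries are pairs: { tuple, index of the stream it came from }.
            PriorityQueue<Object[]> heap = new PriorityQueue<>((a, b) ->
                    ((Comparable<Object>) ((Object[]) a[0])[keyIndex])
                            .compareTo(((Object[]) b[0])[keyIndex]));
            for (int i = 0; i < inputs.size(); i++) {
                if (inputs.get(i).hasNext()) {
                    heap.add(new Object[] { inputs.get(i).next(), i });
                }
            }
            List<Object[]> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                Object[] entry = heap.poll();
                out.add((Object[]) entry[0]);
                int src = (Integer) entry[1];
                // Refill from the stream that supplied the smallest tuple.
                if (inputs.get(src).hasNext()) {
                    heap.add(new Object[] { inputs.get(src).next(), src });
                }
            }
            return out;
        }
    }

Each tuple passes through the heap exactly once, so merging N tuples from k sets costs O(N log k).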

Sorted Merge Join User Side Code

This is the user side code. Like the Hash Split Join user side code, this function manages all sorted-merge-join-related activities; it is the part that builds the entire workflow from single activities.

Inputs:
Query. Type: SQL query. The request to be executed.
Number. Type: Integer. The number of processors.
Outputs:
Result. Type: OGSA-DAI list of Tuples.

3.5 User Side Workflow

3.5.1 OGSA-DAI Workflow

The OGSA-DAI class PipelineWorkflow is used to assemble activities into a workflow. Using this class, activities can be organized in their logical order, and independent activities can run in parallel. In detail, this is how OGSA-DAI workflows are assembled and executed programmatically:

Step 1: Initialize all the activities in the workflow.
Step 2: Connect the activities' inputs and outputs.
Step 3: Add the activities to the pipeline.
Step 4: Get a handle to an OGSA-DAI DataRequestExecutionResource object, then execute the entire pipeline on this object.

Note that this class is called a pipeline only because it organizes related activities as a pipeline; it still allows parallelization. For example, several sort activities can run in parallel. A sketch of this assembly pattern is shown below.
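The four steps above can be sketched as follows. PipelineWorkflow and DataRequestExecutionResource are named in the OGSA-DAI client toolkit, but the activity classes and the connect/get/execute method names shown here are assumptions mirroring the description above rather than verified signatures, and the surrounding server setup is omitted:

    // Step 1: initialize the activities (the resource name and query are made up).
    SQLQuery query = new SQLQuery();
    query.setResourceID("MySQLResource");
    query.addExpression("SELECT * FROM orders");
    TupleSort sort = new TupleSort();

    // Step 2: connect the output of one activity to the input of the next.
    sort.connectDataInput(query.getDataOutput());

    // Step 3: add the activities to the pipeline.
    PipelineWorkflow pipeline = new PipelineWorkflow();
    pipeline.add(query);
    pipeline.add(sort);

    // Step 4: execute the pipeline on a DataRequestExecutionResource.
    DataRequestExecutionResource drer =
            server.getDataRequestExecutionResource("DataRequestExecutionResource");
    drer.execute(pipeline, RequestExecutionType.SYNCHRONOUS);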

3.5.2 Serial Join

The serial join reads both tables and joins them together without any parallel optimization. It is used to generate comparable results for testing and a baseline execution time for performance comparison. The main steps in the workflow are the following:

Step 1: Read the tables from the database using the SQLQuery activity.
Step 2: Sort both the left side and the right side table using the TupleSort activity.
Step 3: Join the tables using the default OGSA-DAI TupleMergeJoin activity.

Figure 12 shows the workflow of the serial join. It uses the default join activity provided by OGSA-DAI; because the default join activity requires ordered inputs, the input tuples need to be sorted first.

Figure 12 Serial Join Workflow

3.5.3 Hash Split Join

As mentioned in Section 3.4.1, Figure 13 illustrates the Hash Split Join user side workflow:

Step 1: Read the tables from the database using the SQLQuery activity.
Step 2: Split both the left side and the right side table into hash sets using the HashSplit activity.
Step 3: Use a sort-merge join as the smallest join unit:
Step 3.1: Sort every hash set using the TupleSort activity.
Step 3.2: Join the ordered sets using the SortedMergeJoin activity.
Step 4: Union all the results using the UnionAll activity.

Figure 13 Hash Split Join

3.5.4 Sorted Merge Join

As mentioned in Section 3.4.2, Figure 14 illustrates the Sorted Merge Join user side workflow, which is quite similar to the Hash Split Join workflow:

Step 1: Read the tables from the database using the SQLQuery activity.
Step 2: Split both the left side and the right side table into sets randomly using the RandomSplit activity.
Step 3: Sort every set using the TupleSort activity.
Step 4: Combine the sorted sets from Step 3 into one big sorted set using the SortedMerge activity.
Step 5: Join the two sorted sets using the SortedJoin activity.

Figure 14 Sorted Merge Join

Chapter 4 Performance Analysis

4.1 Experimental Setup

4.1.1 Test Environment

Key software used in our tests:

Linux: el5 x86_64 GNU/Linux
Tomcat
OGSA-DAI-3.1-axis

The test machine is Ness, a parallel machine based on AMD Opteron processors running Linux, with a shared memory architecture. The system consists of two back-end X4600 SMP nodes; each node contains 16 processor cores with 2 GB of memory per core. This project uses only one of the back-end nodes, with a maximum of 16 cores. Furthermore, all queries request data from an IBM DB2 server if not specifically mentioned otherwise. More details about the databases and the machines they run on are included later in this chapter.

4.1.2 Test Data Set

In order to evaluate correctness and performance, the TPC-H benchmark is used. TPC is the Transaction Processing Performance Council, and TPC benchmarks are widely used to evaluate the performance and verify the correctness of database systems. The TPC Benchmark H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions [4].

The default TPC-H data has a uniform distribution of values. In order to analyse performance under various distributions, TPC-H Skew, a modified version of the benchmark provided by Surajit Chaudhuri and Vivek Narasayya from Microsoft, is used to test the performance on unbalanced input data.

The TPC-H generator allows the size of the dataset to be chosen; this project uses the default setting of 100 MB as the database size.

4.1.3 Test Query

As the test query, this project chooses the TPC-H query that joins the largest number of tables:

SELECT *
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey

This query involves the two largest tables in the database: lineitem (600,000 tuples) and orders (150,000 tuples). The query is complex enough for our needs, as it joins six tables together. The bad news is that it is hard to reuse previously sorted tuple streams in consecutive joins: when the query is analysed, it is clear that the tables are joined on different keys, which means tuples must be split by different keys and sorted in different orders. In this situation there is no way to reuse previous results. However, after analysing the other TPC-H queries, it is clear that this is the case for most of them.

4.1.4 Query Graph

The large query of the previous section can be represented graphically as a query graph (Figure 15). Nodes represent the source tables, and JOIN operations are represented by the graph edges. The query graph is a compact and convenient representation of join queries, and it is used in join ordering algorithms.

Figure 15 Query Graph

4.1.5 Query Plan

The query graph presented in Figure 15 can be mapped to a logical query plan, shown in Figure 16. Note that there may be several mappings that result in semantically equivalent query plans.

Figure 16 Query Tree

The last step is to translate the logical query plan into executable code. Alternatively, the query graph can be transformed into other query plans; other examples of equivalent query plans are presented later in the thesis.

4.1.6 Measurement of Time

The OGSA-DAI workflow is defined only on the client side, and OGSA-DAI activities are initialized only in user side code; however, the workflow executes on the server side after its initialization.

As a result, only the overall running time is available in the user side test code; we cannot measure the cost of each activity there. An alternative solution is to add timers before and after each OGSA-DAI execution unit as debug information, which yields an approximate running time for each single activity. As the initialization cost of an activity is excluded from this measurement, the time measured this way is smaller than the real time, because it does not account for pre-processing and post-processing.

However, the OGSA-DAI mechanism is not well suited to timing individual activities. When the output stream of one activity is connected to another's input stream, the sender produces its data in small chunks and inserts it into the connecting pipe block by block. Once the first data chunk has been sent, the receiving activity starts and blocks waiting for its input stream, so the timelines of the two activities may overlap. That is also why the sum of the individual activity running times may be larger than the overall running time measured on the user side.

4.1.7 Script for Submission

As this project measures performance on up to 16 processors, OGSA-DAI must run on the back end of Ness. Because OGSA-DAI runs on the Tomcat server, the submission script is organized in five steps: 1) set the environment parameters; 2) start up the Tomcat server; 3) sleep for a while until its service has booted successfully; 4) run the test case; 5) shut down the Tomcat server. The script is available in Appendix B.

4.1.8 Reproducibility of Measurements

One thing should be noted: Java needs a pre-run to warm up. A warm-up helps OGSA-DAI initialize its context on the first request and perform just-in-time compilation. Without the warm-up phase, the performance of the initial test run may differ significantly from subsequent runs; in our case the initial run is about four times slower than the second run. In order to solve this problem, every test contains an inner loop that executes the same query ten times. The running time of each query is measured as the average time of the last nine iterations (the result of the first one is discarded), as sketched below.
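The measurement loop can be sketched as follows; runQuery stands in for one complete workflow submission and execution:

    public class TimingSketch {
        public static void main(String[] args) {
            final int rounds = 10;
            long total = 0;
            for (int i = 0; i < rounds; i++) {
                long start = System.nanoTime();
                runQuery(); // stand-in for executing the whole workflow once
                long elapsed = System.nanoTime() - start;
                if (i > 0) {
                    total += elapsed; // discard the warm-up round
                }
            }
            System.out.printf("average over last %d runs: %.3f s%n",
                    rounds - 1, total / (double) (rounds - 1) / 1e9);
        }

        private static void runQuery() {
            try { Thread.sleep(10); } catch (InterruptedException ignored) {} // placeholder workload
        }
    }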

Besides, the JIT (just-in-time) compiler may also play a role in these tests. This technique is used to improve the runtime performance of a program; it is an automatic optimisation based on runtime analysis and dynamic translation. It improves on interpretation by speeding up the hot spots of the code, and it can recompile code when this is found to be advantageous. Figure 17 illustrates that there is a more than four-fold speedup when executing a query more than once. This can be attributed to the fact that OGSA-DAI only initializes its context on the first request; repeated query requests save this time in the remaining executions. In the real world, the OGSA-DAI service would already be initialized, so the first (and slowest) query running time is discarded. Furthermore, it is hard to identify any benefit brought by the JIT technique itself. If it mattered, the running time of the second round should be slightly slower than the later ones, because the extra recompilation time would be counted in round two while the resulting speedup would only appear in the remaining rounds. It may also be the case that all JIT optimisations are applied during the initial run.

Figure 17 Running Time of Reproduced Test. The result is based on the large parallel test (150,000 tuples).

4.2 Performance Analysis for Single Activity

The performance and analysis of every activity is presented in this section. This information is used to spot the bottlenecks in the join workflow. Bandwidth, calculated by dividing the number of processed tuples by the processing time, is used to measure the performance of these activities.

Every activity has pre-process, process and post-process steps. When evaluating an activity, the timer starts at its pre-process step and ends at its post-process step.

4.2.1 Swallow Activity

The swallow activity is used to empty a given tuple list. It goes through its input (an OGSA-DAI requirement) and returns an empty list or a count as output. This activity is very fast due to its empty body: it basically swallows input tuples. It has two purposes:

a) It removes noise from the time measurements. As OGSA-DAI requires all activity inputs and outputs in the workflow to be connected, we need to add some activities that are not essential for the JOIN operation. For example, the TupleToWebRowSetCharArrays activity, which transforms a tuple list into a web-readable format, is the last part of the workflow. When its input set is large, this activity is really slow and contributes more than 80% of the overall running time. However, this activity only transfers data and is not part of the actual join processing. The solution is to add a swallow activity before the TupleToWebRowSetCharArrays activity: with an empty input set, the latter takes only a negligible time to execute and no longer damages the performance.

b) The measured performance of an individual activity may be misleading due to the mechanics of an OGSA-DAI workflow. As shown in Figure 18, Activity 3 needs the output of both Activity 1 and Activity 2. If Activity 1 is slower than Activity 2, Activity 2 will block and wait for Activity 1 to finish; in this case the measured time of Activity 2 will be larger than its true cost.

Figure 18 Workflow without Swallow Activity

A swallow activity handles this problem easily. As shown in Figure 19, when measuring the running time of an individual activity, a swallow activity is added behind the target activity. In the case above, Activity 2 then ends without waiting; the waiting time is transferred to the swallow activity and no longer distorts the measurement.

Figure 19 Workflow with Swallow Activity

To conclude, a swallow activity is inserted after every measured activity in order to time one part of the workflow; it is a very helpful activity that also simplifies the code.

4.2.2 Performance of SQL Query Activities

Table 1 shows the performance of the SQL Query Activity. This is a serial activity. The workflow for this test is:

SQL Query -> Swallow -> TupleToWebRowSetCharArrays

The running time in this section is measured as the overall running time of this workflow. As the workflow contains two additional, inexpensive activities, the measured time is slightly larger than the activity's true cost. According to the table, this activity has a bandwidth of 45,000 to 95,000 tuples per second, depending on the data size, and the bandwidth increases with the number of tuples. As the SQL Query activity contains steps with a steady, non-negligible cost (such as setup) as well as steps whose cost is closely related to the data size, this activity performs better when handling large data sets.

Table 1 Bandwidth of SQL Query Activity

4.2.3 Performance of Split Activities

Both the Hash Split Activity and the Random Split Activity are O(N) tasks. As the number of tuples decreases, the running times of both activities decrease in the same pattern. Because the Hash Split Activity applies an extra hash function, it is slightly slower than the Random Split Activity; however, this extra hash function contributes only a small part of the overall activity running time, so the performance of the two split activities is almost the same. In this section the Hash Split Activity is used to illustrate the performance. The execution time reported here measures only the split activity. This is a serial activity. The workflow used to evaluate the performance is:

SQL Query -> Split -> Swallow -> TupleToWebRowSetCharArrays

For the reasons pointed out in Section 4.1.6, the SQL Query activity introduces some noise. Table 2 shows the bandwidth of this activity.

Table 2 Bandwidth of Hash Split Activity


More information

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency

More information

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff

Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff D80198GC10 Oracle Database 12c SQL and Fundamentals Summary Duration Vendor Audience 5 Days Oracle End Users, Developers, Technical Consultants and Support Staff Level Professional Delivery Method Instructor-led

More information

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.

Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs. Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design

More information

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification

Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

Parallel & Distributed Data Management

Parallel & Distributed Data Management Parallel & Distributed Data Management Kai Shen Data Management Data management Efficiency: fast reads/writes Durability and consistency: data is safe and sound despite failures Usability: convenient interfaces

More information

Chapter 1. Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages

Chapter 1. Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705. CS-4337 Organization of Programming Languages Chapter 1 CS-4337 Organization of Programming Languages Dr. Chris Irwin Davis Email: cid021000@utdallas.edu Phone: (972) 883-3574 Office: ECSS 4.705 Chapter 1 Topics Reasons for Studying Concepts of Programming

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Raima Database Manager Version 14.0 In-memory Database Engine

Raima Database Manager Version 14.0 In-memory Database Engine + Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized

More information

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor

PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor The research leading to these results has received funding from the European Union's Seventh Framework

More information

SQL Server Query Tuning

SQL Server Query Tuning SQL Server Query Tuning Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author Pro SQL Server

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Planning the Installation and Installing SQL Server

Planning the Installation and Installing SQL Server Chapter 2 Planning the Installation and Installing SQL Server In This Chapter c SQL Server Editions c Planning Phase c Installing SQL Server 22 Microsoft SQL Server 2012: A Beginner s Guide This chapter

More information

Improving SQL Server Performance

Improving SQL Server Performance Informatica Economică vol. 14, no. 2/2010 55 Improving SQL Server Performance Nicolae MERCIOIU 1, Victor VLADUCU 2 1 Prosecutor's Office attached to the High Court of Cassation and Justice 2 Prosecutor's

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

Information and Communications Technology Courses at a Glance

Information and Communications Technology Courses at a Glance Information and Communications Technology Courses at a Glance Level 1 Courses ICT121 Introduction to Computer Systems Architecture This is an introductory course on the architecture of modern computer

More information

Locality-Sensitive Operators for Parallel Main-Memory Database Clusters

Locality-Sensitive Operators for Parallel Main-Memory Database Clusters Locality-Sensitive Operators for Parallel Main-Memory Database Clusters Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner*, Angelika Reiser, Alfons Kemper, Thomas Neumann Technische Universität München,

More information

In-Memory Columnar Databases HyPer. Arto Kärki University of Helsinki 30.11.2012

In-Memory Columnar Databases HyPer. Arto Kärki University of Helsinki 30.11.2012 In-Memory Columnar Databases HyPer Arto Kärki University of Helsinki 30.11.2012 1 Introduction Columnar Databases Design Choices Data Clustering and Compression Conclusion 2 Introduction The relational

More information

Optimizing the Performance of Your Longview Application

Optimizing the Performance of Your Longview Application Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not

More information

Performance Comparison of Database Access over the Internet - Java Servlets vs CGI. T. Andrew Yang Ralph F. Grove

Performance Comparison of Database Access over the Internet - Java Servlets vs CGI. T. Andrew Yang Ralph F. Grove Performance Comparison of Database Access over the Internet - Java Servlets vs CGI Corresponding Author: T. Andrew Yang T. Andrew Yang Ralph F. Grove yang@grove.iup.edu rfgrove@computer.org Indiana University

More information

The Classical Architecture. Storage 1 / 36

The Classical Architecture. Storage 1 / 36 1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage

More information

Reversing Statistics for Scalable Test Databases Generation

Reversing Statistics for Scalable Test Databases Generation Reversing Statistics for Scalable Test Databases Generation Entong Shen Lyublena Antova Pivotal (formerly Greenplum) DBTest 2013, New York, June 24 1 Motivation Data generators: functional and performance

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

FileMaker 12. ODBC and JDBC Guide

FileMaker 12. ODBC and JDBC Guide FileMaker 12 ODBC and JDBC Guide 2004 2012 FileMaker, Inc. All Rights Reserved. FileMaker, Inc. 5201 Patrick Henry Drive Santa Clara, California 95054 FileMaker and Bento are trademarks of FileMaker, Inc.

More information

Technical Writing - A Practical Case Study on ehl 2004r3 Scalability testing

Technical Writing - A Practical Case Study on ehl 2004r3 Scalability testing ehl 2004r3 Scalability Whitepaper Published: 10/11/2005 Version: 1.1 Table of Contents Executive Summary... 3 Introduction... 4 Test setup and Methodology... 5 Automated tests... 5 Database... 5 Methodology...

More information

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able

More information

IT2305 Database Systems I (Compulsory)

IT2305 Database Systems I (Compulsory) Database Systems I (Compulsory) INTRODUCTION This is one of the 4 modules designed for Semester 2 of Bachelor of Information Technology Degree program. CREDITS: 04 LEARNING OUTCOMES On completion of this

More information

In-memory Tables Technology overview and solutions

In-memory Tables Technology overview and solutions In-memory Tables Technology overview and solutions My mainframe is my business. My business relies on MIPS. Verna Bartlett Head of Marketing Gary Weinhold Systems Analyst Agenda Introduction to in-memory

More information

Database Application Developer Tools Using Static Analysis and Dynamic Profiling

Database Application Developer Tools Using Static Analysis and Dynamic Profiling Database Application Developer Tools Using Static Analysis and Dynamic Profiling Surajit Chaudhuri, Vivek Narasayya, Manoj Syamala Microsoft Research {surajitc,viveknar,manojsy}@microsoft.com Abstract

More information

Report Paper: MatLab/Database Connectivity

Report Paper: MatLab/Database Connectivity Report Paper: MatLab/Database Connectivity Samuel Moyle March 2003 Experiment Introduction This experiment was run following a visit to the University of Queensland, where a simulation engine has been

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages ICOM 4036 Programming Languages Preliminaries Dr. Amirhossein Chinaei Dept. of Electrical & Computer Engineering UPRM Spring 2010 Language Evaluation Criteria Readability: the ease with which programs

More information

Understanding the Benefits of IBM SPSS Statistics Server

Understanding the Benefits of IBM SPSS Statistics Server IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster

More information

Facebook s Petabyte Scale Data Warehouse using Hive and Hadoop

Facebook s Petabyte Scale Data Warehouse using Hive and Hadoop Facebook s Petabyte Scale Data Warehouse using Hive and Hadoop Why Another Data Warehousing System? Data, data and more data 200GB per day in March 2008 12+TB(compressed) raw data per day today Trends

More information

Data Grids. Lidan Wang April 5, 2007

Data Grids. Lidan Wang April 5, 2007 Data Grids Lidan Wang April 5, 2007 Outline Data-intensive applications Challenges in data access, integration and management in Grid setting Grid services for these data-intensive application Architectural

More information

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

More information

Performance Tuning for the Teradata Database

Performance Tuning for the Teradata Database Performance Tuning for the Teradata Database Matthew W Froemsdorf Teradata Partner Engineering and Technical Consulting - i - Document Changes Rev. Date Section Comment 1.0 2010-10-26 All Initial document

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Chapter 1: Introduction. Database Management System (DBMS) University Database Example This image cannot currently be displayed. Chapter 1: Introduction Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Database Management System (DBMS) DBMS contains information

More information

The Sierra Clustered Database Engine, the technology at the heart of

The Sierra Clustered Database Engine, the technology at the heart of A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel

More information

Efficient database auditing

Efficient database auditing Topicus Fincare Efficient database auditing And entity reversion Dennis Windhouwer Supervised by: Pim van den Broek, Jasper Laagland and Johan te Winkel 9 April 2014 SUMMARY Topicus wants their current

More information

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file? Files What s it all about? Information being stored about anything important to the business/individual keeping the files. The simple concepts used in the operation of manual files are often a good guide

More information

Postgres Plus xdb Replication Server with Multi-Master User s Guide

Postgres Plus xdb Replication Server with Multi-Master User s Guide Postgres Plus xdb Replication Server with Multi-Master User s Guide Postgres Plus xdb Replication Server with Multi-Master build 57 August 22, 2012 , Version 5.0 by EnterpriseDB Corporation Copyright 2012

More information

MOC 20461C: Querying Microsoft SQL Server. Course Overview

MOC 20461C: Querying Microsoft SQL Server. Course Overview MOC 20461C: Querying Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to query Microsoft SQL Server. Students will learn about T-SQL querying, SQL Server

More information

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals NEW Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the

More information

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General

More information

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th

irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th irods and Metadata survey Version 0.1 Date 25th March Purpose Survey of Status Complete Author Abhijeet Kodgire akodgire@indiana.edu Table of Contents 1 Abstract... 3 2 Categories and Subject Descriptors...

More information

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

HansaWorld SQL Training Material

HansaWorld SQL Training Material HansaWorld University HansaWorld SQL Training Material HansaWorld Ltd. January 2008 Version 5.4 TABLE OF CONTENTS: TABLE OF CONTENTS:...2 OBJECTIVES...4 INTRODUCTION...5 Relational Databases...5 Definition...5

More information

Virtuoso and Database Scalability

Virtuoso and Database Scalability Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of

More information

1/20/2016 INTRODUCTION

1/20/2016 INTRODUCTION INTRODUCTION 1 Programming languages have common concepts that are seen in all languages This course will discuss and illustrate these common concepts: Syntax Names Types Semantics Memory Management We

More information

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24 Data Federation Administration Tool Guide Content 1 What's new in the.... 5 2 Introduction to administration

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Decomposition into Parts. Software Engineering, Lecture 4. Data and Function Cohesion. Allocation of Functions and Data. Component Interfaces

Decomposition into Parts. Software Engineering, Lecture 4. Data and Function Cohesion. Allocation of Functions and Data. Component Interfaces Software Engineering, Lecture 4 Decomposition into suitable parts Cross cutting concerns Design patterns I will also give an example scenario that you are supposed to analyse and make synthesis from The

More information

How to Design and Create Your Own Custom Ext Rep

How to Design and Create Your Own Custom Ext Rep Combinatorial Block Designs 2009-04-15 Outline Project Intro External Representation Design Database System Deployment System Overview Conclusions 1. Since the project is a specific application in Combinatorial

More information

Query Optimization in Teradata Warehouse

Query Optimization in Teradata Warehouse Paper Query Optimization in Teradata Warehouse Agnieszka Gosk Abstract The time necessary for data processing is becoming shorter and shorter nowadays. This thesis presents a definition of the active data

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

3. Relational Model and Relational Algebra

3. Relational Model and Relational Algebra ECS-165A WQ 11 36 3. Relational Model and Relational Algebra Contents Fundamental Concepts of the Relational Model Integrity Constraints Translation ER schema Relational Database Schema Relational Algebra

More information

MS SQL Performance (Tuning) Best Practices:

MS SQL Performance (Tuning) Best Practices: MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware

More information

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc.

Portable Scale-Out Benchmarks for MySQL. MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc. Portable Scale-Out Benchmarks for MySQL MySQL User Conference 2008 Robert Hodges CTO Continuent, Inc. Continuent 2008 Agenda / Introductions / Scale-Out Review / Bristlecone Performance Testing Tools /

More information

Object Oriented Database Management System for Decision Support System.

Object Oriented Database Management System for Decision Support System. International Refereed Journal of Engineering and Science (IRJES) ISSN (Online) 2319-183X, (Print) 2319-1821 Volume 3, Issue 6 (June 2014), PP.55-59 Object Oriented Database Management System for Decision

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals Oracle University Contact Us: 1.800.529.0165 Oracle Database: SQL and PL/SQL Fundamentals Duration: 5 Days What you will learn This course is designed to deliver the fundamentals of SQL and PL/SQL along

More information

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001

ICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 ICOM 6005 Database Management Systems Design Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 Readings Read Chapter 1 of text book ICOM 6005 Dr. Manuel

More information

OPTIMIZING QUERIES IN SQL SERVER 2008

OPTIMIZING QUERIES IN SQL SERVER 2008 Scientific Bulletin Economic Sciences, Vol. 9 (15) - Information technology - OPTIMIZING QUERIES IN SQL SERVER 2008 Professor Ph.D. Ion LUNGU 1, Nicolae MERCIOIU 2, Victor VLĂDUCU 3 1 Academy of Economic

More information

Database Management. Chapter Objectives

Database Management. Chapter Objectives 3 Database Management Chapter Objectives When actually using a database, administrative processes maintaining data integrity and security, recovery from failures, etc. are required. A database management

More information

Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition

Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition Liferay Portal Performance Benchmark Study of Liferay Portal Enterprise Edition Table of Contents Executive Summary... 3 Test Scenarios... 4 Benchmark Configuration and Methodology... 5 Environment Configuration...

More information

10g versions followed on separate paths due to different approaches, but mainly due to differences in technology that were known to be huge.

10g versions followed on separate paths due to different approaches, but mainly due to differences in technology that were known to be huge. Oracle BPM 11g Platform Analysis May 2010 I was privileged to be invited to participate in "EMEA BPM 11g beta bootcamp" in April 2010, where I had close contact with the latest release of Oracle BPM 11g.

More information