Key/Value Pair versus hstore


Benchmarking Entity-Attribute-Value Structures in PostgreSQL

HSR Hochschule für Technik Rapperswil
Institut für Software
Oberseestrasse 10, Postfach 1475, CH-8640 Rapperswil

Advisor: Prof. Stefan Keller
Author: Michel Ott
Rapperswil, May 26th, 2011

Table of Contents

Table of Figures
    Tables
    Figures
    Listings
List Of Abbreviations
1 Introduction
    Project description
    Restrictions on the scope of the project
2 Overview
    PostgreSQL
    Key Value Pair
    Hstore
        Functions
        Working principle
    Benchmark Tools
        Pgbench
        HSR Texas Geo Database Benchmark
3 Benchmark proposal
    Terms
    Generate / Preprocessing phase
    Execution phase
    Benchmark / Analysis phase
    Performance Benchmark Design
        Table Schema
        Statements
        Datasets
    Test Application
4 Benchmark Mai
    Technical specification
    Execution
        Preparing Database
        Generating Test Data
        Executing Benchmark
    Results
    Findings
Conclusion
Bibliography
Appendix

University of Applied Science Rapperswil

Table of Figures

Tables
Table 1: KVP additional information table
Table 2: KVP table
Table 3: Columns of a test dataset record
Table 4: Number of dataset length
Table 5: Input parameters for test data generator
Table 6: Input parameters for benchmark application
Table 7: Hardware specification of system under test
Table 8: Software specification of system under test
Table 9: Hstore table abstract
Table 10: KVP table abstract
Table 11: Hstore tuple examples

Figures
Figure 1: KVP example SQL statement
Figure 2: hstore create table SQL example
Figure 3: hstore insert SQL example
Figure 5: Example of a test data file
Figure 6: Number of test cycles
Figure 7: Test / benchmark application incl. test data generator
Figure 8: Database setup script
Figure 9: Test data generation script
Figure 10: Benchmark script example
Figure 11: Overview KVP vs. hstore benchmark
Figure 12: Benchmark KVP vs. hstore from 10 to 2.5K
Figure 13: Overview KVP hstore with combined index
Figure 14: Benchmark KVP hstore from 10 to 2.5K with combined index
Figure 15: Overview of difference between KVP single and combined index
Figure 16: 10 to 2.5K: Difference between KVP single and combined index
Figure 17: Overview of KVP with index on key and combined index against hstore
Figure 18: 10 to 2.5K: KVP with index on key and combined index against hstore
Figure 19: Index size overview
Figure 20: Index size for 10 to records

Listings
Listing 1: Hstore data type definition
Listing 2: Registering a PostgreSQL operator
Listing 3: Defining a PostgreSQL function
Listing 4: KVP Benchmark Table
Listing 5: Hstore Benchmark Table
Listing 6: KVP index
Listing 7: Hstore index
Listing 8: KVP select example
Listing 9: Hstore select example
Listing 10: Transformed test data to KVP SQL statement
Listing 11: Transformed test data to hstore SQL statement
Listing 12: Explain Analyze statement for hstore without index
Listing 13: Output of Explain Analyze statement for hstore without index
Listing 14: Explain Analyze statement for hstore with index
Listing 15: Output of Explain Analyze statement for hstore with index
Listing 16: Explain Analyze statement for KVP
Listing 17: Output of Explain Analyze statement for KVP
Listing 18: Programming example of explain sequences
Listing 19: Output of Explain Analyze statement for KVP with index on attribute key
Listing 20: Explain Analyze statement for KVP
Listing 21: Output of Explain Analyze statement for KVP with combined index
Listing 22: Querying all persons who live in a region with zip code

List Of Abbreviations

Abbreviation   Description
bash           Bourne-again shell
CPU            Central Processing Unit
DB2            Commercial relational database management system developed by IBM
GiST           Generalized Search Tree
KVP            Key Value Pair
ms             milliseconds
MSSQL          Commercial relational database management system produced by Microsoft
OpenFTS        Open Source Full Text Search engine
Oracle         Commercial object-relational database management system produced by Oracle Corporation
PgSQL          PostgreSQL, open source object-relational database
PostGIS        Adds support for geographic objects to the PostgreSQL object-relational database
SQL            Structured Query Language
URL            Uniform Resource Locator

1 Introduction

This chapter describes the scope of the project, its boundaries, and its restrictions. The overall goal is to benchmark the performance of key-value pairs stored in ordinary PostgreSQL tables against the PostgreSQL hstore data type.

1.1 Project description

As part of this term paper, a project evolved to benchmark PostgreSQL key-value pairs, hereafter referred to as KVP, against PostgreSQL in combination with the hstore data type, hereafter referred to as hstore (the name is probably an abbreviation of "hash storage structure", à la Perl hash). Hstore has been part of the PostgreSQL distribution since version 8.2 as an additional module that stores semi-structured data and supports GiST index access.

The PostgreSQL core distribution has no notion of key-value pair information within a single attribute. It is therefore not possible to store an associative array such as {"surname": "John", "name": "Smith"} in one attribute and then query John's name. This functionality was introduced by Oleg Bartunov and Teodor Sigaev, and enhanced by Andrew Gierth, under the name hstore. Hstore is an enhancement for PostgreSQL that provides a new data type and a set of functions to store and query KVP information.

Dictionaries, or associative arrays, are the general term for the abstract data type (ADT) behind key-value pairs. They manage pairs, also known as items, consisting of keys and their corresponding values. Most modern scripting languages support dictionaries or associative arrays as a primary container type. KVP is also called the entity-attribute-value model (EAV) or object-attribute-value model.

Both techniques, KVP and hstore, store arbitrary data as objects in the database. However, the design of the database tables, and therefore the schema, differs considerably in how the data is stored, referenced, and queried. The goal of this document is to find out which technique performs faster.
1.2 Restrictions on the scope of the project

The purpose of this benchmark is not to test the stability or scalability of PostgreSQL. It is rather a test of KVP and hstore on a given PostgreSQL environment. This implies that performance tuning of PostgreSQL is not in the scope of this document. Nevertheless, this document and all the applications can serve as the basis for such a test. Benchmarking the insertion of the test data is also out of scope.

Furthermore, the semantics of the test data are not relevant for any outcome of the benchmark. For this reason the test data generation application uses the lorem ipsum dolor¹ dummy text and a name generator that combines syllables.

¹ lorem ipsum dolor is a nonsense paragraph which aims at demonstrating a font to a reader without distracting him by the gibberish of the text (Walsh, 1996).

2 Overview

Today PostgreSQL has a huge community, not only because it is free of charge but also because it has many extensions, such as the geospatial extension PostGIS or the hstore module mentioned before. This chapter first describes what PostgreSQL is and then explains the difference between key-value pairs (KVP) as a table structure and KVP using the hstore data type. Subchapters 2.2 Key Value Pair and 2.3 Hstore show that KVP stores the key and one or more related values in different table columns, whereas hstore introduces a new abstract data type that stores an associative array of unique keys and related values within a single table column. KVP is suitable for easy data storage and data capture, for rows with many attributes that are rarely examined, and for semi-structured data.

2.1 PostgreSQL

PostgreSQL is an open source relational database, and even an object-relational database according to (PostgreSQL Global Development Group). Over its life of more than 15 years, PostgreSQL has proven itself in many application fields. It implements a set of capabilities that are well known from proprietary vendors like Oracle, IBM DB2, or Microsoft, and it provides features such as scalability, maintainability, and asynchronous replication. This puts PostgreSQL in the position of a real competitor to proprietary software vendors in companies of different sizes; in its current state, PostgreSQL is an enterprise-class database (PostgreSQL Global Development Group).

Additionally, PostgreSQL's SQL implementation conforms to the ANSI-SQL:2008 standard. Besides the standard SQL statements such as select and insert, it implements primary and foreign keys with restrictions, check constraints, unique constraints, cascading, and many more. Its highly customizable environment lets users and developers extend PostgreSQL easily.
Examples are Generalized Search Tree (GiST), the Open Source Full Text Search engine (OpenFTS), and PostGIS. Extensions can be written in one of a dozen different programming languages, including Java, Perl, Python, Ruby, Tcl, C/C++, and PostgreSQL's own PL/pgSQL. Developers and users can draw upon hundreds of built-in functions of the standard library, from basic math and string operations to cryptography and Oracle compatibility (PostgreSQL Global Development Group).

2.2 Key Value Pair

In the context of this paper, key value pair (KVP) does not have the meaning usually associated with it in the database context. The usual interpretation is an associative array held in an abstract data type, where the value is stored along with a unique identifier, the key. In this paper that meaning belongs to the term hstore, described in the next subchapter. Here, KVP means the standard way of creating, maintaining, and using tables in PostgreSQL with exactly two attributes, key and value. The KVP table can thus store different unique key value pairs for a specific piece of information. This implies that each key value pair needs a unique identifier that refers to this information. Consequently, the KVP table needs to be enhanced with an identifier, and an additional table is needed to store the additional information. From this it follows that the base schema of the KVP structure looks like the following.

Table: bench_kvp_info
    id          : Integer
    attribute_1 : Text
    attribute_n : Text
Table 1: KVP additional information table

Table: bench_kvp
    id_fk : Integer
    key   : Text
    value : Text
Table 2: KVP table

The bench_kvp_info table holds a unique identifier for a specific piece of information together with its additional data. For example, it could hold the information of a restaurant such as street, postal code, phone number, and so on. The bench_kvp table additionally stores information that is not foreseeable, such as details that describe the restaurant more specifically: the type of cuisine, a URL to its homepage, and so on. Key value pairs are information that specify an entry more exactly but are not necessarily mandatory for all data in the information table. This structure makes it easy to add new non-mandatory information without touching the table schema. In PostgreSQL it can be set up as follows:

    CREATE TABLE bench_kvp_info (
        id integer PRIMARY KEY,
        attribute_1 text,
        attribute_n text
    );
    CREATE TABLE bench_kvp (
        id integer REFERENCES bench_kvp_info(id),
        key text NOT NULL,
        value text
    );
Figure 1: KVP example SQL statement

2.3 Hstore

In this paper, hstore means an associative array stored in an abstract data type, composed of unique identifiers and their values, based on the hstore PostgreSQL enhancement developed by Oleg Bartunov and Teodor Sigaev as an additional module. The hstore enhancement introduces an abstract data type called hstore, which can store an associative array. In addition it provides a set of functions and operators as well as indexing possibilities on the abstract data type. An index can be created on the GiST, GIN, BTree, or Hash engine (Bartunov, Sigaev, & Gierth).
A pure hstore table can be created in the following way:

    CREATE TABLE bench_hstore ( kvp_hstore hstore );
    CREATE INDEX hidx ON bench_hstore USING GIST(kvp_hstore);
Figure 2: hstore create table SQL example
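The difference between the two layouts from subchapters 2.2 and 2.3 can be illustrated outside the database. The following Python sketch is not part of the benchmark tooling; the helper names `to_kvp_rows` and `to_hstore_literal` are hypothetical. It turns one record into several KVP rows on the one hand and into a single hstore text literal on the other, assuming that empty attributes are simply left out.

```python
def to_kvp_rows(record_id, attributes):
    """KVP layout: one (id, key, value) row per non-empty attribute."""
    return [(record_id, key, str(value))
            for key, value in attributes.items()
            if value not in (None, "")]


def to_hstore_literal(attributes):
    """hstore layout: all non-empty attributes packed into one text literal."""
    def quote(text):
        # Double-quote keys and values; escape backslashes and double quotes.
        return '"' + str(text).replace("\\", "\\\\").replace('"', '\\"') + '"'

    return ", ".join(f"{quote(key)}=>{quote(value)}"
                     for key, value in attributes.items()
                     if value not in (None, ""))


# Illustrative record; the empty zip is skipped in both layouts.
person = {"surname": "McNeal", "forename": "Bob", "zip": ""}
kvp_rows = to_kvp_rows(1, person)
hstore_literal = to_hstore_literal(person)
print(kvp_rows)
print(hstore_literal)
```

The KVP form needs as many rows as the record has attributes, while the hstore form always yields exactly one value per record.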

Inserting a tuple is as easy as creating an attribute of type hstore:

    INSERT INTO bench_hstore(kvp_hstore) VALUES( hstore('id=>1, surname=>McNeal, forename=>Bob') );
    INSERT INTO bench_hstore(kvp_hstore) VALUES( hstore('id=>2, surname=>Gates') );
Figure 3: hstore insert SQL example

2.3.1 Functions

As the example above shows, the length of the array may vary from tuple to tuple. Each entry of the associative array consists of a key and a value, and the entries are separated by commas. The general form is

    hstore('<key 1> => <value 1>, ..., <key n> => <value n>')

so that hstore('id=>2, surname=>Gates') has two different unique keys, id and surname, each with one value: 2 for id and Gates for surname. Unique means that a key can be defined only once per tuple. For example, the key surname can appear only once in the same tuple. The following hstore is therefore not valid as written; PostgreSQL would keep only one of the two surname entries:

    hstore('id=>1, surname=>Gates, surname=>McNeal')

As mentioned in the introduction of this chapter, hstore provides, in addition to the hstore data type, many PostgreSQL functions and operators for querying, manipulating, and comparing values. Only some of them are explained here. The most important one queries the value of a specific key. With the KVP methodology this looks as follows:

    SELECT key, value FROM bench_kvp WHERE id = 2 AND key IN ('surname', 'forename');

On an hstore data type it works as follows:

    SELECT kvp_hstore->'surname' AS surname, kvp_hstore->'forename' AS forename FROM bench_hstore WHERE kvp_hstore->'id' = '2';

Although the id in this example is an integer, it must be queried as a string (see the WHERE clause), because hstore stores all keys and values as text. Because tuples can have hstores of different length and therefore various keys, it is sometimes important to find out whether a key exists in the hstore attribute:

    SELECT kvp_hstore ? 'forename' AS available FROM bench_hstore;

For each tuple, this statement returns t for true or f for false, depending on whether the key forename is available.
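The semantics of the two operators discussed above can be mimicked with plain Python dictionaries. This is only a model for illustration, not how hstore is implemented; the function names `fetchval` and `exists` are chosen here to mirror the operators `->` and `?`, and every key and value is a string, as in hstore.

```python
def fetchval(h, key):
    """Model of `h -> key`: the value for key, or None (SQL NULL) if absent."""
    return h.get(key)


def exists(h, key):
    """Model of `h ? key`: True if the key is present."""
    return key in h


# Two tuples of different length, corresponding to the insert example.
rows = [
    {"id": "1", "surname": "McNeal", "forename": "Bob"},
    {"id": "2", "surname": "Gates"},
]

# Counterpart of: SELECT ... WHERE kvp_hstore->'id' = '2'
# The id is compared as the string "2", not as the integer 2.
matches = [(fetchval(r, "surname"), fetchval(r, "forename"))
           for r in rows if fetchval(r, "id") == "2"]

# Counterpart of: SELECT kvp_hstore ? 'forename' AS available
available = [exists(r, "forename") for r in rows]

print(matches)
print(available)
```

The second tuple has no forename, so the lookup yields None for it, which corresponds to NULL in the SQL result.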

Or maybe you want to know all possible keys in an hstore:

    SELECT skeys(kvp_hstore) AS keys FROM bench_hstore GROUP BY keys;

To get each key only once in the result list, a GROUP BY clause on keys needs to be added to the statement.

2.3.2 Working principle

Hstore is implemented in C as a PostgreSQL add-on and provides a SQL script to install the data type and all the PostgreSQL functions. Hstore builds a buffer over all the keys of an hstore value, provided they are in alphabetical order. If they are not, dedicated functions first sort the array into alphabetical order. The hstore data type is defined as follows:

    CREATE TYPE hstore (
        INTERNALLENGTH = -1,
        INPUT = hstore_in,
        OUTPUT = hstore_out,
        RECEIVE = hstore_recv,
        SEND = hstore_send,
        STORAGE = extended
    );
Listing 1: Hstore data type definition

The important parameter is INPUT, which is linked to a C function. The hstore_in function parses the hstore string into a C structure that holds the key, the value, and the lengths of key and value, as well as their position in the array. The position is needed because the array is not really stored as an array in the database but rather as a string (see Table 10: KVP table abstract). To answer a query for a specific key value combination, hstore only needs to loop over this buffer and find the right key to obtain the position of the key and value in the array. As an example we take the operator ->. Each operator, including the hstore operators, needs to be registered in PostgreSQL. This is done by executing the following statement:

    CREATE OPERATOR -> (
        LEFTARG = hstore,
        RIGHTARG = text,
        PROCEDURE = fetchval
    );
Listing 2: Registering a PostgreSQL operator

The important part is the PROCEDURE parameter, which links the operator to a PostgreSQL function.
This means the fetchval function needs to be defined as follows:

    1 CREATE OR REPLACE FUNCTION fetchval(hstore,text)
    2 RETURNS text
    3 AS 'MODULE_PATHNAME','hstore_fetchval'
    4 LANGUAGE C STRICT IMMUTABLE;
Listing 3: Defining a PostgreSQL function

As we see, the function returns a text. The PostgreSQL function fetchval is linked to a C implementation called hstore_fetchval, defined on line 3 in Listing 3.
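The idea of a string buffer with positions and lengths, as described in the working principle above, can be sketched in Python. This is a simplified model under stated assumptions, not the actual memory layout of the hstore C implementation: the names `Entry`, `pack`, and `fetchval` are hypothetical, keys and values are concatenated without separators, and the lookup is a plain loop over the entries.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Position and length of one key and its value inside the buffer."""
    key_pos: int
    key_len: int
    val_pos: int
    val_len: int


def pack(pairs):
    """Pack pairs into one string buffer plus an entry table sorted by key."""
    buffer = ""
    entries = []
    for key, value in sorted(pairs.items()):  # alphabetical key order
        key_pos = len(buffer)
        buffer += key
        val_pos = len(buffer)
        buffer += value
        entries.append(Entry(key_pos, len(key), val_pos, len(value)))
    return buffer, entries


def fetchval(buffer, entries, wanted):
    """Loop over the entries and slice the value of the wanted key out of the buffer."""
    for e in entries:
        if buffer[e.key_pos:e.key_pos + e.key_len] == wanted:
            return buffer[e.val_pos:e.val_pos + e.val_len]
    return None  # key not present, comparable to SQL NULL


buffer, entries = pack({"surname": "Gates", "id": "2"})
print(buffer)
print(fetchval(buffer, entries, "surname"))
```

Because the values live in one flat string, a lookup never has to parse the text again; it only compares keys and then slices by position and length.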

For more information please consult the official PostgreSQL hstore documentation².

2.4 Benchmark Tools

Currently two programs should be mentioned for benchmarking PostgreSQL. Both pgbench and the HSR Texas Geo Database Benchmark run SQL statements in sequential mode against the database under test. For the test proposed in this paper, a separate benchmark tool has been written to test the desired hypotheses.

2.4.1 Pgbench

Pgbench is shipped in the PostgreSQL distribution package and runs tests on a PostgreSQL instance in sequential mode. Sequential mode means that the same SQL statements are run over and over, possibly in multiple concurrent database sessions, which exercises the multi-processing architecture. At the end of the benchmark it calculates the average number of transactions per second. Pgbench provides its own scripting language to customize the test scripts with your own data sets and test SQL statements. In addition it includes some industry-standard test cases, which let you compare PostgreSQL with other database products (Smith, 2010, p. 189).

Custom scripts in pgbench allow you to create your own test scripts. They can handle statements with variables, which are known in Java as prepared statements. It is possible to define a SELECT statement like this:

    SELECT value FROM bench_kvp WHERE forename = :sforename;

Here :sforename is a variable that is replaced at runtime with the variable's value. All statements need to be wrapped between the statements BEGIN; and END;, which mark the beginning and the end of the benchmarked transaction. Variables like :sforename can be filled before the test begins, that is, before the BEGIN; statement.
Each variable needs to be assigned with the \set command:

    \set sforename Greg

Pgbench provides additional commands: \setrandom sets a variable to a random integer number, \setshell reads the result of a shell command into a variable, \shell runs a shell command but ignores the result, and \sleep causes the script execution to sleep for a specific duration.

2.4.2 HSR Texas Geo Database Benchmark

The HSR Texas Geo Database Benchmark is another benchmark program, written in Python, that tests the performance of spatial database systems. The benchmark is based on a predefined set of queries consisting of simple spatial statements. The queries are run on different data sizes to monitor the behavior. For all these queries the program provides test data that comes from Texas, USA. A test script looks like this:

    SELECT count(*) FROM {dataset polygons} pg WHERE ST_Intersects(@bbox, pg.geo);

This SELECT statement counts all polygons that intersect with a given bounding box. In general, the benchmark program is based on a cube. Different queries can be run on different systems by using different datasets. Each dataset is installed on each system, and all queries are run on all systems for each of the datasets. This guarantees that the different systems can be compared, because all of them use the same data and statements.

Figure 4: HSR Texas Geo Database Benchmark Cube. Source: (Krummenacher, 2009)

² See the PostgreSQL hstore documentation. The explanations in this paper are based on the PostgreSQL hstore documentation for version 9.0.
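The cube described above is the Cartesian product of systems, datasets, and queries. The following Python sketch only enumerates the resulting runs; the system, dataset, and query names are placeholders chosen for illustration and do not come from the HSR Texas Geo Database Benchmark itself.

```python
from itertools import product

# Placeholder names, assumed for this example only.
systems = ["system_a", "system_b"]
datasets = ["small", "medium", "large"]
queries = ["q1", "q2"]


def benchmark_cube(systems, datasets, queries):
    """Return every (system, dataset, query) combination of the cube."""
    return list(product(systems, datasets, queries))


runs = benchmark_cube(systems, datasets, queries)

# Every system sees exactly the same (dataset, query) combinations,
# which is what makes the systems comparable.
per_system = {
    system: {(d, q) for s, d, q in runs if s == system}
    for system in systems
}

print(len(runs))
```

With two systems, three datasets, and two queries the cube contains 2 x 3 x 2 = 12 runs.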

3 Benchmark proposal

Before we can have a look at the benchmarking utility and the results, we first need to consider what the ingredients of a benchmark are. The term benchmark can be divided into three different processes, which are described in this chapter.

3.1 Terms

The term benchmark is closely related to the term test. The dictionaries define benchmark as follows:

"a level of quality which can be used as a standard when comparing other things" (Cambridge University Press).

"That by which the existence, quality, or genuineness of anything is or may be determined; [...]" (Oxford University Press).

The definitions suggest that two different things of approximately the same topic need to be contrasted with each other. That means the things need to be converted into a form that makes them comparable. At this point the term test comes into play, which does exactly this transformation:

"an act of using something to find out whether it is working correctly or how effective it is" (Cambridge University Press).

"To evaluate or check (something) by comparison with an established standard; to measure against a comparable or equivalent point of reference, esp. in order to assess performance or set performance standards." (Oxford University Press).

A test tries to find a form that makes the things comparable. The result of the test is a standardized input for the benchmark. To bring the things into this comparable form, an upstream process needs to provide meaningful data that can be transformed by the test process and analyzed with accurate techniques by the benchmarking process, in order to reach conclusions and define further activities.

3.2 Generate / Preprocessing phase

The generate / preprocessing phase is the first step in a benchmark. This process tries to find accurate data, which then needs to be prepared for the testing.
The steps in this phase must not be underestimated, because the impact of a statistically insignificant result is very high. The better the data, and therefore the ground for the test itself, the higher the probability that the analysis of the test result gives a significant outcome.

Some questions need to be considered here: What exactly do I want to test? What should the data be, and can it be transformed in the test process? Does the data fit into the given environment? All these questions are fundamental and often seem easy to answer at first. However, finding the right data that fits into the environment and the test process is not that easy: the test data may have a wrong encoding and cannot be loaded, the data may not cover the whole test design, no significant result may be achievable, and so on.

3.3 Execution phase

The test execution phase defines the design of the chosen test technique. In general it can be divided into the following fields:

- Load Testing: measures and establishes benchmarks for the system under test by pushing transactions to the system. The load can be incremental or a set amount that is proportional to the values of the system.
- Performance Testing: is run repeatedly until acceptable performance levels are achieved through database tuning activities.
- Stress Testing: aims to break the system under test in order to gauge the system limits.
- Volume Testing: is similar to load testing but involves placing a large amount of requesting objects on the system.

The data for the system under test comes from the preprocessing phase and needs to be transformed into a form that can be loaded into the data source (e.g. inserting data into a database). The test design needs to define whether the insertion of the data is also tested or only the queries.

3.4 Benchmark / Analysis phase

After all data has been loaded and run on the system under test, the data produced by the test execution phase needs to be analyzed from different aspects. Here it is important to know in advance what the results could look like, in order to choose the right algorithms and mathematical formulas for the analysis.
Tools can heavily support this process by automatically trying different mathematical functions, such as correlations, regressions, O-notation, and probabilities, to calculate the course of a graph. At the end the user needs to interpret the result of the benchmark phase and draw the right conclusions.
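As an illustration of the regression and correlation step mentioned above, the following Python sketch fits a straight line through (dataset size, response time) pairs with ordinary least squares and computes the Pearson correlation coefficient. The measurement values are invented for the example and are not results of this benchmark; the function names are hypothetical.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Invented sample: record counts and response times in ms.
sizes = [10, 100, 500, 1000]
times_ms = [1.02, 1.2, 2.0, 3.0]

slope, intercept = linear_fit(sizes, times_ms)
r = pearson_r(sizes, times_ms)
print(slope, intercept, r)
```

A correlation coefficient close to 1 suggests that a linear model describes the course of the graph well; for other shapes a different function would have to be tried.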

3.5 Performance Benchmark Design

To benchmark KVP and hstore in PostgreSQL, the decision has been taken in favor of a performance test. In this benchmark it is not important how stable and scalable PostgreSQL is; the interesting question is how KVP and hstore perform on a given preconfigured PostgreSQL environment.

3.5.1 Table Schema

As described in chapters 2.2 Key Value Pair and 2.3 Hstore, the table schemas need to be defined in such a way that the comparison between KVP and hstore is fair. The goal of the schema definitions is to represent, for both techniques, an associative array for the data that needs to be stored. It is not important to have an equal representation of the key value pairs in the database; what matters is the philosophy of which type of information, at which granularity, needs to be stored and queried. That means the data is not foreseeable, in the sense that additional information could be provided for a specific data record. In this benchmark we use the following table schemas to represent the associative array in a database table. For KVP:

    CREATE TABLE bench_kvp_id (bench_id BIGINT PRIMARY KEY);
    CREATE TABLE bench_kvp (
        bench_id BIGINT REFERENCES bench_kvp_id(bench_id),
        key TEXT NOT NULL,
        value TEXT
    );
Listing 4: KVP Benchmark Table

and for hstore:

    CREATE TABLE bench_hstore (
        bench_id BIGINT PRIMARY KEY,
        bench_hstore HSTORE NOT NULL
    );
Listing 5: Hstore Benchmark Table

The KVP and hstore table schemas follow the same strategy of storing an associative array. Despite the common strategy, the way they store the data is quite different. The KVP table creates a new tuple for each key value pair and references it to a unique identifier, which holds the additional information for the key value pairs. In contrast to KVP, hstore needs only one tuple for all key value pairs.
Hstore saves the key value pairs in an associative string array, which looks like the arrays developers are used to from their programming languages. For both, a test run is executed once with an index and once without. For the KVP table the standard PostgreSQL index type is used, which is a B-tree (the planner may read it through a bitmap index scan). In addition to the table creation statement in Listing 4, we need to create an index for the KVP table:

    CREATE INDEX kvpidx1 ON bench_kvp (key);
    CREATE INDEX kvpidx2 ON bench_kvp (key, value);
Listing 6: KVP index

The index for KVP shall be tested in two different ways: firstly with a single index on the key attribute and secondly with a combined index on the attributes key and value. For hstore an index can be created as follows:

    CREATE INDEX hidx ON bench_hstore USING GIST (bench_hstore);
Listing 7: Hstore index

3.5.2 Statements

To query a tuple based on a key value pair, KVP and hstore each have their own SELECT statement. Because KVP needs a new tuple for each key value pair, we first have to find the unique identifier that belongs to the key value pair and can then select the information we need. This example selects all the information of a person with surname McNeal:

    SELECT * FROM bench_kvp WHERE bench_id = (
        SELECT bench_id FROM bench_kvp WHERE key = 'surname' AND value = 'McNeal'
    );
Listing 8: KVP select example

With hstore we first need to convert the attribute that stores the hstore string into an hstore object, and then we can query for a specific key value pair. The following statement does this:

    SELECT * FROM bench_hstore WHERE hstore(bench_hstore)->'surname'='McNeal';
Listing 9: Hstore select example

3.5.3 Datasets

For the load process a dedicated test data generator (see also the next chapter 3.6 Test Application) has been written in Python. It generates different sets of data based on the lorem ipsum dolor dummy text and random numbers. Each dataset includes at most 5 attributes, and the number of attributes varies from record to record, whereas the first two columns, id and forename, are mandatory in each record to guarantee a valid key value pair. This variation guarantees that the hstore values do not all have the same length, so that a more significant test can be achieved.
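A generator with the properties described above (mandatory id and forename, optional remaining attributes) could look roughly like the following Python sketch. It is not the generator used in this paper: the syllable list, the 50 % chance of leaving an optional attribute empty, the zip range, and all function names are assumptions made for the example.

```python
import csv
import io
import random

FIELDS = ["id", "forename", "surname", "zip", "comment"]
SYLLABLES = ["ka", "sar", "in", "wa", "cu", "cyp", "le", "had"]  # assumed
LOREM = "lorem ipsum dolor sit amet consetetur sadipscing elitr".split()


def make_name(rng):
    """Assemble a name from two or three random syllables."""
    return "".join(rng.choice(SYLLABLES) for _ in range(rng.randint(2, 3)))


def make_record(record_id, rng, comment_words=5):
    """Create one record; optional attributes are empty about half the time."""
    def maybe(value):
        return value if rng.random() < 0.5 else ""

    return {
        "id": record_id,                       # mandatory sequence number
        "forename": make_name(rng),            # mandatory
        "surname": maybe(make_name(rng)),      # optional
        "zip": maybe(rng.randint(1000, 9000)), # optional
        "comment": maybe(" ".join(rng.choice(LOREM)
                                  for _ in range(comment_words))),
    }


def generate_csv(amount, seed=0):
    """Return `amount` records as CSV text with a header line."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record_id in range(1, amount + 1):
        writer.writerow(make_record(record_id, rng))
    return out.getvalue()


print(generate_csv(3))
```

Seeding the random number generator keeps a dataset reproducible, which matters when the same data is loaded into both the KVP and the hstore tables.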
A sample record has the following attributes:

Column                   Description
id : integer, sequence   Mandatory. A unique sequence identifier.
forename : Text          Mandatory. A fancy name.
surname : Text           Optional: A fancy name. Can be empty to have a variable KVP length.
zip : Integer            Optional: A number between 1000 and 9000. Can be empty to have a variable KVP length.
comment : Text           Optional: A dummy text. Can be empty to have a variable KVP length.
Table 3: Columns of a test dataset record

An abstract of a test file looks as follows:

    id,forename,surname,zip,comment
    1,cucyp,,6593,lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet
    2,kasarzyc,ecnalehad,8463,
    3,inwa,,,
Figure 5: Example of a test data file

The test data records need to be transformed into valid KVP SQL statements like this:

    INSERT INTO bench_kvp_info(id) VALUES(1);
    INSERT INTO bench_kvp(id, key, value) VALUES(1, 'id', '1');
    INSERT INTO bench_kvp(id, key, value) VALUES(1, 'forename', 'cucyp');
    INSERT INTO bench_kvp(id, key, value) VALUES(1, 'zip', '6593');
    INSERT INTO bench_kvp(id, key, value) VALUES(1, 'comment', 'lorem ipsum dolor sit amet consetetur sadipscing elitr');
    INSERT INTO bench_kvp_info(id) VALUES(2);
    INSERT INTO bench_kvp(id, key, value) VALUES(2, 'id', '2');
    INSERT INTO bench_kvp(id, key, value) VALUES(2, 'forename', 'kasarzyc');
    INSERT INTO bench_kvp(id, key, value) VALUES(2, 'surname', 'ecnalehad');
    INSERT INTO bench_kvp(id, key, value) VALUES(2, 'zip', '8463');
    INSERT INTO bench_kvp_info(id) VALUES(3);
    INSERT INTO bench_kvp(id, key, value) VALUES(3, 'id', '3');
    INSERT INTO bench_kvp(id, key, value) VALUES(3, 'forename', 'inwa');
Listing 10: Transformed test data to KVP SQL statement

and into hstore SQL statements like this:

    INSERT INTO bench_hstore(bench_hstore) VALUES(hstore('id=>1, forename=>cucyp, zip=>6593, comment=>"lorem ipsum dolor"'));
    INSERT INTO bench_hstore(bench_hstore) VALUES(hstore('id=>2, forename=>kasarzyc, surname=>ecnalehad, zip=>8463'));
    INSERT INTO bench_hstore(bench_hstore) VALUES(hstore('id=>3, forename=>inwa'));
Listing 11: Transformed test data to hstore SQL statement

The test tool automatically adds an additional unique id of type String or Text, respectively. The following dataset lengths are used to test KVP and hstore:

Datasets: 10 records, 100 records, 500 records, and nine further dataset lengths (12 dataset lengths in total)
Table 4: Number of dataset length

Additionally, for KVP and hstore one test cycle, meaning a test phase followed by a benchmark phase, takes place once with a database index and once without for each dataset length. In total 144 test cycles were executed:

    12 [# of lengths] × 2 [# of types] × 2 [# of indices] × 3 [warm start] = 144 [cycles]
Figure 6: Number of test cycles

# of lengths is the number of different datasets that need to be run. # of types is the number of different data sources that need to be tested, in our case KVP and hstore. # of indices means that the test is run once with indexed data and once without. This gives 48 distinct configurations and, with three runs each for the warm start, 144 test cycles.

3.6 Test Application

For benchmarking KVP and hstore a dedicated application has been written in Python. It supports all three phases: Generate / Preprocessing, Execution, and Benchmark / Analysis. It has been written in a way that it can be extended to other data sources like MySQL, MSSQL, Oracle, etc. by inheriting from the general adapter object, which ensures the database connection and runs all the benchmark tasks (compare with Figure 7).
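The inheritance idea described above can be sketched as follows. This is a hypothetical outline and not the code of the test application: the class and method names are invented, the database call is injected as a plain callable so that the sketch runs without a server, and the %s placeholders assume a PostgreSQL driver that uses this parameter style.

```python
import time
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical general adapter: subclasses only supply their SQL."""

    @abstractmethod
    def select_statement(self):
        """Return a parameterised SELECT that takes (key, value)."""

    def run_task(self, execute, key, value):
        """Run one lookup through `execute` and return the elapsed seconds."""
        start = time.perf_counter()
        execute(self.select_statement(), (key, value))
        return time.perf_counter() - start


class KvpAdapter(Adapter):
    def select_statement(self):
        # Modelled on Listing 8, with IN so that several matches are allowed.
        return ("SELECT * FROM bench_kvp WHERE bench_id IN "
                "(SELECT bench_id FROM bench_kvp "
                "WHERE key = %s AND value = %s)")


class HstoreAdapter(Adapter):
    def select_statement(self):
        # Modelled on Listing 9.
        return "SELECT * FROM bench_hstore WHERE bench_hstore -> %s = %s"


# Stand-in for a real cursor.execute: it only records what it was given.
calls = []
elapsed = KvpAdapter().run_task(
    lambda sql, params: calls.append((sql, params)), "surname", "McNeal")
print(calls[0][1], elapsed >= 0)
```

Supporting a further data source would then mean adding one more subclass that returns the statement for that system, while timing and task handling stay in the base class.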

[Figure 7: Test / benchmark application incl. test data generator. The diagram shows the data generator writing Data.csv; the benchmark loading this file and inserting the records through the adapters (hstore and KVP, which inherit from the general adapter); a task queue from which n standalone adapter processes get and execute tasks; a response queue to which they write their responses; and the benchmark writing logs and graphs.]

As you can see in Figure 7, the test application consists of a bundle of two modules. The first module is responsible for the preprocessing phase or, to be more precise, for test data generation. Based on input parameters it creates a dataset in a format that is readable for the execution phase. The following input parameters are mandatory:

Parameter | Description
-a or --amount | Integer: Amount of records to be created.
-t or --desc-type | String: Description type; can be words, lines, or chars.
-l or --desc-length | Integer: Defines how many words, lines, or characters need to be added to the attribute comment in a test data record.
Table 5: Input parameters for test data generator

As described in the section on datasets, the test data generator creates a sequence of unique identifiers (ID), a forename and surname based on the name generator by Chris Gonnerman, a random zip code between 1000 and 9000, and a comment based on the lorem ipsum generator of Per Erik Strandberg. The name generator creates names by randomly assembling random syllables. No name lists or other fancy features are included (Gonnerman, 2003). The lorem ipsum generator creates dummy texts of different lengths, including non-ASCII characters if needed (Strandberg, 2007).

The second module contains the two other phases, execution and benchmark / analysis, and needs to be run separately. The input parameters for this module are the following:

Parameter | Description
-t or --type | String: Type of database test. Currently pgsql and pgsqlhstore are supported.
-x or --processes | Integer: The number of parallel processes that should be allocated. If this parameter is not set, the software tries to find out the maximum number of parallel processes of the CPU architecture.
-s or --server | String: Server or hostname where the database runs, e.g. localhost.
-p or --port | Integer: Port of the database server.
-d or --database | String: Database name.
-u or --user | String: A user who has the rights to create tables, do insertions and run queries.
-w or --password | String: User's password.
-a or --data | String: File that includes all test records.
-i or --index | Boolean: Defines whether an index on the tables should be created or not.
-n or --no-hot-start | Boolean: Disables the hot start. If it is not set, the test runs twice before the transaction time is measured in the third round.
-l or --log | Boolean: Whether a log file should be created or not.
-g or --graph | Boolean: Whether a graph should be created or not. If it is set, then the --log parameter also needs to be set.
Table 6: Input parameters for benchmark application

At the beginning, this module tries to find out how many processes the hardware can run in parallel if the parameter -x or --processes is not set. Each process is a separate adapter object, which waits for tasks from the task queue. In the next step it takes the input data, generates a unique identifier for each record and calls the appropriate database adapter for insertion. Then it loops over all unique identifiers and creates tasks, which it puts into the task queue. Each adapter autonomously takes a task from the queue and executes it. The time the adapter needs to execute the task flows back over the response queue to the benchmark object in the module. The benchmark object reads all responses, computes the average and tries to find a suitable regression curve. Additionally, all this information is written to a file and a suitable graphical representation is created for manual analysis.
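The interplay of task queue, adapters and response queue described above can be sketched as follows. Note the assumptions: the application uses standalone processes, whereas this sketch swaps in threads from the standard library to stay short and self-contained; the names worker and run_benchmark and the lookup callable are hypothetical.

```python
import queue
import threading
import time
from statistics import mean


def worker(tasks, responses, lookup):
    """Take identifiers from the task queue, time the lookup, report the duration."""
    while True:
        uid = tasks.get()
        if uid is None:  # sentinel: no more tasks for this worker
            break
        start = time.perf_counter()
        lookup(uid)
        responses.put((uid, time.perf_counter() - start))


def run_benchmark(uids, lookup, n_workers=4):
    """Run one timed lookup per identifier; return all timings and their average."""
    tasks, responses = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, responses, lookup))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for uid in uids:
        tasks.put(uid)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    timings = [responses.get() for _ in range(len(uids))]
    return timings, mean(duration for _, duration in timings)
```

For example, run_benchmark(list_of_ids, adapter_lookup, n_workers=8) would return one (id, seconds) pair per task plus the average time per lookup, which corresponds to the value the benchmark object aggregates.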

4 Benchmark May 2011

The following chapter specifies the hardware and software used for the system under test and describes the results as well as the findings.

4.1 Technical specification

The server on which KVP and hstore are tested has the following hardware specification:

Type | Comment
Processor: CPU | Intel(R) Xeon(R) CPU, 2.27 GHz
Processor: Instruction set | 64-bit
Processor: # of cores | 4
Processor: # of threads | 8
Processor: # of CPUs | 2
Memory: Total | about 24 GB
Memory: Speed | 1066 MHz
Idle mode | Only Ubuntu and PostgreSQL are running on the same hard disk; about 23 GB RAM is free.
Table 7: Hardware specification of system under test

The software specification is as follows:

Software | Comment
OS | Ubuntu LTS server
Kernel | …
GCC | …
Database | PostgreSQL and PostGIS
Python | Python with the following add-ons: Numpy, Scipy, Matplotlib
Table 8: Software specification of system under test

4.2 Execution

Before the test can be started, a database and the test data need to be created. The following subchapters describe which steps are needed to be ready for executing a benchmark. For all steps (preparing the database, generating test data, and executing the benchmark) an example bash script is provided.

Preparing Database

To prepare the database you first need to log in as a user that has the privilege to create a database, such as the user postgres:

sudo su postgres

As the postgres user you can run the first script, called step_1.sh, which creates a new database user benchmark and a database called benchmark, and runs the hstore script that installs the hstore data type and a bunch of PostgreSQL functions.

./step_1.sh

The content of the step_1.sh script is as follows:

#!/bin/bash
# first login as user with privilege to create a database e.g. sudo su postgres

# create new user benchmark
createuser -l -D -R -S benchmark

# alter user's password
psql -U postgres -c "ALTER USER benchmark WITH PASSWORD 'benchmark'"

# create new database benchmark
createdb -U postgres benchmark

# create language plpgsql on benchmark database
createlang -U postgres -d benchmark plpgsql

# load hstore data type and functions
psql -U postgres -d benchmark -f /usr/share/postgresql/9.0/contrib/hstore.sql

# grant access to user benchmark
psql -U postgres -d benchmark -c "GRANT ALL PRIVILEGES ON DATABASE benchmark TO benchmark"
Figure 8: Database setup script

Generating Test Data

As described in chapter 3.6 Test Application, the data generator needs to be run separately. The data can be created by executing the step_2.sh bash script, which creates the data sets defined in the section on datasets.

./step_2.sh

The bash script step_2.sh has the following content:

#!/bin/bash
# create test data sets
# - for 10
python generator.py -a 10 -t words -l 50
mv data/testdata.csv data/testdata_10.csv

# create test data sets
# - for 100
python generator.py -a 100 -t words -l 50
mv data/testdata.csv data/testdata_100.csv

# create test data sets
# - for 500
python generator.py -a 500 -t words -l <desc-length>
mv data/testdata.csv data/testdata_500.csv

# etc.
Figure 9: Test data generation script

Executing Benchmark

Now all prerequisites are fulfilled and the benchmark can be started. For this step, too, an example script is available. Run the following command to benchmark KVP and hstore, once with index and once without, based on the datasets generated by the previous script.

./step_3.sh

The step_3.sh script includes the following statements:

#!/bin/bash
############
#for hstore
############
# - for 10
# - without index
python benchmark.py -t pgsqlhstore -s localhost -p <port> -d benchmark -u benchmark -w benchmark -a data/testdata_10.csv -l -g -n
mv output/1.png output/hstore_10_1.png
mv output/2.png output/hstore_10_2.png
mv output/log.csv output/hstore_10_log.csv
mv output/log_summary.csv output/hstore_10_log_summary.csv
psql -d benchmark -c "EXPLAIN ANALYZE SELECT * FROM bench_hstore WHERE hstore(bench_hstore)->'id'='7';" > output/analyze.log
mv output/analyze.log output/hstore_10_analyze.log

# - with index
python benchmark.py -t pgsqlhstore -s localhost -p <port> -d benchmark -u benchmark -w benchmark -a data/testdata_10.csv -i -l -g -n
mv output/1.png output/hstore_10_index_1.png
mv output/2.png output/hstore_10_index_2.png
mv output/log.csv output/hstore_10_index_log.csv
mv output/log_summary.csv output/hstore_10_index_log_summary.csv
psql -d benchmark -c "EXPLAIN ANALYZE SELECT * FROM bench_hstore WHERE hstore(bench_hstore)->'id'='7';" > output/analyze.log
mv output/analyze.log output/hstore_10_index_analyze.log

############
#for KVP
############
# - for 10
# - without index
python benchmark.py -t pgsql -s localhost -p <port> -d benchmark -u benchmark -w benchmark -a data/testdata_10.csv -l -g -n
mv output/1.png output/kvp_10_1.png
mv output/2.png output/kvp_10_2.png
mv output/log.csv output/kvp_10_log.csv

mv output/log_summary.csv output/kvp_10_log_summary.csv
psql -d benchmark -c "EXPLAIN ANALYZE SELECT * FROM bench_kvp WHERE bench_id = (SELECT bench_id FROM bench_kvp WHERE key = 'id' AND value = '7');" > output/analyze.log
mv output/analyze.log output/kvp_10_analyze.log

# - with index
python benchmark.py -t pgsql -s localhost -p <port> -d benchmark -u benchmark -w benchmark -a data/testdata_10.csv -i -l -g -n
mv output/1.png output/kvp_10_index_1.png
mv output/2.png output/kvp_10_index_2.png
mv output/log.csv output/kvp_10_index_log.csv
mv output/log_summary.csv output/kvp_10_index_log_summary.csv
psql -d benchmark -c "EXPLAIN ANALYZE SELECT * FROM bench_kvp WHERE bench_id = (SELECT bench_id FROM bench_kvp WHERE key = 'id' AND value = '7');" > output/analyze.log
mv output/analyze.log output/kvp_10_index_analyze.log

# etc.
Figure 10: Benchmark script example

4.3 Results

The test was executed in May 2011, based on the test design described in chapter 3 and the hardware specification in chapter 4.1. All test logs were aggregated into a single file showing the start and end time as well as the duration and the average time in seconds per SELECT statement. The detailed aggregation and the full-size diagrams can be reviewed in the appendix.

In general it can be said that hstore performs much better than a KVP table schema. Nevertheless, this general conclusion needs to be differentiated: a closer look at the different data sets shows why the first assumption is not accurate enough.

In the first test run each data set was tested twice, once without an index and once with an index. For the KVP schema an index on the attribute key was used. This benchmark gave the result shown in Figure 11. In general, hstore does a really good job when querying a tuple by a given key value pair, specifically with big data sets, which result in a lot of tuples in the database table. One reason could be that hstore needs only one attribute for storing arbitrary information, on the same tuple where all the other information is stored. But more on that later in chapter 4.4 Findings.

Figure 11: Overview KVP vs. hstore benchmark

From 10 to approximately 500 records, KVP is much faster at querying a key value pair. Beyond that, hstore demonstrates its strength, especially with more than … tuples in the hstore table: without an index hstore is faster than KVP by a factor of 4.04, and with an index by a factor of 7.9. This sounds as if KVP were a performance killer, which is not true. If we look at the absolute querying time per SELECT statement for the heaviest data set, KVP needs on average 19.54 ms without and 15.12 ms with an index, compared to hstore with 4.84 ms without and 1.91 ms with an index. The difference is only around 13 to 15 milliseconds per SELECT statement, which is not that much.

Figure 12: Benchmark KVP vs. hstore from 10 to 2.5K
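The factors can be re-derived from the four average SELECT times quoted above; the ratio of the unindexed averages is about 4.04 and that of the indexed averages about 7.9. The following is a small arithmetic sketch; the dictionary layout and the function names speedup and saving_ms are illustrative assumptions.

```python
# Average SELECT times in milliseconds as quoted in the text,
# keyed by (schema, index used).
AVG_SELECT_MS = {
    ("kvp", False): 19.54,
    ("kvp", True): 15.12,
    ("hstore", False): 4.84,
    ("hstore", True): 1.91,
}


def speedup(indexed):
    """Factor by which hstore is faster than KVP for the given index setting."""
    return AVG_SELECT_MS[("kvp", indexed)] / AVG_SELECT_MS[("hstore", indexed)]


def saving_ms(indexed):
    """Absolute time saved per SELECT statement in milliseconds."""
    return AVG_SELECT_MS[("kvp", indexed)] - AVG_SELECT_MS[("hstore", indexed)]
```

Looking at saving_ms next to speedup makes the point of the paragraph concrete: a large relative factor still corresponds to an absolute difference of well under 20 ms per statement.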

The situation changes if we use a combined index for the KVP table on the attributes key and value. Hstore is still slightly faster than the KVP schema, but the difference to the average SELECT transaction time of the indexed KVP table shrinks considerably. Overall we can say that KVP with the combined index comes very close to the hstore schema. Although the combined index gives some performance boost, we can still see the problem that the more tuples we store in the KVP table, the larger the difference between hstore and KVP becomes. We conclude that the more arbitrary data needs to be stored, the faster the combined index grows and the longer PostgreSQL needs to find the right key value pair combination. But more on that later in chapter 4.4 Findings.

Figure 13: Overview KVP hstore with combined index

For small data sets the KVP schema is still a little bit faster; however, this changes quickly at around 500 or more records / tuples per associative array.

Figure 14: Benchmark KVP hstore from 10 to 2.5K with combined index

Taking a closer look at the data sets between 10 and … tuples, we see that the average transaction time on a big data set shrank from 15.12 ms to 9.48 ms, which means that the combined index is faster by a factor of

15.12 ms / 9.48 ms ≈ 1.6

Applying this calculation to the big data set, we get a factor of 1.72, because the time shrank from … milliseconds to … milliseconds, which is not that much.

Figure 15: Overview of difference between KVP single and combined index

Small data sets show the opposite picture. From 10 to 500 records the difference between hstore and KVP with a combined index is negative, which means that KVP is faster than hstore. From roughly 500 records upwards hstore is faster, even if only by a little.

Figure 16: 10 to 2.5K: Difference between KVP single and combined index

The subsequent two diagrams show KVP without an index, with an index on the attribute key, and with a combined index, against hstore with and without an index. They show that KVP with a combined index comes very close to the hstore data type.

Figure 17: Overview of KVP with index on key and combined index against hstore

Figure 18: 10 to 2.5K: KVP with index on key and combined index against hstore

Analyzing the indices of hstore and KVP shows that KVP needs a lot more disk space to build the index. This holds both for an index on the attribute key and for a combined index on the attributes key and value, where the combined index needs more disk space than the index on the attribute key. Surprisingly, a GiST index on the hstore needs much less disk space, even though GiST creates a separate index for each unique key in the hstore.

Figure 19: Index size overview

Especially in the area of … tuples and more, the difference is significant. With … tuples, KVP with an index on the attribute key needs 3.73 times more disk space than the GiST index on an hstore, and 3.55 times more with the combined index.

Figure 20: Index size for 10 to … records

4.4 Findings

As described in chapter 4.3 Results, with … records in an array hstore is faster than KVP with an index on the key attribute by a factor of 7.9, and faster than KVP with a combined index by a factor of 4.95. This can be traced back to the fact that KVP needs many more units of work (cost) [3] to get the right tuple, although the size in bytes of a single tuple is a lot lower.

[3] The cost unit in the EXPLAIN result is defined relative to the cost of reading a single database page from disk. It has no absolute meaning and can therefore only be compared with other cost values (Nasby, 2010).

Let us say we want to save … array entries in a KVP and an hstore schema. For hstore we have exactly … tuples in the database table:

Σ array entries = Σ tuples

Database table: bench_hstore
bench_id : BIGINT | bench_hstore : HSTORE
1 | "id"=>"1", "comment"=>"lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet ", "surname"=>"ebsaveq", "forename"=>"maeznidus"
2 | "id"=>"2", "zip"=>"6489", "comment"=>"lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet ", "surname"=>"epofod", "forename"=>"teer"
…
2500 | "id"=>"2500", "comment"=>"lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet ", "forename"=>"sorietet"
Table 9: Hstore table abstract

Next we analyze an example query. As an example we take this one:

EXPLAIN ANALYZE SELECT * FROM bench_hstore WHERE hstore(bench_hstore)->'id'='1735';
Listing 12: Explain Analyze statement for hstore without index

The analysis says that the cost for getting the first tuple is 0 and that getting all tuples needs … cost units. PostgreSQL estimates that it will return 45 rows, each with a size of 40 bytes.

Seq Scan on bench_hstore (cost=… rows=45 width=40) (actual time=… rows=1 loops=1)
  Filter: ((bench_hstore -> 'id'::text) = '1735'::text)
Total runtime: … ms
(3 rows)
Listing 13: Output of Explain Analyze statement for hstore without index

The actual time shows the milliseconds effectively needed to execute the example statement. PostgreSQL needed 1 loop, returned 1 row and required … milliseconds.

Hstore with an index needs a different SELECT statement in the analyze query, because the hstore operator for querying an indexed attribute is a little bit different. In hstore this can be done by using the @> operator.

EXPLAIN ANALYZE SELECT * FROM bench_hstore WHERE bench_hstore @> '"id"=>"1735"';
Listing 14: Explain Analyze statement for hstore with index

The difference between an hstore without and with an index is that the hstore without an index does a sequential scan for each SELECT statement. That means it begins at the first tuple and continues until the query is satisfied. When an index is used, a bitmap heap scan takes place: PostgreSQL has found a small subset of tuples that can fulfill the query. The smaller set of tuples to loop over makes finding the right tuple faster.

Bitmap Heap Scan on bench_hstore (cost=… rows=2 width=218) (actual time=… rows=1 loops=1)
  Recheck Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
  -> Bitmap Index Scan on hidx_2_5k (cost=… rows=2 width=0) (actual time=… rows=70 loops=1)
       Index Cond: (bench_hstore @> '"id"=>"1735"'::hstore)
Total runtime: … ms
(5 rows)
Listing 15: Output of Explain Analyze statement for hstore with index

Now let us have a look at the KVP tables. Remember that the schema has two different tables, but we only need the table with the attributes key and value. For the same number of array entries, a multiple of tuples is stored in the KVP table, because each key value pair in the array needs to be a separate tuple in the table. If we have the keys id, surname, forename, zip, and comment for a big array and each key in this array has an assigned value, then this results in … tuples in the database table. Compared to hstore it has 5 times more tuples or, to be more precise, the sum of all filled keys:

Σ tuples = Σ values in the array, where value ≠ null

The database table then includes a separate tuple for each key value pair:

Database table: bench_kvp
bench_id : BIGINT | key : TEXT NOT NULL | value : TEXT
1 | id | 1
1 | comment | lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet
1 | surname | Ebsaveq
1 | forename | Maeznidus
2 | id | 2
2 | zip | 6489
2 | comment | lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet
2 | surname | epofod
2 | forename | Teer
…
2500 | id | 2500
2500 | forename | sorietet
2500 | comment | lorem ipsum dolor sit amet consetetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat sed diam voluptua at vero eos et accusam et justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata sanctus est lorem ipsum dolor sit amet
Table 10: KVP table abstract

Taking the same key value pair we used for hstore gives the following KVP SQL statement:

EXPLAIN ANALYZE SELECT * FROM bench_kvp WHERE bench_id = ( SELECT bench_id FROM bench_kvp WHERE key = 'id' AND value = '1735' );
Listing 16: Explain Analyze statement for KVP

which results in the following EXPLAIN output:

Seq Scan on bench_kvp (cost=… rows=3 width=60) (actual time=… rows=2 loops=1)
  Filter: (bench_id = $0)
  InitPlan 1 (returns $0)
    -> Seq Scan on bench_kvp (cost=… rows=1 width=8) (actual time=… rows=1 loops=1)
         Filter: ((key = 'id'::text) AND (value = '1735'::text))
Total runtime: … ms
Listing 17: Output of Explain Analyze statement for KVP

To get the same result as for hstore, two different sequential scans are needed. The inner scan looks for the key value pair and returns the unique identifier. The cost of this scan to get all tuples back is …, and each tuple is 8 bytes wide. Then, with the given identifier, a second scan collects all tuples that relate to this identifier. The cost of this outer scan begins where the inner scan stops and ends at …. In a procedural language it works like this:

For each row in input_1
    For each row in input_2
        // Do_something
    Next
Next
Listing 18: Programming example of explain sequences
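The InitPlan node in the plan output (loops=1, result handed over as $0) suggests that the inner scan is evaluated once and its result reused by the outer scan, so the two scans may be closer to two consecutive passes than to a nested loop. The following sketch models that reading on an in-memory list; it is an illustration under this assumption, the function name kvp_lookup is hypothetical, and unlike the SQL scalar subquery it simply takes the first match.

```python
def kvp_lookup(rows, key, value):
    """Model of the KVP query on a list of (bench_id, key, value) tuples."""
    # Pass 1 (inner scan / InitPlan): find the identifier of the matching pair.
    target = next(
        (bench_id for bench_id, k, v in rows if k == key and v == value),
        None,
    )
    if target is None:
        return []
    # Pass 2 (outer scan): collect every tuple that carries this identifier.
    return [row for row in rows if row[0] == target]


# Shortened rows in the shape of Table 10 (sample data).
ROWS = [
    (1, "id", "1"), (1, "surname", "Ebsaveq"), (1, "forename", "Maeznidus"),
    (2, "id", "2"), (2, "zip", "6489"), (2, "surname", "epofod"),
]
```

Both passes read the whole table when no index exists, which is consistent with the higher cost the KVP schema shows compared to the single scan on the hstore table.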

The difficulty of creating one's own hstore-like database schema is that more scans, and therefore more reads of pages on the disk, are needed, which results in higher cost units. The smaller size of the tuples in bytes does not compensate for the number of pages to be read from disk. At first glance it looks as if, in the best case, only 60 bytes for the first scan and 8 bytes for the second scan are needed. That is not true: if the first read in the second scan finds the key value pair, then 8 bytes are consumed. With the identifier found, the first scan reads all tuples that match the identifier. In our case we have 5 key value pair combinations, which means 5 tuples in the KVP table. Each tuple consumes 60 bytes, which makes 300 bytes for all 5 tuples; together with the 8 bytes for the second scan this results in a total size of 308 bytes. In comparison, hstore uses only 213 bytes in the best case.

Using an index on the key attribute can speed up finding the unique identifier for a given key value pair. Analyzing this technique shows that we have an additional scan:

Seq Scan on bench_kvp (cost=… rows=3 width=60) (actual time=… rows=2 loops=1)
  Filter: (bench_id = $0)
  InitPlan 1 (returns $0)
    -> Bitmap Heap Scan on bench_kvp (cost=… rows=1 width=8) (actual time=… rows=1 loops=1)
         Recheck Cond: (key = 'id'::text)
         Filter: (value = '1735'::text)
         -> Bitmap Index Scan on kvpidx (cost=… rows=2499 width=0) (actual time=… rows=2500 loops=1)
              Index Cond: (key = 'id'::text)
Total runtime: … ms
Listing 19: Output of Explain Analyze statement for KVP with index on attribute key

PostgreSQL reads the index to find all tuples that have the given key, and afterwards, in the second scan, it needs to find the value in a shorter list of key value pairs. This yields an improvement by a factor of 1.29, which is calculated as the average SELECT time without an index divided by the average SELECT time with an index on the key attribute.

It is interesting to see what happens if we take a combined index on the attributes key and value. Again we take the same statement as before:

EXPLAIN ANALYZE SELECT * FROM bench_kvp WHERE bench_id = ( SELECT bench_id FROM bench_kvp WHERE key = 'id' AND value = '1735' );
Listing 20: Explain Analyze statement for KVP

Using a combined index reduces the time by a factor of 2.06, calculated in the same way as before, as the average SELECT time without an index divided by the average SELECT time with a combined index:

19.54 ms / 15.12 ms ≈ 1.29 (index on key)
19.54 ms / 9.48 ms ≈ 2.06 (combined index)

The factor of 2.06 results because no additional scan is needed, as was the case when using only an index on the attribute key. The index scan can directly find the unique identifier for a given key value pair, and afterwards all tuples for that identifier are read. The bytes needed to gather the data are exactly the same as for the first index alternative. In contrast to the first alternative, the combined index on the attributes key and value grows very fast, because a new entry in the index is needed for each key value pair combination. It is in the nature of the key value pair philosophy that only arbitrary, unforeseen information is stored as key value pairs, and therefore it is very unlikely that an identical key value pair appears in the table twice.

Seq Scan on bench_kvp (cost=… rows=3 width=60) (actual time=… rows=5 loops=1)
  Filter: (bench_id = $0)
  InitPlan 1 (returns $0)
    -> Index Scan using kvpidx2 on bench_kvp (cost=… rows=1 width=8) (actual time=… rows=1 loops=1)
         Index Cond: ((key = 'id'::text) AND (value = '1735'::text))
Total runtime: … ms
(6 rows)
Listing 21: Output of Explain Analyze statement for KVP with combined index

Lastly, we need to have a short look at the hstore implementation. For that, please refer to the chapter on the hstore working principle. Let us take the following hstore example:

bench_id : BIGINT | bench_hstore : HSTORE
1 | "zip"=>"8000", "surname"=>"ebsaveq", "forename"=>"maeznidus"
2 | "zip"=>"6489", "surname"=>"epofod", "forename"=>"teer"
3 | "zip"=>"8000", "surname"=>"kjuefs", "forename"=>"beer"
Table 11: Hstore tuple examples

Now we want to find the names of all persons who live in the region with zip code 8000. For that, we need the following SQL statement:

SELECT hstore(bench_hstore)->'surname' FROM bench_hstore WHERE hstore(bench_hstore)->'zip' = '8000';
Listing 22: Querying all persons who live in the region with zip code 8000

In this example PostgreSQL calls the fetchval function six times, because it loops over each tuple and calls the function twice: once to get the value of the attribute surname and once to get the zip, so that PostgreSQL can compare the returned value with the value 8000. When PostgreSQL calls the fetchval function, it hands the array over to the C function hstore_fetchval, which finds the key in the buffer. Based on the buffer entry it knows at which position the value begins and how long it is. For the key zip and the first tuple this would be something like start position 8 and length 4. So now hstore can take the substring of the array and return the value 8000 to PostgreSQL, which then compares it with the given value after the equals sign.
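The lookup principle described above (entries that record where a key and its value start and how long they are, followed by a substring operation) can be illustrated with a strongly simplified model. This is not the real hstore C implementation or its on-disk format; the functions pack and fetchval and the tuple layout (key_pos, key_len, val_pos, val_len) are assumptions made for this sketch, and the positions it produces are not meant to match the figures quoted in the text.

```python
def pack(pairs):
    """Pack a dict into (entries, buffer).

    Each entry holds (key_pos, key_len, val_pos, val_len) into one string buffer.
    """
    entries, parts, pos = [], [], 0
    for key, value in pairs.items():
        entries.append((pos, len(key), pos + len(key), len(value)))
        parts.append(key + value)
        pos += len(key) + len(value)
    return entries, "".join(parts)


def fetchval(packed, key):
    """Return the value stored for key, or None if the key is absent."""
    entries, buf = packed
    for key_pos, key_len, val_pos, val_len in entries:
        if buf[key_pos:key_pos + key_len] == key:
            # Position and length are known, so a single slice yields the value.
            return buf[val_pos:val_pos + val_len]
    return None


# The three tuples of Table 11.
TABLE_11 = [
    pack({"zip": "8000", "surname": "ebsaveq", "forename": "maeznidus"}),
    pack({"zip": "6489", "surname": "epofod", "forename": "teer"}),
    pack({"zip": "8000", "surname": "kjuefs", "forename": "beer"}),
]
surnames = [fetchval(t, "surname") for t in TABLE_11 if fetchval(t, "zip") == "8000"]
```

The list comprehension mirrors the query of Listing 22: one fetchval call for the zip comparison and one for the surname that is returned.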

4.5 Conclusion

As described in the previous chapters, hstore performs much faster than the KVP schema described in chapter 2.2. Data stored as type hstore is not locked into the database and can be migrated to another schema with a minimum of effort, because it is stored as a string in the form of an associative array. In addition, hstore provides PostgreSQL functions to transform the associative array into a column/row table, as known from every database management system. The keys in the associative array are transposed to columns, and each row in the array becomes a tuple in the column/row table. The values become the values in a tuple. Therefore the fear of a later migration should not be a reason for not using hstore.

Also, the way it is implemented needs much less disk space for the indices and costs less performance. This is shown by the explain analysis in chapter 4.4 and by the index size graphs (Figures 19 and 20). The cost of reading data is much lower than that of a KVP schema. As a reminder, the cost is a factor relative to reading a page from the disk: the higher it is, the more needs to be read from the disk and the slower the query will be. In addition, hstore buffers all the keys and values to provide faster reads, and along with a single buffer entry it stores the position of the key and value in the string of type hstore that represents the associative array in the database table. That means that once hstore has found the key, it does not need to search the string for the value, because it already knows its position in the string.

However, it has to be considered that for small datasets hstore is not the preferable method to store key value pairs, especially with an array size of 1 to roughly 500 records. At this size a KVP schema is preferable. The problem we face here is that at the beginning we do not have a lot of records in the table. But over time the 500-record limit can easily be exceeded, and then switching from the KVP schema to hstore is not that easy: not only does the data need to be transformed into the hstore array, but the database tables and all the SQL statements for querying, inserting, modifying and deleting records also need to be changed to the new schema. The conclusion is to first think about how much data is expected during the lifetime of the database table and then to choose the right schema. If you are unsure which one to take, my suggestion is to use hstore as the default, because at 500 records the difference in average querying time between the KVP and the hstore schema is only about 0.45 ms in favour of KVP.

To summarize, hstore provides in general a faster mechanism to store key value pairs. It is easy to use and offers a lot of operations to compare, transform, and search for data. Not only the performance but also the number of pages and bytes needed to get the data for a given key value pair is better than with a self-built key value pair schema. All in all, hstore is the preferable way of storing arbitrary, unforeseen information in a database table.
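The migration argument rests on transposing the keys of the associative arrays into columns. Independent of the PostgreSQL functions mentioned above, the idea can be sketched in Python; the function name to_table is hypothetical, and missing keys are represented as None, comparable to NULL in a table.

```python
def to_table(records):
    """Transpose a list of key/value dicts into (columns, rows)."""
    # Every key that occurs in any record becomes a column.
    columns = sorted({key for record in records for key in record})
    # Each record becomes one row; absent keys are filled with None.
    rows = [tuple(record.get(col) for col in columns) for record in records]
    return columns, rows


# Sample records in the shape of the hstore examples used in this report.
RECORDS = [
    {"id": "1", "forename": "cucyp", "zip": "6593"},
    {"id": "2", "forename": "kasarzyc", "surname": "ecnalehad", "zip": "8463"},
    {"id": "3", "forename": "inwa"},
]
columns, rows = to_table(RECORDS)
```

The resulting column list and rows could then be used to create and fill a conventional table, which is why a later move away from hstore needs little effort.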

Bibliography

Bartunov, O., Sigaev, T., & Gierth, A. (n.d.). PostgreSQL 9.0: hstore. Retrieved May 1, 2011.
Cambridge University Press. (n.d.). Cambridge Dictionary Online. Retrieved April 26, 2011.
Gonnerman, C. (2003). Python Name Generators. Retrieved April 27, 2011, from Alderon's Tower.
Krummenacher, R. (2009, December 21). HSR Texas Geo Database Benchmark. Retrieved June 3, 2011, from Wiki GISpunkt HSR.
Nasby, J. (2010, May 13). Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT. Retrieved May 27, 2011, from PostgreSQL wiki.
Oxford University Press. (n.d.). Oxford English Dictionary. Retrieved April 28, 2011.
PostgreSQL Global Development Group. (n.d.). PostgreSQL: About. Retrieved May 26, 2011.
Smith, G. (2010). PostgreSQL 9.0 High Performance. Olton, Birmingham, United Kingdom: Packt Publishing Ltd.
Strandberg, P. E. (2007). Lorem Ipsum Generator. Retrieved April 28, 2011.
Walsh, N. (1996). What does `lorem ipsum dolor' mean? Retrieved April 26, 2011.

Appendix

Benchmark with KVP index on attribute key

Benchmark of hstore and KVP, once with an index (w) and once without an index (o). For KVP, an index on the attribute key was chosen.


42 Appendix Benchmark with combined KVP index Benchmark of hstore and KVP once with index (w) and once without index (o). For KVP a combined index on the attribute key and value has been choosen. University of Applied Science Rapperswil 36


Differences between KVP and hstore


Average SELECT time for KVP and hstore


KVP and hstore index sizes

