In Memory Database. Performance evaluation based on query time. Seminar Database Systems

Sansar Choinyambuu, schoinya@hsr.ch
Supervisor: Prof. Markus Stolze
21.12.2010

Table of Contents

Introduction
Technology overview
In Memory database solutions on market
Method
Results
    Experiment on Northwind database
    Experiment on MovieLens dataset
Discussion
References
Appendix A: Experiment results on Northwind Database
Appendix B: Experiment Results on MovieLens dataset

Introduction

Today's large enterprises have built information systems that depend fundamentally on the pace of transactions. Users expect real-time databases to deliver results ever faster, and modern technologies meet this expectation to a lesser or greater degree. A typical real-time database stores the values of variables that represent the external environment; processing these variables may produce outputs that trigger actuators to affect the environment [15]. The demands of real-time database applications such as recommendation systems, stream processing and complex event processing grow further when the speed they require is combined with the volume of transactions that the Internet generates for enterprises operating around the globe. Traditional hard-disk-based database management systems cannot meet these requirements [15][19].

An In Memory Database (IMDB), also called a fully cached database [3] or main memory database, relies completely on system RAM rather than on the hard disk, which operates orders of magnitude slower [1]. In-memory databases process transactions distinctly faster and consume noticeably less energy [3]. They improve system performance thanks to a large reduction in I/O operations, fewer transaction state transitions and the associated cache replacements, less lock contention, and more efficient in-memory search structures and query processing [14].

Despite their promising speed, IMDBs face challenges in keeping up with traditional on-disk databases. Robust on-disk database systems guarantee by design the atomic, consistent, isolated and durable (ACID) nature of the transactions they process, whereas IMDBs have difficulty assuring durability because of their RAM-based design [1].
Different products offer different ways of coping with this issue, as shown in the section on IMDB solutions on the market. Major players in the database market such as IBM, Oracle, SAP and VMware have in recent years all acquired companies that were developing in-memory database solutions, and they are already marketing or announcing IMDB products [9][10][11][12]. Gartner placed In Memory Database Management Systems at the peak of its Hype Cycle for Data Management 2010 [4]. The trend indicates strong demand for IMDB solutions and increasing use of them in the coming years.

However, do IMDBs always perform better than on-disk databases, regardless of the application they are used for? This article examines the potential advantages of IMDBs over traditional RDBMSs with the help of the experiment results presented in the Method and Results sections.

Technology overview

IMDBs accelerate the storage, retrieval and sorting of information by holding all records in main memory. Transactional operations are processed entirely independently of the hard disk. This eliminates a major source of processing overhead and can deliver performance gains of an order of magnitude or more [6].

Common applications for IMDBs are embedded databases in high-speed specialty devices such as medical equipment, network and telecom equipment, and industrial control devices [3]. However, the role of the IMDB has greatly expanded over the last couple of years, because high-end 64-bit servers can now accommodate quite large databases, even terabytes in size. Applications for large IMDBs therefore now include many popular large-scale Web applications and social networking sites [3].

The main performance burden of on-disk databases is the file system I/O that database transactions require, as Figure 1 shows. In contrast, an in-memory database system entails little or no data transfer. The IMDB system gives the application a pointer that refers directly to the data item in the database, enabling the application to work with the data directly. Eliminating multiple data transfers streamlines processing. Cutting multiple data copies reduces memory consumption, and the simplicity of this design makes for greater reliability [2].

Fig 1. Data transfer in an On-Disk Database system [2]

Traditional on-disk database systems have evolved over decades towards competitive performance using various approaches. With some additional technologies, the performance of on-disk database systems can be optimized to a certain degree. One of the best known of these technologies is caching. When caching is enabled on the on-disk database

system, depending on the cache size, the most frequently accessed records (or the most recently used ones, depending on the product) can be kept in memory for faster access. However, caching only accelerates read operations; any write, whether inserting a new record or updating existing ones, still requires I/O operations on disk. Caching therefore provides only a partial performance improvement. If the additional CPU and memory resources needed to manage the cache are taken into account, caching clearly underperforms IMDBs [8].

If the reason behind the success of IMDBs is that all records are kept in memory, it should be possible to get the same performance by deploying an on-disk database on a RAM disk. A RAM disk, or RAM drive, is software that creates a virtual hard disk from part of the system memory. Unfortunately, mechanisms that are inherent in the design of on-disk database systems, such as file system I/O and caching, continue to operate and drain performance. Figure 1 shows the handoffs required for an application to read a piece of data from an on-disk database, modify it and write it back to the database. These steps cost time and CPU cycles and cannot be avoided even if the on-disk database is deployed on a RAM disk.

NAND flash-based solid state drives (SSDs) have been widely used as data storage in recent years. Websites, data centers, embedded applications and even personal laptops use them as storage media. Because SSDs contain no mechanical parts, they outperform hard disks in data access. Storage on SSD eliminates the mechanical part of physical I/O, resulting in better responsiveness [6]. However, all the usual performance drains persist for on-disk database systems even if they use SSDs. The physical responsiveness and performance of NAND flash-based SSDs are definitely better than those of hard disks, but nowhere near those of DRAM.
[16]

While some applications do not require the database to be durable, for most applications durability is of decisive importance. Data stored on a volatile medium are lost when power is cut. This can be avoided with features that vendors usually offer as additions. Depending on the vendor, transactional durability is provided by several different approaches [3][7]:

- Hybrid database structures: Hard disks handle persistent data while RAM handles all volatile but disposable data. This approach costs less because it uses inexpensive disks, yet it maintains excellent speed by handling data directly in RAM without buffering to disk.
- Logged operations: A journal file records every transaction of the in-memory database to battery-backed RAM or EEPROM memory.
- Snapshotting: A complete duplicate of the data is created at a given point in time. Any data created since the last snapshot is lost in the event of a power failure.
- High availability: Some vendors allow the in-memory database to fail over to an identical copy of the database, kept in sync by replication or another failover protocol.

Non-volatile RAM (NVRAM) provides another means of in-memory database persistence. One type of NVRAM, called battery RAM, is backed up by a battery, so that even if a device is turned off or loses its power source, the memory content, including the database, remains. Newer types of NVRAM, including ferroelectric RAM (FeRAM), magnetoresistive RAM (MRAM) and

phase change RAM (PRAM), are designed to maintain information when power is turned off, and offer similar persistence options [6].

The hypotheses made in this section about the performance of the alternative technologies to IMDBs can be reviewed against the experiment results presented in the Results section.

In Memory database solutions on market

In recent years a variety of IMDB solutions have become available, from big vendors to spin-offs and from commercial products to open source initiatives. They usually differ in how they solve the durability problem.

Oracle acquired TimesTen in 2005 [9] and offers it as an in-memory database that can be used as a preprocessing cache for its traditional flagship RDBMS or as a stand-alone database product [3]. TimesTen uses transactional logging and database checkpointing to provide durability.

IBM acquired SolidDB in 2008 [10] and markets it as an IMDB that maintains durability by keeping two separate but synchronized copies of the database at all times, as well as a permanent log file stored on disk [3]. According to the vendor, in the event of a failure, recovery to the standby database takes less than one second and loses no data.

According to the vendor's announcement, eXtremeDB by McObject LLC is the fastest IMDB solution on the market. The eXtremeDB Fusion version provides durability through a hybrid approach in which the database design, or "schema", causes certain record types to be written to disk while all others are managed entirely in memory. On-disk functions such as cache management are applied only to the records stored on disk, minimizing the performance impact and CPU demands of these activities [8].

SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world [17].
An SQLite database can exist purely in memory when the special :memory: name is used to open the database connection. The database ceases to exist as soon as the connection is closed, so a developer who deploys SQLite as an IMDB has to take separate measures to provide durability. The SQLite in-memory option was used in the experiments reported in this paper. For the configuration and use cases, please refer to the Method and Results sections.

There are a large number of other IMDB solutions on the market; for a rough overview, please refer to the product list provided by Wikipedia [18]. In all of the above products, the IMDB supports industry standards such as SQL for data processing and either JDBC or ODBC for end-user application interfaces.
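The lifetime of a :memory: database, and one way of adding durability by snapshotting, can be sketched as follows. This is a minimal illustration in Python's standard sqlite3 module, not the C# setup used in the paper; the table, its rows and the snapshot path are hypothetical.

```python
import os
import sqlite3
import tempfile

# ":memory:" asks SQLite for a database that lives only in RAM.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT)")
mem.executemany("INSERT INTO orders (city) VALUES (?)", [("Bern",), ("Zurich",)])
mem.commit()

# Snapshot the in-memory database to a file before closing (hypothetical path).
snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
disk = sqlite3.connect(snapshot_path)
mem.backup(disk)  # copies the content of `mem` into `disk`
disk.close()
mem.close()       # the in-memory database no longer exists after this call

# A new ":memory:" connection starts with an empty schema ...
fresh = sqlite3.connect(":memory:")
tables_in_fresh = fresh.execute("SELECT COUNT(*) FROM sqlite_master").fetchone()[0]
fresh.close()

# ... while the snapshot on disk still holds the rows.
restored = sqlite3.connect(snapshot_path)
rows_in_snapshot = restored.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
restored.close()
```

Anything written after the last backup would be lost on a crash, which is the trade-off described for snapshotting in the Technology overview.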

Method

The purpose of the experiment was to see whether the IMDB surpasses the on-disk database in query speed. The experiment also measures the alternative technologies that help on-disk databases perform competitively (introduced in the Technology overview section), and therefore shows whether those technologies, used together with an on-disk database, could be alternatives to an IMDB.

The Northwind Traders Access database is a sample database that shipped with the Microsoft Office suite. It contains the sales data of a fictitious company called Northwind Traders, which imports and exports specialty foods from around the world. The Northwind database served as the test database in our experiment. Thanks to its realistic structure for order processing applications, it provided suitable input for the queries to be executed. Order processing data is a main input for online recommendation systems, which analyze the selection and consumption behavior of individual users both in the past and in real time. The Northwind database therefore offers a realistic application case for an IMDB.

A database used by a recommendation system can grow to a large number of records if it is the basis for the recommendation system of an online shop with several thousand or more users. To make the simple Northwind database large enough, the Orders table was copied 8192 times into another table called BigOrders. The field definitions are the same in this large table, except that the OrderID field is auto-incremented. The BigOrders table contains 6'799'360 records, and the Northwind database grows to 1.09 GB in size.
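The replication step can be sketched as below. This is only an illustration in Python's sqlite3 module under stated assumptions: the column subset, the sample rows and the helper name build_big_orders are hypothetical, and the paper does not say how the copy was actually performed. With 8192 copies, the reported 6'799'360 records correspond to 830 rows in the original Orders table.

```python
import sqlite3


def build_big_orders(conn, copies):
    """Fill BigOrders with `copies` copies of every row in Orders."""
    conn.execute(
        "CREATE TABLE BigOrders ("
        " OrderID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " CustomerID TEXT, ShipCity TEXT)"
    )
    for _ in range(copies):
        # OrderID is omitted so that SQLite assigns a fresh auto-incremented key.
        conn.execute(
            "INSERT INTO BigOrders (CustomerID, ShipCity) "
            "SELECT CustomerID, ShipCity FROM Orders"
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM BigOrders").fetchone()[0]


# Hypothetical three-row stand-in for the Northwind Orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, ShipCity TEXT)"
)
conn.executemany(
    "INSERT INTO Orders (CustomerID, ShipCity) VALUES (?, ?)",
    [("ALFKI", "Berlin"), ("BONAP", "Marseille"), ("CHOPS", "Bern")],
)
big_rows = build_big_orders(conn, copies=8)
```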
The selected queries were run in the following 4 scenarios:

- SQLite on-disk database
- SQLite on-disk database with a cache size of 2 GB
- SQLite on-disk database on a RAM disk
- SQLite in-memory database

The free version of RAMDisk by DATARAM was used to create a RAM disk of 4 GB, on which the database was deployed for the third scenario. To use exactly the same database in all 4 scenarios, the on-disk Northwind database was attached to the in-memory database and its complete data was copied over to the new in-memory database. (The experiment code provides more information on the configuration.)

The experiment was done with a console program written in C#, using System.Data.SQLite and Microsoft Visual Studio 2008. System.Data.SQLite is the original SQLite database engine and a complete ADO.NET 2.0/3.5 provider all rolled into a single mixed mode assembly.
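A rough sketch of how such scenarios can be set up and timed is given below. It uses Python's standard sqlite3 module instead of the C# program of the experiment, and every name in it (open_on_disk, open_in_memory_copy, time_query, the demo table and file) is hypothetical. In particular, the row copy with CREATE TABLE ... AS SELECT is an assumption; the paper does not state which statements were used to copy the attached database.

```python
import os
import sqlite3
import tempfile
import time


def open_on_disk(path, cache_kib=None):
    """File-backed database, optionally with a larger page cache."""
    conn = sqlite3.connect(path)
    if cache_kib is not None:
        # A negative cache_size is interpreted by SQLite as a size in KiB.
        conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    return conn


def open_in_memory_copy(path, tables):
    """Attach the on-disk file and copy each listed table into RAM."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS disk", (path,))
    for name in tables:
        # Assumption: a plain row copy. CREATE TABLE ... AS SELECT copies rows
        # only; primary keys and indexes of the source are not carried over.
        conn.execute(f'CREATE TABLE main."{name}" AS SELECT * FROM disk."{name}"')
    conn.commit()
    conn.execute("DETACH DATABASE disk")
    return conn


def time_query(conn, sql):
    """Return (elapsed seconds, row count) for one query, fetching all rows."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)


# Tiny stand-in database in a temporary directory (hypothetical data).
db_path = os.path.join(tempfile.mkdtemp(), "northwind_demo.db")
seed = sqlite3.connect(db_path)
seed.execute("CREATE TABLE BigOrders (OrderID INTEGER PRIMARY KEY, ShipCity TEXT)")
seed.executemany("INSERT INTO BigOrders (ShipCity) VALUES (?)",
                 [("Bern",), ("Lyon",), ("Graz",)])
seed.commit()
seed.close()

scenarios = {
    "on disk": open_on_disk(db_path),
    "on disk with cache": open_on_disk(db_path, cache_kib=2 * 1024 * 1024),
    # The RAM disk scenario would be open_on_disk() with a path on that volume.
    "in memory": open_in_memory_copy(db_path, ["BigOrders"]),
}
results = {name: time_query(conn, "SELECT * FROM BigOrders ORDER BY OrderID")
           for name, conn in scenarios.items()}
for conn in scenarios.values():
    conn.close()
```

Fetching all rows inside the timed region matters: merely executing a SELECT may return before any row has been produced, which would make the elapsed time look close to zero.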

The experiments were performed on a PC running Microsoft Windows 7 Enterprise (v. 6.1.7600) with a 2.27 GHz quad-core Intel Xeon processor and 24 gigabytes of RAM.

The main criterion for evaluating the 4 scenarios was query execution time, which most vendors and researchers claim to be better for IMDBs than for traditional on-disk databases. Further criteria such as CPU and memory consumption were observed informally during the experiment but not documented.

Results

Experiment on Northwind database

The first scenario provides the reference values: the set of queries was run against the database on disk, and the elapsed time until the result set was delivered was measured. The performance improvements of the other three scenarios are calculated relative to the first scenario with the following formula:

\[
\mathit{Speedup} =
\begin{cases}
\dfrac{T_{\text{on disk}}}{T_{\text{result}}} - 1 & \text{if } T_{\text{on disk}} > T_{\text{result}} \\[2ex]
-\left(\dfrac{T_{\text{on disk}}}{T_{\text{result}}}\right)^{-1} - 1 & \text{else}
\end{cases}
\]

Formula 1: Calculation of speedup factor

However, as can be seen in Appendix A, the on-disk database with caching did not deliver faster query execution, apart from a few slight improvements. The results of the second scenario are therefore not analyzed further.

The queries executed noticeably faster in the third scenario, where the database file was stored on the virtual RAM disk. The scatter points in Figure 2 show the speedup achieved for each query, calculated with Formula 1 with $T_{\text{result}}$ replaced by $T_{\text{RAMDisk}}$. The horizontal axis shows the query number and the vertical axis the calculated speedup factor.

[Figure 2: scatter plot for the RAMDisk scenario; vertical axis: Speedup (0.00 to 3.00); horizontal axis: Number of query (0 to 25)]

Fig 2. Query time improvements for SQLite database on RAMDisk
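Formula 1 can be written as a small function. The sketch below is an interpretation rather than the author's code: the branch for slower results is chosen so that it reproduces the In Memory Speedup values listed in Appendix A, and returning None for a zero elapsed time is an assumption that stands in for the undefined cases.

```python
def speedup(t_on_disk, t_result):
    """Speedup factor of a scenario relative to the on-disk reference time.

    Both times must be in the same unit. Returns None when either time was
    measured as zero, because the ratio is then undefined.
    """
    if t_on_disk == 0 or t_result == 0:
        return None
    if t_on_disk > t_result:
        # The scenario was faster: positive speedup.
        return t_on_disk / t_result - 1
    # The scenario was slower (or equal): negative speedup.
    return -(t_result / t_on_disk) - 1


# Query 3 of Appendix A: 17.771 sec on disk, 10.093 sec in memory.
faster = round(speedup(17.771, 10.093), 2)
# Query 4 of Appendix A: 1.085 min on disk, 2.742 min in memory.
slower = round(speedup(1.085, 2.742), 2)
```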

According to the experiment results, the performance improvement brought by the RAM disk is fairly constant, and no query executed slower on the RAM disk than on the hard disk. Because the granularity of the time measurement was restricted to microseconds, queries that delivered their result in less than 0.001 microsecond were recorded with an elapsed time of zero. The speedup factor cannot be determined with Formula 1 if one of its parameters is zero. The 5 scatter points that lie on the zero level in Figure 2 represent these cases.

The main hypothesis for scenario 4 was that the IMDB executes queries faster than the on-disk database when exactly the same input data and the same set of queries are used. As Figure 3 shows, the experiment results did not confirm this. On the contrary, a number of queries took noticeably longer than on the on-disk database. The statement that an IMDB always performs faster than an on-disk database therefore cannot be confirmed. This applies at least to the SQLite database engine; the results are not guaranteed to be the same for other IMDB solutions.

[Figure 3: scatter plot for the In Memory scenario; vertical axis: Speedup (-6.00 to 6.00); horizontal axis: Number of query (0 to 25)]

Fig 3. Query time improvements for SQLite database in memory

In contrast to the RAMDisk speedup, the in-memory results do not all lie on the positive side, and the speedup factor varies widely. Queries that ran slower on the IMDB than on the on-disk database appear as scatter points below the zero level. As mentioned above, scatter points on the zero level of the vertical axis mark queries for which the speedup factor could not be calculated with Formula 1. To track down the roots of the slower query times, SELECT queries with various SQL clauses were intentionally included in the set.
Based on the analysis of the results, the slower execution times are associated with the ORDER BY, GROUP BY and JOIN clauses of the queries. It was suspected that this negative result could be caused by the replicated nature of the Northwind experiment database. An additional experiment was therefore done on another dataset, as explained in the following section.

Experiment on MovieLens dataset

The MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota. The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. The dataset is used to study recommendation engines, tagging systems and user interfaces. The experiment used a slightly modified version of the MovieLens dataset, created with the SQL file provided by the Iwis project [21]. The database consists of two tables: the Resources table with 10680 movies and the Rate table with 250 000 ratings for those movies.

    CREATE TABLE [RESOURCES] (
        [ID] INTEGER PRIMARY KEY NOT NULL,
        [URI] TEXT NULL,
        [TITLE] TEXT NULL,
        [DESCRIPTION] TEXT NULL,
        [LINK] TEXT NULL,
        [TYPE] TEXT NULL,
        [GENERALTAGFREQUENCY] FLOAT NULL,
        [GENERALTAGWEIGHT] FLOAT NULL,
        [AUTHOR_ID] NUMERIC NULL
    )

    CREATE TABLE [RATE] (
        [ID] INTEGER NOT NULL PRIMARY KEY,
        [RATE] FLOAT NULL,
        [MOVIE_ID] INTEGER NOT NULL,
        [USER_ID] INTEGER NOT NULL
    )

The Resources table was populated with null values except for the ID, Title and Type fields. The queries were selected with the intention of reproducing the poor performance of the IMDB in comparison with the on-disk database in SQLite.

The experiments were performed on a netbook running Microsoft Windows 7 Starter (v. 6.1.7600) with a 1.83 GHz Intel Atom processor and 1 gigabyte of RAM. As in the Northwind experiment, a console program written in C#, using System.Data.SQLite and Microsoft Visual Studio 2008, was used.

The results of this experiment confirmed the suspected negative effect of GROUP BY, ORDER BY and JOIN clauses on the IMDB. For instance, the query

    SELECT * FROM Resources ORDER BY ID DESC

took 1 msec on the on-disk database, whereas the in-memory database responded after 326.019 msec.
An even more dramatic difference appeared with the following query:

    SELECT Resources.ID, Resources.TITLE
    FROM Resources
    INNER JOIN Rate ON Resources.ID = Rate.MOVIE_ID AND Rate.RATE = 5
    ORDER BY Resources.TITLE

It took 1.645 seconds on the on-disk database and 45.418 minutes on the IMDB. However, the actual reason why these SQL clauses slow down query execution only for the in-memory database and not for the on-disk database could not be determined within the scope of this paper. For the complete results of the experiments, please refer to the appendices of this paper.
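The paper leaves the cause open. One check that a reader could run is to compare the query plans SQLite chooses for the original table and for a copy of it. The sketch below, in Python's sqlite3 module, rests on an assumption that the paper does not confirm: that the in-memory tables were created by a plain row copy such as CREATE TABLE ... AS SELECT, which does not carry over the primary key. The helper name plan, the reduced schema and the sample rows are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail column of EXPLAIN QUERY PLAN for one statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RESOURCES ("
    " ID INTEGER PRIMARY KEY NOT NULL, TITLE TEXT NULL, TYPE TEXT NULL)"
)
conn.executemany(
    "INSERT INTO RESOURCES (TITLE, TYPE) VALUES (?, ?)",
    [("Movie %d" % i, "Drama") for i in range(100)],
)
# Hypothetical copy step: rows are copied, the primary key definition is not.
conn.execute("CREATE TABLE RESOURCES_COPY AS SELECT * FROM RESOURCES")

plan_original = plan(conn, "SELECT * FROM RESOURCES ORDER BY ID DESC")
plan_copy = plan(conn, "SELECT * FROM RESOURCES_COPY ORDER BY ID DESC")

# True when SQLite has to sort the rows in a temporary B-tree.
original_needs_sort = any("TEMP B-TREE" in step for step in plan_original)
copy_needs_sort = any("TEMP B-TREE" in step for step in plan_copy)
```

If the plans differ in this way for the actual experiment databases, the two scenarios would not be running the same physical schema, which would be worth ruling out before attributing the slowdown to the in-memory engine itself.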

Discussion

The experiments made within the scope of this paper have certain limitations. In addition to the restricted evaluation criteria mentioned in the Method section, the small number of queries may be inadequate to represent the performance of the SQLite in-memory mode clearly. Nevertheless, a definite pattern in query execution time could be observed with these queries. Furthermore, the SELECT queries in the experiment are reasonably realistic for order processing and recommendation system transactions.

Already while the scenarios were being prepared, it was noticed that some queries ran slower on the IMDB than on the on-disk database. Some of the queries were therefore selected intentionally to reproduce these unexpected effects with GROUP BY, ORDER BY and JOIN clauses.

As mentioned in the Introduction, Gartner placed In Memory Database Management Systems at the peak of its Hype Cycle for Data Management 2010 [4]. Technologies at this height can relatively easily attract the attention of the market and generate mass usage. However, as the results of this paper show, applications should be analyzed according to their nature and the performance gain they can expect from an IMDB before any decision is taken.

As a final note, the statements that can be made from these experiment results are valid only for the SQLite database engine.

References

[1] Hector Garcia-Molina, Kenneth Salem, Main Memory Database Systems: An Overview, IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 6, December 1992
[2] Steve Graves, In Memory Database Systems, ACM Digital Library, Linux Journal Vol. 2002, Issue 101
[3] Kevin Kline, In Memory Databases push the Envelope, Information Management Newsletters, May 10, 2010
[4] Eric Thoo, Ted Friedman, Donald Feinberg and Mark A. Beyer, Hype cycle for Data Management 2010, ID Number: G00200878, July 22, 2010
[5] McObject LLC, Main Memory vs. RAM-Disk Databases: A Linux-based Benchmark, 2009
[6] McObject LLC, In-Memory Database Systems: Myths and Facts, 2009
[7] McObject LLC, McObject Breaks In-Memory Database Boundaries in New Benchmark, 2009
[8] McObject LLC, In Memory Database Systems Question and Answers, http://www.mcobject.com/in_memory_database, 2009
[9] Oracle Press Release, Oracle To Acquire TimesTen, Inc., http://www.oracle.com/corporate/press/2005_jun/060905_timesten_final_site.html, July, 2005
[10] IBM Press, IBM Acquires Solid Information Technology, http://www-01.ibm.com/software/data/info/solidinformation/, January 30, 2008
[11] SAP Acquisitions, SAP Acquires Sybase, http://www.sap.com/about/investor/inbrief/acquisitions/sybase/index.epx, July 30, 2010
[12] VMWare News Release, SpringSource to Acquire Gemstone Systems Data Management Technology, http://www.vmware.com/company/news/releases/spring-gemstone.html, May 06, 2010
[13] Lv Junyan, Xu Shiguo, Li Yijie, Application Research of Embedded Database SQLite, IEEE Xplore Digital Library, International Forum on Information Technology and Applications, 2009
[14] Tang Yanjun, Luo Wen-hua, A Model of Crash Recovery in Main Memory Database, IEEE Xplore Digital Library, 2010 International Conference On Computer Design And Applications, 2010
[15] Steven Van Singel, Theodore Tabe, Performance in Real time Main Memory Databases, IEEE Xplore Digital Library, 0-8186-7102-5/9, 1995
[16] IMEX Research, The State of Solid State Storage Industry Report, 2010
[17] SQLite, http://www.sqlite.org/mostdeployed.html, 12 November, 2010
[18] Wikipedia, In-Memory Database, http://en.wikipedia.org/w/index.php?oldid=387313595, November, 2010
[19] Kam-Yiu Lam, Tei-Wei Kuo, Real-Time Database Systems, Architecture and Techniques, Kluwer Academic Publishers, ISBN 0-792-37218-2, 2002
[20] GroupLens Research, MovieLens dataset, 5 October, 2006
[21] SQL for MovieLens, Iwis Platform Project at http://sourceforge.net/projects/iwis/

Appendix A: Experiment results on Northwind Database

Database: Northwind
Size: 1'147'131 KB (about 1.09 GB)
BigOrders table: 6'799'360 records

No. 1
Query: SELECT * FROM BigOrders
On Disk: 0.000 microsec | On Disk with Cache: 1.000 msec | RAMDisk: 0.000 microsec | In Memory: 0.000 microsec
RAMDisk Speedup: #DIV/0! | In Memory Speedup: #DIV/0!

No. 2
Query: SELECT * FROM BigOrders ORDER BY OrderID
On Disk: 0.000 microsec | On Disk with Cache: 0.000 microsec | RAMDisk: 0.000 microsec | In Memory: 1.148 min
RAMDisk Speedup: #DIV/0! | In Memory Speedup: #DIV/0!

No. 3
Query: SELECT * FROM BigOrders WHERE ShipCountry = 'France' GROUP BY ShipCity
On Disk: 17.771 sec | On Disk with Cache: 17.378 sec | RAMDisk: 13.666 sec | In Memory: 10.093 sec
RAMDisk Speedup: 1.30 | In Memory Speedup: 0.76

No. 4
Query: SELECT * FROM BigOrders WHERE Freight BETWEEN 0 AND 100 GROUP BY CustomerID
On Disk: 1.085 min | On Disk with Cache: 1.123 min | RAMDisk: 1.007 min | In Memory: 2.742 min
RAMDisk Speedup: 1.08 | In Memory Speedup: -3.53

No. 5
Query: SELECT * FROM BigOrders WHERE Freight > 5 ORDER BY OrderID DESC
On Disk: 0.000 microsec | On Disk with Cache: 0.000 microsec | RAMDisk: 0.000 microsec | In Memory: 1.096 min
RAMDisk Speedup: #DIV/0! | In Memory Speedup: #DIV/0!

No. 6
Query: SELECT COUNT(OrderID) FROM BigOrders
On Disk: 6.271 sec | On Disk with Cache: 6.393 sec | RAMDisk: 3.323 sec | In Memory: 1.357 sec
RAMDisk Speedup: 1.89 | In Memory Speedup: 3.62

No. 7
Query: SELECT DISTINCT Customers.CustomerID, Customers.ContactName FROM Customers INNER JOIN BigOrders ON Customers.CustomerID = BigOrders.CustomerID
On Disk: 53.898 sec | On Disk with Cache: 55.875 sec | RAMDisk: 49.452 sec | In Memory: 3.583 min
RAMDisk Speedup: 1.09 | In Memory Speedup: -4.99

No. 8
Query: SELECT Customers.* FROM Customers LEFT JOIN BigOrders ON (Customers.CustomerID = BigOrders.CustomerID AND strftime('%y', OrderDate) = NULL) WHERE BigOrders.ORDERID IS NOT NULL
On Disk: 11.807 min | On Disk with Cache: 11.645 min | RAMDisk: 6.988 min | In Memory: 3.348 min
RAMDisk Speedup: 1.69 | In Memory Speedup: 2.53

No. 9
Query: SELECT ShipCountry As country, COUNT(OrderID) As total FROM BigOrders WHERE ShipCountry IN('Germany', 'Brazil') GROUP BY ShipCountry
On Disk: 17.930 sec | On Disk with Cache: 17.878 sec | RAMDisk: 14.180 sec | In Memory: 10.873 sec
RAMDisk Speedup: 1.26 | In Memory Speedup: 0.65

No. 10
Query: SELECT COUNT(DISTINCT ShipCountry) AS total FROM BigOrders
On Disk: 9.110 sec | On Disk with Cache: 9.095 sec | RAMDisk: 5.975 sec | In Memory: 3.385 sec
RAMDisk Speedup: 1.52 | In Memory Speedup: 1.69

No. 11
Query: SELECT Customers.CustomerID, Customers.ContactName, COUNT(OrderID) as total FROM Customers INNER JOIN BigOrders ON Customers.CustomerID = BigOrders.CustomerID GROUP BY Customers.CustomerID, Customers.ContactName HAVING COUNT(OrderID) > 5 ORDER BY COUNT(OrderID) DESC
On Disk: 1.041 min | On Disk with Cache: 1.013 min | RAMDisk: 56.020 sec | In Memory: 3.469 min
RAMDisk Speedup: 1.11 | In Memory Speedup: -4.33

No. 12
Query: SELECT BigOrders.OrderID, Employees.EmployeeID, BigOrders.ShipCity FROM Employees INNER JOIN BigOrders ON Employees.EmployeeID = BigOrders.EmployeeID ORDER BY ShipCity ASC
On Disk: 51.047 sec | On Disk with Cache: 49.655 sec | RAMDisk: 44.788 sec | In Memory: 57.658 sec
RAMDisk Speedup: 1.14 | In Memory Speedup: -2.13

No. 13
Query: SELECT MAX(OrderID) AS lastorder FROM BigOrders GROUP BY CustomerID
On Disk: 36.659 sec | On Disk with Cache: 35.084 sec | RAMDisk: 30.701 sec | In Memory: 27.284 sec
RAMDisk Speedup: 1.19 | In Memory Speedup: 0.34

No. 14
Query: SELECT COUNT(*), MAX(Freight), EmployeeID FROM BigOrders WHERE ShippedDate IS NOT NULL GROUP BY EmployeeID HAVING MAX(Freight) >= 800 ORDER BY EmployeeID
On Disk: 39.756 sec | On Disk with Cache: 38.891 sec | RAMDisk: 34.648 sec | In Memory: 30.326 sec
RAMDisk Speedup: 1.15 | In Memory Speedup: 0.31

No. 15
Query: SELECT COUNT(*) AS OrderCount, strftime('%m', OrderDate) AS OrderMonth, strftime('%y', OrderDate) AS OrderYear FROM BigOrders GROUP BY OrderDate ORDER BY OrderDate
On Disk: 43.603 sec | On Disk with Cache: 42.619 sec | RAMDisk: 38.329 sec | In Memory: 2.232 min
RAMDisk Speedup: 1.14 | In Memory Speedup: -4.07

No. 16
Query: SELECT COUNT(*) AS OrderCount, strftime('%m', OrderDate) AS OrderMonth, strftime('%y', OrderDate) AS OrderYear FROM BigOrders GROUP BY strftime('%y', OrderDate), strftime('%m', OrderDate) ORDER BY strftime('%y', OrderDate), strftime('%m', OrderDate)
On Disk: 40.372 sec | On Disk with Cache: 40.014 sec | RAMDisk: 35.911 sec | In Memory: 31.855 sec
RAMDisk Speedup: 1.12 | In Memory Speedup: 0.27

No. 17
Query: CREATE TABLE "CopyOrders" ( OrderID INTEGER PRIMARY KEY AUTOINCREMENT, CustomerID varchar(5), EmployeeID int, OrderDate timestamp, RequiredDate timestamp, ShippedDate timestamp, ShipVia int, Freight float(26), ShipName varchar(40), ShipAddress varchar(60), ShipCity varchar(15), ShipRegion varchar(15), ShipPostalCode varchar(10), ShipCountry varchar(15))
On Disk: 86.005 msec | On Disk with Cache: 78.000 msec | RAMDisk: 0.000 microsec | In Memory: 0.000 microsec
RAMDisk Speedup: #DIV/0! | In Memory Speedup: #DIV/0!

No. 18
Query: INSERT INTO CopyOrders ( CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry) SELECT CustomerID, EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry FROM BigOrders
On Disk: 42.294 sec | On Disk with Cache: 45.334 sec | RAMDisk: 32.713 sec | In Memory: 20.046 sec
RAMDisk Speedup: 1.29 | In Memory Speedup: 1.11

No. 19
Query: UPDATE CopyOrders SET CustomerID="Test", EmployeeID = 100, ShipVia = 100, ShipName = "Test Ship Name" WHERE ShippedDate IS NOT NULL OR OrderDate IS NOT NULL
On Disk: 1.690 min | On Disk with Cache: 1.736 min | RAMDisk: 44.881 sec | In Memory: 23.338 sec
RAMDisk Speedup: 2.26 | In Memory Speedup: 3.34

No. 20
Query: DELETE from CopyOrders WHERE ShippedDate IS NULL OR OrderDate IS NOT NULL
On Disk: 1.119 min | On Disk with Cache: 1.138 min | RAMDisk: 27.066 sec | In Memory: 12.605 sec
RAMDisk Speedup: 2.48 | In Memory Speedup: 4.33

No. 21
Query: DROP TABLE CopyOrders
On Disk: 65.004 msec | On Disk with Cache: 46.800 msec | RAMDisk: 0.000 microsec | In Memory: 0.000 microsec
RAMDisk Speedup: #DIV/0! | In Memory Speedup: #DIV/0!

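Appendix B: Experiment Results on MovieLens dataset

Resources table: 10680 movies with ID, Title, URI, Description, Link, Type, GeneralTagFrequency, GeneralTagWeight, Author_ID fields.
Rate table: 250 000 ratings with ID, Rate (0-5), Movie_ID, USER_ID fields.

No. 1
Query: SELECT * FROM Resources
On Disk: 1.000 msec | In Memory: 0.000 microsec
On Disk normalized in msec: 1 | In Memory normalized in msec: 0
In Memory Speedup: #DIV/0!

No. 2
Query: SELECT * FROM Resources ORDER BY ID DESC
On Disk: 1.000 msec | In Memory: 326.019 msec
On Disk normalized in msec: 1 | In Memory normalized in msec: 326.019
In Memory Speedup: -327.02

No. 3
Query: SELECT ID,TITLE FROM Resources WHERE TYPE='Drama'
On Disk: 0.000 microsec | In Memory: 0.000 microsec
On Disk normalized in msec: 0 | In Memory normalized in msec: 0
In Memory Speedup: #DIV/0!

No. 4
Query: SELECT ID,TITLE FROM Resources WHERE TYPE='Drama' ORDER BY TITLE
On Disk: 57.003 msec | In Memory: 32.002 msec
On Disk normalized in msec: 57.003 | In Memory normalized in msec: 32.002
In Memory Speedup: 0.78

No. 5
Query: SELECT ID,TITLE,TYPE FROM Resources WHERE ID IS NOT NULL GROUP BY TYPE ORDER BY TITLE
On Disk: 241.014 msec | In Memory: 185.011 msec
On Disk normalized in msec: 241.014 | In Memory normalized in msec: 185.011
In Memory Speedup: 0.30

No. 6
Query: SELECT * FROM Rate
On Disk: 1.000 msec | In Memory: 0.000 microsec
On Disk normalized in msec: 1 | In Memory normalized in msec: 0
In Memory Speedup: #DIV/0!

No. 7
Query: SELECT * FROM Rate ORDER BY USER_ID
On Disk: 3.364 sec | In Memory: 3.142 sec
On Disk normalized in msec: 3364 | In Memory normalized in msec: 3142
In Memory Speedup: 0.07

No. 8
Query: SELECT * FROM Rate GROUP BY RATE
On Disk: 4.088 sec | In Memory: 4.222 sec
On Disk normalized in msec: 4088 | In Memory normalized in msec: 4222
In Memory Speedup: -2.03

No. 9
Query: SELECT DISTINCT Resources.ID, Resources.TITLE from Resources INNER JOIN Rate ON Resources.ID=Rate.MOVIE_ID
On Disk: 13.973 sec | In Memory: 33.126 min
On Disk normalized in msec: 13973 | In Memory normalized in msec: 1987560
In Memory Speedup: -143.24

No. 10
Query: SELECT Resources.ID, Resources.TITLE from Resources INNER JOIN Rate ON Resources.ID=Rate.MOVIE_ID AND Rate.RATE=5 ORDER BY Resources.TITLE
On Disk: 1.645 sec | In Memory: 45.418 min
On Disk normalized in msec: 1645 | In Memory normalized in msec: 2708880
In Memory Speedup: -1647.74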