Lecture Data Warehouse Systems


1 Lecture Data Warehouse Systems Eva Zangerle SS 2013

2 PART C: Novel Approaches Column-Stores

3 Horizontal/Vertical Partitioning [Figure labels: Master Table, Horizontal Partitions, Vertical Partitions, Primary Key]

4 Motivation Most relational database systems store data row by row. Disadvantage: OLAP systems such as data warehouse systems frequently want to read only a few columns of all rows. If the data is stored row by row, they have to read a lot of data that the query does not actually need.

5 Column Stores So: why not store data column by column? This is the idea behind column stores.

6 Sample OLAP Query Consider the OLAP query SELECT avg(totalprice) FROM Order. In a column store, only the totalprice column (marked red on the slide) must be read.
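The difference between the two layouts for this query can be illustrated with a minimal Python sketch. The table contents, the column names other than totalprice, and the "values touched" counter are assumptions made up for illustration; a real system reads disk blocks, not Python lists.

```python
# Hypothetical Order table with three attributes (all values are made up).
rows = [
    (1, "alice", 100.0),
    (2, "bob", 250.0),
    (3, "carol", 70.0),
]

def avg_totalprice_row_store(rows):
    """Row layout: every field of every row is read, although one is needed."""
    values_touched = 0
    total = 0.0
    for row in rows:
        values_touched += len(row)  # the whole row is fetched
        total += row[2]             # only totalprice is used
    return total / len(rows), values_touched

# Column layout: one list per attribute.
columns = {
    "orderkey": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "totalprice": [r[2] for r in rows],
}

def avg_totalprice_column_store(columns):
    """Column layout: only the totalprice column is read."""
    col = columns["totalprice"]
    return sum(col) / len(col), len(col)
```

Both functions return the same average, but the second one touches only a third of the values in this three-column example.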

7 Sample OLTP Query Of course, for an OLTP query like SELECT * FROM Order WHERE orderkey = ..., storing data row by row is better.

8 OLTP vs. OLAP OLAP queries are usually more exploratory: you do not know exactly in advance what an analyst wants to know, so optimizations such as introducing indices are rather difficult. They are longer lasting. They are more read-oriented than write-oriented: the data is typically produced in some OLTP system and then transferred to the OLAP system (= DWH) in batch runs. Running OLAP queries on a separate system is typically a good idea for performance reasons. They are more attribute focused than entity focused (e.g. calculate the sum of a column vs. read all columns of one specific row).

9 Building a Column Store: Approaches Approach 1: emulate a column store on top of a row store. Approach 2: use a row-oriented query executor on top of a column-oriented storage layer. Approach 3: use a column-oriented query executor on top of a column-oriented storage layer (column store).

10 Building a Column Store: Approach 1 Called Decomposition Storage Model (DSM) or vertical partitioning. Emulate a column store by splitting each table T into two-column tables (primary key, i-th column of T) and storing them in a row store. Easy to implement: any current relational database system can be used. An additional wrapper is necessary.
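The decomposition and the wrapper that hides it can be sketched as follows. This is a minimal illustration using Python dicts and lists in place of row-store tables; the function names `decompose` and `reconstruct` are hypothetical.

```python
def decompose(table, key, attributes):
    """Split a table (list of dict rows) into one two-column table
    (primary key, attribute value) per attribute."""
    return {
        attr: [(row[key], row[attr]) for row in table]
        for attr in attributes
    }

def reconstruct(partitions, key):
    """Wrapper: join the two-column tables on the primary key to
    rebuild the original rows, ordered by key."""
    rows = {}
    for attr, pairs in partitions.items():
        for k, value in pairs:
            rows.setdefault(k, {key: k})[attr] = value
    return [rows[k] for k in sorted(rows)]
```

Note that the primary key appears once in every two-column table, which is the source of the space overhead of this approach.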

11 Building a Column Store: Approach 1 The primary key column must be stored multiple times. There is overhead because of tuple headers (for each tuple, about 8 bytes of administrative information are stored). The result easily requires two or three times the disk space of the original table. In summary: easy to implement, but consumes a huge amount of both disk space and I/O bandwidth. Researchers tested this approach using the Star Schema Benchmark (SSBM) and reported a performance decrease instead of an increase.

12 Building a Column Store: Approach 2 Modify the storage layer of a conventional relational database system to store data column by column instead of row by row. The schema at the logical level is unchanged. At the storage level, data is stored column by column, with the tuple headers stored separately. When executing a query, the required data (subsets of columns) is fetched from the storage layer, tuples containing exactly the required columns are constructed (next slide), and finally a row-oriented query executor processes the query.

13 Building a Column Store: Approach 2 Constructing tuples from the individually stored columns: implicit column positions are used. Each tuple is assigned an implicit position i. In each column, the attribute value of tuple i is stored at the i-th position. The i-th tuple is constructed by taking the i-th value from each column.
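Tuple construction by implicit position can be sketched in a few lines of Python. The columns are modelled as plain lists of equal length and positions are 0-based here; both choices, and the function names, are assumptions for illustration.

```python
def tuple_at(columns, names, i):
    """Build the i-th tuple by taking the i-th value of each required column."""
    return tuple(columns[name][i] for name in names)

def construct_tuples(columns, names):
    """Build all tuples that contain exactly the required columns."""
    return list(zip(*(columns[name] for name in names)))
```

No key needs to be stored with the values: the position in the list is the only link between the columns of a tuple.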

14 Building a Column Store: Approach 2 Modify the storage layer of a conventional relational database system to store data column by column instead of row by row. In theory: one does not need to write a complex query executor containing a huge amount of optimization logic, but can use a mature conventional relational database system for this. In practice: integration into the existing database system can be difficult or even impossible.

15 Building a Column Store: Approach 3 Rewrite both the storage layer and the query executor from scratch. At the storage level, data is stored column by column, possibly redundantly for reasons of efficiency. The query executor works in a column-oriented fashion. Huge implementation effort, but many opportunities for optimization.

16 Comparing Column Store Architectures Outline: Benchmarks and why you want to use them. A concrete example: the Star Schema Benchmark. Materialized views: a row store approach and its limitations. Comparing the three approaches from 2.3 using the Star Schema Benchmark.

17 Benchmarks Comparing different implementations or algorithms. Goal: find out which implementation, algorithm, etc. performs best for solving some problem. Here: which kind of database system performs best for processing OLAP queries? To answer this question, one basically has to execute queries and measure the results. However: Which queries should one choose? Are they representative of the daily work with the system? Does the test setup miss some important case? What do the results tell us with respect to the measurements of other people? Solution: use a standardized benchmark.

18 The Star Schema Benchmark A data warehouse benchmark.

19 The Star Schema Benchmark Using the scale factor, different sizes of the data warehouse can be simulated. Contains 13 OLAP queries like
    SELECT SUM(o.extendedprice * o.discount) AS revenue
    FROM Order o, Date d
    WHERE o.date = d.datekey
      AND d.year = 1993
      AND o.discount BETWEEN 1 AND 3
      AND o.quantity < 25

20 The Star Schema Benchmark Contains 13 OLAP queries like
    SELECT d.year, s.nation, p.category,
           SUM(o.revenue - o.supplycost) AS profit1
    FROM date d, customer c, supplier s, part p, order o
    WHERE o.customerkey = c.customerkey
      AND o.supplierkey = s.supplierkey
      AND o.partkey = p.partkey
      AND o.date = d.datekey
      AND c.region = 'AMERICA'
      AND s.region = 'AMERICA'
      AND (d.year = 1997 OR d.year = 1998)
      AND (p.mfgr = 'MFGR#1' OR p.mfgr = 'MFGR#2')
    GROUP BY d.year, s.nation, p.category
    ORDER BY d.year ASC, s.nation ASC, p.category ASC

21 Materialized Views for OLAP Optimization for conventional relational databases: create materialized views containing only the columns needed for answering the expected queries. Of course, the original tables still exist. Advantage: no need to fetch data that is not needed for the query at hand; hence I/O is minimized. Disadvantage: knowledge about the expected queries is needed in advance. Remember: OLAP queries are often used for analysis, which can be quite a creative process. Nevertheless, measurements using this approach are useful as a reference when evaluating the column store approaches.

22 Comparing CS Approaches Remember: Approach 1: store data in two-column tables in a row store. Approach 2: row-oriented query executor on top of a column-oriented storage layer. Approach 3: column-oriented query executor on top of a column-oriented storage layer.

23 Comparing CS Approaches About the presented performance measurements: obtained using the SSBM benchmark; each was repeated several times. About the evaluated implementations: most OLAP database systems are commercial, so [1] chose one of them for evaluating both the materialized view approach and the Decomposition Storage Model approach. According to [1], mentioning which database system they chose was forbidden for license reasons. (Approach 1: two-column tables; approach 2: row-oriented query executor on top of a self-written column-oriented storage layer; approach 3: everything self-written.)

24 Comparing CS Approaches About the evaluated implementations: ideally, when comparing implementations, they should differ only in the change or improvement one actually wants to evaluate. Hence approaches 1 and 2 should share the query executor, and approaches 2 and 3 should share the storage layer. Here: a proprietary database system for approach 1, and a self-written implementation for approaches 2 and 3. (Approach 1: two-column tables; approach 2: row-oriented query executor on top of a self-written column-oriented storage layer; approach 3: everything self-written.)

25 Comparing CS Approaches About the evaluated implementations: approaches 1 and 2 did not share the query executor (optimized commercial version vs. basic self-written version). According to [1], additional experiments showed that the commercial version alone is about two to three times faster than the basic self-written version. In summary: not a perfect comparison. Keep this in mind when looking at the results.

26 Comparing CS Approaches Performance comparison using the SSBM between: Traditional: traditional row-oriented implementation. Materialized views on a row store. Approach 1: column store using two-column row store tables. Approach 2: row store query executor on top of a column store storage layer. Approach 3: column store, completely self-written.

27 C-Store C-Store: a column store implementation. Developed around 2005 as part of a dissertation. So: not a fully developed and optimized database system, but rather a proof of concept. Its successor is today a commercial database system.

28 C-Store Architecture Data is physically stored column by column. Users interact with a relational interface, using SQL. Each table is physically represented by a collection of projections. Projection: a subset of a table, containing all rows and some columns. There is one projection covering the whole table, as joining projections is slow. Each column can be part of any number of projections. Each projection has its own sort order, shared by all its columns.

29 C-Store Architecture Advantage: each column can be stored in multiple sort orders; the query optimizer can choose a projection based on the query at hand and the primary, secondary, etc. sort orders of the projections. Disadvantage: data is stored redundantly, updates are more expensive, and more space is required. But: OLAP systems are mostly read-oriented, and memory has become cheap. Here: new data arrives in a write store and is transferred to the read store e.g. once a day (batch updates).

30 C-Store Storage Layer Stores data in 64 KB blocks. Indices to blocks: if the column is sorted, a sparse index on the column value; always, a sparse index on the tuple ID. Although column stores rely heavily on full table scans, those indices are needed in some situations. Some operators can work on position lists (details later), e.g. the input data of such an operator may tell it to do something on the tuples with IDs 64, 332, 749, 1212, ... The indices are also needed when filtering for certain tuples.
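How a sparse index on the tuple ID serves a position list can be sketched as follows. This is a minimal illustration and not C-Store's actual implementation: the class name `SparseColumn` is hypothetical, blocks are small Python lists instead of 64 KB blocks, and tuple IDs are 0-based. The index stores only the first tuple ID of each block, and a binary search finds the block that holds a requested ID.

```python
from bisect import bisect_right

class SparseColumn:
    """Column stored as blocks plus a sparse index on the tuple ID."""

    def __init__(self, blocks):
        self.blocks = blocks
        # Sparse index: one entry per block, the ID of its first tuple.
        self.first_ids = []
        next_id = 0
        for block in blocks:
            self.first_ids.append(next_id)
            next_id += len(block)

    def fetch(self, tuple_id):
        """Locate the block via binary search, then read inside the block."""
        b = bisect_right(self.first_ids, tuple_id) - 1
        return self.blocks[b][tuple_id - self.first_ids[b]]

    def fetch_positions(self, positions):
        """Serve an operator that hands over a list of tuple IDs."""
        return [self.fetch(p) for p in positions]
```

The blocks deliberately hold different numbers of values, as they would after compression, so the block of a tuple ID cannot be computed by simple division.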

31 Compression Reduced disk I/O. Reduced amount of storage. Reduced seek time (better locality of data). Buffers may hold a larger amount of data.

32 Compression in Column Stores In row stores, dictionary-based schemes are often used; encoding multiple values of a column at a time is not easily possible. But: the values of a column are usually much more similar to each other than the different attribute values of a tuple. In column stores, more compression algorithms can be used, and their compression ratio is often higher, because columns hold similar data. Iterating over a page of values is easier and faster than iterating over a page of tuples. The sort order can be exploited. Operators working directly on compressed data are also possible.

33 Compression in Column Stores Note: the focus of compression in column stores is to maximize query performance, not to minimize storage size. Tradeoff: improved I/O performance vs. CPU cost for decompression.

34 Compression Schemes: Null Suppression Consecutive null values or blanks are deleted and replaced with a description of how many null values there were and where they occurred, e.g. store the number of bytes previously occupied by null values. Variable field size: a description of the size is required.

35 Compression Schemes: Run Length Encoding Replace runs of the same value by a compact description. E.g. given a sorted integer column, instead of storing every value of a run individually, store one triple (value, start position, run length) such as (6, 9, 10): at position 9, a sequence of 10 times 6 starts. However: for a run of length 1, three data items instead of one will be stored. So this is only useful if there actually are long runs of the same value.
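A minimal run-length encoder and decoder in Python, as a sketch. The triple layout (value, start position, run length) and the 1-based positions are assumptions chosen so that the slide's example "at position 9, a sequence of 10 times 6 starts" becomes the triple (6, 9, 10); the function names are hypothetical.

```python
def rle_encode(values, first_position=1):
    """Encode a column as triples (value, start position, run length)."""
    triples = []
    for offset, v in enumerate(values):
        if triples and triples[-1][0] == v:
            value, start, length = triples[-1]
            triples[-1] = (value, start, length + 1)  # extend the current run
        else:
            triples.append((v, first_position + offset, 1))  # start a new run
    return triples

def rle_decode(triples):
    """Expand the triples back into the full column."""
    values = []
    for value, _start, length in triples:
        values.extend([value] * length)
    return values
```

For a column of eight 5s, ten 6s and one 7, the encoder yields three triples for 19 values; the final run of length 1 shows the overhead case, where three data items replace one.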

36 Compression Schemes: Dictionary Encoding Replace frequent patterns by codes, e.g. replace the strings red, blue, yellow and green by the bit sequences 00, 01, 10, 11. Row stores: values from different tuples usually cannot be mixed. Column stores: mixing is possible, e.g. 00110111 represents the strings red, green, blue, green in the same column of four consecutive rows. Decompression uses bit shift operations. Algorithms like Huffman encoding are based on the frequency distribution of characters.
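Packing several dictionary codes of one column into a single integer, and unpacking them with bit shifts, can be sketched as follows. The code table is the one from the slide (red, blue, yellow, green mapped to 00, 01, 10, 11); the function names and the choice of a Python integer as the packed container are assumptions.

```python
# Dictionary from the slide: two bits per colour string.
CODES = {"red": 0b00, "blue": 0b01, "yellow": 0b10, "green": 0b11}
STRINGS = {code: s for s, code in CODES.items()}
BITS = 2
MASK = (1 << BITS) - 1

def pack(values):
    """Concatenate the codes of consecutive column values into one integer."""
    packed = 0
    for v in values:
        packed = (packed << BITS) | CODES[v]
    return packed

def unpack(packed, count):
    """Recover `count` values using shift and mask operations."""
    values = []
    for i in range(count):
        shift = BITS * (count - 1 - i)
        values.append(STRINGS[(packed >> shift) & MASK])
    return values
```

Packing red, green, blue, green gives the bit pattern 00110111, so four column values fit into a single byte.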

37 Compression Schemes: Bit-Vector Encoding Situation: there is a column with very few distinct values, e.g. containing only strings such as "Yes" and "No". Store a bit vector for each string, e.g. for the string sequence Yes Yes Yes No Yes No No No Unknown in subsequent rows of a column, store the bit vectors 111010000 (Yes), 000101110 (No) and 000000001 (Unknown). Performance in column stores is questionable: merging the bit vectors for complete decompression is expensive. Algorithms for further compression (e.g. of runs of the same value, as in run-length encoding) exist.
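Bit-vector encoding of the slide's example sequence can be sketched as follows. Bit vectors are represented as strings of "0" and "1" for readability, which is an assumption for illustration; a real system would use packed bits. The function names are hypothetical.

```python
def bitvector_encode(values):
    """Build one bit vector per distinct value; bit i is 1 if row i holds it."""
    vectors = {}
    for i, v in enumerate(values):
        vectors.setdefault(v, ["0"] * len(values))[i] = "1"
    return {v: "".join(bits) for v, bits in vectors.items()}

def bitvector_decode(vectors):
    """Merge all bit vectors to restore the column (expects at least one vector)."""
    length = len(next(iter(vectors.values())))
    values = [None] * length
    for v, bits in vectors.items():
        for i, bit in enumerate(bits):
            if bit == "1":
                values[i] = v
    return values
```

The decoder has to visit every bit of every vector, which illustrates why merging the vectors for complete decompression is expensive.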

38 Compression Schemes: Heavyweight Compression Schemes Algorithms like Lempel-Ziv; gzip is based on a Lempel-Ziv variant (LZ77). They use a possibly sophisticated algorithm to compress big blocks of data at a time. Goal: minimize storage size.

39 Operating on Position Lists A join in a column store can produce position lists instead of tuples (compression-aware).

40 Operating on Compressed Data Operating directly on compressed data is possible in some situations, e.g. for a join:
    IF column c1 is not compressed AND column c2 is RLE compressed
      FOR EACH VALUE valc1 WITH POSITION i IN c1 DO
        FOR EACH TRIPLE t WITH VAL v, STARTPOS j AND RUNLEN k IN c2 DO
          IF joinpredicate(valc1, v) THEN
            OUTPUT-LEFT: NEW RLE TRIPLE (NULL, i, k)
            OUTPUT-RIGHT: (j .. j+k-1)
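The nested loop above can be transcribed into runnable Python as a sketch. Several points are assumptions: positions are 0-based, the left output triple (None, i, k) is read as "position i of c1 matches k times", the right output is materialized as an explicit list of positions, and the function name is hypothetical.

```python
def join_uncompressed_with_rle(c1, c2_triples, join_predicate):
    """Join an uncompressed column c1 with an RLE-compressed column c2.

    c2_triples holds (value, start position, run length) triples.
    The predicate is evaluated once per run, not once per value of c2.
    """
    left, right = [], []
    for i, val_c1 in enumerate(c1):
        for v, j, k in c2_triples:
            if join_predicate(val_c1, v):
                left.append((None, i, k))          # position i, repeated k times
                right.append(list(range(j, j + k)))  # positions j .. j+k-1
    return left, right
```

For c1 = [7, 3] and c2 stored as the runs (3, 0, 2) and (7, 2, 3), an equality join evaluates the predicate four times instead of ten, because c2 is never decompressed.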

41 Different Compression Schemes How to deal with different compression algorithms at once? Solution: software design. Introduce compression blocks; a storage block contains any number of compression blocks. A compression block hides its contents and provides access only through the following methods:
  isOneValue: true if the block contains one value at one or more positions
  isValueSorted: true if the block is sorted
  isPosContig: true if the block holds a consecutive subset of a column
  getNext: iterator access, returns the next value
  asArray: decompresses the block and returns it as an array
  getSize: returns the number of values in the block
  getStartValue: returns the first value of the block
  getEndPosition: returns the position of the last value in the block
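How an operator can stay independent of the compression scheme by talking only to this interface can be sketched in Python. The class names, the snake_case method names, the (value, position) pair returned by `get_next`, and the `block_sum` operator are assumptions for illustration; `PlainBlock` implements only the methods that `block_sum` needs.

```python
class RLEBlock:
    """One run (value, start position, run length) behind the block interface."""

    def __init__(self, value, start_pos, run_len):
        self.value, self.start_pos, self.run_len = value, start_pos, run_len
        self._cursor = 0

    def is_one_value(self): return True
    def is_value_sorted(self): return True
    def is_pos_contig(self): return True

    def get_next(self):
        """Iterator access: next (value, position) pair, or None at the end."""
        if self._cursor >= self.run_len:
            return None
        pair = (self.value, self.start_pos + self._cursor)
        self._cursor += 1
        return pair

    def as_array(self): return [self.value] * self.run_len
    def get_size(self): return self.run_len
    def get_start_value(self): return self.value
    def get_end_position(self): return self.start_pos + self.run_len - 1


class PlainBlock:
    """Uncompressed block (only the methods used by block_sum)."""

    def __init__(self, values):
        self.values = list(values)

    def is_one_value(self): return len(set(self.values)) == 1
    def as_array(self): return list(self.values)
    def get_size(self): return len(self.values)
    def get_start_value(self): return self.values[0]


def block_sum(block):
    """Sum operator that never looks at the compression scheme itself."""
    if block.is_one_value():
        # Shortcut on compressed data: one multiplication, no decompression.
        return block.get_start_value() * block.get_size()
    return sum(block.as_array())
```

`block_sum` works for any block type that answers the property questions; a new compression scheme only has to provide the same methods.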

42 Compression: Summary According to [1], enabling compression can make a column store about two times faster. Different techniques: null suppression, run length encoding, dictionary encoding, bit-vector encoding, heavyweight compression schemes. The techniques are more or less useful in different situations, e.g. depending on sort order, distribution of data values, etc. The compression strategy may be chosen dynamically based on cost models, heuristics, etc., without explicit administration.

43 Materialization Strategies A column store stores data column by column, but users and applications usually expect row-oriented results. Hence: when during query execution should the tuples be constructed?

44 Early Materialization Query like SELECT a, b, c FROM T WHERE a < const1 AND b < const2 AND c < const3. Early materialization: construct tuples early and pass them between the operators.

45 Late Materialization Query like SELECT a, b, c FROM T WHERE a < const1 AND b < const2 AND c < const3. Late materialization: construct tuples as late as possible and operate on position information before that.
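Late materialization for a query of this shape can be sketched in Python. Each predicate is evaluated on its own column and yields a sorted position list, the lists are intersected, and tuples are constructed only for the surviving positions. Function names, 0-based positions and the merge-based intersection are assumptions for illustration; at least one predicate is expected.

```python
def positions_where(column, predicate):
    """Scan one column and return the sorted positions that satisfy the predicate."""
    return [i for i, v in enumerate(column) if predicate(v)]

def intersect_sorted(p, q):
    """Intersect two sorted position lists with a linear merge."""
    out, i, j = [], 0, 0
    while i < len(p) and j < len(q):
        if p[i] == q[j]:
            out.append(p[i])
            i += 1
            j += 1
        elif p[i] < q[j]:
            i += 1
        else:
            j += 1
    return out

def late_materialize(columns, predicates, output):
    """Filter on positions first; build tuples only for qualifying positions."""
    lists = [positions_where(columns[name], pred) for name, pred in predicates.items()]
    positions = lists[0]
    for other in lists[1:]:
        positions = intersect_sorted(positions, other)
    return [tuple(columns[name][i] for name in output) for i in positions]
```

Rows that fail any predicate are never turned into tuples; under early materialization they would have been constructed first and discarded afterwards.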

46 Early vs. Late Materialization Early materialization: certain column values may have to be accessed multiple times in a query plan, e.g. (1) filter for a predicate A < 7, (2) return column A. Early materialization has less overhead in such situations. However, thanks to caching, late materialization should not cause multiple disk accesses either. Late materialization: often, operating directly on positions or on compressed data is possible. Certain tuples need not be constructed at all (e.g. because they are discarded by predicates or consumed by aggregations).

47 Early vs. Late Materialization The choice between the strategies is not always obvious. Usually, late materialization is better for aggregation queries, queries with highly selective predicates, and queries on compressed data. Enabling late materialization made the SSBM benchmark queries about two to three times faster [1].

48 Results Dictionary compression achieves better results the more data is stored. In general, dictionary compression and LZ work best with regard to column storage size. For query performance directly on compressed data, RLE and dictionary compression work best.

49 Some Real World Systems Sybase IQ (commercial). Vertica, the commercial successor of C-Store. MonetDB (open source).

50 Summary Column stores store data column by column. Using them can speed up OLAP applications. Three possible alternatives: Decomposition Storage Model, replacing the storage layer of a row-oriented database system, rewriting from scratch. Implementing a column store: C-Store architecture, compression, early vs. late materialization.

51 Literature
[1] Daniel J. Abadi: Query execution in column-oriented database systems, Ph.D. thesis, Massachusetts Institute of Technology.
[2] M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik: C-Store: A column-oriented DBMS, in Proc. of the 31st Int. Conf. on Very Large Databases (VLDB 05), Trondheim, August 30 to September 2, 2005.
[3] Hasso Plattner: A common database approach for OLTP and OLAP using an in-memory column database, in Proc. of the 35th Int. Conf. on Management of Data, Providence, Rhode Island, USA, June 29 to July 2, 2009.
[4] Daniel Abadi, Samuel Madden, and Miguel Ferreira: Integrating compression and execution in column-oriented database systems, in Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD '06), ACM, New York, NY, USA.


More information

Storage in Database Systems. CMPSCI 445 Fall 2010

Storage in Database Systems. CMPSCI 445 Fall 2010 Storage in Database Systems CMPSCI 445 Fall 2010 1 Storage Topics Architecture and Overview Disks Buffer management Files of records 2 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query

More information

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski kbajda@cs.yale.edu

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski kbajda@cs.yale.edu Kamil Bajda-Pawlikowski kbajda@cs.yale.edu Querying RDF data stored in DBMS: SPARQL to SQL Conversion Yale University technical report #1409 ABSTRACT This paper discusses the design and implementation

More information

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html

Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the

More information

Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley

Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley Tiber Solutions Understanding the Current & Future Landscape of BI and Data Storage Jim Hadley Tiber Solutions Founded in 2005 to provide Business Intelligence / Data Warehousing / Big Data thought leadership

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Columnstore in SQL Server 2016

Columnstore in SQL Server 2016 Columnstore in SQL Server 2016 Niko Neugebauer 3 Sponsor Sessions at 11:30 Don t miss them, they might be getting distributing some awesome prizes! HP SolidQ Pyramid Analytics Also Raffle prizes at the

More information

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design

More information

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters

More information

Innovative technology for big data analytics

Innovative technology for big data analytics Technical white paper Innovative technology for big data analytics The HP Vertica Analytics Platform database provides price/performance, scalability, availability, and ease of administration Table of

More information

Data Warehousing With DB2 for z/os... Again!

Data Warehousing With DB2 for z/os... Again! Data Warehousing With DB2 for z/os... Again! By Willie Favero Decision support has always been in DB2 s genetic makeup; it s just been a bit recessive for a while. It s been evolving over time, so suggesting

More information

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 14: DATA STORAGE AND REPRESENTATION Data Storage Memory Hierarchy Disks Fields, Records, Blocks Variable-length

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information
