
DATA WAREHOUSE PHYSICAL DESIGN

The physical design of a data warehouse specifies:
- the low-level storage structures, e.g. partitions, underpinning the warehouse's logical table structures
- the low-level structures supporting efficient access paths to logical table data, e.g. bitmap indexes
- additional structures supporting more efficient query processing, e.g. materialized views.

Physical design decisions are driven by query performance and warehouse maintenance considerations.

The previous refresher note, Database Definition, introduced the basic physical storage structures supported by a mainstream relational DBMS, including tablespaces, segments, extents, block structures, B-tree indexes and clusters. This note introduces the additional constructs often found at the physical level in a data warehouse and their use in query processing.

LARGE TABLE SUPPORT - PARTITIONING

A table is a logical structure forming the basic unit of access in SQL. It has already been seen that an Oracle table is normally contained in a single tablespace, which is the basic unit of storage organization, but the stored rows may be held in multiple files. For very large tables and indexes, partitioning allows decomposition into smaller, more manageable structures. This enables:
- enhanced query optimization
- easier administration of tables/indexes
- increased availability.

Partitioning can be based on:
- ranges of values for a specific column or columns within a table, e.g. partitioning sales data by month
- an explicit list of values, e.g. partitioning inventory data by warehouse id
- hashing, e.g. to ensure an even distribution of data between partitions where there is no obvious range or list partitioning strategy.

Each partition of a table or index can often be treated as a separate object as well as part of the overall table/index. For example, consider a large table, with associated indexes, containing sales data which has been range partitioned with one partition for each month's data. Then:

- Queries involving only the most recent month's data need access only the single partition containing that data; the remaining partitions may be offline.
- Queries may use multiple processes to access partitions in parallel.
- Partitions for months no longer required can be easily dropped without affecting partitions holding more recent months' data.
- If the table needs reorganization at the physical level, perhaps to reduce the number of chained rows, this can be done on a partition-by-partition basis, requiring much less temporary storage.
- If an index needs to be rebuilt, the index partitions can be rebuilt one by one, again with much reduced overhead.

All of the above are likely to be beneficial in a warehousing application. Parallel queries, though, might reduce performance in OLTP applications: the extra overhead of initiating parallel queries may not be justified for short, fast transactions accessing few rows.
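The effect of range partitioning by month can be illustrated with a minimal in-memory sketch. It is not how Oracle implements partitions; the class RangePartitionedTable, its methods and the sample rows are hypothetical names and data introduced only for illustration.

```python
from datetime import date


def month_key(d: date) -> str:
    """Partition key for a sale date, e.g. '1998-02'."""
    return f"{d.year:04d}-{d.month:02d}"


class RangePartitionedTable:
    """Toy table holding one list of rows per month partition."""

    def __init__(self):
        self.partitions = {}  # month key -> list of rows

    def insert(self, row):
        self.partitions.setdefault(month_key(row["sale_date"]), []).append(row)

    def scan(self, months=None):
        """Return (rows, partitions touched); only the named partitions are read."""
        if months is None:
            keys = sorted(self.partitions)
        else:
            keys = [m for m in months if m in self.partitions]
        rows = [row for key in keys for row in self.partitions[key]]
        return rows, len(keys)

    def drop_partition(self, month):
        """Dropping an old month leaves the other partitions untouched."""
        self.partitions.pop(month, None)


sales = RangePartitionedTable()
sales.insert({"sale_date": date(1998, 1, 15), "amount": 10.0})
sales.insert({"sale_date": date(1998, 2, 3), "amount": 20.0})
sales.insert({"sale_date": date(1998, 2, 20), "amount": 5.0})

# A query on the most recent month reads a single partition.
recent_rows, touched = sales.scan(months=["1998-02"])
```

Here the query for February reads one partition rather than the whole table, and drop_partition removes a month without touching the rest.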

LARGE TABLE SUPPORT - PARALLELISM AND COMPRESSION

It has just been seen that partitioning provides one basis for exploiting parallelism in a warehouse. Striping data across multiple disks or RAID systems also allows parallelism to be exploited. This can be done whether or not tables are partitioned. Striping avoids I/O bottlenecks by allowing multiple controllers, I/O channels and internal buses to be used to increase the bandwidth of data movement to and from disk. Striping of partitioned tables/indexes can also be used to increase the availability of data: if each partition has its own set of disks/files and a disk fails, access to data in the remaining partitions is still possible.

Data segments may be compressed, reducing disk use and the buffer cache size needed, at the cost of extra CPU, particularly on data updates. Hence, compression is most suitable for read-only data, as often found in warehouses. Bitmap indexes, introduced below, also lend themselves to efficient compression techniques.

BITMAP INDEXES

Traditional B-tree indexes allow fast access to indexed rows, and can be updated efficiently as updates are made to the indexed rows. However, they can be very expensive in terms of storage space, particularly when multiple indexes are defined on a table to support ad hoc queries. Also, efficient updating may not be needed in a data warehouse whose data, once loaded, is not updated.

Bitmap indexes provide an alternative which can be more space efficient while providing efficient access for the ad hoc queries often required of a data warehouse. Bitmap indexes are most effective when the number of distinct values that a column can hold is small relative to the number of rows, e.g. < 1%. Such low cardinality certainly holds for a column such as gender which holds values M or F, but it equally holds for a column whose number of possible values approaches 1% of the rows in a table of 1 million rows.

In a bitmap index:
- A separate bitmap is held for each possible value of an indexed column in a table.
- Each position in the bitmap corresponds to a row in the table.
- The bit corresponding to a row is set to 1 in a bitmap if that row holds the value corresponding to the bitmap; otherwise the bit is set to 0.

For example, consider the customer table in the FoodMart database with three bitmap indexes on marital_status, gender and houseowner, with index entries shown for the rows for customers with ids from 1 to 4.

[Table: one row per customer_id (1 to 4), with one bitmap column for each of marital_status = M, marital_status = S, gender = M, gender = F, houseowner = Y, houseowner = N]

The bitmap indexes can be used to answer efficiently queries such as:

How many male married customers are there?
The marital_status = M and gender = M bitmaps can be ANDed together, producing bitmap 0001 for the four customer rows shown, and the number of bits set can be counted.

Retrieve customer records for customers who are either female houseowners or single males.
The gender = F and houseowner = Y bitmaps can be ANDed together, producing bitmap 1010 for the four customers shown. The gender = M and marital_status = S bitmaps can be ANDed together, producing bitmap 0100. Finally, the two ANDed bitmaps may be ORed, producing a final bitmap 1110 showing that the customers with ids 1 to 3 will be among those satisfying the query.
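The two queries above can be traced with a small sketch that builds the three bitmap indexes and combines them. The four customer rows are illustrative values chosen to be consistent with the bitmaps quoted in the text; they are an assumption, not actual FoodMart data, and the helper names are hypothetical.

```python
# Illustrative rows for customer_id 1..4 (assumed values, consistent with
# the bitmaps quoted in the text; not actual FoodMart data).
customers = [
    {"customer_id": 1, "marital_status": "M", "gender": "F", "houseowner": "Y"},
    {"customer_id": 2, "marital_status": "S", "gender": "M", "houseowner": "N"},
    {"customer_id": 3, "marital_status": "S", "gender": "F", "houseowner": "Y"},
    {"customer_id": 4, "marital_status": "M", "gender": "M", "houseowner": "Y"},
]


def build_bitmap_index(rows, column):
    """Return {value: bitmap}; each bitmap is a list of 0/1, one bit per row."""
    index = {}
    for pos, row in enumerate(rows):
        bitmap = index.setdefault(row[column], [0] * len(rows))
        bitmap[pos] = 1
    return index


def bit_and(a, b):
    return [x & y for x, y in zip(a, b)]


def bit_or(a, b):
    return [x | y for x, y in zip(a, b)]


def as_text(bitmap):
    return "".join(str(bit) for bit in bitmap)


marital = build_bitmap_index(customers, "marital_status")
gender = build_bitmap_index(customers, "gender")
owner = build_bitmap_index(customers, "houseowner")

# How many male married customers are there?
married_males = bit_and(marital["M"], gender["M"])
married_male_count = sum(married_males)

# Female houseowners or single males.
female_owners = bit_and(gender["F"], owner["Y"])
single_males = bit_and(gender["M"], marital["S"])
either = bit_or(female_owners, single_males)
matching_ids = [row["customer_id"] for row, bit in zip(customers, either) if bit]
```

Only the final bitmap is used to fetch rows; the intermediate AND and OR steps touch no table data.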

Bitmap indexes may also be used to support join processing. For example, consider a sales table including the following rows:

[Table: sales_id, customer_id]

Then a bitmap join index could be created on the sales table in respect of the related customer marital_status values:

[Table: sales_id, marital_status = M, marital_status = S]

Hence, the bitmap identifies the rows in sales which will join with rows in customer with a given value for marital_status.

The index would then efficiently support a query such as:

    SELECT S.SALES_ID, C.MARITAL_STATUS
    FROM SALES S, CUSTOMER C
    WHERE S.CUSTOMER_ID = C.CUSTOMER_ID

Oracle supports bitmap indexes such as those introduced above. For example, indexes could be created for the customer and sales tables as follows:

    CREATE BITMAP INDEX cust_marital_status_bx
    ON customer(marital_status);

    CREATE BITMAP INDEX sales_cust_marital_status_jbx
    ON sales(customer.marital_status)
    FROM sales, customer
    WHERE sales.customer_id = customer.customer_id;

While bitmap indexes work well for read-only warehouse data, they are not suited to tables with many updates, since a stored bitmap record referencing many rows will be locked for update to reflect a changed value in the indexed column.

So far, each possible value for a column has been represented by a separate bitmap. For columns which have more than a small number of values, holding a bitmap for each value may require too much space. Encoded bitmap indexes use schemes which encode the possible values of a column in more space-efficient ways than a separate bitmap for each possible value. Some encoded bitmap indexes can also efficiently support a wider range of queries than just exact match with a single value.

For example, consider a column which may contain a value between 1 and 6. A conventional bitmap index would have 6 separate bitmaps. An alternative approach would be to encode ranges of values <2, <3, <4, <5, <6, resulting in 5 bitmaps only. Such a scheme enables queries with a search condition < n to be answered by reference to a single bitmap, while still enabling exact match queries to be answered by reference to at most 2 bitmaps.

For example, a row with a value 4 for the column would have the corresponding bit set to 1 in the <5 and <6 bitmaps, with 0 set in the remainder. Hence, rows with a value 4 are those whose corresponding bit in the <4 bitmap is set to 0 while the corresponding bit in the <5 bitmap is set to 1.

The example above encodes ranges of values. Another approach encodes intervals rather than ranges, e.g. 1-3, 2-4, 3-5, 4-6, resulting in 4 bitmaps only. Now queries with a lower and upper bound on values are directly supported, while again exact match queries can be answered by reference to 2 bitmaps only. For example, rows with a value 4 are those whose corresponding bit in the 1-3 bitmap is set to 0 while the corresponding bit in the 2-4 bitmap is set to 1.
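Both encodings can be checked with a short sketch over a hypothetical column of values between 1 and 6. The function names and the sample column are assumptions made for illustration; a real index would hold compressed bit vectors rather than Python lists.

```python
values = [4, 1, 6, 3, 5, 2, 4]  # hypothetical column values, one per row


def range_encode(column, bounds=(2, 3, 4, 5, 6)):
    """One bitmap per bound n; bit i is 1 when column[i] < n."""
    return {n: [1 if v < n else 0 for v in column] for n in bounds}


def interval_encode(column, intervals=((1, 3), (2, 4), (3, 5), (4, 6))):
    """One bitmap per interval; bit i is 1 when lo <= column[i] <= hi."""
    return {(lo, hi): [1 if lo <= v <= hi else 0 for v in column]
            for lo, hi in intervals}


def and_not(excluded, included):
    """Rows whose bit is 0 in `excluded` and 1 in `included`."""
    return [(1 - a) & b for a, b in zip(excluded, included)]


range_bitmaps = range_encode(values)
interval_bitmaps = interval_encode(values)

# "value < 4" is answered by the single <4 bitmap.
less_than_4 = range_bitmaps[4]

# "value = 4" with range encoding: NOT(<4) AND (<5).
equals_4_range = and_not(range_bitmaps[4], range_bitmaps[5])

# "value = 4" with interval encoding: NOT(1-3) AND (2-4).
equals_4_interval = and_not(interval_bitmaps[(1, 3)], interval_bitmaps[(2, 4)])
```

Both exact-match bitmaps select the first and last rows, the two rows holding the value 4.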

If a bitmap index exists on each foreign key in the fact table of a star schema, the query optimizer can use a technique known as a star transformation or star join optimization. Consider the following query on the FoodMart schema:

    select t.week_of_year, p.product_id, p.product_name, s.store_city,
           sum(sf.store_sales) sales_sum
    from sales_fact_1998 sf, store s, product p, time_by_day t
    where sf.store_id = s.store_id
    and sf.product_id = p.product_id
    and sf.time_id = t.time_id
    and s.store_country = 'USA'
    and p.product_class_id = 94
    and t.quarter = 'Q1'
    group by t.week_of_year, p.product_id, p.product_name, s.store_city
    order by t.week_of_year, p.product_id, sales_sum desc, s.store_city

This query has a structure which is often found: an aggregate of a measure in a fact table is computed across a number of dimensions, with selection conditions being applied to the dimensions.

If a bitmap index exists on each of the foreign key columns store_id, product_id and time_id in the fact table sales_fact_1998, then the optimizer can optimize the query as follows:

1. Execute the selection conditions on the dimension tables

       s.store_country = 'USA'
       p.product_class_id = 94
       t.quarter = 'Q1'

   to identify the key values of s.store_id, p.product_id and t.time_id satisfying the selection conditions.
2. Access the bitmap indexes on the sf.store_id, sf.product_id and sf.time_id foreign keys in sales_fact_1998, which will identify the rows in the fact table with matching foreign key values in each case.
3. Intersect the bitmaps to identify those rows in the fact table which have matching key values in all of the dimension tables involved in the selection conditions.
4. Join only those matching rows in the fact table with rows in the dimension tables.
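The steps above can be sketched on a miniature star schema. All table contents below are made-up illustration data, the helper names are hypothetical, and the sketch only mirrors the logic of the transformation; it does not reproduce Oracle's optimizer.

```python
# Hypothetical miniature star schema; every row is made-up illustration data.
store = {1: "USA", 2: "Canada", 3: "USA"}   # store_id -> store_country
product = {10: 94, 11: 12, 12: 94}          # product_id -> product_class_id
time_by_day = {100: "Q1", 101: "Q2"}        # time_id -> quarter
sales_fact = [                              # (store_id, product_id, time_id, store_sales)
    (1, 10, 100, 5.0),
    (2, 10, 100, 7.0),
    (3, 11, 100, 2.0),
    (3, 12, 101, 4.0),
    (3, 12, 100, 9.0),
]


def fk_bitmap_index(rows, col):
    """One bitmap per foreign key value, held as an int (bit i = fact row i)."""
    index = {}
    for i, row in enumerate(rows):
        index[row[col]] = index.get(row[col], 0) | (1 << i)
    return index


def rows_for_keys(index, keys):
    """OR together the bitmaps of all qualifying key values."""
    bitmap = 0
    for key in keys:
        bitmap |= index.get(key, 0)
    return bitmap


store_idx = fk_bitmap_index(sales_fact, 0)
product_idx = fk_bitmap_index(sales_fact, 1)
time_idx = fk_bitmap_index(sales_fact, 2)

# Step 1: apply the selection conditions to the dimension tables.
store_keys = [k for k, country in store.items() if country == "USA"]
product_keys = [k for k, cls in product.items() if cls == 94]
time_keys = [k for k, quarter in time_by_day.items() if quarter == "Q1"]

# Steps 2 and 3: look up the foreign key bitmaps and intersect them.
candidates = (rows_for_keys(store_idx, store_keys)
              & rows_for_keys(product_idx, product_keys)
              & rows_for_keys(time_idx, time_keys))

# Step 4: only the surviving fact rows go on to be joined and aggregated.
selected = [sales_fact[i] for i in range(len(sales_fact)) if (candidates >> i) & 1]
total_sales = sum(row[3] for row in selected)
```

Only two of the five fact rows survive the intersection, so the join with the dimension tables works on a much smaller input.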

Another use of a bitmap is in a Bloom filter, which tests whether an element with a particular value is a member of a set. Compared with other data structures such as search trees and conventional hash tables, Bloom filters are more space efficient since the value itself is not stored. Also, the structure can represent a set with any number of elements.

In the context of data warehouses, a Bloom filter is used in some architectures to test whether data with a particular key value is stored within the warehouse. If not, unnecessary disk accesses for non-existent data can be avoided. This is particularly valuable in warehouse architectures built on distributed file systems, where access to a remote node is thereby avoided.

The Bloom filter algorithm uses a number of different hash functions: we assume 3 in the example below, h1, h2 and h3. Also, a bit vector is used, which initially has all elements set to 0.

If a data item with key k is inserted into the warehouse, each hash function h1, h2, h3 is applied to k, giving 3 results, each of which identifies a position in the bit vector. The bits in those positions are set to 1 if currently 0.

[Figure: bit vector with the positions h1(k), h2(k) and h3(k) set to 1]

To test whether a data item with key k exists in the warehouse, each hash function is applied to k. If any of the bit positions identified is 0, then the data item cannot exist in the warehouse.

The approach may lead to false positives: all bit positions being 1 for a key value does not guarantee that the data item exists in the warehouse. It does not lead to false negatives, however: if any bit position is 0, then the data item does not exist in the warehouse. As the size of the bit vector increases, the probability of a false positive decreases. For a given number of data items and bit vector size, the number of hash functions needed to minimize the probability of false positives can be calculated.
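A minimal Bloom filter along the lines described above might look as follows. The class and function names are hypothetical, and deriving h1, h2, h3 by salting SHA-256 with the function number is an assumption made for the sketch; the slides do not say which hash functions are used. The optimal_num_hashes helper uses the standard result that, for m bits and n items, about (m / n) ln 2 hash functions minimize the false positive probability.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # bit vector, initially all 0

    def _positions(self, key):
        # Assumed scheme: h1, h2, h3 are SHA-256 salted with the function number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))


def optimal_num_hashes(num_bits, num_items):
    """Number of hash functions minimizing false positives: (m / n) * ln 2."""
    return max(1, round(num_bits / num_items * math.log(2)))


bloom = BloomFilter(num_bits=1024)
for key in ("cust-1", "cust-2", "cust-3"):
    bloom.add(key)
```

A lookup that returns False lets the warehouse skip the disk or remote-node access entirely; a lookup that returns True still has to be confirmed against the stored data.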

MATERIALIZED VIEWS

Materialized view tables have already been introduced as a logical data warehouse structure. Since their purpose is to increase query performance, and they may be transparent to users of a warehouse, they are perhaps better considered along with indexes as structures to support efficient access paths. Although users of a warehouse may access materialized view tables directly, they are often used for query rewrite by the query optimizer without explicit reference to them in SQL queries.

For example, consider the FoodMart sales_fact_1998 table. Assume that the sales performance of stores needs to be analyzed frequently, resulting in queries like:

    select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
    from sales_fact_1998
    group by store_id, time_id;

    Query Plan
    SELECT STATEMENT Cost = 396
      SORT GROUP BY
        TABLE ACCESS FULL SALES_FACT_1998

On a large fact table, performance for this query may be unacceptable. A materialized view to support this query could be created in Oracle as follows:

    CREATE MATERIALIZED VIEW sales_group_store_1998
    build immediate
    refresh on demand
    enable query rewrite
    as
    select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
    from sales_fact_1998
    group by store_id, time_id;

The materialized view definition specifies that:
- The materialized view should be populated from the underlying base table immediately: build immediate. The alternative parameter build deferred indicates that the materialized view will be populated by the next REFRESH operation.

- Modifications (updates/inserts/deletes) to the underlying base table are propagated to the materialized view by running a system procedure when required: refresh on demand. The alternative parameters include:
    complete: the defining query of the materialized view is executed.
    fast: an incremental refresh is performed which takes account of changes within the underlying tables.
    force: a fast refresh is performed if possible, otherwise a complete refresh.
    on commit: a fast refresh is to occur whenever the database commits a transaction that operates on an underlying table.
- The materialized view can be used for query rewrite: enable query rewrite. This is explained further below.

The materialized view can be used directly in a query, for example:

    select *
    from sales_group_store_1998
    where store_id between 1 and 20;

    Query Plan
    SELECT STATEMENT Cost = 2
      TABLE ACCESS FULL SALES_GROUP_STORE_1998

The corresponding query on the base table would be:

    select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
    from fm_admin.sales_fact_1998
    where store_id between 1 and 20
    group by store_id, time_id;

    Query Plan
    SELECT STATEMENT Cost = 413
      SORT GROUP BY
        TABLE ACCESS FULL SALES_FACT_1998

With query rewrite enabled, the materialized view may also be used transparently:

    ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
    ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED;

    select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
    from fm_admin.sales_fact_1998
    where store_id between 1 and 20
    group by store_id, time_id;

    Query Plan
    SELECT STATEMENT Cost = 2
      TABLE ACCESS FULL SALES_GROUP_STORE_1998

Query rewrite has been enabled with:

    ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE

The statement

    ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED

controls how Oracle rewrites queries. In this case, ENFORCED ensures that queries are only rewritten using constructs which Oracle itself enforces and which are thereby guaranteed correct, such as the materialized view in the example.

Alternative parameters to ALTER SESSION SET QUERY_REWRITE_INTEGRITY are:
- TRUSTED: This additionally allows Oracle to use constructs which are user-specified. For example, a pre-existing table may be specified as a materialized view. A value of TRUSTED would enable such a materialized view to be used in query rewrite, Oracle trusting that the pre-built table contains correct data for the specified view.
- STALE_TOLERATED: This additionally allows Oracle to use a materialized view even if the contents are out of synchronization with the source tables. This might be acceptable for some applications.
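The ideas of a precomputed aggregate, refresh on demand and stale contents can be illustrated with Python's built-in sqlite3 module. SQLite has no materialized views or query rewrite, so an ordinary table stands in for the materialized view here; the refresh_summary function and the sample rows are assumptions made for the sketch, and only one measure column is kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales_fact_1998 (store_id int, time_id int, store_sales real);
    insert into sales_fact_1998 values (1, 100, 5.0), (1, 100, 7.5), (2, 100, 3.0);
""")

SUMMARY_QUERY = """
    select store_id, time_id, sum(store_sales)
    from sales_fact_1998
    group by store_id, time_id
    order by store_id, time_id
"""
READ_SUMMARY = "select * from sales_group_store_1998 order by 1, 2"


def refresh_summary(connection):
    """Complete refresh: re-run the defining query into the summary table."""
    connection.execute("drop table if exists sales_group_store_1998")
    connection.execute("create table sales_group_store_1998 as " + SUMMARY_QUERY)


refresh_summary(conn)  # plays the role of 'build immediate'
from_base = conn.execute(SUMMARY_QUERY).fetchall()
from_summary = conn.execute(READ_SUMMARY).fetchall()

# A new base row leaves the summary stale until the next on-demand refresh.
conn.execute("insert into sales_fact_1998 values (2, 100, 4.0)")
stale = conn.execute(READ_SUMMARY).fetchall()

refresh_summary(conn)
fresh = conn.execute(READ_SUMMARY).fetchall()
```

Between the insert and the second refresh, the summary still reports the old total for store 2, which is the situation the STALE_TOLERATED setting chooses to accept.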

DIMENSIONS

Dimensions have already been introduced as a logical design construct in multidimensional models. If dimensions are explicitly created, they may give performance advantages through additional optimizations being possible with query rewrite. They may also improve the performance of materialized view refresh operations. For example, consider the following dimension created with Oracle:

    create dimension cust_dim
    level region is customer.customer_region_id
    level province is customer.state_province
    level country is customer.country
    hierarchy geog_rollup (
      region child of
      province child of
      country )

In this example, the 1:n hierarchical relationship between attributes in the customer table is defined. Dimensions may also be defined in respect of 1:n relationships in multiple tables, as well as relationships between a hierarchy level and other functionally dependent attributes.

With such a dimension, a materialized view is not needed for each level in the hierarchy for efficient evaluation of an aggregation at that level. For example, a materialized view aggregating at the region level may still be used in the evaluation of a query aggregating at the province or country level, given the definition of cust_dim.
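The roll-up that the dimension makes possible can be sketched as follows. The region, province and country values and the region-level totals are hypothetical; the point is only that, once the 1:n hierarchy is known, province and country totals follow from the region-level aggregate without returning to the fact table.

```python
from collections import defaultdict

# Hypothetical hierarchy: region -> province -> country (each mapping is n:1).
region_to_province = {101: "BC", 102: "BC", 201: "WA"}
province_to_country = {"BC": "Canada", "WA": "USA"}

# Stand-in for a materialized view aggregated at the region level.
sales_by_region = {101: 10.0, 102: 5.0, 201: 8.0}


def roll_up(totals, child_to_parent):
    """Aggregate child-level totals to the parent level of the hierarchy."""
    rolled = defaultdict(float)
    for child, total in totals.items():
        rolled[child_to_parent[child]] += total
    return dict(rolled)


sales_by_province = roll_up(sales_by_region, region_to_province)
sales_by_country = roll_up(sales_by_province, province_to_country)
```

This works for additive measures such as sums; it relies on each region belonging to exactly one province and each province to exactly one country, which is what the dimension declares.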

COLUMN STORES

Relational DBMS implementations have traditionally been row-oriented: at the storage level, the logical rows of tables are stored as individual records, or as chained records if the record representing an entire row is too large to fit in a page. Alternative architectures have been researched for many years in which columns rather than rows are the primary basis of physical organization. These column stores have become of particular interest in recent years for data warehouse applications.

Since many data warehousing applications require the analysis of data in just some of the columns of a table, column stores are likely to have performance advantages over row stores even without sophisticated implementation techniques, since it is not necessary to retrieve records representing all the columns in a table. Relational DBMS vendors increasingly incorporate column store technology within their own architectures to enhance performance with data warehouse analytics workloads.

The advantages can be even greater if the following optimization techniques are incorporated in the storage and query execution levels of column store implementations.

Storing columns rather than rows of data makes compression techniques more effective, allowing the values of a column in a table with very many rows to be stored efficiently. For example, if a column is sorted and the same value repeats many times, it can be stored as a value and the number of repeats rather than the value being stored multiple times.

Whether sorted or not, a column of a table can be stored conceptually as an identifier for the row together with an encoding of the value of that column in that row. A fixed width may be used for both the row identifier and the encoded value, which means the row identifier need not be physically stored at all: the storage position of the value for a row may be computed as an offset.
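Both techniques can be shown in a few lines. The sketch below run-length encodes a sorted column and, separately, packs a column into fixed-width slots so that a row's value is found by offset alone. The column contents and function names are hypothetical, and 4-byte little-endian integers are an arbitrary choice for the fixed width.

```python
import struct
from itertools import groupby


def rle_encode(sorted_column):
    """Store each run as (value, repeat count) instead of repeating the value."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(sorted_column)]


def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]


country = ["Canada"] * 3 + ["Mexico"] * 2 + ["USA"] * 4
runs = rle_encode(country)

# Fixed-width storage: no row identifier is stored with the values.
WIDTH = struct.calcsize("<i")  # 4 bytes per value
unit_sales = [3, 1, 4, 1, 5]
packed = struct.pack(f"<{len(unit_sales)}i", *unit_sales)


def value_at(row_position):
    """Locate a row's value purely by computing its byte offset."""
    (value,) = struct.unpack_from("<i", packed, row_position * WIDTH)
    return value
```

Nine stored strings shrink to three (value, count) pairs, and value_at needs no lookup structure because the position of row i is simply i times the width.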

The performance advantages of compression are enhanced if the compressed data can be operated on by the query execution engine without first being decompressed: this can be done in many cases with late materialization techniques. For example, if values are represented as offsets in columns, these can be manipulated efficiently using bit-level representations, as with the bitmap indexes seen earlier. Hence:
- The construction of tuples may be avoided in many cases, with only the columns required in the final result materialized at a late stage in the query execution.
- The need to decompress anything other than the final result data may be avoided.
- Cache memory may be used efficiently given the compact bit-level representation of values.
- Block iteration query processing is possible, in which blocks of values in columns are processed by an operator in a single function call, enabling efficient parallel execution using pipelining techniques.

Storage structures have also been developed to scale to very large amounts of data processed on commodity hardware. An example of this is HDFS (Hadoop Distributed File System), which has been developed as part of the Apache Hadoop project. HDFS is designed to be:
- Scalable, to store very large amounts of data.
- Economical, by utilising clusters of nodes running on commodity hardware with heterogeneous operating systems: any machine which runs Java can run HDFS.
- Fault tolerant, with automatic recovery in the presence of failed nodes.

The HDFS architecture supports interconnected clusters, each of which consists of:
- Multiple DataNodes which store data in blocks in files.
- A single NameNode that manages the file system namespace and maintains mappings of file blocks to DataNodes.

HDFS supports operations to create and delete directories, and to create, read, write and delete files within directories. A user application does not need to be aware of the distributed nature of the architecture.

HDFS replicates data blocks for fault tolerance: typically HDFS clusters are spread across multiple hardware racks, and a NameNode aims to place replicated blocks on multiple racks.

When a user application reads a file, the HDFS client asks the NameNode for a list of DataNodes that hold the replicas of the file's blocks. It then requests transfer of the desired block from the DataNode closest to the reader.

During normal operation, each DataNode sends periodic (default 3 seconds) messages called heartbeats to the NameNode to confirm availability. If the NameNode does not receive a heartbeat from a DataNode in 10 minutes, it considers the DataNode to have failed and schedules creation of the unavailable block replicas at other DataNodes.

When a user application writes to a file, the HDFS client caches data in a temporary local file which only gets written to the HDFS file when a block is filled. The client flushes the block to one DataNode, which itself flushes the block to the next DataNode holding replicas, in a pipeline fashion.
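The heartbeat-based failure detection described above can be sketched as a small bookkeeping class. This is not HDFS code: NameNodeMonitor, its methods and the target of 3 replicas per block are assumptions made for illustration, and timestamps are passed in explicitly (in seconds) so that the behaviour is easy to follow.

```python
HEARTBEAT_INTERVAL_S = 3   # default heartbeat period quoted in the text
DEAD_AFTER_S = 10 * 60     # silence after which a DataNode is considered failed


class NameNodeMonitor:
    """Toy bookkeeping of heartbeats and block locations (not real HDFS code)."""

    def __init__(self):
        self.last_heartbeat = {}   # datanode id -> time of last heartbeat
        self.block_locations = {}  # block id -> set of datanode ids

    def heartbeat(self, datanode, now):
        self.last_heartbeat[datanode] = now

    def add_replica(self, block, datanode):
        self.block_locations.setdefault(block, set()).add(datanode)

    def failed_datanodes(self, now):
        return {node for node, seen in self.last_heartbeat.items()
                if now - seen > DEAD_AFTER_S}

    def under_replicated(self, now, target=3):
        """Blocks needing new replicas: {block id: number of replicas to create}."""
        dead = self.failed_datanodes(now)
        shortfall = {}
        for block, nodes in self.block_locations.items():
            live = len(nodes - dead)
            if live < target:
                shortfall[block] = target - live
        return shortfall


monitor = NameNodeMonitor()
for node in ("dn1", "dn2", "dn3"):
    monitor.heartbeat(node, now=0)
    monitor.add_replica("blk_1", node)

# dn1 and dn2 keep reporting; dn3 has been silent for more than 10 minutes.
monitor.heartbeat("dn1", now=600)
monitor.heartbeat("dn2", now=600)
to_recreate = monitor.under_replicated(now=603)
```

At time 603 the block has only two live replicas, so one new replica would be scheduled on another DataNode.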

READING

D J Abadi, S R Madden, N Hachem, Column-Stores vs. Row-Stores: How Different Are They Really?, Proc SIGMOD '08, June (Section 6 optional.)
P-Å Larson et al., Enhancements to SQL Server Column Stores, Proc SIGMOD '13, June (Section 5 optional.)
J Jeffrey Hanson, An Introduction to the Hadoop Distributed File System, IBM developerWorks, February

FOR REFERENCE

Oracle Database Data Warehousing Guide, Part I Data Warehouse Fundamentals (Chapters 3-4), Part II Optimizing Data Warehouses (Chapters 5-6, 9-11).


More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

Hadoop Distributed File System. Jordan Prosch, Matt Kipps Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information
