DATA WAREHOUSE PHYSICAL DESIGN

The physical design of a data warehouse specifies:

- low-level storage structures, e.g. partitions, underpinning the warehouse logical table structures
- low-level structures supporting efficient access paths to logical table data, e.g. bitmap indexes
- additional structures supporting more efficient query processing, e.g. materialized views.

Physical design decisions are driven by query performance and warehouse maintenance considerations.

The previous refresher note, Database Definition, introduced the basic physical storage structures supported by a mainstream relational DBMS, including tablespaces, segments, extents, block structures, B-tree indexes and clusters. This note introduces the additional constructs often found at the physical level in a data warehouse and their use in query processing.

LARGE TABLE SUPPORT - PARTITIONING

A table is a logical structure forming the basic unit of access in SQL. It has already been seen that an Oracle table is normally contained in a single tablespace, which is the basic unit of storage organization, but the stored rows may be held in multiple files. For very large tables and indexes, partitioning allows decomposition into smaller, more manageable structures. This enables:

- enhanced query optimization
- easier administration of tables/indexes
- increased availability.

Partitioning can be on the basis of:

- ranges of values for a specific column or columns within a table, e.g. partitioning sales data by month
- an explicit list of values, e.g. partitioning inventory data by warehouse id
- hashing, e.g. to ensure an even distribution of data between partitions where there is no obvious range or list partitioning strategy.

Each partition of a table or index can often be treated as a separate object as well as part of the overall table/index. For example, consider a large table and associated indexes containing sales data which has been range partitioned with one partition for each month's data. Then:

- Queries involving only the most recent month's data need access only the single partition containing that data; the remaining partitions may be offline.
- Queries may use multiple processes to access partitions in parallel.
- Partitions for months no longer required can easily be dropped without affecting partitions holding more recent months' data.
- If the table needs reorganization at the physical level, perhaps to reduce the number of chained rows, this can be done partition by partition, requiring much less temporary storage.
- If an index needs to be rebuilt, each index partition can be rebuilt one by one, again with much reduced overhead.

All of the above are likely to be beneficial in a warehousing application. Parallel queries, though, might reduce performance in OLTP applications: the extra overhead of initiating parallel queries may not be justified for short, fast transactions accessing few rows.
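To make these benefits concrete, here is a minimal Python sketch, not DBMS internals, of a sales table range-partitioned by month: a query for one month scans a single partition, and an obsolete month is dropped without touching the rest of the table. All names are illustrative.

    from collections import defaultdict

    # Toy range-partitioned "sales" table: one partition per month ("YYYY-MM").
    # A real DBMS stores partitions as separate physical segments; a dict of
    # lists is enough to show partition pruning and cheap partition drops.
    partitions = defaultdict(list)

    def insert(sale_date, row):
        partitions[sale_date[:7]].append(row)   # route the row to its month partition

    def query_month(month):
        # Partition pruning: only the single partition for 'month' is scanned;
        # all other partitions could be offline without affecting this query.
        return partitions.get(month, [])

    def drop_month(month):
        # Dropping obsolete data is an operation on one partition,
        # not a DELETE that touches every row of the table.
        partitions.pop(month, None)

    insert("2023-01-15", {"sales_id": 1, "amount": 10.0})
    insert("2023-02-03", {"sales_id": 2, "amount": 25.0})
    print(query_month("2023-02"))   # scans only the February partition
    drop_month("2023-01")           # January's data disappears in one step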

LARGE TABLE SUPPORT - PARALLELISM AND COMPRESSION

It has just been seen that partitioning provides one basis for exploiting parallelism in a warehouse. Striping data across multiple disks or RAID systems also allows parallelism to be exploited, whether or not tables are partitioned. Striping avoids I/O bottlenecks by allowing multiple controllers, I/O channels and internal buses to be used, increasing the bandwidth of data movement to and from disk. Striping of partitioned tables/indexes can also be used to support increased availability of data: if each partition has its own set of disks/files and a disk fails, access to data in the remaining partitions is still possible.

Data segments may be compressed, enabling reduced disk use and smaller buffer cache sizes at the cost of extra CPU, particularly on data updates. Hence, compression is most suitable for read-only data as often found in warehouses. Bitmap indexes, introduced below, also lend themselves to efficient compression techniques.

BITMAP INDEXES

Traditional B-tree indexes allow fast access to indexed rows, and can be updated efficiently as updates are made to the indexed rows. However, they can be very expensive in terms of storage space, particularly when multiple indexes are defined on a table to support ad hoc queries. Also, efficient updating may not be needed in a data warehouse whose data, once loaded, is not updated. Bitmap indexes provide an alternative which can be more space efficient while providing efficient access for the ad hoc queries often required of a data warehouse. Bitmap indexes are most effective when the number of distinct values that a column can hold is relatively small compared to the number of rows, e.g. < 1%. Such low cardinality certainly holds for a column such as gender which holds values M or F, but it equally holds for a column with close to 10,000 possible values in a table of 1 million rows.

In a bitmap index:

- A separate bitmap is held for each possible value of an indexed column in a table.
- Each position in the bitmap corresponds to a row in the table.
- The bit corresponding to a row is set to 1 in a bitmap if that row holds the value corresponding to the bitmap; otherwise the bit is set to 0.

For example, consider the customer table in the FoodMart database with three bitmap indexes on marital_status, gender and houseowner, and index entries shown in respect of the rows for customers with ids 1 to 4:

customer_id | marital_status=M | marital_status=S | gender=M | gender=F | houseowner=Y | houseowner=N
     1      |        1         |        0         |    0     |    1     |      1       |      0
     2      |        0         |        1         |    1     |    0     |      0       |      1
     3      |        1         |        0         |    0     |    1     |      1       |      0
     4      |        1         |        0         |    1     |    0     |      0       |      1

The bitmap indexes can be used to efficiently answer queries such as:

- How many male married customers are there? The marital_status=M and gender=M bitmaps can be ANDed together, producing bitmap 0001 for the four customer rows shown, and the number of set bits counted.
- Retrieve customer records for customers who are either female houseowners or single males. The gender=F and houseowner=Y bitmaps can be ANDed together, producing bitmap 1010 for the four customers shown. The gender=M and marital_status=S bitmaps can be ANDed together, producing bitmap 0100. Finally, the two ANDed bitmaps can be ORed, producing a final bitmap 1110 showing that the customers with ids 1 to 3 will be among those satisfying the query.
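These operations are plain bitwise AND/OR over the stored bitmaps. A minimal Python sketch using the six bitmaps from the customer example above (the helper names band and bor are illustrative):

    # Bitmaps from the customer example, one bit per row (customers 1..4).
    bitmaps = {
        ("marital_status", "M"): [1, 0, 1, 1],
        ("marital_status", "S"): [0, 1, 0, 0],
        ("gender", "M"):         [0, 1, 0, 1],
        ("gender", "F"):         [1, 0, 1, 0],
        ("houseowner", "Y"):     [1, 0, 1, 0],
        ("houseowner", "N"):     [0, 1, 0, 1],
    }

    def band(a, b):  # bitwise AND of two bitmaps
        return [x & y for x, y in zip(a, b)]

    def bor(a, b):   # bitwise OR of two bitmaps
        return [x | y for x, y in zip(a, b)]

    # How many married male customers? AND the bitmaps, then count set bits.
    married_males = band(bitmaps[("marital_status", "M")], bitmaps[("gender", "M")])
    print(married_males, sum(married_males))      # [0, 0, 0, 1] 1

    # Female houseowners OR single males: two ANDs, then an OR.
    female_owners = band(bitmaps[("gender", "F")], bitmaps[("houseowner", "Y")])
    single_males  = band(bitmaps[("gender", "M")], bitmaps[("marital_status", "S")])
    print(bor(female_owners, single_males))       # [1, 1, 1, 0] -> customers 1 to 3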

Bitmap indexes may also be used to support join processing. For example, consider a sales table including the following rows:

sales_id | customer_id
    1    |      2
    2    |      3
    3    |      3
    4    |      1

Then a bitmap join index could be created on the sales table in respect of related customer marital_status values:

sales_id | marital_status=M | marital_status=S
    1    |        0         |        1
    2    |        1         |        0
    3    |        1         |        0
    4    |        1         |        0

Hence, the bitmap identifies the rows in sales which will join with rows in customer having a given value for marital_status.

The index would then efficiently support a query such as:

SELECT S.SALES_ID, C.MARITAL_STATUS
FROM SALES S, CUSTOMER C
WHERE S.CUSTOMER_ID = C.CUSTOMER_ID;

Oracle supports bitmap indexes such as those introduced above. For example, indexes could be created for the customer and sales tables as follows:

CREATE BITMAP INDEX cust_marital_status_bx
ON customer(marital_status);

CREATE BITMAP INDEX sales_cust_marital_status_jbx
ON sales(customer.marital_status)
FROM sales, customer
WHERE sales.customer_id = customer.customer_id;

While bitmap indexes work well for read-only warehouse data, they are not suited to tables with many updates, since a stored bitmap record referencing many rows must be locked for update to reflect a changed value in the indexed column.

So far, each possible value of a column has been represented by a separate bitmap. For columns with more than a small number of values, holding a bitmap for each value may require too much space. Encoded bitmap indexes use schemes which encode the possible values of a column in more space-efficient ways than a separate bitmap per value. Some encoded bitmap indexes can also efficiently support a wider range of queries than just exact match with a single value. For example, consider a column which may contain a value between 1 and 6. A conventional bitmap index would have 6 separate bitmaps. An alternative approach would be to encode ranges of values <2, <3, <4, <5, <6, resulting in only 5 bitmaps. Such a scheme enables queries with a search condition < n to be answered by reference to a single bitmap, while still enabling exact match queries to be answered by reference to only 2 bitmaps.

For example, a row with the value 4 for the column would have the corresponding bit set to 1 in the <5 and <6 bitmaps, with 0 set in the remainder. Hence, rows with the value 4 are those whose corresponding bit in the <4 bitmap is set to 0 while the corresponding bit in the <5 bitmap is set to 1. The example above encodes ranges of values. Another approach encodes intervals rather than ranges, e.g. 1-3, 2-4, 3-5, 4-6, resulting in only 4 bitmaps. Queries with a lower and upper bound on values are now directly supported, while again exact match queries can be answered by reference to only 2 bitmaps. For example, rows with the value 4 are those whose corresponding bit in the 1-3 bitmap is set to 0 while the corresponding bit in the 2-4 bitmap is set to 1.
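A minimal Python sketch of the range-encoding scheme for values 1 to 6, with illustrative helper names; as described above, exact(v) consults only the <v and <v+1 bitmaps:

    values = [4, 1, 6, 4, 2]            # the indexed column, one entry per row

    # Range-encoded index: one bitmap per threshold t in {2,3,4,5,6};
    # bit i of bitmap t is 1 iff values[i] < t.
    index = {t: [1 if v < t else 0 for v in values] for t in range(2, 7)}

    def less_than(t):
        # "value < t" is answered from a single bitmap.
        return index[t]

    def exact(v):
        # "value == v" needs just two bitmaps: NOT(<v) AND (<v+1).
        # At the boundaries, <1 is all zeros and <7 would be all ones.
        lo = index[v] if v >= 2 else [0] * len(values)
        hi = index[v + 1] if v + 1 <= 6 else [1] * len(values)
        return [(1 - a) & b for a, b in zip(lo, hi)]

    print(less_than(4))   # [0, 1, 0, 0, 1] -> rows with value < 4
    print(exact(4))       # [1, 0, 0, 1, 0] -> rows holding exactly 4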

If a bitmap index exists on each foreign key in the fact table of a star schema, the query optimizer can use a technique known as a star transformation or star join optimization. Consider the following query on the FoodMart schema:

select t.week_of_year, p.product_id, p.product_name, s.store_city,
       sum(sf.store_sales) sales_sum
from sales_fact_1998 sf, store s, product p, time_by_day t
where sf.store_id = s.store_id
  and sf.product_id = p.product_id
  and sf.time_id = t.time_id
  and s.store_country = 'USA'
  and p.product_class_id = 94
  and t.quarter = 'Q1'
group by t.week_of_year, p.product_id, p.product_name, s.store_city
order by t.week_of_year, p.product_id, sales_sum desc, s.store_city;

This query has a structure which is often found: an aggregate of a measure in a fact table involving a number of dimensions is computed, with selection conditions applied to the dimensions.

If a bitmap index exists on each of the foreign key columns store_id, product_id and time_id in the fact table sales_fact_1998, then the optimizer can optimize the query as follows:

- Execute the selection conditions on the dimension tables (s.store_country = 'USA', p.product_class_id = 94, t.quarter = 'Q1') to identify the key values of s.store_id, p.product_id and t.time_id satisfying the selection conditions.
- Access the bitmap indexes on the sf.store_id, sf.product_id and sf.time_id foreign keys in sales_fact_1998, which identify the rows in the fact table with matching foreign key values in each case.
- Intersect the bitmaps to identify those rows in the fact table which have matching key values in all of the dimension tables involved in the selection conditions.
- Join only those matching rows in the fact table with rows in the dimension tables.
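The bitmap steps can be sketched in Python as follows, with made-up bitmaps and key values: per dimension, the bitmaps of the qualifying keys are ORed, and the results are then ANDed across dimensions.

    from functools import reduce

    # fact_bitmaps[dim][key] = bitmap over fact-table rows (1 = row has that key).
    fact_bitmaps = {
        "store_id":   {1: [1, 0, 0, 1, 0], 2: [0, 1, 0, 0, 1], 3: [0, 0, 1, 0, 0]},
        "product_id": {10: [1, 1, 0, 0, 0], 11: [0, 0, 1, 1, 1]},
        "time_id":    {100: [1, 0, 1, 0, 1], 101: [0, 1, 0, 1, 0]},
    }

    # Keys surviving the dimension-table selections (e.g. store_country = 'USA').
    qualifying = {"store_id": [1, 2], "product_id": [11], "time_id": [100]}

    def bor(a, b):  return [x | y for x, y in zip(a, b)]
    def band(a, b): return [x & y for x, y in zip(a, b)]

    # OR the bitmaps of qualifying keys within each dimension...
    per_dim = [reduce(bor, (fact_bitmaps[d][k] for k in keys))
               for d, keys in qualifying.items()]
    # ...then AND across dimensions: fact rows matching every selection.
    surviving = reduce(band, per_dim)
    print(surviving)   # only these fact rows are joined back to the dimensions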

Another use of a bitmap is in a Bloom filter, which tests whether an element with a particular value is a member of a set. Unlike other data structures such as search trees and conventional hash tables, Bloom filters are more space efficient since the value itself is not stored. Also, the structure can represent a set with any number of elements. In the context of data warehouses, a Bloom filter is used in some architectures to test whether data with a particular key value is stored within the warehouse. If not, unnecessary disk accesses for non-existent data can be avoided. This is particularly valuable in warehouse architectures built on distributed file systems, where access to a remote node is thereby avoided. The Bloom filter algorithm uses a number of different hash functions: we assume 3 in the example below, h1, h2, h3. Also, a bit vector is used, which initially has all elements set to 0:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

If a data item with key k is inserted into the warehouse, each hash function h1, h2, h3 is applied to k, giving 3 results, each of which identifies a position in the bit vector. The bits in those positions are set to 1 if currently 0; here h1(k), h2(k) and h3(k) identify positions 5, 10 and 15:

0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0

To test whether a data item with key k exists in the warehouse, each hash function is applied to k. If any of the bit positions identified is 0, then the data item cannot exist in the warehouse. The approach may lead to false positives: all bit positions being 1 for a key value does not guarantee that the data item exists in the warehouse. It does not lead to false negatives, however: if any bit position is 0, then the data item does not exist in the warehouse. As the size of the bit vector increases, the number of false positives decreases. For a given number of data items and bit vector size, the number of hash functions needed to minimize the probability of false positives can be calculated.
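A minimal Python sketch of a Bloom filter with three hash functions; deriving h1, h2, h3 by salting a single base hash is an illustrative choice, not the only scheme:

    import hashlib

    M = 21          # bit vector size, as in the 21-bit vector pictured above
    K = 3           # number of hash functions h1, h2, h3

    bits = [0] * M

    def positions(key):
        # Derive K hash functions by salting one base hash; any family of
        # independent hash functions would do.
        return [int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % M
                for i in range(K)]

    def insert(key):
        for p in positions(key):
            bits[p] = 1

    def might_contain(key):
        # Any 0 bit proves absence (no false negatives); all 1s may be a
        # false positive, so a disk/remote lookup is still needed to confirm.
        return all(bits[p] for p in positions(key))

    insert("cust-42")
    print(might_contain("cust-42"))   # True
    print(might_contain("cust-99"))   # likely False; True would be a false positive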

MATERIALIZED VIEWS

Materialized view tables have already been introduced as a logical data warehouse structure. Since their purpose is to increase query performance, and they may be transparent to users of a warehouse, they are perhaps better considered, along with indexes, as structures supporting efficient access paths. Although users of a warehouse may access materialized view tables directly, they are often used for query rewrite by the query optimizer without explicit reference to them in SQL queries.

For example, consider the FoodMart sales_fact_1998 table. Assume that the sales performance of stores needs to be analyzed frequently, resulting in queries like:

select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
from sales_fact_1998
group by store_id, time_id;

Query Plan
--------------------------------------------
SELECT STATEMENT Cost = 396
  SORT GROUP BY
    TABLE ACCESS FULL SALES_FACT_1998

On a large fact table, performance for this query may be unacceptable. A materialized view to support this query could be created in Oracle as follows:

CREATE MATERIALIZED VIEW sales_group_store_1998
build immediate
refresh on demand
enable query rewrite
as
select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
from sales_fact_1998
group by store_id, time_id;

The materialized view definition specifies that:

- The materialized view should be populated from the underlying base table immediately (build immediate). The alternative parameter build deferred indicates that the materialized view will be populated by the next REFRESH operation.

- Modifications (updates/inserts/deletes) to the underlying base table are propagated to the materialized view by running a system procedure when required (refresh on demand). The alternative parameters include:
  - complete: the defining query of the materialized view is executed.
  - fast: an incremental refresh is performed which takes account of changes to the underlying tables.
  - force: a fast refresh is performed if possible, otherwise a complete refresh.
  - on commit: a fast refresh occurs whenever the database commits a transaction that operates on an underlying table.
- The materialized view can be used for query rewrite (enable query rewrite). This is explained further below.

The materialized view can be used directly in a query, for example:

select * from sales_group_store_1998
where store_id between 1 and 20;

Query Plan
--------------------------------------------
SELECT STATEMENT Cost = 2
  TABLE ACCESS FULL SALES_GROUP_STORE_1998

The corresponding query on the base table would be:

select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
from fm_admin.sales_fact_1998
where store_id between 1 and 20
group by store_id, time_id;

Query Plan
--------------------------------------------
SELECT STATEMENT Cost = 413
  SORT GROUP BY
    TABLE ACCESS FULL SALES_FACT_1998

With query rewrite enabled, the materialized view may also be used transparently:

ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE;
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED;

select store_id, time_id, sum(store_sales), sum(store_cost), sum(unit_sales)
from fm_admin.sales_fact_1998
where store_id between 1 and 20
group by store_id, time_id;

Query Plan
------------------------------------------------
SELECT STATEMENT Cost = 2
  TABLE ACCESS FULL SALES_GROUP_STORE_1998

Query rewrite has been enabled with the statement ALTER SESSION SET QUERY_REWRITE_ENABLED = TRUE. The statement ALTER SESSION SET QUERY_REWRITE_INTEGRITY = ENFORCED controls how Oracle rewrites queries. In this case, ENFORCED ensures that queries are only rewritten using constructs which Oracle itself enforces and which are thereby guaranteed correct, such as the materialized view in the example.

Alternative parameters to ALTER SESSION SET QUERY_REWRITE_INTEGRITY are:

- TRUSTED: This additionally allows Oracle to use constructs which are user-specified. For example, a pre-existing table may be specified as a materialized view. A value of TRUSTED would enable such a materialized view to be used in query rewrite, Oracle trusting that the pre-built table contains correct data for the specified view.
- STALE_TOLERATED: This additionally allows Oracle to use a materialized view even if its contents are out of synchronization with the source tables. This might be acceptable for some applications.

DIMENSIONS

Dimensions have already been introduced as a logical design construct in multidimensional models. If dimensions are explicitly created, they may give performance advantages through the additional optimizations possible with query rewrite. They may also improve the performance of materialized view refresh operations. For example, consider the following dimension created with Oracle:

create dimension cust_dim
  level region   is customer.customer_region_id
  level province is customer.state_province
  level country  is customer.country
  hierarchy geog_rollup (
    region   child of
    province child of
    country
  );

In this example, the 1:n hierarchical relationships between attributes in the customer table are defined. Dimensions may also be defined in respect of 1:n relationships spanning multiple tables, as well as relationships between a hierarchy level and other functionally dependent attributes. With such a dimension, a materialized view is not needed at each level of the hierarchy for efficient evaluation of an aggregation at that level. For example, given the definition of cust_dim, a materialized view aggregating at the region level may still be used in the evaluation of a query aggregating at the province or country level.

COLUMN STORES

Relational DBMS implementations have traditionally been row-oriented: at the storage level, the logical rows of tables are stored as individual records, or chained records if the record representing an entire row is too large to fit in a page. Alternative architectures have been researched for many years in which columns, rather than rows, are the primary basis of physical organization. These column stores have become of particular interest in recent years for data warehouse applications. Since many data warehousing applications require the analysis of data in just some of the columns of a table, even without sophisticated implementation techniques column stores are likely to have performance advantages over row stores, since it is not necessary to retrieve records representing all the columns of a table. Relational DBMS vendors increasingly incorporate column store technology within their own architectures to enhance performance on data warehouse analytics workloads.

The advantages can be even greater if the following optimization techniques are incorporated in the storage and query execution levels of column store implementations.

Storing columns rather than rows is likely to allow compression techniques which let the values of a column in a table with very many rows be stored efficiently. For example, if a column is sorted and the same value repeats many times, it can be stored as a value and a repeat count rather than the value being stored multiple times. Whether sorted or not, a column of a table can be stored conceptually as an identifier for the row together with an encoding of the value of that column in that row. A fixed width may be used for both the row identifier and the encoded value, which means the row identifier need not be physically stored at all: the storage position of the value for a row may be computed as an offset.

The performance advantages of compression are enhanced if compressed data can be operated on by the query execution engine without first being decompressed: this can be done in many cases with late materialization techniques. For example, if values are represented as offsets in columns, these can be manipulated efficiently using bit-level representations, as with the bitmap indexes seen earlier. Hence:

- The construction of tuples may be avoided in many cases, with only the columns required in the final result materialized at a late stage in query execution.
- The need to decompress anything other than the final result data may be avoided.
- Cache memory may be used efficiently given the compact bit-level representation of values.
- Block iteration query processing is possible, in which blocks of values in columns are processed by an operator in a single function call, enabling efficient parallel execution using pipelining techniques.
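As a sketch of both ideas, the following Python fragment run-length encodes a sorted column and then evaluates an equality predicate directly on the compressed runs, so the work is proportional to the number of runs rather than the number of rows. The column contents are illustrative.

    from itertools import groupby

    # A sorted column from a wide fact table; sorting makes runs long.
    column = ["DE", "DE", "DE", "UK", "UK", "US", "US", "US", "US"]

    # Run-length encode: each run is stored once as (value, repeat_count).
    runs = [(v, len(list(g))) for v, g in groupby(column)]
    print(runs)   # [('DE', 3), ('UK', 2), ('US', 4)]

    # Count rows matching a predicate by scanning runs, never materializing
    # the individual row values (a simple form of late materialization).
    def count_equal(runs, value):
        return sum(n for v, n in runs if v == value)

    print(count_equal(runs, "US"))   # 4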

Storage structures have also been developed to scale to very large amounts of data processed on commodity hardware. An example is HDFS (Hadoop Distributed File System), developed as part of the Apache Hadoop project. HDFS is designed to be:

- Scalable, to store very large amounts of data.
- Economical, by utilising clusters of nodes running on commodity hardware with heterogeneous operating systems: any machine which runs Java can run HDFS.
- Fault tolerant, with automatic recovery in the presence of failed nodes.

The HDFS architecture supports interconnected clusters, each of which consists of:

- Multiple DataNodes, which store data in blocks in files.
- A single NameNode, which manages the file system namespace and maintains mappings of file blocks to DataNodes.

HDFS supports operations to create and delete directories, and to create, read, write and delete files within directories. A user application does not need to be aware of the distributed nature of the architecture. HDFS replicates data blocks for fault tolerance: typically HDFS clusters are spread across multiple hardware racks, and a NameNode aims to place replicated blocks on multiple racks. When a user application reads a file, the HDFS client asks the NameNode for a list of DataNodes that hold the replicas of the file's blocks. It then requests transfer of the desired block from the DataNode closest to the reader. During normal operation, each DataNode sends periodic (default 3 seconds) messages called heartbeats to the NameNode to confirm availability. If the NameNode does not receive a heartbeat from a DataNode for 10 minutes, it considers the DataNode to have failed and schedules creation of the unavailable block replicas at other DataNodes. When a user application writes to a file, the HDFS client caches data in a temporary local file, which is only written to the HDFS file when a block is filled. The client flushes the block to one DataNode, which itself flushes the block to the next DataNode holding replicas, in a pipeline fashion.
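The heartbeat mechanism can be sketched as follows. This is a conceptual Python simulation of the NameNode bookkeeping described above, not Hadoop code; all names are illustrative, and the timeout follows the 10-minute figure given in the text.

    import time

    HEARTBEAT_TIMEOUT = 600   # seconds: 10 minutes, as described above

    class NameNode:
        def __init__(self):
            self.last_heartbeat = {}   # datanode -> time of last heartbeat
            self.block_map = {}        # block id -> set of datanodes holding a replica

        def heartbeat(self, datanode):
            self.last_heartbeat[datanode] = time.time()

        def failed_datanodes(self):
            now = time.time()
            return {dn for dn, t in self.last_heartbeat.items()
                    if now - t > HEARTBEAT_TIMEOUT}

        def blocks_to_rereplicate(self):
            # Blocks with a replica on a failed DataNode must be copied to
            # healthy nodes to restore the replication factor.
            dead = self.failed_datanodes()
            return {blk for blk, nodes in self.block_map.items() if nodes & dead}

    nn = NameNode()
    nn.heartbeat("dn1")
    nn.heartbeat("dn2")
    nn.block_map["blk_001"] = {"dn1", "dn2"}
    nn.last_heartbeat["dn1"] -= 700          # simulate dn1 falling silent
    print(nn.blocks_to_rereplicate())        # {'blk_001'} needs a new replica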

READING

D. J. Abadi, S. R. Madden, N. Hachem, "Column-Stores vs. Row-Stores: How Different Are They Really?", Proc. SIGMOD '08, June 2008. (Section 6 optional.)

P.-Å. Larson et al., "Enhancements to SQL Server Column Stores", Proc. SIGMOD '13, June 2013. (Section 5 optional.)

J. Jeffrey Hanson, "An Introduction to the Hadoop Distributed File System", IBM developerWorks, February 2011.

FOR REFERENCE

Oracle Database Data Warehousing Guide, Part I Data Warehouse Fundamentals (Chapters 3-4), Part II Optimizing Data Warehouses (Chapters 5-6, 9-11).