Hacettepe University, Department of Computer Engineering. BBM467 Data Intensive Applications. Dr. Fuat Akal, akal@hacettepe.edu.tr
Foundations of Data[base] Clusters
- Database Clusters
- Hardware Architectures
- Data Design Schemes
- Replication Schemes
- Query Parallelism
- Logical Cluster Organization
- Replication Management
Database Clusters
A cluster of computers can be thought of as a single computing resource. It utilizes multiple machines to provide a more powerful computing environment through a single system image. There are two types of clusters:
- high availability (HA) clusters
- high performance computing (HPC) clusters
Hardware Architectures: Shared Memory
All processors have access to both the main memory and the disks. The processors are tightly coupled inside the same box and interconnected with a special switch. Interprocess communication is done through the shared memory. The shared-memory approach offers simplicity and allows for load balancing; inter-query parallelism comes for free. However, it is expensive, since it requires a special interconnect among the processors, and its performance and scalability are limited by the available memory and communication bandwidths.
[Figure: processors (P) connected to a shared memory (M) and shared disks (D)]
Hardware Architectures: Shared Disk
In the shared-disk approach, all processors have their own memory, but they share the disks. Interprocess communication occurs over a common high-speed bus. This architecture provides high availability: all data is still accessible even when a node fails. Since each node has its own data cache, cache coherency must be maintained, e.g. by means of a lock manager, which results in reduced performance. Shared-disk systems have limited scalability due to the bandwidth of the high-speed bus and potential bottlenecks of the shared hardware.
[Figure: processors (P) with private memories (M) sharing a common set of disks (D)]
Hardware Architectures: Shared Nothing
In a shared-nothing architecture, each node is a complete stand-alone computer with its own memory and disk. The nodes are connected via a switch or LAN, but they do not share anything. The main advantages of such systems are very good scalability and high availability. However, the management of data is complicated, and programming with this model is harder due to the importance of data partitioning and allocation.
[Figure: independent nodes, each with its own processor (P), memory (M), and disk (D), connected by a network]
Partitioning Schemes
Vertical Partitioning: Vertical partitioning divides the columns of a table into separate tables. It makes projections and joins easier and helps optimize access to the cache by reducing the size of the tuples. However, access to the whole table may still be required when executing queries.
Horizontal Partitioning: Horizontal partitioning divides a table along its tuples. Its basic advantage is to allow parallel scans or projections.
- Hash partitioning is based on a hash function that distributes the tuples according to a hashing key. It is useful for parallel exact-match queries and hash-join operations, but not appropriate for range queries or operations on attributes other than the partitioning keys.
- Range partitioning is based on value intervals of the partitioning keys. It facilitates the evaluation of range queries; its performance depends on the interval size.
- Round-robin partitioning distributes the tuples over the partitions in turn. This approach is also called striping. A number of logically consecutive tuples forms a striping unit, and the relative size of the striping unit directly affects performance: small striping units result in more I/O parallelism for scans and long range queries, whereas larger striping units may increase the latency of scans.
Partitioning Schemes
[Figure: an original table with columns A, B, C and tuples 1-10 split by a) vertical partitioning, b) hash partitioning, c) range partitioning, and d) round-robin partitioning]
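The three horizontal partitioning schemes above can be sketched as follows; this is a minimal illustration, assuming tuples are (key, payload) pairs and function names of my own choosing, not an implementation from any particular system.

```python
def hash_partition(tuples, n_nodes):
    """Assign each tuple to a node by hashing its partitioning key."""
    parts = [[] for _ in range(n_nodes)]
    for key, payload in tuples:
        parts[hash(key) % n_nodes].append((key, payload))
    return parts

def range_partition(tuples, boundaries):
    """Assign tuples to value intervals; boundaries=[4, 8] gives
    key < 4, 4 <= key < 8, and key >= 8."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for key, payload in tuples:
        idx = sum(key >= b for b in boundaries)  # count boundaries passed
        parts[idx].append((key, payload))
    return parts

def round_robin_partition(tuples, n_nodes, striping_unit=1):
    """Distribute runs of `striping_unit` consecutive tuples (the striping
    unit from the slide) over the nodes in turn."""
    parts = [[] for _ in range(n_nodes)]
    for i, t in enumerate(tuples):
        parts[(i // striping_unit) % n_nodes].append(t)
    return parts

rows = [(k, f"row{k}") for k in range(10)]
print(range_partition(rows, [4, 8])[0])   # tuples with key < 4
print(round_robin_partition(rows, 3)[0])  # every third tuple
```

Note how only range partitioning keeps key order within a partition, which is why it helps range queries while hash and round-robin do not.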
Virtual Partitioning
Virtual partitioning, also called query partitioning, assumes that all tables are fully replicated on each cluster node. In this approach, a query is decomposed into subqueries, each accessing only a small piece of the data, by appending range predicates to the WHERE clause of the query. Each subquery then deals with only a small part of the data.
Virtual Partitioning (Example)
original query:
SELECT Sum(L_ExtendedPrice*L_Discount) AS Revenue
FROM LineItem
WHERE L_Discount BETWEEN 0.03 AND 0.05
subquery 1 (runs against LineItem on node A):
SELECT Sum(L_ExtendedPrice*L_Discount) AS Revenue
FROM LineItem
WHERE L_Discount BETWEEN 0.03 AND 0.05
AND L_OrderKey BETWEEN 0 AND 3000000
subquery 2 (runs against LineItem on node B):
SELECT Sum(L_ExtendedPrice*L_Discount) AS Revenue
FROM LineItem
WHERE L_Discount BETWEEN 0.03 AND 0.05
AND L_OrderKey BETWEEN 3000001 AND 6000000
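The rewrite step of virtual partitioning can be sketched as a small query generator; this is an illustrative sketch, assuming the partitioning key, its value range, and the function name, none of which come from a specific system.

```python
def virtual_partitions(query, key, max_key, n_nodes):
    """Split [0, max_key] into n_nodes disjoint intervals and append a
    range predicate on the partitioning key to the query's WHERE clause."""
    size = max_key // n_nodes
    subqueries = []
    lo = 0
    for i in range(n_nodes):
        # last interval absorbs any remainder so the whole range is covered
        hi = max_key if i == n_nodes - 1 else lo + size
        subqueries.append(f"{query} AND {key} BETWEEN {lo} AND {hi}")
        lo = hi + 1
    return subqueries

q = ("SELECT Sum(L_ExtendedPrice*L_Discount) AS Revenue FROM LineItem "
     "WHERE L_Discount BETWEEN 0.03 AND 0.05")
for sq in virtual_partitions(q, "L_OrderKey", 6000000, 2):
    print(sq)
```

Making the intervals disjoint (each starts one past the previous upper bound) matters for aggregates like SUM: overlapping ranges would count tuples twice when the partial results are merged.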
Replication Schemes
Full Replication: Tables are duplicated on each cluster node. That is, each node holds an exact copy of the original database.
Partial Replication: Only parts of the original database are replicated on the different cluster nodes.
Mixed Replication: Both full and partial replication are used at the same time.
Replication Schemes
[Figure: an original database replicated by a) full replication, b) partial replication, and c) mixed replication]
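The difference between the schemes can be made concrete as a table-to-node placement; a minimal sketch, with table and node names chosen purely for illustration.

```python
def full_replication(tables, nodes):
    """Every node holds a copy of every table (an exact copy of the database)."""
    return {node: set(tables) for node in nodes}

def partial_replication(placement):
    """placement maps each table to the subset of nodes holding a copy;
    returns the resulting per-node layout."""
    layout = {}
    for table, nodes in placement.items():
        for node in nodes:
            layout.setdefault(node, set()).add(table)
    return layout

tables = ["Customer", "Orders", "LineItem"]
print(full_replication(tables, ["n1", "n2"]))
print(partial_replication({"Customer": ["n1"],
                           "Orders": ["n1", "n2"],
                           "LineItem": ["n2"]}))
```

A mixed scheme is simply a placement where some tables map to all nodes and others to a subset, so the second function covers it as well.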
Mixed Data Design
- Organize the cluster as node groups (NG)
- Freely design every NG
A global database scheme is mapped onto co-existing design schemes, one per node group.
[Figure: a global database scheme mapped onto a database cluster of five nodes organized into three node groups]
Query Parallelism in a Cluster
Inter-query parallelism: The capability of the database management system to accept queries from multiple users simultaneously. Each query is executed independently of the others.
Intra-query parallelism: Achieved by decomposing queries into subqueries and evaluating them simultaneously. It appears as inter-partition, intra-partition, and hybrid parallelism.
[Figure: queries Q1-Q5 over database partitions illustrating a) inter-query parallelism, b) intra-query & intra-partition parallelism, c) intra-query & inter-partition parallelism, and d) intra-query & intra-partition & inter-partition parallelism]
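Intra-query, inter-partition parallelism can be illustrated with a toy scan-and-aggregate query: the query is decomposed into one subquery per partition, the subqueries run concurrently, and the partial results are merged. The partitions, the predicate, and the use of Python threads are all assumptions made for the sake of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# three horizontal partitions of one table, held on three (simulated) nodes
partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]

def subquery(part):
    # each subquery scans only its own partition (inter-partition parallelism)
    return sum(x for x in part if x % 2 == 0)

with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(subquery, partitions))

# merge step: combine the partial aggregates into the final result
print(sum(partial_sums))
```

Inter-query parallelism, by contrast, would simply mean submitting several independent queries to the pool at once; no decomposition or merge step is needed.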
Logical Cluster Organization
Flat Cluster Architecture: Allows any cluster node to be accessed directly by clients. It forms a federated database of distinct databases running on independent servers, connected by a LAN with no resource sharing, such as disks. It provides high availability and a simple design; however, replication is difficult to implement with this model.
Middleware-Based Cluster Architecture: A client can only interact with the cluster through a coordination middleware. The middleware is responsible for scheduling and routing the clients' requests. Since it has knowledge about the underlying cluster, it can be used to ensure correct execution of concurrent updates and reads. It can also improve overall throughput by choosing better components, e.g. those with less load, to perform client requests. However, the middleware is a single point of failure: if it fails, the cluster becomes useless. The middleware must be decentralized to improve scalability.
Logical Cluster Organization
[Figure: clients connecting to the database cluster a) directly, in the flat architecture, and b) through a coordination middleware, in the middleware-based architecture]
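The routing role of the coordination middleware can be sketched as below: reads go to the least-loaded node, while updates are applied on every replica so the copies stay consistent. The class, its load model, and the in-memory replicas are illustrative assumptions, not a real middleware API.

```python
class Middleware:
    """Toy coordination middleware in front of a set of replica nodes."""

    def __init__(self, nodes):
        self.load = {n: 0 for n in nodes}   # outstanding requests per node
        self.data = {n: {} for n in nodes}  # each node's full replica

    def route_read(self, key):
        # pick the node with the least load (simple load balancing)
        node = min(self.load, key=self.load.get)
        self.load[node] += 1
        return node, self.data[node].get(key)

    def route_update(self, key, value):
        # the middleware knows every replica and keeps them consistent
        for replica in self.data.values():
            replica[key] = value

mw = Middleware(["n1", "n2", "n3"])
mw.route_update("x", 42)
node, value = mw.route_read("x")
print(node, value)
```

The sketch also makes the slide's caveat visible: every request flows through the one `Middleware` object, so if that object is lost, no node can be reached.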
Replication Management
Replication is an essential technique to improve availability and scalability by fully or partially duplicating data objects among the nodes of a distributed system. Replication management is responsible for the maintenance of replicas and ensures consistency of the multiple copies of the same data object residing on different nodes. That is, replication management is not simply copying data objects onto different nodes of a distributed system.
Synchronization of Updates
There are two possibilities for the location of updates: updates can either be centralized on one primary copy, or be distributed over (a subset of) all replicas (update everywhere).
[Figure: a) primary copy, where updates go to a single updatable object and are propagated to read-only copies; b) update everywhere, where every replica is updatable]
Synchronization of updates can be done in two ways: eager and lazy.
Synchronization of Updates
Eager (or synchronous) replication: All copies of an object are synchronized within the same database transaction. This allows early detection of conflicts and presents a simple solution to provide consistency. It has drawbacks regarding performance, due to the high communication overhead among the replicas and the high probability of deadlocks.
Lazy (or asynchronous) replication: Replica maintenance is decoupled from the original database transaction. The transactions keeping the replicas up to date and consistent run as separate and independent database transactions after the original transaction has committed. Compared to eager replication, lazy approaches require additional effort to guarantee serializable executions.
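The eager/lazy distinction for the primary-copy case can be sketched as follows; a toy model under stated assumptions (in-memory dictionaries stand in for databases, and a `refresh` call stands in for the separate refresh transactions), not a real replication protocol.

```python
class PrimaryCopy:
    """Toy primary-copy replication with eager and lazy propagation."""

    def __init__(self, n_replicas):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # lazy updates not yet propagated

    def update_eager(self, key, value):
        # all copies are synchronized within the same "transaction"
        self.primary[key] = value
        for r in self.replicas:
            r[key] = value

    def update_lazy(self, key, value):
        # the original transaction commits on the primary only
        self.primary[key] = value
        self.pending.append((key, value))

    def refresh(self):
        # separate, later transaction that brings the replicas up to date
        for key, value in self.pending:
            for r in self.replicas:
                r[key] = value
        self.pending.clear()

db = PrimaryCopy(2)
db.update_lazy("x", 1)
stale = db.replicas[0].get("x")  # replica has not been refreshed yet
db.refresh()
print(stale, db.replicas[0]["x"])
```

The window between `update_lazy` and `refresh`, during which a replica can serve stale data, is exactly why lazy schemes need the extra effort mentioned above to guarantee serializable executions.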
Eager Primary Copy Replication
Eager Update Everywhere Replication
Lazy Primary Copy Replication with Immediate Updates
Lazy Primary Copy Replication with Deferred Updates
Lazy Update Everywhere Replication