Distributed Databases
Distributed Database Design

Distributed Database System
[Figure: a distributed database system linking DBMSs and Web sources; labels include data, mm, xml]

Advanced Database Systems, mod1-1, 2004

Advantages of a DDBS
- Modularity
- Fault Tolerance
- High Performance
- Data Sharing
- Low Cost Components

Issues
- Data Distribution
- Exploiting Parallelism
- Concurrency and Recovery
- Distributed Queries/Transactions
- Security, Access Control
- Heterogeneity

Outline
- Introduction
- Objectives of Distributed Database Design
- Bottom-up Approach
  - Reference Architecture
  - Attribute Equivalency
  - Integration Process
- Top-down Approach
  - Partitioning
  - Allocation
  - Replication

Introduction
- Data is stored at several sites, each managed by a DBMS that can run independently.
- Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles).
- Distributed Transaction Atomicity: Users should be able to write Xacts accessing multiple sites just like local Xacts.
Source: Database Management Systems, R. Ramakrishnan & J. Gehrke, McGraw-Hill, 2002

Recent Trends
- Users have to be aware of where data is located, i.e., Distributed Data Independence and Distributed Transaction Atomicity are not supported.
- These properties are hard to support efficiently.
- For globally distributed sites, these properties may not even be desirable due to the administrative overhead of making the location of data transparent.

Types of Distributed Databases
- Homogeneous: Every site runs the same type of DBMS.
- Heterogeneous: Different sites run different DBMSs (different RDBMSs or even nonrelational DBMSs).
[Figure: a gateway connecting DBMS1, DBMS2, DBMS3]

Distributed DBMS Architectures
- Client-Server: Client ships query to a single site. All query processing at server.
  - Thin vs. fat clients.
  - Set-oriented communication, client-side caching.
- Collaborating-Server: Query can span multiple sites.
[Figure: clients sending a query to a server; a query passed among collaborating servers]

Databases Deepen the Web
- ODBC
  - A common way to connect and log on to a DBMS
  - Libraries of ODBC API function calls that let an application connect to a DBMS, execute SQL statements, and retrieve results.
- JDBC
  - Java Database Connectivity: a standard in the Java platform
- ODBC-JDBC bridges: enable developers in non-Java environments to use JDBC drivers
Source: IEEE Computer, Jan 2004, pp. 116-117

Tiered Model Overview
- Web database environments
  - Web browser
  - Web server
  - Database
- 2 Tier & 3 Tier
- Middleware tier: application server
[Figure: Centralized, non-distributed; Client-server, distributed two-tier; Distributed three-tier. Labels: fat client, thin client, terminal, server, mainframe, services]
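
The ODBC/JDBC call pattern listed above (connect to a DBMS, execute an SQL statement, retrieve results) can be illustrated with a minimal sketch. It is not ODBC or JDBC itself: it uses Python's standard sqlite3 module as a stand-in call-level API, with an in-memory database in place of a remote server. The emp table, its rows, and the run_query helper are made up for illustration.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Execute one SQL statement and return all result rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)      # step 2: execute the SQL statement
        return cur.fetchall()         # step 3: retrieve the results
    finally:
        cur.close()


# Step 1: connect. An in-memory SQLite database stands in for the DBMS;
# a real ODBC/JDBC client would pass a data source name or URL instead.
conn = sqlite3.connect(":memory:")

# Hypothetical sample data, only so that the query has something to return.
conn.execute("CREATE TABLE emp (eno TEXT PRIMARY KEY, ename TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("E1", "J. Doe", "Engineer"),
        ("E2", "M. Smith", "Analyst"),
        ("E3", "J. Miller", "Programmer"),
    ],
)
conn.commit()

rows = run_query(
    conn,
    "SELECT ename FROM emp WHERE title = ? ORDER BY ename",
    ("Programmer",),
)
print(rows)
```

In a three-tier setup this code would typically live in the middleware tier (the application server), so the thin client never talks to the database directly.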

Case Study
[Figure: Customer, Web server (proprietary), JRMP, Data server (proprietary), messaging, ERP]
- Business logic resides on the Web server
- Data server manages transactions and requests to the database and the ERP system
- Architecture affects scalability and availability.

Case Study
[Figure: Customer, Call center, Web server (jar, war), JRMP, Data server (proprietary), messaging, ERP]
- Business logic must move to the data server.
- Business process changes should not cause the enterprise to rework its entire application.

What is J2EE?
- J2EE provides a service-oriented infrastructure to automatically support and manage components.
- The enterprise developer can concentrate on application components, not the underlying services.
- Separation of business logic and services provides for better reuse of business logic.

J2EE Architecture
[Figure: labels include Business Logic, service, Applet, Application Client, Web (JSP, Servlet), EJB, Connector, J2EE APIs (JNDI, RMI-IIOP), ERP]

Oracle 9i OC4J
- Oracle9iAS Containers For J2EE
[Figure: Applet, Web (JSP, Servlet), EJB (Entity beans)]

System Architecture: High Availability
- Standard n-tier architecture
- Front end application layer load-balancer
  - Oracle 9iAS Web Cache
- Cluster of stateless application servers
  - Oracle 9iAS J2EE container
- Clustered database nodes
  - Oracle 9i/RAC
- Shared SAN storage
  - Fibre Channel storage
[Figure: External LAN, Internal LAN, Storage Network]

Defining EJB Technology
- EJB servers provide core services to components:
  - Transaction
  - Security
  - Naming
  - Persistence
  - Life cycle
  - Concurrency
- EJB technology enhances:
  - Simplified access to services

Defining EJB Technology
- A server component specification (for vendors)
- Separates and defines development roles:
  - Component creation
  - Application assembly
  - Application deployment

Objectives of Distributed Design
- Local Processing
  - Maximizing local processing
  - Minimizing remote referencing
- Availability of distributed data
  - Multiple copies
- Reliability of distributed data
  - Master copies
- Workload distribution
  - Balancing and Parallelism
- Cost effectiveness of storage

Promises of DDBSs
- Transparent Management of Distributed and Replicated Data
- Reliability through Distributed Transactions
- Improved Performance
- Easier System Expansion
Source: Principles of Distributed Database Systems, M. Ozsu & P. Valduriez, Prentice Hall, 1999

Transparent Management
- Transparency = separation of the higher-level semantics of a system from low-level implementation issues
[Figure: four sites on a communication network. Boston: Boston employees, Paris employees, Boston projects. Edmonton: Edmonton employees, Edmonton projects, Paris projects. Paris: Paris employees, Paris projects. SF: SF employees, SF projects.]

Transparent Management
- Data Independence
- Network (or Distribution) Transparency
  - To hide the existence of the network
- Replication Transparency
  - To hide the existence of copies
- Fragmentation Transparency

    SELECT ename, sal
    FROM EMP, ASG, PAY
    WHERE ASG.dur > 12
      AND EMP.eno = ASG.eno
      AND PAY.title = EMP.title
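
To make the transparency idea concrete, here is a minimal sketch in which the query above is written against the global relations EMP, ASG, and PAY, while the data is actually split across two sites. Everything beyond the query is an assumption made for illustration: the two in-memory SQLite databases standing in for sites, the sample rows, the reduced column sets, the choice to fragment EMP and ASG horizontally and replicate PAY, and the naive strategy of shipping every fragment to a coordinator. A real distributed DBMS would optimize where each operation runs.

```python
import sqlite3

# The user's query mentions only global relations, never sites or fragments.
# (ORDER BY is added here only to make the output deterministic.)
GLOBAL_QUERY = """
SELECT ename, sal
FROM EMP, ASG, PAY
WHERE ASG.dur > 12
  AND EMP.eno = ASG.eno
  AND PAY.title = EMP.title
ORDER BY ename
"""

SCHEMA = """
CREATE TABLE EMP (eno TEXT, ename TEXT, title TEXT);
CREATE TABLE ASG (eno TEXT, pno TEXT, dur INTEGER);
CREATE TABLE PAY (title TEXT, sal INTEGER);
"""


def make_site(emp, asg, pay):
    """Create one site holding a fragment of EMP and ASG and a copy of PAY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", emp)
    conn.executemany("INSERT INTO ASG VALUES (?, ?, ?)", asg)
    conn.executemany("INSERT INTO PAY VALUES (?, ?)", pay)
    conn.commit()
    return conn


# Hypothetical data: EMP and ASG are fragmented by site, PAY is replicated.
PAY_ROWS = [("Engineer", 40000), ("Analyst", 34000)]
sites = {
    "Boston": make_site(
        [("E1", "Doe", "Engineer")], [("E1", "P1", 18)], PAY_ROWS
    ),
    "Paris": make_site(
        [("E2", "Martin", "Analyst"), ("E3", "Petit", "Engineer")],
        [("E2", "P2", 6), ("E3", "P3", 24)],
        PAY_ROWS,
    ),
}


def run_global(query, sites):
    """Naive coordinator: rebuild the global relations, then run the query."""
    coord = sqlite3.connect(":memory:")
    coord.executescript(SCHEMA)
    # Fragmentation transparency: union the fragments from every site.
    for table in ("EMP", "ASG"):
        for site in sites.values():
            rows = site.execute(f"SELECT * FROM {table}").fetchall()
            coord.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    # Replication transparency: any single copy of PAY is sufficient.
    any_site = next(iter(sites.values()))
    pay = any_site.execute("SELECT * FROM PAY").fetchall()
    coord.executemany("INSERT INTO PAY VALUES (?, ?)", pay)
    return coord.execute(query).fetchall()


result = run_global(GLOBAL_QUERY, sites)
print(result)
```

The caller passes the same SQL it would use against a centralized database; only run_global knows that EMP and ASG are fragmented and that PAY has several copies.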

Reliability thru Distributed Xact
- A Xact transforms a consistent database state to another consistent database state, even under concurrent execution or failures.
- Failure atomicity
  - Ex) a failure during an update
- Concurrency transparency
  - Ex) a calculation running during an update

Improved Performance
- The case for the improved performance of distributed DBMSs is typically made based on two points:
  - Data fragmentation
    - Data localization gives better CPU and I/O service and requires less data transmission
  - Parallelism
    - Inter-query parallelism
    - Intra-query parallelism
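
As a rough illustration of intra-query parallelism over fragmented data, the sketch below evaluates one aggregate query (an average salary) by computing a partial result on each fragment concurrently and then merging the partials. The fragment contents, the site names as dictionary keys, and the use of a thread pool in place of separate machines are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of an (eno, sal) relation, one per site.
fragments = {
    "Boston": [("E1", 40000), ("E4", 27000)],
    "Paris": [("E2", 34000)],
    "Edmonton": [("E3", 24000), ("E5", 34000)],
}


def local_aggregate(rows):
    """Work done at one site: return (sum of salaries, row count)."""
    return sum(sal for _, sal in rows), len(rows)


def average_salary(fragments):
    """Run the local step on every fragment in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(local_aggregate, fragments.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # Only the small (sum, count) pairs travel to the coordinator,
    # not the rows themselves.
    return total / count


avg = average_salary(fragments)
print(avg)
```

Inter-query parallelism would instead run several independent queries at the same time, for example by submitting several calls like average_salary to the same pool.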