Marek Rychly mrychly@strathmore.edu Strathmore University, @ilabafrica & Brno University of Technology, Faculty of Information Technology Advanced Databases and Enterprise Systems 24 August 2015 Marek Rychly ADES, 24 August 2015 1 / 34
Outline 1 Course Content Lectures and Assessments Evaluation Background on the Course 2 3 Sets and Relations Functions Marek Rychly ADES, 24 August 2015 2 / 34
Lectures and Assessments Lectures and Assessments Evaluation Background on the Course 1 st 2 nd Relational Database Model (1 st Assessment: define relations, keys, and constraints) 3 rd Relational Algebra and Calculus (2 nd Assessment: make expressions in relational algebra/calculus) 4 th 5 th SQL (3 rd Assessment: create database objects and define several queries) 6 th Transactions and Transaction Management 7 th Database Analysis and Design 8 th Entity-Relationship Modelling 9 th Normalization and Database Design (4 th Assessment: make an ER model and perform the normalization) 10 th Database Performance 11 th Data warehousing and Data-mining; Post-relational Databases 12 th Current developments in database technology; Distributed, Object-oriented, and NoSQL Databases (5 th Assessment: propose utilization of post-relational/nosql databases) Marek Rychly ADES, 24 August 2015 4 / 34
Core Reading Materials Lectures and Assessments Evaluation Background on the Course T. Connolly, T.M. Connolly, and C.E. Beg. Database Systems: A Practical Approach to Design, Implementation, and Management. Pearson Education, 2014. ISBN 9780132943260. URL http://www.pearsonhighered.com/educator/product/ 9780132943260.page#dw_resources. M. Kifer, A.J. Bernstein, and P.M. Lewis. Database Systems: An Application-oriented Approach. Pearson/Addison-Wesley, 2006. ISBN 9780321268457. URL http://www.pearsonhighered.com/educator/product/ 9780321268457.page#dw_resources. Marek Rychly ADES, 24 August 2015 5 / 34
Evaluation Course Content Lectures and Assessments Evaluation Background on the Course 50% assessments (each result has to be submitted to Marek Rychly <mrychly@strathmore.edu> by, at latest, two weeks after the assessment specification in a lecture) 1 st Assessment by 14 September 2015 2 nd Assessment by 21 September 2015 3 rd Assessment by 28 September 2015 4 th Assessment by 16 October 2015 5 th Assessment by 26 October 2015 50% final exam (in the Exam Week) Marek Rychly ADES, 24 August 2015 6 / 34
Prior Knowledge and Skills Lectures and Assessments Evaluation Background on the Course The course does not require any prior knowledge or skill, however, the following may help you to understand the lectures: 1 basic mathematical background (sets, relations, functions) 2 basics of software engineering (software development process and its phases, software life-cycle) 3 software modelling (data modelling, Class Diagram in Unified Modelling Language) 4 concepts of object-oriented programming (encapsulation, composition, inheritance) 5 operation system design principles (file system organization / data blocks, memory management / paging) 6 concepts of concurrent computation (readers-writers problem, critical section, locking) Marek Rychly ADES, 24 August 2015 7 / 34
Lecturer Introduction Course Content Lectures and Assessments Evaluation Background on the Course RNDr. Marek Rychly, Ph.D. assistant professor at BUT FIT Brno University of Technology Faculty of Information Technology Brno, Czech Republic, Central Europe over 10 years of lecturing experience (programming, cryptography, databases, SW architectures) over 25 of scientific conference and journal publications (dynamic architectures, SOA, Web services, distributed systems) a researcher in several national and international projects a happily married man with two small kids :-) http://www.fit.vutbr.cz/homes/rychly/.en https://www.linkedin.com/in/rychly Marek Rychly ADES, 24 August 2015 8 / 34
Lectures and Assessments Evaluation Background on the Course Students Introduction? (optional) What is your name? Where are you from? What is your educational background? (in general, in IT, any database experiences, etc.) What is your professional background? (a job title, why do you need this course, etc.) What do you want to focus on in this course? (see the list of lectures on page 4) Is there anything you would like to skip? (because you already know about that or because you are not interested in) Please send your (very short) introduction to mrychly@strathmore.edu with subject ADES Student Introduction. Knowing your background and priorities will help me to better meet your expectations. Marek Rychly ADES, 24 August 2015 9 / 34
How to Store Data? File-based Decentralized Storage each application site/domain manages its own data (a sales dept. manages sales data, a contract dept. manages contract data) data saved to/loaded from local files by individual applications (one file per one application instance) data processing and querying is done by individual applications (the application interprets the data) possible data transfers between site/domains are done manually (manual merging and conflict solving, versioning by backup files, etc.) Marek Rychly ADES, 24 August 2015 11 / 34
File-based Decentralized Storage II (adopted from Connolly & Begg: Database Systems, 2005, p. 11. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 12 / 34
File-based Decentralized Storage III Advantages and Disadvantages + easy to implement (an application just save data to or load data from a file) + full control over data (local data storage is easy to control, secure, and backup) separation and isolation of data (it is more difficult to access data from different locations or to combine them) duplication of data (the same data stored in different locations, difficult to control/synchronize them) program-data dependence (structure of the data stored is limited by a related application/program, it is difficult to modify the structure without modification of the program) incompatible file formats (different applications store data in different formats, e.g., little/big endian) fixed queries/proliferation of application programs (data are queried by their applications which implement just limited/currently required set of queries, it is difficult to implement another queries) Marek Rychly ADES, 24 August 2015 13 / 34
How to Store Data? Database Approach Definition (Database, DB) A shared collection of logically related data, and a description of this data (i.e., meta-data, a system catalogue), designed to meet the information needs of an organization (to answer queries). Definition (Database Management System, DBMS, DB server) A software system that enables users to define, create, maintain, and control access to the database (it manages data and meta-data). Definition (Database Application Program, DB client) A computer program that interacts with the database by issuing an appropriate request (typically an SQL statement) to the DBMS. Marek Rychly ADES, 24 August 2015 14 / 34
Database (DB) An Example of Entity-Relationship Diagram (entities, attributes, relationships) (adopted from Connolly & Begg: Database Systems, 2005, p. 16. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 15 / 34
Database Management System and App. Program DB Clients Accessing Data by Means of Centralized DBMS (adopted from Connolly & Begg: Database Systems, 2005, p. 17. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 16 / 34
Database Approach Advantages and Disadvantages Course Content + centralized approach (data are stored in one location, without duplicities, no separation/isolation) + data abstraction (structure of the data is not affected by related applications/programs, it respects well-established practices/normal forms) + standardized interface (DBMS provides a standardized interface to access data, e.g, SQL 2003) + performance/security tuning (DBMS is optimized for performance and security, applications need not to care) single point of failure (failure on a DBMS server affects all its clients; solved by replication/distribution) limited scalability (scalability by upgrading a DBMS server or a cluster, which is not trivial) issues related to concurrent access to/modification of data (multiple concurrent modifications of single data need to be serialized, long-running modifications may require locking the data) Marek Rychly ADES, 24 August 2015 17 / 34
DBMS Environment Course Content 1 hardware (HW servers where DBMSs are running, communication infrastructure, storage) 2 software (DBMS software, e.g., Oracle 12c, Microsoft Server, PostgreSQL, MySQL, etc.) 3 data (stored data and their meta-data) 4 procedures (procedures manipulating the data, invoked on demand or triggered automatically) 5 people (a data/db administrator, logical/physical DB designers app. developer, end-user) (adopted from Connolly & Begg: Database Systems, 2005, p. 19. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 18 / 34
The Three-Level ANSI-SPARC Architecture (adopted from Connolly & Begg: Database Systems, 2005, p. 35. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 19 / 34
Three Types of Database Schema in ANSI-SPARC 1 External level/external Schemas (different external views of the database for different users/applications) data displayed by an application to its users (tables, forms) results of user queries of data in the database (SELECTs) database views (predefined SELECTs) 2 Conceptual level/conceptual Schema/Logical model (structure of data stored in the database, describes what the data is stored) all entities, their attributes, and their relationships keys and constraints on the data security and integrity information 3 Internal level/internal schema/physical model (physical representation of the database, describes how the data is stored) storage space allocation for data and indexes exact data-types for all attributes and their storage size data encoding, compression, and encryption Changes in lower levels do not affect levels above. (there is mapping of data in lower levels to their representation in upper levels) Marek Rychly ADES, 24 August 2015 20 / 34
Three Types of Database Schema in ANSI-SPARC Example of the Database Schema in Three Different Levels (adopted from Connolly & Begg: Database Systems, 2005, p. 38. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 21 / 34
Data Model and Data Modelling Definition (Data model) A high-level description of a database schema as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization. a data model consists of three components 1 structure of the data (objects/entities, attributes, relationships) 2 operations manipulating the data (methods, event handlers) 3 integrity constraints (for correct and accurate data) a data model respect different levels of ANSI-SPARC architecture (external, conceptual/logical, and internal/physical data models) the process of construction of a data model is data modelling there exists different types of data models (i.e., data modelling) (e.g., entity-relationship, object, rational, etc.) Marek Rychly ADES, 24 August 2015 22 / 34
Types of Data Model and Data Modelling there were proposed several categories of data models: 1 object-based data models (e.g., entity-relationship, object, functional, and semantic data models) 2 record-based data models (e.g., relational, network, and hierarchical data models) 3 physical data models (e.g., unifying model, frame model) we will focus on entity-relationship and relational data models (these are the most popular models in current databases and DBMS) moreover, we will focus on conceptual modelling (that is on the process of creation of a conceptual model, in terms of ANSI-SPARC) Marek Rychly ADES, 24 August 2015 23 / 34
A Sample Database Schema in Relational Data Model (adopted from Connolly & Begg: Database Systems, 2005, p. 45. ISBN 0-321-21025-5 ) Marek Rychly ADES, 24 August 2015 24 / 34
Classification of DBMS DBMSs can be classified based on several criteria based on data model network, hierarchical (traditional models not commonly used due to their complexity) rational, object-relational (well-known DBMSs like Oracle, MS SQL Server, DB2, PostgreSQL, MySQL/MariaDB, etc.) object-oriented, XML, document-oriented, key-value, etc. (suitable for particular applications) based on user numbers (single-user, multi-user) based on database distribution (embedded, centralized, homogeneous distributed, heterogeneous distributed) Marek Rychly ADES, 24 August 2015 25 / 34
Eight Functions of Full-scale DBMS (by Codd, 1982) 1 data storage, retrieval, and update (in different external views) 2 a user-accessible system catalogue (meta-data) 3 transaction support (atomicity, consistency, integrity, durability) 4 concurrency control services (serialization of concurrent requests) 5 recovery services (data consistency even on failure) 6 authorization services (security) 7 support for data communication (requests as comm. messages) 8 integrity services (with respect to defined constraints) Marek Rychly ADES, 24 August 2015 26 / 34
System Architecture with DBMS DBMS implemented as a database server in multi-tier architecture. Two-tier client-server architecture: 1 client (user interface and partially also business and data processing logic) 2 database server (data validation and access, partially business and data processing logic) Three-tier client-server architecture: 1 client (user interface, no business and data processing logic, i.e. it is a thin client ) 2 application server (business and data processing logic) 3 database server (data validation and access) The app. server may implement a TP Monitor to access DBMS. (Transaction Processing Monitor provides transaction routing, distribution, load balancing, funnelling, and increased reliability by employing multiple DBMSs) Marek Rychly ADES, 24 August 2015 27 / 34
Transaction Processing Monitor Definition (Transaction Processing Monitor, TP Monitor) A program that controls data transfer between clients and servers to provide a consistent environment, particularly for on-line transaction processing (OLTP). transaction routing (directs transactions/requests to a particular DBMS, to increase scalability, perform load-balancing, or increase reliability a primary DBMS server is inaccessible) managing distributed transactions (performs operations across several heterogeneous DBMSs in single transactions) funnelling (establishes a small number of network connections to DBMSs which are utilized/reused for incoming requests of a large number of clients as needed) Marek Rychly ADES, 24 August 2015 28 / 34
Sets and Set Operations Sets and Relations Functions Definition (Set) A finite or infinite collection of objects in which order has no significance and multiplicity is (generally) also ignored. a 2 A is used to denote that a is an element of a set A. A is a subset of B, A B, iff all members of A are also members of B. (e.g., f2; 3g f1; 2; 3; 4; 5g; an empty set, denoted ;, is a subset of all sets) fxjp(x)g is a set of elements x s.t. a property p is satisfied by x. (e.g., fn 2 Njn <= 3g or fn 2 Z jn is eveng) A [ B = fxjx 2 A or x 2 Bg is an union of sets A and B. (e.g., f1; 2; 3g [ f3; 4; 5g = f1; 2; 3; 4; 5g) A \ B = fxjx 2 A and x 2 Bg is an intersection of sets A and B. (e.g., f1; 2; 3g \ f3; 4; 5g = f3g) B n A = fxjx 2 B and x =2 Ag is a relative complement of A in B. (e.g., f1; 2; 3g n f3; 4; 5g = f1; 2g) Marek Rychly ADES, 24 August 2015 30 / 34
Sets and Relations Functions Cartesian Product and Relations Definition (Cartesian Product) The Cartesian product of two sets A and B, denoted A B (also called the cross product), is defined to be the set of all points (a; b) where a 2 A and b 2 B; i.e., A B = f(a; b)ja 2 A and b 2 Bg. Definition (Relation) A relation is any subset of a Cartesian product. For example, let A = f1; 2; 3g; B = f4; 5g C = A B = f(1; 4); (1; 5); (2; 4); (2; 5); (3; 4); (3; 5)g D = f(a; b)j(a; b) 2 C and a = 2g Then, C is a Cartesian product of A and B, and C and D are relations (of the Cartesian product C). Marek Rychly ADES, 24 August 2015 31 / 34
Functions Course Content Sets and Relations Functions Definition (Function) A function is a relation that uniquely associates members of one set with members of another set. Formally, function f from A to B, denoted f : A! B, uniquely associates every a 2 A with object f (a) 2 B. For example, f : N! N; f (x) = x 2 ; (it is a relation f = f(1; 1); (2; 4); (3; 9); (4; 16); : : :g) g : f1; 2g! f3g; g(1) = 3 and g(2) = 3; (it is a relation g = f(1; 3); (2; 3)g) h : f3g! f1; 2g; h(3) = 1 and h(3) = 2; (it is just a relation h = f(3; 1); (3; 2)g) f is a function. g is a function. h is not a function. Marek Rychly ADES, 24 August 2015 32 / 34
Summary Summary DBMSs are trying to solve issue of file-based storage. (issues: separation, isolation, duplication, program-data dependence, formats, etc.) Database schema can be viewed from multiple levels. (external, conceptual/logical, internal/physical; corresponds to DB schema models) The most common approaches to data model/modelling are entity-relationship (ER) model (based on entities, their attributes and relationships) relational model (based on mathematical relations) conceptual model (based on the conceptual level of ANSI-SPARC architecture) In the next lecture: rational model, key and constraints (mathematical relations and data model; super-key, primary key, foreign key, etc.) Marek Rychly ADES, 24 August 2015 33 / 34
Thanks Thank you for your attention! Marek Rychly <mrychly@strathmore.edu> Marek Rychly ADES, 24 August 2015 34 / 34