Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

Similar documents
Database Design Methodologies

7.1 The Information system

DATABASE NORMALIZATION

Normalization. Purpose of normalization Data redundancy Update anomalies Functional dependency Process of normalization

IT2305 Database Systems I (Compulsory)

Lesson 8: Introduction to Databases E-R Data Modeling

IT2304: Database Systems 1 (DBS 1)

DATABASE DESIGN. - Developing database and information systems is performed using a development lifecycle, which consists of a series of steps.

2. Conceptual Modeling using the Entity-Relationship Model

Database Design. Marta Jakubowska-Sobczak IT/ADC based on slides prepared by Paula Figueiredo, IT/DB

Database Design Methodology

not necessarily strictly sequential feedback loops exist, i.e. may need to revisit earlier stages during a later stage

Lecture 2 Normalization

Introduction to Computing. Lectured by: Dr. Pham Tran Vu

Normalization. Normalization. Normalization. Data Redundancy

Chapter 7 Data Modeling Using the Entity- Relationship (ER) Model


CSC 742 Database Management Systems

- Eliminating redundant data - Ensuring data dependencies makes sense. ie:- data is stored logically

Chapter 3. Data Modeling Using the Entity-Relationship (ER) Model

Lecture 6. SQL, Logical DB Design

Relational Schema Design

Unit 2.1. Data Analysis 1 - V Data Analysis 1. Dr Gordon Russell, Napier University

Introduction to normalization. Introduction to normalization

Data Analysis 1. SET08104 Database Systems. Napier University

BCA. Database Management System

THE ENTITY- RELATIONSHIP (ER) MODEL CHAPTER 7 (6/E) CHAPTER 3 (5/E)

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1

1 File Processing Systems

Designing a Database Schema

Databases What the Specification Says

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

COMP 378 Database Systems Notes for Chapter 7 of Database System Concepts Database Design and the Entity-Relationship Model

Database Concepts. Database & Database Management System. Application examples. Application examples

Fundamentals of Database Design

Entity-Relationship Model

æ A collection of interrelated and persistent data èusually referred to as the database èdbèè.

DATABASE DESIGN: NORMALIZATION NOTE & EXERCISES (Up to 3NF)

Conceptual Design Using the Entity-Relationship (ER) Model

Chapter 3. Database Environment - Objectives. Multi-user DBMS Architectures. Teleprocessing. File-Server

Part 2 - The Database Environment

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3

Foundations of Information Management

Information Systems Analysis and Design CSC John Mylopoulos Database Design Information Systems Analysis and Design CSC340

CS2Bh: Current Technologies. Introduction to XML and Relational Databases. Introduction to Databases. Why databases? Why not use XML?

Chapter 1: Introduction

Chapter 2 Database System Concepts and Architecture

Databases and BigData

DATABASE INTRODUCTION

A brief overview of developing a conceptual data model as the first step in creating a relational database.

Introduction to Databases

Chapter 1: Introduction. Database Management System (DBMS) University Database Example

Lecture Notes INFORMATION RESOURCES

DATABASE MANAGEMENT SYSTEMS. Question Bank:

Database Fundamentals: 1

Database Management Systems. Chapter 1

Database Systems Introduction Dr P Sreenivasa Kumar

Database Design Methodology

GUJARAT TECHNOLOGICAL UNIVERSITY, AHMEDABAD, GUJARAT. COURSE CURRICULUM COURSE TITLE: DATABASE MANAGEMENT (Code: ) Information Technology

Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology

Outline. Data Modeling. Conceptual Design. ER Model Basics: Entities. ER Model Basics: Relationships. Ternary Relationships. Yanlei Diao UMass Amherst

DBMS / Business Intelligence, SQL Server

The process of database development. Logical model: relational DBMS. Relation

Basic Concepts of Database Systems

Review: Participation Constraints

Database Management Systems

DBMS Questions. 3.) For which two constraints are indexes created when the constraint is added?

Databases Model the Real World. The Entity- Relationship Model. Conceptual Design. Steps in Database Design. ER Model Basics. ER Model Basics (Contd.

Database Management. Technology Briefing. Modern organizations are said to be drowning in data but starving for information p.

LOGICAL DATABASE DESIGN

Relational Database Basics Review

Files. Files. Files. Files. Files. File Organisation. What s it all about? What s in a file?

IV. The (Extended) Entity-Relationship Model

Chapter 1: Introduction

What is a database? COSC 304 Introduction to Database Systems. Database Introduction. Example Problem. Databases in the Real-World

Chapter 1: Introduction. Database Management System (DBMS)

2. Basic Relational Data Model

How To Write A Diagram

Concepts of Database Management Seventh Edition. Chapter 6 Database Design 2: Design Method

Data Modeling Basics

Study Notes for DB Design and Management Exam 1 (Chapters 1-2-3) record A collection of related (logically connected) fields.

Modern Systems Analysis and Design

Chapter 10. Practical Database Design Methodology. The Role of Information Systems in Organizations. Practical Database Design Methodology

Fundamentals of Database System

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer

Instant SQL Programming

The Entity-Relationship Model

ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati

Information Systems SQL. Nikolaj Popov

Normalization in Database Design

Chapter 2: Entity-Relationship Model. Entity Sets. " Example: specific person, company, event, plant

Chapter 2. Data Model. Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel

1. INTRODUCTION TO RDBMS

A Tool for Generating Relational Database Schema from EER Diagram

A. TRUE-FALSE: GROUP 2 PRACTICE EXAMPLES FOR THE REVIEW QUIZ:

Database System Concepts

VALLIAMMAI ENGNIEERING COLLEGE SRM Nagar, Kattankulathur

Transcription:

1 Topics for this week: 1. Good Design 2. Functional Dependencies 3. Normalization Readings for this week: 1. E&N, Ch. 10.1-10.6; 12.2 2. Quickstart, Ch. 3 3. Complete the tutorial at http://sqlcourse2.com/ In-class Activity: 1. Review components of DBMS, with an eye to viewing the whole process as systematic activities: conceptual, logical, and physical design. 2. Demonstrate by creating a small conceptual and logical model then translating this into part of the physical model - creating the database and tables using MySQL s terminal window. is demonstrates, too, the DDL. 3. Demonstrate a few commands issued against the data (DML). 4. Review some of your ER questions. Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations No doubt, you ve learned already some of the concepts & terminology of relational database modeling. To get up to speed with what you ve covered, here are some quick notes that may help us harmonize our perspectives. Components of a DBMS: Many roles and activities are involved; not all of which you may have encountered. Here s a view of the components of a DBMS. Notice the different contributions of programmers, users, and the DBA - the database administrator. Note, too, the functions that constitute a DBMS (and where the DDL and DML fit in). Ultimately the commands and functions must be communicated to the computer system itself - via the file manager, various other access methods and buffers before reaching the actual data stored in the relationship databases and tables. Query Processor: transforms queries into a series of low-level instructions directed to the database manager. Database Manager (DM): Interacts with the usersubmitted application programs and queries. e DM accepts queries and examines the external and conceptual schema to determine what conceptual records are required to satisfy the request. e DN places a call to the File Manger to perform the request. File Manager: manipulates the underlying storage files and manages the allocation of storage space on the disk. Actual physical manipulation of the data is passed to the appropriate access method. DML preprocessor: converts DML statements embedded in an application program into a

2 MySQL). standard function call in the host language (e.g., DDL compiler: compiler converts DDL statements into a set of tables containing meta-data. ese tables are then stored in the catalog while control info is stored in the data files headers. Catalog manager: [not the library kind!] - manages access to and maintains the system catalogue; the system catalog is accessed by most of the DBMS components. ere are other important functions in SQL so ware Terms: Relation: Attribute: Domain a relation is a table with columns & rows an attribute is a named column of a relation is the set of allowable values for one or more attributes. e idea of domain is important because it allows the user to define in a central place the meaning and source of values that attributes can hold. [ is leads to the data dictionary; example below.] Tuple: a tuple is a row of a relation. Degree: the degree of a relation is the number of attributes it contains. Cardinality: the cardinality of a relation is the number of tuples (rows) it contains. As rows are added or deleted, cardinality changes. Relational database is a collection or normalized relations. Example of domains for some branches and staff relations: Attribute Bno Street Area Domain Name BRANCH_NO STREET_NAME AREA_NAME Meaning Domain definition The set of all possible branch characters: size 3, range B1- numbers B99 Set of all streets in Boston character: size 25 Set of all local area names character: size 2

3 City Zip and so on... CITY_NAMES ZIP_CODE Names of all towns in Boston Set of zips in Boston character: size 20 character: size 12 Alternative Names: Relation Table File Tuple Row Record Attribute Column Field Now consider the various mathematical combinations from your other readings. ese become important when we consider joins, unions, and other kinds of set operations. Entity-Relationship Modeling ree main items: entity types, relationship types, and attributes. Entity type: an object or concept that is identified by the organization as having an independent existence. Entity: an instance of an entity type that is uniquely identifiable. Weak entity type: an entity type that is existence-dependent on some other entity type. Strong entity type: an entity type that is not existence-dependent on some other entity type. Example: e Staff and Branch items above are strong entities; if there were data about a staff member s next-of-kin, that entity is weak: it exists only if a staff member has a next-of-kin. Attributes are a property of an entity or a relationship type. e Attribute domain is a set of values that may be assigned to that attribute. E.g., Staff->Lname should include only the last names of employees. Some attributes are simple, others composite. A single attribute is composed of a single component with an independent existence. E.g., cannot be further subdivided (e.g., sex, salary; aka atomic attributes ). Composite attributes consist of multiple components; e.g., e Address attribute could be divided into multiple attributes each with their own existence (e.g., street, city, state, zip). Derived attributes are values that are derived from the value of a related attribute or set of attributes, not necessarily in the same entity. For example, paycheck could be derived from hourly_wage * hours_worked. Keys Candidate keys are attributes or sets of attributes that uniquely identify individual occurrences of an entity type. For example, branch number (Bno) is a candidate key for the Branch entity type and has a distinct value for each branch entity. Must be unique for each occurrence (like unique IDs). Primary key is that candidate key selected because of its current and future uniqueness. Composite key is a key consisting of two or more attributes. Diagrammatic Representations Here s an example of Staff, Branch, and Next_of_Kin entities and attributes represented graphically followed by the composition of keys: Staff entity composition: Staff (Staff_No, FName, LName, Address, Tel_No, Sex, DOB, Position, Salary) Primary Key Staff_no Alternate Key FName, LName, DOB Composite Attribute Name (FName, LName) Derived Attribute Total_Staff

4 Branch entity type composition: Branch (Branch_No, Street, Area, City, Zip, Tel_No, Fax_No) Primary Key Branch_No Alternate Key Fax_No Composite Attribute Address (Street, Area, City, Zip) Multi-valued Attribute Tel_No Next_of_Kin entity type composition: Next_of_Kin (NName, Address, Tel_No, Relationship) Figure: Diagrammatic representation compare this chart to the above compositions. Other kinds of relationships: Binary, Ternary, Quaternary... (ternary, quaternary are examples of n-ary)

5 In addition, there s the issue of recursive relationships and views: Cardinality constraints: Cardinality is the number of possible relationships for each participating entity. e most common degree for relationships is binary and the cardinality ratio for binary relationships are one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N). If the business rule permits it, it is possible to have M:N but with an intervening relationship, e.g., Newspaper [M] ---<advertises> --- [N] Rentals Constraints: Sometimes you can indicate when the relationship participation is mandator and when optional. In addition, you can display the minimum and maximum values (e.g., 5, N) means minimum of 5 instances, no maximum). Problems with ER Models ese problems are called connection traps and occur due to a misinterpretation of the meaning of certain relationships. Two main types: fan traps and chasm traps. In general, to identify connection traps, you must ensure that the meaning of a relationship is fully understood and clearly defined. If not, then the representation doesn t reflect the real world issue. Fan trap: is where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous. Usually where two or more 1:M relationships fan out from the

6 sane entity. Chasm map is where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences. [ ere are many more traps; there is also the Enhanced Entity Relationship Model (EER). Normalization: to produce a set of relations with desirable properties, given the data requirements of the organization. A main goal is to remove redundancies (although later you ll see there s a time for including redundant data). ere are also many forms of normalization - we ll look at only 1st, 2nd, and 3rd. We want to achieve several goals: (1) to be able to identify uniquely through single or composite keys every datum in the table or sets of tables. (2) e purpose is to avoid anomalies: insertion (when data is added but not in all the right places, e.g., adding new staff but not to the branch where they work); deletion anomalies (same idea but in reverse); modification anomalies (like adding money to your checking account using the ATM but the new balance doesn t reflect the new cash!). Part of the technique of normalization is understanding how data might depend on each others. Functional dependencies: is is a relationship between attributes in a relation. For example, if A and B are attributes of a relation R, B is functionally dependent upon A (denoted A B), if each value of A is associated with exactly one value of B. E.g., Staff_No -- [Position is functionally dependent on Staff_No] --> Position. For example: Staff number S100 ---------> Manager Fundational dependency is a property of the meaning or semantics of the attributes in a relation. e semantics indicate how attributes relate to one another, and specify the functional dependencies between them. when a functional dependency is present, the dependency is specified as a constraint between them. For example: Let s say B is dependent upon A : if we know the value of A and examine the relationship of the dependency, we find only one value of B in all the rows that have a given value of A. Let s say every staff member has a staff ID # and a job position [S100, Librarian]. at would be represented as StaffNo --> Position. If we know the StaffNo (the data in A), we know the job position (Librarian, the data in B). Conversely we cannot know the data in B (job title = Librarian) unless we know A (StaffNo=S100). [Position attribute is functionally dependent upon Staff_No.] If a person has two job IDs (say login as the department manager or login as the librarian), then then Staff_No is not functionally dependent on Position. [In other words, Jane is the manager and the librarian. She has two ID #s (M5 for manager, S100 for librarian). While knowing staff no. gives us Jane s position, there are many people who are also managers and many who are librarians. Cannot identify uniquely the position. e relationship between Staff_No and Position is 1:1; for each staff number there is only one position. On the other hand, the relationship between Position and Staff_No is 1:M; there are several staff numbers (members of staff) associated with a position. ere is an example of functional dependencies of the Staff_Branch relation [ e ER diagram 2 pages back]. Staff_No --> SName, SAddress, Position, Salary, Branch_no, BAddress, Tel_no Branch_No --> B_Address, Tel_No BAddress -> Branch_No, Tel_No Tel_No --> Branch_No, BAddress Now, to create candidate key(s) for this relation, what attribute or group of attributes uniquely identify each row? Process of normalization: first analyze the relations based on their primary key (or candidate keys) and functional dependencies.

7 1st Normal Form (1NF): e intersection of each row and column contains one and only one value. 2nd Normal Form (2NF): A relation that is in first normal form and every non-primary key attribute is fully functionally dependent on the primary key. [Full functional dependency indicates that if A and B are attributes or a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A] 3rd NF: A relation that is in first and second normal form and in which no non-primary-key attribute is transitively dependent on the primary key. [Transitive dependence is a condition where A, B, and C are attributes of a relation such that if A B and B C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).] Methodology - Conceptual Database Design. e logic to creating databases has 3 major components: conceptual database design (to build the conceptual representations of the DB, including identification of important entity and relationships); logical database design (to translate the conceptual representation to the logical structure of the database, which includes designing the relations); and Physical database design to allow the designer to decide how the logical structure is to be physically implemented (as tables) on the target database management system. Factors for success in database design: Work interactively with users as much as possible ( Joint Application Design - JAD) Follow a structured methodology throughout the data modeling process Employ a data-driven approach incorporate structural and integrity considerations into the data model Combine conceptual, normalization, and transaction validation techniques into the data modeling methods Use diagrams to represent as much of the data models as possible Use a database design language (DBDL) to represent additional data semantics Build a data dictionary to supplement the data model diagrams Be willing to repeat these steps! Overview of the Database Design Methodology Conceptual DB Design Step 1 Build local conceptual model for each user view 1.1. Identify entity types 1.2. Identify relationship types 1.3. Identify and associate attributes with entity or relationship types 1.4. Determine attribute domains 1.5. Determine candidate and primary key attributes 1.6. Specialize/generalize entity types (optional step) 1.7. Draw ER diagram 1.8. Review local conceptual data model with the user Logical database design for the relationship model Step 2: Build and validate local logical data model for each user view 1.1. Map local conceptual data model to local logical data model 1.2. Derive relations from local logical data model 1.3. Validate model using normalization 1.4. Validate model against user transactions

8 1.5. Draw ER diagrams 1.6. Define integrity constraints 1.7. Review local logical data model with user Step 3: Build and validate global logical data model 1.1. Merge local logical data models into a global model 1.2. Validate global logical data model 1.3. Check for future growth 1.4. Draw final ER diagram 1.5. Review with users Physical Database Design Step 4: Translate global local data model for target DBMS 1.1 Design base relations for target DBMS 1.2 Design enterprise constraints for the target DBMS Step 5: Design physical representation 1.1 Analyze transactions 1.2 Choose file organizations 1.3 Choose secondary indices 1.4 Consider the introduction of controlled redundancy 1.5 Estimate disk space requirements Step 6: Design security mechanisms 1.1 Design user views 1.2 Design access rules Step 7: Monitor and tune the operational system * ** * ** * Focusing on the local logical data model (Step 2 above): e goal is to refine the conceptual model to remove undesirable features. 1. Remove M:N relationships: if a M:N is found, try to decompose this relationship to identify an intermediate entry. e M:N relationship becomes two 1:M relationships to the newly identified entity. For example, consider this M:N (Many) newspapers <advertise> (many) rentals. e <advertise> can be decomposed we identify the Advert entity and two new 1:M relationships

9 ( Lists and AdvertisesIn Advert (see image): 2. Remove complex relationships 3. Remove recursive relationships 4. Remove relationships with attributes 5. Remove multi-valued relationships 6. Re-examine 1:1 relationships 7. Remove redundant relationships.