Foundations of Information Management




Foundations of Information Management - WS 2009/10
Junior Professor Alexander Markowetz
Bonn Aachen International Center for Information Technology (B-IT)

Alexander Markowetz
Born 1976 in Brussels, Belgium; raised in Marburg, Germany
Research visits: University of California, Riverside (2000); Polytechnic Institute of New York University (2004 and 2005)
2004: Diplom in Informatik, University of Marburg
2008: PhD in Computer Science, The Hong Kong University of Science and Technology
2009: Assistant Professor in Bonn

My Research
Database systems
Information retrieval (search engines)
At the moment: searching in databases, searching in code repositories, architectures for online games

My Other Life
Scuba diving, yoga, hiking
A lot of time in Asia

Contacts
Room A225, Römerstr. 164
Tel.: 0228 73-7409
alex@iai.uni-bonn.de
http://www.iai.uni-bonn.de/~alex
I will NOT be available directly after class, but usually before. Otherwise, send an email.

Schedule
20 October, 27 October, 3 November, 10 November, 17 November, 24 November, 1 December, 8 December, 15 December, 22 December, 29 December, 5 January, 12 January, 19 January, 26 January, 2 February
Only 13 lectures this semester! Little time for a lot of things.
Consequences:
Work efficiently.
Concentrate on key topics.
There is little time for practical exercises, so you need to practice independently, in every spare minute.

Attendance
You need to attend at least 10 of the 13 lectures (roughly 80%).
You will catch a cold at some point during the winter. Hence, never skip classes intentionally; save those days for when you are really sick.

Homework
The class will include a certain amount of homework and interactive exercises. What and how much will be determined during the semester; we need to see what works best.
Homework and class participation are mandatory to qualify for the final exam.

Exams
One final exam in early February; the exact date will be announced!
We aim at the ECTS grading scheme:
A: 10%
B: 25%
C: 30%
D: 25%
E: 10%

Background Dilemma
There is a broad spectrum of students' backgrounds, from life sciences to computer science (and beyond)!
Some people know (nearly) nothing about information management using computers. Others know something, or even a lot, about databases and information systems.
Be patient! Assist others!

Two very different classes
Bio Databases (Prof. Hofmann-Apitius): application oriented; real examples; few general techniques; from the LSI perspective
Foundations of Information Management: method oriented; general techniques; from the CS perspective

Information and Resources
Quick books (Schaum's Outlines):
R. A. Mata-Toledo, P. K. Cushman: Fundamentals of Relational Databases, 2000.
R. A. Mata-Toledo, P. K. Cushman: Fundamentals of SQL Programming, 2000.
Real database textbooks:
A. Silberschatz, H. F. Korth, S. Sudarshan: Database System Concepts, 2006.
R. Ramakrishnan, J. Gehrke: Database Management Systems, 2003.

Conferences and Journals
This may all be a bit early for you. But if you do read papers, read good ones: only top-10 publications.
You can find most of the papers online:
DBLP: http://www.informatik.uni-trier.de/~ley/db/
Citeseer: http://citeseer.ist.psu.edu/
The ACM Digital Library: http://portal.acm.org/dl.cfm


OK, so let's get started
Foundations of Data Management, which really means: Database Management Systems

Data & Databases
Data: simple information
Database: a collection of interrelated data
Examples:
Banking: all transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases

Database (Management) Systems
A DBMS is software to access data that is convenient and efficient to use.
(Figure: users and application programs access the DB through the DBMS.)

Amazon: A Really Big DBMS
(Figure: a central DBMS with its DB, connected to purchasing, customers, the web shop, warehouse & shipping, external vendors, and advertising.)
Plus many more (external) connections.

Commercial DBMS
The big three: Oracle, IBM DB2, MS SQL Server
Open source: PostgreSQL, MySQL
Others: Sybase, Informix (now IBM), Ingres
Office toys: MS Access

Databases in Life Science
Most databases in the life sciences do not use a DBMS! Hundreds of databases in biology, chemistry, pharmacy, or medicine are based on dedicated (system-specific) text file formats, which come with very limited software support (if any).
This lecture familiarizes you with the ideal of a database plus DBMS, so that you can properly judge how much DBMS you need. There are cases where using a full DBMS would be overkill; sometimes a less powerful system is more appropriate.
There is a big shift towards moving life science databases to stable and powerful general-purpose DBMSs, so you ought to know the basic principles of database technology.
At the end of the lecture, however, we will look at alternatives to (real) database systems.

Before Database Systems
Binary files:
0100 1001 0101 0001 0101 0001 0101 0101 0101 1100 1111 1100 0110
Text files:
01, Alexander, Markowetz, Professor
02, Bob, Benson, Truck Driver
03, Janice, Watson, Nurse

Drawbacks of storing data in files (1)
Data redundancy and inconsistency: multiple file formats; duplication of information in different files.
Difficulty in accessing data: a new program must be written to carry out each new task.
Integrity problems: integrity constraints (e.g. account balance > 0) become part of the program code, and it is hard to add new constraints or change existing ones.
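In a DBMS, a constraint like this moves out of the program code and into the schema. The sketch below is only an illustration. It uses SQLite through Python's built-in sqlite3 module, which is our choice because the lecture does not prescribe a system. The table, column and function names are hypothetical. The rule "account balance > 0" is declared once as a CHECK clause, and the DBMS then rejects every update that would violate it, no matter which program issues the update.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        balance        REAL NOT NULL CHECK (balance > 0)  -- rule lives in the schema
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500.0)")

def withdraw(conn, account_number, amount):
    """Try to withdraw; return True on success, False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, account_number),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint fired; no validation code in the application.
        return False

ok_small = withdraw(conn, "A-101", 100.0)   # balance becomes 400.0
ok_large = withdraw(conn, "A-101", 1000.0)  # would violate balance > 0
balance = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-101'"
).fetchone()[0]
print(ok_small, ok_large, balance)  # True False 400.0
```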

Drawbacks of storing data in files (2)
Atomicity of updates: failures may leave the database in an inconsistent state. E.g. a transfer of funds between accounts should either complete or not happen at all.
Concurrent access by multiple users: concurrent accesses are needed for performance, but uncontrolled concurrent accesses can lead to inconsistencies. E.g. two people read a balance and update it at the same time.
Security problems
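Atomicity can be illustrated in the same setting: SQLite through Python's sqlite3, with made-up accounts and amounts. Both updates of a transfer run inside one transaction. If the debit violates the balance constraint, the DBMS also rolls back the credit that was already applied, so no money appears or disappears. Concurrency control and security are not covered by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, "
             "balance REAL NOT NULL CHECK (balance > 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500.0), ("A-215", 700.0)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between two accounts as one atomic unit of work."""
    try:
        with conn:  # one transaction: commit if both updates succeed, else roll back
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE account_number = ?", (amount, target))
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE account_number = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))

first = transfer(conn, "A-101", "A-215", 200.0)    # succeeds
second = transfer(conn, "A-101", "A-215", 1000.0)  # would overdraw A-101
print(first, second, balances(conn))  # True False {'A-101': 300.0, 'A-215': 900.0}
```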

Example (1)
Alex writes a program to manage the addresses of all students at this university. He uses a text file to store the addresses: Name, Address, Program of Study.
He has to write code that parses the text lines.
He has to write code to ensure that the name of a student cannot become null.
When he wants to add another data field, Age, he has to change all of the above code.
Two separate departments need access to this data, and each keeps its own copy. Over time, the two copies will drift apart and become inconsistent.
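The following Python sketch shows the kind of code Alex has to write by hand. It is hypothetical. The file layout (Name, Address, Program of Study) follows the example, but the sample records and the function name parse_students are invented for illustration. Both the parsing and the "name must not be null" rule live entirely inside this one program.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    address: str
    program: str

def parse_students(text):
    """Hand-written parser for lines of the form: Name, Address, Program of Study."""
    students = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        # Naive split: an address that itself contains a comma breaks this parser.
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        name, address, program = fields
        if not name:
            # The "name must not be null" rule exists only inside this program.
            raise ValueError(f"line {line_no}: name must not be empty")
        students.append(Student(name, address, program))
    return students

data = """Anna Schmidt, Main Street 1 Bonn, Life Science Informatics
Ben Weber, Station Road 2 Bonn, Computer Science
"""
students = parse_students(data)
print(len(students), students[0].program)  # 2 Life Science Informatics

try:
    parse_students(", Somewhere 1, Biology")
    error = None
except ValueError as err:
    error = str(err)
print(error)  # line 1: name must not be empty
```

Adding an Age field would mean changing the field-count check, the Student class, and every other program that reads the file.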

Example (2)
Whenever Alex introduces any change in the data format, he has to change all the above code yet again, at both departments.
When he implements another project for the university, he has to write all the above again.
Still, his code is full of errors, does not allow two users to access the data at the same time, and lacks many other features.
DBMSs solve all the above problems.

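Data Independence
The application program is isolated from the way the data is stored in the DBMS.
The DBMS is isolated from the hardware.
This is achieved in a three-layer architecture: application view, logical level, physical level. Logical independence separates the application view from the logical level; physical independence separates the logical level from the physical level.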
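The following sketch illustrates both kinds of independence. It again uses SQLite through Python's sqlite3 as an assumed stand-in for a DBMS. A view plays the role of the application view. This is one common mechanism, not necessarily the one the lecture has in mind. The table, view and index names are hypothetical. The application query returns the same result after a column is added to the table (a logical change) and after an index is created (a physical change).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the stored table (hypothetical schema).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
             "address TEXT, program TEXT)")
conn.executemany("INSERT INTO student (name, address, program) VALUES (?, ?, ?)",
                 [("Anna Schmidt", "Bonn", "Biology"),
                  ("Ben Weber", "Aachen", "Computer Science")])

# Application view: this program only ever sees names and programs.
conn.execute("CREATE VIEW student_program AS SELECT name, program FROM student")

def app_query(conn):
    """The application's query, written against the view only."""
    return conn.execute(
        "SELECT name FROM student_program WHERE program = 'Biology'").fetchall()

before = app_query(conn)

# Logical change: a new column in the underlying table.
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")
# Physical change: an index to speed up lookups by program.
conn.execute("CREATE INDEX idx_student_program ON student(program)")

after = app_query(conn)
print(before == after, after)  # True [('Anna Schmidt',)]
```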

Parts of a Database
Database schema: metadata, i.e. data about data. It describes the structure of the data: which sets (tables) of data there are, and which data fields (attributes) they contain.
Database instance: the actual data stored in the database at this moment!
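The following small sketch illustrates the distinction using SQLite through Python's sqlite3. The names are hypothetical. SQLite exposes the schema as metadata through its sqlite_master table and the table_info pragma. The instance is simply whatever rows are stored at the moment. Inserting a row changes the instance but leaves the schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def schema(conn):
    """Metadata: which tables exist and which attributes they contain."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}

def instance(conn):
    """The actual data stored in the database at this moment."""
    return conn.execute("SELECT emp_no, name FROM employee ORDER BY emp_no").fetchall()

schema_1, instance_1 = schema(conn), instance(conn)
conn.execute("INSERT INTO employee VALUES (123456, 'John Wong')")
schema_2, instance_2 = schema(conn), instance(conn)

print(schema_1 == schema_2)    # True: the schema did not change
print(instance_1, instance_2)  # [] [(123456, 'John Wong')]
```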

Database Design (1)
1) Requirements analysis: analyze the real world, user needs and requirements (an informal process: client interviews, etc.)
2) Conceptual design: a high-level description of the data to be stored; results in an ER model
3) Logical design: convert the conceptual design into a relational database schema

Database Design (2)
4) Schema refinement: analyze and refine the logical schema, guided by powerful and elegant theory
5) Physical design: address database performance, e.g. create indexes
6) Application and security design

Interacting with a Database
Data Definition Language (DDL): describes the schema.
Data Manipulation Language (DML): inserts, deletes and updates data objects, and retrieves data (queries the database).
There are graphical tools as well; these too fall into the above categories.
SQL comprises both a DDL and a DML.
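As a minimal illustration of the two sublanguages, the following sketch runs SQL through Python's sqlite3 module. SQLite is an assumption, and the customer table and its rows are made up. The CREATE TABLE statement is DDL; the INSERT, UPDATE, DELETE and SELECT statements are DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the schema.
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, "
             "name TEXT NOT NULL, city TEXT)")

# DML: insert, update and delete data objects ...
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("C1", "Jones", "Bonn"), ("C2", "Smith", "Aachen"),
                  ("C3", "Hayes", "Bonn")])
conn.execute("UPDATE customer SET city = 'Cologne' WHERE customer_id = 'C2'")
conn.execute("DELETE FROM customer WHERE customer_id = 'C3'")

# ... and retrieve data (query the database).
rows = conn.execute("SELECT name, city FROM customer ORDER BY customer_id").fetchall()
print(rows)  # [('Jones', 'Bonn'), ('Smith', 'Cologne')]
```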

Thinking Databases
As seen above, there are many benefits to using a DBMS. However, there is one more benefit, namely formal tools:
Entity-relationship diagrams: a formal way to design data
Relational algebra: a formal way to query data

Basic Concepts of ER
A database can be modeled as a collection of entities and relationships among entities.
Entity: an object that exists independently and is distinguishable from other objects, e.g. an employee, a company, a car, a student, a class.
Color, age, etc. are not entities (they are properties, i.e. attributes).

Entity set: a set of entities of the same type, e.g. a set of employees or a set of departments. Entity sets are also called entity types.
(Figure: the entity type Employee is a general specification; the entity set {e1, e2, e3} contains the actual employees.)

Attributes
Properties of an entity: name, address, weight and height are properties of a Person entity.
Properties of relationships: date of marriage is a property of the relationship Marriage.

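Types of Attributes
Simple attribute: contains a single value.
(Figure: entity Employee with simple attributes EmpNo, Name and Address.)

Composite attributes: consist of sub-attributes.
(Figure: entity Employee with attributes EmpNo and Name, and a composite attribute Address made up of Street, City and Country.)

Multivalued attributes: can hold more than one value.
(Figure: entity Employee with multivalued attributes Phone and Email.)

Derived attributes: computed from other attributes.
(Figure: entity Employee with derived attribute Age, computed from Date of birth.)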
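The following hypothetical Python sketch shows one way to picture these attribute types in a programming language. It is an analogy only and not part of the ER notation. Address is a composite attribute, phones and emails are multivalued, and age is derived from date_of_birth on demand rather than stored. All values are invented. The age method takes the reference date as a parameter so that the result is reproducible.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:              # composite attribute: made up of sub-attributes
    street: str
    city: str
    country: str

@dataclass
class Employee:
    emp_no: int             # simple attribute (also the key)
    name: str               # simple attribute
    address: Address        # composite attribute
    date_of_birth: date     # stored attribute
    phones: list[str] = field(default_factory=list)  # multivalued attribute
    emails: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, on: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (on.month, on.day) >= (self.date_of_birth.month,
                                              self.date_of_birth.day)
        return on.year - self.date_of_birth.year - (0 if had_birthday else 1)

e = Employee(123456, "John Wong", Address("Main Street 1", "Bonn", "Germany"),
             date(1980, 11, 3), phones=["0228 111", "0228 222"])
print(e.age(on=date(2009, 10, 20)))  # 28 (birthday not yet reached in 2009)
```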

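Key Attributes
A key is a set of attributes that uniquely identifies an entity.
(Figure: in the ERD, Employee has the key attribute EmpNo and the attribute Name.)
In tabular form:
EmpNo    Name          ...
123456   John Wong     ...
456789   Mary Cheung   ...
146777   John Wong     ...
Name alone is not a key: two different employees are called John Wong.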

Key Attributes
Composite key: Name or Address alone cannot uniquely identify a student, but together they can!
(Figure: entity Student with the composite key Name, Address.)

Key Attributes
An entity may have more than one key.
Candidate key: a minimal set of attributes that uniquely identifies an entity.
Primary key: one candidate key is selected to be the primary key.
Sometimes artificial keys may be created, e.g. we can enumerate all employees in a company.
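The definitions can be tried out on a concrete table. This Python sketch checks which attribute sets uniquely identify the rows of a given instance, and keeps only the minimal ones. The function names are hypothetical. The sketch reuses the EmpNo and Name rows from the employee table and adds invented Address values. Note that a single instance can only show that a set of attributes is not a key. Whether a set really is a key is a property of the schema, which the designer decides for all possible instances.

```python
from itertools import combinations

# EmpNo/Name rows from the employee table; the Address values are hypothetical.
employees = [
    {"EmpNo": 123456, "Name": "John Wong",   "Address": "Kowloon"},
    {"EmpNo": 456789, "Name": "Mary Cheung", "Address": "Sha Tin"},
    {"EmpNo": 146777, "Name": "John Wong",   "Address": "Sha Tin"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs (in this instance)."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, attributes):
    """Minimal superkeys: no proper subset is itself a superkey."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Supersets of keys already found are not minimal; skip them.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

print(candidate_keys(employees, ["EmpNo", "Name", "Address"]))
# [('EmpNo',), ('Name', 'Address')]
```

Any of the candidate keys found could then be chosen as the primary key. Here EmpNo is the natural choice.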

Example Entity (Customer)

Relationship
A relationship is an association among several entities.
The degree of a relationship set is the number of entity sets that participate in it.
Binary: two entity sets
More than two entity sets: rare

Example of (Binary) Relationship
Borrower is a relationship between Customers and Loans.
A customer can be associated with one or more loans, and vice versa.

Relationship Sets with Attributes
Depositor is a relationship between Customers and Accounts.
Access-date is an attribute of Depositor.

Cardinality Constraints
We express cardinality constraints by drawing either a directed line (→), signifying "one", or an undirected line (no arrowhead), signifying "many", between the relationship set and the entity set.
E.g., one-to-one relationship: a customer is associated with at most one loan via the relationship borrower, and a loan is associated with at most one customer via borrower.

One-To-Many Relationship
In the one-to-many relationship, a loan is associated with at most one customer via borrower, and a customer is associated with several (including 0) loans via borrower.

Many-To-One Relationships
In a many-to-one relationship, a loan is associated with several (including 0) customers via borrower, and a customer is associated with at most one loan via borrower.

Many-To-Many Relationship
A customer is associated with several (possibly 0) loans via borrower, and a loan is associated with several (possibly 0) customers via borrower.
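When the borrower relationship set is turned into a table, the cardinality shows up in the choice of key. The following sketch is an illustration in SQLite through Python's sqlite3, with a hypothetical schema. With loan_number alone as the key, a loan can have at most one customer (one-to-many from customer to loan). With the pair (customer_id, loan_number) as the key, any combination is allowed (many-to-many).

```python
import sqlite3

def make_db(cardinality):
    """Create a borrower table whose key encodes the given cardinality."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY)")
    if cardinality == "one-to-many":
        key = "PRIMARY KEY (loan_number)"               # each loan: at most one customer
    elif cardinality == "many-to-many":
        key = "PRIMARY KEY (customer_id, loan_number)"  # each pair at most once
    else:
        raise ValueError(cardinality)
    conn.execute(f"""CREATE TABLE borrower (
        customer_id TEXT NOT NULL REFERENCES customer (customer_id),
        loan_number TEXT NOT NULL REFERENCES loan (loan_number),
        {key})""")
    conn.executemany("INSERT INTO customer VALUES (?)", [("C1",), ("C2",)])
    conn.executemany("INSERT INTO loan VALUES (?)", [("L-17",), ("L-23",)])
    return conn

def try_links(conn, links):
    """Insert (customer, loan) pairs; return those the DBMS accepted."""
    accepted = []
    for pair in links:
        try:
            conn.execute("INSERT INTO borrower VALUES (?, ?)", pair)
            accepted.append(pair)
        except sqlite3.IntegrityError:
            pass  # rejected: violates the cardinality encoded in the key
    return accepted

links = [("C1", "L-17"), ("C1", "L-23"), ("C2", "L-17")]
one_to_many = try_links(make_db("one-to-many"), links)
many_to_many = try_links(make_db("many-to-many"), links)
print(one_to_many)   # [('C1', 'L-17'), ('C1', 'L-23')]: L-17 gets no second customer
print(many_to_many)  # all three pairs accepted
```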

Participation of an Entity Set in a Relationship Set
Total participation (indicated by a double line): every entity in the entity set participates in at least one relationship in the relationship set. E.g. participation of loan in borrower is total: every loan must have a customer associated with it via borrower.
Partial participation: some entities may not participate in any relationship in the relationship set. E.g. participation of customer in borrower is partial.

Alternative Cardinality Notation Cardinality limits can also express participation constraints

Roles
Entity sets of a relationship need not be distinct.
The labels "manager" and "worker" are called roles; they specify how employee entities interact via the works-for relationship set.
Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.
Role labels are optional and are used to clarify the semantics of the relationship.

Keys for Relationship Sets
The combination of the primary keys of the participating entity sets forms a superkey of a relationship set: (customer-id, account-number) is a superkey of depositor.
This means that a pair of entities can have at most one relationship in a particular relationship set. E.g. if we wish to track all access dates to each account by each customer, we cannot create a separate relationship for each access. Solution: use a multivalued attribute for the access dates.
We must consider the mapping cardinality of the relationship set when deciding on the candidate keys.
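The access-date limitation can be observed directly in the following sketch, which uses SQLite through Python's sqlite3. Names and dates are hypothetical. With (customer_id, account_number) as the key of depositor, a second row for the same pair is rejected. One common relational mapping stores the multivalued attribute in a separate table, and that table can record every access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# depositor keyed by the primary keys of the participating entity sets.
conn.execute("""CREATE TABLE depositor (
    customer_id    TEXT NOT NULL,
    account_number TEXT NOT NULL,
    access_date    TEXT,
    PRIMARY KEY (customer_id, account_number))""")

conn.execute("INSERT INTO depositor VALUES ('C1', 'A-101', '2009-10-20')")
try:
    conn.execute("INSERT INTO depositor VALUES ('C1', 'A-101', '2009-10-27')")
    second_access_stored = True
except sqlite3.IntegrityError:
    second_access_stored = False  # at most one relationship per (customer, account)

# The multivalued attribute access_dates, mapped to a table of its own.
conn.execute("""CREATE TABLE depositor_access (
    customer_id    TEXT NOT NULL,
    account_number TEXT NOT NULL,
    access_date    TEXT NOT NULL,
    PRIMARY KEY (customer_id, account_number, access_date),
    FOREIGN KEY (customer_id, account_number)
        REFERENCES depositor (customer_id, account_number))""")
conn.executemany("INSERT INTO depositor_access VALUES (?, ?, ?)",
                 [("C1", "A-101", "2009-10-20"), ("C1", "A-101", "2009-10-27")])
dates = [r[0] for r in conn.execute(
    "SELECT access_date FROM depositor_access ORDER BY access_date")]
print(second_access_stored, dates)  # False ['2009-10-20', '2009-10-27']
```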

Ternary Relationships
Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between the entity sets employee, job and branch.

Binary vs. Non-Binary Relationships
Some relationships that appear to be non-binary may be better represented using binary relationships. E.g. a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother. Using two binary relationships allows partial information (e.g. only the mother being known).
But there are some relationships that are naturally non-binary, e.g. works-on.
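The following sketch contrasts the two cases in SQLite through Python's sqlite3. It is hypothetical, and all names and values are invented. works_on stays one ternary table, because each fact genuinely involves an employee, a job and a branch. For parents, two binary tables, mother and father, can record a child whose father is unknown. A single ternary table with three mandatory columns could not hold that child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Naturally ternary: each row relates one employee, one job and one branch.
conn.execute("""CREATE TABLE works_on (
    employee_id TEXT NOT NULL,
    job         TEXT NOT NULL,
    branch      TEXT NOT NULL,
    PRIMARY KEY (employee_id, job, branch))""")
conn.executemany("INSERT INTO works_on VALUES (?, ?, ?)",
                 [("E1", "teller", "Bonn"), ("E1", "auditor", "Aachen")])
jobs = conn.execute("SELECT branch, job FROM works_on "
                    "WHERE employee_id = 'E1' ORDER BY branch").fetchall()

# parents replaced by two binary relationships, each stored on its own.
conn.execute("CREATE TABLE mother (child TEXT PRIMARY KEY, mother TEXT NOT NULL)")
conn.execute("CREATE TABLE father (child TEXT PRIMARY KEY, father TEXT NOT NULL)")
conn.execute("INSERT INTO mother VALUES ('Kim', 'Ann')")  # father unknown: no row needed

known = conn.execute("""SELECT m.child, m.mother, f.father
                        FROM mother AS m LEFT JOIN father AS f
                        ON f.child = m.child""").fetchall()
print(jobs)   # [('Aachen', 'auditor'), ('Bonn', 'teller')]
print(known)  # [('Kim', 'Ann', None)]
```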

Weak Entity Sets
An entity set that does not have a primary key is referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an identifying entity set: it must relate to the identifying entity set via a total, one-to-many relationship set from the identifying to the weak entity set. The identifying relationship is depicted using a double diamond.
The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of the weak entity set that depend on the same identifying entity.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set's discriminator.

Weak Entity Sets (Cont.)
We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
payment-number is the discriminator of the payment entity set.
The primary key for payment is (loan-number, payment-number).
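The following sketch shows one possible relational rendering of the payment example in SQLite through Python's sqlite3. Column types and sample values are assumptions. The primary key combines the owner's key, loan_number, with the discriminator, payment_number, so payment number 1 may occur for different loans. A foreign key approximates existence dependence: a payment for a nonexistent loan is rejected, and deleting a loan deletes its payments. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount REAL)")
conn.execute("""CREATE TABLE payment (
    loan_number    TEXT NOT NULL REFERENCES loan (loan_number) ON DELETE CASCADE,
    payment_number INTEGER NOT NULL,      -- discriminator (partial key)
    payment_amount REAL,
    PRIMARY KEY (loan_number, payment_number))""")

conn.execute("INSERT INTO loan VALUES ('L-17', 1000.0), ('L-23', 2000.0)")
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)",
                 [("L-17", 1, 50.0), ("L-17", 2, 50.0), ("L-23", 1, 75.0)])

try:
    conn.execute("INSERT INTO payment VALUES ('L-99', 1, 10.0)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False  # a payment cannot exist without its loan

conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")  # cascades to payments
remaining = conn.execute("SELECT loan_number, payment_number FROM payment").fetchall()
print(orphan_allowed, remaining)  # False [('L-23', 1)]
```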

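Another Example of a Weak Entity Type
(Figure: Employee, with key EmpNo, linked through the identifying relationship Emp_Dep to the weak entity type Dependent, with attributes Name and Age.)
Dependent has no key of its own: a child may not be old enough to have a passport number. Even if he/she has one, the company may not be interested in keeping it in the database.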

Summary of Symbols (Cont.)

Design Decisions - Attribute vs Entity
For each employee, we want to store the office number, the location of the office (e.g., Building A, floor 6), and the telephone number. Several employees share the same office.
Office as attribute: Employee (Employee_id, Name, Office_number, Office_location, Office_phone)
Office as entity: Employee (Employee_id, Name) is related to Office (Office_number, Office_location, Office_phone)
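The practical difference shows up when a shared office changes. This sketch is an illustration in SQLite through Python's sqlite3, and the office numbers and phone numbers are invented. Updating the office phone touches every employee row in the attribute design, but only one row in the entity design. If one of those employee rows were missed, the data would become inconsistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design A: office details stored as attributes of each employee.
conn.execute("""CREATE TABLE employee_a (
    employee_id TEXT PRIMARY KEY, name TEXT,
    office_number TEXT, office_location TEXT, office_phone TEXT)""")
conn.executemany("INSERT INTO employee_a VALUES (?, ?, ?, ?, ?)", [
    ("E1", "Jones", "A-6.12", "Building A, floor 6", "0228-100"),
    ("E2", "Smith", "A-6.12", "Building A, floor 6", "0228-100"),
])

# Design B: office as an entity of its own, referenced by employees.
conn.execute("""CREATE TABLE office (
    office_number TEXT PRIMARY KEY, office_location TEXT, office_phone TEXT)""")
conn.execute("""CREATE TABLE employee_b (
    employee_id TEXT PRIMARY KEY, name TEXT,
    office_number TEXT REFERENCES office (office_number))""")
conn.execute("INSERT INTO office VALUES ('A-6.12', 'Building A, floor 6', '0228-100')")
conn.executemany("INSERT INTO employee_b VALUES (?, ?, ?)",
                 [("E1", "Jones", "A-6.12"), ("E2", "Smith", "A-6.12")])

# The shared office gets a new phone number.
rows_touched_a = conn.execute("UPDATE employee_a SET office_phone = '0228-200' "
                              "WHERE office_number = 'A-6.12'").rowcount
rows_touched_b = conn.execute("UPDATE office SET office_phone = '0228-200' "
                              "WHERE office_number = 'A-6.12'").rowcount
print(rows_touched_a, rows_touched_b)  # 2 1
```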

ER Design Decisions - Entity vs Relationship
Account example: can you see some differences (e.g., can you have accounts without a customer)?
Account as an entity: Customer, Account and Branch are entity sets; Account is related to Customer and to Branch.
Account as a relationship: Account is a relationship set between Customer and Branch.

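ER Design Decisions - Entity vs Relationship
You want to record the period during which an employee works for some department.
(Figure: two alternatives relating Employees (ssn, name, lot) and Departments (did, dname, budget). In one, the relationship Works_In2 carries the attributes from and to; in the other, the period is modeled as an entity set Duration (from, to) that participates in the relationship Works_In3.)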

ER Design Decisions - Strong vs. Weak Entity
Example: what if, in the accounts example, an account must be associated with exactly one branch, and two different branches are allowed to have accounts with the same number?
(Figure: Account, with attribute Number, related to Branch, with attribute Branch_id.)