Database models and specimen-level databases 4/30/11



Similar documents
TIM 50 - Business Information Systems

1. INTRODUCTION TO RDBMS

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

Foundations of Business Intelligence: Databases and Information Management

Physical Database Design Process. Physical Database Design Process. Major Inputs to Physical Database. Components of Physical Database Design

DATABASE MANAGEMENT SYSTEMS. Question Bank:

Databases and Information Management

Introduction to Computing. Lectured by: Dr. Pham Tran Vu

Relational Database Concepts

Database Design for the Uninitiated CDS Brownbag Series CDS

DATABASE MANAGEMENT SYSTEM

ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati

Bridge from Entity Relationship modeling to creating SQL databases, tables, & relations

Introduction to the Data Migration Framework (DMF) in Microsoft Dynamics WHITEPAPER

- Suresh Khanal. Microsoft Excel Short Questions and Answers 1

Guideline for Implementing the Universal Data Element Framework (UDEF)

A. TRUE-FALSE: GROUP 2 PRACTICE EXAMPLES FOR THE REVIEW QUIZ:

Microsoft Access Part I (Database Design Basics) ShortCourse Handout

Databases in Organizations

Database IST400/600. Jian Qin. A collection of data? A computer system? Everything you collected for your group project?

Data Modeling Basics

Getting Started with Access 2007

12 File and Database Concepts 13 File and Database Concepts A many-to-many relationship means that one record in a particular record type can be relat

Qlik REST Connector Installation and User Guide

Database Design and Normalization

Module 3: File and database organization


CS2Bh: Current Technologies. Introduction to XML and Relational Databases. The Relational Model. The relational model

Topics. Database Essential Concepts. What s s a Good Database System? Using Database Software. Using Database Software. Types of Database Programs

Relational Database Basics Review

Practical Database Design

Tutorial on Relational Database Design


Database Setup. Coding, Understanding, & Executing the SQL Database Creation Script

Access Tutorial 2 Building a Database and Defining Table Relationships

Foundations of Information Management

Welcome to the Data Analytics Toolkit PowerPoint presentation on EHR architecture and meaningful use.

Basics of Dimensional Modeling

BUILDING A BIODIVERSITY CONTENT MANAGEMENT SYSTEM FOR SCIENCE, EDUCATION, AND OUTREACH

Name-based Approach to Build a Hub for Biodiversity LOD

Stored Documents and the FileCabinet

Databases What the Specification Says

Basic Concepts of Database Systems

The Relational Model. Why Study the Relational Model? Relational Database: Definitions

Concepts of Database Management Seventh Edition. Chapter 6 Database Design 2: Design Method

Relational model. Relational model - practice. Relational Database Definitions 9/27/11. Relational model. Relational Database: Terminology

Assign: Unit 1: Preparation Activity page 4-7. Chapter 1: Classifying Life s Diversity page 8

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS. Learning Objectives

Fundamentals of Database Design

When to consider OLAP?

Object Oriented Databases. OOAD Fall 2012 Arjun Gopalakrishna Bhavya Udayashankar

Model Driven Laboratory Information Management Systems Hao Li 1, John H. Gennari 1, James F. Brinkley 1,2,3 Structural Informatics Group 1

Microsoft Access Basics

1. Domain Name System

Use Find & Replace Commands under Home tab to search and replace data.

SQL DATA DEFINITION: KEY CONSTRAINTS. CS121: Introduction to Relational Database Systems Fall 2015 Lecture 7

7. Databases and Database Management Systems

Information Technology Services Kennesaw State University

Ken Goldberg Database Lab Notes. There are three types of relationships: One-to-One (1:1) One-to-Many (1:N) Many-to-Many (M:N).

The Entity-Relationship Model

Study Notes for DB Design and Management Exam 1 (Chapters 1-2-3) record A collection of related (logically connected) fields.

DBMS / Business Intelligence, SQL Server

The Relational Data Model: Structure

Database Management Systems. Rebecca Koskela DataONE University of New Mexico

Jet Data Manager 2012 User Guide

CHAPTER 5: BUSINESS ANALYTICS

The Relational Model. Why Study the Relational Model? Relational Database: Definitions. Chapter 3

Filter by Selection button. Displays records by degree to which they match the selected record. Click to view advanced filtering options

The process of database development. Logical model: relational DBMS. Relation

Technology in Action. Alan Evans Kendall Martin Mary Anne Poatsy. Eleventh Edition. Copyright 2015 Pearson Education, Inc.

A Basic introduction to Microsoft Access

Challenges in digital preservation: Relational databases

Attribute Data and Relational Database. Lecture 5 9/21/2006

Doing database design with MySQL

Course MIS. Foundations of Business Intelligence

8. DATABASES 8.1 WIKIPEDIA ON DATABASES: MSF 574: NO NEED TO DO THESE PROJECTS.

Consider the possible problems with storing the following data in a spreadsheet:

The Relational Model. Ramakrishnan&Gehrke, Chapter 3 CS4320 1

Course Scheduling Support System

Toad for Data Analysts, Tips n Tricks

Planning and Creating a Custom Database

Encoding Library of Congress Subject Headings in SKOS: Authority Control for the Semantic Web

Network Activity D Developing and Maintaining Databases

CDC UNIFIED PROCESS PRACTICES GUIDE

Normalization. Functional Dependence. Normalization. Normalization. GIS Applications. Spring 2011

- 1 - Guidance for the use of the WEB-tool for UWWTD reporting

- Eliminating redundant data - Ensuring data dependencies makes sense. ie:- data is stored logically

Data warehousing with PostgreSQL

ICAB4136B Use structured query language to create database structures and manipulate data

Optimum Database Design: Using Normal Forms and Ensuring Data Integrity. by Patrick Crever, Relational Database Programmer, Synergex

MySQL for Beginners Ed 3

14 Databases. Source: Foundations of Computer Science Cengage Learning. Objectives After studying this chapter, the student should be able to:

N o r m a l i z a t i o n

DATABASE INTRODUCTION

IT2305 Database Systems I (Compulsory)

Lesson 8: Introduction to Databases E-R Data Modeling

Oracle Database 10g: Introduction to SQL

Access Database Design

Introducing the Database

Foundations of Business Intelligence: Databases and Information Management

Transcription:

Database models and specimen-level databases 4/30/11

TDWIG and Darwin Core Biodiversity Information Standards (TDWG), also known as the Taxonomic Databases Working Group, is a scientific and educational association affiliated with the International Union of Biological Sciences. TDWG was formed to establish international collaboration among biological database projects and promotes the wider and more effective dissemination of information about biological organisms for the benefit of the world at large. Biodiversity Information Standards (TDWG) now focuses on the development of standards for the exchangeof biological/biodiversity data.

Darwin Core The Darwin Core is body of standards. It includes a glossary of terms (properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. Each term has a definition and commentaries that are meant to promote the consistent use of the terms across applications and disciplines. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. It is based on the standards developed by the Dublin Core Metadata Initiative

Simple Darwin Core The Simple Darwin Coreis a predefined subset of the terms that have common use across a wide variety of biodiversity applications. The terms used in the Simple DarwinCore are those that are found at the cross-section of taxonomic names, places, and events that document biological occurrences on the planet. The two driving principles are simplicity and flexibility. The Simple Darwin Corehas no restrictions on which fields are required. A few general guiding principles on how to make the best use of the Simple Darwin Core: Any Darwin Core term name can be used as a field name. No field name may be repeated in a record. Provide data in as many fields as you can. Populate fields with data that match the definition of the field. Use the controlled vocabulary for the values of fields that recommend them.

What is a database A database is a system intended to organize, store, and retrieve data This is achieved using a database management system a set of programs that controls the creation, maintenance, and the use of a database Database management systems can use different database models, which describe ways in which data are structured and what operations can be performed on the data

Database models Flat file Hierarchical Network Relational Object-oriented

The flat file database model Data arrangement consists of a series of columns (=attributes, fields) and rows (=tuples) organized into a tabular format There exists no relationships or links between records and fields except the table structure One row = one record Fields can have a fixed width or be separated by delimiters (e.g., tab, comma, paragraph)

The flat file database model Data as text CT789 Pupa Lymantria dispar CT Lymantriidae CT881 Adult Lymantria dispar CT Lymantriidae CT9933 Adult Lymantria dispar MA Lymantriidae INBIO0983819 Adult Bufo marinus Heredia Bufonidae INBIO3892898 Larva Dynastes hercules Alajuela Scarabaeidae Data as a Simple Darwin Core spreadsheet individualid lifestage scientificname stateprovince family CT789 Pupa Lymantria dispar CT Lymantriidae CT881 Adult Lymantria dispar CT Lymantriidae CT9933 Adult Lymantria dispar MA Lymantriidae INBIO0983819 Adult Bufo marinus Heredia Bufonidae INBIO3892898 Larva Dynastes hercules Alajuela Scarabaeidae

The flat file database model Contains no formatting information Vast redundancy of data Great potential for data-entry errors Difficult or impossible to extract subsets of data

The hierarchical database model Data is organized into a tree-like structure It allows for repeating information using parent/child relationships One-to-many relationship (1:m) each parent can have many children but each child only has one parent Used in XML schema e.g., <SimpleDarwinRecord> <dwc:lifestage>pupa</dwc:lifestage> <dwc:scientificname>lymantria dispar</dwc:scientificname> <dwc:stateprovince> CT</dwc:stateProvince> </SimpleDarwinRecord>

The network database model Organizes data using two fundamental constructs, records and sets Records contain fields whereas sets define one-to-many relationships between records: one parent, many children Differs from the hierarchical model in that branches can be connected to multiple nodes e.g., Lymantridiidae Economic importance Euproctis Lymantria pest not pest L. ampla L. dispar

The object-oriented database model Information is represented in the form of objects as used in object-oriented programming The efficiency of such databases is greatly improved in areas which demand massive amounts of data about one item (e.g., all info about a client s account in banking)

The relational database model Matches data by using common characteristics found within the data set Data are stored in (flat file) tables, which are linked using unique keys Description of data can be separated from their values in different tables

The relational database model terminology A table (=relation) is a set of data elements (values) that is organized using a model of vertical columns (=fields, which are identified by their name, or the attribute) and horizontal rows (=records) Fields of a record can be of different types (e.g., character, numeric, date) The domain (=field specification) refers to the possible values each field can contain Each record (row) is uniquely identified by its primary key

The relational database model Occurrence Table individualid lifestage scientificname eventdate 789 Pupa Lymantria dispar 06/23/1998 881 Adult Lymantria dispar 08/11/1982 9933 Adult Lymantria dispar 07/15/2007 0983819 Adult Bufo marinus 12/04/2001 Numeric field Character field Date field Domain of the field lifestage: egg, caterpillar, pupa, adult

The relational database model terminology A candidate keyis any field, or combination of fields, that uniquely identifies a record. The field(s) of the candidate key must contain unique values (if the values were duplicated, they would be no longer identify unique records), and cannot contain a null value.(is the taxon name a good candidate key?) A primary keyis the candidate key that has been chosen to identify unique records in a particular table. A foreign key is a field in a relational table that matches the primary key of another table; a foreign key is a referential constraint between two tables.

The relational database model Foreign Key individualid lifestage taxonid CT789 Pupa Sp122 CT881 Adult Sp122 CT9933 Adult Sp122 INBIO0983819 Adult Sp88726 Occurrence Table Primary Key Candidate Key taxonid scientificname Family Sp122 Lymantria dispar Lymantriidae Sp88726 Bufo marinus Bufonidae Taxon Table One-to-many relationship

The relational database model terminology A one-to-one (1:1)relationship occurs where, for each instance of table A, only one instance of table B exists, and vice-versa. For example, a social security number is associated with only one taxonomist, and vice versa. A one-to-many (1:m)relationship is where, for each instance of table A, many instances of the table B exist, but for each instance of table B, only once instance of table A exists. For example, for each species, there are many specimens. Amany-to-many (m:n)relationship occurs where, for each instance of table A, there are many instances of table B, and for each instance of table B, there are many instances of the table A. For example, a species can occur in many localities, and each locality can have many species.

The relational database model country eventid USA 323 Mexico 88 Event Table event ID lifestage taxonid 323 Pupa Sp122 88 Adult Sp122 88 Adult Sp122 88 Adult Sp88726 Occurrence Table Many-to-many relationship taxonid scientificname Sp122 Lymantria dispar Sp88726 Bufo marinus Taxon Table Join table (=junction table, intersection table, map table etc.)

Desktop specimen-level databases Few and far in between developers focus on web-based, distributed databases Desktop databases are designed for individuals or small groups of biologists working on a centralized data set Limited integration with online distributed databases

Desktop specimen-level databases http://specifysoftware.org/ Based on MySQL RDBMS Robust security Good tools for web publishing Steep learning curve, unfriendly GUI

Desktop specimen-level databases Developed using FileMaker RDBMS Focus on rapid data entry and easy exploration of records Simple mapping and species identification tools Ability to serve data online (with FM Server) http://insects.oeb.harvard.edu/mantis/

Desktop specimen-level databases Developed using 4D RDBMS User-friendly interface Ability to serve data online (with 4D Server) http://viceroy.eeb.uconn.edu/biota

Exercise 1 of 3 Create a flat file database in MS Excel and enter your specimen data Use the recommended terms of the Darwin Core in the column (field) names Create one record (row) for each specimen Parse the taxonomic and locality data into appropriate fields Note:Specimens are treated as instances of the class Occurrence in Darwin Core

Exercise 2 of 3 Enter 3 of your specimens into Biota Create appropriate higher level taxonomic records for each species (Species, Genus, Family, Order) Create appropriate Collection and Locality records

Exercise 3 of 3 Create a simple relational database for your specimens in FileMaker Pro using the instructions provided in the DB Exercise.doc file (download) The database should consist of 3 tables: 1. Locality 2. Specimen 3. Taxon Link the tables into a many-to-many relationship, using the Specimen table as the join table