Schemas Supporting Physical Data Storage




Schemas Supporting Data Storage, 21st January 2014 (30th March 2001)

Introduction

A RAQUEL DB is made up of a DB Schema, which itself consists of a set of schemas. These schemas fall into two subsets. One subset supports the relational model, in that every schema in it consists of a set of relational variables (= relvars). The other subset supports the physical storage of the DB's data, in that every schema in it consists of a set of members that are to do with physical data storage. Only the physical data storage subset is considered here.

Factors Determining Physical Data Storage

In order to understand the physical data storage schemas, it is first necessary to understand the storage factors recognised and dealt with by the RAQUEL DBMS via its storage schemas. There are three factors, and hence three different types of schema, one to handle each factor. The three factors are:

1. The relationship of real relvar values to the relvalues that are actually physically stored.

Traditionally a real relvar's complete relvalue would be stored as a single, coherent volume of physical data. However this need not be the case. For example, a relvar's value may be split into several fragments, each of which is a relvalue, with each fragment stored as a single, coherent volume of physical data. Alternatively the relvalues of 2 or more relvars could be merged into a single relvalue, which is then stored as a single, coherent volume of physical data. A combination of the fragmentation and merging approaches could also be applied.

Therefore RAQUEL stores the relvalues of stored relvars rather than real relvars. A stored relvar is a fragment of a real relvar, or the merge of 2 or more real relvars, or the result of the fragmentation and merging of real relvars. A relational algebra expression, called a Storage Expression, is used to define the relationship of a real relvar to the stored relvar(s) whose value(s) are used to derive the real relvar's value.
The storage expression, whose only variables must be stored relvars, is assigned to the real relvar. It is permissible, if desired, to assign 2 or more different storage expressions to the same real relvar. Each storage expression represents a different way of storing the real relvar's value. The real relvar's value is stored according to each storage expression, and the RAQUEL DBMS is responsible for ensuring consistency between them.

2. The locations and names of the physical storage units that hold relvalue data.

A Storage Unit is a named storage vessel in a particular location that holds the data which is the value of a stored relvar. It is permissible, if desired, to hold a stored relvar's value in multiple storage units, each typically in a different location. The term location is used in a very general sense to maximise its applicability. Usually the multiple locations concerned are multiple networked computers, and/or multiple storage systems managed by a single computer. Reasons for distribution typically include the need to store data near where it is used, the fact that one computer installation may be insufficient to store all the physical data of a DB, and the need to maintain multiple copies of relvalues for backup and recovery purposes. Where multiple storage units are used, the RAQUEL DBMS is responsible for ensuring that they all always hold the same relvalue, even if a different kind of physical storage mechanism is used by each copy's storage unit.

3. The nature of the physical storage mechanisms used to store relvalue data.

Each storage unit has a Storage Mechanism or Storage Facility associated with it. The storage mechanism provides the means by which a relvalue is stored in the storage unit. Traditionally a storage mechanism consists of one or more software utilities designed in conjunction with one or more computer file designs, so that the software utilities move data into and out of files of that design; examples are hash files, sequential files and index sequential files. In RAQUEL the idea of a storage mechanism is generalised to permit any kind of physical storage, so it can include another DB, a specialised hardware device, etc.

The three factors take no account of any need to store data in encrypted form. Encrypted data can arise on 2 levels, the logical and the physical. Encrypted data at the logical level means that the relvalue of a real relvar is in encrypted form. So en/decryption takes place either on input to/output from the DB or within the application(s) that use(s) the encrypted relvalues. In both cases this does not affect the physical storage of data within the DB, even if the DBMS carries out the en/decryption.
Encrypted data at the physical level means that a logical relvalue is encrypted before being stored, and inversely the stored form is decrypted before being retrieved; i.e. encryption is incorporated into the process of physically storing data and decryption into the process of physically retrieving data. En/decryption can be applied before the value of a real relvar is mapped to the value(s) of stored relvar(s) or after. In the former case the en/decryption process may well affect how the value of the real relvar is allocated to the value(s) of stored relvar(s). In both cases the nature of the physical storage facilities used may be affected by the en/decryption process. However the same three factors apply regardless of whether data is stored in encrypted form or not, and so data encryption is henceforth ignored.

Note that stored relvars are only used to physically store the relvalues of real relvars. They are not used to store the values of source relvars or sink relvars. The values of source and sink relvars are stored outside the DB and not managed by the DBMS. Sources and sinks are the relational interface of locations outside the DB from and to which respectively relvalues are transmitted. (So again any en/decryption of source and sink relvar values will operate outside the jurisdiction of the DBMS, which will be unaffected by it.)
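The storage expressions described under factor 1 can be illustrated with a small sketch. This is not RAQUEL syntax: relvalues are simply modelled as Python sets of tuples, and the helper names (relvalue, restrict, project, natural_join) and the sample relvars are hypothetical, chosen only for illustration. The sketch shows a fragmented real relvar recovered by a union of two stored relvars, and two real relvars recovered by projections of one merged stored relvar. The merge example assumes every id appears in both real relvars, so that the join loses no tuples.

```python
# Illustrative model only; none of these names are RAQUEL syntax.

def relvalue(*rows):
    """Build a relvalue as a set of tuples, each tuple a set of (attr, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)

def restrict(rel, predicate):
    """Keep the tuples for which predicate(tuple_as_dict) is true."""
    return frozenset(t for t in rel if predicate(dict(t)))

def project(rel, attrs):
    """Keep only the named attributes of every tuple."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)

def natural_join(left, right):
    """Combine tuples that agree on all attributes they have in common."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return frozenset(out)

# Fragmentation: one real relvar held as two stored relvars.
employee = relvalue(
    {"id": 1, "name": "Ann", "site": "Leeds"},
    {"id": 2, "name": "Bob", "site": "York"},
)
emp_leeds = restrict(employee, lambda t: t["site"] == "Leeds")   # stored relvar
emp_other = restrict(employee, lambda t: t["site"] != "Leeds")   # stored relvar
employee_from_store = emp_leeds | emp_other   # storage expression: a union

# Merging: two real relvars held as one stored relvar.
person = relvalue({"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"})
phone = relvalue({"id": 1, "ext": "4401"}, {"id": 2, "ext": "4402"})
person_phone = natural_join(person, phone)                # stored relvar
person_from_store = project(person_phone, {"id", "name"})  # storage expression
phone_from_store = project(person_phone, {"id", "ext"})    # storage expression
```

In each case the only variables in the storage expression are stored relvars, which matches the constraint stated above.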

Physical Storage Schemas

Physical storage schemas fall into 2 categories, system schemas and default schemas. A system schema is created automatically as part of the creation of a DB and has a standard name which cannot be altered. A default schema is either created as part of DB creation with a default name which can later be changed, or is optionally created after DB creation with a name which is determined at creation time and can also be changed later if desired.

The automatic creation and use of system schemas minimises the work needed to provide the physical storage of data. While default schemas are similarly supportive, they permit flexibility in the way that data storage is spread over one or more physical computer locations. Note that flexibility as regards the means by which data is physically stored is also provided by being able to plug different kinds of physical storage mechanisms into the DBMS; these mechanisms are referenced by the relevant storage schemas and do not affect the DBMS schema architecture.

The physical storage schemas are:

The Storage Schema

This schema is a system schema. It is the set of all the stored relvars held in the entire DB. A stored relvar is one whose value is directly physically stored as a single-valued entity. Hence the Storage Schema holds all the data in the DB; in this sense it corresponds to the Logical Schema. However the Storage Schema consists of stored relvars whose values are actually physically stored, whereas the Logical Schema consists of real relvars that are the basis of what is available to an application program that accesses the DB. A real relvar may also be a stored relvar (indeed this is the default), but a real relvar may instead have its value stored in fragments, merges, or some combination of the two. The intersection of the Logical and Storage Schemas contains those relvars that are both real and stored.
It is also possible for a real relvar to have two or more copies of its value held via stored relvars, which the RAQUEL DBMS must automatically maintain in a consistent state. Copies may use the same or different arrangements of stored relvars.

Named Stack Schemas

A Stack Schema is a default schema. The purpose of a Stack Schema is to manage the physical storage, at a particular computer location, of the data that comprises the values of a set of stored relvars. (A computer location is a specific computer or a specific storage device attached to a specific computer.) Since one can expect a DB to contain at least one real relvar, there must be at least one Stack Schema to handle the storage of physical data. So when a DB is created, by default a single Stack Schema is created with a default name at a default computer location (which is typically the computer installation on which the DBMS is installed).

Each Stack Schema consists of a set of 2 system schemas, which are:

1. A Location Schema.
2. A Physical Schema.

When a Stack Schema is created, its 2 member schemas are automatically created within it. The member schemas are empty when created. In a Unix/Linux environment, the assignment that creates a Stack Schema takes a parameter that consists of a path name. The path name defines where the computer location is with respect to the RAQUEL DBMS. Therefore it could contain a URL if the stack were located on another, networked computer.

A Location Schema

This schema is a system schema. Each Location Schema comprises a set of stored relvars, namely those whose values are stored at the particular computer location of a Storage Stack. Thus a Location Schema is a subset (not necessarily a proper subset) of the Storage Schema. A stored relvar must appear in at least one Location Schema for its value to be physically stored, but it may appear in two or more Location Schemas (in different stacks), in which case the RAQUEL DBMS must automatically maintain the copies in a consistent state. A stored relvar has the same name in each of the Location Schemas in which it appears. Different copies of the stored relvar are differentiated by the Location Schema in which they exist; only one copy can exist in a Location Schema.

A Physical Schema

This schema is a system schema. Each Physical Schema consists of a set of physical storage specifications, one for each of the stored relvars in the associated Location Schema. Each specification consists of:

- A type of physical storage mechanism or facility (e.g. an index sequential file or a hashed file).
- The name of a storage unit (e.g. a file or hardware device). This is defined to be that of its associated stored relvar with the suffix St. (It is not a default that can be changed.)

The specification denotes that a particular storage mechanism is used to store the value of a particular stored relvar in a particular storage unit.
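The structure of a Stack Schema described above can be sketched as a small data model. This is a hedged illustration rather than anything from the RAQUEL DBMS: the class names StackSchema and StorageSpec, the place method, and the example path and mechanism strings are all hypothetical. What the sketch takes from the text is that both member schemas start empty, that only one copy of a stored relvar can exist in a Location Schema, and that the storage unit name is the stored relvar's name with the suffix St.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class StorageSpec:
    """One entry of a Physical Schema (hypothetical representation)."""
    mechanism: str   # type of storage mechanism, e.g. "hashed file"
    unit_name: str   # storage unit name: stored relvar name + "St"

@dataclass
class StackSchema:
    """A stack: a Location Schema plus a Physical Schema at one location."""
    name: str
    path: str                                       # path name of the computer location
    location: set = field(default_factory=set)      # names of stored relvars
    physical: dict = field(default_factory=dict)    # relvar name -> StorageSpec

    def place(self, stored_relvar: str, mechanism: str) -> StorageSpec:
        """Add a stored relvar to this stack together with its storage specification."""
        if stored_relvar in self.location:
            # Only one copy of a stored relvar can exist in a Location Schema.
            raise ValueError(f"{stored_relvar} is already stored in {self.name}")
        spec = StorageSpec(mechanism, stored_relvar + "St")
        self.location.add(stored_relvar)
        self.physical[stored_relvar] = spec
        return spec

# Example values are assumptions made for illustration.
stack1 = StackSchema("Stack1", "/raquel/stack1")
empty_at_creation = (len(stack1.location), len(stack1.physical))
spec = stack1.place("S2", "hashed file")
```

Keeping the specification keyed by the stored relvar's name mirrors the rule that a Physical Schema holds exactly one specification for each stored relvar in the associated Location Schema.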
The Architecture Viewed as Layers

The schemas can be considered as forming a layered architecture:

[Diagram: the Storage Schema sits above two Stack Schemas, Stack 1 and Stack 2, each of which contains a Location Schema and a Physical Schema.]

Here the diagram assumes that there just happen to be two storage stacks. There could be more, as required. If one were to add in the relational model schemas lying immediately above, and the actual physical data storage managed by each stack, then the complete RAQUEL schema architecture would be:

[Diagram: from top to bottom, three Subschemas; the Logical, Virtual, Source and Sink Schemas; the Storage Schema; Stack 1 and Stack 2, each containing a Location Schema and a Physical Schema; and, beneath each stack, the Data it manages.]

Note that this architecture does not preclude the addition of materialised views. In RAQUEL, a materialised view is a method of directly storing the relvalues of virtual relvars (and so is expressed via the formal data storage model, not the logical relational model).

The Architecture Viewed as Sets

The schema architecture can also be considered as forming an inter-related collection of sets.

The following small DB, expressed via a Venn diagram, illustrates this, where Rn represents a real relvar, Sn represents a stored relvar, and SnSt represents the physical storage specification of a stored relvar:

[Venn diagram: the Logical Schema contains R1, R2, R3 and R4; the Storage Schema contains R4, S1, S2 and S3, with R4 in the intersection of the two. Stack1's and Stack2's Location Schemas lie within the Storage Schema, and Stack1's and Stack2's Physical Schemas hold the specifications S1St, S2St (once in each stack), R4St and S3St.]

In the above example, R4 is both a real relvar and a stored relvar. Real relvars R1, R2 and R3 have their values contained in stored relvars S1, S2 and S3. There cannot be a 1 : 1 relationship between the 3 real relvars and the 3 stored relvars, because that would mean that these real relvars were also stored relvars. It may be that one real relvar uses two stored relvars (say a Join or Union of them) to hold its value, while two real relvars use one stored relvar to hold their value (each being, say, a Projection or Restriction of the stored relvar's value). The Storage Expression bound to each real relvar via an ==Equate assignment specifies the precise relationship between the real and stored relvars.

The value of stored relvar S2 is held in both stacks, so the RAQUEL DBMS must ensure that the two values are always identical regardless of the changes in value made to S2. S2 may or may not have a different physical storage specification in each stack; this is not apparent from the identifier S2St used in the Venn diagram.

Note: it is important to distinguish between the terms Stack Schema and Storage Stack. A Stack Schema is a schema in the schema architecture used in the formal data storage model of a RAQUEL DBMS, whereas a Storage Stack is that part of the RAQUEL DBMS software architecture that implements a particular kind of physical storage mechanism or facility.
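The set relationships in this example can be checked mechanically. The sketch below models each schema as a Python set of relvar names. The text states only that S2 is held in both stacks; the placement of S1, S3 and R4 in one stack or the other is an assumption made here for illustration, as are the variable names.

```python
# Schema membership from the example; relvars are represented by their names.
logical_schema = {"R1", "R2", "R3", "R4"}
storage_schema = {"R4", "S1", "S2", "S3"}

# Assumed placement: only S2 being in both stacks comes from the text.
stack1_location = {"S1", "S2"}
stack2_location = {"S2", "R4", "S3"}
locations = [stack1_location, stack2_location]

# Relvars that are both real and stored.
real_and_stored = logical_schema & storage_schema

# Every Location Schema is a subset of the Storage Schema.
all_subsets = all(loc <= storage_schema for loc in locations)

# Every stored relvar appears in at least one Location Schema.
all_placed = set().union(*locations) == storage_schema

# Stored relvars whose copies the DBMS must keep identical.
replicated = stack1_location & stack2_location

# Each Physical Schema has one specification per stored relvar in its
# Location Schema, named after the relvar with the suffix "St".
stack1_physical = {name + "St" for name in stack1_location}
stack2_physical = {name + "St" for name in stack2_location}
```

Note that S2St appears in both Physical Schemas under the same identifier, which is why the name alone cannot show whether the two stacks use different storage mechanisms for S2.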

Using the Physical Storage Schemas

The names of the system schemas within a (default) stack schema are the system names Location and Physical prefixed by the stack name.

When a real relvar is created, DBMS defaults are used to automatically create physical storage for the value of the relvar. The defaults are as follows:

1. Each real relvar becomes a stored relvar as well, and appears in the Storage Schema as well as the Logical Schema.
2. The real/stored relvar appears in the Location Schema of the default storage stack, which is either the initial stack or, if there are 2 or more stacks, a stack specified as the default from those available. If there is only one storage stack, its Location Schema will have the same set of members as the Storage Schema.
3. A default physical storage mechanism or facility is specified. The default mechanism to be used can vary between DBMS installations.

The storage defaults allow a real relvar to be used immediately after it has been created. However if preferred, the defaults may be overridden completely or to whatever extent is required, either immediately after the relvar's creation or at some later date as required.

When changing a real relvar's physical data storage from the default arrangement, advantage can be taken of the fact that RAQUEL treats the properties of a relvar as orthogonal to each other (although the properties must be consistent with each other). Thus a change to physical data storage can be specified in whatever is the most effective sequence; a number of sequences are possible, and none of them is guaranteed to be the most effective in all circumstances. However the sequence should make sure that relvar values held in store before the change are not lost in the process of making the change.
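The three creation defaults listed above can be sketched as follows. This is a minimal model under stated assumptions, not the RAQUEL DBMS's behaviour in detail: the Database class, its method name, the stack name "Stack1" and the mechanism "sequential file" are all hypothetical, and real installations may use a different default mechanism.

```python
class Database:
    """Toy model of the schema sets touched when a real relvar is created."""

    def __init__(self, default_stack="Stack1", default_mechanism="sequential file"):
        self.logical = set()    # Logical Schema: real relvars
        self.storage = set()    # Storage Schema: stored relvars
        # One stack exists from DB creation; its member schemas start empty.
        self.stacks = {default_stack: {"location": set(), "physical": {}}}
        self.default_stack = default_stack
        self.default_mechanism = default_mechanism   # assumed installation default

    def create_real_relvar(self, name):
        # Default 1: the real relvar is also a stored relvar.
        self.logical.add(name)
        self.storage.add(name)
        # Default 2: it appears in the Location Schema of the default stack.
        stack = self.stacks[self.default_stack]
        stack["location"].add(name)
        # Default 3: a default storage mechanism is specified for its storage unit.
        stack["physical"][name + "St"] = self.default_mechanism

db = Database()
db.create_real_relvar("R1")
db.create_real_relvar("R2")
only_stack = db.stacks[db.default_stack]
```

With a single stack, the stack's Location Schema ends up with the same members as the Storage Schema, as the second default states. Overriding a default later would amount to changing one of these sets or specifications while keeping the others consistent.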