THE ARCHIVAL SECTOR IN DW2.0 By W H Inmon




The fourth sector of the DW2.0 environment is the archival sector. Fig arch.1 shows the architectural positioning of the archival sector.

Fig arch.1 The archival sector

All data that flows into the archival sector comes from the near line sector. Fig arch.2 shows the source of the data.

Fig arch.2 The source of data for the archival sector is the near line sector

Data is placed in the archival sector because its probability of access has dropped significantly. Fig arch.3 shows that data whose probability of access approaches zero is placed in the archival sector.

Fig arch.3 The probability of access of archival data is very low

In many cases data is archived for legal reasons. The probability of access is very near zero, yet the data still needs to be saved. Fig arch.4 shows the archiving of data for the purpose of satisfying legal requirements.
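The movement rule described above can be sketched in Python. This is a minimal sketch, assuming each near line record carries an estimated probability of access and a legal-retention flag; the class `NearLineRecord`, the fields `access_probability` and `legal_hold`, and the 0.01 threshold are all hypothetical and are not part of DW2.0.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-off below which data is treated as "almost never accessed".
ARCHIVE_THRESHOLD = 0.01


@dataclass
class NearLineRecord:
    key: str
    access_probability: float  # hypothetical estimate between 0.0 and 1.0
    legal_hold: bool = False   # hypothetical flag: must be kept for legal reasons


def should_archive(rec: NearLineRecord, threshold: float = ARCHIVE_THRESHOLD) -> bool:
    # Data kept for legal reasons is archived whatever its access probability.
    return rec.legal_hold or rec.access_probability < threshold


def move_to_archive(near_line: List[NearLineRecord],
                    archive: List[NearLineRecord],
                    threshold: float = ARCHIVE_THRESHOLD) -> None:
    """Move qualifying records from the near line list into the archive list."""
    remaining = []
    for rec in near_line:
        if should_archive(rec, threshold):
            archive.append(rec)
        else:
            remaining.append(rec)
    near_line[:] = remaining  # the near line sector is the only feed into the archive
```

Records only ever enter the archive list from the near line list, mirroring the rule that the near line sector is the sole source of archival data.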

Fig arch.4 Often times data is archived for legal reasons, not for reasons of probability of access

From a philosophical standpoint, if the corporation has taken the trouble to capture and electronically structure data, then throwing the data away seems like a poor choice. Once data is thrown away, reconstructing it, should that ever be necessary, is either impossible or very expensive and troublesome. Therefore, if there is any chance that the data will ever need to be accessed, it usually is not destroyed.

One of the reasons archival data can be held indefinitely is that storing it is inexpensive. To keep it inexpensive, archival data is almost never stored on disk storage, as seen in Fig arch.5.

Fig arch.5 Archived data is almost never stored on disk storage

The essence of archival data is storage for a long time: 10 years, 20 years, and beyond. Fig arch.6 shows that archival data is meant to be kept for long periods of time.

Fig arch.6 Archived data is stored for a long time

As such, all data in the archival environment is related to time. Fig arch.7 shows that data in the archival environment is organized by time, usually by years.
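Organizing archival data by year can be sketched as a simple partitioning step. This is a minimal sketch, assuming every record is a dict that carries a date; the field name `as_of` and the function `partition_by_year` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


def partition_by_year(records: List[dict], date_field: str = "as_of") -> Dict[int, List[dict]]:
    """Group records into yearly partitions, ordered from oldest to newest year."""
    partitions: Dict[int, List[dict]] = defaultdict(list)
    for rec in records:
        d: date = rec[date_field]
        partitions[d.year].append(rec)
    return dict(sorted(partitions.items()))


if __name__ == "__main__":
    sample = [
        {"id": 1, "as_of": date(2005, 3, 1)},
        {"id": 2, "as_of": date(2019, 7, 9)},
        {"id": 3, "as_of": date(2005, 11, 30)},
    ]
    for year, recs in partition_by_year(sample).items():
        print(year, [r["id"] for r in recs])
```

Because the year is the primary organizing key, later searches can be confined to the partitions for the years of interest.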

Fig arch.7 All data inside the archival environment is related to time

Because there is a lot of data in the archival environment, and because the data is organized primarily by time, metadata becomes very important. It is through metadata that the different types of data are located. Fig arch.8 shows the importance of metadata.

Fig arch.8 Metadata is a very important component of the archival sector

Metadata is so important that without it the archival environment becomes a one-way street, as seen in Fig arch.9.
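The role of metadata as a locator can be illustrated by comparing a catalog-guided search with a blind scan. This is a minimal sketch, assuming the metadata records the type of data and the year held in each archive file; `ArchiveFile`, `record_type`, and the count of scanned files are hypothetical illustrations, not DW2.0 structures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[dict], bool]


@dataclass
class ArchiveFile:
    name: str
    record_type: str  # hypothetical metadata: what kind of data the file holds
    year: int         # hypothetical metadata: which year the file covers
    records: List[dict] = field(default_factory=list)


def build_catalog(files: List[ArchiveFile]) -> Dict[Tuple[str, int], List[ArchiveFile]]:
    """Metadata catalog: (record type, year) -> files holding that data."""
    catalog: Dict[Tuple[str, int], List[ArchiveFile]] = {}
    for f in files:
        catalog.setdefault((f.record_type, f.year), []).append(f)
    return catalog


def _scan(files: List[ArchiveFile], predicate: Predicate) -> Tuple[List[dict], int]:
    """Read every record of the given files; return the hits and the number of files read."""
    hits = [r for f in files for r in f.records if predicate(r)]
    return hits, len(files)


def search_with_metadata(catalog, record_type: str, year: int, predicate: Predicate):
    # Only the files the metadata points to are read.
    return _scan(catalog.get((record_type, year), []), predicate)


def search_without_metadata(files: List[ArchiveFile], predicate: Predicate):
    # With no metadata, every file has to be read from end to end.
    return _scan(files, predicate)
```

Both searches find the same records; the difference is how many files must be read to find them.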

Fig arch.9 Without metadata, the archival sector becomes a one-way street

Once the metadata is in place, the archival environment can be searched in a reasonably efficient manner. Without metadata, entire files may have to be scanned, which is a huge waste of resources.

From the standpoint of data structure, the records in the archival sector can take many different forms. Records can be written as is, split, or combined. Fig arch.10 shows some of the possibilities for structuring records in the archival sector.

Fig arch.10 The records in the archival sector can be copies of records, can be records that have been split, or can be any number of other record types

In addition to metadata being important as a guide to the contents of the archival sector, indexes are important as well. Metadata describes the types of data that are found in the archival sector, while indexes describe the contents. Fig arch.11 shows the indexes that can be created for the archival environment.
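The record forms mentioned above (written as is, split, combined) can be sketched as small transformations. This is a minimal sketch, assuming records are Python dicts with a `key` field; the function names, the field names, and the reading of "combined" as merging fragments that share a key are hypothetical interpretations.

```python
from typing import Dict, List, Sequence


def write_as_is(record: Dict) -> List[Dict]:
    """Archive the record unchanged, as a copy."""
    return [dict(record)]


def split_record(record: Dict, groups: Sequence[Sequence[str]], key_field: str = "key") -> List[Dict]:
    """Split one record into several, each keeping the key plus one group of fields."""
    return [{key_field: record[key_field], **{f: record[f] for f in group}} for group in groups]


def combine_records(parts: List[Dict], key_field: str = "key") -> Dict:
    """Combine several records that share a key into a single archival record."""
    if len({p[key_field] for p in parts}) != 1:
        raise ValueError("only records sharing one key can be combined")
    combined: Dict = {}
    for p in parts:
        combined.update(p)
    return combined
```

Splitting and then combining the same pieces gives back the original record, so either form can be chosen for storage without losing information.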

Fig arch.11 Passive indexes for the archival sector are as important as they are for the near line sector

In most cases the archival sector has a separate processor that manages the data found in the sector, and in most cases that machine is idle most of the time. A good use of the machine's resources is to create indexes in anticipation of future usage of archival data. These indexes can be called passive indexes, because they are created not for any known information requirement but for future, unknown requirements. Once the passive indexes and the metadata infrastructure are in place, the archival environment can be accessed with a reasonable amount of efficiency.

The metadata that is created needs to be stored as an actual part of the archival sector, in the data set itself, so that over time the data and the metadata do not become separated. Fig arch.12 shows that metadata is part of the archival sector and is stored with the data itself.

Fig arch.12 The metadata needs to be stored as a close and integral part of the archival sector
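A passive index can be sketched as an index built over every field, since no particular query is known in advance. This is a minimal sketch, assuming records are flat dicts with hashable field values; `build_passive_indexes` and `lookup` are hypothetical names, and scheduling the work on the idle archival processor is left out.

```python
from collections import defaultdict
from typing import Any, Dict, List


def build_passive_indexes(records: List[Dict[str, Any]]) -> Dict[str, Dict[Any, List[int]]]:
    """Index every field of every record.

    No future query is known, so no field is singled out: each field gets
    a value -> record-positions index. Meant to run while the archival
    processor would otherwise sit idle.
    """
    indexes: Dict[str, Dict[Any, List[int]]] = defaultdict(lambda: defaultdict(list))
    for pos, rec in enumerate(records):
        for field_name, value in rec.items():
            indexes[field_name][value].append(pos)
    # Convert to plain dicts so later lookups never create empty entries.
    return {f: dict(values) for f, values in indexes.items()}


def lookup(indexes, records, field_name, value):
    """Fetch matching records through the index instead of scanning them all."""
    return [records[pos] for pos in indexes.get(field_name, {}).get(value, [])]
```

Indexing every field trades storage and idle processing time for the ability to answer questions nobody has asked yet.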

The practice of storing metadata with data ensures that, over time, the metadata will not become lost. If the metadata is ever lost, the archival data is worth much less. Fig arch.13 illustrates this fact.

Fig arch.13 If the metadata ever becomes lost to the archival sector, then using the archival sector becomes very difficult

Access to the archival sector occurs in a pattern that can be described as sequentially random, as seen in Fig arch.14. When archival data is accessed, it is normal to locate the first record randomly and then to access a number of records sequentially after it.
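The sequentially random access pattern can be sketched with a binary search followed by a forward read. This is a minimal sketch, assuming records are kept sorted by a key such as a date and that a parallel sorted key list stands in for an index; `read_sequential_run` is a hypothetical name.

```python
import bisect
from typing import Any, List, Sequence


def read_sequential_run(records: Sequence[Any], keys: Sequence[Any], low: Any, high: Any) -> List[Any]:
    """Sequentially random access over records sorted by key.

    keys[i] is the sort key of records[i], in ascending order.
    """
    i = bisect.bisect_left(keys, low)  # random access: jump straight to the first qualifying record
    run = []
    while i < len(keys) and keys[i] <= high:  # sequential access: read forward from there
        run.append(records[i])
        i += 1
    return run
```

Only one random positioning is needed per request; everything after it is a cheap sequential read, which suits media that are slow to position but fast to stream.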

When activities are run against the archival sector, those activities tend to be large, as seen in Fig arch.15.

Fig arch.15 When transactions are run against the archival sector, they tend to be large transactions

When data is inserted into the archival sector, it is inserted in the form of snapshots, as seen in Fig arch.16.

Fig arch.16 When data is inserted into the archival sector, it is inserted in the form of snapshots

But suppose an erroneous unit of data is found in the archival sector. At most, the erroneous data may be deleted. Then a correcting snapshot is entered into the archival sector. Fig arch.17 shows this process.

Fig arch.17 If an error is found in archival data, it is not corrected or removed. Instead, a correcting snapshot is entered
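Snapshot-only insertion and correction can be sketched as an append-only log. This is a minimal sketch, assuming snapshots are appended to a list and a correcting snapshot refers back to the one it corrects; `Snapshot`, `ArchiveLog`, and the `corrects` back-reference are hypothetical. The sketch follows the Fig arch.17 reading, in which the erroneous snapshot is kept and superseded; the optional deletion mentioned in the text is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    key: str
    as_of: date
    values: dict
    corrects: Optional[int] = None  # position of the snapshot this one corrects, if any


class ArchiveLog:
    """Append-only store: data enters only as snapshots and is never updated in place."""

    def __init__(self) -> None:
        self._snapshots: List[Snapshot] = []

    def insert(self, snapshot: Snapshot) -> int:
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def correct(self, position: int, values: dict, as_of: date) -> int:
        # The erroneous snapshot stays where it is; a correcting snapshot is appended.
        wrong = self._snapshots[position]
        return self.insert(Snapshot(wrong.key, as_of, values, corrects=position))

    def effective(self) -> List[Snapshot]:
        """Snapshots that have not been superseded by a correcting snapshot."""
        superseded = {s.corrects for s in self._snapshots if s.corrects is not None}
        return [s for i, s in enumerate(self._snapshots) if i not in superseded]

    def history(self) -> List[Snapshot]:
        return list(self._snapshots)
```

Keeping the erroneous snapshot in the history preserves an audit trail of what was archived and when it was corrected.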

And of course, on occasion whole sections of data can be pulled out of the archival sector. Once pulled out, they can be placed anywhere in DW2.0: in the interactive sector, in the integrated sector, or in the near line sector. Fig arch.18 shows this placement.

Fig arch.18 Once it has been decided to pull data out of the archival sector, the data can be placed anywhere: the interactive sector, the integrated sector, or the near line sector.
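Pulling a whole section out of the archive can be sketched as copying yearly partitions into a target sector. This is a minimal sketch, assuming archival data is held in sections keyed by year (in line with the organization by time described earlier) and that pulling copies data rather than deleting it from the archive; `Sector` and `pull_from_archive` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Sector(Enum):
    INTERACTIVE = "interactive"
    INTEGRATED = "integrated"
    NEAR_LINE = "near line"
    ARCHIVAL = "archival"


def pull_from_archive(archive_by_year: Dict[int, List[dict]],
                      years: Iterable[int],
                      target: Sector,
                      sectors: Dict[Sector, List[dict]]) -> int:
    """Copy whole yearly sections out of the archive into another sector.

    Returns the number of records placed in the target sector.
    """
    if target is Sector.ARCHIVAL:
        raise ValueError("data pulled from the archive must go to another sector")
    pulled = [rec for year in years for rec in archive_by_year.get(year, [])]
    sectors.setdefault(target, []).extend(pulled)
    return len(pulled)
```

Any of the three other sectors is accepted as a target, matching the statement that pulled data can be placed anywhere in DW2.0.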