THE ARCHIVAL SECTOR IN DW2.0
By W H Inmon

The fourth sector of the DW2.0 environment is the archival sector. Fig arch.1 shows the architectural positioning of the archival sector.

Fig arch.1 The archival sector

All data that flows into the archival sector comes from the near line sector. Fig arch.2 shows the source of the data.

Fig arch.2 The source of data for the archival sector is the near line sector

The reason why data is placed in the archival sector is that the probability of access has dropped significantly. Fig arch.3 shows that data whose probability of access approaches zero is placed in the archival sector.

Fig arch.3 The probability of access of archival data is very low

In many cases data is archived for legal reasons. The probability of access is actually very near zero, yet the data still needs to be saved. Fig arch.4 shows the archiving of data for the purpose of satisfying legal requirements.

Fig arch.4 Often data is archived for legal reasons, not for reasons of probability of access

From a philosophical standpoint, if the corporation has taken the trouble of capturing and electronically structuring data, then throwing the data away seems like a poor choice. Once data is thrown away, reconstructing it is either impossible or very expensive and troublesome. Therefore, if there is any chance that the data will need to be accessed, it usually is not destroyed.

One reason archival data can be held indefinitely is that storing it is inexpensive. To keep the cost low, archival data is almost never stored on disk storage, as seen in Fig arch.5.

Fig arch.5 Archived data is almost never stored on disk storage

The essence of archival data is the storage of data for a long time: 10 years, 20 years, and beyond. Fig arch.6 shows that archival data is meant to be kept for long periods of time.

Fig arch.6 Archived data is stored for a long time

As such, all data in the archival environment is related to time. Fig arch.7 shows that data in the archival environment is organized by time, usually by years.

Fig arch.7 All data inside the archival environment is related to time

Because there is a lot of data in the archival environment and because the data is organized primarily by time, metadata becomes very important. It is through metadata that the different types of data are located. Fig arch.8 shows the importance of metadata.

Fig arch.8 Metadata is a very important component of the archival sector

The importance of metadata is such that without metadata the archival environment becomes a one way street, as seen in Fig arch.9.

Fig arch.9 Without metadata, the archival sector becomes a one way street

Once the metadata is in place, the archival environment can be searched in a reasonably efficient manner. But without metadata, entire files may have to be scanned, which is a huge waste of resources.

From the standpoint of data structure, the records in the archival sector can take many different forms. Among the possibilities, records can be split, written as is, or combined. Fig arch.10 shows some of the possibilities for structuring records in the archival sector.

Fig arch.10 The records in the archival sector can be copies of records, can be records that have been split, or can be any number of other record types.

In addition to metadata being important as a guide to the contents of the archival sector, indexes are important as well. Metadata describes the types of data that are found in the archival sector, while indexes describe the contents. Fig arch.11 shows the indexes that can be created for the archival environment.
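The difference between searching with metadata and scanning entire files can be illustrated with a minimal sketch. It assumes a hypothetical catalog in which each entry records the type of data, the year it covers, and where the data set is stored; the names `CatalogEntry`, `locate`, and the tape labels are invented for illustration and are not part of DW2.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One metadata record describing an archived data set (hypothetical layout)."""
    data_type: str   # kind of data held, e.g. "invoices"
    year: int        # archival data is organized by time, usually by year
    location: str    # where the data set lives, e.g. a tape or file label


def locate(catalog, data_type, first_year, last_year):
    """Return the locations of data sets of one type within a range of years."""
    return [
        entry.location
        for entry in catalog
        if entry.data_type == data_type and first_year <= entry.year <= last_year
    ]


# A small illustrative catalog; a real one would describe far more data sets.
catalog = [
    CatalogEntry("invoices", 1998, "tape-0007"),
    CatalogEntry("invoices", 1999, "tape-0012"),
    CatalogEntry("shipments", 1999, "tape-0013"),
    CatalogEntry("invoices", 2000, "tape-0021"),
]

print(locate(catalog, "invoices", 1999, 2000))  # ['tape-0012', 'tape-0021']
```

Only the small catalog is read to decide which data sets are relevant; the archived data itself is touched only after the search has been narrowed.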

Fig arch.11 Passive indexes for the archival sector are as important as they are for the near line sector

In most cases the archival sector has a separate processor that manages the data found in the sector, and in most cases that machine sits idle most of the time. A good use of the machine's resources is to create indexes in anticipation of the future usage of archival data. These indexes can be called passive indexes, for they are created not for any known information requirement but for future, unknown requirements.

Once the passive indexes and the metadata infrastructure are created, the archival environment can be accessed with a reasonable amount of efficiency.

The metadata that is created needs to be stored as an actual part of the archival sector itself. It needs to be stored in the actual data set itself. The reason for storing metadata as part of the actual data is so that over time the data and the metadata won't become separated. Fig arch.12 shows that metadata is part of the archival sector and is stored with the data itself.

Fig arch.12 The metadata needs to be stored as a close and integral part of the archival sector
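One way to picture metadata stored in the data set itself is a file whose first line describes the records that follow. The sketch below is an assumption for illustration only: the JSON-lines layout, the function names `write_archive` and `read_archive`, and the field descriptions are hypothetical, not a format prescribed by DW2.0.

```python
import io
import json


def write_archive(stream, metadata, records):
    """Write the metadata as the first line, then one record per line."""
    stream.write(json.dumps({"metadata": metadata}) + "\n")
    for record in records:
        stream.write(json.dumps(record) + "\n")


def read_archive(stream):
    """Read back the embedded metadata and the records that follow it."""
    header = json.loads(stream.readline())
    records = [json.loads(line) for line in stream if line.strip()]
    return header["metadata"], records


# Hypothetical metadata describing the type, time period, and fields of the data.
metadata = {
    "data_type": "invoices",
    "year": 1999,
    "fields": {"invoice_id": "integer", "amount": "decimal string"},
}
records = [
    {"invoice_id": 1, "amount": "19.99"},
    {"invoice_id": 2, "amount": "5.00"},
]

# An in-memory buffer stands in for the archival medium.
buffer = io.StringIO()
write_archive(buffer, metadata, records)
buffer.seek(0)
restored_metadata, restored_records = read_archive(buffer)
print(restored_metadata["fields"])
```

Because the description travels inside the same data set as the records, copying or migrating the data set carries the metadata along with it.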

The practice of storing metadata with data is to ensure that over time the metadata will not become lost. If the metadata is ever lost, then the archival data is worth much less. Fig arch.13 illustrates this fact.

Fig arch.13 If the metadata ever becomes lost to the archival sector, then using the archival sector becomes very difficult.

Access to the archival sector occurs in a pattern that can be described as a sequentially random pattern, as seen in Fig arch.14.

Fig arch.14 When accessing archival data, it is normal to access the first record in a random manner, followed by a number of records that are sequentially accessed after the first record is found.
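The sequentially random pattern can be sketched in a few lines. The example assumes records are held in time order and that an index of their dates was built ahead of time; the sample records and the function name `read_range` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical archive: records stored in time order as (date, payload) pairs.
archive = [
    ("1999-01-15", "order A"),
    ("1999-03-02", "order B"),
    ("1999-03-20", "order C"),
    ("1999-07-09", "order D"),
    ("2000-02-11", "order E"),
]

# An index built ahead of time: the sorted dates, position-aligned with the archive.
date_index = [date for date, _ in archive]


def read_range(start_date, end_date):
    """Jump to the first matching record, then read forward sequentially."""
    position = bisect_left(date_index, start_date)  # the one random access
    results = []
    while position < len(archive) and archive[position][0] <= end_date:
        results.append(archive[position][1])        # sequential reads
        position += 1
    return results


print(read_range("1999-03-01", "1999-12-31"))  # ['order B', 'order C', 'order D']
```

The index is consulted once to find the starting point; everything after that is a straight forward read, which suits media that are slow to position but fast to stream.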

When activities are run against the archival sector, those activities tend to be large, as seen in Fig arch.15.

Fig arch.15 When transactions are run against the archival sector, they tend to be large transactions

When data is inserted in the archival sector, it is inserted in the form of snapshots, as seen in Fig arch.16.

Fig arch.16 When data is inserted into the archival sector, it is inserted in the form of snapshots

But suppose an erroneous unit of data happens to be found in the archival sector. At most, the erroneous data may be deleted. Then a correcting snapshot is entered into the archival sector. Fig arch.17 shows this process.

Fig arch.17 If an error is found in archival data, it is not corrected or removed. Instead, a correcting snapshot is entered
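The snapshot discipline can be sketched as an append-only store. This is a minimal illustration under stated assumptions: the class `SnapshotArchive`, its methods, and the rule that the most recently entered snapshot wins are all hypothetical, and the sketch follows the variant in which the erroneous snapshot is left in place rather than deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """An immutable unit of archived data (hypothetical layout)."""
    key: str                        # business key, e.g. an account number
    as_of: str                      # moment in time the snapshot describes
    value: float
    corrects: Optional[int] = None  # position of the snapshot this one corrects


class SnapshotArchive:
    """Append-only store: snapshots are added, never updated in place."""

    def __init__(self):
        self._snapshots = []

    def insert(self, snapshot):
        """Append a snapshot and return its position in the archive."""
        self._snapshots.append(snapshot)
        return len(self._snapshots) - 1

    def correct(self, position, value):
        """Enter a correcting snapshot; the erroneous one stays where it is."""
        original = self._snapshots[position]
        fixed = Snapshot(original.key, original.as_of, value, corrects=position)
        return self.insert(fixed)

    def current(self, key, as_of):
        """Return the value of the most recently entered matching snapshot."""
        for snapshot in reversed(self._snapshots):
            if snapshot.key == key and snapshot.as_of == as_of:
                return snapshot.value
        return None

    def __len__(self):
        return len(self._snapshots)


archive = SnapshotArchive()
wrong = archive.insert(Snapshot("acct-42", "1999-12-31", 1500.0))
archive.correct(wrong, 1050.0)
print(archive.current("acct-42", "1999-12-31"), len(archive))  # 1050.0 2
```

Both the erroneous and the correcting snapshot remain in the archive, so the history of what was recorded and when it was corrected is preserved.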

And of course, on occasion whole sections of data can be pulled out of the archival sector. Once pulled out, they can be placed anywhere in DW2.0: in the interactive sector, in the integrated sector, or in the near line sector. Fig arch.18 shows this placement.

Fig arch.18 Once it has been decided to pull data out of the archival sector, the data can be placed anywhere: the interactive sector, the integrated sector, or the near line sector.