Digital Dilemmas: Dealing with Born-Digital and Digital Surrogate Audio and Audio-Visual Collections
ARCHIVES 2008, August 28, San Francisco

I am Angelo Sacerdote and I am the Preservation Manager at the Bay Area Video Coalition here in San Francisco. In addition to audio and video preservation, we provide post-production services, teach electronic media classes and serve Bay Area youth, both in schools and at BAVC.
Born-Digital Video Sources
  Digital Video Tape
  Solid State Media
  Optical Media
  Hard Drives (computer generated video and devices with built-in hard disk)

Developing workflows and management schemes for "Born Digital" audio-visual materials helps us to prepare for the much larger task of reformatting and storing analog materials. For instance, the NDIIPP Preserving Digital Public Television project (http://ptvdigitalarchive.org/) focused on Born Digital for its first phase. Born digital means that the digital object in question was created in the digital realm, not migrated from analog. Today, I am focusing on digital video. There are many different sources of digital video files.
Born-Digital Video Sources: Digital Video Tape
  Some common types: Digital Betacam, DVCAM, Mini DV
  Also: Octoplex, D1, D2, D3, D5, D6, DCT, DVCPRO, DVCPRO 50, Betacam SX, Digital 8, Digital S, etc.

Many types of digital video tape do not enjoy widespread adoption. Betacam SX, for instance, is mostly found in TV news stations. I just found out about the first digital video format, Octoplex, by Ampex, which recorded onto 2" tape. D1, D2, D3, D5, D6 and DCT are not nearly as common as DVCAM, DVCPRO, Digibeta and Mini DV and should therefore be transferred as soon as possible, before there are no more playback machines. At BAVC, we only support Digibeta, DVCAM and DVCPRO as digital tape formats.
Born-Digital Video Sources: Solid State Media (no moving parts)

Sony has SxS Pro to record XDCAM and Panasonic has P2 to record DVCPRO HD. Some consumer camcorders, digital still cameras and cell phones that record video use SD cards. Each of these formats is proprietary and needs current software to play back. Solid state is very expensive but is coming down in price.
Born-Digital Video Sources: Optical Media

Some consumer video cameras record directly to DVD, as do some set-top video recorders. In addition, Sony has professional cameras that record to the high-capacity Professional Disc, which can store 50 GB of data.
Born-Digital Video Sources: Hard Drives (computer generated video and devices with built-in hard disk)

Software-generated video, as well as devices that record to a built-in hard drive, may also be sources for video files. Hard drives can suffer from data corruption and mechanical failure, although their reliability has increased over the years.
Digital Surrogates

A digital surrogate is basically a digital copy of the original item. It may be a replacement for the original or it may be an access copy. Due to the march of progress, it is not possible to keep many analog video formats alive. Therefore, it is necessary to reformat analog sources to digital files. Deciding which file format is appropriate can be a formidable task.
Digital Video Files
  Wrappers: AVI, MXF, QuickTime, MPEG2, MP4, .divx, .flv, etc.

The ideal would be open standards that are universally readable. However, the needs of the archival community are not always in agreement with the motives of manufacturers of video equipment and software. The reality we must contend with is a sea of generic and not-so-generic files. A wrapper is a binary container in which the video essence, encoded in a supported codec, resides. Some of these wrappers are proprietary and some are not. Depending on the wrapper, there will be a certain level of metadata and a certain number of video and audio tracks.
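Because the wrapper is a binary container, it can often be recognized from the first few bytes of a file, independent of the file extension. The following is a minimal sketch of that idea, not a complete identification tool. The function name sniff_wrapper is hypothetical, the signatures covered are only a handful of well-known ones, and a file whose signature does not sit at the very start (for example an MXF file with a run-in) will come back as "unknown".

```python
def sniff_wrapper(header: bytes) -> str:
    """Guess a video wrapper from the first bytes of a file (heuristic only)."""
    # AVI is a RIFF file whose form type is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    # Flash video files start with the ASCII signature "FLV".
    if header[:3] == b"FLV":
        return "FLV"
    # MXF partition packs begin with a SMPTE universal label prefix.
    if header[:4] == b"\x06\x0e\x2b\x34":
        return "MXF"
    # An MPEG program stream starts with a pack start code.
    if header[:4] == b"\x00\x00\x01\xba":
        return "MPEG program stream"
    # QuickTime and MP4 share an atom structure: 4-byte size, then 4-byte type.
    if header[4:8] == b"ftyp":
        return "QuickTime" if header[8:12] == b"qt  " else "MP4 family"
    if header[4:8] in (b"moov", b"mdat", b"wide", b"free"):
        return "QuickTime/MP4 family"
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 16 bytes of a file and guess its wrapper."""
    with open(path, "rb") as handle:
        return sniff_wrapper(handle.read(16))
```

Knowing the wrapper is only half the story: the codec inside still has to be identified before you know whether a given player can decode the file.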
Digital Video Files: MXF

MXF can contain many different codecs, but not all flavors of MXF are the same and not all video software can read all MXF files. These are two different versions of MXF. The one on the left is from Panasonic's HVX200 camera, recorded on a P2 card. The one on the right looks more like what might come out of a Samma system or an Avid. Panasonic's P2 card-based DVCPRO HD format is a proprietary HD format, compressed about 6.7:1, and yields files of about 40 GB per hour, much smaller than uncompressed standard definition video. The file created on a P2 card is a complex MXF package spread across a series of folders. If any one of these folders is renamed or lost, the entire package is unreadable. Fortunately, these files can be re-wrapped into a more widely readable QuickTime package. QuickTime is the proprietary wrapper developed by Apple. While not an open standard, it is widely used and can be installed on Windows or Mac computers. The "Pro" version, which can read more codecs, costs $30. As you can see, the P2 card-generated MXF comes as a series of folders. If you move or rename any single file, the whole thing becomes unusable.
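Since a renamed or missing folder makes a P2 package unreadable, it can be worth checking the folder structure before and after copying a card. The sketch below is one way to do that. The folder names in EXPECTED_P2_FOLDERS are an assumption about the usual P2 card layout and are not taken from this talk; compare them against your own cards before relying on the check. The function name is hypothetical.

```python
from pathlib import Path

# Assumed P2 card layout (not stated in the talk): a CONTENTS folder holding
# these subfolders. Adjust the names to match what your own cards contain.
EXPECTED_P2_FOLDERS = ("AUDIO", "CLIP", "ICON", "PROXY", "VIDEO", "VOICE")


def missing_p2_folders(card_root):
    """Return the expected folders that are absent under card_root/CONTENTS."""
    contents = Path(card_root) / "CONTENTS"
    if not contents.is_dir():
        # Without the top-level folder nothing else can be checked.
        return ["CONTENTS"]
    return [name for name in EXPECTED_P2_FOLDERS
            if not (contents / name).is_dir()]
```

An empty list means every expected folder was found. It does not prove that the clips inside are intact, only that the package skeleton has not been renamed or lost.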
Digital Video Files
  Codecs: DV, DVCPRO50, DVCPROHD, H.264, XDCAM EX, XDCAM HD, IMX, JPEG2000, etc.

The codec refers to how the video file is encoded. In order for software or hardware to play back the file, it must be able to open the wrapper and it must have the proper decoder installed. Digital video tape also uses codecs, such as DV. In this case, we could think of the tape as a wrapper. Not all players can play all files. For instance, a DV tape machine cannot play back an HDV tape and QuickTime cannot play back an MXF-wrapped JPEG2000 file. It is therefore important to determine which video formats will be supported in your repository and to develop a plan for ensuring some degree of uniformity. It will also be important to monitor trends in the technology over time to determine whether the files are becoming obsolete, and to have a plan to transcode to a newer, widely supported format when that time arrives.

In the case of audio, storage has become so inexpensive that there is widespread agreement on Broadcast Wave Files, an uncompressed, universally accepted file format. For images, TIFF is a good uncompressed standard. This is not the case with video. While uncompressed video formats exist, none is universal; each requires some sort of codec and wrapper that will be read by some but not all systems. In addition to this issue, the vast majority of new digital formats are heavily compressed and proprietary. What is the appropriate preservation format for these files?
Digital Video Compression: File Size

  Codec                Approx. size per hour   Approx. storage for 100 hours
  DV                   13 GB                   1.5 TB
  DV50                 25 GB                   2.5 TB
  MPEG2 DVD quality    2-5 GB                  500 GB
  MPEG2 50 Mbps        25 GB                   2.5 TB
  JPEG2000 Lossless    30 GB                   3 TB
  Uncompressed         85-95 GB                9.5 TB

For example, one of the most common digital video formats, DV, is compressed 5:1, is about 13 GB per hour and is read by virtually all video software. The DV stream can be captured easily into a computer over a FireWire cable. Uncompressed standard definition video can be about 95 GB per hour. Transcoding DV to most other compression schemes will yield a file that is larger than the original. It makes sense to keep DV in its native file format, since it will save storage space and is fairly generic. I am leaving out many options, including all HD formats. These are approximate values. There are many factors that may contribute to the ultimate file size, including
Digital Video Compression: Compression Ratio and Chroma Subsampling
  DV: 5:1 compression ratio, 4:1:1 chroma subsampling
  Digital Betacam: approx. 2:1 compression ratio, 4:2:2 chroma subsampling
  DV50: 3.3:1 compression ratio, 4:2:2 chroma subsampling
  Motion JPEG2000: lossless at approx. 3:1 compression, but not widely supported by hardware or software
  MPEG2: interframe, different bitrates possible, 4:2:0 chroma subsampling
  H.264: mainly used for access, very efficient compression, 4:2:0 chroma subsampling, higher quality profiles available

Compression ratios are one indication of how much data you may be losing. Chroma subsampling is an indication of color reproduction fidelity. Because human eyes notice subtle differences in color less than differences in luminance, some color information may be sacrificed to make best use of limited bandwidth. When the analog source is VHS, Video8 or U-matic, this may not matter. However, when the source is a very high bandwidth analog master or a high quality digital master, a difference may be noticed. Some formats, such as DV and DV50, are constant bit rate, while MPEG2, H.264 and Motion JPEG 2000 are variable bit rate, meaning their file size will vary depending on the settings used to encode them. All of these compression schemes are lossy except for JPEG 2000, which can be lossy or lossless. Lossy means some data is discarded in the compression process. Lossless means that no data is discarded and the decoded file is as good as uncompressed. Another lossless codec is HuffYUV, which is also not widely supported in the post-production world.
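The compression ratios and chroma subsampling schemes listed above can be tied back to data rates with a little arithmetic. The sketch below is illustrative only. The frame sizes (720x486 and 720x480), the 29.97 fps frame rate and the 10-bit and 8-bit sample depths are assumptions for NTSC standard definition and are not stated in the talk, and the results cover video only, with no audio or wrapper overhead.

```python
# Average samples per pixel (one luma plus two subsampled chroma channels).
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:1:1": 1.5, "4:2:0": 1.5}


def uncompressed_mbps(width, height, fps, subsampling, bit_depth):
    """Video-only data rate in megabits per second before any compression."""
    bits_per_frame = width * height * SAMPLES_PER_PIXEL[subsampling] * bit_depth
    return bits_per_frame * fps / 1e6


def gb_per_hour(mbps):
    """Convert megabits per second to decimal gigabytes per hour."""
    return mbps * 1e6 / 8 * 3600 / 1e9


# Assumed example: 10-bit 4:2:2 standard definition, uncompressed.
sd_uncompressed = uncompressed_mbps(720, 486, 29.97, "4:2:2", 10)

# Assumed example: 8-bit 4:1:1 source reduced by DV's 5:1 compression ratio.
dv_video = uncompressed_mbps(720, 480, 29.97, "4:1:1", 8) / 5

print(f"Uncompressed SD: {sd_uncompressed:.0f} Mbps, "
      f"{gb_per_hour(sd_uncompressed):.0f} GB per hour")
print(f"DV video only:   {dv_video:.0f} Mbps, "
      f"{gb_per_hour(dv_video):.0f} GB per hour")
```

Under these assumptions the uncompressed figure lands in the same range as the roughly 95 GB per hour quoted in this talk, and the DV video stream comes out at about 25 Mbps; the remainder of DV's roughly 13 GB per hour is audio and other overhead.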
Digital Video Compression
  Intraframe: DV, DV50, Motion JPEG2000
  Interframe: MPEG2 (including DVD, HDV, IMX, etc.), H.264

Intraframe compresses each frame individually, while Interframe analyses a Group of Pictures and compresses based on I-Frames, P-Frames and B-Frames.

Digital Video Compression: Motion-Compensated Interframe Prediction
  Group of Pictures (GOP): usually between 1 and 15 frames (30 fps in NTSC video), made up of:
  I-Frames are intraframe. They refer only to themselves.
  P-Frames are predictive frames. They refer to the information in the I-Frames as well as to themselves (intraframe) and can be smaller in file size than I-Frames.
  B-Frames are bi-directional predicted frames. They refer to both I- and P-Frames, require less data than either I- or P-Frames, and cannot be used as reference frames in MPEG2.

I-Frame only encoding, which is essentially intraframe, is also possible with these compression schemes, yielding better quality and larger files than IPB GOPs.
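To make the GOP structure concrete, the following sketch builds the frame-type pattern of a single GOP in display order. The parameter names gop_length and anchor_spacing are illustrative choices, not terms from the talk: gop_length is the number of frames in the GOP, and anchor_spacing is the distance between reference (I or P) frames.

```python
from collections import Counter


def gop_pattern(gop_length, anchor_spacing):
    """Return the frame types of one GOP in display order, e.g. 'IBBPBB'."""
    if gop_length < 1 or anchor_spacing < 1:
        raise ValueError("gop_length and anchor_spacing must be at least 1")
    frames = []
    for index in range(gop_length):
        if index == 0:
            frames.append("I")   # every GOP opens with an intraframe picture
        elif index % anchor_spacing == 0:
            frames.append("P")   # predicted from an earlier reference frame
        else:
            frames.append("B")   # predicted from references on both sides
    return "".join(frames)


pattern = gop_pattern(15, 3)
print(pattern)            # IBBPBBPBBPBBPBB
print(Counter(pattern))   # one I-Frame, four P-Frames, ten B-Frames
print(gop_pattern(1, 1))  # I  (I-Frame only, essentially intraframe)
```

In a 15-frame GOP of this kind only one frame in fifteen is stored in full, which is where interframe compression gets its efficiency and also why damage to an I-Frame affects every frame that refers to it.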
Digital Video Compression: Compression Artifacts
  Most video compression schemes use some form of discrete cosine transform (DCT).
  Image: "Why Not Live?", Red Cross film from archive.org, MPEG2 encoded at 3.7 Mbps

DCT breaks the image up into discrete areas, which are then subjected to calculations that determine how much information can be discarded. At high compression rates, this can result in blockiness, ringing and blurring. JPEG2000 uses wavelets and, at high lossy compression rates, exhibits more blurring and ringing but not blockiness. This image is from a Red Cross film called "Why Not Live?" from the Prelinger collection at archive.org. It is an MPEG2 file encoded at a low bit rate, about 3.7 Mbps.
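The way a DCT-based codec discards information can be illustrated on a single 8x8 block. The sketch below is a toy model, not any real codec: it applies a plain orthonormal 2D DCT, rounds every coefficient to a multiple of one uniform step size, and transforms back. Real encoders use per-frequency quantization tables and entropy coding, and the sample block and step sizes here are arbitrary values chosen for illustration.

```python
import math

N = 8  # block size used by this toy example


def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_1d(values):
    return [_scale(k) * sum(values[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N)) for k in range(N)]


def idct_1d(coeffs):
    return [sum(_scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]


def _apply_2d(block, transform):
    """Apply a 1D transform to every row, then to every column."""
    rows = [transform(row) for row in block]
    cols = [transform(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dct_2d(block):
    return _apply_2d(block, dct_1d)


def idct_2d(coeffs):
    return _apply_2d(coeffs, idct_1d)


def quantize(coeffs, step):
    """Round each coefficient to the nearest multiple of step (the lossy part)."""
    return [[round(c / step) * step for c in row] for row in coeffs]


def count_zeros(coeffs):
    return sum(1 for row in coeffs for c in row if c == 0)


# Arbitrary sample block: a horizontal ramp with some fine detail added.
sample_block = [[16 * x + (7 * y * x) % 13 for x in range(N)] for y in range(N)]

coeffs = dct_2d(sample_block)
for step in (1, 16, 64):
    kept = quantize(coeffs, step)
    decoded = idct_2d(kept)
    worst = max(abs(a - b) for ra, rb in zip(sample_block, decoded)
                for a, b in zip(ra, rb))
    print(f"step {step}: {count_zeros(kept)} of 64 coefficients are zero, "
          f"largest pixel error {worst:.1f}")
```

A larger step zeroes out more coefficients, which is what makes the data smaller. Because each block is quantized on its own, neighboring blocks stop matching at their edges once enough detail is thrown away, and that mismatch is the blockiness visible at low bit rates.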
Digital Video Compression: Compression Artifacts

This close-up shows the artifacts more clearly. Moving water suffers a great deal of loss at this data rate. The diagonal lines in the hat have aliasing problems. But the arms and legs are not so bad because they lack detail and complexity.
Digital Video Compression: When is uncompressed appropriate?
  Video Formats chart available at http://videopreservation.stanford.edu/dig_mig/video_formats_v4_850.html

When moving from analog to digital, it may be overkill to capture as uncompressed digital video files, depending on the original tape format. Does it make sense to store marginal standard definition formats in the highest quality digital format? For instance, VHS has 240 lines of resolution, U-matic has 280 and DV has 480 lines. DV is compressed 5:1 but is a component video format, whereas U-matic and VHS are composite. There is a useful chart available at http://videopreservation.stanford.edu/dig_mig/video_formats_v4_850.html

As stated earlier, uncompressed video can be 95 GB per hour, whereas DV is only 13 GB per hour. There is a wide range of commonly used compression schemes for video. Some are less compressed than DV (DV50, MPEG2 50 Mbps, etc.) and some are more heavily compressed (DVD, H.264, etc.). It makes sense to limit the number of formats to a few supported ones, or you may run up against unexpected playback problems in the future. In the coming year, BAVC will be investigating which of the most commonly used compression schemes are appropriate for various analog tape sources. This study will be part of a larger project we are working on with the Dance Heritage Coalition, and the results and recommendations will be made public at a future date.
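The storage consequences of that choice are easy to estimate for a given collection. The sketch below uses the approximate per-hour sizes quoted in this talk; the collection size of 100 hours and the copies parameter (for example a server copy plus a backup) are illustrative assumptions.

```python
# Approximate sizes in GB per hour, as quoted in this talk.
GB_PER_HOUR = {"DV": 13, "DV50": 25, "Uncompressed SD": 95}


def storage_tb(hours, gb_per_hour, copies=1):
    """Estimate storage in decimal terabytes for a collection."""
    return hours * gb_per_hour * copies / 1000


for name, rate in GB_PER_HOUR.items():
    single = storage_tb(100, rate)
    doubled = storage_tb(100, rate, copies=2)
    print(f"{name}: {single:.1f} TB for 100 hours, {doubled:.1f} TB with two copies")
```

For 100 hours, the gap between DV and uncompressed capture is roughly 1.3 TB versus 9.5 TB per copy, which is the difference that has to be weighed against the quality of the original tape format.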
Storage and Management
  Which medium?
  Which system?

Once you have your digital assets, you need a place to store them, manage them and access them.
Storage
  Video Tape
  Optical Media
  Data Tape
  Hard Drives
  Servers

There are many different forms of media to store files on.
Storage: Digital Video Tape
  Robust professional formats are expensive
  Still need to migrate later
  Subject to chemical breakdown, physical damage

The traditional method is to reformat to another tape. We are comfortable with this way of working, and you can put the tape on the shelf and forget about it. Unfortunately, if you wait too long, you can end up with an expensive reformatting project again in several years, especially if the format becomes obsolete or the tapes get compromised. There is no way to automatically check their integrity over time.
Storage: Analog Video Tape Dropout

The source of this tape is Hi8, a small analog tape format with bad dropout issues. As you can see, though imperfect, analog is forgiving. In less than a second, the dropout has passed and most people won't notice.
Storage: Analog Video Tape Environmental Damage

This is a 1/2" open reel tape that came to us from New Orleans. In addition to the mold, the tape had absorbed a great deal of moisture and had become sticky. It was very labor intensive, but it did play back, albeit imperfectly. If this were a digital tape, I doubt any of it would be retrievable. We have some similarly damaged tapes from Hawaii. We are transferring them to Digital Betacam and DVCAM, and I have warned the owners to store the new tapes in a climate controlled facility or they will lose all of the video when those digital tapes get attacked by mold and moisture. By the way, please don't send us moldy tapes. If you have them, we can refer you to the right place. What this means is that your digital video tapes may be in greater danger than analog tapes, depending on format and storage conditions.
Storage: Digital Video Tape Dropout
  DV Tape
  Digital tape vs. analog tape: dropout may be a nuisance in the analog realm, but can be catastrophic in the digital realm.

This is a simulation of dropout on DV tape. The block size represents the amount of compression used. Mini DV tape is very small (about 1/4") and easily damaged. Dropout can cause loss of audio information as well.
Storage: Digital Video Tape Dropout
  Digital Betacam

This is a simulation of dropout on Digital Betacam tape. The block size represents the amount of compression used. Digital Betacam tape is more robust (1/2") but is also quite thin. To be fair, Digital Betacam has been around since 1993 and I have rarely seen playback problems with it. I have not yet been presented with a moldy or water damaged Digital Betacam tape. If I were presented with such a tape, I would send it elsewhere, because our Digibeta decks are too expensive to repair to risk damaging them.
Storage: Optical Media
  OK as interim access format
  Damage from scratches, chemical breakdown
  Limited storage capacity
  Easy to lose

Standard single layer DVDs hold 4.7 GB; dual layer DVDs hold about 9 GB. Blu-ray discs can reach up to 50 GB, with 100 GB already prototyped but not yet available.
Storage: Data Tape

Good for data backup, not great for access. Slow to recover files. Unless you store all data tapes in an autoloader, it is difficult to do error checking.
Storage: Hard Drives On A Shelf

If you have just a few video tapes in your collection, you could get away with a hard drive on a shelf (in an enclosure, of course). At this level, without the overhead of a server, it is equivalent in price to Digital Betacam, with potentially better quality and easier access and migration. You just have to remember it is there and check it from time to time.
Storage: Servers

It is my opinion that servers are the way to go for storing files long term. They allow for checksums to be run at regular intervals and for DAM software to manage the files. I would recommend servers with tape backup to be safe. Of course, servers require expensive infrastructure and consume large amounts of electricity, especially since they require air conditioning. They also require an IT staff. As the cost continues to drop, this will seem more realistic. These are BAVC servers. Some of them contain media files, others are for admin purposes. Everything is backed up to LTO-4 tapes and taken off site. They are connected to other parts of the facility with 10 Gb fiber and 1 Gb Ethernet.
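Running checksums at regular intervals usually means recording a digest for every file once and then recomputing and comparing on a schedule. The following is a minimal sketch of that idea using SHA-256 from the Python standard library. The function names and the choice of SHA-256 are illustrative assumptions and do not describe BAVC's actual setup; in practice a DAM or dedicated fixity tool would typically do this job.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {path.relative_to(root).as_posix(): sha256_of_file(path)
            for path in sorted(root.rglob("*")) if path.is_file()}


def verify_manifest(root, manifest):
    """Return (relative path, problem) pairs for missing or altered files."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of_file(target) != expected:
            problems.append((rel_path, "changed"))
    return problems
```

The manifest would be built once at ingest and stored separately from the files. A scheduled job can then call verify_manifest and alert someone when the returned list is not empty, which is the kind of automatic integrity check that tapes on a shelf cannot offer.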
Storage: Solid State
  The future of servers is solid state storage.

It is my opinion that future server storage will be made from solid state drives. While they are still too expensive, solid state storage devices consume very little electricity, have no moving parts and are very sturdy. I accidentally ran this thumb drive through the laundry twice and it still works. I lost no data (probably just a fluke). One downside is excessive heat under heavy-duty access. I am confident that this will be resolved in time. They already make full-sized drives, such as this 64 GB Flash SSD laptop drive, for only $750. Flash SSD drives as big as 128 GB go for about $450.
Metadata
  The great thing about standards is there are so many to choose from.

Metadata. I can't talk about this without talking about metadata a bit. The ideal is for all of the standardized metadata to travel with the digital object. In the world of video, there are greater or fewer fields of metadata available depending on the wrapper. Some wrappers, such as MXF, contain an XML element to hold metadata. Unfortunately, MXF has many different flavors and is still not widely used. Most notably, Apple has refused to support it in favor of its own QuickTime. There is no truly generic form of MXF, but if there were, it might solve the embedded metadata problem. AVI files can hold very limited basic metadata and QuickTime files can hold more, but not anything that lives up to library standards. Suffice it to say, as it stands now, some rudimentary metadata can be encoded in the object directly and the rest will happen in your Digital Asset Management system. I am not an authority on metadata, so I can't make strong recommendations. It seems like there is an emerging consensus regarding METS with MODS and PREMIS, or some variation of that, but I'll bet Hannah can address this better than I can.
It would be wise to limit the number of formats in your DAM and to make sure that you always have software and/or hardware attached that can read every format you are storing. BAVC is beginning to work with Final Cut Server and we are moving towards a Fedora implementation in the near future. Final Cut Server uses no recognizable metadata standard, but we are looking into customizing metadata sets.

Digital Asset Management
  Open Source: DSpace, Fedora Commons, Greenstone
  Proprietary: Canto Cumulus, VFinity, Final Cut Server, etc.

When embarking upon a digitization program, it is necessary to first have a strategy for managing the files. This requires the implementation of a Digital Asset Management system and the accompanying infrastructure. There are many choices, both open source and proprietary. The advantage of open source is that you are usually staying in a standards-based world and are not locked into working with a single company. Also, if you make changes to open source software, they can be shared with the wider community to advance knowledge overall. The downside is that you need a higher level of expertise to set up these systems, and it may be challenging to find consultants for some of them. Proprietary solutions may come with a service package and skilled people to customize them, and you may get up and running more quickly, but you may find yourself stuck within their solution. In addition, many commercial solutions for managing audio and video may not conform to digital library standards. There are advantages and disadvantages to both options, but this is a discussion for another panel. At present, many DAMs are being used for smaller digital files, but storing and serving large video files is still in the experimental phase. Storage costs are dropping and will soon be much less of an issue than they are today.
Digital Dilemmas: Dealing with Born-Digital and Digital Surrogate Audio and Audio-Visual Collections
angelo@bavc.org
www.bavc.org