Cloud Computing and Digital Preservation: A Comparison of Two Services

Amanda L. Stowell
San Jose State University

Abstract

This paper discusses the obstacles to and methods of digital object preservation, defines the terms cloud computing and software as a service, and identifies the risks associated with cloud usage. It then compares two cloud-based services for digital object preservation.

Digital Object Preservation

In archival and records information management, it becomes necessary to preserve an object, digital or physical, sometimes for a designated period of time and sometimes permanently. In fact, preservation is the primary responsibility of museums, libraries, and archives, and safeguarding objects of enduring legal, societal, and business value is of the utmost priority (Cloonan, 2007). With regard to digital objects, preservation can be described as "... ensuring the authenticity, usability, and accessibility of digital objects through time as long as required or/and ... ensuring technological survival of digital information or objects as long as required" (Hofman, 2002, p. 1).

Digital objects pose a unique challenge for archivists and records information managers. Unlike their tangible physical counterparts, digital objects can be difficult to describe and observe. "Preservation is of serious concern in the library/archival community because of the evanescence and eventual obsolescence of the hardware and software we must use to access the texts" (Cloonan, 2007, p. 135).

Another important factor to consider in the preservation of a digital object is the recreation of the experience as the creator intended. "An informational entity that is 'preserved' without being usable in a meaningful and valid way has not been meaningfully preserved, i.e., has not been preserved at all" (Rothenberg, 2000a, n.p.). To explain further, a digital object cannot simply be preserved as a single entity; the environment in which it was created must also be replicated. According to Rothenberg, preserved publications should retain as much as possible of their original functionality and behavior (2000b, p. 43). In the case of digital objects, this means that the environment must be recreated.

Metadata plays an important role in the preservation of objects of any kind; in the case of digital objects, however, its role is crucial.
Metadata, in its simplest form, is data about data, but it is far more complex than that, and without it a digital object might be entirely inaccessible. It provides future users with all the necessary information they will need to open, render and interact with the preserved digital object (Doyle, Viktor, & Paquet, 2009, p. 34).

There are three primary methods of digital preservation: migration, normalization, and emulation. Migration of digital objects involves waiting until the software or hardware required to access a digital object becomes obsolete, and then transferring the files to a new software application or hardware configuration (Doyle et al., 2009, p. 35). Normalization, a type of migration, is the process of converting a digital object to an open, non-proprietary format. This minimizes the risk of a file format becoming obsolete because it depends on proprietary software or because a company no longer supports it. Emulation is the process of recreating on current hardware the technical environment required to view and use digital objects from earlier times (Doyle et al., 2009, p. 34). It is the mimicking of a software, hardware, or operating system environment for the purpose of allowing an object to be experienced in its original state. For particular digital object types, this might be the only method of preservation currently available.

Cloud Computing and Software as a Service

Cloud computing and software as a service (SaaS) are ever changing and evolving as the needs of organizational communities change, lawsuits are filed and legal requirements adjust, and understanding of the cloud's potential becomes more widespread. In 2011, the National Institute of Standards and Technology (NIST) released a publication that defined cloud computing as having five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. NIST determined that the cloud could be deployed using one of four models: private cloud (limited access), community cloud (access for a select group with a common purpose), public cloud (access for all), or a hybrid cloud (a combination of the
aforementioned) (Mell & Grance, 2011, pp. 2-3). The publication also provided a definition of SaaS: the consumer uses the provider's applications running on a cloud infrastructure, and the applications are accessed via a web browser or a program interface and are managed by the provider (Mell & Grance, 2011, p. 2). The provider is also responsible for managing the network, servers, storage, and other underlying infrastructure.

Academic documentation, definitions, and research predating the NIST publication can be difficult to find and vary in their specifics. In fact, many publications from before 2008 reflect uncertainty about the longevity of cloud computing and struggle to define the term. While earlier definitions can be found, the NIST publication seems to unify them and set a standard for the academic fields going forward. One notable 2010 publication by Armbrust et al. identified ten obstacles and opportunities for cloud computing, provided clear definitions, and offered recommendations for the future of cloud computing and SaaS, including increased scalability.

Cloud Computing Risks

Just as information or records stored in a physical location are susceptible to certain risks, so are digital objects in a cloud environment. Two of the main risks concern security and privacy. These are concerns not only of the business or organization that uses the cloud but also of its patrons or customers. Cloud storage and computing are still relatively unfamiliar outside the information professions, and with the threat of identity and information theft, patrons and customers can be quite skeptical of having their information float in an unknown realm. Companies that offer cloud computing solutions understand these threats and have taken measures to minimize the risk.

Security

One of the main challenges of cloud computing is security. The most secure cloud environment would be an environment with no access: no access equals optimal security. This,
however, is simply not possible and completely defeats the purpose of cloud computing. The consumer requires access. In fact, the ideal cloud environment for the consumer would be one of unfettered, high-speed, high-performance access. This is not possible either, as limitless access would make security impossible and the loss of data and information uncontrollable. Neither scenario, ultimate security or limitless access, is possible. The challenge for a cloud service provider is to find the balance between access and security: "a secure cloud provides a reliable service by protecting data and its services available to the client with high performance" (Sinjilawi, AL-Nabhan, & Abu-Shanab, 2014, p. 193).

Privacy

Information and user privacy is important in cloud computing and poses another of the main challenges of a cloud environment. When information is left exposed in the cloud, transferred, or shared, personal or sensitive data is at risk of exposure. Sinjilawi et al. (2014) offered these possible solutions to address privacy in the cloud:

Anonymity-based method: data is anonymized before it is transferred from the user to the cloud service. The cloud service provider has the necessary data on hand to obtain what it needs from the anonymized information.

Privacy-preserving authorization system: data is not encrypted, as the service provider is trusted. The user sets parameters on access and access policies.

Privacy-preserving architecture: this method requires that the service provider have a key or a decoder to even read the information being sent or requested. This ensures that no party without a key would be able to decipher the data.

Oruta approach: the original user can control their data and its flow in the cloud; users submit a request and then wait for verification of the data, relying on a third-party auditor (TPA) to carry out auditing. Here,
three algorithms, KeyGen, RingSign, and RingVerify, are constructed to achieve privacy-preserving auditing (Sinjilawi et al., 2014, p. 197).

Cloud Service Comparison

This paper has briefly discussed many of the factors relating to cloud computing, its environment and infrastructure, and SaaS. This section compares two vendors, their cloud services, and their approaches to security and privacy in the management of digital object preservation.

Digital Archive

OCLC offers many products for libraries, archives, and records management. For the purposes of this paper, the Digital Archive service will be discussed. Located under the digital collection management tab on the OCLC website, Digital Archive is described as secure, managed storage for digital preservation (OCLC, 2015). In the overview section, OCLC mentions a few key features of the service: long-term storage of an organization's digital files in a purpose-built environment; specific processes that are followed to keep the files secure over a long period of time; quality checks that are performed when data arrives at the center, followed by routine checks of digital health; and smooth operation with OCLC's CONTENTdm software through integration with user-defined workflows.

As previously discussed, security and privacy are two of the main risks to digital objects in a cloud environment. Digital Archive takes measures to minimize these risks:

Physical security: the system is housed in a limited-access operations facility, monitored 24/7 by system operators, security guards, and cameras.
Data security: processes and procedures are created and actioned.
Data backups: copies of all data are distributed among multiple locations.
Disaster recovery: facilities possess a disaster plan.
ISO compliance: ISO 9001 certified and ISO 27001 compliant.

Additionally, OCLC performs routine virus scans, manifest checks, fixity checks, and format verifications. It is not explicitly stated whether files are routinely migrated to a newer format version or converted to an open format.

Preservica

Preservica offers cloud services aimed at archives, libraries, museums, and government organizations, small to large. In fact, scalability is one of the software's main features, with three packages offered to fit organizations of any size. The website features a few videos that demonstrate the processes and services that Preservica can offer, which makes the website attractive and easy to navigate and understand. Preservica offers many reasons why its service is superior to the competition. Notably, Preservica states that it has contributed to the development of many standards in the field, including PRONOM, DROID, and OAIS.

Preservica addresses long-term accessibility by using the file migration method, which is intended to ensure that a digital object uploaded by the user will still be accessible and fully functional in the future. A video demonstrating this can be found on the website under the how it works tab. With respect to privacy and security, users define access rights and privileges, and external users are required to authenticate before gaining access to information. Information could not be found on the website regarding compliance with ISO standards or measures taken to ensure the physical security of the digital information.

Conclusion

This paper provided a brief overview of digital object preservation and of cloud computing and its risks, and then compared two such services based solely on what could be found on their
websites. Preservica seems to have the upper hand in name recognition, as the list of organizations using the software was quite lengthy. However, it was appreciated that OCLC made a point of listing the steps that have been taken to ensure the security and integrity of the information. Both services offer the same final product, preservation of the digital object, but deciding which would be appropriate for an individual organization would require more information, such as cost, the longevity of the service provider's business plan, the interface, and specific functions.
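
As an illustration of the integrity measures noted above, the manifest and fixity checks that OCLC lists can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and does not represent either vendor's implementation: the function names (sha256_of, build_manifest, check_fixity), the choice of SHA-256 as the checksum, and the manifest layout (a mapping from relative file path to digest) are all hypothetical and were not found on either website.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its digest.

    The manifest layout is an assumption made for this sketch.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root with a stored manifest."""
    current = build_manifest(root)
    return {
        # Listed in the manifest but no longer present on disk.
        "missing": sorted(set(manifest) - set(current)),
        # Present on disk but never recorded in the manifest.
        "unexpected": sorted(set(current) - set(manifest)),
        # Present in both, but the content no longer matches.
        "altered": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In this sketch, a manifest would be built once when files are deposited and stored separately from them; rerunning check_fixity on a schedule then reports any file that has gone missing, appeared unexpectedly, or changed since deposit.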

References

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., ... Zaharia, M. (2010). A view of cloud computing. Communications of the ACM, 53(4), 50-58.

Cloonan, M. (2007). The paradox of preservation. Library Trends, 56(1), 133-147.

Doyle, J., Viktor, H., & Paquet, E. (2009). Long-term digital preservation: Preserving authenticity and usability of 3-D data. International Journal on Digital Libraries, 10(1), 33-47. doi:10.1007/s00799-009-0051-7

Hofman, H. (2002). Can bits and bytes be authentic? Preserving the authenticity of digital objects. Retrieved from http://eprints.erpanet.org/39/01/hofman_glasgow02.pdf

Mell, P., & Grance, T. (2011). The NIST definition of cloud computing (800-145). Retrieved from National Institute of Standards and Technology website: http://csrc.nist.gov/publications/nistpubs/800-145/sp800-145.pdf

OCLC: Worldwide, member-driven library cooperative Global. (2015). Retrieved from http://www.oclc.org/en-us/home.html

Preservica: Digital preservation technology. (2015). Retrieved from http://preservica.com/

Rothenberg, J. (2000a). Preserving authentic digital information (92). Retrieved from Council on Library and Information Resources website: http://www.clir.org/pubs/reports/pub92/rothenberg.html

Rothenberg, J. (2000b). An experiment in using emulation to preserve digital publications. Koninklijke Bibliotheek, The Hague, The Netherlands. Retrieved from http://www.studioautomata.com/itp/indestudy/emulationpreservationreport.pdf

Sinjilawi, Y. K., AL-Nabhan, M. Q., & Abu-Shanab, E. A. (2014). Addressing security and privacy issues in cloud computing. Journal of Emerging Technologies in Web Intelligence, 6(2), 192-199. doi:10.4304/jetwi.6.2.192-199