Campus Research Data Storage Survey
Clare Fletcher
November 7, 2014

As a follow-up to the discussion of item #12 on the September 4 CoUL meeting agenda, all campuses were asked to provide information about local research data storage solutions. At the time of this summary, responses had been received from nine campuses.

In general, all campuses offer a wide range of storage services to their local, and in some cases external, communities. These solutions are supported at all campus levels (central IT, the library, and individual schools, departments, and laboratories) and are either locally provisioned or provided by other campus units (e.g., SDSC), system-wide units (e.g., CDL), or commercial cloud providers (e.g., Box, Google, Microsoft). These storage offerings are supported through a mixture of central funding and recharge. While the latter option is appropriate for meeting grant-funded project needs during the period of project performance, it leaves open the question of long-term sustainability once a project terminates. It is unclear whether central campus funding for storage is recognized as an ongoing obligation by campus administrations, or whether it is being provided as a one-time or temporary initiative made possible by so-far modest demands for use.

All campuses are open to exploring the widest range of future options for providing appropriate storage solutions for their constituencies, including the use of other campus and commercial providers, so long as minimal service-level assurances are provided. The campuses all recognize that a careful balance must always be struck between financial sustainability and the desire to fully satisfy the particular storage needs of various categories of content, e.g., active vs. final form, private vs. public, dark vs. bright, common vs. unique. Note that this implies a non-trivial level of curation in order to classify content properly for disposition to an appropriate service level.
This survey reveals an abundance of storage options, although much is available on a relatively unmanaged basis and thus may not be appropriate for the long-term stewardship of research data or other library content. One promising avenue forward, then, would be for the campuses and CDL to explore options to layer appropriate curation services on top of existing non-curation-aware storage solutions: for example, external validation of bit-level fixity of remote storage, external replication brokerage to ensure sufficient redundancy, unified management interfaces to distributed content, etc. By enabling "preservation in place," existing storage options can be used with greater efficiency, minimizing the need to provision new alternative solutions. It also provides a means to exploit low-cost commercial options, e.g., "free" storage capacity provided by Dropbox, Google Drive, Microsoft 365, etc., with reasonable assurance of desirable long-term curation outcomes. Page 1 of 14
Survey Responses

1. What storage facilities for research data are available on your campus at the campus-wide, school, department, ORU/MRU level?

IS&T-hosted data storage service. There are multiple versions of data storage, most of which is unmanaged (i.e., good for data storage but not really good for long-term curation without additional software or services layered on top). The library also has considerable amounts of curated data storage, but in general we turn to Merritt for long-term preservation.

The campus IT department (IET) provides a SAN service for anyone on campus, charged by the GB and provisioned using NFS or CIFS. School and departmental offerings vary greatly. In Colleges, researchers can leverage any combination of the IET SAN, College SAN, and local disk arrays with cloud backup. Most Colleges have some level of local storage that is made available for their researchers' use. Additional data storage services for particular data types are also available: e.g., data storage for HPC clusters, or social science data storage managed by the College of Letters & Science; data storage to support bioinformatics at the School of Medicine's CTSC and the campus Genome Center (e.g., the REDCap service).

The Office of Information Technology (OIT) manages a research computing cluster (HPC); users of HPC can purchase or add storage to its petabyte-sized, high-performance BeeGFS distributed file service. Some faculty have built their own servers with 100+ TB of storage just for their labs. Many have done this because they need fast access to the data (1 Gb speed) in order to process data quickly, so they have built their own internal network infrastructure (with OIT's help). For those people the concern is more about how quickly they can access the data. Libraries have significant data storage capacity.

At the campus level, the core storage component is the Cloud and Archive Storage Service (CASS), run out of IDRE, the central IT unit for research support.
CASS is working storage, and is available via a number of common protocols (NFS, SMB, iSCSI, etc.). There are some good preservation features (two-site replication, software RAID, parity, etc.) baked in for those who want them. Negotiations are ongoing to secure a replication partner outside of Los Angeles. IDRE also maintains several high-end, fast storage systems in support of HPC operations. Schools, divisions, departments, and labs are all in the same boat: they either buy local storage on their own or, increasingly, buy it from IDRE. The Library has its own high-end storage with off-site replication, about 200 TB worth, focused on the Digital Library.

IT provides campus network-attached storage (NAS) with a starting quota of 1 GB for individuals and 60 GB for departments at no charge. Although limited at this time, IT also provides direct-attached storage via Fibre Channel to VM host machines, stand-alone servers, and VMs on a recharge model. Campus IT rolled out a Box service in 2013: individual accounts get 10 GB of storage; there is a group/department option of 30 GB, with the option to increase. At the Library: SAN-based storage is available via the Library datacenter for Library digital initiatives and other operational needs.

At the campus level, from the Computing and Communications (C&C) organization:
Research data storage varies widely across the campus. High-level overview: C&C provides faculty with web servers and limited storage for web sites and summary data. C&C provides a computational cluster and disk for research needs. C&C provides general-purpose storage (network drives, SharePoint, etc.) for general-purpose needs. Faculty grants and projects include a huge variety of multi-TB storage systems for storing research data. The Library has a large file server (Edgar) for storing digital library objects; we temporarily use this server to store research data as part of the Merritt deposit workflow.

The central campus data center offers data storage services, but at present it is mainly used for administrative applications and data rather than research data. Most faculty-generated data is handled at the academic department and divisional level.

- SDSC data center: multiple petabytes of storage available on a recharge basis; not preservation storage.
- Library DAMS: hundreds of terabytes of digital asset storage.
- Chronopolis: hundreds of terabytes of preservation storage (also housed in the SDSC data center).
- Departmental storage: ranges from full data centers with up to a petabyte of storage down to individuals with large drives.

- MyResearch: a controlled-access environment, especially used for data with PHI; an environment that provides tools for analysis as well as storage. Available campus-wide, provided by Academic Research Systems.
- Box: a Dropbox-like environment for file sharing, with a 5 GB file-size limit; data including PHI cannot be put in it. Available campus-wide, provided by Information Technology Services (ITS).
- Data Services Center: provides hosting and data-management solutions. Available campus-wide, provided by ITS.
- DataShare (Dash): a subject- and format-agnostic open data repository. Available campus-wide, provided by the Library.

There are institute- and department-wide options for storage of research data.
The services are mostly limited to serving the needs of the individual department or institute, and are available to specific departments or institutes.

All campuses offer a range of storage services spanning all levels of their organizations: central IT, library, school, department, lab. There is some reliance on system-wide (i.e., CDL) and commercial offerings.
2. Are these primarily targeted for working space for highly transactional and dynamic resources, or for long-term preservation of largely final-form resources?

The IS&T storage is largely focused on highly transactional/dynamic resources. In fact, there are SSD-backed (solid-state) solutions which are exceptionally fast. In conjunction with the Berkeley Research Computing effort, there are new efforts around connecting fast storage to parallel computing facilities.

Each storage solution is targeted toward the need (which is why one size does not fit all). Some researchers need hundreds of TBs of storage but their retention period is only a few days; others need only a few GB but their retention period is forever. The solutions align with the researchers' needs and budgets. None of the data storage services available on campus offers curation services like digital preservation (beyond backup and fixity checking). But researchers on campus do not make a distinction between working and archival storage, so options that create barriers to working with the data aren't acceptable to them (and are unnecessary in their view).

They are primarily targeted for highly transactional and dynamic resources, mainly short-term data generation and analysis. The transactional resources are usually very large data sets, but the final-form resources tend to be very small in comparison. Many faculty simply do long-term storage on lab computers and drives. Libraries rely on Merritt for long-term preservation.

The campus focus is on active storage, not long-term storage. The Library is more oriented toward preservation.

Primarily for highly transactional and dynamic resources. At the Library: for research-generated storage needs, the Library does not have long-term preservation storage for final-form resources or dynamic, highly transactional storage. However, the Library provides working-space storage for the development of sustainable digital objects.
For long-term preservation, the Library is planning to use the California Digital Library (CDL) Merritt repository. At the campus level (C&C): in general, these storage systems are NOT targeted for long-term preservation, although there are exceptions (e.g., the Center for Bibliographical Studies and Research). The services are primarily used as working spaces for highly transactional and dynamic resources. Long-term preservation is handled through the Library's use of Merritt.

SDSC storage is highly transactional and geared especially toward active computing. Library and Chronopolis storage are for long-term storage and/or preservation. Departmental storage is usually transactional, and preservation hasn't been considered yet.

Except for DataShare, the campus-wide services tend to fall more into highly transactional/dynamic resources rather than long-term preservation. MyResearch is meant as a secure space for datasets that are part of an ongoing research project and are not ready for long-term preservation. However, MyResearch does provide HIPAA-compliant backups. Box could be used to store research data, but anecdotal evidence says that researchers aren't using it for that purpose; it is primarily used for sharing documents that are presentation-ready (ppt, doc).
Most campus storage options are intended as working space for transactional resources. Some campuses rely on CDL or locally developed services for long-term preservation purposes. One campus makes the important point that while the distinction between working and preservation storage is meaningful to us, it isn't to faculty.

3. Under what financial arrangement are these storage services offered?

BRC is offered through a condo purchase model (i.e., you contribute resources but benefit from central management and a large cluster). Disk space and traditional computing are offered through service recharge (i.e., subscription). Different services have different cost models, from pay-as-you-go and bulk recharge to centrally funded, e.g., Box.com and OneDrive (Office 365).

In general, small amounts of storage are covered by the local unit or centrally provided services, while larger amounts are covered by grants via recharge services. For the HPC, storage is paid for by both the schools and the faculty. If paid for by a school, storage is shared by all faculty and researchers, but those who pay on their own are entitled to the full amount of space they purchased. For school-level storage, researchers either purchase their own NAS devices or pay monthly for their use of a NetApp storage appliance (where available).

Both CASS and HPC storage are available on a TB/year basis, by recharge. Discounts are available for large or multi-year purchases.

Currently, storage services are centrally funded, with cost models being developed. The Box service is offered free at the initial levels of storage noted above, with the option to increase. At the Library: since the Library does not offer storage services at the present time, we do not have any system in place for financial arrangements for storage services. In the future, as needs require, the Library may consider the financial models used by CDL Merritt and the San Diego Supercomputer Center (SDSC). The Library will reevaluate this topic in, at most, a year or two.
At the campus level (C&C): central services are offered at no cost to principal investigators (PIs) or are funded by grants/contracts. The central campus data center has a fee-based system: for each virtual machine, storage costs $3.50/GB/year, and the base for the VM (1 CPU & 1 GB RAM) is $ /year.

SDSC is under a recharge, per-TB service. Library service is provided free to campus users (i.e., funded via campus). Chronopolis service is provided under recharge for non-campus users; campus use is funded via campus. Departmental storage is covered by a variety of university and grant funds.

To date, storage on MyResearch has been free. Beginning Nov. 1st, researchers will receive 10 GB of storage for free; above 10 GB, they will be charged enterprise costs (currently $0.24/GB/month). Specific information about these charges is available at
There is no charge to use Box. The Data Services Center provides fee-based services. Currently there is no charge to use DataShare.

Storage is provided either on a centrally funded or recharge basis. "Free" (i.e., centrally funded) storage often has size limitations. Funds for recharge are generally assumed to come from grants/contracts. While this is adequate during the period of grant performance, it leaves open the question of the sustainability of longer-term retention after grant termination.

4. Who is eligible for these services?

Any department or individual researcher can take advantage of these technologies. All campus affiliates.

The research computing cluster is made available to any faculty member/researcher and their lab staff. School/department-level services are offered to their faculty, staff, and grad students.

CASS is presently available to all UC campuses. HPC storage is reserved for research groups.

Any UC Merced researcher or department is eligible. At the Library: faculty/researchers, and possibly assisting other on-campus units with transitional storage; for example, Strategic Communications passing data through university archives. At the campus level (C&C): faculty/researchers/graduate students.

Departments, divisions, and research groups; no individuals. SDSC: anyone. Library: campus users. Chronopolis: anyone (non-commercial). Departmental: usually limited to members of that department.

Campus-wide services are available to all faculty and staff. Box and DataShare are also available to students. MyResearch is available to outside collaborators who are working with a member of the campus community.

All campus storage services are available to the campus research community, i.e., faculty, staff, and (in some cases) graduate students. Some campuses also provide services for their entire campus community, including undergraduates. One campus offers services for the broader (non-commercial) community.
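The recharge rates quoted above compound quickly at research scale, which is part of why post-grant sustainability keeps coming up. A rough back-of-the-envelope sketch, using the $0.24/GB/month rate and 10 GB free tier from the MyResearch example (the helper function itself is just an illustration):

```python
# Rough annual recharge cost under a per-GB/month rate with a free tier,
# using the MyResearch figures quoted above ($0.24/GB/month, 10 GB free).
# Illustrative only; actual rates and tiers vary by campus and service.
RATE_PER_GB_MONTH = 0.24
FREE_TIER_GB = 10.0


def annual_cost(gigabytes: float) -> float:
    """Annual cost in dollars for a given amount of stored data."""
    billable_gb = max(0.0, gigabytes - FREE_TIER_GB)
    return round(billable_gb * RATE_PER_GB_MONTH * 12, 2)
```

At these rates, a 1 TB dataset runs on the order of $2,850 per year, an ongoing charge that has no obvious funding source once the originating grant ends.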
5. Is your library or campus aware of or making use of storage services made available from other campuses?

We are, and we are completing our own investigation of data storage services. The library currently has the capacity to store 100 TB across all of our data storage services but needs some mechanism to make this storage more dynamically allocatable. In addition, we are interested in layering data curation services on top of these storage platforms to allow us to use our storage more effectively. The solutions being considered include CDL's Merritt, SDSC data storage, Amazon Glacier and S3, and a few local infrastructure-based solutions. I have also been watching Google, Dropbox, GitHub, and other SaaS data storage providers to see if they offer solutions that would be a good fit for library data storage services. As you know, we make extensive use of Google Drive for our own business data storage, and the presence of APIs on Google Drive makes long-term curation and storage possible with little user intervention (e.g., creating a right-click menu to store or publish versions of data).

There is general awareness of these services, but only a few Colleges do some work with SDSC and LLNL facilities, typically for large grant projects like the CTSC or LSST. Otherwise, there isn't an institutional program to work with another campus/university for storage (yet), and many users are looking toward the cloud to provide this type of service. Our campus has some storage arrangements (including for replication of data on the NetApp storage appliance) with SDSC.

We're aware of both CDL and storage offerings, and are currently replicating our high-end library storage to Berkeley. However, with the exception of a recently implemented DASH/Merritt instance, we are not going to other schools for storage.

SDSC storage facilities have been used by some researchers, although not widely.
In addition to SDSC (with whom IT is consulting regarding our data management solutions), another campus provides storage and backup solutions using an excellent recharge model, and we have assisted folks in taking advantage of these solutions. IT is using this as part of our data recovery plan, leveraging the SnapVault functionality of our NetApp (NAS) device to replicate data (deltas) to their storage solution. We are using this primarily for our LMS's associated loose files.

A small number of individual researchers have used the Berkeley IS&T Research Hub (through a pilot initiated by the Library), but primarily for collaboration purposes rather than storage. Researchers have also used LBNL. The Library has made use of SDSC services and is making use of Merritt.

At the Library: yes, the Library makes use of storage services from the California Digital Library (CDL) and the San Diego Supercomputer Center. At the campus level (C&C): yes, from the San Diego Supercomputer Center. To our knowledge, the Human Genome Project has redundancy at SDSC; this project operates independently of the academic division and the other central IT services provided to campus. Our business systems department shares disaster recovery space with UCOP.

Yes, we're using Dash/DataShare/Merritt from the CDL.

All the campuses have secured or are investigating services located at SDSC, CDL, or UCOP.
Some of this use is for purposes of backup/recovery.

6. Is your library or campus aware of or making use of services from commercial providers, e.g., Amazon, Box, Dropbox, etc.? If yes, what are the terms for allocating and funding storage space?

Yes; see the question above. I am not sure I completely understand the question around "terms," but generally we have been very conservative in deciding when to put our data onto third-party providers. If we can find a third-party provider with a compelling technical solution and cost model, we would likely pursue it with the proper safeguards in place.

Our campus employs Box, OneDrive (Office 365), Google Drive, Amazon S3, and Azure. OneDrive (Office 365) is funded through central IT as part of the Office 365 rollout. Box is funded by a partnership between IET and the College of Engineering. The others are cost-recovered directly from users.

Libraries are currently using Amazon S3 services for storage and backup. The campus is also using Amazon S3 and Google Drive. Many faculty and researchers have been using Dropbox for years.

The campus has a Box agreement to provide sponsored storage both to individuals and to campus departments, and is actively transitioning from pilot to production. Departments can purchase additional storage from Box, but the rate is unfavorable. In addition, as Office 365 rolls out this fall, we expect individuals to make extensive use of its free storage component, advertised as 1 TB per person. The campus has agreements with Amazon (AWS) and Microsoft (Azure), but these are primarily for compute services; the storage component is minimal. The whole commercial storage market is in flux right now, especially in the educational arena. We're keeping a close eye out for developments which we could leverage to our advantage.

IT is currently reviewing Amazon and Microsoft Azure as options for storage. Individual researchers/groups are using Amazon and Dropbox through their own arrangements/personal accounts.
Campus IT provides Box storage, as noted above. At the Library: the Library does not currently use services from commercial providers. At the campus level (C&C): Box and Google free services are utilized; other paid services are utilized and funded via contracts/grants.

Amazon is listed as a service from the campus data center, but there has not been a great deal of uptake to date. Many individuals are using Dropbox and Google Drive. UC San Diego is providing a service with Microsoft OneDrive to provide a terabyte of storage for free to all campus users. The Library uses a small Azure instance to back up the ILS; the Library funds this. There are departments on campus that are using commercial providers (especially Google and Amazon) for various services that also involve data and storage. There are also researchers on campus using these services on their own, without departmental ownership of the process.
The campus provides Box, with a limit of 60 GB per user, currently funded by campus funds. Individual PIs are using commercial providers such as Amazon, as well as numerous next-gen sequencing software packages that provide storage as part of their solution.

All campuses except one, which is considering doing so, take advantage of commercial cloud-based services. Use is either through formal campus-brokered arrangements or directly by faculty, staff, etc.

7. Are there any administrative, financial, or legal considerations that would prohibit or inhibit use of commercial service providers?

For the type of data we have, there are no legal considerations that we know of. I would tend toward services that allow us to control the file encryption (i.e., not Dropbox) and would prefer sites that use US-only data centers, but otherwise we just need to make sure that the storage solutions we select are capable of meeting vendor or campus security requirements.

In general these issues are addressed by a contract or MOU with the provider, which covers an exit strategy if needed. There are a few cases where the source of funding (e.g., DoD) or type of data (e.g., HIPAA) requires stricter oversight than many providers support, but these are not handled locally either. For example, MC patient health data is stored at a Qwest data center that is a HIPAA-certified Tier 1 data center facility. Of more concern than legal issues of data security and privacy are issues of financial sustainability, to justify the effort involved in adopting a service and establishing the administrative and legal requirements for using it.

Yes, there are administrative, financial, or legal considerations to be taken into account, but they are usually dealt with at the campus level. The final report of the UC Irvine Infrastructure Cloud Strategy workgroup was submitted recently, and it contained a number of recommendations and use cases regarding the use of commercial service providers for data storage.
The main issues of concern here are legal and privacy. The campus's software licensing group has done a lot of work on this, and maintains a matrix detailing exactly what types of data can be stored both on campus and with commercial providers. Details at

Campus IT and our Research Compliance Office do not recommend storing confidential or restricted data on Box. For some of our DoD-related research, we would be restricted to storage located in the US.

At the Library: the Library sees issues inhibiting the use of commercial service providers due to the following concerns: FERPA requirements (e.g., the patron database); UC-wide policy; campus policy. At the campus level (C&C): HIPAA BAAs are the biggest concern.

None that we know of at present.
The normal issues with HIPAA, FERPA, PII, Export Control, etc. The campus maintains information on this here: security.html

HIPAA and FERPA compliance.

Campuses must comply with HIPAA, FERPA, and funder-mandated restrictions.

8. Is there a significant need for dark archival, as opposed to bright, storage services?

There is; much of our storage (~70 TB) should really be in a dim/dark archive. This would greatly reduce cost. This will be a key issue when talking about large datasets around research data.

As explained in #2 above, we distinguish between data storage services and any curation services that are offered for data, such as long-term preservation. Any data stored would need to be available for retrieval, and any type of dark archive would not meet this need.

Yes, a recent survey of faculty highlighted the need for long-term, dark storage, especially with tied-in data management services.

Yes, primarily for backup of static data. We're hoping DPN will emerge as a viable preservation system; if it works as planned, we'll be able to repurpose significant amounts of storage. We would also consider other dark archive schemes. However, any such scenarios would have to include periodic brightening and verification as well as a well-defined audit system.

Yes, although we still need to get a handle on scale/volume. Presumably, we will need dim/dark storage in addition to bright storage for access.

At the Library: yes, the Library sees a need for dark archival storage for data which is not publicly accessible, but limited need for dark storage on our local Library systems. The Library plans to use CDL Merritt and SDSC for dark storage needs. At the campus level (C&C): there is a growing need for dark archival storage, but it is not currently a huge issue for us.

Yes. There are units in the humanities and social sciences where both copyright and privacy are barriers to open access to research data. Significant needs? Yes. Significant unmet needs? Not so much.
For example, a number of campus users have indicated that they need dark archival services; they are also putting this information into grant proposals they are writing. The Library DAMS is also preserved in Chronopolis, a dark archive. Chronopolis is also an entrance point to DPN, which will be the largest archive of its kind.

No.

All campuses except one have a need for dark archival storage.
9. Would you be willing to accept lower-performance storage, i.e., near-line or off-line, in exchange for lower cost?

Absolutely; in fact, this should be a design consideration in future systems.

No. If the assumption is that data are stored for processing or sharing, then poorly performing data retrieval would be unacceptable.

Yes, but for many schools this applies to long-term archival data only.

Yes. A low-performance storage tier would be an essential part of any full-featured storage system. I'll go out on a limb a bit here and say that this is an area where a well-thought-out tape-based storage system could come into play. Although cheap rotational storage has pushed tape out of vogue in the datacenter, new advances in barium ferrite technology currently allow for up to 150 TB to be stored on a single data cartridge. As this technology matures, and others hopefully emerge, I expect tape to become increasingly attractive for long-term storage.

Yes. At the Library: yes, lower-performance storage is adequate for some Library storage needs. At the campus level (C&C): the campus already provides lower-performance storage, and matches storage performance to the need in question.

No. Speaking for the Library, the vast majority of the data needs to be immediately available and networked.

In certain instances, yes. Perhaps; cost seems an important factor for many.

Most campuses would be willing to accept lower-performance storage in exchange for lower costs, but this is generally limited to dark archival or preservation use cases.

10. Would you be willing to accept lower preservation assurances, i.e., less replication, in exchange for lower cost?

Yes, but with the ability to control the data or preservation level of files in this level of service. For example, I would store all access images in a low/non-preservation environment, while I would want to store original/master files in a highly preserved and replicated environment.
In my opinion, the best way to balance the cost here is to turn to dim storage with longer retrieval times for these highly curated objects.

Yes. Depending on users' data needs, allowing for different levels of service on a sliding pricing scale would be acceptable if all data is online.

It depends. The needs may be different for large data sets vs. main data or master files.

No. We want to be able to sleep at night.

Yes, depending on the nature of the assets; we would like to have options available.

At the Library: the Library is not willing to accept lower preservation assurances.
At the campus level (C&C): please see the answer to question #9.

Yes, maybe. This would be situational.

No, not below our minimum preservation requirements, which include active monitoring and management of the data and multiple, geographically diverse replicated copies. Anything less would not meet our requirements.

Perhaps; it depends upon the situation.

Most campuses would be willing to accept lower preservation assurances in exchange for lower costs, but only in well-defined use cases.

11. What are the minimally acceptable service level agreement terms for service offerings (commercial, other campus, or CDL offerings)?

This is a complex question. I think that Amazon's data SLA is pretty good, and anyone getting into this would do well to study the preservation standards around data. At the least, I would need some assurance around bit fixity, assurance of geographic co-location or replication, assurance of retrieval time, and assurance of stability of cost.

SLAs can include varying requirements, from HIPAA compliance to US-only storage of data sets, and depend on the particular grant and/or data set. In general, though, we require a clear exit strategy, all appropriate control of rights, and immediate online access to the data. There have to be provisions to cover losing data, which would be unacceptable. Some delay between request and restoration of needed archived data would be acceptable, so long as it didn't exceed a few days (again, depending on the type of data).

Robustness is the top issue here. The big vendors all talk about ISO 27k compliance, but the language in the SLAs is not ironclad. They in essence promise "best practices" (usually as they define them), but also include a limited-warranty clause. The bottom line is that there's little transparency into how user data is protected once storage is outsourced, and there are no real guarantees. Let the buyer beware. Almost as important is availability.
If a remote storage component is part of an operational stack, it needs to be highly available. This is hard to measure, as any of the many network components between the service provider and the customer can also fail, but at a minimum 99.9% ("three nines") uptime is required for research providers, and better than that for library storage. For research and library data, privacy is not a significant concern, nor is encryption.
- Campus IT has agreement terms for the Box service.
- Assurances re: availability/reliability, security, fixity, and replication, as well as communications/reporting on status and usage.
- From both the Library and from the campus level (C&C): Minimally acceptable service level agreement terms depend on the storage need in question (e.g., backup, archival, preservation, transactional, data publishing, sharing, etc.).
- Any. Depends on price comparison and services offered.
- This would depend on the service offering. For storage, no data loss. For retrieval, bright
archives would need to be instantaneous to perform regular fixity checking. Darker archives could have a longer retrieval rate, maybe comparable to Amazon Glacier. 24/7 phone access to a help desk.
- 99.9% uptime will do for most people. Backups that are only good for disaster recovery and not casual restoration of files would probably also meet people's needs as long as the cost was right. And people would probably tolerate slower service if the price was acceptable to them.

Common needs across most campuses include adherence to minimal best practices (e.g., replication, fixity, security, etc.), legal compliance (e.g., HIPAA, etc.), predictable costs, "reasonable" uptime (i.e., 99.9%), and succession plans.

12. Is there a particular time horizon for the stored resources? E.g., do they need to be kept for 5 years, 10 years, forever, etc.?

- Again, I think this varies. For library-generated digital objects, I think the correct horizon is probably 20 years, with the expectation that over that time file migrations and storage technologies would develop sufficiently to result in a migration to a new platform. For versioned data sets and other more ephemeral information, I could see a 5-10 year horizon with some provision for renewal. Final data or analyzed data tied to publication should probably be stored for some longer period of time, but in this case there is the need to also store the runtime environment.
- Yes. A lot of granting agencies require 5 years for the data and publication of the results, but this depends on the granting agency and the nature of the research. We don't have a policy per se. Generally each PI is responsible for his/her own data management plan, and we will provide assistance as needed to interface with the technology.
- This will vary from faculty to faculty and from school to school. Some faculty keep data for 5-7 years, while others keep it forever (in case someone refutes the work they have published).
- As we like to say, it depends on your definition of forever. This is really a policy issue more than a technical one. Moving data forward to new storage should not be overly difficult in a well-designed system. Format migration is more problematic, but candidly, the issue is less about storage than about the willingness and resources of those holding the data in stewardship. Speaking generally, I'm commonly hearing 20 years as the current preservation standard. That's a nice number mainly because it's beyond anyone's reasonable ability to predict the sorts of technical abilities we'll have down the line. The tacit assumption is that the cost of storage will continue to decrease exponentially over time, and that in the future it will be economically viable to continue to preserve today's data using tomorrow's technology.
- A range, depending on the nature of the asset. At this point, we would look at a 5-10 year horizon with an option to renew. Library digital collections presume long-term retention: 10+ years, if not forever.
- From both the Library and from the campus level (C&C): Storage is a complex ecosystem; please see the feedback at the bottom of this document, as well as the answer to question #11.
- This is dependent upon the resources to be stored. Unique and curated Library materials need to be kept forever. There is an expectation from campus that data be available for at least six years.
- We've been saying 10 years, and for the most part people seem okay with that time frame. One researcher references a UC policy that says research data must be maintained for 24 years, but we've been unable to find any current reference.
- I think the time horizon will be largely determined by funder and publisher requirements.

Specific retention requirements are dependent on the nature of the stored material and, in some cases, external mandates from funders, publishers, etc. In general, 5-10 years seems appropriate for "active" content, 20 years for final-form content, and perpetuity for "unique library materials."

13. Feel free to include anything else you think would be useful.

- From both the Library and from the campus level (C&C): Storage must be thought of as a complex ecosystem that has many service demands (e.g., backup, archival, preservation, transactional, data publishing, sharing, etc.) and many potential providers (e.g., campus, vendor cloud, UC cloud, etc.) at varying cost levels with associated SLAs. Increasingly, staff experience is needed to appropriately manage this ecosystem and assist faculty not only in provisioning storage but with securing it as well.
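Several answers to question 11 name bit fixity as a baseline assurance, and the bright/dark distinction in questions 10 and 11 turns on how often fixity can be verified. In practice, fixity checking usually means recording a checksum for each object at deposit and re-hashing on a schedule to detect corruption or loss. A minimal sketch of that audit loop (the manifest layout and helper names here are illustrative, not part of any campus offering):

```python
import hashlib
import json
import os

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large objects need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def record_manifest(paths, manifest_path="manifest.json"):
    """Record a baseline digest for each object at deposit time."""
    manifest = {p: sha256_of(p) for p in paths}
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

def audit(manifest_path="manifest.json"):
    """Re-hash every object and report any that are missing or have drifted."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return [p for p, digest in manifest.items()
            if not os.path.exists(p) or sha256_of(p) != digest]
```

A "bright" archive can run this audit continuously because every object is online; a darker tier with Glacier-like retrieval delays can only afford infrequent audits. For scale, the 99.9% ("three nines") uptime figure several respondents cite allows roughly 8.8 hours of downtime per year (0.001 x 8,760 hours), which bounds the maintenance window available for such checks.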