ORA-Data service charges - details [DRAFT v9]




1. ORA-Data is being developed by the Bodleian Libraries as the University's catalogue for Oxford research data, and as a data archive for datasets that relate to a publication and cannot be deposited in a subject or national data archive. As well as supporting researchers to meet University requirements [1], ORA-Data will:

   i) play a critical role in retaining a record (including citable location) of Oxford research datasets relating to publications, to gain credit in the REF
   ii) enable compliance with funders' policies by allowing researchers to archive and cite a dataset relating to a publication

   ORA-Data provides this service to researchers where there is no other option such as a national or subject data archive.

2. ORA-Data provides a secure, long-term, accessible, citable, archival storage service (see APPENDIX 1 for service details). An archival service that safeguards valuable data assets for the long term is costly. Most data repositories charge for archival services (see APPENDIX 2). Many funders expect to see realistic costs for data archiving in grant applications, particularly if the funder requires such data management (see APPENDIX 3). Costs include infrastructure, hardware maintenance and periodic replacement, staff (technical and service support), and DOI assignment. [2]

3. ORA-Data is governed by a series of policies underpinning the University's RDM and Open Data policy. These include the Bodleian RDM policy and DOI policies. [3]

4. There is no charge for depositing a metadata-only record in ORA-Data. There may in future be a charge for assigning a DOI.

5. The Bodleian Libraries' data archiving charging model is calculated per deposit [4]. It is intended to encourage deposit during the testing and first iteration of the service. This draft cost model will be applied until 30/4/15, when it will be reviewed.

6. Academic Divisions have stated that they prefer a pro rata scale rather than a block charge per Terabyte (TB). The draft pro rata charge comprises two elements:

   a. A fixed baseline charge
   b. Lifetime storage charge per deposit based on size of files

   Fixed costs include costs of: review/checking deposits; Helpdesk service; freely available item record in compliance with common standards; DOI; commitment to longevity so that the item remains findable, accessible and citable (see document "what to look for in a data archive").

   When completed, the fully functioning service storage charge costs will include: resilient storage; two disk copies plus tape backup; power overheads; hardware refresh; system management.

[1] RDM and open data policy: http://researchdata.ox.ac.uk/university-of-oxford-policy-on-the-management-of-researchdata-and-records/
[2] Although personal hard drives, memory sticks and additional hard drives from well-known local retailers offer storage at a cheap cost, this type of storage should not be confused with secure, long-term archival storage. Those wishing to use free/cheap third-party offerings such as FigShare should scrutinize the terms and conditions to ensure they provide the service that the researcher requires. Some storage, such as HFS, is not designed to provide web access.
[3] http://www.bodleian.ox.ac.uk/bodley/about-us/policies/preservation
[4] Each item deposited is described as a "package" and may comprise multiple elements (such as image files, a spreadsheet, a licence and so on). Datasets can be zipped up for deposit. Each package has one item catalogue record and is assigned one DOI.

7. ORA-Data charges are calculated as upfront costs, i.e. to be paid at the point of deposit. The reasons for this are two-fold:

   a. Funded research projects are not able to charge costs post-project.
   b. Collecting an annual fee becomes increasingly difficult over time and adds to the administrative burden and therefore cost.

   The one-off upfront charge covers the period from deposit for as long as the Bodleian holds the data. If the data are removed for any reason, a catalogue record will remain in ORA-Data.

8. Charges for archiving datasets in ORA-Data (including datasets deposited via ORDS). Every dataset deposited is subject to a baseline ingest and curation charge. Lifetime storage costs are calculated separately and should be added to the baseline charge. See Table A.

   a) Baseline ingest and curation charge
   b) Calculate the size of the data package rounded up to the nearest whole GB
   c) TOTAL = Baseline charge + Lifetime storage charge

   TABLE A: ORA-Data charges

   Citable, discoverable record in ORA-Data
      Metadata/catalogue record only: FREE
      Metadata/catalogue record with minted DOI (data archives that are the total responsibility of Oxford University only). See DOI assignment policy: FREE [5]

   Deposit of archival dataset files
      A) Fixed baseline charge per deposit (see APPENDIX 4): £140
      B) Lifetime storage charge for dataset per GB (see APPENDIX 5): £5 per GB
      Total = Baseline charge + Lifetime storage charge

   These charges address directly incurred expenditure on datasets that is not covered by the costs of general development of the RDM infrastructure. Baseline costs comprise the costs of staff to manage and curate each item over the long term. Rates have been calculated using the IT Services model (see http://www.oucs.ox.ac.uk/infodev/charges.xml).

9. Charges will be listed in X5 for inclusion as direct costs in grant applications. Procedures for administering the charges will be put in place.

10. The Libraries will examine the suitability of the charging model for application to similar services such as Digital Safe U/C (a prototype University and College service for archiving administrative data).

11. Some researchers produce data as a result of work not supported by external funding. It is assumed this will be research that produces relatively small datasets (mainly in the humanities and social sciences). This group may find they are required to cite data underpinning a publication. The "freemium" model in Table A proposes to address this scenario.

Governance and review

12. ORA-Data archiving service charges are calculated by the Bodleian Libraries and ratified by the Oxford RDM and Open Data WG (chaired by PVC-Research and including representatives from Academic Divisions, IT Services, OUP). The charging model will be reviewed regularly.

Sally Rumsey, The Bodleian Libraries, 27th January 2015

[5] Although there is a cost to the Bodleian Libraries of assigning DOIs, administration costs may greatly exceed the cost recovered. Initially the DOI cost to the record depositor will be waived. The Bodleian Libraries reserves the right to review and amend this policy at a later date.
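The charge calculation in paragraph 8 and Table A can be sketched in Python as follows. This is an illustrative sketch only: the function and constant names are invented here, and it applies Table A exactly as written (£140 baseline plus £5 per GB, size rounded up to a whole GB) without modelling any free quota or size threshold.

```python
import math

# Figures taken from Table A of the draft; names are illustrative only.
BASELINE_CHARGE_GBP = 140        # A) fixed baseline charge per deposit
STORAGE_CHARGE_PER_GB_GBP = 5    # B) lifetime storage charge per GB


def deposit_charge(package_size_gb: float) -> int:
    """Return the one-off upfront charge (in pounds) for one data package."""
    if package_size_gb <= 0:
        raise ValueError("package size must be positive")
    # Step b: round the package size up to the nearest whole GB.
    billable_gb = math.ceil(package_size_gb)
    # Step c: total = baseline charge + lifetime storage charge.
    return BASELINE_CHARGE_GBP + billable_gb * STORAGE_CHARGE_PER_GB_GBP


print(deposit_charge(50))    # 390
print(deposit_charge(2.3))   # 155 (2.3 GB is billed as 3 GB)
```

The 50 GB case reproduces the £390 figure given for the Oxford draft in APPENDIX 2. Metadata-only records are free and are therefore not handled by this function.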

APPENDIX 1: Service details

ORA-Data

- Web-based service.
- Data are discoverable, citable and accessible online
- Rich metadata to accepted standards for discovery, description, reporting and funder compliance can be assigned to each dataset
- DOIs (Digital Object Identifiers) are assigned to datasets for citation in accordance with DataCite policy
- A globally accessible catalogue record (sometimes known as a "landing page") is created for your dataset to accepted standards, and is persistent, freely available, harvestable and citable
- Option for linking data with publications
- Deposit review: metadata and files are checked and, if necessary, enhanced by Bodleian staff before release to open ORA-Data
- Datasets can be made freely available online
- Bit level preservation. Data are retained in the same form as they were submitted ("what goes in comes out"). Files are not migrated to other file formats.
- Commitment to longevity of the service by the Bodleian Libraries. Data will be retained beyond staff employment at Oxford and to comply with funder and other requirements.
- Multi-location backed-up service
- Dedicated email and telephone helpdesk service
- Memorandum of understanding [will be published in due course]

Comparison with IT Services HFS

- Software application requiring local installation. Identical software required to download data as was used originally.
- Data accessible only via account from which it was backed up / archived.
- Single field for metadata
- Identifiers not assigned
- Bit level preservation. Data are retained in the same form as they were submitted ("what goes in comes out"). Files are not migrated to other file formats.
- Commitment to longevity of the service by IT Services. Personal accounts expire after individual leaves the university. Long-term archival accounts for project data.
- Multi-location 3-copy tape storage service
- General IT Helpdesk for queries
- Formal Service Level Description (SLD) available from http://help.it.ox.ac.uk/internal/sld/hfs.

Clarifications

- Any individual file format is accepted. File formats used within the Bodleian are likely to receive greater support. Researchers can contact BDLSS for advice.
- Multiple files should be zipped up before deposit
- File directory structure of the dataset can be maintained
- Archival formats are accepted (tar; zip; BagIt; ResearchObjects)
- ORA-Data does not provide active (live) data storage, only archival storage

See also ORA-Data Statement and ORA-Data Acceptance & management policy

APPENDIX 2: Comparator costs

Oxford draft
   Model: Fixed cost plus charge per GB
   Cost: £140 + £5 per GB
   Criteria: Indefinite storage & availability of bitstream
   Cost of archiving 50 GB for 10 years: £390

Edinburgh [DataStore]
   Model: FREE quota 0.5 TB p.a. per person
   Cost: Additional storage £200 per TB per year
   Criteria: Charges for DataStore live data service, not for DataShare repository
   Cost of archiving 50 GB for 10 years: FREE?
   Further details: http://www.ed.ac.uk/schools-departments/informationservices/researchsupport/datamanagement/data-storage

UCL
   Model: Work in progress [DETAILS ARE CONFIDENTIAL]
   Cost: Possible free quota then charge per TB

Imperial
   Model: No model as at August 2014

Cambridge [DSpace@Cam]
   Model: Placeholder policy. One time fee
   Cost: £2,000 + VAT per TB
   Criteria: Indefinite storage & availability of bitstream
   Cost of archiving 50 GB for 10 years: £2000 (if no pro rata option)
   Further details: http://www.lib.cam.ac.uk/repository/policies.html

Princeton [2009]
   Model: One time fee
   Cost: $0.0006 per MB
   Criteria: Min charge $0.60. Prices TBC. Includes 54% overhead.
   Cost of archiving 50 GB for 10 years: 30.72. NOTE: 2009 figures
   Further details: http://dataspace.princeton.edu/jspui/about/dataspacePnG.pdf

Purdue (data publications)
   Model: 1 GB FREE quota. 10 GB quota per funded project.
   Cost: Additional space charged $14.30 per GB p.a.
   Criteria: Retained for 10 years
   Cost of archiving 50 GB for 10 years: $572
   Further details: https://purr.purdue.edu/about/pricing

Columbia
   Model: Up to 10 GB FREE. > 10 GB one time charge
   Cost: $5 per GB over 10 GB
   Cost of archiving 50 GB for 10 years: $200

FMRIB, Oxford
   Model: Account charge. Low £100 pa / high £1000 pa
   Cost: Additional £0.10 per GB per month
   Criteria: Live storage, not repository. Tape backup protected area
   Cost of archiving 50 GB for 10 years: [£105]
   Further details: http://www.fmrib.ox.ac.uk/support/computingsupport/it_charges

Amazon cloud
   Model: FREE 5 GB
   Cost: 20 GB = $10 pa; 100 GB = $50 pa
   Cost of archiving 50 GB for 10 years: $250. Does not include review services
   Further details: https://www.amazon.com/clouddrive/learnmore#planssection

MS 365
   Model: Business plan or Jisc negotiated
   Cost: £8 per user per month
   Criteria: Cloud service. Business plan. 1 TB storage per user. Up to 300 users [midsize plan]. Cloud based.
   Cost of archiving 50 GB for 10 years: £960. Does not include review services
   Further details: http://office.microsoft.com/en-gb/business/compareoffice-365-for-businessplans-fx102918419.aspx or http.janet/products-services/microsoft-office-365

Jisc Data Frwk.
   Model: 2 models: pay as you go and pay upfront. For bulk purchase by institutions.
   Cost: Paid up price e.g. 1-100 TB: £3000 per TB for 10 yrs. Min commitment 1 TB
   Cost of archiving 50 GB for 10 years: [£147]. Does not include review services
   Further details: https://www.ja.net/products-services/janet-cloudservices/data-archivingframework

APPENDIX 3: Examples of data archiving costs included in two successful AHRC grant applications

Project title: Tudor Partbooks: the manuscript legacies of John Sadler, John Baldwin and their antecedents
Funder: AHRC
Start date: June 2014
Successful bid included line for data storage:
   Data storage for image masters, for period of the project - 3 years (Oxford): £3000
Additional funding was awarded for interim data storage:
   Funding to the partner libraries (£48,000) covers administrative and conservator time, interim data storage costs, delivery media and exhibition costs.

Project title: DIAMM: Digital Image Archive of Medieval Music
Funder: AHRC
Start date: March 2010
Successful bid included line for data storage:
   5 TB upgrade to the Bodleian storage facility dedicated to the proposed project data: £10,000
Excerpts from the technical appendix:
   The current storage system, provided through the Bodleian Library, ensures both the dark preservation of the data and its accessibility to project and library staff in the future.
   The main focus of sustainability planning in the project has been to ensure a steady income that will meet as a minimum the costs of archiving, necessary upgrades to the web resource to keep it live and part-time project management.
   Dark archive preservation for digital images is provided by the Bodleian Library.

APPENDIX 4: DRAFT Baseline charge

Baseline costs are per deposit 10 GB and over, and comprise the costs of staff to manage and curate each item. Rates have been calculated using the IT Services model as employed at http://www.oucs.ox.ac.uk/infodev/charges.xml.

Service staffing costs

1. Review staff
   Check, enhance metadata, contact with depositor, other assistance as required
   Assume average 20 mins per deposit
   Day rate @ £365 = £50 per hour
   20 mins = £16.67
   Round to £20 per item

2. SysAdmin/IT maintenance/curation
   Assume 0.25 day over the life of the item
   Day rate @ £470 = £117.50 per item
   Round to £120 per item

NOTE
- Charges will be reviewed in collaboration with ITS
- Average staff time spent on review will be monitored over the first year and adjusted as necessary
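The baseline arithmetic in this appendix can be checked with a short Python sketch. The variable names and the `round_up_to` helper are illustrative. The appendix only says "round to", so rounding up to the next £10 is an assumption made here; it happens to reproduce the £20 and £120 figures, which sum to the £140 baseline charge in Table A.

```python
import math

# Rates and time estimates quoted in APPENDIX 4; names are illustrative.
REVIEW_RATE_PER_HOUR_GBP = 50    # derived in the appendix from the 365 day rate
REVIEW_MINUTES_PER_DEPOSIT = 20
SYSADMIN_DAY_RATE_GBP = 470
SYSADMIN_DAYS_PER_ITEM = 0.25


def round_up_to(amount: float, step: int) -> int:
    """Round an amount up to the next multiple of `step` (assumed rule)."""
    return math.ceil(amount / step) * step


review_cost = REVIEW_RATE_PER_HOUR_GBP * REVIEW_MINUTES_PER_DEPOSIT / 60
sysadmin_cost = SYSADMIN_DAY_RATE_GBP * SYSADMIN_DAYS_PER_ITEM

review_charge = round_up_to(review_cost, 10)
sysadmin_charge = round_up_to(sysadmin_cost, 10)
baseline_charge = review_charge + sysadmin_charge

print(round(review_cost, 2), review_charge)   # 16.67 20
print(sysadmin_cost, sysadmin_charge)         # 117.5 120
print(baseline_charge)                        # 140
```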

APPENDIX 5: Cost model for long term retention of digital content

Calculated at £5000 per TB. Equates to ~£5 per GB.

Baseline assumptions:

- The content will be held on spinning disk (or other low latency storage) to provide high availability, along with a tape (or other high latency, highly resilient storage) for disaster recovery and to ensure two good copies in the event of corruption/tampering.
- Cost of storage hardware halves approximately every refresh cycle (5 years), so the cost of perpetual holding of a given volume of data is the sum of a diminishing series of terms, which is equal to twice the initial cost. Other studies (Princeton/CDL) have quoted 4 years but fail to account for events such as the Thailand flooding and the Japan tsunami, which favour our more conservative estimate.
- Although power and system administration personnel costs do not likewise diminish, improvements in power efficiency and manageability mean that the power and administration overheads per unit volume of data can be expected to scale in much the same way as baseline cost.
- The increasing space efficiency of digital storage systems means that the estates cost of accommodating the machinery likewise decreases on a similar basis.

Consequently the total cost for long term retention of material is twice the total cost of operating the service for one refresh cycle (as the efficiency reductions in costs only accrue when a transition to the next technology cycle is made). This ignores any curation costs for active digital preservation, which do not scale correspondingly but are outside the scope of this model.
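The "twice the initial cost" result can be written out as a geometric series. The symbol C, standing for the total cost of one five-year refresh cycle, is introduced here purely for illustration, and the derivation assumes, as the model does, that costs exactly halve at every cycle:

```latex
\[
  C_{\text{lifetime}}
  = \sum_{k=0}^{\infty} \frac{C}{2^{k}}
  = C\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right)
  = \frac{C}{1 - \tfrac{1}{2}}
  = 2C .
\]
```

With the rounded five-year figure of £2500 per TB from the current calculation, this gives the £5000 per TB stated at the head of this appendix.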
Current calculation:

Commodity disk storage: Dell Powervault NX3200, 12x4TB disks, 48 TB gross: £17500 (inc. 5 year warranty cover)
   o After RAID overhead we have 32 TB net available storage = £547 per TB
   o Power consumption of 500W at 7p per kWh over 5 years = £48 per TB
   o System administration, assume typical 25:1 ratio @ £50K pa = £312.50 per TB
   o Accommodation: Oxford Data Centre £10 per rack unit per month = £37.50 per TB
   o Cooling: Oxford Data Centre: 50% of power cost = £24 per TB

Total cost for ONE disk copy is £969 per TB over 5 years.

For simplicity we assume the other disk copy will cost essentially the same, although for preservation purposes it should use a different hardware/software stack to avoid systematic failure modes.

Tape copy: Arkivum conveniently quote £1500 per TB for 5 years' storage with three tape copies
   o Arkivum also quote a renewal price of 50% after 5 years, which matches our model precisely
   o Assuming Arkivum's business cost model is sustainable, and balancing their economies of scale with profit requirements, we can deduce a ballpark for a single tape copy of £500 per TB over five years
   o Total cost for 2 disk copies and one tape is therefore £2438 per TB over 5 years, rounded up to £2500 to cover some administrative overheads

Total lifetime cost for storage is therefore twice that, at £5,000 per TB.

Neil Jefferies, The Bodleian Libraries
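The figures in the current calculation can be reproduced with the following Python sketch. All names are illustrative. Two points are assumptions rather than statements from the source: the server is taken to occupy 2 rack units (inferred because that value reproduces the quoted £37.50 per TB), and the disk and power components are rounded to whole pounds before summing, as the quoted figures appear to be.

```python
# Inputs quoted in the current calculation (pounds, per 5-year refresh cycle).
CYCLE_YEARS = 5
HOURS_PER_YEAR = 24 * 365

hardware_cost = 17500        # Dell Powervault NX3200 inc. 5 year warranty
net_capacity_tb = 32         # after RAID overhead
power_kw = 0.5               # 500 W
price_per_kwh = 0.07         # 7p per kWh
admin_salary_pa = 50_000
systems_per_admin = 25       # "typical 25:1 ratio"
rack_units = 2               # ASSUMPTION: inferred from the 37.50 per TB figure
rack_unit_cost_per_month = 10

# Per-TB components over one cycle; disk and power are rounded as in the text.
disk = round(hardware_cost / net_capacity_tb)
power = round(power_kw * HOURS_PER_YEAR * CYCLE_YEARS * price_per_kwh
              / net_capacity_tb)
admin = admin_salary_pa / systems_per_admin * CYCLE_YEARS / net_capacity_tb
accommodation = (rack_units * rack_unit_cost_per_month * 12 * CYCLE_YEARS
                 / net_capacity_tb)
cooling = 0.5 * power        # 50% of power cost

one_disk_copy = disk + power + admin + accommodation + cooling

# Tape: Arkivum quote 1500 per TB for three copies, so one copy is a third.
one_tape_copy = 1500 / 3

five_year_total = 2 * one_disk_copy + one_tape_copy
five_year_rounded = 2500     # figure stated in the source after rounding up
lifetime_per_tb = 2 * five_year_rounded

print(disk, power, admin, accommodation, cooling)  # 547 48 312.5 37.5 24.0
print(one_disk_copy)       # 969.0
print(five_year_total)     # 2438.0
print(lifetime_per_tb)     # 5000
```

Dividing the lifetime figure by 1000 GB per TB gives the ~£5 per GB rate used in Table A.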