ORA-Data service charges - details [DRAFT v9]

1. ORA-Data is being developed by the Bodleian Libraries as the University's catalogue for Oxford research data, and as a data archive for datasets that relate to a publication and cannot be deposited in a subject or national data archive. As well as supporting researchers to meet University requirements [1], ORA-Data will:
   i) play a critical role in retaining a record (including citable location) of Oxford research datasets relating to publications, to gain credit in the REF
   ii) enable compliance with funders' policies by allowing researchers to archive and cite a dataset relating to a publication
   ORA-Data provides this service to researchers where there is no other option such as a national or subject data archive.

2. ORA-Data provides a secure, long-term, accessible, citable, archival storage service (see APPENDIX 1 for service details). An archival service that safeguards valuable data assets for the long term is costly. Most data repositories charge for archival services (see APPENDIX 2). Many funders expect to see realistic costs for data archiving in grant applications, particularly if the funder requires such data management (see APPENDIX 3). Costs include infrastructure, hardware maintenance and periodic replacement, staff (technical and service support) and DOI assignment. [2]

3. ORA-Data is governed by a series of policies underpinning the University's RDM and Open Data policy. These include the Bodleian RDM policy and DOI policies. [3]

4. There is no charge for depositing a metadata-only record in ORA-Data. There may in future be a charge for assigning a DOI.

5. The Bodleian Libraries' data archiving charging model is calculated per deposit [4]. It is intended to encourage deposit during the testing and first iteration of the service. This draft cost model will be applied until 30/4/15, when it will be reviewed.

6. Academic Divisions have stated that they prefer a pro rata scale rather than a block charge per Terabyte (TB). The draft pro rata charge comprises two elements:
   a. A fixed baseline charge
   b. A lifetime storage charge per deposit, based on size of files
   Fixed costs include costs of: review/checking deposits; Helpdesk service; freely available item record in compliance with common standards; DOI; commitment to longevity so that the item remains findable, accessible and citable (see document "What to look for in a data archive").
   When completed, the fully functioning service storage charge costs will include: resilient storage; two disk copies plus tape backup; power overheads; hardware refresh; system management.

[1] RDM and open data policy http://researchdata.ox.ac.uk/university-of-oxford-policy-on-the-management-of-researchdata-and-records/
[2] Although personal hard drives, memory sticks and additional hard drives from well-known local retailers offer storage at a cheap cost, this type of storage should not be confused with secure, long-term archival storage. Those wishing to use free/cheap third party offerings such as FigShare should scrutinize the terms and conditions to ensure they provide the service that the researcher requires. Some storage such as HFS is not designed to provide web access.
[3] http://www.bodleian.ox.ac.uk/bodley/about-us/policies/preservation
[4] Each item deposited is described as a package and may comprise multiple elements (such as image files, a spreadsheet, a licence and so on). Datasets can be zipped up for deposit. Each package has one item catalogue record and is assigned one DOI.
7. ORA-Data charges are calculated as upfront costs, i.e. to be paid at the point of deposit. The reasons for this are two-fold:
   a. Funded research projects are not able to charge costs post-project.
   b. Collecting an annual fee becomes increasingly difficult over time and adds to the administrative burden, and therefore cost.
   The one-off upfront charge covers the period from deposit for as long as the Bodleian holds the data. If the data are removed for any reason, a catalogue record will remain in ORA-Data.

8. Charges for archiving datasets in ORA-Data (including datasets deposited via ORDS). Every dataset deposited is subject to a baseline ingest and curation charge. Lifetime storage costs are calculated separately and should be added to the baseline charge. See Table A.
   a) Baseline ingest and curation charge
   b) Calculate the size of the data package, rounded up to the nearest whole GB
   c) TOTAL = Baseline charge + Lifetime storage charge

TABLE A: ORA-Data charges

Citable, discoverable record in ORA-Data
   Metadata/catalogue record only: FREE
   Metadata/catalogue record with minted DOI (data archives that are total responsibility of Oxford University only). See DOI assignment policy: FREE [5]

Deposit of archival dataset files
   A) Fixed baseline charge per deposit (see APPENDIX 4): £140
   B) Lifetime storage charge for dataset per GB (see APPENDIX 5): £5 per GB
   Total = Baseline charge + Lifetime storage charge

   These charges address directly incurred expenditure on datasets that is not covered by costs of general development of the RDM infrastructure. Baseline costs comprise the costs of staff to manage and curate each item over the long term. Rates have been calculated using the IT Services model (see http://www.oucs.ox.ac.uk/infodev/charges.xml).

9. Charges will be listed in X5 for inclusion as direct costs in grant applications. Procedures for administering the charges will be put in place.

10. The Libraries will examine the suitability of the charging model for application to similar services such as Digital Safe U/C (a prototype University and College service for archiving administrative data).

11. Some researchers produce data as a result of work not supported by external funding. It is assumed this will be research that produces relatively small datasets (mainly in the humanities and social sciences). This group may find they are required to cite data underpinning a publication. The freemium model in Table A proposes to address this scenario.

Governance and review

12. ORA-Data archiving service charges are calculated by the Bodleian Libraries and ratified by the Oxford RDM and Open Data WG (chaired by PVC-Research and including representatives from Academic Divisions, IT Services, OUP). The charging model will be reviewed regularly.

Sally Rumsey, The Bodleian Libraries, 27th January 2015

[5] Although there is a cost to the Bodleian Libraries of assigning DOIs, administration costs may greatly exceed the cost recovered. Initially the DOI cost to the record depositor will be waived. The Bodleian Libraries reserves the right to review and amend this policy at a later date.
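The Table A calculation in paragraph 8 can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the service: the function name `ora_data_charge` and the constant names are hypothetical, and the figures are the draft £140 baseline and £5 per GB rates from Table A. Metadata-only records, which are free, are not modelled.

```python
import math

# Draft rates taken from Table A (pounds sterling); names are illustrative.
BASELINE_CHARGE_GBP = 140       # A) fixed baseline charge per deposit
STORAGE_CHARGE_GBP_PER_GB = 5   # B) lifetime storage charge per GB


def ora_data_charge(package_size_gb: float) -> int:
    """Return the one-off upfront charge in pounds for one data package."""
    if package_size_gb <= 0:
        raise ValueError("package size must be positive")
    # Step b): size of the data package rounded up to the nearest whole GB
    billable_gb = math.ceil(package_size_gb)
    # Step c): TOTAL = baseline charge + lifetime storage charge
    return BASELINE_CHARGE_GBP + billable_gb * STORAGE_CHARGE_GBP_PER_GB


print(ora_data_charge(50))   # 390, the 50 GB figure quoted in APPENDIX 2
print(ora_data_charge(2.3))  # 155, since 2.3 GB is billed as 3 GB
```

Because the charge is paid once at the point of deposit, no time period appears in the calculation.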
APPENDIX 1: Service details

ORA-Data
- Web based service. Data are discoverable, citable and accessible online
- Rich metadata to accepted standards for discovery, description, reporting and funder compliance can be assigned to each dataset
- DOIs (Digital Object Identifiers) are assigned to datasets for citation in accordance with DataCite policy
- A globally accessible catalogue record (sometimes known as a "landing page") is created for your dataset to accepted standards, and is persistent, freely available, harvestable and citable
- Option for linking data with publications
- Deposit review: metadata and files are checked and, if necessary, enhanced by Bodleian staff before release to open ORA-Data
- Datasets can be made freely available online
- Bit level preservation. Data are retained in the same form as they were submitted ("what goes in comes out"). Files are not migrated to other file formats.
- Commitment to longevity of the service by the Bodleian Libraries. Data will be retained beyond staff employment at Oxford and to comply with funder and other requirements.
- Multi-location backed-up service
- Dedicated email and telephone helpdesk service
- Memorandum of understanding [will be published in due course]

Comparison with IT Services HFS
- Software application requiring local installation. Identical software required to download data as was used originally.
- Data accessible only via account from which it was backed up / archived.
- Single field for metadata
- Identifiers not assigned
- Bit level preservation. Data are retained in the same form as they were submitted ("what goes in comes out"). Files are not migrated to other file formats.
- Commitment to longevity of the service by IT Services. Personal accounts expire after individual leaves the university. Long-term archival accounts for project data.
- Multi-location 3-copy tape storage service
- General IT Helpdesk for queries
- Formal Service Level Description (SLD) available from http://help.it.ox.ac.uk/internal/sld/hfs.

Clarifications
- Any individual file format is accepted. File formats used within the Bodleian are likely to receive greater support. Researchers can contact BDLSS for advice.
- Multiple files should be zipped up before deposit
- File directory structure of the dataset can be maintained
- Archival formats are accepted (tar; zip; BagIt; ResearchObjects)
- ORA-Data does not provide active (live) data storage, only archival storage

See also ORA-Data Statement and ORA-Data Acceptance & management policy
APPENDIX 2: Comparator costs

Oxford draft
   Model: Fixed cost plus charge per GB
   Cost: £140 + £5 per GB
   Criteria: Indefinite storage & availability of bitstream
   Cost of archiving 50 GB for 10 years: £390

Edinburgh [DataStore]
   Model: FREE quota 0.5 TB p.a. per person
   Cost: Additional storage £200 per TB per year
   Criteria: Charges for DataStore live data service not for DataShare repository
   Cost of archiving 50 GB for 10 years: FREE?
   Further details: http://www.ed.ac.uk/schools-departments/informationservices/researchsupport/datamanagement/data-storage

UCL
   Model: Work in progress [DETAILS ARE CONFIDENTIAL]
   Cost: Possible free quota then charge per TB

Imperial
   Model: No model as at August 2014

Cambridge [DSpace@Cam]
   Model: Placeholder policy
   Cost: £2,000 + VAT per TB
   Criteria: Indefinite storage & availability of bitstream
   Cost of archiving 50 GB for 10 years: £2000 (if no pro rata option)
   Further details: http://www.lib.cam.ac.uk/repository/policies.html

Princeton [2009]
   Model: One time fee
   Cost: $0.0006 per MB
   Criteria: Min charge $0.60. Prices TBC. Includes 54% overhead.
   Cost of archiving 50 GB for 10 years: 30.72 (NOTE: 2009 figures)
   Further details: http://dataspace.princeton.edu/jspui/about/dataspace PnG.pdf

Purdue (data publications)
   Model: One time fee. 1 GB FREE quota. 10 GB quota per funded project.
   Cost: Additional space charged $14.30 per GB p.a.
   Criteria: Retained for 10 years
   Cost of archiving 50 GB for 10 years: $572
   Further details: https://purr.purdue.edu/about/pricing

Columbia
   Model: Up to 10 GB FREE. > 10 GB one time charge
   Cost: $5 per GB over 10 GB
   Cost of archiving 50 GB for 10 years: $200

FMRIB, Oxford
   Model: Account charge. Low £100 pa / high £1000 pa
   Cost: Additional £0.10 per GB per month
   Criteria: Live storage not repository. Tape backup protected area
   Cost of archiving 50 GB for 10 years: [£105]
   Further details: http://www.fmrib.ox.ac.uk/support/computingsupport/it_charges

Amazon cloud
   Model: FREE 5 GB
   Cost: 20 GB = $10 pa; 100 GB = $50 pa
   Criteria: Cloud service
   Cost of archiving 50 GB for 10 years: $250. Does not include review services
   Further details: https://www.amazon.com/clouddrive/learnmore#planssection

MS 365
   Model: Business plan or Jisc negotiated
   Cost: £8 per user per month
   Criteria: Business plan. 1 TB storage per user. Up to 300 users [midsize plan]. Cloud based.
   Cost of archiving 50 GB for 10 years: £960. Does not include review services
   Further details: http://office.microsoft.com/en-gb/business/compareoffice-365-for-businessplans-fx102918419.aspx or http.janet/products-services/microsoft-office-365

Jisc Data Frwk.
   Model: 2 models: Pay as you go and pay upfront. For bulk purchase by institutions.
   Cost: Paid up price eg 1-100 TB £3000 per TB for 10 yrs. Min commitment 1 TB
   Cost of archiving 50 GB for 10 years: [£147]. Does not include review services
   Further details: https://www.ja.net/products-services/janet-cloudservices/data-archivingframework
APPENDIX 3: Examples of data archiving costs included in two successful AHRC grant applications

Project title: Tudor Partbooks: the manuscript legacies of John Sadler, John Baldwin and their antecedents
Funder: AHRC
Start date: June 2014
Successful bid included line for data storage: "Data storage for image masters, for period of the project - 3 years (Oxford)": £3000
Additional funding was awarded for interim data storage: "Funding to the partner libraries (£48,000) covers administrative and conservator time, interim data storage costs, delivery media and exhibition costs."

Project title: DIAMM: Digital Image Archive of Medieval Music
Funder: AHRC
Start date: March 2010
Successful bid included line for data storage: "5TB upgrade to the Bodleian storage facility dedicated to the proposed project data": £10,000
Excerpts from the technical appendix:
- "The current storage system, provided through the Bodleian Library ensures both the dark preservation of the data and its accessibility to project and library staff in the future."
- "The main focus of sustainability planning in the project has been to ensure a steady income that will meet as a minimum the costs of archiving, necessary upgrades to the web resource to keep it live and part-time project management."
- "Dark archive preservation for digital images is provided by the Bodleian Library."
APPENDIX 4: DRAFT Baseline charge

Baseline costs are per deposit 10 GB and over, and comprise the costs of staff to manage and curate each item. Rates have been calculated using the IT Services model as employed at http://www.oucs.ox.ac.uk/infodev/charges.xml.

Service staffing costs

1. Review staff
   Check, enhance metadata, contact with depositor, other assistance as required
   Assume average 20 mins per deposit
   Day rate @ £365 = £50 per hour
   20 mins = £16.66
   Round to £20 per item

2. SysAdmin/IT maintenance/curation
   Assume 0.25 day over the life of the item
   Day rate @ £470 = £117.50 per item
   Round to £120 per item

NOTE
- Charges will be reviewed in collaboration with ITS
- Average staff time spent on review will be monitored over the first year and adjusted as necessary
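The arithmetic behind the £140 baseline charge can be checked with a short Python sketch. The variable names and the `round_up_to_ten` helper are illustrative assumptions; in particular, the appendix only says "round to", and rounding up to the next £10 is one reading that reproduces both stated figures.

```python
import math

# Figures from APPENDIX 4 (pounds sterling); names are illustrative.
REVIEW_HOURLY_RATE = 50     # appendix equates a day rate of 365 with 50 per hour
REVIEW_MINUTES = 20         # assumed average review time per deposit
SYSADMIN_DAY_RATE = 470
SYSADMIN_DAYS = 0.25        # assumed effort over the life of the item


def round_up_to_ten(amount: float) -> int:
    """Round up to the next multiple of 10 (an assumed rounding rule)."""
    return math.ceil(amount / 10) * 10


review_cost = REVIEW_HOURLY_RATE * REVIEW_MINUTES / 60   # about 16.67
sysadmin_cost = SYSADMIN_DAY_RATE * SYSADMIN_DAYS        # 117.5

review_charge = round_up_to_ten(review_cost)             # 20
sysadmin_charge = round_up_to_ten(sysadmin_cost)         # 120
baseline_charge = review_charge + sysadmin_charge

print(baseline_charge)  # 140, the fixed baseline charge in Table A
```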
APPENDIX 5: Cost model for long term retention of digital content

Calculated at £5000 per TB. Equates to ~£5 per GB.

Baseline assumptions:
- The content will be held on spinning disk (or other low latency storage) to provide high availability, along with a tape (or other high latency, highly resilient storage) for disaster recovery and to ensure two good copies in the event of corruption/tampering.
- Cost of storage hardware halves approximately every refresh cycle (5 years), so the cost of perpetual holding of a given volume of data is the sum of a diminishing series of terms, which is equal to twice the initial cost. Other studies (Princeton/CDL) have quoted 4 years but fail to account for events such as the Thailand flooding and the Japan tsunami, which favour our more conservative estimate.
- Although power and system administration personnel costs do not likewise diminish, improvements in power efficiency and manageability mean that the power and administration overheads per unit volume of data can be expected to scale in much the same way as baseline cost.
- The increasing space efficiency of digital storage systems means that the estates cost of accommodating the machinery likewise decreases on a similar basis.
- Consequently the total cost for long term retention of material is twice the total cost of operating the service for one refresh cycle (as the efficiency reductions in costs only accrue when a transition to the next technology cycle is made).
- This ignores any curation costs for active digital preservation, which do not scale correspondingly but are outside the scope of this model.
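
The "twice the initial cost" result follows from a geometric series. As a sketch, write $C_0$ for the cost of operating the service for the first refresh cycle and $k$ for the number of refresh cycles elapsed (both symbols are introduced here for illustration and do not appear in the original model). If the cost halves at every refresh, then:

```latex
C_{\text{lifetime}}
  = \sum_{k=0}^{\infty} \frac{C_0}{2^{k}}
  = C_0 \cdot \frac{1}{1 - \tfrac{1}{2}}
  = 2\,C_0
```

With the first-cycle figure of £2500 per TB used in this appendix, this gives the lifetime cost of £5000 per TB. Under the 4-year cycle quoted by other studies the sum is the same multiple of $C_0$, but $C_0$ itself covers a shorter period, which is why the 5-year assumption is the more conservative one.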

Current calculation:
- Commodity disk storage: Dell Powervault NX3200, 12 x 4TB disks, 48TB gross: £17500 (inc 5 year warranty cover)
   o After RAID overhead we have 32TB net available storage = £547 per TB
   o Power consumption of 500W at 7p per kWh over 5 years = £48 per TB
   o System administration, assume typical 25:1 ratio @ £50K pa = £312.50 per TB
   o Accommodation: Oxford Data Centre £10 per rack unit per month = £37.50 per TB
   o Cooling: Oxford Data Centre: 50% of power cost = £24 per TB
- Total cost for ONE disk copy is £969 per TB over 5 years
- For simplicity we assume the other disk copy will cost essentially the same, although for preservation purposes it should use a different hardware/software stack to avoid systematic failure modes.
- Tape copy: Arkivum conveniently quote £1500 per TB for 5 years' storage with three tape copies
   o Arkivum also quote a renewal price of 50% after 5 years, which matches our model precisely
   o Assuming Arkivum's business cost model is sustainable, and balancing their economies of scale with profit requirements, we can deduce a ballpark for a single tape copy of £500 per TB over five years
   o Total cost for 2 disk copies and one tape is therefore £2438 per TB over 5 years, rounded up to £2500 to cover some administrative overheads
- Total lifetime cost for storage is therefore twice that, at £5,000 per TB

Neil Jefferies, The Bodleian Libraries
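
The per-TB figures above can be reproduced with the following Python sketch. It is a check of the stated arithmetic under assumptions that are not spelled out in the original: the appliance is taken to occupy 2 rack units (inferred from the £37.50 accommodation figure), the 25:1 ratio is read as one £50K administrator per 25 such systems, and 1 TB is taken as 1000 GB. All variable names are illustrative.

```python
YEARS = 5                    # one refresh cycle
NET_TB = 32                  # net capacity after RAID overhead
HOURS_PER_YEAR = 24 * 365

# Five-year cost per TB for one disk copy (pounds sterling)
hardware = 17500 / NET_TB                              # purchase price inc. warranty
power = 0.5 * HOURS_PER_YEAR * YEARS * 0.07 / NET_TB   # 0.5 kW at 7p per kWh
sysadmin = 50_000 / 25 * YEARS / NET_TB                # assumed: 1 admin per 25 systems
accommodation = 10 * 2 * 12 * YEARS / NET_TB           # assumed: 2 rack units
cooling = 0.5 * power                                  # 50% of power cost

disk_copy_per_tb = round(hardware + power + sysadmin + accommodation + cooling)

# Arkivum quote 1500 per TB for three tape copies, so one copy is a third of that
tape_copy_per_tb = 1500 / 3

five_year_total = 2 * disk_copy_per_tb + tape_copy_per_tb

# The appendix rounds the five-year total up to 2500 to cover administrative
# overheads, then doubles it for the lifetime cost (halving-cost assumption).
ROUNDED_FIVE_YEAR_TOTAL = 2500
lifetime_per_tb = 2 * ROUNDED_FIVE_YEAR_TOTAL
lifetime_per_gb = lifetime_per_tb / 1000               # assumed: 1 TB = 1000 GB

print(disk_copy_per_tb)   # 969
print(five_year_total)    # 2438.0
print(lifetime_per_gb)    # 5.0
```

The 2500 figure is kept as a stated constant rather than derived, because the appendix gives no rounding rule for it beyond covering administrative overheads.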