Preserving French Scientific Data
Marion MASSOL (CINES)
marion.massol@cines.fr
DARIAH General VCC Meeting, November 28th, 29th, 30th 2012

AGENDA
1. Preserving data: our mission and strategy
4. The file formats expertise: File format validator, FACILE, white books
28th November 2012, DARIAH General VCC Meeting

1. Preserving data: our mission and strategy

In 2004, CINES was given the mandate to provide long-term preservation capabilities for digital objects related to scientific and technical information.

August 7th, 2006: Arrêté relatif aux modalités de dépôt, de signalement, de reproduction, de diffusion et de conservation des thèses ou des travaux présentés en soutenance en vue d'un doctorat (order on the procedures for depositing, reporting, reproducing, disseminating and preserving theses or works presented for a doctorate)
  CINES became the official preservation centre for electronic PhD theses.

February 12th, 2008: Lettre de cadrage du ministère (ministry framework letter)
  Reinforced the two main activities of CINES: high-performance computing and long-term preservation of digital documents.

Objectives: the rollout of effective, high-performance, scalable, secure and inexpensive solutions for the digital heritage of education and research.

Focus on data:
  Scientific data: results of observations, measurements, etc.
  Cultural heritage: publications, pedagogic material, etc.
  Administrative data: semi-current records

All in compliance with the French archival legal context.

1. Preserving data: our mission and strategy

Challenges for long-term preservation of digital objects:

Challenge                            Solutions
Knowledge of content                 Use of metadata (DCMI, etc.); unique ID for stored documents (ARK)
File formats                         Use of standard formats; logical migration (conversion)
Media                                Supervision and management of media ageing; physical migration
Software and hardware obsolescence   Technology watch, anticipation

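To illustrate the "unique ID" solution: an ARK (Archival Resource Key) has the general shape ark:/NAAN/Name, where the NAAN designates the organisation that assigned the identifier. The sketch below splits such an identifier into its two parts. The NAAN 12345 and the object name are placeholder values, not actual CINES identifiers, and the pattern is a deliberate simplification of the full ARK syntax.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number (the assigning organisation)
    name: str  # identifier assigned by that organisation


# Simplified pattern (assumption): a numeric NAAN, then a name without
# whitespace or slashes. The full ARK syntax allows more variants.
_ARK_RE = re.compile(r"ark:/(\d+)/([^\s/]+)")


def parse_ark(identifier: str) -> Optional[Ark]:
    """Return the NAAN and name of an ARK, or None if it does not match."""
    match = _ARK_RE.fullmatch(identifier.strip())
    if match is None:
        return None
    return Ark(naan=match.group(1), name=match.group(2))


if __name__ == "__main__":
    # Placeholder identifier, for illustration only.
    print(parse_ark("ark:/12345/example-object-0001"))
```

Because the identifier carries no storage location, it can stay stable when the underlying files are migrated to new media or formats.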
1. Preserving data: our mission and strategy

PAC: the logical architecture
PAC: Plateforme d'archivage du CINES (CINES archiving platform)
Diagram components: Management, Producer, Transferring Agency, Transfer Server, Storage Server, Access Server, User, Administration

1. Preserving data: our mission and strategy

A dedicated team of 12 FTE: 1 I/T manager, 9 engineers, 1 archivist, 2 technicians
Diagram labels: File format experts, I/T manager, Developers, Computer scientists, Archivists, System administrators, Digital preservation project, Users, Lawyer, Data producers

4. The file formats expertise: File format validator, FACILE, white books

The archivist's skills:
  Selection of data for middle- or long-term preservation
  Structuring of SIP / AIP
  Selection of metadata standards
  Mapping from producer systems to sip.xml
  Selecting and adding relevant metadata for preservation (PID, archiving_date, collection_history, transfering_agency_history, etc.)
  Participation in national and international working groups (ICA, AFNOR, PIN, AAF)

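The mapping step listed above can be sketched as follows: fields exported by a producer system are renamed into a SIP descriptor, and preservation metadata such as PID and archiving_date are added. This is only an illustration. The field names in FIELD_MAP and the element layout (sip, descriptive, preservation) are hypothetical and do not reproduce the actual sip.xml schema used by PAC.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical mapping: producer-system field -> descriptor element name.
FIELD_MAP = {
    "titre": "title",
    "auteur": "creator",
    "langue": "language",
}


def build_sip(record: dict, pid: str, archiving_date: date) -> ET.Element:
    """Map a producer record to a SIP descriptor and add preservation metadata."""
    sip = ET.Element("sip")
    descriptive = ET.SubElement(sip, "descriptive")
    for source_field, target in FIELD_MAP.items():
        if source_field in record:
            ET.SubElement(descriptive, target).text = str(record[source_field])
    # Preservation metadata added by the archivist, named as on the slide.
    preservation = ET.SubElement(sip, "preservation")
    ET.SubElement(preservation, "PID").text = pid
    ET.SubElement(preservation, "archiving_date").text = archiving_date.isoformat()
    return sip


if __name__ == "__main__":
    record = {"titre": "Example thesis", "auteur": "A. Author", "langue": "fr"}
    sip = build_sip(record, "ark:/12345/example", date(2012, 11, 28))
    print(ET.tostring(sip, encoding="unicode"))
```

Keeping the mapping in a single table makes it easy for the archivist and the data producer to review it together before any transfer.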
4. The file formats expertise: File format validator, FACILE, white books

The file formats expertise:

The file formats supported by PAC are:
  Open / published formats (e.g. WAV, SVG)
  Widely used formats (e.g. XML, MPEG4)
  Standard formats (e.g. PDF (ISO 32000-1:2008), PNG (ISO/IEC 15948:2004))

The PAC platform uses the JHOVE, ImageMagick, DROID and ODF Toolkit libraries to identify, validate and characterize the format of transferred files.

Type      Format
Text      HTML, PDF, TXT, XML, ODT
Picture   GIF, JPEG, TIFF, PNG, SVG
Audio     WAV, AIFF, AAC, VORBIS
Video     MPEG4, THEORA, MKV

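As a rough illustration of the identification step only, the sketch below guesses a format from the first bytes of a file, using well-known leading signatures for a few of the supported formats. It is a simplified stand-in written for this example: it does not reproduce how JHOVE, DROID or PAC work, and it performs no validation or characterization. The function names are assumptions.

```python
from typing import Optional

# Leading byte signatures for a few of the formats supported by PAC.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def identify(header: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file; None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # WAV is a RIFF container: "RIFF", 4 size bytes, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A matching signature only suggests a format. Checking that the whole file complies with its specification is the separate validation step.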
4. The file formats expertise: File format validator, FACILE, white books

The file formats expertise:

FACILE: validation du Format d'Archivage du CINES par anaLyse et Expertise (validation of the CINES archiving format through analysis and expertise)
  A tool to check that files comply with their specifications
  Checks in FACILE = checks in PAC
  Validation tools in FACILE = validation tools in PAC (JHOVE, ImageMagick, etc.)
  Data producers can validate file format quality before any ingestion into PAC.
  http://facile.cines.fr/

4. The file formats expertise: File format validator, FACILE, white books

And file format expertise at CINES is also...

Thanks for your attention.
Any questions?
marion.massol@cines.fr

Annexes
CINES: missions and strategy

Data stored in labs, datacenters, computing centres
Data produced in a national context (e.g. French PhD theses)
Relevant data are transferred (added value)
ISAAC: middle-term preservation platform (max. 3 to 5 years)
PAC: long-term preservation platform
EUDAT: European data grid
  Data shared by a European research community
  SARA  JUELICH  RZG  CINES: 500 TB
21/11/2012, Marseille Workshop on Scientific Data Preservation

1. Preserving data: our mission and strategy

PAC is built of three logical servers, as defined in the OAIS model:

A transfer server, where archive producers can transfer their archives
  Transfer of SIP (Submission Information Package)
  Generation of acknowledgement receipt
  Control of SIP, with potential rejection
  Creation of AIP (Archival Information Package)

A storage server, where the archives are maintained
  Multiple copies of AIP
  Generation of archive certificate
  Maintenance / migration operations
  Reports

An access server, where the producer and the authorized users can search, browse and retrieve the archives they need online
  Authentication of end-user
  Communication of requested DIP (Dissemination Information Package)

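The flow across the three logical servers can be sketched as a toy in-memory model: a SIP is controlled and either rejected or turned into an AIP, the AIP is kept in several stores, and an authorized user gets the files back as a DIP. All class and method names are hypothetical. The rejection rule (empty package or empty file) and the SHA-256 fixity check are assumptions added for the example and are not described in the slides.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SIP:
    producer: str
    files: Dict[str, bytes]  # file name -> content


@dataclass
class AIP:
    archive_id: str
    files: Dict[str, bytes]
    checksums: Dict[str, str]  # file name -> SHA-256 hex digest


@dataclass
class Archive:
    """Toy stand-in for the three logical servers (all names hypothetical)."""
    copies: int = 2
    stores: List[Dict[str, AIP]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # One dictionary per copy; real copies would live on separate media.
        self.stores = [{} for _ in range(self.copies)]

    def transfer(self, sip: SIP) -> Optional[str]:
        """Transfer server: control the SIP, then reject it or create an AIP."""
        if not sip.files or any(len(data) == 0 for data in sip.files.values()):
            return None  # rejected: empty package or empty file (assumed rule)
        checksums = {n: hashlib.sha256(d).hexdigest() for n, d in sip.files.items()}
        archive_id = f"aip-{len(self.stores[0]) + 1:04d}"
        self._store(AIP(archive_id, dict(sip.files), checksums))
        return archive_id  # plays the role of the acknowledgement receipt

    def _store(self, aip: AIP) -> None:
        """Storage server: register the AIP in every store."""
        for store in self.stores:
            store[aip.archive_id] = aip

    def access(self, archive_id: str, user_is_authorized: bool) -> Optional[Dict[str, bytes]]:
        """Access server: return a DIP (here, the files) after a fixity check."""
        if not user_is_authorized:
            return None
        aip = self.stores[0].get(archive_id)
        if aip is None:
            return None
        for name, data in aip.files.items():
            if hashlib.sha256(data).hexdigest() != aip.checksums[name]:
                raise ValueError(f"fixity check failed for {name}")
        return dict(aip.files)
```

A short usage example: `Archive(copies=2).transfer(SIP("lab", {"thesis.pdf": b"..."}))` returns an archive identifier that can later be passed to `access`, while an empty SIP is rejected and returns None.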