Planning a digitisation project: a rough guide




Planning a digitisation project: a rough guide Digiwiki seminar 12.3.2009, Helsinki, Finland edwin.klijn@kb.nl


Stepping stones...
1. Writing a project proposal (incl. business case)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Wrapping up the project (exploitation plan, long-term preservation)

Sample 1: ANP Radionews bulletins
- 1.5 million hand-typed news items from the Dutch radio (1937-1995)
- June 2007 - October 2008
- Text mass digitisation project
- Budget: 0.5 million Euro
- Funded by the Memory of the Netherlands programme
- URL: http://anp.kb.nl (in Dutch only)


Sample 2: Databank of Digital Daily Newspapers
- 8 million newspaper pages
- Selection of Dutch local, regional, national and colonial newspapers, 1618-1995
- 2006-2011
- Budget: 12.5 million Euro
- Funded by the National Programme Investments in Large-Scale Research Facilities
- 25 billion words, text mass digitisation
- URL: http://www.kb.nl/projectdagbladen/



1. Writing a project proposal
- Business case: what does our organisation gain by starting to digitise?
- Planning: how long will it take?
- Which resources are needed? Staff, equipment, etc. Estimate of required budget.
- Risk assessment: what are the potential risks that can prevent us from reaching our goals?
- Very down-to-earth description of final deliverables
- Who will be the owner of the digital collection after the project?

Sample case 1: ANP news bulletins
- Goal: make the collection accessible online for researchers and the general public
- Output-oriented = access project
- Why? Task of our organisation
- Deliverables: 1.5 million JPEG (quality 10) files, 1.5 million ALTO XML files, a website with full-text search-and-retrieval functionality
- Estimated budget: 0.5 million Euro; easy material, so 0.33 Euro per page

Sample case 2: DDD Dutch newspapers
- Goal: make the collection accessible online for researchers and the general public
- Not a preservation project, but the material is too vulnerable to be re-scanned in the future, so it is treated as a production project
- Why? Task of the KB
- Article-level access

Sample case 2: DDD Dutch newspapers (2)
- Deliverables: 8 million JP2 master files, 8 million JP2 access images, PDF per issue, text file per article, MPEG21 per issue, 8 million MIX files and a website with full-text and advanced search
- Estimated budget: 12.5 million Euro

I digitise because
- my users frequently use this material
- I think my users will frequently use this material
- I want to save and protect my vulnerable originals
- my users give me money to do so

Digitisation on demand: Stadsarchief Amsterdam
- Threshold: information in the original should be readable
- 1 copy for the customer, 1 copy for the reading room
- No separate, uncompressed master files (JPEG 10)
- Now: 32 kilometres of archives digitally available
- 0.50 per scan

Cost estimate
- Be realistic
- Calculate all costs
- Use realisation data from other projects
- Beware of all the work BESIDES the actual scanning

Sample: ANP news bulletins and DDD newspapers
[Two cost-breakdown charts. Cost categories in both: staff; hardware and software; research and development; scanning, OCR, metadata.]
- DDD newspapers: 1.50 Euro per page
- ANP news bulletins: 0.33 Euro per page
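The per-page figures can be cross-checked against the budgets and page counts quoted for the two sample projects. The sketch below is only that arithmetic; the function name `cost_per_page` is illustrative and not part of the original slides.

```python
def cost_per_page(budget_eur: float, pages: int) -> float:
    """Average all-in cost per page: total budget divided by page count."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return budget_eur / pages


# Budgets and volumes as quoted on the sample slides.
anp = cost_per_page(500_000, 1_500_000)      # ANP news bulletins
ddd = cost_per_page(12_500_000, 8_000_000)   # DDD newspapers

print(f"ANP: {anp:.2f} Euro per page")
print(f"DDD: {ddd:.2f} Euro per page")
```

For ANP this reproduces the quoted 0.33 Euro per page. For DDD the plain division comes out slightly above the quoted 1.50 Euro, so the slide figure is presumably rounded or based on a somewhat different cost base.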

Outsourcing digitisation: different prices

Outsourcing?

Pitfall: intellectual property rights
- Three copyright moments:
  1. Making a (digital) copy
  2. Making a copy for an internal network
  3. Making a copy for the internet
- Protection lasts 70 years after the death of the author and/or the date of publication
- Legal obligation to trace all rights holders: a time-consuming activity in (mass) digitisation projects
- Commission Digiti©e: agency responsible for dealing with claims and tracing rights holders?
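The 70-year rule lends itself to a quick screening of candidate titles during selection. The sketch below assumes the term is counted in whole calendar years and runs until the end of the 70th year after the reference year (the author's death or the publication date, whichever applies). That simplification and the function names are assumptions for illustration; real rights clearance needs more than a date check.

```python
def first_free_year(reference_year: int, term_years: int = 70) -> int:
    """First calendar year in which the work is no longer protected.

    Assumes protection runs until the end of the `term_years`-th
    calendar year after `reference_year`.
    """
    return reference_year + term_years + 1


def is_protected(reference_year: int, current_year: int, term_years: int = 70) -> bool:
    """True if the work is still within its term in `current_year`."""
    return current_year < first_free_year(reference_year, term_years)


# Hypothetical example: an author who died in 1938, checked in 2009.
print(first_free_year(1938))
print(is_protected(1938, 2009))
```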

Privacy laws?

Pitfall: know thy originals!
- How many?
- What condition?
- Where are they?
- Available metadata
- Dimensions
- Colour/greyscale/B&W
- Available alternatives (e.g. microfilm vs. originals)

Case 1: ANP news bulletins


Case 2: DDD newspapers

Case 2: DDD newspapers
- The Hague
- Stockholm
- Vatican Secret Archives
- Dresden
- Paramaribo

Acquiring finances
- Resources within own institution
- National government
- European Union
- Private funds
- Current figures: 2 to 3% of all Dutch institutionalised cultural heritage is available in digital format.


National government Netherlands
The Dutch government encourages:
- cross-sector cooperation between heritage institutions
- open standards
- service-oriented architecture (SOA)
- mass digitisation
- digitisation incorporated into overall policy ("Digitaliseren met beleid")
- more uniformity of digitisation activities
- main target groups: education and research

National government Netherlands
- Dutch Ministry of Education, Culture and Science, but also other ministries
- Digitisation programmes:
  * Erfgoed van de Oorlog
  * Memory of the Netherlands http://www.geheugenvannederland.nl
  * Images for the future http://www.beeldenvoordetoekomst.nl

National preservation programme: Metamorfoze
- KB, National Archives
- Since 1997
- Funded by the Ministry of Education, Culture and Science
- Preservation of Dutch paper-based heritage
- Project bureau
- 30% own contribution
- Before 2007: microfilming; after 2007: preservation imaging
- http://www.metamorfoze.nl/


Future: project Nederlands Erfgoed: Digitaal!
- Consortium of 10 Dutch heritage institutions
- Combining parts of their collections
- Cross-media, subject-oriented focus
- Target groups: education, research, tourism and creative industry
- "Canon van Nederland": highlights of Dutch history
- 2009-2014
- Overall estimated cost: 186 million Euro; estimated benefits: 172-223 million Euro (!)
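The cost and benefit estimates quoted for Nederlands Erfgoed: Digitaal! can be put side by side as a simple benefit-to-cost ratio. This is a minimal sketch of that arithmetic only; the function name is illustrative, and the slides do not say how the estimates themselves were derived.

```python
def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Benefit divided by cost; values above 1.0 mean benefits exceed costs."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


COST_MEUR = 186          # overall estimated cost, million Euro
BENEFIT_LOW_MEUR = 172   # lower benefit estimate, million Euro
BENEFIT_HIGH_MEUR = 223  # upper benefit estimate, million Euro

low_ratio = benefit_cost_ratio(BENEFIT_LOW_MEUR, COST_MEUR)
high_ratio = benefit_cost_ratio(BENEFIT_HIGH_MEUR, COST_MEUR)
print(f"Benefit/cost ratio between {low_ratio:.2f} and {high_ratio:.2f}")
```

The range straddles 1.0, so on these estimates the project only pays for itself in the more optimistic part of the benefit range.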

Koninklijke Bibliotheek: Digital Library programme
- Current digitisation projects (to 2011): 41 million pages, budget 54 million
- Digital Library programme 2009-2013
- Target: 20% of all books, newspapers and journals published in the Netherlands digitised by 2013

European projects
- Financial commitment from the institution (matching principle)
- Aimed at innovative initiatives
- Aimed at combining collections at an international level, e.g. Europeana (http://www.europeana.eu) and European Digital Library (http://www.theeuropeanlibrary.org/portal/index.htm)


Project plan
- Detailed time schedule with milestones and deliverables
- Division into work packages (in case of large projects)
- Clear overview of dependencies
- Detailed risk assessment
- Business case: should be re-checked during the project
- Translation of project aim into detailed specifications


http://www.dbnl.org


Technical specifications affected by project aim
- Image quality: resolution, tonal range, detail reproduction, polarity (B/W, greyscale, colour)
- Image format: lossy vs. lossless, compressed vs. uncompressed
- Metadata: a lot vs. a little vs. somewhere in between
- Image manipulation: yes vs. no vs. a little
- Good technical manuals available:
  * JISC Digital Media at http://www.jiscdigitalmedia.ac.uk/
  * Cornell University at http://www.library.cornell.edu/preservation/tutorial/
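Resolution, tonal range and compression together determine how large each master file becomes, which in turn drives storage cost. A minimal sketch of the uncompressed size calculation follows; the page dimensions and resolution in the example are hypothetical values chosen for illustration, not figures from the projects described here.

```python
def uncompressed_size_bytes(width_in: float, height_in: float,
                            ppi: int, bits_per_pixel: int) -> int:
    """Size of an uncompressed raster image, ignoring file headers.

    width_in / height_in: physical size of the original in inches
    ppi: scanning resolution in pixels per inch
    bits_per_pixel: 1 (bitonal), 8 (greyscale) or 24 (RGB colour)
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical page of 16.5 x 23.4 inches scanned at 300 ppi.
for label, bits in [("bitonal", 1), ("greyscale", 8), ("colour", 24)]:
    size_mb = uncompressed_size_bytes(16.5, 23.4, 300, bits) / 1_000_000
    print(f"{label}: {size_mb:.1f} MB")
```

Moving from bitonal to 24-bit colour multiplies the uncompressed size by 24, which is why the choice of polarity and compression belongs in the project aim rather than being left to the scanning stage.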

Search-and-retrieval problems
- Large amounts of data: how to find your way
- Limited capacity of search engine
- Limitations of Optical Character Recognition (OCR) software

Optical character recognition (OCR)
Original text:
Blijkens verschillende mededeeelingen in de dagbladen is de Indische regeering den laatsten tijd regelend opgetreden ten aanzien van het Indische handelsverkeer, in het bijzonder ten aanzien van den uitvoer van Indische producten.

Optical character recognition (OCR)
OCR output:
Blijkena verachillende mededeeelingen in de dagbladen is de Indische regeering den 1aatsten tijd regelond opgetreden ten aanzien van het Indische handelsverkeer, in het lijzonder ten a3nzien van den uitvoar van Indische producten.
- Word accuracy: 7 errors in 33 words = 79% correct
- Character accuracy: 7 errors in 202 characters = 97% correct
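The gap between word and character accuracy can be reproduced from the sample itself. The sketch below compares the original text and the OCR output position by position. This only works because every error in this sample is a single-character substitution, so both texts keep the same length; real evaluations normally use an edit-distance alignment instead. The word and character totals (33 and 202) are taken from the slide as given, and the function names are illustrative.

```python
def count_mismatches(truth, ocr) -> int:
    """Count positions where two equal-length sequences differ."""
    if len(truth) != len(ocr):
        raise ValueError("sequences must have equal length for this simple check")
    return sum(t != o for t, o in zip(truth, ocr))


def accuracy(errors: int, total: int) -> float:
    """Share of correct items: 1 - errors / total."""
    return 1 - errors / total


truth = (
    "Blijkens verschillende mededeeelingen in de dagbladen is de Indische "
    "regeering den laatsten tijd regelend opgetreden ten aanzien van het "
    "Indische handelsverkeer, in het bijzonder ten aanzien van den uitvoer "
    "van Indische producten."
)
ocr = (
    "Blijkena verachillende mededeeelingen in de dagbladen is de Indische "
    "regeering den 1aatsten tijd regelond opgetreden ten aanzien van het "
    "Indische handelsverkeer, in het lijzonder ten a3nzien van den uitvoar "
    "van Indische producten."
)

word_errors = count_mismatches(truth.split(), ocr.split())  # whole words that differ
char_errors = count_mismatches(truth, ocr)                  # single characters that differ

# Totals as stated on the slide.
print(f"Word accuracy: {accuracy(word_errors, 33):.0%}")
print(f"Character accuracy: {accuracy(char_errors, 202):.0%}")
```

Seven wrong characters spoil seven whole words, so a text that looks 97% correct at character level is only 79% searchable at word level.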

Optical character recognition (OCR)
OCR output:
IINCOLXis strangely forgotten by b visitors to in Washington Washingt onthe Thesightseers who whotluck flock to the National ntionnl Capital at all sea seasons scaon8 seasons Lsons on8 of the year for som e som unknown reason jeeni to find more moreinteresting moreintne8t ing moreinteresting interestingthe thing things of less historic importane than the therelics thcrelic therelics relicspertaining pertainh g iu ti > the fmt martyred President whose un untimely untimely untimely timelydeath was as mourned by the entire oitihzed world
Source: http://www.loc.gov/chroniclingamerica/

Optical character recognition (OCR)

Automated OCR
- Pilot project Historische kranten (bitonal, from microfilm): between 60% and 70% word accuracy
- OCR results for historical texts are very low; see the EU project IMPACT (Improving Access to Text, URL: http://www.impact-project.eu/)


Steering group
- representative of end-users
- representative of initiating party
- representative responsible for quality assurance

Project manager
- reports to steering group
- responsible for day-to-day work (budget etc.)
- management by exception

Project leader
- reports to project manager
- responsible for work package

Selection: Scientific Advisory Committee
- Advises on titles to be selected
- Advises on search functionality on the website (user perspective)
- Advises on content on the website


Managing the project flow
- Are the specifications met within timeframe and budget?
- Are there any new developments that affect the business case of the project?

Stepping stones...
1. Writing a project proposal (incl. business case)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Concluding the project (exploitation plan, long-term maintenance, lessons learned)

Exploitation plan
- Who will be responsible for maintaining the website?
- What possible future purposes can be served?
- What costs are involved in maintaining the website and in long-term preservation of the deliverables?

Pitfall: It ain't over when it's over...
- Koninklijke Bibliotheek: 41 million pages to be digitised up to 2011
- Required storage space: 1 petabyte
- Current estimated storage costs, long-term preservation system (e-depot): 1 TB = 8,500 Euro a year
- Current estimated storage costs, webserver: 1 TB = 7,500 Euro a year
- Structural costs in the long run: millions!
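The "millions" follow directly from the per-terabyte prices. The sketch below assumes 1 petabyte = 1,000 terabytes and, as an upper bound, that the full petabyte sits in each system. The slides do not say how the data is actually split between e-depot and webserver, so the results are an order-of-magnitude illustration only.

```python
TB_PER_PB = 1_000  # assumption: decimal units, 1 PB = 1,000 TB


def annual_storage_cost(terabytes: float, eur_per_tb_year: float) -> float:
    """Yearly storage cost in Euro for a given volume and unit price."""
    return terabytes * eur_per_tb_year


volume_tb = 1 * TB_PER_PB  # required storage space quoted on the slide

edepot = annual_storage_cost(volume_tb, 8_500)     # long-term preservation system
webserver = annual_storage_cost(volume_tb, 7_500)  # access copies on the webserver

print(f"e-depot:   {edepot:,.0f} Euro a year")
print(f"webserver: {webserver:,.0f} Euro a year")
```

At these prices, a single year of storage in either system costs a substantial fraction of the original digitisation budget, which is why the exploitation plan has to cover structural costs from the start.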

Koninklijke Bibliotheek: big issues
- Expensive scanning: 1.3 Euro per page
- Intellectual property rights
- Quality control of files delivered by suppliers (> 2 million files a month)
- Storage
- Long-term preservation of all files produced
- Inefficient search-and-retrieval software

...but we have already come a long way since 1999!