Introducing the Greenstone



Similar documents
CREATING DIGITAL LIBRARY COLLECTIONS WITH GREENSTONE

Course 1: Digital Libraries Introduction Unit 4: Digital libraries functional components software

Greenstone Documentation

Installing and operating Greenstone

DIGITAL LIBRARY OPEN SOURCE SOFTWARE : A COMPARATIVE STUDY

Oracle Universal Content Management

Plagiarism. Dr. M.G. Sreekumar UNESCO Coordinator, Greenstone Support for South Asia Head, LRC & CDDL, IIM Kozhikode

Outside In Image Export Technology SDK Quick Start Guide

11.5 E-THESIS SUBMISSION PROCEDURE (RESEARCH DEGREES)

The University of Chicago Library

Free web-based solution to manage photographs that could be used to manage collection items online if there is a photo of every item.

How To Manage A Digital Library

DIGITIZATION OF LIBRARY RESOURCES AND THE FORMATION OF DIGITAL LIBRARIES: A PRACTICAL APPROACH

DAR: A Digital Assets Repository for Library Collections

White Paper. 3-Heights Document Converter Basics and Applications

WHY DIGITAL ASSET MANAGEMENT? WHY ISLANDORA?

Digital Forensics with Open Source Tools

Dr.Christina Birdie Indian Institute of Astrophysics, Bangalore

JAVA WEB START OVERVIEW

OCLC CONTENTdm. Geri Ingram Community Manager. Overview. Spring 2015 CONTENTdm User Conference Goucher College Baltimore MD May 27, 2015

How To Understand The History Of The Web (Web)

OCR and PDF Compression

Design and Implement of Digital Library: An Overview

ClickCartPro Software Installation README

The Electronic Document Management Application (EDM)

Customer Profile Report for ABC Hosting Ltd

Installation Guide for FTMS and Node Manager 1.6.0

Using PDF Files in CONTENTdm

Zebra Link-OS Environment Version 2.0

Content Management Software Drupal : Open Source Software to create library website

Installing (1.8.7) 9/2/ Installing jgrasp

PowerPoint Presentation to Accompany. Chapter 3. File Management. Copyright 2014 Pearson Educa=on, Inc. Publishing as Pren=ce Hall

CatDV Pro Workgroup Serve r

Reference Software Workshop Tutorial - The Basics

Interoperabilnost LINUX-Windows. It is easily possible for Linux & Windows to coexist & even work together.

RIC 2007 SNAP: Symbolic Nuclear Analysis Package. Chester Gingrich USNRC/RES 3/13/07

CISCO IP PHONE SERVICES SOFTWARE DEVELOPMENT KIT (SDK)

Current Page Location. Tips for Authors and Creators of Digital Content: Using your Institution's Repository: Using Version Control Software:

Web services with WebSphere Studio: Deploy and publish

TRUD Service Download Guide

UNESCO activities in the field of Free Open Source Software (FOSS)

Open Source Initiative in Digital Preservation: The Need for an Open Source Digital Repository and Preservation System

Additional information >>> HERE <<< Free Download buy website traffic. Click Here =>

Digital Asset Management. Content Control for Valuable Media Assets

Web Conferencing Version 8.3 Troubleshooting Guide

CLC Bioinformatics Database

Issues and Challenges in Open Source Software Environment with Special Reference to India

Using Impatica for Power Point

Version Comparison MAXIMIZER CRM Published By. DATA SHEET Version Comparison 1

bbc Installing Your Development Environment Adobe LiveCycle ES July 2007 Version 8.0

The All-In-One Browser-Based Document Management Solution

A comparison of e-book readers (features)

Useful Utilities. Here are links to free third party applications that we use and recommend.

Computers Are Your Future Eleventh Edition Chapter 5: Application Software: Tools for Productivity

Project Information Project Acronym Kaptur

AccXES Client Tools 10.0 User Guide 701P41529 May 2004

OSAS version 8 A foundation for the future

Building An Institutional Repository With DSpace

INT322. By the end of this week you will: (1)understand the interaction between a browser, web server, web script, interpreter, and database server.

OCLC CONTENTdm and the WorldCat Digital Collection Gateway Overview

The FAO Open Archive: Enhancing Access to FAO Publications Using International Standards and Exchange Protocols

and technology engineered for agile business

Fig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.

Taking full advantage of the medium does also mean that publications can be updated and the changes being visible to all online readers immediately.

Code Estimation Tools Directions for a Services Engagement

Installing FEAR on Windows, Linux, and Mac Systems

Computer software. Computer software: summary/overview/abstract (Part 1)

Introduction to WIPOScan Software

Creation of Digital Document Archives with Winisis

RJS Software Systems Inc AS/400 Report Delivery System

FreeFlow Accxes Print Server V15.0 August P Xerox FreeFlow Accxes Print Server Drivers and Client Tools Software Installation Guide

IDT Connect User Guide

TEXT FILES. Format Description / Properties Usage and Archival Recommendations

Content Management Systems: Drupal Vs Jahia

AlphaTrust PRONTO Enterprise Platform Product Overview

Bureau for Visual Affairs. content management system. Keep your website up-to-date and relevant with ease

Cisco CNS NetFlow Collection Engine Version 4.0

A Virtual Exhibition of Open Source Software for Libraries

How To Link Tomcat 5 with IIS 6 on Windows 2003 Server using the JK2 ajp13 connector

Analysing log files. Yue Mao Supervisor: Dr Hussein Suleman, Kyle Williams, Gina Paihama. University of Cape Town

EMCO Network Inventory 5.x

More information >>> HERE <<<

More details >>> HERE <<<

Real-time Device Monitoring Using AWS

You can access it anywhere - on your desktop, online, or on your ipad. Benefits include:-

Using Moodle. Moodle can do lots of things but my advice would be to use it for:

Cincom Smalltalk. Installation Guide P SIMPLIFICATION THROUGH INNOVATION

Additional information >>> HERE <<< Get Learn About The Web :: articles on online marketing strategies

Veco User Guides. Document Management

The Open Source CMS. Open Source Java & XML

Archiving Full Resolution Images

Transcription:

Introducing the Greenstone Ian H. Witten Digital Library Software Computer Science Department Waikato University New Zealand http://greenstone.org http://nzdl.org

What we wanted Collections of digital material Individualized, depending on metadata etc Up to several Gb of text + associated images, movies, whatever Fully searchable Served on WWW, or published on removable media Run anywhere, on any computer Fully internationalized Non-exclusive: documents and metadata in any format Non-prescriptive: standard and non-standard metadata

What we got: Greenstone Access Searching/ browsing Extensible Multi-* Accessible via any Web browser Server runs on anything (all Windows + Unix + Mac) Collections can be published on CD-ROM/DVD Trivial to install GUI interface for building and publishing collections Collection-specific Full-text and fielded search Flexible browsing facilities Metadata-based (Dublin Core recommended) Creates all access structures automatically Plugins new document, metadata formats Classifiers new metadata browsers Multilingual: Documents and interfaces Multimedia: image, video, audio collections exist Multiformat: Documents and metadata

Tutorial exercise #1 Installing Greenstone (local library version) Self-installing CD-ROM Greenstone ImageMagick: yes Ghostscript: yes Copy sample_files from CD to your directory Try the users s interface Librarian interface Navigate to Home Folder\Desktop\sample_files Right-click

Greenstone facts Distribution International UN Agencies Technical centers Open source: Gnu GPL Distributed via SourceForge since: Nov 2000 Average downloads: 5000/month since then Humanitarian CD-ROMs produced: 30-35 Distribution for each one: 5000/year Languages for interface: 50 Languages for full software + manuals: 4 Countries represented on email lists: 60 UNESCO training courses in: Bangalore, Almaty, Dakar, Suva, UNESCO, Paris ( Information for All programme) FAO, Rome (Info Management Resource Kit) UNU, Japan (CD-ROM collections of UNU material) University of Waikato, New Zealand Indian Institute of Sciences, Bangalore University College, London University of Cape Town, South Africa University of Lethbridge, Canada

Sample collections (see greenstone.org) International U.S. Argentina Human Rights Commission Tasmania State Library Peking University Digital Library Gresham College, London Oxford Digital Library, University of Oxford University of Applied Sciences, Stuttgart Association of Indian Labour Historians, Delhi Indian Institute of Management, Kozhikode Indian Institute of Science, Bangalore Vimercate Public Library, Milan, Italy Netherlands Institute for Scientific Information Services Philippine Government Information Network Mari El Republic, Russia Slavonski Brod Public Library, Slovenia Vietnam National University Welsh Books Council Auburn University, Alabama Detroit Public Library Hawaiian Electronic Library ibiblio project, University of North Carolina Illinois Wesleyan University LeHigh University, Pennsylvania New York Botanical Garden University of California at Riverside University of Chicago Library University of Illinois Texas A&M University Washington Research Library Consortium Argentina Australia China England England Germany India India India Italy Netherlands Philippines Russia Slovenia Vietnam Wales

Standards Metadata Serving Documents Can use any metadata set, Dublin Core supplied Plugins for XML Refer METS can be used as Greenstone s internal representation Web Can publish Greenstone collections on CD-ROM Can publish Greenstone collections on OAI Export collections to METS Export collections to DSpace (ready for DSpace s batch import program) Plugins for PDF PostScript Word, RTF HTML Plain text Latex MARC CDS/ISIS ProCite BibTex ZIP Excel PPT Email Source code RealMedia OAI METS DSpace Images (GIF, JPEG, TIFF ) MP3 Ogg Vorbis MediaWiki UnknownPlug (e.g. for audio, MPEG, Midi)

Greenstone: Platforms Operating system: Windows (any version) Linux (any version) Unix (most versions, e.g. Solaris) Mac OS X Restrictions: No longer runs under Windows 3.1/3.11 For Librarian interface (GLI), need Java which is no longer supported on Windows 95

Local library vs Web library Local library: stand-alone Serves collections on a standalone PC And on others on the same network Includes built-in Web server Web library: uses external web server Apache, Microsoft PWS/IIS Need to configure web server (geeky job?) Windows: Both local library and web library Normally use local library (else must set up server) Web library works with Microsoft PWS, IIS Unix, Mac OS/10: Web library only Use Apache (or other web server) Linux binaries supplied Tested on SUN Solaris, Mac OS/10 Need GDBM (standard on Linux) Greenstone developed on Linux

Documentation and help Manuals on the CD-ROM (gsdl/docs) From Paper To Collection (30pp) Scanners and scanning, OCR, 3 examples from 1,000 to 100,000 pages, Creating an electronic collection Installer s Guide (36pp) Versions of Greenstone, installation procedure, Greenstone collections, setting up the web server, configuring your site, personalizing your installation User s Guide (90pp) Overview of Greenstone, using Greenstone collections, the collector, administration, software features, glossary of terms Developer s Guide (113pp) Understanding the collection building process, getting the most out of your collections, the Greenstone runtime systems, configuring your Greenstone site

The Greenstone Librarian Interface (GLI) Building collections Interactive Java program Runs on anything Build a collection on the computer you are on plus new applet version Includes metadata editor Caveat: cannot deal with such huge metadata collections as Greenstone can (Tutorial exercise: small collection of HTML files) Invoke GLI: build a small collection of HTML files Gather Create Look at extracted metadata Set up shortcut in the Librarian interface

Set up environment variables Details about the collection Put source docs into a subdirectory collect.cfg (plugins) Docs in Greenstone Archive format collect.cfg Greenstone collection Makecol Import Build Building a collection Create a directory for the collection (with subdirectories), put collect.cfg file in etc subdirectory Convert to archive format Extract metadata create indexing & browsing structures, compress Search collect.cfg + macros (main.cfg) Results

The building process $GSDLHOME collect demo import archives building index etc perllib Put material here import.pl demo build.pl demo rm r index/* mv building/* index Collection configuration file Collection served from here (or to CD-ROM)

How big a library? today 50,000 searchable newspaper issues 250,000 pages 3M articles 9 GB of raw text 27 GB of metadata (including bounding box coords for all words) 50M unique terms (uncorrected OCR) 900,000 non-searchable pages mid 2008 All pages searchable details DLConsulting Ltd uses Lucene Input: METS/ALTO XML files National Library of Singapore Mar 2008 600,000 searchable newspaper issues mid 2009 2M pages Will continue to grow http://paperspast.natlib.govt.nz

How to build web site New Zealand Digital Library Project http://nzdl.sadl.uleth.ca/cgi-bin/library

How to build web site

How to build web site

The Greenstone documentation wiki go to greenstone.org

Greenstone 3 Easier to modify interface: generates data (XML) rather than presentation (HTML) XML can be transformed using XSLT, rather than HTML via macro files More dynamic: more can be done at runtime e.g. sort search results on the fly, generate new classifiers at runtime METS foundation: multiple hierarchies possible e.g. chapter-section-subsection hierarchy can coexist with page header-contents-footer hierarchy Distributed modules, using SOAP interface can be on different computer to collection, can talk to several different Greenstone sites Java basis: platform independent Greenstone2 operates on different platforms, but at a high cost to implementors Better software technology easier for people (e.g. our students!) to modify the software to make it do radically different things Greenstone 2 was designed about 7 years ago Since then, new software and library standards have emerged