Installing and operating Greenstone



Similar documents
Greenstone Documentation

Redatam+SP REtrieval of DATa for Small Areas by Microcomputer

Setting up an Apache Web Server for Greenstone 2 Walkthrough

SIMIAN systems. Setting up a Sitellite development environment on Windows. Sitellite Content Management System

Server & Workstation Installation of Client Profiles for Windows

Talk Internet User Guides Controlgate Administrative User Guide

IBM WebSphere Application Server Version 7.0

Web Enabled Software for 8614xB-series Optical Spectrum Analyzers. Installation Guide

AccXES Client Tools 10.0 User Guide 701P41529 May 2004

VERITAS Backup Exec TM 10.0 for Windows Servers

Security Correlation Server Quick Installation Guide

WEB2CS INSTALLATION GUIDE

RecoveryVault Express Client User Manual

HAHTsite IDE and IP Installation Guide

Installation Instruction STATISTICA Enterprise Server

Bitrix Site Manager ASP.NET. Installation Guide

OroTimesheet 7 Installation Guide

for Networks Installation Guide for the application on the server July 2014 (GUIDE 2) Lucid Rapid Version 6.05-N and later

Installation Guide for Pulse on Windows Server 2012

User Manual. Onsight Management Suite Version 5.1. Another Innovation by Librestream

Dell KACE K1000 System Management Appliance Version 5.4. Service Desk Administrator Guide

SecureAware on IIS8 on Windows Server 2008/- 12 R2-64bit

Trend ScanMail. for Microsoft Exchange. Quick Start Guide

How To Install Amyshelf On Windows 2000 Or Later

Quick Start Guide for VMware and Windows 7

APACHE WEB SERVER. Andri Mirzal, PhD N

Online Backup Client User Manual

Installing SQL-Ledger on Windows

WhatsUp Gold v16.2 Installation and Configuration Guide

Online Backup Client User Manual

Installing Ameos for Windows Platforms

Reference and Troubleshooting: FTP, IIS, and Firewall Information

1. Product Information


Online Backup Client User Manual Linux

PROJECTIONS SUITE. Database Setup Utility (and Prerequisites) Installation and General Instructions. v0.9 draft prepared by David Weinstein

Installation Guide for Pulse on Windows Server 2008R2

SpatialWare. Version for Microsoft SQL Server 2008 INSTALLATION GUIDE

Installation Instructions for Version 8 (TS M1) of the SAS System for Microsoft Windows

Migrating helpdesk to a new server

Installing (1.8.7) 9/2/ Installing jgrasp

Getting Started with Dynamic Web Sites

DiskPulse DISK CHANGE MONITOR

Online Backup Linux Client User Manual

Installing the Microsoft Network Driver Interface

This guide consists of the following two chapters and an appendix. Chapter 1 Installing ETERNUSmgr This chapter describes how to install ETERNUSmgr.

for Networks Installation Guide for the application on a server September 2015 (GUIDE 2) Memory Booster version 1.3-N and later

for Networks Installation Guide for the application on the server August 2014 (GUIDE 2) Lucid Exact Version 1.7-N and later

User Guide. Version 3.2. Copyright Snow Software AB. All rights reserved.

MGC WebCommander Web Server Manager

Web Remote Control SA Software Installation and Setup

Team Foundation Server 2010, Visual Studio Ultimate 2010, Team Build 2010, & Lab Management Beta 2 Installation Guide

System Administration Training Guide. S100 Installation and Site Management

Designing and Implementing Forms 34

aims sql server installation guide

WEB INTERFACE FOR CDS/ISIS. GENISISweb VERSION Deepali Talagala, General Secretary, Sri Lanka Library Association Colombo

Team Foundation Server 2013 Installation Guide

In the same spirit, our QuickBooks 2008 Software Installation Guide has been completely revised as well.

Configuring IBM HTTP Server as a Reverse Proxy Server for SAS 9.3 Web Applications Deployed on IBM WebSphere Application Server

Canto Integration Platform (CIP)

AXIGEN Mail Server. Quick Installation and Configuration Guide. Product version: 6.1 Document version: 1.0

Installation Guide For Choic Enterprise Edition

Microsoft SQL 2008 / R2 Configuration Guide

Cumulus 6 HELIOS Companion 2.0. Administrator Guide

Virtual CD v10. Network Management Server Manual. H+H Software GmbH

National Fire Incident Reporting System (NFIRS 5.0) NFIRS Data Entry/Validation Tool Users Guide

Installing and Configuring Apache

HOW TO SETUP AN APACHE WEB SERVER AND INTEGRATE COLDFUSION

How To Install An Aneka Cloud On A Windows 7 Computer (For Free)

WhatsUp Gold v16.3 Installation and Configuration Guide

Pearl Echo Installation Checklist

WhatsUp Gold v16.1 Installation and Configuration Guide

MAS 90. Installation and System Administrator's Guide 4WIN /04

WebSpy Vantage Ultimate 2.2 Web Module Administrators Guide

Archive Attender Version 3.5

Spector 360 Deployment Guide. Version 7

Sophos for Microsoft SharePoint startup guide

Matisse Installation Guide for MS Windows. 10th Edition

Usage Analysis Tools in SharePoint Products and Technologies

Online Backup Client User Manual Mac OS

Online Backup Client User Manual Mac OS

OutDisk 4.0 FTP FTP for Users using Microsoft Windows and/or Microsoft Outlook. 5/1/ Encryptomatic LLC

Document History Revision Date: October 30, 2006

Jetico Central Manager. Administrator Guide

Migrating TimeForce To A New Server

VERITAS Backup Exec 9.1 for Windows Servers Quick Installation Guide

Introduction. Installation of SE S AM E BARCODE virtual machine distribution. (Windows / Mac / Linux)

AccXES Account Management Tool Administrator s Guide Version 10.0

VERSION 9.02 INSTALLATION GUIDE.

Online Backup Client User Manual

Team Foundation Server 2012 Installation Guide

Installing The SysAidTM Server Locally

DS License Server. Installation and Configuration Guide. 3DEXPERIENCE R2014x

User's Manual. Intego VirusBarrier Server 2 / VirusBarrier Mail Gateway 2 User's Manual Page 1

How To Create An Easybelle History Database On A Microsoft Powerbook (Windows)

HIRSCH Velocity Web Console Guide

National Fire Incident Reporting System (NFIRS 5.0) NFIRS Data Entry/Validation Tool Users Guide

Transcription:

Appendix A Installing and operating Greenstone 1 Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato. It is free, open-source software, issued under the terms of the GNU General Public License. You can download it, along with documentation, from www.greenstone.org. Although this URL is hosted in New Zealand, your download will be redirected automatically elsewhere normally to http://sourceforge.net in the U.S. to expedite network access. We want to ensure that the software works well for you: please report any problems on the feedback form at www.greenstone.org or by emailing to greenstone@cs.waikato.ac.nz. In cooperation with UNESCO and the Humanity Libraries Project, Greenstone is issued on a CD-ROM that includes documentation in English, French, and Spanish. If you do not have Internet access, you can obtain a copy of the CD-ROM, for a nominal charge that covers the cost of shipping and handling, by writing to Greenstone, Department of Computer Science, University of Waikato, Hamilton, New Zealand. However, you are encouraged to download the software if you possibly can rather than purchasing the CD-ROM; this puts you in touch with current developments and gives you the latest version of the software. Everything on the CD-ROM is freely available on the Internet from www.greenstone.org. This appendix is written as though you were installing Greenstone from the CD-ROM, to simplify and unify the explanation. Windows or Unix? Windows Unix Binaries available for all versions May need root login to install 3.x 95/98/Me NT/2000/XP Linux Sun Solari s or Macintosh OS/X Other Serves collections but no building Full version available Full version available Full version available Full version available Full version available Only Administrators can install software Source code tested, binaries available Source code tested Untested Figure A.1 The different options for Windows and Unix versions of Greenstone. Greenstone runs on different platforms, and in different configurations, as summarized in Figure A.1. There are many issues that affect (or might affect) the installation procedure. Before reading on, you should consider these questions: Are you using Windows or Unix? If Windows, are you using 3.1/3.11 or a more recent version? Although

2 you can view collections under 3.1/3.11 and serve other computers on the network, you cannot build new collections. The full Greenstone software runs on Windows 95/98/Me and NT/2000/XP. If Unix, are you using Linux or another version of Unix? For Linux, a binary version of the complete system is provided which is easy to install. For other types of Unix, you will have to install the source code and compile it. If Windows NT/2000 or Unix, can you log in as the system administrator or root? This may be required to configure a Web server. Do you want the source code? If you are using Windows or Linux, you can just install binaries. But you may want the source code as well it s all in the Greenstone distribution. Do you want to build new digital library collections? If so, you need Perl, which is freely available for both Windows and Unix. Is your computer already running a Web server? The Windows version of Greenstone comes with a built-in Web server. However, if you are already running a Web server, you may want to stay with it. For Unix, you have to run a separate Web server. Do you know how to reconfigure your Web server? If you don t use the Greenstone Web server, you will have to reconfigure your existing one slightly to recognize the Greenstone software. A.1 Installation procedure Versions of Greenstone are available for both Windows and Unix, as binaries and in source code form. The user interface uses a Web browser: Netscape Navigator and Internet Explorer (version 4.0 or greater in both cases) are both suitable. In case you re not connected to the Internet and don t already have a Web browser, a Windows version of Netscape is provided on the Greenstone CD-ROM. WINDOWS Simple installation If you are a Unix user, please skip ahead to the next subsection. For a simple, straightforward installation on Windows, go through the following simple installation procedure. The Greenstone system occupies about 40 Mb of disk space. If you choose anything other than the default setup, you will have to decide whether you want to install the binary code or the source code. If in doubt, choose the binary code. The installation procedure is the same for both. The following sections tell you more about the options you will be presented with. To install the Windows version from the CD-ROM, insert the disk into the drive (e.g., into D:). (Downloading and installing from the Internet is just as easy.) If the installation procedure does not start automatically after about 20 seconds, click on the Start menu, select Run and type D:\Windows\Setup.exe, where D is the letter that identifies your CD-ROM drive. For Windows 3.1, select Run from the File Manager and type D:\Windows\Setup.exe. For the simplest installation, just accept the default at each point by clicking

3 the Next button. That s all you need to do. Greenstone is installed in the directory C:\Program Files\gsdl. Once installation is complete, to start your Greenstone system click on the Start button, open the Program menu, and select Greenstone Digital Library. This brings up a dialog box: just click Enter Library. This automatically starts your Internet browser and loads the Greenstone home page, which should look something like the example in Figure 7.6 (Chapter 7). You enter the Greenstone Demo collection by clicking on its icon. Windows binaries There are two separate Windows binary programs in the software distribution: the Local Library and the Web Library. The default installation described in the previous subsection selects the Local Library version. We strongly recommend that you use this version. The Web Library, which is harder to set up, is only necessary if you already run a Web server and want to use it for Greenstone. Despite its modest name, the Local Library offers a complete, self-contained, Web-serving capability. After you have selected which version to install, you will be asked to choose between a full or a compact installation. The full installation includes everything. The compact installation (the default) includes everything except the Greenstone manuals and the Export to CD functionality. On your hard drive, the compact installation uses only about one-third the space of the full installation. Local library. This enables any Windows computer to serve prebuilt Greenstone collections. The Greenstone Demo collection will automatically be installed; you can also install other collections (described later). The Local Library software is the same as that used on CD-ROMs produced by the Greenstone system. The Local Library is intended for use on stand-alone computers or computers that do not already have Web-server software. It contains a small built-in Web server so that other computers on the same network can also access the library. (However, the Web server has limited configurability.) The Local Library software automatically determines whether your computer has network software installed or is connected to a network. It operates correctly under any combinations of these conditions. However, there are two possible problems that may be encountered. Greenstone may cause an unwanted telephone dial-up operation fail to run because network software is installed, but installed incorrectly A restricted version of the Local Library is supplied which is intended for use in these situations. The restricted version only works with Netscape (not Internet Explorer). When you invoke the Local Library version of Greenstone, the dialog box contains a button that allows you to use the restricted version instead. Unless the above problems arise, you should always use the standard version. Web library. This enables any computer with an existing Web server to serve prebuilt Greenstone collections. As with the Local Library, the

4 Greenstone Demo collection will automatically be installed. You can also install the other collections included in the software distribution see the subsection entitled Greenstone collections. The Web Library differs from the Local Library because it is intended for computers that already have Web server software. To run it, you also need Web server software. One possibility is Apache (see Table A.3). The Collector. This component, which is included in both the Local Library and the Web Library, allows you to build collections containing material of your choice. (You will not be able to use the Collector on a Windows 3.1/3.11 machine.) Windows Web server configuration (Web Library version only) Windows source An advantage of the Local Library version of Greenstone is that it runs out of the box and does not require any special configuration. For the Web Library version, however, you will have to make some adjustments to your Web server setup. If you already have a Web server, some small changes have to be made to its configuration to make your Greenstone installation operate. The install script explains what these are for the Apache Web server (see Section A.2 for instructions for configuring the PWS and IIS Web servers). You may need help from a system administrator to reconfigure an existing Web server he or she should be able to understand the instructions printed by the install script. If you do not already have a Web server, you will have to install one (see Table A.3 for information on Apache) and then configure it appropriately. Section A.2 gives a detailed account of the parts of a Web server installation that affect Greenstone, and how they need to be altered. It comes down to including half a dozen or so lines in a configuration file. The Greenstone source code occupies 50 Mb of disk space, but to compile it you will need about 90 Mb. To compile the source on Windows you need the Microsoft Visual C++ compiler (You do not need GDBM, the GNU database manager, because it is included in the Greenstone source distribution.) It is unlikely that you will be able to compile Greenstone on a Windows 3.1/3.11 machine. In the event that you recompile Greenstone and wish to use the recompiled version to create CD-ROMs, you should note that code produced by recent versions of the Visual C++ compiler does not run under Windows 3.1/3.11, although there is no problem with later Windows systems (95, 98, Me, NT, 2000, XP). If you want your CD-ROMs to operate on early Windows machines, you will need a different version of the compiler. Moreover, Greenstone uses STL, the C++ standard template library, and although these compilers sometimes come with STL, the provided version does not always work properly. Hence to recompile Greenstone in such a way that it produces

CD-ROMs that work on early versions of Windows, you need 5 the Microsoft Visual C++ compiler, Version 4.0 or 4.2; an external version of STL, the C++ standard template library. STL is packaged with Greenstone for use with these compiler versions. Note that the Windows installation procedure does not attempt to compile Greenstone for you if you choose to install the source code. For platform- and compiler-specific instructions on compiling Greenstone, see the Install.txt document which is placed in the top-level Greenstone directory (C:\Program Files\gsdl by default) during the installation procedure. UNIX This section is written for Unix users. (Windows users should skip ahead.) You need to choose whether to install the binary code or the source code. The binary code occupies about 50 Mb of disk space; the source code requires about 160 Mb to compile. Unix binaries The binary code requires an Intel x86-based Linux distribution which includes ELF binary support. Distributions that meet these requirements include RedHat 5.1 SuSE Linux 6.1 Debian 2.1 Slackware 4.0 More recent versions of these distributions should also work. You will need a Web server: we recommend Apache. You should install a Web server before installing Greenstone; this will make it easier to answer the questions asked during the Greenstone installation procedure. If you want to build new digital library collections, you will also need Perl if this is not already on your system. To check, open a terminal window, type perl v, and see if a message appears specifying, among other things, the version number. For most versions of Linux, Perl is installed by default. Unix source The source code is the same for Unix as for Windows. It has been compiled and tested on Linux, Solaris, and Macintosh OS/X; it should be a fairly routine matter to port it to other flavors of Unix. To compile the Greenstone source code on Unix, you need GCC, the GNU C++ compiler GDBM, the GNU database manager To run the Greenstone software, you also need a Web server and Perl, as described in the previous subsection, Unix binaries. Unix installation To install the Unix version from the CD-ROM, insert the disk into the drive,

and type mount /cdrom cd /cdrom cd Unix sh Install.sh mount the CD-ROM device (this command may differ from one system to another; for example on OS/X you cd to the /Volumes directory and then to the appropriate subdirectory for the CD-ROM) change directory to the CD-ROM s top level change directory to where the Unix install script resides begin the installation process (an explicit sh is used because many installations forbid you to execute programs directly from CD-ROM) 6 (It is just as easy to install Greenstone from the Internet.) The final command begins an interactive dialog that requests the information needed to install Greenstone on your system and gives detailed feedback on what is happening. The installation procedure begins by asking you which directory to install Greenstone into. The first file placed there is the uninstall program that cleans up any partial installation, should you encounter problems or terminate the installation prematurely. Next you choose whether you want to install binaries or source code. You are then asked some questions about your Web server setup. You need to have a valid CGI executable directory (normally called cgi-bin on Unix systems); you can either create a new one or use your existing one. If you create a new one, you will need to enter this information in your Web server s configuration file. In either case you need to enter the Web address of the CGI directory. The installation dialog will guide you through all these choices. It is important to set the file permissions correctly on certain directories, and you are prompted for the necessary information. Finally you are prompted for a password for the administrator user admin. (Note: This is a Greenstone user name/password and is completely independent of Unix user names and passwords.) By default all Greenstone software is installed in the directory /usr/local/gsdl if it is the root user who is doing the installation, and into the directory ~/gsdl otherwise (where ~ indicates the user s home directory). Installing the binaries takes just a few minutes, enough time for you to answer the appropriate questions. If you install the source code, the installation script will compile it, which takes from 10 minutes to an hour or so, depending on the speed of your processor. During the installation procedure you will be asked whether you want to install any Greenstone collections. The Demo collection is installed automatically; other collections in the distribution are described later in the chapter, in the subsection Greenstone collections. To uninstall the software, type cd ~/gsdl sh Uninstall.sh or /usr/local/gsdl if it was the root user who installed Greenstone Unix Web server configuration If you already have a Web server, some small changes will have to be made

7 to its configuration to make your Greenstone installation operate. The install script explains what these are. You may need help from your system administrator to reconfigure the Web server he or she should be able to understand the instructions printed by the install script. For convenience the output of the install script is written to a file called INSTALL_RECORD in the directory into which you installed Greenstone. If you do not already have a Web server, you will have to install one and then configure it appropriately. Section A.2 gives a detailed account of the parts of an Apache Web server installation that affect Greenstone, and how they need to be altered. It comes down to including half a dozen or so lines in a configuration file. You do not need to be the Unix root user to go through this installation procedure. When it comes to configuring an existing Apache server, however, you may need root privileges it all depends on how Apache is set up. If you install Apache yourself, you can do it as a user without root privileges. If necessary, you can always install a second Apache Web server on your computer even if one exists already. HOW TO FIND GREENSTONE Here is how you get to the various components of the Greenstone software. Local Library (Windows only) Web Library (Windows and Unix) The Collector Administration If you are using the Local Library, simply run the Greenstone Digital Library program from the Start menu. This automatically starts your Internet browser and loads the Greenstone home page. The Demo collection is accessible from this page. If you are using the Web Library, once you have installed the software and configured the Web server, use this URL for your Greenstone home page: http://localhost/gsdl/cgi-bin/library. A link to the Collector is provided on the Greenstone home page. A link to the Administration pages is provided on the Greenstone home page. The administrator user is called admin, with a password that you specified during the installation process. The administrator is authorized to add new users and to build collections. TESTING AND TROUBLESHOOTING To test Greenstone enter the Greenstone Demo collection (or any other collections that you have installed) by clicking on its icon on the Greenstone home page. Click liberally: most images that appear on the screen are clickable. If you hold the mouse stationary over an image, most browsers will soon pop up a message that tells you what will happen if you click.

Table A.1 Troubleshooting your Greenstone installation. 8 Experiment! Choose common words such as the and and to search for that should evoke some responses, and nothing will break. (Note: Greenstone indexes all words; there are no stop words.) For more information see Chapter 6 (Section 6.1). LOCAL LIBRARY (WINDOWS ONLY) WEB LIBRARY (WINDOWS AND UNIX) BOTH VERSIONS Problem When I start Greenstone my computer asks me to dial up my Internet Service Provider. When I start Greenstone my computer still asks me to dial up my Internet Service Provider. When I point my browser at the digital library, it can t find that page. The Collector seems to be working very slowly! When I start Apache it quits immediately. When I point my browser at the digital library, it displays garbage a binary file. I get the Greenstone home page (Chapter 7, Figure 7.6), but the Demo collection icon does not appear. When I point my browser at the digital library, it can t find that page. Try this Push the Cancel button in the dialog box. This usually solves the problem. Choose the Restricted version when you run Greenstone. This version only works with Netscape. Check your Internet Proxy settings and turn proxies off (use Edit preferences on Netscape or Internet options on Explorer). Are you using Netscape under Windows 2000? If so, try using Internet Explorer instead on Windows 2000 (only) there seems to be some incompatibility with Netscape. Add a ServerName localhost directive to the Apache configuration file (see Section A.2). Check the ScriptAlias directive in the Apache configuration file, making sure it comes before the Alias directive (see Section A.2). Run the program library (in the cgi-bin directory) from the DOS (or shell) prompt to generate debugging information that will help you locate the problem. Try using 127.0.0.1 in place of localhost. This reserved IP number is defined to be a loopback to your local computer.

My browser complains that it can t find main.cfg. I m having trouble using the Collector. I ve added a new user but that user can t seem to log in. Check that the Greenstone files exist and are world-readable. If you are using the Web Library, try running the library program from the command line. If it runs OK, the problem is with file permissions (see Section A.2). If not, the gsdlhome variable is probably set incorrectly in the gsdlsite.cfg configuration file (see Chapter 7, Section 7.4). Read Section 6.2 of Chapter 6. Check that the directory C:\Program Files\gsdl\etc and all its contents are globally writable (see Section A.2). 9 Table A.1 gives troubleshooting information for common problems. Other problems are documented at www.greenstone.org as they arise, and you can report problems and get help with them there too. Testing the Collector The Collector allows you to build collections containing material of your choice. To test this use the Collector to create a new collection. When you get to the Source data page, specify a directory that contains several Web pages. If you like, you can read files off the Greenstone CD-ROM. On Windows, if the CD-ROM is in drive D:, specify file://d:/collect/dlpeople On Unix, first mount the CD-ROM and then specify file:///cdrom/collect/dlpeople This directory contains the home pages of New Zealand Digital Library Project members, for you to use as a test. Check when you get to the Build Collection page that the building operation terminates correctly, and then look on your Greenstone home page for the new collection. More details appear in Chapter 6 (Section 6.1). GREENSTONE COLLECTIONS Table A.2 Demonstration collections. Several Greenstone collections are included in the software distribution; they are listed in Table A.2. Others can be downloaded from the Greenstone Web site (www.greenstone.org). Name Collection Size Description demo Greenstone Demo 7 Mb Subset of the Humanity Development Library chinese Chinese Collection 1 Mb Classic Chinese literature.

gsarch Greenstone Archives 1 Mb The Greenstone mailing list 10 hdlsub Humanity Development Library Subset 150 Mb Larger subset of the Humanity Development Library wordpdf MSWord and PDF Demo 3 Mb Papers in Microsoft Word and PDF format The Demo collection is a small subset of the Humanity Development Library with which the book opened. It illustrates that relatively rich browsing capabilities can be provided, given appropriate metadata. Of course, if you use the Collector to clone this collection, your new collection will only share these features if your files provide appropriate metadata information. This collection is included automatically when the software is installed. When installing from the Greenstone CD-ROM, the install dialog asks you whether you want to include any other collections. The Chinese collection is a small collection of classic Chinese literature. You need to change your Preferences (Presentation preferences) to Chinese to see this properly. Chinese text is displayed by recent versions of Internet Explorer (Version 5 or greater), provided you download the Chinese character set (if you have not done this, Explorer will prompt you). It can be viewed on Unix with Netscape (4.5 or greater). If you want to use Netscape under Windows, you will have to download a Chinese language module such as NJStar Communicator, and you may have to make Greenstone use a different coding (on the Preferences menu, select GBK rather than UTF-8, the default). To input Chinese you can use NJStar Communicator on Windows, or CXterm on Unix (copy and paste the Chinese characters to the search box). The Greenstone Archives collection is an archive of the Greenstone mailing list, which shows how the software can be used to search and browse e-mail archives. The Humanity Development Library Subset is a subset of the Humanity Development Library, like the Greenstone Demo collection but much larger. It contains about 250 publications books, reports, and magazines in various areas of human development (the full Humanity Development Library contains five times as many publications). The MSWord and PDF Demo contains a small collection of papers, written by various members of the New Zealand Digital Library Project, in both Microsoft Word and PDF format. The original source documents are provided for viewing. The HTML versions are used for full-text indexing, but some are poorly formatted because of deficiencies in the programs that convert documents to HTML. The other collections demonstrate various capabilities of Greenstone; the approximate amount of disk space needed for each collection is shown in Table A.2. You can add these capabilities to your own collections by using the Collector to clone the appropriate demonstration collection. ASSOCIATED SOFTWARE Table A.3 shows software that you may need to download to get your Greenstone installation running properly. Your computer may already have this software installed, while for simple installations (e.g., Windows Local

Library), none of it is necessary. 11 Table A.3 Where to get associated software. Software Apache Web server Perl programming language GCC C++ compiler GDBM database system Internet address www.apache.org Windows: www.activestate.com Unix: www.perl.com www.gnu.org www.gnu.org Apache web server. To run any version of Greenstone apart from the Windows Local Library version, you need an external Web server. Many installations, particularly larger ones, will already have a Web server. If you are using Linux, Apache may be on your installation disk but may not have been selected during the installation procedure. The Apache Web server is free, and easy to install. Perl. Greenstone uses the Perl language when building collections. For Windows, Perl is already included in the Greenstone software. Most Unix systems already have Perl installed, but if not, source code and binaries for a wide range of Unix platforms are freely available. Perl version 5 or higher is needed. GCC. The Unix version of Greenstone compiles under the GNU C++ compiler, GCC. Greenstone makes extensive use of the C++ standard template library. GDBM. All versions of Greenstone use the GNU Database Manager. It is supplied with all Windows versions of Greenstone and is installed automatically during the installation procedure. Linux systems already have GDBM, so we do not provide it. Most other Unix systems have it too, but if necessary you can download it. A.2 Setting up the Web server Now we describe how to set up your Web server to work with Greenstone. Note that all this is unnecessary when using the Windows Local Library, because this software works out of the box and does not require a Web server. We cover both the Apache Web server, which is freely available for both Windows and Unix (see Table A.3), and Microsoft s Personal Web Server (PWS) and Internet Information Services (IIS) Web server. PWS is the standard Microsoft server for Windows 95/98/Me; IIS is the standard Web server for Windows 2000 and Windows XP; Windows NT can use either. The Apache description applies equally to the Windows Web Library and Unix versions (although we use Windows-style terminology and path names); the PWS/IIS section applies only to the Windows Web Library.

12 Once you have installed your Web server, the next step is to install Greenstone. We will assume that during the install procedure you have taken the default action for each stage by clicking on the Next button. The result is that the directory C:\Program Files\gsdl is created and the Web Library binary is stored there, along with some supporting files. All Web servers use the special URL localhost to denote the computer that the Web server is running on. Thus when you install a Web server, you can get at your HTML documents by typing the URL http://localhost into a browser. If a domain name is set up on your computer, this is used instead of localhost to distinguish your computer from remote sites. For example, on the New Zealand Digital Library s computer, http://nzdl.org and http://localhost are equivalent. If you type http://nzdl.org on your computer, you will get the New Zealand Digital Library Web server, whereas if you type http://localhost you will get your own computer s Web server. APACHE WEB SERVER Setting up the Greenstone cgi-bin directory The Apache Web server is usually installed in C:\Program Files\Apache Group\Apache and is configured so that the cgi-bin directory is in the subdirectory \cgi-bin and the document root is the subdirectory \htdocs. It is reconfigured by editing the file C:\Program Files\Apache Group\ Apache\conf\httpd.conf. This is a text file: it s quite easy to read it to see how things are set up. Depending on how your computer s networking software is set up, you may have to add this line to httpd.conf: ServerName localhost If this line is not included, the system attempts to find your server s name. However, there are bugs in some versions of Windows that cause this to fail. These cause Apache to terminate as soon as you start it up. It does display an error message, but it is immediately erased and you probably can t read it. Cgi-bin is a directory from which the Web server treats documents as executable programs. Apache s ScriptAlias directive is used to create a cgibin directory. Note that this directive can make any directory a CGI executable directory it doesn t have to be called cgi-bin! Conversely a directory called cgi-bin isn t special unless ScriptAlias has been applied to it. When installed, Apache has a cgi-bin directory of C:\Program Files\Apache Group\Apache\cgi-bin. This means that if presented with the URL http://localhost/cgi-bin/hello, the Web server will attempt to execute a file called hello from within the above directory. There is one Greenstone program, called library.exe, that needs to be executed by the Web server; it in turn reads a file called the Greenstone site configuration file, or gsdlsite.cfg, which needs to be located in the same directory. The best way of arranging this is to use Apache s ScriptAlias directive to create a new cgi-bin directory. Here s the excerpt from the httpd.conf configuration file that adds C:\Program Files\gsdl\cgi-bin as an additional

cgi-bin directory: 13 ScriptAlias /gsdl/cgi-bin/ "C:/Program Files/gsdl/cgi-bin/" <Directory C:/Program Files/gsdl/cgi-bin> Options None AllowOverride None </Directory> (It s a curious fact that Apache configuration files use forward slashes in place of standard Windows backslashes.) This means that any URLs of the form http://localhost/gsdl/cgi-bin... will be sought in the directory C:\Program Files\gsdl\cgi-bin and executed by the Web server. For example, if presented with the URL http://localhost/gsdl/cgibin/hello, the Web server will attempt to retrieve the file C:\Program Files\gsdl\cgi-bin\hello and execute it. However, the URL http://localhost/cgi-bin/hello looks in Apache s regular cgi-bin directory for the file C:\Program Files\Apache Group\Apache\cgi-bin\hello and executes it, just as it did before. Document root directory The document root directory is the root of your Web server s directory structure. When installed, Apache has a document root of C:\Program Files\Apache Group\Apache\htdocs. This means that if presented with the URL http://localhost/hello.html, the Web server will attempt to retrieve a file called hello.html from within the htdocs directory. Several Greenstone files need to be read by the Web server. The simplest way to arrange this is to use the Alias directive, which is just like ScriptAlias except that it applies to ordinary Web pages, not CGI scripts. Insert these lines into your Apache configuration file, after the ScriptAlias directive, to add C:\Program Files\gsdl as an additional place to look for documents. Alias /gsdl/ "C:/Program Files/gsdl/" <Directory C:/Program Files/gsdl> Options Indexes MultiViews FollowSymLinks AllowOverride None Order allow,deny Allow from all </Directory> This means that any URLs that match the first argument of the Alias command (i.e., gsdl) are sought as files in the place corresponding to the second argument. In other words, URLs of the form http://localhost/gsdl/... will be sought as files in the directory C:\Program Files\gsdl. For example, if presented with the URL http://localhost/gsdl/hello.html, the Web server will attempt to retrieve the file C:\Program Files\gsdl\hello.html. However, the URL http://localhost/hello.html looks in the regular htdocs directory for the file C:\Program Files\Apache Group\Apache\htdocs\hello.html, just as it did before. Be sure to add the Alias directive after the ScriptAlias directive. Instructing Apache to alias /gsdl before /gsdl/cgi-bin would match the URL /gsdl/cgibin/library against the Alias directive rather than the ScriptAlias, and it would be interpreted as a request for a document rather than the result of executing a program. The outcome would be to display the binary program file as a page in the Web browser, instead of executing it.

SECURITY 14 You should be aware that if the Web Library version of Greenstone is set up as instructed earlier in this chapter, anyone will be allowed to download any file in the gsdl directory structure. This includes the index files and source documents of any collections you make, the user database, usage logs, and so on. If you are concerned about this, you can easily tighten up your Web server configuration to improve security. For the Apache server, put these lines into the configuration file instead of those given above: Alias /gsdl/ "C:/Program Files/gsdl/" <Directory "C:/Program Files/gsdl"> Order allow,deny Deny from all <FilesMatch "\.(gif jpe?g png css mov mpeg ps pdf doc rtf jar class)$"> Order allow,deny Allow from all </FilesMatch> </Directory> This means that only files whose extensions match the regular expression in the FilesMatch line may be downloaded. PWS AND IIS WEB SERVERS If the PWS or IIS Web server is not installed by default, either can easily be installed using the Add/Remove Programs control panel. If they are not already on your Windows distribution CD-ROM, you will have to download them from the Microsoft Web site (www.microsoft.com). The setup procedure for Greenstone is identical for both PWS and IIS. Invoke the Personal Web Manager and perform the following actions. 1. Select Advanced to get the Advanced Options screen. 2. Select Home and click Add. Fill out the fields as follows: Directory field: C:\Program Files\gsdl Alias field: gsdl Access permissions: Read Application permissions: None Then Click OK. This makes Greenstone files accessible to the Web server. 3. Back in Advanced Options, select gsdl and click Add. Fill out the fields as follows: Directory field: C:\Program Files\gsdl\cgi-bin Alias field: cgi-bin Access permissions: None Application permissions: Execute Then Click OK. This allows the Greenstone program library.exe to be executed by the Web server. 4. Go to the URL http://localhost/gsdl/cgi-bin/library.exe.

15 Note: With PWS and IIS you do need to specify the.exe file extension. FILE PERMISSIONS For Greenstone to work properly, access permissions for certain files must be set up appropriately. This not necessary for Windows 95/98, because these systems don t identify the owners of files. Also there is a configuration file associated with each Greenstone site. The install procedure creates a generic configuration file based on your installation choices; however, its contents can be tailored to cope with different situations. This is explained in Chapter 7 (Section 7.4). On Windows NT/2000 and Unix systems, CGI scripts don t run as normal users, because users can t be identified over the Web. Instead they run as the user who started up the Web server program (on Windows systems), or as a special user (commonly called nobody on Unix systems). Because of this, all files and directories within C:\Program Files\gsdl need to be globally readable (or at least readable by the CGI user, perhaps nobody). To test whether file permissions are set up correctly, run the program library.exe from the command line. If the files are in the right places but the permissions are set incorrectly, it will run from the command line that is, when you execute it but not from a browser that is, when the nobody user executes it. Another test is to log in as another user to see if the file permissions are specific to your original user account. To work through a Web browser, all the Greenstone directories must be globally readable. Also the C:\Program Files\gsdl\etc directory and all its contents must be globally writable (or at least writable by the CGI user, perhaps nobody). This is the directory into which the library program writes the usage log, error and initialization logs, and various user databases. If you re reluctant to make this directory globally writable, you can set permissions so that just the files errors.txt, events.txt, key.db, users.db, history.db, and usage.txt are writable by the CGI user. If file permissions are not set up correctly for C:\Program Files\gsdl\etc, you may find that user authentication and search history do not work and that no usage log (usage.txt) is generated. A.3 Managing your site Once you get your Greenstone installation up and running, you will probably want to personalize it a little: create your own home page, make a URL through which people can access it conveniently, set up certain users with the ability to build new collections, and monitor what they do. PERSONALIZING THE GREENSTONE HOME PAGE The file that generates the Greenstone home page is called home.dm and is located in the macros subdirectory of the directory into which you installed Greenstone. (The default for Windows systems is C:\Program Files\gsdl.) It is a plain text file that you will have to change to create a new home page. Instead of editing it, we recommend creating a new file say, yourhome.dm.

16 This will be like home.dm but will define package home which is the bit that does the actual work in a different way. The macro language is described in Chapter 7 (Section 7.2), and an example of how to personalize your home page is included there (Figure 7.7). Along with that example you will find explicit instructions on how to make the new file yourhome.dm. To make the system use your new home page instead of the default, first put the yourhome.dm file into the Greenstone macros directory. Then edit the main.cfg configuration file to replace home.dm with yourhome.dm in the list of macro files that are loaded at start-up. This is described in the subsection of Section 7.2 entitled Making macros work. REDIRECTING A URL TO GREENSTONE People often want to redirect a particular URL to their Greenstone installation. For example, on the New Zealand Digital Library computer, the URL http://nzdl.org (which is shorthand for http://nzdl.org/index.html) is redirected to http://nzdl.org/cgi-bin/library. The Apache Web server accomplishes this with a directive called Redirect. Like other directives, it should be placed in the Apache configuration file (normally in the directory C:\Program Files\Apache Group\Apache\conf\httpd.conf). To redirect the URL http://www.yourserver.com to your Greenstone system at http://www.yourserver.com/cgi-bin/library, put this line into httpd.conf: Redirect /index.html http://www.yourserver.com/cgi-bin/library Then people can reach your digital library system from the URL http://www.yourserver.com. Instead, if you want to redirect a URL like http://www.yourserver.com/greenstone to http://www.yourserver.com/cgibin/library, include the following in the httpd.conf file: Redirect /greenstone http://www.yourserver.com/cgi-bin/library If your computer doesn t have a domain name (like the www.yourserver.com just cited), replace www.yourserver.com by localhost. So long as the browser is running on the same machine as the Web server which it must be if your computer doesn t have a domain name this has the same effect as the redirections presented here. Instead of putting redirect directives into the file httpd.conf, you can equally well put them into a file called.htaccess within your server s document root directory. In fact, doing so has two advantages. First, changes to.htaccess take effect immediately, whereas you have to restart the Apache Web server to see the effect of changes to httpd.conf. Second, on Unix systems you usually have to be logged in as the root user to edit httpd.conf, whereas you don t to edit.htaccess. ADMINISTRATIVE FACILITY An administrative facility is included with every Greenstone installation. To access this facility, click the link on the home page.

Figure A.2 Greenstone Administration facility. 17

18 The entry page, shown in Figure A.2, gives information about each of the collections offered by the system. Note that all collections are included there may be private ones that do not appear on the Greenstone home page. With each is given its short name, full name, whether it is publicly displayed, and whether or not it is running. Clicking a particular collection s abbreviation (the first column of links in Figure A.2) brings up information about that collection, gathered from its collection configuration file and from other internal structures created for that collection. If the collection is both public and running, clicking the collection s full name (the second link) takes you to the collection itself.

Figure A.3 Information about the Women s History Excerpt collection. 19

20 As an example, the collection we built in Chapter 6 (Section 6.1) was named wohiex for Women s History Excerpt and is visible near the bottom of Figure A.2. Figure A.3 shows the information that is displayed when this link is clicked. The first section gives some information from the configuration file and the size of the collection (about 1,000 documents, about a million words, over 6 Mb). The next sections contain internal information related to the communications protocol through which collections are accessed. For example, the filter options for QueryFilter show the options and possible values that can be used when querying the collection. The Administration facility also presents configuration information about the installation and allows it to be modified. It gives access to the error logs that record internal errors and to the user logs that record usage. It enables a specified user (or users) to authorize others to build collections and add new material to existing ones. All these facilities are accessed interactively from the menu items at the left-hand side of Figure A.2. CONFIGURATION FILES AND LOGS Two files are used to configure various aspects of your Greenstone site: the site configuration file, gsdlsite.cfg, found in Greenstone s cgi-bin directory, and the main configuration file, main.cfg, in the etc directory. The site configuration file is used to tailor the software for the site where it is installed, while the main configuration file contains information that is common to the interface of all collections served by the site. The facilities these files provide are described in Chapter 7 (Section 7.4). Both can be viewed from the Greenstone administration page. The system maintains logs of activity (depending on configuration file settings): error logs, event logs, and usage logs. Again these are described in Chapter 7 (Section 7.4). The usage log is kept in the file usage.txt in the etc directory of the Greenstone file structure. When logging is enabled, every action by every user is logged. However, only the last 100 entries in the log file are displayed by the usage log link shown in Figure A.2. USER MANAGEMENT Greenstone incorporates an authentication scheme which can be used to control access to certain facilities. At present this is only used to restrict the people who are allowed to enter the Collector and perform certain administration functions. If, for a particular collection, it were necessary to authenticate users before returning information to them, this is possible too for example, documents could be protected on an individual basis so that they can only be accessed by registered users on presentation of a password. Authentication is done by requesting a user name and password, as illustrated in Figure 6.1a (Chapter 6). From the Administration page users can be listed, new ones added, and old ones deleted. The ability to do this is of course also protected: only users who have administrative privileges can add new users. It is also possible for each user to belong to either or both of two groups: administrator and colbuilder. Members of the first can add and remove users, and change their groups. Members of the second can access the facilities described earlier to build new collections and alter (and delete) existing ones.

21 When Greenstone is installed there is one user called admin who belongs to both groups. The password is set during the installation process. This user can create new names and passwords for users who belong only to the colbuilder group, which is the recommended way of giving other users the ability to build collections. User information is recorded in two databases that are placed in the Greenstone file structure. TECHNICAL INFORMATION The links under the Technical information heading show further information on the installation. The general link gives access to technical information, including the directories where things are stored. The Protocols menu item gives, for each possible protocol type, information about each of the collections supported by that protocol. Finally, user interface code (called the receptionist) uses actions to communicate the wishes of the user. These actions correspond to the CGI argument labeled a. For example, if a=status, the receptionist invokes the status action (which displays the Status page). A menu item gives access to lists of all actions supported by the system, and another leads to the arguments that these actions take.