SOLR INSTALLATION & CONFIGURATION GUIDE FOR USE IN THE NTER SYSTEM

Prepared By: Leigh Moulder, SRI International
leigh.moulder@sri.com
TABLE OF CONTENTS

Document Change Log
Solr Server Information
    Account Information
    Installation Locations
    Resources
Solr Architecture
    Features of Solr
    Solr Deployments
        Master Solr Deployment
        Local Solr Deployment
Solr Server Installation
    Gather Software
    Solr Home Directory
    Tomcat Configuration
    Solr Webapp
Solr Server Upgrade
    Gather Software
    Update Solr Files
    Solr Webapp
    Update Index
Securing Solr
    Configure Basic Authentication
Advanced Solr Security
    Configure Basic Authentication
SolrSearch Webapp
    SolrSearch Webapp Upgrade
Solr Web Portlet Installation
    Gather Software
    Installation
    Solr Web Portlet Upgrade
Appendix A
    Account Information
    Installation Locations

Solr Installation Guide 1
DOCUMENT CHANGE LOG

Release Date   Version   Notes
8/1/2011       1.0       Initial release
10/1/2011      1.1       Added details on the Solr deployment environment;
                         updated installation procedure; updated document formatting
12/7/2011      1.2       Removed reference to Nutch download
2/2/2012       1.3       Updated documentation to Solr 3.5.0
2/14/2012      1.4       Included information for advanced Solr configurations;
                         updated steps to secure Solr
2/17/2012      1.5       Improved installation steps
SOLR SERVER INFORMATION

Collect the following information before starting the installation. These values are referenced throughout this guide by the names in the "Referenced As" column.

ACCOUNT INFORMATION

Account                            Referenced As          Value
Solr Server host                   ${solr.host}
Solr Server user                   ${solr.user}
Solr Server password               ${solr.password}
Tomcat account                     ${tomcat.user}

INSTALLATION LOCATIONS

Item                               Referenced As          Value
Tomcat home                        ${catalina.home}
Tomcat base                        ${catalina.base}
Liferay deploy directory           ${deploy.dir}
Solr Home directory                ${solr.home}
Solr URL                           ${solr.url}
Solr webapp user (optional)        ${solr.web.user}
Solr webapp password (optional)    ${solr.web.password}
Solr OpenSearch URL                ${opensearch.url}

RESOURCES

Solr download:           http://www.apache.org/dist/lucene/solr/3.5.0/apache-solr-3.5.0.tgz
NTER Solr Webapp:        http://plugins.nterlearning.org/6.0.x/solr-web-6.0.6.8.war
NTER Solr OpenSearch:    http://plugins.nterlearning.org/6.0.x/solrsearch.war
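The ${...} names above are placeholders for site-specific values. They are not shell variables: names such as solr.home contain a dot, and bash rejects ${solr.home} as a bad substitution. One option is to save this guide's configuration snippets as template files and fill them in with a script.

The following is a minimal sketch of that approach. It assumes bash and a POSIX awk. The function names fill_placeholders and replace_literal are hypothetical helpers, not part of Solr, Tomcat, or NTER.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Solr, Tomcat, or NTER): fill in this
# guide's ${name} placeholders with site-specific values.
set -euo pipefail

# replace_literal NAME VALUE
# Copy stdin to stdout, replacing every literal "${NAME}" with VALUE.
# awk's index/substr are used so no character is treated as regex syntax.
replace_literal() {
  PH_KEY="\${$1}" PH_VAL="$2" awk '
    {
      key = ENVIRON["PH_KEY"]; val = ENVIRON["PH_VAL"]; out = ""
      while ((i = index($0, key)) > 0) {
        out = out substr($0, 1, i - 1) val    # text before the match + value
        $0 = substr($0, i + length(key))      # continue after the match
      }
      print out $0
    }'
}

# fill_placeholders TEMPLATE NAME=VALUE...
# Print TEMPLATE with each ${NAME} replaced by its VALUE.
fill_placeholders() {
  local template="$1"; shift
  local text pair
  text="$(cat "$template")"
  for pair in "$@"; do
    # Split on the first "=" only, so values may themselves contain "=".
    text="$(printf '%s\n' "$text" | replace_literal "${pair%%=*}" "${pair#*=}")"
  done
  printf '%s\n' "$text"
}
```

For example, you could save the solr.xml snippet from the Solr Home Directory section as solr.xml.template and then run:

    fill_placeholders solr.xml.template solr.home=/var/lib/solr core.name=insta > solr.xml

The value /var/lib/solr is the Appendix A home directory. The value insta is only an illustrative core name.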
SOLR ARCHITECTURE

Solr is an open source, enterprise-level index and search server. It uses a combination of HTTP, XML, and JSON to communicate with its clients.

FEATURES OF SOLR

In addition to its search features, one of the major benefits of a Solr server is its ability to host multiple, independent indexes. These indexes are known as cores and can hold data from separate sources. Besides hosting multiple cores, a single Solr server can also host multiple web servlets that provide access to those cores. This makes it possible to build a master Solr server that hosts the indexes and search engines for several NTER instances. Figure 1 shows the architecture of a Solr server hosting multiple cores as well as multiple web servlets.

Figure 1: Multi-core Solr Architecture

SOLR DEPLOYMENTS

This document describes two Solr deployment strategies: master and local.

MASTER SOLR DEPLOYMENT

A master Solr deployment is configured only once for the entire NTER infrastructure. It stores the full-text index as well as the local indexes of other NTER installations. A master Solr deployment consists of a Solr server configured with multiple cores, a Solr webapp, and an OpenSearch webapp. The number of cores depends on how many NTER instances the server supports: one core is needed for each NTER instance, plus one for the full-text index. Figure 1 depicts a master Solr installation hosting a core for full-text indexing and cores for two different NTER installations.

LOCAL SOLR DEPLOYMENT

A local Solr deployment must be configured for each NTER installation and stores that installation's local index. A local Solr deployment consists of a Solr server configured with a single core, a Solr webapp, and the solr-web webapp.

SOLR SERVER INSTALLATION

Note: These instructions assume that Tomcat is already installed and running correctly.

GATHER SOFTWARE

1. Download and extract the Solr binary files to the /tmp directory.
    cd /tmp
    wget http://www.apache.org/dist/lucene/solr/3.5.0/apache-solr-3.5.0.tgz
    tar xzf apache-solr-3.5.0.tgz

2. Download and extract the Liferay Solr web portlet to the /tmp directory.
    cd /tmp
    wget http://plugins.nterlearning.org/solr-web-portlet-6.0.6.8.war
    unzip solr-web-portlet-6.0.6.8.war -d solr-web

3. Download and extract the Solr OpenSearch webapp.
    cd /tmp
    wget http://plugins.nterlearning.org/solrsearch.war
    unzip solrsearch.war -d solrsearch

SOLR HOME DIRECTORY

1. Create the Solr home and core directories. These directories store the configuration files and data used by the various cores. A distinct core must be created for each index hosted by the Solr installation. Even if the installation will initially support only a single index, create a core for it anyway; this simplifies future upgrades.
    cd /
    mkdir -p ${solr.home}
    mkdir -p ${solr.home}/cores/nutch
    mkdir -p ${solr.home}/cores/nter
    mkdir -p ${solr.home}/cores/${core.name}

2. Create a ${solr.home}/solr.xml file containing the following. Set solr.contrib.dir to the fully qualified path of the contrib directory, and include a <core> element for each directory created above.
    <solr persistent="false" sharedLib="lib">
      <property name="solr.contrib.dir" value="${solr.home}/contrib"/>
      <cores adminPath="/admin/cores">
        <core name="nter" instanceDir="cores/nter"/>
        <core name="nutch" instanceDir="cores/nutch"/>
        <core name="${core.name}" instanceDir="cores/${core.name}"/>
      </cores>
    </solr>

3. For each core created, copy the example conf directory from the Solr distribution. This provides a template and starting point for further configuration.
    cd ${solr.home}
    cp -r /tmp/apache-solr-3.5.0/example/solr/conf cores/nter
    cp -r /tmp/apache-solr-3.5.0/example/solr/conf cores/nutch

4. For each core created above, copy the template solrconfig.xml file into its conf directory.
    cd ${solr.home}
    cp /tmp/solrsearch/WEB-INF/conf/solrconfig.xml cores/nter/conf
    cp /tmp/solrsearch/WEB-INF/conf/solrconfig.xml cores/nutch/conf

5. If creating an index for NTER, copy the schema.xml file from the solr-web directory to the core's conf directory.
    cd ${solr.home}/cores/nter/conf
    mv schema.xml schema.xml.orig
    cp /tmp/solr-web/WEB-INF/conf/schema.xml .

6. Copy the Nutch schema.xml file into the Nutch core's conf directory.
    cd ${solr.home}/cores/nutch/conf
    mv schema.xml schema.xml.orig
    cp /tmp/solrsearch/WEB-INF/conf/nutch_schema.xml schema.xml

7. Copy the jar files needed by Solr into the shared lib directory.
    cd ${solr.home}
    mkdir -p lib
    cp /tmp/apache-solr-3.5.0/dist/*.jar lib

8. Copy the Solr contrib directory.
    cd ${solr.home}
    cp -r /tmp/apache-solr-3.5.0/contrib .

9. Create a SOLR_HOME environment variable by adding the following line to the /etc/environment file.
    SOLR_HOME="${solr.home}"

10. Set the following ownership and permissions on the Solr home directory:
    cd ${solr.home}
    chown -R ${tomcat.user}:${tomcat.user} *
    chmod -R 755 *

At this point, the Solr configuration files are set up correctly. There is no need to create a data or index directory; these are created automatically when the Solr webapp starts.
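Steps 1 and 2 above follow a fixed pattern: one directory under cores/ for each core, and one matching <core> element in solr.xml.

The following is a minimal sketch that performs both steps for any list of core names. It assumes bash, and the function name init_solr_home is hypothetical. It also creates the shared lib directory used in step 7. It does not copy conf files, schemas, jars, or contrib; steps 3 through 8 still cover those.

```shell
#!/usr/bin/env bash
# Hypothetical helper automating Solr Home Directory steps 1-2.
set -euo pipefail

# init_solr_home SOLR_HOME CORE...
# Creates SOLR_HOME/lib and SOLR_HOME/cores/CORE for each core, then writes
# SOLR_HOME/solr.xml with one <core> element per core. SOLR_HOME should be
# an absolute path, because solr.contrib.dir must be fully qualified.
init_solr_home() {
  local home="$1"; shift
  local core
  mkdir -p "$home/lib"
  for core in "$@"; do
    mkdir -p "$home/cores/$core"
  done
  {
    printf '<solr persistent="false" sharedLib="lib">\n'
    printf '  <property name="solr.contrib.dir" value="%s/contrib"/>\n' "$home"
    printf '  <cores adminPath="/admin/cores">\n'
    for core in "$@"; do
      printf '    <core name="%s" instanceDir="cores/%s"/>\n' "$core" "$core"
    done
    printf '  </cores>\n</solr>\n'
  } > "$home/solr.xml"
}
```

For example, init_solr_home /var/lib/solr nter nutch would set up the Appendix A home directory with the two standard cores. The ownership change in step 10 still applies afterwards.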
TOMCAT CONFIGURATION

This section assumes Tomcat is running behind an Apache HTTP Server. If Tomcat is running as a standalone web server, skip this section.

APR LIBRARIES

Verify that Tomcat is using the APR (Apache Portable Runtime) native library, which improves overall performance. To check, search Tomcat's log file for a line such as the following:
    INFO: Loaded APR based Apache Tomcat Native library 1.1.19.

APACHE MOD_JK

1. Update mod_jk.conf with the following:
    JkMount /solr          ajp13_worker
    JkMount /solr/*        ajp13_worker
    JkMount /solrsearch    ajp13_worker
    JkMount /solrsearch/*  ajp13_worker

2. Optionally, redirect any requests for /opensearch to the correct webapp. Do this only if a single SolrSearch webapp is deployed on the Tomcat server.
    JkMount /solr          ajp13_worker
    JkMount /solr/*        ajp13_worker
    RedirectMatch ^/opensearch /solrsearch/opensearch
    JkMount /solrsearch    ajp13_worker
    JkMount /solrsearch/*  ajp13_worker
This simplifies the request URL from http://${solr.host}/solrsearch/opensearch to http://${solr.host}/opensearch.

3. Restart the Apache and Tomcat servers.
    /etc/init.d/apache2 restart
    /etc/init.d/tomcat6 restart

TOMCAT SERVER.XML

Depending on how Apache and Tomcat were initially deployed, Tomcat's HTTP connector on port 8080 may be disabled by default. This is common when Apache is configured as a front end to Tomcat. In some situations, however, it is desirable for Tomcat to listen on port 8080 as well. In this Solr environment, OpenSearch requests must be available on port 80 through Apache, while the ability to update and commit index changes must remain hidden from general users. To accomplish this, port 8080 is opened so that the /solr webapp can be reached directly on Tomcat.

1. Add the following to the Catalina <Service> section of ${catalina.home}/conf/server.xml:
    <Connector executor="tomcatThreadPool"
               port="8080"
               protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443"/>

2. Restart Tomcat.
    /etc/init.d/tomcat6 restart

SOLR WEBAPP

1. Add the Solr home value to Tomcat's startup routine by updating Tomcat's JAVA_OPTS property.
    vi ${catalina.home}/bin/setenv.sh
Add the following line. If a JAVA_OPTS line already exists, append the Solr home property to it. The form below keeps any value set earlier.
    JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=${solr.home}"
If the setenv.sh file did not previously exist, ensure that its permissions are set correctly.
    chmod 755 ${catalina.home}/bin/setenv.sh

2. If the Tomcat directory is not located under the user's home directory, create a symbolic link from the home directory to the Tomcat directory. This simplifies installation and later troubleshooting.
    cd ~
    ln -s ${catalina.base} tomcat

3. Copy the solr.war file to Tomcat's webapps directory.
    cd ~/tomcat/webapps
    cp /tmp/apache-solr-3.5.0/dist/apache-solr-3.5.0.war solr.war
    chown ${tomcat.user}:${tomcat.user} solr.war
    chmod 644 solr.war

4. If Tomcat is running, it extracts the solr.war file automatically and deploys the solr webapp. Tail the Tomcat log file ${catalina.base}/logs/catalina.out to verify that Solr was deployed successfully. During deployment, a data directory is created for each core.

5. To change the amount of logging information produced by the Solr webapp, create the following files:
${catalina.base}/webapps/solr/WEB-INF/classes/logging.properties
    org.apache.solr.level=WARNING

${catalina.base}/webapps/solr/WEB-INF/classes/log4j.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
      <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <!-- Only WARN and above reach the console -->
        <param name="Threshold" value="WARN"/>
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{ABSOLUTE} %-5p [%c{1}:%L] %m%n"/>
        </layout>
      </appender>
      <appender name="DRFA" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="File" value="${catalina.base}/logs/solr.out"/>
        <param name="DatePattern" value="'.'yyyy-MM-dd"/>
        <param name="Append" value="true"/>
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{ISO8601} %-5p %c{2} - %m%n"/>
        </layout>
      </appender>
      <!-- INFO and above go to solr.out; the console appender filters to WARN -->
      <category name="org.apache.solr" additivity="false">
        <level value="INFO"/>
        <appender-ref ref="console"/>
        <appender-ref ref="DRFA"/>
      </category>
    </log4j:configuration>

SOLR SERVER UPGRADE

These instructions describe how to upgrade a previous Solr installation to Solr 3.5.0. They list only the changes that must be made to an existing Solr installation.

GATHER SOFTWARE

1. Download and extract the Solr files listed in the Solr Server Installation section.

UPDATE SOLR FILES

1. Copy the updated jar files needed by Solr.
    cd ${solr.home}
    mv lib lib_1.4
    mkdir lib
    cp /tmp/apache-solr-3.5.0/dist/*.jar lib

2. Copy the updated contrib directory.
    cd ${solr.home}
    mv contrib contrib_1.4
    cp -r /tmp/apache-solr-3.5.0/contrib .

3. Update the schema.xml file for Nutch cores. Do this for each Nutch core originally created.
    cd ${solr.home}/cores/nutch/conf
    mv schema.xml schema.xml_1.4
    cp /tmp/solrsearch/WEB-INF/conf/nutch_schema.xml schema.xml

4. Update the schema.xml file for NTER cores. Do this for each NTER core originally created.
    cd ${solr.home}/cores/nter/conf
    mv schema.xml schema.xml_1.4
    cp /tmp/solr-web/WEB-INF/conf/schema.xml .

5. Update the solrconfig.xml file. Do this for every core, whether it is used for Nutch or NTER.
    cd ${solr.home}/cores/${core.name}/conf
    mv solrconfig.xml solrconfig.xml_1.4
    cp /tmp/solrsearch/WEB-INF/conf/solrconfig.xml .

6. Update the solr.xml file to include the new solr.contrib.dir property. The new file should be similar to the following:
    <solr persistent="false" sharedLib="lib">
      <property name="solr.contrib.dir" value="${solr.home}/contrib"/>
      <cores adminPath="/admin/cores">
        <core name="nter" instanceDir="cores/nter"/>
        <core name="nutch" instanceDir="cores/nutch"/>
        <core name="${core.name}" instanceDir="cores/${core.name}"/>
      </cores>
    </solr>

7. Once you have verified that Solr is running correctly, remove the old configuration and setup files.
    rm -rf ${solr.home}/lib_1.4 ${solr.home}/contrib_1.4
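Step 5 has to be repeated for every core under ${solr.home}/cores.

The following is a minimal sketch that applies step 5 to all cores at once. It assumes bash and the directory layout created earlier in this guide, and the function name upgrade_solrconfig is hypothetical. Each old file is kept under the same _1.4 suffix used above. The sketch also assumes that Tomcat is stopped while the files are replaced.

```shell
#!/usr/bin/env bash
# Hypothetical helper applying upgrade step 5 to every core.
# Assumption: Tomcat is stopped while the files are replaced.
set -euo pipefail

# upgrade_solrconfig SOLR_HOME NEW_SOLRCONFIG
upgrade_solrconfig() {
  local home="$1" new_config="$2" conf
  for conf in "$home"/cores/*/conf; do
    [ -d "$conf" ] || continue        # skip if the glob matched nothing
    if [ -f "$conf/solrconfig.xml" ]; then
      # Keep the previous file, as in step 5.
      mv "$conf/solrconfig.xml" "$conf/solrconfig.xml_1.4"
    fi
    cp "$new_config" "$conf/solrconfig.xml"
  done
}
```

Using the Appendix A value of ${solr.home}, the call would be:

    upgrade_solrconfig /var/lib/solr /tmp/solrsearch/WEB-INF/conf/solrconfig.xml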
SOLR WEBAPP

Redeploy the Solr webapp using the steps listed under Solr Server Installation, Solr Webapp.

UPDATE INDEX

Because both the solrconfig.xml files and the Nutch schema have changed, all affected cores must be reindexed. For NTER cores, reindex through the Control Panel of each NTER instance. For Nutch cores, reindexing must be done on the master NTER node.

SECURING SOLR

It is highly recommended to secure the Solr installation to prevent unauthorized access to the Solr index. By default, Solr is open and unsecured: anyone who knows the Solr server's URL can directly access and modify the index.

CONFIGURE BASIC AUTHENTICATION

Basic authentication requires clients to present a username and password to access the Solr service.

1. Add to Tomcat the user accounts that will be used to connect to Solr. Edit the ${catalina.base}/conf/tomcat-users.xml file to include:
    <tomcat-users>
      <role rolename="index_admin"/>
      <user username="${solr.web.user}" password="${solr.web.password}" roles="index_admin"/>
    </tomcat-users>

2. Configure Basic Authentication for the Solr webapp only by editing ${catalina.base}/webapps/solr/WEB-INF/web.xml. Add the following elements inside the existing <web-app> element, which is shown here for context:
    <web-app>
      <!-- Limit Tomcat's admin user to the admin page -->
      <security-constraint>
        <web-resource-collection>
          <web-resource-name>Solr Admin</web-resource-name>
          <url-pattern>/admin/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
          <role-name>admin</role-name>
        </auth-constraint>
      </security-constraint>
      <!-- Limit Tomcat's admin and index_admin roles to the update page -->
      <security-constraint>
        <web-resource-collection>
          <web-resource-name>Solr Update</web-resource-name>
          <url-pattern>/update/*</url-pattern>
          <!-- In a multi-core installation, updates go to /${core.name}/update;
               add one such pattern for each core -->
          <url-pattern>/${core.name}/update/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
          <role-name>admin</role-name>
          <role-name>index_admin</role-name>
        </auth-constraint>
      </security-constraint>
      <security-role>
        <role-name>admin</role-name>
      </security-role>
      <security-role>
        <role-name>index_admin</role-name>
      </security-role>
      <!-- Define the login configuration -->
      <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>UserDatabase</realm-name>
      </login-config>
    </web-app>

Ensure that every NTER instance connected to this Solr webapp has the correct username and password configured in its solr-spring.xml file.

ADVANCED SOLR SECURITY

This section describes how to secure a more complex Solr installation, in which a single Solr server hosts indexes for multiple organizations and institutions. In this use case, each organization stores both its local Solr index and its full-text index on the shared Solr server. Figure 2 illustrates this architecture. The master Solr server hosts a collection of cores for the master NTER installation, Institution A (InstA), and Institution B (InstB). Because each of these installations stores both its local index and its full-text index on the Solr server, each also requires its own OpenSearch webapp.

This configuration is atypical and should be set up with caution. Typically, secondary institutions rely on the master NTER for the full-text index and search capabilities instead of generating their own. However, this arrangement can be useful for test environments, or where secondary institutions are unable to build their own Solr architecture.

Figure 2: Advanced Multi-core Solr Architecture (servlets: Solr, OpenSearch, InstA OpenSearch, InstB OpenSearch; cores: Nutch, NTER M, InstA, InstA Nutch, InstB, InstB Nutch)
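In the layout shown in Figure 2, every core needs its own web.xml <security-constraint> entry. These entries are described in the Configure Basic Authentication section that follows.

The following is a minimal sketch that prints one such entry for a given core name and role. It assumes bash, and the function name update_constraint is hypothetical. The core names insta and insta_nutch are illustrative only; they are not defined elsewhere in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a web.xml <security-constraint> that limits a
# core's update handler to the admin role plus one institution role.
set -euo pipefail

# update_constraint CORE ROLE
update_constraint() {
  local core="$1" role="$2"
  cat <<EOF
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr Update ($core)</web-resource-name>
    <url-pattern>/$core/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
    <role-name>$role</role-name>
  </auth-constraint>
</security-constraint>
EOF
}

# Example: both InstA cores share Institution A's role (illustrative names).
for core in insta insta_nutch; do
  update_constraint "$core" institution_a_index_admin
done
```

The printed blocks belong inside the existing <web-app> element. Each role they reference still needs its own <security-role> entry.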
CONFIGURE BASIC AUTHENTICATION

As with the simpler setup described under Securing Solr above, basic authentication requires a username and password to access a particular Solr instance. Unlike that setup, however, access is now secured separately for each institution rather than for the Solr site as a whole.

1. Add to Tomcat the user accounts that will be used to connect to Solr. Edit the ${catalina.base}/conf/tomcat-users.xml file to include an administrative role for each institution as well as a unique administrative user.
    <tomcat-users>
      <role rolename="index_admin"/>
      <role rolename="institution_a_index_admin"/>
      <role rolename="institution_b_index_admin"/>
      <user username="${solr.web.user}" password="${solr.web.password}" roles="index_admin"/>
      <user username="institution_a_admin" password="${institution_a_password}" roles="institution_a_index_admin"/>
      <user username="institution_b_admin" password="${institution_b_password}" roles="institution_b_index_admin"/>
    </tomcat-users>

2. Configure Basic Authentication to the Solr webapp for each institution by editing ${catalina.base}/webapps/solr/WEB-INF/web.xml. Ensure that each role created above is declared in a <security-role> entry, and that a <security-constraint> entry is created for each core with the appropriate core admin role.
    <web-app>
      <!-- Limit Tomcat's admin user to the admin page -->
      <security-constraint>
        <web-resource-collection>
          <web-resource-name>Solr Admin</web-resource-name>
          <url-pattern>/admin/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
          <role-name>admin</role-name>
        </auth-constraint>
      </security-constraint>
      <!-- Limit access to the update page for each institution;
           repeat this constraint for each core -->
      <security-constraint>
        <web-resource-collection>
          <web-resource-name>Solr Update</web-resource-name>
          <url-pattern>/${core.name}/update/*</url-pattern>
        </web-resource-collection>
        <auth-constraint>
          <role-name>admin</role-name>
          <role-name>${core.index.admin}</role-name>
        </auth-constraint>
      </security-constraint>
      <security-role>
        <role-name>admin</role-name>
      </security-role>
      <security-role>
        <role-name>index_admin</role-name>
      </security-role>
      <security-role>
        <role-name>institution_a_index_admin</role-name>
      </security-role>
      <security-role>
        <role-name>institution_b_index_admin</role-name>
      </security-role>
      <!-- Define the login configuration -->
      <login-config>
        <auth-method>BASIC</auth-method>
        <realm-name>UserDatabase</realm-name>
      </login-config>
    </web-app>

3. Ensure that each institution configures its local solr-web portlet and Nutch configuration to use the correct username and password.

SOLRSEARCH WEBAPP

The SolrSearch webapp produces OpenSearch-compliant results from the full-text index. It is needed only on the master Solr server that hosts the full-text index, and it should be configured to point to the nutch core created previously.

1. Copy the solrsearch.war file to Tomcat's webapps directory.
    cp /tmp/solrsearch.war ${catalina.base}/webapps

2. Wait for Tomcat to deploy the webapp automatically, and then edit the ${catalina.base}/webapps/solrsearch/WEB-INF/classes/META-INF/solr-spring.xml file. This file configures the type of Solr server the OpenSearch webapp uses, as well as the details of that server. Modify the username, password, and URL constructor arguments of the com.sri.nter.solr.server.BasicAuthSolrServer bean. If the remote server does not require a username or password, remove those arguments:
    <bean id="com.sri.nter.solr.server.BasicAuthSolrServer"
          class="com.sri.nter.solr.server.BasicAuthSolrServer">
      <constructor-arg type="java.lang.String" value="${solr.web.user}"/>
      <constructor-arg type="java.lang.String" value="${solr.web.password}"/>
      <constructor-arg type="java.lang.String" value="${solr.url}"/>
    </bean>
Or:
    <bean id="com.sri.nter.solr.server.BasicAuthSolrServer"
          class="com.sri.nter.solr.server.BasicAuthSolrServer">
      <constructor-arg type="java.lang.String" value="${solr.url}"/>
    </bean>
Ensure that the com.sri.nter.solr.searcher.server entry references the Solr server bean configured above.

3. Logging is already configured for the solrsearch webapp. To modify the default settings, update the ${catalina.base}/webapps/solrsearch/WEB-INF/classes/logging.properties and log4j.xml files accordingly.

Each OpenSearch webapp can be mapped to only a single Solr core. To host multiple OpenSearch webapps on the same Tomcat instance, rename the WAR file before deployment. For example, to create a second OpenSearch webapp for a second NTER instance:

1. Copy the solrsearch.war file to Tomcat's webapps directory under a new name.
    cd ${catalina.base}/webapps
    cp /tmp/solrsearch.war nter2solrsearch.war

2. Follow steps 2 and 3 above, substituting nter2solrsearch for solrsearch in the paths.

SOLRSEARCH WEBAPP UPGRADE

Because of the way Tomcat manages jar files when a webapp is reinstalled, it is highly recommended to undeploy the solrsearch webapp and redeploy it from scratch instead of attempting an in-place upgrade.

1. Stop the Tomcat server, undeploy the solrsearch webapps, and start Tomcat again. Remove any renamed copies (for example, nter2solrsearch and nter2solrsearch.war) the same way.
    cd ${catalina.base}/webapps
    /etc/init.d/tomcat6 stop
    rm -rf solrsearch solrsearch.war
    /etc/init.d/tomcat6 start

2. Follow the instructions for a clean installation of the SolrSearch webapp.
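When several renamed copies of the OpenSearch webapp are deployed (for example, solrsearch and nter2solrsearch), the undeploy-and-redeploy upgrade has to be repeated for each copy.

The following is a minimal sketch that removes each copy's exploded directory and WAR, then copies in the new WAR under the same name. It assumes bash, and the function name redeploy_solrsearch is hypothetical. It also assumes Tomcat has already been stopped, as in step 1. Each copy's solr-spring.xml still has to be edited again after Tomcat redeploys it.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the SolrSearch undeploy/redeploy upgrade.
# Assumption: Tomcat is stopped before this runs and started afterwards.
set -euo pipefail

# redeploy_solrsearch WEBAPPS_DIR NEW_WAR NAME...
redeploy_solrsearch() {
  local webapps="$1" war="$2"; shift 2
  local name
  for name in "$@"; do
    # Remove the exploded webapp and the old WAR; ${webapps:?} aborts
    # instead of deleting from "/" if the directory argument is empty.
    rm -rf "${webapps:?}/$name" "$webapps/$name.war"
    # Tomcat deploys NAME.war under the context path /NAME on startup.
    cp "$war" "$webapps/$name.war"
  done
}
```

Using the Appendix A Tomcat base, the call would be:

    redeploy_solrsearch /var/lib/tomcat6/webapps /tmp/solrsearch.war solrsearch nter2solrsearch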
SOLR WEB PORTLET INSTALLATION

NTER (Liferay) uses this portlet to connect to an existing Solr index that has already been configured and is running.

GATHER SOFTWARE

1. Download the Liferay Solr web portlet to the /tmp directory.
    cd /tmp
    wget http://plugins.nterlearning.org/solr-web-portlet-6.0.6.8.war

INSTALLATION

1. Copy solr-web-portlet-6.0.6.8.war to the Liferay deploy directory.
    cd ${deploy.dir}
    cp /tmp/solr-web-portlet-6.0.6.8.war .

2. Wait for Liferay to deploy the portlet automatically, and then edit the ${catalina.base}/webapps/solr-web-portlet/WEB-INF/classes/META-INF/solr-spring.xml file. If the Solr server does not use Basic Authentication, configure the bean with only the core URL:
    <bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServer"
          class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
      <constructor-arg type="java.lang.String" value="${solr.url}/solr/${core.name}"/>
    </bean>
If the Solr server uses Basic Authentication, include the username and password constructor arguments before the URL instead:
    <bean id="com.liferay.portal.search.solr.server.BasicAuthSolrServer"
          class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
      <constructor-arg type="java.lang.String" value="${solr.web.user}"/>
      <constructor-arg type="java.lang.String" value="${solr.web.password}"/>
      <constructor-arg type="java.lang.String" value="${solr.url}/solr/${core.name}"/>
    </bean>

3. Restart Tomcat to ensure the changes take effect.
    /etc/init.d/tomcat6 restart

SOLR WEB PORTLET UPGRADE

Because of the way Tomcat manages jar files when a webapp is reinstalled, it is highly recommended to undeploy the solr-web portlet and redeploy it from scratch instead of attempting an in-place upgrade. Use Liferay's Control Panel to undeploy the portlet, and then redeploy it by following the steps above.
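Step 2 of the portlet installation chooses between two forms of the same bean, depending on whether the Solr server uses Basic Authentication.

The following is a minimal sketch that prints the appropriate form. It assumes bash, and the function name solr_server_bean is hypothetical. It reuses the bean id and class name from step 2. It does not XML-escape its arguments, so values containing &, <, or double quotes would need escaping first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BasicAuthSolrServer bean for the Solr web
# portlet's solr-spring.xml. Arguments are inserted without XML escaping.
set -euo pipefail

# solr_server_bean CORE_URL [USER PASSWORD]
solr_server_bean() {
  local url="$1" user="${2:-}" password="${3:-}"
  local cls="com.liferay.portal.search.solr.server.BasicAuthSolrServer"
  printf '<bean id="%s" class="%s">\n' "$cls" "$cls"
  if [ -n "$user" ]; then
    # Basic Authentication: username and password come before the URL.
    printf '  <constructor-arg type="java.lang.String" value="%s"/>\n' "$user"
    printf '  <constructor-arg type="java.lang.String" value="%s"/>\n' "$password"
  fi
  printf '  <constructor-arg type="java.lang.String" value="%s"/>\n' "$url"
  printf '</bean>\n'
}
```

For example, solr_server_bean http://localhost:8080/solr/nter prints the unauthenticated form. Passing a username and password as the second and third arguments prints the authenticated form. The localhost URL is only an illustrative value.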
APPENDIX A

The following configuration settings were used for search.nterlearning.org.

ACCOUNT INFORMATION

Account                        Value
Solr Server host               search.nterlearning.org
Solr Server user / password    root /
Tomcat account                 tomcat6

INSTALLATION LOCATIONS

Item                   Referenced As        Value
Tomcat home            ${catalina.home}     /usr/share/tomcat6
Tomcat base            ${catalina.base}     /var/lib/tomcat6
Solr Home directory    ${solr.home}         /var/lib/solr (maps to /mnt/solr)
Solr URL               ${solr.url}          http://search.nterlearning.org/solr
Solr OpenSearch URL    ${opensearch.url}    http://search.nterlearning.org/solrsearch/opensearch