Network Activity D - Developing and Maintaining Databases
Report D3.2.2 User Interface implementation
Patricia KELBERT
MNHN Paris, BGBM Berlin
July 2006
Table of Contents

1 Introduction
2 Material and methods
  2.1 Material
  2.2 Methods
    2.2.1 Programming
      2.2.1.a Templates
    2.2.2 Search
      2.2.2.a Connection and query to the Cache Database
      2.2.2.b Connection and query to the original provider
    2.2.3 Multithreading
3 Results
  3.1 UI
  3.2 Search for units
    3.2.1 Advanced Search
    3.2.2 Simple Search
  3.3 Results
    3.3.1 Preliminary Results
      3.3.1.a Watching the results
      3.3.1.b Refining the query
    3.3.2 Available Results
      3.3.2.a See details for a single unit
      3.3.2.b See details for a selection of units
      3.3.2.c Download details for a selection of units
    3.3.3 Detailed Results
  3.4 Internationalisation
    3.4.1 Session storage
      3.4.1.a Cookies
      3.4.1.b Other session preferences
4 Conclusion
  4.1 Portability & adaptability: GBIF-DE Botany
  4.2 Thesaurus
  4.3 Feedback
  4.4 To do
Illustration Index

Index Page
Extract of the advanced search page for metadata
Extract of the metadata browser page
Advanced search panel
Fields list available for the advanced search
Query fields for the advanced search
Query field for the simple search
Preliminary results for the search Malus* (extract)
Refining query with a selection of countries
Preliminary results after refining the query with a selection of countries
Summary box
Extract of the available results page
Confirmation page for downloading a selection of full details
Detailed results page for the unit Malus Sylvestris 100108842
Detailed results from the cache because of an unavailability of the original provider
Selected units (shopping basket approach)
Preference's page
Extract of the GBIF-DE botany interface, adapted from the BioCASE Portal v.2
1 Introduction

The goal of NA-D 3.2.2 is to implement the User Interface (UI) for the new BioCASE Portal. The implementation has to be based on the architecture study, which covers support for the BioCASE, DiGIR and TAPIR protocols, cache query mechanisms and language module support. There must also be a clear separation between the unit-level query system and the metadata-level system, which has to be re-used.

The previous report described the work done on the UI designs and architecture. In brief, we opted for the TAO [1] design for the web user interface, implemented through CSS files only.

The tasks to be carried out were:
- developing the web pages with CherryPy, an object-oriented web development framework written in Python
- developing methods to query the databases
- developing methods to handle the results from the Cache Databases and from the original providers (display and download)
- adapting an existing application to translate templates into different languages

2 Material and methods

2.1 Material

The programming language used is Python [2] (version 2.4). Python is a dynamic, interpreted, object-oriented programming language developed as an open source project.

The code was written within Eclipse, an open source application framework commonly used to build software. The version used is Eclipse SDK 3.1.2 [3], combined with the PyDev [4] plug-in (version 1.0.3) for Python development.

The whole code has been implemented within the Python web development framework CherryPy [5] (version 2.2.1), combined with Kid [6] (version 0.9.2) templates.

To deploy the new BioCASE portal, the Apache HTTP Server 2.0.55 [7] was chosen together with the servlet container Tomcat [8]. Connectors such as mod-python2.4 [9] and jakarta-tomcat-connectors-1.2.15 [10] make it possible to run CherryPy and Tomcat behind the Apache HTTP Server.
[1] TAO design snapshot: http://ww2.biocase.org/synth-gui/gui-design/tao/
[2] Python: http://www.python.org/
[3] Eclipse Project: http://www.eclipse.org/
[4] PyDev Project: http://pydev.sourceforge.net/
[5] CherryPy: http://www.cherypy.org
[6] KID: http://kid.lesscode.org/
[7] Apache HTTP Server: http://httpd.apache.org/
[8] Apache Tomcat: http://tomcat.apache.org/
[9] Mod_python: http://www.modpython.org/
[10] Connectors: http://tomcat.apache.org/connectors-doc/
2.2 Methods

2.2.1 Programming

2.2.1.a Templates

A template engine (also called a template parser or template processor) is software that processes an input text (the template) to produce one or more output texts, here the pages of a website. Processing generally consists of replacing specific sequences of text with data provided by a model or resulting from more complex operations. It separates the code (here Python) from the HTML. CherryPy can work with several templating packages, such as Kid.

    ...
    tmpl = kid.Template(name=tmpl_name, baseurl=self.baseurl, title=title,
                        body=body, trail=trail, navlist=navlist, **kwargs)
    return tmpl.serialize(output='html')
    ...

Table 1. Extract from the Python code

    ...
    <form action="${baseurl}search/units/simpleSearch/query1" method="post" name="unitFrm">
    ...

Table 2. Extract from the template code

    ...
    <FORM ACTION="/synth-ui/search/units/simpleSearch/query1" METHOD="post" NAME="unitFrm">
    ...

Table 3. Extract from the generated HTML code, with self.baseurl = /synth-ui

2.2.2 Search

Given the SYNTHESYS cache database and the original providers system, the search for full unit details has to be done in at least two steps:
1. connection and query to the SYNTHESYS Cache Database
2. connection and query to the original provider
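The substitution step shown in Tables 2 and 3 can be illustrated with a minimal sketch. It uses `string.Template` from the Python standard library purely as a stand-in for Kid (an assumption: Kid templates are XML-based and have their own serialisation, which is not reproduced here). The function name `render_form` and the handling of the trailing slash are hypothetical.

```python
from string import Template

# Stand-in for the template line of Table 2; ${baseurl} is the placeholder.
FORM_TEMPLATE = Template(
    '<form action="${baseurl}search/units/simpleSearch/query1" '
    'method="post" name="unitFrm">'
)


def render_form(baseurl):
    """Replace the ${baseurl} placeholder with the value supplied by the code."""
    # Table 3 shows a slash between the base URL and the path, so a
    # trailing slash is ensured here (an assumption about the real setup).
    if not baseurl.endswith("/"):
        baseurl += "/"
    return FORM_TEMPLATE.substitute(baseurl=baseurl)


print(render_form("/synth-ui"))
```

The Python code only supplies values; the markup stays in the template, which is the separation of code and HTML described above.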
2.2.2.a Connection and query to the Cache Database

The SYNTHESYS Cache is a MySQL database. The connection to the database is made with the MySQLdb module:

    self.connection = MySQLdb.connect(db=dbName, user=user, passwd=passwd, host=host, port=port)

Table 4. Connection to the MySQL database

The queries are built according to the requested parameters. In other words, the full data are not retrieved from the database every time, only what the user asked for: tables are not joined if the information in their columns is not relevant to the current query. This saves time and resources.

The connection properties are set as variables: connecting to another database does not require a modification of the Python code, only of a plain-text configuration file.

2.2.2.b Connection and query to the original provider

The connections and queries to the original providers are performed through the wrapper software. The request (in XML format) sent to the provider depends on the provider's protocol (BioCASE, DiGIR or TAPIR). This part of the code is based on existing code developed for the bps2 query tool.

The connection and query to the original provider require:
- the protocol's name: BioCASe, DiGIR or TAPIR
- the schema with its concepts, for example:
    ABCD 2.06
        unitid: "/DataSets/DataSet/Units/Unit/UnitID"
        collectionid: "/DataSets/DataSet/Units/Unit/SourceID"
        institutionid: "/DataSets/DataSet/Units/Unit/SourceInstitutionID"
    darwin2
        unitid: "CatalogNumber"
        collectionid: "CollectionCode"
        institutionid: "InstitutionCode"
- the resource's URL
- the query itself, wrapped in XML
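The idea of joining only the tables a query actually needs (section 2.2.2.a) can be sketched as follows. The table names, column names and the function `build_query` are all hypothetical, since the cache schema is not given in this report; the `%s` placeholder follows the parameter style used by MySQLdb.

```python
# Hypothetical table and column names; the real cache schema is not
# described in this report.
JOINS = {
    "country": "JOIN country c ON c.unit_fk = u.id",
    "multimedia": "JOIN multimedia m ON m.unit_fk = u.id",
}
COLUMNS = {
    "name": "u.taxon_name",
    "country": "c.country_name",
    "multimedia": "m.url",
}


def build_query(requested, name_pattern):
    """Return (sql, params), joining only the tables the request needs."""
    select = [COLUMNS[field] for field in requested]
    joins = [JOINS[field] for field in requested if field in JOINS]
    sql = "SELECT {} FROM unit u {} WHERE u.taxon_name LIKE %s".format(
        ", ".join(select), " ".join(joins)
    )
    # Collapse the double space left when no join is needed.
    sql = " ".join(sql.split())
    # '*' is the portal's wildcard; SQL LIKE uses '%'.
    return sql, (name_pattern.replace("*", "%"),)


sql, params = build_query(["name", "country"], "Malus*")
print(sql, params)
```

The resulting pair would be passed to a cursor as `cursor.execute(sql, params)`, so that the user's text is never interpolated into the SQL string itself.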
2.2.3 Multithreading

A process is a program that is currently executing. Every process has at least one thread running within it; a thread is a path of execution through the program. With only one thread, a program can do only one thing at a time. Threads improve performance and functionality by allowing a program to carry out several tasks concurrently.

A thread pool is very useful for making many connections at the same time. The thread pool is a queue of idle threads: a thread is assigned to a task and, when the task is completed, is recycled for use in another task. Because the threads in the pool are already up and running, response time is usually reduced. The number of threads in the pool can be capped to prevent a sudden overload of the application.

To prevent multiple simultaneous connections to the SYNTHESYS cache database, a bounded semaphore is used to lock access to the database. The lock is acquired by a thread (which may come from the thread pool) and released once the query has been executed.
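The combination of a capped thread pool and a bounded semaphore can be sketched as below. This is an illustration written for a current Python 3 interpreter using `concurrent.futures`, which did not exist in the Python 2.4 used for the portal, so it is not the portal's own code. The pool size, the function names and the placeholder "query" are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 4  # upper limit on the pool size (illustrative value)
db_lock = threading.BoundedSemaphore(1)  # one cache query at a time

active = 0      # number of threads currently inside the guarded section
max_active = 0  # highest value 'active' ever reached


def query_cache(name):
    """Pretend to query the cache while holding the semaphore."""
    global active, max_active
    with db_lock:  # acquired here, released when the block exits
        active += 1
        max_active = max(max_active, active)
        result = "units for " + name  # placeholder for the real query
        active -= 1
    return result


def run_queries(names):
    """Run the queries on pool threads, which are reused between tasks."""
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        return list(pool.map(query_cache, names))


results = run_queries(["Malus", "Abies", "Quercus"])
print(results, max_active)
```

The counters exist only to show that no two threads are ever inside the guarded section together; a real query would take their place.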
3 Results

3.1 UI

The new user interface for the BioCASE Portal 2 can be found at: http://search.biocase.org/synth-ui.

The menu provides fast access to:
1. the different search types (Unit, Metadata, Map (under development) and Itineraries (under development))
2. the registry (providers, institutions and collections)
3. related information (about the project, acknowledgments...)

Illustration 1: Index Page

The Index page provides direct access to the unit-level simple search and to the metadata-level simple search.
The search on metadata is redirected to the old portal, which has been re-used and restyled with the new TAO design.

Illustration 2: Extract of the advanced search page for metadata

The three kinds of metadata search therefore remain available:
- Basic Search, now renamed Simple Search
- Advanced Search
- Browser
Illustration 3: Extract of the metadata browser page

The innovation in Portal V2 concerns the unit-level search.

3.2 Search for units

Two kinds of name-based search are available:
- the simple search, where the user only has to enter a taxonomic name
- the advanced search, where the user can refine the query from the start by selecting fields

3.2.1 Advanced Search

The advanced search is used for specific queries: the user fills in selected fields and then performs a precise search.
Illustration 4: Advanced search panel

The following filters can be applied:
- only data with multimedia content
- record basis: observation, specimen or any

Clicking on Choose field expands the list of fields the user can pick from:

Illustration 5: Fields list available for the advanced search
For example, if the user is looking for Malus* gathered or observed in France or in Germany, the interface will look like this:

Illustration 6: Query fields for the advanced search

Note: different fields are combined with an AND operator, whereas comma-separated values within a field are combined with an OR operator.

3.2.2 Simple Search

The simple search only involves a taxonomic name. The asterisk (*) can be used as a wildcard.

Illustration 7: Query field for the simple search

By clicking on Search, the user is redirected to the results pages.

3.3 Results

The results are displayed in at least three steps:
1. the preliminary results: the list of taxonomic names that match the user's query
2. the available results: the list of units corresponding to the user's selection (no more than 10 000 units)
3. the detailed results: the full details from the original provider if no error occurred, or the details available in the cache if the original provider was not available
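The AND/OR rule of the advanced search (section 3.2.1) and the wildcard of section 3.2.2 can be sketched as a small function that turns the filled-in fields into a parameterised filter. The function name `build_where` and the column names are hypothetical; the column names are assumed to come from the application's own field list, never from user input.

```python
def build_where(fields):
    """Build a parameterised WHERE clause from {column: "a, b, ..."}.

    Values inside one field are OR-ed; different fields are AND-ed.
    """
    clauses, params = [], []
    for column, text in fields.items():
        values = [v.strip() for v in text.split(",") if v.strip()]
        if not values:
            continue  # an empty field adds no constraint
        ors = " OR ".join("{} LIKE %s".format(column) for _ in values)
        clauses.append("(" + ors + ")")
        # '*' is the portal wildcard, '%' the SQL LIKE wildcard.
        params.extend(v.replace("*", "%") for v in values)
    return " AND ".join(clauses), params


where, params = build_where(
    {"taxon_name": "Malus*", "country": "France, Germany"}
)
print(where)
print(params)
```

With the Malus* example of Illustration 6, the name constraint and the country constraint are joined by AND, while France and Germany are alternatives within the country constraint.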
3.3.1 Preliminary Results

This first results page makes it possible to refine the current query. The page is divided into two main parts:
- a summary at the top
- the results, formatted as a table, which take up the rest of the page

Illustration 8: Preliminary results for the search Malus* (extract)
3.3.1.a Watching the results

The search returned 99 taxonomic names matching Malus* (i.e. starting with Malus), corresponding to 24883 units. 4785 of them are described as Observation in the database and 24 of them are Specimens.

For the current query, it is possible to see the higher taxa, the genera, the families, the common names, the countries (of gathering/observation), the collection IDs, the institution IDs or the record basis by clicking on the magnifying glass or on the column's name. Clicking on the calculator counts and displays the number of distinct names for the selected category.

It is also possible to download the list of names for a category by clicking on the download box. A text file is created on the server and sent to the user.

Only 10 000 units can be displayed and handled in the next step: the user may have to refine the query to limit the number of results or to search only for specific data.

3.3.1.b Refining the query

The user can refine the query by ticking boxes within the table's columns and clicking on the Refine query button. This action always displays the list of scientific names that match the new query.

To refine the query to the units of Malus* gathered or observed in Austria, Bulgaria, Denmark, Finland, France or Germany, the user has to tick the corresponding boxes.

Illustration 9: Refining query with a selection of countries
When the user clicks on Refine query, the results page is reloaded:

Illustration 10: Preliminary results after refining the query with a selection of countries.

The number of units has decreased: only 4854 units of Malus* from our selection of gathering/observation countries exist in the database. By clicking on the calculators, we get the number of distinct names for each category:
Illustration 11: Summary box

As there are fewer than 10 000 units, we can select the 12 scientific names and see the corresponding results (Select All and then See Results). A new page is loaded, with more details coming from the cache.

Note: it would also have been possible to select our 6 countries and click directly on See Results.

3.3.2 Available Results

Illustration 12: Extract of the available results page
There is again a summary box, showing the query executed, the number of hits returned, the number of records with multimedia content and a link to a map of these units. The headers of the table (here TaxonName, Country, CollectionID and Institution ID) can be selected in the Preference's page.

There are three different ways to access the full details for units:
- see the details for a single unit
- see the details for a group of units
- download the details for a selection of units

3.3.2.a See details for a single unit

To see the details from the original provider for a single unit, the user only needs to click on the taxonomic name of interest. Each name is linked to the triplet unitid-collectionid-institutionid. This action loads a new page, called Detailed Results.

3.3.2.b See details for a selection of units

The user can select some units by ticking the boxes on the left, and then click on »» See the details for the selected units: the detailed results page is loaded and a kind of shopping basket is added.

3.3.2.c Download details for a selection of units

The user can select some units by ticking the boxes on the left, and then click on »» Download the details for the selected units. A confirmation page is displayed:

Illustration 13: Confirmation page for downloading a selection of full details.
The user can cancel or confirm this choice. As the data are retrieved in a background task, which can take some time, it is possible to receive an email when the job is done. A check is used to prevent abuse by robots (here the user has to copy the text contained in the picture, Abies alba, into the text field).

Note: giving an email address is optional.

Once the form is validated, the ZIP file is created and filled with the unformatted detailed data (i.e. pure XML).

3.3.3 Detailed Results

If the original provider is available and returns a well-formed document with the details, these data are displayed.

Illustration 14: Detailed results page for the unit Malus Sylvestris 100108842.

If an error occurred with the original provider (provider unavailable, document not well formed, time-out, no data corresponding to the triple ID...), all the data stored in the cache database are displayed, with a notice stating that they do not come from the original provider.
Illustration 15: Detailed results from the cache because of unavailability of the original provider

With the shopping-basket approach, the user does not need to go back to the previous page: clicking on a name stored in the selection list is enough to see the details for another unit:

Illustration 16: Selected units (shopping basket approach)

3.4 Internationalisation

The portal can currently be used in three languages:
- English
- German
- French
The templates are generated once, and the application calls the template associated with the current language. The different templates are generated by an ANT application that is normally used to translate the ABCD and DarwinCore XSLT into different languages. This translation tool is based on a text file of keyword = value pairs and a build.xml file that performs the translation. For the moment, the translated files have to be copied manually into the appropriate folder.

The portal's language can be changed by going to the Preference's page, selecting a language from the list and confirming the choice.

Illustration 17: Preference's page

These preferences are stored for each session.
3.4.1 Session storage

3.4.1.a Cookies

The user's preferences are stored in a cookie. On the next visit, the portal is displayed in the language already stored. The cookies are handled through CherryPy and are associated with unique session identification numbers.

Note: the user's browser has to accept cookies.

3.4.1.b Other session preferences

Some other preferences can be stored in the session's cookie:
- the default grouping: which column is highlighted and displayed first on the Preliminary Results page
- the default unit search: which kind of unit search is loaded (either the Simple Search or the Advanced Search)
- the default metadata search: which kind of metadata search is loaded (the Browser, the Simple Search or the Advanced Search)
- the default result display: which columns are displayed on the Available Results page. The taxonomic name is always displayed, as it links the unit's triple ID with its name.
- the number of results per page: the number of units displayed per page in the Available Results, set to 20 by default

When the session expires, the preferences are reset to their default values.
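The behaviour described in this section (preferences tied to a session identifier carried by a cookie, with a reset to the defaults when the session is gone) can be sketched with the standard library alone. This does not use CherryPy's session API; the cookie name `session_id`, the function `get_preferences` and every default except the 20 results per page are assumptions.

```python
from http.cookies import SimpleCookie

# Only the value 20 comes from the report; the other defaults are assumed.
DEFAULTS = {"language": "en", "unit_search": "simple", "results_per_page": 20}


def get_preferences(cookie_header, sessions):
    """Return the stored preferences for the session, or the defaults.

    sessions maps a session identifier to the preferences the user changed.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    # An unknown or expired identifier yields no stored preferences.
    stored = sessions.get(morsel.value, {}) if morsel else {}
    prefs = dict(DEFAULTS)
    prefs.update(stored)
    return prefs


sessions = {"abc123": {"language": "de"}}
print(get_preferences("session_id=abc123", sessions))
print(get_preferences("session_id=expired", sessions))
```

Storing only the changed values and merging them over the defaults means that a new preference can be added later without invalidating existing sessions.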
4 Conclusion

The goal of this four-month task was to develop the new portal for BioCASE, based on the architecture study covering support for the BioCASE, DiGIR and TAPIR protocols, cache query mechanisms and language module support. The unit-level query system and the metadata-level system, which had to be re-used, had to be clearly separated. The portal also had to be developed in a generic way, so that it can easily be re-used for other search portals based on the same database structure.

4.1 Portability & adaptability: GBIF-DE Botany

The portal has been adapted for the new GBIF-DE Botany portal: for the design, only the CSS files have been modified. The database structure of the cache is identical to that of the SYNTHESYS cache database, except for the table concerning the countries (the cache is filtered for specimens and observations coming from Germany). Accordingly, some parts of the code have been commented out (disabled).

Illustration 18: Extract of the GBIF-DE botany interface, adapted from the BioCASE Portal v.2

4.2 Thesaurus

TOQE is an acronym for Thesaurus Optimized Query Enhancer. TOQE is a service that may be integrated into search engines or any kind of query-processing pipeline. Depending on the desired mode of operation, it expands the query using the thesaurus. TOQE itself is implemented as a web service: a client interacts with TOQE, which then queries a database backend and returns the requested data. The intended design is thus that TOQE acts as a wrapper for a thesaurus database. This web service has been implemented by Niels Hoffmann and will be linked with the new portal.
4.3 Feedback

A bug tracker has been installed by the Edinburgh group. The portal's users will be invited to add comments and report bugs on this website: http://193.62.154.87/biocase

4.4 To do

Some improvements are already planned, such as:
- auto-completion of the search fields: as the user types, the application will offer suggestions for countries, taxonomic names..., using AJAX technologies
- selection of check boxes across several results pages
- sorting the column contents of the results pages in alphabetical order (ascending or descending)
- the possibility to download generic data for queries that returned more than 10 000 units
- linking the thesaurus with the portal