SEARCHING BIOTECHNOLOGY INFORMATION IN THE 2010s: Section II (Databases & Search Strategies)
Luca Falciola, IP Manager, Promethera Biosciences
Sardegna Ricerche-Univ. Cagliari (Sep. 15th, 2014)
Databases & Biotechnology: A foreword
Covering even a limited number of databases is practically impossible, so a selection is required according to a few criteria:
- Free access (at most, requiring registration at a website with a user/password and an e-mail to get full access to services; for this purpose it is better to use a separate free e-mail account, dedicated only to this and to receiving Tables of Contents, updates, etc.)
- Overall positive reputation, importance, and good «search experience» even for occasional users
This selection can be easily expanded for specific objectives by:
- Searching the structured repertory on the Nucleic Acids Res. (NAR) website and its yearly update
- Combining search topics in Google and/or Pubmed
- Exploring the NCBI and EBI websites
DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
  - Pubmed
  - HighWire
  - Publishers website
2 Patent Literature
3 Chemical Structure & Biological Sequences
4 Metabases
Scientific Literature: Introduction
- The databases of scientific literature are several, mostly thematic, and Pubmed has a major role in life sciences
  - More than 1.2 million entries for year 2013, for a total of more than 25 million
  - Considered by many as the most complete database
- This leadership should not overshadow other resources that, at different levels, may be competitive for identifying relevant literature
  - Commercial ones (EMBASE, SCISEARCH, SciVerse, BIOSIS, etc.)
  - Databases covering a large panel of publishers to promote the purchase of articles, which provide full-text search features or other advanced search / push services
- The full-text vs. indexing/completeness comparison is actually a main topic
Pubmed: Introduction
Pubmed offers almost everything you need, with the exception of full-text search:
- A well-organized help page including links to YouTube and other tutorials
- Access to the large panel of NCBI services, as summarized in this guide and this NAR paper
- Sign-in page for accessing an even larger panel of features
- Guides to other literature databases, the NCBI digital library, and the MeSH system
Pubmed: Advanced Search and Search History
Both features are on the same page, which can be maintained or even saved
Pubmed: Field Search
A large number of fields are available for text or numeric searches
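As an illustration of field searching, the sketch below assembles a query string from a few PubMed field tags ([tiab], [au], [ta], [dp]). The helper names `field_term` and `date_range` and the small tag table are hypothetical conveniences chosen for this example, and the search terms are arbitrary; the resulting string is meant to be pasted into the Pubmed search box.

```python
# Illustrative subset of PubMed field tags; the dictionary keys are
# hypothetical labels chosen for this sketch, not PubMed names.
FIELD_TAGS = {
    "title": "ti",
    "title_abstract": "tiab",
    "author": "au",
    "journal": "ta",
}


def field_term(value, field):
    """Return value restricted to one field, e.g. Nature[ta]."""
    tag = FIELD_TAGS[field]
    # Quote multi-word values so they are searched as a phrase.
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{value}[{tag}]"


def date_range(start, end):
    """Return a publication-date range, e.g. 2010:2014[dp]."""
    return f"{start}:{end}[dp]"


query = " AND ".join([
    field_term("monoclonal antibody", "title_abstract"),
    field_term("Nature", "journal"),
    date_range("2010", "2014"),
])
print(query)
```

Keeping the tags in one table makes it easy to extend the sketch to other fields listed on the Pubmed help page.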
Pubmed: MeSH Examples (antibodies)
Pubmed: MeSH tutorial
- Some Pubmed tutorials are too complex, and some universities provide simplified versions (for MeSH, for example) that insist on a sequential, structured approach to identify the more relevant MeSH terms
- Do not forget that MeSH terms are not always present; they are useful for extracting a more relevant subset of references to explore with a series of related criteria
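Since MeSH terms are not always present, a MeSH-based search can be paired with a free-text term. The sketch below builds such queries with the [mh], [majr] and :noexp tags; the function names are hypothetical, and the headings are only examples following the antibodies theme of the slide.

```python
def mesh_term(heading, major_only=False, explode=True):
    """Return a MeSH search term for a heading.

    major_only restricts to records where the heading is a major topic;
    explode=False switches off the inclusion of narrower headings.
    """
    tag = "majr" if major_only else "mh"
    if not explode:
        tag += ":noexp"
    return f'"{heading}"[{tag}]'


def with_free_text_fallback(heading, keyword):
    """OR a MeSH term with a title/abstract keyword.

    Records that have not been indexed yet carry no MeSH terms, so the
    keyword part is what retrieves them.
    """
    return f"({mesh_term(heading)} OR {keyword}[tiab])"


print(mesh_term("Antibodies, Monoclonal", major_only=True))
print(mesh_term("Antibodies", explode=False))
print(with_free_text_fallback("Antibodies", "antibod*"))
```

A sequential approach would start from the broad combined query and then narrow it with the major-topic or non-exploded variants.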
Pubmed: Search Operators
- A large number of operators/symbols expand the possibilities well beyond AND, OR, NOT (truncation and double quotes are essential for pursuing precise but not too extensive searches)
- The search can also be improved by the large selection of filters in the left sidebar
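The operators named above can be combined programmatically. This is a minimal sketch with hypothetical helper names; it only composes the text of a query (Boolean operators, truncation with `*`, double-quoted phrases) and does not contact Pubmed. The terms are arbitrary examples.

```python
def phrase(text):
    """Exact phrase search with double quotes."""
    return f'"{text}"'


def truncate(stem):
    """Truncation: match every word starting with the stem."""
    return f"{stem}*"


def any_of(*terms):
    """Combine terms with OR, nested in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Combine terms with AND, nested in parentheses."""
    return "(" + " AND ".join(terms) + ")"


def exclude(base, unwanted):
    """Remove records matching the unwanted term with NOT."""
    return f"({base} NOT {unwanted})"


query = exclude(
    all_of(
        any_of(truncate("antibod"), phrase("immunoglobulin G")),
        phrase("stem cell"),
    ),
    "review[pt]",  # publication type tag
)
print(query)
```

Explicit parentheses around every group avoid relying on the order in which the operators are evaluated.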
Pubmed: Heterogeneity
An important issue is that Pubmed is intended to make publications available to users as soon as possible, which explains some heterogeneity in indexing and access to articles
Pubmed: Some tricks
- Substitute with a between two words in a phrase
- The use of truncation shows how many spelling errors are present in the database, which may make you miss some relevant hits
- Send Pubmed references by e-mail just by indicating the PMID(s) after http://www.ncbi.nlm.nih.gov/pubmed/ (e.g. http://www.ncbi.nlm.nih.gov/pubmed/25031662,25000062)
- The «Related» references can be saved in the search history and combined with keywords to search within them
- Search History is limited in time and length (better not exceeding 50-80 entries)
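The PMID trick can be scripted when many references have to be shared. The sketch below, with a hypothetical function name, appends a comma-separated list of PMIDs to the base address given on the slide; it rejects anything that is not a plain number and drops repeated PMIDs.

```python
PUBMED_BASE = "http://www.ncbi.nlm.nih.gov/pubmed/"  # base address from the slide


def pubmed_link(pmids):
    """Return one link listing all given PMIDs, separated by commas."""
    cleaned = []
    for pmid in pmids:
        text = str(pmid).strip()
        if not text.isdigit():
            raise ValueError(f"not a PMID: {pmid!r}")
        if text not in cleaned:  # keep the first occurrence only
            cleaned.append(text)
    if not cleaned:
        raise ValueError("at least one PMID is required")
    return PUBMED_BASE + ",".join(cleaned)


print(pubmed_link([25031662, "25000062"]))
# prints http://www.ncbi.nlm.nih.gov/pubmed/25031662,25000062
```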
Highwire Press: Overview
Large life science literature database hosted by Stanford Univ., aggregating journals from many major publishers but also books and conference abstracts, also as full text and with some useful filters
Highwire: Help Page
Highwire: Search Results & History
Highwire: Services
Preview of keywords in context, alerts for new articles including a given citation or keywords, alternative viewing features, links to supplementary/free documents, and management of ToC are well implemented
Publishers Website: Introduction
- All main publishers with a large panel of journals have nice features to keep track of new articles or to search their publications
  - Nature, Science, Wiley, Springer
- ScienceDirect of Elsevier is particularly rich in functions and has a broad coverage (even of journals not indexed in Pubmed)
Publishers Website: Other Examples (Wiley)
DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
2 Patent Literature
  - Lens
  - Espacenet
  - Patentscope
3 Chemical Structure & Biological Sequences
4 Metabases
Patent Literature: Introduction
- Patent information that may be relevant for a biotech search is available in a variety of formats:
  - Text-based
  - Biological sequences
  - Chemical structures
- Regular review of patent publications can be performed by appropriately using three types of tools:
  - Multi-patent office websites (Patentscope, Espacenet, Lens)
  - Patent office-specific tools (at USPTO, EPO, Australian, Indian, etc.), but in general poorly implemented beyond basic number or proceedings searches
  - Access for sequence- or structure-based searches (Lens, EBI)
- Each approach and tool has its own strengths/weaknesses:
  - Need to compare/double-check
  - Access to PDF and identification of keyword context
Patent Literature: Overview
Main strengths:
- Patentscope and Lens: full-text/stemmed/nested searches, large number of criteria, login for saving search strategies, graphical/automated grouping of results
- Patentscope and Espacenet: machine-based translation
- Lens: somewhat easier to use for both searching and getting/sending links to PDF files, nice support section, possible to search only granted patents, nice sorting/filtering functions, claims and abstract on the same page
- Espacenet: Cooperative Patent Classification & citing/cited documents features for (non-)EP appl., link to EPO register, links to (often) reliable patent family & Inpadoc/status information
Main weaknesses:
- In general: no visibility on actual coverage for all collections; limited means to identify keyword context
- Patentscope: instability in case of long search sessions, IPC only, no clear patent family information
- Lens: format inconsistency for code/number fields, coverage and patent family definition, with functions appearing and disappearing (now providing IPC and USPC)
- Espacenet: somewhat old-style for both searching documents and getting PDF files
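Because number formats differ between tools, comparing or double-checking result lists is easier after normalizing the publication numbers. The sketch below is only an assumption about how such a comparison could be done: the function names are hypothetical, the input formats are guesses at typical export styles, and the publication numbers are made up for illustration. It does not handle offices that pad serial numbers differently.

```python
import re

# Country code, serial number, optional kind code (e.g. A1, B2).
PUBLICATION = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")


def normalize(raw):
    """Split a publication number into (country, serial, kind)."""
    # Drop spaces and common separators before matching.
    compact = re.sub(r"[\s,./\-]", "", raw.upper())
    match = PUBLICATION.match(compact)
    if match is None:
        raise ValueError(f"unrecognized publication number: {raw!r}")
    country, serial, kind = match.groups()
    return country, serial.lstrip("0"), kind or ""


def compare(results_a, results_b):
    """Compare two result lists on country and serial, ignoring kind codes."""
    keys_a = {normalize(r)[:2] for r in results_a}
    keys_b = {normalize(r)[:2] for r in results_b}
    return {
        "both": keys_a & keys_b,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
    }


# Made-up numbers, formatted the way two different tools might export them.
tool_a_hits = ["EP 1234567 B1", "US 2014/0123456 A1"]
tool_b_hits = ["EP1234567A1", "WO2013/987654"]
print(compare(tool_a_hits, tool_b_hits))
```

Documents found by only one tool are the ones worth checking by hand, since the difference may come from coverage or from the search syntax.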
Lens: Search Window
Lens: Filtering Features
Lens: Help Page
Espacenet: Search Window and Criteria
Espacenet: Patent Kind Codes & Help
Espacenet: CPC Classification
Espacenet: Results & Record View
Patentscope: Search Window & Results
Patentscope: Record & Records Analysis
DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
2 Patent Literature
3 Chemical Structure & Biological Sequences
  - Uniprot
  - EBI-Fasta
  - ChEMBL/Pubchem
4 Metabases
Uniprot: Overview & Search Criteria
Uniprot: HBB in GeneCards vs. Uniprot
EBI-Fasta: Search Window
EBI-Fasta: Overview of Results
EBI-Fasta: Patent Sequence Record
ChEMBL: Introduction
Medicinal chemistry data/products are now more accessible, also to non-specialists, through portals such as EBI/ChEMBL, PubChem, or DrugBank that aggregate them and make them searchable through different criteria, across biological/medical/patent information (together with chemical information from proprietary repositories), for creating Molecular Clouds (Ertl and Rohde, J Cheminf 2012)
ChEMBL: Features
ChEMBL: Search & Browse Features
ChEMBL: Targets, Ligands & Drug Approvals
DATABASES FOR BIOTECHNOLOGY INFORMATION
1 Scientific Literature
2 Patent Literature
3 Chemical Structure & Biological Sequences
4 Metabases
  - Google
  - Google Scholar
  - Drugbank
Google: Advanced Search & GoogleGuide
Google Scholar: Introduction
- This site claims broad coverage of both scientific and patent literature, but the actual coverage is unclear: which patent documents beyond US ones and from which date, and the journals of which publishers (they index papers, not journals)
- The system has some additional useful features compared to pure Google
  - Separate advanced search features
  - Management of alerts through your own Gmail account
  - Import features for reference management systems (but not always precise)
  - Selection of publication date instead of appearance on the web (but again not always precise)
  - Clear link to PDF on the left side of the window
  - Citation list (that can be searched separately) and related articles features
  - Metrics / search by journal
  - Focused help page with advice on how to get your paper indexed
Google Scholar: Advanced Search Features
Google Scholar: Settings and Metrics Features
Google Scholar: Final Comments
- Google Scholar provides means for overcoming only some limitations of pure Google
  - Lack of visibility about publication/journal coverage
  - Unstructured search features within documents
  - Lack of indexing
- It is an interesting tool for exploratory searches or for completing searches made in traditional databases
  - Exploiting full-text and advanced search features in a more structured environment
  - Linking articles to combinations of specific technical details, cross-references, authors
  - Obtaining additional search criteria to be used elsewhere
DrugBank: Introduction
DrugBank: Results
DrugBank: Records
Thank you!!
Luca.falciola@promethera.com
The views and opinions expressed in this presentation are the author's personal thoughts on these subjects. They are not intended to be considered opinions and positions of Promethera, nor do they imply any commitment by Promethera to any particular action.