ProteinQuest user guide



1. Introduction
1.1 With ProteinQuest you can
1.2 ProteinQuest basic version
1.3 ProteinQuest extended version
2. ProteinQuest dictionaries
3. Directions for use
3.1 Simple query
3.2 Advanced query
3.3 Combine dictionary terms with Boolean operators AND, OR, NOT
3.4 Loading a list
3.5 Identify a list of terms
3.6 How to identify extra data
3.7 Highlight data in the documents
3.8 Wizard
3.9 Query limits
3.10 Results bar
3.10.1 Enlarge, Filter, Clipboard, Network
3.11 Results list
3.11.1 Papers, Patents and Clinical Trials
3.12 Results analysis
3.12.1 Excel download of a list of terms
3.12.2 Excel download of a list of PMIDs
3.12.3 Excel download of a list of PMIDs, Year, Title, Authors, Journal, Volume, Pages, Notes
3.13 Graphs download
3.13.1 Heat Map download
3.13.2 Network download
3.13.2.1 Excel
3.13.2.2 NodeXL
3.13.2.3 Cytoscape
4. Tools
4.1 Saved query/load query
4.2 PMID list
4.3 Network
4.3.1 How to generate a Network
4.3.1.2 Automatic Network selection
4.3.1.3 Select where to collect data
4.3.1.4 Select Nodes
4.3.1.5 Generate Network
4.3.1.6 Black and colored edges: two types of information
4.3.1.7 Advanced Network configuration
4.3.1.8 Set the values of occurrence, co-occurrence and Ef
4.4 The Heat Map
4.4.1 How to generate a Heat Map
4.4.2 How to download a Heat Map
5. ProteinQuest Case Studies

1. Introduction

ProteinQuest is a new platform for biomedical literature retrieval and analysis. It integrates data from scientific literature, data repositories and biological images. ProteinQuest currently holds more than 15 million indexed abstracts, 9 million images, 1.8 million selected patents, 250,000 clinical trials and 10 billion binary relationships.

Literature information can be obtained with two query types: free keywords, or a Boolean query built step by step from curated ontologies. ProteinQuest searches both article abstracts and image captions, producing more specific and comprehensive search results than other data mining platforms. Query results can be as specific as users require: ProteinQuest lets you refine the field of interest by selecting specific dictionaries/ontologies such as miRNA, Drugs or Biological Processes. Queries can also be saved and reloaded whenever needed.

ProteinQuest can also be used to search patent abstracts and claims, and the results can be analyzed with all available dictionaries/ontologies. Additionally, ProteinQuest builds complex network models to extend the understanding of your research. These networks reveal relationships between several types of concepts and biological entities, as well as between people, institutions, companies, etc.
1.1 With ProteinQuest you can:
- Easily understand and interpret literature information through an innovative graphical layout that highlights key relationships and connections between objects from several different ontologies
- Mine for biological relationships between proteins/genes that are experimentally supported by one or more techniques of your choice
- Prioritize target genes for biomarker discovery, drug development and repositioning
- Create powerful, interactive networks connecting genes or proteins to diseases, identify relevant drugs and isolate sub-networks within biological fields
- Retrieve only clinically relevant information at any clinical stage of development
- Examine relevant experiments in the literature and compare your results with what others have already found
- Track down collaborations among people or institutions working on a topic of your choice, identifying the most relevant players in the field

ProteinQuest is available in two versions: basic and extended.

1.2 ProteinQuest basic version

The basic version is the right tool to search and explore PubMed papers and get a quick answer to your query. With the ProteinQuest basic version you can:
- Retrieve information from the abstracts of the entire PubMed collection (more than 15,000,000 records) and from the captions of all free full-text papers (about 9,000,000 entries)
- Launch queries either to PubMed (simple search) or to our curated internal database (advanced search)
- Disambiguate entities using a semantic approach and sophisticated proprietary technology, reducing the false positive results that common data mining tools cannot filter out
- Obtain higher accuracy, precision and recall than other tools
- Autocomplete query fields for accurate searches
- Automatically expand a query that includes a reference term (e.g. a gene symbol) with all its known synonyms, adding disambiguation information for ambiguous terms, so that a single comprehensive search is performed
- Perform composite queries by entering a list of terms, such as gene symbols, as search input
- Retrieve clinically relevant information at any clinical stage for drug development purposes
- Track down collaborations among people or institutions on a common topic
- Interrogate the scientific literature using free words or terms selected from 9 different dictionaries/ontologies

The table below highlights the main features of ProteinQuest's basic version.

1.3 ProteinQuest extended version

This is the full version of ProteinQuest. With the ProteinQuest extended version you can:
- Retrieve relevant information from the abstracts of the entire PubMed collection (more than 15,000,000 records), from the captions of all free full-text papers (about 9,000,000 entries), and from patents and clinical trials (1.8 million selected patents, 250,000 clinical trials)
- Launch queries either to PubMed (simple search) or to our curated internal database (advanced search)
- Disambiguate entities using a semantic approach and a sophisticated proprietary reasoner that avoids the false positive results common data mining tools cannot filter out
- Obtain higher accuracy, precision and recall than other tools
- Autocomplete query fields for accurate searches
- Automatically expand a query that includes a reference term (e.g. a gene symbol) with all its known synonyms, adding disambiguation information for ambiguous terms, so that a single comprehensive search is performed
- Perform composite queries by entering a list of terms, such as gene symbols, as search input
- Retrieve clinically relevant information at any clinical stage for drug development purposes
- Track down collaborations among people or institutions on a common topic
- Integrate PubMed information with patent and clinical trial data
- Interrogate the scientific literature using free words or terms selected from 9 different dictionaries/ontologies
- Process the results using Heat Maps and Networks
- Define pathways, including activation, inhibition and binding information

The table below highlights the main features of ProteinQuest's extended version.

2. ProteinQuest dictionaries

Molecules: Proteins, miRNA, Drugs, Substances, Protein families
Functions: Bio Processes, Disease, Pathways
Anatomy: Body parts, Tissues, Cells, Cell parts
Lab: Organisms, Methods
Source:
- Papers: Organizations, Nationality, Study type, Authors, Journals, Year
- Patents: Organizations, Inventors, USP Class, Year
- Trials: Organizations, Nationality, Status, Phase, Year

3. Directions for use

3.1 Simple query

To run a simple query, type your keywords into the search box. The terms are searched, without any text processing, in the abstracts and MeSH terms of PubMed papers. Organized information is nevertheless still available for each dictionary.

3.2 Advanced query

The selected terms are searched in the abstracts, MeSH terms and figure captions of PubMed papers, in the abstracts and claims of US patents, and in the summaries of worldwide clinical trials.

3.3 Combine dictionary terms with Boolean operators AND, OR, NOT

After the query has been set, you can change a Boolean operator from OR to AND, or from AND to NOT, simply by clicking on it.
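Conceptually, the three operators act on the sets of documents that match each term: OR takes the union, AND the intersection, and NOT removes the documents that also mention the second term. The minimal Python sketch below illustrates this with a toy in-memory index; the index, the term names and the document IDs are hypothetical and say nothing about how ProteinQuest actually stores or evaluates queries.

```python
# Toy index: term -> IDs of documents that mention it (hypothetical values).
INDEX: dict[str, set[int]] = {
    "TNF": {1, 2, 3, 5},
    "IL6": {2, 3, 4},
    "apoptosis": {3, 5, 6},
}


def docs(term: str) -> set[int]:
    """Return the documents that mention a term (empty set if unknown)."""
    return INDEX.get(term, set())


def combine(op: str, left: set[int], right: set[int]) -> set[int]:
    """Apply a Boolean query operator to two document sets."""
    if op == "OR":
        return left | right  # documents mentioning either term
    if op == "AND":
        return left & right  # documents mentioning both terms
    if op == "NOT":
        return left - right  # documents mentioning left but not right
    raise ValueError(f"unknown operator: {op}")


# (TNF OR IL6) AND apoptosis
print(sorted(combine("AND", combine("OR", docs("TNF"), docs("IL6")), docs("apoptosis"))))  # [3, 5]
```

Switching an operator in the query builder corresponds to swapping the `op` argument: changing the outer AND to NOT in the example would instead return the TNF or IL6 documents that do not mention apoptosis.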

3.4 Loading a list

You can insert terms one at a time or load a list. The file must be in .txt format.
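The guide only states that the list must be a .txt file. Assuming one term per line in UTF-8 (an assumption, not a documented ProteinQuest requirement), a short Python sketch for cleaning such a file before loading it could look like this; `load_term_list` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def load_term_list(path: str | Path) -> list[str]:
    """Read a .txt term list, assumed to hold one term per line.

    Surrounding whitespace and blank lines are dropped, and duplicates are
    removed while keeping the position of each term's first occurrence.
    """
    seen: set[str] = set()
    terms: list[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms
```

For example, a file containing `TNF`, a blank line, `IL6` and a repeated `TNF` yields `["TNF", "IL6"]`, so the same term is not submitted twice.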

3.5 Identify a list of terms

For each dictionary, you can extract the specific matching terms identified in the papers, so different lists can be extracted. In the example below, we chose to highlight the miRNAs identified in the results.

3.6 How to identify extra data

To visualize all the related miRNAs described together with those in the query, click Enlarge.

3.7 Highlight data in the documents

3.8 Wizard

The Wizard button returns the results of an advanced query in a single click. Wizard buttons can also generate networks in a single click.

3.9 Query limits

How to set limits:

3.10 Results bar

For each dictionary, you can visualize and export the elements identified in the results. For example, the protein list reports for each protein the number of documents (abstracts or captions) and the number of images (captions of free full-text papers) in which it appears. Ef is the enrichment factor, which is based on the frequency of each element in the results.
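The guide does not give the exact Ef formula. A common definition of an enrichment factor, used here purely as an assumption, compares how often a term appears in the result set with how often it appears in the whole indexed corpus: Ef = (k/n) / (K/N), where k of the n result documents and K of the N corpus documents mention the term. The Python sketch below computes this assumed Ef and sorts a result list by documents, images or Ef, mimicking the column sorting of the results bar; the class, field names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    term: str
    result_docs: int    # k: result documents mentioning the term
    result_images: int  # result images (captions) mentioning the term
    corpus_docs: int    # K: corpus documents mentioning the term


def enrichment_factor(s: TermStats, n_results: int, n_corpus: int) -> float:
    """Assumed Ef: frequency in the results divided by frequency in the corpus."""
    return (s.result_docs / n_results) / (s.corpus_docs / n_corpus)


def sort_results(stats: list[TermStats], by: str, n_results: int, n_corpus: int) -> list[str]:
    """Return term names ordered from highest to lowest for the chosen column."""
    keys = {
        "documents": lambda s: s.result_docs,
        "images": lambda s: s.result_images,
        "ef": lambda s: enrichment_factor(s, n_results, n_corpus),
    }
    return [s.term for s in sorted(stats, key=keys[by], reverse=True)]


stats = [
    TermStats("common_term", result_docs=50, result_images=5, corpus_docs=50_000),
    TermStats("rare_term", result_docs=10, result_images=8, corpus_docs=200),
]
print(sort_results(stats, "documents", n_results=100, n_corpus=1_000_000))  # ['common_term', 'rare_term']
print(sort_results(stats, "ef", n_results=100, n_corpus=1_000_000))  # ['rare_term', 'common_term']
```

Under this definition a term that is frequent everywhere gets a modest Ef even when it tops the document count, which is why sorting by Ef can bring more specific terms to the top.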

In this example, the first protein in the list is TNFAIP3, because it is cited most often in the results. The list can also be sorted by number of images or by enrichment factor by clicking the top of the corresponding column.

3.10.1 Enlarge, Filter, Clipboard, Network

- Enlarge: visualize all the concepts related to the query elements identified in the results

- Filter: restrict the query with an additional group of elements (those checked in the box)
- Clipboard: select and save all documents and images of interest to the Clipboard

The Clipboard also lists all PMIDs and their titles; clicking a title brings up the document behind the Clipboard, ready to be analyzed. Within the Clipboard you can CLEAR (erase) or SAVE the subset of selected documents. Saved papers can be reloaded by selecting the saved clipboard.
- Network: visualize the biological relationships among the selected terms. ProteinQuest can represent networks of at least 240 nodes. The choice of nodes is based on the number of documents and the enrichment factor (Ef) of the most connected terms.

3.11 Results list

3.11.1 Papers, Patents and Clinical Trials

The result of a query is a subset of documents.

If you are interested in PubMed publications, select the Papers directory: the title, affiliation, abstract, MeSH terms and open-access figures of the papers are analyzed.

If you are interested in patents, select the Patents directory: the title, affiliation and claims of the patents are analyzed.

If you are interested in clinical trials, select the Trials directory: the title, affiliation, summary and eligibility criteria of the clinical trials are analyzed.

3.12 Results analysis

3.12.1 Excel download of a list of terms

The list of terms identified in the results for each dictionary can be exported to Excel.

3.12.2 Excel download of a list of PMIDs

The list of PMIDs in the results can also be exported from ProteinQuest.

3.12.3 Excel download of a list of PMIDs, Year, Title, Authors, Journal, Volume, Pages, Notes

Note that each PMID is a link for downloading the corresponding paper. Besides lists of terms, you can also export graphs, heat maps and networks.
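If you work with an exported PMID list outside ProteinQuest, you can rebuild links to the papers yourself. The sketch below assumes a plain list of PMIDs and uses the public PubMed URL pattern `https://pubmed.ncbi.nlm.nih.gov/<PMID>/`; it does not reproduce ProteinQuest's own export format, and `pmids_to_csv` is a hypothetical helper. The resulting CSV file opens directly in Excel.

```python
import csv
import io
from typing import Iterable, TextIO

PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def pmids_to_csv(pmids: Iterable[str], out: TextIO) -> None:
    """Write one row per PMID with its PubMed link, skipping duplicates."""
    writer = csv.writer(out)
    writer.writerow(["PMID", "URL"])
    seen: set[str] = set()
    for raw in pmids:
        pmid = raw.strip()
        if not pmid.isdigit():
            raise ValueError(f"not a PMID: {raw!r}")
        if pmid in seen:
            continue
        seen.add(pmid)
        writer.writerow([pmid, PUBMED_URL.format(pmid=pmid)])


buffer = io.StringIO()
pmids_to_csv(["12345678", "12345678"], buffer)  # hypothetical PMID, listed twice
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open("pmids.csv", "w", newline="")`, as the `csv` module recommends.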

3.13 Graphs download

Here is a downloaded graph showing the number of documents and images for the most specific biological processes of a query:

3.13.1 Heat Map download

Here is a Heat Map downloaded as an Excel file, showing the methods used to analyze the genes identified in a specific query.

3.13.2 Network download

The network can be downloaded in different formats.

3.13.2.1 Excel

The Excel file shows the main characteristics of the network generated in ProteinQuest: the selected concept (Vertex), the occurrence (Label), the Ef (Tooltip) and the weight (co-occurrence). Here is an example of the Excel file of a network generated from a query:

Vertex | Color (RGB) | Shape | Size | Label | Tooltip | Type | Occurrences | Weight
IL6 | 254, 161, 0 | circle | 80 | 61 occ | Ef 167.9 | Prot | 61 | 167.86
LMNA | 254, 161, 0 | circle | 80 | 61 occ | Ef 4160 | Prot | 61 | 4160.01
TNF | 255, 135, 135 | circle | 60 | 31 occ | Ef 46.95 | Prot | 31 | 46.9531
IL1B | 255, 152, 152 | circle | 50 | 16 occ | Ef 71.32 | Prot | 16 | 71.3205
NFKB1 | 255, 156, 156 | circle | 48 | 13 occ | Ef 97.91 | Prot | 13 | 97.9058
STAT3 | 255, 156, 156 | circle | 48 | 13 occ | Ef 236.7 | Prot | 13 | 236.75
IL8 | 255, 161, 161 | circle | 46 | 10 occ | Ef 78.47 | Prot | 10 | 78.4673
MAPK8 | 255, 163, 163 | circle | 45 | 9 occ | Ef 80.52 | Prot | 9 | 80.5248
RELA | 255, 165, 165 | circle | 45 | 8 occ | Ef 266.3 | Prot | 8 | 266.275
TLR4 | 255, 165, 165 | circle | 45 | 8 occ | Ef 148.4 | Prot | 8 | 148.415
CASP3 | 255, 167, 167 | circle | 44 | 7 occ | Ef 35.22 | Prot | 7 | 35.2155
CCL2 | 255, 169, 169 | circle | 43 | 6 occ | Ef 70.26 | Prot | 6 | 70.2596
COX2 | 255, 169, 169 | circle | 43 | 6 occ | Ef 59.98 | Prot | 6 | 59.9779
PTGS2 | 255, 169, 169 | circle | 43 | 6 occ | Ef 60.86 | Prot | 6 | 60.8579
MAPK3 | 255, 171, 171 | circle | 43 | 5 occ | Ef 36.28 | Prot | 5 | 36.2841
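Once the network sheet is saved as CSV, it can be processed programmatically. The Python sketch below assumes such a CSV export with the column headers listed for the example (Vertex, Color, Shape, Size, Label, Tooltip, Type, Occurrences, Weight) and uses five rows copied from it; splitting Label from Tooltip at the comma is also an assumption. It parses the RGB colors and numeric columns, then ranks vertices by Occurrences or by Weight (which matches Ef in these rows). The functions `load_vertices` and `top_by` are hypothetical.

```python
import csv
import io

# Five rows copied from the example vertex table; the CSV layout is assumed.
SAMPLE = """\
Vertex,Color,Shape,Size,Label,Tooltip,Type,Occurrences,Weight
IL6,"254, 161, 0",circle,80,61 occ,Ef 167.9,Prot,61,167.86
LMNA,"254, 161, 0",circle,80,61 occ,Ef 4160,Prot,61,4160.01
TNF,"255, 135, 135",circle,60,31 occ,Ef 46.95,Prot,31,46.9531
STAT3,"255, 156, 156",circle,48,13 occ,Ef 236.7,Prot,13,236.75
RELA,"255, 165, 165",circle,45,8 occ,Ef 266.3,Prot,8,266.275
"""


def load_vertices(text: str) -> list[dict]:
    """Parse the vertex sheet, converting the color and numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["Color"] = tuple(int(c) for c in row["Color"].split(","))  # RGB triple
        row["Size"] = int(row["Size"])
        row["Occurrences"] = int(row["Occurrences"])
        row["Weight"] = float(row["Weight"])
    return rows


def top_by(rows: list[dict], column: str, k: int) -> list[str]:
    """Return the k vertex names with the highest values in a numeric column."""
    ranked = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row["Vertex"] for row in ranked[:k]]


vertices = load_vertices(SAMPLE)
print(top_by(vertices, "Weight", 3))  # ['LMNA', 'RELA', 'STAT3']
```

Ranking by Weight rather than by Occurrences moves LMNA, RELA and STAT3 ahead of the more frequently cited IL6 and TNF, which is the same trade-off between frequency and specificity discussed for the results bar.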

3.13.2.2 NodeXL

Using NodeXL, you can edit the network obtained in ProteinQuest and prepare an image of it.

3.13.2.3 Cytoscape

Using Cytoscape, you can edit your ProteinQuest network and analyze it further with Cytoscape plugins (e.g. BiNGO, GeneMANIA, Reactome, NetworkAnalyzer).
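Cytoscape can import networks in the simple interaction format (SIF), where each line reads `source<TAB>interaction<TAB>target`. If you need to move an edge list into Cytoscape by hand, a minimal Python sketch could look like this. The edge list and the interaction label `co_occurs` are hypothetical; SIF carries no edge weights, so co-occurrence counts would have to be imported separately as an edge attribute table.

```python
def to_sif(edges: list[tuple[str, str]], interaction: str = "co_occurs") -> str:
    """Serialize (source, target) pairs as tab-separated SIF lines."""
    return "".join(f"{source}\t{interaction}\t{target}\n" for source, target in edges)


# Hypothetical edges between nodes that appear in the example network.
edges = [("IL6", "STAT3"), ("TNF", "RELA")]
print(to_sif(edges), end="")
```

Saving the returned string to a file with a `.sif` extension lets Cytoscape's network import read it.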

4. Tools

The Tools bar contains several functions.

4.1 Saved query/load query

You can save your query before logging out of ProteinQuest and reload it in a later session.

4.2 PMIDs list

You can export the list of PMIDs identified in the results.

4.3 Network

There are two network configuration options: automatic and advanced. The automatic selection chooses the interactions based on the number of documents and the Ef of the most connected terms.

4.3.1 How to generate a Network

4.3.1.2 Automatic Network selection

For automatic network generation, do not select the advanced configuration option.

4.3.1.3 Select where to collect data

You must select where to collect data from: papers, patents or clinical trials. For PubMed papers, you must also specify whether terms are collected from abstracts, images or both.

4.3.1.4 Select Nodes

Nodes can be limited to the query terms, showing only the interactions among them (restrict to query elements), or can include other terms identified in the results, from the same dictionary or from other dictionaries.

Because edges correspond to documents in which different nodes are described together, you must select which interactions to visualize by checking one or more of the proposed options.

4.3.1.5 Generate Network

Here is an example of a network automatically generated by ProteinQuest:

4.3.1.6 Black and colored edges: two types of information

- Black edges link nodes that are described together in specific papers.
- Colored edges correspond to experimental data describing interactions, inhibitions, expression regulation and enzymatic reactions.

You can select which type of data to visualize first. Further information on the network is available through bibliometric and protein pathway network analysis.

4.3.1.7 Advanced Network configuration

Alternatively, you can select the advanced configuration to generate the network.

4.3.1.8 Set the values of occurrence, co-occurrence and Ef

In the advanced configuration, you set the values of occurrence, co-occurrence and Ef yourself. The only limits on network size are those set by the user.
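To make the role of these thresholds concrete, the Python sketch below builds a co-occurrence network (the equivalent of black edges only) from per-document term sets. It assumes that a node's occurrence is the number of documents mentioning it, that an edge's co-occurrence is the number of documents mentioning both ends, and that precomputed Ef values, if given, filter nodes further. This is an illustration under those assumptions, not ProteinQuest's algorithm; `build_network` and its parameters are hypothetical, and `restrict_to` only mimics the "restrict to query elements" option.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable
from itertools import combinations


def build_network(
    documents: Iterable[set[str]],
    min_occ: int = 1,
    min_cooc: int = 1,
    ef: dict[str, float] | None = None,
    min_ef: float = 0.0,
    restrict_to: set[str] | None = None,
) -> tuple[set[str], dict[tuple[str, str], int]]:
    """Return (nodes, edges) where edges map sorted term pairs to co-occurrence counts."""
    occurrence: Counter = Counter()
    cooccurrence: Counter = Counter()
    for terms in documents:
        found = sorted(terms if restrict_to is None else terms & restrict_to)
        occurrence.update(found)  # each term counted once per document
        cooccurrence.update(combinations(found, 2))  # each pair counted once per document
    nodes = {
        t for t, n in occurrence.items()
        if n >= min_occ and (ef is None or ef.get(t, 0.0) >= min_ef)
    }
    edges = {
        pair: n for pair, n in cooccurrence.items()
        if n >= min_cooc and pair[0] in nodes and pair[1] in nodes
    }
    return nodes, edges


documents = [{"TNF", "IL6", "STAT3"}, {"TNF", "IL6"}, {"TNF", "RELA"}, {"IL6", "STAT3"}]
nodes, edges = build_network(documents, min_occ=2, min_cooc=2)
print(sorted(nodes))  # ['IL6', 'STAT3', 'TNF']
```

Raising `min_occ` or `min_cooc` prunes rarely mentioned terms and weak links, which is how user-set thresholds can keep a network to a readable size.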

Here is an example of a network generated in ProteinQuest and visualized in Cytoscape:

4.4 The Heat Map

4.4.1 How to generate a Heat Map

Heat Maps can be generated by selecting the corresponding button in the Tools bar. The Heat Map is a useful tool to explore biological relationships among specific terms identified in the query results.

The Heat Map shows where two terms are described together in papers, patents or clinical trials. Following are the steps necessary to generate a Heat Map.

Each cell of the resulting Heat Map reports the number of co-occurrences of two terms in the documents retrieved for the query. The intensity of the red color is proportional to the cell's number of hits as a fraction of the total hits in its column.

The following Heat Map reports in each cell the number of documents in which each gene or protein is described in a specific pathological context. Each number is also linked to the corresponding documents.
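The cell values and the column normalization described for the Heat Map can be sketched in a few lines of Python. The example assumes each retrieved document is represented by the set of terms found in it; the documents and terms are hypothetical, and `heat_map` is not a ProteinQuest function. Each returned fraction is the value that would drive the red intensity of a cell.

```python
def heat_map(
    documents: list[set[str]], rows: list[str], cols: list[str]
) -> tuple[list[list[int]], list[list[float]]]:
    """Return co-occurrence counts and the same counts divided by column totals."""
    counts = [
        [sum(1 for doc in documents if r in doc and c in doc) for c in cols]
        for r in rows
    ]
    col_totals = [sum(row[j] for row in counts) for j in range(len(cols))]
    fractions = [
        # Empty columns get 0.0 instead of dividing by zero.
        [row[j] / col_totals[j] if col_totals[j] else 0.0 for j in range(len(cols))]
        for row in counts
    ]
    return counts, fractions


documents = [
    {"TNF", "arthritis"},
    {"TNF", "IL6", "arthritis"},
    {"IL6", "sepsis"},
    {"TNF", "sepsis"},
]
counts, fractions = heat_map(documents, rows=["TNF", "IL6"], cols=["arthritis", "sepsis"])
print(counts)  # [[2, 1], [1, 1]]
```

Because each column is normalized separately, a gene with few hits can still appear dark red in a column (here, a disease context) that has few hits overall.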

4.4.2 How to download a Heat Map

The Heat Map can be exported to Excel for further statistical analysis, such as cluster analysis or Pearson's correlation. These analyses are useful, for example, to identify biomarker signatures and other biological information.

5. ProteinQuest Case Studies

[1] S. Polidoro et al., "Effects of bisphosphonate treatment on DNA methylation in osteonecrosis of the jaw," Mutat. Res., vol. 757, no. 2, pp. 104-113, Oct. 2013.
[2] T. Alberio et al., "Parkinson's disease plasma biomarkers: an automated literature analysis followed by experimental validation," J. Proteomics, vol. 90, pp. 107-114, Sep. 2013.
[3] C. Zanini et al., "Medullospheres from DAOY, UW228 and ONS-76 cells: increased stem cell population and proteomic modifications," PLoS One, vol. 8, no. 5, p. e63748, Jan. 2013.
[4] A. Benso et al., "Reducing the complexity of complex gene coexpression networks by coupling multiweighted labeling with topological analysis," Biomed Res. Int., p. 676328, Jan. 2013.
