Integration of Protein-Protein Interaction Data in a Genomic and Proteomic Data Warehouse




Canakoglu A, Ghisalberti G, Masseroli M
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy

Motivation

Protein-Protein Interaction (PPI) information is used very frequently by biologists and bioinformaticians to interpret experimental results in the context of biomolecular interaction networks and to test biomedical hypotheses. Nonetheless, no single database covers the entire interaction information; thus, to achieve the widest possible coverage, it is necessary to combine data from different databases, often provided in different formats. Several approaches have been proposed to integrate data from multiple sources. When the data to be integrated are very numerous and off-line processing is required to mine the integrated data efficiently and comprehensively, data warehousing seems to be the most adequate approach. For this purpose, we previously developed an integrated Genomic and Proteomic Data Warehouse (GPDW) and a software framework for its creation and updating. Currently, the GPDW integrates gene and protein data and annotations from a total of 22 data sources, as provided by 10 databanks (Entrez Gene, HomoloGene, GO, GOA, BioCyc, KEGG, Reactome, IPI, ExPASy Enzyme and eVOC). Here we report on our latest effort, the integration of PPI data into the GPDW, so that these valuable data can subsequently be evaluated comprehensively together with the other annotations available in the GPDW.

Methods

The developed GPDW software framework handles the whole data import and integration process automatically: data are downloaded from the original databanks, imported and integrated into the GPDW. Support for managing interaction data has been added to the framework. We considered PPI data provided by two well-known databanks, MINT and IntAct. Both databanks provide PPI information by extracting experimental details from work published in peer-reviewed journals. MINT is supervised, and IntAct contains a large number of interaction data. We developed automatic procedures to extract and import the data from these databanks and integrate them with the other data available in the GPDW. These automatic procedures also check the consistency of the integrated data by matching their IDs against well-defined regular expressions.
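The ID consistency check described in the Methods can be illustrated with a minimal Python sketch. The abstract does not list the regular expressions that the GPDW framework actually uses, so everything here is an assumption made for illustration: the source labels (`uniprotkb`, `entrez_gene`), the function names, and the choice of patterns. The UniProtKB pattern follows the accession format published by UniProt; the Entrez Gene pattern simply requires a positive integer.

```python
import re

# Hypothetical ID patterns, one per source; the abstract does not state
# which expressions the GPDW framework really applies.
ID_PATTERNS = {
    # UniProtKB accession format as documented by UniProt.
    "uniprotkb": re.compile(
        r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    ),
    # Entrez Gene IDs are positive integers without leading zeros.
    "entrez_gene": re.compile(r"[1-9][0-9]*"),
}


def is_valid_id(source, identifier):
    """Return True if the whole identifier matches the pattern of its source."""
    pattern = ID_PATTERNS.get(source)
    return pattern is not None and pattern.fullmatch(identifier) is not None


def split_consistent(records):
    """Partition (source, identifier) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    for source, identifier in records:
        target = accepted if is_valid_id(source, identifier) else rejected
        target.append((source, identifier))
    return accepted, rejected


if __name__ == "__main__":
    sample = [
        ("uniprotkb", "P12345"),
        ("uniprotkb", "12345"),      # rejected: not an accession
        ("entrez_gene", "7157"),
        ("entrez_gene", "gene7157"), # rejected: not a plain integer
    ]
    ok, bad = split_consistent(sample)
    print("accepted:", ok)
    print("rejected:", bad)
```

Using `fullmatch` rather than `search` means an identifier with trailing or leading junk is rejected, which is the behaviour one would usually want from an import-time consistency check.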

Results

PPI data files downloaded on March 10, 2011 from the MINT and IntAct databases were automatically parsed, and data of 373,823 PPIs, regarding 58,225 distinct proteins of 352 organisms, were imported into the GPDW. These PPI data turned out to be well annotated and characterized by the other data and annotations previously integrated in the GPDW, as can be seen in the table. It shows the main annotation features of the interacting proteins, the number of distinct features involved in such annotations, the number of protein interactions and the number of distinct interacting proteins. The percentage of PPI proteins annotated with Gene Ontology (GO) Biological Processes, Molecular Functions and Cellular Components was quite high (59.80%, 76.00% and 66.69%, respectively). The percentage of distinct GO terms (feature items) involved in such annotations among all the GO Biological Process, Molecular Function and Cellular Component terms was also fairly high (50.24%, 38.92% and 39.47%, respectively).

Availability of PPI data integrated with other valuable annotations is paramount for answering biological questions regarding not only interacting proteins, but also their features. Considering these features also makes it possible to filter the available PPIs, which often include numerous false-positive interactions. A good number of quality annotations of PPI data with Gene Ontology, pathway, genetic disorder and other valuable information is hence a great support for scientists in understanding biomolecular experiment results and unveiling new biomedical knowledge.

Contact email: masseroli@elet.polimi.it

Feature               Nr. of annotations   Nr. of distinct proteins   Nr. of feature items
Enzyme                             8,121                      7,549                  1,512
Biological Process               112,789                     34,817                  1,252
Molecular Function               173,804                     44,250                  3,623
Cellular Component               161,739                     38,829                  6,790
Pathway                           27,997                      9,808                    305
Genetic Disorder                   5,102                      1,532                  1,795
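The GO protein-coverage percentages quoted in the Results follow from the table: each is the number of distinct annotated proteins divided by the 58,225 distinct PPI proteins imported. The short Python sketch below reproduces that arithmetic. The counts are taken from the abstract; the variable and function names are illustrative only.

```python
# Total number of distinct proteins involved in the imported PPIs (from the Results).
TOTAL_PPI_PROTEINS = 58225

# Distinct proteins carrying at least one annotation of each GO kind (from the table).
ANNOTATED_PROTEINS = {
    "Biological Process": 34817,
    "Molecular Function": 44250,
    "Cellular Component": 38829,
}


def coverage_percent(annotated, total):
    """Percentage of proteins that carry the annotation, rounded to two decimals."""
    return round(100.0 * annotated / total, 2)


if __name__ == "__main__":
    for feature, count in ANNOTATED_PROTEINS.items():
        pct = coverage_percent(count, TOTAL_PPI_PROTEINS)
        print(f"{feature}: {pct:.2f}%")
```

Rounded to two decimals, the three ratios give 59.80%, 76.00% and 66.69%, matching the figures reported in the text. The second set of percentages (share of all GO terms used) cannot be recomputed this way, because the abstract does not give the total number of terms in each GO sub-ontology.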

BITS 2011
VIII Annual Meeting of the Bioinformatics Italian Society
June 20-22, 2011, Pisa, Italy
Edited by Filippo Geraci, Roberto Marangoni, Marco Pellegrini, Maria Elena Renda
BITS, Società di Bioinformatica Italiana

Abstracts presented at the BITS 2011 meeting address novel bioinformatics methods, algorithms, databases, tools and applications for research and development in one or more of the following domains: Genomics; Molecular Evolution and Comparative Genomics; Protein Structure and Function; Proteomics; Transcriptomics; Systems Biology; Biological Databases and Biobanks; Algorithms for Bioinformatics; Biophysics and Synthetic Biology.

Under the patronage of: National Research Council (CNR), University of Pisa, President of the Provincial Government of Pisa

© Copyright 2011
EDIZIONI ETS
Piazza Carrara, 16-19, I-56126 Pisa
info@edizioniets.com
www.edizioniets.com

Distribution: PDE, via Tevere 54, I-50019 Sesto Fiorentino [Firenze]

ISBN 978-884673069-5

Scientific Committee
Gianni Cesareni (Università di Roma Tor Vergata)
Domenica D'Elia (CNR, Istituto di Tecnologie Biomediche, Bari)
Angelo Facchiano (CNR, Istituto di Scienze dell'Alimentazione, Avellino)
Manuela Helmer-Citterich (Università di Roma Tor Vergata)
Sabino Liuni (CNR, Istituto di Tecnologie Biomediche, Bari)
Roberto Marangoni (Università di Pisa)
Marco Pellegrini (CNR, Istituto di Informatica e Telematica, Pisa)
Paolo Romano (Istituto Nazionale Ricerca sul Cancro, Genova)
Giorgio Valle (Università di Padova)

Organizing Committee
Roberto Marangoni (Università di Pisa), Organization co-chair
Marco Pellegrini (CNR, Istituto di Informatica e Telematica, Pisa), Organization co-chair
Giuliano Colombetti (CNR, Istituto di Biofisica, Pisa)
Pierpaolo Degano (Università di Pisa, Pisa)
Andrea Frosini (Fondazione Toscana Life Sciences, Siena)
Filippo Geraci (CNR, Istituto di Informatica e Telematica, Pisa)
Adriana Lazzaroni (CNR, Istituto di Informatica e Telematica, Pisa)
Giulia Menconi (Istituto Nazionale di Alta Matematica, Roma)
Nadia Pisanti (Università di Pisa, Pisa)
M. Elena Renda (CNR, Istituto di Informatica e Telematica, Pisa)

Organization Secretariat
Adriana Lazzaroni (CNR, Istituto di Informatica e Telematica, Pisa), Coordinator
Patrizia Andronico (CNR, Istituto di Informatica e Telematica, Pisa)
Raffaella Casarosa (CNR, Istituto di Informatica e Telematica, Pisa)

Book of Abstracts
M. Elena Renda (CNR, Istituto di Informatica e Telematica, Pisa)

Editorial Assistant
Patrizia Andronico (CNR, Istituto di Informatica e Telematica, Pisa)

Official Web Site
Filippo Geraci (CNR, Istituto di Informatica e Telematica, Pisa)

Sponsors
Istituto di Informatica e Telematica del Consiglio Nazionale delle Ricerche
Dipartimento di Informatica (Università di Pisa)
BITS, Società di Bioinformatica Italiana
Progetto Bioinformatica del Consiglio Nazionale delle Ricerche
Associazione per la Fondazione Giuliano Preparata
Toscana Life Sciences
Rete Nazionale di Bioinformatica Oncologica
TD Group
Comune di Pisa
Provincia di Pisa
Regione Toscana

The Bioinformatics ITalian Society (BITS) is a non-profit scientific association chartered on June 19, 2003. BITS aims at bringing together research scientists interested in Bioinformatics, understood as a multi-disciplinary science for the study of biological systems at the molecular and cellular level by means of informatics and computational methods and models. The main goals of the association are the study, development and dissemination of Bioinformatics in scientific, academic, technological and industrial environments. The annual meeting of BITS is an increasingly important event, providing an overview of Italian bioinformatics research and an international forum for in-depth assessment of new results and new challenges in the fast-moving field of Life Sciences research.

B.I.T.S. Office (seat of the Società Italiana di Bioinformatica)
Address: Sezione di Genomica e Bioinformatica dell'Istituto Tecnologie Biomediche del CNR, Via Amendola 122/d, 70126 Bari, Italia
URL: http://www.bioinformatics.it/
E-mail: bits@bioinformatics.it

ISBN 978-884673069-5