Integration of Protein-Protein Interaction Data in a Genomic and Proteomic Data Warehouse
CANAKOGLU A, GHISALBERTI G, MASSEROLI M
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy

Motivation
Protein-protein interaction (PPI) information is used very frequently by biologists and bioinformaticians to interpret experimental results in the context of biomolecular interaction networks and to test biomedical hypotheses. Nonetheless, no single database covers all the available interaction information. To achieve the widest possible coverage, data must therefore be combined from different databases, which often provide them in different formats. Several approaches have been proposed to integrate data from multiple sources. When the data to be integrated are very numerous and off-line processing is required to mine the integrated data efficiently and comprehensively, data warehousing seems to be the most adequate approach. For this purpose, we previously developed an integrated Genomic and Proteomic Data Warehouse (GPDW) and a software framework for its creation and updating. Currently, the GPDW integrates gene and protein data and annotations from a total of 22 data sources, as provided by 10 databanks (Entrez Gene, HomoloGene, GO, GOA, BioCyc, KEGG, Reactome, IPI, ExPASy Enzyme and eVOC). Here we report on our latest effort to also integrate PPI data into the GPDW, so that these valuable data can subsequently be evaluated comprehensively together with the other annotations available in the GPDW.

Methods
The developed GPDW software framework handles the whole data import and integration process automatically: data are downloaded from the original databanks, then imported and integrated into the GPDW. Support for interaction data has been added so that these valuable data can also be managed. We considered the PPI data provided by two well-known databanks, MINT and IntAct.
Both databanks provide PPI information extracted from the experimental details reported in work published in peer-reviewed journals. MINT is expert-supervised, while IntAct contains a large number of interactions. We developed automatic procedures to extract and import the data from these databanks and to integrate them with the other data available in the GPDW. These procedures also check the consistency of the integrated data by validating their identifiers (IDs) against well-defined regular expressions.
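The ID consistency check described above can be sketched as follows. This is a minimal illustration, not the GPDW code. It assumes the tab-separated PSI-MI TAB (MITAB) layout in which both databanks distribute their data, where the first two columns hold interactor identifiers written as `database:accession` (several values separated by `|`). It validates UniProtKB accessions with the regular expression published in the UniProt documentation, extended here with an optional isoform suffix. The names `parse_interactor`, `load_ppis` and `PPIs` are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# UniProtKB accession pattern from the UniProt documentation; the optional
# isoform suffix (e.g. "-2") is an assumption added for this sketch.
UNIPROT_AC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-[0-9]+)?$"
)

# Hypothetical store: undirected protein pair -> set of source databanks.
PPIs = Dict[Tuple[str, str], Set[str]]


def parse_interactor(field: str) -> Optional[str]:
    """Return the first valid UniProtKB accession in a MITAB 'db:id|db:id' field."""
    for entry in field.split("|"):
        db, _, acc = entry.partition(":")
        if db == "uniprotkb" and UNIPROT_AC.match(acc):
            return acc
    return None


def load_ppis(lines: Iterable[str], source: str, ppis: PPIs) -> int:
    """Merge valid interactions from MITAB lines into `ppis`.

    Returns the number of rows rejected because an interactor ID
    is missing or fails the regular-expression check.
    """
    rejected = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the MITAB header row
        cols = line.rstrip("\r\n").split("\t")
        a = parse_interactor(cols[0])
        b = parse_interactor(cols[1]) if len(cols) > 1 else None
        if a is None or b is None:
            rejected += 1
            continue
        pair = (a, b) if a <= b else (b, a)  # A-B and B-A are the same PPI
        ppis.setdefault(pair, set()).add(source)
    return rejected


if __name__ == "__main__":
    # Illustrative rows only, not taken from the MINT or IntAct files.
    store: PPIs = {}
    mint = ["#ID(s) interactor A\tID(s) interactor B",
            "uniprotkb:P04637\tuniprotkb:Q00987",
            "uniprotkb:P04637\tuniprotkb:NOT_AN_ID"]
    intact = ["uniprotkb:Q00987\tuniprotkb:P04637"]
    print(load_ppis(mint, "MINT", store))      # 1
    print(load_ppis(intact, "IntAct", store))  # 0
    print(sorted(store[("P04637", "Q00987")]))  # ['IntAct', 'MINT']
```

Storing each pair in a canonical order lets the same interaction reported by both databanks collapse into one record while keeping track of its provenance.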
Results
PPI data files downloaded on March 10, 2011 from the MINT and IntAct databases were automatically parsed, and data on 373,823 PPIs, involving 58,225 distinct proteins of 352 organisms, were imported into the GPDW. These PPI data turned out to be well annotated and characterized by the other data and annotations previously integrated in the GPDW, as shown in the table below. For each main annotation feature of the interacting proteins, the table reports the number of annotations, the number of distinct annotated proteins and the number of distinct feature items involved in such annotations. The percentages of PPI proteins annotated with Gene Ontology (GO) Biological Processes, Molecular Functions and Cellular Components were quite high (59.80%, 76.00% and 66.69%, respectively). The percentages of distinct GO terms (feature items) involved in such annotations, out of all GO Biological Process, Molecular Function and Cellular Component terms, were also fairly high (50.24%, 38.92% and 39.47%, respectively).
The availability of PPI data integrated with other valuable annotations is paramount for answering biological questions that concern not only the interacting proteins but also their features. Considering these features also makes it possible to filter the available PPIs, which often include numerous false-positive interactions. A good number of quality annotations of PPI data with Gene Ontology, pathway, genetic disorder and other valuable information is hence a great support for scientists in understanding biomolecular experiment results and unveiling new biomedical knowledge.

Contact email: masseroli@elet.polimi.it

Feature               Nr. of annotations   Nr. of distinct proteins   Nr. of feature items
Enzyme                             8,121                      7,549                  1,512
Biological Process               112,789                     34,817                  1,252
Molecular Function               173,804                     44,250                  3,623
Cellular Component               161,739                     38,829                  6,790
Pathway                           27,997                      9,808                    305
Genetic Disorder                   5,102                      1,532                  1,795
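The GO coverage percentages reported in the Results follow directly from the table: each is the number of distinct annotated proteins for a feature divided by the 58,225 distinct interacting proteins. The short sketch below recomputes them; the counts are transcribed from the table, while the names `TOTAL_PPI_PROTEINS`, `DISTINCT_ANNOTATED` and `coverage` are illustrative.

```python
# Distinct annotated interacting proteins per feature, transcribed from the table.
TOTAL_PPI_PROTEINS = 58_225

DISTINCT_ANNOTATED = {
    "Enzyme": 7_549,
    "Biological Process": 34_817,
    "Molecular Function": 44_250,
    "Cellular Component": 38_829,
    "Pathway": 9_808,
    "Genetic Disorder": 1_532,
}


def coverage(annotated: int, total: int = TOTAL_PPI_PROTEINS) -> float:
    """Percentage of interacting proteins with at least one annotation, to 2 decimals."""
    return round(100 * annotated / total, 2)


if __name__ == "__main__":
    for feature, n in DISTINCT_ANNOTATED.items():
        print(f"{feature:<20}{coverage(n):>7.2f}%")
```

The three GO rows give 59.80%, 76.00% and 66.69%, matching the figures stated in the text; the remaining rows are derived the same way but are not stated in the abstract.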
BITS 2011
VIII Annual Meeting of the Bioinformatics Italian Society
June 20-22, 2011, Pisa, Italy
Edited by Filippo Geraci, Roberto Marangoni, Marco Pellegrini, Maria Elena Renda
Abstracts presented at the BITS 2011 meeting address novel bioinformatics methods, algorithms, databases, tools and applications for research and development in one or more of the following domains: Genomics; Molecular Evolution and Comparative Genomics; Protein Structure and Function; Proteomics; Transcriptomics; Systems Biology; Biological Databases and Biobanks; Algorithms for Bioinformatics; Biophysics and Synthetic Biology.
Under the patronage of the National Research Council (CNR), the University of Pisa and the President of the Provincial Government of Pisa
© Copyright 2011 Edizioni ETS, Piazza Carrara 16-19, I-56126 Pisa, info@edizioniets.com, www.edizioniets.com
Distribution: PDE, Via Tevere 54, I-50019 Sesto Fiorentino [Firenze]
ISBN 978-884673069-5
Scientific Committee
Gianni Cesareni (Università di Roma Tor Vergata)
Domenica D'Elia (CNR - Istituto di Tecnologie Biomediche, Bari)
Angelo Facchiano (CNR - Istituto di Scienze dell'Alimentazione, Avellino)
Manuela Helmer-Citterich (Università di Roma Tor Vergata)
Sabino Liuni (CNR - Istituto di Tecnologie Biomediche, Bari)
Roberto Marangoni (Università di Pisa)
Marco Pellegrini (CNR - Istituto di Informatica e Telematica, Pisa)
Paolo Romano (Istituto Nazionale Ricerca sul Cancro, Genova)
Giorgio Valle (Università di Padova)

Organizing Committee
Roberto Marangoni (Università di Pisa), Organization co-chair
Marco Pellegrini (CNR - Istituto di Informatica e Telematica, Pisa), Organization co-chair
Giuliano Colombetti (CNR - Istituto di Biofisica, Pisa)
Pierpaolo Degano (Università di Pisa, Pisa)
Andrea Frosini (Fondazione Toscana Life Sciences, Siena)
Filippo Geraci (CNR - Istituto di Informatica e Telematica, Pisa)
Adriana Lazzaroni (CNR - Istituto di Informatica e Telematica, Pisa)
Giulia Menconi (Istituto Nazionale di Alta Matematica, Roma)
Nadia Pisanti (Università di Pisa, Pisa)
M. Elena Renda (CNR - Istituto di Informatica e Telematica, Pisa)

Organization Secretariat
Adriana Lazzaroni (CNR - Istituto di Informatica e Telematica, Pisa), Coordinator
Patrizia Andronico (CNR - Istituto di Informatica e Telematica, Pisa)
Raffaella Casarosa (CNR - Istituto di Informatica e Telematica, Pisa)

Book of Abstracts
M. Elena Renda (CNR - Istituto di Informatica e Telematica, Pisa)

Editorial Assistant
Patrizia Andronico (CNR - Istituto di Informatica e Telematica, Pisa)

Official Web Site
Filippo Geraci (CNR - Istituto di Informatica e Telematica, Pisa)
Sponsors
Istituto di Informatica e Telematica del Consiglio Nazionale delle Ricerche
Dipartimento di Informatica (Università di Pisa)
BITS Società di Bioinformatica Italiana
Progetto Bioinformatica del Consiglio Nazionale delle Ricerche
Associazione per la Fondazione Giuliano Preparata
Toscana Life Sciences
Rete Nazionale di Bioinformatica Oncologica
TD Group
Comune di Pisa
Provincia di Pisa
Regione Toscana
The Bioinformatics ITalian Society (BITS) is a non-profit scientific association chartered on June 19, 2003. BITS aims to bring together research scientists interested in Bioinformatics, meant as a multidisciplinary science for the study of biological systems at the molecular and cellular level using informatics and computational methods and models. The main goals of the association are the study, development and spread of Bioinformatics in scientific, academic, technological and industrial environments. The annual meeting of BITS is an increasingly important event, providing an overview of Italian bioinformatics research and an international forum for the in-depth assessment of new results and new challenges in the fast-moving field of Life Sciences research.
BITS Office: headquarters of the Società Italiana di Bioinformatica
Address: Sezione di Genomica e Bioinformatica dell'Istituto di Tecnologie Biomediche del CNR, Via Amendola 122/d, 70126 Bari, Italia
URL: http://www.bioinformatics.it/
E-mail: bits@bioinformatics.it