Curriculum Vitae. Valter Crescenzi. February 2012





Contact Info

Valter Crescenzi
Via della Vasca Navale, 79
I-00146 Rome, Italy
Tel. +39 06 5733 3535
e-mail: crescenz@dia.uniroma3.it

Current Position: Assistant Professor at Università degli Studi Roma Tre

Research Activities

Research Positions

Assistant Professor at Facoltà di Ingegneria of Università degli Studi Roma Tre (since 2005).
Junior Researcher at Dipartimento di Informatica ed Automazione of Università degli Studi Roma Tre (2003-2004).
Research Fellow (project "Gestione dei dati per i processi decisionali: acquisizione, integrazione e presentazione") at Dipartimento di Informatica ed Automazione of Università degli Studi Roma Tre, under the supervision of Prof. Paolo Atzeni (2002-2003).

Education

PhD in Computer Engineering, received in February 2002 from Dipartimento di Sistemistica of Università degli Studi di Roma "La Sapienza". Dissertation: On Automatic Data Extraction from Large Websites [40]. Supervisors: Prof. Paolo Atzeni and Prof. Giansalvatore Mecca. The main results were published in an international journal [2] and presented at international conferences [8].
Computer Engineering degree ("Laurea in Ingegneria Informatica") in 1998 from Università degli Studi Roma Tre, with a thesis titled "Un riconoscitore di grammatiche formali con gestione delle eccezioni", under the supervision of Prof. Paolo Atzeni and Prof. Giansalvatore Mecca. The main results were published in an international journal [1].

Research Topics

During his master's thesis, Valter Crescenzi developed an interest in research topics related to information extraction from web sources. Initially (1998-1999) he worked on the definition of a new formalism for the manual yet effective specification of wrappers, i.e. programs able to extract structured information from unstructured web pages. He developed a formalism that aims to combine the advantages of declarative languages (such as grammars) and procedural languages (such as editing scripts) for expressing effective and precise extraction rules [1].

During his PhD studies (1999-2002), he researched how to further improve the level of automation of wrapper production for large websites [24, 6, 25], whose pages are generally produced by querying an underlying database and embedding the query results into a fixed HTML template. Even if these websites contain a large number of pages, the pages can usually be classified into a relatively small number of classes [10] composed of structurally similar pages.

These research activities produced results in two phases, from 1999 to 2002 and from 2002 to 2004. In the former phase (1999-2002), an innovative algorithm for inferring regular expressions was proposed: the algorithm is based on a progressive and comparative analysis of sample pages obeying a regular grammar picked from a family of grammars crafted on purpose [8, 26, 9]. In the latter phase (2002-2004), the relationships between this algorithm, presented to the data extraction community, and the learning algorithms presented within the much more consolidated grammar inference community [41, 42] were clarified, with interesting results for both communities [27, 2].

Namely, it was clarified that many inference algorithms taking only positive samples as input (a paradigm known as identification in the limit [41]), which were studied by researchers of the grammar inference community, are not useful as tools to produce wrappers. One of the goals of that community is to study how to learn expressive classes of languages, but the more expressive the class of languages inferred, the less likely is the availability of a representative and finite sample of pages [42]. Since a wrapper generation tool requires a non-expert user to provide these samples, they should be obtainable by randomly picking a small number of sample pages [27]. A class of languages identifiable in the limit (called "Prefix Mark-Up Languages") was proposed as the first example of a class of languages suitable for wrapper generation, and was formally studied [2].

The subsequent research (2004-2008) can be summarized in two main lines: first, the grammar inference algorithm was refined [29, 13, 11] to deal with many structures frequently occurring on the Web; second, the class of languages identifiable in the limit was expanded while maintaining the simplicity of the characteristic samples [4].

Further research studies (2002-2008) aimed at scaling out the extraction process to cover many classes of pages from several large websites. Many research issues arise, including the effective crawling of sample pages within a website [12, 15, 3] and the classification of downloaded pages into classes suitable for automatic wrapping. Two kinds of approaches were pursued: those based on the analysis of regularities in the inner structure of pages [10, 14], and those based on the analysis of regularities in the topology of large websites [30, 3, 34, 18].

Recently, this research line has been further expanded to web scale [17, 31, 5], tackling the additional issues related to searching and retrieving websites that publish relevant information [36, 31], their integration [32], and the scalability of the overall approach [38]. In this context, the idea naturally arises of characterizing probabilistically the quality of the extracted information [37, 19, 20] and the accuracy of the involved sources [21, 33, 39, 22], even in the presence of copiers among them.

Most of these research activities were carried out in the context of international and national research projects.

Participation in Research Projects

International research project INTAS: Modeling and Management of Semi Structured Data for Dynamic World Wide Web Applications (1999-2000).
National research project MURST (ex 40%) Data X: Gestione, Trasformazione e Scambio di Dati in Ambiente Web (1999-2000).
FIRB-MIUR project MAIS: Multichannel adaptive information systems (2002-2006).
European research project (Vfp) MOSES: MOdular and Scalable Environment for the Semantic web (2002-2006).
National research project MIUR ECD: Tecnologie per arricchire e fornire accesso a contenuti (2002-2005).

National research project (PRIN) WISDOM: Ricerca Intelligente su Web basata su Ontologie di Dominio (2004-2006).
Principal investigator of a project for realizing an industrial demonstrator of a web data extractor. The project was funded by progetto DOCUP Obiettivo 2 Regione Lazio Programma 2000-2006 sottomisura II.5.2 (2005-2007).
National research project MIUR NGS: Nuove Tecnologie e Strumenti per l'Interrogazione di Servizi di Ricerca su Web (2007-2009).
Project MORNING - Metodologie e strumenti per analizzare dati da sorgenti del Social Web. FILAS-RS-2009-1132, funded by CUP F87I10000750007, POR FESR Lazio 2007/2013 Asse I Attività I.1 (2009-2012).
National research project (PRIN) EASE: Identificazione, riconciliazione, estrazione e integrazione di Entità dal Web (2010-2012).

Other Collaborations

From 1999 to March 2004 he took part in the design, creation, and management of the XML version of the online site of ACM Sigmod Record. In particular, he worked on extracting the data from web sources and converting them into XML format. The outcome of this initiative has been the subject of many scientific studies.
Program committee member of several national and international conferences: Workshop on Adaptive Text Extraction and Mining (ATEM 2003), Workshop on Adaptive Text Extraction and Mining (ATEM 2006), International Conference on Web Information Systems Engineering (WISE 2008), Sistemi Evoluti per Basi di Dati (SEBD 2012).
External reviewer for many conferences, including SAC 2002, ACM SIGMOD 2003, ICWE 2004, VLDB 2004, ACM SIGMOD 2005, ICDE 2006, EDBT 2006, and VLDB 2007.
Reviewer for international journals such as Information Systems (Kluwer Publishers), Software: Practice and Experience (Wiley), Data and Knowledge Engineering (Elsevier), and Journal of Intelligent Information Systems (Springer).

He has been the presenting author at these international conferences: SAC 2002 (Madrid, Spain), WebDB 2003 (San Diego, USA), ATEM 2003 (San Jose, USA), WEBIST 2005 (Miami, USA), ICDE 2006 Workshops (Atlanta, USA).
Panelist at the Workshop on Adaptive Text Extraction and Mining (ATEM 2003).
Co-founder of the academic spin-off Chi-Technologies s.r.l., a company in which Università degli Studi Roma Tre holds a stake, whose goal is the industrial exploitation of the research results on automatic information extraction from the Web.

Teaching Experience

Institutional Teaching Activities

He has taught the following academic courses at Facoltà di Ingegneria, Università degli Studi Roma Tre:
Sistemi Operativi II, 2003/2004, 2004/2005
Programmazione Concorrente, 2005/2006, 2006/2007, 2007/2008, 2008/2009, 2009/2010, 2010/2011, and 2011/2012
Elementi di Informatica, 2010/2011, 2011/2012
Programmazione Orientata agli Oggetti, 2004/2005, 2005/2006, 2006/2007, 2007/2008

He has been teaching assistant for the following courses at Facoltà di Ingegneria, Università degli Studi Roma Tre:
Sistemi Operativi, academic year 2000/2001
Sistemi Operativi 1, Sistemi Operativi 2, 2002/2003
Programmazione Orientata agli Oggetti, 2002/2003
Ingegneria del Software, 2003/2004
Progetto di Sistemi Informatici, 2004/2005, 2005/2006, 2006/2007, and 2007/2008

He taught in the following second-level master's courses of Università degli Studi Roma Tre:
Basi di Dati, Master Universitario in Economia e Tecnologia della Società dell'Informazione, academic years 2001/2002, 2002/2003, and 2003/2004
Basi di Dati, Master Universitario in Governance, Sistema di Controllo e Auditing, academic years 2005/2006 and 2006/2007
Programmazione orientata agli oggetti, Basi di dati ed XML, Metodi per lo sviluppo agile, Master Universitario in Governo dei Sistemi Informativi: sviluppo, gestione, monitoraggio, 2007/2008
Basi di dati and Metodi per lo sviluppo agile, Master Universitario in Governo dei Sistemi Informativi: sviluppo, gestione, monitoraggio, 2009/2010 and 2011/2012

He is tutor of the following academic courses at Facoltà di Ingegneria, Università Telematica Internazionale UNINETTUNO:
Sistemi Informativi e Basi di dati, Corso di Studi in Ingegneria Informatica ed Ingegneria Gestionale, academic year 2011/2012
Ingegneria del Software e Programmazione ad Oggetti, Corso di Studi in Ingegneria Informatica, 2011/2012

Other Institutional Teaching Activities

During academic years 2004/2005, 2005/2006, 2006/2007, and 2007/2008 he designed and supervised the development of a web application for partially automating the exams of several programming courses of Facoltà di Ingegneria, Università degli Studi Roma Tre, including Programmazione Orientata agli Oggetti, Fondamenti di Informatica I, Laboratorio di Informatica, and Programmazione Concorrente.

Professional Teaching Experience

He has taught the following courses:
Progettazione Banche Dati, for Engineering Ingegneria Informatica SpA (2000-2003)
Progettazione Banche Dati, Il linguaggio XML, Sistemi Operativi, for Direzione Corsi Elettronica, Optoelettronica ed Informatica for Ministero della Difesa (2003-2007)
Basi di Dati, for Scuola di Polizia Tributaria
Specialista Sviluppo Applicazioni Object Oriented, Analista Programmatore, for Centro Italiano Opere Femminili Salesiane - Formazione Professionale
Il linguaggio UML, for Sudgest S.C.p.a.
Progettista di Siti Web, for ENAIP Lazio, 2010

Publications

International Journals

[1] V. Crescenzi and G. Mecca. Grammars Have Exceptions. Information Systems, 23(8): 539-565 (1998).
[2] V. Crescenzi and G. Mecca. Automatic information extraction from large websites. Journal of the ACM, 51(5): 731-779 (2004).
[3] V. Crescenzi, P. Merialdo and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3): 279-299 (2005).
[4] V. Crescenzi and P. Merialdo. Wrapper Inference for Ambiguous Web Pages. Applied Artificial Intelligence, 22(1): 21-52 (2008).
[5] L. Blanco, V. Crescenzi and P. Merialdo. Structure and Semantics of Data-intensive Web Pages: an Experimental Study of their Relationships. Journal of Universal Computer Science. Special Issue on Wrapping Web Data Islands.

International Conference Proceedings

[6] G. Mecca, P. Merialdo, P. Atzeni and V. Crescenzi. The (short) Araneus Guide to Web Site Development. Second Workshop on Databases and the Web (WebDB '99) in conjunction with ACM SIGMOD '99, Philadelphia (Pennsylvania), June 1999.

[7] V. Crescenzi, G. Mecca and P. Merialdo. The RoadRunner Project: towards Automatic Extraction of Web Data. International Workshop on Adaptive Text Extraction and Mining (ATEM 2001) in conjunction with Seventeenth International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle (Washington), 2001.
[8] V. Crescenzi, G. Mecca and P. Merialdo. RoadRunner: Towards Automatic Data Extraction from Large Web Sites. Proceedings of the 27th International Conference on Very Large Databases (VLDB 2001), Roma (Italy), pp. 109-119, Morgan Kaufmann, 2001.
[9] V. Crescenzi, G. Mecca and P. Merialdo. Automatic Web Information Extraction in the RoadRunner System. International Workshop on Data Semantics in Web Information Systems (DASWIS 2001) in conjunction with 20th International Conference on Conceptual Modeling (ER 2001), Yokohama (Japan). Lecture Notes in Computer Science 2465, Springer, 2002.
[10] V. Crescenzi, G. Mecca and P. Merialdo. Wrapping-oriented classification of web pages. ACM Symposium on Applied Computing (SAC), March 10-14, 2002, Madrid (Spain). ACM Press, 2002.
[11] L. Arlotta, V. Crescenzi, G. Mecca and P. Merialdo. Automatic annotation of data extracted from large Web sites. Sixth Int. Workshop on Databases and the Web (WebDB 2003) in conjunction with ACM SIGMOD '03, San Diego (California), June 2003.
[12] V. Crescenzi, P. Merialdo and P. Missier. Fine-grain Web Site Structure Discovery. Fifth ACM CIKM International Workshop on Web Information and Data Management (ACM WIDM 2003), November 2003, New Orleans (Louisiana). ACM Press, 2003.
[13] V. Crescenzi, G. Mecca and P. Merialdo. Handling irregularities in RoadRunner. The AAAI-04 International Workshop on Adaptive Text Extraction and Mining (ATEM 2004), July 26, 2004, San Jose (California).
[14] V. Crescenzi, G. Mecca, P. Merialdo and P. Missier. An Automatic Data Grabber for Large Web Sites. Proceedings of the 30th International Conference on Very Large Databases (VLDB 2004), September 2004, Toronto (Ontario, Canada).

[15] L. Blanco, V. Crescenzi and P. Merialdo. Efficiently Locating Collections of Web Pages to Wrap. First International Conference on Web Information Systems and Technologies, May 2005, Miami (Florida).
[16] V. Crescenzi and P. Merialdo. Efficient Techniques for Effective Wrapper Induction. Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2006, April 2006, Atlanta (Georgia), USA.
[17] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. Flint: Google-basing the Web. 11th International Conference on Extending Database Technology, Nantes, France, March 2008.
[18] C. Bertoli, V. Crescenzi and P. Merialdo. Crawling Programs for Wrapper-based Applications. The 2008 IEEE International Conference on Information Reuse and Integration (IEEE IRI-08), July 13-15, 2008, Las Vegas, USA.
[19] L. Blanco, M. Bronzi, V. Crescenzi, P. Merialdo and P. Papotti. Exploiting information redundancy to wring out structured data from the web. The 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010.
[20] L. Blanco, M. Bronzi, V. Crescenzi, P. Merialdo and P. Papotti. Redundancy-Driven Web Data Extraction and Integration. The 13th International Workshop on the Web and Databases, WebDB 2010, Indianapolis, Indiana, USA, June 6, 2010.
[21] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources. The 22nd International Conference on Advanced Information Systems Engineering, CAiSE '10, Hammamet, Tunisia, June 2010.
[22] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. Automatically Building Probabilistic Databases from the Web. The 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, March 18-April 1, 2011.
[23] M. Bronzi, V. Crescenzi, P. Merialdo and P. Papotti. Wrapper Generation for Overlapping Web Sources. Web Intelligence 2011, WebDB 2010, Lyon, France, August 22-27, 2011.

National Conference Proceedings

[24] G. Mecca, P. Merialdo, P. Atzeni and V. Crescenzi. The ARANEUS Guide to Web Site Development (extended version of [6]). Atti del Settimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD '99), pp. 167-177, Como, June 23-25, 1999.
[25] G. Mecca, P. Merialdo, P. Atzeni and V. Crescenzi. Experiences in XML data management. Atti dell'Ottavo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2000), pp. 109-119, L'Aquila, June 24-26, 2000.
[26] V. Crescenzi, G. Mecca and P. Merialdo. The RoadRunner Web Data Extraction System. Atti del Nono Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2001), Venezia, June 27-29, 2001.
[27] V. Crescenzi, G. Mecca and P. Merialdo. Back to Gold's Age: Bridging the Gap Between Traditional Grammar Inference and Web Information Extraction. Atti del Decimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2002), Isola d'Elba, June 2002.
[28] L. Arlotta, V. Crescenzi, G. Mecca and P. Merialdo. Automatic annotation of data extracted from large Web sites. Atti dell'Undicesimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2003), Cetraro (CS), June 2003.
[29] V. Crescenzi, G. Mecca and P. Merialdo. Improving the expressiveness of RoadRunner. Atti del Dodicesimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2004), Cagliari, June 2004.
[30] L. Blanco, V. Crescenzi and P. Merialdo. Harvesting Structurally Similar Pages. Atti del Tredicesimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2005), Bressanone, June 2005.
[31] L. Blanco, V. Crescenzi and P. Merialdo. Searching Entities on the Web by Sample. Atti del Sedicesimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2008), Mondello (PA), June 2008.
[32] L. Blanco, V. Crescenzi and P. Merialdo. Data Extraction and Integration from Imprecise Web Sources. Atti del Diciassettesimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2009), Camogli (GE), June 2009.

[33] L. Blanco, V. Crescenzi and P. Merialdo. Probabilistic Reconciliation of Records from Inaccurate Web Sources. Atti del Diciottesimo Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD 2010), Rimini, June 2010.

Technical Reports

[34] V. Crescenzi, P. Merialdo and P. Missier. Discovering the structure of large web sites. Technical Report RT-DIA-89-2004, Università degli Studi Roma Tre, Dipartimento di Informatica e Automazione (2004).
[35] L. Blanco, V. Crescenzi and P. Merialdo. Automatically Generating Reports from Large Web Sites. Technical Report RT-DIA-90-2004, Università degli Studi Roma Tre, Dipartimento di Informatica e Automazione (2004).
[36] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. Searching Entities on the Web by Sample. Technical Report RT-DIA-121-2007, Università degli Studi Roma Tre, Dipartimento di Informatica e Automazione (2007).
[37] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. A Probabilistic Model to Characterize the Uncertainty of Web Data Integration: What Sources Have The Good Data? Technical Report RT-DIA-146-2009, Università degli Studi Roma Tre, Dipartimento di Informatica e Automazione (2009).
[38] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. Exploiting Information Redundancy to Extract and Integrate Data from the Web. Technical Report RT-DIA-151-2009, Università degli Studi Roma Tre, Dipartimento di Informatica e Automazione (2009).
[39] L. Blanco, V. Crescenzi, P. Merialdo and P. Papotti. Probabilistic Models to Reconcile Complex Data from Inaccurate Data. Technical Report RT-DIA-170-2010, Università degli Studi Roma Tre, Dipartimento di Informatica e Automazione (2010).

PhD Thesis

[40] V. Crescenzi. On Automatic Information Extraction from Large Websites. Collana delle Tesi di Dottorato, Università degli Studi di Roma "La Sapienza" (2002).

Other Cited Publications

[41] E. M. Gold. Language identification in the limit. Information and Control, 10(5): 447-474 (1967).
[42] D. Angluin. Inference of Reversible Languages. Journal of the ACM, 29(3): 741-765 (1982).