Curriculum Vitae et Studiorum
Giovanni Costa
March 1, 2007

1 Personal Information

Birth date and birth place: April 14, 1976, Milano (MI), Italy.
Nationality: Italian.
Address: via Messina 42, 89026 San Ferdinando (RC), Italy.
Office telephone: +39 0984/494618
Personal telephone: +39 393/3197290
Home page: http://www.icar.cnr.it/costa
E-mail: costa@icar.cnr.it; gcosta@deis.unical.it

2 Education

On February 26, 2007, he received a Ph.D. in Computer Science from Università della Calabria, Cosenza, Italy.
Title of the thesis: Knowledge Management and Extraction in XML Data.
Advisors: Prof. Domenico Saccà, Dr. Giuseppe Manco.

On May 28, 2003, he received a Laurea degree cum laude in Computer Science Engineering from Università della Calabria, Cosenza, Italy.
Title of the thesis: Compressione di dati XML per l'ottimizzazione delle interrogazioni (Compression of XML data for query optimization).
Advisors: Prof. Domenico Saccà, Ing. Angela Bonifati, Ing. Andrea Pugliese.

3 Research activities

He was a Ph.D. student at D.E.I.S. (Electronic, Informatics and Systems Department - Università della Calabria) until October 2006. He is currently a post-doc Research Fellow at ICAR-CNR (National Research Council of Italy), Rende (CS).

3.1 Research topics

Querying of XML data in the compressed domain.
XML documents have an inherently textual nature, due to redundant tags and to the PCDATA content. Therefore, they lend themselves naturally to compression. Once the compressed documents are produced, however, one would still like to query them in their compressed form as much as possible (lazy decompression). Processing queries in the compressed domain has several advantages: first, in a traditional query setting, access to small chunks of data may lead to fewer disk I/Os and reduce the query processing time; second, the memory and computation effort required to process compressed data can be dramatically lower than for uncompressed data, so that even low-battery mobile devices can afford it; third, obtaining compressed query results saves network bandwidth when these results are sent to a remote location. The XQueC system compresses XML data and queries it as much as possible in its compressed form, covering all real-life, complex classes of queries.
The XQueC system adheres to the following approach:
(i) XQueC takes advantage of the XMill principle of compressing data and structure separately, which enables efficient querying of compressed data.
(ii) It adopts a simple storage model suitable for compressed XML, and a set of access support structures, allowing for many evaluation alternatives for complex XQuery queries. Several storage methods are possible; the one adopted is a simple choice made as a proof of concept.
(iii) XQueC seamlessly extends a simple algebra for evaluating XML queries to include compression and decompression. This algebra is exploited by a comprehensive cost-based optimizer, able to devise query evaluation methods that freely mix regular operators and compression-relevant ones.
(iv) It exploits an adaptation of the order-preserving string compression algorithm ALM in order to evaluate, in the compressed domain, the class of queries involving inequality comparisons.

Clustering of XML documents.
The increasing relevance of the Web as a means for sharing information has made traditional approaches to information handling ineffective. Indeed, they are mainly devoted to the management of highly structured information, like relational databases, whereas Web data are semistructured and encoded using different formats. In particular, XML is touted as the driving force for exchanging data on the Web, since it offers several advantages over other data models. Examples are the flexibility for designing ad hoc markup languages for the representation and exchange of semistructured data within any application context, and the support of suitable document type definitions (DTDs) and XML Schema, which make it possible to specify both the structure and the content of the documents. As the heterogeneity of XML sources increases, organizing XML documents according to their structural features has become a challenging need. XRep is a novel methodology for clustering XML documents by structure, which is based on the notion of XML cluster representative. A cluster representative is a prototype XML document subsuming the most relevant structural features of the documents within a cluster.
The intuition at the core of the approach is that a suitable cluster prototype can be obtained as the outcome of a proper overlapping among all the documents within a given cluster. The resulting tree has the main advantage of retaining the specifics of the enclosed documents, while guaranteeing a compact representation. This makes the proposed notion of cluster representative highly useful in the envisaged applications: in particular, as a summary for the cluster, a representative highlights common subparts in the enclosed documents, and can avoid expensive comparisons with individual documents in the cluster.

Trajectory Clustering.
The discovery of frequently used trajectory segments can be useful in the context of Intelligent Transportation Systems as well as for improving the quality of network services. AT-DCS (Automatic Top-Down Clustering of Sequences) is a new approach to clustering trajectories that scales to large volumes of such data while remaining both effective and efficient. The main idea of the approach is borrowed from decision-tree learning, where traditional classification algorithms (such as C4.5 or CART) implement a top-down approach in order to recursively partition the available data on the basis of the gain in purity of the subsets w.r.t. the original dataset. There, purity refers to the frequencies of class labels: the more unbalanced the label frequencies within a partition, the purer the partition. These approaches have been proven to be both efficient and effective. AT-DCS implements a similar strategy for clustering high-dimensional categorical data. Given an initial dataset, it recursively searches for a partition that improves the overall purity. The algorithm is parametric to the notion of purity, which makes it possible to adopt the quality criterion best suited to the specific clustering task. The adopted definition of purity is directly related to the frequency of the attribute values within the partition. Intuitively, the more some attribute values predominate over the other values, the purer the partition.

3.2 Project Activities

2003-2004 Technologies and Services for Enhanced Content Delivery (ECD).
The project studies and proposes advanced technologies to find and organize the contents of data available on the Web. In this setting, a goal was to develop a prototype able to cluster data coming from transactional databases (e.g., sessions of Web users, or documents represented as bags of words). The prototype was applied to the postprocessing of results of queries to search engines.

2004-2005 Grid.it - WP6 - Knowledge services for intensive data analysis and intelligent query answering.
This project, coordinated by the National Research Council (CNR), is defined within the scientific and technological context of new ICT platforms and of large-scale distributed systems. The goal is to study and experiment with systems and software tools that are innovative at all levels, and to demonstrate their capabilities through specific applications.

3.3 Reviewing Activities

He has served as a reviewer for the following national and international conferences:
  International Conference on Data Mining (ICDM).
  International Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD).
  International Symposium on Applied Computing (SAC).
  Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD).
  IASTED International Conference on Databases and Applications (DBA).

4 Teaching activities

4.1 Academic Courses

Since September 2003, he has been involved in teaching activities. In particular:

A.Y. 2006-2007
  Teacher for the course Fundamentals of Computer Science, Faculty of Political Science, Università della Calabria.
  Teaching assistant for the course Computer Architecture, Faculty
  Teaching assistant for the course Network Operating Systems, Faculty

A.Y. 2005-2006
  Teaching assistant for the course Fundamentals of Computer Science, Faculty
  Teaching assistant for the course Computer Architecture, Faculty
  Teaching assistant for the course Network Operating Systems, Faculty

A.Y. 2004-2005
  Teaching assistant for the course Fundamentals of Computer Science, Faculty
  Teaching assistant for the course Computer Architecture, Faculty
  Teaching assistant for the course Network Operating Systems, Faculty
  Teaching assistant for the course Introduction to Computer Science, Faculty

A.Y. 2003-2004
  Teaching assistant for the course Computer Architecture, Faculty

4.2 Master and Training courses

Since September 2003, he has been involved in teaching activities for the following master and training courses:

May 2005. Teacher for the module Business Intelligence: Analisi dei dati finalizzata al marketing of the training course La gestione delle funzioni aziendali nell'era dell'e-business, for the project M.ENT.E - Management of integrated enterprise, organized by Sviluppo Italia Calabria.

From December 2004 to January 2005. Teacher for the module Programmazione: Architetture e Sistemi operativi of the training course La gestione delle funzioni aziendali nell'era dell'e-business, for the project M.ENT.E - Management of integrated enterprise, organized by Sviluppo Italia Calabria.

5 Scientific papers

5.1 Published papers

G. Costa, F. Folino, A. Locane, G. Manco, R. Ortale. Data Mining for Effective Risk Analysis in a Bank Intelligence Scenario. In Proceedings of ICDE Workshop on Data Mining and Business Intelligence (DMBI 2007), Istanbul, Turkey, April 2007 (to appear).

G. Costa, A. D'Atri, G. Manco, R. Ortale, D. Saccà, S. Za. Logistic management in a mobile environment: an approach based on trajectory mining. In Proceedings of IEEE Workshop on Mobile Communications and Learning (MCL 2007), Sainte-Luce, Martinique, April 22-28, 2007 (to appear).

A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, A. Pugliese. XQueC: pushing queries to compressed XML data. In Proceedings of International Conference on Very Large Data Bases (VLDB), Berlin, Germany, 2003. ISBN 0-12-722442-4, pp. 1065-1068, Morgan Kaufmann, San Francisco, USA, 2003.

A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, A. Pugliese. Efficient query evaluation over compressed XML data. In Proceedings of International Conference on Extending Database Technology (EDBT), 2004. LNCS 2992, pp. 200-218, Springer-Verlag, Berlin, Germany, 2004.

A. Arion, A. Bonifati, G. Costa, S. D'Aguanno, I. Manolescu, A. Pugliese. XQueC: pushing queries to compressed XML data. Journees Bases de Donnees Avancees (BDA), Lyon, France, 2003.

G. Costa, G. Manco, R. Ortale, A. Tagarelli. Clustering of XML Documents by Structure based on Tree Matching and Merging. Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD), 2004: 314-325.

G. Costa, G. Manco, R. Ortale, A. Tagarelli. A Tree-based Approach to Clustering XML Documents by Structure. In Proceedings of International Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 2004: 137-148.

5.2 Technical reports

G. Costa, G. Manco, R. Ortale, A. Tagarelli. A Tree-based Approach to Clustering XML Documents by Structure. Istituto di calcolo e reti ad alte prestazioni (ICAR-CNR), technical report n.02, 2004.

In accordance with the Italian law 675/96 and with D. Lgs n.196 approved June 30th 2003, I hereby authorize the use of my personal and professional details contained in this curriculum vitae.

Rende, March 1, 2007
Giovanni Costa