An Approach to support Web Service Classification and Annotation
Marcello Bruno, Gerardo Canfora, Massimiliano Di Penta, and Rita Scognamiglio
RCOST - Research Centre on Software Technology
University of Sannio, Department of Engineering
Palazzo ex Poste, Via Traiano, Benevento, Italy

Abstract

Supporting the classification and semantic annotation of services is an important challenge for service-centric software engineering. Late binding and, in general, service matching approaches require services to be semantically annotated. Such a semantic annotation may, in turn, need to conform to a specific ontology. Also, a service description needs to relate properly to other similar services. This paper proposes an approach to i) automatically classify services into specific domains and ii) identify key concepts inside service textual documentation, and build a lattice of relationships between service annotations. Support Vector Machines and Formal Concept Analysis have been used to perform the two tasks. Results obtained classifying a set of web services show that the approach can provide useful insights in both the service publication and service retrieval phases.

Keywords: Ontology Building, Service Classification, Semantic Annotation

1 Introduction

One of the most relevant advantages of service-centric software engineering is the possibility a developer has to build his/her own system as a composition of one or more abstract services, i.e., semantic descriptions that can be matched at run time with the description of one or more concrete services. The subsumption relationship between an abstract service and the concrete services is determined by means of matching algorithms integrated in the service broker [18]. The choice of the actual concrete service to bind to an abstract service can also consider the concrete services' Quality of Service (QoS) attributes.

The scenario described above requires each service to have a semantic description, according to a specific ontology (footnote 1). Service semantic annotation is, however, a difficult task that, given the current state of the art, is often too expensive to be done in practice. Also, building and maintaining ontologies requires expertise and budgets that are not always available. Unfortunately, very often the only source of information available is a purely textual description of the service, sometimes extracted from source code comments. During service publication, it would therefore be useful to exploit this form of textual information to:

- permit the automatic classification of services to be published according to the broker's service ontological classification;
- support building and maintenance of domain-specific ontologies;
- aid the semantic annotation of a service with respect to the ontology. By detecting concepts inside the service textual documentation, it would be possible to see how the service concepts can be identified in the ontology, and how the service can be cataloged with respect to other existing services.

The usefulness of semi-automatic support for service classification and annotation is not limited to the service publication phase. In fact, it can also be used during service retrieval. Let us suppose that a service integrator is querying the broker (sending a free-text query) to search for a service performing a particular task. Such an automatic classification mechanism can be applied to free-text queries to:

- identify the category (or the scored list of categories) in which any service matching the query can be found;
- ease the browsing among the available services, once the service integrator chooses a category. As will become clearer later, the relationships between different services belonging to a specific domain can be represented, with some simplifications, using a concept lattice. Thus, it would be useful to develop a mechanism able to identify the lattice region in which the service the integrator is searching for could be found.

Footnote 1: There is work investigating the possibility of matching between services described with different ontologies. This aspect, however, is out of scope for this paper and will not be further considered.

This paper proposes an approach that, starting from the service textual description, performs an automatic classification (to catalog services across specific domains, such as telecommunications, finance, etc.), and then identifies service key concepts and their relationships as a concept lattice. The approach relies on Support Vector Machines (SVM) [22] and Information Retrieval (IR) Vector Spaces [10] for service classification, and uses Formal Concept Analysis (FCA) [23] to build concept lattices from service descriptions. The results showed that, even if a totally automatic construction of the lattice is not feasible, FCA still gives useful insights that help the publisher annotate the service and, when necessary, maintain the ontology.

The remainder of the paper is organized as follows. First, Section 2 provides an overview of the related literature and available tools. Then, Section 3 describes the proposed approach and its application scenarios. The first results obtained are presented and discussed in Section 4. Finally, Section 5 concludes.

2 Related Work

Text classification has seen a great deal of success with the application of machine learning, as shown by several studies [12, 15, 19, 24]. Among the many learning algorithms, SVM [22] appears to be the most promising. The first application of SVM to text classification was presented by Joachims [12]. The results were also confirmed by other studies [12, 24]. Joachims
[13] developed a theoretical learning model of text classification for SVMs, which provides some explanation of SVMs' performance in text classification. Di Lucca et al. [8] compared the effectiveness of different IR and machine learning approaches for classifying software maintenance requests; SVM outperformed the other approaches.

The manual construction and maintenance of domain-specific ontologies is expensive and complex work, requiring significant effort and time, as well as detailed knowledge of the domain to be modeled. Fridman Noy et al. [17] describe the knowledge model of Protege 2000 [3], an ontology editing and knowledge-acquisition environment. Tao [21] developed a Protege 2000 FCA-based plug-in for the building and maintenance of ontologies.

Up to now, some work has been done in the field of automatic support for ontology building. An example of using FCA in ontology merging has been proposed by Maedche et al. [16]. However, few papers have investigated the possibility of using FCA in ontology building and structuring. Cimiano et al. [7] discuss how FCA can be used to support ontology engineering and how ontologies can be exploited in FCA applications. They present the method FCA-Merge for merging ontologies following a bottom-up approach. Hele-Mai Haav [11] presented an approach, based on Natural Language Processing (NLP), for the automatic or semi-automatic discovery of domain-specific ontologies from free text. Kim and Compton [14] propose an ontology browsing mechanism relying on FCA and incremental knowledge acquisition mechanisms. The JBraindead Information Retrieval System [9] combines a free-text search engine with FCA to organize the results of a query. This work showed that concept lattices can be very useful to group relevant information in a free-text search task.

3 Approach Description

As stated in the introduction, the proposed free-text service classification approach aims at accomplishing a twofold task: i) perform the automatic classification of a service description, i.e., determine to which category/domain a service belongs; ii) locate a service description in a concept lattice. The remainder of this section explains in detail the three steps of the classification approach, depicted in Figure 1.

Figure 1. The service classification approach

3.1 Text Preprocessing

The first step preprocesses service textual descriptions. Textual descriptions of web services might be in the form of Web Service Description Language (WSDL) documents, coming from UDDI registries, as well as any other textual document provided as documentation support for the service itself. Words are extracted from documents and then preprocessed: they are filtered by means of a stop list, and normalized. The stop list contains articles, prepositions, and in general words that are frequent in each query, and therefore not discriminant (e.g., "web", "service" or "SOAP"). During the stemming phase, verbs are brought back to the infinitive, plurals to singulars, etc., using the Wordnet dictionary [5] and its Java API.

3.2 Service Classification

The classification of services into domain-specific classes is performed using the SVM method. In our implementation, the freely available LIBSVM tool [1] was used. As stated in the introduction, automatic service classification serves both during service publication (to classify the new service) and during service retrieval (to identify the class(es) to which the query should be restricted). In the context of this section, both web service documentation and user queries are considered as textual descriptions to be classified (represented as grey arrows in Figure 1).

Prior to applying SVM, the sequences of words obtained in the previous preprocessing phase must be mapped onto vectors. In our approach, the mapping is achieved using IR techniques. Each element of the vector corresponds to a word (or term) in a vocabulary extracted from the documents themselves. All words are weighted with the tf-idf metric. In this way, each document is mapped onto a vector using an injective function. The whole document set is encoded in a matrix, where rows represent documents (vectors) and columns are the weighted words.
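
The document-to-vector mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (which relies on WordNet's Java API and LIBSVM): the stop list, the tokenizer without stemming, and the weighting variant tf * log(N / df) are all assumptions made for illustration, since the paper does not state which tf-idf formula was used.

```python
import math
from collections import Counter

# Hypothetical stop list; the actual list used in the paper is not given.
STOP_WORDS = {"a", "an", "the", "and", "for", "to", "web", "service", "soap"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming here)."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOP_WORDS]

def tfidf_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({w for doc in tokenized for w in doc})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(w for doc in tokenized for w in set(doc))
    matrix = []
    for doc in tokenized:
        tf = Counter(doc)
        # Assumed weighting: raw term frequency times log(N / df).
        matrix.append([tf[w] * math.log(n_docs / df[w]) for w in vocabulary])
    return vocabulary, matrix

docs = [
    "Offering Loans And Credit Cards to Consumers",
    "Will accept and validate a Credit Card Number",
]
vocabulary, matrix = tfidf_matrix(docs)
```

With this weighting, a term that occurs in every document (here "credit") receives weight zero, while a term confined to one document (here "loans") receives a positive weight. The resulting rows are what a classifier such as an SVM would take as input.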
No information about the position or the meaning of the words is used, i.e., no semantics is captured by this matrix. A classification task using a supervised algorithm such as SVM or ANN requires a training set. In other words, our SVM needs to be trained with a pre-classified set of documents. This produces a model that will be used to predict the class to which a document belongs.

3.3 Building the Concept Lattice

Once the service has been classified, or the query redirected to a specific domain, key concepts need to be extracted from the service/query (footnote 2) and their lattice needs to be built. Clearly, such a lattice only represents a simplification of a domain ontology. The advantage of FCA comes from the way it shows how the presence or absence of attributes distinguishes objects, i.e., by means of super-concept/sub-concept relationships. A concept lattice can well represent service names and keywords belonging to a specific domain, highlighting is-a relationships between concepts and attributes.

Footnote 2: From this point on, we will refer to both simply as "service".

Without loss of generality, let us suppose we want to build a concept lattice from a set of service descriptions. First and foremost, we need to identify discriminant words, useful for the lattice. To this aim, we use the idf metric to eliminate words that do not appear in at least two documents (or more, depending on the number of documents/services belonging to that domain/class). More formally: a service context is a triple C = (S, K, I) where S is a set of service names (the objects), K is the set of service description keywords (the attributes), and I is the binary relation which indicates the presence or absence of words in documents. The obtained lattice (footnote 3) may be used to identify concepts for a specific domain, as well as the relationships between services belonging to a class.
Such a lattice aids a service publisher in providing the service semantic annotation, in that it tries, starting from just a textual description, to discover the hidden semantics of the service.

Footnote 3: A thorough example is available online at

4 Empirical Study

To validate the proposed approach and gain insights into its usefulness, we performed an experiment aiming to classify a set of web service documentations, and to build lattices for services belonging to some particular classes/domains. Results are presented and discussed in this section.

4.1 Case Study Description

Getting a suitable and extensive case study for experiments dealing with web services is still a challenge. Although, at the time of writing, several UDDI registries exist and are available for querying, too often the set of services obtained is almost useless: either the services are trivial, or their documentation is dummy. We used, as a case study, a set of pre-classified services available on the net [4] and downloaded from some UDDI registries. The set was composed of 205 services, classified in 11 classes, representing domains such as news, weather, credit card, etc. Each service is provided with a short description, extracted from the WSDL <documentation> tag. For example, a Credit Card Web Service presents the following description:

"Will accept and validate a Credit Card Number. Returns True for a Valid Number and Returns False for a Invalid Number"

4.2 Service Classification Results

SVM service classification performance was measured using leave-one-out validation [20]: each document (vector) in the set (matrix) was classified using an SVM model built from the remaining ones, and the percentage of correct classifications was measured. We found that different performance can be achieved by properly calibrating the SVM parameters, namely the kernel function and the gamma parameter. In addition to the first classification obtained for each service, we let the SVM find alternative classifications, for which we also measured the correct classification ratio, cumulatively with respect to the first classification ratio. This yields, for each service, an ordered list of the classes to which the service is most likely to belong, according to our classification algorithm. In other words, the algorithm ensures that the service belongs, with a given likelihood, to the first class, with a higher likelihood to one of the first two classes, etc.

For our case study, precision for the first score position is about 63%. This is not that high (up to 84% was obtained for software maintenance ticket classification by Di Lucca et al. [8]), but it is reasonable, considering the quality and quantity of the training set, and that the approach had to classify across 11 classes. When looking further, at the best two and best three class scores, performance increases, as expected, to 73% and 83% respectively. In conclusion, although the approach could not always suggest the correct class, it is at least able to indicate a limited group of classes to which the service could belong.

4.3 Building Service Concept Lattice

After classification was performed, and thus the services belonging to each category identified, we built concept lattices for each category using FCA.
As described in Section 3, words having high idf were pruned, in that they are considered not relevant for building the concept lattice. Figure 2 shows a lattice obtained by applying FCA to documents belonging to the Credit Card class. Each node in the lattice can show both concepts and objects. Concepts appear in the upper part of the node, objects in the lower part. In our case, concepts represent sets of keywords, while objects represent services. Generic concepts (referred to as top concepts) are placed in the upper part of the lattice (card, credit, Visa, Mastercard, etc.). The lattice makes it easy to find, for example, services that support Visa or Mastercard (service11 and service6). service6 is considered to be more specific than service11 because, according to its description, it can validate credit card numbers, while service11 does not advertise such a feature.

Going further in our lattice analysis, it can be noted that service16 can be used to validate credit cards and seems to be more specific than service1. This reflects what is specified in their descriptions:

- service1: credit card (whole text is: "Offering Loans And Credit Cards to Consumers")
- service16: accept card credit validate valid (whole text is: "Will accept and validate a Credit Card Number. Returns True for a Valid Number and Returns False for a Invalid Number")

Note that word relevance depends on the term document frequency. If a word appeared in only one document, it was pruned ("loans", "consumer", "invalid"). Other words were removed by the stop list. According to the proposed approach, new concepts are added to the lattice when terms appear in more than one document. Therefore, if a new service description containing the word "consumer" is used to expand the lattice, a new concept will be added, and the lattice structure will change. A further example of lattice building, related to the mail domain, is reported in a technical report available online [6].
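
The service1/service16 comparison above can be reproduced with a small Python sketch of FCA. The paper does not say which FCA algorithm or tool was used to compute the lattices, so the naive closure-by-intersection procedure below is an assumption, suitable only for small contexts; the function names are hypothetical, and the keyword sets are the ones quoted above, taken as already pruned and stop-listed.

```python
def formal_concepts(context):
    """Enumerate formal concepts of a context given as {object: set(attributes)}.

    Returns a list of (extent, intent) pairs, most general concept first.
    """
    all_attributes = set().union(*context.values())
    # Every intent is an intersection of object attribute sets; the full
    # attribute set is the intent of the (possibly empty) bottom extent.
    intents = {frozenset(all_attributes)}
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}
    concepts = []
    for intent in intents:
        # The extent collects every object that has all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))

def is_more_specific(concept_a, concept_b):
    """True if concept_a is a proper sub-concept of concept_b."""
    return concept_a[0] < concept_b[0]

# Keyword sets as listed in the text for the two services.
context = {
    "service1": {"credit", "card"},
    "service16": {"accept", "card", "credit", "validate", "valid"},
}
concepts = formal_concepts(context)
general, specific = concepts
```

For this two-service context the sketch yields two concepts: a general one shared by both services with intent {card, credit}, and a more specific one containing only service16, which matches the observation that service16 is more specific than service1.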

4.4 Discussion

The results obtained for service classification showed how the approach can be useful. The automatic classification helps by identifying, with a likelihood of 83%, an ordered list of 3 out of 11 classes among which the service may be classified. The service publisher can exploit this result in several ways. First and foremost, he/she can accept one of the classifications proposed by the automatic tool, possibly manually refining the choice. In this case, the tool helps by reducing the publisher's degrees of freedom in classification to a limited number of service classes/domains.

It may happen that the proposed classes are completely different from the publisher's expectations. If a weather service is classified in the finance or mail domain, it means that the service description may be ambiguous. In this case the tool draws the publisher's attention to the problem, highlighting the need for correcting the deployed service documentation. This will reduce the risk that, during service search, the service is never found by queries related to its own features and is instead found by queries related to other kinds of features.

Regarding concept lattice building, it appears immediately clear that completely automatic ontology building or semantic annotation is unfeasible. This, however, was not our purpose. Instead, we found that, while human supervision and intervention cannot be avoided, useful insights can be obtained from service lattices. In fact, by highlighting relationships between services, a lattice can help to build and refine the service semantic annotation. By looking at the lattice, the publisher may find that some keywords simply make the service annotation heavier, or even misleading, and can thus decide to remove or replace them. When a service developer publishes some services, he/she is aware of the genericity/specificity of the services. If this is not reflected in the lattice, it means that the service descriptions are misleading or incomplete. More generally, if the web service textual description is incomplete, too generic, or contains phrases not properly related to the service features, it may be hard to automatically classify it and to find its correct position in the concept lattice.

Figure 2. Concept lattice of services: the credit card example

Much in the same way, let us suppose that some domain-specific services have already been published and semantically annotated according to a specific ontology, and that we want to publish a new service. By using proper tools, such as the FCA plug-in for Protege 2000 [2], it is possible to extract a context from the ontology. If we add a row, representing our service keywords, to such a context, and then build a lattice using FCA, we will be able to immediately highlight how our service can be annotated with respect to the ontology.

The second consideration we can make about the usefulness of these concept lattices is related to ontology building. As stated in the introduction, service annotations coherent with ontologies may be necessary to allow automatic service matching for late binding.
The concept identification made by FCA, as well as the lattice structure of these concepts, although giving a limited view of an ontology, can indeed be useful for its building, completion or maintenance. In fact, when publishing a new service, new concepts may need to be added, especially if the ontology is not yet complete.

Conversely, when a user performs a query to retrieve a service, the following scenario can occur. First, the user is guided, by the SVM classifier, to focus on some particular domains. In these domains, the portion of the lattice of interest is highlighted, significantly easing the service search.

Finally, our studies suggested that lattices appear to be useful when focusing on well-restricted domains. Wide domains and upper ontologies would, in fact, generate unmanageable and difficult-to-understand lattices.

5 Conclusions

This paper presented an approach, based on machine learning techniques, to support service classification and annotation. Starting from free-text service documentation, services are automatically classified in classes/domains using Support Vector Machines. Subsequently, Formal Concept Analysis is used to build a service concept lattice for each specific domain. Results of a classification experiment on a set of 205 services downloaded from the web showed the feasibility of the approach. Although needing user guidance, automatic classification, by proposing the nearest three classes out of 11 with a likelihood of 83%, can ease and support service publication and annotation. Much in the same way, the obtained concept lattices highlighted relationships existing between services, and aided the identification of domain key concepts. Finally, we showed with some examples how the same approach can also be integrated in the service retrieval mechanism.

Work in progress is devoted to further improving the proposed technique, to confirming the obtained results with other case studies, and to integrating the approach in a service broker we are developing in a project in cooperation with an Italian software company.

References

[1] Libsvm tool. cjlin/libsvm/.
[2] Plugin for protege
[3] Protege
[4] Textual documentation of web services and classified services.
[5] Wordnet dictionary. wn/.
[6] M. Bruno, G. Canfora, M. Di Penta, and R. Scognamiglio. An approach to support web service classification and annotation. Technical report, RCOST - University of Sannio, Italy, Sep
[7] P. Cimiano, S. Staab, and J. Tane. Deriving concept hierarchies from text by smooth formal concept analysis. In Proceedings of the GI Workshop Lehren - Lernen - Wissen - Adaptivität (LLWA),
[8] G. Di Lucca, M. Di Penta, and S. Gradara. An approach to classify software maintenance requests. In Proceedings of IEEE International Conference on Software Maintenance, pages , Montréal, QC, Canada, Oct
[9] P. W. Eklund, editor. Browsing Search Results via Formal Concept Analysis: Automatic Selection of Attributes, volume 2961/2004 of Lecture Notes in Computer Science. Springer, Feb
[10] W. B. Frakes and R. Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ,
[11] H.-M. Haav. An application of inductive concept analysis to construction of domain-specific ontologies. In B. Thalheim and G. Fiedler, editors, Emerging Database Research in East Europe, Proceedings of the Pre-Conference Workshop of VLDB 2003, volume 14/03 of Computer Science Reports, pages Brandenburg University of Technology at Cottbus, Nov
[12] T. Joachims. Text categorization with support vector machines: learning with many relevant features. European Conf. Mach. Learning, ECML98, Apr
[13] T. Joachims. A statistical learning model of text classification for support vector machines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ,
[14] M. Kim and P. Compton. Formal concept analysis for domain-specific document retrieval systems. Lecture Notes in Computer Science, 2256,
[15] D. D. Lewis. Representation and learning in information retrieval. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, US,
[16] E. Maedche and G. Stumme. FCA-MERGE: Bottom-up merging of ontologies. Jan
[17] N. F. Noy, R. W. Fergerson, and M. A. Musen. The knowledge model of Protégé-2000: Combining interoperability and flexibility. Lecture Notes in Computer Science, 1937,
[18] M. Paolucci, T. Kawamura, T. R. Payne, and K. Sycara. Semantic matching of web services capabilities. In First International Semantic Web Conference (ISWC 2002), volume 2348, pages Springer-Verlag, June
[19] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1-47,
[20] M. Stone. Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society B, 36: ,
[21] G. Tao. Using formal concept analysis for ontology structuring and building. PhD thesis,
[22] V. N. Vapnik. Statistical Learning Theory. John Wiley, Sept
[23] B. Ganter and R. Wille. Formal Concept Analysis. Mathematical Foundations, Springer Verlag,
[24] Y. Yang and X. Liu. A re-examination of text categorization methods. In M. A. Hearst, F. Gey, and R. Tong, editors, Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, pages 42-49, Berkeley, US, ACM Press, New York, US.
More informationOntology-based Domain Modelling for Consistent Content Change Management
Ontology-based Domain Modelling for Consistent Content Change Management Muhammad Javed 1, Yalemisew Abgaz 2, Claus Pahl 3 Centre for Next Generation Localization (CNGL), School of Computing, Dublin City
More informationMining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
More informationSemantic Concept Based Retrieval of Software Bug Report with Feedback
Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop
More informationExperiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
More informationA Service Modeling Approach with Business-Level Reusability and Extensibility
A Service Modeling Approach with Business-Level Reusability and Extensibility Jianwu Wang 1,2, Jian Yu 1, Yanbo Han 1 1 Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing,
More informationFolksonomies versus Automatic Keyword Extraction: An Empirical Study
Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk
More informationThe Enron Corpus: A New Dataset for Email Classification Research
The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu
More informationMIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
More informationOptimization of Image Search from Photo Sharing Websites Using Personal Data
Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search