An Approach to support Web Service Classification and Annotation
Marcello Bruno, Gerardo Canfora, Massimiliano Di Penta, and Rita Scognamiglio
RCOST - Research Centre on Software Technology
University of Sannio, Department of Engineering
Palazzo ex Poste, Via Traiano, Benevento, Italy

Abstract
The need to support the classification and semantic annotation of services is an important challenge for service-centric software engineering. Late binding and, more generally, service matching approaches require services to be semantically annotated. Such a semantic annotation may, in turn, need to be made in agreement with a specific ontology. A service description also needs to relate properly to other, similar services. This paper proposes an approach to i) automatically classify services into specific domains and ii) identify key concepts inside service textual documentation and build a lattice of relationships between service annotations. Support Vector Machines and Formal Concept Analysis are used to perform the two tasks. Results obtained by classifying a set of web services show that the approach can provide useful insights in both the service publication and the service retrieval phases.

Keywords: Ontology Building, Service Classification, Semantic Annotation

1 Introduction
One of the most relevant advantages of service-centric software engineering is that a developer can build his/her own system as a composition of one or more abstract services, i.e., semantic descriptions that can be matched at run time with the descriptions of one or more concrete services. The subsumption relationship between an abstract service and the concrete services is determined by means of matching algorithms integrated in the service broker [18]. The choice of the concrete service to bind to an abstract service can also take into account the Quality of Service (QoS) attributes of the concrete services.
The scenario described above requires each service to have a semantic description according to a specific ontology (footnote 1). Service semantic annotation is, however, a difficult task that, given the current state of the art, is often too expensive to be carried out in practice. Building and maintaining ontologies likewise requires expertise and budgets that are not always available. Unfortunately, very often the only source of information available is a purely textual description of the service, sometimes extracted from source code comments. During service publication, it would therefore be useful to exploit this textual information to:
- permit the automatic classification of the services to be published according to the broker's ontological classification of services;
- support the building and maintenance of domain-specific ontologies;
- aid the semantic annotation of a service with respect to the ontology. By detecting concepts inside the service textual documentation, it becomes possible to see how the service concepts map to the ontology, and how the service can be cataloged with respect to other existing services.

The usefulness of semi-automatic support for service classification and annotation is not limited to the service publication phase. It can also be used during service retrieval. Suppose that a service integrator queries the broker with a free-text query to search for a service performing a particular task. The automatic classification mechanism can be applied to free-text queries to:
- identify the category (or the scored list of categories) in which any service matching the query can be found;
- ease the browsing among the available services, once the service integrator has chosen a category. As will become clearer later, the relationships between different services belonging to a specific domain can be represented, with some simplifications, using a concept lattice. Thus, it would be useful to develop a mechanism able to identify the lattice region in which the service the integrator is searching for could be found.

Footnote 1: There is work investigating the possibility of matching between services described with different ontologies. This aspect, however, is out of scope for this paper and will not be considered further.

This paper proposes an approach that, starting from the service textual description, performs an automatic classification (to catalog services across specific domains, such as telecommunications, finance, etc.), and then identifies service key concepts and their relationships as a concept lattice. The approach relies on Support Vector Machines (SVM) [22] and Information Retrieval (IR) Vector Spaces [10] for service classification, and uses Formal Concept Analysis (FCA) [23] to build concept lattices from service descriptions. The results showed that, even if a totally automatic construction of the lattice is not feasible, FCA still provides useful insights that help the publisher annotate the service and, when necessary, maintain the ontology.

The remainder of the paper is organized as follows. Section 2 provides an overview of the related literature and available tools. Section 3 describes the proposed approach and its application scenarios. The first results obtained are presented and discussed in Section 4. Finally, Section 5 concludes.

2 Related Work
Text classification has seen a great deal of success through the application of machine learning, as documented by several studies [12, 15, 19, 24]. Among the many learning algorithms, SVM [22] appears to be the most promising. The first application of SVM to text classification was presented by Joachims [12]. The results were later confirmed by other studies [12, 24]. Joachims et al.
[13] developed a theoretical learning model of text classification for SVMs, which provides some explanation of the performance of SVMs in text classification. Di Lucca et al. [8] compared the effectiveness of different IR and machine learning approaches for classifying software maintenance requests. In their study, SVM outperformed the other approaches.

The manual construction and maintenance of domain-specific ontologies is expensive and complex work, requiring significant effort and time, as well as detailed knowledge of the domain to be modeled. Fridman Noy et al. [17] describe the knowledge model of Protege 2000 [3], an ontology editing and knowledge-acquisition environment. Tao [21] developed an FCA-based Protege 2000 plug-in for building and maintaining ontologies.

Some work has been done so far on automatic support for ontology building. An example of using FCA in ontology merging was proposed by Maedche et al. [16]. However, few papers have investigated the possibility of using FCA in ontology building and structuring. Cimiano et al. [7] discuss how FCA can be used to support ontology engineering and how ontologies can be exploited in FCA applications. They present the method FCA-Merge for merging ontologies following a bottom-up approach. Hele-Mai Haav [11] presented an approach, based on Natural Language Processing (NLP), for the automatic or semi-automatic discovery of domain-specific ontologies from free text. Kim and Compton [14] propose an ontology browsing mechanism relying on FCA and incremental knowledge acquisition mechanisms. The JBraindead Information Retrieval System [9] combines a free-text search engine with FCA to organize the results of a query. This work showed that concept lattices can be very useful for grouping relevant information in a free-text search task.
3 Approach Description
As stated in the introduction, the proposed free-text service classification approach has a twofold aim: i) to automatically classify a service description, i.e., to determine which category/domain a service belongs to; ii) to locate a service description in a concept lattice. The remainder of this section explains in detail the three steps of the classification approach, depicted in Figure 1.

Figure 1. The service classification approach
3.1 Text Preprocessing
The first step preprocesses the service textual descriptions. The textual description of a web service might take the form of a Web Service Description Language (WSDL) document coming from a UDDI registry, or of any other textual document provided as documentation for the service itself. Words are extracted from the documents, filtered by means of a stop list, and then normalized. The stop list contains articles, prepositions, and in general words that are frequent in every query and therefore not discriminant ("web", "service" or "SOAP"). During the stemming phase, verbs are brought back to the infinitive, plurals to singulars, etc., using the Wordnet dictionary [5] and its Java API.

3.2 Service Classification
The classification of services into domain-specific classes is performed using the SVM method. In our implementation, the freely available LIBSVM tool [1] was used. As stated in the introduction, automatic service classification serves both during service publication (to classify the new service) and during service retrieval (to identify the class or classes to which the query should be restricted). In the context of this section, both web service documentation and user queries are treated as textual descriptions to be classified (represented as grey arrows in Figure 1).

Before SVM is applied, the sequences of words obtained in the preprocessing phase must be mapped onto vectors. In our approach, the mapping is achieved using IR techniques. Each element of the vector corresponds to a word (or term) in a vocabulary extracted from the documents themselves. All words are weighted with the tf-idf metric. In this way, each document is mapped onto a vector using an injective function. The whole document set is encoded in a matrix, where rows represent documents (vectors) and columns are the weighted words.
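The mapping from preprocessed words to weighted vectors can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the stop list is a small hypothetical one, the WordNet-based normalization is omitted, and the weight is taken to be raw term frequency multiplied by log(N/df), since the exact tf-idf variant is not stated.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: a few function words plus the non-discriminant
# domain terms mentioned in the text ("web", "service", "SOAP").
STOP_WORDS = {"a", "an", "and", "for", "the", "to", "will", "web", "service", "soap"}


def tokenize(text):
    """Lower-case the text, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one tf-idf row per document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({w for tokens in token_lists for w in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(w for tokens in token_lists for w in set(tokens))
    matrix = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Assumed weighting: raw term frequency times log(N / df).
        matrix.append([tf[w] * math.log(n_docs / df[w]) for w in vocabulary])
    return vocabulary, matrix


docs = [
    "Will accept and validate a credit card number",
    "Offering loans and credit cards to consumers",
    "Returns the weather forecast for a city",
]
vocabulary, matrix = tfidf_matrix(docs)
for row in matrix:
    print([round(weight, 2) for weight in row])
```

Because no stemming is applied in this sketch, "card" and "cards" remain distinct columns; a normalization step like the one described in Section 3.1 would merge them.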
No information about the position or the meaning of the words is used, i.e., this matrix carries no semantics. A classification task using a supervised algorithm such as SVM or ANN requires a training set. In other words, our SVM needs to be trained with a pre-classified set of documents. Training produces a model matrix that is then used to predict which class a document to be classified belongs to.

3.3 Building the Concept Lattice
Once the service has been classified, or the query has been redirected to a specific domain, key concepts need to be extracted from the service/query (footnote 2) and their lattice needs to be built. Clearly, such a lattice only represents a simplification of a domain ontology. The advantage of FCA comes from the way it shows how the presence or absence of attributes distinguishes objects, i.e., by means of super-concept/sub-concept relationships. A concept lattice can represent well the service names and keywords belonging to a specific domain, highlighting is-a relationships between concepts and attributes.

Footnote 2: From this point on, we refer to both simply as services.

Without loss of generality, let us suppose we want to build a concept lattice from a set of service descriptions. First and foremost, we need to identify discriminant words, useful for the lattice. To this aim, we use the idf metric to eliminate words that do not appear in at least two or more documents, with the threshold depending on the number of documents/services belonging to that domain/class. More formally: a service context is a triple C = (S, K, I), where S is a set of service names (the objects), K is the set of service description keywords (the attributes), and I is the binary relation that indicates the presence or absence of words in documents. The obtained lattice (footnote 3) may be used to identify concepts for a specific domain, as well as the relationships between services belonging to a class.
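To make the service context C = (S, K, I) more concrete, the following Python sketch enumerates the formal concepts of a tiny context. The service names and keywords are hypothetical, and the enumeration is a naive one (it closes every subset of services), chosen for readability rather than taken from any FCA tool mentioned in the paper.

```python
from itertools import combinations

# Hypothetical service context C = (S, K, I): each service name (object) maps
# to the set of keywords (attributes) found in its description.
context = {
    "serviceA": {"card", "credit"},
    "serviceB": {"card", "credit", "validate"},
    "serviceC": {"card", "credit", "validate", "visa"},
}


def common_keywords(services, context):
    """Keywords shared by every service in `services` (all keywords if empty)."""
    shared = set().union(*context.values())
    for s in services:
        shared &= context[s]
    return shared


def services_having(keywords, context):
    """Services whose description contains every keyword in `keywords`."""
    return {s for s, kws in context.items() if keywords <= kws}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of services.

    Exponential in the number of services, so only suitable for tiny examples.
    """
    concepts = set()
    names = sorted(context)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = common_keywords(subset, context)
            extent = services_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(context)
# Print from the most general concept (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one lattice node: the services it covers and the keywords they all share. A concept with a smaller extent and a larger intent is a sub-concept of the more general ones, which is the super-concept/sub-concept ordering the lattice displays.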
Such a lattice aids a service publisher in providing the service semantic annotation, in that it tries, starting from just a textual description, to discover the hidden semantics of the service.

Footnote 3: A thorough example is available online at

4 Empirical Study
To validate the proposed approach and gain insights into its usefulness, we performed an experiment that aimed to classify a set of web service documentation and to build lattices for the services belonging to some particular classes/domains. Results are presented and discussed in this section.

4.1 Case Study Description
Getting a suitable and extensive case study for experiments dealing with web services is still a challenge. Although, at the time of writing, several UDDI registries exist and are available for querying, too often the set of services obtained is almost useless: either the services are trivial, or their documentation is dummy. We used, as a case study, a set of pre-classified services available on the net [4] and downloaded from some UDDI registries. The set was composed of 205 services, classified into 11 classes representing domains such as news, weather, credit card, etc. As said, each service comes with a short description, extracted from the WSDL <documentation> tag. For example, a Credit Card Web Service presents the following description:
"Will accept and validate a Credit Card Number. Returns True for a Valid Number and Returns False for a Invalid Number"

4.2 Service Classification Results
SVM service classification performance was measured using leave-one-out validation [20]: each document (vector) in the set (matrix) was classified using an SVM model built from the remaining ones, and the percentage of correct classifications was measured. We found that different performance levels can be achieved by properly calibrating the SVM parameters, namely the kernel function and the gamma parameter. Besides the first classification obtained for each service, we let the SVM find alternative classifications, for which we also measured the correct classification ratio, cumulatively with respect to the first classification ratio. This yields, for each service, an ordered list of the classes to which the service is most likely to belong according to our classification algorithm. In other words, the algorithm indicates that the service belongs, with a given likelihood, to the first class, with a higher likelihood to one of the first two classes, etc.

For our case study, precision at the first score position is about 63%. This is not that high (up to 84% was obtained for software maintenance ticket classification by Di Lucca et al. [8]), but it is reasonable considering the quality and quantity of the training set, and that the approach had to classify across 11 classes. When looking further, at the best two and best three class scores, performance clearly increases, to 73% and 83% respectively. In conclusion, although the approach could not always suggest the correct class, it is at least able to indicate a limited group of classes to which the service could belong.

4.3 Building Service Concept Lattice
After classification had been performed, and thus the services belonging to each category had been identified, we built concept lattices for each category using FCA.
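Looking back at the validation procedure of Section 4.2, the combination of leave-one-out validation with ranked class lists can be sketched in Python as follows. This is only an illustration under stated assumptions: a nearest-centroid scorer based on cosine similarity stands in for the SVM (the paper used LIBSVM), and the six documents are hypothetical toy data, so the figures it produces say nothing about the 63%, 73% and 83% reported above.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_classes(train, query):
    """Stand-in for the SVM: rank classes by similarity to each class centroid."""
    centroids = defaultdict(Counter)
    for vector, label in train:
        centroids[label].update(vector)  # sum the term counts per class
    return sorted(centroids, key=lambda label: cosine(query, centroids[label]), reverse=True)


def leave_one_out_top_k(dataset, k):
    """Fraction of documents whose true class is among the k best-ranked classes."""
    hits = 0
    for i, (vector, label) in enumerate(dataset):
        train = dataset[:i] + dataset[i + 1:]  # hold out document i
        if label in rank_classes(train, vector)[:k]:
            hits += 1
    return hits / len(dataset)


# Hypothetical toy data: (term-count vector, class label).
dataset = [
    (Counter({"credit": 1, "card": 1, "validate": 1}), "credit card"),
    (Counter({"credit": 1, "card": 1, "loan": 1}), "credit card"),
    (Counter({"weather": 1, "forecast": 1, "city": 1}), "weather"),
    (Counter({"weather": 1, "temperature": 1, "city": 1}), "weather"),
    (Counter({"news": 1, "headline": 1, "city": 1}), "news"),
    (Counter({"news": 1, "headline": 1, "credit": 1}), "news"),
]
for k in (1, 2, 3):
    print(k, leave_one_out_top_k(dataset, k))
```

The top-k ratio can only grow with k, which mirrors the cumulative way the best-two and best-three scores are reported in Section 4.2.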
As described in Section 3, words having high idf were pruned, since they are not considered relevant for building the concept lattice. Figure 2 shows a lattice obtained by applying FCA to the documents belonging to the Credit Card class. Each node in the lattice can show both concepts and objects. Concepts appear in the upper part of the node, objects in the lower part. In our case, concepts represent sets of keywords, while objects represent services. Generic concepts (referred to as top concepts) are placed in the upper part of the lattice (card, credit, Visa, Mastercard, etc.). The lattice makes it easy to find, for example, services that support Visa or Mastercard (service11 and service6). service6 is considered more specific than service11 because, according to its description, it can validate credit card numbers, while service11 does not advertise such a feature. Going further in our lattice analysis, it can be noted that service16 can be used to validate credit cards and seems to be more specific than service1. This reflects what is specified in their descriptions:
- service1: credit card (whole text: "Offering Loans And Credit Cards to Consumers")
- service16: accept card credit validate valid (whole text: "Will accept and validate a Credit Card Number. Returns True for a Valid Number and Returns False for a Invalid Number")

Note that word relevance depends on the term document frequency. A word that appeared in only one document was pruned ("loans", "consumer", "invalid"). Other words were removed by the stop list. According to the proposed approach, new concepts are added to the lattice when terms appear in more than one document. Therefore, if a new service description containing the word "consumer" is used to expand the lattice, a new concept will be added, and the lattice structure will change. A further example of lattice building, related to the mail domain, is reported in a technical report available online [6].
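The pruning rule and the "more specific than" relation discussed above can be sketched in a few lines of Python. The keyword sets for service1 and service16 follow the two descriptions quoted in the text (after stop-word removal and with singular forms assumed); serviceX is a hypothetical third description added only so that some of service16's keywords occur in more than one document.

```python
from collections import Counter

# Keywords left after the stop list. service1 and service16 follow the two
# quoted descriptions; serviceX is a hypothetical third description.
keywords = {
    "service1": {"loan", "credit", "card", "consumer"},
    "service16": {"accept", "validate", "credit", "card", "valid", "invalid"},
    "serviceX": {"accept", "validate", "valid", "card"},
}


def prune_rare(keywords, min_services=2):
    """Keep only keywords that occur in at least `min_services` descriptions."""
    df = Counter(w for words in keywords.values() for w in words)
    return {s: {w for w in words if df[w] >= min_services} for s, words in keywords.items()}


def more_specific(a, b, pruned):
    """True if service `a` carries every keyword of `b` plus at least one more."""
    return pruned[b] < pruned[a]


pruned = prune_rare(keywords)
print(sorted(pruned["service1"]))
print(sorted(pruned["service16"]))
print(more_specific("service16", "service1", pruned))
```

Under these assumptions, "loan", "consumer" and "invalid" disappear because each occurs in a single description, and service16 ends up below service1 in the ordering because its keyword set strictly contains that of service1.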
4.4 Discussion
The results obtained for service classification show how the approach can be useful. The automatic classification clearly helps by identifying, with a likelihood of 83%, an ordered list of 3 out of 11 classes among which the service may be classified. The service publisher can exploit this result in several ways. First and foremost, he/she can accept one of the classifications proposed by the automatic tool, possibly refining the choice manually. In this case, the tool helps by reducing the publisher's classification degrees of freedom to a limited number of service classes/domains. The proposed classes may also turn out to be completely different from what the publisher expects. If a weather service is classified in the finance or mail domain, the service description may be ambiguous. In this case the tool draws the publisher's attention to the problem, highlighting the need to correct the deployed service documentation. This reduces the risk that, during service search, the service is never found by queries related to its own features and is instead found by queries related to other kinds of features.

Regarding concept lattice building, it is immediately clear that completely automatic ontology building or semantic annotation is unfeasible. This, however, was not our purpose. Instead, we found that, while human supervision and intervention cannot be avoided, useful insights can be obtained from service lattices. In fact, by highlighting relationships between services, a lattice can help to build and refine the service semantic annotation. By looking at the lattice, the publisher may find that some keywords simply make the service annotation heavier, or even misleading, and can thus decide to remove or replace them. When a service developer publishes some services, he/she is aware of their genericity/specificity. If this is not reflected in the lattice, the service descriptions are misleading or incomplete. More generally, if the web service textual description is incomplete, too generic, or contains phrases not properly related to the service features, it may be hard to classify it automatically and to find its correct position in the concept lattice.

Figure 2. Concept lattice of services: the credit card example

Much in the same way, let us suppose that some domain-specific services have already been published and semantically annotated according to a specific ontology, and that we want to publish a new service. By using proper tools, such as the FCA plug-in for Protege 2000 [2], it is possible to extract a context from the ontology. If we add a row representing our service keywords to such a context and then build a lattice using FCA, we can immediately see how our service can be annotated with respect to the ontology.

The second consideration about the usefulness of these concept lattices relates to ontology building. As stated in the introduction, service annotations coherent with ontologies may be necessary to allow automatic service matching for late binding.
The concept identification made by FCA, as well as the lattice structure of these concepts, although giving only a limited view of an ontology, can indeed be useful for building, completing or maintaining it. In fact, when a new service is published, new concepts may need to be added, especially if the ontology is not yet complete.

Conversely, when a user performs a query to retrieve a service, the following scenario can occur. First, the user is guided by the SVM classifier to focus on some particular domains. Within these domains, the portion of the lattice of interest is highlighted, significantly easing the service search.

Finally, our studies suggest that lattices are useful when focusing on well-restricted domains. Wide domains and upper ontologies would, in fact, generate unmanageable and hard-to-understand lattices.
5 Conclusions
This paper presented an approach, based on machine learning techniques, to support service classification and annotation. Starting from free-text service documentation, services are automatically classified into classes/domains using Support Vector Machines. Subsequently, Formal Concept Analysis is used to build a service concept lattice for each specific domain. The results of a classification experiment on a set of 205 services downloaded from the web showed the feasibility of the approach. Although it needs user guidance, automatic classification, by proposing the nearest three classes out of 11 with a likelihood of 83%, can ease and support service publication and annotation. Much in the same way, the obtained concept lattices highlighted the relationships existing between services, and aided the identification of domain key concepts. Finally, we showed with some examples how the same approach can also be integrated in the service retrieval mechanism.

Work in progress is devoted to further improving the proposed technique, to confirming the obtained results with other case studies, and to integrating the approach in a service broker we are developing in a project in cooperation with an Italian software company.

References
[1] Libsvm tool. cjlin/libsvm/.
[2] Plugin for protege
[3] Protege
[4] Textual documentation of web services and classified services.
[5] Wordnet dictionary. wn/.
[6] M. Bruno, G. Canfora, M. Di Penta, and R. Scognamiglio. An approach to support web service classification and annotation. Technical report, RCOST - University of Sannio, Italy, Sep
[7] P. Cimiano, S. Staab, and J. Tane. Deriving concept hierarchies from text by smooth formal concept analysis. In Proceedings of the GI Workshop Lehren - Lernen - Wissen - Adaptivität (LLWA),
[8] G. Di Lucca, M. Di Penta, and S. Gradara. An approach to classify software maintenance requests. In Proceedings of IEEE International Conference on Software Maintenance, pages , Montréal, QC, Canada, Oct
[9] P. W. Eklund, editor. Browsing Search Results via Formal Concept Analysis: Automatic Selection of Attributes, volume 2961/2004 of Lecture Notes in Computer Science. Springer, Feb
[10] W. B. Frakes and R. Baeza-Yates. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ,
[11] H.-M. Haav. An application of inductive concept analysis to construction of domain-specific ontologies. In B. Thalheim and G. Fiedler, editors, Emerging Database Research in East Europe, Proceedings of the Pre-Conference Workshop of VLDB 2003, volume 14/03 of Computer Science Reports, pages . Brandenburg University of Technology at Cottbus, Nov
[12] T. Joachims. Text categorization with support vector machines: learning with many relevant features. European Conf. Mach. Learning, ECML98, Apr
[13] T. Joachims. A statistical learning model of text classification for support vector machines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ,
[14] M. Kim and P. Compton. Formal concept analysis for domain-specific document retrieval systems. Lecture Notes in Computer Science, 2256,
[15] D. D. Lewis. Representation and learning in information retrieval. PhD thesis, Department of Computer Science, University of Massachusetts, Amherst, US,
[16] E. Maedche and G. Stumme. FCA-MERGE: Bottom-up merging of ontologies. Jan
[17] N. F. Noy, R. W. Fergerson, and M. A. Musen. The knowledge model of Protégé-2000: Combining interoperability and flexibility. Lecture Notes in Computer Science, 1937,
[18] M. Paolucci, T. Kawamura, T. R. Payne, and K. Sycara. Semantic matching of web services capabilities. In First International Semantic Web Conference (ISWC 2002), volume 2348, pages . Springer-Verlag, June
[19] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1-47,
[20] M. Stone. Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society B, 36: ,
[21] G. Tao. Using formal concept analysis for ontology structuring and building. PhD thesis,
[22] V. N. Vapnik. Statistical Learning Theory. John Wiley, Sept
[23] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer Verlag,
[24] Y. Yang and X. Liu. A re-examination of text categorization methods. In M. A. Hearst, F. Gey, and R. Tong, editors, Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, pages 42-49, Berkeley, US, ACM Press, New York, US.
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany Abstract: Independent
A Proposed Algorithm for Spam Filtering Emails by Hash Table Approach
International Research Journal of Applied and Basic Sciences 2013 Available online at www.irjabs.com ISSN 2251-838X / Vol, 4 (9): 2436-2441 Science Explorer Publications A Proposed Algorithm for Spam Filtering
SmartLink: a Web-based editor and search environment for Linked Services
SmartLink: a Web-based editor and search environment for Linked Services Stefan Dietze, Hong Qing Yu, Carlos Pedrinaci, Dong Liu, John Domingue Knowledge Media Institute, The Open University, MK7 6AA,
AUTOMATIC CLASSIFICATION OF QUESTIONS INTO BLOOM'S COGNITIVE LEVELS USING SUPPORT VECTOR MACHINES
AUTOMATIC CLASSIFICATION OF QUESTIONS INTO BLOOM'S COGNITIVE LEVELS USING SUPPORT VECTOR MACHINES Anwar Ali Yahya *, Addin Osman * * Faculty of Computer Science and Information Systems, Najran University,
Semantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of Tunis High Institute
Spam detection with data mining method:
Spam detection with data mining method: Ensemble learning with multiple SVM based classifiers to optimize generalization ability of email spam classification Keywords: ensemble learning, SVM classifier,
Inner Classification of Clusters for Online News
Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant
Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms
Reverse Engineering of Relational Databases to Ontologies: An Approach Based on an Analysis of HTML Forms Irina Astrova 1, Bela Stantic 2 1 Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn,
Automatic Annotation Wrapper Generation and Mining Web Database Search Result
Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil Nadu, India
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France Massimiliano
Semantically Enhanced Web Personalization Approaches and Techniques
Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljanić, Lidia Rovan, Mirta Baranović Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,
MACHINE LEARNING BASED TICKET CLASSIFICATION IN ISSUE TRACKING SYSTEMS
MACHINE LEARNING BASED TICKET CLASSIFICATION IN ISSUE TRACKING SYSTEMS Mucahit Altintas (a,b,c), A. Cuneyd Tantug (d,b) b Istanbul Technical University, Istanbul,
How To Use Networked Ontology In E Health
A practical approach to create ontology networks in e-health: The NeOn take Tomás Pariente Lobo 1, *, Germán Herrero Cárcel 1, 1 A TOS Research and Innovation, ATOS Origin SAE, 28037 Madrid, Spain. Abstract.
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe
LexOnt: A Semi-automatic Ontology Creation Tool for Programmable Web
LexOnt: A Semi-automatic Ontology Creation Tool for Programmable Web Knarig Arabshian Bell Labs, Alcatel-Lucent Murray Hill, NJ Peter Danielsen Bell Labs, Alcatel-Lucent Naperville, IL Sadia Afroz Drexel
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP's numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
How To Cluster On A Search Engine
Volume 2, Issue 2, February 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A REVIEW ON QUERY CLUSTERING
Analysis of Data Mining Concepts in Higher Education with Needs to Najran University
590 Analysis of Data Mining Concepts in Higher Education with Needs to Najran University Mohamed Hussain Tawarish 1, Farooqui Waseemuddin 2 Department of Computer Science, Najran Community College. Najran
HELP DESK SYSTEMS. Using CaseBased Reasoning
HELP DESK SYSTEMS Using CaseBased Reasoning Topics Covered Today What is Help-Desk? Components of HelpDesk Systems Types Of HelpDesk Systems Used Need for CBR in HelpDesk Systems GE Helpdesk using ReMind
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Web services with WebSphere Studio: Deploy and publish
Web services with WebSphere Studio: Deploy and publish Table of Contents If you're viewing this document online, you can click any of the topics below to link directly to that section. 1. Introduction...
An Information Retrieval using weighted Index Terms in Natural Language document collections
Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2
UDC 004.75 A MACHINE LEARNING APPROACH TO SERVER-SIDE ANTI-SPAM E-MAIL FILTERING 1 2 I. Mashechkin, M. Petrovskiy, A. Rozinkin, S. Gerasimov Computer Science Department, Lomonosov Moscow State University,
Personalized Hierarchical Clustering
Personalized Hierarchical Clustering Korinna Bade, Andreas Nürnberger Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, D-39106 Magdeburg, Germany {kbade,nuernb}@iws.cs.uni-magdeburg.de
Ontology construction on a cloud computing platform
Ontology construction on a cloud computing platform Exposé for a Bachelor's thesis in Computer science - Knowledge management in bioinformatics Tobias Heintz 1 Motivation 1.1 Introduction PhenomicDB is
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
A Symptom Extraction and Classification Method for Self-Management
LANOMS 2005-4th Latin American Network Operations and Management Symposium 201 A Symptom Extraction and Classification Method for Self-Management Marcelo Perazolo Autonomic Computing Architecture IBM Corporation
Semantic Transformation of Web Services
Semantic Transformation of Web Services David Bell, Sergio de Cesare, and Mark Lycett Brunel University, Uxbridge, Middlesex UB8 3PH, United Kingdom {david.bell, sergio.decesare, mark.lycett}@brunel.ac.uk
DATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375-381, 2003 Copyright © 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
Text Analytics. A business guide
Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application
Incorporating Semantic Discovery into a Ubiquitous Computing Infrastructure
Incorporating Semantic Discovery into a Ubiquitous Computing Infrastructure Robert E. McGrath, Anand Ranganathan, M. Dennis Mickunas, and Roy H. Campbell Department of Computer Science, University of Illinois
An ARIS-based Transformation Approach to Semantic Web Service Development
An ARIS-based Transformation Approach to Semantic Web Service Development Cheng-Leong Ang, Yuan Gu, Olga Sourina, and Robert Kheng Leng Gay Nanyang Technological University, Singapore Abstract
Author Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Heterogeneous Data Management on Environmental Sensors Using Ontology Mapping
Lecture Notes on Information Theory Vol. 1, No. 4, December 2013 Heterogeneous Data Management on Environmental Sensors Using Ontology Mapping Kaladevi Ramar and T. T Mirnalinee Department of Computer
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System Asanee Kawtrakul ABSTRACT In information-age society, advanced retrieval technique and the automatic
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
IT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
A Semantic Approach for Access Control in Web Services
A Semantic Approach for Access Control in Web Services M. I. Yagüe, J. Mª Troya Computer Science Department, University of Málaga, Málaga, Spain {yague, troya}@lcc.uma.es Abstract One of the most important
Search Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Business Intelligence and Decision Support Systems
Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley
Blog Post Extraction Using Title Finding
Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean, neither more nor less." Lewis Carroll [87, p. 214] In
SPATIAL DATA CLASSIFICATION AND DATA MINING
pp. 40-44. Available online at http://www.bioinfo.in/contents.php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Supporting Change-Aware Semantic Web Services
Supporting Change-Aware Semantic Web Services Annika Hinze Department of Computer Science, University of Waikato, New Zealand Abstract. The Semantic Web is not only evolving into
