Automatically Generated Tag Clouds

Geraldo Xexéo 1,2, Fernando Morgado 1, Patrícia Fiuza 1

1 Programa de Engenharia de Sistemas e Computação, COPPE/UFRJ
2 Departamento de Ciência da Computação, IM/UFRJ

{xexeo,fernandofm,patfiuza}@cos.ufrj.br

Abstract. This paper proposes a formal model for discussing the construction of tag clouds, applicable both to human beings and to computer programs. From this model we also propose a methodology for the automatic generation of tag clouds, which aims to achieve results similar to those produced by human beings. We also present a specific implementation and describe some of our initial results.

1. Introduction

With the advancement of social software sites such as Delicious[1] and Flickr[2], and the increasing incentive to use tags to describe various types of information objects, it became common to present these tags in a format known as a tag cloud. According to Hassan and Herrero Solana[3], a tag cloud is a visual list of tags arranged so as to transmit information and meaning through the use of different font sizes, styles and colors, based on the importance of each tag within the group in which it appears. The more popular a tag is, the larger its font and therefore the more prominent it is in the tag cloud in which it occurs. Tag clouds thus provide a summary, or a semantic view, of the concepts most important to represent an object [4].

Human beings build this semantic view by associating the concepts they understand from an observed object with the words that represent them, under their specific points of view. If these objects are documents, those words can be found in their contents or inferred from the understanding of these contents. In general, people can easily produce words that represent concepts, since they have a reasonable knowledge of the language and of the world. They can also describe concepts as sentences, even if partial ones.
Automatic systems, however, do not possess this knowledge; therefore, they must infer it exclusively from the words that compose the document or from other information, such as metadata or the tagging applied to similar documents.

We should also point out that the relation between words and concepts is not bijective. A single word can represent more than one concept, which is known as polysemy. A crane, for example, can be a bird or a piece of construction equipment. Conversely, a concept can be represented by many words (or sequences of words), such as film, movie, picture, motion picture and flick, which is known as synonymy. Also, there may be no single word that represents a concept, as is the case for blood bank or gene bank. Finally, there are words with no associated meaning, such as prepositions, which are nevertheless very frequent in textual documents. One could possibly also argue for the existence of concepts that cannot be represented by words,

but this is not the mainstream reasoning. However, it is common sense that there are concepts that need a complex text structure to be described.

Facing these limitations, we understand that an automatic solution, such as the one we propose, can only be effective when developed as an approximation of human behavior. To enable this approximation, we judge it necessary to create a model that allows the description of the semantic view of a document, both when analyzed from the human point of view and when analyzed from the point of view of a computer program.

Therefore, this work aims to present a formal model for building tag clouds, applicable both to human beings and to computer programs, which is presented in section 3. From this model we also propose, in section 4, a methodology for the automatic generation of tag clouds, which aims to achieve results similar to those produced by human beings, as argued in section 5. In the next section we give a brief introduction to tag clouds and their automatic generation.

2. Tag Clouds

In this section we provide some informal definitions for tag clouds and some related concepts, aiming to provide a common understanding that will allow us to later build a formal framework. We also discuss some questions regarding the difference between human-generated and computer-generated tag clouds.

We adopt the definition proposed by Rivadeneira et al.[5]: tag clouds are visual presentations of a set of words, typically a set of tags selected by some rationale, in which attributes of the text such as size, weight, or color are used to represent features of the associated terms. Moreover, a tag cloud usually has another component: a reference, such as a title or a caption, that indicates the object to which the tag cloud relates. There is no need for the object to be concrete or accessible through a URL, as a document file is. For instance, a tag cloud can refer to an event that has occurred or will occur, such as a rock show.
A tag can be defined as any label or symbol attached to an object, such as a document, an image, a piece of music, etc. Usually, these labels are short and often consist of a single word[6]. Each tag usually denotes a concept related to the object to which it is bound. Concepts can include ideas of origin, purpose, description, and others.

Marinchev [7] identified the abstract set of concepts related to an object, and represented in the tag cloud, with a semantic field. A semantic field is the set of concepts connected to a focus, but in a form that is now independent of the originating taggers and available to other people for understanding [7]. As a result, a tag cloud is a visual representation of the semantic field of an object.

The entire process of creating a tag cloud can be summarized in three steps[7]:

1. Understanding an object/focus and the concepts that can be applied to it.
2. Capturing the semantic field around the object/focus.
3. Transforming the semantic field into a tag cloud.

A fourth step of this process is the actual use of the tag cloud, which is recreated as the final user interprets it and attempts to understand it as an actual object, or as possible objects, inside a context.

Two main aspects of presentation are considered during the construction of tag clouds: the properties of the font and the disposition of the tags. The font size is usually used to represent the importance of tags, while a common use of color is to highlight possible categories in the set of tags. Tags can be arranged in alphabetical order or by frequency. They may also be placed at random positions. Terms that have the same semantic classification can be placed close to each other[4]. Although first-generation tag clouds were 2-dimensional, nowadays one can find 3-dimensional tag clouds.

2.1. Computers and Tag Clouds

In our work, we automatically generate tag clouds from text documents, aiming to approximate a human-generated tag cloud. To accomplish that, we decided to mimic the process described by Marinchev [7] (explained above). However, the first step of the process is a challenge, since it says that to create a tag cloud one must first form concepts in human minds. These concepts are abstract thoughts that cannot always be perfectly described in words. For example, reading the end of Shakespeare's Romeo and Juliet can induce an overall sentiment of sadness that is only approximately described by tags such as sad or unhappy.

Therefore, to adopt that process we must first decide how to represent concepts in a computer. It should be clear that this representation is not the same as the representation provided by tags. Tags are symbols, usually words, that humans can understand and to which they can assign some meaning. Concepts, in the cognitive sense, are abstract thoughts, while in the computational sense they must be modeled as some data structure, or even as a procedure or rule. One can select, for example, Wordnet synsets [8] to represent concepts. It is not unreasonable, however, to select words to represent them, even the same words used as tags.

There is some previous work on automatic tag cloud generation.
PubCloud[9] is a tool, based on the use of tag clouds, that summarizes the results returned by a search and allows navigation from the tag cloud to the results. CloudMine[6] is a tool that categorizes, summarizes and displays the most important terms of a document as text clouds. Some articles, such as [10] and [11], predict tags by learning from previously tagged documents. None of them builds a formal model to discuss tag cloud generation, which is the main contribution of this article.

3. A Formal Model to Discuss Tag Clouds

In this section we create from scratch a formal and conceptual definition of tag clouds that allows us to derive an abstract method for building them, similar to the one briefly described by Marinchev[7].

We start by defining resources and contexts. Our motivation for these definitions is that we are interested in creating tag clouds that describe resources in a context.

Resources are any abstract concept or physical entity that can be uniquely identified on the web or outside it[12]. The definition of resource is left open, following the approach used in the RFC line of documents. In our case, a resource is any identifiable object that can be described, at least partially, by a set of tags. These tags act as representations of concepts that reside in the human mind or in computer data structures and can be applied to the resource according to some rationale. In the Web, resources

are identified by URIs. There are many ways to represent resource properties (such as metadata); however, the RDF[13] representation is standard and stable. A resource is represented by the letter $r$, possibly indexed.

A context, denoted by the letter $w$, is a set of resources that can be analyzed as a whole. The context containing all resources is denoted by $W$, for the web. Therefore, $w$ is not an element of $W$, but a subset of it. Contexts can be abstract, as when defined by a single word such as Medicine, or very objective, such as the answers to the query soccer game given by a specific search engine. Contexts can include documents, real-life objects and events, i.e., anything that can be described as a resource. There is no requirement that all resources of a context be of the same type.

Figure 1. UML Model for resources and contexts

3.1. Preliminary definitions: attribute pair set

In this subsection we define some preliminary concepts that will lead us to the definition of an attribute pair set, which is an abstraction created to allow us to define a dynamic set of attributes, and their values, for an object.

An object is a primitive concept and is therefore not defined in the theory. As in object-oriented theory, objects are the root set to which all our other defined concepts belong. Both atomic elements and sets belong to the set of objects. The set of all objects is the universal set, denoted $U$.

A domain, or a value set, denoted by $V_i$, is a set of values. We use domains as they are used in most of database theory: to define a set of admissible values for an attribute. They are indexed, as in $V_i$, to represent the fact that we are using multiple domains. Values in a domain can be indicated by a doubly indexed letter $v$, as in $v_{ij}$, to show that the value $v_{ij}$ belongs to domain $V_i$. We place no prior requirement on a domain, such as being finite or being composed of atomic values. The set of all domains is represented by the letter $V$ (non-indexed).
An attribute $a$ of an object $o$ is a property that describes $o$. The set of all possible attributes is denoted by $A$. Although abstract, attributes are usually represented (named) by strings. One should expect these strings to be words or sequences of words with some clear meaning.

Humans can usually associate a domain with an attribute easily, e.g., meters to evaluate distance or integers to evaluate age. Computer programs, on the other hand, need this association to be made explicit by some declaration, e.g., a type declaration such as those found in many programming languages.

A domain attribution function is a function that associates a domain with an attribute:

$$f_{da} : A \to V$$

When defined, a domain attribution function represents the types of values that can be assigned to an attribute.

From now on, we suppose that our sets $A$, $V$ and the function $f_{da}$ are defined. One simple example of possible values is:

$$A = \{\text{color}, \text{size}\}$$
$$V = \{\text{Colors}, \text{SmallIntegers}\}$$
$$\text{Colors} = \{\text{red}, \text{green}, \text{blue}, \text{yellow}, \text{black}, \text{white}\}$$
$$\text{SmallIntegers} = \{1..256\}$$
$$f_{da} = \{(\text{color}, \text{Colors}), (\text{size}, \text{SmallIntegers})\}$$

An attribute pair is an ordered pair $(a_i, v_{ij})$:

$$(a_i, v_{ij}) \in A \times V_i \text{ where } f_{da}(a_i) = V_i$$

Attribute pairs describe the value of an attribute $a_i$ in some particular context. We create attribute pairs to allow for the dynamic selection of the attributes that can be applied to an object. In this way, later on, we will not be obliged to define in advance which attributes can be used to describe an object, i.e., its class, as in traditional object-oriented theory.

An attribute pair set, or type-restricted map, or simply a map, is a set of attribute pairs in which the first element of every ordered pair is unique among all members of the map. Maps will be used further on to represent the set of attributes that can be used to describe an object. A map will be denoted by $m$, and is formally defined as:

$$m = \{ (a_i, v_{ij}) \mid ((a_i, v_{ij}) \in A \times V_i) \land (f_{da}(a_i) = V_i) \land (((a_i, v_{ij}) \in m \land (a_k, v_{kn}) \in m \land a_i = a_k) \Rightarrow v_{ij} = v_{kn}) \}$$

The set of all possible maps will be denoted by $M$.

Figure 2. UML Model representing the basic framework

3.2. Classification and attribution functions

In this section we use the definitions in the basic framework to apply the concept of attributes to objects.

A classification function is a function $f_c$ that, given an object $o$, generates a set of attributes $A_i$, $A_i \subseteq A$, that can be used to represent the attributes of the object.

$$f_c : O \to \wp(A)$$
$$f_c(o) = A_i = \{ a_{ij} \mid a_{ij} \text{ is an attribute of } o \}$$

A map attribution function is a function $f_{ma}$ that, given an object and a set of attributes $A_i$, $A_i \subseteq A$, generates a map in which an attribute pair corresponds to each attribute of $A_i$.

$$f_{ma} : O \times \wp(A) \to M$$
$$f_{ma}(o, A_i) = \{ (a_{ij}, v_{ij}) \mid a_{ij} \in A_i \land ((a_{ij}, v_{ij}) \in f_{ma}(o, A_i) \Rightarrow f_{da}(a_{ij}) = V_i) \}$$
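The definitions above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes that domains are ordinary Python containers, that $f_{da}$ is a dictionary, and that a map is a dictionary from attribute names to values. The names `F_DA`, `is_attribute_pair`, `make_map`, `map_attribution` and the `evaluators` parameter are hypothetical.

```python
# Domains taken from the example in the text; the Python names are illustrative.
COLORS = frozenset({"red", "green", "blue", "yellow", "black", "white"})
SMALL_INTEGERS = range(1, 257)  # the set {1..256}

# f_da: the domain attribution function, here a dict from attribute to domain.
F_DA = {"color": COLORS, "size": SMALL_INTEGERS}


def is_attribute_pair(attribute, value, f_da=F_DA):
    """Return True if value belongs to the domain f_da assigns to attribute."""
    return attribute in f_da and value in f_da[attribute]


def make_map(pairs, f_da=F_DA):
    """Build a map (attribute pair set) from (attribute, value) pairs.

    Raises ValueError if a pair violates its domain, or if the same
    attribute appears with two different values.
    """
    result = {}
    for attribute, value in pairs:
        if not is_attribute_pair(attribute, value, f_da):
            raise ValueError(f"{value!r} is not in the domain of {attribute!r}")
        if attribute in result and result[attribute] != value:
            raise ValueError(f"attribute {attribute!r} has two values")
        result[attribute] = value
    return result


def map_attribution(obj, attributes, evaluators, f_da=F_DA):
    """Sketch of f_ma: evaluate every attribute in `attributes` on `obj`.

    `evaluators` is an assumed dict from attribute name to a function
    that computes the value of that attribute for an object.
    """
    return make_map(((a, evaluators[a](obj)) for a in attributes), f_da)
```

For instance, `make_map([("color", "red"), ("size", 16)])` yields a valid map, while `make_map([("size", 300)])` is rejected because 300 lies outside SmallIntegers. Using a dictionary makes the uniqueness condition on the first element of each pair hold by construction.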

3.3. Applying attributes to resources

In this section we use the definitions in the basic framework to apply the concept of attributes to resources.

A resource classification function is a classification function $f_{rc}$ for which the set of objects is restricted to the set of resources.

$$f_{rc} : W \to \wp(A)$$
$$f_{rc}(r) = A_i = \{ a_{ij} \mid a_{ij} \text{ is an attribute of } r \}$$

A map attribution function for a resource $r$ is a map attribution function $f_{mar}$ for which the set of objects is restricted to the set of resources. The resulting map usually represents properties of the resource.

$$f_{mar} : W \times \wp(A) \to M$$
$$f_{mar}(r, A_i) = \{ (a_{ij}, v_{ij}) \mid a_{ij} \in A_i \land ((a_{ij}, v_{ij}) \in f_{mar}(r, A_i) \Rightarrow f_{da}(a_{ij}) = V_i) \}$$

A resource classification function is a function that, given a resource, establishes which attributes can be evaluated for it. A map attribution function is an evaluation function that returns the values for a set of properties of a resource. It can also be understood as the application of a set of evaluation functions, each one returning the value of a specific property of an object.

From the above definitions, we now have the vocabulary to discuss how, given a resource, we can dynamically generate a set of attributes and their values. The concepts described as sets and functions can be seen in Figure 1 as a UML model.

A resource representation, or resource map, for a resource $r$, $RM(r)$, is a map:

$$RM(r) = f_{ma}(r, f_{rc}(r))$$

Resource maps act as representations of resources. For example, the resource map of a document can be formed by tuples representing its bag of words.

Figure 3. A UML model representing the use of maps to describe a resource (as a ResourceMap)

3.4. Concepts and Semantic Fields

In this section we gradually build the concept of an abstract tag cloud that could be applied to any object.

We start by formalizing Marinchev's [7] concept of semantic field. For that, we must start by supposing the existence not only of a set of resources, but also of a set of concepts, denoted $C$.
Concepts can be extremely abstract, for example when we are talking about concepts formed in the human mind, or much more concrete, as in the case of the representation of concepts in computer data structures.

The formal definition of concept is not an easy task, and has led to many discussions in philosophy. We use the conservative approach of adopting the Classical Theory of Concepts, that is: most concepts are structured mental representations that encode a set of necessary and sufficient conditions for their application, if possible, in sensory and perceptual terms [14]. However, we aim to explain concepts both as human and as computer-based phenomena. Therefore, we will accept that concepts do not need to be mental representations, but only adequate cognitive representations.

Given a resource $r$, from a context $w$, and a set of abstract concepts $C$, a semantic field for $r$ is a set $SF(r)$ of concepts

$$SF(r) = \{ c_i \mid c_i \in C \land applies(c_i, r) \}$$

where $applies$ is a logical predicate that represents the fact that a concept can be used to describe, in some way, an object or an object property.

Therefore, a semantic field is a set of abstract concepts that, somehow, can be applied to an object with the aim of building some understanding of it. At times we will be interested in describing the semantic field of an object under a specific context, and we will use a subscript to represent it, as in $SF_w(r)$.

A concept classification function is a classification function $f_{cc}$ that, given a resource $r$ and a concept $c$, generates a set of attributes $A_i$, $A_i \subseteq A$, that can be used to represent the attributes of the concept $c$ when referring to the resource $r$.

$$f_{cc} : W \times C \to \wp(A)$$
$$f_{cc}(r, c) = A_i = \{ a_{ij} \mid a_{ij} \text{ is an attribute of } c \text{ when referring to } r \}$$

A map attribution function for a concept $c$ is a map attribution function $f_{mac}$ that, given a concept $c$, a resource $r$, and a set of attributes $A_i$, $A_i \subseteq A$, generates a map in which a describing attribute pair corresponds to each attribute of $A_i$.
$$f_{mac} : W \times C \times \wp(A) \to M$$
$$f_{mac}(r, c, A_i) = \{ (a_{ij}, v_{ij}) \mid a_{ij} \in A_i \land ((a_{ij}, v_{ij}) \in f_{mac}(r, c, A_i) \Rightarrow f_{da}(a_{ij}) = V_i) \}$$

A valued semantic field for a resource $r$ is a set of ordered pairs in which the first element is a concept applicable to the resource, and the second element is an attribute pair set composed of the attributes induced by $c_i$ in $r$.

$$VSF(r) = \{ (c_i, m_i) \mid c_i \in SF(r) \land m_i = f_{mac}(r, c_i, f_{cc}(r, c_i)) \}$$

Although Marinchev's article[7] only discusses semantic fields, i.e., the mapping of concepts to resources, we believe that this mapping cannot be assumed to be free of subtleties and additional information. Moreover, computers do not really deal with concepts, but with some representation that can be mapped to a concept. These representations benefit greatly from being able to carry additional information with them.

For example, given that we choose to represent concepts by Wordnet synsets, an SF for a document $d$ can contain the synset S = {person, individual, someone, somebody, mortal, soul}. However, it is interesting to know which words were used to obtain the synset. To that end, we can have the label original words defining an attribute pair in our map for synset S, with the set {person, individual} describing which words found in the document $d$ generated the synset.
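The synset example can be sketched as follows. This is an illustrative assumption rather than the paper's code: a concept is modeled as a plain `frozenset` of words standing in for a Wordnet synset (no Wordnet library is used), and the names `valued_semantic_field`, `original_words_map` and `document_words` are hypothetical.

```python
def valued_semantic_field(semantic_field, concept_map_for):
    """Sketch of VSF(r): pair every concept c_i in SF(r) with its map m_i.

    `concept_map_for` plays the role of f_mac composed with f_cc: it
    receives a concept and returns the map of attributes it induces.
    """
    return {concept: concept_map_for(concept) for concept in semantic_field}


# A concept modeled as a frozenset of words, standing in for a synset.
SYNSET_PERSON = frozenset(
    {"person", "individual", "someone", "somebody", "mortal", "soul"}
)

# Words of a made-up document d, used only for illustration.
document_words = ["the", "person", "met", "another", "individual"]


def original_words_map(concept, words=document_words):
    """Map with one attribute pair: the document words that hit the concept."""
    return {"original words": frozenset(w for w in words if w in concept)}


vsf = valued_semantic_field({SYNSET_PERSON}, original_words_map)
```

Here `vsf[SYNSET_PERSON]` holds the attribute pair labeled original words, whose value is the set {person, individual}, mirroring the example in the text.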

A semantic field generator is a function $f_{sfg}$ that, given a context $w$, a specific resource $r$, $r \in w$, and a set of concepts, generates a semantic field $SF(r)$ which indicates a set of concepts that can be considered, under some reasoning, to be applicable to $r$ in context $w$.

$$f_{sfg} : W \times \wp(W) \to C$$
$$f_{sfg}(r, w) = \{ c_i \mid c_i \in C \land applies_w(c_i, r) \} = SF_w(r)$$

A valued semantic field generator is a function $f_{vsfg}$ that, given a context $w$, a specific resource $r$, $r \in w$, and a set of concepts, generates a valued semantic field $VSF(r)$ which indicates a set of concepts that can be considered, under some reasoning, to be applicable to $r$, together with their corresponding maps.

$$f_{vsfg} : W \times \wp(W) \to C \times M$$
$$f_{vsfg}(r, w) = \{ (c_i, m_i) \mid c_i \in SF_w(r) \land m_i = f_{mac}(r, c_i, f_{cc}(r, c_i)) \}$$

We would like to point out that, although it is possible to generate a semantic field from a single resource, it is much more reasonable to consider that generating this semantic field requires analyzing not only the resource, but also the context in which it is inserted. This context is characterized by all the other resources that can somehow be viewed in the same scope as the analyzed resource.

Figure 4. Modelling Semantic Fields in UML

3.5. From Tags to Abstract Tag Clouds

A tag classification function is a function $f_{tc}$ that, given a resource $r$ and a tag $t$, generates a set of attributes $A_i$, $A_i \subseteq A$, that can be used to represent the attributes of the tag $t$ when referring to the resource $r$.

$$f_{tc} : W \times T \to \wp(A)$$
$$f_{tc}(r, t) = A_i = \{ a_{ij} \mid a_{ij} \text{ is an attribute of } t \text{ when referring to } r \}$$

A map attribution function for a tag $t$ is a function $f_{mat}$ that, given a tag $t$, a resource $r$, and a set of attributes $A_i$, $A_i \subseteq A$, generates a map in which a describing attribute pair corresponds to each attribute of $A_i$.
$$f_{mat} : W \times T \times \wp(A) \to M$$
$$f_{mat}(r, t, A_i) = \{ (a_{ij}, v_{ij}) \mid a_{ij} \in A_i \land ((a_{ij}, v_{ij}) \in f_{mat}(r, t, A_i) \Rightarrow f_{da}(a_{ij}) = V_i) \}$$

Given a resource $r$, from a context $w$, and a set of tags $T$, a tag field for $r$ is a set of tags

$$TF_w(r) = \{ t_j \mid t_j \in T, \exists c \in SF(r), represents(t_j, c) \},$$

where each tag $t_j$ is a symbol, usually a word or a short sequence of words, that represents one or more concepts that can be applied to $r$. Again, we leave the definition of the predicate $represents$ open to interpretation. However, we point out that there is no requirement that it be total over $SF(r)$.

A tag field assumes the role of a concrete representation of a semantic field. Also, there is no difference between a tag field created by humans and one created by computers. Both are sets of concrete symbols. One semantic field can induce different tag fields according to the symbols (words) available and the $represents$ predicate chosen.

A tag field generator is a function $f_{tfg}$ that, given a context $w$ and a specific resource $r$, $r \in w$, generates a tag field $TF(r)$ which represents the set of tags that can be considered, under some reasoning, to be applicable to $r$ in context $w$.

$$f_{tfg} : W \times \wp(W) \to T$$
$$f_{tfg}(r, w) = \{ t_i \mid \exists c_j \in SF_w(r), represents_w(t_i, c_j) \} = TF_w(r)$$

Given a tag field $TF(r)$, an abstract tag cloud $ATC(r)$ is a set of tuples

$$ATC(r) = \{ (t_i, m_i) \},$$

where $t_i$ is a tag belonging to $TF(r)$, and $m_i$ is a map that represents the attributes of the tag.

An abstract tag cloud generator is a function $f_{atcg}$ that, given a context $w$ and a specific resource $r$, $r \in w$, generates an abstract tag cloud $ATC(r, w)$ which indicates a set of tags that can be considered, under some reasoning, to represent the concepts applicable to $r$, together with their corresponding maps.

$$f_{atcg} : W \times \wp(W) \to T \times M$$
$$f_{atcg}(r, w) = \{ (t_i, m_i) \mid t_i \in TF_w(r) \land m_i = f_{mat}(r, t_i, f_{tc}(r, t_i)) \}$$

For example, given that the text Romeo and Juliet is an object, the concept of forbidden love (which we cannot avoid representing as words) can be represented as two tags, forbidden and love, which can appear in the abstract tag cloud as:

{ (forbidden, {(color,black), (size,12), (x,1), (y,10)}), (love, {(color,red), (size,16), (x,20), (y,20)}) }

One should notice that not all attributes must be representative of visual characteristics.
It is possible to have hidden attributes, such as the tf-idf [15] value of a word in a text, which will be used in some representation or algorithm.

Figure 5. Modeling Tags and Abstract Tag Clouds in UML.
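As a minimal sketch of an abstract tag cloud with a hidden attribute, the Romeo and Juliet example can be written as a dictionary from tags to maps. The representation, the helper names `with_hidden` and `visual_view`, and the tf-idf numbers in the usage note are assumptions made for illustration only.

```python
# ATC(r) for the Romeo and Juliet example: each tag is paired with its map.
ABSTRACT_TAG_CLOUD = {
    "forbidden": {"color": "black", "size": 12, "x": 1, "y": 10},
    "love": {"color": "red", "size": 16, "x": 20, "y": 20},
}

# Attributes that a renderer would actually draw (an assumption of this sketch).
VISUAL_ATTRIBUTES = frozenset({"color", "size", "x", "y"})


def with_hidden(cloud, name, values):
    """Return a copy of the cloud where every tag gains the attribute `name`."""
    return {tag: {**attrs, name: values[tag]} for tag, attrs in cloud.items()}


def visual_view(cloud, visual=VISUAL_ATTRIBUTES):
    """Return a copy of the cloud keeping only the visual attributes."""
    return {
        tag: {a: v for a, v in attrs.items() if a in visual}
        for tag, attrs in cloud.items()
    }
```

Calling `with_hidden(ABSTRACT_TAG_CLOUD, "tf-idf", {"forbidden": 0.21, "love": 0.35})` (the two weights are made up) attaches a hidden attribute to each tag, and `visual_view` of the result gives back exactly the four visual attributes per tag.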

3.6. Tag Clouds are Visual Representations

We are now able to define a tag cloud as a suitable visual representation of an abstract tag cloud. To illustrate the idea, we present a tag cloud (Figure 6) generated from a document about the classification of a text document using wavelets, which is a member of a set of documents about web intelligence. Due to printing limitations, we use only size as the visual attribute of a tag, to indicate its importance. Larger font sizes indicate more important tags, while smaller font sizes are used for less important tags. The arrangement, i.e., the x positions, follows the frequency of the tags in the document.

wavelets classification term signal representation document text domain compression transformation recall precision original signal reduction compression daubechies haar

Figure 6. A simple tag cloud

3.7. Creating Abstract Tag Clouds

From this sequence of definitions, one can derive the process of generating tag clouds for a document as:

1. Select a document and its context.
2. Build a valued semantic field for the document.
3. Use the valued semantic field to define an abstract tag cloud for the document.
4. Generate a suitable visual representation for the abstract tag cloud.

We make no assumptions on how these procedures will be developed. Many questions are left open and should be settled only in a particular implementation. For example, we make no decision on how to represent concepts.

4. Generating Tag Clouds for a Document

In this section we discuss the steps involved in the process of creating tag clouds. These steps follow the theoretical model proposed in the previous section.

The tag creation process starts by applying text pre-processing techniques to a document (the resource), as described in [16]. We continue by selecting the nouns from the clean text. We focus on nouns because it is easier to understand them as complete concepts and because they are the most common type of tag used by humans.
Following our model, the first step is to create a resource representation, $RM(r)$. We do that by extracting the following information from the document:

1. A list of terms (stems), and their tf-idf.
2. A list of bigrams of stems, and their tf-idf.
3. A map from terms and bigrams to sets of words, describing which words were reduced to each term or bigram.
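The three items above can be sketched as a small function. This is an assumption-laden illustration, not the tool described in the paper: the input is taken to be an already cleaned list of nouns, `toy_stem` is a deliberately naive stand-in for a real stemmer, raw frequencies are collected instead of tf-idf (which needs the whole collection), and the key names of the returned dictionary are hypothetical.

```python
from collections import Counter, defaultdict


def toy_stem(word):
    """Stand-in stemmer: lowercases and strips one plural 's'.

    A real system would use a proper stemming algorithm instead.
    """
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def resource_map(words, stem=toy_stem):
    """Sketch of RM(r) for a document given as a list of cleaned words."""
    stems = [stem(w) for w in words]
    bigrams = list(zip(stems, stems[1:]))

    # Items 1 and 2: frequencies of stems and of bigrams of stems.
    term_tf = Counter(stems)
    bigram_tf = Counter(bigrams)

    # Item 3: which original words were reduced to each term or bigram.
    origins = defaultdict(set)
    for word, s in zip(words, stems):
        origins[s].add(word.lower())
    for pair, key in zip(zip(words, words[1:]), bigrams):
        origins[key].add((pair[0].lower(), pair[1].lower()))

    return {"term_tf": term_tf, "bigram_tf": bigram_tf, "origins": dict(origins)}
```

For the word list `["Tag", "clouds", "summarize", "tags"]`, the stem tag is counted twice and the `origins` entry for tag records both tag and tags, which is the information later needed to recover readable tags from stems.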

The process of creating that information represents the implementation of $f_{mar}$. Tf-idf is a traditional measure of term relevance used in information retrieval, defined as [15]:

$$w_{ij} = \frac{tf_{ij}}{\max_k(tf_{ik})} \log\left(\frac{N}{n_j}\right)$$

where

$w_{ij}$ is the weight of term $j$ in document $i$;
$tf_{ij}$ is the frequency of term $j$ in document $i$;
$N$ is the number of documents in the collection;
$n_j$ is the number of documents containing term $j$.

The next step is calculating the semantic field, $SF(r)$. In our implementation we actually use $RM(r)$ as input, considering that $f_{sfg}(r) = g_{sfg}(f_{mar}(r))$, where $g_{sfg}$ is an auxiliary function. Our current strategy is to rank all terms and bigrams by tf-idf and select the top 40. The weighting provided by the tf-idf measure allows us to select terms representing the documents in a collection in a simple and efficient way. However, we are currently experimenting with other measures, such as information gain[17].

At this point, our semantic field is a list of terms and term bigrams. To further improve our results, we extend the semantic field using Wordnet [8]. Wordnet is a lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations [8]. These allow us to simulate some human behavior when creating tag clouds, such as looking for synonyms and hypernyms, enriching the set of terms.

From the semantic field we generate additional information that is useful for tag cloud calculation and create a valued semantic field $VSF(r)$. This additional information includes the (original) words present in document $r$ that correspond to the terms selected for the semantic field.

Figure 7. Activity Diagram representing the steps for tag cloud creation.

We use $VSF(r)$ to calculate the tag field, $TF(r)$.
Our approach is to look for the lemmas of the words that generated the terms chosen for the semantic field and arbitrarily select one or more of them, according to frequency considerations. A lemma is the canonical form of a set of words (lexeme); for nouns it is the nominative singular, usually masculine.

Finally, to generate the abstract tag cloud we again use the tf-idf as a first (hidden) attribute and calculate font size and position according to its value.
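The ranking and sizing steps can be sketched as below. The weight follows the tf-idf definition cited from [15], and the cut at 40 terms comes from the text; everything else is an assumption of this sketch: the natural logarithm, the tie-breaking by alphabetical order, the linear mapping from weight to font size, the point sizes 10 and 32, and the function names.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one dict per document mapping each term j to its weight w_ij.

    `documents` is a list of token lists. The weight is the term frequency
    normalized by the largest frequency in the document, times log(N / n_j).
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        max_tf = max(tf.values(), default=1)
        weights.append(
            {t: (f / max_tf) * math.log(n_docs / doc_freq[t]) for t, f in tf.items()}
        )
    return weights


def top_terms(weights, k=40):
    """Rank terms by decreasing weight (ties alphabetically) and keep k."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


def font_sizes(weights, terms, smallest=10, largest=32):
    """Map the weight of each selected term linearly onto a font size."""
    if not terms:
        return {}
    low = min(weights[t] for t in terms)
    span = max(weights[t] for t in terms) - low
    if span == 0:
        return {t: largest for t in terms}
    return {
        t: round(smallest + (weights[t] - low) / span * (largest - smallest))
        for t in terms
    }
```

A term that occurs in every document of the collection gets weight zero, so it falls to the bottom of the ranking and receives the smallest font, which matches the intent of highlighting what distinguishes a document within its collection.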

5. Example of Use and Evaluation

For an initial analysis of the proposal, we applied the tool described above to a set of 20 articles dealing with the subject "Web intelligence". These articles were selected at random from the Web Intelligence 2008 conference proceedings[18]. Of all the individual results obtained, we selected three to exemplify the ideas described here. In our examples, tags generated directly from the document are shown on a gray background. Tags in white on a black background originate from Wordnet.

Figure 8 shows the tag cloud of an article that presents a proposal for mining and exploring feedback reported by customers in unstructured texts. To enable this, the tool described in that article uses a tree structure and displays the feedback in the form of clusters. The clusters use different colors to represent the feelings of customers. In this article we avoid the use of colors due to printing limitations. It is easy to see that the fundamental concepts presented in the article are highlighted as terms in the tag cloud. Special attention should be given to the presence of the words feedback, keyphrases, cluster and unstructured information.

Figure 8. Tag cloud for first result

Figure 9 shows the tag cloud generated for an article that tries to discover the social context in which a person is involved by obtaining information from social network websites, such as Orkut and Facebook, and then uses this social context to improve the information retrieval algorithms used by this person. In the tag cloud we observe tags that summarize this idea, for example social context, comment and people preferences.

Figure 9. Tag cloud for second result

Figure 10 presents a tag cloud generated for an article that intends to classify documents into reader-emotion categories and to integrate this classification into a web search engine. As before, we observe in the tag cloud the presence of tags that convey this idea, such as emotion, emotion classification and reader.

Figure 10. Tag cloud for third result

All tag clouds generated for this collection presented such characteristics, i.e., their tags provided particular information that can be used to highlight the particular features of a text in relation to a collection of documents.

5.1. Subjective Evaluation

To evaluate the tag clouds that we generated, we decided to carry out an evaluation based on the idea proposed by [19]. For this evaluation we chose to compare our results against the tag clouds generated by a well-known visualization web site that generates different kinds of visualization, among them tag clouds. We will refer to this site as Site M.

In this evaluation, we were able to set up a small evaluation session for the three examples in this paper. This session was held with seven experts on the topic. All of them are master's or doctoral students at UFRJ and have a satisfactory knowledge of English. In addition, all evaluators were familiar with the subject matter of the articles that we used to generate the tag clouds.

For the first example, the evaluators concluded that our tag cloud was better than (5) or equal to (2) the one obtained from Site M. For the third example the evaluation was similar: our tag cloud was considered better by 2 evaluators and equal by the other 5. However, for the second example all evaluators agreed that both tag clouds failed.

The tables below reproduce the evaluation results. Each cell shows how many evaluators found the tag cloud better than, equal to or worse than the other.

Table 1: Evaluation for the first example

          Our tag cloud    Tag cloud generated by M
Better    5                0
Equal     2
Worse     0                5

Table 2: Evaluation for the second example

          Our tag cloud    Tag cloud generated by M
Better    0                0
Equal     0
Worse     7                7

Table 3: Evaluation for the third example

          Our tag cloud    Tag cloud generated by M
Better    2                0
Equal     5
Worse

Based on this evaluation and on the opinions requested from the evaluators, we believe that our tag clouds are equal to or better than the tag clouds of Site M with respect to information, but worse with respect to visual representation. We therefore believe that our generated tag clouds satisfactorily meet the requirement of representing, through their tags, the main concepts and features of their respective articles.

An important advantage that we noticed in our model for generating tag clouds was the preference for nouns to represent concepts. This made it possible to create representative tag clouds with a smaller number of tags. We will also create a measure to evaluate the quality of the generated tag clouds. To obtain this measure we intend to use parameters such as the number of tags in the tag cloud and the percentage of n-grams, among others.

6. Conclusion

We presented a formal model to describe tag clouds. A methodology for their construction was also presented and some preliminary results were shown. These results indicate that the proposal is applicable in practice. Due to printing limitations, we decided to present our tag clouds as grayscale representations. Our tool supports colors, 2-D distribution of tags and clustering.

As future work, we intend to apply term clustering algorithms to generate tag clouds able to highlight characteristics of groups of documents within the collection. Since this is mainly a proposal to substitute a human activity, it is not yet clear how to evaluate it or what type of functionality we should add to provide a good user experience and an effective evaluation of our proposal. For example, we can create hypertext links between tags and tag clouds, and also between tags and documents.
These would greatly enhance the usability of our tools; however, they would also make it difficult to assess the individual influence of our proposal for tag cloud generation. We also plan to inquire further into the formal model and to make it compatible with an ontology of information developed by our group.

7. References

[1]
[2]
[3] Hassan Montero Y. and Herrero Solana V. Improving Tag Clouds as Visual Information Retrieval Interfaces. In Proc. of the 1st International Conference on Multidisciplinary Information Sciences and Technologies, InSCiT.
[4] Lamantia J. Tag Clouds: Navigation for Landscapes of Meaning. Joe Lamantia Blog. < dscapes_of_meaning.html>
[5] Rivadeneira A. W., Gruen D. M., Muller M. J., Millen D. R. Getting our head in the clouds: toward evaluation studies of tagclouds. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI '07. ACM, New York, NY.
[6] Watters D. Meaningful Clouds: Towards a novel interface for document visualization. (visited 16/5/2009)
[7] Marinchev I. Practical Semantic Web Tagging and Tag Clouds. Journal Cybernetics and Information Technologies, v. 6, n. 3 (2006).
[8] Fellbaum, Christiane. WordNet: An Electronic Lexical Database. Bradford Books.
[9] Kuo B. Y., Hentrich T., Good B. M. and Wilkinson M. D. Tag clouds for summarizing web search results. In Proceedings of the 16th International Conference on World Wide Web. WWW '07. ACM, New York, NY.
[10] Song, Y., Zhuang, Z., Li, H., Zhao, Q., Li, J., Lee, W., and Giles, C. L. Real time automatic tag recommendation. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '08. ACM, New York, NY.
[11] Heymann, P., Ramage, D., and Garcia-Molina, H. Social tag prediction. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '08. ACM, New York, NY.
[12] Berners-Lee, T., Fielding, R., Masinter, L. Uniform Resource Identifier: Generic Syntax. RFC 3986, January.
[13] Manola, F., Miller, E. RDF Primer. W3C Recommendation, 10 Feb.
[14] Laurence, S., Margolis, E. Concepts and Cognitive Science. In Margolis, E. and Laurence, S. (eds.) Concepts: Core Readings. Cambridge, Mass.: MIT Press.
[15] Manning, C., Raghavan, P., and Schütze, H. Introduction to Information Retrieval. Cambridge University Press.
[16] Weiss S., Indurkhya N., Zhang T., Damerau F. Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer.
[17] Dasgupta A., Drineas P., Harb B., Josifovski V., Mahoney M. Feature Selection Methods for Text Classification. Proceedings of KDD.
[18] Jain, L. et al. Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, 9-12 December 2008, University of Technology, Sydney, Australia.
[19] Harper, S. and Patel, N. Gist summaries for visually impaired surfers. In Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility. Assets '05. ACM, New York, NY.

Acknowledgements

The authors would like to thank CNPq, CAPES, FAPERJ and Fundação Coppetec for their financial support.


Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

An Efficient Database Design for IndoWordNet Development Using Hybrid Approach

An Efficient Database Design for IndoWordNet Development Using Hybrid Approach An Efficient Database Design for IndoWordNet Development Using Hybrid Approach Venkatesh P rabhu 2 Shilpa Desai 1 Hanumant Redkar 1 N eha P rabhugaonkar 1 Apur va N agvenkar 1 Ramdas Karmali 1 (1) GOA

More information

Interactive Graphic Design Using Automatic Presentation Knowledge

Interactive Graphic Design Using Automatic Presentation Knowledge Interactive Graphic Design Using Automatic Presentation Knowledge Steven F. Roth, John Kolojejchick, Joe Mattis, Jade Goldstein School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213

More information

Information, Entropy, and Coding

Information, Entropy, and Coding Chapter 8 Information, Entropy, and Coding 8. The Need for Data Compression To motivate the material in this chapter, we first consider various data sources and some estimates for the amount of data associated

More information

Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration

Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration Steffen Lohmann, Jürgen Ziegler, and Lena Tetzlaff University of Duisburg-Essen, Lotharstrasse 65, 47057 Duisburg, Germany,

More information

Folksonomies versus Automatic Keyword Extraction: An Empirical Study

Folksonomies versus Automatic Keyword Extraction: An Empirical Study Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk

More information

Text Mining: The state of the art and the challenges

Text Mining: The state of the art and the challenges Text Mining: The state of the art and the challenges Ah-Hwee Tan Kent Ridge Digital Labs 21 Heng Mui Keng Terrace Singapore 119613 Email: ahhwee@krdl.org.sg Abstract Text mining, also known as text data

More information

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE)

Dr. Anuradha et al. / International Journal on Computer Science and Engineering (IJCSE) HIDDEN WEB EXTRACTOR DYNAMIC WAY TO UNCOVER THE DEEP WEB DR. ANURADHA YMCA,CSE, YMCA University Faridabad, Haryana 121006,India anuangra@yahoo.com http://www.ymcaust.ac.in BABITA AHUJA MRCE, IT, MDU University

More information

Card-Sorting: What You Need to Know about Analyzing and Interpreting Card Sorting Results

Card-Sorting: What You Need to Know about Analyzing and Interpreting Card Sorting Results October 2008, Vol. 10 Issue 2 Volume 10 Issue 2 Past Issues A-Z List Usability News is a free web newsletter that is produced by the Software Usability Research Laboratory (SURL) at Wichita State University.

More information

M3039 MPEG 97/ January 1998

M3039 MPEG 97/ January 1998 INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

Analysing the Behaviour of Students in Learning Management Systems with Respect to Learning Styles

Analysing the Behaviour of Students in Learning Management Systems with Respect to Learning Styles Analysing the Behaviour of Students in Learning Management Systems with Respect to Learning Styles Sabine Graf and Kinshuk 1 Vienna University of Technology, Women's Postgraduate College for Internet Technologies,

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

An eclipse-based Feature Models toolchain

An eclipse-based Feature Models toolchain An eclipse-based Feature Models toolchain Luca Gherardi, Davide Brugali Dept. of Information Technology and Mathematics Methods, University of Bergamo luca.gherardi@unibg.it, brugali@unibg.it Abstract.

More information

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

More information