Exam in course TDT4215 Web Intelligence - Solutions and guidelines -
English    Student no: ...    Page 1 of 12

Contact during the exam: Geir Solskinnsbakk
Phone:

Exam in course TDT4215 Web Intelligence
- Solutions and guidelines -
Friday May 21, 2010
Time:

Allowed means of assistance: D. No printed or handwritten material allowed. Simple calculator allowed.

The weighting of the questions is indicated by percentages. All answers are to be entered directly in the designated boxes on the question sheet, and no additional sheets of paper are to be handed in. For the multiple choice questions, you should set a cross in the box for the correct answer. There is only one correct answer for each multiple choice question. A list of formulas from the book and the papers/chapters is included at the end of the exam set.

Question 1. Vector Space model (25%)

Assume a document collection with the following four documents:

Document #1: Trondheim is the third largest city in Norway and is situated on the Nidelva river. (Length = 0.89)
Document #2: Trondheim is a university city with many students.
Document #3: The Nidaros cathedral is a popular tourist attraction. (Length = 1.13)
Document #4: Nidaros was the original name of Trondheim. Trondheim is the modern name. (Length = 0.89)

Small letters and capital letters are treated as the same. Do not perform any stemming, lemmatization or stop word removal.

a) Explain the idea behind the cosine similarity in vector space information retrieval (2.5%).
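As a rough illustration of the measure asked about in a), the sketch below scores a document against a query as the dot product of their tf-idf weight vectors divided by the product of the two vector lengths. The function name `cosine` and the dictionary layout are illustrative assumptions, not part of the exam. The weights are the rounded base-10 tf-idf values of nidaros and trondheim for the query "trondheim nidaros" used later in this question, and the document lengths are the ones given in the exam (1.28 for document #2 is the result of part c).

```python
import math

def cosine(doc_weights, query_weights, doc_length):
    """Cosine of the angle between a document vector and a query vector.

    doc_weights and query_weights map term -> tf-idf weight. Only terms that
    occur in the query contribute to the dot product, but doc_length must be
    the norm of the full document vector, so it is passed in separately.
    """
    dot = sum(w * doc_weights.get(term, 0.0) for term, w in query_weights.items())
    query_length = math.sqrt(sum(w * w for w in query_weights.values()))
    return dot / (doc_length * query_length)

# Query weights: log10(4/2) and log10(4/3), rounded to three decimals.
query = {"nidaros": 0.301, "trondheim": 0.125}

# Per document: (tf-idf weights of the query terms, length of the full vector).
docs = {
    "D1": ({"trondheim": 0.062}, 0.89),
    "D2": ({"trondheim": 0.125}, 1.28),
    "D3": ({"nidaros": 0.301}, 1.13),
    "D4": ({"nidaros": 0.151, "trondheim": 0.125}, 0.89),
}

scores = {name: cosine(w, query, length) for name, (w, length) in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['D3', 'D4', 'D2', 'D1']
```

Dividing by the lengths is what keeps a long document from winning merely because it contains more terms; only the direction of the vectors matters.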
See Baeza-Yates & Ribeiro-Neto: Modern Information Retrieval.

b) Explain the motivation behind the tf-idf formula and each of its components (2.5%).

See Baeza-Yates & Ribeiro-Neto: Modern Information Retrieval.

c) Calculate the length of document #2. Show the calculation (5%).

Vocabulary of document #2 (N = 4 documents, n = number of documents containing the term, maxfreq = 1, logarithms to base 10):

Term          tf (Doc 2)   n   log(N/n)   di = tfidf   di^2
a             1            2   0.301      0.301        0.0906
city          1            2   0.301      0.301        0.0906
is            1            4   0          0            0
many          1            1   0.602      0.602        0.3625
students      1            1   0.602      0.602        0.3625
trondheim     1            3   0.125      0.125        0.01561
university    1            1   0.602      0.602        0.3625
with          1            1   0.602      0.602        0.3625

Sum of squares: Σ(di^2) = 1.647
Length = √(Σ(di^2)) = 1.28

Length of document #2 is 1.28.

d) Calculate the tfidf score for the terms trondheim and nidaros for each of the documents (5%).

Nidaros:
Idf(nidaros) = log(4/2) = 0.301
D1, D2: tfidf(nidaros) = 0 (does not contain nidaros)
D3: tfidf(nidaros) = 1/1 * log(4/2) = 0.301 (maxfreq = 1)
D4: tfidf(nidaros) = 1/2 * log(4/2) = 0.151 (maxfreq = 2)

Trondheim:
Idf(trondheim) = log(4/3) = 0.125
D1: tfidf(trondheim) = 1/2 * log(4/3) = 0.062 (maxfreq = 2)
D2: tfidf(trondheim) = 1/1 * log(4/3) = 0.125 (maxfreq = 1)
D3: tfidf(trondheim) = 0 (does not contain trondheim)
D4: tfidf(trondheim) = 2/2 * log(4/3) = 0.125

e) Calculate the cosine similarity for each of the documents with respect to the query (10%).

Query: trondheim nidaros

Show the list of ranked documents.

Query weights: W(i,q) = (0.5 + 0.5 * tf(i,q)/maxf(i,q)) * log(N/n)
W(nidaros,q) = (0.5 + 0.5 * 1/1) * log(4/2) = 0.301
W(trondheim,q) = (0.5 + 0.5 * 1/1) * log(4/3) = 0.125
Length of query: L(q) = √(0.301^2 + 0.125^2) = 0.326

D1: tfidf(nidaros) = 0, tfidf(trondheim) = 0.062, length = 0.89
D2: tfidf(nidaros) = 0, tfidf(trondheim) = 0.125, length = 1.28
D3: tfidf(nidaros) = 0.301, tfidf(trondheim) = 0, length = 1.13
D4: tfidf(nidaros) = 0.151, tfidf(trondheim) = 0.125, length = 0.89

Sim(d1,q) = (0 * 0.301 + 0.062 * 0.125) / (0.89 * 0.326) = 0.0267
Sim(d2,q) = (0 * 0.301 + 0.125 * 0.125) / (1.28 * 0.326) = 0.0374
Sim(d3,q) = (0.301 * 0.301 + 0 * 0.125) / (1.13 * 0.326) = 0.2459
Sim(d4,q) = (0.151 * 0.301 + 0.125 * 0.125) / (0.89 * 0.326) = 0.2105

Ranking: D3 > D4 > D2 > D1

Question 2. Retrieval Evaluation (10%)

a) Given a set of relevant documents Rq for a given query q, and the returned (ranked) result Aq from an experimental information retrieval engine for query q, calculate the interpolated precision at the 11 standard recall levels. You are not required to graph the result. (5%)

Rq = {D4, D12, D23, D26, D30, D33, D60, D73, D83, D99}
Aq = 1. D4, 2. D12, 3. D2, 4. D33, 5. D31, 6. D83, 7. D41, 8. D62, 9. D23, 10. D51, 11. D9, 12. D43, 13. D73, 14. D55, 15. D17

Recall    Precision
0 %       100 %
10 %      100 %
20 %      100 %
30 %      75 %
40 %      67 %
50 %      56 %
60 %      46 %
70 %      0 %
80 %      0 %
90 %      0 %
100 %     0 %

b) What is one of the biggest problems of precision/recall in web information retrieval, and which measure out of the two is more important to a user in web search? (2.5%)

We do not have detailed knowledge of the document collection searched, and hence no exact definition of the set of relevant documents for a given query. The correct set of documents can therefore only be estimated. Precision will most likely be the more important measure, since the user is interested in getting valuable hits high in the ranking.

c) Calculate the R-precision for the query and the ranked list given in a). (2.5%)

Number of relevant documents = 10. In the first 10 documents retrieved, there are 5 relevant documents; R-precision is 5/10 = 0.5 (50%).

Question 3. Text Categorization (20%)

a) What is the basic hypothesis in using the vector space model for classification and what is the content of the hypothesis? (5%)

The basic hypothesis in using the vector space model for classification is the contiguity hypothesis.
Contiguity hypothesis: Documents in the same class form a contiguous region, and regions of different classes do not overlap.

b) Explain briefly the rationale of k Nearest Neighbor (kNN) text classification. (5%)

The rationale of kNN classification is that, based on the contiguity hypothesis, we expect a test document d to have the same label as the training documents located in the local region surrounding d.

c) The table below shows a test set consisting of 10 documents, where each document is annotated with its corresponding class (A, B, C). We wish to use this test set to find which class document X belongs to. The distance between document X and the documents in the test set is given in the table. Use the majority voting scheme kNN with k=5 to calculate which class document X belongs to. (5%)

Document          1   2   3   4   5   6   7   8   9   10
Class             A   A   B   A   C   C   B   C   B   B
Distance from X

The five closest documents are: 3(B), 4(A), 8(C), 9(B), 10(B). B has three of the 5 closest documents, while A and C have only 1 each. Document X is annotated with class B.

d) With the table given in question c), use the weighted sum voting scheme kNN with k=5 to calculate which class document X belongs to. (5%)

The five closest documents are: 3(B), 4(A), 8(C), 9(B), 10(B). The maximum distance 2.15 is used for normalization and calculation of weights.

A = 1 - 1.12/2.15 = 0.479
B = (1 - 0.63/2.15) + (1 - 0.89/2.15) + (1 - 2.15/2.15) = 1.293
C = 1 - 1.35/2.15 = 0.37
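The two voting schemes of c) and d) can be compared side by side with a minimal sketch like the one below. The function names are illustrative assumptions. The five neighbours are given as (class, distance) pairs with the distances taken from the weight calculation above; the solution does not say which of the three class B documents has which distance, so document numbers are left out.

```python
from collections import defaultdict

def majority_vote(neighbours):
    """Each neighbour casts exactly one vote for its class."""
    votes = defaultdict(float)
    for label, _distance in neighbours:
        votes[label] += 1
    return dict(votes)

def weighted_vote(neighbours):
    """Each neighbour votes with weight 1 - d / d_max (the normalization used in d)."""
    d_max = max(distance for _label, distance in neighbours)
    votes = defaultdict(float)
    for label, distance in neighbours:
        votes[label] += 1 - distance / d_max
    return dict(votes)

# The k = 5 nearest neighbours of document X as (class, distance) pairs.
neighbours = [("B", 0.63), ("B", 0.89), ("A", 1.12), ("C", 1.35), ("B", 2.15)]

for scheme in (majority_vote, weighted_vote):
    votes = scheme(neighbours)
    winner = max(votes, key=votes.get)
    print(scheme.__name__, {c: round(v, 3) for c, v in sorted(votes.items())}, "->", winner)
```

Under this normalization the farthest of the k neighbours always gets weight 0, so in the weighted scheme class B is effectively supported by two documents rather than three.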
Document X is annotated with class B.

Question 4. Clustering (15%)

a) Explain the main steps of the k-means clustering algorithm (5%)

See chapter 4 in Chakrabarti: Mining the Web: Discovering Knowledge from Hypertext Data.

b) Assume the four documents below. Use the suffix tree clustering (STC) method explained in Contextualized Clustering in Exploratory Web Search and build a suffix tree for the documents (5%).

Doc 1: {gustave eiffel}
Doc 2: {eiffel, paris sightseeing}
Doc 3: {eiffel tower, paris attractions}
Doc 4: {alexandre gustave eiffel, paris}

Suffix tree (document numbers in brackets; indented entries are child nodes):

alexandre gustave eiffel [4]
gustave eiffel [1,4]
paris [4]
    sightseeing [2]
    attractions [3]
eiffel [1,2,4]
    tower [3]
sightseeing [2]
attractions [3]
tower [3]
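The node labels of the tree in b) can be cross-checked with a short sketch that lists every word-level suffix of every phrase together with the documents in which it occurs as a suffix. This is only a flat enumeration under the assumption that the tree is built over word-level suffixes of each phrase; it does not attempt to reproduce the construction procedure of the paper, and the name `suffix_index` is hypothetical.

```python
from collections import defaultdict

def suffix_index(docs):
    """Map each word-level suffix of each phrase to the documents it comes from."""
    index = defaultdict(set)
    for doc_id, phrases in docs.items():
        for phrase in phrases:
            words = phrase.split()
            # "alexandre gustave eiffel" yields that phrase, "gustave eiffel" and "eiffel".
            for start in range(len(words)):
                index[" ".join(words[start:])].add(doc_id)
    return {suffix: sorted(ids) for suffix, ids in index.items()}

docs = {
    1: ["gustave eiffel"],
    2: ["eiffel", "paris sightseeing"],
    3: ["eiffel tower", "paris attractions"],
    4: ["alexandre gustave eiffel", "paris"],
}

index = suffix_index(docs)
for suffix in sorted(index):
    print(suffix, index[suffix])
```

Each key corresponds to one root-to-node path in the tree; for example, the key "paris sightseeing" is the child node sightseeing [2] under paris.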
c) Identify the three base clusters and calculate their scores (5%).

We use the formula s(B) = |B| * f(|P|)

Base cluster      Documents   Score
eiffel            1,2,4       3
paris             2,3,4       3
gustave eiffel    1,4         4

Question 5. Latent Semantic Analysis (5%)

Explain briefly the purpose of Singular Value Decomposition in latent semantic analysis (5%)

See paper on latent semantic analysis.

Question 6. Semantic Web (15%)

a) What role does XML play in the Semantic Web? (3%)

Definition of OWL
Serialization of OWL ontology models = TRUE
Unified format for ontology mapping
Representation of ontology layout
Semantic markup language
b) Which is the most expressive language that retains computational completeness and decidability? (3%)

XML
RDF
RDFS
OWL DL = TRUE
OWL Full

c) Below is an OWL definition described using the abstract syntax from Antoniou et al.

SubClassOf(student complete intersectionOf(human smart))
ObjectProperty(hasStudNo InverseFunctional)
Class(student partial
    restriction(read someValuesFrom(book))
    restriction(read allValuesFrom(scientificbook)))
Individual(john type(student))
Individual(john value(hasStudNo 12345))
Individual(Mary value(hasStudNo 12345))

Phrase these definitions in natural language. (3%)

All students are smart humans. Things have unique student numbers. Students must read at least one book, and they only read scientific books. John is a student. Both John and Mary have student number 12345.

d) What cannot be inferred from the statements in c)? (3%)

Mary is a student
John and Mary are the same individuals
Mary reads only scientific books
John is smart
Each student has exactly one student number = TRUE

Grading: 100% if answered correctly, otherwise 0%.
e) Translate the following definition of professors into the OWL abstract syntax (3%)

Scientific staff are either professors or researchers. Scientific staff teach courses. Professors teach at most one course each.

SubClassOf(scientificStaff unionOf(professor researcher))
ObjectProperty(teach domain(scientificStaff) range(course))
Class(professor partial restriction(teach maxCardinality(1)))

Question 7. Semantic Search (10%)

a) Recall the article "Indexing a Web Site with a Terminology Oriented Ontology" by Demontils et al., which mentions the "cumulative similarity measure" by Demontils & Jacquin. Why did they stem instead of lemmatize the words before they were annotated with part of speech tags? (2.5%)

Stemming is the process to find the common root form of words and hence an ideal basis for a part of speech tagger;
Stemming is the process to find the canonical form of words and hence an ideal basis for a part of speech tagger;
Lemmatization is the process to find the canonical form of words and hence not suitable as a good basis for a part of speech tagger;
Lemmatization is the process to find the common root form of words and hence not suitable as a good basis for a part of speech tagger;
They did not use stemming, but lemmatization. = TRUE

Grading: 100% if answered correctly, otherwise 0%.

b) Recall the article "Construction of Ontology based Semantic Linguistic Feature Vectors for Searching: the Process and Effect" by Tomassen & Strasunskas. The development of the approach is inspired by a linguistic method for describing the meaning of objects, where a feature vector (FV) "connects" something. What does a feature vector "connect"? (2.5%)

An ontology property with associated textual documents;
An ontology property with associated domain terminology;
An ontology entity with associated textual documents;
An ontology entity with associated domain terminology; = TRUE
An ontology with associated textual documents;
An ontology with associated domain terminology.

Grading: 100% if answered correctly, otherwise 0%.

c) Recall the article "Measuring intrinsic quality of semantic search based on Feature Vectors" by Tomassen & Strasunskas. A set of evaluation measures was proposed. What is the purpose of the Average FV Similarity measure? (2.5%)

Provides an indication of the uniqueness of the FVs; = TRUE
Provides an indication of the degree of semantic relatedness between neighbouring entities;
Provides an indication of the overall quality of the FVs;
Provides an indication of the FV quality relative to the ontology quality;
Provides an indication of the semantic distance between the FV terms.

Grading: 100% if answered correctly, otherwise 0%.

d) What are ontologies normally not used for in semantic search applications? (2.5%)

Disambiguation of queries
Query expansion with synonyms
Semantic indexing with ontology concepts
Latent semantic indexing = TRUE
Semantic annotation of documents

Grading: 100% if answered correctly, otherwise 0%.

Appendix A. Formulas
