Indexing, cont'd: PHRASE QUERIES AND POSITIONAL INDEXES

Sec. 2.4: Phrase Queries
We want to be able to answer queries such as "dalhousie university" as a phrase. Thus the sentence "I went to university at dalhousie" is not a match. The concept of phrase queries has proven easily understood by users; it is one of the few advanced search ideas that works. Many more queries are implicit phrase queries. For this, it no longer suffices to store only <term : docs> entries.
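A minimal Python sketch of the problem, assuming whitespace tokenization and two toy documents made up for illustration (`build_term_docs` and `boolean_and` are hypothetical helper names). A Boolean AND over a <term : docs> index returns both documents, although only the first contains the phrase.

```python
from collections import defaultdict

def build_term_docs(docs):
    """Nonpositional index: term -> set of docIDs (positions are discarded)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, terms):
    """Docs containing every term, in any order and at any distance."""
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return sorted(result or set())

docs = {
    1: "dalhousie university is in halifax",
    2: "i went to university at dalhousie",
}
index = build_term_docs(docs)
print(boolean_and(index, ["dalhousie", "university"]))  # [1, 2]
```

Nothing stored in this index distinguishes doc 1 from doc 2; telling them apart requires knowing where the terms occur.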

A First Attempt: Biword indexes
Index every consecutive pair of terms in the text as a phrase. For example, the text "Friends, Romans, Countrymen" would generate the biwords "friends romans" and "romans countrymen". Each of these biwords is now a dictionary term, so two-word phrase query processing is immediate.
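A minimal sketch of a biword index in Python, assuming simple lowercasing and punctuation stripping as the tokenizer. The second document and the helper names (`tokenize`, `build_biword_index`, `two_word_phrase`) are hypothetical, added only to show that word order matters.

```python
from collections import defaultdict

PUNCT = ".,;:!?\"'"

def tokenize(text):
    """Lowercase and strip surrounding punctuation (a simplifying assumption)."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def build_biword_index(docs):
    """Map each consecutive token pair, joined by a space, to the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for first, second in zip(tokens, tokens[1:]):
            index[f"{first} {second}"].add(doc_id)
    return index

def two_word_phrase(index, phrase):
    """A two-word phrase query is a single dictionary lookup."""
    return sorted(index.get(" ".join(tokenize(phrase)), set()))

docs = {1: "Friends, Romans, Countrymen", 2: "Romans and friends"}
index = build_biword_index(docs)
print(sorted(index))  # ['and friends', 'friends romans', 'romans and', 'romans countrymen']
print(two_word_phrase(index, "Friends Romans"))  # [1]
print(two_word_phrase(index, "romans friends"))  # []
```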

One motive could be that search queries are short: about 2.4 words long on average. This observation applies to Web search only.

Longer Phrase Queries
Longer phrases are processed as Boolean queries on biwords. For example, "dalhousie university halifax canada" can be broken into the following Boolean query on biwords:
"dalhousie university" AND "university halifax" AND "halifax canada"
Without examining the documents themselves, we cannot verify that the documents matching this Boolean query actually contain the phrase. There can be false positives!

Extended biwords
Parse the indexed text and perform part-of-speech tagging (POST). Bucket the terms into (say) nouns (N) and articles/prepositions (X). Call any string of terms of the form NX*N an extended biword. Each such extended biword is now made a term in the dictionary.
Example: "catcher in the rye" is tagged N X X N.
Query processing: parse the query into N's and X's, segment it into extended biwords, and look up "catcher rye" in the index.
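A minimal sketch of extracting extended biwords from already-tagged tokens. The tags are supplied by hand here as an assumption; a real system would run a POS tagger and map its tags into the N/X buckets. Any tag other than N or X breaks a span, and `extended_biwords` is a hypothetical name.

```python
def extended_biwords(tagged):
    """Emit 'noun1 noun2' for every span matching N X* N in (token, tag) pairs.

    'N' = noun, 'X' = article/preposition; the X terms are dropped from the
    resulting dictionary term, as in the "catcher rye" example.
    """
    result = []
    for i, (token, tag) in enumerate(tagged):
        if tag != "N":
            continue
        j = i + 1
        while j < len(tagged) and tagged[j][1] == "X":
            j += 1  # skip over zero or more articles/prepositions
        if j < len(tagged) and tagged[j][1] == "N":
            result.append(f"{token} {tagged[j][0]}")
    return result

query = [("catcher", "N"), ("in", "X"), ("the", "X"), ("rye", "N")]
print(extended_biwords(query))  # ['catcher rye']
```

Because X* allows zero X terms, two adjacent nouns also form an extended biword.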

Issues for biword indexes
False positives, as noted before. Index blowup due to a bigger dictionary: infeasible for phrases longer than biwords, and big even for biwords. Biword indexes are not the standard solution (for all biwords), but they can be part of a compound strategy.

Solution 2: Positional Indexes
In the postings, store, for each term, the position(s) at which tokens of it appear:
<term, number of docs containing term; doc1: position1, position2; doc2: position1, position2; etc.>
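A minimal sketch of building such postings in Python, assuming whitespace tokenization and 1-based token positions. The two documents and the helper names (`build_positional_index`, `format_entry`) are hypothetical; `format_entry` only renders an entry in the slide's notation.

```python
from collections import defaultdict

def build_positional_index(docs):
    """term -> {docID: [positions]}, with 1-based token positions (an assumption)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for pos, term in enumerate(docs[doc_id].lower().split(), start=1):
            index[term][doc_id].append(pos)
    return index

def format_entry(index, term):
    """Render <term, number of docs containing term; doc1: positions; doc2: ...>."""
    postings = index.get(term, {})
    body = "; ".join(
        f"{doc}: {', '.join(map(str, positions))}"
        for doc, positions in sorted(postings.items())
    )
    return f"<{term}, {len(postings)}; {body}>"

docs = {1: "to be or not to be", 2: "to do is to be"}
index = build_positional_index(docs)
print(format_entry(index, "to"))  # <to, 2; 1: 1, 5; 2: 1, 4>
print(format_entry(index, "be"))  # <be, 2; 1: 2, 6; 2: 5>
```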

Positional index example
Which of docs 1, 2, 4, 5 could contain "to be or not to be"? For phrase queries, we use a merge algorithm recursively at the document level, but we now need to deal with more than just equality.
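The document-level step is the usual linear merge of two sorted docID lists. A minimal Python sketch (`intersect` is a hypothetical name), run on the doc IDs visible in the "to" and "be" postings of the following slide; only documents that survive this step need their positions compared, and that comparison is no longer a plain equality test.

```python
def intersect(p1, p2):
    """Linear merge of two sorted docID lists (document-level AND)."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

print(intersect([2, 4, 7], [1, 4, 5]))  # [4]
```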

Processing a phrase query
Extract inverted index entries for each distinct term: to, be, or, not. Merge their doc:position lists to enumerate all positions with "to be or not to be".
to: 2:1,17,74,222,551; 4:8,16,190,429,433; 7:13,23,191; ...
be: 1:17,19; 4:17,191,291,430,434; 5:14,19,101; ...
The same general method works for proximity searches.
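A minimal sketch of the position-level check, using only the portions of the "to" and "be" postings printed above (the "..." tails are omitted). For brevity it tests positions with Python sets rather than a pointer-based merge; `phrase_matches` is a hypothetical name. A document matches when term k of the phrase occurs at position start + k for every k.

```python
def phrase_matches(postings, terms):
    """Return {docID: [start positions]} where the terms occur consecutively.

    postings: term -> {docID: sorted list of positions}.
    """
    lists = [postings.get(t, {}) for t in terms]
    if not lists:
        return {}
    common = set(lists[0])
    for plist in lists[1:]:
        common &= set(plist)  # document-level AND
    result = {}
    for doc in sorted(common):
        position_sets = [set(plist[doc]) for plist in lists]
        starts = [p for p in lists[0][doc]
                  if all(p + k in position_sets[k] for k in range(1, len(terms)))]
        if starts:
            result[doc] = starts
    return result

postings = {
    "to": {2: [1, 17, 74, 222, 551], 4: [8, 16, 190, 429, 433], 7: [13, 23, 191]},
    "be": {1: [17, 19], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]},
}
print(phrase_matches(postings, ["to", "be"]))  # {4: [16, 190, 429, 433]}
```

Extending the query to all of "to be or not to be" would need the postings for "or" and "not" as well, which the slide does not show.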

Proximity queries
LIMIT! /3 STATUTE /3 FEDERAL /2 TORT
Again, here, /k means "within k words of". Clearly, positional indexes can be used for such queries; biword indexes cannot.
Exercise: Adapt the linear merge of postings to handle proximity queries. Can you make it work for any value of k? This is a little tricky to do correctly and efficiently. See Figure 2.12 of IIR. There's likely to be a problem on it!

Figure 2.12 (textbook)
Please run it through this example:
to: 4: 8,16,190,429,433;
be: 4: 17,191,291,430,434;
Proximity applies on both sides of the term.
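A Python sketch in the spirit of the IIR Figure 2.12 positional intersection, not a transcription of it; the function name and the (docID, positions) list layout are assumptions. For each position of the first term it keeps a sliding window of second-term positions within k on either side, dropping positions that fall out of range on the left. Run on the example with k = 1:

```python
from collections import deque

def positional_intersect(p1, p2, k):
    """Return (docID, pos1, pos2) triples with |pos1 - pos2| <= k.

    p1, p2: lists of (docID, sorted positions), sorted by docID.
    """
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        doc1, pos1 = p1[i]
        doc2, pos2 = p2[j]
        if doc1 == doc2:
            window = deque()  # term-2 positions within k of the current term-1 position
            b = 0
            for x in pos1:
                while b < len(pos2):
                    if abs(x - pos2[b]) <= k:
                        window.append(pos2[b])
                    elif pos2[b] > x:
                        break  # beyond x + k: keep it for the next x
                    b += 1  # in range (recorded) or too far left (skipped for good)
                while window and abs(window[0] - x) > k:
                    window.popleft()  # fell out of range on the left
                answer.extend((doc1, x, y) for y in window)
            i += 1
            j += 1
        elif doc1 < doc2:
            i += 1
        else:
            j += 1
    return answer

to = [(4, [8, 16, 190, 429, 433])]
be = [(4, [17, 191, 291, 430, 434])]
print(positional_intersect(to, be, 1))
# [(4, 16, 17), (4, 190, 191), (4, 429, 430), (4, 433, 434)]
```

The pointer into the second list never moves backwards, so each document's work stays roughly linear in its position lists plus the number of matches reported, for any k.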

Positional index size
You can compress position values/offsets. Nevertheless, a positional index expands postings storage substantially. Even so, positional indexes are now standardly used because of the power and usefulness of phrase and proximity queries, whether used explicitly or implicitly in a ranking retrieval system.
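The slide does not say which compression scheme is meant. One common choice, assumed here and discussed in IIR Chapter 5, is to store gaps between successive positions and then variable-byte encode them (7 payload bits per byte, high bit set on a number's last byte). A minimal sketch with hypothetical helper names, applied to the "to" positions in doc 4 from the earlier slides:

```python
def to_gaps(positions):
    """Sorted positions -> first position followed by successive differences."""
    return [p - q for p, q in zip(positions, [0] + positions[:-1])]

def from_gaps(gaps):
    """Running sum restores the original positions."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out

def vb_encode_number(n):
    """Variable-byte code: 7 payload bits per byte; high bit marks the last byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128
    return out

def vb_encode(numbers):
    return bytes(b for n in numbers for b in vb_encode_number(n))

def vb_decode(data):
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + byte - 128)
            n = 0
    return numbers

positions = [8, 16, 190, 429, 433]
gaps = to_gaps(positions)  # [8, 8, 174, 239, 4]
encoded = vb_encode(gaps)
print(len(encoded))        # 7 (bytes), versus 20 bytes for five 4-byte integers
assert from_gaps(vb_decode(encoded)) == positions
```

Gaps stay small because positions within a document are sorted, which is what makes the variable-length code pay off.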

Positional index size
We need an entry for each occurrence, not just one per document, so index size depends on average document size. The average web page has fewer than 1000 terms; books and some epic poems easily have 100,000 terms. Consider a term with frequency 0.1%:

Document size    Postings    Positional postings
1000             1           1
100,000          1           100
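The table's arithmetic, as a short Python sketch: a nonpositional index stores one posting per document containing the term, while a positional index stores one entry per occurrence, that is document size times term frequency on average. `expected_entries` is a hypothetical name, and it assumes the document contains the term at least once; `Fraction` avoids floating-point rounding.

```python
from fractions import Fraction

def expected_entries(doc_size, term_freq):
    """(postings, positional postings) one document contributes for a term.

    Assumes the document contains the term, so the nonpositional index
    stores exactly one posting; positions average doc_size * term_freq.
    """
    return 1, doc_size * term_freq

freq = Fraction(1, 1000)  # 0.1%
for size in (1000, 100_000):
    print(size, *expected_entries(size, freq))
# 1000 1 1
# 100000 1 100
```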

Rules of thumb
A positional index is 2 to 4 times as large as a nonpositional index. Positional index size is 35 to 50% of the volume of the original text. Caveat: all of this holds for "English-like" languages.
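A back-of-envelope calculator for these rules of thumb, as a Python sketch. The two rules are separate heuristics; deriving a nonpositional range from them (positional size divided by 4 to 2) is our own extrapolation, and like the rules it only applies to English-like text. `estimate_index_sizes` is a hypothetical name.

```python
def estimate_index_sizes(text_bytes):
    """Rough (low, high) size ranges in bytes from the rules of thumb."""
    positional = (text_bytes * 35 // 100, text_bytes * 50 // 100)
    # Positional is 2-4x nonpositional, so nonpositional is 1/4 to 1/2 of positional.
    nonpositional = (positional[0] // 4, positional[1] // 2)
    return positional, nonpositional

positional, nonpositional = estimate_index_sizes(1_000_000_000)  # 1 GB of text
print(positional)     # (350000000, 500000000)
print(nonpositional)  # (87500000, 250000000)
```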

Combination schemes
These two approaches can be profitably combined. For particular phrases ("Michael Jackson", "Britney Spears") it is inefficient to keep on merging positional postings lists, and even more so for phrases like "The Who". Williams et al. (2004) evaluate a more sophisticated mixed indexing scheme: a typical web query mixture was executed in ¼ of the time of using just a positional index, and it required 26% more space than having a positional index alone.
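A minimal Python sketch of the basic idea, not the Williams et al. scheme: precompute answers for a hand-picked set of popular phrases and fall back to positional matching for everything else. The popular-phrase list, the toy documents, and the names `build_combined_index` and `phrase_query` are all hypothetical; positions are 0-based here.

```python
from collections import defaultdict

def build_combined_index(docs, popular_phrases):
    """Positional index for all terms, plus precomputed doc sets for popular phrases."""
    positional = defaultdict(lambda: defaultdict(list))  # term -> doc -> positions
    phrase_index = {p: set() for p in popular_phrases}   # phrase tuple -> docs
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for pos, term in enumerate(tokens):
            positional[term][doc_id].append(pos)
        for phrase in popular_phrases:
            n = len(phrase)
            if any(tuple(tokens[i:i + n]) == phrase for i in range(len(tokens) - n + 1)):
                phrase_index[phrase].add(doc_id)
    return positional, phrase_index

def phrase_query(query, positional, phrase_index):
    terms = tuple(query.lower().split())
    if terms in phrase_index:  # popular phrase: no postings merge needed
        return sorted(phrase_index[terms]), "phrase index"
    # Fallback: intersect the docs, then check for consecutive positions.
    per_term = [positional.get(t, {}) for t in terms]
    candidates = set(per_term[0]).intersection(*per_term[1:]) if per_term else set()
    hits = [d for d in sorted(candidates)
            if any(all(p + k in per_term[k][d] for k in range(1, len(terms)))
                   for p in per_term[0][d])]
    return hits, "positional index"

docs = {1: "the who played loud", 2: "who is the drummer", 3: "michael jackson met the who"}
popular = [("the", "who"), ("michael", "jackson")]
positional, phrase_index = build_combined_index(docs, popular)
print(phrase_query("The Who", positional, phrase_index))      # ([1, 3], 'phrase index')
print(phrase_query("the drummer", positional, phrase_index))  # ([2], 'positional index')
```

The trade-off mirrors the slide: extra space for the precomputed phrase entries buys faster answers on the phrases users query most.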

Resources for today's lecture
IIR Chapter 2
H.E. Williams, J. Zobel, and D. Bahle. Fast Phrase Querying with Combined Indexes. ACM Transactions on Information Systems.
D. Bahle, H. Williams, and J. Zobel. Efficient phrase querying with an auxiliary index. SIGIR 2002, pp.


More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Log Analysis of Academic Digital Library: User Query Patterns

Log Analysis of Academic Digital Library: User Query Patterns Log Analysis of Academic Digital Library: User Query Patterns Hyejung Han 1, Wooseob Jeong 1 and Dietmar Wolfram 1 1 University of Wisconsin - Milwaukee Abstract This study analyzed user queries submitted

More information

Overview of the Full-Text Document Retrieval Benchmark

Overview of the Full-Text Document Retrieval Benchmark 8 Overview of the Full-Text Document Retrieval Benchmark Samuel DeFazio Digital Equipment Corporation 8.1 Introduction For most of recorded history, textual data have existed primarily in hard-copy format,

More information

Solutions to Problem Set 1

Solutions to Problem Set 1 YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CPSC 467b: Cryptography and Computer Security Handout #8 Zheng Ma February 21, 2005 Solutions to Problem Set 1 Problem 1: Cracking the Hill cipher Suppose

More information

Introduction to Proofs

Introduction to Proofs Chapter 1 Introduction to Proofs 1.1 Preview of Proof This section previews many of the key ideas of proof and cites [in brackets] the sections where they are discussed thoroughly. All of these ideas are

More information

Reading and Writing in the EYFS

Reading and Writing in the EYFS Reading and Writing in the EYFS Aims of this session: Outline the expectations in Nursery and Reception for reading and writing Explain how we teach reading in the EYFS Give you some ideas on how you can

More information

Inverted Files for Text Search Engines

Inverted Files for Text Search Engines Inverted Files for Text Search Engines JUSTIN ZOBEL RMIT University, Australia AND ALISTAIR MOFFAT The University of Melbourne, Australia The technology underlying text search engines has advanced dramatically

More information

Storage Management for Files of Dynamic Records

Storage Management for Files of Dynamic Records Storage Management for Files of Dynamic Records Justin Zobel Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia. jz@cs.rmit.edu.au Alistair Moffat Department of Computer Science

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 7, July 23 ISSN: 2277 28X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Greedy Algorithm:

More information

Standard for Software Component Testing

Standard for Software Component Testing Standard for Software Component Testing Working Draft 3.4 Date: 27 April 2001 produced by the British Computer Society Specialist Interest Group in Software Testing (BCS SIGIST) Copyright Notice This document

More information

Search Engines: Technology, Society, and Business. Prof. Marti Hearst Sept 24, 2007

Search Engines: Technology, Society, and Business. Prof. Marti Hearst Sept 24, 2007 Search Engines: Technology, Society, and Business Prof. Marti Hearst Sept 24, 2007 How Search Engines Work Three main parts: i. Gather the contents of all web pages (using a program called a crawler or

More information

Sentence Parts. Abbreviations

Sentence Parts. Abbreviations Sentence Parts Daily Grammar Practice Day 2 Tuesday what do I do with all those labels from Monday? First, don t ignore what you did yesterday. Use Monday s labels as a guide. 1. Label any prepositional

More information

MFL skills map. Year 3 Year 4 Year 5 Year 6 Develop understanding of the sounds of Individual letters and groups of letters (phonics).

MFL skills map. Year 3 Year 4 Year 5 Year 6 Develop understanding of the sounds of Individual letters and groups of letters (phonics). listen attentively to spoken language and show understanding by joining in and responding explore the patterns and sounds of language through songs and rhymes and link the spelling, sound and meaning of

More information

Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report

Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report Comparing explicit and implicit feedback techniques for web retrieval: TREC-10 interactive track report Ryen W. White 1, Joemon M. Jose 1 and I. Ruthven 2 1 Department of Computing Science University of

More information

Testing LTL Formula Translation into Büchi Automata

Testing LTL Formula Translation into Büchi Automata Testing LTL Formula Translation into Büchi Automata Heikki Tauriainen and Keijo Heljanko Helsinki University of Technology, Laboratory for Theoretical Computer Science, P. O. Box 5400, FIN-02015 HUT, Finland

More information

Terminology Retrieval: towards a synergy between thesaurus and free text searching

Terminology Retrieval: towards a synergy between thesaurus and free text searching Terminology Retrieval: towards a synergy between thesaurus and free text searching Anselmo Peñas, Felisa Verdejo and Julio Gonzalo Dpto. Lenguajes y Sistemas Informáticos, UNED {anselmo,felisa,julio}@lsi.uned.es

More information