The Mirror DBMS at TREC-9
Arjen P. de Vries
CWI, Amsterdam, The Netherlands

Abstract

The Mirror DBMS is a prototype database system especially designed for multimedia and web retrieval. From a database perspective, this year's purpose has been to check whether we can get sufficient efficiency on the larger data set used in TREC-9. From an IR perspective, the experiments are limited to rather primitive web retrieval, teaching us that web retrieval is (un?-)fortunately not just retrieving text from a different data source. We report on some limited (and disappointing) experiments in an attempt to benefit from the manually assigned data in the meta tags. We further discuss observations with respect to the effectiveness of title-only topics.

1 Introduction

The Mirror DBMS [dv99] combines content management and data management in a single system. The main advantage of such integration is the facility to combine IR with traditional data retrieval. Furthermore, IR researchers can experiment more easily with new retrieval models, using and combining various sources of information. The IR retrieval model is completely integrated in the Mirror DBMS database architecture, emphasizing efficient set-oriented query processing. The logical layer of its architecture supports a nested object algebra called Moa; the physical layer uses the Monet main-memory DBMS and its MIL query language [BK99]. Experiments performed in last year's evaluation are described in [dvh99]; the system's support for IR is presented in detail in [dv98] and [dvw99].

The main goal of this year's participation in TREC has been to migrate from plain text retrieval to retrieving web documents, and simultaneously improve our algorithms to handle the significantly larger collections.

The paper is organized as follows. Section 2 details our lab environment. Section 3 interprets our results and discusses our plans for next year with the Mirror DBMS, followed by conclusions.
2 Lab Environment

<doc docno="WTX044-B44-57" docoldno="IA B">
  <title>bootnet reviews</title>
  <description>... </description>
  <keywords>... </keywords>
  <body> compaq ambitious delivers impressive design performance and features compaq presario so near yet so far win means the issue of sound blaster compatibility is dead and buried right wrong... </body>
  <img>card cage</img>
</doc>

Figure 1: The intermediate format produced (XML).

This section discusses the processing of the WT10g data collection used to produce our runs. The hardware platform running the experiments is a (dedicated) dual Pentium III 600 MHz, running Linux, with 1 GB of main memory and 100 GB of disk space.

Adapting our existing IR setup to handling web data caused much more trouble than expected. As a side-effect of this problem, the submitted runs contained some errors, and even fixing those does not give us the best runs ever; a lot of work still remains to be done to improve our current platform. Managing a new document collection and getting it indexed has turned out to be a rather time-consuming problem. Obviously, the WT10g is 10 times larger than TREC, so we decided to treat it as a collection of 104 subcollections following the layout on the compact discs. But handling a collection of this size was not the real issue; our main problems related to the quality of data gathered from the web.

2.1 Parsing

After some initial naive efforts to hack a home-grown HTML parser, we bailed out and used the readily available Perl package HTML::Parser. It is pretty good at correcting bad HTML on the fly; the only real problem we bumped into was that it assumes a document to always have at least a <HEAD> and a <BODY>, which fails on WTX089-B33. We convert the sloppy HTML documents into (rather simple) XML documents that are easier to manage in the subsequent indexing steps. We keep the normal content words, the content of the <IMG> tag's ALT attribute, as well as the following meta tags: keywords, description, classification, abstract, author, build. In this first step, we also normalize the textual data to some extent by converting to lower-case and throwing out 'strange characters'; unfortunately, due to working against a very tight schedule (too tight), this included the removal of all punctuation and numeric characters (not helping topics referring to a particular year, like topics 456 and 481). An example result file is shown in Figure 1.

What affected our results severely is our assumption that HTML documents are nicely wrapped in <HTML> and </HTML> tags. Unfortunately, this first bug removed about half of the collection from our index.
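The submitted runs used a Perl script built on HTML::Parser for this step. Purely as an illustration of the extraction and normalization described above, the sketch below (in Python; the class and helper names are invented for this example and not part of our code) keeps body text, the ALT text of images, and the selected meta tags, lower-cases everything, and strips punctuation and digits:

# Illustrative sketch only; the actual parser is a Perl script using HTML::Parser.
import re
from html.parser import HTMLParser

KEPT_META = {"keywords", "description", "classification", "abstract", "author", "build"}

def normalize(text):
    # lower-case and replace anything that is not a letter or whitespace
    # (this is the over-aggressive step that also removes years like 1998)
    return re.sub(r"[^a-z\s]+", " ", text.lower()).strip()

class WebDocParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.term_text = []   # ends up in the term sets (title, body, img)
        self.meta_text = []   # ends up in the meta sets

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("alt"):
            self.term_text.append(normalize(attrs["alt"]))
        elif tag == "meta" and attrs.get("name", "").lower() in KEPT_META:
            self.meta_text.append(normalize(attrs.get("content", "")))

    def handle_data(self, data):
        if data.strip():
            self.term_text.append(normalize(data))

parser = WebDocParser()
parser.feed("<html><body>Compaq delivers impressive design."
            "<img src='x.gif' alt='card cage'></body></html>")
print(parser.term_text)   # ['compaq delivers impressive design', 'card cage']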
2.2 Indexing

The second step reads the intermediate XML files and converts them into load tables for our database system. The contents of the title, body, and img tags are unioned together in term sets, and all other tags in meta sets.¹ Notice that we do not have to union these tags together; but any alternative requires an IR model that can handle fields properly, which is still beyond our skill. After loading these tables, the complete collection is represented by the following schema:

define WT10g_docs as
  SET< SET< TUPLE<
    Atomic<string>: docno,
    SET< Atomic<string> >: term,
    SET< Atomic<string> >: meta
  > >: subcollection >;

¹ Notice that these sets are really multi-sets or bags.

The Mirror DBMS supports efficient ranking using the CONTREP domain-specific Moa structures, specialized in IR. These structures now have to be created from the above representation; this indexing procedure is still implemented in a separate indexing MIL script which is fed directly into Monet, the DBMS. The script performs stopping and stemming, creates the global statistics, and creates the internally used <d_j, t_i, tf_{i,j}> tuples.

Web data is quite different from newspaper articles: the strangest terms can be found in the indexing vocabulary after stopping and stemming. Examples vary from 'yippieyayeheee' and the like, to complete sentences lacking spaces between the words. After quick inspection, we decided to prune the vocabulary aggressively: all words longer than 20 characters are plainly removed from the indexing vocabulary, as well as all words containing a sequence of more than four identical characters.²

² We realize that this is a rather drastic ad-hoc approach; it is not likely to survive into the codebase of next year.
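The pruning rule is simple enough to state precisely. The sketch below (Python, for illustration only; in our setup this happens inside the MIL indexing script, and the function names are invented for this example) drops a term if it is longer than 20 characters or contains a run of more than four identical characters, and produces the <d_j, t_i, tf_{i,j}> tuples for one document:

# Illustrative sketch; the real pruning is part of the MIL indexing script.
import re
from collections import Counter

RUN_OF_FIVE = re.compile(r"(.)\1{4,}")   # five or more identical characters in a row

def keep_term(term):
    return len(term) <= 20 and not RUN_OF_FIVE.search(term)

def doc_term_tuples(docno, terms):
    # yields the <d_j, t_i, tf_ij> tuples for one document, after pruning
    counts = Counter(t for t in terms if keep_term(t))
    for term, tf in counts.items():
        yield (docno, term, tf)

print(keep_term("compaq"))                          # True
print(keep_term("wwwwweird"))                       # False: run of five identical characters
print(keep_term("acompletesentencewithoutspaces"))  # False: longer than 20 characters
print(list(doc_term_tuples("WTX044-B44-57", ["compaq", "presario", "compaq"])))
# [('WTX044-B44-57', 'compaq', 2), ('WTX044-B44-57', 'presario', 1)]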
2.3 Retrieval

After running the indexing script, we obtain the following schema that can be used in Moa expressions to perform ranking:

define WT10g_index as
  SET< SET< TUPLE<
    Atomic<string>: docno,
    CONTREP: term,
    CONTREP: meta
  > >: subcollection >;

The Monet database containing both the parsed web documents and its index takes 9 GB of disk space.

As in TREC-8, we use Hiemstra's LMM retrieval model (see also [Hie00] and our technical report [HdV00]). It builds a simple statistical language model for each document in the collection. The probability that a query T_1, T_2, ..., T_n of length n is generated by the language model of the document with identifier D is defined by the following equation:

P(T_1 = t_1, \ldots, T_n = t_n \mid D = d) = \prod_{i=1}^{n} \left( \alpha_1 \frac{df(t_i)}{\sum_t df(t)} + \alpha_2 \frac{tf(t_i, d)}{\sum_t tf(t, d)} \right)    (1)

The getbl operator (i.e., 'get beliefs') defined for the CONTREP structure ranks documents according to this retrieval model. We planned two series of experiments: ranking using the raw content collected in the term sets, and ranking using a weighted combination of the first ranking with the ranking based on the annotations extracted from the meta tags collected in the meta sets. These two experiments are (approximately) described by the following Moa expressions:³

(1) flatten(map[map[TUPLE<THIS.docno,
        sum(getbl(THIS.term, termstat, query))>](THIS)](WT10g_index));

(2) map[TUPLE<THIS.docno, THIS.termBel + 0.1*THIS.metaBel>](
      flatten(map[map[TUPLE<
          THIS.docno,
          sum(getbl(THIS.term, termstat, query)): termBel,
          sum(getbl(THIS.meta, metastat, query)): metaBel
        >](THIS)](WT10g_index)));

³ Details like sorting and selecting the top-ranked documents have been left out.

We do not expect the reader to grasp the full meaning of these queries, but only intend to give an overall impression: the inner map computes the conditional probabilities for documents in a subcollection, which are accessed with the outer map; the flattens remove nesting.
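To make equation (1) and run (2) concrete, the following sketch scores a single document outside the database (Python, for illustration only; in our system the beliefs are computed inside Monet by the getbl operator, and the function names, smoothing weights, and statistics below are assumptions made for this example, not values taken from our runs):

# Illustrative sketch of equation (1) and the combination used in run (2).
# In the Mirror DBMS this computation is performed by getbl on CONTREP structures.
import math

def score_document(query_terms, tf_doc, doc_len, df, total_df, alpha1=0.5, alpha2=0.5):
    # log-probability that the document language model generates the query (equation 1);
    # alpha1/alpha2 are illustrative values, not the weights used in our runs
    score = 0.0
    for t in query_terms:
        p_collection = df.get(t, 0) / total_df    # df(t_i) / sum_t df(t)
        p_document = tf_doc.get(t, 0) / doc_len   # tf(t_i, d) / sum_t tf(t, d)
        p = alpha1 * p_collection + alpha2 * p_document
        if p > 0:
            score += math.log(p)
    return score

def combined_score(term_belief, meta_belief, meta_weight=0.1):
    # run (2): term-based belief plus the down-weighted belief from the meta sets
    return term_belief + meta_weight * meta_belief

# toy example with made-up statistics
df = {"casino": 120, "eldorado": 15}
tf_doc = {"casino": 3, "eldorado": 1}
belief = score_document(["casino", "eldorado"], tf_doc, doc_len=250, df=df, total_df=100000)
print(combined_score(belief, meta_belief=-5.0))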
Similar to TREC-8, the query plan generated by the Moa rewriter (in MIL) has been manually edited to loop over the 50 topics, log the computed ranking for each topic, and use two additional tables: one with precomputed normalized inverse document frequencies (a materialized view), and one with the document-specific constants for normalizing the term frequencies. The flatten and the outer map, which iterate over the 104 subcollections and merge the results, were written directly in MIL as well. Unfortunately, we introduced a second (real) bug in this merging phase, which messed up our main results: we used slice(1,1000), whereas this really selects from the 2nd up to the 1001st ranked document, hence throwing out the 104 best documents (best according to our model).

3 Discussion

Table 1 presents a summary of the average precision (AP, as reported by trec_eval) measured on our runs. The first column has the results in which the top 100 documents are missing; the second column has the fixed results.⁴

run name   description                        AP    AP (fixed)
CWI0000    title
CWI0001    title & description
CWI0002    title & description & narrative
CWI0010    title (term & meta)

Table 1: Result summary.

⁴ The run with combined term and meta data has not been fixed, due to some strange software problems that we have not figured out yet.

Very surprising is the fact that using the description and/or narrative has not been helpful at all. This is completely different from our experience with evaluating TREC-6 and TREC-8 topics. Closer examination shows that the description (and sometimes the narrative) help significantly for the following topics:

452 ('do beavers live in salt water?'): here, the description adds more general words such as 'habitat';

455: the description specifies 'major league' and the narrative finally gives the desired 'baseball';

476: the description adds the scope ('television and movies') to the title 'Jennifer Aniston';

478: the description adds 'mayor' to 'Baltimore';
498: the title does not mention 'how many' and 'cost', which reveal the real information need.

Precision drops, however, in most other cases; especially the very concise title queries 485 ('gps clock'), 497 ('orchid') and 499 ('pool cue') suffer from overspecification in the description and narrative. For example, topic 486 shows that the 'casino' from the title is not weighted enough in comparison to generic terms like 'Eldorado'. It warrants further investigation to see whether query term weighting would address this counter-intuitive result.

We have not succeeded in making effective use of the information collected in the meta tags. Using only the meta tags leads to a poor average precision. Closer investigation of topics 451, 492, and 494, for which the meta tags do retrieve relevant documents in the top 10, shows that these documents are also ranked high using the bare document content. Thus, we can only conclude that in our current approach, the meta tags can safely be ignored. Despite this disappointing experience, we still hope that using this information source may be more beneficial for processing blind relevance feedback.

Table 2 shows the results after spell-checking the topics semi-automatically using a combination of ispell and common sense; this would have had a minor positive effect on the title-only queries. Results for topics 463 (Tartin), 475 (compostion), and 487 (angioplast7) improve significantly; but splitting 'nativityscenes' in topic 464 causes a loss in precision, because the (Porter) stemmer reduces 'nativity' to 'nativ'. We conclude from the other runs with spell-checked topics that we should address proper weighting of title terms first.

run name   description                        AP
CWI0000s   title
CWI0001s   title & description
CWI0002s   title & description & narrative

Table 2: Results with spell-checked topics.

4 Conclusions

The honest conclusion of this year's evaluation should be that we underestimated the problem of handling web data. Surprising is the performance of the title-only queries, which do better than queries including the description or even the narrative. It seems that the web-track topics are really different from the previous TREC topics in the ad-hoc task, for which we never weighted title terms differently from description or narrative terms.

For next year, our primary goal will be to improve the current indexing situation. The indexing process can be described declaratively using the notion of feature grammars described in [dvwak00]. Also, we will split the indexing vocabulary into a trusted vocabulary (based on a proper dictionary), a numeric and named entity collection, and
a trash dictionary with rare and possibly nonsense terms. A third goal should be evident from the following quote from last year's TREC paper: "Next year, the Mirror DBMS should be ready to participate in the large WEB track." In other words: we still intend to tackle the large web track. For our cross-lingual work in CLEF [dv00] we have to address the problems of spelling errors and named entities anyway, which is directly related to some of our TREC problems. Other ideas include working on links and blind relevance feedback, as well as improving our understanding of how to effectively exploit the meta sets.

Acknowledgements

Henk Ernst Blok did a great just-in-time job of getting a late delivery of hardware prepared for running these experiments. Further thanks go to Djoerd Hiemstra and Niels Nes for their support with IR models and Monet, respectively.

References

[BK99] P.A. Boncz and M.L. Kersten. MIL primitives for querying a fragmented world. The VLDB Journal, 8(2), 1999.

[dv98] A.P. de Vries. Mirror: Multimedia query processing in extensible databases. In Proceedings of the fourteenth Twente workshop on language technology (TWLT14): Language Technology in Multimedia Information Retrieval, pages 37–48, Enschede, The Netherlands, December 1998.

[dv99] A.P. de Vries. Content and multimedia database management systems. PhD thesis, University of Twente, Enschede, The Netherlands, December 1999.

[dv00] A.P. de Vries. A poor man's approach to CLEF. In CLEF 2000: Workshop on cross-language information retrieval and evaluation, Lisbon, Portugal, September 2000. Working Notes.

[dvh99] A.P. de Vries and D. Hiemstra. The Mirror DBMS at TREC-8. In Proceedings of the Eighth Text REtrieval Conference (TREC-8), NIST Special Publications, Gaithersburg, Maryland, November 1999.

[dvw99] A.P. de Vries and A.N. Wilschut. On the integration of IR and databases. In Database issues in multimedia; short paper proceedings, international conference on database semantics (DS-8), pages 16–31, Rotorua, New Zealand, January 1999.

[dvwak00] A.P. de Vries, M. Windhouwer, P.M.G. Apers, and M. Kersten. Information access in multimedia databases based on feature models. New Generation Computing, 18(4), August 2000.

[HdV00] D. Hiemstra and A.P. de Vries. Relating the new language models of information retrieval to the traditional retrieval models. Technical Report TR-CTIT-00-09, Centre for Telematics and Information Technology, May 2000.

[Hie00] D. Hiemstra. A probabilistic justification for using tf.idf term weighting in information retrieval. International Journal on Digital Libraries, 3(2), 2000.