Promoting Agriculture Knowledge via Public Web Search Engines : An Experience by an Iranian Librarian in Response to Agricultural Queries




Sedigheh Mohamadesmaeil
Assistant Professor, Department of Library and Information Sciences, Science and Research Branch, Islamic Azad University, Tehran, Iran
m.esmaeili2@gmail.com

Saeed Ghaffari
Department of Library & Information Science, Payam Noor University, Qom, Iran
Ghaffari13@yahoo.com

Originally presented at the 7th International Conference on Webometrics, Informetrics and Scientometrics (WIS) and 12th COLLNET Meeting, September 20-23, 2011, Istanbul Bilgi University, Istanbul, Turkey. Published Online First: 15 December 2012, COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT (Online First), http://www.tarupublications.com/journals/cjsim/cjsim.htm

Although the Internet is already becoming a valuable resource for information retrieval, agricultural interest groups and users still face important challenges in gaining extensive access to this information. There are a few specialized search engines, directories and sites for the agricultural domain on the web, but it seems that the major public search engines can also respond to queries in scientific fields. This paper aims to determine whether this holds in the agricultural domain by comparing and measuring five major public search engines in response to agricultural terminology. In order to assess recall, precision and overlap, five widely used search engines (Google, Yahoo, AltaVista, AOL, ASK) were chosen. Then five agricultural keywords selected from CAB (Intercropping, Carnivorous plants, Soil pollution, Plant viruses, Irrigation farming) were searched in each of these search engines. The best search engines for the subject terms are introduced. AOL had 63% precision and 22% recall and retrieved the most relevant agricultural documents. Also, Yahoo had 43% overlap with other search engines, so Yahoo also scored the highest rank. The findings reveal that major public search engines are a suitable alternative for finding agricultural information. The results can also help agricultural centers, agricultural information specialists and agricultural interest groups (users) seek better agricultural resources. This research investigates the recall, precision and overlap of web search engines using agricultural queries, and sheds light on the uniqueness of the top results retrieved by search engines. In other words, this paper indicates the significant value of search engines in web retrieval even in expert areas.

Keywords: Agriculture knowledge, public web search engines, Google, Yahoo, AltaVista, AOL, ASK, search engines, recall ratio, precision ratio, overlap, agricultural information, information retrieval

1. Introduction

Although the Internet is already a valuable information resource in agricultural information retrieval, there are important challenges to be faced before users will have extensive access to this information (Aguillo, 2000 [1]). Searching is the main activity on the web, and the major search engines are the most frequently used tools for accessing information (Nielsen, 2005 [8]). Many commercial web search engines offer public access to web sites, including Yahoo!, MSN Search, Google and Northern Light. Web search engines can differ from one another in three ways: crawling reach, frequency of updates, and relevancy analysis. Therefore, the performance capabilities and limitations of web search engines, and the differences between them, form an important and significant research area (Spink et al., 2006 [18]).

There is a large variety of search engines on the web. It is essential for agricultural librarians, as information experts, and also for agriculturalists to identify the best search engines for agricultural information retrieval, so that they can recommend them to agricultural researchers and use them themselves. If search engines with a high recall ratio are identified, users (here, agricultural experts) can confidently rely on them when searching the web. Thus, this paper aims to calculate the recall, precision and overlap of popular, widely used search engines in response to agricultural expressions.

2. Related studies

Since the mid-1990s, web searching research has become a crucial area of study. Ding and Marchionini (1996, [9]) investigated Infoseek, Lycos and Open Text for precision, duplication and degree of overlap using five complex queries. The first twenty hits, assessed for precision, show that the best results were obtained from Lycos and Open Text. Leighton and Srivastava [13] searched fifteen queries on AltaVista, Excite, HotBot, Infoseek and Lycos, taking the first twenty hits for evaluation of precision. Chu and Rosenthal [6] investigated AltaVista, Excite and Lycos for their search capabilities and precision. The authors used ten search queries of varying complexity, evaluated the first ten results for relevance, and found that AltaVista outperformed both Excite and Lycos in search facilities and retrieval performance. Clarke and Willett [7] searched thirty queries of varying nature on AltaVista, Excite and Lycos and obtained the best results in terms of precision, recall and coverage from AltaVista. Bar-Ilan [3] investigated six search engines using a single query, "Erdos". All 6,681 retrieved hits were examined for precision, overlap and estimated recall, and the study reports that no search engine has high recall.

Jansen et al. [12], Spink et al. [17], and Spink and Jansen [16] highlight key searching trends from 1997 to 2004, including that most web users do not enter many queries during a search session and view few results pages. Link analysis has also developed as a major web research area (Thelwall [19]). Cheney and Perry [5] compare the sizes of the Yahoo! and Google indexes. Mowshowitz and Kawaguchi [15] examined how web search engine results differ from an expected distribution. Egghe and Rousseau [10] analyze IR system overlap from a mathematical perspective, and Bar-Ilan [2] discusses a statistical comparison of overlap in web search engines. Bar-Yossef and Gurevich [4] discuss methods for comparing web search engine indexes. Isfandyari Moghaddam [11] carried out a comparative study on the overlap of search results in meta search engines and their common underlying search engines.

Mohammadesmaeil, Lafzighazi and Gilvari [14] carried out a study entitled "Comparing Search Engines and Meta Search Engines in Pharmaceutic Information Retrieval". The objective of that research was to measure the relevance of documents retrieved from search engines and meta search engines in the field of pharmacology. Its findings help web users, especially pharmaceutic researchers and specialists, to identify the search tools that cover more pharmaceutic information and to use them to access the required information. The research used a descriptive survey method. Six major search engines and meta search engines, listed by www.searchenginewatch.com as widely used Internet search tools, were chosen.
Pharmaceutic keywords were chosen from Medical Subject Headings (MeSH), and the selected pharmacology terms were searched in each of the search engines. The first 10 results from each search engine were selected for evaluation of recall and precision. Data were analyzed with Excel. The findings showed that Yahoo retrieved the most pharmaceutic documents and scored the highest rank (34%). AOL had 62% precision and 21% recall and retrieved the most relevant pharmaceutic documents. Dogpile retrieved the most pharmaceutic documents and scored the highest rank (22%), followed by Metacrawler (21%) and Info (19%). Excite had 62% precision and 22% recall and retrieved the most relevant pharmaceutic documents. Finally, the researchers concluded that search engines and meta search engines are suitable tools for amateur and professional users, with suitable search capabilities and facilities. Although search engines are useful for retrieving relevant documents, users are advised to run a search in several search engines in order to reach the relevant documents among the vast sources available on the web.

In brief, these studies show that recall, precision and overlap are important measures in surveys of web search engine performance. Most web search engine studies were performed using general query samples, whereas this research surveys the recall, precision and overlap of five popular, widely used search engines (Google, Yahoo, AltaVista, AOL, ASK) using six more precise, subject-specific agricultural keywords selected from CAB. The best search engines for the subject terms are introduced.

3. Methodology

In May 2010, a set of six queries on agricultural topics was chosen from the agricultural subject headings of CAB. Five major search engines, listed by www.searchenginewatch.com as widely used search tools at that time, were chosen. To determine recall, precision and overlap, these five search engines were queried with the selected terms from 25 June to 10 July 2010. The first 10 hits on each search engine's result pages for each of the six queries were taken as the search population. The research elements are as follows:

Major search engines: Google, Yahoo, AltaVista, AOL, ASK.
Search queries: Intercropping, Carnivorous plants, Soil pollution, Plant viruses, Irrigation farming, Organic farming.

3.1. Estimation of Precision, Recall and Overlap

Determining recall and precision requires deciding which of the hits retrieved by a search engine are relevant and which are not. The authors made these relevance decisions and scored the hits as follows:

Exactly relevant: hits in which the requested terms appear in full among the title words of the retrieved document.
Relevant: hits in which combinations of the stems of the requested terms appear in the title words of the retrieved document.
Partly relevant: hits in which part of a requested term is combined with a prefix or suffix to form a word in the title of the retrieved document.
Not relevant: hits in which none of the requested terms appears in the title of the retrieved document.

Exactly relevant, relevant and partly relevant hits are all considered relevant and scored 1. Not relevant hits are scored 0.

Precision is the fraction of the search output that is relevant to a particular query. Its calculation therefore requires knowledge of the relevant and non-relevant hits in the evaluated set of documents (Clarke & Willett, 1997 [7]). It is thus possible to calculate the absolute precision of the search engines, which provides an indication of the relevance of the system. In the context of the present study, precision is defined as:

$$\text{Precision} = \frac{\text{Sum of the scores of relevant scholarly documents retrieved by a search engine}}{\text{Total number of results evaluated}}$$
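The title-based relevance rule and the precision ratio above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the suffix-stripping `crude_stem` helper and the sample titles are hypothetical, and the authors made the relevance decisions themselves rather than with a program.

```python
import re

SUFFIXES = ("ing", "es", "s")  # assumption: a crude stand-in for a real stemmer


def words(text):
    """Lower-case a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip one common English suffix, keeping at least four letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def classify_hit(query, title):
    """Assign one of the four relevance categories from the title alone."""
    terms, title_words = words(query), words(title)
    if all(term in title_words for term in terms):
        return "exactly relevant"
    stems = [crude_stem(term) for term in terms]
    if any(w.startswith(s) for s in stems for w in title_words):
        return "relevant"
    if any(s in w for s in stems for w in title_words):
        return "partly relevant"
    return "not relevant"


def score(category):
    """The three relevant categories score 1; 'not relevant' scores 0."""
    return 0 if category == "not relevant" else 1


def precision(query, titles):
    """Sum of relevance scores divided by the number of results evaluated."""
    return sum(score(classify_hit(query, t)) for t in titles) / len(titles)


# Hypothetical titles, invented for illustration only.
sample_titles = [
    "Plant viruses: an overview",
    "Virus diseases of crop plants",
    "Transplanting rice seedlings",
    "Weather forecast for Tehran",
]
print(precision("Plant viruses", sample_titles))
```

With these invented titles the sketch scores three of the four hits as relevant. The third title counts only because "transplanting" contains "plant", which shows how permissive a title-only rule can be.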

Table 1
The total number of relevant scholarly documents retrieved by each of the five search engines (in agricultural information retrieval)

Subject terms        Google     Yahoo      AltaVista  AOL      Ask      Total amount
Intercropping        230000     542000     541000     56600    48800    1418400
Carnivorous plants   384000     1690000    1660000    62400    53900    3850300
Soil pollution       504000     1050000    1050000    118000   99900    2821900
Plant viruses        624000     793000     796000     64500    55600    2333100
Irrigation farming   49600      84000      116000     6760     5840     262200
Organic farming      2060000    8960000    9010000    410000   561500   21001500
Total amount         3851600    13119000   13173000   718260   825540   31687400

Recall, on the other hand, is the ability of a retrieval system to obtain all or most of the relevant documents in the collection. It therefore requires knowledge not only of the relevant documents that were retrieved but also of those that were not (Clarke & Willett, 1997 [7]). The relative recall value is thus defined as:

$$\text{Relative recall} = \frac{\text{Total number of relevant scholarly documents retrieved by a search engine}}{\text{Sum of scholarly documents retrieved by all five search engines}}$$

To calculate the overlap of the above search engines, each keyword was searched in each search engine. Then, six lists were prepared, and these lists were compared with each other. Overlap is defined as:

$$\text{Overlap} = \frac{\text{Total number of same results in each search engine in comparison with others}}{\text{Number of keywords} \times \text{Number of recalled results} \times \text{Number of other search engines}}$$

3.2. Results

The mean precision and relative recall of the selected search engines for retrieving agricultural information are presented in Table 2. Comparing mean precision, Ask scored the highest (63%), followed by Yahoo (61%) and AltaVista (60%), while AOL received the lowest precision (56%) (Figure 2). Comparing the corresponding mean relative recall values, Ask has the highest recall (22%), followed by Yahoo (21%), and AltaVista and Google (20% each), while AOL received the lowest recall (19%) (Figure 3).
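The relative recall and overlap ratios defined in Section 3.1 can be sketched as below. The sketch rests on assumptions that the paper does not spell out: the overlap denominator is read as keywords × results evaluated per keyword × number of other engines, results are compared by exact URL match, and the engine names, counts and URLs are invented for illustration.

```python
def relative_recall(relevant_by_engine):
    """Each engine's relevant hits as a fraction of all engines' relevant hits."""
    total = sum(relevant_by_engine.values())
    return {engine: n / total for engine, n in relevant_by_engine.items()}


def overlap(engine, results, top_n=10):
    """Share of an engine's top results that also appear in the other engines.

    `results` maps engine -> keyword -> ranked list of URLs (hypothetical layout).
    """
    others = [e for e in results if e != engine]
    keywords = list(results[engine])
    shared = 0
    for keyword in keywords:
        own = set(results[engine][keyword][:top_n])
        for other in others:
            shared += len(own & set(results[other][keyword][:top_n]))
    # Assumed denominator: keywords x results per keyword x other engines.
    return shared / (len(keywords) * top_n * len(others))


# Hypothetical data: three engines, one keyword, two results each.
relevant_counts = {"A": 6, "B": 5, "C": 9}
results = {
    "A": {"soil pollution": ["u1", "u2"]},
    "B": {"soil pollution": ["u1", "u3"]},
    "C": {"soil pollution": ["u4", "u2"]},
}

print(relative_recall(relevant_counts))
print(overlap("A", results, top_n=2))
print(overlap("B", results, top_n=2))
```

In the study's setting the denominator under this reading would be 6 keywords × 10 results × 4 other engines. An exact URL match is a strict criterion; normalising URLs before comparison would be a reasonable variation.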

Table 2
Mean precision and relative recall of search engines during 2010

Search engines   AltaVista   Yahoo   Google   Ask   AOL
Precision        60%         61%     58%      63%   56%
Recall           20%         21%     20%      22%   19%

Figure 1. Percentage of the total number of relevant scholarly documents retrieved by each of the five search engines (in agricultural information retrieval)

Figure 2. Precision of five search engines in agricultural information retrieval
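Figure 1 is captioned as the percentage of the total number of documents retrieved by each engine, but its values are not given in the text. Assuming the figure is derived from the column totals of Table 1 (an assumption, not something the paper states), the shares can be recomputed with a short sketch; the function and variable names are hypothetical.

```python
# Hit counts per subject term, copied from Table 1.
TABLE_1 = {
    "Intercropping": {"Google": 230000, "Yahoo": 542000, "AltaVista": 541000, "AOL": 56600, "Ask": 48800},
    "Carnivorous plants": {"Google": 384000, "Yahoo": 1690000, "AltaVista": 1660000, "AOL": 62400, "Ask": 53900},
    "Soil pollution": {"Google": 504000, "Yahoo": 1050000, "AltaVista": 1050000, "AOL": 118000, "Ask": 99900},
    "Plant viruses": {"Google": 624000, "Yahoo": 793000, "AltaVista": 796000, "AOL": 64500, "Ask": 55600},
    "Irrigation farming": {"Google": 49600, "Yahoo": 84000, "AltaVista": 116000, "AOL": 6760, "Ask": 5840},
    "Organic farming": {"Google": 2060000, "Yahoo": 8960000, "AltaVista": 9010000, "AOL": 410000, "Ask": 561500},
}


def engine_totals(table):
    """Sum the hit counts of every subject term for each engine."""
    totals = {}
    for counts in table.values():
        for engine, hits in counts.items():
            totals[engine] = totals.get(engine, 0) + hits
    return totals


def percentage_shares(totals):
    """Express each engine's total as a percentage of the grand total."""
    grand_total = sum(totals.values())
    return {engine: 100 * hits / grand_total for engine, hits in totals.items()}


totals = engine_totals(TABLE_1)
shares = percentage_shares(totals)
for engine, share in shares.items():
    print(f"{engine}: {totals[engine]} hits, {share:.1f}% of all hits")
```

The column totals computed this way match the "Total amount" row of Table 1. Under the stated assumption, Yahoo and AltaVista each account for roughly 41% of all reported hits, Google for about 12%, and AOL and Ask for under 3% each.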

Figure 3. Recall of five search engines in agricultural information retrieval

Table 3
The percentage of overlap of five search engines in agricultural information retrieval

Ask   AOL   AltaVista   Yahoo   Google
22%   40%   43%         44%     38%

The percentage of overlap of the five search engines in agricultural information retrieval is shown in Table 3.

4. Conclusion

In this study AOL ranks as the top search engine, with the highest percentage of relevant returns (63% precision and 22% recall) and good overall performance for the currency of its information sources. Also, Yahoo had 44% overlap with other search engines, so Yahoo scored the highest rank. This research has also produced significant findings for all web users and web researchers, especially agriculturists. The study has determined that, even with the best search engines, only about half of the retrieved results are relevant. A major result of our study is that the first-page results returned by the five major search engines included in this study differ from one another. Search engines seldom agree on first-page results for any query, which means that there is little agreement among search engines on what the best results for a given query are. Moreover, major search engines are suitable tools for finding agricultural information. The huge number of sources retrieved from the web must be examined and carefully evaluated, because users cannot predict the quality and timeliness of search results. However, searching the web does enable users to discover highly current information, agricultural conferences and products, current statistics, news, services and full-text articles.

References

[1] Aguillo, I., A new generation of tools for search, recovery and quality evaluation of World Wide Web medical resources, Management in Medicine, Vol. 14(4), 2000, pp. 240-248.
[2] Bar-Ilan, J., Comparing rankings of search results on the web, Information Processing & Management, Vol. 41, 2005, pp. 1511-9.
[3] Bar-Ilan, J., On the overlap, the precision and estimated recall of search engines: A case study of the query Erdos, Scientometrics, Vol. 42(2), 1998, pp. 207-208.
[4] Bar-Yossef, Z. B., Gurevich, M. G., Random sampling from a search engine's index, Proceedings of the 2006 World Wide Web Conference, 22-26 May 2006, Edinburgh, Scotland, 2006.
[5] Cheney, M., Perry, M., A comparison of the size of Yahoo! and Google indices, 2005, available at: http://vburton.ncsa.uiuc.edu/indexsize.html
[6] Chu, H., Rosenthal, M., Search engines for the World Wide Web: a comparative study and evaluation methodology, Proceedings of the ASIS 1996 Annual Conference, October, Vol. 33, 1996, pp. 127-35. Retrieved August 19, 2003 from http://www.asis.org/annual-96/electronicproceedings/chu.html
[7] Clarke, S., Willett, P., Estimating the recall performance of search engines, ASLIB Proceedings, Vol. 49(7), 1997, pp. 184-189.
[8] Sullivan, D., Nielsen NetRatings: search engine ratings. In: Search Engine Watch.
[9] Ding, W., Marchionini, G., A comparative study of the Web search service performance, Proceedings of the ASIS 1996 Annual Conference, October, Vol. 33, 1996, pp. 136-142.
[10] Egghe, L., Rousseau, R., Classical retrieval and overlap measures satisfy the requirements for rankings based on a Lorenz curve, Information Processing and Management, Vol. 42(10), 2006, pp. 106-20.
[11] Isfandyari Moghaddam, A., Parirokh, M., A comparative study on overlapping of search results in metasearch engines and their common underlying search, Library Review, Vol. 55(5), 2006, pp. 301-306.

[12] Jansen, B. J., Spink, A., Saracevic, T., Real life, real users, and real needs: a study and analysis of user queries on the web, Information Processing & Management, Vol. 36(2), 2000, pp. 207-27.
[13] Leighton, H., Performance of four WWW index services, Lycos, Infoseek, Webcrawler and WWW Worm, 1996. Retrieved June 10, 2005 from http://www.winona.edu/library/webind.htm.
[14] Mohammadesmaeil, S., Lafzighazi, E., Gilvari, A., Comparing Search Engines and Meta Search Engines in Pharmaceutic Information Retrieval, Health Information Management, Vol. 5(2), 2008.
[15] Mowshowitz, A., Kawaguchi, A., Measuring search engine bias, Information Processing and Management, Vol. 41, 2005, pp. 1193-205.
[16] Spink, A., Jansen, B. J. (Eds), Web Search: Public Searching of the Web, Springer, Berlin, 2004.
[17] Spink, A., Jansen, B. J., Wolfram, D., Saracevic, T., From e-sex to e-commerce: Web search changes, IEEE Computer, Vol. 35(3), 2002, pp. 133-5.
[18] Spink, A., et al., Overlap among major web search engines, Internet Research, Vol. 16(9), 2006, pp. 419.
[19] Thelwall, M., Link Analysis: An Information Science Perspective, Elsevier Academic Press, 2004.