Bibliometrics and Transaction Log Analysis: Bibliometrics; Citation Analysis; Transaction Log Analysis



Similar documents
Getting Started with Tableau Server 6.1

An Introduction to Using CINAHL

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data

Scopus. Quick Reference Guide

Functional Requirements for Digital Asset Management Project version /30/2006

Navigating An Introductory Guide for Librarians

MOOCviz 2.0: A Collaborative MOOC Analytics Visualization Platform

Content Manager User Guide Information Technology Web Services

EBSCOhost User Guide Searching. Basic, Advanced & Visual Searching, Result List, Article Details, Additional Features. support.ebsco.

Arti Tyagi Sunita Choudhary

1.00 ATHENS LOG IN BROWSE JOURNALS 4

Urban Andersson Jonas Gilbert and Karin Henning Gothenburg University Library Gothenburg, Sweden

Search and Information Retrieval

Content Management Software Drupal : Open Source Software to create library website

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data

Library Guide to EndNote Online

Upload Your Culminating Project to The Repository at St. Cloud State University

QUICK REFERENCE GUIDE

Photo Library. Help Guide

Personal Cloud. Support Guide for Mac Computers. Storing and sharing your content 2

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

HOME PAGE. Quick Start Guide. Here s how to navigate the Films On Demand home page you first see when you log in.

Everything you ever wanted to know about. Physiotherapy. C a n a d a ONLINE

Digital Reference Service: Libraries Online 24/7

California Lutheran University Information Literacy Curriculum Map Graduate Psychology Department

Enhanced Library Database Interface at NTU Library

ANALYSING SERVER LOG FILE USING WEB LOG EXPERT IN WEB DATA MINING

123RF Corporate+ User s Guide

SOCIAL MEDIA MEASUREMENT: IT'S NOT IMPOSSIBLE

FileMaker 12. ODBC and JDBC Guide

Swinburne University of Technology

Bibliometric Big Data and its Uses. Dr. Gali Halevi Elsevier, NY

Big answers from big data: Thomson Reuters research analytics

Personal Cloud. Support Guide for Mobile Apple Devices

HP Asset Hub. Fundamentals Training - Event Syndication Migration - August 2015

Figure A Partial list of EBSCOhost databases

Chapter 11 Managing Core Database Downloads

Reference Software Workshop Tutorial - The Basics

Use of Online Public Access Catalogue at Annamalai University Library

eyeos Web System User Manual

Installation and Deployment

General Product Questions Q. What is the Bell Personal Vault Vault?...4. Q. What is Bell Personal Vault Backup Manager?...4

An Effective Analysis of Weblog Files to improve Website Performance

ERMes: an Access-Based ERM

Inmagic Content Server Standard and Enterprise Configurations Technical Guidelines

Quick Reference Guide

Library and information science research trends in India

Towards better understanding Cybersecurity: or are "Cyberspace" and "Cyber Space" the same?


Using Internet or Windows Explorer to Upload Your Site

NETWORKS AND THE INTERNET

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

No web design or programming expertise is needed to give your museum a world-class web presence.

Chapter 7. Using Hadoop Cluster and MapReduce

CINAHL (via Ovid): Introduction to Searching

Image Galleries: How to Post and Display Images in Digital Commons

Internet Search Techniques

Welcome to Collage (Draft v0.1)

WHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT

HW9 WordPress & Google Analytics

Fahad H.Alshammari, Rami Alnaqeib, M.A.Zaidan, Ali K.Hmood, B.B.Zaidan, A.A.Zaidan

OneLogin Integration User Guide

FileMaker 11. ODBC and JDBC Guide

How to set up the HotSpot module with SmartConnect. Panda GateDefender 5.0

Husky SciVal Experts FAQs

Introduction. Chapter Introduction. 1.2 Background

Novell ZENworks Asset Management 7.5

Educating the Health Librarians in Africa today: Competencies, Skills and Attitudes required in a Changing Health Environment

Website Usage Monitoring and Evaluation

Configuring Your Gateman Proxy Server

We re going to show you how to make a Share site. It takes just a few minutes to set one up. Here s how it s done.

Personal Cloud. Support Guide for Windows Mobile Devices

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

OvidSP Quick Reference Guide

CITATION METRICS WORKSHOP ANALYSIS & INTERPRETATION WEB OF SCIENCE Prepared by Bibliometric Team, NUS Libraries. April 2014.

Scientific Knowledge and Reference Management with Zotero Concentrate on research and not re-searching

Eclipse.Net Hosted Librarian Guide

Integrated Library Systems (ILS) Glossary

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Transaction Log Analyses of Electronic Book (E-book) Usage

Request for Proposal (RFP) Toolkit

How To Use The Alabama Data Portal

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM

How to FTP (How to upload files on a web-server)

Important Notice. All company and brand products and service names are trademarks or registered trademarks of their respective holders.

HydroDesktop Overview

Create Beautiful Reports with AWR Cloud and Prove the Value of Your SEO Efforts

SYSPRO App Store: Registration Guide

End User Guide. July 22, 2015

Taylor & Francis Online Mobile FAQs

EBOX Digital Content Management System (CMS) User Guide For Site Owners & Administrators

Transcription:

Definitions of bibliometrics: the quantitative study of literatures as reflected in bibliographies; the use of quantitative analysis and statistics to describe patterns of publication within a given field or body of literature.

Generally speaking, bibliometrics helps explore questions about bodies of literature and the authors who produce them: How scholarly is the cited literature? How current is it? How research-oriented is it? How interdisciplinary is it? Who writes that literature? Where does it appear? How do other authors use it?

More specifically, bibliometrics enables investigation of basic research questions: providing a macro perspective on scientific communication; determining the influence of a single author; describing the relationship between two or more authors or works; demonstrating the emergence of new subject fields; describing the growth of literature on a subject; quantifying the productivity of individual authors; measuring the dispersion of articles on a subject across journals; and characterizing the obsolescence of literature.

Findings can be applied to a range of practical problems: collection development; thesaurus development; development of indexes, abstracts, taxonomies, and metadata; collection pruning; and journal and database acquisition and purchasing.

Two distinct bibliometric approaches have developed in parallel: (1) analysis of distribution properties, resulting in statistical laws or mathematical models; and (2) a range of methods that enable specific descriptions of the content, structure, and development of research fields.

Bibliometric laws: Lotka's Law of scientific productivity. It describes the frequency of publication by authors in a given field and demonstrates that only a small percentage of authors in a field are highly productive.
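Lotka's law is classically stated as an inverse-square relationship: the number of authors with x publications is roughly 1/x² of the number with a single publication. The sketch below assumes that classic exponent of 2 (fitted exponents vary by field). It tabulates an observed author-productivity distribution from a hypothetical list of papers and computes the Lotka prediction for comparison. The function names and input format are illustrative assumptions.

```python
from collections import Counter


def author_productivity(papers):
    """Map 'number of papers' -> 'number of authors with that many papers'.

    `papers` is a list of author lists, one per paper (hypothetical input format).
    """
    per_author = Counter(a for authors in papers for a in set(authors))
    return Counter(per_author.values())


def lotka_expected(single_paper_authors, max_papers, exponent=2.0):
    """Expected number of authors with x papers under Lotka's law: a_1 / x**exponent."""
    return {x: single_paper_authors / x ** exponent for x in range(1, max_papers + 1)}


papers = [["Smith", "Lee"], ["Smith"], ["Smith", "Chen"], ["Lee"], ["Park"]]
print(sorted(author_productivity(papers).items()))  # [(1, 2), (2, 1), (3, 1)]
# With 100 single-paper authors, Lotka predicts 25 with two papers and 6.25 with four.
print(lotka_expected(100, 4))
```

In a real study, the observed counts would be compared with the expected ones (for example with a goodness-of-fit test) to decide whether the field follows the law.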

Bibliometric laws: Bradford's Law of core and scatter in journals. It demonstrates that a small portion of the journals in a field contain a substantial portion of the field's relevant articles. Journals in a single field can be divided into three zones, each containing the same number of articles:
1. A core of journals, few in number, that produces one-third of all the articles.
2. A second zone, containing the same number of articles as the first, but spread across a greater number of journals.
3. A third zone, containing the same number of articles as the second, but spread across a still greater number of journals.
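The three-zone division can be sketched in a few lines, assuming per-journal article counts for a topic are already available. The journal names and counts below are hypothetical. Journals are ranked by productivity, and each journal is assigned to the zone in which its first article falls, so each zone ends up holding roughly one-third of the articles.

```python
def bradford_zones(article_counts, n_zones=3):
    """Split journals into zones holding roughly equal numbers of articles.

    `article_counts` maps journal name -> number of relevant articles.
    """
    ranked = sorted(article_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ranked)
    zones = [[] for _ in range(n_zones)]
    cumulative = 0
    for journal, count in ranked:
        # The zone is set by how many articles precede this journal in the ranking.
        zone = min(n_zones - 1, cumulative * n_zones // total)
        zones[zone].append(journal)
        cumulative += count
    return zones


counts = {"J1": 30, "J2": 10, "J3": 10, "J4": 10}
counts.update({f"J{i}": 3 for i in range(5, 15)})  # ten journals with 3 articles each
print([len(z) for z in bradford_zones(counts)])  # [1, 3, 10]
```

Here each zone holds 30 of the 90 articles, while the number of journals per zone grows roughly as 1 : n : n² (about 1 : 3 : 9), which is the pattern Bradford described.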

Bibliometric laws: Zipf's Law of word frequency. It predicts the frequency of words within a text and is based on ranking words in order of decreasing frequency of occurrence: a word's frequency is roughly inversely proportional to its rank.
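One simple way to check Zipf's law on a text is to rank words by frequency and inspect the product rank × frequency, which the law predicts to stay roughly constant. The sketch below assumes a deliberately naive tokenizer (lowercased runs of letters and apostrophes). Real studies need more careful tokenization and far longer texts than this toy input.

```python
import re
from collections import Counter


def zipf_table(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())  # naive tokenizer (assumption)
    ranked = Counter(words).most_common()
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


for row in zipf_table("a a a a b b c"):
    print(row)
# (1, 'a', 4, 4)
# (2, 'b', 2, 4)
# (3, 'c', 1, 3)
```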

Citation analysis: a tool to identify core sets of articles, authors, or journals in particular fields of study, and to describe relationships and trends within and between these entities. When one author cites another, a relationship is established between: authors; journals and publishers; disciplines, fields, and subject areas; keywords; and institutions, countries, and languages. Citations both from and to a given work can be the unit of analysis.

Citation analysis has three distinct approaches: co-citation analysis, bibliographic coupling, and co-word analysis.

Co-citation analysis: a method used to establish subject similarity between two documents, based on the number of times the two documents are jointly cited in other documents. If papers A and B are both cited by paper C, they can be said to be related to one another, even though they don't directly cite each other. The more papers cite both A and B, the stronger their relationship. Co-citation can be used to map the topical relatedness of clusters of authors, journals, or articles, and it can also take authors or journals as units of analysis.
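Co-citation counts can be computed directly from reference lists. This sketch assumes a hypothetical input format, a mapping from each citing paper to the papers it cites. For every pair of cited papers, it counts how many citing papers cite both.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count how often each pair of documents is cited together.

    `references` maps citing paper -> iterable of cited papers.
    """
    counts = Counter()
    for cited in references.values():
        # sorted(set(...)) gives each unordered pair exactly one key
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


refs = {"C1": ["A", "B", "D"], "C2": ["A", "B"], "C3": ["B", "D"]}
print(cocitation_counts(refs))
# Counter({('A', 'B'): 2, ('B', 'D'): 2, ('A', 'D'): 1})
```

The same function works with authors or journals as units of analysis if the cited items are first mapped to their author or journal names.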

Co-citation analysis: Influential Authors in LIS 2000-2002, A First Author Co-citation Map, http://www.umu.se/inforsk/Imageindexing/imageindex.htm

Co-citation analysis: AuthorLink Co-citation Map, http://faculty.cis.drexel.edu/~xlin/authorlink.html

Bibliographic coupling: assumes that two documents that both cite the same document have something in common. It links two papers that cite the same articles, so that if papers A and B both cite paper C, they may be said to be related, even though they do not directly cite each other. The more papers they both cite, the stronger their relationship.
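Bibliographic coupling strength is simply the number of references two citing papers share. This sketch uses the same hypothetical input format as a citation index export might provide: a mapping from each citing paper to its cited papers.

```python
from itertools import combinations


def coupling_strength(references):
    """Number of shared references for each pair of citing papers.

    `references` maps citing paper -> iterable of cited papers.
    Pairs with no shared references are omitted.
    """
    strength = {}
    for p, q in combinations(sorted(references), 2):
        shared = set(references[p]) & set(references[q])
        if shared:
            strength[(p, q)] = len(shared)
    return strength


refs = {"C1": ["A", "B", "D"], "C2": ["A", "B"], "C3": ["B", "D"]}
print(coupling_strength(refs))
# {('C1', 'C2'): 2, ('C1', 'C3'): 2, ('C2', 'C3'): 1}
```

Unlike co-citation counts, which can keep growing as new papers cite both documents, coupling strength is fixed once the two citing papers are published, because their reference lists do not change.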

Co-word analysis: based on analysis of the co-occurrence of keywords used to index documents. It is useful for mapping the content of research in a field, for creating indexes or thesauri for a given subject domain, and for supplementing search terms in information retrieval systems.
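A sketch of co-word analysis over index terms, assuming each document's keywords are available as a list. The documents and terms below are hypothetical. Besides raw co-occurrence counts, it computes the equivalence index e = c_ij² / (c_i · c_j), one commonly used normalization in co-word studies. Other association measures could be substituted.

```python
from collections import Counter
from itertools import combinations


def coword_analysis(index_terms):
    """Return (pair co-occurrence counts, equivalence index per pair).

    `index_terms` maps document id -> list of keywords assigned to it.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for terms in index_terms.values():
        unique = sorted({t.lower() for t in terms})  # normalize case, drop repeats
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    equivalence = {
        (a, b): c * c / (term_freq[a] * term_freq[b])
        for (a, b), c in pair_freq.items()
    }
    return pair_freq, equivalence


docs = {
    "d1": ["Back pain", "Posture"],
    "d2": ["back pain", "lifting", "posture"],
    "d3": ["lifting", "back pain"],
}
pairs, equivalence = coword_analysis(docs)
print(pairs[("back pain", "posture")])                  # 2
print(round(equivalence[("back pain", "posture")], 3))  # 0.667
```

Pairs with a high equivalence index are candidates for related-term links in a thesaurus or for suggested search terms.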

Co-word analysis: ConceptLink, http://faculty.cis.drexel.edu/~xlin/conceptlink.html
[Figure: a ConceptLink concept map for the search "back pain". Four clusters are clearly seen; the circles marking them are not generated by the system and were added for illustration only.]

Example of the practical value of citation analysis: collection development. Collection planning means determining information needs and making decisions about priorities; collection implementation means organizing the collection and creating useful indexing aids for finding resources. These tasks require knowledge about the structure of a subject field, about the information resources used, and about the important themes and terminology upon which the collection can be organized and indexed. Co-citation analysis, bibliographic coupling, and co-word analysis can each be useful for mapping the structure and use of the relevant literature, and for determining terms for indexing, thesauri, and searching and browsing interfaces.

Measuring growth and obsolescence: citation data can be used to measure the half-life of articles, journals, and fields.
Median citation age: based on the publication years of citing publications and the publication years of the works they cite.
Price index: the proportion of the citations in a publication that are at most five years old at the time of publishing. The index value is a measure of the growth of publications in the subject field: if a field grows by 10% a year, its literature doubles in about 7 years, and 39% of the literature was published during the past five years. The humanities have a low Price index, so obsolescence is slow; emerging sciences have a high Price index, so obsolescence is relatively quick. The index can be calculated annually to demonstrate changes and trends.
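The measures above can be computed from (citing year, cited year) pairs, which is a hypothetical input format assumed here. Following the definition above, the sketch counts a citation toward the Price index when it is at most five years old; some authors count only the last five years (age 0 to 4). The growth arithmetic assumes the literature grows geometrically at a constant annual rate. Under that simple model, 10% growth gives a doubling time of about 7.3 years and about 38% of the literature from the last five years, close to the rounded figures above.

```python
import math
from statistics import median


def citation_ages(pairs):
    """Ages of citations; `pairs` holds (citing_year, cited_year) tuples."""
    return [citing - cited for citing, cited in pairs]


def median_citation_age(pairs):
    return median(citation_ages(pairs))


def price_index(pairs, window=5):
    """Share of citations at most `window` years old when the citing work appeared."""
    ages = citation_ages(pairs)
    return sum(1 for age in ages if age <= window) / len(ages)


def doubling_time(growth_rate):
    """Years for the literature to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth_rate)


def share_from_last_years(growth_rate, years=5):
    """Fraction of all literature published in the last `years` years,
    assuming output has grown geometrically at `growth_rate` per year."""
    return 1 - (1 + growth_rate) ** -years


refs_2004 = [(2004, y) for y in (2003, 2001, 1999, 1995, 1980)]
print(median_citation_age(refs_2004))         # 5
print(price_index(refs_2004))                 # 0.6
print(round(doubling_time(0.10), 1))          # 7.3
print(round(share_from_last_years(0.10), 2))  # 0.38
```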

Impact factor: a measure of the frequency with which the average article in a journal has been cited in a particular year or period. Using 2001 as an example:
A = total citations in 2001
B = citations in 2001 to articles published in journal X in 1999-2000 (a subset of A)
C = number of articles published in journal X in 1999-2000
D = B/C = journal X's 2001 impact factor
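The B/C calculation can be sketched as below, assuming hypothetical citation records for journal X in the form (citing year, cited article's publication year) and a count of articles X published per year. In the denominator, Journal Citation Reports counts only "citable items" such as articles and reviews, a detail this sketch does not model.

```python
def impact_factor(citations, articles_per_year, year):
    """Two-year impact factor of journal X for `year`, i.e. D = B / C.

    `citations`: (citing_year, cited_publication_year) pairs for citations to X.
    `articles_per_year`: publication year -> number of articles X published.
    """
    window = (year - 2, year - 1)
    # B: citations made in `year` to X's articles from the two preceding years
    b = sum(1 for citing, cited in citations if citing == year and cited in window)
    # C: articles X published in those two years
    c = sum(articles_per_year.get(y, 0) for y in window)
    return b / c


cites = [(2001, 1999)] * 120 + [(2001, 2000)] * 80 + [(2001, 1995)] * 50 + [(2000, 1999)] * 30
print(impact_factor(cites, {1999: 60, 2000: 40}, 2001))  # 2.0
```

Citations in 2001 to older articles (1995) and citations made in other years (2000) fall outside B, which is why B is only a subset of A.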

Impact factor: provides an approximation of the prestige of the journals in which individuals have published. It gives library administrators information about journals in the existing collection and journals being considered for acquisition. It can be useful, but there are many caveats about its use (the need to eliminate self-citations, variations between fields, journal coverage in ISI indexes, etc.).

Strengths of bibliometrics as a research approach: the methods are objective and repeatable; results have a wide range of potential practical value; it does not require interaction with human subjects; and it has high reliability, in that data are collected unobtrusively from the published record and can easily be replicated by others.

Limitations of bibliometrics as a research approach: results are valid only to the extent that citations are assumed to represent a significant link between citing and cited documents, which is a questionable assumption. Citations are made for many reasons other than topic similarity or quality, and citations that should be made often are not. There are also technical issues with data obtained from citation indexes and bibliographies: variations and misspellings of author names, different authors with the same name, and incomplete coverage of non-English publications.

Bibliometric methods are not widely used by librarians for practical problems. In recent years, however, new subject fields and interdisciplinary publications have emerged rapidly, and the number of available documents has grown explosively. Bibliometrics provides tools that can help librarians deal with the challenges these pose: collection development, subject indexing, metadata and thesaurus creation, etc.

Bibliometrics-related resources: ISI Web of Knowledge (Simmons Libraries -> GSLIS -> Online databases pulldown menu; Userid: simm23; Password: educate). Try: ISI Web of Science, for citations to a given article or author; ISI Journal Citation Reports, Social Sciences, subject category "information & library science", sorted by impact factor.

Transaction Log Analysis: the number of digital documents, and of users of those documents, is growing rapidly. Findings from the How Much Information? project (http://www.sims.berkeley.edu/research/projects/how-much-info-2003/): new stored information grew about 30% a year between 1999 and 2002; almost 800 MB of recorded information is produced per person each year; the World Wide Web contains about 170 terabytes of information on its surface, about seventeen times the size of the Library of Congress print collections; and the deep Web is estimated to be 400 to 450 times larger.

Transaction Log Analysis: the basic concepts of bibliometrics can also be applied to patterns of usage beyond citations. Transaction log analysis, or webmetrics, analyzes usage patterns in a digital environment and allows a range of other types of observations. Citations do not necessarily reflect usage, whereas transaction logs generally do reflect real usage. Sources include Web server logs, ILL and circulation records, and browsing data.

Transaction Log Analysis: Web log data. One or more log files on the Web server can record: the IP address of the requesting computer; the date and time of the request; the page (filename) requested; the referrer page (the URL of the page that brought the user); the Web browser and operating system of the requesting computer; and the search terms used in a search engine. Customized logs that gather more specific data for a given system can also be created relatively easily.
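Many Web servers can write the Apache "combined" log format, which covers most of the fields listed above. Search terms are not logged directly but can often be recovered from the referrer URL. The sketch below parses one such line with a regular expression. It assumes the combined format and assumes the referring search engine put the query in a `q` parameter; both are assumptions about a particular setup, and the sample line is made up.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlparse

# Apache "combined" format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one combined-format log line into a dict, or return None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # not in the expected format
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # Assumption: the referring search engine passes the query as ?q=...
    query = parse_qs(urlparse(rec["referrer"]).query)
    rec["search_terms"] = query.get("q", [""])[0]
    return rec


sample = ('192.0.2.10 - - [10/Oct/2003:13:55:36 -0400] '
          '"GET /video/details.php?id=42 HTTP/1.1" 200 5120 '
          '"http://www.google.com/search?q=nasa+video" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')
record = parse_line(sample)
print(record["ip"], record["page"], record["search_terms"])
# 192.0.2.10 /video/details.php?id=42 nasa video
```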

Transaction Log Analysis: types of possible analysis. Session level: the complete sequence of requests or queries by a given user. This characterizes the actions of, and the information sought by, the user: What is the user trying to accomplish? What types of things do users in aggregate try to do?
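Session-level analysis first has to group individual requests into sessions. A common heuristic, assumed here rather than taken from the slides, treats requests from the same IP address and browser string as one visitor and starts a new session after 30 minutes of inactivity. IP addresses are only an imperfect proxy for users, so any session counts produced this way are estimates. The field names and sample requests are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group requests into sessions.

    `requests` are dicts with at least 'ip', 'agent', and 'time' (datetime).
    Returns a list of sessions, each a time-ordered list of requests.
    """
    sessions = []
    latest = {}  # (ip, agent) -> that visitor's current session
    for req in sorted(requests, key=lambda r: r["time"]):
        visitor = (req["ip"], req["agent"])
        current = latest.get(visitor)
        if current is None or req["time"] - current[-1]["time"] > timeout:
            current = []
            sessions.append(current)
            latest[visitor] = current
        current.append(req)
    return sessions


def at(hh_mm):
    return datetime.strptime(f"2003-10-10 {hh_mm}", "%Y-%m-%d %H:%M")


reqs = [
    {"ip": "192.0.2.10", "agent": "agent-1", "time": at("10:00"), "page": "/main"},
    {"ip": "192.0.2.99", "agent": "agent-2", "time": at("10:05"), "page": "/main"},
    {"ip": "192.0.2.10", "agent": "agent-1", "time": at("10:10"), "page": "/search"},
    {"ip": "192.0.2.10", "agent": "agent-1", "time": at("11:00"), "page": "/main"},
]
sessions = sessionize(reqs)
print([[r["page"] for r in s] for s in sessions])
# [['/main', '/search'], ['/main'], ['/main']]
print("mean requests per session:", len(reqs) / len(sessions))
```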

Transaction Log Analysis: types of possible analysis. Page/object level: access to specific pages or objects in the system. Which pages are most popular? Which files, images, and videos are most frequently viewed or downloaded? Which page or resource requests result in errors? Query level: how users navigate or attempt to find information or resources. Which query terms are used? What combinations of terms are used? How long or short are queries?
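Page-level and query-level questions mostly reduce to counting. The sketch below assumes log records have already been parsed into dicts with hypothetical `page` and `status` fields, and that queries are available as strings. It also assumes Boolean operators are typed in upper case, which real search logs may not follow.

```python
from collections import Counter

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}  # assumption: operators typed in upper case


def page_popularity(records, top=3):
    """Most requested pages; `records` are parsed log entries with a 'page' key."""
    return Counter(r["page"] for r in records).most_common(top)


def error_pages(records):
    """Requests that failed (HTTP status 400 or above), counted per page."""
    return Counter(r["page"] for r in records if r["status"] >= 400)


def query_stats(queries):
    """Mean terms per query and share of queries using Boolean operators."""
    tokenized = [q.split() for q in queries if q.strip()]  # skip empty queries
    return {
        "mean_terms": sum(len(t) for t in tokenized) / len(tokenized),
        "boolean_share": sum(1 for t in tokenized if BOOLEAN_OPERATORS & set(t)) / len(tokenized),
    }


records = [
    {"page": "/details", "status": 200},
    {"page": "/details", "status": 200},
    {"page": "/search", "status": 200},
    {"page": "/old-page", "status": 404},
]
print(page_popularity(records))  # [('/details', 2), ('/search', 1), ('/old-page', 1)]
print(error_pages(records))      # Counter({'/old-page': 1})
print(query_stats(["back pain", "physical therapy AND lifting", "posture"]))
```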

Transaction Log Analysis Example 1: analyzing user queries from Excite search engine logs. Jansen, Bernard J., & Amanda Spink. (2000). Methodological approach in discovering user search patterns through web log analysis: using the Excite search engine. Bulletin of the American Society for Information Science, 27, no. 1: 15-17. http://www.asis.org/bulletin/oct-00/janses spink.html
Logs of 1 million queries each from 1997 and 1999 showed: mean queries per user session of 4.8 in 1997 and 2.0 in 1999; mean terms per query of 2.4 in 1997 and 2.35 in 1999; users most often view at most 10 results; and only about 8% of users use Boolean queries.

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. The open-video.org Web site was redesigned in September 2003. How are users using the redesigned site? Which pages are most popular? Which options on the search results page do they use? Log data can provide evidence on which to base design and information architecture decisions within a Web site or digital library.

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. User activity in the 4 months after the redesign: a total of 69,589 unique visitors and 140,135 downloads.
Page views:
Video Details: 348,974
Search Results: 276,745
Main: 150,622
Popular Video: 61,429
Special Collection Details: 12,227
New Video: 4,133
Project Information: 4,004
Detailed Search: 3,097
Special Collections: 3,013
Related Video: 2,842
Project News: 2,427
Random Video: 1,835
Contributing Video Info: 1,503
Help on Playing Video: 1,465
Project Publications: 521
Browser Compatibility: 390
Project Contacts: 334

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. In the 4 months after the redesign, finding video by popularity (Popular Video, 61,429 page views) was much more common than finding it by lists of new video (4,133) or random video (1,835).
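The "much more common" comparison can be quantified with the page-view counts reported for the Open Video site in the table above. This is plain arithmetic on those reported numbers; the function name is illustrative.

```python
# Page views in the 4 months after the redesign (from the Open Video page-view table)
page_views = {
    "Popular Video": 61_429,
    "New Video": 4_133,
    "Random Video": 1_835,
}


def popularity_ratios(views, reference="Popular Video"):
    """How many times more often `reference` was viewed than each other page."""
    return {page: views[reference] / n for page, n in views.items() if page != reference}


for page, ratio in popularity_ratios(page_views).items():
    print(f"Popular Video vs {page}: {ratio:.1f}x")
# Popular Video vs New Video: 14.9x
# Popular Video vs Random Video: 33.5x
```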

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. Which options do users use to sift search results? The visual layout of results, the ordering criteria, and the size of the visible set.

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. Sifting options: user choice of visual layout of results, by number of selections (* = default choice).
Large thumbnails*: 221,540
Text: 13,223
Small thumbnails: 16,029
Thumbnails only: 12,730

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. Sifting options: user choice of ordering criteria for results, by number of selections (* = default choice).
Relevance*: 258,386
Title: 3,700
Year: 6,735
Duration: 1,320
Popularity: 6,604

Transaction Log Analysis Example 2: Analyzing user activity on the Open Video site. Sifting options: user choice of size of the visible set of results, by number of selections (* = default choice).
10*: 252,207
20: 4,923
30: 3,600
50: 5,585
100: 7,350
All: 10,430
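The three sifting tables share one pattern: the default option dominates. The sketch below computes the default's share of selections for each option type, using only the counts reported in those tables; the dictionary layout is an illustrative choice.

```python
# Selections per option, copied from the three Open Video sifting tables
sifting = {
    "layout": {"Large thumbnails": 221_540, "Text": 13_223,
               "Small thumbnails": 16_029, "Thumbnails only": 12_730},
    "ordering": {"Relevance": 258_386, "Title": 3_700, "Year": 6_735,
                 "Duration": 1_320, "Popularity": 6_604},
    "visible set": {"10": 252_207, "20": 4_923, "30": 3_600,
                    "50": 5_585, "100": 7_350, "All": 10_430},
}
defaults = {"layout": "Large thumbnails", "ordering": "Relevance", "visible set": "10"}


def default_share(counts, default):
    """Fraction of all selections that used the default option."""
    return counts[default] / sum(counts.values())


for option, counts in sifting.items():
    print(f"{option}: default chosen in {default_share(counts, defaults[option]):.0%} of selections")
# layout: default chosen in 84% of selections
# ordering: default chosen in 93% of selections
# visible set: default chosen in 89% of selections
```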

Transaction Log Analysis: limitations of transaction log analysis. The assumption that an IP address represents a unique user is often not true: with dynamic IP addresses, the same user can have different IP addresses, and with shared computers, different users can have the same IP address. Web pages can be cached, both by the client machine and by the Internet Service Provider (ISP). Logs do not reveal the user's motivation for a page or query selection. There are also privacy concerns: user registration can obviate the variable-IP-address issues, but it has its own issues.