Measuring impact revisited




Measuring impact revisited
Frank Scholze, Stuttgart University Library
5th Workshop on Innovations in Scholarly Communication, CERN, Geneva, 18.4.2007

Outline
1. Scope / definitions
2. Issues
3. Ongoing work / projects

Impact
"Impact is any change resulting from an activity, project, or organization. It includes intended as well as unintended effects, negative as well as positive, and long-term as well as short-term."
From: Susan Wainwright. Measuring Impact - A Guide to Resources. NCVO, 2002

Measuring I: What
Impact vs. output
  Publications are the quantifiable output of the research process
  Electronic publications as a limited but rather well defined sub-field of research output
  Which publications represent e-science?

Measuring II: How
Choose indicators for publications (citation, usage, endorsement, post-publication peer review)
Collect information
Analyse / digest the information

Indicators and methods
Web Analytics = Usage
  Logfile analysis
  Server side scripts (e.g. Linkresolver)
  Page tagging, web bugs, cookies
  Hybrid methods
Citation Analysis
  Examination of the frequency and pattern of citations in publications
  Impact factor (IF)
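As a worked illustration of the impact factor named above: the classic two-year IF for year Y divides the citations received in Y by items published in Y-1 and Y-2 by the number of citable items published in those two years. The sketch below uses made-up numbers for a hypothetical journal; the function name is an assumption for illustration only.

```python
def impact_factor(citations_in_year, citable_items_prev_two_years):
    """Two-year impact factor for year Y.

    citations_in_year: citations received in Y to items published in Y-1 and Y-2.
    citable_items_prev_two_years: citable items published in Y-1 and Y-2.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_in_year / citable_items_prev_two_years


# Hypothetical journal: 150 citations in 2006 to its 2004-2005 articles,
# of which there were 60 citable items.
print(impact_factor(150, 60))  # 2.5
```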

A taxonomy of metrics
[Figure: metrics arranged along two axes. Axis labels: authors / users; frequentist counting / structural pattern.]
Scholarship evaluation now: IF, citation counts
Novel methods of resource evaluation: Flickr.org, Slashdot.org, Google's PR, Technorati, Amazon.com collaborative filtering
LANL experiments for scholarship evaluation since 1999
From: Bollen, Johan and Van de Sompel, Herbert (2005) A framework for assessing the impact of units of scholarly communication based on OAI-PMH harvesting of usage information. Delivered at OAI4, Geneva

Measuring impact: the elements (schematic)
[Figure: layered schematic. Side labels: collect, analyse, choose.]
Services (ranking, recommender, collection building and management...)
Metrics (structural, quantitative)
Data mining
Compare, aggregate
Sources: Link-Resolvers, Logfile analysis, Citation analysis
Processing steps shown per source: collect, parse, extract, aggregate
Basic set of documents (journals, repositories, primary data etc.)

Outline
1. Scope / definitions
2. Issues
3. Ongoing work / projects

What are the basic items we want to count?
Granularity (journals, articles, etc.)
Beyond publications (primary data, learning objects, etc.)
Identifying items (= referent deduplication)
  Metadata-based heuristics, persistent identifiers
Identifying users (= agent deduplication)
  User and session identification, pseudonymization
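One common way to reconcile agent deduplication with pseudonymization is a keyed hash: the same user always maps to the same token, so repeated requests can still be grouped, but the raw identifier is not stored. The sketch below is an assumption about how this could be done, not a method described in the talk; the choice of IP address plus user agent as the identifier and the secret key are both hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a user identifier.

    A keyed HMAC (rather than a plain hash) is used so that the mapping
    cannot be recomputed without the secret key.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened token, enough for grouping


# Hypothetical identifier built from IP address and user agent.
key = b"replace-with-a-locally-held-secret"
token = pseudonymize("192.0.2.10|Mozilla/5.0", key)
print(token == pseudonymize("192.0.2.10|Mozilla/5.0", key))  # True
```

Rotating the key at fixed intervals would limit how long a single user can be tracked, at the cost of losing deduplication across the rotation boundary.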

Indicators and methods
Log data processing
  Identifying / filtering robots and crawlers
  Proxies, caching, click spans
  Grouping, isolating and aggregating useful usage patterns
  Comparison with and validation against citation data
Aggregation and scalability
  Technical issues
    Different architectural frameworks: linking server-based, other; scalability; pseudonymization
  Social/Policy issues
    Networks of trust, sharing usage data
    How is data collated with external sources (publisher data)?
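A minimal sketch of the first log-processing step, filtering robots and crawlers, assuming the web server writes the Apache "combined" log format. The user-agent substrings in ROBOT_HINTS are a deliberately tiny, hypothetical list; production filters rely on maintained robot lists and behavioural checks as well.

```python
import re
from datetime import datetime

# Apache "combined" log format: host, identity, user, time, request,
# status, size, referer, user agent.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumption: a crude substring heuristic, for illustration only.
ROBOT_HINTS = ("bot", "crawl", "spider", "slurp")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["time"] = datetime.strptime(event["time"], "%d/%b/%Y:%H:%M:%S %z")
    event["status"] = int(event["status"])
    return event


def is_robot(agent):
    """True if the user agent string contains one of the robot hints."""
    agent = agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)


def human_downloads(lines):
    """Keep successful GET requests that do not look like robot traffic."""
    events = []
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue
        if (event["method"] == "GET" and event["status"] == 200
                and not is_robot(event["agent"])):
            events.append(event)
    return events


SAMPLE = [
    '192.0.2.10 - - [18/Apr/2007:10:15:32 +0200] "GET /docs/42.pdf HTTP/1.1" '
    '200 51234 "-" "Mozilla/5.0 (X11; Linux i686)"',
    '192.0.2.77 - - [18/Apr/2007:10:16:01 +0200] "GET /docs/42.pdf HTTP/1.1" '
    '200 51234 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
print(len(human_downloads(SAMPLE)))  # 1
```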

Confidence in data and stats?
Data validity
  Usage definition, recording and representation, quality benchmarks, falsification issues
Fraud and data manipulation
  Standardized and transparent collection and aggregation of usage data
  Auditing standards and procedures

Legal issues, metrics and services
Privacy and other legal issues
  User and session identification, pseudonymization
  Legal implications of log storage and aggregation
Metrics and services
  Investigating and tailoring metrics
  Interfaces with existing bibliometric products
  Definition of end-user services

Background Issues
What is the context for talking about impact? Why is promoting enhanced and alternative metrics important?
  as a licensing issue: consortia and publishers want to investigate value for money issues
  as a research policy issue: scholarly communication changes rapidly
    impact in new OA environment
    trend analysis (in-progress communication)
    assessment and evaluation of research

Outline
1. Scope / definitions
2. Issues
3. Ongoing work / projects

Knowledge Exchange Workshop on Institutional Repositories
Practical definition of usage
  Raw format for exchanging usage events
Items to be counted
  Lobby COUNTER to add article level stats
  Other academic output types
Standard reports
  Agree on a small set of standard useful statistical reports that repositories should produce
Policies for stats
  Compliance with local laws on e.g. privacy
  Enhance SHERPA policy tool
Collection and aggregation
  Normalisation
  Specify issues of aggregation and deduplication for later study
Collation with external sources
  Aggregating COUNTER stats at consortium level
  Investigate SUSHI interoperability with repositories and OAI-PMH + OpenURL ContextObjects

Ongoing work
LANL
  bx (with CalState, ExLibris)
  MESUR
UK
  University of Southampton (IRS, EPStats)
  University College London
Germany (DINI / DFG)
  Göttingen State and University Library
  Stuttgart University Library
  Computer and Media Service, Humboldt University Berlin
  Saarbrücken State and University Library

Germany
Workshop on Enhanced and Alternative Metrics of Publication Impact, 20-21 February 2006, Humboldt University Berlin
Cluster of proposals to the DFG
  Network of certified open access repositories 2y
  Usage statistics demonstrator
  Distributed open access reference citation service demonstrator

Germany II
German collecting society for copyright charges (VG Wort) has started a project on statistics (METIS)
IFABC (International Federation of Audit Bureaux of Circulation) standards for robot detection and click spans (30 min)
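The 30-minute click span can be read as: repeated requests for the same document by the same user within 30 minutes count only once. The sketch below is one possible interpretation, measuring the span from the last counted request; the actual IFABC rules may define the window differently, and the event tuple layout is an assumption.

```python
from datetime import datetime, timedelta

CLICK_SPAN = timedelta(minutes=30)


def count_with_click_span(events, span=CLICK_SPAN):
    """Count requests per document, collapsing repeats within the click span.

    events: iterable of (user_id, document_id, timestamp) tuples.
    A request is counted only if the same user has no counted request
    for the same document within the preceding span.
    """
    last_counted = {}  # (user, document) -> timestamp of last counted request
    counts = {}        # document -> number of counted requests
    for user, doc, ts in sorted(events, key=lambda e: e[2]):
        previous = last_counted.get((user, doc))
        if previous is None or ts - previous >= span:
            counts[doc] = counts.get(doc, 0) + 1
            last_counted[(user, doc)] = ts
    return counts


day = datetime(2007, 4, 18)
events = [
    ("u1", "d1", day.replace(hour=10, minute=0)),
    ("u1", "d1", day.replace(hour=10, minute=10)),  # within 30 min, dropped
    ("u1", "d1", day.replace(hour=10, minute=45)),  # 45 min later, counted
    ("u2", "d1", day.replace(hour=10, minute=5)),   # different user, counted
    ("u1", "d2", day.replace(hour=10, minute=20)),  # different document
]
print(count_with_click_span(events))  # {'d1': 3, 'd2': 1}
```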

Distributing VG Wort IDs
[Figure: flow diagram. Nodes: PDF, Repository, VG Wort, DNB.]
Repository ingest: the document comes with or gets a new VG Wort counter ID, e.g. 490271fca856ba65cc
VG Wort registers the ID if not known
Open access: the ID is made available through OAI-PMH

Collecting data
[Figure: flow diagram. Nodes: User, Repository (PDF, log), VG Wort (IFABC metrics, log), Log harvester.]
User requests the PDF address, e.g. http://vg00.met.vgwort.de/na/490271fca856ba65cc?l=pdfaddress
Rewrite module: mod_rewrite and PHP

Measuring impact: the technical elements for usage data
[Figure: architecture diagram, bottom to top.]
Sources: Link resolvers; web server logs (via rewrite module)
Log repositories, exposing OpenURL ContextObjects
Log harvester (service provider)
  Normalise: OpenURL ContextObjects or SUSHI
  Normalise (optional): robots, pseudonymization
Aggregated usage data / aggregated logs (log DB)
Data mining, filtering
Metrics
Services, e.g. COUNTER
Based on: Bollen, Johan and Van de Sompel, Herbert, OAI4, Geneva
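The aggregation step in this architecture can be sketched as merging per-repository usage counts that are keyed by a shared persistent identifier. This is only an illustration of the idea: the repository names and identifiers are hypothetical, and a real service provider would harvest and normalise usage events (for example as OpenURL ContextObjects) before counting.

```python
from collections import Counter


def aggregate_usage(per_repository_counts):
    """Sum usage counts across repositories, keyed by persistent identifier.

    per_repository_counts: dict mapping repository name to a dict of
    {identifier: count}. Items held in several repositories are merged
    because they share the same identifier.
    """
    total = Counter()
    for counts in per_repository_counts.values():
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total


# Hypothetical harvested data from two log repositories.
harvested = {
    "repo_a": {"urn:example:1": 12, "urn:example:2": 3},
    "repo_b": {"urn:example:1": 5},
}
totals = aggregate_usage(harvested)
print(totals.most_common(1))  # [('urn:example:1', 17)]
```

Keying on a persistent identifier is what makes the referent deduplication issue matter: without a shared identifier, the same document in two repositories would be counted as two items.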

Conclusion
Infrastructure for collecting and aggregating usage data is conceptually available, has to be deployed and implemented in practice on a large scale
Investigating metrics for different needs and purposes
Technical and social issues (the latter posing the bigger challenge)
Standardization, Modularity, Co-Operation, Openness, Transparency