Collecting and Providing Access to Large Scale Archived Web Data. Helen Hockx-Yu Head of Web Archiving, British Library
|
|
|
- Neil Wilkinson
- 10 years ago
- Views:
Transcription
1 Collecting and Providing Access to Large Scale Archived Web Data Helen Hockx-Yu Head of Web Archiving, British Library
2 Web Archives key characteristics Snapshots of web resources, taken at given point in time a living thing Archived web resources are reborn different from digitised and born digital collections (Brugger, N., 2013) Different from the live web in many ways Additional temporal aspects: many different versions of the same thing? Different things in own right? Have boundaries & limitations, determined by purpose, strategy and technological choices 2
3 Different types of web archives Global: Internet Archive s Wayback Machine National: UK, Denmark, France, Finland and many other countries Sub-set of a national domain: eg UK Government Web Archive, Canadian Government Web Archive Topical collections: eg Human rights Collaborative: Archive It service by Internet Archive Corporate: a single organisation s web estate, eg UK Parliament 3
4 Web Archiving at the British Library Started web archiving in 2003: Open UK Web Archive Selective, topical collections and key sites 20TB, 14,792 websites Archiving UK Web for non-print Legal Deposit since April 2013: Legal Deposit UK Web Archive Comprehensive national archive with on-site access only 2013 UK domain crawl contains 31TB, 1.6 billion URLs from 3.8 million domains Collect UK digital heritage and provide continued access to archived web resources 4
5 UK websites territoriality explained An online work is considered as published in the UK and therefore in scope for Legal Deposit, if it meets either of the following criteria: (a) it is made available to the public from a website with a domain name which relates to the United Kingdom or to a place within the United Kingdom; or (b) it is made available to the public by a person and any of that person s activities relating to the creation or the publication of the work take place within the United Kingdom The Legal Deposit Libraries (Non-Print Works) Regulations,
6 Territoriality - implementation All websites with a.uk domain name non.uk websites have to meet at least one criteria UK Hosting: check external IP geo-location database and add in-scope URLs to the fetch-chain UK postal address Correspondence Professional judgement 6
7 Legal requirements, policies and decisions Content not in scope Sound recordings and films (unless incidental to the deposited work) Intranet, personal s Follow Robots Exclusion Protocol by default with some exceptions Mutual agreement All seed pages and embedded content (i.e., CSS, JavaScript and images) Host Cap A 512MB cap per host based on the known site size distribution Tail of large sites based on curators assessment Some excepted and crawled without data cap Collect embedded content regardless where it is hosted 7
8 Collecting strategy for websites Domain Crawl Events Key sites News S p e c i a l c o l l e c t i o n Domain crawl: Broad sweep of UK domain Once or twice a year S p e c i a l c o l l e c t i o n Events & key sites and news: Events of UK interest High value, high impact sites National & regional news S p e c i a l c o l l e c t i o n Special Collection: Focused, thematic collections Support priority subjects S p e c i a l c o l l e c t i o n 8
9 Curating websites Concept of a website relates to the way people use and present information Collection is how libraries organise knowledge Description at website and collection levels including subject terms Presenting collections and websites to end-users 9
10 Website (landing page) 10
11 Collections 11
12 Subjects 12
13 Full-text index and facets The UK Legal Deposit UK Web Archive offers full-text search. Results can be refined by Content type (html, pdf ) Crawl year (individual) Domain (bbc.co.uk; newsforscotland.com ) Domain suffix (.co.uk;.org.uk;.ac.uk;.com) Author (based on metadata extracted from web pages) Still has collections selected and curated by curators, usually crawled within a fixed period of time, relating to an event Subjects information added to selected sites only included in the index 13
14 Resource Not in Archive Common error message but appears for different reasons Intended boundary No permission for linked content Not allow by robots.txt Edge of a archive Data limitation Technical limitations, e.g. dynamic content the crawler could not collect Avoid dead end in navigation Search Link to live web Find archived copies elsewhere 14
15 Memento service Allow users to finding archived web pages (mementos) in multiple web archives across the world (search based on aggregated metadata) Exposes the memento protocol, which adds time dimension to HTTP - accessing the past web as it is to access the current web uses the Memento aggregate TimeGate hosted by lanl.gov Source code Find memento bookmarklet, finding archived versions of 404 webpages while browsing 15
16 Who has archived
17 Access and use of web archives Problematic at two levels: legal restriction and (single) envisaged use case Archive webpages as historical documents used for reference URL search is the standard, universal access method Search or browse functions Number of archives offering the function URL search 26 Keyword search 15 Full-text search 11 Thematic collections 11 Subject browsing 9 Alphabetical browsing 14 Access methods of 29 IIPC members web archives 17
18 Issues of document-centric approach Rise of digital scholarship, taking advantage of the possibilities offered by technology Primacy of text as object of study no longer exists Paratexts play a crucial role in textual coherence of a website: header & footer drop-down menu, site map, breadcrumb, etc. Distant reading (Franco Moretti) focus on units that are much smaller or much larger than the text: devices, themes, tropes or genres and systems 18
19 There is so much more Statistical overview, scale and distribution of a web national domain Size: bytes Space: geo location, postcodes Type of content, eg file format, language Structure, linked entities and networks Evolution, pattern of change over time, eg domain names Correlation, eg between certain term and historical event Also viral content, crawl logs with eg http response codes 19
20 JISC UK web domain dataset ( ) Collaboration between the Internet Archive (IA), the Joint Information Systems Committee (JISC) and the British Library Extracted copies of UK websites from the Internet Archives collection 1 st tranche : , 30TB, 2.5 billion URLs 2 nd tranche: 2010 April 2013, 27.5TB, 1.5 billion URLs (estimated) Research agreement between JISC and IA, upholding IA s Terms of Use Access via IA s Wayback Machine Allows replication / extraction of derivative or secondary datasets, some available at Test-bed to develop new capability and services 20
21 UK Web 2013: data volume per host 21
22 UK Web 2013: number of URLs per host 22
23 UK Web 2013: the largest host 23
24 Postcode-based access 24
25 HTML version analysis 25
26 Image Format Analysis 26
27 Exploring Host Link Graph Courtesy of Peter Webster, Rainer Simon and Jules Mataly 27
28 How was the UK web linked in 1996? By Rainer Simon using UK Host-Level Link Graph ( ) dataset. Based on the 1996 portion: 58,842 hosts (nodes); 184,433 host-to-host links (edges) UK web as part of the global web Scalability issues with large dataset over time 28
29 Visualising links (to and from bl.uk) Interactive version How it is done 29
30 Visualising links (to and from bl.uk) Interactive version How it is done 30
31 The disappearing web over time 31
32 Using N-gram for scholarly research Courtesy of Dr Peter Webster, Institute of Historical Research, University of London
33 Viral content, null results in the index 157,846 viral records detected in 2013 UK domain crawl Not included in the index Redirects Content no longer there (404) Content there, but restricted Server error 33
34 Crawl logs 34
35 Next steps General issues related to analytical access - Scepticism or suspicion about hidden algorithms behind analysis - Biases in data and how data collection decisions lead to variances in outputs - Need to manage expectations, analysis and visualisation as finished products and first steps - Ethical and privacy issues Maximise transparency and make base-line knowledge is self-explanatory e.g. scope of the archive, its coverage and lacunae, how it was collected, and how a particular website was crawled Further work through involvement in projects Big UK Domain Data for Arts and Humanities "ALEXANDRIA: Foundations for Temporal Retrieval, Exploration and Analytics in Web Archives RESAW, a Research Infrastructure for the Study of Archived Web Materials 35
36 Big UK Domain Data for Arts and Humanities Funded by the UK Arts and Humanities Research Council as one of the 21 Big Data projects Collaboration between the Institution of Historical Research, Oxford Internet Institute, British Library and Aarhus University Develop theoretical and methodological framework for the study of web archives Build on ADDAA: researchers and the BL co-produce access tools A major study of the history of UK web space from 1996 to sub-projects covering a range of disciplines Also an online training course and peer-reviewed journal articles. 36
37 Shine Query building Corpus formation and handling Annotation and curation In-corpus analysis Whole-dataset analysis 37
38 Brügger, N.; Finnemann, N.E., The Web and Digital Humanities: Theoretical and Methodological Concerns ISO technical report: Statistics and Quality Indicators for Web Archiving E _DRAFT.pdf UK Web Archive open data
Web Archiving and Scholarly Use of Web Archives
Web Archiving and Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 April 2013 Overview 1. Introduction 2. Access and usage: UK Web Archive 3. Scholarly feedback on
Scholarly Use of Web Archives
Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 February 2013 Web Archiving initiatives worldwide http://en.wikipedia.org/wiki/file:map_of_web_archiving_initiatives_worldwide.png
Web Archiving Tools: An Overview
Web Archiving Tools: An Overview JISC, the DPC and the UK Web Archiving Consortium Workshop Missing links: the enduring web Helen Hockx-Yu Web Archiving Programme Manager July 2009 Shape of the Web: HTML
Archiving Social Media in the Context of Non-print Legal Deposit
Submitted on: 30/07/2013 Archiving Social Media in the Context of Non-print Legal Deposit Helen Hockx-Yu Head of Web Archiving, British Library, London, United Kingdom. E-mail address: [email protected]
Analysis of Web Archives. Vinay Goel Senior Data Engineer
Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner
How To Manage Pandora
PANDORA - past, present, and future National web archiving in Australia Dr Paul Koerbin Manager Web Archiving National Library of Australia National Conference on eresources in Malaysia Penang, Malaysia,
Building a master s degree on digital archiving and web archiving. Sara Aubry (IT department, BnF) Clément Oury (Legal Deposit department, BnF)
Building a master s degree on digital archiving and web archiving Sara Aubry (IT department, BnF) Clément Oury (Legal Deposit department, BnF) Objectives of the presentation > Present and discuss an experiment
Tools for Web Archiving: The Java/Open Source Tools to Crawl, Access & Search the Web. NLA Gordon Mohr March 28, 2012
Tools for Web Archiving: The Java/Open Source Tools to Crawl, Access & Search the Web NLA Gordon Mohr March 28, 2012 Overview The tools: Heritrix crawler Wayback browse access Lucene/Hadoop utilities:
The challenges of digital preservation to support research in the digital age
DRAFT FOR DISCUSSION WITH ADVISORY COUNCIL MEMBERS ONLY The challenges of digital preservation to support research in the digital age Lynne Brindley CEO, The British Library November 2005 Agenda UK developments
Working together to archive the UK Web. Helen Hockx-Yu Head of Web Archiving, British Library
Working together to arhive the UK Web Heen Hokx-Yu Head of Web Arhiving, British Library The UK Web Domain 4 th TLD after.om,.de and.net Over 10 miion.uk registered domain UK organisations aso use non.uk
Indexing Repositories: Pitfalls & Best Practices. Anurag Acharya
Indexing Repositories: Pitfalls & Best Practices Anurag Acharya Web search & Scholar Web search indexes all documents Scholar indexes scholarly articles Web search needs document text Scholar also needs
THE BRITISH LIBRARY BOARD BLB 12/29
IN CONFIDENCE THE BRITISH LIBRARY BOARD BLB 12/29 BRITISH LIBRARY DIGITAL STRATEGY TO 2015 1. PURPOSE OF THE PAPER To respond to members request for an overarching overview of how digital activities in
Chapter-1 : Introduction 1 CHAPTER - 1. Introduction
Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet
Web Harvesting and Archiving
Web Harvesting and Archiving João Miranda Instituto Superior Técnico Technical University of Lisbon [email protected] Keywords: web harvesting, digital archiving, internet preservation Abstract. This
Xtreeme Search Engine Studio Help. 2007 Xtreeme
Xtreeme Search Engine Studio Help 2007 Xtreeme I Search Engine Studio Help Table of Contents Part I Introduction 2 Part II Requirements 4 Part III Features 7 Part IV Quick Start Tutorials 9 1 Steps to
www.coveo.com Unifying Search for the Desktop, the Enterprise and the Web
wwwcoveocom Unifying Search for the Desktop, the Enterprise and the Web wwwcoveocom Why you need Coveo Enterprise Search Quickly find documents scattered across your enterprise network Coveo is actually
Web Archiving at BnF September 2006
Hosting the IIPC Steering Committee gives us the opportunity to give an update on BnF s organisation and projects. In 2006, we have been mostly focusing on building our own organisation and internal dissemination
Elgg 1.8 Social Networking
Elgg 1.8 Social Networking Create, customize, and deploy your very networking site with Elgg own social Cash Costello PACKT PUBLISHING open source* community experience distilled - BIRMINGHAM MUMBAI Preface
THE BRITISH LIBRARY. Unlocking The Value. The British Library s Collection Metadata Strategy 2015-2018. Page 1 of 8
THE BRITISH LIBRARY Unlocking The Value The British Library s Collection Metadata Strategy 2015-2018 Page 1 of 8 Summary Our vision is that by 2020 the Library s collection metadata assets will be comprehensive,
Product Navigator User Guide
Product Navigator User Guide Table of Contents Contents About the Product Navigator... 1 Browser support and settings... 2 Searching in detail... 3 Simple Search... 3 Extended Search... 4 Browse By Theme...
How To Use The Web Curator Tool On A Pc Or Macbook Or Ipad (For Macbook)
User Manual Version 1.4.1... April 2009 Contents Contents... 2 Introduction... 4 About the Web Curator Tool... 4 About this document... 4 Where to find more information... 4 System Overview... 5 Background...
Portals and Hosted Files
12 Portals and Hosted Files This chapter introduces Progress Rollbase Portals, portal pages, portal visitors setup and management, portal access control and login/authentication and recommended guidelines
ACCESSING WEB ARCHIVES
ACCESSING WEB ARCHIVES There are two collections of archived websites available via Explore the British Library: The Legal Deposit UK Web Archive, archived through legal deposit regulations introduced
User Guide to the Content Analysis Tool
User Guide to the Content Analysis Tool User Guide To The Content Analysis Tool 1 Contents Introduction... 3 Setting Up a New Job... 3 The Dashboard... 7 Job Queue... 8 Completed Jobs List... 8 Job Details
Evaluating the impact of research online with Google Analytics
Public Engagement with Research Online Evaluating the impact of research online with Google Analytics The web provides extensive opportunities for raising awareness and discussion of research findings
Archiving the Web: the mass preservation challenge
Archiving the Web: the mass preservation challenge Catherine Lupovici Chargée de Mission auprès du Directeur des Services et des Réseaux Bibliothèque nationale de France 1-, Koninklijke Bibliotheek, Den
Portal Version 1 - User Manual
Portal Version 1 - User Manual V1.0 March 2016 Portal Version 1 User Manual V1.0 07. March 2016 Table of Contents 1 Introduction... 4 1.1 Purpose of the Document... 4 1.2 Reference Documents... 4 1.3 Terminology...
Archival Data Format Requirements
Archival Data Format Requirements July 2004 The Royal Library, Copenhagen, Denmark The State and University Library, Århus, Denmark Main author: Steen S. Christensen The Royal Library Postbox 2149 1016
Archive-IT Services Andrea Mills Booksgroup Collections Specialist
Getting Started with Archive-IT Services Andrea Mills Booksgroup Collections Specialist Internet Archive Micro History Text Archive Update Archive-IT Services 1996 The Internet Archive is created, with
Website Audit Reports
Website Audit Reports Here are our Website Audit Reports Packages designed to help your business succeed further. Hover over the question marks to get a quick description. You may also download this as
Design and selection criteria for a national web archive
Design and selection criteria for a national web archive Daniel Gomes Sérgio Freitas Mário J. Silva University of Lisbon Daniel Gomes http://xldb.fc.ul.pt/daniel/ 1 The digital era has begun The web is
SHared Access Research Ecosystem (SHARE)
SHared Access Research Ecosystem (SHARE) June 7, 2013 DRAFT Association of American Universities (AAU) Association of Public and Land-grant Universities (APLU) Association of Research Libraries (ARL) This
Shared service components infrastructure for enriching electronic publications with online reading and full-text search
Shared service components infrastructure for enriching electronic publications with online reading and full-text search Nikos HOUSSOS, Panagiotis STATHOPOULOS, Ioanna-Ourania STATHOPOULOU, Andreas KALAITZIS,
Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION
Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,
Research Data Management Guide
Research Data Management Guide Research Data Management at Imperial WHAT IS RESEARCH DATA MANAGEMENT (RDM)? Research data management is the planning, organisation and preservation of the evidence that
THE OPEN UNIVERSITY OF TANZANIA
THE OPEN UNIVERSITY OF TANZANIA Institute of Educational and Management Technologies COURSE OUTLINES FOR DIPLOMA IN COMPUTER SCIENCE 2 nd YEAR (NTA LEVEL 6) SEMESTER I 06101: Advanced Website Design Gather
Glossary of Library Terms
Glossary of Library Terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Term A Abstract Advanced search B Basic search Bibliography Boolean operators C Call number Catalogue Definition An abstract
Load testing with. WAPT Cloud. Quick Start Guide
Load testing with WAPT Cloud Quick Start Guide This document describes step by step how to create a simple typical test for a web application, execute it and interpret the results. 2007-2015 SoftLogica
Data access and management
B Data access and management CONTENTS B.1 Introduction... B-1 B.2 Data requirements and availability... B-1 B.3 Data access... B-2 B.4 Overall procedures... B-2 B.5 Data tools and management... B-4 Appendix
30 Website Audit Report. 6 Website Audit Report. 18 Website Audit Report. 12 Website Audit Report. Package Name 3
TalkRite Communications, LLC Keene, NH (603) 499-4600 Winchendon, MA (978) 213-4200 [email protected] Website Audit Report TRC Website Audit Report Packages are designed to help your business succeed further.
Practical Options for Archiving Social Media
Practical Options for Archiving Social Media Content Summary for ALGIM Web-Symposium Presentation 03/05/11 Euan Cochrane, Senior Advisor, Digital Continuity, Archives New Zealand, The Department of Internal
SEO FOR VIDEO: FIVE WAYS TO MAKE YOUR VIDEOS EASIER TO FIND
SEO FOR VIDEO: FIVE WAYS TO MAKE YOUR VIDEOS EASIER TO FIND The advent of blended search results, known as universal search in Google, has produced listings that now contain various types of media beyond
TIBCO Spotfire Metrics Modeler User s Guide. Software Release 6.0 November 2013
TIBCO Spotfire Metrics Modeler User s Guide Software Release 6.0 November 2013 Important Information SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE
OPENGREY: HOW IT WORKS AND HOW IT IS USED
OPENGREY: HOW IT WORKS AND HOW IT IS USED CHRISTIANE STOCK [email protected] INIST-CNRS, France Abstract OpenGrey is a unique repository providing open access to European grey literature references,
Wiley. Automated Data Collection with R. Text Mining. A Practical Guide to Web Scraping and
Automated Data Collection with R A Practical Guide to Web Scraping and Text Mining Simon Munzert Department of Politics and Public Administration, Germany Christian Rubba University ofkonstanz, Department
Website Standards Association. Business Website Search Engine Optimization
Website Standards Association Business Website Search Engine Optimization Copyright 2008 Website Standards Association Page 1 1. FOREWORD...3 2. PURPOSE AND SCOPE...4 2.1. PURPOSE...4 2.2. SCOPE...4 2.3.
Service Road Map for ANDS Core Infrastructure and Applications Programs
Service Road Map for ANDS Core and Applications Programs Version 1.0 public exposure draft 31-March 2010 Document Target Audience This is a high level reference guide designed to communicate to ANDS external
VMware vcenter Log Insight User's Guide
VMware vcenter Log Insight User's Guide vcenter Log Insight 1.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition.
Step into the Future: HTML5 and its Impact on SSL VPNs
Step into the Future: HTML5 and its Impact on SSL VPNs Aidan Gogarty HOB, Inc. Session ID: SPO - 302 Session Classification: General Interest What this is all about. All about HTML5 3 useful components
EVILSEED: A Guided Approach to Finding Malicious Web Pages
+ EVILSEED: A Guided Approach to Finding Malicious Web Pages Presented by: Alaa Hassan Supervised by: Dr. Tom Chothia + Outline Introduction Introducing EVILSEED. EVILSEED Architecture. Effectiveness of
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 RESEARCH ON SEO STRATEGY FOR DEVELOPMENT OF CORPORATE WEBSITE S Shiva Saini Kurukshetra University, Kurukshetra, INDIA
deskspace responsive web builder: Instructions
deskspace responsive web builder: Instructions Site Manager Editor Viewer Settings Features The Free & Personal Licenses cover these features. The Pro Licenses add these additional features. Pro screen
This document is for informational purposes only. PowerMapper Software makes no warranties, express or implied in this document.
SortSite 5 User Manual SortSite 5 User Manual... 1 Overview... 2 Introduction to SortSite... 2 How SortSite Works... 2 Checkpoints... 3 Errors... 3 Spell Checker... 3 Accessibility... 3 Browser Compatibility...
LJMU Research Data Policy: information and guidance
LJMU Research Data Policy: information and guidance Prof. Director of Research April 2013 Aims This document outlines the University policy and provides advice on the treatment, storage and sharing of
SEARCH ENGINE OPTIMIZATION
SEARCH ENGINE OPTIMIZATION WEBSITE ANALYSIS REPORT FOR miaatravel.com Version 1.0 M AY 2 4, 2 0 1 3 Amendments History R E V I S I O N H I S T O R Y The following table contains the history of all amendments
data.bris: collecting and organising repository metadata, an institutional case study
Describe, disseminate, discover: metadata for effective data citation. DataCite workshop, no.2.. data.bris: collecting and organising repository metadata, an institutional case study David Boyd data.bris
EDINBURGH UNIVERSITY PRESS LIBRARIAN ADMINISTRATION USER GUIDE http://www.euppublishing.com
EDINBURGH UNIVERSITY PRESS LIBRARIAN ADMINISTRATION USER GUIDE http://www.euppublishing.com Journal Subscription Activation... 1 1. Register as an Individual User... 1 2. Subscription Confirmation Email...
SEO Training SYLLABUS by SEOOFINDIA.COM
1 Foundation Course SEO Training SYLLABUS by SEOOFINDIA.COM Search Engine Optimization Training Course Internet and Search Engine Basics Internet Marketing Importance of Internet Marketing Types of Internet
Digital Marketing Training Institute
Our USP Live Training Expert Faculty Personalized Training Post Training Support Trusted Institute 5+ Years Experience Flexible Batches Certified Trainers Digital Marketing Training Institute Mumbai Branch:
Image Galleries: How to Post and Display Images in Digital Commons
bepress Digital Commons Digital Commons Reference Material and User Guides 12-2014 Image Galleries: How to Post and Display Images in Digital Commons bepress Follow this and additional works at: http://digitalcommons.bepress.com/reference
Investigating Hadoop for Large Spatiotemporal Processing Tasks
Investigating Hadoop for Large Spatiotemporal Processing Tasks David Strohschein [email protected] Stephen Mcdonald [email protected] Benjamin Lewis [email protected] Weihe
A Platform for Large-Scale Machine Learning on Web Design
A Platform for Large-Scale Machine Learning on Web Design Arvind Satyanarayan SAP Stanford Graduate Fellow Dept. of Computer Science Stanford University 353 Serra Mall Stanford, CA 94305 USA [email protected]
ANALYSIS. wikia.com. YOUR NAME & SLOGAN Call Me: +11-223-444-5556
ANALYSIS wikia.com -- YOUR NAME & SLOGAN Content MOBILE DEVICES Mobile optimisation GOOGLE SEARCH RESULT LIST PREVIEW DOMAIN / URL AUDIT NUMBER OF SOCIAL MEDIA SHARES META DATA SEO AUDIT: CONTENT ANALYSIS
Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012
Digital Collections as Big Data Leslie Johnston, Library of Congress Digital Preservation 2012 Data is not just generated by satellites, identified during experiments, or collected during surveys. Datasets
Orientation Course - Lab Manual
Orientation Course - Lab Manual Using the Virtual Managed Workplace site for the lab exercises Your instructor will provide the following information before the first lab exercise begins: Your numerical
Ingeniux 8 CMS Web Management System ICIT Technology Training and Advancement ([email protected])
Ingeniux 8 CMS Web Management System ICIT Technology Training and Advancement ([email protected]) Updated on 10/17/2014 Table of Contents About... 4 Who Can Use It... 4 Log into Ingeniux... 4 Using Ingeniux
Two new DB2 Web Query options expand Microsoft integration As printed in the September 2009 edition of the IBM Systems Magazine
Answering the Call Two new DB2 Web Query options expand Microsoft integration As printed in the September 2009 edition of the IBM Systems Magazine Written by Robert Andrews [email protected] End-user
Decision-making using web analytics. Rachell Underhill, UNC Grad School Anita Crescenzi, UNC Health Sciences Library
Decision-making using web analytics Rachell Underhill, UNC Grad School Anita Crescenzi, UNC Health Sciences Library For today Analytics at The Graduate School and the Health Sciences Library What analytics
Bibliometrics and Transaction Log Analysis. Bibliometrics Citation Analysis Transaction Log Analysis
and Transaction Log Analysis Bibliometrics Citation Analysis Transaction Log Analysis Definitions: Quantitative study of literatures as reflected in bibliographies Use of quantitative analysis and statistics
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
SharePoint Server 2010 Capacity Management: Software Boundaries and Limits
SharePoint Server 2010 Capacity Management: Software Boundaries and s This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references,
Working with the British Library and DataCite A guide for Higher Education Institutions in the UK
Working with the British Library and DataCite A guide for Higher Education Institutions in the UK Contents About this guide This booklet is intended as an introduction to the DataCite service that UK organisations
Glyma Deployment Instructions
Glyma Deployment Instructions Version 0.8 Copyright 2015 Christopher Tomich and Paul Culmsee and Peter Chow Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
OECD.Stat Web Browser User Guide
OECD.Stat Web Browser User Guide May 2013 May 2013 1 p.10 Search by keyword across themes and datasets p.31 View and save combined queries p.11 Customise dimensions: select variables, change table layout;
Response to Invitation to Tender: requirements and feasibility study on preservation of e-prints
Response to Invitation to Tender: requirements and feasibility study on preservation of e-prints A proposal to the JISC from the Arts and Humanities Data Service and the University of Nottingham, Project
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
The Open Source Knowledge Discovery and Document Analysis Platform
Enabling Agile Intelligence through Open Analytics The Open Source Knowledge Discovery and Document Analysis Platform 17/10/2012 1 Agenda Introduction and Agenda Problem Definition Knowledge Discovery
