Archive-IT Services Andrea Mills Booksgroup Collections Specialist




Getting Started with Archive-IT Services Andrea Mills Booksgroup Collections Specialist

Internet Archive Micro History Text Archive Update Archive-IT Services

1996: The Internet Archive is created with the goal of archiving and preserving the World Wide Web (www.archive.org)

2004: Book digitization begins at University of Toronto Libraries
2006: Archive-IT begins targeted web archiving services

OpenLibrary, TVNews, Audio and Video, Computer Games and Software

Updates 10 Years of Digitization

A Decade of Collecting
2.3 million ebooks
1250 contributing institutions
400 sponsors
2450 unique texts collections
More than 150 digitization projects currently underway

Canadian Libraries

Government Publications

Social Media
Twitter: @internetarchive, @IABooksGlobal
Instagram: http://instagram.com/iabookscanada
Flickr: www.flickr.com/photos/internetarchivebookimages

Getting Started with Archive-IT Services https://archive-it.org

Archive-IT.org

Web Archiving: The process of collecting portions of web content, preserving the collections, and then providing access to the archives for use and re-use.

Archive-IT vs. Wayback Machine

Archive-IT Services
Web-based application and fully hosted solution; includes access and storage (2 copies)
Tools for selection, scoping and metadata creation
Scope-IT
Capture content using 10 different frequencies

Types of Content
HTML, text, video, audio, social media, PDF, images, password-protected content, static databases, newspapers
Social Media: Flickr, Twitter, Instagram, Vimeo and Facebook only with Archive-IT

Features
Different levels of access for users
Browse collections by URL, full-text search (basic and advanced), and metadata search
9 post-crawl reports for analysis
Online Help Section, Partner Specialists and Tech Support

How Does It Work?
Heritrix: Web crawler
Umbra: Gives the crawler the flexibility to access sites as a browser does
Wayback Machine: Access tool for rendering and viewing pages - the web as it was
NutchWAX: Search engine; full-text search
Solr: Metadata search
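To make "the web as it was" concrete, here is a minimal sketch of how a replay address is put together. It assumes the URL pattern used by the public Wayback Machine at web.archive.org (a 14-digit capture timestamp followed by the original URL); the function name `replay_url` is hypothetical, and the replay address for an Archive-IT collection may differ.

```python
from datetime import datetime


def replay_url(original_url: str, captured_at: datetime) -> str:
    """Build a public Wayback Machine replay URL for one capture.

    Assumption: the web.archive.org pattern /web/<YYYYMMDDhhmmss>/<url>.
    The Wayback Machine redirects to the capture closest to the timestamp.
    """
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    # Ask for the archive.org home page as it looked at the start of 2006.
    print(replay_url("http://www.archive.org/", datetime(2006, 1, 1)))
```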

Starting to Collect

Big Questions
Do you have a Mission/Mandate to Collect?
What are the Goals and Objectives for the Collection?
Vision for the Collection?

Mandate to Collect... What now? Institutional Collection Web Content

Goals and Objectives
Why is this web archive important?
Short-Term Vision (3 yrs.)
Long-Term Vision (10 yrs.)

Vision for Collection
What will it look like?
How will it be used?
How will it be managed and maintained?

Broad to Specific: As of today, Archive-It has collected 8,961,536,030 URLs for 2,643 public collections!

Broad Collections: Canadian Government Information, collected by University of Toronto, has 605 seeds

Broad Collections: Prairie Provinces Politics & Economics, collected by University of Alberta, has 393 seeds

Specific Collections: University of Southern California collecting 1 seed

Site Closures: Aboriginal Canada Portal, closed February 12, 2013

10 Years on Mars: Collected by University of Michigan to capture public perception of the Mars Rovers on their 10th anniversary, and to preserve and provide access to that information for the future.
1. Official government documents
2. Popular news and science media
3. Fringe (conspiracy theorizing, alien spotting...)

Current Events: Ebola Virus Disease, collected by University of Manitoba, has 13 seeds

Test Account and Practise https://archive-it.org/contact-us

Test Account
Create a collection, capture content and view the results
Start with five (5) URLs
1 crawl
Archive up to 250,000 webpages
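Before submitting seeds for a test crawl, it can help to check the list against the limits above. The following is a minimal sketch only: the limits come from the slide, while the function names and the validation rules (full http/https URLs, duplicates dropped) are assumptions for illustration and not part of the Archive-IT application.

```python
from urllib.parse import urlparse

MAX_SEEDS = 5        # test account: start with five URLs
MAX_PAGES = 250_000  # test account: archive up to 250,000 webpages


def validate_seeds(seeds):
    """Return a de-duplicated seed list, or raise ValueError if it is unusable."""
    cleaned = []
    for seed in seeds:
        seed = seed.strip()
        if not seed:
            continue  # skip blank lines in a pasted list
        parsed = urlparse(seed)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a full http(s) URL: {seed!r}")
        if seed not in cleaned:
            cleaned.append(seed)
    if len(cleaned) > MAX_SEEDS:
        raise ValueError(f"{len(cleaned)} seeds given; the test account allows {MAX_SEEDS}")
    return cleaned


def within_page_budget(estimated_pages: int) -> bool:
    """True if a rough page estimate fits the test-account allowance."""
    return 0 <= estimated_pages <= MAX_PAGES


if __name__ == "__main__":
    print(validate_seeds(["https://archive-it.org/", "https://archive-it.org/blog"]))
    print(within_page_budget(120_000))
```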

Is your seed already in the Wayback Machine? Search both keywords and URLs: https://archive-it.org/explore
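The same question can be asked programmatically for a list of candidate seeds. The sketch below assumes the public Wayback Machine availability endpoint at archive.org, which is a general Internet Archive service and not the Archive-IT explore page named above; the helper names are hypothetical, and the response fields used are `archived_snapshots` and `closest`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(seed: str) -> str:
    """Build the lookup URL for one seed."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': seed})}"


def closest_snapshot(payload: dict):
    """Return the closest snapshot record from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check_seed(seed: str, timeout: float = 10.0):
    """Query the endpoint over the network and return the closest snapshot, if any."""
    with urlopen(availability_url(seed), timeout=timeout) as response:
        return closest_snapshot(json.load(response))


if __name__ == "__main__":
    snapshot = check_seed("archive-it.org")
    print(snapshot["url"] if snapshot else "no capture found")
```

A seed with no existing capture is not necessarily unarchived elsewhere, so this check complements rather than replaces asking colleagues and checking other archives.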

Is the Site Archived Elsewhere?
Ask your colleagues
LISTSERVs
Registry options?

Valuable Experience
Attempt to capture all or part of your proposed collection in your test crawl.
This will help determine scope, frequency, QA needs and subscription level.

Start Collecting
Refer back to Mission, Goals and Vision for the collection
Repeat

Learn More
https://archive-it.org/learn-more
Download our white paper on the web archiving life cycle
Check out our blog: https://archive-it.org/blog