Advanced Archive-It Application Training: Archiving Social Networking and Social Media Sites





Agenda
- Overview of Social Networking/Media sites
- Why archive these sites?
- Typical Challenges
- Best Practices: Twitter, Facebook, YouTube, Flickr
- Looking toward the future
- Questions/Discussion

Why Archive These Sites?
- State Agencies: An increasing number have decided that the content on these sites is a record and needs to be archived. "A tweet is a record."
- University libraries: Used to share information with students and alumni; these sites contain important records about a school's culture, student body, and campus events.
- Non-Government, Non-Profit Organizations: Used to record online presence and impact
- Researchers: Used to preserve valuable social reactions and change on topics of interest

Archive-It and Social Media Overview
- Capturing social media sites is becoming more necessary for Archive-It partners
- Still focused on: Flickr, Facebook, Twitter, and YouTube
- On our radar: Vimeo, LinkedIn, Others?
- Join the Archive-It social media listserv to hear breaking news, including fixes and adjustments within Archive-It

Social Media Crawling Notes
- Content behind log-ins cannot currently be archived
  - Feature in 4.8 Release, April 2013
- Some parts of sites are not archive-friendly (e.g., complicated JavaScript)
- These sites tend to change both their technical structure and policy quickly and often.

Scoping Social Media Sites
- Because of the way many of these sites are structured, scoping crawls correctly is very important when archiving them.
- Each site has its own unique structure
- Not scoping correctly can result in crawling much more than you intend, or in not capturing the content you want to archive.

Scoping - Overall Approaches
- Trial and Error: Try to harvest with a variety of settings and a variety of seeds
- Quality Review: Review archived content thoroughly
- Collaborate: Compare approaches and results with other Archive-It users
- Document detailed instructions, lessons learned, and best practices for other partners

Best Practices
Best practices for various social networking and social media sites are documented on the Archive-It Help Wiki: https://webarchive.jira.com/wiki/display/arih/Archiving+Social+Networking+Sites+with+Archive-It

Best Practices
- Be specific with your seed URLs: list only the page you would like to archive as a seed. Do NOT use the larger site as a seed (for example, do NOT use www.facebook.com or www.twitter.com as seeds; DO use http://twitter.com/internetarchive/).
- Double check your seed: Do you need an ending slash (/)?
- Ignore robots.txt as needed: Some sites block content using robots.txt
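The seed guidance above can be turned into a quick pre-flight check. The following is only a sketch: `check_seed` and its warning messages are hypothetical helpers, not part of Archive-It, and the "no ending slash" heuristic is an assumption about what is worth double-checking.

```python
from urllib.parse import urlsplit


def check_seed(seed: str) -> list[str]:
    """Return warnings for a candidate seed URL (empty list if none)."""
    warnings = []
    # Accept seeds written without a scheme, e.g. "www.facebook.com".
    parts = urlsplit(seed if "://" in seed else "http://" + seed)
    path = parts.path
    if path in ("", "/") and not parts.query:
        # A bare host means the whole site would be the seed.
        warnings.append("seed is a whole site; use a specific page instead")
    elif not parts.query and not path.endswith("/"):
        last_segment = path.rsplit("/", 1)[-1]
        if "." not in last_segment:
            # Looks like a profile or directory page without a trailing slash.
            warnings.append("no ending slash; confirm whether the seed needs one")
    return warnings


if __name__ == "__main__":
    for candidate in ("www.facebook.com",
                      "http://twitter.com/internetarchive",
                      "http://twitter.com/internetarchive/"):
        print(candidate, check_seed(candidate))
```

A check like this only flags seeds for a second look; a test crawl is still the way to confirm the scope.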

Best Practices
- ALWAYS run a test crawl when first setting up these seeds to avoid using more of your document budget than expected.
- You may need to run more than one until you get it right.

Best Practices
After your first crawl:
- Review post-crawl reports (did you crawl too much?)
- Review archived content in Wayback
  - Did you capture all the areas you expected?
  - Are there any display issues?
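One way to approach the "did you crawl too much?" question is to count captured documents per host. The sketch below assumes you already have a plain list of crawled URLs; it does not assume any particular Archive-It report format, and `documents_per_host` and `hosts_over_limit` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlsplit


def documents_per_host(urls):
    """Count how many crawled URLs belong to each host."""
    return Counter(urlsplit(u).hostname for u in urls)


def hosts_over_limit(urls, limit):
    """Return the hosts whose document count exceeds the given limit."""
    return {host: n for host, n in documents_per_host(urls).items() if n > limit}


if __name__ == "__main__":
    # Made-up URLs, for illustration only.
    crawled = [
        "http://twitter.com/internetarchive/",
        "http://twitter.com/internetarchive/media",
        "http://example.org/unrelated",
    ]
    print(documents_per_host(crawled))
    print(hosts_over_limit(crawled, limit=1))
```

Hosts that appear with unexpectedly high counts are candidates for tighter scoping or a document limit.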

Reviewing Scoping Rules
To the web app!

Twitter Sample URLs
- Individual user feeds: https://twitter.com/archiveitorg/
- Searches: https://twitter.com/search?q=web%20archiving&src=typd
- Lists: https://twitter.com/smithsonian/smithsonian/
- A specific tweet: https://twitter.com/archiveitorg/status/294819565320413184

Twitter - Scoping
Expand Scope (using SURTs) to capture dynamically loading content:
- Individual Twitter feed: +http://(com,twitter,)/i/profiles/show/BrowardCollege/
- Multiple Twitter feeds: +http://(com,twitter,)/i/profiles/show/
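The SURT rules above reverse the host name and keep the path. A minimal sketch of that transformation follows; `to_surt_prefix` is a hypothetical helper, it always emits `http://` to match the rules shown on the slide, and it ignores ports, query strings, and other canonicalization a real crawler may apply.

```python
from urllib.parse import urlsplit


def to_surt_prefix(url: str) -> str:
    """Convert a URL to a simplified SURT prefix: reversed host, then path."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host found in {url!r}")
    # "twitter.com" -> "com,twitter," (labels reversed, trailing comma).
    reversed_host = ",".join(reversed(parts.hostname.split("."))) + ","
    return f"http://({reversed_host}){parts.path}"


if __name__ == "__main__":
    feed = "https://twitter.com/i/profiles/show/BrowardCollege/"
    # Prefix with "+" to write it as an Expand Scope rule.
    print("+" + to_surt_prefix(feed))
```

Dropping the last path segment from the input gives the broader prefix used for multiple feeds.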

Links in Tweets
- Can I archive a URL linked to using a URL shortener? Yes! Use an Expand Scope rule for http://t.co/; all URLs posted on Twitter redirect through that domain.
- Note: only the one page that the URL shortener link points to will be archived (plus embedded content)
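To see why a rule for http://t.co/ brings shortened links into scope, it can help to model Expand Scope rules as prefix matches on the SURT form of a URL. This is a simplified illustration, not the crawler's actual logic: `surt` and `in_expanded_scope` are hypothetical names, and the rule string `+http://(co,t,)/` is an assumed SURT spelling of http://t.co/.

```python
from urllib.parse import urlsplit


def surt(url: str) -> str:
    """Simplified SURT form of a URL: reversed host labels, then the path."""
    parts = urlsplit(url)
    host = ",".join(reversed((parts.hostname or "").split("."))) + ","
    return f"http://({host}){parts.path}"


def in_expanded_scope(url: str, rules: list[str]) -> bool:
    """True if the URL's SURT form starts with any '+'-prefixed rule."""
    prefixes = [rule.lstrip("+") for rule in rules]
    return any(surt(url).startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    rules = ["+http://(co,t,)/"]  # assumed SURT form of http://t.co/
    print(in_expanded_scope("http://t.co/abc123", rules))
    print(in_expanded_scope("http://example.org/page", rules))
```

The sketch covers only the matching step; which page the shortened link ultimately resolves to, and how much of it is captured, is up to the crawler.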

Twitter Examples of Archived Pages

Facebook Sample URLs
- Individual User Profiles - Timeline view: http://www.facebook.com/tonyforsenate/
- Pages - Timeline view: http://www.facebook.com/archiveit/
- Events: http://www.facebook.com/events/265897963430841/
- Albums: https://www.facebook.com/media/set/?set=a.13499334573.18616.6193904573&type=3

Facebook - Scoping
- Ignoring robots.txt: www.facebook.com, fbcdn.net, akamaihd.net
- Document limit on www.facebook.com (recommended: 2000 for each seed)
- Note: you cannot limit the crawl to capture content from *just* one Facebook account
- Expand Scope: SURT +http://(net,fbcdn,

Facebook
- Currently we can capture the initial content on a Facebook timeline; however, the dynamically loading content can be difficult to capture because Facebook frequently changes the way that content is served.
- Our engineers are working on keeping up to date with these changes, and we are also investigating alternate methods for capturing Facebook pages

Facebook Examples of Archived Pages

YouTube - Sample URLs
- Channel/User pages: http://www.youtube.com/whitehouse
- Watch pages (individual videos): http://www.youtube.com/watch?v=5lviuw8vj_e
- Uploaded Document RSS Feed: http://gdata.youtube.com/feeds/api/users/whitehouse/uploads/
- Embedded YouTube videos on other sites: http://www.whitehouse.gov/photos-and-video/video/2013/01/29/president-obama-speaks-comprehensive-immigration-reform

YouTube - Scoping
- For all YouTube content, ignore robots.txt for: youtube.com, ytimg.com
- For Watch pages (individual videos): use the One Page Only seed type
- For Channel/User pages: crawl with a document limit or use the RSS/News Feed seed type
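The scoping advice above depends on which kind of YouTube page a seed points to. The sketch below encodes that decision; `youtube_seed_advice` is a hypothetical helper, and telling watch pages from channel/user pages by the `/watch` path is an assumption based on the sample URLs.

```python
from urllib.parse import urlsplit


def youtube_seed_advice(url: str) -> str:
    """Suggest a seed setup for a YouTube URL, following the scoping notes."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return "not a YouTube URL"
    if parts.path == "/watch":
        # Individual video page.
        return "One Page Only seed type"
    # Everything else is treated as a channel/user page here.
    return "document limit or RSS/News Feed seed type"


if __name__ == "__main__":
    for seed in ("http://www.youtube.com/watch?v=5lviuw8vj_e",
                 "http://www.youtube.com/whitehouse"):
        print(seed, "->", youtube_seed_advice(seed))
```

Either way, the robots.txt settings for youtube.com and ytimg.com still apply.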

YouTube
Viewing YouTube videos:
- YouTube videos for Watch pages and most embedded YouTube videos will play back normally in Wayback
- For Channel/User pages or other pages where videos are not playing back within the page, view videos from the video report or the public video page for that seed.

YouTube Examples of Archived Pages

Flickr
What types of pages can be archived?
- Photo streams. Ex: http://www.flickr.com/photos/whitehouse/
- Individual photos. Ex: http://www.flickr.com/photos/whitehouse/8390033709/in/photostream

Flickr Examples of Archived Pages

Other Sites
- Can sites other than those already mentioned be archived? Yes! There are many more sites out there that can be archived.
- Please send us sites you are interested in archiving.
- Other sites currently mentioned by partners are Google+, LinkedIn, Vimeo, and SlideShare.

Moving Forward
- These best practices will change as the sites themselves make changes. Please be sure to check the Help Wiki page for updates
- We continue to focus on working with our partners to improve the capture and display of archived social networking sites
- The Archive-It team is exploring other capture mechanisms besides using a traditional crawler resource (Heritrix)
  - Headless browsers
  - Hybrid architecture
  - API
  - Partnering with third-party software
- Enhance the display and search capabilities

Thank you!
Questions? Discussion?
Please take our quick survey: http://www.surveymonkey.com/s/gz8cwc8