Welcome. Opening Session Internet Archives & Research Potential Building Community: Research Highlights. Discussion and Challenges




Welcome
Opening Session: Internet Archives & Research Potential
Building Community: Research Highlights
  Oxford Internet Institute
  Centre for Internet Studies & NetLab
  L3S & the ALEXANDRIA Project
  WebScience @ University of Southampton
Discussion and Challenges

ArchiveHub and Internet Archive Research

Agenda
1. Large Scale Data
2. Developing New Tools
3. Testing and Building Theory

Opportunity: The Internet Archive contains the largest single record of the history of the World Wide Web from 1995 to the present, a wealth of untapped research data.
Challenge: There is a significant lack of research-ready databases and tools available to the scholarly community.

A sense of scale
The Library of Congress contains approximately 3 PB of data [a].
The Internet Archive contains in excess of 10 PB of archived cultural material.
The Wayback Machine contains more than 410 billion available web pages (as of 2014).
[Figure: Internet Archive compared with Library of Congress]
[a] http://blogs.loc.gov/digitalpreservation/2012/03/how-many-libraries-of-congress-does-it-take/

Opportunity: The ArchiveHub project aims to support the creation and dissemination of general guidelines & tools for conducting theoretically and methodologically rigorous longitudinal research using archival Web data.

HistoryTracker Tool Version 2.0
PIG Scripts in Hadoop Environment
Link Lists & Text Data
20th Century Collection @ RU
Curated Data Sets
RU High-Speed Computing Cluster

Datasets

Dataset: US Media
  Research potential: Previous studies of news media organizations (Greer & Mensing, 2006; Weber, 2012; Weber & Monge, In Press); focus on evolutionary patterns
  Dates: 2008-2012
  Captures: 1,315,132,555
  Unique URLs: 539,184,823

Dataset: Occupy Wall Street
  Research potential: Previous research on NGOs in the online environment (Bach & Stark, 2004; Shumate, 2003, 2012; Shumate, Fulk, & Monge, 2005); use of hyperlink data to study the formation and role of alliances between SMOs
  Dates: 2010-2012
  Captures: 247,928,272
  Unique URLs: 113,259,655

Dataset: US Senate
  Research potential: Study the growth of political activity in online environments (Adamic & Glance, 2005; Bruns, 2007; Chang & Park, 2012); polarization & media discourse
  Dates: 109th-112th Congresses
  Captures: 26,965,770
  Unique URLs: 8,674,397

Dataset: US House
  Research potential: Same as US Senate
  Dates: 109th-112th Congresses
  Captures: 51,840,777
  Unique URLs: 12,410,014

Dataset: Hurricane Katrina
  Research potential: Online networks and organizational resilience (Chewning, Lai and Doerfel, 2012; Perry, Taylor and Doerfel, 2003) in the wake of disasters; information dissemination
  Dates: 2003-2012
  Captures: 1,694,236
  Unique URLs: 663,740

Dataset: Superstorm Sandy
  Research potential: Same as Hurricane Katrina
  Dates: 2003-2012
  Captures: 41,703,112
  Unique URLs: 20,013,455

What's in the data?
Link Data fields: Source, Destination, Date, Frequency, Content Type, Bytes, Descriptive Text
Example record:
  Source: http://gawker.com
  Destination: http://gawker.com/5953665/mittromneys-staff-played-the-media-coveringthem-in-a-friendly-game-of-flag-football
  Date: 2012-10-22
  Descriptive Text: Mitt Romney's Staff Played the Media Covering Them in a Friendly Game of Flag
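The link-data fields named on this slide (source, destination, date, frequency, content type, bytes, descriptive text) could be held in a small typed record. The sketch below is illustrative only: the tab-separated layout, the `LinkRecord` class, the `parse_link_record` helper, and every value in the sample line are assumptions, not the ArchiveHub file format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LinkRecord:
    """One hyperlink observation, using the field names from the slide."""
    source: str
    destination: str
    captured: date
    frequency: int
    content_type: str
    size_bytes: int
    descriptive_text: str


def parse_link_record(line: str) -> LinkRecord:
    """Parse one tab-separated line (hypothetical layout) into a LinkRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    source, destination, day, frequency, content_type, size_bytes, text = fields
    return LinkRecord(
        source=source,
        destination=destination,
        captured=date.fromisoformat(day),  # expects YYYY-MM-DD
        frequency=int(frequency),
        content_type=content_type,
        size_bytes=int(size_bytes),
        descriptive_text=text,
    )


# Every value below is a made-up placeholder, not real archive data.
sample_line = "\t".join([
    "http://example.org",
    "http://example.org/story",
    "2012-10-22",
    "3",
    "text/html",
    "20480",
    "Example link text",
])
record = parse_link_record(sample_line)
print(record.source, "->", record.destination, "on", record.captured)
```

Keeping the date as a `date` object rather than a string makes it straightforward to group records by month for longitudinal analysis.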

http://archivehub.rutgers.edu

PUTTING BIG THEORY INTO BIG DATA
[or] moving from observing the Web to observing new phenomena on the Web

Tracing the Emergence of Organizational Forms
Environment: Organizations compete for scarce resources; during periods of rapid disruption, new entrants seek protected niches (Weber & Monge, 2014)
Population: In digital spaces, online connections provide communicative representations of information flows (Weber & Monge, 2012)
  Formation of ties (e.g. hyperlinks) can positively impact the long-term likelihood of organizational survival (Weber, 2012)
Organization: Organizations adapt internally, reconfiguring team structures and developing new routines for knowledge sharing (Ellison, Gibbs & Weber, In Press; Weber & Kim, Under Review)

Big Data Big Theory?
Networks are central to social movements in that links between nodes can be influential in collective action.
Examples of nodes include participants, organizations, media, and communications technologies.
  Social networks and social movements (Diani, 2003)
The interactions between actors, and between actors and hashtags, collectively represent a networked form of organization.
  Network form of organization (Powell, 1990)

Data
Triangulation of data insulates against false readings from large-scale data (see Lazer, Kennedy, King and Vespignani, 2014).
Internet Archive: 335 OWS-related websites; ~330 million edges over a 2-year period
Lexis Nexis: Search conducted to assess U.S. newspaper coverage of OWS from the early stages of the movement in September 2011 through September 2012
  Search on OWS keywords, e.g. Occupy Wall Street, Occupy Oakland
Twitter: Gnip PowerTrack
  Search by keywords; captures a larger volume of Twitter data than other options
  Sample runs from October 17, 2011, through January 5, 2012
  Initial study focused on the critical two-month period from November 1 through December 31, 2011: 750,816 tweets
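The keyword search and the two-month study window described on this slide can be sketched as a simple filter. This is not the Gnip PowerTrack API: the `(date, text)` tuple layout, the function names, and the sample tweets are all hypothetical, and only the two keywords and the window dates come from the slide.

```python
from datetime import date

# Keywords and window dates are taken from the slide; everything else is assumed.
OWS_KEYWORDS = ("occupy wall street", "occupy oakland")
WINDOW_START = date(2011, 11, 1)
WINDOW_END = date(2011, 12, 31)


def in_study_window(day: date) -> bool:
    """True if the day falls inside the two-month study period (inclusive)."""
    return WINDOW_START <= day <= WINDOW_END


def mentions_ows(text: str) -> bool:
    """Case-insensitive substring match against the OWS keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in OWS_KEYWORDS)


def select_tweets(tweets):
    """Keep (date, text) pairs that match a keyword and fall in the window."""
    return [
        (day, text)
        for day, text in tweets
        if in_study_window(day) and mentions_ows(text)
    ]


# Made-up sample tweets for illustration only.
sample_tweets = [
    (date(2011, 10, 20), "Occupy Wall Street march today"),   # before window
    (date(2011, 11, 15), "News from Occupy Oakland"),         # kept
    (date(2011, 12, 1), "Holiday shopping tips"),             # no keyword
    (date(2011, 12, 31), "OCCUPY WALL STREET year in review"),  # kept
]
selected = select_tweets(sample_tweets)
print(len(selected))
```

The same window test could be applied to the newspaper and web-archive records so that all three sources cover a comparable period before they are compared.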

OWS News Coverage

OWS on the Web
335 seed organizations based on records from #OccupyResearch
Data extracted for 2011 & 2012, based on both matching
[Figure: Captures per Month]

Maximal Cores (k-Coreness)
[Figure panels: Aug. 2011, Jan. 2012]
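For readers unfamiliar with k-coreness: a node's core number is the largest k such that the node belongs to a subgraph in which every node has at least k neighbors, and the maximal core is the set of nodes with the highest core number. The sketch below computes this by repeatedly peeling off the lowest-degree node. It is a minimal illustration, not the analysis behind the slide; treating hyperlinks as undirected edges and the toy edge list are assumptions.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as edge pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)

    degree = {node: len(adj) for node, adj in neighbors.items()}
    remaining = set(neighbors)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for other in neighbors[node]:
            if other in remaining:
                degree[other] -= 1
    return core


def maximal_core(edges):
    """Return the set of nodes whose core number is the largest in the graph."""
    core = core_numbers(edges)
    if not core:
        return set()
    top = max(core.values())
    return {node for node, value in core.items() if value == top}


# Toy graph (hypothetical): a triangle a-b-c with one pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(core_numbers(toy_edges))
print(maximal_core(toy_edges))
```

In the toy graph the triangle forms the 2-core and the pendant node sits in the 1-core, which mirrors how a densely interlinked group of sites would stand out from peripheral ones.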

[Figure: Edges and Vertices over time]

[Figure: Density over time]
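The vertex, edge, and density series plotted on these slides are standard network summaries. As a minimal sketch, assuming a directed hyperlink network with no self-loops, density is the number of distinct edges divided by V * (V - 1). The `network_summary` function and the toy snapshot are hypothetical, and nodes with no links are not counted because they never appear in an edge list.

```python
def network_summary(edges):
    """Summarize a directed edge list: vertex count, edge count, density."""
    # Drop self-loops and collapse repeated links into one edge.
    unique_edges = {(u, v) for u, v in edges if u != v}
    vertices = {node for edge in unique_edges for node in edge}
    n = len(vertices)
    m = len(unique_edges)
    # A directed graph on n vertices has at most n * (n - 1) edges.
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {"vertices": n, "edges": m, "density": density}


# Toy monthly snapshot (hypothetical); the repeated a->b link counts once.
toy_snapshot = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "b")]
summary = network_summary(toy_snapshot)
print(summary)
```

Running the same function on each month's edge list would yield the kind of time series shown in the edges, vertices, and density charts.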

[Figure: Clusters over time]

Challenges
Access Challenges: Scaling access to the data
Data Challenges: Moving from access to researchable data
Research Challenges: Bridging big data to big theory
  Potential for use as a historical research tool

Want data? Email me: matthew.weber@rutgers.edu
ArchiveHub: http://archivehub.rutgers.edu
The Team
  Kris Carpenter, Vinay Goel, Internet Archive
  David Lazer, Katherine Ognyanova, Northeastern University
  Allie Kosterich, Hai Nguyen, Luan Nguyen, Marya Doerfel, Rutgers University
  Peter Monge, Ayushman Datta, Kristen Guth, USC
Research supported by NSF Award #1244727 and the NetSCI Lab @ Rutgers