Legal Issues in Building Social Media Collections
Association of Research Libraries, May 2011
Hope O'Keeffe, Library of Congress, Office of the General Counsel, loke@loc.gov
http://www.flickr.com/x/t/0098009/photos/cobalt/204902316
The text of this presentation is a United States Government work not subject to copyright in the United States. Images may be subject to copyright. This presentation does not reflect the views of the Library of Congress.
Familiar Legal Issues
- Acquisition agreements
- Copyright
  - Get the rights in the acquisition agreement
  - Clear rights
  - Use library & fair use exemptions
- Privacy
- Access
- Risk management
http://www.flickr.com/x/t/0091009/photos/rofi/2647699204
Case Study: Twitter Archive
- Simple agreement
- No significant copyright issues
  - BUT: downstream rights issues for users
- Some access issues
  - 6-month quarantine
  - No bulk distribution from LC website
  - Researcher notification
  - No commercial use
  - No substantial redistribution
  - Users have burden to comply with copyright, privacy
- Limited privacy issues
Note: Twitter agreement is online, with Twitter's consent, at http://blogs.loc.gov/loc/files/2010/04/loc-twitter.pdf
Why Archive Twitter? May 1, 2011. http://twitter.com/#!/klerner/status/64895357355704320
How to acquire a moving target?
- 3 years, 2 months and 1 day: the time it took from the first Tweet (March 21, 2006) to the billionth Tweet.
- 1 week: the time it now takes for users to send a billion Tweets.
- 50 million: the average number of Tweets people sent per day, one year ago.
- 140 million: the average number of Tweets people sent per day, in the last month.
- 177 million: Tweets sent on March 11, 2011.
- 456: Tweets per second (TPS) when Michael Jackson died on June 25, 2009 (a record at that time).
- 6,939: current TPS record, set 4 seconds after midnight in Japan on New Year's Day.
- 572,000: number of new accounts created on March 12, 2011.
- 460,000: average number of new accounts per day over the last month.
Source: http://blog.twitter.com/2011/03/numbers.html. Image: Huffington Post
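The figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the original presentation: the variable names are made up, and the date of the billionth Tweet is an assumption derived by adding 3 years, 2 months and 1 day to March 21, 2006.

```python
from datetime import date

# Figures quoted on the slide (from the Twitter blog post cited as its source).
first_tweet = date(2006, 3, 21)
tweets_per_day_now = 140_000_000       # average over the last month
tweets_per_day_year_ago = 50_000_000   # average one year earlier
record_tps = 6_939                     # peak Tweets per second

# Assumption: the billionth Tweet fell 3 years, 2 months and 1 day later.
billionth_tweet = date(2009, 5, 22)
days_to_first_billion = (billionth_tweet - first_tweet).days

# At the current daily average, how long does a billion Tweets take?
days_per_billion_now = 1_000_000_000 / tweets_per_day_now

# Growth in daily volume over one year.
yearly_growth_factor = tweets_per_day_now / tweets_per_day_year_ago

# Average Tweets per second implied by the daily average, and how far
# the record peak exceeds that average.
average_tps = tweets_per_day_now / (24 * 60 * 60)
peak_to_average = record_tps / average_tps

print(f"Days to the first billion Tweets: {days_to_first_billion}")
print(f"Days per billion Tweets now: {days_per_billion_now:.1f}")
print(f"Growth in daily volume over one year: {yearly_growth_factor:.1f}x")
print(f"Average TPS: {average_tps:.0f}; peak is {peak_to_average:.1f}x average")
```

The "1 week" figure is consistent with the 140 million daily average, since 140 million Tweets a day adds up to roughly a billion in seven days. The gap between average and peak rates is one reason a continuing archive is harder to acquire than a one-time transfer.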
Twitter Archive Lessons
- Win for Twitter: preservation; validation of cultural importance
- Win for Library: preserve digital snapshot of our time
- Reaction:
  - Announcement crashed Library blog for 1st time; huge press interest
  - Historians & academics thrilled
  - Magnified concerns regarding privacy once it's the Government collecting tweets
  - Public conversation regarding value of social media archiving
- Challenge of acquiring continuing archive
  - Twitter & Library working together on implementation
  - Reaching agreement is just the beginning
Case Study: Web Archiving
- Enormous rights issues
  - Both legal & practical
- Access issues linked to rights
- Risk management is essential
- Best practices development is key
http://www.flickr.com/x/t/0094009/photos/turtlemom_nancy/1914397629
Why should libraries archive websites?
- Average lifespan of website: 45 days
- Existing archives like Wayback Machine tend to be broad but shallow rather than narrow & deep
  - Don't capture all levels
- Need both kinds of archiving
- Library archiving allows curation
http://www.flickr.com/x/t/0093009/photos/clintjcl/4382938375/
Clearance Process for Web Archives
- Pre-2002: no permissions sought
  - Few takedown requests
  - No legal challenges
- Since 2002, permission-based approach:
  - Three categories
    - Permission to crawl & to display offsite
    - Notice of crawl, permission to display
    - No notice
  - Investment: thousands of staff hours spent seeking permissions
  - Very few denials
  - Many nonresponsive
- Practice:
  - No crawl if permission denied or no response
  - No display if permission denied or no response
Library of Congress Bain Collection 1909
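The permission-based practice described above can be sketched as a small rule set. This is a hedged illustration only: the enum and function names are hypothetical, and the slide does not spell out how the "No notice" category is handled for display, so the code assumes (and this is an assumption) that such sites need no permission for either step.

```python
from enum import Enum


class Category(Enum):
    """The three clearance categories named on the slide."""
    PERMISSION_TO_CRAWL_AND_DISPLAY = 1
    NOTICE_OF_CRAWL_PERMISSION_TO_DISPLAY = 2
    NO_NOTICE = 3


class Response(Enum):
    """Possible outcomes of a permission request (hypothetical names)."""
    GRANTED = "granted"
    DENIED = "denied"
    NO_RESPONSE = "no response"


def may_crawl(category: Category, response: Response) -> bool:
    # Practice: no crawl if crawl permission is required and it was
    # denied or the site owner never responded.
    if category is Category.PERMISSION_TO_CRAWL_AND_DISPLAY:
        return response is Response.GRANTED
    # Notice-only and no-notice categories do not wait for permission to crawl.
    return True


def may_display_offsite(category: Category, response: Response) -> bool:
    # Assumption: the no-notice category needs no display permission either.
    if category is Category.NO_NOTICE:
        return True
    # Practice: no display if permission was denied or there was no response.
    return response is Response.GRANTED


if __name__ == "__main__":
    for category in Category:
        for response in Response:
            print(category.name, response.value,
                  may_crawl(category, response),
                  may_display_offsite(category, response))
```

Under these rules a nonresponsive site is treated the same as a denial, which is why "many nonresponsive" translates directly into content that is not crawled or not displayed.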
Result: What you won't see in new online LC web archives
Case Study: September 11 archive
- Collected 30,000 sites
- Archive available on open web
- No permissions sought
- LC's highest-demand web archive
- Few takedown requests since 2002
http://www.flickr.com/photos/idovermani/3911596294/
Case Study: Election sites
- 2000:
  - Archive on open web since 2000
  - No takedown requests
- 2002-2010:
  - Very few denials
  - Many no response
  - Much of archive restricted to on-premises use
http://www.flickr.com/photos/lowercolumbiacollege/4505658206/#/
What about robots.txt?
- Legal argument: absence of robots.txt is implied license to crawl
- Following robots.txt may show good faith
- Problem: overinclusive
  - Lawyers use robots.txt as proxy for copyright permission; web managers use it for other purposes like load management
- LC practice:
  - Disregard robots.txt
  - Leave crawler id
  - Work with web mgrs on tech issues
    - Typically want crawl slowdown
    - NO copyright denials
  - If blocked by an HTTP 412 (Precondition Failed) response, contact web mgr
http://www.flickr.com/x/t/0095009/photos/selva/24604141/
Is there a solution for Web archives?
- Legislative change (i.e., Section 108 for web archives)?
- Develop best practices for web archiving institutions?
  - Model after community-developed fair use standards?
  - Can one size fit all?
- Reassess practices/risk based on experience:
  - e.g., few takedown requests, few denials? Accept more risk?
  - Crawl now/permission later v. permission first/crawl later
  - Opt-out notices
  - Vary permissions practice by type of website
  - Shift the forms of access, e.g., something between premises only and open web?
http://www.flickr.com/x/t/0090009/photos/csessums/4781752262/
Legal tips for digital acquisitions
- Examine the business case for each acquisition
- Include lawyers on the team along with curators and techies for both acquisition and implementation
- Think about rights from the beginning
  - Including rights for users
- Think about access and data curation from the beginning
- Incorporate risk management
- Evolve and adapt practice over time based on experience
http://www.flickr.com/x/t/0097009/photos/duncan/2288115818
Decision Tree For Using a Work
1. Do you already own the copyright or have permission? YES: USE. NO: go to 2.
2. Is it protected by copyright? NO: USE. YES: go to 3.
3. Does a library exemption or fair use apply? YES: USE. NO: go to 4.
4. Can you get permission to use this work? YES: USE. NO: go to 5.
5. Is the level of risk acceptable to your institution? YES: USE. NO: DON'T USE.
http://www.flickr.com/x/t/0094009/photos/47030134@n06/4312660408
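The decision tree above is a chain of yes/no questions, so it can be sketched as a short function. This is a minimal illustration only: the `Work` class and its field names are hypothetical, and each answer is a legal judgment that someone at the institution has to make, not something code can decide.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Answers to the five questions in the decision tree (hypothetical fields)."""
    owned_or_permitted: bool       # already own the copyright or have permission?
    protected_by_copyright: bool   # is the work protected by copyright?
    exemption_or_fair_use: bool    # does a library exemption or fair use apply?
    permission_obtainable: bool    # can you get permission to use it?
    risk_acceptable: bool          # is the level of risk acceptable?


def may_use(work: Work) -> bool:
    """Walk the questions in the slide's order; the first 'USE' answer wins."""
    if work.owned_or_permitted:
        return True
    if not work.protected_by_copyright:
        return True
    if work.exemption_or_fair_use:
        return True
    if work.permission_obtainable:
        return True
    # Last resort: the institution's own risk assessment decides.
    return work.risk_acceptable


if __name__ == "__main__":
    orphan = Work(owned_or_permitted=False, protected_by_copyright=True,
                  exemption_or_fair_use=False, permission_obtainable=False,
                  risk_acceptable=False)
    print("Use the work?", may_use(orphan))
```

The order matters: the risk question is asked only after ownership, copyright status, exemptions, and permission have all failed to clear the work.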
Risk Management
- There is always a risk
  - Might be copyrighted abroad
  - Might not have permission of right person
    - Donor or seller may not be sole copyright owner
    - Layers of copyright mean multiple owners
  - Even if copyright is clear, issues of privacy, publicity, trademark, libel, content challenges
  - Might face meritless claims even if you do everything right
- Institution assesses risk, decides what risks to assume and how much
Risk increases with level of access
- Dark archive
- Premises only
- Enterprise only
- Interlibrary loan
- Researcher copies
- Passworded access to server
- Open Web
- Commercial exploitation (e.g. POD)
http://www.flickr.com/x/t/0094009/photos/22750018@n05/4434362439/
Ways to minimize risk
- Comply with collections-based restrictions
- Triage permission requests: low-hanging fruit, litigious rightsholders
- When you accept gifts or get licenses, get rights for users as well as the institution
- Make sure the releases, licenses, and permissions cover your proposed use
- Prepare defenses like Section 108 & fair use
- Adjust levels of access
- Provide a mechanism for notices of infringement or abuse
- Follow strict notice and takedown, but make it friendly!
  - "Tell us more about the content" v. "complain about copyright violations here"
  - DMCA registration even if not user-generated content
- Comply with best practices
http://www.flickr.com/x/t/0092009/photos/tonyjcase/3499402735/