Social Media Case Studies: Archiving Social Media: Mesolithic Online Resources (Mesolithic Miscellany and Mesolithic Research Forum)

NHPP Number 5C1.114 (6765)

Authors

Katie Green, Communications and Access Manager, Archaeology Data Service (ADS), Department of Archaeology, The King's Manor, York, YO1 7EP.
Kieron Niven, Digital Archivist, Archaeology Data Service, Department of Archaeology, The King's Manor, York, YO1 7EP.

Summary

This project draws on the ADS's experience in digital preservation and data management to study the preservation of social media outputs. The study uses a selection of the social media activity from Mesolithic Miscellany to explore the criteria for the appraisal and selection of social media outputs for digital archiving, and to create a Guide to Good Practice for archiving Facebook and Twitter outputs specifically.

Background

The ADS supports research, learning and teaching by preserving digital data in the long term and providing free access to a range of archaeological data. The ADS promotes good practice in the use of digital data, provides technical advice, and supports the deployment of digital technologies. The ADS Guides to Good Practice series provides a good framework in which to consider the technical archiving issues surrounding social media outputs.

Challenges

Social media has created new and interactive platforms for individuals and communities to create, edit, share and discuss material. The historic environment sector is increasingly embracing this as an important communication tool. However, little thought has been given to the long-term value of social media outputs, even though these outputs can record important milestones, detail responses from the community, and relate attitudes and feelings towards an archaeological event. Information of historical significance may be lost if we continue to view social media outputs as throw-away data. It is increasingly important to consider what to keep and how to keep it.

This case study will address these challenges.
Approach

This case study will approach these challenges through consideration of the Twitter feed and Facebook page of Mesolithic Miscellany. The following steps will be taken:

- undertake a literature review;
- develop criteria for the selection of social media;
- review outputs against those criteria for their archaeological and sociohistoric value;
- consider non-professional audience outputs in relation to outputs from more informed audiences, to assess the influence of audience background on suitability for archiving;
- explore methods for downloading Twitter and Facebook content;
- determine the best methods for long-term preservation of social media outputs.

Aims

This study aims to benefit the historic environment sector as a whole. In particular:

- the ADS will develop guidelines for the selection and retention of social media outputs;
- Mesolithic Miscellany will use the study to guide their future social media activity; and
- HERs will benefit from the guidelines developed.

Guidance delivered at an early stage in the uptake of social media will have the greatest impact in forming approaches to its long-term use.

The ADS will use the Mesolithic Miscellany social media outputs to:

- consider the potential of social media as a resource in its own right;
- explore and develop criteria to be used in the appraisal and selection of outputs for digital preservation;
- develop a guide to best practice for archiving social media content.

Analysis of the social media retention issues encountered during this case study will allow the development of guidance for social media archiving. This case study will produce a report providing guidance on the selection and preservation of social media outputs, specifically Facebook pages and Twitter feeds. It will feed into the future implementation of a social media retention strategy for Mesolithic Miscellany, and the outcomes will be incorporated into ADS policy and guidance for social media archiving, which will be available online.
Delivery

Mesolithic Miscellany was chosen for this case study because it provides a discrete body of social media outputs from its Twitter feed and Facebook page.

The Facebook page is made up of posts by the moderator, which are often commented links to external content, with further comments and likes from users. The Facebook page also includes account details, a brief description of Mesolithic Miscellany, one associated photograph album, two profile pictures and one cover photo. The Facebook account can also collect user statistics recording the number of views of posts on the page. This is particularly interesting for assessing impact.

The Twitter feed is made up of tweets posted by the moderator, which are largely links to external sites. The Twitter feed includes re-tweets and associated conversations. The Twitter account also includes a list of followers, accounts followed and a brief description of Mesolithic Miscellany.

The common elements of both of these social media outlets are the chronology of the posts, the textual content, the links contained within the posts, and the user interaction attached to the posts.

Obstacles to overcome

The following problems were encountered:

- current archive selection and retention guidelines are not suitable for social media outputs if each comment is to be assessed for its value;
- complex legal issues are associated with the retention and reuse of content;
- social media platforms record and display content in different ways;
- capturing and preserving the mixed media content of social media outputs is extremely problematic.

Achievements

Outcomes:

The social media output of Mesolithic Miscellany included socio-historic content that could be retained for future research, confirming social media outputs as a resource in their own right, in both the study of the European Mesolithic and the development of archaeology as a discipline.

At an account holder level, we could use the ADS Guidance on the Selection of Material for Deposit and Archive to determine retention value. Within individual accounts, the selection of what output to archive was more difficult. Tweets and Facebook posts have an initial announcement stage and then an open-ended 'comment' stage.
There is often a high level of interaction after the initial post, which tails off as the post ages. This makes it difficult to decide at what point to select outputs, particularly for Facebook pages, where issues posted can come back into vogue. For Facebook pages the timeframe for preservation activity should depend on the content of the account; for Mesolithic Miscellany, yearly archiving would be appropriate. In the case of Twitter feeds an annual archive process was considered sufficient, with the year starting three months in the past from the date of archive to allow for on-going conversations.
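One reading of the annual Twitter schedule described above is that each archive covers twelve months ending three months before the archive date. The Python sketch below is an illustration only and is not part of the original study; the function names `shift_months` and `archive_window`, and the default lag and span values, are assumptions made for the example.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Move a date by a number of months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def archive_window(archive_date: date, lag_months: int = 3, span_months: int = 12):
    """Return (start, end) of the period to archive.

    Assumption: the window ends `lag_months` before the archive date, so that
    recent conversations have had time to settle, and spans `span_months`.
    """
    end = shift_months(archive_date, -lag_months)
    start = shift_months(end, -span_months)
    return start, end


if __name__ == "__main__":
    start, end = archive_window(date(2013, 4, 12))
    print(start.isoformat(), "to", end.isoformat())
```

Under this reading, posts dated on or after the start and before the end of the window would be included in that year's archive; the lag can be changed if conversations on an account tend to run longer.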
The basic criteria for assessing comments were determined as: relevance to the topic of the social media forum, value to future research, sensitivity of the data, and that the content is not offensive. All content within the Mesolithic Miscellany social media accounts was found to be appropriate for retention. However, those engaging with Mesolithic Miscellany social media are a self-selecting group of informed individuals. The fact that the content is appropriate is also a result of the management of the accounts, whereby spam, abusive or unrelated comments are removed. If management and moderation of the social media account is on-going, selection at a comment level will not be required before archiving.

The Mesolithic Miscellany Facebook page and Twitter feed often link to external media, so it is important that the selection and retention process avoids duplication of external media such as videos.

Social media do not display their content in a single interface. Facebook in particular allows customised lists and views, which alter the way in which content is presented. This requires a decision to be made regarding which of these views to archive, especially if using formats like PDF, which records content as a flat image.

Under the OAIS reference model, preservation of any object requires the preservation of more than the object itself. Knowledge of how to decode the information must also be preserved, as must any intellectual context. Social media outputs are more than just text; they link to internal and external content such as websites, pictures, videos and other media. This raises the technical issue of how to archive such media.

Native download options for Twitter and Facebook are currently limited. Twitter provides a download option which records the tweets from the selected account in .csv format, but it does not preserve conversations or the context of the Twitter feed.
This study found that combining this .csv download with .pdf copies of the Twitter feed may be the best option, with additional preservation of linked content. However, legal considerations must be explored before accessioning this linked content.

Facebook only allows a data download for personal pages, and there are many problems with using web-crawlers for capturing social media content automatically. As a result, the only suitable archiving method currently available is PDF. However, PDF cannot preserve dynamic media such as video and audio files, so separate preservation of such associated and linked media would also be required. Again, legal considerations must be explored before accessioning this content. One benefit of PDF is that it does record the structural and aesthetic context of the original social media output, if not its functionality.
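As an illustration of how linked content might be listed once, so that the same external item is not considered for preservation twice, the following Python sketch collects unique URLs from a tweet export. It is not the method used in the study: the column name `text`, the helper `linked_urls` and the sample rows are hypothetical, and a real Twitter .csv download may use different column names.

```python
import csv
import io
import re

# Simple pattern for http/https links; an assumption that is adequate for a sketch.
URL_PATTERN = re.compile(r"https?://[^\s,\"'<>]+")


def linked_urls(csv_text: str, text_column: str = "text") -> list[str]:
    """Return each URL found in the given column once, in first-seen order."""
    seen = set()
    ordered = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for url in URL_PATTERN.findall(row.get(text_column) or ""):
            url = url.rstrip(".)")  # drop trailing sentence punctuation
            if url not in seen:
                seen.add(url)
                ordered.append(url)
    return ordered


if __name__ == "__main__":
    # Hypothetical export with two tweets that share one link.
    sample = (
        "timestamp,text\n"
        "2013-01-02,New paper http://example.org/a.\n"
        "2013-01-03,RT http://example.org/a and http://example.org/b\n"
    )
    for url in linked_urls(sample):
        print(url)
```

The resulting list only identifies candidates; whether any linked item can be accessioned still depends on the legal considerations discussed in this report.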
Legal considerations are complex. The terms and conditions of the social media provider can affect what can be archived and reused, and these must be investigated. Consent from contributors and the intellectual property rights of the owners of linked media also have to be considered. This study recommends that notices be placed within the social media accounts stating that archiving of content is taking place. More investigation into this area is needed, and legal advice should be taken before reusing any archived material.

The following products have been delivered by this project:

- Case study report (this document).
- ADS internal document on digital preservation of social media outputs.
- Selection criteria for social media outputs and good practice guide for social media archiving, to be added to the ADS advice pages on our website following feedback from this case study.
- Mesolithic Miscellany recommendation document.
- Recommendations for further investigation (see below).

Lessons learned

Selection must be guided by individual requirements. What to archive is a compromise between what is desirable (e.g. just the user's posts; all posts; all posts but no linked media; etc.), what is technically possible (e.g. linked media from external sources) and what is legal (securing copyright permissions). Many issues with social media preservation still need resolving, and much more consideration is needed before extensive guidance can be provided, but basic guidance is possible for the current level of usage.

Next steps

This study has identified several areas which need further investigation:

- legal considerations;
- more sophisticated methods for capturing and archiving social media outputs;
- other social media types such as blogs, Pinterest and Flickr.

Further information

Project Manager: Katie Green, katie.green@york.ac.uk
Digital Archivist: Kieron Niven, kieron.niven@york.ac.uk

Main Resources

Archaeology Data Service, http://archaeologydataservice.ac.uk/
Mesolithic Miscellany, https://sites.google.com/site/mesolithicmiscellany/
Mesolithic Miscellany Facebook Page, http://www.facebook.com/mesolithic.miscellany?ref=ts&fref=ts
Mesolithic Miscellany Twitter Feed, https://twitter.com/mesomisc/

Further Reading

Selection and Retention

ADS. (2013) Guidance on the Selection of Material for Deposit and Archive. Available at http://archaeologydataservice.ac.uk/advice/selectionguidance/ (April 12, 2013).
DPC. (n.d.) Decision Tree for Selection of Digital Materials for Long-term Retention. Digital Preservation Coalition. Available at http://www.dpconline.org/advice/decision-tree.html (June 24, 2010).
Esanu, J., Davidson, J., Ross, S., & Anderson, W. (2004) Selection, Appraisal, and Retention of Digital Scientific Data: Highlights of an ERPANET/CODATA Workshop. Data Science Journal, 3, 226.
Gutmann, M., Schurer, K., Donakowski, D., & Beedham, H. (2004) The selection, appraisal, and retention of social science data. Data Science Journal, 3/0, 209-221.
NARA - US National Archives and Records Administration. (n.d.) Strategic Directions: Appraisal Policy. NARA. Available at http://www.archives.gov/records-mgmt/initiatives/appraisal.html (June 24, 2010).
Witt, M. (2008) Institutional Repositories and Research Data Curation in a Distributed Environment. Library Trends, 57/2, 191-201.
Whyte, A. and Wilson, A. (2010) How to Appraise & Select Research Data for Curation, Digital Curation Centre. Available at http://www.dcc.ac.uk/resources/how-guides/appraise-select-research-data (June 6, 2011).

Legal Considerations

Madhava, R. (2011) 10 Things to Know About Preserving Social Media, Information Management, ARMA. http://content.arma.org/imm/september-October2011/10thingstoknowaboutpreservingsocialmedia.aspx
McDonough, J. P. et al. (2010) Preserving Virtual Worlds Final Report, IDEALS. Available at http://hdl.handle.net/2142/17097 (April 12, 2013).
Tweetcc. (n.d.) Publish & license tweets with Creative Commons, Tweetcc. Available at http://tweetcc.com/ (April 12, 2013).

Archiving Social Media: Methods and Examples

Fuhrig, L. S. (2011) The Smithsonian: Using and Archiving Facebook, The Bigger Picture, The Smithsonian. http://siarchives.si.edu/blog/smithsonianusing-and-archiving-facebook (April 12, 2013).
Jeffery, S. (2012) A new Digital Dark Age? Collaborative web tools, social media and long-term preservation, World Archaeology, 44/4, 553-570.
Available at http://www.tandfonline.com/doi/full/10.1080/00438243.2012.737579 (April 22, 2013).
Rosenthal, D. (2013) DAWN vs. Twitter, DSHR's Blog. Available at http://blog.dshr.org/2013/01/dawn-vs-twitter.html (April 12, 2013).
Rosenthal, D. (2013) Update on the Twitter Archive At the Library of Congress, News from the Library of Congress, Library of Congress. Available at www.loc.gov/today/pr/2013/files/twitter_report_2013jan.pdf (April 12, 2013).
Marshall, C. C. (2009) How People Manage Information over a Lifetime, in (eds. William Jones and Jaime Teevan) Personal Information Management, University of Washington Press.
North Carolina State. (2013) State of North Carolina - Social Media Archive. NC State Government Web Site Archives and Access Program. Available at http://nc.gov.archivesocial.com/ (April 12, 2013).
Twitter. (2012) Your Twitter Archive, Blog, Twitter. Available at http://blog.twitter.com/2012/12/your-twitter-archive.html (April 12, 2013).

2014 update

A reassessment of the social media archiving strategy presented in the original case study did not reveal any major improvements or amendments that could be made to the proposed strategy. Following the submission of the original case study, the archiving methodology presented was followed, and the Mesolithic Miscellany Facebook page and Twitter account were archived as PDF/A copies of the social media webpages. The downloadable data made available by Twitter and Facebook, including statistical data, was also downloaded and archived as .csv files. This methodology was, as predicted, very time consuming due to the need to open every comment thread within each social media account. On review of the suggested one-year archiving schedule, a more frequent schedule of three months or less is recommended, to make the PDF process easier and less time consuming and to reduce the computational power required to convert long webpages to PDF.
An archiving schedule should be designed and implemented as part of a larger social media account management plan, in order to incorporate these time-consuming activities effectively into regular social media account management. Archiving in this manner is suitable for social media accounts with small to medium sized comment threads and an average of one or two posts a day. Larger accounts may find this archiving strategy unsustainable due to the time required to carry out the activity.

During the eleven months since the original case study was submitted, neither Twitter nor Facebook has extended the functionality of its download options, and no suitable independent tool has been developed to enable a superior archiving solution to that presented in the original case study. However, the team behind Archive Facebook, a Mozilla Firefox add-on which can successfully archive personal Facebook pages, are hoping to develop a suitable solution for Facebook fan/group pages in the future. Due to the many disadvantages of the social media archiving strategy developed in the original case study, Archive Facebook should be carefully monitored for developments which may improve upon the Facebook page archiving strategy presented there.

The archived social media data from Mesolithic Miscellany has not been made available to the public, as more investigation into the legal considerations needs to be carried out, and while the social media accounts are active this is unnecessary. The potential inability to make this archive fully accessible in the future is undesirable. However, given the fast-moving pace of social media tools and the possibility that both Facebook and Twitter could go offline in the near future, it is safer to archive social media accounts in this manner than to wait for a more effective tool to emerge or for the legal issues to be resolved.
If you require an alternative accessible version of this document (for instance in audio, Braille or large print) please contact our Customer Services Department: Telephone: 0370 333 1181 Fax: 01793 414926 Textphone: 0800 015 0516 E-mail: customers@english-heritage.org.uk