Big Data in Communication Research: Its Contents and Discontents



Journal of Communication ISSN 0021-9916
AFTERWORD
Big Data in Communication Research: Its Contents and Discontents
Malcolm R. Parks
Department of Communication, University of Washington, Seattle, WA 98195, USA
doi:10.1111/jcom.12090

I had two goals in mind when I decided to dedicate a special issue of the Journal of Communication to Big Data. One was to provide an outlet for the growing number of excellent Big Data studies on mass communication, digital technologies, political communication, health communication, and many other areas of interest to our discipline. My focus was on empirical papers that made substantive contributions using new methods, rather than on explanations, endorsements, or critiques of the Big Data movement. The goal was to showcase the state of the art in recent research in computational communication science.

My second goal was to provide a benchmark for research innovation. Big Data research is still in its infancy in communication. Relatively little of the work done in this early stage will stand the test of time, but all of it will likely be critical in the ongoing process of conceptual and methodological advance. The articles featured in this issue represent the best of what is currently being done. Their strengths will guide future work, but so, too, will their limitations.

What is Big Data?

There is no single definition of Big Data. In simple terms, Big Data involves datasets that are far larger than those traditionally examined in journals like this one. Yet there has always been considerable variation in the size of datasets, ranging from small experimental studies to large samples involving census or polling data. Size alone is therefore an insufficient descriptor. In more substantive terms, the Big Data movement has been associated with the analysis of large social networks (including online networks such as Twitter), automated data aggregation and mining, web and mobile analytics, visualization of large datasets, sentiment analysis/opinion mining, machine learning, natural language processing, and computer-assisted content analysis of very large datasets. Several of these methods are featured in this issue.

Corresponding author: Malcolm R. Parks; e-mail: macp@uw.edu
Journal of Communication 64 (2014) 355–360 © 2014 International Communication Association

As others have observed, the Big Data movement often brings along an ideology or mythology that asserts a special, transformative value (e.g., boyd & Crawford, 2012). In order to evaluate these claims, it is necessary to distinguish the true promise of Big Data from some of the over-promises of its most fervent proponents.

Separating promise from poses

Big Data methods and sources will become increasingly important because they offer data and insights that could not be obtained in other ways. These methods open research to work involving datasets of previously unimagined size. Indeed, they often provide the only means of managing and analyzing digital datasets of increasing size and complexity. The entry by Baek, Park, and Cha, for instance, begins with a scan of approximately 1.7 billion tweets. Even after the most relevant data are selected by these and the other authors represented in this issue, the sample sizes typically remain in the hundreds of thousands. The ultimate value of Big Data, however, derives not from sheer size, but rather from two other factors.

First, because the Big Data movement is coupled with what is sometimes called "datafication," that is, the creation of quantitative datasets from information that has not been viewed as data in the past (Mayer-Schönberger & Cukier, 2013), it leads to new research questions and new ways of thinking about existing questions. Among the many examples is the relatively large social network that Christakis and Fowler (2007, 2009) constructed from previously overlooked participant tracking information in the long-running Framingham Heart Study. In this issue, we might think of Giglietto and Selva's creative analysis of messages tweeted by television viewers as an example of rarely examined discourse. We might also point to Hill and Shaw's substantive appropriation of administrative data in wikis.

Big Data can open new doors in a second way as well. Its computational tools enhance researchers' ability to bring together multiple datasets: datasets of different types, from different places, or gathered at different times. This ability has always existed on a small scale, but new data management and analytic capabilities make it possible to conduct research of unprecedented complexity and scope. Several of the studies here have done just that. One of the more striking examples is Jungherr's analysis combining Twitter content, separate content analyses of print and television coverage, and public opinion polling related to the 2009 federal elections in Germany. Together, datafication (i.e., the construction and sharing of multifaceted datasets) and the development of new analytic tools to work on them hold dramatic promise for our discipline.

In order to realize this promise, however, it is necessary to place Big Data in a larger intellectual and disciplinary context. This requires looking beyond much of the hyperbole about the "Big Data Revolution." Among the most extreme claims is the assertion that Big Data will render science itself obsolete, or at least no longer in need of theory, models, or interpretation: "With enough data, the numbers speak for themselves" (Anderson, 2008). Others claim that simple correlations will be sufficient in the Age of Big Data, and that hypothesis testing and causal analysis will no longer be necessary to advance science (Mayer-Schönberger & Cukier, 2013). It is fair to say that such positions are intended to be provocative, often in service of the authors' market interests. A more realistic view might be to acknowledge the value of large-scale datasets, while at the same time recognizing that the choice of data (even Big Data) always reflects at least an implicit theoretic model and that the desire for explanation will continue to lead scientists toward causal analysis and experimentation (even though some experiments may now become very large).

A more subtle, but still misleading, view of Big Data is that it presents a sharp break from the past or possibly even a new science. The term "data science" is particularly unfortunate in this regard, both because of its redundancy and because of the way it obscures the fact that Big Data's value ultimately depends on disciplinary and interdisciplinary utility. Kuhn's (1962) observation that substantive advances and methodological advances are more often intertwined than independent is no less true today than it was 50 years ago. This suggests that the impact of "data science" specialists will depend on their ability to create value for those engaged with substantive disciplinary and interdisciplinary issues.

Big Data is not so much a break from the past as simply the latest in a more or less steady flow of methodological advances that have transformed the social sciences over the past 100 years. These include the codification of experimental design, the development of systematic sampling and surveys, the advent of multivariate statistical analysis, the development of searchable compilations of media content, and video recording, to name just a few. We might also keep in mind that perceptions of bigness are themselves relative and historically bound. Several of the innovations mentioned above were the big data revolutions of their day.

Making the most of Big Data

Placing the Big Data movement in disciplinary and historical context enables us to attend to the issues that must be addressed if progress is to be made. In my view, four issues would benefit from greater attention.

Greater attention to questions of theoretic and social importance

One might imagine three stages in the adoption of new research methods. Studies done during the initial stage emphasize the methods themselves. Many are essentially demonstration projects. Much of the current Big Data work in the social sciences, including communication, is still at this first stage. Next, investigators begin to apply new methods to smaller problems or well-established findings. Many of the findings will essentially replicate previous work or address questions of secondary importance. These studies may be useful substantively and provide guides for those working in more central areas. Yet they will often be limited because they rely on the data that are available rather than on the data that are needed. Finally, new methods move into the mainstream as investigators begin to apply them to theoretically and socially important problems.

We selected manuscripts for this issue with this third stage in mind. Although the chosen studies vary, each clearly grapples with an issue of interest within our research community. Studies by Jungherr, by Neuman and colleagues, and by Vargo and colleagues bring new approaches to understanding central questions regarding the nature and timing of influence between online social media and more traditional media. Colleoni and colleagues examine the theoretically important question of whether the structure of interaction on Twitter brings users into contact with diverse perspectives or merely creates an echo chamber of like-minded voices. Emery and her colleagues open a new window for considering the theoretically and socially important issue of how public health campaigns work.

Advancing toward this higher stage will inevitably bring changes in patterns of graduate education and collaboration. Just as media and communication researchers in the 1970s sought training in multivariate analysis from those outside the discipline, we now reach out to those with computational skills. But we need not go with hats in hand. It is clear that we have much to offer in terms of substance: substance often lacking in the demonstration projects so often found in computationally oriented work. Our contribution becomes even more critical when research sponsors begin to demand that the makers of new tools demonstrate their societal value.

Greater concern for validity of measurement

In many of the submissions we received, researchers selected the large-scale indicators they could and were then left in the position of trying to attribute broader conceptual meaning or importance to operational indicators of convenience rather than of choice. Even more difficult problems arise when a given operational indicator appears to be valid, but is too limited to capture the full richness of the concept it presumably measures.
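One way to test whether an automated indicator captures the concept it is meant to measure is to compare its output with careful human coding of the same items. The Python sketch below is a minimal illustration under stated assumptions: the ten labels are invented, the helper names `cohens_kappa` and `precision_recall` are hypothetical, and chance-corrected agreement (Cohen's kappa) plus per-category precision and recall are only one common choice of validation statistics, not the procedure of any study discussed here.

```python
from collections import Counter


def cohens_kappa(human, machine):
    """Chance-corrected agreement between two codings of the same items."""
    if len(human) != len(machine) or not human:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, machine)) / n
    h_counts, m_counts = Counter(human), Counter(machine)
    # Agreement expected if both coders assigned labels independently
    # at their own observed marginal rates.
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1:
        # Both codings use one identical category throughout; kappa is
        # undefined here, so this sketch simply reports perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def precision_recall(human, machine, positive):
    """Precision and recall of the machine for one category, human codes as reference."""
    pairs = list(zip(human, machine))
    tp = sum(h == positive and m == positive for h, m in pairs)
    fp = sum(h != positive and m == positive for h, m in pairs)
    fn = sum(h == positive and m != positive for h, m in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical validation sample: ten messages coded by a human and by a classifier.
human = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neu"]
machine = ["pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg", "pos", "neu"]

print(f"kappa = {cohens_kappa(human, machine):.3f}")
print("precision/recall for 'pos':", precision_recall(human, machine, "pos"))
```

Kappa is reported rather than raw agreement alone because it discounts the agreement two coders would reach by chance given how often each uses each category; in this toy sample raw agreement is 0.8 while kappa is about 0.70. A real validation sample would have to be drawn from the material actually analyzed and would need to be far larger than ten items.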
Progress depends as well on providing stronger evidence to support the validity of automated coding systems, machine learning algorithms, sentiment analysis, and the other new tools rapidly entering the research sphere. The paper by Emery and colleagues offers a good example of what is necessary to validate machine-coding procedures. Other papers, including many of those we turned away, either relied on coding validation procedures that were not tailored to the specific research situation or simply assumed that previous, often very limited, validation efforts were sufficient. Here we must guard against the error of equating very detailed technical descriptions of procedures with evidence of validity. Very detailed procedures and algorithms are not necessarily any more valid than more straightforward ones. Indeed, because more assumptions are made, there is more to go wrong.

Greater attention to sampling and representativeness

Big Data is not complete data. This can be seen in the articles in this issue. In nearly every case, investigators have started with a dataset that represented only a portion of the sample universe of interest and have then focused on a still smaller portion of that universe. The article by Giglietto and Selva provides us with an illuminating example. Their dataset of tweets (N = 2.49 million) related to political talk shows for the 2012/2013 season is described as complete. Upon closer inspection, however, it is apparent that the dataset only contains tweets that included official or the most popular hashtags for the programs of interest. As Jungherr notes in his article, the choice to sample Twitter messages using hashtags may slant the sample toward more experienced users. Giglietto and Selva based their final analyses on a much smaller dataset intended to reflect tweets during peaks of activity. This is not intended to be critical; indeed, to their credit, the authors are quite candid about the limitations of the final dataset. The larger point is that even very large datasets often represent samples whose generalizability and representativeness are open to challenge. Bigness does not ensure quality.

It is striking that seven of the eight papers selected for this issue rely entirely or in part on Twitter data. Although Twitter users in the United States increasingly mirror the U.S. online population in basic demographic terms (Brenner & Smith, 2013), we know much less about the demographics of Twitter users in most other countries, particularly those in the developing world. As Baek and his colleagues acknowledge, this leaves cross-cultural comparisons of Twitter use and content open to concerns of sampling bias. Beyond this, however, there is no reason to assume that molar demographic similarities between Twitter users and the overall online population imply similarities in attitudes, issues discussed, or the other more specific topics addressed in this issue.

In addition to concerns about how representative Twitter users are, we should also be concerned about Twitter's ability to represent social media platforms more generally. It is an appropriate choice on substantive grounds in some cases, but not in others, or at least not as a sole choice.
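The claim that bigness does not ensure quality can be illustrated with a small simulation. The Python sketch below is hypothetical throughout: it invents a population of one million users in which a trait is more common among a minority of "experienced" users (an assumed stand-in for the kind of users a hashtag-based filter may favor), then compares the population rate with the rate in a large sample restricted to that subgroup and in a small simple random sample. The proportions and the `simulate` function are assumptions for illustration, not estimates for Twitter or for any study in this issue.

```python
import random


def simulate(seed=2014):
    """Compare a large filtered sample with a small random sample (all numbers invented)."""
    rng = random.Random(seed)
    # Hypothetical population of 1,000,000 users; about 20% are "experienced".
    # Assumption: the trait is held by 60% of experienced users and 30% of
    # everyone else, so the expected population rate is 0.2*0.6 + 0.8*0.3 = 0.36.
    population = []
    for _ in range(1_000_000):
        experienced = rng.random() < 0.20
        has_trait = rng.random() < (0.60 if experienced else 0.30)
        population.append((experienced, has_trait))

    true_rate = sum(t for _, t in population) / len(population)

    # "Big" but filtered sample: every experienced user (roughly 200,000 records).
    filtered = [t for e, t in population if e]
    filtered_rate = sum(filtered) / len(filtered)

    # Small simple random sample of 1,000 users drawn from the whole population.
    srs = rng.sample(population, 1_000)
    srs_rate = sum(t for _, t in srs) / len(srs)

    return true_rate, filtered_rate, len(filtered), srs_rate


true_rate, filtered_rate, n_filtered, srs_rate = simulate()
print(f"population rate:             {true_rate:.3f}")
print(f"filtered sample (n={n_filtered:,}): {filtered_rate:.3f}")
print(f"random sample (n=1,000):     {srs_rate:.3f}")
```

With these assumed numbers the restricted sample of roughly 200,000 records lands near 0.60, far from the population rate of about 0.36, while the 1,000-item random sample typically falls within a few points of it. Enlarging the restricted sample would not close that gap, because its error comes from who is included rather than from how many records it holds.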
Twitter was an excellent choice for Giglietto and Selva's analysis of second-screen interaction, though one might acknowledge that television viewers also interact with one another via direct texts, e-mail, and cellphone. As digital venues proliferate, it will become increasingly important to analyze more than one medium, just as those interested in media coverage of issues more generally are now encouraged to consider both broadcast and print media. The study by Neuman, Guggenheim, Jang, and Bae offers an outstanding example of an analysis using multiple traditional and social media. In some other cases, it is fair to ask whether Twitter data were representative of the larger, more diverse media streams substantively related to the authors' research questions. This is a legitimate question for any study that is based on a single digital media platform, again, regardless of the amount of data drawn from that platform.

Enhancing data access and ensuring data quality

Several commentators have raised concerns about the fact that much of the Big Data of greatest interest to social scientists, particularly communication and media scholars, is the property of commercial entities such as Facebook, Twitter, and Google. These companies either deny or tightly manage data access by researchers, leading to fears of new "digital divides" and the creation of classes of researchers who are either "data rich" or "data poor" (e.g., boyd & Crawford, 2012). These are legitimate fears and ought to be a source of alarm for everyone in the research community as more and more of our social life is conducted within commercially owned "walled gardens."

But the rhetoric of digital divides fails to capture the full range of the danger. As communication researchers begin to work with the owners of social networking sites and other proprietary venues, they may well begin to experience the same challenges that biomedical researchers have experienced working with commercial entities making drugs and medical devices. Communication researchers may have to contend with the fact that companies will grant access only to data that they believe will reflect positively upon their commercial interests. They will discover, as biomedical researchers have, that sponsorship and assistance often come with strings. Sometimes these strings are explicit, as in the case of a company demanding the right to approve manuscripts before they are submitted for publication. Sometimes the strings will be implicit, as in cases where researchers are biased by their own desire to please or to gain visibility through association with a trendy company or industry group. In extreme cases, there may be direct financial conflicts of interest when investigators have ownership or extensive consulting relationships with the companies whose products they study.

Significant challenges therefore face us as we move into the era of Big Data. Some are new, but fortunately most of them are the same challenges that have been faced with major methodological innovations in the past. Looking past claims of exceptionalism will help us recognize the road ahead. Moving forward holds the potential not only for examining existing questions in new ways, but also for positioning the discipline of communication at the heart of efforts to understand social and civic life in an increasingly mediated age. The challenges are familiar; the theoretic and practical potential is enormous.

References

Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired [WWW document]. Retrieved from http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
boyd, d., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15, 662–679. doi:10.1080/1369118X.2012.678878
Brenner, J., & Smith, A. (2013). 72% of online adults are social networking site users. Washington, DC: Pew Research Center's Internet & American Life Project. Retrieved from http://pewinternet.org/ /media//files/reports/2013/pip_social_networking_sites_update_pdf.pdf
Christakis, N., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357, 370–379. doi:10.1056/NEJMsa066082
Christakis, N., & Fowler, J. H. (2009). Connected: The surprising power of our social networks and how they shape our lives. New York, NY: Little, Brown.
Kuhn, T. (1962). The structure of scientific revolutions. Chicago, IL: University of Chicago Press.
Mayer-Schönberger, V., & Cukier, K. (2013). Big Data: A revolution that will transform how we live, work, and think. Boston, MA: Houghton Mifflin Harcourt.