Big Data and computational research methods: Opportunities and challenges for cultural industries research Queensland University of Technology www.qut.edu.au
Patrik Wikström Associate Professor Principal Research Fellow, QUT DMRC Research interests 1. Business innovation and digital disruption in the creative and cultural economy 2. Use and development of computational research methods Digital creative economy; Music; Computational social science; Time
QUT DMRC conducts world-leading research that helps society understand and adapt to the social, cultural and economic transformations associated with digital media technologies. DIGITAL MEDIA: Journalism, Public Communication & Democracy; Industries, Policies & Economies; Digital Method Innovation; Technologies & Practices in Everyday Life
Agenda The computational turn, Big data Four examples Some methodological considerations Building theory about complex and dynamic phenomena
Computational turn (Berry) What has changed? 1. Abundance of data on human behaviour created by the digital traces of our everyday online activities 2. Digital-born methods and tools for large-scale data analysis and visualisation
Social networks Examples Structural dynamics (Inter)-network information flows Platforms Online user behaviour Multisided market dynamics Sentiment dynamics
Examples (different data sources) 1. Visit logs 2. Web APIs 3. Web scraping 4. Web archives
Clickstream data from visit logs
About the study The role of the website in a magazine business Exploratory vs goal-oriented user behaviour
Identification of typical user behaviours using hierarchical cluster analysis Page #1 Page #2 Page #3 Page #4
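The clustering step named on this slide can be sketched in a few lines. This is a minimal illustration only: the session vectors (page views on Page #1 to Page #4 per visit), the average-linkage rule, the Euclidean distance and the choice of three clusters are all assumptions, since the slide does not state which features or linkage the study used.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster pairs.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical sessions: number of views of Page #1..#4 during one visit.
sessions = [
    (9, 1, 0, 0),
    (8, 2, 0, 1),
    (0, 1, 7, 8),
    (1, 0, 9, 7),
    (2, 2, 2, 2),
]
clusters = agglomerate(sessions, n_clusters=3)
```

Each resulting cluster groups visits with a similar page profile, which is one way to read off "typical user behaviours". On real logs a library implementation would be the more practical choice.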
Web APIs
Public communication via social media platforms
The Sydney Siege (Dec 2014) Axel Bruns
The Australian Twittersphere (Axel Bruns) ~140k Australian accounts with degree > 1000, as of Sep. 2013. Cluster labels on the network map: Horse Racing Hard Right Leftists Football Politics Education AFL Talkback Cycling Agriculture Journalists Literature V8s TV News Adelaide / SA Arts Beer Netizens UFC Media Music Pop Teen Idols Fashion Advertising Food NRL Celebrities Cinema Urban Media Utilities Wine Cricket NRU Cody Simpson Beauty Marketing Hillsong Business Mums PR HR / Support Parenting Real Estate Investing Followback Home Business Sole Traders Self-Help Perth
Web API Application Programming Interface https://api.genderize.io/?name=peter https://api.twitter.com/1.1/search/tweets.json?q=paris
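The two URLs on this slide share one pattern: an endpoint plus query parameters, answered with JSON. The sketch below builds those URLs and parses a response body without touching the network. The `build_url` helper and the sample response body are illustrative assumptions, not output captured from either service; the Twitter search endpoint would typically also require authentication.

```python
import json
from urllib.parse import urlencode


def build_url(base, **params):
    """Return the endpoint URL with query parameters appended."""
    return f"{base}?{urlencode(params)}"


genderize_url = build_url("https://api.genderize.io/", name="peter")
twitter_url = build_url("https://api.twitter.com/1.1/search/tweets.json", q="paris")

# Assumed shape of a JSON reply, for illustration only.
# A live call could use urllib.request.urlopen(genderize_url).read() instead.
sample_body = '{"name": "peter", "gender": "male", "probability": 0.99}'
record = json.loads(sample_body)
guessed_gender = record["gender"]
```

Because the reply is structured data, it can be looped over thousands of names or queries and loaded straight into a table for analysis.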
Web scraping
Web scraping Because many online services do not offer a programming interface to access their data
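When no API exists, the same data can often be read out of the page markup. The deck names Beautiful Soup as the tool; the sketch below deliberately swaps in Python's standard-library `html.parser` so that it runs without extra installs. The page snippet, the `track-title` class name and the `TrackTitleParser` class are hypothetical.

```python
from html.parser import HTMLParser


class TrackTitleParser(HTMLParser):
    """Collect the text of every <span class="track-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "track-title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


# Hypothetical page fragment standing in for a downloaded chart page.
page = """
<ul>
  <li><span class="track-title">First Song</span> <span class="artist">Artist A</span></li>
  <li><span class="track-title">Second Song</span> <span class="artist">Artist B</span></li>
</ul>
"""
parser = TrackTitleParser()
parser.feed(page)
titles = parser.titles
```

A scraper like this depends on the page layout staying stable, which is one reason an API is preferable where one is offered.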
Example: Changing patterns in the production and consumption of cultural artefacts. Data collection: cultural artefacts, popular music (artists, albums & tracks). Data analytics: e.g. time-based, geographic, acoustic metadata
Changing audio attributes over time [Figure panels: Duration, Danceability, Speechiness, Acousticness, Energy, Liveness, Loudness; each compares AU, UK and US; the Speechiness panel's time axis runs 1958 to 2012]
What is this? Is it a measure of music innovation? Is it cyclical? [Figure: a single US series plotted by year, 1958 to 2012]
Beautiful Soup
Web archives
A Battle for Legitimacy
A Battle for Legitimacy (sentiment analysis: multiple aspect rating; data source: web archive)
Beautiful Soup
https://www.python.org http://pandas.pydata.org https://try.jupyter.org http://www.python-course.eu
Opportunities and challenges Cost efficient Unobtrusive Requires some technical skills Data seduction Apophenia
Methodological considerations Qualitative or quantitative data? Are these methods a complement or substitute to traditional methods? E.g. what about motivational drivers? Production v collection of data Raw data is an oxymoron (Gitelman 2013) Ethical concerns? E.g. access often necessitates engagement Is this the end of theory?
What is theory? "Theory is the answer to queries of why. Theory is about the connections among phenomena, a story about why acts, events, structure, and thoughts occur." (Sutton & Staw, 1995, "What theory is not", p. 378)
"Big data certainly doesn't mean the end of theory" (Auerbach, 2014) "We no longer need to speculate and hypothesise; we simply need to let machines lead us to the patterns, trends, and relationships in social, economic, political, and environmental relationships." (Graham, 2012)
Perhaps we need a new approach to theory development?
Complex and dynamic phenomena Large number of (locally) interacting elements. Any element is affected by and affects several other elements. The interactions are non-linear: small changes can cause large effects. They are dynamic and have a history. They evolve and their past is co-responsible for their present behaviour. Often difficult or impossible to define system boundaries. Systems are often far from equilibrium. Micro-level actions generate macro-level patterns (e.g. Miller & Page 2007)
It is challenging to build theory that is able to capture such complexities
Traditional modelling approaches are simply not very useful: Verbal models The paper with the largest circulation in a market has financial and economic advantages that enable it to increase advertising and circulation sales by attracting customers from the smaller paper. As the leading paper attracts more circulation, it attracts more advertising, which in turn attracts more circulation, trapping the secondary paper in a circulation spiral that ultimately leads to its demise.
Traditional modelling approaches are simply not very useful: Verbal models Signifier sound image Signified concept
Traditional modelling approaches are simply not very useful: Statistical (e.g. regression) models
Traditional modelling approaches are simply not very useful: Mathematical models Interactions between feral cats, foxes, native carnivores, and rabbits in Australia. System of differential equations
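The slide does not reproduce its equations. As a generic illustration of what a "system of differential equations" looks like for interacting species, the classic two-species predator-prey (Lotka-Volterra) form is shown below. The choice of rabbits $R$ and foxes $F$ as the two populations, and the parameters $\alpha, \beta, \gamma, \delta$, are assumptions for illustration and not the four-species model referred to on the slide.

```latex
\begin{aligned}
\frac{dR}{dt} &= \alpha R - \beta R F, \\
\frac{dF}{dt} &= \delta R F - \gamma F,
\end{aligned}
```

Here $\alpha$ is the prey growth rate, $\beta$ the predation rate, $\delta$ the rate at which consumed prey convert into predators, and $\gamma$ the predator death rate. Every added species or interaction adds terms and equations, which is part of why such models become hard to handle for systems with many heterogeneous elements.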
Computational modelling, however, is a promising approach Concepts, assumptions, logics represented by a computer program. Allows simulation of model behaviour over time.
Computational modelling (e.g. agent-based modelling) may be a way forward
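A minimal agent-based sketch, loosely inspired by the circulation-spiral verbal model quoted earlier: each reader is an agent following a simple micro rule, and the market share of paper A is the macro pattern that emerges. The function name, the parameter values and the switching rule (attraction proportional to the squared market share) are all hypothetical choices made for illustration.

```python
import random


def simulate(n_readers=2000, initial_share_a=0.6, steps=100,
             switch_rate=0.1, seed=1):
    """Return the share of readers choosing paper A at each time step."""
    rng = random.Random(seed)
    # Each agent reads paper "A" or paper "B".
    readers = ["A" if rng.random() < initial_share_a else "B"
               for _ in range(n_readers)]
    history = []
    for _ in range(steps):
        share_a = readers.count("A") / n_readers
        history.append(share_a)
        # Assumed micro rule: the larger paper is disproportionately attractive.
        attract_a = share_a ** 2 / (share_a ** 2 + (1 - share_a) ** 2)
        for i in range(n_readers):
            # Only a fraction of readers reconsider their choice each step.
            if rng.random() < switch_rate:
                readers[i] = "A" if rng.random() < attract_a else "B"
    history.append(readers.count("A") / n_readers)
    return history


history = simulate()
```

Plotting `history` shows how a modest initial lead can grow over time under this rule. Changing the rule or the parameters and re-running is how such a model is used to explore a theory's assumptions.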
Conclusions Data abundance and computational methods enable cultural and creative industries research from new perspectives. A complement (not a substitute) to traditional methods. We need new approaches to develop theory about these complex and dynamic phenomena. Coding is gradually becoming an essential research skill.
www.qut.edu.au/research/dmrc Queensland University of Technology Visit! Collaborate! Join! patrik.wikstrom@qut.edu.au @pwikstrom www.linkedin.com/in/patrikwikstrom