The implications of big data for government, business and the academy
Rob Kitchin, National University of Ireland Maynooth
Small data / big data

Characteristic                Small data                        Big data
Volume                        Limited to large                  Very large
Exhaustivity                  Samples                           Entire populations
Resolution and indexicality   Coarse & weak to tight & strong   Tight & strong
Relationality                 Weak to strong                    Strong
Velocity                      Slow, freeze-framed               Fast
Variety                       Limited to wide                   Wide
Flexible and scalable         Low to middling                   High
Urban big data

Directed
  o Surveillance: CCTV, drones/satellite
  o Digitisation of millions of documents, films, audio recordings
Automated
  o Automated surveillance
  o Digital devices
  o Sensed and scanned data
  o Interaction and transactional data
  o IoT (Internet of things) and M2M (machine to machine)
Volunteered
  o Social media
  o Sousveillance (wearables)
  o Crowdsourcing
  o Citizen science
Big data analytics

The challenge of making sense of big data is coping with its abundance and exhaustivity, timeliness and dynamism, messiness and uncertainty, and semi-structured or unstructured nature
The solution has been machine learning, made possible by advances in computation and computational techniques
Four broad classes of analytics:
  o data mining and pattern recognition
  o statistical analysis
  o prediction, simulation, and optimization
  o data visualization and visual analytics
Government and Business
Government and business

Big data and associated analytics will enhance the governing of people, managing organisations, leveraging value and producing capital, creating better places, improving health and well-being, tackling social and ecological issues, etc.
Driven by an overlapping set of discourses/promises: improved insight and wisdom, productivity, competitiveness, efficiency, effectiveness, utility, sustainability, securitisation...
Governing people

The state is a prime generator and user of data
It has sought to create more systematic ways of managing and governing populations and delivering services through the auditing and quantification of society
Citizens and institutions are identified and monitored, records updated, profiles mapped, data analyzed to spot issues and trends, payments tracked, and services and disciplining administered
Big data is the latest set of technologies that can expand and improve state work by extending the timeliness and expansiveness of calculative practices
Managing organisations

Data provide the basis to manage an organisation more effectively, efficiently, competitively and productively
Information systems have become essential support infrastructures to track and manage complex assemblages of people, components, commodities and infrastructures across time and space
Big data - real-time intelligence on an organisation - offers further efficiencies whilst reducing risks, costs and operational losses, and improving customer experience
Three common data-driven systems that facilitate greater coordination and control within and between organisations are Enterprise Resource Planning (ERP), Supply Chain Management (SCM), and Customer Relationship Management (CRM)
These produce cost savings across the operational base
Leveraging value and capital

Big data solutions enable the realisation of untapped capital, increase return on investment, and leverage competitive advantage
There are several ways in which big data solutions can offer corporate intelligence that can grow turnover and profits, including segmenting the market, tackling customer and employee churn, optimizing various inputs (e.g., components, labour, utilities) and yield, and building various profiles and predictive models to answer a variety of questions:
  o whether to contact the customer or not? (target marketing)
  o provide the customer with a retention offer or not? (customer retention)
  o which type of ad or choice of words/images or product to present to a customer? (content selection)
  o which channel the customer should be contacted through? (channel selection)
  o whether a customer is offered a higher or lower price? (dynamic pricing/discounting)
  o whether a debtor is offered a deeper write-off? (collections)
  o whether a customer is offered a higher or lower credit limit or interest rate? (credit risk)
Creating better places

Produce smart cities
Places are increasingly composed of and monitored by pervasive and ubiquitous computing, and their economy and governance are driven by ICT innovation, creativity and entrepreneurship
Cities can be understood and regulated in real time; they produce, share, integrate, consume and act on the big data they produce
The aim is to create more liveable, secure, functional, competitive and sustainable places
12 modules: 100s of interactive graphs/maps

How's Dublin Doing?: Dublin indicators & benchmarking
Dublin Real-Time: Real-time data from sensors across Dublin
Dublin Mapped: Detailed Census maps for 2006 & 2011; Census, crime, welfare
Dublin Planning: Land zoning & planning permissions
Dublin Near To Me: Maps of location and nearness to public services; area profiles
Dublin Housing: Maps of housing, house prices and commuting patterns
Dublin Reporting: Report issues to city authorities
Dublin Data Stores: Access to all data used in the dashboard
Dublin Social (in progress): Maps of social media activity
Dublin Modelled (in progress): Modelling and scenario tools
Dublin Apps (in progress): Directory of apps relevant to Dublin
Have Your Say (in progress): Feedback from users
The Academy
Big data and epistemology

"Revolutions in science have often been preceded by revolutions in measurement" (Sinan Aral)
Big data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities
Academic analysis has typically been based on extracting insights from small datasets using a limited set of tools
We now have a data deluge in many fields and a suite of new analytical techniques
These are transforming how we frame, ask and answer questions
The end of theory

Anderson (2008) argues that "the data deluge makes the scientific method obsolete"; that the patterns and relationships contained within big data inherently produce meaningful and insightful knowledge about complex phenomena.
"There is now a better way. Petabytes allow us to say: 'Correlation is enough.'... We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.... Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. There's no reason to cling to our old ways."
Ayasdi software claims to be able to: "automatically discover insights -- regardless of complexity -- without asking questions."
The end of theory

There is a powerful and attractive set of ideas at work in the empiricist epistemology that runs counter to the mainstream deductive approach:
  o big data can capture a whole domain and provide full resolution
  o there is no need for a priori theory, models or hypotheses
  o through the application of agnostic data analytics the data can speak for themselves free of human bias or framing, and any patterns and relationships within big data are inherently meaningful and truthful
  o meaning transcends context or domain-specific knowledge, and thus can be interpreted by anyone who can decode a statistic or data visualization
These work together to suggest that a new mode of science is being created
The end of theory

Empiricist thinking is problematic for four reasons:
  o Big data are both a representation and a sample, shaped by the technology and platform used, the data ontology employed, and the regulatory environment, and are subject to sampling bias
  o Big data do not arise from nowhere, free from the regulating force of philosophy
  o Big data cannot simply speak for themselves free of human bias or framing
  o Big data cannot be interpreted outside of context and domain-specific knowledge
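One way to see why patterns in big data are not inherently meaningful is to search for correlations among variables that are, by construction, unrelated. The following is a minimal sketch rather than anything from the talk: the function names, the number of variables and observations, and the random seed are all illustrative assumptions.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_variables=200, n_observations=20, seed=0):
    """Return (|r|, (i, j)) for the most correlated pair of pure-noise variables.

    Every variable is independent Gaussian noise (an assumption of this
    sketch), so any correlation that is found arises by chance alone.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_observations)]
            for _ in range(n_variables)]
    return max(
        (abs(pearson(data[i], data[j])), (i, j))
        for i, j in combinations(range(n_variables), 2)
    )


strength, pair = strongest_chance_correlation()
print(f"strongest |r| among unrelated variables: {strength:.2f} for pair {pair}")
```

With 200 variables there are 19,900 pairs to compare, so some pairs will look strongly related even though none is. A purely correlational search has no way to tell such pairs from real relationships without theory, context or further testing.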
Data-driven science

Data-driven science seeks to hold to the tenets of the scientific method, but is more open to using a hybrid combination of abductive, inductive and deductive approaches to advance the understanding of a phenomenon
It differs from the traditional, experimental deductive design in that it seeks to generate hypotheses and insights born from the data rather than born from the theory
It seeks to incorporate a mode of induction into the research design, though explanation through induction is not the intended end-point. Instead, it forms a new mode of hypothesis generation before a deductive approach is employed
Nor does the process of induction arise from nowhere; it is situated and contextualised within a highly evolved theoretical domain
As such, the epistemological strategy is to use guided knowledge discovery techniques to identify potential questions worthy of further examination and testing
The approach is suited to extracting additional, valuable insights that traditional knowledge-driven science would fail to generate
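The two-stage strategy described here, induction to generate hypotheses followed by testing, can be sketched on synthetic data. This is an illustrative sketch only: the planted relationship between variables 0 and 1, the split into exploration and holdout rows, the `top_k` cut-off and the 0.5 threshold are all assumptions, and a real study would use domain theory to guide the search and a proper statistical test in the second stage.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def make_data(n_variables=30, n_observations=200, seed=1):
    """Synthetic noise variables with one planted relationship (0 -> 1)."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_observations)]
            for _ in range(n_variables)]
    data[1] = [x + rng.gauss(0, 0.1) for x in data[0]]
    return data


def correlation_on(data, i, j, rows):
    """Correlation of variables i and j restricted to the given rows."""
    return pearson([data[i][t] for t in rows], [data[j][t] for t in rows])


def generate_hypotheses(data, rows, top_k=5):
    """Inductive step: keep the top_k most correlated pairs as candidates."""
    scored = sorted(
        ((abs(correlation_on(data, i, j, rows)), (i, j))
         for i, j in combinations(range(len(data)), 2)),
        reverse=True,
    )
    return [pair for _, pair in scored[:top_k]]


def check_hypotheses(data, candidates, rows, threshold=0.5):
    """Testing step: keep candidates that still hold on unseen rows."""
    return [(i, j) for i, j in candidates
            if abs(correlation_on(data, i, j, rows)) >= threshold]


data = make_data()
candidates = generate_hypotheses(data, rows=range(0, 100))
confirmed = check_hypotheses(data, candidates, rows=range(100, 200))
print("candidate hypotheses:", candidates)
print("confirmed on holdout:", confirmed)
```

The exploration stage proposes several candidate pairs, most of them chance patterns, while the holdout stage is expected to retain only the planted relationship. The induction produces questions, not answers.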
Computational social science

For positivistic scholars in the social sciences, big data offers the opportunity to develop more sophisticated, wider-scale, finer-grained models of human life. To shift from:
  o data-scarce to data-rich studies of societies
  o static snapshots to dynamic unfoldings
  o coarse aggregations to high resolutions
  o relatively simple models to more complex, sophisticated simulations
The potential is for studies with much greater breadth, depth, scale, and timeliness, and that are inherently longitudinal
Moreover, the variety, exhaustivity, resolution, and relationality of data, plus the growing power of computation and new data analytics, address some of the critiques of positivistic scholarship to date, especially those of reductionism and universalism, by providing more finely grained, sensitive, and nuanced analysis
Digital humanities

For post-positivist scholars, big data offers both opportunities and challenges
The opportunities are a proliferation, digitisation and interlinking of a diverse set of analogue and unstructured data, much of it new (e.g., social media) and much of it heretofore difficult to access (e.g., millions of books, documents, newspapers, photographs, art works, material objects, etc., from across history)
New tools of data curation, management and analysis are being provided that can handle massive numbers of data objects
Rather than concentrating on a handful of novels or photographs, or a couple of artists and their work, it becomes possible to search and connect across a very large number of related works
This has implications for the social sciences, but is most widely being examined through the emerging field of digital humanities
Digital humanities

Digital humanities advocates are broadly divided into two camps epistemologically:
  o Those who believe that new techniques -- counting, graphing, mapping, data mining -- bring methodological rigour and objectivity to disciplines that have heretofore been unsystematic and random in their focus and approach
  o Those who see the techniques as a supplement to, rather than a replacement for, existing humanities methods and theory building
Both camps tend to use descriptive rather than inferential statistics
The claims of the former have opened up an epistemological debate centred on close versus distant reading/interpretation and the ability of algorithms to parse meaning & context
DH is seen by some as mechanistic and reductionist, identifying patterns but not processes or meaning
There are also worries that CSS and DH relegate questions concerning metaphysical aspects of human life (meanings, beliefs, experiences) and normative questions (ethical and moral dilemmas about how things should be as opposed to how they are)
What happens to small data studies?

Big data doesn't replace or negate small data
Small data have a proven track record of answering specific questions, with established procedures, methods, etc.
Small data studies can be much more finely tailored
Small data studies seek to mine gold from carefully working a narrow seam, whereas big data studies seek to extract nuggets through open-pit mining, scooping up and sieving huge tracts of land
Small data will, however, increasingly be made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and re-use, and open them up to combination with big data and analysis using big data analytics
Ethical, political and social consequences of big data
Surveillance

Surveillance/dataveillance
Creation of extensive digital footprints (data people themselves leave behind) and data shadows (information about them generated by others)
Related to individuals, objects, interactions, transactions, territories...
Creation of a vast data market and a data brokerage and analytics industry
Reshaping the individual's relationship with companies and the state
Significantly reshaping privacy
Data collected by Uber android app (data type: data collected)

Accounts log: log
App Activity: name, package name, process number of activity, processed id
App Data Usage: cache size, code size, data size, name, package name
App Install: installed at, name, package name, unknown sources enabled, version code, version name
Battery: health, level, plugged, present, scale, status, technology, temperature, voltage
Device Info: board, brand, build version, cell number, device, device type, display, fingerprint, IP, MAC address, manufacturer, model, OS platform, product, SDK code, total disk space, unknown sources enabled
GPS: accuracy, altitude, latitude, longitude, provider, speed
MMS: from number, MMS at, MMS type, service number, to number
NetData: bytes received, bytes sent, connection type, interface type
PhoneCall: call duration, called at, from number, phone call type, to number
SMS: from number, service number, SMS at, SMS type, to number
TelephonyInfo: cell tower ID, cell tower latitude, cell tower longitude, IMEI, ISO country code, local area code, MEID, mobile country code, mobile network code, network name, network type, phone type, SIM serial number, SIM state, subscriber ID
WifiConnection: BSSID, IP, linkspeed, MAC addr, network ID, RSSI, SSID
WifiNeighbors: BSSID, capabilities, frequency, level, SSID
Root Check: root status code, root status reason code, root version, sig file version
Malware Info: algorithm confidence, app list, found malware, malware SDK version, package list, reason code, service list, sigfile version
Privacy

Surveillance: Watching, listening to, or recording of an individual's activities
Interrogation: Various forms of questioning or probing for information
Aggregation: The combination of various pieces of data about a person
Identification: Linking information to particular individuals
Insecurity: Carelessness in protecting stored information from leaks and improper access
Secondary Use: Use of information collected for one purpose for a different purpose without the data subject's consent
Exclusion: Failure to allow the data subject to know about the data that others have about her and participate in its handling and use, including being barred from being able to access and correct errors in that data
Breach of confidentiality: Breaking a promise to keep a person's information confidential
Disclosure: Revelation of information about a person that impacts the way others judge her character
Exposure: Revealing another's nudity, grief, or bodily functions
Increased Accessibility: Amplifying the accessibility of information
Blackmail: Threat to disclose personal information
Appropriation: The use of the data subject's identity to serve the aims and interests of another
Distortion: Dissemination of false or misleading information about individuals
Intrusion: Invasive acts that disturb one's tranquillity or solitude
Decisional Interference: Incursion into the data subject's decisions regarding her private affairs
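Aggregation and identification from the list above can be illustrated with a small record-linkage sketch. Everything in it is hypothetical: the two datasets, the field names, the choice of quasi-identifiers and the `link` function are assumptions made for illustration and do not describe any real system.

```python
from collections import defaultdict

# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
deidentified_records = [
    {"area": "D01", "birth_year": 1980, "gender": "F", "sensitive": "record-1"},
    {"area": "D02", "birth_year": 1975, "gender": "M", "sensitive": "record-2"},
    {"area": "D02", "birth_year": 1975, "gender": "M", "sensitive": "record-3"},
]

# Hypothetical public register holding names alongside the same attributes.
public_register = [
    {"name": "Person A", "area": "D01", "birth_year": 1980, "gender": "F"},
    {"name": "Person B", "area": "D02", "birth_year": 1975, "gender": "M"},
    {"name": "Person C", "area": "D02", "birth_year": 1975, "gender": "M"},
    {"name": "Person D", "area": "D03", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("area", "birth_year", "gender")


def quasi_key(record):
    """Tuple of quasi-identifier values used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, register):
    """Return (name, sensitive value) for records matching exactly one person."""
    names_by_key = defaultdict(list)
    for person in register:
        names_by_key[quasi_key(person)].append(person["name"])
    reidentified = []
    for record in records:
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:  # a unique match singles out one individual
            reidentified.append((names[0], record["sensitive"]))
    return reidentified


print(link(deidentified_records, public_register))
```

In this toy data only the first record is re-identified, because its combination of attributes is unique in the register; the other two match two people each and stay ambiguous. The more attributes that are aggregated, the more combinations become unique.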
Political, ethical and social issues

Predictive profiling and social sorting
Dynamic pricing
Anticipatory governance
Control creep
Technocratic modes of governance and technological lock-ins
Vulnerabilities: buggy, brittle and hackable systems
Data protection and security
Conclusion

Big data/analytics does constitute a data revolution: it fundamentally alters the nature of data and how we make sense of them
It is starting to transform how business and government are conducted, organised and managed
It is also starting to alter how research is conducted across the academy
As the technology and analytics improve, these transformations will extend and deepen
They will thus pose significant epistemological questions, as well as social, political and ethical ones
We are only just starting to examine and think through these questions
@robkitchin

Kitchin, R., Lauriault, T. and McArdle, G. (2015) Knowing and governing cities through urban indicators, city benchmarking and real-time dashboards. Regional Studies, Regional Science 2
Kitchin, R. and Lauriault, T. (2014) Towards critical data studies. SSRN
Kitchin, R. and Lauriault, T. (2014) Small data in the era of big data. GeoJournal (online first)
Kitchin, R. (2014) Big data, new epistemologies and paradigm shifts. Big Data and Society 1 (April-June):
Kitchin, R. (2014) The real-time city? Big data and smart urbanism. GeoJournal 79(1):
Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3):