How To Understand The Big Data Paradigm




Big Data and Its Empiricist Foundations
Teresa Scantamburlo

The evolution of Data Science
The mechanization of induction
The business of data
The Big Data paradigm (data + computation)
Critical analysis
Tentative solutions (?)
Open problems

Statistical Learning Theory
The question is how a machine, a computer, can learn from examples (= inductive inference and generalization ability).
The machine is shown particular examples (x_1, y_1), ..., (x_n, y_n) of a specific task, where x_i ∈ X (instances) and y_i ∈ Y (labels). Its goal is to infer a general rule f : X → Y (classifier) which can both explain the examples it has seen already and which can generalize to new examples.
von Luxburg and Schölkopf, Statistical Learning Theory: Models, Concepts and Results, 2011
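The learning setting above can be made concrete with a minimal sketch. It assumes a nearest-neighbour rule, which is only one of many possible ways to infer f and is not taken from the slides; the function name `fit_nearest_neighbor` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def fit_nearest_neighbor(examples):
    """Return a classifier f: X -> Y inferred from labelled examples (x_i, y_i).

    Assumption for illustration: f labels a new instance with the label
    of the closest training instance (1-nearest-neighbour rule).
    """
    if not examples:
        raise ValueError("need at least one labelled example")

    def f(x):
        _, nearest_y = min(examples, key=lambda pair: dist(pair[0], x))
        return nearest_y

    return f


# Hypothetical training set: instances are points in the plane, labels are "a" or "b".
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((4.0, 4.2), "b"),
    ((3.8, 4.1), "b"),
]

f = fit_nearest_neighbor(training)

# f explains the examples it has already seen ...
explains_seen = all(f(x) == y for x, y in training)

# ... and also produces a label for an instance it has never seen.
prediction_new = f((3.5, 3.9))
```

Whether such a rule generalizes well depends on how the new instances relate to the examples, which is the question statistical learning theory studies.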


The Business of Data
Big Data is not simply denoted by volume. Some characterizing features:
velocity, being created in or near real-time;
variety, being structured and unstructured in nature;
exhaustive in scope, striving to capture entire populations or systems (n=all);
relational in nature, containing common fields that enable the conjoining of different data sets;
fine-grained in resolution;
flexible, holding the traits of extensionality (can add new fields easily) and scaleability (can expand in size rapidly).
R. Kitchin, Big data, new epistemologies and paradigm shifts, 2014
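Two of the features listed above, the relational and the flexible character of the data, can be illustrated with a small sketch. The records, the field names (`user_id`, `item`, `city`, `channel`) and the helper `join_on` are hypothetical and serve only to show what "common fields" and "extensionality" mean in practice.

```python
def join_on(left, right, key):
    """Inner-join two lists of dict records on a shared field."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Keep only left records whose key also occurs on the right.
    return [{**l, **r} for l in left if l[key] in index for r in index[l[key]]]


# Two hypothetical data sets that share the field "user_id".
purchases = [
    {"user_id": 1, "item": "book"},
    {"user_id": 2, "item": "lamp"},
    {"user_id": 3, "item": "pen"},
]
profiles = [
    {"user_id": 1, "city": "Venice"},
    {"user_id": 2, "city": "Padua"},
]

# Relational: the common field lets the two data sets be conjoined.
linked = join_on(purchases, profiles, "user_id")

# Extensional: a new field can be added to existing records without a fixed schema.
extended = [{**row, "channel": "web"} for row in linked]
```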

The Big Data Paradigm
Big Data is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets.
Big Data as a socio-technical phenomenon. It rests on the interplay of:
Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.
Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.
d. boyd and K. Crawford, Critical questions for Big Data: provocations for a cultural, technological, and scholarly phenomenon, 2012

The end of theory
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
C. Anderson, The end of theory: The data deluge makes the scientific method obsolete, 2008

The triumph of correlations
Big Data encourages a growing respect for correlation, which comes to be appreciated as a more informative and plausible form of knowledge than the more definite, but also more elusive, causal explanation. In the words of Mayer-Schönberger and Cukier (2013): "the correlations may not tell us precisely why something is happening, but they alert us that it is happening. And in many situations this is good enough."
S. Leonelli, What Difference Does Quantity Make? On the Epistemology of Big Data in Biology, 2014
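A small worked example may help show what a correlation does and does not say. The sketch below computes a Pearson coefficient on invented data in which a hidden factor drives two observed series; the data and all names are assumptions made for illustration, not material from the cited works.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: a hidden factor drives both observed series,
# so neither observed series causes the other.
hidden = [1, 2, 3, 4, 5, 6]
observed_a = [2 * h + e for h, e in zip(hidden, [0.1, -0.2, 0.0, 0.2, -0.1, 0.1])]
observed_b = [3 * h + e for h, e in zip(hidden, [-0.3, 0.1, 0.2, -0.1, 0.0, 0.2])]

# The coefficient signals that the two series move together,
# but it carries no information about why they do.
r = pearson(observed_a, observed_b)
```

Here the coefficient is close to 1 although the only link between the two series is the unobserved factor, which is the kind of situation where a causal explanation remains elusive.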

Empiricism Reborn
There is a powerful and attractive set of ideas at work in the empiricist epistemology that runs counter to the deductive approach that is hegemonic within modern science:
Big Data can capture a whole domain and provide full resolution;
there is no need for a priori theory, models or hypotheses;
through the application of agnostic data analytics the data can speak for themselves free of human bias or framing, and any patterns and relationships within Big Data are inherently meaningful and truthful;
meaning transcends context or domain-specific knowledge, thus can be interpreted by
R. Kitchin, Big data, new epistemologies and paradigm shifts, 2014

Some reactions
Claims to objectivity and accuracy are misleading
Bigger data are not always better data
Taken out of context, Big Data loses its meaning
Just because it is accessible does not make it ethical
Limited access to Big Data creates new digital divides
d. boyd and K. Crawford, Critical questions for Big Data: provocations for a cultural, technological, and scholarly phenomenon, 2012

An interesting analysis
Both data analysis models and theoretical scientific models are there to solve a problem, one to solve a problem of data analysis, the other to solve a problem of describing an empirical phenomenon.
D.M. Bailer-Jones and C.A.L. Bailer-Jones, Modelling data: Analogies in neural networks, simulated annealing and genetic algorithms, 2002

An interesting analysis
Data analysis models
Beyond the goal of accurate prediction, the scientific insight that computational data models give in a specific case may be limited. Data analysis techniques are not specific to the type of data that are modelled. The techniques are designed to be independent of specific applications: they are application-neutral.
Theoretical scientific models
A theoretical scientific model is, in contrast, specific to a type of phenomenon. The theoretical concepts and laws that give shape to the theoretical model are chosen on the basis of the physical properties of the phenomenon to be modelled.
D.M. Bailer-Jones and C.A.L. Bailer-Jones, Modelling data: Analogies in neural networks, simulated annealing and genetic algorithms, 2002

An interesting analysis
D.M. Bailer-Jones and C.A.L. Bailer-Jones, Modelling data: Analogies in neural networks, simulated annealing and genetic algorithms, 2002

A tentative reconciliation
In contrast to new forms of empiricism, data-driven science seeks to hold to the tenets of the scientific method, but is more open to using a hybrid combination of abductive, inductive and deductive approaches to advance the understanding of a phenomenon.
It seeks to incorporate a mode of induction into the research design, though explanation through induction is not the intended end-point (as with empiricist approaches). It forms a new mode of hypothesis generation before a deductive approach is employed.
The epistemological strategy adopted within data-driven science is to use guided knowledge discovery techniques to identify potential questions (hypotheses) worthy of further examination and testing.
R. Kitchin, Big data, new epistemologies and paradigm shifts, 2014

A philosophical interpretation
The mechanization of induction
The business of data
The Big Data paradigm (data + computation)
Critical analysis
Tentative solutions (?)
Open problems?

Hume's Legacy
Hume's anti-rationalist polemic helped to introduce a gap between knowledge of the world and pure reasoning (Hume's fork).
Knowledge of the world = a product of repeated perceptions. Imagination becomes accustomed to foresee the order of events.
Note that this expectation subsumes a feeling of inevitability, somehow replacing the rejected rational necessity. It arises in the mind spontaneously and naturally, without the involvement of reason, merely because the mind is acted upon by the same objects in the same way repeatedly.
Induction is relocated to the level of a non-rational feeling whose reliability is left to the vivacity and the freshness of data perception.
So, removing any degree of rationality (or logos) within content experiences, we are led to reinforce the degree of connections

Open problems
Induction: abstraction and generalization?
Induction: models of data and models of phenomena?