APPLIED ECONOMICS IN THE DIGITAL ERA WEBDATANET TF25 www.webdatanet.eu Pablo de Pedraza SEVILLE 22 SEPTEMBER 2015
multidisciplinary network of web-based data-collection experts 2011-2015 Researchers & stakeholders from any different backgrounds & 34 countries Foster scientific use of web-based data to benefit society & public interests WG1 Quality WG2 Innovation WG3Implementation TF25: Applied Economics in the Digital Era
Digitalization & data flowing: - Capture, create and save more data than ever before -The internet of things Motivation/Stimulation Applied Economics have not fully tap into this opportunity We need a strategy for publicly funded research to be part of the new realities TF25: Applied Economics in the Digital Era What should a research agenda and strategy of academic and institutional researchers look like if applied economics is to fully benefit from Big data? 2.2.- Identify reasons of the limited use: Big Data Characterization 2.3.- Propose potential solutions: Strategies and actions
2.2.- Big data Characterization 1.The 5Vs in Social sciences 2. Case studies 3.Danger provision of data direct research 4. Big Data management 5. Big data epistemology 2.3.-Strategies and actions 1. Tap into current opportunities 2. Sources of error 3. Claiming access to data 4. Stakeholder consortium 5. Networking process
1.The 5Vs in Social sciences - 5vs 2.2.- (commercial) Big data Characterization do not guarantee validity for 2.3.- social Strategies sciences and actions - Volume: Definition (Hey et al. 2009, IBM 2013, Google 2013, 1.The 5Vs Microsoft, in Social 2013, sciences Intel 2013, Varian 1. 2013 ) Tap into current opportunities - Velocity: Nowcasting (Giannone et al. 2008) 2. Case studies - Variety: Models based on representative 2. Sources agents of error can improve - Value: short run (Mayer & cukier 2013) vs long run & societal 3. Danger - Veracity: provision Organic of data data direct Design 3. data Claiming access to data research - Good qualities for Social Sciences BUT not 4. Stakeholder generated by consortium any of the existing 4. scientific Big Data methods: management Uncertainty about generation process & sources of error (Coverage, Selectivity, non-response ) 5. Big data epistemology - Complementarity with web-survey methods (ex. wageindicator.com) & experiments (Reips 2007 & Edelman 2012). BENCHMARK.
2. Case Studies Are data 2.2.- able Big to answer data Characterization research questions? Examples: 2.3.- Strategies and actions -Prices (Cavallo 2011 www.bpp.mit.edu) 1.The -New 5Vs in papers Social webs sciences to identify stock 1. market Tap into bubbles current (Gerow opportunities & Keane, 2012) -Twitter mood correlation with DJIA (Bollena et al. 2011) 2. Case -Twitter studies to predict currency exchange 2. rates Sources (Janetzko of error 2014) -Google searches to predict: unemployment (Choi & Varian 2011) 3. Danger provision well of data being direct during 3. the Claiming crisis (Askitas access & to Zimmerman data 2011) research Housing markets & mortgage delinquencies Tourists inflows 4. (Artola Stakeholder & Pedraza consortium 2015) 4. Big -Wage Data management indicator for job insecurity (Bustillo & Pedraza 2010) and life satisfaction models Guzi & Pedraza 2015) 5. Big ( ) data epistemology File drawer effect (Rosenthal 1979), very little about limitations and how to overcome them (Couper 2013, Lazer et al. 2014, Butler 2013, Artola & Pedraza 2015) not only what but why (rumour, change in algorithms, search behavior ) = better access+ benchmark traditional data Problems are specific to each research goal
3. Danger 2.2.- Big provision data Characterization of data direct research 2.3.- Strategies and actions Data from Twitter, Facebook, Google, Amazon, or ebay, Master Card, Swift, and 1.The UPS are 5Vs hidden in Social for sciences science or available 1. only Tap for into few current researchers. opportunities 2. Also Case the studies public sector produce data disregarding 2. Sources potential of error scientific use (Askitas & Zimmerman 2014: The Toll Index) 3. Danger provision of data direct 3. Claiming access to data research Replicability in danger when it could increase given ICT 4. Stakeholder consortium 4. Open Big Data Data management Movements (World Bank, UN Global Pulse, OKF, DA ) NO SYSTEMATIC APPROACH FOR RESEARCHERS 5. Big data epistemology Problems in opening: privacy & regulations data keepers competitiveness benefits of sharing Cost of resources
2.2.- Big data Characterization 4. Big Data management 2.3.- Strategies and actions 1.The 5Vs in Social sciences 1. Tap into current opportunities It is possible to preserve privacy and open data conditional on specific licenses using existing technologies: 2. Case studies 2. Sources of error - data repositories (OAIS framework, FEDORA repository ) 3. Danger provision of data direct 3. Claiming access to data - models to support traceability of data (W3c Prov model) research - web resource archiving 4. Stakeholder consortium 4. Big Data management Data formats, Meta data describing data sets 5. Big data epistemology
5. Big 2.2.- data Big epistemology data Characterization 2.3.- Strategies and actions Not yet systematically used to test existing social sciences theoretical models 1.The 5Vs in Social sciences 1. Tap into current opportunities New tools and ways of thinking Ex. 2. Case If N=Millions studies everything is significant (Taylor 2. Sources et al 2014): of error 1 st approach pre-analysis with machine learning techniques 3. Danger provision Good: to of define data questions direct 3. & Claiming variables access at the light to data of theories research Critic: correlations but not casualty may pick up things that are not there 4. Stakeholder consortium 4. Big Data 2 nd approach management Econometrics using samples drawn from Big data 5. Big data epistemology Constant interaction between THEORY AND DATA. Complementarities & interactions among disciplines (Survey methods, Computer sciences, Math's and the Graph theory, Psychologists, Linguistics, )
2.2.- Big data Characterization 2.3.- Strategies and actions 1.The 5Vs in Social sciences 1. Tap into current opportunities Case Studies + Inductive reasoning feed the general approach 2. Case studies Goals: 2. Sources of error (ex. Google searches) 3. Danger provision of data direct 3. Claiming access to data Why research well performing models may break down 4. Stakeholder consortium Identify 4. Big complementary Data management of tools of any kind Identify 5. Big complementary data epistemology of data modes & methods Multidisciplinary interaction: computer sciences, experimental psychology, survey methodology, linguistics, applied mathematics, law (ex. http://www.eduworks-network.eu ) Quicker and more flexible dissemination means avoid File drawer effect
2.2.- Big data Characterization 2.3.- Strategies and actions 1.The 5Vs in Social sciences 2. Sources of 1. error: Tap into current opportunities Approach big data Quality by (Quality frameworks from Statistical offices): 2. Case studies 2. Sources of error - Sources of error using existing and new frameworks 3. Danger provision of data direct 3. Claiming access to data research -Interaction/complementarities among - data types: (non-reactive, experiments, 4. Stakeholder surveys ) consortium to test, evaluate, 4. Big Data complement, management improve - types of variables: ex subjective variables/reveled vs reported 5. Big data preferences) epistemology CONSTANT EVALUATION OF DATA -Origin/activities that generates DATA: Admin, transactional, social Networks, mobile devices
2.2.- Big data Characterization 3. Claiming access to 2.3.- data Strategies and actions 1.The Better 5Vs access in Social = better sciences data consumers protection 1. Tap into (insurance current opportunities companies) Assist legal experts 2. Case studies 2. Sources of error Specific licenses for researchers 3. Musgrave s Danger provision debate (Data of Taxation) data direct 3. Claiming access to data research Individual data philanthropy (Ex. Quantify 4. self) Stakeholder consortium 4. Corporate Big Data data management philanthropy (benefits of sharing, ex. Google) 5. Balance Big data inequalities epistemology in Access (UN 2014) Openness score Build upon & contribute to open data movements (OKF, WB )
2.2.- Big data Characterization 4. Stakeholder consortium 2.3.- Strategies and actions More vigilant in fighting Scientific/Public interests 1.The 5Vs in Social sciences 1. Tap into current opportunities Lobby to influence legislators 2. Case studies 2. Sources of error Coordination of Initiatives from International institutions 3. Danger provision of data direct 3. Claiming access to data research Higher education curricula & interaction among disciplines (www.webdatametrics.com ) 4. Stakeholder consortium 4. Big Data management Global infrastructure of data 5. Big data epistemology Role of Statistical Offices -Quality Frameworks (Eurostat, OECD, UNECE ): Claiming, curating, and making available while continue generating traditional (benchmark) data.
2.2.- Big data Characterization 2.3.- Strategies and actions Conclusions 1.The 5Vs in Social sciences 1. Tap into current opportunities Feed the general approach by inductive reasoning from case studies 2. Case studies 2. Sources of error 3. Danger provision Multidisciplinary of data direct networking 3. Claiming & discussions access to data research 4. Stakeholder consortium 4. Institutions Big Data management & researchers interact within a clear plan aiming benefit citizen's and consumers 5. Big data epistemology Access to data
Kea Tijdens, University of Amsterdam Prasanna Lal Das, The World Bank Ulf-Dietrich Reips, University of Konstanz Richard Freeman, Harvard University Muriel Foulonneau, Tudor Research Center Clemens A. Grünwald, Datenschutz Nord Group Concha Artola Central Bank of Spain Fernando Reis, Eurostat Laura Wilson, Office for National Statistics United Kingdom Fazel Ansari, University of Siegen Eva Beyvers, University of Passau.
Gracias Thank you