Small area model-based estimators using big data sources

Size: px
Start display at page:

Download "Small area model-based estimators using big data sources"

Transcription

1 Small area model-based estimators using big data sources Monica Pratesi 1 Stefano Marchetti 2 Dino Pedreschi 3 Fosca Giannotti 4 Nicola Salvati 5 Filomena Maggino 6 1,2,5 Department of Economics and Management, University of Pisa 3 Department of Informatics, University of Pisa, 4 National Research Council of Italy 6 Department of Statistics and informatics, University of Florence NTTS conference, Bruxelles, 4-7 March 2013

2 Outline 1 Motivation 2 Social Mining 3 Small Area Estimation 4 Model-Based Estimators of Poverty Indicators 5 Using Big Data in Small Area Estimation

3 Motivation Part I Motivation

4 Motivation Motivation The timely, accurate monitoring of social indicators, such as poverty or inequality, at a fine grained spatial and temporal scale is a challenging task for official statistics, albeit a crucial tool for understanding social phenomena and policy making Statistical methods and social mining from big data represent today a concrete opportunity for understanding social complexity The aim is to improve our ability to measure, monitor and, possibly, predict social performance, well-being, deprivation poverty, exclusion, inequality, and so on, at a fine-grained spatial and temporal scale To succeed we need a statistical framework that allow to make inference with the big data

5 Social Mining Part II Social Mining

6 Social Mining Social Mining Social Mining aims to provide the analytical methods and associated algorithms needed to understand human behaviour by means of automated discovery of the underlying patterns, rules and profiles from the massive datasets of human activity records produced by social sensing We live in a time with unprecedented opportunities of sensing, storing, analyzing (micro)-data, at mass level, recording human activities at extreme detail Shopping patterns and lifestyle Relationships and social ties Desires, opinions, sentiments Movements

7 Social Mining Social Mining Figure: What happen in 60 seconds?

8 Social Mining Big Data and New Question to Ask Figure: Text analysis of Tweets in the USA in the 24 hours

9 Social Mining Big Data as Proxy of Human Mobility Figure: Impact of systematic mobility on access patterns

10 Social Mining Big Data as Proxy of Human Mobility Work-Home Home-Work Figure: Discovering individual systematic movements

11 Social Mining Big Data as Proxy of Human Mobility Figure: By GPS data is possible to identify new borders of mobility that are different from administrative borders

12 SAE Model-Based Estimators of Poverty Indicators Part III Model-Based Estimators of Poverty Indicators

13 SAE Model-Based Estimators of Poverty Indicators Introduction to Small Area Estimation Population of interest (or target population): population for which the survey is designed direct estimators should be reliable for the target population Domain: sub-population of the population of interest, they could be planned or not in the survey design Geographic areas (e.g. Regions, Provinces, Municipalities, Health Service Area) Socio-demographic groups (e.g. Sex, Age, Race within a large geographic area) Other sub-populations (e.g. the set of firms belonging to a industry subdivision) we don t know the reliability of direct estimators for the domains that have not been planned in the survey design

14 SAE Model-Based Estimators of Poverty Indicators Introduction to Small Area Estimation Often direct estimators are not reliable for some domains of interest In these cases we have two choices: oversampling over that domains applying statistical techniques that allow for reliable estimates in that domains Small Domain or Small Area Geographical area or domain where direct estimators do not reach a minimum level of precision Small Area Estimator (SAE) An estimator created to obtain reliable estimate in a Small Area

15 SAE Model-Based Estimators of Poverty Indicators Introduction to Small Area Estimation Modern small area estimation approach is based on model-based methods Statistical models link the variable of interest with covariate information unit level models when auxiliary variables are known for out f sample units area level models when auxiliary variables are known only at area level Unit level models are based on mixed models or M-quantile models Area level models are based on the Fay and Herriot model

16 SAE Model-Based Estimators of Poverty Indicators Small Area Estimation: Unit Level Model Let y ij be the target variable for unit j in area i Let x ij be the auxiliary vector of p auxiliary variables for unit j in area i Mixed effect model y ij = x T ij β + u i + ε ij u i N(0, σ 2 u) ε ij N(0, σ 2 ε) M-quantile models Q(y ij x ij ) qij = x T ij β(q ij ) y ij = x T ij β(θ i ) + ε ij Parameters are estimated respectively with restricted maximum likelihood and iterative weighted least squares By the use of these models estimates are more accurate because we take into account the between area variation

17 SAE Model-Based Estimators of Poverty Indicators Small Area Estimation: Fay-Harriot Model Based on mixed models Fay and Harriot modeled direct estimates with area level auxiliary variables Let θ i be a target parameter and theta ˆ i its direct estimate in area i Let x i be the auxiliary vector of p auxiliary variables for area i ˆθ i = θ i + ε i ε i N(0, σ 2 ε i ) θ i = x i β + u i u i N(0, σ 2 u) The Fay and Harriot model is than ˆθ i = x T i β + u i + ε i Model parameters can be estimated by maximum likelihood methods (σ 2 ε i are considered known) By the use of the auxiliary information the accuracy of the estimates is improved

18 SAE Model-Based Estimators of Poverty Indicators Model-Based Estimators of Poverty Indicators Denoting by t the poverty line different poverty measures are defined by using (Foster et al., 1984) ( t yij ) αi(yij F α,ij = t) i = 1,..., N t The population distribution function in small area j can be decomposed as follows [ ] F α,j = N 1 j F α,ij + F α,ij i s j i r j Setting α = 0 defines the Head Count Ratio whereas setting α = 1 defines the Poverty Gap. HCR and PG can be estimated with unit level models.

19 SAE Model-Based Estimators of Poverty Indicators Model-Based Estimators of Poverty Indicators By the use of HCR and PG estimators we can estimate 13 on 19 Laeken indicators The mean squared error of these estimator can be estimated by bootstrap techniques (Molina and Rao, 2010; Marchetti et al., 2012) The use of Fay and Herriot models for binary or count outcome is currently under development The use of well-specified models and the availability of auxiliary variable is a key point to allow the use of small area estimation methods

20 Using Big Data in Small Area Estimation Part IV Using Big Data in Small Area Estimation

21 Using Big Data in Small Area Estimation Small Area Estimation and Big Data Our aim is to use the huge source of data coming from human activities - the big data - to make accurate inference at a small area level. We identified three possible approaches: 1 Use big data as covariate in small area models 2 Use survey data to remove self-selection bias from estimates obtained using big data 3 Use big data to validate small area estimates

22 Using Big Data in Small Area Estimation Use Big Data as Covariate in Small Area Models Big data often provide unit level data Outcome variable have to be linked to auxiliary variable in order to use unit level data in a small area model Due to technical challenges and law restriction it is unfeasible at this stage to have unit level big data that can be linked with administrative archive or census or survey data Big data can be aggregate at area level and then used in an area level model ˆθ i = d T i β + u i + ε i d i is a vector of p variables gathered from big data sources

23 Using Big Data in Small Area Estimation Use Survey Data to Remove Self-Selection Bias from Estimates Obtained Using Big Data An option is to use big data directly to measure poverty and social exclusion It is realistic to think the big data are not representative of the whole population of interest (self-selection problem) Using a quality survey we can check difference in the distribution of common variables between big data and survey data If there aren t common variables we can use known correlated data to check difference in the distribution Given this difference we can compute weights that allow the reduction of bias due to the self-selection of the big data

24 Using Big Data in Small Area Estimation Use Big Data to Validate Small Area Estimates Poverty and deprivation measures obtained from big data can be compared with similar measures obtained from survey data If there is accordance between big data estimates and survey data estimates then there is a double checked evidence of the level of poverty and deprivation If there is discrepancy there is need of further investigation

25 Using Big Data in Small Area Estimation Essential Bibliography Breckling, J. and Chambers, R. (1988). M-quantiles. Biometrika, 75, Chambers, R., Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika 93 (2), Foster, J., Greer, J., Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica 52, Giannotti, F. and Pedreschi, D. (2008). Mobility, Data Mining and Privacy, Springer. Giannotti, F., Pedreschi, D., Pentland, A., Lukowicz, P., Kossmann, D., Crowley, J. and Helbing, D. (2012). A planetary nervous system for social mining and collective awareness. European Physics Journal - Special Topics 214, Marchetti, S., Tzavidis, N. and Pratesi, M. (2012). Non-parametric bootstrap mean squared error Estimation for M-quantile Estimators of Small Area Averages, Quantiles and Poverty Indicators. Computional Statistics and Data Analysis, 56(10), Molina, I., Rao, J. (2010). Small area estimation of poverty indicators. The Canadian Journal of Statistics.

Small area model-based estimators using big data sources

Small area model-based estimators using big data sources Small area model-based estimators using big data sources Monica Pratesi 1, Dino Pedreschi 2, Fosca Giannotti 3, Stefano Marchetti 4, Nicola Salvati 5, Filomena Maggino 6 1 University of Pisa, e-mail: m.pratesi@ec.unipi.it

More information

BIG DATA AND OFFICIAL STATISTICS. Filomena Maggino, Monica Pratesi

BIG DATA AND OFFICIAL STATISTICS. Filomena Maggino, Monica Pratesi BIG DATA AND OFFICIAL STATISTICS Filomena Maggino, Monica Pratesi What about risks, needs, and challenges of big-data in the context of measuring wellbeing? «Data are widely available, what is scarce is

More information

Small Area Model-Based Estimators Using Big Data Sources

Small Area Model-Based Estimators Using Big Data Sources Journal of Official Statistics, Vol. 31, No. 2, 2015, pp. 263 281, http://dx.doi.org/10.1515/jos-2015-0017 Small Area Model-Based Estimators Using Big Data Sources Stefano Marchetti 1, Caterina Giusti

More information

Monica Pratesi, University of Pisa

Monica Pratesi, University of Pisa DEVELOPING ROBUST AND STATISTICALLY BASED METHODS FOR SPATIAL DISAGGREGATION AND FOR INTEGRATION OF VARIOUS KINDS OF GEOGRAPHICAL INFORMATION AND GEO- REFERENCED SURVEY DATA Monica Pratesi, University

More information

Mobile phone data for Mobility statistics

Mobile phone data for Mobility statistics International Conference on Big Data for Official Statistics Organised by UNSD and NBS China Beijing, China, 28-30 October 2014 Mobile phone data for Mobility statistics Emanuele Baldacci Italian National

More information

Thomas Risse. 2 nd International Alexandria Workshop 2015

Thomas Risse. 2 nd International Alexandria Workshop 2015 Thomas Risse 2 nd International Alexandria Workshop 2015 We inhabit a measurable global society nowadays The increasing wealth of data is a chance to disentangle the social complexity Thomas Risse 2 Big

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

Estimating Economic Development with Mobile Phone Data

Estimating Economic Development with Mobile Phone Data Estimating Economic Development with Mobile Phone Data [Experiments and Analysis Paper] Luca Pappalardo KDD Lab University of Pisa Pisa, Italy lpappalardo@di.unipi.it Zbigniew Smoreda SENSE Orange Lab

More information

Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach

Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach Barbara Furletti, Lorenzo Gabrielli, Giuseppe Garofalo,

More information

SPATIAL MODELS IN SMALL AREA ESTIMATION IN THE CONTEXT OF OFFICIAL STATISTICS

SPATIAL MODELS IN SMALL AREA ESTIMATION IN THE CONTEXT OF OFFICIAL STATISTICS Statistica Applicata - Italian Journal of Applied Statistics Vol. 24 (1) 9 SPATIAL MODELS IN SMALL AREA ESTIMATION IN THE CONTEXT OF OFFICIAL STATISTICS Alessandra Petrucci 1 Department of Statistics,

More information

Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR:

Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR: Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR: Developing Real-time Census from Crowds of Greater Dhaka Ayumi Arai 1 and Ryosuke Shibasaki 1,2 1 Department

More information

TREND ANALYSIS OF MONETARY POVERTY MEASURES IN THE SLOVAK AND CZECH REPUBLIC

TREND ANALYSIS OF MONETARY POVERTY MEASURES IN THE SLOVAK AND CZECH REPUBLIC TREND ANALYSIS OF MONETARY POVERTY MEASURES IN THE SLOVAK AND CZECH REPUBLIC Iveta Stankovičová Róbert Vlačuha Ľudmila Ivančíková Abstract The EU statistics on income and living conditions (EU SILC) is

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

and second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity

and second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity and second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity in cooperation with Eurostat DIDACTIC ORGANIZATION A. Concepts

More information

Workshop Discussion Notes: Housing

Workshop Discussion Notes: Housing Workshop Discussion Notes: Housing Data & Civil Rights October 30, 2014 Washington, D.C. http://www.datacivilrights.org/ This document was produced based on notes taken during the Housing workshop of the

More information

Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS).

Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS. Vladislav Beresovsky National Center for Health Statistics 3311 Toledo Road Hyattsville, MD 078

More information

second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity

second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity LIST OF SUBJECTS AND TOPICS A. Concepts and tools Total: 7 credits

More information

Small area estimation of turnover of the Structural Business Survey

Small area estimation of turnover of the Structural Business Survey 12 1 Small area estimation of turnover of the Structural Business Survey Sabine Krieg, Virginie Blaess, Marc Smeets The views expressed in this paper are those of the author(s) and do not necessarily reflect

More information

Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region

Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region (DAStU, Politecnico di Milano) New «urban questions» and challenges

More information

Big Data & Privacy. It s Time for a New Deal on Personal Data Dino Pedreschi. KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr.

Big Data & Privacy. It s Time for a New Deal on Personal Data Dino Pedreschi. KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr. Big Data & Privacy It s Time for a New Deal on Personal Data Dino Pedreschi KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr.it Taiwan-Italy Workshop Roma 27 Feb 2015 SIAMO TUTTI POLLICINI DIGITALI

More information

DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE

DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE INTRODUCTION RESEARCH IN PRACTICE PAPER SERIES, FALL 2011. BUSINESS INTELLIGENCE AND PREDICTIVE ANALYTICS

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models

A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada

More information

The primary goal of this thesis was to understand how the spatial dependence of

The primary goal of this thesis was to understand how the spatial dependence of 5 General discussion 5.1 Introduction The primary goal of this thesis was to understand how the spatial dependence of consumer attitudes can be modeled, what additional benefits the recovering of spatial

More information

How To Find Influence Between Two Concepts In A Network

How To Find Influence Between Two Concepts In A Network 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

Statistics Canada s National Household Survey: State of knowledge for Quebec users

Statistics Canada s National Household Survey: State of knowledge for Quebec users Statistics Canada s National Household Survey: State of knowledge for Quebec users Information note December 2, 2013 INSTITUT DE LA STATISTIQUE DU QUÉBEC Statistics Canada s National Household Survey:

More information

Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1

Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1 Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1 Vladislav Beresovsky (hvy4@cdc.gov), Janey Hsiao National Center for Health Statistics, CDC

More information

SMALL AREA ESTIMATES OF THE POPULATION DISTRIBUTION BY ETHNIC GROUP IN ENGLAND: A PROPOSAL USING STRUCTURE PRESERVING ESTIMATORS

SMALL AREA ESTIMATES OF THE POPULATION DISTRIBUTION BY ETHNIC GROUP IN ENGLAND: A PROPOSAL USING STRUCTURE PRESERVING ESTIMATORS STATISTICS IN TRANSITION new series and SURVEY METHODOLOGY 585 STATISTICS IN TRANSITION new series and SURVEY METHODOLOGY Joint Issue: Small Area Estimation 2014 Vol. 16, No. 4, pp. 585 602 SMALL AREA

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng.

Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng. Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng. Paolo Nesi Dipartimento di Ingegneria dell Informazione, DINFO Università

More information

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials 5th August 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations

More information

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

CSAC, April 16-17, 2015 Discussion: Big Data and Modernizing Federal Statistics: Update by Bill Bostic and Ron Jarmin

CSAC, April 16-17, 2015 Discussion: Big Data and Modernizing Federal Statistics: Update by Bill Bostic and Ron Jarmin CSAC, April 16-17, 2015 Discussion: Big Data and Modernizing Federal Statistics: Update by Bill Bostic and Ron Jarmin Noel Cressie National Institute for Applied Statistics Research Australia (NIASRA)

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Selectivity of Big data

Selectivity of Big data Discussion Paper Selectivity of Big data The views expressed in this paper are those of the author(s) and do not necessarily reflect the policies of Statistics Netherlands 2014 11 Bart Buelens Piet Daas

More information

Bootstrapping Big Data

Bootstrapping Big Data Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

More information

Constructing Semantic Interpretation of Routine and Anomalous Mobility Behaviors from Big Data

Constructing Semantic Interpretation of Routine and Anomalous Mobility Behaviors from Big Data Constructing Semantic Interpretation of Routine and Anomalous Mobility Behaviors from Big Data Georg Fuchs 1,3, Hendrik Stange 1, Dirk Hecker 1, Natalia Andrienko 1,2,3, Gennady Andrienko 1,2,3 1 Fraunhofer

More information

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes

Knowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &

More information

Science Europe Position Statement. On the Proposed European General Data Protection Regulation MAY 2013

Science Europe Position Statement. On the Proposed European General Data Protection Regulation MAY 2013 Science Europe Position Statement On the Proposed European General Data Protection Regulation MAY 2013 Science Europe Position Statement on the Proposal for a Regulation of the European Parliament and

More information

Analysis of GSM calls data for understanding user mobility behavior

Analysis of GSM calls data for understanding user mobility behavior Analysis of GSM calls data for understanding user mobility behavior Barbara Furletti, Lorenzo Gabrielli, Chiara Renso, Salvatore Rinzivillo KDDLab, ISTI-CNR, Pisa, Italy Email: {barbara.furletti,lorenzo.gabrielli,chiara.renso,salvatore.rinzivillo}@isti.cnr.it

More information

DATA MINING TECHNIQUES FOR CRM

DATA MINING TECHNIQUES FOR CRM International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013 509 DATA MINING TECHNIQUES FOR CRM R.Senkamalavalli, Research Scholar, SCSVMV University, Enathur, Kanchipuram

More information

Big Picture of Big Data Software Engineering With example research challenges

Big Picture of Big Data Software Engineering With example research challenges Big Picture of Big Data Software Engineering With example research challenges Nazim H. Madhavji, UWO, Canada Andriy Miranskyy, Ryerson U., Canada Kostas Kontogiannis, NTUA, Greece madhavji@gmail.com avm@ryerson.ca

More information

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Charles J. Schwartz Principal, Intelligent Analytical Services Demographic analysis has become a fact of life in market

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

Neural Networks for Sentiment Detection in Financial Text

Neural Networks for Sentiment Detection in Financial Text Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Discussion of Presentations on Commercial Big Data and Official Economic Statistics

Discussion of Presentations on Commercial Big Data and Official Economic Statistics Discussion of Presentations on Commercial Big Data and Official Economic Statistics John L. Eltinge U.S. Bureau of Labor Statistics Presentation to the Federal Economic Statistics Advisory Committee June

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Federal Statistics and College Entrepreneurships

Federal Statistics and College Entrepreneurships Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Noel Cressie, Scott H. Holan, and Christopher K. Wikle Department of Statistics,

More information

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

Big Data / Privacy: Pick One?

Big Data / Privacy: Pick One? Big Data / Privacy: Pick One? A. Michael Froomkin University of Miami School of Law froomkin@law.miami.edu 1 Privacy, Quickly Has multiple elements including control of access to body, to thoughts, protection

More information

Strategic Activities to Support Sustainability of Canada s Geospatial Data Infrastructure

Strategic Activities to Support Sustainability of Canada s Geospatial Data Infrastructure Strategic Activities to Support Sustainability of Canada s Geospatial Data Infrastructure Paula McLeod Canada Centre for Mapping and Earth Observation United Nations 10 th Regional Cartographic Conference

More information

CHAPTER-24 Mining Spatial Databases

CHAPTER-24 Mining Spatial Databases CHAPTER-24 Mining Spatial Databases 24.1 Introduction 24.2 Spatial Data Cube Construction and Spatial OLAP 24.3 Spatial Association Analysis 24.4 Spatial Clustering Methods 24.5 Spatial Classification

More information

Privacy-by-design in big data analytics and social mining

Privacy-by-design in big data analytics and social mining Monreale et al. EPJ Data Science 2014, 2014:10 REGULAR ARTICLE OpenAccess Privacy-by-design in big data analytics and social mining Anna Monreale 1,2*, Salvatore Rinzivillo 2, Francesca Pratesi 1,2, Fosca

More information

Content-Based Discovery of Twitter Influencers

Content-Based Discovery of Twitter Influencers Content-Based Discovery of Twitter Influencers Chiara Francalanci, Irma Metra Department of Electronics, Information and Bioengineering Polytechnic of Milan, Italy irma.metra@mail.polimi.it chiara.francalanci@polimi.it

More information

Measuring BDC s impact on its clients

Measuring BDC s impact on its clients From BDC s Economic Research and Analysis Team July 213 INCLUDED IN THIS REPORT This report is based on a Statistics Canada analysis that assessed the impact of the Business Development Bank of Canada

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division Building the Global Information System Elements of

More information

Using mobile phone data to map human population distribution

Using mobile phone data to map human population distribution Using mobile phone data to map human population distribution Pierre Deville, Vincent D. Blondel Université catholique de Louvain, Belgium Andrew J. Tatem University of Southampton, UK Marius Gilbert Université

More information

Mapping Linear Networks Based on Cellular Phone Tracking

Mapping Linear Networks Based on Cellular Phone Tracking Ronen RYBOWSKI, Aaron BELLER and Yerach DOYTSHER, Israel Key words: Cellular Phones, Cellular Network, Linear Networks, Mapping. ABSTRACT The paper investigates the ability of accurately mapping linear

More information

Regression analysis of probability-linked data

Regression analysis of probability-linked data Regression analysis of probability-linked data Ray Chambers University of Wollongong James Chipperfield Australian Bureau of Statistics Walter Davis Statistics New Zealand 1 Overview 1. Probability linkage

More information

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents

More information

DEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY. MASTER in Informatics Engineering

DEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY. MASTER in Informatics Engineering DEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY MASTER in Informatics Engineering Module general information Module name BIG DATA ANALYTICS SPECIALITY Typology Optional ECTS 18 Temporal organization C1S2

More information

Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region

Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region International Journal of Computational Engineering Research Vol, 03 Issue, 8 Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region 1, Salim Diwani, 2, Suzan Mishol, 3, Daniel

More information

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS Advances and Applications in Statistical Sciences Proceedings of The IV Meeting on Dynamics of Social and Economic Systems Volume 2, Issue 2, 2010, Pages 303-314 2010 Mili Publications ISSUES IN RULE BASED

More information

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

More information

How to gather and evaluate information

How to gather and evaluate information 09 May 2016 How to gather and evaluate information Chartered Institute of Internal Auditors Information is central to the role of an internal auditor. Gathering and evaluating information is the basic

More information

Big Data and Graph Analytics in a Health Care Setting

Big Data and Graph Analytics in a Health Care Setting Big Data and Graph Analytics in a Health Care Setting Supercomputing 12 November 15, 2012 Bob Techentin Mayo Clinic SPPDG Archive 43738-1 Archive 43738-2 What is the Mayo Clinic? Mayo Clinic Mission: To

More information

Statistique. Canada. Statistics. Canada

Statistique. Canada. Statistics. Canada The DLI and RDC Program: Valuable Research Resources for WIHIR Researchers Sandra Keyes, DLI Representative, UW Pat Newcombe-Welch, Data Analyst, SWORDC G. Keith Warriner, Co-director, South-Western Ontario

More information

Country Paper: Automation of data capture, data processing and dissemination of the 2009 National Population and Housing Census in Vanuatu.

Country Paper: Automation of data capture, data processing and dissemination of the 2009 National Population and Housing Census in Vanuatu. Country Paper: Automation of data capture, data processing and dissemination of the 2009 National Population and Housing Census in Vanuatu. For Expert Group Meeting Opportunities and advantages of enhanced

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Use of mobile phone data to estimate visitors mobility flows

Use of mobile phone data to estimate visitors mobility flows Use of mobile phone data to estimate visitors mobility flows Lorenzo Gabrielli, Barbara Furletti, Fosca Giannotti, Mirco Nanni, Salvatore Rinzivillo Abstract Big Data originating from the digital breadcrumbs

More information

A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace

A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace what is this class about? health informatics managing and making sense of biomedical information but mostly from an

More information

Government Management Committee. P:\2013\Internal Services\I&T\gm13005I&T (AFS # 17768)

Government Management Committee. P:\2013\Internal Services\I&T\gm13005I&T (AFS # 17768) STAFF REPORT ACTION REQUIRED Social Sentiment Analysis Date: May 31, 2013 To: From: Wards: Reference Number: Government Management Committee Director, 311 Toronto Chief Information Officer All P:\2013\Internal

More information

Big Data for Development: What May Determine Success or failure?

Big Data for Development: What May Determine Success or failure? Big Data for Development: What May Determine Success or failure? Emmanuel Letouzé letouze@unglobalpulse.org OECD Technology Foresight 2012 Paris, October 22 Swimming in Ocean of data Data deluge Algorithms

More information

Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics

Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of

More information

Short-Term Forecasting in Retail Energy Markets

Short-Term Forecasting in Retail Energy Markets Itron White Paper Energy Forecasting Short-Term Forecasting in Retail Energy Markets Frank A. Monforte, Ph.D Director, Itron Forecasting 2006, Itron Inc. All rights reserved. 1 Introduction 4 Forecasting

More information

Cross-Validation. Synonyms Rotation estimation

Cross-Validation. Synonyms Rotation estimation Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical

More information

THE POWER OF SOCIAL NETWORKS TO DRIVE MOBILE MONEY ADOPTION

THE POWER OF SOCIAL NETWORKS TO DRIVE MOBILE MONEY ADOPTION THE POWER OF SOCIAL NETWORKS TO DRIVE MOBILE MONEY ADOPTION This paper was commissioned by CGAP to Real Impact Analytics Public version March 2013 CGAP 2013, All Rights Reserved EXECUTIVE SUMMARY This

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

From Research Question to Exploratory Analysis

From Research Question to Exploratory Analysis From Research Question to Exploratory Analysis Jo Wathan Ros Southern UK Data Service 21 Nov 2014 Today Research questions and data for questions Finding data in the UK Data Service Evaluating data for

More information

Autologistic Regression Model for Poverty Mapping and Analysis

Autologistic Regression Model for Poverty Mapping and Analysis Metodološki zvezki, Vol. 1, No. 1, 2004, 225-234 Autologistic Regression Model for Poverty Mapping and Analysis Alessandra Petrucci, Nicola Salvati, and Chiara Seghieri 1 Abstract Poverty mapping in developing

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Discovering Local Subgroups, with an Application to Fraud Detection

Discovering Local Subgroups, with an Application to Fraud Detection Discovering Local Subgroups, with an Application to Fraud Detection Abstract. In Subgroup Discovery, one is interested in finding subgroups that behave differently from the average behavior of the entire

More information

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential

More information

FRAMEWORK TO EVALUATE INTERNET USE AND DIGITAL DIVIDE IN FIRMS

FRAMEWORK TO EVALUATE INTERNET USE AND DIGITAL DIVIDE IN FIRMS FRAMEWORK TO EVALUATE INTERNET USE AND DIGITAL DIVIDE IN FIRMS Michela Serrecchia * Istituto di Informatica e Telematica, CNR * Via G. Moruzzi, 1-56124 Pisa, Italy * Maurizio Martinelli * Istituto di Informatica

More information

In-Database Analytics

In-Database Analytics Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing

More information

PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER

PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER Michael Paul (@mjp39) Johns Hopkins University Crowdsourcing and Human Computation Lecture 18 Learning about the real world through Twitter

More information

Data Mining for Customer Relationship Management (CRM)

Data Mining for Customer Relationship Management (CRM) Data Mining for Customer Relationship Management (CRM) Jaideep Srivastava srivasta@cs.umn.edu 1 Introduction Data Mining has enjoyed great popularity in recent years, with advances in both research and

More information

Integrating Numeric, Statistical, and Geospatial Data Services for Graduate Students

Integrating Numeric, Statistical, and Geospatial Data Services for Graduate Students Integrating Numeric, Statistical, and Geospatial Data Services for Graduate Students Maria A. Jankowska UCLA Research Library IASSIST Conference, June, 2012 Agenda The Age of Big Data/ UN Global Pulse

More information

Review on Financial Forecasting using Neural Network and Data Mining Technique

Review on Financial Forecasting using Neural Network and Data Mining Technique ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. www.computerscijournal.org ISSN:

More information

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado Does Big Data spell Big Trouble for Lawyers? Paul Karlzen Director HR Information & Analytics April 1, 2015 Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

More information