Small area model-based estimators using big data sources
|
|
- Edward Briggs
- 8 years ago
- Views:
Transcription
1 Small area model-based estimators using big data sources Monica Pratesi 1 Stefano Marchetti 2 Dino Pedreschi 3 Fosca Giannotti 4 Nicola Salvati 5 Filomena Maggino 6 1,2,5 Department of Economics and Management, University of Pisa 3 Department of Informatics, University of Pisa, 4 National Research Council of Italy 6 Department of Statistics and informatics, University of Florence NTTS conference, Bruxelles, 4-7 March 2013
2 Outline 1 Motivation 2 Social Mining 3 Small Area Estimation 4 Model-Based Estimators of Poverty Indicators 5 Using Big Data in Small Area Estimation
3 Motivation Part I Motivation
4 Motivation Motivation The timely, accurate monitoring of social indicators, such as poverty or inequality, at a fine grained spatial and temporal scale is a challenging task for official statistics, albeit a crucial tool for understanding social phenomena and policy making Statistical methods and social mining from big data represent today a concrete opportunity for understanding social complexity The aim is to improve our ability to measure, monitor and, possibly, predict social performance, well-being, deprivation poverty, exclusion, inequality, and so on, at a fine-grained spatial and temporal scale To succeed we need a statistical framework that allow to make inference with the big data
5 Social Mining Part II Social Mining
6 Social Mining Social Mining Social Mining aims to provide the analytical methods and associated algorithms needed to understand human behaviour by means of automated discovery of the underlying patterns, rules and profiles from the massive datasets of human activity records produced by social sensing We live in a time with unprecedented opportunities of sensing, storing, analyzing (micro)-data, at mass level, recording human activities at extreme detail Shopping patterns and lifestyle Relationships and social ties Desires, opinions, sentiments Movements
7 Social Mining Social Mining Figure: What happen in 60 seconds?
8 Social Mining Big Data and New Question to Ask Figure: Text analysis of Tweets in the USA in the 24 hours
9 Social Mining Big Data as Proxy of Human Mobility Figure: Impact of systematic mobility on access patterns
10 Social Mining Big Data as Proxy of Human Mobility Work-Home Home-Work Figure: Discovering individual systematic movements
11 Social Mining Big Data as Proxy of Human Mobility Figure: By GPS data is possible to identify new borders of mobility that are different from administrative borders
12 SAE Model-Based Estimators of Poverty Indicators Part III Model-Based Estimators of Poverty Indicators
13 SAE Model-Based Estimators of Poverty Indicators Introduction to Small Area Estimation Population of interest (or target population): population for which the survey is designed direct estimators should be reliable for the target population Domain: sub-population of the population of interest, they could be planned or not in the survey design Geographic areas (e.g. Regions, Provinces, Municipalities, Health Service Area) Socio-demographic groups (e.g. Sex, Age, Race within a large geographic area) Other sub-populations (e.g. the set of firms belonging to a industry subdivision) we don t know the reliability of direct estimators for the domains that have not been planned in the survey design
14 SAE Model-Based Estimators of Poverty Indicators Introduction to Small Area Estimation Often direct estimators are not reliable for some domains of interest In these cases we have two choices: oversampling over that domains applying statistical techniques that allow for reliable estimates in that domains Small Domain or Small Area Geographical area or domain where direct estimators do not reach a minimum level of precision Small Area Estimator (SAE) An estimator created to obtain reliable estimate in a Small Area
15 SAE Model-Based Estimators of Poverty Indicators Introduction to Small Area Estimation Modern small area estimation approach is based on model-based methods Statistical models link the variable of interest with covariate information unit level models when auxiliary variables are known for out f sample units area level models when auxiliary variables are known only at area level Unit level models are based on mixed models or M-quantile models Area level models are based on the Fay and Herriot model
16 SAE Model-Based Estimators of Poverty Indicators Small Area Estimation: Unit Level Model Let y ij be the target variable for unit j in area i Let x ij be the auxiliary vector of p auxiliary variables for unit j in area i Mixed effect model y ij = x T ij β + u i + ε ij u i N(0, σ 2 u) ε ij N(0, σ 2 ε) M-quantile models Q(y ij x ij ) qij = x T ij β(q ij ) y ij = x T ij β(θ i ) + ε ij Parameters are estimated respectively with restricted maximum likelihood and iterative weighted least squares By the use of these models estimates are more accurate because we take into account the between area variation
17 SAE Model-Based Estimators of Poverty Indicators Small Area Estimation: Fay-Harriot Model Based on mixed models Fay and Harriot modeled direct estimates with area level auxiliary variables Let θ i be a target parameter and theta ˆ i its direct estimate in area i Let x i be the auxiliary vector of p auxiliary variables for area i ˆθ i = θ i + ε i ε i N(0, σ 2 ε i ) θ i = x i β + u i u i N(0, σ 2 u) The Fay and Harriot model is than ˆθ i = x T i β + u i + ε i Model parameters can be estimated by maximum likelihood methods (σ 2 ε i are considered known) By the use of the auxiliary information the accuracy of the estimates is improved
18 SAE Model-Based Estimators of Poverty Indicators Model-Based Estimators of Poverty Indicators Denoting by t the poverty line different poverty measures are defined by using (Foster et al., 1984) ( t yij ) αi(yij F α,ij = t) i = 1,..., N t The population distribution function in small area j can be decomposed as follows [ ] F α,j = N 1 j F α,ij + F α,ij i s j i r j Setting α = 0 defines the Head Count Ratio whereas setting α = 1 defines the Poverty Gap. HCR and PG can be estimated with unit level models.
19 SAE Model-Based Estimators of Poverty Indicators Model-Based Estimators of Poverty Indicators By the use of HCR and PG estimators we can estimate 13 on 19 Laeken indicators The mean squared error of these estimator can be estimated by bootstrap techniques (Molina and Rao, 2010; Marchetti et al., 2012) The use of Fay and Herriot models for binary or count outcome is currently under development The use of well-specified models and the availability of auxiliary variable is a key point to allow the use of small area estimation methods
20 Using Big Data in Small Area Estimation Part IV Using Big Data in Small Area Estimation
21 Using Big Data in Small Area Estimation Small Area Estimation and Big Data Our aim is to use the huge source of data coming from human activities - the big data - to make accurate inference at a small area level. We identified three possible approaches: 1 Use big data as covariate in small area models 2 Use survey data to remove self-selection bias from estimates obtained using big data 3 Use big data to validate small area estimates
22 Using Big Data in Small Area Estimation Use Big Data as Covariate in Small Area Models Big data often provide unit level data Outcome variable have to be linked to auxiliary variable in order to use unit level data in a small area model Due to technical challenges and law restriction it is unfeasible at this stage to have unit level big data that can be linked with administrative archive or census or survey data Big data can be aggregate at area level and then used in an area level model ˆθ i = d T i β + u i + ε i d i is a vector of p variables gathered from big data sources
23 Using Big Data in Small Area Estimation Use Survey Data to Remove Self-Selection Bias from Estimates Obtained Using Big Data An option is to use big data directly to measure poverty and social exclusion It is realistic to think the big data are not representative of the whole population of interest (self-selection problem) Using a quality survey we can check difference in the distribution of common variables between big data and survey data If there aren t common variables we can use known correlated data to check difference in the distribution Given this difference we can compute weights that allow the reduction of bias due to the self-selection of the big data
24 Using Big Data in Small Area Estimation Use Big Data to Validate Small Area Estimates Poverty and deprivation measures obtained from big data can be compared with similar measures obtained from survey data If there is accordance between big data estimates and survey data estimates then there is a double checked evidence of the level of poverty and deprivation If there is discrepancy there is need of further investigation
25 Using Big Data in Small Area Estimation Essential Bibliography Breckling, J. and Chambers, R. (1988). M-quantiles. Biometrika, 75, Chambers, R., Tzavidis, N. (2006). M-quantile models for small area estimation. Biometrika 93 (2), Foster, J., Greer, J., Thorbecke, E. (1984). A class of decomposable poverty measures. Econometrica 52, Giannotti, F. and Pedreschi, D. (2008). Mobility, Data Mining and Privacy, Springer. Giannotti, F., Pedreschi, D., Pentland, A., Lukowicz, P., Kossmann, D., Crowley, J. and Helbing, D. (2012). A planetary nervous system for social mining and collective awareness. European Physics Journal - Special Topics 214, Marchetti, S., Tzavidis, N. and Pratesi, M. (2012). Non-parametric bootstrap mean squared error Estimation for M-quantile Estimators of Small Area Averages, Quantiles and Poverty Indicators. Computional Statistics and Data Analysis, 56(10), Molina, I., Rao, J. (2010). Small area estimation of poverty indicators. The Canadian Journal of Statistics.
Small area model-based estimators using big data sources
Small area model-based estimators using big data sources Monica Pratesi 1, Dino Pedreschi 2, Fosca Giannotti 3, Stefano Marchetti 4, Nicola Salvati 5, Filomena Maggino 6 1 University of Pisa, e-mail: m.pratesi@ec.unipi.it
More informationBIG DATA AND OFFICIAL STATISTICS. Filomena Maggino, Monica Pratesi
BIG DATA AND OFFICIAL STATISTICS Filomena Maggino, Monica Pratesi What about risks, needs, and challenges of big-data in the context of measuring wellbeing? «Data are widely available, what is scarce is
More informationSmall Area Model-Based Estimators Using Big Data Sources
Journal of Official Statistics, Vol. 31, No. 2, 2015, pp. 263 281, http://dx.doi.org/10.1515/jos-2015-0017 Small Area Model-Based Estimators Using Big Data Sources Stefano Marchetti 1, Caterina Giusti
More informationMonica Pratesi, University of Pisa
DEVELOPING ROBUST AND STATISTICALLY BASED METHODS FOR SPATIAL DISAGGREGATION AND FOR INTEGRATION OF VARIOUS KINDS OF GEOGRAPHICAL INFORMATION AND GEO- REFERENCED SURVEY DATA Monica Pratesi, University
More informationMobile phone data for Mobility statistics
International Conference on Big Data for Official Statistics Organised by UNSD and NBS China Beijing, China, 28-30 October 2014 Mobile phone data for Mobility statistics Emanuele Baldacci Italian National
More informationThomas Risse. 2 nd International Alexandria Workshop 2015
Thomas Risse 2 nd International Alexandria Workshop 2015 We inhabit a measurable global society nowadays The increasing wealth of data is a chance to disentangle the social complexity Thomas Risse 2 Big
More informationExample application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health
Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining
More informationEstimating Economic Development with Mobile Phone Data
Estimating Economic Development with Mobile Phone Data [Experiments and Analysis Paper] Luca Pappalardo KDD Lab University of Pisa Pisa, Italy lpappalardo@di.unipi.it Zbigniew Smoreda SENSE Orange Lab
More informationUse of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach
Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach Barbara Furletti, Lorenzo Gabrielli, Giuseppe Garofalo,
More informationSPATIAL MODELS IN SMALL AREA ESTIMATION IN THE CONTEXT OF OFFICIAL STATISTICS
Statistica Applicata - Italian Journal of Applied Statistics Vol. 24 (1) 9 SPATIAL MODELS IN SMALL AREA ESTIMATION IN THE CONTEXT OF OFFICIAL STATISTICS Alessandra Petrucci 1 Department of Statistics,
More informationEstimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR:
Estimation of Human Mobility Patterns and Attributes Analyzing Anonymized Mobile Phone CDR: Developing Real-time Census from Crowds of Greater Dhaka Ayumi Arai 1 and Ryosuke Shibasaki 1,2 1 Department
More informationTREND ANALYSIS OF MONETARY POVERTY MEASURES IN THE SLOVAK AND CZECH REPUBLIC
TREND ANALYSIS OF MONETARY POVERTY MEASURES IN THE SLOVAK AND CZECH REPUBLIC Iveta Stankovičová Róbert Vlačuha Ľudmila Ivančíková Abstract The EU statistics on income and living conditions (EU SILC) is
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationand second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity
and second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity in cooperation with Eurostat DIDACTIC ORGANIZATION A. Concepts
More informationWorkshop Discussion Notes: Housing
Workshop Discussion Notes: Housing Data & Civil Rights October 30, 2014 Washington, D.C. http://www.datacivilrights.org/ This document was produced based on notes taken during the Housing workshop of the
More informationMethodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS).
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS. Vladislav Beresovsky National Center for Health Statistics 3311 Toledo Road Hyattsville, MD 078
More informationsecond level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity
second level university master Academic Year 2013/14 QoLexity Measuring, Monitoring and Analysis of Quality of Life and its Complexity LIST OF SUBJECTS AND TOPICS A. Concepts and tools Total: 7 credits
More informationSmall area estimation of turnover of the Structural Business Survey
12 1 Small area estimation of turnover of the Structural Business Survey Sabine Krieg, Virginie Blaess, Marc Smeets The views expressed in this paper are those of the author(s) and do not necessarily reflect
More informationProcesses of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region
Processes of urban regionalization in Italy: a focus on mobility practices explained through mobile phone data in the Milan urban region (DAStU, Politecnico di Milano) New «urban questions» and challenges
More informationBig Data & Privacy. It s Time for a New Deal on Personal Data Dino Pedreschi. KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr.
Big Data & Privacy It s Time for a New Deal on Personal Data Dino Pedreschi KDD LAB ISTI CNR and Univ. of Pisa http://kdd.isti.cnr.it Taiwan-Italy Workshop Roma 27 Feb 2015 SIAMO TUTTI POLLICINI DIGITALI
More informationDIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE
DIGITS CENTER FOR DIGITAL INNOVATION, TECHNOLOGY, AND STRATEGY THOUGHT LEADERSHIP FOR THE DIGITAL AGE INTRODUCTION RESEARCH IN PRACTICE PAPER SERIES, FALL 2011. BUSINESS INTELLIGENCE AND PREDICTIVE ANALYTICS
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationA Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models
A Composite Likelihood Approach to Analysis of Survey Data with Sampling Weights Incorporated under Two-Level Models Grace Y. Yi 13, JNK Rao 2 and Haocheng Li 1 1. University of Waterloo, Waterloo, Canada
More informationThe primary goal of this thesis was to understand how the spatial dependence of
5 General discussion 5.1 Introduction The primary goal of this thesis was to understand how the spatial dependence of consumer attitudes can be modeled, what additional benefits the recovering of spatial
More informationHow To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
More informationCI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.
CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes
More informationStatistics Canada s National Household Survey: State of knowledge for Quebec users
Statistics Canada s National Household Survey: State of knowledge for Quebec users Information note December 2, 2013 INSTITUT DE LA STATISTIQUE DU QUÉBEC Statistics Canada s National Household Survey:
More informationMethodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1
Methodological aspects of small area estimation from the National Electronic Health Records Survey (NEHRS). 1 Vladislav Beresovsky (hvy4@cdc.gov), Janey Hsiao National Center for Health Statistics, CDC
More informationSMALL AREA ESTIMATES OF THE POPULATION DISTRIBUTION BY ETHNIC GROUP IN ENGLAND: A PROPOSAL USING STRUCTURE PRESERVING ESTIMATORS
STATISTICS IN TRANSITION new series and SURVEY METHODOLOGY 585 STATISTICS IN TRANSITION new series and SURVEY METHODOLOGY Joint Issue: Small Area Estimation 2014 Vol. 16, No. 4, pp. 585 602 SMALL AREA
More informationInformation Visualization WS 2013/14 11 Visual Analytics
1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and
More informationAutomating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng.
Automating Big Data Management, by DISIT Lab Distributed [Systems and Internet, Data Intelligence] Technologies Lab Prof. Ph.D. Eng. Paolo Nesi Dipartimento di Ingegneria dell Informazione, DINFO Università
More informationEPSRC Cross-SAT Big Data Workshop: Well Sorted Materials
EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials 5th August 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations
More informationAssociation Technique on Prediction of Chronic Diseases Using Apriori Algorithm
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),
More informationHandling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
More informationExploring Big Data in Social Networks
Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationUnderstanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
More informationCSAC, April 16-17, 2015 Discussion: Big Data and Modernizing Federal Statistics: Update by Bill Bostic and Ron Jarmin
CSAC, April 16-17, 2015 Discussion: Big Data and Modernizing Federal Statistics: Update by Bill Bostic and Ron Jarmin Noel Cressie National Institute for Applied Statistics Research Australia (NIASRA)
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationSelectivity of Big data
Discussion Paper Selectivity of Big data The views expressed in this paper are those of the author(s) and do not necessarily reflect the policies of Statistics Netherlands 2014 11 Bart Buelens Piet Daas
More informationBootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
More informationConstructing Semantic Interpretation of Routine and Anomalous Mobility Behaviors from Big Data
Constructing Semantic Interpretation of Routine and Anomalous Mobility Behaviors from Big Data Georg Fuchs 1,3, Hendrik Stange 1, Dirk Hecker 1, Natalia Andrienko 1,2,3, Gennady Andrienko 1,2,3 1 Fraunhofer
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &
More informationScience Europe Position Statement. On the Proposed European General Data Protection Regulation MAY 2013
Science Europe Position Statement On the Proposed European General Data Protection Regulation MAY 2013 Science Europe Position Statement on the Proposal for a Regulation of the European Parliament and
More informationAnalysis of GSM calls data for understanding user mobility behavior
Analysis of GSM calls data for understanding user mobility behavior Barbara Furletti, Lorenzo Gabrielli, Chiara Renso, Salvatore Rinzivillo KDDLab, ISTI-CNR, Pisa, Italy Email: {barbara.furletti,lorenzo.gabrielli,chiara.renso,salvatore.rinzivillo}@isti.cnr.it
More informationDATA MINING TECHNIQUES FOR CRM
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013 509 DATA MINING TECHNIQUES FOR CRM R.Senkamalavalli, Research Scholar, SCSVMV University, Enathur, Kanchipuram
More informationBig Picture of Big Data Software Engineering With example research challenges
Big Picture of Big Data Software Engineering With example research challenges Nazim H. Madhavji, UWO, Canada Andriy Miranskyy, Ryerson U., Canada Kostas Kontogiannis, NTUA, Greece madhavji@gmail.com avm@ryerson.ca
More informationGetting the Most from Demographics: Things to Consider for Powerful Market Analysis
Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Charles J. Schwartz Principal, Intelligent Analytical Services Demographic analysis has become a fact of life in market
More informationData Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control
Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;
More informationNeural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
More informationMarketing Mix Modelling and Big Data P. M Cain
1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored
More informationDiscussion of Presentations on Commercial Big Data and Official Economic Statistics
Discussion of Presentations on Commercial Big Data and Official Economic Statistics John L. Eltinge U.S. Bureau of Labor Statistics Presentation to the Federal Economic Statistics Advisory Committee June
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationFederal Statistics and College Entrepreneurships
Training Undergraduates, Graduate Students, Postdocs, and Federal Agencies: Methodology, Data, and Science for Federal Statistics Noel Cressie, Scott H. Holan, and Christopher K. Wikle Department of Statistics,
More informationUsing Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen
More informationCustomer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
More informationBig Data / Privacy: Pick One?
Big Data / Privacy: Pick One? A. Michael Froomkin University of Miami School of Law froomkin@law.miami.edu 1 Privacy, Quickly Has multiple elements including control of access to body, to thoughts, protection
More informationStrategic Activities to Support Sustainability of Canada s Geospatial Data Infrastructure
Strategic Activities to Support Sustainability of Canada s Geospatial Data Infrastructure Paula McLeod Canada Centre for Mapping and Earth Observation United Nations 10 th Regional Cartographic Conference
More informationCHAPTER-24 Mining Spatial Databases
CHAPTER-24 Mining Spatial Databases 24.1 Introduction 24.2 Spatial Data Cube Construction and Spatial OLAP 24.3 Spatial Association Analysis 24.4 Spatial Clustering Methods 24.5 Spatial Classification
More informationPrivacy-by-design in big data analytics and social mining
Monreale et al. EPJ Data Science 2014, 2014:10 REGULAR ARTICLE OpenAccess Privacy-by-design in big data analytics and social mining Anna Monreale 1,2*, Salvatore Rinzivillo 2, Francesca Pratesi 1,2, Fosca
More informationContent-Based Discovery of Twitter Influencers
Content-Based Discovery of Twitter Influencers Chiara Francalanci, Irma Metra Department of Electronics, Information and Bioengineering Polytechnic of Milan, Italy irma.metra@mail.polimi.it chiara.francalanci@polimi.it
More informationMeasuring BDC s impact on its clients
From BDC s Economic Research and Analysis Team July 213 INCLUDED IN THIS REPORT This report is based on a Statistics Canada analysis that assessed the impact of the Business Development Bank of Canada
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationBig Data, Official Statistics and Social Science Research: Emerging Data Challenges
Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division Building the Global Information System Elements of
More informationUsing mobile phone data to map human population distribution
Using mobile phone data to map human population distribution Pierre Deville, Vincent D. Blondel Université catholique de Louvain, Belgium Andrew J. Tatem University of Southampton, UK Marius Gilbert Université
More informationMapping Linear Networks Based on Cellular Phone Tracking
Ronen RYBOWSKI, Aaron BELLER and Yerach DOYTSHER, Israel Key words: Cellular Phones, Cellular Network, Linear Networks, Mapping. ABSTRACT The paper investigates the ability of accurately mapping linear
More informationRegression analysis of probability-linked data
Regression analysis of probability-linked data Ray Chambers University of Wollongong James Chipperfield Australian Bureau of Statistics Walter Davis Statistics New Zealand 1 Overview 1. Probability linkage
More informationBEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More informationDEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY. MASTER in Informatics Engineering
DEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY MASTER in Informatics Engineering Module general information Module name BIG DATA ANALYTICS SPECIALITY Typology Optional ECTS 18 Temporal organization C1S2
More informationOverview Applications of Data Mining In Health Care: The Case Study of Arusha Region
International Journal of Computational Engineering Research Vol, 03 Issue, 8 Overview Applications of Data Mining In Health Care: The Case Study of Arusha Region 1, Salim Diwani, 2, Suzan Mishol, 3, Daniel
More informationISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS
Advances and Applications in Statistical Sciences Proceedings of The IV Meeting on Dynamics of Social and Economic Systems Volume 2, Issue 2, 2010, Pages 303-314 2010 Mili Publications ISSUES IN RULE BASED
More informationA General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center
More informationHow to gather and evaluate information
09 May 2016 How to gather and evaluate information Chartered Institute of Internal Auditors Information is central to the role of an internal auditor. Gathering and evaluating information is the basic
More informationBig Data and Graph Analytics in a Health Care Setting
Big Data and Graph Analytics in a Health Care Setting Supercomputing 12 November 15, 2012 Bob Techentin Mayo Clinic SPPDG Archive 43738-1 Archive 43738-2 What is the Mayo Clinic? Mayo Clinic Mission: To
More informationStatistique. Canada. Statistics. Canada
The DLI and RDC Program: Valuable Research Resources for WIHIR Researchers Sandra Keyes, DLI Representative, UW Pat Newcombe-Welch, Data Analyst, SWORDC G. Keith Warriner, Co-director, South-Western Ontario
More informationCountry Paper: Automation of data capture, data processing and dissemination of the 2009 National Population and Housing Census in Vanuatu.
Country Paper: Automation of data capture, data processing and dissemination of the 2009 National Population and Housing Census in Vanuatu. For Expert Group Meeting Opportunities and advantages of enhanced
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationUse of mobile phone data to estimate visitors mobility flows
Use of mobile phone data to estimate visitors mobility flows Lorenzo Gabrielli, Barbara Furletti, Fosca Giannotti, Mirco Nanni, Salvatore Rinzivillo Abstract Big Data originating from the digital breadcrumbs
More informationA.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace
A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace what is this class about? health informatics managing and making sense of biomedical information but mostly from an
More informationGovernment Management Committee. P:\2013\Internal Services\I&T\gm13005I&T (AFS # 17768)
STAFF REPORT ACTION REQUIRED Social Sentiment Analysis Date: May 31, 2013 To: From: Wards: Reference Number: Government Management Committee Director, 311 Toronto Chief Information Officer All P:\2013\Internal
More informationBig Data for Development: What May Determine Success or failure?
Big Data for Development: What May Determine Success or failure? Emmanuel Letouzé letouze@unglobalpulse.org OECD Technology Foresight 2012 Paris, October 22 Swimming in Ocean of data Data deluge Algorithms
More informationChapter 6: Point Estimation. Fall 2011. - Probability & Statistics
STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of
More informationShort-Term Forecasting in Retail Energy Markets
Itron White Paper Energy Forecasting Short-Term Forecasting in Retail Energy Markets Frank A. Monforte, Ph.D Director, Itron Forecasting 2006, Itron Inc. All rights reserved. 1 Introduction 4 Forecasting
More informationCross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
More informationTHE POWER OF SOCIAL NETWORKS TO DRIVE MOBILE MONEY ADOPTION
THE POWER OF SOCIAL NETWORKS TO DRIVE MOBILE MONEY ADOPTION This paper was commissioned by CGAP to Real Impact Analytics Public version March 2013 CGAP 2013, All Rights Reserved EXECUTIVE SUMMARY This
More informationPRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
More informationFrom Research Question to Exploratory Analysis
From Research Question to Exploratory Analysis Jo Wathan Ros Southern UK Data Service 21 Nov 2014 Today Research questions and data for questions Finding data in the UK Data Service Evaluating data for
More informationAutologistic Regression Model for Poverty Mapping and Analysis
Metodološki zvezki, Vol. 1, No. 1, 2004, 225-234 Autologistic Regression Model for Poverty Mapping and Analysis Alessandra Petrucci, Nicola Salvati, and Chiara Seghieri 1 Abstract Poverty mapping in developing
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationDiscovering Local Subgroups, with an Application to Fraud Detection
Discovering Local Subgroups, with an Application to Fraud Detection Abstract. In Subgroup Discovery, one is interested in finding subgroups that behave differently from the average behavior of the entire
More informationSimilarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle
Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential
More informationFRAMEWORK TO EVALUATE INTERNET USE AND DIGITAL DIVIDE IN FIRMS
FRAMEWORK TO EVALUATE INTERNET USE AND DIGITAL DIVIDE IN FIRMS Michela Serrecchia * Istituto di Informatica e Telematica, CNR * Via G. Moruzzi, 1-56124 Pisa, Italy * Maurizio Martinelli * Istituto di Informatica
More informationIn-Database Analytics
Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing
More informationPUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER
PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER Michael Paul (@mjp39) Johns Hopkins University Crowdsourcing and Human Computation Lecture 18 Learning about the real world through Twitter
More informationData Mining for Customer Relationship Management (CRM)
Data Mining for Customer Relationship Management (CRM) Jaideep Srivastava srivasta@cs.umn.edu 1 Introduction Data Mining has enjoyed great popularity in recent years, with advances in both research and
More informationIntegrating Numeric, Statistical, and Geospatial Data Services for Graduate Students
Integrating Numeric, Statistical, and Geospatial Data Services for Graduate Students Maria A. Jankowska UCLA Research Library IASSIST Conference, June, 2012 Agenda The Age of Big Data/ UN Global Pulse
More informationReview on Financial Forecasting using Neural Network and Data Mining Technique
ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY An International Open Free Access, Peer Reviewed Research Journal Published By: Oriental Scientific Publishing Co., India. www.computerscijournal.org ISSN:
More informationPattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University
Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision
More informationHealthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
More informationBig Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado
Does Big Data spell Big Trouble for Lawyers? Paul Karlzen Director HR Information & Analytics April 1, 2015 Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado
More information