Use of social media data for official statistics

Similar documents
Population and housing census and Big Data. March 3, 2015

Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies

The Sandbox 2015 Report

This survey addresses individual projects, partnerships, data sources and tools. Please submit it multiple times - once for each project.

Sentiment analysis using emoticons

Exploring Big Data in Social Networks

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS November 7, Machine Learning Group

Semantic Sentiment Analysis of Twitter

Sentiment analysis on tweets in a financial domain

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Highly Scalable Tile-Based Visualization for Exploratory Data Analysis

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Keywords social media, internet, data, sentiment analysis, opinion mining, business

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Maschinelles Lernen mit MATLAB

Twitter Sentiment Analysis of Movie Reviews using Machine Learning Techniques.

Forecasting stock markets with Twitter

The Italian Hate Map:

Initial Report. Predicting association football match outcomes using social media and existing knowledge.

Topics for Industry Research Project and Partnership

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

SI485i : NLP. Set 6 Sentiment and Opinions

CS 229, Autumn 2011 Modeling the Stock Market Using Twitter Sentiment Analysis

NetView 360 Product Description

The Matsu Wheel: A Cloud-based Scanning Framework for Analyzing Large Volumes of Hyperspectral Data

UNIVERSITY OF INFINITE AMBITIONS. MASTER OF SCIENCE COMPUTER SCIENCE DATA SCIENCE AND SMART SERVICES

Microblog Sentiment Analysis with Emoticon Space Model

STATISTICAL ANALYSIS OF SOCIAL NETWORKS. Agostino Di Ciaccio, Giovanni Maria Giorgi

Mining a Corpus of Job Ads

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

Tweedr: Mining Twitter to Inform Disaster Response

Sentiment Analysis on Big Data

UN Global Pulse: Harnessing Big Data for a Revolution in Sustainable Development and Humanitarian Action Robert Kirkpatrick

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Economic and Social Council

Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation

How To Learn From The Revolution

Text Mining with R Twitter Data Analysis 1

Sentiment analysis for news articles

Big Data Analytics. Genoveva Vargas-Solar French Council of Scientific Research, LIG & LAFMIA Labs

Integrating a Big Data Platform into Government:

A Comparative Study on Sentiment Classification and Ranking on Product Reviews

End-to-End Sentiment Analysis of Twitter Data

Italian Journal of Accounting and Economia Aziendale. International Area. Year CXIV n. 1, 2 e 3

From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Analysis of Tweets for Prediction of Indian Stock Markets

SUMMER SCHOOL STRUCTURE

Using R for Social Media Analytics

Can Twitter provide enough information for predicting the stock market?

Data Mining Yelp Data - Predicting rating stars from review text

Social Market Analytics, Inc.

Package syuzhet. February 22, 2015

Geospatial Information for disaster risk reduction and natural resources management. Rolando Ocampo Alcántar

Role of Social Networking in Marketing using Data Mining

INDIVIDUAL COURSE DETAILS

QUANTIFYING THE EFFECTS OF ONLINE BULLISHNESS ON INTERNATIONAL FINANCIAL MARKETS

CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet

University of Guanajuato International Relations Office Language Department Social Sciences and Humanities Division Guanajuato Campus

Ganzheitliches Datenmanagement

Sentiment Analysis and Topic Classification: Case study over Spanish tweets

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

The Big Data Revolution: welcome to the Cognitive Era.

Data + Science Towards Organizational Excellence. eric Choo

A geo-referenced Census

DIY Social Sentiment Analysis in 3 Steps

PROGRAMME SPECIFICATION

Social Media Monitoring, Planning and Delivery

Exploiting Social Network for Forensic Analysis to Predict Civil Unrest

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

By: KPIT Technologies Limited. NHAI Portal Guide

Leveraging Big Data. A case study from Thomson Reuters

An interdisciplinary model for analytics education

NTT DATA Big Data Reference Architecture Ver. 1.0

Azure Machine Learning, SQL Data Mining and R

Artificial Intelligence and Machine Learning Models

Prepanet Monterrey Tech Community High School

Applying Machine Learning to Stock Market Trading Bryce Taylor

Twitter sentiment vs. Stock price!

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

Copyright 2013 Splunk Inc. Introducing Splunk 6

Scientific Data Infrastructure: activities in the Capacities Programme of FP7

Probes and Big Data: Opportunities and Challenges

Government Management Committee. P:\2013\Internal Services\I&T\gm13005I&T (AFS # 17768)

Sentiment Analysis of Movie Reviews and Twitter Statuses. Introduction

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

Big Data Little Impact?

CAS-ICT at TREC 2005 SPAM Track: Using Non-Textual Information to Improve Spam Filtering Performance

University of Guanajuato International Relations Office Language Department Social Sciences and Humanities Division Guanajuato Campus

Multilanguage sentiment-analysis of Twitter data on the example of Swiss politicians

Databricks. A Primer

Social Media Monitoring: Engage121

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Transcription:

Use of social media data for official statistics International Conference on Big Data for Official Statistics, October 2014, Beijing, China Big Data Team

1. Why Twitter 2. Subjective well-being 3. Tourism exercises 4. Mobility studies 5. Next projects 6. Institutional strategy Content

Readily available Why Twitter? Up to 1% of global tweets at no cost Around 12 M accounts in Mexico Geo-located tweets from 700 thousand accounts 90 M plus tweets downloaded since January 2014

Infrastructure Big Data Requirements Multidisciplinary task team Advanced computer and programming knowledge and skills Danger zone! Machine learning DATA SCIENTIST Traditional researchl Experts in statistics and mathematics Data analysts Resultados

Infrastructure for collecting tweets Cluster Notebooks (Hydra) NoSql Database Elasticsearch Unix Red Hat Big Data Layers

First trials with 20 M tweets

Subjective well-being

Twitter for Subjective Well-being Naive Bayes Structure of the tweet Ease of access Support Vector Machines (SVM) Randomness KNN Geographic filters Word Count Sentiment Analysis University of Pennsylvania Mood of the Nation UK Big Data and Official Statistics NL Workshop on Sentiment Analysis 2013 of the Spanish Society for Natural Language Processing (SEPLN) Spanish Emotion Lexicon (SEL) KNN AFINN, WordNet, ANEW

Twitter for Subjective Well-being Word counting (single or tokenized) will not be used. Dictionaries for sentiment analysis will not be used either. Instead, supervised learning method will be used. Sentiment in a sample of tweets is graded by humans resulting in a training set. This set is then used to find similarities in order to classify the remaining and future tweets for sentiment.

Twitter for Subjective Well-being http://cienciadedatos.inegi.org.mx/pioanalisis An app was developed by INEGI in order to classify tweets for sentiments: positive, negative or neutral, and then allocate them to different domains. Universidad Tec Milenio supports the exercise with about 3000 students who are classifying the training set

Twitter for Subjective Well-being The Spanish firm Lambdoop offered free of charge, the implementation of the processing software with the method called supervised learning to extract the sentiment from the tweets. The Dattlas Division of the firm KioNetworks, offered free of charge, the provision of a cluster with enough capacity to carry out our pilot tests and be able to identify, assess and to budget the HW and SW requirements.

Tourist Exercise

Twitter- Tourist Exercise Background: Collaboration with the Ministry of Tourism. 95% of Twitter users in Mexico are in the age group of 18 to 49. 87% of domestic tourists are between 18 and 55 years old.

Twitter- Tourist Exercise Production of statistics for the tourist sector: 60 M tweets were processed from January to July 2014. Guanajuato y Puebla, days February 1st, 2nd y 3rd. 7,955 twitters: 827,424 tweets produced from any other state during a 6 month period. Find out from which state people twitted and how long had they stayed in that state. Stays of 1 to 15 consecutive days in Puebla or Guanajuato. With data obtained generate a map for Puebla and another for Guanajuato.

A. Twitter- Tourist Exercise

B. Twitter- Tourist Exercise

B. Twitter- Tourist Exercise Trips to Guanajuato or Puebla before, during and after a long weekend

Mobility studies

Mobility studies Research for development of an analytical method to measure trans-border mobility trough GPS tweets. Mexican (red) and US(blue) tweets Mexican tweets

Mobility studies Using the National Roads Network to find out whether domestic mobility analysis can be conducted, current work. (Plot of 70 M tweets)

Mobility studies

Next projects and the institutional strategy

Other potential projects from Twitter Consumer confidence Indicators on public safety Definition of metropolitan areas Quality of tourist services Conformation of regions

Institutional steps: Framework: Data Revolution: get everyone involved Big Data Participate in the main national and international initiatives Partnerships with the government, research centers and universities Explore methodologies, data sets, Approach the private sector Publish results as experimental data Form an inter-disciplinary task team

Institutional steps: Specific actions: Big Data project jointly funded by CONACYT and INEGI Partnerships with research centers: Infotec (programing, data processing, infrastructure); Centro Geo (geographical research); CIMAT (mathematics). Establish a Big Data Laboratory in Aguascalientes

Conociendo México 01 800 111 46 34 www.inegi.org.mx atencion.usuarios@inegi.org.mx @inegi_informa INEGI Informa