I-CiTies 2015: 2015 CINI Annual Workshop on ICT for Smart Cities and Communities, Palermo (Italy), October 29-30, 2015. The Italian Hate Map: semantic content analytics for social good (Università degli Studi di Bari Aldo Moro, Italy, SWAP Research Group)
The Italian Hate Map. Inspired by the Hate Map built by Humboldt University. Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights. http://users.humboldt.edu/mstephens/hate/hate_map.html
The Italian Hate Map. Insight: aggregate raw people-based data in order to analyze complex phenomena.
The Italian Hate Map (not a new idea). Map of cholera in London, 1854: red = cholera cases, blue = water.
The Italian Hate Map. Research question: is it possible to extract and process social media content to detect intolerant posts on social networks and identify the most at-risk areas of Italy?
CrowdPulse: a framework for real-time semantic analysis of social streams
CrowdPulse features: Social Data Extraction, Sentiment Analysis, Semantic Tagging, Processing & Visualization
CrowdPulse workflow
CrowdPulse Step 1: Social Data Extraction
CrowdPulse Step 1: Social Data Extraction. Source, Extraction Heuristics
CrowdPulse Step 1: Social Data Extraction. Source and extraction heuristics: Content (e.g. #earthquake, #traffic, #democrats, #icities2015, #www2015), User (e.g. @barack_obama, @comunepalermo, @comunefi), Geo, Content+Geo, Page, Group. We only extract public content.
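The extraction heuristics listed on this slide can be pictured as filters over a stream of posts. The following is a minimal sketch, not CrowdPulse's actual code: the `Post` record, the function names, and the bounding-box representation of the Geo heuristic are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass
class Post:
    """Hypothetical record for one social post."""
    author: str
    text: str
    coords: Optional[Tuple[float, float]] = None  # (lat, lon) if geotagged
    public: bool = True


def by_content(post: Post, terms: Sequence[str]) -> bool:
    # Content heuristic: the text contains at least one hashtag or term.
    text = post.text.lower()
    return any(term.lower() in text for term in terms)


def by_user(post: Post, users: Sequence[str]) -> bool:
    # User heuristic: the post was written by one of the given accounts.
    return post.author.lower() in {u.lower().lstrip("@") for u in users}


def by_geo(post: Post, bbox: Tuple[float, float, float, float]) -> bool:
    # Geo heuristic: the post is geotagged inside (south, west, north, east).
    if post.coords is None:
        return False
    lat, lon = post.coords
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def extract(posts: Iterable[Post], terms=None, users=None, bbox=None) -> List[Post]:
    """Keep public posts that satisfy every heuristic that was supplied.

    Passing both terms and bbox corresponds to the Content+Geo heuristic.
    """
    kept = []
    for post in posts:
        if not post.public:
            continue  # only public content is extracted
        if terms is not None and not by_content(post, terms):
            continue
        if users is not None and not by_user(post, users):
            continue
        if bbox is not None and not by_geo(post, bbox):
            continue
        kept.append(post)
    return kept
```

Combining heuristics by passing several arguments at once keeps each filter small and makes Content+Geo fall out of the same function.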
Use Case: The Italian Hate Map. CrowdPulse settings.
Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-semitism
Use Case: The Italian Hate Map. CrowdPulse settings. Extracted content (seed term: nano/midget): a Tweet about an Italian ministry, a Tweet about the iPod nano, a Tweet about an Italian football player.
Use Case: The Italian Hate Map. CrowdPulse settings. Of the same three Tweets, two are marked with an X: many non-intolerant Tweets are extracted!
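The noise problem described here is easy to reproduce with plain keyword matching. The sketch below is illustrative only: the one-entry seed list and the example texts are invented (the real list of 76 seed terms is not reproduced in the slides), and the tokenizer is an assumption.

```python
import re
from typing import Set

# Illustrative seed list with a single term taken from the slide.
SEED_TERMS = {"nano"}


def matched_seed_terms(text: str, seed_terms: Set[str] = SEED_TERMS) -> Set[str]:
    """Return the seed terms that occur as whole words in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return tokens & seed_terms


examples = [
    "Ho comprato un iPod nano nuovo",          # a product, not an insult
    "È inutile, il mio nano non segnerà mai",  # ambiguous use of the term
    "Che bella giornata a Palermo",            # no seed term at all
]

# Keyword matching keeps the first two texts, although only one of them
# could possibly be intolerant.
extracted = [t for t in examples if matched_seed_terms(t)]
```

Whole-word matching already avoids substring hits, but it cannot tell a product name from an insult, which is the gap the later semantic step is meant to close.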
Use Case: The Italian Hate Map. CrowdPulse settings: Sentiment Analysis and Semantic Tagging of the content
Semantic Tagging: Motivations. Does "nano" mean midget or iPod nano? A keyword-based representation introduces a lot of noise into the analysis.
Semantic Tagging: Motivations. "È inutile, il mio nano non segnerà mai" ("It's no use, my midget will never score"). Intolerant? Not intolerant?
CrowdPulse Step 2: Semantic Tagging. Solution: semantic processing of the extracted content. Entity linking algorithms: TagMe (1), DBpedia Spotlight (2). Input: textual content. Output: identification and disambiguation of the entities mentioned in the text. (1) http://tagme.di.unipi.it (2) http://spotlight.dbpedia.org
Use Case: The Italian Hate Map. CrowdPulse settings: non-intolerant Tweets are detected and filtered out.
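One way to picture this filtering step is to take the output of an entity linker and drop a Tweet when its seed term belongs to a mention linked to a harmless entity. The slides do not state the exact rule, so the sketch below is an assumption: the `Annotation` type, the `BENIGN_ENTITIES` set, and the entity identifier are hypothetical, and no real entity linking service is called.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    """Hypothetical shape of one entity linking result."""
    mention: str  # surface form found in the text
    entity: str   # identifier of the linked entity


# Entities whose mentions make a seed term harmless (illustrative).
BENIGN_ENTITIES = {"IPod_Nano"}


def is_benign_use(seed_term: str, annotations: List[Annotation]) -> bool:
    """True if the seed term is part of a mention linked to a benign entity."""
    for ann in annotations:
        words = ann.mention.lower().split()
        if seed_term in words and ann.entity in BENIGN_ENTITIES:
            return True
    return False


def filter_tweets(tweets: List[Tuple[str, List[Annotation]]], seed_term: str) -> List[str]:
    """Keep only the texts whose seed term was not disambiguated as benign."""
    return [text for text, anns in tweets if not is_benign_use(seed_term, anns)]
```

In this sketch a Tweet with no annotations is kept, on the assumption that the absence of a benign entity leaves the seed term potentially intolerant.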
CrowdPulse Step 3: Sentiment Analysis
Sentiment Analysis: Motivations. Is this content conveying any opinion? This is a crucial issue if people-based findings have to be generated.
Sentiment Analysis: Definition. "It is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (*). We concentrated on the polarity detection task. (*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.
Use Case: The Italian Hate Map. CrowdPulse settings: Tweets with positive or neutral sentiment are detected and filtered out.
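Polarity detection followed by this filter can be sketched with a toy lexicon-based scorer. The slides do not say which classifier CrowdPulse uses, so everything below is an assumption: the four-word lexicon is invented and far too small for real use, and the scoring rule is the simplest one that yields three polarity classes.

```python
import re
from typing import Dict, List

# Toy polarity lexicon, invented for illustration only.
LEXICON: Dict[str, int] = {"odio": -1, "schifo": -1, "amo": 1, "bello": 1}


def polarity(text: str) -> str:
    """Classify a text as negative, neutral, or positive by summing word scores."""
    score = sum(LEXICON.get(tok, 0) for tok in re.findall(r"\w+", text.lower()))
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


def keep_negative(tweets: List[str]) -> List[str]:
    """Drop Tweets with positive or neutral polarity."""
    return [t for t in tweets if polarity(t) == "negative"]
```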
CrowdPulse Step 4: Processing
Use Case: The Italian Hate Map. CrowdPulse settings: we have to build a map, so we only need geotagged content.
Use Case: The Italian Hate Map. CrowdPulse settings: heuristics are defined to increase the number of geotagged Tweets.
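The slides do not say which heuristics were used to increase the number of geotagged Tweets. One plausible example, shown purely as an assumption, is to fall back on the location declared in the author's profile when a Tweet has no coordinates of its own. The field names, the `GAZETTEER` dictionary, and its approximate coordinates are hypothetical.

```python
from typing import Dict, Optional, Tuple

Coords = Tuple[float, float]

# Tiny illustrative gazetteer: city name -> approximate (lat, lon).
GAZETTEER: Dict[str, Coords] = {
    "palermo": (38.12, 13.36),
    "bari": (41.12, 16.87),
}


def resolve_coords(tweet: dict) -> Optional[Coords]:
    """Return the Tweet's own coordinates, else a guess from the profile location."""
    if tweet.get("coords") is not None:
        return tweet["coords"]
    location = (tweet.get("profile_location") or "").lower()
    for name, coords in GAZETTEER.items():
        if name in location:
            return coords
    return None  # still not geotagged
```

A profile location is only a proxy for where a Tweet was written, so a fallback of this kind trades positional accuracy for coverage.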
Use Case: The Italian Hate Map
Dimension        #Tweets      #Geo     %Geo
Homophobia       110,774     8,501    7.66%
Racism           154,170     1,940    1.24%
Violence       1,102,494    28,886    2.62%
Disability       479,654     3,410    0.75%
Anti-Semitism      6,000     1,150   18.03%
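The %Geo column is the share of geotagged Tweets in each dimension, that is, #Geo divided by #Tweets. The sketch below recomputes it from the counts as printed on the slide; the variable and function names are illustrative. Recomputed values are close to the listed percentages but not identical in every row, which may reflect rounding or counts taken at different times.

```python
from typing import Dict, Tuple

# (total Tweets, geotagged Tweets) per dimension, copied from the slide.
COUNTS: Dict[str, Tuple[int, int]] = {
    "Homophobia": (110_774, 8_501),
    "Racism": (154_170, 1_940),
    "Violence": (1_102_494, 28_886),
    "Disability": (479_654, 3_410),
    "Anti-Semitism": (6_000, 1_150),
}


def geo_share(total: int, geotagged: int) -> float:
    """Percentage of Tweets that carry a geotag."""
    return 100.0 * geotagged / total


total_tweets = sum(total for total, _ in COUNTS.values())
total_geo = sum(geo for _, geo in COUNTS.values())

for dimension, (total, geo) in COUNTS.items():
    print(f"{dimension:<14} {geo_share(total, geo):6.2f}%")
```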
CrowdPulse Step 4: Data Visualization
Use Case: The Italian Hate Map. CrowdPulse output: maps for Violence against women and Disability (based on OpenStreetMap)
Use Case: The Italian Hate Map. CrowdPulse output: maps for Racism and Homophobia (based on OpenStreetMap)
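Before geotagged Tweets can be drawn as a heat map, they are typically aggregated into spatial bins. The slides only say that the maps are based on OpenStreetMap, so the grid binning below is an assumption, not the authors' method; the cell size and function names are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Coords = Tuple[float, float]


def cell(lat: float, lon: float, size: float = 0.5) -> Tuple[int, int]:
    """Index of the square grid cell (size in degrees) containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def heat_counts(coords: Iterable[Coords], size: float = 0.5) -> Counter:
    """Count geotagged Tweets per grid cell; higher counts mean hotter cells."""
    return Counter(cell(lat, lon, size) for lat, lon in coords)
```

Each cell count can then be mapped to a color intensity on top of the base map tiles.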
Conclusions: The Italian Hate Map. Crowdsourcing-based approach:
1. Social content containing the seed terms is extracted and processed in real time
2. Semantic Processing exploited to delete non-intolerant Tweets
3. Sentiment Analysis used to filter out Tweets with irony
4. Analytics Console used to build real-time hate maps
Almost 2,000,000 pieces of social content extracted and analyzed.
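The approach summarized here is a chain of filters, each narrowing the set of Tweets before the map is built. The sketch below shows that composition under stated assumptions: each Tweet is a dictionary whose fields (`seed_hit`, `benign_entity`, `polarity`, `coords`) stand in for the results of the earlier steps, and the stage names and ordering are illustrative rather than taken from CrowdPulse.

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[dict], bool]

# Ordered filter stages; each keeps a Tweet only if its predicate holds.
STAGES: List[Tuple[str, Predicate]] = [
    ("seed term matched", lambda t: t["seed_hit"]),
    ("not a benign entity", lambda t: not t["benign_entity"]),
    ("negative polarity", lambda t: t["polarity"] == "negative"),
    ("geotagged", lambda t: t["coords"] is not None),
]


def run_pipeline(tweets: List[dict], stages=STAGES) -> Tuple[List[dict], Dict[str, int]]:
    """Apply the stages in order and record how many Tweets survive each one."""
    survivors: Dict[str, int] = {}
    for name, keep in stages:
        tweets = [t for t in tweets if keep(t)]
        survivors[name] = len(tweets)
    return tweets, survivors
```

Recording the survivor count per stage makes it easy to see where most of the extracted content is discarded.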
Lessons Learned
Lessons Learned: The Italian Hate Map. Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined some guidelines to tackle and prevent intolerant behaviors. These guidelines were freely distributed to public administrations in early 2015.
Lessons Learned: definition of a framework for real-time semantic content analysis. A pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization. Use case: The Italian Hate Map. Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.
questions? Cataldo Musto, PhD cataldo.musto@uniba.it @cataldomusto http://www.di.uniba.it/~swap