The Italian Hate Map: semantic content analytics for social good


I-CiTies 2015: 2015 CINI Annual Workshop on ICT for Smart Cities and Communities, Palermo (Italy), October 29-30, 2015
The Italian Hate Map: semantic content analytics for social good (Università degli Studi di Bari Aldo Moro, Italy - SWAP Research Group)


The Italian Hate Map. Inspired by the Hate Map built by Humboldt University. Joint research with a team of psychologists from Rome University and a non-profit agency focused on human rights. http://users.humboldt.edu/mstephens/hate/hate_map.html

The Italian Hate Map. Insight: aggregate raw, people-generated data in order to analyze complex phenomena. http://users.humboldt.edu/mstephens/hate/hate_map.html

The Italian Hate Map. Not a new idea: map of cholera in London, 1854 (red = cholera cases, blue = water).

The Italian Hate Map. Research question: is it possible to extract and process social media content to detect intolerant posts on social networks and identify the most at-risk areas of Italy?

CrowdPulse: a framework for real-time semantic analysis of social streams.

CrowdPulse features: Social Data Extraction, Sentiment Analysis, Semantic Tagging, Processing & Visualization.

CrowdPulse workflow

CrowdPulse Step 1: Social Data Extraction. Source; Extraction Heuristics.

CrowdPulse Step 1: Social Data Extraction. Source; Extraction Heuristics: Content (e.g. #earthquake, #traffic, #democrats, #icities2015), User (e.g. @barack_obama, @comunepalermo), Geo, Content+Geo, Page, Group.

CrowdPulse Step 1: Social Data Extraction. Source; Extraction Heuristics: Content (e.g. #earthquake, #traffic, #democrats, #www2015), User (e.g. @barack_obama, @comunefi), Geo, Content+Geo, Page, Group. We only extract public content.

Use Case: The Italian Hate Map. CrowdPulse settings:
Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-Semitism

Use Case: The Italian Hate Map. CrowdPulse settings. Extracted content (seed term: nano/midget): a Tweet about an Italian ministry, a Tweet about the iPod nano, a Tweet about an Italian football player.

Use Case: The Italian Hate Map. CrowdPulse settings. Extracted content: a Tweet about an Italian ministry, a Tweet about the iPod nano, a Tweet about an Italian football player (two of them are marked with an X). Many non-intolerant Tweets are extracted!
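The noise produced by seed-term extraction can be illustrated with a minimal sketch. It is not the CrowdPulse implementation: the names `SEED_TERMS` and `match_seed_terms`, the assignment of "nano" to the disability dimension, and the iPod sample Tweet are assumptions made for illustration (the real lexicon has 76 terms, which the slides do not list).

```python
import re

# Hypothetical seed lexicon: term -> intolerance dimension.
# Only "nano" is mentioned in the slides; its dimension here is an assumption.
SEED_TERMS = {"nano": "disability"}


def match_seed_terms(text, seed_terms=SEED_TERMS):
    """Return (term, dimension) pairs whose term occurs as a whole token in text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted((term, dim) for term, dim in seed_terms.items() if term in tokens)


tweets = [
    "E inutile, il mio nano non segnerà mai",  # football-related Tweet from the slides
    "Ho comprato un iPod nano",                # invented example about a product
    "Che bella giornata a Palermo",            # invented example, no seed term
]

# Keyword matching keeps every Tweet that contains a seed term,
# whether or not it is actually intolerant.
extracted = [t for t in tweets if match_seed_terms(t)]
```

Both Tweets containing "nano" are extracted, including the one about a product, which is why the later semantic and sentiment filters are needed.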

Use Case: The Italian Hate Map. CrowdPulse settings: Sentiment Analysis and Semantic Tagging of the content.

Semantic Tagging: Motivations. Does "nano" mean "midget" or "iPod nano"? A keyword-based representation introduces a lot of noise into the analysis.

Semantic Tagging: Motivations. "E inutile, il mio nano non segnerà mai" ("It's no use, my midget will never score"): intolerant or not intolerant?

CrowdPulse Step 2: Semantic Tagging. Solution: semantic processing of the extracted content with Entity Linking algorithms. Input: textual content. Output: identification and disambiguation of the entities mentioned in the text. Algorithms: (1) TagMe, http://tagme.di.unipi.it (2) DBpedia Spotlight, http://spotlight.dbpedia.org
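How disambiguation removes non-intolerant matches can be sketched with a toy sense inventory. This does not call TagMe or DBpedia Spotlight; the entity identifiers, cue words and the `SENSES`, `disambiguate` and `filter_intolerant` names are all hypothetical stand-ins for what a real entity linker would provide.

```python
import re

# Hypothetical sense inventory for an ambiguous seed term.
# Each entry: (entity id, context cue words, potentially intolerant?).
# The last entry has no cues and acts as the default sense.
SENSES = {
    "nano": [
        ("iPod_Nano", {"ipod", "apple"}, False),
        ("Dwarfism", set(), True),
    ],
}


def disambiguate(term, text):
    """Pick the first sense whose cue words appear in the text, else the default."""
    tokens = set(re.findall(r"\w+", text.lower()))
    for entity, cues, intolerant in SENSES[term]:
        if not cues or cues & tokens:
            return entity, intolerant


def filter_intolerant(tweets, term):
    """Keep only Tweets where the term resolves to a potentially intolerant sense."""
    return [t for t in tweets if disambiguate(term, t)[1]]


kept = filter_intolerant(["Ho comprato un iPod nano", "Sei un nano"], "nano")
```

A real linker scores candidate entities against a knowledge base rather than using hand-written cues; the sketch only shows where the filtering decision is taken.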

Use Case: The Italian Hate Map. CrowdPulse settings: non-intolerant Tweets are detected and filtered out.

CrowdPulse Step 3: Sentiment Analysis

Sentiment Analysis: Motivations. Is this content conveying any opinion? This is a crucial issue if people-based findings have to be generated.

Sentiment Analysis: Definition. "It is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008). We concentrated on the polarity detection task.

Use Case: The Italian Hate Map. CrowdPulse settings: Tweets with positive or neutral sentiment are detected and filtered out.
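The polarity filter can be sketched with a tiny lexicon-based scorer. The slides do not say which polarity classifier CrowdPulse uses, so this is only a stand-in: the word lists and the `polarity` and `keep_negative` names are assumptions.

```python
import re

# Hypothetical polarity lexicons (Italian); a real system would use a much
# larger lexicon or a trained classifier.
POSITIVE = {"grande", "bello", "amo"}
NEGATIVE = {"odio", "schifo", "brutto"}


def polarity(text):
    """Classify text as 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def keep_negative(tweets):
    """Drop Tweets with positive or neutral polarity, as described in the slide."""
    return [t for t in tweets if polarity(t) == "negative"]


remaining = keep_negative(["Odio il traffico", "Grande partita", "Oggi piove"])
```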


CrowdPulse Step 4: Processing

Use Case: The Italian Hate Map. CrowdPulse settings: we have to build a map, so we only need geotagged content.

Use Case: The Italian Hate Map. CrowdPulse settings: definition of heuristics to increase the number of geotagged Tweets.
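The slides do not describe the heuristics themselves. One plausible heuristic, shown here purely as an assumption, is to fall back on the free-text location in the user profile when a Tweet carries no coordinates. The field names (`coordinates`, `user_location`), the `GAZETTEER` table and its approximate coordinates are hypothetical.

```python
# Hypothetical gazetteer: lowercase city name -> approximate (lat, lon).
GAZETTEER = {
    "palermo": (38.12, 13.36),
    "bari": (41.12, 16.87),
    "roma": (41.90, 12.50),
}


def resolve_location(tweet):
    """Return (lat, lon) from explicit coordinates, else from the profile text, else None."""
    if tweet.get("coordinates"):
        return tuple(tweet["coordinates"])
    profile = (tweet.get("user_location") or "").lower()
    for city, latlon in GAZETTEER.items():
        if city in profile:
            return latlon
    return None


tweets = [
    {"text": "a", "coordinates": (45.46, 9.19)},   # natively geotagged
    {"text": "b", "user_location": "Bari, Italy"},  # recovered via the profile
    {"text": "c", "user_location": "somewhere"},    # stays without a location
]
located = [t for t in tweets if resolve_location(t) is not None]
```

Substring matching on profile text is crude and can mislocate users; it is used here only to show how a fallback raises the number of mappable Tweets.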

Use Case: The Italian Hate Map.

Dimension        #Tweets      #Geo      %Geo
Homophobia       110,774      8,501     7.66%
Racism           154,170      1,940     1.24%
Violence         1,102,494    28,886    2.62%
Disability       479,654      3,410     0.75%
Anti-Semitism    6,000        1,150     18.03%
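The per-dimension counts can be totalled to cross-check the "almost 2,000,000" figure quoted in the conclusions. The short sketch below only sums the #Tweets and #Geo columns as reported; the `ROWS` name and layout are illustrative.

```python
# (#Tweets, #Geo) per intolerance dimension, copied from the table.
ROWS = {
    "Homophobia": (110_774, 8_501),
    "Racism": (154_170, 1_940),
    "Violence": (1_102_494, 28_886),
    "Disability": (479_654, 3_410),
    "Anti-Semitism": (6_000, 1_150),
}

total_tweets = sum(tweets for tweets, _ in ROWS.values())  # 1,853,092
total_geo = sum(geo for _, geo in ROWS.values())            # 43,887

# Overall share of geotagged Tweets across all dimensions, in percent.
overall_share = 100 * total_geo / total_tweets
```

The total of about 1.85 million Tweets is consistent with the rounded figure in the conclusions, and only a small fraction of them (roughly 2.4%) is geotagged.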

CrowdPulse Step 4: Data Visualization

Use Case: The Italian Hate Map. CrowdPulse output: maps for Violence against women and Disability (based on OpenStreetMap).

Use Case: The Italian Hate Map. CrowdPulse output: maps for Racism and Homophobia (based on OpenStreetMap).
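A heat map of this kind needs the geotagged Tweets aggregated by area before rendering. The slides do not say how CrowdPulse aggregates them, so the following is only one possible approach: binning coordinates into a regular latitude/longitude grid. The `grid_cell` and `heat_counts` names and the 0.5-degree cell size are assumptions.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.5):
    """Return the south-west corner of the grid cell containing (lat, lon)."""
    return (
        math.floor(lat / cell_deg) * cell_deg,
        math.floor(lon / cell_deg) * cell_deg,
    )


def heat_counts(points, cell_deg=0.5):
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


# Approximate coordinates: two points near Palermo, one near Bari.
points = [(38.12, 13.36), (38.20, 13.40), (41.12, 16.87)]
counts = heat_counts(points)
```

The resulting per-cell counts could then be drawn as a colored overlay on a base map such as OpenStreetMap.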

Conclusions: The Italian Hate Map. Crowdsourcing-based approach:
1. Social content containing the seed terms is extracted and processed in real time
2. Sentiment Analysis used to filter out Tweets with irony
3. Semantic Processing exploited to delete non-intolerant Tweets
4. Analytics Console used to build real-time hate maps
Almost 2,000,000 pieces of social content extracted and analyzed.

Lessons Learned

Lessons Learned: The Italian Hate Map. Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined some guidelines to tackle and prevent intolerant behaviors. These guidelines were freely distributed to public administrations in early 2015.

Lessons Learned: definition of a framework for real-time semantic content analysis. Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization. Use case: The Italian Hate Map. Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.

Questions? Cataldo Musto, PhD, cataldo.musto@uniba.it, @cataldomusto, http://www.di.uniba.it/~swap