Combining Social Data and Semantic Content Analysis for L'Aquila Social Urban Network

I-CiTies 2015: 2015 CINI Annual Workshop on ICT for Smart Cities and Communities, Palermo (Italy), October 29-30, 2015. Combining Social Data and Semantic Content Analysis for L'Aquila Social Urban Network (Università degli Studi di Bari Aldo Moro, Italy, SWAP Research Group)

L'Aquila, April 6, 2009: 5.8 magnitude earthquake; 20 billion in damages; 70,000 people displaced; 309 people died.

L'Aquila 2015, six years later: 7 billion in funding still needed; 22,000 people still displaced; diaspora.

L'Aquila: 19 new towns around L'Aquila; 15,200 people live there today.

L'Aquila: What about the consequences? Loss of trust, sense of belonging, relationships.

L'Aquila: loss of social capital.

L'Aquila Social Urban Network

L'Aquila Social Urban Network: our contribution!

L'Aquila Social Urban Network. Research question: Is it possible to extract and process social media content to monitor, in real time, people's feelings, opinions and sentiments about the current state of the social capital of L'Aquila?

CrowdPulse: a framework for real-time semantic analysis of social streams

CrowdPulse features:
- Social Data Extraction
- Semantic Tagging
- Sentiment Analysis
- Processing & Visualization

CrowdPulse workflow

CrowdPulse Step 1: Social Data Extraction

CrowdPulse Step 1: Social Data Extraction. Source; extraction heuristics.

CrowdPulse Step 1: Social Data Extraction. Source; extraction heuristics:
- Content (#earthquake, #traffic, #democrats, #www2015)
- User (@barack_obama, @comunefi)
- Geo
- Content+Geo
- Page
- Group

CrowdPulse Step 1: Social Data Extraction. Source; extraction heuristics:
- Content (#earthquake, #traffic, #democrats, #icities2015)
- User (@barack_obama, @comunepalermo)
- Geo
- Content+Geo
- Page
- Group
We only extract public content.

Use Case: L'Aquila Social Urban Network. CrowdPulse settings. Heuristics:
- Twitter users (local newspapers, mentions of politicians)
- Twitter content+geo (50 km around L'Aquila and/or specific hashtags such as #laquila, #earthquake, etc.)
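
The content+geo heuristic above (within 50 km of L'Aquila and/or carrying specific hashtags) could be sketched roughly as follows. This is only an illustration, not CrowdPulse's actual extraction code: the `post` record layout, the `matches` and `haversine_km` helpers, and the approximate city-centre coordinates are all assumptions made for the example.

```python
import math

# Approximate coordinates of L'Aquila city centre (assumption for illustration).
LAQUILA = (42.35, 13.40)
RADIUS_KM = 50.0
HASHTAGS = {"#laquila", "#earthquake"}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def matches(post):
    """Return True if a post passes the content+geo heuristic.

    `post` is a hypothetical dict with a "text" field and an optional
    "coords" field holding a (lat, lon) pair.
    """
    tokens = {t.lower().strip(".,!?") for t in post["text"].split()}
    has_tag = bool(tokens & HASHTAGS)
    coords = post.get("coords")
    is_near = coords is not None and haversine_km(coords, LAQUILA) <= RADIUS_KM
    # "and/or" in the settings: either condition is enough to keep the post.
    return has_tag or is_near
```

A post with no coordinates is kept only if it carries one of the hashtags; a geotagged post is kept whenever it falls inside the radius.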

Use Case: L'Aquila Social Urban Network. CrowdPulse settings. Heuristics:
- Facebook groups (identified after a thorough analysis)
- Facebook pages (identified after a thorough analysis)

Use Case: L'Aquila Social Urban Network. CrowdPulse settings. Extracted content (examples):
- Tweets about the fear of new earthquakes.
- Facebook posts about citizens' proposals.
- Tweets from people worried about the situation.
- Tweets about new buildings in the city.

Use Case: L'Aquila Social Urban Network. CrowdPulse settings: Sentiment Analysis and Semantic Tagging of the content.

Semantic Tagging: Motivations. Does "aquila" (Italian) mean the eagle or the Italian city? A keyword-based representation introduces a lot of noise into the analysis.

Semantic Tagging: Motivations. "Fate qualcosa per favore, l'Aquila sta morendo!" Does this mean "Please, do something: L'Aquila is dying!" or "Please, do something: the eagle is dying!"?

CrowdPulse Step 2: Semantic Tagging. Identification and disambiguation of the entities mentioned in the text. Non-trivial NLP tasks (stopword removal, n-gram identification, named entity recognition and disambiguation) are performed automatically.
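
To make the tasks listed above concrete, here is a minimal sketch of stopword removal, bigram identification, and a very simplified context-overlap disambiguation of "aquila". It is not the technique CrowdPulse uses, which the slides do not detail: the stopword list, the `SENSES` context words, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Tiny hypothetical Italian stopword list; a real one would be much larger.
STOPWORDS = {"per", "la", "il", "l", "sta", "di", "del", "a", "in", "e", "un", "una"}

# Hypothetical context words for the two senses of "aquila".
SENSES = {
    "L'Aquila (city)": {"città", "terremoto", "centro", "case", "ricostruzione"},
    "aquila (eagle)": {"vola", "nido", "ali", "rapace", "cielo"},
}


def tokenize(text):
    """Lowercase, keep runs of letters only, and drop stopwords."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def bigrams(tokens):
    """Return the list of adjacent token pairs (2-grams)."""
    return list(zip(tokens, tokens[1:]))


def disambiguate(text):
    """Pick the sense whose context words overlap most with the text.

    On a tie (including no overlap at all) the first sense in SENSES wins.
    """
    context = set(tokenize(text)) - {"aquila"}
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))
```

With so few context words, a sentence such as the one on the previous slide has no overlap with either sense, which is exactly why a real system needs richer knowledge than a keyword list.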

CrowdPulse Step 3: Sentiment Analysis

Sentiment Analysis: Motivations. Is this content conveying any opinion? This is a crucial issue if people-based findings are to be generated.

Sentiment Analysis: Definition. "It is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008). We concentrated on the polarity detection task.

CrowdPulse Step 3: Sentiment Analysis. Overall sentiment: :-( The process can be iterated over a larger set of content to obtain findings about how the population feels about a certain topic.
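
Polarity detection and its iteration over a set of posts might look like the sketch below. It assumes a simple lexicon-based scorer, which the slides do not confirm as CrowdPulse's approach; the `LEXICON` entries, the score range, and the 0.1 thresholds are invented for the example.

```python
# Hypothetical polarity lexicon: word -> score in [-1, 1].
LEXICON = {
    "fear": -1.0,
    "worried": -0.8,
    "dying": -1.0,
    "hope": 0.8,
    "new": 0.3,
    "rebuilt": 0.9,
}


def polarity(text):
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def overall_sentiment(texts):
    """Mean polarity over a collection of posts, mapped to a coarse label."""
    if not texts:
        return 0.0, "neutral"
    mean = sum(polarity(t) for t in texts) / len(texts)
    if mean > 0.1:
        label = "positive"
    elif mean < -0.1:
        label = "negative"
    else:
        label = "neutral"
    return mean, label
```

Calling `overall_sentiment` on a larger batch of posts about one topic gives the kind of aggregate reading the slide summarizes with a single emoticon.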

CrowdPulse Step 4: Processing & Visualization

Use Case: L'Aquila Social Urban Network. CrowdPulse settings: how can each piece of content be mapped to the social indicator it refers to?

Use Case: L'Aquila Social Urban Network. CrowdPulse settings: given a fixed set of social capital indicators, we built a classification model that associates each piece of content (along with its sentiment) with the social indicator it refers to.
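
The slides do not say which classifier was used. As one possible illustration of mapping content to a fixed set of indicators, the sketch below trains a small multinomial Naive Bayes model with add-one smoothing; the `train` and `classify` functions and the indicator labels in the usage note are assumptions, not part of CrowdPulse.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build word and label counts from (text, label) training pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_lp = None, -math.inf
    for label, n in label_counts.items():
        lp = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label
```

For instance, after training on a handful of posts labelled with hypothetical indicators such as "trust" and "relationships", `classify(model, "friends meet in the square")` would return the label whose training posts share the most vocabulary with the new post.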

Use Case: L'Aquila Social Urban Network. Example content: tweet about new buildings in the city. Social Capital Mapper.

Use Case: L'Aquila Social Urban Network. Example content: tweet about new buildings in the city. Input: social indicators + classification model.

Use Case: L'Aquila Social Urban Network. Example content: tweet about new buildings in the city. Domain-specific processing: classification task.

Use Case: L'Aquila Social Urban Network. Example content: tweet about new buildings in the city. Output: (multi-class) classification + sentiment.

Use Case: L'Aquila Social Urban Network. Example content: tweet about new buildings in the city. The score of a social indicator is the average sentiment of all the content referring to it.
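
The scoring rule above, an indicator's score being the average sentiment of the content assigned to it, can be sketched in a few lines. The `indicator_scores` function and the numeric sentiment scale are assumptions made for the example; the slides do not specify how sentiment is encoded.

```python
from collections import defaultdict


def indicator_scores(items):
    """Average the sentiment of all content assigned to each indicator.

    `items` is an iterable of (indicator, sentiment) pairs, where `sentiment`
    is assumed to be a number such as a polarity in [-1, 1].
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for indicator, sentiment in items:
        totals[indicator] += sentiment
        counts[indicator] += 1
    return {name: totals[name] / counts[name] for name in totals}
```

Indicators with no assigned content simply do not appear in the result, so a caller can tell "no data" apart from a neutral score of zero.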

Use Case: L'Aquila Social Urban Network. CrowdPulse output: overall score of the social indicators between March and August 2014.

Use Case: L'Aquila Social Urban Network. CrowdPulse output: real-world application of the output. The community promoter monitors the state of the social indicators and defines initiatives to empower the social capital.

Lessons Learned

Lessons Learned:
- Definition of a framework for real-time semantic content analysis.
- Pipeline of state-of-the-art techniques: semantic processing, sentiment analysis, machine learning, data visualization.
- Use case: L'Aquila Social Urban Network.
Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.

Questions? Cataldo Musto, Ph.D., cataldo.musto@uniba.it, @cataldomusto