Spatial-Temporal Event Detection Based on Weibo Check-in Data
Wan You
Wuhan University
9th International Workshop of Geographical Information Science
Main Contents
1. Traditional methods for event detection (E-D) on texts
2. Methods for Weibo/Twitter data E-D
3. A new spatial-temporal E-D method
4. Two experiments
5. Conclusions
Event detection is an important and practical task in information science. An event is commonly defined as an occurrence at a specific time and place.
In the social media space, an event is an occurrence that causes a change in the volume of text data discussing the associated topic at a specific time. Such an occurrence is characterized by topic and time, and is often associated with entities such as people and locations.
For some local emergency events, however, we need to know the exact place, so location information remains essential.
How can location be added to the event detection task in the social media space?
1. Traditional methods for event detection
1. Text model based: hard to extract detailed descriptions of events.
   Four text models: Boolean model, vector space model, language model, probability model.
   Text clustering algorithms: single-pass, k-means, DBSCAN.
2. Feature item based: difficulties in natural language processing.
   Named entity recognition (NER): nouns (place, person, institution), key verbs, etc.
   Item co-occurrence.
3. Mixed method: using NER to help the text modelling by giving different weights to different named entities.
Place names often carry the most importance in event detection (*4).
When considering the place factor, we face these problems:
1. Place name recognition: large percentage of ambiguity (92%)
2. Weighting named entities in different contexts
3. Similarity calculation for both text semantics and locations
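The text-model route above (vector space model plus single-pass clustering) can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: the whitespace tokenizer, the similarity threshold of 0.3, and the function names `cosine` and `single_pass` are all assumptions made for the example. Chinese text would need a word segmentation step before this point.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document to its most similar cluster, or start a new one.

    Returns one cluster index per document. The threshold is a hypothetical
    tuning parameter, not a value from the slides.
    """
    centroids = []  # summed term-frequency vector per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())  # naive whitespace tokenizer
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels
```

Each cluster then stands for a candidate event. Because every document is compared against every existing cluster, the result depends on arrival order and on the threshold, which is one reason detailed event descriptions are hard to extract this way.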
2. Methods for Weibo/Twitter data E-D
1. Text model based
2. Feature item based
3. Mixed method
Keyword extraction by tools
Problems faced:
1. Place name recognition: large percentage of ambiguity (92%)
2. Weighting named entities for different contexts
3. Similarity calculation for both text semantics and locations
Plus some new problems.
A typical process for detecting events from Weibo/Twitter (W/T)
1. Data harvesting: retrieving API data; web crawler
2. Data pre-processing: noise reduction; natural language processing (word segmentation, named entity recognition)
3. Event detection: text clustering algorithm; text classification algorithm
4. Visualization: text geocoding; geo-visualization
Challenges along the way:
Data quality: biased, incomplete
Data volume: very large, updated in real time
Data content: short, noisy
Text similarity mixed with location similarity
Event lifetime
Place name recognition
Event location
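One simple way to mix text similarity with location similarity is a weighted sum of the two. The sketch below is only an illustration of that idea: the Jaccard text measure, the exponential distance decay, the weight `alpha`, the decay scale `scale_km`, and the post dictionary layout (`tokens`, `lat`, `lon`) are all assumptions introduced for the example, not choices taken from the slides.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def jaccard(tokens_a, tokens_b):
    """Share of distinct tokens that the two posts have in common."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def mixed_similarity(post_a, post_b, alpha=0.5, scale_km=1.0):
    """Weighted sum of text and location similarity, both in [0, 1].

    alpha (hypothetical) weights text against location; scale_km (hypothetical)
    controls how quickly spatial similarity decays with distance.
    """
    text_sim = jaccard(post_a["tokens"], post_b["tokens"])
    d = haversine_km(post_a["lat"], post_a["lon"], post_b["lat"], post_b["lon"])
    loc_sim = math.exp(-d / scale_km)
    return alpha * text_sim + (1 - alpha) * loc_sim
```

With this form, two posts with identical words but sent from opposite sides of a city score no higher than `alpha`, so a clustering step built on it keeps distant events apart even when their vocabulary overlaps.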
What can a GIScientist do to help E-D?
If we want to consider the place factor when doing event detection on Weibo data, we face:
1. Place name recognition: large percentage of ambiguity (92%)
2. Weighting named entities for different contexts
3. Similarity calculation for both text semantics and locations
New problems:
1. Data quality: biased, incomplete
2. Data volume: very large, updated in real time
3. Data content: short, noisy
4. Event detection and location
Big advantage:
Location data: GPS coordinates, IP addresses, geo-tags, user profiles
3. New spatial-temporal E-D method
Two assumptions:
1. The number of people who take part in a hot event is positively related to the number of people who check in on Weibo.
2. People tend to update their check-in status at a place when a relevant event happens nearby.
Spatial-temporal hot event:
$n_{ijk} = f(x_i, y_i, r, t_j, t_k)$, with threshold $\theta_{ijk} = n_{ijk\text{-median}} + 1.5 \times \mathrm{IQR}_{ijk}$
Refresh the check-in count at each spatial location every hour; when $n_{ijk} > \theta_{ijk}$, a spatial-temporal hot event has happened near location $i$.
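The median plus 1.5 times IQR rule can be sketched as follows. This is a minimal illustration under stated assumptions: `history` is taken to be the past check-in counts for one location and one time slot, the sample numbers are made up, and the function names are hypothetical. Quartile definitions differ between tools; this sketch uses the default method of Python's `statistics.quantiles`.

```python
import statistics


def hot_event_threshold(history):
    """Median + 1.5 * IQR of past check-in counts for one location and time slot."""
    q1, _, q3 = statistics.quantiles(history, n=4)  # quartile cut points
    return statistics.median(history) + 1.5 * (q3 - q1)


def is_hot_event(current_count, history):
    """True when the current hourly count exceeds the threshold."""
    return current_count > hot_event_threshold(history)


# Made-up counts for the same hour on eight earlier days (illustrative only).
history = [10, 12, 11, 9, 13, 10, 12, 11]
print(hot_event_threshold(history))
print(is_hot_event(30, history), is_hot_event(13, history))
```

In an hourly refresh loop, `history` would be updated for each location and `is_hot_event` evaluated on the newest count.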
3. New spatial-temporal E-D method
New process:
Data harvesting: spatial range division; call the API to retrieve nearby check-in data (only 1% of the whole Weibo data stream)
ST-event detection: anomaly detection from the time sequence
Detailed content extraction: natural language processing; text clustering algorithm on spatial-temporal data
Geo-visualization
Typical process, for comparison:
Data harvesting: retrieving API data; web crawler
Data pre-processing: noise reduction; natural language processing
Event detection: text clustering algorithm; text classification algorithm
Visualization: text geocoding; geo-visualization
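The spatial range division step could look roughly like the sketch below, which lays query centres over a bounding box so that circles of a given radius approximately cover it. Everything here is an assumption for illustration: the square grid layout, the function name `grid_centers`, the flat-Earth conversion of 111.32 km per degree of latitude, and the sample bounding box. The actual nearby check-in API call is not shown.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_centers(south, west, north, east, radius_km):
    """Return (lat, lon) query centres whose circles approximately cover the box.

    A circle of radius r circumscribes a square of side r * sqrt(2), so centres
    are spaced r * sqrt(2) apart. Uses a local planar approximation.
    """
    spacing_km = radius_km * math.sqrt(2)
    dlat = spacing_km / KM_PER_DEG_LAT
    centers = []
    lat = south + dlat / 2
    while lat - dlat / 2 < north:
        # A degree of longitude shrinks with the cosine of the latitude.
        dlon = spacing_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
        lon = west + dlon / 2
        while lon - dlon / 2 < east:
            centers.append((lat, lon))
            lon += dlon
        lat += dlat
    return centers


# Illustrative bounding box and radius, not values from the experiments.
centers = grid_centers(39.8, 116.2, 40.0, 116.5, radius_km=2.0)
print(len(centers), centers[0])
```

Each centre would then be passed, together with the radius, to a nearby check-in query, and the returned counts feed the hourly time sequence for that location.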
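4. Experiment 1: data quality evaluation and verification of the new method
[Figure: Daily check-ins at International Exhibition Center Subway Station. Data labels on the chart: 36, 38, 36, 84, 52, 36, 43]
Exhibition dates: Exhibition name
2014.2.10~2014.2.12: 24th China International Fishing Tackle Trade Exhibition
2014.2.20~2014.2.23: 18th China International Auto Accessories Exhibition
2014.2.26~2014.3.1: 2014 Beijing International Auto Maintenance, Testing and Diagnostic Equipment and Car Care Exhibition
2014.3.4~2014.3.7: 21st China (Beijing) International Building Decoration and Materials Expo
2014.3.11~2014.3.14: 2014 13th China International Door Industry Exhibition
2014.3.19~2014.3.21: 14th China (Beijing) International Petroleum and Petrochemical Technology and Equipment Exhibition, and China International Pipeline, Explosion-Proof Electrical, Automation and Maritime Exhibition
2014.3.26~2014.3.29: 22nd China International Clothing and Accessories Fair
2014.4.2~2014.4.4: 19th Jingzheng Beijing Maternity, Baby and Children's Products Expo; Jingzheng Children's Clothing Expo
2014.4.9~2014.4.11: 25th International Exhibition for Refrigeration, Air Conditioning, Heating, Ventilation and Food Frozen Processing
2014.4.20~2014.4.27: 2014 13th Beijing International Automotive Exhibition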
4. Experiment 2: ST-event detection in Beijing
37 stations where ST hot events happened on Chinese Spring Festival Eve.
4. Experiment 2: ST-event detection in Beijing
5 stations where ST hot events happened during the Chinese Spring Festival in Beijing City.
4. Experiment 2: ST-event detection in Beijing
21 stations where ST hot events happened on the first working day after the Chinese Spring Festival.
5. Conclusions
1. Weibo check-in data can be used to detect spatial-temporal events efficiently and in real time (low type 1 error). However, the criterion for anomaly detection in a check-in time sequence needs to be well defined (high type 2 error).
2. The data bias and incompleteness problems are partially solved by using spatial division, but more investigation of check-in data quality is needed.
More work needs to be done in this big data year!