Spatial-Temporal Events Detection Based On Weibo Checkin Data

Transcription:

Spatial-Temporal Events Detection Based On Weibo Checkin Data. Wan You, Wuhan University. 9th International Workshop of Geographical Information Science

Main Contents
1. Traditional methods for event detection (E-D) on texts
2. Methods for Weibo/Twitter data E-D
3. A new spatial-temporal E-D method
4. Two experiments
5. Conclusions

Event detection is an important and practical task in information science. An event is commonly considered an occurrence at a specific time and place. In social media space, an event is an occurrence that causes a change in the volume of text data discussing the associated topic at a specific time. This occurrence is characterized by topic and time, and is often associated with entities such as people and locations. For some local emergency events, however, we need to know exactly where they happen, so location information remains essential. How can location be added to the event detection task in social media space?

1. Traditional methods for event detection
1. Text model based: hard to extract detailed descriptions of events. Four text models: Boolean model, vector space model, language model, probability model. Text clustering algorithms: Single-Pass, k-means, DBSCAN.
2. Feature item based: difficulties in natural language processing. Named entity recognition: nouns (place, person, institution), key verbs, etc. Item co-occurrence.
3. Mixed method: using NER to help the text modelling by giving different weights to different named entities. Place names often have the most importance in event detection (*4).
When considering the place factor, we face these problems:
(1) Place name recognition: large percentage of ambiguity (92%)
(2) Weighting named entities in different contexts
(3) Similarity calculation for both text semantics and locations
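The Single-Pass text clustering named among the text-model-based methods can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the whitespace tokenizer, the bag-of-words vectors, the function names `cosine` and `single_pass`, and the similarity threshold of 0.3 are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document, in arrival order, to the most similar existing
    cluster, or open a new cluster if no similarity reaches the threshold.
    The threshold value is an assumption for this sketch."""
    clusters = []  # each cluster: summed term counts plus member indices
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())  # naive tokenizer (assumption)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["members"].append(i)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [cluster["members"] for cluster in clusters]


docs = [
    "fire at subway station",
    "subway station fire reported",
    "auto show opens today",
    "auto show opens downtown",
]
print(single_pass(docs))
```

Because each text is compared only with the clusters seen so far, the method suits streaming data, but the result depends on arrival order and on the chosen threshold.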

2. Methods for Weibo/Twitter data E-D
1. Text model based
2. Feature item based
3. Mixed method
Keyword extraction by tools
The same problems remain:
(1) Place name recognition: large percentage of ambiguity (92%)
(2) Weighting named entities in different contexts
(3) Similarity calculation for both text semantics and locations
Some new problems also arise.

A typical process to detect events from Weibo/Twitter
1. Data harvesting: retrieving API data; web crawler. Issue: data quality (biased, incomplete).
2. Data pre-processing: noise reduction; natural language processing (word segmentation, named entity recognition).
3. Event detection: text clustering algorithm; text classification algorithm. Issues: data volume (very large, updated in real time); data content (short, noisy); text similarity mixed with location similarity; event lifetime.
4. Visualization: text geocoding; geo-visualization. Issues: place name recognition; event location.

What can a GIScientist do to help E-D?
If we want to consider the place factor when doing event detection on Weibo data, we face:
(1) Place name recognition: large percentage of ambiguity (92%)
(2) Weighting named entities in different contexts
(3) Similarity calculation for both text semantics and locations
New problems:
(1) Data quality: biased, incomplete
(2) Data volume: very large, updated in real time
(3) Data content: short, noisy
(4) Event detection and location
Big advantage: location data (GPS coordinates, IP addresses, geo-tags, user profiles)

3. New spatial-temporal E-D method
Two assumptions:
(1) The number of people who join hot event activities is positively related to the number of people who check in on Weibo.
(2) People tend to update their check-in status at a place when a relevant event happens nearby.
Spatial-temporal hot event: $n_{ijk} = f(x_i, y_i, r, t_j, t_k)$, with threshold $\theta_{ijk} = n_{ijk\text{-median}} + 1.5 \times \mathrm{IQR}_{ijk}$
Refresh the check-in count at each spatial location every hour; when $n_{ijk} > \theta_{ijk}$, a spatial-temporal hot event has happened near location $i$.
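The hot-event rule above (a count exceeding median + 1.5 × IQR) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the slide does not say how the history window or the quartiles are computed, so the function names, the choice of `statistics.quantiles` with the inclusive method, and the sample counts are all hypothetical.

```python
from statistics import median, quantiles


def hot_event_threshold(history):
    """Median + 1.5 * IQR of past check-in counts for one location
    and one time slot. The quartile method is an assumption."""
    q1, _, q3 = quantiles(history, n=4, method="inclusive")
    return median(history) + 1.5 * (q3 - q1)


def is_hot_event(current_count, history):
    """True when the refreshed count exceeds the threshold."""
    return current_count > hot_event_threshold(history)


# Hypothetical hourly check-in counts for one location in the same time slot.
history = [10, 12, 11, 13, 12, 14, 11, 12]
print(hot_event_threshold(history))  # 13.875
print(is_hot_event(30, history))     # True
print(is_hot_event(13, history))     # False
```

Note that this rule adds 1.5 × IQR to the median, as the slide states, rather than to the third quartile as in the common boxplot outlier rule, so it flags counts somewhat more readily.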

3. New spatial-temporal E-D method
New process:
1. Data harvesting: spatial range division; call the API to retrieve nearby check-in data (only 1% of the whole Weibo data stream).
2. ST-event detection: anomaly detection from the time sequence.
3. Detailed content extraction: natural language processing; text clustering algorithm on spatial-temporal data.
4. Geo-visualization.
Typical process, for comparison:
1. Data harvesting: retrieving API data; web crawler.
2. Data pre-processing: noise reduction; natural language processing.
3. Event detection: text clustering algorithm; text classification algorithm.
4. Visualization: text geocoding; geo-visualization.
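One possible reading of the spatial range division step is to cover the study area with query circles of radius r and to retrieve nearby check-ins around each center. The sketch below only generates the centers. It is an assumption, not the speaker's scheme: the slides do not describe the division (it may instead use fixed points such as subway stations), the function name `grid_centers` and the metres-per-degree constant are illustrative, and no real Weibo API call is shown.

```python
import math

METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def grid_centers(min_lat, min_lon, max_lat, max_lon, radius_m):
    """Return (lat, lon) centers of query circles covering a bounding box.

    Square cells with side r * sqrt(2) are used, so a circle of radius r
    around each cell center fully contains that cell."""
    step_m = radius_m * math.sqrt(2)
    lat_step = step_m / METERS_PER_DEG_LAT
    centers = []
    lat = min_lat + lat_step / 2
    while lat - lat_step / 2 < max_lat:
        # A degree of longitude shrinks with latitude.
        lon_step = step_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
        lon = min_lon + lon_step / 2
        while lon - lon_step / 2 < max_lon:
            centers.append((lat, lon))
            lon += lon_step
        lat += lat_step
    return centers


# Hypothetical bounding box roughly around central Beijing, 2 km query radius.
centers = grid_centers(39.8, 116.2, 40.0, 116.6, 2000)
print(len(centers), centers[0])
```

Each center would then be passed, together with r, to whatever nearby check-in query is available, giving the per-location hourly counts that the detection step consumes.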

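4. Experiment 1: data quality evaluation and verification of the new method
[Figure: Daily check-ins at International Exhibition Center Subway Station. Vertical axis from 0 to 90; labeled values in the chart: 36, 38, 36, 84, 52, 36, 43.]
Exhibition dates: Exhibition name
2014.2.10~2014.2.12: 24th China International Fishing Tackle Trade Exhibition
2014.2.20~2014.2.23: 18th China International Auto Accessories Exhibition
2014.2.26~2014.3.1: 2014 Beijing International Auto Maintenance, Inspection and Diagnostic Equipment and Auto Care Exhibition
2014.3.4~2014.3.7: 21st China (Beijing) International Building Decoration and Materials Expo
2014.3.11~2014.3.14: 2014 13th China International Door Industry Exhibition
2014.3.19~2014.3.21: 14th China (Beijing) International Petroleum and Petrochemical Technology and Equipment Exhibition and China International Pipeline, Explosion-Proof Electrical, Automation and Maritime Exhibition
2014.3.26~2014.3.29: 22nd China International Clothing and Accessories Fair
2014.4.2~2014.4.4: 19th Jingzheng Beijing Maternity, Baby and Children's Products Expo and Jingzheng Children's Clothing Expo
2014.4.9~2014.4.11: 25th International Exhibition for Refrigeration, Air-Conditioning, Heating, Ventilation and Frozen Food Processing
2014.4.20~2014.4.27: 2014 13th Beijing International Automotive Exhibition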

4. Experiment 2: ST-event detection in Beijing. 37 stations where ST hot events happened on Chinese Spring Festival Eve.

4. Experiment 2: ST-event detection in Beijing. 5 stations where ST hot events happened during the Chinese Spring Festival in Beijing City.

4. Experiment 2: ST-event detection in Beijing. 21 stations where ST hot events happened on the first working day after the Chinese Spring Festival.

5. Conclusions
1. Weibo check-in data can be used to detect spatial-temporal events efficiently and in real time (low type 1 error). However, the criterion for anomaly detection in a check-in sequence needs to be well defined (high type 2 error).
2. The data bias and incompleteness problems are partially solved by spatial division, but more investigation of check-in data quality is needed.
More work needs to be done in this big data year!