News media analysis at Lab SAPO UPorto Jorge Teixeira
Past deliverables and visualization prototypes Twitómetro Twitteuro Mundo Visto Daqui interativo (MVDi)
On-going work Mundo Numa Rede Sapo Notícias - Interativo
Mundo Numa Rede (Lusa)
Mundo Numa Rede (Jornal de Angola)
SAPO notícias Interativo
SAPO notícias MVDi interativo Ego-centric network of the entity Variation of the number of mentions of top personalities A minha rede
SAPO notícias MVDi entity profile page Verbetes Quotes (Voxx) News coverage of this personality
Mundo Numa Rede news archives. New requirements: new news sources (Expresso, DN, etc.); new types of content (photos, videos); new entities (organizations, locations, products, etc.); new relations (family, professional, business, etc.). New challenges: volume of data (tens to hundreds of millions of items); extraction of new types of entities (Verbetes); identification of new relations.
On-going and future work: scientific analysis and validation of MVDi. Preliminary user study conducted during Codebits 2012 (journalists and non-journalists): analysis of survey results; coding of recorded interviews. UI & UX design guidelines.
TwitterEcho 3 social media research platform Arian Pasquali
Outline Architecture overview Batch processing Real time stream processing Web Search Near real time dashboard Visualization components Networks Maps, etc. Next steps
Architecture overview. The streaming client receives Twitter statuses (i.e., tweets) and sends them to the message broker. Broker consumers receive the tweets and handle pre-processing (tokenization, language detection, etc.), indexing in Solr and MongoDB, and real-time computation. [Diagram components: streaming client, message broker, preprocessing / indexing, database, batch processing]
Batch processing: extract user interactions (e.g., for a particular topic and date range); extract unique URLs and expand short URLs; compute most mentioned users, hashtags, etc. [Diagram components: database, batch processing]
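The "most mentioned users, hashtags" step can be illustrated with a small sketch. It is not the TwitterEcho implementation: the function name `top_entities`, the regular expressions, and the sample tweets are all assumptions made for illustration, and the real batch jobs read from the database rather than from an in-memory list.

```python
import re
from collections import Counter

# Simplified patterns (assumption): real tweets need more careful handling,
# e.g. e-mail addresses would also match the mention pattern.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def top_entities(tweets, n=3):
    """Count user mentions and hashtags over a batch of tweet texts."""
    mentions = Counter()
    hashtags = Counter()
    for text in tweets:
        # Case-fold so that #Euro2012 and #euro2012 count as one hashtag.
        mentions.update(m.lower() for m in MENTION_RE.findall(text))
        hashtags.update(h.lower() for h in HASHTAG_RE.findall(text))
    return mentions.most_common(n), hashtags.most_common(n)


# Hypothetical sample batch.
sample_tweets = [
    "@ronaldo great goal! #euro2012",
    "Watching #euro2012 with @ana and @ronaldo",
    "#Euro2012 final tonight",
]
top_mentions, top_hashtags = top_entities(sample_tweets)
```

Restricting the input list to a topic and date range before calling the function would give the per-topic rankings mentioned on the slide.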
Real time stream processing: extract entities; aggregate statistics and rank URLs, user mentions, hashtags, etc.; extract tweet geo-location; store results in MongoDB. [Diagram components: stream processing, broker consumer, message broker, message parse, extract entities, database, sliding window counter]
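The slide names a sliding window counter without describing it. One common way to build such a counter is sketched below, assuming events arrive with non-decreasing timestamps; the class name `SlidingWindowCounter` and its methods are hypothetical and are not taken from TwitterEcho.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts keys (hashtags, URLs, user mentions) seen in the last N seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) pairs in arrival order
        self.counts = Counter()  # current count per key inside the window

    def add(self, key, timestamp):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that are at least window_seconds older than `now`.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n, now):
        self._evict(now)
        return self.counts.most_common(n)


# Hypothetical usage with a 60-second window.
counter = SlidingWindowCounter(window_seconds=60)
counter.add("#euro2012", timestamp=0)
counter.add("#euro2012", timestamp=30)
counter.add("#final", timestamp=50)
ranking_at_55 = counter.top(2, now=55)
ranking_at_70 = counter.top(2, now=70)
```

At `now=70` the event recorded at timestamp 0 has left the window, so `#euro2012` drops from two occurrences to one. A dashboard would periodically store such rankings, for example in MongoDB as the slide indicates.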
Crawling setup
## SETTINGS
# For tracking words, enumerate them separated by commas (e.g. iphone, apple)
tracking.keywords=#euro2012
## ADAPTERS CONFIG
# Home directory where files containing the tweets' JSON will be stored
home.dir=/big/stream/eurocopa
# Message broker endpoint
broker.address=robinson.fe.up.pt:2181
## AUTHENTICATION SETTINGS
# Twitter OAuth application tokens
application.consumer.key=abcdefghijklmnopqrstuvxyz
application.consumer.secret=1234567890abcdefghijklmnopqrstuvxyz
# Twitter OAuth user token accounts (e.g. TOKEN,SECRETTOKEN;TOKEN,SECRETTOKEN)
twitter.accounts= 1234567890-abcdefghijklmnopqrstuvxyz,abcdefghijklmnopqrstuvx
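To make the format of these settings concrete, the sketch below parses a properties-style text with the same conventions: `#` starts a comment line, keywords are comma-separated, and accounts are `TOKEN,SECRET` pairs separated by semicolons. The helper names and the sample values are hypothetical; how the actual crawler reads its configuration is not shown in the slides.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' or '#'.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def parse_keywords(value):
    """Split a comma-separated keyword list."""
    return [k.strip() for k in value.split(",") if k.strip()]


def parse_accounts(value):
    """Split 'TOKEN,SECRET;TOKEN,SECRET' into (token, secret) pairs."""
    accounts = []
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        token, _, secret = pair.partition(",")
        accounts.append((token.strip(), secret.strip()))
    return accounts


# Hypothetical configuration text with placeholder credentials.
sample_config = """
## SETTINGS
# comma-separated tracking words
tracking.keywords=#euro2012, iphone
## AUTHENTICATION SETTINGS
twitter.accounts= TOKEN1,SECRET1;TOKEN2,SECRET2
"""
settings = parse_properties(sample_config)
keywords = parse_keywords(settings["tracking.keywords"])
accounts = parse_accounts(settings["twitter.accounts"])
```

Because only lines that start with `#` are treated as comments, a value such as `#euro2012` is kept intact.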
Search Interfaces Free-text search for users and tweets
Visualization components Maps for geo-tagged tweets SentiBubbles
Visualization components User interaction networks retweets, mentions, replies
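As a rough illustration of how such a network could be derived, the sketch below turns tweets into weighted, directed edges labeled as retweet, reply, or mention. The tweet fields (`user`, `text`, `in_reply_to`), the `RT @user` text convention, and the function name are assumptions for this example, not the TwitterEcho data model.

```python
import re
from collections import Counter

RETWEET_RE = re.compile(r"^RT @(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def interaction_edges(tweets):
    """Return a Counter mapping (source, target, kind) to interaction count."""
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"].lower()
        text = tweet["text"]

        retweet = RETWEET_RE.match(text)
        retweeted = retweet.group(1).lower() if retweet else None
        reply_to = tweet.get("in_reply_to")
        replied = reply_to.lower() if reply_to else None

        if retweeted:
            edges[(source, retweeted, "retweet")] += 1
        if replied:
            edges[(source, replied, "reply")] += 1
        for name in MENTION_RE.findall(text):
            target = name.lower()
            # Do not count the retweeted or replied-to user again as a mention.
            if target in (retweeted, replied):
                continue
            edges[(source, target, "mention")] += 1
    return edges


# Hypothetical sample tweets.
sample = [
    {"user": "ana", "text": "RT @rui: what a goal! #euro2012"},
    {"user": "rui", "text": "@ana thanks", "in_reply_to": "ana"},
    {"user": "ana", "text": "Great game @rui @joao"},
]
edges = interaction_edges(sample)
```

The resulting edge weights can be passed to a graph layout or visualization library to draw the network.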
Dashboard for real-time monitoring Trending topics Most mentioned users Popular URLs Crawler activity Etc. (optimized for HD displays)
Next steps. User interface: improved crawling setup; customizable dashboard. Crawler: community expansion module; topic expansion crawling.
Next steps. Integration of pre-processing and data analysis modules: topic modeling and user influence; credibility scoring / spam detection; bot detection; language variant profiling; social network metrics; opinion mining.
Social media content pre-processing Gustavo Laboreiro
Bot detection
Pre-processing modules
Input: (O DRA.MA DO OUTRO CROMOo é q nãoo sai de CASA SOZINHOo.) Drama. Drama. MtoO DRA,MA.
Tokenization: ( O DRA.MA DO OUTRO CROMOo é q nãoo sai de CASA SOZINHOo. ) Drama Drama MtoO, DRAMA.
Error correction: ( O DRAMA DO OUTRO CROMO é que não sai de CASA SOZINHO. ) Drama Drama Muito, DRAMA.
Normalization: ( O drama do outro cromo é que não sai de casa sozinho. ) drama drama muito drama.
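The three stages can be chained as in the toy sketch below. It is only an illustration under strong assumptions: the abbreviation table, the rule that trims a repeated final vowel, and the rule that drops punctuation inside a word are hypothetical simplifications that would damage legitimate words, numbers, and URLs, and normalization is reduced to lowercasing. The actual modules are not described in the slides.

```python
import re

# Hypothetical lookup table for common abbreviations.
ABBREVIATIONS = {"q": "que", "mto": "muito"}


def tokenize(text):
    """Split on whitespace and separate trailing punctuation from each word."""
    tokens = []
    for chunk in text.split():
        match = re.match(r"^(.*?)([.,!?]*)$", chunk)
        word, punct = match.group(1), match.group(2)
        if word:
            tokens.append(word)
        tokens.extend(punct)  # one token per punctuation character
    return tokens


def correct(token):
    """Apply toy error-correction rules to a single token."""
    if not any(ch.isalnum() for ch in token):
        return token  # pure punctuation is left untouched
    # Remove punctuation typed inside a word: DRA.MA -> DRAMA
    token = re.sub(r"(?<=\w)[.,](?=\w)", "", token)
    # Trim a repeated final vowel: CROMOo -> CROMO, nãoo -> não
    token = re.sub(r"([aeiou])\1+$", r"\1", token, flags=re.IGNORECASE)
    # Expand known abbreviations: q -> que, Mto -> muito
    return ABBREVIATIONS.get(token.lower(), token)


def normalize(token):
    """Case-fold the token (a simplification of the normalization stage)."""
    return token.lower()


def preprocess(text):
    return [normalize(correct(tok)) for tok in tokenize(text)]


raw = ("O DRA.MA DO OUTRO CROMOo é q nãoo sai de CASA SOZINHOo. "
       "Drama. Drama. MtoO DRA,MA.")
result = preprocess(raw)
```

Keeping each stage as a separate function mirrors the modular layout on the slide, so any stage can be replaced independently.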
Language variant detection
Context enrichment. Enrich content with external information (context): linking to news media; mentioned entities; expanded URL (Web page title and body text; other tweets linking to the same URL); hashtags (other tweets containing the same hashtags).
TweeProfiles: detection of spatio-temporal patterns on Twitter Tiago Cunha
Context TwitterEcho Geo-referenced and timestamped tweets Analysis module: clustering in 4 dimensions Spatial Temporal Social Content Visualization Tool
Vision
Objectives
Motivation Clustering innovation: 4 dimensions Use spatio-temporal tweets What, where, when and by whom?
State of the art Clustering algorithms Distance functions Spatio-temporal data visualization tools Result evaluation Scientific and user related
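To make "distance functions" more concrete, the sketch below shows one standard candidate for each of three dimensions: haversine distance for the spatial dimension, absolute time difference for the temporal one, and Jaccard distance over token sets for content. These are textbook choices picked for illustration, not the functions selected in TweeProfiles; the social dimension is left out because the slides give no hint of how it is measured.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def temporal_distance_hours(t1, t2):
    """Absolute difference between two Unix timestamps, in hours."""
    return abs(t1 - t2) / 3600.0


def jaccard_distance(tokens_a, tokens_b):
    """1 - |A ∩ B| / |A ∪ B| over the token sets of two tweets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical example: approximate coordinates of Porto and Lisbon.
porto_lisbon_km = haversine_km(41.1579, -8.6291, 38.7223, -9.1393)
two_hours = temporal_distance_hours(0, 7200)
content_gap = jaccard_distance(["euro2012", "goal"], ["goal", "final"])
```

A clustering algorithm could use each distance separately or combine them, for instance after scaling each one to a common range; how to combine them is a design decision the slides leave open.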
Proposed solution
Search and Extraction of Information from Online Discussion Groups Jorge Moreira
Objectives Scalable crawler Data indexing
Discussion forum
Technologies
Architecture
Future Work Classification algorithms: Author Ranking Extension for blogs