News media analysis at Lab SAPO UPorto. Jorge Teixeira





Past deliverables and visualization prototypes
  Twitómetro
  Twitteuro
  Mundo Visto Daqui interativo (MVDi)

On-going work
  Mundo Numa Rede
  Sapo Notícias - Interativo

Mundo Numa Rede (Lusa)

Mundo Numa Rede (Jornal de Angola)

SAPO notícias Interativo

SAPO notícias MVDi interativo
  Ego-centric network of the entity
  Variation of the number of mentions of top personalities
  A minha rede ("My network")

SAPO notícias MVDi entity profile page
  Verbetes
  Quotes (Voxx)
  News coverage of this personality

Mundo Numa Rede news archives
  New requirements:
    New news sources: expresso, DN, etc.
    New type of content: photos, videos
    New entities: organizations, locations, products, etc.
    New relations: family, professional, business, etc.
  New challenges:
    Volume of data (dozens/hundreds of millions of items)
    Extraction of new types of entities (Verbetes)
    Identification of new relations

On-going and future work
  Scientific analysis and validation of MVDi
  Preliminary user study conducted during Codebits 2012
    journalists and non-journalists
    analysis of survey results
    coding of recorded interviews
  UI & UX design guidelines

TwitterEcho 3 social media research platform Arian Pasquali

Outline
  Architecture overview
  Batch processing
  Real time stream processing
  Web
    Search
    Near real time dashboard
    Visualization components: networks, maps, etc.
  Next steps

Architecture overview
  Streaming client
    receives Twitter statuses (i.e. tweets)
    sends tweets to the message broker
  Broker consumers
    receive tweets
    pre-processing (tokenization, language detection, etc.)
    indexing in Solr and MongoDB
    real time computation
  [Diagram: streaming client, message broker, preprocessing / indexing, database, batch processing]
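The flow on this slide can be illustrated with a minimal in-process sketch. It is not the TwitterEcho implementation: a standard-library queue stands in for the message broker, a plain dict stands in for the Solr/MongoDB index, and the function names, tweet fields, and regex tokenizer are all assumptions made for illustration.

```python
import json
import queue
import re

# Stand-in for the message broker: a simple in-process FIFO queue (assumption).
broker = queue.Queue()

def streaming_client(raw_statuses):
    """Receive raw tweet JSON strings and publish them to the broker."""
    for raw in raw_statuses:
        broker.put(raw)

def preprocess(raw):
    """Parse a tweet and attach a naive tokenization (hypothetical step)."""
    tweet = json.loads(raw)
    # Keep a leading '#' or '@' so hashtags and mentions stay recognizable.
    tweet["tokens"] = re.findall(r"[#@]?\w+", tweet["text"].lower())
    return tweet

def broker_consumer(index):
    """Drain the broker, pre-process each tweet, and store it in the index."""
    while not broker.empty():
        tweet = preprocess(broker.get())
        index[tweet["id"]] = tweet

# Made-up sample statuses.
raw_statuses = [
    json.dumps({"id": 1, "text": "Watching #euro2012 with @friend"}),
    json.dumps({"id": 2, "text": "Goal! #euro2012"}),
]
index = {}  # stand-in for the Solr/MongoDB index (assumption)
streaming_client(raw_statuses)
broker_consumer(index)
print(index[1]["tokens"])
```

Decoupling the client from the consumers through a broker lets pre-processing and indexing scale independently of the rate at which tweets arrive.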

Batch processing
  Extract user interactions (e.g., for a particular topic and date range)
  Extract unique URLs and expand short URLs
  Compute most mentioned users, hashtags, etc.
  [Diagram: database, batch processing]
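As a rough illustration of the batch jobs listed above, the sketch below counts mentions and hashtags and collects unique URLs from tweet texts. The regular expressions and the function name `batch_stats` are assumptions, the sample tweets are made up, and expanding short URLs is left out because it requires network access.

```python
import re
from collections import Counter

# Simplified patterns (assumptions); real tweets need more careful handling.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")

def batch_stats(texts):
    """Return mention counts, hashtag counts, and the set of unique URLs."""
    mentions, hashtags, urls = Counter(), Counter(), set()
    for text in texts:
        mentions.update(m.lower() for m in MENTION.findall(text))
        hashtags.update(h.lower() for h in HASHTAG.findall(text))
        urls.update(URL.findall(text))
    return mentions, hashtags, urls

# Made-up sample data.
texts = [
    "RT @alice: great match #euro2012 http://example.com/a",
    "@alice @bob did you see it? #Euro2012",
    "Highlights here http://example.com/a #football",
]
mentions, hashtags, urls = batch_stats(texts)
print(mentions.most_common(1))
print(hashtags.most_common(1))
print(sorted(urls))
```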

Real time stream processing
  Extract entities, aggregate statistics and ranking
    URLs, user mentions, hashtags, etc.
  Extract tweet geo-location
  Store results in MongoDB
  [Diagram: message broker, broker consumer, stream processing (message parse, extract entities, sliding window counter), database]
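The slide's diagram names a sliding window counter. One common way to build such a counter is sketched below; the class name, the window length, and the assumption that timestamps arrive in non-decreasing order are illustrative and not taken from the platform itself.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Count items seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, item) pairs in arrival order
        self.counts = Counter()  # current count per item inside the window

    def add(self, timestamp, item):
        """Record an item; timestamps are assumed to be non-decreasing."""
        self.events.append((timestamp, item))
        self.counts[item] += 1
        self._expire(timestamp)

    def _expire(self, now):
        """Drop events that have fallen out of the window."""
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n):
        """Return the n most frequent items currently in the window."""
        return self.counts.most_common(n)

counter = SlidingWindowCounter(window_seconds=60)
counter.add(0, "#euro2012")
counter.add(10, "#euro2012")
counter.add(30, "#goal")
counter.add(65, "#goal")  # the event at t=0 is now outside the window
print(counter.top(1))
```

Keeping the events in a deque makes expiry cheap: only the oldest entries are inspected on each insertion.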

Crawling setup
## SETTINGS
# for tracking words just enumerate them separated by commas (e.g. iphone, apple)
tracking.keywords=#euro2012
## ADAPTERS CONFIG
# Home directory where files containing tweets' JSON will be stored
home.dir=/big/stream/eurocopa
# Message broker endpoint
broker.address=robinson.fe.up.pt:2181
## AUTHENTICATION SETTINGS
# Twitter OAuth application tokens
application.consumer.key=abcdefghijklmnopqrstuvxyz
application.consumer.secret=1234567890abcdefghijklmnopqrstuvxyz
# Twitter OAuth user token accounts (e.g. TOKEN,SECRETTOKEN;TOKEN,SECRETTOKEN)
twitter.accounts=1234567890-abcdefghijklmnopqrstuvxyz,abcdefghijklmnopqrstuvx
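A settings file in this key=value style could be read along the following lines. This is only a sketch: the helper names `parse_properties` and `parse_accounts` are hypothetical, the sample values are placeholders, and the crawler's actual loader is not shown in the slides.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as '#euro2012' survive.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def parse_accounts(value):
    """Split 'TOKEN,SECRET;TOKEN,SECRET' into a list of (token, secret) pairs."""
    accounts = []
    for pair in value.split(";"):
        if pair.strip():
            token, _, secret = pair.partition(",")
            accounts.append((token.strip(), secret.strip()))
    return accounts

# Placeholder values for illustration only.
sample = """
## SETTINGS
tracking.keywords=#euro2012
broker.address=robinson.fe.up.pt:2181
twitter.accounts=TOKEN1,SECRET1;TOKEN2,SECRET2
"""
settings = parse_properties(sample)
accounts = parse_accounts(settings["twitter.accounts"])
print(settings["tracking.keywords"], accounts)
```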

Search Interfaces Free-text search for users and tweets

Visualization components
  Maps for geo-tagged tweets
  SentiBubbles

Visualization components
  User interaction networks
    retweets, mentions, replies
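A user interaction network of this kind can be represented as weighted, typed edges between users. The sketch below uses simplified, hypothetical tweet fields (`user`, `retweet_of`, `reply_to`, `mentions`) rather than the real Twitter payload, and the sample tweets are made up.

```python
from collections import Counter

def interaction_edges(tweets):
    """Count directed (source, target, kind) edges from simplified tweets."""
    edges = Counter()
    for tweet in tweets:
        src = tweet["user"]
        if tweet.get("retweet_of"):
            edges[(src, tweet["retweet_of"], "retweet")] += 1
        if tweet.get("reply_to"):
            edges[(src, tweet["reply_to"], "reply")] += 1
        for mentioned in tweet.get("mentions", []):
            edges[(src, mentioned, "mention")] += 1
    return edges

# Made-up sample data using the hypothetical field names above.
tweets = [
    {"user": "ana", "retweet_of": "rui"},
    {"user": "ana", "retweet_of": "rui"},
    {"user": "rui", "reply_to": "ana", "mentions": ["ana", "joao"]},
]
edges = interaction_edges(tweets)
print(edges.most_common(1))
```

The resulting edge counts can be handed to any graph library or visualization component as edge weights.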

Dashboard for real-time monitoring
  Trending topics
  Most mentioned users
  Popular URLs
  Crawler activity
  Etc. (optimized for HD displays)

Next steps
  User interface
    Improved crawling setup
    Customizable dashboard
  Crawler
    Community expansion module
    Topic expansion crawling

Next steps
  Integration of pre-processing and data analysis modules
    topic modeling and user influence
    credibility scoring / spam detection
    bot detection
    language variant profiling
    social network metrics
    opinion mining

Social media content pre-processing Gustavo Laboreiro

Bot detection

Pre-processing modules
  Input: (O DRA.MA DO OUTRO CROMOo é q nãoo sai de CASA SOZINHOo.) Drama. Drama. MtoO DRA,MA.
  Tokenization: ( O DRA.MA DO OUTRO CROMOo é q nãoo sai de CASA SOZINHOo. ) Drama Drama MtoO, DRAMA.
  Error correction: ( O DRAMA DO OUTRO CROMO é que não sai de CASA SOZINHO. ) Drama Drama Muito, DRAMA.
  Normalization: ( O drama do outro cromo é que não sai de casa sozinho. ) drama drama muito drama.
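The three stages on this slide can be mimicked with a toy pipeline. The rules below (a two-entry abbreviation table, removal of punctuation inside words, collapsing a repeated final letter) are assumptions chosen to reproduce the slide's example, not the actual modules; the last rule would also damage legitimate words that end in a doubled letter.

```python
import re

# Tiny, hypothetical abbreviation table covering only the slide's example.
ABBREVIATIONS = {"q": "que", "mto": "muito"}

def tokenize(text):
    """Split into word tokens (allowing inner '.' or ',') and punctuation."""
    return re.findall(r"\w+(?:[.,]\w+)*|[^\w\s]", text)

def correct(token):
    """Apply toy error-correction rules to a single token."""
    # Remove punctuation typed inside a word, e.g. 'DRA.MA' becomes 'DRAMA'.
    token = re.sub(r"(?<=\w)[.,](?=\w)", "", token)
    # Collapse a repeated final letter, e.g. 'CROMOo' becomes 'CROMO'.
    while len(token) > 1 and token[-1].lower() == token[-2].lower():
        token = token[:-1]
    # Expand known abbreviations, e.g. 'q' becomes 'que'.
    return ABBREVIATIONS.get(token.lower(), token)

def normalize(token):
    """Normalize case."""
    return token.lower()

def preprocess(text):
    """Run tokenization, error correction, and normalization in sequence."""
    return [normalize(correct(t)) for t in tokenize(text)]

result = preprocess("O DRA.MA DO OUTRO CROMOo é q nãoo sai de CASA SOZINHOo.")
print(" ".join(result))
```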

Language variant detection

Context enrichment
  Enrich content with external information
    context linking to news media
      mentioned entities
    expanded URL
      Web page title and body text
      other tweets linking to the same URL
    hashtags
      other tweets containing the same hashtags

TweeProfiles: detection of spatio-temporal patterns on Twitter Tiago Cunha

Context
  TwitterEcho
    Geo-referenced and timestamped tweets
  Analysis module: clustering in 4 dimensions
    Spatial
    Temporal
    Social
    Content
  Visualization Tool
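Clustering over four dimensions needs some way to compare two tweets across all of them. One possible approach, shown here purely as an illustration, is a weighted sum of per-dimension distances scaled to [0, 1]. The haversine and Jaccard measures, the same-user social distance, the weights, the caps, and the tweet fields are all assumptions and not the TweeProfiles method.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def combined_distance(t1, t2, weights=(0.25, 0.25, 0.25, 0.25),
                      max_km=1000.0, max_hours=24.0):
    """Weighted sum of spatial, temporal, social and content distances."""
    spatial = min(haversine_km(t1["lat"], t1["lon"], t2["lat"], t2["lon"]) / max_km, 1.0)
    temporal = min(abs(t1["ts"] - t2["ts"]) / 3600.0 / max_hours, 1.0)
    social = 0.0 if t1["user"] == t2["user"] else 1.0  # crude placeholder
    content = jaccard_distance(t1["tokens"], t2["tokens"])
    parts = (spatial, temporal, social, content)
    return sum(w * d for w, d in zip(weights, parts))

# Made-up tweets located roughly in Porto and Lisbon.
porto = {"lat": 41.15, "lon": -8.61, "ts": 0, "user": "ana",
         "tokens": ["euro2012", "golo"]}
lisbon = {"lat": 38.72, "lon": -9.14, "ts": 7200, "user": "rui",
          "tokens": ["euro2012", "jogo"]}
print(combined_distance(porto, lisbon))
```

A distance of this shape can be passed to any clustering algorithm that accepts a custom or precomputed distance, and the weights control how much each dimension contributes.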

Vision

Objectives

Motivation
  Clustering innovation: 4 dimensions
  Use spatio-temporal tweets
  What, where, when and by whom?

State of the art
  Clustering algorithms
  Distance functions
  Spatio-temporal data visualization tools
  Result evaluation
    Scientific and user related

Proposed solution

Search and Extraction of Information from Online Discussion Groups Jorge Moreira

Objectives
  Scalable crawler
  Data indexing

Discussion forum

Technologies

Architecture

Future Work Classification algorithms: Author Ranking Extension for blogs