Agenda
11:30 Welcome + quick progress report and status summary
11:45 Task leaders summarize ongoing activities (10 min each, max)
12:30 Break
14:00 Technical Presentations
15:00 Break
16:00 Short Technical Presentations / Demos
18:00 Directions for next meeting / next workshop
18:15 Meeting ends

Technical Presentations
Arian Pasquali, FEUP, Data Collection Platform
David Batista, INESC-ID, Semantic Relation Extraction

Short Technical Presentations (5-10 min each)
Silvio Moreira - Replab + Semeval
João Santos - Entity Disambiguation
Carolina Bento - Co-occurrence networks
João Oliveira - n-grams, Público 10 anos (10 years of Público)
Raquel Albuquerque - Data Journalism at Público
Francisco Couto - O mundo em Pessoa
Pedro Saleiro - Filtering at Replab
Jorge Teixeira - Timeline
Gustavo Laboreiro - Data Preparation
Tiago Cunha - TweeProfiles (PS)
Tomy Rodrigues - RetweePatterns

The Problem...
Computational journalism, aka database journalism
Intensive use of software tools for news research, production and presentation
What is the impact on the routines of newsrooms?
What effect will these tools have on the quality of news and the productivity of journalists?

Challenges
1. Automatic content analysis (documents, news, blogs, micro-blogs, comments)
2. Automatic analysis of explicit and implicit social networks
3. Design of rich visualization and interaction interfaces
4. Case-study evaluation of the developed computational journalism methodology in a production setting. Critical analysis of practical impact on newsroom quality, efficiency, and economics.

Partnership
LASIGE, FCUL >> INESC-ID, IST (Mário J. Silva, Paula Carvalho and Francisco Couto from FCUL)
LIACC, FEUP (Eugénio de Oliveira, Eduarda M. Rodrigues, Luís Sarmento, Carlos Soares)
CIMJ, FCH/UNL (António Granado)
Austin: School of Information and Computer Science at Austin (Luis Francisco-Revilla, Matthew Lease)
PT Comunicações, SAPO (Benjamim Júnior, Celso Martinho, Luís Sarmento, Pedro Torres)
Público (Sérgio B. Gomes)

Students
INESC-ID: David Batista, Silvio Moreira, Diogo Figueiredo, João Ramalho, João Oliveira, João Santos, Carolina Bento, Rui Silva, David Forte
UP: Matko Bosnjak, Arian Pasquali, Gustavo Laboreiro, Andrija Cajic, Nuno Baldaia, Tiago Cunha, Jorge Moreira, Jorge Teixeira, Luís Rei (SAPO)
UT Austin: Hohyon Ryu, Steven Fazzio
UNL: Raquel Albuquerque, Tiago Carvalho

Research tasks
1. Information Mining
2. Information Discovery
3. Web Community Sensing
4. Tracking Information Flow
5. Interaction and Personalization
6. Query and Visualization
7. Computational Newsroom

Research tasks - Leaders
1. Information Mining: Paula Carvalho
2. Information Discovery: Bruno Martins (was Francisco)
3. Web Community Sensing: Carlos Soares (was Eduarda)
4. Tracking Information Flow: Francisco Couto (was Matt)
5. Interaction and Personalization: Mário J. Silva (was Revilla)
6. Query and Visualization: Carlos Soares (Sarmento, Eduarda)
7. Computational Newsroom: António Granado (Mário covers)

Information Mining
Development of robust linguistic resources to process different types and genres of texts
  Knowledge resources about media personalities: recognizing and resolving references to named entities
  Sentiment lexicons and grammars: detecting the polarity of opinions about relevant personalities
  Annotated corpora: training different text classifiers and evaluating classification procedures

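As a rough illustration of how a sentiment lexicon can be used to detect polarity, here is a minimal sketch. The lexicon entries, the negator list, and the function name `polarity` are all hypothetical and chosen for the example; they are not the project's actual resources, which would be far larger and grammar-aware.

```python
# Toy sentiment lexicon: +1 for positive words, -1 for negative words.
# These entries are illustrative assumptions, not project data.
LEXICON = {
    "good": 1, "excellent": 1, "honest": 1,
    "bad": -1, "corrupt": -1, "weak": -1,
}
# Words that flip the polarity of the next sentiment word (assumed list).
NEGATORS = {"not", "never", "no"}


def polarity(sentence):
    """Return a signed polarity score for a sentence.

    Each lexicon word contributes +1 or -1; a preceding negator
    flips the sign of the next sentiment word found.
    """
    score = 0
    negate = False
    for token in sentence.lower().split():
        word = token.strip(".,;:!?")
        if word in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        if value:
            score += -value if negate else value
            negate = False
    return score
```

A positive score suggests a favourable opinion, a negative score an unfavourable one, and zero means no lexicon word was found. Real opinion mining about personalities would also need to link the opinion to the entity it targets.
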
Information Discovery
Relationship extraction techniques to support information discovery in journalists' activities
  Entity Ranking: finding the relevant entities for a given topic
  Entity Distillation: finding relevant resources for a given entity
  Attribute Selection: finding a list of key aspects to compare and differentiate a given set of entities

Web Community Sensing
Modeling the credibility and authority of news sources and opinion makers in social networks
Identifying influential individuals and experts on a given news topic
Monitoring the community reaction to news stories and the polarity of opinions

Tracking Information Flow
Identifying the originating source of new ideas and information
Understanding the evolutionary development of ideas through their iterative retelling and revision over time and across sources
Detecting cases and patterns of re-use (e.g. via memes or larger units of similar text) and information flow, for source identification and novelty detection

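One common way to detect re-use of similar text is to compare overlapping word n-grams ("shingles") between two documents. The sketch below uses Jaccard similarity over word trigrams; it is an assumed, simplified technique for illustration, and the function names and the choice of n=3 are hypothetical rather than the project's method.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reuse_score(source, candidate, n=3):
    """Score in [0, 1]: share of word n-grams common to both texts."""
    return jaccard(shingles(source, n), shingles(candidate, n))
```

A score near 1 suggests the candidate largely repeats the source, while a score near 0 suggests novel text. Combined with publication timestamps, the earliest document in a high-scoring cluster is a plausible candidate for the originating source.
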
Interaction and Personalization
Determining which interaction and personalization mechanisms are best suited to:
  Significantly enhance the user experience
  Provide the news site with useful, tacit feedback about its readers' needs
Investigating interactive news interfaces that support both automatic and manual personalization for readers

Query and Visualization
Development of tools for querying extracted information and visualizing annotated documents and datasets
Continuous scanning of the social web, news sources and various kinds of data streams
SAPO already scans and processes many of these streams, in particular the news media

Computational Newsroom
Environment where the new tools and resources developed in the project, together with other software, will be accessible
Will use tools and collect data for case studies to be evaluated
  Observation and structured interviewing of the journalists in contact with the developed tools
The research will try to contextualize the changing nature of media work

More details
Started October 1st, 2010; duration 3 years
http://dmir.inesc-id.pt/reaction/
1st milestone: End of Month 6
  Specification
  First toolset prototype (should have demoed it at the 2011 Collaboratory)
2nd milestone: End of Month 36
  Demonstrable Computational Newsroom
Asking for an Extension