The Science in the Media Monitor (SMM) system
Federico Neresini
University of Padua - PaSTIS research unit
Observa Science in Society
SMM makes it possible to:
- monitor different Web sources (newspapers, blogs, forums, sites);
- integrate different text classification procedures;
- detect articles/texts in which S&T play an important role;
- compare articles/texts highly related to S&T with articles/texts not specifically related to S&T;
- make available the corpus of texts and/or the metadata regarding each group of selected articles/texts (export function);
- provide statistics about S&T coverage by source.
Sources and archives
- Italian news: 8 on-line newspapers (starting from 2008; today ~1 million articles)
- Italian news, artificial week sample: 2 newspapers, 1992-2013, 163,000 articles
- Italian blogs: starting from 2013 (500 blogs, ~1000 posts/day)
- English news: from 2014 (5 on-line newspapers: NYTimes, Guardian, Mirror, Telegraph, Times of India; ~1000/day)
- French news: from 2014 (5 on-line newspapers: Figaro, Lacroix, Le Monde, Les Echos, Liberation, Parisien; ~1000/day)
S&T salience, Italian newspapers 2008-2014
Salience (% of relevant S&T articles out of all articles), 2008-2014, four newspapers (Corriere della Sera, La Repubblica, Sole 24 Ore, La Stampa)
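As a minimal sketch of the salience indicator defined on this slide (share of relevant S&T articles out of all articles, as a percentage): the function name and the counts below are hypothetical and are not SMM data.

```python
def salience(st_articles: int, total_articles: int) -> float:
    """Percentage of S&T-relevant articles among all articles."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return 100.0 * st_articles / total_articles


# Hypothetical yearly counts (S&T-relevant, total), for illustration only.
example_counts = {"2008": (50, 1000), "2009": (30, 1200)}

salience_by_year = {
    year: salience(st, total) for year, (st, total) in example_counts.items()
}
```

The same function could be applied per newspaper and per year to obtain one time series for each source.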
SMM home interface
Search Interface
SELECTING TECHNO-SCIENTIFIC RELATED ARTICLES FOR GROUND-TRUTH SAMPLE
SOFTWARE: MALLET (open source) TAG: 24447 DATABASE: 10920
As a social activity, technoscience is:
- done by someone;
- done inside an organization;
- done through a full range of communicative processes;
- done by designing, testing, developing and using instruments;
- done by involving some specific objects and giving rise to specific artifacts;
- described by narratives that use specific words and expressions.
Considering all these features, technoscientific related articles should contain at least two of the following elements:
1) a scientist/engineer is mentioned;
2) a research centre is mentioned;
3) a scientific journal is mentioned;
4) a scientific discipline is mentioned (excluding humanities and social sciences);
5) there is a generic reference to research processes and/or technological innovations;
6) a discovery, an innovation, a scientific instrument or a medical apparatus is mentioned.
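The "at least two of six elements" rule can be sketched as follows. This is an illustration only: the class and field names are hypothetical, and the sketch assumes that the presence of each element has already been decided upstream (for example by a human coder); it does not detect the elements in the text.

```python
from dataclasses import dataclass, fields


@dataclass
class CodedArticle:
    # One flag per element of the list above (hypothetical field names).
    mentions_scientist_or_engineer: bool = False
    mentions_research_centre: bool = False
    mentions_scientific_journal: bool = False
    mentions_scientific_discipline: bool = False  # humanities/social sciences excluded
    refers_to_research_or_innovation: bool = False
    mentions_discovery_or_instrument: bool = False


def element_count(article: CodedArticle) -> int:
    """Number of the six elements present in the article (0 to 6)."""
    return sum(bool(getattr(article, f.name)) for f in fields(article))


def is_technoscience_related(article: CodedArticle, threshold: int = 2) -> bool:
    """Apply the 'at least two elements' criterion."""
    return element_count(article) >= threshold
```

For example, an article flagged for a scientist and a research centre satisfies the criterion, while one that only mentions a scientific journal does not.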
INTERCODER RELIABILITY

coders    disagreement %
A B       1.8
B A       7.9
C D       11.7
D A       1.3
A D       14.7
B C       3.3
C B       8.7
D C       4.1
average   6.69
min       1.3
MAX       14.7

Each article has been coded on a scale ranging from 0 to 6.
% disagreement: a difference of 2 on the scale from 0 to 6; if the difference between coders is equal to 2 but results from 6-4 or 5-3, it has not been considered a disagreement.
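The disagreement rule can be sketched as below. Two points are assumptions not stated on the slide: differences larger than 2 are also counted as disagreements, and the percentage is taken over all articles coded by both coders. Function names are hypothetical.

```python
from typing import Iterable, Tuple


def is_disagreement(score_a: int, score_b: int) -> bool:
    """True if two 0-6 scores count as a disagreement under the slide's rule."""
    diff = abs(score_a - score_b)
    if diff < 2:
        return False
    # Exception from the slide: 6 vs 4 and 5 vs 3 are not disagreements.
    if {score_a, score_b} in ({6, 4}, {5, 3}):
        return False
    # ASSUMPTION: any other difference of 2 or more is a disagreement.
    return True


def disagreement_rate(pairs: Iterable[Tuple[int, int]]) -> float:
    """Percentage of doubly coded articles on which the two coders disagree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no coded pairs given")
    return 100.0 * sum(is_disagreement(a, b) for a, b in pairs) / len(pairs)
```

With the hypothetical pairs (0, 2), (6, 4), (1, 1) and (2, 5), two of the four count as disagreements, giving a rate of 50%.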
THE RESULTING GROUND-TRUTH
SAMPLE: NUMBER OF ARTICLES

                               FIX                       PROP
                               restrictive  inclusive    restrictive  inclusive
NOT RELATED TO TECHNOSCIENCE   2648         2607         3962         3921
RELATED TO TECHNOSCIENCE       1170         1211         1170         1211
TOTAL                          3818         3818         5132         5132

RESTRICTIVE: if the intercoder difference = 1, the chosen value is the lower; difference = 2, chosen value = intermediate; difference = 3, chosen value = the lower + 1.
INCLUSIVE: if the intercoder difference = 1, the chosen value is the higher; difference = 2, chosen value = intermediate; difference = 3, chosen value = the higher - 1.
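The two reconciliation rules can be sketched as a single function. The function name and the `mode` parameter are hypothetical. Two cases are not covered by the slide and are handled here by assumption: identical scores are returned unchanged, and differences larger than 3 raise an error.

```python
def reconcile(score_a: int, score_b: int, mode: str = "restrictive") -> int:
    """Merge two coders' 0-6 scores into one value (restrictive or inclusive)."""
    if mode not in ("restrictive", "inclusive"):
        raise ValueError("mode must be 'restrictive' or 'inclusive'")
    low, high = sorted((score_a, score_b))
    diff = high - low
    if diff == 0:
        return low  # ASSUMPTION: agreement keeps the shared value
    if diff == 1:
        return low if mode == "restrictive" else high
    if diff == 2:
        return low + 1  # the intermediate value, in both modes
    if diff == 3:
        return low + 1 if mode == "restrictive" else high - 1
    # ASSUMPTION: larger differences are not specified on the slide.
    raise ValueError("difference greater than 3 is not covered by the rules")
```

For scores 1 and 4, the restrictive rule gives 2 and the inclusive rule gives 3; for scores 1 and 3, both rules give 2.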
LDA automated topic modeling, 1992-2010
6 S&T topics identified with LDA (automated topic detection on relevant S&T coverage); 1992-2010, n=3293
S&T Topics 2010-2012

2010-12 TOPICS            MAIN KEYWORDS                     WEIGHT
1. ECONOMICS              ITALY, RESEARCH, UNIVERSITY       0.19
2. SMARTPHONES            APPLE, SMARTPHONE, TABLET         0.11
3. WEB                    INTERNET, WEB, GOOGLE             0.11
4. HEALTH                 STUDY, RISK, WOMEN                0.11
5. RENEWABLES             ENERGY, RENEWABLES, EMISSION      0.10
6. BIOMEDICINE            CELLS, RESEARCH, RESEARCHERS      0.08
7. SCIENCE & RELIGION     SCIENCE, LIFE, MAN                0.08
8. NUCLEAR POWER          NUCLEAR, OBAMA, WAR               0.05
9. ECOLOGY                WATER, SEA, SPECIES               0.08
10. SPACE                 EARTH, SPATIAL, NASA              0.05
11. ROBOTICS              DEGREE, TECHNOLOGY, COMPUTER      0.15
12. CAR INNOVATION        CAR, KM, MOTOR                    0.05
13. MEDICINE & CARE       PATIENTS, DISEASE, DRUGS          0.05
14. FAMILY & SEXUALITY    WOMEN, KIDS, PARENTS              0.06
Public opinion on nuclear power 2002-2011
Nuclear power newspaper coverage 1992-2012
Media and public opinion correlations
Mazur's hypothesis and risky terms indicators on newspapers, blogs and web forums