New Frontiers of Automated Content Analysis in the Social Sciences


Symposium on the New Frontiers of Automated Content Analysis in the Social Sciences
University of Zurich, July 1-3

Abstract

Automated Content Analysis (ACA) is one of the key fields of methodological innovation in the social sciences, not least because of the growing need to analyze the increasing number of digitally available text collections. Our goal is to bring together computational linguists and social scientists in order to improve the dialogue between the two research communities and to exploit mutual benefits for the advancement of ACA in the social sciences. More precisely, our program pairs social scientists and computational linguists in thematically coherent sessions on event analysis, trend identification, text classification, text scaling and sentiment detection. This setup should enable social scientists to gain insights into the sophisticated methodological instruments of computational linguistics in order to enhance their analyses. Computational linguists, in turn, have the opportunity to apply their concepts and instruments to the vast array of research questions debated in the social sciences.

The conference is jointly organized by the Swiss National Center of Competence in Research (NCCR) Democracy, the Stein Rokkan Chair of the European University Institute, and the Department of Political Science at the University of Zurich.

Organisers

Prof. Gerold Schneider and Dr. des. Bruno Wueest (NCCR Democracy)
Prof. Hanspeter Kriesi (European University Institute)
Prof. Silja Häusermann (University of Zurich)

Speakers and topics

A: Social sciences
B: Computational linguistics

Introductory keynote (July 1, 17:00-18:00, UZH main building, KOL-H-317)
B. Kathleen McKeown

Session 1 Extracting Complex Relational Data
Chair: Jasmine Lorenzini
A. Alexander Hanna and Pamela Oliver: Automated Coding of Protest Event Data
B. Peter Makarov and Klaus Rothenhäusler: Towards Automated Protest Event Analysis

Session 2 Retrieving Events from Large-Scale Data
Chair: Bruno Wueest
A. Wouter van Atteveldt: Using grammatical clauses for social and semantic network analysis
B. Vasileios Lampos: Extracting Interesting Concepts from Large-Scale Textual Data

Session 3 Trend Identification
Chair: Swen Hutter
A. Bruno Wueest: Taking care of time dependency and theoretical mismatch in topic models of political attention
B. Michael Amsler and Gerold Schneider: Data-Driven and Linguistically Motivated Trend Identification

Session 4 Enhancing Text Classification
Chair: Silja Häusermann
A. Nils Weidmann and Mihai Croicu: Improving the Selection of News Reports for Event Coding Using Ensemble Classification
B. Jordan Boyd-Graber: Interactive Topic Modeling for Labeling and Making Sense of Large Corpora

Session 5 Data-driven vs. Annotation-driven Text Mining
Chair: Thomas Kurer
A. Martin Wettstein and Werner Wirth: Semi-automated content analysis of news texts
B. Andrew Salway: Some possibilities and limits of data-driven content analysis

Session 6 Actor-level Sentiment
Chair: Gerold Schneider
A. Martin Haselmayer and Marcelo Jenny: Dictionary-based Sentiment Analysis with Crowdcoding
B. Jochen Leidner: A Critical Analysis of Sentiment Analysis

Session 7 Text Scaling / Document-level sentiment
Chair: Hanspeter Kriesi
A. Will Lowe: Scaling things we can count
B. Ralf Steinberger: Observing trends in multilingual media analysis

Closing keynote (July 3, 16:30-17:30, UZH main building, KOL-H-317)
A. Justin Grimmer

Outline

Although ACA is on the verge of becoming a standard tool for social scientists, scholars still dispute its promises and pitfalls. Existing approaches to analyzing unstructured text data, developed mainly in computational linguistics, therefore need to be amended and adapted to the specific requirements of social scientific studies. To achieve this, we bring together computational linguists and social scientists who share an interest in the analysis of large-scale text data. The conference is structured into seven thematic sessions, framed by an introductory and a closing keynote speech.

Prof. Kathleen McKeown will give the introductory talk from the perspective of computational linguistics. Her presentation will focus on the potential of computational linguistics for the social sciences. While possible applications seem abundant, integrating computational linguistic approaches into social scientific research frameworks may well pose major challenges. The closing talk will be given by Prof. Justin Grimmer. He will sum up the most important findings of the conference and give an outlook on the most likely advancements in the social scientific application of ACA in the near future.

Session 1 Extracting Complex Relational Data

Events such as the eruption of political protest or hostilities in armed conflicts are the unit of enquiry of many social scientific analyses. The conceptual and operational specifications of what constitutes an event vary significantly. What all event analyses have in common, however, is that a combination of several individual indicators is necessary to specify an event. At the most basic level, events are usually defined by the relation of an action, a date, and a location. When working with large-scale text data, this relation mining task of linking the single indicators to an event is far from trivial, especially since further indicators, such as the goals of the action and the actors involved, are frequently added. Hence, one of the major challenges of automated event analysis is to build models that can extract events defined as compounds of single indicators. We have invited two research teams (Hanna and Oliver; Makarov and Rothenhäusler), who will report on their progress in dealing with this challenge. Both teams are in the process of creating a system for the automated recognition of political protest events, the former dealing with protest events in the US and the latter in Europe.

Session 2 Retrieving Events from Large-Scale Data

The output that social scientists need from event analyses is information on the actual occurrence of events, not merely the number of mentions of these events in the data. The insights on the recognition of events from session 1 thus have to be complemented with approaches for aggregating the single event instances found in the data. If event data is retrieved largely through automated procedures, two challenging problems arise. First, an aggregative model needs to be able to distinguish between reports belonging to the same event and reports covering different events. Second, the data source is almost certainly biased in how frequently it reports on particular events. The most pressing issue here is how these biases can be assessed and controlled for. This session includes two presentations (van Atteveldt; Lampos) approaching such questions from different perspectives.

Session 3 Trend Identification

This session deals with models for exploring corpora in which documents have a sequential order. Agenda research, i.e. the study of attention to political topics over time, is a prominent research area in the social sciences where such corpora are used. Serial correlation in these corpora can be both a curse and a blessing. On the one hand, time-specific dynamics in textual data can be used directly to identify trends. On the other hand, the general evolution of language over time needs to be taken into account in studies measuring time-invariant concepts such as topic categories. While this may complicate the tracking of topics, short-term linguistic changes, particularly the introduction of new terms and multi-word units, can equally serve as a useful instrument. The presenters in this session (Gilardi, Wueest and Giovanoli; Amsler and Schneider) will thus propose and evaluate approaches that deal with trends in different ways.

Session 4 Enhancing Text Classification

This session will present and discuss new approaches to classifying textual content. Classification is one of the most frequent tasks in content analysis, including in the social sciences. An important issue in this area of research is the frequent mismatch between the researcher's theoretical expectations and the results of unsupervised text classifications. While inductively generated text classifications are statistically sound, they often deviate considerably from the researchers' conception of the structure of the data. Supervised classifications, in contrast, may suffer from poor predictive robustness if the classes strongly confound the statistical properties of the data. The first presentation, by Boyd-Graber and Hu, discusses a specific model that reconciles the potential conflict between theoretical expectations and statistical predictions. The second presentation, by Weidmann and Croicu, presents an application and extensive evaluation of a supervised classification on a large newswire corpus.

Session 5 Data-driven vs. Annotation-driven Text Mining

The participants of this session (Wettstein and Wirth; Salway) are invited to engage with a fundamental question of content analysis in the social sciences: whether we should approach text mining in a deductive or rather an inductive way. Most social scientists expect manual approaches to the quantification of content to remain indispensable for some tasks, at least in the near future. The question thus arises whether and how computational models can support human-generated data collections. An opposite perspective argues for a largely data-driven content analysis. The idea here is to automatically augment representations of text content until results come close to the concepts social scientists want to explore. We expect that the comparison of these two perspectives will lead to a particularly fruitful exchange on the possibilities and constraints of automated content analysis.

Session 6 Actor-level Sentiment

The identification of tonality in language is essential for many social scientific research questions, above all for analyses of political rhetoric and discourse. For many such applications, however, sentiment measures are only valuable if they can be attributed to political actors. In most cases, this involves detecting sentiment at the level of statements and a model relating this sentiment to the speakers communicating them. Among the pressing questions for this session are thus a) how tonality can be measured at the level of single statements such as sentences and speech acts and b) how this tonality can be related to speakers and addressees so that information on the intensity of political conflict can be generated. Presenters in this session are Haselmayer and Jenny as well as (tba).

Session 7 Text Scaling / Document-level sentiment

Research in the social sciences has already brought forward an impressive array of approaches that aim to locate text on latent scales such as ideological dimensions or document-level sentiment. These efforts have developed largely independently of similar advances in computational linguistics, which means the potential for interdisciplinary exchange seems especially large in this area. The presentation by Lowe will cover the most recent advancements in this field from the social scientific perspective. Ralf Steinberger will complement the session by showing how computational linguists generalize such approaches to the study of trends over time, across different languages, and in different media.


Survey Results: Requirements and Use Cases for Linguistic Linked Data Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group

More information

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental &Geomatic Engineering,

More information

COURSE DESCRIPTION FOR THE BACHELOR DEGREE IN INTERNATIONAL RELATIONS

COURSE DESCRIPTION FOR THE BACHELOR DEGREE IN INTERNATIONAL RELATIONS COURSE DESCRIPTION FOR THE BACHELOR DEGREE IN INTERNATIONAL RELATIONS Course Code 2507205 Course Name International Relations of the Middle East In this course the student will learn an historical and

More information

Workshop Series on Open Source Research Methodology in Support of Non-Proliferation

Workshop Series on Open Source Research Methodology in Support of Non-Proliferation The International Centre for Security Analysis The Policy Institute at King s King s College London Workshop Series on Open Source Research Methodology in Support of Non-Proliferation Workshop 1: Exploiting

More information

THE BACHELOR S DEGREE IN SPANISH

THE BACHELOR S DEGREE IN SPANISH Academic regulations for THE BACHELOR S DEGREE IN SPANISH THE FACULTY OF HUMANITIES THE UNIVERSITY OF AARHUS 2007 1 Framework conditions Heading Title Prepared by Effective date Prescribed points Text

More information
