VERBATIM Automatic Extraction of Quotes and Topics from News Feeds
1 VERBATIM: Automatic Extraction of Quotes and Topics from News Feeds
Luís Sarmento and Sérgio Nunes. 4th Doctoral Symposium on Informatics Engineering, Porto, Portugal, February 5-6, 2009.
2 Verbatim: Motivation
The growth in information production poses increasing challenges to consumers (information overflow), which calls for tools that work as personal information butlers. Verbatim acquires information from live news feeds, extracts quotes and topics, and presents this information in a web interface. It can act as an automatic watchdog by confronting quotes by the same entities on the same topics over time.
3 Related Work (I)
NewsExplorer [6] extracts quotations from multilingual news: it extracts the quotes, the name of the entity making each quote, and the entities mentioned. Krestel et al. [5] describe the development of a reported-speech extension to the GATE framework for English. In [4], the authors propose the TF*PDF algorithm for extracting terms to be used as descriptive tags; however, most of these tags are quite uninformative and inappropriate as high-level topic tags.
4 Related Work (II)
In-Quotes, from Google, presents a web-based interface structured in issues (i.e. topics) and displays side-by-side quotes from two actors at a time; however, no implementation details are known. Our work is different: it is focused on a single language (Portuguese), and it addresses the problem of topic extraction and distillation, while most related works assume that news topics have been previously identified.
5 System Overview
The system is organized into the following modules: Data Acquisition and Parsing; Quote Extraction; Removal of Duplicates; Topic Classification (Topic Identification and Generation of the Training Set, Training the Topic Classifiers, Topic Classification Procedure); Web Interface; Update Routines.
6 Data Acquisition and Parsing
News is gathered from a fixed set of feeds from major Portuguese mainstream media sources; this initial selection includes only generic mainstream sources. Using a fixed set of feeds avoids the major challenges faced in web crawling. We customized the content-decoding routines for each individual source. Fetching is performed every hour on all sources, and content is stored on the server in UTF-8 encoding.
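A minimal sketch of per-source decoding, assuming each feed item arrives as raw bytes and that sources differ mainly in character encoding and embedded HTML. The source names (`source_a`, `source_b`), their encodings, and all helper names are hypothetical; the slides do not describe the actual routines.

```python
import html
import re

# Hypothetical per-source decoders: each turns a raw feed payload into clean text.
# The source names and their encodings are illustrative assumptions.

def _strip_tags(text: str) -> str:
    """Replace HTML tags with spaces, then unescape entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text)).strip()

def decode_source_a(raw: bytes) -> str:
    # Assumed to publish in ISO-8859-1 (Latin-1).
    return _strip_tags(raw.decode("iso-8859-1"))

def decode_source_b(raw: bytes) -> str:
    # Assumed to publish in UTF-8 already.
    return _strip_tags(raw.decode("utf-8"))

DECODERS = {"source_a": decode_source_a, "source_b": decode_source_b}

def normalize_item(source: str, raw: bytes) -> bytes:
    """Decode with the source-specific routine and re-encode as UTF-8 for storage."""
    text = DECODERS[source](raw)
    text = re.sub(r"\s+", " ", text)  # collapse whitespace left by removed tags
    return text.encode("utf-8")
```

Keeping one decoder per source mirrors the slide's point that decoding was customized per source; adding a source means registering one more function in `DECODERS`.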
7 Quote Extraction
Quotes can be expressed in a large variety of ways. To avoid anaphora resolution, we only address quotes that explicitly mention the name of the speaker. More specifically, we look for sentences in the body of the news feed item that match the following pattern: Name of Speaker, Optional Ergonym, Speech-Act, Modifier, Quote. Example: "O Primeiro-ministro, José Sócrates, anunciou esta terça-feira que o Itinerário Principal 4 (IP4), que liga Vila Real a Bragança, será transformado em auto-estrada daqui a três anos" (The Prime Minister, José Sócrates, announced this Tuesday that Itinerário Principal 4 (IP4), which links Vila Real to Bragança, will be turned into a motorway within three years). The matching patterns use 35 speech acts and match about 5% of the news. Recall is low at this stage, but precision is high.
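The pattern can be sketched as a regular expression. This is an illustrative approximation, not the system's actual rule set: it follows the field order of the example sentence (optional ergonym, then name), assumes the quote is introduced by "que", and uses a small hypothetical list of five speech-act verbs instead of the 35 used by Verbatim.

```python
import re

# Hypothetical subset of speech-act verbs (Verbatim uses 35).
SPEECH_ACTS = ["anunciou", "afirmou", "disse", "declarou", "garantiu"]

QUOTE_PATTERN = re.compile(
    r"^(?:(?P<ergonym>[^,]+),\s*)?"               # optional ergonym, e.g. "O Primeiro-ministro,"
    r"(?P<speaker>(?:[A-ZÀ-Ý][\w-]*\s?)+),?\s*"    # capitalised name, e.g. "José Sócrates"
    r"(?P<act>" + "|".join(SPEECH_ACTS) + r")\b"   # speech act, e.g. "anunciou"
    r"(?P<modifier>.*?)\s*\bque\s+"                # optional modifier, up to the first "que"
    r"(?P<quote>.+)$"                              # reported content
)

def extract_quote(sentence: str):
    """Return the quote fields as a dict, or None if the sentence does not match."""
    m = QUOTE_PATTERN.match(sentence.strip())
    if m is None:
        return None
    return {
        "speaker": m.group("speaker").strip(),
        "ergonym": (m.group("ergonym") or "").strip() or None,
        "speech_act": m.group("act"),
        "modifier": m.group("modifier").strip() or None,
        "quote": m.group("quote").rstrip(". "),
    }
```

On the example sentence this yields speaker "José Sócrates", ergonym "O Primeiro-ministro", speech act "anunciou" and modifier "esta terça-feira". Requiring a capitalised name keeps precision high at the cost of recall, in line with the trade-off described on the slide.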
8 Removal of Duplicates (I)
It is common to find duplicate or near-duplicate news items, from which duplicate quotes will be extracted. We therefore try to aggregate the most similar quotes into quote groups $Q_1, Q_2, \ldots, Q_{last}$. Each new quote $q_{new}$ is compared with the $k$ most recent quote groups, $Q_{last}, Q_{last-1}, \ldots, Q_{last-k+1}$. If the similarity between $q_{new}$ and any of these groups is higher than a given threshold $s_{min}$, then $q_{new}$ is added to the most similar group.
9 Removal of Duplicates (II)
Otherwise, a new group $Q_{new}$ is created, containing only $q_{new}$. The new quote $q_{new}$ is compared with the longest quote of each group. First, we check whether the speakers are the same. Then, content similarity is computed: each quote is represented as a vector using a binary bag-of-words approach (stop words are removed), and the vectors are compared with the Jaccard coefficient, $J(A, B) = |A \cap B| / |A \cup B|$. If $J > 0.25$, the quotes are considered duplicates (i.e. $s_{min} = 0.25$).
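A sketch of the grouping step under stated assumptions: a tiny illustrative Portuguese stop-word list, a hypothetical window of k = 10 recent groups (the slides do not give k), and s_min = 0.25. Each quote is reduced to a binary bag of words (a set of tokens), and a new quote is compared only with groups by the same speaker, using the longest quote in each group.

```python
import re
from dataclasses import dataclass, field

# Tiny illustrative stop-word list; a real system would use a full Portuguese list.
STOP_WORDS = {"a", "o", "e", "de", "que", "em", "da", "do", "os", "as"}

def binary_bow(text: str) -> set:
    """Binary bag of words: the set of non-stop-word tokens."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

@dataclass
class QuoteGroup:
    speaker: str
    quotes: list = field(default_factory=list)

    def longest(self) -> str:
        return max(self.quotes, key=len)

def add_quote(groups, speaker, quote, k=10, s_min=0.25):
    """Add a quote to the most similar recent group, or open a new group."""
    q = binary_bow(quote)
    best, best_sim = None, s_min
    for group in groups[-k:]:  # only the k most recent groups
        if group.speaker != speaker:  # speakers must match first
            continue
        sim = jaccard(q, binary_bow(group.longest()))
        if sim > best_sim:  # above threshold and better than any previous group
            best, best_sim = group, sim
    if best is None:
        best = QuoteGroup(speaker)
        groups.append(best)
    best.quotes.append(quote)
    return best
```

Restricting the search to the last k groups bounds the cost of each insertion, which matters for a routine that runs every hour on live feeds.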
10 Topic Classification
Verbatim assigns a topic tag to each quote. The news covers a wide variety of topics, and new, previously unseen topics appear as more news is collected. Efficient topic classification of news therefore requires: dynamically identifying new topic tags as they appear in the news; automatically generating a training set using the new topic tags; and re-training the topic classification procedure.
11 Identification of Topics & Generation of Training Set
Topic tags are identified by mining a common structure in news titles: "topic tag: title headline". For example: Literatura: "A viagem do elefante", de José Saramago, tem lançamento mundial quinta-feira em São Paulo... (Literature: José Saramago's "A viagem do elefante" has its worldwide launch on Thursday in São Paulo...). From about 26,000 news items, we found 783 different topic tags, counting only tags that occur in two or more titles. Generation of the training set: for every tag $t_i$ in the set of topic tags found, $T$, we group the set of news items for that topic, $I_i = \{i_1^i, i_2^i, \ldots, i_{n_i}^i\}$. We denote the complete training set, i.e. the tags in $T$ paired with their item groups $I_i$, as $TI$.
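Tag mining can be approximated with a regular expression over titles. The sketch below assumes a tag is a short prefix (at most 40 characters, a hypothetical limit) before the first colon, and keeps only tags seen in at least two titles, as on the slide. It returns, for each tag t_i, the indices of its news items I_i.

```python
import re
from collections import defaultdict

# A tag is assumed to be a short prefix (hypothetical limit: 40 chars) before the first colon.
TAG_PATTERN = re.compile(r"^\s*([^:]{1,40}?)\s*:\s*(.+)$")

def mine_topic_tags(titles, min_titles=2):
    """Map each topic tag to the indices of the news items whose title carries it."""
    groups = defaultdict(list)
    for idx, title in enumerate(titles):
        m = TAG_PATTERN.match(title)
        if m:
            groups[m.group(1)].append(idx)
    # Keep only tags that occur in at least `min_titles` titles.
    return {tag: items for tag, items in groups.items() if len(items) >= min_titles}
```

The length limit is one simple guard against treating an ordinary sentence that happens to contain a colon as a topic tag.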
12 Training the Topic Classifiers
We use two different text classification approaches: Rocchio classification [7] and SVM [2]. Both involve representing news items as feature vectors. We use a bag-of-words approach, with (word, frequency) features, to vectorize news feed items; information about the location of each word (title or body of the news item) is kept, and stop words are removed.
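One way to keep word location in a bag-of-words representation is to prefix each feature with the field it came from, so that a word in the title and the same word in the body become distinct features. This encoding is an assumption; the slides only say that location information is kept. The stop-word list is a tiny illustrative subset.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list.
STOP_WORDS = {"a", "o", "e", "de", "que", "em", "da", "do", "os", "as"}

def vectorize(title: str, body: str) -> Counter:
    """(word, frequency) features, with each word prefixed by its location."""
    features = Counter()
    for location, text in (("title", title), ("body", body)):
        for word in re.findall(r"\w+", text.lower()):
            if word not in STOP_WORDS:
                features[f"{location}:{word}"] += 1
    return features
```

For instance, `vectorize("Crise financeira", "A crise financeira agrava-se e a crise continua")` counts `title:crise` once and `body:crise` twice, and drops the stop words "a" and "e".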
13 Rocchio Classification
Rocchio classification is a straightforward way of classifying items using a nearest-neighbour strategy over class representatives. For each topic $t_i$ in a set of $T$ topics, we need to obtain $[c_i]$, a vector representing the topic class. $[c_i]$ is computed by aggregating the vectors $[i_{ij}]$ of the news items for that topic (features are TF-IDF weighted). A new item $[i_{new}]$ is classified by comparing it against the class descriptions of the $T$ classes using the cosine similarity, $\cos([i_{new}], [c_i])$.
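A self-contained sketch of Rocchio training and scoring over sparse vectors (dicts of feature to count). Assumptions not stated on the slide: the smoothed IDF variant log(N/df) + 1, L2-normalization of each item vector before summing, and unit-length centroids so that the cosine reduces to a dot product. The slides only specify TF-IDF weighting and cosine comparison.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed IDF, log(N / df) + 1 (an assumed variant), from a list of count dicts."""
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    n = len(docs)
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Terms unseen at training time are dropped.
    return {t: tf * idf[t] for t, tf in doc.items() if t in idf}

def unit(vec):
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {t: x / norm for t, x in vec.items()} if norm else {}

def cosine_unit(a, b):
    """Cosine of two unit-length sparse vectors, i.e. their dot product."""
    return sum(x * b.get(t, 0.0) for t, x in a.items())

def train_rocchio(training):
    """training maps each topic tag to a list of count dicts; returns (idf, centroids)."""
    idf = fit_idf([doc for docs in training.values() for doc in docs])
    centroids = {}
    for tag, docs in training.items():
        total = Counter()
        for doc in docs:
            total.update(unit(tfidf(doc, idf)))  # sum of normalized item vectors
        centroids[tag] = unit(total)
    return idf, centroids

def rocchio_scores(doc, idf, centroids):
    """Cosine between a new item and every class centroid."""
    v = unit(tfidf(doc, idf))
    return {tag: cosine_unit(v, c) for tag, c in centroids.items()}
```

The highest-scoring tag is the Rocchio candidate; its score plays the role of $\cos([c_k], [i])$ in the classification procedure.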
14 SVM Classification
SVMs are effective for classifying items described in high-dimensional spaces, as in text classification. Since SVMs are binary classifiers, we train one SVM for each topic $t_k$, using $I_k$ as positive examples and $I \setminus I_k$ as negative examples: $svm_k = \mathrm{train\_svm}(I_k, I \setminus I_k)$. Then, for a given news item $i_{new}$: $svm_k([i_{new}]) > 0$ if $i_{new}$ belongs to topic $t_k$, and $svm_k([i_{new}]) < 0$ otherwise. We used SVM-light [3] with default parameters.
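SVM-light is an external program, so the sketch below swaps in a different, much simpler trainer to illustrate the one-vs-rest scheme $svm_k = \mathrm{train\_svm}(I_k, I \setminus I_k)$: a linear SVM optimized with the Pegasos stochastic sub-gradient method (hinge loss plus L2 regularization), with a constant bias feature and examples visited in a fixed order. The parameters `lam` and `epochs` are hypothetical and are not SVM-light defaults.

```python
BIAS = "__bias__"  # constant feature standing in for the bias term

def with_bias(x):
    return {**x, BIAS: 1.0}

def dot(w, x):
    return sum(v * w.get(f, 0.0) for f, v in x.items())

def train_svm(positives, negatives, lam=0.01, epochs=20):
    """Linear SVM trained with Pegasos-style sub-gradient steps (not SVM-light)."""
    data = [(with_bias(x), 1.0) for x in positives] + [(with_bias(x), -1.0) for x in negatives]
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)              # decreasing step size
            violated = y * dot(w, x) < 1.0     # hinge loss is active
            w = {f: v * (1.0 - eta * lam) for f, v in w.items()}  # regularization shrink
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def decision(w, x):
    """svm_k([i]): positive means the item is predicted to belong to the topic."""
    return dot(w, with_bias(x))

def train_one_vs_rest(training):
    """One model per tag: that tag's items vs. the items of all other tags."""
    models = {}
    for tag, docs in training.items():
        negatives = [d for other, ds in training.items() if other != tag for d in ds]
        models[tag] = train_svm(docs, negatives)
    return models
```

Training one binary model per tag scales linearly with the number of tags, which is why the slide's daily re-training step trains all SVMs in one pass.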
15 Topic Classification Procedure
Let $T = \{t_1, t_2, \ldots, t_K\}$ be the set of topic tags, $i_{qt}$ the news item to classify, and $[i_{qt}]$ its vector representation. Then:
find $svm_{max} = \max_k svm_k([i_{qt}])$ and the corresponding index $k = k_{svm}$;
find $roc_{max} = \max_k \cos([c_k], [i_{qt}])$ and the corresponding index $k = k_{roc}$;
if $svm_{max} \geq min_{svm}$, assign $i_{qt}$ to topic $t_{k_{svm}}$;
else if $roc_{max} \geq min_{roc}$, assign $i_{qt}$ to topic $t_{k_{roc}}$;
else do not classify $i_{qt}$ (23% of cases).
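The decision cascade translates directly into code. The threshold values `min_svm` and `min_roc` below are hypothetical placeholders, since the slides do not report them; the function takes the per-topic SVM outputs and Rocchio cosines already computed for one item (both dicts must be non-empty).

```python
def classify(svm_scores, roc_scores, min_svm=0.0, min_roc=0.3):
    """SVM first, Rocchio as fallback, otherwise leave the item unclassified (None)."""
    k_svm = max(svm_scores, key=svm_scores.get)   # topic with the largest svm_k output
    if svm_scores[k_svm] >= min_svm:
        return k_svm
    k_roc = max(roc_scores, key=roc_scores.get)   # topic with the largest cosine
    if roc_scores[k_roc] >= min_roc:
        return k_roc
    return None
```

Returning `None` instead of forcing a label corresponds to the 23% of items the procedure leaves unclassified.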
16 Web Interface (I)
17 Web Interface (II)
18 Update Routines
Quote extraction routine (every hour): read the available web feeds; run the quote extraction procedure; store the extracted information in the database; run the quote duplicate detection routine; run the classification procedure; store the classification in the database.
Topic identification and classifier re-training (every 24 hours): run the topic detection procedure on all news in the database to build the training set $TI$; vectorize all news items; train Rocchio; train the SVMs; store the Rocchio class descriptions and the SVM models.
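The two routines run on different periods. A minimal scheduling check, assuming each job records the time of its last run; the job names are hypothetical, and the steps listed on the slide are reduced to comments.

```python
from datetime import datetime, timedelta

SCHEDULE = {
    # read feeds, extract quotes, store, detect duplicates, classify, store
    "quote_extraction": timedelta(hours=1),
    # detect topics to build TI, vectorize, train Rocchio and SVMs, store models
    "retrain_classifiers": timedelta(hours=24),
}

def due_jobs(now: datetime, last_run: dict) -> list:
    """Names of jobs whose interval has elapsed since their last run (never-run jobs are due)."""
    due = []
    for name, interval in SCHEDULE.items():
        prev = last_run.get(name)
        if prev is None or now - prev >= interval:
            due.append(name)
    return due
```

A cron-style driver would call `due_jobs` periodically, run the returned routines, and update `last_run`.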
19 Results and Error Analysis
Statistics collected since early January 2009 (47 days of uptime): 26,266 news items; 570 quotes, of which 68 (11.9%) are not actually quotes; 337 distinct named entities (6 incorrect, 1.8%); quotes spread over 197 different topics (1 incorrectly identified topic). Most of the errors have no impact on readability. Classification: 42 topic misattributions (7.4%). No recall figures are available yet.
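The reported percentages can be re-derived from the raw counts. This quick check assumes the 68 non-quotes and the 42 misattributions are counted relative to the 570 extracted quotes, and the 6 errors relative to the 337 entities.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(pct(68, 570))  # non-quotes among extracted quotes -> 11.9
print(pct(6, 337))   # incorrect named entities -> 1.8
print(pct(42, 570))  # topic misattributions -> 7.4
```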
20 Conclusions and Future Work (I)
Verbatim is a fully functional online service working over live data from the Portuguese mainstream media. The overall feedback, both online and offline, has been positive. Much work remains: increasing the number of news sources; improving quote extraction by creating new rules for other common patterns (in both the news body and the title); and resolving anaphoric references.
21 Conclusions and Future Work (II)
Improving topic extraction and classification: news sources are not consistent about the words used to describe topics, e.g. Crise Financeira (financial crisis) vs. Crise Económica (economic crisis), or Desporto (sports) vs. Futebol (football). Upgrading the end-user interface with an additional navigational axis based on temporal information (e.g. filtering by time). Evaluation: developing a reference collection for computing precision and recall at several stages.
22 References
1. Sudipto Guha, Nina Mishra, Rajeev Motwani, and Liadan O'Callaghan. Clustering data streams. In IEEE Symposium on Foundations of Computer Science.
2. Thorsten Joachims. Text categorization with support vector machines: learning with many relevant features. In Claire Nédellec and Céline Rouveirol, editors, Proceedings of ECML-98, 10th European Conference on Machine Learning, number 1398, Chemnitz, DE. Springer Verlag, Heidelberg, DE.
3. Thorsten Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning. MIT Press. Software available online.
4. Khoo Khyou Bun and Mitsuru Ishizuka. Topic extraction from news archive using TF*PDF algorithm. In Proceedings of the 3rd Int'l Conference on Web Information Systems Engineering (WISE 2002). IEEE Computer Society.
5. Ralf Krestel, Sabine Bergler, and René Witte. Minding the source: Automatic tagging of reported speech in newspaper articles. In Proceedings of the Sixth International Language Resources and Evaluation (LREC 2008). European Language Resources Association (ELRA), May 2008.
6. Bruno Pouliquen, Ralf Steinberger, and Clive Best. Automatic detection of quotations in multilingual news. In Proceedings of Recent Advances in Natural Language Processing 2007, Borovets, Bulgaria.
7. J. Rocchio. Relevance feedback in information retrieval. In Gerard Salton, editor, The SMART Retrieval System. Prentice Hall, Englewood Cliffs, New Jersey.
23 Verbatim
Thank you! Luís Sarmento, Sérgio Nunes