Information Retrieval, Information Extraction and Social Media Analytics
1 Information Retrieval, Information Extraction and Social Media Analytics
Based on chapter 10 of the Advanced Information Management lecture
Laura Kassner
Anwendersoftware, Universität Stuttgart
Winter Term 2014
2 Overview
- Information Retrieval
  - Introduction
  - Relevance Ranking TF-IDF
  - Similarity-Based Retrieval
  - Measuring Retrieval Effectiveness
  - Concept-Based Querying
- Information Extraction
  - Text Analytics
- Social Media Analytics
  - Introduction
  - SMA on structured data
  - Sentiment Detection
  - Examples/Discussion
3 Information Retrieval Systems
- Simpler data model than database systems
  - Information is organized as a collection of documents
  - Documents are unstructured and have no schema
- Goal: locate relevant documents based on user input
  - Keywords, e.g., find documents containing the words "database systems"
  - Example documents
- [Figure: a query input such as "database system" is passed to the IR system, which searches a collection of documents (document_x, document_y, document_z)]
- Also works on textual descriptions provided with non-textual data such as images
- Examples: Web search engines, desktop file search
Dr. Holger Schwarz, Universität Stuttgart, IPVS
4 Information Retrieval Systems
Differences from database systems:
- IR systems do not deal with transactional updates (including concurrency control and recovery)
- Database systems deal with structured data, with schemas that define the data organization
- IR systems deal with some querying issues not generally addressed by database systems
  - Approximate searching by keywords
  - Ranking of retrieved answers by estimated degree of relevance
5 Keyword Search
- In full-text retrieval, all the words in each document are considered to be keywords
  - A word in a document is called a term
- Query expressions consist of keywords and the logical connectives "and", "or", and "not"
  - "and" is implicit for queries with several words
- Ranking of documents on the basis of estimated relevance to a query is critical
- Factors for relevance:
  - Term frequency: how often the query keyword occurs in the document
  - Inverse document frequency: how many documents the query keyword occurs in; the fewer documents, the more importance is given to the keyword
  - Hyperlinks to documents: the more links point to a document, the more important the document is (cf. PageRank)
6 Document Indexing
- An inverted index maps each keyword $K_i$ to the set $S_i$ of documents that contain the keyword
  - Documents are identified by identifiers
- The inverted index may also record
  - Keyword locations within each document, to allow proximity-based ranking
  - Counts of the number of occurrences of the keyword, to compute TF
- "and" operation: finds documents that contain all of $K_1, K_2, \ldots, K_n$
  - Intersection $S_1 \cap S_2 \cap \ldots \cap S_n$
- "or" operation: finds documents that contain at least one of $K_1, K_2, \ldots, K_n$
  - Union $S_1 \cup S_2 \cup \ldots \cup S_n$
- Each $S_i$ is kept sorted to allow efficient intersection/union by merging
  - "not" can also be implemented efficiently by merging sorted lists
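The indexing scheme on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: documents are split on whitespace, and the function names (`build_inverted_index`, `intersect`, `union`) and the three sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to a sorted list of identifiers of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """'and': merge two sorted posting lists, keeping identifiers present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """'or': merge two sorted posting lists, keeping each identifier once."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


# Hypothetical sample collection
docs = {
    1: "database system concepts",
    2: "operating system design",
    3: "database tuning",
}
index = build_inverted_index(docs)
both = intersect(index["database"], index["system"])
either = union(index["database"], index["system"])
```

Because both lists are sorted, each merge touches every posting at most once, which is why the slide asks for sorted $S_i$.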
7 Relevance Ranking Using Terms
TF-IDF (term frequency / inverse document frequency) ranking:
- $n(d)$ = number of terms in the document $d$
- $n(d, t)$ = number of occurrences of term $t$ in the document $d$
- $n(t)$ = number of documents containing term $t$
- Relevance of a document $d$ to a term $t$:
  $TF(d, t) = \log\left(1 + \frac{n(d, t)}{n(d)}\right)$
  - The log factor avoids giving excessive weight to frequent terms
- Relevance of a term $t$ in the document collection $D$:
  $IDF(t) = \log\frac{|D|}{n(t)}$
8 Relevance Ranking Using Terms
- Relevance of document $d$ to term $t$:
  $r(d, t) = TF(d, t) \cdot IDF(t)$
- Relevance of document $d$ to query $Q$:
  $r(d, Q) = \sum_{t \in Q} \frac{TF(d, t)}{n(t)}$
9 Relevance Ranking Using Terms
Assume:
- Document A of 100 words contains the term "database" 3 times and the term "system" 6 times
- Document base D consists of 1 million documents
  - 1,000 documents contain the term "database"
  - 50,000 documents contain the term "system"
Relevance of document A to a term (logarithms to base 10):
- TF(A, "database") = log(1 + 3/100) = 0.013
- TF(A, "system") = log(1 + 6/100) = 0.025
Relevance of a term in document collection D:
- IDF("database") = log(1000) = 3
- IDF("system") = log(20) = 1.301
TF-IDF:
- TF-IDF(A, "database") = 0.013 * 3 = 0.039
- TF-IDF(A, "system") = 0.025 * 1.301 = 0.033
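The arithmetic of this example can be checked with a short Python sketch. Two things here are assumptions rather than statements from the slide: the logarithm is taken to base 10, and the document counts (1,000 for "database", 50,000 for "system") are inferred from the IDF values log(1000) and log(20) over 1 million documents. The function names are illustrative.

```python
import math


def tf(n_dt, n_d):
    """TF(d, t) = log(1 + n(d, t) / n(d)), base-10 logarithm assumed."""
    return math.log10(1 + n_dt / n_d)


def idf(num_docs, n_t):
    """IDF(t) = log(|D| / n(t)), base-10 logarithm assumed."""
    return math.log10(num_docs / n_t)


NUM_DOCS = 1_000_000   # size of document base D
DOC_LENGTH = 100       # n(A): number of terms in document A

# Document counts inferred from IDF = log(1000) and IDF = log(20)
tfidf_database = tf(3, DOC_LENGTH) * idf(NUM_DOCS, 1_000)
tfidf_system = tf(6, DOC_LENGTH) * idf(NUM_DOCS, 50_000)

print(f"TF-IDF(A, database) = {tfidf_database:.4f}")
print(f"TF-IDF(A, system)   = {tfidf_system:.4f}")
```

Although "system" occurs twice as often in document A, "database" ends up with the higher score because it is the rarer term in the collection.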
10 Relevance Ranking Using Terms
Most systems are more complex than that:
- Words that occur in the title, author list, section headings, etc. are given greater importance
- Words whose first occurrence is late in the document are given lower importance
- Very common words such as "a", "an", "the", "it", etc. are eliminated (stop words)
- Proximity: if the keywords of the query occur close together in the document, the document has higher importance than if they occur far apart
- Documents are returned in decreasing order of relevance score (usually only the top n documents)
11 Similarity-Based Retrieval
- Similarity-based retrieval: retrieve documents similar to a given document A
  - Similarity may be defined on the basis of common words
  - E.g. find the k terms in A with the highest $TF(A, t) / n(t)$ and use these terms to compute the relevance of other documents
- Relevance feedback: similarity can be used to refine the answer set of a keyword query
  - The user selects a few relevant documents from those retrieved by the keyword query, and the system finds other documents similar to these
12 Similarity-Based Retrieval
- Vector space model:
  - Define an n-dimensional space, where n is the number of terms in the document set
  - The vector for document $d$ goes from the origin to a point whose $i$-th coordinate is $TF(d, t_i) / n(t_i)$
  - The cosine of the angle between the vectors of two documents is used as a measure of their similarity
- Usage in keyword search:
  - Transform the set of keywords into a document vector
  - Calculate the cosine with every document vector in D
  - Use these values to rank documents for retrieval
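A minimal sketch of the vector space model in Python, using sparse dictionaries as vectors and the coordinate $TF(d, t) / n(t)$ from the slide. The base-10 logarithm, the whitespace tokenization, the helper names, and the three sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def weight_vector(text, doc_freq):
    """Sparse vector with coordinate TF(d, t) / n(t) for every known term t."""
    terms = text.lower().split()
    counts = Counter(terms)
    n_d = len(terms)
    return {
        t: math.log10(1 + c / n_d) / doc_freq[t]
        for t, c in counts.items()
        if t in doc_freq  # query terms unknown to the collection are ignored
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample collection
docs = {
    1: "database system concepts",
    2: "operating system design",
    3: "database tuning",
}

# n(t): number of documents containing term t
doc_freq = Counter(t for text in docs.values() for t in set(text.lower().split()))
vectors = {doc_id: weight_vector(text, doc_freq) for doc_id, text in docs.items()}

# Treat the keyword query as a small document and rank by cosine similarity
query_vector = weight_vector("database system", doc_freq)
ranking = sorted(docs, key=lambda d: cosine(query_vector, vectors[d]), reverse=True)
```

Document 1 contains both query terms and is ranked first; the remaining order depends on how much weight each document puts on terms outside the query.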
13 Measuring Retrieval Effectiveness
- Information retrieval systems save space by using index structures that support only approximate retrieval. This may result in:
  - False negatives (false drops): some relevant documents may not be retrieved
  - False positives: some irrelevant documents may be retrieved
- For many applications, false positives are more tolerable than false negatives
14 Measuring Retrieval Effectiveness
Relevant performance metrics:
- Precision: percentage of retrieved documents that are relevant
  $\text{precision} = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{retrieved documents}|}$
- Recall: percentage of relevant documents that were retrieved
  $\text{recall} = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{relevant documents}|}$
[Figure: diagram of the retrieved documents within a collection of relevant and not relevant documents]
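Both metrics reduce to set operations, as the following Python sketch shows. The function name and the document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set and a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: 4 documents retrieved, 6 documents are actually relevant
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7, 8])
```

Here two of the four retrieved documents are relevant (precision 50%), but only two of the six relevant documents were found (recall of about 33%).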
15 Measuring Retrieval Effectiveness
- Recall vs. precision tradeoff:
  - Recall can be increased by retrieving many documents
  - This reduces precision, because many irrelevant documents are among them
- Measures of retrieval effectiveness:
  - Recall as a function of the number of documents fetched, or
  - Precision as a function of recall (equivalently, as a function of the number of documents fetched)
  - E.g. precision of 75% at a recall of 50%, and 60% at a recall of 75%
- Problem: how to measure relevance
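The tradeoff can be made visible by walking down a ranked result list and recording precision and recall after each fetched document. This is a small sketch with a made-up ranking; it assumes the set of relevant documents is known and not empty.

```python
def precision_recall_curve(ranking, relevant):
    """Return (k, precision, recall) after each of the first k fetched documents."""
    relevant = set(relevant)
    hits = 0
    points = []
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((k, hits / k, hits / len(relevant)))
    return points


# Hypothetical ranked result list and relevance judgments
points = precision_recall_curve(ranking=[7, 2, 9, 4], relevant=[7, 9, 5, 1])
for k, precision, recall in points:
    print(f"top {k}: precision={precision:.2f} recall={recall:.2f}")
```

Fetching more documents never lowers recall, while precision drops whenever an irrelevant document is added to the result.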
16 Information Retrieval and Structured Data
- Information retrieval systems originally treated documents as a collection of words
- Information extraction systems infer structure from documents, e.g.:
  - Extraction of house attributes (size, address, number of bedrooms, etc.) from a text advertisement
  - Extraction of the topic and the people named from a news article
- Relations or XML structures are used to store the extracted data
  - The system seeks connections among the data to answer queries
  - Question answering systems
17 Concept-Based Querying
- Approach:
  - For each word, determine the concept it represents from context
  - Use one or more ontologies:
    - Hierarchical structure showing the relationships between concepts
    - E.g.: elephant IS-A mammal
    - Can be used to standardize terminology in a specific field
- Ontologies can link multiple languages
- Foundation of the Semantic Web (not covered here)
- Useful for building concept-based querying: information extraction
  - Which concepts make sense for this document collection?
  - Which relations do we detect between concepts in this collection?
18 Concept Resource: WordNet
- Lexical database of English verbs, nouns, and adjectives
- Taxonomy of concepts as represented by words
- Links concepts via semantic relations
  - Synonyms (happy, glad), grouped into synsets
  - Hypernyms and hyponyms (dog, mammal)
  - Meronyms (wheel, tire)
- Disambiguates word senses
- Freely available
- Equivalents exist for several natural languages, e.g. GermaNet
20 Beyond Search: Information Extraction
- Information Retrieval only cares about retrieving documents containing a certain content
- Information Extraction distills content from documents, i.e. uses documents as a source for
  - Question answering
  - Summary creation
  - Compiling structured data
  - Discovering new facts and relations
- This (often) requires text analytics!
21 Beyond Search: Text Analytics
- Tokenization: splitting a text into words (tokens)
  - Simple: on whitespace and punctuation
  - Complex: what about compound nouns, multiwords, abbreviations, etc.?
- Sentence splitting: finding sentence boundaries
  - Non-trivial: punctuation can also mark an abbreviation ('Dr. W. Jones is out of office today.'), not every sentence is delimited by punctuation (headlines), and what about mid-sentence quotes?
- Stemming / lemmatization: reducing words to base forms
  - E.g. running → run, horses → horse
- Part-of-speech tagging: assigning a word its part of speech
  - Tagsets: noun, verb, preposition, adverb, ...
  - Challenges: ambiguous word class, e.g. 'I run a mile every day' vs. 'Today's run was great!'
- Chunking: combining several tokens into syntactic chunks, e.g. corresponding to noun phrases, prepositional phrases, adverbial...
- Parsing: assigning structure to entire sentences
  - Constituent vs. dependency
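The first two steps can be sketched with the Python standard library alone. This is a deliberately naive illustration, not how the toolkits work internally: the abbreviation list is a tiny hand-picked assumption, and the splitter only handles the abbreviation problem from the slide, not headlines or quotes.

```python
import re

# Illustrative, far from exhaustive
ABBREVIATIONS = {"dr.", "prof.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def tokenize(text):
    """Split text into word tokens (keeping inner hyphens/apostrophes) and punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", text)


def split_sentences(text):
    """End a sentence at '.', '!' or '?' unless the chunk is an abbreviation or an initial."""
    sentences, current = [], []
    for chunk in text.split():
        current.append(chunk)
        ends = chunk.endswith((".", "!", "?"))
        is_abbreviation = chunk.lower() in ABBREVIATIONS
        is_initial = re.fullmatch(r"[A-Z]\.", chunk) is not None
        if ends and not is_abbreviation and not is_initial:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation, e.g. a headline
        sentences.append(" ".join(current))
    return sentences


text = "Dr. W. Jones is out of office today. Today's run was great!"
sentences = split_sentences(text)
tokens = tokenize("Today's run was great!")
```

A splitter that breaks on every period would cut this text after "Dr." and "W."; the abbreviation and initial checks avoid that, at the price of missing sentences that really do end in an abbreviation.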
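22 Text Analytics Example Pipeline
Text Files → Natural Language Processing et al. → Structured Information
- Input text (German, from Wikipedia): "S-Klasse bezeichnet die Oberklasse der Automarke Mercedes-Benz. Sie steht für luxuriöse Limousinen und Coupés. Im Jahr 1972 erschien mit der Baureihe 116 die erste offiziell von Mercedes-Benz (MB) so bezeichnete S-Klasse."
  (English: "S-Class denotes the luxury class of the Mercedes-Benz car brand. It stands for luxurious sedans and coupés. In 1972, with the 116 series, the first S-Class officially designated as such by Mercedes-Benz (MB) appeared.")
- Extracted structured information:
  - Entstehungsjahr(S-Klasse): 1972 (year of origin)
  - IS-A(S-Klasse, Luxusauto) (luxury car)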
23 Text Analytics Example Pipeline
Analysis layers: Words, Parts of Speech, Named Entities, Sentence Structure
[Figure: the first sentence of the German example text with part-of-speech tags: S-Klasse (N), bezeichnet (VFIN), die (ART), Oberklasse (N), der (ART), Automarke (N), Mercedes-Benz (N). Verbs and names are marked, and the tokens are grouped into NP and VP chunks under a sentence node S]
24 Text Analytics - Challenges
- Language-specific:
  - Different structures, e.g. English / Turkish / Chinese
  - Statistical tools perform well, but training requires large amounts of (annotated) data
  - Best performance is usually achieved for English; annotation is labor-intensive
- Web data: often written by non-native speakers and full of slang, abbreviations, and nonstandard language
  - Need robust tools for 'ungrammatical' input
- Domain-specific:
  - Narrow, fixed-structure idioms from one domain are easier to handle but may require manual calibration
  - Free text with no topic restrictions is more difficult to process
- Complexity: full-blown text analytics is costly and not always precise enough
  - For some applications, surface-level approaches such as regular expression pattern matching may be better suited
25 Text Analytics Frameworks and Toolkits
- Frameworks: Apache UIMA, GATE
- Java toolkits: OpenNLP, Stanford CoreNLP
- Python toolkits: NLTK, TextBlob
27 Social Media Analytics
Central questions: Who cares about what on the web?
- What are people saying about [brand | person | event] online?
- Which topics are popular / trending?
- Positive or negative opinions?
- Which voices are influential? How does opinion spread?
- Can we identify recurring root causes?
- Are there correlations with [marketing campaigns | product releases | new strategies]?
- Company: Which products should I recommend to customer X based on his buying behavior?
- User: Which product should I buy? Is this movie worth watching? Do people like my blog?
28 Social Media Analytics: structured sources
Structured data sources:
- Page views
- Clicks
- Likes
- Followers
- Friend graphs
- Retweet/reblog statistics
30 Social Media Analytics: unstructured sources
Unstructured data sources:
- News texts
- Blog content
- Reviews
- Comment sections
- Tweets and status updates
31 Sentiment Detection
- A.k.a. opinion mining
- Performed mainly on unstructured, free-text data sources
- Research focus since the early 2000s
  - Machine learning available
  - Large text collections available (the internet)
  - Fed by interest in text summarization throughout the 1990s
- Classifies text snippets or entire documents as
  - subjective / objective
  - positive / negative / (neutral)
  - strongly or weakly opinionated (intensity)
- Connects sentiment to topics / entities, e.g. products, productions, persons
32 Sentiment Detection
Not as easy as it seems
33 Text Features for Sentiment Detection
Features for sentiment and subjectivity classification:
- Keywords with positive or negative sentiment
  - Frequency
  - Occurrence (yes/no): more effective
- Bigram or trigram features?
  - Conflicting evidence, but bag-of-words models are problematic, e.g. with regard to negation
- Parts of speech
  - Only reliable feature: frequent adjectives signal subjectivity
- Syntax
  - No clear evidence that parsing is helpful
  - But: syntactic knowledge helps identify valence shifters, e.g. negation, intensifiers, diminishers
  - Collocations / syntactic patterns may be useful
- Predicate-argument combinations may carry sentiment where the single terms do not (latent sentiment)
  - 'The price is low' = positive
- Rule-based classification vs. machine learning approaches
34 Creating a Sentiment Dictionary
- Hand-craft?
  - Extremely time-consuming
  - Even human annotators do not agree on all polarities
- Cluster terms according to frequencies, context, and constructions
  - 'elegant but over-priced', 'clever and informative'
  - → 2 clusters → assign orientation (e.g. treating the cluster whose terms occur more frequently on average as positive seems to work)
- Use seed words with known polarity
  - → find words with a similar distribution or co-occurrence, or which are synonymous
  - → propagate polarity, e.g. across WordNet links
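The seed-word approach can be illustrated with a breadth-first traversal over synonym links. The sketch below uses a small hand-written link table as a stand-in for WordNet; the words, links, and function name are hypothetical, and real propagation schemes typically also handle antonym links and conflicting evidence.

```python
from collections import deque


def propagate_polarity(seeds, links):
    """Spread the polarity of seed words to all words reachable via synonym links."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbor in links.get(word, ()):
            if neighbor not in polarity:  # first polarity to reach a word wins
                polarity[neighbor] = polarity[word]
                queue.append(neighbor)
    return polarity


# Hypothetical synonym links standing in for WordNet relations
links = {
    "good": ["great", "fine"],
    "great": ["excellent"],
    "bad": ["poor", "awful"],
}
seeds = {"good": +1, "bad": -1}
lexicon = propagate_polarity(seeds, links)
```

Starting from two seed words, the lexicon grows to seven entries; "excellent" inherits its positive polarity through "great" even though it is not linked to a seed directly.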
35 Sentiment and Topic
- What units are we looking at?
  - Do we want to classify the document / paragraph / sentence / snippet?
  - Local vs. global sentiment of a text
- Distance between topic and sentiment term
  - Same sentence, same paragraph, title of document?
- Topic-dependent sentiment
  - 'Wal-mart reports that profits rose': positive in an article about Wal-mart, negative in an article about Target
  - 'the Samsung Galaxy S5 is better than the LG 3G': positive for Samsung, negative for LG
  - Making things (slightly) easier: let the user specify which topic they want to consider
- Discourse structure
  - Headlines, position in paragraph
  - Quoting and responding behavior in conversation threads
36 Resources for Sentiment Detection
- Polarity word lists / nets
  - English: Harvard General Inquirer, SentiWordNet
  - German: SentiWS
- Reviews with both unstructured and structured content
  - Labeled data for learning sentiment
37 Social Media Analytics: Demographic Information
- What kind of people talk about a product?
  - Men, women, children? Parents?
  - Do they own the product? Are they potential customers?
  - Where do they live?
- Example user profile and posts:
  - Username: supermama_10
  - Location: Houston, Texas
  - 'I usually buy Pampers diapers, they are the best'
  - 'I gave my older daughter a Samsung S3 for Xmas, but now my husband uses it all the time lol'
38 Social Media Analytics: a concrete architecture
[Figure: IBM Social Media Analytics (Coutinho et al.)]
40 Social Media Analytics: Refining Concepts
Refining concepts: concept suggestion component
- Select a representative sample of the gathered documents (downsampling)
- Extract the most relevant terms from these documents as keywords
- Cluster documents based on these keywords
  - Control cluster: using just the initially specified concepts
  - Similar to control cluster → add keywords as new concept suggestions
  - Different from control cluster → add keywords as blacklist suggestions
- Feedback to user → refined concept selection → new crawl for documents
42 Sentiment Detection and Concept Extraction
Sentiment detection (similar, published approach: WebFountain sentiment miner, which also belongs to IBM)
- Linguistic preprocessing:
  - Tokenization
  - POS tagging
  - Parsing phrase and sentence structures
- Identify concepts and feature terms
  - Part-of or attribute-of relationship with a concept or known feature (e.g. 'lens' part-of 'camera', 'price' attribute-of 'camera')
  - Candidates: beginning definite base noun phrases, i.e. POS-tag/word sequences 'the NN', 'the JJ NN', 'the NN NN', etc. (NN = noun, JJ = adjective) (Yi et al., 2005)
43 Sentiment Detection and Concept Extraction
Sentiment detection:
- Sentiment lexicon
  - Format: <entry> <POS-tag> <polarity>
  - Example: excellent JJ +
- Sentiment patterns
  - Format: <predicate> <sentence-category> <target>
  - <predicate>: a verb
  - <sentence-category>: a subject phrase, object phrase, complement / adjective phrase, or prepositional phrase, associated with a polarity + or -
    - Flipped polarity on the target is signified by a ~ marker
  - <target>: a subject or object phrase at which the sentiment is directed
44 Sentiment Detection and Concept Extraction
Semantic relationship analysis:
- Identify pattern elements from parse trees, starting with predicates
- In a pattern, assign sentiment to the target based on the source sentiment
- If the phrase or the sentence contains a negation, reverse the sentiment polarity
- Precision: 86 %, Recall: 56 %
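The negation rule can be illustrated with a heavily simplified Python sketch. It is not the parse-tree pattern matching described on the slides: it looks up single tokens in a tiny lexicon and flips the sign if any negation word occurs. The lexicon entries, the negation list, and the function name are all hypothetical.

```python
# Hypothetical lexicon entries: word -> polarity (+1 positive, -1 negative)
LEXICON = {"excellent": +1, "great": +1, "poor": -1, "blurry": -1}
NEGATIONS = {"not", "no", "never"}


def phrase_polarity(phrase):
    """Return +1, -1 or 0 for a phrase, reversing the polarity if a negation occurs."""
    tokens = phrase.lower().replace(".", " ").split()
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if any(token in NEGATIONS for token in tokens):
        score = -score  # negation reverses the sentiment polarity
    return (score > 0) - (score < 0)  # sign of the score


positive = phrase_polarity("The lens is excellent.")
negated = phrase_polarity("The pictures are not great.")
neutral = phrase_polarity("The camera has a strap.")
```

A token-level rule like this cannot tell which target a sentiment is directed at; that is what the predicate, sentence-category, and target patterns on the previous slide are for.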
45 Social Media Analytics: a concrete architecture
[Figure: IBM Social Media Analytics (Alper et al.; Coutinho et al.)]
46 Resources / Further Reading
Information retrieval:
- Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Vol. 1. Cambridge: Cambridge University Press.
Sentiment detection:
- Pang, Bo, and Lillian Lee. "Opinion Mining and Sentiment Analysis." Foundations and Trends in Information Retrieval (2008).
Social media analytics:
- Coutinho, Fabio Cardoso, Alexander Lang, and Bernhard Mitschang. "Making Social Media Analysis More Efficient Through Taxonomy Supported Concept Suggestion." Proceedings of the BTW.
- Alper, Basak, et al. "OpinionBlocks: Visualizing Consumer Reviews." Proceedings of the IEEE VisWeek Workshop on Interactive Text Analytics for Decision Making.
- Yi, Jeonghee, and Wayne Niblack. "Sentiment Mining in WebFountain." Proceedings of the 21st ICDE (2005).
English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed
More informationInteractive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs
Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationText Processing with Hadoop and Mahout Key Concepts for Distributed NLP
Text Processing with Hadoop and Mahout Key Concepts for Distributed NLP Bridge Consulting Based in Florence, Italy Foundedin 1998 98 employees Business Areas Retail, Manufacturing and Fashion Knowledge
More informationInteractive Dynamic Information Extraction
Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken
More informationCAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable
More informationFolksonomies versus Automatic Keyword Extraction: An Empirical Study
Folksonomies versus Automatic Keyword Extraction: An Empirical Study Hend S. Al-Khalifa and Hugh C. Davis Learning Technology Research Group, ECS, University of Southampton, Southampton, SO17 1BJ, UK {hsak04r/hcd}@ecs.soton.ac.uk
More informationSentiment Analysis on Big Data
SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social
More informationSINAI at WEPS-3: Online Reputation Management
SINAI at WEPS-3: Online Reputation Management M.A. García-Cumbreras, M. García-Vega F. Martínez-Santiago and J.M. Peréa-Ortega University of Jaén. Departamento de Informática Grupo Sistemas Inteligentes
More informationTechnical Report. The KNIME Text Processing Feature:
Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG
More informationTaxonomy learning factoring the structure of a taxonomy into a semantic classification decision
Taxonomy learning factoring the structure of a taxonomy into a semantic classification decision Viktor PEKAR Bashkir State University Ufa, Russia, 450000 vpekar@ufanet.ru Steffen STAAB Institute AIFB,
More informationAnotaciones semánticas: unidades de busqueda del futuro?
Anotaciones semánticas: unidades de busqueda del futuro? Hugo Zaragoza, Yahoo! Research, Barcelona Jornadas MAVIR Madrid, Nov.07 Document Understanding Cartoon our work! Complexity of Document Understanding
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 5, Sep-Oct 2015
RESEARCH ARTICLE Multi Document Utility Presentation Using Sentiment Analysis Mayur S. Dhote [1], Prof. S. S. Sonawane [2] Department of Computer Science and Engineering PICT, Savitribai Phule Pune University
More informationPhase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
More informationIdentifying Focus, Techniques and Domain of Scientific Papers
Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of
More informationSENTIMENT ANALYSIS: A STUDY ON PRODUCT FEATURES
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Dissertations and Theses from the College of Business Administration Business Administration, College of 4-1-2012 SENTIMENT
More informationResearch Article 2015. International Journal of Emerging Research in Management &Technology ISSN: 2278-9359 (Volume-4, Issue-4) Abstract-
International Journal of Emerging Research in Management &Technology Research Article April 2015 Enterprising Social Network Using Google Analytics- A Review Nethravathi B S, H Venugopal, M Siddappa Dept.
More informationRecommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
More informationOptimization of Internet Search based on Noun Phrases and Clustering Techniques
Optimization of Internet Search based on Noun Phrases and Clustering Techniques R. Subhashini Research Scholar, Sathyabama University, Chennai-119, India V. Jawahar Senthil Kumar Assistant Professor, Anna
More information131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
More informationA Survey on Product Aspect Ranking
A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,
More informationSentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015
Sentiment Analysis D. Skrepetos 1 1 Department of Computer Science University of Waterloo NLP Presenation, 06/17/2015 D. Skrepetos (University of Waterloo) Sentiment Analysis NLP Presenation, 06/17/2015
More informationSemantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
More informationMovie Classification Using k-means and Hierarchical Clustering
Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani
More informationA Sentiment Analysis Model Integrating Multiple Algorithms and Diverse. Features. Thesis
A Sentiment Analysis Model Integrating Multiple Algorithms and Diverse Features Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The
More informationCIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet
CIRGIRDISCO at RepLab2014 Reputation Dimension Task: Using Wikipedia Graph Structure for Classifying the Reputation Dimension of a Tweet Muhammad Atif Qureshi 1,2, Arjumand Younus 1,2, Colm O Riordan 1,
More informationWord Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
More informationEXTRACTING BUSINESS INTELLIGENCE FROM ONLINE PRODUCT REVIEWS
EXTRACTING BUSINESS INTELLIGENCE FROM ONLINE PRODUCT REVIEWS 1 Soundarya.V, 2 Siddareddy Sowmya Rupa, 3 Sristi Khanna, 4 G.Swathi, 5 Dr.D.Manjula 1,2,3,4,5 Department of Computer Science And Engineering,
More informationSemantic Concept Based Retrieval of Software Bug Report with Feedback
Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop
More informationReputation Management System
Reputation Management System Mihai Damaschin Matthijs Dorst Maria Gerontini Cihat Imamoglu Caroline Queva May, 2012 A brief introduction to TEX and L A TEX Abstract Chapter 1 Introduction Word-of-mouth
More informationANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS
ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering
More informationLing 201 Syntax 1. Jirka Hana April 10, 2006
Overview of topics What is Syntax? Word Classes What to remember and understand: Ling 201 Syntax 1 Jirka Hana April 10, 2006 Syntax, difference between syntax and semantics, open/closed class words, all
More informationOntology based ranking of documents using Graph Databases: a Big Data Approach
Ontology based ranking of documents using Graph Databases: a Big Data Approach A.M.Abirami Dept. of Information Technology Thiagarajar College of Engineering Madurai, Tamil Nadu, India Dr.A.Askarunisa
More informationText Analytics. A business guide
Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application
More informationWhy are Organizations Interested?
SAS Text Analytics Mary-Elizabeth ( M-E ) Eddlestone SAS Customer Loyalty M-E.Eddlestone@sas.com +1 (607) 256-7929 Why are Organizations Interested? Text Analytics 2009: User Perspectives on Solutions
More informationWhy is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
More informationTerminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
More informationCS 6740 / INFO 6300. Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage
CS 6740 / INFO 6300 Advanced d Language Technologies Graduate-level introduction to technologies for the computational treatment of information in humanlanguage form, covering natural-language processing
More informationBig Data Analytics and Healthcare
Big Data Analytics and Healthcare Anup Kumar, Professor and Director of MINDS Lab Computer Engineering and Computer Science Department University of Louisville Road Map Introduction Data Sources Structured
More informationShallow Parsing with Apache UIMA
Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland graham.wilcock@helsinki.fi Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic
More informationTechWatch. Technology and Market Observation powered by SMILA
TechWatch Technology and Market Observation powered by SMILA PD Dr. Günter Neumann DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Juni 2011 Goal - Observation of Innovations and Trends»
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationifinder ENTERPRISE SEARCH
DATA SHEET ifinder ENTERPRISE SEARCH ifinder - the Enterprise Search solution for company-wide information search, information logistics and text mining. CUSTOMER QUOTE IntraFind stands for high quality
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationSentiment-Oriented Contextual Advertising
Teng-Kai Fan Department of Computer Science National Central University No. 300, Jung-Da Rd., Chung-Li, Tao-Yuan, Taiwan 320, R.O.C. tengkaifan@gmail.com Chia-Hui Chang Department of Computer Science National
More informationSemantic Analysis of. Tag Similarity Measures in. Collaborative Tagging Systems
Semantic Analysis of Tag Similarity Measures in Collaborative Tagging Systems 1 Ciro Cattuto, 2 Dominik Benz, 2 Andreas Hotho, 2 Gerd Stumme 1 Complex Networks Lagrange Laboratory (CNLL), ISI Foundation,
More informationNatural Language Processing
Natural Language Processing 2 Open NLP (http://opennlp.apache.org/) Java library for processing natural language text Based on Machine Learning tools maximum entropy, perceptron Includes pre-built models
More informationQuestion Answering and Multilingual CLEF 2008
Dublin City University at QA@CLEF 2008 Sisay Fissaha Adafre Josef van Genabith National Center for Language Technology School of Computing, DCU IBM CAS Dublin sadafre,josef@computing.dcu.ie Abstract We
More informationReducing Client Incidents through
Intel IT IT Best Practices Big Data Predictive Analytics December 2013 Reducing Client Incidents through Big Data Predictive Analytics Executive Overview Our new ability to proactively, rather than reactively,
More informationSENTIMENT ANALYSIS: TEXT PRE-PROCESSING, READER VIEWS AND CROSS DOMAINS EMMA HADDI BRUNEL UNIVERSITY LONDON
BRUNEL UNIVERSITY LONDON COLLEGE OF ENGINEERING, DESIGN AND PHYSICAL SCIENCES DEPARTMENT OF COMPUTER SCIENCE DOCTOR OF PHILOSOPHY DISSERTATION SENTIMENT ANALYSIS: TEXT PRE-PROCESSING, READER VIEWS AND
More information