Information Retrieval, Information Extraction and Social Media Analytics

1 Information Retrieval, Information Extraction and Social Media Analytics
Based on chapter 10 of the Advanced Information Management lecture
Laura Kassner, Anwendersoftware, Universität Stuttgart
Winter Term 2014

2 Overview
- Information Retrieval
  - Introduction
  - Relevance Ranking
  - TF-IDF
  - Similarity-Based Retrieval
  - Measuring Retrieval Effectiveness
  - Concept-Based Querying
- Information Extraction
  - Text Analytics
- Social Media Analytics
  - Introduction
  - SMA on structured data
  - Sentiment Detection
  - Examples/Discussion

3 Information Retrieval Systems
- Simpler data model than database systems
  - Information is organized as a collection of documents
  - Documents are unstructured, with no schema
- Goal: locate relevant documents based on user input (keywords, example documents)
  - e.g., find documents containing the words "database systems"
- [Figure: a query input such as "database system" is passed to the IR system, which selects documents (document_x, document_y, document_z) from the collection of documents]
- Also works on textual descriptions provided with non-textual data such as images
- Examples: Web search engines, desktop file search
Dr. Holger Schwarz, Universität Stuttgart, IPVS

4 Information Retrieval Systems
Differences from database systems:
- No transactional updates (including concurrency control and recovery)
- Database systems deal with structured data, with schemas that define the data organization
- IR systems deal with some querying issues not generally addressed by database systems
  - Approximate searching by keywords
  - Ranking of retrieved answers by estimated degree of relevance

5 Keyword Search
- In full text retrieval, all the words in each document are considered to be keywords
  - A word in a document is called a term
- Query expressions consist of keywords and the logical connectives "and", "or", and "not"
  - "and" is implicit for queries with several words
- Ranking of documents on the basis of estimated relevance to a query is critical!
- Factors for relevance:
  - Term frequency: frequency of occurrence of the query keyword in the document
  - Inverse document frequency: how many documents the query keyword occurs in; the fewer documents, the more importance is given to the keyword
  - Hyperlinks to documents: the more links point to a document, the more important the document is (cf. PageRank)

6 Document Indexing
- An inverted index maps each keyword $K_i$ to the set $S_i$ of documents that contain the keyword
  - Documents are identified by identifiers
- The inverted index may record
  - Keyword locations within the document, to allow proximity-based ranking
  - Counts of the number of occurrences of the keyword, to compute TF
- "and" operation: finds documents that contain all of $K_1, K_2, \ldots, K_n$
  - Intersection $S_1 \cap S_2 \cap \ldots \cap S_n$
- "or" operation: finds documents that contain at least one of $K_1, K_2, \ldots, K_n$
  - Union $S_1 \cup S_2 \cup \ldots \cup S_n$
- Each $S_i$ is kept sorted to allow efficient intersection/union by merging
  - "not" can also be implemented efficiently by merging sorted lists
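The merging idea can be illustrated with a minimal Python sketch. The three toy documents, the whitespace tokenization, and the function names (`build_inverted_index`, `intersect`, `union`, `difference`) are assumptions made for this example and do not come from the lecture.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """'and': one merge pass over two sorted lists, keeping ids found in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """'or': one merge pass over two sorted lists, keeping every id once."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def difference(a, b):
    """'not': ids in a that do not occur in b, again in one merge pass."""
    i = j = 0
    out = []
    while i < len(a):
        if j == len(b) or a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif a[i] == b[j]:
            i += 1
            j += 1
        else:
            j += 1
    return out


# Toy collection (hypothetical)
docs = {
    1: "database systems store structured data",
    2: "information retrieval systems rank documents",
    3: "a database system supports transactions",
}
index = build_inverted_index(docs)
print(intersect(index["database"], index["systems"]))
print(union(index["database"], index["systems"]))
print(difference(index["systems"], index["database"]))
```

Because every list is sorted, each operation visits each posting at most once, which is the reason the slide asks for sorted sets.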

7 Relevance Ranking Using Terms
TF-IDF (term frequency / inverse document frequency) ranking:
- $n(d)$ = number of terms in the document $d$
- $n(d, t)$ = number of occurrences of term $t$ in the document $d$
- $n(t)$ = number of documents containing term $t$
Relevance of a document $d$ to a term $t$:
$$TF(d, t) = \log\left(1 + \frac{n(d, t)}{n(d)}\right)$$
- The log factor avoids giving excessive weight to frequent terms
Relevance of a term $t$ in the document collection $D$:
$$IDF(t) = \log\frac{|D|}{n(t)}$$

8 Relevance Ranking Using Terms
Relevance of document $d$ to term $t$:
$$r(d, t) = TF(d, t) \cdot IDF(t)$$
Relevance of document $d$ to query $Q$:
$$r(d, Q) = \sum_{t \in Q} \frac{TF(d, t)}{n(t)}$$

9 Relevance Ranking Using Terms
Assume:
- Document A of 100 words contains the term "database" 3 times and the term "system" 6 times
- Document base D consists of 1 million documents
  - 1,000 documents contain the term "database"
  - 50,000 documents contain the term "system"
Relevance of document A to a term (logarithms to base 10):
- TF(A, "database") = log(1 + 3/100) ≈ 0.013
- TF(A, "system") = log(1 + 6/100) ≈ 0.025
Relevance of a term in document collection D:
- IDF("database") = log(1000) = 3
- IDF("system") = log(20) ≈ 1.301
TF-IDF:
- TF-IDF(A, "database") = 0.013 * 3 = 0.039
- TF-IDF(A, "system") = 0.025 * 1.301 ≈ 0.033
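The arithmetic of this example can be checked with a few lines of Python. This is a sketch under two assumptions: logarithms are taken to base 10 (which is what makes log(1000) = 3), and the document counts 1,000 and 50,000 are the ones implied by the IDF values log(1000) and log(20) for a collection of 1 million documents. The helper names `tf` and `idf` are chosen for this example.

```python
import math


def tf(n_dt, n_d):
    """TF(d, t) = log10(1 + n(d, t) / n(d))."""
    return math.log10(1 + n_dt / n_d)


def idf(num_docs, n_t):
    """IDF(t) = log10(|D| / n(t))."""
    return math.log10(num_docs / n_t)


NUM_DOCS = 1_000_000   # |D|: size of the document base
DOC_LENGTH = 100       # n(A): number of words in document A
occurrences = {"database": 3, "system": 6}  # n(A, t)
# Assumption: counts implied by IDF = log(1000) and IDF = log(20)
docs_containing = {"database": 1_000, "system": 50_000}  # n(t)

scores = {
    term: tf(occurrences[term], DOC_LENGTH) * idf(NUM_DOCS, docs_containing[term])
    for term in occurrences
}
for term, score in scores.items():
    print(f"TF-IDF(A, {term!r}) = {score:.3f}")
```

Although "system" occurs twice as often in A, "database" ends up with the higher score because it is the rarer term in the collection.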

10 Relevance Ranking Using Terms
Most systems are more complex than that:
- Words that occur in the title, author list, section headings, etc. are given greater importance
- Words whose first occurrence is late in the document are given lower importance
- Very common words such as a, an, the, it etc. are eliminated (stop words)
- Proximity: if the keywords of a query occur close together in the document, the document has higher importance than if they occur far apart
- Documents are returned in decreasing order of relevance score (usually only the top n documents)

11 Similarity Based Retrieval
- Similarity based retrieval: retrieve documents similar to a given document A
  - Similarity may be defined on the basis of common words
  - E.g. find the k terms t in A with the highest TF(A, t) / n(t) and use these terms to find the relevance of other documents
- Relevance feedback: similarity can be used to refine the answer set of a keyword query
  - The user selects a few relevant documents from those retrieved by the keyword query, and the system finds other documents similar to these

12 Similarity Based Retrieval
- Vector space model: define an n-dimensional space, where n is the number of terms in the document set
  - The vector for document d goes from the origin to a point whose i-th coordinate is $TF(d, t_i) / n(t_i)$, where $t_i$ is the i-th term
  - The cosine of the angle between the vectors of two documents is used as a measure of their similarity
- Usage in keyword search:
  - Transform the set of keywords into a document vector
  - Calculate the cosine against every document vector in D
  - Use these cosines to rank documents for retrieval
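A small Python sketch of the vector space model follows. It assumes whitespace tokenization, base-10 logarithms in TF, and a toy collection of three documents; the function names are chosen for this example only. Vectors are stored sparsely as dictionaries, so only terms that actually occur get a coordinate.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """n(t): number of documents that contain term t."""
    n_t = Counter()
    for text in docs.values():
        n_t.update(set(text.lower().split()))
    return n_t


def to_vector(text, n_t):
    """Sparse vector with one coordinate TF(d, t) / n(t) per known term t."""
    terms = text.lower().split()
    counts = Counter(terms)
    return {
        t: math.log10(1 + c / len(terms)) / n_t[t]
        for t, c in counts.items()
        if n_t[t] > 0  # terms unseen in the collection get no coordinate
    }


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Treat the keywords as a document and sort documents by cosine similarity."""
    n_t = document_frequencies(docs)
    q = to_vector(query, n_t)
    scores = {doc_id: cosine(q, to_vector(text, n_t)) for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection (hypothetical)
docs = {
    "d1": "database systems store structured data",
    "d2": "retrieval systems rank text documents",
    "d3": "cooking pasta with tomato sauce",
}
ranking = rank("database systems", docs)
print(ranking)
```

In this toy collection d1 shares both query terms, d2 shares one, and d3 shares none, so the cosine orders them accordingly.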

13 Measuring Retrieval Effectiveness
- Information-retrieval systems save space by using index structures that support only approximate retrieval. This may result in:
  - false negative (false drop): some relevant documents may not be retrieved
  - false positive: some irrelevant documents may be retrieved
- For many applications, false positives are more tolerable than false negatives

14 Measuring Retrieval Effectiveness
Relevant performance metrics:
- precision: % of retrieved documents that are relevant
$$\text{precision} = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{retrieved documents}|}$$
- recall: % of relevant documents that were retrieved
$$\text{recall} = \frac{|\text{relevant documents} \cap \text{retrieved documents}|}{|\text{relevant documents}|}$$
[Figure: the set of retrieved documents overlapping the sets of relevant and not relevant documents]

15 Measuring Retrieval Effectiveness
- Recall vs. precision tradeoff: recall can be increased by retrieving many documents, but this reduces precision because many irrelevant documents are among them
- Measures of retrieval effectiveness:
  - Recall as a function of the number of documents fetched, or
  - Precision as a function of recall
    - Equivalently, as a function of the number of documents fetched
  - E.g. precision of 75% at recall of 50%, and 60% at a recall of 75%
- Problem: deciding which documents are actually relevant
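The two metrics and the tradeoff between them can be made concrete with a short Python sketch. The ranked result list and the set of relevant documents are invented for illustration, and the function name `precision_recall` is an assumption of this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth and ranked answer list
relevant = {"d1", "d2", "d3", "d4"}
ranked = ["d1", "d2", "d9", "d3", "d8", "d7", "d4", "d6"]

# Precision and recall as a function of the number of documents fetched
curve = {k: precision_recall(ranked[:k], relevant) for k in (2, 4, 8)}
for k, (p, r) in curve.items():
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

Fetching more documents raises recall from 0.5 to 1.0 in this example while precision drops from 1.0 to 0.5.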

16 Information Retrieval and Structured Data
- Information retrieval systems originally treated documents as a collection of words
- Information extraction systems infer structure from documents, e.g.:
  - Extraction of house attributes (size, address, number of bedrooms, etc.) from a text advertisement
  - Extraction of topic and people named from a news article
- Relations or XML structures are used to store the extracted data
  - The system seeks connections among data to answer queries
  - Question answering systems

17 Concept-Based Querying
- Approach
  - For each word, determine the concept it represents from context
  - Use one or more ontologies:
    - Hierarchical structure showing the relationship between concepts
    - E.g.: elephant IS-A mammal
    - Can be used to standardize terminology in a specific field
- Ontologies can link multiple languages
- Foundation of the Semantic Web (not covered here)
- Useful for building concept-based querying: information extraction
  - Which concepts make sense for this document collection?
  - Which relations do we detect between concepts in this collection?
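As a rough illustration of querying by concept rather than by keyword, the following Python sketch uses a tiny hand-written IS-A hierarchy. The hierarchy, the documents, and the function names are assumptions for this example. It also assumes that each word maps to exactly one concept, so the step of determining the concept from context is skipped.

```python
# Hypothetical IS-A hierarchy: concept -> more general concept
IS_A = {
    "elephant": "mammal",
    "dog": "mammal",
    "sparrow": "bird",
    "mammal": "animal",
    "bird": "animal",
}


def ancestors(concept):
    """All more general concepts reachable by following IS-A links upwards."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def matches(word, query_concept):
    """A word matches if it names the concept itself or one of its specializations."""
    return word == query_concept or query_concept in ancestors(word)


def concept_search(query_concept, docs):
    """Ids of documents containing at least one word that matches the concept."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(matches(word, query_concept) for word in text.lower().split())
    ]


docs = {
    1: "the elephant drinks at the river",
    2: "a sparrow sits on the fence",
    3: "the river floods every spring",
}
print(concept_search("mammal", docs))
print(concept_search("animal", docs))
```

A plain keyword search for "mammal" would return nothing here, while the concept-based search finds the document about the elephant.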

18 Concept Resource: WordNet
- Lexical database of English verbs, nouns, and adjectives
- Taxonomy of concepts as represented by words
- Links concepts via semantic relations
  - Synonyms (e.g. happy, glad), grouped into synsets
  - Hypernyms and hyponyms (e.g. dog, mammal)
  - Meronyms (e.g. wheel, tire)
- Disambiguates word senses
- Freely available
- Equivalents exist for several natural languages, e.g. GermaNet

20 Beyond Search: Information Extraction
- Information Retrieval only cares about retrieving documents containing a certain content
- Information Extraction distills content from documents, i.e. uses documents as a source for
  - Question answering
  - Summary creation
  - Compiling structured data
  - Discovering new facts and relations
- This (often) requires text analytics!

21 Beyond Search: Text Analytics
- Tokenization: splitting a text into words (tokens)
  - Simple: on whitespace and punctuation
  - Complex: what about compound nouns, multiwords, abbreviations, etc.?
- Sentence splitting: finding sentence boundaries
  - Non-trivial: punctuation can also mark an abbreviation ('Dr. W. Jones is out of office today.'), not every sentence is delimited by punctuation (headlines), and what about mid-sentence quotes?
- Stemming / lemmatization: reducing words to base forms
  - E.g. running → run, horses → horse
- Part-of-speech tagging: assigning a word its part of speech
  - Noun, verb, preposition, adverb ... (tagsets)
  - Challenge: ambiguous word class, e.g. 'I run a mile every day' vs. 'Today's run was great!'
- Chunking: combining several tokens into syntactic chunks, e.g. corresponding to noun phrases, prepositional phrases, adverbial phrases
- Parsing: assigning structure to entire sentences
  - Constituent vs. dependency
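The tokenization and sentence-splitting difficulties can be reproduced with a few regular expressions. The following Python sketch is illustrative only: the abbreviation list is a small assumed sample, and the function names are chosen for this example. Toolkits handle many more cases.

```python
import re

# Assumed, deliberately tiny abbreviation list
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e."}


def tokenize(text):
    """Simple tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def naive_sentences(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_sentences(text):
    """Like naive_sentences, but do not split after abbreviations or single initials."""
    sentences = []
    buffer = ""
    for piece in naive_sentences(text):
        buffer = f"{buffer} {piece}".strip()
        last = buffer.split()[-1].lower()
        is_initial = re.fullmatch(r"[a-z]\.", last) is not None
        if last in ABBREVIATIONS or is_initial:
            continue  # the period belongs to an abbreviation, keep collecting
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


text = "Dr. W. Jones is out of office today. Please call again tomorrow."
print(naive_sentences(text))
print(split_sentences(text))
print(tokenize("Today's run was great!"))
```

The naive splitter cuts the first sentence after "Dr." and "W.", while the abbreviation-aware variant returns two sentences. The heuristic is still easy to break, for example by a sentence that genuinely ends in a single letter, which is one reason why trained models are commonly preferred.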

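22 Text Analytics Example Pipeline
Text Files → Natural Language Processing et al. → Structured Information
Input text (Wikipedia):
  "S-Klasse bezeichnet die Oberklasse der Automarke Mercedes-Benz. Sie steht für luxuriöse Limousinen und Coupés. Im Jahr 1972 erschien mit der Baureihe 116 die erste offiziell von Mercedes-Benz (MB) so bezeichnete S-Klasse."
  (English: "S-Klasse denotes the luxury class of the car brand Mercedes-Benz. It stands for luxurious sedans and coupés. In 1972, the 116 series appeared as the first S-Klasse officially designated as such by Mercedes-Benz (MB).")
Extracted structured information:
- Entstehungsjahr(S-Klasse): 1972 (Entstehungsjahr = year of origin)
- IS-A(S-Klasse, Luxusauto) (Luxusauto = luxury car)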

23 Text Analytics Example Pipeline
Words → Parts of Speech → Named Entities → Sentence Structure
[Figure: the example text of the previous slide with verbs and names marked, and a parse tree of its first sentence in which the tokens S-Klasse (N), bezeichnet (VFIN), die (ART), Oberklasse (N), der (ART), Automarke (N), Mercedes-Benz (N) are grouped into NP and VP nodes under the sentence node S]

24 Text Analytics - Challenges
- Language-specific:
  - Different structures, e.g. English / Turkish / Chinese
  - Statistical tools perform well, but training requires large amounts of (annotated) data
  - Best performance is usually achieved for English; annotation is labor-intensive
- Web data: often written by non-native speakers and full of slang, abbreviations, nonstandard language
  - Robust tools are needed for 'ungrammatical' input
- Domain-specific:
  - Narrow, fixed-structure idioms from one domain are easier to handle but may require manual calibration
  - Free text with no topic restrictions is more difficult to process
- Complexity: full-blown text analytics is costly and not always precise enough
  - For some applications, surface-level approaches such as regular expression pattern matching may be better suited

25 Text Analytics Frameworks and Toolkits
- Frameworks:
  - Apache UIMA
  - GATE
- Java toolkits:
  - OpenNLP
  - Stanford CoreNLP
- Python toolkits:
  - NLTK
  - TextBlob

27 Social Media Analytics
Central questions:
- Who cares about what on the web?
  - What are people saying about [brand | person | event] online?
  - Which topics are popular / trending?
  - Positive or negative opinions?
  - Which voices are influential? How does opinion spread?
  - Can we identify recurring root causes?
  - Are there correlations with [marketing campaigns | product releases | new strategies]?
- Company: Which products should I recommend to customer X based on his buying behavior?
- User: Which product should I buy? Is this movie worth watching? Do people like my blog?

28 Social Media Analytics: structured sources
Structured data sources:
- Page views
- Clicks
- Likes
- Followers
- Friend graphs
- Retweet/reblog statistics

30 Social Media Analytics: unstructured sources
Unstructured data sources:
- News texts
- Blog content
- Reviews
- Comment sections
- Tweets and status updates

31 Sentiment Detection
- a.k.a. opinion mining
- Performed mainly on unstructured, free text data sources
- Research focus since the early 2000s
  - Machine learning available
  - Large text collections available (the internet)
  - Fed by interest in text summarization throughout the 1990s
- Classifies text snippets or entire documents as
  - subjective / objective
  - positive / negative / (neutral)
  - strongly or weakly opinionated (intensity)
- Connects sentiment to topics / entities, e.g. products, productions, persons

32 Sentiment Detection
Not as easy as it seems

33 Text Features for Sentiment Detection
Features for sentiment and subjectivity classification:
- Keywords with positive or negative sentiment
  - Frequency
  - Occurrence (yes/no): more effective
- Bigram or trigram features?
  - Conflicting evidence, but bag-of-words models are problematic, e.g. with regard to negation
- Parts of speech
  - Only reliable feature: frequent adjectives signal subjectivity
- Syntax
  - No clear evidence that parsing is helpful
  - But: syntactic knowledge helps identify valence shifters, e.g. negation, intensifiers, diminishers
  - Collocations / syntactic patterns may be useful
- Predicate-argument combinations may carry sentiment where the single terms do not (latent sentiment)
  - 'The price is low' = positive
- Rule-based classification vs. machine learning approaches
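Why a bag-of-words model struggles with negation can be shown with a small rule-based Python sketch. The word lists and weights are invented for illustration. The simplifying assumption is that a valence shifter applies to the next sentiment keyword, however far away it is, which a syntactic analysis would handle more carefully.

```python
# Hypothetical miniature lexicons
POSITIVE = {"good", "great", "excellent", "clever"}
NEGATIVE = {"bad", "poor", "terrible", "boring"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 2.0}
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.5}


def bag_of_words_score(tokens):
    """Keyword occurrence (yes/no): positive keywords minus negative keywords."""
    present = set(tokens)
    return len(present & POSITIVE) - len(present & NEGATIVE)


def shifted_score(tokens):
    """Same keywords, but valence shifters modify the next sentiment keyword."""
    score = 0.0
    weight = 1.0
    negated = False
    for tok in tokens:
        if tok in NEGATIONS:
            negated = True
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in DIMINISHERS:
            weight *= DIMINISHERS[tok]
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = 1.0 if tok in POSITIVE else -1.0
            if negated:
                polarity = -polarity
            score += weight * polarity
            # shifters are consumed by the keyword they modified
            weight = 1.0
            negated = False
    return score


review = "the movie was not good".split()
print(bag_of_words_score(review))
print(shifted_score(review))
```

The bag-of-words score treats the review as positive because "good" occurs, while the shifter-aware score turns negative.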

34 Creating a Sentiment Dictionary
- Hand-craft?
  - Extremely time-consuming
  - Even human annotators do not agree on all polarities
- Cluster terms according to frequencies, context, and constructions
  - 'elegant but over-priced', 'clever and informative'
  - → 2 clusters; assign orientation (e.g. treating the cluster with the higher average frequency as positive seems to work)
- Use seed words with known polarity
  - Find words with similar distribution or co-occurrence, or which are synonymous
  - Propagate polarity, e.g. across WordNet links
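The seed-word idea can be sketched as a breadth-first propagation over synonym links. The synonym graph below is a hand-made stand-in for WordNet links, and the seeds and function name are assumptions of this example. Real lexicon construction also has to deal with antonym links and ambiguous word senses, which are ignored here.

```python
from collections import deque

# Hypothetical synonym links (stand-in for WordNet relations)
SYNONYMS = {
    "good": ["great", "fine"],
    "great": ["good", "excellent"],
    "excellent": ["great"],
    "fine": ["good"],
    "bad": ["poor", "awful"],
    "poor": ["bad"],
    "awful": ["bad", "terrible"],
    "terrible": ["awful"],
}
SEEDS = {"good": 1, "bad": -1}  # seed words with known polarity


def propagate(seeds, links):
    """Breadth-first propagation: a word inherits the polarity of the first
    already-labeled synonym that reaches it."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour in links.get(word, []):
            if neighbour not in polarity:
                polarity[neighbour] = polarity[word]
                queue.append(neighbour)
    return polarity


lexicon = propagate(SEEDS, SYNONYMS)
for word, pol in sorted(lexicon.items()):
    print(word, "+" if pol > 0 else "-")
```

Starting from two seed words, all eight words of the toy graph receive a polarity.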

35 Sentiment and Topic
- What units are we looking at?
  - Do we want to classify the document / paragraph / sentence / snippet?
  - Local vs. global sentiment of a text
- Distance between topic and sentiment term: same sentence, same paragraph, title of document?
- Topic-dependent sentiment
  - 'Wal-mart reports that profits rose': positive in an article about Wal-mart, negative in an article about Target
  - 'the Samsung Galaxy S5 is better than the LG 3G': positive for Samsung, negative for LG
  - Making things (slightly) easier: let the user specify which topic they want to consider
- Discourse structure
  - Headlines, position in paragraph
  - Quoting and responding behavior in conversation threads

36 Resources for Sentiment Detection
- Polarity word lists / nets
  - English: Harvard General Inquirer, SentiWordNet
  - German: SentiWS
- Reviews with both unstructured and structured content → labeled data for learning sentiment

37 Social Media Analytics: Demographic Information
- What kind of people talk about a product?
  - Men, women, children? Parents?
  - Do they own the product? Are they potential customers?
  - Where do they live?
- Example profile:
  - Username: supermama_10
  - Location: Houston, Texas
  - 'I usually buy Pampers diapers, they are the best'
  - 'I gave my older daughter a Samsung S3 for Xmas, but now my husband uses it all the time lol'

38 Social Media Analytics: a concrete architecture
[Figure: IBM Social Media Analytics architecture, after Coutinho et al.]

40 Social Media Analytics: Refining Concepts
Refining concepts: concept suggestion component
- Select a representative sample of the gathered documents (downsampling)
- Extract the most relevant terms from these documents as keywords
- Cluster documents based on these keywords
  - Control cluster: using just the initially specified concepts
  - Similar to control cluster → add keywords as new concept suggestions
  - Different from control cluster → add keywords as blacklist suggestions
- Feedback to user → refined concept selection → new crawl for documents

42 Sentiment Detection and Concept Extraction
Sentiment Detection (similar, published approach: WebFountain sentiment miner, which also belongs to IBM)
- Linguistic preprocessing:
  - Tokenization
  - POS-tagging
  - Parsing phrase and sentence structures
- Identify concepts and feature terms
  - Part-of or attribute-of relationship with a concept or known feature (e.g. 'lens' part-of 'camera', 'price' attribute-of 'camera')
  - Candidates: beginning definite base noun phrases, i.e. POS-tag/word sequences 'the NN', 'the JJ NN', 'the NN NN' etc. (NN = noun, JJ = adjective) (Yi et al., 2005)
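The candidate patterns can be matched against POS-tagged sentences with very little code. The following Python sketch covers only the three sequences named on the slide ('the NN', 'the JJ NN', 'the NN NN'). The hand-tagged example sentences and the function name are assumptions, and the sketch is not the WebFountain implementation.

```python
# Tag sequences that may follow a sentence-initial "the"; longest patterns first
PATTERNS = [("NN", "NN"), ("JJ", "NN"), ("NN",)]


def candidate_feature(tagged_sentence):
    """Return the feature-term candidate at the start of a POS-tagged sentence,
    or None if the sentence does not begin with one of the patterns."""
    if not tagged_sentence or tagged_sentence[0][0].lower() != "the":
        return None
    rest = tagged_sentence[1:]
    for pattern in PATTERNS:
        window = rest[: len(pattern)]
        if tuple(tag for _, tag in window) == pattern:
            return " ".join(word for word, _ in window)
    return None


# Hand-tagged examples (in practice the tags would come from a POS tagger)
sentences = [
    [("The", "DT"), ("lens", "NN"), ("is", "VBZ"), ("sharp", "JJ")],
    [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("short", "JJ")],
    [("The", "DT"), ("optical", "JJ"), ("zoom", "NN"), ("works", "VBZ")],
    [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("lens", "NN")],
]
candidates = [candidate_feature(s) for s in sentences]
print(candidates)
```

The last sentence yields no candidate because its noun phrase does not stand at the beginning of the sentence.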

43 Sentiment Detection and Concept Extraction
Sentiment Detection
- Sentiment lexicon: <entry> <POS-tag> <polarity>
  - E.g.: excellent JJ +
- Sentiment patterns: <predicate> <sentence-category> <target>
  - <predicate>: a verb
  - <sentence-category>: a subject phrase, object phrase, complement / adjective phrase or prepositional phrase, associated with a polarity + or -
    - Flipped polarity on the target is signified by a ~ marker
  - <target>: a subject or object phrase at which the sentiment is directed

44 Sentiment Detection and Concept Extraction
- Semantic relationship analysis: identify pattern elements from parse trees, starting with predicates
- In a pattern, assign sentiment to the target based on the source sentiment
- If the phrase or the sentence contains a negation, reverse the sentiment polarity
- Precision: 86 %, Recall: 56 %

45 Social Media Analytics: a concrete architecture
[Figure: IBM Social Media Analytics architecture, after Alper et al. and Coutinho et al.]

46 Resources / Further Reading
- Information retrieval:
  - Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Vol. 1. Cambridge: Cambridge University Press.
- Sentiment Detection:
  - Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval (2008).
- Social Media Analytics:
  - Coutinho, Fabio Cardoso, Alexander Lang, and Bernhard Mitschang. "Making Social Media Analysis More Efficient Through Taxonomy Supported Concept Suggestion." Proceedings of the BTW.
  - Alper, Basak, et al. "OpinionBlocks: Visualizing Consumer Reviews." Proceedings of the IEEE VisWeek Workshop on Interactive Text Analytics for Decision Making.
  - Yi, Jeonghee, and Wayne Niblack. "Sentiment Mining in WebFountain." Proceedings of the 21st ICDE.


More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. ~ Spring~r Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures ~ Spring~r Table of Contents 1. Introduction.. 1 1.1. What is the World Wide Web? 1 1.2. ABrief History of the Web

More information
