On-Demand Information Extraction. Summer/Fall 07. New York University Satoshi Sekine

Size: px
Start display at page:

Download "On-Demand Information Extraction. Summer/Fall 07. New York University Satoshi Sekine"

Transcription

1 On-Demand Information Extraction Summer/Fall 07 New York University Satoshi Sekine

2 Introduction (

3 Research topics On-demand IE IE pattern Discovery Multi/Sing doc. sum. IE and Sum. Summarization Paraphrase Discovery Cross Ling. Pre-ODIE Relation Discovery Information needs Encyclopedia QA 2nd grade language test Cross Ling. QA IR, clustering & sum. QA IR English Analyzer (OAK) Extended NE Attribute for ENE Query log & class Web People Search Japanese Spelling

4 What is Information Extraction To automatically extract information on specific scenario from unstructured text and put it into table format Newspaper Ex. Management succession Date person Company Position 7/1/2003 John Smith Smith Trade corporation COE 7/2/2003 Bill Brown Bank of Manhattan President

5 Problem in conventional IE Preparation of knowledge for given scenario Pattern, lexicon and semantic knowledge Company announced Person s promotion to Position Writing rules by hand or creating training data Restriction Preparation time (one month in MUC) Specific/Limited scenario/domain

6 Our goal Make one month to one minute (=automatic) IE on the fly How? Use unsupervised learning methods Pattern discovery, paraphrase discovery Prepare as much knowledge/tools as we can for as many scenarios as possible Extended NE, semantic knowledge, relation Demo

7 ODIE Flow Chart Description of task Relevant Document IR system Language Analyzer Pattern discovery Pattern ENE tagger Paraphrase discovery Co-reference Pattern sets Table construction Table

8 Automatic Discovery of IE patterns (Sudo et.al HLT01) (Sudo et.al ACL03) Bottleneck of IE is knowledge creation IE pattern Company announced Person s promotion to Position Key Idea Retrieve documents on the given topic Collect relatively frequent patterns in the retrieved documents Research focus and Result Pattern format (predicate-argument, chain, sub-tree) Computation time (Tree mining algorithms) Near human performance was achieved

9 IE Pattern Discovery - Result Succession Train: 1 yr. newspaper Test: 148 (87 relevant, 61 irrelevant) Slots: Person:135 Org: 172 Pot: 215

10 Paraphrase Acquisition Automatic acquisition of paraphrase (Sekine Gengo01) Shinyama et al HLT02) (Shinyama et al IWP03) Purpose: Relationship between IE patterns Key idea Events are usually reported in different newspapers on the same day However, NE and numeric, date expressions are identical Use them as anchor to acquire paraphrase We observed encouraging results, but we found that it is not so easy a task

11 Paraphrase Acquisition Procedure Newspaper A Article A Find Comparable Articles Newspaper B Anchors identified NE tagging Coref. Article B Find Comparable Sentences Anchors identified phrase A Extract Paraphrase phrase B

12 Paraphrase Acquisition Evaluation Two techniques (co-reference, Argument Structure Database) One year of two Japanese newspaper 195 article pairs on arrest of murder suspects Coref ASD Sentence w/o (93) with Phrases w/o (>100) w/o with w/o with with obtained precision recall %(41) 44% 69%(52) 56% 24%(25) 56%(18) 62%(23)

13 Relation Discovery Relation discovery (Hasegawa et al ACL04) Motivation Discovering particular relations between NE s Country & President, Merged Companies Unsupervised method Not known the relationships in advance As many relationships as possible Idea Context based clustering Pairs of entities with similar context would be in the same relation Similar contexts could also characterize the relations

14 Relation Discovery Procedure Tag ENE on text, select two ENE types Collecting expressions within 5 words, and make the context between them as context vector Cluster ENE pairs based on context vectors Label each cluster based on frequent context words Named entity Company A is offering to buy s interest in is negotiating to acquire s planned purchase of Accumulated contexts 5 words or less Named entity Company B

15 Relation Discovery Result Major relations Ratio Common words (Relative frequency) President 17/23 President (1.0), president (0.415), Senator 19/21 Sen. (1.0), Republican (0.214), Prime Minister 15/16 Minister (1.0), minister (0.875), Prime (0.875), 15/16 Gov. (1.0), governor (0.458), Governor (0.3), Governor Secretary 6/7 Secretary (1.0), secretary (0.143), Republican 5/6 Rep. (1.0), Republican (0.667), Coach 5/5 coach (1.0), M&A 10/11 buy (1.0), bid (0.382), M&A 9/9 acquire (1.0), acquisition (0.583), buy (0.583), Parent 7/7 parent (1.0), unit (0.476), own (0.143),

16 Relation Discovery Evaluation and Error analysis Evaluation result Labeling accuracy: 10/10 = 100% (Cluster level) 108/121 = 89% (NE pair level) (cf. recall: 140/(177+65) = 58% (All clusters)) False Alarm (Mis-clustered NE pairs): Common words in another cluster which existed in 5 words contexts by accident (noise words) Chechnya war may exhaust President Boris Yeltsin Miss (Undetected NE pairs): Absence of common words in 5 words contexts Outer context might be helpful Boris Yeltsin on end the fighting in Chechnya

17 Another Paraphrase Discovery via Relation Discovery (Hasegawa et.al Gnengo-05) Expressions found in the same relation should contain paraphrase Two filters Used in more than one pair of instance Expressions contain frequent term Ex) A bought B / A has agreed to buy B / A, which is buying B / A s proposed acquisition of B / A s acquisition of B / A s agreement to buy B / A s purchase of B / A bid for B / A s takeover of B / A merger with B / A succeeded in buying B / B, which was acquired by A / B would become a subsidiary of A / B agreed to be bought by A

18 Yet Another Paraphrase Discovery using context and keywords (Sekine IPW-05) To eliminate the frequency threshold for the vector model in Hasegawa s method Algorithm Find keywords in context of each pair of NEs using TFIDF Cluster phrases based on NE instances (buy acquire / buy agree / buy purchase)

19 Extended Named Entity Sekine et.al LREC02) (Sekine et.al LREC04) Named Entity with 200 categories Definition (150 page) was released Automatic tagger 130K entries 2000 rules 70% accuracy Improving

20 Attributes for ENE (ver ) (example for PERSON) Attribute(20) Example of value Vocation professional baseball player, economist, poet Nationality Freq. (%) ENE 46(100) Vocation American, Chinese, Japanese 29(63) Country Career A professor at Yale University, The Princess of Wales 26(57) Vocation Masterpiece Guernica, Mona Lisa 25(54) Product, Facility Graduate M.A. in German at Cambridge, MK High School 20(44) School Hometown Paris, Manchester, Shanghai 19(41) City Native Province State of Illinois, Sichuan 18(39) Province Previous stay England, New York 12(26) Location Mentor Andrea del Verrocchio, Michelangelo di Lodovic Buonarroti Simoni 10(22) Person Death date 04,23,1704, 04/23/1704, unknown 10(22) date Era Edo period, the 11th century 8(17) Era Award Academy Award, MVP, Nobel Prize 8(17) Award Real Name Saint Nicholas 8(17) Person Another name Santa, father Christmas 8(17) Person Title Knight, an honorary degree at Yale 6(13) Title Competition World Series, 1955 piano competition in Paris 6(13) Game Place of birth New York, Birmingham 5(11) Location Father John B. Kelly, Sr. 5(11) Person Cause of death Car accident, Guillotine 5(11)

21 ODIE Flow Chart Description of task Relevant Document IR system Language Analyzer Pattern discovery Pattern ENE tagger Paraphrase discovery Co-reference Pattern sets Table construction Table

22 On-demand Information Extraction - ODIE Middle-size NSF project at NYU 03-08) Demo Description of the task (topic) acquire acquisition merge merger buy purchase Build a table in 1 minute Docid Company Money Date nyt Comdata Holdings Corp., Ceridian Corp. $900 million Last week nyt Chemicals Inc., $400 million Last year nyt Tenneco Inc., Mobil Corp. $1.27 billion nyt CoreStates Financial Corp., Merdian Bancorp $3.1 billion

23 nyt The move comes amid a flurry of acquisitions in the computer financial processing industry. Last week, check processor Ceridian Corp. agreed to buy Comdata Holdings Corp. for $900 million. nyt Last year, Terra Industries Inc., another large fertilizer maker, bought Agricultural Minerals & Chemicals Inc. for $400 million. ``Industry growth is primarily overseas,'' said Harvey Stober, an analyst with Donaldson, Lufkin & Jenrette in New York. nyt Tenneco Inc. said it agreed to buy Mobil Corp.'s plastics division for $1.27 billion as part of its strategy to expand into less cyclical, higher-growth businesses. nyt CoreStates Financial Corp. agreed to buy Meridian Bancorp for $3.1 billion in stock, creating a bank with a commanding role in Philadelphia and the industrial cities of eastern Pennsylvania.

24 ODIE - Evaluation Result Out of 20 topics, tables created for 2 topics are judged useful, 12 are useful, 6 are not useful. Correctness of the table fillers Evaluation Number of rows Correct 84 Partially correct 4 Incorrect 12

25 Research topics On-demand IE IE pattern Discovery Multi/Sing doc. sum. IE and Sum. Summarization Paraphrase Discovery Cross Ling. Pre-ODIE Relation Discovery Information needs Encyclopedia QA 2nd grade language test Cross Ling. QA IR, clustering & sum. QA IR English Analyzer (OAK) Extended NE Attribute for ENE Query log & class Web People Search Japanese Spelling

26 Preemptive Information Extraction (Shinyama, Sekine NAACL06) We can even delete queries 1. Find a pair (or triple) of entities that have a similar relation in articles. ex. Relation between Katrina - New Orleans in Article A is similar to relation between Longwang - Taiwan in Article B. 2. Cluster relations based on their similarity. 3. A cluster of similar relations = table.

27 Knowledge Discovery from Query Log (work at Microsoft Research, Summer 06) Query log contains a lot of knowledge Discover knowledge about ENE Query log (6month, 1.5B) bathroom+showers+pictures miss+mundo owen+mulligan+tyrone news+press free+wedding+planner marc+arther+glen preadolescente+follando gokmenler+agricultural+machinery+co.+erotika+com academy+award+winner ENE dictionary (245K) 100+greatest+britons AWARD 100+worst+britons AWARD aaass/orbis+books+prize+for+polish+studies AWARD abel+prize AWARD academy+award AWARD academy+awards AWARD acm+turing+award AWARD agatha+award AWARD agatha+awards AWARD air+medal AWARD

28 Problems to be solved Noise Problem 2 This is noise in ENE dictionary. I want to get rid of them!!! Populate the list Problem 3 The list may not have enough coverage. I want to populate ENE Important Context Problem 1 This is a very general context. I want to see important context for this category Clustering Problem 4 Individual entities are verbose. I want to cluster them.

29 Key Idea Find Important context for each category New Formula (not TFIDF, MI, log likelihood ratio ) Score(context) = ftype{context} * log( g(context) / C ); g(context) = ftype{context}/finst{context}; C = ftype{contexttop1000} / Finst{contexttop1000}; Result ACADEMIC #+syllabus, degrees+in+#, phd+in+#, masters+in+#, diploma+in+#, #+lecture+notes, masters+degree+in+#, courses+in+, AWARD #+winners, #+nominees, #+nominations, #+winner, #+award, who+won+#, winners+of+#, list+of+#+winners, winners+of+the+#

30 Results Delete Noise (Problem 2) Entities which does not co-occur with important context are noise BOOK: it, porno, space, night, we, working, candy, wheels, foundation, jazz, ghost, couples, the bible, giant, daddy, creation Find New entities (Problem 3) words co-occur with important context are entities AWARD: golden+globes, grammys, golden+globe, kentucky+derby, daytime+emmy, sag, sag+awards, american+idol, daytime+emmys

31 Web People Search a SemEval 2007 task Input Output First 100 results for a person name search Clustring according to the actual people John Smith 1 (Captain) John Smith 2 (Labour leader) Captain John Smith John Smith wikipedia en.eikipedia.. BBC: Labour leader John Smith news John Smith wikipedia en.wikipedia John Smith 3 (IBM researcher) John Smith 4 (Film director) John Smith 5 (Shoe company) John Smith 6 (Writer)

32 WePS SemEval - formally called Senseval since proposal / 18 final tasks >100 teams, > 125 systems, 110 papers WePS is the largest single task 16 participants (38 people in ML) Data Training source Test Av. entity Av. doc. Wikipedia ECDL WEB03* Total av. source Av. entity Av. doc. Wikipedia ACL Census Total av

33 WePS (Result) team CU_COM SEM COMBINED_BASELINE IRST-BP PSNUS UVA SHEF FICO UNN SCATTERED_BASELINE AUG SWAT-IV UA-ZSA TITPI JHU1-13 DFKI2 WIT UC3M_13 UBC-AS JOINED_BASELINE F α=0, purity inv_purity Presentation/Poster Tomorrow - PS2-11:15 12:30 Tomorrow - PS2 11:15 12:30 Next presentation - 17:00 17:15 Tomorrow - PS3 14:30 15:45 Tomorrow - PS2 11:15 12:30 Tomorrow - PS3 14:30 15:45 Tomorrow - PS2 11:15 12:30 Tomorrow - PS3 14:30 15:45 Tomorrow - PS3 14:30 15:45 Tomorrow - PS2 11:15 12:30 Tomorrow - PS2 11:15 12:30 Tomorrow - PS3 14:30 15:45 Tomorrow - PS3 14:30 15:45 -

34 WePS Systems Basic strategies are similar HTML analsys => pre-processing (POS, NE ) => feature selection (token, NE ) => calculate similarities => make clusters => produce output Summary is available Future Second event (08 fall or 09 spring) Attribute extraction (DOB, vocation )

35 Human Relationship Extraction (Goto NHK) Extract human relationships from movie/tv summary Display it as a network in a graph NE for the domain Attribute for NE Clustering the relationships

36 English Analyzer (OAK system) Chunked sentence (low level constituent) POS tagged sentence CONLL PTB/tag Tokenized sentence Dependency between Chunks and constituents PTB/tree CONL L, MUC PTB/tree PTB/tree Collins Plain Sentence PTB/tag PTB/tag PTB/tree NE tagged sentence (org,loc,person,time ) Plain NE tagger Chunker Dependency Analyzer POS tagger Parser Tokenizer Function Tagger Sentence Splitter Regularizer Triplet Plain Text OAK System PTB/tree Parse tree PTB/tree Parse tree with Function labels Glarf Regularized structure (predicate-argument)

37

38 Research topics On-demand IE IE pattern Discovery Multi/Sing doc. sum. IE and Sum. Summarization Paraphrase Discovery Cross Ling. Pre-ODIE Relation Discovery Information needs Encyclopedia QA 2nd grade language test Cross Ling. QA IR, clustering & sum. QA IR English Analyzer (OAK) Extended NE Attribute for ENE Query log & class Web People Search Japanese Spelling

39 Extra slides

40 IR, clustering and multi-doc. summaization Nobata et.al Gengo03) Nobata et.al DUC03) (Murakami et.al Gengo04) Large number of retrieved documents Automatic clustering by sub-topics 200 documents -> 5 clusters Summarize each cluster (MDS) Demo (

41

42

43

44 Cross Lingual Question Answering Sekine DARPA03) (Sekine TALIP04) Developed in a month in DARPA s SurpriseLanguage experiment Type in question in English Translate question to Hindi Find answer in Hindi newspaper Translate answer to English Demo (

45 NE tagged corpus Num. knowledge Q Examiner NE tagger Keywords in English Type of expected answer newspaper Relevant Documents CLIR Q in English Numeric tagger Keywords in Hindi User I/F Find answer Bi-ling. dict. Answer, context in English Answer, context in Hindi Translation (H->E)

46

47

48

49 Huge Text data & knowledge acquisition (Yoshihira et.al Gengo04) (Ando et.al LREC04) (Sekine Gengo04) Collecting huge corpus (untagged) WEB pages : (Japanese 350G, English 300G) Newspaper : 38 years, ~10G Encyclopedia Tool Suffix Array for the huge data Application Semantic knowledge acquisition by lexico-syntactic pattern Automatic acquisition of NE instances IE pattern, paraphrase acquisition Collect spelling variations KWIC system (

50

51 Survey on Multi-document summarization Sekine et.al DUC03) Purpose Multi-doc. summarization & IE Analyze summary/category of 100 doc. set Category of doc. set 13 category: (multi single)-(person org loc ), other Consistent tagging by people It should be useful for summarization strategy Table format summary 40-45%: Table is natural, 36-38%: Table is OK Encouraging results for On-demand IE

52 Spelling variation in Japanese (Masuyama et.al Gengo04) Serious problem in real world systems 30% of question for encyclopedia QA system 40% of zero hits on dictionary look up A system to find the closest entry 190,000 entries Edit distance after narrowing down candidates If the target is in the dictionary, 100% in top 3 Demo Planning to construct variation dictionary from WEB corpus

53

54 Encyclopedia QA (Sekine Gengo03) (Sekine et al. Gengo05) User can ask question in natural language and get the answer from encyclopedia System is working as a real demo ( Answer in the entry words or numeral in explanation Answer prepared in advance by IE and pattern based Q matcher pinpoint the answer

55

56 2nd Grade Language Test Objectives Realize the NLP technologies into the form which can be easily observed by ordinary people Observe the problems of the NLP technologies by degrading the level of target materials Results # of Q Known Correct Cor in known Cor. In total Kanji Word K Reading Total

57

58 Research topics On-demand IE IE pattern Discovery Multi/Sing doc. sum. IE and Sum. Summarization Paraphrase Discovery Cross Ling. Pre-ODIE Relation Discovery Information needs Encyclopedia QA 2nd grade language test Cross Ling. QA IR, clustering & sum. QA IR English Analyzer (OAK) Extended NE Attribute for ENE Query log & class Web People Search Japanese Spelling

59 My Research Interest things are the central People, organization, location, event name, vocation, product name How they are related How they act, are used or are evaluated It is knowledge How to create knowledge? Extract from huge corpus Fusion with the existing knowledge (e.g. encyclopedia)

60 Knowledge Belongs to

61 Knowledge Belongs to Belongs to A part of Belongs to A part of

62 Knowledge Belongs to Has a lunch Belongs to A part of Belongs to A part of

63 Knowledge Work in, commute in Belongs to Has a lunch Belongs to A part of Belongs to A part of

64 Knowledge features Largest city Work in, commute in Favorite team Belongs to Has a lunch Belongs to A part of Belongs to A part of

65 Knowledge features Largest city Work in, commute in Now visiting Favorite team Belongs to Has a lunch Belongs to A part of Belongs to A part of

66 Knowledge Largest city Friendship countries A member of features Work in, commute in Now visiting Favorite team Belongs to Has a lunch Belongs to A part of Belongs to A part of

67 Knowledge A member of A member of features Pay money A member of Friendship countries Work in, commute in Now visiting A member of Largest city Saw it play Favorite team Nationality A member of Belongs to Awarded Favorite team Nationality Live Born in Has a lunch Belongs to A part of Belongs to Same name A part of A member of

68 Boo! Chee r! Knowledge A member of A member of features Pay money A member of Friendship countries Work in, commute in Now visiting A member of Largest city Saw it play Favorite team Nationality A member of Belongs to Awarded Favorite team Nationality Live Born in Has a lunch Belongs to A part of Belongs to Same name A part of A member of

69 Knowledge Creation Not only by hand Cyc, EDR Learn from Corpus Frequency, TFIDF Pattern discovery, NE discovery Distributional similarity NE discovery, attribute for NE, Paraphrase Discovery Lexico-syntactic pattern Hyper-hyponym discovery, IE pattern Alignment Paraphrase Discovery

70 Topics Names Create categories (e.g. product name, event name) Find entities (e.g. instances of laptop names) Relation Find relations (e.g. pro/cons of products, sentiment, opinion) Categorize Relation Paraphrase, textual entailment (e.g. same kind of opinion) Knowledge Link and structure the extracted knowledge Human interface

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University

More information

Anotaciones semánticas: unidades de busqueda del futuro?

Anotaciones semánticas: unidades de busqueda del futuro? Anotaciones semánticas: unidades de busqueda del futuro? Hugo Zaragoza, Yahoo! Research, Barcelona Jornadas MAVIR Madrid, Nov.07 Document Understanding Cartoon our work! Complexity of Document Understanding

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization

Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Knowledge Discovery using Text Mining: A Programmable Implementation on Information Extraction and Categorization Atika Mustafa, Ali Akbar, and Ahmer Sultan National University of Computer and Emerging

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Hexaware E-book on Predictive Analytics

Hexaware E-book on Predictive Analytics Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,

More information

Why is Internal Audit so Hard?

Why is Internal Audit so Hard? Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

More information

Mining Text Data: An Introduction

Mining Text Data: An Introduction Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo

More information

How To Write A Summary Of A Review

How To Write A Summary Of A Review PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,

More information

Cross-Lingual Concern Analysis from Multilingual Weblog Articles

Cross-Lingual Concern Analysis from Multilingual Weblog Articles Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/

More information

Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1

Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1 Qualitative Corporate Dashboards for Corporate Monitoring Peng Jia and Miklos A. Vasarhelyi 1 Introduction Electronic Commerce 2 is accelerating dramatically changes in the business process. Electronic

More information

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,

More information

TREC 2003 Question Answering Track at CAS-ICT

TREC 2003 Question Answering Track at CAS-ICT TREC 2003 Question Answering Track at CAS-ICT Yi Chang, Hongbo Xu, Shuo Bai Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China changyi@software.ict.ac.cn http://www.ict.ac.cn/

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

More information

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words , pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

Question Answering and Multilingual CLEF 2008

Question Answering and Multilingual CLEF 2008 Dublin City University at QA@CLEF 2008 Sisay Fissaha Adafre Josef van Genabith National Center for Language Technology School of Computing, DCU IBM CAS Dublin sadafre,josef@computing.dcu.ie Abstract We

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System Athira P. M., Sreeja M. and P. C. Reghuraj Department of Computer Science and Engineering, Government Engineering

More information

Why are Organizations Interested?

Why are Organizations Interested? SAS Text Analytics Mary-Elizabeth ( M-E ) Eddlestone SAS Customer Loyalty M-E.Eddlestone@sas.com +1 (607) 256-7929 Why are Organizations Interested? Text Analytics 2009: User Perspectives on Solutions

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

A Survey on Web Mining From Web Server Log

A Survey on Web Mining From Web Server Log A Survey on Web Mining From Web Server Log Ripal Patel 1, Mr. Krunal Panchal 2, Mr. Dushyantsinh Rathod 3 1 M.E., 2,3 Assistant Professor, 1,2,3 computer Engineering Department, 1,2 L J Institute of Engineering

More information

Text Mining - Scope and Applications

Text Mining - Scope and Applications Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss

More information

Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse

Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse Milk, bread and toothpaste : Adapting Data Mining techniques for the analysis of collocation at varying levels of discourse Rob Sanderson, Matthew Brook O Donnell and Clare Llewellyn What happens with

More information

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach. Pranali Chilekar 1, Swati Ubale 2, Pragati Sonkambale 3, Reema Panarkar 4, Gopal Upadhye 5 1 2 3 4 5

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Exam in course TDT4215 Web Intelligence - Solutions and guidelines -

Exam in course TDT4215 Web Intelligence - Solutions and guidelines - English Student no:... Page 1 of 12 Contact during the exam: Geir Solskinnsbakk Phone: 94218 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Friday May 21, 2010 Time: 0900-1300 Allowed

More information

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems

Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems Accelerating and Evaluation of Syntactic Parsing in Natural Language Question Answering Systems cation systems. For example, NLP could be used in Question Answering (QA) systems to understand users natural

More information

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow

A Framework-based Online Question Answering System. Oliver Scheuer, Dan Shen, Dietrich Klakow A Framework-based Online Question Answering System Oliver Scheuer, Dan Shen, Dietrich Klakow Outline General Structure for Online QA System Problems in General Structure Framework-based Online QA system

More information

Semantic annotation of requirements for automatic UML class diagram generation

Semantic annotation of requirements for automatic UML class diagram generation www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute

More information

GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns

GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns Stamatina Thomaidou 1,2, Konstantinos Leymonis 1,2, Michalis Vazirgiannis 1,2,3 Presented by: Fragkiskos Malliaros 2 1 : Athens

More information

The Seven Practice Areas of Text Analytics

The Seven Practice Areas of Text Analytics Excerpt from: Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and R. Nisbet, Elsevier, January 2012 Available now:

More information

Supervised Learning Evaluation (via Sentiment Analysis)!

Supervised Learning Evaluation (via Sentiment Analysis)! Supervised Learning Evaluation (via Sentiment Analysis)! Why Analyze Sentiment? Sentiment Analysis (Opinion Mining) Automatically label documents with their sentiment Toward a topic Aggregated over documents

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Introduction to IE with GATE

Introduction to IE with GATE Introduction to IE with GATE based on Material from Hamish Cunningham, Kalina Bontcheva (University of Sheffield) Melikka Khosh Niat 8. Dezember 2010 1 What is IE? 2 GATE 3 ANNIE 4 Annotation and Evaluation

More information

Leveraging ASEAN Economic Community through Language Translation Services

Leveraging ASEAN Economic Community through Language Translation Services Leveraging ASEAN Economic Community through Language Translation Services Hammam Riza Center for Information and Communication Technology Agency for the Assessment and Application of Technology (BPPT)

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Text Analytics Illustrated with a Simple Data Set

Text Analytics Illustrated with a Simple Data Set CSC 594 Text Mining More on SAS Enterprise Miner Text Analytics Illustrated with a Simple Data Set This demonstration illustrates some text analytic results using a simple data set that is designed to

More information

Text Analytics. A business guide

Text Analytics. A business guide Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application

More information

Brill s rule-based PoS tagger

Brill s rule-based PoS tagger Beáta Megyesi Department of Linguistics University of Stockholm Extract from D-level thesis (section 3) Brill s rule-based PoS tagger Beáta Megyesi Eric Brill introduced a PoS tagger in 1992 that was based

More information

Optimization of Internet Search based on Noun Phrases and Clustering Techniques

Optimization of Internet Search based on Noun Phrases and Clustering Techniques Optimization of Internet Search based on Noun Phrases and Clustering Techniques R. Subhashini Research Scholar, Sathyabama University, Chennai-119, India V. Jawahar Senthil Kumar Assistant Professor, Anna

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Technical Report. The KNIME Text Processing Feature:

Technical Report. The KNIME Text Processing Feature: Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold Killian.Thiel@uni-konstanz.de Michael.Berthold@uni-konstanz.de Copyright 2012 by KNIME.com AG

More information

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

More information

Internet of Things, data management for healthcare applications. Ontology and automatic classifications

Internet of Things, data management for healthcare applications. Ontology and automatic classifications Internet of Things, data management for healthcare applications. Ontology and automatic classifications Inge.Krogstad@nor.sas.com SAS Institute Norway Different challenges same opportunities! Data capture

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Stanford s Distantly-Supervised Slot-Filling System

Stanford s Distantly-Supervised Slot-Filling System Stanford s Distantly-Supervised Slot-Filling System Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning Computer Science Department,

More information

Inner Classification of Clusters for Online News

Inner Classification of Clusters for Online News Inner Classification of Clusters for Online News Harmandeep Kaur 1, Sheenam Malhotra 2 1 (Computer Science and Engineering Department, Shri Guru Granth Sahib World University Fatehgarh Sahib) 2 (Assistant

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

TechWatch. Technology and Market Observation powered by SMILA

TechWatch. Technology and Market Observation powered by SMILA TechWatch Technology and Market Observation powered by SMILA PD Dr. Günter Neumann DFKI, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Juni 2011 Goal - Observation of Innovations and Trends»

More information

Search Engine Based Intelligent Help Desk System: iassist

Search Engine Based Intelligent Help Desk System: iassist Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India sahilshahwnr@gmail.com, sheetaltakale@gmail.com

More information

8. Machine Learning Applied Artificial Intelligence

8. Machine Learning Applied Artificial Intelligence 8. Machine Learning Applied Artificial Intelligence Prof. Dr. Bernhard Humm Faculty of Computer Science Hochschule Darmstadt University of Applied Sciences 1 Retrospective Natural Language Processing Name

More information

Search Result Optimization using Annotators

Search Result Optimization using Annotators Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

More information

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

More information

SINAI at WEPS-3: Online Reputation Management

SINAI at WEPS-3: Online Reputation Management SINAI at WEPS-3: Online Reputation Management M.A. García-Cumbreras, M. García-Vega F. Martínez-Santiago and J.M. Peréa-Ortega University of Jaén. Departamento de Informática Grupo Sistemas Inteligentes

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

The key linkage of Strategy, Process and Requirements

The key linkage of Strategy, Process and Requirements Business Systems Business Functions The key linkage of Strategy, Process and Requirements Leveraging value from strategic business architecture By: Frank Kowalkowski, Knowledge Consultants, Inc.. Gil Laware,

More information

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014

Automatic Knowledge Base Construction Systems. Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 Automatic Knowledge Base Construction Systems Dr. Daisy Zhe Wang CISE Department University of Florida September 3th 2014 1 Text Contains Knowledge 2 Text Contains Automatically Extractable Knowledge 3

More information

Financial Trading System using Combination of Textual and Numerical Data

Financial Trading System using Combination of Textual and Numerical Data Financial Trading System using Combination of Textual and Numerical Data Shital N. Dange Computer Science Department, Walchand Institute of Rajesh V. Argiddi Assistant Prof. Computer Science Department,

More information

How To Train A Kpl System

How To Train A Kpl System New York University 2011 System for KBP Slot Filling Ang Sun, Ralph Grishman, Wei Xu, Bonan Min Computer Science Department New York University {asun, grishman, xuwei, min}@cs.nyu.edu 1 Introduction The

More information

A Survey of Text Mining Techniques and Applications

A Survey of Text Mining Techniques and Applications 60 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 1, NO. 1, AUGUST 2009 A Survey of Text Mining Techniques and Applications Vishal Gupta Lecturer Computer Science & Engineering, University

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

Shallow Parsing with Apache UIMA

Shallow Parsing with Apache UIMA Shallow Parsing with Apache UIMA Graham Wilcock University of Helsinki Finland graham.wilcock@helsinki.fi Abstract Apache UIMA (Unstructured Information Management Architecture) is a framework for linguistic

More information

Text Understanding and Analysis of Natural Language Communication

Text Understanding and Analysis of Natural Language Communication Information Extraction Definition Information Extraction Katharina Kaiser http://www.ifs.tuwien.ac.at/~kaiser History Architecture of IE systems Wrapper systems Approaches Evaluation 2 IE: Definition IE:

More information

Build Vs. Buy For Text Mining

Build Vs. Buy For Text Mining Build Vs. Buy For Text Mining Why use hand tools when you can get some rockin power tools? Whitepaper April 2015 INTRODUCTION We, at Lexalytics, see a significant number of people who have the same question

More information

» A Hardware & Software Overview. Eli M. Dow <emdow@us.ibm.com:>

» A Hardware & Software Overview. Eli M. Dow <emdow@us.ibm.com:> » A Hardware & Software Overview Eli M. Dow Overview:» Hardware» Software» Questions 2011 IBM Corporation Early implementations of Watson ran on a single processor where it took 2 hours

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams

Using Text and Data Mining Techniques to extract Stock Market Sentiment from Live News Streams 2012 International Conference on Computer Technology and Science (ICCTS 2012) IPCSIT vol. XX (2012) (2012) IACSIT Press, Singapore Using Text and Data Mining Techniques to extract Stock Market Sentiment

More information

Effective Self-Training for Parsing

Effective Self-Training for Parsing Effective Self-Training for Parsing David McClosky dmcc@cs.brown.edu Brown Laboratory for Linguistic Information Processing (BLLIP) Joint work with Eugene Charniak and Mark Johnson David McClosky - dmcc@cs.brown.edu

More information

A Case Study of Question Answering in Automatic Tourism Service Packaging

A Case Study of Question Answering in Automatic Tourism Service Packaging BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, Special Issue Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0045 A Case Study of Question

More information

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia Outline I What is CALL? (scott) II Popular language learning sites (stella) Livemocha.com (stacia) III IV Specific sites

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

Finding Advertising Keywords on Web Pages. Contextual Ads 101

Finding Advertising Keywords on Web Pages. Contextual Ads 101 Finding Advertising Keywords on Web Pages Scott Wen-tau Yih Joshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University Contextual Ads 101 Publisher s website Digital Camera Review The

More information

Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007

Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007 Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007 Naïve Bayes Components ML vs. MAP Benefits Feature Preparation Filtering Decay Extended Examples

More information

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection

Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Dublin City University at CLEF 2004: Experiments with the ImageCLEF St Andrew s Collection Gareth J. F. Jones, Declan Groves, Anna Khasin, Adenike Lam-Adesina, Bart Mellebeek. Andy Way School of Computing,

More information

TAAI 2012 Panel Discussion: Big Data. About Me: Chin Yew Lin

TAAI 2012 Panel Discussion: Big Data. About Me: Chin Yew Lin TAAI 2012 Panel Discussion: Big Data Chin Yew Lin cyl@microsoft.com Microsoft Research Asia About Me: Chin Yew Lin Senior Researcher, Knowledge Mining Group, Microsoft Research Asia Areas of Interest Natural

More information

Schema documentation for types1.2.xsd

Schema documentation for types1.2.xsd Generated with oxygen XML Editor Take care of the environment, print only if necessary! 8 february 2011 Table of Contents : ""...........................................................................................................

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

Information & Data Visualization. Yasufumi TAKAMA Tokyo Metropolitan University, JAPAN ytakama@sd.tmu.ac.jp

Information & Data Visualization. Yasufumi TAKAMA Tokyo Metropolitan University, JAPAN ytakama@sd.tmu.ac.jp Information & Data Visualization Yasufumi TAKAMA Tokyo Metropolitan University, JAPAN ytakama@sd.tmu.ac.jp 1 Introduction Contents Self introduction & Research purpose Social Data Analysis Related Works

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)

An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines) An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines) James Clarke, Vivek Srikumar, Mark Sammons, Dan Roth Department of Computer Science, University of Illinois, Urbana-Champaign.

More information

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Survey Results: Requirements and Use Cases for Linguistic Linked Data Survey Results: Requirements and Use Cases for Linguistic Linked Data 1 Introduction This survey was conducted by the FP7 Project LIDER (http://www.lider-project.eu/) as input into the W3C Community Group

More information

A Survey on Product Aspect Ranking

A Survey on Product Aspect Ranking A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System

An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System Asanee Kawtrakul ABSTRACT In information-age society, advanced retrieval technique and the automatic

More information

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval

Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Approaches of Using a Word-Image Ontology and an Annotated Image Corpus as Intermedia for Cross-Language Image Retrieval Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH Journal of Computer Science 9 (7): 922-927, 2013 ISSN: 1549-3636 2013 doi:10.3844/jcssp.2013.922.927 Published Online 9 (7) 2013 (http://www.thescipub.com/jcs.toc) ARABIC PERSON NAMES RECOGNITION BY USING

More information