Special Topics in Computer Science



Similar documents
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Programming Languages

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Semantic annotation of requirements for automatic UML class diagram generation

An Introduction to Language Processing with Perl and Prolog

D2.4: Two trained semantic decoders for the Appointment Scheduling task

A Machine Translation System Between a Pair of Closely Related Languages

Computer Science Theory. From the course description:

Markus Dickinson. Dept. of Linguistics, Indiana University Catapult Workshop Series; February 1, 2013

Automatic Text Analysis Using Drupal

Natural Language Database Interface for the Community Based Monitoring System *

English Descriptive Grammar

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Architecture of an Ontology-Based Domain- Specific Natural Language Question Answering System

Processing: current projects and research at the IXA Group

How To Complete The Danish Masters Program In Lct

Scope of this Course. Database System Environment. CSC 440 Database Management Systems Section 1

CSCI 3136 Principles of Programming Languages

Presented to The Federal Big Data Working Group Meetup On 07 June 2014 By Chuck Rehberg, CTO Semantic Insights a Division of Trigent Software

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

Natural Language to Relational Query by Using Parsing Compiler

31 Case Studies: Java Natural Language Tools Available on the Web

Teaching Formal Methods for Computational Linguistics at Uppsala University

ARABIC PERSON NAMES RECOGNITION BY USING A RULE BASED APPROACH

CS 51 Intro to CS. Art Lee. September 2, 2014

Classification of Natural Language Interfaces to Databases based on the Architectures

NATURAL LANGUAGE QUERY PROCESSING USING PROBABILISTIC CONTEXT FREE GRAMMAR

Overview of the TACITUS Project

Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

Word Completion and Prediction in Hebrew

Customizing an English-Korean Machine Translation System for Patent Translation *

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Interactive Dynamic Information Extraction

How to make Ontologies self-building from Wiki-Texts

Clustering Connectionist and Statistical Language Processing

The Prolog Interface to the Unstructured Information Management Architecture

Terminology Extraction from Log Files

Automated Content Analysis of Discussion Transcripts

English 1302 Writing Across the Curriculum Spring 2016

n Introduction n Art of programming language design n Programming language spectrum n Why study programming languages? n Overview of compilation

Timeline (1) Text Mining Master TKI. Timeline (2) Timeline (3) Overview. What is Text Mining?

Introduction to Text Mining. Module 2: Information Extraction in GATE

A Mixed Trigrams Approach for Context Sensitive Spell Checking

English 1302 Writing Across the Curriculum Fall 2015

Research Portfolio. Beáta B. Megyesi January 8, 2007

Curriculum Vitae. John M. Zelle, Ph.D.

Assessment for Master s Degree Program Fall Spring 2011 Computer Science Dept. Texas A&M University - Commerce

TOOL OF THE INTELLIGENCE ECONOMIC: RECOGNITION FUNCTION OF REVIEWS CRITICS. Extraction and linguistic analysis of sentiments

Sense-Tagging Verbs in English and Chinese. Hoa Trang Dang

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Hybrid Strategies. for better products and shorter time-to-market

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Terminology Extraction from Log Files

COURSE SYLLABUS ESU 561 ASPECTS OF THE ENGLISH LANGUAGE. Fall 2014

How RAI's Hyper Media News aggregation system keeps staff on top of the news

Master Degree Project Ideas (Fall 2014) Proposed By Faculty Department of Information Systems College of Computer Sciences and Information Technology

Datavetenskapligt Program (kandidat) Computer Science Programme (master)

A Method for Automatic De-identification of Medical Records

Sentiment analysis on news articles using Natural Language Processing and Machine Learning Approach.

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE

INTEGRATED DEVELOPMENT ENVIRONMENTS FOR NATURAL LANGUAGE PROCESSING

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

USC VITERBI SCHOOL OF ENGINEERING INFORMATICS PROGRAM

Agreement on. Dual Degree Master Program in Computer Science KAIST. Technische Universität Berlin

Language Evaluation Criteria. Evaluation Criteria: Readability. Evaluation Criteria: Writability. ICOM 4036 Programming Languages

Salt Lake Community College Essentials of College Study (EDU 1020)

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

Christian Leibold CMU Communicator CMU Communicator. Overview. Vorlesung Spracherkennung und Dialogsysteme. LMU Institut für Informatik

Compilers. Presentation. Lecture 0. Spring term. Mick O Donnell: michael.odonnell@uam.es Alfonso Ortega: alfonso.ortega@uam.es

Syllabus INFO-GB Design and Development of Web and Mobile Applications (Especially for Start Ups)

University of Central Florida Department of Electrical Engineering & Computer Science EEL 4914C Spring Senior Design I

PP-Attachment. Chunk/Shallow Parsing. Chunk Parsing. PP-Attachment. Recall the PP-Attachment Problem (demonstrated with XLE):

Sentiment analysis for news articles

The Specific Text Analysis Tasks at the Beginning of MDA Life Cycle

CS4025: Pragmatics. Resolving referring Expressions Interpreting intention in dialogue Conversational Implicature

Reading Listening and speaking Writing. Reading Listening and speaking Writing. Grammar in context: present Identifying the relevance of

Building a Question Classifier for a TREC-Style Question Answering System

Semantic Analysis of Natural Language Queries Using Domain Ontology for Information Access from Database

Financial Optimization ISE 347/447. Preliminaries. Dr. Ted Ralphs

ETL Ensembles for Chunking, NER and SRL

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Winter 2016 Course Timetable. Legend: TIME: M = Monday T = Tuesday W = Wednesday R = Thursday F = Friday BREATH: M = Methodology: RA = Research Area

A GrAF-compliant Indonesian Speech Recognition Web Service on the Language Grid for Transcription Crowdsourcing

DEPENDENCY PARSING JOAKIM NIVRE

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

Transcription:

Special Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 Jong C. Park Computer Science Department Korea Advanced Institute of Science and Technology

INTRODUCTION Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 2

Objectives Natural Language Processing (NLP) P)is a discipline that takes a semester or two to appreciate it the full spectrum of its implications. In this course, however, we will spend a minimal amount of time for the introductory materials and jump into real world applications, focusing on text mining and human robot interface, for the students to get a hands on experience as early as possible and with meaningful results by the end of the semester. Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 3

Administrative Details Instructor Jong C. Park Email: park@cs.kaist.ac.kr Office: CS Bldg. Room 2406 Phone: x3541 Teaching Assistants Seung Cheol Baek and Hee Jin Lee Email: cs492@nlp.kaist.ac.kr k Phone: x3581 Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 4

Administrative Details Homepage http://nlp.kaist.ac.kr/~cs492 p// p / Time and Place Tuesdays and Thursdays 1pm~2:20pm CS Bldg. Room 2443 Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 5

Course Plan Textbooks Textbook Materials Software Review Project Feedback Plan Weekly Schedule Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 6

Textbooks Pi Primary An Introduction to Language Processing with Perl and Prolog: An Outline of Theories, Implementation, ti and Application with Special Consideration of English, French, and German, Pierre M. Nugues, Springer, 2006. http://www.cs.lth.se/~pierre/ilppp/ Secondary Speech and Language Processing: An Introduction to Natural Language g Processing, Computational Linguistics, and Speech Recognition, Daniel Jurafsky & James H. Martin, Pearson, 2 nd edition, 2009. Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 7

Textbook Materials [chapters] An Overview of Language Processing Corpus Processing Tools Encoding, Entropy, and Annotation Schemes Counting Words Words, Parts of Speech, and Morphology Part of Speech Tagging Using Rules Part of Speech Tagging Using Stochastic Techniques Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 8

Textbook Materials [chapters] Phrase Structure Grammars in Prolog Partial Parsing Syntactic Formalisms Parsing Techniques Semantics and Predicate Logic Lexical Semantics Discourse Dialogue Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 9

Textbook Materials [software] An Overview of Language Processing The German Institute for Artificial Intelligence Research http://registry.dfki.de The Oxford Text Archive http://ota.ox.ac.uk The Linguistic Data Consortium of the University of Pennsylvania http://www.ldc.upenn.edu The European Language Resources Association http://www.elra.info Peedy at the Microsoft Research Web site http://www.research.microsoft.com Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 10

Textbook Materials [software] Corpus Processing Tools Finite State Automata in Prolog (p29~30) Concordances in Prolog (p46~47) Concordances in Perl (p48~50) The Minimum Edit Distance in Perl (p53~54) Searching Edits in Prolog (p54) The FSA Utilities http://odur.let.rug.nl/~vannoord/fsa The FSM Library http://www.research.att.com/sw/tools/fsm Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 11

Textbook Materials [software] Encoding, Entropy, and Annotation Schemes IBM s Library of Unicode Components in Java and C++ http://www.ibm.com/software/globalization/icu p// / /g / Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 12

Textbook Materials [software] Counting Words Tokenizing Texts in Prolog (p89~91) Tokenizing ing Texts in Perl (p91) Counting Unigrams in Prolog (p93) Counting Unigrams and Bigrams with Perl (p93~95) Extracting Collocations with Perl (p108~109) The SRI Language Modeling Collection http://www.speech.sri.com The CMU Cambridge Statistical Language Modeling Toolkit http://svr www.eng.cam.ac.uk/~prc14/toolkit.html Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 13

Textbook Materials [software] Words, Parts of Speech, and Morphology Building a Trie in Prolog (p121~123) Finding a Word in a Trie in Prolog (p123) Finite State Transducers in Prolog (p134~136) PC KIMMO 2 http://www.sil.org General Purpose Finite State Transducer Toolkits http://www igm.univ mlv.fr/~unitex / Xerox XFST http://www.xrce.xerox.com/competencies/content ero com/competencies/content analysis/fst/ Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 14

Textbook Materials [software] Part of Speech Tagging Using Rules Tagging g in Prolog (p151~153) 53) Part of Speech Tagging Using Stochastic Techniques GIZA++ http://www.fjoch.com/giza++.html Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 15

Textbook Materials [software] Phrase Structure Grammars in Prolog Tokenizing Texts Using DCG Rules (p200~202) Partial Parsing (Simplified) ELIZA in Prolog (p215~216) Multiword Detection using DCG Rules (p219~221) Detecting ti Noun Groups using DCG Rules (p223~224) Detecting Verb Groups using DCG Rules (p225~227) Detecting Prepositional Groups using DCG Rules (p232~p234) Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 16

Textbook Materials [software] Syntactic Formalisms Unification of Feature Structures in Prolog (p263) Parsing Techniques Shift Reduce Parsing in Prolog (p279~281) 281) Earley Parsing in Prolog (p288~294) CYK Parsing in Prolog (p298~300) Nivre s Parser in Prolog (p306~309) Constraint t Handling Rules (available) in SWI Prolog Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 17

Textbook Materials [software] Semantics and Predicate Logic Lexical Semantics Discourse Dialogue Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 18

Software Review Students t will make presentations tti in groups. Topics for student presentations will be on: Software modules from the primary textbook Software packages from the Internet Each presentation should include: Basic concepts Usage of software modules or packages Pros and cons of such modules or packages Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 19

Project Students t may form groups to propose and work on projects related to Human Robot Interaction Text Mining You can choose your own programming language. Use Prolog (or Perl) and enhance the functions of the software modules in the textbook. Use other languages (such as Python or Java) and combine modules from the available packages. Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 20

Feedback Plan Evaluation Attendance: 15% Presentation: 30% Programming Assignments: 15% Project: 40% Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 21

Weekly Schedule (1/3) Week Lecture Materials il Software Review Homework Introduction: 1 An Overview of Language Processing Corpus Processing Tools; 2 Encoding, An Overview of Language Processing Read: Intro to Prolog [pdf] Prolog Programming g Assignment 1 3 Counting Words; Corpus Processing Tools; Words, POS, Encoding, 4 POS Tagging ; PSGs in Prolog Counting Words; Words, POS, Prolog Programming Assignment 2 5 Partial Parsing POS Tagging ; PSGs in Prolog 6 Syntactic Formalisms; Parsing Techniques Partial Parsing Project Ideas (1 page) Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 22

Weekly Schedule (2/3) Week Lecture Materials Software Review Homework 7 Syntactic Formalisms; Parsing Techniques Project Proposal 8 Midterm Week [no exam] 9 10 Text Mining Applications: Resources Text Mining Applications: Terminology and NER Text Mining Applications: 11 Extraction and Annotation 12 [Project Interim Demo] [Project Proposal : Presentation] Text Mining Applications: Resources Text Mining Applications: Terminology and NER Text Mining Applications: Extraction and Annotation Project Interim Demo Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 23

Weekly Schedule (3/3) Week Lecture Materials Software Review Homework 13 14 HRI Applications: Speech Synthesis HRI Applications: Robots HRI Applications: Speech Project Final with Emotion Synthesis Demo 15 [Project Final Demo] HRI Applications: Robots with Emotion Final Report 16 Final Exam Week [no exam] Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 24

Reading Assignment An Introduction to Prolog PDF file available at http://www.cs.lth.se/~pierre/ilppp SWI Prolog http://www.swi prolog.org Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 25