Language and Computation



Similar documents
Machine Translation. Agenda

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Statistical Machine Translation: IBM Models 1 and 2

ISA OR NOT ISA: THE INTERLINGUAL DILEMMA FOR MACHINE TRANSLATION

Survey Results: Requirements and Use Cases for Linguistic Linked Data

Tagging with Hidden Markov Models

Study Plan for Master of Arts in Applied Linguistics

Master of Arts in Linguistics Syllabus

The SYSTRAN Linguistics Platform: A Software Solution to Manage Multilingual Corporate Knowledge

Big Data and Scripting. (lecture, computer science, bachelor/master/phd)

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

CINTIL-PropBank. CINTIL-PropBank Sub-corpus id Sentences Tokens Domain Sentences for regression atsts 779 5,654 Test

Introduction. Philipp Koehn. 28 January 2016

Fall 2012 Q530. Programming for Cognitive Science

Career info session Nov. 17th, / 24

Computer Assisted Language Learning (CALL): Room for CompLing? Scott, Stella, Stacia

Clustering Connectionist and Statistical Language Processing

CS 40 Computing for the Web

An Online Service for SUbtitling by MAchine Translation

Overview of MT techniques. Malek Boualem (FT)

MEDAR Mediterranean Arabic Language and Speech Technology An intermediate report on the MEDAR Survey of actors, projects, products

YALE UNIVERSITY Department of Psychology New Haven, Connecticut 06511

Hybrid Strategies. for better products and shorter time-to-market

Course: Model, Learning, and Inference: Lecture 5

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak

Collecting Polish German Parallel Corpora in the Internet

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Big Data and Scripting

Minor in Environmental Studies LINGUISTICS GENERAL ELECTIVES COOPERATIVE EDUCATION RESIDENCY REQUIREMENT UNIVERSITY-WIDE REQUIREMENTS REQUIRED COURSE

Computer Aided Translation

Appendices master s degree programme Human Machine Communication

Dutch Parallel Corpus

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

MASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS

Graduate School of Informatics

D2.4: Two trained semantic decoders for the Appointment Scheduling task

Machine Learning for natural language processing

General Psychology, PSY 101

HLT in Hungary

M LTO Multilingual On-Line Translation

Introduction. Compiler Design CSE 504. Overview. Programming problems are easier to solve in high-level languages

Speech and Language Processing

Appendices master s degree programme Artificial Intelligence

USABILITY OF A FILIPINO LANGUAGE TOOLS WEBSITE

Building a Question Classifier for a TREC-Style Question Answering System

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Convergence of Translation Memory and Statistical Machine Translation

Parsing Software Requirements with an Ontology-based Semantic Role Labeler

B. Tech. Project Report

Learning Translation Rules from Bilingual English Filipino Corpus

Brill s rule-based PoS tagger

Why major in linguistics (and what does a linguist do)?

CS 51 Intro to CS. Art Lee. September 2, 2014

Statistical Machine Translation

Effective Self-Training for Parsing

Natural Language to Relational Query by Using Parsing Compiler

SWIFT Aligner, A Multifunctional Tool for Parallel Corpora: Visualization, Word Alignment, and (Morpho)-Syntactic Cross-Language Transfer

Customizing an English-Korean Machine Translation System for Patent Translation *

Special Topics in Computer Science

Parsing Technology and its role in Legacy Modernization. A Metaware White Paper

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

George Mason University Graduate School of Education Program: Special Education

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

Machine Translation Computer Aided Translation Machine Language Processing

MA APPLIED LINGUISTICS AND TESOL

CSC384 Intro to Artificial Intelligence

Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features

Question template for interviews

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

The Halting Problem is Undecidable

Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013.

THE CHINESE UNIVERSITY OF HONG KONG DEPARTMENT OF TRANSLATION COURSE OUTLINE

Psychological Tests and Measurements PSYC Summer 2016

Chapter 5. Phrase-based models. Statistical Machine Translation

Master of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University

CS 6740 / INFO Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Terminology Extraction from Log Files

31 Case Studies: Java Natural Language Tools Available on the Web

Processing: current projects and research at the IXA Group

An Overview of Applied Linguistics

Objective 2.4.: Students will demonstrate knowledge of, and the ability to practice, behavioral consultation with teachers and parents.

ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS

Multi-Engine Machine Translation by Recursive Sentence Decomposition

INTRODUCTION TO PSYCHOLOGY

Chapter 8. Final Results on Dutch Senseval-2 Test Data

Transcription:

Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1

Practical matters Superficial reading: JM 25 http://birot.hu/courses/2014-lc/readings.txt Assignment 4: returned, remarks posted Assignment 5: posted Python: if needed, programming section Section/review sections next week: let Jen know! Final exam: Fr 05/02, 9 am; Mo 05/05, 7 pm. Room t.b.a. In-class, no laptop, open-book extra copies needed? Tamás Biró, Yale U., Language and Computation p. 2

Today Machine Translation: traditional approaches Machine Translation: statistical approaches Summary: humans and computers Tamás Biró, Yale U., Language and Computation p. 3

Classical Machine Translation Tamás Biró, Yale U., Language and Computation p. 4

Classical Machine Translation The Vauquois Triangle (1968): Tamás Biró, Yale U., Language and Computation p. 5

Classical Machine Translation Separate MT system for each language pair, or deploying contrastive knowledge n languages there are n(n 1) pairs (Cf. translation in the European Union: 24 official languages.) MT via universal interlingua via truly universal semantic representation via English, Esperanto, etc. Tamás Biró, Yale U., Language and Computation p. 6

Classical Machine Translation Principled ways vs. ad hoc solutions: Part-of-speech taggers, parsers, disambiguation, ontologies, etc. Hand-crafted rules Balancing three, contradicting goals: High quality No need for human intervention (cf. feedback loops in conversational agents) Breadth of domain portability to new domains, domain adaptation scaling up to larger domains Tamás Biró, Yale U., Language and Computation p. 7

Statistical Machine Translation Tamás Biró, Yale U., Language and Computation p. 8

Bayesian MT (Brown et al. 1993) Translating French sentence f to English e: Given f, which e maximizes Prob(e f)? Bayes theorem: Prob(e f) = Prob(e) Prob(f e) Prob(f) Hence, find e that maximizes Prob(e) Prob(f e). Tamás Biró, Yale U., Language and Computation p. 9

Bayesian MT Tamás Biró, Yale U., Language and Computation p. 10

Bayesian MT A classical search problem: find e such that e = arg max e Prob(e ) Prob(f e ). Problem decomposed. Parameters estimated from corpora: Language model Prob(e): quality of English translation Estimated piecemeal from corpus of English. No need to care for correspondence with French. Translation model Prob(f e): E-F correspondences Estimated piecemeal from aligned bilingual corpora. No need to care for quality of the generated English text. Decoder: similar idea to spelling, speech, etc. Tamás Biró, Yale U., Language and Computation p. 11

Summary: humans and computers Tamás Biró, Yale U., Language and Computation p. 12

Goals of doing Language and Computation Linguistics: computational linguistics Computer science: Natural Language Processing Electrical engineering: Speech Recognition Psychology, cognitive science: Computational Psycholinguistics Tamás Biró, Yale U., Language and Computation p. 13

Goals of doing Language and Computation Supporting (theoretical) linguistics: novel approaches to language analysis. Computational issues raised by (theoretical) linguistics and cognitive science: E.g., parsability, learnability, mental implementation of theoretical constructs. Language related tasks when answering the needs of the computer users: processing natural language data. Tamás Biró, Yale U., Language and Computation p. 14

Humans and computers Human computer interaction Should computers imitate humans? Performance? (e.g., prone to errors) Mechanism? E.g., parsing algorithms vs. human parsing. E.g., the Chomsky hierarchy and natural languages. Tamás Biró, Yale U., Language and Computation p. 15

Thank you for your attention during the entire semester! Tamás Biró, Yale U., Language and Computation p. 16