Activities. but I will require that groups present research papers



Similar documents
GUIDELINES FOR TRANSLATING THE EUROPASS CERTIFICATE SUPPLEMENT INTRODUCTION GENERAL RECOMMENDATIONS

Yandex.Translate API Developer's guide

Official Journal of the European Union

Translating for a Multilingual European Union: Putting Multilingualism into Context Dr Angeliki PETRITS Language Officer European Commission, UK

Designing Tablet Computer Keyboards for European Languages

GET YOUR START MENU BACK IN MICROSOFT WINDOWS SERVER 2012

Community herbal monograph on Commiphora molmol Engler, gummi-resina

CALL FOR EXPRESSIONS OF INTEREST FOR CONTRACT STAFF

Formatting Custom List Information

Community herbal monograph on Arnica montana L., flos

Community herbal monograph on Cucurbita pepo L., semen

CALL FOR EXPRESSIONS OF INTEREST FOR CONTRACT AGENTS CHILDCARE STAFF. Function Group II EPSO/CAST/S/2/2012 I. INTRODUCTION

Community herbal monograph on Lavandula angustifolia Miller, aetheroleum

INVESTING IN INTANGIBLES: ECONOMIC ASSETS AND INNOVATION DRIVERS FOR GROWTH

Community herbal monograph on Oenothera biennis L.; Oenothera lamarckiana L., oleum

Translation strategy decision support at the European Commission

Europeans and their Languages

Community herbal monograph on Panax ginseng C.A. Meyer, radix

Community herbal monograph on Rosa gallica L., Rosa centifolia L., Rosa damascena Mill., flos

Community herbal monograph on Panax ginseng C.A.Meyer, radix

Policy on enrolment in the Brussels European Schools for the school year

HOW COMPANIES INFLUENCE OUR SOCIETY: CITIZENS VIEW

How do I translate...?

Luxembourg-Luxembourg: FL/SCIENT15 Translation services 2015/S Contract notice. Services

User language preferences online. Analytical report

Languages Supported. SpeechGear s products are being used to remove communications barriers throughout the world.

Il potere tra le tue mani

Community herbal monograph on Arctostaphylos uva-ursi (L.) Spreng., folium

European Union herbal monograph on Carum carvi L., aetheroleum

Luxembourg-Luxembourg: FL/TERM15 Translation services 2015/S Contract notice. Services

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Responsible Research and Innovation (RRI), Science and Technology

USER GUIDE: Trading Central Indicator for the MT4 platform

MM, EFES EN. Marc Mathieu

Information and Communications Technologies (ICTs) in Schools

LSI TRANSLATION PLUG-IN FOR RELATIVITY. within

GÜLTIGE RESERVELISTEN NACH 31. DEZEMBER 2015 Kategorie Listen Enddatum 1 Middle Management EPSO/AD/232/12 Head of Unit (AD 12) with Bulgarian as

TRADING CENTRAL INDICATOR FOR METATRADER USERS GUIDE. Blue Capital Markets Limited All rights reserved.

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

European Economic and Social Committee

LANGUAGE CONNECTIONS YOUR LINGUISTIC GATEWAY

Oracle Taleo Enterprise Mobile for Talent Management Cloud Service Administration Guide

MINIMUM ADMISSION REQUIREMENTS. for Higher Certificate, Diploma and Bachelor s Degree Programmes requiring a National Senior Certificate

Table 1: TSQM Version 1.4 Available Translations

Towards Collaborative Practice - European Conference on Youth Work, Social Innovation, and Enterprise

REACH-IT Industry User Manual

Tel: Fax: P.O. Box: 22392, Dubai - UAE info@communicationdubai.com comm123@emirates.net.ae

SAP BusinessObjects Document Version: 4.1 Support Package Dashboards and Presentation Design Installation Guide

European Union herbal monograph on Salvia officinalis L., folium

EN 106 EN 4. THE MOBILE USE OF THE INTERNET BY INDIVIDUALS AND ENTERPRISES Introduction

D EUOSME: European Open Source Metadata Editor (revised )

Remote Desktop Services Guide

CrossLoop Help. What would you like to do? Joining A Session

Luxembourg-Luxembourg: FL/RAIL16 Translation services 2016/S Contract notice. Services

PRICE LIST. ALPHA TRANSLATION AGENCY

RETAILERS ATTITUDES TOWARDS CROSS- BORDER TRADE AND CONSUMER PROTECTION

A global leader in document translations

RESEARCH ASSISTANCE. The Portal is also accessible to the general public but restricted to the free case law databases.

MT Search Elastic Search for Magento

egovernment Digital Agenda Scoreboard 2014

Trading Central Indicator for MetaTrader4 TRADER / USER SET UP & CONFIGURATION

10TH EDITION MERGER CONTROL VADEMECUM FILING THRESHOLDS AND CLEARANCE CONDITIONS IN THE 29 EUROPEAN JURISDICTIONS

Vacancy notice for establishing a reserve list: Project Manager Reference: 08/EJ/CA/51 Contract Agent FG IV M/F

1. the language situation in the country (the total number of speakers of the individual languages);

Cyclope Internet Filtering Proxy. - User Guide -

We Answer To All Your Localization Needs!

Voice Mail. Service & Operations

How To Set The Minimum Admission Requirements For A National Senior Certificate

SMES, RESOURCE EFFICIENCY AND GREEN MARKETS

CE Marking of Glass products

European cooperation on judicial training for court staff and bailiffs. Regional seminars with national breakout sessions and other examples

Teaching Languages at School

Vacancy notice for the post of: Events Coordinator Reference: 09/EJ/211 Temporary Agent AST 3 M/F

GENERAL SERVICES ADMINISTRATION

Translution Price List GBP

INTERC O MBASE. Global Language Solution

Official name: Department of Contracts National ID: Postal address: Notre Dame Ravelin Town: Floriana Postal code: FRN 1600

External Candidate Online Application

This notice in TED website:

Speaking your language...

1a. Total Leaseurope Leasing Market 2012

Medical Insurance Services and Medical Claim Administration Services for ECB Staff (2011/S )

Towards a safer use of the Internet for children in the EU a parents perspective. Analytical report

CALL FOR AN EXPRESSION OF INTEREST FOR A SECONDED NATIONAL EXPERT WITHIN EUROJUST:

Poland-Warsaw: MyFrontex digital workplace (COTS-based intranet) 2016/S Contract notice. Services

Introductory Guide to the Common European Framework of Reference (CEFR) for English Language Teachers

Leaseurope Biannual Survey 2014 Table of Contents

Safe Harbor Statement

Xerox Easy Translator Service User Guide

Who We Are. Services We Offer

Vacancy notice for establishing a reserve list: Administrative Assistant to Eurojust Reference: 08/EJ/CA/55 Contract Agent FG I M/F

ADECCO BULGARIA MANAGED SERVICES

Internet sites for machine translation available language-pairs ** Part 1 direct translation sites

Transcription:

CS-498 Signals AI Themes Much of AI occurs at the signal level Processing data and making inferences rather than logical reasoning Areas such as vision, speech, NLP, robotics methods bleed into other areas (graphics, animation,...) Linked by the use of statistical tools and ideas from machine learning Much domain knowledge is required to make progress in areas However, they do share tools

Activities Mainly lecture Evaluation Final project by choice but I will require that groups present research papers 4 projects, which will have a competitive component build a part-of-speech tagger build a face finder build a word spotter build a styleik (we ll talk about what this means) Participation

Natural Language - Applications Machine Translation e.g. South Africa s official languages: English, Afrikaans, the Nguni languages, isizulu, isixhosa, isindebele, and Siswati, and the Sotho languages, which include Setswana, Sesotho and Sesotho sa Leboa. The remaining two languages are Tshivenda and Xitsonga. български (Bălgarski) - BG - Bulgarian; Čeština - CS - Czech; Dansk - DA - Danish; Deutsch - DE - German; Eesti - ET - Estonian; Elinika - EL - Greek; English - EN; Español - ES - Spanish; Français - FR - French; Gaeilge - GA - Irish; Italiano - IT - Italian; Latviesu valoda - LV - Latvian; Lietuviu kalba - LT - Lithuanian; Magyar - HU - Hungarian; Malti - MT - Maltese; Nederlands - NL - Dutch; Polski - PL - Polish; Português - PT - Portuguese; Română - RO - Romanian; Slovenčina - SK - Slovak; Slovenščina - SL - Slovene; Suomi - FI - Finnish; Svenska - SV - Swedish

More NLP applications Question answering Text summarisation Information extraction Information retrieval Improved understanding of language, linguistics

Why is NLP hard? Meaning is a complex phenomenon Sentences are often radically ambiguous Time flies like an arrow Fruit flies like a banana

Is this grammatical? John I believe Sally said Bill believed Sue saw. What did Sally whisper that she had secretly read? John wants very much for himself to win. Those are the books you should read before it becomes difficult to talk about. Those are the books you should read before talking about becomes difficult. Who did Jo think said John saw him? That a serious discussion could arise here of this topic was quite unexpected. The boys read Mary s stories about each other.

Is this grammatical? - Answers Y: John I believe Sally said Bill believed Sue saw. N: What did Sally whisper that she had secretly read? Y: John wants very much for himself to win. Y: Those are the books you should read before it becomes difficult to talk about. N: Those are the books you should read before talking about becomes difficult. Y: Who did Jo think said John saw him? Y: That a serious discussion could arise here of this topic was quite unexpected. N: The boys read Mary s stories about each other. Answers due to van Riemsdijk and Williams, 1986, given in Manning and Schutze

Ambiguity in parsing - 1 Note: S= sentence Aux=auxiliary V=verb NP=noun phrase VP=verb phrase Figure from Manning and Schutze

Ambiguity in parsing - 2 cf Our problem is training workers Figure from Manning and Schutze

Ambiguity in parsing - 4 cf Those are training wheels Figure from Manning and Schutze

Ambiguity in parsing -4 a reasonably sophisticated system gives 455 parses for: List the sales of the products produced in 1973 with the products produced in 1972 Difficulty: There doesn t seem to be a single grammar that is right choice of grammar is not innocuous: more complex grammars lead to more ambiguous parses less complex grammars can t deal with some (possibly important) special cases

Counts, frequencies and probabilities Important phenomena Most are very rare Some things are very frequent This is a dominant phenomenon in natural language important in vision, too

What is a word? Word token Word type each actual instance of the word There are two the s in the cat sat on the mat count with multiplicity e.g. 71, 370 in Tom Sawyer the occurs in the cat sat on the mat count without multiplicity e.g. 8, 018 in Tom Sawyer Are these two the same word? ate, eat, eating stock, stocking (perhaps if they re both verbs, but...)

From Manning and Schutze; recall there are 71, 370 word tokens in Tom Sawyer

From Manning and Schutze; recall there are 8, 018 word types

Zipf s law rank word types by frequency, highest first each word type then has: a frequency, f a rank, r Zipf s law: f r = constant

Figure from Manning and Schutze

Zipf s law Qualitatively, assuming there are many words few very common words moderate number of medium frequency words very many low frequency words Implication: we will spend a lot of effort modelling phenomena we hardly ever observe

Figure from Manning and Schutze

Collocations: an example Collocations are: turn of phrase or accepted usage where whole is perceived to have an existence beyond sum of parts compounds - disk drive phrasal verbs - make up stock phrases - bacon and eggs; steak and kidney; egg and bacon; etc.

Finding possible collocations Strategy 1: find pairs of words with high frequency Figure from Manning and Schutze

Finding possible collocations Strategy 2: find high frequency pairs of words and then filter them, rejecting any pairs(triples) that do not correspond to part of speech patterns. Figure from Manning and Schutze

Finding possible collocations - 3 Figure from Manning and Schutze

Probability and models I assume very basic knowledge of probability and conditional probability. Build and investigate procedures to predict words given words e.g. english given french evaluate interpretations of words

Modelling strings of letters Alphabet: 27 tokens (each letter, space; no cases) Simplest models: M1: tokens are independent, identically distributed, have uniform probability M2: tokens are independent, identically distributed, have different probs. Which is better? and why? compare P(M1 S) with P(M2 S) using Bayes rule

Conditional probability models M1 and M2 give quite poor results, M2 much better than M1 Now consider conditional models we condition a letter on some previous letters 1, 2,... sometimes known as Markov models these are significantly better in practice we need tools to understand how much better; coming divergence: application of markov models in computer vision

Macroscopic model Text Microscopic model

Texture mapping Getting enough texture observations buy it tile it synthesize it Putting the texture in the right place applying texture to surfaces - artists, parametrization, etc. rendering - ray trace, interpolation

Texture synthesis Use image as a source of probability model Choose pixel values by matching neighbourhood, then filling in Matching process look at pixel differences count only synthesized pixels

From Image analogies, Herzmann et al, SIGGRAPH 2001