Statistical Natural Language Processing: an introduction

1 Statistical Natural Language Processing: an introduction + contents of the 2016 course

2 Modeling language
Language is a complex, adaptive system
Storing and processing text and speech
Large datasets
We want to make systems that "understand"
Take into account language-related phenomena
Building models of natural language using large data sets

3 Statistical Natural Language Processing
Methodological basis: machine learning, pattern recognition, probability theory, statistics and signal processing
Related fields: computational linguistics, corpus linguistics, phonetics, speech processing, discourse analysis, cognitive science, artificial intelligence

4 What's in a language?
Phonetics and phonology: the physical sounds and the patterns of sounds
Morphology: the different building blocks of words
Syntax: the grammatical structure
Semantics: the meaning of words
Pragmatics, discourse, spoken interaction...

5 Application areas
Information retrieval
Text clustering and classification
Automatic speech recognition
Statistical machine translation
Natural language interfaces
Word sense disambiguation
Syntactic parsing...

6 Information retrieval

7 PageRank algorithm

8 Text clustering

9 Speech recognition

10 Natural language interfaces

11 Machine translation

12 Machine translation: large probabilistic models

13 Discussion
Discuss for 10 minutes in groups of three or four:
What kind of Natural Language Processing applications would be useful in your daily life?
Are there applications you already use?
How do they work? What does not work?

14 Complexity of languages
A large proportion of modern human activity in its different forms is based on the use of language
Large variation: morphology and syntactic structures
Complexity of natural language(s)
More than 6000 languages, many more dialects
Each language has a large number of different word forms
Each word is understood differently by each speaker of a language, at least to some degree

15 Languages on the internet

16 EU languages

17 Challenges of segmentation
Modeling morphology: segmenting words
istua "to sit"
istuutua "to sit down"
istun "I sit"
istahdan "I sit down for a while"
istahtaisin "I would sit down for a while"
istahtaisinko? "should I sit down for a while?"
istahtaisinkohan? "I wonder if I should sit down for a while?"
Where are the word boundaries?
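The segmentation question above can be made concrete with a small sketch. It assumes a hand-picked, hypothetical morph inventory (`MORPHS` is not from the course material and is not a vetted linguistic analysis) and enumerates every way of splitting a word form into morphs from that inventory.

```python
def segmentations(word, morphs):
    """Return every way to split `word` into a sequence of known morphs."""
    if not word:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in morphs:
            # Keep this morph and segment the remainder recursively.
            for rest in segmentations(word[end:], morphs):
                results.append([prefix] + rest)
    return results


# Hypothetical morph inventory, chosen by hand for illustration only.
MORPHS = {"istahta", "isi", "n", "ko", "han"}

for form in ["istahtaisin", "istahtaisinko", "istahtaisinkohan"]:
    print(form, "->", segmentations(form, MORPHS))
```

With a larger inventory the same word usually has several competing segmentations, which is where a statistical model is needed to rank them.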

18 Challenge of modeling syntax

19 Challenge: semantics
How to model the meaning of words?
Semantic similarity: vector space models
Understanding the meaning of words?
Subjectivity: people learn language through individual life paths and thus end up having different ways of understanding and producing language.
-> How is successful communication possible?
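As a rough illustration of semantic similarity in a vector space model, the sketch below compares words by the cosine of the angle between their context-count vectors. The counts in `VECTORS` and the three context dimensions are invented for illustration; a real model would estimate them from a corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical co-occurrence counts with the context words
# ("drink", "pet", "engine"); not real corpus data.
VECTORS = {
    "coffee": [8, 0, 1],
    "tea": [7, 1, 0],
    "dog": [1, 9, 0],
}

print(cosine_similarity(VECTORS["coffee"], VECTORS["tea"]))
print(cosine_similarity(VECTORS["coffee"], VECTORS["dog"]))
```

Under these made-up counts, "coffee" comes out closer to "tea" than to "dog", because the first two share most of their contexts.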

20 Challenge of ambiguity
break, cut, run, play, make, light, set, hold, clear, give, draw, take, fall, pass, head, etc.
"haku" N ELA PL (of search)
"hauki" N ELA PL (of pike)
"hauis" N PTV SG (part of biceps)
Big children and adults saw a man with a telescope

21 Example: color naming

22 Complex concepts: e.g. concept of computation

23 Different cultural contexts

24 Challenge of encoding world knowledge
For good performance, world knowledge is needed
Quantitatively this is challenging
Qualitatively there are also many problems (the mapping between language and the world is complex, cf. examples above)
Note: the world is essentially dynamic, continuous and multimodal; symbolic systems are not

25 Corpus-based methods
Corpora are large collections of text
Annotated: add knowledge about words or structure into the corpus
Or just plain text
Statistical information on: distribution of words and parts of words, structure, word similarity
Allow us to build models and test hypotheses
Allow us to explore
Choose the best models based on statistics
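The simplest statistic of this kind is the distribution of words in a plain-text corpus. The following is a minimal sketch, assuming whitespace tokenization and a one-sentence toy corpus (`CORPUS` is made up for illustration; real corpora are far larger and need proper tokenization).

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each distinct token to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy corpus, invented for illustration only.
CORPUS = "the cat sat on the mat and the dog sat on the rug"
tokens = CORPUS.split()  # naive whitespace tokenization
freqs = relative_frequencies(tokens)

for word, count in Counter(tokens).most_common(3):
    print(word, count, round(freqs[word], 3))
```

The same counts are the starting point for the models mentioned above, for example estimating word probabilities or comparing candidate models on held-out text.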

26 Read more
Manning & Schütze: Foundations of Statistical Natural Language Processing
Chapter 1:
Chapter 2: Probability and Information Theory basics
Exercises will be on these topics this week

27 T Statistical natural language processing
5 cr, graded 1-5 based on exam + project work
10 lectures: Wed 12:15-14:00 in T3, Jan 20 Mar
exercise sessions: Thu 14:15-16:00 in T5
Text book: C. Manning, H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press.
Home page:

28 Course personnel
Responsible professor & lecturer:
Assistant & exercises: Stig-Arne Grönroos
Project work: Krista Lagus
Expert lecturers: Oskar Kohonen, Kalle Palomäki, Mari-Sanna Paukkeri, Matti Varjokallio, Teemu Ruokolainen, Sami Virpioja, Juho Rousu

29 Goals
To learn how statistical and adaptive methods are used in information retrieval, machine translation, text mining, speech processing and related areas to process natural language data
To learn how to apply the basic methods and techniques for clustering, classification, hidden Markov models and Bayesian models to model natural language

30 Lectures in the course
20 Jan: /
27 Jan: Sentence-level processing / Oskar Kohonen
03 Feb: Speech recognition / Kalle Palomäki
10 Feb: Term project and sentiment analysis / Krista Lagus
24 Feb: Vector spaces & information retrieval / Mari-Sanna Paukkeri
02 Mar: Statistical language models /
09 Mar: Morpheme-level processing / Matti Varjokallio
16 Mar: Tagging / Teemu Ruokolainen
23 Mar: Statistical machine translation / Sami Virpioja
30 Mar: Text classification using kernel methods / Juho Rousu

31 How to pass the course?
Participate actively in each lecture, read the corresponding material and ask questions to learn the basics
Participate actively in the exercise session after each lecture to learn how to solve the problems in practice
Participate actively in the project work to learn to apply your knowledge
Participate in the examination to show how well you have learned the topics of the course

32 Questions?
Responsible professor & lecturer:
Assistant & exercises: Stig-Arne Grönroos
Project work: Krista Lagus
s:
Home page:

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg

Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

Study Plan for Master of Arts in Applied Linguistics

Study Plan for Master of Arts in Applied Linguistics Study Plan for Master of Arts in Applied Linguistics Master of Arts in Applied Linguistics is awarded by the Faculty of Graduate Studies at Jordan University of Science and Technology (JUST) upon the fulfillment

More information

Master of Arts in Linguistics Syllabus

Master of Arts in Linguistics Syllabus Master of Arts in Linguistics Syllabus Applicants shall hold a Bachelor s degree with Honours of this University or another qualification of equivalent standard from this University or from another university

More information

209 THE STRUCTURE AND USE OF ENGLISH.

209 THE STRUCTURE AND USE OF ENGLISH. 209 THE STRUCTURE AND USE OF ENGLISH. (3) A general survey of the history, structure, and use of the English language. Topics investigated include: the history of the English language; elements of the

More information

Language Technology based on Big Data: Current Situation and Future Perspectives

Language Technology based on Big Data: Current Situation and Future Perspectives Language Technology based on Big Data: Current Situation and Future Perspectives Timo Honkela 30 October 2014 Centre for Preservation and Digitisation Department of Modern Languages KITES-symposium Introductory

More information

An Overview of Applied Linguistics

An Overview of Applied Linguistics An Overview of Applied Linguistics Edited by: Norbert Schmitt Abeer Alharbi What is Linguistics? It is a scientific study of a language It s goal is To describe the varieties of languages and explain the

More information

CS 6740 / INFO 6300. Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage

CS 6740 / INFO 6300. Ad-hoc IR. Graduate-level introduction to technologies for the computational treatment of information in humanlanguage CS 6740 / INFO 6300 Advanced d Language Technologies Graduate-level introduction to technologies for the computational treatment of information in humanlanguage form, covering natural-language processing

More information

Master of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University

Master of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University Master of Arts Program in Linguistics for Communication Department of Linguistics Faculty of Liberal Arts Thammasat University 1. Academic Program Master of Arts Program in Linguistics for Communication

More information

Why major in linguistics (and what does a linguist do)?

Why major in linguistics (and what does a linguist do)? Why major in linguistics (and what does a linguist do)? Written by Monica Macaulay and Kristen Syrett What is linguistics? If you are considering a linguistics major, you probably already know at least

More information

How To Complete The Danish Masters Program In Lct

How To Complete The Danish Masters Program In Lct European Masters Program in Language and Communication Technologies (LCT) Modules Handbook for Prospective Students European Masters Program in LCT - Modules Handbook Page ii Chapter 1 Study Program The

More information

1. Introduction 1.1 Contact

1. Introduction 1.1 Contact English Discourse Analysis: An Introduction Rachel Whittaker (Grp 41) Mick O Donnell, Laura Hidalgo (Grp 46) 1.1 Contact Group 46: Mick O Donnell (7 Feb 23 March) Modulo VI bis 311 michael.odonnell@uam.es

More information

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014

Introduction. BM1 Advanced Natural Language Processing. Alexander Koller. 17 October 2014 Introduction! BM1 Advanced Natural Language Processing Alexander Koller! 17 October 2014 Outline What is computational linguistics? Topics of this course Organizational issues Siri Text prediction Facebook

More information

Reading Competencies

Reading Competencies Reading Competencies The Third Grade Reading Guarantee legislation within Senate Bill 21 requires reading competencies to be adopted by the State Board no later than January 31, 2014. Reading competencies

More information

Robust Methods for Automatic Transcription and Alignment of Speech Signals

Robust Methods for Automatic Transcription and Alignment of Speech Signals Robust Methods for Automatic Transcription and Alignment of Speech Signals Leif Grönqvist (lgr@msi.vxu.se) Course in Speech Recognition January 2. 2004 Contents Contents 1 1 Introduction 2 2 Background

More information

Text Mining - Scope and Applications

Text Mining - Scope and Applications Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss

More information

Language and Computation

Language and Computation Language and Computation week 13, Thursday, April 24 Tamás Biró Yale University tamas.biro@yale.edu http://www.birot.hu/courses/2014-lc/ Tamás Biró, Yale U., Language and Computation p. 1 Practical matters

More information

CS 6220: Data Mining Techniques Course Project Description

CS 6220: Data Mining Techniques Course Project Description CS 6220: Data Mining Techniques Course Project Description College of Computer and Information Science Northeastern University Spring 2013 General Goal In this project, you will have an opportunity to

More information

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic

Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic Testing Data-Driven Learning Algorithms for PoS Tagging of Icelandic by Sigrún Helgadóttir Abstract This paper gives the results of an experiment concerned with training three different taggers on tagged

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

Scandinavian Dialect Syntax Transnational collaboration, data collection, and resource development

Scandinavian Dialect Syntax Transnational collaboration, data collection, and resource development Scandinavian Dialect Syntax Transnational collaboration, data collection, and resource development Janne Bondi Johannessen, Signe Laake, Kristin Hagen, Øystein Alexander Vangsnes, Tor Anders Åfarli, Arne

More information

Semantic analysis of text and speech

Semantic analysis of text and speech Semantic analysis of text and speech SGN-9206 Signal processing graduate seminar II, Fall 2007 Anssi Klapuri Institute of Signal Processing, Tampere University of Technology, Finland Outline What is semantic

More information

2008-09 Online Catalogue

2008-09 Online Catalogue 3/11/2009 Academic Offerings : Catalogue 2008 2008-09 Online Catalogue Academic Offerings 2 > Linguistics (Minor) 3 Linguistics (Undergraduate Minor) Specific Requirements College or School: Department

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

European Masters Program in Language and Communication Technologies (LCT) Module Handbook for Prospective Students

European Masters Program in Language and Communication Technologies (LCT) Module Handbook for Prospective Students European Masters Program in Language and Communication Technologies (LCT) Module Handbook for Prospective Students October, 2012 European Masters Program in LCT Module Handbook Page 1 Contents 1 What is

More information

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015

Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015 Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015 Data Scientist analyze Data Library use 2 About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD

More information

St. Petersburg College. RED 4335/Reading in the Content Area. Florida Reading Endorsement Competencies 1 & 2. Reading Alignment Matrix

St. Petersburg College. RED 4335/Reading in the Content Area. Florida Reading Endorsement Competencies 1 & 2. Reading Alignment Matrix Course Credit In-service points St. Petersburg College RED 4335/Reading in the Content Area Florida Reading Endorsement Competencies 1 & 2 Reading Alignment Matrix Text Rule 6A 4.0292 Specialization Requirements

More information

The Seven Practice Areas of Text Analytics

The Seven Practice Areas of Text Analytics Excerpt from: Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and R. Nisbet, Elsevier, January 2012 Available now:

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

University of Massachusetts Boston Applied Linguistics Graduate Program. APLING 601 Introduction to Linguistics. Syllabus

University of Massachusetts Boston Applied Linguistics Graduate Program. APLING 601 Introduction to Linguistics. Syllabus University of Massachusetts Boston Applied Linguistics Graduate Program APLING 601 Introduction to Linguistics Syllabus Course Description: This course examines the nature and origin of language, the history

More information

Bachelor in Deaf Studies

Bachelor in Deaf Studies Bachelor in Deaf Studies COURSE CODE: PLACES 2008: POINTS 2007: AWARD: 20 n/a Degree ENTRY REQUIREMENTS: Matriculation requirements apply. Students must hold the Leaving Certificate or equivalent, with

More information

Skills for Effective Business Communication: Efficiency, Collaboration, and Success

Skills for Effective Business Communication: Efficiency, Collaboration, and Success Skills for Effective Business Communication: Efficiency, Collaboration, and Success Michael Shorenstein Center for Communication Kennedy School of Government Harvard University September 30, 2014 I: Introduction

More information

Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013.

Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013. Psychology G4470. Psychology and Neuropsychology of Language. Spring 2013. I. Course description, as it will appear in the bulletins. II. A full description of the content of the course III. Rationale

More information

THE BACHELOR S DEGREE IN SPANISH

THE BACHELOR S DEGREE IN SPANISH Academic regulations for THE BACHELOR S DEGREE IN SPANISH THE FACULTY OF HUMANITIES THE UNIVERSITY OF AARHUS 2007 1 Framework conditions Heading Title Prepared by Effective date Prescribed points Text

More information

A. Schedule: Reading, problem set #2, midterm. B. Problem set #1: Aim to have this for you by Thursday (but it could be Tuesday)

A. Schedule: Reading, problem set #2, midterm. B. Problem set #1: Aim to have this for you by Thursday (but it could be Tuesday) Lecture 5: Fallacies of Clarity Vagueness and Ambiguity Philosophy 130 September 23, 25 & 30, 2014 O Rourke I. Administrative A. Schedule: Reading, problem set #2, midterm B. Problem set #1: Aim to have

More information

ÄSSA12, No English Translation Available, 30 credits Svenska som andraspråk 1 A, gy, 30 högskolepoäng First Cycle / Grundnivå

ÄSSA12, No English Translation Available, 30 credits Svenska som andraspråk 1 A, gy, 30 högskolepoäng First Cycle / Grundnivå Faculties of Humanities and Theology ÄSSA12, No English Translation Available, 30 credits Svenska som andraspråk 1 A, gy, 30 högskolepoäng First Cycle / Grundnivå Details of approval The syllabus was approved

More information

UNIVERSITY OF PUERTO RICO RIO PIEDRAS CAMPUS COLLEGE OF HUMANITIES DEPARTMENT OF ENGLISH

UNIVERSITY OF PUERTO RICO RIO PIEDRAS CAMPUS COLLEGE OF HUMANITIES DEPARTMENT OF ENGLISH UNIVERSITY OF PUERTO RICO RIO PIEDRAS CAMPUS COLLEGE OF HUMANITIES DEPARTMENT OF ENGLISH Instructor: Dr. Alicia Pousada Course Title: Study of language Course Number: INGL 4205 Number of Credit Hours:

More information

University of Khartoum. Faculty of Arts. Department of English. MA in Teaching English to Speakers of Other Languages (TESOL) by Courses

University of Khartoum. Faculty of Arts. Department of English. MA in Teaching English to Speakers of Other Languages (TESOL) by Courses University of Khartoum Faculty of Arts Department of English MA in Teaching English to Speakers of Other Languages (TESOL) by Courses 3 Table of Contents Contents Introduction... 5 Rationale... 5 Objectives...

More information

Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

More information

English Grammar Checker

English Grammar Checker International l Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 English Grammar Checker Pratik Ghosalkar 1*, Sarvesh Malagi 2, Vatsal Nagda 3,

More information

School of Computer Science

School of Computer Science School of Computer Science Head of School Professor S Linton Taught Programmes M.Sc. Advanced Computer Science Artificial Intelligence Computing and Information Technology Information Technology Human

More information

Annotation in Language Documentation

Annotation in Language Documentation Annotation in Language Documentation Univ. Hamburg Workshop Annotation SEBASTIAN DRUDE 2015-10-29 Topics 1. Language Documentation 2. Data and Annotation (theory) 3. Types and interdependencies of Annotations

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

The course is included in the CPD programme for teachers II.

The course is included in the CPD programme for teachers II. Faculties of Humanities and Theology LLYU72, Swedish as a Second Language for Upper Secondary School Teachers, 60 credits Svenska som andraspråk för lärare i gymnasieskolan, 60 högskolepoäng First Cycle

More information

MASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS

MASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS University of Cambridge: Programme Specifications Every effort has been made to ensure the accuracy of the information in this programme specification. Programme specifications are produced and then reviewed

More information

Linguistics 2288B Introductory General Linguistics 2011-12

Linguistics 2288B Introductory General Linguistics 2011-12 Linguistics 2288B Introductory General Linguistics 2011-12 Class: Instructor: Monday 12:30 p.m 2:30 p.m., Wednesday 12:30 p.m. 1:30 p.m., TH 3154 Ileana Paul UC 136b 519-661-2111 x 85360 ileana@uwo.ca

More information

Teaching Formal Methods for Computational Linguistics at Uppsala University

Teaching Formal Methods for Computational Linguistics at Uppsala University Teaching Formal Methods for Computational Linguistics at Uppsala University Roussanka Loukanova Computational Linguistics Dept. of Linguistics and Philology, Uppsala University P.O. Box 635, 751 26 Uppsala,

More information

Appendices master s degree programme Artificial Intelligence 2014-2015

Appendices master s degree programme Artificial Intelligence 2014-2015 Appendices master s degree programme Artificial Intelligence 2014-2015 Appendix I Teaching outcomes of the degree programme (art. 1.3) 1. The master demonstrates knowledge, understanding and the ability

More information

How To Teach Reading

How To Teach Reading Florida Reading Endorsement Alignment Matrix Competency 1 The * designates which of the reading endorsement competencies are specific to the competencies for English to Speakers of Languages (ESOL). The

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

Master of Science in Artificial Intelligence

Master of Science in Artificial Intelligence Master of Science in Artificial Intelligence Options: Engineering and Computer Science (ECS) Speech and Language Technology (SLT) Big Data Analytics (BDA) Faculty of Engineering Science Faculty of Science

More information

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina Graduate Co-op Students Information Manual Department of Computer Science Faculty of Science University of Regina 2014 1 Table of Contents 1. Department Description..3 2. Program Requirements and Procedures

More information

Language Modeling. Chapter 1. 1.1 Introduction

Language Modeling. Chapter 1. 1.1 Introduction Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

More information

A Proposal for the use of Artificial Intelligence in Spend-Analytics

A Proposal for the use of Artificial Intelligence in Spend-Analytics A Proposal for the use of Artificial Intelligence in Spend-Analytics Mark Bishop, Sebastian Danicic, John Howroyd and Andrew Martin Our core team Mark Bishop PhD studied Cybernetics and Computer Science

More information

Statistical Machine Translation

Statistical Machine Translation Statistical Machine Translation Some of the content of this lecture is taken from previous lectures and presentations given by Philipp Koehn and Andy Way. Dr. Jennifer Foster National Centre for Language

More information

Contemporary Linguistics

Contemporary Linguistics Contemporary Linguistics An Introduction Editedby WILLIAM O'GRADY MICHAEL DOBROVOLSKY FRANCIS KATAMBA LONGMAN London and New York Table of contents Dedication Epigraph Series list Acknowledgements Preface

More information

Processing: current projects and research at the IXA Group

Processing: current projects and research at the IXA Group Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive

More information

STANDARDS FOR ENGLISH-AS-A-SECOND LANGUAGE TEACHERS

STANDARDS FOR ENGLISH-AS-A-SECOND LANGUAGE TEACHERS STANDARDS FOR ENGLISH-AS-A-SECOND LANGUAGE TEACHERS Introduction The English as a Second Language standards describe the knowledge and skills that beginning teachers must have to meet expectations for

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Introduction Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 13 Introduction Goal of machine learning: Automatically learn how to

More information

Graduate School of Informatics

Graduate School of Informatics Graduate School of Informatics Admissions Policy '( ) ' ' - Master's Degree Program Major Enrollment Capacity 40 40 Doctor's Degree Program Major Enrollment Capacity 8 1 M. Entrance examination for international

More information

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast Hassan Sawaf Science Applications International Corporation (SAIC) 7990

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

More information

Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni

Syntactic Theory. Background and Transformational Grammar. Dr. Dan Flickinger & PD Dr. Valia Kordoni Syntactic Theory Background and Transformational Grammar Dr. Dan Flickinger & PD Dr. Valia Kordoni Department of Computational Linguistics Saarland University October 28, 2011 Early work on grammar There

More information

Data at the SFB "Mehrsprachigkeit"

Data at the SFB Mehrsprachigkeit 1 Workshop on multilingual data, 08 July 2003 MULTILINGUAL DATABASE: Obstacles and Opportunities Thomas Schmidt, Project Zb Data at the SFB "Mehrsprachigkeit" K1: Japanese and German expert discourse in

More information

DATA MINING FOR BUSINESS INTELLIGENCE. Data Mining For Business Intelligence: MIS 382N.9/MKT 382 Professor Maytal Saar-Tsechansky

DATA MINING FOR BUSINESS INTELLIGENCE. Data Mining For Business Intelligence: MIS 382N.9/MKT 382 Professor Maytal Saar-Tsechansky DATA MINING FOR BUSINESS INTELLIGENCE PROFESSOR MAYTAL SAAR-TSECHANSKY Data Mining For Business Intelligence: MIS 382N.9/MKT 382 Professor Maytal Saar-Tsechansky This course provides a comprehensive introduction

More information

Hybrid Strategies. for better products and shorter time-to-market
