A MIXTURE MODEL WITH SHARING

Size: px
Start display at page:

Download "A MIXTURE MODEL WITH SHARING"

Transcription

1 A MIXTURE MODEL WITH SHARING FOR LEXICAL SEMANTICS Joseph Reisinger and Raymond Mooney The University of Texas at Austin 1

2 Outline Distributional lexical semantics Inadequacies of standard centroid representation Richer representations of homonymy using mixture models A tiered clustering model for polysemy: = + word senses background sense-specific deltas 2

3 Distributional Lexical Semantics Represent meaning as a point/vector in a highdimensional space Word relatedness correlates with some distance metric Attributional / relational / resource-based, Almuhareb and Poesio (2004), Baroni and Lenci (2009), Bullinaria and Levy (2007), Erk (2007), Griffiths et al. (2007), Landauer and Dumais (1997), Moldovan (2006), Padó and Lapata (2007), Pantel and Pennacchiotti (2006), Sahlgren (2006), Turney and Pantel (2010) 3

4 (Some of) Distributional Lexical Semantics word occurrences / context vectors The history of Oz prior to The Wonderful Wizard of Oz (often called the Rowling describes the beloved wizard Dumbledore as Machiavellian and says Merlin is a legendary figure best known as the wizard featured in the Arthurian legend A wizard comedian is known to have survived eating this plant on a bet, though he is still purple True Image is known for its simple, wizard driven interface, and received positive Thunder did a cover of "Pinball Wizard" to be featured on "Hollywood Rocks" compute centroid evil, powerful, magic, wizard, Merlin, spells, Harry 4

5 One Word One Prototype chicken (food) elephant Ω = pasta chicken (animal) wizard Find the centroid of the individual word occurrence context vectors Conflates senses / doesn t reliably account for thematic variability in usage 5 e.g., Schütze (1997)

6 One Word One Prototype chicken (food) elephant Ω = pasta chicken (animal) wizard Find the centroid of the individual word occurrence context vectors Conflates senses / doesn t reliably account for thematic variability in usage 5 e.g., Schütze (1997)

7 Multi-Prototype chicken (food) elephant Ω = pasta chicken (animal) wizard Use a mixture model to cluster occurrences of each target word separately. Doesn t find lexicographic senses; captures contextual variance directly. 6

8 WS-353 Correlation with Human Relatedness Judgements Spearman s ρ football - soccer 9.03 doctor - personnel 5.00 delay - news 3.31 Mars - water All High-Polysemy Single Prototype Multi Prototype 353 words, biased towards pairs with high similarity High polysemy subset generated by counting WN senses 7 Dataset: Finkelstein et. al 2001

9 Shared Structure Matters Spearman s ρ All High-Polysemy brother hit 4.7 hit brother 4.1 quake hit 3.4 hit quake 1.2 Single Prototype Multi Prototype Selectional Preference: Predict typical arguments for verbs (e.g. things that can eat or things that can be shot) Background cluster captures commonalities between argument fillers. 8 Dataset: Padó et. al 2007

10 Tiered Clustering There are cases where is a high degree of overlap between senses e.g. polysemous words like line or raise or i.e. in selectional preference where argument fillers may have some common structure MP models homonymy but not polysemy- it cannot account for shared structure Tiered Clustering introduces a background component to account for this = + word senses background sense-specific deltas

11 Tiered Clustering 10

12 Tiered Clustering φ 1 Merlin King Arthur Arthurian 10

13 Tiered Clustering φ 1 φ 2 Merlin King Arthur Arthurian fairy wicked scene tale 10

14 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore 10

15 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard 10

16 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Rowling describes the beloved wizard Dumbledore as Machiavellian and says 10

17 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Rowling describes the beloved wizard Dumbledore as Machiavellian and says 10

18 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11

19 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11

20 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11

21 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11

22 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b φ b φ 3 φ 3 φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11

23 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b φ b φ 3 φ 3 φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 12

24 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Merlin Rowling is a legendary describes figure the beloved best known wizard as Dumbledore the wizard featured as Machiavellian in the Arthurian and says legend 13

25 Tiered Clustering φ 1 φ 2 φ 3 φ b Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore evil powerful magic wizard θ d α Beta α d D, φ d β,g 0 DP β,g 0 d D, φ back β back Dirichlet β back z i,d θ d Bernoulli θ d i w d, (cluster proportions) (clusters) (background / shared) (tier indicator) w i,d φ d,z i,d Mult φ back z i,d 1 Mult φ d otherwise i w d, (features) here controls the per-data tier distribution 14

26 Latent Dirichlet Allocation θ d α Dirichlet(α), d D, (topic proportions) φ t β Dirichlet(β), t T, (topics) z id θ d Mult(θ d ), i w d, (topic indicators) w id φ zid Mult(φ zid ), i w d, (words) Tiered Clustering θ d α Beta α d D, φ d β,g 0 DP β,g 0 d D, φ back β back Dirichlet β back z i,d θ d Bernoulli θ d i w d, (cluster proportions) (clusters) (background / shared) (tier indicator) w i,d φ d,z i,d Mult φ back z i,d 1 Mult φ d otherwise i w d, (features) here controls the per-data tier distribution 15

27 Prototype Comparison multi-prototype LIFE my, you, real, about, your, would years, spent, rest, lived, last sentenced, imprisonment, sentence, prison years, cycle, life, all, expectancy, other all, life, way, people, human, social, many RADIO STOCK market, price, stock, company, value, crash housing, breeding, all, large, stock, many car, racing, company, cars, summer, NASCAR stock, extended, folded, card, barrel, cards rolling, locomotives, new, character, line Table 4: Example DPMM multi-prototype representation tiered LIFE all, about, life, would, death my, you, real, your, about spent, years, rest, lived, last sentenced, imprisonment, sentence, prison insurance, peer, Baron, member, company Guru, Rabbi, Baba, la, teachings RADIO (more polysemous) STOCK stock, all, other, company, new market, crash, markets, price, prices housing, breeding, fish, water, horses car, racing, cars, NASCAR, race, engine card, cards, player, pile, game, paper rolling, locomotives, line, new, railway Table 1: Example tiered clustering representation of (more homonymous) 16

28 Prototype Comparison multi-prototype LIFE my, you, real, about, your, would years, spent, rest, lived, last sentenced, imprisonment, sentence, prison years, cycle, life, all, expectancy, other all, life, way, people, human, social, many RADIO STOCK market, price, stock, company, value, crash housing, breeding, all, large, stock, many car, racing, company, cars, summer, NASCAR stock, extended, folded, card, barrel, cards rolling, locomotives, new, character, line Table 4: Example DPMM multi-prototype representation tiered LIFE all, about, life, would, death my, you, real, your, about spent, years, rest, lived, last sentenced, imprisonment, sentence, prison insurance, peer, Baron, member, company Guru, Rabbi, Baba, la, teachings RADIO (more polysemous) STOCK stock, all, other, company, new market, crash, markets, price, prices housing, breeding, fish, water, horses car, racing, cars, NASCAR, race, engine card, cards, player, pile, game, paper rolling, locomotives, line, new, railway Table 1: Example tiered clustering representation of (more homonymous) 16

29 Prototype Comparison multi-prototype LIFE my, you, real, about, your, would years, spent, rest, lived, last sentenced, imprisonment, sentence, prison years, cycle, life, all, expectancy, other all, life, way, people, human, social, many RADIO STOCK market, price, stock, company, value, crash housing, breeding, all, large, stock, many car, racing, company, cars, summer, NASCAR stock, extended, folded, card, barrel, cards rolling, locomotives, new, character, line Table 4: Example DPMM multi-prototype representation tiered LIFE all, about, life, would, death my, you, real, your, about spent, years, rest, lived, last sentenced, imprisonment, sentence, prison insurance, peer, Baron, member, company Guru, Rabbi, Baba, la, teachings RADIO (more polysemous) STOCK stock, all, other, company, new market, crash, markets, price, prices housing, breeding, fish, water, horses car, racing, cars, NASCAR, race, engine card, cards, player, pile, game, paper rolling, locomotives, line, new, railway Table 1: Example tiered clustering representation of (more homonymous) 16

30 Experimental Setup Features: Unigram context features, 10 word windows; frequency-based pruning Model Training: Gibbs sampling; convergence when # of cluster swaps falls below a fixed threshold Evaluation: Spearman s ρ with human similarity judgements Inferred similarity is cosine using tf-idf features, adapted for tiered clustering: AvgSim(w, w )= 1 K 1 K 2 K 1 j=1 K 2 k=1 d(φ w k, φ w j ) TieredAvgSim(w, w ) = AvgSim(w, w )+d(φ w back, φ w back) 17

31 WS-353 Correlation with Human Relatedness Judgements Spearman s ρ football - soccer 9.03 doctor - personnel 5.00 delay - news 3.31 Mars - water All High-Polysemy Single Prototype Multi Prototype Tiered Clustering 353 words, biased towards pairs with high similarity High polysemy subset generated by counting WN senses 18 Dataset: Finkelstein et. al 2001

32 WN-Evocation Relatedness Correlation Spearman s ρ social - urban 3.3 human - old 0.9 jealous - abundant 0.2 retired - ethical All High-Similarity High-Polysemy Single Prototype Multi Prototype Tiered Clustering Much higher degree of unrelated words than WS-353, more uniform sample over word pairs High similarity set restricts to pairs in the top quartile by human rating 19 Dataset: Ma et. al 2009

33 Padó Selectional Preference Spearman s ρ All High-Polysemy brother hit 4.7 hit brother 4.1 quake hit 3.4 hit quake 1.2 Single Prototype Multi Prototype Tiered Clustering Predict typical arguments for verbs (e.g. things that can eat or things that can be shot) Separate model for each argument slot Background cluster captures commonalities between argument fillers. Dataset: Padó et. al

34 Conclusion Introduced a latent variable model explicitly accounting for shared structure when clustering A priori assumption that features can be separated into shared and idiosyncratic components Showed significant improvement over baseline for word-relatedness and selectional preference 21

35 Thanks!

36 Questions

37 WS-353 Correlation with Human Relatedness Judgements 85 Spearman s ρ 77.5 Single Prototype Multi Prototype Tiered Clustering MP+SP 70 All High-Polysemy 353 words, biased towards pairs with abnormally high similarity High polysemy subset generated by counting WN senses 24 (Finkelstein et. al 2001)

38 WN-Evocation Relatedness Correlation 30 Spearman s ρ 20 Single Prototype Multi Prototype Tiered Clustering MP+SP 10 All High-Similarity High-Polysemy Much higher degree of unrelated words than WS-353, more uniform sample over word pairs High similarity set restricts to pairs in the top quartile by human rating 25 (Ma et. al 2009)

39 Padó Selectional Preference 30 Spearman s ρ 20 Single Prototype Multi Prototype Tiered Clustering MP+SP 10 All High-Polysemy Background cluster captures commonalities between argument fillers, e.g. things that can eat or things that can be shot. 26 (Padó et. al 2007)

Multi-Prototype Vector-Space Models of Word Meaning

Multi-Prototype Vector-Space Models of Word Meaning Multi-Prototype Vector-Space Models of Word Meaning Joseph Reisinger Department of Computer Science The University of Texas at Austin 1 University Station C0500 Austin, TX 78712-0233 joeraii@cs.utexas.edu

More information

A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection

A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection Gabriella Lapesa 2,1 1 Universität Osnabrück Institut für Kognitionswissenschaft Albrechtstr. 28,

More information

Improving Word Representations via Global Context and Multiple Word Prototypes

Improving Word Representations via Global Context and Multiple Word Prototypes Improving Word Representations via Global Context and Multiple Word Prototypes Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford,

More information

Fry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education

Fry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education Set 1 The people Write it down By the water Who will make it? You and I What will they do? He called me. We had their dog. What did they say? When would you go? No way A number of people One or two How

More information

FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS

FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS by Fatemeh Riahi Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie

More information

Exploratory analysis of semantic categories: comparing data-driven and human similarity judgments

Exploratory analysis of semantic categories: comparing data-driven and human similarity judgments Lindh-Knuutila and Honkela Computational Cognitive Science (2015) 1:2 DOI 10.1186/s40469-015-0001-1 RESEARCH Open Access Exploratory analysis of semantic categories: comparing data-driven and human similarity

More information

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse

Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Presented in the Twenty-third Annual Meeting of the Society for Text and Discourse, Valencia from 16 to 18, July 2013 Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Guillermo

More information

State Trivia. You found a shortcut! Board Game. Super Teacher Worksheets - www.superteacherworksheets.com

State Trivia. You found a shortcut! Board Game. Super Teacher Worksheets - www.superteacherworksheets.com Super Teacher Worksheets - www.superteacherworksheets.com If a player lands on Lose a turn, they skip their next turn. A player can only cross the Route 66 shortcut if they land on the square that says,

More information

Clustering Connectionist and Statistical Language Processing

Clustering Connectionist and Statistical Language Processing Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Gambling Commission. Trends in Gambling Behaviour 2008-2014

Gambling Commission. Trends in Gambling Behaviour 2008-2014 Gambling Commission Trends in Gambling Behaviour 2008-2014 Report Name Prepared for Authors Project team Trends in Gambling Behaviour 2008-2014 Gambling Commission Doros Georgiou Karl King Adrian Talbot

More information

Modeling coherence in ESOL learner texts

Modeling coherence in ESOL learner texts University of Cambridge Computer Lab Building Educational Applications NAACL 2012 Outline 1 2 3 4 The Task: Automated Text Scoring (ATS) ATS systems Discourse coherence & cohesion The Task: Automated Text

More information

Which of these words are connected with magic? Write them into the cauldron.

Which of these words are connected with magic? Write them into the cauldron. 1 quick quiz Which of these words are connected with magic? Write them into the cauldron. spell professor wizard magical tropical bewitched prince potions school witchcraft marriage Hogwarts 2 What does

More information

The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON

The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON This lesson: 1/ Grammar 2/ Speaking: about ideas and graphs 3/ Reading: for gist and detail 4/ Writing: make a poster

More information

Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space

Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space Marco Baroni and Roberto Zamparelli Center for Mind/Brain Sciences, University of Trento Rovereto

More information

Practical Introduction to Machine Learning and Optimization. Alessio Signorini <alessio.signorini@oneriot.com>

Practical Introduction to Machine Learning and Optimization. Alessio Signorini <alessio.signorini@oneriot.com> Practical Introduction to Machine Learning and Optimization Alessio Signorini Everyday's Optimizations Although you may not know, everybody uses daily some sort of optimization

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Probabilistic topic models for sentiment analysis on the Web

Probabilistic topic models for sentiment analysis on the Web University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as

More information

Using 3D Illustration and Animation in e-learning

Using 3D Illustration and Animation in e-learning 302 Using 3D Illustration and Animation in e-learning Sheldon Murphy, May 17 & 18, 2007 Produced by New Techniques for Designing and Example from TV 1 Question: What s the dead giveaway from the commercial

More information

2. SEMANTIC RELATIONS

2. SEMANTIC RELATIONS 2. SEMANTIC RELATIONS 2.0 Review: meaning, sense, reference A word has meaning by having both sense and reference. Sense: Word meaning: = concept Sentence meaning: = proposition (1) a. The man kissed the

More information

Phonics. High Frequency Words P.008. Objective The student will read high frequency words.

Phonics. High Frequency Words P.008. Objective The student will read high frequency words. P.008 Jumping Words Objective The student will read high frequency words. Materials High frequency words (P.HFW.005 - P.HFW.064) Choose target words. Checkerboard and checkers (Activity Master P.008.AM1a

More information

2nd Grade Language Arts Practice Test

2nd Grade Language Arts Practice Test 2nd Grade Language Arts Practice Name: Instructions: Copyright 2000-2002 Measured Progress, All Rights Reserved : 1. Which word in the paragraph below is used too much? My music teacher is nice. He plays

More information

Statistics E100 Fall 2013 Practice Midterm I - A Solutions

Statistics E100 Fall 2013 Practice Midterm I - A Solutions STATISTICS E100 FALL 2013 PRACTICE MIDTERM I - A SOLUTIONS PAGE 1 OF 5 Statistics E100 Fall 2013 Practice Midterm I - A Solutions 1. (16 points total) Below is the histogram for the number of medals won

More information

Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management

Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management Enrique Amigó 1, Damiano Spina 1, Bernardino Beotas 2, and Julio Gonzalo 1 1 Departamento de Lenguajes y Sistemas

More information

A Systematic Study of Semantic Vector Space Model Parameters

A Systematic Study of Semantic Vector Space Model Parameters A Systematic Study of Semantic Vector Space Model Parameters Douwe Kiela University of Cambridge Computer Laboratory douwe.kiela@cl.cam.ac.uk Stephen Clark University of Cambridge Computer Laboratory sc609@cam.ac.uk

More information

That s Not Fair! ASSESSMENT #HSMA20. Benchmark Grades: 9-12

That s Not Fair! ASSESSMENT #HSMA20. Benchmark Grades: 9-12 That s Not Fair! ASSESSMENT # Benchmark Grades: 9-12 Summary: Students consider the difference between fair and unfair games, using probability to analyze games. The probability will be used to find ways

More information

Spazi vettoriali e misure di similaritá

Spazi vettoriali e misure di similaritá Spazi vettoriali e misure di similaritá R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 April 10, 2009 Outline Outline Spazi vettoriali a valori reali Operazioni tra vettori Indipendenza Lineare

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

Anansi and the Moss- Covered Rock Retold by Eric A. Kimmel & Adapted by Whitney Peterson. Lion Scene

Anansi and the Moss- Covered Rock Retold by Eric A. Kimmel & Adapted by Whitney Peterson. Lion Scene Anansi and the Moss- Covered Rock Retold by Eric A. Kimmel & Adapted by Whitney Peterson Narrator #1: Once upon a time Anansi the Spider was walking, walking, walking through the forest when something

More information

Basic Set Theory. 1. Motivation. Fido Sue. Fred Aristotle Bob. LX 502 - Semantics I September 11, 2008

Basic Set Theory. 1. Motivation. Fido Sue. Fred Aristotle Bob. LX 502 - Semantics I September 11, 2008 Basic Set Theory LX 502 - Semantics I September 11, 2008 1. Motivation When you start reading these notes, the first thing you should be asking yourselves is What is Set Theory and why is it relevant?

More information

Natural Language Processing. Part 4: lexical semantics

Natural Language Processing. Part 4: lexical semantics Natural Language Processing Part 4: lexical semantics 2 Lexical semantics A lexicon generally has a highly structured form It stores the meanings and uses of each word It encodes the relations between

More information

Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

More information

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde

Phase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction

More information

Efficient Cluster Detection and Network Marketing in a Nautural Environment

Efficient Cluster Detection and Network Marketing in a Nautural Environment A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin

More information

Introduction to Information Retrieval http://informationretrieval.org

Introduction to Information Retrieval http://informationretrieval.org Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:

More information

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

More information

270 Stages 8 Lærerveiledning Årsplan/periodeplan

270 Stages 8 Lærerveiledning Årsplan/periodeplan 270 Stages 8 Lærerveiledning Årsplan/periodeplan Chapter 1 The Stage Is Yours Talking and writing about yourself To be and to have Question words Words with ea Capitalization Noun plurals Making a graph

More information

Problem sets for BUEC 333 Part 1: Probability and Statistics

Problem sets for BUEC 333 Part 1: Probability and Statistics Problem sets for BUEC 333 Part 1: Probability and Statistics I will indicate the relevant exercises for each week at the end of the Wednesday lecture. Numbered exercises are back-of-chapter exercises from

More information

Outline of today s lecture

Outline of today s lecture Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?

More information

PUSD High Frequency Word List

PUSD High Frequency Word List PUSD High Frequency Word List For Reading and Spelling Grades K-5 High Frequency or instant words are important because: 1. You can t read a sentence or a paragraph without knowing at least the most common.

More information

Clustering of Polysemic Words

Clustering of Polysemic Words Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain lcicurel@isoco.com 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,

More information

Using Machine Learning to Predict Human Brain Activity

Using Machine Learning to Predict Human Brain Activity Using Machine Learning to Predict Human Brain Activity Senior Thesis Mahtiyar Bonakdarpour Advisor: Tom Mitchell 1 Abstract Brain imaging studies are geared towards decoding the way the human brain represents

More information

Predicting the NFL Using Twitter. Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah Smith

Predicting the NFL Using Twitter. Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah Smith Predicting the NFL Using Twitter Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah Smith Disclaimer This talk will not teach you how to become a successful sports bettor. Questions What is the NFL and what

More information

Don t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

Don t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors Don t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors Marco Baroni and Georgiana Dinu and Germán Kruszewski Center for Mind/Brain Sciences (University

More information

AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS

AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-19-25 Available online at http://www.bioinfo.in/contents.php?id=32 AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS

More information

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015

W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015 W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction

More information

Inertia, Forces, and Acceleration: The Legacy of Sir Isaac Newton

Inertia, Forces, and Acceleration: The Legacy of Sir Isaac Newton Inertia, Forces, and Acceleration: The Legacy of Sir Isaac Newton Position is a Vector Compare A A ball is 12 meters North of the Sun God to A A ball is 10 meters from here A vector has both a direction

More information

Movie Classification Using k-means and Hierarchical Clustering

Movie Classification Using k-means and Hierarchical Clustering Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani

More information

A picture is worth five captions

A picture is worth five captions A picture is worth five captions Learning visually grounded word and sentence representations Grzegorz Chrupała (with Ákos Kádár and Afra Alishahi) Learning word (and phrase) meanings Cross-situational

More information

Frames and Commonsense. Winston, Chapter 10

Frames and Commonsense. Winston, Chapter 10 Frames and Commonsense Winston, Chapter 10 Michael Eisenberg and Gerhard Fischer TA: Ann Eisenberg AI Course, Fall 1997 Eisenberg/Fischer 1 Representations powerful ideas the representation principle:

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Harry Potter: Lesson Plan!

Harry Potter: Lesson Plan! Harry Potter: Lesson Plan Aims: By the end of the lesson, students will be able to know more about Harry Potter and the corresponding books. Materials: Harry Potter book covers, character list and picture

More information

CHECKLIST FOR THE DEGREE PROJECT REPORT

CHECKLIST FOR THE DEGREE PROJECT REPORT Kerstin Frenckner, kfrenck@csc.kth.se Copyright CSC 25 mars 2009 CHECKLIST FOR THE DEGREE PROJECT REPORT This checklist has been written to help you check that your report matches the demands that are

More information

The Law and Harry Potter

The Law and Harry Potter The Law and Harry Potter Edited by Jeffrey E. Thomas Franklin G. Snyder Carolina Academic Press Durham, North Carolina Copyright 2010 Jeffrey E. Thomas Franklin G. Snyder All Rights Reserved Library of

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Finding Similar Questions in Large Question and Answer Archives

Finding Similar Questions in Large Question and Answer Archives Finding Similar Questions in Large Question and Answer Archives Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee Center for Intelligent Information Retrieval, Computer Science Department University of Massachusetts,

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

BB2798 How Playtech uses predictive analytics to prevent business outages

BB2798 How Playtech uses predictive analytics to prevent business outages BB2798 How Playtech uses predictive analytics to prevent business outages Eli Eyal, Playtech Udi Shagal, HP June 13, 2013 Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained

More information

Hitwise Mobile Client Release FAQ February 2013

Hitwise Mobile Client Release FAQ February 2013 Hitwise Mobile Client Release FAQ February 2013 About Hitwise Mobile Mobile is one of the most dynamic aspects of the Internet today. Understanding the way consumers use the Internet from mobile devices

More information

Classifying Manipulation Primitives from Visual Data

Classifying Manipulation Primitives from Visual Data Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if

More information

Sentence paraphrase detection: When determiners and word order make the difference

Sentence paraphrase detection: When determiners and word order make the difference Sentence paraphrase detection: When determiners and word order make the difference Nghia Pham University of Trento thenghia.pham@unitn.it Yao Zhong Zhang The University of Tokyo zhangyaoz@gmail.com Raffaella

More information

Finding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia

Finding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia Finding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia Martin Scaiano University of Ottawa mscai056@uottawa.ca Diana Inkpen University of Ottawa diana@site.uottawa.com Abstract

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

This image cannot currently be displayed. Course Catalog. Language Arts 600. 2016 Glynlyon, Inc.

This image cannot currently be displayed. Course Catalog. Language Arts 600. 2016 Glynlyon, Inc. This image cannot currently be displayed. Course Catalog Language Arts 600 2016 Glynlyon, Inc. Table of Contents COURSE OVERVIEW... 1 UNIT 1: ELEMENTS OF GRAMMAR... 3 UNIT 2: GRAMMAR USAGE... 3 UNIT 3:

More information

Vocabulary Problem in Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California. fshli, danzigg@cs.usc.

Vocabulary Problem in Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California. fshli, danzigg@cs.usc. Vocabulary Problem in Internet Resource Discovery Technical Report USC-CS-94-594 Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California 90089-0781

More information

Performance Evaluation and Optimization of Math-Similarity Search

Performance Evaluation and Optimization of Math-Similarity Search Performance Evaluation and Optimization of Math-Similarity Search Qun Zhang and Abdou Youssef Department of Computer Science The George Washington University Washington DC, 20052, USA Abstract Similarity

More information

Ready, Set, Go! Math Games for Serious Minds

Ready, Set, Go! Math Games for Serious Minds Math Games with Cards and Dice presented at NAGC November, 2013 Ready, Set, Go! Math Games for Serious Minds Rande McCreight Lincoln Public Schools Lincoln, Nebraska Math Games with Cards Close to 20 -

More information

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015.

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015. Probability, statistics and football Franka Miriam Bru ckler Paris, 2015 Please read this before starting! Although each activity can be performed by one person only, it is suggested that you work in groups

More information

Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities

Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities Masumi Shirakawa Kotaro Nakayama Takahiro Hara Shojiro Nishio Graduate School of Information Science and Technology,

More information

Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment

Topics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer

More information

Online Supplement to Polygenic Influence on Educational Attainment. Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using

Online Supplement to Polygenic Influence on Educational Attainment. Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using Online Supplement to Polygenic Influence on Educational Attainment Construction of Polygenic Score for Educational Attainment Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using

More information

Communications. NQF Level: 3 US No: 8970. Facilitator:... Company:... Commodity:... Date:...

Communications. NQF Level: 3 US No: 8970. Facilitator:... Company:... Commodity:... Date:... NQF Level: 3 US No: 8970 Facilitator Guide Primary Agriculture Communications Facilitator:.......................................... Company:.......................................... Commodity:....................

More information

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80)

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) CS 30, Winter 2016 by Prasad Jayanti 1. (10 points) Here is the famous Monty Hall Puzzle. Suppose you are on a game show, and you

More information

Texas Hold em. From highest to lowest, the possible five card hands in poker are ranked as follows:

Texas Hold em. From highest to lowest, the possible five card hands in poker are ranked as follows: Texas Hold em Poker is one of the most popular card games, especially among betting games. While poker is played in a multitude of variations, Texas Hold em is the version played most often at casinos

More information

Managing Incompleteness, Complexity and Scale in Big Data

Managing Incompleteness, Complexity and Scale in Big Data Managing Incompleteness, Complexity and Scale in Big Data Nick Duffield Electrical and Computer Engineering Texas A&M University http://nickduffield.net/work Three Challenges for Big Data Complexity Problem:

More information

Statistical tests for SPSS

Statistical tests for SPSS Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

More information

Sentiment Analysis for Movie Reviews

Sentiment Analysis for Movie Reviews Sentiment Analysis for Movie Reviews Ankit Goyal, a3goyal@ucsd.edu Amey Parulekar, aparulek@ucsd.edu Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

Buzz Monitoring in Word Space

Buzz Monitoring in Word Space Buzz Monitoring in Word Space Magnus Sahlgren 1,2 and Jussi Karlgren 1,2 1 Gavagai AB 2 SICS (Swedish Institute of Computer Science) Box 1263, SE-16429 Kista, Sweden {mange,jussi}@gavagai.se Abstract.

More information

Mean, Median, and Mode

Mean, Median, and Mode DELTA MATH SCIENCE PARTNERSHIP INITIATIVE M 3 Summer Institutes (Math, Middle School, MS Common Core) Mean, Median, and Mode Hook Problem: To compare two shipments, five packages from each shipment were

More information

Intro to Linguistics Semantics

Intro to Linguistics Semantics Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember

More information

Analyzing Structured and Unstructured Data in Electronic Health Records

Analyzing Structured and Unstructured Data in Electronic Health Records Analyzing Structured and Unstructured Data in Electronic Health Records Jing Zhao & Aron Henriksson Department of Computer and Systems Sciences (DSV) Stockholm University Electronic Health Records Administrative

More information

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt

TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article

More information

Emotion Detection from Speech

Emotion Detection from Speech Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

Population Research Seminar Series

Population Research Seminar Series Survey and Statistical Methods Core Population Research Seminar Series Session 4: Standardized Survey Interviewing Part 1: Controlling Interviewer Related Error Jack Fowler, PhD Center for Survey Research

More information

Sentiment analysis using emoticons

Sentiment analysis using emoticons Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was

More information

Web-Unit zu Highlight 4A, Unit 3. Worksheet zur Web-Unit California Dreaming. Exercise 1: Los Angeles

Web-Unit zu Highlight 4A, Unit 3. Worksheet zur Web-Unit California Dreaming. Exercise 1: Los Angeles Worksheet zur Web-Unit California Dreaming Exercise 1: Los Angeles Try to answer these questions about this Californian city! Is Los Angeles the largest city in California? Where is Los Angeles? Los Angeles

More information

Fall 2013 to present Assistant Professor, Department of Psychological and Brain Sciences, Johns Hopkins University

Fall 2013 to present Assistant Professor, Department of Psychological and Brain Sciences, Johns Hopkins University M A R I N A B E D N Y Johns Hopkins University Department of Psychological and Brain Sciences 3400 N. Charles Street, Ames Hall Baltimore, MD 21218 mbedny@mit.edu ACADEMIC POSITIONS Fall 2013 to present

More information

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

始 まりのブザーが 鳴 るまでページをめくってはいけません

始 まりのブザーが 鳴 るまでページをめくってはいけません 2013 年 度 青 山 学 院 高 等 部 一 般 入 学 試 験 問 題 英 語 始 まりのブザーが 鳴 るまでページをめくってはいけません 下 記 の 注 意 事 項 に 目 を 通 しておいてください 問 題 用 紙 は 1 ページから 12 ページまであるので 始 まりのブザー が 鳴 ったらすぐに 確 認 すること 解 答 はすべて 別 紙 の 解 答 用 紙 に 記 入 すること とじてある

More information

( ) is proportional to ( 10 + x)!2. Calculate the

( ) is proportional to ( 10 + x)!2. Calculate the PRACTICE EXAMINATION NUMBER 6. An insurance company eamines its pool of auto insurance customers and gathers the following information: i) All customers insure at least one car. ii) 64 of the customers

More information

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

More information

Human behavior analysis from videos using optical flow

Human behavior analysis from videos using optical flow L a b o r a t o i r e I n f o r m a t i q u e F o n d a m e n t a l e d e L i l l e Human behavior analysis from videos using optical flow Yassine Benabbas Directeur de thèse : Chabane Djeraba Multitel

More information

Mathematical goals. Starting points. Materials required. Time needed

Mathematical goals. Starting points. Materials required. Time needed Level S2 of challenge: B/C S2 Mathematical goals Starting points Materials required Time needed Evaluating probability statements To help learners to: discuss and clarify some common misconceptions about

More information

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006

Pixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006 Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

More information

Annotated work sample portfolios are provided to support implementation of the Foundation Year 10 Australian Curriculum.

Annotated work sample portfolios are provided to support implementation of the Foundation Year 10 Australian Curriculum. Work sample portfolio summary WORK SAMPLE PORTFOLIO Annotated work sample portfolios are provided to support implementation of the Foundation Year 10 Australian Curriculum. Each portfolio is an example

More information