A MIXTURE MODEL WITH SHARING
|
|
- Archibald Thompson
- 7 years ago
- Views:
Transcription
1 A MIXTURE MODEL WITH SHARING FOR LEXICAL SEMANTICS Joseph Reisinger and Raymond Mooney The University of Texas at Austin 1
2 Outline Distributional lexical semantics Inadequacies of standard centroid representation Richer representations of homonymy using mixture models A tiered clustering model for polysemy: = + word senses background sense-specific deltas 2
3 Distributional Lexical Semantics Represent meaning as a point/vector in a highdimensional space Word relatedness correlates with some distance metric Attributional / relational / resource-based, Almuhareb and Poesio (2004), Baroni and Lenci (2009), Bullinaria and Levy (2007), Erk (2007), Griffiths et al. (2007), Landauer and Dumais (1997), Moldovan (2006), Padó and Lapata (2007), Pantel and Pennacchiotti (2006), Sahlgren (2006), Turney and Pantel (2010) 3
4 (Some of) Distributional Lexical Semantics word occurrences / context vectors The history of Oz prior to The Wonderful Wizard of Oz (often called the Rowling describes the beloved wizard Dumbledore as Machiavellian and says Merlin is a legendary figure best known as the wizard featured in the Arthurian legend A wizard comedian is known to have survived eating this plant on a bet, though he is still purple True Image is known for its simple, wizard driven interface, and received positive Thunder did a cover of "Pinball Wizard" to be featured on "Hollywood Rocks" compute centroid evil, powerful, magic, wizard, Merlin, spells, Harry 4
5 One Word One Prototype chicken (food) elephant Ω = pasta chicken (animal) wizard Find the centroid of the individual word occurrence context vectors Conflates senses / doesn t reliably account for thematic variability in usage 5 e.g., Schütze (1997)
6 One Word One Prototype chicken (food) elephant Ω = pasta chicken (animal) wizard Find the centroid of the individual word occurrence context vectors Conflates senses / doesn t reliably account for thematic variability in usage 5 e.g., Schütze (1997)
7 Multi-Prototype chicken (food) elephant Ω = pasta chicken (animal) wizard Use a mixture model to cluster occurrences of each target word separately. Doesn t find lexicographic senses; captures contextual variance directly. 6
8 WS-353 Correlation with Human Relatedness Judgements Spearman s ρ football - soccer 9.03 doctor - personnel 5.00 delay - news 3.31 Mars - water All High-Polysemy Single Prototype Multi Prototype 353 words, biased towards pairs with high similarity High polysemy subset generated by counting WN senses 7 Dataset: Finkelstein et. al 2001
9 Shared Structure Matters Spearman s ρ All High-Polysemy brother hit 4.7 hit brother 4.1 quake hit 3.4 hit quake 1.2 Single Prototype Multi Prototype Selectional Preference: Predict typical arguments for verbs (e.g. things that can eat or things that can be shot) Background cluster captures commonalities between argument fillers. 8 Dataset: Padó et. al 2007
10 Tiered Clustering There are cases where is a high degree of overlap between senses e.g. polysemous words like line or raise or i.e. in selectional preference where argument fillers may have some common structure MP models homonymy but not polysemy- it cannot account for shared structure Tiered Clustering introduces a background component to account for this = + word senses background sense-specific deltas
11 Tiered Clustering 10
12 Tiered Clustering φ 1 Merlin King Arthur Arthurian 10
13 Tiered Clustering φ 1 φ 2 Merlin King Arthur Arthurian fairy wicked scene tale 10
14 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore 10
15 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard 10
16 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Rowling describes the beloved wizard Dumbledore as Machiavellian and says 10
17 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Rowling describes the beloved wizard Dumbledore as Machiavellian and says 10
18 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11
19 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11
20 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11
21 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11
22 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b φ b φ 3 φ 3 φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 11
23 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard φ 3 φ b φ b φ 3 φ 3 φ b Rowling describes the beloved wizard Dumbledore as Machiavellian and says 12
24 Tiered Clustering φ 1 φ 2 φ 3 Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore φ b Dir(βbackground ) evil powerful magic wizard Merlin Rowling is a legendary describes figure the beloved best known wizard as Dumbledore the wizard featured as Machiavellian in the Arthurian and says legend 13
25 Tiered Clustering φ 1 φ 2 φ 3 φ b Merlin King Arthur Arthurian fairy wicked scene tale Harry Potter Voldemort Dumbledore evil powerful magic wizard θ d α Beta α d D, φ d β,g 0 DP β,g 0 d D, φ back β back Dirichlet β back z i,d θ d Bernoulli θ d i w d, (cluster proportions) (clusters) (background / shared) (tier indicator) w i,d φ d,z i,d Mult φ back z i,d 1 Mult φ d otherwise i w d, (features) here controls the per-data tier distribution 14
26 Latent Dirichlet Allocation θ d α Dirichlet(α), d D, (topic proportions) φ t β Dirichlet(β), t T, (topics) z id θ d Mult(θ d ), i w d, (topic indicators) w id φ zid Mult(φ zid ), i w d, (words) Tiered Clustering θ d α Beta α d D, φ d β,g 0 DP β,g 0 d D, φ back β back Dirichlet β back z i,d θ d Bernoulli θ d i w d, (cluster proportions) (clusters) (background / shared) (tier indicator) w i,d φ d,z i,d Mult φ back z i,d 1 Mult φ d otherwise i w d, (features) here controls the per-data tier distribution 15
27 Prototype Comparison multi-prototype LIFE my, you, real, about, your, would years, spent, rest, lived, last sentenced, imprisonment, sentence, prison years, cycle, life, all, expectancy, other all, life, way, people, human, social, many RADIO STOCK market, price, stock, company, value, crash housing, breeding, all, large, stock, many car, racing, company, cars, summer, NASCAR stock, extended, folded, card, barrel, cards rolling, locomotives, new, character, line Table 4: Example DPMM multi-prototype representation tiered LIFE all, about, life, would, death my, you, real, your, about spent, years, rest, lived, last sentenced, imprisonment, sentence, prison insurance, peer, Baron, member, company Guru, Rabbi, Baba, la, teachings RADIO (more polysemous) STOCK stock, all, other, company, new market, crash, markets, price, prices housing, breeding, fish, water, horses car, racing, cars, NASCAR, race, engine card, cards, player, pile, game, paper rolling, locomotives, line, new, railway Table 1: Example tiered clustering representation of (more homonymous) 16
28 Prototype Comparison multi-prototype LIFE my, you, real, about, your, would years, spent, rest, lived, last sentenced, imprisonment, sentence, prison years, cycle, life, all, expectancy, other all, life, way, people, human, social, many RADIO STOCK market, price, stock, company, value, crash housing, breeding, all, large, stock, many car, racing, company, cars, summer, NASCAR stock, extended, folded, card, barrel, cards rolling, locomotives, new, character, line Table 4: Example DPMM multi-prototype representation tiered LIFE all, about, life, would, death my, you, real, your, about spent, years, rest, lived, last sentenced, imprisonment, sentence, prison insurance, peer, Baron, member, company Guru, Rabbi, Baba, la, teachings RADIO (more polysemous) STOCK stock, all, other, company, new market, crash, markets, price, prices housing, breeding, fish, water, horses car, racing, cars, NASCAR, race, engine card, cards, player, pile, game, paper rolling, locomotives, line, new, railway Table 1: Example tiered clustering representation of (more homonymous) 16
29 Prototype Comparison multi-prototype LIFE my, you, real, about, your, would years, spent, rest, lived, last sentenced, imprisonment, sentence, prison years, cycle, life, all, expectancy, other all, life, way, people, human, social, many RADIO STOCK market, price, stock, company, value, crash housing, breeding, all, large, stock, many car, racing, company, cars, summer, NASCAR stock, extended, folded, card, barrel, cards rolling, locomotives, new, character, line Table 4: Example DPMM multi-prototype representation tiered LIFE all, about, life, would, death my, you, real, your, about spent, years, rest, lived, last sentenced, imprisonment, sentence, prison insurance, peer, Baron, member, company Guru, Rabbi, Baba, la, teachings RADIO (more polysemous) STOCK stock, all, other, company, new market, crash, markets, price, prices housing, breeding, fish, water, horses car, racing, cars, NASCAR, race, engine card, cards, player, pile, game, paper rolling, locomotives, line, new, railway Table 1: Example tiered clustering representation of (more homonymous) 16
30 Experimental Setup Features: Unigram context features, 10 word windows; frequency-based pruning Model Training: Gibbs sampling; convergence when # of cluster swaps falls below a fixed threshold Evaluation: Spearman s ρ with human similarity judgements Inferred similarity is cosine using tf-idf features, adapted for tiered clustering: AvgSim(w, w )= 1 K 1 K 2 K 1 j=1 K 2 k=1 d(φ w k, φ w j ) TieredAvgSim(w, w ) = AvgSim(w, w )+d(φ w back, φ w back) 17
31 WS-353 Correlation with Human Relatedness Judgements Spearman s ρ football - soccer 9.03 doctor - personnel 5.00 delay - news 3.31 Mars - water All High-Polysemy Single Prototype Multi Prototype Tiered Clustering 353 words, biased towards pairs with high similarity High polysemy subset generated by counting WN senses 18 Dataset: Finkelstein et. al 2001
32 WN-Evocation Relatedness Correlation Spearman s ρ social - urban 3.3 human - old 0.9 jealous - abundant 0.2 retired - ethical All High-Similarity High-Polysemy Single Prototype Multi Prototype Tiered Clustering Much higher degree of unrelated words than WS-353, more uniform sample over word pairs High similarity set restricts to pairs in the top quartile by human rating 19 Dataset: Ma et. al 2009
33 Padó Selectional Preference Spearman s ρ All High-Polysemy brother hit 4.7 hit brother 4.1 quake hit 3.4 hit quake 1.2 Single Prototype Multi Prototype Tiered Clustering Predict typical arguments for verbs (e.g. things that can eat or things that can be shot) Separate model for each argument slot Background cluster captures commonalities between argument fillers. Dataset: Padó et. al
34 Conclusion Introduced a latent variable model explicitly accounting for shared structure when clustering A priori assumption that features can be separated into shared and idiosyncratic components Showed significant improvement over baseline for word-relatedness and selectional preference 21
35 Thanks!
36 Questions
37 WS-353 Correlation with Human Relatedness Judgements 85 Spearman s ρ 77.5 Single Prototype Multi Prototype Tiered Clustering MP+SP 70 All High-Polysemy 353 words, biased towards pairs with abnormally high similarity High polysemy subset generated by counting WN senses 24 (Finkelstein et. al 2001)
38 WN-Evocation Relatedness Correlation 30 Spearman s ρ 20 Single Prototype Multi Prototype Tiered Clustering MP+SP 10 All High-Similarity High-Polysemy Much higher degree of unrelated words than WS-353, more uniform sample over word pairs High similarity set restricts to pairs in the top quartile by human rating 25 (Ma et. al 2009)
39 Padó Selectional Preference 30 Spearman s ρ 20 Single Prototype Multi Prototype Tiered Clustering MP+SP 10 All High-Polysemy Background cluster captures commonalities between argument fillers, e.g. things that can eat or things that can be shot. 26 (Padó et. al 2007)
Multi-Prototype Vector-Space Models of Word Meaning
Multi-Prototype Vector-Space Models of Word Meaning Joseph Reisinger Department of Computer Science The University of Texas at Austin 1 University Station C0500 Austin, TX 78712-0233 joeraii@cs.utexas.edu
More informationA Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection
A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection Gabriella Lapesa 2,1 1 Universität Osnabrück Institut für Kognitionswissenschaft Albrechtstr. 28,
More informationImproving Word Representations via Global Context and Multiple Word Prototypes
Improving Word Representations via Global Context and Multiple Word Prototypes Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford,
More informationFry Phrases Set 1. TeacherHelpForParents.com help for all areas of your child s education
Set 1 The people Write it down By the water Who will make it? You and I What will they do? He called me. We had their dog. What did they say? When would you go? No way A number of people One or two How
More informationFINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS
FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS by Fatemeh Riahi Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie
More informationExploratory analysis of semantic categories: comparing data-driven and human similarity judgments
Lindh-Knuutila and Honkela Computational Cognitive Science (2015) 1:2 DOI 10.1186/s40469-015-0001-1 RESEARCH Open Access Exploratory analysis of semantic categories: comparing data-driven and human similarity
More informationGallito 2.0: a Natural Language Processing tool to support Research on Discourse
Presented in the Twenty-third Annual Meeting of the Society for Text and Discourse, Valencia from 16 to 18, July 2013 Gallito 2.0: a Natural Language Processing tool to support Research on Discourse Guillermo
More informationState Trivia. You found a shortcut! Board Game. Super Teacher Worksheets - www.superteacherworksheets.com
Super Teacher Worksheets - www.superteacherworksheets.com If a player lands on Lose a turn, they skip their next turn. A player can only cross the Route 66 shortcut if they land on the square that says,
More informationClustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
More informationA Non-Linear Schema Theorem for Genetic Algorithms
A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland
More informationGambling Commission. Trends in Gambling Behaviour 2008-2014
Gambling Commission Trends in Gambling Behaviour 2008-2014 Report Name Prepared for Authors Project team Trends in Gambling Behaviour 2008-2014 Gambling Commission Doros Georgiou Karl King Adrian Talbot
More informationModeling coherence in ESOL learner texts
University of Cambridge Computer Lab Building Educational Applications NAACL 2012 Outline 1 2 3 4 The Task: Automated Text Scoring (ATS) ATS systems Discourse coherence & cohesion The Task: Automated Text
More informationWhich of these words are connected with magic? Write them into the cauldron.
1 quick quiz Which of these words are connected with magic? Write them into the cauldron. spell professor wizard magical tropical bewitched prince potions school witchcraft marriage Hogwarts 2 What does
More informationThe more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON
The more money we have NEW INTERNATIONALIST EASIER ENGLISH INTERMEDIATE READY LESSON This lesson: 1/ Grammar 2/ Speaking: about ideas and graphs 3/ Reading: for gist and detail 4/ Writing: make a poster
More informationNouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space
Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space Marco Baroni and Roberto Zamparelli Center for Mind/Brain Sciences, University of Trento Rovereto
More informationPractical Introduction to Machine Learning and Optimization. Alessio Signorini <alessio.signorini@oneriot.com>
Practical Introduction to Machine Learning and Optimization Alessio Signorini Everyday's Optimizations Although you may not know, everybody uses daily some sort of optimization
More informationElementary Statistics
Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,
More informationUsing Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
More informationProbabilistic topic models for sentiment analysis on the Web
University of Exeter Department of Computer Science Probabilistic topic models for sentiment analysis on the Web Chenghua Lin September 2011 Submitted by Chenghua Lin, to the the University of Exeter as
More informationUsing 3D Illustration and Animation in e-learning
302 Using 3D Illustration and Animation in e-learning Sheldon Murphy, May 17 & 18, 2007 Produced by New Techniques for Designing and Example from TV 1 Question: What s the dead giveaway from the commercial
More information2. SEMANTIC RELATIONS
2. SEMANTIC RELATIONS 2.0 Review: meaning, sense, reference A word has meaning by having both sense and reference. Sense: Word meaning: = concept Sentence meaning: = proposition (1) a. The man kissed the
More informationPhonics. High Frequency Words P.008. Objective The student will read high frequency words.
P.008 Jumping Words Objective The student will read high frequency words. Materials High frequency words (P.HFW.005 - P.HFW.064) Choose target words. Checkerboard and checkers (Activity Master P.008.AM1a
More information2nd Grade Language Arts Practice Test
2nd Grade Language Arts Practice Name: Instructions: Copyright 2000-2002 Measured Progress, All Rights Reserved : 1. Which word in the paragraph below is used too much? My music teacher is nice. He plays
More informationStatistics E100 Fall 2013 Practice Midterm I - A Solutions
STATISTICS E100 FALL 2013 PRACTICE MIDTERM I - A SOLUTIONS PAGE 1 OF 5 Statistics E100 Fall 2013 Practice Midterm I - A Solutions 1. (16 points total) Below is the histogram for the number of medals won
More informationTowards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management
Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management Enrique Amigó 1, Damiano Spina 1, Bernardino Beotas 2, and Julio Gonzalo 1 1 Departamento de Lenguajes y Sistemas
More informationA Systematic Study of Semantic Vector Space Model Parameters
A Systematic Study of Semantic Vector Space Model Parameters Douwe Kiela University of Cambridge Computer Laboratory douwe.kiela@cl.cam.ac.uk Stephen Clark University of Cambridge Computer Laboratory sc609@cam.ac.uk
More informationThat s Not Fair! ASSESSMENT #HSMA20. Benchmark Grades: 9-12
That s Not Fair! ASSESSMENT # Benchmark Grades: 9-12 Summary: Students consider the difference between fair and unfair games, using probability to analyze games. The probability will be used to find ways
More informationSpazi vettoriali e misure di similaritá
Spazi vettoriali e misure di similaritá R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 April 10, 2009 Outline Outline Spazi vettoriali a valori reali Operazioni tra vettori Indipendenza Lineare
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationAnansi and the Moss- Covered Rock Retold by Eric A. Kimmel & Adapted by Whitney Peterson. Lion Scene
Anansi and the Moss- Covered Rock Retold by Eric A. Kimmel & Adapted by Whitney Peterson Narrator #1: Once upon a time Anansi the Spider was walking, walking, walking through the forest when something
More informationBasic Set Theory. 1. Motivation. Fido Sue. Fred Aristotle Bob. LX 502 - Semantics I September 11, 2008
Basic Set Theory LX 502 - Semantics I September 11, 2008 1. Motivation When you start reading these notes, the first thing you should be asking yourselves is What is Set Theory and why is it relevant?
More informationNatural Language Processing. Part 4: lexical semantics
Natural Language Processing Part 4: lexical semantics 2 Lexical semantics A lexicon generally has a highly structured form It stores the meanings and uses of each word It encodes the relations between
More informationBayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
More informationPhase 2 of the D4 Project. Helmut Schmid and Sabine Schulte im Walde
Statistical Verb-Clustering Model soft clustering: Verbs may belong to several clusters trained on verb-argument tuples clusters together verbs with similar subcategorization and selectional restriction
More informationEfficient Cluster Detection and Network Marketing in a Nautural Environment
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 jian.zhang@cs.cmu.edu Zoubin
More informationIntroduction to Information Retrieval http://informationretrieval.org
Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:
More informationRecommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1
Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components
More information270 Stages 8 Lærerveiledning Årsplan/periodeplan
270 Stages 8 Lærerveiledning Årsplan/periodeplan Chapter 1 The Stage Is Yours Talking and writing about yourself To be and to have Question words Words with ea Capitalization Noun plurals Making a graph
More informationProblem sets for BUEC 333 Part 1: Probability and Statistics
Problem sets for BUEC 333 Part 1: Probability and Statistics I will indicate the relevant exercises for each week at the end of the Wednesday lecture. Numbered exercises are back-of-chapter exercises from
More informationOutline of today s lecture
Outline of today s lecture Generative grammar Simple context free grammars Probabilistic CFGs Formalism power requirements Parsing Modelling syntactic structure of phrases and sentences. Why is it useful?
More informationPUSD High Frequency Word List
PUSD High Frequency Word List For Reading and Spelling Grades K-5 High Frequency or instant words are important because: 1. You can t read a sentence or a paragraph without knowing at least the most common.
More informationClustering of Polysemic Words
Clustering of Polysemic Words Laurent Cicurel 1, Stephan Bloehdorn 2, and Philipp Cimiano 2 1 isoco S.A., ES-28006 Madrid, Spain lcicurel@isoco.com 2 Institute AIFB, University of Karlsruhe, D-76128 Karlsruhe,
More informationUsing Machine Learning to Predict Human Brain Activity
Using Machine Learning to Predict Human Brain Activity Senior Thesis Mahtiyar Bonakdarpour Advisor: Tom Mitchell 1 Abstract Brain imaging studies are geared towards decoding the way the human brain represents
More informationPredicting the NFL Using Twitter. Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah Smith
Predicting the NFL Using Twitter Shiladitya Sinha, Chris Dyer, Kevin Gimpel, Noah Smith Disclaimer This talk will not teach you how to become a successful sports bettor. Questions What is the NFL and what
More informationDon t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
Don t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors Marco Baroni and Georgiana Dinu and Germán Kruszewski Center for Mind/Brain Sciences (University
More informationAN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS
Advances in Information Mining ISSN: 0975 3265 & E-ISSN: 0975 9093, Vol. 3, Issue 1, 2011, pp-19-25 Available online at http://www.bioinfo.in/contents.php?id=32 AN SQL EXTENSION FOR LATENT SEMANTIC ANALYSIS
More informationW. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
More informationInertia, Forces, and Acceleration: The Legacy of Sir Isaac Newton
Inertia, Forces, and Acceleration: The Legacy of Sir Isaac Newton Position is a Vector Compare A A ball is 12 meters North of the Sun God to A A ball is 10 meters from here A vector has both a direction
More informationMovie Classification Using k-means and Hierarchical Clustering
Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani
More informationA picture is worth five captions
A picture is worth five captions Learning visually grounded word and sentence representations Grzegorz Chrupała (with Ákos Kádár and Afra Alishahi) Learning word (and phrase) meanings Cross-situational
More informationFrames and Commonsense. Winston, Chapter 10
Frames and Commonsense Winston, Chapter 10 Michael Eisenberg and Gerhard Fischer TA: Ann Eisenberg AI Course, Fall 1997 Eisenberg/Fischer 1 Representations powerful ideas the representation principle:
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationHarry Potter: Lesson Plan!
Harry Potter: Lesson Plan Aims: By the end of the lesson, students will be able to know more about Harry Potter and the corresponding books. Materials: Harry Potter book covers, character list and picture
More informationCHECKLIST FOR THE DEGREE PROJECT REPORT
Kerstin Frenckner, kfrenck@csc.kth.se Copyright CSC 25 mars 2009 CHECKLIST FOR THE DEGREE PROJECT REPORT This checklist has been written to help you check that your report matches the demands that are
More informationThe Law and Harry Potter
The Law and Harry Potter Edited by Jeffrey E. Thomas Franklin G. Snyder Carolina Academic Press Durham, North Carolina Copyright 2010 Jeffrey E. Thomas Franklin G. Snyder All Rights Reserved Library of
More informationClustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
More informationFinding Similar Questions in Large Question and Answer Archives
Finding Similar Questions in Large Question and Answer Archives Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee Center for Intelligent Information Retrieval, Computer Science Department University of Massachusetts,
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationBB2798 How Playtech uses predictive analytics to prevent business outages
BB2798 How Playtech uses predictive analytics to prevent business outages Eli Eyal, Playtech Udi Shagal, HP June 13, 2013 Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained
More informationHitwise Mobile Client Release FAQ February 2013
Hitwise Mobile Client Release FAQ February 2013 About Hitwise Mobile Mobile is one of the most dynamic aspects of the Internet today. Understanding the way consumers use the Internet from mobile devices
More informationClassifying Manipulation Primitives from Visual Data
Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if
More informationSentence paraphrase detection: When determiners and word order make the difference
Sentence paraphrase detection: When determiners and word order make the difference Nghia Pham University of Trento thenghia.pham@unitn.it Yao Zhong Zhang The University of Tokyo zhangyaoz@gmail.com Raffaella
More informationFinding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia
Finding Negative Key Phrases for Internet Advertising Campaigns using Wikipedia Martin Scaiano University of Ottawa mscai056@uottawa.ca Diana Inkpen University of Ottawa diana@site.uottawa.com Abstract
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationThis image cannot currently be displayed. Course Catalog. Language Arts 600. 2016 Glynlyon, Inc.
This image cannot currently be displayed. Course Catalog Language Arts 600 2016 Glynlyon, Inc. Table of Contents COURSE OVERVIEW... 1 UNIT 1: ELEMENTS OF GRAMMAR... 3 UNIT 2: GRAMMAR USAGE... 3 UNIT 3:
More informationVocabulary Problem in Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California. fshli, danzigg@cs.usc.
Vocabulary Problem in Internet Resource Discovery Technical Report USC-CS-94-594 Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California 90089-0781
More informationPerformance Evaluation and Optimization of Math-Similarity Search
Performance Evaluation and Optimization of Math-Similarity Search Qun Zhang and Abdou Youssef Department of Computer Science The George Washington University Washington DC, 20052, USA Abstract Similarity
More informationReady, Set, Go! Math Games for Serious Minds
Math Games with Cards and Dice presented at NAGC November, 2013 Ready, Set, Go! Math Games for Serious Minds Rande McCreight Lincoln Public Schools Lincoln, Nebraska Math Games with Cards Close to 20 -
More informationProbability, statistics and football Franka Miriam Bru ckler Paris, 2015.
Probability, statistics and football Franka Miriam Bru ckler Paris, 2015 Please read this before starting! Although each activity can be performed by one person only, it is suggested that you work in groups
More informationProbabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities
Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities Masumi Shirakawa Kotaro Nakayama Takahiro Hara Shojiro Nishio Graduate School of Information Science and Technology,
More informationTopics in Computational Linguistics. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment
Topics in Computational Linguistics Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment Regina Barzilay and Lillian Lee Presented By: Mohammad Saif Department of Computer
More informationOnline Supplement to Polygenic Influence on Educational Attainment. Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using
Online Supplement to Polygenic Influence on Educational Attainment Construction of Polygenic Score for Educational Attainment Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using
More informationCommunications. NQF Level: 3 US No: 8970. Facilitator:... Company:... Commodity:... Date:...
NQF Level: 3 US No: 8970 Facilitator Guide Primary Agriculture Communications Facilitator:.......................................... Company:.......................................... Commodity:....................
More informationDiscrete Math in Computer Science Homework 7 Solutions (Max Points: 80)
Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80) CS 30, Winter 2016 by Prasad Jayanti 1. (10 points) Here is the famous Monty Hall Puzzle. Suppose you are on a game show, and you
More informationTexas Hold em. From highest to lowest, the possible five card hands in poker are ranked as follows:
Texas Hold em Poker is one of the most popular card games, especially among betting games. While poker is played in a multitude of variations, Texas Hold em is the version played most often at casinos
More informationManaging Incompleteness, Complexity and Scale in Big Data
Managing Incompleteness, Complexity and Scale in Big Data Nick Duffield Electrical and Computer Engineering Texas A&M University http://nickduffield.net/work Three Challenges for Big Data Complexity Problem:
More informationStatistical tests for SPSS
Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly
More informationSentiment Analysis for Movie Reviews
Sentiment Analysis for Movie Reviews Ankit Goyal, a3goyal@ucsd.edu Amey Parulekar, aparulek@ucsd.edu Introduction: Movie reviews are an important way to gauge the performance of a movie. While providing
More informationPS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
More informationBuzz Monitoring in Word Space
Buzz Monitoring in Word Space Magnus Sahlgren 1,2 and Jussi Karlgren 1,2 1 Gavagai AB 2 SICS (Swedish Institute of Computer Science) Box 1263, SE-16429 Kista, Sweden {mange,jussi}@gavagai.se Abstract.
More informationMean, Median, and Mode
DELTA MATH SCIENCE PARTNERSHIP INITIATIVE M 3 Summer Institutes (Math, Middle School, MS Common Core) Mean, Median, and Mode Hook Problem: To compare two shipments, five packages from each shipment were
More informationIntro to Linguistics Semantics
Intro to Linguistics Semantics Jarmila Panevová & Jirka Hana January 5, 2011 Overview of topics What is Semantics The Meaning of Words The Meaning of Sentences Other things about semantics What to remember
More informationAnalyzing Structured and Unstructured Data in Electronic Health Records
Analyzing Structured and Unstructured Data in Electronic Health Records Jing Zhao & Aron Henriksson Department of Computer and Systems Sciences (DSV) Stockholm University Electronic Health Records Administrative
More informationTF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt
TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article
More informationEmotion Detection from Speech
Emotion Detection from Speech 1. Introduction Although emotion detection from speech is a relatively new field of research, it has many potential applications. In human-computer or human-human interaction
More informationChapter ML:XI (continued)
Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained
More informationPopulation Research Seminar Series
Survey and Statistical Methods Core Population Research Seminar Series Session 4: Standardized Survey Interviewing Part 1: Controlling Interviewer Related Error Jack Fowler, PhD Center for Survey Research
More informationSentiment analysis using emoticons
Sentiment analysis using emoticons Royden Kayhan Lewis Moharreri Steven Royden Ware Lewis Kayhan Steven Moharreri Ware Department of Computer Science, Ohio State University Problem definition Our aim was
More informationWeb-Unit zu Highlight 4A, Unit 3. Worksheet zur Web-Unit California Dreaming. Exercise 1: Los Angeles
Worksheet zur Web-Unit California Dreaming Exercise 1: Los Angeles Try to answer these questions about this Californian city! Is Los Angeles the largest city in California? Where is Los Angeles? Los Angeles
More informationFall 2013 to present Assistant Professor, Department of Psychological and Brain Sciences, Johns Hopkins University
M A R I N A B E D N Y Johns Hopkins University Department of Psychological and Brain Sciences 3400 N. Charles Street, Ames Hall Baltimore, MD 21218 mbedny@mit.edu ACADEMIC POSITIONS Fall 2013 to present
More informationA Mixed Trigrams Approach for Context Sensitive Spell Checking
A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu
More informationText Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
More informationLegal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION
Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,
More information始 まりのブザーが 鳴 るまでページをめくってはいけません
2013 年 度 青 山 学 院 高 等 部 一 般 入 学 試 験 問 題 英 語 始 まりのブザーが 鳴 るまでページをめくってはいけません 下 記 の 注 意 事 項 に 目 を 通 しておいてください 問 題 用 紙 は 1 ページから 12 ページまであるので 始 まりのブザー が 鳴 ったらすぐに 確 認 すること 解 答 はすべて 別 紙 の 解 答 用 紙 に 記 入 すること とじてある
More information( ) is proportional to ( 10 + x)!2. Calculate the
PRACTICE EXAMINATION NUMBER 6. An insurance company eamines its pool of auto insurance customers and gathers the following information: i) All customers insure at least one car. ii) 64 of the customers
More informationNetwork Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
More informationHuman behavior analysis from videos using optical flow
L a b o r a t o i r e I n f o r m a t i q u e F o n d a m e n t a l e d e L i l l e Human behavior analysis from videos using optical flow Yassine Benabbas Directeur de thèse : Chabane Djeraba Multitel
More informationMathematical goals. Starting points. Materials required. Time needed
Level S2 of challenge: B/C S2 Mathematical goals Starting points Materials required Time needed Evaluating probability statements To help learners to: discuss and clarify some common misconceptions about
More informationPixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006
Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)
More informationAnnotated work sample portfolios are provided to support implementation of the Foundation Year 10 Australian Curriculum.
Work sample portfolio summary WORK SAMPLE PORTFOLIO Annotated work sample portfolios are provided to support implementation of the Foundation Year 10 Australian Curriculum. Each portfolio is an example
More information