# It s a Contradiction No, it s Not: A Case Study using Functional Relations

Size: px
Start display at page:

## Transcription

3 Of course, world knowledge, and reasoning about text, are often uncertain, which leads us to associate probabilities with a CD system s conclusions. Nevertheless, the knowledge base K is essential for CD. We now turn to a probabilistic model that helps us simultaneously estimate the functionality of relations (B in the above example) and ambiguity of argument values (A above). Section 4 describes the remaining components of AUCONTRAIRE. 3 Detecting Functionality and Ambiguity This section introduces a formal model for computing the probability that a phrase denotes a function based on a set of extracted tuples. An extracted tuple takes the form R(x, y) where (roughly) x is the subject of a sentence, y is the object, and R is a phrase denoting the relationship between them. If the relation denoted by R is functional, then typically the object y is a function of the subject x. Thus, our discussion focuses on this possibility, though the analysis is easily extended to the symmetric case. Logically, a relation R is functional in a variable x if it maps it to a unique variable y: x, y 1, y 2 R(x, y 1 ) R(x, y 2 ) y 1 = y 2. Thus, given a large random sample of ground instances of R, we could detect with high confidence whether R is functional. In text, the situation is far more complex due to ambiguity, polysemy, synonymy, and other linguistic phenomena. Deciding whether R is functional becomes a probabilistic assessment based on aggregated textual evidence. The main evidence that a relation R(x, y) is functional comes from the distribution of y values for a given x value. If R denotes a function and x is unambiguous, then we expect the extractions to be predominantly a single y value, with a few outliers due to noise. We aggregate the evidence that R is locally functional for a particular x value to assess whether R is globally functional for all x. We refer to a set of extractions with the same relation R and argument x as a contradiction set R(x, ). Figure 1 shows three example contradiction sets. Each example illustrates a situation commonly found in our data. Example A in Figure 1 shows strong evidence for a functional relation. 66 out of 70 TEXTRUNNER extractions for was born in (Mozart, PLACE) have the same y value. An ambiguous x argument, however, can make a functional relation appear non-functional. Example B depicts a distribution of y values that appears less functional due to the fact that John Adams refers to multiple, distinct real-world individuals with that name. Finally, example C exhibits evidence for a non-functional relation. A. was born in(mozart, PLACE): Salzburg(66), Germany(3), Vienna(1) B. was born in(john Adams, PLACE): Braintree(12), Quincy(10), Worcester(8) C. lived in(mozart, PLACE): Vienna(20), Prague(13), Salzburg(5) Figure 1: Functional relations such as example A have a different distribution of y values than non-functional relations such as C. However, an ambiguous x argument as in B, can make a functional relation appear non-functional. 3.1 Formal Model of Functions in Text To decide whether R is functional in x for all x, we first consider how to detect whether R is locally functional for a particular value of x. The local functionality of R with respect to x is the probability that R is functional estimated solely on evidence from the distribution of y values in a contradiction set R(x, ). To decide the probability that R is a function, we define global functionality as the average local functionality score for each x, weighted by the probability that x is unambiguous. Below, we outline an EMstyle algorithm that alternately estimates the probability that R is functional and the probability that x is ambiguous. Let Rx indicate the event that the relation R is locally functional for the argument x, and that x is locally unambiguous for R. Also, let D indicate the set of observed tuples, and define D R(x, ) as the multi-set containing the frequencies for extractions of the form R(x, ). For example the distribution of extractions from Figure 1 for example A is D was born in(mozart, ) = {66, 3, 1}. Let θ f R be the probability that R(x, ) is locally functional for a random x, and let Θ f be the vector of these parameters across all relations R. Likewise, θx u represents the probability that x is locally unambiguous for random R, and Θ u the vector for all x.

4 We wish to determine the maximum a posteriori (MAP) functionality and ambiguity parameters given the observed data D, that is arg max Θ f,θ u P (Θf, Θ u D). By Bayes Rule: P (Θ f, Θ u D) = P (D Θf, Θ u )P (Θ f, Θ u ) P (D) (1) We outline a generative model for the data, P (D Θ f, Θ u ). Let us assume that the event R x depends only on θ f R and θu x, and further assume that given these two parameters, local ambiguity and local functionality are conditionally independent. We obtain the following expression for the probability of R x given the parameters: P (R x Θ f, Θ u ) = θ f R θu x We assume each set of data D R(x, ) is generated independently of all other data and parameters, given R x. From this and the above we have: P (D Θ f, Θ u ) = R,x ( P (D R(x, ) R x)θ f R θu x ) +P (D R(x, ) Rx)(1 θ f R θu x) (2) These independence assumptions allow us to express P (D Θ f, Θ u ) in terms of distributions over D R(x, ) given whether or not Rx holds. We use the URNS model as described in (Downey et al., 2005) to estimate these probabilities based on binomial distributions. In the single-urn URNS model that we utilize, the extraction process is modeled as draws of labeled balls from an urn, where the labels are either correct extractions or errors, and different labels can be repeated on varying numbers of balls in the urn. Let k = max D R(x, ), and let n = D R(x, ) ; we will approximate the distribution over D R(x, ) in terms of k and n. If R(x, ) is locally functional and unambiguous, there is exactly one correct extraction label in the urn (potentially repeated multiple times). Because the probability of correctness tends to increase with extraction frequency, we make the simplifying assumption that the most frequently extracted element is correct. 3 In this case, k is the number of correct extractions, which by the 3 As this assumption is invalid when there is not a unique maximal element, we default to the prior P (R x) in that case. URNS model has a binomial distribution with parameters n and p, where p is the precision of the extraction process. If R(x, ) is not locally functional and unambiguous, then we expect k to typically take on smaller values. Empirically, the underlying frequency of the most frequent element in the Rx case tends to follow a Beta distribution. Under the model, the probability of the evidence given Rx is: ( ) n P (D R(x, ) Rx) P (k, n Rx) = p k (1 p) n k k And the probability of the evidence given R x is: P (D R(x, ) R x) P (k, n R x) = ( n k) 1 0 = p k+α f 1 (1 p ) n+β f 1 k B(α f,β f ) dp ( n ) k Γ(n k + βf )Γ(α f + k) B(α f, β f )Γ(α f + β f + n) (3) where n is the sum over D R(x, ), Γ is the Gamma function and B is the Beta function. α f and β f are the parameters of the Beta distribution for the R x case. These parameters and the prior distributions are estimated empirically, based on a sample of the data set of relations described in Section Estimating Functionality and Ambiguity Substituting Equation 3 into Equation 2 and applying an appropriate prior gives the probability of parameters Θ f and Θ u given the observed data D. However, Equation 2 contains a large product of sums with two independent vectors of coefficients, Θ f and Θ u making it difficult to optimize analytically. If we knew which arguments were ambiguous, we would ignore them in computing the functionality of a relation. Likewise, if we knew which relations were non-functional, we would ignore them in computing the ambiguity of an argument. Instead, we initialize the Θ f and Θ u arrays randomly, and then execute an algorithm similar to Expectation- Maximization (EM) (Dempster et al., 1977) to arrive at a high-probability setting of the parameters. Note that if Θ u is fixed, we can compute the expected fraction of locally unambiguous arguments x for which R is locally functional, using D R(x, ) and

8 Functionality Ambiguity Precision AuContraire No Iteration Precision AuContraire No Iteration Recall Figure 3: After 5 iterations of EM, AUCONTRAIRE achieves a 19% boost to area under the precision-recall curve (AUC) for functionality detection, and a 31% boost to AUC for ambiguity detection. Recall Precision Web Distribution Balanced Data Recall Figure 4: Performance of AUCONTRAIRE at distinguishing genuine contradictions from false positives. The bold line is results on the actual distribution of data from the Web. The dashed line is from a data set constructed to have 50% positive and 50% negative instances. 5.4 Contribution of Knowledge Sources We carried out an ablation study to quantify how much each knowledge source contributes to AU- CONTRAIRE s performance. Since most of the knowledge sources do not apply to numeric argument values, we excluded the extractions where y is a number in this study. As shown in Figure 5, performance of AUCONTRAIRE degrades with no knowledge of synonyms (NS), with no knowledge of meronyms (NM), and especially without argument typing (NT). Conversely, improvements to any of these three components would likely improve the performance of AUCONTRAIRE. The relatively small drop in performance from no meronyms does not indicate that meronyms are not essential to our task, only that our knowledge sources for meronyms were not as useful as we hoped. The Tipster Gazetteer has surprisingly low coverage for our data set. It contains only 41% of the y values that are locations. Many of these are matches on a different location with the same name, which results in incorrect meronym information. We estimate that a gazetteer with complete coverage would increase area under the curve by approximately 40% compared to a system with meronyms from the Tipster Gazetteer and WordNet Percentage AUC AuContraire NS NM NT Figure 5: Area under the precision-recall curve for the full AUCONTRAIRE and for AUCONTRAIRE with knowledge removed. NS has no synonym knowledge; NM has no meronym knowledge; NT has no argument typing. To analyze the errors made by AUCONTRAIRE, we hand-labeled all false-positives at the point of maximum F-score: 29% Recall and 48% Precision.

10 References M. Banko and O. Etzioni The tradeoffs between traditional and open relation extraction. In Proceedings of ACL. M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni Open information extraction from the Web. In Procs. of IJCAI. W.W. Cohen, P. Ravikumar, and S.E. Fienberg A comparison of string distance metrics for namematching tasks. In IIWeb. Cleo Condoravdi, Dick Crouch, Valeria de Paiva, Reinhard Stolle, and Daniel G. Bobrow Entailment, intensionality and text understanding. In Proceedings of the HLT-NAACL 2003 workshop on Text meaning, pages 38 45, Morristown, NJ, USA. Association for Computational Linguistics. I. Dagan, O. Glickman, and B. Magnini The PASCAL Recognising Textual Entailment Challenge. Proceedings of the PASCAL Challenges Workshop on Recognising Textual Entailment, pages 1 8. Marie-Catherine de Marneffe, Anna Rafferty, and Christopher D. Manning Finding contradictions in text. In ACL A.P. Dempster, N.M. Laird, and D.B. Rubin Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B, 39(1):1 38. D. Downey, O. Etzioni, and S. Soderland A Probabilistic Model of Redundancy in Information Extraction. In Procs. of IJCAI. Sanda Harabagiu, Andrew Hickl, and Finley Lacatusu Negation, contrast and contradiction in text processing. In AAAI. Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2): B. MacCartney and C.D. Manning Natural Logic for Textual Inference. In Workshop on Textual Entailment and Paraphrasing. Ellen M. Voorhees Contradictions and justifications: Extensions to the textual entailment task. In Proceedings of ACL-08: HLT, pages 63 71, Columbus, Ohio, June. Association for Computational Linguistics. Hong Yao and Howard J. Hamilton Mining functional dependencies from data. Data Min. Knowl. Discov., 16(2): A. Yates and O. Etzioni Unsupervised resolution of objects and relations on the Web. In Procs. of HLT.

### What Is This, Anyway: Automatic Hypernym Discovery

What Is This, Anyway: Automatic Hypernym Discovery Alan Ritter and Stephen Soderland and Oren Etzioni Turing Center Department of Computer Science and Engineering University of Washington Box 352350 Seattle,

### 2. REPORT TYPE Final Technical Report

REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704 0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,

### Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

### Generatin Coherent Event Schemas at Scale

Generatin Coherent Event Schemas at Scale Niranjan Balasubramanian, Stephen Soderland, Mausam, Oren Etzioni University of Washington Presented By: Jumana Table of Contents: 1 Introduction 2 System Overview

### Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### The Role of Sentence Structure in Recognizing Textual Entailment

Blake,C. (In Press) The Role of Sentence Structure in Recognizing Textual Entailment. ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, Czech Republic. The Role of Sentence Structure

### Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata

Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive

### Learning First-Order Horn Clauses from Web Text

Learning First-Order Horn Clauses from Web Text Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld Turing Center University of Washington Computer Science and Engineering Box 352350 Seattle, WA 98125,

### Experiments in Web Page Classification for Semantic Web

Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address

### Statistical Machine Translation: IBM Models 1 and 2

Statistical Machine Translation: IBM Models 1 and 2 Michael Collins 1 Introduction The next few lectures of the course will be focused on machine translation, and in particular on statistical machine translation

### Detecting Parser Errors Using Web-based Semantic Filters

Detecting Parser Errors Using Web-based Semantic Filters Alexander Yates Stefan Schoenmackers University of Washington Computer Science and Engineering Box 352350 Seattle, WA 98195-2350 Oren Etzioni {ayates,

### Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

### Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

1 Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity David Nadeau 1,2, Peter D. Turney 1 and Stan Matwin 2,3 1 Institute for Information Technology National Research Council

### Phrase-Based Translation Models

Phrase-Based Translation Models Michael Collins April 10, 2013 1 Introduction In previous lectures we ve seen IBM translation models 1 and 2. In this note we will describe phrasebased translation models.

### Web Search Engines: Solutions

Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### Divide-n-Discover Discretization based Data Exploration Framework for Healthcare Analytics

for Healthcare Analytics Si-Chi Chin,KiyanaZolfaghar,SenjutiBasuRoy,AnkurTeredesai,andPaulAmoroso Institute of Technology, The University of Washington -Tacoma,900CommerceStreet,Tacoma,WA980-00,U.S.A.

### Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

### Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy

The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval

### Discovering process models from empirical data

Discovering process models from empirical data Laura Măruşter (l.maruster@tm.tue.nl), Ton Weijters (a.j.m.m.weijters@tm.tue.nl) and Wil van der Aalst (w.m.p.aalst@tm.tue.nl) Eindhoven University of Technology,

### Guido Sciavicco. 11 Novembre 2015

classical and new techniques Università degli Studi di Ferrara 11 Novembre 2015 in collaboration with dr. Enrico Marzano, CIO Gap srl Active Contact System Project 1/27 Contents What is? Embedded Wrapper

### Binary and Ranked Retrieval

Binary and Ranked Retrieval Binary Retrieval RSV(d i,q j ) {0,1} Does not allow the user to control the magnitude of the output. In fact, for a given query, the system may return under-dimensioned output

### Open Information Extraction from the Web

Open Information Extraction from the Web Michele Banko, Michael J Cafarella, Stephen Soderland, Matt Broadhead and Oren Etzioni Turing Center Department of Computer Science and Engineering University of

### Identifying Focus, Techniques and Domain of Scientific Papers

Identifying Focus, Techniques and Domain of Scientific Papers Sonal Gupta Department of Computer Science Stanford University Stanford, CA 94305 sonal@cs.stanford.edu Christopher D. Manning Department of

### Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

Finding Contradictions in Text Marie-Catherine de Marneffe, Linguistics Department Stanford University Stanford, CA 94305 mcdm@stanford.edu Anna N. Rafferty and Christopher D. Manning Computer Science

### Learning Lexical Clusters in Children s Books. Edmond Lau 6.xxx Presentation May 12, 2004

Learning Lexical Clusters in Children s Books Edmond Lau 6.xxx Presentation May 12, 2004 1 Vision Children learn word patterns from repetition Cinderella s glass slippers -> Barbie s plastic slippers How

### Language Modeling. Chapter 1. 1.1 Introduction

Chapter 1 Language Modeling (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction In this chapter we will consider the the problem of constructing a language model from a set

### Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

### 2. Methods of Proof Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try.

2. METHODS OF PROOF 69 2. Methods of Proof 2.1. Types of Proofs. Suppose we wish to prove an implication p q. Here are some strategies we have available to try. Trivial Proof: If we know q is true then

### Robust Question Answering for Speech Transcripts: UPC Experience in QAst 2009

Robust Question Answering for Speech Transcripts: UPC Experience in QAst 2009 Pere R. Comas and Jordi Turmo TALP Research Center Technical University of Catalonia (UPC) {pcomas,turmo}@lsi.upc.edu Abstract

### A Key Obstacle to Implementing a Patent Strategy and One Way to Overcome It

Your Pacific Northwest Law Firm A Key Obstacle to Implementing a Patent Strategy and One Way to Overcome It Many articles have been written that discuss the importance of having a patent strategy and why

### Paraphrasing controlled English texts

Paraphrasing controlled English texts Kaarel Kaljurand Institute of Computational Linguistics, University of Zurich kaljurand@gmail.com Abstract. We discuss paraphrasing controlled English texts, by defining

### Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA

Age to Age Factor Selection under Changing Development Chris G. Gross, ACAS, MAAA Introduction A common question faced by many actuaries when selecting loss development factors is whether to base the selected

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

### 99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

### Enterprise Discovery Best Practices

Enterprise Discovery Best Practices 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

### Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

### Imputing Missing Data using SAS

ABSTRACT Paper 3295-2015 Imputing Missing Data using SAS Christopher Yim, California Polytechnic State University, San Luis Obispo Missing data is an unfortunate reality of statistics. However, there are

### Worked examples Basic Concepts of Probability Theory

Worked examples Basic Concepts of Probability Theory Example 1 A regular tetrahedron is a body that has four faces and, if is tossed, the probability that it lands on any face is 1/4. Suppose that one

### Chinese Open Relation Extraction for Knowledge Acquisition

Chinese Open Relation Extraction for Knowledge Acquisition Yuen-Hsien Tseng 1, Lung-Hao Lee 1,2, Shu-Yen Lin 1, Bo-Shun Liao 1, Mei-Jun Liu 1, Hsin-Hsi Chen 2, Oren Etzioni 3, Anthony Fader 4 1 Information

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### The English Baccalaureate Exam Specifications All Streams

ثانية INTRODUCTION The English Baccalaureate Exam Specifications All Streams The Baccalaureate English exam is a final written achievement test based on the standards set forth for the teaching of English

### Natural Language Updates to Databases through Dialogue

Natural Language Updates to Databases through Dialogue Michael Minock Department of Computing Science Umeå University, Sweden Abstract. This paper reopens the long dormant topic of natural language updates

### ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS

ANALYSIS OF LEXICO-SYNTACTIC PATTERNS FOR ANTONYM PAIR EXTRACTION FROM A TURKISH CORPUS Gürkan Şahin 1, Banu Diri 1 and Tuğba Yıldız 2 1 Faculty of Electrical-Electronic, Department of Computer Engineering

### Globally Optimal Crowdsourcing Quality Management

Globally Optimal Crowdsourcing Quality Management Akash Das Sarma Stanford University akashds@stanford.edu Aditya G. Parameswaran University of Illinois (UIUC) adityagp@illinois.edu Jennifer Widom Stanford

### Analysis and Synthesis of Help-desk Responses

Analysis and Synthesis of Help-desk s Yuval Marom and Ingrid Zukerman School of Computer Science and Software Engineering Monash University Clayton, VICTORIA 3800, AUSTRALIA {yuvalm,ingrid}@csse.monash.edu.au

### A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

### SEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE

SEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE Lakshmish Ramaswamy & I. Budak Arpinar Dept. of Computer Science, University of Georgia laks@cs.uga.edu, budak@cs.uga.edu

### ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

### A Few Basics of Probability

A Few Basics of Probability Philosophy 57 Spring, 2004 1 Introduction This handout distinguishes between inductive and deductive logic, and then introduces probability, a concept essential to the study

### 3. Logical Reasoning in Mathematics

3. Logical Reasoning in Mathematics Many state standards emphasize the importance of reasoning. We agree disciplined mathematical reasoning is crucial to understanding and to properly using mathematics.

### How I won the Chess Ratings: Elo vs the rest of the world Competition

How I won the Chess Ratings: Elo vs the rest of the world Competition Yannis Sismanis November 2010 Abstract This article discusses in detail the rating system that won the kaggle competition Chess Ratings:

### Machine Learning Final Project Spam Email Filtering

Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE

### Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

### The Analysis of Online Communities using Interactive Content-based Social Networks

The Analysis of Online Communities using Interactive Content-based Social Networks Anatoliy Gruzd Graduate School of Library and Information Science, University of Illinois at Urbana- Champaign, agruzd2@uiuc.edu

### Cross-Lingual Concern Analysis from Multilingual Weblog Articles

Cross-Lingual Concern Analysis from Multilingual Weblog Articles Tomohiro Fukuhara RACE (Research into Artifacts), The University of Tokyo 5-1-5 Kashiwanoha, Kashiwa, Chiba JAPAN http://www.race.u-tokyo.ac.jp/~fukuhara/

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### Connections to the Common Core State Standards for Literacy in Middle and High School Science

Connections to the Common Core State Standards for Literacy in Middle and High School Science This document is based on the Connections to the Common Core State Standards for Literacy in Science and Technical

### Chapter 6. The stacking ensemble approach

82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

### 6.4 Normal Distribution

Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

### Hypothesis Testing Level I Quantitative Methods. IFT Notes for the CFA exam

Hypothesis Testing 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 3 2. Hypothesis Testing... 3 3. Hypothesis Tests Concerning the Mean... 10 4. Hypothesis Tests

### 3 Paraphrase Acquisition. 3.1 Overview. 2 Prior Work

Unsupervised Paraphrase Acquisition via Relation Discovery Takaaki Hasegawa Cyberspace Laboratories Nippon Telegraph and Telephone Corporation 1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan hasegawa.takaaki@lab.ntt.co.jp

### Bayesian probability theory

Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations

### Quine on truth by convention

Quine on truth by convention March 8, 2005 1 Linguistic explanations of necessity and the a priori.............. 1 2 Relative and absolute truth by definition.................... 2 3 Is logic true by convention?...........................

### Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment

2009 10th International Conference on Document Analysis and Recognition Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment Ahmad Abdulkader Matthew R. Casey Google Inc. ahmad@abdulkader.org

### Natural Language to Relational Query by Using Parsing Compiler

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

### Explaining Keynes Theory of Consumption, and Assessing its Strengths and Weaknesses.

Explaining Keynes Theory of Consumption, and Assessing its Strengths and Weaknesses. The concept of consumption is one that varies between the academic community, governments, and between individuals.

### Student Performance Q&A:

Student Performance Q&A: AP Calculus AB and Calculus BC Free-Response Questions The following comments on the free-response questions for AP Calculus AB and Calculus BC were written by the Chief Reader,

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### Topic #6: Hypothesis. Usage

Topic #6: Hypothesis A hypothesis is a suggested explanation of a phenomenon or reasoned proposal suggesting a possible correlation between multiple phenomena. The term derives from the ancient Greek,

### Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources

Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle

### Stanford s Distantly-Supervised Slot-Filling System

Stanford s Distantly-Supervised Slot-Filling System Mihai Surdeanu, Sonal Gupta, John Bauer, David McClosky, Angel X. Chang, Valentin I. Spitkovsky, Christopher D. Manning Computer Science Department,

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Projektgruppe. Categorization of text documents via classification

Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

### A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE

STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then

### INTERNATIONAL STANDARD ON ASSURANCE ENGAGEMENTS 3000 ASSURANCE ENGAGEMENTS OTHER THAN AUDITS OR REVIEWS OF HISTORICAL FINANCIAL INFORMATION CONTENTS

INTERNATIONAL STANDARD ON ASSURANCE ENGAGEMENTS 3000 ASSURANCE ENGAGEMENTS OTHER THAN AUDITS OR REVIEWS OF HISTORICAL FINANCIAL INFORMATION (Effective for assurance reports dated on or after January 1,

### Beating the MLB Moneyline

Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### THE MULTINOMIAL DISTRIBUTION. Throwing Dice and the Multinomial Distribution

THE MULTINOMIAL DISTRIBUTION Discrete distribution -- The Outcomes Are Discrete. A generalization of the binomial distribution from only 2 outcomes to k outcomes. Typical Multinomial Outcomes: red A area1

### International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 5 INTELLIGENT MULTIDIMENSIONAL DATABASE INTERFACE Mona Gharib Mohamed Reda Zahraa E. Mohamed Faculty of Science,

### Math 3000 Section 003 Intro to Abstract Math Homework 2

Math 3000 Section 003 Intro to Abstract Math Homework 2 Department of Mathematical and Statistical Sciences University of Colorado Denver, Spring 2012 Solutions (February 13, 2012) Please note that these

### Recurrent Neural Learning for Classifying Spoken Utterances

Recurrent Neural Learning for Classifying Spoken Utterances Sheila Garfield and Stefan Wermter Centre for Hybrid Intelligent Systems, University of Sunderland, Sunderland, SR6 0DD, United Kingdom e-mail:

### Principle of Data Reduction

Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

### Conditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Conditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Know the definitions of conditional probability and independence

### 3. Mathematical Induction

3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

### Report of for Chapter 2 pretest

Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every

### FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

FE670 Algorithmic Trading Strategies Lecture 6. Portfolio Optimization: Basic Theory and Practice Steve Yang Stevens Institute of Technology 10/03/2013 Outline 1 Mean-Variance Analysis: Overview 2 Classical

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 6: Models and Patterns Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Models vs. Patterns Models A model is a high level, global description of a

### Evolutionary Data Mining With Automatic Rule Generalization

Evolutionary Data Mining With Automatic Rule Generalization ROBERT CATTRAL, FRANZ OPPACHER, DWIGHT DEUGO Intelligent Systems Research Unit School of Computer Science Carleton University Ottawa, On K1S

### UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee

UNDERSTANDING THE EFFECTIVENESS OF BANK DIRECT MARKETING Tarun Gupta, Tong Xia and Diana Lee 1. Introduction There are two main approaches for companies to promote their products / services: through mass

### Chapter ML:XI (continued)

Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained