The confidence-probability semiring Günther J. Wirsching Markus Huber Christian Kölbl

Similar documents

You know from calculus that functions play a fundamental role in mathematics.

U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, Notes on Algebra

Mathematics Review for MS Finance Students

Linear Maps. Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 5, 2007)

Mathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson

The Basics of Graphical Models

University of Lille I PC first year list of exercises n 7. Review

11 Ideals Revisiting Z

Section IV.21. The Field of Quotients of an Integral Domain

COMMUTATIVITY DEGREES OF WREATH PRODUCTS OF FINITE ABELIAN GROUPS

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Degree Hypergroupoids Associated with Hypergraphs

MATH 22. THE FUNDAMENTAL THEOREM of ARITHMETIC. Lecture R: 10/30/2003

How To Prove The Dirichlet Unit Theorem

A CONSTRUCTION OF THE UNIVERSAL COVER AS A FIBER BUNDLE

SOME PROPERTIES OF FIBER PRODUCT PRESERVING BUNDLE FUNCTORS

Notes on Factoring. MA 206 Kurt Bryan

SOLUTIONS TO ASSIGNMENT 1 MATH 576

How To Know If A Domain Is Unique In An Octempo (Euclidean) Or Not (Ecl)

Types of Degrees in Bipolar Fuzzy Graphs

Chapter 13: Basic ring theory

Adaptive Online Gradient Descent

Quotient Rings and Field Extensions

1 Solving LPs: The Simplex Algorithm of George Dantzig

Lecture 13 - Basic Number Theory.

Sets of Fibre Homotopy Classes and Twisted Order Parameter Spaces

EMBEDDING DEGREE OF HYPERELLIPTIC CURVES WITH COMPLEX MULTIPLICATION

No: Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics

Group Theory. Contents

Row Ideals and Fibers of Morphisms

4. CLASSES OF RINGS 4.1. Classes of Rings class operator A-closed Example 1: product Example 2:

Tree-representation of set families and applications to combinatorial decompositions

Linear Programming Notes V Problem Transformations

ON GENERALIZED RELATIVE COMMUTATIVITY DEGREE OF A FINITE GROUP. A. K. Das and R. K. Nath

Mathematical finance and linear programming (optimization)

Cartesian Products and Relations

NOTES ON CATEGORIES AND FUNCTORS

Homework until Test #2

6.2 Permutations continued

SOLVING EQUATIONS WITH EXCEL

The last three chapters introduced three major proof techniques: direct,

1 Business Modeling. 1.1 Event-driven Process Chain (EPC) Seite 2

ADDITIVE GROUPS OF RINGS WITH IDENTITY

(a) Write each of p and q as a polynomial in x with coefficients in Z[y, z]. deg(p) = 7 deg(q) = 9

11 Multivariate Polynomials

Left-Handed Completeness

Working Paper Series

CHAPTER 2 Estimating Probabilities

3 Monomial orders and the division algorithm

Introduction to Modern Algebra

Math 55: Discrete Mathematics

4: SINGLE-PERIOD MARKET MODELS

Let H and J be as in the above lemma. The result of the lemma shows that the integral

INTRODUCTORY SET THEORY

Kolmogorov Complexity and the Incompressibility Method

CHAPTER 3. Methods of Proofs. 1. Logical Arguments and Formal Proofs

Linear Algebra I. Ronald van Luijk, 2012

Chapter 7: Products and quotients

3. Mathematical Induction

LEARNING OBJECTIVES FOR THIS CHAPTER

Assignment 8: Selected Solutions

Notes on finite group theory. Peter J. Cameron

α = u v. In other words, Orthogonal Projection

Surface bundles over S 1, the Thurston norm, and the Whitehead link

A New Generic Digital Signature Algorithm

4.1 Modules, Homomorphisms, and Exact Sequences

Computing divisors and common multiples of quasi-linear ordinary differential equations

1 Sets and Set Notation.

Discuss the size of the instance for the minimum spanning tree problem.

Basic Concepts of Point Set Topology Notes for OU course Math 4853 Spring 2011

What is Linear Programming?

NOTES ON LINEAR TRANSFORMATIONS

Khair Eddin Sabri and Ridha Khedri

GROUPS WITH TWO EXTREME CHARACTER DEGREES AND THEIR NORMAL SUBGROUPS

Jieh Hsiang Department of Computer Science State University of New York Stony brook, NY 11794

FINITE DIMENSIONAL ORDERED VECTOR SPACES WITH RIESZ INTERPOLATION AND EFFROS-SHEN S UNIMODULARITY CONJECTURE AARON TIKUISIS

1 Homework 1. [p 0 q i+j p i 1 q j+1 ] + [p i q j ] + [p i+1 q j p i+j q 0 ]

ALMOST COMMON PRIORS 1. INTRODUCTION

How To Understand The Theory Of Media Theory

How To Find Out How To Calculate A Premeasure On A Set Of Two-Dimensional Algebra

Recall that two vectors in are perpendicular or orthogonal provided that their dot

On an anti-ramsey type result

Applied Algorithm Design Lecture 5

What Is School Mathematics?

GROUPS ACTING ON A SET

Chapter ML:IV. IV. Statistical Learning. Probability Basics Bayes Classification Maximum a-posteriori Hypotheses

= = 3 4, Now assume that P (k) is true for some fixed k 2. This means that

Notes from Week 1: Algorithms for sequential prediction

MATHEMATICAL INDUCTION. Mathematical Induction. This is a powerful method to prove properties of positive integers.

SMALL SKEW FIELDS CÉDRIC MILLIET

3. Prime and maximal ideals Definitions and Examples.

CPC/CPA Hybrid Bidding in a Second Price Auction

Indiana State Core Curriculum Standards updated 2009 Algebra I

Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.

Transcription:

UNIVERSITÄT AUGSBURG The confidence-probability semiring Günther J. Wirsching Markus Huber Christian Kölbl Report 2010-04 April 2010

INSTITUT FÜR INFORMATIK D-86135 AUGSBURG 2

Copyright c Günther J. Wirsching, Markus Huber, Christian Kölbl Institut für Informatik Universität Augsburg D 86135 Augsburg, Germany http://www.informatik.uni-augsburg.de all rights reserved

1 Introduction The confidence-probability semiring serves as a modeling of uncertain information. Possible applications are the representation of dialogue systems and weighted automata [2] with the explicit use of confidence-numbers. The confidence will be modeled using the max-plus-algebra, which is also called arctic semiring: Max = R + {± }, Max, 0 Max, Max, 1 Max ) = { } [0, ], max,, +, 0), The probabilities are modeled by the min-plus-algebra, which is also known by tropical semiring: Min = R + { }, Min, 0 Min, Min, 1 Min ) = [0, ], min,, +, 0). It will be shown below that each of these algebras is isomorphic to a semiring whose elements can be interpreted as probabilities. The confidence-probability semiring will be constructed as some kind of semidirect product of the arctic and the tropical semiring. The analogy to the semidirect product in group theory is that the operations on the product are not just component-wise operations on the underlying semirings, as it would be on the direct product, but an additional operation of the one semiring on the other semiring by means of semiring endomorphisms is used. In this case we define an operation of the max-plus-algebra on the min-plus-algebra. The two operations and will be construced in a way that allows for information theoretical interpretation. The -addition combines information from incompatible sources, e. g., different results from an N-best speech recognizer decoding one utterance. The -operation consequently leads to a decision between two confidence-probability pairs: the one with the higher confidence is kept, and in case of equal confidences the one with the higher probability is kept. In contrast, the -operation is to be interpreted as fusion of two confidence-probability-pairs: the confidences are summed up, and the probabilities of the product can be interpreted towards the limit of high confidences as a Bayesian Posterior if one perceives the initial probabilities as 1

Bayesian Prior and Likelihood. On the other hand, if one of the confidenceprobability pairs has low confidence, the other one is barely changed by fusion. This construction of the operations and have one downer: In order to have semiring structure on the product, it is necessary to exclude from the max-plus-algebra, because otherwise the distributive law could be broken. The probabilistic semiring To admit a probabilistic interpretation, we consider the probabilistic semiring P = [0, 1] { }, max,,, 0), whose multiplication is the probabilistic sum p q := p + q pq. The following mapping will be important below: µ : Max P, µx) := 1 e x, µ ) =. It is a semiring-isomorphism, because it is obviously bijective, and it fulfills and µmax{x, y}) = 1 e max{x,y} = 1 min { e x, e y} = max{µx), µy)}, µ ) =, µx + y) = 1 e x+y) = 1 e x e y = 1 e x) + 1 e y) 1 e x) 1 e y) = 1 e x) 1 e y) = µx) µy) µ0) = 0. So, the probabilistic semiring is isomorphic to the max-plus-algebra. 2

The Bayesian-Semiring and the Min-Plus-Algebra With the structure H = [0, 1], max, 0,, 1), another semiring is given which we call Bayesian-Semiring because its multiplication appears to be well-suited for modeling the transition from a Bayesian Prior to the Posterior. The letter H is adopted to avoid confusion with the Boolean Semiring. In speech recognition devices, in general negative logarithms of probabilities, so called log-probs, are used in stead of probabilities. These result from normal probabilities through the mapping: l : [0, 1] [0, ], lp) := log p, l0) :=. It follows from the fact that l is order preserving, and from the functional equation of the logarithm, that l is a semiring-isomorphism from the Bayesian-Semiring to the min-plus-algebra, l : H = Min. The operation of the probabilistic semiring on the Bayesian semiring In his diploma thesis [1], Huber considers pairs c, π), consisting of a confidence c [0, 1] and a probability vector taken from the open standard simplex { n + := π = p 1,..., p n ) n + p 1,..., p n ) R n all p i > 0, and } n p i = 1 i=1 to model uncertain information. Therein it is postulated that there will not occur zero-probabilities because it has technical advantages on the one hand and no apparent practical disadvantages on the other hand particularly in practical applications in speech recognition as considered by Huber, it is usefull to allow for a certain amount of uncertainity at all times. 3

Huber constructs, based on the ideas of Römer and Wirsching, an update-function as binary operation over the set S n := [0, 1] n +. To define, first consider the standardization-function { N : R + ) n \ {0} n := N p 1,..., p n ) := p 1,..., p n ) R + ) n n i=1 p i = 1 1 p 1 +... + p n p 1,..., p n ), }, and define c, π) s, λ) := c s, N π c λ s ) 1/c s))). 1) Here the probabilistic sum c s = c + s cs shows up again, and exponation and multiplication of a probability vector are defined component-wise: π c λ s := p c 1 λs 1,..., pc nλ s n). The update-function should model the fusion of two pieces of information to the same topic, each consisting of a confidence and a probability vector. It fulfills the following set of postulations: 1. is continuous as a function ]0, 1] n +) ]0, 1] n + ) S n 2. is commutative: the order should not play a role by the fusion of pieces of information. 3. is associative: it should be unimportant in which way the pieces of information are combined. 4. is isotone in the first component: the confidence does not fall if new information is added but the probability-vector could become more disordered, which means its entropy may increase. 5. For all π, λ n +, it should hold that 1, π) 1, λ) = 1, N πλ)): Towards the limit of maximal confidence, the fusion should correspond to the transition of a Bayesian Prior to the Posterior. 4

6. For all π, λ n +, it should hold that 0, π) s, λ) = s, λ): Towards the limit of an update without confidence, the probabilityvector should not change. Now the idea is to construct an operation of the probabilistic semiring on the Bayesian-semiring based on this update function. Therefore consider the following mapping: α : ]0, 1] H H, αc, p) := c p := p c, 2) which has the properties for all c ]0, 1] and arbitrary p, q [0, 1]): 2c min{p, q} = min{c p, c q}, c 0 = 0, c p q) = c p c q, c 1 = 1. These two equations show that, at least in case 0 < c 1, the mapping c is a semiring endomorphism H c H. Setting additionally { 0 if p = 0, 20 : [0, 1] [0, 1], 0 p := 1 if p > 0, { 0 if p < 1, ) : [0, 1] [0, 1], ) p := 1 if p = 1, we get, for arbitrary p, q [0, 1], the properties 0 min{p, q} = min{0 p, 0 q}, 0 pq) = 0 p 0 q, ) min{p, q} = min{ ) p, ) q}, ) pq) = ) p ) q. 3) Altogether for each c [0, 1] { } the mapping c semiring-endomorphism. : H H is a 5

Operation from the Max-Plus-Algebra on the Min- Plus-Algebra The operation of the probabilistic semiring on the Bayesian semiring given by 2) and 3) can be seen as an operation of the max-plus-algebra on the min-plus-algebra via the above isomorphisms, as in the commutative diagram This leads to the mapping Max Min Min l µ l 1 P H φ α H. φ : Max Min Min, µa)x if a > 0, 0 if a = and x = 0, a, x) φa, x) := a x := if a = and x > 0; The following definitions of the products 0 if a = 0 and x <, if a = 0 and x =. 0 = 0, 0 =, und x > 0 : x =, seem unnatural at a first look but are resultig from 3), and, consequently, result from the postulation that ) and 0 should be endomorphisms on the semiring Min, whence have to map neutral elements onto neutral elements. Conclusions: a) For each a Max a : Min Min is a semiring endomorphism, which hence particularly keeps neutral elements 4) a ) = and a 0 = 0. 5) b) For 0 < a, we have 0 < µa) 1, hence a : [0, ] [0, ] is bijective. Accordingly for 0 x and 0 < a the quotient is well defined and fulfills x µa) a ) 1 x = x µa). 6) 6

The semi-direct product When building the product of two semirings, one essential trick is first to remove the neutral elemnts before forming the Cartesian product, and re-add them in adequate form afterwards. This leads to the following set: ) { } K := Max \ {, 0}) Min 0 Max, 0 Min ), 1 Max, 1 Min ) = ]0, ] [0, ] ) {, ), 0, 0)}; 7) For later usage two properties are noted here, which result from the special role of the two elements, ) and 0, 0): } 2 a, x) K : a = x = and a x =, 8) a, x) K : a = 0 x = 0 and a x = 0. The -addition over K will be taking the maximum w.r.t. a total ordering K on K formally defined as follows: a < b a, x), b, y) K : a, x) K b, y) : or 9) a = b and x y. To motivate this definition, recall that the first component a { } [0, ] of a confidence-probability pair a, x) K encodes confidence, whereas the second component x = ln p is the negative logarithm of a probability. Given two confidence-probability-pairs a, x), b, y) K, the one with the higher confidence is considered the bigger one. In case of equal confidences, the bigger pair is the one whose second component corresponds to the bigger probability. Now the -addition over K is defined by a, x) b, y) := max K {a, x), b, y)} 10) { a, x) if b, y) K a, x), := b, y) if a, x) K b, y). 7

is commutative and has the neutral element 0 K :=, ), as, ) is the smallest element of K w.r.t. the ordering relation K. To construct the -multiplication we use the following conventions extending the quotient notation 6): x Min = [0, ] : x = und x 0 = 0. 11) On a first look, these conventions may seem unnatural, but they have the pleasant consequence ) a x x a, x) K : µa) = a = x. 12) µa) Indeed, for 0 < a the two equations arise from 4). Moreover, for a = or a = 0 we have µa) = a, making the equation a x µa) = x a consequence from 8) and the conventions 11), whereas the second equation a x ) = x results from 11) and the homomorphism conditions 5). µa) The -multiplication over K is defined through a, x) b, y) := a + b, a ) x + b y ; 13) µa + b) it is also commutative, and has the neutral element as is easily seen using 12): a, x) K : a, x) 0, 0) = 1 K := 0, 0), a + 0, a ) ) x + 0 0 a x = a, = a, x). µa + 0) µa) Associativity und Distributivity The -addition is just taking the maximum w.r.t. a total ordering, whence associativity of is immediate. 8

As on both semirings Max and Min, the operation + is associative, associativity of the -multiplikation will follow from the assertion a, x), b, y), c, z) K : ) a, x) b, y) c, z) = a + b + c, a ) x + b y + c z, µa + b + c) which can be proved as follows: a, x) b, y) ) c, z) 13) = 13) = 12) = a + b, a ) x + b y µa + b) a + b + c, c, z) a + b) a x+b y µa+b) µa + b + c) a + b + c, a x + b y + c z µa + b + c) Collecting what we ve proved up to now, we infer that K,,, ),, 0, 0) ) ) + c z is a bimonoid. In order to prove that this is a semiring, we would have to prove the law of distributivity: a, x), b, y), c, z) K : a, x) b, y) ) c, z) = a, x) c, z) ) b, y) c, z) ). 14) Unfortunately, this assertion is false. Counterexamples are provided by those confidence-probability pairs with a < b, and x < y, c =, and z < : 2 a, x) b, y) ), z) = b, x), z) because a < b, ). =, b y + z) by 13), as = id Min. On the other hand, a < b and x < y imply a x < b y. For z < we infer a x + z < b y + z, and conclude using 10) a, x), z) ) b, y), z) ) =, a x + z), b y + z) contradicting 14). 9 =, a x + z),

The confidence-probability semiring With this preliminary work, we are now ready to define the confidenceprobability semiring. To this end, we remove from K all pairs with confidence and define K := K \ { } Min ) = ]0, [ [0, ] ) {, ), 0, 0)}, 15) and observe that the restriction of 14) to this set is equivalent to the assertion a, x), b, y), c, z) K : a, x) K b, y) a, x) c, z) K b, y) c, z). To proof this, assume that a, x) K b, y). Case 1: a < b. Using c < we infer the strict inequality a + c < b + c, which implies by 13) and 9) the relation a, x) c, z) K b, y) c, z). Case 2: a = b. Then we conclude from 9) that x y, and we have to prove a, x) c, z) K a, y) c, z), which, by 13), is equivalent to a + c, a ) x + c z µa + K a + c, a ) y + c z. 16) c µa + c) In order to prove this, we have to refer to the second case of 9), and, consequently, we have to prove a x + c z µa + c) a y + c z µa + c). If a + c = 0 or a + c =, this follows from 11). Otherwise, we have to show a x + c z a y + c z. 17) 10

If z =, we infer from 5) that c z =, reducing 17) to the equation =. In case z < it also holds c z <, which makes 17) equivalent to a x a y. 18) This follows from x y and the fact that, by their definition in 4), each endomorphism a : Min Min is isotonic. Finally it has been shown that confidence-probability-semiring ) K,,, ),, 0, 0) has the structure of a commutative semiring. Moreover, the semiring K inherits idempotency from the semirings Max and Min. Canonical Embedding The construction of a semidirect product would be incomplete without investigating possible embeddings of the source objects into the product objects. An embedding of an object A into an object B is an isomorphism of A onto a subobject of B. Speaking in the terminology of semirings it is an injective homomorphism of semirings of A onto B. In concrete: Let A = A, A, 0 A, A, 1 A ), B = B, B, 0 B, B, 1 B ), be two semirings then an embedding of A into B is a mapping ι : A b complying with the following properties: i) ι0 A ) = 0 B, ii) ι1 A ) = 1 B, iii) x, y A : iv) x, y A : ιx) B ιy) = ιx A y), ιx) B ιy) = ιx A y), v) ι is injective. 11

In the present case the confidence-probability semiring K is seen as the semidirect product of the shortened max-plus-algebra Max := Max \{ } and the min-plus-algebra Min, written K = Max Min. A homomorphism ι : Max K or ι : Min K is called a canonical embedding if its first resp. second component is the identical mapping, strictly spoken if a Max : ιa) = a, ι 2 a)) resp. x Min : ιx) = ι 1 x), x) holds. Obviously a canonical embedding is injective by which the name embedding is reasonable. Theorem 1 For every x [0, ] the mapping γ x : Max K, γ x a) = a, aµ1) x ) = a, a1 ) e 1 )x µa) 1 e a, 19) is a canonical embedding of the shortened max-plus-algebra into the confidenceprobability semiring whereas the conventions from 11) are needed to understand the quotient. Proof. Let x [0, ] be fixed. The above mentioned requirements i) v) have to be verified. At first i), the preservation of the neutral element of the addition: γ x ) =, 1 ) e 1 ) x =, ), pursuant to the first part of the conventions from 11). ii) concerns the preservation of the neutral element of the multiplication: γ x 0) = 0, 0 1 ) e 1 ) x = 0, 0), 0 after the second part of the conventions from 11). 12

In order to proof iii), the compatibility of γ x and the addition, the special character of the second component is irrelevant; so the abbreviation σa) := aµ1) x. 20) µa) is used. It holds a, b Max : ) γ x a) = a, σa)) if a > b, γ x max{a, b} = γ x a) = γ x b) if a = b, γ x b) = b, σb)) if a < b, = max K { γx a), γ x b) }. For the proof of iv), the compatibility a, b Max : γ x a + b) = γ x a) K γ x b), the special character of σ matters. As a start for a = or b = one gets the equation a + b = and after i) γ x a + b) = γ x ) =, ) = 0 K. Let now a, b [0, [ be arbitrary, then according to 20) and 12) ) ) aµ1) x a σa) = a = aµ1) x 21) µa) holds. Now one can begin calculating for a, b [0, [ according to the proof of iv): γ x a + b) = a, σa) ) ) K b, σb) 13) = a + b, a ) σa) + b σb) µa + b) ) 21) aµ1) x + bµ1) x = a + b, µa + b) ) a + b)µ1) x = a + b, µa + b) 21) = a + b, σa + b)) = γ x a + b). 13

The verification of v) results from the fact that every canonical embedding is per se injective. Theorem 2 There is a canonical embedding Min K. Proof. Let ι, id) : Min K be a homomorphism between semirings. In particular ι ) = und ι0) = 0. hold. Because of 8) it follows: x Min \{0, } a := ιx) > 0. 22) Let x Min \{0, } be arbitrary. On the one hand now ι, id)x + x) = ιx + x), x + x), 23) holds and on the other hand, because ι, id) is a homomorphism between semirings, it also holds after iii) ι, id)x + x) = ι, id)x) K ιx), x) = a, x) K a, x) = a + a, a ) ) x + a x µa)x + x) = a + a,. µa + a) µa + a) If 0 < µa) < 1, one could continue with positiv reals and would get because of 23) 1 = µa) µa + a) = 1 e a 1 e 2a. From that would follow 2a = a. Because of a Max = { } [0, [, this would imply either a = 0 or a =, in contradiction to assumption 0 < µa) < 1. Hence ιx) = a {, 0} and by 8) x Min : ιx), x) {,, ), 0, 0)}, but that is impossible after the construction of K from 7). Finally, let us note that the mapping 0, 0) if x = 0, ι : Min K, ιx) :=, ) if x =,, x) otherwise, is a canonical embedding of the min-plus-algebra into the bimonoid K. 14

References [1] Markus Huber. Semantische Informationsbearbeitung unter Berücksichtigung von Konfidenz-Zahlen. Diplomarbeit in Mathematik. Katholische Universität Eichstätt-Ingolstadt, 2009. [2] Mehryar Mohri. Weighted Automata Algorithms. In Manfred Droste, Werner Kuich, and Heiko Vogler, editors, Handbook of Weighted Automata, Monographs in Theoretical Computer Science, pages 213 254. 2009. 15