Approximated Probabilistic Answer Set Programming

Department of Computer Science, Universidade de São Paulo, São Paulo, Brazil, 2014

Toy Example [Figure: example graph with vertices 1 to 6] Limit the percentage of time the edge (1, 3) is used to 50% and the edge (3, 4) to 40%.

Outline 1 2 Answer Set Programming 3 Probabilistic Satisfiability 4 5 6


What is ASP? Answer Set Programming (ASP) is a non-monotonic, declarative programming paradigm for hard combinatorial problems. A program is a set of rules of the form $h \leftarrow L_1, \ldots, L_m, \text{not } L_{m+1}, \ldots, \text{not } L_n$. The symbol not denotes default negation (negation as failure to prove). Programs may contain variables and function symbols, but must be grounded before solving.

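Constraints and Weight Rules Other types of rules: Restrictions are rules without a head: $\leftarrow L_1, \ldots, L_m, \text{not } L_{m+1}, \ldots, \text{not } L_n$. Weight rules are rules made from weight constraints: $C_0 \leftarrow C_1, \ldots, C_n$. A weight constraint has the form $L \{h_1 = w_1, \ldots, \text{not } h_n = w_n\} U$ and holds when the sum of the weights of its satisfied literals lies between $L$ and $U$. Weight rules make ASP $\Sigma^P_2$-complete.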
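To make the semantics of a weight constraint concrete, here is a minimal Python sketch that checks $L \{\ldots\} U$ against a set of true atoms. The tuple encoding of literals and the name `weight_constraint_holds` are our own illustrative choices, not the API of any ASP system.

```python
from typing import Iterable, Optional, Set, Tuple

# Illustrative encoding: (atom, negated, weight).
# ("h", False, 2) stands for "h = 2"; ("h", True, 3) stands for "not h = 3".
WeightLiteral = Tuple[str, bool, int]


def weight_constraint_holds(
    lower: Optional[int],
    literals: Iterable[WeightLiteral],
    upper: Optional[int],
    model: Set[str],
) -> bool:
    """Check L {h_1 = w_1, ..., not h_n = w_n} U against the set of true atoms."""
    total = 0
    for atom, negated, weight in literals:
        # A positive literal holds if its atom is true, a "not" literal if it is false.
        holds = (atom not in model) if negated else (atom in model)
        if holds:
            total += weight
    # A missing bound (None) leaves that side unconstrained.
    if lower is not None and total < lower:
        return False
    if upper is not None and total > upper:
        return False
    return True


# 2 {a = 1, b = 1, not c = 1} 2
lits = [("a", False, 1), ("b", False, 1), ("c", True, 1)]
print(weight_constraint_holds(2, lits, 2, {"a"}))       # True: a and not c hold, sum 2
print(weight_constraint_holds(2, lits, 2, {"a", "b"}))  # False: sum 3 exceeds U = 2
```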

Answer Sets For programs without default negation, the answer set is the minimal model that satisfies all rules. For programs with default negation, there may be no unique minimal model. We must first assume a set of literals and then verify whether this set is a minimal model of the resulting rules.

Answer Sets Let M be a finite set of atoms of P. The program $P^M$, obtained from P by removing all the rules that have an atom A in their negative body with $A \in M$, and then the negative body of the remaining rules, is called the reduct of P by M. M is an answer set of P if the minimal model of $P^M$ is M.
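The definition can be checked directly. The sketch below assumes ground normal rules encoded as hypothetical `(head, positive_body, negative_body)` tuples (head `None` for a restriction), computes the reduct $P^M$ and its least model, and enumerates answer sets by brute force over all subsets of atoms. It is exponential and only illustrates the definition; a real system would call an ASP solver.

```python
from itertools import chain, combinations
from typing import FrozenSet, List, Optional, Set, Tuple

# Ground normal rule: head <- pos_1, ..., pos_m, not neg_1, ..., not neg_r.
# head is None for a restriction (a rule without head).
Rule = Tuple[Optional[str], FrozenSet[str], FrozenSet[str]]


def reduct(program: List[Rule], m: Set[str]) -> List[Rule]:
    """P^M: drop rules whose negative body meets M, strip the remaining negative bodies."""
    return [(h, pos, frozenset()) for h, pos, neg in program if not (neg & m)]


def least_model(definite: List[Rule]) -> Set[str]:
    """Least model of a negation-free program by fixpoint iteration (restrictions ignored)."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for h, pos, _ in definite:
            if h is not None and h not in model and pos <= model:
                model.add(h)
                changed = True
    return model


def is_answer_set(program: List[Rule], m: Set[str]) -> bool:
    red = reduct(program, m)
    if least_model(red) != m:
        return False
    # No restriction left in the reduct may have its body satisfied by M.
    return not any(h is None and pos <= m for h, pos, _ in red)


def answer_sets(program: List[Rule]) -> List[Set[str]]:
    """Brute force over all subsets of the program's atoms."""
    atoms = sorted({a for h, pos, neg in program for a in chain([h] if h else [], pos, neg)})
    subsets = chain.from_iterable(combinations(atoms, r) for r in range(len(atoms) + 1))
    return [set(s) for s in subsets if is_answer_set(program, set(s))]


# p <- not q.   q <- not p.
prog = [("p", frozenset(), frozenset({"q"})), ("q", frozenset(), frozenset({"p"}))]
print(answer_sets(prog))  # [{'p'}, {'q'}]
```

Adding the restriction $\leftarrow p$ to this program leaves only $\{q\}$.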

Toy Example [Figure: example graph with vertices 1 to 6] A program that finds all the paths from vertex 1 to n:
1 {visited(X, Y) = 1 for each edge(X, Y)} 1 ← vertex(X), pathto(X).
pathto(1).
pathto(Y) ← pathto(X), visited(X, Y), edge(X, Y).
← not pathto(n).


A Brief History of Probabilistic Satisfiability (PSAT) Probabilistic logic was proposed in On the Laws of Thought [Boole 1854]: classical probability and classical logic, with no assumption of a priori statistical independence. It was rediscovered several times since Boole: De Finetti [1937, 1974], Good [1950], Smith [1961]. Studied by Hailperin [1965]. Nilsson [1986] (re)introduced it to AI. Papadimitriou et al. [1988]: PSAT is NP-complete. Many other works; see Hansen & Jaumard [2000].

The Setting: the language Logical variables or atoms: $P = \{x_1, \ldots, x_n\}$. Connectives: $\neg, \land, \lor, \rightarrow, \leftrightarrow$. Formulas ($L$) are inductively composed from atoms using connectives. Formulas can be brought to clausal form, but need not be.


Semantics A propositional valuation is a function $v : P \to \{0, 1\}$, generalized to any propositional formula (clausal or not) as $v : L \to \{0, 1\}$. A probability distribution over propositional valuations is $\pi : V \to [0, 1]$ with $\sum_{i=1}^{2^n} \pi(v_i) = 1$. The probability of a formula $\alpha$ according to $\pi$ is $P_\pi(\alpha) = \sum \{\pi(v_i) \mid v_i(\alpha) = 1\}$.
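A minimal sketch of $P_\pi(\alpha)$, assuming formulas are represented as Python predicates over a valuation dictionary. This encoding and the helper names `valuations` and `prob` are illustrative, not taken from the slides.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]
Formula = Callable[[Valuation], bool]


def valuations(atoms: List[str]) -> List[Valuation]:
    """All 2^n valuations v : P -> {0, 1}."""
    return [dict(zip(atoms, bits)) for bits in product((0, 1), repeat=len(atoms))]


def prob(alpha: Formula, pi: List[Tuple[Valuation, float]]) -> float:
    """P_pi(alpha): total probability of the valuations v_i with v_i(alpha) = 1."""
    return sum(p for v, p in pi if alpha(v))


vs = valuations(["x1", "x2"])
uniform = [(v, 1 / len(vs)) for v in vs]
print(prob(lambda v: bool(v["x1"] or v["x2"]), uniform))  # 0.75
```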

The Problem Consider k formulas $\alpha_1, \ldots, \alpha_k$ defined on n atoms $\{x_1, \ldots, x_n\}$. A PSAT problem $\Sigma$ is a set of k restrictions $\Sigma = \{P(\alpha_i) = p_i \mid 1 \le i \le k\}$. Probabilistic Satisfiability: are these restrictions consistent? That is, given $\Sigma = \{P(\alpha_i) = p_i \mid \alpha_i \in L, 1 \le i \le k\}$, is there a $\pi$ such that $P_\pi(\alpha_i) = p_i$ for $1 \le i \le k$?

An Example: Is the Hypothesis Consistent with the Data? The problem: how to fit precise theories to an imprecise world? A doctor is investigating disease D and examines the role of genes $G_1, G_2, G_3$. Hypothesis: at least two of the genes have to be present for D to develop. Data: gene occurrence in D-patients is $G_1$: 60%, $G_2$: 60%, $G_3$: 60%.

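An Example Writing a, b, c for the presence of $G_1, G_2, G_3$: $\Sigma = \{P(a \lor b) = P(a \lor c) = P(b \lor c) = 1,\; P(a) = P(b) = P(c) = 0.6\}$. The valuations that satisfy all three clauses are $v_1 = \{a = b = c = 1\}$, $v_2 = \{a = b = 1; c = 0\}$, $v_3 = \{a = c = 1; b = 0\}$ and $v_4 = \{a = 0; b = c = 1\}$.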
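Since $P(a \lor b) = P(a \lor c) = P(b \lor c) = 1$, every valuation other than $v_1, \ldots, v_4$ must receive probability 0, so $\Sigma$ reduces to four linear equations in $\pi(v_1), \ldots, \pi(v_4)$: normalization plus $P(a) = P(b) = P(c) = 0.6$. The sketch below solves this square system exactly with `fractions.Fraction`, using a small hand-written elimination routine (`solve` is our own helper, standing in for an LP solver). The unique solution has $\pi(v_1) = -1/5 < 0$, so no probability distribution exists: the hypothesis is inconsistent with the data.

```python
from fractions import Fraction
from typing import List


def solve(a: List[List[Fraction]], b: List[Fraction]) -> List[Fraction]:
    """Gauss-Jordan elimination for a square, non-singular system a x = b."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Columns: v1 = {a,b,c}, v2 = {a,b}, v3 = {a,c}, v4 = {b,c}.
# Rows: normalization, P(a), P(b), P(c).
F = Fraction
A = [[F(1), F(1), F(1), F(1)],
     [F(1), F(1), F(1), F(0)],
     [F(1), F(1), F(0), F(1)],
     [F(1), F(0), F(1), F(1)]]
p = [F(1), F(3, 5), F(3, 5), F(3, 5)]
pi = solve(A, p)
print(pi)  # [Fraction(-1, 5), Fraction(2, 5), Fraction(2, 5), Fraction(2, 5)]
```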

The Problem: An Algebraic Formalization Vector of probabilities $p$ of dimension $k \times 1$ (given). Consider a large matrix $A_{k \times 2^n} = [a_{ij}]$ (computed), with $a_{ij} = v_j(\alpha_i) \in \{0, 1\}$. PSAT: decide if there is a vector $\pi$ of dimension $2^n \times 1$ such that $A\pi = p$, $\sum_i \pi_i = 1$, $\pi \ge 0$. $\pi$ is a probability distribution of exponential size.
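The matrix $A$ can be built mechanically. A minimal sketch, assuming formulas are Python predicates over a valuation dictionary (our own encoding); `psat_matrix` and `satisfies` are hypothetical helper names. Columns follow the order produced by `itertools.product`, so there are $2^n$ of them and this is only practical for tiny n.

```python
from itertools import product
from typing import Callable, Dict, List

Valuation = Dict[str, int]
Formula = Callable[[Valuation], bool]


def psat_matrix(atoms: List[str], formulas: List[Formula]) -> List[List[int]]:
    """A (k x 2^n) with a_ij = v_j(alpha_i): one row per formula, one column per valuation."""
    vals = [dict(zip(atoms, bits)) for bits in product((0, 1), repeat=len(atoms))]
    return [[int(alpha(v)) for v in vals] for alpha in formulas]


def satisfies(a: List[List[int]], pi: List[float], p: List[float], tol: float = 1e-9) -> bool:
    """Check A pi = p, sum(pi) = 1 and pi >= 0, up to a floating-point tolerance."""
    if any(x < -tol for x in pi) or abs(sum(pi) - 1) > tol:
        return False
    return all(abs(sum(aij * x for aij, x in zip(row, pi)) - pk) <= tol
               for row, pk in zip(a, p))


# Formulas x1 and (x1 and x2); columns are the valuations (x1, x2) = 00, 01, 10, 11.
A = psat_matrix(["x1", "x2"], [lambda v: v["x1"] == 1,
                               lambda v: v["x1"] == 1 and v["x2"] == 1])
print(A)                                                  # [[0, 0, 1, 1], [0, 0, 0, 1]]
print(satisfies(A, [0.5, 0.0, 0.25, 0.25], [0.5, 0.25]))  # True
```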

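PSAT is NP-complete [Georgakopoulos, Kavvadias & Papadimitriou 1988]. If a PSAT problem has a solution, then there is a solution $\pi$ with at most $k + 1$ elements $\pi_i > 0$ (Carathéodory's Lemma). So PSAT has a polynomial-size witness, with $a_{ij} \in \{0, 1\}$:
$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ a_{1,1} & a_{1,2} & \cdots & a_{1,k+1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k,1} & a_{k,2} & \cdots & a_{k,k+1} \end{bmatrix} \begin{bmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_{k+1} \end{bmatrix} = \begin{bmatrix} 1 \\ p_1 \\ \vdots \\ p_k \end{bmatrix}$$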
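The witness argument can be made concrete: checking a certificate of at most $k + 1$ weighted valuations takes polynomial time. A sketch with exact rational arithmetic, assuming predicate-encoded formulas and the hypothetical helper name `check_witness`:

```python
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

Valuation = Dict[str, int]
Formula = Callable[[Valuation], bool]


def check_witness(
    formulas: List[Formula],
    p: List[Fraction],
    witness: List[Tuple[Valuation, Fraction]],
) -> bool:
    """Verify a PSAT certificate with at most k + 1 valuations of positive probability."""
    k = len(formulas)
    if len(witness) > k + 1:
        return False
    probs = [q for _, q in witness]
    if any(q < 0 for q in probs) or sum(probs) != 1:  # first row of the system
        return False
    # Remaining rows: the witness must give each alpha_i exactly probability p_i.
    return all(sum(q for v, q in witness if alpha(v)) == p_i
               for alpha, p_i in zip(formulas, p))


F = Fraction
fs = [lambda v: v["x1"] == 1, lambda v: v["x1"] == 1 and v["x2"] == 1]
w = [({"x1": 0, "x2": 0}, F(1, 2)), ({"x1": 1, "x2": 0}, F(1, 4)), ({"x1": 1, "x2": 1}, F(1, 4))]
print(check_witness(fs, [F(1, 2), F(1, 4)], w))  # True
```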

Possible Applications PSAT has many potential applications: computer models of biological processes, machine learning, fault tolerance/detection, software design and analysis, economics, econometrics, etc. It has already been successfully applied in Materials Science [Finger 2013].

ASP is more expressive than SAT. We want to extend ASP with probabilistic logic, to facilitate writing complex models with PSAT.

A PASP instance is a grounded ASP program S and a set of probabilities over its atoms. The set of probabilities $P = \{P(a_i) = p_i \mid 1 \le i \le k\}$ is satisfied by S if there is a probability distribution $\pi$ over all subsets $v_l$ of the Herbrand base $HB_S$ such that, for every i, $p_i = \sum \{\pi(v_l) \mid a_i \in v_l \text{ and } v_l \text{ is an answer set of } S\}$.

Like PSAT, a PASP problem can be written in matrix form $A_{k \times 2^{|HB_S|}} = [a_{ij}]$, such that $a_{ij} = 1$ iff the j-th atom subset contains the i-th atom. The criterion for deciding the satisfiability of a PASP instance becomes: $A\pi = p$, $\sum_i \pi_i = 1$, $\pi \ge 0$.
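A sketch of the PASP criterion, assuming the answer sets of S are already available (for instance from an ASP solver) and that $\pi$ is given only over those answer sets; by the definition above, subsets that are not answer sets do not contribute to any $p_i$. The helper names and the answer sets in the usage lines are hypothetical.

```python
from typing import FrozenSet, List, Sequence

AnswerSet = FrozenSet[str]


def pasp_matrix(atoms: Sequence[str], sets: Sequence[AnswerSet]) -> List[List[int]]:
    """a_ij = 1 iff the j-th atom set contains the i-th atom."""
    return [[int(a in s) for s in sets] for a in atoms]


def is_pasp_solution(atoms: Sequence[str], p: Sequence[float],
                     sets: Sequence[AnswerSet], pi: Sequence[float],
                     tol: float = 1e-9) -> bool:
    """Check A pi = p, sum(pi) = 1, pi >= 0 with columns restricted to answer sets."""
    if len(sets) != len(pi) or any(q < -tol for q in pi) or abs(sum(pi) - 1) > tol:
        return False
    a = pasp_matrix(atoms, sets)
    return all(abs(sum(x * q for x, q in zip(row, pi)) - pk) <= tol
               for row, pk in zip(a, p))


# Hypothetical answer sets of some ground program S.
sets = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]
print(pasp_matrix(["a", "b"], sets))                              # [[1, 1, 0], [0, 1, 0]]
print(is_pasp_solution(["a", "b"], [0.6, 0.1], sets, [0.5, 0.1, 0.4]))  # True
```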

Proposition If there exists a solution for a PASP instance, there is a solution with at most $k + 1$ nonzero elements in $\pi$. This Proposition follows directly from Carathéodory's Lemma. PASP is $\Sigma^P_2$-complete (due to weight rules).

Toy Example [Figure: example graph with vertices 1 to 6] Limit the percentage of time the edge (1, 3) is used to 50% and the edge (3, 4) to 40%: P(visited(1, 3)) = 0.5, P(visited(3, 4)) = 0.4. Satisfiable with π = [0.5, 0.1, 0.4].

Solving via Linear Programming We decide consistency by finding a solution to the matrix equation. This can be seen as a Linear Programming problem, solved with the Simplex algorithm. We initialize the algorithm using artificial variables; the equation with the initial basis is:
$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} \pi_0 \\ \pi_1 \\ \vdots \\ \pi_k \end{bmatrix} = \begin{bmatrix} 1 \\ p_1 \\ \vdots \\ p_k \end{bmatrix}$$
The cost function is the number of columns that do not correspond to an answer set.
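The initial system is upper triangular, so back substitution solves it: $\pi_k = p_k$, $\pi_j = p_j - p_{j+1}$ and $\pi_0 = 1 - p_1$. All values are non-negative when the atoms are ordered so that $p_1 \ge p_2 \ge \cdots \ge p_k$; that ordering is our assumption, it is not stated on the slide. A minimal sketch with the hypothetical helper `initial_basis`:

```python
from fractions import Fraction
from typing import List, Tuple


def initial_basis(p: List[Fraction]) -> Tuple[List[List[int]], List[Fraction]]:
    """Upper-triangular starting basis and the pi that solves it.

    Assumes p is sorted in non-increasing order, so that every pi_j >= 0.
    """
    k = len(p)
    # Row 0 is the normalization row; column j has ones in rows 0..j.
    basis = [[1 if r <= j else 0 for j in range(k + 1)] for r in range(k + 1)]
    rhs = [Fraction(1)] + list(p)
    # Row r reads pi_r + ... + pi_k = rhs[r], hence pi_r = rhs[r] - rhs[r + 1].
    pi = [rhs[j] - (rhs[j + 1] if j < k else 0) for j in range(k + 1)]
    return basis, pi


basis, pi = initial_basis([Fraction(3, 5), Fraction(1, 5), Fraction(1, 10)])
print(pi)  # [Fraction(2, 5), Fraction(2, 5), Fraction(1, 10), Fraction(1, 10)]
```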

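Column generation We must generate new columns to reduce the cost function. Use an ASP solver to generate new answer sets, with the added restriction that their reduced cost is negative; since answer-set columns have zero cost, this means $c_B B^{-1} A_j > 0$. Be careful: since ASP is non-monotonic, you cannot guarantee that, after adding new rules, the answer sets of the program are still answer sets of the original program. You can if the only rules added are restrictions, because a restriction only eliminates answer sets and never creates new ones.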
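A brute-force stand-in for the pricing step, assuming $y = c_B B^{-1}$ is given with $y_0$ for the normalization row, so that the column of an answer set $S_j$ is $A_j = (1, [a_1 \in S_j], \ldots, [a_k \in S_j])$. Here the candidate answer sets are simply passed in; in the approach described above, the condition is instead added to the program so that the ASP solver itself only returns improving answer sets. The function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence


def column_of(atoms: Sequence[str], s: FrozenSet[str]) -> List[int]:
    """A_j = (1, [a_1 in s], ..., [a_k in s]); the leading 1 is the normalization row."""
    return [1] + [int(a in s) for a in atoms]


def price(y: Sequence[float], atoms: Sequence[str],
          candidates: Sequence[FrozenSet[str]]) -> Optional[FrozenSet[str]]:
    """Return a candidate answer set whose column satisfies c_B B^{-1} A_j > 0, if any."""
    for s in candidates:
        if sum(yi * aij for yi, aij in zip(y, column_of(atoms, s))) > 0:
            return s
    return None  # no improving column among the candidates


y = [-0.5, 1.0, 0.0]
print(price(y, ["a", "b"], [frozenset({"b"}), frozenset({"a"})]))  # frozenset({'a'})
```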

Inequalities as SAT instances It is possible to express a linear inequality over binary variables as a SAT formula [J.P. Warners, 1998]. The coefficients must be integers and are represented in binary form. With that we obtain a CNF formula $(a_{11} \lor \cdots \lor a_{n1}) \land \cdots \land (a_{1m} \lor \cdots \lor a_{jm})$ and express it in the program as the restrictions
$\leftarrow \text{not } a_{11}, \ldots, \text{not } a_{n1}.$
$\ldots$
$\leftarrow \text{not } a_{1m}, \ldots, \text{not } a_{jm}.$
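Turning each clause into a restriction is mechanical: a clause is false exactly when all of its literals are false, so its restriction has `not x` for each positive literal x and `x` for each negated one. The sketch writes ← as clingo's `:-`; producing the CNF itself (Warners' transformation) is not reproduced here, and the literal encoding and helper names are our own.

```python
from typing import List, Tuple

# A literal is (atom, positive); a clause is a list of literals read as a disjunction.
Literal = Tuple[str, bool]


def clause_to_restriction(clause: List[Literal]) -> str:
    """Restriction whose body holds exactly when every literal of the clause is false."""
    body = ", ".join(f"not {atom}" if positive else atom for atom, positive in clause)
    return f":- {body}."


def cnf_to_restrictions(cnf: List[List[Literal]]) -> List[str]:
    return [clause_to_restriction(c) for c in cnf]


print(cnf_to_restrictions([[("a11", True), ("a21", True)], [("b", True), ("c", False)]]))
# [':- not a11, not a21.', ':- not b, c.']
```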

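Weight rules Append the rule $\{l_1 = (c_B B^{-1})_1, \ldots, l_k = (c_B B^{-1})_k\}\ (c_B B^{-1})_0$ (possibly scaled to only allow integers), so that only answer sets whose column $A_j$ satisfies $c_B B^{-1} A_j > 0$ remain.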
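One way to carry out the scaling: multiply $y = c_B B^{-1}$ by the least common multiple of its denominators, which is positive and therefore preserves the sign of $y_0 + \sum_i y_i l_i$. The sketch then emits a restriction discarding every answer set with $\sum_i w_i l_i \le -w_0$, written in clingo's `#sum` aggregate syntax. That target syntax and the helper names are our assumptions, not necessarily what the original implementation generates; `math.lcm` with several arguments needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm
from typing import List, Sequence, Tuple


def integer_weights(y: Sequence[Fraction]) -> Tuple[List[int], int]:
    """Scale y by the LCM of its denominators; return (w_1..w_k, bound = -w_0)."""
    d = lcm(*(f.denominator for f in y))
    scaled = [int(f * d) for f in y]  # exact: every f * d is an integer
    return scaled[1:], -scaled[0]


def restriction(atoms: Sequence[str], y: Sequence[Fraction]) -> str:
    """Clingo-style constraint removing answer sets with sum_i w_i l_i <= -w_0."""
    weights, bound = integer_weights(y)
    elems = "; ".join(f"{w},{a} : {a}" for w, a in zip(weights, atoms) if w != 0)
    return f":- #sum {{ {elems} }} <= {bound}."


y = [Fraction(-1, 2), Fraction(2, 3), Fraction(-1, 6)]
print(restriction(["a", "b"], y))  # :- #sum { 4,a : a; -1,b : b } <= 3.
```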

Implementation Implementation available at git://gitorious.org/pasp/pasp-asp.git

Problems Trying to use PASP on a part-of-speech (POS) tagger revealed some limitations...

[Figure: example graph with vertices 1 to 6] Limit the percentage of time the edge (1, 3) is used to 20% and the edges (3, 4) and (4, 5) to 40%: unsatisfiable. Can we find a π that best approximates these probabilities?

Error Given a probability distribution $\pi$, we define the error of $\pi$ as
$$E(\pi) = \sum_{i \in P} \left| p_i - \sum \{\pi(v_l) \mid a_i \in v_l \text{ and } v_l \text{ is an answer set of } S\} \right|$$
If $E(\pi) = 0$, the instance is satisfiable.
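A direct transcription of $E(\pi)$, assuming the answer sets of S and their probabilities are given as parallel sequences (a hypothetical encoding). Mass placed on sets that are not answer sets simply does not appear in the inner sum. The usage line uses made-up atoms a, b, c, not the graph from the slides.

```python
from typing import Dict, FrozenSet, Sequence


def error(p: Dict[str, float], sets: Sequence[FrozenSet[str]], pi: Sequence[float]) -> float:
    """E(pi) = sum_i | p_i - sum{pi(v_l) : a_i in v_l, v_l an answer set} |."""
    return sum(abs(p_i - sum(q for s, q in zip(sets, pi) if a in s))
               for a, p_i in p.items())


# One hypothetical answer set {b, c} with probability 0.4; the remaining 0.6
# of the mass is assumed to sit on sets that are not answer sets.
print(error({"a": 0.2, "b": 0.4, "c": 0.4}, [frozenset({"b", "c"})], [0.4]))  # 0.2
```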

We need a different algorithm [Figure: example graph with vertices 1 to 6] Limit the percentage of time the edge (1, 3) is used to 20% and the edges (3, 4) and (4, 5) to 40%. With $\pi(i_3) = 0.4$ and $\pi(i_j) = 0$ for all $j \neq 3$: $E(\pi) = 0.2$. The best solution the algorithm we saw can reach has error 0.4.

Slack variables In a feasible solution of that linear program, the probability that the answer sets give to an atom never exceeds its target: the program may assign (non-zero) probabilities to sets that are not answer sets, and these act as positive slack variables. To let an atom receive a greater probability than its target, we need negative slack variables.

$$\text{minimize } c^T \pi \quad \text{subject to} \quad [A \;\; I \;\; -I]\,\pi = p, \quad \pi \ge 0, \quad \sum_i \pi_i = 1$$
A value $\pi_i > 0$ for a column i of an identity matrix is added to or subtracted from the probability of the i-th atom; it is part of the error of this atom. The result is not a distribution over the subsets of $HB_S$, but such a distribution can be obtained.
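To see how the slack columns relate to the error $E(\pi)$: for a fixed probability vector over the answer-set columns, the smallest non-negative slacks satisfying $[A \; I \; -I]\pi = p$ row by row are $s^+ = \max(0, p - A\pi)$ and $s^- = \max(0, A\pi - p)$, and their total is exactly $E(\pi)$. So, assuming $c$ is 1 on the slack columns and 0 on the answer-set columns (our reading of the slide), minimizing $c^T\pi$ minimizes the error. This sketch leaves the normalization constraint aside, and `slacks` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def slacks(a: Sequence[Sequence[int]], pi: Sequence[float],
           p: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Smallest s_plus, s_minus >= 0 with A pi + s_plus - s_minus = p (row by row)."""
    ap = [sum(x * q for x, q in zip(row, pi)) for row in a]
    s_plus = [max(0.0, pk - v) for pk, v in zip(p, ap)]   # probability still missing
    s_minus = [max(0.0, v - pk) for pk, v in zip(p, ap)]  # probability in excess
    return s_plus, s_minus


# Rows: atoms a, b, c; one hypothetical answer-set column {b, c} with probability 0.4.
sp, sm = slacks([[0], [1], [1]], [0.4], [0.2, 0.4, 0.4])
print(sum(sp) + sum(sm))  # 0.2, the error E(pi) of this pi
```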

Implementation and optimization Part-of-speech tagger

Thank you!