On learning of structures for Bayesian networks Timo Koski Mathematical statistics, KTH
|
|
- Lillian Summers
- 7 years ago
- Views:
Transcription
1 On learning of structures for Bayesian networks Timo Koski Mathematical statistics, KTH Sigtuna
2 Introduction Bayesian networks are a special case are multivariate (discrete) statistical distributions embodying a collection marginal and conditional independencies which may be represented by means of a directed acyclic graph.
3 A Bayesian Network W X Y Z W is a parent of X and Y, W and X are parents of Y, and X and Y are parents of Z. P (X, Y, Z, W) = P (Z X, Y) P (Y X, W) P (X W) P (W) We specify the simultaneous distribution by a set of simpler conditional distributions (modularity).
4 A Bayesian Network: learning structure W X Y Z W W X Y X Y Z Z
5 Graph Structure (DAG), random walk 1/5 1/6 omgivning omgivning
6 Model Choice The main challenges are: the increase in the number of graphical model structures as a function of the number of nodes.
7 Model Choice The main challenges are: the increase in the number of graphical model structures as a function of the number of nodes. the equivalence of the statistical models determined by different graphical models.
8 Robinson (1973) has given the following recursive function for computing the number f (d) of directed acyclic graphs with d nodes: f (d) = d ( ( 1) i+1 d i i=1 ) 2 i(d 1) f (d i). (1) This number grows exponentially, for d = 5 it is 29000, and for d = 10 it is approximately
9 Model choice There are two approaches of model selection for Bayesian networks: maximize a score function, e.g., the posterior probability of the graph P (G X), where G is a graph and X is a set of data (a sample of a set of random variables). The algorithms search the space of all possible structures (in fact, one should search over representors of equivalence classes, as discussed in the sequel).
10 Model choice There are two approaches of model selection for Bayesian networks: maximize a score function, e.g., the posterior probability of the graph P (G X), where G is a graph and X is a set of data (a sample of a set of random variables). The algorithms search the space of all possible structures (in fact, one should search over representors of equivalence classes, as discussed in the sequel). constraint based: algorithms estimate from data whether certain conditional independencies between the variables hold.
11 Algebraic statistics Bayesian networks are families of probability distributions, which are real algebraic varieties (or actually semi-algebraic sets): they are zeros of some polynomials in the probability simplex. L.D. Garcia, M. Stillman, and B. Sturmfels: Algebraic geometry of Bayesian networks. Journal of Symbolic Computation, 39, 2005, pp
12 Algebraic statistics & Model Choice L.D. Garcia: Algebraic statistics in Model Selection Uncertainty in Artificial Intelligence, UAI the correct dimension of a model when applying Bayesian Information Criterion (BIC) for model selection with hidden variables. BIC = an asymptotic expansion of P (G X).
13 Notations Consider the random variables X 1,..., X d. Each variable X i assumes values in the discrete and finite alphabet X i. The set of all possible configurations of X 1,..., X d is denoted by X = d i=1 X i. X i = n i. Let the generic element of X be denoted by x. R X denotes the real vector space of d dimensional tables of the format n 1 n 2... n d.
14 Joint probability, Polynomials We assign a joint probability to x X, P (x) = P (X 1 = x 1,...,X d = x d ) = p x1...x d We can regard p x1...x d as indeterminates (real numbers) that generate the ring R [X] of polynomial functions on the space of tables R X. Actually we need not restrict them to be probabilities.
15 Conditional Independence Let X,Y and Z be pairwise disjoint subsets of the random variables X 1,...,X d. We say that X and Y are conditionally independent given Z if P (X = x,y = y Z = z) = P (X = x Z = z) P (Y = y Z = z) for all x,y,z. We write this with the symbol X Y Z
16 Conditional Independence (1) B. Sturmfels 1 observed that X Y Z p xyz p x y z p x yz p xy z = 0 where p xyz e.t.c. are obtained by marginalization from the joint probability p x1...x d. Hence p xyz e.t.c. are polynomials of degreee 1 in R (X). 1 Solving Systems of Polynomial Equations, AMS, 2002
17 Conditional Independence (2) Therefore the cross product differences (cpd) cpd := p xyz p x y z p x yz p xy z correspond to homogeneous quadratic polynomials. Hence all the statements of conditional independence for a distribution P translate into a system of homogeneous quadratic polynomials (Sturmfels, loc.cit. 2002, Theorem 8.1.). We write I X Y Z for the ideal generated by these polynomials.
18 Independence Models Bayesian networks, as many other statistical models, can be described by a finite list of conditional independence statements, M = {X (1) Y (1) Z (1), X (2) Y (2) Z (2),...,X (m) Y (m) Z (m) } and the ideal of the model M is I M = I X (1) Y (1) Z (1) + I X (2) Y (2) Z (2) I X (m) Y (m) Z (m)
19 Independence Variety I M = I X (1) Y (1) Z (1) + I X (2) Y (2) Z (2) I X (m) Y (m) Z (m) The independence variety is V (I M ) is the set of common (complex) zeros of the polynomials in I M, or, equivalently, the set of all n 1 n 2... n d tables which satisfy the conditional independence statements in M.
20 Algebraic statistical model Let now V (I M + < p x1...x d 1 >) denote the subset of the independence variety V (I M ) which consists of non-negative tables whose entries sum to one. This is the subset of the probability simplex specified by the model M.
21 Notations A graph G is a pair (V, E), where E V V \{(i, i) V V } is the set of ordered pairs of nodes denoted as edges. If both (i, j) E and (j, i) E then there is an undirected edge between i and j written as i j. if (i, j) E but (j, i) E there is a directed edge between i and j written as i j. path is a sequence v 0,..., v n of different nodes such that v i v j and (v i 1, v i ) E for all i, j = 1,..., n. A cycle is a path with the only exception that v 0 = v n. A directed cycle is a cycle which has at least one directed edge.
22 Bayesian Network Let G = (V,E) be a directed acyclic graph (DAG) and the set of nodes V = {1,...,d} index the random variables X 1,...,X d 2. Let P be a joint proability distribution of X 1,...,X d. We say that the pair (G, P) is a (discrete)bayesian Network if (G, P) satisfies the following independence model a.k.a. the (directed) local Markov property local (G) = {X i nd(x i ) \ pa(x i ) pa(x i ),i = 1,2,...,d} where nd(x i ) is the set of nondescendants of X i and pa(x i ) is the set of parents of X i. 2 Convention: We are going to identify nodes of a graph with random variables.
23 Bayesian Network local (G) = {X i nd(x i ) \ pa(x i ) pa(x i ),i = 1,2,...,d} X j is a nondescendant of X i if there is no directed path from X i to X j.
24 Bayesian Network local (G) = {X i nd(x i ) \ pa(x i ) pa(x i ),i = 1,2,...,d} X j is a nondescendant of X i if there is no directed path from X i to X j. pa(x i ) is the set of parents of X i.
25 Global Markov and d-separation There may be more conditional independencies in (G,P) than those listed in local (G). The (directed) global Markov property of (G,P) is the list of independence statements where global (G) = {X Y Z X Y G Z }, X Y G Z says that X and Y are d -separated by Z. d-separation is a graphical algorithm to find conditional independence statements in a Bayesian network.
26 d-separation X and Y are d -separated by Z, if and only if all trails from X to Y are blocked by Z. A trail π from X i to X j is blocked in G by (the nodes in) Z if there is a node X b such that either X b Z and arrows of the trail π do not meet head-to-head at X b or X b / Z and X b has no descendants in Z, and arrows of π do meet head-to-head at X b.
27 d-separation
28 Faithfulness We should assume that (G,P) is such that X Y G Z X Y Z This is the starting point of the constraint based methods of model choice.
29 Factorization (1) Assume that Then we set pa(x j ) = {X i1,x i2,...,x ir } q j x 0 x 1 x r = P (X j = x 0 X i1 = x 1,X i2 = x 2,...,X ir = x r ) Let E denote the set of these unknowns q j x 0 x 1 x r, j = 1,2,...,d and let R[E] denote the polynomial ring they generate.
30 Factorization (2) φ : R E R X is defined by d j=1 q j x 1 x r φ px1...x d The image of φ is contained in the independence variety V global(g) The ring homomorphism Φ : R [X] R[E]/J, where J =< q j P x1 x r 1 > p x1...x d d j=1 q j x 1 x r
31 Factorization Theorem Theorem The following four subsets of the probability simplex coincide: ( V I local(g) + < ) p x1...x d 1 > = V ( I global(g) + < ) p x1...x d 1 > = = V (ker (Φ)) = image (φ). This is an improvement of a factorization theorem due to Steffen Lauritzen et.al. expressed in terms of algebraic statistics, and is due to L.D.Garcia, M. Stillman & B. Sturmfels: Algebraic geometry of Bayesian networks. Journal of Symbolic Computation, 39, 2005, pp
32 Markov equivalence There may be several graphs G such that (G,P) is a Bayesian network for a given P. In other words, G 1 and G 2 may have global(g 1 ) = global(g 2 ) for a given P. This cannot be ignored in the construction of algorithms. We shall discuss this within the context of scoring algorithms by introduction of partially directed acyclic graphs.
33 Markov equivalence A partially directed acyclic graph (PDAG) 3 is a graph G that does not contain any directed cycles. A directed acyclic graph (DAG) is a PDAG with only directed edges and an undirected graph (UG) is a PDAG with only undirected edges. M. Frydenberg: The Chain Graph Markov Property. Scand. J. Stat., 17, , S.A.Andersson, D. Madigan & M.D. Perlman: A Characterization of Markov Equivalence Classes for Acyclic Digraphs. The Annals of Statistics, 25, , a.k.a. CHAIN GRAPH
34 Conditional Independence Statements for PDAGs Lauritzen, Wermuth and Frydenberg (LWF) Markov properties. P,L,G relative to a (DAG, UG, PDAG) are defined as 1 the pairwise Markov property P, if for any pair (i,j) of nodes (i,j),(j,i) E with j nd(i), i j nd(i)\{i,j}. 2 the local Markov property L, for any i V, i nd(i)\bd(i) bd(i). 3 the global Markov property G, if for any triple of disjoint subsets (A,B,S V ) such that S separates A from B in (G An(A B S) ) m, A B S.
35 Markov equivalence A complex in a PDAG is a sequence of nodes v 1,...,v k,k 3 such that v 1 v 2,v i v i+1 for i = 2,...,k 2,v k 1 v k and there are no other edges (v i,v j ) E for all 0 i,j k. v 1 v k v v 3 v k 2 v k 1 A Complex
36 Markov equivalence The following is proved in (Frydenberg 1990), see also (Andersson et al 1997). Theorem Two PDAGs have the same (LWF) Markov properties iff they have the same undirected graph and the same complexes.
37 Markov equivalence b b b b a d a d a d a d c c c c D 1 D 2 D 3 D 4 { D D D } D 4
38 Markov equivalence A graph G = (V,E) is larger than the graph Ĝ = (V,Ê) if G might have edges where Ĝ has arrows. Theorem (Frydenberg 1990) For any PDAG G there exists a unique PDAG G = (Ṽ,Ẽ) with the same Markov properties as G such that G is larger than any other graph Ĝ = (ˆV,Ê) that has the same Markov properties as G.
39 Markov equivalence Each Markov-equivalence class is uniquely determined by a single PDAG (a chain graph), simultaneously equivalent to each graph in the class, hence the learning of structure should be based on these representatives. The question is, how to construct them.
40 Markov equivalence M. Volf and M. Studený: A graphical characterization of the largest chain graphs. International Journal of Approximate Reasoning 20, pp , Volf and Studený define a descending path, if either i i + 1 or i i + 1 for all i = 1,...,n 1. an ancestor of a node j as a node i such that there exists a descending path from i to j in G.
41 Markov equivalence A directed edge X i X j covers X y X z if X i is an ancestor of X y and X z is an ancestor of X j in a PDAG. An edge is protected iff it covers a directed edge which belongs to a complex. u x v y u v covers x y
42 Markov equivalence Theorem (Volf & Studený) A LPDAG is the largest PDAG of a Markov equivalent set of PDAGs iff every directed edge is protected. The largest graph is constructed by undirecting every non-protected arrow.
43 Markov equivalence b b b b a d a d a d a d c c c c D 1 D 2 D 3
44 Markov equivalence Note however that it is possible to represent a set of Markov properties in a PDAG that cannot be represented in a DAG or UG, such as Hence we need Gröbner bases for I M to check this. For small d the known number of LPDAGs that cannot be represented by UGs is 0 for d = 2,3 (Volf and Studený 1999). For d = 4,5 the percentage of LPDAGs that cannot be represented as UGs is 6 and 22.
45 Markov equivalence The presence of equivalence suggests the use of search space S, the states of which are equivalence classes represented by its LPDAG, that is, graphical models that all have the same Markov properties according to Definition LWF. Then the size of a search space, where the states are equivalence classes, is smaller than, e.g., the space of all DAGs for d 10 (Gillispie and Perlman 2001).
46 Milan Studený and his co-workers are currently developing (have already developed (?)) and algebraic representation, using objects called imsets, of the equivalence classes that so that an equivalence class corresponds uniquely to (a long) vector (with lots of zeros). Then the polytope generated by the imsets may be the basis of learning Bayesian networks by linear programming. J. Vomlel & M. Studený: Graphical and Algebraic Representatives of Conditional Independence Models. home.html
47 Marginal data distribution In the current approach the next issue is to compute the marginal data distribution P (X G) taking into account the equivalence classes. We follow/present the solution due to due to The symbol X designates a set of data, or, r samples of each of the variables X 1,...,X d, i.e., X = { x (l)} r l=1, x(l) X, with neither missing values nor hidden variables. For a subset A V, the corresponding random variable is X A = (X i ) i A.
48 Factorization Theorem Theorem (Frydenberg 1990) A probability distribution on a discrete and finite sample space with strictly positive density P, w.r.t. a product measure, satisfies P (see Definition LWF) over a PDAG G if and only if it factorizes as P(x) = P ( ) x τ x bd(τ), (2) τ T (G) where each factor P ( x τ x bd(τ) ) further factorizes along the undirected closure graph G m cl(τ).
49 Hyper Dirichlet priors A clique is a complete subgraph G A, such that there exists no (other) complete subgraph G A, A A V such that G A G A. A hyper-dirichlet density is defined for given real positive numbers λ c = {λ c,xc } xc X c let D(λ c ) for θ c by the density π (θ c λ c ) = Γ(λ 0) γ p θ λc,xc 1 c,x c (3) x c X c on the set { θ c x c X c θ c,xc = 1,θ c,xc > 0 }, where Γ(x) is the Euler gamma function, λ 0 = x c X c λ c,xc,γ p = x c X c Γ(λ c,xc ).
50 Let G be a PDAG, and let us set P (X θ,g) = r P τ T (G) l=1 ( ) x l τ θ τ,x l bd(τ), by (2), which is formally rewritten to include the parameters.
51 Next note that in (2) the probabilities P ( ) x τ θ τ,x bd(τ) satisfy the LWF Markov property P on the undirected closure graph G m cl(τ). Recall that a probability P(x θ) > 0 satisfying the property P on an undirected graph factorizes as c C P(x θ) = θ c,x c s S θ, s,x s where C {A A V } is the set of cliques and S {A A V } is the multiset (including repetitions of elements) of separators (for two cliques C i and C j the separator S = C i C j ), respectively. Thus ( ) r l=1 P x l τ θ τ,x l bd(τ) is a function of a product of expressions of the following form r θ(x l c) = l=1 θc,x nc,xc c, x c X c where { n c,xc } is the number of times the configuration x c occurs in r X c = x (l) c l=1.
52 On any clique c in G m cl(τ) we introduce the integration with respect to D(λ c,i ) as follows, c,x c π (θ c λ c ) dθ c. {θ c P θ nc,xc xc Xc θc,xc =1,θ c,xc >0} x c X c
53 By standard properties of the Dirichlet integral one gets from (3) that P c (X c ) = Γ(λ 0) Γ(n c,xc + λ c,xc), (4) Γ(r + λ) Γ(λ c,xc ) x c X c where r + λ = x c X c (n c,xc + λ c,xc ).
54 Hence, if C(τ) and S(τ) are the set of cliques and separators in the undirected graph G τ r ( ) P x l τ x l c C(τ) bd(τ) = P c (X c ) s S(τ) P s (X c ). (5) l=1 The probability P s (X c ) can be constructed for a separator by marginalizing over a clique that includes S. For the validity of this argument and the properties of the marginal data distribution, see (Dawid and Lauritzen 1993). The expression (5) must be multiplied over all the chain components τ in order to yield the overall marginal data distribution
55 P (X G) = τ T (G) c C(τ) P c (X c ) s S(τ) P s (X c ). (6) From (6) we define the final scoring function as the posterior probability P (G X) = P (X G)P (G) (7) P (X G)P (G). G S For a given set of nodes V let S = {LPDAG} be the search space. One particular goal for the topology learning can be stated as the identification of a structure G opt S having highest posterior probability (7), i.e. G opt arg max G S P (G X).
56 As noted by (Andersson et al 1997), the factors in (2) P ( x τ x bd(τ) ) satisfy also P on Gτ, i.e., the graph induced by the chain component 4 τ. When the so-called hyper-dirichlet distributions are used as priors the marginal data likelihood has the LWF Markov properties over graphs (Dawid and Lauritzen 1993). 4 the connected components of a graph with arrows removed
57 We need to construct an irreducible Markov chain searching the space. For PDAGs we observe the next result, which is Proposition 4.5 of (Andersson et al 1997). Theorem Consider two graphs G and H with same set of nodes in S = {LPDAG}. Then there exists a finite sequence of LPDAGs G G 1,...,G k H such that each consecutive pair G i,g i+1 differs by either (i) exactly one undirected edge, (ii) exactly by one directed edge, or (iii) by two edges that form an immorality.
58 Non-reversible MCMC An algorithm, based on parallell Markov chains G j = {G tj } with the transition kernels, with the probability of a transition from a current state S to a proposed (by the mechanism above) new state S, as ( ) min 1, p(x G ). p(x G) The chains draw at independent random times a value from P (G j X). This forces the chains to move to regions of maximal posterior probability.
59 The Parallel Search Define the sequence of probabilities {α t,t = 2,3,...} according to α t = 1 q log t, where q 1 can be chosen suitably, for instance q [5,10]. Z 0 = 0, and P(Z t = 1) = α t,p(z t = 0) = 1 α t, independently for t = 1,2,....
60 The Parallel Search For each t = 0, 1,..., define the distribution P t (G tj ) = p(x G tj) m j=1 p(x G t), over the space of the current states {G t1,g t2,...,g tm }. For each t = 0, 1,... such that Z t = 1, the transition to the next state is determined according to this distribution, such that the next state for each chain is sampled to the non-reversible proposal-acceptance formulae, independently for j = 1,..., m.
61 The Parallel Search For each t, such that Z t = 0, transition to the next state S (t+1)j is determined according to the non-reversible proposal-acceptance formulae above independently for j = 1,...,m.
62 Table: Prognostic factors in coronary heart disease. B yes no F E D C A no yes no yes neg < 3 < 140 no yes > 140 no yes > 3 < 140 no yes > 140 no yes pos < 3 < 140 no yes > 140 no yes > 3 < 140 no yes > 140 no yes
63 Table: Explanations of the Labels in Table 1 Label Meaning Range A smoking no,yes B strenuous mental work no,yes C strenuous physical work no,yes D systolic blood pressure < 140, > 140 E ratio of β and α lipoproteins < 3,> 3 F family anamnesis 5 of coronary heart disease no,yes 5 information concerning a medical patient and his/her background for use in analysis of her/his condition
64 The optimal equivalence class, having the estimated posterior probability.32 F B E C A D
65 The optimal equivalence class, having the estimated posterior probability.32 and the cliques {F }, {B,C}, {A,C,E}, {A,D,E}.
66 The method enables also consistent estimation of the marginal posterior probabilities of any edges being present in the graphs, that is we estimate the marginal probabilities as {G=(V,E) {UG} G S ˆP(e X) = P(G X) t,e E} {G {UG} G S P(G X). t} (c.f. M Koivisto: Advances in Exact Bayesian Structure Discovery in Bayesian Networks 2006.) These are illustrated in Table 3 where the probability of adjacency is given for all pairs of the considered variables.
67 Table: The estimated marginal posterior probabilities of edges for the coronary heart disease data. A B C D E F A B C D E F
68 Graph Structure Logarithm of the marginal data distribution Iterations
69 For the special case of decomposable UGs, (Frydenberg and Lauritzen 1989) shows that search operators adding and deleting of single edges in decomposable UGs is sufficient to guarantee irreducibility. Corollary The Markov chain G (i) with transition mechanism defined by the kernel, Algorithm and the search operator in the Algorithm is irreducible with respect to S ={UG}.
70 The example above F B E C A D P ψ BC ψ CAE ψ EAD ψ F
71 Joint work with: parts of the talk are based on a paper to appear in Data Mining and Knowledge Discovery, 2008, joint work with Jukka Corander and Magnus Ekdahl.
5 Directed acyclic graphs
5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More information3. The Junction Tree Algorithms
A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )
More informationAlgebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year.
This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Algebra
More informationHow To Prove The Dirichlet Unit Theorem
Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if
More informationChapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.
Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. We begin by defining the ring of polynomials with coefficients in a ring R. After some preliminary results, we specialize
More informationit is easy to see that α = a
21. Polynomial rings Let us now turn out attention to determining the prime elements of a polynomial ring, where the coefficient ring is a field. We already know that such a polynomial ring is a UF. Therefore
More informationReview of Fundamental Mathematics
Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools
More informationSOLVING POLYNOMIAL EQUATIONS
C SOLVING POLYNOMIAL EQUATIONS We will assume in this appendix that you know how to divide polynomials using long division and synthetic division. If you need to review those techniques, refer to an algebra
More informationQuotient Rings and Field Extensions
Chapter 5 Quotient Rings and Field Extensions In this chapter we describe a method for producing field extension of a given field. If F is a field, then a field extension is a field K that contains F.
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationCourse: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.
More informationPartial Fractions. Combining fractions over a common denominator is a familiar operation from algebra:
Partial Fractions Combining fractions over a common denominator is a familiar operation from algebra: From the standpoint of integration, the left side of Equation 1 would be much easier to work with than
More informationThis unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.
Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course
More informationMathematics Course 111: Algebra I Part IV: Vector Spaces
Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are
More informationIntroduction to Markov Chain Monte Carlo
Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem
More informationLecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
More informationScheduling Shop Scheduling. Tim Nieberg
Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations
More informationGröbner Bases and their Applications
Gröbner Bases and their Applications Kaitlyn Moran July 30, 2008 1 Introduction We know from the Hilbert Basis Theorem that any ideal in a polynomial ring over a field is finitely generated [3]. However,
More informationa 1 x + a 0 =0. (3) ax 2 + bx + c =0. (4)
ROOTS OF POLYNOMIAL EQUATIONS In this unit we discuss polynomial equations. A polynomial in x of degree n, where n 0 is an integer, is an expression of the form P n (x) =a n x n + a n 1 x n 1 + + a 1 x
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationMath 120 Final Exam Practice Problems, Form: A
Math 120 Final Exam Practice Problems, Form: A Name: While every attempt was made to be complete in the types of problems given below, we make no guarantees about the completeness of the problems. Specifically,
More informationTree-representation of set families and applications to combinatorial decompositions
Tree-representation of set families and applications to combinatorial decompositions Binh-Minh Bui-Xuan a, Michel Habib b Michaël Rao c a Department of Informatics, University of Bergen, Norway. buixuan@ii.uib.no
More informationJust the Factors, Ma am
1 Introduction Just the Factors, Ma am The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive
More informationNEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
More informationClass One: Degree Sequences
Class One: Degree Sequences For our purposes a graph is a just a bunch of points, called vertices, together with lines or curves, called edges, joining certain pairs of vertices. Three small examples of
More informationBig Data, Machine Learning, Causal Models
Big Data, Machine Learning, Causal Models Sargur N. Srihari University at Buffalo, The State University of New York USA Int. Conf. on Signal and Image Processing, Bangalore January 2014 1 Plan of Discussion
More informationClovis Community College Core Competencies Assessment 2014 2015 Area II: Mathematics Algebra
Core Assessment 2014 2015 Area II: Mathematics Algebra Class: Math 110 College Algebra Faculty: Erin Akhtar (Learning Outcomes Being Measured) 1. Students will construct and analyze graphs and/or data
More informationLecture L3 - Vectors, Matrices and Coordinate Transformations
S. Widnall 16.07 Dynamics Fall 2009 Lecture notes based on J. Peraire Version 2.0 Lecture L3 - Vectors, Matrices and Coordinate Transformations By using vectors and defining appropriate operations between
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
More informationBasics of Polynomial Theory
3 Basics of Polynomial Theory 3.1 Polynomial Equations In geodesy and geoinformatics, most observations are related to unknowns parameters through equations of algebraic (polynomial) type. In cases where
More information1 Lecture: Integration of rational functions by decomposition
Lecture: Integration of rational functions by decomposition into partial fractions Recognize and integrate basic rational functions, except when the denominator is a power of an irreducible quadratic.
More informationNotes on Determinant
ENGG2012B Advanced Engineering Mathematics Notes on Determinant Lecturer: Kenneth Shum Lecture 9-18/02/2013 The determinant of a system of linear equations determines whether the solution is unique, without
More informationLEARNING OBJECTIVES FOR THIS CHAPTER
CHAPTER 2 American mathematician Paul Halmos (1916 2006), who in 1942 published the first modern linear algebra book. The title of Halmos s book was the same as the title of this chapter. Finite-Dimensional
More informationNETZCOPE - a tool to analyze and display complex R&D collaboration networks
The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.
More informationIRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL. 1. Introduction
IRREDUCIBLE OPERATOR SEMIGROUPS SUCH THAT AB AND BA ARE PROPORTIONAL R. DRNOVŠEK, T. KOŠIR Dedicated to Prof. Heydar Radjavi on the occasion of his seventieth birthday. Abstract. Let S be an irreducible
More information5.1 Bipartite Matching
CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson
More informationDiscrete Mathematics & Mathematical Reasoning Chapter 10: Graphs
Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 6) 1 / 13 Overview Graphs and Graph
More informationLecture 17 : Equivalence and Order Relations DRAFT
CS/Math 240: Introduction to Discrete Mathematics 3/31/2011 Lecture 17 : Equivalence and Order Relations Instructor: Dieter van Melkebeek Scribe: Dalibor Zelený DRAFT Last lecture we introduced the notion
More informationApplication. Outline. 3-1 Polynomial Functions 3-2 Finding Rational Zeros of. Polynomial. 3-3 Approximating Real Zeros of.
Polynomial and Rational Functions Outline 3-1 Polynomial Functions 3-2 Finding Rational Zeros of Polynomials 3-3 Approximating Real Zeros of Polynomials 3-4 Rational Functions Chapter 3 Group Activity:
More informationMessage-passing sequential detection of multiple change points in networks
Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationMath Review. for the Quantitative Reasoning Measure of the GRE revised General Test
Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important
More informationLECTURE 4. Last time: Lecture outline
LECTURE 4 Last time: Types of convergence Weak Law of Large Numbers Strong Law of Large Numbers Asymptotic Equipartition Property Lecture outline Stochastic processes Markov chains Entropy rate Random
More informationChapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one
More informationLecture 16 : Relations and Functions DRAFT
CS/Math 240: Introduction to Discrete Mathematics 3/29/2011 Lecture 16 : Relations and Functions Instructor: Dieter van Melkebeek Scribe: Dalibor Zelený DRAFT In Lecture 3, we described a correspondence
More informationFinding and counting given length cycles
Finding and counting given length cycles Noga Alon Raphael Yuster Uri Zwick Abstract We present an assortment of methods for finding and counting simple cycles of a given length in directed and undirected
More informationOn the mathematical theory of splitting and Russian roulette
On the mathematical theory of splitting and Russian roulette techniques St.Petersburg State University, Russia 1. Introduction Splitting is an universal and potentially very powerful technique for increasing
More informationA Sublinear Bipartiteness Tester for Bounded Degree Graphs
A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite
More information11 Multivariate Polynomials
CS 487: Intro. to Symbolic Computation Winter 2009: M. Giesbrecht Script 11 Page 1 (These lecture notes were prepared and presented by Dan Roche.) 11 Multivariate Polynomials References: MC: Section 16.6
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationLarge-Sample Learning of Bayesian Networks is NP-Hard
Journal of Machine Learning Research 5 (2004) 1287 1330 Submitted 3/04; Published 10/04 Large-Sample Learning of Bayesian Networks is NP-Hard David Maxwell Chickering David Heckerman Christopher Meek Microsoft
More informationIntroduction to Algebraic Geometry. Bézout s Theorem and Inflection Points
Introduction to Algebraic Geometry Bézout s Theorem and Inflection Points 1. The resultant. Let K be a field. Then the polynomial ring K[x] is a unique factorisation domain (UFD). Another example of a
More informationMicroeconomic Theory: Basic Math Concepts
Microeconomic Theory: Basic Math Concepts Matt Van Essen University of Alabama Van Essen (U of A) Basic Math Concepts 1 / 66 Basic Math Concepts In this lecture we will review some basic mathematical concepts
More informationStatistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
More informationa 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.
Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given
More informationLinear Maps. Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 5, 2007)
MAT067 University of California, Davis Winter 2007 Linear Maps Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 5, 2007) As we have discussed in the lecture on What is Linear Algebra? one of
More informationLecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization
Lecture 2. Marginal Functions, Average Functions, Elasticity, the Marginal Principle, and Constrained Optimization 2.1. Introduction Suppose that an economic relationship can be described by a real-valued
More informationFactorization Algorithms for Polynomials over Finite Fields
Degree Project Factorization Algorithms for Polynomials over Finite Fields Sajid Hanif, Muhammad Imran 2011-05-03 Subject: Mathematics Level: Master Course code: 4MA11E Abstract Integer factorization is
More informationSolving Systems of Linear Equations
LECTURE 5 Solving Systems of Linear Equations Recall that we introduced the notion of matrices as a way of standardizing the expression of systems of linear equations In today s lecture I shall show how
More information! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More information4.5 Linear Dependence and Linear Independence
4.5 Linear Dependence and Linear Independence 267 32. {v 1, v 2 }, where v 1, v 2 are collinear vectors in R 3. 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S, then
More informationMCS 563 Spring 2014 Analytic Symbolic Computation Wednesday 9 April. Hilbert Polynomials
Hilbert Polynomials For a monomial ideal, we derive the dimension counting the monomials in the complement, arriving at the notion of the Hilbert polynomial. The first half of the note is derived from
More informationDesign of LDPC codes
Design of LDPC codes Codes from finite geometries Random codes: Determine the connections of the bipartite Tanner graph by using a (pseudo)random algorithm observing the degree distribution of the code
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationFigure 1.1 Vector A and Vector F
CHAPTER I VECTOR QUANTITIES Quantities are anything which can be measured, and stated with number. Quantities in physics are divided into two types; scalar and vector quantities. Scalar quantities have
More informationIE 680 Special Topics in Production Systems: Networks, Routing and Logistics*
IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti
More informationI. GROUPS: BASIC DEFINITIONS AND EXAMPLES
I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called
More information1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
More information1.7. Partial Fractions. 1.7.1. Rational Functions and Partial Fractions. A rational function is a quotient of two polynomials: R(x) = P (x) Q(x).
.7. PRTIL FRCTIONS 3.7. Partial Fractions.7.. Rational Functions and Partial Fractions. rational function is a quotient of two polynomials: R(x) = P (x) Q(x). Here we discuss how to integrate rational
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationEquations, Inequalities & Partial Fractions
Contents Equations, Inequalities & Partial Fractions.1 Solving Linear Equations 2.2 Solving Quadratic Equations 1. Solving Polynomial Equations 1.4 Solving Simultaneous Linear Equations 42.5 Solving Inequalities
More informationINTERPOLATION. Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y).
INTERPOLATION Interpolation is a process of finding a formula (often a polynomial) whose graph will pass through a given set of points (x, y). As an example, consider defining and x 0 =0, x 1 = π 4, x
More information. 0 1 10 2 100 11 1000 3 20 1 2 3 4 5 6 7 8 9
Introduction The purpose of this note is to find and study a method for determining and counting all the positive integer divisors of a positive integer Let N be a given positive integer We say d is a
More informationON THE COEFFICIENTS OF THE LINKING POLYNOMIAL
ADSS, Volume 3, Number 3, 2013, Pages 45-56 2013 Aditi International ON THE COEFFICIENTS OF THE LINKING POLYNOMIAL KOKO KALAMBAY KAYIBI Abstract Let i j T( M; = tijx y be the Tutte polynomial of the matroid
More informationMATH 90 CHAPTER 6 Name:.
MATH 90 CHAPTER 6 Name:. 6.1 GCF and Factoring by Groups Need To Know Definitions How to factor by GCF How to factor by groups The Greatest Common Factor Factoring means to write a number as product. a
More informationUnified Lecture # 4 Vectors
Fall 2005 Unified Lecture # 4 Vectors These notes were written by J. Peraire as a review of vectors for Dynamics 16.07. They have been adapted for Unified Engineering by R. Radovitzky. References [1] Feynmann,
More informationThe Method of Partial Fractions Math 121 Calculus II Spring 2015
Rational functions. as The Method of Partial Fractions Math 11 Calculus II Spring 015 Recall that a rational function is a quotient of two polynomials such f(x) g(x) = 3x5 + x 3 + 16x x 60. The method
More informationComplexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar
Complexity Theory IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Outline Goals Computation of Problems Concepts and Definitions Complexity Classes and Problems Polynomial Time Reductions Examples
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationA Turán Type Problem Concerning the Powers of the Degrees of a Graph
A Turán Type Problem Concerning the Powers of the Degrees of a Graph Yair Caro and Raphael Yuster Department of Mathematics University of Haifa-ORANIM, Tivon 36006, Israel. AMS Subject Classification:
More informationFactoring of Prime Ideals in Extensions
Chapter 4 Factoring of Prime Ideals in Extensions 4. Lifting of Prime Ideals Recall the basic AKLB setup: A is a Dedekind domain with fraction field K, L is a finite, separable extension of K of degree
More informationContinued Fractions. Darren C. Collins
Continued Fractions Darren C Collins Abstract In this paper, we discuss continued fractions First, we discuss the definition and notation Second, we discuss the development of the subject throughout history
More informationLecture 1: Course overview, circuits, and formulas
Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik
More informationNOTES ON LINEAR TRANSFORMATIONS
NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all
More informationFactorization in Polynomial Rings
Factorization in Polynomial Rings These notes are a summary of some of the important points on divisibility in polynomial rings from 17 and 18 of Gallian s Contemporary Abstract Algebra. Most of the important
More informationMachine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
More informationPUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
More informationMath 4310 Handout - Quotient Vector Spaces
Math 4310 Handout - Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
More informationCatalan Numbers. Thomas A. Dowling, Department of Mathematics, Ohio State Uni- versity.
7 Catalan Numbers Thomas A. Dowling, Department of Mathematics, Ohio State Uni- Author: versity. Prerequisites: The prerequisites for this chapter are recursive definitions, basic counting principles,
More informationSo let us begin our quest to find the holy grail of real analysis.
1 Section 5.2 The Complete Ordered Field: Purpose of Section We present an axiomatic description of the real numbers as a complete ordered field. The axioms which describe the arithmetic of the real numbers
More informationSolving Simultaneous Equations and Matrices
Solving Simultaneous Equations and Matrices The following represents a systematic investigation for the steps used to solve two simultaneous linear equations in two unknowns. The motivation for considering
More informationMining Social Network Graphs
Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand
More information1 = (a 0 + b 0 α) 2 + + (a m 1 + b m 1 α) 2. for certain elements a 0,..., a m 1, b 0,..., b m 1 of F. Multiplying out, we obtain
Notes on real-closed fields These notes develop the algebraic background needed to understand the model theory of real-closed fields. To understand these notes, a standard graduate course in algebra is
More informationPYTHAGOREAN TRIPLES KEITH CONRAD
PYTHAGOREAN TRIPLES KEITH CONRAD 1. Introduction A Pythagorean triple is a triple of positive integers (a, b, c) where a + b = c. Examples include (3, 4, 5), (5, 1, 13), and (8, 15, 17). Below is an ancient
More information1. Prove that the empty set is a subset of every set.
1. Prove that the empty set is a subset of every set. Basic Topology Written by Men-Gen Tsai email: b89902089@ntu.edu.tw Proof: For any element x of the empty set, x is also an element of every set since
More informationMarkov random fields and Gibbs measures
Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationAlgebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard
Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express
More information