Perron vector Optimization applied to search engines




Perron vector optimization applied to search engines
Olivier Fercoq, INRIA Saclay and CMAP, Ecole Polytechnique
May 18, 2011

Web page ranking

The core of search engines. Semantic rankings (keywords) and hyperlink-based rankings:
- PageRank [Brin and Page, 1998]
- HITS [Kleinberg, 1998]
- SALSA [Lempel and Moran, 2000], HOTS [Tomlin, 2003]
Webmasters want to be at the top of the list.

Toy example with 21 pages

[Graph figure: nodes = web pages, arcs = hyperlinks; page 21 is a controlled page, page 1 a non-controlled page.]

Definition of PageRank [Brin and Page, 1998]

An important page is a page linked to by important pages:
π_i = Σ_{j: j→i} π_j / N_j = Σ_j π_j S_{j,i},    π_i = popularity of page i

This is a model for the behaviour of a random surfer. The Markov chain model may be reducible, so with probability 1 − α the surfer gets bored and resets:
P_{i,j} = α S_{i,j} + (1 − α) z_j,    P_{i,j} > 0 for all i, j    (α = 0.85)

PageRank is the unique invariant measure π of P.
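As a concrete illustration (not part of the talk), the invariant measure of P = αS + (1 − α)𝟙z^T can be computed by a plain power iteration; the hyperlink matrix S and teleportation vector z below are made-up toy data.

```python
import numpy as np

def pagerank(S, z, alpha=0.85, tol=1e-12, max_iter=1000):
    """Invariant measure of P = alpha*S + (1-alpha)*ones*z^T.

    S is the row-stochastic hyperlink matrix, z the teleportation vector.
    Since pi sums to 1, pi^T P = alpha * pi^T S + (1-alpha) * z^T.
    """
    n = S.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        pi_new = alpha * (pi @ S) + (1 - alpha) * z
        if np.abs(pi_new - pi).sum() < tol:
            break
        pi = pi_new
    return pi

# Toy 3-page graph: 1 -> 2, 2 -> 3, 3 -> 1 and 3 -> 2.
S = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
z = np.full(3, 1.0 / 3.0)
pi = pagerank(S, z)
```

Because P > 0, the chain is irreducible and aperiodic, so pi is its unique stationary distribution.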

Definition of HITS [Kleinberg, 1998]

Given a query, we build a subgraph of the web graph focused on the query. Hub scores y and authority scores x:
ρ y_j = Σ_{i: j→i} x_i,    ρ x_i = Σ_{j: j→i} y_j

With A the adjacency matrix of the subgraph: A^T A x = ρ² x. Uniqueness is guaranteed with (A^T A + ξ e e^T) x = ρ² x.

Generalization and application to synonyms by [Blondel, Gajardo, Heymans, Senellart and Van Dooren, 2004].
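A minimal sketch (toy data, not from the talk) of the authority computation, including the ξee^T regularization that guarantees uniqueness:

```python
import numpy as np

def hits_authority(A, xi=1e-6, n_iter=500):
    """Authority vector: normalized dominant eigenvector of A^T A + xi*e e^T."""
    n = A.shape[1]
    M = A.T @ A + xi * np.ones((n, n))
    x = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(n_iter):
        x = M @ x
        x /= np.linalg.norm(x)
    return x

# Toy graph: pages 1 and 2 both point to page 3,
# so page 3 should get the highest authority score.
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)
x = hits_authority(A)
```

The hub scores can then be recovered as A @ x up to normalization.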

The problem

Maximize the ranking, with controlled pages and non-controlled pages. Effort in the literature has concentrated on PageRank:
- Avrachenkov and Litvak, 2006: one page
- Mathieu and Viennot, 2006: unconstrained problem
- de Kerchove, Ninove, Van Dooren, 2008: qualitative result
- Ishii and Tempo, 2010: computation with fragile data
- Csáji, Jungers and Blondel, 2010: optimization algorithm
- OF, Akian, Bouhtou, Gaubert, 2010 (arXiv:1011.2348v1)

PageRank optimization

Obligatory links O, optional links F, prohibited links I [Ishii and Tempo, 2010]. Discrete problem: 2^n hyperlink configurations per controlled page. Utility:
f_PR(π, P) = Σ_{i,j} r_{i,j} π_i P_{i,j}
where r_{i,j} is viewed as the reward for a click on link i→j.

Admissible transition matrices (PageRank)

Theorem (OF, Akian, Bouhtou, Gaubert). The convex hull of the set of admissible transition probabilities of a controlled page i is either a simplex or a polyhedron defined by:
x_j = (1 − α) z_j for j ∈ I_i
x_j = x_{j_0} for j ∈ O_i \ {j_0}
(1 − α) z_j ≤ x_j ≤ x_{j_0} for j ∈ F_i
Σ_{j ∈ [n]} x_j = 1

The discrete problem is equivalent to a continuous problem [Csáji, Jungers and Blondel, 2010]: graph rewriting.

Reduction to ergodic control

Proposition. The PageRank optimization problem is equivalent to the ergodic control problem:

max_{(ν_t)_{t≥0}} liminf_{T→+∞} (1/T) E( Σ_{t=0}^{T−1} r_{X_t, X_{t+1}} )
subject to ν_t ∈ P_{X_t} for t ≥ 0, and
P(X_{t+1} = j | X_t = i, ν_t = p) = p_j for all i, j ∈ [n], p ∈ P_i, t ≥ 0,

where ν_t is a function of the history (X_0, ν_0, ..., X_{t−1}, ν_{t−1}, X_t).

Corollary. The PageRank optimization problem with local constraints is solvable in polynomial time.

Resolution by dynamic programming

The ergodic dynamic programming equation
w_i + ψ = max_{ν ∈ P_i} ν (r_{i,·} + w),  i ∈ [n]   (1)
has a solution (w, ψ) ∈ R^n × R. The constant ψ is unique and is the value of the ergodic control problem. An optimal strategy is obtained by selecting for each state i a maximizing ν ∈ P_i.

The unique solution of the discounted equation
w_i = max_{ν: αν + (1−α)z ∈ P_i} α ν (r_{i,·} + w) + (1 − α) z r_{i,·},  i ∈ [n]   (2)
is a solution of (1) with ψ = (1 − α) z w. The fixed-point scheme for (2) has contraction factor α, independent of the dimension.
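The contraction property makes the discounted equation (2) easy to solve by fixed-point iteration. The sketch below uses a tiny made-up instance where each action set P_i is replaced by a finite set of candidate hyperlink rows (the vertices of the polytope); all data are illustrative.

```python
import numpy as np

def discounted_value_iteration(actions, r, z, alpha=0.85, tol=1e-12):
    """Solve w_i = max_{s in actions[i]} alpha*s@(r[i]+w) + (1-alpha)*z@r[i].

    actions[i] is a finite set of candidate hyperlink rows s (the transition
    row is then alpha*s + (1-alpha)*z); r[i] is the reward row of page i.
    The map is an alpha-contraction in the sup norm, so iteration converges
    at rate alpha regardless of the dimension.
    """
    n = len(actions)
    w = np.zeros(n)
    while True:
        w_new = np.array([
            max(alpha * (s @ (r[i] + w)) + (1 - alpha) * (z @ r[i])
                for s in actions[i])
            for i in range(n)
        ])
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new

# Two pages; page 0 chooses between linking to itself or to page 1.
actions = [[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
           [np.array([1.0, 0.0])]]
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
z = np.array([0.5, 0.5])
w = discounted_value_iteration(actions, r, z)
```

The returned w satisfies the Bellman equation up to the tolerance.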

Optimality criterion

For P ∈ P, define v(P) = (I_n − αS)^{−1} r̄, where r̄_i = Σ_j P_{i,j} r_{i,j}. If r_{i,j} = r_i, v(P) is the mean reward before teleportation considered by [de Kerchove, Ninove, Van Dooren, 2008]. (v_j(P) + r_{i,j}) π_i(P) is the derivative of the objective at P.

Proposition. P* ∈ P is an optimum of the PageRank optimization problem if and only if
for all i and all δ ∈ R^n such that P*_{i,·} + δ ∈ P_i, we have Σ_j (v_j(P*) + r_{i,j}) δ_j ≤ 0;
equivalently, v(P*) is a solution of the dynamic programming equation.

Existence of a master page

From the optimality criterion, we get:

Theorem (OF, Akian, Bouhtou, Gaubert). If r_{i,j} = r_i for all i, j, there is a single page to which all controlled pages should link.

[de Kerchove, Ninove and Van Dooren, 2008]: similar result with less flexible constraints.

Web graph optimized for PageRank

[Graph figure: the toy example with the added links highlighted. PageRank sum of the controlled pages: 0.0997 → 0.1694.]

HITS optimization

A choice of optional links gives an adjacency matrix A. Authority vector:
ρ u = (A^T A + ξ e e^T) u = h_H(A) u
f_H(u) = Σ_i r_i u_i²,  with Σ_i u_i² = 1

HITS optimization, even on weighted graphs, is still nonconvex. We abandon global optimality, but obtain efficient and scalable computation.

Perron vector optimization problem

We study the following Perron vector optimization problem:
max_{M ∈ h(C)} f(u(M))
where h: R^m → R^{n×n}_+ and f: R^n_+ → R are differentiable, u: R^{n×n}_+ → R^n_+ is the function that associates to an irreducible matrix its normalized Perron vector, C is convex, and h(C) is a set of nonnegative irreducible matrices.

NP-hardness

Consider the following problems, where A is a linear function of the entries and b a vector:
(P1) min ρ(M) s.t. A(M) ≤ b, M ≥ 0
(P2) min f(u(M)) s.t. A(M) ≤ b, M ≥ 0

Proposition. Problems (P1) and (P2) are NP-hard.

Proof. The linear multiplicative programming problem (LMP), proved NP-hard by Matsui, reduces to (P1) and (P2):
(LMP) min x_1 x_2 s.t. Ax ≤ b, x ≥ 0
via the matrix
M = [ 0    0    1
      x_1  0    0
      0    x_2  0 ]
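To see why the reduction works: the 3×3 cycle matrix built from x has characteristic polynomial t³ − x₁x₂, so minimizing its spectral radius amounts to minimizing (x₁x₂)^{1/3}. A quick numerical check on toy values (not from the talk):

```python
import numpy as np

def lmp_matrix(x1, x2):
    # Cycle matrix of the reduction: det(tI - M) = t^3 - x1*x2,
    # hence rho(M) = (x1*x2)**(1/3).
    return np.array([[0.0, 0.0, 1.0],
                     [x1,  0.0, 0.0],
                     [0.0, x2,  0.0]])

x1, x2 = 2.0, 4.0
rho = max(abs(np.linalg.eigvals(lmp_matrix(x1, x2))))
```

Here ρ(M) = (2·4)^{1/3} = 2, since the eigenvalues are the three cube roots of x₁x₂.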

Derivative for Perron vector optimization

Computing g_ij = ∇f(u)^T ∂u/∂M_ij is enough, thanks to the chain rule.

Proposition (Deutsch and Neumann). Denote R = (M − ρI)^# the reduced resolvent at ρ and p the gradient of the normalization at u. Then
∂u/∂M_ij (M) = −R e_i u_j + (p^T R e_i u_j) u

Corollary. Let w^T = (−∇f^T + (∇f^T u) p^T) R. Then
g_ij = ∇f^T ∂u/∂M_ij (M) = w_i u_j

Iterative scheme for the derivative

Proposition. Let M be a primitive matrix with right and left Perron vectors u and v. Denote M̃ = (1/ρ) M, P = u v^T and z = (1/ρ)(∇f^T − (∇f^T u) p^T). Then the fixed-point scheme
w_{k+1} = (z + w_k M̃)(I − P),  k ∈ N
converges at geometric speed to w^T = (−∇f^T + (∇f^T u) p^T)(M − ρI)^#.
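A sketch of this scheme on a made-up positive matrix, with the Perron data computed by a dense eigensolver for checking. The minus signs follow the standard first-order perturbation of the Perron vector (the transcript's rendering lost them), p = u is the gradient of the Euclidean normalization, and the final lines compare w_i u_j against a central finite difference; everything below is illustrative rather than the talk's implementation.

```python
import numpy as np

def perron(M):
    """Euclidean-normalized right Perron vector and Perron root of M > 0."""
    vals, vecs = np.linalg.eig(M)
    k = np.argmax(vals.real)
    u = np.abs(vecs[:, k].real)
    return u / np.linalg.norm(u), vals[k].real

def derivative_w(M, grad_f, n_iter=500):
    """Fixed-point scheme w_{k+1} = (z + w_k Mt)(I - P) for the row vector
    w such that d f(u(M)) / d M_ij = w_i * u_j."""
    u, rho = perron(M)
    v, _ = perron(M.T)          # left Perron vector
    v = v / (v @ u)             # scale so that v @ u = 1
    g = grad_f(u)
    z = (g - (g @ u) * u) / rho  # p = u for the l2 normalization
    Mt, P, I = M / rho, np.outer(u, v), np.eye(len(u))
    w = np.zeros_like(u)
    for _ in range(n_iter):
        w = (z + w @ Mt) @ (I - P)
    return w, u

# Toy check against a central finite difference, with f(u) = sum(r * u**2).
rng = np.random.default_rng(0)
M = rng.random((4, 4)) + 0.1
r = np.array([1.0, 2.0, 0.5, 3.0])
w, u = derivative_w(M, lambda u: 2.0 * r * u)
i, j, h = 1, 2, 1e-6
E = np.zeros_like(M); E[i, j] = h
f = lambda A: float(r @ (perron(A)[0] ** 2))
fd = (f(M + E) - f(M - E)) / (2.0 * h)
```

The point of the scheme is that it only needs matrix-vector products with M, never the dense reduced resolvent.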

Descent algorithm

First-order descent method, with a criterion to go to the next gradient step before convergence of the power iterations. Value and derivative are computed all together:

u_{k+1} = M u_k / N(M u_k)
v_{k+1} = v_k M / (v_k M u_{k+1})
z_k = (1/ρ_k)(∇f(u_k)^T − (∇f(u_k)^T u_k) p_k^T)
w_{k+1} = (z_k + (1/ρ_k) w_k M)(I − u_{k+1} v_{k+1})

Coupling power and gradient iterations

[Flowchart: starting from α = α₀, build the matrix M_α and run power iterations to evaluate f; if the Armijo-type test f̲_α > f − σα‖δ‖² succeeds, accept the gradient step; if f̄_α ≤ f − σα‖δ‖², shrink the step, α ← αβ, and rebuild M_α; otherwise continue the power iterations.]

Convergence

Proposition. The coupled power + gradient algorithm converges to a stationary point.

Efficiency depends on the quality of the bounds in the Armijo test. For HITS:
f̲ = f(u_k) − ‖r‖_∞ ‖u − u_k‖₂² − 2 ‖r u_k‖₂ ‖u − u_k‖₂
f̄ = f(u_k) + ‖r‖_∞ ‖u − u_k‖₂² + 2 ‖r u_k‖₂ ‖u − u_k‖₂
f̲ ≤ f(u) ≤ f̄

Bounds

Lemma. Let M be a nonnegative, symmetric matrix with Mu = ρu. Let u_0 > 0, ‖u_0‖₂ = 1, u_1 = M u_0 / ‖M u_0‖₂ and a = (max_i (M u_0)_i / (u_0)_i)² / ‖M u_0‖₂² − 1. Then
‖u − u_1‖₂² ≤ 13 a (σ₂/ρ)² / (1 − σ₂/ρ)²
where σ₂ is the second singular value of M.

The proof uses [Collatz, 1942]: for M ≥ 0 and u > 0,
min_i (Mu)_i / u_i ≤ ρ(M) ≤ max_i (Mu)_i / u_i
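The Collatz (1942) inclusion is easy to check numerically, and it also shows why a power step can only tighten the bracket: if Mu ≤ r·u entrywise then M(Mu) ≤ r·(Mu) entrywise, so the max ratio is non-increasing (and the min ratio non-decreasing) along power iterations. Toy data, for illustration only:

```python
import numpy as np

def collatz_bounds(M, u):
    """Collatz inclusion: min_i (Mu)_i/u_i <= rho(M) <= max_i (Mu)_i/u_i
    for M >= 0 irreducible and u > 0."""
    q = (M @ u) / u
    return q.min(), q.max()

M = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])
u0 = np.ones(3)
lo, hi = collatz_bounds(M, u0)
lo1, hi1 = collatz_bounds(M, M @ u0)   # one power step tightens the bracket
rho = max(abs(np.linalg.eigvals(M)))
```

Such inclusion bounds are what allow the Armijo test to be decided before the power iterations have fully converged.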

Web graph optimized for HITS

[Graph figure: the toy example with added internal links and added external links highlighted. HITS authority sum: 0.0555 → 0.3471.]

Experimental results

A fragment of the real web graph with 1,500 pages (nodes) and 18,398 hyperlinks (arcs); the controlled web site has 49 pages.

                    PageRank              HITS authority
Method              Dynamic programming   Gradient algorithm
Initial value       0.0185                0.00000378
Value with clique   0.0693                0.0000169
Final value         0.0830                0.2266
Execution time      1.27 s                1600 s

Sequential Matlab code on an Intel Xeon CPU at 2.98 GHz.

Numerical experiment for HITS optimization

10,725 descent iterations, 10,776 matrix buildings, 1,737,677 power iterations; the value 0.21 is reached in only 150 iterations.

Conclusion

PageRank:
- Global optimality, discrete solution
- Very fast and scalable algorithm
- Qualitative results on the optimal solutions

HITS authority:
- No global optimality result
- No qualitative results on the optimal solutions
- Non-discrete locally optimal solutions
- Efficient and scalable descent algorithm

Application in population dynamics, with Billy, Clairambault, Gaubert and Lepoutre.