CheiRank versus PageRank



Dima Shepelyansky (CNRS, Toulouse), www.quantware.ups-tlse.fr/dima
with Alexei Chepelianskii (Cambridge Univ.), Leonardo Ermann, Klaus Frahm, Bertrand Georgeot (CNRS, Toulouse)

Through mechanisms still only partially understood, Google finds instantaneously the page you need.

The story started in 1998 (now $N \sim 10^{11}$ nodes): S. Brin and L. Page, Computer Networks and ISDN Systems 30, 107 (1998).

(Quantware group, CNRS, Toulouse) Google Krakow, May 23, 2011

How Google works
Markov chains and directed networks

[Figure: example directed network with 7 nodes, labelled 1 to 7.]

Weighted adjacency matrix:

$$
S = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 1/2 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1/2 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
$$

For a directed network with $N$ nodes the adjacency matrix $A$ is defined as $A_{ij} = 1$ if there is a link from node $j$ to node $i$, and $A_{ij} = 0$ otherwise.

The weighted adjacency matrix is $S_{ij} = A_{ij} / \sum_k A_{kj}$. In addition, the elements of columns containing only zeros are replaced by $1/N$.
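The construction of $S$ can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_adjacency` and the link list `LINKS` are assumptions, the latter read off from the nonzero entries of the example matrix rather than given on the slide. Exact fractions are used so that the entries can be compared with the matrix directly.

```python
from fractions import Fraction

def weighted_adjacency(n, links):
    """Build S from (source j, target i) links, nodes numbered 1..n.

    Hypothetical helper: assumes every link appears only once.
    """
    A = [[0] * n for _ in range(n)]
    for j, i in links:
        A[i - 1][j - 1] = 1  # A_ij = 1 for a link from node j to node i
    S = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        out = sum(A[k][j] for k in range(n))  # sum_k A_kj, outgoing links of j
        for i in range(n):
            # A column of zeros (dangling node) is replaced by 1/N
            S[i][j] = Fraction(A[i][j], out) if out else Fraction(1, n)
    return S

# Links inferred from the 7-node example matrix (an assumption)
LINKS = [(1, 2), (1, 3), (1, 4), (2, 6), (4, 3), (4, 5), (5, 4), (6, 4), (7, 4)]
S = weighted_adjacency(7, LINKS)
```

With the dangling-node rule applied, every column of `S` sums to one, which is what makes $S$ the transition matrix of a Markov chain.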

How Google works
Google matrix and computation of PageRank

$p = Sp$: $p$ is the stationary vector of $S$; it can be computed by iteration of $S$.

To remove convergence problems, replace columns of 0 (dangling nodes) by $1/N$. In our example,

$$
S = \begin{pmatrix}
0 & 0 & 1/7 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/7 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/7 & 1/2 & 0 & 0 & 0 \\
1/3 & 0 & 1/7 & 0 & 1 & 1 & 1 \\
0 & 0 & 1/7 & 1/2 & 0 & 0 & 0 \\
0 & 1 & 1/7 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/7 & 0 & 0 & 0 & 0
\end{pmatrix}
$$

To remove degeneracies of $\lambda = 1$, replace $S$ by the Google matrix

$$
G = \alpha S + (1 - \alpha) \frac{E}{N}; \qquad G p = \lambda p \;\Rightarrow\; \text{Perron-Frobenius operator}
$$

$\alpha$ models a random surfer with a random jump after approximately 6 clicks (usually $\alpha = 0.85$). The PageRank vector is $p$ at $\lambda = 1$ ($\sum_j p_j = 1$).

CheiRank: $S$ with inverted link directions, proposed in A. D. Chepelianskii, arXiv:1003.5455 (2010).
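The iteration of $G$ can be illustrated with a small power-iteration sketch. It is not the authors' code: the helper names, the tolerance and the iteration cap are assumptions, $E$ is taken to be the matrix with all elements equal to 1, and the link list is inferred from the 7-node example matrix. CheiRank is obtained here by running the same routine on the network with every link reversed.

```python
def column_stochastic(n, links):
    """S for (source j, target i) links, nodes 1..n; links assumed distinct."""
    out = [0] * n
    for j, _ in links:
        out[j - 1] += 1
    S = [[0.0] * n for _ in range(n)]
    for j, i in links:
        S[i - 1][j - 1] = 1.0 / out[j - 1]
    for j in range(n):
        if out[j] == 0:  # dangling node: uniform column 1/N
            for i in range(n):
                S[i][j] = 1.0 / n
    return S

def stationary_vector(S, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate p -> G p with G = alpha*S + (1 - alpha)*E/N until p stops changing."""
    n = len(S)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        # (E p)_i = sum_j p_j = 1, so the E/N term adds (1 - alpha)/N to each entry
        q = [alpha * sum(S[i][j] * p[j] for j in range(n)) + (1.0 - alpha) / n
             for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p

# Links inferred from the 7-node example matrix (an assumption)
LINKS = [(1, 2), (1, 3), (1, 4), (2, 6), (4, 3), (4, 5), (5, 4), (6, 4), (7, 4)]
pagerank = stationary_vector(column_stochastic(7, LINKS))
# CheiRank: same procedure with inverted link directions
cheirank = stationary_vector(column_stochastic(7, [(i, j) for j, i in LINKS]))
```

In this toy network node 4, which collects links from nodes 1, 5, 6 and 7, ends up with the largest PageRank. For real networks a sparse representation would replace the dense matrix used here.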

Models of real networks

Real networks are characterized by:
- small world property: average distance between 2 nodes $\sim \log N$
- scale-free property: distribution of the number of incoming or outgoing links $P(k) \sim k^{-\nu}$ ($\nu \approx 2.1$ (in); $2.7$ (out))

This can be explained by a twofold mechanism:
- constant growth: new nodes appear regularly and are attached to the network
- preferential attachment: nodes are preferentially linked to already highly connected vertices.

PageRank vector for large WWW (proportional to ingoing links): $p_j \sim 1/j^{\beta}$, where $j$ is the ordered index. The number of nodes $N_n$ with PageRank $p$ scales as $N_n \sim 1/p^{\nu}$, with numerical values $\nu = 1 + 1/\beta \approx 2.1$ and $\beta \approx 0.9$.

CheiRank vector is proportional to outgoing links: $\beta \approx 0.6$.
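The relation $\nu = 1 + 1/\beta$ follows from a one-line change of variables. The derivation below is a sketch that assumes a pure power law over the whole range of $j$, which real data satisfy only approximately.

```latex
% Assume p_j \sim j^{-\beta}. The number of nodes with PageRank above p is
% the rank index j(p) itself:
j(p) \sim p^{-1/\beta}
% The number of nodes with PageRank in [p, p + dp] is the density |dj/dp|:
N_n \sim \left| \frac{dj}{dp} \right| \sim p^{-1/\beta - 1} = p^{-\nu},
\qquad \nu = 1 + \frac{1}{\beta}
% Numerical check against the quoted values:
% \beta = 0.9 gives \nu = 1 + 1/0.9 \approx 2.1 (ingoing links, PageRank);
% \nu = 2.7 gives \beta = 1/(2.7 - 1) \approx 0.6 (outgoing links, CheiRank).
```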

Linux Kernel Network

Procedure call network for Linux

[Figure, left: link distributions $P_{out}$ and $P_{in}$, with power-law guides $\nu^{-3}$, $\nu^{-5}$ (distinct only) and $\nu^{-2}$. Right: PageRank $\rho$ and influence PageRank $\rho^*$ versus rank, with guides $\rho \sim K^{-1}$ and $\rho^* \sim K^{*\,-1/2}$.]

PageRank $\rho(i)$: printk(): 0.024; memset(): 0.012; kfree(): 0.011
Influence PageRank $\rho^*(i)$: start_kernel(): $2.80 \times 10^{-4}$; btrfs_ioctl(): $2.55 \times 10^{-4}$; menu_finalize(): $2.50 \times 10^{-4}$

Links distribution (left); PageRank and inverse PageRank (CheiRank) distribution (right) for Linux versions up to 2.6.32 with $N = 285509$ ($\rho \sim 1/j^{\beta}$, $\beta = 1/(\nu - 1)$). [Chepelianskii, arXiv:1003.5455]

Two-dimensional ranking of Wikipedia articles

Wikipedia English articles, $N = 3282257$, dated Aug 18, 2009

[Figure: $\ln P$, $\ln P^*$ versus $\ln K$, $\ln K^*$.]

Dependence of the probability of PageRank $P$ (red) and CheiRank $P^*$ (blue) on the corresponding rank indexes $K$, $K^*$; lines show slopes $\beta = 1/(\nu - 1)$ with $\beta = 0.92$; $0.57$ respectively for $\nu = 2.09$; $2.76$. [Zhirov, Zhirov, DS, arXiv:1006.4270]

Two-dimensional ranking of Wikipedia articles

Density distribution in the plane of PageRank and CheiRank indexes ($\ln K$, $\ln K^*$):
(a) 100 top countries from 2DRank (red), 100 top from SJR (yellow), 30 Dow-Jones companies (cyan);
(b) 100 top universities from 2DRank (red) and Shanghai (yellow);
(c) 100 top personalities from PageRank (green), CheiRank (red) and Hart book (yellow);
(d) 758 physicists (green) and 193 Nobel laureates (red).

Wikipedia ranking of physicists

Physicists:
PageRank: 1. Aristotle, 2. Albert Einstein, 3. Isaac Newton, 4. Thomas Edison, 5. Benjamin Franklin, 6. Gottfried Leibniz, 7. Avicenna, 8. Carl Friedrich Gauss, 9. Galileo Galilei, 10. Nikola Tesla.
2DRank: 1. Albert Einstein, 2. Nikola Tesla, 3. Benjamin Franklin, 4. Avicenna, 5. Isaac Newton, 6. Thomas Edison, 7. Stephen Hawking, 8. Gottfried Leibniz, 9. Richard Feynman, 10. Aristotle.
CheiRank: 1. Hubert Reeves, 2. Shen Kuo, 3. Stephen Hawking, 4. Nikola Tesla, 5. Albert Einstein, 6. Arthur Stanley Eddington, 7. Richard Feynman, 8. John Joseph Montgomery, 9. Josiah Willard Gibbs, 10. Heinrich Hertz.

Nobel laureates:
PageRank: 1. Albert Einstein, 2. Enrico Fermi, 3. Richard Feynman, 4. Max Planck, 5. Guglielmo Marconi, 6. Werner Heisenberg, 7. Marie Curie, 8. Niels Bohr, 9. Paul Dirac, 10. J. J. Thomson.
2DRank: 1. Albert Einstein, 2. Richard Feynman, 3. Werner Heisenberg, 4. Enrico Fermi, 5. Max Born, 6. Marie Curie, 7. Wolfgang Pauli, 8. Max Planck, 9. Eugene Wigner, 10. Paul Dirac.
CheiRank: 1. Albert Einstein, 2. Richard Feynman, 3. Werner Heisenberg, 4. Brian David Josephson, 5. Abdus Salam, 6. C. V. Raman, 7. Peter Debye, 8. Enrico Fermi, 9. Wolfgang Pauli, 10. Steven Weinberg.

Ranking of World Trade

[Figure: a) 2008, all commodities; axes $K$ and $K^*$, each from 1 to 200.]

L. Ermann, DS, arXiv:1105.5027

Rank of Poland

[Figure: PageRank $K$ and CheiRank $K^*$ of Poland versus years, 1960 to 2010.]

Rank table 2008 (74% of countries of G20)

Towards two-dimensional search engine

Node density distribution in the PageRank ($\log_N K$) and CheiRank ($\log_N K^*$) plane for Univ. of Cambridge (left) and Oxford (right) in 2006 ($N \approx 200000$) [Ermann, Chepelianskii, DS, in preparation]

Towards two-dimensional search engine

Node density distribution in the PageRank ($\log_N K$) and CheiRank ($\log_N K^*$) plane for Wikipedia (left) and Linux 2.4 (right) [Ermann, Chepelianskii, DS, in preparation]

Correlator of PageRank and CheiRank

[Figure: correlator $\kappa$ versus network size $N$ (from $10^2$ to $10^7$) for Wikipedia, Brain Model, British Universities, Kernel Linux, Yeast Transcription, Esch. Coli Transcr., Business Proc. Man.]

$$
\kappa = N \sum_i P(K(i))\, P^*(K^*(i)) - 1
$$

[Ermann, Chepelianskii, DS, in preparation]
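Computing $\kappa$ is direct once both vectors are stored per node. The sketch below assumes `pagerank[i]` and `cheirank[i]` hold $P(K(i))$ and $P^*(K^*(i))$ for the same node $i$, each normalized to sum to one; the function name is hypothetical.

```python
def rank_correlator(pagerank, cheirank):
    """kappa = N * sum_i P(K(i)) * P*(K*(i)) - 1, both vectors indexed by node."""
    n = len(pagerank)
    if n != len(cheirank):
        raise ValueError("both vectors must cover the same nodes")
    return n * sum(p * q for p, q in zip(pagerank, cheirank)) - 1.0

# Toy vectors (made up for illustration, not data from the talk)
uniform = [0.25, 0.25, 0.25, 0.25]
aligned = rank_correlator([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
opposed = rank_correlator([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])
```

If either vector is uniform the sum equals $1/N$ and $\kappa = 0$. Nodes that rank high in both vectors push $\kappa$ above zero, while opposite orderings make it negative.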

Flows in two dimensions

$N \times N$ plane of $(K, K^*)$ for Univ. of Cambridge (2006) and Wikipedia [Ermann, Chepelianskii, DS, in preparation]

Absence of spectral gap in real WWW

Spectrum of the Google matrix for Univ. of Cambridge (left) and Oxford (right) in 2006 ($N \approx 200000$, $\alpha = 1$). [Frahm, Georgeot, DS, arXiv:1105.1062]

Universal Emergence of PageRank

[Figure: PageRank $P$ versus rank index $K$ for Cambridge (left) and Oxford (right). Top panels: $1 - \alpha = 0.1$, $10^{-3}$, $10^{-5}$, $10^{-7}$. Bottom panels: $1 - \alpha = 10^{-8}$, with insets showing $f(\alpha) - f(1)$ and $w(\alpha)$ versus $1 - \alpha$.]

Invariant subspaces size distribution

[Figure: $F(x)$ versus $x$ on logarithmic axes.]

$F(x)$: integrated number of invariant subspaces with size larger than $d/d_0$; $x = d/d_0$, where $d_0$ is the average size of subspaces (top: Cambridge, Oxford 2002-2006; middle: all others; bottom: Wikipedia CheiRank). Curve: $F(x) = 1/(1 + 2x)^{3/2}$.

PageRank at $\alpha \to 1$

[Figure: rescaled PageRank $P N_s$ versus $K/N_s$. Top: Cambridge, Oxford 2002-2006; bottom: all others ($\alpha = 1 - 10^{-8}$).]

$$
P = \frac{1 - \alpha}{1 - \alpha S}\, \frac{1}{N}\, e; \qquad
P = \sum_{\lambda_j = 1} c_j \psi_j + \sum_{\lambda_j \neq 1} \frac{1 - \alpha}{(1 - \alpha) + \alpha (1 - \lambda_j)}\, c_j \psi_j
$$
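The weight multiplying each $c_j \psi_j$ in the expansion shows why only the $\lambda_j = 1$ subspace survives as $\alpha \to 1$. The short sketch below evaluates that weight numerically; the function name and the sample eigenvalues are illustrative assumptions, not values from the talk.

```python
def mode_weight(alpha, lam):
    """Weight (1 - alpha) / ((1 - alpha) + alpha * (1 - lam)) of an eigenmode of S."""
    return (1.0 - alpha) / ((1.0 - alpha) + alpha * (1.0 - lam))

# Modes with lambda = 1 keep weight 1 for any alpha
w_unit = mode_weight(0.85, 1.0)
# A mode with lambda = 0.5 (sample value) is suppressed as alpha approaches 1
w_usual = mode_weight(0.85, 0.5)
w_limit = mode_weight(1.0 - 1e-8, 0.5)
```

For $\lambda_j \neq 1$ the weight behaves as $(1 - \alpha)/(1 - \lambda_j)$ when $1 - \alpha$ is small, so at $\alpha = 1 - 10^{-8}$ a mode contributes noticeably only if its eigenvalue lies within about $10^{-8}$ of 1. The function also accepts complex `lam`, since eigenvalues of $S$ are in general complex.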

Google Matrix Applications

Practically to everything...

More data at http://www.quantware.ups-tlse.fr/qwlib/2drankwikipedia/.../tradecheirank/