IDENTIFICATION OF THE DYNAMICS OF THE GOOGLE'S RANKING ALGORITHM

A. Khaki Sedigh, Mehdi Roudaki

Control Division, Department of Electrical Engineering, K.N. Toosi University of Technology
P. O. Box: 16315-1355, Tehran, Iran
sedigh@eetd.kntu.ac.ir, roudaki@iranseo.com

Abstract: Among the search engines, Google is one of the most powerful. It uses an accurate ranking algorithm to order web pages in search results. In this paper, it is shown that a simple linear model can approximately model the dynamics governing the behaviour of Google. Least Squares is used for the system identification procedure. Identification results are provided to show the effectiveness of the identified system. Copyright 2003 IFAC

Keywords: Google Search Engine, Ranking Algorithm, System Identification, Least Squares.

1. INTRODUCTION

Search engines are tools that help web users find their favourite information or web sites. Today most people start at a search engine on the web. The popularity of search engines depends on the accuracy of their search results: the more accurate the search results are, the more popular the search engine is. Search engines have become increasingly important on the web, and their ranking procedure is a fundamental characteristic which is important to web sites. E-commerce is much attuned to the ranking issue, because higher ranking translates directly to more sales. A top ranking in a major search engine will often generate more targeted traffic than an expensive banner advertising campaign. Plus, a good search engine position is free: anyone can do it (March, 1999).

Today, Google is one of the most important search engines and one of the most important places to be well listed. The resulting traffic can be staggering (Jimworld, 2002). Google uses an accurate ranking algorithm to order web pages in its search results. Its ranking algorithm is unknown and is not available to the public. In order to compare the web pages and order them according to their relevance to the searched term, Google considers many parameters for web pages, which are also unknown to Google users. In this paper, the dynamics of the ranking algorithm of Google is considered as a black box, the inputs of which are the parameters collected from the web pages according to the searched term, and the outputs of which are the ranks of those web pages in Google's search results. Simulation results are provided in the paper to show the capability of the identified model to predict Google's behaviour in its ranking algorithm.

2. LEAST SQUARES SYSTEM IDENTIFICATION

Least Squares is a well-known and established technique for the identification of the dynamical behaviour of linear dynamical plants. It is assumed that the computed variable y is given as (Astrom and Wittenmark, 1995):
$$y = \theta_1 \varphi_1(x) + \theta_2 \varphi_2(x) + \dots + \theta_n \varphi_n(x) \tag{1}$$

where $\varphi_1, \varphi_2, \dots, \varphi_n$ are known functions and $\theta_1, \theta_2, \dots, \theta_n$ are the unknown parameters. Observation pairs $\{(x_i, y_i),\ i = 1, 2, \dots, N\}$ are obtained from the experiments performed, and the parameters are determined to minimize the following cost function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{N} \varepsilon_i^2 \tag{2}$$

where

$$\varepsilon_i = y_i - \hat{y}_i \tag{3}$$

and $\hat{y}_i = \theta_1 \varphi_1(x_i) + \theta_2 \varphi_2(x_i) + \dots + \theta_n \varphi_n(x_i)$ is the value computed from equation (1) at $x_i$. In a compact form, the solution to the least squares problem is given by

$$\hat{\theta} = (\Phi^T \Phi)^{-1} \Phi^T Y \tag{4}$$

where

$$\Phi = \begin{bmatrix} \varphi^T(x_1) \\ \vdots \\ \varphi^T(x_N) \end{bmatrix} \tag{5}$$

is assumed full rank, and also

$$\varphi^T(x_i) = [\varphi_1(x_i) \ \ \varphi_2(x_i) \ \ \dots \ \ \varphi_n(x_i)] \tag{6}$$

$$Y = [y_1 \ \ y_2 \ \ \dots \ \ y_N]^T \tag{7}$$

3. IDENTIFICATION FRAMEWORK

Google's search results are not fixed and the ranks of web pages always change. Changes in search results are caused by factors such as: the indexing of new web pages by Google, the efforts of webmasters to optimize their web sites for higher rankings, the dynamic property of Google's algorithm and probably a few bugs in Google's ranking algorithm. In this paper, terms are searched in different time cycles, and the stored search results are then compared in order to select web pages whose ranks mostly vary between 1 and 100. To guarantee this condition, 87 is the worst rank considered. This kind of data collection is intended to capture the steady state of the ranks, not the transient ranks of some web pages.

The next step is to select the number of observations. By observing the sum of squared errors given by equation (2) for many different cases, the number of the web pages was finally chosen as 6.

In selecting the web pages for the identification process, the following points are considered:
- All search result web pages are ignored if their URLs (uniform resource locators) have changed.
- The software can calculate the web pages' parameters.
- Most of the parameters of some pages are zero.
- None of the parameters of some web pages are zero.
- Some sequential web pages have the same PageRank (this seems to be essential because it helps to clarify the influences of the other parameters. PageRank is Google's criterion for measuring the importance of web pages).
- The number of data sets (N) is bigger than the number of unknown parameters (n).
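As an illustration of equations (1) to (7), the following minimal sketch estimates the parameter vector from a regressor matrix and a vector of observed outputs. It is not the authors' software: the function names `least_squares_fit` and `cost`, and the synthetic data, are assumptions made for this example only. NumPy's `lstsq` routine is used in place of the explicit inverse in equation (4); for a full-rank regressor matrix both give the same minimizer of equation (2).

```python
import numpy as np

def least_squares_fit(Phi, y):
    """Estimate theta for y ~ Phi @ theta, i.e. the minimizer of eq. (2)."""
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    N, n = Phi.shape
    # The paper requires more data sets (N) than unknown parameters (n).
    if N <= n:
        raise ValueError("need more observations (N) than parameters (n)")
    # Eq. (4) assumes the regressor matrix is full rank.
    if np.linalg.matrix_rank(Phi) < n:
        raise ValueError("regressor matrix is not full rank")
    # lstsq avoids forming (Phi^T Phi)^{-1} explicitly (swapped-in technique).
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def cost(Phi, y, theta):
    """Cost J(theta) of eq. (2): half the sum of squared residuals."""
    eps = np.asarray(y, dtype=float) - np.asarray(Phi, dtype=float) @ theta
    return 0.5 * float(eps @ eps)

# Synthetic, noise-free data (hypothetical values, not the paper's data).
rng = np.random.default_rng(0)
N, n = 30, 4
Phi = rng.uniform(0.0, 1.0, size=(N, n))       # rows play the role of phi^T(x_i)
true_theta = np.array([40.0, -15.0, 8.0, 3.0])
y = Phi @ true_theta                            # observed outputs y_i

theta_hat = least_squares_fit(Phi, y)
y_calc = Phi @ theta_hat                        # outputs computed by eq. (1)
```

Because the synthetic data are noise-free, `theta_hat` recovers `true_theta` up to rounding; with real rank data the residuals of equation (3) would be non-zero and `cost` gives the quantity compared across candidate data sets.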
3.1 Selection of the Input Parameters

For identification purposes, an important step is the selection of the number of input parameters. Many resources state that Google uses more than 100 parameters. Wei (2002) points out that the main factors in the Google search engine ranking algorithm are:
- Anchor text (the text of a link or hyperlink) from Yahoo and Dmoz
- Anchor text
- PageRank
- Keyword proximity (the placement of words on a web page in relation to each other)
and finally states that anchor text is the most important factor in Google's ranking algorithm.

Generally there are many reports about Google's parameters, but so far there is no confirmed data regarding them. Therefore, in this paper, based on the experience of the authors, these input parameters are considered:
- PageRank
- Keyword frequency (how often a word appears in a web page) in the web page's visible text.
- Keyword density (the number of times a word appears on a web page compared to the total number of words appearing on that web page) in the web page's title.
- Keyword density in the web page's text.
- Keyword density in the web page's linked text.
- Keyword density in the web page's ALT tags (the ALT tag is a label describing an image).
- Keyword prominence (how early in a web page's text a word appears) in the web page's text.

And these places are considered in Similar Pages (when a user selects the Similar Pages link for a particular result, Google automatically scouts the web for pages that are related to this result):
- The title of Similar Pages.
- Text below the title (this text is an excerpt from the returned result page with the query terms bolded) of Similar Pages.
- The category of Similar Pages in ODP (Open Directory Project).
- The URL of Similar Pages.

To calculate the parameters which are related to the Similar Pages, a binary form is considered:
- If the searched term is in it, it is considered as 1.
- If the searched term is not in it, it is considered as 0.
This kind of valuation method is shown in Table 1.

Table 1 A binary form is considered to calculate the parameters which are related to the Similar Pages. It is assumed that the searched term is search engine optimization. "Q" means the searched term is available exactly. Q means at least one word of the searched term is available. NC is the abbreviation of Not Considered.

Case                                                                              "Q"   Q
www.searchenginewatch.com                                                         NC    1
www.pandia.com/searchworld/                                                       NC    1
www.toka.com/optimization/                                                        NC    1
www.rankwrite.com                                                                 NC    0
THE TITLE AND THE TEXT BELOW THE TITLE: search engine optimization                1     1
THE TITLE AND THE TEXT BELOW THE TITLE: search engine placement and optimization  0     1
THE TITLE AND THE TEXT BELOW THE TITLE: optimization                              0     1
CATEGORY: Computers>Internet> >Search Engine Optimization Firms>S                 NC    1
CATEGORY: Computers>Internet> >Promotion>News                                     NC    0
CATEGORY: Computers>Internet> >Free Search Engine Submitting                      NC    1

After collecting and normalizing the input data, equation (4) is solved and the ranks are calculated by equation (1). Fig. 1 shows the observed ranks, the calculated ranks are shown in Fig. 2, the observed and calculated ranks are compared in Fig. 3, and Fig. 4 shows the errors in the estimation of the ranks.

Fig. 1. Observed ranks.

Fig. 2. Calculated ranks.
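The keyword measures of Section 3.1 and the binary valuation of Table 1 can be sketched as follows. This is only an illustration under stated assumptions, not the software used in the paper: all function names are hypothetical, keyword density is assumed to be the number of occurrences of the term divided by the total word count, and the "at least one word" flag is assumed to use substring matching so that URLs such as www.searchenginewatch.com evaluate to 1, as in Table 1.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_frequency(text, term):
    """Count occurrences of the whole term as a run of consecutive words."""
    words, key = tokenize(text), tokenize(term)
    if not key:
        return 0
    k = len(key)
    return sum(words[i:i + k] == key for i in range(len(words) - k + 1))

def keyword_density(text, term):
    """Occurrences of the term relative to the total word count (assumed definition)."""
    words = tokenize(text)
    return keyword_frequency(text, term) / len(words) if words else 0.0

def exact_flag(text, term):
    """1 if the searched term is available exactly, else 0 (column "Q" of Table 1)."""
    return 1 if keyword_frequency(text, term) > 0 else 0

def any_word_flag(text, term):
    """1 if at least one word of the term occurs in the text, else 0 (column Q).

    Substring matching is an assumption chosen so that run-together URLs match.
    """
    haystack = text.lower()
    return 1 if any(word in haystack for word in tokenize(term)) else 0

TERM = "search engine optimization"
title = "search engine placement and optimization"
flags = (exact_flag(title, TERM), any_word_flag(title, TERM))
```

For the title used above, `flags` is `(0, 1)`, which agrees with the corresponding row of Table 1: the exact term is absent, but at least one of its words is present.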
3.2 Analysis of the Identified System

To test the accuracy of the identified system, the parameters of the 5 web pages are calculated and then their ranks are estimated. These 5 web pages are not used in the identification calculations. Table 2 shows the result of this analysis.

Table 2 Analysis of the accuracy of the identified system

Ranks in Google's search results   Calculated ranks   Difference
1                                  -34                35
35                                 51                 16
37                                 32                 5
70                                 65                 5
82                                 53                 29
90                                 65                 25

The ranks of the web pages which are used in the system identification are as follows:

$$2 \le \text{rank} \le 87 \tag{8}$$

As shown in Table 2, the identified system can predict the ranks which are between 2 and 87 with an acceptable error, but cannot predict the ranks which are out of this region.

Fig. 3. Observed and calculated ranks are compared.

Fig. 4. Errors in the estimation of the ranks.

4. CONCLUSION

Least squares identification is employed to identify the dynamic behaviour governing the ranking algorithm of the Google search engine. It is shown that an accurate identification of Google's ranking algorithm is not possible, because Google does not show all the web pages which are related to the searched term and its parameters are not known. However, an approximate linear model is found to describe the ranking dynamics. It is shown that Google's dynamics can be partially modelled using a linear model. Identification results using real data are presented to show the main points of the paper.

REFERENCES

Astrom, K.J. and B. Wittenmark (1995). Adaptive Control, 41-89, Addison Wesley, New York.
Jimworld (2002). Search engine forums at Jimworld, http://www.searchengineforums.com.
March, F. (1999). Secrets to achieving a top 10 position, First Place Software Inc.
Wei, P. (2002). Google search engine ranking algorithm analysis, Pwqsoft Inc., http://www.pwqsoft.com/search-engineranking.htm.