Hourly Analysis of a Very Large Topically Categorized Web Query Log

Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, Ophir Frieder
Illinois Institute of Technology
10 W 31 St., Chicago, IL 60616
{steve,ej,abdur,grossman,frieder}@ir.iit.edu

ABSTRACT
We review a query log of hundreds of millions of queries that constitute the total query traffic for an entire week of a general-purpose commercial web search service. Previously, query logs have been studied from a single, cumulative view. In contrast, our analysis shows changes in popularity and uniqueness of topically categorized queries across the hours of the day. We examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre-categorized by human editors. This represents […] of the query traffic. We show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. This analysis provides valuable insight for improving retrieval effectiveness and efficiency. It is also relevant to the development of enhanced query disambiguation, routing, and caching algorithms.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services

General Terms: Measurement, Human Factors.

Keywords: Query Log Analysis, Web Search.

1. INTRODUCTION
Understanding how queries change over time is critical to developing effective, efficient search services. We are unaware of any log analysis that studies differences in the query stream over the hours in a day, much less how those differences are manifested within topical categories. We focus on circadian changes in popularity and uniqueness of topical categories. Emphasis on changing query stream characteristics over this longitudinal (time) aspect of query logs distinguishes this work from prior static log analysis, surveyed in [7]. We began with the hypothesis that there are very different characteristics during peak hours and off-peak hours during a day.
After reviewing a week's worth of data (hundreds of millions of queries), we have found, not surprisingly, that the number of queries issued is substantially lower during non-peak hours than peak hours.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR '04, July 25-29, 2004, Sheffield, South Yorkshire, UK. Copyright 2004 ACM 1-58113-881-4/04/0007...$5.00.

However, we knew little about how often queries are repeated from one hour of the day to the next. After examining the behavior of millions of queries from one hour of the day to the next, we have found the less obvious result that the average number of query repetitions in an hour does not change significantly on an hourly basis throughout the day. Most queries appear no more than several times per hour. These queries consistently account for a large portion of total query volume throughout the course of the day. The queries received during peak hours are more similar to each other than their non-peak hour counterparts.

We also analyze the queries representing different topics using a topical categorization of our query stream. These cover approximately […] of the total query volume. We hypothesized that traffic behavior for some categories would change over time and that others would remain stable. For 16 different categories, we examined their traffic characteristics. Some topical categories vary substantially more in popularity than others as we move through an average day. Some topics are more popular during particular times of the day, while others have a more constant level of interest over time. The query sets for different categories have differing similarity over time.
The level of similarity between the actual query sets received within topical categories varies differently according to category. This leads us to believe that predictive algorithms that are able to estimate the likelihood of a query being repeated may well be possible. This could have a significant impact on future cache management and load-balancing algorithms. Such algorithms could improve retrieval effectiveness by assisting in query disambiguation, making it easier to determine what information need is being expressed by a query at a given time. They could also assist research in search efficiency that takes into account query arrival rates [3].

Our analysis covers the entirety of the tens of millions of queries each day in the search log from America Online over a complete week in December. This represents a population of tens of millions of users searching for a wide variety of topics. Section 2 reviews the prior work in query log analysis. Section 3 describes our analysis of overall query traffic. Section 4 describes our analysis of trends in categorized queries. Finally, in Section 5 we present our conclusions and directions for future work.
2. PRIOR WORK
Examinations of search engine evaluation indicate that performance likely varies over time due to differences in query sets and collections [6]. Although the change in collections over time has been studied (e.g., the growth of the web) [10], analysis of users' queries has been primarily limited to the investigation of a small set of available query logs that provide a snapshot of their query stream over a fixed period of time. Prior work can be partitioned into static query log analysis and some recent disclosures by web search engines. Query log analysis can be partitioned into large-scale log analysis, small-scale log analysis, and some other applications of log analysis such as categorization and query clustering. Jansen and Pooch provide a framework for static log analysis, but do not address analysis of changes in a query stream over time [7]. Given that most search engines receive on the order of tens to hundreds of millions of queries a day [22], current and future log analysis efforts should use increasingly larger query sets to ensure that prior assumptions still hold.

Previous studies measured overall aspects of users' queries from static web query logs. In the only large-scale study (all others involve only a few million queries), Silverstein concludes, from a static analysis of an AltaVista query log from six weeks in 1998 consisting of 575 million nonempty queries, that users typically view only the top ten search results and that they generally enter short queries [16]. He also found that only 3.6% of queries appear more than three times, the top 25 queries represent 1.5% of the total query volume, and in 7[…]% of sessions users do not revise their queries. Additionally, co-occurrence analysis of the most frequent 10,000 queries showed that the most correlated terms are often constituents of phrases. No time-based or topic-based analysis of this query load was reported; it does not provide insight into how or when any usage or topical interest changes occur.
Other studies examine the effect of advanced query operators on the search service coverage of Google, MSN, and AOL, finding that in general, they had little effect [4]. These overall statistics do not provide any insight into temporal changes in the query log, but do provide some insight into how people use search services.

Jansen et al. also provide analysis of query frequency [7][9]. Their findings indicate that the majority (57%) of query terms from the Excite log of more than 51,000 queries are used only once, and a large majority (78%) occur three times or less. These studies show that neither queries nor their component terms follow a Zipfian distribution, as the number of rare, infrequently repeated queries and terms is disproportionately large. Other studies have focused on user behavior at the query session level and found varying results, with some estimating that reformulated queries constitute 40-52% of queries in a log [8][21]. Wang et al. examined a log of more than 500,000 queries to a university search engine from 1997-2001 [23]. They find trends in the number of queries received by season, month, and day. We extend this work by examining the larger community of general web searchers and analyzing trends corresponding to hour of day.

Several studies examine query categories in small, static logs. Spink et al. analyzed logs totaling more than one million queries submitted to the Excite web search engine during single days in 1997, 1999, and 2001 [18][19][20]. They classified approximately 2,500 queries from each log into topical categories and found that although search topics have changed over the years, users' behaviors have not. Ross and Wolfram categorized the top 1,000 term pairs from the one-million-query Excite log into 30 subject areas to show commonalities of terms in categories [14]. Jansen et al. used lists of terms to identify image, audio, and video queries and measure their presence in the one-million-query Excite log [9]. In order to examine the differences in queries from users in different countries, Spink et
al. examined a 500,000-query log from the FAST web search engine during 2001, believed to be used largely by Europeans at that time, classifying 2,500 queries from it into the same topical categories. They found differences between FAST and Excite in the topics searched for [17].

Other work manually grouped queries by task. Broder defines queries as informational, navigational, or transactional and presents a study of AltaVista users via a popup survey and manual categorization of 200 queries from a log [2]. Beitzel et al. implicitly categorized queries from a search log as navigational by matching them to edited titles in web directories to automatically evaluate navigational web search [1]. Xie and Wolfram automatically categorized query terms by using results from web search engines to assign the terms to broad subject categories [25].

Several studies of query caching examine query frequency distributions from a static log, focusing on the average likelihood of an arbitrary query being repeated over the entire, fixed-length log. Lempel and Moran evaluated the performance of caching strategies over a log of seven million queries to AltaVista in 2001 and found that the frequencies of queries in their log followed a power law [11]. Eiron and McCurley compared query vocabulary from a log of nearly 1.3 million queries posed to a corporate intranet to the vocabulary of web page anchor text and found that the frequency of queries and query terms follows a tail-heavy power law [5]. Xie and O'Hallaron studied query logs from the Vivisimo meta-search engine of 0,88 queries over one month in 2001 in comparison to the Excite log of 1.9 million over one day in 1999 and found that although, as in other studies, over half of the queries are never repeated, the frequencies of queries that are repeated do follow a Zipfian distribution [26]. Saraiva et al. evaluated a two-level caching scheme on a log of over 100,000 queries to a Brazilian search engine and found that query frequencies follow a Zipf-like distribution [15].
Markatos simulated the effect of several types of query caches on an Excite query log of approximately one million queries and found that traditional caching methods provide significant improvements in efficiency [12]. Although traditional MRU-style caches obviously enhance throughput by exploiting temporal locality at the minute-to-minute level, these studies do not examine changes in the query stream according to the hour of the day that may be leveraged in enhanced cache design.

It is well known that different users represent the same information need with different query terms, making query clustering attractive when examining groups of related queries. However, as Raghavan and Sever have shown, traditional similarity measures are unsuitable for finding query-to-query similarity [13]. Wen et al. incorporated click-through to cluster users' queries [24]. In evaluating their system, they analyzed a random subset of 20,000 queries from a single month of their approximately 1-million-queries-per-week traffic. They found
that the most popular 22.[…]% of queries represent only 400 clusters of queries using differing sets of query terms.

Many web search services have begun to offer views of the most popular and/or changing (becoming drastically more or less popular) queries: AOL Member Trends, Yahoo Buzz Index, Lycos (The Lycos 50 with Aaron Schatz), Google Zeitgeist, AltaVista Top Queries, Ask Jeeves, and Fast (AllTheWeb). These views necessarily incorporate a temporal aspect, often showing popular queries for the current time period and those that are consistently popular. Some also break down popularity by topical categories. Systems seeking to display changing queries must address the issue of relative versus absolute change in a query's frequency to find queries whose change is interesting, not simply a query that went from frequency one to two (a 200% jump), or one that went from 10,000 to 11,000 (a 1000 absolute change).

3. OVERALL QUERY TRAFFIC
We examine a search log consisting of hundreds of millions of queries from a major commercial search service over the seven-day period from 12/26/03 through 1/1/04. This log represents queries from approximately 50 million users. We preprocess the queries to normalize the query strings by removing any case differences, replacing any punctuation with white space (stripping advanced search operators from the approximately 2% of queries containing them), and compressing white space to single spaces. The average query length is 1.7 terms for popular queries and 2.2 terms over all queries. On average, users view only one page of results 81% of the time, two pages 18% of the time, and three or more pages 1% of the time. First, we examine trends in the query stream as a whole, and then focus on trends related to queries manually categorized into topical categories.

We begin our analysis of the overall stream by examining how the volume of query traffic changes as we move from peak to non-peak hours.
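The normalization steps described above (case folding, punctuation replaced by white space, white space compressed) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `normalize_query`, the use of ASCII `string.punctuation` as the punctuation set, and the trimming of leading and trailing spaces are not taken from the paper.

```python
import re
import string

# Assumption: "punctuation" means the ASCII punctuation characters.
_PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")
_WHITESPACE = re.compile(r"\s+")


def normalize_query(raw: str) -> str:
    """Lowercase a query, turn punctuation into spaces, and squeeze spaces."""
    lowered = raw.lower()                      # remove case differences
    no_punct = _PUNCTUATION.sub(" ", lowered)  # also strips operators such as + and "
    # Compress runs of white space; trimming the ends is an added assumption.
    return _WHITESPACE.sub(" ", no_punct).strip()


if __name__ == "__main__":
    print(normalize_query('  "Britney   Spears" +lyrics '))
    print(normalize_query("www.CartoonNetwork.com"))
```

Under this sketch a URL-style query such as `www.CartoonNetwork.com` becomes `www cartoonnetwork com`, which matches the form in which such queries appear in the paper's tables.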
We show the percentage of the day's total and distinct number of queries for each hour in the day on average over our seven-day period in Figure 1 (all times in our query log are Eastern Standard Time). Only 0.75% of the day's total queries appear from 5-6AM, whereas 6.7% of the day's queries appear from 9-10PM. Perhaps more interestingly, the ratio of distinct to total queries in a given hour is nearly constant throughout the day. This shows that the average number of times a query is repeated is virtually constant over the hours in a day, remaining near 2.4 with only a 0.2 standard deviation.

Although the average repetition of queries remains nearly constant, we can examine this in greater detail by measuring the frequency distribution of queries at various hours in the day, as seen in Figure 2. From this analysis it is clear that the vast majority of queries in an hour appear only one to five times and that these rare queries consistently account for large portions of the total query volume throughout the course of the day.

[Figure 1: Percentage of Average Daily Query Traffic at Each Hour. x-axis: Hour of Day; y-axis: Percentage of Daily Query Traffic; series: Average Total Queries, Average Distinct Queries.]

Although we have shown that the query distribution does not change substantially over the course of a day, this does not provide insight into how the sets of queries vary from one hour to the next. To examine this, we measure the overlap between the sets of queries entered during those hours. We use traditional set and bag overlap measures as given in Equation 1 and Equation 2, respectively. Distinct overlap measures the similarity between the sets of unique queries from each hour, while overall (bag) overlap measures the similarity of their frequency distributions by incorporating the number of times each query appears in an hour, $C(q_i; A)$. While these measures examine the similarity of the sets of queries received in an hour and the number of times they are entered, they do not incorporate the relative popularity or ranking of queries within the query sets. To examine this, we also measure the Pearson correlation of the queries' frequencies.
As can be seen from Equation 3 (where $\overline{C(q; A)}$ is the mean number of query repetitions in period A and $s_{C(q; A)}$ is the standard deviation of all the query frequencies in period A), this measures the degree of linear correlation between the frequencies of the queries in each hour, so two hours that had exactly the same queries with exactly the same frequencies would have a correlation of one. Note that this normalizes for the effect of differing query volume, i.e., two hours with exactly the same underlying query distributions simply scaled by a constant would also have a correlation of one.

[Figure 2: Frequency Distribution of Selected Hours from 12/26/03. x-axis: Frequency Ranges; y-axis: Percentage of Total Queries; series: 12AM-1AM, 6AM-7AM, 12PM-1PM, 6PM-7PM.]
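The three hour-to-hour similarity measures described above (distinct overlap, overall bag overlap, and Pearson correlation of query frequencies) can be sketched as follows. This is a minimal illustration, not the authors' implementation: each hour is assumed to be a `collections.Counter` mapping a normalized query string to its count, and the Pearson correlation is assumed to be taken over the union of queries seen in either hour, with a count of zero where a query is absent. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def distinct_overlap(a: Counter, b: Counter) -> float:
    """Set overlap: |A intersect B| / |A union B| over distinct queries."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)


def overall_overlap(a: Counter, b: Counter) -> float:
    """Bag overlap: shared occurrences divided by combined occurrences."""
    shared = sum(min(a[q], b[q]) for q in set(a) & set(b))
    combined = sum(a.values()) + sum(b.values()) - shared
    return shared / combined if combined else 0.0


def pearson(a: Counter, b: Counter) -> float:
    """Sample Pearson correlation of query frequencies in two hours."""
    queries = sorted(set(a) | set(b))   # assumption: union of both hours
    n = len(queries)
    if n < 2:
        return 0.0
    xs = [a[q] for q in queries]        # Counter yields 0 for missing queries
    ys = [b[q] for q in queries]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        return 0.0                      # undefined for constant frequencies
    return cov / (s_x * s_y)


if __name__ == "__main__":
    hour_a = Counter({"lyrics": 2, "espn": 1})
    hour_b = Counter({"lyrics": 4, "espn": 2})   # same mix, doubled volume
    print(distinct_overlap(hour_a, hour_b))
    print(overall_overlap(hour_a, hour_b))
    print(pearson(hour_a, hour_b))
```

The usage example illustrates the point made in the text: doubling the volume of an hour while keeping the same query mix lowers the bag overlap but leaves the Pearson correlation at one.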
\[ \mathrm{distinct\ overlap}(A, B) = \frac{|A \cap B|}{|A \cup B|} \]
Equation 1: Distinct Overlap of Query Sets from Hours A and B

\[ \mathrm{overlap}(A, B) = \frac{\sum_{q_i \in A \cap B} \min\left(C(q_i; A),\, C(q_i; B)\right)}{\sum_{q_i \in A} C(q_i; A) + \sum_{q_i \in B} C(q_i; B) - \sum_{q_i \in A \cap B} \min\left(C(q_i; A),\, C(q_i; B)\right)} \]
Equation 2: Overall Overlap of Query Sets from Hours A and B

\[ r_{A,B} = \frac{1}{n - 1} \sum_{i=1}^{n} \frac{\left(C(q_i; A) - \overline{C(q; A)}\right)\left(C(q_i; B) - \overline{C(q; B)}\right)}{s_{C(q; A)}\, s_{C(q; B)}} \]
Equation 3: Pearson Correlation of Query Frequencies from Hours A and B

[Figure 3: Average Overlap Characteristics of Matching Queries from 1/2/04. x-axis: Hour of Day; series: Overlap, Distinct Overlap, Pearson.]

In Figure 3 we examine the average level of overlap and correlation between the query sets received during the same hour for each day over our week. As measuring overlap over the set of all queries appearing in our week would be computationally expensive, we use the set of all the tens of millions of queries in the day after our seven-day period as an independent sample and measure overlap at each hour in our week of the queries matching those in that sample. Although we previously saw that the frequency distribution of queries does not substantially change across hours of the day, Figure 3 shows that the similarity between the actual queries that are received during each hour does in fact change. This trend seems to follow query volume, which is apparent if we sort the same overlap data by query volume as is done in Figure 4. Clearly, as query volume increases, the queries that compose that traffic are more likely to be similar across samples of those peak time periods.

[Figure 4: Sorted Average Overlap Characteristics from 1/2/04 that Matched Each Hour. x-axis: Hour of Day, sorted by query volume; y-axis: Percentage; series: Overlap, Distinct Overlap, Pearson.]

This finding is consistent with prior analyses of web query caches showing they significantly improve performance under heavy load. The more redundancy they are able to detect, the more caching algorithms are able to enhance throughput. Although the prior work primarily measures the effect of this redundancy in cache performance, it is obvious that redundancy must exist and be detected for caching to succeed. By examining the overall query stream by hour we are able to infer the effectiveness of general caching algorithms at those times.

4. QUERY CATEGORIES
In Section 3 we analyzed the entire query log. However, this blanket view of the query traffic does not provide insight into the characteristics of particular categories of queries that might be exploited for enhanced efficiency or effectiveness. For example, a search provider who returns specialized results for entertainment queries cannot determine from general query traffic alone whether a given query is more likely to be referring to entertainment-related content or how to best process and cache that query.

The remainder of our analysis focuses on trends relating to topical category of queries. Our query set is categorized simply by exactly matching queries to one of the lists corresponding to each category. These lists are manually constructed by editors who categorize real users' queries, generate likely queries, and import lists of phrases likely to be queries in a category (e.g., cities in the US for the US Sites category). Queries that match at least one category list comprise […] of the total query traffic on average. This represents millions of queries per day.

[Figure 5: Sampled Categorized Query Stream Breakdown. Categories: Travel, Sports, Shopping, Other, US Sites, Personal Finance, Computing, Research & Learn, Holidays, Home, Entertainment, Health.]

To verify that our defined category lists sufficiently cover the topics in the query stream, we manually classified a random sample of queries, assigning them to Other if they did not intuitively fit into an existing category, as can be seen in Figure 5. To determine the number of queries required to achieve a representative sample, we calculate the necessary sample size in queries, $ss = (z^2 \sigma^2)/\beta^2$, where $z$ is the confidence level value, $\sigma$ is
the sample standard deviation, and $\beta$ is the error rate. By setting our confidence level to 99% and error rate to […], we require a sample of 600 queries. The relative percentages for each category of the approximately […] of query volume that match any category list over our week (see Figure 9) are within the error rate of those from our manually categorized sample. This shows that our lists are a reasonable representation of these topical categories. We focus on a subset of these categories and examine music and movies independent of other entertainment queries. The relative size of each category list we used is given in Figure 6. Obviously, not all queries listed actually match those entered by users, especially when the category contains large imported lists of phrases.

[Figure 6: Relative Percentage of Categorized Queries. y-axis: Percentage of Categorized Queries; categories: Shopping, Computing, Travel, Home, Health, Government, Research & Learning, Holidays, Sports, Movies, Personal Finance, Entertainment, US Sites, Music.]

Although we have shown that our lists are a fair representation of the topics in the query stream, this does not indicate what portion of the frequency distribution of that stream they represent. To determine this, we measured the average proportion of queries matching any category list that appear at various frequencies each hour and compared them to the average overall hourly frequency distribution of the query stream (see Figure 7). Unsurprisingly, this comparison shows that queries in the category lists represent more popular, repeated queries than average, although the general shape of the distributions is similar.

[Figure 7: Hourly Frequency Distribution of Matching Queries vs. All Queries, Averaged over 7 Days and 16 Categories. x-axis: Frequency Ranges; y-axis: Percentage of Average Total Matching Queries; series: Avg. Matching Queries, Avg. Queries.]

4.1 Trends in Category Popularity
We begin our temporal analysis of topical categories by measuring their relative popularity over the hours in a day.
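The exact-match categorization described in Section 4, combined with the per-hour popularity measurement introduced here, can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline: the function name, the `(hour, query)` log format, and the tiny category lists in the usage example are all assumptions, and queries are presumed to be normalized already.

```python
from collections import defaultdict


def hourly_category_share(log, category_lists):
    """Fraction of each hour's total queries that exactly match each list.

    log            -- iterable of (hour, normalized_query) pairs
    category_lists -- dict mapping a category name to a set of queries
    Returns {category: {hour: share of that hour's total query volume}}.
    """
    totals = defaultdict(int)
    matches = {name: defaultdict(int) for name in category_lists}
    for hour, query in log:
        totals[hour] += 1
        for name, queries in category_lists.items():
            if query in queries:   # exact match only; a query may match several lists
                matches[name][hour] += 1
    return {
        name: {hour: matches[name][hour] / totals[hour] for hour in totals}
        for name in category_lists
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    lists = {
        "Personal Finance": {"stock quotes"},
        "Music": {"britney spears", "lyrics"},
    }
    log = [
        (9, "stock quotes"), (9, "britney spears"), (9, "weather"),
        (22, "britney spears"), (22, "lyrics"),
    ]
    for category, shares in hourly_category_share(log, lists).items():
        print(category, shares)
```

Because matching is exact, any query that does not appear verbatim in a list contributes only to the hourly total, which is why the category lists cover only part of the query stream.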
First, we examine the percent of total query volume matching a selected group of category lists, as can be seen in Figure 8. It is clear that different topical categories are more and less popular at different times of the day. Personal finance, for example, becomes more popular from 7-10AM, while music queries become less popular. Although it is difficult to compare the relative level of popularity shift from one category to another due to the differences in scale of each of their percentages of the query stream, it is clear that some categories' popularity changes more drastically throughout the day than others.

[Figure 8: Categorical Percent over Time. x-axis: Hour of Day; series: Entertainment, Health, Personal Finance, Shopping, Music, US Sites.]

In order to quantify this, we calculated the KL-divergence (Equation 4) between the likelihood of receiving any query at a particular time and the likelihood of receiving a query in a particular category, as can be seen in Figure 9. This reveals that the top three categories in terms of popularity are pornography, entertainment, and music.

\[ D\left(P(q \mid t) \,\|\, P(q \mid c, t)\right) = \sum_{q} P(q \mid t) \log \frac{P(q \mid t)}{P(q \mid c, t)} \]
Equation 4: KL-Divergence of Query Occurrence Likelihood for Category c and Total Stream at Time t

[Figure 9: Category Percentage of Entire Query Stream and Divergence from Likelihood of any Query at Each Hour. x-axis: Category (Computing, Sports, Holidays, Research and Learning, Health, US Sites, Shopping, Government, Movies, Travel, Personal Finance, Home, Music, Entertainment); series: KL-Divergence, % of query stream, Distinct % of query stream.]

Comparing these divergences to the proportion of categorized queries in each category in Figure 6 quickly illustrates that divergence is not correlated with the number of queries categorized in each category. Also shown in Figure 9 is the average percentage of the entire query volume and distinct queries that match each category.
Although the categories that cover the largest portions of the query stream also have the most relative popularity fluctuation, this correlation does not continue throughout all categories.
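The KL-divergence used above to quantify how far a category's traffic departs from the overall stream can be sketched as below. The sketch makes several assumptions that the paper does not state: both distributions are taken over hours of the day, they are estimated from raw hourly counts, natural logarithms are used, and no smoothing is applied (a zero in the category distribution where the overall stream is nonzero yields infinity). The names `normalize` and `kl_divergence` are hypothetical.

```python
import math


def normalize(counts):
    """Turn raw counts (e.g. queries per hour) into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty count table")
    return {key: value / total for key, value in counts.items()}


def kl_divergence(p, q):
    """D(p || q) = sum over x of p(x) * log(p(x) / q(x)), natural log."""
    divergence = 0.0
    for outcome, p_x in p.items():
        if p_x == 0:
            continue                 # terms with p(x) = 0 contribute nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0:
            return math.inf          # q assigns no mass where p does
        divergence += p_x * math.log(p_x / q_x)
    return divergence


if __name__ == "__main__":
    # Toy hourly counts, invented purely for illustration.
    all_queries = normalize({"morning": 500, "evening": 500})
    category = normalize({"morning": 50, "evening": 150})
    print(kl_divergence(all_queries, category))
```

A category whose hourly profile mirrors the overall stream gives a divergence of zero; the more its traffic concentrates in particular hours, the larger the value.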
We drilled down into the highly fluctuating categories and examined the behavior of the queries with the most highly fluctuating frequencies in each category. From this we hoped to gain some insight into the reasons why certain categories fluctuate, and the effect of terms and queries with very high flux on those categories. For example, the three most changing queries for the entertainment category on average over our week were:

Table 1: Top Three Fluctuating Entertainment Queries
gwyneth paltrow
paris hilton
orlando bloom

All three of these queries are specifically related to recent events in US popular culture. The actress Gwyneth Paltrow recently married in secret, and the news of her nuptials broke during the week we analyzed. Hilton Hotel heiress Paris Hilton has been a popular topic recently; she starred in a prime-time reality TV show entitled The Simple Life. Also popular is Orlando Bloom, the actor who portrays a popular character in the Lord of the Rings trilogy. As the final installment of the series was released in US theatres during the week prior to our query log, it is no surprise to see his name as a top-changing query.

Drilling down further, we pinpointed some of the specific instances where these popular queries jumped the most. For example, in the afternoon of Friday, December 27th, the popularity of the query "gwyneth paltrow" skyrocketed. From 3-4PM it occurred once, from 4-5PM it occurred 67 times, and from 5-6PM it occurred 1,855 times. The top twenty-five changing (on average) queries, after normalization, in the Entertainment and Music categories are shown in Table 2.

Table 2: Top 25 Fluctuating Queries from Music and Entertainment
Music | Entertainment
lyrics | gwyneth paltrow
music | paris hilton
britney spears | orlando bloom
furniture | espn
love | disney
hilary duff | johnny depp
good charlotte | much music
sloppy seconds | disney channel
jessica simpson | hgtv
b2k | disneychannel com
eminem | www disneychanel com
christina aguilera | katie holmes
simple plan | pictures
justin timberlake | pamela anderson
free music | cartoon network
linkin park | hilary duff fake
michael jackson | chad michael murray
beyonce | vivica a fox
jennifer lopez | disneychannel
50 cent | care bears
kinky | sailor moon
napster | www cartoonnetwork com
chic | days of our lives
tupac | charmed
blink 182 | tom welling

We also looked at some of the most frequently changing terms to see how they relate to the change of entire queries containing those terms. Some excellent examples of this behavior in the Entertainment category include the terms "pictures" (the tenth-most changing term) and "duff" (the 7th-most changing term). We looked at the popularity change (i.e., change in frequency) for queries containing these terms and found that several of them also exhibited large changes over time. For example, on the afternoon of December 28th from noon to 5PM EST, the query "hilary duff" changed from an initial frequency of 27 from 12-1PM to a peak of 3 (from 3-4PM), and then stabilized around 70 for the rest of the evening; similar spikes in frequency for this query occurred at similar times during other days in our period of study.

4.2 Trends in Uniqueness of Queries Within Categories
Although we have shown that different categories have differing trends of popularity over the hours of a day, this does not provide insight into how the sets of queries within those categories change throughout the day. In order to examine this, we return to the overlap measures used in Section 3. Overlap, distinct overlap, and the Pearson correlation of query frequencies for Personal Finance and Music are shown in Figure 10 and Figure 11.

[Figure 10: Personal Finance Overlap. x-axis: Hour of Day (0-23); series: Overlap, Distinct Overlap, Pearson.]

Although the uniqueness of queries in categories in general appears to be correlated with that of the entire query stream (Figure 3), that of particular categories appears to be substantially different from one to the next.
For example, if we compare the overlap characteristics of personal finance with those of music, we see they are quite different. Not only does personal finance have generally higher overlap, but it has a much higher overall overlap than distinct overlap, whereas they are nearly equal for music. Other categories with generally high overlap and distinct overlap are shopping, computing, and travel. Also, the correlation of frequencies of personal finance queries is very high all day, indicating searchers are entering the same queries roughly the same relative number of times; this is clearly not true for music. Some categories have a high Pearson correlation. This indicates that a significant portion of the queries in these categories is often ranked similarly by frequency. These categories are pornography, travel, research and learning, and computing, and their Pearson correlations are illustrated in Figure 12.
[Figure 11: Music Overlap. x-axis: Hour of Day (0-23); series: Overlap, Distinct Overlap, Pearson.]

It is clear that some categories have very similarly ranked queries by frequency throughout the day, while others vary dramatically according to query volume. Referring back to Figure 6 and Figure 9, uniqueness of queries in particular categories does not appear to be correlated with the number of queries in their respective category lists, the proportion of the query stream they represent, or the number of distinct queries they match.

[Figure 12: Pearson Correlations of Frequencies for Categories. x-axis: Hour of Day (0-23); series: Personal Finance, Music, Movies, Computing, Entertainment, Government.]

This type of data is potentially of great use to query caching algorithms. For example, if it is known a priori that queries for certain categories are similarly ranked throughout the day, they can be given higher priority in a query-caching scheme. Similarly, queries in categories whose rankings change vastly over time might be given low caching priority.

5. CONCLUSIONS AND FUTURE WORK
This study focuses on investigating the nature of changes in the query stream of a very large search service over time. Understanding how users' queries change over time is critical to developing effective, efficient search systems and to engineering representative test sets and evaluations that drive this development. In this study we find trends over time that are stable despite continuing fluctuation in query volume. Although the average query is repeated only twice during any given hour of the day, the total query traffic varies both in magnitude from one hour to the next and in degree of overlap and correlation in popularity of the queries that are received. In addition, we find that the frequency distribution of an hour's worth of queries remains constant throughout the day. Also, at the most general level, we find that query volume is highest and query sets are most stable during peak hours of the day.

This study further investigates changes in the query stream over time by examining the nature of changes in popularity of particular topical categories. For this we use a set of topical categories created by human editors that represents approximately […] of the average query traffic. We show that popularity of some of these categories fluctuates considerably while other categories remain relatively stable over the hours in a day. Additionally, we show that the overlap and correlation in popularity of the queries within each topical category varies quite differently over the course of the day. Extending this analysis to investigate changes in the very rare queries not often matched by our category lists would provide insight into whether those are changing similarly to more popular queries. One method for approaching this might be to incorporate automatic query classification methods to extend our basic lists.

This study is the gateway to a large and diverse body of future work. Integrating this knowledge of circadian changes in the query stream by category will likely yield improved query disambiguation, query caching, and load balancing algorithms.

6. BIBLIOGRAPHY
[1] Beitzel, S., Jensen, E., Chowdhury, A., and Grossman, D. Using Titles and Category Names from Editor-driven Taxonomies for Automatic Evaluation. In Proceedings of CIKM '03 (New Orleans, LA, November 2003), ACM Press.
[2] Broder, A. A Taxonomy of Web Search. SIGIR Forum 36(2) (Fall 2002).
[3] Chowdhury, A. and Pass, G. Operational Requirements for Scalable Search Systems. In Proceedings of CIKM '03 (New Orleans, LA, November 2003), ACM Press.
[4] Eastman, C. and Jansen, B. Coverage, Relevance, and Ranking: The Impact of Query Operators on Web Search Engine Results. ACM Transactions on Information Systems 21(4), 383-411, October 2003.
[5] Eiron, N. and McCurley, K. Analysis of Anchor Text for Web Search. In Proceedings of SIGIR '03 (Toronto, Canada, July 2003), ACM Press.
[6] Hawking, D., Craswell, N., and Griffiths, K. Which Search Engine is Best at Finding Online Services?
In Proceedings of WWW10 (Hong Kong, May 2001), Posters. Actual poster available as http://pigfish.vic.cmis.csiro.au/~nickc/pubs/www10actualposter.pdf
[7] Jansen, B. and Pooch, U. A review of Web searching studies and a framework for future research. Journal of the American Society for Information Science and Technology 52(3), 235-246, 2001.
[8] Jansen, B., Spink, A., and Saracevic, T. Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management 36(2), 207-227, 2000.
[9] Jansen, B.J., Goodrum, A., and Spink, A. Searching for multimedia: video, audio, and image Web queries. World Wide Web 3(4), 2000.
[10] Lawrence, S. and Giles, C.L. Searching the World Wide Web. Science 280(5360), 98-100, 1998.
[11] Lempel, R. and Moran, S. Predictive caching and prefetching of query results in search engines. In Proceedings of WWW12 (Budapest, May 2003).
[12] Markatos, E.P. On Caching Search Engine Query Results. In Proceedings of the 5th International Web Caching and Content Delivery Workshop, May 2000.
[13] Raghavan, V. and Sever, H. On the Reuse of Past Optimal Queries. In Proc. of the 1995 SIGIR Conference, 344-350, Seattle, WA, July 1995.
[14] Ross, N. and Wolfram, D. End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine. Journal of the American Society for Information Science 51(10), 949-958, 2000.
[15] Saraiva, P., Moura, E., Ziviani, N., Meira, W., Fonseca, R., and Ribeiro-Neto, B. Rank-preserving two-level caching for scalable search engines. In Proc. of the 24th SIGIR Conference, 51-58, New Orleans, LA, September 2001.
[16] Silverstein, C., Henzinger, M., Marais, H., and Moricz, M. Analysis of a very large web search engine query log. SIGIR Forum 33(1) (Fall 1999), 6-12.
[17] Spink, A., Ozmutlu, S., Ozmutlu, H.C., and Jansen, B.J. U.S. versus European web searching trends. SIGIR Forum 36(2), 32-38, 2002.
[18] Spink, A., Jansen, B.J., Wolfram, D., and Saracevic, T. From E-sex to e-commerce: Web search changes. IEEE Computer 35(3), 107-109, 2002.
[19] Spink, A., Wolfram, D., Jansen, B.J., and Saracevic, T. Searching the Web: The Public and Their Queries. Journal of the American Society of Information Science 53(2), 226-234, 2001.
[20] Spink, A., Jansen, B.J., and Saracevic, T. Vox populi: The public searching of the web. Journal of the American Society of Information Science 52(12), 1073-1074, 2001.
[21] Spink, A., Jansen, B.J., and Ozmutlu, H.C. Use of query reformulation and relevance feedback by Excite users. Internet Research: Electronic Networking Applications and Policy 10(4), 2000.
[22] Sullivan, D. Searches Per Day. Search Engine Watch, February 2003. http://searchenginewatch.com/reports/article.php/2156461
[23] Wang, P., Berry, M., and Yang, Y. Mining longitudinal web queries: Trends and patterns.
Journal of the American Society for Information Science and Technology 54(8), 743-758, June 2003.
[24] Wen, J., Nie, J., and Zhang, H. Query Clustering using User Logs. ACM Transactions on Information Systems 20(1), 59-81, January 2002.
[25] Wolfram, D. and Xie, H. Subject categorization of query terms for exploring Web users' search interests. Journal of the American Society for Information Science 53(8), 617-630, June 2002.
[26] Xie, Y. and O'Hallaron, D. Locality in Search Engine Queries and Its Implications for Caching. Infocom 2002.