Hourly Analysis of a Very Large Topically Categorized Web Query Log
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, Ophir Frieder
Illinois Institute of Technology
10 W 31 St.
Chicago, IL 60616
{steve,ej,abdur,grossman,frieder}@ir.iit.edu

ABSTRACT
We review a query log of hundreds of millions of queries that constitute the total query traffic for an entire week of a general-purpose commercial web search service. Previously, query logs have been studied from a single, cumulative view. In contrast, our analysis shows changes in popularity and uniqueness of topically categorized queries across the hours of the day. We examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre-categorized by human editors. This represents [...] of the query traffic. We show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. This analysis provides valuable insight for improving retrieval effectiveness and efficiency. It is also relevant to the development of enhanced query disambiguation, routing, and caching algorithms.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services
General Terms: Measurement, Human Factors.
Keywords: Query Log Analysis, Web Search.

1. INTRODUCTION
Understanding how queries change over time is critical to developing effective, efficient search services. We are unaware of any log analysis that studies differences in the query stream over the hours in a day, much less how those differences are manifested within topical categories. We focus on Circadian changes in popularity and uniqueness of topical categories. Emphasis on changing query stream characteristics over this longitudinal (time) aspect of query logs distinguishes this work from prior static log analysis, surveyed in [7]. We began with the hypothesis that there are very different characteristics during peak hours and off-peak hours during a day.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'04, July 25-29, 2004, Sheffield, South Yorkshire, UK. Copyright 2004 ACM /04/ $5.00.

After reviewing a week's worth of data (hundreds of millions of queries) we have found, not surprisingly, that:
- The number of queries issued is substantially lower during non-peak hours than peak hours.

However, we knew little about how often queries are repeated from one hour of the day to the next. After examining the behavior of millions of queries from one hour of the day to the next we have found the less obvious result:
- The average number of query repetitions in an hour does not change significantly on an hourly basis throughout the day.
- Most queries appear no more than several times per hour. These queries consistently account for a large portion of total query volume throughout the course of the day.
- The queries received during peak hours are more similar to each other than their non-peak hour counterparts.

We also analyze the queries representing different topics using a topical categorization of our query stream. These cover approximately [...] of the total query volume. We hypothesized that traffic behavior for some categories would change over time and that others would remain stable. For 16 different categories, we examined their traffic characteristics:
- Some topical categories vary substantially more in popularity than others as we move through an average day. Some topics are more popular during particular times of the day, while others have a more constant level of interest over time.
- The query sets for different categories have differing similarity over time. The level of similarity between the actual query sets received within topical categories varies differently according to category.
This leads us to believe that predictive algorithms that are able to estimate the likelihood of a query being repeated may well be possible. This could have a significant impact on future cache management and load-balancing algorithms. Such algorithms could improve retrieval effectiveness by assisting in query disambiguation, making it easier to determine what information need is being expressed by a query at a given time. They could also assist research in search efficiency that takes into account query arrival-rates [3].

Our analysis covers the entirety of the tens of millions of queries each day in the search log from America Online over a complete week in December. This represents a population of tens of millions of users searching for a wide variety of topics. Section 2 reviews the prior work in query log analysis. Section 3 describes our analysis of overall query traffic. Section 4 describes our analysis of trends in categorized queries. Finally, in Section 5 we present our conclusions and directions for future work.
2. PRIOR WORK
Examinations of search engine evaluation indicate that performance likely varies over time due to differences in query sets and collections [6]. Although the change in collections over time has been studied (e.g., the growth of the web) [10], analysis of users' queries has been primarily limited to the investigation of a small set of available query logs that provide a snapshot of their query stream over a fixed period of time. Prior work can be partitioned into static query log analysis and some recent disclosures by web search engines. Query log analysis can be partitioned into large-scale log analysis, small-scale log analysis and some other applications of log analysis such as categorization and query clustering. Jansen and Pooch provide a framework for static log analysis, but do not address analysis of changes in a query stream over time [7]. Given that most search engines receive on the order of between tens and hundreds of millions of queries a day [22], current and future log analysis efforts should use increasingly larger query sets to ensure that prior assumptions still hold.

Previous studies measured overall aspects of users' queries from static web query logs. In the only large-scale study (all others involve only a few million queries), Silverstein analyzes an AltaVista query log from six weeks in 1998 consisting of 575 million nonempty queries, and concludes that users typically view only the top ten search results and that they generally enter short queries [16]. He also found that only 3.6% of queries appear more than three times, the top 25 queries represent [...] of the total query volume, and in at least 70% of sessions users do not revise their queries. Additionally, co-occurrence analysis of the most frequent 10,000 queries showed that the most correlated terms are often constituents of phrases. No time-based or topic-based analysis of this query load was reported; it does not provide insight into how or when any usage or topical interest changes occur.
Other studies examine the effect of advanced query operators on the search service coverage of Google, MSN, and AOL, finding that in general, they had little effect [4]. These overall statistics do not provide any insight into temporal changes in the query log, but do provide some insight into how people use search services. Jansen et al. also provide analysis of query frequency [7][9]. Their findings indicate that the majority (57%) of query terms from the Excite log of more than 51,000 queries are used only once, and a large majority (78%) occur three times or less. These studies show that neither queries nor their component terms follow a Zipfian distribution, as the number of rare, infrequently repeated queries and terms is disproportionately large. Other studies have focused on user behavior at the query session level and found varying results, with some estimating reformulated queries constituting 40-52% of queries in a log [8][21]. Wang et al. examined a log of more than 500,000 queries to a university search engine from [...] [23]. They find trends in the number of queries received by season, month, and day. We extend this work by examining the larger community of general web searchers and analyzing trends corresponding to hour of day.

Several studies examine query categories in small, static logs. Spink et al. analyzed logs totaling more than one million queries submitted to the Excite web search engine during single days in 1997, 1999, and 2001 [18][19][20]. They classified approximately 2,500 queries from each log into topical categories and found that although search topics have changed over the years, users' behaviors have not. Ross and Wolfram categorized the top 1,000 term pairs from the one million query Excite log into 30 subject areas to show commonalities of terms in categories [14]. Jansen et al. used lists of terms to identify image, audio, and video queries and measure their presence in the one million query Excite log [9]. In order to examine the differences in queries from users in different countries, Spink et
al. examined a 500,000 query log from the FAST web search engine during 2001, believed to be used largely by Europeans at that time, classifying 2,500 queries from it into the same topical categories. They found differences between FAST and Excite in the topics searched for [17].

Other work manually grouped queries by task. Broder defines queries as informational, navigational or transactional and presents a study of AltaVista users via a popup survey and manual categorization of 200 queries from a log [2]. Beitzel et al. implicitly categorized queries from a search log as navigational by matching them to edited titles in web directories to automatically evaluate navigational web search [1]. Xie and Wolfram automatically categorized query terms by using results from web search engines to assign the terms to broad subject categories [25].

Several studies of query caching examine query frequency distributions from a static log, focusing on the average likelihood of an arbitrary query being repeated over the entire, fixed-length log. Lempel and Moran evaluated the performance of caching strategies over a log of seven million queries to AltaVista in 2001 and found that the frequencies of queries in their log followed a power law [11]. Eiron and McCurley compared query vocabulary from a log of nearly 1.3 million queries posed to a corporate intranet to the vocabulary of web page anchor text and found that the frequency of queries and query terms follows a tail-heavy power law [5]. Xie and O'Hallaron studied query logs from the Vivisimo meta-search engine of 110,881 queries over one month in 2001 in comparison to the Excite log of 1.9 million over one day in 1999 and found that although, as in other studies, over half of the queries are never repeated, the frequencies of queries that are repeated do follow a Zipfian distribution [26]. Saraiva et al. evaluated a two-level caching scheme on a log of over 100,000 queries to a Brazilian search engine and found that query frequencies follow a Zipf-like distribution [15].
Markatos simulated the effect of several types of query caches on an Excite query log of approximately one million queries and found that traditional caching methods provide significant improvements in efficiency [12]. Although traditional MRU-style caches obviously enhance throughput by exploiting temporal locality at the minute-to-minute level, these studies do not examine changes in the query stream according to the hour of the day that may be leveraged in enhanced cache design.

It is well known that different users represent the same information need with different query terms, making query clustering attractive when examining groups of related queries. However, as Raghavan and Sever have shown, traditional similarity measures are unsuitable for finding query-to-query similarity [13]. Wen et al. incorporated click-through to cluster users' queries [24]. In evaluating their system, they analyzed a random subset of 20,000 queries from a single month of their approximately 1-million queries-per-week traffic. They found
that the most popular roughly 22% of queries represent only 400 clusters of queries using differing sets of query terms.

Many web search services have begun to offer views of the most popular and/or changing (becoming drastically more or less popular) queries: AOL - Member Trends, Yahoo - Buzz Index, Lycos - The Lycos 50 with Aaron Schatz, Google Zeitgeist, AltaVista - Top Queries, Ask Jeeves, Fast (AllTheWeb). These views necessarily incorporate a temporal aspect, often showing popular queries for the current time period and those that are consistently popular. Some also break down popularity by topical categories. Systems seeking to display changing queries must address the issue of relative versus absolute change in a query's frequency to find queries whose change is interesting, not simply a query that went from frequency one to two (a large relative jump), or one that went from 10,000 to 11,000 (an absolute change of 1,000).

3. OVERALL QUERY TRAFFIC
We examine a search log consisting of hundreds of millions of queries from a major commercial search service over the seven-day period from 12/26/03 through 1/1/04. This log represents queries from approximately 50 million users. We preprocess the queries to normalize the query strings by removing any case differences, replacing any punctuation with white space (stripping advanced search operators from the approximately 2% of queries containing them), and compressing white space to single spaces. The average query length is 1.7 terms for popular queries and 2.2 terms over all queries. On average, users view only one page of results 81% of the time, two pages 18% and three or more 1% of the time. First, we examine trends in the query stream as a whole, and then focus on trends related to queries manually categorized into topical categories.

We begin our analysis of the overall stream by examining how the volume of query traffic changes as we move from peak to non-peak hours.
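The three normalization steps described above (case folding, punctuation to white space, white-space compression) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `normalize_query` is hypothetical, "punctuation" is assumed to mean ASCII punctuation, and trimming leading and trailing spaces is an added assumption.

```python
import re
import string

# Assumption: "punctuation" means the ASCII punctuation characters;
# the paper does not list the exact character set it used.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize_query(raw: str) -> str:
    """Normalize a raw query string (hypothetical helper)."""
    lowered = raw.lower()                         # remove case differences
    no_punct = lowered.translate(_PUNCT_TO_SPACE)  # punctuation -> white space
    # Compress runs of white space to single spaces; stripping the ends
    # is an assumption made here so that equal queries compare equal.
    return re.sub(r"\s+", " ", no_punct).strip()


if __name__ == "__main__":
    print(normalize_query('  "Britney   Spears" +Lyrics '))
    print(normalize_query("WWW.CartoonNetwork.com"))
```

Replacing punctuation with spaces, rather than deleting it, is why a URL-like query would come out as several space-separated terms.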
We show the percentage of the day's total and distinct number of queries for each hour in the day on average over our seven-day period in Figure 1 (all times in our query log are Eastern Standard Time). Only roughly 0.7% of the day's total queries appear from 5-6AM, whereas 6.7% of the day's queries appear from 9-10PM. Perhaps more interestingly, the ratio of distinct to total queries in a given hour is nearly constant throughout the day. This shows that the average number of times a query is repeated is virtually constant over the hours in a day, remaining near 2.4 with only a 0.2 standard deviation.

Although the average repetition of queries remains nearly constant, we can examine this in greater detail by measuring the frequency distribution of queries at various hours in the day, as seen in Figure 2. From this analysis it is clear that the vast majority of queries in an hour appear only one to five times and that these rare queries consistently account for large portions of the total query volume throughout the course of the day.

[Figure 1: Percentage of Average Daily Query Traffic at Each Hour. x-axis: Hour of Day; y-axis: Percentage of Daily Query Traffic.]

Although we have shown that the query distribution does not change substantially over the course of a day, this does not provide insight into how the sets of queries vary from one hour to the next. To examine this, we measure the overlap between the sets of queries entered during those hours. We use traditional set and bag overlap measures as given in Equation 1 and Equation 2, respectively. Distinct overlap measures the similarity between the sets of unique queries from each hour, while overall (bag) overlap measures the similarity of their frequency distributions by incorporating the number of times each query appears in an hour, $C(q_i; A)$. While these measures examine the similarity of the sets of queries received in an hour and the number of times they are entered, they do not incorporate the relative popularity or ranking of queries within the query sets. To examine this, we also measure the Pearson correlation of the queries' frequencies.
As can be seen from Equation 3 (where $\bar{C}(q; A)$ is the mean number of query repetitions in period A and $s_{C(q;A)}$ is the standard deviation of all the query frequencies in period A), this measures the degree of linear correlation between the frequencies of the queries in each hour, so two hours that had exactly the same queries with exactly the same frequencies would have a correlation of one. Note that this normalizes for the effect of differing query volume, i.e., the correlation of two hours with exactly the same underlying query distributions simply scaled by a constant would also have a correlation of one.

[Figure 2: Frequency Distribution of Selected Hours from 12/26/03. Series: 12AM-1AM, 6AM-7AM, 12PM-1PM, 6PM-7PM. x-axis: Frequency Ranges; y-axis: Percentage of Total Queries.]
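The three similarity measures described above can be sketched as follows, assuming each hour is represented as a `collections.Counter` mapping a normalized query to its count. The function names are hypothetical. Computing the correlation over the union of both hours' queries (with a count of zero where a query is absent) and using the sample (n - 1) normalization are assumptions; the text does not state which conventions were used.

```python
from collections import Counter
from math import sqrt


def distinct_overlap(a: Counter, b: Counter) -> float:
    """Set overlap: |A intersect B| / |A union B| over distinct queries."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def overall_overlap(a: Counter, b: Counter) -> float:
    """Bag overlap: shared occurrences over total occurrences."""
    shared = sum(min(a[q], b[q]) for q in set(a) & set(b))
    denom = sum(a.values()) + sum(b.values()) - shared
    return shared / denom if denom else 0.0


def pearson(a: Counter, b: Counter) -> float:
    """Pearson correlation of query frequencies in two hours.

    Assumption: queries missing from one hour count as frequency 0.
    """
    queries = sorted(set(a) | set(b))
    n = len(queries)
    if n < 2:
        return 0.0
    xs = [a[q] for q in queries]
    ys = [b[q] for q in queries]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant frequencies
    return cov / (sx * sy)


if __name__ == "__main__":
    peak = Counter({"x": 8, "y": 4, "z": 2})
    off_peak = Counter({"x": 4, "y": 2, "z": 1})
    print(distinct_overlap(peak, off_peak))
    print(overall_overlap(peak, off_peak))
    print(pearson(peak, off_peak))
```

In the usage example the second hour is the first scaled by a constant, so the correlation comes out at one (up to floating-point error) while the bag overlap does not, which is the volume-normalizing property noted above.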
$$\text{dist. overlap}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Equation 1: Distinct Overlap of Query Sets from Hours A and B

$$\text{overlap}(A, B) = \frac{\sum_{q_i \in A \cap B} \min\big(C(q_i; A),\, C(q_i; B)\big)}{\sum_{q_i \in A} C(q_i; A) + \sum_{q_i \in B} C(q_i; B) - \sum_{q_i \in A \cap B} \min\big(C(q_i; A),\, C(q_i; B)\big)}$$
Equation 2: Overall Overlap of Query Sets from Hours A and B

$$r_{A,B} = \frac{1}{n-1} \sum_{i=1}^{n} \frac{\big(C(q_i; A) - \bar{C}(q; A)\big)\big(C(q_i; B) - \bar{C}(q; B)\big)}{s_{C(q;A)}\, s_{C(q;B)}}$$
Equation 3: Pearson Correlation of Query Frequencies from Hours A and B

[Figure 3: Average Overlap Characteristics of Matching Queries from 1/2/04. Series: Overlap, Distinct Overlap, Pearson. x-axis: Hour of Day.]

[Figure 4: Sorted Average Overlap Characteristics from 1/2/04 that Matched Each Hour. Series: Overlap, Distinct Overlap, Pearson. x-axis: Hour of Day; y-axis: Percentage.]

In Figure 3 we examine the average level of overlap and correlation between the query sets received during the same hour for each day over our week. As measuring overlap over the set of all queries appearing in our week would be computationally expensive, we use the set of all the tens of millions of queries in the day after our seven-day period as an independent sample and measure overlap at each hour in our week of the queries matching those in that sample. Although we previously saw that the frequency distribution of queries does not substantially change across hours of the day, Figure 3 shows that the similarity between the actual queries that are received during each hour does in fact change. This trend seems to follow query volume, which is apparent if we sort the same overlap data by query volume as is done in Figure 4. Clearly, as query volume increases the queries that compose that traffic are more likely to be similar across samples of those peak time periods.

This finding is consistent with prior analyses of web query caches showing they significantly improve performance under heavy load. The more redundancy they are able to detect, the more caching algorithms are able to enhance throughput. Although the prior work primarily measures the effect of this redundancy in cache performance, it is obvious that redundancy must exist and be detected for caching to succeed. By examining the overall query stream by hour we are able to infer the effectiveness of general caching algorithms at those times.

4. QUERY CATEGORIES
In Section 3 we analyzed the entire query log. However, this blanket view of the query traffic does not provide insight into the characteristics of particular categories of queries that might be exploited for enhanced efficiency or effectiveness. For example, a search provider who returns specialized results for entertainment queries cannot determine from general query traffic alone whether a given query is more likely to be referring to entertainment related content or how to best process and cache that query.

The remainder of our analysis focuses on trends relating to topical category of queries. Our query set is categorized simply by exactly matching queries to one of the lists corresponding to each category. These lists are manually constructed by editors who categorize real users' queries, generate likely queries, and import lists of phrases likely to be queries in a category (e.g., cities in the US for the US Sites category). Queries that match at least one category list comprise [...] of the total query traffic on average. This represents millions of queries per day.

[Figure 5: Sampled Categorized Query Stream Breakdown (pie chart). Categories: Travel, Sports, Shopping, Other, US Sites, Personal Finance, Computing, Research & Learn, Holidays, Home, Entertainment, Health.]

To verify that our defined category lists sufficiently cover the topics in the query stream, we manually classified a random sample of queries, assigning them to Other if they did not intuitively fit into an existing category, as can be seen in Figure 5. To determine the number of queries required to achieve a representative sample, we calculate the necessary sample size in queries, $ss = (z^2 \sigma^2)/\beta^2$, where z is the confidence level value, σ is
the sample standard deviation, and β is the error rate. By setting our confidence level to 99% and error rate to [...], we require a sample of 600 queries. The relative percentages for each category of the approximately [...] of query volume that match any category list over our week (see Figure 9) are within the error rate of those from our manually categorized sample. This shows that our lists are a reasonable representation of these topical categories. We focus on a subset of these categories and examine music and movies independent of other entertainment queries. The relative size of each category list we used is given in Figure 6. Obviously, not all queries listed actually match those entered by users, especially when the category contains large imported lists of phrases.

[Figure 6: Relative Percentage of Categorized Queries. Categories: Shopping, Computing, Travel, Home, Health, Government, Research & Learning, Holidays, Sports, Movies, Personal Finance, Entertainment, US Sites, Music. y-axis: Percentage of Categorized Queries.]

Although we have shown that our lists are a fair representation of the topics in the query stream, this does not indicate what portion of the frequency distribution of that stream they represent. To determine this, we measured the average proportion of queries matching any category list that appear at various frequencies each hour and compared them to the average overall hourly frequency distribution of the query stream (see Figure 7). Unsurprisingly, this comparison shows that queries in the category lists represent more popular, repeated queries than average, although the general shape of the distributions is similar.

[Figure 7: Hourly Frequency Distribution of Matching Queries vs. All Queries, Averaged over 7 Days and 16 Categories. Series: Avg. Matching Queries, Avg. Queries. x-axis: Frequency Ranges; y-axis: Percentage of Average Total Matching Queries.]

4.1 Trends in Category Popularity
We begin our temporal analysis of topical categories by measuring their relative popularity over the hours in a day. First, we examine the percent of total query volume matching a selected group of category lists, as can be seen in Figure 8.
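The exact-match categorization and the per-hour category share described above can be sketched as follows. This is a minimal illustration under stated assumptions: the log is a sequence of `(hour, normalized_query)` pairs, each category list is a Python set, the share is taken relative to that hour's total volume, and the function name, category contents, and example queries are all hypothetical.

```python
from collections import defaultdict


def hourly_category_share(log, category_lists):
    """Percent of each hour's query volume that exactly matches each list.

    log: iterable of (hour, normalized_query) pairs.
    category_lists: dict mapping category name -> set of query strings.
    Returns {category: {hour: percent of that hour's total volume}}.
    """
    totals = defaultdict(int)
    matched = defaultdict(lambda: defaultdict(int))
    for hour, query in log:
        totals[hour] += 1
        for name, queries in category_lists.items():
            if query in queries:  # exact string match only
                matched[name][hour] += 1
    return {
        name: {hour: 100.0 * matched[name][hour] / totals[hour] for hour in totals}
        for name in category_lists
    }


if __name__ == "__main__":
    # Hypothetical category lists and log entries, for illustration only.
    lists = {
        "Personal Finance": {"mortgage rates", "stock quotes"},
        "Music": {"lyrics", "eminem"},
    }
    log = [
        (7, "mortgage rates"), (7, "lyrics"), (7, "weather"), (7, "stock quotes"),
        (22, "lyrics"), (22, "eminem"),
    ]
    for category, by_hour in hourly_category_share(log, lists).items():
        print(category, by_hour)
```

A query may appear on more than one list, so the per-category shares within an hour need not sum to the share of queries matching any list.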
It is clear that different topical categories are more and less popular at different times of the day. Personal finance, for example, becomes more popular from 7-10AM, while music queries become less popular. Although it is difficult to compare the relative level of popularity shift from one category to another due to the differences in scale of each of their percentages of the query stream, it is clear that some categories' popularity changes more drastically throughout the day than others.

[Figure 8: Categorical Percent over Time. Series: Entertainment, Health, Personal Finance, Shopping, Music, US Sites. x-axis: Hour of Day.]

In order to quantify this, we calculated the KL-divergence (Equation 4) between the likelihood of receiving any query at a particular time and the likelihood of receiving a query in a particular category, as can be seen in Figure 9. This reveals that the top three categories in terms of popularity are pornography, entertainment, and music.

$$D\big(p(q \mid t) \,\|\, p(q \mid c, t)\big) = \sum_{q} p(q \mid t) \log \frac{p(q \mid t)}{p(q \mid c, t)}$$
Equation 4: KL-Divergence of Query Occurrence Likelihood for Category c and Total Stream at Time t

[Figure 9: Category Percentage of Entire Query Stream and Divergence from Likelihood of any Query at Each Hour. Series: KL-Divergence, % of query stream, Distinct % of query stream. x-axis: Category (including Computing, Sports, Holidays, Research and Learning, Health, US Sites, Shopping, Government, Movies, Travel, Personal Finance, Home, Music, Entertainment).]

Comparing these divergences to the proportion of categorized queries in each category in Figure 6 quickly illustrates that divergence is not correlated with the number of queries categorized in each category. Also shown in Figure 9 is the average percentage of the entire query volume and distinct queries that match each category. Although the categories that cover the largest portions of the query stream also have the most relative popularity fluctuation, this correlation does not continue throughout all categories.
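The KL-divergence of Equation 4 can be sketched for any pair of discrete distributions given as raw counts. This is a generic illustration, not the authors' procedure: the function name is hypothetical, the event space is left to the caller, and returning infinity when the second distribution assigns zero probability to an observed event is an assumption, since the text does not say how zero probabilities were handled. The usage example compares hypothetical hour-of-day arrival counts, which is only one possible reading of the equation.

```python
from math import inf, log


def kl_divergence(p_counts, q_counts):
    """D(P || Q) for two discrete distributions given as event -> count dicts."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions need at least one observation")
    divergence = 0.0
    for event in set(p_counts) | set(q_counts):
        p = p_counts.get(event, 0) / p_total
        q = q_counts.get(event, 0) / q_total
        if p == 0:
            continue      # terms with p = 0 contribute nothing
        if q == 0:
            return inf    # assumption: no smoothing is applied
        divergence += p * log(p / q)
    return divergence


if __name__ == "__main__":
    # Hypothetical counts per hour of day, for illustration only.
    all_queries = {7: 10, 12: 30, 21: 60}
    music_queries = {7: 1, 12: 3, 21: 6}     # same shape as the total stream
    finance_queries = {7: 6, 12: 3, 21: 1}   # skewed toward the morning
    print(kl_divergence(all_queries, music_queries))
    print(kl_divergence(all_queries, finance_queries))
```

A category whose arrivals track the overall stream yields a divergence near zero, while one concentrated in particular hours yields a larger value, matching the use of divergence above as a measure of popularity fluctuation.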
We drilled down into the highly fluctuating categories and examined the behavior of the queries with the most highly fluctuating frequencies in each category. From this we hoped to gain some insight into the reasons why certain categories fluctuate, and the effect of terms and queries with very high flux on those categories. For example, the three most changing queries for the entertainment category on average over our week were:

Table 1: Top Three Fluctuating Entertainment Queries
gwyneth paltrow
paris hilton
orlando bloom

All three of these queries are specifically related to recent events in US popular culture; the actress Gwyneth Paltrow recently married in secret, and the news of her nuptials broke during the week we analyzed. Hilton Hotel heiress Paris Hilton has been a popular topic recently; she starred in a prime time reality TV show entitled The Simple Life. Also popular is Orlando Bloom, the actor who portrays a popular character in the Lord of the Rings trilogy. As the final installment of the series was released in US theatres during the week prior to our query log, it is no surprise to see his name as a top-changing query.

Drilling down further, we pinpointed some of the specific instances where these popular queries jumped the most. For example, in the afternoon of Friday, December 27th, the popularity of the query gwyneth paltrow skyrocketed. From 3-4PM, it occurred once, from 4-5PM it occurred 67 times, and from 5PM-6PM it occurred 1,855 times. The top changing (on average) twenty-five queries, after normalization, in the Entertainment and Music categories are shown in Table 2.

Table 2: Top 25 Fluctuating Queries from Music and Entertainment
Music: lyrics music britney spears furniture love hilary duff good charlotte sloppy seconds jessica simpson b2k eminem christina aguilera simple plan justin timberlake free music linkin park michael jackson beyonce jennifer lopez 50 cent kinky napster chic tupac blink 182
Entertainment: gwyneth paltrow paris hilton orlando bloom espn disney johnny depp much music disney channel hgtv disneychannel com www disneychanel com katie holmes pictures pamela anderson cartoon network hilary duff fake chad michael murray vivica a fox disneychannel care bears sailor moon www cartoonnetwork com days of our lives charmed tom welling

We also looked at some of the most frequently changing terms to see how they relate to the change of entire queries containing those terms. Some excellent examples of this behavior in the Entertainment category include the terms pictures (the tenth-most changing term) and duff (the 7th-most changing term). We looked at the popularity change (i.e., change in frequency) for queries containing these terms and found that several of them also exhibited large changes over time. For example, on the afternoon of December 28th from noon to 5PM EST, the query hilary duff changed from an initial frequency of 27 from 12-1PM to a peak of [...] (from 3-4PM), and then stabilized around 70 for the rest of the evening; similar spikes in frequency for this query occurred at similar times during other days in our period of study.

4.2 Trends in Uniqueness of Queries Within Categories
Although we have shown that different categories have differing trends of popularity over the hours of a day, this does not provide insight into how the sets of queries within those categories change throughout the day. In order to examine this, we return to the overlap measures used in Section 3. Overlap, distinct overlap, and the Pearson correlation of query frequencies for Personal Finance and Music are shown in Figure 10 and Figure 11.

[Figure 10: Personal Finance Overlap. Series: Overlap, Dist. Olap, Pearson.]

Although the uniqueness of queries in categories in general appears to be correlated with that of the entire query stream (Figure 3), that of particular categories appears to be substantially different from one to the next. For example, if we compare the overlap characteristics of personal finance with those of music, we see they are quite different.
Not only does personal finance have generally higher overlap, but it has a much higher overall overlap than distinct overlap, whereas they are nearly equal for music. Other categories with generally high overlap and distinct overlap are shopping, computing, and travel. Also, the correlation of frequencies of personal finance queries is very high all day, indicating searchers are entering the same queries roughly the same relative number of times. This is clearly not true for music. Some categories have a high Pearson correlation. This indicates that a significant portion of the queries in these categories is often ranked similarly by frequency. These categories are pornography, travel, research and learning, and computing, and their Pearson correlations are illustrated in Figure 12.
[Figure 11: Music Overlap. Series: Overlap, Dist. Olap, Pearson.]

It is clear that some categories have very similarly ranked queries by frequency throughout the day, while others vary dramatically according to query volume. Referring back to Figure 6 and Figure 9, uniqueness of queries in particular categories does not appear to be correlated with the number of queries in their respective category lists, the proportion of the query stream they represent, or the number of distinct queries they match.

[Figure 12: Pearson Correlations of Frequencies for Categories. Series: Personal Finance, Music, Movies, Computing, Entertainment, Government.]

This type of data is potentially of great use to query caching algorithms. For example, if it is known a priori that queries for certain categories are similarly ranked throughout the day, they can be given higher priority in a query-caching scheme. Similarly, queries in categories whose rankings change vastly over time might be given low caching priority.

5. CONCLUSIONS AND FUTURE WORK
This study focuses on investigating the nature of changes in the query stream of a very large search service over time. Understanding how users' queries change over time is critical to developing effective, efficient search systems and to engineering representative test sets and evaluations that drive this development.

In this study we find trends over time that are stable despite continuing fluctuation in query volume. Although the average query is repeated only twice during any given hour of the day, the total query traffic varies both in magnitude from one hour to the next, and also in degree of overlap and correlation in popularity of the queries that are received. In addition, we also find that the frequency distribution of an hour's worth of queries remains constant throughout the day. Also, at the most general level, we find that query volume is highest and query sets are most stable during peak hours of the day.

This study further investigates changes in the query stream over time by examining the nature of changes in popularity of particular topical categories. For this we use a set of topical categories created by human editors that represents approximately [...] of the average query traffic. We show that popularity of some of these categories fluctuates considerably while other categories remain relatively stable over the hours in a day. Additionally, we show that the overlap and correlation in popularity of the queries within each topical category varies quite differently over the course of the day. Extending this analysis to investigate changes in the very rare queries not often matched by our category lists would provide insight into whether those are changing similarly to more popular queries. One method for approaching this might be to incorporate automatic query classification methods to extend our basic lists.

This study is the gateway to a large and diverse body of future work. Integrating this knowledge of Circadian changes in the query stream by category will likely yield improved query disambiguation, query caching, and load balancing algorithms.

6. BIBLIOGRAPHY
[1] Beitzel, S., Jensen, E., Chowdhury, A., and Grossman, D. Using Titles and Category Names from Editor-driven Taxonomies for Automatic Evaluation. In Proceedings of CIKM'03 (New Orleans, LA, November, 2003), ACM Press.
[2] Broder, A. A Taxonomy of Web Search. SIGIR Forum 36(2) (Fall, 2002).
[3] Chowdhury, A. and Pass, G. Operational Requirements for Scalable Search Systems. In Proceedings of CIKM'03 (New Orleans, LA, November 2003), ACM Press.
[4] Eastman, C. and Jansen, B. Coverage, Relevance, and Ranking: The Impact of Query Operators on Web Search Engine Results. ACM Transactions on Information Systems, Vol. 21, No. 4, October 2003.
[5] Eiron, N. and McCurley, K. Analysis of Anchor Text for Web Search. In Proceedings of SIGIR'03 (Toronto, Canada, July 2003), ACM Press.
[6] Hawking, D., Craswell, N., and Griffiths, K. Which Search Engine is Best at Finding Online Services? In Proceedings of WWW10 (Hong Kong, May 2001), Posters. Actual poster available as ter.pdf
[7] Jansen, B. and Pooch, U. A review of Web searching studies and a framework for future research.
Journal of the Amercan Socety for Informaton Scence and Technology 52(3), , 200. [8] Jansen, B., Spnk, A., and Saracevc, T. Real lfe, real users, and real needs: a study and analyss of user queres on the web. Informaton Processng and Management, 36(2) (2000), [9] Jansen, B.J., Goodrum, A., Spnk, A. Searchng for multmeda: vdeo, audo, and mage Web queres. World Wde Web 3(4), [0] Lawrence, S. and Gles, C.L. Searchng the World Wde Web. Scence 280(5360), 98-00, 998.
8 [] Lempel, R. and Moran, S. Predctve cachng and prefetchng of query results n search engnes. In Proceedngs of WWW2 (Budapest, May 2003). [2] Markatos, E.P. On Cachng Search Engne Query Results. In the Proceedngs of the 5th Internatonal Web Cachng and Content Delvery Workshop, May [3] Raghavan, V. and Sever, H. On the Reuse of Past Optmal Queres. In Proc. of the 995 SIGIR Conference, , Seattle, WA, July 995. [4] Ross, N. and Wolfram, D. End user searchng on the Internet: An analyss of term par topcs submtted to the Excte search engne. Journal of the Amercan Socety for Informaton Scence 5(0), , [5] Sarava, P., Moura, E., Zvan, N., Mera, W., Fonseca, R., Rbero-Neto, B. Rank-preservng two-level cachng for scalable search engnes. In Proc. of the 24th SIGIR Conference, 5-58, New Orleans, LA, September, 200. [6] Slversten, C., Henznger, M., Maras, H., and Morcz, M. Analyss of a very large web search engne query log. SIGIR Forum 33() (Fall, 999), 6-2. [7] Spnk, A., Ozmutlu, S., Ozmutlu, H.C., and Jansen, B.J. U.S. versus European web searchng trends. SIGIR Forum 36(2), 32-38, [8] Spnk, A., Jansen, B.J., Wolfram, D., and Saracevc, T. From E-sex to e-commerce: Web search changes. IEEE Computer, 35(3), 07-09, [9] Spnk, A., Wolfram, D., Jansen, B.J. and Saracevc, T. Searchng the Web: The Publc and Ther Queres. Journal of the Amercan Socety of Informaton Scence 53(2), , 200. [20] Spnk, A., Jansen, B.J., and Saracevc, T. Vox popul: The publc searchng of the web. Journal of the Amercan Socety of Informaton Scence 52 (2), , 200. [2] Spnk, A., Jansen, B.J., and Ozmultu, H.C. Use of query reformulaton and relevance feedback by Excte users. Internet Research: Electronc Networkng Applcatons and Polcy 0 (4), [22] Sullvan, D. Searches Per Day. Search Engne Watch, February, [23] Wang, P., Berry, M., and Yang, Y. Mnng longtudnal web queres: Trends and patterns. Journal of the Amercan Socety for Informaton Scence and Technology 54(8), , June [24] J. Wen, J. Ne, H. 
Zhang Query Clusterng usng User Logs ACM Transactons on Informaton Systems, Vol. 20, No., January 2002, pp59-8. [25] Wolfram, D., H. Xe, Subject categorzaton of query terms for explorng Web users search nterests, Journal of the Amercan Socety for Informaton Scence, v.53 n.8, p , June [26] Xe, Y., O Hallaron, D. Localty n Search Engne Queres and Its Implcatons for Cachng. Infocom 2002.
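
As an illustration of the caching remark in Section 4, the following is a minimal Python sketch and not the procedure used in this study. It assumes that hourly query counts per category are already available as dictionaries, that stability is measured as the mean Pearson correlation of frequencies over the queries shared by consecutive hours, and that a cut-off of 0.8 separates high from low caching priority. The function names, the threshold, the category names, and the toy counts are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat a constant series as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def hourly_stability(hourly_counts):
    """Mean Pearson correlation between consecutive hours.

    hourly_counts is a list of dicts (query -> frequency), one per hour.
    Only queries present in both hours of a pair are compared.
    """
    scores = []
    for prev, curr in zip(hourly_counts, hourly_counts[1:]):
        shared = sorted(set(prev) & set(curr))
        if len(shared) < 2:
            continue  # too few shared queries to correlate
        scores.append(pearson([prev[q] for q in shared],
                              [curr[q] for q in shared]))
    return sum(scores) / len(scores) if scores else 0.0


def cache_priority(category_logs, threshold=0.8):
    """Label each category 'high' or 'low' (hypothetical cut-off)."""
    return {
        category: "high" if hourly_stability(hours) >= threshold else "low"
        for category, hours in category_logs.items()
    }


# Toy data, invented for illustration only.
toy_logs = {
    "stable_category": [
        {"a": 10, "b": 5, "c": 1},
        {"a": 20, "b": 10, "c": 2},
        {"a": 30, "b": 15, "c": 3},
    ],
    "volatile_category": [
        {"x": 10, "y": 5, "z": 1},
        {"x": 1, "y": 5, "z": 10},
        {"x": 10, "y": 5, "z": 1},
    ],
}

if __name__ == "__main__":
    for name, label in cache_priority(toy_logs).items():
        print(name, label)
```

In a real deployment the threshold would have to be tuned against observed cache hit rates, and the correlation could equally be computed against a fixed reference hour rather than the preceding one.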
