Type Less, Find More: Fast Autocompletion Search with a Succinct Index


Holger Bast
Max-Planck-Institut für Informatik, Saarbrücken, Germany
bast@mpi-inf.mpg.de

Ingmar Weber
Max-Planck-Institut für Informatik, Saarbrücken, Germany
iweber@mpi-inf.mpg.de

ABSTRACT

We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but that yields an order of magnitude faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.

Categories and Subject Descriptors
H.3.1 [Content Analysis and Indexing]: Indexing Methods; H.3.3 [Content Analysis and Indexing]: Retrieval Models; H.5.2 [User Interfaces]: Theory and Methods

General Terms
Algorithms, Design, Experimentation, Human Factors, Performance, Theory

Keywords
Autocompletion, Empirical Entropy, Index Data Structure

1. INTRODUCTION

Autocompletion is a widely used mechanism to get to a desired piece of information quickly and with as little knowledge and effort as possible.
One of its early uses was in the Unix Shell, where pressing the tabulator key gives a list of all file names that start with whatever has been typed on the command line after the last space. Nowadays, we find a similar feature in most text editors, and in a large variety of browsing GUIs, for example, in file browsers, in the Microsoft Help suite, or when entering data into a web form. Recently, autocompletion has been integrated into a number of (web and desktop) search engines like Google Suggest or Apple's Spotlight. We discuss more applications in Section 1.2.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'06, August 6-11, 2006, Seattle, Washington, USA. Copyright 2006 ACM /06/ $5.00.

In the simpler forms of autocompletion, the list of completions is simply a range from a (typically precomputed) list of words. For the Unix Shell, this is the list of all file names in all directories listed in the PATH variable. For the text editors, this is the list of all words entered into the file so far (and maybe also words from related files). In Google Suggest, completions appear to come from a precompiled list of popular queries. For these kinds of applications we can easily achieve fast response times by two binary or B-tree searches in the (pre)sorted list of candidate strings.

More advanced forms of autocompletion take into account the context in which the to-be-completed word has been typed. The problem we propose and discuss in this paper is of this kind. The formal problem definition will be given in Section 2. More informally, imagine a user of a search engine typing a query.
Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. All this should preferably happen in less time than it takes to type a single letter. For example, assume a user has typed conference sig. Promising completions might then be sigir, sigmod, etc., but not, for example, signature, assuming that, although signature by itself is a pretty frequent word, the query conference signature leads to only few good hits. See Figure 1 for a screenshot of our search engine responding to that query. A live demo is available online.

Figure 1: A screenshot of our search engine for the query conference sig searching the English Wikipedia. The list of completions and hits is updated automatically and instantly after each keystroke, hence the absence of any kind of search button. The number in parentheses after each completion is the number of hits that would be obtained if that completion were typed. Query words need not be completed, however, because the search engine does an implicit prefix search: if, for example, the user continued typing conference sig proc, completions and hits for proc, e.g., proceedings, would be from the 185 hits for conference sig.

1.1 Our results

We have developed a new indexing data structure, named HYB, which uses no more space than a state-of-the-art compressed inverted index, and which can respond to autocompletion queries as described above within a small fraction of a second, even for collection sizes in the Terabyte range. Our main competitor in this paper is the inverted index, referred to as INV in the following. Other data structures that could be directly applied to our problem either use a lot of space or have other limitations; we discuss these in Section 1.2.

We give a rigorous mathematical analysis of HYB and INV with respect to both space usage and query processing times. Our analysis accurately predicts the real behavior on our test collections. Concerning space usage, we define a notion of empirical entropy [11, 22], which captures the inherent space complexity of an index independent of a particular compression scheme. We prove that the empirical entropy of HYB is essentially equal to that of INV, and we find that the actual space usage of our implementation of the two index structures is indeed almost equal, for each of our three test collections. Concerning processing times, we give a precise quantification of the number of operations needed, from which we derive bounds for the worst, best, and average-case behavior of INV and HYB. We also take into account the different latencies of sequential and random access to data [1].

We compare INV and HYB on three test collections with different characteristics. One of our collections has been (semi-)publicly searchable over the last year, so that we have autocompletion queries from real users for it. Our largest collection is the TREC Terabyte benchmark with over 25 million documents [7]. On all three collections and on all the queries we considered, HYB outperforms INV by a factor of [...] in worst-case query processing time, and by a factor of 3-10 in average-case query processing time. In absolute terms, HYB achieves average query processing times of one tenth of a second or less on all collections, on a single machine and with the index on disk (and not in main memory).
We have built a full-fledged search engine that supports autocompletion queries of the described kind combined with support for proximity/phrase search, XML tags, subword and phrase completion, and category information. All of these extensions are described in Section 4.

1.2 Related work

The autocompletion feature as described so far is reminiscent of stemming, in the sense that by stemming, too, prefixes instead of full words are considered [23]. But unlike stemming, our autocompletion feature gives the user feedback on which completions of the prefix typed so far would lead to highly ranked documents. The user can then assess the relevance of these completions to his or her search desire, and decide to (i) type more letters for the last query word, e.g., in the query from Figure 1, type i and r so that the query is then conference sigir, or to (ii) start with the next query word, e.g., type a space and then proc, or to (iii) stop searching as, e.g., the user was actually looking for one of the hits shown in Figure 1. There is no way to achieve this by a stemming preprocessing step, because there is no way to foresee the user's intent. This kind of user interaction is well known to improve retrieval effectiveness in a variety of situations [21].

While our autocompletion feature is for the purpose of finding information, autocompletion has also been employed for the purpose of predicting user input, for example, for typing messages with a mobile phone, for users with disabilities concerning typing, or for the composition of standard letters [6, 8, 14, 15, 20]. In [12], contextual information has been used to select promising extensions for a query. Paynter et al. have devised an interface with a zooming-in property on the word level, based on the identification of frequent phrases [18]. We get a related feature by the subword/phrase-completion mechanism described in Section 4.4.
Our autocompletion problem is related to but distinctly different from multi-dimensional range searching problems, where the collection consists of tuples (of some fixed dimension, for example, pairs of word prefixes), and queries are asking for all tuples that match a given tuple of ranges [2, 4, 10, 13]. These data structures could be used for our autocompletion problem, provided that we were willing to limit the number of query words. For fast processing times, however, the space consumption of any of these structures is on the order of $N^{1+d}$, where $N$ is the size of an inverted index, and $d > 0$ grows (fast) with the dimension. For our autocompletion queries, we can achieve fast query processing times and space efficiency at the same time because we have the set of documents matching the part of the query before the last word already computed (namely when this part was being typed). In a sense, our autocompletion problem is therefore a 1½-dimensional range searching problem.

Finally, there is a large variety of alternatives to the inverted index in the literature. We have considered those we are aware of with regard to their applicability to our autocompletion problem, but found them either unsuitable or inferior to the inverted index in that respect. For example, approaches that consider document by document are bound to be slow due to a poor locality of access; in contrast, both INV and HYB are mostly scanning long lists; see Section 3. Signature files were found to be in no way superior to (but significantly more complicated than) the inverted index in all major respects in [24]. Suffix arrays and related data structures address the issue of full substring search, which is not what we want here (but see Section 4.4); a direct application of a data structure like [11] would have the same efficiency problems as INV, whereas multi-dimensional variants like [10] require super-linear space, as explained above.

2. FORMAL PROBLEM DEFINITION AND DEFINITION OF EMPIRICAL ENTROPY

The following definition of our autocompletion problem takes neither positional information nor ranking of the completions or of the documents into account. We will first, in Section 3, analyze our data structures for this basic setting. In Section 4, we then show how to generalize the data structures and their analysis to cope with positional information, ranking, and a number of other useful enhancements. This generalization will be straightforward.

Definition 1. An autocompletion query is a pair $(D, W)$, where $W$ is a range of words (all possible completions of the last word which the user has started typing) and $D$ is a set of documents (the hits for the preceding part of the query).
To process the query means to compute the subset $W' \subseteq W$ of words that occur in at least one document from $D$, as well as the subset $D' \subseteq D$ of documents that contain at least one of these words.

For our example conference sig, $D$ is the set of all documents containing a word starting with conference (computed when the last letter of this word was typed), and $W$ is the range of all words from the collection starting with sig. For queries with only a single word, e.g., confer, $D$ is simply the set of all documents.

To analyze the inherent space complexity of INV and HYB independently of the specialties of a particular compression scheme, we introduce a notion of empirical entropy. Both INV and HYB are essentially a collection of (multi)sets and sequences. The following definition gives a natural notion of entropy for each such building block, and for arbitrary combinations of them (similar definitions have been made in [11, 22]). The reader might first want to skip the following definition and come back to it when it is first used in the analysis that follows.

Definition 2. We define empirical entropy for the following entities, where $H(p_1, \ldots, p_l) = -\sum_{i=1}^{l} p_i \log_2 p_i$ is the $l$-ary entropy function.

(a) For a subset of size $n'$ with elements from a universe of size $n$, the empirical entropy is $n \cdot H(n'/n, 1 - n'/n)$ (include each element of the universe into the subset with probability $n'/n$), which is
$$n' \cdot \log_2 \frac{n}{n'} + (n - n') \cdot \log_2 \frac{n}{n - n'}.$$

(b) For a multisubset of size $n'$ with elements from a universe of size $n$, the empirical entropy is $(n + n') \cdot H(n'/(n + n'), n/(n + n'))$ (consider a bitvector of size $n + n'$, and let a bit be 0 with probability $n'/(n + n')$ and 1 otherwise; the prefix sums at the 0-bits give the multisubset), which is
$$n' \cdot \log_2 \frac{n + n'}{n'} + n \cdot \log_2 \frac{n + n'}{n}.$$

(c) For a sequence of $n$ elements from a universe of size $l$, where the $i$th element occurs $n_i$ times ($n_1 + \cdots + n_l = n$), the empirical entropy is $n \cdot H(n_1/n, \ldots, n_l/n)$ (for each position, pick element $i$ with probability $n_i/n$), which is
$$n_1 \cdot \log_2 \frac{n}{n_1} + \cdots + n_l \cdot \log_2 \frac{n}{n_l}.$$

(d) For a collection of $l$ entities with empirical entropies $H_1, \ldots, H_l$, the empirical entropy is simply $H_1 + \cdots + H_l$.

3. INV, HYB, AND THEIR ANALYSIS

In this section we will describe INV and HYB, and analyze them with respect to their empirical entropy and their processing time for autocompletion queries according to Definition 1. Query processing times will be quantified in terms of all relevant parameters; from this we can easily derive worst-case, best-case, and average-case bounds. Our average-case bounds make simplifying assumptions on the distribution of words in the documents, but nevertheless turn out to predict the actual behavior quite well. Implementation issues and the actual performance of our implementations of INV and HYB will be discussed in Section 5. We briefly comment on index construction times in Section 3.3.

3.1 The inverted index (INV)

The inverted index is the data structure of choice for most search applications: it is relatively easy to implement and extend by other features, it can be compressed well, it is very efficient for short queries, and it has an excellent locality of access [23]. In this paper, by INV we mean the following data structure: for each word store the list of all (ids of) documents containing that word, sorted in ascending order. We do not consider enhancements such as skip pointers [17], which we would expect to give similar benefits for both INV and HYB, however at the price of an increased space usage.

In the following, we first estimate the inherent space efficiency (empirical entropy) of INV. We then analyze the time complexity of processing autocompletion queries with INV, and point out two inherent problems.

Lemma 1. Consider an instance of INV with $n$ documents and $m$ words, and where the $i$th word occurs in $n_i$ distinct documents (so that $\sum_{i=1}^{m} n_i$ is the total number of word-in-document pairs). Let $H_{inv}$ be the empirical entropy according to Definition 2. Then
$$H_{inv} \le \sum_{i=1}^{m} n_i \cdot \left( \frac{1}{\ln 2} + \log_2 \frac{n}{n_i} \right),$$
and for all collections considered in this paper (where most $n_i$ are much smaller than $n$) this bound is tight up to 2%.

Proof. According to Definition 2 (a) and (d), we have
$$H_{inv} = \sum_{i=1}^{m} \left( n_i \cdot \log_2 \frac{n}{n_i} + (n - n_i) \cdot \log_2 \frac{n}{n - n_i} \right).$$
To prove the lemma, it suffices to observe that, because $1 + x \le e^x$ for any real $x$,
$$(n - n_i) \cdot \log_2 \frac{n}{n - n_i} = \frac{n - n_i}{\ln 2} \cdot \ln\left( 1 + \frac{n_i}{n - n_i} \right) \le \frac{n - n_i}{\ln 2} \cdot \frac{n_i}{n - n_i} = \frac{n_i}{\ln 2}.$$
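As a numerical check of Lemma 1, the following minimal Python sketch evaluates the exact empirical entropy of Definition 2 (a) and (d) and compares it with the upper bound from the lemma. The function names and the toy list lengths are assumptions made for illustration; they are not taken from the paper's implementation or data.

```python
import math


def subset_entropy(n_sub, n):
    """Definition 2(a): entropy in bits of a subset of size n_sub from a universe of size n."""
    if n_sub == 0 or n_sub == n:
        return 0.0
    return (n_sub * math.log2(n / n_sub)
            + (n - n_sub) * math.log2(n / (n - n_sub)))


def inv_entropy(list_lengths, n):
    """Definition 2(d): an inverted index is a collection of subsets, so entropies add up."""
    return sum(subset_entropy(n_i, n) for n_i in list_lengths)


def lemma1_bound(list_lengths, n):
    """Upper bound of Lemma 1: sum of n_i * (1/ln 2 + log2(n/n_i))."""
    return sum(n_i * (1 / math.log(2) + math.log2(n / n_i))
               for n_i in list_lengths if n_i > 0)


# Hypothetical toy instance: 1000 documents, six words with these list lengths.
n = 1000
lengths = [8, 2, 4, 2, 50, 1]
exact = inv_entropy(lengths, n)
bound = lemma1_bound(lengths, n)
print(f"exact entropy: {exact:.1f} bits, Lemma 1 bound: {bound:.1f} bits")
```

On instances where every list is short compared with the number of documents, the two printed values lie close together, which is the situation the lemma describes as tight.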

Lemma 1 tells us that if the documents in each list were picked uniformly at random, then a Golomb-encoding of the gaps [23] from one document id to the next (for list $i$, the expected size of a gap would be $n/n_i$) would achieve a space usage very close to $H_{inv}$ bits. In our implementation, we opted to encode gaps with the Simple-9 encoding from [3], which is easy to implement, yet achieves very fast decompression speeds at the price of only a moderate loss in compression efficacy; details are reported in Section 5.

Lemma 2. With INV, an autocompletion query $(D, W)$ can be processed in the following time, where $D_w$ denotes the inverted list for word $w$:
$$|D| \cdot |W| + \sum_{w \in W} |D_w| + \sum_{w \in W} |D \cap D_w| \cdot \log |W|.$$
Assuming that the elements of $W$, $D$, and the $D_w$ are picked uniformly at random from the set of $m$ words and the set of $n$ documents, respectively, this bound has an expected value of
$$|D| \cdot |W| + \frac{|W|}{m} \cdot N + \frac{|D| \cdot |W|}{m} \cdot \frac{N}{n} \cdot \log |W|.$$

Remark. By picking the elements of a set $S$ at random from a set $U$, we mean that each subset of $U$ of size $|S|$ is equally likely for $S$. We are not making any randomness assumption on the sizes of $W$, $D$, and $D_w$ above.

Proof sketch. The obvious way to use an inverted index to process an autocompletion query $(D, W)$ is to compute, for each $w \in W$, the intersections $D \cap D_w$. Then, $W'$ is simply the set of all $w$ for which the intersection was non-empty, and $D'$ is the union of all (non-empty) intersections. The intersections can be computed in time linear in the total input volume $\sum_{w \in W} (|D| + |D_w|)$ [footnote 1]. The union can be computed by a $|W|$-way merge, which requires on the order of $\log |W|$ time per element scanned. With the randomness assumptions, the expected size of $D_w$ is $N/m$, and the expected size of $D \cap D_w$ is $|D|/n \cdot N/m$.

Lemma 2 highlights two problems of INV. The first is that the term $|D| \cdot |W|$ can become prohibitively large: in the worst case, when $|D|$ is on the order of $n$ (i.e., the first part of the query is not very discriminative) and $|W|$ is on the order of $m$ (i.e., only few letters of the last query word have been typed), the bound is on the order of $n \cdot m$, that is, quadratic in the collection size.
The second problem is due to the required merging. While the volume $\sum_{w \in W} |D \cap D_w|$ will typically be small once the first query word has been completed, it will be large for the first query word, especially when only few letters have been typed. As we will see in Section 5, INV frequently takes seconds for some queries, which is quite undesirable in an interactive setting, and is exactly what motivated us to develop a more efficient index data structure.

Footnote 1. There are asymptotically faster algorithms for the intersection of two lists [5], but in our experiments, we got the best results with the simple linear-time intersect, which we attribute to its compact code and perfect locality of access.

3.2 Our new data structure (HYB)

The basic idea behind HYB is simple: precompute inverted lists for unions of words. Assume an autocompletion query $(D, W)$, where the union of all lists for word range $W$ has been precomputed. We would then get $D'$ with a single intersection (of $D$ with the precomputed list). However, from this precomputed list alone we can no longer infer the set $W'$ of completions leading to a hit. Since $W$ can be an arbitrary word range, it is also not clear which unions should be precomputed, especially when we do not want to use more space than an (optimally compressed) inverted index.

The analysis given in this section suggests the following approach: group the words in blocks so that the lengths of the inverted lists in each block sum to (approximately) $c \cdot n$, for some constant $c < 1$ (we will later choose $c \approx 0.2$). For each block, store the union of the covered inverted lists as a compressed multiset, using an effective gap encoding scheme just as done for INV (repetitions of the same element in the multiset correspond to a gap of zero). In parallel to each multiset, for each element $x$ store the id of the word that led to the inclusion of (this occurrence of) $x$ in the multiset. This gives a sequence of word ids, the length of which is exactly the size of the multiset.
Encode these word ids with code length (approximately) $\log_2 \left( \left( \sum_{i=1}^{l} n_i \right) / n_i \right)$ for the $i$th word, where $n_i$ is the number of documents containing the $i$th word, and $l$ is the number of words in the respective block.

Here is an example. Let one of the blocks comprise four words A, B, C, and D, with inverted lists

A : 3, 5, 6, 8, 9, 11, 12, 15
B : 5, 11
C : 3, 7, 11, 13
D : 3, 8

We would then like to store, in compressed form, the multiset (of document ids) and the sequence (of word ids)

3 3 3 5 5 6 7 8 8 9 11 11 11 12 13 15
A C D A B A C A D A A  B  C  A  C  A

The optimal encoding of the words A, B, C, D would use code lengths $\log_2(16/8) = 1$, $\log_2(16/2) = 3$, $\log_2(16/4) = 2$, $\log_2(16/2) = 3$, respectively, for example A = 0, B = 110, C = 10, D = 111. An optimal encoding of the four gaps 0, 1, 2, 3 that occur in the above multiset of document ids would be 0, 10, 110, 111, respectively. What we actually store are then the two bit vectors (where the | are solely for better readability; the codes in this example are prefix-free)

111|0|0|110|0|10|10|10|0|10|110|0|0|10|10|110
0|10|111|0|110|0|10|0|111|0|0|110|10|0|10|0

Note that due to the two different encodings the two lists end up having different lengths in compressed form, and this is also what will happen in reality.

The following analysis will make very clear that (i) one should choose blocks of equal list volume (and not, for example, of equal number of words), (ii) this volume should be a small but substantial fraction of the number of documents (and neither smaller nor larger), and (iii) the lists of document ids should be gap-encoded while the lists of word ids should be entropy-encoded.

As for the space usage, we will first derive a very tight estimate of the entropy of HYB, and then show that, somewhat surprisingly, if we only choose the block volume to be a small enough fraction of the number of documents, the entropy of HYB is almost exactly that of INV. We will then show how HYB, when the blocks are chosen of sufficiently large volume, can be used to process autocompletion queries in time linear in the number of documents, for any reasonable word range.
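The example block with the words A, B, C, and D can be reproduced with a short Python sketch. It is an illustration under simplifying assumptions, not the paper's C++ implementation: lists are kept uncompressed, a word range is given as a pair of bounds, and all function names are hypothetical. The sketch builds the two parallel sequences of a block, and then answers one autocompletion query both with a single scan of the block (the HYB idea) and with one intersection per word (the INV baseline, using a set lookup for brevity instead of the linear-time list intersection).

```python
import math


def build_block(inverted_lists):
    """Merge the inverted lists of one block into two parallel sequences:
    the sorted multiset of document ids and, per entry, the contributing word."""
    pairs = sorted((doc, word)
                   for word, docs in inverted_lists.items()
                   for doc in docs)
    return [doc for doc, _ in pairs], [word for _, word in pairs]


def gaps(doc_ids):
    """Differences between consecutive document ids; a repeated id gives gap 0."""
    return [cur - prev for prev, cur in zip([0] + doc_ids, doc_ids)]


def code_lengths(inverted_lists):
    """Ideal code length log2(block volume / n_i) for each word of the block."""
    total = sum(len(docs) for docs in inverted_lists.values())
    return {w: math.log2(total / len(docs)) for w, docs in inverted_lists.items()}


def query_block(doc_ids, word_ids, D, word_range):
    """One linear scan of the block against the sorted list D."""
    lo, hi = word_range
    completions, hits = set(), []
    j = 0
    for doc, word in zip(doc_ids, word_ids):
        while j < len(D) and D[j] < doc:
            j += 1
        if j == len(D):
            break
        if D[j] == doc and lo <= word <= hi:
            completions.add(word)
            if not hits or hits[-1] != doc:
                hits.append(doc)
    return sorted(completions), hits


def query_inv(inverted_lists, D, word_range):
    """Baseline: intersect D with the list of every word in the range."""
    lo, hi = word_range
    in_D = set(D)
    completions, hits = [], set()
    for word in sorted(inverted_lists):
        if lo <= word <= hi:
            common = [doc for doc in inverted_lists[word] if doc in in_D]
            if common:
                completions.append(word)
                hits.update(common)
    return completions, sorted(hits)


lists = {"A": [3, 5, 6, 8, 9, 11, 12, 15], "B": [5, 11],
         "C": [3, 7, 11, 13], "D": [3, 8]}
doc_ids, word_ids = build_block(lists)
print(doc_ids)
print(" ".join(word_ids))
print(gaps(doc_ids))
print(code_lengths(lists))
# Hypothetical query: D = {3, 7, 10, 13}, W = all words from B to D.
print(query_block(doc_ids, word_ids, [3, 7, 10, 13], ("B", "D")))
print(query_inv(lists, [3, 7, 10, 13], ("B", "D")))
```

Both query functions return the same pair of completions and hits; the block version touches each stored entry once and needs no merge across words, which is the property the analysis below quantifies.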
Since HYB essentially scans long lists, without the need for any merging, except when the word range is huge, it also has an excellent locality of access.

Lemma 3. Consider an instance of HYB with $n$ documents and $m$ words, where the $i$th word occurs in $n_i$ documents, and where for each block the sum of the $n_i$ with $i$ from that block is $c \cdot n$, for some $c > 0$. Then the empirical entropy $H_{hyb}$, defined according to Definition 2, satisfies
$$H_{hyb} \le \sum_{i=1}^{m} n_i \cdot \left( \frac{1 + c/2}{\ln 2} + \log_2 \frac{n}{n_i} \right),$$
and the bound is tight as $c \to 0$.

Proof. Consider a fixed block of HYB, and let $n_i$ denote the number of documents containing the $i$th word belonging to that block. Throughout this proof, let $\sum_i n_i$ denote the sum over all these $i$ (so that the sum over all $\sum_i n_i$ from all blocks gives the $\sum_{i=1}^{m} n_i$ from the lemma). According to Definition 2 (b), (c), and (d), the empirical entropy of this block is then
$$\sum_i n_i \cdot \log_2 \frac{n + \sum_i n_i}{\sum_i n_i} + n \cdot \log_2 \frac{n + \sum_i n_i}{n} + \sum_i n_i \cdot \log_2 \frac{\sum_i n_i}{n_i}.$$
Now adding the first and the last term, the arguments of the logarithms partially cancel out (!), and we get
$$\sum_i n_i \cdot \log_2 \frac{n + \sum_i n_i}{n_i} + n \cdot \log_2 \frac{n + \sum_i n_i}{n}.$$
Now using that, by assumption, $\sum_i n_i = c \cdot n$, we obtain
$$\sum_i n_i \cdot \left( (1 + 1/c) \cdot \log_2 (1 + c) + \log_2 \frac{n}{n_i} \right).$$
Since $(1 + 1/c) \cdot \ln(1 + c) \le 1 + c/2$ for all $c > 0$ (not obvious, but true), we can upper bound this (tightly, as $c \to 0$) by
$$\sum_i n_i \cdot \left( \frac{1 + c/2}{\ln 2} + \log_2 \frac{n}{n_i} \right).$$
This bounds the empirical entropy of a single block of HYB (the sum goes over all words from that block). Adding this over all blocks gives us the bound claimed in the lemma.

Comparing Lemma 3 with Lemma 1, we see that if we let the blocks of HYB be of volume at most $c \cdot n$, for some small fraction $c$, then the empirical entropy of HYB is essentially that of an inverted index. In Section 4.2, we will see that when we take positional information into account, the empirical entropy of HYB actually becomes less than that of INV, for any choice of block volumes.

In our implementation of HYB, we compress the lists of document ids by a Simple-9 encoding of the gaps, just as described for INV above. For the lists of word ids, entropy-optimal compression could be achieved by arithmetic encoding [23], but for efficiency reasons, we compress word ids as follows: assuming that the word frequencies in a block have a Zipf-like distribution, it is not hard to see that a universal encoding with $\log x$ bits for number $x$ [17] of the ranks of the words, if sorted in order of descending frequency, is entropy-optimal, too.
We again opted for Simple-9 encoding of these ranks, which gives us a reasonable compression and very fast decompression speed, without the need for any large codebook. We take block sizes as $n/5$, but also take word/prefix boundaries into account such that frequent prefixes like pro, com, the get a block on their own. This is to avoid that a query unnecessarily spans more than one block.

Lemma 4. Using HYB with blocks of volume $N'$, autocompletion queries $(D, W)$ can be processed in the following time, where $D_w$ is the inverted list for word $w$:
$$\sum_{w \in W} |D_w| \cdot \left( 1 + \frac{|D|}{N'} \right) + \sum_{w \in W} |D \cap D_w| \cdot \log \left( \sum_{w \in W} |D_w| / N' \right).$$
For $N' = \Theta(n)$ and $|W| \le m \cdot n / N$, and assuming that the elements of $D$, $D_w$, and $W$ are picked uniformly at random from the set of all $n$ documents or all $m$ words, respectively, the expected processing time is bounded by $O(n)$.

Proof sketch. According to Definition 1, we have to compute, given $(D, W)$, the set $W'$ of words from $W$ contained in documents from $D$, as well as the set $D'$ of documents containing at least one such word. For each block $B$, a straightforward intersection of the given $D$ with the list of document-word pairs from $B$ gives us the set $W'_B$ of all words from $W'$ in block $B$, as well as the set $D'_B$ of all documents from $D'$ which contain a word from $B$. From these, $D'$ can be computed by a $k$-way merge, where $k$ is the number of blocks that contain a word from $W$, and $W'$ can be computed by a simple linear-time sort into $|W|$ buckets (because $W$ is a range). The number $k$ of blocks is $\sum_{w \in W} |D_w| / N'$, which is $O(1)$ in expectation, given the randomness assumptions stated in the lemma.

3.3 Index construction time

While getting from a collection of documents (files) to INV is essentially a matter of one big external sort [23], HYB does not require a full inversion of the data. For our experiments, however, we built the compressed indices for both INV and HYB from an intermediate fully inverted text version of the collection, which takes essentially the same time for both.

4. EXTENSIONS

In this section, we describe a number of extensions of the basic autocompletion facility we have described and analyzed so far.
The first (ranking) is essential for practical usability, the second (proximity search) greatly widens the spectrum of search tasks for which autocompletion can be useful, and the others (support for XML tags, subword and phrase completion, and semantic tags) give advanced search facilities to the expert searcher.

4.1 Ranking

So far, we have considered the following problem (from Definition 1): while the user is typing a query, compute after each keystroke the list of all completions of the last query word that lead to at least one hit, as well as the list of all hits that would be obtained by any of these completions. In practice, only a selection of items from these lists can and will be presented to the user, and it is of course crucial that the most relevant completions and hits are selected.

A standard approach for this task in ad-hoc retrieval is to have a precomputed score for each word-in-document pair, and when a query is being processed, to aggregate these scores for each candidate document, and return documents with the highest such aggregated scores [23]. Both INV and HYB can be easily adapted to implement any such scoring and aggregation scheme: store by each word-in-document pair its precomputed score, and when intersecting, aggregate the scores. A decision has to be made on how to reconcile scores from different completions within the same document. We suggest the following: when merging the intersections (which gives the set $D'$ according to Definition 1), compute for each document in $D'$ the maximal score achieved for some completion in $W'$ contained in that document, and compute for each completion in $W'$ the maximal score achieved for a hit from $D'$ for this completion.

Asymptotically, the inclusion of ranking does not affect the time bounds derived in Lemmas 2 and 4, and our experiments show that ranking never takes more than half of the total query processing time; see Section 5.4. The increase in space usage depends on the selected scoring scheme, and is the same for INV and HYB.
It is for these reasons that we factored out the ranking aspect from our basic Definition 1 and from our space and time complexity analysis in Section 3.

4.2 Proximity/Phrase searches

With a properly chosen scoring function, such as BM25, mere ranking by score aggregation often gives very satisfactory precision/recall behavior [19]. There are many queries, however, where the decisive cue on whether a particular document is relevant or not lies in whether certain of the query words occur close to each other in that document. See [16] for a recent positive result on the use of proximity information in ad-hoc retrieval.

Our autocompletion feature increases the benefits of a proximity operator, because the use of this operator will strongly narrow down the list of completions displayed to the user, which in turn makes it easier for the user to filter out irrelevant completions. For example, when searching the Wikipedia collection the most relevant completion for the non-proximity query max pl would be place (because max and place are both frequent words), but for the proximity query max..pl it is planck. Here the two dots .. indicate that words should occur within x words of each other, for some user-definable parameter x.

It is not hard to extend both INV and HYB to support proximity search: in the document lists (INV and HYB) as well as in the word lists (HYB only), we duplicate each entry as many times as it occurs in the corresponding document, and store the positions in a parallel array of the same size. Word and document lists are compressed just as before, and the lists of positions are gap-encoded by Simple-9, just like the lists of document ids. The intersection routine is adapted to consider a proximity window as an additional parameter.

As we will see in Section 5.3, the position lists increase the index size by a factor of 4-5, for both INV and HYB (without any kind of stopword removal). We can extend our analysis from Section 3 to predict this factor as follows.
If we replace $n_i$, the number of documents containing the $i$th word, by $N_i$, the total number of occurrences of the $i$th word, and $n$, the number of documents, by $N$, the total number of word occurrences, we can show that (details omitted)
$$H'_{hyb} \le H'_{inv} \le \sum_{i=1}^{m} \left( \frac{N_i}{\ln 2} + N_i \cdot \log_2 \frac{N}{N_i} \right),$$
where $H'_{inv}$ and $H'_{hyb}$ denote the empirical entropy of INV and HYB, respectively, with positional information. That is, with positional information, HYB is always more space-efficient than INV, irrespective of how we divide into blocks. It can be shown that $N_i \cdot \log_2 (N/N_i) \le 2 N_i \cdot \log_2 (n/n_i)$, and since on average a word occurs about 2-3 times in a document, this is just 4-5 times $n_i \cdot \log_2 (n/n_i)$, which was the corresponding term in the entropy bound for INV or HYB without positional information (Lemmas 1 and 3).

4.3 Semistructured (XML) text

Many documents contain semantic information in the form of tag pairs. We briefly sketch how we can make good use of such tags in our autocompletion scenario. Assume that in the archive of a mailing list, the subject of a mail is enclosed in a <subject>...</subject> tag pair. We can then easily implement an operator = (in a way very similar to our implementation of the proximity operator), such that for a query subject=sig only those completions of sig are displayed which actually occur in the subject line of a mail, and only such documents are displayed as hits.

4.4 Completion to subwords and phrases

Another simple yet often useful extension to the basic autocompletion feature is to consider as potential matches not only the words as they occur in the collection, but also meaningful subwords and phrases. An example involving a subword: for the query normal..vec we might want to see eigenvector as one of the relevant completions. An example involving a phrase: for the query max planck we might want to see the phrase max planck institute as one of the relevant completions. It is not hard to see that the autocompletion according to Definition 1 will automatically provide this feature if only we add the corresponding subwords/phrases to the index.
4.5 Category information

Our autocompletion feature can be combined with a number of other technologies that enhance the semantics of a corpus. To give just one more example here, assume we have tagged all conference names in a collection. Then assume we duplicate all conference names in the index, with an added prefix of, say, conf:, e.g., conf:sigir. By the way our autocompletion works, for the query seattle conf: we would then get a list of all names of conferences that occur in documents that also mention Seattle.

5. EXPERIMENTS

We implemented both INV and HYB in compressed format, as described in Sections 3.1 and 3.2. Each index is stored in a single file with the individual lists concatenated and an array of list offsets at the end. The vocabulary (which is the same for INV as for HYB) is stored in a separate file. All our code is in C++. All our experiments were run on a Dual Opteron machine, with 2 Intel Xeon 3 GHz processors, 8 GB of main memory, and running Linux. We ensured that the index was not cached in main memory.

5.1 Test collections

We compared the performance of INV and HYB on three collections of different characteristics. The first collection is a mailing-list archive plus several encyclopedias on homeopathic medicine. This collection has been searchable via our engine over the past year by an audience of several hundred people. The second collection consists of the complete dumps of the English and German Wikipedia from December 2005 (search.mpi-inf.mpg.de/wikipedia). The third collection is the large TREC Terabyte collection [7], which served as a stress test for our index structures (and for the authors as well). Details about all three collections are given in Table 1, where the raw size of a collection is the total size of the original, uncompressed files in their original formats (e.g., HTML or PDF).

5.2 Queries

For the Homeopathy collection, we picked 5,732 maximal queries (that is, queries which are not a true prefix of another query) from a fixed time slice of our query log for that collection.
From each of these maximal queries, a sequence of autocompletion queries was generated by typing the query from left to right, with a minimal prefix length of 3. For example, the maximal query acidum phos gives rise to the 6 autocompletion queries aci, acid, acidu, acidum, acidum pho, and acidum phos. For the Wikipedia collection, autocompletion queries were generated in the same manner from a set of 100 randomly generated queries, with a distribution of the number of query words and of the term frequency similar to that of the real queries for the Homeopathy collection. For the Terabyte collection, autocompletion queries were generated, again in the same way but with a minimal prefix length of 4, from the (stemmed) 50 ad-hoc queries of the Robust Track Benchmark [7], e.g., squirrel control protect. For all three collections, we removed queries containing words that had no completion at all in the respective collection.
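The query generation just described can be sketched as follows. The function name `autocompletion_queries` is hypothetical, and the handling of words shorter than the minimal prefix length (they are emitted once, in full) is an assumption that the text does not spell out.

```python
def autocompletion_queries(maximal_query, min_prefix_len=3):
    """Simulate typing a query from left to right: for each word, emit one
    autocompletion query per prefix of at least min_prefix_len letters,
    preceded by all fully typed earlier words."""
    words = maximal_query.split()
    queries = []
    for i, word in enumerate(words):
        head = " ".join(words[:i])
        # Assumption: a word shorter than min_prefix_len yields just itself.
        start = min(min_prefix_len, len(word))
        for k in range(start, len(word) + 1):
            queries.append(f"{head} {word[:k]}".strip())
    return queries

homeopathy = autocompletion_queries("acidum phos", min_prefix_len=3)
terabyte = autocompletion_queries("squirrel control protect", min_prefix_len=4)
```

For `acidum phos` this yields exactly the six queries listed in the text; for the Terabyte example with a minimal prefix length of 4 it yields queries such as `squi`, `squirrel contr`, and `squirrel control prot`.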

For Homeopathy and Wikipedia, all queries were run as proximity queries (using a full positional index according to Section 4.2), while for Terabyte, they were executed as ordinary document-level queries. For all collections, both completions and hits were ranked as described in Section 4.1 (details of the aggregation function are omitted here). Each autocompletion query was processed according to Definition 1, e.g., for acidum pho, we compute all completions of pho that occur in a document which also contains a word starting with acidum, as well as the set of all such documents. The result for each autocompletion query is remembered in a history, so that we do not need to recompute the set of documents matching the first part of the query. E.g., when processing acidum pho, we can take the set of documents matching acidum from the history; see the explanation following Definition 1.

The autocompletion queries with minimal prefix lengths, like aci and acidum pho for Homeopathy, are the most difficult ones. All other queries can be easily processed by what we call filtering. For example, both the completions and hits for the query acidum phos can be obtained by retrieving the list of matching word-in-document pairs for the previously processed query acidum pho from the history, and by keeping, in a linear scan over that list, only those pairs whose word starts with phos. In practice, this is always faster than processing such queries as full autocompletion queries according to Definition 1. Note that this filtering is identical for INV and HYB. We nevertheless include the filtered queries in our experiments, because in reality we will always get a mix of both kinds of queries. Table 3 will provide figures for just the difficult (unfiltered) queries. We remark that the history is useful also for caching purposes, but in our experiments we used it solely for the purpose of filtering.
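The relation between full processing and filtering can be made concrete with a small sketch. It uses a toy dictionary from words to document-id sets rather than INV or HYB, and the names `process` and `filter_pairs` as well as the sample data are hypothetical.

```python
def process(index, docs, prefix):
    """Full processing in the spirit of Definition 1: all (word, doc) pairs
    where the word starts with prefix and the doc is in the given set."""
    pairs = []
    for word in sorted(index):
        if word.startswith(prefix):
            for doc in sorted(index[word]):
                if doc in docs:
                    pairs.append((word, doc))
    return pairs

def filter_pairs(pairs, longer_prefix):
    """Filtering: one linear scan over a remembered result, keeping only
    the pairs whose word also matches the longer prefix."""
    return [(w, d) for (w, d) in pairs if w.startswith(longer_prefix)]

# Toy data (made up for illustration).
index = {
    "acidum": {1, 3},
    "phobia": {2},
    "phosphate": {3},
    "phosphorus": {1, 2},
    "photo": {1},
}
docs_acidum = index["acidum"]              # hits for the first query word
history = {"acidum pho": process(index, docs_acidum, "pho")}
filtered = filter_pairs(history["acidum pho"], "phos")
recomputed = process(index, docs_acidum, "phos")
```

Here `filtered` and `recomputed` are the same list, which is the reason a query like `acidum phos` never needs to touch the index once the result for `acidum pho` is in the history.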
5.3 Index space

Table 1 shows that INV and HYB use essentially the same space on all three test collections, and that HYB is slightly more compact than INV for a full positional index. This is exactly what Lemmas 1 and 3, and the derivation in Section 4.2 predicted! The sizes for both INV and HYB exceed that predicted by the empirical entropy by about 50%. This is due to our use of the Simple-9 compression scheme, which trades very fast decompression time for about this increase in space usage [3]. A combination of Golomb and arithmetic encoding would give us a space usage closer to the empirical entropy. However, decompression would then become the computational bottleneck for almost all queries, see Table 3. We remark that, by the way we did our analysis, any new compression scheme with an improved compression ratio/decompression speed profile would immediately yield a corresponding improvement for both INV and HYB.

5.4 Query processing time

Table 2 shows that in terms of query processing time, HYB outperforms INV by a large margin on all collections. With respect to maximum processing time, which is especially critical for an interactive application, the improvement is by a factor of [...]. With respect to average processing time, which is critical for throughput in a high-load scenario, the improvement is by a factor of [...].

Table 3 gives interesting insights into where exactly INV loses against HYB. The table shows a breakdown of the running times of those queries for the Terabyte collection which were not answered by filtering as discussed above. (Note that the breakdown of the filtered queries would be identical for both methods.) The table differentiates between 1-word queries like squi, squir, etc. and multi-word queries like squirrel contr or squirrel control prot. For the 1-word queries, no intersections have to be computed for either INV or HYB.
| Collection | Homeopathy | Wikipedia | Terabyte |
|---|---|---|---|
| Raw size | 452 MB | 7.4 GB | 426 GB |
| #documents | 44,015 | 2,866,503 | 25,204,103 |
| #words | 263,817 | 6,700,119 | 25,263,176 |
| #items | 12 [27] million | 0.3 [0.8] billion | 3.5 billion |
| Vocabulary | 2.9 MB | 73 MB | 239 MB |
| Entropy | 6.6 [13.1] bits | 9.1 [14.0] bits | 8.4 bits |
| INV index size | 13 [70] MB | 0.5 [2.2] GB | 4.6 GB |
| - per item | 9.3 [21.5] bits | 12.8 [23.2] bits | 11.0 bits |
| HYB index size | 14 [62] MB | 0.5 [2.0] GB | 4.9 GB |
| - per item | 9.4 [19.2] bits | 13.0 [20.7] bits | 11.6 bits |
| - per doc | 3.9 [15.4] bits | 4.3 [14.8] bits | 5.9 bits |
| - per word | 5.5 [3.8] bits | 8.7 [5.9] bits | 5.7 bits |

Table 1: Properties of our three test collections, and the space consumption of INV versus HYB. The entries in square brackets are for a full positional index, without any word whatsoever removed.

Table 2 (columns: mean, 90%, 99%, max; one INV row and one HYB row for each of Homeopathy, Wikipedia, and Terabyte): Average, 90%-ile, 99%-ile and maximum processing times in seconds for INV versus HYB on our three test collections.

According to Lemma 2, the merging of the intersections then dominates for INV, and this indeed shows in the first column of Table 3. For multi-word queries, the result volume $\sum_{w \in W} |D \cap D_w|$ (Lemmas 2 and 4) goes down, and, according to Lemma 2, the intersection costs dominate for INV, which shows in the third column of Table 3. In contrast, columns two and four demonstrate that HYB achieves a better balance of the costs for reading, uncompressing, and intersecting, and none of these essential operations becomes the bottleneck. HYB avoids merging altogether since, by construction, the potential completions from the given word range $W$ always lie within a single block. The read time of HYB is about 50% larger than that of INV, because HYB always reads a whole block of size $\Theta(n)$, even for small word ranges. This also partially explains why HYB spends more time decompressing than INV; the other factor is that decompression of the word ids is more expensive than decompression of the document ids. As we remarked in Section 4.1, the absolute time for ranking is
the same for both methods. Ranking takes more time on average for the 1-word queries, because these tend to have larger result sets. The comparison with the time needed for the maintenance of the history, which is nothing but memory allocation and copying, shows that all of HYB's operations are essentially fast list scans.

| Operation | 1-word INV | 1-word HYB | multi-word INV | multi-word HYB |
|---|---|---|---|---|
| average time (secs) | | | | |
| read | .024 (7%) | | .032 (1%) | |
| decompress | .011 (3%) | | .023 (1%) | |
| intersect | | | | |
| merging | | | .010 (.4%) | |
| ranking | | | .007 (.3%) | .007 (4%) |
| history | | | .062 (3%) | |

Table 3: Breakdown of average processing times for INV and HYB, for the difficult (unfiltered) queries on Terabyte.

6. CONCLUSIONS

We have introduced an autocompletion feature for full-text search, and presented a new compact indexing data structure for supporting this feature with very fast response times. We have built a full-fledged search engine around this feature, and we have given arguments why we believe it to be practically useful. Given the interactivity of this engine, the next logical step following this work would be to conduct a user study for verifying that belief. We also see potential for a further speed-up of query processing time by applying techniques from top-k query processing [9], in order to display the most relevant hits and completions without first computing and ranking all of them.

7. ACKNOWLEDGEMENTS

Many thanks to our mentor David Grossman for his encouragement and many valuable comments.

8. REFERENCES

[1] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9).
[2] S. Alstrup, G. S. Brodal, and T. Rauhe. New data structures for orthogonal range searching. In 41st Symposium on Foundations of Computer Science (FOCS'00).
[3] V. N. Anh and A. Moffat. Inverted index compression using word-aligned binary codes. Information Retrieval, 8.
[4] L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and optimal range search indexing. In 18th Symposium on Principles of Database Systems (PODS'99).
[5] R. Baeza-Yates. A fast set intersection algorithm for sorted sequences. Lecture Notes in Computer Science, 3109.
[6] S. Bickel, P. Haider, and T. Scheffer. Learning to complete sentences. In 16th European Conference on Machine Learning (ECML'05).
[7] C. L. A. Clarke, N. Craswell, and I. Soboroff. The TREC terabyte retrieval track. SIGIR Forum, 39(1):25.
[8] J. J. Darragh, I. H. Witten, and M. L. James. The reactive keyboard: A predictive typing aid. IEEE Computer, pages 41-49.
[9] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4).
[10] P. Ferragina, N. Koudas, S. Muthukrishnan, and D. Srivastava. Two-dimensional substring indexing. Journal of Computer and System Science, 66(4).
[11] P. Ferragina and G. Manzini. Indexing compressed text. Journal of the ACM, 52(4).
[12] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing search in context: The concept revisited. In 10th International World Wide Web Conference (WWW10).
[13] V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2).
[14] K. Grabski and T. Scheffer. Sentence completion. In 27th Conference on Research and Development in Information Retrieval (SIGIR'04).
[15] M. Jakobsson. Autocompletion in full text transaction entry: a method for humanized input. In Conference on Human Factors in Computing Systems (CHI'86).
[16] D. Metzler, T. Strohman, H. Turtle, and W. B. Croft. Indri at TREC 2004: Terabyte track. In 13th Text Retrieval Conference (TREC'04).
[17] A. Moffat and J. Zobel. Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems, 14(4).
[18] G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan. Scalable browsing for large collections: A case study. In 5th Conference on Digital Libraries (DL'00).
[19] S. E. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In 4th Text Retrieval Conference (TREC'95), pages 73-96.
[20] T. Stocky, A. Faaborg, and H. Lieberman. A commonsense approach to predictive text entry. In Conference on Human Factors in Computing Systems (CHI'04).
[21] E. M. Voorhees. Query expansion using lexical-semantic relations. In 17th Conference on Research and Development in Information Retrieval (SIGIR'94).
[22] H. Williams and J. Zobel. Compressing integers for fast file access. Computer Journal, 42(3).
[23] I. H. Witten, T. C. Bell, and A. Moffat. Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition. Morgan Kaufmann.
[24] J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM Transactions on Database Systems, 23(4), 1998.

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

Domain 1: Designing a SQL Server Instance and a Database Solution

Domain 1: Designing a SQL Server Instance and a Database Solution Maual SQL Server 2008 Desig, Optimize ad Maitai (70-450) 1-800-418-6789 Domai 1: Desigig a SQL Server Istace ad a Database Solutio Desigig for CPU, Memory ad Storage Capacity Requiremets Whe desigig a

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Tradigms of Astundithi and Toyota

Tradigms of Astundithi and Toyota Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009)

Lecture 13. Lecturer: Jonathan Kelner Scribe: Jonathan Pines (2009) 18.409 A Algorithmist s Toolkit October 27, 2009 Lecture 13 Lecturer: Joatha Keler Scribe: Joatha Pies (2009) 1 Outlie Last time, we proved the Bru-Mikowski iequality for boxes. Today we ll go over the

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction

THE ARITHMETIC OF INTEGERS. - multiplication, exponentiation, division, addition, and subtraction THE ARITHMETIC OF INTEGERS - multiplicatio, expoetiatio, divisio, additio, ad subtractio What to do ad what ot to do. THE INTEGERS Recall that a iteger is oe of the whole umbers, which may be either positive,

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem Lecture 4: Cauchy sequeces, Bolzao-Weierstrass, ad the Squeeze theorem The purpose of this lecture is more modest tha the previous oes. It is to state certai coditios uder which we are guarateed that limits

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich faf@cam.ac.uk dot@dotat.at Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature.

*The most important feature of MRP as compared with ordinary inventory control analysis is its time phasing feature. Itegrated Productio ad Ivetory Cotrol System MRP ad MRP II Framework of Maufacturig System Ivetory cotrol, productio schedulig, capacity plaig ad fiacial ad busiess decisios i a productio system are iterrelated.

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring No-life isurace mathematics Nils F. Haavardsso, Uiversity of Oslo ad DNB Skadeforsikrig Mai issues so far Why does isurace work? How is risk premium defied ad why is it importat? How ca claim frequecy

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

CS100: Introduction to Computer Science

CS100: Introduction to Computer Science I-class Exercise: CS100: Itroductio to Computer Sciece What is a flip-flop? What are the properties of flip-flops? Draw a simple flip-flop circuit? Lecture 3: Data Storage -- Mass storage & represetig

More information

Universal coding for classes of sources

Universal coding for classes of sources Coexios module: m46228 Uiversal codig for classes of sources Dever Greee This work is produced by The Coexios Project ad licesed uder the Creative Commos Attributio Licese We have discussed several parametric

More information

CS100: Introduction to Computer Science

CS100: Introduction to Computer Science Review: History of Computers CS100: Itroductio to Computer Sciece Maiframes Miicomputers Lecture 2: Data Storage -- Bits, their storage ad mai memory Persoal Computers & Workstatios Review: The Role of

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

ODBC. Getting Started With Sage Timberline Office ODBC

ODBC. Getting Started With Sage Timberline Office ODBC ODBC Gettig Started With Sage Timberlie Office ODBC NOTICE This documet ad the Sage Timberlie Office software may be used oly i accordace with the accompayig Sage Timberlie Office Ed User Licese Agreemet.

More information

Overview on S-Box Design Principles

Overview on S-Box Design Principles Overview o S-Box Desig Priciples Debdeep Mukhopadhyay Assistat Professor Departmet of Computer Sciece ad Egieerig Idia Istitute of Techology Kharagpur INDIA -721302 What is a S-Box? S-Boxes are Boolea

More information

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms

The Power of Free Branching in a General Model of Backtracking and Dynamic Programming Algorithms The Power of Free Brachig i a Geeral Model of Backtrackig ad Dyamic Programmig Algorithms SASHKA DAVIS IDA/Ceter for Computig Scieces Bowie, MD sashka.davis@gmail.com RUSSELL IMPAGLIAZZO Dept. of Computer

More information

TruStore: The storage. system that grows with you. Machine Tools / Power Tools Laser Technology / Electronics Medical Technology

TruStore: The storage. system that grows with you. Machine Tools / Power Tools Laser Technology / Electronics Medical Technology TruStore: The storage system that grows with you Machie Tools / Power Tools Laser Techology / Electroics Medical Techology Everythig from a sigle source. Cotets Everythig from a sigle source. 2 TruStore

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

Entropy of bi-capacities

Entropy of bi-capacities Etropy of bi-capacities Iva Kojadiovic LINA CNRS FRE 2729 Site école polytechique de l uiv. de Nates Rue Christia Pauc 44306 Nates, Frace iva.kojadiovic@uiv-ates.fr Jea-Luc Marichal Applied Mathematics

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

Basic Elements of Arithmetic Sequences and Series

Basic Elements of Arithmetic Sequences and Series MA40S PRE-CALCULUS UNIT G GEOMETRIC SEQUENCES CLASS NOTES (COMPLETED NO NEED TO COPY NOTES FROM OVERHEAD) Basic Elemets of Arithmetic Sequeces ad Series Objective: To establish basic elemets of arithmetic

More information

Enhancing Oracle Business Intelligence with cubus EV How users of Oracle BI on Essbase cubes can benefit from cubus outperform EV Analytics (cubus EV)

Enhancing Oracle Business Intelligence with cubus EV How users of Oracle BI on Essbase cubes can benefit from cubus outperform EV Analytics (cubus EV) Ehacig Oracle Busiess Itelligece with cubus EV How users of Oracle BI o Essbase cubes ca beefit from cubus outperform EV Aalytics (cubus EV) CONTENT 01 cubus EV as a ehacemet to Oracle BI o Essbase 02

More information

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean

Definition. A variable X that takes on values X 1, X 2, X 3,...X k with respective frequencies f 1, f 2, f 3,...f k has mean 1 Social Studies 201 October 13, 2004 Note: The examples i these otes may be differet tha used i class. However, the examples are similar ad the methods used are idetical to what was preseted i class.

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

Desktop Management. Desktop Management Tools

Desktop Management. Desktop Management Tools Desktop Maagemet 9 Desktop Maagemet Tools Mac OS X icludes three desktop maagemet tools that you might fid helpful to work more efficietly ad productively: u Stacks puts expadable folders i the Dock. Clickig

More information

Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. Notes o expoetial geeratig fuctios ad structures. 1. The cocept of a structure. Cosider the followig coutig problems: (1) to fid for each the umber of partitios of a -elemet set, (2) to fid for each the

More information

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n We will cosider the liear regressio model i matrix form. For simple liear regressio, meaig oe predictor, the model is i = + x i + ε i for i =,,,, This model icludes the assumptio that the ε i s are a sample

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

Lesson 15 ANOVA (analysis of variance)

Lesson 15 ANOVA (analysis of variance) Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi

More information

CHAPTER 3 THE TIME VALUE OF MONEY




