Supporting Efficient Top-k Queries in Type-Ahead Search


Guoliang Li, Jiannan Wang, Chen Li, Jianhua Feng

Department of Computer Science, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China.
Department of Computer Science, UC Irvine, CA, USA

ABSTRACT

Type-ahead search can find answers on the fly as a user types in a keyword query. A main challenge in this search paradigm is the high-efficiency requirement that queries must be answered within milliseconds. In this paper we study how to answer top-k queries in this paradigm, i.e., as a user types in a query letter by letter, we want to efficiently find the k best answers. Instead of inventing completely new algorithms from scratch, we study challenges when adopting existing top-k algorithms in the literature that heavily rely on two basic list-access methods: random access and sorted access. We present two algorithms to support random access efficiently. We develop novel techniques to support efficient sorted access using list pruning and materialization. We extend our techniques to support fuzzy type-ahead search, which allows minor errors between query keywords and answers. We report our experimental results on several real large data sets to show that the proposed techniques can answer top-k queries efficiently in type-ahead search.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms
Algorithms, Experimentation, Performance

Keywords
Type-ahead search, top-k search, fuzzy search

1. INTRODUCTION

To give instant feedback when users formulate search queries, many information systems support autocomplete search, which shows results immediately after a user types in a partial keyword query. As an example, almost all the major search engines nowadays automatically suggest possible keyword queries as a user types in partial keywords.
Most autocomplete systems treat a query with multiple keywords as a single string, and find answers with text that matches the string exactly. To overcome this limitation, a new type-ahead search paradigm has emerged recently [2, 3]. Using this paradigm, a system treats a query as a set of keywords, and does a full-text search on the underlying data to find answers including the keywords. We treat the last keyword in the query as a partial keyword the user is completing. For instance, a query "graph sig" on a publication table can find publication records with the keyword "graph" and a keyword that has "sig" as a prefix, such as "sigir", "sigmod", and "signature". In this way, a user can get instant feedback after typing keywords, and thus can obtain more knowledge about the underlying data to formulate a query more easily.

Ji et al. [3] extended type-ahead search by allowing minor errors between queries and answers. As a user types in query keywords, the system can find relevant records with keywords similar to the query keywords. This feature is especially important when the user has limited knowledge about the exact representation of the entities she is looking for. For instance, if a user types in a partial query "chitos falut", the system can find records approximately matching the two keywords despite the typo in the query, such as a record with keywords "Christos Faloutsos". Clearly these features can further improve user search experiences.

[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'12, August 12-16, 2012, Portland, Oregon, USA. Copyright 2012 ACM .../12/08...$15.00.]

In this paper we study how to answer ranking queries in type-ahead search on large amounts of data.
That is, as a user types in a keyword query letter by letter, we want to find the most relevant (or "top-k") records on the fly. One approach first finds records matching those query keywords, and then computes their ranking scores to find the most relevant ones. This approach is not efficient when there are a large number of candidate answers to compute and store.

Existing type-ahead search approaches assume an index structure with a trie for the keywords in the underlying data, and each leaf node has an inverted list of records with this keyword, with the weight of this keyword in the record [3, 9]. As an example, Table 1 shows a sample collection of publication records. For simplicity, we only list some of the keywords for each record. Figure 1 shows the corresponding index structure. (More details about the index are in Section 3.) Suppose a user types in a query "graph icdm li". For exact search, we find records containing the first two keywords and a word with prefix of "li", e.g., record r5. For fuzzy search, we compute records with keywords similar to query keywords, and rank them to find the best answers. For each complete keyword, we find keywords similar to the query keyword. For instance, both keywords "icdm" and "icdl" are similar to the second query keyword. The last keyword
"li" is treated as a prefix condition, since the user is still typing at the end of this keyword. We find keywords that have a prefix similar to "li", such as "lin", "liu", and "lui". We access the inverted lists of these similar keywords to find records and rank them to find the best answers for the user.

Table 1: Publication records with sample keywords.

Record ID    Record
r0           graph icdm ...
r1           graph group lui ...
r2           gray icdl liu ...
r3           graph icdl lin lui ...
r4           graph group icdm lin liu ...
r5           graph gray gross icdm lin liu ...
r6           gray group icdm lin liu ...
r7           gray gross group icdl lin ...
r8           gross icdl liu ...
r9           icdm liu ...

A key question is: how to access inverted lists on trie leaf nodes efficiently to answer top-k queries? Instead of inventing completely new algorithms from scratch, we study how to adopt a plethora of algorithms in the literature for answering top-k queries by accessing lists (e.g., [2, 2]). These algorithms share the same framework proposed by Fagin [6], in which we have lists of records sorted based on various conditions. An aggregation function takes the scores of a record from these lists and computes the final score of the record. There are two methods to access these lists: (1) Random Access: Given a record ID, we can retrieve the score of the record on each list; (2) Sorted Access: We retrieve the record IDs on each list following the list order.

In this paper we study technical challenges when adopting these algorithms, and focus on new optimization opportunities that arise in our problem. In particular, we study how to support the two types of access operations efficiently by utilizing characteristics specific to our index structures and access methods. We make the following contributions:
1) In Section 3, we present a forward-list-based method for supporting random access on the inverted lists, and develop a heap-based method and list-materialization techniques to support sorted access efficiently.
2) In Section 4 we study fuzzy type-ahead search.
We propose a list-pruning technique to improve the performance of sorted access, and study how to improve the techniques based on forward lists and list materialization for fuzzy search. Due to the challenging nature of the problem, our extensions are technically nontrivial.
3) In Section 5 we present our experimental results on real large data sets to show the efficiency of our techniques.

We have deployed several systems using this paradigm, which have been used regularly and well accepted by users due to their friendly interfaces and high efficiency.

2. FORMULATION AND PRELIMINARIES

Type-Ahead Search: Let R be a collection of records such as the tuples in a relational table. Let D be the set of words in R. Let Q be a query the user has typed in, which is a sequence of keywords w_1, w_2, ..., w_m. We treat the last keyword w_m as a partial keyword the user is completing, and the other keywords as complete keywords the user has completed (see footnote 2). As a user types in a keyword query letter by letter, type-ahead search finds, on the fly, records that contain the first m-1 keywords and a word with the last keyword as a prefix.

Footnote 2: Our method can be easily extended to the case that every keyword is taken as a partial keyword.

[Figure 1: Trie index structure.]

Without loss of generality, each string in the data set and a query is assumed to use lowercase letters. For example, in Table 1, R = {r0, r1, ..., r9}, D = {graph, icdm, group, lui, ...}. Suppose a user types in a query "icdm gra". We treat "icdm" as a complete keyword and "gra" as a partial keyword. Records r0, r4, r5, and r6 are potentially relevant answers. For example, r0 contains complete keyword "icdm" and word "graph" with a prefix of "gra". When the user types in more letters and submits query "icdm graph li", we treat "icdm" and "graph" as complete keywords and "li" as a partial keyword. Records r4 and r5 are potentially relevant answers.

Top-k Answers: We rank each record r in R based on its relevance to the query.
Given a positive integer k, our goal is to compute the best k records in R ranked by their relevance to Q. Notice that our problem setting allows an important record to be in the answer, even if not all query keywords appear in the record (the "OR" semantics). Thus the algorithms in [3] cannot be used directly in our problem.

Ranking: In the literature there are many algorithms for answering top-k queries by accessing lists (e.g., [2, 2]). These algorithms share the same framework proposed by Fagin [6], in which we have lists of records sorted based on various conditions, such as term frequency and inverse document frequency ("tf*idf"). Each record has a score on a list, and we use an aggregation function to combine the scores of the record on different lists to compute its overall relevance to the query. The aggregation function needs to be monotonic, i.e., decreasing the score of a record on a list cannot increase the record's overall score. This approach has the advantage of allowing a general class of ranking functions.

In this paper, we focus on an important class of ranking functions with the following property: the score F(r, Q) of a record r to a query Q is a monotonic combination of scores of the query keywords with respect to the record. Formally, we compute the score F(r, Q) in two steps. In the first step, for each keyword w, we compute a score of the keyword with respect to the record r, denoted by F(r, w). In the second step, we compute the score F(r, Q) by applying a monotonic function on the F(r, w)'s for all the keywords w. The intuition of this property is that the more relevant an individual query keyword is to a record, the more likely this record is a good answer to this query. For example, we compute the score of a record to query "icdm graph li" by aggregating the scores of each of the keywords with respect to the record.

Each complete keyword w has a weight associated with a record r, denoted by W(r, w). This weight could depend
on the keyword, such as the tf*idf value of the keyword in the record. As a specific case, it can also be independent from the keyword. For instance, if a record is a URL with tokenized keywords, its weight could be a rank score of the corresponding Web page. If a record is an author, we can use the number of publications of the author as a weight of this record.

[Figure 2: Type-ahead search for Q = w_1, w_2, ..., w_m. The figure shows the query keywords, the partial keyword, the trie, the virtual list, and the inverted lists.]

For the last partial keyword w_m, there could be multiple complete words. We compute the relevance score of w_m in the record r, i.e., F(r, w_m), based on the following property: F(r, w_m) is the maximal value of the W(r, d) weights for all the keywords d with respect to w_m in r, where d is a keyword in record r and has a prefix of w_m. This property states that we only look at the most relevant keyword in a record to the partial keyword when computing the relevance of the keyword to the record. It means that the ranking function is "greedy": it takes the most relevant keyword in the record as an indicator of how important this record is to the partial keyword. As we can see in Section 3, this property allows us to do effective pruning when accessing the multiple lists of a query keyword. The following is an example function:

$$F(r, Q) = \sum_{i=1}^{m} F(r, w_i), \qquad (1)$$

where

$$F(r, w_i) = \begin{cases} W(r, w_i) & \text{if } i < m, \\ \max_{\text{complete word } d \text{ of } w_m} \{W(r, d)\} & \text{if } i = m. \end{cases} \qquad (2)$$

In Figure 1, consider query "icdm graph li" and record r5. F(r5, "icdm") = W(r5, "icdm") = 8 and F(r5, "graph") = W(r5, "graph") = 9. The partial keyword "li" has two complete words "lin" and "liu". F(r5, "li") = max{W(r5, "lin"), W(r5, "liu")} = 8. F(r5, "icdm graph li") = 8 + 9 + 8 = 25.

3. EXACT TYPE-AHEAD SEARCH

In this section, we study efficient list-access methods to support exact type-ahead search, i.e., no mismatches between query keywords and answers.

Indexing: We construct a trie for the keywords in the data set D. A trie node has a character label. Each keyword in D corresponds to a unique path from the root to a leaf node (see footnote 3) on the trie.
For simplicity, a trie node is mentioned interchangeably with the keyword corresponding to the path from the root to the node. A leaf node has an inverted list of pairs ⟨rid, weight⟩, where rid is the ID of a record containing the leaf-node string, and weight is the weight of the keyword in the record. Figure 1 shows the index structure in our running example. For instance, for the leaf node of keyword "graph", its inverted list has five elements. The first element ⟨r5, 9⟩ indicates that the record r5 has this keyword, and the weight of this keyword in this record is 9, i.e., W(r5, "graph") = 9.

Footnote 3: A common trick to make each leaf node correspond to a complete word and vice versa is to add a special mark to the end of each word. For simplicity we did not use this trick.

[Figure 3: Forward lists. The figure shows the trie annotated with keyword ranges, and the forward index mapping each record to its forward list.]

Searching: We compute the top-k answers to a query Q in two steps. As illustrated in Figure 2, in the first step, for each complete keyword w_i (1 ≤ i ≤ m-1), we get its inverted list. For the last partial keyword, we locate the trie node of w_m and retrieve the inverted lists of the trie node's leaf descendants. For example, in Figure 1, consider a query "icdm li". The partial keyword "li" has two leaf-node keywords: "lin" and "liu". In the second step, we access the inverted lists to compute the k best answers. Many algorithms have been proposed for answering top-k queries by accessing sorted lists [2, 6]. When adopting these algorithms to solve our problem, we need to efficiently support two basic types of access used in these algorithms: random access and sorted access on the lists.

3.1 Efficient Random Access

To support random access, we construct a forward index in which each record has a forward list of IDs of its keywords.
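A minimal sketch of such a forward index is given here, using record r5 of the running example. It assumes that keyword IDs are assigned in alphabetical order, so that all keywords sharing a prefix occupy one contiguous ID range. The names `KEYWORD_ID`, `FORWARD`, `keyword_range`, `weight_of`, and `prefix_score` are hypothetical, and the range is recomputed by scanning the keyword list, whereas a real index would store it on the trie node.

```python
from bisect import bisect_left

# Keywords of the running example in alphabetical order; IDs start at 1
# (assumption consistent with the text, e.g. "icdm" has ID 6).
KEYWORDS = ["graph", "gray", "gross", "group", "icdl", "icdm", "lin", "liu", "lui"]
KEYWORD_ID = {w: i + 1 for i, w in enumerate(KEYWORDS)}

# Forward list of record r5: (keyword_id, weight) pairs sorted by keyword_id.
FORWARD = {"r5": [(1, 9), (2, 8), (3, 4), (6, 8), (7, 3), (8, 8)]}


def keyword_range(prefix):
    """Return (lo, hi) IDs of the keywords having this prefix, or None."""
    ids = [KEYWORD_ID[w] for w in KEYWORDS if w.startswith(prefix)]
    return (ids[0], ids[-1]) if ids else None


def weight_of(record, keyword):
    """Random access for a complete keyword via binary search."""
    fl = FORWARD[record]
    kid = KEYWORD_ID[keyword]
    pos = bisect_left(fl, (kid,))  # (kid,) sorts before any (kid, weight)
    if pos < len(fl) and fl[pos][0] == kid:
        return fl[pos][1]
    return 0  # the record does not contain the keyword


def prefix_score(record, prefix):
    """Random access for a partial keyword: best weight inside its ID range."""
    rng = keyword_range(prefix)
    if rng is None:
        return 0
    lo, hi = rng
    fl = FORWARD[record]
    pos = bisect_left(fl, (lo,))
    best = 0
    while pos < len(fl) and fl[pos][0] <= hi:
        best = max(best, fl[pos][1])
        pos += 1
    return best


# Score of r5 for the query "graph icdm l", summing the keyword scores.
score = weight_of("r5", "graph") + weight_of("r5", "icdm") + prefix_score("r5", "l")
```

The binary search locates the first forward-list entry at or after the lower end of the range, so only the entries that actually fall inside the range are visited.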
We assume each keyword has a unique ID with respect to its leaf node on the trie, and the IDs of the keywords follow their alphabetical order. Figure 3 shows the forward lists. The element ⟨1, 9⟩ on the forward list of record r5 shows that this record has a keyword with ID 1 and weight 9, which is keyword "graph" as shown on the trie. Given a record and a complete keyword, we can get the corresponding weight by doing a binary search on the forward list. For example, to get the weight of keyword "icdm" with ID 6 in r5, we can do a binary search on r5's forward list and get the corresponding weight 8.

For the partial keyword, as it has multiple complete words, we need to first locate its trie node and then enumerate its leaf descendants to get the corresponding weights. This method could be expensive if the trie node has many leaf descendants. To improve the performance, we can use an alternative method. For each trie node n, we can maintain a keyword range [l_n, u_n], where l_n and u_n are the minimal and maximal keyword IDs of its leaf nodes, respectively [3]. An interesting observation is that a complete word with n as a prefix must have an ID in this keyword range, and each complete word in the data set with an ID in this range must have a prefix of n. In Figure 3, the keyword range of node "g" is [1, 4], since 1 is the smallest ID of its leaf nodes and 4 is the largest one.

Based on this observation, this method verifies whether record r contains a keyword with a prefix of w_m as follows. We first locate the trie node w_m and then check if there is a keyword ID on the forward list of r in the keyword range [l_{w_m}, u_{w_m}]. Since we can keep the forward list of r sorted, this checking can be done efficiently. For instance, consider query "graph icdm l". For the first element on the inverted list of "graph", ⟨r5, 9⟩, we can check whether
r5 contains the other two keywords as follows. For complete keyword "icdm" with ID 6, we do a binary search on r5's forward list and get weight 8. For partial keyword "l" with the keyword range [7, 9], using a binary search on r5's forward list (⟨1, 9⟩; ⟨2, 8⟩; ⟨3, 4⟩; ⟨6, 8⟩; ⟨7, 3⟩; ⟨8, 8⟩), we find keyword IDs 7 and 8 in this range. Thus we know that the record indeed contains keywords with prefix "l", and compute the corresponding score F(r5, "l") = max{F(r5, "lin"), F(r5, "liu"), F(r5, "lui")} = 8. Thus F(r5, "graph icdm l") = 25.

3.2 Efficient Sorted Access

To support sorted access, we can keep the elements on the inverted lists sorted based on their weights in descending order. Thus, for a complete keyword, we can get an ordered list. The partial keyword w_m has multiple leaf descendants and corresponding inverted lists. We use U(w_m) to denote the union of those inverted lists, called the union list of w_m. We need to support sorted access on U(w_m) to retrieve the next most relevant record ID for w_m. Fully computing U(w_m) using the keyword lists could be expensive in terms of time and space. In this section, we propose two techniques to support sorted access efficiently.

Heap-Based Method

We can support sorted access on U(w_m) by building a max heap on the inverted lists of its leaf nodes. In particular, we maintain a cursor on each inverted list. The max heap initially consists of the record IDs pointed to by the cursors, sorted on the weights of the keywords in these records. Notice that each inverted list is already sorted based on the weights of its keyword in the records. To retrieve the next best record, we pop the top element from the heap, increment the cursor of the list of the popped element by 1, and push the new element of this list to the heap. When popping all elements from the heap, we can get a sorted list for the partial keyword. For example, consider the partial keyword "l". It has three complete keywords "lin", "liu", and "lui". We can compute its union list as shown in Figure 4. Note that since our method does not need to compute the entire list of U(w_m), U(w_m) is a "virtual" sorted list of partial keyword w_m.

[Figure 4: A heap-based method to compute the virtual sorted list of partial keyword "l". The virtual sorted list shown is ⟨r3, 9⟩, ⟨r5, 8⟩, ⟨r7, 8⟩, ⟨r4, 7⟩, ⟨r1, 6⟩, ⟨r6, 5⟩, ⟨r9, 4⟩, ⟨r2, 3⟩, ⟨r8, 1⟩, merged from the inverted lists of "lin", "liu", and "lui".]

On top of the inverted lists of complete keywords and the max heap of the partial keyword, we can adopt an existing top-k algorithm to find the k best records. As an example, suppose we want to compute the top-1 best answer for query "graph icdm l" using sorted access only. We get the first elements of "graph" and "icdm", ⟨r5, 9⟩ and ⟨r4, 9⟩, pop the top element of the max heap in Figure 4, ⟨r3, 9⟩, and compute an upper bound on the overall score of an answer, i.e., 27. We get the next elements of "graph" and "icdm", ⟨r4, 7⟩ and ⟨r5, 8⟩. We increment the cursor of the list that produces the top element, push it into the heap, and retrieve the next top element: ⟨r5, 8⟩. Based on the accessed elements, we have: 1) the score of record r5 is 9 + 8 + 8 = 25; 2) the maximal score of record r3 is 7 + 8 + 9 = 24, and that of r4 is 7 + 9 + 8 = 24, while those of other records are at most 7 + 8 + 8 = 23. Thus, record r5 is the best answer.

List Materialization

[Figure 5: Benefits of materializing the union list U(v) for node v with respect to partial keyword w_m. Legend: T(v): subtrie of v; M(v): materialized descendants of v; N(v): other leaf nodes (of v) without materialized ancestors; max heap of w_m.]

We can further improve the performance of sorted access for the partial keyword w_m by precomputing and storing the unions of some of the inverted lists on the trie. Let v be a trie node, and U(v) be the union of the inverted lists of v's leaf nodes, sorted by their record weights. If a record appears more than once on these lists, we choose its maximal weight as its weight on list U(v). For example, U("li") = {⟨r3, 9⟩, ⟨r5, 8⟩, ⟨r7, 8⟩, ⟨r4, 7⟩, ⟨r6, 5⟩, ⟨r9, 4⟩, ⟨r2, 3⟩, ⟨r8, 1⟩}.
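The heap-based merge and the materialized union list can be sketched as follows. This is only an illustration under stated assumptions: the per-keyword inverted lists in `INVERTED` are illustrative values chosen to be consistent with the union list U("li") quoted in the text, and the function names are hypothetical. Python's `heapq` is a min-heap, so weights are negated to emulate a max heap.

```python
import heapq

# Inverted lists as (record, weight) pairs in descending weight order.
# Illustrative values (assumption): consistent with U("li") in the text.
INVERTED = {
    "lin": [("r3", 9), ("r7", 8), ("r6", 4), ("r5", 3), ("r4", 2)],
    "liu": [("r5", 8), ("r4", 7), ("r6", 5), ("r9", 4), ("r2", 3), ("r8", 1)],
    "lui": [("r1", 6), ("r3", 4)],
}


def virtual_sorted_list(keywords):
    """Lazily yield (record, weight) by descending weight, one cursor per list."""
    heap = []
    for kw in keywords:
        if INVERTED[kw]:
            rec, w = INVERTED[kw][0]
            heapq.heappush(heap, (-w, rec, kw, 0))
    seen = set()
    while heap:
        neg_w, rec, kw, cursor = heapq.heappop(heap)
        nxt = cursor + 1  # advance the cursor of the list that was popped
        if nxt < len(INVERTED[kw]):
            nrec, nw = INVERTED[kw][nxt]
            heapq.heappush(heap, (-nw, nrec, kw, nxt))
        if rec not in seen:  # the first pop of a record has its maximal weight
            seen.add(rec)
            yield rec, -neg_w


def materialize(keywords):
    """Fully compute the union list, as list materialization would store it."""
    return list(virtual_sorted_list(keywords))


# Sorted access touches only as much of the lists as the caller consumes.
first = next(virtual_sorted_list(["lin", "liu", "lui"]))
union_li = materialize(["lin", "liu"])
```

Because the generator is lazy, a top-k algorithm that stops after a few sorted accesses never pays for merging the tails of the lists, which is the point of keeping the list virtual.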
When using a max heap to retrieve records sorted by their scores for the partial keyword, this materialized list could help us build a max heap with fewer lists and reduce the cost of push/pop operations on the heap. Therefore, this method allows us to utilize additional memory space to answer top-k queries more efficiently. For instance, consider the index in Figure 1 and a query "icdm g". For the partial keyword "g", we access its data keywords "graph", "gray", "gross", and "group", and build a max heap on their inverted lists based on record scores with respect to this query keyword. If we materialize the union lists of "gra" and "gro", we can use their materialized lists, saving the time to traverse the four leaf nodes and some push/pop operations on the heap.

We next give a detailed cost-based analysis to quantify the benefit of materializing a node on the performance of operations on the max heap of w_m, for exact type-ahead search. Let B be a budget of storage space we are given to materialize union lists. Given a trie node v, let U(v) be the union of inverted lists of leaf nodes in the subtrie of v. Our goal is to select trie nodes to materialize their union lists for maximizing the performance of queries. The following are naive algorithms for choosing trie nodes:

Random: We randomly select trie nodes.
TopDown: We select nodes top down from the trie root.
BottomUp: We select nodes bottom up from leaf nodes.

Each naive approach keeps choosing trie nodes to materialize their union lists until the sum of their list sizes reaches the space limit B. One main limitation of these approaches is that they do not quantitatively consider the benefits of
materializing a union list. To overcome this limitation, we propose a cost-based method called CostBased to do list materialization. Its main idea is the following. For simplicity we say a node has been materialized if its union list has been materialized. For a query Q with a prefix keyword w_m, suppose some of the trie nodes have their union lists materialized. Let v be such a materialized node. If we can use U(v) to construct the heap of w_m, we need not visit v's descendants and access the inverted lists of v's leaf descendants, and thus achieve the benefit of reducing the time of traversing the subtrie rooted at v and of push/pop operations on the max heap of w_m. We say the materialized node v is usable for partial keyword w_m.

Next we discuss how to check whether a node v is usable for partial keyword w_m. If v is not a descendant of w_m, materializing v is unusable for w_m; otherwise, if no node on the path from v to w_m (including w_m) has been materialized, materializing v is usable for w_m. Notice that if v has a materialized ancestor v′ on the path from v to w_m, then we can use the materialized list U(v′) instead of U(v), and the list U(v) will no longer be usable for w_m. To summarize, a materialized node v is usable for partial keyword w_m if:
1. v is a descendant of w_m; and
2. v has no materialized ancestor between v and w_m.

For example, consider a query "icdm g". Materializing node "l" is unusable for partial keyword "g" as "l" is not a descendant of "g". Materializing "gr" is usable for "g" if "g" is not materialized. If "g" is materialized, then materializing "gra" is unusable for "g" as we will use the materialized list of "g" to build the max heap of "g", instead of using "gra".

If v is usable for w_m, materializing U(v) has the following benefits for the heap of w_m. (1) We do not need to traverse the trie to access these leaf nodes and use them to construct the max heap; (2) Each push/pop operation on the heap is more efficient since it has fewer lists. Here we present an analysis of the benefits of materializing the usable node v.
In general, for a trie node v, let T(v) denote its subtrie and |T(v)| denote the number of nodes in T(v). The total time of traversing this subtrie is O(|T(v)|).

Now we analyze the benefit of materializing node v. As illustrated in Figure 5, suppose v has materialized descendants. Let M(v) be the set of highest materialized descendants of v. These materialized nodes can help reduce the time of accessing the inverted lists of v's leaf nodes in two ways. First, we do not need to traverse the descendants of a materialized node d ∈ M(v). We can just traverse $|T(v)| - \sum_{d \in M(v)} |T(d)|$ trie nodes. Second, when inserting lists to the max heap of w_m, we insert the union list of v into the heap and need not insert the union list of each d ∈ M(v) and the inverted lists of d ∈ N(v) into the heap, where N(v) denotes the set of v's leaf descendants having no ancestors in M(v). Let S(v) = M(v) ∪ N(v). We quantify the benefits of materializing node v:

1. Reducing traversal time: Since we do not traverse v's descendants, the time reduction is
$$B_1 = O\Big(|T(v)| - \sum_{d \in M(v)} |T(d)|\Big).$$
2. Reducing heap-construction time: When constructing the max heap for keyword w_m, we insert the union list U(v) into the heap, instead of the inverted lists of those nodes in S(v). The time reduction is
$$B_2 = |S(v)|.$$
3. Reducing sorted-access time: If we insert the union list U(v) to the max heap of w_m, the number of leaf nodes in the heap is |S(w_m)|. Otherwise, it is |S(w_m)| + |S(v)|. The time reduction of a sorted access is
$$B_3 = O\big(\log(|S(w_m)| + |S(v)|)\big) - O\big(\log(|S(w_m)|)\big).$$

The following is the overall benefit of materializing v for the partial keyword w_m:

$$B_v = B_1 + B_2 + A_v \times B_3, \qquad (3)$$

where A_v is the number of sorted accesses on U(v). A_v can be computed using the number of records in the union list U(v) and the number of keywords in the query. The analysis above is on a query workload. If there is no query workload, we can use the trie structure to count the probability of each node to be queried and use such information to compute the benefit of materializing a node.
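As a rough illustration of Equation 3, the sketch below evaluates the benefit of materializing one node from the counts defined above. It is not the paper's implementation: the hidden constants of the O(·) terms are all assumed to be 1, the logarithm is assumed to be base 2, and the function and parameter names are hypothetical.

```python
import math


def materialization_benefit(t_v, t_materialized, s_v, s_wm, a_v):
    """Estimate B_v = B1 + B2 + A_v * B3 with all O(.) constants set to 1.

    t_v            -- |T(v)|, number of trie nodes in the subtrie of v
    t_materialized -- list with |T(d)| for every d in M(v)
    s_v            -- |S(v)| = |M(v)| + |N(v)|
    s_wm           -- |S(w_m)|, lists in the heap of w_m when U(v) is used (>= 1)
    a_v            -- A_v, number of sorted accesses on U(v)
    """
    b1 = t_v - sum(t_materialized)                # trie nodes not traversed
    b2 = s_v                                      # lists not inserted into the heap
    b3 = math.log2(s_wm + s_v) - math.log2(s_wm)  # saving per sorted access
    return b1 + b2 + a_v * b3


# Hypothetical node: 10 subtrie nodes, one materialized descendant covering 3
# of them, |S(v)| = 4, |S(w_m)| = 4, and 8 expected sorted accesses.
benefit = materialization_benefit(10, [3], 4, 4, 8)
```

Under a storage budget B, one natural use of such an estimate is to rank candidate nodes by benefit per unit of list size, although the selection policy itself is outside what this sketch covers.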
In this paper, we employ a no-query-workload setting.

4. FUZZY TYPE-AHEAD SEARCH

In this section, we first define the problem of top-k queries in fuzzy type-ahead search [3]. We then develop new techniques to support efficient list access to answer such queries by extending techniques developed in exact search.

4.1 Ranking

As a user types in a query letter by letter, fuzzy type-ahead search finds, on the fly, records with words similar to the query keywords. For example, consider the data in Table 1. Suppose a user types in a query "graph grose". We return r5 as a relevant answer since it has a keyword "gross" similar to query keyword "grose".

We use edit distance to measure the similarity between strings. Formally, the edit distance between two strings s_1 and s_2, denoted by ed(s_1, s_2), is the minimum number of single-character edit operations (i.e., insertion, deletion, and substitution) needed to transform s_1 to s_2. For example, ed(gross, grose) = 1.

Similarity Function: Let π be a function that computes the similarity between a data string s and a query keyword w in Q = w_1, w_2, ..., w_m. An example is:

$$\pi(s, w) = 1 - \frac{ed(s, w)}{|w|},$$

where |w| is the length of the query keyword w. We normalize the edit distance based on the query-keyword length in order to allow more errors for longer query keywords. Our results in the paper focus on this function, and they can be generalized to other functions using edit distance.

Let d be a keyword in the data set D. For each complete keyword w_i (i = 1, 2, ..., m-1) in the query, we define the similarity of d to w_i as:

$$Sim(d, w_i) = \pi(d, w_i).$$

Since the last keyword w_m is treated as a prefix condition, we define the similarity of d to w_m as the maximal similarity of d's prefixes using function π, i.e.:

$$Sim(d, w_m) = \max_{\text{prefix } p \text{ of } d} \{\pi(p, w_m)\}.$$

Let τ be a similarity threshold. We say a keyword d in D is similar to a query keyword w if Sim(d, w) ≥ τ. We say a prefix p of a keyword in D is similar to the query keyword w_m if π(p, w_m) ≥ τ. We want to find the keywords in the data set that are similar to query keywords, since records with such a keyword could be of interest to the user.
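The similarity definitions above can be made concrete with a short sketch. The edit distance is the standard dynamic-programming (Levenshtein) formulation; the function names are hypothetical, and `sim_partial` only considers non-empty prefixes, which is an assumption (the empty prefix always has similarity 0 under π, so it does not change the maximum for keywords with any matching prefix).

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions from s to t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]


def pi(s, w):
    """Similarity of data string s to query keyword w, normalized by |w|."""
    return 1 - edit_distance(s, w) / len(w)


def sim_complete(d, w):
    """Similarity of data keyword d to a complete query keyword w."""
    return pi(d, w)


def sim_partial(d, w):
    """Similarity of data keyword d to the partial keyword w: best prefix of d."""
    return max(pi(d[:k], w) for k in range(1, len(d) + 1))


def similar_keywords(data_keywords, w, tau, partial):
    """Keywords whose similarity to w reaches the threshold tau."""
    sim = sim_partial if partial else sim_complete
    return {d for d in data_keywords if sim(d, w) >= tau}
```

For instance, `edit_distance("gross", "grose")` evaluates to 1, matching the example in the text. A production system would compute similar prefixes incrementally on the trie rather than comparing the query keyword against every data keyword as this sketch does.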
[Figure 6: Keywords similar to those in query Q = w_1, w_2, ..., w_m. Each query keyword w_i has similar keywords on leaf nodes. The last prefix keyword w_m has similar prefixes.]

Let Φ(w_i) (i = 1, ..., m) denote the set of keywords in D similar to w_i, and P(w_m) denote the set of prefixes (of keywords in D) similar to w_m. We compute the top-k answers to the query Q in two steps. In the first step, for each keyword w_i in the query, we first compute an edit-distance upper bound based on the similarity function, i.e., (1 - τ) × |w_i|, and then compute the similar keywords Φ(w_i) and similar prefixes P(w_m) on the trie (shown in Figure 6). Ji et al. [3] developed an efficient algorithm for incrementally computing these similar strings as the user modifies the current query. A similar algorithm is developed in [5]. In the second step, we access the inverted lists of these similar data keywords to compute the k best answers.

For example, assume a user types in a query "grose li" letter by letter on the data shown in Table 1. Suppose the similarity threshold τ is 0.45. The set of prefixes similar to the partial keyword "li" is P("li") = {l, li, lin, liu, lu, lui, i}, and the set of data keywords similar to the partial keyword "li" is Φ("li") = {lin, liu, lui, icdl, icdm}. In particular, "lui" is similar to "li" since Sim(lui, li) = 1 - ed(lui, li)/|li| = 0.5 ≥ τ. The set of similar words for the complete keyword "grose" is Φ("grose") = {gross}. Then we compute top-k answers using the inverted lists of those words in Φ("grose") and Φ("li").

Ranking: We still assume the ranking function has the first property described in Section 2, which computes the score F(r, Q) by applying a monotonic function on the F(r, w_i)'s for all the keywords w_i in the query. Given a complete keyword w_i and a record r, for exact search, we can use the weight of w_i in r, i.e., W(r, w_i), to denote their relevancy F(r, w_i).
But for fuzzy search, the keyword w_i can be similar to multiple keywords in the record r, and different similar words have different similarities to w_i and different weights in r. A question is how to compute the relevance value of keyword w_i in record r, F(r, w_i). Let d be a keyword in record r such that d is similar to the query keyword w_i, i.e., d ∈ Φ(w_i). We use F(r, w_i, d) to denote the relevance of this query keyword w_i in the record with respect to keyword d. The value should depend on both the weight of d in r, i.e., W(r, d), as well as the similarity between w_i and d, i.e., Sim(d, w_i). Intuitively, the more similar they are, the more relevant w_i is to r in terms of d. For instance, F(r, w_i, d) = Sim(d, w_i) × W(r, d) is an example ranking function to evaluate the relevancy of w_i in the record r with respect to keyword d. We use the following function with the second property in Section 2 to compute F(r, w_i):

$$F(r, w_i) = \max_{\text{keyword } d \text{ (in } r\text{) similar to } w_i} \{F(r, w_i, d)\}. \qquad (4)$$

4.2 Efficient Random Access

We first study how to support efficient random access for fuzzy type-ahead search. For simplicity, in the discussion we focus on how to verify whether the record r has a keyword with a prefix similar to the partial keyword w_m. With minor modifications the discussion extends to the case where we want to verify whether r has a keyword similar to a complete keyword w_i (1 ≤ i ≤ m-1).

In each random access, given an ID of a record r, we want to retrieve information related to a query keyword w_i, which allows us to retrieve W(r, d) for each of w_i's similar words d so as to compute the score F(r, w_i). In particular, for a keyword w_i in the query, does the record have a keyword similar to w_i? One naive way to get the information is to retrieve the original record and go through its keywords. This approach has two limitations. First, if the data is too large to fit into memory and has to reside on hard disks, accessing the original data from the disks may slow down the process significantly. This costly operation will prevent us from achieving an interactive search speed.
The second limitation is that it may require a lot of computation of string similarities based on edit distance, which could be time consuming. In this section, we present two efficient approaches for solving this problem.

Method 1: Probing on Forward Lists: This method verifies whether record r contains a keyword with a prefix similar to w_m as follows. For each prefix p on the trie similar to w_m (computed in the first step of the algorithm as discussed above), we check if there is a keyword ID on the forward list of r in the keyword range [l_p, u_p] of the trie node of p, as discussed in Section 3.

Method 2: Probing on Trie Leaf Nodes: Using this method, for each prefix p similar to w_m, we traverse the subtrie of p and identify its leaf nodes. For each leaf node d, we store the fact that for the query Q, this keyword d has a prefix similar to w_m in the query. Specifically, we store ⟨query ID, partial keyword w_m, Sim(p, w_m)⟩. We store the query ID in order to differentiate it from other queries in case multiple queries are answered concurrently. We store the similarity between w_m and p to compute the score of this keyword in a candidate record. In case the leaf node has several prefixes similar to w_m, we only keep their maximal similarity to w_m. For each complete keyword w_i, we also store the same information for those trie nodes similar to w_i. Therefore, a leaf node might have multiple entries corresponding to different keywords in the same query. We call these entries for the leaf node its collection of relevant query keywords. Notice that this structure needs very little storage space, since the entries of old queries can be quickly reused by new queries, and the number of keywords in a query tends to be small. We use this additional information to efficiently check if a record r contains a complete word with a prefix similar to the partial keyword w_m. We scan the forward list of r. For each of its keyword IDs, we locate the corresponding leaf node, and test whether its collection of relevant query keywords includes this query and
7 p [,4] a [,4] y g [,2] [3,4] [,] s o [3,3] [4,4] u Fowad index [5,6] [7,9] Recod Fowad list l [5,6] [7,8] [9,9],2 ;6,3 i u [5,6],3 ;4,9 ;9,6 2,9 ;5,2 ;8,3 i c d n u,4 ;5,2 ;7,9;9,4 l m ,7 ;4,3;6,9;7,2;8,7 h s p q,lin,.66,9 ;2,8;3,4;6,8;7,3;8,8 q,gose,.8 q,lin, q 2, liu,... q 2,goss, q 2, liu,.66 Figue 7: Pobing on tie leaf nodes. i the keywod w m. If so, we use the stoed sting similaity to compute the scoe of this keywod in the quey. Figue 7 shows how we use this method in ou unning example, whee the use types in a keywod quey q = lin, gose. When computing the simila wods of gose, i.e., goss, we inset the quey ID (shown as q ), the patial keywod gose, and the coesponding pefix similaity to its collection of elevant quey keywods. To veify whethe ecod 5 has a wod with a pefix simila to gose, we scan its fowad list. Its thid keywod is goss. We access its coesponding leaf node, and see that the node s collection of elevant quey keywods includes gose. Thus we know that 5 indeed contains a keywod simila to gose, and can etieve the coesponding pefix similaity. Compaison: The time complexity of the fowadlist based method (Method ) is O ( G log( ) ), whee G is the total numbe of simila pefixes of w m and simila complete wods of w i s fo i m, and is the numbe of distinct keywods in ecod. Since the simila pefixes of w m could have ancestodescendant elationships, we can optimize the step of accessing them by consideing the highest ones. The time complexity of the second method is O( T (p) + Q ). smila pefix p of w m The fist tem coesponds to the time of tavesing the subties of simila pefixes, whee T (p) is the subtie ooted at a simila pefix p. The second tem coesponds to the time of pobing the leaf nodes, whee Q is the numbe of quey keywods. Notice that to identify the answes, we need access the inveted lists of complete wods, thus the fist tem can be emoved fom the complexity. 
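The two probing strategies compared above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of a forward list as sorted (keyword ID, weight) pairs, the dictionary used for the leaf-node collections, and all sample values are assumptions made for illustration. Unlike the pure existence check analyzed above, the Method 1 function also scans every keyword inside a matching range so that it can return the score of Equation 4.

```python
from bisect import bisect_left


def probe_forward_list(forward_list, similar_prefixes):
    """Method 1 sketch: binary-search the record's forward list for each
    similar prefix's keyword range [lo, hi].

    forward_list: (keyword_id, weight) pairs sorted by keyword_id (assumed layout).
    similar_prefixes: (lo, hi, sim) triples, one per similar prefix p.
    Returns max over matching keywords of sim * weight, or 0.0 if none match.
    """
    ids = [kid for kid, _ in forward_list]
    best = 0.0
    for lo, hi, sim in similar_prefixes:
        i = bisect_left(ids, lo)          # first keyword ID >= lo
        while i < len(ids) and ids[i] <= hi:
            best = max(best, sim * forward_list[i][1])
            i += 1
    return best


def probe_leaf_nodes(forward_list, leaf_entries, query_id, keyword):
    """Method 2 sketch: scan the forward list and look up each keyword's
    leaf-node collection of relevant query keywords.

    leaf_entries: keyword_id -> {(query_id, query_keyword): similarity}
    (a hypothetical stand-in for the entries stored on trie leaf nodes).
    """
    best = 0.0
    for kid, weight in forward_list:
        sim = leaf_entries.get(kid, {}).get((query_id, keyword))
        if sim is not None:
            best = max(best, sim * weight)
    return best


# Hypothetical record with three keywords, and a partial keyword "li"
# whose similar prefixes cover keyword IDs 5..6 (exact) and 9 (fuzzy).
forward_list = [(2, 9), (5, 2), (8, 3)]
similar_prefixes = [(5, 6, 1.0), (9, 9, 0.66)]
leaf_entries = {
    5: {("q1", "li"): 1.0},
    6: {("q1", "li"): 1.0},
    9: {("q1", "li"): 0.66},
}
score_1 = probe_forward_list(forward_list, similar_prefixes)
score_2 = probe_leaf_nodes(forward_list, leaf_entries, "q1", "li")
```

In this toy setting both functions find keyword 5 and return the same score; they differ only in whether the work grows with the number of similar prefixes or with the length of the forward list.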
Method 1 is preferred for data sets where records have a lot of keywords such as long documents, while Method 2 is preferred for data sets where records have a small number of keywords such as relational tables with relatively short attribute values.

4.3 Efficient Sorted Access

Heap-Based Method: For a query keyword $w$, we want to support sorted access that can access record IDs based on the relevance of $w$ to these records. As $w$ has multiple similar words, we can support sorted access efficiently by building a max heap on the inverted lists of such similar words, as described in Section 3. Notice that, in exact search, each leaf node has the same similarity to $w$; but for fuzzy search, different leaf nodes could have different similarities. Thus, when pushing a record $r$ from an inverted list of a similar word $d$ to the heap, we maintain $\langle r, F(r, d)\rangle$ in the heap. We push/pop the record on the heap with the maximal $F(r, d)$. Consider the query icdm li. Figure 8 shows the two heaps for the two keywords. For illustration purposes, for each keyword we also show the virtual merged list of records with their scores, and this list is only partially computed during the traversal of the underlying lists. Each record on a heap has an associated score of this keyword with respect to the query keyword, computed using Equation 4.

[Figure 8: Max heaps for the query keywords icdm and li. Each shaded list is merged from the underlying lists. It is virtual since we do not need to compute the entire list.]

List Pruning: As there may be a large number of similar words for a query keyword, especially for the partial keyword, it could be expensive to construct a heap on the fly.
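As a point of reference for the heap-based method, the following Python sketch merges the inverted lists of several similar words through a max heap and yields records in non-increasing score order. It is only a sketch under stated assumptions: each inverted list is assumed to be sorted by weight in descending order, the score is taken to be similarity times weight as in the example ranking function, and the function name and sample lists are hypothetical.

```python
import heapq


def sorted_access(lists):
    """Yield (record_id, score) in non-increasing score order.

    lists: one (sim, postings) pair per similar word d, where postings is a
    list of (record_id, weight) sorted by weight descending (assumed layout).
    A record reachable through several similar words is reported once, with
    its maximal score, which matches the max in Equation 4.
    """
    heap = []  # entries are (-score, list_index, position); heapq is a min heap
    for idx, (sim, postings) in enumerate(lists):
        if postings:
            heapq.heappush(heap, (-sim * postings[0][1], idx, 0))
    seen = set()
    while heap:
        neg_score, idx, pos = heapq.heappop(heap)
        sim, postings = lists[idx]
        # Replace the popped entry with the next record from the same list.
        if pos + 1 < len(postings):
            heapq.heappush(heap, (-sim * postings[pos + 1][1], idx, pos + 1))
        record_id = postings[pos][0]
        if record_id not in seen:
            seen.add(record_id)
            yield record_id, -neg_score


# Hypothetical lists for two similar words with similarities 1.0 and 0.5.
example_lists = [
    (1.0, [(4, 9), (5, 8), (1, 3)]),
    (0.5, [(7, 8), (4, 6)]),
]
merged = list(sorted_access(example_lists))
```

Because the generator is lazy, a top-k algorithm that stops early only pays for the part of the virtual merged list it actually consumes.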
We further improve the performance of sorted access on the virtual sorted list $U(w)$ by using the idea of on-demand heap construction, i.e., we want to avoid constructing a heap for all the inverted lists of keywords similar to a query keyword. Suppose $w$ has $t$ similar words. Each push/pop operation on the heap of these lists takes $O(\log(t))$ time. If we can reduce the number of lists on the heap, we can reduce the cost of its push/pop operations. We have two observations about this pruning method. (1) As a special case, if those keywords matching query keywords exactly have the highest relevance scores, this method allows us to consider these records prior to considering other records with mismatching keywords. (2) The pruning can be more powerful if $w$ is the last partial keyword $w_m$, since many of its similar keywords share the same prefix $p$ on the trie.

Consider the query icdm li. Figure 8 illustrates how we can prune low-score lists and do on-demand heap constructions. The prefix li has several similar keywords. Among them, the two words lin and liu have the highest similarity value to the query keyword, mainly because they have a prefix matching the keyword exactly. We build a heap using these two lists. To compute the top-1 best answer, the lists of lui, icdm, and icdl are never included in the heap since their upper bounds are always smaller than the scores of popped records before the traversal terminates.

We next introduce how to do list pruning for the max-heap based methods in fuzzy type-ahead search. Given a keyword $w$, let $d_1, \ldots, d_t$ be its similar words and $L_1, \ldots, L_t$ be the corresponding inverted lists, respectively. We need not use all the inverted lists to build the max heap of $w$. Instead, we use those with higher similarities to $w$ to build the max heap on demand. We first sort these inverted lists based on the similarities of their keywords to $w$; without loss of generality, suppose $Sim(d_1, w) > \ldots > Sim(d_t, w)$. We first construct the max heap using the lists with the highest similarity values and then include other lists on demand.
Suppose $L_i$ is a list not included in the heap so far. We can derive an upper bound $u_i$ on the score of a record from $L_i$ (with respect to the query keyword $w$) using the largest weight on the list and the string similarity $Sim(d_i, w)$. Let $r$ be the top record on the heap, with a score $F(r, w)$. If $F(r, w) \ge u_i$, then this list does not need to be included in the heap, since it cannot have a record with a higher score. Otherwise, this list needs to be included in the heap. Based on this analysis, each time we pop a record from the heap and push a new record $r$, we compare the score of the new record with the upper bounds of those lists not included in the heap so far. Those lists with an upper bound greater than this score need to be included in the heap from now on. Notice that this checking can be done very efficiently by storing the maximal value of these upper bounds, and ordering these lists based on their upper bounds.

The pruning power can be even more significant if the keyword $w$ is the partial keyword $w_m$, since many of its similar keywords share the same prefix $p$ on the trie similar to $w_m$. We can compute an upper bound of the record scores from these lists and store the bound on the trie node $p$. In this way, we can prune the lists more effectively by comparing the value $F(r, w)$ with this upper bound stored on the trie, without needing to compute the bound on the fly.

List Materialization: For fuzzy search, the partial keyword $w_m$ has multiple similar prefixes and each similar prefix has multiple similar words. The max heap of $w_m$ is built on top of inverted lists of such similar words. Let $d$ be such a similar word. Recall that the value $F(r, w_m, d)$ of a record $r$ on the list of a similar word $d$ with respect to $w_m$ is based on both $W(r, d)$ and $Sim(d, w_m)$. Let $v$ be a materialized node. To use $U(v)$ to replace the lists of $v$'s leaf nodes in the max heap, the following two conditions need to be satisfied: (1) all the leaf nodes of $v$ have the same similarity to $w_m$; (2) all the leaf nodes of $v$ are similar to $w_m$, i.e., their similarity to $w_m$ is no less than the threshold $\tau$. When the conditions are satisfied, the sorting order of the union list $U(v)$ is also the order of the scores of the records on the leaf-node lists with respect to $w_m$.
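Returning to list pruning, the on-demand heap construction with upper bounds $u_i$ can be sketched in Python as follows. This is an illustrative sketch, not the paper's code: it assumes non-empty inverted lists sorted by weight in descending order, takes $u_i$ to be the similarity times the largest weight on the list, and uses hypothetical names and sample data. A list is admitted to the heap only when its upper bound exceeds the score of the current top record, or when the heap is empty.

```python
import heapq


def top_k_with_pruning(lists, k):
    """Return (results, lists_opened) for one query keyword.

    lists: one (sim, postings) pair per similar word; postings is a non-empty
    list of (record_id, weight) sorted by weight descending (assumed layout).
    results: up to k (record_id, score) pairs in non-increasing score order.
    lists_opened: how many inverted lists were ever placed on the heap.
    """
    # Upper bound u_i = Sim(d_i, w) * largest weight on L_i.
    bound = {i: sim * postings[0][1] for i, (sim, postings) in enumerate(lists)}
    order = sorted(bound, key=lambda i: -bound[i])  # highest bound first
    heap, nxt = [], 0
    results, seen = [], set()
    while len(results) < k:
        # Admit pending lists whose bound beats the current top score.
        while nxt < len(order) and (not heap or bound[order[nxt]] > -heap[0][0]):
            i = order[nxt]
            heapq.heappush(heap, (-bound[i], i, 0))
            nxt += 1
        if not heap:
            break
        neg_score, i, pos = heapq.heappop(heap)
        sim, postings = lists[i]
        if pos + 1 < len(postings):
            heapq.heappush(heap, (-sim * postings[pos + 1][1], i, pos + 1))
        record_id = postings[pos][0]
        if record_id not in seen:
            seen.add(record_id)
            results.append((record_id, -neg_score))
    return results, nxt


# Hypothetical lists: two words matching the prefix exactly (similarity 1.0)
# and one fuzzy match (similarity 0.5) that is never opened for k = 2.
pruning_lists = [
    (1.0, [(4, 9), (5, 8)]),
    (1.0, [(3, 9), (7, 4)]),
    (0.5, [(6, 8), (2, 3)]),
]
top2, opened = top_k_with_pruning(pruning_lists, 2)
```

Keeping the pending lists ordered by upper bound means that each admission check touches only the next pending list, which corresponds to the remark above about storing the maximal value of the upper bounds.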
A materialized node $v$ that satisfies the two conditions must be a descendant of a similar prefix of the partial keyword $w_m$. We can prove this by contradiction. Suppose node $v$ is not a descendant of any similar prefix of the partial keyword $w_m$. Then node $v$ and its ancestors are not similar prefixes of $w_m$, that is, the leaf nodes of $v$ are not similar keywords of $w_m$. This contradicts the second condition. Thus a materialized node $v$ that satisfies the two conditions must be a descendant of a similar prefix of the partial keyword $w_m$.

Suppose $p_1, p_2, \ldots, p_n$ are similar prefixes of $w_m$. We check whether their materialized descendants satisfy the two conditions as follows. Consider a materialized node $v$ which has ancestors among $p_1, p_2, \ldots, p_n$. If node $v$ has no descendants that are similar prefixes of $w_m$, $v$ must satisfy the two conditions; otherwise suppose $p_j$ is a descendant of $v$ that is a similar prefix of $w_m$ and has the largest similarity to $v$ among all such descendants. Without loss of generality, let $p_i$ be an ancestor of $v$ that has the largest similarity with $v$ among all similar prefixes. If $Sim(v, p_j) \le Sim(v, p_i)$, $v$ satisfies the two conditions; otherwise $v$ will not. Thus we can find usable materialized nodes to construct the max heap of $w_m$ and use the cost-based analysis proposed earlier in the paper to select high-quality nodes for materialization.

5. EXPERIMENTS

We implemented our proposed techniques and compared with existing methods on three real data sets. (1) DBLP: It included computer science publication records⁴. (2) URL⁵: It included million URLs. (3) Enron: It was an email collection⁶. Table 2 shows details of the data.

Table 2: Data sets and index costs.
Data Set: URL, DBLP, Enron
# of Records (millions): .5
Data size: . GB, 5 MB, .4 GB
Avg. # of words/record:
# of distinct keywords (millions):
Trie size: 42 MB, 3 MB, 28 MB
Size of inverted lists: 379 MB, 83 MB, 342 MB

For the DBLP data set, we selected real queries from the logs of our deployed systems and each query contained 6 keywords⁷.
For the other two data sets, we generated queries with keywords randomly selected from the set of words used in the collection. We assumed the letters of a query were typed in one by one. For each keystroke, we measured the time of computing the top-k answers to this query. For exact search, we measured the total running time. For fuzzy search, we measured the time in two steps: in step 1 we computed keywords on the trie similar to the query keywords (using the algorithm described in [13]); in step 2 we found the top-k answers using the inverted lists of these similar keywords. Unless otherwise specified, k = . We compared our method with the state-of-the-art method [13]. We implemented the NRA algorithm described in [6] if we only do sorted access, and the Threshold Algorithm ("TA") if we can do both sorted access and random access. All the indexes were built offline, preloaded, and fully resident in memory during all querying operations. All experiments were run on an Ubuntu Linux machine with an Intel Core processor (X545 3.GHz and 4 GB RAM).

5.1 Exact Search

Sorted Access Only: We implemented the following methods. (1) BinaryProbe [13]: We considered the inverted lists of the complete query keywords, and the union of the inverted lists for the complete keywords of the partial keyword. We chose the shortest list, and for each of its record IDs, we did binary probings on other lists. (2) NRA(Heap): We implemented the NRA algorithm using the heap-based technique. (3) NRA(Heap+Materialization⁸): We implemented the NRA algorithm using the heap-and-materialization-based techniques. Figure 9 shows the results on the Enron dataset, which showed that our method improved search efficiency. For instance, for queries with a partial keyword of length 2, NRA(Heap) reduced the query time of BinaryProbe from 28 ms to ms. NRA(Heap+Materialization) further reduced the time to 2 ms. This is because 1) BinaryProbe first computed all results and then ranked them; 2) BinaryProbe computed the union list of the partial keyword on the fly.
NRA(Heap) used the max heap to generate a sorted partial list and NRA(Heap+Materialization) used materialized lists to save push/pop operations on the heap.

[Figure 9: Exact search using sorted access (Enron). Panels: (a) Varying data size, x-axis # of records (*K); (b) Varying prefix length, x-axis length of the prefix keyword. y-axis: query time (ms). Series: BinaryProbe, NRA(Heap), NRA(Heap+Materialization).]

Sorted Access + Random Access: We implemented the following methods. (1) BinaryProbe (Forward List) [13]: We chose the shortest list, and for each of its record IDs, we verified whether the record ID contained other keywords using the forward list. (2) TA(Forward List+Heap): We implemented the TA algorithm using forward list for random access and max heap for sorted access. (3) TA(Forward List+Heap+Materialization): We implemented the TA algorithm using forward list, max heap, and list materialization. Figure 10 shows the results on the DBLP dataset. We can see that the random-access techniques indeed improved efficiency.

enon/
⁷ Details are omitted due to double-blind review.
⁸ We used additional 5% space with respect to inverted index for materialization in the experiments.

5.2 Fuzzy Search

Sorted Access Only: We first evaluated the effect of the list-pruning technique. Figure 11 shows the experimental results (including two steps). We can observe that list pruning indeed improved search efficiency. For the Enron dataset with .5M records, the method with pruning can reduce the time from 3 ms to 7 ms. The pruning technique was more effective on the Enron dataset than on the other two datasets mainly due to two reasons. First, the Enron dataset had more trie nodes due to its large number of distinct keywords in the emails. Thus a query keyword can have more similar prefixes on the trie. Second, the Enron dataset had fewer records, and the inverted lists were relatively shorter. During the list traversal, the NRA algorithm visited fewer records, and its higher score of the top record from the max heap helped us prune more lists.

List Materialization: We evaluated the improvement on sorted access using list materialization for fuzzy type-ahead search. We measured the amount of storage space for storing materialized lists as a percentage of the total size of the inverted lists on the trie. We varied this amount, and measured the average time of finding the top answers using the NRA algorithm. Figure 12 shows the results.
We can see that list materialization improved the search performance. We implemented the different methods for list materialization, namely Random, TopDown, BottomUp, and CostBased, as discussed earlier. Figure 13 shows the results. Among the three naive methods, Random gave the best results. The CostBased algorithm outperformed all the naive methods. This is because CostBased selected high-quality nodes for materialization using a cost-based analysis.

Sorted Access + Random Access: We implemented the TA algorithm using the two methods for random access and list pruning for sorted access (described in Section 4). Figure 14 shows the scalability results on the three datasets. The two random-access methods scaled well. Method 2 (probing on trie leaf nodes) outperformed Method 1 (probing on forward lists). This is because for the three data sets, there were many prefixes similar to the partial keyword, and Method 1 needed to consider all similar prefixes for each record on forward lists.

[Figure 10: Exact search using random access (DBLP). Panels: (a) Varying data size, x-axis # of records (*K); (b) Varying prefix length, x-axis length of the prefix keyword. y-axis: query time (ms). Series: BinaryProbe(Forward List), TA(Forward List+Heap), TA(Forward List+Heap+Materialization).]

6. RELATED WORK

There are many studies on auto-complete and phrase prediction for user queries [22, 5, 9, 23, 7]. Google instant search was launched to support type-ahead search. It first suggested relevant queries based on user profiles and query logs and then answered the top queries. Chaudhuri et al. [5] studied how to find similar strings interactively as users type in a query string, using an approach similar to that in [13, 20]. They did not study the case where a query has multiple keywords that need list-intersection operations. The search paradigm studied in this paper is different since we support fuzzy, full-text search as users type in queries. Bast et al. proposed techniques to support type-ahead search in their CompleteSearch systems [2, 3, 1]. Another study [19] is about type-ahead search on relational data graphs. Ji et al. [13] developed algorithms for fuzzy type-ahead search.
Our work extends these studies by developing efficient algorithms to support top-k search. Khoussainova et al. [14] proposed to suggest relevant SQL snippets as users type in SQL queries. Li et al. [18] studied how to use SQLs to support type-ahead search in databases. Feng et al. [8] studied fuzzy search on XML data.

There have been many studies on supporting fuzzy search (e.g., [10, 17, 4, 11, 24, 16]). However, these algorithms are inefficient for type-ahead search since they have low pruning power for short strings (partial keywords). The experiments in [13, 5] showed that these approaches are not as efficient as trie-based methods for fuzzy type-ahead search. Theobald et al. [25] proposed a heap-based method for query expansion. They used WordNet words and only utilized sorted access. We consider both sorted access and random access.

7. CONCLUSION

In this paper we studied how to efficiently answer top-k queries in type-ahead search. We focused on an index structure with a trie of keywords in a data set and inverted lists of records on the trie leaf nodes. We studied technical challenges when adopting existing top-k algorithms in the literature: how to efficiently support random access and sorted access on inverted lists? We presented two algorithms for supporting random access, and proposed optimization techniques using list pruning and materialization to support sorted access. Our techniques can be easily extended to support large datasets through data partition. For example, we have built a system to search on 2 million MEDLINE publication records using two machines.

Acknowledgement. The authors have financial interest in Bimaple Technology Inc., a company currently commercializing some of the techniques described in this publication. Chen Li is partially supported by the NIH grant R2LM43A and the National Natural Science Foundation of China (No. 6292). Guoliang Li, Jiannan Wang, and Jianhua Feng were partly supported by the National Natural Science Foundation of China (No. 634), the National Grand Fundamental Research 973 Program of China (No. 2CB3226), Tsinghua University (No. 2873), and the NExT Research Center funded by MDA, Singapore (No. WBS:R ).
[Figure 11: Fuzzy search using list pruning (similarity threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. x-axis: # of records; y-axis: query time (ms). Series: Without Pruning, Pruning, Computing Similar Keywords.]

[Figure 12: Fuzzy search using list materialization (sorted access only, with list pruning, threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. x-axis: additional space / inverted-index size; y-axis: query time (ms). One series per number of query keywords.]

[Figure 13: Comparison of different materialization methods (similarity threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. x-axis: additional space / inverted-index size; y-axis: query time (ms). Series: TopDown, BottomUp, Random, CostBased.]

[Figure 14: Fuzzy search with sorted access ("SA") and random access ("RA") (similarity threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. x-axis: # of records; y-axis: query time (ms). Series: SA+RA(Probing on Forward Lists), SA+RA(Probing on Leaf Nodes), SA, Computing Similar Keywords.]

8. REFERENCES

[1] H. Bast, A. Chitea, F. M. Suchanek, and I. Weber. Ester: efficient search on text, entities, and relations. In SIGIR, pages , 27.
[2] H. Bast and I. Weber. Type less, find more: fast autocompletion search with a succinct index. In SIGIR, pages , 26.
[3] H. Bast and I. Weber. The CompleteSearch engine: Interactive, efficient, and towards IR & DB integration. In CIDR, pages 88 95, 27.
[4] S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, pages 5 6, 26.
[5] S. Chaudhuri and R. Kaushik. Extending autocompletion to tolerate errors. In SIGMOD Conference, pages 77 78, 29.
[6] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, pages 2 3, 2.
[7] J. Fan, G. Li, and L. Zhou. Interactive SQL query suggestion: Making databases user-friendly. In ICDE, pages , 2.
[8] J. Feng and G. Li. Efficient fuzzy type-ahead search in XML data. IEEE TKDE, 24(5): , 22.
[9] K. Grabski and T. Scheffer. Sentence completion. In SIGIR, pages , 24.
[10] L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, pages 49 5, 2.
[11] M. Hadjieleftheriou, A. Chandel, N. Koudas, and D. Srivastava. Fast indexes and algorithms for set similarity selection queries. In ICDE, pages , 28.
[12] I. F. Ilyas, G. Beskales, and M. A. Soliman. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 4(4), 28.
[13] S. Ji, G. Li, C. Li, and J. Feng. Efficient interactive fuzzy keyword search. In WWW, pages 37 38, 29.
[14] N. Khoussainova, Y. Kwon, M. Balazinska, and D. Suciu. SnipSuggest: Context-aware autocompletion for SQL. PVLDB, 4():22 33, 2.
[15] K. Kukich. Techniques for automatically correcting words in text. ACM Comput. Surv., 24(4): , 992.
[16] H. Lee, R. T. Ng, and K. Shim. Extending q-grams to estimate selectivity of string matching with low edit distance. In VLDB, pages 95 26, 27.
[17] C. Li, J. Lu, and Y. Lu. Efficient merging and filtering algorithms for approximate string searches. In ICDE, pages , 28.
[18] G. Li, J. Feng, and C. Li. Supporting search-as-you-type using SQL in databases. IEEE TKDE, 22.
[19] G. Li, S. Ji, C. Li, and J. Feng. Efficient type-ahead search on relational data: a TASTIER approach. In SIGMOD Conference, pages , 29.
[20] G. Li, S. Ji, C. Li, and J. Feng. Efficient fuzzy full-text type-ahead search. VLDB J., 2(4):6764, 2.
[21] N. Mamoulis, K. H. Cheng, M. L. Yiu, and D. W. Cheung. Efficient aggregation of ranked inputs. In ICDE, page 72 83, 26.
[22] H. Motoda and K. Yoshida. Machine learning techniques to make computers easier to use. Artif. Intell., 3(2):295 32, 998.
[23] A. Nandi and H. V. Jagadish. Effective phrase prediction. In VLDB, pages 29 23, 27.
[24] J. Qin, W. Wang, Y. Lu, C. Xiao, and X. Lin. Efficient exact edit similarity query processing with the asymmetric signature scheme. In SIGMOD Conference, pages 33 44, 2.
[25] M. Theobald, R. Schenkel, and G. Weikum. Efficient and self-tuning incremental query expansion for top-k query processing. In SIGIR, pages ,
More informationEfficient Redundancy Techniques for Latency Reduction in Cloud Systems
Efficient Redundancy Techniques fo Latency Reduction in Cloud Systems 1 Gaui Joshi, Emina Soljanin, and Gegoy Wonell Abstact In cloud computing systems, assigning a task to multiple seves and waiting fo
More informationComparing Availability of Various Rack Power Redundancy Configurations
Compaing Availability of Vaious Rack Powe Redundancy Configuations By Victo Avela White Pape #48 Executive Summay Tansfe switches and dualpath powe distibution to IT equipment ae used to enhance the availability
More informationTHE DISTRIBUTED LOCATION RESOLUTION PROBLEM AND ITS EFFICIENT SOLUTION
IADIS Intenational Confeence Applied Computing 2006 THE DISTRIBUTED LOCATION RESOLUTION PROBLEM AND ITS EFFICIENT SOLUTION Jög Roth Univesity of Hagen 58084 Hagen, Gemany Joeg.Roth@Fenunihagen.de ABSTRACT
More informationTowards Realizing a Low Cost and Highly Available Datacenter Power Infrastructure
Towads Realizing a Low Cost and Highly Available Datacente Powe Infastuctue Siam Govindan, Di Wang, Lydia Chen, Anand Sivasubamaniam, and Bhuvan Ugaonka The Pennsylvania State Univesity. IBM Reseach Zuich
More informationSecure SmartcardBased Fingerprint Authentication
Secue SmatcadBased Fingepint Authentication [full vesion] T. Chales Clancy Compute Science Univesity of Mayland, College Pak tcc@umd.edu Nega Kiyavash, Dennis J. Lin Electical and Compute Engineeing Univesity
More informationAutomatic Testing of Neighbor Discovery Protocol Based on FSM and TTCN*
Automatic Testing of Neighbo Discovey Potocol Based on FSM and TTCN* Zhiliang Wang, Xia Yin, Haibin Wang, and Jianping Wu Depatment of Compute Science, Tsinghua Univesity Beijing, P. R. China, 100084 Email:
More informationVISCOSITY OF BIODIESEL FUELS
VISCOSITY OF BIODIESEL FUELS One of the key assumptions fo ideal gases is that the motion of a given paticle is independent of any othe paticles in the system. With this assumption in place, one can use
More informationHow to recover your Exchange 2003/2007 mailboxes and emails if all you have available are your PRIV1.EDB and PRIV1.STM Information Store database
AnswesThatWok TM Recoveing Emails and Mailboxes fom a PRIV1.EDB Exchange 2003 IS database How to ecove you Exchange 2003/2007 mailboxes and emails if all you have available ae you PRIV1.EDB and PRIV1.STM
More informationEffect of Contention Window on the Performance of IEEE 802.11 WLANs
Effect of Contention Window on the Pefomance of IEEE 82.11 WLANs Yunli Chen and Dhama P. Agawal Cente fo Distibuted and Mobile Computing, Depatment of ECECS Univesity of Cincinnati, OH 452213 {ychen,
More information2 r2 θ = r2 t. (3.59) The equal area law is the statement that the term in parentheses,
3.4. KEPLER S LAWS 145 3.4 Keple s laws You ae familia with the idea that one can solve some mechanics poblems using only consevation of enegy and (linea) momentum. Thus, some of what we see as objects
More informationTracking/Fusion and Deghosting with Doppler Frequency from Two Passive Acoustic Sensors
Tacking/Fusion and Deghosting with Dopple Fequency fom Two Passive Acoustic Sensos Rong Yang, Gee Wah Ng DSO National Laboatoies 2 Science Pak Dive Singapoe 11823 Emails: yong@dso.og.sg, ngeewah@dso.og.sg
More informationChannel selection in ecommerce age: A strategic analysis of coop advertising models
Jounal of Industial Engineeing and Management JIEM, 013 6(1):89103 Online ISSN: 0130953 Pint ISSN: 013843 http://dx.doi.og/10.396/jiem.664 Channel selection in ecommece age: A stategic analysis of
More informationAn Immunological Approach to Change Detection: Algorithms, Analysis and Implications
An Immunological Appoach to Change Detection: Algoithms, Analysis and Implications Patik D haeselee Dept. of Compute Science Univesity of New Mexico Albuqueque, NM, 87131 patik@cs.unm.edu Stephanie Foest
More informationPerformance Analysis of an Inverse Notch Filter and Its Application to F 0 Estimation
Cicuits and Systems, 013, 4, 1171 http://dx.doi.og/10.436/cs.013.41017 Published Online Januay 013 (http://www.scip.og/jounal/cs) Pefomance Analysis of an Invese Notch Filte and Its Application to F 0
More informationThe impact of migration on the provision. of UK public services (SRG.10.039.4) Final Report. December 2011
The impact of migation on the povision of UK public sevices (SRG.10.039.4) Final Repot Decembe 2011 The obustness The obustness of the analysis of the is analysis the esponsibility is the esponsibility
More informationSupplementary Material for EpiDiff
Supplementay Mateial fo EpiDiff Supplementay Text S1. Pocessing of aw chomatin modification data In ode to obtain the chomatin modification levels in each of the egions submitted by the use QDCMR module
More informationMULTIPLE SOLUTIONS OF THE PRESCRIBED MEAN CURVATURE EQUATION
MULTIPLE SOLUTIONS OF THE PRESCRIBED MEAN CURVATURE EQUATION K.C. CHANG AND TAN ZHANG In memoy of Pofesso S.S. Chen Abstact. We combine heat flow method with Mose theoy, supe and subsolution method with
More informationDo Vibrations Make Sound?
Do Vibations Make Sound? Gade 1: Sound Pobe Aligned with National Standads oveview Students will lean about sound and vibations. This activity will allow students to see and hea how vibations do in fact
More informationLab #7: Energy Conservation
Lab #7: Enegy Consevation Photo by Kallin http://www.bungeezone.com/pics/kallin.shtml Reading Assignment: Chapte 7 Sections 1,, 3, 5, 6 Chapte 8 Sections 14 Intoduction: Pehaps one of the most unusual
More informationHow to create RAID 1 mirroring with a hard disk that already has data or an operating system on it
AnswesThatWok TM How to set up a RAID1 mio with a dive which aleady has Windows installed How to ceate RAID 1 mioing with a had disk that aleady has data o an opeating system on it Date Company PC / Seve
More informationExperiment 6: Centripetal Force
Name Section Date Intoduction Expeiment 6: Centipetal oce This expeiment is concened with the foce necessay to keep an object moving in a constant cicula path. Accoding to Newton s fist law of motion thee
More informationSemipartial (Part) and Partial Correlation
Semipatial (Pat) and Patial Coelation his discussion boows heavily fom Applied Multiple egession/coelation Analysis fo the Behavioal Sciences, by Jacob and Paticia Cohen (975 edition; thee is also an updated
More informationPromised LeadTime Contracts Under Asymmetric Information
OPERATIONS RESEARCH Vol. 56, No. 4, July August 28, pp. 898 915 issn 3364X eissn 15265463 8 564 898 infoms doi 1.1287/ope.18.514 28 INFORMS Pomised LeadTime Contacts Unde Asymmetic Infomation Holly
More informationCloud Service Reliability: Modeling and Analysis
Cloud Sevice eliability: Modeling and Analysis YuanShun Dai * a c, Bo Yang b, Jack Dongaa a, Gewei Zhang c a Innovative Computing Laboatoy, Depatment of Electical Engineeing & Compute Science, Univesity
More informationUNIT CIRCLE TRIGONOMETRY
UNIT CIRCLE TRIGONOMETRY The Unit Cicle is the cicle centeed at the oigin with adius unit (hence, the unit cicle. The equation of this cicle is + =. A diagam of the unit cicle is shown below: + =   
More informationMemoryAware Sizing for InMemory Databases
MemoyAwae Sizing fo InMemoy Databases Kasten Molka, Giuliano Casale, Thomas Molka, Laua Mooe Depatment of Computing, Impeial College London, United Kingdom {k.molka3, g.casale}@impeial.ac.uk SAP HANA
More informationEvaluating the impact of Blade Server and Virtualization Software Technologies on the RIT Datacenter
Evaluating the impact of and Vitualization Softwae Technologies on the RIT Datacente Chistophe M Butle Vitual Infastuctue Administato Rocheste Institute of Technology s Datacente Contact: chis.butle@it.edu
More informationComparing Availability of Various Rack Power Redundancy Configurations
Compaing Availability of Vaious Rack Powe Redundancy Configuations White Pape 48 Revision by Victo Avela > Executive summay Tansfe switches and dualpath powe distibution to IT equipment ae used to enhance
More informationReal Time Tracking of High Speed Movements in the Context of a Table Tennis Application
Real Time Tacking of High Speed Movements in the Context of a Table Tennis Application Stephan Rusdof Chemnitz Univesity of Technology D09107, Chemnitz, Gemany +49 371 531 1533 stephan.usdof@infomatik.tuchemnitz.de
More informationFinancing Terms in the EOQ Model
Financing Tems in the EOQ Model Habone W. Stuat, J. Columbia Business School New Yok, NY 1007 hws7@columbia.edu August 6, 004 1 Intoduction This note discusses two tems that ae often omitted fom the standad
More information9:6.4 Sample Questions/Requests for Managing Underwriter Candidates
9:6.4 INITIAL PUBLIC OFFERINGS 9:6.4 Sample Questions/Requests fo Managing Undewite Candidates Recent IPO Expeience Please povide a list of all completed o withdawn IPOs in which you fim has paticipated
More informationThe Binomial Distribution
The Binomial Distibution A. It would be vey tedious if, evey time we had a slightly diffeent poblem, we had to detemine the pobability distibutions fom scatch. Luckily, thee ae enough similaities between
More informationTiming Synchronization in High Mobility OFDM Systems
Timing Synchonization in High Mobility OFDM Systems Yasamin Mostofi Depatment of Electical Engineeing Stanfod Univesity Stanfod, CA 94305, USA Email: yasi@wieless.stanfod.edu Donald C. Cox Depatment of
More informationThings to Remember. r Complete all of the sections on the Retirement Benefit Options form that apply to your request.
Retiement Benefit 1 Things to Remembe Complete all of the sections on the Retiement Benefit fom that apply to you equest. If this is an initial equest, and not a change in a cuent distibution, emembe to
More informationUnveiling the MPLS Structure on Internet Topology
Unveiling the MPLS Stuctue on Intenet Topology Gabiel Davila Revelo, Mauicio Andeson Ricci, Benoit Donnet, José Ignacio AlvaezHamelin INTECIN, Facultad de Ingenieía, Univesidad de Buenos Aies Agentina
More informationA Capacitated Commodity Trading Model with Market Power
A Capacitated Commodity Tading Model with Maket Powe Victo MatínezdeAlbéniz Josep Maia Vendell Simón IESE Business School, Univesity of Navaa, Av. Peason 1, 08034 Bacelona, Spain VAlbeniz@iese.edu JMVendell@iese.edu
More informationEnergy Efficient Cache Invalidation in a Mobile Environment
Enegy Efficient Cache Invalidation in a Mobile Envionment Naottam Chand, Ramesh Chanda Joshi, Manoj Misa Electonics & Compute Engineeing Depatment Indian Institute of Technology, Rookee  247 667. INDIA
More informationResearch on Risk Assessment of the Transformer Based on Life Cycle Cost
ntenational Jounal of Smat Gid and lean Enegy eseach on isk Assessment of the Tansfome Based on Life ycle ost Hui Zhou a, Guowei Wu a, Weiwei Pan a, Yunhe Hou b, hong Wang b * a Zhejiang Electic Powe opoation,
More informationOffice of Family Assistance. Evaluation Resource Guide for Responsible Fatherhood Programs
Office of Family Assistance Evaluation Resouce Guide fo Responsible Fathehood Pogams Contents Intoduction........................................................ 4 Backgound..........................................................
More informationHour Exam No.1. p 1 v. p = e 0 + v^b. Note that the probe is moving in the direction of the unit vector ^b so the velocity vector is just ~v = v^b and
Hou Exam No. Please attempt all of the following poblems befoe the due date. All poblems count the same even though some ae moe complex than othes. Assume that c units ae used thoughout. Poblem A photon
More informationA TwoStep Tabu Search Heuristic for MultiPeriod MultiSite Assignment Problem with Joint Requirement of Multiple Resource Types
Aticle A TwoStep Tabu Seach Heuistic fo MultiPeiod MultiSite Assignment Poblem with Joint Requiement of Multiple Resouce Types Siavit Swangnop and Paveena Chaovalitwongse* Depatment of Industial Engineeing,
More informationMultiband Microstrip Patch Antenna for Microwave Applications
IOSR Jounal of Electonics and Communication Engineeing (IOSRJECE) ISSN: 22782834, ISBN: 22788735. Volume 3, Issue 5 (Sep.  Oct. 2012), PP 4348 Multiband Micostip Patch Antenna fo Micowave Applications
More informationINITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS
INITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS Vesion:.0 Date: June 0 Disclaime This document is solely intended as infomation fo cleaing membes and othes who ae inteested in
More information30 H. N. CHIU 1. INTRODUCTION. Recherche opérationnelle/operations Research
RAIRO Rech. Opé. (vol. 33, n 1, 1999, pp. 2945) A GOOD APPROXIMATION OF THE INVENTORY LEVEL IN A(Q ) PERISHABLE INVENTORY SYSTEM (*) by Huan Neng CHIU ( 1 ) Communicated by Shunji OSAKI Abstact. This
More informationAn Epidemic Model of Mobile Phone Virus
An Epidemic Model of Mobile Phone Vius Hui Zheng, Dong Li, Zhuo Gao 3 Netwok Reseach Cente, Tsinghua Univesity, P. R. China zh@tsinghua.edu.cn School of Compute Science and Technology, Huazhong Univesity
More informationAn application of stochastic programming in solving capacity allocation and migration planning problem under uncertainty
An application of stochastic pogamming in solving capacity allocation and migation planning poblem unde uncetainty YinYann Chen * and HsiaoYao Fan Depatment of Industial Management, National Fomosa Univesity,
More informationAn Infrastructure Cost Evaluation of Single and MultiAccess Networks with Heterogeneous Traffic Density
An Infastuctue Cost Evaluation of Single and MultiAccess Netwoks with Heteogeneous Taffic Density Andes Fuuskä and Magnus Almgen Wieless Access Netwoks Eicsson Reseach Kista, Sweden [andes.fuuska, magnus.almgen]@eicsson.com
More informationGive me all I pay for Execution Guarantees in Electronic Commerce Payment Processes
Give me all I pay fo Execution Guaantees in Electonic Commece Payment Pocesses Heiko Schuldt Andei Popovici HansJög Schek Email: Database Reseach Goup Institute of Infomation Systems ETH Zentum, 8092
More informationSUPPORT VECTOR MACHINE FOR BANDWIDTH ANALYSIS OF SLOTTED MICROSTRIP ANTENNA
Intenational Jounal of Compute Science, Systems Engineeing and Infomation Technology, 4(), 20, pp. 677 SUPPORT VECTOR MACHIE FOR BADWIDTH AALYSIS OF SLOTTED MICROSTRIP ATEA Venmathi A.R. & Vanitha L.
More informationTrading Volume and Serial Correlation in Stock Returns in Pakistan. Abstract
Tading Volume and Seial Coelation in Stock Retuns in Pakistan Khalid Mustafa Assistant Pofesso Depatment of Economics, Univesity of Kaachi email: khalidku@yahoo.com and Mohammed Nishat Pofesso and Chaiman,
More informationProgramming Assignment #1
Due: Nov 3 (11:59pm). Pogamming Assignment #1 CMSC 351 Fall 2014 Rules 1) You may only use C/C++, Java. 2) You pogam should use the standad input/output. Fo example C/C++ uses should use scanf/pintf/cin/cout
More informationManual ultrasonic inspection of thin metal welds
Manual ultasonic inspection of thin metal welds Capucine Capentie and John Rudlin TWI Cambidge CB1 6AL, UK Telephone 01223 899000 Fax 01223 890689 Email capucine.capentie@twi.co.uk Abstact BS EN ISO 17640
More informationContinuous Compounding and Annualization
Continuous Compounding and Annualization Philip A. Viton Januay 11, 2006 Contents 1 Intoduction 1 2 Continuous Compounding 2 3 Pesent Value with Continuous Compounding 4 4 Annualization 5 5 A Special Poblem
More informationFXA 2008. Candidates should be able to : Describe how a mass creates a gravitational field in the space around it.
Candidates should be able to : Descibe how a mass ceates a gavitational field in the space aound it. Define gavitational field stength as foce pe unit mass. Define and use the peiod of an object descibing
More informationSelfAdaptive and ResourceEfficient SLA Enactment for Cloud Computing Infrastructures
2012 IEEE Fifth Intenational Confeence on Cloud Computing SelfAdaptive and ResouceEfficient SLA Enactment fo Cloud Computing Infastuctues Michael Maue, Ivona Bandic Distibuted Systems Goup Vienna Univesity
More informationStatistics and Data Analysis
Pape 27425 An Extension to SAS/OR fo Decision System Suppot Ali Emouznead Highe Education Funding Council fo England, Nothavon house, Coldhabou Lane, Bistol, BS16 1QD U.K. ABSTRACT This pape exploes the
More informationGravitational Mechanics of the MarsPhobos System: Comparing Methods of Orbital Dynamics Modeling for Exploratory Mission Planning
Gavitational Mechanics of the MasPhobos System: Compaing Methods of Obital Dynamics Modeling fo Exploatoy Mission Planning Alfedo C. Itualde The Pennsylvania State Univesity, Univesity Pak, PA, 6802 This
More informationCoordinate Systems L. M. Kalnins, March 2009
Coodinate Sstems L. M. Kalnins, Mach 2009 Pupose of a Coodinate Sstem The pupose of a coodinate sstem is to uniquel detemine the position of an object o data point in space. B space we ma liteall mean
More informationLoyalty Rewards and Gift Card Programs: Basic Actuarial Estimation Techniques
Loyalty Rewads and Gift Cad Pogams: Basic Actuaial Estimation Techniques Tim A. Gault, ACAS, MAAA, Len Llaguno, FCAS, MAAA and Matin Ménad, FCAS, MAAA Abstact In this pape we establish an actuaial famewok
More informationENABLING INFORMATION GATHERING PATTERNS FOR EMERGENCY RESPONSE WITH THE OPENKNOWLEDGE SYSTEM
Computing and Infomatics, Vol. 29, 2010, 537 555 ENABLING INFORMATION GATHERING PATTERNS FOR EMERGENCY RESPONSE WITH THE OPENKNOWLEDGE SYSTEM Gaia Tecaichi, Veonica Rizzi, Mauizio Machese Depatment of
More information