Supporting Efficient Top-k Queries in Type-Ahead Search


Guoliang Li, Jiannan Wang, Chen Li, Jianhua Feng
Department of Computer Science, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China. Department of Computer Science, UC Irvine, CA, USA

ABSTRACT
Type-ahead search can find answers on the fly as a user types in a keyword query. A main challenge in this search paradigm is the high-efficiency requirement that queries must be answered within milliseconds. In this paper we study how to answer top-k queries in this paradigm, i.e., as a user types in a query letter by letter, we want to efficiently find the k best answers. Instead of inventing completely new algorithms from scratch, we study challenges when adopting existing top-k algorithms in the literature that heavily rely on two basic list-access methods: random access and sorted access. We present two algorithms to support random access efficiently. We develop novel techniques to support efficient sorted access using list pruning and materialization. We extend our techniques to support fuzzy type-ahead search, which allows minor errors between query keywords and answers. We report our experimental results on several real large data sets to show that the proposed techniques can answer top-k queries efficiently in type-ahead search.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval models

General Terms
Algorithms, Experimentation, Performance

Keywords
Type-ahead search, top-k search, fuzzy search

1. INTRODUCTION
To give instant feedback when users formulate search queries, many information systems support autocomplete search, which shows results immediately after a user types in a partial keyword query. As an example, almost all the major search engines nowadays automatically suggest possible keyword queries as a user types in partial keywords.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR'12, August 12-16, 2012, Portland, Oregon, USA. Copyright 2012 ACM.

Most autocomplete systems treat a query with multiple keywords as a single string, and find answers with text that matches the string exactly. To overcome this limitation, a new type-ahead search paradigm has emerged recently [2, 3]. Using this paradigm, a system treats a query as a set of keywords, and does a full-text search on the underlying data to find answers including the keywords. We treat the last keyword in the query as a partial keyword the user is completing. For instance, a query "graph sig" on a publication table can find publication records with the keyword "graph" and a keyword that has "sig" as a prefix, such as "sigir", "sigmod", and "signature". In this way, a user can get instant feedback after typing keywords, and thus can obtain more knowledge about the underlying data to formulate a query more easily.

Ji et al. [3] extended type-ahead search by allowing minor errors between queries and answers. As a user types in query keywords, the system can find relevant records with keywords similar to the query keywords. This feature is especially important when the user has limited knowledge about the exact representation of the entities she is looking for. For instance, if a user types in a partial query "chitos falut", the system can find records approximately matching the two keywords despite the typo in the query, such as a record with keywords "Christos Faloutsos". Clearly these features can further improve user search experiences.

In this paper we study how to answer ranking queries in type-ahead search on large amounts of data.
That is, as a user types in a keyword query letter by letter, we want to find the most relevant (or "top-k") records on the fly. One approach first finds records matching those query keywords, and then computes their ranking scores to find the most relevant ones. This approach is not efficient when there are a large number of candidate answers to compute and store.

Existing type-ahead search approaches assume an index structure with a trie for the keywords in the underlying data, and each leaf node has an inverted list of records with this keyword, with the weight of this keyword in the record [3, 9]. As an example, Table 1 shows a sample collection of publication records. For simplicity, we only list some of the keywords for each record. Figure 1 shows the corresponding index structure. (More details about the index are in Section 3.) Suppose a user types in a query "graph icdm li". For exact search, we find records containing the first two keywords and a word with a prefix of "li", e.g., record r5. For fuzzy search, we compute records with keywords similar to query keywords, and rank them to find the best answers. For each complete keyword, we find keywords similar to the query keyword. For instance, both keywords "icdm" and "icdl" are similar to the second query keyword. The last keyword "li" is treated as a prefix condition, since the user is still typing at the end of this keyword. We find keywords that have a prefix similar to "li", such as "lin", "liu", and "lui". We access the inverted lists of these similar keywords to find records and rank them to find the best answers for the user.

Table 1: Publication records with sample keywords.
Record ID   Record
r0          graph icdm ...
r1          graph group lui ...
r2          gray icdl liu ...
r3          graph icdl lin lui ...
r4          graph group icdm lin liu ...
r5          graph gray gross icdm lin liu ...
r6          gray group icdm lin liu ...
r7          gray gross group icdl lin ...
r8          gross icdl liu ...
r9          icdm liu ...

A key question is: how to access inverted lists on trie leaf nodes efficiently to answer top-k queries? Instead of inventing completely new algorithms from scratch, we study how to adopt a plethora of algorithms in the literature for answering top-k queries by accessing lists (e.g., [2, 2]). These algorithms share the same framework proposed by Fagin [6], in which we have lists of records sorted based on various conditions. An aggregation function takes the scores of a record from these lists and computes the final score of the record. There are two methods to access these lists: (1) Random Access: Given a record ID, we can retrieve the score of the record on each list; (2) Sorted Access: We retrieve the record IDs on each list following the list order.

In this paper we study technical challenges when adopting these algorithms, and focus on new optimization opportunities that arise in our problem. In particular, we study how to support the two types of access operations efficiently by utilizing characteristics specific to our index structures and access methods. We make the following contributions:
1) In Section 3, we present a forward-list-based method for supporting random access on the inverted lists, and develop a heap-based method and list-materialization techniques to support sorted access efficiently.
2) In Section 4 we study fuzzy type-ahead search.
We propose a list-pruning technique to improve the performance of sorted access, and study how to improve the techniques based on forward lists and list materialization for fuzzy search. Due to the challenging nature of the problem, our extensions are technically nontrivial.
3) In Section 5 we present our experimental results on real large data sets to show the efficiency of our techniques.
We have deployed several systems using this paradigm, which have been used regularly and well accepted by users due to their friendly interfaces and high efficiency.

2. FORMULATION AND PRELIMINARIES
Type-Ahead Search: Let R be a collection of records such as the tuples in a relational table. Let D be the set of words in R. Let Q be a query the user has typed in, which is a sequence of keywords w_1, w_2, ..., w_m. We treat the last keyword w_m as a partial keyword the user is completing, and the other keywords as complete keywords the user has completed (see footnote 2). As a user types in a keyword query letter by letter, type-ahead search finds on the fly the records that contain the first m - 1 keywords and a word with the last keyword as a prefix. Without loss of generality, each string in the data set and in a query is assumed to use lower-case letters.

Footnote 2: Our method can be easily extended to the case that every keyword is taken as a partial keyword.

Figure 1: Trie index structure.

For example, in Table 1, R = {r0, r1, ..., r9}, D = {graph, icdm, group, lui, ...}. Suppose a user types in a query "icdm gra". We treat "icdm" as a complete keyword and "gra" as a partial keyword. Records r0, r4, r5, and r6 are potentially relevant answers. For example, r0 contains the complete keyword "icdm" and the word "graph" with a prefix of "gra". When the user types in more letters and submits the query "icdm graph li", we treat "icdm" and "graph" as complete keywords and "li" as a partial keyword. Records r4 and r5 are potentially relevant answers.
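The exact-matching semantics in the example above can be checked with a small sketch. It is only an illustration of the definition, not the indexed algorithm developed later in the paper: it scans every record. The record IDs r0 to r9 are an assumed reading of Table 1, and the function name exact_matches is hypothetical.

```python
from typing import Dict, List, Set

# Sample keywords per record, following Table 1 (IDs r0..r9 are an assumed reading).
RECORDS: Dict[str, Set[str]] = {
    "r0": {"graph", "icdm"},
    "r1": {"graph", "group", "lui"},
    "r2": {"gray", "icdl", "liu"},
    "r3": {"graph", "icdl", "lin", "lui"},
    "r4": {"graph", "group", "icdm", "lin", "liu"},
    "r5": {"graph", "gray", "gross", "icdm", "lin", "liu"},
    "r6": {"gray", "group", "icdm", "lin", "liu"},
    "r7": {"gray", "gross", "group", "icdl", "lin"},
    "r8": {"gross", "icdl", "liu"},
    "r9": {"icdm", "liu"},
}


def exact_matches(query: str, records: Dict[str, Set[str]] = RECORDS) -> List[str]:
    """Return IDs of records that contain every complete keyword and
    at least one word having the last (partial) keyword as a prefix."""
    tokens = query.lower().split()
    if not tokens:
        return []
    *complete, partial = tokens
    hits = []
    for rid, words in records.items():
        has_complete = all(w in words for w in complete)
        has_prefix = any(d.startswith(partial) for d in words)
        if has_complete and has_prefix:
            hits.append(rid)
    return sorted(hits)


print(exact_matches("icdm gra"))
print(exact_matches("icdm graph li"))
```

A linear scan like this is what the trie, inverted lists, and forward lists of Section 3 are meant to avoid; it is shown only to pin down which records count as candidates.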
Top-k Answers: We rank each record r in R based on its relevance to the query. Given a positive integer k, our goal is to compute the best k records in R ranked by their relevance to Q. Notice that our problem setting allows an important record to be in the answer, even if not all query keywords appear in the record (the "OR" semantics). Thus the algorithms in [3] cannot be used directly in our problem.

Ranking: In the literature there are many algorithms for answering top-k queries by accessing lists (e.g., [2, 2]). These algorithms share the same framework proposed by Fagin [6], in which we have lists of records sorted based on various conditions, such as term frequency and inverse document frequency ("tf*idf"). Each record has a score on a list, and we use an aggregation function to combine the scores of the record on different lists to compute its overall relevance to the query. The aggregation function needs to be monotonic, i.e., decreasing the score of a record on a list cannot increase the record's overall score. This approach has the advantage of allowing a general class of ranking functions.

In this paper, we focus on an important class of ranking functions with the following property: the score F(r, Q) of a record r to a query Q is a monotonic combination of scores of the query keywords with respect to the record r. Formally, we compute the score F(r, Q) in two steps. In the first step, for each keyword w, we compute a score of the keyword with respect to the record r, denoted by F(r, w). In the second step, we compute the score F(r, Q) by applying a monotonic function on the F(r, w)'s for all the keywords w. The intuition of this property is that the more relevant an individual query keyword is to a record, the more likely this record is a good answer to this query. For example, we compute the score of a record to the query "icdm graph li" by aggregating the scores of each of the keywords with respect to the record.

Each complete keyword w has a weight associated with a record r, denoted by W(r, w). This weight could depend on the keyword, such as the tf*idf value of the keyword in the record. As a specific case, it can also be independent of the keyword. For instance, if a record is a URL with tokenized keywords, its weight could be a rank score of the corresponding Web page. If a record is an author, we can use the number of publications of the author as a weight of this record.

Figure 2: Type-ahead search for Q = w_1, w_2, ..., w_m.

For the last partial keyword w_m, there could be multiple complete words. We compute the relevance score of w_m in the record r, i.e., F(r, w_m), based on the following property: F(r, w_m) is the maximal value of the W(r, d) weights for all the keywords d with respect to w_m in r, where d is a keyword in record r and has a prefix of w_m. This property states that we only look at the most relevant keyword in a record to the partial keyword when computing the relevance of the keyword to the record. It means that the ranking function is "greedy" to find the most relevant keyword in the record as an indicator of how important this record is to the partial keyword. As we can see in Section 3, this property allows us to do effective pruning when accessing the multiple lists of a query keyword. The following is an example function:

F(r, Q) = \sum_{i=1}^{m} F(r, w_i),    (1)

where

F(r, w_i) = \begin{cases} W(r, w_i) & \text{if } i < m, \\ \max_{\text{complete word } d \text{ of } w_m} \{ W(r, d) \} & \text{if } i = m. \end{cases}    (2)

In Figure 1, consider the query "icdm graph li" and record r5. F(r5, "icdm") = W(r5, "icdm") = 8 and F(r5, "graph") = W(r5, "graph") = 9. The partial keyword "li" has two complete words "lin" and "liu". F(r5, "li") = max{W(r5, "lin"), W(r5, "liu")} = 8. F(r5, "icdm graph li") = 8 + 9 + 8 = 25.

3. EXACT TYPE-AHEAD SEARCH
In this section, we study efficient list-access methods to support exact type-ahead search, i.e., no mismatches between query keywords and answers.

Indexing: We construct a trie for the data keywords in the data D. A trie node has a character label. Each keyword in D corresponds to a unique path from the root to a leaf node (see footnote 3) on the trie.
For simplicity, a trie node is mentioned interchangeably with the keyword corresponding to the path from the root to the node. A leaf node has an inverted list of pairs ⟨rid, weight⟩, where rid is the ID of a record containing the leaf-node string, and weight is the weight of the keyword in the record. Figure 1 shows the index structure in our running example. For instance, for the leaf node of keyword "graph", its inverted list has five elements. The first element ⟨r5, 9⟩ indicates that the record r5 has this keyword, and the weight of this keyword in this record is 9, i.e., W(r5, "graph") = 9.

Footnote 3: A common trick to make each leaf node correspond to a complete word and vice versa is to add a special mark to the end of each word. For simplicity we did not use this trick.

Figure 3: Forward lists.

Searching: We compute the top-k answers to a query Q in two steps. As illustrated in Figure 2, in the first step, for each complete keyword w_i (1 ≤ i ≤ m - 1), we get its inverted list. For the last partial keyword, we locate the trie node of w_m and retrieve the inverted lists of the trie node's leaf descendants. For example, in Figure 1, consider a query "icdm li". The partial keyword "li" has two leaf-node keywords: "lin" and "liu". In the second step, we access the inverted lists to compute the k best answers. Many algorithms have been proposed for answering top-k queries by accessing sorted lists [2, 6]. When adopting these algorithms to solve our problem, we need to efficiently support two basic types of access used in these algorithms: random access and sorted access on the lists.

3.1 Efficient Random Access
To support random access, we construct a forward index in which each record has a forward list of IDs of its keywords.
We assume each keyword has a unique ID with respect to its leaf node on the trie, and the IDs of the keywords follow their alphabetical order. Figure 3 shows the forward lists. The element ⟨1, 9⟩ on the forward list of record r5 shows that this record has a keyword with ID 1 and weight 9, which is the keyword "graph" as shown on the trie.

Given a record and a complete keyword, we can get the corresponding weight by doing a binary search on the forward list. For example, to get the weight of keyword "icdm" with ID 6 in r5, we can do a binary search on r5's forward list and get the corresponding weight 8. For the partial keyword, as it has multiple complete words, we need to first locate its trie node and then enumerate its leaf descendants to get the corresponding weights. This method could be expensive if the trie node has many leaf descendants.

To improve the performance, we can use an alternative method. For each trie node n, we can maintain a keyword range [l_n, u_n], where l_n and u_n are the minimal and maximal keyword IDs of its leaf nodes, respectively [3]. An interesting observation is that a complete word with n as a prefix must have an ID in this keyword range, and each complete word in the data set with an ID in this range must have a prefix of n. In Figure 3, the keyword range of node "g" is [1, 4], since 1 is the smallest ID of its leaf nodes and 4 is the largest one. Based on this observation, this method verifies whether record r contains a keyword with a prefix of w_m as follows. We first locate the trie node w_m and then check if there is a keyword ID on the forward list of r in the keyword range [l_{w_m}, u_{w_m}]. Since we can keep the forward list of r sorted, this checking can be done efficiently.

For instance, consider the query "graph icdm l". For the first element on the inverted list of "graph", ⟨r5, 9⟩, we can check whether r5 contains the other two keywords as follows. For the complete keyword "icdm" with ID 6, we do a binary search on r5's forward list and get weight 8. For the partial keyword "l" with the keyword range [7, 9], using a binary search on r5's forward list (⟨1, 9⟩; ⟨2, 8⟩; ⟨3, 4⟩; ⟨6, 8⟩; ⟨7, 3⟩; ⟨8, 8⟩), we find keyword IDs 7 and 8 in this range. Thus we know that the record indeed contains keywords with prefix "l", and compute the corresponding score F(r5, "l") = max{F(r5, "lin"), F(r5, "liu"), F(r5, "lui")} = 8. Thus F(r5, "graph icdm l") = 25.

Figure 4: A heap-based method to compute the virtual sorted list of partial keyword "l".

3.2 Efficient Sorted Access
To support sorted access, we can keep the elements on the inverted lists sorted based on their weights in descending order. Thus, for a complete keyword, we can get an ordered list. The partial keyword w_m has multiple leaf descendants and corresponding inverted lists. We use U(w_m) to denote the union of those inverted lists, called the union list of w_m. We need to support sorted access on U(w_m) to retrieve the next most relevant record ID for w_m. Fully computing U(w_m) using the keyword lists could be expensive in terms of time and space. In this section, we propose two techniques to support sorted access efficiently.

Heap-Based Method: We can support sorted access on U(w_m) by building a max heap on the inverted lists of its leaf nodes. In particular, we maintain a cursor on each inverted list. The max heap initially consists of the record IDs pointed to by the cursors so far, sorted on the weights of the keywords in these records. Notice that each inverted list is already sorted based on the weights of its keyword in the records. To retrieve the next best record, we pop the top element from the heap, increment the cursor of the list of the popped element by 1, and push the new element of this list to the heap. When popping all elements from the heap, we can get a sorted list for the partial keyword. For example, consider the partial keyword "l". It has three complete keywords "lin", "liu", and "lui". We can compute its union list as shown in Figure 4. Note that since our method does not need to compute the entire list of U(w_m), U(w_m) is a virtual sorted list of partial keyword w_m.

On top of the inverted lists of complete keywords and the max heap of the partial keyword, we can adopt an existing top-k algorithm to find the k best records. As an example, suppose we want to compute the top-1 best answer for the query "graph icdm l" using sorted access only. We get the first elements of "graph" and "icdm", ⟨r5, 9⟩ and ⟨r4, 9⟩, pop the top element of the max heap in Figure 4, ⟨r3, 9⟩, and compute an upper bound on the overall score of an answer, i.e., 27. We get the next elements of "graph" and "icdm", ⟨r4, 7⟩ and ⟨r5, 8⟩. We increment the cursor of the list that produces the top element, push it into the heap, and retrieve the next top element: ⟨r5, 8⟩. Based on the accessed elements, we have: 1) the score of record r5 is 9 + 8 + 8 = 25; 2) the maximal score of record r3 is 7 + 8 + 9 = 24, and that of r4 is 7 + 9 + 8 = 24, while those of other records are at most 7 + 8 + 8 = 23. Thus, record r5 is the best answer.

Figure 5: Benefits of materializing the union list U(v) for node v with respect to partial keyword w_m. Legend: M(v): materialized descendants of v; T(v): subtrie of v; N(v): other leaf nodes (of v) without materialized ancestors.

List Materialization: We can further improve the performance of sorted access for the partial keyword w_m by precomputing and storing the unions of some of the inverted lists on the trie. Let v be a trie node, and U(v) be the union of the inverted lists of v's leaf nodes, sorted by their record weights. If a record appears more than once on these lists, we choose its maximal weight as its weight on list U(v). For example, U("li") = {⟨r3, 9⟩, ⟨r5, 8⟩, ⟨r7, 8⟩, ⟨r4, 7⟩, ⟨r6, 5⟩, ⟨r9, 4⟩, ⟨r2, 3⟩, ⟨r8, 1⟩}.
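The heap-based sorted access and the union list U(v) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the posting lists are an assumed reading of the running example (several entries in the figures are garbled in this transcription), the name sorted_access is hypothetical, and Python's heapq is a min heap, so weights are negated to simulate a max heap.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Posting = Tuple[str, int]  # (record ID, weight)

# Inverted lists sorted by descending weight (assumed values for illustration).
INVERTED: Dict[str, List[Posting]] = {
    "lin": [("r3", 9), ("r7", 8), ("r6", 4), ("r5", 3), ("r4", 2)],
    "liu": [("r5", 8), ("r4", 7), ("r6", 5), ("r9", 4), ("r2", 3), ("r8", 1)],
    "lui": [("r1", 6), ("r3", 4)],
}


def sorted_access(lists: List[List[Posting]]) -> Iterator[Posting]:
    """Lazily yield (record, weight) pairs of the union list in descending
    weight order. A record is reported once, with its maximal weight."""
    heap = []
    for li, lst in enumerate(lists):
        if lst:  # one cursor per list, starting at position 0
            rid, w = lst[0]
            heapq.heappush(heap, (-w, rid, li, 0))
    seen = set()
    while heap:
        neg_w, rid, li, pos = heapq.heappop(heap)
        if pos + 1 < len(lists[li]):  # advance the cursor of the popped list
            nrid, nw = lists[li][pos + 1]
            heapq.heappush(heap, (-nw, nrid, li, pos + 1))
        if rid not in seen:  # first occurrence carries the maximal weight
            seen.add(rid)
            yield rid, -neg_w


# Virtual sorted list of partial keyword "l": only as much as is consumed is computed.
virtual_l = sorted_access([INVERTED["lin"], INVERTED["liu"], INVERTED["lui"]])
print(next(virtual_l))

# Materializing U("li") simply drains the same iterator once and stores the result.
u_li = list(sorted_access([INVERTED["lin"], INVERTED["liu"]]))
print(u_li)
```

The generator form mirrors the point that the union list is virtual: a top-k algorithm pulls elements one at a time, and a materialized list is the special case where the iterator is drained ahead of query time.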
When using a max heap to retrieve records sorted by their scores for the partial keyword, this materialized list could help us build a max heap with fewer lists and reduce the cost of push/pop operations on the heap. Therefore, this method allows us to utilize additional memory space to answer top-k queries more efficiently. For instance, consider the index in Figure 1 and a query "icdm g". For the partial keyword "g", we access its data keywords "graph", "gray", "gross", and "group", and build a max heap on their inverted lists based on record scores with respect to this query keyword. If we materialize the union lists of "gra" and "gro", we can use their materialized lists, saving the time to traverse the four leaf nodes and some push/pop operations on the heap.

We next give a detailed cost-based analysis to quantify the benefit of materializing a node on the performance of operations on the max heap of w_m, for exact type-ahead search. Let B be a budget of storage space we are given to materialize union lists. Given a trie node v, let U(v) be the union of inverted lists of leaf nodes in the subtrie of v. Our goal is to select trie nodes to materialize their union lists for maximizing the performance of queries. The following are naive algorithms for choosing trie nodes:
Random: We randomly select trie nodes.
TopDown: We select nodes top down from the trie root.
BottomUp: We select nodes bottom up from leaf nodes.
Each naive approach keeps choosing trie nodes to materialize their union lists until the sum of their list sizes reaches the space limit B. One main limitation of these approaches is that they do not quantitatively consider the benefits of materializing a union list. To overcome this limitation, we propose a cost-based method called CostBased to do list materialization. Its main idea is the following. For simplicity we say a node has been "materialized" if its union list has been materialized. For a query Q with a prefix keyword w_m, suppose some of the trie nodes have their union lists materialized. Let v be such a materialized node. If we can use U(v) to construct the heap of w_m, we need not visit v's descendants and access the inverted lists of v's leaf descendants, and thus achieve the benefit of reducing the time of traversing the subtrie rooted at v and of push/pop operations on the max heap of w_m. We say the materialized node v is usable for partial keyword w_m.

Next we discuss how to check whether a node v is usable for partial keyword w_m. If v is not a descendant of w_m, materializing v is unusable to w_m; otherwise, if no node on the path from v to w_m (including w_m) has been materialized, materializing v is usable to w_m. Notice that if v has a materialized ancestor v' on the path from v to w_m, then we can use the materialized list U(v') instead of U(v), and the list U(v) will no longer be usable to w_m. To summarize, a materialized node v is usable for partial keyword w_m if:
1. v is a descendant of w_m; and
2. v has no materialized ancestor between v and w_m.
For example, consider a query "icdm g". Materializing node "l" is unusable for partial keyword "g" as "l" is not a descendant of "g". Materializing "gr" is usable for "g" if "g" is not materialized. If "g" is materialized, then materializing "gra" is unusable for "g" as we will use the materialized list of "g" to build the max heap of "g", instead of using "gra".

If v is usable for w_m, materializing U(v) has the following benefits for the heap of w_m. (1) We do not need to traverse the trie to access these leaf nodes and use them to construct the max heap; (2) each push/pop operation on the heap is more efficient since the heap has fewer lists. Here we present an analysis of the benefits of materializing the usable node v.
In general, for a trie node v, let T(v) denote its subtrie and |T(v)| denote the number of nodes in T(v). The total time of traversing this subtrie is O(|T(v)|). Now we analyze the benefit of materializing node v. As illustrated in Figure 5, suppose v has materialized descendants. Let M(v) be the set of highest materialized descendants of v. These materialized nodes can help reduce the time of accessing the inverted lists of v's leaf nodes in two ways. First, we do not need to traverse the descendants of a materialized node d ∈ M(v). We can just traverse |T(v)| - \sum_{d ∈ M(v)} |T(d)| trie nodes. Second, when inserting lists to the max heap of w_m, we insert the union list of v into the heap and need not insert the union list of each d ∈ M(v) and the inverted lists of d ∈ N(v) into the heap, where N(v) denotes the set of v's leaf descendants having no ancestors in M(v). Let S(v) = M(v) ∪ N(v). We quantify the benefits of materializing node v:
1. Reducing traversal time: Since we do not traverse v's descendants, the time reduction is B_1 = O(|T(v)| - \sum_{d ∈ M(v)} |T(d)|).
2. Reducing heap-construction time: When constructing the max heap for keyword w_m, we insert the union list U(v) into the heap, instead of the inverted lists of those nodes in S(v). The time reduction is B_2 = |S(v)|.
3. Reducing sorted-access time: If we insert the union list U(v) to the max heap of w_m, the number of leaf nodes in the heap is |S(w_m)|. Otherwise, it is |S(w_m)| + |S(v)|. The time reduction of a sorted access is B_3 = O(\log(|S(w_m)| + |S(v)|)) - O(\log(|S(w_m)|)).
The following is the overall benefit of materializing v for the partial keyword w_m:

B_v = B_1 + B_2 + A_v \cdot B_3,    (3)

where A_v is the number of sorted accesses on U(v). A_v can be computed using the number of records in the union list U(v) and the number of keywords in the query. The analysis above is on a query workload. If there is no query workload, we can use the trie structure to count the probability of each node to be queried and use such information to compute the benefit of materializing a node.
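To make Equation (3) concrete, the following sketch computes M(v), N(v), S(v) and a numeric stand-in for B_v on a toy trie. It rests on several assumptions that are not in the paper: the big-O terms are replaced by raw counts and base-2 logarithms, |S(w_m)| and A_v are passed in as plain numbers, no data keyword is a prefix of another, and all class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    is_leaf: bool = False        # end of a complete keyword
    materialized: bool = False   # union list U(node) is stored


def insert(root: Node, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.is_leaf = True


def find(root: Node, prefix: str) -> Node:
    node = root
    for ch in prefix:
        node = node.children[ch]
    return node


def subtrie_size(v: Node) -> int:
    """|T(v)|: number of nodes in the subtrie rooted at v."""
    return 1 + sum(subtrie_size(c) for c in v.children.values())


def highest_materialized(v: Node) -> List[Node]:
    """M(v): highest materialized proper descendants of v."""
    out: List[Node] = []
    for c in v.children.values():
        if c.materialized:
            out.append(c)
        else:
            out.extend(highest_materialized(c))
    return out


def uncovered_leaves(v: Node) -> List[Node]:
    """N(v): leaf descendants of v with no materialized ancestor below v."""
    out: List[Node] = []
    for c in v.children.values():
        if c.materialized:
            continue
        if c.is_leaf:
            out.append(c)
        out.extend(uncovered_leaves(c))
    return out


def benefit(v: Node, s_wm: int, a_v: float) -> float:
    """Numeric stand-in for B_v = B_1 + B_2 + A_v * B_3 (constants dropped)."""
    m_v = highest_materialized(v)
    s_v = len(m_v) + len(uncovered_leaves(v))                     # |S(v)|
    b1 = subtrie_size(v) - sum(subtrie_size(d) for d in m_v)      # traversal saved
    b2 = s_v                                                      # heap insertions saved
    b3 = math.log2(s_wm + s_v) - math.log2(s_wm)                  # per sorted access
    return b1 + b2 + a_v * b3


root = Node()
for word in ["graph", "gray", "gross", "group"]:
    insert(root, word)

gra = find(root, "gra")
print(benefit(gra, s_wm=3, a_v=10))
```

A CostBased-style selection could rank candidate nodes by such a score per unit of list size until the budget B is used up; that selection loop is not shown because the paper text here does not spell it out.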
In this paper, we employ a no-query-workload setting.

4. FUZZY TYPE-AHEAD SEARCH
In this section, we first define the problem of top-k queries in fuzzy type-ahead search [3]. We then develop new techniques to support efficient list access to answer such queries by extending techniques developed for exact search.

4.1 Ranking
As a user types in a query letter by letter, fuzzy type-ahead search finds on the fly the records with words similar to the query keywords. For example, consider the data in Table 1. Suppose a user types in a query "graph grose". We return r5 as a relevant answer since it has a keyword "gross" similar to the query keyword "grose". We use edit distance to measure the similarity between strings. Formally, the edit distance between two strings s_1 and s_2, denoted by ed(s_1, s_2), is the minimum number of single-character edit operations (i.e., insertion, deletion, and substitution) needed to transform s_1 to s_2. For example, ed(gross, grose) = 1.

Similarity Function: Let π be a function that computes the similarity between a data string s and a query keyword w in Q = w_1, w_2, ..., w_m. An example is:

π(s, w) = 1 - \frac{ed(s, w)}{|w|},

where |w| is the length of the query keyword w. We normalize the edit distance based on the query-keyword length in order to allow more errors for longer query keywords. Our results in the paper focus on this function, and they can be generalized to other functions using edit distance.

Let d be a keyword in the data set D. For each complete keyword w_i (i = 1, 2, ..., m - 1) in the query, we define the similarity of d to w_i as:

Sim(d, w_i) = π(d, w_i).

Since the last keyword w_m is treated as a prefix condition, we define the similarity of d to w_m as the maximal similarity of d's prefixes using function π, i.e.:

Sim(d, w_m) = \max_{\text{prefix } p \text{ of } d} \{ π(p, w_m) \}.

Let τ be a similarity threshold. We say a keyword d in D is similar to a query keyword w if Sim(d, w) ≥ τ. We say a prefix p of a keyword in D is similar to the query keyword w_m if π(p, w_m) ≥ τ.
We want to find the keywords in the data set that are similar to query keywords, since records with such a keyword could be of interest to the user.
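The similarity definitions above translate directly into code. The sketch below is a straightforward, unoptimized rendering: it uses the textbook dynamic-programming edit distance and tries every prefix, whereas the paper relies on incremental trie-based computation. The function names and the threshold value 0.45 in the usage lines are illustrative assumptions.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to transform s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def pi(s: str, w: str) -> float:
    """pi(s, w) = 1 - ed(s, w) / |w|, normalized by the query-keyword length."""
    return 1 - edit_distance(s, w) / len(w)


def sim_complete(d: str, w: str) -> float:
    """Similarity of data keyword d to a complete query keyword w."""
    return pi(d, w)


def sim_partial(d: str, w: str) -> float:
    """Similarity of d to the partial keyword w: best over d's non-empty prefixes."""
    return max(pi(d[:i], w) for i in range(1, len(d) + 1))


tau = 0.45  # sample threshold, an assumption for this usage example
print(sim_complete("gross", "grose") >= tau)
print(sim_partial("lui", "li") >= tau)
print(sim_partial("graph", "li") >= tau)
```

Because π divides by the query-keyword length, a short partial keyword such as "li" tolerates only one edit at this threshold, which is why single-letter prefixes can already qualify as similar.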

Figure 6: Keywords similar to those in query Q = w_1, w_2, ..., w_m. Each query keyword w_i has similar keywords on leaf nodes. The last prefix keyword w_m has similar prefixes.

Let Φ(w_i) (i = 1, ..., m) denote the set of keywords in D similar to w_i, and P(w_m) denote the set of prefixes (of keywords in D) similar to w_m. We compute the top-k answers to the query Q in two steps. In the first step, for each keyword w_i in the query, we first compute an edit-distance upper bound based on the similarity function, i.e., (1 - τ) · |w_i|, and then compute the similar keywords Φ(w_i) and similar prefixes P(w_m) on the trie (shown in Figure 6). Ji et al. [3] developed an efficient algorithm for incrementally computing these similar strings as the user modifies the current query. A similar algorithm is developed in [5]. In the second step, we access the inverted lists of these similar data keywords to compute the k best answers.

For example, assume a user types in a query "grose li" letter by letter on the data shown in Table 1. Suppose the similarity threshold τ is 0.45. The set of prefixes similar to the partial keyword "li" is P("li") = {l, li, lin, liu, lu, lui, i}, and the set of data keywords similar to the partial keyword "li" is Φ("li") = {lin, liu, lui, icdl, icdm}. In particular, "lui" is similar to "li" since Sim(lui, li) = 1 - ed(lui, li)/|li| = 0.5 ≥ τ. The set of similar words for the complete keyword "grose" is Φ("grose") = {gross}. Then we compute top-k answers using the inverted lists of those words in Φ("grose") and Φ("li").

Ranking: We still assume the ranking function has the first property described in Section 2, which computes the score F(r, Q) by applying a monotonic function on the F(r, w_i)'s for all the keywords w_i in the query. Given a complete keyword w_i and a record r, for exact search, we can use the weight of w_i in r, i.e., W(r, w_i), to denote their relevancy F(r, w_i).
But for fuzzy search, the keyword w_i can be similar to multiple keywords in the record r, and different similar words have different similarities to w_i and different weights in r. A question is how to compute the relevance value of keyword w_i in record r, F(r, w_i). Let d be a keyword in record r such that d is similar to the query keyword w_i, i.e., d ∈ Φ(w_i). We use F(r, w_i, d) to denote the relevance of this query keyword w_i in the record with respect to keyword d. The value should depend on both the weight of d in r, i.e., W(r, d), as well as the similarity between w_i and d, i.e., Sim(d, w_i). Intuitively, the more similar they are, the more relevant w_i is to r in terms of d. For instance, F(r, w_i, d) = Sim(d, w_i) · W(r, d) is an example ranking function to evaluate the relevancy of w_i in the record r with respect to keyword d. We use the following function with the second property in Section 2 to compute F(r, w_i):

F(r, w_i) = \max_{\text{keyword } d \text{ (in } r\text{) similar to } w_i} \{ F(r, w_i, d) \}.    (4)

4.2 Efficient Random Access
We first study how to support efficient random access for fuzzy type-ahead search. For simplicity, in the discussion we focus on how to verify whether the record has a keyword with a prefix similar to the partial keyword w_m. With minor modifications the discussion extends to the case where we want to verify whether r has a keyword similar to a complete keyword w_i (1 ≤ i ≤ m - 1).

In each random access, given the ID of a record r, we want to retrieve information related to a query keyword w_i, which allows us to retrieve W(r, d) for each similar word d of w_i so as to compute the score F(r, w_i). In particular, for a keyword w_i in the query, does the record r have a keyword similar to w_i? One naive way to get the information is to retrieve the original record and go through its keywords. This approach has two limitations. First, if the data is too large to fit into memory and has to reside on hard disks, accessing the original data from the disks may slow down the process significantly. This costly operation will prevent us from achieving an interactive-search speed.
The second limitation is that it may require a lot of computation of string similarities based on edit distance, which could be time consuming. In this section, we present two efficient approaches for solving this problem.

Method 1: Probing on Forward Lists: This method verifies whether record $r$ contains a keyword with a prefix similar to $w_m$ as follows. For each prefix $p$ on the trie similar to $w_m$ (computed in the first step of the algorithm as discussed above), we check if there is a keyword ID on the forward list of $r$ in the keyword range $[l_p, u_p]$ of the trie node of $p$, as discussed in Section 3.

Method 2: Probing on Trie Leaf Nodes: Using this method, for each prefix $p$ similar to $w_m$, we traverse the subtrie of $p$ and identify its leaf nodes. For each leaf node $d$, we store the fact that for the query $Q$, this keyword $d$ has a prefix similar to $w_m$ in the query. Specifically, we store ⟨Query ID, partial keyword $w_m$, $Sim(p, w_m)$⟩. We store the query ID in order to differentiate it from other queries in case multiple queries are answered concurrently. We store the similarity between $w_m$ and $p$ to compute the score of this keyword in a candidate record. In case the leaf node has several prefixes similar to $w_m$, we only keep their maximal similarity to $w_m$. For each complete keyword $w_i$, we also store the same information for those trie nodes similar to $w_i$. Therefore, a leaf node might have multiple entries corresponding to different keywords in the same query. We call these entries of the leaf node its collection of relevant query keywords. Notice that this structure needs very little storage space, since the entries of old queries can be quickly reused by new queries, and the number of keywords in a query tends to be small.

We use this additional information to efficiently check if a record $r$ contains a complete word with a prefix similar to the partial keyword $w_m$. We scan the forward list of $r$. For each of its keyword IDs, we locate the corresponding leaf node, and test whether its collection of relevant query keywords includes this query and the keyword $w_m$. If so, we use the stored string similarity to compute the score of this keyword in the query.

Figure 7: Probing on trie leaf nodes.

Figure 7 shows how we use this method in our running example, where the user types in a keyword query q1 = ⟨lin, gose⟩. When computing the similar words of gose, i.e., goss, we insert the query ID (shown as q1), the partial keyword gose, and the corresponding prefix similarity into its collection of relevant query keywords. To verify whether record r5 has a word with a prefix similar to gose, we scan its forward list. Its third keyword is goss. We access its corresponding leaf node, and see that the node's collection of relevant query keywords includes gose. Thus we know that r5 indeed contains a keyword similar to gose, and can retrieve the corresponding prefix similarity.

Comparison: The time complexity of the forward-list based method (Method 1) is $O(G \cdot \log(|r|))$, where $G$ is the total number of similar prefixes of $w_m$ and similar complete words of the $w_i$'s for $1 \le i \le m-1$, and $|r|$ is the number of distinct keywords in record $r$. Since the similar prefixes of $w_m$ could have ancestor-descendant relationships, we can optimize the step of accessing them by considering only the highest ones. The time complexity of the second method is

$$O\Big(\sum_{\text{similar prefix } p \text{ of } w_m} |T(p)| + |Q|\Big).$$

The first term corresponds to the time of traversing the subtries of similar prefixes, where $T(p)$ is the subtrie rooted at a similar prefix $p$. The second term corresponds to the time of probing the leaf nodes, where $|Q|$ is the number of query keywords. Notice that to identify the answers, we need to access the inverted lists of complete words anyway, thus the first term can be removed from the complexity.
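As a rough illustration of Method 1, the following Python sketch probes a forward list with binary search and combines the result with the example ranking function $Sim(d, w_i) \cdot W(r, d)$ and the max aggregation of Equation 4. The function name `probe_forward_list`, the tuple layouts, and the choice to scan every keyword inside a matching range (rather than only testing for existence) are assumptions made for this example, not details taken from the paper.

```python
from bisect import bisect_left


def probe_forward_list(forward_list, similar_prefixes):
    """Return the best Sim * W score of a record for one query keyword.

    forward_list: list of (keyword_id, weight) pairs sorted by keyword_id
        (hypothetical layout of a record's forward list).
    similar_prefixes: list of (lo, hi, sim) triples, where [lo, hi] is the
        keyword-ID range of a trie node whose prefix is similar to the query
        keyword and sim is that prefix similarity (hypothetical layout).
    Returns 0.0 when the record has no keyword in any of the ranges.
    """
    # In a real index the sorted ID array would be stored once per record;
    # it is rebuilt here only to keep the sketch self-contained.
    ids = [kid for kid, _ in forward_list]
    best = 0.0
    for lo, hi, sim in similar_prefixes:
        # Binary search for the first keyword ID >= lo.
        pos = bisect_left(ids, lo)
        # Every ID in [lo, hi] is a keyword below this similar prefix.
        while pos < len(ids) and ids[pos] <= hi:
            best = max(best, sim * forward_list[pos][1])
            pos += 1
    return best
```

For example, `probe_forward_list([(2, 9.0), (5, 2.0), (8, 3.0)], [(5, 6, 1.0), (7, 9, 0.5)])` finds keyword 5 in the first range and keyword 8 in the second, and keeps the larger of the two products.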
Method 1 is preferred for data sets where records have a lot of keywords, such as long documents, while Method 2 is preferred for data sets where records have a small number of keywords, such as relational tables with relatively short attribute values.

4.3 Efficient Sorted Access

Heap-Based Method: For a query keyword $w$, we want to support sorted access that can access record IDs based on the relevance of $w$ to these records. As $w$ has multiple similar words, we can support sorted access efficiently by building a max heap on the inverted lists of such similar words, as described in Section 3. Notice that, in exact search, each leaf node has the same similarity to $w$; but for fuzzy search, different leaf nodes could have different similarities. Thus, when pushing a record $r$ from an inverted list of a similar word $d$ to the heap, we maintain ⟨$r$, $F(r, d)$⟩ in the heap. We push/pop the record on the heap with the maximal $F(r, d)$.

Consider the query icdm li. Figure 8 shows the two heaps for the two keywords. For illustration purposes, for each keyword we also show the virtual merged list of records with their scores; this list is only partially computed during the traversal of the underlying lists. Each record on a heap has an associated score of this keyword with respect to the query keyword, computed using Equation 4.

Figure 8: Max heaps for the query keywords icdm and li. Each shaded list is merged from the underlying lists. It is virtual since we do not need to compute the entire list.

List Pruning: As there may be a large number of similar words for a query keyword, especially for the partial keyword, it could be expensive to construct a heap on the fly.
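The heap-based sorted access described above can be sketched in Python as a lazy merge over the inverted lists of the similar words. This is only an illustration under stated assumptions: the name `sorted_access`, the `(sim, postings)` input layout, and the use of $Sim \cdot W$ as the per-word score are choices made for the example. Because records come out in descending score order, the first time a record ID is produced it carries its maximum score over all similar words, which matches Equation 4.

```python
import heapq


def sorted_access(inverted_lists):
    """Lazily yield (score, record_id) pairs in descending score order.

    inverted_lists: list of (sim, postings) pairs, one per similar word,
        where postings is a list of (record_id, weight) sorted by weight
        in descending order (hypothetical layout).
    """
    heap = []
    # Seed the heap with the head of every non-empty list.
    # heapq is a min heap, so scores are negated to simulate a max heap.
    for idx, (sim, postings) in enumerate(inverted_lists):
        if postings:
            heapq.heappush(heap, (-sim * postings[0][1], idx, 0))

    seen = set()
    while heap:
        neg_score, idx, pos = heapq.heappop(heap)
        sim, postings = inverted_lists[idx]
        # Advance the cursor of the list we just popped from.
        if pos + 1 < len(postings):
            heapq.heappush(heap, (-sim * postings[pos + 1][1], idx, pos + 1))
        record_id = postings[pos][0]
        # Later occurrences of the same record have lower scores; skip them.
        if record_id not in seen:
            seen.add(record_id)
            yield -neg_score, record_id
```

Since the function is a generator, a top-k algorithm can stop pulling records as soon as its threshold condition holds, so the merged list stays virtual and is never fully computed.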
We further improve the performance of sorted access on the virtual sorted list $U(w)$ by using the idea of on-demand heap construction, i.e., we want to avoid constructing a heap for all the inverted lists of keywords similar to a query keyword. Suppose $w$ has $t$ similar words. Each push/pop operation on the heap of these lists takes $O(\log(t))$ time. If we can reduce the number of lists on the heap, we can reduce the cost of its push/pop operations. We have two observations about this pruning method. (1) As a special case, if those keywords matching query keywords exactly have the highest relevance scores, this method allows us to consider these records prior to considering other records with mismatching keywords. (2) The pruning can be more powerful if $w$ is the last partial keyword $w_m$, since many of its similar keywords share the same prefix $p$ on the trie.

Consider the query icdm li. Figure 8 illustrates how we can prune low-score lists and do on-demand heap construction. The prefix li has several similar keywords. Among them, the two words lin and liu have the highest similarity value to the query keyword, mainly because they have a prefix matching the keyword exactly. We build a heap using these two lists. To compute the top-1 answer, the lists of lui, icdm, and icdl are never included in the heap since their upper bounds are always smaller than the scores of popped records before the traversal terminates.

We next introduce how to do list pruning for the max-heap based methods in fuzzy type-ahead search. Given a keyword $w$, let $d_1, \ldots, d_t$ be its similar words and $L_1, \ldots, L_t$ be the corresponding inverted lists, respectively. We need not use all the inverted lists to build the max heap of $w$. Instead, we use those with higher similarities to $w$ to build the max heap on demand. We first sort these inverted lists based on the similarities of their keywords to $w$; without loss of generality, suppose $Sim(d_1, w) > \ldots > Sim(d_t, w)$. We first construct the max heap using the lists with the highest similarity values and then include other lists on demand.
Suppose $L_i$ is a list not included in the heap so far. We can derive an upper bound $u_i$ on the score of a record from $L_i$ (with respect to the query keyword $w$) using the largest weight on the list and the string similarity $Sim(d_i, w)$. Let $r$ be the top record on the heap, with a score $F(r, w)$. If $F(r, w) \ge u_i$, then this list does not need to be included in the heap, since it cannot have a record with a higher score. Otherwise, this list needs to be included in the heap. Based on this analysis, each time we pop a record from the heap and push a new record $r$, we compare the score of the new record with the upper bounds of those lists not included in the heap so far. Those lists with an upper bound greater than this score need to be included in the heap from now on. Notice that this checking can be done very efficiently by storing the maximal value of these upper bounds, and ordering these lists based on their upper bounds.

The pruning power can be even more significant if the keyword $w$ is the partial keyword $w_m$, since many of its similar keywords share the same prefix $p$ on the trie similar to $w_m$. We can compute an upper bound of the record scores from these lists and store the bound on the trie node $p$. In this way, we can prune the lists more effectively by comparing the value $F(r, w)$ with this upper bound stored on the trie, without needing to compute the bound on the fly.

List Materialization: For fuzzy search, the partial keyword $w_m$ has multiple similar prefixes and each similar prefix has multiple similar words. The max heap of $w_m$ is built on top of the inverted lists of such similar words. Let $d$ be such a similar word. Recall that the value $F(r, w_m, d)$ of a record $r$ on the list of a similar word $d$ with respect to $w_m$ is based on both $W(r, d)$ and $Sim(d, w_m)$. Let $v$ be a materialized node. To use $U(v)$ to replace the lists of $v$'s leaf nodes in the max heap, the following two conditions need to be satisfied:

(1) All the leaf nodes of $v$ have the same similarity to $w_m$.
(2) All the leaf nodes of $v$ are similar to $w_m$, i.e., their similarity to $w_m$ is no less than the threshold $\tau$.

When the conditions are satisfied, the sorting order of the union list $U(v)$ is also the order of the scores of the records on the leaf-node lists with respect to $w_m$.
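The list-pruning rule from the discussion above (a list is admitted to the heap only once its upper bound exceeds the score at the top of the heap) can be sketched as follows. This is a minimal Python illustration under stated assumptions: `top_k_with_pruning`, the `(sim, postings)` layout, and the returned count of opened lists are hypothetical names and choices for this example. The upper bound of a list is taken as its similarity times its largest weight, and unopened lists are kept ordered by that bound.

```python
import heapq


def top_k_with_pruning(inverted_lists, k):
    """Return (top-k [(score, record_id)], number of lists opened).

    inverted_lists: list of (sim, postings) pairs, one per similar word,
        with postings a list of (record_id, weight) sorted by weight in
        descending order (hypothetical layout).
    """
    # Upper bound u_i = Sim(d_i, w) * largest weight on L_i; since postings
    # are weight-sorted, the bound equals the score of the list's head.
    pending = sorted(
        ((sim * postings[0][1], idx)
         for idx, (sim, postings) in enumerate(inverted_lists) if postings),
        reverse=True,
    )
    heap, seen, results, opened = [], set(), [], 0

    while len(results) < k and (heap or opened < len(pending)):
        # Admit unopened lists whose bound beats the current heap top.
        while opened < len(pending) and (
            not heap or pending[opened][0] > -heap[0][0]
        ):
            bound, idx = pending[opened]
            heapq.heappush(heap, (-bound, idx, 0))
            opened += 1

        neg_score, idx, pos = heapq.heappop(heap)
        sim, postings = inverted_lists[idx]
        if pos + 1 < len(postings):
            heapq.heappush(heap, (-sim * postings[pos + 1][1], idx, pos + 1))
        record_id = postings[pos][0]
        if record_id not in seen:  # first pop carries the record's max score
            seen.add(record_id)
            results.append((-neg_score, record_id))

    return results, opened
```

With three lists whose bounds are 9, 6, and 1.5, asking for the top 2 records opens only the first list, because both of its records score above the bounds of the other two lists.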
A materialized node $v$ that satisfies the two conditions must be a descendant of a similar prefix of the partial keyword $w_m$. We can prove this by contradiction. Suppose node $v$ is not a descendant of any similar prefix of $w_m$. Then node $v$ and its ancestors are not similar prefixes of $w_m$, that is, the leaf nodes of $v$ are not similar keywords of $w_m$. This contradicts the second condition. Thus a materialized node $v$ that satisfies the two conditions must be a descendant of a similar prefix of $w_m$.

Suppose $p_1, p_2, \ldots, p_n$ are the similar prefixes of $w_m$. We check whether their materialized descendants satisfy the two conditions as follows. Consider a materialized node $v$ which has ancestors among $p_1, p_2, \ldots, p_n$. If node $v$ has no descendants that are similar prefixes of $w_m$, $v$ must satisfy the two conditions; otherwise suppose $p_j$ is a descendant of $v$ that is a similar prefix of $w_m$ and has the largest similarity to $v$ among all such descendants. Without loss of generality, let $p_i$ be an ancestor of $v$ that has the largest similarity with $v$ among all similar prefixes. If $Sim(v, p_j) \le Sim(v, p_i)$, $v$ satisfies the two conditions; otherwise $v$ does not. Thus we can find usable materialized nodes to construct the max heap of $w_m$, and use the cost-based analysis proposed earlier to select high-quality nodes for materialization.

5. EXPERIMENTS

We implemented our proposed techniques and compared them with existing methods on three real data sets. (1) DBLP: It included computer science publication records. (2) URL: It included million URLs. (3) Enron: It was an email collection. Table 2 shows details of the data.

Table 2: Data sets and index costs.
Data Set: URL, DBLP, Enron
# of Records (millions): .5
Data size: . GB, 5 MB, .4 GB
Avg. # of words/record:
# of distinct keywords (millions):
Trie size: 42 MB, 3 MB, 28 MB
Size of inverted lists: 379 MB, 83 MB, 342 MB

For the DBLP data set, we selected real queries from the logs of our deployed systems and each query contained -6 keywords (see footnote 7).
For the other two data sets, we generated queries with keywords randomly selected from the set of words used in the collection. We assumed the letters of a query were typed in one by one. For each keystroke, we measured the time of computing the top-k answers to this query. For exact search, we measured the total running time. For fuzzy search, we measured the time in two steps: in step 1 we computed keywords on the trie similar to the query keywords (using the algorithm described in [13]); in step 2 we found the top-k answers using the inverted lists of these similar keywords. Unless otherwise specified, k = . We compared our method with the state-of-the-art method [13]. We implemented the NRA algorithm described in [6] for the case where we only do sorted access, and the Threshold Algorithm ("TA") for the case where we can do both sorted access and random access. All the indexes were built offline, pre-loaded, and fully resident in memory during all querying operations. All experiments were run on an Ubuntu Linux machine with an Intel Core processor (X545 3.GHz and 4 GB RAM).

5.1 Exact Search

Sorted Access Only: We implemented the following methods. (1) BinaryProbe [13]: We considered the inverted lists of the complete query keywords, and the union of the inverted lists for the complete keywords of the partial keyword. We chose the shortest list, and for each of its record IDs, we did binary probing on the other lists. (2) NRA(Heap): We implemented the NRA algorithm using the heap-based technique. (3) NRA(Heap+Materialization) (see footnote 8): We implemented the NRA algorithm using the heap-and-materialization-based techniques. Figure 9 shows the results on the Enron dataset, which show that our method improved search efficiency. For instance, for queries with a partial keyword of length 2, NRA(Heap) reduced the query time of BinaryProbe from 28 ms to ms. NRA(Heap+Materialization) further reduced the time to 2 ms. This is because (1) BinaryProbe first computed all results and then ranked them, and (2) BinaryProbe computed the union list of the partial keyword on the fly.
NRA(Heap) used the max heap to generate a sorted partial list, and NRA(Heap+Materialization) used materialized lists to save push/pop operations on the heap.

Figure 9: Exact search using sorted access (Enron). (a) Varying data size; (b) varying prefix length. Axes: query time (ms) versus # of records (*K) and length of the prefix keyword.

Sorted Access + Random Access: We implemented the following methods. (1) BinaryProbe(Forward List) [13]: We chose the shortest list, and for each of its record IDs, we verified whether the record contained the other keywords using the forward list. (2) TA(Forward List+Heap): We implemented the TA algorithm using the forward list for random access and the max heap for sorted access. (3) TA(Forward List+Heap+Materialization): We implemented the TA algorithm using the forward list, max heap, and list materialization. Figure 10 shows the results on the DBLP dataset. We can see that the random-access techniques indeed improved efficiency.

Footnote 7: Details are omitted due to double-blind review.
Footnote 8: We used additional 5% space with respect to the inverted index for materialization in the experiments.

5.2 Fuzzy Search

Sorted Access Only: We first evaluated the effect of the list-pruning technique. Figure 11 shows the experimental results (including the two steps). We can observe that list pruning indeed improved search efficiency. For the Enron dataset with .5M records, the method with pruning can reduce the time from 3 ms to 7 ms. The pruning technique was more effective on the Enron dataset than on the other two datasets, mainly for two reasons. First, the Enron dataset had more trie nodes due to its large number of distinct keywords in the emails. Thus a query keyword can have more similar prefixes on the trie. Second, the Enron dataset had fewer records, and the inverted lists were relatively shorter. During the list traversal, the NRA algorithm visited fewer records, and the higher score of the top record from the max heap helped us prune more lists.

List Materialization: We evaluated the improvement on sorted access using list materialization for fuzzy type-ahead search. We measured the amount of storage space for storing materialized lists as a percentage of the total size of the inverted lists on the trie. We varied this amount, and measured the average time of finding the top-k answers using the NRA algorithm. Figure 12 shows the results.
We can see that list materialization improved the search performance. We implemented the different methods for list materialization, namely Random, TopDown, BottomUp, and CostBased, as discussed earlier. Figure 13 shows the results. Among the three naive methods, Random gave the best results. The CostBased algorithm outperformed all the naive methods. This is because CostBased selected high-quality nodes for materialization using a cost-based analysis.

Sorted Access + Random Access: We implemented the TA algorithm using the two methods for random access and list pruning for sorted access (described in Section 4). Figure 14 shows the scalability results on the three datasets. The two random-access methods scaled well. Method 2 (probing on trie leaf nodes) outperformed Method 1 (probing on forward lists). This is because for the three data sets, there were many prefixes similar to the partial keyword, and Method 1 needed to consider all similar prefixes for each record on forward lists.

Figure 10: Exact search using random access (DBLP). (a) Varying data size; (b) varying prefix length. Axes: query time (ms) versus # of records (*K) and length of the prefix keyword.

6. RELATED WORK

There are many studies on autocomplete and phrase prediction for user queries [22, 5, 9, 23, 7]. Google instant search was launched to support type-ahead search. It first suggested relevant queries based on user profiles and query logs and then answered the top queries. Chaudhuri et al. [5] studied how to find similar strings interactively as users type in a query string, using an approach similar to that in [13, 20]. They did not study the case where a query has multiple keywords that need list-intersection operations. The search paradigm studied in this paper is different since we support fuzzy, full-text search as users type in queries. Bast et al. proposed techniques to support type-ahead search in their CompleteSearch systems [2, 3, 1]. Another study [19] is about type-ahead search on relational data graphs. Ji et al. [13] developed algorithms for fuzzy type-ahead search.
Our work extends these studies by developing efficient algorithms to support top-k search. Khoussainova et al. [14] proposed to suggest relevant SQL snippets as users type in SQL queries. Li et al. [18] studied how to use SQL to support type-ahead search in databases. Feng et al. [8] studied fuzzy search on XML data.

There have been many studies on supporting fuzzy search (e.g., [10, 17, 4, 11, 24, 16]). However, these algorithms are inefficient for type-ahead search since they have low pruning power for short strings (partial keywords). The experiments in [13, 5] showed that these approaches are not as efficient as trie-based methods for fuzzy type-ahead search. Theobald et al. [25] proposed a heap-based method for query expansion. They used WordNet words and only utilized sorted access. We consider both sorted access and random access.

7. CONCLUSION

In this paper we studied how to efficiently answer top-k queries in type-ahead search. We focused on an index structure with a trie of keywords in a data set and inverted lists of records on the trie leaf nodes. We studied technical challenges when adopting existing top-k algorithms in the literature: how to efficiently support random access and sorted access on inverted lists? We presented two algorithms for supporting random access, and proposed optimization techniques using list pruning and materialization to support sorted access. Our techniques can be easily extended to support large datasets through data partitioning. For example, we have built a system to search on 2 million MEDLINE publication records using two machines.

Acknowledgement. The authors have financial interest in Bimaple Technology Inc., a company currently commercializing some of the techniques described in this publication. Chen Li is partially supported by the NIH grant R2LM43-A and the National Natural Science Foundation of China (No. 6292). Guoliang Li, Jiannan Wang, and Jianhua Feng were partly supported by the National Natural Science Foundation of China (No. 634), the National Grand Fundamental Research 973 Program of China (No. 2CB3226), Tsinghua University (No. 2873), and the NExT Research Center funded by MDA, Singapore (No. WBS:R ).

Figure 11: Fuzzy search using list pruning (similarity threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. Axes: query time (ms) versus # of records; series: Without Pruning, Pruning, Computing Similar Keywords.

Figure 12: Fuzzy search using list materialization (sorted access only, with list pruning, threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. Axes: query time (ms) versus additional space / inverted-index size; one series per number of query keywords.

Figure 13: Comparison of different materialization methods (similarity threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. Axes: query time (ms) versus additional space / inverted-index size; series: TopDown, BottomUp, Random, CostBased.

Figure 14: Fuzzy search with sorted access ("SA") and random access ("RA") (similarity threshold τ = .6). Panels: (a) URL, (b) DBLP, (c) Enron. Axes: query time (ms) versus # of records; series: SA+RA (Probing on Forward Lists), SA+RA (Probing on Leaf Nodes), SA, Computing Similar Keywords.

8. REFERENCES

[1] H. Bast, A. Chitea, F. M. Suchanek, and I. Weber. ESTER: efficient search on text, entities, and relations. In SIGIR, 2007.
[2] H. Bast and I. Weber. Type less, find more: fast autocompletion search with a succinct index. In SIGIR, 2006.
[3] H. Bast and I. Weber. The CompleteSearch engine: Interactive, efficient, and towards IR & DB integration. In CIDR, 2007.
[4] S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, 2006.
[5] S. Chaudhuri and R. Kaushik. Extending autocompletion to tolerate errors. In SIGMOD Conference, 2009.
[6] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001.
[7] J. Fan, G. Li, and L. Zhou. Interactive SQL query suggestion: Making databases user-friendly. In ICDE, 2011.
[8] J. Feng and G. Li. Efficient fuzzy type-ahead search in XML data. IEEE TKDE, 2012.
[9] K. Grabski and T. Scheffer. Sentence completion. In SIGIR, 2004.
[10] L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, 2001.
[11] M. Hadjieleftheriou, A. Chandel, N. Koudas, and D. Srivastava. Fast indexes and algorithms for set similarity selection queries. In ICDE, 2008.
[12] I. F. Ilyas, G. Beskales, and M. A. Soliman. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv., 2008.
[13] S. Ji, G. Li, C. Li, and J. Feng. Efficient interactive fuzzy keyword search. In WWW, 2009.
[14] N. Khoussainova, Y. Kwon, M. Balazinska, and D. Suciu. SnipSuggest: Context-aware autocompletion for SQL. PVLDB, 2010.
[15] K. Kukich. Techniques for automatically correcting words in text. ACM Comput. Surv., 1992.
[16] H. Lee, R. T. Ng, and K. Shim. Extending q-grams to estimate selectivity of string matching with low edit distance. In VLDB, 2007.
[17] C. Li, J. Lu, and Y. Lu. Efficient merging and filtering algorithms for approximate string searches. In ICDE, 2008.
[18] G. Li, J. Feng, and C. Li. Supporting search-as-you-type using SQL in databases. IEEE TKDE, 2012.
[19] G. Li, S. Ji, C. Li, and J. Feng. Efficient type-ahead search on relational data: a TASTIER approach. In SIGMOD Conference, 2009.
[20] G. Li, S. Ji, C. Li, and J. Feng. Efficient fuzzy full-text type-ahead search. VLDB J., 2011.
[21] N. Mamoulis, K. H. Cheng, M. L. Yiu, and D. W. Cheung. Efficient aggregation of ranked inputs. In ICDE, 2006.
[22] H. Motoda and K. Yoshida. Machine learning techniques to make computers easier to use. Artif. Intell., 1998.
[23] A. Nandi and H. V. Jagadish. Effective phrase prediction. In VLDB, 2007.
[24] J. Qin, W. Wang, Y. Lu, C. Xiao, and X. Lin. Efficient exact edit similarity query processing with the asymmetric signature scheme. In SIGMOD Conference, 2011.
[25] M. Theobald, R. Schenkel, and G. Weikum. Efficient and self-tuning incremental query expansion for top-k query processing. In SIGIR.


More information

METHODOLOGICAL APPROACH TO STRATEGIC PERFORMANCE OPTIMIZATION

METHODOLOGICAL APPROACH TO STRATEGIC PERFORMANCE OPTIMIZATION ETHODOOGICA APPOACH TO STATEGIC PEFOANCE OPTIIZATION ao Hell * Stjepan Vidačić ** Željo Gaača *** eceived: 4. 07. 2009 Peliminay communication Accepted: 5. 0. 2009 UDC 65.02.4 This pape pesents a matix

More information

who supply the system vectors for their JVM products. 1 HBench:Java will work best with support from JVM vendors

who supply the system vectors for their JVM products. 1 HBench:Java will work best with support from JVM vendors Appeaed in the ACM Java Gande 2000 Confeence, San Fancisco, Califonia, June 3-5, 2000 HBench:Java: An Application-Specific Benchmaking Famewok fo Java Vitual Machines Xiaolan Zhang Mago Seltze Division

More information

High Availability Replication Strategy for Deduplication Storage System

High Availability Replication Strategy for Deduplication Storage System Zhengda Zhou, Jingli Zhou College of Compute Science and Technology, Huazhong Univesity of Science and Technology, *, zhouzd@smail.hust.edu.cn jlzhou@mail.hust.edu.cn Abstact As the amount of digital data

More information

Optimizing Content Retrieval Delay for LT-based Distributed Cloud Storage Systems

Optimizing Content Retrieval Delay for LT-based Distributed Cloud Storage Systems Optimizing Content Retieval Delay fo LT-based Distibuted Cloud Stoage Systems Haifeng Lu, Chuan Heng Foh, Yonggang Wen, and Jianfei Cai School of Compute Engineeing, Nanyang Technological Univesity, Singapoe

More information

Japan s trading losses reach JPY20 trillion

Japan s trading losses reach JPY20 trillion IEEJ: Mach 2014. All Rights Reseved. Japan s tading losses each JPY20 tillion Enegy accounts fo moe than half of the tading losses YANAGISAWA Akia Senio Economist Enegy Demand, Supply and Foecast Goup

More information

Modeling and Verifying a Price Model for Congestion Control in Computer Networks Using PROMELA/SPIN

Modeling and Verifying a Price Model for Congestion Control in Computer Networks Using PROMELA/SPIN Modeling and Veifying a Pice Model fo Congestion Contol in Compute Netwoks Using PROMELA/SPIN Clement Yuen and Wei Tjioe Depatment of Compute Science Univesity of Toonto 1 King s College Road, Toonto,

More information

Ilona V. Tregub, ScD., Professor

Ilona V. Tregub, ScD., Professor Investment Potfolio Fomation fo the Pension Fund of Russia Ilona V. egub, ScD., Pofesso Mathematical Modeling of Economic Pocesses Depatment he Financial Univesity unde the Govenment of the Russian Fedeation

More information

A Comparative Analysis of Data Center Network Architectures

A Comparative Analysis of Data Center Network Architectures A Compaative Analysis of Data Cente Netwok Achitectues Fan Yao, Jingxin Wu, Guu Venkataamani, Suesh Subamaniam Depatment of Electical and Compute Engineeing, The Geoge Washington Univesity, Washington,

More information

Efficient Redundancy Techniques for Latency Reduction in Cloud Systems

Efficient Redundancy Techniques for Latency Reduction in Cloud Systems Efficient Redundancy Techniques fo Latency Reduction in Cloud Systems 1 Gaui Joshi, Emina Soljanin, and Gegoy Wonell Abstact In cloud computing systems, assigning a task to multiple seves and waiting fo

More information

Comparing Availability of Various Rack Power Redundancy Configurations

Comparing Availability of Various Rack Power Redundancy Configurations Compaing Availability of Vaious Rack Powe Redundancy Configuations By Victo Avela White Pape #48 Executive Summay Tansfe switches and dual-path powe distibution to IT equipment ae used to enhance the availability

More information

THE DISTRIBUTED LOCATION RESOLUTION PROBLEM AND ITS EFFICIENT SOLUTION

THE DISTRIBUTED LOCATION RESOLUTION PROBLEM AND ITS EFFICIENT SOLUTION IADIS Intenational Confeence Applied Computing 2006 THE DISTRIBUTED LOCATION RESOLUTION PROBLEM AND ITS EFFICIENT SOLUTION Jög Roth Univesity of Hagen 58084 Hagen, Gemany Joeg.Roth@Fenuni-hagen.de ABSTRACT

More information

Towards Realizing a Low Cost and Highly Available Datacenter Power Infrastructure

Towards Realizing a Low Cost and Highly Available Datacenter Power Infrastructure Towads Realizing a Low Cost and Highly Available Datacente Powe Infastuctue Siam Govindan, Di Wang, Lydia Chen, Anand Sivasubamaniam, and Bhuvan Ugaonka The Pennsylvania State Univesity. IBM Reseach Zuich

More information

Secure Smartcard-Based Fingerprint Authentication

Secure Smartcard-Based Fingerprint Authentication Secue Smatcad-Based Fingepint Authentication [full vesion] T. Chales Clancy Compute Science Univesity of Mayland, College Pak tcc@umd.edu Nega Kiyavash, Dennis J. Lin Electical and Compute Engineeing Univesity

More information

Automatic Testing of Neighbor Discovery Protocol Based on FSM and TTCN*

Automatic Testing of Neighbor Discovery Protocol Based on FSM and TTCN* Automatic Testing of Neighbo Discovey Potocol Based on FSM and TTCN* Zhiliang Wang, Xia Yin, Haibin Wang, and Jianping Wu Depatment of Compute Science, Tsinghua Univesity Beijing, P. R. China, 100084 Email:

More information

VISCOSITY OF BIO-DIESEL FUELS

VISCOSITY OF BIO-DIESEL FUELS VISCOSITY OF BIO-DIESEL FUELS One of the key assumptions fo ideal gases is that the motion of a given paticle is independent of any othe paticles in the system. With this assumption in place, one can use

More information

How to recover your Exchange 2003/2007 mailboxes and emails if all you have available are your PRIV1.EDB and PRIV1.STM Information Store database

How to recover your Exchange 2003/2007 mailboxes and emails if all you have available are your PRIV1.EDB and PRIV1.STM Information Store database AnswesThatWok TM Recoveing Emails and Mailboxes fom a PRIV1.EDB Exchange 2003 IS database How to ecove you Exchange 2003/2007 mailboxes and emails if all you have available ae you PRIV1.EDB and PRIV1.STM

More information

Effect of Contention Window on the Performance of IEEE 802.11 WLANs

Effect of Contention Window on the Performance of IEEE 802.11 WLANs Effect of Contention Window on the Pefomance of IEEE 82.11 WLANs Yunli Chen and Dhama P. Agawal Cente fo Distibuted and Mobile Computing, Depatment of ECECS Univesity of Cincinnati, OH 45221-3 {ychen,

More information

2 r2 θ = r2 t. (3.59) The equal area law is the statement that the term in parentheses,

2 r2 θ = r2 t. (3.59) The equal area law is the statement that the term in parentheses, 3.4. KEPLER S LAWS 145 3.4 Keple s laws You ae familia with the idea that one can solve some mechanics poblems using only consevation of enegy and (linea) momentum. Thus, some of what we see as objects

More information

Tracking/Fusion and Deghosting with Doppler Frequency from Two Passive Acoustic Sensors

Tracking/Fusion and Deghosting with Doppler Frequency from Two Passive Acoustic Sensors Tacking/Fusion and Deghosting with Dopple Fequency fom Two Passive Acoustic Sensos Rong Yang, Gee Wah Ng DSO National Laboatoies 2 Science Pak Dive Singapoe 11823 Emails: yong@dso.og.sg, ngeewah@dso.og.sg

More information

Channel selection in e-commerce age: A strategic analysis of co-op advertising models

Channel selection in e-commerce age: A strategic analysis of co-op advertising models Jounal of Industial Engineeing and Management JIEM, 013 6(1):89-103 Online ISSN: 013-0953 Pint ISSN: 013-843 http://dx.doi.og/10.396/jiem.664 Channel selection in e-commece age: A stategic analysis of

More information

An Immunological Approach to Change Detection: Algorithms, Analysis and Implications

An Immunological Approach to Change Detection: Algorithms, Analysis and Implications An Immunological Appoach to Change Detection: Algoithms, Analysis and Implications Patik D haeselee Dept. of Compute Science Univesity of New Mexico Albuqueque, NM, 87131 patik@cs.unm.edu Stephanie Foest

More information

Performance Analysis of an Inverse Notch Filter and Its Application to F 0 Estimation

Performance Analysis of an Inverse Notch Filter and Its Application to F 0 Estimation Cicuits and Systems, 013, 4, 117-1 http://dx.doi.og/10.436/cs.013.41017 Published Online Januay 013 (http://www.scip.og/jounal/cs) Pefomance Analysis of an Invese Notch Filte and Its Application to F 0

More information

The impact of migration on the provision. of UK public services (SRG.10.039.4) Final Report. December 2011

The impact of migration on the provision. of UK public services (SRG.10.039.4) Final Report. December 2011 The impact of migation on the povision of UK public sevices (SRG.10.039.4) Final Repot Decembe 2011 The obustness The obustness of the analysis of the is analysis the esponsibility is the esponsibility

More information

Supplementary Material for EpiDiff

Supplementary Material for EpiDiff Supplementay Mateial fo EpiDiff Supplementay Text S1. Pocessing of aw chomatin modification data In ode to obtain the chomatin modification levels in each of the egions submitted by the use QDCMR module

More information

MULTIPLE SOLUTIONS OF THE PRESCRIBED MEAN CURVATURE EQUATION

MULTIPLE SOLUTIONS OF THE PRESCRIBED MEAN CURVATURE EQUATION MULTIPLE SOLUTIONS OF THE PRESCRIBED MEAN CURVATURE EQUATION K.C. CHANG AND TAN ZHANG In memoy of Pofesso S.S. Chen Abstact. We combine heat flow method with Mose theoy, supe- and subsolution method with

More information

Do Vibrations Make Sound?

Do Vibrations Make Sound? Do Vibations Make Sound? Gade 1: Sound Pobe Aligned with National Standads oveview Students will lean about sound and vibations. This activity will allow students to see and hea how vibations do in fact

More information

Lab #7: Energy Conservation

Lab #7: Energy Conservation Lab #7: Enegy Consevation Photo by Kallin http://www.bungeezone.com/pics/kallin.shtml Reading Assignment: Chapte 7 Sections 1,, 3, 5, 6 Chapte 8 Sections 1-4 Intoduction: Pehaps one of the most unusual

More information

How to create RAID 1 mirroring with a hard disk that already has data or an operating system on it

How to create RAID 1 mirroring with a hard disk that already has data or an operating system on it AnswesThatWok TM How to set up a RAID1 mio with a dive which aleady has Windows installed How to ceate RAID 1 mioing with a had disk that aleady has data o an opeating system on it Date Company PC / Seve

More information

Experiment 6: Centripetal Force

Experiment 6: Centripetal Force Name Section Date Intoduction Expeiment 6: Centipetal oce This expeiment is concened with the foce necessay to keep an object moving in a constant cicula path. Accoding to Newton s fist law of motion thee

More information

Semipartial (Part) and Partial Correlation

Semipartial (Part) and Partial Correlation Semipatial (Pat) and Patial Coelation his discussion boows heavily fom Applied Multiple egession/coelation Analysis fo the Behavioal Sciences, by Jacob and Paticia Cohen (975 edition; thee is also an updated

More information

Promised Lead-Time Contracts Under Asymmetric Information

Promised Lead-Time Contracts Under Asymmetric Information OPERATIONS RESEARCH Vol. 56, No. 4, July August 28, pp. 898 915 issn 3-364X eissn 1526-5463 8 564 898 infoms doi 1.1287/ope.18.514 28 INFORMS Pomised Lead-Time Contacts Unde Asymmetic Infomation Holly

More information

Cloud Service Reliability: Modeling and Analysis

Cloud Service Reliability: Modeling and Analysis Cloud Sevice eliability: Modeling and Analysis Yuan-Shun Dai * a c, Bo Yang b, Jack Dongaa a, Gewei Zhang c a Innovative Computing Laboatoy, Depatment of Electical Engineeing & Compute Science, Univesity

More information

UNIT CIRCLE TRIGONOMETRY

UNIT CIRCLE TRIGONOMETRY UNIT CIRCLE TRIGONOMETRY The Unit Cicle is the cicle centeed at the oigin with adius unit (hence, the unit cicle. The equation of this cicle is + =. A diagam of the unit cicle is shown below: + = - - -

More information

Memory-Aware Sizing for In-Memory Databases

Memory-Aware Sizing for In-Memory Databases Memoy-Awae Sizing fo In-Memoy Databases Kasten Molka, Giuliano Casale, Thomas Molka, Laua Mooe Depatment of Computing, Impeial College London, United Kingdom {k.molka3, g.casale}@impeial.ac.uk SAP HANA

More information

Evaluating the impact of Blade Server and Virtualization Software Technologies on the RIT Datacenter

Evaluating the impact of Blade Server and Virtualization Software Technologies on the RIT Datacenter Evaluating the impact of and Vitualization Softwae Technologies on the RIT Datacente Chistophe M Butle Vitual Infastuctue Administato Rocheste Institute of Technology s Datacente Contact: chis.butle@it.edu

More information

Comparing Availability of Various Rack Power Redundancy Configurations

Comparing Availability of Various Rack Power Redundancy Configurations Compaing Availability of Vaious Rack Powe Redundancy Configuations White Pape 48 Revision by Victo Avela > Executive summay Tansfe switches and dual-path powe distibution to IT equipment ae used to enhance

More information

Real Time Tracking of High Speed Movements in the Context of a Table Tennis Application

Real Time Tracking of High Speed Movements in the Context of a Table Tennis Application Real Time Tacking of High Speed Movements in the Context of a Table Tennis Application Stephan Rusdof Chemnitz Univesity of Technology D-09107, Chemnitz, Gemany +49 371 531 1533 stephan.usdof@infomatik.tu-chemnitz.de

More information

Financing Terms in the EOQ Model

Financing Terms in the EOQ Model Financing Tems in the EOQ Model Habone W. Stuat, J. Columbia Business School New Yok, NY 1007 hws7@columbia.edu August 6, 004 1 Intoduction This note discusses two tems that ae often omitted fom the standad

More information

9:6.4 Sample Questions/Requests for Managing Underwriter Candidates

9:6.4 Sample Questions/Requests for Managing Underwriter Candidates 9:6.4 INITIAL PUBLIC OFFERINGS 9:6.4 Sample Questions/Requests fo Managing Undewite Candidates Recent IPO Expeience Please povide a list of all completed o withdawn IPOs in which you fim has paticipated

More information

The Binomial Distribution

The Binomial Distribution The Binomial Distibution A. It would be vey tedious if, evey time we had a slightly diffeent poblem, we had to detemine the pobability distibutions fom scatch. Luckily, thee ae enough similaities between

More information

Timing Synchronization in High Mobility OFDM Systems

Timing Synchronization in High Mobility OFDM Systems Timing Synchonization in High Mobility OFDM Systems Yasamin Mostofi Depatment of Electical Engineeing Stanfod Univesity Stanfod, CA 94305, USA Email: yasi@wieless.stanfod.edu Donald C. Cox Depatment of

More information

Things to Remember. r Complete all of the sections on the Retirement Benefit Options form that apply to your request.

Things to Remember. r Complete all of the sections on the Retirement Benefit Options form that apply to your request. Retiement Benefit 1 Things to Remembe Complete all of the sections on the Retiement Benefit fom that apply to you equest. If this is an initial equest, and not a change in a cuent distibution, emembe to

More information

Unveiling the MPLS Structure on Internet Topology

Unveiling the MPLS Structure on Internet Topology Unveiling the MPLS Stuctue on Intenet Topology Gabiel Davila Revelo, Mauicio Andeson Ricci, Benoit Donnet, José Ignacio Alvaez-Hamelin INTECIN, Facultad de Ingenieía, Univesidad de Buenos Aies Agentina

More information

A Capacitated Commodity Trading Model with Market Power

A Capacitated Commodity Trading Model with Market Power A Capacitated Commodity Tading Model with Maket Powe Victo Matínez-de-Albéniz Josep Maia Vendell Simón IESE Business School, Univesity of Navaa, Av. Peason 1, 08034 Bacelona, Spain VAlbeniz@iese.edu JMVendell@iese.edu

More information

Energy Efficient Cache Invalidation in a Mobile Environment

Energy Efficient Cache Invalidation in a Mobile Environment Enegy Efficient Cache Invalidation in a Mobile Envionment Naottam Chand, Ramesh Chanda Joshi, Manoj Misa Electonics & Compute Engineeing Depatment Indian Institute of Technology, Rookee - 247 667. INDIA

More information

Research on Risk Assessment of the Transformer Based on Life Cycle Cost

Research on Risk Assessment of the Transformer Based on Life Cycle Cost ntenational Jounal of Smat Gid and lean Enegy eseach on isk Assessment of the Tansfome Based on Life ycle ost Hui Zhou a, Guowei Wu a, Weiwei Pan a, Yunhe Hou b, hong Wang b * a Zhejiang Electic Powe opoation,

More information

Office of Family Assistance. Evaluation Resource Guide for Responsible Fatherhood Programs

Office of Family Assistance. Evaluation Resource Guide for Responsible Fatherhood Programs Office of Family Assistance Evaluation Resouce Guide fo Responsible Fathehood Pogams Contents Intoduction........................................................ 4 Backgound..........................................................

More information

Hour Exam No.1. p 1 v. p = e 0 + v^b. Note that the probe is moving in the direction of the unit vector ^b so the velocity vector is just ~v = v^b and

Hour Exam No.1. p 1 v. p = e 0 + v^b. Note that the probe is moving in the direction of the unit vector ^b so the velocity vector is just ~v = v^b and Hou Exam No. Please attempt all of the following poblems befoe the due date. All poblems count the same even though some ae moe complex than othes. Assume that c units ae used thoughout. Poblem A photon

More information

A Two-Step Tabu Search Heuristic for Multi-Period Multi-Site Assignment Problem with Joint Requirement of Multiple Resource Types

A Two-Step Tabu Search Heuristic for Multi-Period Multi-Site Assignment Problem with Joint Requirement of Multiple Resource Types Aticle A Two-Step Tabu Seach Heuistic fo Multi-Peiod Multi-Site Assignment Poblem with Joint Requiement of Multiple Resouce Types Siavit Swangnop and Paveena Chaovalitwongse* Depatment of Industial Engineeing,

More information

Multiband Microstrip Patch Antenna for Microwave Applications

Multiband Microstrip Patch Antenna for Microwave Applications IOSR Jounal of Electonics and Communication Engineeing (IOSR-JECE) ISSN: 2278-2834, ISBN: 2278-8735. Volume 3, Issue 5 (Sep. - Oct. 2012), PP 43-48 Multiband Micostip Patch Antenna fo Micowave Applications

More information

INITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS

INITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS INITIAL MARGIN CALCULATION ON DERIVATIVE MARKETS OPTION VALUATION FORMULAS Vesion:.0 Date: June 0 Disclaime This document is solely intended as infomation fo cleaing membes and othes who ae inteested in

More information

30 H. N. CHIU 1. INTRODUCTION. Recherche opérationnelle/operations Research

30 H. N. CHIU 1. INTRODUCTION. Recherche opérationnelle/operations Research RAIRO Rech. Opé. (vol. 33, n 1, 1999, pp. 29-45) A GOOD APPROXIMATION OF THE INVENTORY LEVEL IN A(Q ) PERISHABLE INVENTORY SYSTEM (*) by Huan Neng CHIU ( 1 ) Communicated by Shunji OSAKI Abstact. This

More information

An Epidemic Model of Mobile Phone Virus

An Epidemic Model of Mobile Phone Virus An Epidemic Model of Mobile Phone Vius Hui Zheng, Dong Li, Zhuo Gao 3 Netwok Reseach Cente, Tsinghua Univesity, P. R. China zh@tsinghua.edu.cn School of Compute Science and Technology, Huazhong Univesity

More information

An application of stochastic programming in solving capacity allocation and migration planning problem under uncertainty

An application of stochastic programming in solving capacity allocation and migration planning problem under uncertainty An application of stochastic pogamming in solving capacity allocation and migation planning poblem unde uncetainty Yin-Yann Chen * and Hsiao-Yao Fan Depatment of Industial Management, National Fomosa Univesity,

More information

An Infrastructure Cost Evaluation of Single- and Multi-Access Networks with Heterogeneous Traffic Density

An Infrastructure Cost Evaluation of Single- and Multi-Access Networks with Heterogeneous Traffic Density An Infastuctue Cost Evaluation of Single- and Multi-Access Netwoks with Heteogeneous Taffic Density Andes Fuuskä and Magnus Almgen Wieless Access Netwoks Eicsson Reseach Kista, Sweden [andes.fuuska, magnus.almgen]@eicsson.com

More information

Give me all I pay for Execution Guarantees in Electronic Commerce Payment Processes

Give me all I pay for Execution Guarantees in Electronic Commerce Payment Processes Give me all I pay fo Execution Guaantees in Electonic Commece Payment Pocesses Heiko Schuldt Andei Popovici Hans-Jög Schek Email: Database Reseach Goup Institute of Infomation Systems ETH Zentum, 8092

More information

SUPPORT VECTOR MACHINE FOR BANDWIDTH ANALYSIS OF SLOTTED MICROSTRIP ANTENNA

SUPPORT VECTOR MACHINE FOR BANDWIDTH ANALYSIS OF SLOTTED MICROSTRIP ANTENNA Intenational Jounal of Compute Science, Systems Engineeing and Infomation Technology, 4(), 20, pp. 67-7 SUPPORT VECTOR MACHIE FOR BADWIDTH AALYSIS OF SLOTTED MICROSTRIP ATEA Venmathi A.R. & Vanitha L.

More information

Trading Volume and Serial Correlation in Stock Returns in Pakistan. Abstract

Trading Volume and Serial Correlation in Stock Returns in Pakistan. Abstract Tading Volume and Seial Coelation in Stock Retuns in Pakistan Khalid Mustafa Assistant Pofesso Depatment of Economics, Univesity of Kaachi e-mail: khalidku@yahoo.com and Mohammed Nishat Pofesso and Chaiman,

More information

Programming Assignment #1

Programming Assignment #1 Due: Nov 3 (11:59pm). Pogamming Assignment #1 CMSC 351 Fall 2014 Rules 1) You may only use C/C++, Java. 2) You pogam should use the standad input/output. Fo example C/C++ uses should use scanf/pintf/cin/cout

More information

Manual ultrasonic inspection of thin metal welds

Manual ultrasonic inspection of thin metal welds Manual ultasonic inspection of thin metal welds Capucine Capentie and John Rudlin TWI Cambidge CB1 6AL, UK Telephone 01223 899000 Fax 01223 890689 E-mail capucine.capentie@twi.co.uk Abstact BS EN ISO 17640

More information

Continuous Compounding and Annualization

Continuous Compounding and Annualization Continuous Compounding and Annualization Philip A. Viton Januay 11, 2006 Contents 1 Intoduction 1 2 Continuous Compounding 2 3 Pesent Value with Continuous Compounding 4 4 Annualization 5 5 A Special Poblem

More information

FXA 2008. Candidates should be able to : Describe how a mass creates a gravitational field in the space around it.

FXA 2008. Candidates should be able to : Describe how a mass creates a gravitational field in the space around it. Candidates should be able to : Descibe how a mass ceates a gavitational field in the space aound it. Define gavitational field stength as foce pe unit mass. Define and use the peiod of an object descibing

More information

Self-Adaptive and Resource-Efficient SLA Enactment for Cloud Computing Infrastructures

Self-Adaptive and Resource-Efficient SLA Enactment for Cloud Computing Infrastructures 2012 IEEE Fifth Intenational Confeence on Cloud Computing Self-Adaptive and Resouce-Efficient SLA Enactment fo Cloud Computing Infastuctues Michael Maue, Ivona Bandic Distibuted Systems Goup Vienna Univesity

More information

Statistics and Data Analysis

Statistics and Data Analysis Pape 274-25 An Extension to SAS/OR fo Decision System Suppot Ali Emouznead Highe Education Funding Council fo England, Nothavon house, Coldhabou Lane, Bistol, BS16 1QD U.K. ABSTRACT This pape exploes the

More information

Gravitational Mechanics of the Mars-Phobos System: Comparing Methods of Orbital Dynamics Modeling for Exploratory Mission Planning

Gravitational Mechanics of the Mars-Phobos System: Comparing Methods of Orbital Dynamics Modeling for Exploratory Mission Planning Gavitational Mechanics of the Mas-Phobos System: Compaing Methods of Obital Dynamics Modeling fo Exploatoy Mission Planning Alfedo C. Itualde The Pennsylvania State Univesity, Univesity Pak, PA, 6802 This

More information

Coordinate Systems L. M. Kalnins, March 2009

Coordinate Systems L. M. Kalnins, March 2009 Coodinate Sstems L. M. Kalnins, Mach 2009 Pupose of a Coodinate Sstem The pupose of a coodinate sstem is to uniquel detemine the position of an object o data point in space. B space we ma liteall mean

More information

Loyalty Rewards and Gift Card Programs: Basic Actuarial Estimation Techniques

Loyalty Rewards and Gift Card Programs: Basic Actuarial Estimation Techniques Loyalty Rewads and Gift Cad Pogams: Basic Actuaial Estimation Techniques Tim A. Gault, ACAS, MAAA, Len Llaguno, FCAS, MAAA and Matin Ménad, FCAS, MAAA Abstact In this pape we establish an actuaial famewok

More information

ENABLING INFORMATION GATHERING PATTERNS FOR EMERGENCY RESPONSE WITH THE OPENKNOWLEDGE SYSTEM

ENABLING INFORMATION GATHERING PATTERNS FOR EMERGENCY RESPONSE WITH THE OPENKNOWLEDGE SYSTEM Computing and Infomatics, Vol. 29, 2010, 537 555 ENABLING INFORMATION GATHERING PATTERNS FOR EMERGENCY RESPONSE WITH THE OPENKNOWLEDGE SYSTEM Gaia Tecaichi, Veonica Rizzi, Mauizio Machese Depatment of

More information