Efficient mining of group patterns from user movement data


Data & Knowledge Engineering 57 (2006)

Yida Wang a, Ee-Peng Lim a,*, San-Yih Hwang b

a Centre for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Blk N4, 2a-32, Nanyang Avenue, Singapore
b Department of Information Management, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan

Received 3 February 2005; received in revised form 3 February 2005; accepted 27 April 2005. Available online 31 May 2005.

Abstract

In this paper, we present a new approach to derive groupings of mobile users based on their movement data. We assume that the user movement data are collected by logging location data emitted from mobile devices tracking users. We formally define a group pattern as a group of users that are within a distance threshold from one another for at least a minimum duration. To mine group patterns, we first propose two algorithms, namely AGP and VG-growth. In our first set of experiments, it is shown that when both the number of users and the logging duration are large, AGP and VG-growth are inefficient for mining group patterns of size two. We therefore propose a framework that summarizes user movement data before group pattern mining. In the second series of experiments, we show that the methods using location summarization significantly reduce the mining overhead for group patterns of size two. We conclude that the cuboid based summarization methods give better performance when the summarized database size is small compared to the original movement database. In addition, we also evaluate the impact of parameters on the mining overhead.

Keywords: Group pattern mining; Mobile data mining; Location summarization

* Corresponding author.

© 2005 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Group pattern mining

People form groups for many different reasons. Within an organization, formal groups are formed to carry out designated tasks or assignments. Group members have well-defined roles, and the groups may exist until the tasks or assignments are completed. Informal groups, on the other hand, are formed due to emotional, social or psychological needs. Group members have less defined roles and the memberships are less stable. Group dynamics and its influence on individual decision making have been well studied by sociologists [7], and it has been shown that peer pressure and group conformity can affect the behaviors of individuals.

Such group behaviors can be very useful to different applications. For example, by knowing the groups a customer belongs to, retailers can derive common buying interests among customers and develop group-specific pricing models and marketing strategies. Group discounts and product recommendation can be introduced to encourage more purchases that lead to higher sales. In fighting against terrorism, analyzing user group patterns is one of the important tasks that help to reveal the links between terrorists and their roles in the group.

In the past, different ways to discover groups using clustering techniques have been proposed [22]. Very often, they are based on different definitions of a similarity measure that represents the closeness between users. For example, users may be grouped by their common interests, job features, education level, and other attributes. Users can also be grouped based on the transactions they perform. For example, Amazon.com groups users together by the common books they purchase. However, in many cases, these methods suffer from a common pitfall, i.e., members in a derived group may not even know one another. Such group derivation approaches are therefore not suitable for the many applications that require group members to be acquaintances.
In our research, we propose a new way to derive grouping knowledge by performing data mining on user movement log data. These movement data are assumed to be generated by mobile devices that track the locations of their owners as they move from one place to another. These devices are equipped with GPS (Global Positioning System) and other related positioning technologies. GPS can achieve positioning errors ranging from 10 to 20 m [5,6,32], while the Assisted-GPS technology further reduces errors to between 1 and 10 m [8]. There are also terrestrial-based positioning technologies on the popularly used cellular networks, such as AOA, DOA, and TOA, which can achieve positioning accuracy around m [33]. In the indoor environment, users can also be tracked by RFID, with RFID receivers at different locations sensing the signals from RFID tags.

We also assume that each user location, in the form of x-/y-/z-coordinates, can be logged at regular intervals over a period of time. In practice, the assumption may not hold, as mobile devices may experience failures. They may be switched off by their owners from time to time, and the data collection time may not be synchronized across users. These assumptions are nevertheless reasonable if data cleaning or data transformation is performed before applying our proposed mining algorithms.

To keep the discussion focused, we leave privacy and legal issues out of the scope of this paper. As logging user locations over time can affect the privacy of users, we believe that these issues should be addressed within a legal framework. Furthermore, for several practical situations related to safety and security, user movement logging is considered necessary and has already been done.

Mining user groups from movement data is a kind of spatio-temporal data mining. That is, we are interested in discovering groupings of users such that members in the same group are spatially close to one another for a significant amount of time. Such user groupings are also known as group patterns [27]. The grouping knowledge derived in this way is unique compared to the other approaches due to the following:

Physical proximity between group members: The group members are expected to be physically close to one another when they act as a group. Such characteristics are common among many types of groups, e.g., shopping pals, game partners, etc.

Temporal proximity between group members: The group members are expected to stay together for some meaningful duration when they act as a group. Such characteristics distinguish an ad hoc cluster of people who are physically close but unaware of one another from a group of people who come together for some planned activities.

Intuitively, people who spend significant time together are expected to be aware of one another, and they should maintain regular contact. Hence, the group members are expected to exert much stronger influence on one another.

1.2. Research objectives and contributions

Our research aims to formalize the concept of group pattern based on user movement. In this paper, we formally define the notion of group pattern. We introduce max_dis and min_dur as, respectively, the maximum physical distance threshold among group members and the minimum duration threshold for group members to stay together. In addition, we define the weight of a group pattern as a measure of its interestingness. With a min_wei threshold, we define the notion of valid group pattern and the valid group pattern mining problem. We summarize our main research contributions as follows:

Algorithms for valid group pattern mining. Two algorithms, AGP and VG-growth, are developed to mine valid group patterns.
While the AGP algorithm is derived from the Apriori algorithm for classical association rule mining, VG-growth adopts a mining strategy similar to the FP-growth algorithm and is based on a novel data structure known as the VG-graph.

Location summarization based algorithms for mining valid 2-groups. We observe in our experiments that the time taken by AGP and VG-growth to mine valid group patterns of size two (also known as valid 2-groups) dominates the total mining time, because both algorithms require a large number of user pairs to be examined, especially when the number of users is large. We therefore propose a group pattern mining framework that can accommodate different location summarization methods. Four different location summarization methods have been proposed to reduce the overhead of mining valid 2-groups.

Performance evaluation of group pattern mining algorithms. We conduct comprehensive experiments to evaluate the performance of all the proposed algorithms using datasets synthetically generated by IBM City Simulator [14]. We observe that VG-growth is much faster than AGP for mining valid k-groups, where k > 2, while the location summarization based algorithms are much more efficient than AGP and VG-growth for mining valid 2-groups.

In our previous work [27,28], we proposed the algorithms AGP and VG-growth for mining valid group patterns and one location summarization method, SLS, for efficiently mining valid 2-groups. In this paper, we provide the formal definition of the conditional VG-graph and give the correctness and completeness proofs of the VG-growth algorithm. Several other location summarization methods are also included to further reduce the overhead of mining valid 2-groups. Comprehensive experiments are conducted to compare the performance of the different location summarization based algorithms and the influence of relevant parameters.

1.3. Paper outline

The rest of the paper is organized as follows. The formal definition of the group pattern mining problem is given in Section 2. Section 3 describes two valid group pattern mining algorithms, AGP and VG-growth, and their performance results. In Section 4, we introduce a general framework to incorporate location summarization into valid 2-group mining. Our location summarization methods are introduced in Section 5. In Section 6, we present an experimental study on the location summarization based algorithms. We look at some related work in Section 7. Finally, we draw conclusions in Section 8.

2. Problem definition

2.1. Preliminaries

Group pattern mining is to be conducted on a user movement database defined by $D = (D_1, D_2, \ldots, D_M)$, where $D_i$ is a time series of tuples $(t, (x, y, z))$ denoting the x-, y- and z-coordinates of user $u_i$ at time $t$. We assume that there are $N$ time points in the time series, ranging from $0$ to $N-1$. For simplicity, we denote the location of a user $u_i$ at time $t$ by $u_i[t].p$, and his/her x-, y-, and z-values at time $t$ by $u_i[t].x$, $u_i[t].y$ and $u_i[t].z$ respectively. A very small user movement database example is shown in Table 1.

Definition 1. Given a set of users $G$, a maximum distance threshold max_dis, and a minimum time duration threshold min_dur, a set of consecutive time points $[t_a, t_b]$ is called a valid segment of $G$ if

(1) $\forall u_i, u_j \in G$, $t_a \le t \le t_b$: $d(u_i[t].p, u_j[t].p) \le \mathit{max\_dis}$.
(2) If $t_a > 0$, $\exists u_i, u_j \in G$: $d(u_i[t_a - 1].p, u_j[t_a - 1].p) > \mathit{max\_dis}$.

(3) If $t_b < N - 1$, $\exists u_i, u_j \in G$: $d(u_i[t_b + 1].p, u_j[t_b + 1].p) > \mathit{max\_dis}$.

(4) $(t_b - t_a + 1) \ge \mathit{min\_dur}$.

In other words, within a valid segment of a set of users $G$, all members must be close to one another for at least a minimum time duration (min_dur). The function $d()$ returns the distance between two points (see footnote 1). Furthermore, valid segments are maximal, as no two valid segments of the same set of users can overlap each other. The thresholds max_dis and min_dur are used to define the spatial and temporal proximity requirements between members of a group. In particular, the min_dur threshold helps to weed out short-lived or accidental closeness between users.

Footnote 1. In this paper, Euclidean distance is adopted. However, other distance metrics, such as Manhattan distance and weighted Euclidean distance, can be used for different applications, as long as they satisfy the following four properties: (1) $d(i,j) \ge 0$; (2) $d(i,i) = 0$; (3) $d(i,j) = d(j,i)$; and (4) $d(i,j) \le d(i,h) + d(h,j)$.

[Table 1. User movement database D, listing t, x, y and z for each of the users $u_1$ to $u_6$.]

Consider the user movement database in Table 1. For min_dur = 3 and max_dis = 10, $[0,3]$ is a valid segment of the set of users $\{u_1, u_2\}$.

Definition 2. Given a set of users $G$ and thresholds max_dis and min_dur, we say that $G$, max_dis and min_dur form a group pattern, denoted by $P = \langle G, \mathit{max\_dis}, \mathit{min\_dur} \rangle$, if $G$ has a valid segment. The valid segments of a group pattern $P$ are those of its $G$ component. We also call a group pattern with $k$ users a k-group pattern.

Definition 3. Given two group patterns, $P = \langle G, \mathit{max\_dis}, \mathit{min\_dur} \rangle$ and $P' = \langle G', \mathit{max\_dis}, \mathit{min\_dur} \rangle$, $P'$ is called a sub-group pattern of $P$ if $G' \subseteq G$.

In a movement database, a group pattern may have multiple valid segments. The combined length of these valid segments is called the weight-count of the pattern. We therefore measure the significance of the pattern by comparing its weight-count with the overall time duration.

Definition 4. Let $P$ be a group pattern with valid segments $s_1, \ldots, s_n$. The weight-count and weight of $P$ are defined as

$$\text{weight-count}(P) = \sum_{i=1}^{n} |s_i| \tag{1}$$

$$\text{weight}(P) = \frac{\text{weight-count}(P)}{N} = \frac{\sum_{i=1}^{n} |s_i|}{N} \tag{2}$$

Since weight represents the proportion of the time points during which a group of users stay close together, the larger the weight is, the more significant (or interesting) the group pattern is. Furthermore, if the weight of a group pattern is no less than a threshold min_wei, we call it a valid group pattern, and the corresponding group of users a valid group. For example, suppose min_wei = 50%. The group pattern $\langle \{u_2, u_4, u_5\}, 10, 3 \rangle$ is valid, since its valid segment list is $\{[3,9]\}$, giving a weight of $7/10 \ge 0.5$.

Definition 5. Given the thresholds max_dis, min_dur, and min_wei, the problem of finding all the valid group patterns (or simply valid groups) is known as valid group (pattern) mining.

2.2. Discussions

There are some similarities between group pattern mining and classical association rule mining. In the latter problem, the goal is to discover all frequent itemsets, where a frequent itemset is a set of items with support exceeding a minimum support threshold. There are, however, several key differences that make the direct application of association rule mining methods infeasible for valid group pattern mining.

There is no explicit concept of transaction in a movement database. The movement database consists of multiple time series of locations, one for each user. One can try to organize the locations recorded at one time point into some kind of transactions, with each transaction representing a set of locations that are not more than max_dis apart at that time point. For example, 12 transactions can be derived from the locations at time point 3 in Table 1: $\{u_1,u_2\}$, $\{u_1,u_3\}$, $\{u_1,u_4\}$, $\{u_1,u_5\}$, $\{u_2,u_4\}$, $\{u_2,u_5\}$, $\{u_2,u_6\}$, $\{u_4,u_5\}$, $\{u_4,u_6\}$, $\{u_5,u_6\}$, $\{u_1,u_4,u_5\}$, and $\{u_2,u_4,u_5\}$. However, this grouping of user locations at each time point into transactions can lead to an extremely large number of transactions, especially for a large population. This transactionizing overhead can be prohibitive if there are many time points and users.
The weight defined for valid group pattern mining is very different from the support defined in association rule mining. Transactionizing the movement database does not address the weight-counting problem. For transactions derived from a single time point, we should not double count the transactions that contain the same set of users. In the above example, the user pair $\{u_4,u_5\}$ is contained in three derived transactions: $\{u_4,u_5\}$, $\{u_1,u_4,u_5\}$ and $\{u_2,u_4,u_5\}$. But the weight-count of $\{u_4,u_5\}$ should be incremented by only 1, instead of 3, since all three transactions occur at the same time point 3.

Therefore, it is necessary to design new algorithms for mining valid group patterns.
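Definitions 1 and 4 can be made concrete with a small sketch. The Python below is an illustration only, not the authors' implementation: the function names `valid_segments` and `weight`, the dictionary layout of `tracks`, and the toy coordinates are all assumptions introduced here (the values of Table 1 are not reproduced). It scans the time points once, collects the maximal runs in which every pair of users is within max_dis, keeps the runs of at least min_dur time points, and divides their combined length by N.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def valid_segments(tracks, group, max_dis, min_dur):
    """Return the valid segments (t_a, t_b) of `group` as closed intervals.

    tracks maps a user id to a list of (x, y, z) points, one per time point.
    """
    n = len(tracks[group[0]])
    segments, start = [], None
    for t in range(n + 1):  # t == n is a sentinel that closes an open run
        close = t < n and all(
            dist(tracks[u][t], tracks[v][t]) <= max_dis
            for u, v in combinations(group, 2)
        )
        if close and start is None:
            start = t  # a new maximal run begins
        elif not close and start is not None:
            if t - start >= min_dur:  # keep only runs that last long enough
                segments.append((start, t - 1))
            start = None
    return segments


def weight(segments, n):
    """Weight of a pattern: combined segment length divided by N (Eq. (2))."""
    return sum(b - a + 1 for a, b in segments) / n


# Hypothetical toy data: user "a" stays put, user "b" drifts in and out of range.
xs = [5, 5, 5, 5, 50, 50, 5, 5, 50, 5]
tracks = {"a": [(0, 0, 0)] * 10, "b": [(x, 0, 0) for x in xs]}
segs = valid_segments(tracks, ["a", "b"], max_dis=10, min_dur=3)
w = weight(segs, n=10)
```

With these toy tracks the two users are close at time points 0 to 3, 6 to 7, and 9, but only the first run reaches min_dur = 3, so `segs` is `[(0, 3)]` and the weight is 0.4.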

3. Group pattern mining algorithms

In [27], we proposed two algorithms to mine group patterns, known as the Apriori-like Group Pattern mining (AGP) algorithm and the Valid Group-Growth (VG-growth) algorithm. The former exploits the Apriori property of valid group patterns and extends the Apriori algorithm [3] to mine valid group patterns. The latter is based on an idea similar to the FP-growth algorithm. Both the Apriori and FP-growth algorithms were originally designed for association rule mining. In the following, we present the proposed AGP and VG-growth algorithms.

3.1. Apriori-like group pattern mining (AGP) algorithm

The AGP algorithm is built upon the Apriori property, which also holds for group patterns.

Property 1 (Apriori property of group patterns). If a group pattern is valid, then all of its sub-group patterns are valid as well.

Proof. Given min_wei and a group pattern $P = \langle G, \mathit{max\_dis}, \mathit{min\_dur} \rangle$, if $P$ is a valid group pattern, then $\frac{\sum_i |s_i|}{N} \ge \mathit{min\_wei}$, where the $s_i$ are the valid segments of $P$. Let $P'$ denote any sub-group pattern of $P$. Note that, for each valid segment $s_i$ of $P$, there must exist a valid segment $s'_i$ of $P'$ such that $s_i \subseteq s'_i$, and so $\sum_i |s'_i| \ge \sum_i |s_i|$. Hence, we have $\frac{\sum_i |s'_i|}{N} \ge \frac{\sum_i |s_i|}{N} \ge \mathit{min\_wei}$. That is, the sub-group pattern $P'$ is also a valid group pattern. □

The AGP algorithm is shown in Fig. 1. We use $C_k$ to denote the set of candidate k-groups, and $G_k$ to denote the set of valid k-groups. The AGP algorithm starts by mining $G_1$, the set of all distinct users. It then uses $G_1$ to find $G_2$, which in turn is used to find $G_3$. The process repeats until no more valid k-groups can be found. In each iteration, the Apriori property is used to generate candidate groups of larger size, and to prune the unpromising candidate groups. However, there are two key differences between AGP and the classical Apriori algorithm:

(1) Instead of examining whether a transaction contains a candidate itemset, the AGP algorithm tests whether the users in a candidate group are close to one another at a given time point.
(2) Instead of simply incrementing support counts, the AGP algorithm accumulates the lengths of all valid segments so as to compute the weight of a candidate group.

For example, suppose we want to mine valid group patterns from $D$ (see Table 1) with max_dis = 10, min_dur = 3, and min_wei = 50%. $G_1$ is first assigned the set $\{\{u_1\},\{u_2\},\{u_3\},\{u_4\},\{u_5\},\{u_6\}\}$. We then generate $C_2$ by a join operation, which is the same as that in the Apriori algorithm.

$$C_2 = \{\{u_1,u_2\}, \{u_1,u_3\}, \{u_1,u_4\}, \{u_1,u_5\}, \{u_1,u_6\}, \{u_2,u_3\}, \{u_2,u_4\}, \{u_2,u_5\}, \{u_2,u_6\}, \{u_3,u_4\}, \{u_3,u_5\}, \{u_3,u_6\}, \{u_4,u_5\}, \{u_4,u_6\}, \{u_5,u_6\}\}.$$

Then we scan $D$ to compute the weights for each candidate 2-group and select the valid ones for $G_2$:

Fig. 1. Algorithm AGP.

$$G_2 = \{\{u_1,u_2\}, \{u_1,u_4\}, \{u_1,u_5\}, \{u_2,u_4\}, \{u_2,u_5\}, \{u_2,u_6\}, \{u_4,u_5\}, \{u_4,u_6\}\}.$$

From $G_2$, we generate $C_3$:

$$C_3 = \{\{u_1,u_2,u_4\}, \{u_1,u_2,u_5\}, \{u_1,u_4,u_5\}, \{u_2,u_4,u_5\}, \{u_2,u_4,u_6\}, \{u_2,u_5,u_6\}, \{u_4,u_5,u_6\}\}.$$

Note that $\{u_2,u_5,u_6\}$ and $\{u_4,u_5,u_6\}$ are subsequently pruned from $C_3$, since they have an invalid sub-group ($\{u_5,u_6\}$) which is not in $G_2$. After scanning $D$ again to compute the weights, we obtain $G_3$:

$$G_3 = \{\{u_1,u_4,u_5\}, \{u_2,u_4,u_5\}\}.$$

The algorithm terminates here, and the discovered valid groups are $G_2 \cup G_3$.

Time Complexity Analysis. In the Generate_Candidate_Groups procedure, each call to the Has_Invalid_Subgroups procedure (Line 05) requires $O(k \cdot |G_{k-1}|)$ time in the worst case. With the two loops in Lines 01 and 02, the time complexity of the Generate_Candidate_Groups procedure is $O(k \cdot |G_{k-1}|^3)$. In the main algorithm, the lines that scan the database to compute the weights cost $O(M \cdot N,\ N \cdot |C_k| \cdot \binom{k}{2}) = O(M \cdot N,\ N \cdot |C_k| \cdot k^2)$ (see footnote 2), where $M$ is the number of distinct users and $N$ is the whole time span of $D$. Line 13 selects the valid groups, which costs $O(|C_k|)$. In total, the time cost of the AGP algorithm is $O(\sum_k \{k \cdot |G_{k-1}|^3,\ M \cdot N,\ N \cdot |C_k| \cdot k^2\})$. This is a main memory based analysis, which does not consider the disk access overhead. That is, both the movement database and the candidate sets are assumed to reside in main memory. Note that the $N \cdot |C_k| \cdot k^2$ component represents the overhead of scanning $D$ to check the distance between every two users of every candidate group. This is the main bottleneck of all Apriori-like algorithms, and we therefore develop the VG-growth algorithm to reduce it.

3.2. VG-growth: an algorithm based on valid group graph

The AGP algorithm, like the original Apriori algorithm, involves much overhead in candidate k-group generation and multiple database scans. In [11], Han et al.
proposed a novel data structure known as the FP-tree and a divide-and-conquer algorithm, FP-growth, that mines association rules without the above overhead. In this section, we borrow the idea and develop the Valid Group Graph data structure and the VG-growth algorithm.

In an FP-tree (Frequent Pattern tree), each node represents a frequent item, and the frequent items are ordered in support descending order so that the more frequently occurring items are more likely to be shared and thus located closer to the top of the FP-tree. The FP-growth method starts from a frequent item (as an initial suffix pattern), examines only its conditional pattern base (a sub-database which contains the set of frequent items co-occurring with the suffix pattern), constructs its conditional FP-tree, and performs mining recursively with such a tree. The major operations of mining are count accumulation and prefix count adjustment, which are usually much less costly than the candidate generation and pattern matching operations performed in Apriori-like algorithms.

Footnote 2. We use $O(A, B, C, \ldots)$ to denote $\max\{O(A), O(B), O(C), \ldots\}$.
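The candidate generation and pruning that FP-growth-style methods avoid is the Apriori join-and-prune step used by AGP in Section 3.1. A minimal sketch of that step follows, assuming groups are represented as sorted tuples of integer user ids; the function name `generate_candidates` is hypothetical and is not taken from Fig. 1. The weight computation that AGP performs afterwards on each surviving candidate is not shown.

```python
from itertools import combinations


def generate_candidates(valid_prev):
    """Build candidate k-groups from the valid (k-1)-groups (join, then prune)."""
    prev = sorted(tuple(sorted(g)) for g in valid_prev)
    prev_set = set(prev)
    candidates = []
    for a, b in combinations(prev, 2):
        if a[:-1] != b[:-1]:
            continue  # join step: the two groups must share their first k-2 users
        cand = a + (b[-1],)
        # Prune step (Property 1): every (k-1)-subgroup must itself be valid.
        if all(sub in prev_set for sub in combinations(cand, len(cand) - 1)):
            candidates.append(cand)
    return candidates


# G_2 from the running example, with user u_i written as the integer i.
G2 = [(1, 2), (1, 4), (1, 5), (2, 4), (2, 5), (2, 6), (4, 5), (4, 6)]
C3 = generate_candidates(G2)
```

On the running example the join produces the seven 3-groups listed for $C_3$, and the prune step discards $\{u_2,u_5,u_6\}$ and $\{u_4,u_5,u_6\}$ because $\{u_5,u_6\}$ is not in $G_2$, leaving five candidates whose weights still have to be computed by scanning $D$.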

As we extend the FP-tree structure and FP-growth algorithm to valid group pattern mining, the key differences between association rule mining and valid group pattern mining have to be considered. Moreover, some basic concepts adopted by FP-tree and FP-growth have to change, as described below.

Each node in an FP-tree is a frequent item, which is also the smallest unit in association rule mining. In valid group pattern mining, the smallest unit is a valid 2-group. A direct construction of an FP-tree-like structure based on valid 2-groups would, however, lead to an excessive number of nodes in the tree.

The weight used in valid group pattern mining is more complicated than the support measure. Hence, it is necessary to store the list of valid segments for each valid 2-group so as to derive the weight of valid groups of larger sizes.

3.2.1. Valid group (VG) graph

In this section, we define the valid group graph on which our proposed VG-growth algorithm operates to mine valid group patterns.

Definition 6. A valid group graph (VG-graph) is a weighted directed graph $(V, E, s)$, where

(1) Each vertex in $V$ represents a user who participates in some valid 2-group, i.e., $V = \{u \mid u \in G, G \in G_2\}$. This set of users is also known as the valid users.

(2) Each weighted directed edge in $E$ represents a valid 2-group, and the direction of an edge always originates from the vertex with the smaller user id.

(3) $s$ is a weighting function that maps each edge in $E$ to its valid segments.

Consider $D$ in Table 1. Assume that max_dis = 10, min_dur = 3 and min_wei = 50%. We construct the VG-graph of $D$ using a modified AGP algorithm that stores valid segments as it computes $G_2$. $G_2$ is shown in Table 2, and the constructed VG-graph is shown in Fig. 2. Note that a VG-graph can be constructed for a movement database by first deriving the set of all valid 2-groups, which requires only one scan of the movement database.
Table 2
G_2 and the valid segments

Valid 2-groups    Valid segment lists
{u_1,u_2}         s(u_1,u_2) = {[0,3],[7,9]}
{u_1,u_4}         s(u_1,u_4) = {[3,9]}
{u_1,u_5}         s(u_1,u_5) = {[1,3],[5,9]}
{u_2,u_4}         s(u_2,u_4) = {[0,9]}
{u_2,u_5}         s(u_2,u_5) = {[3,9]}
{u_2,u_6}         s(u_2,u_6) = {[0,3],[5,7]}
{u_4,u_5}         s(u_4,u_5) = {[3,9]}
{u_4,u_6}         s(u_4,u_6) = {[0,6]}

Fig. 2. The VG-graph for Table 1.

Assuming an adjacency list representation, the space required for storing a VG-graph can be determined by:

$$|\text{VG-graph}| = \alpha |V| + \beta |E| + |vsl| \tag{3}$$

where $\alpha$ and $\beta$ are the space required for storing a vertex and an edge respectively, and $|vsl|$ is the space required for storing the valid segment lists. Note that $|V| \le M$, $|E| = |G_2|$ and $|vsl| \le 2\delta |G_2| \left\lfloor \frac{N}{\mathit{min\_dur}+1} \right\rfloor$, where $\delta$ is the space required for storing one time stamp, and the number of valid segments for a valid 2-group is at most $\left\lfloor \frac{N}{\mathit{min\_dur}+1} \right\rfloor$. In addition, we define the compression ratio of a VG-graph as:

$$\text{compression ratio of VG-graph} = \frac{|\text{VG-graph}|}{|D|} \tag{4}$$

where $|D|$ denotes the size of the original movement database.

Property 2. Given a movement database $D$ and thresholds max_dis, min_dur and min_wei, its corresponding VG-graph contains the complete information of $D$ relevant to valid group pattern mining.

Proof. In the VG-graph construction process, all the valid 2-groups, together with their valid segments, are stored in the VG-graph. From Property 1, we know that if a k-group ($k \ge 2$) pattern is valid, then all of its 2-subgroup patterns are valid as well. That is to say, each valid k-group can be generated from some valid 2-groups. Moreover, we can check the validity of a k-group by examining the intersections among the valid segments of all its 2-subgroups. Thus the property holds. □

3.2.2. VG-growth algorithm

In this subsection, we present the VG-growth algorithm, which uses the compact information in the VG-graph to mine the complete set of valid groups.
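The recursion developed in this subsection (prefix-neighbors, segment adjustment, and conditional VG-graphs) can be sketched compactly in Python. This is an illustrative reading of the procedure, not a transcription of Fig. 4: the names `intersect` and `vg_growth`, the dictionary-of-edges layout, and the `min_len` parameter (standing for min_wei × N) are assumptions made here. Users are written as integers, and each edge `(u, v)` with `u < v` carries its valid segment list as closed intervals.

```python
from itertools import combinations


def intersect(xs, ys):
    """Intersect two lists of closed integer intervals (a, b)."""
    out = []
    for a, b in xs:
        for c, d in ys:
            lo, hi = max(a, c), min(b, d)
            if lo <= hi:
                out.append((lo, hi))
    return sorted(out)


def vg_growth(edges, min_dur, min_len, suffix=()):
    """Yield valid groups as ascending tuples of user ids.

    edges maps (u, v), u < v, to the segment list of that edge in the
    conditional VG-graph of `suffix` (the full VG-graph when suffix is empty).
    """
    vertices = sorted({u for edge in edges for u in edge})
    for v in vertices:
        prefix = [u for u in vertices if (u, v) in edges]  # prefix-neighbors of v
        for u in prefix:
            yield (u, v) + suffix  # each edge into v is a valid group
        cond = {}  # conditional VG-graph of the suffix extended by v
        for i, j in combinations(prefix, 2):
            if (i, j) not in edges:
                continue
            segs = intersect(intersect(edges[(i, j)], edges[(i, v)]), edges[(j, v)])
            segs = [(a, b) for a, b in segs if b - a + 1 >= min_dur]
            if sum(b - a + 1 for a, b in segs) >= min_len:
                cond[(i, j)] = segs
        yield from vg_growth(cond, min_dur, min_len, (v,) + suffix)


# Valid 2-groups and segment lists from Table 2.
TABLE_2 = {
    (1, 2): [(0, 3), (7, 9)], (1, 4): [(3, 9)], (1, 5): [(1, 3), (5, 9)],
    (2, 4): [(0, 9)], (2, 5): [(3, 9)], (2, 6): [(0, 3), (5, 7)],
    (4, 5): [(3, 9)], (4, 6): [(0, 6)],
}
groups = list(vg_growth(TABLE_2, min_dur=3, min_len=5))  # min_wei = 50%, N = 10
```

Run on the Table 2 data, the sketch reports the eight valid 2-groups plus the two valid 3-groups $\{u_1,u_4,u_5\}$ and $\{u_2,u_4,u_5\}$, which agrees with the $G_2$ and $G_3$ found by AGP in Section 3.1. The nested-loop `intersect` is quadratic in the number of segments; a merge over the two sorted lists would be the natural refinement for long segment lists.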

Definition 7. If $(u \to v)$ is a directed edge in a VG-graph, $u$ is called a prefix-neighbor of $v$.

For example, in Fig. 2, $u_1$ and $u_2$ are the prefix-neighbors of $u_4$.

Definition 8. A suffix-group, denoted by $H$, is an ordered list of $k$ ($k \ge 0$) valid users in user id descending order.

In particular, the suffix-group is empty at the beginning of the VG-growth algorithm (see Line 05 in Fig. 4).

Definition 9. The conditional VG-graph of a suffix-group $H$ containing $k$ ($k \ge 0$) users is called a k-order conditional VG-graph, denoted by $VG^{(k)}(H) = (V_k, E_k, s_k)$, and can be constructed as follows.

(1) When $k = 0$, $VG^{(0)}(\emptyset)$ (i.e., the 0-order conditional VG-graph) is the VG-graph constructed from $G_2$.

(2) When $k \ge 1$, let $H = \{u_{a_1}, u_{a_2}, \ldots, u_{a_k}\}$, where $a_1 > a_2 > \cdots > a_k \ge 1$. Then, $VG^{(k)}(H) = (V_k, E_k, s_k)$ can be constructed from $VG^{(k-1)}(H - \{u_{a_k}\}) = (V_{k-1}, E_{k-1}, s_{k-1})$ as:

$$V_k = \{u_i \mid u_i \in V_{k-1},\ (u_i \to u_{a_k}) \in E_{k-1}\}$$

$$E_k = \{(u_i \to u_j) \mid (u_i \to u_j) \in E_{k-1},\ u_i \in V_k,\ u_j \in V_k,\ |s_k(u_i, u_j)| \ge \mathit{min\_wei} \times N\} \tag{5}$$

where

$$s_k(u_i, u_j) = \{s \mid s \in s_\cap,\ |s| \ge \mathit{min\_dur}\} \tag{6}$$

and

$$s_\cap = s_{k-1}(u_i, u_j) \cap s_{k-1}(u_i, u_{a_k}) \cap s_{k-1}(u_j, u_{a_k}). \tag{7}$$

The VG-growth algorithm conducts a traversal on the VG-graph, visiting vertices according to their vertex ids. We illustrate the algorithm using the VG-graph in Fig. 2. The vertices are visited as follows:

Vertex $u_1$: Select the set of prefix-neighbors of vertex $u_1$, denoted by $V_{u_1}$. Since $V_{u_1}$ is empty, the mining process for $u_1$ terminates with no valid group generated.

Vertex $u_2$: Select the set of prefix-neighbors of vertex $u_2$, i.e., $V_{u_2} = \{u_1\}$. For each vertex $v$ in $V_{u_2}$, we generate a valid 2-group by concatenating $v$ with $u_2$, i.e., $\{u_2, u_1\}$. Select the set of edges on $V_{u_2}$, denoted by $E(V_{u_2})$. Here, $V_{u_2}$ contains only one vertex, and $E(V_{u_2}) = \emptyset$. The mining process for $u_2$ terminates.

Vertex $u_4$: $V_{u_4} = \{u_1, u_2\}$, which generates two valid 2-groups: $\{u_4, u_1\}$ and $\{u_4, u_2\}$. $E(V_{u_4}) = \{(u_1 \to u_2)\}$ with $s(u_1,u_2) = \{[0,3],[7,9]\}$. Adjust the valid segments of edge $(u_1 \to u_2)$ against $u_4$: $s(u_1,u_2) = s(u_1,u_2) \cap s(u_1,u_4) \cap s(u_2,u_4) = \{[7,9]\}$. Since the adjusted valid segments do not meet the min_wei requirement, the mining process for $u_4$ terminates.

Vertex $u_5$: $V_{u_5} = \{u_1, u_2, u_4\}$. Generate three valid 2-groups: $\{u_5, u_1\}$, $\{u_5, u_2\}$, and $\{u_5, u_4\}$. Next, select the directed edges on $V_{u_5}$: $E(V_{u_5}) = \{(u_1 \to u_2), (u_1 \to u_4), (u_2 \to u_4)\}$ with associated segment lists $s(u_1,u_2) = \{[0,3],[7,9]\}$, $s(u_1,u_4) = \{[3,9]\}$, and $s(u_2,u_4) = \{[0,9]\}$. Now, we adjust the associated segment lists against $u_5$ as follows:

$$s(u_1,u_2) = s(u_1,u_2) \cap s(u_1,u_5) \cap s(u_2,u_5) = \{[0,3],[7,9]\} \cap \{[1,3],[5,9]\} \cap \{[3,9]\} = \{[3],[7,9]\}$$

$$s(u_1,u_4) = s(u_1,u_4) \cap s(u_1,u_5) \cap s(u_4,u_5) = \{[3,9]\} \cap \{[1,3],[5,9]\} \cap \{[3,9]\} = \{[3],[5,9]\}$$

$$s(u_2,u_4) = s(u_2,u_4) \cap s(u_2,u_5) \cap s(u_4,u_5) = \{[0,9]\} \cap \{[3,9]\} \cap \{[3,9]\} = \{[3,9]\}$$

Segments shorter than min_dur (here the single time point $[3]$) are discarded, and edges whose adjusted segment lists do not meet the min_wei requirement are removed. The edge $(u_1 \to u_2)$ is removed in this step. $V_{u_5}$ and $E(V_{u_5})$ (after segment list adjustment) form the conditional VG-graph of $u_5$ ($VG(u_5)$), which contains three vertices $\{u_1, u_2, u_4\}$ and two edges $(u_1 \to u_4)$ and $(u_2 \to u_4)$ with associated segment lists. $u_5$ is a suffix-group, as it will be incorporated as a suffix to every valid group found in $VG(u_5)$. We perform mining recursively on $VG(u_5)$. We have $V_{u_5 u_1} = V_{u_5 u_2} = \emptyset$ and $V_{u_5 u_4} = \{u_1, u_2\}$. From $V_{u_5 u_4}$, two valid 3-groups, $\{u_5, u_4, u_1\}$ and $\{u_5, u_4, u_2\}$, are derived. At this point, the mining process for $u_5$ terminates. The mining process for $u_5$ is shown in Fig. 3.

Fig. 3. The mining process for vertex $u_5$: pick edges $E(V_{u_5})$, adjust valid segments and discard invalid ones, and obtain the conditional VG-graph of $u_5$, $VG(u_5)$.

Vertex $u_6$: From $V_{u_6} = \{u_2, u_4\}$, we derive two valid 2-groups: $\{u_6, u_2\}$ and $\{u_6, u_4\}$. $E(V_{u_6}) = \{(u_2 \to u_4)\}$ with $s(u_2,u_4) = \{[0,9]\}$. After adjustment, $s(u_2,u_4) = s(u_2,u_4) \cap s(u_2,u_6) \cap s(u_4,u_6) = \{[0,3],[5,6]\}$. The segment $[5,6]$ does not meet the min_dur requirement and is removed, leaving $[0,3]$ as the only valid segment. Since $[0,3]$ does not meet the min_wei requirement, the edge $(u_2 \to u_4)$ is removed. The mining process for $u_6$ therefore terminates.

After visiting all the vertices, VG-growth terminates. The complete VG-growth algorithm is shown in Fig. 4.

Lemma 1. Let $VG^{(k)}(H)$ be the conditional VG-graph of a suffix-group $H = \{u_{a_1}, \ldots, u_{a_k}\}$ ($k \ge 0$, $a_1 > a_2 > \cdots > a_k$). Then every edge $(u_i \to u_j)$ in $VG^{(k)}(H)$ represents a valid $(k+2)$-group $\{u_i, u_j, u_{a_1}, \ldots, u_{a_k}\}$.

Proof. We prove this lemma by induction.

[Base Case]. When $k = 0$, $VG^{(0)}(\emptyset)$ is the original VG-graph. It is clear that every edge $(u_i \to u_j)$ in the original VG-graph represents a valid 2-group $\{u_i, u_j\}$, and the base case holds.

[Inductive Hypothesis]. Suppose the lemma holds for some $n$, $0 \le n \le k$. That is, every edge $(u_i \to u_j)$ in $VG^{(n)}(H) = (V_n, E_n, s_n)$ with $H = \{u_{a_1}, \ldots, u_{a_n}\}$ represents a valid $(n+2)$-group $\{u_i, u_j, u_{a_1}, \ldots, u_{a_n}\}$.

Fig. 4. VG-growth algorithm.

[Inductive Step]. Consider $VG^{(n+1)}(H \cup \{u_{a_{n+1}}\}) = (V_{n+1}, E_{n+1}, s_{n+1})$. Based on the definition of the conditional VG-graph, for each edge $(u_i \to u_j) \in E_{n+1}$, there must exist $(u_i \to u_j)$, $(u_i \to u_{a_{n+1}})$, and $(u_j \to u_{a_{n+1}})$ in $E_n$. Given the inductive hypothesis, these three edges represent three valid $(n+2)$-groups: $\{u_i, u_j, u_{a_1}, \ldots, u_{a_n}\}$, $\{u_i, u_{a_{n+1}}, u_{a_1}, \ldots, u_{a_n}\}$, and $\{u_j, u_{a_{n+1}}, u_{a_1}, \ldots, u_{a_n}\}$, denoted by $G_I$, $G_{II}$, and $G_{III}$ respectively. In addition, the valid segments of $G_I$, $G_{II}$, and $G_{III}$ are $s_n(u_i, u_j)$, $s_n(u_i, u_{a_{n+1}})$, and $s_n(u_j, u_{a_{n+1}})$ respectively. Next, from Eqs. (6) and (7), we know that $s_{n+1}(u_i, u_j)$ satisfies min_dur, and thus $s_{n+1}(u_i, u_j)$ is the set of valid segments of the group $G_{IV} = G_I \cup G_{II} \cup G_{III}$. From Eq. (5), $s_{n+1}(u_i, u_j)$ also satisfies min_wei; thus, $G_{IV}$ is valid. That is, each edge $(u_i \to u_j)$ in $VG^{(n+1)}(H \cup \{u_{a_{n+1}}\})$ represents a valid $(n+3)$-group $\{u_i, u_j, u_{a_{n+1}}, u_{a_1}, \ldots, u_{a_n}\}$. Thus, the lemma holds for $n + 1$. □

Theorem 1 (Correctness). VG-growth only generates valid groups.

Proof. Without loss of generality (see footnote 3), let $G = \{u_{a_1}, \ldots, u_{a_k}\}$ be a valid group generated by VG-growth, where $a_1 > a_2 > \cdots > a_k$. According to the mining process of VG-growth, $G$ is generated when visiting vertex $u_{a_{k-1}}$ in $VG^{(k-2)}(H)$, where $H = \{u_{a_1}, \ldots, u_{a_{k-2}}\}$, and there is an edge $(u_{a_k} \to u_{a_{k-1}})$ in $VG^{(k-2)}(H)$. Based on Lemma 1, we know that this edge represents a valid k-group $\{u_{a_1}, \ldots, u_{a_k}\}$. Thus, the theorem is proved. □

Theorem 2 (Completeness). VG-growth generates all valid groups.

Proof. Let $G = \{u_{a_1}, \ldots, u_{a_k}\}$ be any valid group, where $a_1 > a_2 > \cdots > a_k$ and $k \ge 2$. We need to prove that VG-growth will generate $G$ as a valid group. We prove this in two cases: (1) $k = 2$; and (2) $k \ge 3$.

Case 1: $k = 2$. $G$ will be generated while mining the set of valid 2-groups using AGP (see Line 02 in Fig. 4).

Case 2: $k \ge 3$. Given that $G = \{u_{a_1}, \ldots, u_{a_k}\}$ is a valid group, $u_{a_1}, \ldots, u_{a_k}$ are valid users in the original VG-graph. Based on the definition of the conditional VG-graph, there must exist a complete subgraph formed by the vertices $u_{a_{i+1}}, \ldots, u_{a_k}$ in $VG^{(i)}(H)$ with $H = \{u_{a_1}, \ldots, u_{a_i}\}$, $\forall i$, $2 \le i < k$. Considering the case when $i = k - 2$, there must be an edge $(u_{a_k} \to u_{a_{k-1}})$ in $VG^{(k-2)}(\{u_{a_1}, \ldots, u_{a_{k-2}}\})$. Therefore, when visiting vertex $u_{a_{k-1}}$, $G$ will be generated as a valid group by VG-growth. Thus, the theorem is proved. □

3.3. Performance evaluation of AGP and VG-growth

In this section, we evaluate and compare the performance of AGP and VG-growth. The experiments were conducted using movement databases generated by IBM City Simulator [14] on a Pentium-IV machine with a CPU clock rate of 2.4 GHz and 1 GB of main memory. Note that both AGP and VG-growth were implemented assuming that the movement database resides in main memory. This reduces the time required for our experiments. City Simulator can generate realistic three-dimensional user movement over a city layout that includes streets and buildings.
A dataset M1kN1k that contains 1000 users and 1000 time points was generated, covering a 1000 m × 1500 m × 100 m area of 48 roads and 72 buildings with different heights (the highest is around 90 m). We recorded the total execution time (T), the time for mining valid 2-groups ($T_2$), and the time for mining all other valid groups ($T_k$) for different min_wei values ranging from 1% to 10%. $T = T_2 + T_k$, as both AGP and VG-growth find valid 2-groups first before the rest. The max_dis and min_dur thresholds were 30 and 4 respectively. In the experiments, the unit of measurement adopted for distance was the meter, and the interval between every two consecutive time points represents 10 min. Thus, N = 1000 and min_dur = 4 represent about one week and 40 min respectively. Table 3 summarizes the parameters used in this set of experiments.

Footnote 3: Although there is no implicit ordering among the users of a group, we can always sort them by user ids.

Table 3. Performance comparison between AGP and VG-growth
Dataset | M | N | Dataset size (MB) | Thresholds
M1kN1k | 1000 | 1000 | 12 | max_dis = 30, min_dur = 4, min_wei = 1%, 2%, 4%, 6%, 10%

Fig. 5. Experiment results: AGP vs. VG-growth: (a) T (M1kN1k), (b) $T_2$ and $T_k$ (M1kN1k), (c) $|G|$ and $|G_2|$ (M1kN1k), (d) number of vertices in VG-graph.

As shown in Fig. 5(a), VG-growth outperformed AGP in execution time T, especially when min_wei is small (<4%). In particular, when min_wei = 1%, VG-growth runs 10 times faster than AGP. Fig. 5(b) shows that the execution time differences come from $T_k$, since AGP and VG-growth share the same procedure of mining valid 2-groups. For small min_wei, VG-growth was more efficient than AGP due to the larger number of valid groups with size >2 that can be mined using the VG-graph. In contrast, AGP suffered from a large number of candidate groups, causing large overhead in database scans and validity checking of candidate groups. As min_wei increased, $T_2$ began to dominate T due to larger proportions of valid 2-groups, as shown in Fig. 5(c). For example, when min_wei = 10%, most valid groups were of size 2

and both AGP and VG-growth spent almost all the time finding valid 2-groups. In other words, the VG-graph is of little use in improving the execution time. This also motivates our proposed summarization approach to mine valid 2-groups, which will be described in Sections 4 and 5.

In the experiment, we implemented the VG-graph using an adjacency list structure. Each vertex in the VG-graph was stored with a list of its prefix-neighbors and the corresponding valid segment lists. Vertex ids and the time stamps were represented as 4-byte integers and each list pointer required 4 bytes. The byte size of the VG-graph was obtained accordingly. As shown in Fig. 5(d), the number of vertices in the VG-graph was about the same as the number of users when min_wei was less than 6%. It decreased with increasing min_wei. For example, when min_wei = 10%, there were only 618 valid users (i.e., vertices) among the 1000 users in M1kN1k.

Fig. 6(a) shows the size of the VG-graph in KB for different min_wei values. Our experiments have shown that the compression ratio of the VG-graph for dataset M1kN1k, which occupies 12 MB of space on hard disk, was between 1% and 6%. This indicates very good compression ratios achieved by the VG-graph, as it only contains the set of valid users and only stores the valid segments of each valid 2-group rather than the actual location records of each user.

Fig. 6. Experiment results: AGP vs. VG-growth: (a) size of VG-graph, (b) size of VG-graph: M/N changes, (c) scale-up with N (VG-growth), (d) scale-up with M (VG-growth).

Table 4. Scalability of VG-growth
M (thousands) | N (thousands) | Dataset size (MB)
1 | 1/3/5/7/ |
1/3/5/7/ | |
Thresholds: max_dis = 30, min_dur = 4, min_wei = 1%, 2%, 4%, 6%, 10%

The scalability results of VG-growth with respect to M and N are shown in Fig. 6(b)-(d). We measured both the execution time and the size of the VG-graph for different numbers of users (M) and time points (N), as shown in Table 4. We only provide the curves for min_wei = 4%, since the curves for other min_wei values have the same trends.

Fig. 6(b) shows the scalability of the VG-graph when M or N changes. The size of the VG-graph increased almost quadratically with M but increased very little with N. This is because the increase of M leads to not only more vertices in the VG-graph but also more valid segments. On the other hand, the increase of N only causes more valid segments while the number of vertices increases only a little. From Fig. 6(c) and (d), we find that T increased almost linearly with N and almost quadratically with M. This is due to the dominating $T_2$, which has $O(N \cdot M^2)$ time complexity.

4. Framework for mining valid 2-groups using location summarization

In this section, we propose to address the overhead of mining valid 2-groups. We first describe a common framework to incorporate different location summarization methods into valid 2-group mining. Subsequently, we present several location summarization methods that adopt different summarization models and assumptions on the summarization parameters.

The proposed framework consists of two steps: preprocessing and mining. In the preprocessing step, each user's movement data are first divided into time windows of the same size and locations within each time window are summarized using a summarization model. After that, the upper bounds of weight-count and valid segment length for each user pair are computed based on the summarized location data. In the mining step, we first generate a set of candidate 2-groups based on the upper bound information.
This set of candidate 2-groups is expected to be smaller than the set of all possible 2-groups. Moreover, instead of scanning the large movement database, the much smaller summarized location database is scanned to check the validity of each candidate 2-group. Only when valid segments cannot be determined from the summarized data is the original database accessed. We elaborate the details in the following subsections.

4.1. Preprocessing of user movement data

Let $D'_i$ denote the summarized data of user $u_i$, in which the number of time points in the original movement data of $u_i$, $D_i$, is reduced to $N' = \lfloor N/w \rfloor$, where w is the time window size and N is the number of time points in $D_i$. For simplicity, we assume that $N/w$ is a whole number. Note that a time point $t'$ in the summarized database $D'$ corresponds to a time window $[t' \cdot w, (t'+1) \cdot w)$ in D. We use $u_i[t'].P$ to denote $\{u_i[t].p \mid t' \cdot w \le t < (t'+1) \cdot w\}$. Based on a summarization model (SM), which is some 3D geometric shape such as a sphere, cube, etc., $u_i[t'].P$ is summarized to an instance of the

corresponding SM, denoted by $sm(u_i, t')$. $D'_i$ is therefore $\{sm(u_i, 0), sm(u_i, 1), \ldots, sm(u_i, N'-1)\}$. In addition, we define the summarization ratio of a summarized database as $|D'|/|D|$, where $|D'|$ and $|D|$ are the sizes of $D'$ and D respectively.

For example, Fig. 7 illustrates the instances of a summarization model, where N and w are 25 and 5 respectively. Note that the location points within each time window are summarized into a sphere, i.e., an instance of the sphere summarization model, which will be described in detail later. The summarized database contains five spheres, each represented by a center and a radius.

Fig. 7. Example of instances of summarization model (user id: $u_1$, N = 25, w = 5, N' = 5; instances $sm(u_1, 0), \ldots, sm(u_1, 4)$).

In this paper, we will examine four different SMs, namely: sphere location summarization method (SLS); cuboid location summarization method (CLS); grid-sphere location summarization method (GSLS); and grid-cuboid location summarization method (GCLS). These methods will be further described in Section 5. Extensions to GSLS and GCLS that consider a maximum speed constraint on user movement are given in Appendix A. These extensions, however, yield performance results similar to GSLS and GCLS. Hence, we do not report their results in the paper.

With $D'$, the number of time points to be scanned is reduced from N to $N/w$. However, this does not address the problem of scanning $D'$ for a large number of candidate 2-groups. Thus, in the preprocessing step, we pre-compute the upper bounds of weight-count and valid segment length for each user pair based on $D'$. The pre-computation is carried out under the assumption that the upper bound of max_dis, denoted by $\overline{max\_dis}$, is given.

Definition 10. Let $t'$ be a time point in the summarized database $D'$. Let $\overline{max\_dis}$ be the upper bound of max_dis (i.e., $\overline{max\_dis} \ge max\_dis$). Then $sm(u_i, t')$ and $sm(u_j, t')$ are said to be possibly close, if:

$MinDistance(sm(u_i, t'), sm(u_j, t')) \le \overline{max\_dis}$   (8)

where $MinDistance(sm(u_i, t'), sm(u_j, t'))$ is a function returning the minimum distance between $sm(u_i, t')$ and $sm(u_j, t')$.

Definition 11. Given a summarized database $D'$, a user pair $\{u_i, u_j\}$, and $\overline{max\_dis}$, a set of consecutive time points $[t'_a, t'_b]$ is called a possibly close segment (PCS) of $\{u_i, u_j\}$, if: (1) $\forall t' \in [t'_a, t'_b]$, $sm(u_i, t')$ and $sm(u_j, t')$ are possibly close. (2) If $t'_a > 0$, $sm(u_i, t'_a - 1)$ and $sm(u_j, t'_a - 1)$ are not possibly close. (3) If $t'_b < N'$, $sm(u_i, t'_b + 1)$ and $sm(u_j, t'_b + 1)$ are not possibly close.

We use $S(\{u_i, u_j\})$ to denote the set of PCSs of $\{u_i, u_j\}$, i.e.,

$S(\{u_i, u_j\}) = \{[t'_a, t'_b] \mid [t'_a, t'_b] \subseteq [0, N'),\ [t'_a, t'_b] \text{ is a PCS of } \{u_i, u_j\}\}$   (9)

Property 3. $\forall s \in s(u_i, u_j)$, $\exists [t'_a, t'_b] \in S(\{u_i, u_j\})$ such that $s \subseteq [t'_a \cdot w, (t'_b + 1) \cdot w)$.

Proof. Recall that $s(u_i, u_j)$ is the set of valid segments of $\{u_i, u_j\}$. Given any valid segment s of $\{u_i, u_j\}$, $s = [t_p, t_q]$ ($0 \le t_p < t_q < N$), $[t_p, t_q]$ must lie within one time window of size w, or across more than one time window, denoted by $[t'_m, t'_n]$. We have $t_p \ge t'_m \cdot w$ and $t_q < (t'_n + 1) \cdot w$. Since $u_i$ and $u_j$ are not more than max_dis apart $\forall t \in [t_p, t_q]$, $sm(u_i, t'_k)$ and $sm(u_j, t'_k)$ should be possibly close at $t'_k$ ($m \le k \le n$), since $\overline{max\_dis} \ge max\_dis$. Hence, we have (1) $[t'_m, t'_n]$ itself is a PCS of $\{u_i, u_j\}$, or (2) $[t'_m, t'_n]$ is covered by a PCS (say, $[t'_p, t'_q]$) of $\{u_i, u_j\}$. Let $[t'_a, t'_b]$ be $[t'_m, t'_n]$ (for case 1), or $[t'_p, t'_q]$ (for case 2). In both cases, we have $[t'_a, t'_b] \in S(\{u_i, u_j\})$, and $s = [t_p, t_q] \subseteq [t'_a \cdot w, (t'_b + 1) \cdot w)$. Therefore, this property holds. □

The above property says that $S(\{u_i, u_j\})$ consists of possibly close segments (in $D'$) that cover all the valid segments of $\{u_i, u_j\}$ in D. This property provides the foundation of the correctness and completeness of the summarization based algorithms.

Definition 12.
Given a user pair $\{u_i, u_j\}$, the longest possibly close segment length of $\{u_i, u_j\}$ is defined as:

$Q(\{u_i, u_j\}) = w \cdot \max_{[t'_a, t'_b] \in S(\{u_i, u_j\})} (t'_b - t'_a + 1)$   (10)

Property 4. $\forall s \in s(u_i, u_j)$, $Q(\{u_i, u_j\}) \ge |s|$.

Proof. Let $s_{max}$ be the longest valid segment of $\{u_i, u_j\}$. We want to show that $Q(\{u_i, u_j\}) \ge |s_{max}|$. Due to Property 3, there exists a PCS $[t'_a, t'_b] \in S(\{u_i, u_j\})$ such that $s_{max} \subseteq [t'_a \cdot w, (t'_b + 1) \cdot w)$. Since $[t'_a, t'_b] \in S(\{u_i, u_j\})$, $Q(\{u_i, u_j\}) \ge w \cdot (t'_b - t'_a + 1) \ge |s_{max}|$. Thus, the property is proven. □

This property asserts that the longest possibly close segment length of a user pair is an upper bound of the valid segment length of this pair of users.
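Definitions 11 and 12 can be illustrated with a minimal sketch. It assumes that the per-window "possibly close" flags of Definition 10 have already been computed for one user pair and are supplied as a list of booleans indexed by summarized time point; the function names are illustrative and are not taken from the paper.

```python
from typing import List, Tuple


def possibly_close_segments(close: List[bool]) -> List[Tuple[int, int]]:
    """Return the maximal runs [t_a, t_b] (inclusive) of True flags.

    Each run corresponds to one PCS in the sense of Definition 11.
    """
    segments = []
    start = None
    for t, flag in enumerate(close):
        if flag and start is None:
            start = t                      # a new run begins here
        elif not flag and start is not None:
            segments.append((start, t - 1))  # the run ended at t - 1
            start = None
    if start is not None:                  # run reaches the last window
        segments.append((start, len(close) - 1))
    return segments


def longest_pcs_length(close: List[bool], w: int) -> int:
    """Q of Eq. (10): w times the length of the longest PCS (0 if none)."""
    segments = possibly_close_segments(close)
    return w * max((b - a + 1 for a, b in segments), default=0)


flags = [True, True, False, True, True, True, False]
print(possibly_close_segments(flags))   # [(0, 1), (3, 5)]
print(longest_pcs_length(flags, w=5))   # 15
```

With these flags and w = 5, a hypothetical valid segment [16, 27] on the original time line falls in windows 3 to 5, which are covered by the PCS [3, 5], i.e., the interval [15, 30). Its length of 12 time points does not exceed Q = 15, as Property 4 requires.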

Definition 13. The upper bound weight-count of $\{u_i, u_j\}$ is defined as:

$R(\{u_i, u_j\}) = w \cdot \sum_{[t'_a, t'_b] \in S(\{u_i, u_j\})} (t'_b - t'_a + 1)$   (11)

Property 5. $R(\{u_i, u_j\}) \ge \text{weight-count}(\{u_i, u_j\})$.

Proof. Recall that $\text{weight-count}(\{u_i, u_j\}) = \sum_{k=1}^{n} |s_k|$, where $s_k \in s(u_i, u_j)$. Let $S(\{u_i, u_j\})$ be the set of PCSs of $\{u_i, u_j\}$. Note that, for any PCS $\in S(\{u_i, u_j\})$, there are two possible cases: (1) this PCS covers one or more valid segments; or (2) this PCS does not cover any valid segment, since $\overline{max\_dis} \ge max\_dis$. Let $S'(\{u_i, u_j\})$ denote the set of PCSs that cover one or more valid segments. Obviously, $S'(\{u_i, u_j\}) \subseteq S(\{u_i, u_j\})$. Next, from Property 3, we know that, for each valid segment $s_k$, there exists a PCS covering $s_k$. Thus,

$\sum_{k=1}^{n} |s_k| \le w \cdot \sum_{PCS \in S'(\{u_i, u_j\})} |PCS|$

From the definition of upper bound weight-count, we know:

$R(\{u_i, u_j\}) = w \cdot \sum_{PCS \in S(\{u_i, u_j\})} |PCS| \ge w \cdot \sum_{PCS \in S'(\{u_i, u_j\})} |PCS|$

Therefore, $R(\{u_i, u_j\}) \ge \text{weight-count}(\{u_i, u_j\})$. Thus, the property is proven. □

This property asserts that the upper bound weight-count of a user pair is indeed an upper bound on the weight-count for this pair of users.

Let P denote the set of all user pairs together with their longest possibly close segment length and upper bound weight-count, i.e.,

$P = \{(\{u_i, u_j\}, Q(\{u_i, u_j\}), R(\{u_i, u_j\})) \mid 1 \le i < j \le M\}$   (12)

where M is the number of distinct users. P contains the pre-computed upper bound information about the valid segment length and the weight-count for each user pair. To efficiently find candidate 2-groups that satisfy the min_dur requirement, we sort P by Q value in descending order. We also use $(P_k.c_2, P_k.Q(c_2), P_k.R(c_2))$ to denote the kth tuple in P. The detailed algorithm for the preprocessing step is shown in Fig. 8.

4.2. Mining of valid 2-groups

After the summarized database $D'$ and the precomputed upper bound information P are constructed, the mining step can be carried out to find the set of valid 2-groups, as shown in Fig. 9. The user-specified max_dis, min_dur and min_wei are input to the mining step.
From P, we first determine a set of candidate 2-groups, $C_2$, such that for each $c_2 \in C_2$, $Q(c_2) \ge min\_dur$ and $R(c_2) \ge min\_wei \cdot N$. Next, we compute the weight-count of each $c_2 \in C_2$ by scanning the summarized database $D'$. We classify the closeness of two instances of SM at a summarized time point into three cases:

Case 1: all location points within the two instances of SM are no more than max_dis apart (see lines 06 and 07 in Fig. 9).
Case 2: all location points within the two instances of SM are more than max_dis apart (see Fig. 9).
Case 3: otherwise, i.e., only some location points inside the two instances of SM are less than max_dis apart (see line 15 in Fig. 9).

Should case 3 arise, the corresponding time window in the original movement database D will be examined to determine the exact weight-count.

Fig. 8. Preprocessing step of framework.

Time complexity analysis. In Fig. 9, line 02 generates the set of candidate 2-groups based on min_dur and min_wei. The time complexity of procedure GetCandidate2Groups is O(k), where k is the number of pairs with $Q(c_2) \ge min\_dur$. Note that $|C_2| = k'$ ($k' \le k$), where $k'$ is the number of pairs that satisfy both $Q(c_2) \ge min\_dur$ and $R(c_2) \ge min\_wei \cdot N$. The subsequent lines compute the weight-count for each candidate 2-group. Their time cost is $n_1 \cdot T_{Max} + n_2 \cdot (T_{Max} + T_{Min}) + n_3 \cdot (T_{Max} + T_{Min} + T_{COD})$, where $n_1$, $n_2$ and $n_3$ are the numbers of times the above three cases are encountered respectively. $T_{Max}$, $T_{Min}$, and $T_{COD}$ are the time costs of procedures MaxDistance, MinDistance and CheckOriginalDB respectively. Note that $n_1 + n_2 + n_3 = N' \cdot |C_2|$, as there are altogether $N' \cdot |C_2|$ iterations, and $T_{COD} = O(w)$. Thus, the time complexity of these lines is $O(n_1 + n_2 + w \cdot n_3)$. The final lines select and output the set of valid 2-groups, which costs $O(|C_2|) = O(k')$.

Fig. 9. Mining step of framework.

Thus, the total time cost is $O(k + n_1 + n_2 + w \cdot n_3 + k')$. In the best case, $n_3 = 0$, the total time cost becomes $O(k + N' \cdot k' + k') = O\left(\frac{N}{w} \cdot k'\right)$.

In the worst case, $n_3 = N' \cdot |C_2| = N' \cdot k'$, the total time cost becomes $O(k + N' \cdot k' \cdot w + k') = O(N \cdot k')$. As mentioned before, without the location summarization, the time cost for finding all valid 2-groups is $O\left(N \cdot \binom{M}{2}\right)$. Note that here $k' \le k \le \binom{M}{2}$, which indicates that the location summarization based algorithms always outperform AGP and VG-growth for mining valid groups.

5. Location summarization methods

In the following, we will introduce four location summarization methods. Each method has its own SM and the corresponding SummarizeLocation, MinDistance, and MaxDistance procedures.

5.1. Sphere location summarization (SLS) method

The sphere location summarization method adopts a sphere as the SM. Each sphere is represented by $(p_c, r)$, where $p_c$ is the center and r is the radius. Given w location values from a time window, $u_i[t'].P$, we compute the minimal and maximal x-, y-, z-values, denoted by $u_i[t'].x_{min}$, $u_i[t'].x_{max}$, $u_i[t'].y_{min}$, $u_i[t'].y_{max}$, $u_i[t'].z_{min}$, and $u_i[t'].z_{max}$. The center and radius of the sphere at time $t'$ are determined respectively by:

$p_c = \left(\frac{u_i[t'].x_{min} + u_i[t'].x_{max}}{2}, \frac{u_i[t'].y_{min} + u_i[t'].y_{max}}{2}, \frac{u_i[t'].z_{min} + u_i[t'].z_{max}}{2}\right)$   (13)

$r = \max_{p \in u_i[t'].P} d(p, p_c)$   (14)

Given two spheres $(p_c^i, r_i)$ and $(p_c^j, r_j)$, the minimum and maximum distances between them can be easily computed as follows:

$Mindistance = d(p_c^i, p_c^j) - (r_i + r_j)$   (15)

$Maxdistance = d(p_c^i, p_c^j) + (r_i + r_j)$   (16)

5.2. Cuboid location summarization (CLS) method

As shown in Fig. 10, instead of using a sphere, the cuboid location summarization (CLS) method uses a cuboid to represent locations within a time window. Considering the fact that most users travel larger distances in the x-y plane than in the z-dimension, CLS may be more compact than SLS (see footnote 4). A more precise summarization will result in: (1) fewer calls to the CheckOriginalDB procedure; (2) fewer candidate 2-groups to be generated.

Footnote 4: In cuboid location summarization, the edges of a cuboid are parallel to the axes.
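The SLS procedures of Section 5.1 (Eqs. (13)-(16)) and the three-case closeness test of the mining step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the tuple representation of a sphere are assumptions, and testing MaxDistance before MinDistance is a guess that merely agrees with the cost terms used in the time complexity analysis.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Sphere = Tuple[Point, float]  # (center p_c, radius r)


def summarize_sphere(points: List[Point]) -> Sphere:
    """Summarize one time window of locations into a sphere, Eqs. (13)-(14)."""
    xs, ys, zs = zip(*points)
    center = ((min(xs) + max(xs)) / 2,
              (min(ys) + max(ys)) / 2,
              (min(zs) + max(zs)) / 2)
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def min_distance(a: Sphere, b: Sphere) -> float:
    """Eq. (15); the value is negative when the two spheres overlap."""
    return math.dist(a[0], b[0]) - (a[1] + b[1])


def max_distance(a: Sphere, b: Sphere) -> float:
    """Eq. (16)."""
    return math.dist(a[0], b[0]) + (a[1] + b[1])


def classify(a: Sphere, b: Sphere, max_dis: float) -> int:
    """Return 1, 2 or 3 for the three closeness cases of the mining step."""
    if max_distance(a, b) <= max_dis:
        return 1  # every pair of points is within max_dis
    if min_distance(a, b) > max_dis:
        return 2  # every pair of points is farther apart than max_dis
    return 3      # undecided: the original window must be checked


a = summarize_sphere([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
b = summarize_sphere([(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)])
print(min_distance(a, b), max_distance(a, b))  # 8.0 12.0
print(classify(a, b, 30), classify(a, b, 5), classify(a, b, 10))  # 1 2 3
```

Only the third outcome would trigger a CheckOriginalDB call, which is why a tighter summarization model reduces accesses to the original movement database.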

Fig. 10. SLS and CLS models.

The cuboid is defined such that its edges are parallel to the x-/y-/z-axes and is represented by $(v_{min}, v_{max})$, where $v_{min}$ is the corner with minimum x-/y-/z-values, and $v_{max}$ is that with maximum x-/y-/z-values. The minimum and maximum distances between two cuboids $(v^i_{min}, v^i_{max})$ and $(v^j_{min}, v^j_{max})$ are determined as follows:

$Mindistance = \sqrt{(d^{min}_x)^2 + (d^{min}_y)^2 + (d^{min}_z)^2}$   (17)

where

$d^{min}_x = \begin{cases} 0 & \text{if } [v^i_{min}.x, v^i_{max}.x] \cap [v^j_{min}.x, v^j_{max}.x] \ne \emptyset \\ \max(v^i_{min}.x, v^j_{min}.x) - \min(v^i_{max}.x, v^j_{max}.x) & \text{otherwise} \end{cases}$   (18)

The $d^{min}_y$ and $d^{min}_z$ values are defined similarly.

$Maxdistance = \sqrt{(d^{max}_x)^2 + (d^{max}_y)^2 + (d^{max}_z)^2}$   (19)

where

$d^{max}_x = \max((v^i_{max}.x - v^j_{min}.x), (v^j_{max}.x - v^i_{min}.x))$
$d^{max}_y = \max((v^i_{max}.y - v^j_{min}.y), (v^j_{max}.y - v^i_{min}.y))$
$d^{max}_z = \max((v^i_{max}.z - v^j_{min}.z), (v^j_{max}.z - v^i_{min}.z))$   (20)
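The cuboid distance bounds of Eqs. (17)-(20) can be sketched per axis as below. The representation of a cuboid as a pair of corner tuples and the function names are assumptions made for illustration only.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]
Cuboid = Tuple[Point, Point]  # (v_min, v_max), edges parallel to the axes


def axis_gap(lo_a: float, hi_a: float, lo_b: float, hi_b: float) -> float:
    """Eq. (18): 0 if the two intervals intersect, else the gap between them."""
    if max(lo_a, lo_b) <= min(hi_a, hi_b):
        return 0.0
    return max(lo_a, lo_b) - min(hi_a, hi_b)


def min_distance(a: Cuboid, b: Cuboid) -> float:
    """Eq. (17): Euclidean norm of the per-axis gaps."""
    gaps = [axis_gap(a[0][k], a[1][k], b[0][k], b[1][k]) for k in range(3)]
    return math.sqrt(sum(g * g for g in gaps))


def max_distance(a: Cuboid, b: Cuboid) -> float:
    """Eqs. (19)-(20): Euclidean norm of the per-axis farthest extents."""
    spans = [max(a[1][k] - b[0][k], b[1][k] - a[0][k]) for k in range(3)]
    return math.sqrt(sum(s * s for s in spans))


a = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
b = ((4.0, 0.0, 0.0), (5.0, 1.0, 1.0))
print(min_distance(a, b))  # 3.0
print(max_distance(a, b))  # sqrt(5^2 + 1^2 + 1^2) = sqrt(27)
```

In this example the cuboids overlap in y and z, so only the x gap of 3 contributes to the minimum distance, while the maximum distance spans the opposite corners.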

5.3. Grid-sphere location summarization (GSLS) method

In both SLS and CLS, the actual coordinates and radius are used in location summarization. To further reduce the size of SM, grid based location summarization methods are introduced. One of the four grid based methods to be introduced in this paper is grid-sphere location summarization (GSLS), which is a grid extension to SLS.

A grid based method partitions the space into a set of cells, or cubes, of length l, as shown in Fig. 11. Each cell in the grid is given a unique id, which is an integer starting from 0. Given a location (x, y, z), the id of the corresponding cell c can be determined by

$c.id = c.x' + N_x \cdot c.y' + (N_x \cdot N_y) \cdot c.z'$   (21)

where $c.x' = \lfloor x/l \rfloor$, $c.y' = \lfloor y/l \rfloor$, $c.z' = \lfloor z/l \rfloor$, $N_x = \lceil XMAX/l \rceil$, $N_y = \lceil YMAX/l \rceil$, and XMAX/YMAX are the maximum values in the x-/y-dimensions respectively. For example, suppose that XMAX = YMAX = 100 and the cell length is l = 10. Thus, $N_x = N_y = 10$. Given a 3D point (12, 35, 70), it falls into the cell with $id = \lfloor 12/10 \rfloor + 10 \cdot \lfloor 35/10 \rfloor + 10 \cdot 10 \cdot \lfloor 70/10 \rfloor = 1 + 10 \cdot 3 + 100 \cdot 7 = 731$. That is, instead of storing three integers 12, 35 and 70, we use only cell id 731 to represent the grid cell containing the 3D point (12, 35, 70). Conversely, given a cell index c.id, the corresponding $x'$, $y'$, $z'$ index values can be obtained by a Reverse function, i.e., $(c.x', c.y', c.z') = Reverse(c.id, N_x, N_y)$:

$c.z' = \left\lfloor \frac{c.id}{N_x \cdot N_y} \right\rfloor$
$c.y' = \left\lfloor \frac{c.id - N_x \cdot N_y \cdot c.z'}{N_x} \right\rfloor$
$c.x' = c.id - N_x \cdot N_y \cdot c.z' - N_x \cdot c.y'$   (22)

Fig. 11. Partition 3D space into cells.
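The cell id mapping of Eq. (21) and its inverse, Eq. (22), can be sketched as follows. The function names are illustrative, and the sketch assumes non-negative coordinates so that integer floor division matches the floor operations in the equations.

```python
import math
from typing import Tuple


def cell_id(x: float, y: float, z: float,
            l: float, xmax: float, ymax: float) -> int:
    """Eq. (21): map a 3D location to the id of the grid cell containing it."""
    nx = math.ceil(xmax / l)
    ny = math.ceil(ymax / l)
    cx, cy, cz = int(x // l), int(y // l), int(z // l)
    return cx + nx * cy + nx * ny * cz


def reverse(cid: int, nx: int, ny: int) -> Tuple[int, int, int]:
    """Eq. (22): recover the (x', y', z') cell indices from a cell id."""
    cz = cid // (nx * ny)
    cy = (cid - nx * ny * cz) // nx
    cx = cid - nx * ny * cz - nx * cy
    return cx, cy, cz


print(cell_id(12, 35, 70, l=10, xmax=100, ymax=100))  # 731
print(reverse(731, nx=10, ny=10))                     # (1, 3, 7)
```

The round trip reproduces the example in the text: the point (12, 35, 70) maps to cell 731, and cell 731 maps back to the indices (1, 3, 7).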

Using the above example, we apply the Reverse function and get $c.z' = \lfloor 731/100 \rfloor = 7$, $c.y' = \lfloor (731 - 100 \cdot 7)/10 \rfloor = 3$, and $c.x' = 731 - 100 \cdot 7 - 10 \cdot 3 = 1$. Therefore, we can get the coordinates of the center of cell 731, i.e., $(1 \cdot 10 + 5, 3 \cdot 10 + 5, 7 \cdot 10 + 5) = (15, 35, 75)$. That is, the original 3D point is replaced by the center of the cell containing it. The offset between the original 3D point and the cell center is bounded by $\frac{\sqrt{3}}{2} l$. The smaller the cell length, the smaller the offset is.

The main idea of the grid based method is to use a single c.id instead of the actual coordinates to represent a location point. This results in a large storage saving for $D'$. Furthermore, the tradeoff between size and accuracy of summarization can be easily tuned by the cell length.

In GSLS, the locations within a time window are summarized into a sphere represented by $(p_c.id, \bar{r})$, where $p_c.id$ is the center cell id and $\bar{r}$ is the discretized radius. The center cell id refers to the cell in which the actual center of the sphere is found. The discretized radius $\bar{r}$ is defined by $\lceil r/c \rceil$, where r is the original radius and c is the radius scale unit for discretizing the radius. Such a sphere is also known as the grid-sphere.

Given two grid-spheres $(p^i_c.id, \bar{r}_i)$ and $(p^j_c.id, \bar{r}_j)$, the minimum and maximum distances between them can be determined by first deriving two larger non-grid spheres $(p'^i_c, r'_i)$ and $(p'^j_c, r'_j)$ that contain the two original grid-spheres, i.e.,

$p'^i_c = l \cdot \left(p^i_c.x' + \frac{1}{2}, p^i_c.y' + \frac{1}{2}, p^i_c.z' + \frac{1}{2}\right)$
$p'^j_c = l \cdot \left(p^j_c.x' + \frac{1}{2}, p^j_c.y' + \frac{1}{2}, p^j_c.z' + \frac{1}{2}\right)$
$r'_i = \bar{r}_i \cdot c + \frac{\sqrt{3}}{2} l$
$r'_j = \bar{r}_j \cdot c + \frac{\sqrt{3}}{2} l$   (23)

That is, the centers of the two larger non-grid spheres, i.e., $p'^i_c$ and $p'^j_c$, are the geometrical centers of the two cells $p^i_c.id$ and $p^j_c.id$ respectively. The radii $r'_i$ and $r'_j$ are derived from the discretized radii and the radius scale unit.
Note that the derived radii are augmented by $\frac{\sqrt{3}}{2} l$ in order to ensure that the non-grid spheres cover the original grid-spheres, since the maximal offset between the old and new centers is the distance from the new center to the corner of the cell, which is just $\frac{\sqrt{3}}{2} l$. In addition, the $x'$, $y'$, $z'$ index values can be obtained by the Reverse function. Then, the minimum and maximum distances are computed based on $(p'^i_c, r'_i)$ and $(p'^j_c, r'_j)$, i.e.,

$Mindistance = d(p'^i_c, p'^j_c) - (r'_i + r'_j)$
$Maxdistance = d(p'^i_c, p'^j_c) + (r'_i + r'_j)$   (24)

5.4. Grid-cuboid location summarization (GCLS) method

Similar to GSLS, GCLS summarizes the locations within each time window into a cuboid such that the cuboid consists of grid cells and is represented by the ids of the two cells that contain the

locations with the minimum x-/y-/z-values and maximum x-/y-/z-values. This cuboid is also known as the grid-cuboid. Each grid-cuboid is denoted by $\langle id_{min}, id_{max} \rangle$. For user $u_i$ at $t'$, the corresponding $id_{min}$ and $id_{max}$ are determined as:

$id_{min} = \left\lfloor \frac{u_i[t'].x_{min}}{l} \right\rfloor + N_x \cdot \left\lfloor \frac{u_i[t'].y_{min}}{l} \right\rfloor + (N_x \cdot N_y) \cdot \left\lfloor \frac{u_i[t'].z_{min}}{l} \right\rfloor$   (25)

$id_{max} = \left\lfloor \frac{u_i[t'].x_{max}}{l} \right\rfloor + N_x \cdot \left\lfloor \frac{u_i[t'].y_{max}}{l} \right\rfloor + (N_x \cdot N_y) \cdot \left\lfloor \frac{u_i[t'].z_{max}}{l} \right\rfloor$   (26)

Similar to GSLS, given two grid-cuboids $\langle id^i_{min}, id^i_{max} \rangle$ and $\langle id^j_{min}, id^j_{max} \rangle$, the minimum and maximum distances between them can be determined by first deriving two larger non-grid cuboids, $(v'^i_{min}, v'^i_{max})$ and $(v'^j_{min}, v'^j_{max})$, containing the two original grid-cuboids, i.e.,

$v'^i_{min} = l \cdot (id^i_{min}.x', id^i_{min}.y', id^i_{min}.z')$
$v'^i_{max} = l \cdot (id^i_{max}.x' + 1, id^i_{max}.y' + 1, id^i_{max}.z' + 1)$
$v'^j_{min} = l \cdot (id^j_{min}.x', id^j_{min}.y', id^j_{min}.z')$
$v'^j_{max} = l \cdot (id^j_{max}.x' + 1, id^j_{max}.y' + 1, id^j_{max}.z' + 1)$   (27)

The $x'$, $y'$, $z'$ index values can be obtained by the Reverse function. Then, the minimum and maximum distances are computed based on $(v'^i_{min}, v'^i_{max})$ and $(v'^j_{min}, v'^j_{max})$ by using Eqs. (17)-(20).

5.5. Summary of location summarization methods

We summarize the location summarization methods in Table 5, in which the columns from left to right are, in turn, the location summarization method, the number of users in the summarized database, the number of time points in the summarized database, the summarization model, and the size of the summarized database (in bits). Note that the last row in Table 5 represents the movement database used by AGP or VG-growth without location summarization. In Table 5, g, r, $\bar{r}$, and d denote the number of bits required to represent a coordinate value, a radius, a discretized radius, and a cell id respectively.

Table 5. Summary of location summarization methods
Method | No. of users | No. of time points | SM | Summarized database size (bits)
SLS | M | N/w | $(p_c, r)$ | $(3g + r) \cdot M \cdot N/w$
CLS | M | N/w | $(v_{min}, v_{max})$ | $6g \cdot M \cdot N/w$
GSLS | M | N/w | $(p_c.id, \bar{r})$ | $(d + \bar{r}) \cdot M \cdot N/w$
GCLS | M | N/w | $\langle id_{min}, id_{max} \rangle$ | $2d \cdot M \cdot N/w$
AGP & VG-growth | M | N | $\langle x, y, z \rangle$ | $3g \cdot M \cdot N$
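The size formulas of Table 5 can be turned into a small calculator for the summarization ratio $|D'|/|D|$. This is a sketch under stated assumptions: the default bit widths (32 bits for a coordinate, a radius and a cell id, 8 bits for a discretized radius) are hypothetical values chosen for illustration and are not given in the paper, and the function names are likewise illustrative.

```python
from fractions import Fraction


def summarized_size_bits(method: str, M: int, N: int, w: int,
                         g: int = 32, r: int = 32,
                         r_disc: int = 8, d: int = 32) -> int:
    """Summarized database size in bits, following Table 5.

    g, r, r_disc and d are the bit widths of a coordinate, a radius,
    a discretized radius and a cell id (hypothetical defaults).
    """
    if N % w != 0:
        raise ValueError("N/w is assumed to be a whole number")
    bits_per_instance = {
        "SLS": 3 * g + r,      # center (3 coordinates) + radius
        "CLS": 6 * g,          # two corners, 3 coordinates each
        "GSLS": d + r_disc,    # center cell id + discretized radius
        "GCLS": 2 * d,         # two cell ids
    }[method]
    return bits_per_instance * M * (N // w)


def original_size_bits(M: int, N: int, g: int = 32) -> int:
    """Size of the unsummarized movement database: 3g bits per location."""
    return 3 * g * M * N


def summarization_ratio(method: str, M: int, N: int, w: int, **bits) -> Fraction:
    """|D'| / |D| as an exact fraction."""
    g = bits.get("g", 32)
    return Fraction(summarized_size_bits(method, M, N, w, **bits),
                    original_size_bits(M, N, g))


print(summarization_ratio("SLS", M=1000, N=4800, w=16))  # 1/12
print(summarization_ratio("CLS", M=1000, N=4800, w=24))  # 1/12
```

Under the assumption that a radius takes as many bits as a coordinate (r = g), the SLS ratio reduces to 4/(3w) and the CLS ratio to 2/w, so both methods reach a ratio of 1/12 at w = 16 and w = 24 respectively. The ratio does not depend on M, since M appears in both sizes.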

6. Performance evaluation of location summarization based algorithms

In this set of experiments, we evaluate the performance of the location summarization based algorithms for mining valid 2-groups using different summarization methods. We first evaluate the impact of summarization parameters on the performance. We also compare the performance of the different algorithms when the summarized database size is fixed. Finally, we compare the performance between AGP and the location summarization based algorithms. Throughout all the experiments, the dataset used is M1kN10k, which contains 1000 users and 10,000 time points, and we fixed max_dis = 30, min_dur = 4, and min_wei = 1%.

Recall that, in the experiments described in Section 3.3, the movement database was loaded into main memory and mining did not involve hard disk access. In this set of experiments, however, only the summarized database was loaded into main memory, while the original movement database resided on the hard disk. The intention is to simulate the situation where the movement database is too large to be loaded into memory and to investigate the performance of the location summarization methods under such a disadvantaged condition. If the movement database is also loaded into memory, the performance of the location summarization based algorithms will be even better than that presented here.

6.1. Impact of time window size

In this experiment, we studied the influence of the time window size w on the execution time of the group pattern mining algorithm using different location summarization methods. Different w values were chosen: 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 100, 200, and 400. The upper bound of the max_dis threshold (i.e., $\overline{max\_dis}$) was chosen as 40. For SLS and CLS, w is the only summarization parameter. On the other hand, GSLS and GCLS may have multiple combinations of d and $\bar{r}$ values for a single w value. We therefore ran GSLS and GCLS with different combinations of parameters for each w and reported the ones that gave the smallest $T_2$.
The results are shown in Fig. 12(a), in which we only plot the $T_2$ curves of SLS and CLS, as the curve of GSLS is almost identical to that of SLS. The same applies to GCLS and CLS.

Intuitively, if w is very small, each instance of SM becomes more precise because of fewer location points within a time window. However, a large $N' = N/w$ results in more overhead to scan the summarized database. On the other hand, when w is very large, $N' = N/w$ becomes smaller. But the summarization is relatively coarse and the minimum distance between two instances of SM at any summarized time point is likely to be less than $\overline{max\_dis}$. This causes a large number of candidate 2-groups to be generated and more mining overhead. Therefore, there exists some optimal time window size between the two extremes.

As expected, we can see from Fig. 12(a) that the performance of SLS and CLS does not scale up linearly with w. In fact, $T_2$ decreases first when w increases from 4 to around 30 for CLS and 20 for SLS. After that, $T_2$ increases when w increases further. Each summarization method has an optimal w value. For SLS and GSLS, such an optimal w is around 24. For CLS and GCLS, the optimal w is around 32.

In order to understand the reason behind this observation, we decompose $T_2$ into $T_{case1,2}$ and $T_{case3}$ as follows: $T_2 = T_{case1,2} + T_{case3}$, where $T_{case1,2}$ is the time spent on Case 1 and Case 2 (see Section 4.2) and $T_{case3}$ is the time spent on Case 3. We have:

$T_{case1,2} = n_1 \cdot T_{Max} + n_2 \cdot (T_{Max} + T_{Min}) = O(n_1 + n_2)$
$T_{case3} = n_3 \cdot (T_{Max} + T_{Min} + T_{COD}) = O(n_3 + w \cdot n_3)$

Note that $T_{case1,2}$ is solely memory based while $T_{case3}$ consists of two parts. The first part, $n_3$, represents the memory based cost and the second part, $w \cdot n_3$, represents the hard disk based cost, i.e., the time used for checking the original database, which resides on hard disk. Intuitively, the hard disk access cost would dominate $T_2$.

Fig. 12(b) shows $T_{case1,2}$ and $T_{case3}$ for SLS. The other summarization methods produce similar curves and we therefore chose not to show them. As w increases, $T_{case1,2}$ becomes negligible, while $T_{case3}$ becomes almost identical to $T_2$, since $T_{case3}$ dominates $T_2$.

Furthermore, from Fig. 12(c), we can see the components affecting $T_{case1,2}$ and $T_{case3}$. Note that both the x- and y-axes have logarithmic scale. As w increases, the total number of iterations for mining valid 2-groups, $N' \cdot |C_2|$, decreases. Both $n_1 + n_2$ and $n_3$ also decrease with increasing w. The former results in smaller $T_{case1,2}$, since $T_{case1,2} = O(n_1 + n_2)$. However, although $n_3$ decreases, $w \cdot n_3$ increases as w increases. Recall that they represent the memory based cost and the hard disk based cost respectively. Hence, it is clear that the memory based cost overrides the hard disk based cost before reaching the optimal w, and vice versa beyond the optimal w.

Fig. 12. Impact of time window size: (a) $T_2$ of SLS and CLS, (b) $T_2$, $T_{case1,2}$ and $T_{case3}$, (c) $N' \cdot |C_2|$, $w \cdot n_3$, $n_1 + n_2$ and $n_3$, (d) $S_p(CLS)/S_p(SLS)$; the horizontal axis of each panel is the time window size w.

From this experiment, we also found that, with the same w, the two cuboid based methods, i.e., CLS and GCLS, always outperform the two sphere based methods, i.e., SLS and GSLS. This is because cuboid based methods provide more precise summarizations than sphere based methods. This leads to a smaller number of candidate 2-groups, $|C_2|$, and fewer accesses to the original database. We define the summarization precision, denoted by $S_p$, as

$S_p = \frac{1}{w \cdot \sum_{i=1}^{M} \sum_{t'=1}^{N'} V_{sm(u_i, t')}}$   (28)

where $V_{sm(u_i, t')}$ is the volume of the geometrical shape formed by $sm(u_i, t')$. We compared the summarization precision of different location summarization methods. We found that the summarization precision of SLS is slightly higher than that of GSLS. The same observation also applies to CLS and GCLS. Therefore, we only present the results of SLS and CLS here, as shown in Fig. 12(d). Note that, instead of giving the absolute value, we provide the relative ratio, i.e., $S_p(CLS)/S_p(SLS)$. We can see that the summarization precision of CLS is much higher than that of SLS for any w value. In fact, the $S_p$ of CLS is a multiple of that of SLS. The difference becomes smaller when w increases because the compactness of CLS suffers when w becomes very large.

6.2. Impact of $\overline{max\_dis}$

In this experiment, we studied the effect of $\overline{max\_dis}$ on the mining overhead. For each time window size in the above experiments, $\overline{max\_dis}$ was chosen as 30, 40, 60, 100 and 500. Fig. 13(a) gives the curves for SLS and CLS for different $\overline{max\_dis}$ values, in which we only show the curves for w = 32, as the curves for other w values have the same trend. Note that the curve of GSLS is almost identical to that of SLS. The same also applies to GCLS and CLS.

We can see that $T_2$ increases very little when $\overline{max\_dis}$ increases. This may sound counterintuitive because we expect a larger $\overline{max\_dis}$ to generate more candidate 2-groups that may significantly increase the mining overhead. Indeed, $|C_2|$ increases dramatically with increasing $\overline{max\_dis}$, as shown in Fig. 13(b).
However, a large $|C_2|$ alone is not sufficient to increase $T_2$ much, as $T_2$ is dominated by $T_{case3}$. Recall that $T_{case3} = O(n_3 + w \cdot n_3)$. With a fixed w, $n_3$ is the dominant factor. As shown in Fig. 13(c), $n_3$ is almost constant when $\overline{max\_dis}$ increases. The underlying reason is that, although $|C_2|$ increases quickly, causing more checking iterations during mining, most of them fall into Case 2 (see Section 4.2), with only very few iterations falling into Case 3.

In conclusion, within a certain range, e.g., less than 60 in our experiments, $\overline{max\_dis}$ does not affect the execution time much for any of the methods. Therefore, we simply fixed $\overline{max\_dis}$ as 40 in the following experiments.

6.3. Performance comparison with fixed summarization database size

Cuboid based algorithms outperform sphere based algorithms with the same w. But they generate summarized databases of different sizes. Thus, it is not a fair comparison. We therefore need to compare them based on the same summarized database size.

Fig. 13. Impact of max_dis: (a) T_2 (sec), (b) |C_2| and (c) n_2 and n_3, each plotted against max_dis.

In this experiment, we chose seven different D'/D ratios, ranging from 0.08% to 8.33%. In order to make w a whole number, we chose each D'/D as a unit fraction, e.g., 1/120 (0.83%) and 1/12 (8.33%). For each D'/D ratio, we ran each algorithm for several parameter combinations. In particular, for SLS and CLS, the D'/D ratio determines w. On the other hand, GSLS and GCLS may adopt different combinations of w, id, and r values for the same D'/D ratio. Therefore, for each D'/D ratio, we ran GSLS and GCLS with different combinations of parameters, and reported the ones that give the smallest T_2.

As shown in Fig. 14, the cuboid based methods still enjoy smaller T_2 than the sphere based methods. When D'/D is relatively large (e.g., D'/D > 6%), the non-grid based methods perform slightly better than the grid based methods. The reason is that, when D'/D is large, SLS and CLS have w values closer to the optimum than GSLS and GCLS. For example, when D'/D = 1/12 (8.33%), SLS and CLS have w values of 16 and 24 respectively, whereas GSLS and GCLS share a common w of 8 (see Table B.2). Note that the optimal w values for sphere and cuboid based methods are around 24 and 32 respectively. For the same reason, when the summarized database is small (e.g., D'/D < 4%), the grid based algorithms have w values closer to the optimal values than the non-grid based algorithms, e.g., when D'/D = 1/120 (0.83%) (see Table B.1), leading to smaller T_2.

Fig. 14. Performance comparison: T_2 (min) against D'/D (%) for SLS, CLS, GSLS and GCLS.

Fig. 15. Performance comparison with AGP: (a) T_2, (b) |C_2|, each plotted against D'/D (%).

Performance comparison between AGP and summarization based mining algorithms

In this subsection, we compare the performance of the algorithms for mining valid 2-groups using location summarization methods with that of AGP (recall that AGP and VG-growth share the same procedure for finding the set of valid 2-groups). In particular, we combine the results from Sections 3.3 and 6.3 with the same parameter setting, i.e., dataset M1kN10k, max_dis = 30, min_dur = 4, and min_wei = 1%.

Fig. 15(a) gives the T_2 curves of the different algorithms. We can see that the T_2 values of the location summarization based algorithms are around 10-20% of that of AGP. This illustrates the advantage of using location summarization based algorithms. Recall that AGP runs in main memory, while the location summarization based algorithms need to access the disk-resident original database D. Even so, the latter still outperform AGP significantly. If the original database were also loaded into main memory, the T_2 of the location summarization based algorithms would be expected to be much smaller.

Fig. 15(b) shows the number of candidate 2-groups generated by AGP and the location summarization based algorithms. Here, we only drew the curves for SLS and CLS, since those of GSLS and GCLS were very close to (slightly below) those of SLS and CLS respectively. Note that AGP always generates a constant number of candidate 2-groups, i.e., C(M, 2) = 499,500. On the other hand, the location summarization based algorithms generate different sets of candidate 2-groups, which are much smaller than those generated by AGP. In fact, the ratio |C_2| / C(M, 2) for the location summarization based algorithms was around 13%. This significantly reduced the overhead for mining valid 2-groups.

7. Related work

Group pattern mining is a very new data mining research area that involves both space and time data. Although no existing work deals with the same problem, some related work does exist. In mobile computing, in order to reduce the uplink bandwidth consumption during location updating, especially for a large population, several group models have been proposed such that location updating is conducted only for each derived group rather than for each individual user [12,13,15,21]. However, the groups defined and discovered in these works are derived only from physical closeness, without considering time.
In addition, there is no interestingness measure for the derived groups, as there is no need to rank the groups in the location update problem.

Group pattern mining deals with time series of user location data. Hence, it is related to previous work on sequential data mining and time series data mining. Sequential pattern mining was first introduced by Agrawal and Srikant [4] and was generalized by the same authors in [24]. Given a sequence database, in which each sequence is an ordered list of transactions, each containing a set of items and carrying a time stamp, the goal of sequential pattern mining is to find the set of subsequences with support exceeding a minimum support threshold, where the support of a subsequence is the number of sequences containing this subsequence. An Apriori-like algorithm, GSP, was proposed in [24]; it needs to generate and examine a huge number of intermediate candidate subsequences. To break this bottleneck, Han et al. designed an efficient algorithm, PrefixSpan, for mining large sequence databases containing long sequences [18-20]. The PrefixSpan algorithm greatly reduces the effort of candidate subsequence generation by transforming the sequence database into a set of smaller projected databases and performing mining on each projected database. Another efficient sequential pattern mining algorithm, known as SPADE, was proposed by Zaki [31], in which the author adopted a vertical data format. Besides, the problem of mining frequent episodes in event sequences was introduced in [16], in which an episode is defined as a set of events that occur in a given partial order within a time window.

Full periodic patterns were studied in [17], in which the goal is to find all association rules that occur periodically in the whole transaction database. However, partial periodicity is more common in practice, since it is likely that only some, but not all, of the time episodes appear periodically; this is the topic studied in [9,10,26]. Furthermore, in [30], the authors considered the situation where element mutation is allowed and thus proposed a compatibility matrix to build probabilistic connections between the observed element and the actual element. In addition, Yang et al. investigated the problem of mining asynchronous periodic patterns in time series in [29], considering the case where the presence of periodic patterns may be shifted because of random noise.

There are also some studies on similarity search in time series, such as [1,2]. Among them, Vlachos et al. presented techniques to compute the similarities between trajectories of moving objects in [25]. The trajectory similarity is defined by extending the longest common subsequence (LCSS) model [2] to consider the shift in space when comparing two trajectories. Note that, if the spatial shift is not allowed, two moving objects are considered similar if their trajectories are close to each other. Nevertheless, Vlachos et al. focused on designing an efficient algorithm to find the trajectory, from a set of trajectories, that is most similar to a given query trajectory. They did not consider the problem of finding all the clusters of trajectories that are physically close to one another.

In [23], Shekhar et al. proposed approaches to discover spatial co-location patterns, each defined as a set of features that are physically close to one another. However, the locations of the features are static. This is very different from valid group pattern mining, where we deal with users moving continuously.
As discussed in Section 3, if each user is viewed as an item and each group is viewed as an itemset, mining valid groups looks like mining frequent itemsets in association rule mining [3,11]. Unfortunately, we have shown that the existing algorithms in association rule mining cannot be applied directly to valid group mining because of the unique features of the valid group pattern and weight definition.

8. Conclusion

This paper reports a novel approach to mining user group patterns from user movement data. The discovered group patterns, satisfying both spatial and temporal proximity requirements, could potentially be used in target marketing and personalized services. We formally define the valid group pattern mining problem and develop two algorithms (AGP and VG-growth) for mining valid group patterns. The performance of these two algorithms on synthetic movement databases has been reported. It has been shown that VG-growth is the better algorithm and that the cost of mining group patterns is mainly due to the mining of valid 2-groups.

Since VG-growth and AGP have identical steps for mining valid 2-groups, we further introduce four location summarization methods to reduce the mining overhead. These methods allow valid 2-groups to be mined using a smaller number of summarized records and by examining a smaller number of candidate 2-groups. We also conducted experiments to evaluate the performance of the four methods. The experiment results have shown that our proposed location summarization methods are much more efficient than the AGP algorithm for mining valid 2-groups. The cuboid based methods also perform better than the sphere based methods.

In our current work, we divide the original database D into different fragments of equal length w. In the future, it is worthwhile to study the possibility of using time windows of different sizes to summarize location information. Besides, we will investigate other possible issues related to valid group pattern mining in our future work, such as efficient approaches to deal with missing data or unsynchronized time stamps in the movement database. Our future research will also consider carrying out data cleaning or data transformation on the movement database before mining. There is also ongoing research on extending group pattern mining to dynamic user movement databases.

Appendix A. GSLS and GCLS methods with maximum speed constraint

GSLS2 and GCLS2 are extensions of GSLS and GCLS that incorporate a maximum speed constraint on the mobile users. The maximum distance a user can move within a time window w is denoted by d_w. For GSLS2, we use d_w to determine a suitable radius scale unit c. A smaller c can make the radius of the derived non-grid sphere closer to the original one. Except for the computation of c, the computations of the SM and of the minimum and maximum distances in GSLS2 are the same as in GSLS.

Similarly, in GCLS2, by knowing d_w, we can construct a cube with side length equal to d_w. Then, we divide this cube into a set of sub-cells with sub-cell length l_sub, and give each sub-cell a unique sub-cell id. Similar to Eqs. (21) and (22), the computation of the sub-cell id and the Reverse function can be carried out based on l_sub, N_xsub and N_ysub, where N_xsub = N_ysub = d_w / l_sub. As shown in Fig. A.1, the dashed lines represent the sub-cells. Let the cuboid with thickened borders and two corners, v_min and v_max, be the summarized cuboid in CLS. Recall that, in GCLS, the two corners are represented by the cells containing v_min and v_max.
In GCLS2, the corner v_min is still represented by the corresponding cell containing it. However, the other corner v_max is represented by the sub-cell containing it, denoted by sid_max. That is, in GCLS2, the SM is in the form of ⟨id_min, sid_max⟩.

Fig. A.1. Illustration of GCLS2 methods (axes X, Y, Z with origin O; cells of length l, sub-cells of length l_sub within a cube of side d_w; corners v_min and v_max).

For user u_i at t', the corresponding id_min and sid_max can be determined by:

$$id_{min} = \left\lfloor \frac{u_i[t'].x_{min}}{l} \right\rfloor + N_x \left\lfloor \frac{u_i[t'].y_{min}}{l} \right\rfloor + (N_x N_y) \left\lfloor \frac{u_i[t'].z_{min}}{l} \right\rfloor \qquad (A.1)$$

$$sid_{max} = \left\lfloor \frac{u_i[t'].x_{max} - \left\lfloor \frac{u_i[t'].x_{min}}{l} \right\rfloor l}{l_{sub}} \right\rfloor + N_{xsub} \left\lfloor \frac{u_i[t'].y_{max} - \left\lfloor \frac{u_i[t'].y_{min}}{l} \right\rfloor l}{l_{sub}} \right\rfloor + (N_{xsub} N_{ysub}) \left\lfloor \frac{u_i[t'].z_{max} - \left\lfloor \frac{u_i[t'].z_{min}}{l} \right\rfloor l}{l_{sub}} \right\rfloor \qquad (A.2)$$

Similar to GCLS, in GCLS2, given two grid-cuboids ⟨id^i_min, sid^i_max⟩ and ⟨id^j_min, sid^j_max⟩, the minimum and maximum distances can be determined by first deriving two larger non-grid cuboids, (v'^i_min, v'^i_max) and (v'^j_min, v'^j_max), containing the two original grid-cuboids, i.e.,

$$\begin{aligned}
v'^{i}_{min} &= (id^{i}_{min}.x' \, l,\; id^{i}_{min}.y' \, l,\; id^{i}_{min}.z' \, l) \\
v'^{i}_{max} &= (sid^{i}_{max}.x' \, l_{sub} + l_{sub} + id^{i}_{min}.x' \, l,\; sid^{i}_{max}.y' \, l_{sub} + l_{sub} + id^{i}_{min}.y' \, l,\; sid^{i}_{max}.z' \, l_{sub} + l_{sub} + id^{i}_{min}.z' \, l) \\
v'^{j}_{min} &= (id^{j}_{min}.x' \, l,\; id^{j}_{min}.y' \, l,\; id^{j}_{min}.z' \, l) \\
v'^{j}_{max} &= (sid^{j}_{max}.x' \, l_{sub} + l_{sub} + id^{j}_{min}.x' \, l,\; sid^{j}_{max}.y' \, l_{sub} + l_{sub} + id^{j}_{min}.y' \, l,\; sid^{j}_{max}.z' \, l_{sub} + l_{sub} + id^{j}_{min}.z' \, l)
\end{aligned} \qquad (A.3)$$

The x', y', z' index values of id_min can be obtained by the Reverse function, while those of sid_max can be obtained by applying the Reverse function to the sub-cells, i.e., using N_xsub and N_ysub. Then, the extreme distances are computed from (v'^i_min, v'^i_max) and (v'^j_min, v'^j_max) using Eqs. (17)-(20).
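The GCLS2 encoding described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the parameter values are hypothetical, and the `reverse` helper is assumed to invert a row-major numbering of the form x' + N_x · y' + N_x · N_y · z', since Eqs. (21) and (22) are not reproduced in this part of the paper.

```python
import math

def encode(ix, iy, iz, nx, ny):
    """Combine three grid indices into a single id (assumed row-major numbering)."""
    return ix + nx * iy + nx * ny * iz

def reverse(cell_id, nx, ny):
    """Recover the (x', y', z') indices from an id produced by encode()."""
    iz, rest = divmod(cell_id, nx * ny)
    iy, ix = divmod(rest, nx)
    return ix, iy, iz

def summarize(v_min, v_max, l, l_sub, nx, ny, nx_sub, ny_sub):
    """Return (id_min, sid_max) for a cuboid with corners v_min and v_max."""
    # Cell containing v_min, as in Eq. (A.1).
    cell = [math.floor(c / l) for c in v_min]
    # Sub-cell containing v_max, counted from the origin of that cell, as in Eq. (A.2).
    sub = [math.floor((hi - ci * l) / l_sub) for hi, ci in zip(v_max, cell)]
    return encode(*cell, nx, ny), encode(*sub, nx_sub, ny_sub)

def bounding_cuboid(id_min, sid_max, l, l_sub, nx, ny, nx_sub, ny_sub):
    """Return the enlarged non-grid cuboid (v'_min, v'_max) of Eq. (A.3)."""
    cell = reverse(id_min, nx, ny)
    sub = reverse(sid_max, nx_sub, ny_sub)
    lower = tuple(c * l for c in cell)
    upper = tuple(s * l_sub + l_sub + c * l for s, c in zip(sub, cell))
    return lower, upper

# Hypothetical grid: cells of length 10, sub-cells of length 2.
params = dict(l=10, l_sub=2, nx=100, ny=100, nx_sub=20, ny_sub=20)
id_min, sid_max = summarize((23, 47, 5), (31, 52, 9), **params)
lower, upper = bounding_cuboid(id_min, sid_max, **params)
print(id_min, sid_max, lower, upper)
```

The recovered cuboid always contains the original one, because its lower corner is rounded down to a cell boundary and its upper corner is rounded up to the next sub-cell boundary. The sketch assumes that every sub-cell index stays below the number of sub-cells per axis; otherwise distinct sub-cells would map to the same id.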

Appendix B. Parameter settings for grid based location summarization

We provide the parameter combinations for D'/D = 1/120 and D'/D = 1/12, as shown in Tables B.1 and B.2 respectively. The parameter combinations in boldface are the ones producing the smallest T_2. For example, when D'/D = 1/120, GSLS yields the smallest T_2 when id = 4 bytes, r = 2 bytes, and w = 60.

Table B.1. Parameter settings for D'/D = 1/120 (0.83%). SLS: w = 160. CLS: w = 240. Columns for the grid based methods: id (bytes), r (bytes), sid (bytes), w, d_w, c, l_sub, l.

Table B.2. Parameter settings for D'/D = 1/12 (8.33%). SLS: w = 16. CLS: w = 24. Columns for the grid based methods: id (bytes), r (bytes), sid (bytes), w, d_w, c, l_sub, l.
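The link between D'/D, the summary record size and w can be checked with a short sketch. The record layouts below are assumptions, not taken from the paper: an original record is assumed to hold three 4-byte coordinates per time point, an SLS record a centre plus a radius (16 bytes), and a CLS record two corner points (24 bytes); user identifiers and time stamps are ignored. These assumptions were chosen only because they reproduce the w values quoted in Tables B.1 and B.2.

```python
from fractions import Fraction

ORIGINAL_RECORD_BYTES = 12  # assumed: x, y, z stored as 4 bytes each

def summarized_ratio(summary_record_bytes, w):
    """D'/D when one summary record replaces w original records."""
    return Fraction(summary_record_bytes, ORIGINAL_RECORD_BYTES * w)

def window_size(summary_record_bytes, ratio):
    """Window size w that yields the requested D'/D ratio."""
    w = Fraction(summary_record_bytes, ORIGINAL_RECORD_BYTES) / ratio
    if w.denominator != 1:
        raise ValueError("this ratio does not give a whole-number window size")
    return int(w)

SLS_BYTES = 16      # assumed: centre (x, y, z) plus radius
CLS_BYTES = 24      # assumed: two corner points
GSLS_BYTES = 4 + 2  # id = 4 bytes and r = 2 bytes, as in the Appendix B example

for ratio in (Fraction(1, 12), Fraction(1, 120)):
    print(ratio, window_size(SLS_BYTES, ratio), window_size(CLS_BYTES, ratio))
print(summarized_ratio(GSLS_BYTES, 60))
```

Because the grid based records are smaller, the same D'/D is reached with a smaller w, which matches the observation that the grid based and non-grid based methods end up with different window sizes at a fixed summarized database size.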

References

[1] R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases, in: Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms, Chicago.
[2] R. Agrawal, K.-I. Lin, H. Sawhney, K. Shim, Fast similarity search in the presence of noise, scaling, and translation in time-series databases, in: Proceedings of VLDB'95, Zurich, Switzerland.
[3] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, 1994.
[4] R. Agrawal, R. Srikant, Mining sequential patterns, in: Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, 1995.
[5] B. Hofmann-Wellenhof, H. Lichtenegger, J. Collins, Global Positioning System: Theory and Practice, third revised ed., vol. I, Springer-Verlag Wien, New York.
[6] G. Chen, D. Kotz, A Survey of Context-Aware Mobile Computing Research, Dartmouth Computer Science Technical Report, Department of Computer Science, Dartmouth College.
[7] D. Forsyth, Group Dynamics, third ed., Wadsworth, Belmont, CA.
[8] G. Giaglis, P. Kourouthanassis, A. Tsamakos, Mobile Commerce: Technology, Theory, and Applications, Idea Group Publishing, 2002, Ch. Towards a Classification Network for Mobile Location Services.
[9] J. Han, G. Dong, Y. Yin, Efficient mining of partial periodic patterns in time series database, in: Proceedings of the 15th International Conference on Data Engineering, 1999.
[10] J. Han, W. Gong, Y. Yin, Mining segment-wise periodic patterns in time-related databases, in: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, 1998.
[11] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proceedings of the International Conference on Management of Data, Dallas, TX.
[12] X. Hong, M. Gerla, Dynamic group discovery and routing in ad hoc networks, in: Proceedings of the First Annual Mediterranean Ad Hoc Networking Workshop (Med-hoc-Net 2002), Sardegna, Italy.
[13] Y. Huh, C. Kim, Group-based location management scheme in personal communications networks, in: 16th International Conference on Information Networking (ICOIN-16), Korea.
[14] J. Kaufman, J. Myllymaki, J. Jackson, IBM Almaden Research Center (December 2001).
[15] G.H.K. Lam, H.V. Leong, S.C.F. Chan, GBL: Group-based location updating in mobile wireless environment, in: 9th International Conference on Database Systems for Advanced Applications (DASFAA 2004), Korea.
[16] H. Mannila, H. Toivonen, A.I. Verkamo, Discovery of frequent episodes in event sequences, Data Mining and Knowledge Discovery 1 (3) (1997).
[17] B. Özden, S. Ramaswamy, A. Silberschatz, Cyclic association rules, in: Proceedings of the 14th International Conference on Data Engineering, Orlando, Florida, USA, 1998.
[18] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: mining sequential patterns efficiently by prefix projected pattern growth, in: Proceedings of the 17th International Conference on Data Engineering (ICDE'01), 2001. Available from: <citeseer.ist.psu.edu/article/pei01prefixspan.html>.
[19] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, Mining sequential patterns by pattern-growth: the PrefixSpan approach, IEEE Transactions on Knowledge and Data Engineering 16 (11) (2004).
[20] J. Pei, J. Han, W. Wang, Mining sequential patterns with constraints in large databases, in: Proceedings of 2002 International Conference on Information and Knowledge Management (CIKM'02).
[21] G.-C. Roman, Q. Huang, A. Hazemi, Consistent group membership in ad hoc networks, in: International Conference on Software Engineering, 2001.
[22] J. Schafer, J. Konstan, J. Riedl, E-commerce recommendation applications, Data Mining and Knowledge Discovery 5 (1/2) (2001).
[23] S. Shekhar, Y. Huang, Discovering spatial co-location patterns: a summary of results, Lecture Notes in Computer Science 2121 (2001). Available from: <citeseer.ist.psu.edu/article/shekhar01discovering.html>.
[24] R. Srikant, R. Agrawal, Mining sequential patterns: generalizations and performance improvements, in: P.M.G. Apers, M. Bouzeghoub, G. Gardarin (Eds.), Proceedings of the 5th International Conference on Extending Database Technology, EDBT, vol. 1057, Springer-Verlag, 1996. Available from: <citeseer.nj.nec.com/article/srikant96mining.html>.
[25] M. Vlachos, G. Kollios, D. Gunopoulos, Discovering similar multidimensional trajectories, in: Proceedings of the 18th International Conference on Data Engineering (ICDE'02), San Jose, CA.
[26] W. Wang, J. Yang, P. Yu, InfoMiner+: mining partial periodic patterns with gap penalties, in: Proceedings of the 2nd IEEE International Conference on Data Mining (ICDM).
[27] Y. Wang, E.-P. Lim, S.-Y. Hwang, On mining group patterns of mobile users, in: Proceedings of the 14th International Conference on Database and Expert Systems Applications DEXA 2003, Prague, Czech Republic.
[28] Y. Wang, E.-P. Lim, S.-Y. Hwang, Efficient group pattern mining using data summarization, in: Proceedings of the 9th International Conference on Database Systems for Advanced Applications DASFAA 2004, Jeju Island, Korea.
[29] J. Yang, W. Wang, P. Yu, Mining asynchronous periodic patterns in time series data, IEEE Transaction on Knowledge and Data Engineering (TKDE) 15 (3) (2003).
[30] J. Yang, P. Yu, W. Wang, J. Han, Mining long sequential patterns in a noisy environment, in: Proceedings of 2002 ACM-SIGMOD International Conference on Management of Data (SIGMOD'02).
[31] M.J. Zaki, SPADE: an efficient algorithm for mining frequent sequences, Special Issue on Unsupervised Learning (Doug Fisher, ed.), Machine Learning Journal 42 (2001).
[32] P. Zarchan, Global Positioning System: Theory and Applications, vol. I, American Institute of Aeronautics and Astronautics.
[33] V. Zeimpekis, G.M. Giaglis, G. Lekakos, A taxonomy of indoor and outdoor positioning techniques for mobile location services, SIGecom Exchanges, ACM vol. 3.4, 2003.

Yida Wang is a Ph.D. candidate with the School of Computer Engineering at the Nanyang Technological University, Singapore. He obtained his B.Eng. degree from Xi'an Jiaotong University, China. His research interests include data mining, mobile computing, and context aware systems.
He has published in international conferences including DEXA and DASFAA.

Ee-Peng Lim is an Associate Professor with the School of Computer Engineering, Nanyang Technological University, Singapore. He obtained his Ph.D. from the University of Minnesota, Minneapolis. Upon graduation, he started his academic career at the Nanyang Technological University (NTU). In 1997, he established the Centre for Advanced Information Systems and was appointed the Centre Director. He was later appointed a visiting professor at the Chinese University of Hong Kong from December 2001 to June [...]. Upon his return to NTU, he started heading the Division of Information Systems within the School of Computer Engineering. He has published more than 120 refereed journal and conference articles in the areas of web/data mining, digital libraries and database integration. He is currently an Associate Editor of the ACM Transactions on Information Systems (TOIS) and the International Journal of Digital Libraries (IJDL).

San-Yih Hwang received the B.S. and M.S. degrees from National Taiwan University, Taiwan, and the Ph.D. degree from the University of Minnesota, Minneapolis in 1994, all in computer science. He joined the Department of Information Management at National Sun Yat-sen University, Taiwan, in 1995 and is presently a professor. His current research interests include workflow systems, data mining, and moving object databases.

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna [email protected] Abstract.

More information

The Greedy Method. Introduction. 0/1 Knapsack Problem

The Greedy Method. Introduction. 0/1 Knapsack Problem The Greedy Method Introducton We have completed data structures. We now are gong to look at algorthm desgn methods. Often we are lookng at optmzaton problems whose performance s exponental. For an optmzaton

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

Efficient Project Portfolio as a tool for Enterprise Risk Management

Efficient Project Portfolio as a tool for Enterprise Risk Management Effcent Proect Portfolo as a tool for Enterprse Rsk Management Valentn O. Nkonov Ural State Techncal Unversty Growth Traectory Consultng Company January 5, 27 Effcent Proect Portfolo as a tool for Enterprse

More information

Software project management with GAs

Software project management with GAs Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de

More information

Recurrence. 1 Definitions and main statements

Recurrence. 1 Definitions and main statements Recurrence 1 Defntons and man statements Let X n, n = 0, 1, 2,... be a MC wth the state space S = (1, 2,...), transton probabltes p j = P {X n+1 = j X n = }, and the transton matrx P = (p j ),j S def.

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems

More information

Generalizing the degree sequence problem

Generalizing the degree sequence problem Mddlebury College March 2009 Arzona State Unversty Dscrete Mathematcs Semnar The degree sequence problem Problem: Gven an nteger sequence d = (d 1,...,d n ) determne f there exsts a graph G wth d as ts

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

v a 1 b 1 i, a 2 b 2 i,..., a n b n i.

v a 1 b 1 i, a 2 b 2 i,..., a n b n i. SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 455 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces we have studed thus far n the text are real vector spaces snce the scalars are

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information
