Neighborhood Based Fast Graph Search in Large Networks


Arijit Khan, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Ziyu Guan, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Nan Li, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Supriyo Chakraborty, Dept. of Electrical Engineering, University of California, Los Angeles, CA
Xifeng Yan, Dept. of Computer Science, University of California, Santa Barbara, CA 93106
Shu Tao, IBM T. J. Watson, 19 Skyline Drive, Hawthorne, NY 10532, shutao@us.ibm.com

ABSTRACT

Complex social and information network search becomes important with a variety of applications. At the core of these applications lies a common and critical problem: given a labeled network and a query graph, how to efficiently search the query graph in the target network. The presence of noise and the incomplete knowledge about the structure and content of the target network make it unrealistic to find an exact match. Rather, it is more appealing to find the top-k approximate matches. In this paper, we propose a neighborhood-based similarity measure that could avoid costly graph isomorphism and edit distance computation. Under this new measure, we prove that subgraph similarity search is NP-hard, while graph similarity match is polynomial. By studying the principles behind this measure, we found an information propagation model that is able to convert a large network into a set of multidimensional vectors, where sophisticated indexing and similarity search algorithms are available. The proposed method, called Ness (Neighborhood Based Similarity Search), is appropriate for graphs with low automorphism and high noise, which are common in many social and information networks. Ness is not only efficient, but also robust against structural noise and information loss. Empirical results show that it can quickly and accurately find high-quality matches in large networks, with negligible cost.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process; I.2.8 [Problem Solving, Control Methods, and Search]: Graph and tree search strategies

General Terms
Algorithms, Performance

Keywords
Graph Query, Graph Search, Graph Alignment, RDF

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD'11, June 12-16, 2011, Athens, Greece. Copyright 2011 ACM.

1. INTRODUCTION

Recent advances in social and information science have shown that linked data pervade our society and the natural world around us [36]. Graphs become increasingly important to represent complicated structures and schema-less data such as Wikipedia, Freebase [5] and various social networks. Given an attributed network and a small query graph, how to efficiently search the query graph in the target network is a critical task for many graph applications. It has been extensively studied in chemi-informatics, bioinformatics, XML and the Semantic Web.

SPARQL [27] is the state-of-the-art RDF query language for the Semantic Web. SPARQL requires accurate knowledge about the graph structure to write a query, and it performs exact graph pattern matching. However, due to the noise and the incomplete information (structure and content) in many networks, it is not realistic to find exact matches for a given query. It is more appealing to find the top-k approximate matches. Unfortunately, graph similarity measures such as subgraph isomorphism, maximum common subgraphs, graph edit distance and missing edges, which are appropriate for chemical structures and biological networks, are not suitable for entity-relationship graphs and social networks. There are two challenging issues for these graph theoretic measures.
First, entity-relationship graphs and social networks have quite different characteristics from physical networks. They are not governed by physical laws and are often full of noise, thus making strict topological similarity examination nearly impossible. How the entities are connected in these networks is not as important as how closely these entities are connected. Second, these graphs are very large and complex, with a lot of attributes associated. If accuracy is to be ensured, the algorithms developed for edit distance and missing edges are not scalable. These two issues motivate us to invent new graph similarity measures that are less sensitive to structure changes, and have scalable indexing and search solutions.

Figure 1(a) shows a graph query to "Find the athlete who is from Romania and won gold in 3000m and bronze in 1500m, both in the 1984 Olympics." Comparing this query against a possible match in Freebase (Olympics) shown in Figure 1(b), it is observed that these two graphs are by no means similar under traditional graph similarity definitions. The graph edit distance between these two graphs is 7. The size of their maximum common graph is 3. The number of maximum missing edges for the query graph is 4. However, Maricica Puica in Figure 1(b) is a good match for the query shown in Figure 1(a), because she has all these attributes quite close to her in Figure 1(b).

[Figure 1: Top-1 Match for Query (a) in Freebase. Panels: (a) Query; (b) Match in Freebase.]

In practice, it is hard to come up with a query that exactly conforms with the graph structures in the target network, due to the lack of schemas in linked data. However, it is easy to write a query like Figure 1(a), where a user connects entities with possible links. As long as the proximity between these entities is approximately maintained in a query graph, the system shall be able to deliver matches like Figure 1(b).

The above approximate query form can serve as a primitive for many advanced graph operators such as RDF query answering, network alignment, subgraph similarity search, name disambiguation and database schema matching. For example, based on partial information related to one person, e.g. his friends, one can align his physical social circle with his cyber social network on Facebook. In many cases, nodes in social or information networks have incomplete information or even anonymized information. Nevertheless, the partial neighborhood information available from a query graph will be helpful to identify entities in the target network.

Clearly, there is a need to adopt approximate similarity search techniques to solve the above problem. In bioinformatics, approximate graph alignment has been extensively studied, e.g. PathBlast [2], Saga [33]. These studies resort to a strict approximation definition such as graph edit distance, whose optimal solution is expensive to compute. Since they are targeting relatively small biological networks with less than 10k nodes, it is difficult to apply them to social and information networks with thousands or even millions of nodes.
As illustrated in NetAlign [23], in order to handle large graphs with 10k nodes, one has to sacrifice accuracy to achieve better query response time. Recently there have been other studies on approximate matching with large graphs, i.e., TALE [34], SIGMA [24] and G-Ray [35]. However, both TALE and SIGMA consider the number of missing edges as the qualitative measure of approximate matching and hence, the techniques cannot capture the notion of proximity among labels, as shown in Figure 1. G-Ray, on the other hand, tries to maintain the shape of the query by allowing some approximation in the match. Unfortunately, shape is not an important factor in entity-relationship graphs.

In this paper, we introduce a novel neighborhood-based similarity measure by vectorizing nodes according to the label distribution of their neighbors. We further extend the similarity notion to graphs by finding the embeddings in the target graph that maximize the sum of node matches. This graph matching technique avoids complicated subgraph isomorphism and graph edit distance calculation, which become infeasible for large graphs. It is observed that social/information networks usually have more diversified node labels and therefore less auto-isomorphic structure, but may contain more noise. Our objective function can provide better similarity semantics for graphs with various random noise. It simplifies the procedure of graph matching, leading to the development of an efficient graph search framework, called Ness (Neighborhood Based Similarity Search). With the introduction of scalable indices built on vectorized nodes and an intelligent query optimization technique, Ness can quickly and accurately find high-quality matches in large networks, with negligible time cost.

Our contributions. We propose a novel similarity search problem in graphs, neighborhood-based similarity search, which combines the topological structure and content information together during the search process. The similarity definition proposed in this work is able to avoid expensive isomorphism testing as much as possible.
The principles to derive appropriate functions to fit this definition are carefully examined. We found that the information propagation model satisfies these principles, where each node propagates a certain fraction of its labels to its neighbors, and thereby we could convert each node into a multidimensional vector, where sophisticated indexing and similarity search algorithms are available. That is, we successfully turn a graph search problem into a high-dimension index problem.

We first identify a set of rules to define approximate matches of nodes based on their neighborhood structure and labels. These rules are important since the query may not always have complete information about the exact neighborhood structure in the target graph. The approximate node match concept is further extended to subgraph similarity search, i.e. multiple node alignment for a given query graph. We prove that under this measure, subgraph similarity search is NP-hard. However, in comparison with graph isomorphism, which is neither known to be solvable in polynomial time nor known to be NP-complete, graph similarity match is proved to be polynomial.

We demonstrate that, without performing subgraph isomorphism testing, it is possible to prune unpromising nodes by iteratively propagating node information among a shrinking candidate set, which significantly reduces query execution time.

We further analyze how to index the vector structure as well as optimize query processing to speed up similarity search. The information propagation model and the neighborhood vectorization approach keep the index structure much simpler than the graph itself, thus making it easy to update dynamically for graph changes arising from node/edge insertion and deletion.

In summary, we propose a completely new graph similarity search framework, Ness, to define and determine approximate matches in massive graphs. As tested in real and synthetic networks, Ness is able to find high-quality matches efficiently in large scale networks.

2. PRELIMINARIES

A labeled graph $G = (V_G, E_G, L_G)$ has a label set $L_G$, and each node $u \in V_G$ is attached with a set of labels.
The label set of node $u$ in $G$ is denoted by $L(u) \subseteq L_G$. For the sake of simplicity, we assume there are no labels and weights on the edges. Nevertheless, the proposed techniques could be extended to graphs with labeled or weighted edges.

Given two labeled graphs $G$ and $G'$, $G$ is called subgraph isomorphic to $G'$ if there exists a subgraph $H$ of $G'$ such that $G$ is isomorphic to $H$. Formally, we define subgraph isomorphism as follows.

DEFINITION 1 (SUBGRAPH ISOMORPHISM). A subgraph isomorphism is an injective function $f : V_G \rightarrow V_{G'}$, s.t., (1) $\forall u \in V_G$, $L(u) \subseteq L(f(u))$, and (2) $\forall (u, v) \in E_G$, $(f(u), f(v)) \in E_{G'}$.

DEFINITION 2 (EMBEDDING). Given a graph $G$ and a query graph $Q$, an embedding of $Q$ is an (injective) function $f : V_Q \rightarrow V_G$, such that $\forall v \in V_Q$, $L(v) \subseteq L(f(v))$, where $f(v) \in V(G)$.

In this work, we only study the one-to-one node matching for a query graph $Q$, and the node labels are preserved in the embedding. However, our cost function and algorithms can be extended to include other matching and node label similarity scenarios.

Given two graphs $G$ and $Q$, there might be many possible embeddings. Certainly, the quality of an embedding depends on whether it preserves the connections and labels in the query graph or not. Subgraph isomorphism actually defines an exact embedding, written as $f_e$. The quality of an embedding can be defined in various ways; e.g., for a given label-preserved embedding $f$, we can count the number of edge mismatches, $C_e = |\{(u, v) \in E_Q : (f(u), f(v)) \notin E_G\}|$, as the embedding's quality. In general, for a cost function $C : f \rightarrow \mathbb{R}$, we define the top-k graph similarity search problem as below.

PROBLEM STATEMENT 1. Given a graph $G$ and a query graph $Q$, find the top-k embeddings with respect to a cost function $C$.

The edge mismatch cost function $C_e$ has been studied in [38, 34, 24]. Unfortunately, it cannot differentiate the case where two nodes are close to each other but there is no direct edge between them.

[Figure 2: Problem with Edge Mismatch Cost Function. Embeddings $f_1$ and $f_2$ in target graph $G$; query graph $Q$.]

[Figure 3: Information Propagation Model]

Figure 2 shows one example. There are two label-preserved embeddings $f_1$ and $f_2$ of the query graph $Q$ in the target graph $G$. In $f_1$ and $f_2$, there is no edge connecting a and b. Thus, $C_e$ will assign equal cost to both embeddings. On the other hand, the graph edit distance between $f_1$ and $Q$ is 2, whereas it is only 1 between $f_2$ and $Q$. Intuitively, however, $f_1$ is a better match than $f_2$, because the nodes with labels a and b are only 2 hops away in $f_1$, whereas they are disconnected in $f_2$.
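The limitation of the edge mismatch cost can be illustrated with a minimal Python sketch. The graph below is a hypothetical toy example (it is not the graph of Figure 2), and the names `edge_mismatch_cost` and `hop_distance` are illustrative only. Both embeddings miss exactly one query edge, so $C_e$ cannot tell them apart, although the labels a and b are 2 hops apart in one embedding and disconnected in the other.

```python
from collections import deque

def edge_mismatch_cost(query_edges, target_edges, f):
    """C_e: number of query edges whose image under f is not a target edge."""
    target = {frozenset(e) for e in target_edges}
    return sum(1 for (u, v) in query_edges
               if frozenset((f[u], f[v])) not in target)

def hop_distance(edges, src, dst):
    """Shortest hop count between src and dst (BFS); None if disconnected."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

# Hypothetical query: v1 (label a) linked to v2 (label b) and v3 (label c).
query_edges = [("v1", "v2"), ("v1", "v3")]
# Hypothetical target: a-c-b path (u1, u3, u2) and a separate a-c edge
# (u1p, u3p) with an isolated b node (u2p).
target_edges = [("u1", "u3"), ("u3", "u2"), ("u1p", "u3p")]
f1 = {"v1": "u1", "v2": "u2", "v3": "u3"}
f2 = {"v1": "u1p", "v2": "u2p", "v3": "u3p"}

cost_f1 = edge_mismatch_cost(query_edges, target_edges, f1)
cost_f2 = edge_mismatch_cost(query_edges, target_edges, f2)
dist_f1 = hop_distance(target_edges, "u1", "u2")    # a and b in f1
dist_f2 = hop_distance(target_edges, "u1p", "u2p")  # a and b in f2
print(cost_f1, cost_f2, dist_f1, dist_f2)
```

Under these assumptions the two costs are equal while the hop distances differ, which is the gap a proximity-aware measure is meant to close.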
This observation inspires us to develop a neighborhood-based similarity measure that discounts how nodes are exactly connected, but focuses on the proximity among the labels carried by these nodes. It needs to achieve the following two objectives: (1) the cost function should identify approximate embeddings, and (2) it must be easy to compute. In the next section, we will define the neighborhood-based similarity cost function and give the complexity analysis of that function.

3. NEIGHBORHOOD-BASED GRAPH SIMILARITY

In order to solve the problem raised by the edge mismatch cost function, we define a novel neighborhood-based similarity measure by comparing the h-hop neighbors of a node, defined as follows.

DEFINITION 3 (h-hop NEIGHBORS). Given a graph $G$ and a node $u \in V(G)$, the h-hop neighborhood of $u$ is the set of nodes $v$ whose distance from $u$ is less than or equal to $h$.

To compare the neighborhoods of two nodes, we resort to an information propagation model [22] that is able to transform neighborhoods into vectors in a multidimensional space, where sophisticated indexing and fast similarity search algorithms are available.

3.1 Information Propagation Model

Figure 3 shows the information propagation model to characterize the neighborhood information around a node $u$. The label information encoded in $u$'s neighbors is propagated to $u$ through different paths and accumulated at $u$. One could use the accumulated information and its strength as a vector to describe the neighborhood of $u$. The neighborhood vector of $u$ is denoted by $R(u)$, which consists of a set of tuples, $R(u) = \{\langle l, A(u, l) \rangle\}$, where $l$ is a label present in the neighborhood of $u$ and $A(u, l)$ represents the strength of label $l$ at node $u$ in a graph.

There are many different mechanisms to propagate information. However, not every one is valid for graph similarity search. Any valid one must comply with the following principle.

PROPERTY 1 (COST FUNCTION). For a graph similarity cost function $C$, given an exact embedding $f_e$, $C(f_e)$ must be equal to 0.
Here, we consider a simple but effective information propagation model so that the derived neighborhood-based similarity measure satisfies the above principle. It propagates information along the shortest paths between two nodes, with exponential decay in the path length. Eq. 1 describes the formula of $A(u, l)$ in $R(u) = \{\langle l, A(u, l) \rangle\}$ that represents the h-hop neighborhood of node $u$ in a graph.

$$A(u, l) = \sum_{i=1}^{h} \alpha^{i} \sum_{d(u,v)=i} I(l \in L(v)), \qquad (1)$$

where $I(l \in L(v))$ is an indicator function which takes value one when $l$ is in the label set of $v$ and zero otherwise. $d(u, v)$ is the distance between $u$ and $v$. $\alpha$ is a constant called the propagation factor. It is between 0 and 1; its optimum value will be discussed later.

Eq. 2 confines Eq. 1 to an embedding $f$ in $G$ by only considering the vertices and the shortest paths in $f$.

$$A_f(u, l) = \sum_{i=1}^{h} \alpha^{i} \sum_{v \in V_f,\, d(u,v)=i} I(l \in L(v)). \qquad (2)$$

Using this information propagation model, we shall formulate the neighborhood-based cost function.

3.2 Neighborhood-based Cost Function

Given a query graph $Q$ and its embedding $f$ in the target graph $G$, we can apply the information propagation model to propagate labels in $Q$ and $f$. Since vertices in $f$ might not be directly connected, we will consider all of the shortest paths connecting these vertices during propagation. To derive the neighborhood-based cost function $C_N(f)$, we first compute the difference between the neighborhood vectors $R_f(u)$ and $R_Q(v)$, representing the neighborhoods of $u$ and $v$ in the embedding and the query graph, respectively.

$$C_N(v, u) = \sum_{l \in R_Q(v)} M(A_Q(v, l), A_f(u, l)), \qquad (3)$$

where $M(x, y)$ is a positive difference function as given below.

$$M(x, y) = \begin{cases} x - y, & \text{if } x > y; \\ 0, & \text{otherwise.} \end{cases}$$

The reason to adopt a positive difference function is that if the embedding $f$ in $G$ carries more labels than $Q$, we shall not penalize it. Only when there are labels and edges missed in $f$ will $C_N(v, u)$ return a positive value. Note that the summation in Eq. 3 is taken over all labels $l$ present in $R_Q(v)$, i.e. $\{l : A_Q(v, l) > 0\}$. For brevity, we simply denote this by $l \in R_Q(v)$ in Eq. 3, and the same notation will be used in the remainder of the paper.

Given an embedding $f$, we aggregate the differences for all pairs $(v, u)$, where $u = f(v)$. The neighborhood-based graph similarity cost $C_N(f)$ is given as follows.

$$C_N(f) = \sum_{v \in V_Q} C_N(v, f(v)) \qquad (4)$$

[Figure 4: Neighborhood Based Similarity Cost. Embeddings $f_1$ and $f_2$ in target graph $G$; query graph $Q$.]

[Figure 5: Example of False Positive. Embedding $f$ in target graph $G$; query graph $Q$.]

Figure 4 provides an example of the neighborhood-based graph matching cost. In graph $G$, label b is propagated to node $u_1$ from nodes $u_2$ and $u'_2$, via the corresponding shortest paths respectively. Assuming $\alpha = 0.5$ and $h = 2$, we have $A_G(u_1, b) = 0.5 + 0.25 = 0.75$. We can derive the neighborhood vectors for the other nodes in $G$: $R_G(u_1) = \{\langle b, 0.75 \rangle, \langle c, 0.5 \rangle\}$, $R_G(u_2) = \{\langle a, 0.5 \rangle, \langle c, 0.25 \rangle\}$, $R_G(u_3) = \{\langle a, 0.5 \rangle, \langle b, 0.75 \rangle\}$ and $R_G(u'_2) = \{\langle c, 0.5 \rangle, \langle a, 0.25 \rangle\}$. Similarly, $R_Q(v_1) = \{\langle b, 0.5 \rangle\}$ and $R_Q(v_2) = \{\langle a, 0.5 \rangle\}$.

In Figure 4, we have two possible embeddings $f_1$ and $f_2$. For $f_1$, $R_{f_1}(u_1) = \{\langle b, 0.5 \rangle\}$ and $R_{f_1}(u_2) = \{\langle a, 0.5 \rangle\}$. Hence, $C_N(f_1) = (0.5 - 0.5) + (0.5 - 0.5) = 0$. For $f_2$, we match $v_1$ to $u_1$ and $v_2$ to $u'_2$. We have $R_{f_2}(u_1) = \{\langle b, 0.25 \rangle\}$ and $R_{f_2}(u'_2) = \{\langle a, 0.25 \rangle\}$. Therefore, $C_N(f_2) = (0.5 - 0.25) + (0.5 - 0.25) = 0.5$. Note that, for the embedding $f_2$, node $u_3$ will not contribute any labels to $R_{f_2}$ since it does not participate in the matching. However, it is on the shortest path from $u'_2$ to $u_1$, thus propagating labels between $u'_2$ and $u_1$.

We must mention that the vectorization of the neighborhoods and the comparison among these vectors can be done in various ways.
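The worked example can be replayed with a short Python sketch of Eq. 1 to Eq. 4. The target graph below is an assumption: it is a four-node path inferred from the neighborhood vectors quoted in the text (the figure itself did not survive extraction), and the function names are illustrative. Following the remark about $u_3$, distances are shortest paths in the whole target graph, while only nodes that belong to the embedding contribute labels.

```python
from collections import deque, defaultdict

def bfs_distances(adj, src, h):
    """Shortest-path hop counts from src to every node within h hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond h hops
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def neighborhood_vector(adj, labels, u, alpha, h, members=None):
    """Eq. 1; if `members` is given, Eq. 2 (only embedding nodes add labels)."""
    vec = defaultdict(float)
    for v, d in bfs_distances(adj, u, h).items():
        if d == 0 or (members is not None and v not in members):
            continue
        for l in labels[v]:
            vec[l] += alpha ** d
    return dict(vec)

def embedding_cost(q_adj, q_labels, g_adj, g_labels, f, alpha, h):
    """Eq. 3 and Eq. 4: positive differences summed over all matched pairs."""
    members = set(f.values())
    total = 0.0
    for v, u in f.items():
        r_q = neighborhood_vector(q_adj, q_labels, v, alpha, h)
        r_f = neighborhood_vector(g_adj, g_labels, u, alpha, h, members)
        total += sum(max(a - r_f.get(l, 0.0), 0.0) for l, a in r_q.items())
    return total

# Assumed target graph: path u2 - u1 - u3 - u2p with labels b, a, c, b.
g_adj = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1", "u2p"], "u2p": ["u3"]}
g_labels = {"u1": {"a"}, "u2": {"b"}, "u3": {"c"}, "u2p": {"b"}}
# Query graph: v1 (label a) - v2 (label b).
q_adj = {"v1": ["v2"], "v2": ["v1"]}
q_labels = {"v1": {"a"}, "v2": {"b"}}

r_u1 = neighborhood_vector(g_adj, g_labels, "u1", 0.5, 2)
cost_f1 = embedding_cost(q_adj, q_labels, g_adj, g_labels,
                         {"v1": "u1", "v2": "u2"}, 0.5, 2)
cost_f2 = embedding_cost(q_adj, q_labels, g_adj, g_labels,
                         {"v1": "u1", "v2": "u2p"}, 0.5, 2)
print(r_u1, cost_f1, cost_f2)  # costs: 0.0 and 0.5
```

With this assumed graph the sketch reproduces $R_G(u_1)$, $C_N(f_1) = 0$ and $C_N(f_2) = 0.5$ from the example.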
Whatever vectorization is chosen, the final cost function must satisfy the basic property of $C$ (Property 1) to avoid false negatives for exact embeddings. The following theorem shows that $C_N$ follows this property.

THEOREM 1. For an exact embedding $f_e$, $C_N(f_e) = 0$.

PROOF. For an exact embedding $f_e$, if $(v_1, v_2) \in E_Q$, then $(f_e(v_1), f_e(v_2)) \in E_G$. Thus, the shortest distance between the node pair $f_e(v_1), f_e(v_2)$ in $f_e$ cannot be higher than the shortest distance between the node pair $v_1, v_2$ in $Q$. Hence, it follows from Eq. 1 that $\forall l, \forall v$, $A_{f_e}(f_e(v), l) \geq A_Q(v, l)$. Therefore, based on Eq. 3 and Eq. 4, $C_N(f_e) = 0$.

Theorem 1 ensures that there are no false negatives for exact embeddings. However, there might be some false positives, as shown in Figure 5. In this example, if $h = 1$, $C_N(f) = 0$, although $f$ is not an exact embedding of $Q$. Fortunately, if we increase $h$ to 2, $C_N(f) > 0$. In real-life graphs that have low automorphism and more distinct labels on nodes, false positives can mostly be avoided, as shown in our experiments and in the following lemma.

LEMMA 1. Given a graph $G$ and a query graph $Q$, if each of their nodes has a distinct label, then for any inexact embedding $f$, $\forall h > 0$, $\alpha > 0$, $C_N(f) > 0$.

PROOF. Omitted.

Our definition of the neighborhood-based cost function is robust against structural differences and other forms of noise. As long as two close labels in a query graph are close enough in the target graph, we consider it a potential match. We can also rank the embeddings based on the proximity of their labels in the target graph compared to that in the query graph. Thus, even if there exists no exact embedding of the query graph, the cost function can identify the closely approximate matches and rank them based on their structural differences. We formally define our problem statement as follows.

PROBLEM STATEMENT 2. [Neighborhood-Based Top-k Similarity Search] Given a target graph $G$ and a query graph $Q$, find the top-k embeddings with respect to the cost function $C_N$.

In the following discussion, we show that the above problem is NP-hard by reducing the clique problem to it.

LEMMA 2.
Given a graph $G$ and a query graph $Q$ with $\forall u \in V_G, \forall v \in V_Q$, $|L(u)| = 1$, $|L(v)| = 1$, if $Q$ is a complete graph, then for all inexact embeddings $f$, $C_N(f) > 0$.

PROOF. Since $\forall u \in V_G, \forall v \in V_Q$, $|L(u)| = 1$, $|L(v)| = 1$, for any inexact embedding $f$, each node $u = f(v)$ has only one label, which is the same as the label of node $v$ in $Q$. Since $Q$ is a complete graph, there exists at least one node $f(v)$ in $f$ and a label $l$ such that the number of 1-hop neighbors of $v$ in $Q$ that have label $l$ is more than the number of 1-hop neighbors of $f(v)$ in $f$ with label $l$. Hence, $A_Q(v, l) > A_f(f(v), l)$. Therefore, it follows from the definition of $C_N$ that $C_N(f) > 0$.

THEOREM 2. Neighborhood-Based Top-k Similarity Search is NP-hard.

PROOF. Let us consider the case where $|L(u)| = 1$, $|L(v)| = 1$, $\forall u \in V_G, \forall v \in V_Q$, and $Q$ is a complete graph. Suppose the top-1 match $f$ can be identified in polynomial time. Given $f$, it can also be verified in polynomial time whether $C_N(f) = 0$. Now, if $C_N(f) = 0$, by Lemma 2, there exists a clique of size $|V_Q|$ in the target graph $G$. So, it would be possible to solve the clique problem in polynomial time. However, the clique decision problem is known to be NP-hard [10]. Hence, the similarity search problem is NP-hard.

The graph isomorphism problem is neither known to be solvable in polynomial time nor known to be NP-complete. However, given two graphs $Q$ and $G$ of the same size, it is possible to determine in polynomial time whether $G$ itself is an embedding of $Q$ with cost $C_N(f) = 0$. We call this the Graph Similarity Match problem. Thus, we suspect that neighborhood-based similarity search might have lower time complexity than graph theoretic measures such as graph isomorphism and edit distance.

THEOREM 3. Graph Similarity Match is polynomial in $n$, where $n = |V_Q|$.

PROOF. Since $G$ itself is an embedding $f$ of $Q$, we can determine the individual node matching costs $C_N(v, u)$ in polynomial time, for all $v \in V_Q$, $u \in V_G$. Next, we construct a flow network and determine the minimum cost of the maximum flow in that network (see Figure 6). From the source node $s$, add a directed edge to each node $v$ in $Q$. The capacity of each of these edges is 1 and the cost is 0. Similarly, from each node $u$ in $G$, add a directed edge to the sink node $t$. The capacity and cost of each of these edges are 1 and 0 respectively. From each node $v$ in $Q$, add a directed edge to each node $u$ in $G$ if $L(v) \subseteq L(u)$. The capacity and cost of this edge are 1 and $C_N(v, u)$ respectively. Due to the capacity constraints, each node in $Q$ can be matched with at most one node in $G$, and only one node of $Q$ can be matched with the same node in $G$. Clearly, if the maximum flow in this network is $n$ and the minimum cost of the maximum flow is 0, then $G$ is an embedding of $Q$ with cost $C_N(f) = 0$. This flow problem can be solved using the Ford and Fulkerson algorithm [] in $O(n^3)$ time. Therefore, given two graphs $Q$ and $G$ of the same size, it is possible to determine in polynomial time whether $G$ itself is an embedding $f$ of $Q$ with cost $C_N(f) = 0$.

[Figure 6: Flow Network to Solve Graph Similarity Match]

3.3 Propagation Factor: α

In the information propagation model described in Eq. 1, the propagation factor $\alpha$ should be less than 1 in order to reflect the relation that the strength $A(u, l)$ of label $l$ at node $u$ decreases with the increase of distance. However, we find the top-k embeddings by repeatedly matching the individual nodes from $G$ and $Q$ that satisfy a cost threshold $\epsilon$ (the detailed procedure will be discussed in the next section). Now, if $\alpha$ is large, each node will propagate a high fraction of labels to its neighbors, and this can increase the number of false positives at the initial node matching stage, thus slowing down the overall search process. In Figure 7, for $\alpha = 0.5$ and $h = 2$, we get $R_G(u) = \{\langle a, 0.5 \rangle\}$ and $R_Q(v) = \{\langle a, 0.5 \rangle\}$. Thus, node $u \in G$ will be reported as a match of node $v \in Q$ even for cost threshold $\epsilon = 0$. Clearly, this is a false positive.

To solve this problem, we do not employ a uniform propagation factor for different labels. Instead, for each label $l$, we select an optimum $\alpha(l)$. For a given label $l$, let us assume that the maximum number of one-hop neighbors with label $l$ of any node in $G$ is $n_1(l)$. To consider the worst case, let us assume that some node $u$ in $G$ has no one-hop neighbor with label $l$, but it has $n_2(l)$ two-hop neighbors with label $l$, $n_3(l)$ three-hop neighbors with label $l$, and so on. Therefore, the strength of label $l$ at node $u$ in $G$ will be as follows.

$$A_G(u, l) = \sum_{i=2}^{h} n_i(l)\, \alpha^{i}(l) < \frac{n_2(l)\, \alpha^{2}(l)}{1 - n_1(l)\, \alpha(l)} \qquad (5)$$

To avoid a false positive, we want $A_G(u, l) < A_Q(v, l) = \alpha(l)$, as shown in Figure 7. Hence, $\alpha(l) < \frac{1}{n_1(l) + n_2(l)}$.

In the next section, we will introduce an iterative method to find the top-k embeddings in a large graph.

4. SEARCH ALGORITHM

In this section, we introduce a scalable iterative approach to find the top-k graph embeddings. Our goal is not to enumerate all the possible embeddings $f$ in $G$ for a given query graph, whose cost is prohibitive. Instead of enumerating $f$, we directly use $A_G(u, l)$ to bound $A_f(u, l)$, since $A_G(u, l) \geq A_f(u, l)$.

LEMMA 3. Given a query graph $Q$ and its embedding $f$ in $G$, $\forall l, \forall u \in V_f$, $A_G(u, l) \geq A_f(u, l)$.

PROOF. Omitted.

Lemma 3 shows that $A_G(u, l)$ in the neighborhood vector $R_G(u)$ cannot be lower than $A_f(u, l)$ of the same label $l$ in the neighborhood vector $R_f(u)$, where $f$ is a subgraph of $G$.

THEOREM 4. Given a query graph $Q$ and its embedding $f$ in $G$,

$$\sum_{v \in V_Q} \sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(f(v), l)) \leq C_N(f)$$

PROOF. It follows from Lemma 3 that $M(A_Q(v, l), A_f(u, l)) \geq M(A_Q(v, l), A_G(u, l))$.

Theorem 4 shows that without enumerating embeddings of $Q$ in the target graph $G$, we can derive the lower bound $\sum M(A_Q(v, l), A_G(u, l))$, where $u$ is a possible match of $v$ in $G$.

[Figure 7: High α, False Positive for u. $A_G(u, a) = 0.5$ in $G$; $A_Q(v, a) = 0.5$ in $Q$.]

[Figure 8: Node Matching Example]

Our algorithm works by iteratively pruning unpromising nodes in the target graph.

1. Match the individual nodes of the query graph with some nodes in the target graph, which satisfy a predefined cost threshold $\epsilon$ (see Eq. 7).
2. Discard the labels of the unmatched nodes in the target graph.
3. Propagate the labels only among the matched nodes from the previous step. Recompute the neighborhood vectors $R_G(u)$ only for the matched nodes. Repeat Step 1 until convergence.
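The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it recomputes every neighborhood vector from scratch in each round (the paper recomputes only for matched nodes), it uses the node matching cost of Eq. 7 with a uniform $\alpha$, and the names `vectors`, `node_cost` and `iterative_prune`, as well as the toy graph, are hypothetical.

```python
from collections import deque, defaultdict

def vectors(adj, labels, alpha, h):
    """Neighborhood vector R(u) of every node (Eq. 1), via BFS up to h hops."""
    out = {}
    for u in adj:
        dist, queue, vec = {u: 0}, deque([u]), defaultdict(float)
        while queue:
            node = queue.popleft()
            if dist[node] == h:
                continue
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                    for l in labels[nxt]:
                        vec[l] += alpha ** dist[nxt]
        out[u] = dict(vec)
    return out

def node_cost(r_q, r_g):
    """Eq. 7: sum of positive differences between query and target strengths."""
    return sum(max(a - r_g.get(l, 0.0), 0.0) for l, a in r_q.items())

def iterative_prune(g_adj, g_labels, q_adj, q_labels, alpha, h, eps):
    """Repeat: match nodes under eps, unlabel unmatched nodes, recompute."""
    r_q = vectors(q_adj, q_labels, alpha, h)
    live = {u: set(ls) for u, ls in g_labels.items()}  # working copy of labels
    while True:
        r_g = vectors(g_adj, live, alpha, h)
        lists = {v: {u for u in g_adj
                     if q_labels[v] <= live[u]
                     and node_cost(r_q[v], r_g[u]) <= eps}
                 for v in q_adj}
        matched = set().union(*lists.values())
        stale = [u for u in g_adj if live[u] and u not in matched]
        if not stale:          # convergence: nothing left to unlabel
            return lists
        for u in stale:        # step 2: discard labels of unmatched nodes
            live[u] = set()

# Hypothetical query: path v1(a) - v2(b) - v3(c).
q_adj = {"v1": ["v2"], "v2": ["v1", "v3"], "v3": ["v2"]}
q_labels = {"v1": {"a"}, "v2": {"b"}, "v3": {"c"}}
# Hypothetical target: an exact copy u1-u2-u3 plus a star w1(a) with
# leaves w2(b) and w3(c); w1 survives round 1 but is pruned in round 2.
g_adj = {"u1": ["u2"], "u2": ["u1", "u3"], "u3": ["u2"],
         "w1": ["w2", "w3"], "w2": ["w1"], "w3": ["w1"]}
g_labels = {"u1": {"a"}, "u2": {"b"}, "u3": {"c"},
            "w1": {"a"}, "w2": {"b"}, "w3": {"c"}}

result = iterative_prune(g_adj, g_labels, q_adj, q_labels, 0.5, 2, 0.0)
print(result)
```

In this toy run, `w1` initially passes the threshold because its neighbors supply labels b and c; once `w2` and `w3` are unlabeled, `w1` loses that support and is discarded in the next round, leaving only the exact copy.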

During each iteration, we remove the labels of the unmatched nodes in the target graph $G$ and then recompute the neighborhood vectors only for the matched nodes. Since the modified target graph has more unlabeled nodes compared to the previous iteration, it will decrease $A_G(u, l)$. With this new and reduced set of neighborhood vectors, and using the same cost threshold $\epsilon$, we determine the individual node matches with the nodes of the query graph. Therefore, some additional nodes in $G$ will be unmatched at each iteration. The iteration continues until no further unmatched nodes are found. For real-life graphs, with less automorphism and more distinct labels, we can unlabel most of the unpromising nodes using this technique. Thus, finding the top-k embeddings from the set of remaining matched nodes of $G$ becomes almost trivial.

To determine the runtime complexity of our iterative search algorithm, let us denote the number of promising nodes present before the $i$-th iteration as $n_i$ and the number of unpromising nodes discovered at the $i$-th iteration as $k_i$, where $i \geq 1$. Clearly, $n_1 = n$ and $n_{i+1} = n_i - k_i$. If there are $r$ iterations in total, $\sum_{i=1}^{r} k_i = O(n)$. Let the complexity of iteration $i$ be $T_i$. In the first iteration, each node needs to propagate its labels up to $h$ hops. Thus, $T_1 = O(n l d^h)$, where $l$ is the average number of labels and $d^h$ is the average number of h-hop neighbors for each node in $G$. However, for each of the subsequent iterations, it is not necessary to perform such propagation for all the nodes in the graph. Rather, the number of unpromising nodes at iteration $i + 1$, for $i \geq 1$, can be determined either by propagating the remaining $n_{i+1}$ nodes' labels, or by subtracting the effect of the $k_i$ unpromising nodes from the previous iteration. Hence, $T_{i+1} = O(\min\{n_{i+1}, k_i\}\, l d^h)$, for $i \geq 1$. Therefore, the overall runtime complexity of our search algorithm is given as follows.

$$T_1 + \sum_{i=2}^{r} T_i = O(n l d^h) + \sum_{i=1}^{r-1} O(\min\{n_{i+1}, k_i\}\, l d^h) = O(n l d^h) + \sum_{i=1}^{r-1} O(k_i\, l d^h) = O(n l d^h) \qquad (6)$$

In practice, it converges much faster.
Next, we shall discuss the details of the iterative algorithm and the algorithm to find the top-k embeddings from the nodes filtered by the iterative algorithm.

4.1 Node Match

Given the target graph $G$ and the query graph $Q$, we compute the vectors $R_G(u)$ and $R_Q(v)$ for all nodes $u \in V_G$, $v \in V_Q$, considering their h-hop neighborhoods. For each node pair $u \in V_G$, $v \in V_Q$, s.t. $L(v) \subseteq L(u)$, we calculate the node matching cost $cost(u, v)$ as the difference of their neighborhood vectors,

$$cost(u, v) = \sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(u, l)). \qquad (7)$$

Figure 8 shows an example. Assume $\alpha = 0.5$ and $h = 2$. We get $R_G(u) = \{\langle b, 0.5 \rangle, \langle c, 0.5 \rangle\}$, and similarly, $R_G(u') = \{\langle b, 1 \rangle, \langle c, 0.25 \rangle\}$. Meanwhile, for the query graph $Q$, we have $R_Q(v) = \{\langle b, 0.5 \rangle, \langle c, 0.25 \rangle\}$. Hence, $cost(u, v) = 0$ and also $cost(u', v) = 0$ following the above equation.

Now, for each node $v \in V_Q$, we maintain a list of nodes $u \in V_G$ such that $L(v) \subseteq L(u)$ and $cost(u, v) \leq \epsilon$. Here, $\epsilon$ is a predefined cost threshold. The value of $\epsilon$ will be discussed shortly.

4.2 Top-k Search

In order to find the top-k graph embeddings, we initialize the cost threshold $\epsilon$ to a small positive value $\epsilon_0$ and perform the above-mentioned iterative procedure until it terminates. Given the matched nodes, if we cannot find at least $k$ embeddings from them with cost $C_N(f) \leq \epsilon |V_Q|$ each, then the threshold cost $\epsilon$ is doubled and we repeat the above procedure until the $k$ embeddings are found. Otherwise, we find the top-k embeddings among the matched nodes.

Note that, at this point, any embedding formed entirely by unmatched nodes will have cost $C_N(f) > \epsilon |V_Q|$. However, it is possible to have some embedding with a few matched and unmatched nodes, and the cost of such embeddings might also be $C_N(f) \leq \epsilon |V_Q|$. The problem is eliminated as follows. We set $\epsilon$ equal to the highest cost of the discovered top-k embeddings and then run the algorithm again (this step will find top-k embeddings whose node cost might be higher than $\epsilon$). In this case, any embedding formed by at least one of the unmatched nodes will have cost more than that of any of the top-k embeddings found earlier.
Hence, the top-k embeddings identified only using the matched nodes will be the best top-k embeddings. The complete algorithm is given below.

Algorithm 1 Top-k Search
Input: Target graph $G$, query graph $Q$, positive integer $k$.
Output: Top-k matches $f$ based on the cost metric $C_N$.
procedure
1: $\epsilon \leftarrow \epsilon_0$, compute $R_Q(v)$, $\forall v \in V_Q$
2: $list_0(v) = \{u : u \in V_G \wedge L(v) \subseteq L(u)\}$
3: $i \leftarrow 1$, start with original graph $G$ and compute $R_G(u)$, $\forall u \in V_G$
4: for all $v \in V_Q$ do
5:   $list_i(v) = \{u : u \in V_G \wedge L(v) \subseteq L(u) \wedge cost(u, v) \leq \epsilon\}$
6: end for
7: $(list, i)$ = Iterative Unlabel$(list, i, G, Q)$
8: if $k$ matches of cost $C_N(f) \leq \epsilon |V_Q|$ can be found in $\{u : u \in list_i(v) \wedge v \in V_Q\}$ then
9:   report top-k matches and stop
10: else
11:   $\epsilon \leftarrow 2\epsilon$
12:   go back to step 2
13: end if

Algorithm 2 Iterative Unlabel $(list, i, G, Q)$
procedure
1: if $|list_i(v)| < |list_{i-1}(v)|$ for some $v \in V_Q$ then
2:   for all $u \in V_G$ do
3:     if $u \notin list_i(v)$ $\forall v \in V_Q$ then
4:       unlabel $u$
5:     end if
6:   end for
7:   recompute $R(u)$ $\forall u \in V_G$
8:   $(list, i)$ = Iterative Unlabel$(list, i + 1, G, Q)$
9: else
10:   return $(list, i)$
11: end if

From the final list of matched nodes for each node in $V_Q$, how can we find embeddings with cost $C_N(f) \leq \epsilon |V_Q|$ each (line 8 of Algorithm 1)? One simple technique is to consider all possible combinations from the lists and verify their costs. When the number of matched nodes in each of the final lists is small, this is not time consuming to check. However, when the lists are long, we can do better than brute force enumeration using dynamic programming.

After the final list of matched nodes $list(v)$ for each $v \in V_Q$ is generated, we perform the propagation once more among the matched nodes; however, this time we propagate the node ids instead of labels. After this propagation, each matched node $u$ in $G$ will have its neighboring nodes (denoted as $neighbor(u)$) within $h$ hops that have influence on the cost (Eq. 1). The final embeddings can be formed as follows. We select a node $u \in list(v)$ for some $v \in V_Q$ and initialize a set $Possible\_match = neighbor(u)$. We have two situations: (1) within $h$ hops of $u$, there is no $f(v')$, $\forall v' \neq v$ in $Q$; (2) $\forall v' \neq v$ of $Q$, we try to identify a match $u'$ inside $Possible\_match$ and extend this set by adding $neighbor(u')$ and also eliminating the node $u'$ from $Possible\_match$. For the first situation, we could derive the cost for node $u$, $\sum_{l \in L(v)} A_Q(v, l)$. We can recurse among these two situations to find the embeddings. In this way, we can find the low-cost embeddings without enumerating all possible combinations among the nodes in the final lists.

5. INDEXING

The most expensive parts of Ness are the computation of $R_G(u)$ for all $u$ in $G$ (Line 3 of Algorithm 1) and the determination of $list_1(v)$ for all $v$ in $V_Q$ (Line 5 of Algorithm 1). However, the computation of $R_G(u)$ can be done off-line by performing breadth first search up to $h$ hops from each node in $G$. Its time complexity is $O(|V_G| d^h)$, where $d$ is the average degree of each node.

To speed up the computation of $list_1(v)$ for all $v \in V_Q$, we use two types of simple index structures. In the first type of indexing, we build a hash table corresponding to each label. The nodes in $G$ are hashed based on their labels. Given a query node $v$, we use this hash structure to quickly identify the set of possible matches $u$, such that $L(v) \subseteq L(u)$. If the labels of $v$ are very selective, there will be a limited number of possible matches $u$, and we can quickly determine the nodes $u$ among these matches for which $cost(u, v) \leq \epsilon$.
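The first index type can be sketched in Python as a label-to-nodes hash table. The function names `build_label_index` and `candidates` and the toy labels are hypothetical; intersecting posting sets from the smallest one upward is an implementation choice assumed here, not something the text prescribes.

```python
from collections import defaultdict

def build_label_index(labels):
    """Hash table: label -> set of target nodes carrying that label."""
    index = defaultdict(set)
    for node, node_labels in labels.items():
        for l in node_labels:
            index[l].add(node)
    return index

def candidates(index, query_labels):
    """Nodes u with L(v) a subset of L(u): intersect posting sets."""
    if not query_labels:
        return set()  # assumption: an unlabeled query node yields no candidates
    postings = sorted((index.get(l, set()) for l in query_labels), key=len)
    result = set(postings[0])      # start from the most selective label
    for p in postings[1:]:
        result &= p
    return result

# Hypothetical target labels.
g_labels = {"u1": {"a", "x"}, "u2": {"a"}, "u3": {"b"}}
index = build_label_index(g_labels)
print(candidates(index, {"a", "x"}), candidates(index, {"a"}))
```

Only the candidates returned by this lookup need their $cost(u, v)$ checked against $\epsilon$, which is why selective query labels keep the first filtering step cheap.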
However, if the labels of $v$ are not very selective and there are many possible matches using the hashing technique discussed above, we use the second index structure, which is built on the neighborhood vector $R_G(u)$ following the principle of the Threshold Algorithm [12]. The neighborhood vector $R_G(u) = \{\langle l, A_G(u, l) \rangle\}$ for each node $u \in V_G$ is pre-computed. Next, for each label $l$, we generate a sorted list $S(l)$ of nodes $u$ in descending order of their $A_G(u, l)$ values. Let us denote the node at position $i$ from the top of $S(l)$ as $u_i(l)$. In the online phase, we start from the top of each sorted list $S(l)$, $l \in R_Q(v)$, in parallel and go to the next position in the subsequent iteration. For some position $i$ from the top, we compute $sum(i) = \sum_{l \in R_Q(v)} M[A_Q(v, l), A_G(u_i(l), l)]$. Assume that at iteration $i = i'$, $sum(i')$ becomes greater than the cost threshold $\epsilon$. Then we terminate this iterative procedure and verify, for all nodes $u_j(l)$ with $j < i'$ and $l \in R_Q(v)$, whether $cost(u_j(l), v) \le \epsilon$. For each $v \in V_Q$, we need to verify only $O((i' - 1) \cdot |l|)$ nodes for their cost, where $|l|$ denotes the number of labels in $R_Q(v)$. This can reduce the complexity of the online algorithm significantly. The complete procedure for neighborhood based indexing is given in Algorithm 3.

Algorithm 3 Neighborhood Based Indexing
Off-line Procedure
1: pre-compute $R_G(u) = \{\langle l, A_G(u, l) \rangle\}$ for all $u \in V_G$
2: for all labels $l$ do
3:   create a sorted list $S(l)$ of nodes in descending order of $A_G(u, l)$, such that $u_i(l)$ is the $i$-th node in $S(l)$
4: end for
On-line Procedure
1: $i \leftarrow 1$
2: $sum(i) \leftarrow \sum_{l \in R_Q(v)} M(A_Q(v, l), A_G(u_i(l), l))$
3: if $sum(i) \le \epsilon$ then
4:   $i \leftarrow i + 1$
5:   go to step 2
6: else
7:   verify $u_j(l)$ if $cost(u_j(l), v) \le \epsilon$, $\forall j < i$, $l \in R_Q(v)$
8: end if

Proof of Correctness. Let us denote by $S_i(l)$ all the nodes up to position $i$ from the top of the sorted list $S(l)$, i.e., $S_i(l) = \{u_j(l) : j \le i\}$. The following lemma will be useful to prove the correctness of our indexing algorithm.

LEMMA 4. If $sum(i) > \epsilon$, then for all $u \notin \{S_i(l) : l \in R_Q(v)\}$, $cost(u, v) > \epsilon$.

PROOF.
It follows directly from the fact that each $S(l)$ is a sorted list of nodes $u$ in descending order of $A_G(u, l)$ values.

Therefore, in Algorithm 3, we start from $i = 1$ and find the smallest $i$ for which $sum(i) > \epsilon$. Following the previous lemma, any node $u \notin \{S_i(l) : l \in R_Q(v)\}$ can be eliminated without actually computing $cost(u, v)$. We note that our indexing can be easily implemented in a disk-based manner for very large graphs. We can also apply external memory breadth-first search algorithms, e.g., Ulrich Meyer [1] and Lars Arge [2], to compute the neighborhood vectors $R_G(u)$ for all the nodes.

Dynamic Update. Our indexing structure can efficiently accommodate dynamic updates in $G$, i.e., insertion/deletion of nodes, edges and labels. If a node $u$ is added or deleted in $G$, it will only change the vectors of $u$'s $h$-hop neighbors. We only need to propagate the labels of these nodes and modify their neighborhood vectors. They also need to be updated in the sorted lists of label $l$ for all $l \in L(u)$. The addition/deletion of a label can be handled similarly. If an edge $(u_1, u_2)$ is added/deleted in $G$, we need to update the vectors for the $h$-hop neighbors of both $u_1$ and $u_2$.

6. QUERY OPTIMIZATION

In this section, we eliminate the non-discriminative labels from both the target and query graphs at the initial stage of our matching algorithm to make the technique more efficient. The efficiency of the algorithm Iterative Unlabel is related to the number of individual node matches for each node in the query graph. If there exists some node which is not very selective in terms of its own labels or the labels present in its neighborhood, there will be many matches corresponding to that node at the initial stage of our algorithm. In order to eliminate the problem posed by these nodes, we first eliminate all the non-discriminative labels from both the target graph and the query graph, and then we also ignore the nodes in the query graph which do not contain a sufficient number of discriminative labels in themselves and in their neighborhoods.
These non-discriminative labels are considered at the last stage of our matching algorithm, i.e., when we search for the final matches. In the following discussion, we shall clarify the notion of discriminative and non-discriminative labels from the perspective of node and graph matches.
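The threshold-style scan of Algorithm 3 in Section 5, whose work the label pruning of this section is meant to reduce, can be sketched as follows. It is a hedged illustration rather than the authors' code: the mismatch function $M(x, y) = \max(x - y, 0)$ is an assumption, the names `build_sorted_lists` and `threshold_scan` are hypothetical, the label containment test $L(v) \subseteq L(u)$ is assumed to be checked separately, and the handling of exhausted lists is an addition not spelled out in Algorithm 3.

```python
from collections import defaultdict

def node_cost(vec_q, vec_g):
    """Assumed node cost: sum over query labels of max(A_Q - A_G, 0)."""
    return sum(max(a - vec_g.get(l, 0.0), 0.0) for l, a in vec_q.items())

def build_sorted_lists(g_vecs):
    """Off-line step: per label, nodes sorted by descending strength."""
    lists = defaultdict(list)
    for u, vec in g_vecs.items():
        for l, a in vec.items():
            lists[l].append((a, u))
    for l in lists:
        lists[l].sort(key=lambda t: -t[0])
    return lists

def threshold_scan(vec_q, g_vecs, sorted_lists, eps):
    """On-line step: walk all lists in parallel, stop once sum(i) > eps."""
    seen, i = set(), 0
    while True:
        total, exhausted = 0.0, True
        for l, a_q in vec_q.items():
            s = sorted_lists.get(l, [])
            if i < len(s):
                a_g, u = s[i]
                seen.add(u)
                exhausted = False
            else:
                a_g = 0.0                      # nodes past the end have strength 0
            total += max(a_q - a_g, 0.0)
        if total > eps:                        # unseen nodes cannot qualify
            break
        if exhausted:                          # nothing pruned: verify every node
            seen = set(g_vecs)
            break
        i += 1
    # Verification: exact cost only for the nodes encountered so far.
    return {u for u in seen if node_cost(vec_q, g_vecs[u]) <= eps}

# Toy neighborhood vectors for four target nodes and one query node.
g_vecs = {
    1: {"b": 0.5, "c": 0.25},
    2: {"a": 0.75, "c": 0.5},
    3: {"a": 0.75, "b": 0.5},
    4: {"c": 0.5, "b": 0.25},
}
lists = build_sorted_lists(g_vecs)
vec_q = {"b": 0.5}
tight = threshold_scan(vec_q, g_vecs, lists, eps=0.0)
loose = threshold_scan(vec_q, g_vecs, lists, eps=0.25)
```

Because the node at the stopping position is also added to the verification set, the scan checks a slight superset of the nodes named in line 7 of the on-line procedure; the exact cost test removes any extra candidates.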

Figure 9: Discriminative (Heavy-Head) vs. Non-Discriminative (Heavy-Tail) Distribution. Panels: (a) heavy-head, (b) heavy-tail. X-axis: $A_G(u, l)$; Y-axis: # of nodes.

Figure 10: Top-2 Matches (Query 1). Panels: (a) Query, (b) Match_1, (c) Match_2.

Figure 11: Top-2 Matches (Query 2). Panels: (a) Query, (b) Match_1, (c) Match_2.

Let us consider the distribution of the $A_G(u, l)$ values of some label $l$, $\langle l, A_G(u, l) \rangle \in R_G(u)$, for different nodes $u \in V_G$. Figure 9 shows one example. For label $l$, we plot the different $A_G(u, l)$ values along the X-axis. The Y-axis shows the number of nodes $u$ having that particular $A_G(u, l)$ value in their neighborhood vector $R_G(u)$. The distribution in Figure 9(a) is skewed towards the smaller values of $A_G(u, l)$, whereas Figure 9(b) is skewed towards the higher values of $A_G(u, l)$. We call them heavy-head and heavy-tail distributions, respectively. Given a query node $v$, since we prune all the nodes $u$ in $G$ for which $\sum_{l \in R_Q(v)} M[A_Q(v, l), A_G(u, l)] > \epsilon$, the labels with a heavy-head distribution have more pruning power than those with a heavy-tail distribution. Therefore, we should retain labels with a heavy-head distribution for node matching, as those labels are more discriminative.

7. EXPERIMENTAL RESULTS

In this section, we present the experimental results to demonstrate the effectiveness and the efficiency of the neighborhood based similarity search technique on a number of real-life and synthetic graph datasets including DBLP, Intrusion, Freebase and WebGraph. In order to evaluate the effectiveness, we show two possible applications: RDF query answering and network alignment.
We test the robustness of our approach by providing the accuracy of the best matches for queries of different sizes and under the presence of random noise. The efficiency and scalability of our approach are also investigated. All experiments are performed using a single core in a 40GB, 2.50GHz Xeon server.

7.1 Graph Data Sets

DBLP Collaboration Graph. The DBLP collaboration graph is downloaded from ley /db. There are 684K distinct authors and 7M co-author edges among them. We consider the name of each author as the label of that node. There are 683,927 distinct labels in DBLP. We use the DBLP dataset for the efficiency test.

Freebase Entity Relationship Graph. Freebase is a large collaborative knowledge base of structured data harvested from many sources including Wikipedia. We downloaded the film entity relationship graph data from /. This graph has 172K nodes, each representing an entity, i.e., actor, movie, director, producer and so on. An edge represents the relationship between two entities. Names of entities are treated as labels. There are in total 579K edges and 59, 54 distinct labels in this graph. The Freebase graph is used for effectiveness, robustness and efficiency analysis.

Intrusion Alert Network. This network contains the anonymous log data of intrusion alerts in a computer network. It has 200K nodes and 703K edges, where each node is a computer and an edge means a possible attack such as Denial-of-Service and TCP Service Sweep. Each node has 25 labels (computer generated alerts in this case) on average. There are around 1,000 types of alerts. We use this graph for robustness and efficiency experiments.

WebGraph with Synthetic Labels. We downloaded the uk web graph data from [4]. This web graph is a collection of UK web pages. For our experiments, we use a subset that contains 10M pages (i.e., nodes) and 23M hyperlinks (i.e., edges). We uniformly assign 10,000 synthetically generated labels across the various nodes, such that each node gets one label. We test the scalability of our approach on this graph.
7.2 RDF Query Answering

In addition to the query shown in Figure 1, we show two more examples using the Freebase graph dataset.

Query 1: Who did cinematography for at least two Sheila McCarthy movies, one of them being Andre? The person was also the cinematographer of the movie Magic in the Water.

Here, we would like to emphasize that Sheila McCarthy did not act in the movie Andre. However, as discussed earlier, this type of inaccuracy is common, since the user may not have the accurate information, or there can be some noise in the target graph. Using our approach, we get the top-2 answers for this query shown in Figure 10.

Query 2: Which actors have appeared in both a "John Waters" movie and a "Steven Spielberg" movie?

The query and the corresponding top-2 matches are shown in Figure 11. Here, we would like to emphasize that actors in the Freebase dataset are not directly connected with the directors and cinematographers, but rather via some movies. To write a SPARQL query, we need to maintain this structural property. However, given the query graph shown in Figure 11, which does not maintain this structural property, we still obtain results where the embeddings are very close to the query graph.

Figure 12: Robustness of Network Alignment. Panels: (a) Accuracy (Intrusion), (b) Error Ratio (Freebase), (c) Error Ratio (Intrusion).

Figure 13: Convergence of Online Search Algorithm (DBLP). Panels: (a) Top-k Search (Algorithm 1), (b) Iterative Unlabel (Algorithm 2), (c) Online Search Time. Y-axes: avg # of iterations; search time (sec).

7.3 Network Alignment

We perform network alignment for query graphs of different sizes and in the presence of various amounts of noise. For these experiments, three different sets of query graphs are used, with diameters 2, 3, 4 and the number of nodes 100, 150, 200 respectively. These query sets simulate the situation in which we align a small social network to a large one. In each query set, we randomly select 100 subgraphs with the specified diameters and nodes from the original graph datasets. Then we introduce noise by adding edges to the query graphs which are not present in the original graph. The noise ratio is defined as the number of edges added divided by the original number of edges present in the query graph. We use propagation depth 2, and $\alpha$ is selected as described earlier in Section 3.3. The robustness of our approach in the presence of random noise is measured using two metrics.
The accuracy is defined as the number of correctly identified nodes of the target graph in all the top-1 matches divided by the total number of nodes in all query graphs in the corresponding query set. The accuracy is 1 for both the DBLP and Freebase datasets with different amounts of noise, since these graphs have a larger number of distinct labels. The accuracy vs. noise ratio plot for the Intrusion dataset is shown in Figure 12(a). The accuracy remains at a relatively high level when the noise ratio increases up to 0.2. We also measure the error ratio, which is defined as the number of incorrectly identified nodes of the target graph in all the top-1 matches divided by the total number of nodes in all query graphs in the corresponding query set. The lower the error ratio, the more distinguishable the nodes are in terms of their neighborhood structure and contents. The error ratio remains close to 0 for the DBLP graph at different amounts of noise. The error ratio vs. noise ratio plots for Freebase and Intrusion are shown in Figure 12(b) and 12(c) respectively. It can be observed that the error ratio remains at a relatively low level for the Freebase graph when the noise ratio increases up to 0.2. Hence, these experiments indicate that DBLP and Freebase are less automorphic compared to the Intrusion network.

7.4 Efficiency Results

We provide the running time of our algorithm for different datasets in Table 1. For these experiments, we randomly select query graphs with 50 nodes and diameter 2 from the original graph datasets. The vectorization and indexing are performed with propagation depth 2, and the search algorithm is used to identify the top-1 matches. It can be observed that our algorithm is very efficient for large graph datasets. The on-line phase for the Intrusion graph requires more time because the average number of labels per node is much higher than that in the other graphs. This leads to more time being used for cost computation (Eq. (7)). We also verify the convergence rate of our Top-k Search and Iterative Unlabel algorithms for the various network alignment experiments discussed earlier.
The convergence rate of these algorithms is measured as the average number of iterations required before they terminate. When the noise ratio is increased, our algorithm requires more iterations to satisfy the cost threshold. Thus, the corresponding running time also increases, as shown in Figure 13 for the DBLP dataset. Moreover, it requires more time to identify the matches of a larger query graph. The convergence plots for the Freebase and Intrusion networks are given in Figure 14.

Figure 14: Convergence of Online Search Algorithm (Freebase & Intrusion). Panels: (a) Convergence (Freebase), (b) Search Time (Freebase), (c) Convergence (Intrusion), (d) Search Time (Intrusion). Y-axes: avg # of iterations; search time (sec).

Table 1: Efficiency: Off-line Indexing and Online Search
Dataset (nodes, edges, labels)    2-hop Indexing (Off-line)    Top-1 Search (Online)
DBLP (0.7M, 7M, 0.7M)             1,733 sec                    0.06 sec
Freebase (0.2M, 0.6M, 0.2M)       280 sec                      0.22 sec
Intrusion (0.2M, 0.5M, 1K)        227 sec                      .6 sec
WebGraph (10M, 23M, 10K)          5, 25 sec                    0.26 sec

7.5 Neighborhood-based Cost Function Properties

Recall that we proved in Theorem 1 that our neighborhood-based cost function ensures there are no false negatives when the cost threshold is set to 0. In this subsection, we investigate the false positive rate of our neighborhood-based cost function with the threshold set to 0. This experiment is performed on the DBLP, Freebase and Intrusion datasets. In particular, for each dataset, we select 100 small query subgraphs with 10 nodes each from the original graph. For each of the query graphs, by using 2-hop propagation, we identify all matches with cost = 0. Among these matches, we manually verify whether there are any false positives, i.e., matches which are not graph isomorphic with the query graph. The percentage of false positives is calculated as the number of false positives divided by the total number of matches obtained. We show the results in Table 2. It can be seen that, using our cost function with the cost threshold set to 0, the percentage of false positives on real-life social/information networks is very small.

Table 2: False Positive Ratio
Dataset      False Positive
DBLP         0%
Freebase     0%
Intrusion    0.3%

Table 3: Benefits of Index and Optimization
Dataset      Search with Index & Optimization    Search w/o Index & Optimization
DBLP         0.06 sec                            9.63 sec
Freebase     0.22 sec                            .75 sec

As we have discussed earlier, the higher the value of $h$ is, the lower the number of false positives will be.
Therefore, for a target graph, we can employ the error ratio as a cost function and learn a satisfactory value of $h$ from training queries generated from the target graph. The DBLP graph is used in this experiment. We use a training set of 100 small query graphs (with 10 nodes each) generated from the DBLP graph. The queries are generated in such a way that the labels in the query nodes are mostly not unique. Some noise is also added to these query graphs, as explained earlier. Next, we start with $h = 0$ and gradually increase $h$ until the error ratio becomes less than a small value. We show the results for the DBLP graph in Figure 15. It can be observed that, by setting $h = 2$, we can reduce the error ratio to an acceptable level when the noise ratio is below 0.1. This indicates that for real-life social/information networks with few automorphisms and many distinct labels, we only need a small propagation depth to make the error ratio close to zero.

7.6 Pruning Capacity of Search Algorithm

We verify the pruning capacity of our Top-k search algorithm with respect to the number of distinct labels present in the target graph. For this experiment, we use a subgraph extracted from the WebGraph dataset, which contains 1,000 nodes and 4,067 edges. We vary the number of distinct labels from 1 to 800. Given a randomly extracted query graph with the number of nodes $|V_Q|$ = 8, 10 and 12 respectively, we check how many subgraphs need to be verified during the final match phase of our approach. The smaller this number is, the more powerful the pruning of our algorithm is. We plot the number of subgraphs that need to be verified in the final match phase vs. the number of distinct labels in Figure 16. Note that the Y-axis is in log scale. It can be observed that, when there is only 1 distinct label in the entire graph, we need to verify about $10^{25}$ subgraphs for a query graph with 8 nodes during the final match phase. However, as the number of distinct labels increases, the number of subgraphs that we need to verify decreases rapidly. For 800 distinct labels, we only need to verify a very small number of subgraphs (e.g.
2 subgraphs when $|V_Q| = 8$) in the final match phase of our approach. Thus, our algorithm can be very efficient on graphs with few automorphisms and many distinct labels.

Figure 15: Satisfactory h Value (DBLP). X-axis: propagation depth; Y-axis: error ratio.

Figure 16: Pruning Capacity (WebGraph). X-axis: # of distinct labels; Y-axis: # of subgraphs (log scale); series: $|V_Q|$ = 8, 10, 12.

Figure 17: Dynamic Update Index (DBLP). X-axis: % node update; Y-axis: time (sec); series: Dynamic Update, Re-Index.

7.7 Indexing and Query Optimization

In Table 3, we compare the running time of our online search algorithm with that of a linear scan with no indexing and query optimization. Each of the query graphs has 50 nodes and diameter 2 for this experiment. It can be observed that our indexing and query optimization techniques can significantly speed up online search. We also compare the index construction time of dynamic update with the cost of rebuilding the whole index when the target graph is modified. The propagation depth is 2 for these experiments. The results for the DBLP dataset are shown in Figure 17. As we can see, for a wide range of updates in the target graph, it is more efficient to update the index structure than to re-index the graph. The results also indicate that our index structure handles dynamic updates in the target graph very efficiently.

7.8 Scalability

We show the scalability of our approach on the WebGraph dataset. The vectorization time as a function of the number of nodes in the graph is shown in Figure 18(a). Figure 18(b) shows the trend of the online search time with respect to the number of nodes. The propagation depth is 2 for indexing, and we identify the top-1 matches using our search algorithm. Each of the query graphs has 10 nodes and diameter 3 for this experiment. As can be observed, for a graph with 10 million nodes, our approach can return the top-1 match in a fraction of a second. The corresponding index building time is also tolerable. Both the index building time and the online search time are roughly linear in the number of nodes. These results show that our technique is highly scalable for large scale information/social networks.

Figure 18: Scalability Results (WebGraph). Panels: (a) Vectorization Time, (b) Search Time. X-axis: # of nodes (M); Y-axis: time (sec).

8. RELATED WORK

Graph search has been studied in different contexts such as graph isomorphism, graph indexing, structure matching, etc. In XML, where the structures encountered are often trees and lattices, queries built on path expressions became popular [28] and their corresponding indices have been developed [9]. In bioinformatics, exact and approximate graph alignment has been extensively studied, e.g., PathBlast [21], Saga [33], NetAlign [23], IsoRank [32]. They target relatively small biological networks with less than 10k nodes. It is difficult to apply them to social and information networks with thousands or even millions of nodes. Kernel based graph matching techniques have also been proposed, e.g., common walks [16, 18], shortest paths [5], limited-size subgraphs [19] and subtree patterns [20]. Recently, Shervashidze et al. [25] proposed a fast subtree pattern kernel based on the Weisfeiler-Lehman method. Kernel methods do not support subgraph search well.

For subgraph search, Shasha et al. [31] extend the path-based technique for full-scale graph retrieval; Yan et al. propose gIndex [37] using frequent subgraphs. These studies inspired new graph index structures such as δ-tolerance Closed Frequent Subgraphs [8], Tree+Δ [40], and GCoding [41]. He et al. [17] develop a closure tree index to perform approximate graph search. Tian et al. [33] design a fragment based index to assemble an approximate match. Shang et al. introduce an efficient algorithm for testing subgraph isomorphism [29]. Ferro et al. propose a novel indexing scheme, SING [26], based on locality information. All these methods are built strictly on graph structures and are not suited to the approximate search shown in Figure 1.

There have been significant studies on inexact graph matching on attributed graphs [30, 7]. Tong et al. [35] propose best-effort pattern matching in large attributed graphs. It finds the best match based not on the proximity among the labels, but rather on the shape of the query graph. Tian et al. [34] proposed an approximate subgraph matching tool, called TALE, with efficient indexing and high pruning capabilities. Mongiovì et al. introduce a set-cover-based inexact graph matching technique, called SIGMA [24]. Both techniques only use edge misses to measure the quality of graph matching. Therefore, they are not appropriate for the proximity based search scenario studied in this work. There has been some recent work on inexact graph matching, i.e., simulation based cubic time graph pattern matching [13], homomorphism based subgraph matching [14], belief propagation based net alignment [3], an edge-edit-distance based subgraph indexing technique [39] and a graph partition based subgraph identification scheme [6].

9. CONCLUSIONS

In this paper, we defined a new graph similarity measure, neighborhood based graph similarity, and proposed an information propagation model to convert a large network into a set of multidimensional vectors, where sophisticated indexing and similarity search algorithms are available. We proved, under this measure, that subgraph similarity search is NP hard, while graph similarity match is polynomial. We introduced a criterion to select the best propagation rate with respect to different node labels in a graph. We further investigated techniques to index the neighborhood vectors and to compress them by deleting non-discriminative labels, thus optimizing the query processing time. The proposed method, called Ness, is not only efficient, but also robust against structure changes and information loss. Empirical results show that it can quickly and accurately find high-quality matches in large networks, with negligible time cost. In future work, it will be interesting to consider the graph alignment problem when the node labels in the two graphs are not exactly identical, e.g., when the same user has slightly different usernames on Facebook and Twitter.

10. ACKNOWLEDGMENTS

This research was sponsored in part by the U.S. National Science Foundation under grant IIS and by the Army Research Laboratory under cooperative agreement W9NF (NS-CTA). X. Yan was supported in part by the Open Project Program of the State Key Lab of CAD&CG (Grant No. A00), Zhejiang University. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notice herein.

11. REFERENCES

[1] D. Ajwani, U. Meyer, and V. Osipov. Improved external memory BFS implementation. In ALENEX.
[2] L. Arge, G. S. Brodal, and L. Toma. On external-memory MST, SSSP and multi-way planar graph separation. In Workshop on Algorithmic Theory, Vol. 1851 of LNCS. Springer.
[3] M. Bayati, M. Gerritsen, D. F. Gleich, A. Saberi, and Y. Wang. Algorithms for large, sparse network alignment problems. ICDM, 0:705-710.
[4] P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW.
[5] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In ICDM, pages 74-81.
[6] M. Brocheler, A. Pugliese, and V. S. Subrahmanian. COSI: Cloud oriented subgraph identification in massive social networks. ASONAM, 2010.
[7] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani. Robust and efficient fuzzy match for online data cleaning. In SIGMOD.
[8] J. Cheng, Y. Ke, W. Ng, and A. Lu. FG-Index: Towards verification-free query processing on graph databases. In SIGMOD.
[9] C. Chung, J. Min, and K. Shim. APEX: An adaptive path index for XML data. In SIGMOD, pages 121-132.
[10] S. Cook. The complexity of theorem-proving procedures. In STOC, pages 151-158, 1971.
[11] J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2), 1972.
[12] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, pages 102-113, 2001.
[13] W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu. Graph pattern matching: From intractable to polynomial time. PVLDB, 3(1), 2010.
[14] W. Fan, J. Li, S. Ma, H. Wang, and Y. Wu. Graph homomorphism revisited for graph matching. PVLDB, 3(1):1161-1172, 2010.
[15] Freebase.
[16] T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In COLT and the 7th Kernel Workshop.
[17] H. He and A. Singh. Closure-tree: An index structure for graph queries. In ICDE, page 38.
[18] H. Kashima and A. Inokuchi. Kernels for graph classification. ICDM Workshop on Active Mining.
[19] T. Horváth, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In KDD, pages 158-167.
[20] J. Ramon and T. Gärtner. Expressivity versus efficiency of graph kernels. In First Int. Workshop on Mining Graphs, Trees and Sequences, pages 65-74.
[21] B. P. Kelley, B. Yuan, F. Lewitter, R. Sharan, B. R. Stockwell, and T. Ideker. PathBLAST: a tool for alignment of protein interaction networks. Nucleic Acids Res, 32:83-88.
[22] A. Khan, X. Yan, and K.-L. Wu. Towards proximity pattern mining in large graphs. In SIGMOD, 2010.
[23] Z. Liang, M. Xu, M. Teng, and L. Niu. NetAlign: a web-based tool for comparison of protein interaction networks. Bioinformatics, 22(17).
[24] M. Mongiovì, R. D. Natale, R. Giugno, A. Pulvirenti, A. Ferro, and R. Sharan. SIGMA: a set-cover-based inexact graph matching algorithm. J. Bioinformatics and Computational Biology, 8(2):199-218, 2010.
[25] N. Shervashidze and K. M. Borgwardt. Fast subtree kernels on graphs. Curran, 2010.
[26] R. D. Natale, A. Ferro, R. Giugno, M. Mongiovì, A. Pulvirenti, and D. Shasha. SING: Subgraph search in non-homogeneous graphs. BMC Bioinformatics, 11:96, 2010.
[27] E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. Technical report, W3C.
[28] C. Qun, A. Lim, and K. Ong. D(k)-index: An adaptive structural summary for graph-structured data. In SIGMOD, pages 134-144.
[29] H. Shang, Y. Zhang, X. Lin, and J. Yu. Taming verification hardness: An efficient algorithm for testing subgraph isomorphism. In VLDB.
[30] L. Shapiro and R. Haralick. Structural descriptions and inexact matching. IEEE Trans. on Pattern Analysis and Machine Intelligence, 3:504-519, 1981.
[31] D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39-52.
[32] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. PNAS, 105(35).
[33] Y. Tian, R. McEachin, C. Santos, D. States, and J. Patel. SAGA: a subgraph matching tool for biological graphs. Bioinformatics, 23(2).
[34] Y. Tian and J. M. Patel. TALE: A tool for approximate large graph matching. In ICDE.
[35] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast best-effort pattern matching in large attributed graphs. In KDD.
[36] D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks. Science, 296.
[37] X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD.
[38] X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD.
[39] S. Zhang, J. Yang, and W. Jin. SAPPER: Subgraph indexing and approximate matching in large graphs. PVLDB, 3(1):1185-1194, 2010.
[40] P. Zhao, J. Yu, and P. Yu. Graph indexing: tree + delta >= graph. In VLDB.
[41] L. Zou, L. Chen, J. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181-192, 2008.

Polynomial Functions. Polynomial functions in one variable can be written in expanded form as ( )

Polynomial Functions. Polynomial functions in one variable can be written in expanded form as ( ) Polynomil Functions Polynomil functions in one vrible cn be written in expnded form s n n 1 n 2 2 f x = x + x + x + + x + x+ n n 1 n 2 2 1 0 Exmples of polynomils in expnded form re nd 3 8 7 4 = 5 4 +

More information

Factoring Polynomials

Factoring Polynomials Fctoring Polynomils Some definitions (not necessrily ll for secondry school mthemtics): A polynomil is the sum of one or more terms, in which ech term consists of product of constnt nd one or more vribles

More information

Reasoning to Solve Equations and Inequalities

Reasoning to Solve Equations and Inequalities Lesson4 Resoning to Solve Equtions nd Inequlities In erlier work in this unit, you modeled situtions with severl vriles nd equtions. For exmple, suppose you were given usiness plns for concert showing

More information

Graphs on Logarithmic and Semilogarithmic Paper

Graphs on Logarithmic and Semilogarithmic Paper 0CH_PHClter_TMSETE_ 3//00 :3 PM Pge Grphs on Logrithmic nd Semilogrithmic Pper OBJECTIVES When ou hve completed this chpter, ou should be ble to: Mke grphs on logrithmic nd semilogrithmic pper. Grph empiricl

More information

Econ 4721 Money and Banking Problem Set 2 Answer Key

Econ 4721 Money and Banking Problem Set 2 Answer Key Econ 472 Money nd Bnking Problem Set 2 Answer Key Problem (35 points) Consider n overlpping genertions model in which consumers live for two periods. The number of people born in ech genertion grows in

More information

Experiment 6: Friction

Experiment 6: Friction Experiment 6: Friction In previous lbs we studied Newton s lws in n idel setting, tht is, one where friction nd ir resistnce were ignored. However, from our everydy experience with motion, we know tht

More information

Example 27.1 Draw a Venn diagram to show the relationship between counting numbers, whole numbers, integers, and rational numbers.
