parallel algorithm for the extraction of structured motifs lexandra arvalho MEI 2002/03 omputação em Sistemas Distribuídos 2003 p.1/27
Plan of the talk Biological model of regulation Nucleic acids: DN and RN lassification of living organisms: Prokaryotes and Eukaryotes ranscription and ranslation Promoter and Regulatory Sequences omputational model of regulation Suffix tree and generalized suffix tree Single models extraction [M.-F. Sagot, Latin, 1998] Structured models extraction [L. Marsan and M.-F. Sagot, J. omputational Biology, 2000] Parallelization [. arvalho,. Freitas,. Oliveira and M.-F. Sagot, submitted, 2003] he PRIION UP O ε problem he Simpleut algorithm he tree partition problem he PSMILE algorithm Experimental results omputação em Sistemas Distribuídos 2003 p.2/27
Nucleic acids Nucleotides: storage and retrieval of biological information building blocks for the construction of nucleic acids omputação em Sistemas Distribuídos 2003 p.3/27
Nucleic acids Nucleotides: storage and retrieval of biological information building blocks for the construction of nucleic acids wo main types of nucleic acids: DN - DeoxyriboNucleic cid: contain the bases,,, and double-stranded molecule RN - RiboNucleic cid: contain the bases,,, and U single-stranded molecule omputação em Sistemas Distribuídos 2003 p.3/27
lassification of living organisms Prokaryotes: reek words: pro "before"and karyon "nucleus" bacteria and prokaryotes are generally used interchangeably most prokaryotes live as single-celled organisms Eukaryotes: reek words: eu "well"and karyon "nucleus" yeast is an eukaryotic single-celled organism omputação em Sistemas Distribuídos 2003 p.4/27
ranscription and ranslation omputação em Sistemas Distribuídos 2003 p.5/27
Promoter and Regulatory Sequences omputação em Sistemas Distribuídos 2003 p.6/27
Structured motifs Definition. model model is an element in Σ +. Definition. structured model structured model is a pair (m, d) where: m = (m i ) 1 i p, denoting the p boxes d = (d mini, d maxi, δ i ) 1 i p 1, denoting the p 1 intervals of distance omputação em Sistemas Distribuídos 2003 p.7/27
Structured motifs Definition. model model is an element in Σ +. Definition. structured model structured model is a pair (m, d) where: m = (m i ) 1 i p, denoting the p boxes d = (d mini, d maxi, δ i ) 1 i p 1, denoting the p 1 intervals of distance m1 e1 subst. [d1min,d1max] m2 e2 subst. mp ep subst. k1 k2 kp p boxes n attempt to model the combinatorics of regulation omputação em Sistemas Distribuídos 2003 p.7/27
Structured motifs Definition. e-occurrence model m e-occurs in the input sequences if exists u in the input sequences such that HammingDistance(m, u) e (minimum number of substitutions to transform u into m). omputação em Sistemas Distribuídos 2003 p.8/27
Structured motifs Definition. e-occurrence model m e-occurs in the input sequences if exists u in the input sequences such that HammingDistance(m, u) e (minimum number of substitutions to transform u into m). Definition. valid model, quorum model is valid if e-occurs in at least q input sequences, where q is called the quorum. omputação em Sistemas Distribuídos 2003 p.8/27
Structured motifs Definition. e-occurrence model m e-occurs in the input sequences if exists u in the input sequences such that HammingDistance(m, u) e (minimum number of substitutions to transform u into m). Definition. valid model, quorum model is valid if e-occurs in at least q input sequences, where q is called the quorum. e1=2 e2=1 [16,18] k1=[6,8] k2=[6,8] q=3 omputação em Sistemas Distribuídos 2003 p.8/27
Structured motifs Definition. e-occurrence model m e-occurs in the input sequences if exists u in the input sequences such that HammingDistance(m, u) e (minimum number of substitutions to transform u into m). Definition. valid model, quorum model is valid if e-occurs in at least q input sequences, where q is called the quorum. e1=2 e2=1 [16,18] k1=[6,8] k2=[6,8] 17 16 too far 17 q=3 omputação em Sistemas Distribuídos 2003 p.8/27
Input sequences >strand + guab inositol-monophosphate dehydrogenas >strand - yaa yaa >strand + yaaj similar to hypothetical proteins >strand - yaai similar to isochorismatase >strand + mets methionyl-trn synthetase omputação em Sistemas Distribuídos 2003 p.9/27
Suffix ree Definition. Suffix tree suffix tree of a n-character string S is a rooted directed tree with exactly n leaves: leaves are numbered 1 to n each internal node has at least two children each edge is labeled with a nonempty substring of S no two edges out of a node can have edge-labels beginning with the same character he key feature of the suffix tree is that for any leaf i, the label of the path from the root to the leaf i exactly spells out the suffix of S that starts at position i. Weiner, IEEE Symposium on Switching and utomata heory, 1973 Ukkonen, lgorithmica, 1995 heorem. Suffix trees can be built in linear-time. omputação em Sistemas Distribuídos 2003 p.10/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
Suffix ree Suffix tree for the string omputação em Sistemas Distribuídos 2003 p.11/27
eneralized Suffix ree Suffix tree for the strings and # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and # # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and # # # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and # # # # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and # # # # # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and # # # # # # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and #,# # # # # # omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree Suffix tree for the strings and #,# # # # # #,# omputação em Sistemas Distribuídos 2003 p.12/27
eneralized Suffix ree with olors Suffix tree for the strings and # [0,1] #,#,# # [0,1] [0,1] [1,0] [1,0] # [0,1] # [1,0] [1,0] [0,1] [1,0] # [0,1] [ ]: bit vectors called olors omputação em Sistemas Distribuídos 2003 p.13/27
Extraction of Single Models Definition. e-node-occurrence e-node-occurrence of a model m is represented by a pair (v, e v ) where v is a tree node and e v e is the Hamming distance between the label of the path from the root to v and m. Notation. ν(e, k) he number of distinct words at Hamming distance at most e from a k-long word: ν(e, k) = e k i i=0 ( Σ 1) i k e Σ e. Notation. n k he number of tree nodes at depth k of a suffix tree. omputação em Sistemas Distribuídos 2003 p.14/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and [1,0] [0,1] [1,0] (,0); (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and [1,0] [0,1] [1,0] (,0); (,1); (,1) (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and [1,0] [0,1] [1,0] (,0); (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and [1,0] [0,1] [1,0] (,0); (,1); (,1) (,0); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,0); (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,0); (,1); (,1) (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,0); (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,0); (,1); (,1) (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,0); (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,1); (,0); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) [1,0] [0,1] [1,0] (,1); (,0); (,1) (,1); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) [1,0] [0,1] [1,0] (,1); (,0); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) [1,0] [0,1] [1,0] (,1); (,0); (,1) (,1); (,0) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) [1,0] [0,1] [1,0] (,1); (,0); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) [1,0] [0,1] [1,0] (,1); (,0); (,1) (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) [1,0] [0,1] [1,0] (,1); (,0); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) [1,0] [0,1] [1,0] (,1); (,0); (,1) (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) [1,0] [0,1] [1,0] (,1); (,0); (,1) omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) (,1) (,1) (,1) (,1) [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Single Models M.-F. Sagot, Latin, 1998 k = 2 e = 1 q = 2 Input sequences: and (,0) (,1) (,1) (,1) (,1) (,0) (,1) (,1) (,1) (,1) [1,0] [0,1] [1,0] Proposition. he single motifs extraction takes O(Nn k ν(e, k)) time. omputação em Sistemas Distribuídos 2003 p.15/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 ExtractModels(Model m, Block i) 1. for each node-occurrence v of m 2. if (i > 1) 3. put in P otencialstarts the children of v at levels (i 1)k + (i 1)d mini 1 to (i 1)k + (i 1)d maxi 1 4. else 5. put v in P otencialstarts 6. for each model m i obtained by doing a recursive depth-first traversal from the root of the virtual model tree M while simultaneously traversing from the node-occurrences in P otencialstarts 7. if (i < p) 8. ExtractModels(m = m 1... m i,i + 1) 9. else 10. KeepModel( (m 1,..., m p ), ((d min1, d max1 ),..., (d minp, d maxp )) ) omputação em Sistemas Distribuídos 2003 p.16/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and [0,1] [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.17/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and [0,1] [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.17/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and (,1) (,1) [0,1] [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.17/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and (,1) (,1) [0,1] [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.17/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and (,1) [0,1] [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.17/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and (,1) [0,1] [1,0] [0,1] [1,0] omputação em Sistemas Distribuídos 2003 p.17/27
Extraction of Structured Models: SMILE L. Marsan and M.-F. Sagot, Journal of omputational Biology, 2000 p = 2 k 1 = 2, d = 1, k 2 = 2 e 1 = 1, e 2 = 1 q = 2 Input sequences: and (,1) [0,1] [1,0] [0,1] [1,0] Proposition. he structured motifs extraction takes O(Nn pk+(p 1)dmax ν p (e, k)) time. omputação em Sistemas Distribuídos 2003 p.17/27
PRIION UP O ε PRIION UP O ε problem: l gold bars w i 0 is the the weight of the ith gold bar any gold bar can be cut in c equal parts Optimization version: he problem is how to share the gold between r persons, with the minimum number of gold bars z, in such a way that each person gets the same share of gold up to some weight ε > 0. Decision version: he problem is to decide whether it is possible to share the gold between r persons, with z gold bars, in such a way that each person gets the same share of gold up to some weight ε 0. Proposition. he PRIION UP O ε problem is NP-complete in the strong sense. omputação em Sistemas Distribuídos 2003 p.18/27
PRIION UP O ε Simpleut(Partition i, oldbars l, Persons r, Weights w j, utfactor c, WorkOverload ε) 1. find the smallest t such that max w j c t ε 2. for each j {1,..., l} [ j 1 3. let V j = k=1 w k c t, ) j k=1 w k c t 4. let w = l j=1 w j 5. let γ = w c t mod r 6. let δ = w ct r 7. let I i [(i 1)(δ + 1), i(δ + 1) ) = [γ(δ + 1) + (i (γ + 1))δ, γ(δ + 1) + (i γ)δ ) for all i γ otherwise 8. transform I i = [a, b) into I i = [f(a), f(b)) with f : w c t l c t : f(x) = (j 1) c t + x inf(v j ) w j l c t for all x V j if x = w c t omputação em Sistemas Distribuídos 2003 p.19/27
PRIION UP O ε j 1 2 w j 2 1 r = 3 ε = 1 omputação em Sistemas Distribuídos 2003 p.20/27
PRIION UP O ε j 1 2 w j 2 1 r = 3 ε = 1 t = 1 1. find the smallest t such that max w j c t ε omputação em Sistemas Distribuídos 2003 p.20/27
PRIION UP O ε j 1 2 w j 2 1 r = 3 ε = 1 t = 1 2. for each j[ 1,..., l j 1 3. V j = k=1 w k c t, ) j k=1 w k c t V1=[0,4) V2=[4,6) omputação em Sistemas Distribuídos 2003 p.20/27
PRIION UP O ε j 1 2 w j 2 1 r = 3 ε = 1 t = 1 w = 3 γ = 0 δ = 2 4. w = l j=1 w j 5. γ = w c t mod r 6. δ = w ct r V1=[0,4) V2=[4,6) omputação em Sistemas Distribuídos 2003 p.20/27
PRIION UP O ε j 1 2 w j 2 1 { 7. I i [(i 1)(δ + 1), i(δ + 1) ) = for all i γ [γ(δ + 1) + (i (γ + 1))δ, γ(δ + 1) + (i γ)δ ) otherwise r = 3 ε = 1 t = 1 w = 3 γ = 0 δ = 2 V1=[0,4) V2=[4,6) I 1=[0,2) I 2=[2,4) I 3=[4,6) omputação em Sistemas Distribuídos 2003 p.20/27
PRIION UP O ε j 1 2 w j 2 1 r = 3 ε = 1 t = 1 w = 3 γ = 0 δ = 2 8. transform I i = [a, b) into I i = [f(a), f(b)) with { (j 1) c t + x inf(v j ) for all x V f(x) = w j j l c t if x = w c t V1=[0,4) V2=[4,6) I 1=[0,2) I 2=[2,4) I 3=[4,6) I1=[0,1) I2=[1,2) I3=[2,4) omputação em Sistemas Distribuídos 2003 p.20/27
PRIION UP O ε Proposition. he Simpleut algorithm requires O(l) time. Proposition. he Simpleut algorithm has a ratio bound ρ(l, r, (w i ) 1 i l, c, ε) = O( max w i ε ). omputação em Sistemas Distribuídos 2003 p.21/27
Parallelization Reducing the tree partition problem to the PRIION UP O ε problem Input of the Simpleut algorithm for the ith grid node: l = Σ r matches the number of grid nodes w j of each symbol of the alphabet is obtained by scanning the input sequences c = Σ ε is an user parameter omputação em Sistemas Distribuídos 2003 p.22/27
Parallelization Reducing the tree partition problem to the PRIION UP O ε problem Input of the Simpleut algorithm for the ith grid node: l = Σ r matches the number of grid nodes w j of each symbol of the alphabet is obtained by scanning the input sequences c = Σ ε is an user parameter Output of the Simpleut algorithm for the ith grid node: the number t of cuts gives the depth t + 1 of the tree where the partition is defined an interval I i corresponding to tree nodes at depth t + 1 assigned to the ith grid node omputação em Sistemas Distribuídos 2003 p.22/27
Parallelization Reducing the tree partition problem to the PRIION UP O ε problem Input of the Simpleut algorithm for the ith grid node: l = Σ r matches the number of grid nodes w j of each symbol of the alphabet is obtained by scanning the input sequences c = Σ ε is an user parameter Output of the Simpleut algorithm for the ith grid node: the number t of cuts gives the depth t + 1 of the tree where the partition is defined an interval I i corresponding to tree nodes at depth t + 1 assigned to the ith grid node Simpleut(Partition i, lphabetsize l, ridnodes r, Weights w j, lphabetsize c, WorkOverload ε) 1. find the smallest t such that max w j c t 2. let t = min(depth(m) - 1,t ) ε omputação em Sistemas Distribuídos 2003 p.22/27
Parallelization j 1 2 3 4 σ j w j 2 1 1 2 r = 5 ε = 1 omputação em Sistemas Distribuídos 2003 p.23/27
Parallelization j 1 2 3 4 σ j w j 2 1 1 2 r = 5 ε = 1 t = 1 t = 1 1. find the smallest t such that max w j 2. t = min(depth(m) - 1,t ) c t ε omputação em Sistemas Distribuídos 2003 p.23/27
Parallelization j 1 2 3 4 σ j w j 2 1 1 2 3. for each j[ 1,..., l j 1 4. V j = k=1 w k c t, ) j k=1 w k c t r = 5 ε = 1 t = 1 t = 1 V1=[0,8) V2=[8,12) V3=[12,16) V4=[16,24) omputação em Sistemas Distribuídos 2003 p.23/27
Parallelization j 1 2 3 4 σ j w j 2 1 1 2 r = 5 ε = 1 t = 1 t = 1 5. w = l j=1 w j 6. γ = w c t mod r 7. δ = w ct r V1=[0,8) V2=[8,12) V3=[12,16) V4=[16,24) omputação em Sistemas Distribuídos 2003 p.23/27
Parallelization j 1 2 3 4 σ j w j 2 1 1 2 { 8. I i [(i 1)(δ + 1), i(δ + 1) ) = for all i γ [γ(δ + 1) + (i (γ + 1))δ, γ(δ + 1) + (i γ)δ ) otherwise r = 5 ε = 1 t = 1 t = 1 w = 6 γ = 4 δ = 4 V1=[0,8) V2=[8,12) V3=[12,16) V4=[16,24) I 1=[0,5) I 2=[5,10) I 3=[10,15) I 4=[15,20) I 5=[20,24) omputação em Sistemas Distribuídos 2003 p.23/27
Parallelization j 1 2 3 4 σ j w j 2 1 1 2 r = 5 ε = 1 t = 1 t = 1 w = 6 γ = 4 δ = 4 9. transform I i = [a, b) into I i = [f(a), f(b)) with { (j 1) c t + x inf(v j ) for all x V f(x) = w j j l c t if x = w c t V1=[0,8) V2=[8,12) V3=[12,16) V4=[16,24) I 1=[0,5) I 2=[5,10) I 3=[10,15) I 4=[15,20) I 5=[20,24) I1=[0,2) I2=[2,6) I3=[6,11) I4=[11,14) I5=[14,16) omputação em Sistemas Distribuídos 2003 p.23/27
Parallelization PExtractModels(Model m, Block i, PartitionSet I i of M) 1. for each node-occurrence v of m 2. if (i > 1) 3. put in P otencialstarts the children of v at levels (i 1)k + (i 1)d mini 1 to (i 1)k + (i 1)d maxi 1 4. else 5. put v in P otencialstarts 6. for each model m i I i obtained by doing a recursive depth-first traversal from the root of the virtual model tree M while simultaneously traversing from the node-occurrences in P otencialstarts 7. if (i < p) 8. PExtractModels(m = m 1... m i,i + 1, I i ) 9. else 10. KeepModel( (m 1,..., m p ), ((d min1, d max1 ),..., (d minp, d maxp )) ) omputação em Sistemas Distribuídos 2003 p.24/27
Parallelization PSmile(ridNode i, WorkOverload ε) 1. compute weights (w i ) 1 i Σ ; 2. build suffix tree ; 3. create colors on ; 4. let I i = Simpleut(i, Σ, r, (w i ) 1 i Σ, Σ, ε); 5. call PExtractModels(, I i ); Proposition. ssume Σ fixed and w i = 1 for 1 i Σ. he parallel algorithm PSmile is work-efficient with respect to the sequential version when r = O(ν p 2 (e, k)) and ε w 1 r. omputação em Sistemas Distribuídos 2003 p.25/27
Parallelization Experimental results 2 boxes 3 boxes models time (sec) models time (sec) grid node 1 2 155.83 9987 444.50 grid node 2 1 168.11 6178 385.28 grid node 3 2 245.35 3108 473.70 grid node 4 16 262.51 15884 581.64 total 21 831.80 35157 1885.18 parallel time 262.51 581.64 sequential time 757.97 1790.70 speed up 2.9 3.1 omputação em Sistemas Distribuídos 2003 p.26/27
On going and future work Implementation of a more efficient sequential algorithm to extract structured models [L. Marsan and M.-F. Sagot, J. omputational Biology, 2000] Establishing an even more efficient algorithm to extract structured models [. arvalho,. Freitas,. Oliveira and M.-F. Sagot, in preparation, 2003] omparison between algorithms which attempts to model the combinatorics of regulation omputação em Sistemas Distribuídos 2003 p.27/27