Insertion and deletion correcting DNA barcodes based on watermarks

Kracht and Schober BMC Bioinformatics (2015) 16:50
METHODOLOGY ARTICLE, Open Access

David Kracht* and Steffen Schober

Abstract

Background: Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: synthetic DNA tags, called barcodes, are attached to natural DNA fragments within the library preparation procedure. Different libraries can be labeled individually with barcodes for a joint sequencing procedure. A post-processing step is needed to sort the sequencing data according to their origin, utilizing these DNA labels. This final separation step is called demultiplexing and is mainly determined by the characteristics of the DNA code words used as labels. Currently, we are facing two different strategies for barcoding: one is based on the Hamming distance, the other uses the edit metric to measure distances between code words. The theory of channel coding provides well-known code constructions for the Hamming metric. They provide a large number of code words with variable lengths and maximal correction capability regarding substitution errors. However, some sequencing platforms are known to have exceptionally high numbers of insertion or deletion errors. Barcodes based on the edit distance can take insertion and deletion errors into account in the decoding process. Unfortunately, there is no explicit code construction known that gives optimal codes for the edit metric.

Results: In the present work we focus on an entirely different perspective to obtain DNA barcodes. We consider a concatenated code construction, producing so-called watermark codes, which were first proposed by Davey and MacKay to communicate via binary channels with synchronization errors. We adapt and extend the concepts of watermark codes to use them for DNA sequencing. Moreover, we provide an exemplary set of barcodes that are experimentally compatible with common next-generation sequencing platforms.
Finally, a realistic simulation scenario is used to evaluate the proposed codes and to show that the watermark concept is suitable for DNA sequencing applications.

Conclusion: Our adaptation of watermark codes enables the construction of barcodes that are capable of correcting substitution, insertion and deletion errors. The presented approach has the advantage of not needing any markers or technical sequences to recover the position of the barcode in the sequencing reads, which poses a significant restriction with other approaches.

Keywords: Next generation sequencing, Barcodes, Multiplexing, Edit distance, Watermark codes, Insertions, Deletions, Sequence embedding

*Correspondence: david.kracht@uni-ulm.de. Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, Ulm, Germany

© 2015 Kracht and Schober; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Background

Due to the steadily increasing throughput of platforms for next-generation sequencing and the dropping prices of commercially available devices, DNA sequencing is becoming broadly accessible for researchers. Since the output of bases in each sequencing run has reached giga to tera orders within the last few years, strategies for efficiently sharing the sequencing capacity have become of particular interest. Multiplexed sequencing is a key technique that makes sequencing devices accessible in parallel: DNA samples from different experiments can be pooled into batches and sequenced together in a single sequencing run. Before joining different samples it is mandatory to uniquely label the DNA fragments. DNA barcodes, artificially synthesized sequences of nucleic acids, are used as labels to tag the fragments and to separate the output of the sequencers according to the input samples.

The robustness of multiplexing in general relies on the properties of the barcodes used and on how well they are adapted to the underlying sequencing protocol and platform. Essential experimental pre-processing steps, which are needed to prepare the DNA material, can cause errors in the target genomic sequence and in the barcodes as well. Physical and chemical sequence modifications, e.g. fragmentation, ligation, or copy procedures, are known sources of such errors. These errors lead to cross-talk during demultiplexing, i.e., sequences from different batches cannot clearly be distinguished, which is of course highly undesirable.

Different constructions for barcodes have been proposed, for example those of Hamady et al. [1] and Bystrykh [2], which are based on Hamming codes [3], or the approach of Krishnan et al. [4] based on BCH codes [5], to name just a few. For short lengths it is even feasible to apply brute-force search techniques, e.g. Frank [6] or Mir et al. [7], where some of the resulting codes of the latter approach even reach fundamental bounds. The constructions mentioned so far are designed to correct substitution errors only. From a conceptual point of view, all of them try to provide codes that maximize the so-called Hamming distance between the individual code words. The Hamming distance between a pair of sequences measures the minimal number of symbol-wise substitutions that are needed to transform one into the other.

However, some specific sequencing platforms are known to have exceptionally high numbers of insertion and deletion errors, as reported for Roche 454 Pyrosequencing [8], PacBio sequencers [9] or Ion Torrent PGM [10]. See [11-13] for a comparison of these sequencing techniques. Hence, especially for insertion- and deletion-prone devices, one has to consider barcodes that are capable of correcting indels.
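The weakness of the Hamming distance under indels can be made concrete with a small sketch. It is not taken from the paper: the function names and the example strings are illustrative assumptions, and the edit distance is the textbook Levenshtein distance computed by dynamic programming. A single deletion at the start of a word, followed by one context symbol that slides into the read window, changes every position in the Hamming sense but costs only two edit operations.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equally long sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimal number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


sent = "ACGTACGT"
# Hypothetical read: the first symbol is deleted and one context symbol ("A")
# slides into the fixed-length window.
received = sent[1:] + "A"

print(hamming_distance(sent, received), levenshtein_distance(sent, received))
```

In this example the Hamming distance equals the full word length, while the edit distance stays at two, which is why Hamming-based decoding breaks down on indel-prone platforms.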
Promising attempts to find barcodes that are robust to indels have been considered in [14,15] using the so-called edit or Levenshtein distance (see [16] for an overview). For calculating the distance between code words, the Levenshtein distance takes insertion and deletion operations into account and is therefore better suited for applications where decoding based on the Hamming distance fails. Unfortunately, there is no code construction known that directly gives optimal codes in the edit metric. Some greedy (later evolutionary) algorithms have been proposed in [17,18] to find sets of barcodes of moderate size with high minimal edit distance that additionally fulfill biological constraints. However, a practical decoding step for the obtained barcodes has not been addressed in the mentioned papers. This was later done by [19], where it is stated that maximizing the edit distance for barcodes (within a sequence context) is a sub-optimal or even wrong strategy. The context of a code word, which is simply the sequence that contains the DNA tag, plays an important role. Due to indels, the exact boundaries of code words cannot be recovered correctly. This leads to additional errors if the sequence context was not included in the code construction. The DNA context at one end of a code word can be taken into account by using an adapted Sequence-Levenshtein distance, as proposed in [19].

In this manuscript, we provide an entirely different perspective to obtain barcodes. We give codes based on concepts introduced by Davey and MacKay [20]. The original watermark approach is aimed at synchronizing and decoding a continuous stream of large binary data blocks. In the domain of DNA codes we face additional constraints, to which the original concept is adapted. We finally give an exemplary set of barcodes and provide an in silico application, which shows that demultiplexing based on the watermark concept is applicable in the field of next-generation sequencing.
Basic concepts of watermark coding have already been considered for data embedding in DNA [21,22], which is closely related to the barcoding approach for DNA sequencing. However, the transmission of biologically compatible sequences through an evolutionary channel (in living cells) is only slightly similar to the approach we consider in the present manuscript.

Note that search approaches like [16,19] can be used to find better codes in terms of code rate and minimal (sequence) edit distance, but we see two striking advantages of the watermark concept for barcoding. First, the watermark concept contains an implicit synchronization technique that does not need preambles or markers to find the boundaries of code words within an unknown sequence. Embedding of barcodes in an unknown context is not generalized in approaches considering barcodes based on the (sequence) edit distance; a two-ended embedding of sequences is not reflected in this metric. Furthermore, we are able to give an optimal decoding procedure, adapted to a specific error channel. In short, this enables a maximum degree of freedom for existing as well as future experimental settings. Decoding speed is the second important aspect of multiplexing approaches, as the number of barcodes and the available read length have increased dramatically within the last few years. The decoding of barcodes based on watermarks also provides an efficient method for fast decoding in large-scale multiplexing approaches.

Methods

The main concepts presented in this section originate from the ideas first given by Davey and MacKay [20]. The fact that quaternary watermark codes can be applied to DNA sequencing was preliminarily outlined in [23], but sequence constraints on practical oligonucleotides are not covered in that conference paper. Let us state some basic facts about barcodes first and postpone specific sequence constraints to the section about reasonable encoder settings. For applying the concepts of watermark codes to DNA barcoding we have to focus on the following constraints, which have implications for the modifications we consider here. We highlight our contribution to the basic concepts alongside each barcode constraint. A barcode is

- a quaternary sequence. Therefore we propose generalized non-binary models and concepts here (the original channel was considered strictly binary).
- in general embedded in other sequences. Hence we give an adapted transmission scheme and a novel approach to detect barcode boundaries (in [20] the watermark pattern is repetitive and symbols are decoded as a stream).
- length-limited. This consequently restricts the number of code constructions for which an adaptation is practical (long LDPC codes are used in the original paper, which is not plausible for short barcodes).

We will stress additional contributions to the work of Davey and MacKay where needed. Let us start with a model for the sequencing process and continue with encoding and decoding based on watermarks.

Sequencing model

We first define a communication-theoretic model to formally describe the barcoding application.

Substitution model

A simplistic communication-theoretic model for the sequencing of barcodes has been proposed in [7]. Namely, a fixed barcode word $b \in A^N$, with $A = \{A, G, C, T\}$ as the set of possible symbols (nucleic acids), enters a communication channel with output $r \in A^N$, where the channel itself is described by the conditional probabilities of receiving a string $y$ if $x$ was sent, for $x, y \in A^N$. For our purposes this model has to be extended in two directions: First, a barcode word $b$ is assumed to be embedded into a randomly chosen context to obtain the sequence $t$ (details are given below). Second, the sequence $t$ is sent over a so-called sequencing channel, resulting in the received word $r$.
The channel not only substitutes symbols from $t$, but is also able to delete or insert symbols, hence the lengths of $r$ and $t$ may differ.

Embedding of barcodes

Given a barcode word $b$ of length $N$, the sequence $t$ is obtained as the concatenation of two random sequences $t_{pre}$, $t_{post}$ and the sequence $b$ as follows:

$$t = \underbrace{t_1 \cdots t_{\delta}}_{t_{pre}} \; \underbrace{t_{\delta+1} \cdots t_{\delta+N}}_{b = b_1 b_2 \cdots b_N} \; \underbrace{t_{\delta+N+1} \cdots t_L}_{t_{post}} \in A^L,$$

with $\delta \in \{0, 1, \ldots, L-N\}$. The sequences $t_{pre}$ and $t_{post}$ are assumed to have a random length and their symbols are chosen uniformly at random, hence the length $L$ of $t$ is a random variable. We will later choose the lengths of the embedding sequences according to a quasi-normal length distribution on the integers (see section Simulations).

Barcoding and working hypotheses

In this paragraph we focus on two different paradigms for sequence embedding in our barcoding approach. First, we address the total embedding of barcodes. In most sequencing scenarios the barcodes are found next to fixed technical sequences, which are more reliable due to sequence-specific (biological) reactions, e.g. primer or adapter oligonucleotides. For sequencing experiments in which we can rely on exact knowledge of the position of a barcode at one end, highly optimized code words have been proposed in [19]. As an extension of our work it is possible to include $t_{pre}$ or $t_{post}$ as partially known; nevertheless, we restrict ourselves to having no prior knowledge about the context. The main gain from this very strict premise is a striking freedom in the experimental setups to which we can apply our concepts. For example, on platforms with paired-end sequencing we are able to decode barcodes independently of the direction of reads, without restricting the reads to start with a barcode symbol.

The second aspect is the assumption of inherently barcoded sequences. In this manuscript we focus on the problem of decoding (discrimination of code word sequences), conditioned on the fact that a code word is present in the multiplexed sequences.
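The total embedding assumed above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `embed` is hypothetical, and the context lengths are passed in as fixed numbers rather than drawn from the quasi-normal length distribution that the paper uses later.

```python
import random

ALPHABET = "AGCT"


def embed(barcode: str, pre_len: int, post_len: int, rng: random.Random):
    """Place a barcode between two uniformly random context sequences.

    Returns the transmit sequence t and the offset delta at which the
    barcode starts (delta equals the length of t_pre).
    """
    t_pre = "".join(rng.choice(ALPHABET) for _ in range(pre_len))
    t_post = "".join(rng.choice(ALPHABET) for _ in range(post_len))
    return t_pre + barcode + t_post, pre_len


rng = random.Random(0)                      # fixed seed, only for repeatability
t, delta = embed("ACGTAC", pre_len=5, post_len=7, rng=rng)
print(t, delta, t[delta:delta + 6])
```

A decoder working under the no-prior-knowledge hypothesis sees only `t` after it has passed the channel; `delta` is exactly the quantity it has to estimate.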
The problem of detecting a barcode (if it is not guaranteed that every sequence contains one) is more challenging, and we do not address it in the present work. For codes based on the sequence edit distance this extended problem is addressed, e.g., for the PacBio SMRT platform in [24]. We conjecture that the detection problem for barcodes based on watermarks can be solved in future investigations. Such investigations might also lead beyond barcoding for sequencing applications, which we will address at the end of this manuscript. Nevertheless, we rely on barcoded samples for the following considerations.

Sequencing channel model

We define a very simplified model for sequencing errors and discuss some aspects of this oversimplification in the next paragraph. Let us describe the processes involved in sequencing as a memoryless quaternary channel, i.e. each symbol is handled independently of the others. This channel model is specified by a set of parameters $S$, $p_i$ and $p_d$ that are integrated as follows: a transition matrix $S \in \mathbb{R}^{4 \times 4}$, which describes the substitution probabilities, and $p_i$ and $p_d$, which specify the insertion and deletion probabilities.

The channel is modeled as an infinite state machine on symbol level, in which a symbol $t_i$ is queued to pass the channel and therefore undergoes one of the following three events:

- With probability $p_i$, the symbol $t_i$ remains in the queue and the received stream is prolonged by a randomly inserted symbol, where we assume a uniform distribution on $A = \{A, G, C, T\}$.
- With probability $p_d$, the currently queued symbol is deleted.
- With probability $p_t = 1 - p_i - p_d$, the symbol $t_i$ is passed to a substitution channel, which substitutes the symbol $t_i$ according to the transition matrix $S$, with transition probabilities $S(r_j, t_i) = \Pr(r_j \mid t_i)$.

In order to reduce the number of parameters in our model, we will later consider $S$ as a symmetric 4-ary substitution matrix, i.e., we consider substitutions from one base into another with a single error parameter (similar to the model proposed by Jukes and Cantor [25]). Nevertheless, the model considered here would be able to mimic a refined channel if exact empirical parameters are available.

Empirical parameters for a channel model

Empirical data about sequencing errors can be seen as the crux of all HMM-based approaches in the field of DNA sequencing, as predictions can only be as good as the underlying assumptions. But aside from the advertised error rates of the big vendors of sequencing devices, real estimates are rather hard to find in the literature. Further, there is no real agreement about a common technique to mine such data correctly; e.g., using sequence alignment to predict error counts has recently been criticized for over-fitting, because the predefined alignment costs have an impact on the estimates. The interchangeability of substitution, insertion and deletion events (a substitution is equal to a deletion followed by an insertion) makes things even more difficult.
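The state machine of the section Sequencing channel model is easy to simulate. The following is a minimal sketch, not the authors' simulation code: the function name is hypothetical, and the substitution matrix is assumed to be symmetric with a single total substitution rate `p_s` (each wrong base with probability `p_s / 3`), in the spirit of the Jukes-Cantor-like simplification mentioned above.

```python
import random

ALPHABET = "AGCT"


def sequencing_channel(t: str, p_i: float, p_d: float, p_s: float,
                       rng: random.Random) -> str:
    """Pass t through a memoryless insertion/deletion/substitution channel.

    p_i, p_d: insertion and deletion probabilities (p_i must be below 1).
    p_s: assumed total substitution probability of a transmitted symbol.
    """
    received = []
    for symbol in t:
        while True:
            u = rng.random()
            if u < p_i:
                # Insertion: emit a uniform random symbol, keep t_i queued.
                received.append(rng.choice(ALPHABET))
                continue
            if u < p_i + p_d:
                # Deletion: drop the queued symbol.
                break
            # Transmission (probability p_t = 1 - p_i - p_d), then substitution.
            if rng.random() < p_s:
                symbol = rng.choice([a for a in ALPHABET if a != symbol])
            received.append(symbol)
            break
    return "".join(received)


rng = random.Random(42)
print(sequencing_channel("ACGTACGTACGT", p_i=0.05, p_d=0.05, p_s=0.03, rng=rng))
```

Because insertions leave the symbol queued, the number of insertions before a symbol is geometrically distributed and unbounded, which is the reason the decoder later truncates it at a maximum of `I` insertions.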
Additionally, there are some indications that the sequencing channel is more complex than the model we utilize in our approach: sequencing errors seem to depend strongly on the sequence context, with extended implications for the distribution of symbols in sequenced reads. For Illumina this was shown, e.g., in [26]. We know that the DNA polymerase molecule is prone to bursts of insertions and deletions if, for example, repetitive symbols (homopolymers) are present in the physical template sequence. Note that the reliability of the approach presented here depends on the empirical channel parameters, which are needed to obtain the best estimates for demultiplexing samples. A stepwise refinement driven by the feedback of experimental studies is mandatory to adapt watermark barcodes to the demands of different sequencing platforms.

Demultiplexing problem

During the multiplexing step, different samples are labeled with different barcodes, for example $t$ is labeled with $b$ ($b$ embedded in $t$) and $t'$ contains $b'$ (see Figure 1). After passing the discussed sequencing channel, the resulting sequences $r$ and $r'$ can differ from $t$ and $t'$. As the barcodes are possibly affected by errors, $r$ and $r'$ might be associated with the wrong origin during the demultiplexing step, which is called crosstalk. The encoding and decoding based on watermarks is able to reduce such crosstalk.

Figure 1 Sequence embedding of barcodes. Two example sequences $t$ and $t'$ labeled with barcodes $b$ and $b'$ embedded in an unknown DNA context.

Barcode construction on watermarks

For the construction of barcodes we use a concatenated encoding similar to the scheme proposed by Davey and MacKay, which consists of the following two blocks (see Figure 2). An outer code $C_1$ with parameters $[\mathbb{F}_{q_1}, n_1, k_1]$ is a code of length $n_1$, dimension $k_1$ and alphabet size $q_1$ (Galois field $\mathbb{F}_{q_1}$), which provides a set of $q_1^{k_1}$ code words. Note that long LDPC outer codes are used in [20]. In order to avoid the construction of short LDPC codes for barcoding, we will consider different outer codes later.
It is worth mentioning that the minimal distance and the ability for soft decoding are the only demands on outer codes. We consider information words $c \in \mathbb{F}_{q_1}^{k_1}$, which are mapped to outer code words $d \in C_1 \subseteq \mathbb{F}_{q_1}^{n_1}$. The code $C_1$ provides a set of code words with a high minimal Hamming distance. The $n_1 - k_1$ redundant symbols are used to arrange outer code words as far apart as possible. Such a code gives a code rate of $R_1 = k_1 / n_1$.

The second block consists of an inner encoder, which works in the completely opposite direction. An inner code $C_2$ is used to create barcode words that have a low Hamming distance to a watermark sequence. The similarity of a barcode to this watermark pattern is utilized to gain synchronization, as explained below. The inner code adds redundancy to the code words by mapping each outer symbol $d_j \in \mathbb{F}_{q_1}$ to a sparse sequence representation $s^{(j)} \in \mathbb{Z}_4^{n_2}$. The set of sequences $s^{(j)}$ with low mean Hamming weight (number of non-zero symbols) can be seen as the inner code $C_2$. The cardinality of this inner code set is $q_1$ and the rate of the inner code can be stated as

$$R_2 = \frac{\log_2(q_1)}{n_2 \log_2(4)}.$$

By joining $n_1$ of these inner code words, a sparse inner block $s = s^{(1)} s^{(2)} \cdots s^{(n_1)}$ of length $N = n_1 n_2$ is generated. A barcode word $b = s \oplus w \in A^N$ is obtained via symbol-wise addition of an arbitrary watermark sequence $w$ to the sparse inner block $s$, using a fixed mapping of $A = \{A, G, C, T\}$ onto $\mathbb{Z}_4$, where addition is defined modulo 4 (explicit mappings are given in Additional file 1). The final set of barcodes is denoted as $C = C_2 C_1$ throughout this manuscript. The code $C$ gives a code rate

$$R = R_1 R_2 = \frac{k_1 \log_2(q_1)}{n_1 n_2 \log_2(4)}.$$

Decoding

The decoder consists of two blocks (see Figure 2): an inner decoder $D_2$, which utilizes a hidden Markov model (HMM) and channel parameters $\mathcal{H}$ to provide symbol-wise likelihoods. These are fed into the outer decoder $D_1$, which performs a maximum likelihood decoding to obtain an estimate $\hat{c}$ of the sent code word. We first define the HMM and explain how it can be used to find the likeliest transmitted sequence (optimal decoding). Afterwards we give a modified HMM that makes it possible to estimate the boundaries of embedded barcodes, and we end this section with a suboptimal symbol-wise decoding approach with lowered complexity.

HMM for decoding

The basic idea of the HMM presented in this paragraph refers to considerations in [20]. To explain the function of the HMM for decoding it is helpful to ignore the random context and the embedding of code words initially. Therefore we assume $t_{pre}$ and $t_{post}$ to be absent, so that the transmitted sequence is exactly one barcode, i.e. $t = t_1 t_2 \cdots t_M = b$ with $M = N$. If no insertions or deletions occur in the channel, the received sequence $r = r_1 r_2 \cdots r_L$ is as long as the sent sequence $t$, i.e. $L = M$, but some symbols $r_i$ might differ from $t_i$. Assuming that errors occur independently of the position and are identically distributed, we can use fixed substitution probabilities to describe the channel. For channels with insertion and deletion events, the symbol-wise fixed association $t_i \leftrightarrow r_i$ is usually lost.
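The concatenated encoder of the section Barcode construction on watermarks can be summarized in a short sketch. Several details are assumptions made only for illustration: the mapping A, G, C, T onto 0, 1, 2, 3 (the paper's explicit mapping is in its Additional file 1), the choice of the inner code as the $q_1$ lowest-weight words of $\mathbb{Z}_4^{n_2}$, and the fact that the outer code word is passed in directly instead of being produced by an outer encoder.

```python
from itertools import product

BASES = "AGCT"  # assumed mapping onto Z_4: A->0, G->1, C->2, T->3


def sparse_inner_code(q1: int, n2: int):
    """Assumed inner code: the q1 words of Z_4^n2 with the lowest Hamming weight."""
    words = sorted(product(range(4), repeat=n2),
                   key=lambda s: (sum(x != 0 for x in s), s))
    return words[:q1]


def encode(outer_word, inner_code, watermark: str) -> str:
    """Map each outer symbol to its sparse word and add the watermark modulo 4."""
    sparse = [x for d in outer_word for x in inner_code[d]]
    if len(sparse) != len(watermark):
        raise ValueError("watermark length must equal n1 * n2")
    return "".join(BASES[(s_i + BASES.index(w_i)) % 4]
                   for s_i, w_i in zip(sparse, watermark))


inner = sparse_inner_code(q1=4, n2=4)
watermark = "ACGTACGTACGT"                    # arbitrary example, N = 3 * 4
print(encode((0, 0, 0), inner, watermark))    # all-zero sparse block
print(encode((1, 1, 1), inner, watermark))    # differs from w in three places
```

Every barcode produced this way stays close to the watermark in Hamming distance, which is the property the inner decoder exploits for synchronization.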
With insertions and deletions the positions shift: after a single insertion event at the $i$-th symbol, $t_i$ is associated with $r_{i+1}$. A single deletion event before transmitting the $i$-th symbol shifts $t_i$ to the symbol $r_{i-1}$ in the received sequence. Obviously, such errors accumulate during the transmission. One of the main problems of decoding is to estimate the number of insertions and deletions given a received sequence $r$. Therefore we define the drift $x_i$ at the $i$-th transmit symbol as (# insertions) − (# deletions) that occurred in the received sequence before taking $t_i$ into account. The drifts $\{x_i\}$ can be seen as the hidden states of an HMM. Further, we assume the received sequence $r = r_1 r_2 \cdots r_M = r^{(1)} r^{(2)} \cdots r^{(L)}$ to be assembled from sub-sequences $r^{(i)}$, the observables, based on the hidden state $x_i$ and the transmit symbol $t_i$. Every state transition $x_{i-1} \to x_i$ causes the emission of a sub-sequence $r^{(i)}$ that is associated with position $i$ in $t$ (in general HMMs the emissions are associated with single states and not with transitions, compare Figure 3).

Figure 2 Transmission scheme for barcodes based on watermark codes. The scheme consists of three blocks: 1) outer coder, encoding index words $c$ as code words of a linear code $C_1\,[\mathbb{F}_{q_1}, n_1, k_1]$, and soft decoder $D_1$; 2) inner coder, with inner code $C_2$ (for sparse sequence representation), decoder $D_2$ (providing likelihood values) and watermark sequence $w$ (known to encoder and decoder); 3) embedding of barcodes in random context and channel with insertion, deletion and substitution errors.

To characterize the transition probabilities among hidden states and the emission probabilities of observables in the HMM, we use the following set of parameters $\mathcal{H} : \{S, p_i, p_d, I\}$. Although we use the same notation for the parameters as before (infinite state machine in section Sequencing channel model), the channel model and the HMM discussed here are not equivalent. The matrix $S$ parametrizes substitution events, $p_i$ and $p_d$ model the probabilities of insertions and deletions, and $I$ is used to limit the maximal length of observables. For computational reasons we suppose that only $I$ consecutive (leading) insertions can be present in observables (the channel model is capable of inserting an infinite number of symbols). We further assume that the last symbol of an observable $r^{(i)}$ is created by either deleting the current transmit symbol $t_i$ or appending $t_i$ at the end, substituted according to $S$. If $t_i$ is deleted, we either observe just $r^{(i)} = \epsilon$ (the empty symbol) or, if leading insertions are present, the last symbol of the observable is a random (inserted) symbol. This limits the set of observables in our model to

$$r^{(i)} \in \Big\{ \epsilon,\; t_i,\; \bar{t}_i,\; {\ast}t_i,\; {\ast}\bar{t}_i,\; {\ast}{\ast}t_i,\; \ldots,\; \underbrace{{\ast} \cdots {\ast}}_{I} \bar{t}_i \Big\},$$

where we use wildcards $\ast$ to symbolize random leading insertions. With $\bar{t}_i \in \mathbb{Z}_4 \setminus t_i$ we denote symbols differing from $t_i$.

Now we give the transition and emission probabilities of the stated HMM. Due to our non-binary adaptation, we use a slightly different notation than the original approach in [20]. An elementary part of both probabilities is the length distribution of observables, which can be derived from

$$\alpha(\ell) = \begin{cases} 1 & \text{for } \ell = 0, \\ (p_i)^{\ell} & \text{for } 0 < \ell < I, \\ \dfrac{(p_i)^{I}}{1 - p_i} & \text{for } \ell = I, \\ 0 & \text{else}, \end{cases}$$

the probability of observing $\ell$ random insertions, conditioned on an observable of length $\ell$.

Figure 3 Illustration of the HMM. Exemplary series of transitions of the HMM and observables $r^{(j)}$, based on state transitions $x_{j-1} \to x_j$ and transmit symbols $t_j$ for $j \in \{i-1, i, i+1\}$.

The probability of a sequence $r^{(i)}$ of length $\ell$ is given by $p(\ell) = \alpha(\ell)\, p_d + \alpha(\ell - 1)\, p_t$, with $p_t = 1 - p_i - p_d$. The sequence can be obtained via $\ell$ insertions and deleting $t_i$, or via $\ell - 1$ leading insertions and attaching $t_i$ (or $\bar{t}_i$) as the last symbol. The transition probability of the HMM can easily be stated as $\Pr(x_i \mid x_{i-1}) = p(\ell)$ by choosing $\ell = x_i - x_{i-1} + 1$, i.e. two consecutive drift states determine the length of observables and vice versa. The joint probability $\Pr(r^{(i)}, x_i \mid x_{i-1})$ of observing $r^{(i)}$ with a state transition $x_{i-1} \to x_i$ is

$$Q_S(\ell, r^{(i)}, t_i) = \alpha(\ell)\, \frac{p_d}{4^{\ell}} + \alpha(\ell - 1)\, \frac{p_t}{4^{\ell - 1}}\, S\big(r^{(i)}_{\ell}, t_i\big),$$

with $0 \le \ell \le I + 1$. Explicitly, $r^{(i)}$ can consist of $\ell$ random insertions and is then not associated with $t_i$ (as $t_i$ is deleted with probability $p_d$). It is also possible to have $\ell - 1$ leading insertions, after which $t_i$ is substituted according to the substitution matrix $S$. The probability $Q_S$ can easily be used to calculate the likelihoods $\Pr(r \mid \mathcal{H}, t)$, as shown in the next section. Note that for the original binary perspective in [20] no matrix representation of the substitution probability is needed. Here we generalize the basic concepts.

Optimal decoding

Given a received sequence $r = r_1 r_2 \cdots r_L$, the set of possible barcodes $b = b_1 b_2 \cdots b_N \in C$ and the model parameters $\mathcal{H}$, we can describe the decoding as the following maximization:

$$\hat{b} = \arg\max_{b \in C} \{ \Pr(r \mid \mathcal{H}, b) \}.$$

The sequence $r$ is most likely to be observed if $\hat{b}$ is assumed to be the transmitted sequence and the channel acts exactly as described by the model parameters $\mathcal{H}$. Let us briefly explain how $\Pr(r \mid \mathcal{H}, b)$ can be calculated for the given HMM. The following methods can be found in common textbooks or, e.g., in [27], but we recall the calculations as we focus on a very special kind of HMM here. We have to associate the observables of the HMM with received symbols, which is achieved with the drift states. If the barcode symbol $b_i$ is assumed to be the $i$-th transmit symbol $t_i$, then $b_i$ can be associated with the $i$-th observable $r^{(i)}$ by the HMM. The drift state $x_i$ can be used to associate the last symbol $r^{(i)}_{\ell}$ of an observable with the symbol $r_{i+x_i}$ in the received sequence $r$. To calculate likelihood values we consider the so-called forward metric

$$F_i(y) = \Pr\big(r_1 \cdots r_{i-1+y},\, x_{i-1} = y \mid \mathcal{H}, b\big)$$

as the probability of having reached the drift state $y$ and received the symbols $r_1 \cdots r_{i-1+y}$ before considering the $i$-th transmitted symbol. In a transmission setting where $t = b$ we cannot have any non-zero drift state before considering the first symbol $b_1$. Thus we define $F_1(y) = \Pr(x_0 = y)$ as $F_1(y = 0) = 1$ and $F_1(y \ne 0) = 0$. The forward metric can easily be calculated as

$$F_{i+1}(y) = \sum_{\ell = 0}^{I+1} F_i(y - \delta_{\ell})\, Q_S(\ell, r_{i+y}, b_i),$$

with $\delta_{\ell} = \ell - 1$, for $1 \le i \le N$ and $1 \le i + y \le M$. The likelihood values $\Pr(r \mid \mathcal{H}, b)$ can be obtained as $F_{N+1}(L - M)$. Using this optimal decoding approach there is no need for outer decoding, and demultiplexing ends with an estimated barcode as the output of the inner decoder. For large sets of code words this maximum likelihood approach becomes computationally expensive, since for an outer code of dimension $k_1$ and a symbol alphabet of size $q_1$ there are $q_1^{k_1}$ calculations of the forward metric to perform for decoding. Before we give a suboptimal decoder with reduced complexity, we have to consider how sequence boundaries can be obtained for barcodes embedded in a random context.

HMM to estimate barcode boundaries

Up to this paragraph we have not discussed the role of the watermark in the transmission. We have just illustrated an HMM that is able to give the likelihood of a received sequence $r$ conditioned on a hidden transmit sequence $t$ and parameters $\mathcal{H}$. We will now take a closer look at the watermark and at how the inner encoding can be understood from a communication-theoretic point of view (see Figure 2). Recall that a barcode is constructed via the addition of two quaternary sequences $s, w \in \mathbb{Z}_4^N$ as $b = s \oplus w$. Due to the symmetry of the addition, there are two ways to perceive a data transmission: apparently there is the transmission $s \xrightarrow{\,w\,} b$, with $w$ causing some distortion.
A further valid perspective on the transmission is $w \xrightarrow{\,s\,} b$, which takes $w$ as the source with $s$ causing substitution errors. Here we stick to the latter concept, treating the symbols $s_i$ as independent and identically distributed (iid) errors on $w$ (which is used to find the boundaries of barcodes). Given an iid assumption for the symbols $s_i$ and the inner code $C_2$, we can calculate an extended substitution matrix $S'$, with probabilities $S'(b_i, w_i)$ for placing a symbol $b_i$ in the barcode when $w_i$ is present at position $i$. We use the matrix $S'$ to define an upstream meta-channel that causes additional substitutions to those generated by the real sequencing channel. In analogy to the previous paragraph, we can state emission probabilities

$$Q_E\big(w_i, r^{(i)}\big) = \alpha(\ell)\, \frac{p_d}{4^{\ell}} + \alpha(\ell - 1)\, \frac{p_t}{4^{\ell - 1}}\, E\big(r^{(i)}_{\ell}, w_i\big)$$

for observing a certain $r^{(i)}$ if the symbol $w_i$ is present at position $i$. With

$$E\big(r^{(i)}_{\ell}, w_i\big) = \sum_{b} S\big(r^{(i)}_{\ell}, b\big)\, S'(b, w_i)$$

we denote the effective probability of having substitutions due to the sequence $s$ and the substitution matrix $S$. The task of estimating barcode boundaries can be reduced to the estimation of the likeliest sequence of drift states $\Pr(x_1 x_2 \cdots x_{N+1} \mid \mathcal{H}, r, w)$ in the HMM using $Q_E(w_i, \cdot)$, as we show in the next section.

Finding barcode boundaries

For embedded code words we can understand the symbol $b_i$ as the shifted transmit symbol $t_{\delta+i}$, and thus $b_i$ has to be linked to the observable $r^{(\delta+i)}$ in the HMM (see section Embedding of barcodes). But we can easily integrate the sequence offset $\delta$ as the initial drift $x_0$. For the symbols $b_i$ we therefore redefine the drift states $\{x_i\}$ of the HMM according to the embedded symbols. To calculate likelihood values for received sequences, we now consider the forward metric

$$F_i(y) = \Pr(r_1 \cdots r_{i-1+y},\, x_{i-1} = y \mid \mathcal{H}, w)$$

as the probability of having reached the drift state $y$ and received the symbols $r_1 \cdots r_{i-1+y}$ before considering the $i$-th watermark symbol.
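The effective matrix $E$ that enters this forward metric can be computed directly once $S'$ is fixed. The sketch below rests on assumptions that are not spelled out in the text: all inner code words are taken as equally likely, the iid distribution of a sparse symbol is estimated as its relative frequency over all positions of the inner code, and $S$ is stored as a nested list `S[r][b]` $= \Pr(r \mid b)$ over $\mathbb{Z}_4$. The function names are illustrative.

```python
from collections import Counter


def sparse_symbol_distribution(inner_code):
    """Relative frequency of each value 0..3 over all inner code positions."""
    counts = Counter(x for word in inner_code for x in word)
    total = sum(counts.values())
    return [counts.get(v, 0) / total for v in range(4)]


def effective_matrix(S, inner_code):
    """E[r][w] = sum_b S[r][b] * S'[b][w], with S'[b][w] = Pr(s = b - w mod 4)."""
    ps = sparse_symbol_distribution(inner_code)
    s_prime = [[ps[(b - w) % 4] for w in range(4)] for b in range(4)]
    return [[sum(S[r][b] * s_prime[b][w] for b in range(4)) for w in range(4)]
            for r in range(4)]


# Symmetric sequencing channel with an assumed total substitution rate of 3 %.
S = [[0.97 if r == b else 0.01 for b in range(4)] for r in range(4)]
inner = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 0, 3)]
E = effective_matrix(S, inner)
print([round(E[w][w], 4) for w in range(4)])
```

The diagonal of `E` drops as the inner code becomes denser, which quantifies how much of the watermark still "shines through" a barcode after the meta-channel and the sequencing channel.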
We furthermore have to determine an initial distribution for the quantity $F_1(y)$ to be able to calculate the forward metric

$$F_{i+1}(y) = \sum_{\ell = 0}^{I+1} F_i(y - \delta_{\ell})\, Q_E\big(w_i, r_{i+y}\big),$$

with $\delta_{\ell} = \ell - 1$. We can furthermore calculate backward quantities

$$B_i(y) = \Pr(r_{i+1+y} \cdots \mid x_i = y, \mathcal{H}, w)$$

as the probability of receiving a certain tail of symbols starting with $r_{i+1+y}$, given a state $y$ associated with the $i$-th watermark symbol, which can be calculated as

$$B_{i-1}(y) = \sum_{\ell = 0}^{I+1} B_i(y + \delta_{\ell})\, Q_E\big(w_i, r_{i+y+\delta_{\ell}}\big).$$

If we have a good guess for the distributions $F_1(y_1)$ and $B_N(y_N)$, i.e. an a priori distribution of having barcode boundaries at positions $y_1$ and $N + y_N$, respectively, we make use of it. To enable an alignment of the embedded barcodes, we have to introduce novel prior distributions, slightly different from those proposed by Davey and MacKay. A conservative approach is setting non-zero probabilities $F_1(y) = M^{-1}$, where $0 \le y \le M$, and $B_N(y) = 1$ on drift positions $-N \le y \le M - N$. Here we have to differ from the original concept [20], which is needed to detect a single non-repetitive watermark within an unknown context. The likeliest drift associated with the $i$-th embedded symbol is finally inferred as

$$\hat{x}_i = \arg\max_{y} \{ \Pr(y \mid \mathcal{H}, r, w) \propto F_{i+1}(y)\, B_i(y) \}.$$

The estimates $\hat{x}_0$ and $\hat{x}_N$ are used to refer to the barcode boundaries in the received sequence.

Symbol-wise likelihoods

We can finally perform a symbol-wise decoding as follows. The forward and backward metrics not only provide estimates for the start and the end position of an entire barcode word, but also make it possible to calculate conditional likelihoods

$$P\big(r \mid \mathcal{H}, w, s^{(j)}\big) = \sum_{y, z} F_u(y)\, \pi_u\big(y, z, s^{(j)}\big)\, B_v(z)$$

based on inner code words $s^{(j)} \in C_2$. With the indexes $u = (j-1)\, n_2 + 1$ and $v = j\, n_2$ we denote the delimiters of $s^{(j)}$ (compare Figure 2). The drifts $y$ and $z$ define potential boundaries $u + y$ and $v + z$ of an emitted sub-sequence of $r$ that is assumed to depend on $s^{(j)}$. With $\pi_u(y, z, s^{(j)})$ we symbolize a truncated version of the forward metric, starting at states $y$ and ending at states $z$. For the evaluation of $\pi$ we further consider the emission probabilities $Q_S(\ell, r^{(i)}, w_i \oplus s_i)$. As the inner code words are determined by outer code symbols, i.e. $C_2 : d_j \mapsto s^{(j)}$, we can easily derive symbol-wise marginal a posteriori probabilities $P(d_j \mid \mathcal{H}, r)$ from the conditional likelihoods. The symbol-wise marginals are finally utilized in the outer decoder (see Figure 2). Using this suboptimal inner decoding approach, we are able to decrease the computational costs to $q_1 n_1$ evaluations of the forward metric $\pi$ (compare section Optimal decoding). As we need an additional soft decoding for the outer code, further operations are needed: for a maximum likelihood approach we have to consider $q_1^{k_1}$ code words and search for the likeliest solution.
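To make the decoding recursions of this section tangible, the following sketch implements the simplest case, the optimal decoder of the section Optimal decoding for a barcode without context ($t = b$). It is an illustration under assumptions rather than the authors' implementation: $S$ is taken as symmetric with one total substitution rate, the forward metric is indexed by the number $m = i - 1 + y$ of received symbols consumed instead of the drift $y$ (an equivalent re-indexing), and the $\ell = I$ case of $\alpha$ is assumed to collect the mass of all longer insertion runs so that the length distribution $p(\ell)$ sums to one. All function names and the toy codebook are hypothetical.

```python
ALPHABET = "AGCT"


def symmetric_matrix(p_s):
    """S[r][t] = Pr(r | t) with an assumed single substitution parameter p_s."""
    return {r: {t: (1.0 - p_s) if r == t else p_s / 3.0 for t in ALPHABET}
            for r in ALPHABET}


def alpha(l, p_i, I):
    """Insertion-run weight, truncated at I leading insertions."""
    if l == 0:
        return 1.0
    if 0 < l < I:
        return p_i ** l
    if l == I:
        return p_i ** I / (1.0 - p_i)
    return 0.0


def length_probability(l, p_i, p_d, I):
    """p(l): probability that one transmit symbol yields an observable of length l."""
    return alpha(l, p_i, I) * p_d + alpha(l - 1, p_i, I) * (1.0 - p_i - p_d)


def emission(l, last, t, S, p_i, p_d, I):
    """Q_S: joint probability of an observable of length l ending in `last`."""
    q = alpha(l, p_i, I) * p_d / 4 ** l
    if l >= 1:
        p_t = 1.0 - p_i - p_d
        q += alpha(l - 1, p_i, I) * p_t / 4 ** (l - 1) * S[last][t]
    return q


def likelihood(r, b, S, p_i, p_d, I):
    """Pr(r | H, b) via the forward metric; F[m] covers the first m symbols of r."""
    F = [1.0] + [0.0] * len(r)
    for t in b:
        G = [0.0] * (len(r) + 1)
        for m in range(len(r) + 1):
            for l in range(min(I + 1, m) + 1):
                if F[m - l] > 0.0:
                    last = r[m - 1] if l >= 1 else None
                    G[m] += F[m - l] * emission(l, last, t, S, p_i, p_d, I)
        F = G
    return F[len(r)]


def decode(r, codebook, S, p_i, p_d, I):
    """Maximum likelihood barcode for the received sequence r."""
    return max(codebook, key=lambda b: likelihood(r, b, S, p_i, p_d, I))


codebook = ["AAAAAAAA", "CCCCCCCC", "GGGGTTTT"]   # toy code words, not from the paper
S = symmetric_matrix(0.03)
print(decode("AAAAAAA", codebook, S, p_i=0.05, p_d=0.05, I=2))    # one deletion
print(decode("CCCCGCCCC", codebook, S, p_i=0.05, p_d=0.05, I=2))  # one insertion
```

The boundary estimation and the symbol-wise decoder follow the same pattern: the emission uses $E$ instead of $S$, the initial vector is replaced by the uniform prior on the start position, and a backward pass of the same shape supplies $B_i(y)$.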
Simulations

In order to perform an in silico application of barcodes based on the watermark concept, we first have to define a reasonable setting for encoding, which is already quite challenging.

Reasonable encoder settings

As stated before (see section Barcode construction on watermark), several parameters influence the concept, and we have to find a reasonable setup for them to run simulations. First, there is the outer code, which in combination with the inner coding should lead to short barcode words, because we do not want to produce exceptional overhead when multiplexing (tagging) target fragments. Then there are the minimum distance of the outer code words and the sparsity of the inner encoder, which can be characterized independently. Finally, we have the watermark sequence, which can be chosen randomly. We use this degree of freedom to incorporate an additional sequence constraint that barcodes should fulfil to be experimentally valid: we run a greedy search for the watermark pattern that maximizes the number of barcodes meeting the sequence constraint. Let us consider the particular settings one by one in the following paragraphs.

Suitable outer codes

For the construction of barcodes we focus on a target length of $N = n_1 n_2 \in [12, \ldots, 25]$ symbols with $n_1 \in \{3, 4, 5, 6\}$ and $n_2 \in \{4, 5, 6, 7, 8\}$. Further, we limit the outer code $C_1[\mathbb{F}_{q_1}, n_1, k_1, d_H]$ to the best known linear codes listed in [28] for several cardinalities of Galois fields $\mathbb{F}_{q_1}$, for which we considered $q_1 \in \{2, 3, 4, 5, 7, 8, 9\}$. Long LDPC codes were used in the original approach of Davey and MacKay (see [20]), but as the construction of short LDPC codes would be somewhat confusing for readers involved in channel coding, we decided to use the best known linear codes. Note, however, that the Hamming distance and the ability for soft decoding are the only demands on outer codes here, and LDPC codes are likely to perform equivalently.
The minimal Hamming distance $d_H$ of the best known linear codes we used is either maximal or the highest known for the given parameters. Additionally, we bound the dimension $k_1$ from both sides: the code set size $|C_1| = q_1^{k_1}$ has to exceed 48 and stay below an upper limit. This guarantees a certain minimal code rate on the one side and limits the computational effort of the outer decoding on the other side. We end up with 263 possible code configurations, but most of the resulting codes perform disastrously with the watermark concept, because the watermark is heavily corrupted by inner code words.

Sparsity of the inner code

We consider the density (mean Hamming weight)

$$\bar{w}_H(C_2) = \frac{1}{n(C_2)\,|C_2|} \sum_{s \in C_2} w_H(s)$$

of the inner code, with $w_H(\cdot)$ as Hamming weight, $n(\cdot)$ as length and $|\cdot|$ as cardinality of the code. The inner code $C_2$ needs to exhibit a low density to keep the watermark shining through the barcodes when inner code words are added. For large densities there is no ability left to detect the barcode boundaries, and consequently decoding will fail completely. Inspired by the approach in [29], we avoid the all-zeros code word in $C_2$, but further bound the density to $\bar{w}_H(C_2) < 0.3$ to keep the watermark present. Finally, we end up with a set of 73 parameter configurations for $q_1$, $k_1$, $n_1$ and $n_2$. For each of the 73 parameter sets we ran a brute-force search ($10^7$ trials), in which we iteratively selected one inner code and one watermark at random to produce barcode sets. From the evaluated settings we kept the one that met the following sequence constraints.

Sequence constraints

We filtered out code words with unbalanced symbol counts to respect limitations on the GC content of barcodes. Such filtering can be seen as a de facto standard in the construction of barcodes (see for example [1,7,19]) and is related to technical constraints in the preparation and sequencing of genomic material. The relative frequency of a subset of two symbols should be neither below 40% nor above 60% in each barcode; otherwise we excluded the barcode. We furthermore excluded barcodes with perfect self-complementation and with more than two sequential repetitions of the same base (homopolymer length), similar to the restrictions stated in [24]. We consider these settings sufficient and strict enough to avoid experimental problems during the preparation in real sequencing tasks. Discarding such inappropriate code words means an additional loss in code rate.

Increasing the mean edit distance

For decoding based on the HMM approach, the edit distance matters implicitly, thus we try to increase the mean edit distance of the code words with a simple strategy. Of two sets of barcodes with identical counts of remaining barcodes (after filtering) we keep the one maximizing the mean edit distance

$$\bar{d}_E(C) = \frac{1}{n(C)\,\big(n(C) - 1\big)} \sum_{b_i, b_j \in C:\, i \neq j} d_E(b_i, b_j),$$

where $n(C)$ here counts the barcodes in $C$ and $d_E(b_i, b_j)$ denotes the edit distance [30] of barcodes $b_i$ and $b_j$, which can be understood as the number of single-symbol sequence operations needed to transform $b_i$ into $b_j$ or vice versa.
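The sequence constraints and the mean edit distance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-symbol subset is taken to be {G, C}, "perfect self-complementation" is interpreted as a barcode that equals its own reverse complement, and homopolymer runs of up to two bases are allowed. All function names are hypothetical.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_fraction(seq):
    """Relative frequency of G and C in the barcode."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of identical bases."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def is_self_complementary(seq):
    """Assumption: a barcode is rejected if it equals its reverse complement."""
    return "".join(COMPLEMENT[b] for b in reversed(seq)) == seq


def passes_constraints(seq):
    return (
        0.4 <= gc_fraction(seq) <= 0.6
        and max_homopolymer(seq) <= 2
        and not is_self_complementary(seq)
    )


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def mean_edit_distance(codes):
    """Average edit distance over all ordered pairs of distinct barcodes."""
    pairs = list(permutations(codes, 2))
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)
```

Given two candidate sets with the same number of barcodes surviving `passes_constraints`, the selection rule from the text would keep the set with the larger `mean_edit_distance`.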
However, as the edit distance is bounded by the Hamming distance, we do not gain a lot with this final heuristic refinement step.

Estimating the decoding error

To evaluate the refined set of 73 codes, we use the following demultiplexing scenario. For each code we consider an error curve composed of estimates of the decoding error probability on different channel settings. The estimation of a single point in the error curve is based on a set of 200,000 barcodes, which we refer to as a batch. Each batch is processed with a certain symbol mutation probability $p_{mut} \in [0.01, 0.16]$. This value determines the symbol-wise probability of an edit operation caused by our channel model. A similar approach has been considered in [19], but we would like to be a bit more precise in describing the modifications of the channel model. Buschmann et al. [19] considered a minimal mutation probability of $10^{-1}$ for their evaluation, but others claim that error rates are several orders of magnitude lower [11]. We will take such indications into account. Each barcode from a batch is embedded in a random context of variable length (compare section Embedding of barcodes). We use normally distributed random variables (with mean $\mu = 50$ and variance $\sigma = 5$) to determine the length of $t_{pre}$ as well as $t_{post}$. We simply take the nearest non-negative integer to define the length of the random context, whose symbols are uniformly distributed on {A, C, G, T}. We further use a state machine to produce erroneous received sequences based on the following four events: correct transmission C, substitution S, insertion I and deletion D. In slightly different notation to the former model (see section Sequencing channel), we assign probabilities to the events C and S directly and do not use conditional probabilities like a substitution matrix S. The probability for correct transmission C or substitution S is equal to $p_t$ in the former representation. The probability for C equals $1 - p_{mut}$ and the probability for any of the error events (S, I or D) is $p_{mut}$. The error events S, I and D are equally likely.
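The four-event channel and the random embedding described above can be sketched as follows. This is an illustrative reading, not the authors' simulator: the spread of 5 is passed to `random.gauss` as a standard deviation, a substitution always changes the base, and an insertion emits one random base in front of the transmitted one. The names `embed` and `transmit` are hypothetical.

```python
import random

ALPHABET = "ACGT"


def random_context_length(rng, mean=50.0, spread=5.0):
    """Nearest non-negative integer to a normally distributed draw.

    Assumption: `spread` is used as the standard deviation.
    """
    return max(0, round(rng.gauss(mean, spread)))


def embed(barcode, rng):
    """Surround the barcode with uniformly random context symbols."""
    pre = "".join(rng.choice(ALPHABET) for _ in range(random_context_length(rng)))
    post = "".join(rng.choice(ALPHABET) for _ in range(random_context_length(rng)))
    return pre + barcode + post, len(pre)


def transmit(seq, p_mut, rng):
    """Pass a sequence through the C/S/I/D event machine.

    Each symbol is transmitted correctly with probability 1 - p_mut;
    otherwise one of S, I, D is drawn with equal probability.
    """
    out, events = [], []
    for base in seq:
        event = "C" if rng.random() >= p_mut else rng.choice("SID")
        events.append(event)
        if event == "C":
            out.append(base)
        elif event == "S":
            # Assumption: a substitution always yields a different base.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        elif event == "I":
            # Assumption: one random base is inserted before the original.
            out.append(rng.choice(ALPHABET))
            out.append(base)
        # event == "D": the base is dropped, nothing is emitted.
    return "".join(out), events


rng = random.Random(0)
sent, barcode_start = embed("ACGTACGTACGT", rng)
received, events = transmit(sent, 0.05, rng)
```

Returning the event list alongside the received sequence makes it easy to check afterwards whether any error event fell inside the barcode region.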
To obtain an equal distribution among the mentioned events, it is easier to use the present notation, but equivalent behavior can be achieved with both versions of the channel state machine. To save decoding time, we iteratively pass each transmit sequence through the state machine until we have at least one error event within the barcode region. The probability of having a defective code word of length $N$ is $\Pr(\text{def}) = 1 - (1 - p_{mut})^N$. We further consider an error-free barcode to be decoded perfectly, i.e. $\Pr(\text{err} \mid \overline{\text{def}}) = 0$, with err denoting the event of a decoding error. Please note that this neglects the false positive rate introduced by sporadic similarities of the context with dedicated barcode words. This rate is supposed to grow linearly with the context length and the size of the barcode set. Nevertheless, the probability of false positive events tends to zero exponentially with the length $N$ of the code words. For barcodes of length 12 embedded in 100 random symbols we consider this error a marginal offset to the estimated decoding errors. Our evaluation finally ends up in estimating the conditional error $\Pr(\text{err} \mid \text{def})$, which gives an estimate for the unconditioned error probability as $P_e = \Pr(\text{err} \mid \text{def}) \Pr(\text{def})$.

Results and discussion

First we give a rough overview of meaningful properties of watermark-based barcodes with the considered 73 parameterizations. Furthermore, we provide a refined analysis and evaluation for certain codes in the already defined decoding scenario. To minimize the textual elements in the figures, we use the following notation: $q_1\ k_1\ n_1\ n_2$ indicates the concatenated coding using an outer code $C_1[\mathbb{F}_{q_1}, n_1, k_1]$ and an inner code $C_2 : \mathbb{F}_{q_1} \to \mathbb{Z}_4^{n_2}$. The particular codes can be found in Additional file 1 of this paper.

Properties of barcodes based on watermarks

In Figure 4 we link the principal characteristics, namely the mean edit distance $\bar{d}_E$, the mean density $\bar{w}_H$ and the cardinality $|C_2 C_1|$ of barcodes, in a comprehensive illustration. We use star symbols to indicate the mean density (several levels) and two-dimensional coordinates to link the mean edit distance and the size of the code sets. There are distinct blocks in which the influence of the inner code can be examined separately. Fixing the outer code for $q_1 \in \{7, 8, 9\}$ and increasing the length $n_2$ of the inner code words, one can deduce how code rate is exchanged for lowered mean density and increased mean distance. However, we also see configurations with $n_2 \in \{4, 5, 6\}$ where we have not been able to increase the mean distance by prolonging the inner code. We have observed several clusters where similar effects can be found. For the codes $q_1\ k_1\ 4\ n_2$, $q_1\ k_1\ 5\ n_2$ and $q_1\ k_1\ 6\ n_2$ the effect of the outer code can be separated. We find clusters of star symbols at different mean distances (see darkened areas in Figure 4). These levels can be explained by the different minimum Hamming distances of the outer codes: we have Hamming distances 2, 3 and 4 for the present outer codes at $n_1$ equal to 4, 5 and 6. For concatenated coding it is known that the minimum distances of inner and outer codes are multiplicative [31]. As the edit metric is upper bounded by the Hamming distance, we anticipate the described levels for edit distances. The mentioned leveling can consistently be found for all outer codes.
Although we have maximized the edit distance of barcodes on average, it is also interesting to look at the pairwise distances of barcodes. To examine the distances in detail we utilize the so-called distance distribution. In [32] the average number of code words at a certain distance to a fixed code word is considered a useful distance measure for non-linear codes based on the Hamming metric. The edit distance distribution of a code $C$ consists of the numbers

$$D_e = \frac{1}{M} \left| \big\{ (i, j) : d_E(b_i, b_j) = e,\ b_i, b_j \in C \big\} \right|,$$

where $M = |C|(|C| - 1)$ and $d_E$ denotes the edit distance. In Figure 5 we illustrate the distance distribution of barcodes with parameters $8\ 3\ n_1\ n_2$ (Figure 4, in blue). Apparently, there are particular pairs of code words with very low edit distances, but recalling that the code construction is based on inner code words with very low Hamming weight, this is not too surprising. Nevertheless, some longer codes show a negligible amount of such code word pairs with small edit distances. For instance, in one of these codes less than 1% of all possible pairs of barcodes show an edit distance $d_E < 6$. Due to the very strict filtering (see Sequence constraints), we had to prune the sets of $q_1^{k_1}$ outer code words, which additionally lowers the code rate. A detailed description of the excluded barcodes can be found in Additional file 1.

Figure 4 Properties of the examined barcodes based on watermarks. Star symbols indicate the mean density, and two-dimensional coordinates link the mean edit distance $\bar{d}_E$ and the size $|C_2 C_1|$ of codes. Parameters of the barcodes are labeled as $q_1\ k_1\ n_1\ n_2$, denoting an outer code $C_1[\mathbb{F}_{q_1}, n_1, k_1]$ and an inner code $C_2 : \mathbb{F}_{q_1} \to \mathbb{Z}_4^{n_2}$. Codes evaluated in the refined analysis and simulations (see Figures 5 and 6) are colored.

Evaluation of decoding

In Figure 6 we illustrate the estimates $P_e$ of the decoding error probability for different code settings. We ran simulations for all 73 barcode sets and ranked the sets according to their decoding behavior. To give a rough outline of the performance of our approach we show the barcode sets that performed best (Figures 4 and 6, in green) and worst (Figures 4 and 6, in red). A barcode length of 12 (codes $q_1\ k_1\ 3\ 4$) seems to be insufficient to provide good synchronization based on watermarks, and thus the majority of decoding errors were found to be caused by synchronization issues (results not shown). Surprisingly, the best performing set of barcodes does not occupy the maximal possible length, but 24 symbols. As there are only 49 sequences available, this set of code words provides a poor code rate of $\log_2(7^2) / \log_2(4^{24}) \approx 0.117$. A reasonable trade-off between error-correcting capability and cardinality is provided, for example, by codes with parameters $8\ 3\ n_1\ n_2$ (Figures 4, 5 and 6, in blue).
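The code rate used above is simply the ratio of information bits to the bits occupied by the barcode on a four-letter alphabet. A minimal sketch follows; the function name `code_rate` is hypothetical, and the second example is an assumed illustration rather than a configuration confirmed by the text.

```python
import math


def code_rate(num_codewords, length, alphabet_size=4):
    """Code rate log2(|C|) / log2(alphabet_size ** length)."""
    return math.log2(num_codewords) / (length * math.log2(alphabet_size))


# 49 code words of length 24, as in the best performing set.
rate_best = code_rate(49, 24)

# Assumed illustration: an unfiltered set of 512 code words at length 16.
rate_example = code_rate(512, 16)
```

Writing the denominator as `length * log2(alphabet_size)` avoids forming the large integer $4^N$ explicitly, although Python would handle that as well.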
Although we are facing relatively low code rates of up to 0.281 (compared to approaches like [19]), more than 400 sequences are available with quite surprising decoding capability.

Decoding complexity

Following [20], we utilized a fast decoding approach with reduced complexity for our simulations. We bounded the maximal number of possible drift states $|\{x_i\}|$ in the HMM to $\{5, \ldots, 20\}$, according to the suggestion given by Davey and MacKay. The complexity of decoding a single embedded barcode is $O(MLI + q_1 N I + q_1^{k_1})$, with $N$ denoting the length of the barcode that is embedded in a received sequence of length $M$. Decoding is based on the assumption that the channel can introduce at most $I$ inserted symbols, and we consider the maximal drift between received and transmitted sequence to be limited by $L$. On the order of $MLI$ calculations are needed to estimate the barcode boundaries, $n_2 I$ operations are needed to provide soft values for each of the $q_1 n_1$ possible outer code symbols (resulting in $q_1 N I$), and $q_1^{k_1}$ final comparisons have to be spent on soft decoding the outer code in the most expensive case (maximum likelihood). The prototype decoder with which we ran our simulations is implemented in MATLAB. We further parallelized the decoding procedure and created jobs of $10^6$ received sequences that were processed by single cores (Opteron, 2.6 GHz). The average length of received sequences was in the range of 112 to 150 symbols, resulting in an average processing time of 6 hours for the tasks with the lowest calculation costs. The longest time we needed to complete demultiplexing of $10^6$ sequences was strictly below 24 hours (on a single core).

Figure 5 Exemplary distance distribution of sets of barcodes based on watermarks. $D_e$ denotes the relative number of code pairs with an edit distance equal to $e$. Each barcode set consists of at most 512 code words, according to the examined outer codes $C_1[\mathbb{F}_8, n_1, 3]$ (see also Figure 4, in blue). The distance distribution is normalized with respect to the number of code words after filtering.

Figure 6 Simulation results for a realistic decoding scenario. Estimated decoding error probability $P_e$ for barcode sets with parameters $q_1\ k_1\ n_1\ n_2$, consisting of an outer code $C_1[\mathbb{F}_{q_1}, n_1, k_1]$ and an inner code $C_2 : \mathbb{F}_{q_1} \to \mathbb{Z}_4^{n_2}$. All evaluated codes are highlighted in Figure 4 with identical colors. Each barcode set is tested for different symbol mutation probabilities $p_{mut} \in [0.01, 0.16]$. On average, each randomly drawn barcode is embedded in 100 random symbols before decoding.

Future directions

Apart from the theoretical considerations we have given in this manuscript, there are a lot of future directions starting from this initial point. Some of them are mandatory to enable an application in real biological experiments, others are modifications of the concepts for extended applications. Let us first address the essential steps needed to establish an HMM-based decoding in real experiments.

The HMM, as the core of the decoding system, is the most sensitive part of the concept. It is mandatory to run experiments to gather reliable data about all channels the concepts should be used for. From our point of view there is a lack of reliable data about insertion, deletion and substitution errors for possible channel models. For the sequencing application we assume that different platforms show a variety of sequencing channels, additionally affected by experimental parameters, e.g. the extent of PCR pre-processing of samples.

To obtain an optimally suited decoder, the HMM should be adapted to the considered channels. As most of the channels show a correlation of errors, more complex HMMs should be considered, reflecting a channel model with memory.

Finally, it might be possible to establish a self-adaptive algorithm to parametrize the HMMs without any prior knowledge about the ratio of errors in the channel. A suitable statistic and refined calibration steps would have to be developed.

Another important point related to estimating the error characteristics is the construction of watermark codes. Exact empirical parameters of the channel could be incorporated in the design of watermark codes to improve the decoding steps for special channels.
Further aspects that could be considered with the given concepts are the following. Aside from the synchronization aspect in this manuscript, it seems very promising to use the maximum likelihood decoding method for sequences other than watermark codes. Given good empirical parameters for an underlying HMM, one could consider a reliable detection of barcodes based on the Sequence-Levenshtein distance in a probabilistic way rather than based on sequence alignment. In the presented approach we focused on the discrimination of code words, assuming that codes are always present in the inspected sequences. The detection of code words within DNA context is another big issue that should be solved in future investigations using an HMM-based decoding. Recent research shows that even for sequencing approaches the detection of barcodes is quite challenging. The authors of [24] focus on a specific problem with certain setups on the PacBio SMRT platform, where, for technical reasons, barcodes are sporadically not present in the sequence data. Another interesting field of application of an HMM-based sequence detection could be clonal studies, where the sequenced genome may or may not contain a predefined sequence that was introduced in ancestor organisms.

Conclusion

We proposed an adaptation of the watermark concept of [20] for DNA barcoding. A generalized channel model for sequencing and suitable modifications of the decoder were defined. Moreover, we investigated a strategy to choose watermark sequences and inner codes in a reasonable way to enable barcoding in line with common experimental requirements. We provide a code construction that considers the best known linear codes as outer codes and biological sequence constraints to filter for suitable code words, resulting in an exemplary collection of 73 different code sets ranging from 12 to 24 nucleotides.
The codes are illustrated in a comprehensive scheme, highlighting watermark-specific parameters as well as the mean edit distance, to give an impression of how watermark-based barcodes can be characterized. For a reduced set of codes we finally evaluated the demultiplexing of sequences in a realistic simulation scenario. Within this in silico evaluation we could show that barcodes based on watermarks can theoretically be used for multiplexing. It is remarkable that even with very short watermark patterns we are able to reliably find the barcode boundaries in order to discriminate different code words with an HMM-based decoder. The probability of decoding errors, which finally lead to the undesired cross-talk phenomenon, was found to be very low. Other approaches that investigate barcodes with large (sequence) edit distance [16,19] show significantly higher code rates for shorter barcodes, but we have given an entirely different concept that allows for large-scale multiplexing approaches and is also able to handle insertion and deletion errors. Moreover, we provide marker-less synchronization based on watermarks to recover the barcode boundaries. This synchronization concept provides an ultimate degree of freedom for experimental sequencing setups as well as for future applications, also apart from the sequencing context.

Additional file

Additional file 1: We provide 73 files containing all the examined barcodes. Further information is given in a preamble: the watermark in decimal and nucleic acid notation, the mapping of the inner encoder, and the mapping from integers to nucleic acids $\mathbb{Z}_4 \to \{A, G, C, T\}$. We furthermore give the distance distribution of the barcode words. Finally, all barcodes are listed (plus their explicit construction via code concatenation). Dropped barcodes, which do not meet the sequence constraints, are explicitly indicated.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

ST initiated the work on a generalized channel model for the concepts in [20] and inspired an adaptation for DNA barcoding. DK accomplished the main developments on general methods, the decoder and the simulation framework.
DK was also assisted by the former master student Mahmoud Amarashi, who submitted a remarkable thesis. DK and ST wrote the manuscript. Both authors read and approved the final manuscript.

Acknowledgements

David Kracht is funded by the Deutsche Forschungsgemeinschaft (DFG) under the grant BO 867/30-1 in the priority program SPP. Steffen Schober is supported by the DFG grant SCHO 1576/1. The authors would like to thank Mahmoud Amarashi for fruitful discussions and assistance in preliminary considerations and managing preparative simulations.

Received: 7 May 2014 Accepted: 29 January 2015

References

1. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods. 2008;5(3).
2. Bystrykh LV. Generalized DNA barcode design based on Hamming codes. PLoS One. 2012;7(5).
3. Hamming RW. Error detecting and error correcting codes. Bell Syst Tech J. 1950;29.
4. Krishnan AR, Sweeney M, Vasic J, Galbraith DW, Vasic B. Barcodes for DNA sequencing with guaranteed error correction capability. Electron Lett. 2011;47(4).
5. Lin S, Costello DJ. Error control coding. Englewood Cliffs, New Jersey: Prentice-Hall.
6. Frank DN. Barcrawl and Bartab: software tools for the design and implementation of barcoded primers for highly multiplexed DNA sequencing. BMC Bioinformatics. 2009;10(1).
7. Mir K, Neuhaus K, Bossert M, Schober S. Short barcodes for next generation sequencing. PLoS One. 2013;8(12).
8. Gilles A, Meglécz E, Pech N, Ferreira S, Malausa T, Martin JF. Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics. 2011;12(1).
9. Carneiro MO, Russ C, Ross MG, Gabriel SB, Nusbaum C, DePristo MA. Pacific Biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics. 2012;13(1).
10. Bragg LM, Stone G, Butler MK, Hugenholtz P, Tyson GW. Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput Biol. 2013;9(4).
11. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, et al. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30(5).
12. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol. 2008;26(10).
13. Yang X, Chockalingam SP, Aluru S. A survey of error-correction methods for next-generation sequencing. Brief Bioinform. 2013;14(1).
14. Adey A, Morrison HG, Xun X, Kitzman JO, Turner EH, Stackhouse B, et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 2010;11(12).
15. Qiu F, Guo L, Wen TJ, Liu F, Ashlock DA, Schnable PS. DNA sequence-based bar codes for tracking the origins of expressed sequence tags from a maize cDNA library constructed using multiple mRNA sources. Plant Physiol. 2003;133(2).
16. Faircloth BC, Glenn TC. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One. 2012;7(8).
17. Ashlock D, Guo L, Qiu F. Greedy closure evolutionary algorithms. In: Computational intelligence, proceedings of the world on congress on, vol. 2. Piscataway: IEEE.
18. Ashlock D, Houghten SK. A novel variation operator for more rapid evolution of DNA error correcting codes. In: Computational intelligence in bioinformatics and computational biology, CIBCB 05. Proceedings of the 2005 IEEE symposium on. Piscataway: IEEE.
19. Buschmann T, Bystrykh LV. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics. 2013;14(1).
20. Davey MC, MacKay DJC. Reliable communication over channels with insertions, deletions, and substitutions. Inf Theory IEEE Trans. 2001;47(2).
21. Haughton D, Balado F. Biocode: Two biologically compatible algorithms for embedding data in non-coding and coding regions of DNA. BMC Bioinformatics. 2013;14(1).
22. Haughton D, Balado F. A modified watermark synchronisation code for robust embedding of data in DNA. In: Acoustics, speech and signal processing (ICASSP), 2013 IEEE international conference on. Piscataway: IEEE.
23. Kracht D, Schober S. Using the Davey-MacKay code construction for barcodes in DNA sequencing. In: Turbo codes and iterative information processing (ISTC), international symposium on. Piscataway: IEEE.
24. Buschmann T, Zhang R, Brash DE, Bystrykh LV. Enhancing the detection of barcoded reads in high throughput DNA sequencing data by controlling the false discovery rate. BMC Bioinformatics. 2014;15(1).
25. Jukes TH, Cantor CR. Evolution of protein molecules. Mamm Protein Metab. 1969;3.
26. Minoche AE, Dohm JC, Himmelbauer H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems. Genome Biol. 2011;12(11).
27. Rabiner L, Juang BH. An introduction to hidden Markov models. ASSP Mag IEEE. 1986;3(1):4-16.

28. Grassl M. Searching for linear codes with large minimum distance. In: Bosma W, Cannon J, editors. Discovering mathematics with Magma: reducing the abstract to the concrete. Algorithms and computation in mathematics, vol. 19. Heidelberg: Springer.
29. Briffa JA, Schaathun HG. Improvement of the Davey-MacKay construction. In: Information theory and its applications, ISITA international symposium on. Piscataway: IEEE.
30. Levenshtein VI. Binary codes capable of correcting deletions, insertions and reversals. Soviet Phys Doklady. 1966;10(8).
31. Forney GD. Concatenated codes, vol. 11. Cambridge: MIT Press.
32. MacWilliams FJ, Sloane NJA. The theory of error-correcting codes, vol. 16. Amsterdam, Netherlands: Elsevier.


More information

Pay-on-delivery investing

Pay-on-delivery investing Pay-on-deivery investing EVOLVE INVESTment range 1 EVOLVE INVESTMENT RANGE EVOLVE INVESTMENT RANGE 2 Picture a word where you ony pay a company once they have deivered Imagine striking oi first, before

More information

The guaranteed selection. For certainty in uncertain times

The guaranteed selection. For certainty in uncertain times The guaranteed seection For certainty in uncertain times Making the right investment choice If you can t afford to take a ot of risk with your money it can be hard to find the right investment, especiay

More information

Internal Control. Guidance for Directors on the Combined Code

Internal Control. Guidance for Directors on the Combined Code Interna Contro Guidance for Directors on the Combined Code ISBN 1 84152 010 1 Pubished by The Institute of Chartered Accountants in Engand & Waes Chartered Accountants Ha PO Box 433 Moorgate Pace London

More information

3.3 SOFTWARE RISK MANAGEMENT (SRM)

3.3 SOFTWARE RISK MANAGEMENT (SRM) 93 3.3 SOFTWARE RISK MANAGEMENT (SRM) Fig. 3.2 SRM is a process buit in five steps. The steps are: Identify Anayse Pan Track Resove The process is continuous in nature and handed dynamicay throughout ifecyce

More information

Key Features of Life Insurance

Key Features of Life Insurance Key Features of Life Insurance Life Insurance Key Features The Financia Conduct Authority is a financia services reguator. It requires us, Aviva, to give you this important information to hep you to decide

More information

Pricing and Revenue Sharing Strategies for Internet Service Providers

Pricing and Revenue Sharing Strategies for Internet Service Providers Pricing and Revenue Sharing Strategies for Internet Service Providers Linhai He and Jean Warand Department of Eectrica Engineering and Computer Sciences University of Caifornia at Berkeey {inhai,wr}@eecs.berkeey.edu

More information

WHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization

WHITE PAPER BEsT PRAcTIcEs: PusHIng ExcEl BEyond ITs limits WITH InfoRmATIon optimization Best Practices: Pushing Exce Beyond Its Limits with Information Optimization WHITE Best Practices: Pushing Exce Beyond Its Limits with Information Optimization Executive Overview Microsoft Exce is the

More information

NCH Software FlexiServer

NCH Software FlexiServer NCH Software FexiServer This user guide has been created for use with FexiServer Version 1.xx NCH Software Technica Support If you have difficuties using FexiServer pease read the appicabe topic before

More information

(12) Patent Application Publication (10) Pub. N0.: US 2006/0105797 A1 Marsan et al. (43) Pub. Date: May 18, 2006

(12) Patent Application Publication (10) Pub. N0.: US 2006/0105797 A1 Marsan et al. (43) Pub. Date: May 18, 2006 (19) United States US 20060105797A (12) Patent Appication Pubication (10) Pub. N0.: US 2006/0105797 A1 Marsan et a. (43) Pub. Date: (54) METHOD AND APPARATUS FOR (52) US. C...... 455/522 ADJUSTING A MOBILE

More information

Network/Communicational Vulnerability

Network/Communicational Vulnerability Automated teer machines (ATMs) are a part of most of our ives. The major appea of these machines is convenience The ATM environment is changing and that change has serious ramifications for the security

More information

Virtual trunk simulation

Virtual trunk simulation Virtua trunk simuation Samui Aato * Laboratory of Teecommunications Technoogy Hesinki University of Technoogy Sivia Giordano Laboratoire de Reseaux de Communication Ecoe Poytechnique Federae de Lausanne

More information

SNMP Reference Guide for Avaya Communication Manager

SNMP Reference Guide for Avaya Communication Manager SNMP Reference Guide for Avaya Communication Manager 03-602013 Issue 1.0 Feburary 2007 2006 Avaya Inc. A Rights Reserved. Notice Whie reasonabe efforts were made to ensure that the information in this

More information

Lecture 7 Datalink Ethernet, Home. Datalink Layer Architectures

Lecture 7 Datalink Ethernet, Home. Datalink Layer Architectures Lecture 7 Dataink Ethernet, Home Peter Steenkiste Schoo of Computer Science Department of Eectrica and Computer Engineering Carnegie Meon University 15-441 Networking, Spring 2004 http://www.cs.cmu.edu/~prs/15-441

More information

Oligopoly in Insurance Markets

Oligopoly in Insurance Markets Oigopoy in Insurance Markets June 3, 2008 Abstract We consider an oigopoistic insurance market with individuas who differ in their degrees of accident probabiities. Insurers compete in coverage and premium.

More information

WEBSITE ACCOUNT USER GUIDE SECURITY, PASSWORD & CONTACTS

WEBSITE ACCOUNT USER GUIDE SECURITY, PASSWORD & CONTACTS WEBSITE ACCOUNT USER GUIDE SECURITY, PASSWORD & CONTACTS Password Reset Process Navigate to the og in screen Seect the Forgot Password ink You wi be asked to enter the emai address you registered with

More information

3.5 Pendulum period. 2009-02-10 19:40:05 UTC / rev 4d4a39156f1e. g = 4π2 l T 2. g = 4π2 x1 m 4 s 2 = π 2 m s 2. 3.5 Pendulum period 68

3.5 Pendulum period. 2009-02-10 19:40:05 UTC / rev 4d4a39156f1e. g = 4π2 l T 2. g = 4π2 x1 m 4 s 2 = π 2 m s 2. 3.5 Pendulum period 68 68 68 3.5 Penduum period 68 3.5 Penduum period Is it coincidence that g, in units of meters per second squared, is 9.8, very cose to 2 9.87? Their proximity suggests a connection. Indeed, they are connected

More information

Art of Java Web Development By Neal Ford 624 pages US$44.95 Manning Publications, 2004 ISBN: 1-932394-06-0

Art of Java Web Development By Neal Ford 624 pages US$44.95 Manning Publications, 2004 ISBN: 1-932394-06-0 IEEE DISTRIBUTED SYSTEMS ONLINE 1541-4922 2005 Pubished by the IEEE Computer Society Vo. 6, No. 5; May 2005 Editor: Marcin Paprzycki, http://www.cs.okstate.edu/%7emarcin/ Book Reviews: Java Toos and Frameworks

More information

The Comparison and Selection of Programming Languages for High Energy Physics Applications

The Comparison and Selection of Programming Languages for High Energy Physics Applications The Comparison and Seection of Programming Languages for High Energy Physics Appications TN-91-6 June 1991 (TN) Bebo White Stanford Linear Acceerator Center P.O. Box 4349, Bin 97 Stanford, Caifornia 94309

More information

Avaya Remote Feature Activation (RFA) User Guide

Avaya Remote Feature Activation (RFA) User Guide Avaya Remote Feature Activation (RFA) User Guide 03-300149 Issue 5.0 September 2007 2007 Avaya Inc. A Rights Reserved. Notice Whie reasonabe efforts were made to ensure that the information in this document

More information

CONTRIBUTION OF INTERNAL AUDITING IN THE VALUE OF A NURSING UNIT WITHIN THREE YEARS

CONTRIBUTION OF INTERNAL AUDITING IN THE VALUE OF A NURSING UNIT WITHIN THREE YEARS Dehi Business Review X Vo. 4, No. 2, Juy - December 2003 CONTRIBUTION OF INTERNAL AUDITING IN THE VALUE OF A NURSING UNIT WITHIN THREE YEARS John N.. Var arvatsouakis atsouakis DURING the present time,

More information

Measuring operational risk in financial institutions

Measuring operational risk in financial institutions Measuring operationa risk in financia institutions Operationa risk is now seen as a major risk for financia institutions. This paper considers the various methods avaiabe to measure operationa risk, and

More information

Breakeven analysis and short-term decision making

Breakeven analysis and short-term decision making Chapter 20 Breakeven anaysis and short-term decision making REAL WORLD CASE This case study shows a typica situation in which management accounting can be hepfu. Read the case study now but ony attempt

More information

Hybrid Process Algebra

Hybrid Process Algebra Hybrid Process Agebra P.J.L. Cuijpers M.A. Reniers Eindhoven University of Technoogy (TU/e) Den Doech 2 5600 MB Eindhoven, The Netherands Abstract We deveop an agebraic theory, caed hybrid process agebra

More information

Maintenance activities planning and grouping for complex structure systems

Maintenance activities planning and grouping for complex structure systems Maintenance activities panning and grouping for compex structure systems Hai Canh u, Phuc Do an, Anne Barros, Christophe Berenguer To cite this version: Hai Canh u, Phuc Do an, Anne Barros, Christophe

More information

Vendor Performance Measurement Using Fuzzy Logic Controller

Vendor Performance Measurement Using Fuzzy Logic Controller The Journa of Mathematics and Computer Science Avaiabe onine at http://www.tjmcs.com The Journa of Mathematics and Computer Science Vo.2 No.2 (2011) 311-318 Performance Measurement Using Fuzzy Logic Controer

More information

Income Protection Options

Income Protection Options Income Protection Options Poicy Conditions Introduction These poicy conditions are written confirmation of your contract with Aviva Life & Pensions UK Limited. It is important that you read them carefuy

More information

Pricing Internet Services With Multiple Providers

Pricing Internet Services With Multiple Providers Pricing Internet Services With Mutipe Providers Linhai He and Jean Warand Dept. of Eectrica Engineering and Computer Science University of Caifornia at Berkeey Berkeey, CA 94709 inhai, wr@eecs.berkeey.edu

More information

Risk Margin for a Non-Life Insurance Run-Off

Risk Margin for a Non-Life Insurance Run-Off Risk Margin for a Non-Life Insurance Run-Off Mario V. Wüthrich, Pau Embrechts, Andreas Tsanakas August 15, 2011 Abstract For sovency purposes insurance companies need to cacuate so-caed best-estimate reserves

More information

Learning from evaluations Processes and instruments used by GIZ as a learning organisation and their contribution to interorganisational learning

Learning from evaluations Processes and instruments used by GIZ as a learning organisation and their contribution to interorganisational learning Monitoring and Evauation Unit Learning from evauations Processes and instruments used by GIZ as a earning organisation and their contribution to interorganisationa earning Contents 1.3Learning from evauations

More information

VALUE TRANSFER OF PENSION RIGHTS IN THE NETHERLANDS. June 2004 - publication no. 8A/04

VALUE TRANSFER OF PENSION RIGHTS IN THE NETHERLANDS. June 2004 - publication no. 8A/04 STICHTING VAN DE ARBEID REVISION VALUE TRANSFER OF PENSION RIGHTS IN THE NETHERLANDS June 2004 - pubication no. 8A/04 Vaue transfer of pension rights in the Netherands 1. Introduction The opportunity to

More information

SPOTLIGHT. A year of transformation

SPOTLIGHT. A year of transformation WINTER ISSUE 2014 2015 SPOTLIGHT Wecome to the winter issue of Oasis Spotight. These newsetters are designed to keep you upto-date with news about the Oasis community. This quartery issue features an artice

More information

Let s get usable! Usability studies for indexes. Susan C. Olason. Study plan

Let s get usable! Usability studies for indexes. Susan C. Olason. Study plan Let s get usabe! Usabiity studies for indexes Susan C. Oason The artice discusses a series of usabiity studies on indexes from a systems engineering and human factors perspective. The purpose of these

More information

A Description of the California Partnership for Long-Term Care Prepared by the California Department of Health Care Services

A Description of the California Partnership for Long-Term Care Prepared by the California Department of Health Care Services 2012 Before You Buy A Description of the Caifornia Partnership for Long-Term Care Prepared by the Caifornia Department of Heath Care Services Page 1 of 13 Ony ong-term care insurance poicies bearing any

More information

An Idiot s guide to Support vector machines (SVMs)

An Idiot s guide to Support vector machines (SVMs) An Idiot s guide to Support vector machines (SVMs) R. Berwick, Viage Idiot SVMs: A New Generation of Learning Agorithms Pre 1980: Amost a earning methods earned inear decision surfaces. Linear earning

More information

Fixed income managers: evolution or revolution

Fixed income managers: evolution or revolution Fixed income managers: evoution or revoution Traditiona approaches to managing fixed interest funds rey on benchmarks that may not represent optima risk and return outcomes. New techniques based on separate

More information

Multi-Robot Task Scheduling

Multi-Robot Task Scheduling Proc of IEEE Internationa Conference on Robotics and Automation, Karsruhe, Germany, 013 Muti-Robot Tas Scheduing Yu Zhang and Lynne E Parer Abstract The scheduing probem has been studied extensivey in

More information

Bite-Size Steps to ITIL Success

Bite-Size Steps to ITIL Success 7 Bite-Size Steps to ITIL Success Pus making a Business Case for ITIL! Do you want to impement ITIL but don t know where to start? 7 Bite-Size Steps to ITIL Success can hep you to decide whether ITIL can

More information

Risk Margin for a Non-Life Insurance Run-Off

Risk Margin for a Non-Life Insurance Run-Off Risk Margin for a Non-Life Insurance Run-Off Mario V. Wüthrich, Pau Embrechts, Andreas Tsanakas February 2, 2011 Abstract For sovency purposes insurance companies need to cacuate so-caed best-estimate

More information

CLOUD service providers manage an enterprise-class

CLOUD service providers manage an enterprise-class IEEE TRANSACTIONS ON XXXXXX, VOL X, NO X, XXXX 201X 1 Oruta: Privacy-Preserving Pubic Auditing for Shared Data in the Coud Boyang Wang, Baochun Li, Member, IEEE, and Hui Li, Member, IEEE Abstract With

More information

A quantum model for the stock market

A quantum model for the stock market A quantum mode for the stock market Authors: Chao Zhang a,, Lu Huang b Affiiations: a Schoo of Physics and Engineering, Sun Yat-sen University, Guangzhou 5175, China b Schoo of Economics and Business Administration,

More information

Betting on the Real Line

Betting on the Real Line Betting on the Rea Line Xi Gao 1, Yiing Chen 1,, and David M. Pennock 2 1 Harvard University, {xagao,yiing}@eecs.harvard.edu 2 Yahoo! Research, pennockd@yahoo-inc.com Abstract. We study the probem of designing

More information

Estimation of Liabilities Due to Inactive Hazardous Waste Sites. by Raja Bhagavatula, Brian Brown, and Kevin Murphy

Estimation of Liabilities Due to Inactive Hazardous Waste Sites. by Raja Bhagavatula, Brian Brown, and Kevin Murphy Estimation of Liabiities Due to Inactive Hazardous Waste Sites by Raja Bhagavatua, Brian Brown, and Kevin Murphy ESTIMATION OF LIABILITIES DUE TO INACTIVE HAZARDOUS WASTE SITJZS Abstract: The potentia

More information

NCH Software BroadCam Video Streaming Server

NCH Software BroadCam Video Streaming Server NCH Software BroadCam Video Streaming Server This user guide has been created for use with BroadCam Video Streaming Server Version 2.xx NCH Software Technica Support If you have difficuties using BroadCam

More information

LT Codes-based Secure and Reliable Cloud Storage Service

LT Codes-based Secure and Reliable Cloud Storage Service 2012 Proceedings IEEE INFOCOM LT Codes-based Secure and Reiabe Coud Storage Service Ning Cao Shucheng Yu Zhenyu Yang Wenjing Lou Y. Thomas Hou Worcester Poytechnic Institute, Worcester, MA, USA University

More information

Scheduling in Multi-Channel Wireless Networks

Scheduling in Multi-Channel Wireless Networks Scheduing in Muti-Channe Wireess Networks Vartika Bhandari and Nitin H. Vaidya University of Iinois at Urbana-Champaign, USA vartikab@acm.org, nhv@iinois.edu Abstract. The avaiabiity of mutipe orthogona

More information

SABRe B2.1: Design & Development. Supplier Briefing Pack.

SABRe B2.1: Design & Development. Supplier Briefing Pack. SABRe B2.1: Design & Deveopment. Suppier Briefing Pack. 2013 Ros-Royce pc The information in this document is the property of Ros-Royce pc and may not be copied or communicated to a third party, or used

More information

A Similarity Search Scheme over Encrypted Cloud Images based on Secure Transformation

A Similarity Search Scheme over Encrypted Cloud Images based on Secure Transformation A Simiarity Search Scheme over Encrypted Coud Images based on Secure Transormation Zhihua Xia, Yi Zhu, Xingming Sun, and Jin Wang Jiangsu Engineering Center o Network Monitoring, Nanjing University o Inormation

More information

GWPD 4 Measuring water levels by use of an electric tape

GWPD 4 Measuring water levels by use of an electric tape GWPD 4 Measuring water eves by use of an eectric tape VERSION: 2010.1 PURPOSE: To measure the depth to the water surface beow and-surface datum using the eectric tape method. Materias and Instruments 1.

More information

Chapter 1 Structural Mechanics

Chapter 1 Structural Mechanics Chapter Structura echanics Introduction There are many different types of structures a around us. Each structure has a specific purpose or function. Some structures are simpe, whie others are compex; however

More information

APPENDIX 10.1: SUBSTANTIVE AUDIT PROGRAMME FOR PRODUCTION WAGES: TROSTON PLC

APPENDIX 10.1: SUBSTANTIVE AUDIT PROGRAMME FOR PRODUCTION WAGES: TROSTON PLC Appendix 10.1: substantive audit programme for production wages: Troston pc 389 APPENDIX 10.1: SUBSTANTIVE AUDIT PROGRAMME FOR PRODUCTION WAGES: TROSTON PLC The detaied audit programme production wages

More information

FRAME BASED TEXTURE CLASSIFICATION BY CONSIDERING VARIOUS SPATIAL NEIGHBORHOODS. Karl Skretting and John Håkon Husøy

FRAME BASED TEXTURE CLASSIFICATION BY CONSIDERING VARIOUS SPATIAL NEIGHBORHOODS. Karl Skretting and John Håkon Husøy FRAME BASED TEXTURE CLASSIFICATION BY CONSIDERING VARIOUS SPATIAL NEIGHBORHOODS Kar Skretting and John Håkon Husøy University of Stavanger, Department of Eectrica and Computer Engineering N-4036 Stavanger,

More information

Market Design & Analysis for a P2P Backup System

Market Design & Analysis for a P2P Backup System Market Design & Anaysis for a P2P Backup System Sven Seuken Schoo of Engineering & Appied Sciences Harvard University, Cambridge, MA seuken@eecs.harvard.edu Denis Chares, Max Chickering, Sidd Puri Microsoft

More information

Design and Analysis of a Hidden Peer-to-peer Backup Market

Design and Analysis of a Hidden Peer-to-peer Backup Market Design and Anaysis of a Hidden Peer-to-peer Backup Market Sven Seuken, Denis Chares, Max Chickering, Mary Czerwinski Kama Jain, David C. Parkes, Sidd Puri, and Desney Tan December, 2015 Abstract We present

More information

medical injury a claimant s guide

medical injury a claimant s guide medica injury a caimant s guide The ega caims process is not a simpe one and cinica negigence caims are amongst the most compex. We have over three decades of experience representing patients and recovering

More information

Simultaneous Routing and Power Allocation in CDMA Wireless Data Networks

Simultaneous Routing and Power Allocation in CDMA Wireless Data Networks Simutaneous Routing and Power Aocation in CDMA Wireess Data Networks Mikae Johansson *,LinXiao and Stephen Boyd * Department of Signas, Sensors and Systems Roya Institute of Technoogy, SE 00 Stockhom,

More information

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES. l l

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES. l l ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES The Eectronic Fund Transfers we are capabe of handing for consumers are indicated beow some of which may not appy your account Some of these may

More information

Traffic classification-based spam filter

Traffic classification-based spam filter Traffic cassification-based spam fiter Ni Zhang 1,2, Yu Jiang 3, Binxing Fang 1, Xueqi Cheng 1, Li Guo 1 1 Software Division, Institute of Computing Technoogy, Chinese Academy of Sciences, 100080, Beijing,

More information

Leakage detection in water pipe networks using a Bayesian probabilistic framework

Leakage detection in water pipe networks using a Bayesian probabilistic framework Probabiistic Engineering Mechanics 18 (2003) 315 327 www.esevier.com/ocate/probengmech Leakage detection in water pipe networks using a Bayesian probabiistic framework Z. Pouakis, D. Vaougeorgis, C. Papadimitriou*

More information

AA Fixed Rate ISA Savings

AA Fixed Rate ISA Savings AA Fixed Rate ISA Savings For the road ahead The Financia Services Authority is the independent financia services reguator. It requires us to give you this important information to hep you to decide whether

More information

Conference Paper Service Organizations: Customer Contact and Incentives of Knowledge Managers

Conference Paper Service Organizations: Customer Contact and Incentives of Knowledge Managers econstor www.econstor.eu Der Open-Access-Pubikationsserver der ZBW Leibniz-Informationszentrum Wirtschaft The Open Access Pubication Server of the ZBW Leibniz Information Centre for Economics Kirchmaier,

More information

Figure 1. A Simple Centrifugal Speed Governor.

Figure 1. A Simple Centrifugal Speed Governor. ENGINE SPEED CONTROL Peter Westead and Mark Readman, contro systems principes.co.uk ABSTRACT: This is one of a series of white papers on systems modeing, anaysis and contro, prepared by Contro Systems

More information

A short guide to making a medical negligence claim

A short guide to making a medical negligence claim A short guide to making a medica negigence caim Introduction Suffering from an incident of medica negigence is traumatic and can have a serious ong-term impact on both the physica and menta heath of affected

More information

Comparison of Traditional and Open-Access Appointment Scheduling for Exponentially Distributed Service Time

Comparison of Traditional and Open-Access Appointment Scheduling for Exponentially Distributed Service Time Journa of Heathcare Engineering Vo. 6 No. 3 Page 34 376 34 Comparison of Traditiona and Open-Access Appointment Scheduing for Exponentiay Distributed Service Chongjun Yan, PhD; Jiafu Tang *, PhD; Bowen

More information

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES About ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES The Eectronic Fund Transfers we are capabe of handing for consumers are indicated beow, some of which may not appy your account. Some of

More information

Introduction the pressure for efficiency the Estates opportunity

Introduction the pressure for efficiency the Estates opportunity Heathy Savings? A study of the proportion of NHS Trusts with an in-house Buidings Repair and Maintenance workforce, and a discussion of eary experiences of Suppies efficiency initiatives Management Summary

More information

READING A CREDIT REPORT

READING A CREDIT REPORT Name Date CHAPTER 6 STUDENT ACTIVITY SHEET READING A CREDIT REPORT Review the sampe credit report. Then search for a sampe credit report onine, print it off, and answer the questions beow. This activity

More information

On Capacity Scaling in Arbitrary Wireless Networks

On Capacity Scaling in Arbitrary Wireless Networks On Capacity Scaing in Arbitrary Wireess Networks Urs Niesen, Piyush Gupta, and Devavrat Shah 1 Abstract arxiv:07112745v3 [csit] 3 Aug 2009 In recent work, Özgür, Lévêque, and Tse 2007) obtained a compete

More information

This paper considers an inventory system with an assembly structure. In addition to uncertain customer

This paper considers an inventory system with an assembly structure. In addition to uncertain customer MANAGEMENT SCIENCE Vo. 51, No. 8, August 2005, pp. 1250 1265 issn 0025-1909 eissn 1526-5501 05 5108 1250 informs doi 10.1287/mnsc.1050.0394 2005 INFORMS Inventory Management for an Assemby System wh Product

More information

INDUSTRIAL AND COMMERCIAL

INDUSTRIAL AND COMMERCIAL Finance TM NEW YORK CITY DEPARTMENT OF FINANCE TAX & PARKING PROGRAM OPERATIONS DIVISION INDUSTRIAL AND COMMERCIAL ABATEMENT PROGRAM PRELIMINARY APPLICATION AND INSTRUCTIONS Mai to: NYC Department of Finance,

More information

l l ll l l Exploding the Myths about DETC Accreditation A Primer for Students

l l ll l l Exploding the Myths about DETC Accreditation A Primer for Students Expoding the Myths about DETC Accreditation A Primer for Students Distance Education and Training Counci Expoding the Myths about DETC Accreditation: A Primer for Students Prospective distance education

More information

The Whys of the LOIS: Credit Risk and Refinancing Rate Volatility

The Whys of the LOIS: Credit Risk and Refinancing Rate Volatility The Whys of the LOIS: Credit Risk and Refinancing Rate Voatiity Stéphane Crépey 1, and Raphaë Douady 2 1 Laboratoire Anayse et Probabiités Université d Évry Va d Essonne 9137 Évry, France 2 Centre d économie

More information

Undergraduate Studies in. Education and International Development

Undergraduate Studies in. Education and International Development Undergraduate Studies in Education and Internationa Deveopment Wecome Wecome to the Schoo of Education and Lifeong Learning at Aberystwyth University. Over 100 years ago, Aberystwyth was the first university

More information

NCH Software Copper Point of Sale Software

NCH Software Copper Point of Sale Software NCH Software Copper Point of Sae Software This user guide has been created for use with Copper Point of Sae Software Version 1.xx NCH Software Technica Support If you have difficuties using Copper Point

More information

COMPARISON OF DIFFUSION MODELS IN ASTRONOMICAL OBJECT LOCALIZATION

COMPARISON OF DIFFUSION MODELS IN ASTRONOMICAL OBJECT LOCALIZATION COMPARISON OF DIFFUSION MODELS IN ASTRONOMICAL OBJECT LOCALIZATION Františe Mojžíš Department of Computing and Contro Engineering, ICT Prague, Technicá, 8 Prague frantise.mojzis@vscht.cz Abstract This

More information

The Web Insider... The Best Tool for Building a Web Site *

The Web Insider... The Best Tool for Building a Web Site * The Web Insider... The Best Too for Buiding a Web Site * Anna Bee Leiserson ** Ms. Leiserson describes the types of Web-authoring systems that are avaiabe for buiding a site and then discusses the various

More information

Assessing Network Vulnerability Under Probabilistic Region Failure Model

Assessing Network Vulnerability Under Probabilistic Region Failure Model 2011 IEEE 12th Internationa Conference on High Performance Switching and Routing Assessing Networ Vunerabiity Under Probabiistic Region Faiure Mode Xiaoiang Wang, Xiaohong Jiang and Achie Pattavina State

More information

CERTIFICATE COURSE ON CLIMATE CHANGE AND SUSTAINABILITY. Course Offered By: Indian Environmental Society

CERTIFICATE COURSE ON CLIMATE CHANGE AND SUSTAINABILITY. Course Offered By: Indian Environmental Society CERTIFICATE COURSE ON CLIMATE CHANGE AND SUSTAINABILITY Course Offered By: Indian Environmenta Society INTRODUCTION The Indian Environmenta Society (IES) a dynamic and fexibe organization with a goba vision

More information

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES. l l

ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES. l l ELECTRONIC FUND TRANSFERS YOUR RIGHTS AND RESPONSIBILITIES The Eectronic Fund Transfers we are capabe of handing for consumers are indicated beow some of which may not appy your account Some of these may

More information

ONE of the most challenging problems addressed by the

ONE of the most challenging problems addressed by the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 44, NO. 9, SEPTEMBER 2006 2587 A Mutieve Context-Based System for Cassification of Very High Spatia Resoution Images Lorenzo Bruzzone, Senior Member,

More information

The eg Suite Enabing Rea-Time Monitoring and Proactive Infrastructure Triage White Paper Restricted Rights Legend The information contained in this document is confidentia and subject to change without

More information

Journal of Economic Behavior & Organization

Journal of Economic Behavior & Organization Journa of Economic Behavior & Organization 85 (23 79 96 Contents ists avaiabe at SciVerse ScienceDirect Journa of Economic Behavior & Organization j ourna ho me pag e: www.esevier.com/ocate/j ebo Heath

More information