Genomics 92 (2008) Contents lists available at ScienceDirect. Genomics. journal homepage:

Size: px
Start display at page:

Download "Genomics 92 (2008) 255 264. Contents lists available at ScienceDirect. Genomics. journal homepage: www.elsevier.com/locate/ygeno"

Transcription

1 Genomics 92 (2008) Contents lists available at ScienceDirect Genomics journal homepage: Review Applications of next-generation sequencing technologies in functional genomics Olena Morozova, Marco A. Marra BC Cancer Agency Genome Sciences Centre, Suite 100, 570 West 7th Avenue, Vancouver, BC V5Z 4S6, Canada article info abstract Article history: Received 3 April 2008 Accepted 9 July 2008 Available online 24 August 2008 Keywords: Illumina/Solexa ABI/SOLiD 454/Roche Functional genomics Next-generation sequencing technology Transcriptome Epigenome Deep sequencing Sequencing by synthesis Sequencing by ligation A new generation of sequencing technologies, from Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos, has provided unprecedented opportunities for high-throughput functional genomic research. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing, discovery of transcription factor binding sites, and noncoding RNA expression profiling. This review discusses applications of next-generation sequencing technologies in functional genomics research and highlights the transforming potential these technologies offer Elsevier Inc. All rights reserved. Contents Advances in DNA sequencing technologies Sanger sequencing sequencing technology: pyrosequencing in high-density picoliter reactors Illumina: sequencing by synthesis of single-molecule arrays with reversible terminators ABI/SOLiD: massively parallel sequencing by ligation Making use of next-generation sequencer data format: pains and gains of plentiful short reads Sequence census applications Read pairs and read accuracy issues Transcriptome sequencing by next-generation technologies Gene expression profiling using novel and revisited sequence census methods Small noncoding RNA profiling and the discovery of novel small RNA genes Protein coding gene annotation using transcriptome sequence data Detection of aberrant transcription events Applications of next-generation sequencing for the analysis of epigenetic modifications of histones and DNA DNA methylation profiling by bisulfite DNA sequencing Sequence census applications for mapping histone modifications and the locations of DNA-binding proteins Applications of next-generation sequencers to the study of DNA accessibility and chromatin structure Concluding remarks Acknowledgments References Corresponding author. Fax: address: mmarra@bcgsc.ca (M.A. Marra) /$ see front matter 2008 Elsevier Inc. All rights reserved. doi: /j.ygeno

2 256 O. Morozova, M.A. Marra / Genomics 92 (2008) Since first introduced to the market in 2005, next-generation sequencing technologies have had a tremendous impact on genomic research. The next-generation technologies have been used for standard sequencing applications, such as genome sequencing and resequencing, and for novel applications previously unexplored by Sanger sequencing. In this review we first describe the three commercially available next-generation sequencing technologies in comparison to a state-of-the-art Sanger sequencer, and follow this with a discussion of the novel kind of data produced by nextgeneration sequencers and the issues associated with it. We then turn our attention to the application of next-generation sequencing technologies to functional genomics research, particularly focusing on transcriptomics and epigenomics. We end with a discussion of future prospects that next-generation technologies hold for functional genomics research. This review does not address genome sequencing and resequencing applications of next-generation sequencers [1] or the huge impact these technologies have had in metagenomics (reviewed in [2]). Advances in DNA sequencing technologies The landmark publications of the late 1970 s by Sanger's and Gilbert's groups [3,4] and notably the development of the chain termination method by Sanger and colleagues [5] established the groundwork for decades of sequence-driven research that followed. The chain-termination method published in 1977 [5], also commonly referred to as Sanger or dideoxy sequencing, has remained the most commonly used DNA sequencing technique to date and was used to complete human genome sequencing initiatives led by the International Human Genome Sequencing Consortium and Celera Genomics [6 8]. Very recently, the Sanger method has been partially supplanted by several next-generation sequencing technologies that offer dramatic increases in cost-effective sequence throughput, albeit at the expense of read lengths. The next-generation technologies commercially available today include the 454 GS20 pyrosequencingbased instrument (Roche Applied Science), the Solexa 1G analyzer (Illumina, Inc.), the SOLiD instrument from Applied Biosystems, and the Heliscope from Helicos, Inc. As of this writing, information on the performance of the Heliscope in functional genomics applications is lacking, and so we have restricted our comments to the 454, SOLiD, and 1G sequencing platforms. These new technologies as well as the current state-of the-art Sanger sequencing platform are summarized in Table 1 and discussed in some detail below. For a review of the history of DNA sequencing the reader is referred to [9]; for a more comprehensive review of emerging sequencing technologies see [10]. Table 1 Advances in DNA sequencing technologies Technology Approach Read length Bp per run Company name and Web site Automated Sanger sequencer ABI3730xl 454/Roche FLX system Illumina/ Solexa ABI/SOLiD Synthesis in the presence of dye terminators Pyrosequencing on solid support Sequencing by synthesis with reversible terminators Massively parallel sequencing by ligation Up to 900 bp 96 kb Applied Biosystems www. appliedbyosystems. com bp Mb Roche Applied Science bp 1 Gb Illumina, Inc bp 1 3 Gb Applied Biosystems www. appliedbyosystems. com Sanger sequencing Since its initial report in 1977, the Sanger sequencing method has remained conceptually unchanged. The method is based on the DNA polymerase-dependent synthesis of a complementary DNA strand in the presence of natural 2 -deoxynucleotides (dntps) and 2,3 - dideoxynucleotides (ddntps) that serve as nonreversible synthesis terminators [5]. The DNA synthesis reaction is randomly terminated whenever a ddntp is added to the growing oligonucleotide chain, resulting in truncated products of varying lengths with an appropriate ddntp at their 3 terminus. The products are separated by size using polyacrylamide gel electrophoresis and the terminal ddntps are used to reveal the DNA sequence of the template strand. Originally, four different reactions were required per template, each reaction containing a different ddntp terminator, ddatp, ddctp, ddttp, or ddgtp. However, advances in fluorescence detection have allowed for combining the four terminators into one reaction by having them labeled with fluorescent dyes of different colors [11,12]. Subsequent advances have replaced the original slab gel electrophoresis with capillary gel electrophoresis, thereby enabling much higher electric fields to be applied to the separation matrix. One effect of this advance was to enhance the rate at which fragments could be separated [13]. The overall throughput of capillary electrophoresis was further increased by the advent of capillary arrays whereby many samples could be analyzed in parallel [14]. In addition, breakthroughs in polymer biochemistry, including the development of linear polyacrylamide [15] and polydimethylacrylamide [16] have allowed the reuse of capillaries in multiple electrophoretic runs, thus further increasing sequencing efficiency. For further reading on improvements in Sanger sequencing research the reader is referred to [10] and [17]. These and many other advances in sequencing technology contributed to the relatively low error rate, long read length, and robust characteristics of modern Sanger sequencers. For instance, a commonly used automated high-throughput Sanger sequencing instrument from Applied Biosystems, the ABI 3730xl, has a 96- capillary array format and is capable of producing 900 or more PHRED 20 [18] bp per read for a total of up to 96 kb for a 3-h run (Table 1). However, despite the many advances in chemistries and the robust performance of instruments like the 3730xl, the application of relatively expensive Sanger sequencing to large sequencing projects has remained beyond the means of the typical grant-funded investigator. This is a limitation that has been apparently successfully addressed, to varying degrees, by all of the latest technology offerings. 454 sequencing technology: pyrosequencing in high-density picoliter reactors An inherent limitation of Sanger sequencing is the requirement of in vivo amplification of DNA fragments that are to be sequenced, which is usually achieved by cloning into bacterial hosts. The cloning step is prone to host-related biases, is lengthy, and is quite labor intensive [2]. The 454 technology [19], the first next-generation sequencing technology released to the market, circumvents the cloning requirement by taking advantage of a highly efficient in vitro DNA amplification method known as emulsion PCR [20]. In emulsion PCR, individual DNA fragment-carrying streptavidin beads, obtained through shearing the DNA and attaching the fragments to the beads using adapters, are captured into separate emulsion droplets. The droplets act as individual amplification reactors, producing 10 7 clonal copies of a unique DNA template per bead [19]. Each template-containing bead is subsequently transferred into a well of a picotiter plate and the clonally related templates are analyzed using a pyrosequencing reaction. The use of the picotiter plate allows hundreds of thousands of pyrosequencing reactions to be carried out in parallel, massively increasing the sequencing throughput [19]. The

3 O. Morozova, M.A. Marra / Genomics 92 (2008) pyrosequencing approach [21,22] is a sequencing-by-synthesis technique that measures the release of inorganic pyrophosphate (PP i )by chemiluminescence. The template DNA is immobilized, and solutions of dntps are added one at a time; the release of PP i, whenever the complementary nucleotide is incorporated, is detectable by light produced by a chemiluminescent enzyme present in the reaction mix. The sequence of DNA template is determined from a pyrogram, which corresponds to the order of correct nucleotides that had been incorporated. Since chemiluminescent signal intensity is proportional to the amount of pyrophosphate released and hence the number of bases incorporated, the pyrosequencing approach is prone to errors that result from incorrectly estimating the length of homopolymeric sequence stretches (i.e., indels). The current state-of-the-art 454 platform marketed by Roche Applied Science is capable of generating Mb of sequence in 200- to 300-bp reads in a 4-h run. The 454 technology has been the most widely published next-generation technology, having so far been featured in more than 100 research publications (Roche Applied Sciences). Illumina: sequencing by synthesis of single-molecule arrays with reversible terminators The Illumina/Solexa approach [23 25] achieves cloning-free DNA amplification by attaching single-stranded DNA fragments to a solid surface known as a single-molecule array, or flow cell, and conducting solid-phase bridge amplification of single-molecule DNA templates (Illumina, Inc.). In this process, one end of single DNA molecule is attached to a solid surface using an adapter; the molecules subsequently bend over and hybridize to complementary adapters (creating the bridge ), thereby forming the template for the synthesis of their complementary strands. After the amplification step, a flow cell with more than 40 million clusters is produced, wherein each cluster is composed of approximately 1000 clonal copies of a single template molecule. The templates are sequenced in a massively parallel fashion using a DNA sequencing-by-synthesis approach that employs reversible terminators with removable fluorescent moieties and special DNA polymerases that can incorporate these terminators into growing oligonucleotide chains. The terminators are labeled with fluors of four different colors to distinguish among the different bases at the given sequence position and the template sequence of each cluster is deduced by reading off the color at each successive nucleotide addition step. Although the Illumina approach is more effective at sequencing homopolymeric stretches than pyrosequencing, it produces shorter sequence reads [25] and hence cannot resolve short sequence repeats. In addition, due to the use of modified DNA polymerases and reversible terminators, substitution errors have been noted in Illumina sequencing data [9]. Typically, the 1G genome analyzer from Illumina, Inc., is capable of generating 35-bp reads and producing at least 1 Gb of sequence per run in 2 3 days. ABI/SOLiD: massively parallel sequencing by ligation Massively parallel sequencing by hybridization ligation, implemented in the supported oligonucleotide ligation and detection system (SOLiD) from Applied Biosystems, has recently become available. The ligation chemistry used in SOLiD is based on the polony sequencing technique that was published in the same year as the 454 method [26]. Construction of sequencing libraries for analysis on the SOLiD instrument begins with an emulsion PCR single-molecule amplification step similar to that used in the 454 technique. The amplification products are transferred onto a glass surface where sequencing occurs by sequential rounds of hybridization and ligation with 16 dinucleotide combinations labeled by four different fluorescent dyes (each dye used to label four dinucleotides). Using the fourdye encoding scheme, each position is effectively probed twice, and the identity of the nucleotide is determined by analyzing the color that results from two successive ligation reactions. Significantly, the twobase encoding scheme enables the distinction between a sequencing error and a sequence polymorphism: an error would be detected in only one particular ligation reaction, whereas a polymorphism would be detected in both. The newly released SOLiD instrument is capable of producing 1 3 Gb of sequence data in 35-bp reads per an 8-day run. Making use of next-generation sequencer data format: pains and gains of plentiful short reads Sequence census applications The read lengths currently achievable by 454 technology are approaching 300 bp (Roche Applied Sciences), yet are still shorter than the bp achieved by Sanger sequencing. The Illumina and ABI/ SOLiD instruments generate even shorter 35-bp reads. The large numbers of short reads produced by next-generation sequencers provide opportunities for the development of new applications that benefit from the particular data format. For instance, next-generation technologies have been widely applied in contexts whereby sequencing of only a portion of the molecule is sufficient (referred to as sequence census applications [27]). The sequence census approach (Fig. 1) uses short reads (or tags ) to assign the site of origin of the read instead of determining the entire sequence of the original DNA molecule. By mapping the sequence read to its molecule of origin, the presence of the molecule is established. Significantly, the number of reads that map to a particular nucleic acid species correlates with the abundance of the species in the cell [28 30]. The sequence census approach is conceptually similar to the serial analysis of gene expression (SAGE) method initially developed with Sanger sequencing [31]. In SAGE, the abundance of a particular mrna species is estimated from the count of sequence fragments (tags) derived from its 3 end [31]. Since the advent of next-generation sequencers that reduce the cost Fig. 1. Sequence census approach. In the sequence census approach used in next-generation sequencing, short reads are mapped to the template molecule to provide three types of information. Sequence data are used to reveal sequence polymorphisms in the template, e.g., a SNP (red and yellow), the abundance of reads is used as a quantitative measure of the abundance of the template, and the particular areas of the template covered by reads reveal the internal structure of the template, e.g., the presence of exons and introns.

4 258 O. Morozova, M.A. Marra / Genomics 92 (2008) of sequencing and massively increase throughput, the applicability of such sequence census methods has been expanded to include many research areas [27]. To date, sequence census methods have been most commonly used for the analysis of transcribed portions of the genome, such as gene expression and noncoding RNA profiling [28 30]. The applications of these methods for studying transcriptomes are discussed in more detail in the transcriptome section of this review. A novel use of the sequence census approach is the identification of protein binding sites on the DNA using chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq) [32 37]. This technique couples the commonly used chromatin immunoprecipitation procedure, in which DNA protein complexes are cross-linked and precipitated using an antibody [38], to next-generation sequencing of DNA fragments bound to the precipitated protein [36]. To date, ChIP- Seq has been applied to the identification of transcription factor binding sites as well as histone modifications on a genome-wide scale [32,34,36,37]. The application of this technology for studying epigenomes is described in detail in the epigenome section of this review. Read pairs and read accuracy issues A limitation of short-read sequence data is the difficulty in de novo sequence assembly. This shortcoming is particularly an issue in sequencing new genomes and in sequencing highly rearranged genome segments, such as one might discover in cancer genomes [39] or in regions of structural variation [40]. Paired-end sequencing approaches, in which both ends of a fragment of defined size are sequenced to provide more information about the fragment, have the potential to improve the utility of short reads for sequencing rearranged genomic segments and for de novo sequence assembly [2,41]. Although widely adopted for Sanger sequencing, in which paired-end reads are obtained by sequencing both ends of a clone insert, paired-end approaches are currently in their infancy relative to most next-generation technologies [2]. A few reports of pairedend approaches for next-generation sequencers included the pairedend polony sequencing approach, applied to resequence the genome of an evolved strain of Escherichia coli [26], and the multiplex sequencing of paired-end ditags (MS-PET) method for 454 sequencing [42]. More recently, a paired-end mapping (PEM) procedure has been developed for the 454 technology and used to map structural rearrangements in two previously studied human genomes [43]. The results of PEM were in concordance with those obtained from previous investigations, including the HapMap project [43]. Another paired-end mapping approach involving paired-end ditags has been described for the detection of gene fusions and transcribed retrotransposons [44]. At the time of writing, no paired-end studies have been reported using Illumina or SOLiD technologies, although both platforms are developing or have implemented paired-end approaches. Given the quantity of reads and their short length, read accuracy becomes critical for mapping the reads to a reference sequence and for detecting sequence polymorphisms. The base accuracy, and the PHRED method [18] for evaluating the quality of Sanger-sequenced bases, is well established. This is currently not the case for the nextgeneration technologies ([2], but see [45]). To compensate for the uncertainty related to sequence quality and base accuracy, a general reliance on redundancy of sequence coverage is commonly invoked in next-generation sequencing. Multiple overlapping reads are thus used to confirm the accuracy of the base calls in applications in which accuracy is paramount, such as in the reliable detection of mutations or sequence polymorphisms [46]. Increasing the accuracy of individual base calls will ultimately lead to reductions in the high levels of redundancy currently invoked for confident base assignment and thus will presumably decrease sequencing costs. Transcriptome sequencing by next-generation technologies The sequencing of cdna rather than genomic DNA focuses analysis on the transcribed portion of the genome. This focus reduces the size of the sequencing target space, which can be viewed as desirable given the fact that, even with next-generation sequencers, sequencing an entire vertebrate genome is still an expensive undertaking. Transcriptome sequencing has been used for applications ranging from gene expression profiling, genome annotation, and rearrangement detection to noncoding RNA discovery and quantification. A unique feature of high-throughput transcriptome sequencing studies is the versatility of the data, which can simultaneously be analyzed to provide insight into the level of gene expression, the structure of genomic loci, and sequence variation present at loci (e.g., SNPs). To date, the 454 technology has dominated next-generation applications in transcriptomics; but at least one recent paper describes the use of the Illumina sequencer for profiling micrornas [30]. Gene expression profiling using novel and revisited sequence census methods The identification and quantification of mrna species under different conditions or in different cell types have long been of interest to biologists. Two conceptually different approaches to highthroughput gene expression profiling have emerged in the past decade to allow the simultaneous interrogation of gene expression levels on a genome-wide scale [47]. One group of methods is based on microarrays, in which cdna is hybridized to an array of complementary oligonucleotide probes corresponding to genes of interest, and the abundance of a particular mrna species is estimated from its hybridization intensity to the relevant probe [48]. A variety of microarray-based platforms and techniques have been developed in recent years; see for review [49]. A conceptually different group of methods uses sequencing of cdna fragments followed by counting the number of times a particular fragment has been observed. This group of methods includes the well-known SAGE method [31] and the more recent massively parallel signature sequencing (MPSS) [50]. In SAGE, restriction enzymes are used to obtain short sequence fragments (tags) of bp, usually derived from the 3 end of an mrna; the tags are concatenated and sequenced to determine the expression profiles of their corresponding mrnas [31]. The MPSS method also generates small fragment signatures of each mrna species; however, it uses a different protocol that does not involve propagation in bacteria and a different non-gel-based sequencing method [50]. SAGE and MPSS are often termed clone-and-count techniques as they provide a digital overview of gene expression profiles in a cell [47]. Advantages of such digital readouts include statistical robustness and less stringent standardization and replication requirements than those used for microarrays [50,51]. Some disadvantages that have hindered the use of SAGE and MPSS up until recently included the cost of sequencing and the biases introduced by the necessary cloning step. Furthermore, the MPSS technology has been restricted to only a few specialized laboratories [52]. Despite its excellent performance at detecting highly abundant transcripts, SAGE as commonly employed (i.e., sequencing to depths of 30, ,000 tags) involves relatively limited sequencing that does not robustly detect rare mrnas [53]. This is due to the costs incurred with extensive Sanger sequencing of SAGE libraries. Because of these costs, no conventionally sequenced SAGE library exhibits saturating tag number kinetics that would suggest complete representation of the cellular transcriptome [53]. In contrast, next-generation technologies offer substantial cost-effective increases in sequencing throughput, such that millions of sequences can be obtained for a few thousands of dollars or less. In addition, the short read lengths are compatible with the short tags generated using SAGE-like library

5 O. Morozova, M.A. Marra / Genomics 92 (2008) construction techniques. A recent study by Nielsen et al. [52] has described an extension of the SAGE method based on the LongSAGE protocol [54], for generation of longer tags of 17 bp versus 14 bp in the original SAGE, and the 454 sequencing technology. The nextgeneration sequencing-based SAGE method, termed DeepSAGE, greatly simplifies the sample preparation procedure by removing the cloning step and replacing it with emulsion PCR-based amplification; the sequencing is conducted by the 454 protocol that allows multiple samples to be sequenced in a single run at a high depth [52]. In particular, the authors estimate that a typical DeepSAGE experiment would generate 300,000 tags with less effort than a typical LongSAGE experiment generating 50,000 tags [52]. Nielsen et al. [52] applied the DeepSAGE protocol to the analysis of the transcriptome of the potato and showed that it was efficient at detecting rare transcripts (gauged by examining the expression of potato transcription factors) and due to the much increased depth provides more robust expression level estimates than LongSAGE [52]. A novel sequence census technique for surveying mrna levels using 5 -end sequence fragments has been developed and termed rapid analysis of 5 transcript ends (5 -RATE) [55]. The technique involves three steps, 5 oligocapping of mrna; ditag formation using RL-SAGE [56], a modification of the LongSAGE protocol; and 454 sequencing of tags. The technique was applied to the analysis of maize transcripts and was shown to provide an effective means for surveying the transcriptome. Some key features of 5 -RATE are the tag length ( 80 bp), which is longer than that of LongSAGE and MPSS, and the ability to generate tags from the 5 end, facilitating the identification of transcription start sites. In addition, similar to DeepSAGE, 5 -RATE is a relatively simple, fast, and productive procedure that does not involve cloning. Other sequencing-based methods such as full-length cdna sequencing [57] and the generation of expressed sequence tags (ESTs), which are single sequencing reads derived from one end of a cdna clone [58,59], have been used to characterize cellular mrna profiles. However, primarily due to the cost of sequencing, these methods had been even less effective than SAGE at providing a quantitative and comprehensive representation of cellular transcripts or transcript variability [60]. With the development of nextgeneration sequencing technologies, EST sequencing has gained potential as one of the sequence census methods for studying mrna profiles on a genome-wide scale. A number of studies have been successful at constructing EST libraries using the 454 technology; so far EST libraries have been constructed from plants, including the mustard weed Arabidopsis thaliana [61,62], the model legume Medicago truncatula [63], and maize, Zea mays [64], as well as the insects Drosophila melanogaster [28] and wasp, Polistes metricus [65]. A study conducted at our Genome Centre used 454 sequencing to generate ESTs from a human prostate cancer cell line, LNCaP [29]. The 454 technology is well suited to EST sequencing, as it is currently capable of generating 400,000 reads per run [29,63] and provides an unbiased representation of all regions of a transcript independent of length or expression level [61]. Importantly, a single 454 run has been shown to provide a representative view of the mrna population in the cell [61,63]. Further, unlike the shorter reads generated using the Illumina or SOLiD sequencers, the length of the 454 reads allows for interpretation of sequences generated from species lacking a genome sequence or extensive transcriptome sequences for comparison (e.g., [66]). Another novel sequence census approach based on the 454 technology focuses on sequencing unique fragments found at 3 untranslated regions (3 -UTRs) of genes [67]. This approach is particularly useful for distinguishing closely related transcripts, such as those resulting from paralogs, and for studying allele-specific expression [67]. In addition, Eveland et al. [67] estimated that 3 -UTR sequencing is superior to EST sequencing at identifying individual transcripts owing to the decreased sequencing redundancy achieved by restricting reads to the 3 -UTR regions of genes [67]. In particular, the authors identified 47,299 distinct mrnas compared to 17,500 identified by a similar EST study in A. thaliana [61]. Significantly, the method does not rely on a complete genome sequence and has been shown to be successful at surveying transcription in Z. mays, whose genome is currently being sequenced (Maize Genome Sequencing Consortium). Small noncoding RNA profiling and the discovery of novel small RNA genes A related application of next-generation sequencing technologies to the analysis of transcriptomes is small noncoding (ncrna) discovery and profiling. ncrnas are RNA molecules that are not translated into a protein product. This class of RNAs includes transfer RNA (trna), ribosomal RNA (rrna), small nuclear and small nucleolar RNA, and microrna and small interfering RNA (mirna and sirna). Recent research has implicated micrornas, approximately 21- nucleotide-long RNA molecules, as crucial posttranscriptional regulators of gene expression in both animals and plants [68]. A related class of noncoding RNAs, sirnas, nt in length, is the predominant class of small RNA molecules in plants [69]. Definite evidence for the presence of endogenous sirnas in animals is lacking [70]. While mirnas and sirnas are similar in size and are both involved in posttranscriptional regulation of gene expression, their biogenesis and exact functions are different [71]. MicroRNAs were first identified in Caenorhabditis elegans [72] and since then have emerged as crucial regulators of gene expression in many organisms, including humans [73]. Historically, novel mirnas have been identified by cloning and sequencing of individual mirnas, which involved separating them on gels and successively ligating adapters at the 5 -end monophosphate and 3 hydroxyl groups [70,73]. However, using this approach it was difficult to distinguish mirnas from degradation products of other ncrnas in the cell, such as rrna or trna [70]. More recently, microarray-based approaches have been developed for high-throughput mirna profiling; however, these approaches are not suitable for the detection of novel mirnas [70]. High-throughput sequencing of small RNAs provides great potential for the identification of novel small RNAs as well as profiling of known and novel small RNA genes. The MPSS technology has been applied to the sequencing of size-fractionated small RNAs from A. thaliana [69]. This approach involved the generation of 17-nt fragments corresponding to parts of mature small RNA molecules and using bioinformatic analysis to identify the corresponding small RNA genes in the genome [69]. Several disadvantages of this approach are the high complexity and cost of the MPSS technology, which involves a cloning step, and the short read lengths corresponding to only a portion of small RNA molecules [70]. Next-generation sequencing technologies do not involve cloning and produce read lengths compatible with the length of mature mirnas and sirnas. This provides several important advantages for ncrna sequencing studies over MPSS; the advantages include the decreased procedural complexity and cost and the dramatically increased throughput and depth of coverage. Small RNA profiling studies with next-generation sequencing technologies currently include gel-based separation of small RNAs and construction of cdna libraries followed by sequencing on a next-generation platform. To date, small RNA profiling studies involving the 454 technology have been reported. These include studies in the moss Physcomitrella patens [74]; A. thaliana [74 80]; wheat, Triticum aestivum [81]; the basal eudicot species Eschscholzia californica [82]; the lycopod Selaginella moellendorffii [74]; the unicellular alga Chlamydomonas reinhardtii [83]; Marek disease virus [84]; and primates [85]. Importantly, small RNA sequencing studies with the 454 technology contributed to the discovery of a novel class of small RNAs, termed Piwi-interacting RNAs, that are expressed in mammalian testes and are presumably

6 260 O. Morozova, M.A. Marra / Genomics 92 (2008) required for germ cell development in mammals and other species [86 88]. The Illumina and SOLiD platforms allow for the generation of several millions of short 35-nt reads in comparison to up to 500,000 reads generated by the 454 technology [74], potentially providing even deeper coverage of small RNAs than the 454 platform. Our laboratory has recently used the Illumina technology to sequence small RNA libraries from human embryonic stem cells before and after their differentiation into embryonic bodies [30]. This study generated more than 6 million short sequence reads from each library and identified 334 known and 104 novel mirna genes in one of the most comprehensive mirna profiling exercises to date [30]. High-throughput sequencing-based approaches to small ncrna profiling, hugely enabled by next-generation technologies, provide several advantages over microarray methods, including the ability to discover novel mirnas and the potential to detect variations in the mature mirna length and mirna editing [30]. We envision that such approaches will gain even more popularity in the near future as the Illumina and SOLiD platforms are exploited in more laboratories. Protein coding gene annotation using transcriptome sequence data Despite the explosion of genome sequence data from multiple species fueled by advances in sequencing technologies, genome annotation for most multicellular eukaryotic species is still at its rudimentary stages. In particular, recent annotation efforts have focused on the discovery of novel noncoding RNA genes and regulatory elements that determine temporal or spatial gene expression; however, the annotation of protein-coding genes involving the elucidation of their correct exon intron structures largely has lagged behind [89]. The current gold standard for protein-coding gene annotation is EST or full-length cdna sequencing followed by alignment to a reference genome assembly [89]. The cdna sequences can be aligned either to the locus from which they had been derived (cis-alignments) or to a homologous locus from the source genome or the genome of a related organism (trans-alignments). It has been estimated that most EST sequencing projects fail to cover 20 40% of transcripts, which usually include rare or very long transcripts as well as transcripts with highly specific expression patterns [89]. Another challenge of ESTdriven gene annotation is alternative splicing and the complex structure of many loci from multicellular eukaryotes, resulting in a substantial number of incomplete annotations. Next-generation sequencing technologies have the potential for providing much deeper coverage of EST libraries; however, the short reads may be problematic when annotating alternative splice variants and the complete accurate structures of protein coding loci [89]. A recent study used laser capture microdissection [90] to isolate transcripts from the shoot apical meristem of Z. mays followed by cdna library construction and 454 sequencing of ESTs [91]. The study used a cis-alignment method to annotate more than 25,000 genomic sequences from maize and detect transcription from 400 orphan genes, most of which had not been detected using other approaches [91]. Another study used the 454 technology to generate 391,157 EST reads from the brain transcriptome of the wasp P. metricus; the reads were then trans-aligned to the genome sequence and EST resources from the honeybee, Apis mellifera, to annotate P. metricus transcripts [65]. Interestingly, the study found wasp EST matches to 39% of the honeybee mrnas and observed a strong correlation between the expression levels of the corresponding transcripts from the two species. Significantly, many gene expression profiling studies that use high-throughput sequencing can also provide annotation information, such as the presence of novel genes, exons, or splice events. For instance, our own study involving the generation of ESTs from the prostate cancer cell line LNCaP characterized 25 novel splicing events [29]. Another example is the 5 -RATE method [55] described above, which is particularly useful for providing annotation information about transcriptional start sites and other molecular events involving the 5 end of transcripts. In addition, 454 transcriptome sequencing data can be useful for identifying SNPs in coding regions [92]. However, as mentioned earlier, error rates associated with nextgeneration sequencers require a relatively high fold coverage to call a sequence polymorphism reliably, particularly in a heterozygous sample [46]. The much deeper sequencing capacity of next-generation sequencing comes at the cost of shorter reads, which create additional challenges for gene annotation (e.g., difficulties in resolving splice isoforms). Paired-end sequencing approaches, such as the newly developed 454 sequencing-based MS-PET strategy, may facilitate annotation studies by providing mate pair information from large DNA fragments [42]. Detection of aberrant transcription events Genome rearrangements resulting in aberrant transcriptional events are hallmarks of human cancers [93]. Techniques for detecting such genome rearrangements include cytogenetic and PCR methods as well as high-throughput array-based approaches, most notably array comparative genomic hybridization and sequencing-based techniques. Sequencing methods for genome rearrangement detection offer several advantages over array methods, such as the ability to detect multiple types of rearrangements, including previously unknown ones; the detection of absolute rather than relative changes in sequence copy numbers; and the potential for single-nucleotide resolution [39]. Large-scale transcriptome sequencing studies provide a novel means for detecting genome rearrangements in the transcribed portion of the genome. However, due to the short-read-length issue, single-end transcriptome sequencing studies using nextgeneration technologies, including the discussed EST studies, would be of limited use for identifying rearrangements [44]. An elegant gene identification signature analysis using paired-end ditag transcriptome sequencing methodology has been developed for the detection of gene fusions and other aberrant transcripts in cancers [44]. The approach involves generation of 18-nt-long tags from both ends of a transcript, which are then concatenated and sequenced by the 454 technology. This strategy is particularly useful for detecting fusion events in cancers, as well as actively transcribed pseudogenes that are readily distinguishable from their source genomic loci [44]. Related technologies involving other next-generation sequencing platforms are currently being developed by several laboratories. Applications of next-generation sequencing for the analysis of epigenetic modifications of histones and DNA Epigenetics is the study of heritable gene regulation that does not involve the DNA sequence. The two major types of epigenetic modifications regulating gene expression are DNA methylation by covalent modification of cytosine-5 and posttranslational modifications of histone tails [94]. Regulatory RNAs provide another means of epigenetic regulation of gene expression; however, the focus of this section is on applications of next-generation sequencing to the analysis of covalent modifications of DNA and chromatin. Recent research has implicated such epigenetic modifications of prime importance in oncogenesis and development, setting the grounds for the Human Epigenome Project (HEP) initiative, which aims to catalog DNA methylation patterns on a genome-wide scale [95]. The next-generation sequencing technologies offer the potential to accelerate epigenomic research substantially. To date, these technologies have been applied in several epigenomic areas, including the characterization of DNA methylation patterns, posttranslational modifications of histones, and nucleosome positioning on a genome-wide scale.

7 O. Morozova, M.A. Marra / Genomics 92 (2008) DNA methylation profiling by bisulfite DNA sequencing Cataloging genome-wide DNA methylation patterns, the most commonly studied epigenetic modification, is the primary goal of the HEP [95]. As part of the project, methylation profiles have been generated for chromosomes 6, 20, and 22 in 12 different tissues using bisulfite DNA sequencing on a Sanger instrument [96]. There are three main approaches to detecting DNA methylation on a large scale, including restriction endonuclease digestion coupled to microarray technology, bisulfite sequencing, and immunoprecipitation of 5 - methylcytosine to separate methylated from unmethylated DNA (for details see review by Callinan and Feinberg [94]). Bisulfite sequencing, the approach used in the HEP, is based on the chemical property of bisulfite to induce the conversion of cytosine residues to uracils while leaving 5 -methylcytosines intact. Therefore, sequencing of bisulfitetreated DNA will reveal the positions of methylated cytosines (those positions that remained cytosines following the treatment). A recent study by Taylor et al. [97] improved upon the bisulfite DNA sequencing procedure by using the 454 technology to sequence bisulfite-treated PCR amplicons corresponding to gene-related CpG-rich regions. The method, termed ultradeep bisulfite sequencing, was applied to examine methylation patterns at 25 gene-related CpG-rich regions in several hematopoietic tumors. The study generated N1600 individual sequences from each amplicon in contrast to the approximately 20 clones typically generated by conventional bisulfite sequencing, providing a superior robust alternative that does not involve cloning and allows for the simultaneous analysis of multiple genes and multiple samples [97]. A similar study using the Illumina technology has been recently reported [98]. Sequence census applications for mapping histone modifications and the locations of DNA-binding proteins Posttranslational covalent modifications of histone tails, which include methylation, acetylation, phosphorylation, and ADP-ribosylation, are thought to control gene expression by regulating the strength of DNA histone interactions determining the accessibility of DNA to transcriptional regulators [99]. Historically, histone modifications have been identified by chromatin immunoprecipitation (ChIP) which, in brief, involves cross-linking proteins to DNA, followed by immunoprecipitation of a protein of interest with a specific antibody, and characterization of the bound DNA by hybridization [38] or PCR amplification [100]. The genome-wide development of the ChIP method using microarrays, known as ChIP chip, combined the ChIP procedure with hybridization to a microarray to reveal the genome-wide distribution of the protein of interest [101]. Sequence census methods have been recently coupled to the basic ChIP protocol to provide an alternative method for surveying histone modifications on a genome-wide scale. Roh et al. [102] used ChIP followed by a Sanger sequencing-based SAGE procedure (also referred to as the genome-wide mapping technique) to study the distribution of acetylated histones H3 and H4 in the yeast genome [102]. Bhinge et al. [103] replaced Sanger sequencing with the 454 sequencing technology and termed the method sequence tag analysis of genomic enrichment (STAGE). STAGE was successfully applied to identifying the genome-wide binding locations of the STAT1 transcription factor [103]; however,as in the case of the ChIP SAGE protocol, it can also be used for the detection of histone modifications. Importantly, in these two methods the sequence reads are derived from the areas of ChIP DNA that are next to a restriction endonuclease site used in SAGE. The introduction of next-generation sequencing to the field has brought about the development of a new sequencing-based method, named ChIP Seq, for detecting histone modifications on a genomewide scale. In this method, immunoprecipitated DNA is used to construct sequencing libraries for analysis on a next-generation sequencer to generate short sequence reads that, in contrast to ChIP SAGE or STAGE, could be derived from either end of a ChIP DNA fragment regardless of the presence of a restriction site [35,36]. The number of reads that map to a particular genomic area can be used to quantify the strength of binding of the protein of interest in this area (or the amount of the assayed histone modification found at the site). To date, Illumina technology has been most commonly used for the ChIP Seq application. The 25- to 30-bp read length obtained on an Illumina sequencer suffices to map a typical 150- to 200-bp ChIP DNA fragment that may be sequenced from both ends. ChIP Seq has been applied to the identification of histone modifications on a genomewide scale in the human genome [32,37]. In addition, it has been also used to reveal the genome-wide locations of transcription factor binding sites of STAT1 and NRSF [34,36]. Of the genome-wide extensions of the ChIP protocol, ChIP Seq has the potential for the highest resolution as its resolution depends only on the size of the input chromatin fragments and the depth of sequencing. On the other hand, the resolution of ChIP SAGE (STAGE) also depends on the distribution of the restriction enzyme sites in the input ChIP DNA. The resolution of ChIP chip depends on the resolution of probes used for the microarray. Both ChIP SAGE and ChIP Seq require less PCR amplification than ChIP chip and, therefore, may provide improved accuracy for quantifying the binding signal [99]. Applications of next-generation sequencers to the study of DNA accessibility and chromatin structure Next-generation sequencing technologies have been applied to mapping out the positions of nucleosomes and other determinants of DNA accessibility. Nucleosomes are important factors affecting gene regulation and are usually associated with decreased accessibility of DNA to regulatory proteins. Most commonly, nucleosomes are identified by preferential cleavage of linker DNA by micrococcal nuclease (MNase) [99]. The identity of MNase digestion products, revealed by hybridization or sequencing, marks the locations of nucleosomes. Two recent studies used MNase digestion followed by sequencing with the 454 technology to map genome-wide locations of nucleosomes H2A.Z in yeast [104] and nucleosome cores in C. elegans [105]. Albert et al. [104] used MNase digestion followed by immunoprecipitation with an anti-h2a.z antibody to isolate preferentially nucleosomes associated with the particular histone, while Johnson et al. [105] directly sequenced fragments liberated by digestion. ChIP Seq data from genome-wide histone modification profiling experiments can also be used to infer nucleosomal positions on a genome-wide scale. For instance, Schmid and Bucher [106] used ChIP Seq data obtained by Barski et al. [37] to map the positions of two types of nucleosomes, as well as RNA PolII transcription preinitiation complexes in human CD4 + T cells. They achieved this by separately analyzing sequence tags from two DNA strands and assuming that the tags mapping to the sense and antisense strands defined the 5 and 3 boundaries, respectively, of protein DNA complexes [106]. The results obtained from this analysis correlated well with similar findings by Albert et al. [104], who determined the distribution of H2A.Z nucleosomes in yeast and found strong phasing of this type of nucleosome downstream of transcription start sites [104]. The use of ChIP Seq data to map the nucleosome positions has three main limitations. First, only nucleosomes associated with a specific histone modification can be mapped in this manner; second, nucleosome positioning is regulated by certain histone modifications which can, for instance, mark a given nucleosome for removal [107]; and third, the method is not quantitative, as the abundance of reads mapping to a particular region is correlated with the abundance of the histone modification, which is not necessarily correlated with the abundance of the nucleosome [37]. Other uses of the pyrosequencing technology in epigenomics have included identifying DNase I-hypersensitive sites to help infer the role

8 262 O. Morozova, M.A. Marra / Genomics 92 (2008) of 3 -BCL11B in leukemic cis-activation [108] and in the development of the chromosome conformation capture (3C) method to detect higher order chromosomal structures or physical interactions between genomic loci in a high-throughput manner [109]. The original 3C method uses formaldehyde cross-linking followed by restriction enzyme digestion and intramolecular ligation to detect physically interacting genomic loci that are presumably important for regulating gene expression. The abundance of a particular ligation product (detected by quantitative PCR) is a measure of the frequency with which the particular loci interact in the nucleus. The sequencing development of 3C, termed chromosome conformation capture carbon copy (5C), replaces the quantitative PCR detection by 454 sequencing, enabling the use of the 3C approach on a genome-wide scale. The 5C approach successfully identified known and novel looping interactions involving the β-globin locus [109]. Concluding remarks Due to their much improved cost effectiveness, compared to Sanger sequencing, and their many different uses, next-generation sequencing approaches are poised to emerge as the dominant genomics technology. Perhaps most significantly, these new sequencers have provided genome-scale sequencing capacity to individual laboratories in addition to larger genome centers. Compared to Sanger sequencing, advantages of the next-generation technologies mentioned thus far, including 454/Roche [19], Illumina/Solexa [24], and ABI/SOLiD [26], alleviate the need for in vivo cloning by clonal amplification of spatially separated single molecules using either emulsion PCR (454/Roche and ABI/SOLiD) or bridge amplification on solid surface (Illumina/Solexa). In addition to providing a means for cloning-free amplification, these methods use single-molecule templates allowing for the detection of heterogeneity in a DNA sample (e.g., identifying mutations present only in a subpopulation of cells), which is a significant advantage over Sanger sequencing [25,46]. The short read structure of next-generation sequencers provides potential problems for sequence assembly particularly in areas associated with sequence repeats. However, it has found broad applicability in sequence census studies, wherein determining the sequence of the whole DNA molecule is not essential [27]. The short read length also necessitates the development of paired-end sequencing approaches for improved mapping efficiency [41]. To date, such approaches have been reported for the 454 technology (e.g., Korbel et al. [43]) and are being made available for the Illumina and SOLiD platforms. The accuracy of next-generation sequencers is improving, but users generally rely on relatively high redundancy of sequence coverage to determine reliably the sequence of a region, particularly of that containing a polymorphism [46]. Addressing the accuracy issue by improving the reaction chemistry has the potential of further decreasing the current sequencing cost associated with next-generation sequencers. Next-generation sequencing technologies have found broad applicability in functional genomics research. Their applications in the field have included gene expression profiling, genome annotation, small ncrna discovery and profiling, and detection of aberrant transcription, which are areas that have been previously dominated by microarrays. Significantly, several studies found that 454 sequencing correlated well with the established gene expression profiling technologies such as microarray results (correlation coefficients of ) [28] and moderately with SAGE data (correlation coefficients of 0.45) [29]. While the transcriptome sequencing studies discussed here predominantly used the 454 technology, Illumina and SOLiD technologies also offer significant potential for such applications. Another major functional genomic application is determining DNA sequences associated with epigenetic modifications of histones and DNA. Next-generation sequencing approaches have been used in this field to profile DNA methylations, posttranslational modifications of histones, and nucleosome positions on a genome-wide scale. While these areas have been previously addressed by Sanger sequencing, next-generation technologies have improved upon the throughput, the depth of coverage, and the resolution of Sanger sequencing studies. Despite the recent exciting research advances involving nextgeneration sequencers, it should be noted that method development is still in its infancy. Efficient data analysis pipelines are required for many applications before they become routine, and more studies are needed to address the robustness of these techniques as well as the correspondence of results with those obtained by previous methods. Although next-generation sequencers are already being widely used, there are other sequencing methods, such as nanopore sequencing [110], whose scalability is being explored to decrease the sequencing cost and enhance throughput even further. Acknowledgments Support for this work was received from the National Cancer Institute of Canada, Genome Canada, Genome British Columbia, and British Columbia Cancer Foundation. O.M. is a junior trainee of the Michael Smith Foundation for Health Research and a graduate trainee of the Michael Smith Foundation for Health Research/Canadian Institutes of Health Research Bioinformatics Training Program and is supported by a Natural Sciences and Engineering Research Council of Canada fellowship. M.A.M. is a senior scholar of the Michael Smith Foundation for Health Research. References [1] R.E. Green, et al., Analysis of one million base pairs of Neanderthal DNA, Nature 444 (2006) [2] N. Hall, Advanced sequencing technologies and their wider impact in microbiology, J. Exp. Biol. 210 (2007) [3] F. Sanger, A.R. Coulson, A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase, J. Mol. Biol. 94 (1975) [4] A.M. Maxam, W. Gilbert, A new method for sequencing DNA, Proc. Natl. Acad. Sci. U. S. A. 74 (1977) [5] F. Sanger, S. Nicklen, A.R. Coulson, DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. USA 74 (1977) [6] E.S. Lander, et al., Initial sequencing and analysis of the human genome, Nature 409 (2001) [7] International Human Genome Sequencing Consortium, Finishing the euchromatic sequence of the human genome, Nature 431 (2004) [8] J.C. Venter, et al., The sequence of the human genome, Science 291 (2001) [9] C.A. Hutchison III, DNA sequencing: bench to bedside and beyond, Nucleic Acids Res. 35 (2007) [10] M.L. Metzker, Emerging technologies in DNA sequencing, Genome Res. 15 (2005) [11] L.M. Smith, et al., Fluorescence detection in automated DNA sequence analysis, Nature 321 (1986) [12] J.M. Prober, et al., A system for rapid DNA sequencing with fluorescent chainterminating dideoxynucleotides, Science 238 (1987) [13] A.S. Cohen, D.R. Najarian, A. Paulus, A. Guttman, J.A. Smith, B.L. Karger, Rapid separation and purification of oligonucleotides by high-performance capillary gel electrophoresis, Proc. Natl. Acad. Sci. U. S. A. 85 (1988) [14] X.C. Huang, M.A. Quesada, R.A. Mathies, DNA sequencing using capillary array electrophoresis, Anal. Chem. 64 (1992) [15] M.C. Ruiz-Martinez, J. Berka, A. Belenkii, F. Foret, A.W. Miller, B.L. Karger, DNA sequencing by capillary electrophoresis with replaceable linear polyacrylamide and laser-induced fluorescence detection, Anal. Chem. 65 (1993) [16] R.S. Madabhushi, Separation of 4-color DNA sequencing extension products in noncovalently coated capillaries using low viscosity polymer solutions, Electrophoresis 19 (1998) [17] E. Carrilho, DNA sequencing by capillary array electrophoresis and microfabricated array systems, Electrophoresis 21 (2000) [18] B. Ewing, P. Green, Base-calling of automated sequencer traces using phred. II. Error probabilities, Genome Res. 8 (1998) [19] M. Margulies, et al., Genome sequencing in microfabricated high-density picolitre reactors, Nature 437 (2005) [20] D.S. Tawfik, A.D. Griffiths, Man-made cell-like compartments for molecular evolution, Nat. Biotechnol. 16 (1998) [21] P. Nyren, B. Pettersson, M. Uhlen, Solid phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay, Anal. Biochem. 208 (1993)

9 O. Morozova, M.A. Marra / Genomics 92 (2008) [22] M. Ronaghi, S. Karamohamed, B. Pettersson, M. Uhlen, P. Nyren, Real-time DNA sequencing using detection of pyrophosphate release, Anal. Biochem. 242 (1996) [23] S. Bennett, Solexa Ltd, Pharmacogenomics 5 (2004) [24] S.T. Bennett, C. Barnes, A. Cox, L. Davies, C. Brown, Toward the 1,000 dollars human genome, Pharmacogenomics 6 (2005) [25] D.R. Bentley, Whole-genome re-sequencing, Curr. Opin. Genet. Dev. 16 (2006) [26] J. Shendure, et al., Accurate multiplex polony sequencing of an evolved bacterial genome, Science 309 (2005) [27] B. Wold, R.M. Myers, Sequence census methods for functional genomics, Nat. Methods 5 (2008) [28] T.T. Torres, M. Metta, B. Ottenwalder, C. Schlotterer, Gene expression profiling by massively parallel sequencing, Genome Res. 18 (2008) [29] M.N. Bainbridge, et al., Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach, BMC Genomics 7 (2006) 246. [30] R.D. Morin, et al., Application of massively parallel sequencing to microrna profiling and discovery in human embryonic stem cells, Genome Res. 18 (2008) [31] V.E. Velculescu, L. Zhang, B. Vogelstein, K.W. Kinzler, Serial analysis of gene expression, Science 270 (1995) [32] T.S. Mikkelsen, et al., Genome-wide maps of chromatin state in pluripotent and lineage-committed cells, Nature 448 (2007) [33] E.R. Mardis, ChIP-seq: welcome to the new frontier, Nat. Methods 4 (2007) [34] G. Robertson, et al., Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing, Nat. Methods 4 (2007) [35] S. Fields, Molecular biology: site-seeing by sequencing, Science 316 (2007) [36] D.S. Johnson, A. Mortazavi, R.M. Myers, B. Wold, Genome-wide mapping of in vivo protein DNA interactions, Science 316 (2007) [37] A. Barski, et al., High-resolution profiling of histone methylations in the human genome, Cell 129 (2007) [38] D.S. Gilmour, J.T. Lis, Detecting protein DNA interactions in vivo: distribution of RNA polymerase on specific bacterial genes, Proc. Natl. Acad. Sci. U. S. A. 81 (1984) [39] O. Morozova, M.A. Marra, From cytogenetics to next-generation sequencing technologies: advances in the detection of genome rearrangements in tumors, Biochem. Cell Biol. 86 (2008) [40] E. Tuzun, et al., Fine-scale structural variation of the human genome, Nat. Genet. 37 (2005) [41] E.R. Mardis, Anticipating the 1,000 dollar genome, Genome Biol. 7 (2006) 112. [42] P. Ng, et al., Multiplex sequencing of paired-end ditags (MS-PET): a strategy for the ultra-high-throughput analysis of transcriptomes and genomes, Nucleic Acids Res. 34 (2006) e84. [43] J.O. Korbel, et al., Paired-end mapping reveals extensive structural variation in the human genome, Science 318 (2007) [44] Y. Ruan, et al., Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using paired-end ditags (PETs), Genome Res. 17 (2007) [45] W. Brockman, et al., Quality scores and SNP detection in sequencing-by-synthesis systems, Genome Res. 18 (2008) [46] R.K. Thomas, et al., Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing, Nat. Med. 12 (2006) [47] S. Tyagi, Taking a census of mrna populations with microbeads, Nat. Biotechnol. 18 (2000) [48] M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270 (1995) [49] S.A. Ness, Microarray analysis: basic strategies for successful experiments, Mol. Biotechnol. 36 (2007) [50] S. Brenner, et al., Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays, Nat. Biotechnol. 18 (2000) [51] S. Audic, J.M. Claverie, The significance of digital gene expression profiles, Genome Res. 7 (1997) [52] K.L. Nielsen, A.L. Hogh, J. Emmersen, DeepSAGE digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples, Nucleic Acids Res. 34 (2006) e133. [53] S.M. Wang, Understanding SAGE data, Trends Genet. 23 (2007) [54] S. Saha, et al., Using the transcriptome to annotate the genome, Nat. Biotechnol. 20 (2002) [55] M. Gowda, H. Li, J. Alessi, F. Chen, R. Pratt, G.L. Wang, Robust analysis of 5 transcript ends (5 -RATE): a novel technique for transcriptome analysis and genome annotation, Nucleic Acids Res. 34 (2006) e126. [56] M. Gowda, C. Jantasuriyarat, R.A. Dean, G.L. Wang, Robust-LongSAGE (RL-SAGE): a substantially improved LongSAGE method for gene discovery and transcriptome analysis, Plant Physiol. 134 (2004) [57] M. Seki, et al., Functional annotation of a full-length Arabidopsis cdna collection, Science 296 (2002) [58] L.D. Hillier, et al., Generation and analysis of 280,000 human expressed sequence tags, Genome Res. 6 (1996) [59] W.R. McCombie, et al., Caenorhabditiselegansexpressed sequence tagsidentifygene families and potential disease gene homologues, Nat. Genet. 1 (1992) [60] M. Sun, G. Zhou, S. Lee, J. Chen, R.Z. Shi, S.M. Wang, SAGE is far more sensitive than EST for detecting low-abundance transcripts, BMC Genomics 5 (2004) 1. [61] A.P. Weber, K.L. Weber, K. Carr, C. Wilkerson, J.B. Ohlrogge, Sampling the Arabidopsis transcriptome with massively parallel pyrosequencing, Plant Physiol. 144 (2007) [62] M.W. Jones-Rhoades, J.O. Borevitz, D. Preuss, Genome-wide expression profiling of the Arabidopsis female gametophyte identifies families of small, secreted proteins, PLoS Genet. 3 (2007) [63] F. Cheung, B.J. Haas, S.M. Goldberg, G.D. May, Y. Xiao, C.D. Town, Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology, BMC Genomics 7 (2006) 272. [64] K. Ohtsu, et al., Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.), Plant J. 52 (2007) [65] A.L. Toth, et al., Wasp gene expression supports an evolutionary link between maternal behavior and eusociality, Science 318 (2007) [66] J.C. Vera, et al., Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing, Mol. Ecol. 17 (2008) [67] A.L. Eveland, D.R. McCarty, K.E. Koch, Transcript profiling by 3 -untranslated region sequencing resolves expression of gene families, Plant Physiol. 146 (2008) [68] W. Filipowicz, S.N. Bhattacharyya, N. Sonenberg, Mechanisms of posttranscriptional regulation by micrornas: are the answers in sight? Nat. Rev. Genet. 9 (2008) [69] C. Lu, S.S. Tej, S. Luo, C.D. Haudenschild, B.C. Meyers, P.J. Green, Elucidation of the small RNA component of the transcriptome, Science 309 (2005) [70] B.C. Meyers, F.F. Souret, C. Lu, P.J. Green, Sweating the small stuff: microrna discovery in plants, Curr. Opin. Biotechnol. 17 (2006) [71] Z. Xie, et al., Genetic and functional diversification of small RNA pathways in plants, PLoS Biol. 2 (2004) E104. [72] R.C. Lee, R.L. Feinbaum, V. Ambros, The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14, Cell 75 (1993) [73] E. Berezikov, E. Cuppen, R.H. Plasterk, Approaches to microrna discovery, Nat. Genet. 38 Suppl. (2006) S2 S7. [74] M.J. Axtell, C. Jan, R. Rajagopalan, D.P. Bartel, A two-hit trigger for sirna biogenesis in plants, Cell 127 (2006) [75] I.R. Henderson, et al., Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning, Nat. Genet. 38 (2006) [76] C. Lu, et al., MicroRNAs and other small RNAs enriched in the Arabidopsis RNAdependent RNA polymerase-2 mutant, Genome Res. 16 (2006) [77] R. Rajagopalan, H. Vaucheret, J. Trejo, D.P. Bartel, A diverse and evolutionarily fluid set of micrornas in Arabidopsis thaliana, Genes Dev. 20 (2006) [78] N. Fahlgren, et al., High-throughput sequencing of Arabidopsis micrornas: evidence for frequent birth and death of MIRNA genes, PLoS One 2 (2007) e219. [79] K.D. Kasschau, et al., Genome-wide profiling and analysis of Arabidopsis sirnas, PLoS Biol. 5 (2007) e57. [80] M.D. Howell, et al., Genome-wide analysis of the RNA-dependent RNA polymerase 6/dicer-like 4 pathway in Arabidopsis reveals dependency on mirna- and tasirna-directed targeting, Plant Cell 19 (2007) [81] Y. Yao, et al., Cloning and characterization of micrornas from wheat (Triticum aestivum L.), Genome Biol. 8 (2007) R96. [82] A. Barakat, K. Wall, J. Leebens-Mack, Y.J. Wang, J.E. Carlson, C.W. Depamphilis, Large-scale identification of micrornas from a basal eudicot (Eschscholzia californica) and conservation in flowering plants, Plant J. 51 (2007) [83] T. Zhao, et al., A complex system of small RNAs in the unicellular green alga Chlamydomonas reinhardtii, Genes Dev. 21 (2007) [84] J. Burnside, et al., Marek's disease virus encodes micrornas that map to meq and the latency-associated transcript, J. Virol. 80 (2006) [85] E. Berezikov, et al., Diversity of micrornas in human and chimpanzee brain, Nat. Genet. 38 (2006) [86] N.C. Lau, et al., Characterization of the pirna complex from rat testes, Science 313 (2006) [87] A. Girard, R. Sachidanandam, G.J. Hannon, M.A. Carmell, A germline-specific class of small RNAs binds mammalian Piwi proteins, Nature 442 (2006) [88] S. Houwing, et al., A role for Piwi and pirnas in germ cell maintenance and transposon silencing in zebrafish, Cell 129 (2007) [89] M.R. Brent, Steady progress and recent breakthroughs in the accuracy of automated genome annotation, Nat. Rev. Genet. 9 (2008) [90] M.R. Emmert-Buck, et al., Laser capture microdissection, Science 274 (1996) [91] S.J. Emrich, W.B. Barbazuk, L. Li, P.S. Schnable, Gene discovery and annotation using LCM-454 transcriptome sequencing, Genome Res. 17 (2007) [92] W.B. Barbazuk, S.J. Emrich, H.D. Chen, L. Li, P.S. Schnable, SNP discovery via 454 transcriptome sequencing, Plant J. 51 (2007) [93] D. Hanahan, R.A. Weinberg, The hallmarks of cancer, Cell 100 (2000) [94] P.A. Callinan, A.P. Feinberg, The emerging science of epigenomics, Hum. Mol. Genet. 15 Spec. No. 1 (2006) R95 R101. [95] M. Esteller, The necessity of a human epigenome project, Carcinogenesis 27 (2006) [96] F. Eckhardt, et al., DNA methylation profiling of human chromosomes 6, 20 and 22, Nat. Genet. 38 (2006) [97] K.H. Taylor, et al., Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing, Cancer Res. 67 (2007)

10 264 O. Morozova, M.A. Marra / Genomics 92 (2008) [98] S.J. Cokus, et al., Shotgun bisulfite sequencing of the Arabidopsis genome reveals DNA methylation patterning, Nature 452 (2008) [99] D.E. Schones, K. Zhao, Genome-wide approaches to studying chromatin modifications, Nat. Rev. Genet. 9 (2008) [100] M.H. Kuo, C.D. Allis, In vivo cross-linking and immunoprecipitation for studying dynamic protein:dna associations in a chromatin environment, Methods 19 (1999) [101] B. Ren, et al., Genome-wide location and function of DNA binding proteins, Science 290 (2000) [102] T.Y. Roh, W.C. Ngau, K. Cui, D. Landsman, K. Zhao, High-resolution genome-wide mapping of histone modifications, Nat. Biotechnol. 22 (2004) [103] A.A. Bhinge, J. Kim, G.M. Euskirchen, M. Snyder, V.R. Iyer, Mapping the chromosomal targets of STAT1 by sequence tag analysis of genomic enrichment (STAGE), Genome Res. 17 (2007) [104] I. Albert, et al., Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome, Nature 446 (2007) [105] S.M. Johnson, F.J. Tan, H.L. McCullough, D.P. Riordan, A.Z. Fire, Flexibility and constraint in the nucleosome core landscape of Caenorhabditis elegans chromatin, Genome Res. 16 (2006) [106] C.D. Schmid, P. Bucher, ChIP-Seq data reveal nucleosome architecture of human promoters, Cell 131 (2007) author reply [107] H. Boeger, J. Griesenbeck, J.S. Strattan, R.D. Kornberg, Nucleosomes unfold completely at a transcriptionally active promoter, Mol. Cell 11 (2003) [108] S. Nagel, et al., Activation of TLX3 and NKX2-5 in t(5;14)(q35;q32) T-cell acute lymphoblastic leukemia by remote 3 -BCL11B enhancers and coregulation by PU.1 and HMGA1, Cancer Res. 67 (2007) [109] J. Dostie, et al., Chromosome conformation capture carbon copy (5C): a massively parallel solution for mapping interactions between genomic elements, Genome Res. 16 (2006) [110] D.W. Deamer, M. Akeson, Nanopores and nucleic acids: prospects for ultrarapid sequencing, Trends Biotechnol. 18 (2000)

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)

Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977

More information

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Genetic Analysis. Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis Genetic Analysis Phenotype analysis: biological-biochemical analysis Behaviour under specific environmental conditions Behaviour of specific genetic configurations Behaviour of progeny in crosses - Genotype

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection Over the past three years, massively

More information

How many of you have checked out the web site on protein-dna interactions?

How many of you have checked out the web site on protein-dna interactions? How many of you have checked out the web site on protein-dna interactions? Example of an approximately 40,000 probe spotted oligo microarray with enlarged inset to show detail. Find and be ready to discuss

More information

Introduction to next-generation sequencing data

Introduction to next-generation sequencing data Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS

More information

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources 1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools

More information

The world of non-coding RNA. Espen Enerly

The world of non-coding RNA. Espen Enerly The world of non-coding RNA Espen Enerly ncrna in general Different groups Small RNAs Outline mirnas and sirnas Speculations Common for all ncrna Per def.: never translated Not spurious transcripts Always/often

More information

Core Facility Genomics

Core Facility Genomics Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray

More information

July 7th 2009 DNA sequencing

July 7th 2009 DNA sequencing July 7th 2009 DNA sequencing Overview Sequencing technologies Sequencing strategies Sample preparation Sequencing instruments at MPI EVA 2 x 5 x ABI 3730/3730xl 454 FLX Titanium Illumina Genome Analyzer

More information

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova

New generation sequencing: current limits and future perspectives. Giorgio Valle CRIBI - Università di Padova New generation sequencing: current limits and future perspectives Giorgio Valle CRIBI Università di Padova Around 2004 the Race for the 1000$ Genome started A few questions... When? How? Why? Standard

More information

Computational Genomics. Next generation sequencing (NGS)

Computational Genomics. Next generation sequencing (NGS) Computational Genomics Next generation sequencing (NGS) Sequencing technology defies Moore s law Nature Methods 2011 Log 10 (price) Sequencing the Human Genome 2001: Human Genome Project 2.7G$, 11 years

More information

Next generation DNA sequencing technologies. theory & prac-ce

Next generation DNA sequencing technologies. theory & prac-ce Next generation DNA sequencing technologies theory & prac-ce Outline Next- Genera-on sequencing (NGS) technologies overview NGS applica-ons NGS workflow: data collec-on and processing the exome sequencing

More information

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. To published results faster With proven scalability To the forefront of discovery To limitless applications

More information

PreciseTM Whitepaper

PreciseTM Whitepaper Precise TM Whitepaper Introduction LIMITATIONS OF EXISTING RNA-SEQ METHODS Correctly designed gene expression studies require large numbers of samples, accurate results and low analysis costs. Analysis

More information

DNA Sequence Analysis

DNA Sequence Analysis DNA Sequence Analysis Two general kinds of analysis Screen for one of a set of known sequences Determine the sequence even if it is novel Screening for a known sequence usually involves an oligonucleotide

More information

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism )

Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Biology 1406 Exam 3 Notes Structure of DNA Ch. 10 Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Proteins

More information

Essentials of Real Time PCR. About Sequence Detection Chemistries

Essentials of Real Time PCR. About Sequence Detection Chemistries Essentials of Real Time PCR About Real-Time PCR Assays Real-time Polymerase Chain Reaction (PCR) is the ability to monitor the progress of the PCR as it occurs (i.e., in real time). Data is therefore collected

More information

Next Generation Sequencing: Technology, Mapping, and Analysis

Next Generation Sequencing: Technology, Mapping, and Analysis Next Generation Sequencing: Technology, Mapping, and Analysis Gary Benson Computer Science, Biology, Bioinformatics Boston University gbenson@bu.edu http://tandem.bu.edu/ The Human Genome Project took

More information

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin Lindblad-Toh

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E

More information

Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION. Professor Bharat Patel Office: Science 2, 2.36 Email: b.patel@griffith.edu.

Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION. Professor Bharat Patel Office: Science 2, 2.36 Email: b.patel@griffith.edu. Lecture 1 MODULE 3 GENE EXPRESSION AND REGULATION OF GENE EXPRESSION Professor Bharat Patel Office: Science 2, 2.36 Email: b.patel@griffith.edu.au What is Gene Expression & Gene Regulation? 1. Gene Expression

More information

An Overview of DNA Sequencing

An Overview of DNA Sequencing An Overview of DNA Sequencing Prokaryotic DNA Plasmid http://en.wikipedia.org/wiki/image:prokaryote_cell_diagram.svg Eukaryotic DNA http://en.wikipedia.org/wiki/image:plant_cell_structure_svg.svg DNA Structure

More information

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium

Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium I. Introduction: Sequence based assays of transcriptomes (RNA-seq) are in wide use because of their favorable

More information

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics

The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics The Power of Next-Generation Sequencing in Your Hands On the Path towards Diagnostics The GS Junior System The Power of Next-Generation Sequencing on Your Benchtop Proven technology: Uses the same long

More information

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.

New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc. New Technologies for Sensitive, Low-Input RNA-Seq Clontech Laboratories, Inc. Outline Introduction Single-Cell-Capable mrna-seq Using SMART Technology SMARTer Ultra Low RNA Kit for the Fluidigm C 1 System

More information

How is genome sequencing done?

How is genome sequencing done? How is genome sequencing done? Using 454 Sequencing on the Genome Sequencer FLX System, DNA from a genome is converted into sequence data through four primary steps: Step One DNA sample preparation; Step

More information

Sanger Sequencing and Quality Assurance. Zbigniew Rudzki Department of Pathology University of Melbourne

Sanger Sequencing and Quality Assurance. Zbigniew Rudzki Department of Pathology University of Melbourne Sanger Sequencing and Quality Assurance Zbigniew Rudzki Department of Pathology University of Melbourne Sanger DNA sequencing The era of DNA sequencing essentially started with the publication of the enzymatic

More information

DNA Sequencing & The Human Genome Project

DNA Sequencing & The Human Genome Project DNA Sequencing & The Human Genome Project An Endeavor Revolutionizing Modern Biology Jutta Marzillier, Ph.D Lehigh University Biological Sciences November 13 th, 2013 Guess, who turned 60 earlier this

More information

Overview of Next Generation Sequencing platform technologies

Overview of Next Generation Sequencing platform technologies Overview of Next Generation Sequencing platform technologies Dr. Bernd Timmermann Next Generation Sequencing Core Facility Max Planck Institute for Molecular Genetics Berlin, Germany Outline 1. Technologies

More information

Next Generation Sequencing for DUMMIES

Next Generation Sequencing for DUMMIES Next Generation Sequencing for DUMMIES Looking at a presentation without the explanation from the author is sometimes difficult to understand. This document contains extra information for some slides that

More information

Forensic DNA Testing Terminology

Forensic DNA Testing Terminology Forensic DNA Testing Terminology ABI 310 Genetic Analyzer a capillary electrophoresis instrument used by forensic DNA laboratories to separate short tandem repeat (STR) loci on the basis of their size.

More information

Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage

Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research. March 17, 2011 Rendez-Vous Séquençage Advances in RainDance Sequence Enrichment Technology and Applications in Cancer Research March 17, 2011 Rendez-Vous Séquençage Presentation Overview Core Technology Review Sequence Enrichment Application

More information

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources Appendix 2 Molecular Biology Core Curriculum Websites and Other Resources Chapter 1 - The Molecular Basis of Cancer 1. Inside Cancer http://www.insidecancer.org/ From the Dolan DNA Learning Center Cold

More information

Automated DNA sequencing 20/12/2009. Next Generation Sequencing

Automated DNA sequencing 20/12/2009. Next Generation Sequencing DNA sequencing the beginnings Ghent University (Fiers et al) pioneers sequencing first complete gene (1972) first complete genome (1976) Next Generation Sequencing Fred Sanger develops dideoxy sequencing

More information

Introduction To Real Time Quantitative PCR (qpcr)

Introduction To Real Time Quantitative PCR (qpcr) Introduction To Real Time Quantitative PCR (qpcr) SABiosciences, A QIAGEN Company www.sabiosciences.com The Seminar Topics The advantages of qpcr versus conventional PCR Work flow & applications Factors

More information

A Primer of Genome Science THIRD

A Primer of Genome Science THIRD A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:

More information

Concepts and methods in sequencing and genome assembly

Concepts and methods in sequencing and genome assembly BCM-2004 Concepts and methods in sequencing and genome assembly B. Franz LANG, Département de Biochimie Bureau: H307-15 Courrier électronique: Franz.Lang@Umontreal.ca Outline 1. Concepts in DNA and RNA

More information

Control of Gene Expression

Control of Gene Expression Home Gene Regulation Is Necessary? Control of Gene Expression By switching genes off when they are not needed, cells can prevent resources from being wasted. There should be natural selection favoring

More information

Illumina Sequencing Technology

Illumina Sequencing Technology Illumina Sequencing Technology Highest data accuracy, simple workflow, and a broad range of applications. Introduction Figure 1: Illumina Flow Cell Illumina sequencing technology leverages clonal array

More information

Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling

Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling Lectures 1 and 8 15 February 7, 2013 This is a review of the material from lectures 1 and 8 14. Note that the material from lecture 15 is not relevant for the final exam. Today we will go over the material

More information

Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d.

Name Class Date. Figure 13 1. 2. Which nucleotide in Figure 13 1 indicates the nucleic acid above is RNA? a. uracil c. cytosine b. guanine d. 13 Multiple Choice RNA and Protein Synthesis Chapter Test A Write the letter that best answers the question or completes the statement on the line provided. 1. Which of the following are found in both

More information

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!!

DNA Replication & Protein Synthesis. This isn t a baaaaaaaddd chapter!!! DNA Replication & Protein Synthesis This isn t a baaaaaaaddd chapter!!! The Discovery of DNA s Structure Watson and Crick s discovery of DNA s structure was based on almost fifty years of research by other

More information

Profiling of non-coding RNA classes Gunter Meister

Profiling of non-coding RNA classes Gunter Meister Profiling of non-coding RNA classes Gunter Meister RNA Biology Regensburg University Universitätsstrasse 31 93053 Regensburg Overview Classes of non-coding RNAs Profiling strategies Validation Protein-RNA

More information

Recombinant DNA and Biotechnology

Recombinant DNA and Biotechnology Recombinant DNA and Biotechnology Chapter 18 Lecture Objectives What Is Recombinant DNA? How Are New Genes Inserted into Cells? What Sources of DNA Are Used in Cloning? What Other Tools Are Used to Study

More information

Mir-X mirna First-Strand Synthesis Kit User Manual

Mir-X mirna First-Strand Synthesis Kit User Manual User Manual Mir-X mirna First-Strand Synthesis Kit User Manual United States/Canada 800.662.2566 Asia Pacific +1.650.919.7300 Europe +33.(0)1.3904.6880 Japan +81.(0)77.543.6116 Clontech Laboratories, Inc.

More information

Micro RNAs: potentielle Biomarker für das. Blutspenderscreening

Micro RNAs: potentielle Biomarker für das. Blutspenderscreening Micro RNAs: potentielle Biomarker für das Blutspenderscreening micrornas - Background Types of RNA -Coding: messenger RNA (mrna) -Non-coding (examples): Ribosomal RNA (rrna) Transfer RNA (trna) Small nuclear

More information

Polyacrylamide gels have a pore size of only a few nm

Polyacrylamide gels have a pore size of only a few nm Gel electrophoresis Gel electrophoresis Separates - DNA fragments (single nucleotide resolution) - proteins - Protein-DNA complexes can be analyzed by gel electrophoresis native - denatured gel-electrophoresis

More information

- In 1976 1977, Allan Maxam and walter Gilbert devised the first method for sequencing DNA fragments containing up to ~ 500 nucleotides.

- In 1976 1977, Allan Maxam and walter Gilbert devised the first method for sequencing DNA fragments containing up to ~ 500 nucleotides. DNA Sequencing - DNA sequencing includes several methods and technologies that are used for determining the order of the nucleotide bases adenine, guanine, cytosine, and thymine in a molecule of DNA. -

More information

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat

Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat Bioinformatique et Séquençage Haut Débit, Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat 1 RNA Transcription to RNA and subsequent

More information

Data Analysis for Ion Torrent Sequencing

Data Analysis for Ion Torrent Sequencing IFU022 v140202 Research Use Only Instructions For Use Part III Data Analysis for Ion Torrent Sequencing MANUFACTURER: Multiplicom N.V. Galileilaan 18 2845 Niel Belgium Revision date: August 21, 2014 Page

More information

Single Nucleotide Polymorphisms (SNPs)

Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Polymorphisms (SNPs) Additional Markers 13 core STR loci Obtain further information from additional markers: Y STRs Separating male samples Mitochondrial DNA Working with extremely degraded

More information

Control of Gene Expression

Control of Gene Expression Control of Gene Expression What is Gene Expression? Gene expression is the process by which informa9on from a gene is used in the synthesis of a func9onal gene product. What is Gene Expression? Figure

More information

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.

2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99. 1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. True 2. True or False? The sequence

More information

Technical Note. Roche Applied Science. No. LC 18/2004. Assay Formats for Use in Real-Time PCR

Technical Note. Roche Applied Science. No. LC 18/2004. Assay Formats for Use in Real-Time PCR Roche Applied Science Technical Note No. LC 18/2004 Purpose of this Note Assay Formats for Use in Real-Time PCR The LightCycler Instrument uses several detection channels to monitor the amplification of

More information

School of Nursing. Presented by Yvette Conley, PhD

School of Nursing. Presented by Yvette Conley, PhD Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression

More information

Biotechnology: DNA Technology & Genomics

Biotechnology: DNA Technology & Genomics Chapter 20. Biotechnology: DNA Technology & Genomics 2003-2004 The BIG Questions How can we use our knowledge of DNA to: diagnose disease or defect? cure disease or defect? change/improve organisms? What

More information

Structure and Function of DNA

Structure and Function of DNA Structure and Function of DNA DNA and RNA Structure DNA and RNA are nucleic acids. They consist of chemical units called nucleotides. The nucleotides are joined by a sugar-phosphate backbone. The four

More information

Gene Expression Assays

Gene Expression Assays APPLICATION NOTE TaqMan Gene Expression Assays A mpl i fic ationef ficienc yof TaqMan Gene Expression Assays Assays tested extensively for qpcr efficiency Key factors that affect efficiency Efficiency

More information

Control of Gene Expression

Control of Gene Expression Control of Gene Expression (Learning Objectives) Explain the role of gene expression is differentiation of function of cells which leads to the emergence of different tissues, organs, and organ systems

More information

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes

More information

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation

Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic

More information

Complex multicellular organisms are produced by cells that switch genes on and off during development.

Complex multicellular organisms are produced by cells that switch genes on and off during development. Home Control of Gene Expression Gene Regulation Is Necessary? By switching genes off when they are not needed, cells can prevent resources from being wasted. There should be natural selection favoring

More information

Translation Study Guide

Translation Study Guide Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to

More information

Bioinformatics I, WS 09-10, D. Huson, January 27, 2010 145

Bioinformatics I, WS 09-10, D. Huson, January 27, 2010 145 Bioinformatics I, WS 09-10, D. Huson, January 27, 2010 145 10 DNA sequencing This exposition is very closely based on the following sources, which are all recommended reading: 1. Clyde A. Hutchison, DNA

More information

Thermo Scientific DyNAmo cdna Synthesis Kit for qrt-pcr Technical Manual

Thermo Scientific DyNAmo cdna Synthesis Kit for qrt-pcr Technical Manual Thermo Scientific DyNAmo cdna Synthesis Kit for qrt-pcr Technical Manual F- 470S 20 cdna synthesis reactions (20 µl each) F- 470L 100 cdna synthesis reactions (20 µl each) Table of contents 1. Description...

More information

Sequencing the Human Genome

Sequencing the Human Genome Revised and Updated Edvo-Kit #339 Sequencing the Human Genome 339 Experiment Objective: In this experiment, students will read DNA sequences obtained from automated DNA sequencing techniques. The data

More information

The Biotechnology Education Company

The Biotechnology Education Company EDVTEK P.. Box 1232 West Bethesda, MD 20827-1232 The Biotechnology 106 EDV-Kit # Principles of DNA Sequencing Experiment bjective: The objective of this experiment is to develop an understanding of DNA

More information

CCR Biology - Chapter 9 Practice Test - Summer 2012

CCR Biology - Chapter 9 Practice Test - Summer 2012 Name: Class: Date: CCR Biology - Chapter 9 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Genetic engineering is possible

More information

Genome Sequencer 20 System. First to the Finish. www.roche-applied-science.com

Genome Sequencer 20 System. First to the Finish. www.roche-applied-science.com Genome Sequencer 20 System First to the Finish www.roche-applied-science.com Technology 4-5 Process Steps 8-11 DNA Library Preparation 8 empcr Amplification 9 Sequencing-by-Synthesis 10-11 The Genome Sequencer

More information

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript

More information

Techniques in Molecular Biology (to study the function of genes)

Techniques in Molecular Biology (to study the function of genes) Techniques in Molecular Biology (to study the function of genes) Analysis of nucleic acids: Polymerase chain reaction (PCR) Gel electrophoresis Blotting techniques (Northern, Southern) Gene expression

More information

Personal Genome Sequencing with Complete Genomics Technology. Maido Remm

Personal Genome Sequencing with Complete Genomics Technology. Maido Remm Personal Genome Sequencing with Complete Genomics Technology Maido Remm 11 th Oct 2010 Three related papers 1. Describing the Complete Genomics technology Drmanac et al., Science 1 January 2010: Vol. 327.

More information

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples DATA Sheet Single-Cell DNA Sequencing with the C 1 Single-Cell Auto Prep System Reveal hidden populations and genetic diversity within complex samples Single-cell sensitivity Discover and detect SNPs,

More information

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University Genotyping by sequencing and data analysis Ross Whetten North Carolina State University Stein (2010) Genome Biology 11:207 More New Technology on the Horizon Genotyping By Sequencing Timeline 2007 Complexity

More information

Basic Concepts of DNA, Proteins, Genes and Genomes

Basic Concepts of DNA, Proteins, Genes and Genomes Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate

More information

ChIP TROUBLESHOOTING TIPS

ChIP TROUBLESHOOTING TIPS ChIP TROUBLESHOOTING TIPS Creative Diagnostics Abstract ChIP dissects the spatial and temporal dynamics of the interactions between chromatin and its associated factors CD Creative Diagnostics info@creative-

More information

1. Molecular computation uses molecules to represent information and molecular processes to implement information processing.

1. Molecular computation uses molecules to represent information and molecular processes to implement information processing. Chapter IV Molecular Computation These lecture notes are exclusively for the use of students in Prof. MacLennan s Unconventional Computation course. c 2013, B. J. MacLennan, EECS, University of Tennessee,

More information

MUTATION, DNA REPAIR AND CANCER

MUTATION, DNA REPAIR AND CANCER MUTATION, DNA REPAIR AND CANCER 1 Mutation A heritable change in the genetic material Essential to the continuity of life Source of variation for natural selection New mutations are more likely to be harmful

More information

Introduction To Epigenetic Regulation: How Can The Epigenomics Core Services Help Your Research? Maria (Ken) Figueroa, M.D. Core Scientific Director

Introduction To Epigenetic Regulation: How Can The Epigenomics Core Services Help Your Research? Maria (Ken) Figueroa, M.D. Core Scientific Director Introduction To Epigenetic Regulation: How Can The Epigenomics Core Services Help Your Research? Maria (Ken) Figueroa, M.D. Core Scientific Director Gene expression depends upon multiple factors Gene Transcription

More information

RT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial

RT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial RT 2 Profiler PCR Array: Web-Based Data Analysis Tutorial Samuel J. Rulli, Jr., Ph.D. qpcr-applications Scientist Samuel.Rulli@QIAGEN.com Pathway Focused Research from Sample Prep to Data Analysis! -2-

More information

Transcription and Translation of DNA

Transcription and Translation of DNA Transcription and Translation of DNA Genotype our genetic constitution ( makeup) is determined (controlled) by the sequence of bases in its genes Phenotype determined by the proteins synthesised when genes

More information

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,

More information

restriction enzymes 350 Home R. Ward: Spring 2001

restriction enzymes 350 Home R. Ward: Spring 2001 restriction enzymes 350 Home Restriction Enzymes (endonucleases): molecular scissors that cut DNA Properties of widely used Type II restriction enzymes: recognize a single sequence of bases in dsdna, usually

More information

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing

More information

DNA (genetic information in genes) RNA (copies of genes) proteins (functional molecules) directionality along the backbone 5 (phosphate) to 3 (OH)

DNA (genetic information in genes) RNA (copies of genes) proteins (functional molecules) directionality along the backbone 5 (phosphate) to 3 (OH) DNA, RNA, replication, translation, and transcription Overview Recall the central dogma of biology: DNA (genetic information in genes) RNA (copies of genes) proteins (functional molecules) DNA structure

More information

Protein Synthesis How Genes Become Constituent Molecules

Protein Synthesis How Genes Become Constituent Molecules Protein Synthesis Protein Synthesis How Genes Become Constituent Molecules Mendel and The Idea of Gene What is a Chromosome? A chromosome is a molecule of DNA 50% 50% 1. True 2. False True False Protein

More information

Real-Time PCR Vs. Traditional PCR

Real-Time PCR Vs. Traditional PCR Real-Time PCR Vs. Traditional PCR Description This tutorial will discuss the evolution of traditional PCR methods towards the use of Real-Time chemistry and instrumentation for accurate quantitation. Objectives

More information

Introduction to SAGEnhaft

Introduction to SAGEnhaft Introduction to SAGEnhaft Tim Beissbarth October 13, 2015 1 Overview Serial Analysis of Gene Expression (SAGE) is a gene expression profiling technique that estimates the abundance of thousands of gene

More information

AP BIOLOGY 2009 SCORING GUIDELINES

AP BIOLOGY 2009 SCORING GUIDELINES AP BIOLOGY 2009 SCORING GUIDELINES Question 4 The flow of genetic information from DNA to protein in eukaryotic cells is called the central dogma of biology. (a) Explain the role of each of the following

More information

Interaktionen von RNAs und Proteinen

Interaktionen von RNAs und Proteinen Sonja Prohaska Computational EvoDevo Universitaet Leipzig June 9, 2015 Studying RNA-protein interactions Given: target protein known to bind to RNA problem: find binding partners and binding sites experimental

More information

Genome-wide measurements of protein-dna interaction by chromatin immunoprecipitation

Genome-wide measurements of protein-dna interaction by chromatin immunoprecipitation Genome-wide measurements of protein-dna interaction by chromatin immunoprecipitation D. Puthier. laboratoire INSERM, Aix-Marseille Université, TAGC/INSERM U928, Parc Scientifique de Luminy case 928 Outline

More information

Outline. MicroRNA Bioinformatics. microrna biogenesis. short non-coding RNAs not considered in this lecture. ! Introduction

Outline. MicroRNA Bioinformatics. microrna biogenesis. short non-coding RNAs not considered in this lecture. ! Introduction Outline MicroRNA Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology (CMB) Karolinska Institutet! Introduction! microrna target site prediction! Useful resources 2 short non-coding RNAs

More information

13.4 Gene Regulation and Expression

13.4 Gene Regulation and Expression 13.4 Gene Regulation and Expression Lesson Objectives Describe gene regulation in prokaryotes. Explain how most eukaryotic genes are regulated. Relate gene regulation to development in multicellular organisms.

More information

1 Mutation and Genetic Change

1 Mutation and Genetic Change CHAPTER 14 1 Mutation and Genetic Change SECTION Genes in Action KEY IDEAS As you read this section, keep these questions in mind: What is the origin of genetic differences among organisms? What kinds

More information

HiPer RT-PCR Teaching Kit

HiPer RT-PCR Teaching Kit HiPer RT-PCR Teaching Kit Product Code: HTBM024 Number of experiments that can be performed: 5 Duration of Experiment: Protocol: 4 hours Agarose Gel Electrophoresis: 45 minutes Storage Instructions: The

More information

Gene Models & Bed format: What they represent.

Gene Models & Bed format: What they represent. GeneModels&Bedformat:Whattheyrepresent. Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically,

More information

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing

Bioruptor NGS: Unbiased DNA shearing for Next-Generation Sequencing STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAACT GTGCACT GTGAACT STGAAC STGAAC GTGCAC GTGAAC Wouter Coppieters Head of the genomics core facility GIGA center, University of Liège Bioruptor NGS: Unbiased DNA

More information

History of DNA Sequencing & Current Applications

History of DNA Sequencing & Current Applications History of DNA Sequencing & Current Applications Christopher McLeod President & CEO, 454 Life Sciences, A Roche Company IMPORTANT NOTICE Intended Use Unless explicitly stated otherwise, all Roche Applied

More information

Biological Sciences Initiative. Human Genome

Biological Sciences Initiative. Human Genome Biological Sciences Initiative HHMI Human Genome Introduction In 2000, researchers from around the world published a draft sequence of the entire genome. 20 labs from 6 countries worked on the sequence.

More information