Decoding the human genome sequence
© 2000 Oxford University Press
Human Molecular Genetics, 2000, Vol. 9, No. 16, Review

David R. Bentley
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
Received 11 August 2000; Accepted 17 August 2000

The year 2000 is marked by the production of the sequence of the human genome. A working draft of high quality sequence covering 90% of the genome has been determined and a quarter is in finished form, including the first two completed chromosomes. All sequence data from the project are made freely available to the community via the Internet, for further analysis and exploitation. The challenge which lies ahead is to decipher the information. Knowledge of the human genome sequence will enable us to understand how the genetic information determines the development, structure and function of the human body. We will be able to explore how variations within our DNA sequence cause disease, how they affect our interaction with our environment and ultimately to develop new and effective ways to improve human health.

THE IMPORTANCE OF SEQUENCING WHOLE GENOMES

The ability to undertake DNA sequencing on a large scale started a revolution in biology, as it became possible to determine the complete DNA sequence of any organism and therefore obtain a full description of genes and other important biological information stored in the genome. For the first time, it was possible to define the complete set of proteins required for a particular life form, to make full comparisons of protein sets between different species, to discover the basis of their similarities and differences and to explore the evolutionary relationship between them. For example, the 15 Mb genome sequence of baker's yeast Saccharomyces cerevisiae, completed in 1996, contains >6000 genes (1,2). These encode all the functions required for a eukaryotic cell.
The 99 Mb genome sequence of the free-living soil nematode worm Caenorhabditis elegans, completed 2 years later, contains genes (3). This constitutes a complete gene set for a multicellular organism, encoding proteins involved in basic eukaryotic cell functions and also many others involved, for example, in development, cell–cell interactions, motility, feeding and reproduction. A recent cross comparison of the complete gene sets (4) has revealed that 23% of the proteins encoded by yeast genes have apparent homologues in the nematode worm, reflecting functions common to both organisms. A closer examination of matches between individual protein domains illustrates another level of homology, with 41% of yeast proteins matching at least one protein domain in a nematode protein and 36% being homologous to a representative mammalian protein set (4). It is evident that diversity of function has arisen in evolution as a result of existing domains being used in new combinations, as well as from the generation of entirely new proteins.

These model genome projects used a strategy by which a physical map was constructed of overlapping bacterial or yeast artificial chromosome (YAC) clones representing the entire genomic DNA of the organism (5–7). A minimum overlapping set of clones (the tile path) was then chosen for sequencing. Each clone was then subjected to complete sequencing, comprising an initial random shotgun phase followed by a phase of directed finishing, resulting in a final reference sequence with an accuracy of >99.99% (8). Completion of these model genomes thus provided many important technological and biological insights that have since been applied to the human genome.

The extraordinary progress of the Human Genome Project was made possible by the remarkable international collaboration which developed from the early 1990s.
Fostered by the close cooperation of multiple national funding agencies, 20 centres in six countries have worked together (see Appendix), co-ordinating their efforts by sharing technology, resources and information which were freely available throughout the project. Sequencing in many laboratories was also co-ordinated substantially using the genome map itself. This helped to minimize unnecessary duplication and to allow both large and small partners to contribute effectively.

The development of new automated sequencing technology, coupled with the adoption of streamlined approaches to generating genome sequence information on a large scale, led to a dramatic acceleration of the programme during the last 2 years, driven by a growing awareness of the enormous value of gaining a first view of the human genome. The result is the emergence of the working draft. Although incomplete, it provides extensive coverage of most of the human genes. The working draft is therefore both an advanced intermediate on the pathway to the production of the finished reference sequence and a product of immediate value in accelerating studies of individual genes, associations with disease and the study of human sequence variation. The working draft is a dynamic product, subject to continual refinement as new data are added.

A brief summary of the strategy, milestones and status of the international programme is outlined below, based on current information that is publicly available. A central list of websites providing information is available at (
). A full analysis is due for publication at the end of

STRATEGY

An early milestone in tackling the human genome was the construction of a genome-wide genetic linkage map of polymorphic microsatellites, which provided a first framework of 2000 unique landmarks ordered at high odds (>1000:1) across the genome (9). The use of overlapping YACs (10) and, later, radiation hybrid (RH) mapping (11,12) resulted in construction of maps of much higher densities of landmarks for each of the human chromosomes (13,14–20). The RH mapping work included another milestone, the human gene map (21), which was constructed using gene-based markers derived from expressed sequence tags (ESTs) (22) selected from non-redundant clusters, or Unigenes ( Unigene ).

To provide the starting material for sequencing, a complete physical map of overlapping clones was constructed using libraries of large insert ( kb) P1-derived artificial chromosomes (PACs) (23) or BACs (24). Clones were assembled into contigs on the basis of overlaps identified by pairwise comparison of restriction fingerprints (25,26) using data generated either by fingerprinting whole genomic libraries or chromosome-specific clone collections. Many overlaps were also identified directly by screening clones with markers from the landmark maps as part of the initial contig assembly process. More recently, large-scale sequencing of BAC clone ends has provided an additional resource to enable new clones to be added to the map directly by matching the BAC end sequences to draft or finished sequences. Additional mapping information on the order, orientation and gap size between contigs has also been established using clones as probes in fluorescence in situ hybridization (FISH) analysis of interphase nuclei or extended DNA fibres.
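The fingerprint-based contig assembly described above can be illustrated with a minimal sketch. It is not the software used by the mapping centres: the band-matching tolerance, the overlap threshold, the function names (`shared_bands`, `build_contigs`) and the clone data are all hypothetical, chosen only to show how pairwise comparison of restriction fragment sizes can group clones into contigs.

```python
from itertools import combinations


def shared_bands(fp_a, fp_b, tolerance=2):
    """Count fragment sizes in fp_a that pair with a distinct size in fp_b.

    Two bands are treated as the same fragment when their sizes differ by
    at most `tolerance` (an assumed measurement error, arbitrary units).
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def build_contigs(fingerprints, min_fraction=0.5, tolerance=2):
    """Group clones into contigs by single-linkage on pairwise overlaps.

    Two clones are joined when the shared bands make up at least
    `min_fraction` of the smaller fingerprint (an assumed threshold).
    """
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        n = shared_bands(fingerprints[a], fingerprints[b], tolerance)
        smaller = min(len(fingerprints[a]), len(fingerprints[b]))
        if smaller and n / smaller >= min_fraction:
            parent[find(a)] = find(b)

    contigs = {}
    for name in fingerprints:
        contigs.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical fingerprints (fragment sizes) for four clones.
example = {
    "cloneA": [100, 250, 400, 620, 800],
    "cloneB": [401, 619, 801, 950, 1200],
    "cloneC": [949, 1201, 1500, 1800],
    "cloneD": [3000, 3300, 3600],
}
print(build_contigs(example))
```

In this toy data set, cloneA and cloneC share no bands but are linked through cloneB, which is the sense in which a contig is built from a chain of overlaps. Real fingerprint analysis must also weigh the probability that bands match by chance, which this sketch ignores.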
On 15 June 2000, the BAC clone map contained > clones and comprised 1743 contigs representing an estimated 97–98% of the genome, from which clones formed the tile path (or map of accessioned clones) (details are listed at ).

To generate the working draft DNA sequence, each BAC or PAC clone in the tile path was subjected to an initial shotgun sequencing phase in which sufficient random reads were assembled into a consensus to provide at least 3-fold coverage of each sequenced base at a quality score [Q value, calculated by the base-calling program PHRED (27,28)] of 20 or higher (29). This provided 95% coverage of the insert. At present, each clone is being taken to a full depth shotgun (typically 8- to 10-fold), and this is being followed by a directed finishing phase in which all gaps are closed, ambiguities are resolved, and the entire finished sequence is checked to ensure an accuracy of >99.99% before submission to the public databases (see Appendix, nos 3, 4 and 12).

The availability of draft sequence for every clone in the tile path allows re-calculation of the entire map. Overlaps between all clones can now be based on matching sequences, which provide much more information than the earlier overlap analysis based on fingerprint or landmark data. A set of fragments of accessioned sequence has been merged together for each chromosome and the order and orientation of non-overlapping fragments have been determined where possible using additional information (see ). This will continue to be refined as more clone sequences are completed. This dataset enables re-assessment of the order of all landmarks relative to the previous maps, thus providing independent validation of long-range order in the working draft.

The status of the project can be assessed based on the amount of sequence that is available and comparing this figure to the estimated size of the genome.
Of course, any estimate is necessarily an approximation, as the true size of the genome is unknown; this will only be determined when all remaining gaps in the genome sequence are measured [for example using the approaches taken when finishing chromosomes 21 and 22 (see below)]. In addition, analysis of the existing sequence is being refined continually to confirm new overlaps, or to define genomic duplications and to make other changes which affect estimates of sequence coverage. The current estimate is that >90% of the euchromatic human genome sequence is publicly available and 23% is finished [as of 31 August 2000: see current progress at ( )].

The identification of genes and other important biological features in the sequence requires further analysis. This is carried out on each BAC or PAC as it is released, whether in draft or in finished form. A number of data analysis centres are involved in providing up-to-date annotation of the rapidly evolving draft sequence, generally using fast, automatic analyses. These sites provide consistent annotation and interfaces to the data. The Ensembl database, based at the Sanger Centre and the European Bioinformatics Institute, annotates known genes by aligning mRNA sequence to genomic DNA using the EST_GENOME program. Further genes are detected by predicting candidate gene features, which are then confirmed using homology searches against EST and protein sequence databases (see ). Analysis of a pre-release of the working draft (24 May 2000, containing 2.2 Gb of sequence data) at Ensembl resulted in detection of predicted gene features; extrapolating from this figure to the amount of sequence that is now available, this figure would increase to genes (see below for discussion of gene number estimates). The National Center for Biotechnology Information provides a Map Viewer, which displays the available human genome sequence data and includes the location of genes, polymorphisms and other features ( ).
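The shotgun depths and quality threshold quoted in this section can be related to one another with two standard formulas. The sketch below is an illustration only: it assumes the usual PHRED definition of a quality value, Q = -10 log10(P_error), and a simple Poisson (Lander-Waterman style) model in which reads fall uniformly at random on the insert. The Poisson model and the function names are assumptions introduced here, not part of the project's own pipeline.

```python
import math


def phred_error_probability(q):
    """Per-base error probability implied by a PHRED quality value q."""
    return 10 ** (-q / 10)


def expected_fraction_covered(fold_coverage):
    """Expected fraction of bases hit by at least one read.

    Assumes reads are placed uniformly at random, so the depth at any
    base is Poisson distributed with mean `fold_coverage`.
    """
    return 1 - math.exp(-fold_coverage)


# Q20 corresponds to an error probability of 1 in 100 per base call.
print(f"Q20 error probability: {phred_error_probability(20):.4f}")

# Draft depth (3-fold) versus full shotgun depth (8- to 10-fold).
for depth in (3, 8, 10):
    print(f"{depth}-fold: {expected_fraction_covered(depth):.4%} of bases covered")
```

Under this idealized model, 3-fold coverage leaves about 5% of the insert unsampled, which agrees with the 95% figure quoted for the draft phase, while 8- to 10-fold coverage leaves well under 0.1% unsampled. Real clones deviate from the model because of cloning bias and repeats, which is one reason a directed finishing phase is still needed.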
WHOLE GENOME SHOTGUNS

The recent advances in sequencing technology and throughput have led researchers to reconsider the possibility of shotgun sequencing large and complex genomes directly, thus bypassing the mapping stage (30,31). The potential advantages of speed and simplicity of the strategy are countered by the difficulties faced in the scale of computing analysis required for assembly of the DNA fragments, plus the many false joins which would inevitably arise from assembly of an entire complex genome sequence containing a range of repetitive sequences (32). Assembly of 3.2 million reads (12.8× coverage) of the 120 Mb euchromatic genome of the fruit fly Drosophila melanogaster resulted in an assembly covering 115 Mb of the 120 Mb euchromatic region, with 1630 gaps (33). Closure of the gaps and correction of the remaining misassemblies are
being done using the scaffold of sequenced BAC clones which was also available and had been aligned to the genome sequence (33). The finishing process is thus modelled on the earlier experience of the nematode genome project.

The experience gained in the D.melanogaster whole genome shotgun assembly was used as a basis for a human whole genome assembly. There are two important differences between the two analyses, however, which would be expected to have a marked effect on the efficiency of the sequence assembly process. First, the human genome contains a diverse collection of repetitive sequences comprising 50% of the total. The experience gained in Drosophila is therefore not necessarily representative of the human (or mouse) genomes. Second, the increased size of the human genome (25 times that of the fruit fly) would result in a greatly increased scale of computational analysis.

The current availability of a complete working draft of the human genome, assembled on a clone-by-clone basis, however, obviates the need for a whole genome shotgun assembly to be done in isolation. Most of the data from a whole genome shotgun can be aligned with the mapped human sequence that is currently available. The remaining unmatched sequences would be the result of either failures in the matching process, or fragments of human DNA that were represented in the small insert libraries used for the whole genome shotgun, but not represented in the mapped BACs used so far to generate the working draft. Testing of these hypotheses will require access to merged sequence from both projects to enable the appropriate analysis to be carried out.

FINISHING HUMAN CHROMOSOMES

In December 1999, the sequence of the euchromatic part of chromosome 22q (an acrocentric chromosome) was finished (34) and 33.5 Mb was represented in 10 map contigs, of which the largest was 25 Mb.
The nine gaps in the clone map (plus one small gap in the sequence, bridged by a clone) covered a total of just over 1 Mb, and the total coverage obtained was therefore 97%. The gaps were not closed after exhaustive screening of BAC and PAC libraries; work is continuing using other cloning systems. At least three of the gaps were deemed unstable in BACs: fibre FISH analysis of several BAC clones showed that each one contained sequence on both sides of a gap but was missing a central portion, which may have been lost during cloning or in subsequent bacterial culture.

Chromosome 21, another acrocentric chromosome, was completed 5 months later (35); 33.8 Mb of 21q was represented in just three contigs and the remaining three gaps in the clone map (plus seven small sequence gaps) accounted for only 0.1 Mb. The total coverage was therefore 99.7% of the euchromatin. In addition, 0.28 Mb of the heterochromatic sequence of 21p was also determined.

Both these chromosomes were finished to an accuracy of >99.99%, and annotation of the sequence revealed a marked contrast in gene content. Chromosomes 22 and 21 were found to contain 545 and 225 genes, respectively, plus 134 and 59 pseudogenes, respectively. Differences in base composition and repeat sequence content between the two chromosomes reflected this: the gene-rich chromosome 22 had a higher GC content, more SINEs and slightly fewer LINEs than the gene-poor chromosome 21.

HOW MANY GENES IN THE GENOME?

It is important to realize that we will not know the number of genes in the human genome when the genome sequence is finished. Finding all the genes in the genome is dependent on the resolution of methods to analyse the sequence and to identify genes based on characteristic features such as recognizable motifs, homology to other genes, matches to mRNA sequences and ab initio predictions. In many cases, hypothetical gene structures also require experimental validation.
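The coverage and gene-content figures quoted above for chromosomes 22 and 21 can be checked with simple arithmetic, which also makes the contrast in gene density explicit. The sketch below assumes that coverage is the sequenced length divided by the sequenced length plus the estimated gaps, and it rounds the chromosome 22 gap total ("just over 1 Mb") to 1.0 Mb; both are assumptions made here for illustration, and the function names are hypothetical.

```python
def coverage_fraction(sequenced_mb, gap_mb):
    """Fraction of the euchromatic region represented in sequence.

    Assumes coverage = sequenced / (sequenced + estimated gap length).
    """
    return sequenced_mb / (sequenced_mb + gap_mb)


def genes_per_mb(gene_count, sequenced_mb):
    """Annotated genes (excluding pseudogenes) per megabase of sequence."""
    return gene_count / sequenced_mb


# Figures quoted in the text; the 1.0 Mb gap for chromosome 22 is a rounding
# of "just over 1 Mb" and is therefore approximate.
chromosomes = {
    "22": {"sequenced_mb": 33.5, "gap_mb": 1.0, "genes": 545},
    "21": {"sequenced_mb": 33.8, "gap_mb": 0.1, "genes": 225},
}

for name, c in chromosomes.items():
    cov = coverage_fraction(c["sequenced_mb"], c["gap_mb"])
    density = genes_per_mb(c["genes"], c["sequenced_mb"])
    print(f"chromosome {name}: coverage {cov:.1%}, {density:.1f} genes/Mb")
```

On these figures chromosome 22 carries roughly two and a half times as many annotated genes per megabase as chromosome 21, so any genome-wide estimate that scales up from a single chromosome depends heavily on which chromosome is taken as representative.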
With the sequence of the human genome still in draft form, and our knowledge of the repertoire of gene features still limited, estimates of the total number of human genes are necessarily approximate. An early estimate of human genes was based on the range of different sequences found in a population of ESTs sequenced at random from tissues. A similar figure (80 000) was derived from measuring the total number of CpG islands in the genome, and extrapolating on the basis that 56% of known genes were associated with CpG islands (36). The total gene number based on CpG islands may be an overestimate if some CpG islands are not associated with genes, or if there are cases where more than one CpG island is associated with the same gene.

The availability of much larger EST datasets led to greater divergence in gene number estimates, with the most recent examples ranging from (37) to (38). Two problems confound gene number estimates that are based on the complexity of EST collections. First, sampling of ESTs is biased by the tissue sources selected and the depth to which each one is sampled. Therefore, genes which are expressed either at low abundance (i.e. lower than the depth of sampling of the library sequencing), or only in inaccessible tissues or developmental stages, will be under-represented or absent from current datasets. This will lead to an underestimate of the complexity of the EST dataset. Second, EST datasets are composed of incomplete sequences of mRNA, and independent ESTs which are derived from the same mRNA may not overlap on the basis of the available sequence data, thus leading to an overestimate of complexity of the EST collection. Ewing and Green (38) took both these factors into account in their analysis, by using only 3′ EST sequences and by normalizing their results using the annotated sequence of chromosome 22.

Two other analyses have yielded similar estimates.
First, the identification of evolutionarily conserved regions ('ecores') between a random set of genomic sequences of the puffer fish Tetraodon nigroviridis and the human genome resulted in estimates of human genes. This study was also normalized with respect to the annotated genes on chromosome 22 (39). Second, a direct extrapolation from the annotated genes on chromosome 22 alone, or chromosomes 21 and 22 combined (assuming that 35 Mb sequence is 1.1% of the genome) leads to estimates of or genes, respectively, excluding pseudogenes. These extrapolations assume that either chromosome 22, or the combination of chromosomes 21 and 22, respectively, is representative of the rest of the human genome. These last estimates are significantly lower than previously assumed views of the number of genes in the human genome.

One of the most interesting insights to emerge from comparing complete gene sets in model organisms to date is that gene
number is not necessarily a good measure of complexity and this trend may be borne out in the human genome. For example, the fruit fly would be considered more anatomically complex, with 10 times more cells, than the nematode (which has 1000 cells) and the fruit fly also undergoes a more complex developmental process than the nematode worm. Yet the fruit fly genome contains only genes, compared with the genes found in the nematode genome. Instead, it appears that biological complexity is achieved in other ways. Relatively few gene products may determine relatively complex biochemical and developmental processes. Greater functional complexity and diversity may result from a relatively small number of novel regulatory proteins working in a variety of combinations, for example, and from the existence of multiple protein isoforms.

In practice, the estimates of gene number are most likely to stabilize when the genome sequences of multiple vertebrate genomes are produced and aligned to the human sequence (notably the mouse sequence, for example, is projected to be available within 2 years). This will enable the annotation of genes to be done accurately based on conserved sequences. The location of putative exons can be correlated with functional motifs, such as exon–intron boundaries, and matches to known mRNA and ESTs, and gene structures can be accepted or rejected based on the occurrence of an open reading frame.

GENETIC RECOMBINATION IN THE HUMAN GENOME

The earliest maps of the human genome have relied on the measurement of genetic recombination occurring between two loci in the genome. Genetic linkage maps have provided a view of the order and distance between landmarks (polymorphic loci) that is derived by scoring the number of recombinations observed between two loci in a given number of meioses.
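The scoring of recombinations between two loci can be turned into a genetic distance in a few lines. The sketch below is illustrative only: the counts are invented, the function names are hypothetical, and the Haldane map function (which assumes no crossover interference) is one common choice of conversion that the text itself does not specify.

```python
import math


def recombination_fraction(recombinants, meioses):
    """Observed fraction of meioses in which the two loci recombined."""
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    return recombinants / meioses


def haldane_distance_cm(theta):
    """Map distance in centimorgans under the Haldane map function.

    d = -50 * ln(1 - 2 * theta), valid for 0 <= theta < 0.5.
    Assumes crossovers occur independently (no interference).
    """
    if not 0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50 * math.log(1 - 2 * theta)


# Hypothetical example: 5 recombinants scored in 100 informative meioses.
theta = recombination_fraction(5, 100)
print(f"theta = {theta:.3f}, distance = {haldane_distance_cm(theta):.2f} cM")
```

For closely spaced markers the map distance in centimorgans is close to 100 times the recombination fraction; the correction matters only as the fraction grows, because multiple crossovers between the loci then go unobserved.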
For mapping applications, it has generally been assumed that the physical distance (in base pairs) is proportional to the genetic distance (in centimorgans) between the loci. However, it is well known that certain regions of the genome exhibit a higher frequency of recombination than others. The frequency of genetic recombination can now be compared with the physical distance between any two markers, given an accurate measurement of the physical distance (or contiguous sequence) between them. This has been demonstrated on chromosome 22, by plotting the genetic distance between microsatellite markers in the genetic map relative to their physical separation along the finished chromosome sequence. The results showed that the frequency of recombination along the chromosome varied (within the limits of resolution of the study) between 1.1 cM/Mb (over most of the chromosome: 28.2 Mb) and 6.2 cM/Mb (averaged over four regions totalling 5.2 Mb) (34). The analysis illustrates how to identify potential recombination hotspots. It may require a much higher resolution of genetic analysis, with many more polymorphic markers (see below), but it is clearly possible to identify minimum regions of genomic sequence which may play a critical role in enhanced recombination and to use knowledge of the sequence to design experimental tests of this hypothesis.

FINDING CONSERVED REGULATORY SEQUENCE ELEMENTS

A wealth of information can be obtained from comparisons between genome sequences, to identify the features which have been conserved during evolution and are therefore likely to be important for survival. Comparisons made between distantly related species (in different phyla) yield information chiefly on gene homologues, as discussed above. Comparison of more closely related species can reveal similarities in gene order (synteny), as seen for example in the extensive synteny over large genomic distances between man and mouse.
Great similarity between human and murine sequences is also evident at the sequence level. A recent survey of 77 orthologous mouse and human gene pairs, for example, revealed an average of 82% identity in coding regions (40), in close agreement with an earlier study comparing cDNA sequences (41). The gene pair study also illustrated the frequent occurrence of an extensive degree of homology in non-coding regions. Fifty-six per cent of 3′-untranslated regions (UTRs) and 50% of 5′-UTRs had >60% sequence identity. Notably, this level of sequence identity was also observed in the 100 bases immediately upstream of the promoters in 70% of the gene pairs and, over larger blocks, generally conservation was found more often in the upstream sequences (36%) than in introns (23%) (40).

The systematic identification of conserved sequences in non-coding regions offers potentially an excellent strategy for the identification of transcriptional regulatory elements. The strategy is most applicable to genes that have conserved functions and expression patterns. For example, the stem cell leukaemia (SCL) gene encodes a basic helix–loop–helix transcription factor with a key role in the early stages of haematopoiesis and vasculogenesis (42). The SCL gene has a specific pattern of expression that is highly conserved in mammals, birds, frogs and teleost fish, indicating that the sequence elements responsible for this level of transcriptional control might also be conserved. This was confirmed following a comparison of the human, mouse and chicken SCL gene sequences, which revealed a series of highly conserved non-coding segments, some of which corresponded precisely with previously characterized SCL enhancers. The analysis also identified a peak of mouse–human–chicken homology, located 10 kb downstream of the gene, of no known significance.
However, the sequence was found to act as a neural enhancer, directing an appropriate SCL-like pattern of expression of GFP or lacZ reporter genes in transgenic frogs and mice, respectively (43).

CATALOGUING HUMAN SEQUENCE VARIATION

The first human genome sequence provides a reference for characterizing all genes and other features and is applicable to the study of the entire human race. However, there is significant variation between individuals at the level of the DNA sequence. On average, comparison of the sequence of any two genomes results in the detection of a sequence variant every 1000 bases. Most (90%) sequence variants are single base substitutions (44,45), known as single nucleotide polymorphisms (SNPs), and the remainder are insertions or deletions of one or more bases. Sequence variants that occur within genes
or in regulatory elements may affect the function or expression of a gene. These functional variants are responsible for the genetic component of phenotypic variables, including disease susceptibility and drug response. A major goal in human genetics is to identify these functional variants and then to determine how a specific allele contributes to a particular phenotypic characteristic. The association of a phenotype with a particular functional variant allele may be tested directly, by genotyping the variant in an appropriately selected population, or indirectly using a nearby SNP which is in linkage disequilibrium (LD) with the functional variant. The indirect approach, which is likely to predominate in this kind of study in the near future, depends on the extent of LD in the human genome. This is unknown and there is likely to be considerable local variation. Estimates based on simulations vary from 3 to 100 kb (46,47). Work is in progress in multiple laboratories to obtain a true measure of LD in different regions.

The availability of the working draft has been critical for the development of a first human map of SNPs of sufficient density to allow the ascertainment of LD on a genome-wide basis and to underpin association studies. In addition to locus-specific targeted approaches for SNP finding, which are based on re-sequencing DNA fragments after amplification from different individuals, three genome-wide approaches have been taken. Matching EST sequences to each other (48), and now to the working draft, allows detection of sequence differences and hence candidate SNPs. Similarly, candidate SNPs have been detected by comparing the draft or finished sequence of overlapping BAC or PAC clones (45; E. Dawson et al., unpublished data), thus identifying dense clusters of SNPs which are separated by regions of typically Mb.
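The comparison of overlapping sequences to find candidate SNPs can be sketched as follows. This is a deliberately minimal illustration, not the method used in the cited studies: it assumes the two sequences have already been aligned to equal length, it reports 0-based positions, and it ignores base quality values, which any real SNP-calling procedure would need to take into account. The function name `candidate_snps` and the example sequences are hypothetical.

```python
def candidate_snps(seq_a, seq_b):
    """List single-base differences between two pre-aligned sequences.

    Returns (position, base_in_a, base_in_b) tuples with 0-based positions.
    Columns containing a gap ('-') or an ambiguous base (e.g. 'N') are
    skipped, so only substitutions between A, C, G and T are reported.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a in valid and b in valid
    ]


# Hypothetical overlap between two clones derived from different haplotypes.
clone_1 = "ACGTACGTACGGATCCA"
clone_2 = "ACGTTCGTACGGATCTA"
for pos, a, b in candidate_snps(clone_1, clone_2):
    print(f"position {pos}: {a} -> {b}")
```

Differences found this way are only candidates: some reflect sequencing errors rather than true polymorphism, which is why finished sequence, or draft sequence filtered on quality values, gives more reliable results.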
More recently, The SNP Consortium (TSC; a consortium of industrial organizations, the Wellcome Trust and genome research laboratories; see ) has undertaken large-scale genomic sequencing approaches to identify SNPs across the entire genome. Sequences obtained from the DNA of 24 individuals representing a range of ethnic groups were aligned to each other, or to the working draft. Pilot studies carried out either on the whole genome or on chromosome 22 demonstrated the feasibility of the project and, further, that the availability of the genome sequence would more than double the number of candidate SNPs produced (49,50). Nearly all the SNPs can also be mapped directly by alignment to the draft sequence and the TSC programme is on course to produce a map at a density of at least one SNP per 5 kb (double the original target) and to finish well ahead of schedule. The integration of SNPs from all sources using the working draft will thus pave the way for a new era in human genetic studies.

CONCLUSION

Sequencing whole genomes has brought many new insights in our understanding of the biology and genetics of the organism. For the first time we are able to compare complete gene sets and to begin to unravel the molecular basis for biological complexity. The sequence of one organism can be used for annotation of another. With the benefit of experience and data from the model organism studies, the emergence of the working draft of the human genome provides an immediate platform to study the biology of Homo sapiens. Already there are great advances being made in resolving the human gene set, establishing studies of their expression in different environments using microarrays and in studying their association with human disease. The value of the working draft is already illustrated from discovery and rapid characterization of many genes involved in disease, notably including genes involved in cancer [the breast cancer gene BRCA2 (51)], immune defects [e.g.
X-linked lymphoproliferative disease (52)], deafness [Pendred syndrome (53), DFNA5 (54), DFNA13 (55)] to name but a few. The working draft has also made it possible to construct a comprehensive SNP map of the genome. At the same time, we should not lose sight of the task of finishing the reference sequence. Only then will the full potential of sequence analysis be realized. Armed with a genome sequence that is >99.99% accurate and has no gaps, researchers will be able to explore the functional significance of every last base in the human genome and all the variants thereof, assured of the reliability of their basic reference. Determining the genome sequence has been the main purpose of the Human Genome Project until now; realizing its full potential is the challenge that lies ahead.

ACKNOWLEDGEMENTS

The author is most grateful for helpful discussions with Alex Bateman, Richard Durbin, Tim Hubbard, Jim Mullikin, Jane Rogers, John Sulston and other members of the Sanger Centre.

APPENDIX

Centres involved in the international human sequencing consortium (source: ; )
1. Baylor College of Medicine, Houston, TX, USA
2. Beijing Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
3. DNA Database of Japan
4. European Bioinformatics Institute, Cambridge, UK
5. Gesellschaft für Biotechnologische Forschung mbH, Braunschweig, Germany
6. Genoscope, Evry, France
7. Genome Therapeutics Corporation, Waltham, MA, USA
8. Institute for Molecular Biotechnology, Jena, Germany
9. Joint Genome Institute, US Department of Energy, Walnut Creek, CA, USA
10. Keio University, Tokyo, Japan
11. Max Planck Institute for Molecular Genetics, Berlin, Germany
12. National Center for Biotechnology Information, NIH, Bethesda, MD, USA
13. RIKEN Genomic Sciences Center, Saitama, Japan
14. Sanger Centre, Hinxton, UK
15. Stanford DNA Sequencing and Technology Development Center, Palo Alto, CA, USA
16. Stanford Human Genome Center, Palo Alto, CA, USA
17. University of Washington Genome Center, Seattle, WA, USA
18. University of Washington Multimegabase Sequencing Center, Seattle, WA, USA
19. Whitehead Institute for Biomedical Research, MIT, Cambridge, MA, USA
20. Washington University Genome Sequencing Center, St Louis, MO, USA

REFERENCES

1. Goffeau, A. et al. (1996) Life with 6000 genes. Science, 274.
2. Goffeau, A. et al. (1997) The Yeast Genome Directory. Nature, 387.
3. The C.elegans Sequencing Consortium (1998) Genome sequence of the nematode C.elegans: a platform for investigating biology. Science, 282.
4. Rubin, G.M. et al. (2000) Comparative genomics of the eukaryotes. Science, 287.
5. Coulson, A. et al. (1986) Towards a physical map of the genome of the nematode Caenorhabditis elegans. Proc. Natl Acad. Sci. USA, 83.
6. Olson, M.V. et al. (1986) Random-clone strategy for genomic restriction mapping in yeast. Proc. Natl Acad. Sci. USA, 83.
7. Coulson, A. et al. (1988) Genome linking with yeast artificial chromosomes. Nature, 335.
8. Wilson, R. et al. (1994) 2.2 Mb of contiguous nucleotide sequence from chromosome III of C.elegans. Nature, 368.
9. Dib, C. et al. (1996) A comprehensive genetic map of the human genome based on 5264 microsatellites. Nature, 380.
10. Burke, D.T., Carle, G.F. and Olson, M.V. (1987) Cloning of large segments of exogenous DNA into yeast by means of artificial chromosome vectors. Science, 236.
11. Cox, D.R. et al. (1990) Radiation hybrid mapping: a somatic cell genetic method for constructing high-resolution maps of mammalian chromosomes. Science, 250.
12. Walter, M.A. et al. (1994) A method for constructing radiation hybrid maps of whole genomes. Nature Genet., 7.
13. Foote, S. et al. (1992) The human Y chromosome: overlapping DNA clones spanning the euchromatic region. Science, 258.
14. Chumakov, I. et al. (1992) Continuum of overlapping clones spanning the entire human chromosome 21q. Nature, 359.
15. Gemmill, R.M. et al. (1995) A second-generation YAC contig map of human chromosome 3. Nature, 377.
16. Krauter, K. et al. (1995) A second-generation YAC contig map of human chromosome 12. Nature, 377.
17. Doggett, N.A. et al. (1995) An integrated physical map of human chromosome 16. Nature, 377.
18. Collins, J.E. et al. (1995) A high-density YAC contig map of human chromosome 22. Nature, 377.
19. Bouffard, G.G. et al. (1997) A physical map of human chromosome 7: an integrated YAC contig map with average STS spacing of 79 kb. Genome Res., 7.
20. Hudson, T.J. et al. (1995) An STS-based map of the human genome. Science, 270.
21. Deloukas, P. et al. (1998) A physical map of human genes. Science, 282.
22. Hillier, L.D. et al. (1996) Generation and analysis of human expressed sequence tags. Genome Res., 6.
23. Ioannou, P.A. et al. (1994) A new bacteriophage P1-derived vector for the propagation of large human DNA fragments. Nature Genet., 6.
24. Shizuya, H. et al. (1992) Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl Acad. Sci. USA, 89.
25. Gregory, S.G., Howell, G.R. and Bentley, D.R. (1997) Genome mapping by fluorescent fingerprinting. Genome Res., 7.
26. Marra, M.A. et al. (1997) High throughput fingerprint analysis of large-insert clones. Genome Res., 7.
27. Ewing, B. et al. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8.
28. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8.
29. Bouck, J. et al. (1998) Analysis of the quality and utility of random shotgun sequencing at low redundancies. Genome Res., 8.
30. Weber, J.L. and Myers, E.W. (1997) Human whole-genome shotgun sequencing. Genome Res., 7.
31. Venter, J.C. et al. (1998) Shotgun sequencing of the human genome. Science.
32. Green, P. (1997) Against a whole-genome shotgun. Genome Res., 7.
33. Myers, E.W. et al. (2000) A whole-genome assembly of Drosophila. Science, 287.
34. Dunham, I. et al. (1999) The DNA sequence of human chromosome 22. Nature, 402.
35. Hattori, M. et al. (2000) The DNA sequence of human chromosome 21. The chromosome 21 mapping and sequencing consortium. Nature, 405.
36. Antequera, F. and Bird, A. (1994) Predicting the total number of human genes. Nature Genet., 8.
37. Liang, F. et al. (2000) Gene index analysis of the human genome estimates approximately genes. Nature Genet., 25.
38. Ewing, B. and Green, P. (2000) Analysis of expressed sequence tags indicates human genes. Nature Genet., 25.
39. Roest Crollius, H. et al. (2000) Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nature Genet., 25.
40. Jareborg, N., Birney, E. and Durbin, R. (1999) Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res., 9.
41. Makalowski, W. et al. (1996) Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences. Genome Res., 6.
42. Begley, C.G. and Green, E.W. (1999) The SCL gene: from case report to critical hematopoietic regulator. Blood, 93.
43. Gottgens, B. et al. (2000) Analysis of vertebrate SCL loci identifies conserved enhancers. Nature Biotechnol., 18.
44. Wang, D.G. et al. (1998) Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science, 280.
45. Taillon-Miller, P. et al. (1998) Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res., 8.
46. Kruglyak, L. (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genet., 22.
47. Collins, A., Lonjou, C. and Morton, N.E. (1999) Genetic epidemiology of single-nucleotide polymorphisms. Proc. Natl Acad. Sci. USA, 96.
48. Picoult-Newberg, L. et al. (1999) Mining SNPs from EST databases. Genome Res., 9.
49. Altshuler, D. et al. (2000) A human SNP map generated by reduced representation shotgun sequencing. Nature, in press.
50. Mullikin, J.C. et al. (2000) A SNP map of human chromosome 22. Nature, in press.
51. Wooster, R. et al. (1995) Identification of the breast cancer susceptibility gene BRCA2. Nature, 378.
52. Coffey, A.J. et al.
(1998) Host response to EBV infection in X-linked lymphoproliferative disease results from mutations in an SH2-domain encoding gene. Nature Genet., 20, Everett, L.A. et al. (1997) Pendred syndrome is caused by mutations in a putative sulphate transporter gene (PDS). Nature Genet., 17, Van Laer, L. et al. (1998) Nonsyndromic hearing impairment is associated with a mutation in DFNA5. Nature Genet., 20, McGuirt, W.T. et al. (1999) Mutations in COL11A2 cause non-syndromic hearing loss (DFNA13). Nature Genet., 23,
