Methodology for Copy Number Variant Detection from High Throughput DNA Exome Sequencing and Application to the Genetic Mapping of Rare Genetic Disorders

Case Presentation 3
Word Count
Michael Epstein
CoMPLEX, University College London
Supervisors: Dr Vincent Plagnol, UCL Genetics Institute; Prof. Nick Wood, UCL Institute of Neurology
MRes Modelling Biological Complexity,

Contents

1 Abstract
2 Background
  2.1 Genomic Variation
  2.2 Copy Number Variation
  2.3 Detecting Copy Number Variation
    Comparative Genomic Hybridisation and SNP detection
    CNV Discovery Strategies using Next Generation Sequencing
  2.4 Focus of the Case Presentation
3 Materials and Methods
  3.1 Generating Read Depths with ReadDepthMapper
  3.2 Data Generation
    Exome Sequencing
    Available Exome Sequences
    Program Execution
  3.3 Data Analysis
4 Results
5 Discussion
Bibliography

1 Abstract

Large structural variations such as Copy Number Variations (CNVs) are pervasive in the human genome and are thought to have a large influence on a variety of Mendelian and somatic genetic disorders. This Case Presentation reviews how increasing numbers of CNVs in the human genome are being detected, particularly within the context of emergent Next Generation Sequencing technology. The project describes the development of ReadDepthMapper, a C++ command line tool, to generate read depths from BAM formatted NGS data. The tool is then applied to 263 exome sequences available at the UCL Genetics Institute with the aim of uncovering evidence for large structural deletions. The results revealed a potential monosomy in chromosome 7 in one individual, which helps suggest a biological explanation for this individual's rare genetic disorder.

2 Background

2.1 Genomic Variation

Genomic variation can be defined as variations in genotypes within or between species. It is the key driver of varying phenotypes between individuals. Therefore discovering and understanding the various types of genomic variation, particularly in reference to the human genome, is key to unlocking the reasons behind phenotypic differences between individuals. It also helps to elucidate the role of genetic variation in the analysis of human diseases. This is particularly important in complex diseases where individual genetic factors exhibit limited penetrance or a number of genomic factors combine to contribute to a human disease.

Exploring genomic variation typically involves classifying variations based on the size of discovered variants. For example, large differences between individuals, or even between genomes in different cells within the same individual, can be detected visually as chromosomal abnormalities at the microscopic level. At the smallest end of the variant spectrum, differences between individual genomes have been uncovered at the base level. Variations as small as differences in nucleotides at a given position between genomic sequences, known as Single Nucleotide Polymorphisms (SNPs), are a well established source of variation. Also at the base scale, small insertions and deletions of a small number of nucleotides, known as indels, contribute to the polymorphism of genotypes.

Analysing the relative importance of different classes of structural variation can be distilled into two issues. The first issue is estimating, for a particular class of variation, how much of the overall variation observed in a genome it accounts for. The second issue is establishing what phenotypic impact a certain class of structural variation creates. One natural implication follows in establishing the role that different types of variant play in causing human disease.

Until recently, much of the important phenotypic variation seen in the human genome was thought to be the result of small base level mutations or indels [1]. For example, recent developments in large scale sequencing and SNP discovery have allowed Genome Wide Association (GWA) studies to identify SNPs which have a statistically significant association with phenotypic traits such as height [2] and complex diseases [3, 4]. However, the total level of phenotypic variation explained by these statistically significant gene loci falls well short of prior estimates of genetic heritability for complex diseases and phenotypes. This has led to the search for the "missing heritability" and for other explanations that account for a greater proportion of the observed heritability of these traits.

2.2 Copy Number Variation

Additional attention has recently been focused on a different class of genetic variant. These structural variants are called Copy Number Variations (CNVs) and are defined as polymorphisms greater than a kilobase in size which are present at a variable number of copies when compared to a reference genome [5]. CNVs encompass insertions, deletions and inversions of stretches of DNA as well as more complex multisite variants. Such variation covers around 12% of the human genome [5], encompassing a greater proportion of nucleotides in the human genome than SNPs.

In addition to covering a wide area of the typical human genome, CNV regions (CNVRs) are also thought to account for a sizeable variety of phenotypes. CNVRs span almost 1,500 genes [5] and are likely to have a significant impact on disease, as initial estimates suggested 14.5% of genes registered in the Online Mendelian Inheritance in Man (OMIM) database were subject to CNV. Gene ontology analysis suggests that large portions of the exome that are under copy number variation tend to code for proteins involved in the immune response and inflammation. For example, low copy numbers for the gene CCL3L1, which encodes an HIV1-suppressive chemokine, are associated with an accelerated rate of HIV progression [4]. Causal CNVs have been well established for rare neurological diseases such as Williams-Beuren syndrome, but CNVs have also been associated with spectrum disorders such as autism and other psychiatric illnesses such as schizophrenia [4].

In addition to germline CNV, somatic and inherited variations are recognised as important factors in the development of cancer genomes, such as structural chromosomal rearrangements and the development of fusion transcripts which dysregulate the expression and activity of genes that control the cell cycle. For example, [6] establish that CNV fusion transcripts such as CHD7-PVT1 are likely to have occurred early in tumourigenesis in small cell lung cancer on the basis of their amplified copy number. Common somatic CNVs have also been found in other solid tumours such as prostate and colorectal cancers [4].

2.3 Detecting Copy Number Variation

Comparative Genomic Hybridisation and SNP detection

Early efforts to discover and categorise copy number variation relied on a technique called Comparative Genomic Hybridisation, or CGH, which seeks to test the relative frequencies of DNA regions from a test and a reference sample on a hybridisation array. The colour of the fluorescence on the array indicates gains or losses in copy number against a reference. In one such study [7], a targeted microarray containing 1,986 nonredundant BACs was constructed to encompass a total of 130 recombination hotspots, as defined by the presence of common intra-chromosomal duplications in the genome. 47 lymphoblastoid cell lines were hybridised, which revealed 119 regions of significant copy number polymorphism, of which 73 were previously unreported at the time. This suggested an important role of segmental duplications in defining rearrangement hotspots, highlighting genomic regions likely to contribute to CNV polymorphism.

Another early method used to measure copy number variation involves using SNP genotypes, for example from the extensive and high quality collection of SNPs generated from the International HapMap project [8, 9]. In suspected regions of CNVs, algorithms can be used to detect signatures from a contiguous run of SNPs which could suggest potential deletions. For example, [10] analysed transmission genotypes from parent-child trios from the International HapMap Project. The authors examined SNP calls in offspring that appear to be incompatible with Mendelian inheritance of SNPs. This occurs where, for instance, a maternal deletion is present but the SNP genotyping method has miscalled the offspring as homozygous for the remaining paternal allele. Using this technique, areas of CNV deletions within the human genome were revealed.

There are drawbacks to both CGH and SNP approaches. For example, with CGH, the size of the putative CNV regions examinable is limited by the insert size, which is the length of the DNA sequence used to make up the probe. Also, the coverage of regions is physically limited by the density of the array, and CGH cannot detect inversions, only the absolute copy number of a probe sequence. With SNP analyses, the detectable size and breakpoint resolution of CNVs vary with the available density of SNPs in the regions where CNVs are being studied.

CNV Discovery Strategies using Next Generation Sequencing

Both CGH and SNP approaches to detecting CNVs predate the development of Next Generation Sequencing (NGS). High throughput technologies allow high genomic coverage of paired-end reads of a genomic sample. Paired reads are generated by fragmenting genomic DNA samples into a predefined length, known as the insert size. Both the forward and reverse template strands are sequenced from opposite ends. The resulting set of sequences forms a library of paired reads a fixed distance apart, which aids alignment to a reference genome. These paired reads can be used with various techniques to detect CNVs between the sample and a reference.

Paired reads were originally produced by traditional Sanger sequencing, but the advent of NGS technologies such as Illumina, ABI SOLiD and Roche 454 allows faster generation of paired end reads at a greater genome depth of coverage and at a cheaper sequencing cost per base. The read lengths of these technologies, however, are quite short by Sanger standards, typically ranging from 35bp to 400bp in length. This makes them difficult to arrange into contigs and scaffolds, and therefore makes the direct assembly of a sample genome from its millions of paired reads still extremely difficult.

(a) Normal read: here, the distance between the two reads is the same between the reference and sample.
(b) Deletion: the distance between the reads mapped to the genome is greater than the insert size of the sample read. This indicates that some content which is present in the reference has been deleted from the sample.
(c) Insertion: the distance between the reads as mapped to the genome is smaller than the insert size of the sample. This indicates that some sequence has been inserted into the sample.
(d) Inversion: the order and insert size of two paired reads are the same. However, one of each of the paired reads changes orientation, indicating that some of the sequence within the two outer reads has become inverted in the sample relative to the reference.
(e) Linked signature: two adjacent paired reads map to distal segments of the genome. These linked signatures can be further used to discover linked insertions and identify the content which has been inserted.
(f) Everted duplication: two interlinked paired reads map in the same orientation to the reference but in a different order; this signifies a tandem duplication of DNA sequence in the sample.

Figure 2.1: Paired Read Mapping signatures (Reproduced from [11])

Instead, detecting genetic variation requires mapping paired reads from a sample to a reference genome, and CNVs are investigated using Paired-End Mapping or Depth of Coverage strategies, both explained below.

Paired-End Mapping (PEM)

The limited insert size of the sequenced pairs precludes direct genome assembly and direct comparison with other completed genomes. Instead, the reads are mapped to a reference assembly and then investigated with recourse to common matching signatures which suggest insertion, deletion, inversion or other more complicated CNV events. These signatures are illustrated and described in Figure 2.1.

Additionally, there is information given by split or unmapped reads. For example, a split read, where two fragments of a read map to different regions of the genome, indicates a deletion and also the breakpoint of the deletion. Similarly, a truncated mapping, where only a fraction of the whole read maps to the genome, reveals the sequence and size of an inserted element. An unmapped read, disregarding quality control issues, can indicate the insertion of novel genomic sequence as it cannot be mapped to the reference.

For practical paired read mapping analysis, methods are required to detect the signatures outlined in Figure 2.1. Occurrences of single signatures at a given genomic location are unlikely to represent true structural variants because of the imperfect nature of base calling, the potential for chimeric reads and incorrect alignment to the reference. Also, while the insert size of a library can be set, it carries some variability. Hence small indels could be presumed to occur on the basis of varying insert size if individual reads are taken at face value.
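The signatures of Figure 2.1 can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools. The `MappedPair` record, the `classify_pair` function, the forward/reverse orientation convention and the tolerance of `k` standard deviations around the library insert size are all assumptions made for illustration. The tolerance reflects the point above that insert size carries some variability.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Hypothetical record for one read pair mapped to the reference."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, insert_mean, insert_sd, k=3.0):
    """Assign a single pair to one of the signatures in Figure 2.1.

    Assumes a library in which a concordant pair maps with the left read
    on '+' and the right read on '-'.
    """
    if pair.chrom1 != pair.chrom2:
        return "distal"  # ends map to distal segments of the genome
    # Order the two ends by mapped position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"  # one end has changed orientation
    if left_strand == "-" and right_strand == "+":
        return "everted"  # order swapped: possible tandem duplication
    span = right_pos - left_pos
    if span > insert_mean + k * insert_sd:
        return "deletion"  # ends map further apart than the insert size
    if span < insert_mean - k * insert_sd:
        return "insertion"  # ends map closer together than the insert size
    return "normal"
```

A single call from such a function is only a candidate signature. Several independent pairs supporting the same event at the same location would still be needed before treating it as a structural variant.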

Notwithstanding these issues, a pioneering study by [12] used PEM techniques to discover CNVs using 3kb insert reads sequenced with NGS technology. Their approach used sequences from two individuals (female European and female African) to identify similar and divergent structural variants greater than 3kb in size. Their analysis revealed that the number of structural variants was greater than previously thought, many of which could potentially affect gene function. This was a result of the smaller paired read insert size, which enabled the study to detect smaller variants and provide greater breakpoint precision than previous studies which utilised CGH.

In an alternative use of paired end short reads, [13] developed Pindel, an algorithm that uses pattern growth to map the break points of large deletions and medium sized insertions. The pattern growth process is used to map the fragments of the second paired read where the first read has been uniquely mapped to the genome. Deletions are detected from the second unmapped read, from the reference sequence located between the first mapped portion from the 3' end and the second mapped portion from the 5' end. For insertions, the insertion is classified as the segment between the 5' and 3' pattern fragments of the unmapped read which cannot be mapped to the genome.

Both [12, 13] use a clustering strategy which clusters paired read mappings from discordant pairs (where only one end of the read is mapped to the reference genome) to build up a catalog of evidence to support a particular type of structural variant at a given genomic location. The minimum number of read pairs required for a reliable cluster to be formed and the distance at which the read pair is considered discordant are both key parameters in establishing the sensitivity and specificity of CNV clusters. Sequencing coverage depth can also impact these parameters; a greater coverage depth reduces the number of mate pairs required for a given level of specificity for a variant call, and shortens the distance after which a paired read can be considered discordant, as greater coverage depth increases the signal to noise ratio obtained from the sample. The continuing development of sophisticated clustering approaches will be key in detecting CNVs with increased breakpoint precision and confidence.

Depth of Coverage (DOC)

One of the key advantages of NGS technology is the vastly improved depth of coverage. This provides the additional benefit of multiple samples of a given genomic segment, which can increase the signal to noise ratio from the sample sequencing. Furthermore, if it can be assumed that the probability of any given base in the genome being sequenced is equal to that of any other base, then the number of reads mapping to a given stretch of DNA can be assumed to follow a Poisson distribution whose mean is in proportion to the number of times this region appears in the sample genome. Therefore, a duplicated sequence in a sample could be revealed as having a higher than expected number of reads when mapping to a reference sequence. Similarly, a sequence which has been deleted from the sample sequence would have fewer reads mapping to the reference and hence exhibit a lower read depth.

Depth of coverage measures are good at detecting large CNV events, assuming a good level of genomic coverage. This is because significant differences in read depth are less likely to occur by chance for large insertions and deletions than for smaller CNVs. However, DOC cannot detect where duplications have been inserted into the genome, but only that the duplications exist. Also, novel inserted sequences cannot be detected as they, by definition, will not map to the reference genome.

Studies which have used depth of coverage to investigate CNVs typically look to segment read depth measurements mapped to the reference genome. Each read depth window aims to have the same read depth within the window, but the depth of a given window contrasts sharply with adjacent windows. Windowing provides the length of sequence with a consistent read depth and also suggests the breakpoints of the window. Based on its read depth, the window can indicate a gain, a loss or no CNV event.

Studies such as [6] used reads from their sample which mapped correctly to the reference in terms of insert size and orientation to build a picture of copy number changes across the genome. They subsequently adapted a circular binary segmentation algorithm to generate statistical predictions of copy number changes in both the raw copy number and the breakpoints of copy number variation. This method subsequently helped unravel the structure of complex amplified elements of the cancer genomes of two individuals, leading to putative suggestions of the evolutionary timeline of the cancer genome.

The current challenge of read depth analysis is determining robust methods for segmenting the depth of coverage into windows which delineate different copy number events at an accepted significance level. Recent developments to address this problem include [14], which pioneered a technique called Event-Wise Testing to merge read depths across 100bp windows into contiguously larger regions which have statistically increased or decreased read depth. The results from this statistically based merging technique suggest that this read depth technique is able to detect CNV signatures which PEM based approaches find difficult to detect, such as segmental duplications. [15] developed a technique called CNV-Seq, conceptually derived from CGH, which uses NGS data to compare two genomic samples against a reference genome. [16] presented SegSeq, an algorithm to segment equal copy numbers from NGS data. SegSeq merges candidate 100kb windows to generate variable window sizes, setting a false discovery rate for 10 genome wide false positive segments. The technique was used to detect copy-number variations in tumour cell lines on a par with existing microarray technologies but with greater breakpoint precision, typically to within 1kb.
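The windowing idea behind depth of coverage analysis can be sketched in a few lines of Python. This is a deliberately naive illustration and is not Event-Wise Testing, CNV-Seq, SegSeq or circular binary segmentation. The function names, the use of the median as the expected count and the cut-off `z_cut` are assumptions. Each fixed-size window is scored with a normal approximation to the Poisson distribution, and adjacent windows with the same call are merged into segments.

```python
import math
from statistics import median


def call_windows(counts, z_cut=3.0):
    """Label each window as 'gain', 'loss' or 'neutral'.

    counts: read counts for consecutive, equally sized windows.
    The expected count (lambda) is taken to be the median window count,
    and the Poisson variance is assumed equal to lambda.
    """
    lam = median(counts)
    calls = []
    for c in counts:
        z = (c - lam) / math.sqrt(lam) if lam > 0 else 0.0
        if z > z_cut:
            calls.append("gain")
        elif z < -z_cut:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


def merge_calls(calls, window_size):
    """Merge adjacent windows with the same call into (start, end, call)."""
    segments = []
    for i, call in enumerate(calls):
        start = i * window_size
        if segments and segments[-1][2] == call:
            # Extend the previous segment to cover this window.
            segments[-1] = (segments[-1][0], start + window_size, call)
        else:
            segments.append((start, start + window_size, call))
    return segments
```

For example, `merge_calls(call_windows(counts), 100)` returns half-open segments whose boundaries are candidate breakpoints at window resolution. Real read depth data is usually more dispersed than a Poisson model predicts, so a fixed cut-off of this kind would probably produce too many calls in practice.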

2.4 Focus of the Case Presentation

The aim of this case presentation is to use CNV detection techniques to discover cases of monosomy in 263 sample exome sequences available at the UCL Genetics Institute (UGI). Monosomy is a class of aneuploidy, or an abnormal number of chromosomes, in which only one chromosome of a pair is present in a diploid organism. Partial monosomy is where a part of a chromosome is deleted, leaving only one copy of a stretch of the genome on the other chromosome. In humans, this can lead to many disorders such as Cri du chat, a rare genetic disorder due to the deletion of a part of chromosome 5, and myelodysplastic syndrome, a disease of the blood and bone marrow linked with partial monosomy of chromosome 7.

Although monosomy and partial monosomy are large structural events, this project aims to detect them using CNV approaches. However, the use of exome sequencing precludes the use of paired read mapping strategies to discover copy number variants. This is because only a small and interspersed portion of the genome is being sequenced. This means that breakpoints of deleted and inserted exons are difficult to calculate using split read techniques, as split reads are unlikely to fall on these reduced boundaries. Therefore, a read depth approach is a better method to investigate the possibility of partial monosomy in exome sequencing samples, as it is a good technique for detecting large deletion events such as chromosomal partial monosomy.

The specific focus of this case presentation is twofold. The first aim is to develop a computational tool to efficiently parse sequenced paired reads and generate a read depth profile across defined genomic regions. The read depths generated from the tool are then to be used to detect monosomy signatures across a sample of 263 exome sequences held at the UGI using a simple statistical procedure.

3 Materials and Methods

3.1 Generating Read Depths with ReadDepthMapper

A tool called ReadDepthMapper was developed in C++ in order to generate read depths for sample sequences. Initially, the program parses a file containing a set of regions, typically a set of exons as defined by the CCDS, or a set of chromosomes. It then merges small regions into larger regions if two regions are very close to each other, less than 50bp apart by default. This is necessary because noise in the capture and sequencing process limits the level of resolution that the read depth technique can provide. The names of the regions which are concatenated are noted so the user remains informed as to which regions have been merged.

The program then accepts a list of paired read files, which are stored in the compressed binary version of the Sequence Alignment/Map format (BAM) [17]. Using the Samtools C API as a 3rd party library within ReadDepthMapper, the files are parsed in turn. Each read pair is examined to check that both ends map to the same chromosome and that the distance between the reads is not significantly greater than the insert size of the reads. If the read pair matches these criteria, the midpoint of the paired read is considered to be the location of the read. If the location of the read lies within a region as specified from the merged region file, a count is added to that region.

After all paired read files have been parsed, the data structure holding the read depths generated from the paired reads is printed out to a directory location specified by the user. The user can also choose to print out the names of the regions (for example, exon names), which is useful if small exons have been merged, for the reasons explained above.

Development of the program utilised the unit test framework UnitTest++ in order to test the key algorithms of the program, such as the sorting and ordering of regions and the printing out of the read depth analysis. The unit tests can be run as part of the Makefile compilation of the program. The source code is held in an online svn repository (https://subversion.assembla.com/svn/readdepthmapper/), access to which is available from the author.

In order to enable the parallelisation of read depth generation, a scripting tool was developed to concatenate ReadDepthMapper output files together. This allows a large number of input files with a common region file to be broken up and run concurrently, with the resulting files being merged at the end of the processes.

3.2 Data Generation

Exome Sequencing

Exome capture is a strategy to sequence the coding elements of a target genome in order to uncover genes or gene variants associated with genetic disorders. It involves the capture of the exons in the genome, which constitute about 1% of the human genome but are estimated to harbour 85% of the mutations with large effects on disease related traits [18]. Exome sequencing avoids the cost and complexity of whole genome sequencing by deep sequencing just a small portion of the human genome while retaining the high coverage depth available through NGS. It has recently been applied to discover the candidate gene for Miller syndrome, a rare Mendelian disorder with previously unknown cause [19].

The exome sequences used in this project were captured by hybridisation. The Agilent SureSelect method of genome capture is shown in Figure 3.1. Agilent SureSelect protocols use a biotinylated RNA library to capture the sequences of interest from a genomic sample. After the targeted genomic sample has hybridised to the RNA library, streptavidin coated magnetic beads attach to the biotin to allow it to be preferentially removed. This leaves the unbound genomic sample, which is not of interest, to be discarded. The captured elements can then be cleaned, amplified and sequenced.

Figure 3.1: Agilent SureSelect Exome Capture (taken from [20])

Available Exome Sequences

There were 263 exome sequences available for analysis at the UGI, all obtained from individuals with diseases suspected to be caused by rare genetic variants. These samples can be sub-classed into different study cohorts as shown in Table 3.1.

Source | Samples | Capture Technology | Primary Capture Reason
UCL Institute of Neurology | 223 | Agilent 50MB | Investigation of early onset Alzheimer and early dementia
UCL Institute of Ophthalmology | 5 | Agilent 38MB | Investigation of early onset blindness
QMUL School of Medicine and Dentistry | 11 | Agilent 38MB | Investigation of rare forms of dermatological disease
Cambridge Infectious Disease | 12 | Agilent 38MB | Investigation of tuberculosis resistance
QMUL Blizard Institute of Cell and Molecular Science | 12 | Agilent 38MB | Investigation into rare forms of bone marrow failure

Table 3.1: Samples used for Project Monosomy Screening

As can be seen in Table 3.1, the exome capture technologies differed between subgroups. The exomes from the Institute of Neurology were captured with 50MB Agilent SureSelect technology whereas the other samples were captured with 38MB Agilent SureSelect technology. The main difference between the two captures is that the 50MB capture includes additional validated exomic content from the GENCODE project. It also includes all exons annotated in the consensus CDS database, as well as small non-coding RNAs from miRBase and Rfam.

Program Execution

The 263 samples were examined for evidence of monosomy. A region file was generated which had each chromosome defined as one region, so that read depth counts for each of the 263 samples were generated on a per chromosome basis. Given the large number of samples, the analysis was parallelised by dividing the samples up into groups of at most ten samples and submitting 27 jobs to the cluster at the UGI. The results files were concatenated together using the script utility described in Section 3.1 to produce a read depth for each sample and chromosome in a single file.

5 https://www.chem.agilent.com/library/datasheets/public/ en_lo.pdf
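The region merging and midpoint counting described in Section 3.1 can be sketched as follows. This is a Python illustration of the logic only, not the ReadDepthMapper source, which is written in C++ and reads BAM files through the Samtools C API. The function names, the tuple layouts, the closed-interval convention and the `max_span` cut-off (standing in for "not significantly greater than the insert size") are assumptions.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_regions(regions, max_gap=50):
    """Merge regions on the same chromosome that lie less than max_gap apart.

    regions: iterable of (chrom, start, end, name).
    Returns a sorted list of (chrom, start, end, [names]) so that the
    names of merged regions remain visible to the user.
    """
    merged = []
    for chrom, start, end, name in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + [name])
        else:
            merged.append((chrom, start, end, [name]))
    return merged


def count_midpoints(merged, pairs, max_span=1000):
    """Count read pairs whose midpoint falls inside each merged region.

    pairs: iterable of (chrom1, pos1, chrom2, pos2), one entry per pair.
    Pairs on different chromosomes, or spanning more than max_span, are skipped.
    """
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end, _names) in enumerate(merged):
        by_chrom[chrom].append((start, end, idx))
    starts = {c: [s for s, _e, _i in regs] for c, regs in by_chrom.items()}

    counts = [0] * len(merged)
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 != chrom2 or abs(pos2 - pos1) > max_span:
            continue
        regs = by_chrom.get(chrom1)
        if not regs:
            continue
        mid = (pos1 + pos2) // 2
        # Find the last region starting at or before the midpoint.
        i = bisect_right(starts[chrom1], mid) - 1
        if i >= 0 and mid <= regs[i][1]:
            counts[regs[i][2]] += 1
    return counts
```

With one region per chromosome, as used for the monosomy screen, `count_midpoints` reduces to a per chromosome read count for each sample. Splitting 263 samples into groups of at most ten gives ceil(263 / 10) = 27 groups, which matches the 27 jobs submitted.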

3.3 Data Analysis

Data analysis was performed in R on the read depths generated as described in Section 3.2. A straightforward statistical analysis was performed which involved calculating a z-score for each chromosome in each sample. This z-score was used to highlight autosomal chromosomes which had a statistically significant lack of reads compared to the expected number of reads for that chromosome. Such a z-score would suggest evidence of monosomy in that sample chromosome.

The z-scores were computed as follows. The total number of reads for each chromosome over all samples was generated. Then, the fraction of reads that each sample has for a given chromosome was calculated. The mean fraction of chromosomal reads of each sample across its chromosomes was then computed. For a given sample, the mean number of reads and standard error for each chromosome were calculated by using the binomial formulae for the expectation and variance, np and np(1 - p) respectively. Here, n represents the number of reads across all samples for that chromosome, p is the fraction of reads in the sample for that chromosome and hence 1 - p is the fraction of chromosomal reads mapping to other samples. The binomial distribution is appropriate for each sample chromosome as a read in a given chromosome across all the samples either maps to a particular sample (success) or maps to that chromosome in another sample (failure).

Given the large number of total reads, the number of reads for a given sample chromosome can be assumed to approximate a normal distribution, with the estimated binomial mean and standard error used as parameter estimates for µ and σ. The z-score for each chromosome is then calculated from the observed number of reads, the expected number of reads for that chromosome calculated from the sample fraction mean and the estimated variance in the reads for that chromosome.
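The z-score procedure can be written out as a short Python sketch. The original analysis was performed in R, and the code below is one reading of the description rather than the author's script. It assumes that p is taken to be the sample's mean read fraction across chromosomes, that the expected count for a chromosome is n times p and that the variance is n p (1 - p). The function name and the nested dictionary layout are hypothetical.

```python
import math


def monosomy_z_scores(counts):
    """Compute a z-score for every sample and chromosome.

    counts: dict mapping sample -> dict mapping chromosome -> read count.
    Returns a dict with the same shape holding z-scores; strongly negative
    values indicate fewer reads than expected.
    """
    samples = list(counts)
    chroms = list(next(iter(counts.values())))

    # n: total reads per chromosome over all samples.
    n = {c: sum(counts[s][c] for s in samples) for c in chroms}

    z = {}
    for s in samples:
        # p: mean over chromosomes of this sample's share of the reads.
        p = sum(counts[s][c] / n[c] for c in chroms) / len(chroms)
        z[s] = {}
        for c in chroms:
            expected = n[c] * p                      # binomial mean, np
            sd = math.sqrt(n[c] * p * (1.0 - p))     # sqrt of np(1 - p)
            z[s][c] = (counts[s][c] - expected) / sd if sd > 0 else 0.0
    return z
```

A sample with one chromosome at roughly half its expected depth receives a strongly negative z-score for that chromosome and mildly positive z-scores elsewhere, because its mean fraction p is pulled down slightly by the depleted chromosome.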

Sequencing and microarrays for genome analysis: complementary rather than competing?

Sequencing and microarrays for genome analysis: complementary rather than competing? Sequencing and microarrays for genome analysis: complementary rather than competing? Simon Hughes, Richard Capper, Sandra Lam and Nicole Sparkes Introduction The human genome is comprised of more than

More information
