Next-Generation Sequencing: Applications and Implications III Peter J. Tonellato Laboratory for Personalized Medicine Center for Biomedical Informatics, Harvard Medical School May 2, 2011
Personal genome sequencing Applications and Implications III Translation of genomic information into clinical applications: Breast Cancer (BC) Examples Use Case 1: BC risk prediction model (review) Use Case 2: Screening test for gene mutations associated with hereditary BC Use Case 3: Expression-based BC classification Current landscape of genetically-informed medicine Where might we go in the future? Anticipated future clinical applications of NGS idea video
Use Case 1: A Risk Prediction Model for Breast Cancer
SNPs are used in a genetic algorithm for disease risk
The model is validated in a case-control study, with discriminatory accuracy summarized by the area under the ROC curve (AUC)
Sample Case: Gail 2008 Model
Mitchell H. Gail creates a breast cancer risk algorithm with a genetic component using SNPs from Easton et al. and other studies
Gail's model is validated with an AUC of 63.2%
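The AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that calculation follows; the risk scores are invented for illustration and are not data from Gail 2008, and the function name `auc` is a hypothetical helper.

```python
def auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case has the higher
    predicted risk; tied pairs count as half a win."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.30, 0.22, 0.18, 0.15]
controls = [0.25, 0.16, 0.12, 0.10]
example_auc = auc(cases, controls)
```

An AUC of 0.5 corresponds to no discrimination between cases and controls, and 1.0 to perfect separation, which puts the reported 63.2% in context.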
Development of BC risk prediction model
HapMap project provides SNP frequency data and a measure of linkage disequilibrium (LD) between a subset of SNPs in dbSNP (1)
Bin algorithm used to create LD blocks and tag SNPs for each block (2)
Conduct GWAS using tag SNPs to identify the small subset of variants that contribute most to the variation at the phenotypic level (3)
Validation studies that relate variation of identified SNPs and phenotypes to clinical applications (4, 5)
1. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27; 437 (7063): 1299-320.
2. Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-Genome Patterns of Common DNA Variation in Three Human Populations. Science. 2005 Feb 18; 307 (5712): 1072-79.
3. Easton DF, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007 Jun 2; 447 (7148): 1087-93.
4. Gail MH. Discriminatory Accuracy from Single-Nucleotide Polymorphisms in Models to Predict Breast Cancer Risk. J Natl Cancer Inst. 2008 July 16; 100 (14): 1037-41.
5. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics. 2005 Apr 1; 6 (2): 227-39.
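LD between two SNPs is commonly summarized by the pairwise statistic r², computed from haplotype frequencies. The sketch below shows only that statistic; it is not the bin algorithm of Hinds et al., and the haplotype data and the function name `r_squared` are hypothetical.

```python
def r_squared(haplotypes):
    """Pairwise LD (r^2) between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per chromosome, where a and b
    are 0 or 1 for the allele carried at SNP A and SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical haplotypes: alleles always travel together vs. independently.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
```

Two SNPs with r² close to 1 carry nearly the same information, which is why one of them can serve as a tag for the other.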
Use Case 2 Screening test to identify mutations in BC susceptibility genes for risk assessment of hereditary cancer Anatomy of healthcare information unit from theoretical concept to clinical application
Historical basis for an inheritable component of BC
The idea of a breast-cancer-prone family: a case of a 19-year-old BC patient whose grandmother and grand maternal uncle died of the disease (1757) (1)
Documentation of hereditary BC in the Broca family (1866) (1)
Segregation analyses provide evidence for autosomal dominant transmission of BC susceptibility gene (1984, 1988) (2, 3)
1. Eisinger F et al. Lancet. 1998 May 2;351(9112):1366.
2. Williams WR et al. Genet Epidemiol. 1984;1(1):7-20.
3. Newman B et al. Proc Natl Acad Sci U S A. 1988 May;85(9):3044-8.
Quick review on linked polymorphisms Genetic markers (DNA sequences with known polymorphisms) in close proximity tend to be inherited together (see slide 33) Use markers to map disease genes by measuring recombination Identify markers linked to the disease gene (little or no recombination between disease gene and the marker) to identify a region of genome of ~1-5 million bps containing the disease gene http://genome.wellcome.ac.uk/doc_wtd020778.html
Establishment of BRCA1 as a BC susceptibility gene
Linkage of early-onset hereditary BC to chromosome 17q21 using genetic analysis of 23 extended families, including 146 individuals with BC (1990) (1)
Confirmation of linkage results (1991) (2)
Formal labeling of the specific (but unidentified) BC locus as BRCA1 (1991) (3)
Further confirmation by a collaborative linkage study involving 214 BC families (1993) (4)
Refinement of BRCA1 localization to a region of 1-2 megabases (1993-1994) (5-7)
Further refinement of the BRCA1 locus to a ~600 kb region (1994) (8)
Identification of BRCA1 gene (1994) (9)
1. Hall JM et al. Science. 1990 Dec 21;250(4988):1684-9.
2. Narod SA et al. Lancet. 1991 Jul 13;338(8759):82-3.
3. Solomon E et al. Cytogenet Cell Genet. 1991;58:686-738.
4. Easton DF et al. Am J Hum Genet. 1993 Apr;52(4):678-701.
5. Bowcock AM et al. Am J Hum Genet. 1993 Apr;52(4):718-22.
6. Simard J et al. Hum Mol Genet. 1993 Aug;2(8):1193-9.
7. Goldgar DE et al. J Natl Cancer Inst. 1994 Feb 2;86(3):200-9.
8. Neuhausen SL et al. Hum Mol Genet. 1994 Nov;3(11):1919-26.
9. Miki Y et al. Science. 1994 Oct 7;266(5182):66-71.
Commercialization of BRCA mutation analysis
Identification of new genetic alterations in BRCA1 (1997) (1)
Nine patents granted to Myriad Genetics, Inc. covering 47 mutations of BRCA1 (1997), all uses of BRCA1 (1998), and BRCA2 mutations (1998, 2000) (2)
Myriad begins to market BRACAnalysis diagnostic tests (late 1990s) (2)
Myriad launches direct-to-consumer ad campaign (2002, 2001) (2)
Update of BRACAnalysis to include quantitative DNA measurement assay (BART) test to detect large exonic deletions and duplications (2007) (3, 4)
1. Shattuck-Eidens D et al. JAMA. 1997 Oct 15;278(15):1242-50.
2. Gold ER et al. Genet Med. 2010 Apr;12(4 Suppl):S39-70.
3. https://www.myriadpro.com/brac_bart
4. http://www.pnas.org/content/early/2010/06/23/1007983107.full.pdf+html
BRACAnalysis: Non-NGS test for clinically significant genetic mutations
Extraction of DNA from patient samples
PCR amplification
Identification of point mutations and indels: Sanger sequencing of individual exons
Detection of large rearrangements (deletions and duplications, see slides 34-35): quantitative PCR-based copy number analysis
Requires separate testing for large rearrangements
http://www.myriad.com/lib/technical-specifications/bracanalysis-Technical-Specifications.pdf
The fine print at the bottom of the test results This test and its performance characteristics were determined by Myriad Genetic Laboratories. It has not been reviewed by the U.S. Food and Drug Administration. The FDA has determined that such clearance or approval is not necessary.
Use Case 3 Expression-based BC classification
BC Classification
Protein level
Hormone receptor (estrogen or progesterone receptor) positive
ER positive/negative
PgR positive/negative
HER2 positive/negative
Triple negative: not positive for receptors for estrogen, progesterone, or HER2
Gene level: gene expression signatures
Ellsworth RE et al. Curr Genomics. 2010 May;11(3):146-61.
Laura J. van 't Veer, Nature 2008; Vol 452: 564-70
Oncotype DX
Optimization of methods for quantifying gene expression (1)
Selection of 250 cancer-related candidate genes, see slide 36 (2-5)
Analysis of candidate gene expression in multiple independent clinical studies, see slide 37 (6-8)
Clinical validation of 21-gene panel for prediction of chemotherapy benefit and 10-year distant recurrence for certain breast cancer patients
1. Cronin M, Am J Pathol 2004;164:35-42.
2. Perou CM, Nature 2000;406:747-52.
3. Golub TR, Science 1999;286:531-7.
4. van 't Veer LJ, Nature 2002;415:530-6.
5. Sorlie T, Proc Natl Acad Sci U S A 2001;98:10869-74.
6. Esteban J, Prog Proc Am Soc Clin Oncol 2003;22:850. abstract.
7. Cobleigh MA, Prog Proc Am Soc Clin Oncol 2003;22:850. abstract.
8. Paik S, Breast Cancer Res Treat 2003;82:A16. abstract.
Results of Oncotype DX 21-gene assay expressed as a recurrence score (RS), see slides 38-44
16 Cancer and 5 Reference Genes From 3 Studies
PROLIFERATION: Ki-67, STK15, Survivin, Cyclin B1, MYBL2
INVASION: Stromelysin 3, Cathepsin L2
HER2: GRB7, HER2
ESTROGEN: ER, PR, Bcl2, SCUBE2
GSTM1, CD68, BAG1
REFERENCE: Beta-actin, GAPDH, RPLPO, GUS, TFRC
RS = + 0.47 × HER2 Group Score - 0.34 × ER Group Score + 1.04 × Proliferation Group Score + 0.10 × Invasion Group Score + 0.05 × CD68 - 0.08 × GSTM1 - 0.07 × BAG1
Category: RS (0-100)
Low risk: RS < 18
Int risk: RS 18-30
High risk: RS ≥ 31
Paik et al. N Engl J Med. 2004;351:2817-2826.
http://www.oncotypedx.com/en-us/breast/healthcareprofessional/~/media/files/basic/breast/hcp/development_and_clinical_validation.ashx
Oncotype DX for personalized therapy
Scientific Rationale for Selecting Oncotype DX for Trial Assigning Individualized Options for Treatment (TAILORx)
1. Validated prognostic test for tamoxifen-treated patients; predictive of distant recurrence; may be used as categorical or continuous variable (Paik et al. NEJM, 2004)
2. Also validated in population-based Kaiser study (Habel et al. Breast Cancer Research, May 2006)
3. Lower RS predictive of tamoxifen benefit (Paik et al. ASCO 2005, abstr 510)
4. Higher RS predictive of chemotherapy benefit (Paik et al. JCO, August 2006)
5. Correlates more strongly with outcome than Adjuvant! (Bryant et al. St. Gallen, 2005)
6. Predictive of local recurrence in tamoxifen-treated patients (Mamounas, SABCS 2005, abstr 29)
http://www.oncotypedx.com/en-us/breast/healthcareprofessional/~/media/files/basic/breast/hcp/development_and_clinical_validation.ashx
Sparano, Clinical Breast Cancer, 2006
Sparano, ASCO Educational Book 2007
BC risk assessment tools
Freely available online programs to assess your risk for BC
Developer | Tool | Website
NCI | Breast Cancer Risk Assessment Tool | http://www.cancer.gov/bcrisktool/
Harvard Center for Cancer Prevention | Disease Risk Index | http://www.diseaseriskindex.harvard.edu
BreastCancerPrevention.com | Calculate Your Breast Cancer Risk | http://www.breastcancerprevention.org/raf_source.asp
Steven B. Halls, MD | Detailed Breast Cancer Risk Calculator | http://www.halls.md/breast/risk.htm
Additional tools are available for medical professionals
BRCAPRO (http://astor.som.jhmi.edu/bayesmendel/brcapro.html)
BRCA Risk Calculator (https://www.myriadpro.com/brca-risk-calculator)
Adjuvant! Online (https://www.adjuvantonline.com): estimates risks and benefits of adjuvant therapy after surgery
The NCI Risk Assessment Tool
Molecular classification of BC for personalized prognostics
Molecular signatures:
improved the ability to predict outcome
identify patients most likely to benefit from certain therapies
NOT 100% accurate
Test | Company | Assay Type (a) | # of Genes/Proteins | Classification
Breast Bioclassifier | University Genomics | qRT-PCR | 55 | Tumor subtype, therapeutic guidance
MammaPrint | Agendia | Microarray | 70 | Prognostic, therapeutic guidance
MammoStrat | Applied Genomics | IHC | 5 | Prognostic
MapQuant DX | Ipsogen | Microarray | 97 | Tumor grade
Oncotype DX | Genomic Health | qRT-PCR | 21 | Prognostic, therapeutic guidance
Rotterdam signature | Veridex | Microarray | 76 | Prognostic
(a) qRT-PCR = quantitative real-time PCR; IHC = immunohistochemistry
Ellsworth RE et al. Curr Genomics. 2010 May;11(3):146-61.
More molecular signatures on the market
Product | Company | Disease | Purpose
BC-SeraPro | Power3 | Breast cancer | Differentiation between breast cancer patients and control subjects.
Breast Cancer Index | biotheranostics | Breast cancer | Risk assessment and identification of patients likely to benefit from endocrine therapy, and whose tumors are likely to be sensitive or resistant to chemotherapy.
CancerTYPEID | biotheranostics | Cancer | Classification of 39 types of cancer.
CupPrint | Agendia | Cancer | Determination of the origin of the primary tumor.
GeneSearchBLN Assay | Veridex | Breast cancer | Determination of whether breast cancer has spread to the lymph nodes.
Insight DxBreast Cancer Profile | Clarient | Breast cancer | Prediction of disease recurrence risk.
OvaCheck | Correlogic | Ovarian cancer | Early detection of epithelial ovarian cancer.
OvaSure | LabCorp | Ovarian cancer | Assessment of the presence of early stage ovarian cancer in high-risk women.
Prostate Gene Expression Profile | Clarient | Prostate cancer | Diagnosis of grade 3 or higher prostate cancer.
PulmoStrat | Applied Genomics | Lung cancer | Assessment of an individual's risk of lung cancer recurrence following surgery for helping with adjuvant therapy decisions.
PulmoType | Applied Genomics | Lung cancer | Classification of non-small cell lung cancer into adenocarcinoma versus squamous cell carcinoma subtypes.
http://webdoc.nyumc.org/nyumc/files/chibi/attachments/path_class%201.pdf
Direct-to-Consumer Genetic Testing
No genetic test with established beneficial clinical utility incorporates new BC susceptibility variants identified by GWAS
BUT commercially available direct-to-consumer (DTC) testing offers genetic analysis and risk assessment
Company | Website | Cost (USD) | Genetic Counseling | Breast Cancer Susceptibility Variants
23andMe | www.23andme.com | $399 | No | 2 SNPs
deCODEme | www.decodeme.com | $985 (a) | Yes | 11 variants (b)
Knome | www.knome.com | Custom (c) | Yes | DNA sequence
Navigenics | www.navigenics.com | $999 (d) | Yes | unknown
(a) Complete scan.
(b) For women of European descent.
(c) KnomeSELECT is $24,500 for complete sequence of 20,000 genes; KnomeCOMPLETE is $99,500 for complete genome sequence.
(d) Option for ongoing subscription ($199 per year) for updates.
Ellsworth RE et al. Curr Genomics. 2010 May;11(3):146-61.
Still a ways to go to realize full potential of personalized medicine Most causative breast cancer genes have not yet been identified Ellsworth RE et al. Curr Genomics. 2010 May;11(3):146-61.
Where we have been
BRACAnalysis: current non-NGS approach for identifying clinically significant genetic mutations
Extraction of DNA from patient samples
PCR amplification
Identification of point mutations and indels: Sanger sequencing of individual exons
Detection of large rearrangements (deletions and duplications, see slides 34-35): quantitative PCR-based copy number analysis
Requires separate testing for large rearrangements
http://www.myriad.com/lib/technical-specifications/bracanalysis-Technical-Specifications.pdf
Where we are going
Targeted DNA capture followed by NGS for detection of clinically important inherited mutations
Paired-end library from germline DNA
DNA capture using hybridization in solution to custom oligonucleotides
Library enriched for targeted genomic regions
Raw sequence data (~5GB)
Sequence 2 × 76-bp reads
Filtering for high-quality reads
Mapping using MAQ
Alignment to the reference human genome (GRCh37), see slide 45
http://www.pnas.org/content/early/2010/06/23/1007983107.full.pdf+html
Targeted DNA capture followed by NGS for detection of clinically important inherited mutations
Alignment to the reference human genome (GRCh37)
Known variants, novel variants, read depth
Exclude common variants by comparing with dbSNP, see slide 46
Candidate variants
Compare to mutation databases (LSDBs)
Large deletions and duplications, see slide 47
Predicted effect on mRNA and protein, see slides 48-49
Does not require separate testing for large rearrangements
http://www.pnas.org/content/early/2010/06/23/1007983107.full.pdf+html
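The filtering steps on this slide (exclude common variants, then compare the remaining candidates with mutation databases) can be sketched as set operations. This is a minimal illustration, not the pipeline used in the cited study; the `Variant` type, the function names, and every coordinate and database entry below are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_variants(called, common_variants):
    """Drop calls that match a known common variant; keep the rest."""
    return [v for v in called if v not in common_variants]


def annotate(candidates, mutation_db):
    """Pair each candidate with its mutation-database entry, or None."""
    return [(v, mutation_db.get(v)) for v in candidates]


# Hypothetical calls and lookup tables, for illustration only.
called = [
    Variant("17", 100, "A", "G"),
    Variant("17", 200, "C", "T"),
    Variant("13", 300, "G", "A"),
]
common = {Variant("17", 100, "A", "G")}  # stands in for a dbSNP lookup
mutation_db = {Variant("17", 200, "C", "T"): "previously reported (made-up entry)"}

candidates = candidate_variants(called, common)
annotated = annotate(candidates, mutation_db)
```

Candidates that come back with `None` are the ones with no database match, which would then need the effect prediction step listed on the slide.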
Conclusions of proof-of-principle study of an NGS approach to enable accurate and cost-effective detection of mutations for breast and ovarian cancer
Identified a wide variety of mutations in various genes in all test samples with no false-positive calls
Detected large deletions and duplications that would have been missed by standard sequencing
Does not require separate testing for large rearrangements
Cost-effective
Cost of reagents and consumables for NGS analysis of 21 cancer-associated genes < $1,500
Standard BRACAnalysis costs $3,340
Additional testing for gene rearrangements is another $650
http://www.pnas.org/content/early/2010/06/23/1007983107.full.pdf+html
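The cost comparison can be worked through with the three figures quoted on this slide. The comparison is rough: the NGS figure covers reagents and consumables only, while the other two are test prices, so the difference below is an illustration rather than a like-for-like saving. The variable names are hypothetical.

```python
# Figures quoted on the slide (USD).
NGS_REAGENT_COST_MAX = 1500     # upper bound: reagents and consumables only
BRACANALYSIS_COST = 3340        # standard test
REARRANGEMENT_TEST_COST = 650   # additional rearrangement testing

# Full conventional workup: standard test plus rearrangement testing.
conventional_total = BRACANALYSIS_COST + REARRANGEMENT_TEST_COST

# Because the NGS figure is an upper bound, this is a lower bound on the gap.
minimum_difference = conventional_total - NGS_REAGENT_COST_MAX
```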
Futuristic Paradigm for Cancer Care See Readings and References list: http://lpm.hms.harvard.edu/palaver/resources Boguski MS et al. F1000 Biol Rep. 2009 Sep 28;1. pii: 73.
The Future is NOW! This week in JAMA Link DC et al. JAMA. 2011 Apr 20;305(15):1568-76. Welch JS et al. JAMA. 2011 Apr 20;305(15):1577-84.
http://www.illumina.com/landing/idea The impact of a modular pipeline for the extraction of clinically relevant genomic data envisioned by the Laboratory for Personalized Medicine (PI: Peter Tonellato) http://www.youtube.com/watch?v=xqdw5mgazkk
Linked SNPs (Tonellato, NGS - Applications and Implications I) Recombination leads to the generation of new combinations of alleles Group of alleles located close to each other on a chromosome are rarely separated by recombination Neighboring SNPs are frequently inherited together Crossing-over and recombination during meiosis http://www.med.nyu.edu/rcr/rcr/course/geneticvariation2.ppt
Why have SVs been ignored? (Tonellato, NGS - Applications and Implications I) SV traditionally defined as deletions, insertions, or inversions > 1 kb Often involves repetitive regions of the genome and complex rearrangements Importance not recognized No optimal method for SV discovery http://www.stanford.edu/class/gene211/lectures/lecture5_personalgenomics.pdf Snyder M et al. Genes Dev. 2010 Mar 1;24(5):423-31.
What is Copy Number Variation (CNV)? (Wall, NGS - Data Analysis and Computation - II) Deletion Duplication Insertion Mobile Element Insertion Alu/Line/SVA Target site duplications Slide courtesy of Ryan Mills, PhD
Selection of 250 cancer-related candidate genes (8, 10, 11)
DNA microarray analysis to classify breast cancers and identify a gene expression signature strongly predictive of a short interval to distant metastasis.
42 breast cancers (8)
117 breast cancers (10)
78 cancers, 3 fibroadenomas, 4 normal breast samples (11)
8. Perou CM, Nature 2000;406:747-52.
10. van 't Veer LJ, Nature 2002;415:530-6.
11. Sorlie T, Proc Natl Acad Sci U S A 2001;98:10869-74.
Analysis of candidate gene expression in multiple independent clinical studies (12-14)
Three independent clinical studies of breast cancer, 447 patients. (Data are not shown.)
12. Esteban J, Prog Proc Am Soc Clin Oncol 2003;22:850. abstract.
13. Cobleigh MA, Prog Proc Am Soc Clin Oncol 2003;22:850. abstract.
14. Paik S, Breast Cancer Res Treat 2003;82:A16. abstract.
Algorithm for Recurrence Score
The recurrence score, on a scale from 0 to 100, is derived from the reference-normalized expression measurements
1. Expression for each gene is normalized relative to the expression of the five reference genes
2. The GRB7, ER, proliferation, and invasion group scores are calculated
3. The unscaled recurrence score (RSU) is calculated with the use of coefficients that are predefined on the basis of regression analysis of gene expression and recurrence in the three training studies
4. The recurrence score (RS) is rescaled from the unscaled recurrence score
Paik S, N Engl J Med. 2004 Dec 30;351(27):2817-26
1. Expression for each gene is normalized relative to the expression of the five reference genes
ACTB [the gene encoding β-actin], GAPDH, GUS, RPLPO, and TFRC
Reference-normalized expression measurements range from 0 to 15, with a 1-unit increase reflecting approximately a doubling of RNA.
Genes are grouped on the basis of function, correlated expression, or both.
Paik S, N Engl J Med. 2004 Dec 30;351(27):2817-26
2. The GRB7, ER, proliferation, and invasion group scores are calculated from individual gene-expression measurements, as follows:
GRB7 group score = 0.9 × GRB7 + 0.1 × HER2 (if the result is less than 8, then the GRB7 group score is considered 8)
ER group score = (0.8 × ER + 1.2 × PGR + BCL2 + SCUBE2) ÷ 4
Proliferation group score = (Survivin + KI67 + MYBL2 + CCNB1 [the gene encoding cyclin B1] + STK15) ÷ 5 (if the result is less than 6.5, then the proliferation group score is considered 6.5)
Invasion group score = (CTSL2 [the gene encoding cathepsin L2] + MMP11 [the gene encoding stromelysin 3]) ÷ 2
Paik S, N Engl J Med. 2004 Dec 30;351(27):2817-26
3. The unscaled recurrence score (RSU) is calculated with the use of coefficients that are predefined on the basis of regression analysis of gene expression and recurrence in the three training studies
RSU = + 0.47 × GRB7 group score - 0.34 × ER group score + 1.04 × proliferation group score + 0.10 × invasion group score + 0.05 × CD68 - 0.08 × GSTM1 - 0.07 × BAG1
A plus sign indicates that increased expression is associated with an increased risk of recurrence.
A minus sign indicates that increased expression is associated with a decreased risk of recurrence.
Paik S, N Engl J Med. 2004 Dec 30;351(27):2817-26
4. The recurrence score (RS) is rescaled from the unscaled recurrence score
RS = 0 if RSU < 0; RS = 20 × (RSU - 6.7) if 0 ≤ RSU ≤ 100; and RS = 100 if RSU > 100.
Paik S, N Engl J Med. 2004 Dec 30;351(27):2817-26
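Steps 2 to 4 can be sketched in a few lines of Python. This is an illustrative reading of the published formulas, not the commercial assay: the input dictionary and function names are hypothetical, the final clamp of RS to the 0-100 range is an assumption based on the stated scale, and the handling of scores between 30 and 31 is an assumption because the risk categories are given for whole numbers.

```python
def unscaled_recurrence_score(expr):
    """expr maps gene name -> reference-normalized expression (0 to 15)."""
    # Step 2: group scores, with the two thresholds stated on the slide.
    grb7_group = max(0.9 * expr["GRB7"] + 0.1 * expr["HER2"], 8.0)
    er_group = (0.8 * expr["ER"] + 1.2 * expr["PGR"]
                + expr["BCL2"] + expr["SCUBE2"]) / 4
    prolif_group = max((expr["Survivin"] + expr["KI67"] + expr["MYBL2"]
                        + expr["CCNB1"] + expr["STK15"]) / 5, 6.5)
    invasion_group = (expr["CTSL2"] + expr["MMP11"]) / 2
    # Step 3: weighted sum with the predefined coefficients.
    return (0.47 * grb7_group - 0.34 * er_group + 1.04 * prolif_group
            + 0.10 * invasion_group + 0.05 * expr["CD68"]
            - 0.08 * expr["GSTM1"] - 0.07 * expr["BAG1"])


def rescale(rsu):
    """Step 4: RS = 20 x (RSU - 6.7), clamped to 0-100 (clamp is assumed)."""
    return min(100.0, max(0.0, 20.0 * (rsu - 6.7)))


def risk_category(rs):
    """Low < 18, intermediate 18-30, high >= 31 (30-31 treated as intermediate)."""
    if rs < 18:
        return "low"
    if rs < 31:
        return "intermediate"
    return "high"


# Hypothetical expression values, for illustration only.
sample = {"GRB7": 8, "HER2": 9, "ER": 10, "PGR": 8, "BCL2": 9, "SCUBE2": 9,
          "Survivin": 6, "KI67": 6, "MYBL2": 6, "CCNB1": 6, "STK15": 6,
          "CTSL2": 5, "MMP11": 7, "CD68": 8, "GSTM1": 8, "BAG1": 8}
sample_rs = rescale(unscaled_recurrence_score(sample))
```

The proliferation group carries the largest coefficient (1.04), so raising the five proliferation genes in `sample` moves the score far more than a comparable change in any other group.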
Validation Study Oncotype DX
Albain KS, Lancet Oncol. 2010 Jan;11(1):55-6
Identification of genetic variants (Tonellato, NGS - Applications and Implications II)
Sample sequence compared to the reference genome to detect variants
Example: Paired-end mapping (PEM) to identify structural variants
Figure: sample read pairs mapped to the reference, shown as concordant, relative insertion, relative deletion, and relative inversion
http://www.ncbi.nlm.nih.gov/dbvar/content/overview/
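The logic behind paired-end mapping can be sketched for a single read pair: the distance between the two mapped ends on the reference is compared with the distance expected from the library's insert size. This is a simplified illustration; the function name, parameters, and thresholds are hypothetical, and practical callers require support from many read pairs before reporting a variant.

```python
def classify_read_pair(mapped_span, expected_span, tolerance, orientation_ok=True):
    """Label one read pair by its mapped span (bp) on the reference.

    expected_span and tolerance describe the library's insert-size
    distribution; both are inputs the caller must supply.
    """
    if not orientation_ok:
        # Ends map with an unexpected relative orientation.
        return "relative inversion"
    if mapped_span > expected_span + tolerance:
        # Ends map too far apart: the sample lacks sequence present in the reference.
        return "relative deletion"
    if mapped_span < expected_span - tolerance:
        # Ends map too close together: the sample carries extra sequence.
        return "relative insertion"
    return "concordant"
```

For example, with an expected span of 300 bp and a tolerance of 50 bp, a pair mapping 1,200 bp apart would be labeled a relative deletion.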
(Tonellato, NGS - Applications and Implications II)
Submitter SNP records with ss identifiers referring to the same genomic location are grouped into reference SNP records with rs identifiers
Figure labels: Mutation data collection methods; Submitter data; Variation data; Submitted SNP (three records); Reference SNP Record; NCBI Resource Links; External Mutation Databases
http://www.ncbi.nlm.nih.gov/projects/snp/get_html.cgi?whichhtml=how_to_submit
Sequence-based methods for SV detection (Tonellato, NGS - Applications and Implications I)
(A) Paired-end reads to detect insertions and deletions
(B) Split-read methods for breakpoint identification
(C) Read depth analysis to detect CNVs
(D) Local reassembly to reconstruct novel insertions
Snyder M et al. Genes Dev. 2010 Mar 1;24(5):423-31.
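Read depth analysis (C) rests on the idea that the number of reads covering a region scales with the number of copies of that region. A minimal sketch under strong assumptions: windows are equal in size, most of the sample is at the baseline of two copies, and the median window depth represents that baseline. The function name and the depths are hypothetical, and real tools also correct for biases that this sketch ignores.

```python
from statistics import median


def estimate_copy_number(window_depths, baseline_copies=2):
    """Estimate an integer copy number per window from read depth,
    treating the median depth as the depth of `baseline_copies` copies."""
    baseline_depth = median(window_depths)
    return [round(baseline_copies * depth / baseline_depth)
            for depth in window_depths]


# Hypothetical mean read depth in consecutive windows.
depths = [100, 98, 102, 51, 49, 100, 151, 100]
copy_numbers = estimate_copy_number(depths)
```

Windows at roughly half the baseline depth suggest a heterozygous deletion, and windows at roughly one and a half times the baseline suggest a duplication.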
Functional classification of SNPs (Tonellato, NGS - Applications and Implications I)
Functional class: Description
nonsense: within an exon and translated, amino acid changed to stop codon
frameshift: within an exon and translated, insertion or deletion interrupts the reading frame
coding-nonsynonymous: within an exon and translated, protein amino acid change, but not nonsense or frameshift; dbSNP calls this missense
splice-5 or splice-3: in first two bases or last two bases of an intron
coding-synonymous: within an exon and translated, no protein amino acid change
utr-5 or utr-3: within an exon, but not translated
near-gene-5 or near-gene-3: intergenic, but within 2000 bases of a transcribed region
intron: between exons
http://gvs.gs.washington.edu/gvs/helpsnpsummary.jsp
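For SNPs inside a translated exon, the three coding classes in the table can be assigned by translating the reference and variant codons with the standard genetic code. The sketch below covers only those three classes and assumes the codon and reading frame are already known; the function name is hypothetical, and a change that removes a stop codon has no class of its own in the table, so it falls under coding-nonsynonymous here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Return the functional class of a single-codon change in a coding exon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "coding-synonymous"
    if alt_aa == "*":
        return "nonsense"  # amino acid changed to a stop codon
    return "coding-nonsynonymous"
```

For example, GAA to GAG leaves glutamate unchanged, GAG to GTG replaces glutamate with valine, and GAA to TAA introduces a stop codon.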
Potential biological consequences of SNPs (Tonellato, NGS - Applications and Implications I)
Nonsynonymous SNPs can affect protein structure and function: normal protein structure vs. SNP encodes for a different 3-dimensional structure
Splice-site SNPs can affect mRNA splicing: correct splicing, normal mRNA produced vs. SNP leads to incorrect splicing; intron is not removed and abnormal mRNA produced
Termination codon introducing SNPs can affect mRNA stability and protein function: no change in the coding region, normal mRNA and protein produced vs. SNP results in creation of a premature termination codon; degraded mRNA or a protein product with abnormal function is produced
Adapted from Savas, S. et al. Oncologist 2009;14:657-666.