Whole Genome Based Plant Breeding: Platforms and Technologies. Gengyun Zhang, Ph.D., BGI-SZ. Email: zhanggengyun@genomics.org.cn; www.genomics.org.cn
From a single gene to the whole genome: 1984, discussions; 1990, launch; 1996, first bacterium sequenced; 1998, China joined; 1999, BGI founded (1%); 2000, working draft completed (published in 2001); 2003, HGP goals met; phi X; 2006, project completion.
Training; ethics; technology development; genomics and society; educational resources; genomics and health; computational biology; genomics and biology; Human Genome Project. The HGP marked the arrival of the genomics era.
Sequence more, light more Cost & Time
BGI current sequencing capacity: 128 Illumina HiSeq 2000 (2010); 2 Life Tech SOLiD 4 + 25 Life Tech SOLiD 4 (March 2010); 16 ABI 3730xl + 110 MegaBACE. Data production: 2 Tb/day (now), 5 Tb/day (2010); about 100 complete human genomes a day.
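The "about 100 human genomes a day" figure is a conversion from raw output to genomes at some sequencing depth. A minimal Python sketch of that arithmetic, assuming a haploid human genome of roughly 3.1 Gb; the depth values are illustrative, since the slide does not state the coverage behind its estimate.

```python
def genomes_per_day(output_tb_per_day: float,
                    genome_size_gb: float = 3.1,
                    depth: float = 30.0) -> float:
    """Genomes per day = daily raw output / (genome size x target depth).

    genome_size_gb and depth are assumptions, not figures from the slide.
    """
    gb_needed_per_genome = genome_size_gb * depth  # raw Gb for one genome
    return output_tb_per_day * 1000.0 / gb_needed_per_genome  # 1 Tb = 1000 Gb


if __name__ == "__main__":
    # Illustrative only: the planned 5 Tb/day at several assumed depths.
    for depth in (10, 16, 30):
        print(f"{depth}x: {genomes_per_day(5.0, depth=depth):.1f} genomes/day")
```

Under these assumptions, 5 Tb/day corresponds to roughly 100 genomes at 16x and about 54 at 30x, so the headline number depends heavily on the chosen depth.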
BGI supercomputing
Date | CPUs | FLOPS | RAM | Storage
2009.01 | 1,500 | 18T | 4 TB | 2 PB
2009.08 | 3,000 | 50T | 10 TB | 5 PB
2009.12 | 5,000 | 100T | 20 TB | 10 PB
2010.09 | 50,000 | 1,000T (1P) | 200 TB | 1,000 PB (1 EB)
BGI software and databases. Genome assembly: RePS, SOAP. Genome annotation: BGF, ReAS. Comparative genomics: FGF, KaKs_Calculator, CAT. BGI-developed software: the SOAP series (SOAPsnp, SOAPdenovo, SOAPsv, SOAPaligner, SOAPmeth, ...).
Bioinformatics team: 700 people working on bioinformatics, including employees, PhD students, a few young undergraduate students, and even a few high school students!
SEQUENCING, SEQUENCING & SEQUENCING: sequencing is basic! Find genes: whole-genome sequencing; whole-genome/targeted-region resequencing; targeted-region sequencing; exome sequencing; identification of SVs and SNPs. Study gene regulation: DNA methylation; gene expression (mRNA, ncRNA, small RNA, microRNA, regulatory RNA); whole-transcriptome shotgun sequencing; protein-DNA interactions (ChIP-seq). Metagenomics.
Genotyping platforms: MassARRAY genotyping system; iSelect genotyping system.
Genomics cannot be done alone; it calls for international collaboration. BGI: a premium scientific collaborator.
Important publications before 2009
Main articles since 2009: in 2009, six papers were published in Science, Nature, or Nature-series journals.
Platform of Agriculture: a technical support platform for global molecular breeding
The Key Laboratory of Genomics, Ministry of Agriculture, China, has been established in the Department of Agriculture & Bioenergy, BGI.
The first domesticated crop in China? Yellow River region, China; Neolithic era. According to Chinese legend, Hou Su began to cultivate foxtail millet. Seeds from 5,000 years BC; figure from an ancient book.
Short life cycle: seed → seedling → flowering → mature, in 8 weeks under light control.
Abundant seeds
Short stature: foxtail millet compared with maize and sorghum
Setaria italica: genome size 500 Mb; 2n = 2x = 18
Solexa sequencing. Male parent: Zhanggu 1 (drought tolerance, herbicide resistance). Female parent: A2 sterile lines. Resequencing. Resistance-specific genes; up/down-regulated genes; tissue-specific expression genes; millet-specific expression genes; early-response/later-tolerance genes; photo-thermo-sensitive genic sterile (PTGS).
Sequence strategy
Data production of Setaria italica (Solexa reads)
Insert size | Total length (Gb) | Sequence depth (X)
150~200 bp | 3.79 | 7.59
350 bp | 10.56 | 21.13
500~700 bp | 9.33 | 18.66
2 kb | 5.1 | 10.2
5 kb | 8.12 | 16.25
10 kb | 4.36 | 8.72
Total | 41.26 | 82.55
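The depth column is the data volume divided by the genome size. A small Python sketch recomputing it from the table, assuming the 500 Mb genome size stated earlier; the library labels are copied from the table and the function names are hypothetical.

```python
# Total Solexa data per insert-size library (Gb), copied from the table above.
LIBRARIES = {
    "150~200 bp": 3.79,
    "350 bp": 10.56,
    "500~700 bp": 9.33,
    "2 kb": 5.1,
    "5 kb": 8.12,
    "10 kb": 4.36,
}


def sequence_depth(total_gb: float, genome_size_gb: float = 0.5) -> float:
    """Depth (X) = bases sequenced / genome size (0.5 Gb assumed for Setaria italica)."""
    return total_gb / genome_size_gb


def depth_table(libraries: dict[str, float], genome_size_gb: float = 0.5) -> dict[str, float]:
    """Per-library depth plus the combined depth over all libraries."""
    table = {name: sequence_depth(gb, genome_size_gb) for name, gb in libraries.items()}
    table["Total"] = sequence_depth(sum(libraries.values()), genome_size_gb)
    return table


if __name__ == "__main__":
    for name, depth in depth_table(LIBRARIES).items():
        print(f"{name}: {depth:.2f}X")
```

The recomputed values (for example 7.58X and 82.52X) differ from the table in the second decimal place, presumably because the slide computed depth from unrounded data volumes.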
Setaria italica assembly
Statistic | Contig size (bp) | Contig number | Scaffold size (bp) | Scaffold number
N90 | 3933 | 19541 | 80106 | 1262
N80 | 8685 | 13253 | 149429 | 897
N70 | 12980 | 9678 | 208856 | 667
N60 | 17262 | 7124 | 267770 | 494
N50 | 22023 | 5159 | 334308 | 357
Total size (bp) | 383000881 | | 409671546 |
Total number (>100 bp) | | 96678 | | 49897
Total number (>2 kb) | | 24552 | | 2554
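The size and number columns appear to follow the usual Nx convention: sort sequences longest first and report the length (and count) at which the running total first reaches x% of the assembly. A minimal Python sketch of that statistic; it measures against the assembly total, not an estimated genome size (which would give NGx instead).

```python
def nx(lengths: list[int], x: float = 50.0) -> tuple[int, int]:
    """Return (Nx length, Nx number) for contig or scaffold lengths.

    Sequences are sorted longest first; Nx length is the length of the
    sequence at which the running total first reaches x% of the assembly
    size, and Nx number is how many sequences that takes.
    """
    if not lengths:
        raise ValueError("no sequences given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100.0
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    raise RuntimeError("unreachable for x <= 100")


if __name__ == "__main__":
    toy_contigs = [100, 200, 300, 400]  # illustrative lengths, not real data
    for x in (50, 90):
        print(f"N{x}:", nx(toy_contigs, x))
```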
Genetic linkage map of chromosomes: an F2 population of 537 individuals; 700 SV markers; a linkage map of 700 markers in 9 linkage groups, finished in 64 days; 80% of the sequence anchored.
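Building a linkage map involves converting recombination fractions between markers into map distances. The slide does not say which mapping function was used; the Python sketch below shows the two standard choices, Haldane (no interference) and Kosambi (allows for interference), as an illustration only.

```python
import math


def _check(r: float) -> None:
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM: d = -1/2 ln(1 - 2r) Morgans."""
    _check(r)
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM: d = 1/4 ln((1 + 2r) / (1 - 2r)) Morgans."""
    _check(r)
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.3):
        print(f"r={r}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

For small r both functions give nearly r x 100 cM; they diverge as r approaches 0.5.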
Genetic linkage map
Comparison: Setaria vs. sorghum (figure)
Annotation: top 15 InterPro hits
InterPro accession | Number of genes | Function
IPR011009 | 1450 | Protein kinase-like domain
IPR000719 | 1370 | Protein kinase, catalytic domain
IPR002290 | 1304 | Serine/threonine-protein kinase domain
IPR020635 | 1251 | Tyrosine-protein kinase, subgroup, catalytic domain
IPR017442 | 1218 | Serine/threonine-protein kinase-like domain
IPR001611 | 576 | Leucine-rich repeat
IPR001810 | 569 | Cyclin-like F-box
IPR002885 | 500 | Pentatricopeptide repeat
IPR016040 | 439 | NAD(P)-binding domain
IPR001841 | 416 | Zinc finger, RING-type
IPR001128 | 406 | Cytochrome P450
IPR009057 | 372 | Homeodomain-like
IPR002182 | 361 | NB-ARC
IPR018957 | 352 | Zinc finger, C3HC4 RING-type
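A ranking like this counts, for each InterPro accession, how many genes carry at least one hit. A minimal Python sketch, assuming the annotation has already been parsed into a mapping from gene ID to a set of accessions; the gene IDs are hypothetical, and parsing of any particular annotation tool's output is not shown.

```python
from collections import Counter


def top_interpro(gene_hits: dict[str, set[str]], n: int = 15) -> list[tuple[str, int]]:
    """Rank InterPro accessions by the number of distinct genes with a hit.

    gene_hits maps a (hypothetical) gene ID to the InterPro accessions
    assigned to it; a gene with several hits to one accession counts once.
    """
    counts = Counter()
    for accessions in gene_hits.values():
        counts.update(set(accessions))
    return counts.most_common(n)


if __name__ == "__main__":
    toy = {
        "gene_001": {"IPR011009", "IPR000719"},
        "gene_002": {"IPR011009"},
        "gene_003": {"IPR001611"},
    }
    print(top_interpro(toy, n=2))
```

Counting genes rather than raw hits keeps multi-domain proteins from inflating a family's rank.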
Setaria italica transformation system: regeneration, rooting, GUS staining
Rice, millet, sweet sorghum, soybean, maize, potato, cassava, cucumber
Sex determination gene (M/m): genetics
Case study: cloning of the M gene. Candidate genes narrowed step by step: genome sequencing (26,682) → genetic map, 0.2 cM (45) → association study, 50 varieties (8) → comparative genetics vs. melon (3) → digital gene expression, 10 tissues (1).
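The cloning strategy is a funnel: keep annotated genes inside the fine-mapped interval, then retain only those supported by each further line of evidence. A Python sketch of that bookkeeping; the Gene record, the interval coordinates and the evidence sets are all hypothetical stand-ins, not the data behind the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_in_interval(genes: list[Gene], chrom: str, start: int, end: int) -> list[Gene]:
    """Genes overlapping the mapped interval [start, end] on one chromosome."""
    return [g for g in genes if g.chrom == chrom and g.start <= end and g.end >= start]


def narrow_candidates(candidates: list[Gene], *evidence: set[str]) -> tuple[list[Gene], list[int]]:
    """Apply successive evidence filters (sets of gene IDs); record counts per step."""
    counts = [len(candidates)]
    for keep in evidence:
        candidates = [g for g in candidates if g.gene_id in keep]
        counts.append(len(candidates))
    return candidates, counts


if __name__ == "__main__":
    genes = [Gene("a", "1", 100, 200), Gene("b", "1", 250, 300), Gene("c", "2", 100, 200)]
    interval = genes_in_interval(genes, "1", 150, 260)
    # Hypothetical evidence sets, e.g. association hits, then expression support.
    final, counts = narrow_candidates(interval, {"a", "b"}, {"a"})
    print([g.gene_id for g in final], counts)
```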
Extensive genome comparison between cultivated rice and wild rice: low-coverage resequencing of 25 core cultivated rice lines and 25 wild rice lines has been finished.
Selective sweep: inheritance of regions around adaptive alleles (from Andersson and Georges, Nature Reviews Genetics 5: 202-212, 2004). Extent of selective sweeps for domestication in maize: tb1 locus (60 to 90 kb) (Clark et al. 2004); Y1 locus (about 600 kb) (Palaisa et al. 2004).
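One common way to scan for domestication sweeps is to compare nucleotide diversity (pi) between wild and cultivated accessions in sliding windows: a sweep leaves a window where cultivated diversity is sharply reduced, so pi_wild / pi_cultivated is high. The slide does not say which statistic was used; the Python sketch below is an illustration that assumes homozygous per-accession calls at biallelic sites, with window size, step and any cut-off left to the user.

```python
from collections import Counter


def site_diversity(calls: str) -> float:
    """Unbiased probability that two accessions differ at one site.

    calls holds one homozygous base per accession ('A', 'C', 'G', 'T');
    'N' marks missing data and is skipped.
    """
    observed = [c for c in calls if c != "N"]
    n = len(observed)
    if n < 2:
        return 0.0
    same = sum(k * (k - 1) for k in Counter(observed).values())
    return 1.0 - same / (n * (n - 1))


def window_pi(snps: list[tuple[int, str]], start: int, size: int) -> float:
    """Nucleotide diversity per bp in [start, start + size)."""
    return sum(site_diversity(calls) for pos, calls in snps
               if start <= pos < start + size) / size


def diversity_ratio(wild: list[tuple[int, str]], cultivated: list[tuple[int, str]],
                    chrom_length: int, size: int, step: int):
    """Return (start, pi_wild, pi_cultivated, ratio) for each window."""
    rows = []
    for start in range(0, chrom_length - size + 1, step):
        pi_w = window_pi(wild, start, size)
        pi_c = window_pi(cultivated, start, size)
        if pi_c > 0:
            ratio = pi_w / pi_c
        else:
            ratio = float("inf") if pi_w > 0 else float("nan")
        rows.append((start, pi_w, pi_c, ratio))
    return rows
```

Windows with the highest ratios are candidate sweep regions; how high counts as "high" (for example a genome-wide top percentile) is a choice the analyst has to make.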
We found many selective sweeps!
Neighbor-joining phylogenetic trees for all accessions, based on the number of SNPs: whole-genome SNPs; SNPs surrounding sh4 (seed shattering); SNPs surrounding prog1 (tillering).
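Trees like these start from a pairwise distance matrix (here, the number of differing SNPs between accessions) and then apply neighbor joining. The Python sketch below is the standard Saitou and Nei neighbor-joining algorithm returning only an unrooted topology in Newick form, without branch lengths or bootstrap support. It is an illustration, not the software used for the slide.

```python
def snp_distance(a: str, b: str) -> int:
    """Number of SNP sites where two accessions differ, skipping missing calls ('N')."""
    return sum(1 for x, y in zip(a, b) if x != "N" and y != "N" and x != y)


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Neighbor-joining topology (Newick string, no branch lengths)."""
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}  # net divergence
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        u = f"({i},{j})"  # new internal node joining i and j
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    return f"({nodes[0]},{nodes[1]});"


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    matrix = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 2], [6, 6, 2, 0]]
    print(neighbor_joining(names, matrix))
```

In practice, a dedicated phylogenetics package would be used to add branch lengths and bootstrap values; the sketch only shows how SNP counts drive the joins.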
>500 candidates related to domestication; >150 with high confidence. Identifying their functions: transgenic studies (overexpression or RNAi).
High-throughput transgenic platform: gene → transgenic → function → seed → patent
High-throughput transgenic capacity: 5,000 genes/year. Finished in 2009: 242 genes, of which 24 show significant differences from the wild type (WT); patents: 13 genes.
Molecular Breeding Part I: Determination of Core Gene Set (Who is Who?)
Systematic molecular breeding: genome sequencing, germplasm resequencing, evolutionary genomics, a mutant library, LD mapping, linkage analysis, digital expression profiling and genetic transformation lead to a core gene set (100-1,000 genes) and gene test kits, which support germplasm selection (efficiency 1000x; breeding time cut to 1/2-1/3) and a new cultivar.
Molecular Breeding Part II: Determination of higher quality (desired) allele (Which one is better?)
Accumulation of desirable alleles at different loci: loci A, B, C and D carry different alleles in germplasm G1, G2, G3, ..., Gn. High-density markers are developed by resequencing G1 to Gn. Marker-assisted selection allows quick selection of desirable progenies, giving a better combination (new cultivar).
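Marker-assisted selection here amounts to genotyping progenies at marker loci and keeping those carrying the most desirable alleles. A minimal Python sketch, assuming one homozygous allele call per locus for each progeny (as in inbred material) and equal weight for every locus. The loci A to D come from the slide; the allele names and progeny IDs are hypothetical.

```python
def desirable_allele_count(genotype: dict[str, str], desirable: dict[str, str]) -> int:
    """Number of loci at which a progeny carries the desirable allele."""
    return sum(1 for locus, allele in desirable.items() if genotype.get(locus) == allele)


def select_progenies(progenies: dict[str, dict[str, str]],
                     desirable: dict[str, str], top: int = 1) -> list[str]:
    """Rank progenies by desirable-allele count (ties keep input order); keep the best."""
    ranked = sorted(progenies,
                    key=lambda name: desirable_allele_count(progenies[name], desirable),
                    reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    desirable = {"A": "a1", "B": "b2", "C": "c1", "D": "d3"}  # hypothetical best alleles
    progenies = {
        "P1": {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
        "P2": {"A": "a1", "B": "b2", "C": "c1", "D": "d3"},
        "P3": {"A": "a2", "B": "b2", "C": "c2", "D": "d1"},
    }
    print(select_progenies(progenies, desirable, top=2))
```

Real selection indices usually weight loci by effect size and handle heterozygotes; equal weights keep the sketch simple.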
Marker-assisted selection based on single segment substitution lines (SSSLs): SSSLs → selection of good SSSLs → desirable SSSLs → rapid accumulation of desirable alleles supported by MAS → new cultivars. A rice marker-assisted selection platform based on SSSLs; release of new cultivars (NC).
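Molecular design breeding based on SSSLs: Huaxiaohei 1. (Graphical genotype, chromosomes 1 to 12; Pb locus marked.) Huaxiaohei 1 is a single segment substitution line carrying one substituted segment of chromosome 4 from the black rice parent Lianjian 33; its genetic background is similar to that of Huajingxian 74. It passed variety registration in Guangdong Province in 2005 (Yue Shen Dao 2005015). The registration of Huaxiaohei 1 shows that individual traits can be improved through chromosome segment substitution. (Figure: Huaxiaohei 1 vs. Huajingxian 74.) From: Dr. Guiquan Zhang, South China Agricultural University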
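Molecular design breeding based on SSSLs: Huabiao 1. (Graphical genotype, chromosomes 1 to 12; loci GS3, alk, Wx-20G, gw8.) Huabiao 1 is a three-segment pyramided line carrying one substituted segment of chromosome 6 from Lemont and one substituted segment each of chromosomes 3 and 8 from Zhong 4188; its genetic background is similar to that of Huajingxian 74. In Guangdong provincial regional trials, the grain quality of Huabiao 1 reached national standard Grade 2, and its yield was comparable to that of the control. It passed variety registration in Guangdong Province in 2009. From: Dr. Guiquan Zhang, South China Agricultural University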
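Molecular design breeding based on SSSLs: Huabiao 3. (Graphical genotype, chromosomes 1 to 12; loci GS3, alk, Wx-20G, fgr, gw8.) Huabiao 3 is a four-segment pyramided line whose substituted segments come from chromosomes 3, 6 and 8; its genetic background is similar to that of Huajingxian 74. Huabiao 3 will enter regional trials in Guangdong and other provinces in 2009. Building on Huabiao 1, Huabiao 3 adds the fragrance gene (fgr), so its grain quality is even better.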
Resequencing of a rice RIL population. Populations for plant genotyping: F2, backcross (BC), recombinant inbred lines (RILs), doubled haploids (DH). Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation. After N generations of selfing, RILs are homozygous at almost every locus, and the two parental alleles segregate A:G = 1:1.
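The expected 1:1 ratio gives a simple check for segregation distortion at each SNP: a chi-square goodness-of-fit test with one degree of freedom. A minimal Python sketch, assuming each RIL is scored as carrying the A or the G allele and that heterozygous or missing calls have already been excluded; 3.841 is the standard critical value for df = 1 at alpha = 0.05.

```python
CHI2_CRIT_DF1_P05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def chi_square_1to1(count_a: int, count_g: int) -> float:
    """Chi-square statistic for observed A:G counts against an expected 1:1 ratio."""
    total = count_a + count_g
    if total == 0:
        raise ValueError("no genotyped lines")
    expected = total / 2
    return ((count_a - expected) ** 2 + (count_g - expected) ** 2) / expected


def is_distorted(count_a: int, count_g: int, critical: float = CHI2_CRIT_DF1_P05) -> bool:
    """True if the marker deviates significantly from 1:1 segregation."""
    return chi_square_1to1(count_a, count_g) > critical


if __name__ == "__main__":
    for counts in ((50, 50), (55, 45), (70, 30)):
        print(counts, round(chi_square_1to1(*counts), 2), is_distorted(*counts))
```

With thousands of SNPs tested, a multiple-testing correction is usually applied before calling a region distorted.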
Chromosome | SNPs | Chromosome length (bp) | kb/SNP | Depth
Chr01 | 17710 | 47283185 | 2.67 | 2.41
Chr02 | 15051 | 38103930 | 2.53 | 2.47
Chr03 | 15173 | 41884883 | 2.76 | 2.39
Chr04 | 13872 | 34718618 | 2.50 | 2.37
Chr05 | 8648 | 31240961 | 3.61 | 2.48
Chr06 | 2999 | 32913967 | 10.97 | 2.31
Chr07 | 3778 | 27957088 | 7.40 | 2.34
Chr08 | 10955 | 30396518 | 2.77 | 2.09
Chr09 | 9864 | 21757032 | 2.21 | 2.40
Chr10 | 9593 | 22204031 | 2.31 | 2.31
Chr11 | 6409 | 23035369 | 3.59 | 2.24
Chr12 | 8474 | 23049917 | 2.72 | 2.31
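The kb/SNP column is chromosome length divided by SNP count. A short Python sketch recomputing it from the table and flagging chromosomes whose markers are unusually sparse; the "more than twice the median spacing" cut-off is an arbitrary choice for illustration, not something stated on the slide.

```python
from statistics import median

# (SNP count, chromosome length in bp), copied from the table above.
CHROMOSOMES = {
    "Chr01": (17710, 47283185), "Chr02": (15051, 38103930),
    "Chr03": (15173, 41884883), "Chr04": (13872, 34718618),
    "Chr05": (8648, 31240961), "Chr06": (2999, 32913967),
    "Chr07": (3778, 27957088), "Chr08": (10955, 30396518),
    "Chr09": (9864, 21757032), "Chr10": (9593, 22204031),
    "Chr11": (6409, 23035369), "Chr12": (8474, 23049917),
}


def kb_per_snp(snps: int, length_bp: int) -> float:
    """Average marker spacing in kb."""
    return length_bp / snps / 1000.0


def sparse_chromosomes(table: dict[str, tuple[int, int]], factor: float = 2.0) -> list[str]:
    """Chromosomes whose spacing exceeds factor x the median spacing."""
    spacing = {name: kb_per_snp(*values) for name, values in table.items()}
    cutoff = factor * median(list(spacing.values()))
    return [name for name, kb in spacing.items() if kb > cutoff]


if __name__ == "__main__":
    for name, values in CHROMOSOMES.items():
        print(name, f"{kb_per_snp(*values):.2f} kb/SNP")
    print("sparse:", sparse_chromosomes(CHROMOSOMES))
```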
Recombinant map (figure)
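With low-coverage resequencing, individual SNP calls in a RIL are noisy, so parental origin is often called over sliding windows of consecutive SNPs before recombination maps are drawn. The slide does not describe the method used; the Python sketch below is one simple majority-vote approach, and its window size and agreement threshold are hypothetical defaults.

```python
def call_windows(calls: str, window: int = 15, min_fraction: float = 0.8) -> list[str]:
    """Call parental origin in sliding windows of consecutive SNPs.

    calls: one character per SNP along a chromosome for one RIL, where 'A'
    and 'B' mean the read matched parent A or parent B and '-' means no call.
    A window is called 'A' or 'B' if at least min_fraction of its called
    SNPs agree; otherwise it is reported as '-' (ambiguous or empty).
    """
    out = []
    for start in range(len(calls) - window + 1):
        chunk = [c for c in calls[start:start + window] if c in ("A", "B")]
        if not chunk:
            out.append("-")
            continue
        frac_a = chunk.count("A") / len(chunk)
        frac_b = chunk.count("B") / len(chunk)
        if frac_a >= min_fraction:
            out.append("A")
        elif frac_b >= min_fraction:
            out.append("B")
        else:
            out.append("-")
    return out


if __name__ == "__main__":
    toy = "AAAAAAAAAABBBBBBBBBB"  # illustrative RIL with one crossover
    print("".join(call_windows(toy, window=5)))
```

The ambiguous windows around the switch bracket the recombination breakpoint; larger windows smooth out sequencing errors but blur breakpoint positions.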
Revealing the breeding process by resequencing: the composition of inbred lines. Our analysis showed that there were 27 recombination breakpoints from Inbred 5003 to Inbred 478 and 46 breakpoints from Inbred 478 to Zheng58. Inbred 478 inherited 43% of its genome from one parent (Inbred 5003) and 57% from its other parent (Inbred 8112). Zheng58, in contrast, inherited 43% of its genomic content from Inbred 478, but the contributions of its grandparents, Inbreds 5003 and 8112, were unequal (12% derived from 5003 and 31% from 8112).
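Both kinds of numbers on this slide come from parent-of-origin calls along the genome: breakpoints are switches in origin, and inheritance percentages are the share of genome length assigned to each parent. A minimal Python sketch of both calculations. The input formats are assumptions, and the segment coordinates in the example are a toy chosen only to reproduce a 43%/57% split.

```python
def count_breakpoints(origins: list[str]) -> int:
    """Count switches in parental origin along ordered calls, ignoring '-' (uncalled)."""
    called = [o for o in origins if o != "-"]
    return sum(1 for prev, cur in zip(called, called[1:]) if prev != cur)


def ancestry_fractions(segments: list[tuple[int, int, str]]) -> dict[str, float]:
    """segments: (start, end, origin) with end exclusive; return length share by origin."""
    totals: dict[str, int] = {}
    for start, end, origin in segments:
        totals[origin] = totals.get(origin, 0) + (end - start)
    genome = sum(totals.values())
    return {origin: length / genome for origin, length in totals.items()}


if __name__ == "__main__":
    print(count_breakpoints(list("AAA-ABBBA")))
    toy_segments = [(0, 43, "Inbred 5003"), (43, 100, "Inbred 8112")]  # toy coordinates
    print(ancestry_fractions(toy_segments))
```

Tracing grandparent contributions works the same way one generation further back: each Zheng58 segment inherited from Inbred 478 is labeled with that segment's origin in 478. The slide's figures are consistent with this, since 12% + 31% = 43%.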
A whole-genome-based marker-assisted selection system can be established quickly using high-throughput sequencing and bioinformatics platforms, even for a less-studied crop. The platform at BGI-SZ provides practical solutions for developing countries to develop and apply their own molecular breeding systems economically. Thank you!