Haplotype-led approaches for increasing precision in plant breeding

Haplotype led approaches in Plant breeding

Maruthi3

Presentation Transcript


  1. Seminar-II Maruthi Prasad B P II PhD PAMB 1066 Dept. of GPB, UASB

  2. POPULATION INCREASE!!!! Biotic and Abiotic stresses!!!! CLIMATE CHANGE!!!!

  3. Conventional breeding has been very successful in developing high-yielding crop varieties. • It is important to accelerate the pace of crop improvement programmes, especially for complex traits such as yield under stress conditions (Varshney et al., 2005).

  4. Genetic Variations • Trait Improvement • Environmental Resilience • Efficient Breeding

  5. Marker allelic variations within the genome of the same species: 1. Single nucleotide polymorphisms (SNPs) 2. Segmental/nucleotide insertions/deletions (InDels) 3. Differences in the number of tandem repeats at a locus (SSRs) • SSR example (dinucleotide repeats of different length, both strands shown): ACTGTCGACACACACACACGCTAGCT / TGACAGCTGTGTGTGTGTGCGATCGA; ACTGTCGACACACACACACACACACGCTAGCT / TGACAGCTGTGTGTGTGTGTGTGTGCGATCGA; ACTGTCGACACACACACACACACACACACACACGCTAGCT / TGACAGCTGTGTGTGTGTGTGTGTGTGTGTGTGCGATCGA • InDel example: CATCGCGAATTCCCATCG / GTAGCGCTTAAGGGTAGC versus CATCG----------------CATCG / GTAGC----------------GTAGC • SNP example: GAATTC / CTTAAG versus GAACTC / CTTGAG (Mammadov et al., 2012)

  6. Targeting genetic variants associated with agronomic traits and identifying the underlying candidate genes have become a key area of crop genetic research. Marker types by detection method and throughput: • Low-throughput, hybridization-based markers: RFLPs • Medium-throughput, PCR-based markers: RAPD, AFLP, SSRs • High-throughput (HTP), sequence-based markers: SNPs

  7. Single Nucleotide Polymorphism • A single nucleotide polymorphism (SNP, pronounced “snip”) is a genetic variation in which a single nucleotide (i.e., A, T, C, or G) is altered and the change is passed on through heredity. • SNP: single DNA base variation found in >1% of the population • Mutation: single DNA base variation found in <1% of the population • Example: CTTAGCTT (94%) versus CTTAGTTT (6%) is a SNP; CTTAGCTT (99.9%) versus CTTAGTTT (0.1%) is a mutation

  8. Mutations and SNPs [Figure: timeline from a common ancestor to the present, showing mutations and SNPs as the observed genetic variations]

  9. Single Nucleotide Polymorphism • A SNP is usually assumed to be a binary variable • The probability of a repeat mutation at the same SNP locus is quite small • Tri-allelic cases are usually attributed to genotyping errors • The nucleotide at a SNP locus is called the major allele (if its allele frequency is > 50%) or the minor allele (if its allele frequency is < 50%) • Example: ACTTAGCTT (94%) versus ACTTAGCTC (6%), so T is the major allele and C is the minor allele
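
The major/minor allele rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `classify_alleles` and the sample of 100 chromosomes are assumptions made here to mirror the 94% / 6% example, not part of the presentation.

```python
from collections import Counter


def classify_alleles(observed):
    """Return ((major, freq), (minor, freq)) for one biallelic SNP.

    observed: iterable of allele calls, one per sampled chromosome.
    """
    counts = Counter(observed)
    if len(counts) != 2:
        # The slide treats SNPs as binary; anything else is rejected here.
        raise ValueError("expected exactly two alleles at a biallelic SNP")
    total = sum(counts.values())
    (major, major_n), (minor, minor_n) = counts.most_common(2)
    return (major, major_n / total), (minor, minor_n / total)


# Hypothetical sample: 94 chromosomes carry T and 6 carry C.
sample = ["T"] * 94 + ["C"] * 6
major, minor = classify_alleles(sample)
print(major, minor)
```

A monomorphic or tri-allelic site raises an error instead of being silently classified, which matches the slide's binary assumption.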

  10. Single Nucleotide Polymorphism • SNPs are found in coding and (mostly) noncoding regions • They occur at very high frequency, from about 1 in 1,000 bases up to 1 in 100 to 300 bases • SNP genotyping is easily automated • SNPs close to a particular gene can act as a marker for that gene • SNPs have become the preferred markers for association studies because of their high abundance and the availability of high-throughput SNP genotyping technologies.

  11. Genotypes [Figure: haplotype data versus genotype data at SNP1 and SNP2] • The use of haplotype information has been limited because an individual's genome is diploid. • To obtain haplotype data, the two copies have to be separated first. • In large sequencing projects, genotypes instead of haplotypes are collected because of cost considerations.

  12. Problems of Genotypes • Genotypes only tell us the alleles at each SNP locus • But we don't know how the alleles at different SNP loci are connected • There can be several possible haplotype pairs for the same genotype • Example: the genotype A/C at SNP1 and G/T at SNP2 could come from the haplotype pair (A G, C T) or from (A T, C G); we don't know which haplotype pair is real
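
The ambiguity described on this slide can be made concrete by listing every haplotype pair consistent with an unphased genotype. The sketch below is illustrative only: the function name `haplotype_pairs` and the encoding of a genotype as one two-letter string per SNP are assumptions introduced here.

```python
from itertools import product


def haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: one two-letter string per SNP, e.g. "AC" means alleles A and C.
    """
    pairs = set()
    # At each SNP, choose which allele sits on the first chromosome;
    # the other allele then sits on the second chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(g[c] for g, c in zip(genotype, choice))
        second = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)


# Heterozygous at both SNPs: two possible haplotype pairs.
print(haplotype_pairs(["AC", "GT"]))
```

With k heterozygous SNPs the number of candidate pairs grows as 2^(k-1), which is why phasing becomes a real inference problem beyond a handful of loci.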

  13. “Haplotype-led approaches for increasing precision in plant breeding”

  14. Outline of Presentation • Introduction • Haplotype construction and inference • Haplotype mapping • Tag SNPs and methods to select tSNPs • Application of haplotype-led approaches in plant breeding • Case studies • Conclusion

  15. Haplotype [Figure labels: alleles, locus, haplotypes] • A string of SNPs that are linked and co-inherited • Polymorphic “frozen blocks” • A haplotype is a group of genes in an organism that are inherited together from a single parent in a defined order (Bevan et al., 2017) • These variants tend to be inherited together, often because they lie very close together in the same chromosome region and are therefore less likely to be separated by crossing over (Snowdon et al., 2015)

  16. Haplotypes • In terms of SNPs: “Two or more SNP alleles that tend to be inherited as a unit” (Bernardo, 2010) • A haplotype stands for a set of linked SNPs on the same chromosome that are not easily separated by recombination • Within each block, recombination is rare due to tight linkage • Example with three haplotypes, alleles listed at SNP1, SNP2, SNP3: -ACTTAGCTT- gives C A T; -ACTTTGCTC- gives C T C; -AATTTGCTC- gives A T C

  17. Recombination Hotspots and Haplotype Blocks • Haplotype blocks are defined as contiguous series of SNPs that show very little evidence of historical recombination among the individuals (Gabriel et al., 2002)

  18. Recombination Hotspots and Haplotype Blocks [Figure: haplotype patterns P1 to P4 across SNP loci S1 to S12 along a chromosome, with major and minor alleles marked, showing haplotype blocks separated by recombination hotspots]

  19. A Haplotype Block Example • Patil et al. (Science, 2001) partitioned human chromosome 21 into 4,135 haplotype blocks spanning 24,047 SNPs. • Blue box: major allele • Yellow box: minor allele

  20. HapMap • The HapMap is a map of the haplotype blocks and the specific SNPs that identify the haplotypes • The haplotype map, or "HapMap", acts as a tool to find genes and genetic variations that affect trait expression. Source: The International HapMap Project

  21. Steps in HapMap construction • Third-generation sequencing: alleviating the bottlenecks in haplotype identification [Figure: NGS technology versus TGS technology]

  22. Different phasing methods for haplotype construction/reconstruction • Reference-based phasing • De novo genome assembly (such as diploid and polyploid assembly) • Strain-resolved metagenome assembly (de novo re-assembly, single nucleotide variant-based assembly, read and contig binning)

  23. Tools for Haplotype analysis

  24. Haplotagging: a novel sequencing strategy for rapid discovery of haplotypes

  25. Steps in HapMap construction • SNPs are identified in DNA samples from multiple individuals • Adjacent SNPs that are inherited together are compiled into haplotypes • “Tag” SNPs are identified within haplotypes that uniquely describe those haplotypes. Source: The International HapMap Project

  26. Haplotype blocking methods (Saad et al., 2018) • Confidence interval test • Four gamete test • Solid spine of linkage disequilibrium

  27. Confidence interval test • Less than 5% weak LD is allowed within a haplotype block because, in addition to recombination events, forces such as recurrent mutation, gene conversion, or errors in genome assembly or genotyping can weaken LD (Saad et al., 2018)

  28. Four gamete test • A haplotype block partitioning method that assumes no recombination events occur within a block • Three-gamete condition: three gametes = no recombination, so the SNPs form a haplotype block • Four-gamete condition: four gametes = a recombination event occurred, so no blocking • A rare gamete must have a frequency > 0.01 to count as a recombination event • Recombination events are accepted only between blocks
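
The pairwise decision behind the four gamete test can be sketched as follows. This is a minimal illustration, not the procedure of any particular tool: the function name `four_gamete_test`, the string encoding of haplotypes, and the toy data are assumptions, while the 0.01 threshold comes from the slide.

```python
from collections import Counter


def four_gamete_test(haplotypes, i, j, min_freq=0.01):
    """Return True if SNPs i and j show all four gametes (recombination).

    haplotypes: phased haplotypes as strings, one character per SNP.
    A gamete is counted only if its frequency exceeds min_freq.
    """
    counts = Counter((h[i], h[j]) for h in haplotypes)
    total = len(haplotypes)
    counted = [g for g, n in counts.items() if n / total > min_freq]
    return len(counted) >= 4


# Three gametes only: no evidence of recombination, SNPs can share a block.
print(four_gamete_test(["AG", "AT", "CT"], 0, 1))
# All four gametes present: a recombination event is inferred.
print(four_gamete_test(["AG", "AT", "CT", "CG"], 0, 1))
```

A block-partitioning method would apply this pairwise check along the chromosome and close a block wherever it returns True; that outer loop is omitted here.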

  29. Solid spine of linkage disequilibrium • Strong LD is observed between the first SNP and the last SNP, and with all the intermediate SNPs • This is the scenario where a SNP marker exhibits strong and consistent associations with surrounding SNPs, indicating the presence of a stable haplotype block • The solid spine is a line of strong LD (> 0.8) that moves from one allele to the next along the legs of the triangle and defines a particular haplotype

  30. Comparison among haplotype blocking methods • The FGT method differs from the other methods in that it does not require an LD threshold (Qian et al., 2017)

  31. Haplotype Inference • The problem of inferring haplotypes from a set of genotypes is called haplotype inference. • Most combinatorial methods use the maximum parsimony model to solve this problem. • This model assumes that the number of distinct haplotypes in a natural population is small • The solution to this problem is a minimum set of haplotypes that can explain the given genotypes

  32. Maximum Parsimony • Find a minimum set of haplotypes to explain the given genotypes. [Figure: genotype G1 at SNP1 and SNP2 can be resolved as haplotypes h1 + h2 or as h3 + h4, and genotype G2 as h1 + h1; the candidate haplotype sets are {A T, C G} or {A G, C T, A T}]
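
For very small inputs, the maximum parsimony idea can be illustrated by brute force: try every way of resolving each genotype and keep the smallest resulting haplotype set. The sketch below is an assumption-laden toy, not a practical solver. The names `resolutions` and `most_parsimonious`, the genotype encoding, and the two example genotypes are all introduced here for illustration.

```python
from itertools import product


def resolutions(genotype):
    """All unordered haplotype pairs that could produce an unphased genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        first = "".join(g[c] for g, c in zip(genotype, choice))
        second = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)


def most_parsimonious(genotypes):
    """Exhaustive search for a smallest haplotype set explaining all genotypes."""
    best = None
    for combo in product(*(resolutions(g) for g in genotypes)):
        used = {h for pair in combo for h in pair}
        if best is None or len(used) < len(best):
            best = used
    return sorted(best)


# Hypothetical genotypes: g1 is heterozygous at both SNPs, g2 is homozygous.
g1 = ["AC", "GT"]
g2 = ["AA", "TT"]
print(most_parsimonious([g1, g2]))
```

The search space grows exponentially with the number of heterozygous sites and genotypes, so real tools rely on integer programming, heuristics, or statistical models instead of enumeration.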

  33. Factors affecting haplotype map construction (Hamblin & Jannink, 2011) • SNP allele frequency distribution • Haplotype allele numbers • Linkage disequilibrium (LD)

  34. Problems of Using SNPs for Association Studies • The number of SNPs is still too large to be used for association studies • There are millions of SNPs in a plant genome • To reduce the SNP genotyping cost, we wish to use as few SNPs as possible for association studies • Tag SNPs are a small subset of SNPs that is sufficient for performing association studies without losing the power of using all SNPs.

  35. Brief glossary of terms (Halldorsson et al., 2004)

  36. Examples of Tag SNPs • Suppose we wish to distinguish an unknown haplotype sample • We can genotype all SNPs to identify the haplotype sample [Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, with major and minor alleles marked, next to an unknown haplotype sample]

  37. Examples of Tag SNPs • In fact, it is not necessary to genotype all SNPs • SNPs S3, S4, and S5 can form a set of tag SNPs [Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, with the rows for S3, S4, and S5 extracted]

  38. Examples of Wrong Tag SNPs • SNPs S1, S2, and S3 cannot form a set of tag SNPs because P1 and P4 would be ambiguous [Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, with the rows for S1, S2, and S3 extracted]

  39. Examples of Tag SNPs • SNPs S1 and S12 can form a set of tag SNPs • This set of SNPs is the minimum solution in this example [Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, with the rows for S1 and S12 extracted]

  40. Steps for ‘tag SNP’ selection (Halldorsson et al., 2004) (1) Determining predictive neighborhoods (2) Minimizing the number of tagging SNPs (3) Tagging quality assessment

  41. Haplotype Blocks and Tag SNPs • Recent studies have shown that the chromosome can be partitioned into haplotype blocks interspersed with recombination hotspots • Within a haplotype block, little or no recombination has occurred • The SNPs within a haplotype block tend to be inherited together • Within a haplotype block, a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block • We only need to genotype tag SNPs instead of all SNPs within a haplotype block

  42. Problem Formulation • The relation between SNPs and haplotypes can be formulated as a bipartite graph • With four patterns P1 to P4 there are six pairs of patterns: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) • S1 can distinguish (P1, P3), (P1, P4), (P2, P3), and (P2, P4) • S2 can distinguish (P1, P4), (P2, P4), and (P3, P4) [Figure: bipartite graph linking SNPs S1 to S4 to the pattern pairs they distinguish]

  43. Observation • A group of SNPs can form a set of tag SNPs if each pair of patterns is connected by at least one edge • e.g., S1 and S3 can form a set of tag SNPs • e.g., S1 and S2 cannot be tag SNPs [Figure: bipartite graph linking SNPs S1 to S4 to the pattern pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)]
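
The covering condition on this slide (every pair of patterns must be distinguished by at least one chosen SNP) can be checked directly. In the sketch below the allele table is hypothetical: it was chosen only so that S1 and S2 distinguish the pairs listed on the previous slide and so that S1 with S3 is a valid tag set. The function names are likewise assumptions.

```python
from itertools import combinations

# Hypothetical allele table: one string per SNP, one character per pattern
# (P1..P4 in order); "0" = major allele, "1" = minor allele.
table = {
    "S1": "0011",
    "S2": "0001",
    "S3": "0101",
    "S4": "0111",
}


def distinguished_pairs(alleles):
    """Pattern pairs (zero-based indices) whose alleles differ at this SNP."""
    return {(a, b) for a, b in combinations(range(len(alleles)), 2)
            if alleles[a] != alleles[b]}


def is_tag_set(table, snps):
    """True if every pair of patterns is distinguished by some SNP in snps."""
    n_patterns = len(next(iter(table.values())))
    all_pairs = set(combinations(range(n_patterns), 2))
    covered = set()
    for snp in snps:
        covered |= distinguished_pairs(table[snp])
    return covered == all_pairs


def minimum_tag_set(table):
    """Smallest tag set found by trying subsets in order of increasing size."""
    for size in range(1, len(table) + 1):
        for snps in combinations(sorted(table), size):
            if is_tag_set(table, snps):
                return snps
    return None


print(is_tag_set(table, ["S1", "S3"]))
print(is_tag_set(table, ["S1", "S2"]))
print(minimum_tag_set(table))
```

Exhaustive subset search is only feasible for a handful of SNPs. Because finding a minimum tag set is a set-cover problem, larger blocks are usually handled with greedy or other approximate methods.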

  44. Methods to select tSNPs • Principal component analysis (PCA): used to reduce the dimensions of the complete set of SNPs • Covariance matrix of SNPs → Principal components analysis → SNPs that contribute most to the eigenvectors associated with the largest eigenvalues are considered more influential → Selected SNPs are added to the set of tagging SNPs

  45. Methods to select tSNPs • Shannon entropy: based on defining how well a subset of SNPs captures the variation in the complete set • Shannon entropy helps us quantify how much genetic diversity a particular SNP captures • SNP has high entropy → its alleles are more evenly represented → reflects greater diversity → the SNP is selected as a tSNP • SNP has low entropy → most individuals carry the same allele → less diversity → less informative
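
The per-SNP entropy described here can be computed from allele frequencies as H = -Σ p·log2(p). The sketch below only illustrates that single-SNP quantity. The function name `snp_entropy` and the toy allele calls are assumptions, and a full entropy-based tSNP method would score subsets of SNPs rather than one SNP at a time.

```python
from collections import Counter
from math import log2


def snp_entropy(alleles):
    """Shannon entropy (in bits) of the allele distribution at one SNP."""
    counts = Counter(alleles)
    total = len(alleles)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# Hypothetical allele calls across eight sampled chromosomes.
balanced = "AAAACCCC"   # two alleles at 50% each: maximal diversity
skewed = "AAAAAAAC"     # one allele dominates: little diversity
fixed = "AAAAAAAA"      # monomorphic: no diversity

for name, calls in [("balanced", balanced), ("skewed", skewed), ("fixed", fixed)]:
    print(name, round(snp_entropy(calls), 3))
```

Ranking SNPs by this value puts balanced biallelic SNPs (entropy close to 1 bit) ahead of SNPs where nearly every individual carries the same allele.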

  46. Linkage Disequilibrium • The problem of finding tag SNPs can also be approached from a statistical point of view • We can measure the correlation between SNPs and identify sets of highly correlated SNPs • For each set of correlated SNPs, only one SNP needs to be genotyped, and it can be used to predict the values of the other SNPs • Linkage disequilibrium (LD) is a measure that estimates such correlation between two SNPs

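  47. Introduction to Linkage Disequilibrium [Figure: the four possible haplotypes A B, A b, a B, and a b at SNP1 and SNP2] • Two SNPs are in linkage disequilibrium when the haplotype frequencies differ from the products of the allele frequencies: • $P_{AB} \neq P_A P_B$ • $P_{Ab} \neq P_A P_b = P_A (1 - P_B)$ • $P_{aB} \neq P_a P_B = (1 - P_A) P_B$ • $P_{ab} \neq P_a P_b = (1 - P_A)(1 - P_B)$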

  48. Linkage Disequilibrium Formulas • Mathematical formulas for computing LD or correlation: • $r^2$ or $\Delta^2$:
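
The formula image from this slide did not survive in the transcript. As a stand-in, the sketch below uses the standard textbook definitions, which are an assumption about what the slide showed: D = P_AB - P_A·P_B and r² = D² / (P_A(1 - P_A)·P_B(1 - P_B)), using the same symbols as the previous slide. The function name `ld_stats` and the toy haplotypes are illustrative.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic SNPs from phased two-locus haplotypes.

    haplotypes: two-character strings; the first character is the allele at
    SNP1 and the second the allele at SNP2. The alleles of the first
    haplotype are taken as the reference alleles A and B.
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == ref1 for h in haplotypes) / n
    p_b = sum(h[1] == ref2 for h in haplotypes) / n
    p_ab = sum(h[0] == ref1 and h[1] == ref2 for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d, d * d / denom


# Only AB and ab haplotypes occur: the two SNPs are perfectly correlated.
print(ld_stats(["AB", "AB", "ab", "ab"]))
# All four haplotypes equally frequent: the SNPs are independent.
print(ld_stats(["AB", "Ab", "aB", "ab"]))
```

An r² of 1 means one SNP fully predicts the other, while an r² of 0 means knowing one allele says nothing about the other.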

  49. Linkage Disequilibrium Bins • The statistical methods for finding tag SNPs are based on the analysis of LD among all SNPs • An LD bin is a set of SNPs such that SNPs within the same bin are highly correlated with each other • The value of a single SNP in one LD bin can predict the values of the other SNPs in the same bin • These methods try to identify the minimum set of LD bins
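
One simple way to form LD bins is a greedy pass: repeatedly pick the SNP that is in strong LD with the most still-unbinned SNPs, make it the tag, and group its partners with it. The sketch below is a simplified illustration and not necessarily the method the slides refer to. The r² threshold of 0.8, the "0"/"1" haplotype encoding, and the function names are assumptions. Note that it only guarantees that each bin member is correlated with the tag, which is weaker than the all-pairs condition stated on the slide.

```python
def r2(haplotypes, i, j):
    """Squared correlation between SNPs i and j over phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (p_ij - p_i * p_j) ** 2 / denom


def greedy_ld_bins(haplotypes, threshold=0.8):
    """Greedily group SNPs into bins of (tag SNP index, member indices)."""
    unbinned = set(range(len(haplotypes[0])))
    bins = []
    while unbinned:
        def partners(s):
            # Unbinned SNPs (including s itself) in strong LD with s.
            return {t for t in unbinned
                    if t == s or r2(haplotypes, s, t) >= threshold}

        # Choose the SNP that tags the most unbinned SNPs; ties go to the
        # lowest index because candidates are scanned in sorted order.
        tag = max(sorted(unbinned), key=lambda s: len(partners(s)))
        members = partners(tag)
        bins.append((tag, sorted(members)))
        unbinned -= members
    return bins


# Hypothetical haplotypes over four SNPs: SNPs 0 and 1 always agree,
# SNPs 2 and 3 always agree, and the two groups are independent.
haps = ["0000", "0011", "1100", "1111"]
print(greedy_ld_bins(haps))
```

Genotyping only the tag SNP of each bin then stands in for all members of that bin.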

  50. An Example of LD Bins (1/3) • SNP1 and SNP2 can not form an LD bin • e.g., A in SNP1 may imply either G or A in SNP2
