OptiMAS: a decision support tool to conduct Marker Assisted Selection (MAS) programs
F. Valente, J. Joets, A. Charcosset & L. Moreau
ICIS Workshop, Hyderabad, India, March 28-01, 2011
Project started in December 2009
Breeding decisions adapted to different crops / projects
• Projects / Crops: Chickpea, Bean, Cowpea, Sorghum, Cassava, Maize and Wheat (see Alain's summary table of user needs).
• QTL will be detected for different traits and favourable alleles will be identified. Mostly biparental population(s), but in several cases populations are connected through common parent(s).
• Common objective: create new genetic material assembling favorable QTL alleles from 2 parents or from all parents.
• Implementation of tools and strategies to facilitate marker assisted recurrent selection for users.
Marker Assisted Recurrent Selection (MARS)
• Objective 1: to develop algorithms to compute probabilities of allele transmission through generations and to predict the genetic value of individuals (molecular score).
• Objective 2: to develop methodology to identify the best intermating strategy to accumulate favourable alleles and to extract varieties.
[Diagram: Population n -> marker-based steps (select the « best » individuals, conduct intermating) -> Population n+1; aim: variety creation]
Selection of individuals
• Aim: to obtain an ideotype combining all the favorable alleles at QTL positions.
• Prediction of genetic value (molecular score) based on the information from neighboring markers. Which parental alleles are transmitted at the QTL position?
• 2 problems:
- Markers that are not polymorphic in all populations cannot distinguish the parental alleles (IBS: Identical By State ≠ IBD: Identical By Descent).
- The QTL position is rarely located at a marker: QTL alleles are unknown and must be inferred from flanking markers.
• Sometimes, need to use more than 2 markers / QTL.
OptiMAS tested on a multiparental connected design
• QTL detection (traits: flowering time, grain yield), identification of favorable alleles and estimation of allele effects.
• Example of previous work: 6 connected F1 populations derived from a diallel cross between 4 maize inbred lines (D, F, S, X) (Blanc et al., TAG, 2006).
Problem of partially informative markers
[Figure: marker map of chromosome 2 (markers Pr472 to Pr7, positions 0 to 160) with a flowering time QTL, favorable allele = d; genotypes shown for the 6 hybrids df, dx, fx, sd, sf, sx]
• Pr571: ideal case, all 4 alleles are observed (informative marker).
• Pr1165: 2 alleles detected, 2 hybrids not polymorphic (partially informative marker, often biallelic).
• Sometimes, need to use more than 2 markers / QTL.
Following « phased genotypes »
• Information available:
- Pedigree
- Distance between loci (Haldane recombination rate)
- Molecular markers (observed genotypes)
• Identification of all the possible phased genotypes for each QTL and computation of their probabilities.
• Phase (haplotypes) unknown: all possibilities are considered.
• Phased genotype: contains no ambiguity on the alleles transmitted from the parents or on the phase.
[Diagram: read vs real genotype at mk1 / QTL / mk2; alleles d and f are observed at both markers, the QTL allele is unknown (?), and the alternatives shown carry probabilities 1/2 and 1/2]
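The "Haldane recombination rate" mentioned above converts a map distance into a recombination fraction. A minimal sketch of that standard mapping function follows; the function name and the example distance are illustrative choices, not taken from OptiMAS.

```python
import math

def haldane_recombination(distance_cm: float) -> float:
    """Recombination fraction between two loci under Haldane's map function.

    distance_cm is the map distance in centimorgans; the model assumes
    no crossover interference.
    """
    d_morgans = abs(distance_cm) / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

# Illustrative distance only: a marker 23.3 cM away from a QTL.
r = haldane_recombination(23.3)
print(f"r = {r:.4f}")
```

The recombination fraction is 0 for loci at the same position and tends to 0.5 for loci that are far apart, which is what makes distant markers uninformative about the QTL allele.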
New algorithms: adaptability to different marker selection schemes and mating systems
• Inbred lines D x F, loci mk1 / QTL / mk2: D = dd dd dd, F = ff ff ff.
• Hybrid F1 (DF): 1 possible phased genotype (df df df), 8 possible gametes: [ddd] [ddf] [dfd] [dff] [fdd] [fdf] [ffd] [fff].
• Selfing, F2 (e.g. individual df39): reconstruction gives 36 possible phased genotypes (from dd dd dd to ff ff ff).
• Genotyping of F2 df39 (mk1 = df, mk2 = dd) reduces the space of possible genotypes: 4 compatible phased genotypes, with dd, df, fd or ff at the QTL.
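The counts on this slide (8 gametes, 36 phased F2 genotypes, 4 compatible with the observed markers) can be reproduced by brute-force enumeration. The sketch below is an illustration under simplifying assumptions, not the OptiMAS algorithm: markers are treated as fully informative (each read identifies the parental allele), there is no interference, and the recombination fractions 0.19 and 0.05 are hypothetical.

```python
from itertools import combinations_with_replacement, product

def gamete_probabilities(hap1, hap2, rec_rates):
    """Gametes of a parent with phased haplotypes hap1 / hap2.

    rec_rates[i] is the recombination fraction between locus i and i + 1.
    Returns {gamete: probability}.
    """
    gametes = {}
    # origin[i] tells which parental haplotype (0 or 1) locus i is copied from
    for origin in product((0, 1), repeat=len(hap1)):
        p = 0.5  # the first locus comes from either haplotype with equal chance
        for i, r in enumerate(rec_rates):
            p *= r if origin[i] != origin[i + 1] else 1.0 - r
        gamete = tuple((hap1, hap2)[o][i] for i, o in enumerate(origin))
        gametes[gamete] = gametes.get(gamete, 0.0) + p
    return gametes

def selfing_genotypes(gametes):
    """Unordered pairs of gametes (phased genotypes) after one selfing."""
    genotypes = {}
    for g1, g2 in combinations_with_replacement(sorted(gametes), 2):
        p = gametes[g1] * gametes[g2]
        genotypes[(g1, g2)] = p if g1 == g2 else 2.0 * p
    return genotypes

def condition_on_markers(genotypes, observed):
    """Keep phased genotypes matching the unphased marker reads, then renormalise.

    observed maps a locus index to its two observed alleles, e.g. {0: "df"}.
    """
    kept = {
        pair: p
        for pair, p in genotypes.items()
        if all(sorted((pair[0][i], pair[1][i])) == sorted(obs)
               for i, obs in observed.items())
    }
    total = sum(kept.values())
    return {pair: p / total for pair, p in kept.items()}

# Loci are ordered mk1, QTL, mk2; recombination fractions are hypothetical.
f1_gametes = gamete_probabilities("ddd", "fff", [0.19, 0.05])
f2_genotypes = selfing_genotypes(f1_gametes)
posterior = condition_on_markers(f2_genotypes, {0: "df", 2: "dd"})
print(len(f1_gametes), len(f2_genotypes), len(posterior))
for (g1, g2), p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print("".join(g1), "".join(g2), round(p, 4))
```

With partially informative markers the filtering step would have to compare marker reads rather than parental alleles, which is exactly the IBS versus IBD difficulty raised earlier in the talk.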
QTL uncertainty persists over generations
• Post-F2 operations possibly involve crossing with individuals from more than 2 parental inbred lines.
• The pedigree of an individual then involves more parental lines, which increases the number of possibilities (computation time).
[Diagram: pedigree from the inbred lines (IL) F, D, S, X through the F2 individuals sd554, dx284, sf743, fx304 and the cycle C1 individuals A551, A830 to the cycle C2 individual B369; at each generation the QTL genotype between the flanking markers is unknown (?)]
OptiMAS workflow: relationship with IBP modules
• IBP modules:
- Germplasm Management, Genealogy: IB Nursery Book, Pedigree Management
- Genotyping: IB Lab Book
- Data analysis: QTL detection (position), favorable alleles / effects
• Decision support application for MARS: OptiMAS
- Inputs: Pedigree / Genotype, Genetic map
- Algorithms to compute probabilities of IBD allele transmission throughout generations of selection
- Estimated genetic values (homozygous +/-, heterozygous, probability of parental alleles)
- Selection (Manual, Truncation selection, QTL complementation selection) -> Selected individuals
- Intermating (Complete, Random, Better Half, Manual) -> Expected value of progeny -> Selected crosses (MARS cycles)
- Selfing, HD, SSD -> Selected individuals for variety creation (fixation cycles, variety release)
Decision support application for MARS: OptiMAS
• Inputs: Pedigree / Genotype, Genetic map
• Algorithms to compute probabilities of IBD allele transmission throughout generations of selection
• Estimated genetic values (homozygous +/-, heterozygous, probability of parental alleles)
• Selection (Manual, Truncation selection, QTL complementation selection) -> Selected individuals
• Intermating (Complete, Random, Better Half, Manual) -> Expected value of progeny -> Selected crosses
• Selfing, HD, SSD -> Selected individuals for variety creation
Input file format: genetic map
geno  Pr443  qtl1  Pr256  Pr299  qtl2  …
chr   1      1     1      1      1     …
qtl   1      1     1      2      2     …
pos   0      23.3  28.5   66.5   74    …
nb+   2      2     2      1      1     …
all+  d/f    d/f   d/f    d      d     …
• Marker names
• Index of chromosomes
• Index of QTL
• Positions of markers / QTL
• Favorable alleles at QTL
• Indication of the markers selected around each QTL
Information from QTL detection and identification of favorable alleles:
[Table: QTL, chr, pos, Nb class, parental alleles (D, F, S, X), QTL x QTL; trait: grain yield (t ha-1) (Blanc et al., TAG, 2006)]
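To make the row-labelled layout above concrete, here is a minimal sketch that reads the five visible columns of the slide into one record per locus. It assumes whitespace-separated fields and the row labels exactly as displayed (geno, chr, qtl, pos, nb+, all+); the real OptiMAS file format may differ, and `parse_map` is a hypothetical helper, not part of the tool.

```python
# Only the columns visible on the slide; the elided ones are left out.
MAP_TEXT = """\
geno Pr443 qtl1 Pr256 Pr299 qtl2
chr 1 1 1 1 1
qtl 1 1 1 2 2
pos 0 23.3 28.5 66.5 74
nb+ 2 2 2 1 1
all+ d/f d/f d/f d d
"""

def parse_map(text):
    """Turn the row-labelled map into a list of per-locus dictionaries."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows[fields[0]] = fields[1:]  # first field is the row label
    loci = []
    for i, name in enumerate(rows["geno"]):
        loci.append({
            "name": name,
            "chr": int(rows["chr"][i]),
            "qtl": int(rows["qtl"][i]),
            "pos": float(rows["pos"][i]),
            "n_favorable": int(rows["nb+"][i]),
            "favorable": rows["all+"][i].split("/"),
        })
    return loci

loci = parse_map(MAP_TEXT)

# Group the loci attached to each QTL index.
by_qtl = {}
for locus in loci:
    by_qtl.setdefault(locus["qtl"], []).append(locus["name"])
print(by_qtl)
```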
Input file format: pedigree / genotype
4 5 2
#D . d IL
1 3 3 2 2
1 3 3 2 2
#F . f IL
1 3 2 1 3
1 3 2 1 3
#df D F CR
#df37 df df S1
1 3 2 1 2
1 3 3 2 ?
.. .. .. .. ..
• 1st line: number of individuals / markers / QTL
• Markers in columns
• Individuals by line + pedigree
• Information on mating system steps:
- IL: Inbred Line
- CR: intercrossing (F1, Backcross)
- Sn: n generations of Selfing
- RIL: Recombinant Inbred Lines
- HD: Doubled Haploids
• Genotype (if available): 2 lines per individual, the phase is unknown
• Genotype (if available) + Missing Data (MD)
[Diagram: inbred lines (IL) D, F, S, X with alleles d, f, s, x -> F1 df and sx -> selfing -> F2 df37 and sx127 -> cross -> C1 individual A1003 with possible alleles [d,f,s,x]]
Running OptiMAS
• Development of 2 versions of the tool:
- Command line: ANSI C
- Graphical interface (GUI): C++ (Qt & Qwt libraries)
• Platforms: Linux (alpha version / can be tested with VirtualBox), Windows / Mac OS
$ ./bin/optimas -f input/input_example.txt -q [num]
• Tested with 3 cycles of MARS + selfing (time-consuming case):
- 4 IL [geno]
- 3 F1, cross [geno]
- 25 F2, 1 selfing [geno]
- 25 F4, 2 selfing [x]
- 21 C, cross [geno]
- 21 C', 1 selfing [x]
- 298 CM, cross [geno]
- 298 CM', 1 selfing [x]
• 695 individuals, 45 markers, 11 QTL
• Option -q: all QTL; cut-off: 0.00001; PG > 96.6 %
• Run: ~ 2 min; configuration: 2.4 GHz + 2 GB RAM
Following the transmission of parental alleles
• haplo_output: probabilities of all the possible phased genotypes (haplotypic composition in terms of parental alleles)
#QTL  indiv   haplo1  haplo2  read1  read2  proba     nb_haplo
1     fx305_  x.f.x   x.f.x   2.f.2  2.f.2  0.000138  3
1     fx305_  x.f.x   x.x.x   2.f.2  2.x.2  0.023226  3
1     fx305_  x.x.x   x.x.x   2.x.2  2.x.2  0.976636  3
1     sf648_  s.s.f   s.s.f   1.s.3  1.s.3  0.001615  10
..    ..      ..      ..      ..     ..     ..        ..
1     sf648_  s.f.f   f.f.f   1.f.3  1.f.3  0.273916  10
1     sf648_  f.f.f   f.f.f   1.f.3  1.f.3  0.598402  10
• In each haplotype string, the outer positions are the genotypes at the markers neighbouring the QTL and the middle position is the QTL position.
Estimation of genetic values
• _scores: genetic value (molecular score) for each individual.
• Each QTL: expected dose of favorable allele (0 or 1) for all possible phased genotypes, weighted by the probability of each phased genotype. Alleles fall into 2 classes: favorable / not favorable.
• Score at one QTL (sum over the possible phased genotypes G): $M_{each} = \dfrac{\sum_{G} \theta_q \, p_G}{2}$
• Score over all QTL (sum over QTL): $M_{all} = \dfrac{\sum M_{each}}{nb\_qtl}$
• $\theta_q$: genotype (homo -/+, hetero) at the QTL position (0, 1, 2)
• $p_G$: probability of the current genotype
• Option: effect associated with parental allele(s) at QTL
• QTL are considered as being independent (unlinked)
Geno     θ
Homo -   0
Hetero   1
Homo +   2
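As a worked illustration of the two scores, the sketch below computes M_each as the probability-weighted favorable-allele dose divided by 2, and M_all as the mean of M_each over QTL. The three genotype probabilities are those listed for individual fx305_ at QTL 1 in the haplo_output example; treating x as the favorable allele at that QTL and the second QTL score of 0.5 are assumptions made only for the example.

```python
def qtl_score(genotype_probs, favorable):
    """M_each: expected dose of favorable alleles at one QTL, scaled to [0, 1].

    genotype_probs: list of (allele1, allele2, probability) at the QTL position.
    favorable: set of parental alleles classed as favorable at this QTL.
    """
    expected_dose = 0.0
    for allele1, allele2, prob in genotype_probs:
        theta = (allele1 in favorable) + (allele2 in favorable)  # 0, 1 or 2
        expected_dose += theta * prob
    return expected_dose / 2.0

def overall_score(per_qtl_scores):
    """M_all: mean of the per-QTL scores (QTL treated as independent)."""
    return sum(per_qtl_scores) / len(per_qtl_scores)

# QTL alleles and probabilities of fx305_ at QTL 1 (from the table above).
fx305_qtl1 = [("f", "f", 0.000138), ("f", "x", 0.023226), ("x", "x", 0.976636)]
m_qtl1 = qtl_score(fx305_qtl1, favorable={"x"})  # favorable allele is assumed
m_all = overall_score([m_qtl1, 0.5])             # 0.5 is a made-up second QTL
print(round(m_qtl1, 6), round(m_all, 6))
```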
Computation of genotypic probabilities
• tab_homo_hetero: probabilities of being homozygous (++/--) or heterozygous at the QTL position.
• Favourable / unfavourable grouping of parental alleles
• Detailed genotype frequencies at QTL
Prediction of genetic value
• tab_scores: genetic value (molecular score) for each individual (all / each QTL)
Estimation of parental allele transmission
• tab_parents: probability / frequency of each parental allele at QTL.
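The homozygous / heterozygous probabilities and the parental allele frequencies can both be derived from the same list of phased-genotype probabilities. The sketch below shows one way to do so; the function names are hypothetical, the input reuses the fx305_ probabilities from the haplo_output example, and the choice of x as favorable allele is an assumption.

```python
from collections import defaultdict

def homo_hetero(genotype_probs, favorable):
    """Probabilities of being ++, +- or -- at one QTL."""
    classes = {"++": 0.0, "+-": 0.0, "--": 0.0}
    for allele1, allele2, prob in genotype_probs:
        dose = (allele1 in favorable) + (allele2 in favorable)
        classes[("--", "+-", "++")[dose]] += prob
    return classes

def parental_allele_frequencies(genotype_probs):
    """Expected frequency of each parental allele at one QTL."""
    freq = defaultdict(float)
    for allele1, allele2, prob in genotype_probs:
        freq[allele1] += prob / 2.0  # each genotype carries two alleles
        freq[allele2] += prob / 2.0
    return dict(freq)

fx305_qtl1 = [("f", "f", 0.000138), ("f", "x", 0.023226), ("x", "x", 0.976636)]
classes = homo_hetero(fx305_qtl1, favorable={"x"})  # favorable allele is assumed
frequencies = parental_allele_frequencies(fx305_qtl1)
print(classes)
print(frequencies)
```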
Selection of individuals
• Manual selection.
• Truncation Selection (MTS): individuals are ranked on marker score and the best individuals (Nsel) are selected for recombination.
• QTL Complementation Selection (QCS) [in progress]: apply high selection intensity while preventing the loss of rare favorable alleles (Hospital et al., 2000).
- Iterative selection of candidates following their ranking on M value, keeping those that complement the already selected individuals for favorable allele representation.
- Prevents the loss of rare favourable alleles and the fixation of unfavourable alleles at QTL with small effects.
[Diagram: matrix of QTL (Qtl1 … Qtln) by individuals ranked from M+ to M-, each cell marked + or - for the favorable allele]
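The two automatic selection modes can be contrasted with a small sketch. Truncation selection is simply a ranking on the score. The complementation function is only a greedy illustration of the idea described on the slide (prefer candidates that bring favorable alleles not yet represented); it is not the procedure of Hospital et al. (2000) and not the OptiMAS implementation, and all names and data are hypothetical.

```python
def truncation_selection(scores, n_sel):
    """Rank individuals on molecular score and keep the n_sel best."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_sel]

def complementation_selection(scores, carriers, n_sel):
    """Greedy sketch: at each step pick the candidate adding the most QTL
    not yet covered by the selected set, breaking ties on score.

    carriers[ind] is the set of QTL at which ind carries a favorable allele.
    """
    selected, covered = [], set()
    remaining = sorted(scores, key=scores.get, reverse=True)
    while remaining and len(selected) < n_sel:
        best = max(remaining,
                   key=lambda ind: (len(carriers[ind] - covered), scores[ind]))
        selected.append(best)
        covered |= carriers[best]
        remaining.remove(best)
    return selected

# Hypothetical candidates: C has a low score but is the only carrier at QTL 4.
scores = {"A": 0.90, "B": 0.85, "C": 0.60, "D": 0.80}
carriers = {"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {4}, "D": {1, 2}}
print(truncation_selection(scores, 2))                  # ['A', 'B']
print(complementation_selection(scores, carriers, 2))   # ['A', 'C']
```

In this toy case truncation selection loses the favorable allele at QTL 4, while the greedy complementation keeps individual C despite its lower score.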
Identification of crosses to be made among selected individuals
• Complete: all possible crosses between the selected individuals.
• Better Half: avoid crosses between the last selected genotypes.
- Optimization of selection intensity
• Predefined number of crosses to be made, based on:
- Molecular score [in progress].
- Weight: score updated after assigning a different weight to each QTL according to allele effect or another index defined by users.
- Utility criterion [in progress]: expected mean and variance at the next generation (favor heterozygous…).
• Random [in progress].
A list of crosses / couples is proposed (based on these criteria) to reach a given number, totally at random or according to constraints such as equal contribution of individuals.
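The first two crossing schemes can be sketched as pair enumerations. The Complete scheme is all unordered pairs. For Better Half, the sketch assumes that "the last selected genotypes" means the lower-ranked half of the selected list, so a cross is kept when at least one parent is in the upper half; that reading, like the function names, is an assumption and not a statement of how OptiMAS implements it.

```python
from itertools import combinations

def complete_crosses(selected):
    """All possible crosses between the selected individuals."""
    return list(combinations(selected, 2))

def better_half_crosses(ranked_selected):
    """Crosses with at least one parent in the better half of the ranking.

    ranked_selected must be ordered from best to worst molecular score.
    """
    half = (len(ranked_selected) + 1) // 2
    top = set(ranked_selected[:half])
    return [(a, b) for a, b in combinations(ranked_selected, 2)
            if a in top or b in top]

selected = ["A", "B", "C", "D"]  # hypothetical individuals, best first
print(len(complete_crosses(selected)))   # 6
print(better_half_crosses(selected))     # every pair except ('C', 'D')
```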
OptiMAS: Graphical User Interface (GUI)
• OptiMAS tested on a multiparental connected design (Blanc et al., TAG, 2006).
• Example:
- 3rd cycle of selection.
- Population of 298 individuals generated.
- Prediction and selection of the best ones.
- Generation of crosses.
[Diagram: 4 connected inbred lines -> 6 F1 hybrids -> F2 (900 plants, 25 selected) -> Cycle 2, C (291 plants, 21 selected) -> Cycle 3, CM (298 plants, selection ?)]
DEMONSTRATION
Perspectives
Short term
• Interaction with user group to improve functions / ergonomics.
• Connection with ICIS DB.
• Implementation of complementation selection.
• Expected variance of crosses.
• Version 1 alpha release in June, then publication.
2nd step
• Consider allelic effects at QTL.
• Multitrait functionalities to adjust economic indexes.
• Visualisation of the pedigree of selected individuals.
• Linkage between QTL.
• Wrap OptiMAS algorithms into R scripts / packages.
• Combination with global genome performance estimation (genomic selection).
Acknowledgments
• Guylaine Blanc and J.B. Veyrieras for developing the first prototype of the program.
• Nicolas Bardol for proximity beta testing.
Thank you. Questions welcome.