Software for population genetics. Structure: J. K. Pritchard et al. Geneclass 2: S. Piry et al. Structure. Identification of genetic clusters. Identification of subclustering within breeds or relationships between breeds. Breed assignment of unknown samples to a reference set.
Software for population genetics • Structure: J. K. Pritchard et al. • Geneclass 2: S. Piry et al.
Structure • Identification of genetic clusters • Identification of subclustering within breeds or relationships between breeds • Breed assignment of unknown samples to a reference set
Structure: Identification of genetic clusters • Bayesian likelihood method for identifying K clusters • K: number of clusters/populations, provided by the user or inferred with Structure
Structure: Ancestry models • No admixture model: each individual originates from exactly one of the K populations • Admixture model: each individual has genomic fractions from more than one of the K populations • Linkage model: an admixture model in which linked loci are more likely to originate from the same population • Prior information model: the user pre-defines (some of) the clusters • NB: the choice of model also depends on the type of data available
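The difference between the no admixture and admixture models can be sketched for a single allele copy. This is a minimal illustration, not Structure's code: the population names, allele labels, frequencies and ancestry fractions q are all made up.

```python
def allele_prob_no_admixture(freqs, pop, allele):
    """No admixture: the whole genome comes from one population `pop`."""
    return freqs[pop][allele]


def allele_prob_admixture(freqs, q, allele):
    """Admixture: each allele copy comes from population k with
    probability q[k], so the probabilities are mixed over populations."""
    return sum(q[k] * freqs[k][allele] for k in q)


# Hypothetical allele frequencies at one locus in two populations.
freqs = {
    "pop1": {"120": 0.8, "124": 0.2},
    "pop2": {"120": 0.1, "124": 0.9},
}
# Hypothetical ancestry fractions of one individual (must sum to 1).
q = {"pop1": 0.7, "pop2": 0.3}

print(allele_prob_no_admixture(freqs, "pop1", "120"))
print(allele_prob_admixture(freqs, q, "120"))
```

With q set to 1 for a single population, the admixture expression reduces to the no admixture case.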
Structure: Ancestry models and input data • Dominant markers (AFLP, RFLP, etc.): no admixture model. AA and Aa cannot be distinguished, so only a 'present' or 'absent' genotype is available. • Sequence data, Y-chromosome or mtDNA haplotypes: linkage model. Consider each of these as a single locus with many alleles.
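Why a dominant marker carries less information can be shown with a small calculation. The sketch assumes Hardy-Weinberg proportions, which the slide does not state, and the function names are illustrative only.

```python
import math


def band_present_frequency(q_null):
    """Expected fraction of individuals showing the band (AA or Aa)
    when the recessive 'absent' allele has frequency q_null,
    assuming Hardy-Weinberg proportions."""
    return 1.0 - q_null ** 2


def null_allele_frequency(absent_fraction):
    """Only aa individuals lack the band, so under the same assumption
    q is the square root of the band-absent fraction."""
    return math.sqrt(absent_fraction)


q = 0.5
present = band_present_frequency(q)  # AA and Aa are pooled into 'present'
print(present)
print(null_allele_frequency(1.0 - present))
```

The allele frequency can only be recovered through the 'absent' class, because the two 'present' genotypes are pooled.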
Structure: Allele frequency models • Correlated allele frequencies: frequencies in different populations are likely to be similar (due to migration or shared ancestry) • Independent allele frequencies: the allele frequencies in each population are independent draws from a distribution specified by a parameter λ
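As a sketch of what "independent draws" means, the example below assumes the distribution is a symmetric Dirichlet with every parameter equal to λ, and samples it by normalising gamma variates. The function name and the number of alleles are illustrative.

```python
import random


def draw_allele_frequencies(n_alleles, lam=1.0, rng=None):
    """Draw one set of allele frequencies from a symmetric
    Dirichlet(lam, ..., lam) by normalising gamma variates."""
    rng = rng or random.Random()
    gammas = [rng.gammavariate(lam, 1.0) for _ in range(n_alleles)]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(42)
# Two populations: each gets its own independent draw for the same locus.
pop1 = draw_allele_frequencies(4, lam=1.0, rng=rng)
pop2 = draw_allele_frequencies(4, lam=1.0, rng=rng)
print(pop1)
print(pop2)
```

Under this assumption the two populations share nothing but the prior, which is the contrast with the correlated model.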
Structure: Determining K • How to estimate the number of populations / clusters in your dataset? • Fully resolving all the groups in your data (high K): test successive K values until the highest likelihood is reached • Determining the rough relationships (low K) • Trial and error
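One common ad hoc way to compare runs at different K is to turn the estimated log likelihoods into approximate posterior probabilities, assuming a uniform prior over the K values tested. The log-likelihood values below are made up for illustration.

```python
import math


def posterior_over_k(log_likelihoods):
    """Approximate P(K | data) from {K: ln P(data | K)} under a uniform
    prior on K. The maximum is subtracted before exponentiating so that
    large negative values do not underflow to zero."""
    top = max(log_likelihoods.values())
    weights = {k: math.exp(v - top) for k, v in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical estimated log likelihoods for K = 1..5.
ln_prob = {1: -4356.0, 2: -3983.0, 3: -3982.0, 4: -3983.0, 5: -4006.0}
posterior = posterior_over_k(ln_prob)
best_k = max(posterior, key=posterior.get)
print(best_k, round(posterior[best_k], 3))
```

When neighbouring K values differ by only a unit or so in log likelihood, as here, the result is a rough guide and not a firm estimate.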
Structure: Running parameters • Likelihood method: the program optimizes its own internal parameters • The starting configuration can have a very low probability, so Structure needs a learning run: the burn-in (10,000-100,000 replicates) • Actual run: enough replicates to obtain statistically sound results (depending on your dataset), ~50,000 (?)
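The reason for discarding a burn-in can be seen in a toy sampler. This is not Structure's algorithm: it is a generic random-walk Metropolis chain on a standard normal target, deliberately started far from the high-probability region, with made-up run lengths.

```python
import math
import random


def metropolis_chain(n_steps, start, rng):
    """Random-walk Metropolis sampler targeting a standard normal."""
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-1.0, 1.0)
        # Log of the target density ratio proposal / current.
        log_ratio = (x * x - proposal * proposal) / 2.0
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


rng = random.Random(1)
chain = metropolis_chain(5000, start=50.0, rng=rng)  # improbable start
burnin = 1000
kept = chain[burnin:]  # samples recorded after the burn-in

mean_all = sum(chain) / len(chain)
mean_kept = sum(kept) / len(kept)
print(mean_all, mean_kept)
```

The early samples still reflect the starting point, so including them biases the estimate; the mean of the retained samples lies closer to the true value of 0.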
Geneclass 2: Breed assignment • Software for genetic assignment and first-generation migrant detection • S. Piry, A. Alapetite, J.-M. Cornuet, D. Paetkau, L. Baudouin, A. Estoup • INRA, France • Journal of Heredity 2004; 95(6): 536-539
Geneclass 2: Breed assignment • Infers the probability that each reference population is the origin of a sampled individual, on the basis of multilocus genotype data • Haploid, diploid or mixed data • Likelihood criteria • Genetic distances • Allele frequencies • Bayesian algorithm • Monte Carlo resampling
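An allele-frequency criterion of the kind listed above can be sketched as follows. This is a simplified illustration and not Geneclass 2's implementation: it assumes Hardy-Weinberg proportions and independent loci, replaces zero frequencies with an arbitrary floor, and uses made-up marker names and frequencies for two of the breeds mentioned later.

```python
import math


def genotype_log_likelihood(genotype, freqs, floor=0.01):
    """Log likelihood of a diploid multilocus genotype in one population.
    genotype: {locus: (allele1, allele2)}; freqs: {locus: {allele: freq}}.
    Alleles unseen in the population get the (assumed) floor frequency."""
    total = 0.0
    for locus, (a1, a2) in genotype.items():
        p1 = max(freqs[locus].get(a1, 0.0), floor)
        p2 = max(freqs[locus].get(a2, 0.0), floor)
        prob = p1 * p1 if a1 == a2 else 2.0 * p1 * p2
        total += math.log(prob)
    return total


def assign(genotype, reference):
    """Return the best-scoring population and all log likelihoods."""
    scores = {pop: genotype_log_likelihood(genotype, f)
              for pop, f in reference.items()}
    return max(scores, key=scores.get), scores


# Hypothetical reference frequencies for two markers.
reference = {
    "Chianina": {"M1": {"120": 0.7, "124": 0.3}, "M2": {"200": 0.6, "204": 0.4}},
    "Limousin": {"M1": {"120": 0.2, "124": 0.8}, "M2": {"200": 0.1, "204": 0.9}},
}
sample = {"M1": ("120", "120"), "M2": ("200", "204")}
best, scores = assign(sample, reference)
print(best, scores)
```

Working in log space keeps the product over many markers from underflowing.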
Two examples… • Products with protected geographical indication (PGI) • Vitellone dell'Appennino Centrale • Allowed breeds: Chianina, Romagnola, Marchigiana • Not allowed: Piedmontese, Maremmana, Pezzata Rossa Italiana, Italian Brown, Italian Friesian, Charolais, Limousin, Belgian Blue • Veau du Limousin • Allowed breeds: Limousin, Blonde d'Aquitaine, Bazadaise • Not allowed: Holstein, Friesian, Fries-Hollands, Belgian Blue, Maine-Anjou, Normande, Bretonne Pie Noir, Charolais, Hereford, Aberdeen Angus, Gasconne, Aubrac, Salers, Montbéliarde, Simmental, Piedmontese, Swiss Brown, Pirinaica
Objective? • Identify a representative sample from a batch • Traceability • Fraud? • Protection of the (cultural, economic) integrity of the product
How? • Typing with microsatellites. • Compare patterns / allele frequencies with reference set. • Reference library: product of EU diversity project Resgen: • ~45 breeds (still adding) • 20 animals per breed • 30 microsatellite markers
Input file layout • Title • Marker order • Genotypes (allele1allele2) • Populations
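Genotypes written as allele1allele2 have to be split before allele frequencies can be counted. The sketch below assumes both alleles use the same number of digits and that an all-zero allele means missing data; neither detail is stated on the slide, and the function name is illustrative.

```python
def split_genotype(code):
    """Split a fused genotype code such as '120124' into two alleles.
    Assumes equal-width alleles; an all-zero allele is returned as None
    (treated as missing, which is an assumption)."""
    if len(code) % 2 != 0:
        raise ValueError(f"odd-length genotype code: {code!r}")
    half = len(code) // 2
    alleles = (code[:half], code[half:])
    return tuple(None if set(a) == {"0"} else a for a in alleles)


print(split_genotype("120124"))
print(split_genotype("1012"))
print(split_genotype("000000"))
```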
Optimization • No need to type all 30 microsatellites • Product-specific level of marker information • Geneclass 2 option: self-identification • Isolate the breeds involved in the product (allowed or not allowed) • Infer the level of successful self-identification per marker • Rank the markers in order of information content
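The ranking step can be sketched as a leave-one-out test per marker. This is an illustrative stand-in for the self-identification option, not its actual procedure: each reference animal is removed from its own breed, assigned on one marker alone with a simple frequency likelihood, and markers are sorted by the fraction assigned correctly. Data and names are made up.

```python
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies from a list of (allele1, allele2) tuples."""
    counts = Counter(a for g in genotypes for a in g)
    n = sum(counts.values())
    return {a: c / n for a, c in counts.items()}


def locus_likelihood(genotype, freqs, floor=0.01):
    """Single-locus genotype likelihood; unseen alleles get a floor."""
    a1, a2 = genotype
    p1 = max(freqs.get(a1, 0.0), floor)
    p2 = max(freqs.get(a2, 0.0), floor)
    return p1 * p1 if a1 == a2 else 2.0 * p1 * p2


def self_identification_rate(reference, marker):
    """reference: {breed: [{marker: (a1, a2)}, ...]}.
    Ties go to the breed listed first, a simplification of this sketch."""
    correct = total = 0
    for true_breed, animals in reference.items():
        for i, animal in enumerate(animals):
            best_breed, best_lik = None, -1.0
            for breed, others in reference.items():
                # Leave the tested animal out of its own breed.
                pool = [o[marker] for j, o in enumerate(others)
                        if not (breed == true_breed and j == i)]
                lik = locus_likelihood(animal[marker], allele_freqs(pool))
                if lik > best_lik:
                    best_breed, best_lik = breed, lik
            correct += best_breed == true_breed
            total += 1
    return correct / total


def rank_markers(reference, markers):
    """Most informative marker first."""
    return sorted(markers, reverse=True,
                  key=lambda m: self_identification_rate(reference, m))


# Hypothetical data: M1 separates the breeds, M2 does not.
reference = {
    "A": [{"M1": ("1", "1"), "M2": ("5", "5")} for _ in range(3)],
    "B": [{"M1": ("2", "2"), "M2": ("5", "5")} for _ in range(3)],
}
print(rank_markers(reference, ["M2", "M1"]))
```

Typing can then start with the top-ranked markers and stop once assignment is reliable enough for the product in question.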
Conclusions • Breed assignment of unknown samples to a (large) reference set is quite successful • Optimizing the marker order for each question greatly decreases the amount of typing necessary • For a more detailed picture of relationships, data can be analyzed in Structure
Exercise • 37 unknown samples (file exercise.txt) • Use the reference set (file reference.txt) to assign breed names to the samples • Play with the loci to see the effect of different markers on the solution