
Software for population genetics



Presentation Transcript


  1. Software for population genetics • Structure: J. K. Pritchard et al. • Geneclass 2: S. Piry et al.

  2. Structure • Identification of genetic clusters • Identification of subclustering within breeds or relationships between breeds • Breed assignment of unknown samples to a reference set

  3. Structure: Identification of genetic clusters • Bayesian likelihood method for identifying K clusters • K: number of clusters/populations, provided by the user or inferred by Structure

  4. Structure: Ancestry models • No-admixture model: each individual originates from one of the K populations • Admixture model: each individual has genomic fractions from more than one of the K populations • Linkage model: an admixture model in which linked loci are more likely to originate from the same population • Prior information model: the user pre-defines (some of) the clusters • NB: the choice of model is also determined by the type of data available!
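The difference between the no-admixture and admixture models can be illustrated with the probability of observing one allele copy. This is a minimal sketch, not Structure's code: the function names, the allele labels and the frequencies are made up for illustration. It assumes that under admixture each allele copy originates from population k with probability q[k].

```python
def allele_prob_no_admixture(allele, pop_freqs, k):
    """Probability of an allele when the whole individual comes from population k."""
    return pop_freqs[k].get(allele, 0.0)


def allele_prob_admixture(allele, pop_freqs, q):
    """Probability of an allele when each copy comes from population k
    with probability q[k] (the individual's ancestry fractions)."""
    return sum(q_k * freqs.get(allele, 0.0) for q_k, freqs in zip(q, pop_freqs))


# Hypothetical allele frequencies at one locus in K = 2 populations.
pop_freqs = [{"A": 0.9, "B": 0.1}, {"A": 0.2, "B": 0.8}]

p_pure = allele_prob_no_admixture("A", pop_freqs, k=0)
p_mixed = allele_prob_admixture("A", pop_freqs, q=[0.5, 0.5])
```

With equal ancestry fractions the admixed probability is simply the average of the two population frequencies.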

  5. Structure: Ancestry models and input data • Dominant markers: no-admixture model. • AA and Aa cannot be distinguished, so only a 'present' or 'absent' genotype is available. • AFLP, RFLP etc. • Sequence data, Y-chromosome or mtDNA haplotypes: linkage model. Consider this as a single locus with many alleles.

  6. Structure: Allele frequency models • Correlated allele frequencies: frequencies in different populations are likely to be similar (due to migration or shared ancestry). • Independent allele frequencies: allele frequencies are independent draws from a distribution specified by a factor λ
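The independent-frequencies model can be pictured as drawing each population's allele frequencies separately from a symmetric Dirichlet distribution with parameter λ. The sketch below assumes that form of the distribution; the function name and the numbers of alleles and populations are illustrative only.

```python
import random


def draw_allele_frequencies(n_alleles, lam=1.0, rng=None):
    """Draw one set of allele frequencies from a symmetric Dirichlet(lam).

    A Dirichlet draw is obtained by normalizing independent Gamma(lam, 1) draws.
    """
    rng = rng or random.Random()
    gammas = [rng.gammavariate(lam, 1.0) for _ in range(n_alleles)]
    total = sum(gammas)
    return [g / total for g in gammas]


# Hypothetical setup: 3 populations, one locus with 4 alleles.
rng = random.Random(42)
freqs_by_population = [draw_allele_frequencies(4, lam=1.0, rng=rng) for _ in range(3)]
```

Because each population gets its own draw, nothing ties the frequencies of different populations together, which is the contrast with the correlated model.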

  7. Structure: Determining K • How to estimate the number of populations / clusters in your dataset? • Fully resolving all the groups in your data (high K): test all K values until the highest likelihood values are reached. • Determining the rough relations (low K) • Trial and error
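One way to compare runs at different K is to turn the estimated log-likelihood of the data for each K into approximate posterior probabilities. The sketch below assumes a uniform prior over the tested K values and uses made-up log-likelihoods that do not come from this presentation; the function name is hypothetical.

```python
import math


def posterior_over_k(ln_prob_by_k):
    """Convert estimated ln P(data | K) values into approximate P(K | data),
    assuming every tested K is equally likely a priori."""
    max_ln = max(ln_prob_by_k.values())
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    weights = {k: math.exp(v - max_ln) for k, v in ln_prob_by_k.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Illustrative values only: one estimated ln P(data | K) per tested K.
ln_prob = {1: -4356.0, 2: -3983.0, 3: -3982.0, 4: -3983.0, 5: -4006.0}
posterior = posterior_over_k(ln_prob)
best_k = max(posterior, key=posterior.get)
```

Small differences in log-likelihood between neighbouring K values translate into posteriors that are far from decisive, which is why some trial and error remains.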

  8. Structure: Running parameters • Likelihood method: the program optimizes its own internal parameters. • The startup configuration can have a very low probability, so Structure needs a learning run: the burn-in (10,000-100,000 replicates) • Actual run: enough replicates to obtain statistically sound results (depending on your dataset) ~ 50,000 (?)
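The role of the burn-in can be shown with a toy sampler. This is a generic Metropolis chain estimating a single allele frequency, not Structure's algorithm; the allele counts, step size and run lengths are arbitrary assumptions. The chain starts from a deliberately poor value, and only the iterations after the burn-in are kept.

```python
import math
import random


def log_posterior(p, n_a, n_total):
    """Log binomial likelihood of n_a copies of allele A among n_total copies,
    with a flat prior on the frequency p."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return n_a * math.log(p) + (n_total - n_a) * math.log(1.0 - p)


def run_chain(n_a, n_total, burnin, n_kept, start=0.01, step=0.05, seed=1):
    """Run a Metropolis chain and return only the post-burn-in samples."""
    rng = random.Random(seed)
    p = start  # improbable starting configuration
    kept = []
    for i in range(burnin + n_kept):
        proposal = p + rng.uniform(-step, step)
        log_ratio = log_posterior(proposal, n_a, n_total) - log_posterior(p, n_a, n_total)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            p = proposal
        if i >= burnin:
            kept.append(p)
    return kept


# Hypothetical data: 60 copies of allele A among 200 sampled copies.
samples = run_chain(n_a=60, n_total=200, burnin=2000, n_kept=10000)
estimate = sum(samples) / len(samples)
```

Discarding the burn-in keeps the early iterations, which still reflect the starting value, out of the final estimate.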

  9. Geneclass 2: Breed assignment • Software for Genetic Assignment and First-Generation Migrant Detection • S. Piry, A. Alapetite, J.-M. Cornuet, D. Paetkau, L. Baudouin, A. Estoup • INRA, France • Journal of Heredity 2004; 95(6): 536-539

  10. Geneclass 2: Breed assignment • Infers the probability that each reference population is the origin of a sampled individual, on the basis of multilocus genotype data. • Haploid, diploid or mixed data • Likelihood criteria • Genetic distances • Allele frequencies • Bayesian algorithm • Monte Carlo resampling
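The allele-frequency criterion can be sketched as follows. This is a simplified frequency-based likelihood, not Geneclass 2's implementation: it assumes Hardy-Weinberg proportions and independent loci, and it replaces alleles missing from a reference breed with an arbitrary small frequency (`floor`). The breed names and genotypes are invented.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """genotypes: list of individuals, each a list of (allele1, allele2) per locus.
    Returns one {allele: frequency} dict per locus."""
    freqs = []
    for locus in range(len(genotypes[0])):
        counts = Counter()
        for individual in genotypes:
            counts.update(individual[locus])  # counts both allele copies
        total = sum(counts.values())
        freqs.append({a: c / total for a, c in counts.items()})
    return freqs


def log_likelihood(individual, freqs, floor=0.01):
    """Log-likelihood of a multilocus genotype given a breed's allele frequencies."""
    ll = 0.0
    for (a1, a2), f in zip(individual, freqs):
        p, q = f.get(a1, floor), f.get(a2, floor)
        ll += math.log(p * p if a1 == a2 else 2 * p * q)
    return ll


def assign(individual, reference):
    """Return the best-scoring breed and all log-likelihood scores."""
    scores = {breed: log_likelihood(individual, allele_frequencies(g))
              for breed, g in reference.items()}
    return max(scores, key=scores.get), scores


# Hypothetical reference set: 2 breeds, 3 animals each, 2 microsatellite loci.
reference = {
    "BreedA": [[(120, 120), (200, 204)], [(120, 122), (200, 200)], [(120, 120), (204, 204)]],
    "BreedB": [[(126, 128), (210, 210)], [(128, 128), (210, 212)], [(126, 126), (212, 212)]],
}
unknown = [(120, 122), (200, 204)]
best_breed, scores = assign(unknown, reference)
```

A real analysis would also attach a probability to the assignment, for example by Monte Carlo resampling as listed on the slide; that step is left out here.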

  11. Two examples… • Products with protected geographical indication (PGI) • Vitellone dell'Appennino Centrale • Allowed breeds: Chianina, Romagnola, Marchigiana • Not allowed: Piedmontese, Maremmana, Pezzata Rossa Italiana, Italian Brown, Italian Friesian, Charolais, Limousin, Belgian Blue • Veau du Limousin • Allowed breeds: Limousin, Blonde d'Aquitaine, Bazadaise • Not allowed: Holstein, Friesian, Fries-Hollands, Belgian Blue, Maine-Anjou, Normande, Bretonne Pie Noire, Charolais, Hereford, Aberdeen Angus, Gasconne, Aubrac, Salers, Montbéliard, Simmental, Piedmontese, Swiss Brown, Pirenaica

  12. Objective? • Identify a representative sample from a batch • Traceability • Fraud? • Protection of the (cultural, economic) integrity of the product

  13. How? • Typing with microsatellites. • Compare patterns / allele frequencies with reference set. • Reference library: product of EU diversity project Resgen: • ~45 breeds (still adding) • 20 animals per breed • 30 microsatellite markers

  14. Input file layout (figure labels) • Title • Marker order • Genotypes (allele1allele2) • Populations

  15. Optimization • No need to type all 30 microsatellites • Product-specific level of marker information • Geneclass 2 option: self-identification • Isolate the breeds involved in the product (allowed or not allowed) • Infer the level of successful self-identification per marker • Rank the markers in order of level of information
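The ranking step can be sketched as a leave-one-out test per marker: each reference animal is removed from its own breed, assigned back using a single marker, and the markers are ordered by the fraction of correct self-assignments. This is an illustration under simplifying assumptions (Hardy-Weinberg proportions, an arbitrary `floor` frequency for unseen alleles), not the procedure Geneclass 2 runs internally; the breeds, markers and genotypes are invented.

```python
import math
from collections import Counter


def locus_freqs(individuals, locus):
    """Allele frequencies at one locus for a list of individuals."""
    counts = Counter()
    for individual in individuals:
        counts.update(individual[locus])
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def locus_loglik(genotype, freqs, floor=0.01):
    """Log-likelihood of one diploid genotype under Hardy-Weinberg proportions."""
    a1, a2 = genotype
    p, q = freqs.get(a1, floor), freqs.get(a2, floor)
    return math.log(p * p if a1 == a2 else 2 * p * q)


def self_assignment_rate(reference, locus):
    """Fraction of reference animals assigned back to their own breed
    using one marker, leaving each animal out of its own breed's frequencies."""
    correct = total = 0
    for breed, individuals in reference.items():
        for i, individual in enumerate(individuals):
            scores = {}
            for other, members in reference.items():
                pool = members[:i] + members[i + 1:] if other == breed else members
                scores[other] = locus_loglik(individual[locus], locus_freqs(pool, locus))
            correct += max(scores, key=scores.get) == breed
            total += 1
    return correct / total


def rank_markers(reference, marker_names):
    """Markers sorted from most to least informative."""
    rates = {name: self_assignment_rate(reference, i) for i, name in enumerate(marker_names)}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: marker M1 separates the breeds, marker M2 does not.
reference = {
    "BreedA": [[(120, 120), (200, 202)], [(120, 122), (200, 200)],
               [(122, 122), (202, 202)], [(120, 122), (200, 202)]],
    "BreedB": [[(130, 130), (200, 202)], [(130, 132), (200, 200)],
               [(132, 132), (202, 202)], [(130, 132), (200, 202)]],
}
ranking = rank_markers(reference, ["M1", "M2"])
```

Typing could then start with the top-ranked markers and stop once the assignment is reliable enough for the product in question.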

  16. Conclusions • Breed assignment of unknown samples to a (large) reference set is quite successful • Optimizing the marker order for each question greatly decreases the amount of typing necessary. • For a more detailed picture of relationships, data can be analyzed in Structure

  17. Exercise • 37 unknown samples (file exercise.txt) • Use the reference set (file reference.txt) to assign breed names to the samples • Play with the loci to see the effect of different markers on the solution

  18. Solution
