This paper introduces mStruct, a novel admixture model that accounts for both genetic admixing and allele mutations in population structure analysis. The model applies to microsatellite and single nucleotide polymorphism (SNP) data and incorporates a mutation parameter for each locus. The authors present the generative process for mStruct and propose a variational inference algorithm that makes inference efficient and tractable. Experimental results on synthetic datasets and HGDP microsatellite data show that mStruct captures population structure while modeling both genetic admixture and allele mutation effects.
mStruct: A New Admixture Model for Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations. Suyash Shringarpure and Eric Xing, School of Computer Science, Carnegie Mellon University, ICML 2008. Presented by Haojun Chen
Outline • Background • Structure Model • mStruct Model • Experiment Results • Summary
Background • Allele: one member of a pair or series of different forms of a gene • Population structure analysis aims to shed light on the evolutionary history of modern human populations • Microsatellite and single nucleotide polymorphism (SNP) data: the basis of population structure analysis • State-of-the-art method: Structure
Structure Model • x: microsatellite alleles • Unique set of population-specific multinomial distributions • Vector of multinomial parameters, a.k.a. the allele frequency profile (AP), of the allele distribution at locus i in ancestral population k • Total number of observed marker alleles at locus i • Total number of marker loci • Total number of individuals • Individual-specific admixing coefficient vector
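Under the Structure model, each observed allele is drawn from a mixture of the ancestral allele frequency profiles, weighted by the individual's admixing coefficients. A sketch of the resulting per-allele probability follows; the symbols are illustrative labels of our own (θ_n for individual n's admixing vector, β_{k,i} for the AP at locus i in ancestral population k, K for the number of ancestral populations, J_i for the number of observed alleles at locus i), not necessarily the paper's notation:

```latex
P\left(x_{n,i} = j \mid \theta_n, \beta\right) = \sum_{k=1}^{K} \theta_{n,k}\, \beta_{k,i,j}, \qquad j \in \{1, \dots, J_i\}
```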
Pitfall of Structure • There is no mutation model relating modern individual alleles to common prototypes in the modern populations • Every unique allele in the modern population is assumed to have a distinct ancestral frequency, rather than allowing for the possibility that it is simply a descendant of some common ancestral allele
mStruct Model • Set of ancestral alleles • Mutation parameter associated with each locus • Frequencies of the ancestral alleles • Total number of ancestral alleles • Microsatellite mutation model • SNP mutation model
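The two mutation models can be sketched in Python as follows. This is a minimal illustration under assumed functional forms, not the paper's exact definitions: for microsatellites we assume the probability of observing allele length b from ancestral allele a decays geometrically in the step distance |b − a| with rate δ, normalized over the alleles observed at the locus; for SNPs we assume the ancestral allele is kept with probability δ and switched otherwise. Both function names are hypothetical.

```python
from typing import Dict, List


def microsatellite_mutation_probs(ancestral: int, delta: float,
                                  observed_alleles: List[int]) -> Dict[int, float]:
    """Assumed stepwise kernel: P(b | a, delta) proportional to delta ** |b - a|,
    normalized over the allele lengths observed at the locus."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    weights = {b: delta ** abs(b - ancestral) for b in observed_alleles}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def snp_mutation_prob(observed: int, ancestral: int, delta: float) -> float:
    """Assumed biallelic kernel: keep the ancestral allele with probability
    delta, switch to the other allele with probability 1 - delta."""
    return delta if observed == ancestral else 1.0 - delta
```

For example, with δ = 0.5 and ancestral length 10, observed lengths 8 through 12 receive probabilities 0.1, 0.2, 0.4, 0.2 and 0.1, so alleles a few repeat units away from the ancestral allele still get non-negligible mass instead of a separate frequency parameter.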
Generative Process • Generative process for Structure • Generative process for mStruct: step 2.2 of the Structure process is replaced by a mutation-aware step
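As a hedged illustration of an mStruct-style generative process, the Python sketch below samples one individual under our own assumptions: an LDA-like structure (a Dirichlet-distributed admixing vector and a population label per allele copy), a categorical draw of an ancestral allele from that population's ancestral allele frequencies, and a stepwise mutation kernel in which the weight of an observed length decays as δ to the power of its distance from the ancestral length. All function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence


def dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    # Dirichlet sample: normalize independent Gamma(alpha_k, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(weights: Sequence[float], rng: random.Random) -> int:
    # Index drawn with probability proportional to its (non-negative) weight.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def sample_individual(alpha, anc_alleles, anc_freqs, deltas, allele_values,
                      n_copies, rng):
    """Hypothetical mStruct-style sampler for one individual.

    anc_alleles[k][i]: ancestral allele lengths at locus i in population k
    anc_freqs[k][i]:   frequencies of those ancestral alleles
    deltas[k][i]:      mutation parameter in (0, 1)
    allele_values[i]:  allele lengths that can be observed at locus i
    """
    theta = dirichlet(alpha, rng)  # individual-specific admixing coefficients
    genotype: List[List[int]] = []
    for i, values in enumerate(allele_values):
        copies = []
        for _ in range(n_copies):
            k = categorical(theta, rng)            # ancestral population
            y = categorical(anc_freqs[k][i], rng)  # ancestral allele index
            mu = anc_alleles[k][i][y]
            # Structure would draw the observed allele directly from the
            # population's allele frequency profile; here the ancestral allele
            # is instead perturbed by an assumed stepwise (geometric) kernel.
            weights = [deltas[k][i] ** abs(b - mu) for b in values]
            copies.append(values[categorical(weights, rng)])
        genotype.append(copies)
    return theta, genotype
```

Seeding a `random.Random` instance and passing it in keeps draws reproducible, which is convenient when generating synthetic benchmark data.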
mStruct Model Inference • MCMC: slow • Variational inference for the hidden variables; variational EM for the hyperparameters
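To make the variational step concrete, here is a Python sketch of the standard LDA-style mean-field updates for one individual under the plain Structure likelihood, with the allele frequency profiles held fixed. This is a swapped-in illustration: mStruct's own updates also involve the ancestral alleles and mutation parameters and are not reproduced here, and all names are hypothetical. Each iteration sets φ_{e,k} ∝ β_{k,i,j} exp(ψ(γ_k)) for every observed allele copy e, then γ_k = α_k + Σ_e φ_{e,k}. The digamma function ψ is approximated by hand to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence, Tuple


def digamma(x: float) -> float:
    # psi(x) = psi(x + 1) - 1/x shifts x upward; then use the asymptotic series.
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def variational_update(observed: Sequence[Tuple[int, int]],
                       aps: Sequence[Sequence[Sequence[float]]],
                       alpha: Sequence[float],
                       n_iters: int = 50):
    """Mean-field updates for one individual (LDA-style, Structure likelihood).

    observed: (locus i, allele index j) for each observed allele copy
    aps[k][i][j]: allele frequency profile entries, held fixed
    Returns (gamma, phi): Dirichlet parameters for theta and per-copy
    responsibilities over the K ancestral populations.
    """
    K = len(alpha)
    gamma = [a + len(observed) / K for a in alpha]
    phi: List[List[float]] = [[1.0 / K] * K for _ in observed]
    for _ in range(n_iters):
        # E[log theta_k] = psi(gamma_k) - psi(sum gamma); the second term is
        # shared by every k and cancels when phi is normalized.
        weights = [math.exp(digamma(g)) for g in gamma]
        for e, (i, j) in enumerate(observed):
            unnorm = [aps[k][i][j] * weights[k] for k in range(K)]
            total = sum(unnorm)
            phi[e] = [u / total for u in unnorm]
        gamma = [alpha[k] + sum(p[k] for p in phi) for k in range(K)]
    return gamma, phi
```

Each update has a closed form, which is why variational inference avoids the long sampling runs that make MCMC slow.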
Synthetic Data • Twenty microsatellite genotype datasets, each with 100 individuals from 3 ancestral populations at 50 genotype loci
HGDP Microsatellite Data • Model selection by BIC (Bayesian Information Criterion) score
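A minimal Python sketch of BIC-based selection of the number of ancestral populations K, assuming the common convention BIC = −2 ln L̂ + p ln n (lower is better), where L̂ is the maximized likelihood, p the number of free parameters and n the number of observations. How the paper counts p and n, and which sign convention it uses, are not stated on the slide; the helper names are hypothetical.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    # Schwarz criterion, "lower is better" convention: -2 ln L + p ln n.
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_k(fits: Dict[int, Tuple[float, int]], n_obs: int) -> int:
    """fits maps a candidate K to (maximized log-likelihood, free parameters).
    Returns the K with the smallest BIC."""
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))
```

The p ln n penalty grows with every added population, so a larger K is chosen only when its likelihood gain outweighs the extra parameters.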
HGDP Microsatellite Data • 1056 individuals from 52 populations at 377 autosomal microsatellite loci • am-spectrum: spectra of different ancestral populations • gm-spectrum: spectra of different geographical populations
Summary • mStruct takes into account both genetic admixture and allele mutation effects • mStruct: an extended LDA that allows noisy observations • A variational inference algorithm was developed for mStruct, making inference tractable • Other applications: images, text, and so on