MALD

MALD Mapping by Admixture Linkage Disequilibrium

Introduction • Admixture – a genetic mix of two or more different populations (such as African-Americans). • Linkage Disequilibrium – a state in which alleles at two different loci show non-random association.

Why do we need MALD? Problem We want to find the cause of a complex genetic disorder or disease. The disease is the result of several different genes, each having a small effect (or none) on its own, which together cause the disease phenotype.

Why do we need MALD? Solution 1 Using linkage mapping methods in order to find the cause. Problem Linkage mapping will only find the allele with the biggest influence on the disease, which is not necessarily its cause. Individuals with the allele may be healthy and individuals without it may be sick. Failed

Why do we need MALD? Solution 2 Using genome-wide association studies to find the causative alleles in the affected individuals. Problem Genome-wide association studies require haplotypes of the entire genome for each of the thousands of individuals in the study. Costs would be several million dollars for a single study! Not Feasible

??? We need a model that enables us to infer information about the entire genome of an individual without having to genotype the entire genome and without having to use thousands of people…

MALD – some basic concepts • Linkage Disequilibrium refers to a state where 2 alleles at different locations on the genome are associated non-randomly. The association need not come from physical linkage (limited recombination) alone; other effects such as epistasis can also produce it.

Linkage Disequilibrium Linkage disequilibrium is usually measured by the covariance of the allele frequencies: D = h12 − p1·p2. Here p1, p2 denote the marginal allele frequencies at the two loci and h12 denotes the haplotype frequency in the joint distribution of both alleles.

Linkage Disequilibrium – cont. Normal frequencies

Linkage Disequilibrium – cont. Linkage equilibrium

Linkage Disequilibrium – cont. Linkage disequilibrium

Linkage Disequilibrium – cont. Using the following notation:

Linkage Disequilibrium – cont. DAB is hard to interpret: • Sign is arbitrary. • Range depends on allele frequencies.

Linkage Disequilibrium – cont. D’AB – a scaled version of DAB. Better estimates exist, which will not be discussed here…
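The D and D′ measures from the slides above can be sketched in a few lines of Python. The slide's own formula for D′ was not preserved in the transcript, so this sketch assumes the standard Lewontin normalization (D divided by the largest magnitude it could take given the allele frequencies). The function and parameter names are illustrative only.

```python
def ld_d(h_ab, p_a, p_b):
    """Covariance-style LD: haplotype frequency minus the product of allele frequencies."""
    return h_ab - p_a * p_b


def ld_d_prime(h_ab, p_a, p_b):
    """Scaled LD (assumed Lewontin's D'): D divided by its maximum possible magnitude.

    The result lies in [-1, 1], so its range no longer depends on allele frequencies.
    """
    d = ld_d(h_ab, p_a, p_b)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # If an allele is fixed (frequency 0 or 1), D is necessarily 0 and D' is undefined;
    # this sketch returns 0.0 in that case.
    return 0.0 if d_max == 0 else d / d_max
```

For example, with p_a = p_b = 0.5, a haplotype frequency of 0.25 gives D = 0 (linkage equilibrium), while a haplotype frequency of 0.5 gives D = 0.25 and D′ = 1 (complete disequilibrium).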

Why does linkage equilibrium hold for most loci? Generation t, initial configuration:

Why does linkage equilibrium hold for most loci? Generation t+1, without recombination:

Why does linkage equilibrium hold for most loci? Generation t+1, with recombination:

Why does linkage equilibrium hold for most loci? Generation t+1, Overall:

Why does linkage equilibrium hold for most loci? r = the probability of recombination
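The figures for these slides were not preserved, so the following is a sketch under the assumption that they showed the standard textbook argument: with probability 1 − r a haplotype is passed on intact, and with probability r a recombinant is formed that pairs the two alleles at random. Under that assumption, D shrinks by a factor of (1 − r) each generation, which is why most loci end up in linkage equilibrium. Function names are illustrative.

```python
def next_generation(h_ab, p_a, p_b, r):
    """Frequency of the A-B haplotype after one generation of random mating.

    (1 - r): haplotype transmitted without recombination.
    r:       recombinant haplotype, A and B drawn independently.
    Allele frequencies p_a and p_b are assumed constant (no selection, no drift).
    """
    return (1 - r) * h_ab + r * p_a * p_b


def d_over_time(h_ab, p_a, p_b, r, generations):
    """List of D values for generations 0..generations."""
    values = []
    for _ in range(generations + 1):
        values.append(h_ab - p_a * p_b)
        h_ab = next_generation(h_ab, p_a, p_b, r)
    return values
```

With unlinked loci (r = 0.5) D halves every generation; with tightly linked loci (small r) it decays slowly, so LD persists over many generations.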

Admixture Linkage Disequilibrium In an admixed group, each person's genotype is a mix from both parental populations, and alleles in the mix can be traced back to their original parental population. The segments of LD that this creates in an admixed individual are called admixture linkage disequilibrium (ALD).

MALD – basic concepts cont. Relies on the differences in allele frequency between the parent populations. Using these differences allows us to focus on changes in regions in the genome rather than specific genes.
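A small Python sketch can show why differences in allele frequency between the parental populations matter. It assumes a single mixing event, two parental populations, and no LD inside either parental population; all names are illustrative. Under those assumptions the LD created by mixing equals m(1 − m) times the product of the frequency differences at the two loci, so markers with similar frequencies in both parents generate almost no ALD.

```python
def admixture_ld(m, p_a1, p_b1, p_a2, p_b2):
    """D between alleles A and B right after mixing two populations.

    m            : fraction of the mixed population drawn from population 1
    p_a1, p_b1   : frequencies of A and B in population 1
    p_a2, p_b2   : frequencies of A and B in population 2
    Assumes linkage equilibrium within each parental population.
    """
    h_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2   # A-B haplotype frequency in the mix
    p_a = m * p_a1 + (1 - m) * p_a2                  # allele A frequency in the mix
    p_b = m * p_b1 + (1 - m) * p_b2                  # allele B frequency in the mix
    return h_ab - p_a * p_b
```

For instance, admixture_ld(0.2, 0.9, 0.8, 0.1, 0.2) is positive, while the same call with identical frequencies in both populations returns zero.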

Admixture • Admixture is a result of a mixture of 2 or more populations: • African-Americans: 80% African 20% European • Latinos: 50% Native American 50% European • Caribbean: 50% European 30% Native American 20% Western African

MALD MALD uses the allele frequencies of regions in the genome that exhibit LD to the parental population. When individuals from an admixed population are affected by a genetic disease that has a higher frequency in one of the parental populations, the variation in allele frequency can be detected, and the disease locus found.

MALD MALD can be separated into 5 steps: • Choosing a cohort of people affected by the disease from an admixed population.

Choosing the group • The group needs to be at least 2nd generation admixed in order to rule out data resulting from recombination* • A set of markers that identifies the origin of the alleles must be available. • The individuals need to be at least 10% admixed.

Conservation of ALD

Which disease is suitable for MALD • The disease must be one that has a large difference in frequency between the parental populations (~60%). • Disease should be complex (otherwise we can use linkage mapping).

MALD • The group of people selected are genotyped with a set of polymorphic markers.

Markers used for MALD • Markers must be evenly spaced and sufficiently dense (at most 1.5cM apart) • Markers must be able to differentiate between alleles from parental populations. • Markers should not show LD within the parental populations. • Markers must have high Shannon Information Content (SIC).

Markers used for MALD • The number and spacing of the markers required depends on the amount of admixture. • More admixture (such as in Latinos) means more fragmented LD segments, which means more markers are needed in order to find the origin of each locus.

Shannon Information Content A measurement used in Information Theory. The SIC of a marker is the amount of information about ancestry gained from using this marker. SIC is a much better measurement of the quality of a marker than simple LD (D), since it also takes into consideration the amount of information gained by not finding the marker allele.

Shannon Information Content A SIC value of 0.035 for a marker is considered sufficient for MALD.
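The slides do not give a formula for SIC. As a rough sketch, the code below assumes it can be modelled as the mutual information (in bits) between the ancestry of a chromosome and the allele observed at the marker, for a two-population mix. The exact definition and scale used when building MALD marker panels may differ, so values from this sketch should not be compared directly with the 0.035 threshold. All names are illustrative.

```python
import math


def entropy(p):
    """Binary Shannon entropy in bits of an allele with frequency p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def marker_information(p_pop1, p_pop2, m):
    """Assumed SIC: mutual information between ancestry and allele.

    p_pop1, p_pop2 : allele frequency in parental populations 1 and 2
    m              : fraction of ancestry coming from population 1
    Computed as H(allele) - H(allele | ancestry).
    """
    p_mix = m * p_pop1 + (1 - m) * p_pop2
    conditional = m * entropy(p_pop1) + (1 - m) * entropy(p_pop2)
    return entropy(p_mix) - conditional
```

A marker with the same frequency in both parental populations carries no information about ancestry, and the information grows as the frequencies diverge. Because both the presence and the absence of the allele enter the entropy terms, the measure reflects what is learned from either outcome.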

MALD • The patchwork of ancestral chromosome data is assessed for every individual. • Chromosome regions that have an elevated frequency of the ancestry with the higher disease incidence are identified. • The cause at each locus is identified.
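The idea of scanning for regions with elevated ancestry can be illustrated with a toy Python sketch. It assumes the per-locus ancestry of each case has already been estimated (real MALD analyses infer this with hidden Markov models and MCMC software and report likelihood-based scores). The z-score rule, the threshold, and the function name here are hypothetical simplifications.

```python
from statistics import mean, pstdev


def flag_elevated_loci(ancestry, threshold=3.0):
    """Return indices of loci where cases show excess high-risk ancestry.

    ancestry[i][j] : estimated fraction (e.g. 0, 0.5 or 1) of high-risk
                     ancestry carried by case i at locus j
    A locus is flagged when its average across cases exceeds the genome-wide
    average by more than `threshold` standard deviations (taken across loci).
    """
    n_loci = len(ancestry[0])
    locus_means = [mean(person[j] for person in ancestry) for j in range(n_loci)]
    genome_mean = mean(locus_means)
    spread = pstdev(locus_means)
    if spread == 0:
        return []  # every locus looks the same; nothing stands out
    return [j for j, value in enumerate(locus_means)
            if (value - genome_mean) / spread > threshold]
```

Comparing each locus with the genome-wide average of the same individuals is what allows, in principle, a cases-only design.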

The Power of MALD • Theoretically, MALD enables us to choose only cases (affected individuals), without the need for controls. • MALD can locate the causative allele at a resolution of 10cM*, ~100 genes, which can later be analyzed by association studies and further research. * Depending on the amount of admixture and density of markers.

The Power of MALD • Sample sizes are considerably smaller for MALD analysis. • The number of SNPs to be mapped is considerably smaller. • An individual can be genotyped for a MALD study for a few hundred dollars. • Feasible!!

Limitations and Guidelines • Assessing LD is tricky: LD may result from natural selection in the parental population. Such LD must be excluded so as not to give false positives. • Errors in assessing the allele frequencies in the parental populations can have the same effect. • Ethical considerations must be taken into account, since the information might be misused.

Criteria for declaring significance in a MALD study • The Bayesian statistic for detecting genome-wide significant association should be >2. • The deviation of European ancestry compared with the genome average should be seen in cases only, and not in controls. • The signal should remain when the marker that contributes most strongly to disease is removed. • Markers that are in linkage disequilibrium with each other in ancestral European and western African populations should be excluded from the mapping by admixture linkage disequilibrium (MALD) marker set. • The region of association should be statistically significant based on two different Markov chain Monte Carlo analysis-software packages. • The P-values for case–control association studies should be obtained by carrying out permutation testing. The statistic at the disease locus must be more extreme, and therefore more significant, than for any other locus throughout the genome in 100 random permutations of the case and control labels. • The statistic for association should increase in significance when marker density at the locus is increased, or when more affected samples are added to the study.
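The permutation criterion above can be sketched as follows. This is a toy illustration, not the procedure of any specific MALD software: the statistic (case-minus-control difference in mean ancestry), the function names, and the fixed random seed are all assumptions made for the example.

```python
import random


def max_locus_statistic(ancestry, labels):
    """Largest case-minus-control difference in mean ancestry over all loci.

    ancestry[i][j] : ancestry estimate for individual i at locus j
    labels[i]      : True for a case, False for a control
    """
    cases = [row for row, is_case in zip(ancestry, labels) if is_case]
    controls = [row for row, is_case in zip(ancestry, labels) if not is_case]
    n_loci = len(ancestry[0])
    return max(
        sum(r[j] for r in cases) / len(cases)
        - sum(r[j] for r in controls) / len(controls)
        for j in range(n_loci)
    )


def permutation_passes(ancestry, labels, n_perm=100, seed=0):
    """True if the observed peak beats the genome-wide peak in every permutation."""
    rng = random.Random(seed)
    observed = max_locus_statistic(ancestry, labels)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between labels and ancestry
        if max_locus_statistic(ancestry, shuffled) >= observed:
            return False
    return True
```

Taking the maximum over all loci in each permutation is what makes the comparison genome-wide rather than specific to one locus.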

MALD – the future • Using databases such as the International HapMap project, SNPs of each parental population can be assessed, and we will be able to reach ~90% information about the origin at each locus. • Maps for other populations will be built, allowing ALD studies in Latinos, Hawaiians and Aborigines. • SIC values will become less important as the cost of denser genotyping drops. • Diseases that exhibit a 10-20% frequency difference between parental populations will be assessed.

Questions?
