Understanding Linkage Disequilibrium in Marker-Assisted Selection: Key Concepts and Relevance in Orphan Crops

IGSS Workshop, Marker-Assisted Selection for Orphan Crops, Part 3: Linkage Disequilibrium

Objectives
• Present basic LD concepts and equations
• Advantages and disadvantages of LD
• List key differences between D, D′, and r²
• Show how LD affects many aspects of marker-based selection, marker-assisted selection, and genomic selection

Linkage and Linkage Disequilibrium

[Figure: Physical linkage after the 1st, 2nd, and 5th generations of recombination]

Physical Linkage: 5th Generation of Recombination
• Recombination reduces LD between physically linked loci.
• Random mating reduces D and r²: after t generations of random mating, $D_t = (1-c)^t D_0$, where $D_0$ is D before random mating began and c is the recombination frequency between the two loci.
• After an infinite number of generations of random mating, $D_\infty = 0$, so all loci with c > 0, even physically linked ones, would appear to be in equilibrium (e.g., r² = 0).
• LD persists longer in inbreeding species because reduced heterozygosity makes fewer recombination events effective (a crossover in a homozygous individual produces no new haplotype).
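The decay rule $D_t = (1-c)^t D_0$ can be checked numerically. The sketch below assumes an idealized, infinitely large random-mating population with no selection, mutation, migration, or drift: each generation a fraction c of gametes is recombinant and combines alleles independently, while the rest keep their haplotype. The helper names `d_from_haplotypes` and `next_generation` are hypothetical, chosen for illustration.

```python
# Sketch: decay of D under random mating, assuming an idealized infinite
# population (no selection, mutation, migration, or drift).
# Helper names are hypothetical (illustration only).

def d_from_haplotypes(h):
    """D = f(MQ)*f(mq) - f(Mq)*f(mQ) for a dict of haplotype frequencies."""
    return h["MQ"] * h["mq"] - h["Mq"] * h["mQ"]

def next_generation(h, c):
    """Haplotype frequencies after one generation of random mating.

    A fraction (1 - c) of gametes is non-recombinant and keeps its haplotype;
    a fraction c is recombinant and pairs alleles at random, so its expected
    frequency is the product of the two allele frequencies.
    """
    p_M = h["MQ"] + h["Mq"]
    p_Q = h["MQ"] + h["mQ"]
    allele = {"M": p_M, "m": 1 - p_M, "Q": p_Q, "q": 1 - p_Q}
    return {hap: (1 - c) * f + c * allele[hap[0]] * allele[hap[1]]
            for hap, f in h.items()}

# Start with only the parental-type gametes MQ and mq.
h = {"MQ": 0.5, "Mq": 0.0, "mQ": 0.0, "mq": 0.5}
c = 0.1
D0 = d_from_haplotypes(h)

trajectory = []  # (generation, D from the recursion, D from the closed form)
for t in range(1, 11):
    h = next_generation(h, c)
    trajectory.append((t, d_from_haplotypes(h), (1 - c) ** t * D0))

for t, d_sim, d_formula in trajectory:
    print(f"t={t:2d}  recursion D={d_sim:.6f}  (1-c)^t*D0={d_formula:.6f}")
```

The two columns agree each generation, which illustrates why LD between linked loci decays geometrically rather than vanishing after one round of mating.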

Linkage Disequilibrium (LD)
• LD occurs between two loci when there is a non-random association between alleles at those loci.
• When two loci are in LD, the alleles at one locus predict the alleles at the other locus.
• LD statistics measure the strength of this association.
• Significant LD: two loci are said to be in LD when their alleles are associated.
• Non-significant LD: two loci are in equilibrium (e.g., LD ≈ 0) when their alleles are NOT associated.

Linkage Disequilibrium (LD)
• Two forms of LD:
  • Physical linkage of two loci on the same chromosome
  • Non-random association of alleles between loci on different chromosomes (or very far apart on the same chromosome)
• Causes of LD:
  • Actual (physical) linkage
  • Mutation
  • Migration
  • Epistatic selection
  • Drift (small population size)
  • Non-random mating

Why does LD matter?
• MAS and association genetics require LD, because most markers are NOT located in the causal gene.
• The background level of LD determines what marker density to use.
• Marker effects (a, r², etc.) depend on LD between the QTL and the marker locus.
• The marker and QTL must be in LD for the marker to have an effect.
[Figure: marker M at the QTL Q (M = Q, c = 0, 0 cM) versus M linked to Q at some distance (cM > 0)]
Note: cM ≈ 100c for small c, where c is the recombination frequency between the M and Q loci.

Measuring LD: three common statistics
• D: dependent on allele frequencies
• D′: influenced by small sample size and rare alleles
• r²

LD Example

High or low LD???

Measuring LD: D
LD was originally measured as $D = f(MQ) - f(M)\,f(Q)$ (observed minus expected), or equivalently $D = f(MQ)\,f(mq) - f(Mq)\,f(mQ)$, where there are two loci, each with two alleles (M, m and Q, q), and f(MQ) is the frequency of the MQ haplotype. D depends strongly on allele frequencies, which makes comparisons between pairs of loci or between populations difficult. D can be positive or negative depending on whether MQ/mq or Mq/mQ were the parental-type gametes.
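The two definitions of D give the same number, and relabelling alleles (which swaps which gametes count as "parental") flips its sign. The sketch below checks both on a small set of haplotype counts; the counts are made up for illustration and the helper names are hypothetical.

```python
# Sketch: the two equivalent definitions of D from haplotype frequencies.
# Helper names and counts are hypothetical (illustration only).

def allele_freqs(h):
    """Return f(M) and f(Q) from haplotype frequencies."""
    return h["MQ"] + h["Mq"], h["MQ"] + h["mQ"]

def d_obs_minus_exp(h):
    """D = f(MQ) - f(M)f(Q): observed minus expected MQ frequency."""
    p_M, p_Q = allele_freqs(h)
    return h["MQ"] - p_M * p_Q

def d_cross_product(h):
    """D = f(MQ)f(mq) - f(Mq)f(mQ)."""
    return h["MQ"] * h["mq"] - h["Mq"] * h["mQ"]

# Hypothetical counts of 100 sampled gametes.
counts = {"MQ": 40, "Mq": 10, "mQ": 20, "mq": 30}
n = sum(counts.values())
h = {hap: k / n for hap, k in counts.items()}

# Relabel M <-> m: same data, opposite phase, so D changes sign.
h_relabelled = {"MQ": h["mQ"], "Mq": h["mq"], "mQ": h["MQ"], "mq": h["Mq"]}

print(d_obs_minus_exp(h), d_cross_product(h))  # both about 0.1
print(d_cross_product(h_relabelled))            # about -0.1
```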

Measuring LD: D′
Because D depends on allele frequencies, and its sign depends on which gametes were the parental types, a common standardization expresses D relative to its maximum possible value given the allele frequencies: $D' = D / D_{max}$, where $p_1 = f(M)$, $p_2 = f(m)$, $q_1 = f(Q)$, and $q_2 = f(q)$.
• When D is positive: $D_{max} = \min(p_1 q_2,\ p_2 q_1)$
• When D is negative: $D_{max} = \min(p_1 q_1,\ p_2 q_2)$
With this standardization D′ ranges from −1 to 1, so |D′| ranges between 0 and 1. D′ mainly reflects recombination history and is therefore used as an indicator of physical distance.
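A minimal sketch of the D′ standardization, using the $D_{max}$ rules above with $p_1 = f(M)$, $p_2 = f(m)$, $q_1 = f(Q)$, $q_2 = f(q)$. It also illustrates a known property: |D′| = 1 whenever one of the four haplotypes is absent (with both loci polymorphic), which is one reason small samples and rare alleles inflate D′. The function name and frequencies are hypothetical.

```python
# Sketch: D' = D / Dmax. Function name and frequencies are hypothetical.

def d_prime(h):
    p1 = h["MQ"] + h["Mq"]   # f(M)
    q1 = h["MQ"] + h["mQ"]   # f(Q)
    p2, q2 = 1 - p1, 1 - q1  # f(m), f(q)
    d = h["MQ"] * h["mq"] - h["Mq"] * h["mQ"]
    if d > 0:
        d_max = min(p1 * q2, p2 * q1)
    elif d < 0:
        d_max = min(p1 * q1, p2 * q2)
    else:
        return 0.0
    return d / d_max  # signed, between -1 and 1; |D'| is usually reported

h = {"MQ": 0.4, "Mq": 0.1, "mQ": 0.2, "mq": 0.3}
print(round(d_prime(h), 3))  # 0.5

# One haplotype (mQ) never observed: |D'| reaches 1.
h_missing = {"MQ": 0.5, "Mq": 0.1, "mQ": 0.0, "mq": 0.4}
print(round(d_prime(h_missing), 3))  # 1.0
```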

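Measuring LD: r²
The statistic r² corrects many of the issues associated with D by standardizing D by the allele frequencies: $r^2 = D^2 / (p_1 p_2 q_1 q_2)$, where $p_1, p_2$ are the frequencies of M and m and $q_1, q_2$ are the frequencies of Q and q.
• r² ranges from 0 (two loci in equilibrium) to 1 (loci in complete LD).
• r² is a squared correlation between the alleles at the two loci.
• r² reflects both recombination history and mutation history.
• r² can be calculated for loci with more than two alleles and for multiple loci.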

LD Visualization

Example 3: $D = f(MQ)\,f(mq) - f(Mq)\,f(mQ)$ calculated for five populations [Table: populations 1–5]

r2 between pairs of loci with different recombination frequencies (c) after successive generations of random mating
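When allele frequencies stay constant (no selection or drift), the denominator of r² does not change, so the decay of D implies $r^2_t = (1-c)^{2t} r^2_0$. The sketch tabulates this for a few arbitrary recombination frequencies, starting from r² = 1; the function name and the chosen c values are hypothetical.

```python
# Sketch: r^2 after t generations of random mating, assuming constant allele
# frequencies, so r2_t = (1 - c)**(2t) * r2_0. Name and values are hypothetical.

def r2_after(t, c, r2_0=1.0):
    return (1 - c) ** (2 * t) * r2_0

cs = [0.0, 0.01, 0.05, 0.1, 0.5]
generations = [0, 1, 2, 5, 10, 20]

table = {c: [r2_after(t, c) for t in generations] for c in cs}

print("c      " + "".join(f"t={t:<7d}" for t in generations))
for c, row in table.items():
    print(f"{c:<7.2f}" + "".join(f"{v:<9.4f}" for v in row))
```

Unlinked loci (c = 0.5) lose three quarters of their r² in a single generation, while tightly linked loci retain most of it for many generations.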

LD between unlinked loci
• Two loci on different chromosomes, or very far apart on the same chromosome, can appear to be associated (e.g., show significant LD, with r² > 0). This is due to:
  • Population structure
  • Selection

Example: one population with two subgroups [Figure: subgroup 1 and subgroup 2]
Population structure occurs when not all individuals are derived from the same random-mating population in Hardy–Weinberg equilibrium. If it is not accounted for, it causes spurious associations.
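Population structure alone can create LD between unlinked loci. Pooling two subgroups that are each in linkage equilibrium (D = 0) but differ in allele frequency gives, for a mixture with weights w and 1 − w, $D = w(1-w)(p_A - p_B)(q_A - q_B)$. The sketch checks this by building the pooled haplotype frequencies explicitly; the subgroup frequencies are invented for illustration and the helper names are hypothetical.

```python
# Sketch: LD created by pooling two subgroups, each in linkage equilibrium.
# Helper names and subgroup frequencies are hypothetical (illustration only).

def pooled_haplotypes(subgroups):
    """subgroups: list of (weight, f(M), f(Q)), each in linkage equilibrium."""
    pooled = {"MQ": 0.0, "Mq": 0.0, "mQ": 0.0, "mq": 0.0}
    for w, p, q in subgroups:
        pooled["MQ"] += w * p * q
        pooled["Mq"] += w * p * (1 - q)
        pooled["mQ"] += w * (1 - p) * q
        pooled["mq"] += w * (1 - p) * (1 - q)
    return pooled

def d_from_haplotypes(h):
    return h["MQ"] * h["mq"] - h["Mq"] * h["mQ"]

# Two 50:50 subgroups with different allele frequencies at two unlinked loci.
w = 0.5
sub1 = (w, 0.9, 0.8)
sub2 = (1 - w, 0.2, 0.1)

D_pooled = d_from_haplotypes(pooled_haplotypes([sub1, sub2]))
# Closed form for a two-group mixture: w(1 - w)(pA - pB)(qA - qB)
D_expected = w * (1 - w) * (0.9 - 0.2) * (0.8 - 0.1)
print(round(D_pooled, 4), round(D_expected, 4))  # 0.1225 0.1225
```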

Population structure and a transgene (Eathington et al., 2007)
• When population structure was ignored, 49 markers on 15 of 20 chromosomes had highly significant effects.
• When population structure was accounted for, the location of the transgene was correctly identified with one marker.

[Figure: principal components (PC) of elite panel marker data; 42.1% of parentage from “Truman” or Truman full-sibs]

Accounting for population structure

Linkage Disequilibrium Decay
• Varies according to:
  • Species: LD persists differently in, e.g., cattle and maize, depending on the recombination history of each species.
  • Type of germplasm: LD decays more slowly among elite inbred lines than among open-pollinated varieties (OPVs) or landraces, which have undergone larger numbers of meiotic events.
  • Mode of pollination: LD extends over longer map distances (larger numbers of base pairs) in self-pollinating crops than in cross-pollinating crops.
  • Gene: in maize, LD decayed rapidly within 500 base pairs in the d3 gene, while it did not follow the same pattern in su1.

Fst Statistic…

LD Decay in Biparental Linkage-Mapping Populations: LD decays much more slowly and in a predictable manner.
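In a biparental population, LD between two loci is set by their recombination frequency alone. As a sketch, assume inbred parents MQ/MQ and mq/mq, so both allele frequencies are 0.5; gametes from the F1 are parental with frequency (1 − c)/2 each and recombinant with c/2 each, giving r² = (1 − 2c)² for doubled haploids. For recombinant inbred lines (RILs) developed by selfing, the sketch substitutes the Haldane–Waddington cumulative recombination fraction R = 2c/(1 + 2c). The function names are hypothetical.

```python
# Sketch: expected r^2 between two loci in biparental populations from
# inbred parents MQ/MQ x mq/mq (allele frequencies 0.5 at both loci).
# Function names are hypothetical (illustration only).

def r2_from_recombinant_fraction(R):
    """r^2 when parental haplotypes each have frequency (1 - R)/2
    and recombinant haplotypes each have frequency R/2."""
    f_parental, f_recomb = (1 - R) / 2, R / 2
    d = f_parental ** 2 - f_recomb ** 2
    return d ** 2 / (0.5 * 0.5 * 0.5 * 0.5)

def r2_doubled_haploid(c):
    """One meiosis in the F1 (e.g., doubled haploids): R = c."""
    return r2_from_recombinant_fraction(c)

def r2_ril_selfing(c):
    """RILs by selfing: Haldane-Waddington R = 2c / (1 + 2c)."""
    return r2_from_recombinant_fraction(2 * c / (1 + 2 * c))

for c in [0.0, 0.05, 0.1, 0.2, 0.3, 0.5]:
    print(f"c={c:.2f}  DH r2={r2_doubled_haploid(c):.3f}  "
          f"RIL r2={r2_ril_selfing(c):.3f}")
```

RIL values are lower than doubled-haploid values at the same c because selfing adds further rounds of meiosis, but both are fixed functions of c, which is what makes decay in these populations predictable.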

LD Decay in an Association Mapping Population: LD decays more rapidly in a diverse population, which has had more opportunities for recombination; the decay pattern is also shaped by other forces such as population structure. Dense or sparse genotyping?

Extensive LD in barley of the upper Midwest: a small effective population size with limited diversity means LD decays much more slowly, extending out to about 8 cM at r² = 0.2.

LD decay in two wheat populations: r² = 0.2 at ~4–8 cM

LD decay in maize (Edward S. Buckler, Buckler lab)

Example: LD decay (r2 = 0.2) for wheat chromosomes based on SNP data
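Decay distances such as "r² = 0.2 at ~4–8 cM" are read off a plot of pairwise r² against distance. One simple way to do this, sketched here as an assumption rather than the method used in the studies above, is to average r² within distance bins and interpolate linearly where the binned mean first drops below the threshold; fitting a smooth decay curve is a common alternative. The input is a synthetic, noise-free curve r² = 0.6/(1 + 0.5d), not real data, chosen so the true crossing lies at 4 cM. Function names are hypothetical.

```python
from collections import defaultdict

# Sketch: distance at which binned mean r^2 falls below a threshold.
# Function names and the synthetic input are hypothetical (illustration only).

def binned_mean_r2(pairs, bin_width):
    """pairs: iterable of (distance_cM, r2). Returns [(bin_midpoint, mean_r2)]."""
    sums, counts = defaultdict(float), defaultdict(int)
    for dist, r2 in pairs:
        b = int(dist // bin_width)
        sums[b] += r2
        counts[b] += 1
    return [((b + 0.5) * bin_width, sums[b] / counts[b]) for b in sorted(sums)]

def decay_distance(pairs, threshold=0.2, bin_width=1.0):
    """Linearly interpolated distance where the binned mean first drops
    below the threshold; None if it never crosses."""
    curve = binned_mean_r2(pairs, bin_width)
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 >= threshold > r1:
            return d0 + (r0 - threshold) * (d1 - d0) / (r0 - r1)
    return None

# Synthetic, noise-free marker pairs every 0.5 cM (not real data).
pairs = [(d, 0.6 / (1 + 0.5 * d)) for d in (0.25 + 0.5 * i for i in range(20))]
print(round(decay_distance(pairs), 2))  # close to 4
```

Binning trades resolution for robustness to noisy pairwise estimates; with real marker data the bin width should reflect marker density.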

LD r² and QTL mapping r²
• LD: r² is a measure of association between alleles at two loci, a squared correlation of genotypic data.
• QTL: r² is the proportion of phenotypic variation modeled by a marker, i.e., the association between genotypic data from one locus and phenotypic data controlled by many loci. It can be viewed as a squared correlation between genotypic and phenotypic data.
• The LD r², however, sets an upper limit on how much of the genetic variation controlled by a QTL a marker can explain:
  • If r² = 1 between the M and Q loci, the marker can explain 100% of the genetic variation controlled by Q.
  • If r² = 0.5 between the M and Q loci, the marker can explain 50% of the genetic variation controlled by Q.
  • If r² = 0 between the M and Q loci, the marker can explain 0% of the genetic variation controlled by Q.
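The link between LD r² and the share of QTL variation a marker captures can be checked directly. The sketch assumes fully inbred lines (each line carries one haplotype), a purely additive effect a for allele Q, and both loci polymorphic. It computes the share of QTL genetic variance explained by the marker as Var(E[g | marker]) / Var(g) and compares it with $r^2 = D^2/(p_1 p_2 q_1 q_2)$. Function names and haplotype frequencies are hypothetical.

```python
# Sketch: fraction of QTL genetic variance captured by a linked marker,
# assuming inbred lines (one haplotype per line) and an additive QTL effect a.
# Function names and frequencies are hypothetical (illustration only).

def ld_r2(h):
    p1 = h["MQ"] + h["Mq"]
    q1 = h["MQ"] + h["mQ"]
    d = h["MQ"] * h["mq"] - h["Mq"] * h["mQ"]
    return d ** 2 / (p1 * (1 - p1) * q1 * (1 - q1))

def variance_explained_by_marker(h, a=1.0):
    """Var(E[g | marker]) / Var(g), where g = a for allele Q, 0 for q."""
    p_M = h["MQ"] + h["Mq"]
    p_Q = h["MQ"] + h["mQ"]
    var_g = a ** 2 * p_Q * (1 - p_Q)
    mean_g = a * p_Q
    mean_if_M = a * h["MQ"] / p_M        # mean genetic value of M lines
    mean_if_m = a * h["mQ"] / (1 - p_M)  # mean genetic value of m lines
    var_between = (p_M * (mean_if_M - mean_g) ** 2
                   + (1 - p_M) * (mean_if_m - mean_g) ** 2)
    return var_between / var_g

examples = {
    "complete LD": {"MQ": 0.5, "Mq": 0.0, "mQ": 0.0, "mq": 0.5},
    "partial LD": {"MQ": 0.4, "Mq": 0.1, "mQ": 0.2, "mq": 0.3},
    "equilibrium": {"MQ": 0.3, "Mq": 0.2, "mQ": 0.3, "mq": 0.2},
}
for name, h in examples.items():
    print(f"{name:12s} LD r2={ld_r2(h):.3f}  "
          f"variance explained={variance_explained_by_marker(h, a=2.0):.3f}")
```

In each case the two columns match: complete LD gives 1, equilibrium gives 0, and partial LD gives the same intermediate value, independent of the effect size a.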

Main Points
• LD between marker and QTL is needed for MAS, genomic selection, etc., to work.
• When two loci are in LD, the alleles at one locus predict the alleles at the other locus.
• LD arises from physical linkage of loci, but can also occur between unlinked loci.
• The LD r² between a marker and a QTL determines how much of the genetic variation controlled by that QTL the marker can model.

KDCompute Plugin: Linkage Disequilibrium
