Gene Mapping: Linkage and Association Analysis for Disease Identification

Disease gene mapping is one of the main purposes of genotyping. Two major approaches: linkage analysis and association analysis. Gene mapping: Linkage and association methods

Try to localize genes affecting specific phenotypes. Search for co-segregation of disease and marker alleles. Linkage analysis

Idea of linkage analysis; types of linkage analysis; parametric linkage analysis; conclusions. Basics of Linkage Analysis

One of the two main approaches in gene mapping. Uses pedigree data. Linkage Analysis

Two loci are linked if they lie close to each other on the same chromosome. The task of linkage analysis is to find markers that are linked to the hypothetical disease locus. Complex diseases in focus → usually need to search for one gene at a time. Requires mathematical modelling of meiosis. Genetic linkage and linkage analysis

The number of crossover sites is thought to follow a Poisson distribution. Their locations are generally random and independent of each other. Meiosis and crossover
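A minimal sketch of how the Poisson assumption is typically used. If crossovers fall as a Poisson process without interference, a recombinant gamete arises from an odd number of crossovers between two loci, which leads to Haldane's map function θ = 0.5 (1 − exp(−2d)) for a map distance d in Morgans. Haldane's function is not named on the slide; it is brought in here as a standard consequence of the stated assumption, and the function names are illustrative.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Recombination fraction for a map distance in Morgans (no interference assumed)."""
    if d_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(theta: float) -> float:
    """Inverse mapping: distance in Morgans for a recombination fraction in [0, 0.5)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


if __name__ == "__main__":
    # Short distances give theta close to d; long distances approach 0.5.
    for cm in (1, 10, 50, 200):
        print(f"{cm:>4} cM -> theta = {haldane_theta(cm / 100):.4f}")
```

For short distances θ is close to d itself, while for long distances θ approaches 0.5, which matches the bound on the recombination fraction.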

The simple idea (figure labels: DIS, Marker) • Recombination fraction θ. Always: 0 ≤ θ ≤ 0.5 • Task: find the θ that maximises L(θ | data) • Obtain a measure of the degree of evidence in favour of linkage (LOD score)
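A hedged sketch of the maximisation for the simplest possible case: phase-known, fully informative meioses in which k of n are recombinant between the disease locus and the marker. Under that assumption the likelihood is L(θ) = θ^k (1 − θ)^(n − k), and the LOD score is log10 of L(θ) / L(0.5). The counts in the example are hypothetical, and real pedigree data need the full pedigree likelihood.

```python
import math


def lod(theta: float, n: int, k: int) -> float:
    """LOD score at theta for k recombinants among n phase-known informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    if theta == 0.0 and k > 0:
        return float("-inf")  # a recombinant is impossible under complete linkage
    log_l = 0.0
    if k > 0:
        log_l += k * math.log10(theta)
    if n - k > 0:
        log_l += (n - k) * math.log10(1.0 - theta)
    return log_l - n * math.log10(0.5)  # subtract log10 L(0.5)


def max_lod(n: int, k: int):
    """Maximum-likelihood estimate of theta (restricted to [0, 0.5]) and its LOD."""
    theta_hat = min(k / n, 0.5)
    return theta_hat, lod(theta_hat, n, k)


if __name__ == "__main__":
    theta_hat, z = max_lod(n=10, k=1)  # hypothetical counts
    print(f"theta_hat = {theta_hat:.2f}, LOD = {z:.2f}")
```

In this simple model the estimate is the observed recombinant proportion k/n, truncated at 0.5.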

Polymorphic loci whose locations are known. Most often SNPs or microsatellites. Inherited within the chromosomes. [Figure: marker alleles of Father, Mother and Child, showing which parental alleles the child inherits.] Markers and inheritance

Two individuals share the same allele label → they share the allele IBS (identical by state). Two individuals share an allele with the same (grand)parental origin → they share the allele IBD (identical by descent). IBS sharing can easily be deduced from genotypes. IBD sharing requires more information. One can try to deduce IBD sharing based on family structure and inheritance. Markers and information

Markers and information. Parents: 1,2 and 2,3. Children: 1,2 and 1,3. The children share allele 1 IBS. They also share it IBD.

Markers and information. Parents: 1,2 and 1,3. Children: 1,2 and 1,3. The children share allele 1 IBS. They do not share alleles IBD.

Markers and information. Parents: 1,1 and 2,3. Children: 1,2 and 1,3. The children share allele 1 IBS. They either share or do not share it IBD.
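The three pedigree examples can be checked mechanically. The sketch below is an illustration only: it counts IBS sharing directly from two genotypes, and for a nuclear family it enumerates every parental transmission consistent with each child's genotype to list the possible numbers of alleles shared IBD. Genotypes are taken as unordered pairs, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) that two genotypes share identical by state."""
    return sum((Counter(g1) & Counter(g2)).values())


def transmissions(father, mother, child):
    """(paternal copy, maternal copy) index pairs that reproduce the child's genotype."""
    return [
        (i, j)
        for i, j in product(range(2), repeat=2)
        if sorted((father[i], mother[j])) == sorted(child)
    ]


def possible_ibd_counts(father, mother, child1, child2):
    """Set of IBD sharing counts compatible with the observed genotypes."""
    counts = set()
    for (i1, j1), (i2, j2) in product(
        transmissions(father, mother, child1),
        transmissions(father, mother, child2),
    ):
        # Siblings share a parental allele IBD when both received the same copy.
        counts.add(int(i1 == i2) + int(j1 == j2))
    return counts


if __name__ == "__main__":
    kids = ((1, 2), (1, 3))
    print(ibs_count(*kids))
    for parents in (((1, 2), (2, 3)), ((1, 2), (1, 3)), ((1, 1), (2, 3))):
        print(parents, possible_ibd_counts(*parents, *kids))
```

With a homozygous 1,1 parent the result contains both 0 and 1, which is the ambiguous case: the genotypes alone cannot tell which copy of allele 1 each child received.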

[Figure: example marker genotype data along chromosomes 1, 2, ..., 22.] Marker maps; pedigree structures; genotypes; phenotypes. Building blocks of linkage analysis

Information about the disease model (in parametric analysis): • Penetrance for genotype aa: probability of a homozygote being affected • Penetrance for genotype Aa: probability of a heterozygote being affected • Penetrance for genotype AA: probability of a non-carrier being affected (phenocopy rate) • Assumed disease allele frequency • Marker allele frequencies • Information about environmental variables. Building blocks of linkage analysis
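To make the disease-model parameters concrete, here is a small sketch that combines the three penetrances with a disease allele frequency. It assumes Hardy-Weinberg genotype proportions, which the slide does not state, and all numeric values are hypothetical. Following the slide's notation, a is the disease allele and AA is the non-carrier genotype.

```python
def genotype_priors(q: float) -> dict:
    """Genotype frequencies for disease allele frequency q (Hardy-Weinberg assumed)."""
    p = 1.0 - q
    return {"aa": q * q, "Aa": 2.0 * p * q, "AA": p * p}


def prevalence(q: float, penetrance: dict) -> float:
    """Population probability of being affected under the given disease model."""
    priors = genotype_priors(q)
    return sum(priors[g] * penetrance[g] for g in priors)


def genotype_given_affected(q: float, penetrance: dict) -> dict:
    """Bayes' rule: genotype probabilities for an affected individual."""
    priors = genotype_priors(q)
    k = prevalence(q, penetrance)
    return {g: priors[g] * penetrance[g] / k for g in priors}


if __name__ == "__main__":
    # Hypothetical dominant model with a small phenocopy rate.
    model = {"aa": 0.9, "Aa": 0.9, "AA": 0.001}
    print(f"prevalence = {prevalence(0.01, model):.5f}")
    print(genotype_given_affected(0.01, model))
```

A non-zero phenocopy rate means some affected individuals are non-carriers, which is one reason parametric results depend on how well the assumed model fits.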

Parametric vs. non-parametric; dichotomous vs. continuous phenotypes; Elston-Stewart vs. Lander-Green vs. heuristic; two-point vs. multipoint; genome scan vs. candidate gene. Types of linkage analysis

A common approach in statistical estimation: define hypotheses; generate the likelihood function; estimate; test hypotheses; draw statistical conclusions. Maximum likelihood estimation

H0:  = 0.5 the disease locus isnot linked to the marker(s) HA:  0.5 the disease locus is linked to the marker(s) Hypotheses in linkage analysis

Lj = gF P(gF) P(yF | gF)gM P(gM)P(yM | gM)gOi P(gOi | gF, gM) P(yOi | gO) The parameter  is incorporated here Likelihood function for a single nuclear family G = genotype probabilitiesy = phenotype probabilities

The likelihood functions of multiple independent families are combined: $L = \prod_j L_j$ or $\log L = \sum_j \log L_j$. Several independent families
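A sketch of how independent families combine: per-family log-likelihoods simply add, so per-family LOD curves can be summed at each θ and the maximum read off a grid. For brevity each family is reduced to a count of recombinants among phase-known informative meioses, which is a strong simplifying assumption, and the family data are hypothetical.

```python
import math


def family_loglik(theta: float, n: int, k: int) -> float:
    """log10 likelihood of one family: k recombinants in n meioses, 0 < theta <= 0.5."""
    return k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)


def combined_lod(theta: float, families) -> float:
    """Sum of per-family LOD scores at a common theta."""
    return sum(
        family_loglik(theta, n, k) - family_loglik(0.5, n, k) for n, k in families
    )


def scan(families, step: float = 0.01):
    """Grid search over theta in (0, 0.5]; returns (best theta, combined LOD)."""
    grid = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    return max(((t, combined_lod(t, families)) for t in grid), key=lambda x: x[1])


if __name__ == "__main__":
    families = [(8, 1), (6, 0), (10, 2)]  # hypothetical (n, k) per family
    theta_hat, z = scan(families)
    print(f"theta_hat = {theta_hat:.2f}, combined LOD = {z:.2f}")
```

In this hypothetical data set no single family reaches a LOD of 3 on its own, but the summed curve does, which is the practical reason for pooling families.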

Compute the values of the likelihood function under the null and alternative hypotheses. Their relationship is expressed by the LOD score, LOD(θ) = log10 [L(θ) / L(0.5)], which is essentially derived from the likelihood ratio test statistic. Testing of hypotheses

The significance level is the probability of rejecting a null hypothesis that is in fact true; the p-value is the probability, under the null hypothesis, of a result at least as extreme as the one observed. A LOD-score threshold of 3 corresponds to a single-test p-value of approximately 0.0001. Often the significant regions pointed out are quite large, 10-40 cM (on the order of tens of millions of base pairs). On significance levels
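The correspondence between a LOD of 3 and p ≈ 0.0001 can be reproduced with a short calculation. The sketch relies on the usual large-sample approximation, which is an assumption here: 2 ln(10) × LOD behaves like a χ2 variable with 1 degree of freedom, and the p-value is halved because the alternative θ < 0.5 is one-sided.

```python
import math


def lod_to_chi2(lod: float) -> float:
    """Likelihood-ratio statistic corresponding to a LOD score."""
    return 2.0 * math.log(10.0) * lod


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lod_to_pvalue(lod: float) -> float:
    """Approximate single-test, one-sided p-value for a LOD score."""
    if lod <= 0.0:
        return 1.0
    return 0.5 * chi2_sf_1df(lod_to_chi2(lod))


if __name__ == "__main__":
    for z in (1.0, 2.0, 3.0):
        print(f"LOD {z:.0f}: chi2 = {lod_to_chi2(z):.2f}, p ~ {lod_to_pvalue(z):.1e}")
```

This is a pointwise figure for one test; a genome scan involves many correlated tests, so the genome-wide error rate at the same threshold is higher.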

[Figure: LOD score plotted against recombination fraction (0.0 to 0.5).] LOD > 3 is taken as evidence of linkage.

Linkage analysis is a pedigree-based approach to gene mapping. Parametric vs. nonparametric methods. Hypothesis-driven vs. explorative analysis. Meta-analysis (integration of several studies into “one big study”) is becoming increasingly popular. Conclusions

After a successful linkage analysis, what to do? How to refine the linked area: where is the disease susceptibility locus actually located? Outline of the rest of the lecture: allelic association; χ2-test; LD mapping. Fine mapping and association analysis

An example: a leukaemia study in which a number of affected persons and healthy controls have been contacted for DNA samples. A candidate gene has been suggested: GSTM1, which functions in the metabolism of benzene. GSTM1 has two alleles, 1 and 2. A person is “positive” for allele 1 if the genotype is 1 1 or 1 2, and “null” if the genotype is 2 2. The numbers of leukaemic and control individuals who are positive or null with respect to allele 1 are compared by a χ2-test to find out whether there is a statistically significant difference. Allelic association

Results: observed frequencies. Expected frequencies. Allelic association

The observed frequencies are compared to the expected frequencies (null hypothesis H0: carrier status and disease occurrence are independent of each other). Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, where $o_i$ is the observed frequency for class i, $e_i$ the expected frequency for class i, and k the number of classes. Test statistic

Now, χ2 = 111.39. Degrees of freedom for the test: df = (r-1)(s-1), where r = number of rows and s = number of columns. Here, df = (2-1)*(2-1) = 1. The χ2 value is then compared to the critical values of the χ2 null distribution (for the given df). Allelic association

χ2-distribution: critical values for chosen significance levels

df\p    0.10    0.05   0.025    0.01   0.005
1       2.71    3.84    5.02    6.63    7.88
2       4.61    5.99    7.38    9.21   10.60
3       6.25    7.81    9.35   11.34   12.84
4       7.78    9.49   11.14   13.28   14.86
5       9.24   11.07   12.83   15.09   16.75
6      10.64   12.59   14.45   16.81   18.55
7      12.02   14.07   16.01   18.48   20.28
8      13.36   15.51   17.53   20.09   21.96
9      14.68   16.92   19.02   21.67   23.59
10     15.99   18.31   20.48   23.21   25.19
11     17.28   19.68   21.92   24.73   26.76

When the observed value of the test statistic is greater than the critical value (for the chosen significance level) given in the table, the null hypothesis can be rejected.
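A minimal sketch of the whole test for a contingency table: expected frequencies from the row and column totals, the χ2 statistic, the degrees of freedom, and the comparison with the critical value 3.84 (df = 1, p = 0.05). The counts are hypothetical and are not the study's data; rows are leukaemia and control, columns are positive and null.

```python
import math


def chi_square_independence(table):
    """Return (statistic, df, expected) for an r x s table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, expected count = row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def p_value_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    observed = [[30, 70], [60, 40]]  # hypothetical counts
    stat, df, expected = chi_square_independence(observed)
    print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value_1df(stat):.1e}")
    print("reject H0 at 0.05:", stat > 3.84)
```

The closed-form p-value used here is valid only for df = 1; larger tables need the general χ2 distribution or a table lookup.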

The value we obtained, χ2 = 111.39, exceeds all critical values for df = 1 given in the table. We conclude that H0 can be rejected, and thus there is a statistically significant difference between the affected and the healthy with respect to GSTM1 genotypes. The relative frequencies of 'null' and 'positive' genotypes show the same. It seems that different GSTM1 genotypes, by changing benzene metabolism, considerably affect the probability of getting leukaemia. Allelic association

Note: compared to linkage analysis, which is based on observed inheritance patterns in pedigrees, association analysis studies the correlation between allele presence and disease at the population level. We find an allele or a haplotype overrepresented in affected individuals → BUT the statistical correlation does not imply a causal relationship! → Quite often, the associated allele or haplotype is not the cause of the disease itself, but is merely correlated with the presence of the actual susceptibility gene on the same chromosome. → It is then said to be in linkage disequilibrium with the disease gene.

[Figure: the original mutation arises on one chromosome in the founder population; over time it is passed down to the current generation (chromosomes labelled A, B and C), which includes an affected pedigree.]

The marker itself is NOT the cause of the disease, but it is located near the disease susceptibility gene, and there is a correlation between the presence of a certain marker allele and the disease gene allele (LD). The correlation, i.e. LD, is based on the founder effect: the disease allele arose a long time ago on a certain ancestral chromosome, and the majority of disease alleles existing today descend from that original mutation. LD mapping
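As an illustration of what "correlation between marker allele and disease allele" means numerically, the sketch below computes two common LD measures from haplotype counts: D = p(AB) − p(A) p(B) and r² = D² / (p(A)(1 − p(A)) p(B)(1 − p(B))). It also applies the standard expectation that recombination reduces D by a factor (1 − θ) per generation. The slide names neither measure, so these are brought in as conventional choices, and the counts are hypothetical.

```python
def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


def ld_after(d0: float, theta: float, generations: int) -> float:
    """Expected D after a number of generations of recombination at rate theta."""
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    d, r2 = ld_measures(40, 10, 10, 40)  # hypothetical haplotype counts
    print(f"D = {d:.3f}, r^2 = {r2:.2f}")
    for t in (10, 100, 1000):
        print(f"after {t} generations at theta = 0.01: D = {ld_after(d, 0.01, t):.4f}")
```

The decay formula describes the expectation only; in a real population drift and the other forces move LD around that value.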

LD-mapping: Utilizing the founder effect

Data. [Table: one row per chromosome, with columns Disease locus, Disease status, SNP1, S2, ...; marker alleles are coded 1 and 2, and unknown values, including the unobserved disease locus, are shown as "?".]

“Old-fashioned” allelic association with some simple test (problem: multiple testing); TDT; modelling of the LD process: Bayesian methods, EM algorithm, integrated linkage & LD. Many approaches, several programs
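Two of the items above can be sketched briefly. The TDT (transmission disequilibrium test) compares how often heterozygous parents transmit allele 1 versus allele 2 to an affected child, using the statistic (b − c)² / (b + c) with 1 degree of freedom. For the multiple-testing problem, the simplest correction is Bonferroni, which is one option among several and is not named on the slide. All counts below are hypothetical.

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """TDT chi-square: b = transmissions of allele 1, c = transmissions of allele 2,
    counted over heterozygous parents of affected children."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)


def p_value_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    stat = tdt_statistic(b=60, c=40)  # hypothetical transmission counts
    p = p_value_1df(stat)
    print(f"TDT chi2 = {stat:.2f}, p = {p:.4f}")
    threshold = bonferroni_threshold(0.05, n_tests=1000)
    print(f"significant after testing 1000 markers: {p < threshold}")
```

A result that is nominally significant for one marker can fail the corrected threshold once many markers are tested, which is the problem the slide points to.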

The amount of LD is under continuous but slow change, affected by natural forces: genetic drift; population structure; natural selection; new mutations; founder effect; ... Even if two pairs of loci are exactly the same distance apart, their amounts of LD may vary a lot. → This limits the accuracy of LD mapping, though it is still much more accurate than linkage analysis in pinpointing the location of a disease gene. Limitations: LD is a random process
