QTL mapping in mice
Lecture 10, Statistics 246
February 24, 2004
The mouse as a model
• Same genes?
  • The genes involved in a phenotype in the mouse may also be involved in similar phenotypes in the human.
• Similar complexity?
  • The complexity of the etiology underlying a mouse phenotype provides some indication of the complexity of similar human phenotypes.
• Transfer of statistical methods.
  • The statistical methods developed for gene mapping in the mouse serve as a basis for similar methods applicable in direct human studies.
Quantitative traits (phenotypes)
• 133 females from our earlier (NOD × B6) × (NOD × B6) cross
• Trait 4 is the log count of a particular white blood cell type.
Another representation of a trait distribution Note the equivalent of dominance in our trait distributions.
A second example Note the approximate additivity in our trait distributions here.
Trait distributions: a classical view In general we seek a difference in the phenotype distributions of the parental strains before we think seeking genes associated with a trait is worthwhile. But even if there is little difference, there may be many such genes. Our trait 4 is a case like this.
Data and goals
Data
• Phenotypes: $y_i$ = trait value for mouse i
• Genotypes: $x_{ij}$ = 1/0 if mouse i is A/H at marker j (backcross); two dummy variables are needed for an intercross
• Genetic map: locations of markers
Goals
• Identify the (or at least one) genomic region, called a quantitative trait locus (QTL), that contributes to variation in the trait
• Form confidence intervals for the QTL location
• Estimate QTL effects
Models: Recombination
• We assume no chromatid or crossover interference.
• Points of exchange (crossovers) along chromosomes are distributed as a Poisson process with rate 1 in genetic distance.
• The marker genotypes {$x_{ij}$} form a Markov chain along the chromosome for a backcross; what do they form in an F2 intercross?
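The backcross part of this model can be sketched in a few lines. With no interference, the recombination fraction between two loci at map distance d Morgans is $r = (1 - e^{-2d})/2$ (the Haldane map function, which is not named on the slide but follows from the Poisson assumption). The function names and the marker map below are hypothetical, chosen only for illustration.

```python
import math
import random

def haldane_r(d_morgans):
    """Recombination fraction at map distance d (Morgans), assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def simulate_backcross_chromosome(marker_pos_cM, rng):
    """Simulate 0/1 genotypes at ordered markers for one backcross mouse.

    The first marker is 0 or 1 with probability 1/2. Each later marker
    differs from the previous one with probability r, which depends only
    on the distance between the two markers (a Markov chain).
    """
    geno = [rng.randint(0, 1)]
    for prev, cur in zip(marker_pos_cM, marker_pos_cM[1:]):
        r = haldane_r((cur - prev) / 100.0)  # cM -> Morgans
        flip = rng.random() < r
        geno.append(geno[-1] ^ int(flip))
    return geno

rng = random.Random(1)
positions = [0.0, 10.0, 20.0, 40.0]  # hypothetical marker map, in cM
mice = [simulate_backcross_chromosome(positions, rng) for _ in range(5)]
```

The F2 intercross question on the slide is deliberately left open here; this sketch covers the backcross case only.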
Models: Genotype → Phenotype
• Let y = phenotype, g = whole-genome genotype.
• Imagine a small number of QTL with genotypes $g_1, \dots, g_p$ ($2^p$ or $3^p$ distinct genotypes for BC and IC, respectively).
• We assume $E(y \mid g) = \mu(g_1, \dots, g_p)$ and $\mathrm{var}(y \mid g) = \sigma^2(g_1, \dots, g_p)$.
Models: Genotype → Phenotype, ctd
• Homoscedasticity (constant variance): $\sigma^2(g_1, \dots, g_p) = \sigma^2$ (constant)
• Normality of residual variation: $y \mid g \sim N(\mu_g, \sigma^2)$
• Additivity: $\mu(g_1, \dots, g_p) = \mu + \sum_j \Delta_j g_j$ ($g_j = 0/1$ for BC)
• Epistasis: any deviations from additivity.
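To make the additivity and epistasis definitions concrete, here is a minimal sketch for two QTL in a backcross. The baseline, effect sizes, and function names are made up for illustration. For two loci, the contrast $\mu(1,1) - \mu(1,0) - \mu(0,1) + \mu(0,0)$ is zero exactly when the mean function is additive.

```python
import random

def additive_mean(g, mu, effects):
    """Mean phenotype under additivity: mu + sum_j effect_j * g_j, with g_j in {0, 1}."""
    return mu + sum(e * gj for e, gj in zip(effects, g))

def simulate_phenotype(g, mu, effects, sigma, rng):
    """Draw y | g ~ N(mean(g), sigma^2): constant variance, normal residuals."""
    return rng.gauss(additive_mean(g, mu, effects), sigma)

def interaction_contrast(mean_fn):
    """Two-locus contrast; zero under additivity, nonzero under epistasis."""
    return mean_fn((1, 1)) - mean_fn((1, 0)) - mean_fn((0, 1)) + mean_fn((0, 0))

# Hypothetical parameter values, for illustration only.
def add_fn(g):
    return additive_mean(g, 10.0, [1.5, -0.5])

def epi_fn(g):
    # The second term is an interaction: it only appears when both loci are 1.
    return 10.0 + 2.0 * g[0] * g[1]

rng = random.Random(0)
y = simulate_phenotype((1, 0), 10.0, [1.5, -0.5], 1.0, rng)
```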
The simplest method: ANOVA
• Split mice into groups according to genotype at a marker
• Do a t-test/ANOVA
• Repeat for each marker
• Adjust for multiplicity
LOD score = log10 likelihood ratio, comparing the single-QTL model to the “no QTL anywhere” model.
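A minimal sketch of the single-marker LOD score follows. It assumes complete genotype data and normal residuals with a common variance. Under those assumptions the log10 likelihood ratio reduces to $(n/2)\log_{10}(\mathrm{RSS}_0/\mathrm{RSS}_1)$, where $\mathrm{RSS}_0$ is the residual sum of squares around the overall mean and $\mathrm{RSS}_1$ is the residual sum of squares around the genotype-group means. The function name is hypothetical.

```python
import math

def lod_single_marker(y, x):
    """LOD at one marker: separate genotype-group means vs. one common mean.

    y: phenotypes; x: genotype codes at the marker (same length, no missing values).
    """
    n = len(y)
    grand = sum(y) / n
    rss0 = sum((v - grand) ** 2 for v in y)
    rss1 = 0.0
    for g in set(x):
        grp = [v for v, gi in zip(y, x) if gi == g]
        m = sum(grp) / len(grp)
        rss1 += sum((v - m) ** 2 for v in grp)
    return (n / 2.0) * math.log10(rss0 / rss1)

# Toy data: four mice, one marker.
lod = lod_single_marker([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

Repeating this for each marker gives a LOD profile at the marker positions only. Mice with a missing genotype at a marker would have to be dropped for that marker, which is one of the limitations interval mapping addresses.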
Exercise
1. Explain what happens when one compares trait values of individuals with the A and H genotypes in a backcross (a standard 2-sample comparison), when a QTL contributing to the trait is located at a map distance d (and recombination fraction r) away from the marker.
2. Can the location of a QTL as in 1 be estimated, along with the magnitude of the difference of the means for the two genotypes at the QTL? Explain fully.
Interval mapping (IM)
• Lander & Botstein (1989)
• Takes account of missing genotype data (uses the HMM)
• Interpolates between markers
• Maximum likelihood under a mixture model
Interval mapping, cont.
• Imagine that there is a single QTL, at position z between two (flanking) markers.
• Let $q_i$ = genotype of mouse i at the QTL, and assume $y_i \mid q_i \sim \mathrm{Normal}(\mu_{q_i}, \sigma^2)$.
• We won’t know $q_i$, but we can calculate $p_{ig} = \Pr(q_i = g \mid \text{marker data})$.
• Then $y_i$, given the marker data, follows a mixture of normal distributions, with known mixing proportions (the $p_{ig}$).
• Use an EM algorithm to get MLEs of $\theta = (\mu_A, \mu_H, \mu_B, \sigma)$.
• Measure the evidence for a QTL via the LOD score, which is the log10 likelihood ratio comparing the hypothesis of a single QTL at position z to the hypothesis of no QTL anywhere.
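The EM step can be sketched as follows, taking the mixing proportions p_ig as given input (computing them from the flanking markers is left to the exercises). This is one straightforward way to fit a normal mixture with known mixing proportions and a common variance, not necessarily the implementation used in the lecture. The function names and the fixed iteration count are assumptions.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def em_interval_mapping(y, p, n_iter=200):
    """EM at one putative QTL position.

    y: phenotypes; p[i][g] = Pr(q_i = g | marker data), assumed precomputed.
    Returns (genotype means, sigma, LOD vs. the no-QTL model).
    """
    n, G = len(y), len(p[0])
    mean0 = sum(y) / n
    sigma0 = math.sqrt(sum((v - mean0) ** 2 for v in y) / n)  # null-model MLEs
    # Start from prior-weighted group means and the null-model sigma.
    mus = [sum(p[i][g] * y[i] for i in range(n)) / sum(p[i][g] for i in range(n))
           for g in range(G)]
    sigma = sigma0
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities given y_i and the markers.
        w = []
        for i in range(n):
            num = [p[i][g] * normal_pdf(y[i], mus[g], sigma) for g in range(G)]
            tot = sum(num)
            w.append([v / tot for v in num])
        # M-step: weighted means and pooled standard deviation.
        mus = [sum(w[i][g] * y[i] for i in range(n)) / sum(w[i][g] for i in range(n))
               for g in range(G)]
        sigma = math.sqrt(sum(w[i][g] * (y[i] - mus[g]) ** 2
                              for i in range(n) for g in range(G)) / n)
    ll1 = sum(math.log10(sum(p[i][g] * normal_pdf(y[i], mus[g], sigma) for g in range(G)))
              for i in range(n))
    ll0 = sum(math.log10(normal_pdf(v, mean0, sigma0)) for v in y)
    return mus, sigma, ll1 - ll0

# Toy check: when the genotype at z is known exactly, IM reduces to marker ANOVA.
mus, sigma, lod = em_interval_mapping([1.0, 2.0, 3.0, 4.0],
                                      [[1, 0], [1, 0], [0, 1], [0, 1]])
```

When every p_ig is 0 or 1, as at a fully typed marker, the fit coincides with the t-test/ANOVA comparison at that marker. When the markers carry no information (all p_ig equal), the LOD is zero.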
Exercises
1. Suppose that two markers $M_l$ and $M_r$ are separated by map distance d, and that the locus z is a distance $d_l$ from $M_l$ and $d_r$ from $M_r$.
   a) Derive the relationship between the three recombination fractions connecting $M_l$, $M_r$ and z corresponding to $d_l + d_r = d$.
   b) Calculate the (conditional) probabilities $p_{ig}$ defined on the previous page for a BC (two g, four combinations of flanking genotypes), and an F2 (three g, nine combinations of flanking genotypes).
2. Outline the mixture model appropriate for the BC distribution of a QT governed by a single QTL at the locus z as in 1 above.
LOD thresholds
• To account for the genome-wide search, compare the observed LOD scores to the distribution of the maximum LOD score, genome-wide, that would be obtained if there were no QTL anywhere.
• LOD threshold = 95th percentile of the distribution of the genome-wide maximum LOD when there are no QTL anywhere.
• Derivations:
  • Analytical calculations (Lander & Botstein, 1989)
  • Simulations
  • Permutation tests (Churchill & Doerge, 1994)
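The permutation approach can be sketched as follows: shuffle the phenotypes relative to the genotypes, recompute the genome-wide maximum LOD, repeat, and take the 95th percentile. For brevity this sketch uses single-marker LOD scores instead of interval mapping. The data are simulated with independent markers, and the function names, sample sizes, and number of permutations are all assumptions made for illustration.

```python
import math
import random

def marker_lod(y, x):
    """Single-marker LOD: (n/2) * log10(RSS0 / RSS1), assuming complete data."""
    n = len(y)
    grand = sum(y) / n
    rss0 = sum((v - grand) ** 2 for v in y)
    rss1 = 0.0
    for g in set(x):
        grp = [v for v, gi in zip(y, x) if gi == g]
        m = sum(grp) / len(grp)
        rss1 += sum((v - m) ** 2 for v in grp)
    return (n / 2.0) * math.log10(rss0 / rss1)

def permutation_threshold(y, genotypes, n_perm=1000, level=0.95, seed=0):
    """Estimate the genome-wide LOD threshold by permuting phenotypes.

    genotypes[j] is the list of genotype codes at marker j for all mice.
    Shuffling y breaks any genotype-phenotype association while keeping
    the correlation structure among the markers intact.
    """
    rng = random.Random(seed)
    y_perm = list(y)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        max_lods.append(max(marker_lod(y_perm, x) for x in genotypes))
    max_lods.sort()
    k = min(n_perm - 1, int(math.ceil(level * n_perm)) - 1)
    return max_lods[k]

# Hypothetical data: 40 backcross mice, 5 unlinked markers, no QTL.
rng = random.Random(42)
n_mice, n_markers = 40, 5
genotypes = [[rng.randint(0, 1) for _ in range(n_mice)] for _ in range(n_markers)]
y = [rng.gauss(0.0, 1.0) for _ in range(n_mice)]
thr = permutation_threshold(y, genotypes, n_perm=200)
```

In practice more permutations give a more stable estimate of the 95th percentile, and the maximum would be taken over all positions scanned, not only the markers.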
Acknowledgement Karl Broman, Johns Hopkins