Estimating in vivo Hypermutation Rates and Lethal Frequencies
Steven H. Kleinstein, Department of Computer Science, Princeton University
Collaborators: Yoram Louzoun (Bar-Ilan University), Mark Shlomchik (Yale University)
B Cells Undergo Somatic Hypermutation
• Introduce (point) mutations into DNA coding for antibody receptor
• What’s hard about estimating the mutation rate? Number of divisions is unknown!
• [Figure: Number of Mutations vs. Number of Cell Divisions, with High Mutation Rate and Low Mutation Rate lines and the Observed Number of Mutations marked]
• Most recognized in vivo estimates took educated guesses (McKean et al., 1984; Sablitzky et al., 1985)
Motivating Experiment
• In auto-immune mouse model, observed hypermutating B cells in extra-follicular areas of spleen (not germinal centers) (William et al., Science, 2002)
• Generate immune response…
  • Auto-immune model (MRL/lpr AM14 heavy chain transgenic) (William, Euler, Christensen, and Shlomchik, Science, 2002)
  • Inject antigen (NP) to provoke immune response (Jacob et al., 1991; Jacob and Kelsoe, 1992; Jacob et al., 1993; Radmacher et al., 1998)
• Harvest spleen, cross-section and stain
• Extra-follicular areas (auto-immune) and germinal centers (NP)
• [Figure: stained spleen section; labels: B cells, T cells, Dividing B cells, FDC]
• Microdissect groups of proliferating B cells
• Pick size <50 cells (auto-immune), up to 100 cells (NP)
• Use PCR to sequence DNA coding for B cell receptor
• How can we estimate the mutation rate from these data?
Clonal Trees Provide Needed Information
• Analyze pattern of shared and unique mutations among sequences from each microdissection
• Aligned sequences (- marks a base identical to germline):
    Germline  GGGATTCTC
    1         -C-----G-
    2         -------G-
    3         A------GA
    4         A---C--GA
• [Figure: clonal tree rooted at Germline with sequences 1 to 4 as nodes; edge labels 8(T→G), 2(G→C), 1(G→A), 9(C→A), 5(T→C)]
• Clonal tree ‘shapes’ reflect underlying dynamics
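The shared and unique mutations can be read off the alignment mechanically. A minimal Python sketch, assuming that `-` marks a base identical to germline and that positions are numbered from 1 (the helper name `mutations` is illustrative, not from the talk):

```python
GERMLINE = "GGGATTCTC"
SEQUENCES = {1: "-C-----G-", 2: "-------G-", 3: "A------GA", 4: "A---C--GA"}

def mutations(germline, aligned):
    """Map 1-based position -> (germline base, observed base) for every mismatch."""
    return {
        pos: (g, b)
        for pos, (g, b) in enumerate(zip(germline, aligned), start=1)
        if b != "-" and b != g
    }

# Mutations carried by each sequence relative to germline.
muts = {name: mutations(GERMLINE, seq) for name, seq in SEQUENCES.items()}

# Mutations present in every sequence of the microdissection.
shared_by_all = set.intersection(*(set(m.items()) for m in muts.values()))
```

Here `shared_by_all` contains only the T→G change at position 8, while the T→C change at position 5 is unique to sequence 4.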
Relating Tree Shapes to Underlying Dynamics
• Investigate with computer simulation of B cell clonal expansion
• Parameters: mutation rate (m), lethal frequency (l), # divisions (d), pick size (p)
• Compare: rate of 0.2 division⁻¹ for 14 divisions vs. rate of 0.4 division⁻¹ for 7 divisions
• [Figure: two simulated clonal trees (simulation data, 5 sequences / tree), each rooted at an Initial Sequence]
• Relevant shape measures can differentiate similar clones
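A rough idea of such a simulation, sketched in Python. This is not the authors' code. It assumes synchronous divisions, a Poisson number of new mutations per daughter cell with mean m, each new mutation being lethal independently with probability l, every mutation hitting a new site, and p interpreted as the number of cells sampled at the end. All function names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold, count, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_clone(m, l, d, p, rng):
    """Return the mutation sets of up to p cells sampled after d divisions."""
    cells = [frozenset()]  # founder cell carries no mutations
    next_id = 0            # every new mutation gets a unique integer label
    for _division in range(d):
        daughters = []
        for cell in cells:
            for _daughter in range(2):
                n_new = poisson(rng, m)
                if any(rng.random() < l for _k in range(n_new)):
                    continue  # at least one lethal mutation: daughter is lost
                daughters.append(cell | frozenset(range(next_id, next_id + n_new)))
                next_id += n_new
        cells = daughters
    return rng.sample(cells, min(p, len(cells)))

def mean_mutations(m, l, d, p, n_clones, seed=0):
    """Average number of mutations per sampled sequence over n_clones clones."""
    rng = random.Random(seed)
    counts = [len(seq) for _clone in range(n_clones)
              for seq in simulate_clone(m, l, d, p, rng)]
    return sum(counts) / len(counts) if counts else 0.0
```

Under these assumptions and with l = 0, both settings on the slide have the same expected count of m × d = 2.8 mutations per sequence (0.2 × 14 = 0.4 × 7), so the mean number of mutations alone cannot separate them.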
Intermediate Vertices Are a Useful Measure
• Compare: rate of 0.2 division⁻¹ for 14 divisions vs. rate of 0.4 division⁻¹ for 7 divisions (simulation data)
• Shape measures can supplement information from mutation counting
Method for Estimating Mutation Rate (m)
• Find mutation rate that produces distribution of tree ‘shapes’ most equivalent to observed set of trees
• Experimental observations give a set of observed tree shapes; simulation of B cell expansion at a given mutation rate gives a distribution of tree shapes; the two are compared
• Tree shape measures: number of mutations, intermediate vertices, sequences at root
• Assumes equivalent mutation rate in all trees, although number of divisions may differ
• Also developed analytical method based on same underlying idea (The Journal of Immunology (2003) Vol. 171 No. 9, 4639-4649)
Details of the Simulation Method
• Estimate lethal frequency (l), then for each mutation rate (m)…
• Equivalent Matrix, E(t,d): # simulated trees ‘equivalent’ (same tree shapes?) to observed tree t after d divisions
• Run simulation many times to fill in equivalent matrix
• Compute likelihood of each experimentally observed tree t (sample space is subset of all simulation runs) and likelihood of experimental dataset
• Use Golden Section Search to optimize mutation rate (m)
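For readers unfamiliar with Golden Section Search, a generic Python sketch follows. It minimizes a one-dimensional function on a bracket, so a likelihood would be passed in negated. The quadratic `toy_objective` is only a stand-in, since the slide text does not preserve the actual likelihood formula, and the function names are illustrative.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-5):
    """Shrink the bracket [lo, hi] around a minimum of a unimodal function f."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

def toy_objective(m):
    """Stand-in for a negative log-likelihood with its minimum at m = 0.3."""
    return (m - 0.3) ** 2

best_m = golden_section_minimize(toy_objective, 0.0, 1.0)
```

Each iteration reuses one of the two interior evaluations, so only one new objective evaluation is needed per step, which matters when every evaluation requires many simulation runs.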
Estimating the Lethal Frequency (l)
• Only replacement mutations can be lethal, so look at fraction of all mutations that are replacements: R / (R + S)
• Choose l so expected R/(R+S) equals observed average over all mutations
• Expected R/(R+S) can be calculated from germline DNA sequence
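A back-of-the-envelope version of this matching step, in Python. It assumes that l is the fraction of all mutations that are lethal, that every lethal mutation is a replacement, and that lethal mutations are never observed. The symbol `f_r` (fraction of new mutations expected to be replacements, computed from the germline sequence) and the numbers used are hypothetical. Under these assumptions the surviving replacement fraction is (f_r - l) / (1 - l), which can be solved for l.

```python
def expected_replacement_fraction(f_r, l):
    """Expected R/(R+S) among surviving mutations, given lethal frequency l."""
    return (f_r - l) / (1 - l)

def estimate_lethal_frequency(f_r, observed):
    """Solve (f_r - l) / (1 - l) = observed for l."""
    return (f_r - observed) / (1 - observed)

# Hypothetical inputs: 75% of new mutations are replacements, 50% of observed ones are.
l_hat = estimate_lethal_frequency(f_r=0.75, observed=0.5)
```

If l were instead defined as the fraction of replacement mutations that are lethal, the expression would change, so the definition should be checked against the paper before reusing this.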
Validating the Simulation Method
• Use simulation to construct synthetic data sets with limited number of trees/sequences reflecting currently available experimental data
• Method precision: SD = 0.035 division⁻¹
• Method works even with limited number of clonal trees and sequences
Testing Method Assumption…
• All cells in single microdissection divided same number of times (i.e., division is synchronous)
• Generation of synthetic data: synchronous division vs. asynchronous division
• Assumption does not significantly impact rate estimate
Mutation Rate in Autoimmune Response
• Experimental data set: 31 trees from 7 mice, 6 sequences / tree from extra-follicular areas (William et al., Science, 2002)
• Estimated l of 0.55 based on R/(R+S)
• Estimated mutation rate is 1.0 ± 0.1 × 10⁻³ base-pair⁻¹ division⁻¹
Mutation Rate in Primary NP Response
• Experimental data set: 23 trees, 7 sequences / tree from germinal centers (Jacob et al., 1991; Jacob and Kelsoe, 1992; Jacob et al., 1993; Radmacher et al., 1998)
• May not conform to all model assumptions:
  • Data based on larger picks
  • Positive selection may be factor
• Estimated mutation rate is 1.1 ± 0.1 × 10⁻³ base-pair⁻¹ division⁻¹
Summary
• Developed simulation and analytical methods to estimate in vivo mutation rates (and lethal frequencies)
• First rigorous method for in vivo estimates
• Synthetic datasets used to show that…
  • Methods are precise (± 0.1 × 10⁻³ base-pair⁻¹ division⁻¹)
  • Assumption of synchronous division does not impact results
• Extra-follicular B cells in autoimmune mouse hypermutate
• Mutation rate (0.9 ± 0.1 × 10⁻³) similar to NP response (1.1 ± 0.1 × 10⁻³)
• Future improvements in precision with additional data
• Rigorous method to compare mutation rates under varying experimental conditions
Acknowledgements
• Yoram Louzoun (Bar-Ilan University)
• Mark Shlomchik (Yale University)
• PICASso: Program in Integrative Information, Computer and Application Sciences
• For more information: stevenk@cs.princeton.edu; www.cs.princeton.edu/~stevenk
• Kleinstein, Louzoun and Shlomchik, The Journal of Immunology (2003) Vol. 171 No. 9, 4639-4649.
Mutation Rate in Primary NP Response
• Experimental data set: 23 trees, 7 sequences / tree from germinal centers (Jacob et al., 1991; Jacob and Kelsoe, 1992; Jacob et al., 1993; Radmacher et al., 1998)
• Estimated l of 0.63 based on R/(R+S)
• May not conform to all model assumptions:
  • Data based on larger picks
  • Positive selection may be factor
• Estimated mutation rate is 1.1 ± 0.1 × 10⁻³ base-pair⁻¹ division⁻¹
B Cells Undergo Somatic Hypermutation
• Experimental data show mutated bases compared with germline
• [Figure: PCR results aligned against germline, with mutations marked]
• What can we learn about the underlying dynamics?
Estimating the B Cell Mutation Rate
• Counting mutations is not sufficient: number of cell divisions at time of measurement is unknown
• [Figure: Number of Mutations vs. Number of Cell Divisions, with High Mutation Rate and Low Mutation Rate lines and the Observed Number of Mutations marked]
• Also, problems of mutation schedule and positive/negative selection
Stems Pruned to Reflect Local Expansion
• Closely related cells remain in close spatial proximity
• Long stems may be artifact of local micro-dissection
• [Figure: clonal tree growing over time from the Initial Sequence; the stem is removed and the pick defines a new tree root]
• Mutation information retained in local branching structure
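One way to picture the pruning step, sketched in Python under the assumption that the stem consists of exactly those mutations shared by every sequence in the pick. The function name, the example pick, and the string labels for mutations are all hypothetical.

```python
def prune_stem(mutation_sets):
    """Split mutations into the stem (shared by all sequences) and the rest per sequence."""
    stem = set.intersection(*map(set, mutation_sets.values()))
    pruned = {name: set(muts) - stem for name, muts in mutation_sets.items()}
    return stem, pruned

# Hypothetical pick of three sequences; each label is a position plus a base change.
pick = {
    "a": {"8T>G", "2G>C"},
    "b": {"8T>G"},
    "c": {"8T>G", "1G>A", "9C>A"},
}
stem, pruned = prune_stem(pick)
```

After pruning, sequence `b` carries no mutations and so sits at the new tree root, while the branching among `a`, `b`, and `c` is unchanged.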
Analytical Method
• Derived formulas to approximate tree shapes…
• The average number of mutations per sequence (M)
• The average number of sequences present at the root of the tree (R)
• Total number of sequences in nodes with repeated sequences (P)
• For each observed tree, choose number of divisions to minimize error between observed shape and calculated shape
• Minimize error X(m) over all experimentally observed trees (t)
Validating the Analytical Method
• Use simulation to construct artificial data sets with limited number of trees/sequences reflecting currently available experimental data
• Results for analytical method (assumes correct l)
• Method works even with limited number of clonal trees and sequences
Validity of Underlying Assumptions
• Common assumptions:
  • All trees have same underlying mutation rate
  • Mutation rate constant (per division)
  • Frequency of lethal mutations is ~ constant
• Reasonable, impact needs testing:
  • In each tree, all cells have same # divisions
  • No positive selection over time-scale considered
• Use simulation and validation on synthetic data to test assumptions
Estimating the Lethal Frequency (l)
• Fraction of all mutations that are replacements is a key measure
• Not informative for individual trees, so can’t add to definition of ‘equivalence’
• Combined average over many trees is good approximation
• [Figure: simulated clonal tree (m = 1.0, l = 0.5, 3 seq / tree)]
• Estimate for l based on combined data from all sequences