The Cognitive Dog: Savant or Slacker. Class 5: Observing your dog: from the genome to the expression. Molecular Genetics and the Dog. Setting the stage. The archeological mystery. A few controversial finds dated at 12,000 - 14,000 BP Is it a dog or a wolf?
The Cognitive Dog: Savant or Slacker • Class 5: Observing your dog: from the genome to the expression
The archeological mystery... • A few controversial finds dated at 12,000 - 14,000 BP • Is it a dog or a wolf? • Numerous and uncontroversial finds dated at 9,000 - 7,000 BP • More pronounced dog-like features of skull and teeth
Clutton-Brock, J. (1995). Origins of the dog: domestication and early history. The Domestic Dog: its evolution, behavior, and interactions with people. J. Serpell. Cambridge, UK, Cambridge University Press: 8-20. From Israel, 12,000 BP
Fossil evidence • What they are looking for... • size of teeth • size and proportions of skull & jaw • ... Clutton-Brock, J. (1995). Origins of the dog: domestication and early history. The Domestic Dog: its evolution, behavior, and interactions with people. J. Serpell. Cambridge, UK, Cambridge University Press: 8-20.
The proliferation of breeds is a product of the 19th century • Buffon’s classification of dogs ~1800, today over 150 AKC breeds
Young, A. and D. Bannach (2006). Morphological Variation in the Dog. The Dog and Its Genome. E. A. Ostrander, U. Giger and K. Lindblad-Toh. Cold Spring Harbor, NY, Cold Spring Harbor Press: 584. The most diversity of any mammalian species • And this doesn’t even count behavioral...
Can looking at dogs today, specifically the dog genome, tell us anything about... • The origin of the pet dog (when and where)? • The evolution of breeds? • The effect of the 19th/20th century explosion in breeds? • The genetic basis of morphological and behavioral diversity?
What is a genome anyway? • It describes the genetic material (DNA) within the nucleus of a cell. • Sequencing the genome means creating a map that describes the sequence of bases on each chromosome. • A gene is an identifiable part of a chromosome that contains the instructions for how, and possibly when, to create proteins. • Most of a genome, though, is non-coding, i.e., does not contain a gene. • Proteins are a big deal because they act as enzymes and building blocks, and help regulate metabolism and development. Carroll, S. B. (2006). The Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution. New York, NY, W.W. Norton.
Dogs have 39 pairs of chromosomes • Each pair is made up of a chromosome from Mom and one from Dad • For any given location on a chromosome, if it contains the same element as at the same location on its counterpart it is called homozygous, if different it is called heterozygous. • Recombination and mutation create variation Wilkie, P. J. (1999). Future Dog: Breeding for Genetic Soundness. St. Paul, MN, University of Minnesota Agricultural Service.
Mitochondrial DNA • Passed down directly from mother via egg cell. • Only change from 1 generation to next is due to mutation, but rate of mutation is very low • 500 to 1000 copies vs. 2 copies of nuclear DNA per cell http://micro.magnet.fsu.edu/cells/animalcell.html
Just because it’s junk doesn’t mean it’s not useful: Microsatellites • Repeating sequences in “junk” DNA at known locations (assumption: no selective advantage) • Mutations take the form of additions & deletions to the repeating pattern. Different lengths identify individuals/populations • Change at a faster rate than mtDNA http://www.asicoaquaticmarkers.com/AnatomyofaMicrosatellite.htm
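As a rough illustration of how repeat length can tell individuals apart, the sketch below counts the longest run of a repeat unit in a sequence. The two sequences, the `CA` repeat unit, and the function name `longest_repeat_run` are all invented for illustration; real genotyping measures fragment lengths in the lab rather than scanning strings.

```python
def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        # Count consecutive copies of the unit beginning at `start`.
        while sequence.startswith(unit, start + run * len(unit)):
            run += 1
        best = max(best, run)
    return best


# Hypothetical alleles at the same locus in two dogs (made-up sequences).
dog_a = "GGT" + "CA" * 5 + "TTG"
dog_b = "GGT" + "CA" * 8 + "TTG"

print(longest_repeat_run(dog_a, "CA"))  # 5 repeats
print(longest_repeat_run(dog_b, "CA"))  # 8 repeats
```

The differing repeat counts (5 vs. 8) are the kind of length difference the slide describes as identifying individuals or populations.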
Wilkie, P. J. (1999). Future Dog: Breeding for Genetic Soundness. St. Paul, MN, University of Minnesota Agricultural Service. Practical application of microsatellite markers
Wilkie, P. J. (1999). Future Dog: Breeding for Genetic Soundness. St. Paul, MN, University of Minnesota Agricultural Service. Microsatellites • who’s my daddy?
Haplotypes: for any given part of the genome, typically a few common sequences in a population • Think of mtDNA (and DNA) as a long sequence of bases (A,T,G,C) made up of sub-sequences, known as Haplotypes, i.e., identifiable subsequences • When a mutation occurs, it creates a 1 base change: ATTA -> ATCA. Note: Prob. of base change at same site again is low, but needs to be accounted for. • At a given region of the genome there may be one or more haplotypes within the population. • e.g., 30% of the population has haplotype 1, 20% has haplotype 2, 50% of the population has haplotype 3.
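The 30% / 20% / 50% example above can be sketched as a simple tally. The four-base haplotype strings and the ten-animal sample are made up for illustration, and `haplotype_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def haplotype_frequencies(samples):
    """Return the fraction of the sample carrying each distinct haplotype."""
    counts = Counter(samples)
    total = len(samples)
    return {haplotype: n / total for haplotype, n in counts.items()}


# Made-up sample of 10 animals typed at one region of the genome.
samples = ["ATTA"] * 3 + ["ATCA"] * 2 + ["GTTA"] * 5

for haplotype, freq in haplotype_frequencies(samples).items():
    print(f"{haplotype}: {freq:.0%}")
```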
Haplotypes can be used to build a family tree... • Look at... • Number & frequency of haplotypes in population • Clustering based on closeness of different haplotypes • ‘Family tree’ based on model of minimum substitutions (one step changes)
Example • wolf: ATGGTACCTGAC • breed 1: ATCGTACCTTAC (2 mutations away from wolf) • breed 2: ATCGAACTTTAC (4 mutations away from wolf, 2 away from breed 1) • One possible interpretation: breed 1 is closer to the common ancestor, and a more ancient breed...
Example • breed 3: GAGGTATCTTAC • breed 2: ATCGAACTTTAC • breed 1: ATCGTACCTTAC • unlabeled: ATGGTACCTGAC • wolf: ATGGTACCTGAC
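The "mutations away" counts in these examples are just the number of positions at which two equal-length sequences differ. A minimal sketch, using the sequences from the slides; the function name `mutations_apart` is invented here, and the count assumes one substitution per differing site (no repeated changes at the same site).

```python
from itertools import combinations


def mutations_apart(a: str, b: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


# Sequences as given in the two example slides.
sequences = {
    "wolf": "ATGGTACCTGAC",
    "breed 1": "ATCGTACCTTAC",
    "breed 2": "ATCGAACTTTAC",
    "breed 3": "GAGGTATCTTAC",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
    print(f"{name_a} vs {name_b}: {mutations_apart(seq_a, seq_b)}")
```

A pairwise table like this is the raw input for a minimum-substitution tree; building the tree itself is a further step not shown here.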
Can these techniques tell us about when dogs evolved from wolves & where?
The mitochondrial clock • Use mtDNA as the basis for a clock to estimate how long ago 2 populations diverged... • time = (amount of change)/(rate of mutation) • e.g., 2M years = (7% change)/(3.5%/M year) • Rate estimated from fossil evidence... • 7.8% difference between coyotes and wolves • Fossil evidence of split 1M years ago
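The clock arithmetic on this slide can be written out directly. This is a sketch of the slide's formula only; the function names are invented, and it assumes a constant rate, which is exactly the assumption the following slides question.

```python
def calibrate_rate(percent_difference: float, split_age_myr: float) -> float:
    """Estimate a mutation rate (% change per million years) from a dated split."""
    if split_age_myr <= 0:
        raise ValueError("split age must be positive")
    return percent_difference / split_age_myr


def divergence_time_myr(percent_change: float, rate_percent_per_myr: float) -> float:
    """time = (amount of change) / (rate of mutation), in millions of years."""
    if rate_percent_per_myr <= 0:
        raise ValueError("rate must be positive")
    return percent_change / rate_percent_per_myr


# The slide's worked example: 7% change at 3.5% per million years.
print(divergence_time_myr(7.0, 3.5))  # 2.0 million years

# Calibration from the slide: 7.8% coyote/wolf difference, split 1M years ago.
print(calibrate_rate(7.8, 1.0))  # 7.8% per million years
```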
Initial work in the 90’s using mtDNA analysis... • Vila et al performed a comparative analysis of mtDNA of dogs and wolves & other canids and concluded... • Dogs did seem to have evolved from wolves • Gray wolves differ from dogs by 0.2% of mtDNA • Wolves differ from coyotes by 4% of mtDNA • The date of the split based on using a mitochondrial clock was 135,000BP • The date was immediately called into question...
The mitochondrial clock isn’t quite like a Swiss watch... • Problems as a clock... • Assumes rate of mutation can be reliably estimated and is constant • Across species • Across parts of mtDNA • Variation among lineages may not be constant • A lineage may have accumulated more or fewer mutations • One lineage may have suffered from a collapse & lost variation.
The mitochondrial clock... bottleneck • Change that had a selective advantage, e.g., resistance to distemper... • “Lucky founder”, e.g., he was sleeping on high ground when the flash flood came through
The mitochondrial clock... • Multiple founding lines may produce the same variation in less time than 1 founding line • There are too many confounding causes for diversity/lack of diversity to use it as the basis for a clock...
Subsequent work... • While 100,000 years is still used as the outside number, my interpretation is that a consensus around 15K ybp seems to be forming in the dog genome community. Indeed... • “The available mtDNA data do not give resolution enough to precisely determine a date for the origin of the dog, but the archaeological record indicates an origin ~15,000 ybp, a date which is not contradicted by the mtDNA record” • Savolainen, P. (2005). mtDNA Studies of the Origin of Dogs. The Dog and Its Genome. E. Ostrander, U. Giger and K. Lindblad-Toh. Cold Spring Harbor, NY, Cold Spring Harbor Press: 584. • Multiple domestication events, interbreeding...
Savolainen, P., Y.-p. Zhang, et al. (2002). "Genetic Evidence for an East Asian Origin of Domestic Dogs." Science 298: 1610-1613. More diversity, & unique haplotypes in East Asia • Lots of questions about this conclusion...
Ancient new world dogs... • More closely related to old world dogs than to new world gray wolves. • Seemed to be descended from 5 lineages of old world dogs • Presumably introduced around 12,000- 14,000 BP • Did not survive 2nd wave of immigration, e.g. today’s Mexican hairless more closely related to old world dogs than to ancient new world dogs. Leonard, J., R. Wayne, et al. (2002). Ancient DNA Evidence for Old World Origin of New World Dogs. Science. 298: 1613-1616.
But at best this is circumstantial evidence... • There are a ton of explanations for why you might see greater diversity in one place rather than another that have nothing to do with the age of the breed... • Population size • Bottlenecks • Comparing apples and oranges: inbred lines in Europe/US vs. mongrels in East Asian samples. • Pattern of current diversity may be different than ancestral diversity
Parker et al used SNPs and microsatellite markers to examine breed relationships... • Chose to use microsatellite markers within nuclear DNA rather than mtDNA so as to better reflect the modern origin of most breeds, and tried to build a phylogenetic tree... Parker, H., L. Kim, et al. (2004). Genetic Structure of the Purebred Domestic Dog. Science. 304: 1160-1164. Which of these are good obedience dogs?
Parker et al then used statistical clustering techniques... • Big idea... • Define a distance metric that allows you to say how close 1 pattern is to another • Form clusters based on distance. Number of clusters can... • Come from the data • Be set a priori, e.g., “if I were to say there were 2 clusters, find the best 2 clusters and tell me who would be in each”
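To make the "distance metric plus a chosen number of clusters" idea concrete, here is a toy sketch. It is not the method Parker et al. used: it swaps in plain single-linkage agglomerative clustering with a count-the-differences distance, and the marker profiles and dog names are invented.

```python
from itertools import combinations


def distance(a: str, b: str) -> int:
    """Distance metric: number of marker positions at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster(profiles: dict, k: int) -> list:
    """Repeatedly merge the two closest clusters (single linkage) until k remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[name] for name in profiles]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(
                distance(profiles[a], profiles[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return [sorted(members) for members in clusters]


# Invented marker profiles for four dogs.
profiles = {
    "dog A": "AAAAAA",
    "dog B": "AAAAAT",
    "dog C": "TTTTTT",
    "dog D": "TTTTTA",
}

print(cluster(profiles, 2))  # "if I were to say there were 2 clusters..."
print(cluster(profiles, 3))
```

Changing `k` changes the answer, which is the sense in which the result rests on a decision made before looking at the data.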
Pattern analysis is a very well established field • But it does rest on a priori decisions
Parker, H., L. Kim, et al. (2004). Genetic Structure of the Purebred Domestic Dog. Science. 304: 1160-1164. Assuming 2 clusters • Note how Asian & Arctic breeds, sight hounds & wolves are in 1 cluster
Parker, H., L. Kim, et al. (2004). Genetic Structure of the Purebred Domestic Dog. Science. 304: 1160-1164. Assume 3 clusters • 3rd cluster tends to be mastiff kinds of dogs: broad heads & lots of muscle
Parker, H., L. Kim, et al. (2004). Genetic Structure of the Purebred Domestic Dog. Science. 304: 1160-1164. Assume 4 clusters • Herding and some of the sight hounds, but also shih tzu and pugs:-)
Parker, H., L. Kim, et al. (2004). Genetic Structure of the Purebred Domestic Dog. Science. 304: 1160-1164. The remaining dogs tend to be hunting dogs & terriers
The technique was able to assign dogs to their correct breed in almost all cases... Parker, H., L. Kim, et al. (2004). Genetic Structure of the Purebred Domestic Dog. Science. 304: 1160-1164.
Does this mean my Pharaoh hound isn’t descended from the time of Cleopatra? • Well, it’s not looking good, BUT... • The data and techniques weren’t sufficient to build a statistically strong phylogenetic tree for more than a few breeds. That is, it can’t tell you who was descended from whom, and when... • The clustering only tells you how close the breeds, as represented by the individuals tested, are with respect to a certain distance metric. May reflect any number of things including... • morphological similarity in part or whole • behavioral similarity
And then came Tasha... • Scientists at the Broad Institute were able to create a high quality map of Tasha’s genome • “Guide comparative analysis of human genome” • “Explore genetic basis of disease susceptibility, morphological variation and behavioral traits” Lindblad-Toh, K., C. M. Wade, et al. (2005). "Genome sequence, comparative analysis and haplotype structure of the domestic dog." Nature 438(7069): 803-819. Broad Institute
Some comparisons with human & mouse genome • Dog genome is ~2.4 Gb, which is smaller than the human genome (~2.9 Gb) or mouse genome (~2.6 Gb) • Dogs have 18,846 genes vs. 20,426 genes in humans: dogs may be closer to “common ancestor” • High degree of synteny (similar genes line up together on chromosomes) • One of the most significant findings was a “common set of functional element corresponding to ~5% of human genome.” The stuff that makes organisms work? • 1.5% is protein-coding genes; the remainder is clustered around these genes and probably includes “regulatory elements, structural elements and RNA genes”
SNPs • An SNP is a location in the genome where the nucleotide base (A, T, C, G) shows some variation across the population. • SNP rate = bases/SNP, i.e., the average number of bases between SNPs • The higher the SNP rate (more bases per SNP, so fewer differences), the ‘closer’ the breeds/individuals are. Lindblad-Toh, K., C. M. Wade, et al. (2005). "Genome sequence, comparative analysis and haplotype structure of the domestic dog." Nature 438(7069): 803-819.
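Because the rate is defined as bases per SNP, a larger number means fewer differences. A small sketch with made-up 20-base sequences shows the direction of the relationship; `bases_per_snp` is a hypothetical helper, and real comparisons run over millions of aligned bases.

```python
def bases_per_snp(a: str, b: str) -> float:
    """SNP rate as defined on the slide: aligned bases divided by differing sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    snps = sum(x != y for x, y in zip(a, b))
    if snps == 0:
        return float("inf")  # identical sequences: no SNPs at all
    return len(a) / snps


# Made-up aligned sequences, 20 bases each.
reference = "ATGGTACCTGACATGGTACC"
close_relative = "ATCGTACCTTACATGGTACC"    # differs at 2 sites
distant_relative = "ATCGAACTTTACGTGGTACC"  # differs at 5 sites

print(bases_per_snp(reference, close_relative))    # 10.0
print(bases_per_snp(reference, distant_relative))  # 4.0
```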
Lindblad-Toh, K., C. M. Wade, et al. (2005). "Genome sequence, comparative analysis and haplotype structure of the domestic dog." Nature 438(7069): 803-819. Family Tree
SNPs & Linkage Disequilibrium • LD is a measure that reflects the degree to which elements such as SNPs are correlated, i.e., knowing the value of one lets you predict the value of another. For example, • SNP 1 is A in 25% of the population, and SNP 2 is G in 10% of the population. If the two SNPs were independent, the laws of probability would say that the frequency of SNP 1 = A and SNP 2 = G together in a given individual is 25% x 10% = 2.5%. If it is observed to be well above 2.5%, then the pair is in “linkage disequilibrium”. • Knowing SNP 1 in this case helps predict the value of SNP 2.
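The 2.5% figure is the product of the two frequencies, and the gap between the observed and expected joint frequency is the standard LD coefficient, D = p(AB) - p(A)p(B). In the sketch below, the 25% and 10% come from the slide, while the observed joint frequency of 8% is an invented number for illustration and the function names are hypothetical.

```python
def expected_joint_frequency(p_a: float, p_b: float) -> float:
    """Joint frequency expected if the two SNPs were independent."""
    return p_a * p_b


def ld_coefficient(p_ab: float, p_a: float, p_b: float) -> float:
    """D = observed joint frequency minus the frequency expected under independence."""
    return p_ab - expected_joint_frequency(p_a, p_b)


p_snp1_a = 0.25    # from the slide: SNP 1 is A in 25% of the population
p_snp2_g = 0.10    # from the slide: SNP 2 is G in 10% of the population
observed_ag = 0.08  # invented: A and G seen together in 8% of the population

print(f"expected: {expected_joint_frequency(p_snp1_a, p_snp2_g):.3f}")
print(f"D: {ld_coefficient(observed_ag, p_snp1_a, p_snp2_g):.3f}")
```

A value of D near zero means the two SNPs look independent; a value well above zero, as here, is the situation the slide calls linkage disequilibrium.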
Lindblad-Toh, K., C. M. Wade, et al. (2005). "Genome sequence, comparative analysis and haplotype structure of the domestic dog." Nature 438(7069): 803-819. Ostrander, E. A. and R. K. Wayne (2005). "The canine genome." Genome Research 15: 1706-1716. Structure seen in Boxer consistent across breeds • But LD varies 10x across breeds reflecting origin & pop. size.
The big story • Wolf to dog created a bottleneck resulting in subset of wolf haplotypes • Pre-breed dogs characterized by diverse short haplotype blocks • Selective breeding created new bottleneck • Breeds characterized by long breed specific haplotype blocks made up of short ancestral blocks
The big story continued... • Within a typical 10 kb region... • ~10 distinct haplotypes across breeds • Within a single breed, typically see 4 haplotypes with the 2 most common accounting for 80% of the frequency • Across breeds, haplotypes & frequency vary, but with a high degree of sharing
So why do you care? • The structure of the dog genome means that it is dramatically easier to identify the genetic basis for disease in dogs • 10-15K SNPs vs. 300K SNPs in humans to provide coverage • 99% chance of detecting locus given 100 affected and unaffected dogs in case of a single dominant gene • More difficult in case of multiple interacting genes, but still very high chance • Within breed and cross breed comparisons will be useful.