Delve into the complexities of the human-chimpanzee genome comparison, highlighting genetic variations, evolutionary hypotheses, and structural disparities. Discover the significance of single nucleotide substitutions, indels, transposable elements, and more.
ID _PANTR HPI meeting The chimp and us
ID _PANPA
Complete chimp (PANTR) genome publication: Nature, Sept. 2005
- Genome derived from one individual, ‘Clint’ (a male from West Africa)
- Inter- vs intraspecies (polymorphism) differences!
- Individual human genome variation: 1 bp/1000
- Individual chimp genome variation: 1 bp/250 (estimate, Varki (2000))
Cheeta has been recognized by the Guinness Book of World Records as the world's oldest chimp. Chimps rarely live past the age of 40 in the wild, but can reach 60 in captivity.
The chimpanzee genome was sequenced to approximately four-fold coverage (error rate < 10^-4)
- WGS sequencing approach (-> problem for the assembly of regions with segmental duplications): ~22.5 million sequence reads to assemble
- 2 assembly approaches (PCAP* and ARACHNE)
- In one* of the 2 approaches, contigs were assembled using the human genome as a guide, so they are "humanized" in their construction
- Some sequences, such as insertions, deletions and gene duplications, may not be accurately represented by the current chimpanzee assembly
NCBI has adopted the NEW chimpanzee chromosome naming system as proposed by McConkey, 2004 • - The UCSC-Genome browser currently uses the original chimpanzee chromosome naming system.
Humanness • - Bipedalism • - Large cranial capacity (brain size) • - Advanced brain development (language capability) • - A long generation time • and some other ‘biomedical’ differences…
Chimps expressed apoE4 allele Chimps: no acne, rhinitis but no asthma, no rheumatoid arthritis Olson et al., (2002)
The last common ancestor of humans and chimps is believed to have walked on 4 legs. • The oldest fossils that resemble bipedal humans are 6 to 7 million years old. • - DNA sequence analyses suggest the 2 lineages separated about 5.4 million years ago.
Short time since human-chimp split: it is likely that a few mutations of large effects are responsible for part of the differences. Comparative genomic analysis Human vs mouse, chick…: focus on similarities Human vs chimp: focus on differences
Quantifying the sequence divergence:
- Single nucleotide substitutions: 1.23 % (1.78 % for chromosome Y; 0.8 % in protein-coding regions)
- Indels: ~1.5 %
- Transposable elements: 3 %
- Recent duplication of DNA segments: 2.7 %
- ~35 million nucleotide differences
- ~5 million indels
- Many chromosomal rearrangements
- Human: 3.4 × 10^9 bp; Chimp: 3.6 × 10^9 bp
~35 million nucleotide differences: ‘Since we apparently diverged from a common ancestor 6 million years ago, that is roughly 6 mutations per year that get fixed within the genome (or 3 per year if you divide them equally amongst the 2 branching species). Given a conservative estimate of average generational time of 10 years, this means that 30 new mutations had to be fixed within the population every generation. The current human mutation rate is around 3 or 4 mutations per organism.’ http://www.uncommondescent.com/index.php/archives/875
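The arithmetic in the quoted passage can be reproduced in a few lines of Python. This is only a sketch using the figures quoted on the slide (35 million differences, 6 million years, a 10-year generation time). The function name and the equal split between the two lineages are illustrative assumptions, and polymorphism is ignored.

```python
def fixed_per_generation(differences, years_since_split, generation_years, lineages=2):
    """Back-of-the-envelope fixation rates from the figures quoted on the slide.

    Returns (per year in total, per year per lineage, per generation per lineage).
    Assumes the differences are split equally between the lineages.
    """
    per_year_total = differences / years_since_split
    per_year_lineage = per_year_total / lineages
    per_generation = per_year_lineage * generation_years
    return per_year_total, per_year_lineage, per_generation


total, per_lineage, per_gen = fixed_per_generation(35e6, 6e6, 10)
print(f"{total:.1f} fixed differences per year (both lineages)")
print(f"{per_lineage:.1f} per year per lineage")
print(f"{per_gen:.0f} per generation per lineage")
```

The unrounded values are about 5.8 per year in total, 2.9 per year per lineage and 29 per generation, which the quoted text rounds to 6, 3 and 30.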
At the genome level 1) Structural variations
A genome-wide survey of structural variation between human and chimpanzee (Newman et al. (2005)) - Approach: mapping chimp fosmids against the human reference sequence and identifying regions that are discordant in size or orientation - Limitations: the human genome is not complete; the chimp genome = 1 individual (! inter/intraspecific differences !)
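The mapping approach above can be sketched as a simple classifier for fosmid end pairs placed on the human reference. This is not the pipeline of Newman et al.: the insert-size bounds, the `EndPair` record, the `classify` function and the category labels are all hypothetical, and the orientation check is deliberately simplified (two ends on the same strand are treated as discordant).

```python
from dataclasses import dataclass

# Hypothetical bounds on the fosmid insert size, in bp. A real analysis would
# derive them from the observed size distribution of the clone library.
MIN_INSERT = 32_000
MAX_INSERT = 48_000


@dataclass
class EndPair:
    """Placement of the two end reads of one chimp fosmid on the human reference."""
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify(pair, min_insert=MIN_INSERT, max_insert=MAX_INSERT):
    """Label an end pair as concordant or as a candidate structural difference."""
    if pair.chrom_a != pair.chrom_b:
        return "different_chromosomes"
    if pair.strand_a == pair.strand_b:
        # Ends of one clone should map to opposite strands.
        return "inversion_candidate"
    span = abs(pair.pos_b - pair.pos_a)
    if span > max_insert:
        # Human holds more sequence between the ends than the clone can carry.
        return "chimp_deletion_candidate"
    if span < min_insert:
        # Human holds less sequence than the clone carries.
        return "chimp_insertion_candidate"
    return "concordant"


print(classify(EndPair("chr1", 1_000, "+", "chr1", 41_000, "-")))
print(classify(EndPair("chr1", 1_000, "+", "chr1", 91_000, "-")))
```

A single discordant clone is weak evidence, so in practice a region would only be reported when several independent clones agree on the same kind of discordance.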
Identification of 651 regions of putative structural variation between the human genome assembly and the single chimp individual (293 chimp deletions, 184 chimp insertions and 174 inversions/duplicative transpositions). • Chromosome Y is the most rearranged chromosome between human and chimp (! Repetitive regions !) - They have identified 245 (RefSeq) genes that may be affected by the structural differences between chimp and human (drug detoxification, receptors, reproduction) (Newman et al., (2005))
At the genome level • Structural variations • 2) Segmental duplication
Segmental duplication (impact: 2.7 %) • Longer than 20 kilobases (-> 300 kb), greater than 94 % sequence identity • 33 % of human duplicated segments are human specific • - 17 % of chimp duplications are chimp specific. • Half of the genes in the human-specific duplicated regions exhibit significant differences in gene expression relative to chimp and are most often upregulated. • Cheng et al., Nature (2005)
About 300 regions were identified where the human genome showed a significant increase in copy number when compared to chimp. ‘Only’ 92 regions where the chimp genome showed an increase in copy number compared to human (but with a higher rate of duplication) Cheng et al., Nature (2005)
Example: 4 human regions represented ~ 400 x in chimp genome (99.2% identity) Cheng et al., Nature (2005)
At the genome level • Structural variations • 2) Segmental duplication • 3) Interspersed/Transposable repeats
The human genome is composed of ~45 % interspersed elements • Including: • Long interspersed elements (LINEs); these encode a reverse transcriptase • Short interspersed elements (SINEs); these include Alu repeats • - The human genome contains about 1,000,000 Alu elements. - Found only in primates.
Interspersed/Transposable element insertions (impact 3 %) • endogenous mutagens which can alter genes, promote genomic rearrangements… • may help to drive the speciation of organisms • Particular interest in recently mobilized transposons • - The transposons that inserted into the human or chimp genome during the past 6 million years would be expected to be present in only one of the 2 genomes. • ~11’000 ‘recent’ transposon copies that are differentially present in human/chimp: • 73 % found in human and 27 % found in chimp
Interspersed/Transposable element insertions Endogenous retrovirus Mills et al., Am. J. Hum. Genet., 78:671-679, 2006
Interspersed/Transposable element insertions • Alu, L1 and SVA insertions accounted for > 95 % of the insertions in both species • - Human and chimp have amplified different subfamilies of these elements. SVA: composite element (1.5-2.5 kb) (2 Alu, a tandem repeat and a region derived from HERV-K)
Humans have supported higher levels of transposition than chimps during the past several million years (but… not the case for the baboon, which shows an activity 1.6-fold higher than human -> general decline in Alu activity in chimp)
Blat human DNA vs chimp DNA AJ271736 Xq pseudoautosomal
Interspersed/Transposable element insertions • 34 % of the insertions that occurred during the evolution of human and chimp were located within known genes
Interspersed/Transposable element insertions - conclusions - The original set of transposons in the common ancestor of human and chimp behaved differently during the subsequent evolution of the 2 organisms - Human received at least 4’800 additional transposon insertions compared to chimp -> the impact of transposon mutagenesis is likely to have been greatest in human during the past several million years. - Human and chimp have amplified different subfamilies of these elements. - Factors such as differences in population size may also have influenced the pattern of transposon insertion.
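As a rough consistency check, the rounded figures given earlier in the deck (~11’000 differentially present copies, 73 % of them in human) can be converted into counts. The sketch below is illustrative only: the function name is hypothetical and the inputs are the rounded slide values, not the exact counts of the original study.

```python
def split_counts(total_copies, human_fraction):
    """Split a total into human and chimp counts and return the human excess."""
    human = round(total_copies * human_fraction)
    chimp = total_copies - human
    return human, chimp, human - chimp


human, chimp, excess = split_counts(11_000, 0.73)
print(f"human: {human}, chimp: {chimp}, human excess: {excess}")
```

With these rounded inputs the split is about 8,030 versus 2,970 copies, an excess of roughly 5,000 in human. That is the same order as the "at least 4’800" stated in the conclusions, given that both inputs are approximations.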
At the sequence level (coding sequence level)
Nucleotide divergence: 1.23 %
- 14-22 % of these differences are due to polymorphism -> fixed divergence rate = ~1.06 %
- Chromosome X: ~0.94 %
- Chromosome Y: ~1.9 %
- Higher mutation rate in the male than in the female germ line (higher number of cell divisions, 5- to 6-fold)
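The step from observed to fixed divergence is a single subtraction of the polymorphic share. The sketch below assumes that the fixed rate is simply the observed rate multiplied by (1 - polymorphic fraction); the function name is illustrative.

```python
def fixed_divergence(observed_pct, polymorphism_fraction):
    """Divergence remaining after removing the share attributed to polymorphism."""
    return observed_pct * (1.0 - polymorphism_fraction)


for fraction in (0.14, 0.22):
    fixed = fixed_divergence(1.23, fraction)
    print(f"{fraction:.0%} polymorphism -> {fixed:.2f} % fixed divergence")
```

The ~1.06 % on the slide corresponds to the lower bound of the polymorphism estimate (14 %). At the upper bound (22 %) the same calculation gives about 0.96 %.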
At the gene level: 13’454 pairs of orthologous genes (507 Swiss-Prot, 1134 TrEMBL: 1641) (NCBI: 3111) - 29 % are 100 % identical - 5 % have in-frame indels (mainly in repetitive regions)
A classical measure of the overall evolutionary constraint on a gene: the KA/KB ratio
- KA: non-synonymous substitution rate in coding sequence
- KB (or KS): synonymous substitution rate in coding sequence
- KI: substitution rate in non-coding sequence
- KA/KB << 1: typical of most proteins, where change is detrimental (negative selection)
- KA/KB > 1: for the rare proteins under positive selection
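The interpretation of the ratio can be written as a small helper. This is a sketch under stated assumptions: the function names, the tolerance band around 1 and the example rates are hypothetical, and a real analysis would use a statistical test rather than a fixed cut-off.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of the non-synonymous (KA) to the synonymous (KB/KS) substitution rate."""
    if ks <= 0:
        raise ValueError("the synonymous rate must be positive to form the ratio")
    return ka / ks


def interpret(ratio, tolerance=0.1):
    """Map a KA/KS ratio to a selection regime.

    The tolerance is an arbitrary illustrative band around 1, not a test.
    """
    if ratio > 1 + tolerance:
        return "positive selection"
    if ratio < 1 - tolerance:
        return "negative (purifying) selection"
    return "approximately neutral"


# Hypothetical rates, for illustration only.
print(interpret(ka_ks_ratio(0.002, 0.010)))
print(interpret(ka_ks_ratio(0.015, 0.010)))
```

Genes with no synonymous substitution at all have an undefined ratio, which is why the helper raises an error in that case instead of returning infinity.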
About 500 genes with a KA/KB > 1. Most of the genes with a KA/KB > 1 are not involved in processes related to supposed humanness. Genes with the highest KA/KB ratios are mostly related to host-pathogen interaction, immunity and reproduction (a pattern also found in other mammals; cf. Valeria’s work on human/mouse orthologs)
In fact, genes related to brain function and neuronal activities show lower-than-average KA/KB ratios - Neural genes, as a group, have a much lower average KA/KB ratio than genes expressed outside of the brain. Hypothesis: only a small subset of genes may be the target of positive selection, which is not visible in this type of study. (Hill, Walsh (2005))
Example 1: FOXP2
- gene relevant for the human ability to develop language
- among the 5 % most conserved proteins
CC   -!- DISEASE: Defects in FOXP2 are the cause of speech-language
CC       disorder 1 (SPCH1) [MIM:602081]; also known as autosomal dominant
CC       speech and language disorder with orofacial dyspraxia. Affected
CC       individuals have a severe impairment in the selection and
CC       sequencing of fine orofacial movements, which are necessary for
CC       articulation. They also show deficits in several facets of
CC       language processing (such as the ability to break up words into
CC       their constituent phonemes) and grammatical skills.
Extremely conserved among mammals • - Acquired 2 aa changes in the human lineage (T303N and N325S), • including one potential/functional phosphorylation site (N325S) • Estimation: fixation of these mutations occurred during the last 200’000 years of human history, concomitant with or subsequent to the emergence of anatomically modern humans. • Enard et al., Nature (2002)
BUT: • no aa substitutions are shared between song-learning birds, vocal-learning whales, dolphins and bats, and humans, … • AND… • during times of song plasticity, FoxP2 is upregulated in a striatal region essential for song learning. • selection acted on large non-coding regulatory regions of FoxP2 ??? • - duplication of the chromosomal region (27 genes including FoxP2) may be another cause of speech and language disturbance ???
Less-is-more hypothesis: loss-of-function changes (lack of body hair, preservation of juvenile traits, expansion of the cranium) could be caused by non-synonymous substitutions, indels, loss of coding regions and deletions of entire genes. -> 53 human genes with disruptive indels in the coding regions (compared to chimp)
Well documented examples of human specific pseudogenization - MYH16, CMAH, CASP12, ELN, T2R62P (bitter taste receptor), MBL1 - Microcephalin (MCPH1) Challenge: dating the event !
MYH16: a myosin gene mutation (MYH16) correlates with anatomical changes in the human lineage. The gene was inactivated by a frameshifting mutation (~2.4 Myr ago), after the lineages leading to humans and chimpanzees diverged. The gene is transcribed (-> the coding sequence deletion was not preceded by a mutation in a transcriptional control domain). Expressed only in masticatory muscles in other mammals. Loss of this protein isoform is associated with marked size reductions in individual muscle fibres and entire masticatory muscles. Nature 428, 415-418 (2004)
Phylogenetic reconstruction for all human sarcomeric myosin genes (heavy chain), showing early divergence of MYH16 from others. Nature 428, 415-418 (2004)
Aligned DNA sequences for MYH16 exon 18 representing seven non-human primate species and six geographically dispersed human populations, revealing the effect of frameshift on reading frame and deduced amino acid sequence. Note stop codon at position 72−74. Nature 428, 415-418 (2004)