Major insights from the HGP on
• Gene content
• Proteome content
• SNP identification
• Distribution of GC content
• CpG islands
• Recombination rates
• Repeat content
Nature (2001) 15th Feb Vol 409 special issue; pgs 814 & 875-914.

1) Gene content
30,000-40,000 protein-coding genes estimated, based on known genes and predictions

                  IHGSC     Celera
definite genes    24,500    26,383
possible genes    5,000     12,000

Genes encode either protein or noncoding RNAs (rRNA, tRNA, snRNA, snoRNA)
Nature (2001) 15th Feb Vol 409 special issue; pg 814-816 and 860-914.

Gene content…
• More genes: twice as many as Drosophila / C. elegans
• Uneven gene distribution: gene-rich and gene-poor regions
• More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory gene family has 1000 genes
• More alternative transcripts: more RNA splice variants are produced, expanding the set of primary proteins about 5-fold (e.g. neurexin genes)
Nature (2001) 409: pp 892

Gene content: uneven gene distribution
• Gene-rich regions: e.g. the MHC on chromosome 6 has 60 genes with a GC content of 54%
• Gene-poor regions: 82 gene deserts identified
  ? Large or unidentified genes
What is the functional significance of these variations?
Genetics by Hartwell: pp 341-347

2) Proteome content
Protein domains (sections with identifiable shape/function)
Domain arrangements in humans:
• largest total number of domains is 130
• largest number of domain types per protein is 9
• mostly identical arrangement of domains
Proteome more complex than in invertebrates
[Diagram: Protein X, with domains A A B B B C C C C C]
Nature (2001) 15th Feb Vol 409 special issue; pg 847

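The two quantities on this slide (total number of domains and number of distinct domain types) can be illustrated on the Protein X diagram. This is only a minimal sketch: the function name `domain_summary` and the one-letter encoding of domains are assumptions made for illustration, not part of the original analysis.

```python
from collections import Counter


def domain_summary(domains):
    """Return (total number of domains, number of distinct domain types)."""
    counts = Counter(domains)
    return sum(counts.values()), len(counts)


# Protein X from the slide diagram: A A B B B C C C C C
protein_x = list("AABBBCCCCC")
total, types = domain_summary(protein_x)
print(total, types)  # 10 3
```

Protein X therefore carries 10 domains of 3 types; the human maxima quoted above are 130 domains in total and 9 domain types per protein.
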
2) Proteome content…
Proteome more complex than in invertebrates…
• No huge difference in domain number in humans
• BUT the frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function)
• However, there are only 3 cases where a combination of 3 domain types is shared by human & yeast proteins
• e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types; this arrangement occurs once in human and yeast but twice in Drosophila
Nature (2001) 15th Feb Vol 409 special issue; pg 847

3) SNPs (single nucleotide polymorphisms)
• More than 1.4 million SNPs identified
• One every 1.9 kb on average
• Densities vary over regions and chromosomes, e.g. the HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928

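As a rough consistency check on the two figures above, multiplying the SNP count by the average spacing gives the amount of sequence those SNPs span. The sketch below does that arithmetic and also shows how a regional density could be computed. The function names and the toy coordinates are hypothetical and serve only as illustration.

```python
def implied_span_bp(n_snps, mean_spacing_bp):
    """Sequence length implied by n_snps spaced mean_spacing_bp apart on average."""
    return n_snps * mean_spacing_bp


def snps_per_kb(positions, start, end):
    """SNP density (per kb) in the half-open interval [start, end)."""
    in_region = sum(1 for p in positions if start <= p < end)
    return in_region / ((end - start) / 1000)


# Figures quoted on the slide: 1.4 million SNPs, one every 1.9 kb.
span = implied_span_bp(1_400_000, 1_900)
print(span / 1e9)  # 2.66 (Gb of sequence spanned)

# Hypothetical SNP coordinates, for illustration only.
toy_positions = [100, 1_900, 3_800, 5_000]
print(snps_per_kb(toy_positions, 0, 4_000))  # 0.75
```

Running `snps_per_kb` over successive windows of a chromosome would expose the kind of regional variation noted for the HLA region.
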
How does one distinguish sequence errors from polymorphisms?
Sequence errors
• Each piece of genome is sequenced at least 10 times to reduce the error rate (0.01%)
Polymorphisms
• Sequence variation between individuals is 0.1%
• To be defined as a polymorphism, the altered sequence must be present in a significant fraction of the population
• Rate of polymorphism in the diploid human genome is about 1 in 500 bp
Nature (2001) 15th Feb Vol 409 special issue; pgs 821-823 & 928

3) SNPs…
• Sites that result from point mutations in individual base pairs
• Biallelic
• ~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP)
• May or may not affect the ORF
• Most SNPs may be regulatory
Nature (2001) 15th Feb Vol 409 special issue; pg 821 & 928
http://www.genetics.gsk.com/kids/medicine01.htm

4) Distribution of GC content
• Genome-wide average of 41%
• Huge regional variations exist, e.g. the distal 48 Mb of chromosome 1p averages 47%, but chromosome 13 has only 36%
• Confirms cytogenetic staining with G-bands (Giemsa):
  dark G-bands: low GC content (37%)
  light G-bands: high GC content (45%)
Nature (2001) 15th Feb Vol 409 special issue; pg 876-877

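Regional GC variation of this kind is typically measured by computing GC content in windows along the sequence. The following is a minimal sketch of that idea, assuming fixed non-overlapping windows; the window size, function names and toy sequence are illustrative choices, not the procedure used in the paper.

```python
def gc_content(seq):
    """Fraction of G+C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # e.g. a window consisting only of N
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window):
    """GC content of consecutive non-overlapping windows of length `window`."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGCATAT" + "ATATATATGC"
print(windowed_gc(toy, 10))  # [0.6, 0.2]
```

On real chromosomes the windows would be far larger (tens of kb or more), and the resulting profile is what can be compared against the dark and light G-bands.
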
5) CpG islands
[Diagram: CpG → methyl-CpG (methylated at C) → deamination → TpG; CpG islands show no methylation]
Significance of CpG islands
• Non-methylated CpG islands are associated with the 5' ends of genes
• Aberrant methylation of CpG islands is one mechanism of inactivating tumor suppressor genes (TSGs) in neoplasia
http://www.sanger.ac.uk/HGP/cgi.shtml

CpG islands
• CpG dinucleotides are greatly under-represented in the human genome
• ~28,890 CpG islands in number
• Variable density, e.g. Y has 2.9/Mb but chromosomes 16, 17 & 22 have 19-22/Mb
• Average is 10.5/Mb
Nature (2001) 15th Feb Vol 409 special issue; pg 877-888

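One common way to quantify the under-representation noted above is the observed/expected CpG ratio: the number of CG dinucleotides divided by the number expected from the C and G counts alone. This ratio is a widely used convention and does not come from the slides. The function name and toy sequences are assumptions for illustration.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio undefined without both bases; report 0
    return seq.count("CG") * len(seq) / (c * g)


# Toy sequences with the same base composition but different CpG content.
print(cpg_obs_exp("CGCGCGCG"))  # 2.0  (CpG-rich)
print(cpg_obs_exp("CCCCGGGG"))  # 0.5  (CpG-depleted)
```

A ratio well below 1 across bulk genomic DNA is what would be expected if methylated CpG is progressively lost to TpG by deamination, while unmethylated CpG islands keep a ratio closer to 1.
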
6) Recombination rates
2 main observations:
• Recombination rate increases with decreasing arm length
• Recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb

7) Repeat content
• Age distribution
• Comparison with other genomes
• Variation in distribution of repeats
• Distribution by GC content
• Y chromosome
Nature (2001) 409: pp 881-891