Genome Sequences/ the Human Genome Project Dr. Chris Evelo

Presentation Transcript


  1. Genome Sequences/ the Human Genome Project Dr. Chris Evelo

  2. Genomes
     A genome is the collection of DNA that comprises an organism. Today we have assembled the sequence of hundreds of genomes. The genome is divided into chromosomes, chromosomes contain genes, and genes are made of DNA.
     • Each of Earth's organisms has its own distinctive genome (except identical twins).

  3. Genome sizes in nucleotide base pairs (log scale)
     [Figure: genome size ranges on a log scale from 10^4 to 10^11 bp for plasmids, viruses, bacteria, fungi, plants, algae, insects, mollusks, bony fish, amphibians, reptiles, birds, and mammals.]
     The size of the human genome is ~3 × 10^9 bp; almost all of its complexity is in single-copy DNA. The human genome is thought to contain ~20,000-30,000 genes.
     http://www3.kumc.edu/jcalvet/PowerPoint/bioc801b.ppt

  4. Genome content

                        bacteria   yeast    worm     fly      man
     Size (Mb)          2          12       97       137      3,500
     % genes / junk     ?
     Total genes        2,000      6,300    19,000   14,000   30,000 ?
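
The figures on this slide can be turned into a rough gene density (genes per Mb). The sketch below is only an illustration: it assumes that "3.500" and "2.000" on the slide are thousands (3,500 Mb and 2,000 genes), and the helper name `genes_per_mb` is made up for this example.

```python
# Values transcribed from the "Genome content" slide.
# Assumption: the dots are thousands separators (3.500 Mb = 3,500 Mb).
genome_content = {
    "bacteria": (2, 2_000),      # (size in Mb, total genes)
    "yeast": (12, 6_300),
    "worm": (97, 19_000),
    "fly": (137, 14_000),
    "man": (3_500, 30_000),      # the slide marks the human gene count with "?"
}


def genes_per_mb(size_mb: float, total_genes: int) -> float:
    """Average number of genes per megabase of genome."""
    return total_genes / size_mb


density = {
    name: genes_per_mb(size_mb, genes)
    for name, (size_mb, genes) in genome_content.items()
}

for name, value in density.items():
    print(f"{name:10s}{value:10.1f} genes/Mb")
```

With these numbers the density falls from about 1,000 genes/Mb in bacteria to fewer than 10 genes/Mb in man, which is one way to read the "junk" row of the slide.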

  5. Functional genomics
     [Diagram: levels of analysis, from all genes to single genes, leading to FUNCTION]
     • GENOME (DNA): Organisation (HT-sequencing)
     • TRANSCRIPTOME (RNA): Expression (μ-arrays/HT-sequencing)
     • PROTEOME (PROTEIN): Synthesis/Structure (2D gels & LCMS/NMR-Xray)
     • METABOLOME (METABOLISM): Flux (NMR-kinetics-model)

  6. Which genomes are sequenced?
     Selection of genomes for sequencing is based on criteria such as:
     • genome size (some plant genomes are far larger than the human genome)
     • cost
     • relevance to human disease (or other disease)
     • relevance to basic biological questions
     • relevance to agriculture
     Ongoing projects (most are finished by now):
     • Chicken
     • Chimpanzee
     • Cow
     • Dog
     • Fungi (many)
     • Honey bee
     • Sea urchin
     • Rhesus macaque

  7. Human Genome Project
     International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931-945 (21 October 2004).
     • Introduction video
     • Strategies
     • Conclusions

  8. Additional ways of obtaining sequences.

  9. Overview of genome analysis
     There are two main strategies for sequencing genomes. Both are forms of shotgun sequencing: an approach used to decode an organism's genome by shredding it into smaller fragments of DNA which can be sequenced individually. The sequences of these fragments are then ordered, based on overlaps in the genetic code, and finally reassembled into the complete sequence. The 'whole genome shotgun' (WGS) method is applied to the entire genome all at once, while the 'hierarchical shotgun' method is applied to large, overlapping DNA fragments of known location in the genome.
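
The idea of ordering fragments by their overlaps and merging them can be sketched with a toy greedy assembler. This is a minimal illustration, not the software used by either sequencing project: the function names, the `min_len` threshold, and the example reads are all made up here, and real assemblers must also cope with sequencing errors, repeats, and both DNA strands.

```python
from __future__ import annotations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three hypothetical overlapping reads from a 15-base "genome".
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

In whole genome shotgun sequencing such a procedure would run on reads from the entire genome at once; in the hierarchical approach it would run separately on the reads from each mapped clone.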

  10. Strategic issues: Hierarchical / shotgun sequencing
      • The human genome was sequenced in parallel by:
        • A public consortium (IHGSC)
        • Celera Genomics
      • Using alternative sequencing strategies:
        • Hierarchical shotgun sequencing (public consortium)
        • Whole genome shotgun sequencing (Celera)

  11. Source: IHGSC (2001)

  12. Source: IHGSC (2001)

  13. Sequenced-clone contigs are merged to form scaffolds of known order and orientation Source: IHGSC (2001)

  14. Source: IHGSC (2001)

  15. Strategies
      Whole genome shotgun sequencing (Celera)
      -- given the computational capacity, this approach is far faster than hierarchical shotgun sequencing
      -- the approach was ~validated using Drosophila
      Hierarchical shotgun sequencing (public consortium)
      -- 29,000 BAC clones
      -- 4.3 billion base pairs
      -- it is helpful to assign chromosomal loci to sequenced fragments, especially in light of the large amount of repetitive DNA in the genome
      -- individual chromosomes assigned to centers

  16. Draft sequence finished in 2001
      • Public consortium: International Human Genome Sequencing Consortium (2001). "Initial sequencing and analysis of the human genome." Nature 409: 860-921.
      • Celera: Venter, J.C., et al. (2001). "The sequence of the human genome." Science 291: 1304-1351.

  17. When has a genome been fully sequenced?
      A typical goal is to obtain five- to ten-fold coverage.
      Finished sequence: a clone insert is contiguously sequenced to a high quality standard, with an error rate of 0.01%. There are usually no gaps in the sequence.
      Draft sequence: clone sequences may contain several regions separated by gaps. The true order and orientation of the pieces may not be known.
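
Fold coverage is the total number of sequenced bases divided by the genome size. The sketch below computes it and, as an addition not taken from the slides, uses the common Poisson approximation (as in the Lander-Waterman model) to estimate the fraction of bases that no read covers. The read length of 500 bp and the read count are assumed example values.

```python
import math


def fold_coverage(n_reads: int, read_len_bp: int, genome_size_bp: float) -> float:
    """Average number of times each base is sequenced."""
    return n_reads * read_len_bp / genome_size_bp


def expected_fraction_uncovered(coverage: float) -> float:
    """Poisson estimate of the fraction of bases covered by no read.

    Assumes reads start at uniformly random positions, which real
    libraries only approximate.
    """
    return math.exp(-coverage)


GENOME_SIZE_BP = 3e9   # approximate human genome size
READ_LEN_BP = 500      # assumed read length, for illustration only

c = fold_coverage(30_000_000, READ_LEN_BP, GENOME_SIZE_BP)
print(f"coverage: {c:.1f}x")

for cov in (1, 5, 10):
    print(f"{cov:2d}x coverage -> {expected_fraction_uncovered(cov):.5%} uncovered")
```

Under these assumptions, five-fold coverage still leaves well under 1% of bases unsequenced on average, and ten-fold coverage leaves far less, which suggests why five- to ten-fold is a typical target.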

  18. Main conclusions of human genome project
      • There are about 30,000 to 40,000 human genes.
      • This number (from 2001) is far smaller than earlier estimates.
      • The public consortium estimated 31,000, while Celera estimated 38,500.
      • But note:
        • Many predicted genes were unique to each group
        • There are many transcripts of unknown function
      • More recently (2004), the estimate has been revised to 20,000 to 25,000 genes.

  19. Main conclusions of human genome project
      1. We have about the same number of genes as fish and plants, and not that many more genes than worms and flies.
      • Fugu rubripes (pufferfish): 20,000 to 25,000
      • Arabidopsis thaliana (thale cress): 26,000
      • Caenorhabditis elegans (worm): 19,000
      • Drosophila melanogaster (fly): 13,000

  20. Main conclusions of human genome project
      2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes. Vertebrates have a more complex mixture of protein domain architectures. Additionally, the human genome displays greater complexity in its processing of mRNA transcripts by alternative splicing.

  21. Main conclusions of human genome project
      3. Hundreds of human genes were acquired from bacteria by lateral gene transfer (LGT), according to the initial report.
      Evidence: compare the proteomes of human, fly, worm, yeast, Arabidopsis, eukaryotic parasites, and all completed prokaryotic genomes, and find the genes shared exclusively by humans and bacteria. However, according to TIGR, only about 40 of these genes (or fewer?) were acquired by LGT (see Salzberg et al., Science 292:1903, 2001).
      Reasons for artifactually high estimates include:
      -- gene loss
      -- small sample size of species

  22. Main conclusions of human genome project
      4. 98% of the genome does not code for genes.
      >50% of the genome consists of repetitive DNA derived from transposable elements (also called interspersed repeats):
      • LINEs (20%)
      • SINEs (13%)
      • LTR retrotransposons (8%)
      • DNA transposons (3%)
      There has been a decline in activity of some of these elements in the human lineage.

  23. Main conclusions of human genome project
      5. Segmental duplication is a frequent occurrence in the human genome.
      -- tandem duplications (rare)
      -- retrotransposition (intronless paralogs)
      -- segmental duplications (common)

  24. Main conclusions of human genome project 6. There are 300,000 Alu repeats in the human genome. These are about 300 base pairs and contain an AluI restriction enzyme site. They occupy 3% of the genome. Their distribution is non-random: they are retained in GC-rich regions and may confer some benefit.

  25. Main conclusions of human genome project
      7. The mutation rate is about twice as high in male meiosis as in female meiosis. Most mutation probably occurs in males.

  26. Main conclusions of human genome project
      8. More than 1.4 million single nucleotide polymorphisms (SNPs; single base pair changes) were identified. Celera initially identified 2.1 million SNPs. Currently, dbSNP at NCBI (build 125) has about 10.4 million human SNPs (4.9 million validated refSNPs).
      A SNP occurs every 100 to 300 base pairs. Fewer than 1% of SNPs alter protein sequence.
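
As a rough check on the "every 100 to 300 base pairs" figure, the SNP counts quoted on this slide can be converted into an average spacing along a ~3 × 10^9 bp genome. This is back-of-the-envelope arithmetic only: it assumes SNPs are spread evenly, which they are not, and the dictionary labels are made up for this example.

```python
GENOME_SIZE_BP = 3e9  # approximate human genome size

# SNP counts as quoted on the slide.
snp_counts = {
    "initial report (>1.4 million)": 1.4e6,
    "Celera (initial)": 2.1e6,
    "dbSNP build 125": 10.4e6,
}


def mean_spacing_bp(genome_size_bp: float, n_snps: float) -> float:
    """Average distance between catalogued SNPs, assuming even spacing."""
    return genome_size_bp / n_snps


spacing = {
    label: mean_spacing_bp(GENOME_SIZE_BP, n) for label, n in snp_counts.items()
}

for label, bp in spacing.items():
    print(f"{label:32s} one SNP per ~{bp:,.0f} bp")
```

Only the dbSNP build 125 total comes out inside the 100 to 300 bp range; the 2001 catalogues were much sparser.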

  27. Other conclusions
      Other conclusions with respect to:
      • Long-range variation in GC content
      • CpG islands
      • Comparison of genetic and physical distance
      • The repeat content of the human genome
      • The gene content of the human genome
      • Noncoding RNA
      More about this during the presentations on CpG islands, SNPs and miRNA in weeks 3 and 7!
