Human Genome Sequence and Variability

Human Genome Sequence and Variability. Gabor T. Marth, D.Sc. Department of Biology, Boston College marth@bc.edu. Medical Genomics Course – Debrecen, Hungary, May 2006. Lecture overview. 1. Genome sequencing strategies, sequencing informatics.


Presentation Transcript


  1. Human Genome Sequence and Variability Gabor T. Marth, D.Sc. Department of Biology, Boston College marth@bc.edu Medical Genomics Course – Debrecen, Hungary, May 2006

  2. Lecture overview: (1) Genome sequencing strategies, sequencing informatics; (2) Genome annotation: functional and structural features in the human genome; (3) Genome variability: DNA nucleotide, structural, and epigenetic variations

  3. 1. The Human genome sequence

  4. The nuclear genome (chromosomes)

  5. The genome sequence • the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)

  6. Completed genomes (figure; genome sizes shown: ~1 Mb, ~100 Mb, >100 Mb, ~3,000 Mb)

  7. Main genome sequencing strategies: clone-based shotgun sequencing (Human Genome Project); whole-genome shotgun sequencing (Celera Genomics, Inc.)

  8. Hierarchical genome sequencing: BAC library construction; clone mapping; shotgun subclone library construction; sequencing; sequence reconstruction (sequence assembly) (Lander et al. Nature 2001)

  9. Clone mapping – “sequence ready” map

  10. Hierarchical genome sequencing: BAC library construction; clone mapping; shotgun subclone library construction; sequencing/read processing; sequence reconstruction (sequence assembly) (Lander et al. Nature 2001)

  11. Shotgun subclone library construction (figure labels: cloning vector, BAC primary clone, subclone insert, sequencing vector)

  12. Hierarchical genome sequencing: BAC library construction; clone mapping; shotgun subclone library construction; sequencing/read processing; sequence reconstruction (sequence assembly) (Lander et al. Nature 2001)

  13. Sequencing

  14. Robotic automation (Lander et al. Nature 2001)

  15. Base calling (PHRED): example call base = A, quality Q = 40
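The quality value on this slide can be read through the standard Phred definition, Q = -10 · log10(P), where P is the estimated probability that the base call is wrong. The sketch below is only an illustration of that formula; the function names are made up for this example and are not part of PHRED itself.

```python
import math

def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# The slide's example call: base = A, Q = 40
p_error = phred_to_error_prob(40)
print(f"Q = 40 -> error probability of about {p_error:.4f}")
```

Under this definition, Q = 40 corresponds to roughly one expected miscall in 10,000 bases, and every 10 points of Q lowers the error probability tenfold.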

  16. Vector clipping

  17. Hierarchical genome sequencing: BAC library construction; clone mapping; shotgun subclone library construction; sequencing/read processing; sequence reconstruction (sequence assembly) (Lander et al. Nature 2001)

  18. Sequence assembly (PHRAP)
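To make the idea of sequence reconstruction concrete, here is a toy greedy overlap merger. It is a deliberately simplified sketch and not PHRAP's algorithm: it assumes error-free reads from a single strand, ignores base qualities, and uses exact suffix/prefix matches only. The function names and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left; remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three overlapping reads taken from the first 20 bases of the slide 23 sequence
reads = ["AGCGTGGTAGC", "GGTAGCGCGAG", "GCGAGTTTG"]
print(greedy_assemble(reads))  # ['AGCGTGGTAGCGCGAGTTTG']
```

A greedy merger like this joins reads wherever the overlap is longest, so two copies of a repeat can be collapsed or mis-joined, which is one way repetitive DNA confuses assembly.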

  19. Repetitive DNA may confuse assembly

  20. Sequence completion (finishing): closing gaps and resolving regions of low sequence coverage and/or quality. Tools: CONSED, AUTOFINISH

  21. 2. Human genome annotation

  22. Genome annotation – Goals: repetitive elements, protein coding genes, RNA genes, GC content
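Of the annotation goals listed, GC content is the simplest to compute directly from sequence. A minimal sketch follows, assuming the sequence is a plain string; the function name is hypothetical, and symbols other than A, C, G, T (such as N or X placeholders) are excluded from the denominator.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0

print(gc_content("ATGGCACCACCG"))  # 8 of 12 bases are G or C
```

In practice this is usually evaluated in windows along a chromosome rather than once for a whole sequence, so that local differences in GC content become visible.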

  23. The starting material AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

  24. Coding genes – ab initio predictions. Sequence features: start codon, Open Reading Frame (ORF), stop codon, polyA signal. Example sequence: ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA
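The ORF idea on this slide can be sketched as a scan for a start codon (ATG) followed in the same reading frame by a stop codon (TAA, TAG, or TGA). This is an illustration only, assuming forward-strand scanning and no introns; real ab initio gene finders use far richer statistical models, and the function name and `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, orf) tuples for ATG...stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, seq[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

# The example sequence from the slide
slide_seq = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
print(find_orfs(slide_seq))  # [(0, 27, 'ATGGCACCACCGATGTCTACGTGGTAG')]
```

On the slide's sequence the scan finds one ORF: it opens at the leading ATG and closes at the in-frame TAG nine codons later.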

  25. Ab initio predictions Gene structure

  26. Ab initio predictions: splice donor and splice acceptor sites. Example sequence: …AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…

  27. Ab initio prediction programs: Genscan, Grail, Genie, GeneFinder, Glimmer, etc. cDNA/EST-to-genome alignment programs: EST_genome, Sim4, Spidey, EXALIN

  28. Homology based predictions: genes predicted by homology to a known coding sequence from another organism or to an expressed sequence. Sequence fragments shown: ACGGAAGTCT, GGACTATAAA; genomic sequence: ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA. Programs: Genomescan, Twinscan, etc.

  29. Consolidation – gene prediction systems (figure labels: Sim4, dbEST, Genewise, Grail, Genscan, FgenesH, Ensembl, Otto)

  30. ncRNA genes: prediction is based on structure (e.g. tRNAs); for other novel ncRNAs, only homology-based predictions have been successful

  31. Repeat annotations: based on sequence similarity to known repetitive elements in a repeat sequence library

  32. The landscape of the human genome

  33. Gene annotations – # of coding genes (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  34. Gene annotations – gene length (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  35. Gene annotations – gene function (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  36. GC content and coding potential (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  37. ncRNAs (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  38. Segmental duplications (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  39. Repeat elements (Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001)

  40. Genes and repeats

  41. Physical vs. genetic map (Mb/cM). Figure values: genetic distances 0.4 cM, 1.3 cM, 0.7 cM; physical distances 0.4 Mb, 0.7 Mb, 0.3 Mb

  42. 3. Human genome variability

  43. DNA sequence variations • the reference human genome sequence is about 99.9% identical to the genome of any individual human • sequence variations make our genetic makeup unique • the most abundant human variations are single-nucleotide polymorphisms (SNPs); about 10 million SNPs are currently known
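As a rough back-of-the-envelope check of what 99.9% means in absolute terms, the figure can be combined with the ~3,000 Mb genome size quoted earlier in the lecture. This sketch assumes the 0.1% difference is spread over single base positions; the function name is hypothetical.

```python
def expected_differences(genome_size_bp, identity):
    """Approximate number of differing positions for a given fractional identity."""
    return round(genome_size_bp * (1 - identity))

genome_size_bp = 3_000_000_000  # ~3,000 Mb, the size given for the human genome above
print(expected_differences(genome_size_bp, 0.999))  # 3000000
```

So a 0.1% difference still amounts to about 3 million positions at which an individual's genome differs from the reference.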

  44. DNA sequence variations: insertion-deletion (INDEL) polymorphisms

  45. Structural variations Speicher & Carter, NRG 2005

  46. Structural variations (Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767)

  47. Detection of structural variants (Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767)

  48. Epigenetic changes: chromatin structure Sproul, NRG 2005

  49. Epigenetic changes: DNA methylation Laird, NRC 2003
