Explore the vast complexity and structure of the human genome, from genes to telomeres, and the revolutionary methods used to map and sequence it. Learn about the Human Genome Project (HGP) and Celera approaches, physical mapping, sequencing technologies like shotgun sequencing, and the challenges and breakthroughs in deciphering our genetic code.
Human Genome Contents: 3200 Mb • Genes and gene-related sequences: 1200 Mb • Genes: 48 Mb • Related sequences: 1152 Mb (pseudogenes, gene fragments, introns) • Intergenic DNA: 2000 Mb • Interspersed repeats: 1400 Mb • Microsatellites (short tandem repeats): 90 Mb • Telomeres: end sequences • Centromeres • Single Nucleotide Polymorphisms
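To put the sizes on this slide in proportion, the short sketch below converts each figure to a percentage of the 3200 Mb total. The numbers are copied from the slide. The dictionary labels and the helper name `percent_of_genome` are illustrative, and the reading of the 1200 Mb entry as the sum of the 48 Mb and 1152 Mb entries is an assumption based on the arithmetic.

```python
# Sizes in megabases (Mb), copied from the slide.
GENOME_MB = 3200

# Labels are illustrative; the 1200 Mb entry is read here as the
# sum of the 48 Mb and 1152 Mb entries (an assumption).
components_mb = {
    "genes and gene-related sequences": 1200,
    "genes": 48,
    "gene-related sequences": 1152,
    "intergenic DNA": 2000,
    "interspersed repeats": 1400,
    "microsatellites": 90,
}


def percent_of_genome(size_mb, genome_mb=GENOME_MB):
    """Return size_mb as a percentage of the whole genome."""
    return 100.0 * size_mb / genome_mb


for name, size in components_mb.items():
    print(f"{name}: {size} Mb = {percent_of_genome(size):.1f}% of the genome")
```

On these figures the 48 Mb of genes is only a small percentage of the genome, while intergenic DNA makes up well over half.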
Chromosomes • Shorter than DNA they contain • Histones: DNA binding proteins • Two Copies held together by centromeres • Telomere: Terminal region • Two humans differ by 0.1%
Donors • HGP: • Opportunity advertised near labs • First come; First Taken • 5-10 samples for every one used • No link between donor and sample • Celera: 5 subjects (three men; two women) • One Asian; One African-American; One Hispanic; Two Caucasians • Craig Venter
Basic Technology • Physical Mapping • Cloning • Shotgun Sequencing • Computational Sequence Reassembly
STS • High Resolution, Rapid, Simple • 100 - 500 bp • Collection of overlapping fragments • Each point represented multiple times in random fragments • Sequence must be known • Unique in chromosome under study
Physical Mapping • A set of clone fragments whose position relative to each other is known • Restriction Maps: Relative locations of Restriction Sites • Fluorescent in situ hybridization (FISH): Marker locations mapped by hybridizing probe to chromosomes • Sequence Tagged Sites (STS): Positions of short sequences mapped by PCR or hybridization analysis of genome fragments • Expressed Sequence Tags (EST): short sequences from cDNA clones
Genome cut into fragments and cloned as a library in a vector (red)
Hybridisation mapping: (1) pick clones into a grid; (2) hybridise to probe 1; (3) hybridise to probe 2; (4) build contigs. In this case, two clones hybridised to both probes and thus they are predicted to overlap. Those hybridising to only one probe are predicted to extend out to the left or right.
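The contig-building step can be sketched in a few lines: treat two clones as overlapping when they hybridise to at least one common probe, then group clones that are linked by such overlaps. This is a minimal illustration and not the software used by any mapping centre; the clone names, probe names, and the functions `predict_overlaps` and `build_contigs` are all hypothetical.

```python
from itertools import combinations


def predict_overlaps(hits):
    """hits maps clone name -> set of probes it hybridised to.

    Returns the clone pairs that share at least one probe.
    """
    pairs = []
    for a, b in combinations(sorted(hits), 2):
        if hits[a] & hits[b]:
            pairs.append((a, b))
    return pairs


def build_contigs(hits):
    """Group clones into contigs (connected sets of overlapping clones)."""
    parent = {clone: clone for clone in hits}

    def find(clone):
        # Follow parent links to the representative of the group.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in predict_overlaps(hits):
        parent[find(a)] = find(b)

    groups = {}
    for clone in hits:
        groups.setdefault(find(clone), []).append(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical grid results: which probes lit up each clone.
hits = {
    "cloneA": {"probe1"},            # extends out on one side
    "cloneB": {"probe1", "probe2"},  # hybridised to both probes
    "cloneC": {"probe1", "probe2"},  # hybridised to both probes
    "cloneD": {"probe2"},            # extends out on the other side
    "cloneE": set(),                 # no signal, stays unplaced
}

print(build_contigs(hits))
```

Here cloneA and cloneD share no probe with each other, but both are joined into one contig through cloneB and cloneC, which matches the situation described on the slide.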
Fingerprinting: Digest clones and run on gel • Overlap inferred from shared bands
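As a rough illustration of calling overlaps from shared bands, the sketch below counts restriction fragments of matching size between two clones, allowing a small sizing tolerance for gel measurement error. The band sizes, the 2% tolerance, the threshold of three shared bands, and the function names are all assumptions made for the example. A production tool would also need to weigh the chance of coincidental matches, which a raw count does not do.

```python
def shared_bands(bands_a, bands_b, tolerance=0.02):
    """Count bands of matching size; each band is matched at most once.

    Two bands match when their sizes differ by no more than
    `tolerance` (a fraction) of the first band's size.
    """
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del remaining[i]  # do not reuse this band
                break
    return shared


def predicted_to_overlap(bands_a, bands_b, min_shared=3, tolerance=0.02):
    """Call an overlap when enough bands are shared (illustrative cutoff)."""
    return shared_bands(bands_a, bands_b, tolerance) >= min_shared


# Hypothetical fragment sizes in base pairs, as read from a gel.
clone1 = [500, 1200, 2300, 4100, 7800]
clone2 = [1210, 2300, 4100, 9500]
clone3 = [300, 650, 15000]

print(shared_bands(clone1, clone2), predicted_to_overlap(clone1, clone2))
print(shared_bands(clone1, clone3), predicted_to_overlap(clone1, clone3))
```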
Assembly of Contiguous DNA Sequence • Shotgun Approach • Contigs: Result of joining overlapping sequences • Scaffold: Result of connecting contigs by filling in gaps • BAC: Bacterial artificial chromosome vector; inserts of 100-200 kb
Regional mapping • Minimal tiling path selected for sequencing
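One simple way to choose a minimal tiling path, once clone positions on the map are known, is the classic greedy interval-cover rule: from the current position, always take the clone that starts within the covered region and reaches furthest. The sketch below assumes each clone is given as a hypothetical (name, start, end) tuple in kb. It is an illustration of the idea and not the procedure used by any particular sequencing centre.

```python
def minimal_tiling_path(clones):
    """Pick a smallest set of clones covering the whole mapped region.

    clones: list of (name, start, end) tuples with start < end.
    Raises ValueError if the clones leave a gap in the region.
    """
    clones = sorted(clones, key=lambda c: c[1])
    target_end = max(c[2] for c in clones)
    covered_to = clones[0][1]
    path = []
    i = 0
    while covered_to < target_end:
        best = None
        # Among clones starting inside the covered region,
        # keep the one that extends furthest.
        while i < len(clones) and clones[i][1] <= covered_to:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical BAC positions in kb along one contig.
bacs = [
    ("A", 0, 150),
    ("B", 50, 200),
    ("C", 100, 260),
    ("D", 140, 300),
    ("E", 250, 400),
    ("F", 280, 420),
]

print(minimal_tiling_path(bacs))
```

Clones B, C and E are redundant in this example: every position they cover is also covered by A, D or F, so they do not need to be sequenced.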
Restriction fragment fingerprinting • BAC clones are grown in 96-well format • Hind III digest • 1% agarose • Molecular weight marker every 5th lane • [Gel image: fragments from >20 kbp down to ~300 bp]
Contig assembly: FPC* • Overlap identification by restriction pattern similarities • Facilitated contig assembly • All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones • [Figure: clones A-G; legend: insert fragments, vector fragments] • *Sanger Centre: C. Soderlund, I. Longden and R. Mott
BCM- HGSC
Shotgun Sequencing I: RANDOM PHASE • BAC Clone: 100-200 kb • Sheared DNA: 1.0-2.0 kb • Sequencing Templates • Random Reads
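How many random reads does a BAC need? A common back-of-envelope estimate, not stated on the slide, treats read starts as uniformly random (a Poisson idealisation), so that fold coverage is c = N x L / G and the expected fraction of bases left unsequenced is about e^(-c). The sketch below applies this to a 150 kb BAC. The 500-base read length and the 8-fold target are assumptions for illustration, and the function names are hypothetical.

```python
import math


def fold_coverage(num_reads, read_length, target_length):
    """Average number of times each base is sequenced: c = N * L / G."""
    return num_reads * read_length / target_length


def reads_for_coverage(coverage, read_length, target_length):
    """Number of reads needed to reach a given fold coverage."""
    return math.ceil(coverage * target_length / read_length)


def expected_fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, assuming random starts."""
    return math.exp(-coverage)


BAC_LENGTH = 150_000  # bases; within the 100-200 kb range on the slide
READ_LENGTH = 500     # bases; an assumed read length, not from the slide

n_reads = reads_for_coverage(8, READ_LENGTH, BAC_LENGTH)
missing = expected_fraction_uncovered(8) * BAC_LENGTH
print(f"{n_reads} reads for 8-fold coverage")
print(f"about {missing:.0f} bases expected to remain uncovered")
```

Under this idealisation, even 8-fold coverage is expected to leave some bases unsequenced, which is why a finishing phase follows the random phase.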
Shotgun Sequencing II: ASSEMBLY • Consensus • Single Stranded Region • Low Base Quality • Mis-Assembly (Inverted) • Sequence Gap
Shotgun Sequencing III: FINISHING • Consensus • Problem regions resolved in turn: Low Base Quality, Single Stranded Region, Sequence Gap, Mis-Assembly (Inverted)
Shotgun Sequencing III: FINISHING • High Accuracy Sequence: < 1 error / 10,000 bases
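The accuracy target of fewer than 1 error per 10,000 bases can be restated on the Phred quality scale, where Q = -10 x log10(error probability). The slide does not mention Phred scores, so this conversion is offered only as a familiar point of reference. The function names and the 150 kb example length are assumptions.

```python
import math


def phred_quality(error_probability):
    """Convert a per-base error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Convert a Phred-scaled quality back to an error probability."""
    return 10.0 ** (-quality / 10.0)


def expected_errors(sequence_length, per_base_error):
    """Expected number of wrong bases in a sequence of the given length."""
    return sequence_length * per_base_error


finished_error_rate = 1 / 10_000  # the target stated on the slide
print(f"Phred quality: {phred_quality(finished_error_rate):.0f}")
print(f"Errors in a 150 kb clone: {expected_errors(150_000, finished_error_rate):.0f}")
```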
Whole Genome Shotgun Sequencing • Whole Genome: 3,000 Mb • Sheared DNA: 1.0-2.0 kb • Sequencing Templates • Random Reads
Whole Genome Shotgun Sequencing: Assembly • Consensus • Single Stranded Region • Low Base Quality • Mis-Assembly (Inverted) • Sequence Gap
Whole Genome Shotgun Sequencing: Assembly • Consensus • Low Base Quality • Sequence Gap
Random fragmentation of genome produces good sampling of its sequence space. Overlaps are identified, and subassembly of sequence takes place after cloning into universal vector.
Assembled into contigs. Gaps and single-stranded regions identified for further study. Targeted for new sequencing. Double-Barreled: Both Strands.
Whole-Genome Shotgun Sequencing • Speed-up: Assembled Correctly? • Avoid up-front mapping • Huge amount of computer time to identify overlaps • Have to reference a map • Repeats are a problem: • Leave out sequence between repeats • Missing Reference End Sequence means Error
HGP • Isolate large fragments in BACs with framework of landmark-based physical map • Sequence on clone-by-clone basis • Time-Consuming subcloning of random fragments and physical mapping
Sequence Reassembly • Phrap • Shortest Covering Superstring • Map Assembly • Overlap: Finding overlapping fragments • Layout: ordering fragments • Consensus: Sequences from layout
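The overlap and layout steps can be illustrated with a toy greedy assembler: repeatedly find the pair of fragments with the longest suffix-to-prefix overlap and merge them. This is a well-known heuristic for the shortest covering superstring problem. It is not Phrap's actual algorithm, it assumes error-free reads from one strand (so no real consensus step is needed), and the function names and example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest overlap; returns the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    while True:
        # Drop reads wholly contained in another read.
        reads = [r for r in reads
                 if not any(r != s and r in s for s in reads)]
        # Overlap step: find the best-overlapping ordered pair.
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # nothing left to join
        # Layout step: join the pair, keeping the shared bases once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads)
                 if k not in (best_i, best_j)] + [merged]


# Three hypothetical error-free reads from a 14-base sequence.
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC"]
print(greedy_assemble(reads))
```

With real data the greedy choice can be misled by repeats, since two fragments from different copies of a repeat overlap just as well as true neighbours. That is the repeat problem noted for whole-genome shotgun sequencing.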