Introduction to Genomic Sequencing, Assembly, Annotation
Diego Martinez
Why do we want to know the sequence of an entire genome?
• To know all the genes, then the proteins, then the pathways
• To understand the biochemistry of the organism
• To understand diseases
• Evolution
• Regulation: know all the upstream/downstream regions where proteins bind to control transcription
Convinced? Good. Let's sequence!
• Two ways:
• Map-based (public): several long steps (tiling); expensive because a complete map had to be generated
• Whole Genome Shotgun (private): more direct and cuts out some steps, but missed 103 genes; repeats cause trouble!
• Synthesis approach
Core principles
• BAC (bacterial artificial chromosome) end sequences reveal which pieces of sequence are physically linked
• More coverage leads to better sequence
• Better algorithms make assembly easier
• Longer sequence reads are critical
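Why are longer reads critical? A read that is shorter than a repeat cannot tell you which copy of the repeat it came from. The toy Python sketch below (an illustration, not any assembler's method) uses a hypothetical 32 bp sequence containing two copies of an 8 bp repeat, treats every substring of a given length as an error-free read, and reports the fraction of reads that occur exactly once. The function name and sequence are made up for the example.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read occurs exactly once in the genome.

    Every substring of length read_len is treated as an error-free,
    forward-strand read (a deliberately idealized assumption).
    """
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    unique = sum(1 for r in reads if counts[r] == 1)
    return unique / len(reads)


# Hypothetical toy genome: the 8 bp repeat "GATTACAG" appears twice.
repeat = "GATTACAG"
genome = "ACGTTC" + repeat + "TTGCA" + repeat + "CCATG"

for k in (4, 8, 9):
    print(k, round(unique_fraction(genome, k), 2))
```

With these inputs, reads of length 9 (one base longer than the repeat) all map uniquely, while shorter reads do not.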
Figure 2.5 Relationships of chromosomes to genome sequencing markers. Sixteen overlapping clones represent the 1,408 BACs (average insert 146 kb) needed to span the 163 Mb X chromosome.
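The caption's numbers imply that the BACs overlap: 1,408 inserts of about 146 kb add up to more than 163 Mb. A quick arithmetic check in Python, using only the caption's figures. The mean-overlap line rests on an assumption the caption does not state, namely that the BACs form one linear tiling path.

```python
# Figures quoted in the Figure 2.5 caption.
n_bacs = 1_408
avg_insert_kb = 146
chromosome_mb = 163

# Total length of all BAC inserts, in Mb.
total_cloned_mb = n_bacs * avg_insert_kb / 1_000
# How many times over the BAC inserts cover the chromosome.
redundancy = total_cloned_mb / chromosome_mb

# Assumption (not in the caption): the BACs form a single linear tiling path,
# so the excess length is spread over the n_bacs - 1 junctions between neighbours.
avg_overlap_kb = (total_cloned_mb - chromosome_mb) * 1_000 / (n_bacs - 1)

print(f"total insert length: {total_cloned_mb:.1f} Mb")
print(f"redundancy: {redundancy:.2f}x")
print(f"implied mean overlap: {avg_overlap_kb:.1f} kb")
```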
Assembly: put it all back together
• BAC assemblies: Phrap (Phil Green, UW)
• Celera WGS: Celera Assembler
• Both: find overlaps of identical sequence and build contiguous regions (contigs), then put the contigs together using paired-end information, ordering and orienting them into large scaffolds (also called supercontigs)
• Whole Genome Shotgun: automated, without tiling
• Finishing
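Phrap and the Celera Assembler are far more sophisticated (base quality scores, repeat handling, mate pairs), but the core "find overlaps, build contigs" step can be sketched with a textbook greedy overlap merge. This is a swapped-in toy technique, not either assembler's algorithm. It assumes error-free reads from one strand; the function names, the minimum overlap of 3 bp, and the example reads are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap into one contig."""
    contigs = list(reads)
    while True:
        # Drop exact duplicates, then any sequence contained inside another one.
        contigs = list(dict.fromkeys(contigs))
        contigs = [c for c in contigs if not any(c != d and c in d for d in contigs)]

        # Find the ordered pair (i, j) with the longest suffix(i)/prefix(j) overlap.
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j

        if best_len == 0:
            return contigs  # no overlaps left: the remaining sequences are the contigs

        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads tiling a 19 bp sequence with 7 bp overlaps.
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but is exactly what repeats break: if two reads overlap only because they share a repeat, the greedy choice joins unrelated regions, which is one reason real assemblers also use paired-end information.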
Which method worked best?
• WGS failed with highly repetitive regions
• WGS, however, reduced the overall sequencing workload
• Use a hybrid approach:
• WGS used for 6-fold coverage
• Reduced the number of BACs that needed to be sequenced by 93%
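Why does 6-fold WGS coverage go a long way? A standard way to reason about it is the Lander-Waterman model (not named on the slide): if reads land uniformly at random at coverage c, the expected fraction of bases covered by no read is e^(-c), and with N reads the expected number of contigs is about N e^(-c) when any overlap is enough to join two reads. A minimal Python sketch follows; the genome size and read length are hypothetical, and the model ignores repeats, which is exactly where WGS struggled.

```python
import math


def uncovered_fraction(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases hit by no read: e^(-c)."""
    return math.exp(-coverage)


def expected_contigs(genome_len: int, read_len: int, coverage: float) -> float:
    """Expected number of contigs, N * e^(-c), assuming any overlap suffices to join reads."""
    n_reads = coverage * genome_len / read_len
    return n_reads * math.exp(-coverage)


# Hypothetical genome size and read length, chosen only for illustration.
genome_len = 100_000_000
read_len = 500
for c in (1, 3, 6, 10):
    print(f"{c:>2}x: uncovered {uncovered_fraction(c):.4%}, "
          f"~{expected_contigs(genome_len, read_len, c):,.0f} contigs")
```

At 6x the model predicts roughly 0.25% of bases left uncovered (e^-6 is about 0.0025).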
Annotation
Need to make it usable, and fun!
What is annotation? "The act or process of furnishing critical commentary or explanatory notes."
• Find sequence features in the genome:
• Genes (the focus here)
• Pseudogenes
• Repeats
• Regulatory elements (very difficult, still in its infancy)
• Attempt to describe gene function
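The simplest possible illustration of "find genes" is an open reading frame (ORF) scan: look for a start codon (ATG) followed in the same frame by a stop codon. Real eukaryotic gene finders model introns and splice sites and are far more involved; this Python sketch is only a naive stand-in. It checks the forward strand only, ignores introns, and its function name and minimum length are arbitrary assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs (ATG ... stop) in all three frames.

    end is exclusive and includes the stop codon; min_codons counts the ATG
    but not the stop. ATGs that fall inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence with one short ORF: ATG AAA TTT GGG TAA.
print(find_orfs("CCATGAAATTTGGGTAACC"))
```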
Figure 2.6 Alternative splicing (NADPH oxidase; H+ channel)
Table 2.1 How annotation can be used to infer and understand an organism's biological niche
Functional Annotation: What does the protein do?
• Genes have been found; now assign a function to each
• Basic approach: similarity to known proteins
• Old style: best BLAST hit
• Can lead to funny, incorrect annotations
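The "old style" best-hit approach can be sketched in a few lines of Python. The sketch reads BLAST+ tabular output (-outfmt 6, whose default 12 columns are listed below), keeps the highest-bitscore hit per query that passes an e-value cutoff, and copies that subject's description. The cutoff, the fallback label, and all IDs and descriptions in the example are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_annotation(blast_tsv: str, descriptions: dict[str, str],
                        max_evalue: float = 1e-10) -> dict[str, str]:
    """Copy the description of each query's highest-bitscore hit ("best BLAST hit").

    Hits with e-value above max_evalue are ignored; queries left with no hit
    are labelled "hypothetical protein". descriptions maps subject IDs to
    their existing (and possibly wrong) annotation.
    """
    best: dict[str, dict[str, str]] = {}
    queries: list[str] = []
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        q = row["qseqid"]
        if q not in queries:
            queries.append(q)
        if float(row["evalue"]) > max_evalue:
            continue
        if q not in best or float(row["bitscore"]) > float(best[q]["bitscore"]):
            best[q] = row
    return {
        q: descriptions.get(best[q]["sseqid"], "unknown") if q in best else "hypothetical protein"
        for q in queries
    }


# Hypothetical BLAST output and database descriptions.
blast_tsv = (
    "gene1\tP001\t45.00\t300\t160\t3\t1\t300\t1\t300\t1e-50\t200\n"
    "gene1\tP002\t60.00\t280\t108\t2\t1\t280\t1\t280\t1e-60\t250\n"
    "gene2\tP003\t25.00\t90\t65\t2\t10\t99\t5\t94\t1e-3\t40\n"
)
descriptions = {"P001": "dehydrogenase", "P002": "peroxidase", "P003": "kinase"}
print(best_hit_annotation(blast_tsv, descriptions))
```

Whatever text is attached to the best hit is copied verbatim, so an error in the database entry propagates to the new genome; that is how the funny, incorrect annotations arise.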
Phanerochaete chrysosporium
• Degrades lignin
• 30 million base pairs (30 Mb)
• This was the first basidiomycete sequenced, so gene finding was a big challenge
• Estimated 11,777 genes