Introduction to Genomic Sequencing, Assembly, Annotation (Diego Martinez)



Presentation Transcript


  1. Introduction to Genomic Sequencing, Assembly, Annotation. Diego Martinez

  2. Why do we want to know the sequence of an entire genome? To know all the genes, then proteins, then pathways. We can understand the biochemistry of the organism. We can understand diseases. Evolution. Regulation: know all the upstream/downstream regions where proteins bind and control transcription.

  3. Convinced? Good. Let's sequence! • 2 ways • Map (Public) • Several long steps – tiling • Expensive because they generated a complete map • Whole Genome Shotgun (Private) • Direct/cuts out some steps, missed 103 genes • Repeats! • Synthesis approach

  4. Generate Data

  5. Core principles • BAC-end sequences allow one to know physical association of sequence • More coverage leads to better sequence • Better algorithms make it easier • Longer sequence reads are critical
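The "more coverage leads to better sequence" principle can be made concrete with a small calculation. The sketch below is not from the slides: it assumes the idealized Lander-Waterman model (reads placed uniformly at random), under which the expected fraction of the genome left unsequenced at coverage c is e^(-c). The genome size and read length are hypothetical values chosen only for illustration.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(cov: float) -> float:
    """Expected fraction of bases hit by no read (idealized random-read model)."""
    return math.exp(-cov)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical inputs (assumptions, not values from the presentation).
GENOME_SIZE = 30_000_000  # 30 Mb genome
READ_LENGTH = 700         # bases per read

for target in (1, 3, 6, 10):
    n = reads_needed(target, READ_LENGTH, GENOME_SIZE)
    gap = expected_uncovered_fraction(target)
    print(f"{target:>2}x coverage: {n:>7} reads, ~{gap:.4%} of bases expected uncovered")
```

Under this model, longer reads reduce the number of reads needed for the same coverage, which is one reason read length matters; the model ignores repeats and cloning bias, so real projects see more gaps than it predicts.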

  6. Figure 2.5: Relationships of chromosomes to genome sequencing markers. Sixteen overlapping clones represent 1,408 BACs needed to span the 163 Mb X chromosome (avg. insert 146 kb).
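The figures on this slide (163 Mb chromosome, 146 kb average insert, 1,408 BACs) allow a quick check of how much overlap the tiling path carries. This is a rough sketch that assumes every BAC has exactly the average insert size, which is a simplification; the function names are illustrative only.

```python
import math


def min_clones(chromosome_bp: int, insert_bp: int) -> int:
    """Fewest clones that could span the chromosome with zero overlap."""
    return math.ceil(chromosome_bp / insert_bp)


def tiling_redundancy(num_clones: int, insert_bp: int, chromosome_bp: int) -> float:
    """Total cloned bases divided by chromosome length."""
    return num_clones * insert_bp / chromosome_bp


# Values taken from the slide.
CHROMOSOME_BP = 163_000_000  # 163 Mb X chromosome
INSERT_BP = 146_000          # average BAC insert, 146 kb
NUM_BACS = 1_408             # BACs in the tiling path

lower_bound = min_clones(CHROMOSOME_BP, INSERT_BP)
redundancy = tiling_redundancy(NUM_BACS, INSERT_BP, CHROMOSOME_BP)

print(f"Minimum clones with no overlap: {lower_bound}")
print(f"Clones actually used:           {NUM_BACS}")
print(f"Redundancy of the tiling path:  {redundancy:.2f}x")
print(f"Share of cloned bases that is overlap: {1 - 1 / redundancy:.1%}")
```

The gap between the no-overlap lower bound and the 1,408 clones actually used reflects the overlap that neighbouring BACs need in order to be joined into a map.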

  7. Assembly: put it all back together. Assemblers: BAC assemblies use Phrap (Phil Green, UW); Celera WGS uses the Celera Assembler. Both find overlaps of the same sequence, build contiguous regions (contigs), and put contigs together using paired-end information: order and orient them into large scaffolds (also called supercontigs). Whole Genome Shotgun: automated, without tiling. Finishing.
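The "find overlaps, build contigs" step can be illustrated with a toy greedy merger. This is a minimal sketch only, not how Phrap or the Celera Assembler work internally: it assumes error-free reads on one strand, requires exact suffix-prefix matches, and does not model paired-end scaffolding, repeats, or reads contained inside other reads. The example reads are made up.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up reads that tile a short sequence.
example_reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]
print(greedy_assemble(example_reads))
```

A greedy merger like this is exactly what repeats defeat: two copies of a repeat produce equally good overlaps, so the merge order can join regions that are not adjacent in the genome. Paired-end information is what lets real assemblers order and orient contigs across such ambiguities.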

  8. Problems even the map-based approach couldn't fix

  9. Which method worked best? • WGS failed with highly repetitive regions • WGS, however, reduced overall workload for sequencing • Use hybrid approach • WGS used for 6-fold coverage • Reduced number of BACs needed to sequence by 93%

  10. Annotation. Need to make it usable, and fun! What is annotation? The act or process of furnishing critical commentary or explanatory notes. Find sequence features in the genome: genes (focus here), pseudogenes, repeats, regulatory elements (very difficult, still in its infancy). Attempt to describe gene function.

  11. Figure 2.6: Alternative splicing (NADPH oxidase, H+ channel)

  12. Table 2.1 How annotation can be used to infer/understand biological niche

  13. Example of annotation - What is a gene?

  14. Functional Annotation – What does the protein do? • Found genes • Basic approach – by similarity to a known protein • Old style – best BLAST hit • Can lead to funny, incorrect annotations
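The "best BLAST hit" style of annotation amounts to copying the description of the top-scoring database match onto each predicted gene. The sketch below shows that logic on hand-written records; it does not call BLAST, and the `Hit` type, the gene names, the scores, and the E-value cutoff are all hypothetical values invented for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit for a predicted gene (hypothetical record layout)."""
    query: str
    subject_description: str
    evalue: float
    bitscore: float


def best_hit_annotation(hits: list[Hit], max_evalue: float = 1e-5) -> dict[str, str]:
    """Assign each query the description of its highest-scoring acceptable hit."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # too weak to trust at all
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject_description for query, hit in best.items()}


# Invented example hits (not real search results).
example_hits = [
    Hit("gene_1", "manganese peroxidase", 1e-90, 300.0),
    Hit("gene_1", "lignin peroxidase", 1e-120, 410.0),
    Hit("gene_2", "hypothetical protein", 1e-20, 95.0),
    Hit("gene_3", "flagellar motor protein", 0.5, 28.0),
]

for gene, description in best_hit_annotation(example_hits).items():
    print(f"{gene}: {description}")
```

Note what this approach does with `gene_2`: it faithfully propagates "hypothetical protein", and it would just as faithfully propagate a wrong label if the top database entry were itself mis-annotated. That blind transfer is the source of the funny, incorrect annotations mentioned on the slide.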

  15. Funny examples

  16. Critical residues/multiple sequence alignment(lysozyme)

  17. Gene Family Expansion

  18. Signaling Pathways

  19. Phanerochaete chrysosporium • Degrades lignin • 30 million base pairs (30 Mb) • This was the first basidiomycete sequenced, so gene finding was a big challenge • Estimated 11,777 genes
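The two numbers on this slide (a 30 Mb genome and an estimated 11,777 genes) give a rough gene density. This is simple arithmetic on the slide's figures; it says nothing about actual gene lengths or intergenic spacing, and the function names are illustrative.

```python
def bases_per_gene(genome_bp: int, gene_count: int) -> float:
    """Average number of genome bases per predicted gene."""
    return genome_bp / gene_count


def genes_per_megabase(genome_bp: int, gene_count: int) -> float:
    """Average number of predicted genes per million bases."""
    return gene_count / (genome_bp / 1_000_000)


# Values taken from the slide.
GENOME_BP = 30_000_000
GENE_COUNT = 11_777

print(f"About {bases_per_gene(GENOME_BP, GENE_COUNT):,.0f} bp of genome per gene")
print(f"About {genes_per_megabase(GENOME_BP, GENE_COUNT):.0f} genes per Mb")
```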

  20. Genome Facts!

  21. http://genome.jgi-psf.org/Phchr1/Phchr1.info.html

  22. Phylogenies of Genes

  23. Genome Evolution and RIP
