
Now-16.12.04



Presentation Transcript


  1. Now-16.12.04

  2. Genome based views of organisms. Genomics: READING genome sequences (carry out dideoxy sequencing, Sanger); ASSEMBLY of the sequence (connect seqs. to make whole chromosomes); ANNOTATION of the sequence (find the genes!)

  3. Reading: Shotgun DNA sequencing of whole genome (WGS). DNA target sample → SHEAR → LIGATE & CLONE (into vector) → SEQUENCE (from primer) → Reads
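
A minimal sketch of the shear-and-read step, assuming fragments are cut at uniformly random positions and that one read is taken from the start of each cloned fragment. The function name `shotgun_reads`, the toy genome and all sizes are illustrative assumptions, not values from the slides.

```python
import random

def shotgun_reads(genome, n_fragments, frag_len, read_len, seed=0):
    """Randomly 'shear' a genome string and read the start of each fragment.

    Returns a list of (start_position, read) tuples. The start position is
    kept only so the simulation can be checked; a real sequencer does not
    report it.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_fragments):
        # Pick a random fragment of fixed length (hypothetical size selection).
        start = rng.randrange(0, len(genome) - frag_len + 1)
        fragment = genome[start:start + frag_len]
        # Sequence only the first read_len bases of the cloned insert.
        reads.append((start, fragment[:read_len]))
    return reads

toy_genome = "ACGT" * 50  # 200 bp toy target, purely illustrative
for start, read in shotgun_reads(toy_genome, n_fragments=3, frag_len=40, read_len=20):
    print(start, read)
```

Because the fragments start at random positions, the reads overlap one another, which is what the assembly step later exploits.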

  4. Reading: DNA samples in bacteria are picked, processed & sequenced robotically, many thousands per day in large sequencing centers. http://itgmv1.fzk.de/www/itg/uetz/robot_movie.html

  5. Reading: We depend on Sanger dideoxy sequencing (Movie MCB0701_loop)

  6. Reading: We depend on Sanger dideoxy sequencing

  7. Reading: We depend on Sanger dideoxy-dye sequencing

  8. Reading: We depend on Sanger dideoxy-dye sequencing

  9. Reading: We depend on Sanger dideoxy-dye sequencing

  10. Reading: Dideoxy sequencing done robotically: high throughput

  11. Genomics: READING genome sequences (robotically do dideoxy-dye data collection); ASSEMBLY of the sequence (connect seqs. to make whole chromosomes); ANNOTATION of the sequence

  12. Reading to Assembly:

  13. Assembly: Central steps of the assembly. End Reads (Mates), Primer. Smith-Waterman type alignment, >96%:

            ......................................ctgccaatgggc 12
                                                  ||||||||||||
        401 caatcggccatggtggcatcgatcgcggcggcgttggactgccaatgggc 450

         13 gctggcacgatgcccggcggtgtgggcggtggaccggggggcgtggccgg 62
            ||||||||||||||||||||||||||||||||||||||||||||||||||
        451 gctggcacgatgcccggcggtgtgggcggtggaccggggggcgtggccgg 500

         63 tggaggaccccaggtgggagtcggtccacccgggagcggtaacggtggca 112
            ||||||||||||||||||||||||||||||||||||||||||||||||||
        501 tggaggaccccaggtgggagtcggtccacccgggagcggtaacggtggca 550
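
The slide names a "Smith-Waterman type alignment" for joining reads. The sketch below is a plain textbook Smith-Waterman local alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions for illustration; the slides do not say which scoring scheme or software the assembler actually used.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(x, y):
    """Percentage of aligned columns in which both sequences carry the same base."""
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)

# Positions 401-450 of the longer sequence and the first 12 bases of the read,
# both copied from the alignment on the slide.
contig = "caatcggccatggtggcatcgatcgcggcggcgttggactgccaatgggc"
read_start = "ctgccaatgggc"
score, top, bottom = smith_waterman(contig, read_start)
print(score, percent_identity(top, bottom))
```

An assembler would accept an overlap like this only if the identity clears a threshold; the slide's ">96%" is presumably that kind of cutoff.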

  14. Assembly: Human repeats make assembly difficult. 50% of the human genome is (dead) transposons/mobile elements!

  15. Assembly: Human repeats make assembly difficult
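
One way to see why repeats hurt: any stretch of sequence that occurs more than once gives reads that overlap equally well at several places. The sketch below simply counts substrings of length `k` that occur more than once. The helper name `repeated_kmers` and the toy sequence are hypothetical.

```python
from collections import Counter

def repeated_kmers(seq, k):
    """Return {kmer: count} for every length-k substring seen more than once."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}

# Toy sequence in which "ACGT" appears twice; a 4-base read "ACGT"
# could be placed at either copy.
print(repeated_kmers("ACGTTTACGT", 4))
```

With real reads the same ambiguity arises whenever a repeat is longer than the read, which motivates the pair-mate strategy on the next slides.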

  16. Assembly: using “pair-mate” reads to connect contigs

  17. Assembly: HUMAN shotgun DNA sequencing with pair-mates. DNA target sample → SHEAR → SIZE SELECT → LIGATE & CLONE (into vector) → SEQUENCE (from primer) → End Reads (Mates). Libraries: 2,000 base pair; 10,000 base pair; 150,000 base pair
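
The idea of connecting contigs with pair-mates can be sketched as follows: if the two end reads of one clone land in different contigs, those contigs must lie within roughly one insert length of each other. The function `link_contigs`, the `min_links` threshold and the toy read names are assumptions for illustration, not the method of any particular assembler.

```python
from collections import Counter

def link_contigs(read_to_contig, mate_pairs, min_links=2):
    """Count mate pairs that bridge two different contigs.

    read_to_contig: dict mapping read name -> contig name (placed reads only)
    mate_pairs:     iterable of (read_a, read_b) from the same clone
    Returns {(contig_x, contig_y): n_links} for pairs with >= min_links links.
    """
    links = Counter()
    for left, right in mate_pairs:
        c1, c2 = read_to_contig.get(left), read_to_contig.get(right)
        if c1 is None or c2 is None or c1 == c2:
            continue  # unplaced read, or both mates inside one contig
        links[tuple(sorted((c1, c2)))] += 1
    # Require several independent links before trusting a join.
    return {pair: n for pair, n in links.items() if n >= min_links}

placement = {"r1": "A", "r2": "B", "r3": "A", "r4": "B", "r5": "A", "r6": "C"}
pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
print(link_contigs(placement, pairs))
```

Requiring more than one link is a common-sense guard against chimeric clones; libraries of different insert sizes, as on this slide, let links span repeats of different lengths.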

  18. Assembly: Each nucleotide sequenced many times. Assembly Progression (Macro View)
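
"Sequenced many times" is usually quantified as coverage: total bases read divided by genome length. Under the standard simplifying assumption that reads start at uniformly random positions (the Lander-Waterman model), the expected fraction of bases never sequenced is about e^(-coverage). The read count and read length below are hypothetical numbers, not figures from the slides.

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_len

def fraction_uncovered(c):
    """Expected fraction of bases with zero reads at coverage c (Poisson model)."""
    return math.exp(-c)

c = coverage(n_reads=1000, read_len=500, genome_len=100_000)  # hypothetical project
print(c, fraction_uncovered(c))
```

Even at several-fold coverage some bases stay unread, which is one reason gaps remain for the closure step.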

  19. Assembly: Labour intensive “Genome closure”

  20. Assembly: The challenge of eukaryotic genomes. E. coli genome: 4 million bp. The human genome: 3 billion bp; 50% of genome is repeat sequences!

  21. Assembly: There is a second strategy for genome sequencing. All new projects now do the WGS (whole genome shotgun) approach. But many genomes were done in the past by ordered clone based sequencing. As a user of genome databases, you’ll find many sequenced organisms whose data reflect the ordered clone approach.

  22. Assembly: Strategy II: Ordered clone based sequencing (~300,000 bp)

  23. Assembly: Strategy II: Ordered clone based sequencing

  24. Assembly: Strategy II: Ordered clone based sequencing

  25. Assembly: The two alternative genome sequencing strategies: 1. Ordered clone approach 2. Whole Genome Shotgun approach

  26. Annotation: Genomics: READING genome sequences (robotically do dideoxy-dye data collection); ASSEMBLY of the sequence (whole genome shotgun OR ordered clones); ANNOTATION of the sequence (find the genes!)

  27. Annotation: Genomics: READING genome sequences; ASSEMBLY of the sequence; ANNOTATION of the sequence (find the genes!): • ab initio • by evidence

  28. Annotation: For bacterial genomes, ab initio is adequate. ab initio: “from the beginning”, “something from nothing”, from first principles… ORFs are MOST of the prokaryotic genome

  29. Annotation: ab initio – finding ORFs • 85-88% of the nucleotides are associated with coding sequence in the bacterial genomes that have been completely sequenced. • Example: in Escherichia coli there are 4288 genes that have an average of 950 bp of coding sequence and are separated by an average of just 118 bp. So first, to find genes in prokaryotic DNA, search for ORFs!
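
As a rough consistency check on the E. coli numbers on this slide, assume every gene contributes its average 950 bp of coding sequence plus one average 118 bp spacer, and ignore overlaps and non-protein-coding genes (a simplification, not a claim from the slide):

```latex
\text{coding fraction} \approx \frac{950}{950 + 118} = \frac{950}{1068} \approx 0.89,
\qquad
4288 \times 1068\ \text{bp} = 4{,}579{,}584\ \text{bp} \approx 4.6\ \text{million bp}
```

Both results sit close to the slide's figures: a coding fraction just above the quoted 85-88% range, and a total length in line with an E. coli genome of roughly 4 to 5 million bp.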

  30. Annotation: ab initio – finding ORFs. Scanning prokaryotic DNA for ORFs (figure legend: stop codons; start codons (methionine))
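
The scan for start and stop codons can be sketched as a six-frame ORF search. This is a minimal illustration, assuming ATG as the only start codon and the three standard stop codons; real prokaryotic gene finders also accept alternative starts such as GTG. The names `find_orfs` and `revcomp` and the default `min_codons=60` are illustrative choices.

```python
STOPS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(dna, min_codons=60):
    """Find ATG...stop ORFs in all six reading frames.

    Returns a list of (strand, frame, start, end, sequence). Coordinates are
    0-based on the strand that was scanned, and end is exclusive and
    includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start since the last stop
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3  # stop codon not counted
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, seq[start:i + 3]))
                    start = None
    return orfs

toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"  # one short ORF in frame 2
print(find_orfs(toy, min_codons=3))
```

The length cutoff matters because short ATG-to-stop stretches occur by chance in random sequence; raising `min_codons` trades missed short genes for fewer false positives.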

  31. Annotation: ab initio – finding ORFs Human and Yeast codon usage

  32. Annotation: ab initio – evaluating ORFs Scanning Prokaryotic Codons for Usage Profile (Same gene [LacZ] as seen in previous slide)
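
Evaluating an ORF by codon usage can be sketched as: build a codon frequency table from known genes, then score a candidate by the average log-frequency of its codons. The function names, the `floor` value for unseen codons and the toy reference are assumptions for illustration; the slides do not specify the scoring formula behind the usage profile.

```python
import math
from collections import Counter

def split_codons(orf):
    """Split an in-frame sequence into complete codons, dropping any remainder."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]

def codon_usage(orf):
    """Relative frequency of each codon in the sequence."""
    codons = split_codons(orf)
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}

def usage_score(orf, reference, floor=1e-4):
    """Mean log-frequency of the ORF's codons under a reference usage table.

    Codons absent from the reference get the small pseudo-frequency `floor`.
    Higher (less negative) scores mean more typical codon usage.
    """
    codons = split_codons(orf)
    if not codons:
        return 0.0
    return sum(math.log(reference.get(c, floor)) for c in codons) / len(codons)

reference = codon_usage("ATGGCTGCTTAA")  # toy stand-in for a table built from known genes
print(usage_score("GCTGCT", reference), usage_score("CCCCCC", reference))
```

In practice the reference table would come from many confirmed genes of the same organism, since codon preferences differ between species.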

  33. Annotation: ab initio – beyond ORFs • Prokaryotes have short, simple promoters that are easy to recognize. • Transcriptional terminators often consist of short inverted repeats followed by a run of Ts. • Therefore, programs that find prokaryotic genes search for: ORFs 60 or more codons long, and codon usage; promoters at the 5' end; terminators at the 3' end; homology to known genes from other prokaryotes; Shine-Dalgarno sequences
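
One of the signals in this list, the Shine-Dalgarno sequence, can be sketched as a motif search just upstream of a candidate start codon. The consensus `AGGAGG` is the commonly cited core motif; the 20 bp window, the mismatch-counting score and the function name `best_sd_match` are assumptions for illustration rather than the rule any named gene finder uses.

```python
def best_sd_match(dna, start, motif="AGGAGG", window=20):
    """Best ungapped match to a Shine-Dalgarno-like motif upstream of `start`.

    Scans the `window` bases immediately before the start codon and returns
    (matching_bases, position) for the best-scoring placement, or (0, None)
    if the window is shorter than the motif or nothing matches.
    """
    offset = max(0, start - window)
    upstream = dna[offset:start].upper()
    best = (0, None)
    for i in range(len(upstream) - len(motif) + 1):
        score = sum(1 for a, b in zip(upstream[i:i + len(motif)], motif) if a == b)
        if score > best[0]:
            best = (score, offset + i)
    return best

toy = "TTTTAGGAGGTTTTTTATGAAA"  # motif at 4-9, ATG at position 16
print(best_sd_match(toy, start=toy.index("ATG")))
```

A candidate start with a strong upstream match is more plausible than one without, so the score can be combined with ORF length and codon usage.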

  34. Annotation: ab initio – automated prokaryotic gene finder examples: Glimmer (Interpolated Markov Model method); GrailII (Neural Network method) (See BioInfo text, Fig 8.8)

  35. Annotation: results

  36. Annotation: Multicellular eukaryotes

  37. Annotation: Multicellular eukaryotes

  38. Annotation: Multicellular eukaryotes

  39. Annotation: 2 ways to annotate eukaryotic genomes: • ab initio gene finders: work on basic biological principles: open reading frames, codon usage, consensus splice sites, Met start codons… • Genes based on previous knowledge (EVIDENCE): cDNA sequence of the gene’s message; cDNA of a closely related gene’s message sequence; protein sequence of the known gene (same gene’s; same gene’s from another species; related gene’s protein…)

  40. BLAST: versions of the program, [t]BLAST[x/n/p]. t: Translate a DNA database in all 6 reading frames for comparison with a protein query. x: Translate a nucleotide query in all 6 reading frames for comparison with a protein database. p: Comparison is against a protein database. n: Comparison is against a nucleotide database.

  41. BLAST: versions of the program, [t]BLAST[x/n/p]. t: Translate a DNA database in all 6 reading frames for comparison with a protein query. x: Translate a nucleotide query in all 6 reading frames for comparison with a protein database. p: Comparison is against a protein database. n: Comparison is against a nucleotide database. Examples: tBLASTn; BLASTn (nucleotide vs. nucleotide)
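
The "translate in all 6 reading frames" step behind the t and x variants can be sketched as below. This is an illustration of the translation only, not of BLAST itself. It assumes the standard genetic code; `six_frame` and `translate` are illustrative names, and `X` marks any codon containing a non-ACGT character.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def six_frame(dna):
    """Return {'+1','+2','+3','-1','-2','-3': protein} for a DNA string."""
    dna = dna.upper()
    rc = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]  # reverse complement
    return {
        f"{sign}{frame + 1}": translate(strand[frame:])
        for sign, strand in (("+", dna), ("-", rc))
        for frame in range(3)
    }

for name, protein in six_frame("ATGGCCTAA").items():
    print(name, protein)
```

Comparing at the protein level this way lets a search find coding regions whose DNA has diverged but whose amino acid sequence is conserved.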

  42. Annotation: Evidence based. A Single Gene of Interest; Annotation; Hundreds of Gene Families; Thousands of Genes; Millions of Blast Hits; Tens of Scaffolds; Thousands of Contigs; Millions of Reads

  43. Annotation: Evidence based. Human Genome, Y2K-2000: “What was annotated (initial pass)?” Primary: Repetitive DNAs; tRNAs; CpG islands; Transcripts (including alternatives); Transcription factor binding sites. Secondary (Evidence) (from same species), Matches: ESTs; Full length cDNAs; Protein Similarity. BLAST programs shown: BLASTn, BLASTn, BLASTn, BLASTn, tBLASTn, tBLASTX

  44. Annotation: Evidence based. From human pipeline to autoannotation: ASSEMBLED SEQUENCE → Precomputes (Repeat Masker, BlastX, BlastN, tRNAScan, CpG; Gene finders: Genscan, fgenesH, Grail-exp; databases as listed on the slide: nraa, CHGI, CRGI, dBEST (subset), Various mRNAs, dBSTS, mouse fragments) → AUTOANNOTATION → Feature Clustering and Resolution

  45. Annotation: Evidence based. Transcription Unit

  46. Start and stop site predictions; unique identifiers; splice site predictions; homology based exon predictions; computational exon predictions; tracking information; consensus gene structure (both strands)
