
Genome Annotation




  1. Genome Annotation
     Group 3: Cuong Nguyen, Deng Xin, Dongmei, Zheng Wang
     1) Introduction
     2) Methodology
     3) Selection of Tools
     4) Role of Group Members
     5) Project Time Frame

  2. Introduction
     Genome annotation:
     - Identifying useful elements in the genome
     - Attaching biological information to these elements
     Two ways to annotate:
     - Manual curation
     - Automated annotation
     In our genome annotation project, we use an automated method to annotate elements in the genome, applying both homology-based and ab initio approaches.

  3. Methodology
     Flowchart: Steps in our Annotation Project
     - Genome sequence
     - Genome browser search (NCBI, Ensembl, …)
     - Decision: contains known or predicted genes? (branches labeled yes / no)
     - Pseudogene finder (PPfinder)
     - Repeat-mask the sequence (RepeatMasker)
     - Homology search (FGENESH, BLAST, EST, cDNA, …)
     - Generate model for training
     - Visual integration of the results (CGView, …)
     - Evaluation
     - Ab initio prediction (FGENESB, Glimmer, GOana, …)

  4. Tools
     - Repeat sequences: RepeatMasker
     - Pseudogenes: PPfinder
     - Homology-based approaches: FGENESH, GOana, SGP2, Exonerate
     - Ab initio approaches: FGENESB, Glimmer, GeneMark, BProt, GrailEPX
     - Visualization: CGView, Apollo, Softberry

  5. Project team member roles:

  6. Project Time frame:

  7. Homology based approach--FGENESH
     Introduction
     - FGENESH is available both as a standalone program and as a web service.
     - It performs hidden Markov model (HMM)-based gene prediction.
     - We chose the web service because it is more convenient:
       http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind&advanced=on

  8. Homology based approach--FGENESH

  9. Homology based approach--FGENESH
     How to use it
     • Paste your assigned genomic sequence into the sequence window, or load the FASTA file.
     • Select the organism most similar to yours to assist in gene prediction.
     • Select the following "advanced options" from the list:
       - print mRNA sequences for predicted genes
       - print exon sequences for predicted genes
     • Click the "Search" button.
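Before pasting a sequence into the web form, it can help to confirm that the FASTA file holds the record you expect. The following is a minimal sketch, not part of FGENESH; the function names `read_fasta` and `unexpected_symbols` are hypothetical, and the check assumes the sequence should contain only IUPAC nucleotide codes.

```python
from typing import Dict, List, Set

# IUPAC nucleotide codes (assumption: anything else is worth a second look).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def read_fasta(text: str) -> Dict[str, str]:
    """Return {record name: sequence} for FASTA-formatted text."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after ">".
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def unexpected_symbols(sequence: str) -> Set[str]:
    """Return the characters that are not IUPAC nucleotide codes."""
    return set(sequence) - IUPAC_NUCLEOTIDES


if __name__ == "__main__":
    demo = ">scf7180000000377 demo record\nACGTACGT\nacgtnn\n"
    for record, seq in read_fasta(demo).items():
        print(record, len(seq), "bp", "unexpected:", sorted(unexpected_symbols(seq)))
```

Reporting the length up front makes it easy to compare against the sequence length that the prediction tools print in their own output.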

  10. Homology based approach--FGENESH • output

  11. Ab initio approach--GLIMMER
      GLIMMER (ver. 3.02; iterated) predictions:

      orfID      start    end  frame  score
      --------   -----  -----  -----  -----
      >scf7180000000377
      orf00003    2585   1557     -3  17.67
      orf00004    3184   2678     -2  13.87
      orf00006    4381   3320     -2  16.85
      orf00007    5943   4540     -1  17.93
      orf00010    6804   6001     -1  14.80
      orf00012    7301   6801     -3  11.90
      orf00013    7995   7306     -1   9.88
      orf00015    8278   8775     +1  15.89
      orf00016    8823   8939     +3   3.96
      orf00017   10219   8936     -2  13.29
      orf00018   10326  10589     +3  10.04
      orf00020   11655  10699     -1  10.11
      orf00021   12927  12265     -1  15.16
      orf00023   13791  13069     -1  17.54
      orf00025   14678  13857     -3  13.96
      orf00026   16005  14695     -1  13.37
      orf00029   17495  16005     -3  16.09
      orf00032   18683  17637     -3  18.61
      orf00033   19278  18838     -1  10.04
      orf00034   19798  20154     +1   8.68
      orf00035   20601  21221     +3  15.98
      orf00036   22130  21270     -3  11.58
      orf00037    1560  22237     -1  10.21

  12. GeneMark
      Parse predicted by GeneMark.hmm 2.4
      GeneMark.hmm PROKARYOTIC (Version 2.6r)
      Model organism: Escherichia_coli_K12
      Tue Oct 6 20:00:46 2009

      Predicted genes
      Gene  Strand  LeftEnd  RightEnd  Gene    Class
       #                               Length
       1    -          <1      1560     1560     1
       2    -        1557      2675     1119     1
       3    -        2678      3184      507     1
       4    -        3320      4381     1062     1
       5    -        4540      5943     1404     1
       6    -        6001      6804      804     1
       7    -        6801      7301      501     1
       8    -        7306      8091      786     1
       9    +        8293      8775      483     1
      10    -        8936     10219     1284     1
      11    +       10308     10589      282     1
      12    -       10699     11637      939     1
      13    -       12265     12951      687     1
      14    -       13069     13791      723     1
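For the evaluation step, one simple way to compare two predictors is to match genes by strand and stop position, then see whether the start also agrees. The sketch below does this for a few rows copied from the GLIMMER and GeneMark slides. It is an illustration, not the method used in the project: the names `classify` and `stop_position` are hypothetical, and genes that wrap around the origin are left out because the min/max conversion does not apply to them.

```python
from typing import Dict, List, Tuple

# GLIMMER rows as (start, end, strand); start is the start-codon side.
GLIMMER: List[Tuple[int, int, str]] = [
    (2585, 1557, "-"),
    (3184, 2678, "-"),
    (4381, 3320, "-"),
    (7995, 7306, "-"),
    (8278, 8775, "+"),
]

# GeneMark rows as (strand, left_end, right_end).
GENEMARK: List[Tuple[str, int, int]] = [
    ("-", 1557, 2675),
    ("-", 2678, 3184),
    ("-", 3320, 4381),
    ("-", 7306, 8091),
    ("+", 8293, 8775),
]


def stop_position(strand: str, left: int, right: int) -> int:
    """The stop codon sits at the right end on '+', at the left end on '-'."""
    return right if strand == "+" else left


def classify(glimmer, genemark) -> Dict[Tuple[int, int, str], str]:
    """Label each GLIMMER gene by how it agrees with a GeneMark gene."""
    by_stop = {(s, stop_position(s, l, r)): (l, r) for s, l, r in genemark}
    labels = {}
    for start, end, strand in glimmer:
        left, right = min(start, end), max(start, end)
        match = by_stop.get((strand, stop_position(strand, left, right)))
        if match is None:
            labels[(start, end, strand)] = "no match"
        elif match == (left, right):
            labels[(start, end, strand)] = "exact"
        else:
            labels[(start, end, strand)] = "same stop, different start"
    return labels


if __name__ == "__main__":
    for gene, label in classify(GLIMMER, GENEMARK).items():
        print(gene, label)
```

Matching on the stop position first is a common choice because the two tools tend to agree on where an open reading frame ends and disagree mainly on which start codon to pick.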

  13. Fgenesb
      Prediction of potential genes in microbial genomes
      Time: Tue Jan 1 00:00:00 2005
      Seq name: test sequence
      Length of sequence - 22260 bp
      Number of predicted genes - 25
      Number of transcription units - 18, operons - 3

      N   Tu/Op     Conserved     S         Start     End    Score
                    pairs(N/Pv)
      1   1 Op 1    .             -   CDS       3 -  1560      737
      2   1 Op 2    .             -   CDS    1557 -  2543      663
      3   2 Tu 1    .             +   CDS    2482 -  2691       77
      4   3 Tu 1    .             -   CDS    2678 -  3154      373
      5   4 Tu 1    .             -   CDS    3320 -  4381      739
      6   5 Op 1    .             -   CDS    4540 -  5943      757

  14. CGView – Circular Genome Viewer
      - Java-based tool
      - Runs from a web browser as a Java applet
      - Designed for circular genomes
      - Can be integrated into a bacterial genome annotation pipeline, and generates web content for data visualization
      - Supports many input formats, such as XML, tab-delimited files, or NCBI PTT files
      - Convenient to use with FgenesB

  15. CGView – Circular Genome Viewer

  16. Thank You!
