TwinScan Annotation of the Laccaria Sequences & Annotation of Genes in the Signaling Pathways
Michael Muratet*, Sébastien Duplessis†, Gopi Podila*
2nd Laccaria Genome Meeting, Gent, Belgium, October 14, 2005
* University of Alabama in Huntsville
† Centre INRA de Nancy
Overview
• TwinScan Annotation
  • TwinScan Theory & Application
  • Annotation Process
  • Summary of Results
• Annotation of Genes in the Signaling Pathways
  • Target List
  • Annotation Processes
  • Summary of Results
TwinScan Theory & Application
• Combines the hidden Markov model (HMM) probability model of Genscan with a probability model for a ‘conservation sequence’
• Training files are available for C. elegans, A. thaliana, C. neoformans and H. sapiens
  • Of these, L. bicolor is most similar to C. neoformans
• The conservation sequence is created from BLAST alignments to related species (‘informants’)
  • Accuracy is relatively insensitive to BLAST parameters (alignments of 30 bp at 66% identity are required)
  • Does not attempt global alignments (unlike Rosetta or CEM)
• Approximately 60% more sensitive than Genscan, but still only 25% correct
• Requirements
  • ~1 GByte memory per 1 Mbase of sequence
  • Perl
Korf, I., Flicek, P., Duan, D., Brent, M.R. (2001). “Integrating genomic homology into gene structure prediction”, Bioinformatics, 17 (Suppl 1):S140-S148.
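The conservation-sequence idea can be illustrated with a short sketch. This is a simplified stand-in, not conseq.pl: it assumes ungapped HSPs given as a 0-based target start plus the aligned target and informant strings, keeps only HSPs of at least 30 bp at 66% identity, and uses '|' (match), ':' (mismatch) and '.' (unaligned) as an assumed symbol alphabet. Where HSPs overlap, a match from any HSP wins over a mismatch.

```python
from dataclasses import dataclass

MIN_LEN = 30          # minimum HSP length quoted on the slide (30 bp)
MIN_IDENTITY = 0.66   # minimum identity quoted on the slide (66%)


@dataclass
class Hsp:
    start: int        # 0-based start on the target (Laccaria) sequence
    target: str       # aligned target bases (ungapped HSPs assumed)
    informant: str    # aligned informant bases, same length as target


def identity(h: Hsp) -> float:
    """Fraction of aligned positions where target and informant agree."""
    same = sum(a == b for a, b in zip(h.target, h.informant))
    return same / len(h.target)


def conservation_sequence(target_len: int, hsps: list[Hsp]) -> str:
    """One symbol per target base: '|' match, ':' mismatch, '.' unaligned (assumed alphabet)."""
    cons = ["."] * target_len
    for h in hsps:
        if len(h.target) < MIN_LEN or identity(h) < MIN_IDENTITY:
            continue  # too short or too divergent to contribute
        for offset, (a, b) in enumerate(zip(h.target, h.informant)):
            pos = h.start + offset
            if pos >= target_len:
                break
            # a match from any HSP takes precedence over a mismatch
            if cons[pos] != "|":
                cons[pos] = "|" if a == b else ":"
    return "".join(cons)
```

For example, a 40 bp HSP starting at position 2 of a 50 bp target, with one mismatch at its sixth base, yields two leading dots, five '|', one ':', 34 '|' and eight trailing dots; a 20 bp HSP alone leaves the target entirely '.'.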
TwinScan Annotation Process
• “Informant” database: C. neoformans genome (www.sequence.stanford.edu/Group/c.neoformans/download.html), formatted with xdformat
• Laccaria scaffolds searched against the informant database with (wu)blastn, using:
  M=1 N=-1 -nogap Q=5 R=1 S=35 S2=35 W=10 X=30 B=1000 Y=Z=300000000
• BLAST results → conseq.pl → ‘conservation’ sequences
• Laccaria scaffolds + conservation sequences → iscan → annotation results
• Annotation results → process_zoe → GenBank / FASTA / EMBL
Note: no ESTs were used!
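The flowchart is a small dependency graph, so one way to check a re-run is to encode it as data and let a topological sort produce a valid build order. This sketch uses Python's standard graphlib; the node names are hypothetical labels for the flowchart boxes, not file names or commands, and the process_zoe step follows the flowchart reading given above.

```python
from graphlib import TopologicalSorter

# Each data product maps to the products it is built from; the comment
# names the tool shown on the slide for that step.
PIPELINE = {
    "informant_db": {"c_neoformans_genome"},                     # xdformat
    "blast_results": {"informant_db", "laccaria_scaffolds"},     # (wu)blastn
    "conservation_seqs": {"blast_results"},                      # conseq.pl
    "annotation": {"laccaria_scaffolds", "conservation_seqs"},   # iscan
    "genbank_fasta_embl": {"annotation"},                        # process_zoe
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the data products can be built."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for product in build_order(PIPELINE):
        print(product)
```

The two raw inputs (genome and scaffolds) have no predecessors, so they may appear in either order; everything downstream is forced by the edges.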
TwinScan Output
```
# iscan
# Date: Sun Jul 24 23:58:02 2005
# Twinscan version 2.02 build 20041011CW
# Genome Parameters: TwinScan/parameters/crypto_iscan-1208-genes-09-15-2003.zhmm
# Conservation Parameters: TwinScan/parameters/crypto_iscan-1208-genes-09-15-2003.zhmm
# Target Sequence: >scaffold_9 1418118
# Target Sequence Read... 1418118bp C+G = 47.8490%
# Conservation Sequence: >Informant database(s): -
# This is the 1-th best path.
# Score: 122959
scaffold_9.fa iscan stop_codon 1033 1035 . - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
scaffold_9.fa iscan CDS 1036 1317 159 - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
scaffold_9.fa iscan CDS 1358 1444 87 - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
scaffold_9.fa iscan start_codon 1442 1444 . - 0 gene_id "scaffold_9.fa.001"; transcript_id "scaffold_9.fa.001.1";
scaffold_9.fa iscan start_codon 8013 8015 . + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
scaffold_9.fa iscan CDS 8013 8040 110 + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
scaffold_9.fa iscan CDS 8091 8146 93 + 2 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
scaffold_9.fa iscan stop_codon 8147 8149 . + 0 gene_id "scaffold_9.fa.002"; transcript_id "scaffold_9.fa.002.1";
```
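The feature lines follow the GTF layout (seqname, source, feature, start, end, score, strand, frame, attributes), so per-transcript coding lengths can be recovered with a few lines of Python. This is a sketch, not part of the TwinScan distribution; it splits on any whitespace so it also accepts the space-separated lines as reproduced here, relying on the fact that only the ninth (attribute) field contains spaces.

```python
from collections import defaultdict


def cds_lengths(gtf_lines):
    """Sum CDS lengths per transcript_id from iscan/GTF feature lines.

    GTF coordinates are 1-based and inclusive, so a feature spans
    end - start + 1 bases.
    """
    lengths = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and iscan header comments
        fields = line.split(None, 8)  # at most 9 fields; attributes keep their spaces
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # only coding segments contribute to the length
        start, end = int(fields[3]), int(fields[4])
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, value = item.split(" ", 1)
                attrs[key] = value.strip().strip('"')
        lengths[attrs["transcript_id"]] += end - start + 1
    return dict(lengths)
```

Applied to the eight feature lines above, this gives 369 nt for scaffold_9.fa.001.1 (282 + 87) and 84 nt for scaffold_9.fa.002.1 (28 + 56). Both are multiples of 3 (123 and 28 codons), which is a useful sanity check since the stop codon is reported as a separate feature.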
Summary of TwinScan Results
• 18,429 genes predicted
• Max length 12,878 nt
• Min length 63 nt
• Avg length 986.7 nt 945.4
Matches to ESTs
• About 1,500 GENO and INRA ESTs have no match in the TwinScan predictions
Signaling Protein Search List
• GTP-binding proteins and related enzymes
  • G protein-coupled receptors (GPCR)
  • Heterotrimeric G-protein α (Gpα), β and γ subunits
  • Monomeric G-proteins of the Ras small GTPase superfamily
    • Ras type (& Sos/Grb2 systems)
    • Rho type, Rab type, Arf and Kir/Rem/Rad subfamilies, & nuclear GTPase Ran
• 14-3-3 proteins
• Second messengers (generation of phosphoinositides, PIP2/IP3; diacylglycerol, DAG; Ca2+; cAMP; …)
  • Adenylate / guanylate cyclases (AC)
  • Phospholipases (phospholipase C, PLC; PLA2 and PLD)
  • Phosphodiesterases (PDE)
  • Calmodulin (CaM)
• Kinases
  • Histidine kinase (HK) and response regulator (RR)
  • PDPK (proline-directed protein kinases, Ser-Pro & Thr-Pro)
    • MAPKs (mitogen-activated protein kinases: MAPKKK, MAPKK, MAPK)
    • SAPKs (stress-activated protein kinases)
    • DYRKs (dual-specificity tyrosine-phosphorylation-regulated kinases)
    • CDKs (cyclin-dependent kinases)
  • Non-PDPK
    • PKA (cAMP-dependent PK)
    • PKC (Ca2+/phospholipid-dependent PK)
    • CaMKII
• Ser/Thr phosphatases
  • PP2A
  • PP2B (calcineurin), and others?
  • PP2C
  • Other PPases…?
• & Others…
  • Ca2+ channels and transporters
UAH Signaling Gene Annotation Process
• Source protein sequences from:
  • GPCR Database http://www.gpcr.org/7tm/
  • Protein Kinase Resource http://www.kinasenet.org/pkr/
  • NCBI
• Laccaria scaffolds → xdformat → BLAST database
• Source protein sequences searched against the BLAST database with TBLASTN → BLAST hit SQL database
• TwinScan annotation → SQL database
• Both SQL databases → find overlapping HSPs
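The last step, finding TBLASTN HSPs that overlap TwinScan gene models, is an interval-overlap join: two closed intervals [a, b] and [c, d] on the same scaffold overlap exactly when a <= d and b >= c. A minimal sketch with Python's built-in sqlite3 module follows; the two-table schema and column names are hypothetical (the slide does not give one), and HSP coordinates are assumed to be normalized so that start <= stop even for minus-strand hits.

```python
import sqlite3

# Hypothetical schema: one row per TwinScan CDS and one row per TBLASTN HSP.
SCHEMA = """
CREATE TABLE cds (transcript_id TEXT, scaffold TEXT, start INTEGER, stop INTEGER);
CREATE TABLE hsp (query_id TEXT, scaffold TEXT, start INTEGER, stop INTEGER, evalue REAL);
"""

# Closed intervals overlap iff each one starts before the other one ends.
OVERLAP_QUERY = """
SELECT DISTINCT h.query_id, c.transcript_id
FROM hsp AS h
JOIN cds AS c
  ON h.scaffold = c.scaffold
 AND h.start <= c.stop
 AND h.stop >= c.start
ORDER BY h.query_id, c.transcript_id
"""


def overlapping_hits(cds_rows, hsp_rows):
    """Return (query_id, transcript_id) pairs whose intervals overlap."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO cds VALUES (?, ?, ?, ?)", cds_rows)
        con.executemany("INSERT INTO hsp VALUES (?, ?, ?, ?, ?)", hsp_rows)
        return con.execute(OVERLAP_QUERY).fetchall()
    finally:
        con.close()
```

For a full genome an index on (scaffold, start) would help, but the join condition stays the same.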
INRA Annotation Process
• Genes were selected by BLASTP searches against L. bicolor Eugene v00.2 using signaling protein sequences from:
  • Ustilago maydis
  • Magnaporthe grisea
  • Phanerochaete chrysosporium
  • and, in some cases:
    • Pisolithus microcarpus
    • Suillus bovinus
    • Candida albicans
    • Tuber borchii
    • Botrytis cinerea
    • Neurospora crassa
    • Aspergillus fumigatus
• Homologs were selected on score and e-value, requiring >50% identity and >65% similarity (positives) over a sufficient fraction of the initial protein's length
• For a given function, the first hit listed below usually had an e-value of 0.0 or lower than e-70, with >85% identity
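The selection rule above can be expressed as a small filter. This is a sketch under stated assumptions: the record layout is hypothetical, the identity (>50%) and positives (>65%) thresholds come from the slide, and `min_coverage` is a hypothetical stand-in for "a sufficient portion length", with 0.5 as an assumed default rather than the value INRA used.

```python
from dataclasses import dataclass


@dataclass
class BlastpHit:
    subject: str          # Eugene v00.2 gene model hit by the query protein
    score: float          # bit score
    evalue: float
    identity_pct: float   # % identities
    positives_pct: float  # % positives (similarity)
    aligned_len: int      # query residues covered by the alignment
    query_len: int        # length of the initial (source) protein


def keep_hit(hit: BlastpHit, min_identity: float = 50.0,
             min_positives: float = 65.0, min_coverage: float = 0.5) -> bool:
    """Apply the slide's identity/positives thresholds plus an assumed coverage cut."""
    coverage = hit.aligned_len / hit.query_len
    return (hit.identity_pct > min_identity
            and hit.positives_pct > min_positives
            and coverage >= min_coverage)


def rank_hits(hits: list[BlastpHit], **thresholds) -> list[BlastpHit]:
    """Keep passing hits, best (lowest) e-value first, ties broken by higher score."""
    kept = [h for h in hits if keep_hit(h, **thresholds)]
    return sorted(kept, key=lambda h: (h.evalue, -h.score))
```

Keeping the coverage cut as a parameter makes it easy to see how sensitive the candidate list is to that unstated choice, e.g. `rank_hits(hits, min_coverage=0.3)`.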
Signaling Genes versus Eugene v00.2
• Adenylate / guanylate cyclases
  • AC => 1 gene, scaffold_5_scaff.724, and 2 AC-like genes, scaffold_5_scaff.707 + scaffold_11_scaff.274
• Phospholipases
  • PI-specific PLC => 1 gene, scaffold_6_scaff.206 + scaffold_6_scaff.209 & scaffold_6_scaff.209
  • PLD => scaffold_40_scaff.98 + scaffold_40_scaff.81
  • PLD, Phox-like => scaffold_3_scaff.869
• 14-3-3 proteins
  • 14-3-3 => scaffold_3_scaff.430
• Small G-protein Ras
  • Ras (P. microcarpus ras and Ras1p S. bovinus) => scaffold_11_scaff.210 & scaffold_11_scaff.195 ; scaffold_11_scaff.196 & scaffold_11_scaff.185 ; scaffold_11_scaff.186
  • Ras (Ras2p S. bovinus) => scaffold_47_scaff.86 + scaffold_96_scaff.10 + scaffold_1_scaff.1164 + scaffold_47_scaff.137
• Heterotrimeric GTP-binding proteins
  • Gp-a (Gpa1 U. maydis) => scaffold_60_scaff_87 + scaffold_87_scaff_24 + scaffold_31_scaff_121 + scaffold_31_scaff_112 + scaffold_31_scaff_166 + scaffold_31_scaff_149 + scaffold_31_scaff_179 + scaffold_31_scaff_155
  • Gpa2 U. maydis => scaffold_57_scaff_31
  • Gpa3 U. maydis => scaffold_38_scaff_18 & scaffold_38_scaff_19 + scaffold_47_scaff_101
  • Gpa4 U. maydis => only very weak hits
  • Gp-b => scaffold_1_scaff_681 ; scaffold_10_scaff_255
  • Gp-g => scaffold_2_scaff_833 & scaffold_2_scaff_834 (e-values and scores were very poor, but % identity and % positives were very high, and the β-subunit anchoring site was present in the sequence)
Signaling Genes versus Eugene v00.2 (cont.)
• Phosphatases
  • PP2A => scaffold_8_scaff_112
  • PP2B / calcineurin (Ca2+-dependent Ser/Thr PPase) => scaffold_25_scaff_97
  • PP2C => only very weak hits (e-10)
• Kinases
  • Protein kinase A (PKA) / cAMP-dependent PK => scaffold_4_scaff_881
  • Protein kinase C (PKC) => scaffold_3_scaff_687
  • Two-component histidine kinase => scaffold_34_scaff_64
• MAP kinases
  • MAPK (Pmk1 M. grisea & Kpp6, Ubc3, Kpp2 U. maydis) => scaffold_12_scaff_76 + scaffold_12_scaff_321 + scaffold_40_scaff_65 + scaffold_5_scaff_402
  • MAPK (Kpp4 U. maydis) => scaffold_2_scaff_982
  • MAPK (Ubc1 & Ubc2 U. maydis) => only very weak hits
• MAP kinase kinases
  • MAPKK (Ste7/Ste11 & Fuz7 U. maydis) => scaffold_3_scaff_317 + scaffold_36_scaff_81
Signaling Genes versus TwinScan Summary
[Summary chart; categories: G protein, GPCR, kinase]
Summary and Conclusions
• A TwinScan gene prediction for the Laccaria bicolor scaffolds, based on Cryptococcus neoformans, has been completed
  • The number of genes lies midway between Eugene v00.1 and v00.2
  • It does not account for all of the EST data (i.e., some genes are missing)
  • No further TwinScan work is planned for the Laccaria genome project
• A list of candidate signaling gene families has been prepared, and annotation is in progress
  • Results will be collated and merged into EMBL records
Acknowledgements
• Sébastien Duplessis
• Jan Wuyts
• Francis Martin
• Pierre Rouzé
• Gopi Podila
• NSF US Western Europe Cooperative Research Grant