Project No. 7. Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource
Dahai Gai, Samuel Kalet, Hongbo Yang
May 16, 2001
[Figure: signal transduction to the nucleus. Labels: X, DNA replication, RNA synthesis, new mRNA, new protein]
RGS in G protein signaling regulation
• Gα-GTP: active form
• Gα-GDP: inactive form
Mechanism of RGS regulation
• RGS proteins increase the turnover rate of the intrinsic GTPase activity of Gα subunits. This raises the “off” rate and therefore decreases the signal.
RGS domain
• 9 helices
RGS Web
• Data sources: genomic data, structural data, expression data, interaction data
• RGSdb
• Informatics engine
• Simulation engine
• Visualization
• RGS-Web (interface)
Prediction of to-be-discovered RGS proteins (workflow)
• Known RGS protein sequences → ClustalW → HMM model → RGS HMM model
• Human genome (24 chromosomes) → Genscan → predicted genes
• Predicted genes + RGS HMM model → potential RGS proteins
• RGS subfamily protein sequences → ClustalW → HMM model → subfamily HMM models
• Potential RGS proteins + subfamily HMM models → subfamily allocation → function prediction and biological test
Gene finding: troubleshooting
• Problem: Genscan cannot handle sequences larger than 5 Mb.
  Solution: split each long sequence into multiple shorter sequences.
• Problem: if a split falls inside a gene, Genscan will miss that gene.
  Solution: overlap adjacent pieces by 10 kb.
• Problem: long runs of unknown bases (Ns) slow Genscan down.
  Solution: replace any run of Ns longer than 0.5 kb with 0.5 kb of Ns.
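The two preprocessing fixes above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the project's actual scripts: the function names `collapse_n_runs` and `split_with_overlap` are hypothetical, and the defaults simply restate the numbers on the slide (0.5 kb of Ns, 5 Mb pieces, 10 kb overlap).

```python
import re


def collapse_n_runs(seq, max_run=500):
    """Replace every run of N longer than max_run with exactly max_run Ns."""
    pattern = re.compile(r"N{%d,}" % (max_run + 1), flags=re.IGNORECASE)
    return pattern.sub("N" * max_run, seq)


def split_with_overlap(seq, chunk_size=5_000_000, overlap=10_000):
    """Yield (start, piece) pairs; consecutive pieces share `overlap` bases."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    start = 0
    while True:
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # this piece already reaches the end of the sequence
        start += step


# Small demonstration with toy sizes instead of 5 Mb / 10 kb.
toy = "ACGT" * 3 + "N" * 8 + "ACGT" * 3
cleaned = collapse_n_runs(toy, max_run=3)
pieces = list(split_with_overlap(cleaned, chunk_size=10, overlap=2))
```

Each piece carries its start offset so that gene coordinates predicted on a piece can be mapped back to the full sequence. Note that the overlap only guarantees that a gene no longer than the overlap appears whole in at least one piece; longer genes that straddle a cut can still be truncated.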
Gene finding: results
• Tool: Genscan
• Machine: Hydra.capsl
• CPU usage: 75%
• Memory usage: 7.5 GB
• Speed: 0.1 to 0.2 s/kb
• Total running time: 3 days
Prediction of to-be-discovered RGS proteins (workflow diagram repeated, with part of it marked “Covered by Gai’s talk”)
Multiple sequence alignment
• ClustalW http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/
• Clustal W is a general-purpose multiple alignment program for DNA or proteins.
• Multiple alignments are carried out in 3 stages:
  • All sequences are compared with each other (pairwise alignments).
  • A dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity.
  • The final multiple alignment is carried out, using the dendrogram as a guide.
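To make the first two stages concrete, here is a toy Python sketch. It is not ClustalW's algorithm: the real program scores pairs by alignment and builds its guide tree by neighbor-joining, whereas this sketch substitutes the `difflib` similarity ratio for the pairwise score and plain average-linkage clustering for the tree. The third stage (the progressive alignment itself) is not implemented. All names and example sequences are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_distances(seqs):
    """Stage 1: compare every sequence with every other one.

    Distance is 1 - difflib similarity ratio, a stand-in for an alignment score.
    """
    dist = {}
    for a, b in combinations(sorted(seqs), 2):
        ratio = SequenceMatcher(None, seqs[a], seqs[b], autojunk=False).ratio()
        dist[(a, b)] = 1.0 - ratio
    return dist


def leaves(tree):
    """Return the sequence names under a (possibly nested) tree node."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def guide_tree(seqs):
    """Stage 2: average-linkage clustering into a nested-tuple dendrogram."""
    dist = pairwise_distances(seqs)

    def cluster_distance(x, y):
        pairs = [(min(a, b), max(a, b)) for a in leaves(x) for b in leaves(y)]
        return sum(dist[p] for p in pairs) / len(pairs)

    nodes = sorted(seqs)
    while len(nodes) > 1:
        # Merge the two closest clusters into one internal node.
        x, y = min(combinations(nodes, 2), key=lambda p: cluster_distance(*p))
        nodes = [n for n in nodes if n not in (x, y)] + [(x, y)]
    return nodes[0]


# Hypothetical toy sequences: s1 and s2 differ at one position, s3 is unrelated.
example = {"s1": "ACDEFGHIK", "s2": "ACDEFGHIR", "s3": "WWWYYYPPP"}
tree = guide_tree(example)
```

In the resulting tree the two similar sequences are joined first, which is the order in which a progressive aligner would then align them before adding the more distant sequence.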
HMM training
• HMMer http://hmmer.wustl.edu/
• The HMM is trained on the RGS domain sequences found so far.
• Two sets of source data:
  • Set A: only the RGS domains that begin the protein sequence
  • Set B: all RGS domain sequences in proteins
• Sequences in set A are highly similar to one another, while those in set B have low similarity.
HMMer usage
• hmmbuild
  • Input: aligned sequences
  • Output: the hidden Markov model
• hmmcalibrate
  • Calibrates the HMM to improve E-value sensitivity
• hmmsearch
  • Input: the built HMM and the target protein sequences
  • Output: the domains found, with the position, score, and E-value of each
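The three steps above could be chained roughly as follows. This sketch only assembles the command lines and prints them; it assumes the HMMER 2.x argument order (`hmmbuild <hmm file> <alignment file>`, `hmmcalibrate <hmm file>`, `hmmsearch <hmm file> <sequence file>`), and every file name is a hypothetical placeholder. `hmmcalibrate` belongs to HMMER 2.x and is not part of HMMER 3.

```python
import shlex


def hmmer_commands(alignment, hmm_file, protein_fasta):
    """Return the build, calibrate, and search commands as argument lists."""
    build = ["hmmbuild", hmm_file, alignment]        # aligned sequences -> HMM
    calibrate = ["hmmcalibrate", hmm_file]           # calibrate E-value statistics
    search = ["hmmsearch", hmm_file, protein_fasta]  # scan target proteins with the HMM
    return [build, calibrate, search]


# Hypothetical file names, for illustration only.
for cmd in hmmer_commands("rgs_domains.aln", "rgs.hmm", "predicted_proteins.fa"):
    print(shlex.join(cmd))
```

To actually run the steps, each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the HMMER 2.x binaries are on the PATH.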
HMM search result
• Q: Is there a correlation between length and the number of RGS domains found?
• Q: What metric would capture the density (affinity) of RGS?
Reasons for the misses
• The genome sequence is incomplete or contains errors.
• Genscan prediction is not 100% accurate.
• Other possible reasons need further investigation.
Subfamily identification
• Build an HMM for each subfamily (A-F)
• Search the high-scoring to-be-discovered RGS candidates with each subfamily HMM
• Result
  • chr1-4901: subfamily A
  • chr4-5038: subfamily A
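One plausible way to turn the per-subfamily searches into an allocation is to assign each candidate to the subfamily whose HMM gives it the best score. The slides do not state the exact rule, so the sketch below is an assumption: the function name, the score threshold, and the example scores are all hypothetical placeholders, not results from the project.

```python
def allocate_subfamily(scores, min_score=0.0):
    """Pick the best-scoring subfamily, or None if no score reaches min_score.

    `scores` maps a subfamily label (e.g. "A".."F") to the hmmsearch score
    obtained with that subfamily's HMM.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


# Placeholder scores for a made-up candidate, for illustration only.
candidate_scores = {"A": 120.5, "B": 30.2, "C": -4.0}
assigned = allocate_subfamily(candidate_scores)
```

Returning None when every score falls below the threshold leaves room for candidates that match the general RGS model but none of the six subfamily models.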
Summary
• Integrated tools
  • Genscan
  • ClustalW
  • HMMer
• Our framework finds genes, performs multiple sequence alignment, builds HMMs, and searches protein sequences for to-be-discovered RGS domains.