Dahai Gai, Samuel Kalet, Hongbo Yang. May 16, 2001

Project No. 7. Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource.

Presentation Transcript


  1. Project No. 7. Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource. Dahai Gai, Samuel Kalet, Hongbo Yang. May 16, 2001

  2. [Diagram labels: Signal Transduction, Nucleus, DNA replication, RNA synthesis, New mRNA, New Protein, X]

  3. RGS in G protein signaling regulation • Gα-GTP: active form • Gα-GDP: inactive form

  4. Mechanism of RGS regulation • RGS increases the turnover rate of the intrinsic GTPase activity of Gα subunits, resulting in an increased “off” rate and therefore a decreased signal

  5. RGS proteins & RGS domain

  6. RGS domain: 9 α-helices

  7. RGS Web [Architecture diagram labels: Genomic Data, Structural Data, Expression Data, Interaction Data, RGSdb, Informatics Engine, Visualization, Simulation Engine, RGS-Web (Interface)]

  8. Prediction of to-be-discovered RGS proteins [flowchart] • Known RGS protein sequences → ClustalW → HMM model (RGS HMM model) • 24 human genome → Genescan → Predicted genes → RGS HMM model → Potential RGS proteins • RGS subfamilies protein sequences → ClustalW → HMM model (Subfamily HMM models) → Subfamily allocation → Function prediction and biological test

  9. Human genomic data

  10. Gene finding: troubleshooting • Problem: Genescan cannot handle sequences larger than 5 Mb. Solution: split each long sequence into multiple shorter sequences. • Problem: if a split falls inside a gene, Genescan will miss that gene. Solution: overlap adjacent pieces by 10 kb. • Problem: long runs of unknown sequence (Ns) slow Genescan down. Solution: replace any run of Ns longer than 0.5 kb with 0.5 kb of Ns.
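
The two preprocessing fixes on this slide (collapsing long runs of Ns, then splitting with a 10 kb overlap) can be sketched as below. This is a minimal illustration, not the authors' script: the function names `collapse_n_runs` and `split_with_overlap` are hypothetical, and the defaults (5,000,000 bases per piece, 10,000 bases of overlap, 500 Ns) are read off the slide under the assumption that "5M" means 5 million bases.

```python
import re

def collapse_n_runs(seq, max_run=500):
    """Replace every run of N/n longer than max_run with exactly max_run Ns."""
    pattern = re.compile(r"[Nn]{%d,}" % (max_run + 1))
    return pattern.sub("N" * max_run, seq)

def split_with_overlap(seq, chunk_size=5_000_000, overlap=10_000):
    """Yield (start_offset, piece); consecutive pieces share `overlap` bases."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    start = 0
    while True:
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # this piece already reaches the end of the sequence
        start += step

if __name__ == "__main__":
    # Toy input: tiny sizes stand in for the real 5 Mb / 10 kb / 0.5 kb values.
    toy = "ACGT" * 5 + "N" * 12 + "ACGT" * 5
    cleaned = collapse_n_runs(toy, max_run=4)
    for offset, piece in split_with_overlap(cleaned, chunk_size=16, overlap=4):
        print(offset, piece)
```

Two caveats follow from the design: offsets refer to the N-collapsed sequence, so predicted gene coordinates need mapping back to the original, and a gene longer than the overlap can still be cut by a split.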

  11. Gene finding: results • Tool: Genescan • Machine: Hydra.capsl • CPU usage: 75% • Memory usage: 7.5 GB • Speed: 0.1 to 0.2 s/kb • Total running time: 3 days

  12. Prediction of to-be-discovered RGS proteins [flowchart] • Known RGS protein sequences → ClustalW → HMM model (RGS HMM model) (covered by Gai’s talk) • 24 human genome → Genescan → Predicted genes → RGS HMM model → Potential RGS proteins • RGS subfamilies protein sequences → ClustalW → HMM model (Subfamily HMM models) → Subfamily allocation → Function prediction and biological test

  13. Multiple sequence alignment • ClustalW http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/ • Clustal W is a general-purpose multiple alignment program for DNA or proteins. • Multiple alignments are carried out in 3 stages: (1) all sequences are compared to each other (pairwise alignments); (2) a dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity; (3) the final multiple alignment is carried out, using the dendrogram as a guide.
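
To make the first two stages concrete, here is a toy sketch of building a guide tree from pairwise comparisons. It is not ClustalW's algorithm: the distance is a stand-in based on Python's `difflib.SequenceMatcher` rather than a real pairwise alignment score, and the tree is built by simple average-linkage clustering. The function names and the example sequences are made up for illustration, and stage 3 (the progressive alignment itself) is not implemented.

```python
from difflib import SequenceMatcher
from itertools import combinations

def pairwise_distance(a, b):
    """Stage 1 stand-in: 1 - similarity ratio (not a true alignment score)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def guide_tree(seqs):
    """Stage 2: average-linkage clustering over a {name: sequence} dict.

    Returns the tree as nested tuples of sequence names.
    """
    dist = {frozenset((a, b)): pairwise_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def average(i, j):
        members_i, members_j = clusters[i][1], clusters[j][1]
        total = sum(dist[frozenset((x, y))] for x in members_i for y in members_j)
        return total / (len(members_i) * len(members_j))

    while len(clusters) > 1:
        # Join the two clusters that are closest on average.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: average(*pair))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

if __name__ == "__main__":
    toy = {"a": "AAAAAAAAAA", "b": "AAAAAAAAAT", "c": "GGGGGGGGGG"}
    print(guide_tree(toy))
```

In a progressive aligner the tree is read from the leaves inward: the most similar sequences are aligned first and the more distant ones are added to that growing alignment.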

  14. HMM training • HMMer http://hmmer.wustl.edu/ • The RGS domain sequences found so far are used to train the HMM. • Two sets of source data: • Set A: only those RGS domains that begin the protein sequence • Set B: all RGS domain sequences in proteins • Elements in set A have high similarity, while those in set B have low similarity

  15. HMMer usage • hmmbuild • Input: aligned sequences • Output: the hidden Markov model • hmmcalibrate • Calibrates the HMM to improve E-value sensitivity • hmmsearch • Input: the built HMM and the target protein sequences • Output: the domains found, the position of each domain, score and E-value
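
The three steps above can be chained as follows. This is a hedged sketch that assumes the HMMER 2.x command-line tools are installed (`hmmcalibrate` does not exist in HMMER 3); the file names `rgs_setA.aln`, `rgs_setA.hmm` and `predicted_proteins.fa` are placeholders, not files from the project, and no extra options are passed.

```python
import shlex
import subprocess

def hmmer_commands(alignment_file, hmm_file, target_fasta):
    """Return the three HMMER 2.x command lines as argument lists."""
    return [
        ["hmmbuild", hmm_file, alignment_file],  # aligned sequences -> HMM
        ["hmmcalibrate", hmm_file],              # calibrate E-value statistics
        ["hmmsearch", hmm_file, target_fasta],   # report domains, scores, E-values
    ]

def run_pipeline(commands):
    """Run each command in order; raises CalledProcessError if one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Only prints the commands; call run_pipeline(...) to actually execute them.
    for cmd in hmmer_commands("rgs_setA.aln", "rgs_setA.hmm", "predicted_proteins.fa"):
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact command lines before spending compute time on a genome-wide search.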

  16. HMM search result

  17. HMM search result • Q: Is there a correlation between length and the number of RGS? • Q: Is there a metric for the density (affinity) of the RGS?

  18. Summary of HMMer result

  19. HMMer result summary in detail

  20. Reasons for the misses • The genome sequence is incomplete or contains errors. • Genescan predictions are not 100% accurate. • . . . other possible reasons; further investigation is needed.

  21. Prediction of to-be-discovered RGS proteins [flowchart] • Known RGS protein sequences → ClustalW → HMM model (RGS HMM model) (covered by Gai’s talk) • 24 human genome → Genescan → Predicted genes → RGS HMM model → Potential RGS proteins • RGS subfamilies protein sequences → ClustalW → HMM model (Subfamily HMM models) → Subfamily allocation → Function prediction and biological test

  22. Subfamily identification • Build an HMM for each subfamily (A - F) • Search the high-scoring to-be-discovered RGS proteins with each subfamily HMM • Result • chr1-4901: subfamily A • chr4-5038: subfamily A
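
The allocation step can be sketched as picking the best-scoring subfamily HMM for each candidate. This is an assumption about how the scores might be combined, not a description of the authors' procedure: `allocate_subfamily`, the `min_score` cutoff, and the scores in the example are all invented for illustration and are not results from the project.

```python
def allocate_subfamily(scores, min_score=0.0):
    """Pick the subfamily whose HMM gives the highest score.

    scores: dict mapping subfamily label (e.g. "A".."F") to an hmmsearch score.
    Returns the best label, or None if there are no scores or the best score
    falls below min_score (a hypothetical cutoff).
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None

if __name__ == "__main__":
    # Invented scores for a hypothetical candidate, for illustration only.
    candidate_scores = {"A": 85.2, "B": 12.4, "C": -3.0, "D": 30.1, "E": 5.5, "F": 8.9}
    print(allocate_subfamily(candidate_scores, min_score=20.0))
```

A cutoff guards against forcing a candidate into a subfamily when none of the six models matches well.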

  23. Summary • Integrated tools • Genescan • ClustalW • HMMer • Our framework finds genes, performs multiple sequence alignment, builds HMMs, and searches protein sequences for to-be-discovered RGS domains.

  24. References • De Vries, L., Zheng, B., Fischer, T., Elenko, E., and Farquhar, M. G. (2000). The regulator of G protein signaling family. Annu. Rev. Pharmacol. Toxicol. 40:235-71. • De Vries, L., and Farquhar, M. G. (1999). RGS proteins: more than just GAPs for heterotrimeric G proteins. Trends Cell Biol. 9(4):138-44. • Zheng, B., De Vries, L., and Farquhar, M. G. (1999). Divergence of RGS proteins: evidence for the existence of six mammalian RGS subfamilies. Trends Biochem. Sci. 24(11):411-4. • Berman, D. M., and Gilman, A. G. (1998). Mammalian RGS proteins: barbarians at the gate. J. Biol. Chem. 273:1269-1272. • Dohlman, H. G., and Thorner, J. (1997). RGS proteins and signaling by heterotrimeric G proteins. J. Biol. Chem. 272:3871-3874.
