A New Interface to GeneKeyDB

A New Interface to GeneKeyDB: methods for analyzing relationships among proteins based on shared motifs. Chris Symons & Xinxia Peng. Protein domains are distinct units of protein three-dimensional structure, which also carry function.

Presentation Transcript


  1. A New Interface to GeneKeyDB: Methods for analyzing relationships among proteins based on shared motifs. Chris Symons & Xinxia Peng

  2. Protein domains • are distinct units of protein three-dimensional structure, which also carry function. • Proteins can be composed of single or multiple domains. • A few thousand conserved domain models are sufficient to cover more than two thirds of known protein sequences. Marchler-Bauer A, et al. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Research 31:383-387 (2003).

  3. The growth of the number of proteins known vs. the growth in the number of unique domains. Geer, L.Y., Domrachev, M., Lipman, D.J. and Bryant, S.H. (2002) CDART: Protein Homology by Domain Architecture. Genome Res., 12, 1619–1623.

  4. Conserved Domain Database (CDD): • http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml • a curated Entrez database of conserved domain alignments at NCBI • currently contains domains derived from two popular collections, Smart and Pfam, plus contributions from colleagues at NCBI, such as COG.

  5. Data generation using GeneKeyDB
     -- create a master table of the association between
     -- locuslink id and cdd_key
     CREATE TABLE peng_cddlist AS (
       SELECT a.ll_id, b.ll_refseq_nm_id, c.cdd_key, c.cdd_evalue, a.organism
       FROM ll_locus a, ll_refseq_nm b, ll_np_cdd c
       WHERE a.ll_id = b.ll_id
         AND b.ll_refseq_nm_id = c.ll_refseq_nm_id
     );
     commit;

  6. Summary of Data

  7. Looking at groups of domains • We look at a list of cdd domains and return the proteins that are found exclusively in the intersection of those domains. • If a second (third, etc.) list of domains is added, we look at the proteins found exclusively in the intersection of this list, and we combine this with previous lists and do the same.
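The lookup described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the interface's actual code: it reads "found exclusively in the intersection" as "the protein's full set of cdd keys equals the input list", and it reads "combine this with previous lists" as "also query the union of all lists entered so far". The function names and the toy (ll_id, cdd_key) pairs are hypothetical; real pairs would come from a table such as peng_cddlist.

```python
from collections import defaultdict


def build_domain_index(pairs):
    """Map each protein id (ll_id) to the set of cdd keys annotated on it."""
    index = defaultdict(set)
    for ll_id, cdd_key in pairs:
        index[ll_id].add(cdd_key)
    return index


def exclusive_proteins(index, cdd_keys):
    """Proteins whose domain set is exactly the given list (assumed meaning of 'exclusive')."""
    wanted = set(cdd_keys)
    return sorted(ll_id for ll_id, keys in index.items() if keys == wanted)


def query_lists(index, domain_lists):
    """Query each list on its own, then the union of all lists entered so far."""
    results = []
    combined = []
    for keys in domain_lists:
        results.append((tuple(keys), exclusive_proteins(index, keys)))
        if combined:
            # Extend the running union, keeping first-seen order and no duplicates.
            combined = combined + [k for k in keys if k not in combined]
            results.append((tuple(combined), exclusive_proteins(index, combined)))
        else:
            combined = list(keys)
    return results


# Hypothetical toy data: (ll_id, cdd_key) pairs.
toy_pairs = [
    (101, 1), (101, 438),
    (102, 1), (102, 438), (102, 1825),
    (103, 1825), (103, 7),
]
toy_index = build_domain_index(toy_pairs)

for keys, proteins in query_lists(toy_index, [[1, 438], [1825]]):
    print(keys, proteins)
```

Storing one set of cdd keys per protein makes the exclusivity check a single set comparison, at the cost of scanning every protein for each query.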

  8. Looking at groups of domains [Figure: diagram with regions labeled A, A + B, and B]

  9. Options • This can be done using either human or mouse data. • We can turn the exclusivity off, so that we return all proteins in the intersection of the list of cdd keys.
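A minimal sketch of how these two options might look as parameters, assuming each row carries an organism label and that turning exclusivity off means "the protein has at least the listed domains". The function name, the "human"/"mouse" label strings, and the toy rows are hypothetical and not taken from GeneKeyDB.

```python
def proteins_with_domains(rows, cdd_keys, organism, exclusive=True):
    """Return the ids of proteins of one organism that match a list of cdd keys.

    rows: iterable of (ll_id, cdd_key, organism) tuples.
    exclusive=True  -> the protein's domain set must equal cdd_keys.
    exclusive=False -> the protein's domain set must contain all of cdd_keys.
    """
    wanted = set(cdd_keys)
    domains = {}
    for ll_id, cdd_key, org in rows:
        if org == organism:
            domains.setdefault(ll_id, set()).add(cdd_key)
    if exclusive:
        return sorted(p for p, keys in domains.items() if keys == wanted)
    return sorted(p for p, keys in domains.items() if wanted <= keys)


# Hypothetical toy rows: (ll_id, cdd_key, organism).
toy_rows = [
    (101, 1, "human"), (101, 438, "human"),
    (102, 1, "human"), (102, 438, "human"), (102, 1825, "human"),
    (201, 1, "mouse"), (201, 438, "mouse"),
]

print(proteins_with_domains(toy_rows, [1, 438], "human"))
print(proteins_with_domains(toy_rows, [1, 438], "human", exclusive=False))
print(proteins_with_domains(toy_rows, [1, 438], "mouse"))
```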

  10. Sample Input and Output
      Input the first list of domains. The domains should be separated by spaces and should all be on one line.
      1 438
      (1 438):
      Input another list of domains separated by spaces (or hit q to quit):
      1825
      (1825):
      (1 438 1825): 28992 83666
      Input another list of domains separated by spaces (or hit q to quit):

  11. Why useful? A thought 2003

  12. ?: log[P(k)] ~ -  k, where k is the number of CDs per protein
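One way to probe the question on this slide is to tabulate P(k), the fraction of proteins with k conserved domains, and fit a straight line to log P(k) against k. The sketch below does this with an ordinary least-squares slope. It is an illustration only: the function names and the toy pairs are hypothetical, and the fitting method is an assumption rather than the authors' procedure.

```python
import math
from collections import Counter


def domain_count_distribution(pairs):
    """Return {k: P(k)}, where k is the number of distinct cdd keys per protein."""
    domains = {}
    for ll_id, cdd_key in pairs:
        domains.setdefault(ll_id, set()).add(cdd_key)
    counts = Counter(len(keys) for keys in domains.values())
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


def fit_log_slope(p_of_k):
    """Least-squares slope of log P(k) versus k (needs at least two distinct k)."""
    xs = list(p_of_k)
    ys = [math.log(p) for p in p_of_k.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data: (ll_id, cdd_key) pairs.
toy_pairs = [(1, 10), (2, 10), (3, 11), (3, 12), (4, 10), (4, 11), (4, 12)]
toy_dist = domain_count_distribution(toy_pairs)
print(toy_dist)
print(fit_log_slope(toy_dist))
```

A roughly constant negative slope would be consistent with an exponential decay of P(k); a visibly curved plot would suggest a different form.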

  13. Redundancy in CDD?

  14. Future work: • Remove CDD redundancy • Distribution of the minimal set of proteins across different biological processes/subcellular locations (GO terms) • Application to other types of graphs with the same or different datasets, such as genes + TBS