These PathoLogic refining tools help curators create protein complexes, assign modified proteins, and predict operons from adjacent gene pairs and intergenic distance. The tools present the evidence; curators make the final decisions using specific biological knowledge and the literature.
Selected PathoLogic Refining Tasks
• Creation of Protein Complexes
• Assignment of Modified Proteins
• Operon Prediction
Creating Protein Complexes
• Refine -> Create Protein Complexes
• When multiple polypeptides catalyze the same reaction, they
  • Could be isozymes: do nothing
  • Could be components of a complex
• Software can’t tell the difference
Manually-assisted Complex Creation
• Curators decide based on:
  • Names, e.g. subunit A, subunit B
  • How enzyme is organized in other organisms
  • Members of a complex are often neighbors on chromosome
  • Specific biological knowledge based on literature, etc.
• Complex creation tool:
  • Lists names, gene IDs
  • Shows reaction in MetaCyc
  • Indicates which genes are neighbors
  • Leaves final decision up to curator
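The "indicates which genes are neighbors" step can be illustrated with a minimal sketch. It assumes that "neighbor" means immediately adjacent in chromosomal gene order; the function name and gene IDs are hypothetical and are not part of Pathway Tools.

```python
from typing import Iterable, List, Tuple


def neighbor_pairs(gene_order: List[str], candidates: Iterable[str]) -> List[Tuple[str, str]]:
    """Return candidate genes that sit next to each other on the chromosome.

    gene_order: all genes in chromosomal order (hypothetical IDs).
    candidates: genes whose products catalyze the same reaction.
    Assumption: "neighbor" means immediately adjacent in gene_order.
    """
    position = {gene: i for i, gene in enumerate(gene_order)}
    # Keep only candidates that are on this chromosome, sorted by position.
    ordered = sorted((g for g in candidates if g in position), key=position.get)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if position[b] - position[a] == 1
    ]


# Hypothetical example: g2 and g3 are adjacent, g5 is not next to either.
pairs = neighbor_pairs(["g1", "g2", "g3", "g4", "g5"], {"g2", "g3", "g5"})
```

Adjacency is only a hint that the polypeptides may form a complex; the decision still rests with the curator.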
Complex Subunit Stoichiometries
• Leave coefficients blank if unknown
Proteins that are Reaction Substrates
• Reactions are defined in MetaCyc with protein classes as substrates
• Need to find which genes in the genome code for instances of those classes
• Refine -> Assign Modified Proteins
  • Finds all reactions that
    • Have an enzyme
    • Have a protein class as substrate
  • Name search for substrate
  • Presents possibilities, asks curator to choose
  • Chosen protein will be made a child of the protein class
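The search described above can be sketched roughly as follows. This is not the PathoLogic implementation: the `Reaction` record, the case-insensitive substring match, and all names in the example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reaction:
    # Hypothetical, simplified stand-in for a MetaCyc reaction.
    name: str
    has_enzyme: bool
    protein_class_substrates: List[str] = field(default_factory=list)


def candidate_proteins(reactions: List[Reaction], protein_names: List[str]) -> Dict[str, List[str]]:
    """Map each protein-class substrate to proteins whose name mentions it.

    Assumption: the name search is a case-insensitive substring match.
    """
    candidates: Dict[str, List[str]] = {}
    for rxn in reactions:
        if not rxn.has_enzyme:
            continue  # only reactions that have an enzyme are considered
        for protein_class in rxn.protein_class_substrates:
            needle = protein_class.lower()
            hits = [p for p in protein_names if needle in p.lower()]
            candidates.setdefault(protein_class, hits)
    return candidates


# Hypothetical data: the second reaction has no enzyme and is skipped.
reactions = [
    Reaction("rxn-1", True, ["thioredoxin"]),
    Reaction("rxn-2", False, ["ferredoxin"]),
]
proteins = ["Thioredoxin 1", "thioredoxin reductase", "DNA gyrase subunit A"]
candidates = candidate_proteins(reactions, proteins)
```

In this toy example "thioredoxin reductase" matches by name even though it is not itself a thioredoxin, which is the kind of case where the curator's choice matters.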
Operon Predictor
• Refine -> Predict Transcription Units
Nomenclature
• WO pair = pair of genes within an operon
• TUB pair = pair of genes at a transcription unit boundary (TUB pairs delineate operons)
Operation of the operon predictor
• For each contiguous gene pair, predict whether the pair is within the same operon or at a transcription unit boundary
• Use pairwise predictions to identify potential operons
Example for genes A B C D E:
  AB = TUB pair
  BC = WO pair
  CD = WO pair
  DE = TUB pair
  operon = BCD
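The second step, turning pairwise predictions into operons, can be sketched as below. The function name is hypothetical, and the sketch assumes all genes passed in lie in the same directon and are listed in chromosomal order.

```python
from typing import List


def group_into_operons(genes: List[str], pair_labels: List[str]) -> List[List[str]]:
    """Split an ordered gene list into transcription units.

    pair_labels[i] is the prediction for the pair (genes[i], genes[i + 1]):
    "WO" (within operon) or "TUB" (transcription unit boundary).
    """
    if len(pair_labels) != max(len(genes) - 1, 0):
        raise ValueError("need exactly one label per adjacent gene pair")
    if not genes:
        return []
    units = [[genes[0]]]
    for gene, label in zip(genes[1:], pair_labels):
        if label == "WO":
            units[-1].append(gene)  # same operon as the previous gene
        elif label == "TUB":
            units.append([gene])  # boundary: start a new transcription unit
        else:
            raise ValueError(f"unknown label: {label!r}")
    return units


# The A..E example from the slide: AB=TUB, BC=WO, CD=WO, DE=TUB.
units = group_into_operons(list("ABCDE"), ["TUB", "WO", "WO", "TUB"])
# Units with more than one gene are the predicted operons.
operons = [u for u in units if len(u) > 1]
```

For the slide's example this yields the single multi-gene unit B, C, D, with A and E left as single-gene transcription units.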
Operon predictor
• We use the method from Salgado et al., PNAS (2000) as a starting point
• Uses experimentally verified E. coli data as a training set
• Computes the log likelihood of two genes being a WO or TUB pair based on intergenic distance
• Predicts operon gene pairs based on:
  • Intergenic distance between genes
  • Whether the genes are in the same functional class
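One way to turn intergenic distance into such a score is a log-likelihood ratio over binned distances, estimated from known WO and TUB pairs. The sketch below is illustrative only: the 10 bp bin width, the pseudocount smoothing, and the toy distances are assumptions, not values from the training set or the exact published procedure.

```python
import math
from collections import Counter
from typing import Callable, Iterable

BIN_WIDTH = 10      # assumption: distances are binned in 10 bp intervals
PSEUDOCOUNT = 1.0   # assumption: add-one smoothing so empty bins do not give log(0)


def distance_bin(distance: int) -> int:
    # Floor division also handles negative distances (overlapping genes).
    return distance // BIN_WIDTH


def make_scorer(wo_distances: Iterable[int], tub_distances: Iterable[int]) -> Callable[[int], float]:
    """Build a scorer: positive values favor WO, negative values favor TUB."""
    wo = Counter(distance_bin(d) for d in wo_distances)
    tub = Counter(distance_bin(d) for d in tub_distances)
    if not wo or not tub:
        raise ValueError("need training pairs of both kinds")
    n_bins = len(set(wo) | set(tub))
    wo_total = sum(wo.values()) + PSEUDOCOUNT * n_bins
    tub_total = sum(tub.values()) + PSEUDOCOUNT * n_bins

    def score(distance: int) -> float:
        b = distance_bin(distance)
        p_wo = (wo[b] + PSEUDOCOUNT) / wo_total
        p_tub = (tub[b] + PSEUDOCOUNT) / tub_total
        return math.log(p_wo / p_tub)

    return score


# Made-up toy distances in bp, for illustration only (not real training data).
TOY_WO = [-4, 3, 8, 12, 15, 40]
TOY_TUB = [8, 60, 120, 150, 200, 310]
score = make_scorer(TOY_WO, TOY_TUB)
```

With these toy numbers a short distance such as 5 bp scores positive (WO-like) and a long one such as 150 bp scores negative (TUB-like); a threshold on the score then gives the pairwise WO/TUB call.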
Operon predictor
Additional features easily computed from a PGDB:
• (1) Both gene products are enzymes in the same metabolic pathway
• (2) Both gene products are monomers in the same protein complex
• (3) One gene product transports a substrate for a metabolic pathway in which the other gene product is involved as an enzyme
• (4) A gene upstream or downstream from the gene pair (and within the same directon) is related to either one of the genes in the pair as per features 1, 2 and 3 above
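The first two features reduce to set intersections once pathway and complex membership are known for each gene. The sketch below uses plain dictionaries as a hypothetical stand-in for PGDB queries; the function name, gene IDs, pathway names, and complex names are all made up for illustration.

```python
from typing import Dict, Set


def pair_features(
    gene_a: str,
    gene_b: str,
    pathways_of: Dict[str, Set[str]],
    complexes_of: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Compute features 1 and 2 for one gene pair.

    pathways_of: gene -> pathways in which its product acts as an enzyme.
    complexes_of: gene -> complexes in which its product is a monomer.
    Both mappings are hypothetical stand-ins for lookups against a PGDB.
    """
    empty: Set[str] = set()
    return {
        "same_pathway": bool(pathways_of.get(gene_a, empty) & pathways_of.get(gene_b, empty)),
        "same_complex": bool(complexes_of.get(gene_a, empty) & complexes_of.get(gene_b, empty)),
    }


# Hypothetical example data.
pathways_of = {"g1": {"pathway-X"}, "g2": {"pathway-X", "pathway-Y"}, "g3": {"pathway-Z"}}
complexes_of = {"g2": {"complex-1"}, "g3": {"complex-1"}}

features_g1_g2 = pair_features("g1", "g2", pathways_of, complexes_of)
features_g2_g3 = pair_features("g2", "g3", pathways_of, complexes_of)
```

Features 3 and 4 would need transporter substrates and directon membership as well, which this sketch leaves out.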