Selected PathoLogic Refining Tasks • Creation of Protein Complexes • Assignment of Modified Proteins • Operon Prediction
Creating Protein Complexes • Refine -> Create Protein Complexes • When multiple polypeptides catalyze the same reaction, they: • Could be isozymes (do nothing) • Could be components of a complex • Software can’t tell the difference
Manually-assisted Complex Creation • Curators decide based on: • Names, e.g. subunit A, subunit B • How enzyme is organized in other organisms • Members of a complex are often neighbors on chromosome • Specific biological knowledge based on literature, etc. • Complex creation tool: • Lists names, gene IDs • Shows reaction in MetaCyc • Indicates which genes are neighbors • Leaves final decision up to curator
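The "neighbors on chromosome" hint can be illustrated with a minimal Python sketch. This is not the Pathway Tools implementation: the function name `neighbor_pairs`, the `max_gap` parameter, and the gene IDs are all hypothetical, and the chromosome is treated as a simple ordered list of genes.

```python
from itertools import combinations

def neighbor_pairs(candidates, gene_order, max_gap=1):
    """Return pairs of candidate genes that lie close together.

    candidates: gene IDs whose products catalyze the same reaction.
    gene_order: all gene IDs in chromosomal order (hypothetical input).
    max_gap:    maximum difference in position to count as neighbors
                (1 means directly adjacent). Wrap-around on a circular
                chromosome is ignored in this sketch.
    """
    index = {gene: i for i, gene in enumerate(gene_order)}
    pairs = []
    for a, b in combinations(candidates, 2):
        if abs(index[a] - index[b]) <= max_gap:
            pairs.append((a, b))
    return pairs

# Made-up example: g2 and g3 are adjacent, g6 is far away.
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(neighbor_pairs(["g2", "g3", "g6"], gene_order))
```

A result like this is only a hint for the curator; adjacency alone does not distinguish subunits of a complex from isozymes.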
Complex Subunit Stoichiometries • Leave coefficients blank if unknown
Proteins that are Reaction Substrates • Reactions are defined in MetaCyc with protein classes as substrates • Need to find which genes in the genome code for instances of those classes. • Refine -> Assign Modified Proteins • Finds all reactions that • Have an enzyme • Have a protein class as substrate • Name search for substrate • Presents possibilities, asks curator to choose • Chosen protein will be made a child of the protein class
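The filtering and name-search steps above can be sketched in Python as follows. The dictionary layout, the function names, and the sample records are assumptions made for illustration only; they do not reflect the actual PGDB schema or the tool's matching logic.

```python
def candidate_reactions(reactions, protein_classes):
    """Keep reactions that have an enzyme and a protein class as substrate."""
    return [
        rxn for rxn in reactions
        if rxn["enzymes"]
        and any(s in protein_classes for s in rxn["substrates"])
    ]

def name_matches(class_name, proteins):
    """Naive name search: proteins whose name contains the class name."""
    needle = class_name.lower()
    return [p for p in proteins if needle in p["name"].lower()]

# Hypothetical toy data.
protein_classes = {"acyl-carrier protein"}
reactions = [
    {"id": "RXN-A", "enzymes": ["enzA"], "substrates": ["acyl-carrier protein", "ATP"]},
    {"id": "RXN-B", "enzymes": [], "substrates": ["acyl-carrier protein"]},
    {"id": "RXN-C", "enzymes": ["enzC"], "substrates": ["ATP"]},
]
proteins = [
    {"gene": "g10", "name": "Acyl-carrier protein AcpP"},
    {"gene": "g11", "name": "ATP synthase subunit alpha"},
]

for rxn in candidate_reactions(reactions, protein_classes):
    for substrate in rxn["substrates"]:
        if substrate in protein_classes:
            # The curator would choose among these possibilities.
            print(rxn["id"], substrate, [p["gene"] for p in name_matches(substrate, proteins)])
```

The sketch stops at listing possibilities, since the final choice is left to the curator.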
Operon Predictor • Refine -> Predict Transcription Units
Nomenclature • WO pair = pair of genes within an operon • TUB pair = pair of genes at a transcription unit boundary (TUB pairs delineate operons)
Operation of the operon predictor • For each contiguous gene pair, predict whether the pair is within the same operon or at a transcription unit boundary • Use pairwise predictions to identify potential operons • Example for genes A B C D E: AB = TUB pair, BC = WO pair, CD = WO pair, DE = TUB pair, so operon = BCD
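The second step, turning pairwise calls into operons, can be sketched in Python. The function name `group_operons` and the string labels are illustrative choices, not part of the tool; the sketch assumes the genes are already in chromosomal order and on the same strand.

```python
def group_operons(genes, pair_labels):
    """Group genes into transcription units from pairwise predictions.

    genes:       gene names in chromosomal order.
    pair_labels: one label per adjacent pair, "WO" or "TUB",
                 so len(pair_labels) == len(genes) - 1.
    """
    if not genes:
        return []
    if len(pair_labels) != len(genes) - 1:
        raise ValueError("need exactly one label per adjacent gene pair")
    units = [[genes[0]]]
    for gene, label in zip(genes[1:], pair_labels):
        if label == "WO":
            units[-1].append(gene)   # same operon as the previous gene
        else:
            units.append([gene])     # boundary: start a new unit
    return units

# The A..E example: AB = TUB, BC = WO, CD = WO, DE = TUB.
print(group_operons(["A", "B", "C", "D", "E"], ["TUB", "WO", "WO", "TUB"]))
# [['A'], ['B', 'C', 'D'], ['E']]
```

Genes A and E come out as single-gene units, and BCD is the only multi-gene operon, matching the example.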
Operon predictor • We use the method from Salgado et al., PNAS (2000), as a starting point. • Uses experimentally verified E. coli data as a training set. • Computes the log likelihood of two genes being a WO or TUB pair based on intergenic distance. • Predicts operon gene pairs based on: • intergenic distance between genes • genes in the same functional class
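A distance-based log-likelihood score of this kind can be sketched in Python. This is a simplified illustration, not the published method or the tool's code: the 10 bp bin width, the pseudocount smoothing, the function names, and the training distances are all assumptions, and the functional-class feature is left out.

```python
import math
from collections import Counter

BIN_WIDTH = 10  # bp per distance bin (assumed value)

def bin_of(distance):
    # Floor division also handles negative distances (overlapping genes).
    return distance // BIN_WIDTH

def train(wo_distances, tub_distances, pseudocount=1.0):
    """Build a scorer from intergenic distances of known WO and TUB pairs."""
    wo_counts = Counter(bin_of(d) for d in wo_distances)
    tub_counts = Counter(bin_of(d) for d in tub_distances)
    n_bins = len(set(wo_counts) | set(tub_counts))
    wo_total = len(wo_distances) + pseudocount * n_bins
    tub_total = len(tub_distances) + pseudocount * n_bins

    def score(distance):
        """Log of (fraction of WO pairs in bin) / (fraction of TUB pairs in bin)."""
        b = bin_of(distance)
        p_wo = (wo_counts[b] + pseudocount) / wo_total
        p_tub = (tub_counts[b] + pseudocount) / tub_total
        return math.log(p_wo / p_tub)

    return score

def predict(score, distance):
    """Call a pair WO when the score favors the within-operon class."""
    return "WO" if score(distance) > 0 else "TUB"

# Made-up training distances in bp, not real E. coli data.
score = train(wo_distances=[-4, 3, 8, 12, 15, 25],
              tub_distances=[90, 120, 150, 210, 15, 300])
print(predict(score, 5), predict(score, 150))
```

A positive score means the distance is more typical of pairs within an operon; a negative score means it is more typical of a transcription unit boundary.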
Operon predictor: additional features easily computed from a PGDB • both gene products are enzymes in the same metabolic pathway • both gene products are monomers in the same protein complex • one gene product transports a substrate for a metabolic pathway in which the other gene product is involved as an enzyme • a gene upstream or downstream from the gene pair (and within the same directon) is related to either one of the genes in the pair as per features 1, 2 and 3 above.