270 likes | 405 Views
The Pathway Tools Schema. Motivations for Understanding Schema. Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
E N D
Motivations for Understanding Schema • Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB • When writing complex queries to PGDBs, those queries must name classes and slots within the schema • A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Reference • Pathway Tools User’s Guide, Volume I • Appendix A: Guide to the Pathway Tools Schema
Web of Relationships for One Enzyme Succinate + FAD = fumarate + FADH2 Enzymatic-reaction Succinate dehydrogenase Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2 sdhC sdhD sdhA sdhB TCA Cycle
Frame Data Model • Frame Data Model -- organizational structure for a PGDB • Knowledge base (KB, Database, DB) • Frames • Slots • Facets • Annotations
Knowledge Base • Collection of frames and their associated slots, values, facets, and annotations • AKA: Database, PGDB • Can be stored within • An Oracle or MySQL DB • A disk file • Pathway Tools binary program
Frames • Entities with which facts are associated • Kinds of frames: • Classes: Genes, Pathways, Biosynthetic Pathways • Instances (objects): trpA, TCA cycle • Classes: • Superclass(es) • Subclass(es) • Instance(s) • A symbolic frame name (id, key) uniquely identifies each frame
Slots • Encode attributes/properties of a frame • Integer, real number, string • Represent relationships between frames • The value of a slot is the identifier of another frame • Every slot is described by a “slot frame” in a KB that defines meta information about that slot
Slot Links Succinate + FAD = fumarate + FADH2 Enzymatic-reaction Succinate dehydrogenase Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2 sdhC sdhD sdhA sdhB TCA Cycle in-pathway reaction catalyzes component-of product
Slots • Number of values • Single valued • Multivalued: sets, bags • Slot values • Any LISP object: Integer, real, string, symbol (frame name) • Slotunits define properties of slots: datatypes, classes, constraints • Two slots are inverses if they encode opposite relationships • Slot Product in class Genes • Slot Gene in class Polypeptides
Representation of Function EC# Keq Succinate + FAD = fumarate + FADH2 Cofactors Inhibitors Enzymatic-reaction Molecular wt pI Succinate dehydrogenase Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2 sdhC sdhD sdhA sdhB TCA Cycle Left-end-position
Monofunctional Monomer Pathway Reaction Enzymatic-reaction Monomer Gene
Bifunctional Monomer Pathway Reaction Reaction Enzymatic-reaction Enzymatic-reaction Monomer Gene
Monofunctional Multimer Pathway Reaction Enzymatic-reaction Multimer Monomer Monomer Monomer Monomer Gene Gene Gene Gene
Pathway and Substrates Reactant-1 Pathway left in-pathway Reactant-2 Reaction Reaction Reaction Reaction Product-1 right Product-2
Transcriptional Regulation trp Int005 apoTrpR Int001 TrpR*trp site001 pro001 Int003 RpoSig70 trpL trpLEDCBA trpE trpD trpC trpB trpA
Principle Classes • Class names are capitalized, plural, separated by dashes • Genetic-Elements, with subclasses: • Chromosomes • Plasmids • Genes • Transcription-Units • RNAs • rRNAs, snRNAs, tRNAs, Charged-tRNAs • Proteins, with subclasses: • Polypeptides • Protein-Complexes
Principle Classes • Reactions, with subclasses: • Transport-Reactions • Enzymatic-Reactions • Pathways • Compounds-And-Elements
Frame IDs of Instances • Instance frame ID conventions have evolved over time • Examples: • Pathways • TRPSYN-PWY, P23-PWY • Genes • AG10045 • Monomers • TRPA-MONOMER, AG10045-MONOMER
Slots in Multiple Classes • Common-Name • Synonyms • Names (computed as union of Common-Name, Synonyms) • Comment • Citations • DB-Links
Genes Slots • Component-Of (links to replicon, transcription unit) • Left-End-Position • Right-End-Position • Centisome-Position • Transcription-Direction • Product
Proteins Slots • Molecular-Weight-Seq • Molecular-Weight-Exp • pI • Locations • Modified-Form • Unmodified-Form • Component-Of
Polypeptides Slots • Gene
Protein-Complexes Slots • Components
Reactions Slots • EC-Number • Left, Right • Substrates (computed as union of Left, Right) • DeltaG0 • Keq • Spontaneous?
Enzymatic-Reactions Slots • Enzyme • Reaction • Activators • Inhibitors • Physiologically-Relevant • Cofactors • Prosthetic-Groups • Alternative-Substrates • Alternative-Cofactors
Pathways Slots • Reaction-List • Predecessors • Primaries