The Pathway Tools Schema
Motivations for Understanding Schema • Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB • When writing complex queries to PGDBs, those queries must name classes and slots within the schema • A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Reference • Pathway Tools User’s Guide, Volume I • Appendix A: Guide to the Pathway Tools Schema
Web of Relationships for One Enzyme • [Diagram labels: pathway TCA Cycle; reaction Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; enzyme Succinate dehydrogenase; subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; genes sdhA, sdhB, sdhC, sdhD]
Frame Data Model • Frame Data Model -- organizational structure for a PGDB • Knowledge base (KB, Database, DB) • Frames • Slots • Facets • Annotations
Knowledge Base • Collection of frames and their associated slots, values, facets, and annotations • AKA: Database, PGDB • Can be stored within • An Oracle DB • A disk file • A Pathway Tools binary program
Frames • Entities with which facts are associated • Kinds of frames: • Classes: Genes, Pathways, Biosynthetic Pathways • Instances (objects): trpA, TCA cycle • Classes: • Superclass(es) • Subclass(es) • Instance(s) • A symbolic frame name (id, key) uniquely identifies each frame
Frame IDs • Naming conventions for frame IDs • Uniqueness of frame IDs • Frame IDs must be unique within a PGDB • Goal: Same frame ID within different PGDBs should refer to the same biological entity • Because many frames are imported from MetaCyc, this helps ensure consistency of frame names • Frame IDs for newly created frames (not imported) are generated by Pathway Tools • Those frame IDs contain a PGDB-specific identifier • Example: CPLXzz-nnnn CPLXB3-0035
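The ID pattern on this slide (CPLXzz-nnnn, for example CPLXB3-0035) can be illustrated with a small generator. This is only a sketch: the slide does not say how Pathway Tools allocates the numeric part, so the counter, the `width` parameter, and the function name `make_id_generator` are assumptions made for illustration.

```python
import itertools

def make_id_generator(prefix, pgdb_code, start=1, width=4):
    """Yield IDs shaped like CPLXB3-0035: prefix + PGDB code + '-' + zero-padded counter.

    Hypothetical helper; not a Pathway Tools function.
    """
    for n in itertools.count(start):
        yield f"{prefix}{pgdb_code}-{n:0{width}d}"

gen = make_id_generator("CPLX", "B3", start=35)
first = next(gen)
second = next(gen)
print(first)   # CPLXB3-0035
print(second)  # CPLXB3-0036
```

Because the PGDB-specific code is part of the ID, two PGDBs that each create new frames locally cannot collide with one another under this scheme.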
Slots • Encode attributes/properties of a frame • Integer, real number, string, symbols • Represent relationships between frames • The value of a slot is the identifier of another frame • Every slot is described by a “slot frame” in a KB that defines meta information about that slot
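Slot Links • [Diagram: the succinate dehydrogenase web of relationships (TCA Cycle; Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; Succinate dehydrogenase; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; sdhA, sdhB, sdhC, sdhD), with edges labeled by slot names: in-pathway, reaction, catalyzes, component-of, product]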
Slots • Number of values • Single valued • Multivalued: sets, bags • Slot values • Any LISP object: Integer, real, string, symbol (frame name) • Slotunits define properties of slots: datatypes, classes, constraints • Two slots are inverses if they encode opposite relationships • Slot Product in class Genes • Slot Gene in class Polypeptides
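The inverse-slot idea (Product in Genes, Gene in Polypeptides) can be sketched in a toy Python model. This is not how Pathway Tools or Ocelot implements inverses; the dict-based `kb`, the `INVERSES` table, and the frame names (borrowed from the diagram labels) are assumptions made for illustration.

```python
from collections import defaultdict

# Toy stand-in for a KB: frame -> slot -> list of values.
kb = defaultdict(lambda: defaultdict(list))

# Inverse pair named on the slide: Genes.Product <-> Polypeptides.Gene
INVERSES = {"Product": "Gene", "Gene": "Product"}

def add_slot_value(frame, slot, value):
    """Add value to frame.slot and mirror the link through the inverse slot, if any."""
    if value not in kb[frame][slot]:
        kb[frame][slot].append(value)
    inverse = INVERSES.get(slot)
    if inverse and frame not in kb[value][inverse]:
        kb[value][inverse].append(frame)

add_slot_value("sdhA", "Product", "Sdh-flavo")
print(kb["sdhA"]["Product"])    # ['Sdh-flavo']
print(kb["Sdh-flavo"]["Gene"])  # ['sdhA']
```

Writing one direction of the relationship is enough; the other direction is derived, so the two slots cannot drift out of sync.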
Representation of Function • [Diagram: the succinate dehydrogenase web of relationships (TCA Cycle; Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; Succinate dehydrogenase; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; sdhA, sdhB, sdhC, sdhD), annotated with attribute labels: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position]
Monofunctional Monomer • [Diagram labels: Pathway; Reaction; Enzymatic-reaction; Monomer; Gene]
Bifunctional Monomer • [Diagram labels: Pathway; Reaction (×2); Enzymatic-reaction (×2); Monomer; Gene]
Monofunctional Multimer • [Diagram labels: Pathway; Reaction; Enzymatic-reaction; Multimer; Monomer (×4); Gene (×4)]
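The three diagrams above share one traversal: pathway to reaction, reaction to enzymatic-reaction, enzymatic-reaction to enzyme, enzyme (monomer or multimer) to gene. A minimal Python sketch of that walk follows, using only slot names that appear elsewhere in this deck (Reaction-List, Reaction, Enzyme, Components, Gene). The frame IDs (`succ-dehyd-rxn`, `enzrxn-1`, and so on) are invented for illustration, and the dict-based `kb` is a toy stand-in, not the Pathway Tools API.

```python
# Toy KB: frame -> slot -> list of values. Frame IDs are hypothetical.
kb = {
    "TCA-cycle": {"Reaction-List": ["succ-dehyd-rxn"]},
    "enzrxn-1": {"Reaction": ["succ-dehyd-rxn"], "Enzyme": ["Succinate-dehydrogenase"]},
    "Succinate-dehydrogenase": {
        "Components": ["Sdh-flavo", "Sdh-Fe-S", "Sdh-membrane-1", "Sdh-membrane-2"]
    },
    "Sdh-flavo": {"Gene": ["sdhA"]},
    "Sdh-Fe-S": {"Gene": ["sdhB"]},
    "Sdh-membrane-1": {"Gene": ["sdhC"]},
    "Sdh-membrane-2": {"Gene": ["sdhD"]},
}
ENZYMATIC_REACTIONS = ["enzrxn-1"]  # stands in for the instances of Enzymatic-Reactions

def values(frame, slot):
    return kb.get(frame, {}).get(slot, [])

def genes_of_protein(protein):
    """Genes of a monomer, or of every component when the protein is a complex."""
    genes = list(values(protein, "Gene"))
    for part in values(protein, "Components"):
        genes.extend(genes_of_protein(part))
    return genes

def genes_of_pathway(pathway):
    """Walk pathway -> reaction -> enzymatic-reaction -> enzyme -> gene."""
    genes = []
    for rxn in values(pathway, "Reaction-List"):
        for enzrxn in ENZYMATIC_REACTIONS:
            if rxn in values(enzrxn, "Reaction"):
                for enzyme in values(enzrxn, "Enzyme"):
                    genes.extend(genes_of_protein(enzyme))
    return sorted(set(genes))

print(genes_of_pathway("TCA-cycle"))  # ['sdhA', 'sdhB', 'sdhC', 'sdhD']
```

The same function covers the monomer and multimer cases because `genes_of_protein` recurses through Components; a bifunctional monomer simply appears as the Enzyme of two enzymatic-reactions.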
Pathway and Substrates • [Diagram labels: Pathway; Reaction (×4); Reactant-1, Reactant-2; Product-1, Product-2; edges labeled by slot names: in-pathway, left, right]
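One way to read this diagram: the substrates of a pathway are reached through its reactions, via each reaction's Left and Right slots. The sketch below assumes a toy dict-based KB and uses the slot names Reaction-List, Left, and Right from the slot lists in this deck; the two-reaction pathway and the `Intermediate` compound are invented for illustration.

```python
# Toy KB: frame -> slot -> list of values. All frame IDs are hypothetical.
kb = {
    "example-pathway": {"Reaction-List": ["rxn-1", "rxn-2"]},
    "rxn-1": {"Left": ["Reactant-1", "Reactant-2"], "Right": ["Intermediate"]},
    "rxn-2": {"Left": ["Intermediate"], "Right": ["Product-1", "Product-2"]},
}

def substrates_of_pathway(pathway):
    """Union of Left and Right values over every reaction, in first-seen order."""
    seen = []
    for rxn in kb[pathway]["Reaction-List"]:
        for side in ("Left", "Right"):
            for compound in kb[rxn][side]:
                if compound not in seen:
                    seen.append(compound)
    return seen

print(substrates_of_pathway("example-pathway"))
# ['Reactant-1', 'Reactant-2', 'Intermediate', 'Product-1', 'Product-2']
```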
Transcriptional Regulation • [Diagram labels: trp, apoTrpR, TrpR*trp, RpoSig70; Int001, Int003, Int005; site001; pro001; trpLEDCBA; trpL, trpE, trpD, trpC, trpB, trpA]
Principal Classes • Class names are capitalized, plural, separated by dashes • Genetic-Elements, with subclasses: • Chromosomes • Plasmids • Genes • Transcription-Units • RNAs • rRNAs, snRNAs, tRNAs, Charged-tRNAs • Proteins, with subclasses: • Polypeptides • Protein-Complexes
Principal Classes • Reactions, with subclasses: • Transport-Reactions • Enzymatic-Reactions • Pathways • Compounds-And-Elements
Slots in Multiple Classes • Common-Name • Synonyms • Comment • Citations • DB-Links
Genes Slots • Component-Of (links to replicon, transcription unit) • Left-End-Position • Right-End-Position • Centisome-Position • Transcription-Direction • Product
Proteins Slots • Molecular-Weight-Seq • Molecular-Weight-Exp • pI • Locations • Modified-Form • Unmodified-Form • Component-Of
Polypeptides Slots • Gene
Protein-Complexes Slots • Components
Reactions Slots • EC-Number • Left, Right • DeltaG0 • Keq • Spontaneous?
Enzymatic-Reactions Slots • Enzyme • Reaction • Activators • Inhibitors • Physiologically-Relevant • Cofactors • Prosthetic-Groups • Alternative-Substrates • Alternative-Cofactors
Pathways Slots • Reaction-List • Predecessors • Primaries
GKB Editor • Browse class hierarchy and slot definitions • Tools -> Ontology Browser • GKB Editor described at • http://www.ai.sri.com/~gkb/user-man.html
Introduction • MANY ways to access and update PGDBs • APIs in Java, Perl, and Lisp • Import/export of files in many formats • Registry of Pathway/Genome Databases • Import PGDB data into BioWarehouse • Updating a PGDB from an external genome DB
Pathway Tools APIs • Support programmatic queries and updates to PGDBs • APIs in Java, Perl, and Lisp all provide access to a common set of procedures: • Generic Frame Protocol -- Ocelot object database API • Additional Pathway Tools functions • For more information see • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Generic Frame Protocol (GFP) • A library of procedures for accessing Ocelot DBs • GFP specification: • http://www.ai.sri.com/~gfp/spec/paper/paper.html • A small number of GFP functions are sufficient for most complex queries • Knowledge of Pathway Tools schema is critical for using the APIs: • Appendix I of Pathway Tools User’s Guide, Vol I
Generic Frame Protocol • get-class-all-instances (Class) • Returns the instances of Class • Key Pathway Tools classes: • Genetic-Elements • Genes • Proteins • Polypeptides (a subclass of Proteins) • Protein-Complexes (a subclass of Proteins) • Pathways • Reactions • Compounds-And-Elements • Enzymatic-Reactions • Transcription-Units • Promoters • DNA-Binding-Sites
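To illustrate what "all instances" means when classes have subclasses, here is a toy Python model. It is not the GFP implementation: the dict-based `kb`, the `kind` and `parents` keys, and the helper `is_a` are assumptions. The point shown is that querying Proteins also returns instances of its subclasses Polypeptides and Protein-Complexes.

```python
# Toy KB: every frame records whether it is a class or an instance, and its parents.
kb = {
    "Proteins": {"kind": "class", "parents": []},
    "Polypeptides": {"kind": "class", "parents": ["Proteins"]},
    "Protein-Complexes": {"kind": "class", "parents": ["Proteins"]},
    "Sdh-flavo": {"kind": "instance", "parents": ["Polypeptides"]},
    "Succinate-dehydrogenase": {"kind": "instance", "parents": ["Protein-Complexes"]},
}

def is_a(frame, cls):
    """True if cls is a direct or indirect parent of frame."""
    return any(p == cls or is_a(p, cls) for p in kb[frame]["parents"])

def get_class_all_instances(cls):
    """Toy analogue of get-class-all-instances: direct and indirect instances of cls."""
    return [f for f, d in kb.items() if d["kind"] == "instance" and is_a(f, cls)]

print(get_class_all_instances("Proteins"))      # ['Sdh-flavo', 'Succinate-dehydrogenase']
print(get_class_all_instances("Polypeptides"))  # ['Sdh-flavo']
```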
Generic Frame Protocol • Notation Frame.Slot means a specified slot of a specified frame • get-slot-value(Frame Slot) • Returns first value of Frame.Slot • get-slot-values(Frame Slot) • Returns all values of Frame.Slot as a list • slot-has-value-p(Frame Slot) • Returns T if Frame.Slot has at least one value • member-slot-value-p(Frame Slot Value) • Returns T if Value is one of the values of Frame.Slot • print-frame(Frame) • Prints the contents of Frame • Note: Frame and Slot must be symbols!
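The read operations above can be mimicked in a few lines of Python to make their semantics concrete. This is a toy analogue under stated assumptions, not the Lisp, Java, or Perl API: the dict-based `kb`, the sample values, and the use of Python `None`/`True`/`False` in place of Lisp NIL/T are all illustrative choices.

```python
# Toy KB: frame -> slot -> list of values. Values are illustrative, not real PGDB data.
kb = {
    "sdhA": {
        "Product": ["Sdh-flavo"],
        "Synonyms": ["alias-1", "alias-2"],
        "Comment": [],
    }
}

def get_slot_values(frame, slot):
    """All values of Frame.Slot as a list (empty if there are none)."""
    return list(kb.get(frame, {}).get(slot, []))

def get_slot_value(frame, slot):
    """First value of Frame.Slot, or None when the slot has no value."""
    vals = get_slot_values(frame, slot)
    return vals[0] if vals else None

def slot_has_value_p(frame, slot):
    """True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))

def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)

print(get_slot_value("sdhA", "Synonyms"))                    # alias-1
print(get_slot_values("sdhA", "Synonyms"))                   # ['alias-1', 'alias-2']
print(slot_has_value_p("sdhA", "Comment"))                   # False
print(member_slot_value_p("sdhA", "Product", "Sdh-flavo"))   # True
```

Note the difference between the singular and plural getters: on a multivalued slot, `get_slot_value` silently returns only the first value.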
Generic Frame Protocol • coercible-to-frame-p (Thing) • Returns T if Thing is the name of a frame, or a frame object • save-kb • Saves the current KB
Generic Frame Protocol – Update Operations • put-slot-value(Frame Slot Value) • Replace the current value(s) of Frame.Slot with Value • put-slot-values(Frame Slot Value-List) • Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values • add-slot-value(Frame Slot Value) • Add Value to the current value(s) of Frame.Slot, if any • remove-slot-value(Frame Slot Value) • Remove Value from the current value(s) of Frame.Slot • replace-slot-value(Frame Slot Old-Value New-Value) • In Frame.Slot, replace Old-Value with New-Value • remove-local-slot-values(Frame Slot) • Remove all of the values of Frame.Slot
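The update operations listed above differ mainly in whether they replace or extend the existing values. The toy Python analogue below makes that visible. It is a sketch only: the dict-based `kb` is an assumption, slots are treated as sets (duplicates are not added), and the real API additionally requires an explicit save-kb to persist changes.

```python
kb = {}  # toy KB: frame -> slot -> list of values

def _values(frame, slot):
    return kb.setdefault(frame, {}).setdefault(slot, [])

def put_slot_values(frame, slot, value_list):
    """Replace all current values of Frame.Slot with value_list."""
    kb.setdefault(frame, {})[slot] = list(value_list)

def put_slot_value(frame, slot, value):
    """Replace all current values of Frame.Slot with the single value."""
    put_slot_values(frame, slot, [value])

def add_slot_value(frame, slot, value):
    """Add value to Frame.Slot, keeping existing values (set semantics assumed)."""
    vals = _values(frame, slot)
    if value not in vals:
        vals.append(value)

def remove_slot_value(frame, slot, value):
    vals = _values(frame, slot)
    if value in vals:
        vals.remove(value)

def replace_slot_value(frame, slot, old_value, new_value):
    kb[frame][slot] = [new_value if v == old_value else v for v in _values(frame, slot)]

def remove_local_slot_values(frame, slot):
    kb.setdefault(frame, {})[slot] = []

put_slot_values("sdhA", "Synonyms", ["a", "b"])
add_slot_value("sdhA", "Synonyms", "c")
remove_slot_value("sdhA", "Synonyms", "a")
replace_slot_value("sdhA", "Synonyms", "b", "d")
print(kb["sdhA"]["Synonyms"])  # ['d', 'c']

put_slot_value("sdhA", "Product", "Sdh-flavo")
remove_local_slot_values("sdhA", "Synonyms")
print(kb["sdhA"])  # {'Synonyms': [], 'Product': ['Sdh-flavo']}
```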
Additional Pathway Tools Functions – Semantic Inference Layer • Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB • http://bioinformatics.ai.sri.com/ptools/ptools-fns.html
Internal note • Refer to the local copy of ptools-fns.html to go through the semantic inference layer functions
File Import/Export Capabilities • PGDBs can be exported in whole or part to: • SBML – Systems Biology Markup Language – sbml.org • Import supported by many simulation packages • File -> Export -> Selected Reactions to SBML File • Pathway Tools Attribute-Value format and column-delimited format files • http://brg.ai.sri.com/ptools/flatfile-format.shtml • Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files • Dump selected frames to a single file: File -> Export -> Selected Frames to File
Import/Export • Import from attribute-value or column-delimited files • File -> Import -> Frames From File • Import/Export to/from internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations • Edit -> Add Pathway to File Export List • File -> Export -> Selected Pathways to File • File -> Import -> Pathways from File • Import/Export to/from MDL molfile format • Edit -> Import compound structure from molfile • Edit -> Export compound structure to molfile
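For readers who want to post-process an exported attribute-value file outside Pathway Tools, a minimal parser might look like the sketch below. The layout it assumes (one `ATTRIBUTE - value` pair per line, records terminated by a line containing only `//`, comment lines starting with `#`, and lines starting with `/` continuing the previous value) should be checked against the flat-file format page cited on the export slide; the function name and the sample records are invented for illustration.

```python
def parse_attribute_value(text):
    """Parse attribute-value text into a list of {attribute: [values]} records.

    Assumed layout: 'ATTR - value' lines, '//' ends a record,
    '#' starts a comment, a leading '/' continues the previous value.
    """
    records, current, last_attr = [], {}, None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue
        if line == "//":
            if current:
                records.append(current)
            current, last_attr = {}, None
        elif line.startswith("/") and last_attr:
            current[last_attr][-1] += "\n" + line[1:]
        elif " - " in line:
            attr, value = line.split(" - ", 1)
            current.setdefault(attr, []).append(value)
            last_attr = attr
    if current:
        records.append(current)
    return records

sample = """# illustrative data, not taken from a real PGDB
UNIQUE-ID - GENE-1
COMMON-NAME - sdhA
PRODUCT - PROTEIN-1
//
UNIQUE-ID - GENE-2
COMMON-NAME - sdhB
//
"""
records = parse_attribute_value(sample)
print(len(records))               # 2
print(records[0]["COMMON-NAME"])  # ['sdhA']
```

Values are kept as lists because a slot can be multivalued, in which case the attribute simply repeats within a record.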
Miscellaneous Exports • Overview -> Highlight -> Save to File • Overview -> Highlight -> Load from File • Gene / Protein Sequence / Save to file • Chromosome -> Show Sequence of a Segment of Replicon
Napster Comes to Bioinformatics • Public sharing of Pathway/Genome Databases • PGDB registry maintained by SRI at URL http://biocyc.org/registry.html • Registry operations • List contents of registry • Download PGDBs listed in the registry • Register PGDBs you have created
Registry Details • Why register your PGDB? • Declare existence of your PGDB in a central location • Facilitate download by other scientists • Why download a PGDB? • Desktop Navigator provides more functionality than Web • Comparative operations • Programmatic querying and processing of PGDB • Registration process • Registered PGDBs have open availability by default • Authors can provide their own license agreements • Registered PGDBs reside on authors’ FTP site
BioWarehouse • Biospice.org
New Import/Export Tools • Suggestions? • Volunteers?
Updating a PGDB From an External Genome DB • Example: AraCyc forms a pathway module to the TAIR DB • TAIR is authoritative source for gene and gene-product information • Update AraCyc to reflect updates in TAIR
Proposed Approach • Export TAIR to PathoLogic files • Build AraCyc2 from those PathoLogic files – automated PathoLogic only • Compare AraCyc1 (A1) to AraCyc2 (A2): • A. Import new genes/proteins from A2 to A1 • B. Delete from A1 genes/proteins not found in A2 • C. Rename genes/proteins whose names changed between A1 and A2 • Run name matcher on A1’ • Check for pathways with no enzymes and report them so the user can keep any that PathoLogic would otherwise delete • What about enzymes that were assigned to a pathway by the hole filler? • Re-run pathway predictor • Remember which pathways the user deletes so they are not re-predicted by PathoLogic • Consider movement of genes from contig to chromosome
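The comparison of A1 to A2 (import, delete, rename) reduces to set operations once genes can be matched across the two builds. The sketch below assumes each gene carries a stable accession that survives renaming; the slide does not say how genes are matched, so that key, the function name `plan_update`, and the sample data are all hypothetical.

```python
def plan_update(a1, a2):
    """Compare two builds, each a dict mapping a stable accession to a gene name.

    Returns the accessions to import into A1, the accessions to delete from A1,
    and (accession, old_name, new_name) triples for genes whose names changed.
    """
    shared = set(a1) & set(a2)
    return {
        "import": sorted(set(a2) - set(a1)),
        "delete": sorted(set(a1) - set(a2)),
        "rename": sorted((acc, a1[acc], a2[acc]) for acc in shared if a1[acc] != a2[acc]),
    }

# Hypothetical data: A1 is the curated PGDB, A2 the fresh automated build.
a1 = {"G1": "abc1", "G2": "def2", "G3": "ghi3"}
a2 = {"G1": "abc1", "G2": "def2-new", "G4": "jkl4"}

plan = plan_update(a1, a2)
print(plan["import"])  # ['G4']
print(plan["delete"])  # ['G3']
print(plan["rename"])  # [('G2', 'def2', 'def2-new')]
```

Producing a plan first, rather than editing A1 directly, leaves room for the review steps on the slide, such as reporting pathways that would lose all their enzymes before anything is deleted.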