1 / 99

Computing with Pathway/Genome Databases

Computing with Pathway/Genome Databases. Motivations for Understanding Pathway Tools Schema. When writing complex queries to PGDBs, those queries must refer to classes and slots within the schema Queries using Lisp, Perl, Java APIs Queries using Query Page

mattsons
Download Presentation

Computing with Pathway/Genome Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computing with Pathway/Genome Databases

  2. Motivations for UnderstandingPathway Tools Schema • When writing complex queries to PGDBs, those queries must refer to classes and slots within the schema • Queries using Lisp, Perl, Java APIs • Queries using Query Page • Queries using Structured Advanced Query Form

  3. Motivations for Understanding Schema • Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB • A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

  4. Pathway Tools Implementation Details • Platforms: • Macintosh, PC/Linux, and PC/Windows platforms • Same binary can run as desktop app or Web server • Production-quality software • Version control • Two regular releases per year • Extensive quality assurance • Extensive documentation • Auto-patch • Automatic DB-upgrade • 420,000 lines of Lisp code

  5. More Information • Pathway Tools Web Site, Tutorial Slides • http://bioinformatics.ai.sri.com/ptools/ • http://bioinformatics.ai.sri.com/ptools/examples.lisp • PerlCyc & JavaCyc API , includes some relationships • http://www.arabidopsis.org/tools/aracyc/perlcyc/ • http://www.arabidopsis.org/tools/aracyc/javacyc/ • Pathway Tools User’s Guide • Appendix: Guide to the Pathway Tools Schema • Curator's Guide • http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf • aic/pathway-tools/nav/12.0/lisp/relationships.lisp

  6. References • Ontology Papers section of http://biocyc.org/publications.shtml • "An Evidence Ontology for use in Pathway/Genome Databases" • "An ontology for biological function based on molecular interactions" • "Representations of metabolic knowledge: Pathways" • "Representations of metabolic knowledge"

  7. Data Exchange • APIs: Lisp API, Java API, and Perl API : read & modify • Cyclone • Export to files • BioPAX Export: since Pathway Tools 9.0 Biopax.org • Export PGDB genome to Genbank format • Export entire PGDB as column-delimited and attribute-value file formats • Export PGDB reactions as SBML -- sbml.org • Import/Export of Pathways: between PGDBs • Import/Export of Selected Frames, for Spreadsheets • Import/Export of Compounds as Molfile, CML • Registering/Publishing PGDBs on WWW • BioWarehouse : Loader for Flatfiles, SQL access • http://bioinformatics.ai.sri.com/biowarehouse/ • BMC Bioinformatics 7:170 2006

  8. Programmatic Access to BioCyc • Common LISP • Native language of Pathway Tools • Interactive & Mature Environment • Full Access to the Data & Many Utility Functions • Source code is available for academics • PerlCyc • API of Functions, Exposed to Perl • Communication through UNIX Socket • JavaCyc • API of Functions, Exposed to Java • Communication through UNIX Socket • Cyclone

  9. Cyclone • Developed by Schachter and colleagues from Genoscope • http://nemo-cyclone.sourceforge.net/archi.php • Cyclone is a Java-based system that: • Extracts data from a Pathway Tools PGDB • Converts it to an XML schema • Maps the data to Java objects and to a relational database • Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB

  10. Pathway Tools Data Model • PGDBs are object-oriented databases • Frame Representation System, named Ocelot • Frame data model • PGDB = Knowledge base = KB = Database = DB • Frames • Slots • PGDBs are stored in three possible ways • Preloaded into binary executable • Ocelot file: single-user • RDBMS: MySQL-4 or Oracle-10 : multi-user, change-logging • Query API: GFP (Generic Frame Protocol)

  11. Frames • Entities with which facts are associated • Kinds of frames: • Classes: Genes, Pathways, Biosynthetic Pathways • Instances (objects): trpA, TCA cycle • Classes: • Superclass(es), Subclass(es), Instance(s) • A symbolic frame name (id, key) uniquely identifies each frame • Examples: EG10223, TRP, Proteins

  12. Slots • Encode attributes and properties of a frame • Represent relationships between frames • The value of a slot is the identifier of another frame

  13. Slots • Number of values • Single valued • Multivalued: sets, bags • Slot values • Any LISP object: Integer, real, string, symbol (frame name) • Every slot is described by a “slot frame” in a KB that defines meta information about that slot • Datatype, classes it pertains to, constraints • Two slots are inverses if they encode opposite relationships • Slot Product in class Genes • Slot Gene in class Polypeptides

  14. Pathway Tools Ontology / Schema • Ontology classes: 1621 • Datatype classes: Define objects from genomes to pathways • Classification systems for pathways, chemical compounds, enzymatic reactions (EC system) • Protein Feature ontology • Controlled vocabularies: • Cell Component Ontology • Evidence codes • Comprehensive set of 248 attributes and relationships

  15. Root Classes in the Pathway ToolsOntology • Chemicals -- All molecules • Polymer-Segments -- Regions of polymers • Protein-Features -- Features on proteins • Paralogous-Gene-Groups • Organisms • Generalized-Reactions -- Reactions and pathways • Enzymatic-Reactions -- Link enzymes to reactions they catalyze • Regulation -- Regulatory interactions • CCO -- Cell Component Ontology • Evidence -- Evidence ontology • Notes -- Timestamped, person-stamped notes • Organizations • People • Publications

  16. Use GKB Editor to Inspect thePathway Tools Ontology • GKB Editor = Generic Knowledge Base Editor • Type in Navigator window: (GKB) or • [Right-Click] Edit->Ontology Editor • View->Browse Class Hierarchy • [Middle-Click] to expand hierarchy • To view classes or instances, select them and: • Frame -> List Frame Contents • Frame -> Edit Frame

  17. Schema Overview

  18. Principal Classes • Class names are capitalized, plural, separated by dashes • Genetic-Elements, with subclasses: • Chromosomes • Plasmids • Genes • Transcription-Units • RNAs • rRNAs, snRNAs, tRNAs, Charged-tRNAs • Proteins, with subclasses: • Polypeptides • Protein-Complexes

  19. Principal Classes • Reactions, with subclasses: • Transport-Reactions • Enzymatic-Reactions • Pathways • Compounds-And-Elements

  20. Principal Classes • Regulation • Regulation-of-Enzyme-Activity • Regulation-of-Transcription • Regulation-of-Transcription-Initiation • Transcriptional-Attenuation

  21. Example of a Single GFP Call • The General Pattern: gfp-function(frame-ID slot-ID value ...) (gfp-function frame-ID slot-ID value …) • LISP (get-slot-values 'TRYPSYN-RXN 'LEFT) ==> (INDOLE-3-GLYCEROL-P SER)

  22. Architecture of the API server –PerlCyc and JavaCyc • Works on Unix (Solaris or Linux) only • Start up Pathway Tools with the –api arg • Pathway Tools listens on a Unix socket – perl program communicates through this socket • Supports both querying and editing PGDBs • Must run perl or java program on the same machine that runs Pathway Tools • This is a security measure, as the API server has no built-in security • Can only handle one connection at a time

  23. Obtaining PerlCyc and JavaCyc Download from http://www.sgn.cornell.edu/downloads/ PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research. JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller. Easy to extend…

  24. GFP functions (require knowledge of Pathway Tools schema): get_slot_values get_class_all_instances put_slot_values Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html): genes_of_reaction find_indexed_frame pathways_of_gene transport_p getSlotValues getClassAllInstances putSlotValues genesOfReaction findIndexedFrame pathwaysOfGene transportP Examples of PerlCyc, JavaCyc Functions

  25. Writing a PerlCyc or JavaCyc program • Create a PerlCyc, JavaCyc object: perlcyc -> new (“ORGID”) new Javacyc (“ORGID”) • Call PerlCyc, JavaCyc functions on this object: my $cyc = perlcyc -> new (“ECOLI”); my @pathways = $cyc -> all_pathways (); Javacyc cyc = new Javacyc(“ECOLI”); ArrayList pathways = cyc.allPathways (); • Functions return object IDs, not objects. • Must connect to server again to retrieve attributes of an object. foreach my $p (@pathways) { print $cyc -> get_slot_value ($p, “COMMON-NAME”);} for (int i=0; I < pathways.size(); i++) { String pwy = (String) pathways.get(i); System.out.println (cyc.getSlotValue (pwy, “COMMON-NAME”); }

  26. Sample PerlCyc Query • Number of proteins in E. coli use perlcyc; my $cyc = perlcyc -> new (“ECOLI”); my @proteins = $cyc-> get_class_all_instances("|Proteins|"); my $protein_count = scalar(@proteins); print "Protein count: $protein_count.\n";

  27. Sample PerlCyc Query • Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5. use perlcyc; my $cyc = perlcyc -> new (“ECOLI”); foreach my $p ($cyc->get_class_all_instances("|Proteins|")) { my $mw = $cyc->get_slot_value($p, "molecular-weight-kd"); my $pI = $cyc->get_slot_value($p, "pi"); if ($mw <= 20 && $mw >= 10 && $pI <= 5 && $pI >= 4) { print "$p\n"; } }

  28. Sample PerlCyc Query • List all the transcription factors in E. coli, and the list of genes that each regulates: use perlcyc; my $cyc = perlcyc -> new (“ECOLI”); foreach my $p ($cyc->get_class_all_instances("|Proteins|")) { if ($cyc->transcription_factor_p($p)) { my $name = $cyc->get_slot_value($p, "common-name"); my %genes = (); foreach my $tu ($cyc->regulon_of_protein($p)) { foreach my $g ($cyc->transcription_unit_genes($tu)) { $genes{$g} = $cyc->get_slot_value($g, "common-name"); } } print "\n\n$name: "; print join " ", values %genes; } }

  29. Sample Editing Using PerlCyc • Add a link from each gene to the corresponding object in MY-DB (assume ID is same in both cases) use perlcyc; my $cyc = perlcyc -> new (“HPY”); my @genes = $cyc->get_class_all_instances (“|Genes|”); foreach my $g (@genes) { $cyc->add_slot_value ($g, “DBLINKS”, “(MY-DB \”$g\”)”); } $cyc->save_kb();

  30. Sample JavaCyc Query:Enzymes for which ATP is a regulator import java.util.*; public class JavacycSample { public static void main(String[] args) { Javacyc cyc = new Javacyc("ECOLI"); ArrayList regframes = cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|"); for (int i = 0; i < regframes.size(); i++) { String reg = (String)regframes.get(i); boolean bool = cyc.memberSlotValueP(reg, “Regulator", "ATP"); if (bool) { String enzrxn = cyc.getSlotValue (reg, “Regulated-Entity”); String enzyme = cyc.getSlotValue (enzrxn, “Enzyme”); System.out.println(enz); } } } }

  31. Simple Lisp Query Example:Enzymes for which ATP is a regulator (defun atp-inhibits () (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|) ;; Does the Regulator slot contain the compound ATP, and the mode ;; of regulation is negative (inhibition)? when (and (member-slot-value-p x ‘Regulator 'ATP) (member-slot-value-p x ‘Mode “-”) ) ;; Whenever the test is positive, we collect the value of the slot Enzyme ;; of the Regulated-Entity of the regulatory interaction frame. ;; The collected values are returned as a list, once the loop terminates. collect (get-slot-value (get-slot-value x ‘Regulated-Entity) ‘Enzyme) ) ) ;;; invoking the query: (select-organism :org-id 'ECOLI) (atp-inhibits) (get-slot-values 'TRYPSYN-RXN 'LEFT) ==> (INDOLE-3-GLYCEROL-P SER)

  32. Simple Perl Query Example:Enzymes for which ATP is a regulator use perlcyc; my $cyc = perlcyc -> new("ECOLI"); my @regs = $cyc -> get_class_all_instances("|Regulation-of-Enzyme-Activity|"); ## We check every instance of the class foreach my $reg (@regs) { ## We test for whether the INHIBITORS-ALL ## slot contains the compound frame ATP my $bool1 = $cyc -> member_slot_value_p($reg, “Regulator", "Atp"); my $bool2 = $cyc -> member_slot_value_p($reg, “Mode", “-"); if ($bool1 && $bool2) { ## Whenever the test is positive, we collect the value of the slot ENZYME . ## The results are printed in the terminal. my $enzrxn = $cyc -> get_slot_value($reg, “Regulated-Entity"); my $enz = $cyc -> get_slot_value($enzrxn, "Enzyme"); print STDOUT "$enz\n"; } }

  33. Getting started with Lisp • pathway-tools –lisp • (load “file”) (compile-file “file.lisp”) • Emacs is a useful editor • Pathway Tools source code is available: ask • Lisp resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

  34. Query Gotchas • Study schema carefully • :test #’fequal • Cascade of slot-values: check for NIL

  35. Semantic Inference Layerrelationships.lisp • Library of functions that encapsulate common query building blocks and intricacies of navigating the schema • enzymes-of-gene • reactions-of-gene • pathways-of-gene • genes-of-pathway • pathway-hole-p • reactions-of-compound • top-containers(protein) • all-rxns(type) (:metab-smm :metab-all :metab-pathways :enzyme :transport etc.) • (all-rxns :metab-pathways)

  36. Pathway Tools Schema and Semantic Inference LayerGenes, Operons, and Replicons

  37. Representing a Genome product components • Classes: • ORG is of class Organisms • CHROM1 is of class Chromosomes • PLASMID1 is of class Plasmids • Gene1 is of class Genes • Product1 is of class Polypeptides or RNA Product1 Gene1 Gene2 CHROM1 genome Gene3 CHROM2 ORG PLASMID1

  38. (defun genes-of-chrom (chrom) (loop for x in (get-slot-values chrom ‘components) when (instance-all-instance-of-p x ‘|Genes|) collect x) )

  39. Polynucleotides Review slots of COLI and of COLI-K12

  40. Genetic-Elements • Sequence is stored in a separate file or database table

  41. Polymer-Segments Review slots of Genes

  42. Complexities of Gene / Gene-ProductRelationships • The Product of a gene can be an instance of Polypeptides or RNAs • An instance of Polypeptides can have more than one gene encoding it • Sequence position: • Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position (usually greater, except at origin) • Transcription-Direction + / - • Alternative splicing: • Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position • Intron positions specified in Splice-Form-Introns of gene product • (200 300) (350 400)

  43. Gene Reaction Schematic

  44. Substring Search Example • Find all genes that contain a given substring within their common name or synonym list. (defun find-gene-by-substring (substring) (let (result) (loop for g in (get-class-all-instances '|Genes|) do (loop for name in (get-slot-values g 'names) when (search substring name :test #'string-equal) do (pushnew g result) ) ) result ) )

  45. Proteins

  46. Proteins and Protein Complexes • Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at gene level) • Protein complex: proteins consisting of multiple polypeptides or protein complexes • Example: DNA pol III • DnaE is a polypeptide • pol III core is DnaE and two other polypeptides • pol III holoenzymes is several protein complexes combined

  47. Protein Complex Relationships

  48. Slots of a protein (DnaE) • catalyzes • Is it a regulator/reactant/etc? • comment • component-of • dblinks • features (edited in feature editor) • Many other attributes possible

  49. A complex at the frame level (pol III) • Most of the same attributes as polypeptide frame • component-of and components • note coefficients

  50. Protein Complex Relationships

More Related