Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl. Randy Gobbel, Ph.D. May 14, 2003 gobbel@ai.sri.com. Overview. Why would you need to write complex queries? Emacs Lisp perlcyc The GFP API, and Pathway Tools-specific functions Examples and exercises.
Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl Randy Gobbel, Ph.D. May 14, 2003 gobbel@ai.sri.com
Overview • Why would you need to write complex queries? • Emacs • Lisp • perlcyc • The GFP API, and Pathway Tools-specific functions • Examples and exercises
When do you need complex queries? • Many common queries are accessible from the command menu • By name • By substring • By class • Others are specialized by the type of the object being displayed • Other queries of arbitrary complexity can be created by writing a (simple) program • Example: find all reactions with more than 5 citations
Programmatic Access to PGDBs • LISP and PERL languages used for programmatic queries and updates to PGDBs • Generic Frame Protocol (GFP) is API for PGDBs
Emacs • “The extensible, self-documenting editor” • (Most of the time) typing a printing character simply inserts it • Just like most Windows and MacOS programs • Control and Meta keys in combination with other keys run commands • Again, just like keyboard shortcuts in most programs • Control-H: Help • T -> tutorial, A -> apropos, W -> “where is <command>” • K -> “what does this key combination do?” • Many commands are now available from pulldown menus
Emacs • Three ways to run Pathway Tools from within Emacs • Use the Emacs/Lisp interface provided with Allegro Common Lisp (fi) • Use the free ILisp package (written in Emacs Lisp) • Run Pathway Tools from a shell within Emacs • Windows users: lowest-common-denominator • Cut and paste still works • Advantages of using Emacs with Lisp • Syntax highlighting • Automatic indentation • One-keystroke evaluation of Lisp forms in fi and ilisp
Lisp • An idea that keeps reinventing itself • Function, arguments • What is a list? • Unit of syntax: (a b c) • Unit of data: (a b c) • Unit of execution: (get-slot-value 'arca 'citations) • Most languages: function(arg1, arg2, …) • Fine for writing • Lisp: (function arg1 arg2 arg3 …) • Much easier to deal with in a computer
Lisp Data Types • Numbers • 1 • 1.325 • Strings • "hello" • Symbols • E.g.: ARCA (or, arcA) • Make a literal symbol by quoting it: 'ARCA • Case-sensitive symbols require vertical bars: '|Genes| • Special symbols: T and NIL • Used to mean True and False • NIL is also the empty list: ()
Lisp Expressions and Evaluation • (+ 3 4 5) • + is a function • (+ 3 4 5) is a function call with 3 arguments • Arguments are evaluated: • Numbers evaluate to themselves • If any of the args are themselves expressions, they are also evaluated • (+ 1 (+ 3 4)) => 8 • The values of the args are passed to the function • Some functions allow variable numbers of arguments • (+) => 0 • (+ 1) => 1 • (+ 2 3 1 3 4 5 6) => 24 • (+ (* 3 4) 6) => 18
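The evaluation rule on this slide can be tried in any Common Lisp listener, with no Pathway Tools functions involved. A minimal sketch; the variable names `*inner*` and `*outer*` are made up for illustration:

```lisp
;; Arguments are evaluated before the function is called,
;; so the inner (+ 3 4) becomes 7 before the outer + runs.
(defparameter *inner* (+ 3 4))         ; 7
(defparameter *outer* (+ 1 (+ 3 4)))   ; 1 + 7 = 8

;; + accepts any number of arguments, including none.
(print (list (+) (+ 1) (+ (* 3 4) 6))) ; prints (0 1 18)
```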
The Lisp Listener • Also called "top level" and "read-eval-print loop" • Uses a three-step process • Read • Reader converts elements outside "" and || to uppercase • Evaluate • Print • Anything you type in is evaluated • 1 => 1 • "hello" => "hello" • (+ 2 3) => 5 • Quoting prevents evaluation • '(+ 2 3) => (+ 2 3) • Setting a symbol to a value creates a variable: • (setq foo '(a b c)) => (A B C) • foo => (A B C) • No declarations required!
The Lisp Listener • Useful forms in listener: • Previous Results: *, **, *** • But: not in programs • (+ 1 2) => 3 • (+ 3 *) => 6 • ** => 3
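The effect of quoting and of the reader's case conversion can be checked with a few lines of plain Common Lisp. This is a sketch that assumes the standard (upcasing) reader settings; it uses `defparameter` rather than a bare `setq` only because some Lisp implementations warn about `setq` on an undeclared variable, and the variable names are made up for illustration:

```lisp
;; Without a quote, the list is a function call and gets evaluated.
(defparameter *evaluated* (+ 2 3))    ; 5
;; With a quote, the same list is returned as data, unevaluated.
(defparameter *quoted* '(+ 2 3))      ; the three-element list (+ 2 3)

;; The reader upcases symbol names unless they are wrapped in vertical bars.
(print (symbol-name 'arca))           ; prints "ARCA"
(print (symbol-name '|Genes|))        ; prints "Genes"
```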
Dealing with the Lisp debugger • Error conditions result in a call to the Lisp debugger: • :continue continues, a numeric argument selects between possible options • Lower-numbered options generally take less drastic actions • :reset unwinds to the top level • WARNING: may exit the Pathway Tools window! • :zoom displays the stack
EC(4): (xxx)
*debugger-hook* called.
Error: Attempt to take the value of the unbound variable `X'.
[condition type: UNBOUND-VARIABLE]
Restart actions (select using :continue):
0: Try evaluating X again.
1: Use :X instead.
2: Set the symbol-value of X and use its value.
3: Use a value without setting X.
4: Return to Top Level (an "abort" restart).
5: Abort entirely from this process.
[1] EC(5): :res
Lisp Variables • Global variable values can be set and used during a session • Declarations not needed • (setq x 5) => 5 • x => 5 • (+ 3 x) => 8 • (setq y "atgc") => "atgc"
Equality in LISP • Internally LISP refers to objects via pointers • Fundamental equality operation is EQ • True if the two arguments point to the same object • Very efficient • Other comparison operators: • = for numbers: (= x 4) • EQUAL for list structures or exact string matching: (equal x "abc") • STRING-EQUAL for case-insensitive string matching: (string-equal x "AbC") • EQL for characters: (eql x #\A) • EQ for list structures or symbols (compares pointers): (eq x 'ABC) • FEQUAL for frames: (fequal x 'trp) • Simple rule: Use EQUAL for everything except frames
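The difference between pointer equality and structural equality is easiest to see side by side. A minimal sketch using only standard Common Lisp operators (FEQUAL is specific to Pathway Tools and is not shown here):

```lisp
(let ((a (list 1 2 3))
      (b (list 1 2 3)))
  (print (eq a a))                   ; T: the very same object
  (print (eq a b))                   ; NIL: two distinct lists
  (print (equal a b)))               ; T: same structure and contents

(print (= 4 4.0))                    ; T: numeric comparison
(print (equal "abc" "AbC"))          ; NIL: EQUAL on strings is case-sensitive
(print (string-equal "abc" "AbC"))   ; T: case-insensitive
(print (eql #\A #\A))                ; T: same character
```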
Functions for Operating on Lists • length • (length x) • Returns the number of elements • first • (first x) • Returns the first element • nth • (nth j x) • Returns the Jth element of list X (element 0 is the first element)
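A short sketch of the three list functions applied to a quoted list. The list contents are arbitrary symbols chosen for illustration, not the result of a real PGDB query:

```lisp
(defparameter *names* '(trp arg lys his))

(print (length *names*))   ; prints 4
(print (first *names*))    ; prints TRP
(print (nth 0 *names*))    ; prints TRP (indexing starts at 0)
(print (nth 2 *names*))    ; prints LYS
```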
loop • Loop allows you to iterate • Through a series of numbers • for i from 1 to 10 • Through a list • for rxn in rxns • Conditionals control whether the following clause is executed • when (> (length (get-slot-values rxn 'citations)) 5) • do lets you do something • do (setq total (+ i total)) • collect lets you gather up values • collect (get-frame-name rxn)
loop • You can combine as many loop clauses as you need:
(loop for i from 1 to 10
      for j from 10 downto 1
      do (print (+ i j))
      collect (* i j))
=> (10 18 24 28 30 30 28 24 18 10)
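The `when` and `collect` clauses are the ones used most in queries. A minimal sketch on made-up data (the names and citation counts in `*citation-counts*` are invented and do not come from any PGDB):

```lisp
;; Invented (name . citation-count) pairs standing in for reactions.
(defparameter *citation-counts* '((rxn-a . 7) (rxn-b . 2) (rxn-c . 6)))

;; Collect the names whose count exceeds 5.
(defparameter *well-cited*
  (loop for (name . count) in *citation-counts*
        when (> count 5)
          collect name))
(print *well-cited*)                     ; prints (RXN-A RXN-C)

;; sum accumulates a total without an explicit variable.
(print (loop for i from 1 to 10 sum i))  ; prints 55
```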
Defining Functions • Put function definitions in a file • Reload the file when definitions change • EC(1): :ld my-queries.lisp • (defun <name> (<arguments>) … code for function …) • Creates a new operation called <name> • Examples: • (defun square (x) (* x x)) • (defun message () (print "Hello")) • (defun test-fn () 1 2 3 4)
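A function returns the value of the last form in its body, and functions can call each other. A small sketch building on the `square` example; `sum-of-squares` is a made-up helper for illustration:

```lisp
(defun square (x)
  (* x x))

(defun sum-of-squares (numbers)
  "Return the sum of the squares of the numbers in the list NUMBERS."
  (loop for n in numbers sum (square n)))

;; Only the last form in the body supplies the return value.
(defun test-fn ()
  1 2 3 4)

(print (sum-of-squares '(1 2 3)))   ; prints 14
(print (test-fn))                   ; prints 4
```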
Accessing Lisp from Pathway Tools • Starting Pathway Tools for Lisp work: • > pathway-tools -lisp • EC(1): (select-organism :org-id 'XXX) • Windows: pathway-tools-lisp.exe • Lisp expressions can be typed at any time to the Pathway Tools listener • Command: (get-slot-value 'trp 'common-name) => "L-tryptophan" • Invoking the Navigator from Lisp: • EC(2): (eco)
The perlcyc API • Written by Lukas Mueller at TAIR • Downloadable from the TAIR Web site • Installs as a standard CPAN module • From within Pathway Tools, start the server by hand: • (start-external-access-daemon) • (start-external-access-daemon :verbose? t) for tracing output • Function names are the same as Lisp, with hyphens replaced by underscores, question marks by _p • get-class-all-instances -> get_class_all_instances • coercible-to-frame? -> coercible_to_frame_p • Pathway Tools functions are callable as standard Perl functions • Frame names are symbols which can be passed back to Lisp • Control structures are standard Perl
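The naming rule (hyphens become underscores, a question mark becomes `_p`) is mechanical enough to write down. The function `perl-name` below is a hypothetical helper written for illustration; it is not part of perlcyc or Pathway Tools:

```lisp
(defun perl-name (lisp-name)
  "Convert a Lisp function name (a string) to the perlcyc naming style."
  (with-output-to-string (out)
    (loop for ch across (string-downcase lisp-name)
          do (cond ((char= ch #\-) (write-char #\_ out))     ; hyphen -> underscore
                   ((char= ch #\?) (write-string "_p" out))  ; ? -> _p
                   (t (write-char ch out))))))

(print (perl-name "get-class-all-instances"))  ; prints "get_class_all_instances"
(print (perl-name "coercible-to-frame?"))      ; prints "coercible_to_frame_p"
```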
javacyc • Uses the same Unix domain socket interface as perlcyc • Function names use Java conventions • get-slot-values -> getSlotValues • Includes a C library for Unix domain sockets
Lisp vs. Perl • Task: find all reactions with fewer than 5 citations • Perl:
use perlcyc;
my $cyc = perlcyc->new("ECOLI");
my @found;
foreach my $r ($cyc->all_rxns()) {
    # Functions are called as methods on the perlcyc object
    my @citations = $cyc->get_slot_values($r, "citations");
    if (scalar(@citations) < 5) {
        push @found, $r;
    }
}
• Lisp:
(loop for r in (all-rxns)
      when (< (length (get-slot-values r 'citations)) 5)
      collect r)
Pathway Tools User Accessible Functions • Internal Pathway Tools functions that users can call • Includes: • Generic Frame Protocol (GFP), the Ocelot object database API • Additional functions specific to Pathway Tools • For more information see • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Generic Frame Protocol (GFP) • A library of Lisp functions for accessing Ocelot DBs • GFP specification: • http://www.ai.sri.com/~gfp/spec/paper/paper.html • A small number of GFP functions are sufficient for most complex queries
Generic Frame Protocol • (get-class-all-instances Class) • Returns the instances of Class • Key Pathway Tools classes: • Genetic-Elements • Genes • Proteins • Polypeptides (a subclass of Proteins) • Protein-Complexes (a subclass of Proteins) • Pathways • Reactions • Compounds-And-Elements • Enzymatic-Reactions • Transcription-Units • Promoters • DNA-Binding-Sites
Generic Frame Protocol • Note: Frame.Slot means a specified slot of a specified frame • Frame and Slot must be symbols! • (get-slot-value Frame Slot) • Returns first value of Frame.Slot • (get-slot-values Frame Slot) • Returns all values of Frame.Slot as a list • (slot-has-value-p Frame Slot) • Returns T if Frame.Slot has at least one value • (member-slot-value-p Frame Slot Value) • Returns T if Value is one of the values of Frame.Slot • (print-frame Frame) • Prints out the contents of Frame
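To show how the read functions relate to each other without a running PGDB, here is a sketch over a toy in-memory table. Everything in it is an assumption made for illustration: the `toy-` functions are stand-ins that mimic the behavior described on the slide, not the real GFP implementations, and the frames and values in `*toy-kb*` are invented:

```lisp
;; Toy knowledge base: each entry is (frame-name slot values slot values ...).
(defparameter *toy-kb*
  '((rxn-1 citations ("c1" "c2" "c3" "c4" "c5" "c6") substrates (trp water))
    (rxn-2 citations ("c1") substrates (arg))))

(defun toy-get-slot-values (frame slot)
  "Return all values of FRAME.SLOT as a list, or NIL if there are none."
  (getf (rest (assoc frame *toy-kb*)) slot))

(defun toy-get-slot-value (frame slot)
  "Return the first value of FRAME.SLOT."
  (first (toy-get-slot-values frame slot)))

(defun toy-slot-has-value-p (frame slot)
  "Return T if FRAME.SLOT has at least one value."
  (not (null (toy-get-slot-values frame slot))))

(defun toy-member-slot-value-p (frame slot value)
  "Return T if VALUE is one of the values of FRAME.SLOT."
  (not (null (member value (toy-get-slot-values frame slot) :test #'equal))))

;; The query from the introduction: frames with more than 5 citations.
(print (loop for entry in *toy-kb*
             for frame = (first entry)
             when (> (length (toy-get-slot-values frame 'citations)) 5)
               collect frame))   ; prints (RXN-1)
```

With a real PGDB loaded, the same loop shape would use the GFP functions named on the slide in place of the `toy-` stand-ins.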
More useful functions • (coercible-to-frame-p Thing) • Returns T if Thing is the name of a frame, or a frame object • (save-kb) • Saves the current KB • (replace-answer-list <list of frames>) • Makes the specified frames browseable via the Pathway Tools GUI
Generic Frame Protocol - Update Operations • (put-slot-value Frame Slot Value) • Replace the current value(s) of Frame.Slot with Value • (put-slot-values Frame Slot Value-List) • Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values • (add-slot-value Frame Slot Value) • Add Value to the current value(s) of Frame.Slot, if any • (remove-slot-value Frame Slot Value) • Remove Value from the current value(s) of Frame.Slot • (replace-slot-value Frame Slot Old-Value New-Value) • In Frame.Slot, replace Old-Value with New-Value • (remove-local-slot-values Frame Slot) • Remove all of the values of Frame.Slot
Additional Pathway Tools Functions - Semantic Inference Layer • Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB • http://bioinformatics.ai.sri.com/ptools/ptools-fns.html
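The update operations differ mainly in whether they replace or extend the existing value list. The sketch below mimics those semantics on a toy hash table. The `toy-` functions and the stored values are hypothetical stand-ins; details such as value ordering or duplicate handling in the real GFP functions may differ:

```lisp
;; Toy store keyed by (frame slot); values are lists.
(defparameter *toy-slots* (make-hash-table :test #'equal))

(defun toy-values (frame slot)
  (gethash (list frame slot) *toy-slots*))

(defun toy-put-slot-values (frame slot value-list)
  "Replace the current values of FRAME.SLOT with VALUE-LIST."
  (setf (gethash (list frame slot) *toy-slots*) (copy-list value-list)))

(defun toy-put-slot-value (frame slot value)
  "Replace the current values of FRAME.SLOT with the single VALUE."
  (toy-put-slot-values frame slot (list value)))

(defun toy-add-slot-value (frame slot value)
  "Add VALUE to the current values of FRAME.SLOT, if any."
  (toy-put-slot-values frame slot (append (toy-values frame slot) (list value))))

(defun toy-remove-slot-value (frame slot value)
  "Remove VALUE from the current values of FRAME.SLOT."
  (toy-put-slot-values frame slot
                       (remove value (toy-values frame slot) :test #'equal)))

(defun toy-replace-slot-value (frame slot old-value new-value)
  "In FRAME.SLOT, replace OLD-VALUE with NEW-VALUE."
  (toy-put-slot-values frame slot
                       (substitute new-value old-value (toy-values frame slot)
                                   :test #'equal)))

(toy-put-slot-values 'trp 'synonyms '("Trp" "W"))
(toy-add-slot-value 'trp 'synonyms "tryptophan")
(toy-remove-slot-value 'trp 'synonyms "W")
(toy-replace-slot-value 'trp 'synonyms "Trp" "L-Trp")
(print (toy-values 'trp 'synonyms))   ; prints ("L-Trp" "tryptophan")
```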
GKB editor • GUI for browsing the frame hierarchy • Command: Special -> Taxonomy Viewer • View -> Browse Class Hierarchy (ctrl-B) • Allows viewing of classes, slots, and instances • You can’t write a query unless you know the exact class and slot names • Class names are usually case-sensitive symbols • |Genes|, |Proteins|, …
LISP and GFP References • Common LISP, the Language -- The standard reference • Paper edition by Guy Steele • Online version • http://www.lispworks.com/reference/HyperSpec/Front/index.htm • Information on writing Pathway Tools queries: • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html • http://www.ai.sri.com/pkarp/loop.html • http://bioinformatics.ai.sri.com/ptools/debugger.html
Pathway Tools information Web site • Top-level page • http://www.biocyc.org/ • General Pathway Tools information • http://bioinformatics.ai.sri.com/ptools/ • How to submit a bug report • http://bioinformatics.ai.sri.com/ptools/bug.html • Writing queries, introductions to Lisp, etc. • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Examples • (select-organism :org-id 'ecoli) => ECOLI • (setq genes (get-class-all-instances '|Genes|)) => (……………) • (setq monomers (get-class-all-instances '|Polypeptides|)) => (…………….) • (setq genes2 genes) => (…………….)
Problems • all-substrates • enzymes-of-reaction • genes-of-reaction • genes-of-pathway • monomers-of-protein • genes-of-enzyme
Example Session • (setq x 'trp) => TRP • (get-slot-value x 'common-name) => "L-tryptophan" • (setq aas (get-class-all-instances '|Amino-Acids|)) => (……..) • (loop for x in aas count x) => 20
Example Session • (loop for x in genes for name = (get-slot-value x 'common-name) when (and name (search "trp" name)) collect x) => (…) • (setq rxns (get-class-all-instances '|Reactions|)) => (…) • (loop for x in rxns when (member-slot-value-p x 'substrates 'trp) collect x) => (…) • (replace-answer-list *)
Example Session • (setq x '(trp arg)) => (TRP ARG) • (replace-answer-list x) => (TRP ARG) • (eco)
How to write a good bug report • Use dribble-bug • (excl:dribble-bug "bug.txt") to start dribbling • (excl:dribble-bug) to stop • How to get out of the debugger • :bt - short backtrace of what functions are being called • :zoom - more detailed trace • :cont <n> - continue. Lower numbers are less drastic • Be specific, and as detailed as you can stand • What button/key did you push? • Which screen/editor were you using at the time? • What object were you viewing/editing? • Try to find a reproducible test case if at all possible!
How to use autopatch • Patches load automatically on startup, or: • Special -> Install Patches • Download and install • Or simply install • Goes to our Web server, gets patches, and installs them • Restarting is usually not required • Functions are redefined on the fly • But: if the patch involved initialization, you might need to restart