The Ocelot Frame Knowledge Representation System Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International pkarp@ai.sri.com
Frame Knowledge Representation Systems • Long history of development in the AI knowledge representation community • Distant cousin of object-oriented databases (convergent evolution) • Background reading on frame systems • P. Karp, “The design space of frame knowledge representation systems” • http://www.ai.sri.com/pubs/files/236.pdf • P. Karp, “Distinguishing Knowledge Bases and Data Bases: Who's on First and What's on Second” • http://www.ai.sri.com/pubs/files/1397.pdf
Ocelot Information • P.D. Karp et al., “A collaborative environment for authoring large knowledge bases,” J Intelligent Information Systems 13:155-94, 1999. http://www.ai.sri.com/pkarp/pubs/99jiis.pdf • “Ocelot User’s Guide” http://www.ai.sri.com/pkarp/ocelot/
Ocelot Data Model • Ocelot database • Aka DB, Knowledge Base, KB, PGDB • An Ocelot database is a collection of frames and slots
Ocelot Frames • Two kinds of frames: • Classes: Genes, Pathways, Biosynthetic Pathways • Instances (objects): trpA, TCA cycle • A symbolic frame name (id, key) uniquely identifies each frame • Examples: EG10223, TRP, Proteins • Classes have Superclass(es), Subclass(es), Instance(s) • Instances have one or more parent classes
Slots • Encode attributes and properties of a frame • Molecular weight, gene coordinates, comments • Represent relationships between frames • The value of a slot is the identifier of another frame
Slots • Number of values • Single valued • Multivalued: sets or lists • Slot values • Integer, real, string, symbol (frame name) • Every slot is described by a “slot frame” (slotunit) in a KB that defines meta information about that slot • Datatype, classes it pertains to, constraints • Enumerations • Two slots are inverses if they encode opposite relationships • Slot Product in class Genes • Slot Gene in class Polypeptides
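The frame and slot model on the preceding slides can be sketched in a few lines of Python. This is an illustrative toy, not Ocelot's implementation or API: the `Frame` and `KB` classes, their method names, and the frame id `POLYPEPTIDE-1` are all hypothetical. It assumes that declaring two slots as inverses means a value stored in one is mirrored in the other.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: a uniquely named class or instance with slot values."""
    name: str                                     # symbolic frame id
    is_class: bool = False
    parents: list = field(default_factory=list)   # names of parent classes
    slots: dict = field(default_factory=dict)     # slot name -> list of values


class KB:
    """Toy knowledge base: a collection of frames plus inverse-slot metadata."""

    def __init__(self):
        self.frames = {}     # frame name -> Frame
        self.inverses = {}   # slot name -> name of its inverse slot

    def add_frame(self, name, is_class=False, parents=()):
        # Frame names must be unique within the KB.
        if name in self.frames:
            raise ValueError(f"frame name already in use: {name}")
        self.frames[name] = Frame(name, is_class, list(parents))
        return self.frames[name]

    def declare_inverse(self, slot, inverse):
        # Record the relationship in both directions.
        self.inverses[slot] = inverse
        self.inverses[inverse] = slot

    def add_value(self, frame, slot, value):
        values = self.frames[frame].slots.setdefault(slot, [])
        if value not in values:
            values.append(value)
        # Assumption: if the slot has an inverse and the value names a frame,
        # mirror the link on that frame.
        inverse = self.inverses.get(slot)
        if inverse is not None and value in self.frames:
            back = self.frames[value].slots.setdefault(inverse, [])
            if frame not in back:
                back.append(frame)


kb = KB()
kb.add_frame("Genes", is_class=True)
kb.add_frame("Polypeptides", is_class=True)
kb.add_frame("EG10223", parents=["Genes"])
kb.add_frame("POLYPEPTIDE-1", parents=["Polypeptides"])  # hypothetical id
kb.declare_inverse("Product", "Gene")
kb.add_value("EG10223", "Product", "POLYPEPTIDE-1")
print(kb.frames["POLYPEPTIDE-1"].slots)
```

Setting `Product` on the gene frame also fills in `Gene` on the polypeptide frame, which is the behavior the inverse-slot bullet describes.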
Ocelot Schema • Schema is stored within the DB • Schema is self documenting • Slot frames define metadata about slots • Schema evolution facilitated by • Easy addition/removal of slots, or alteration of slot datatypes • Flexible data formats that do not require dumping/reloading of data • New versions of Pathway Tools include a schema upgrade function • Updates schema to match that of new MetaCyc version • Transforms data into new schema
Ocelot Storage Subsystem • RDBMS KBs • RDBMS schema is independent of application schema • DBMS is submerged within Ocelot, invisible to users • Frames transferred from DBMS to Ocelot • On demand • By background prefetcher • Memory cache • Persistent disk cache speeds up access over the Internet
Ocelot Frame Faulting • When a frame is referenced by Pathway Tools • Look in Ocelot virtual memory • Look in disk cache • Look in RDBMS
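The three-step lookup above can be sketched as a tiered cache. This is a minimal illustration, not Ocelot's code: the `FrameStore` class is hypothetical, plain dictionaries stand in for the disk cache and the RDBMS, and the sketch assumes that a frame found in a slower tier is copied into the faster tiers.

```python
class FrameStore:
    """Toy frame faulting: memory first, then disk cache, then RDBMS."""

    def __init__(self, rdbms, disk_cache=None):
        self.memory = {}                                   # in-memory frames
        self.disk_cache = {} if disk_cache is None else disk_cache
        self.rdbms = rdbms                                 # stand-in for the DBMS
        self.trace = []            # which tier satisfied each lookup

    def get_frame(self, name):
        # 1. Look in memory.
        if name in self.memory:
            self.trace.append((name, "memory"))
            return self.memory[name]
        # 2. Look in the disk cache.
        if name in self.disk_cache:
            self.trace.append((name, "disk"))
            frame = self.disk_cache[name]
        else:
            # 3. Fault the frame in from the RDBMS (KeyError if absent).
            frame = self.rdbms[name]
            self.trace.append((name, "rdbms"))
            self.disk_cache[name] = frame   # assumed: populate the disk cache
        self.memory[name] = frame           # assumed: populate memory
        return frame


store = FrameStore(rdbms={"TRP": {"Comment": ["example frame"]}})
store.get_frame("TRP")   # first reference goes all the way to the RDBMS
store.get_frame("TRP")   # second reference is served from memory
print(store.trace)
```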
Ocelot RDBMS Transaction History • RDBMS KBs store complete transaction history • Stored as sequences of GFP operations executed by the user or by Pathway Tools • Right click -> Show -> Changes in pop-up window • Used to compute gene last-curated date • Can be used to open a PGDB in an earlier state
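Opening a PGDB in an earlier state follows naturally from storing every transaction as a sequence of operations: replay the log up to the chosen transaction. The sketch below illustrates the idea only. The operation names (`create`, `put`, `delete`) and the log layout are hypothetical, not the actual GFP operations or Ocelot's storage format.

```python
def replay(log, upto=None):
    """Rebuild KB state by replaying logged transactions in order.

    log  -- list of (transaction_id, [operations]) sorted by transaction id
    upto -- last transaction id to apply; None replays the whole log
    """
    kb = {}   # frame name -> {slot name -> list of values}
    for txn_id, ops in log:
        if upto is not None and txn_id > upto:
            break
        for op in ops:
            kind = op[0]
            if kind == "create":
                kb[op[1]] = {}
            elif kind == "put":
                _, frame, slot, values = op
                kb[frame][slot] = list(values)
            elif kind == "delete":
                del kb[op[1]]
            else:
                raise ValueError(f"unknown operation: {kind}")
    return kb


log = [
    (1, [("create", "P"), ("put", "P", "MW", [3])]),
    (2, [("put", "P", "MW", [4])]),
    (3, [("delete", "P")]),
]
print(replay(log, upto=1))   # state after the first transaction
print(replay(log, upto=2))   # state after the second transaction
print(replay(log))           # current state
```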
Ocelot RDBMS Concurrency Control • When user A saves updates: • Ocelot queries all transactions that occurred since A last saved or since the start of A’s session • Ocelot compares the operations in those transactions with the updates made by A • If conflicts are found, save does not occur and conflicts are reported to the user • If no conflicts, save proceeds • Other user transactions are evaluated into A’s session • “Refresh”
Ocelot Update Conflicts • Example conflicting updates: • User A deletes frame F; User B modifies a slot value of F • User A changes MW of protein P from 3 to 4; User B changes MW of protein P from 3 to 5 • Example of updates that don’t conflict: • User A updates frame E; User B updates frame F • User A updates the value of P.MW; User B updates the value of P.pI • Users A and B both delete all values of P.MW
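The save-time check and the examples above can be sketched as a pairwise comparison of operations. This is a simplified illustration under stated assumptions, not Ocelot's algorithm: the operation tuples (`delete-frame`, `set-slot`) are hypothetical, and the rule that identical operations never conflict is generalized from the "both delete all values of P.MW" example.

```python
def conflicts(mine, theirs):
    """Return pairs of operations that conflict.

    Each operation is ("delete-frame", frame) or
    ("set-slot", frame, slot, values), with values given as a tuple.
    """
    found = []
    for a in mine:
        for b in theirs:
            if a[1] != b[1]:
                continue          # different frames never conflict
            if a == b:
                continue          # identical operations: same end result
            if a[0] == "delete-frame" or b[0] == "delete-frame":
                found.append((a, b))   # one side deleted a frame the other changed
            elif a[2] == b[2]:
                found.append((a, b))   # same slot set to different values
    return found


def try_save(mine, committed_since_last_save):
    """Save only if no conflicts are found; otherwise report them."""
    found = conflicts(mine, committed_since_last_save)
    return (len(found) == 0, found)


# Conflicting: A deletes F, B modifies a slot of F.
print(try_save([("delete-frame", "F")], [("set-slot", "F", "Comment", ("x",))]))
# Conflicting: both change P.MW, to different values.
print(try_save([("set-slot", "P", "MW", (4,))], [("set-slot", "P", "MW", (5,))]))
# Not conflicting: different slots of the same frame.
print(try_save([("set-slot", "P", "MW", (4,))], [("set-slot", "P", "pI", (5,))]))
# Not conflicting: both delete all values of P.MW.
print(try_save([("set-slot", "P", "MW", ())], [("set-slot", "P", "MW", ())]))
```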
Revert KB Operation • Undoes all changes in current session
Pathway Tools / BioCyc Software/Database Bundles • Each downloadable Pathway Tools configuration contains a combination of PGDBs • Those PGDBs are loaded into Lisp virtual memory • Build process: • Start Common Lisp • Load all Pathway Tools compiled Lisp code into virtual memory • Load all PGDBs for that configuration into virtual memory • Save virtual memory image as binary executable file
“Full BioCyc” or Tier 1+2+3 Configuration • 507 PGDBs loaded into virtual memory
BioCyc at 10,000 Genomes • Scalability of current approach is limited • New approach: For full BioCyc, store PGDBs not in virtual memory but in Franz AllegroCache • AllegroCache is a Common Lisp object-oriented database • Implementation now in hand for Ocelot • We have done extensive performance testing • Performance looks good to 10,000 PGDBs