GUS Plugin System Michael Saffitz Genomics Unified Schema Workshop July 6-8th, Philadelphia, Pennsylvania
Plugin Overview
• Small Perl programs that load and manipulate data within GUS
• Written using the GUS Plugin API and Perl Object Layer
• Provide automatic support for:
  • Data Provenance
  • Object layer and database connectivity
  • Standardized documentation
  • Command line argument processing
  • Logging
  • Error Handling
• “Supported” and “Community” Plugins are provided with GUS
Supported Plugins
• Have been tested in Oracle and Postgres and are confirmed to work
• Portable
• Useful beyond the site that developed them
• Meet the GUS Plugin Standard
Community Plugins
• Fail to meet one or more of the criteria above
• Have not been tested
• Provided as a general resource to the community
Plugin Life Cycle
• Plugin Initialization
  • Documentation
  • Command Line Arguments
• Data Loading
  • Reading, Parsing, Querying
• Data Manipulation
  • Insert or Update?
  • Restart Logic
• Data Submission
GUS Supported Plugins
• InsertArrayDesignControl.pm
• InsertAssayControl.pm
• InsertBlastSimilarities.pm
• InsertExternalDatabase.pm
• InsertExternalDatabaseRls.pm
• InsertGOEvidenceCode.pm
• InsertGeneOntology.pm
• InsertGeneOntologyAssoc.pm
• InsertRadAnalysis.pm
• InsertReviewStatus.pm
• InsertSecondaryStructure.pm
• InsertSequenceOntology.pm
• LoadArrayDesign.pm
• LoadArrayResults.pm
• LoadFastaSequences.pm
• LoadGusXml.pm
• LoadNRDB.pm
• LoadRow.pm
• LoadTaxon.pm
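Plugin Shell
    package GUS::Supported::Plugin::LoadRow;

    use strict;
    use GUS::PluginMgr::Plugin;

    # Inherit the plugin API from the plugin manager's base class
    our @ISA = qw(GUS::PluginMgr::Plugin);

    sub new { … }
    sub run { … }

    1;   # a Perl module must end by returning a true value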
Plugin Initialization
    sub new {
        my ($class) = @_;
        my $self = {};
        bless($self, $class);

        $self->initialize({
            requiredDbVersion => 3.5,
            cvsRevision       => '$Revision: 2934 $',   # keyword expanded by CVS on commit
            name              => ref($self),             # the plugin's package name
            argsDeclaration   => $argsDeclaration,       # see Declaring Arguments
            documentation     => $documentation,         # see Declaring Documentation
        });

        return $self;
    }
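To see the shell and the initialize() call working outside a GUS installation, here is a minimal, runnable sketch. It assumes a hypothetical stand-in base class, Hypothetical::PluginBase, in place of GUS::PluginMgr::Plugin; the stand-in only stores the hash passed to initialize(), so it says nothing about what the real plugin manager does with that hash. It does show one useful detail: name => ref($self) evaluates to the subclass's package name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for GUS::PluginMgr::Plugin, NOT the real class:
# it only records the configuration passed to initialize().
package Hypothetical::PluginBase;

sub initialize {
    my ($self, $config) = @_;
    $self->{config} = $config;
    return $self;
}

sub config { my ($self) = @_; return $self->{config}; }

# Example plugin following the shell above.
package Hypothetical::Plugin::LoadRow;
our @ISA = ('Hypothetical::PluginBase');

# In a real plugin these are file-scoped lexicals declared in the module.
my $argsDeclaration = [];
my $documentation   = { purposeBrief => 'Load a single row' };   # hypothetical text

sub new {
    my ($class) = @_;
    my $self = {};
    bless($self, $class);
    $self->initialize({
        requiredDbVersion => 3.5,
        cvsRevision       => '$Revision: 2934 $',
        name              => ref($self),
        argsDeclaration   => $argsDeclaration,
        documentation     => $documentation,
    });
    return $self;
}

sub run {
    my ($self) = @_;
    return 'Nothing to do';
}

package main;

my $plugin = Hypothetical::Plugin::LoadRow->new();
print $plugin->config->{name}, "\n";   # prints Hypothetical::Plugin::LoadRow
```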
Declaring Arguments
    my $argsDeclaration = [
        stringArg({ name           => 'externalDatabaseVersion',
                    descr          => 'sres.externaldatabaserelease.version for this instance of NRDB',
                    constraintFunc => undef,
                    reqd           => 1,
                    isList         => 0,
                  }),
        fileArg({ name           => 'gitax',
                  descr          => 'pathname for the gi_taxid_prot.dmp file',
                  constraintFunc => undef,
                  reqd           => 1,
                  isList         => 0,
                  mustExist      => 1,       # the file must already exist
                  format         => 'Text',
                }),
    ];
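The plugin manager processes the command line for you based on these declarations. As a rough illustration of what the reqd, isList, and mustExist keys mean, the sketch below applies those checks itself using the core Getopt::Long module. The parse_args helper, the plain-hash declarations, and the taxonIds list argument are hypothetical; this is not how GUS implements argument processing.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use File::Temp qw(tempfile);

# Hypothetical sketch of the checks a plugin manager might perform for
# declared arguments (reqd, isList, mustExist). NOT the GUS implementation:
# the declaration hashes only reuse the key names shown on the slide.
sub parse_args {
    my ($decls, $argv) = @_;
    my %raw;
    # Every argument is accepted as --name=value (a simplification).
    my @specs = map { $_->{name} . '=s' } @$decls;
    GetOptionsFromArray([@$argv], \%raw, @specs)
        or die "could not parse command line\n";

    my %args;
    for my $d (@$decls) {
        my $name  = $d->{name};
        my $value = $raw{$name};
        if (!defined $value) {
            die "missing required argument --$name\n" if $d->{reqd};
            next;
        }
        # isList => 1: a comma-separated value becomes an array ref.
        $value = [ split /,/, $value ] if $d->{isList};
        # mustExist => 1: every named file must already exist.
        if ($d->{mustExist}) {
            for my $file (ref($value) ? @$value : ($value)) {
                die "file '$file' given for --$name does not exist\n" unless -e $file;
            }
        }
        $args{$name} = $value;
    }
    return \%args;
}

# Declarations mirroring the slide; taxonIds is a hypothetical list argument.
my $decls = [
    { name => 'externalDatabaseVersion', reqd => 1, isList => 0 },
    { name => 'gitax',    reqd => 1, isList => 0, mustExist => 1 },
    { name => 'taxonIds', reqd => 0, isList => 1 },
];

my (undef, $gitaxFile) = tempfile();   # stands in for gi_taxid_prot.dmp
my $args = parse_args($decls, [
    '--externalDatabaseVersion=2004-07',   # hypothetical value
    "--gitax=$gitaxFile",
    '--taxonIds=9606,10090',
]);
print join(',', @{ $args->{taxonIds} }), "\n";   # prints 9606,10090
```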
Argument Types
• String
• Integer
• Boolean
• Table Name
• Float
• File
• Enumeration
• Controlled Vocab
  • Pairs of local and database terms, for “dinky” (small) CVs
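Each argument type implies some validation of the value supplied. The following sketch shows one way a few of these checks could look, including an Enumeration built as a closure over its allowed values. The check and enum_validator helpers, the regular expressions, and the Text/Fasta choices are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical validators for a few of the argument types listed above.
# The regular expressions are illustrative assumptions, not GUS's own rules.
my %validator = (
    string  => sub { defined $_[0] },
    integer => sub { $_[0] =~ /\A[+-]?\d+\z/ },
    float   => sub { $_[0] =~ /\A[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/ },
);

# Enumeration: build a validator that accepts only a fixed set of values.
sub enum_validator {
    my %allowed;
    $allowed{$_} = 1 for @_;
    return sub { exists $allowed{ $_[0] } };
}
$validator{format} = enum_validator(qw(Text Fasta));   # hypothetical choices

# Returns 1 if $value is acceptable for argument type $type, else 0.
sub check {
    my ($type, $value) = @_;
    my $v = $validator{$type} or die "unknown type '$type'\n";
    return $v->($value) ? 1 : 0;
}

print check(integer => '42'), check(integer => '4.2'), check(format => 'Text'), "\n";   # prints 101
```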
Declaring Documentation
    # Each entry is a [table, explanation] pair
    my $tablesDependedOn = [
        ['GUS::Model::DoTS::NRDBEntry',
         'pulls aa_sequence_id from here when id and extDbId match requested'],
    ];

    my $documentation = {
        purposeBrief     => $purposeBrief,
        purpose          => $purpose,
        tablesAffected   => $tablesAffected,
        tablesDependedOn => $tablesDependedOn,
        howToRestart     => $howToRestart,
        failureCases     => $failureCases,
        notes            => $notes,
    };
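Because every plugin declares the same documentation keys, the help text can be generated in a uniform way. The sketch below renders such a hash as plain text; render_documentation, its section titles, and its layout are hypothetical and are not the format GUS actually prints.

```perl
use strict;
use warnings;

# Hypothetical renderer that turns the documentation hash into help text.
# Section titles, order, and layout are assumptions, not GUS's actual format.
my @textSections = (
    [ purposeBrief => 'PURPOSE (BRIEF)' ],
    [ purpose      => 'PURPOSE' ],
    [ howToRestart => 'RESTARTING' ],
    [ failureCases => 'FAILURE CASES' ],
    [ notes        => 'NOTES' ],
);
my @tableSections = (
    [ tablesAffected   => 'TABLES AFFECTED' ],
    [ tablesDependedOn => 'TABLES DEPENDED ON' ],
);

sub render_documentation {
    my ($doc) = @_;
    my $text = '';
    # Free-text sections: skipped when the plugin did not supply them.
    for my $s (@textSections) {
        my ($key, $title) = @$s;
        next unless defined $doc->{$key};
        $text .= "$title\n    $doc->{$key}\n";
    }
    # Table lists are array refs of [table, explanation] pairs.
    for my $s (@tableSections) {
        my ($key, $title) = @$s;
        next unless $doc->{$key} && @{ $doc->{$key} };
        $text .= "$title\n";
        $text .= "    $_->[0]: $_->[1]\n" for @{ $doc->{$key} };
    }
    return $text;
}

my $documentation = {
    purposeBrief     => 'Loads NRDB entries into GUS',   # hypothetical text
    tablesDependedOn => [
        [ 'GUS::Model::DoTS::NRDBEntry',
          'pulls aa_sequence_id from here when id and extDbId match requested' ],
    ],
};
print render_documentation($documentation);
```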
Run Method
• “Entry point” for the plugin
• Concise overview (“table of contents”) of the plugin:

    sub run {
        my ($self) = @_;
        my $rows = 0;

        my $rawData    = $self->readData();
        my @parsedData = $self->parseData($rawData);

        foreach my $data (@parsedData) {
            $data->submit();   # write each object (and its children) to the database
            $rows++;
        }

        return "Inserted $rows rows";
    }
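The run() method above relies on readData() and parseData() helpers and on objects that provide submit(). This runnable sketch fills those in with hypothetical stand-ins (a tab-separated input string and a Hypothetical::Row class whose submit() only sets a flag) so the control flow can be traced without a database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an object-layer row: submit() only records that
# it was called, instead of writing to the database.
package Hypothetical::Row;
sub new    { my ($class, %f) = @_; return bless { %f, submitted => 0 }, $class; }
sub submit { my ($self) = @_; $self->{submitted} = 1; return 1; }

# Hypothetical plugin whose run() follows the slide: read, parse, submit, count.
package Hypothetical::Plugin::LoadLines;

sub new { my ($class, %args) = @_; return bless { %args }, $class; }

sub readData {
    my ($self) = @_;
    return $self->{input};   # raw text; a real plugin would read a file
}

sub parseData {
    my ($self, $rawData) = @_;
    # One row object per non-empty "id<TAB>name" line (assumed format).
    return map { my ($id, $name) = split /\t/; Hypothetical::Row->new(id => $id, name => $name) }
           grep { /\S/ } split /\n/, $rawData;
}

sub run {
    my ($self) = @_;
    my $rows = 0;
    my $rawData    = $self->readData();
    my @parsedData = $self->parseData($rawData);
    foreach my $data (@parsedData) {
        $data->submit();
        $rows++;
    }
    $self->{parsed} = \@parsedData;   # kept only so the result can be inspected
    return "Inserted $rows rows";
}

package main;

my $plugin = Hypothetical::Plugin::LoadLines->new(input => "1\talpha\n2\tbeta\n\n");
print $plugin->run(), "\n";   # prints "Inserted 2 rows"
```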
Accessing Data
• Command line arguments:
    $self->getArg('nrdbFile');
• Through objects:
    my $preExtAASeq = GUS::Model::DoTS::ExternalAASequence->new({ 'aa_sequence_id' => $aa_seq_id });
    $preExtAASeq->retrieveFromDB();
• Direct database access:
    my $dbh = $self->getQueryHandle();
    my $sth = $dbh->prepare(…);
Persisting Data
• Saving and updating:
    $obj->submit();
  • Submitting an object cascades and submits its children too
• Deleting:
    $obj->markDeleted(1);
    $obj->submit();
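The cascading behavior of submit() can be modeled with a small tree of objects. In the hypothetical Hypothetical::Obj class below, submit() records SAVE or DELETE for the object according to its markDeleted() flag and then recurses into its children; the real object layer writes to the database and may order its operations differently.

```perl
use strict;
use warnings;

# Hypothetical object with children, illustrating the slide's semantics:
# submit() handles this object, then cascades to each child in turn.
# This sketch only logs what it would do; it touches no database.
package Hypothetical::Obj;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, children => [], deleted => 0 }, $class;
}
sub addChild    { my ($self, $child) = @_; push @{ $self->{children} }, $child; return $child; }
sub markDeleted { my ($self, $flag) = @_; $self->{deleted} = $flag; return; }

sub submit {
    my ($self, $log) = @_;
    $log ||= [];   # top-level call starts a fresh log
    push @$log, ($self->{deleted} ? 'DELETE ' : 'SAVE ') . $self->{name};
    $_->submit($log) for @{ $self->{children} };
    return $log;
}

package main;

my $seq  = Hypothetical::Obj->new('sequence');
my $feat = $seq->addChild(Hypothetical::Obj->new('feature'));
$seq->addChild(Hypothetical::Obj->new('location'));
$feat->markDeleted(1);
print "$_\n" for @{ $seq->submit() };
# SAVE sequence
# DELETE feature
# SAVE location
```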
Logging and Error Handling
• For general logging, use the logging functions:
  • Messages are printed to STDERR
  • $self->log("message");
• For error handling, either:
  • die() immediately, or
  • Write errors to a file (for recoverable errors)
• Restart functionality:
  • Check for object existence
    • When checking, ensure the existing object was loaded by a valid, proper invocation
  • Store data from the previous run and use it as a filter
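The restart strategy of storing data from the previous run and using it as a filter might look like this. The sketch assumes a hypothetical progress file holding one completed ID per line; log_msg stands in for $self->log() and simply prints to STDERR.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for $self->log(): messages go to STDERR.
sub log_msg { print STDERR join('', @_), "\n"; }

# Read the IDs completed by earlier runs (hypothetical one-ID-per-line file).
sub load_done_ids {
    my ($path) = @_;
    my %done;
    return \%done unless -e $path;
    open my $in, '<', $path or die "cannot read $path: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        $done{$line} = 1 if length $line;
    }
    close $in;
    return \%done;
}

# Process IDs, skipping those already loaded and recording new ones.
sub process {
    my ($ids, $path) = @_;
    my $done = load_done_ids($path);
    open my $out, '>>', $path or die "cannot append to $path: $!\n";
    my @loaded;
    for my $id (@$ids) {
        if ($done->{$id}) { log_msg("skipping $id (loaded by a previous run)"); next; }
        # ... load the record here; die() on unrecoverable errors ...
        push @loaded, $id;
        print {$out} "$id\n";   # record progress so a restart can skip it
    }
    close $out or die "cannot close $path: $!\n";
    return \@loaded;
}

my (undef, $progressFile) = tempfile();
process([qw(A B)], $progressFile);                   # first run handled A and B
my $second = process([qw(A B C D)], $progressFile);  # restarted run
print "@$second\n";   # prints "C D"
```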
Clearing the Cache
• Historical: Perl previously had poor garbage collection support
• The object layer's pointer cache has a default capacity of 10000 objects
• At the bottom of the outermost loop, call:
    $self->undefPointerCache();
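A toy model shows why the cache should be cleared once per iteration of the outermost loop. Hypothetical::Cache below refuses objects beyond its capacity; that behavior is an assumption made for the illustration, since the slide does not say what the object layer does when its cache fills. With three objects per entry, 5000 entries would need 15000 slots, but clearing at the bottom of each iteration keeps at most three objects cached.

```perl
use strict;
use warnings;

# Hypothetical model of an object cache with a fixed capacity. Every object
# created is remembered here until clear() is called.
package Hypothetical::Cache;
sub new { my ($class, $capacity) = @_; return bless { capacity => $capacity, objects => [] }, $class; }
sub add {
    my ($self, $obj) = @_;
    die "cache full ($self->{capacity} objects)\n"
        if @{ $self->{objects} } >= $self->{capacity};
    push @{ $self->{objects} }, $obj;
    return;
}
sub clear { my ($self) = @_; $self->{objects} = []; return; }   # cf. undefPointerCache()
sub size  { my ($self) = @_; return scalar @{ $self->{objects} }; }

package main;

my $cache = Hypothetical::Cache->new(10000);
for my $entry (1 .. 5000) {                         # outermost loop: one entry per pass
    $cache->add({ entry => $entry }) for 1 .. 3;    # e.g. a row and two children
    $cache->clear();                                # bottom of the outermost loop
}
print $cache->size(), "\n";   # prints 0
```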
Data Provenance
• Tracks plugin revisions: name, checksum, revision
• Tracks the parameters that a specific plugin is executed with
• Tables:
  • Algorithm
  • AlgorithmImplementation
  • AlgorithmInvocation
  • AlgorithmParamKeyType
  • AlgorithmParamKey
  • AlgorithmParam
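As an illustration of the provenance data described above, the sketch below builds a record containing the plugin's name, the revision parsed from its CVS $Revision$ keyword, a checksum of the plugin file, and the invocation's parameters. MD5 (via the core Digest::MD5 module) is an assumption, since the slide does not name the checksum algorithm, and the record layout is hypothetical rather than the actual columns of the Algorithm tables.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Checksum of the plugin source file (MD5 is an assumed choice).
sub plugin_checksum {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot read $path: $!\n";
    binmode $fh;
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $md5;
}

# Hypothetical provenance record: implementation details plus parameters.
sub provenance_record {
    my ($name, $path, $cvsRevision, %params) = @_;
    my ($revision) = $cvsRevision =~ /\$Revision:\s*(\S+)\s*\$/;
    return {
        implementation => { name => $name, revision => $revision, checksum => plugin_checksum($path) },
        params         => { %params },   # one entry per argument value
    };
}

# Write a tiny stand-in plugin file so the example is self-contained.
my ($fh, $path) = tempfile();
binmode $fh;
print {$fh} "package GUS::Supported::Plugin::LoadRow;\n1;\n";
close $fh;

my $rec = provenance_record('GUS::Supported::Plugin::LoadRow', $path,
                            '$Revision: 2934 $', gitax => 'gi_taxid_prot.dmp');
print "$rec->{implementation}{revision} $rec->{implementation}{checksum}\n";
```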
Plugin Evolution
• Changes abound in:
  • Data file formats
  • Schema
• Be flexible in writing plugins: use command line configuration
• Be clear about which schema objects you use
Plugin Standard
• See the Developer’s Guide:
  • http://gusdb.org/documentation/3.5/developers/developersguide.html