Explore the functionality of GUS Plugin System for automatic support in data manipulation, documentation, logging, and error handling. Learn about Supported and Community Plugins and their life cycle. Dive into Plugin Initialization, Data Loading, and Manipulation.
GUS Plugin System
Michael Saffitz
Genomics Unified Schema Workshop, July 6-8th, Philadelphia, Pennsylvania
Plugin Overview
• Small Perl programs that load and manipulate data within GUS
• Written using the GUS Plugin API and Perl Object Layer
• Provide automatic support for:
  • Data Provenance
  • Object layer and database connectivity
  • Standardized documentation
  • Command line argument processing
  • Logging
  • Error Handling
• “Supported” and “Community” Plugins are provided with GUS
Supported Plugins
• Have been tested in Oracle and Postgres and are confirmed to work
• Portable
• Useful beyond the site that developed them
• Meet the GUS Plugin Standard
Community Plugins
• Fail to meet one or more of the Supported Plugin criteria
• Have not been tested
• Provided as a general resource to the community
Plugin Life Cycle
• Plugin Initialization
  • Documentation
  • Command Line Arguments
• Data Loading
  • Reading, Parsing, Querying
• Data Manipulation
  • Insert or Update?
  • Restart Logic
• Data Submission
GUS Supported Plugins
• InsertArrayDesignControl.pm
• InsertAssayControl.pm
• InsertBlastSimilarities.pm
• InsertExternalDatabase.pm
• InsertExternalDatabaseRls.pm
• InsertGOEvidenceCode.pm
• InsertGeneOntology.pm
• InsertGeneOntologyAssoc.pm
• InsertRadAnalysis.pm
• InsertReviewStatus.pm
• InsertSecondaryStructure.pm
• InsertSequenceOntology.pm
• LoadArrayDesign.pm
• LoadArrayResults.pm
• LoadFastaSequences.pm
• LoadGusXml.pm
• LoadNRDB.pm
• LoadRow.pm
• LoadTaxon.pm
Plugin Shell
    package GUS::Supported::Plugin::LoadRow;
    @ISA = qw(GUS::PluginMgr::Plugin);
    use strict;
    use GUS::PluginMgr::Plugin;
    sub new { … }
    sub run { … }
Plugin Initialization
    sub new {
      my ($class) = @_;
      my $self = {};
      bless($self, $class);
      $self->initialize({
        requiredDbVersion => 3.5,
        cvsRevision       => '$Revision: 2934 $',
        name              => ref($self),
        argsDeclaration   => $argsDeclaration,
        documentation     => $documentation
      });
      return $self;
    }
Declaring Arguments
    stringArg({
      name           => 'externalDatabaseVersion',
      descr          => 'sres.externaldatabaserelease.version for this instance of NRDB',
      constraintFunc => undef,
      reqd           => 1,
      isList         => 0
    }),
    fileArg({
      name           => 'gitax',
      descr          => 'pathname for the gi_taxid_prot.dmp file',
      constraintFunc => undef,
      reqd           => 1,
      isList         => 0,
      mustExist      => 1,
      format         => 'Text'
    }),
Argument Types
• String
• Integer
• Boolean
• Table Name
• Float
• File
• Enumeration
• Controlled Vocab
  • Local, Database Term Pairs for “dinky” CVs
Declaring Documentation
    my $tablesDependedOn = [
      ['GUS::Model::DoTS::NRDBEntry',
       'pulls aa_sequence_id from here when id and extDbId match requested']
    ];
    my $documentation = {
      purposeBrief     => $purposeBrief,
      purpose          => $purpose,
      tablesAffected   => $tablesAffected,
      tablesDependedOn => $tablesDependedOn,
      howToRestart     => $howToRestart,
      failureCases     => $failureCases,
      notes            => $notes
    };
Run Method
• “Entry point” for plugin
• Concise overview/“table of contents” for plugin:
    sub run {
      my ($self) = @_;
      my $rows = 0;
      my $rawData = $self->readData();
      my @parsedData = $self->parseData($rawData);
      # "my" is required here because the plugin runs under "use strict"
      foreach my $data (@parsedData) {
        $data->submit();
        $rows++;
      }
      return "Inserted $rows rows";
    }
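The read, parse, submit shape of the run method can be tried without a GUS installation. The following is a minimal sketch, not GUS code: `MockRow` and `MockPlugin` are hypothetical stand-ins, the tab-delimited input format is an assumption, and `submit()` only sets a flag instead of writing to a database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a GUS object-layer row; only submit() is modelled.
package MockRow;
sub new    { my ($class, %fields) = @_; return bless { %fields, submitted => 0 }, $class; }
sub submit { my ($self) = @_; $self->{submitted} = 1; return 1; }

# Hypothetical plugin with the same three-step shape as the run method above.
package MockPlugin;
sub new { my ($class, $text) = @_; return bless { text => $text }, $class; }

# Step 1: read the raw input (a string here, a file in a real plugin).
sub readData { my ($self) = @_; return $self->{text}; }

# Step 2: parse tab-delimited "id<TAB>name" lines into row objects.
sub parseData {
    my ($self, $rawData) = @_;
    my @rows;
    foreach my $line (split /\n/, $rawData) {
        next if $line =~ /^\s*$/;                 # skip blank lines
        my ($id, $name) = split /\t/, $line;
        push @rows, MockRow->new(id => $id, name => $name);
    }
    return @rows;
}

# Step 3: submit each row and report how many were handled.
sub run {
    my ($self) = @_;
    my $rows = 0;
    foreach my $data ($self->parseData($self->readData())) {
        $data->submit();
        $rows++;
    }
    return "Inserted $rows rows";
}

package main;
my $plugin = MockPlugin->new("1\talpha\n2\tbeta\n\n3\tgamma\n");
my $result = $plugin->run();
print "$result\n";
```

Keeping `run` this short leaves the parsing and reading details in their own methods, which is what makes it readable as a table of contents.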
Accessing Data
• Command line arguments:
    $self->getArg('nrdbFile');
• Through Objects:
    my $preExtAASeq = GUS::Model::DoTS::ExternalAASequence->new({'aa_sequence_id' => $aa_seq_id});
    $preExtAASeq->retrieveFromDB();
• Direct Database Access:
    my $dbh = $self->getQueryHandle();
    my $sth = $dbh->prepare(…);
Persisting Data
• Saving & Updating:
    $obj->submit();
  • Will cascade and submit children
• Delete:
    $obj->markDeleted(1);
    $obj->submit();
Logging and Error Handling
• For general logging, use the logging functions
  • Printed to STDERR
  • $self->log("message")
• For error handling:
  • Either die() immediately, or
  • Write errors to a file (for recoverable errors)
• Restart functionality
  • Check for object existence
  • Check, but also ensure the object was loaded by a valid invocation
  • Store data from previous run and use as a filter
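The "store data from previous run and use as a filter" approach, combined with writing recoverable errors to a file, can be sketched in plain Perl. This is an illustration under stated assumptions rather than GUS behaviour: the file names, the `loadIds` helper, and the rule that only numeric ids are valid are all hypothetical, and the spot where a real plugin would call `submit()` is marked with a comment.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir       = tempdir(CLEANUP => 1);
my $doneFile  = "$dir/loaded_ids.txt";   # hypothetical restart file
my $errorFile = "$dir/errors.txt";       # hypothetical recoverable-error file

# Load the given ids, skipping any recorded by an earlier run.
sub loadIds {
    my ($inputIds, $donePath, $errorPath) = @_;

    # Build the filter from the previous run, if there was one.
    my %alreadyLoaded;
    if (-e $donePath) {
        open(my $in, '<', $donePath) or die "Cannot read $donePath: $!";
        while (my $id = <$in>) { chomp $id; $alreadyLoaded{$id} = 1; }
        close($in);
    }

    open(my $done, '>>', $donePath)  or die "Cannot append to $donePath: $!";
    open(my $err,  '>>', $errorPath) or die "Cannot append to $errorPath: $!";

    my $loaded = 0;
    foreach my $id (@$inputIds) {
        next if $alreadyLoaded{$id};              # handled by an earlier run
        if ($id !~ /^\d+$/) {                     # recoverable: record and continue
            print $err "Skipping malformed id '$id'\n";
            next;
        }
        # A real plugin would build and submit an object here.
        print $done "$id\n";
        $loaded++;
    }
    close($done);
    close($err);
    return $loaded;
}

my $first  = loadIds([101, 102, 'bad'],      $doneFile, $errorFile);
my $second = loadIds([101, 102, 'bad', 103], $doneFile, $errorFile);
print "first run: $first, second run: $second\n";
```

The second call stands in for a restarted run: ids 101 and 102 are filtered out and only 103 is loaded.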
Clearing the Cache
• Historical: Perl previously had poor garbage collection support
• Default capacity of 10000 objects
• At the bottom of the outermost loop:
    $self->undefPointerCache();
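Why the clear call belongs at the bottom of the outermost loop can be shown with a toy bounded cache. This is a sketch only: `TinyCache` is hypothetical and not the GUS implementation, the capacity of 3 is chosen for illustration, and the assumption that an overfull cache raises an error is mine rather than something stated on the slide.

```perl
use strict;
use warnings;

# Hypothetical bounded object cache; not the GUS implementation.
package TinyCache;
sub new { my ($class, $capacity) = @_; return bless { capacity => $capacity, objects => [] }, $class; }
sub add {
    my ($self, $obj) = @_;
    # Assumption for this sketch: exceeding capacity is a fatal error.
    die "Cache capacity of $self->{capacity} exceeded\n"
        if @{ $self->{objects} } >= $self->{capacity};
    push @{ $self->{objects} }, $obj;
    return $obj;
}
sub clear { my ($self) = @_; $self->{objects} = []; return; }
sub size  { my ($self) = @_; return scalar @{ $self->{objects} }; }

package main;
my $cache = TinyCache->new(3);
my $peak  = 0;
foreach my $record (1 .. 10) {
    # Each input record creates two cached objects.
    $cache->add({ id => $record, part => $_ }) for 1 .. 2;
    $peak = $cache->size() if $cache->size() > $peak;
    $cache->clear();    # bottom of the outermost loop: release this record's objects
}
print "peak cache size: $peak\n";
```

Without the `clear()` call, the second record would push the cache past its capacity; with it, the cache never holds more than one record's objects.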
Data Provenance
• Tracks plugin revisions: Name, Checksum, Revision
• Tracks the parameters that a specific plugin is executed with
• Tables: Algorithm, AlgorithmImplementation, AlgorithmInvocation, AlgorithmParamKeyType, AlgorithmParamKey, AlgorithmParam
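The kind of record that provenance tracking implies (name, checksum, revision, and the parameters of one invocation) can be sketched as follows. Everything here is an assumption for illustration: `describeInvocation` is a hypothetical helper, MD5 is used only because `Digest::MD5` ships with Perl (the slides do not say which checksum GUS computes), and nothing is written to the tables named above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: summarize one plugin invocation as a hash.
sub describeInvocation {
    my ($name, $source, $revisionTag, $params) = @_;

    # Pull the number out of a CVS-style '$Revision: N $' keyword.
    my ($revision) = $revisionTag =~ /\$Revision:\s*(\S+)\s*\$/;

    return {
        name     => $name,
        checksum => md5_hex($source),   # assumption: MD5 of the plugin source text
        revision => $revision,
        params   => { %$params },       # copy, so later changes do not alter the record
    };
}

my $record = describeInvocation(
    'GUS::Supported::Plugin::LoadRow',
    "sub run { }\n",                    # stand-in for the plugin's source code
    '$Revision: 2934 $',
    { gitax => 'gi_taxid_prot.dmp' },
);
printf "%s r%s %s\n", @{$record}{qw(name revision checksum)};
```

Recording the checksum alongside the revision distinguishes two runs of the same revision whose source was edited locally.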
Plugin Evolution
• Changes abound:
  • Data file formats
  • Schema
• Be flexible in writing plugins: use command line configuration
• Be clear about what schema objects you use
Plugin Standard
• See Developer’s Guide:
  • http://gusdb.org/documentation/3.5/developers/developersguide.html