
S4PA Deployment

S4PA Deployment. M. Hegde Science Systems & Applications, Inc April 26, 2006. Introduction. S4PA dependencies Installing S4PA Creating an S4PA instance Monitoring an S4PA instance Instructions available in S4PA Wiki at http://discette.gsfc.nasa.gov/mwiki/index.php/S4PA. S4PA Dependencies.



  1. S4PA Deployment • M. Hegde, Science Systems & Applications, Inc. • April 26, 2006

  2. Introduction • S4PA dependencies • Installing S4PA • Creating an S4PA instance • Monitoring an S4PA instance Instructions available in S4PA Wiki at http://discette.gsfc.nasa.gov/mwiki/index.php/S4PA

  3. S4PA Dependencies • Perl 5.8.x • S4P 5.28.1+ • XML::LibXML, XML::LibXSLT, XML::Simple, XML::Twig • Net::FTP, Net::Netrc, Net::SSH2 • MLDBM, DB_File, Storable, Data::Dumper • SOAP::Lite, URI::URL • HTTP_service_URL, Clavis • Compilers/libraries needed for metadata extractors and Giovanni pre-processors.

  4. It is good to know… • Editing XML. Using XML schema is preferred. • .netrc setup • Setting up SSH key exchange if needed. • Perl regular expressions for data polling. • XPath for complex granule replacement logic.

  5. S4PA Directory Structure

  6. S4PA Terminology • S4PA terms • Dataset is the equivalent of a data product in WHOM and an ESDT in ECS. • Data Class is a logical group of datasets for S4PA’s internal use. Generally, the grouping is based on common methods in S4PA. It is not visible to data users. • Data Group is a logical group of datasets from the data user’s perspective. • Active File System is the file system where S4PA is currently writing data. • Storage Directory is the root directory for data access in S4PA. Its sub-directories are Data Groups. • Data Provider is the label given to a data provider. All datasets belonging to a provider end up on the same active file system.

  7. Architecting an S4PA Instance • Create a user account for operating S4PA (ex. s4paops) and a group (ex. s4pa) to share resources. • Estimate disk space requirements and divide the RAID into file systems whose size equals that of a backup tape or disk. • Name file systems as /ftp/.<provider>/<nnnn> where <provider> is the data provider’s label and <nnnn> is the 3 or 4 digit label of the file system. Ex: /ftp/.trmm/001 • Create the Storage Directory generally in an FTP area. Ex: /ftp/data/s4pa
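
The file system naming convention above can be sketched in a few lines of Python. The function name file_system_path and its width parameter are hypothetical helpers for illustration, not part of S4PA.

```python
def file_system_path(provider: str, index: int, width: int = 3) -> str:
    """Build /ftp/.<provider>/<nnnn> from a provider label and a numeric label."""
    # The slide describes a 3 or 4 digit file system label.
    if width not in (3, 4):
        raise ValueError("label width must be 3 or 4 digits")
    return f"/ftp/.{provider}/{index:0{width}d}"


if __name__ == "__main__":
    print(file_system_path("trmm", 1))
```

Called with the provider label "trmm" and index 1, this reproduces the slide's example path.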

  8. Architecting - continued • Determine Datasets, Data Classes and Data Groups supported by the instance. • Determine Data Providers supported by the instance. • Identify metadata extractors, if any, and get their synopses. • Identify publication requirements, and prepare the GCMD DIF and collection README document. • Add entries to .netrc for all hosts with which S4PA will interact, including cases where SSH/SFTP is used. • Set up SSH key exchange if necessary.

  9. Obtaining S4PA Distributions • Generally, an S4PA instance depends on a core distribution and an instance-specific distribution. • The distributions are available as gzipped tar files from ftp://s4pt.ecs.nasa.gov/software/s4pa/ • The core distribution is named S4PA-X.Y.Z.tar.gz where X, Y and Z are major, minor and patch release numbers. • The project-specific distribution is named S4PA_<ProjectName>-X.Y.Z.tar.gz. • The S4PA core and the instance-specific distributions are stored in a CVS repository under project names S4PA and S4PA_<ProjectName> (Ex: S4PA_ACDISC, S4PA_TRMM, etc.).
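
As a rough illustration of the naming scheme, the following Python sketch splits a distribution file name into project and release numbers. The function name, the version numbers in the example, and the assumption that project names are alphanumeric are all illustrative, not taken from S4PA.

```python
import re

# S4PA-X.Y.Z.tar.gz (core) or S4PA_<ProjectName>-X.Y.Z.tar.gz (project specific).
# Assumption: project names consist of letters and digits only.
DIST_RE = re.compile(
    r"^S4PA(?:_(?P<project>[A-Za-z0-9]+))?"
    r"-(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\.tar\.gz$"
)


def parse_distribution(filename):
    """Return project (None for the core) and (major, minor, patch), or None."""
    match = DIST_RE.match(filename)
    if match is None:
        return None
    version = tuple(int(match.group(k)) for k in ("major", "minor", "patch"))
    return {"project": match.group("project"), "version": version}


if __name__ == "__main__":
    print(parse_distribution("S4PA_TRMM-1.0.0.tar.gz"))
```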

  10. Installing S4PA • Identify and obtain the necessary S4PA distributions. • Decompress and un-tar the distribution files. • S4PA projects use MakeMaker for installation. Use the following steps to install a project. • Change directory to the root of the un-tarred directory. • perl Makefile.PL PREFIX=/tools/gdaac/TS2 (substitute mode as necessary) • make • make pure_site_install • Save the un-tarred area of the instance-specific distribution. It may contain configuration files needed later in the process. Ex: ./doc/xsd/S4paDescriptor.xsd ./doc/xsd/S4paSubscription.xsd

  11. Creating an S4PA Instance • S4PA provides a tool, s4pa_deploy.pl, to create necessary S4PA stations, directories and symbolic links. It can: • Create station directories and necessary station configuration files for all stations in S4PA. • Create directories for datasets in the Storage Directory. • Create a symbolic link to the Active File System. • Transfer README document to its dataset storage directory. It will not: • Set up file systems in the Active File System area. • Set up any configuration file needed by metadata extractor or Giovanni pre-processors.

  12. Creating an S4PA Instance • An S4PA instance is described in a file called the deployment descriptor. Suggested name: descriptor_<InstanceName>.xml • The deployment descriptor is in XML and is based on the schema ftp://s4pt.ecs.nasa.gov/software/s4pa/S4paDescriptor.xsd • Once created, the deployment descriptor is stored in the cfg directory under the instance-specific CVS project. • Run command s4pa_deploy.pl -f <Descriptor> -s <DescriptorSchema>

  13. Deployment Descriptor • Notation • Words in bold indicate XML elements. • Italicized words with a @ as superscript indicate attributes. • The root element of the descriptor is s4pa; its NAME@ indicates the S4PA instance name. • s4pa contains a root, storageDir, tempDir, documentLocation, urlRoot, logger and one or more providers. • s4pa also contains optional auxiliaryBackUpArea, project, protocol, reconciliation, publication, subscription, deletionDelay, postOffice, and houseKeeper.

  14. s4pa Element • The content of root is the root directory of S4PA stations. Ex: <root>/vol1/OPS/s4pa/</root> • The content of storageDir is the root directory of the data archive’s public view. Ex: <storageDir>/ftp/data/s4pa/</storageDir> • The content of tempDir is the global directory that serves as the root working directory for filters. Ex: <tempDir>/var/tmp</tempDir> • The content of documentLocation is the URL for storage of README documents for datasets. Ex: <documentLocation>http://discette.gsfc.nasa.gov/uploads</documentLocation>

  15. s4pa Element • urlRoot gives the root URLs for accessing data, specified as element attributes (FTP and HTTP). • logger has DIR@ and LEVEL@. • DIR@ indicates the directory for storing log files. • LEVEL@ indicates the logging level (DEBUG or INFO). • Optional project contains one or more locations, each indicating the location of a project sandbox for a metadata extractor. • Optional subscription has INTERVAL@, which indicates the subscribe station’s polling interval; it defaults to 86400 (1 day).

  16. s4pa Element • Optional deletionDelay has INTER_VERSION@ and INTRA_VERSION@. • INTER_VERSION@ indicates the retention period for inter_version_deletion; it defaults to 86400*180 (6 months). • INTRA_VERSION@ indicates the retention period for intra_version_deletion; it defaults to 86400 (1 day). • Optional postOffice has INTERVAL@, MAX_JOBS@, and MAX_ATTEMP@. • INTERVAL@ indicates the postOffice station’s polling interval; it defaults to 10. • MAX_JOBS@ indicates the postOffice station’s max_children; it defaults to 1. • MAX_ATTEMP@ indicates the postOffice station’s maximum number of retries before job failure; it defaults to 1.
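
To make the documented defaults concrete, here is a minimal Python sketch that reads an optional attribute from a descriptor fragment and falls back to the default stated on the slides. The fragment is deliberately incomplete (it omits required elements such as urlRoot, logger and provider) and has not been validated against S4paDescriptor.xsd; the helper name setting and the instance name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative, incomplete descriptor fragment (not schema-validated).
DESCRIPTOR = """
<s4pa NAME="example">
  <root>/vol1/OPS/s4pa/</root>
  <storageDir>/ftp/data/s4pa/</storageDir>
  <tempDir>/var/tmp</tempDir>
  <deletionDelay INTRA_VERSION="3600"/>
</s4pa>
"""

# Defaults as stated on the slides, in seconds where applicable.
DEFAULTS = {
    ("subscription", "INTERVAL"): 86400,
    ("deletionDelay", "INTER_VERSION"): 86400 * 180,
    ("deletionDelay", "INTRA_VERSION"): 86400,
    ("postOffice", "INTERVAL"): 10,
    ("postOffice", "MAX_JOBS"): 1,
}


def setting(root, element, attribute):
    """Return the attribute value if present, otherwise the documented default."""
    node = root.find(element)
    if node is not None and attribute in node.attrib:
        return int(node.attrib[attribute])
    return DEFAULTS[(element, attribute)]


root = ET.fromstring(DESCRIPTOR)

if __name__ == "__main__":
    print(setting(root, "deletionDelay", "INTER_VERSION"))
```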

  17. protocol Element • protocol has a NAME@. Valid values are FILE, FTP, SFTP and HTTP. If unspecified, FTP is used for a host. • protocol contains one or more hosts. • host indicates the name of the host for which the specified protocol is to be used. Ex: <protocol NAME="FTP"> <host>discette.gsfc.nasa.gov</host> </protocol> <protocol NAME="SFTP"> <host>tads1.ecs.nasa.gov</host> <host>auraraw1.ecs.nasa.gov</host> </protocol>
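
The host-to-protocol lookup described here, including the FTP fallback, might look like the following Python sketch. It reuses the hosts from the slide's example; the wrapping s4pa element without attributes and the function names are assumptions made so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Hosts taken from the slide's example; the bare <s4pa> wrapper is illustrative.
FRAGMENT = """
<s4pa>
  <protocol NAME="FTP"><host>discette.gsfc.nasa.gov</host></protocol>
  <protocol NAME="SFTP">
    <host>tads1.ecs.nasa.gov</host>
    <host>auraraw1.ecs.nasa.gov</host>
  </protocol>
</s4pa>
"""


def protocol_map(root):
    """Map each listed host name to the NAME of its enclosing protocol element."""
    return {
        host.text.strip(): proto.get("NAME")
        for proto in root.findall("protocol")
        for host in proto.findall("host")
    }


def protocol_for(host, mapping):
    """Hosts that are not listed fall back to FTP, as the slide states."""
    return mapping.get(host, "FTP")


mapping = protocol_map(ET.fromstring(FRAGMENT))

if __name__ == "__main__":
    print(protocol_for("tads1.ecs.nasa.gov", mapping))
```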

  18. reconciliation Element • reconciliation is the holder of partner data reconciliation information; it contains optional echo, mirador, and dotChart. • echo has required attributes URL@, USERNAME@, PASSWORD@, PUSH_USER@, PUSH_PWD@, and ENDPOINT_URI@, and optional attributes MAX_GRANULE_COUNT@, LOCAL_DIR@, CHROOT_DIR@, STAGING_DIR@, DATA_HOST@, and MIN_INTERVAL@. • mirador and dotChart each have one required attribute, ENDPOINT_URI@, and the same set of optional attributes as the echo element plus an extra PULL_TIMEOUT@.

  19. publication and echo Element • publication is the holder of metadata publication related information; it contains optional echo, mirador, giovanni, and dotChart. • echo contains granuleInsert, granuleDelete, browseInsert, browseDelete, and collectionInsert. • echo has HOST@, VERSION@, and MAX_GRANULE_COUNT@. Ex: <echo HOST="ingest.echo.nasa.gov" VERSION="10"> <granuleInsert DIR="/data/granule"/> <granuleDelete DIR="/data/granule"/> <browseInsert DIR="/data/browse"/> <browseDelete DIR="/data/browse"/> <collectionInsert DIR="/data/collection"/> </echo>

  20. publication - mirador Element • mirador contains granuleInsert, granuleDelete, productDocument and has a HOST@. • granuleInsert and granuleDelete each have HOST@ and DIR@. • productDocument has HOST@, DIR@, and CMS_TEMPLATE@. Ex: <mirador HOST="invenio.gsfc.nasa.gov"> <granuleInsert DIR="/ftp/private/Mirador/agdisc/Inserts"/> <granuleDelete DIR="/ftp/private/Mirador/agdisc/Deletes"/> <productDocument DIR="/ftp/private/Mirador/agdisc/ProdDocs" CMS_TEMPLATE="/home/s4pa/mirador_L2_RCW.dwt"/> </mirador>

  21. publication - giovanni Element • giovanni contains granuleInsert, granuleDelete and has a HOST@. • granuleInsert and granuleDelete each have HOST@ and DIR@. Ex: <giovanni HOST="gdata1.sci.gsfc.nasa.gov"> <granuleInsert DIR="/ftp/private/Giovanni/agdisc/Inserts"/> <granuleDelete DIR="/ftp/private/Giovanni/agdisc/Deletes"/> </giovanni>

  22. publication - dotChart Element • dotChart contains granuleInsert, granuleDelete, dbExport, collectionInsert and has a HOST@. • granuleInsert, granuleDelete, and collectionInsert each have HOST@ and DIR@. • dbExport has HOST@, DIR@, and INTERVAL@. Ex: <dotChart HOST="tads1.ecs.nasa.gov"> <granuleInsert DIR="/ftp/private/Dotchart/pending_insert"/> <granuleDelete DIR="/ftp/private/Dotchart/pending_delete"/> <dbExport DIR="/ftp/private/Dotchart/dbExport"/> <collectionInsert DIR="/ftp/private/Dotchart/pending_dif"/> </dotChart>

  23. houseKeeper Element • houseKeeper is the holder for user-defined housekeeping jobs. • houseKeeper contains one or more jobs; the content of each job is the customized script command. • job has NAME@ and DOWNSTREAM@. • NAME@ indicates the job title. • DOWNSTREAM@ indicates the downstream station for the output work order. Ex: <job NAME="CLEAN_UP">./my_clean_up_job.sh</job> <job NAME="REQUEST_DATA" DOWNSTREAM="other/machine_search">./my_auto_request.sh</job>

  24. provider Element • provider has a NAME@. • It contains an activeFileSystem, a poller, a pan and one or more dataClass. • The content of activeFileSystem is the location of the current file system being written to. Ex: /ftp/.trmm/001/ • activeFileSystem has • MAX@ : fraction (0-1) of maximum usable disk space. • FILE_SIZE_MARGIN@ : marginal size needed for every file during ingest. For example, ((1 + FILE_SIZE_MARGIN) * File Size) is the size allocated for a file.

  25. provider Element • NOTIFY_ON_FULL@: email address(es) to alert for volume backup once a volume is full. • CONFIGURED_VOLUMES@: Optional configuration file specifying the allocated volumes, for rolling archives and non-contiguous volume partitions. • LOW_VOLUME_THRESHOLD@: Optional fraction of the configured volume; an anomaly is triggered when the free space left on the current volume passes this threshold and no further volume is configured in the line-up.
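
The FILE_SIZE_MARGIN@ arithmetic from the activeFileSystem description can be written out as below. The allocated_size formula is the one given on the slide; the fits check is only one plausible reading of how MAX@ might be applied, and both function names are hypothetical.

```python
def allocated_size(file_size, file_size_margin):
    """Space reserved for a file: (1 + FILE_SIZE_MARGIN) * file size."""
    return (1 + file_size_margin) * file_size


def fits(file_size, file_size_margin, used, capacity, max_fraction):
    """Assumed reading of MAX@: usage may not exceed max_fraction of capacity."""
    needed = used + allocated_size(file_size, file_size_margin)
    return needed <= max_fraction * capacity


if __name__ == "__main__":
    # A 1000-byte file with a 25% margin reserves 1250 bytes.
    print(allocated_size(1000, 0.25))
```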

  26. poller and pdrPoller Element • poller is a complex element holding pdrPollers and dataPollers. • pdrPoller has an INTERVAL@, MAX_THREAD@ and MAX_FAILURE@. • INTERVAL@ is the polling interval in seconds. Default value is 600 seconds. • MAX_THREAD@ is the maximum number of threads allowed for the PDR poller. Default value is 1. • MAX_FAILURE@ is the maximum number of failures allowed for the PDR poller. Default value is 1. • pdrPoller contains pdrFilter and one or more jobs. • pdrFilter has a PATTERN@.

  27. pdrPoller - job Element • job contains exclude and pdrFilter. • job has a NAME@, HOST@, DIR@, IGNORE_HISTORY@, MERGE_PAN@, PATTERN@, and TYPE@. • NAME@ is the name of the poller (must be unique). • HOST@ and DIR@ are the host and directory being polled. • IGNORE_HISTORY@ has a boolean value indicating whether to ignore polling history; it defaults to “false”. • MERGE_PAN@ has a boolean value indicating whether PAN merging is required; it defaults to “false”. • PATTERN@ is the PDR filename pattern; it defaults to “\.PDR$”. • TYPE@ specifies whether the PDR is of “EDOS” type; the default is non-EDOS.
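
The effect of the default PATTERN@ value can be illustrated with a small Python sketch. S4PA itself applies Perl regular expressions; this particular pattern behaves the same way in Python's re module. The function name and the sample file names are hypothetical.

```python
import re

# Default PDR filename pattern as given on the slide.
DEFAULT_PDR_PATTERN = r"\.PDR$"


def select_pdrs(filenames, pattern=DEFAULT_PDR_PATTERN):
    """Keep only the file names that match the PDR filename pattern."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.search(name)]


if __name__ == "__main__":
    print(select_pdrs(["order1.PDR", "order1.PDR.tmp", "order1.PAN"]))
```

Because the pattern is anchored with `$`, a partially transferred file such as order1.PDR.tmp is not picked up.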

  28. dataPoller Element • dataPoller has INTERVAL@, MAX_THREAD@ and MAX_FAILURE@. They have the same definition as in a pdrPoller. • dataPoller has one or more jobs. • A dataPoller’s job has the following attributes: • NAME@ is the unique name of the poller. • HOST@ and DIR@ are the host and directory being polled. • PROTOCOL@ is the protocol to be used for polling. Valid values are “FTP”, “FILE”, “HTTP”. Default value is FTP. • EXTERNAL_API@ is an API that supplies the list of remote files (in ‘URL|size’ format) for an HTTP protocol poller. • RECURSIVE@ is a Boolean (true or false) value indicating recursive polling of a directory for an FTP protocol poller. Default value is “false”.

  29. dataPoller – job Element • MAX_DEPTH@ is the maximum directory depth for FILE and HTTP protocol polling. • ORIGINATING_SYSTEM@ is the label to be used with PDRs. It is meaningful to the pan element to be discussed later. Default value is “S4PA”. • IGNORE_HISTORY@ has a boolean value indicating whether to ignore polling history; it defaults to “false”. • MAX_FILE_GROUP@ indicates the maximum number of FileGroups in a resulting PDR for the downstream receiving station. Default is unlimited. • MINIMUM_FILE_SIZE@ is the minimum file size of the polled data file. Default value is 0. • REPOLL_PAUSE@ is the sleep time in seconds before repolling to confirm the polled file size. Default is no pause. • SUB_DIR_PATTERN@ is the sub-directory pattern, in Linux ‘date’ command format, that limits an FTP poller to scanning only directories matching the pattern.

  30. dataPoller – job Element • LATENCY@ is the number of days prior to the current date for which sub-directories matching the name pattern are polled by an FTP protocol poller. Ex: <job NAME="test_poller" HOST="s4pt.ecs.nasa.gov" DIR="/ftp/private/TS2" PROTOCOL="FTP" RECURSIVE="true" MAX_FILE_GROUP="20" SUB_DIR_PATTERN="%Y/%Y%m%d" LATENCY="31"> • A dataPoller’s job has one or more datasets. • A dataPoller dataset has a NAME@, VERSION@, and ALIAS@. • NAME@ is the name of the dataset being polled. • VERSION@ is the dataset’s version. • ALIAS@ is the pattern for renaming the polled files.
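
One way to picture how SUB_DIR_PATTERN@ and LATENCY@ interact is to expand the date pattern for each day in the latency window. The Python sketch below assumes that the window includes the current date and that the pattern uses strftime-compatible codes; the function name is hypothetical and the actual S4PA logic may differ.

```python
from datetime import date, timedelta


def candidate_sub_dirs(pattern, latency_days, today):
    """Expand a date pattern for today and each of the preceding latency_days days."""
    names = []
    for offset in range(latency_days + 1):
        name = (today - timedelta(days=offset)).strftime(pattern)
        # Coarse patterns (for example "%Y") repeat across days; keep each name once.
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    for name in candidate_sub_dirs("%Y/%Y%m%d", 2, date(2006, 4, 26)):
        print(name)
```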

  31. dataPoller – job Element • dataset contains the Perl regular expression applied to file names to detect files belonging to the dataset. Ex: <dataset NAME="GDAS1" ALIAS="GDAS1.$1.00z">gdas1.PGrbF00\.(\d{6})\.00z$</dataset> • dataset can also contain one file and zero or more associateFile elements. • file and associateFile have PATTERN@ and ALIAS@ for multiple-file granule polling. Ex: <dataset NAME="P3L2TRGB" VERSION="001"> <file PATTERN="(P3L2TRGB\d{6}\w)D$"/> <associateFile PATTERN="$1L"/> </dataset>
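
The renaming implied by ALIAS@ and the captured group $1 can be traced with a short Python sketch. S4PA uses Perl regular expressions; the pattern from the GDAS1 example happens to be valid in Python as well. The function name alias_for and the sample file name are hypothetical.

```python
import re


def alias_for(filename, pattern, alias):
    """Return the alias with $1, $2, ... replaced by captured groups, or None."""
    match = re.search(pattern, filename)
    if match is None:
        return None
    # Substitute each $N reference in the alias with capture group N.
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), alias)


if __name__ == "__main__":
    print(alias_for(
        "gdas1.PGrbF00.060426.00z",
        r"gdas1.PGrbF00\.(\d{6})\.00z$",
        "GDAS1.$1.00z",
    ))
```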

  32. pan Element • A pan contains a local and an optional remote element. • local contains the local directory name for storing PANs. • remote contains one or more originating_systems. • originating_system has a NAME@, a HOST@, a DIR@, and a NOTIFY@. • NAME@ is the value of a field by the same name in PDRs encountered by S4PA. • HOST@ and DIR@ are the host name and directory to which the PAN for the originating system will be pushed. • NOTIFY@ is the email address to which the PAN is sent.

  33. dataClass Element • dataClass has NAME@, GROUP@, FREQUENCY@, ACCESS@, TIME_MARGIN@, PUBLISH_ECHO@, PUBLISH_MIRADOR@, PUBLISH_GIOVANNI@, EXPIRY@, and DOC@. • NAME@ is the data class name. • GROUP@ is the default data group name for the datasets in the data class. • FREQUENCY@ is the temporal frequency for the dataset. Valid values are “yearly”, “monthly”, “daily”, and “none” (for Climatology dataset). Default is daily. • ACCESS@ is the access type for the dataset. Valid values are “public”, “restricted”, and “hidden”. Default is public. • TIME_MARGIN@ is the time difference in seconds for identifying replacement granules. Default is zero.

  34. dataClass Element • PUBLISH_ECHO@, PUBLISH_MIRADOR@, and PUBLISH_GIOVANNI@ are Boolean values indicating the publication requirement for each partner. Default is true. • EXPIRY@ is the granule retention period in days for a rolling archive dataset. Default is no expiration. • DOC@ is the filename of the README file for the dataset. • dataClass contains an optional method and one or more dataset(s). Ex: <dataClass NAME="GLDAS" GROUP="GLDAS_MONTHLY" FREQUENCY="yearly" ACCESS="public" PUBLISH_ECHO="false" PUBLISH_MIRADOR="true" PUBLISH_GIOVANNI="false"> <method>/home/s4paops/bin/s4pa_get_gldas_metadata.pl</method> <dataset NAME="GLDAS_MOS10_M"></dataset> </dataClass>

  35. method Element • method contains metadata, compression, decompression and giovanniPreprocess elements. • metadata contains the complete command for metadata extraction. • compression is a complex element containing command, tmpfile and output. • decompression is a complex element containing command, tmpfile and output. • giovanniPreprocess contains the complete command for Giovanni pre-processing.

  36. compression Element • command specifies the compression command to be used. S4PA replaces any string/substring specified as “INFILE” with name of the file being processed. Ex: hrepack -t 'l3m_data:GZIP 1' -i INFILE -o INFILE.tmp • tmpfile is the file name of command output. You can specify it in terms of “INFILE”. Ex: INFILE.tmp • output is the desired file name after compression. You can specify in terms of “INFILE”. Ex: INFILE
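
The “INFILE” substitution can be illustrated as plain string replacement. This Python sketch only shows how the three values from the slide's example would expand for one file; the function name expand_compression and the file name are hypothetical, and the sketch does not run the command.

```python
def expand_compression(command, tmpfile, output, infile):
    """Replace every occurrence of INFILE in the three compression settings."""
    return tuple(text.replace("INFILE", infile) for text in (command, tmpfile, output))


if __name__ == "__main__":
    command, tmpfile, output = expand_compression(
        "hrepack -t 'l3m_data:GZIP 1' -i INFILE -o INFILE.tmp",
        "INFILE.tmp",
        "INFILE",
        "granule.hdf",
    )
    print(command)
    print(tmpfile, output)
```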

  37. decompression Element • command specifies the decompression command to be used. S4PA replaces any string/substring specified as “INFILE” with the name of the file being processed. Ex: bunzip2 -f INFILE • tmpfile contains an anonymous Perl subroutine that is supplied with the filename as argument. It returns the name of the file produced by the decompression command. Ex: sub {my($a) = @_; $a=~s/\.bz2$//; return $a;} • output contains an anonymous Perl subroutine that is supplied with the filename as argument. It returns the desired file name after decompressing the data file. Ex: sub {my($a) = @_; $a=~s/\.bz2$//; return $a;}

  38. dataset Element • dataset has NAME@, GROUP@, FREQUENCY@, ACCESS@, TIME_MARGIN@, DIF_ENTRY_ID@, PUBLISH_ECHO@, PUBLISH_MIRADOR@, PUBLISH_GIOVANNI@, EXPIRY@, and DOC@. They override the corresponding values defined for a dataClass. • DIF_ENTRY_ID@ indicates the ID of the GCMD DIF. • dataset contains a method and optional ignoreCondition(s), uniqueAttribute(s), associateData(s), and dataVersion(s). • A method in a dataset has the same definition as in a dataClass.

  39. dataset – ignoreCondition • ignoreCondition is an XPath expression locating a metadata (XML) attribute; it specifies a case where an incoming granule has to be ignored after comparing that attribute with the same attribute of the existing granule. It has an optional OPERATOR@. • OPERATOR@ is the comparison operation. Valid values are “EQ”, “NE”, “GT”, “GE”, “LT”, “LE”. Default value is EQ. Ex: <ignoreCondition OPERATOR="LE">//DataGranule/SizeBytesDataGranule</ignoreCondition> -- This prevents the existing granule from being replaced by a smaller incoming granule covering the same RangeDateTime.
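
A possible reading of the ignoreCondition comparison is sketched below in Python: the value at the XPath location is read from both granules' metadata and the incoming granule is ignored when "incoming OPERATOR existing" holds. The operand order, the numeric comparison, the root element name GranuleMetadata and the function name are all assumptions. Note that Python's ElementTree needs the relative form ".//" where the slide's XPath uses "//".

```python
import operator
import xml.etree.ElementTree as ET

OPERATORS = {
    "EQ": operator.eq, "NE": operator.ne,
    "GT": operator.gt, "GE": operator.ge,
    "LT": operator.lt, "LE": operator.le,
}


def should_ignore(incoming_xml, existing_xml, xpath, op="EQ"):
    """True when the incoming value compares to the existing one under op."""
    incoming = ET.fromstring(incoming_xml).findtext(xpath)
    existing = ET.fromstring(existing_xml).findtext(xpath)
    if incoming is None or existing is None:
        return False  # attribute missing: nothing to compare
    return OPERATORS[op](float(incoming), float(existing))


def metadata(size):
    """Build a tiny illustrative metadata document (root name is hypothetical)."""
    return (
        "<GranuleMetadata><DataGranule>"
        f"<SizeBytesDataGranule>{size}</SizeBytesDataGranule>"
        "</DataGranule></GranuleMetadata>"
    )


XPATH = ".//DataGranule/SizeBytesDataGranule"

if __name__ == "__main__":
    print(should_ignore(metadata(90), metadata(100), XPATH, "LE"))
```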

  40. dataset – uniqueAttribute • uniqueAttribute is the location of a metadata (XML) attribute, specified using XPath, for determining the uniqueness of a granule. If the value of the XPath expression matches in all cases, the incoming granule is deemed a valid replacement. Otherwise, it is treated as a new granule. It also has an optional OPERATOR@. • OPERATOR@ is the comparison operation. Valid values are “EQ”, “NE”, “GT”, “GE”, “LT”, “LE”. Default value is EQ. Ex: <uniqueAttribute>//DataGranule/GranuleID</uniqueAttribute> -- This prevents the existing granule from being replaced by an incoming granule with a different GranuleID covering the same RangeDateTime.

  41. dataset – associateData • associateData is used to associate a data granule with its browse file from a different dataset. • associateData has a NAME@, a VERSION@, and a TYPE@. • NAME@ is the associated dataset name. • VERSION@ is the optional associated dataset version. Default is versionless. • TYPE@ is the association type, currently “Browse” only. Ex: <dataset NAME="TRMM_2A21"> <associateData NAME="TRMM_2A21_BR" TYPE="Browse"/> </dataset>

  42. dataset – dataVersion • dataVersion has LABEL@, FREQUENCY@, ACCESS@, TIME_MARGIN@, DIF_ENTRY_ID@, PUBLISH_ECHO@, PUBLISH_MIRADOR@, PUBLISH_GIOVANNI@, EXPIRY@, and DOC@. They override the corresponding values defined for a dataset. • LABEL@ can be an empty string for a versionless dataset or a non-whitespace string for a versioned dataset. • dataVersion contains optional ignoreCondition(s), uniqueAttribute(s), and associateData(s). • All attributes and elements in a dataVersion have the same definition as in a dataset.
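
The override order described across the dataClass, dataset and dataVersion slides (dataVersion over dataset over dataClass, then the documented default) can be sketched as a simple lookup chain. Plain dictionaries stand in for the XML attributes here, and the function name effective is hypothetical.

```python
# Defaults stated on the dataClass slides.
DEFAULTS = {"FREQUENCY": "daily", "ACCESS": "public", "TIME_MARGIN": 0}


def effective(attribute, data_class, dataset, data_version=None):
    """Resolve an attribute: dataVersion, then dataset, then dataClass, then default."""
    for level in (data_version, dataset, data_class):
        if level and attribute in level:
            return level[attribute]
    return DEFAULTS.get(attribute)


if __name__ == "__main__":
    data_class = {"FREQUENCY": "monthly", "ACCESS": "restricted"}
    dataset = {"ACCESS": "public"}
    print(effective("ACCESS", data_class, dataset))
    print(effective("FREQUENCY", data_class, dataset))
```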

  43. Creating S4PA Subscriptions • S4PA subscriptions are described in a file called the subscription configuration. Suggested name: subscription_<InstanceName>.xml • Two types of subscription are supported: Pull (the user initiates the download) and Push (S4PA pushes files to users). • The subscription descriptor is based on the schema ftp://s4pt.ecs.nasa.gov/software/s4pa/S4paSubscription.xsd • Run command s4pa_update_subscription.pl -f <SubscriptionConfiguration> -d <DescriptorSchema> -s <SubscriptionSchema>

  44. Subscription Descriptor • The root element of the descriptor is subscription; it has a NOTICE_SUBJECT@, an HTTP_ROOT@ and an FTP_ROOT@. • NOTICE_SUBJECT@ indicates the general email delivery notice (DN) subject. • HTTP_ROOT@ specifies the root URL for accessing restricted data. • FTP_ROOT@ specifies the root URL for accessing public data. • subscription contains one or more pushSubscription and pullSubscription.

  45. pushSubscription Element • pushSubscription contains notification, destination, and one or more dataset(s). • pushSubscription has ID@, LABEL@, FTP_ROOT@, HTTP_ROOT@, MAX_GRANULE_COUNT@, USER@, INCLUDE_BROWSE@, VERIFY@. • ID@ is a unique identification string across all subscriptions. • LABEL@ indicates any user-specific string that will be included in the DN. • MAX_GRANULE_COUNT@ sets the maximum number of granules in each subscription. • USER@ specifies the username for Machine Request Interface.

  46. pushSubscription Element • INCLUDE_BROWSE@ sets the inclusion of browse file in the subscription. • VERIFY@ is to confirm the existence of the pushed files on the remote site. • notification specifies the address and the format of subscription delivery notice. It contains an optional filter. • The content of the filter specifies the user provided script to create a special format of the delivery notice (ex. XML formatted). It is only needed when the attribute FORMAT@ is specified as “USER-DEFINED”.

  47. notification Element • notification has FORMAT@, PROTOCOL@, ADDRESS@, NOTICE_SUFFIX@, and NOTICE_SUBJECT@. • FORMAT@ indicates the format of the notice. Valid values are “S4PA”, “LEGACY”, “PDR”, and “USER-DEFINED”. • PROTOCOL@ indicates the protocol to be used to send the notice. Valid values are “mailto”, “ftp”, “sftp”, “file”. • ADDRESS@ indicates the destination of the notice. It can be the subscriber’s email address for the “mailto” protocol or “<remote_host>/<remote_directory>” for other protocols. • NOTICE_SUFFIX@ and NOTICE_SUBJECT@ specify the DN file extension and the special email notification subject. Default suffix is none and default subject is: “GES DISC Order Notification Order ID: DN<xxx>-<xxx>”

  48. destination Element • destination specifies the destination for subscribed data. • destination has PROTOCOL@ and ADDRESS@. • PROTOCOL@ indicates the protocol to be used to send the notice. Valid values are “mailto”, “ftp”, “sftp”, “file”. • ADDRESS@ indicates the destination of the notice. It can be the subscriber’s email address for the “mailto” protocol or “<remote_host>/<remote_directory>” for other protocols. Ex: <notification FORMAT="S4PA" PROTOCOL="mailto" ADDRESS="s4paops@s4pt.ecs.nasa.gov"/> <destination PROTOCOL="ftp" ADDRESS="s4pt.ecs.nasa.gov/ftp/private/TS2/push"/>
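
The two ADDRESS@ forms can be told apart by protocol, as in the Python sketch below. It assumes that everything after the first "/" is the remote directory and that the directory is absolute; the slides do not state this explicitly, and the function name split_address is hypothetical.

```python
def split_address(protocol, address):
    """Interpret ADDRESS@ as an email address or as <remote_host>/<remote_directory>."""
    if protocol == "mailto":
        return {"email": address}
    # Assumption: the text after the first "/" is an absolute remote directory.
    host, _, directory = address.partition("/")
    return {"host": host, "directory": "/" + directory}


if __name__ == "__main__":
    print(split_address("mailto", "s4paops@s4pt.ecs.nasa.gov"))
    print(split_address("ftp", "s4pt.ecs.nasa.gov/ftp/private/TS2/push"))
```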

  49. subscription - dataset Element • dataset contains optional validator and filter. • dataset has a NAME@ and an optional VERSION@. • NAME@ is the dataset name for the subscription. • VERSION@ is the version label for the subscription. It defaults to all versions under the specified dataset. • validator is used to decide whether an incoming granule will trigger the subscription to be processed. The content should be a boolean value (true or false) or a script that returns a boolean value. Specify “false” for a Machine Request Interface (MRI) only subscription, which disables triggering from ingest. The default content is “true”.

  50. subscription - dataset Element • filter specifies the user-provided script to convert the pattern-matched file and deliver the output to the subscriber. • filter has a PATTERN@ to specify the file pattern to which the filtering scheme applies. Ex: <dataset NAME="D5OIXMET" VERSION="5.1.0"> <validator>s4pa_sub_check.pl -b '2007-01-01' -e '2009-12-31'</validator> <filter PATTERN="xml">s4pa_extract_ODL.pl -o /var/tmp</filter> </dataset>
