iRODS Prototype Update
NCCS Advanced Technology Team
16 March 2009
Change Log NCCS Data Management.
Outline
• What is iRODS?
• iRODS Commands
• Rules and Micro-services
• NCCS Prototype
• Prototype Tests
• Web Browser and HDF5 Viewer
• NCCS Architecture and Data Management
• What Next
• Backup Slides
  • Additional iRODS Information
  • Performance Testing
What Is iRODS?
• Integrated Rule-Oriented Data System
• Data grid software system developed by the Data Intensive Cyber Environments (DICE) group (developers of the SRB, the Storage Resource Broker) and collaborators
• Or it is everything and/or nothing
Basic iRODS Components
[Figure: iRODS installation(s), each with an iCAT holding metadata, resource(s), collection(s), user(s), and admin(s); accessed through icommands and GUIs/APIs; installations connected by federation]
icommands – Unix-Like
• iinit: Initialize; store your password in a scrambled form for automatic use by other icommands.
• iput: Store a file.
• iget: Get a file.
• imkdir: Like mkdir, make an iRODS collection (similar to a directory or Windows folder).
• ichmod: Like chmod, allow (or later restrict) access to your data objects by other users.
• icp: Like cp or rcp, copy an iRODS data object.
• irm: Like rm, remove an iRODS data object.
• ils: Like ls, list iRODS data objects (files) and collections (directories).
• ipwd: Like pwd, print the iRODS current working directory.
• icd: Like cd, change the iRODS current working directory.
• irepl: Replicate data objects.
• iexit: Log out (use 'iexit full' to remove your scrambled password from the disk).
• ipasswd: Change your iRODS password.
• ichksum: Checksum one or more data objects or collections in iRODS space.
• imv: Move or rename an iRODS data object or collection.
• iphymv: Physically move files in iRODS to another storage resource.
• ireg: Register a file, or a directory of files and subdirectories, into iRODS.
• irmtrash: Remove one or more data objects or collections from an iRODS trash bin.
• irsync: Synchronize data between a local copy and the copy stored in iRODS, or between two iRODS copies.
• itrim: Trim the number of replicas of a file in iRODS by deleting some replicas.
• iexecmd: Remotely execute (fork and exec) a command on the server.
• imcoll: Manage mounted iRODS collections and the associated cache (mount, unmount, synchronize, and purge the cache).
• ibun: Upload and download structured (e.g., tar) files.
icommands - Metadata
• imeta: Add, remove, list, or query user-defined Attribute-Value-Unit (AVU) metadata triplets
• isysmeta: Show or modify system metadata
• iquest: Query (pose a question to) the iCAT via an SQL-like interface
icommands - Informational
• ienv: Show the current iRODS environment
• ilsresc: List resources
• iuserinfo: List users
• imiscsvrinfo: Get basic server information; test communication
• irule: Submit a user-defined rule to be executed by an iRODS server
• iqstat: Show pending iRODS rule executions
• iqdel: Remove delayed rules from the queue
• iqmod: Modify delayed rules in the queue
Rules
• The Rule Engine is a critical and fundamental component of the iRODS system and is involved in many iRODS operations.
• The core set of rules is defined in the "core.irb" text file in the release.
• Names in the rules that begin with "msi" are Micro-Service Interface routines. These are C functions that the rules call and that may in turn call other iRODS functions.
• Rule format
  • actionDef | condition | workflow-chain | recovery-chain
• Example:
  • acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit|msiRollback##msiRollback##nop
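The four-field rule format can be made concrete with a small parser. This is a minimal Python sketch, assuming the simple single-line form shown on the slide: '|' separates the fields, '##' separates steps within a chain, and neither appears inside micro-service arguments. `Rule` and `parse_rule` are hypothetical names, not part of iRODS.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    action: str           # actionDef: the action name the rule is bound to
    condition: str        # an empty string is treated here as "always applies"
    workflow: list[str]   # steps run in order
    recovery: list[str]   # steps used to undo the workflow on failure


def parse_rule(line: str) -> Rule:
    """Split a one-line core.irb-style rule into its four fields.

    Assumes '|' separates the fields and '##' separates chain steps, with
    neither appearing inside micro-service arguments.
    """
    fields = line.strip().split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}: {line!r}")
    action, condition, workflow, recovery = fields
    return Rule(action, condition, workflow.split("##"), recovery.split("##"))


rule = parse_rule(
    "acCreateUser||msiCreateUser##acCreateDefaultCollections##msiCommit"
    "|msiRollback##msiRollback##nop"
)
print(rule.workflow)  # ['msiCreateUser', 'acCreateDefaultCollections', 'msiCommit']
print(rule.recovery)  # ['msiRollback', 'msiRollback', 'nop']
```

In the example the workflow and recovery chains have the same length, which is consistent with pairing each workflow step with a recovery step.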
Micro-service
• Small, well-defined procedures/functions that each perform a specific task.
• Developed and made available by system programmers and application programmers, and compiled into the iRODS server code.
• Users and administrators can chain these micro-services to implement larger, macro-level functionality (actions) that they want to use or provide for others.
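The chaining idea, together with the recovery-chain field of a rule, can be sketched as follows. This is an illustrative Python model, not the iRODS rule engine: micro-services are hypothetical callables that return 0 on success and a negative status on failure (loosely echoing the integer status returned by msi routines). On failure, the sketch runs the recovery steps paired with the already-completed workflow steps, newest first; whether the real engine also runs the failing step's own recovery entry is an assumption left open here.

```python
from typing import Callable

# Hypothetical micro-service: takes a shared context, returns 0 on success
# or a negative status on failure.
MicroService = Callable[[dict], int]


def run_chain(workflow: list[MicroService], recovery: list[MicroService], ctx: dict) -> int:
    """Run workflow steps in order; on the first failure, run the recovery
    steps paired with the already-completed steps, newest first."""
    for i, step in enumerate(workflow):
        status = step(ctx)
        if status < 0:
            for undo in reversed(recovery[:i]):
                undo(ctx)
            return status
    return 0


# Usage: the second step fails, so only the first step's recovery runs.
log: list[str] = []


def ok(name: str) -> MicroService:
    def step(ctx: dict) -> int:
        log.append(name)
        return 0
    return step


def fail(ctx: dict) -> int:
    log.append("fail")
    return -1


status = run_chain(
    [ok("create"), fail, ok("commit")],
    [ok("rollback_create"), ok("rollback_fail"), ok("nop")],
    {},
)
print(status, log)  # -1 ['create', 'fail', 'rollback_create']
```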
Adding a Micro-service
• Develop a module (a collection of specialized micro-services)
• Conform to the module directory structure
• Write the micro-services' C code (HDF5 example printout)
• Enable the module
• Make the module
• Rebuild the action tables
msiDataObjRepl Micro-service Example
/**
 * \fn msiDataObjRepl
 * \module core
 * \author Mike Wan
 * \date 2007
 * \brief replicate an existing data object
 * \param[in STR_MS_T or DataObjInp_MS_T] dataObjName: Path name of data object
 * \param[in STR_MS_T] rsrcName: optional
 * \param[out INT_MS_T] status: status of the operation
 * \DolVarDependence none
 * \DolVarModified none
 * \iCatAttrDependence none
 * \iCatAttrModified none
 * \sideeffect none
 * \return integer
 * \retval 0 on success
 * \bug no known bugs
**/
iRODS Prototype
iput
[Diagram: the client sends the data to the resource's /<filesystem> and the metadata to the iCAT]
iput -R <resource> </path/filename>
iput With Replicate
[Diagram: the client iputs data to Resource 1 and metadata to the iCAT; a rule added to core.irb replicates the data to Resource 2, and metadata for both copies is recorded in the iCAT]
ils Showing Multiple Copies
kirk@client1nccs:~> ils -L
/archivenccsZone/home/kirk:
  kirk   0 client1nccsResc   0 2009-02-27.13:11 & file_1
        /tms/home/kirk/file_1
  kirk   1 archivenccsResc   0 2009-02-27.13:12 & file_1
        /home/archivenccs/iRODS/Vault/home/kirk/file_1
  kirk   0 client1nccsResc   0 2009-02-27.13:11 & file_2
        /tms/home/kirk/file_2
  kirk   1 archivenccsResc   0 2009-02-27.13:13 & file_2
        /home/archivenccs/iRODS/Vault/home/kirk/file_2
  kirk   0 archivenccsResc   0 2009-02-27.13:11 & file_3
        /home/archivenccs/iRODS/Vault/home/kirk/file_3
  kirk   1 client1nccsResc   0 2009-02-27.13:13 & file_3
        /tms/home/kirk/file_3
  kirk   0 archivenccsResc   0 2009-02-27.13:11 & file_4
        /home/archivenccs/iRODS/Vault/home/kirk/file_4
  kirk   1 client1nccsResc   0 2009-02-27.13:13 & file_4
        /tms/home/kirk/file_4
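A listing like the one above can be checked programmatically, for example to confirm that every data object has a replica on the archive resource. The Python sketch below assumes the column layout shown (owner, replica number, resource, size, timestamp, optional '&' flag, name, then an indented physical path) and names without spaces; `replicas_by_object` is a hypothetical helper, not an iRODS API.

```python
import re
from collections import defaultdict

# Assumed layout of an `ils -L` replica line:
# owner  replNum  resource  size  timestamp  [&]  name
# '&' marks an up-to-date replica; it is treated as optional here.
SUMMARY = re.compile(
    r"^\s*(?P<owner>\S+)\s+(?P<repl>\d+)\s+(?P<resc>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<ts>\S+)\s+(?:(?P<status>&)\s+)?(?P<name>\S+)\s*$"
)


def replicas_by_object(listing: str) -> dict[str, list[dict]]:
    """Group replica records by data-object name."""
    result: dict[str, list[dict]] = defaultdict(list)
    current = None
    for line in listing.splitlines():
        m = SUMMARY.match(line)
        if m:
            current = m.groupdict()
            current["repl"] = int(current["repl"])
            current["size"] = int(current["size"])
            result[current["name"]].append(current)
        elif current is not None and line.strip().startswith("/"):
            # The indented line after a replica line is its physical path.
            current["path"] = line.strip()
            current = None
    return dict(result)


listing = """\
/archivenccsZone/home/kirk:
  kirk   0 client1nccsResc   0 2009-02-27.13:11 & file_1
        /tms/home/kirk/file_1
  kirk   1 archivenccsResc   0 2009-02-27.13:12 & file_1
        /home/archivenccs/iRODS/Vault/home/kirk/file_1
"""
reps = replicas_by_object(listing)
for name, copies in reps.items():
    print(name, [(c["repl"], c["resc"]) for c in copies])
# file_1 [(0, 'client1nccsResc'), (1, 'archivenccsResc')]
```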
ireg
[Diagram: the file already resides in the resource's /<filesystem>; the client registers it, and only its metadata is recorded in the iCAT]
ireg -R <resource> </path/filename> </irods/full/path>
ireg With Replicate
[Diagram: a file registered in place on Resource 1 is replicated to Resource 2 by a rule added to core.irb; metadata for both copies is recorded in the iCAT]
ireg With Replicate – Shared File System
[Diagram: iCAT (metadata); Client and Client N/Resource 1 sharing the /<filesystem> that holds the data; Resource 2 holding a replica of the data]
iget
[Diagram: the client looks up the object's metadata in the iCAT and retrieves the data from the resource's /<filesystem>]
iget -R <resource> </path/filename>
iget Replication Number
[Diagram: with copies on Resource 1 and Resource 2, the client uses iget -n to choose which replica is read]
isysmeta
[hoot@leftknee src]$ isysmeta -l ls hdf5_test.h5
doing ls of /leftkneeZone/home/leftknee/hdf5_test.h5
data_name: hdf5_test.h5
data_id: 10012
coll_id: 10008
data_repl_num: 0
data_version:
data_type_name: generic
data_size: 1782027
resc_group_name:
resc_name: leftkneeResc
data_path : /home/hoot/irods/iRODS/Vault/home/leftknee/hdf5_test.h5
data_owner_name: leftknee
data_owner_zone: leftkneeZone
data_repl_status: 1
data_status:
data_checksum :
data_expiry_ts (expire time): : None
data_map_id: 0
r_comment:
create_ts: 01235592554: 2009-02-25.15:09:14
modify_ts: 01235592554: 2009-02-25.15:09:14
imeta – Attribute Value Units
[hoot@leftknee src]$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
None
[hoot@leftknee src]$ imeta add -d hdf5_test.h5 length 10 meters
[hoot@leftknee src]$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
[hoot@leftknee src]$ imeta add -d hdf5_test.h5 weight 213 kilograms
[hoot@leftknee src]$ imeta ls -d hdf5_test.h5
AVUs defined for dataObj hdf5_test.h5:
attribute: length
value: 10
units: meters
----
attribute: weight
value: 213
units: kilograms
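The AVU model amounts to a list of (attribute, value, units) triplets attached to each data object. As a rough in-memory illustration only (the real triplets live in the iCAT database), the hypothetical `MetadataStore` below mirrors the `imeta add` / `imeta ls` session above and adds an exact attribute/value lookup in the spirit of `imeta qu`. Values are kept as strings, the way they are typed on the command line.

```python
from collections import defaultdict
from typing import NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    units: str = ""


class MetadataStore:
    """Hypothetical in-memory stand-in for user-defined metadata in the iCAT."""

    def __init__(self) -> None:
        self._avus: dict[str, list[AVU]] = defaultdict(list)

    def add(self, obj: str, attribute: str, value: str, units: str = "") -> None:
        # Analogous to: imeta add -d <obj> <attribute> <value> [units]
        self._avus[obj].append(AVU(attribute, value, units))

    def ls(self, obj: str) -> list[AVU]:
        # Analogous to: imeta ls -d <obj>; an object with no AVUs yields []
        return list(self._avus.get(obj, []))

    def query(self, attribute: str, value: str) -> list[str]:
        # Objects carrying an AVU with this exact attribute and value
        return sorted(
            obj for obj, avus in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )


store = MetadataStore()
store.add("hdf5_test.h5", "length", "10", "meters")
store.add("hdf5_test.h5", "weight", "213", "kilograms")
for avu in store.ls("hdf5_test.h5"):
    print(f"attribute: {avu.attribute} value: {avu.value} units: {avu.units}")
print(store.query("weight", "213"))  # ['hdf5_test.h5']
```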
iRODS Web Browser
HDFview iRODS
iRODS Explorer For Windows
Other iRODS Access Methods
• FUSE
  • File-system-like interface
  • Tested; caching and performance concerns
• PRODS
  • PHP client API
  • Does not depend on any external library
  • Talks to the iRODS server directly via sockets using the native iRODS XML protocol
• Jargon
  • Pure Java API for developing programs with a data grid interface
  • Currently handles file I/O for local and SRB/iRODS file systems, as well as querying and modifying SRB/iRODS metadata
  • Easily extensible to other file systems
• WebDAV
  • Access from an iPhone
Security
• Default is single authentication: user/password
• Grid Security Infrastructure (GSI) option
  • Globus is a prerequisite
  • Based on public key cryptography
Passwords
• A challenge/response protocol using an MD5 hash confirms that the user has the correct password
  • Routines are derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm
  • The password is not sent over the network
• iRODS user passwords are stored in the iCAT database in a scrambled form
• iinit stores the password on disk in a scrambled form
  • Avoids storing plain-text passwords in files
  • Warning: anyone with the source code can descramble the passwords
  • The scrambling algorithm is iRODS-specific and is not high-grade encryption
• Database system (PostgreSQL) passwords are used to control access to the iCAT database
  • Stored in a server configuration file (by the install script), also in a scrambled form
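The challenge/response idea can be illustrated with Python's standard `hashlib`. This is a generic sketch of an MD5 challenge/response exchange, not the iRODS wire protocol: the 64-byte random challenge and the plain challenge-plus-password concatenation are assumptions made for illustration. What it shows is the point made on the slide: only a digest crosses the network, never the password.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    # Server side: a fresh random challenge for each login attempt (length assumed).
    return secrets.token_bytes(64)


def client_response(challenge: bytes, password: str) -> bytes:
    # Client side: hash the challenge together with the password.
    # Only this 16-byte digest is sent, never the password itself.
    return hashlib.md5(challenge + password.encode()).digest()


def server_verify(challenge: bytes, response: bytes, stored_password: str) -> bool:
    # Server side: recompute the digest from its own copy of the password
    # and compare in constant time.
    expected = hashlib.md5(challenge + stored_password.encode()).digest()
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
resp = client_response(challenge, "s3cret")
print(server_verify(challenge, resp, "s3cret"))  # True
print(server_verify(challenge, resp, "wrong"))   # False
```

Because each login uses a new challenge, a captured response cannot simply be replayed against a later challenge.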
Access Permissions - ichmod
• Default: the file owner has full control (read, write, or delete)
• As owner, you can give other users or groups read access, read and write access, or full ownership
• If 'own' is given to someone else, they can also give (and remove) access to others
• Remove access by changing the access to 'null'
• Multiple paths can be entered on the command line
• If the entered path is a collection, the access permissions of that collection are modified
  • Give write access to a user or group so they can store files in one of your collections
  • Access permissions on collections are not currently displayed via ils
  • As normally configured, all users can read all collections
• The inherit/noinherit form sets or clears the inheritance attribute of one or more collections. When a collection has this attribute set, new data objects and collections added to it inherit the collection's access permissions (ACLs). 'ils -A' displays ACLs and the inheritance status.
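The ownership and inheritance rules above can be modeled with an ordered set of access levels. The Python sketch below is a simplified, hypothetical model (the `Access` levels, `Collection` class, and its methods are illustrative, not iRODS code): only an owner may change a collection's ACL, 'null' removes an entry, and inheritance affects only objects created after it is turned on, which agrees with the inheritance example where file1 to file3 keep their earlier ACLs.

```python
from enum import IntEnum


class Access(IntEnum):
    # Ordered so that a higher level implies the lower ones.
    NULL = 0
    READ = 1
    WRITE = 2   # shown by `ils -A` as "modify object"
    OWN = 3


class Collection:
    """Hypothetical model of a collection's ACL and inheritance flag."""

    def __init__(self, owner: str) -> None:
        self.acl: dict[str, Access] = {owner: Access.OWN}
        self.inherit = False
        self.objects: dict[str, dict[str, Access]] = {}

    def chmod(self, actor: str, who: str, level: Access) -> None:
        # Only a user holding 'own' may grant or revoke access.
        if self.acl.get(actor, Access.NULL) < Access.OWN:
            raise PermissionError(f"{actor} does not own this collection")
        if level == Access.NULL:
            self.acl.pop(who, None)   # 'null' removes the entry
        else:
            self.acl[who] = level

    def add_object(self, name: str, creator: str) -> None:
        # With inheritance on, a new object starts from a copy of the collection ACL.
        acl = dict(self.acl) if self.inherit else {}
        acl[creator] = Access.OWN
        self.objects[name] = acl

    def can(self, who: str, obj: str, needed: Access) -> bool:
        return self.objects[obj].get(who, Access.NULL) >= needed


home = Collection("hoot")
home.add_object("file1", "hoot")
home.chmod("hoot", "rodsadmin", Access.OWN)
home.inherit = True                      # like `ichmod inherit <collection>`
home.add_object("file4", "hoot")
print(home.can("rodsadmin", "file1", Access.READ))   # False: created before inheritance
print(home.can("rodsadmin", "file4", Access.WRITE))  # True: inherited 'own'
```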
Group ichmod Example
archivenccs@archivenccs:~/test> ils -A
/archivenccsZone/home/hoot:
        ACL - hoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own
  file3
        ACL - hoot#archivenccsZone:own
ichmod read blue file1
ichmod write red file2
ichmod own rodsadmin file3
archivenccs@archivenccs:~/test> ils -A
/archivenccsZone/home/hoot:
        ACL - hoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - blue#archivenccsZone:read object   hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own   red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own   rodsadmin#archivenccsZone:own
Collection ichmod Example
ichmod own rodsadmin /archivenccsZone/home/hoot
archivenccs@archivenccs:~/test> ils -A
/archivenccsZone/home/hoot:
        ACL - george#archivenccsZone:own   hoot#archivenccsZone:own   rodsBoot#archivenccsZone:own
        Inheritance - Disabled
  file1
        ACL - blue#archivenccsZone:read object   hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own   red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own   rodsadmin#archivenccsZone:own
Inheritance ichmod Example
ichmod inherit /archivenccsZone/home/hoot
archivenccs@archivenccs:~/test> ils -A
/archivenccsZone/home/hoot:
        ACL - george#archivenccsZone:own   hoot#archivenccsZone:own   rodsBoot#archivenccsZone:own
        Inheritance - Enabled
  file1
        ACL - blue#archivenccsZone:read object   hoot#archivenccsZone:own
  file2
        ACL - hoot#archivenccsZone:own   red#archivenccsZone:modify object
  file3
        ACL - hoot#archivenccsZone:own   rodsadmin#archivenccsZone:own
NCCS Representative Architecture
[Diagram. Legend: Existing / Planned for FY09 / Future Plans. Components: NCCS LAN (1 GbE and 10 GbE); Data Portal; Login; Discover (existing 65 TF, FY09 upgrade ~40 TF, future upgrades TBD); Analysis; Data Gateways; Data Management; Viz; Direct Connect GPFS Nodes; ARCHIVE; GPFS I/O Nodes; storage (Disk ~300 TB, GPFS Disk Subsystems ~1.3 PB, Tape ~8 PB); Internal Services (Management Servers, License Servers, GPFS Management, PBS Servers, Other Services)]
Representative Architecture
The modelers require very fast I/O when generating data on the NCCS computational systems. The analysis users also require very fast read access to this data from the NCCS analysis platform. The generators of the data also want an easy method for sharing data, and they want to store the files in the archive for long-term stewardship and retrieval (if necessary).
[Diagram: Compute Service and Analysis Service connected to the GPFS Storage Cluster by FAST links; Data Portal and ARCHIVE connected by SLOW links]
Competing Requirements
• Capacity and Throughput
  • IPCC, as an example, requires a large amount of data to be kept on disk.
  • The modelers generating the data also need a fast file system to write and subsequently read that data.
  • The analysis users need a fast file system from which to access the large amount of data.
• All of this lends itself nicely to a global parallel file system (here GPFS, IBM's General Parallel File System).
• How do we include data management in this model?
Data Management Concept of Operations: Archive Access
[Diagram: iRODS clients on DISCOVER, iRODS iCAT, and an iRODS resource on the ARCHIVE. Transfer paths between DISCOVER and the ARCHIVE: NFS, cp, scp (SLOW, ~10 MB/sec); bbftp (NOT AS SLOW); iput, iget (A BIT FASTER). DISCOVER to the GPFS Storage Cluster: FAST]
• Pros
  • Simple, parallel transfers
  • High throughput for large files (~100 MB/sec)
  • Metadata captured
• Cons
  • No file system level interface (Is this a con?)
  • Cannot open a file from the archive (Again, con?)
Data Management Concept of Operations: Data Security and Access
• Assume we have a well-defined set of data security and access levels (examples for pedagogical purposes only)
  • Level 0: User only
  • Level 1: User and Project
  • Level 2: User, Project, and Service
  • Level 3: Publicly Accessible
• Users define their data security and access levels using the appropriate process
• When data is put into iRODS by the user under a specific project, it is labeled with the appropriate access level
• All NCCS iRODS-enabled services must then check the access level to see whether the service can access the data
• In addition, the user must grant the service access to the data
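The two checks in the last bullets (the data's level must admit services, and the user must have granted that particular service access) can be expressed as a single predicate. This Python sketch uses the slide's pedagogical levels; `DataLabel`, `may_access`, and the rule that a service always needs an explicit grant, even for Level 3 data, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    # Pedagogical levels from the slide.
    USER = 0      # Level 0: user only
    PROJECT = 1   # Level 1: user and project
    SERVICE = 2   # Level 2: user, project, and service
    PUBLIC = 3    # Level 3: publicly accessible


@dataclass
class DataLabel:
    owner: str
    project: str
    level: Level
    granted_services: set[str] = field(default_factory=set)


def may_access(label: DataLabel, requester: str, requester_projects: set[str],
               is_service: bool = False) -> bool:
    if is_service:
        # Assumption: a service needs both a permissive level and an explicit user grant.
        return label.level >= Level.SERVICE and requester in label.granted_services
    if requester == label.owner or label.level == Level.PUBLIC:
        return True
    return label.level >= Level.PROJECT and label.project in requester_projects


label = DataLabel("kirk", "ipcc", Level.SERVICE, {"portal"})
print(may_access(label, "portal", set(), is_service=True))  # True
print(may_access(label, "viz", set(), is_service=True))     # False: no grant
print(may_access(label, "hoot", {"ipcc"}))                  # True: project member
```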
Data Management Concept of Operations for IPCC Data
• Step 1: Modelers generate large amounts of data and store it in GPFS (very fast).
• Step 2: Modelers register the data sets in iRODS.
• Step 3: Automatic rules kick in to do the following:
  A. Automatically extract metadata and publish it into a database.
  B. Make a copy of the file in the NCCS archive.
• Analysis users still have very fast (GPFS) file system access to the data.
• IPCC data is presented to the data portal through either NFS or the iRODS interface.
[Diagram: Compute Service and Analysis Service with FAST links, and the Data Portal with a SLOW link, to the GPFS Storage Cluster; iRODS iCAT; ARCHIVE with SLOW links]
Data Management Concept of Operations: More Implementation Details
• Services on the data portal would have interfaces into iRODS; the portal could even have a local iRODS resource for caching data.
• The archive is accessible via iRODS; DMF is still used.
• Dedicated nodes would be a combination of GPFS clients and iRODS resources.
[Diagram: iRODS iCAT; Data Portal with iRODS clients and an iRODS resource; ARCHIVE as an iRODS resource; DISCOVER with iRODS clients and iRODS resource nodes that are also GPFS clients, with FAST access to the GPFS Storage Cluster]
Pros and Cons
• Pros
  • Very easy for users; they can register whatever they want.
  • NCCS-specific micro-services can be set up to automatically copy files to the archive.
  • Maintains fast access to the data for both modelers and analysis users.
  • Multi-stream throughput seems to work very well.
• Cons
  • No file system level access to iRODS (could be a pro)
  • No link between data in GPFS and iRODS
    • Data changed through iRODS or GPFS will not be reflected in the other
    • The data must be resynchronized periodically
  • Data within iRODS is not accessible via a file system interface.
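Because GPFS and iRODS are not linked, the periodic resynchronization listed under the cons starts with finding what has drifted. A minimal Python sketch, assuming MD5 checksums were recorded at registration time (for example with `ichksum`) and exported into a plain dict of relative path to hex digest; `find_drift` and the catalog format are hypothetical. It reports files whose contents changed on disk, registered files that disappeared, and files that were never registered.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_drift(root: Path, catalog: dict[str, str]) -> dict[str, list[str]]:
    """Compare files under `root` with a hypothetical catalog of
    relative path -> MD5 recorded when the file was registered."""
    on_disk = {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}
    changed = sorted(rel for rel, p in on_disk.items()
                     if rel in catalog and file_md5(p) != catalog[rel])
    missing = sorted(set(catalog) - set(on_disk))
    unregistered = sorted(set(on_disk) - set(catalog))
    return {"changed": changed, "missing": missing, "unregistered": unregistered}


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "a.nc").write_bytes(b"v1")
    (root / "b.nc").write_bytes(b"v2 edited in GPFS")
    (root / "new.nc").write_bytes(b"never registered")
    catalog = {
        "a.nc": hashlib.md5(b"v1").hexdigest(),
        "b.nc": hashlib.md5(b"v2").hexdigest(),
        "c.nc": hashlib.md5(b"v3").hexdigest(),
    }
    drift = find_drift(root, catalog)
print(drift)
# {'changed': ['b.nc'], 'missing': ['c.nc'], 'unregistered': ['new.nc']}
```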
Data Portal Services & Architecture
• Connectivity to the Goddard DISC and DISC SW.
• Interfaces to ESG and PCMDI for model data (IPCC AR5).
• Sufficient compute capability for some amount of analysis.
• Local disk will allow a relatively small amount of data to be cached in the portal.
• Reach-back capability into the much larger disk environment within the NCCS GPFS and Archive.
• Users will not have to move or copy data in order to make it available to the portal services.
[Diagram: Data Portal (Local Disk; NFS, GPFS MC, iRODS) connected to NASA, Other, ESG, and PCMDI]
Concerns
• Integration with ESG
• Database design, implementation, and number
• iRODS security model versus NASA/NCCS policies
  • Simple single authentication
  • GSI (Grid Security Infrastructure)
• Difficulty of developing a module/micro-service
  • Try "get best copy" as an example
• The iput/iget bandwidth discrepancy with injected delay remains unresolved
  • Continuing to explore this in the prototype
• Few or no services built on top of metadata
  • Expansive, detailed metadata will have to be scripted
Installation
• Automated install script
  • Set of preinstall queries
  • Downloads and installs all components
    • postgres (can use Oracle, etc.)
    • unixodbc
icommands - Administration
• iadmin: Administration commands to add, remove, or modify users, resources, etc. The commands are:
  • lu [name[#Zone]] (list user info; details if name entered)
  • lt [name] [subname] (list token info)
  • lr [name] (list resource info)
  • ls [name] (list directory: subdirs and files)
  • lz [name] (list zone info)
  • lg [name] (list group info (user member list))
  • lgd name (list group details)
  • lrg [name] (list resource group info)
  • lf DataId (list file details; DataId is the number (from ls))
  • mkuser Name[#Zone] Type [DN] (make user)
  • moduser Name[#Zone] [ type | zone | DN | comment | info | password ] newValue
  • rmuser Name[#Zone] (remove user, where userName: name[@department][#zone])
  • mkdir Name [username] (make directory (collection))
  • rmdir Name (remove directory)
  • mkresc Name Type Class Host Path (make resource)
  • modresc Name [type, class, host, path, comment, info, freespace] Value (modify resource)
  • rmresc Name (remove resource)
  • mkzone Name Type(remote) [Connection-info] [Comment] (make zone)
  • modzone Name [ name | conn | comment ] newValue (modify zone)
  • rmzone Name (remove zone)
  • mkgroup Name (make group)
  • rmgroup Name (remove group)
  • atg groupName userName[#Zone] (add to group: add a user to a group)
  • rfg groupName userName[#Zone] (remove from group: remove a user from a group)
  • atrg resourceGroupName resourceName (add resource to resource group)
  • rfrg resourceGroupName resourceName (remove resource from resource group)
  • at tokenNamespace Name [Value1] [Value2] [Value3] (add token)
  • rt tokenNamespace Name [Value1] (remove token)
  • spass Password Key (print a scrambled form of a password for DB)
  • dspass Password Key (descramble a password and print it)
  • pv [date-time] [repeat-time(minutes)] (initiate a periodic rule to vacuum the DB)
  • ctime Time (convert an iRODS time (integer) to local time; & other forms)
  • help (or h) [command] (this help, or more details on a command)
• Also see 'irmtrash -M -u user' for the admin mode of removing trash.
Example icommands
kirk@client1nccs:~> ienv
NOTICE: Release Version = rods2.0.1, API Version = d
NOTICE: irodsHost=archivenccs
NOTICE: irodsPort=1247
NOTICE: irodsDefResource=archivenccsResc
NOTICE: irodsHome=/archivenccsZone/home/kirk
NOTICE: irodsCwd=/archivenccsZone/home/kirk
NOTICE: irodsUserName=kirk
NOTICE: irodsZone=archivenccsZone
kirk@client1nccs:~> ils
/archivenccsZone/home/kirk:
  blah
  foo
kirk@client1nccs:~> ilsresc
archivenccsResc
client1nccsResc
Performance Assessment Summary
• Local testing over 1 Gigabit Ethernet showed wire speed for both iputs and igets
• Artificial distance testing over 1 Gigabit Ethernet (with two different delay simulators) yielded wire speed on iputs but significantly lower throughput on igets (~10% of the iput rate)
• Repeated dialogue with iRODS personnel, but the discrepancy remains unresolved
• Actual distance testing with ARSC (110 msec RTT over an OC-3 link) showed acceptable results
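For the long-distance results, the bandwidth-delay product (link rate × RTT) gives the amount of data that must be in flight to keep the pipe full, and a single TCP stream cannot exceed its window divided by the RTT. The Python sketch below applies this standard arithmetic to the ARSC path (OC-3 at 155.52 Mbit/s, 110 msec RTT). The 64 KiB window is an illustrative assumption, and the calculation is background only, not a diagnosis of the iget discrepancy.

```python
def bdp_bytes(link_bits_per_sec: float, rtt_sec: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bits_per_sec * rtt_sec / 8


def max_tcp_throughput_bits(window_bytes: float, rtt_sec: float) -> float:
    """Upper bound on single-stream TCP throughput for a given window: window / RTT."""
    return window_bytes * 8 / rtt_sec


OC3 = 155.52e6   # OC-3 line rate, bits/s
RTT = 0.110      # 110 msec round-trip time to ARSC

print(f"BDP = {bdp_bytes(OC3, RTT) / 1e6:.2f} MB")  # BDP = 2.14 MB
# An assumed 64 KiB window (no window scaling) caps one stream on this path:
print(f"{max_tcp_throughput_bits(65536, RTT) / 1e6:.1f} Mbit/s")  # 4.8 Mbit/s
```

Splitting a transfer across several streams multiplies the data in flight, which is one reason multi-stream transfers tend to help on high-RTT paths.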
Example Rule – core.irb
# 6) acPostProcForFilePathReg - Rule for post processing the registration
# of a physical file path (e.g. - ireg command).
#
# Currently, three post processing functions can be used individually or
# in sequence by these rules.
# msiExtractNaraMetadata - extract and register metadata from the just
# upload NARA files.
# msiSysReplDataObj(replResc, allFlag) - can be used to replicate a copy of
# the file just uploaded or copied data object to the specified replResc
# The allFlag is only meaningful if the replResc is a resource group. In
# this case, setting allFlag to "all" means a copy will be made in all
# the resources in the resource group. A "null" input means a single
# will be made in one of the resource in the resource group
#
# msiSysChksumDataObj - checksum the just uploaded or copied data object.
# acPostProcForPut||msiSysChksumDataObj##msiSysReplDataObj(demoResc8,all)|nop##nop
# acPostProcForPut||msiSysReplDataObj(demoResc8,all)|nop
# acPostProcForPut||msiSysChksumDataObj|nop
# acPostProcForPut||delayExec(<A></A>,msiSysReplDataObj(demoResc8,all),nop)|nop
# acPostProcForPut||msiSysReplDataObj(demoResc8,all)|nop
#acPostProcForPut||msiSetDataTypeFromExt|nop
acPostProcForPut||nop|nop
acPostProcForCopy||nop|nop
acPostProcForFilePathReg||nop|nop
rulegen is a parser that translates rules written in a friendlier language into the cryptic form needed by irule and core.irb. Input files for rulegen should preferably have a .r extension, and the output created by rulegen has a .ir extension. The grammar for the language of the input files is given at the end of this note.
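Steps in the example rules may carry arguments, including nested calls such as `delayExec(<A></A>,msiSysReplDataObj(demoResc8,all),nop)`. A minimal Python sketch of splitting such a step into a name and its top-level arguments, assuming balanced parentheses and no commas inside the `<...>` delay condition. `split_top_level` and `parse_call` are hypothetical helpers; a complete parser would also have to avoid splitting on '##' inside nested chains.

```python
def split_top_level(s: str, sep: str = ",") -> list[str]:
    """Split on `sep` only where it is not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def parse_call(step: str) -> tuple[str, list[str]]:
    """'msiSysReplDataObj(demoResc8,all)' -> ('msiSysReplDataObj', ['demoResc8', 'all'])."""
    step = step.strip()
    if "(" not in step:
        return step, []          # bare step such as 'nop'
    name, _, rest = step.partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced call: {step!r}")
    inner = rest[:-1]
    return name, split_top_level(inner) if inner else []


workflow = "msiSysChksumDataObj##msiSysReplDataObj(demoResc8,all)"
print([parse_call(s) for s in workflow.split("##")])
# [('msiSysChksumDataObj', []), ('msiSysReplDataObj', ['demoResc8', 'all'])]
print(parse_call("delayExec(<A></A>,msiSysReplDataObj(demoResc8,all),nop)"))
# ('delayExec', ['<A></A>', 'msiSysReplDataObj(demoResc8,all)', 'nop'])
```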
Local 1 Gigabit – iputs