1. 1 iRODS - integrated Rule Oriented Data System Reagan Moore
rwmoore@renci.org
2. 2 Development Team DICE team
Arcot Rajasekar - iRODS Development Lead
Mike Wan - iRODS Chief Architect
Wayne Schroeder - iRODS Product Mgr., Developer
Bing Zhu - Fedora, Windows
Mike Conway - Java (Jargon)
Paul Tooby - Documentation, Foundation
Sheau-Yen Chen - Data Grid Administration
Reagan Moore - PI
Preservation
Richard Marciano - Preservation Development Lead
Chien-Yi Hou - Preservation Micro-services
Antoine de Torcy - Preservation Micro-services
4. 4 Scale of iRODS Data Grid Number of files
Tens to millions to hundreds of millions of files
Size of data
Gigabytes to hundreds of terabytes to petabytes of data
Number of policy enforcement points
64 actions define when policies are checked
System state information
112 metadata attributes for system information per file
Number of functions
185 composable micro-services
Number of storage systems that are linked
One to tens to a hundred storage resources
Number of data grids
One to federation of tens of data grids
5. 5 Data are Inherently Distributed Distributed sources
Projects span multiple institutions
Distributed analysis platforms
Grid computing
Distributed data storage
Minimize risk of data loss, optimize access
Distributed users
Caching of data near user
Multiple stages of data life cycle
Data repurposing for use in broader context
6. 6 Organize Distributed Data into a Sharable Collection Project repository
MotifNet - manage collection of analysis products
Institutional repository
Carolina Digital Repository for UNC collections
Regional collaboration
RENCI Data Grid linking resources across North Carolina
National collaboration
NSF Temporal Dynamics of Learning Center
Australian Research Collaboration Service
National Library
French National Library
National Archive
NARA Transcontinental Persistent Archive Prototype, Taiwan
International collaboration
BaBar High Energy Physics (SLAC-IN2P3)
National Optical Astronomy Observatory (Chile-US)
7. 7 Logical Name Spaces
8. Social Challenges Every community prefers their user interface
Unix shell commands - icommands
Java I/O library - JARGON / JUX
C I/O library
Portals - EnginFrame
Digital Libraries - Fedora / Dspace
Workflows - Kepler / Taverna
Transport - GridFTP / Parrot
Web browsers / Windows browser
Load libraries - Python (Pyrods)
User level file systems - FUSE / WebDAV / PetaFS
Grid APIs - JSAGA
Web services - URSpace / VOSpace
Future ports - Islandora / iDROP
9. Heterogeneity Challenges Many types of operating systems
Unix variants, 32-bit/64-bit
Mac OSX/IntelPC, Mac OSX/PowerPc
Linux
Windows XP, Vista
Many types of storage systems
File systems
Tape archives
Cloud storage
Different administrative domains
Challenge-response authentication
Kerberos
GSI - Grid Security Infrastructure (PKI certificates)
Shibboleth
10. 10 Data Virtualization
11. 11 iRODS - Policy-based Management Turn policies into computer actionable rules
Compose rules by chaining micro-services
Manage state information as attributes on namespaces:
Files / collections / users / resources / rules
Validate assessment criteria
Queries on state information, parsing of audit trails
Automate administrative functions
12. 12 iput With Replication
13. 13 Under the hood - a glimpse
14. 14 iRODS Distributed Data Management
15. iRODS Wiki Presentations, papers, tutorials
http://irods.diceresearch.org
Open source software - BSD license
Contributed clients, software
Performance assessments
Download source code
Windows - binary release
Unix / Mac / Linux build from source
iRODS Primer
Morgan & Claypool
Synthesis Lectures on Information Concepts, Retrieval, and Services
17. 17 Infrastructure Independence Manage properties of the collection independently of the choice of technology
Access, authentication, authorization, description, location, distribution, replication, integrity, retention
Enforce policies across all storage locations
Rule Engine resident at each storage site
Apply procedures at each remote storage site
Chain encapsulated operations into workflows
Use infrastructure independence to enable use of new technology without interruption
Integrate new access methods, new storage systems, new network protocols, new authentication systems
18. 18 Data Grid Security Manage name spaces for:
{users, files, storage}
Assign access controls as constraints imposed between two logical name spaces
Access controls remain invariant as files are moved within the data grid
Controls on: Files / Storage systems / Metadata
Authenticate each user access
PKI, Kerberos, challenge-response, Shibboleth
Use internal or external identity management system
Authorize all operations
ACLs (Access Control Lists) on users and groups
Separate condition for execution of each rule
Internal approval flags (IRB) within a rule
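The claim that access controls stay invariant when files move can be illustrated with a minimal Python sketch. It is not iRODS code: the dictionaries, user name, paths, and resource names are all hypothetical, and the only point is that the permission check consults logical names and never the physical location.

```python
# Hypothetical in-memory model: ACLs relate two logical name spaces
# (users and files); storage location lives in a separate mapping.
acl = {("alice", "/zone/home/proj/data.txt"): "read"}
replicas = {"/zone/home/proj/data.txt": "unix-disk-A"}


def can_read(user, logical_path):
    """Check access using logical names only."""
    return acl.get((user, logical_path)) in ("read", "write", "own")


def move_file(logical_path, new_resource):
    """Relocate the physical copy; the logical name and its ACL are untouched."""
    replicas[logical_path] = new_resource


before = can_read("alice", "/zone/home/proj/data.txt")
move_file("/zone/home/proj/data.txt", "tape-archive-B")
after = can_read("alice", "/zone/home/proj/data.txt")
print(before, after)
```

Because `move_file` only changes the replica mapping, the answer from `can_read` is the same before and after the move.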
19. 19 iRODS Rules and Micro-services Reagan W. Moore
20. Rule Base Rules stored in core.irb file
Separate copy of core.irb installed at each storage location
Can have storage or site specific rules
Each rule is associated (through its name) with specific event in the iRODS framework (64 hooks)
acPreProcForPut
acPostProcForPut
acDeleteUser
Can also execute user-defined rules through the irule command
21. Variables Session variables
Define parameters associated with the client session, such as:
$userNameClient
$rodsZoneClient
Workflow variables
Define parameters used within the workflow
*A, *CollName
stdout
Persistent state information
Maintained across sessions, stored in iCAT
DATA_NAME, DATA_SIZE, COLL_NAME, DATA_CHECKSUM
META_DATA_ATTR_NAME, META_DATA_ATTR_UNITS
22. 22 iRODS Rules Each rule defines
An action for an event
Condition
Action chains (micro-services and rules)
Recovery chains
Invoked by servers to enforce policies
Invoked by clients to run workflows on servers
Rule types
Atomic -- applied immediately
Deferred -- run at a later time in the background
Periodic -- run at a fixed time interval
23. 23 Format of a Rule Action | Condition | MS1, …, MSn | RMS1, …, RMSn
Action
Name of action to be performed
Name known to the server and invoked by server
Condition - condition under which the rule applies
Micro-services - executed in order when the condition holds
Recovery micro-services - if any micro-service fails, the recovery micro-service(s) are executed to maintain transactional consistency
Example of MS/RMS
createFile(*F) removeFile(*F)
ingestMetadata(*F,*M) rollback
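The Action | Condition | MS chain | RMS chain format can be sketched in Python as a list of (micro-service, recovery) pairs. This is an illustration, not the rule engine: the function names are hypothetical stand-ins for createFile/removeFile and ingestMetadata/rollback, and the sketch assumes that on failure the recovery steps for the failing step and all earlier steps run in reverse order.

```python
def run_rule(condition, steps, ctx):
    """Run (micro_service, recovery) pairs in order if the condition holds."""
    if not condition(ctx):
        return "skipped"
    attempted = []
    for micro_service, recovery in steps:
        attempted.append(recovery)
        try:
            micro_service(ctx)
        except Exception:
            # Assumed policy: undo the failing step and every earlier step,
            # most recent first.
            for undo in reversed(attempted):
                undo(ctx)
            return "recovered"
    return "ok"


def create_file(ctx):
    ctx["files"].add(ctx["name"])


def remove_file(ctx):
    ctx["files"].discard(ctx["name"])


def ingest_metadata(ctx):
    raise RuntimeError("catalog unavailable")  # simulated failure


def rollback(ctx):
    ctx["metadata"].pop(ctx["name"], None)


ctx = {"name": "a.txt", "files": set(), "metadata": {}}
steps = [(create_file, remove_file), (ingest_metadata, rollback)]
status = run_rule(lambda c: True, steps, ctx)
print(status, ctx["files"])
```

After the simulated metadata failure, the file created in the first step has been removed again, which is the transactional behavior the recovery chain is meant to provide.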
24. 24 Condition Condition under which this Rule applies
Examples
$rescName == demoResc8
$objPath like /x/y/z/*
Many operators
==, !=, >, <, >=, <=
%%, !! (and, or)
expr like reg-expr , expr not like reg-expr , expr ::= string
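As a rough illustration of how such conditions select when a rule fires, the Python sketch below evaluates the two example conditions against a dictionary of session values. It is an assumption-laden stand-in: the `like` operator is approximated with shell-style wildcards via `fnmatch`, which may not match the rule engine's own pattern semantics, and the session dictionary is hypothetical.

```python
from fnmatch import fnmatchcase


def condition_holds(session, variable, operator, operand):
    """Evaluate one 'variable operator operand' condition."""
    value = session[variable]
    if operator == "==":
        return value == operand
    if operator == "!=":
        return value != operand
    if operator == "like":
        # Assumption: '*' behaves as a wildcard, as in shell globbing.
        return fnmatchcase(value, operand)
    if operator == "not like":
        return not fnmatchcase(value, operand)
    raise ValueError("unsupported operator: " + operator)


session = {"rescName": "demoResc8", "objPath": "/x/y/z/file.dat"}
print(condition_holds(session, "rescName", "==", "demoResc8"))
print(condition_holds(session, "objPath", "like", "/x/y/z/*"))
```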
25. 25 Micro-services (MSs) Well-defined Server-side Procedures and Functions
C functions on servers
MSs can be chained to form workflow using ‘##’
msiDataObjOpen(*A,*S_FD)##
msiDataObjRead(*S_FD,10000,*R_BUF)##
msiDataObjClose(*S_FD,*stat)
Flow control
whileExec - while loop
forExec – for loop
forEachExec – for each in the table or list
break
ifExec – if-else
26. 26 Micro-services – flow control examples whileExec
assign(*A,0)##whileExec( *A < 20, writeLine(stdout,*A)##assign(*A, *A + 4), nop##nop)
forExec
forExec(assign(*A,0), *A < 20 , assign(*A,*A + 4), writeLine(stdout,*A),nop)
ifExec
ifExec(*A > *D, assign(*A,*D),nop,assign(*D,*A),nop)
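For readers more familiar with general-purpose languages, the three flow-control examples correspond roughly to the Python below. The function names are hypothetical, and collecting values in a list stands in for writeLine(stdout, ...).

```python
def while_exec_example():
    """Mirror of the whileExec example: count from 0 in steps of 4 while below 20."""
    out = []
    a = 0
    while a < 20:
        out.append(a)
        a = a + 4
    return out


def for_exec_example():
    """Mirror of the forExec example: same loop written as init / test / step."""
    out = []
    for a in range(0, 20, 4):
        out.append(a)
    return out


def if_exec_example(a, d):
    """Mirror of the ifExec example: if A > D set A to D, otherwise set D to A."""
    if a > d:
        a = d
    else:
        d = a
    return a, d


print(while_exec_example())
print(for_exec_example())
print(if_exec_example(5, 3))
```

Both loops produce 0, 4, 8, 12, 16, and the if-else leaves both variables holding the smaller of the two values.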
27. 27 Other Micro-services
delayExec - execute MSs at a later time
Executed by the iRODS batch server (irodsReServer) in the background
Example
delayExec(<PLUSET>1m</PLUSET>,msiReplColl(*desc_coll,*desc_resc, backupMode,*outbuf),nop)
Time keywords
PLUSET – exec after the specified time has passed
ET – exec at the specified time (<ET>23:00</ET>)
EF – repeat exec at the specified frequency
Can be combined
<PLUSET>1m</PLUSET><EF>5m</EF>
remoteExec – execute MSs on remote servers
remoteExec(andal.sdsc.edu,null,msiSleep(10,0)##writeLine(stdout,open remote write in andal), nop)
assign - assign a value to a parameter
writeString - write a string to stdout buffer
writeLine - write a line (with end of line) to stdout buffer
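The tagged time keywords passed to delayExec, such as `<PLUSET>1m</PLUSET><EF>5m</EF>`, can be pulled apart with a small parser. The Python sketch below is illustrative only: the helper names are hypothetical, and the unit suffixes s, m, h, d are an assumption about the interval format rather than something stated on the slide.

```python
import re

# Assumed unit suffixes for intervals such as "1m" or "5m".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_delay_tags(text):
    """Return a dict mapping each tag name (PLUSET, ET, EF) to its raw value."""
    return dict(re.findall(r"<(\w+)>(.*?)</\1>", text))


def to_seconds(interval):
    """Convert an interval like '5m' to seconds under the assumed units."""
    match = re.fullmatch(r"(\d+)([smhd])", interval)
    if match is None:
        raise ValueError("unrecognized interval: " + interval)
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]


tags = parse_delay_tags("<PLUSET>1m</PLUSET><EF>5m</EF>")
print(tags, to_seconds(tags["PLUSET"]), to_seconds(tags["EF"]))
```

Under these assumptions the combined example means "start after 60 seconds, then repeat every 300 seconds".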
28. 28 Micro-Services parameters Micro-services communicate through:
Arguments/Parameters
Input from the initiator (client/server)
Literals
Variables
start with *
Output of a MS can be used as input of another MS in a MS chain
System Session Parameters
Start with “$”
Valid across rule invocations
Persistent data – iCat
Query the iCat
Valid across sessions
XMessages – out-of-band communications
Sender obtains send/receive tickets
Pass receive ticket to receivers
Receivers use the ticket to read the message
Message exchange
Between parallel sessions
Between the batch manager and the task manager on the task status
29. 29 Example of passing parameters between Micro-services
trimColl.ir file:
myTestRule||acGetIcatResults(*Action,*Condition,*B)##
forEachExec(*B,msiDataObjTrim(*B,tgReplResc,null,1,null,*C),nop)|nop##nop
*Action=trim%*Condition= COLL_NAME = '/tempZone/home/rods/loopTest'
*Action%*Condition
irule -F trimColl.ir
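The layout of a .ir file like trimColl.ir (rule text, then input parameters separated by `%`, then output parameters) can be made explicit with a short parser. This Python sketch is hypothetical and deliberately naive: it assumes the last two non-empty lines are the inputs and outputs, and it splits chains on `##` without handling `##` nested inside a micro-service call.

```python
def parse_rule_file(text):
    """Split a .ir file into rule parts, input parameters, and output parameters."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    *rule_lines, inputs_line, outputs_line = lines
    # Rule format: name | condition | micro-service chain | recovery chain
    name, condition, actions, recovery = "".join(rule_lines).split("|")
    inputs = {}
    for item in inputs_line.split("%"):
        key, _, value = item.partition("=")  # split on the first '=' only
        inputs[key.strip()] = value.strip()
    return {
        "name": name,
        "condition": condition,
        "actions": actions.split("##"),
        "recovery": recovery.split("##"),
        "inputs": inputs,
        "outputs": outputs_line.split("%"),
    }


SAMPLE = """myTestRule||acGetIcatResults(*Action,*Condition,*B)##forEachExec(*B,msiDataObjTrim(*B,tgReplResc,null,1,null,*C),nop)|nop##nop
*Action=trim%*Condition= COLL_NAME = '/tempZone/home/rods/loopTest'
*Action%*Condition
"""

parsed = parse_rule_file(SAMPLE)
print(parsed["name"], parsed["inputs"])
```

The parse shows how `*Action` and `*Condition` are bound on the input line and then consumed by acGetIcatResults, whose output `*B` feeds forEachExec.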
30. 30 Using the rulegen parser See: https://www.irods.org/index.php/HELP.rulegen
Uses a nicer rule language and converts it into the core.irb version
rulegen -s rX.r
This converts from the rulegen syntax to the core.irb syntax and displays the result on your screen
rulegen -s rX.r > rX.ir
This converts from the rulegen syntax to the core.irb syntax and stores the result in the file rX.ir
irule -F rX.ir
Executes the policy
31. 31 Adding metadata values
mytestrule{
msiString2KeyValPair("FILETYPE_STATUS2=FTPASS",*kvp);
msiAssociateKeyValuePairsToObj(*kvp,*path,"-d");
}
INPUT *Att=$FILETYPE,*Val=$text,*path=/renci/home/rods/listMS.ir
OUTPUT ruleExecOut
Note that there cannot be any spaces around the "=" sign within the msiString2KeyValPair micro-service. Spaces are interpreted as part of the attribute name and attribute value.
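The effect of stray spaces can be seen with a minimal Python analogue of the key=value split. This is not the micro-service's implementation; `string_to_key_val` is a hypothetical helper that splits on the first "=" and keeps whitespace exactly as given.

```python
def string_to_key_val(text):
    """Split 'key=value' on the first '=' without trimming whitespace."""
    key, sep, value = text.partition("=")
    if not sep:
        raise ValueError("expected key=value, got: " + text)
    return key, value


print(string_to_key_val("FILETYPE_STATUS2=FTPASS"))
print(string_to_key_val("FILETYPE_STATUS2 = FTPASS"))
```

In the second call the attribute name ends with a space and the value begins with one, so a later query for the exact name would not find it.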
32. 32 Adding Metadata
mytestrule{
msiString2KeyValPair("*attrname=*attrvalue",*kvp);
assign(*A,*path/*obj);
writeLine(stdout,*A);
msiAssociateKeyValuePairsToObj(*kvp,*path/*obj,"-d");
}
INPUT *path=/renci/home/rods,*obj=$listMS.ir,*attrname="FILETYPE", *attrvalue="25"
OUTPUT ruleExecOut
33. 33 Reading user-defined metadata
acGetDataObjAVU{
msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName'", *Query);
msiExecStrCondQuery(*Query, *GenQOut);
forEachExec(*GenQOut){
msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
msiGetValByKey(*GenQOut, DATA_NAME, *name);
writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
}
}
INPUT *CollName="$/renci/home/rods"
OUTPUT ruleExecOut
This lists all of the user-defined metadata values for all of the files in the named collection
34. 34 Example of multiple conditions
acGetDataObjAVU{
msiMakeQuery("META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE, COLL_NAME, DATA_NAME", "COLL_NAME = '*CollName' and META_DATA_ATTR_NAME = '*AttrName'", *Query);
msiExecStrCondQuery(*Query, *GenQOut);
forEachExec(*GenQOut){
msiGetValByKey(*GenQOut, META_DATA_ATTR_VALUE, *AttrValue);
msiGetValByKey(*GenQOut, META_DATA_ATTR_NAME, *AttrName);
msiGetValByKey(*GenQOut, DATA_NAME, *name);
writeLine(stdout,"*name has attribute *AttrName and value *AttrValue");
}
}
INPUT *CollName="$/renci/home/rods", *AttrName="FILETYPE"
OUTPUT ruleExecOut
This only lists files that have the specified attribute name
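The difference between the single-condition and two-condition queries can be illustrated with an in-memory stand-in for the catalog. The rows below are invented sample data, and `select` is a hypothetical helper that treats every condition as an equality test joined by "and".

```python
# Invented sample rows; column names follow the query in the rule.
rows = [
    {"COLL_NAME": "/renci/home/rods", "DATA_NAME": "listMS.ir",
     "META_DATA_ATTR_NAME": "FILETYPE", "META_DATA_ATTR_VALUE": "25"},
    {"COLL_NAME": "/renci/home/rods", "DATA_NAME": "listMS.ir",
     "META_DATA_ATTR_NAME": "OWNER_NOTE", "META_DATA_ATTR_VALUE": "draft"},
    {"COLL_NAME": "/renci/home/other", "DATA_NAME": "x.txt",
     "META_DATA_ATTR_NAME": "FILETYPE", "META_DATA_ATTR_VALUE": "7"},
]


def select(table, **conditions):
    """Return rows for which every column == value condition holds."""
    return [row for row in table
            if all(row[col] == val for col, val in conditions.items())]


by_collection = select(rows, COLL_NAME="/renci/home/rods")
by_both = select(rows, COLL_NAME="/renci/home/rods", META_DATA_ATTR_NAME="FILETYPE")
for row in by_both:
    print(row["DATA_NAME"], row["META_DATA_ATTR_NAME"], row["META_DATA_ATTR_VALUE"])
```

Adding the second condition narrows the result from every attribute in the collection to only the rows whose attribute name matches.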
35. 35 Simple rule to list files testlist.ir
mytestRule||acGetIcatResults(*Action,*Condition,*B)##
forEachExec(*B,msiGetValByKey(*B,DATA_NAME,*D)##
msiGetValByKey(*B,COLL_NAME,*E)##
writeLine(stdout,*E/*D),nop)|nop##nop
*C=/renci/home/rods%*Action=list%*Condition=COLL_NAME = '*C'
ruleExecOut
Try
irule -F testlist.ir prompt
irule -F testlist.ir 'yourpathname'
irule -F testlist.ir *C='yourpathname'
36. 36 Converting String to AVU triplet
testrule||
msiDataObjChksum(*objPath,null,*ChksumStr)##
msiGetSystemTime(*Date,human)##
msiString2KeyValPair(Checksum.*Date=*ChksumStr,*KVPair)##
msiAssociateKeyValuePairsToObj(*KVPair,*objPath,-d)|nop
*objPath=/tempZone/home/antoine/tmp.txt
ruleExecOut
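The rule above builds an attribute named Checksum.<date> whose value is the file's checksum. A rough Python analogue is sketched below. It makes two assumptions that are not stated on the slide: that the checksum is an MD5 hex digest, and that the "human" time format looks like YYYY-MM-DD.hh:mm:ss. The helper name is hypothetical.

```python
import hashlib
from datetime import datetime


def checksum_avu(data, when):
    """Return (attribute_name, attribute_value) for a checksum AVU."""
    digest = hashlib.md5(data).hexdigest()       # assumed checksum algorithm
    stamp = when.strftime("%Y-%m-%d.%H:%M:%S")   # assumed 'human' time format
    return "Checksum." + stamp, digest


name, value = checksum_avu(b"", datetime(2009, 10, 29, 12, 0, 0))
print(name + "=" + value)
```

Embedding the timestamp in the attribute name means each verification run adds a new attribute rather than overwriting the previous checksum.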
37. 37 Installation of iRODS Chien-Yi Hou
38. 38 iRODS Wiki http://irods.diceresearch.org
Descriptions of the technology
Publications / presentations
Download
Performance tests
Tinderbox system (tracks upgrades)
irods-chat page
39. 39 iRODS installation Download the appropriate installation manual from the iRODS Wiki http://irods.diceresearch.org
Installation procedure will take
Up to 30 minutes for server/catalog/clients
Up to 10 minutes for server/clients
About 3 minutes for clients
We will do a client install
40. 40 Windows Installation From the URL https://www.irods.org/index.php/windows
go to the section labeled Windows i-Commands and click on the file
10-29-09: Windows i-commands 2.2
This will download the file
win_icmds_2_2.zip
Uncompress the file
41. 41 Detailed Windows Install Extract the exe files. This will be a long list of separate executable commands, one for each type of operation that you may need to perform. The list will include:
iadmin - used by the data grid administrator to set up resources and accounts
icd - change to a different directory in the data grid
ils - list files in a data grid directory
To use these icommands, you will need to set up an environment variable file which has default settings for the data grid that the class will use.
Note the directory name where you have put the executables
42. 42 Detailed Windows Install On the URL https://www.irods.org/index.php/windows
there are instructions in the section labeled
Setting up the iRODS User Environment file in Windows (for i-commands only)
To create the .irodsEnv file:
* Launch a "Command Prompt" by navigating to the menu "Start" -> "Accessories" -> "Command Prompt".
* Change directory to the user home directory.
> cd %HOMEDRIVE%%HOMEPATH%
* Type the following Windows command to create a folder, ".irods", and move into this directory.
> md .irods
> cd .irods
> Notepad .irodsEnv
This will launch a Notepad and create a text file named ".irodsEnv".
43. 43 Detailed Windows Install Enter the following information into Notepad and click save.
irodsHost 'iren.renci.org'
irodsPort 1247
irodsDefResource 'renci-vault1'
irodsHome '/RENCI/home/usertutor1'
irodsCwd '/RENCI/home/usertutor1'
irodsUserName 'usertutor1'
irodsZone 'renci'
These are the Environment variables for a user account on the data grid ‘RENCI’
You will need to replace the three occurrences of ‘usertutor1’ with your iRODS account name on lines 4, 5, 6
44. 44 Detailed Windows Install To run i-commands in any directory in a Windows machine, the path to where i-commands reside should be set in the Windows PATH environment variable.
To do this, launch the System dialogue via:
* Start -> settings -> control panel.
* Click the "System" icon.
* In the "Advanced" tab, click the "Environment variables" button.
Add the path name for the i-commands directory to the "PATH" either in user category or the system category. The path name can be found from the window that shows the icommand executables. Add a semi-colon and this path name to the end of the PATH text.
Then close the window and start a new command prompt window. You will be able to execute the icommands from any directory on your system.
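One way to confirm that the PATH change took effect is to look up an i-command from a script. The Python sketch below uses only the standard library. `find_icommand` and `add_to_path` are hypothetical helpers, and the directory `C:\icmds` is a made-up example location.

```python
import os
import shutil


def find_icommand(name, search_path=None):
    """Return the full path of an executable found on the search path, or None."""
    return shutil.which(name, path=search_path)


def add_to_path(path_value, directory, sep=os.pathsep):
    """Append a directory to a PATH-style string unless it is already listed."""
    entries = [entry for entry in path_value.split(sep) if entry]
    if directory not in entries:
        entries.append(directory)
    return sep.join(entries)


new_path = add_to_path("C:\\Windows;C:\\bin", "C:\\icmds", sep=";")
print(new_path)
print(find_icommand("ils"))  # None until the i-commands directory is on PATH
```

`add_to_path` only computes the new string; the actual PATH still has to be changed through the System dialogue as described above.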
45. 45 Detailed Windows Install To connect to the data grid, type
iinit
To change your password, type
ipasswd
You will be prompted for your current password
You will then be asked for the new password
46. 46 iRODS - Unix/Linux/Mac Installation https://www.irods.org/download.html
Fill out form for:
BSD license
Registration / agreement
Tar file
Installation script (Linux, Solaris, Mac OSX)
Automated download of PostgreSQL, ODBC
Installation of PostgreSQL, ODBC, iRODS
Initiation of iRODS collection
47. 47 iRODS Installation- Unix Unpack the release tar file
gzip -d irods.tgz
tar xf irods.tar
cd into the top directory and execute
./irodssetup
It will prompt for a few parameters
48. 48 irodssetup Set up iRODS
------------------------------------------------------------------------
iRODS is a flexible data archive management system that supports many different site configurations. This script will ask you a few questions, then automatically build and configure iRODS.
There are four main components to iRODS:
1. An iRODS server that manages stored data.
2. An iCAT catalog that manages metadata about the data.
3. A database used by the catalog.
4. A set of 'i-commands' for command-line access to your data.
You can build some, or all of these, in a few standard configurations. For new users, we recommend that you build everything.
49. 49 iRODS Client Installation iRODS configuration setup
----------------------------------------------------------------
This script prompts you for key iRODS configuration options.
Default values (if any) are shown in square brackets [ ] at each
prompt. Press return to use the default, or enter a new value.
For flexibility, iRODS has a lot of configuration options. Often
the standard settings are sufficient, but if you need more control
enter yes and additional questions will be asked.
Include additional prompts for advanced settings [no]?
50. 50 iRODS Client Installation iRODS configuration (advanced)
------------------------------
iRODS consists of clients (e.g. i-commands) with at least one iRODS
server. One server must include the iRODS metadata catalog (iCAT).
For the initial installation, you would normally build the server with
the iCAT (an iCAT-Enabled Server, IES), along with the i-commands.
After that, you might want to build another Server to support another
storage resource on another computer (where you are running this now).
You would then build the iRODS server non-ICAT, and configure it with
the IES host name (the servers connect to the IES for ICAT operations).
If you already have iRODS installed (an IES), you may skip building
the iRODS server and iCAT, and just build the command-line tools.
Build an iRODS server [yes]? no
51. 51 iRODS Client Installation iRODS can make use of the Grid Security Infrastructure (GSI)
authentication system in addition to the iRODS secure
password system (challenge/response, no plain-text).
In most cases, the iRODS password system is sufficient but
if you are using GSI for other applications, you might want
to include GSI in iRODS. Both the clients and servers need
to be built with GSI and then users can select it by setting
irodsAuthScheme=GSI in their .irodsEnv files (or still use
the iRODS password system if they want).
Include GSI [no]? no
52. 52 iRODS Client Installation Confirmation
------------
Please confirm your choices.
--------------------------------------------------------
GSI not selected
Build iRODS command-line tools
--------------------------------------------------------
Save configuration (irods.config) [yes]?
Saved.
Start iRODS build [yes]?
53. 53 iRODS Client Installation Build and configure
-------------------
Preparing...
Configuring iRODS...
Step 1 of 4: Enabling modules...
properties
Step 2 of 4: Verifying configuration...
No database configured.
Step 3 of 4: Checking host system...
Host OS is Mac OS X.
Perl: /usr/bin/perl
C compiler: /usr/bin/gcc (gcc)
Flags: none
Loader: /usr/bin/gcc
Flags: none
Archiver: /usr/bin/ar
Ranlib: /usr/bin/ranlib
64-bit addressing not supported and automatically disabled.
54. 54 iRODS Client Installation Step 4 of 4: Updating configuration files...
Updating config.mk...
Created /iRODS/config/config.mk
Updating platform.mk...
Created /iRODS/config/platform.mk
Updating irods.config...
Updating irodsctl...
Compiling iRODS...
Step 1 of 2: Compiling library and i-commands...
Step 2 of 2: Compiling tests...
Done!
55. 55 iRODS Client Installation -----
To use the iRODS command-line tools, update your PATH:
For csh users:
set path=(/iRODS/clients/icommands/bin $path)
For sh or bash users:
PATH=/iRODS/clients/icommands/bin:$PATH
Please see the iRODS documentation for additional notes on how
to manage the servers and adjust the configuration.
Change the path name to your installation path
56. 56 Environment Variables In home directory
cd ~/.irods
vi .irodsEnv
Default values to describe settings for interacting with your data grid
57. 57 Environment File # iRODS personal configuration file.
#
# This file was automatically created during iRODS installation.
# Created Fri Jan 18 10:01:48 2008
#
# iRODS server host name:
irodsHost 'iren.renci.org'
# iRODS server port number:
irodsPort 1247
# Home directory in iRODS:
irodsHome '/RENCI/home/usertutor1'
# Current directory in iRODS:
irodsCwd '/RENCI/home/usertutor1'
# Account name:
irodsUserName 'usertutor1'
# Zone:
irodsZone 'renci'
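The environment file is a simple list of name/value lines with `#` comments. A minimal Python reader is sketched below to make the structure explicit. It is not part of iRODS, `parse_irods_env` is a hypothetical helper, and it assumes values are optionally wrapped in single or double quotes.

```python
def parse_irods_env(text):
    """Parse .irodsEnv-style text into a dict of setting name -> value string."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # name, then the rest of the line
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip().strip("'\"")
    return settings


SAMPLE_ENV = """# iRODS server host name:
irodsHost 'iren.renci.org'
# iRODS server port number:
irodsPort 1247
# Account name:
irodsUserName 'usertutor1'
"""

env = parse_irods_env(SAMPLE_ENV)
print(env)
```

All values come back as strings, so a caller that needs the port as a number would convert it with `int(env["irodsPort"])`.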
58. 58 User Configuration To use the iRODS 'i-commands', update your PATH:
For csh users:
set path=(/storage-site/iRODS/clients/icommands/bin $path)
For sh or bash users:
PATH=/storage-site/iRODS/clients/icommands/bin:$PATH
59. 59 irodsctl - script to control iRODS Usage is:
./irods/irodsctl [options] [commands]
Help options:
--help Show this help information
Verbosity options:
--quiet Suppress all messages
--verbose Output all messages (default)
iRODS server Commands:
istart Start the iRODS servers
istop Stop the iRODS servers
irestart Restart the iRODS servers
60. 60 irodsctl options Database commands:
dbstart Start the database servers
dbstop Stop the database servers
dbrestart Restart the database servers
dbdrop Delete the iRODS tables in the database
dboptimize Optimize the iRODS tables in the database
dbvacuum Same as 'optimize'
General Commands:
start Start the iRODS and database servers
stop Stop the iRODS and database servers
restart Restart the iRODS and database servers
status Show the status of iRODS and database servers
test Test the iRODS installation
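For scripted administration, the irodsctl commands listed above can be wrapped in a small helper. The Python sketch below is hypothetical: it only uses the script path, the `--quiet` option, and the command names shown on these slides, and it validates the command before anything is executed.

```python
import subprocess

# Command names taken from the irodsctl usage text above.
IRODSCTL_COMMANDS = {
    "istart", "istop", "irestart",
    "dbstart", "dbstop", "dbrestart", "dbdrop", "dboptimize", "dbvacuum",
    "start", "stop", "restart", "status", "test",
}


def build_irodsctl_command(command, script="./irods/irodsctl", quiet=False):
    """Return the argument list for one irodsctl invocation."""
    if command not in IRODSCTL_COMMANDS:
        raise ValueError("unknown irodsctl command: " + command)
    args = [script]
    if quiet:
        args.append("--quiet")
    args.append(command)
    return args


def run_irodsctl(command, **options):
    """Run irodsctl and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_irodsctl_command(command, **options), check=True)


print(build_irodsctl_command("status"))
print(build_irodsctl_command("restart", quiet=True))
```

Keeping the command builder separate from `run_irodsctl` lets the argument list be checked without a running iRODS installation.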