Learn how to maintain individual traceability in shared project accounts using grid tools like NetLogger, GSISSH, and GridFTP. This solution enables collaborative computing by allowing multiple users to manage jobs and access data while upholding individual accountability. Find out how NERSC meets DOE/NIST requirements for individual traceability and access across various platforms.
Maintaining Individual Traceability in Shared Project Accounts with CEDPS/VDT Tools • Shreyas Cholia <scholia@lbl.gov> • NERSC Division, Lawrence Berkeley Lab • Open Science Grid All-Hands Meeting, Caltech-LIGO, Livingston, LA • March 2009
Overview • Motivation and Requirements • Solution Overview • Grid Infrastructure Description • Process Accounting Information • Log collection and parsing • Building the NetLogger database and reconciling information • Questions
Project Accounts for Collaborative Computing • Project (Group) Accounts enable shared access to compute and data resources for collaboration. • Jobs and data owned by common UNIX project user • Files may persist after individual has left project • Jobs may need to be managed by different users • Allow multiple users to share files and manage jobs, … without relying on group UNIX permissions, … while maintaining individual accountability • Built around standard OSG/VDT grid tools • NetLogger • GSISSH • GridFTP/GRAM • MyProxy
Requirements for Project Account Access • Must maintain individual traceability for all actions performed within project environment • DOE / NIST requirements for individual accountability at NERSC • Should allow both shell and grid based access to project accounts • Users should be able to access multiple projects as well as individual accounts • Must include access to data and jobs • Solution should work across all major NERSC platforms • Should support both OSG and non-OSG communities
Overview of Solution • Use grid certificates to track “real” user performing a given operation. Subject DN in certificate provides the user information. • Limit project account access to • Grid Interfaces • GSISSH • GridFTP • WS-GRAM • Custom login interfaces that record Parent PID • Custom HSI client for HPSS • Custom SSH for login nodes • Collect and parse log files; Reconcile all the information with original user DN using NetLogger.
NERSC CA • All NERSC users are assigned a short-lived certificate through the NERSC CA • Create a short-lived certificate:
# myproxy-logon -s slcs.nersc.gov
# grid-proxy-info -subject
/DC=gov/DC=nersc/OU=People/CN=Shreyas Cholia 1234
grid-mapfile • NERSC uses grid-mapfile for GSI access • Could be easily extended to use GUMS (phase 2?) • Specify target project in command line • gsissh -l projuser • globus-url-copy gsiftp://projuser@davinci.nersc.gov/testfile file:///localfile • WS-GRAM <localUserID> tag in job spec file • Sample gridmap entry with project accounts:
"/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia" shreyas,projuser,osg
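To make the mapping concrete, here is a minimal Python sketch of how a gridmap entry like the sample above could be checked against a requested target account. The function names `parse_gridmap_line` and `may_map_to` are hypothetical, and the parser only handles the simple quoted-DN format shown on this slide; it is an illustration, not the Globus implementation.

```python
import shlex


def parse_gridmap_line(line):
    """Split one gridmap entry into (subject DN, list of local accounts).

    Assumes the simple format shown on the slide:
    a quoted DN followed by a comma-separated account list.
    """
    # Slide text often carries typographic quotes; normalize them first.
    line = line.replace("\u201c", '"').replace("\u201d", '"').strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    parts = shlex.split(line)
    if len(parts) != 2:
        raise ValueError(f"unexpected gridmap entry: {line!r}")
    dn, accounts = parts
    return dn, [name for name in accounts.split(",") if name]


def may_map_to(gridmap_lines, dn, target_account):
    """Return True if the DN is allowed to map to the requested account."""
    for line in gridmap_lines:
        entry = parse_gridmap_line(line)
        if entry is not None and entry[0] == dn:
            return target_account in entry[1]
    return False
```

With the sample entry, the DN may map to `shreyas`, `projuser`, or `osg`, which is what lets `gsissh -l projuser` select the project account explicitly.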
GSISSH • Only supported for local access: • gsissh -l projuser localhost • GSISSH acts like a “sudo” mechanism • User must first log in to NERSC using ssh • This forces user to go through custom sshd with keystroke logging • Prevents automatic credential forwarding (disabled for NERSC gsissh clients) so that user credentials are not stored in shared accounts
WS-GRAM and GridFTP • User mapped to project account using grid-mapfile • WS-GRAM >= GT 4.0.8 logs user DN information associated with job ID • GridFTP includes DN in session information • Logs can be generated in NetLogger format directly by WS-GRAM, GridFTP • Pre-WS GT2 GRAM not supported • GT2 GRAM does not support multiple target users using gridmap-auth • VOMS/GUMS could address this for OSG use by keying off FQAN • At NERSC, project user groups are not necessarily in OSG and may not have access to VOMS/GUMS
Process ID Logging • Linux - Comprehensive System Accounting (CSA) for Parent PID tracking - needs kernel mods • Log the process tree on the node • http://oss.sgi.com/projects/csa/ • BSDV3 Accounting • AIX auditing • Provides similar process information.
HPSS Access • Project account access is only allowed through one of the following: • GridFTP • GSI authentication • Allows access from outside NERSC • Logs user DN for project account access • Custom HSI client with PID logging • NERSC auth (access restricted to within NERSC - login is automatic from NERSC hosts) • Logs Client PID on server side, which can be traced back as follows: Client PID -> PPID -> DN
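The Client PID -> PPID -> DN trace can be sketched as a walk up the process tree until a process with a recorded DN is reached. This is a minimal Python sketch under stated assumptions: `trace_dn`, `ppid_of`, and `dn_of_login_pid` are hypothetical names, and the two lookup tables stand in for the process accounting and GSISSH log data. A real deployment would also have to key on host and time, since PIDs are reused.

```python
def trace_dn(pid, ppid_of, dn_of_login_pid, max_depth=64):
    """Follow PID -> PPID links until a PID with a recorded DN is found.

    ppid_of:          dict mapping a PID to its parent PID (process accounting).
    dn_of_login_pid:  dict mapping a login-session PID to the user DN (gsissh log).
    Returns the DN, or None if the chain ends without reaching a known login.
    """
    seen = set()
    current = pid
    # Stop on a missing parent, a cycle, or an implausibly deep tree.
    while current is not None and current not in seen and len(seen) < max_depth:
        if current in dn_of_login_pid:
            return dn_of_login_pid[current]
        seen.add(current)
        current = ppid_of.get(current)
    return None
```

A `None` result corresponds to a record without a real user, which is the kind of entry that gets flagged for review.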
Special Cases and Caveats • SSH to worker nodes • Users can build custom clients to bypass some of this, but these log entries can be flagged by NetLogger • Credential delegation MUST be disabled • Record keeping lifetime? Parent PID not logged until process dies. Currently flag after 24 hours for review.
NetLogger Format • All log lines parsed into key=value pairs • Required Fields for every line: • ts=[timestamp in ISO8601 or secs since epoch] • event=[event identifier in java class notation] • Additionally tag lines with host and client information
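As an illustration of the format, the following Python sketch splits a line into key=value pairs and checks for the two required fields listed above. The function name `parse_nl_line` is hypothetical and this is not the NetLogger library's own parser; it assumes that values containing spaces are double-quoted.

```python
import shlex

REQUIRED_FIELDS = ("ts", "event")


def parse_nl_line(line):
    """Parse one key=value log line into a dict and verify required fields."""
    fields = {}
    # shlex keeps a quoted value such as msg="two words" together as one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value pair: {token!r}")
        fields[key] = value
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return fields
```

Splitting on the first `=` only means a timestamp or value that itself contains `=` is preserved intact.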
Sample log parsing
PROCESS ACCOUNTING LOG:
grep fogal1 11:05:05 11:05:05 0.11 0.00 0 7643 7641 Mon Jul 14 2008 Mon Jul 14 2008
nl_parser:
ts=2008-07-14T11:05:05-07:00 event=csa.process level=Info month_start=Jul process.ppid=7641 year_start=2008 tod_start=11:05:05 monthday_start=14 tod_stop=11:05:05 process.pid=7643 cmd=grep pid=7643 year_end=2008 cputime=0.000000 local_user=fogal1 ignore=0 dow_end=Mon dow_start=Mon monthday_end=14 month_end=Jul dur=0 ppid=7641 walltime=0.110000
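A rough Python sketch of the conversion from the raw accounting record to a NetLogger-style line follows. The column order is an assumption inferred from this one sample (command, user, start and stop time of day, wall time, CPU time, one unidentified column, PID, PPID, then start and end dates), the UTC offset is supplied by the caller because the raw record does not carry one, and `csa_record_to_nl` is a hypothetical name rather than the actual nl_parser module. Only a subset of the fields in the sample output is emitted.

```python
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def csa_record_to_nl(record, utc_offset="-07:00"):
    """Convert one whitespace-separated accounting record to key=value form."""
    cols = record.split()
    if len(cols) != 17:
        # Commands containing spaces would break this simple split.
        raise ValueError(f"expected 17 columns, got {len(cols)}")
    cmd, user, tod_start, _tod_stop, wall, cpu, _unknown, pid, ppid = cols[:9]
    _dow, month, day, year = cols[9:13]  # start date; end date is cols[13:17]
    ts = f"{int(year):04d}-{MONTHS[month]:02d}-{int(day):02d}T{tod_start}{utc_offset}"
    pairs = [
        ("ts", ts),
        ("event", "csa.process"),
        ("cmd", cmd),
        ("local_user", user),
        ("process.pid", pid),
        ("process.ppid", ppid),
        ("walltime", f"{float(wall):.6f}"),
        ("cputime", f"{float(cpu):.6f}"),
    ]
    return " ".join(f"{key}={value}" for key, value in pairs)
```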
Logs Collected • Syslogs • ssh, gsissh information • GridFTP logs • gridftp.log • gridftp-auth.log • WS-GRAM logs • accounting.log • container-real.log • Process Accounting/Auditing logs • PBS/SGE/LoadLeveler job accounting logs • HPSS HSI Logs • grid-map files
Pre-parsing Logs • Pre-parsing done on local system • Drop unrelated log lines • Store log lines with dependencies in local temporary database, e.g. (job acct -> process acct -> gsissh log) • Fill out missing fields where necessary • Create temporary database to hold process information • Process ID tree is only filled out on process conclusion • e.g. cross-reference GSISSH DN and PID tables • May not be able to process all records since some records span multiple days • Unprocessed records held • Records > 1 day are flagged • Records without real user are flagged
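The two flagging rules at the end of this slide can be sketched as follows. This is a minimal Python sketch under one interpretation: a record is flagged when it started more than one day before the current pre-parsing pass, or when no real user (DN) could be resolved for it. The record layout (a dict with `start` and `real_user` keys) and the name `review_flags` are assumptions made for the example.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)


def review_flags(record, now, max_age=MAX_AGE):
    """Return the list of reasons a held record needs manual review."""
    flags = []
    # No DN could be reconciled with this record.
    if not record.get("real_user"):
        flags.append("no_real_user")
    # Record began more than max_age before this pre-parsing pass.
    if now - record["start"] > max_age:
        flags.append("older_than_max_age")
    return flags
```

An empty list means the record can be passed on to the central collector without review.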
NetLogger - Parsing and Loading the Database • Stage logs into central collector for parsing • Syslog-NG or rsync • Run NL parsers for all relevant log files • Feed parsed files into NL database • Issue queries against database for useful information, e.g. select * for real_user where job_id=XXXX • Contributing parsers back to NL source, so other projects can benefit
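The query on this slide is shorthand. A runnable approximation, using an in-memory SQLite database from Python, might look like the sketch below. The table name `events`, its columns, the event names, and the demo rows are all hypothetical; the actual NetLogger database schema is not described in these slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few illustrative parsed events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "ts TEXT, event TEXT, job_id TEXT, local_user TEXT, real_user TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        [
            ("2008-07-14T11:05:05-07:00", "demo.job.submit", "1234",
             "projuser", "/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia"),
            ("2008-07-14T11:06:00-07:00", "demo.job.end", "1234",
             "projuser", "/DC=org/DC=doegrids/OU=People/CN=Shreyas Cholia"),
            ("2008-07-14T11:07:00-07:00", "demo.job.submit", "5678",
             "projuser", None),  # real user not yet reconciled
        ],
    )
    return conn


def real_users_for_job(conn, job_id):
    """Return the distinct real-user DNs recorded for one job ID."""
    rows = conn.execute(
        "SELECT DISTINCT real_user FROM events "
        "WHERE job_id = ? AND real_user IS NOT NULL ORDER BY real_user",
        (job_id,),
    )
    return [row[0] for row in rows]
```

The point of the lookup is that every job run as the shared `projuser` account resolves back to the DN of the individual who submitted it.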
Current Status and Open Issues • Pilot Project • Being deployed on test clusters • Still requires some level of manual oversight to review flagged entries • Information may not be complete if there is a system crash • Record lifetime
Future Development • Tighter OSG integration (GUMS/VOMS) • Create per user accounting information and integrate with NERSC project accounting
Questions? Thanks to Tina DeClerck, Dan Gunter