Kevin Coffman kwc@citi.umich.edu Center for Information Technology Integration University of Michigan Challenges Running an NFSv4-backed OSG Cluster
Overview • Basic NFSv4 in production • Open Science Grid (OSG) Overview • OSG Installation • OSG Configuration • Submitting a job! • Authentication differences (AFS vs. NFSv4) • Authentication futures
Basic NFSv4 file service in production • Basic file storage • User name mappings • Home directories • Kernel builds, etc.
Open Science Grid Overview • Architecture • Head node & worker nodes • Core is NSF Middleware Initiative (including Globus, Condor, kx.509) • Authentication • X.509, kx.509, proxy certs • No cluster file-system required • “Home”, Base, Data, Apps, Temp, Worker node temp
OSG Installation • New Linux kernels, new NFSv4 code, new OSG releases, repeat! • Base installation is done solely on head node • Credentials needed • Root access assumed for local file system access • Mapping machine cred now necessary • Kerberos credentials for NFS file system access • Name-to-UID mapping issues • Found the need for tools/scripts for flushing mappings
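The need to flush name-to-UID mappings can be illustrated with a minimal sketch. It assumes a hypothetical `IdMapCache` class, a made-up account name and domain, and a plain dictionary standing in for the real account database; it is a toy model of a stale mapping cache, not the Linux idmapper.

```python
class IdMapCache:
    """Toy model of an NFSv4 name-to-UID cache (hypothetical, for illustration)."""

    def __init__(self, directory, nobody_uid=65534):
        self.directory = directory    # stand-in for passwd/LDAP: name -> uid
        self.nobody_uid = nobody_uid  # assumed fallback for unmappable names
        self.cache = {}

    def lookup(self, principal):
        """Map 'user@domain' to a UID and remember the answer."""
        if principal not in self.cache:
            user = principal.split("@", 1)[0]
            self.cache[principal] = self.directory.get(user, self.nobody_uid)
        return self.cache[principal]

    def flush(self):
        """Drop all cached answers so the next lookup consults the directory."""
        self.cache.clear()


directory = {}
idmap = IdMapCache(directory)
before = idmap.lookup("osguser@example.org")  # account does not exist yet
directory["osguser"] = 20010                  # account created during installation
stale = idmap.lookup("osguser@example.org")   # cache still returns the old answer
idmap.flush()
fresh = idmap.lookup("osguser@example.org")   # directory is consulted again
print(before, stale, fresh)
```

In this model the new account only becomes visible after the flush, which is the situation that repeated reinstallations tend to create.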
OSG Configuration • Daemons (e.g., MonALISA and Condor) on head node and worker nodes require authentication for file system access • Keytabs • More name-to-UID mapping required • Virtual Organization (VO) accounts • DN to UNIX account name via grid-mapfile • Name-to-UID mappings required for file system access
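The two mapping steps above (DN to UNIX account via the grid-mapfile, then account name to UID) can be sketched as follows. The sample DNs, account names, and UID table are invented for illustration, and the parser only handles the common form of a quoted DN followed by one or more comma-separated account names.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {distinguished name: [local account names]} from grid-mapfile text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = shlex.split(line)        # honours the quotes around the DN
        if len(fields) != 2:
            continue                      # ignore lines this sketch does not understand
        dn, accounts = fields
        mapping[dn] = accounts.split(",")
    return mapping


def dn_to_uid(dn, mapping, uid_table):
    """Resolve a DN to a UID via its first mapped account, or None if unmapped."""
    accounts = mapping.get(dn)
    if not accounts:
        return None
    return uid_table.get(accounts[0])


# Hypothetical grid-mapfile content and UID table.
SAMPLE = '''
# VO accounts
"/DC=org/DC=example/OU=People/CN=Jane Doe 12345" vo_alpha
"/DC=org/DC=example/OU=People/CN=John Roe 67890" vo_beta,vo_alpha
'''
UIDS = {"vo_alpha": 30001, "vo_beta": 30002}

gridmap = parse_grid_mapfile(SAMPLE)
print(dn_to_uid("/DC=org/DC=example/OU=People/CN=Jane Doe 12345", gridmap, UIDS))
```

A DN that is missing from the file, or an account that has no UID, yields `None`; in either case file system access would have no identity to run under.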
Submitting a job! • Job submission uses X.509 authentication • Need Kerberos authentication for file-system access • Gatekeeper forks a job manager process for each job • Job manager process runs as the original user and needs the user’s credentials • Verified that this works as expected with AUTH_SYS, which does not require Kerberos credentials
MGRID Architecture [Diagram. Components shown: User Workstation, Browser, kinit, kx509, libpkcs11, KDC, KCA, KCT, MGRID Portal, Apache (mod_ssl, mod_kx509, mod_kct, mod_jk, mod_php), Tomcat, CHEF, LDAP, Grid Resource, GateKeeper, Resource Mgr, Authorization. Links labeled SSL (client certificate required), Kerberos V5, GSI, and SASL; steps numbered 1 to 8.]
Grid job authentication issues • Jobs scheduled to run in the future • Long-running jobs (refreshing credentials) • Combination of both (future and long-running) • Distribution of user credentials to worker nodes for file system access
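The timing problem behind these issues comes down to simple arithmetic, sketched below. The sketch assumes that a credential is issued at submission time and that each renewal yields a fresh credential with the same lifetime; real Kerberos renewal limits are not modeled, and the function names are hypothetical.

```python
from datetime import timedelta


def credential_covers_job(cred_lifetime, start_delay, job_duration):
    """True if a credential issued at submission outlives the whole job."""
    return cred_lifetime >= start_delay + job_duration


def renewals_needed(cred_lifetime, start_delay, job_duration):
    """Number of renewals needed to keep a valid credential until the job ends."""
    total = start_delay + job_duration
    lifetimes = -(-total // cred_lifetime)   # ceiling division on timedeltas
    return max(0, lifetimes - 1)


lifetime = timedelta(hours=10)
delay = timedelta(hours=12)      # job scheduled to start in the future
duration = timedelta(hours=18)   # long-running job

print(credential_covers_job(lifetime, delay, duration))
print(renewals_needed(lifetime, delay, duration))
```

With these made-up numbers the credential expires before the job even starts, so something other than the submitting user's login session has to refresh it and deliver it to the worker node.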
Current Architecture [Diagram. Components shown: KDC (AS, TGS), client, server, user process, GSSD (two instances), SVC, gss context cache (two instances), NFS, NFSD, keytab, credentials on disk; user/kernel boundary; steps numbered 1 to 13.]
Authentication futures • SPKM3 • Allows us to stay in X.509 world • Anonymous (DH) • Certificate on server to prevent man-in-the-middle attacks • X.509 Certificates • LIPKEY • Built on top of SPKM3 • Allows TLS-like password authentication
Linux kernel keys support (a.k.a. keyring) • General credential storage in-kernel • thread-specific keyring • process-specific keyring • session-specific keyring (PAG-like via JOIN_SESSION_KEYRING) • Different key types: ‘user’, ‘rpcsec_gss context’ • Create, delete, link, search, revoke, etc. • Quotas and permissions • Referenced by serial # and description
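How a key search walks the three keyrings can be modeled in a few lines. This is a user-space toy, not the kernel API: the `Key` class, the `search` function, and the key description are hypothetical, and the model assumes the thread, process, session search order with revoked keys skipped.

```python
from dataclasses import dataclass, field
from itertools import count

_serials = count(1)  # every key gets a unique serial number


@dataclass
class Key:
    key_type: str
    description: str
    payload: str
    revoked: bool = False
    serial: int = field(default_factory=lambda: next(_serials))


def search(keyrings, key_type, description):
    """Return the first live key matching type and description, or None."""
    for ring in keyrings:  # expected order: thread, process, session
        for key in ring:
            if (key.key_type, key.description) == (key_type, description) and not key.revoked:
                return key
    return None


thread_ring, process_ring, session_ring = [], [], []
session_ring.append(Key("user", "demo:cred", "session secret"))
thread_ring.append(Key("user", "demo:cred", "thread secret"))
rings = [thread_ring, process_ring, session_ring]

first = search(rings, "user", "demo:cred")
print(first.serial, first.payload)    # the thread keyring is consulted first
first.revoked = True                  # revoking it exposes the session key
second = search(rings, "user", "demo:cred")
print(second.serial, second.payload)
```

The session keyring acts as the PAG-like fallback here: anything not overridden at thread or process level is found there.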
MIT Kerberos ccache using keyring as backing storage • Assumes a single “active” credentials cache • Can store more than one ccache in same session keyring • All user-level code

Session
  |
  +---> krb5_cc_active (key: contains 0x00004f12)
  |
  +---> /tmp/krb5cc_20010_XF45C2 (keyring: id is 0x000023cd)
  |       |
  |       +---> kwc@CITI.UMICH.EDU (principal info)
  |       +---> krbtgt/CITI.UMICH.EDU@CITI.UMICH.EDU
  |       +---> nfs/screamer.citi.umich.edu@CITI.UMICH.EDU
  |       +---> nfs/troy.citi.umich.edu@CITI.UMICH.EDU
  |       +---> pop/citi.umich.edu@CITI.UMICH.EDU
  |       +---> afs@CITI.UMICH.EDU
  |
  +---> /tmp/krb5cc_20010_umich (keyring: id is 0x00004f12)
          |
          +---> kwc@UMICH.EDU (principal info)
          +---> krbtgt/UMICH.EDU@UMICH.EDU
          +---> imap/tremors.itd.umich.edu@UMICH.EDU
Mount using keyring support • Mount program will use keytab to set up machine credentials in keyring • /sbin/request-key invoked and finds machine credentials • Context is negotiated and “rpcsec_gss context” key instantiated
User access using keyring support • Assumes they have credentials in keyring via kinit or PAM • No more looking around blindly for creds in filesystem • /sbin/request-key invoked and finds user’s session-specific credentials
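The lookup-then-upcall pattern described for mounts and user access can be sketched as a small model. Everything here is a stand-in: a dictionary plays the session keyring, `gss_handler` plays the program started by /sbin/request-key, and the key description `krb5_tgt` and the payload strings are invented for the example.

```python
class NoCredentials(LookupError):
    """Raised when the handler cannot find credentials to build a context."""


upcalls = []  # records each simulated upcall so the caching is visible


def gss_handler(session_keyring, server):
    """Simulated handler: build a context from credentials already in the keyring."""
    upcalls.append(server)
    cred = session_keyring.get(("user", "krb5_tgt"))
    if cred is None:
        return None
    return f"context[{cred} -> {server}]"


def request_key(session_keyring, key_type, description, handler):
    """Return an existing key, or ask the handler to instantiate it."""
    key = (key_type, description)
    if key not in session_keyring:
        payload = handler(session_keyring, description)
        if payload is None:
            raise NoCredentials(description)
        session_keyring[key] = payload   # the key is now instantiated
    return session_keyring[key]


# A session that already holds credentials, e.g. placed there by kinit or PAM.
session = {("user", "krb5_tgt"): "kwc@CITI.UMICH.EDU"}
ctx1 = request_key(session, "rpcsec_gss context", "nfs/troy.citi.umich.edu", gss_handler)
ctx2 = request_key(session, "rpcsec_gss context", "nfs/troy.citi.umich.edu", gss_handler)
print(ctx1, len(upcalls))
```

The second request is answered from the keyring without another upcall, and a session with no credentials fails immediately instead of triggering a search of the file system.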
Keyring issues • Upcalls from asynchronous events • Still need to tie “rpcsec_gss context” keys to Kerberos credentials
Future Architecture [Diagram. Components shown: KDC (AS, TGS), client, server, user process, request-key handler, GSSD, SVC, TGT, gss context cache (in keyring), gss context cache, NFS, NFSD, keytab; user/kernel boundary; steps numbered 1 to 11.]
Questions / Discussion http://www.citi.umich.edu/projects
References • Open Science Grid: http://www.opensciencegrid.org • MonALISA: http://monalisa.cacr.caltech.edu • Condor: http://www.cs.wisc.edu/condor • Keyring: Kernel Source: /Documentation/keys.txt
Krb5: Obtaining gss context • TGT: currently stored in file system • Per NFSD service ticket: currently stored in file system • GSSD locates user credentials by convention (/tmp/krb5cc_uid) • Synchronizing gss_context and credential problematic
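Locating credentials "by convention" amounts to scanning a directory for file names that start with `krb5cc_` plus the UID. The sketch below illustrates that idea only; it is not the GSSD source. The function name is hypothetical, preferring the most recently modified file is an assumption, and a real implementation would also need to check file ownership.

```python
import os
import tempfile


def find_ccache_candidates(uid, directory="/tmp"):
    """List files named krb5cc_<uid> or krb5cc_<uid>_*, newest first."""
    prefix = f"krb5cc_{uid}"
    candidates = []
    for name in os.listdir(directory):
        # Require an exact match or a '_' separator so UID 2001 does not match 20010.
        if name == prefix or name.startswith(prefix + "_"):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                candidates.append(path)
    return sorted(candidates, key=os.path.getmtime, reverse=True)


# Demonstration in a scratch directory so nothing under /tmp is touched.
with tempfile.TemporaryDirectory() as scratch:
    names = ["krb5cc_20010_umich", "krb5cc_20010_XF45C2", "krb5cc_2001", "krb5cc_200100"]
    for age, name in enumerate(names):
        path = os.path.join(scratch, name)
        open(path, "w").close()
        os.utime(path, (1000 + age, 1000 + age))  # later names get newer timestamps
    found = [os.path.basename(p) for p in find_ccache_candidates(20010, scratch)]

print(found)
```

Two caches match the same user here, and nothing in the file name says which one a given gss context was built from, which is one way to see why keeping context and credential in sync is awkward.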
Linux credential interface • New system calls for kernel credential placement • Available for upcoming PAG implementation • Passed via upcall to GSSD • Credential vs. gss context management no longer a problem
Linux Krb5 kernel credential • Pass TGT to kernel as credential • Stored in user process (PAG) • Passed to GSSD via gss_init_sec_context upcall • GSSD manages Krb5 NFSD service tickets • Multiple in kernel TGTs vs. cross realm authentication
Client: LIPKEY with SPKM3 • Initiator • Anonymous SPKM3 client • Credential: • LIPKEY username and password • sent to server encrypted in SPKM3 session key • Context • per <user, nfsd> LIPKEY(?) and SPKM3 gss context
Linux LIPKEY kernel credential • LIPKEY credential (username and password) is per server. • Not stored in kernel • Instead, store information to be passed to GSSD which will prompt user for LIPKEY password for each NFSD.
Client: SPKM with X.509 • Initiator • password for long term user X.509 private key • Credential • short term proxy X.509 credential and private key (grid-proxy-init) • Context • per <user, nfsd> SPKM gss context
Linux SPKM kernel credential • Pass proxy (short term) X.509 credential and private key to kernel as credential • Stored in user process (PAG) • Passed to GSSD via gss_init_sec_context upcall • GSSD manages CA hierarchy and credential checking