PanDA HPC integration. Current status. Danila Oleynik, BigPanda F2F meeting, 13 August 2013
Outline
• HPC access, architecture, specifics
• Current PanDA implementation
• PanDA architecture for Kraken, Titan
• Initial testing
• Next step: Pilot - SAGA integration
HPC specifics
• Kraken, Cray XT5 (access since the beginning of August)
  • 9,408 nodes
  • Node: 12 cores, 16 GB RAM
• Titan, Cray XK7 (access request in process)
  • 18,688 nodes
  • Node: 16 cores, 32 + 6 GB RAM (2 GB per core)
• Parallel file system shared between nodes
• Access only to interactive nodes (worker nodes have extremely limited connectivity)
• One-time password authentication
• Internal job management tool: PBS/TORQUE
• A job occupies at least one node (12-16 cores)
• The scheduler limits the number of jobs per user
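The whole-node allocation rule has a simple arithmetic consequence, sketched below in Python. The node and core counts are the ones quoted on the slide; the helper names `nodes_needed` and `idle_cores` are illustrative only and are not part of PanDA or of either machine's tooling.

```python
import math

# Figures quoted on the slide.
CORES_PER_NODE = {"Kraken": 12, "Titan": 16}
NODE_COUNT = {"Kraken": 9408, "Titan": 18688}


def nodes_needed(cores_requested, cores_per_node):
    """Smallest whole number of nodes that covers the requested cores."""
    if cores_requested < 1:
        raise ValueError("a job needs at least one core")
    return math.ceil(cores_requested / cores_per_node)


def idle_cores(cores_requested, cores_per_node):
    """Cores allocated to the job but not requested by it."""
    allocated = nodes_needed(cores_requested, cores_per_node) * cores_per_node
    return allocated - cores_requested


if __name__ == "__main__":
    for machine, per_node in CORES_PER_NODE.items():
        total = NODE_COUNT[machine] * per_node
        print(machine, "total cores:", total,
              "| idle cores for a single-core job:", idle_cores(1, per_node))
```

Under these numbers a single-core payload leaves 11 cores idle on a Kraken node and 15 on a Titan node, which is why the payload run by a pilot should be able to use a full node.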
Current PanDA implementation
• One pilot per worker node (WN)
• Pilot executes on the same node as the job
• Software distribution through CVMFS
PanDA architecture for Kraken, Titan
• Pilots execute on an HPC interactive node
• Each pilot interacts with the local job scheduler to manage its job
• Number of executing pilots = number of available slots in the local scheduler
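The "pilots = available slots" rule can be sketched as simple bookkeeping. This is only an illustration: `slot_limit` stands for the per-user job limit of the local scheduler (its real value is site-specific and not given here), and the function names and pilot labels are hypothetical.

```python
def free_slots(slot_limit, occupied_slots):
    """Scheduler slots still available to this user (never negative)."""
    return max(0, slot_limit - occupied_slots)


def plan_pilots(slot_limit, occupied_slots):
    """Return one hypothetical pilot label per free scheduler slot.

    Each pilot started on the interactive node is expected to hold
    exactly one job in the local scheduler, so the number of pilots
    to start equals the number of free slots.
    """
    return ["pilot-%d" % i for i in range(free_slots(slot_limit, occupied_slots))]


if __name__ == "__main__":
    # Example values only: a limit of 5 jobs with 2 already in the queue.
    print(plan_pilots(5, 2))
```

Starting more pilots than there are free slots would only leave them waiting for the scheduler to accept their submissions, so the sketch caps the count at the limit.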
Initial testing
• Initial tests were done to show that PanDA components can run in an HPC environment on interactive nodes
• Sergey successfully started APF and pilots on Titan; outbound HTTPS connections were confirmed, so pilots can communicate with PanDA
• I successfully tested the SAGA API on Kraken. In general, the SAGA API allows managing jobs in the local HPC job scheduler
• Because the interactive and worker nodes use a shared file system, we did not need any special internal data-management process
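As a rough illustration of what managing a job through SAGA looks like, the sketch below fills a job description and drives a job through submit and wait. It assumes the saga-python job API (a job service with `create_job`, and jobs with `run`, `wait`, `id`, `state`, `exit_code`); the helper names, file names, and the `pbs://localhost` URL in the usage comment are assumptions, not the configuration used in the Kraken test. The service and description objects are passed in, so the helpers themselves import nothing.

```python
def fill_description(jd, executable, arguments, cores, walltime_minutes, workdir):
    """Set SAGA-style job description attributes on the object `jd`."""
    jd.executable = executable
    jd.arguments = list(arguments)
    jd.total_cpu_count = cores            # whole nodes: a multiple of 12 or 16
    jd.wall_time_limit = walltime_minutes
    jd.working_directory = workdir        # on the shared file system
    jd.output = "payload.stdout"          # hypothetical file names
    jd.error = "payload.stderr"
    return jd


def submit_and_wait(service, jd):
    """Submit the described job to the scheduler and block until it ends."""
    job = service.create_job(jd)
    job.run()    # hands the job to the local scheduler (PBS/TORQUE)
    job.wait()   # returns once the job reaches a final state
    return job.id, job.state, job.exit_code


# Assumed usage with saga-python on an interactive node:
#   import saga
#   service = saga.job.Service("pbs://localhost")
#   jd = fill_description(saga.job.Description(), "/bin/date", [], 12, 10, "/tmp")
#   print(submit_and_wait(service, jd))
```

Because stdout and stderr land in the working directory on the shared file system, the process on the interactive node can read them directly once the job finishes.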
Next step: Pilot - SAGA integration
• This is a fairly big step, which can be split technically into:
  • Integrating the SAGA source with the pilot code
  • Reviewing and reverse engineering the runJob class
  • Implementing a runJobHPC class based on the SAGA API
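One possible shape for such a class is sketched below. Everything in it is hypothetical: the real runJob interface still has to be reviewed, so the constructor arguments, the `submit`/`state` methods expected of the scheduler wrapper, and the state mapping are assumptions made for illustration. The scheduler state strings follow the SAGA job model as assumed here.

```python
import time

# Assumed mapping from scheduler job states to the status a pilot reports.
STATE_MAP = {
    "New": "starting",
    "Pending": "starting",
    "Running": "running",
    "Done": "finished",
    "Failed": "failed",
    "Canceled": "failed",
}
FINAL_STATUSES = {"finished", "failed"}


class RunJobHPC:
    """Hypothetical sketch: run a payload through the local HPC scheduler."""

    def __init__(self, scheduler, report, poll_seconds=30, sleep=time.sleep):
        self.scheduler = scheduler    # wrapper around the SAGA job service
        self.report = report          # callback: report(job_id, status)
        self.poll_seconds = poll_seconds
        self.sleep = sleep            # injectable so tests need not wait

    def execute(self, payload):
        """Submit the payload, poll its state, and report each status change."""
        job_id = self.scheduler.submit(payload)
        last_status = None
        while True:
            # Unknown scheduler states are treated as failures in this sketch.
            status = STATE_MAP.get(self.scheduler.state(job_id), "failed")
            if status != last_status:
                self.report(job_id, status)
                last_status = status
            if status in FINAL_STATUSES:
                return status
            self.sleep(self.poll_seconds)
```

Keeping the scheduler behind a small wrapper object means the polling and reporting logic can be tested without access to Kraken or Titan, and the SAGA-specific calls stay in one place.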