1. Purdue Campus Grid Preston Smith
psmith@purdue.edu
Condor Week 2006
April 24, 2006
2. Overview RCAC
Community Clusters
Grids at Purdue
Campus
Regional
NWICG
National
OSG
CMS Tier-2
NanoHUB
Teragrid
Future Work
3. Purdue’s RCAC Rosen Center for Advanced Computing
Division of Information Technology at Purdue (ITaP)
Wide variety of systems: shared memory and clusters
352 CPU IBM SP
Five 24-processor Sun F6800s, Two 56-processor Sun E10ks
Five Linux clusters
ITaP is the central computing and telecommunications organization on Purdue’s West Lafayette campus
4. Linux clusters in RCAC Recycled clusters
Systems retired from student labs
Nearly 1000 nodes of single-CPU PIII, P4, and 2-CPU Athlon MP and EM64T Xeons for general use by Purdue researchers
5. Community Clusters Federate resources at a low level
Separate researchers buy sets of nodes to federate into larger clusters
Enables larger clusters than a scientist could support on his own
Leverage central staff and infrastructure
No need to sacrifice a grad student to be a sysadmin!
6. Community Clusters
7. Community Clusters Primarily scheduled with PBS
Contributing researchers are assigned a queue that can run as many “slots” as they have contributed.
Condor co-schedules alongside PBS
When PBS is not running a job, a node is fair game for Condor!
But Condor work is subject to preemption if PBS assigns work to the node.
8. Condor on Community Clusters All in all, Condor joins together 4 clusters (~2500 CPU) within RCAC.
9. Grids at Purdue - Campus Instructional computing group manages a 1300-node Windows Condor pool to support instruction.
Mostly used by computer graphics classes for rendering animations
Maya, etc.
Work in progress to connect Windows pool with RCAC pools.
Future work: leverage Windows pools for science
BLAST, perhaps?
10. Grids at Purdue - Campus Condor pools around campus
Physics department: 100 nodes, flocked
Envision Center: 48 nodes, flocked
Potential collaborations
Libraries: ~200 nodes on Windows terminals
Colleges of Engineering: 400 nodes in existing pool
Or any department interested in sharing cycles!
Envision is a GPU cluster
Libraries joining RCAC pools this summer?
11. Grids at Purdue - Regional Northwest Indiana Computational Grid
Purdue West Lafayette
Purdue Calumet
Notre Dame
Argonne Labs
Condor pools available to NWICG today.
Partnership with OSG?
12. Open Science Grid Purdue active in Open Science Grid
CMS Tier-2 Center
NanoHUB
OSG/Teragrid Interoperability
Campus Condor pools accessible to OSG
Condor used for access to extra, non-dedicated cycles for CMS and is becoming the preferred interface for non-CMS VOs.
13. CMS Tier-2 - Condor MC production from UW-HEP ran this spring on RCAC Condor pools.
Processed 23% or so of entire production.
High rates of preemption, but that’s expected!
2006 will see addition of dedicated Condor worker nodes to Tier-2, in addition to PBS clusters.
Condor running on resilient dCache nodes.
PBS work forcing Condor jobs to vacate
Still got plenty of work done
14. NanoHUB NanoHUB: Science Gateway to the Grid for Nanotechnology community
Teragrid Science Gateway project
Open Science Grid VO
Condor-C as resource broker for Condor-G submissions to Teragrid or OSG
Working closely with Condor team
15. Teragrid Teragrid Resource Provider
Resources offered to Teragrid
Lear cluster
Condor pools
Data collections
16. Teragrid Two current projects active in Condor pools via Teragrid allocations
Database of Hypothetical Zeolite Structures
CDF Electroweak MC Simulation
Condor-G Glide-in
Great exercise in OSG/TG Interoperability
Identifying other potential users
What’s a zeolite (materials science)
User is not experienced with Condor
But with help from RCAC science support staff, has processed 250,000 hours of work in two months.
17. Teragrid TeraDRE - Distributed Rendering on the Teragrid
Globus, Condor, and IBRIX FusionFS enable Purdue’s Teragrid site to serve as a render farm
Maya and other renderers available
18. Grid Interoperability Both OSG and Teragrid operate on the same resources
For example, take “lear”
Campus users can use it (both those who own parts of it, and Condor users who might run on it opportunistically)
In the same way, so can OSG (CMS) users and non-CMS users via Condor
Teragrid users with allocations on Lear or the Purdue Condor pools may run there as well.
19. Grid Interoperability Tier-2 to Tier-2 connectivity via dedicated Teragrid WAN (UCSD->Purdue)
Aggregating resources at low level makes interoperability easier!
OSG stack available to TG users and vice versa
“Bouncer” Globus job forwarder
20. Future of Condor at Purdue Add resources
Continue growth around campus
RCAC
Other departments
Add Condor capabilities to resources
Teragrid data portal adding on-demand processing with Condor now
Federation
Aggregate Condor pools with other institutions?
RCAC:
Suns
Dedicated Condor resources
New departments:
College of Technology
Biology
21. Condor at Purdue Questions?
22. PBS/Condor Interaction
23. PBS/Condor Interaction PBS Epilogue
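# Runs after each PBS job: tell the startd that PBS is no longer using the node
/opt/condor/bin/condor_config_val -rset -startd \
    PBSRunning=False > /dev/null
# Have the startd re-read its configuration so the new value takes effect
/opt/condor/sbin/condor_reconfig -startd > /dev/null
Condor START Expression in condor_config.local
# Default value; changed at runtime through condor_config_val -rset
PBSRunning = False
# Only start jobs if PBS is not currently running a job
PURDUE_RCAC_START_NOPBS = ( $(PBSRunning) == False )
START = $(START) && $(PURDUE_RCAC_START_NOPBS)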
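The epilogue on this slide only clears the flag. The slides do not show the matching PBS prologue, so what follows is a minimal sketch of what it could look like, assuming it mirrors the epilogue and sets PBSRunning to True before a PBS job starts. The function name set_pbs_running and the CONDOR_BIN / CONDOR_SBIN variables are hypothetical; the two Condor commands and their flags are copied from the epilogue.

```shell
#!/bin/sh
# Hypothetical PBS prologue helper; assumed to mirror the epilogue on this slide.
# CONDOR_BIN and CONDOR_SBIN are illustrative variables that default to the
# install paths used in the epilogue.
CONDOR_BIN="${CONDOR_BIN:-/opt/condor/bin}"
CONDOR_SBIN="${CONDOR_SBIN:-/opt/condor/sbin}"

# set_pbs_running VALUE
# Record whether PBS currently owns the node (True or False), then ask the
# startd to reconfigure so the START expression sees the new value.
set_pbs_running() {
    case "$1" in
        True|False) ;;
        *) echo "usage: set_pbs_running True|False" >&2; return 2 ;;
    esac
    "$CONDOR_BIN/condor_config_val" -rset -startd "PBSRunning=$1" > /dev/null &&
        "$CONDOR_SBIN/condor_reconfig" -startd > /dev/null
}

# In a prologue the call would be:   set_pbs_running True
# In the epilogue it would be:       set_pbs_running False
```

With START tied to PBSRunning, a call like this would keep new Condor jobs from starting once PBS claims the node. Vacating a Condor job that is already running would need a further policy expression (PREEMPT, for example) that also refers to PBSRunning; the slides do not show that part of the configuration.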