
SUN GRID ENGINE

Sun Grid Engine is an award-winning workload management system that provides dynamic resource management, job scheduling, resource monitoring, policy administration, user authentication and access control, and accounting and reporting. It is used in data centers, health sciences, financial services, manufacturing, entertainment/media, energy, software, and government/education, for workloads such as medical imaging, risk analysis, and reservoir simulation. Sun Grid Engine matches jobs to resources according to user and job policies, and almost every behavior can be configured. Version 6.1 targets 10k+ hosts and 500k+ jobs.

mhoopes

Presentation Transcript


  1. SUN GRID ENGINE • Wolfgang Gentzsch • Formerly Senior Director of Grid Computing • Sun Microsystems, Inc.

  2. Popular workload management systems, by Google hits (search terms including quotes) • "grid engine": 138,000 • "load sharing facility": 11,200 • "portable batch system": 15,700 • SGE +Sun: 204,000 • LSF +"Platform Computing": 28,100 • PBS +Altair: 48,700

  3. Award-winning Sun Grid Engine • Thousands of successful Grids • Excellence in Cluster Technology

  4. Dynamic Resource Management • Distributed Resource Manager • Jobs • Dispatch • Results

  5. Sun Grid Engine Overview • Dynamic Resource Management • Job Scheduling • Resource monitoring • Policy administration • User authentication and access control • Accounting and reporting

  6. Target Industries & Typical Workloads (industry: computing tasks) • Data Centers: resource assignment across persistent services • Health Sciences: medical imaging, bio-informatics • Financial Services: risk and portfolio analysis, Monte Carlo simulations • Manufacturing: EDA, MCAD, fluid dynamics, crash test simulations, aerodynamics modeling • Entertainment/Media: digital content creation, animation, digital asset management • Energy: reservoir simulations, seismic processing • Software: build, test, verify • Government/Edu: weather analysis, nuclear yield simulation

  7. Customers (industry: sample customers) • Health Sciences: one of the largest pharmaceutical companies in USA • Financial Services: largest Wall Street financial institution, Deutsche Bank • Manufacturing: Ford, Transmeta, Mentor Graphics, Monsanto • Entertainment/Media: GlobeXplorer, Inc. • Energy: Landmark Graphics • Government/Edu: DoE INEEL, University of Leicester, University of Aachen

  8. Resource Matching Selection Scheduling JOB User • User policies • Groups • Roles • Departments • Projects • Job policies • Resources • System characteristics • System status • Resources

  9. Sun Grid Engine Components • User interfaces: qsub, qrsh, qlogin, qmon, qtcsh • Applications via DRMAA • QMaster (qmaster) • Scheduler • Shadow Master • Execution Daemons • ARCo

  10. Data Management • No explicit data management: script files are transferred, binaries are not • Shared file system: NFS “by default” • File staging: copy data in before the job, copy results out after; not an inherent feature, configured via scripting hooks
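
The file-staging pattern named on this slide (copy data in before the job, copy results out after) can be sketched as a small wrapper. This is an illustration only, not Sun Grid Engine code: the function name `run_with_staging` and its parameters are hypothetical, and in a real grid the same steps would live in whatever scripting hooks the site configures.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_staging(inputs, command, results_dir, output_names):
    """Stage inputs into a scratch directory, run the command there,
    then copy the named outputs back. Returns the command's exit code.

    All names here are hypothetical; this is not an SGE interface.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="stage_") as scratch:
        scratch = Path(scratch)
        # Stage in: copy every input file next to the job.
        for src in inputs:
            shutil.copy2(src, scratch / Path(src).name)
        # Run the job with the scratch directory as its working directory.
        completed = subprocess.run(command, cwd=scratch, check=False)
        # Stage out: copy back only the outputs that were actually produced.
        for name in output_names:
            produced = scratch / name
            if produced.exists():
                shutil.copy2(produced, results_dir / name)
        return completed.returncode
```

The scratch directory is removed when the wrapper returns, so anything not listed in `output_names` is discarded, which mirrors the slide's point that staging is explicit rather than an inherent feature.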

  11. Security • Access Control Lists: explicitly allow or disallow users and groups • Restricted operations: managers and operators, submit and admin hosts • Certificate-based encryption: hides and protects data, guarantees identity • Replace rsh with ssh

  12. Maximum Flexibility • Almost every behavior can be configured • Resources • Load sensors • Hierarchical: hosts, host groups, queues, etc.; users, user groups, departments, projects, etc. • Script-based integration points: suspend/resume, job execution, checkpointing, parallel environments

  13. Scalability • Sun Grid Engine 6.1 target: 10k+ hosts (hosts ≤ CPUs), 500k+ jobs (no task limit) • Sun Grid Engine 6.2 target: 90k+ cores • Sun Grid Engine 6.1: job round-trip 0.4s (mostly fork and exec), submit rate >120 jobs/sec (using DRMAA)

  14. Policies and Priorities User 1 Project C Team B Enterprise-wide Resource Demand User 2 Department 1 Contractor X Project A Department 2 Departmental Resource Access Department 3 Department 5 Department 4

  15. Sophisticated Scheduler • Align resource usage with business policies: historical usage tracking, time-based priorities, resource-based priorities, fine-grained quotas • Maximize utilization: hardware and software • Dynamic, continuously evaluated: changes take effect immediately, no restart
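
To make "historical usage tracking" concrete, here is a minimal sketch of one common way to turn past usage into a priority order: older usage is decayed with a half-life, and projects are ranked by how far they are below their entitled share. This is an assumption for illustration, not the algorithm Sun Grid Engine implements; the function names, the half-life, and the share values are all hypothetical.

```python
def decayed_usage(samples, now, half_life):
    """Sum (timestamp, cpu_seconds) samples, halving each sample's
    weight for every half_life seconds that have passed since it."""
    return sum(cpu * 0.5 ** ((now - t) / half_life) for t, cpu in samples)


def rank_by_fair_share(shares, usage_history, now, half_life=3600.0):
    """Return project names, most under-served first.

    shares:        {project: share weight}      (hypothetical policy input)
    usage_history: {project: [(timestamp, cpu_seconds), ...]}
    """
    total_shares = sum(shares.values())
    usage = {
        project: decayed_usage(usage_history.get(project, []), now, half_life)
        for project in shares
    }
    total_usage = sum(usage.values())

    def deficit(project):
        entitled = shares[project] / total_shares
        consumed = usage[project] / total_usage if total_usage else 0.0
        return entitled - consumed  # positive means under-served

    return sorted(shares, key=deficit, reverse=True)
```

With equal shares, a project whose usage is two half-lives old counts for a quarter of its original amount, so it ranks ahead of a project that used the same amount just now.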

  16. Grids in the Enterprise • Siloed grids: Accounting, Production, Development • Running jobs • Waiting jobs • Idle resources in some grids, jobs waiting in others.

  17. The Enterprise Grid • Accounting, Production, Development • Borrowed resources • Running jobs • Resources shared among departments. • Policies can ensure “fair” usage.

  18. Accounting and Reporting • ARCo: Accounting and Reporting Console • Fine-grained resource accounting: stored in an RDBMS in a well-defined schema, standard SQL access for 3rd-party tools, customizable and extensible • Web-based console tool: generate reports, queries, etc.; customizable queries and report formats; spreadsheet report generation for offline analysis

  19. Accounting and Reporting Console • Result List • Save new results • View results generated offline • Query List • Run by ordinary users • Create, Edit by privileged users

  20. Customizable Results View • Tables • Simple • Pivot • Definable fields • Customizable headings • Graphs • Line Chart • Bar Chart • Pie Chart • 3-D or flat

  21. Distributed Resource Management Application API (DRMAA) • Standard from the Open Grid Forum • Submit, monitor, control jobs • Language & platform agnostic • ISVs: “Grid-enable” their applications, avoid DRM/Grid system lock-in • In-house developers: integrate Grid tasks into workflow, orchestration, online apps, etc.
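
The submit-and-monitor flow that DRMAA standardizes can be sketched as follows. The helper `submit_and_wait` is hypothetical; it expects a session object shaped like the one in the third-party `drmaa` Python package (`createJobTemplate`, `runJob`, `wait`, `deleteJobTemplate`), and that package in turn needs a DRMAA library from an installed grid system. Treat it as a sketch under those assumptions rather than a tested integration.

```python
def submit_and_wait(session, command, args, timeout):
    """Submit one job through a DRMAA-style session and block until it ends.

    session: object exposing createJobTemplate/runJob/wait/deleteJobTemplate
             (assumed to match the `drmaa` Python package's Session).
    Returns (job_id, exit_status).
    """
    template = session.createJobTemplate()
    try:
        template.remoteCommand = command   # executable to run on the grid
        template.args = list(args)         # its command-line arguments
        job_id = session.runJob(template)  # submit
        info = session.wait(job_id, timeout)  # monitor until completion
        return job_id, info.exitStatus
    finally:
        # Always release the template, even if submission fails.
        session.deleteJobTemplate(template)


# Possible use against a real grid (not executed here; assumes `drmaa` is installed):
#   import drmaa
#   s = drmaa.Session()
#   s.initialize()
#   print(submit_and_wait(s, "/bin/sleep", ["10"], drmaa.Session.TIMEOUT_WAIT_FOREVER))
#   s.exit()
```

Passing the session in, instead of creating it inside the helper, keeps the code independent of any one DRM system, which is the lock-in point the slide makes.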

  22. User Interfaces to Sun Grid Engine • Browser (accounting) • Command-line • Graphical • Programmatic (DRMAA): C, Java

  23. Sun Grid Engine Multi-Clustering • “I need resources” • “I have 2 free” • Sun Grid Engine grid #1 • Sun Grid Engine grid #2 • Spare Pool • Service Domain Manager

  24. Sun Grid Engine Multi-Clustering • “I can spare some” • “I still need resources” • Sun Grid Engine grid #1 • Sun Grid Engine grid #2 • Service Domain Manager

  25. Sun Grid Engine Multi-Clustering • Grids are monitored by Service Level Objectives • Policies control relative grid priorities • Sun Grid Engine grid #1 • Sun Grid Engine grid #2 • Service Domain Manager

  26. Multi-Clustered Accounting • Multiple grids can use the same ARCo database • All accounting data available from the same web interface • Sun Grid Engine grid #1 • Sun Grid Engine grid #2 • ARCo

  27. SGE 6.2 Cloud Connectivity • Install a 0-node SGE system on your laptop or desktop and allocate nodes on the cloud (EC2) on demand. • The SGE Cloud Connectivity feature is fully elastic: grid resources allocated through it can go from 0 to whatever is needed and covered by the user's budget, and back to 0. • All policy controlled. • No user intervention required. • Secure communication: OpenVPN (part of the EC2 AMI and of the SGE instance running on the user's laptop or desktop)
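
The elasticity rule described on this slide (grow from 0 to what is needed and covered by the budget, then shrink back to 0) can be expressed as a small sizing function. This is a sketch of the idea only: it calls neither SGE nor EC2, and the function name, parameters, and the notion of an hourly budget are assumptions introduced for illustration.

```python
import math


def desired_cloud_nodes(pending_jobs, slots_per_node, hourly_cost, hourly_budget):
    """Return how many cloud nodes to hold right now.

    pending_jobs:   jobs waiting for a slot
    slots_per_node: jobs one node can run at once
    hourly_cost:    price of one node per hour (hypothetical unit)
    hourly_budget:  what the user is willing to spend per hour
    """
    if pending_jobs <= 0:
        return 0  # nothing waiting: release everything
    needed = math.ceil(pending_jobs / slots_per_node)
    affordable = int(hourly_budget // hourly_cost)
    # Never ask for more than the budget covers, never less than zero.
    return max(0, min(needed, affordable))
```

For example, with 4 slots per node, a node price of 0.5 and a budget of 10 per hour, 10 pending jobs ask for 3 nodes, 100 pending jobs are capped at the 20 nodes the budget allows, and an empty queue returns the allocation to 0.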

  28. Open Source Project • Foundation for Sun Grid Engine: development happens in open source • Very widely adopted, strong community • Active mailing lists, monitored by the development engineers • Licensed under SISSL • http://gridengine.sunsource.net/ • http://gridengine.info/ (by the community, for the community)

  29. Product Versus Open Source • Support and/or indemnification important? Licensed product • Exploring your options? Sun Download Center • Want to customize? Open source • Want to run on unsupported platforms? Open source • Want unsupported features? Open source

  30. 6.1 Supported Platforms

  31. Tokyo Institute of Technology: TSUBAME Grid Cluster • Largest supercomputer in Asia (top500.org): debuted at #7 on the Top 500 list in June 2006, #14 on the June 2007 list • Research grid: numeric simulations, 25+ applications, 6 different flavors of MPI, plus OpenMP, DDI, etc.

  32. TSUBAME Applications List

  33. TSUBAME Components • 655 nodes: Sun Fire x4600, 16 cores per node → 10480 cores, ClearSpeed CSX600 accelerators, 85TFlops theoretical / 48TFlops peak, 21.4TB aggregate memory • Infiniband network: 8 Voltaire ISR 9288 • Lustre file system: 42 Sun Fire x4500, 48 500GB SATA disks per node → ~1PB
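
The totals on this slide follow directly from the per-node figures, as the short check below shows. It uses only numbers from the slide and assumes decimal units (1TB = 1000GB); the per-node memory figure is a derived average, not something the slide states.

```python
# Compute nodes: 655 Sun Fire x4600 with 16 cores each.
nodes = 655
cores_per_node = 16
total_cores = nodes * cores_per_node

# Storage: 42 Sun Fire x4500 servers, each with 48 disks of 500GB.
storage_servers = 42
disks_per_server = 48
disk_gb = 500
raw_storage_tb = storage_servers * disks_per_server * disk_gb / 1000

# Derived average (assumption: memory is spread evenly across nodes).
memory_per_node_gb = 21.4 * 1000 / nodes

print(total_cores)                   # 10480
print(raw_storage_tb)                # 1008.0, i.e. roughly 1PB raw
print(round(memory_per_node_gb, 1))  # 32.7
```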

  34. Texas Advanced Computing Center: Ranger System • First National Science Foundation Track2 system: $30M acquisition budget, $29M for support over 4 years, awarded September 2006, production December 2007 • TeraGrid member: over 3200 users, over 1000 projects, from 48 states; physics, molecular biology, chemistry, astronomy, etc. • Larger than current top 20 TeraGrid systems combined

  35. Ranger in the TeraGrid

  36. Ranger Components • Nodes: 82 Sun Blade 6048 (3936 blades), 16 cores per blade → 62976 cores, 504TFlops peak, 125TB aggregate memory • Infiniband network: 2 Sun Datacenter Switch 3456 • Lustre/QFS/SAM file system: 72 Sun Fire x4500, largest file system is ~1PB

  37. Ranger Switching Fabric

  38. More Information • Main product page: http://www.sun.com/gridware/ • Open source project site: http://gridengine.sunsource.net/ • Community site: http://gridengine.info/ • Open source Service Domain Manager site: http://hedeby.sunsource.net/

  39. SUN GRID ENGINE • First Last • first.last@sun.com
