Batch Computing at Altera: Condor, Quill and The Enterprise
About Altera • “The Programmable Solutions Company” • Pioneer of SOPC technology • Founded in 1983 • $1.02 billion in 2004 sales • 2,300 employees • 14,000+ worldwide customers
About Programmable Solutions • Programmable Logic Devices (PLDs) • Intellectual Property (IP) • Development Software
About Me • Senior Software Engineer at the Toronto Technology Center • B.A.Sc. in Engineering Science from the University of Toronto • Joined Altera in 2001 • Focus on distributed computing
Where It All Began • Developed in Toronto • Centralized scheduling system • Multiple queues • Priority/FIFO execution • No limit on resource claims • Engineer-designed, custom API
Change Is Good, Right? • Multi-OS support • Redundancy and fault tolerance • Easy expansion beyond Toronto • Easy-to-use API • New features • Improve matchmaking • Capacity planning Really Important!
Pain Free Migration [Diagram labels: user tools, meta scheduler, SOAP, priority engine, TTC DB, TTC pool, Condor, Condor pool]
Time Stands Still • Nice-style priorities [1:N] • Use priority factor to ensure P_N negotiates before P_{N+1}, P_{N+2}, etc. • RUP(P_N) = 0.5 • EUP(P_N)/EUP(P_{N+1}) = ½ • Freeze RUP values in time • PRIORITY_HALFLIFE = 100000000000000000000 • Let jobs at P_N get all VMs in the system • NEGOTIATOR_IGNORE_USER_PRIORITIES = True
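The arithmetic behind the priority-factor trick can be sketched in a few lines of Python. Condor computes a user's effective priority (EUP) as the real user priority (RUP) multiplied by a priority factor, and a lower EUP negotiates earlier. With every RUP frozen at 0.5, doubling the factor from one level to the next gives the ½ ratio on the slide. The function names and the choice of a factor of 1 for level 1 are assumptions made for illustration, not values taken from the talk.

```python
# Sketch only: priority_factor() and its base value are hypothetical.
RUP_FROZEN = 0.5  # RUP value from the slide, held constant for every level


def priority_factor(level: int) -> float:
    """Assumed factor for nice-style level `level` (1 is the most urgent).

    Doubling per level yields EUP(P_N) / EUP(P_{N+1}) = 1/2.
    """
    if level < 1:
        raise ValueError("levels start at 1")
    return float(2 ** (level - 1))


def effective_priority(level: int, rup: float = RUP_FROZEN) -> float:
    """EUP = RUP * priority factor; a lower EUP negotiates earlier."""
    return rup * priority_factor(level)


def negotiation_order(levels):
    """Return the levels sorted in the order they would negotiate."""
    return sorted(levels, key=effective_priority)


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"P{n}: factor={priority_factor(n)}, EUP={effective_priority(n)}")
    print(negotiation_order([3, 1, 2]))
```

Because the RUP never changes once the half-life is set to an effectively infinite value, the ordering depends only on the factors, which is what makes the levels behave like fixed nice values.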
Translation Services

XML cluster description:

  <cluster>
    <id>1</id>
    <priority>2</priority>
    <os>windows</os>
    <group>fitter</group>
    <job> <id>1</id> ... </job>
    <job> <id>2</id> ... </job>
    ...
  </cluster>

Condor submit description:

  +AlteraClusterID = 1
  +AlteraGroup = fitter
  requirements = OpSys = ...
  +AccountingGroup = P1
  AlteraTargetOs = windows ...
  +AlteraJobID = 1
  ...
  queue
  +AlteraJobID = 2
  ...

[Diagram label: METASCHEDULER]
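A minimal sketch of this kind of translation, written in Python with the standard-library XML parser, might look like the following. It is not the meta scheduler's actual code. The helper `accounting_group_for` is hypothetical: the slide pairs priority 2 with P1, so the real mapping cannot be recovered from it and the function below is only a placeholder. String values are quoted here because ClassAd string literals need quotes, and the `requirements` line is left out because its expression is elided on the slide.

```python
import xml.etree.ElementTree as ET


def accounting_group_for(priority: int) -> str:
    """Hypothetical mapping from a cluster priority to an accounting group."""
    return f"P{priority}"


def cluster_to_submit(xml_text: str) -> str:
    """Translate one <cluster> element into submit-description text."""
    cluster = ET.fromstring(xml_text)
    cluster_id = int(cluster.findtext("id"))
    priority = int(cluster.findtext("priority"))
    group = cluster.findtext("group")
    target_os = cluster.findtext("os")

    lines = [
        f"+AlteraClusterID = {cluster_id}",
        f'+AlteraGroup = "{group}"',
        f'+AlteraTargetOs = "{target_os}"',
        f'+AccountingGroup = "{accounting_group_for(priority)}"',
    ]
    # Each <job> becomes its own attribute line followed by a queue statement.
    for job in cluster.findall("job"):
        job_id = int(job.findtext("id"))
        lines.append(f"+AlteraJobID = {job_id}")
        lines.append("queue")
    return "\n".join(lines)


if __name__ == "__main__":
    example = """
    <cluster>
      <id>1</id><priority>2</priority><os>windows</os><group>fitter</group>
      <job><id>1</id></job>
      <job><id>2</id></job>
    </cluster>
    """
    print(cluster_to_submit(example))
```

Keeping the priority mapping in one small function means the translation layer is the only place that needs to know how site priorities correspond to Condor accounting groups.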
SQL! SQL! Everywhere! [Diagram labels: PostgreSQL DBMS, meta scheduler, usage history, Condor Quill, status info, Condor collector, system audits]
From Here, Where? • Roll out across the enterprise • Scaling with multiple schedds • Quill++ • DBMS for configuration management (with R. Nordlund & J. Stowe from The Hartford)