Case Study: Using PERL to Administer and Monitor a Windows NT-based Compute Farm
Andrew Gordon
Compaq Computer Corporation (Andrew.Gordon@compaq.com)

@Hello = $MyTalk->Agenda();
• The Windows NT Compute Farm (Today)
• The Compute Farm Monitoring Project
• Future Enhancements
• Questions

$ComputeFarm = new Win32::ComputeFarm();
• Hardware (Today)
    • 33 Compute Servers
    • 112 CPUs
    • 112 GB RAM
    • 3 Project Data Servers
    • ~300++ GB Disk
    • Network Connectivity: 100 Mb
• Batch Q-ing Software
    • Platform Computing’s LSF
• Software Applications
    • Electronic Design
• Users
    • Verification Engineers
    • 39 Configured Users

$ComputeFarm->CoolPicture();
[Network diagram. Labels: Dedicated Schedulers; Compute Servers; Application & Data Servers; 100 Mb FDDI Ring; Dedicated Compute Farm Monitoring Machine]

$A_Monitoring_System = new Project(“Does It All...”);
• Primary Goals
    • Reduce admin. burden
    • Automate management
• Monitoring Events
    • “Work” Tasks
    • “Error” Checks
• Why PERL (vs. C++)?
    • Simpler
    • Faster
    • Powerful parsing
    • Easier to extend
    • Could port to C++ later

$Monitoring_System->Magic();
• “Work” Tasks
    • Log Rolling Task
    • Compute Farm Snapshots Task
    • Temp Cleaning Task
    • Exit Task
• “Error” Checks
    • Drive Letters Check
    • LSF Daemons Check
    • Dr. Watson Check
    • Password Pending Job Check
    • Server Load Check

($Monitor->IsVerbose()) && ($Monitor->ShouldSing());
• Auditing
    • Levels of Logging
    • Log All Information
    • Log All Actions
• Notifications
    • Email Warnings
• Recovery Actions
    • Kill Processes
    • Reboot Machine

Installed.pl -app Components.pm
• The NT Service
    • On each machine
    • Start and stop monitor
    • Spawns PERL code
• The Configuration File
    • Defines monitor behavior
    • (Can be) Centrally Located
• The Log File(s)
    • 1 per Machine
    • (Can be) Centrally Located
• The PERL Modules
    • PERL Objects used within the Monitor
    • The good “stuff”

$Monitor->Startup_Sequence();
• NT Machine boots and starts NT SCM
• NT SCM spawns “Monitor Service” (C)
• Service calls “system ((PERL) Monitor)”
• (PERL) Monitor reads configuration file
• (PERL) Monitor schedules tasks & checks
• (PERL) Monitor enters endless event loop
• [Start/stop Monitor w/ NT SCM Controls]

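The last two steps (schedule, then loop) can be pictured with a small sketch. This is not the Monitor's actual code: the handler names, the intervals, and the `run_monitor` helper are all made up for illustration, and the loop is bounded and uses simulated seconds so it can be run and inspected, where the real Monitor loops until the NT service stops it.

```perl
use strict;
use warnings;

# Hypothetical handlers keyed by event name (not the talk's real tasks).
my %handlers = (
    RollLogs  => sub { return "rolled logs" },
    CheckLoad => sub { return "checked load" },
);

# Hypothetical schedule: event name => interval in (simulated) seconds.
my %interval = (RollLogs => 3, CheckLoad => 1);

# Simulate $ticks seconds of the event loop and return a log of what ran.
sub run_monitor {
    my ($ticks) = @_;

    # Each event first becomes due one interval after start-up.
    my %next_run = map { $_ => $interval{$_} } keys %interval;
    my @log;

    for my $now (1 .. $ticks) {
        for my $name (sort keys %next_run) {
            next if $next_run{$name} > $now;          # not due yet
            push @log, "t=$now " . $handlers{$name}->();
            $next_run{$name} = $now + $interval{$name};
        }
        # A real monitor would sleep here and keep looping indefinitely.
    }
    return @log;
}

print "$_\n" for run_monitor(3);
```

Keeping the handlers in a hash of code references is one way to let a configuration file decide which events run without touching the loop itself.
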
Open(F,$M->ConfigFile()) || BadSetup();
• Defines behavior of Monitor(s)
• List of records (TESTs, SEVENTs, etc)
• Defines which machines do what
• Defines how often (interval or specific time)
• Defines if Recovery should be performed
• Defines level of Logging
• Identifies Notification Recipients

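The slides do not show the configuration file syntax, so the following is only a sketch under an invented record format: one record per line, a record type and name followed by `key=value` pairs. The `parse_record` helper, the key names (`machines`, `interval`, `recover`, `notify`), and the sample values are all assumptions chosen to mirror the bullets above.

```perl
use strict;
use warnings;

# Parse one record line of the made-up format:
#   TEST <name> key=value key=value ...
# Returns a hash reference, or nothing for blank lines and '#' comments.
sub parse_record {
    my ($line) = @_;
    $line =~ s/#.*//;            # drop comments
    $line =~ s/^\s+|\s+$//g;     # trim surrounding whitespace
    return if $line eq '';

    my ($type, $name, @pairs) = split ' ', $line;
    my %rec = (type => $type, name => $name);
    for my $pair (@pairs) {
        my ($key, $value) = split /=/, $pair, 2;
        $rec{$key} = $value;
    }

    # Turn the comma-separated machine list into an array reference.
    $rec{machines} = [ split /,/, $rec{machines} ] if defined $rec{machines};
    return \%rec;
}

# Hypothetical record: which machines, how often, recovery, recipients.
my $rec = parse_record(
    'TEST SbatchdCheck machines=farm01,farm02 interval=300 recover=yes notify=admin@example.com'
);
print "$rec->{name} runs every $rec->{interval}s on @{ $rec->{machines} }\n";
```
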
Use PERL_Modules;
• The Real “Guts” of the Monitor
• Normal OO Goals
• Isolate and Encapsulate Data & Functionality
• Created “Container objects”
• Created “View or Interface objects”
• Created “Control objects”
• Mostly used to hide the execution of some external command ( my $r = `ext_command`; )

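As an illustration of hiding an external command behind an object, here is a minimal sketch. `SketchCommand` and its `Lines` method are hypothetical names, not modules from the Monitor, and `echo hello` merely stands in for whatever external command a real object would wrap.

```perl
use strict;
use warnings;

package SketchCommand;

sub new {
    my ($class, %args) = @_;
    return bless { command => $args{command} }, $class;
}

# Run the wrapped command and return its output as a list of lines.
# Dies if the command could not be run or exited with a non-zero status.
sub Lines {
    my ($self) = @_;
    my $out = `$self->{command}`;
    die "command failed: $self->{command}\n" if $? != 0 || !defined $out;
    return split /\r?\n/, $out;
}

package main;

# Callers see a method call; the backticks stay inside the object.
my $cmd   = SketchCommand->new(command => 'echo hello');
my @lines = $cmd->Lines();
print "got: $lines[0]\n";
```

With the command string confined to one class, the parsing of its output can change without touching the code that uses the result.
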
$Main = new Object();
• The Primary Monitor “Object”
• CLsfServerMonitor
• “Container” object which uses other objects:
    • CScheduler
    • CLsfTester
    • CLsfWorker
    • CLsfConfigInfo

$More = new PERL_Modules();
• CScheduler
• Schedules tasks and checks.
• Example:
    my $Scheduler = new CScheduler();
    $Scheduler->ScheduleEvent($Event,...);
    @Events = $Scheduler->GetReadyEvents();

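The slide shows only the calling side of CScheduler. One way such an interface could work is sketched below; `SketchScheduler` is a hypothetical stand-in, and the extra interval and `$now` arguments are assumptions added so the behavior can be exercised with fixed times instead of the wall clock.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CScheduler; the internals are assumed.
package SketchScheduler;

sub new {
    my ($class) = @_;
    return bless { events => [] }, $class;
}

# Register $event to fire every $interval seconds, first at $now + $interval.
sub ScheduleEvent {
    my ($self, $event, $interval, $now) = @_;
    $now = time unless defined $now;
    push @{ $self->{events} },
        { event => $event, interval => $interval, next_run => $now + $interval };
    return;
}

# Return the events due at $now and move each one's next run forward.
sub GetReadyEvents {
    my ($self, $now) = @_;
    $now = time unless defined $now;
    my @ready;
    for my $entry (@{ $self->{events} }) {
        next if $entry->{next_run} > $now;
        push @ready, $entry->{event};
        $entry->{next_run} = $now + $entry->{interval};
    }
    return @ready;
}

package main;

my $Scheduler = SketchScheduler->new();
$Scheduler->ScheduleEvent('RollLogsEvent',  60,  0);
$Scheduler->ScheduleEvent('CleanTempEvent', 300, 0);
my @Events = $Scheduler->GetReadyEvents(60);   # only RollLogsEvent is due
print "ready: @Events\n";
```
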
$More = new PERL_Modules();
• CLsfServerTester
• Uses several other objects.
• Performs error tests.
• Uses CRecovery object to perform recovery.
• Example:
    my $Tester = new CLsfServerTester();
    $Tester->PerformSbatchdCheck($Test,...);

$More = new PERL_Modules();
• CLsfServerWorker
• Executes management tasks:
    • RollLogsEvent, CleanTempEvent,…
• Example:
    my $Worker = new CLsfWorker();
    $Tester->PerformSbatchdCheck($Test,...);

$More = new PERL_Modules();
• CLsfConfigInfo
• Provides a “view” to the LSF configuration
• Example:
    my $LCI = new CLsfConfigInfo();
    foreach $Server ($LCI->ServerMachines()) {…};

$Other = new PERL_Modules();
• Other interesting objects:
    • CLogger: provides standard logging interface.
    • CLsfJob: provides real time LSF Job info
    • CLsfRecovery: provides recovery actions
    • CLsfView: provides real time LSF cluster info
    • CNTProcessTable: provides process table info

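To make the idea of a standard logging interface with levels concrete, here is a small sketch. `SketchLogger` is hypothetical and is not CLogger's real interface; the two level names (`actions`, `all`) are assumptions that loosely follow the "Log All Actions" and "Log All Information" levels mentioned earlier, and lines are kept in memory so the example needs no log file.

```perl
use strict;
use warnings;

package SketchLogger;

# Assumed ordering: 'actions' is the quieter level, 'all' logs everything.
my %LEVEL = (actions => 1, all => 2);

sub new {
    my ($class, %args) = @_;
    my $self = { level => $args{level} || 'actions', lines => [] };
    return bless $self, $class;
}

# Record $message if its level is within the logger's configured level.
# Returns 1 if the message was kept, 0 if it was filtered out.
sub Log {
    my ($self, $level, $message) = @_;
    return 0 if $LEVEL{$level} > $LEVEL{ $self->{level} };
    push @{ $self->{lines} }, sprintf('[%s] %s', $level, $message);
    return 1;
}

# Return every line recorded so far.
sub Lines {
    my ($self) = @_;
    return @{ $self->{lines} };
}

package main;

my $Logger = SketchLogger->new(level => 'actions');
$Logger->Log('actions', 'rebooting machine');     # kept
$Logger->Log('all',     'server load sampled');   # filtered out
print "$_\n" for $Logger->Lines();
```
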
$Future = Monitor::Enhancements();
• $Using = new Win32::Extensions();
• Push (@{$Monitor}, new Checks()):
    • “Resource Hog” Check
    • “Stuck” Job Check
    • Service Checks
• “Just makin’ it more betta”

%Bye = $myTalk->Summary();
• Monitor deployed 2Q1998
• Numerous notifications and reboots
• Compute farm expansion
• Adding 100+ more CPUs this summer
• Importance of Monitor grows
• New Win32 extensions will help developers tremendously: “It’s like candy…”

__END__
While ( /Questions/ ) { $myTalk->ProvideSomeAnswers(); }
$myTalk->Exit(0);
__THANKS__