
Case Study: Using PERL to Administer and Monitor a Windows NT-based Compute Farm

Andrew Gordon, Compaq Computer Corporation (Andrew.Gordon@compaq.com). Agenda: The Windows NT Compute Farm (Today); The Compute Farm Monitoring Project; Future Enhancements; Questions.


Presentation Transcript


  1. Case Study: Using PERL to Administer and Monitor a Windows NT-based Compute Farm Andrew Gordon Compaq Computer Corporation (Andrew.Gordon@compaq.com)

  2. @Hello = $MyTalk->Agenda(); • The Windows NT Compute Farm (Today) • The Compute Farm Monitoring Project • Future Enhancements • Questions

  3. $ComputeFarm = new Win32::ComputeFarm(); • Hardware (Today): 33 Compute Servers, 112 CPUs, 112 GB RAM, 3 Project Data Servers, ~300++ GB Disk • Network Connectivity: 100 Mb • Batch Q-ing Software: Platform Computing’s LSF • Software Applications: Electronic Design • Users: Verification Engineers, 39 Configured Users

  4. $ComputeFarm->CoolPicture(); [Diagram labels: Dedicated Schedulers; Compute Servers; Application & Data Servers; 100 Mb FDDI Ring; Dedicated Compute Farm Monitoring Machine]

  5. $A_Monitoring_System = new Project(“Does It All...”); • Primary Goals: Reduce admin. burden, Automate management • Monitoring Events: “Work” Tasks, “Error” Checks • Why PERL (vs. C++)? Simpler, Faster, Powerful parsing, Easier to extend, Could port to C++ later

  6. $Monitoring_System->Magic(); • “Work” Tasks: Log Rolling Task, Compute Farm Snapshots Task, Temp Cleaning Task, Exit Task • “Error” Checks: Drive Letters Check, LSF Daemons Check, Dr. Watson Check, Password Pending Job Check, Server Load Check

  7. ($Monitor->IsVerbose()) && ($Monitor->ShouldSing()); • Auditing: Levels of Logging, Log All Information, Log All Actions • Notifications: Email Warnings • Recovery Actions: Kill Processes, Reboot Machine

  8. Installed.pl -app Components.pm • The NT Service: On each machine, Start and stop monitor, Spawns PERL code • The Configuration File: Defines monitor behavior, (Can be) Centrally Located • The Log File(s): 1 per Machine, (Can be) Centrally Located • The PERL Modules: PERL Objects used within the Monitor, The good “stuff”

  9. $Monitor->Startup_Sequence(); • NT Machine boots and starts NT SCM • NT SCM spawns “Monitor Service” (C) • Service calls “system ((PERL) Monitor)” • (PERL) Monitor reads configuration file • (PERL) Monitor schedules tasks & checks • (PERL) Monitor enters endless event loop • [Start/stop Monitor w/ NT SCM Controls]

  10. Open(F,$M->ConfigFile()) || BadSetup(); • Defines behavior of Monitor(s) • List of records (TESTs, SEVENTs, etc) • Defines which machines do what • Defines how often (interval or specific time) • Defines if Recovery should be performed • Defines level of Logging • Identifies Notification Recipients
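The slide lists what the configuration file controls but does not show its syntax. As a rough illustration only, the sketch below assumes one record per line in a `TYPE key=value ...` layout. The record types `TEST` and `SEVENT` come from the slide; the key names (`name`, `machines`, `interval`, `at`, `recover`, `log`, `notify`), the machine names, and the `ParseConfig` helper are all hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: one record per line, "TYPE key=value ...". The real file
# format is not shown in the slides; only the record types TEST and SEVENT
# and the kinds of settings come from the talk.
sub ParseConfig {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
        my ($type, @pairs) = split ' ', $line;
        my %record = (type => $type);
        for my $pair (@pairs) {
            my ($key, $value) = split /=/, $pair, 2;
            $record{$key} = $value;
        }
        push @records, \%record;
    }
    return @records;
}

my $sample = <<'END_CONFIG';
# Hypothetical records for illustration only
TEST   name=SbatchdCheck machines=farm01,farm02 interval=300 recover=yes log=2 notify=admin@example.com
SEVENT name=RollLogs     machines=farm01        at=02:00     recover=no  log=1
END_CONFIG

for my $record (ParseConfig($sample)) {
    print "$record->{type} $record->{name} on $record->{machines}\n";
}
```

Keeping each record as a plain hash means a scheduler can read the interval, recovery flag, and notification recipient from one place without knowing how the file was laid out.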

  11. Use PERL_Modules; • The Real “Guts” of the Monitor • Normal OO Goals • Isolate and Encapsulate Data & Functionality • Created “Container objects” • Created “View or Interface objects” • Created “Control objects” • Mostly used to hide the execution of some external command ( my $r = `ext_command`; )
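The idea of hiding an external command behind an object can be sketched as follows. This is a minimal illustration, not one of the talk's modules: the class name `CCommandRunner`, its `Lines` method, and the `echo hello` command are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical "control object": callers ask for output lines and never
# see how the external command is launched.
package CCommandRunner;

sub new {
    my ($class, %args) = @_;
    my $self = { command => $args{command} };
    return bless $self, $class;
}

# Run the command with backticks and return its output lines, chomped.
# Returns an empty list if the command exits non-zero or cannot be started.
sub Lines {
    my ($self) = @_;
    my $cmd   = $self->{command};
    my @lines = `$cmd`;
    return () if $? != 0;
    chomp @lines;
    return @lines;
}

package main;

my $runner = CCommandRunner->new(command => 'echo hello');
my @out    = $runner->Lines();
print "$_\n" for @out;
```

Because the backtick call lives in one method, the command or its output handling can change later without touching any code that uses the object.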

  12. $Main = new Object(); • The Primary Monitor “Object” • CLsfServerMonitor • “Container” object which uses other objects: • CScheduler • CLsfTester • CLsfWorker • CLsfConfigInfo

  13. $Main->TheSimpleEventLoop();

  14. $More = new PERL_Modules(); • CScheduler • Schedules tasks and checks. • Example: my $Scheduler = new CScheduler(); $Scheduler->ScheduleEvent($Event,...); @Events = $Scheduler->GetReadyEvents();
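The slides show how `CScheduler` is called but not how it works. The sketch below is one possible shape for such a scheduler, under stated assumptions: the class name `CSchedulerSketch`, the argument lists of `ScheduleEvent` and `GetReadyEvents`, and the internal data layout are all hypothetical. Only the two method names and the event names `RollLogsEvent` and `CleanTempEvent` come from the talk.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CScheduler: the real module's internals are
# not shown in the slides, so the data layout here is an assumption.
package CSchedulerSketch;

sub new {
    my ($class) = @_;
    return bless { events => [] }, $class;
}

# Register an event name to fire every $interval seconds, first due at $start.
sub ScheduleEvent {
    my ($self, $name, $interval, $start) = @_;
    push @{ $self->{events} },
        { name => $name, interval => $interval, due => $start };
    return;
}

# Return the names of all events due at time $now and move each one's
# next due time forward by its interval.
sub GetReadyEvents {
    my ($self, $now) = @_;
    my @ready;
    for my $event (@{ $self->{events} }) {
        next if $event->{due} > $now;
        push @ready, $event->{name};
        $event->{due} = $now + $event->{interval};
    }
    return @ready;
}

package main;

my $Scheduler = CSchedulerSketch->new();
$Scheduler->ScheduleEvent('RollLogsEvent',  3600, 0);
$Scheduler->ScheduleEvent('CleanTempEvent', 600,  300);

# Simulated clock values stand in for time() so the run is repeatable.
for my $now (0, 300, 600) {
    my @Events = $Scheduler->GetReadyEvents($now);
    print "t=$now: @Events\n";
}
```

In a long-running monitor, the loop would presumably pass `time()` as the clock value and `sleep` between passes instead of iterating over fixed values.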

  15. $More = new PERL_Modules(); • CLsfServerTester • Uses several other objects. • Performs error tests. • Uses CRecovery object to perform recovery. • Example: my $Tester = new CLsfServerTester(); $Tester->PerformSbatchdCheck($Test,...);
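The check, log, recover sequence described on this slide can be sketched as below. This is a hypothetical helper, not the `CLsfServerTester` or `CRecovery` API: the function name `RunCheckWithRecovery`, its named arguments, and the simulated daemon state are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not the CLsfServerTester API: run a check, log a
# failure, optionally attempt recovery, then re-run the check.
sub RunCheckWithRecovery {
    my (%args) = @_;
    return 'ok' if $args{check}->();

    $args{log}->("check '$args{name}' failed");
    return 'failed' unless $args{recover};

    $args{recover}->();
    return $args{check}->() ? 'recovered' : 'failed';
}

# Simulated daemon state stands in for a real LSF daemon query.
my $daemon_running = 0;
my @log_lines;

my $result = RunCheckWithRecovery(
    name    => 'sbatchd',
    check   => sub { return $daemon_running },
    recover => sub { $daemon_running = 1 },    # e.g. restart the daemon
    log     => sub { push @log_lines, $_[0] },
);

print "result: $result\n";
print "log: $_\n" for @log_lines;
```

Passing the check and the recovery as code references keeps the control flow in one place, so a configuration record that disables recovery only needs to omit the `recover` argument.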

  16. $More = new PERL_Modules(); • CLsfServerWorker • Executes management tasks: • RollLogsEvent, CleanTempEvent,… • Example: my $Worker = new CLsfWorker(); $Tester->PerformSbatchdCheck($Test,...);

  17. $More = new PERL_Modules(); • CLsfConfigInfo • Provides a “view” to the LSF configuration • Example: my $LCI = new CLsfConfigInfo(); foreach $Server ($LCI->ServerMachines()) {…};

  18. $Other = new PERL_Modules(); • Other interesting objects: CLogger: provides standard logging interface. CLsfJob: provides real time LSF Job info CLsfRecovery: provides recovery actions CLsfView: provides real time LSF cluster info CNTProcessTable: provides process table info

  19. $Future = Monitor::Enhancements(); • $Using = new Win32::Extensions(); • Push (@{$Monitor}, new Checks()): • “Resource Hog” Check • “Stuck” Job Check • Service Checks • “Just makin’ it more betta”

  20. %Bye = $myTalk->Summary(); • Monitor deployed 2Q1998 • Numerous notifications and reboots • Compute farm expansion • Adding 100+ more CPUs this summer • Importance of Monitor grows • New Win32 extensions will help developers tremendously--“It’s like candy…”

  21. __END__ While ( /Questions/ ) { $myTalk->ProvideSomeAnswers(); } $myTalk->Exit(0); __THANKS__
