XtreemOS Application Execution Management: A Scalable Approach. Ramon Nou, Jacobo Giralt, Julita Corbalan, Enric Tejedor, J. Oriol Fitó, Josep M. Perez, Toni Cortes. Barcelona Supercomputing Center (BSC – CNS)
XtreemOS is funded by the European Commission through the Information Society Technology programme under contract IST-FP6-033576.
Outline
• XtreemOS Overview
• Application Execution Manager
• Job Execution Flow
• Monitoring
• Performance and Scalability
• Job Execution
• Job Status
• Future
XtreemOS Overview
• What is it?
  • A Linux-based operating system that supports Virtual Organizations (VOs) for the Grid.
  • Several layers.
XtreemOS Overview
• Some key features:
  • Makes the Grid as easy to use as Linux.
  • Highly scalable.
  • Fault tolerant.
  • Able to run interactive jobs.
  • Extensible.
• Three node types (each can be replicated):
  • Core
  • Resource
  • Client
Application Execution Manager
• Job management, monitoring and resource management.
• Access point to submit and control jobs.
• Distributed and asynchronous.
• Extensible.
• Linux concepts in the Grid world:
  • Process-thread paradigm.
  • Signals.
Application Execution Manager
• Several distributed services:
  • Job Manager.
  • Execution Manager.
  • Reservation Manager.
  • …
• Semantics:
  • JobUnit: the set of processes of a Job running on one resource.
  • Job: a set of JobUnits, identified by a JobID. [Process-Thread]
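The Job/JobUnit semantics and the "Signals" concept from the previous slide can be pictured as a small data model. This is a hypothetical sketch, not the XtreemOS API: the class names, `unit_on`, `spawn` and `signal` are illustrative, and process IDs and signal delivery are simulated in memory (a real ExecMng would deliver the signal to each Linux process, e.g. with `os.kill`).

```python
import itertools
import signal
from dataclasses import dataclass, field

# Hypothetical model; names are illustrative, not the XtreemOS API.
_next_pid = itertools.count(1000)  # simulated PIDs, not real processes


@dataclass
class JobUnit:
    """Processes of one job that run on a single resource."""
    resource: str
    pids: list[int] = field(default_factory=list)
    delivered: list[tuple[int, int]] = field(default_factory=list)

    def spawn(self) -> int:
        pid = next(_next_pid)
        self.pids.append(pid)
        return pid

    def deliver(self, signum: int) -> None:
        # A real ExecMng would call os.kill(pid, signum); here we only record it.
        for pid in self.pids:
            self.delivered.append((pid, signum))


@dataclass
class Job:
    """A set of JobUnits identified by a JobID (like a process with threads)."""
    job_id: str
    units: dict[str, JobUnit] = field(default_factory=dict)

    def unit_on(self, resource: str) -> JobUnit:
        # One JobUnit per resource, created on first use.
        return self.units.setdefault(resource, JobUnit(resource))

    def signal(self, signum: int) -> int:
        """Send signum to every process of the job; return how many were signalled."""
        count = 0
        for unit in self.units.values():
            unit.deliver(signum)
            count += len(unit.pids)
        return count


job = Job("job-42")
job.unit_on("node-a").spawn()
job.unit_on("node-a").spawn()
job.unit_on("node-b").spawn()
print(len(job.units), job.signal(signal.SIGTERM))  # 2 3
```

Signalling the Job fans out to every JobUnit, which mirrors how a Linux signal sent to a process affects all of its threads.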
Job Execution Flow
[Diagram. Components: User, JobMng (any XOSD), JobDirectory, RSS, ExecMng (XOSD), Kernel. Calls shown: JID = createJob(JSDL), runJob(JID), getResources(JSDL). The ExecMng schedules and executes the processes; the job is finished when all its processes have finished.]
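The calls in the diagram can be traced with a small in-memory sketch. It is hypothetical: the JSDL document is reduced to a dict with a `"processes"` count (real JSDL is XML), the JobMng is assumed to call `getResources` when the job is run, and the RSS uses round-robin placement only for illustration.

```python
import uuid


class RSS:
    """Stand-in for the resource selection service (hypothetical)."""

    def __init__(self, resources):
        self.resources = list(resources)

    def get_resources(self, jsdl):
        # Assumption: round-robin placement of the requested processes.
        n = jsdl["processes"]
        return [self.resources[i % len(self.resources)] for i in range(n)]


class ExecMng:
    def __init__(self):
        self.running = {}  # JID -> resources where its processes run

    def execute(self, jid, resource):
        # The real ExecMng schedules the process with the Linux kernel;
        # this sketch only records the placement.
        self.running.setdefault(jid, []).append(resource)


class JobMng:
    def __init__(self, rss, exec_mng):
        self.rss = rss
        self.exec_mng = exec_mng
        self.job_directory = {}  # JID -> job record

    def create_job(self, jsdl):
        jid = str(uuid.uuid4())
        self.job_directory[jid] = {"jsdl": jsdl, "pending": 0}
        return jid

    def run_job(self, jid):
        record = self.job_directory[jid]
        for resource in self.rss.get_resources(record["jsdl"]):
            self.exec_mng.execute(jid, resource)
            record["pending"] += 1

    def process_finished(self, jid):
        record = self.job_directory[jid]
        record["pending"] -= 1
        return record["pending"] == 0  # job finished once all processes finished


jm = JobMng(RSS(["node-a", "node-b"]), ExecMng())
jid = jm.create_job({"processes": 3})
jm.run_job(jid)
print([jm.process_finished(jid) for _ in range(3)])  # [False, False, True]
```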
Monitoring
• System metrics.
• User-defined metrics.
• Different levels of information.
• Buffering.
• Each service maintains its own monitoring information (SCOPE):
  • ExecMng has information about processes.
  • JobMng has information about jobs.
  • ResMng has information about resources.
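Buffering, per-service scope and levels of information can be combined in one small structure. This is a hypothetical sketch: the `MonitoringBuffer` class, the bounded buffer that drops the oldest samples, and the two levels `"summary"` and `"detail"` are assumptions chosen for illustration, not the XtreemOS monitoring interface.

```python
from collections import deque


class MonitoringBuffer:
    """Hypothetical per-service monitoring store with a bounded buffer.

    Each service keeps only the metrics in its own scope
    (e.g. "process" for ExecMng, "job" for JobMng, "resource" for ResMng).
    """

    def __init__(self, scope, capacity=4):
        self.scope = scope
        # deque(maxlen=...) silently drops the oldest sample when full.
        self.samples = deque(maxlen=capacity)

    def record(self, name, value, user_defined=False):
        self.samples.append({"metric": name, "value": value, "user": user_defined})

    def query(self, level="summary"):
        if level == "summary":
            return {"scope": self.scope, "samples": len(self.samples)}
        return list(self.samples)  # "detail": every buffered sample


exec_mon = MonitoringBuffer("process", capacity=2)
exec_mon.record("cpu_time", 1.5)                     # system metric
exec_mon.record("rss_kb", 2048)                      # system metric
exec_mon.record("iterations", 10, user_defined=True) # user-defined metric
print(exec_mon.query())                    # {'scope': 'process', 'samples': 2}
print(exec_mon.query("detail")[0]["metric"])  # rss_kb
```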
Performance & Scalability
• Key points:
  • Collaboration with the Linux kernel.
  • No central storage (DHTs).
  • Can be replicated.
  • Does not search for the best global scheduling, only for a good-enough local scheduling.
• What is the performance without DHTs?
  • Typical VO: a small (100-node) local grid.
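The difference between a best global schedule and a good-enough local one can be shown with two selection functions. This is an illustrative contrast, not the XtreemOS scheduler: the load values and the first-fit threshold policy in `good_enough_local` are assumptions. The global choice must inspect every node; the local choice stops at the first acceptable candidate it already knows about.

```python
from typing import Optional


def best_global(loads: dict[str, float]) -> str:
    """Exhaustive choice: scan every node for the minimum load."""
    return min(loads, key=loads.get)


def good_enough_local(candidates: list[str], loads: dict[str, float],
                      threshold: float) -> Optional[str]:
    """First-fit choice: return the first candidate below the load threshold."""
    for node in candidates:
        if loads[node] < threshold:
            return node
    return None  # no acceptable node among the local candidates


loads = {"a": 0.9, "b": 0.4, "c": 0.1}
print(best_global(loads))                                # c
print(good_enough_local(["a", "b", "c"], loads, 0.5))    # b
```

The first-fit answer ("b") is not the least-loaded node, but it is acceptable and found without a global view, which is what lets the decision stay local and cheap.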
Job Execution
• O(X²):
  • Resource management is needed for each submitted process.
  • All processes belong to the same job (in other systems they would be independent jobs).
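The slide does not say where the quadratic term comes from. One reading consistent with O(X²), stated here only as an assumption, is that X is the number of processes submitted to one job and that handling the i-th process costs time proportional to i (for example, because its resource management has to account for the i − 1 processes of the same job already placed). With a per-step constant c, the total cost is then:

```latex
T(X) = \sum_{i=1}^{X} c\, i = c\,\frac{X(X+1)}{2} = O(X^2)
```

For X = 100 this gives 100 · 101 / 2 = 5050 cost units, against 100 if each process cost a constant amount.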
Job Status
• Asks for information on all the processes of a job with low overhead.
• Looks up a job's finished status in 0.012 seconds (0.014 in GT5, the Globus Toolkit 5) without contacting the ExecMngs.
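One way to answer "is the job finished?" without contacting any ExecMng is to keep an aggregated count on the job side, updated by process events. This is a hypothetical sketch: the `JobStatusView` class and its event methods are assumptions (the slide does not describe the mechanism), and it makes no claim about reproducing the measured 0.012 s.

```python
class JobStatusView:
    """Hypothetical JobMng-side view: ExecMngs push process events, so a
    status query is answered locally without contacting any ExecMng."""

    def __init__(self):
        self.alive = {}  # JID -> number of processes still running

    def on_process_started(self, jid):
        self.alive[jid] = self.alive.get(jid, 0) + 1

    def on_process_exited(self, jid):
        self.alive[jid] -= 1

    def is_finished(self, jid):
        # Raises KeyError for an unknown JID instead of guessing.
        return self.alive[jid] == 0


view = JobStatusView()
for _ in range(3):
    view.on_process_started("job-1")
view.on_process_exited("job-1")
print(view.is_finished("job-1"))  # False
view.on_process_exited("job-1")
view.on_process_exited("job-1")
print(view.is_finished("job-1"))  # True
```

The query cost is a single dictionary lookup; the price is that every process start and exit must be reported to the job side.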
Future Improvements
• Reduce internal communication times.
• Caching to reduce overhead.
• Some conclusions:
  • Kernel collaboration with «middleware» is important.
  • DHTs (not evaluated) are a good option for distributing data.
    • But they still do not provide high performance.
  • Including the concept 1 Job → n Processes gives the user a lot of benefits:
    • Easy to understand, easy to manage.
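"Caching to reduce overhead" is listed without details. A minimal sketch of one common option is a time-to-live cache placed in front of a remote query such as a job-status request. The `TTLCache` class, its parameters and the injected clock are hypothetical, and the sketch assumes that answers up to `ttl` seconds old are acceptable.

```python
import time


class TTLCache:
    """Minimal time-to-live cache: reuse a remote answer for `ttl` seconds."""

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self.fetch = fetch    # function doing the expensive remote call
        self.ttl = ttl
        self.clock = clock    # injectable so the example is deterministic
        self.entries = {}     # key -> (value, expiry time)
        self.misses = 0

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]     # fresh entry: no remote call
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = (value, now + self.ttl)
        return value


t = [0.0]
cache = TTLCache(lambda jid: f"status-of-{jid}", ttl=5.0, clock=lambda: t[0])
cache.get("j1")   # miss: fetched remotely
cache.get("j1")   # hit: served from the cache
t[0] = 6.0        # entry has expired
cache.get("j1")   # miss again
print(cache.misses)  # 2
```

The trade-off is staleness: a larger `ttl` saves more internal communication but may report a status up to `ttl` seconds old.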