Common Execution Infrastructure (CEI) Subsystem

OOI CI System Architecture Team. CEI Developers: Patrick Armstrong, CEI Developer, University of Chicago; Pierre Riteau, CEI Senior Developer, University of Chicago; John Bresnahan, CEI Developer, Argonne National Lab (part-time).


Presentation Transcript


  1. Common Execution Infrastructure (CEI) Subsystem OOI CI System Architecture Team:

  2. CEI Developers • CEI Developer: Patrick Armstrong, University of Chicago • CEI Senior Developer: Pierre Riteau, University of Chicago • CEI Developer: John Bresnahan, Argonne National Lab (part-time) • CEI Developer: Pierre Riteau, University of Chicago (part-time) • 8/31/2014

  3. Subsystem Purpose • Allow OOI applications and system to • Provide Highly Available (HA) services • Scale to demand • Enact OOI deployment policies in elastic environment • Provide a deployment foundation for OOI CI

  4. Core System Structure: Service Layers

  5. CEI Scope
     • Elastic Computing Services
       • Implement elastic computing services to provide on-demand scaling and high availability.
     • Execution Engine Catalog & Repository Services
       • Work with operations and ITV to develop and refine tools to upload and sync the different deployable type representations adapted to each site.
     • Process Management Services
       • Provide the management services for policy-based process execution within specified deployable types intended to support the data distribution services; as such, the processes are sequential and primarily require a process-to-resource match.
     • Process Catalog & Repository Services
       • The Process Catalog and Repository Services maintain process definitions and list active processes.
     • Integration with the National Computing Infrastructure
       • Provide the capability to deploy OOI processing on the Amazon cloud services as well as academic clouds.

  6. High Availability and Scaling
     • High Availability
       • Towards an always-on service model
       • Failures in outsourced resources
       • Providing a pool of replenishable compute resources
     • Autoscaling
       • Provide resources for peaks in demand
       • Ensure good utilization during “valleys” in demand
       • Flexible resource mix

  7. Resources for HA and Scaling • Cloud resources are available on-demand, but any particular resource may fail at any time • Applications/processes can absorb new resources • Applications/processes can tolerate failures • EPU Management: monitor and regulate set properties based on system-specific and application-specific metrics
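The "monitor and regulate" role of EPU Management on this slide can be illustrated with a small sketch. It assumes a single demand metric (queued jobs) and a hypothetical policy with a floor and a ceiling on instance count. The names `Policy`, `target_instances`, and `decide` are invented for illustration and are not taken from the CEI code base.

```python
import math
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical scaling policy (all fields are assumptions)."""
    min_instances: int      # floor kept running for high availability
    max_instances: int      # ceiling that bounds cost
    jobs_per_instance: int  # assumed capacity of one worker instance


def target_instances(queued_jobs: int, policy: Policy) -> int:
    """Instances needed for the current demand, clamped to the policy bounds."""
    needed = math.ceil(queued_jobs / policy.jobs_per_instance)
    return max(policy.min_instances, min(policy.max_instances, needed))


def decide(queued_jobs: int, healthy: int, policy: Policy) -> int:
    """Positive result: instances to launch. Negative: instances to terminate."""
    return target_instances(queued_jobs, policy) - healthy


policy = Policy(min_instances=2, max_instances=10, jobs_per_instance=5)
peak = decide(queued_jobs=42, healthy=3, policy=policy)     # demand peak
valley = decide(queued_jobs=0, healthy=4, policy=policy)    # demand valley
failure = decide(queued_jobs=0, healthy=1, policy=policy)   # an instance failed
print(peak, valley, failure)  # prints: 6 -2 1
```

The same function covers both concerns of the slide: a failed instance simply lowers `healthy`, so replenishing the pool and scaling to demand reduce to one comparison against the target.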

  8. Managing Resources

  9. Elastic Processing Unit (EPU) Management [Diagram. Labels: AMQP; EPU Management (×3); Other; Provisioner; DTRS; Decision Engine; IaaS; "create instance"; CB; EE ioncore 1.2; EE matlab 6.1; EE ioncore 1.3; ou-agent (×3); context-agent (×3)]

  10. Making the EPU HA [Diagram. Labels: AMQP; Other; "create instance"; Bootstrap EPU; Provisioner/DTRS; Dedicated DE; IaaS; cloudinit.d; ou-agent (×3); EPU Worker (×9)]

  11. Managing Processes

  12. Creating a Process I [Diagram. Labels: AMQP; Other; Process Dispatcher; "enter"; Process Instance Registry; EE type A instance; "launch"; ee-agent; Decision Engine; "lookup"; Process Definition Registry; "request to activate process X"]

  13. Creating a Process II [Diagram. Labels: AMQP; Other; EPU Management; Provisioner/DTRS; IaaS; "create instance"; "request instance"; Process Dispatcher; "enter"; Process Instance Registry; EE type A instance; "launch"; ee-agent; Decision Engine; "lookup"; Process Definition Registry; "request to activate process X"]
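The labels on the two "Creating a Process" slides (lookup, request instance, launch, enter) suggest a flow that can be sketched in memory. The following is an assumption-laden illustration, not the CEI implementation: the class and method names are hypothetical, the order of the "launch" and "enter" steps is a guess, engines have no capacity limit, and a plain callback stands in for EPU Management.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionEngine:
    """Stand-in for an EE instance of a given type (hypothetical)."""
    ee_type: str
    processes: list = field(default_factory=list)


class ProcessDispatcher:
    def __init__(self, definitions, request_instance):
        self.definitions = definitions            # Process Definition Registry: name -> EE type
        self.instances = {}                       # Process Instance Registry: process id -> engine
        self.engines = []                         # execution engines already running
        self.request_instance = request_instance  # callback standing in for EPU Management
        self._next_id = 0

    def activate(self, name):
        """Handle a 'request to activate process <name>'."""
        ee_type = self.definitions[name]          # "lookup" in the definition registry
        engine = next((e for e in self.engines if e.ee_type == ee_type), None)
        if engine is None:                        # no suitable EE: "request instance"
            engine = self.request_instance(ee_type)
            self.engines.append(engine)
        self._next_id += 1
        pid = f"{name}-{self._next_id}"
        engine.processes.append(pid)              # "launch" (via the ee-agent in the slides)
        self.instances[pid] = engine              # "enter" in the instance registry
        return pid


launched = []

def fake_epu_management(ee_type):
    """Records the request and returns a fresh engine of the requested type."""
    launched.append(ee_type)
    return ExecutionEngine(ee_type)

dispatcher = ProcessDispatcher({"X": "A"}, fake_epu_management)
first = dispatcher.activate("X")    # no EE of type A yet, so one is requested
second = dispatcher.activate("X")   # reuses the engine created for the first call
print(first, second, launched)      # prints: X-1 X-2 ['A']
```

The first activation corresponds to "Creating a Process II" (an instance must be created), and the second to "Creating a Process I" (a matching EE already exists).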

  14. Inside an Execution Engine • Legend: C = create, M = monitor, R = restart, K = kill, O = I/O [Diagram. Labels: AMQP; Other; EE type A instance; CC instance (×2); EPU Management; Process Dispatcher; ou-agent; ee-agent; context-agent; supervisord (×3); Matlab script; process (adapter); Package Server; datastream subscription; result; operation annotations C, M, CMR, CMK, CMKO]

  15. Adventures in Availability • Time to repair (TTR) • Diagnosis • Time to scale (TTS) • PENDING (request) • STARTED (deployment) • RUNNING (contextualization) • Availability: $A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$, where MTBF is the mean time between failures and MTTR is the mean time to repair • TTS: preliminary results for 2,000 VMs provisioned on AWS EC2 [chart not reproduced in transcript]
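A short worked example of the availability ratio on this slide, A = MTBF / (MTBF + MTTR). It assumes that time to repair is the sum of diagnosis time and the three time-to-scale phases (PENDING, STARTED, RUNNING), which is one reading of the bullet order, not something the slide states outright. All numbers are invented for illustration and are not the measurements from the presentation.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


def time_to_repair(diagnosis: float, pending: float,
                   started: float, running: float) -> float:
    """Assumed decomposition: diagnosis plus the three time-to-scale phases."""
    return diagnosis + pending + started + running


# Illustrative values in seconds; NOT results from the presentation.
mttr = time_to_repair(diagnosis=60, pending=30, started=90, running=120)
a = availability(mtbf=99_700, mttr=mttr)
print(mttr, round(a, 4))  # prints: 300 0.997
```

Under this reading, shortening any time-to-scale phase lowers MTTR and so raises availability without changing how often failures occur.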

  16. R3 Scope • Process management • Activation and validation • New execution site registration • Integration with National Infrastructure • Framework for integration of academic cloud providers, TeraGrid and OSG • Integration with Microsoft cloud

  17. R3 Activities • Refine/change scope to achieve a complete and maintainable system • Decide on specific solutions for R3 scope

  18. Questions?
