
Assembling the EVOp Infrastructure

Assembling the EVOp Infrastructure. Yehia El-khatib, Gordon S. Blair, School of Computing & Communications, Lancaster University. Outline: EVOp: An introduction; Assembling the infrastructure; Developing the web interface; Why cloud?; Issues influencing cloud uptake; What could be done.


Presentation Transcript


  1. Assembling the EVOp Infrastructure Yehia El-khatib, Gordon S. Blair School of Computing & Communications Lancaster University

  2. Outline • EVOp: An introduction • Assembling the infrastructure • Developing the web interface • Why cloud? • Issues influencing cloud uptake • What could be done • Summary

  3. EVOp • Environmental Virtual Observatory pilot. • 2 years from the start of 2011. • Funded by the UK Natural Environment Research Council (NERC) to help tackle big environmental science questions through: • Enabling the integration of a variety of information sources at different granularities and scales. • Facilitating the handling of large data sets from different sources. • Providing simple access tools to increase engagement from policy makers, local communities and the general public. • Focus on hydrology.  Grant reference NE/I002200/1

  4. EVOp: 4 main user groups

  5. EVOp: 4 main user groups • User groups: Scientists, Policy Makers, Local Communities, General Public. • Diagram layers: Web Interface, Web Services, Models, Virtual Resources, Hardware Resources. • Processes, not design.

  6. Infrastructure • Hybrid model where private resources are normally used, and public resources are used at times of increased load. • Developed our own load balancer to manage resource usage to reduce costs while maintaining user experience. • Might adjust in the future to run experimental services (e.g. tailored workflows) on private resources, and move more streamlined services (e.g. models) to run on public resources.
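The hybrid policy on this slide (private resources first, public resources only under increased load) can be illustrated with a minimal sketch. This is not the EVOp load balancer itself, whose design the slides do not describe; the `Pool` class, the capacities, and the `place_job` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A group of resources with a fixed number of job slots (hypothetical)."""
    name: str
    capacity: int      # maximum concurrent jobs (made-up value)
    active: int = 0    # jobs currently running

    def has_room(self) -> bool:
        return self.active < self.capacity


def place_job(private: Pool, public: Pool) -> str:
    """Prefer the private pool; burst to the public pool only when private is full."""
    for pool in (private, public):
        if pool.has_room():
            pool.active += 1
            return pool.name
    raise RuntimeError("no capacity available in either pool")


private = Pool("private", capacity=2)
public = Pool("public", capacity=4)

# Five incoming jobs: the first two fit on private resources, the rest burst out.
placements = [place_job(private, public) for _ in range(5)]
print(placements)
```

Because the private pool is always tried first, public (paid) resources are only consumed once private capacity is exhausted, which is the cost-saving behaviour the slide describes.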

  7. Infrastructure • Public cloud: Amazon Web Services. • Private cloud managed by Eucalyptus Community Cloud. • Provides an open source alternative to EC2 and S3 (similar interfaces). • However, moving between Eucalyptus and AWS is not always easy, as images need a lot of preparation beforehand. • Moreover, recent versions (1.6+) had stability issues. • Also, community support is weak. • Currently testing OpenStack as an alternative, also AWS-compatible. • The use of jClouds is very important to us to minimise portability overheads (prevent being locked in to one cloud provider).
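The portability point can be sketched as application code that depends only on a provider-neutral interface. The example below is written in Python for illustration and does not reproduce the jClouds API (jClouds is a Java library); `CloudProvider`, `InMemoryProvider`, `deploy_model`, and the image name are all hypothetical.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface (not the jClouds API)."""

    @abstractmethod
    def launch(self, image: str) -> str:
        """Start an instance from the given image and return its identifier."""


class InMemoryProvider(CloudProvider):
    """Stand-in backend that records launches instead of calling a real cloud."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.instances = []

    def launch(self, image: str) -> str:
        instance_id = f"{self.prefix}-{len(self.instances) + 1}"
        self.instances.append((instance_id, image))
        return instance_id


def deploy_model(provider: CloudProvider, image: str) -> str:
    # The application only sees the interface, so the backend can be swapped
    # without changing this function.
    return provider.launch(image)


private_cloud = InMemoryProvider("private")
public_cloud = InMemoryProvider("public")
ids = [deploy_model(p, "hydrology-model") for p in (private_cloud, public_cloud)]
print(ids)
```

Swapping one backend for another changes only which `CloudProvider` object is passed in, which is the kind of lock-in avoidance the slide attributes to an abstraction layer.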

  8. Multifaceted Web Interface • We cater to different user groups of varying backgrounds and experience levels. • Users (including scientists) are not IT experts, or at least would rather not be! • They do not want to tussle with compatibility issues, security restrictions, stringencies about citing/sharing, etc. • Developed an intuitive user interface that is tested repeatedly with stakeholders to ensure a low entry barrier for all targeted user groups. • Easy to use (find your way around) • Easy to understand (comprehend what this offers) • Easy to relate to (tweak-ability, reproducibility, reuse, sharing)

  9. Multifaceted Web Interface • General interface allows users to do things like: • Learn about the risk of flooding in their local area. • Explore how different farming practices affect such risk. • Authenticated government / local council officials could: • Learn about polluting nutrients diffused from different catchments. • Examine how policies would affect pollution levels at different scales. • An “advanced path” allows scientists to compose workflows: • A workflow is a pipeline of basic execution units (executables, scripts, web services, etc.). • Done in the browser. No programming prerequisites. • Allows the sharing of workflows and datasets to promote reuse, citing and collaboration.
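The definition of a workflow as a pipeline of basic execution units can be sketched as function composition. This is only an illustration under assumptions: the step names and the sample readings are invented, and the EVOp workflow composer itself runs in the browser and may work quite differently.

```python
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def run_workflow(steps: List[Step], data: Any) -> Any:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda acc, step: step(acc), steps, data)


def drop_invalid(readings):
    # Hypothetical cleaning step: negative river levels are treated as sensor errors.
    return [r for r in readings if r >= 0]


def cm_to_m(readings):
    # Hypothetical unit conversion step.
    return [r / 100 for r in readings]


# The last unit is the built-in max(), reducing the series to its peak level.
workflow = [drop_invalid, cm_to_m, max]

# Made-up river level readings in centimetres.
peak_level_m = run_workflow(workflow, [120.0, -1.0, 250.0, 180.0])
print(peak_level_m)
```

Because a workflow is just an ordered list of units, it can be stored, shared, and rerun on a different dataset, which is what makes reuse and citing practical.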

  10. Why Cloud? • Flexibility (Virtualisation) • Allows the dynamic provisioning of bespoke environments. • Everything from the hardware, platform, libraries, etc. can be customised to suit the exact needs of an application. • Very few limitations on what the application should be. Build what we want! • To draw a comparison: Grid users are tied in to too many specifications of the grid environment: hardware architecture, runtime environment, scheduling interface, and supported application interface. As such, only certain types of jobs can be submitted, and precompiling is sometimes needed to ensure compatibility.

  11. Why Cloud? • Versatile resource management (SOA) • All resources have a uniform view. • Allows us to support data assets of different origins: in situ gauging stations, warehoused data stores, user-provided data, and external sources. • Facilitates sharing and reuse (e.g. workflows), which promotes a culture of collaboration. • Provides abstraction so that data can be used in models and simulations without necessarily giving it away. • Provides transparency: details of where and how the data is held are hidden without affecting user experience.

  12. Why Cloud? • Easy access to resources (IaaS) • IaaS: hardware resources as a utility. • Allows the infrastructure to scale to meet user demand and maintain quality of service. • Ease of mind: issues of reliability, performance, and security at the hardware level are outsourced. • Allows us to focus on solving domain-specific problems. • No usage quotas (unless you want them). • Very few AAA hoops to jump through... as long as you can pay for that!

  13. Issues Influencing Cloud Uptake • Users see the advantage straight away. • Previously a scientist needed to have the data on their computer, develop & calibrate a model, run it. Check output. Rinse & repeat. • Identify ease of use, universal access, abundance of resources. • Some data producers are reluctant to provide their data through what they perceive as new, untested means. • Some communities are more advanced than others. • Easier to get funding based on the PAYG economic model. • Cut upfront costs. Reduce money spent on unused resources. • Funding bodies still perceive security to be a concern. • A cloud is just a computer system. • Public cloud service providers have whole teams working on security.
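The PAYG point about unused resources can be made concrete with a small calculation. The demand figures below are invented purely for illustration and are not EVOp measurements; the sketch compares provisioning for peak demand against paying only for the server-hours actually used.

```python
# Servers needed in each of four consecutive hours (made-up, bursty demand).
demand = [1, 1, 8, 1]

# Owning hardware means provisioning for the peak in every hour.
provisioned_hours = max(demand) * len(demand)

# Pay-as-you-go bills only the server-hours actually consumed.
used_hours = sum(demand)

idle_hours = provisioned_hours - used_hours
utilisation = used_hours / provisioned_hours

print(provisioned_hours, used_hours, idle_hours, utilisation)
```

With this demand profile, about two thirds of the peak-provisioned capacity sits idle, which is the spend that a PAYG model avoids. Real savings depend on actual prices and demand patterns, which the slides do not give.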

  14. What more could be done? • Introduce national (or even regional) initiatives to regulate cloud service provisioning. • This should ease a lot of the worry about trust. • Educating data owners about cloud computing. • Difficult. • Hopefully success stories (e.g. NGS cloud, EduServ) could alter attitudes. • Educating research communities about available cloud solutions. • Teach students cloudnumbers.com rather than MATLAB.

  15. Summary • Cloud resources are easy to steer in order to serve the needs of a scientific application without imposing development restrictions, integration boundaries or deployment difficulties. • Public cloud is convenient, but a hybrid one offers more options. • There are concerns surrounding trust and security (such as data licensing) that affect the uptake of cloud computing in research communities. • Some measures could be taken to alleviate such concerns.

  16. Distributed Computing Paradigms

  17. Thank you! Questions? • http://www.evo-uk.org/ @EVOpilot • Yehia El-khatib: http://www.comp.lancs.ac.uk/~elkhatib/ @yelkhatib • Gordon S. Blair: gordon@comp.lancs.ac.uk http://www.comp.lancs.ac.uk/department/staff.php?name=gordon
