Assembling the EVOp Infrastructure. Yehia El-khatib, Gordon S. Blair. School of Computing & Communications, Lancaster University. Outline. EVOp: An introduction. Assembling the infrastructure. Developing the web interface. Why cloud? Issues influencing cloud uptake. What could be done.
Assembling the EVOp Infrastructure Yehia El-khatib, Gordon S. Blair School of Computing & Communications Lancaster University
Outline • EVOp: An introduction • Assembling the infrastructure • Developing the web interface • Why cloud? • Issues influencing cloud uptake • What could be done • Summary
EVOp • Environmental Virtual Observatory pilot. • 2 years from the start of 2011. • Funded by the UK Natural Environment Research Council (NERC) to help tackle big environmental science questions through: • Enabling the integration of a variety of information sources at different granularities and scales. • Facilitating the handling of large data sets from different sources. • Providing simple access tools to increase engagement from policy makers, local communities and the general public. • Focus on hydrology. Grant reference NE/I002200/1
EVOp: 4 main user groups • Scientists • Policy Makers • Local Communities • General Public [Figure: architecture layers labelled Web Interface, Web Services, Models, Virtual Resources, Hardware Resources. Note on slide: Processes, not design.]
Infrastructure • Hybrid model where private resources are normally used, and public resources are used at times of increased load. • Developed our own load balancer to manage resource usage to reduce costs while maintaining user experience. • Might adjust in the future to run experimental services (e.g. tailored workflows) on private resources, and move more streamlined services (e.g. models) to run on public resources.
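The hybrid placement policy described on this slide can be illustrated with a minimal sketch. This is not the EVOp load balancer: the `Pool` class, the capacities, and the `place_job` function are hypothetical names chosen for illustration, and a real balancer would also weigh cost and response time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    """A group of resources with a fixed job capacity (hypothetical model)."""
    name: str
    capacity: int      # maximum number of concurrent jobs
    running: int = 0   # jobs currently placed on this pool

    def has_room(self) -> bool:
        return self.running < self.capacity


def place_job(private: Pool, public: Pool) -> Optional[str]:
    """Prefer the private pool; burst to the public pool only when it is full."""
    for pool in (private, public):
        if pool.has_room():
            pool.running += 1
            return pool.name
    return None  # both pools are full, so the job would have to wait


def simulate(n_jobs: int, private: Pool, public: Pool) -> List[Optional[str]]:
    """Place n_jobs one after another and record where each one landed."""
    return [place_job(private, public) for _ in range(n_jobs)]


if __name__ == "__main__":
    placements = simulate(6, Pool("private", 2), Pool("public", 3))
    print(placements)
```

Filling private resources first keeps public (paid) usage to periods of increased load, which is the cost-saving behaviour the slide describes.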
Infrastructure • Public cloud: Amazon Web Services. • Private cloud managed by Eucalyptus Community Cloud. • Provides an open source alternative to EC2 and S3 (similar interfaces). • However, moving between Eucalyptus and AWS is not always easy, as images need a lot of preparation beforehand. • Moreover, recent versions (1.6+) had stability issues. • Also, community support is weak. • Currently testing OpenStack as an alternative, also AWS-compatible. • The use of jClouds is very important to us to minimise portability overheads (prevent being locked in to one cloud provider).
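The idea behind a portability layer can be sketched as follows. This is not the jClouds API (jClouds is a Java library): `ComputeProvider`, `InMemoryProvider`, and `deploy_service` are hypothetical names, and the provider here is an in-memory stand-in rather than a real cloud client. The point is only that deployment code written against one neutral interface does not change when the provider behind it does.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ComputeProvider(ABC):
    """Provider-neutral interface (hypothetical) that deployment code targets."""

    @abstractmethod
    def launch(self, image_id: str) -> str:
        """Start an instance from an image and return its identifier."""

    @abstractmethod
    def terminate(self, instance_id: str) -> None:
        """Stop the given instance."""


class InMemoryProvider(ComputeProvider):
    """Stand-in for a real cloud back end; it only records what was launched."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.instances: Dict[str, str] = {}
        self._counter = 0

    def launch(self, image_id: str) -> str:
        self._counter += 1
        instance_id = f"{self.name}-{self._counter}"
        self.instances[instance_id] = image_id
        return instance_id

    def terminate(self, instance_id: str) -> None:
        self.instances.pop(instance_id, None)


def deploy_service(provider: ComputeProvider, image_id: str, replicas: int) -> List[str]:
    """Deployment logic that never mentions a specific provider."""
    return [provider.launch(image_id) for _ in range(replicas)]


if __name__ == "__main__":
    for provider in (InMemoryProvider("private"), InMemoryProvider("public")):
        print(deploy_service(provider, "model-image", 2))
```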
Multifaceted Web Interface • We cater to different user groups of varying backgrounds and experience levels. • Users (including scientists) are not IT experts, or at least would rather not be! • They do not want to tussle with compatibility issues, security restrictions, stringencies about citing/sharing, etc. • Developed an intuitive user interface that is tested repeatedly with stakeholders to ensure a low entry barrier for all targeted user groups. • Easy to use (find your way around) • Easy to understand (comprehend what this offers) • Easy to relate to (tweak-ability, reproducibility, reuse, sharing)
Multifaceted Web Interface • General interface allows users to do things like: • Learn about the risk of flooding in their local area. • Explore how different farming practices affect such risk. • Authenticated government / local council officials could: • Learn about polluting nutrients diffused from different catchments. • Examine how policies would affect pollution levels at different scales. • An “advanced path” allows scientists to compose workflows: • A workflow is a pipeline of basic execution units (executables, scripts, web services, etc.). • Done in the browser. No programming prerequisites. • Allows the sharing of workflows and datasets to promote reuse, citing and collaboration.
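The notion of a workflow as a pipeline of basic execution units can be sketched as below. In EVOp the composition is done in the browser; this sketch only illustrates the underlying idea, and the step functions and sample readings are hypothetical.

```python
from typing import Any, Callable, List, Optional

Step = Callable[[Any], Any]


def run_workflow(steps: List[Step], data: Any) -> Any:
    """Feed the output of each execution unit into the next one."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical execution units operating on river-level readings.
def drop_missing(readings: List[Optional[int]]) -> List[int]:
    """Discard gaps in the record."""
    return [r for r in readings if r is not None]


def to_millimetres(readings_cm: List[int]) -> List[int]:
    """Convert centimetres to millimetres."""
    return [r * 10 for r in readings_cm]


def peak(readings: List[int]) -> int:
    """Return the highest reading."""
    return max(readings)


if __name__ == "__main__":
    workflow = [drop_missing, to_millimetres, peak]
    print(run_workflow(workflow, [12, None, 35, 20]))
```

Because a workflow is just an ordered list of units, it can be stored, shared, and rerun on another dataset, which is what makes reuse and reproducibility straightforward.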
Why Cloud? • Flexibility (Virtualisation) • Allows the dynamic provisioning of bespoke environments. • Everything from the hardware, platform, libraries, etc. can be customised to suit the exact needs of an application. • Very few limitations on what the application should be. Build what we want! • To draw a comparison: Grid users are tied in to too many specifications of the grid environment: hardware architecture, runtime environment, scheduling interface, and supported application interface. As such, only certain types of jobs can be submitted, and precompiling is sometimes needed to ensure compatibility.
Why Cloud? • Versatile resource management (SOA) • All resources have a uniform view. • Allows us to support data assets of different origins: from in situ gauging stations, warehoused data stores, user provided, and external sources. • Facilitates sharing and reuse (e.g. workflows) which promotes a culture of collaboration. • Provides abstraction so that data can be used in models and simulations without necessarily giving it away. • Provides transparency: details of where and how the data is held are hidden without affecting user experience.
Why Cloud? • Easy access to resources (IaaS) • IaaS: hardware resources as a utility. • Allows the infrastructure to scale to meet user demand and maintain quality of service. • Peace of mind: issues of reliability, performance, and security at the hardware level are outsourced. • Allows us to focus on solving domain-specific problems. • No usage quotas (unless you want them). • Very few AAA hoops to jump through. …as long as you can pay for that!
Issues Influencing Cloud Uptake • Users see the advantage straight away. • Previously a scientist needed to have the data on their computer, develop & calibrate a model, run it. Check output. Rinse & repeat. • Identify ease of use, universal access, abundance of resources. • Some data producers are reluctant to provide their data through what they perceive as new, untested means. • Some communities are more advanced than others. • Easier to get funding based on the PAYG economic model. • Cut upfront costs. Reduce money spent on unused resources. • Funding bodies still perceive security to be a concern. • A cloud is just a computer system. • Public cloud service providers have whole teams working on security.
What more could be done? • Introduce national (or even regional) initiatives to regulate cloud service provisioning. • This should ease a lot of the worry about trust. • Educating data owners about cloud computing. • Difficult. • Hopefully success stories (e.g. NGS cloud, EduServ) could alter attitudes. • Educating research communities about available cloud solutions. • Teach students cloudnumbers.com rather than MATLAB.
Summary • Cloud resources are easy to steer in order to serve the needs of a scientific application without imposing development restrictions, integration boundaries or deployment difficulties. • Public cloud is convenient, but a hybrid one offers more options. • There are concerns surrounding trust and security (such as data licensing) that affect the uptake of cloud computing in research communities. • Some measures could be taken to alleviate such concerns.
Thank you! Questions? • http://www.evo-uk.org/ @EVOpilot • Yehia El-khatib: http://www.comp.lancs.ac.uk/~elkhatib/ @yelkhatib • Gordon S. Blair: gordon@comp.lancs.ac.uk http://www.comp.lancs.ac.uk/department/staff.php?name=gordon