Web 2.0 Elder Matias CLS – 09-04-28
What Is Web 2.0? • In plain English …. • Automating tedious tasks using web technology • Tools to help people and software collaborate
Scientific American, May 2008: Science 2.0 – The Risk and Reward of Web-Based Research
“Our real mission isn’t to publish journals but to facilitate scientific communication”
Timo Hannay – Head of Web Publishing at Nature Publishing Group
ScienceStudio Elder Matias CLS – 09-04-28
User Access to Synchrotrons
• Who is the community that will use your platform?
  • Synchrotrons are electron storage rings that emit high intensity photons that are used for experiments by a large scientific community (tens of thousands worldwide).
  • Access is normally granted for single periods of 1-3 days in a half-year cycle.
• What couldn’t your community do without the platform?
  • Physical distances and episodic access prevent rapid scientific progress and limit scientific collaboration.
• Why was that a problem or limitation?
  • Governments worldwide have invested >$2B in these facilities, yet the scientific outcomes could be optimised.
User Access to Synchrotrons
• What middleware was needed to resolve the limitations?
  • Workflow management engine for the User Office
  • Web portal for remote data access (during and post experiment)
  • Enterprise Service Bus and SOA to integrate internal and external data analysis services
• How do your plans meet the needs?
  • Users will have frequent remote access to the VESPERS beamline at the Canadian Light Source under conditions where many collaborators can participate in the experiment.
Science Studio serves three purposes:
• Management of all aspects of a scientific experiment, including data storage, collaboration with others, and processing of data;
• Control of, or interaction with, remote experiments on the CLSI VESPERS Beamline and UWO Nanofabrication Laboratory; and
• User Services (sample management, scheduling, peer review, user training)
Team: People and Orgs
[Diagram labels: System Requirements; Testing; Data Analysis/Grid Computing; Remote Control; User Services; System Deployment; Integration; User Office Software; Scientific Workflow Engines; System Architecture]
Team: People and Orgs
Mike Bauer, Stewart McIntyre, Marina Suominen Fuller, Jinhui Qin, Nathaniel Sherry, Dionisio Medrano, Dylan Maxwell, Daron Chabot, Elder Matias, Yuhong Yan, Zahid Anwar, Ludeng (Eric) Zhao, Dan Ni, Yaofeng Xu, Chris Armstrong, John Haley
System Architecture
[Diagram labels: Web Application, Beamline Control Module, VESPERS, DB, SAN; links labelled HTTP, JMS, CA]
• VESPERS Beamline
• EPICS control system
• Beamline Control Module (BCM)
• Web Application
• Database
• File Storage
• Web Interface
VESPERS Beamline
• VESPERS: Very Sensitive Elemental and Structural Probe Employing Radiation from a Synchrotron
• A bending magnet beamline on sector 6 at the Canadian Light Source synchrotron in Saskatoon, Saskatchewan.
• A hard x-ray microprobe with an energy range of 6 to 30 keV.
• Techniques: X-Ray Fluorescence (XRF) & X-Ray Diffraction (XRD)
VESPERS Endstation
[Photo labels: CCD Detector (XRD), Microscope, Sample, MCA Detector (XRF)]
EPICS Low-level Control System
• EPICS: Experimental Physics and Industrial Control System
• The standard control system at the CLS.
• EPICS consists of a network of Input/Output Controllers (IOCs) which are connected directly to devices.
• An IOC provides many Process Variables (PVs), each of which has a unique name and relates to either an input or an output of a device.
• Channel Access (CA) is used to read or write any PV without knowing which IOC provides the PV.
• More than 50,000 PVs in the CLS control system.
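The idea that a client reads or writes a PV by name alone, without knowing which IOC hosts it, can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the real Channel Access protocol (which locates PVs over the network), and the classes `Ioc` and `ChannelAccess` and all PV names are hypothetical.

```python
class Ioc:
    """Toy stand-in for an EPICS IOC: it owns a set of uniquely named PVs."""

    def __init__(self, name, pvs):
        self.name = name
        self.pvs = dict(pvs)  # PV name -> current value


class ChannelAccess:
    """Toy name resolver: clients ask for a PV by name only."""

    def __init__(self, iocs):
        self.iocs = list(iocs)

    def _find(self, pv_name):
        # Ask every IOC whether it hosts the PV. Real CA does this with a
        # network search; here it is a plain loop (an assumption).
        for ioc in self.iocs:
            if pv_name in ioc.pvs:
                return ioc
        raise KeyError(f"no IOC provides PV {pv_name!r}")

    def get(self, pv_name):
        return self._find(pv_name).pvs[pv_name]

    def put(self, pv_name, value):
        self._find(pv_name).pvs[pv_name] = value


# Hypothetical PV names, for illustration only.
motor_ioc = Ioc("ioc-motor", {"SAMPLE:X:position": 1.5})
detector_ioc = Ioc("ioc-detector", {"MCA:counts": 0})
ca = ChannelAccess([motor_ioc, detector_ioc])

ca.put("SAMPLE:X:position", 2.0)        # the client never names the IOC
position = ca.get("SAMPLE:X:position")
counts = ca.get("MCA:counts")
```

The client code touches only `ca`, so moving a PV from one IOC to another would not change it.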
Beamline Control Module (BCM)
• The BCM provides a high-level interface to the low-level control system (EPICS).
• Logical and physical separation of business logic and control logic.
• Virtual device abstraction that provides independence from the low-level control system.
• Virtual devices can be logically organized into a device hierarchy.
• Basic devices can be combined to build more functional devices.
• Communication with external applications using two message queues (ActiveMQ).
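A minimal sketch of the virtual device idea follows: basic devices are combined into a composite device, and devices are arranged in a hierarchy that can be addressed by path. The class names (`VirtualDevice`, `Motor`, `SampleStage`) and the path syntax are assumptions made for illustration; they are not taken from the BCM itself.

```python
class VirtualDevice:
    """Base class: a named node in the device hierarchy."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, device):
        self.children[device.name] = device
        return device

    def find(self, path):
        """Resolve a slash-separated path such as 'stage/x'."""
        device = self
        for part in path.split("/"):
            device = device.children[part]
        return device


class Motor(VirtualDevice):
    """Basic device. A real one would forward moves to the control system."""

    def __init__(self, name, position=0.0):
        super().__init__(name)
        self.position = position

    def move_to(self, position):
        self.position = position


class SampleStage(VirtualDevice):
    """More functional device built from two basic motors."""

    def __init__(self, name):
        super().__init__(name)
        self.x = self.add(Motor("x"))
        self.y = self.add(Motor("y"))

    def move_to(self, x, y):
        self.x.move_to(x)
        self.y.move_to(y)


beamline = VirtualDevice("vespers")
stage = beamline.add(SampleStage("stage"))
stage.move_to(1.0, -2.5)
x_position = beamline.find("stage/x").position
```

Because callers see only `move_to` and `position`, the code behind `Motor` could be swapped for a different control system without touching them.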
Web Application
• A J2EE Servlet application that provides a web-based interface to Science Studio.
• Tools: Spring (MVC), iBATIS (ORM), JSecurity (Apache Ki), Apache Tomcat
• Divided into two parts: the Core application and the VESPERS beamline application.
• The Core application is responsible for providing access to the business objects.
• The VESPERS application is responsible for remote control of the VESPERS beamline.
Database
• Stores metadata associated with the operation of a remote controlled beamline and the organization of experimental data collected on that beamline.
• A project is the top level organizational unit and is associated with a project team.
• A session defines a period of time allocated to a project team to conduct experiments.
• An experiment relates a sample and the technique being applied to that sample.
• A scan records the location of the acquired experimental data.
Database Schema
[Diagram tables: person, facility, laboratory, instrument, technique, project_person, project, instrument_technique, project_role, sample, session, experiment, scan]
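The project, session, experiment and scan relationships can be sketched with an in-memory SQLite database. Only the table names come from the slides; every column, the choice to store the technique as plain text, and the link from experiment to session are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE session    (id INTEGER PRIMARY KEY,
                         project_id INTEGER NOT NULL REFERENCES project(id),
                         start_time TEXT, end_time TEXT);
CREATE TABLE sample     (id INTEGER PRIMARY KEY,
                         project_id INTEGER NOT NULL REFERENCES project(id),
                         name TEXT NOT NULL);
CREATE TABLE experiment (id INTEGER PRIMARY KEY,
                         session_id INTEGER NOT NULL REFERENCES session(id),
                         sample_id INTEGER NOT NULL REFERENCES sample(id),
                         technique TEXT NOT NULL);
CREATE TABLE scan       (id INTEGER PRIMARY KEY,
                         experiment_id INTEGER NOT NULL REFERENCES experiment(id),
                         data_url TEXT NOT NULL);
""")

# Hypothetical rows, for illustration only.
conn.execute("INSERT INTO project (id, name) VALUES (1, 'Demo project')")
conn.execute("INSERT INTO session (id, project_id, start_time, end_time) "
             "VALUES (1, 1, '2009-04-28T08:00', '2009-04-29T08:00')")
conn.execute("INSERT INTO sample (id, project_id, name) VALUES (1, 1, 'Sample A')")
conn.execute("INSERT INTO experiment (id, session_id, sample_id, technique) "
             "VALUES (1, 1, 1, 'XRF')")
conn.execute("INSERT INTO scan (id, experiment_id, data_url) "
             "VALUES (1, 1, '/data/demo/scan1')")

# Walk from a scan back up to its project.
rows = conn.execute("""
SELECT project.name, sample.name, experiment.technique, scan.data_url
FROM scan
JOIN experiment ON scan.experiment_id = experiment.id
JOIN sample     ON experiment.sample_id = sample.id
JOIN session    ON experiment.session_id = session.id
JOIN project    ON session.project_id = project.id
""").fetchall()
```

The scan row holds only the location of the data, matching the slide's point that the database stores metadata rather than the experimental data itself.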
Experimental Data Storage
• Experimental data is stored at the CLS.
• Common directory structure shared with other beamlines.
• A large data storage facility is now operational at the University of Saskatchewan as part of WestGrid.
VESPERS Web Interface
• Rich web interface to Science Studio and the VESPERS beamline.
• Designed to be used over commodity broadband internet.
• Developed for the Firefox web browser without any additional plugins or extensions.
• Known to work with other browsers, provided they support the HTML canvas element.
• AJAX is used for the VESPERS interface to provide device values in pseudo real time.
• ExtJS, a JavaScript framework, provides many advanced GUI elements.
User Office Workflow
• Many tasks in proposal & sample management at CLS
• Goal: To develop a workflow management system that
  • manages ordering of tasks (e.g. training before shipping)
  • tracks manual as well as SS task progression
[Diagram labels: 6-month cycle; Mar; CLS call for proposals; Proposal submission to CLS; CLS gathers proposals; CLS reviews proposals; CLS grants scientist beamline time; Scientist must complete online SS training; CLS health & safety inspection; Scientist packs sample; Many other tasks (Perform Experiment, Return Sample, Take Survey, …). Callout: “I wonder if CLS received my sample yet?”]
User Office Workflow Status
[Diagram labels: Beamline User; User Office; Workflow Management Engine; Task: Training Completed; Notify; Record Progress; Approved; Notify]
Features
• Open source
• Petri-net based
• Direct support for workflow control flow patterns
• Ability to interact with web services declared in WSDL
• Relies on XML standards (e.g. XPath and XQuery) for data and doesn’t use proprietary languages
Architecture
• System core: the YAWL engine.
• The engine instantiates specifications that are designed using the YAWL designer and managed by the YAWL repository.
• The environment is composed of YAWL services: inspired by the “web services” paradigm, end-users, applications, and organizations are all services in YAWL.
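To show how a Petri net enforces task ordering such as "training before shipping", here is a minimal sketch of a place/transition net. It is a generic toy, not YAWL's engine or API; the `PetriNet` class and the place and task names are hypothetical.

```python
class PetriNet:
    """Tiny place/transition net. Each input place is listed once per task."""

    def __init__(self, marking):
        self.marking = dict(marking)  # place -> token count
        self.transitions = {}         # task -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        # A task may run only when every input place holds a token.
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) > 0 for place in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"task {name!r} is not enabled yet")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


net = PetriNet({"proposal_accepted": 1})
net.add_transition("complete_training", ["proposal_accepted"], ["training_done"])
net.add_transition("ship_sample", ["training_done"], ["sample_shipped"])

can_ship_before = net.enabled("ship_sample")  # training not yet done
net.fire("complete_training")
can_ship_after = net.enabled("ship_sample")
net.fire("ship_sample")
```

The marking doubles as a progress record: inspecting `net.marking` shows which stage a user has reached.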
User Office Workflow Example
Prototype Implementation
1. CLS issues a call for proposals and gives a deadline
2. Beamline users submit proposals
3. User Office administrator ends registration or extends the deadline
4. User Office administrator assigns proposals to User Office reviewers
5. Reviewers look at proposals and rank them
6. User Office looks at the ranking and chooses the proposals to accept
7. Contact persons of accepted proposals are notified
8. Beamline user completes training (web service)
9. After training is completed (simulated by a delay) the CLS is notified
Scheduling Module
• Goal: To automate the review process and the method by which beam time is allocated and scheduled to users depending on
  • the access mechanism chosen by the user and
  • the stage of operation (construction, commissioning or operation) of the beamline.
• Side effects:
  • Facilitate the management of cycles, runs and modes of operation
  • Use automatic scheduling to handle more scheduling conditions and constraints than human beings are able to handle manually, and to identify optimal solutions.
Scheduling Module Features
Users submit proposals
INPUT: CONSTRAINTS
1. One beamline per experiment
2. Start time after release time
3. Only eligible beamlines can be selected
. .
7. No overlap of experiments per beamline
SEARCH AND CONSTRAINT SATISFIABILITY: Integer Programming and Heuristic Algorithm
OUTPUT: Schedule
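The listed constraints (1, 2, 3 and 7) can be illustrated with a simple greedy heuristic. This is a sketch under stated assumptions, not the integer programming formulation or the heuristic used by the project: the function `schedule`, the dictionary keys, the time units and the beamline name `OTHER` are all hypothetical.

```python
def schedule(experiments, beamlines):
    """Greedy heuristic: earliest release first, earliest-free eligible beamline.

    experiments: list of dicts with 'name', 'release', 'duration', 'eligible'.
    Returns {name: (beamline, start, end)}; each experiment gets one beamline.
    """
    free_at = {b: 0 for b in beamlines}  # time at which each beamline is free
    plan = {}
    for exp in sorted(experiments, key=lambda e: (e["release"], e["name"])):
        # Constraint 3: only eligible beamlines can be selected.
        candidates = [b for b in exp["eligible"] if b in free_at]
        if not candidates:
            continue  # left unscheduled in this sketch
        # Constraint 2: start no earlier than the release time.
        best = min(candidates,
                   key=lambda b: (max(free_at[b], exp["release"]), b))
        start = max(free_at[best], exp["release"])
        end = start + exp["duration"]
        free_at[best] = end  # constraint 7: no overlap on a beamline
        plan[exp["name"]] = (best, start, end)  # constraint 1: one beamline
    return plan


experiments = [
    {"name": "A", "release": 0, "duration": 3, "eligible": ["VESPERS"]},
    {"name": "B", "release": 1, "duration": 2, "eligible": ["VESPERS", "OTHER"]},
    {"name": "C", "release": 1, "duration": 2, "eligible": ["VESPERS"]},
]
plan = schedule(experiments, ["VESPERS", "OTHER"])
```

A greedy pass like this always returns a feasible schedule but gives no guarantee of optimality, which is one reason to pair a heuristic with an exact method such as integer programming.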
X-Ray Fluorescence (XRF): Reveals Elemental Composition
• Characteristic element lines selected and mapped over a 2D scan area
• 2D maps generated for selected elemental lines
[Figure labels: S: Kα; Cr: Kα & Cr: Kβ; Fe: Kα & Fe: Kβ; Ni: Kα & Ni: Kβ]
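One simple way to turn per-pixel spectra into a 2D map for a selected line is to sum the counts inside a channel window at every scan position. The sketch below assumes that approach; the function `element_map`, the channel windows and the spectra are hypothetical, and real analysis software may fit peaks instead of summing windows.

```python
def element_map(spectra, first_channel, last_channel):
    """Sum counts in [first_channel, last_channel] for every scan position.

    spectra: 2D grid (rows x columns) of spectra, each a list of counts.
    Returns a 2D grid of the same shape holding the summed counts.
    """
    return [
        [sum(spectrum[first_channel:last_channel + 1]) for spectrum in row]
        for row in spectra
    ]


# A 2 x 2 scan area with 6-channel spectra (made-up numbers).
spectra = [
    [[0, 1, 5, 1, 0, 0], [0, 0, 2, 0, 0, 3]],
    [[1, 0, 0, 0, 4, 4], [0, 3, 3, 3, 0, 0]],
]

map_line_a = element_map(spectra, 1, 3)  # hypothetical window for one line
map_line_b = element_map(spectra, 4, 5)  # hypothetical window for another
```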
X-Ray Diffraction (XRD): Reveals Structural Information
• Peak fitting and indexing of an image set to create a grain orientation map
[Figure labels: Peak Search; Indexing Process; Grain Orientations; New C Programme – Matched Peak; New C Programme – Expected Peak; Old IDL Programme – Matched Peak; Old IDL Programme – Matched Peak]
The XRD indexing programme examines the locations of peaks in an image in order to determine the kind of lattice structure in which the sample’s constituent atoms are arranged. Shown here are the results of an older indexing programme written in IDL and of the new indexing programme written in C. The new programme is proving to be more versatile and more reliable than the old one, often indexing data sets on which the old programme failed.
High Performance Computing Elder Matias CLS – 09-04-28
Is this about making processors faster?
• “Moore’s Law” has limited us
• There are also other fundamental limits
• We need to look at parallel computers
What is High Performance Computing?
• Special purpose machines, configured to solve complex problems
• Usually multi-processor (tens to thousands)
• Requires parallel programming
• Models
  • Grid – multiple inter-connected machines solving the same problem
  • Supercomputer – multi-processor with shared memory
Limitation of Parallel Programming (Amdahl’s Law and Gustafson’s Law)
• The degree to which a problem can be expressed using a parallel algorithm will limit the speedup achieved on a multi-processor machine.
Amdahl’s Law: S = 1 / ((1 − P) + P / N)
P = % Parallelism (expressed as a fraction between 0 and 1)
S = Speedup (x sequential)
N = number of processors
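A short worked example makes the limit concrete. The sketch below uses the standard textbook forms of both laws, with `p` as the parallel fraction (0 to 1) and `n` as the number of processors; the function names and sample values are chosen here for illustration.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: fixed problem size, parallel fraction p, n processors."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p, n):
    """Gustafson's Law: problem size grows with n; p is the parallel
    fraction of the run time measured on the parallel machine."""
    return (1.0 - p) + p * n


# A program that is 90% parallel, run on 10 processors.
fixed_size = amdahl_speedup(0.9, 10)      # roughly 5.26x
scaled_size = gustafson_speedup(0.9, 10)  # roughly 9.1x

# With Amdahl's Law the serial 10% caps the speedup below 10x,
# however many processors are added.
many_processors = amdahl_speedup(0.9, 10**9)
```

The two laws answer different questions: Amdahl's asks how much faster a fixed job runs, while Gustafson's asks how much more work fits in the same time on a larger machine.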
Examples …. LHC
• The LHC at CERN is an example of a grid application where no one country has sufficient processing capabilities
• 15 million gigabytes of data per year
• In 2006 the LHC Tier 1 Grid was tested
• TRIUMF is the Canadian Tier 1 Centre for LHC experiments
Courtesy TRIUMF
How about in the synchrotron community?
• Many synchrotrons understand the need for HPC
• Some CLS users make use of WestGrid for computation
• The new WestGrid data storage facility is intended to support CLS experiments and is located on campus
• UWO/ORNL/APS/CLS are working on a joint crystallography application on SHARCNET using the Cell environment
Diamond - Racks layout Courtesy: Nick Rees Diamond Oct/08
Diamond - Current situation
[Photo labels: Water pipes, Cable Tray]
Courtesy: Nick Rees Diamond Oct/08
How do I get access to an HPC Machine?
• Compute Canada
  • Responsible for High Performance Computing in Canada
  • Each regional grid is a member of Compute Canada:
    • ACEnet – Atlantic Canada
    • CLUMEQ – Quebec
    • SCINET – UofT
    • HPCVL – Queen’s, Royal Military College, St. Lawrence, Carleton, Ottawa, …
    • RQCHP – Quebec
    • SHARCNET – Ontario
    • WESTGRID – Western Canada
Grid Data Storage?
• UofS is the host for the new WestGrid data storage facility
• Cost: $3.2 M
• Includes on-line and archival storage
• Two sites on campus
• Photo: tape backup unit holding 6,000 tapes (each @ 1 TB)
ANISE Elder Matias CLS – 09-04-28
ANISE: Active Network for Information from Synchrotron Experiments
“Active” means near-instantaneous stream processing of complex data during transfer to the user or to storage.
• Cell processing using InfoSphere Streams software from IBM and a lightpath provided by the CANARIE network.
• Distributed processing on facilities provided by SHARCNET and WESTGRID.
Objective: Develop such a network to provide processed results from experiments such as Laue diffraction at APS (34-ID) and VESPERS at CLS
• The network would assist the integration of diffraction data from multiple and large area detectors.
• The network would facilitate faster resolution of research problems and free up time for more users.
• The network would encourage common data formats and protocols, leading to closer collaboration.