What GDA is • A software suite for synchrotron experiment control and data collection • Jointly developed by staff/developers from • DASC (Data Acquisition and Scientific Computing Group) at Diamond Light Source Ltd, Rutherford Appleton Lab, Chilton, UK • SRCG (Synchrotron Radiation Computing Group) at Daresbury Lab, Warrington, UK • Our aim is to develop an Integrated Science Experiment Environment for synchrotron users at Diamond and SRS now
Diamond Floor Plan • Diamond House
Experimental stations (facility diagram: LINAC, booster synchrotron, radio frequency cavities, storage ring, beamlines) • Phase I: 7 beamlines by 2007 • Phase II: 15 further beamlines completed by 2012
Diamond Constraints • Up to 32 beamlines each covering a different area of scientific research. • More than 1000 scientific proposals accepted per year with varying durations, some as short as 8 hours. • Typically more than 3000 different scientific users will access the facility in one year. • Possibly up to 1000 Terabytes of data collected per year in more than 10,000,000 files. • These data files will need to be catalogued and accessed remotely.
Diversity of Beamlines (some examples) – all have common underlying software • I16 Materials and Magnetism • I06 Nano-structures • I15 Extreme Conditions • I03 Macro Molecular Crystallography • I11 Powder Diffraction
Diamond Overall Requirements • Users are uniquely identified and should need to log in once only for all aspects of the experiment. • Users can move from beamline to beamline as easily as possible. • Common scripting environment across all beamlines. • Remote access including role based access control. • Data migration is automatic from beamlines to externally accessible repository. • Data evaluation and reduction as close to online as possible. • Integration of data reduction and analysis workflows. • Metadata in files sufficient for data analysis • Seamless access to remote large computing resources.
Single Sign On • The aim of this project was to provide a mechanism for uniquely identifying users of UK large scientific facilities irrespective of their method of access. • All users of the major facilities will need only one username/password combination to access any of the facilities. • These credentials, or an automatically generated certificate or token, will allow access to any computing technology given the correct authorization. • The authorization will be performed locally by the facility involved, based on the single unique identifier derived from the steps above. • Partners: STFC, e-Science, SRS, ISIS, Diamond
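The scheme above can be sketched in a few lines: a central identity provider signs the user's unique identifier into a token, and each facility verifies that signature and then applies its own local authorization. This is an illustrative sketch only — the secret, function names, and identifier format are invented, not the real SSO implementation.

```python
import hashlib
import hmac

SECRET = b"shared-federation-secret"  # stand-in for the real trust mechanism

def issue_token(user_id: str) -> str:
    """Central identity provider signs the unique user identifier."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}:{sig}"

def authorize_locally(token: str, allowed_users: set) -> bool:
    """A facility verifies the signature, then applies local authorization."""
    user_id, sig = token.rsplit(":", 1)
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and user_id in allowed_users

token = issue_token("fedid-ab12cd")
print(authorize_locally(token, {"fedid-ab12cd"}))  # True: valid token, known user
print(authorize_locally(token, {"someone-else"}))  # False: not authorized here
```

The point of the split is that authentication happens once, centrally, while each facility keeps control of its own authorization decisions.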
Beamline and Data Access All based on Single Sign On (SSO) • Beamline Access • Authorized remote beamline control and process monitoring • Integrated data reduction/analysis for real-time visualisation • Configurable experiment environments • Data Access • Authorized remote access to data and metadata catalogues • Location-independent user interface: SRB • Scheduled data migration to central data store (DCC) • Web user interface (for secure FTP) • Searchable metadata database: standardized metadata description • Adoption and support for a minimal set of descriptive data formats: NeXus (http://www.nexusformat.org/Main_Page) and imageCIF (http://www.iucr.org)
Design considerations • A single software framework for all beamlines • Similar look and feel across all beamlines • Reduce users’ learning requirement • Common scripting environment • Flexible, adaptable, configurable, and “plug-and-play” where possible • Support both EPICS and non-EPICS hardware • Single integrated science experiment environment • Customisable GUIs and configuration • An open framework that is easy to extend • A system that is simpler and easier to maintain
Generic Data Acquisition Overview • What GDA is • GDA architecture and components • User Interfaces • Scripting tools • Embedded Editors • Supported Detectors • EPICS Interfaces • Extension mechanism • Data management and visualisation • GDA deployments
Design choices • Java as the main programming language • platform independent • opportunity to use open source tools • Jython as the scripting language • simple syntax, which we can extend as required • very close relationship to Java • Distributed design • ‘thin’ client programs with multiple servers • Reusable components – adaptable for special experiment requirements • Extensible with well-defined interfaces – enables independent specialist developments • Self-descriptive controls with GUI and scripting assistance • more flexible design for future work
GDA fundamentals • Java as the main programming language • Java 1.6 platform independent • Jython as the scripting language with essentially the same syntax as python • Jython 2.2 or later, very close integration with Java • XML configuration • Beamline configuration: device objects • Software integration: EPICS etc • Distributed system • ‘thin’ clients with multiple servers • Reusable & configurable components • Use many open source tools, components • Castor, CAJ, Jacorb, Apache software, etc
GDA Architecture • The Core of GDA • structured in a three-layer architecture • with CORBA backbone for control and update • Top: GUI, OE representation & data visualisation • Middle: Command server, control requests & logic • Bottom: Device server, drivers and controllers • Many other services • Messages, logging, data curation, etc
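The three-layer split can be sketched as plain Python classes — a thin client forwards a command string to a command server, which resolves it against devices owned by a device server. All class and command names here are invented for illustration, and plain method calls stand in for the CORBA backbone that links the layers in GDA.

```python
class DeviceServer:
    """Bottom layer: owns drivers/controllers for named devices."""
    def __init__(self):
        self.positions = {"sample_x": 0.0}

    def move(self, name, value):
        self.positions[name] = value          # a real driver would talk to hardware
        return self.positions[name]

class CommandServer:
    """Middle layer: parses control requests and applies beamline logic."""
    def __init__(self, devices):
        self.devices = devices

    def execute(self, command):
        verb, name, value = command.split()
        if verb == "move":
            return self.devices.move(name, float(value))
        raise ValueError(f"unknown command: {verb}")

class ThinClient:
    """Top layer: GUI/console forwards user requests and displays results."""
    def __init__(self, server):
        self.server = server

    def run(self, command):
        return self.server.execute(command)

client = ThinClient(CommandServer(DeviceServer()))
print(client.run("move sample_x 1.5"))  # 1.5
```

Keeping the client thin means any number of GUIs or consoles can attach to the same servers without duplicating control logic.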
Scripting tools (screenshots): Scripting Editor, Scripting Console, tear-off Scripting GUI
Scripting Tools • Embedded Jython interpreter/editor in the Java application • Gives direct access to Java objects • Terminal window for command line interface • Scripts are processed in the Command Server • Scanning Framework • Manage and monitor scan processes • Built-in commands: scan, gscan, cscan, tscan • Extensible syntax • Possible emulation of spec, pincer, etc • Script editor
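The scanning idea behind those built-in commands can be sketched in plain Python (the real GDA environment runs Jython with richer scan syntax): step a scannable through a range and read a detector at each point. The motor and detector classes here are dummies invented for illustration.

```python
class DummyMotor:
    """Stand-in scannable: just records its commanded position."""
    def __init__(self):
        self.position = 0.0

    def moveTo(self, value):
        self.position = value

class DummyDetector:
    """Stand-in detector: returns a deterministic fake reading."""
    def read(self, motor):
        return motor.position ** 2

def scan(motor, start, stop, step, detector):
    """Step `motor` from start to stop (inclusive) and collect readings."""
    points = []
    value = start
    while value <= stop + 1e-9:          # tolerance guards float round-off
        motor.moveTo(value)
        points.append((value, detector.read(motor)))
        value += step
    return points

data = scan(DummyMotor(), 0.0, 2.0, 1.0, DummyDetector())
print(data)  # [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
```

Because the interpreter is embedded in the Java application, the real scan commands operate directly on live Java device objects rather than dummies like these.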
EPICS Interface • XML Interfaces • XML files describing the interface specification • Generated from EPICS databases • Config.xml: specifies beamline operation modes • Devices.xml: device instances available and their types • Types.xml: interface and subsystem definitions • XML schema (epicsgda.xsd) • Defines the XML file structures and interface types • generates object marshall/unmarshall code • Minimises change impacts between GDA and EPICS
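Reading such a device description is straightforward with a standard XML parser. The file layout below is invented for illustration — the real Devices.xml is generated from the EPICS databases and validated against epicsgda.xsd.

```python
import xml.etree.ElementTree as ET

# Hypothetical Devices.xml fragment: each device instance and its type.
DEVICES_XML = """
<devices>
  <device name="sample_x" type="SimpleMotor"/>
  <device name="pilatus"  type="AreaDetector"/>
</devices>
"""

def load_device_types(xml_text):
    """Map each declared device instance name to its interface type."""
    root = ET.fromstring(xml_text)
    return {d.get("name"): d.get("type") for d in root.findall("device")}

print(load_device_types(DEVICES_XML))
# {'sample_x': 'SimpleMotor', 'pilatus': 'AreaDetector'}
```

Generating these files from the EPICS databases, rather than writing them by hand, is what keeps the GDA/EPICS boundary stable when either side changes.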
Hardware Facility Support • User Workstation • Supports both Windows and Linux PCs • Data Storage Disk Array • RAID 5, 1-20 Terabytes on the beamline (short term) • Possible data store with Data Curation Centre (long term) • Data Backup • CD/DVD writer, docking for external hard drive, etc. • Computing resources • Multiprocessor rack (optional) • Network connection • Visitor subnet • Wireless access point
Metadata Items – 3 layer model The concept is that the file should contain sufficient metadata to allow subsequent analysis. • General – date, title, proposal number, visit, user, beamline • Beamline specific – positions of beamline components, sample to detector, detector type • Experimental – beam current, duration of scan(s), log of activity, user comments, …
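The three layers might be assembled like this before being written into a NeXus file. This is a sketch only — the field names and the `build_metadata` helper are illustrative, not GDA's actual metadata API (though HE458 is the proposal number used in the DDH example later).

```python
def build_metadata(general, beamline, experimental):
    """Combine the three metadata layers, keeping their provenance explicit."""
    return {
        "general": general,            # who, when, where
        "beamline": beamline,          # instrument geometry and hardware
        "experimental": experimental,  # conditions during the measurement
    }

record = build_metadata(
    general={"date": "2007-06-01", "proposal": "HE458", "beamline": "I16"},
    beamline={"detector_type": "ADSC 315", "sample_to_detector_mm": 250.0},
    experimental={"beam_current_mA": 300.0, "scan_duration_s": 120.0},
)
print(sorted(record))  # ['beamline', 'experimental', 'general']
```

Keeping the layers separate makes it clear which fields are filled in automatically (general, beamline) and which come from the experiment itself.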
Detector Data Handler (DDH) Data flow (diagram): X-ray detector → DDH, which adds metadata (e.g. proposal HE458, user John Smith, visit JS06) → beamline storage disk (5-20 TB) → ATLAS Centre tape storage (via SRB) • Automatic data backup via grid Storage Resource Broker (SRB) • Adds metadata (e.g. experiment name) to data • Formats data (default: NeXus) • Transfers data into SRB at ~30 MB/s • Preliminary tests are very successful.
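The DDH flow — attach metadata, format, write to the beamline store, queue for SRB transfer — can be sketched as below. JSON stands in for the NeXus format, and all function and file names are invented for illustration.

```python
import json
import pathlib
import tempfile

def handle_frame(raw_data, metadata, store_dir, transfer_queue):
    """Wrap detector data with metadata, write it, and queue it for transfer."""
    record = {"metadata": metadata, "data": raw_data}
    path = pathlib.Path(store_dir) / f"{metadata['proposal']}_frame.json"
    path.write_text(json.dumps(record))      # NeXus writing in the real DDH
    transfer_queue.append(path)              # picked up later for SRB/ATLAS
    return path

queue = []
with tempfile.TemporaryDirectory() as d:
    p = handle_frame([1, 2, 3], {"proposal": "HE458", "user": "John Smith"}, d, queue)
    print(p.name, len(queue))  # HE458_frame.json 1
```

Decoupling writing from transfer (via the queue) is what lets the detector keep running at full rate while the SRB upload proceeds at its own ~30 MB/s pace.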
Data Volume Estimates MX – 2 TB/day per beamline (measured at ESRF with ADSC 315) • ×3 for I02, I03, I04 = 6 TB/day • ÷3 for compression, assuming CPU available = 2 TB/day • Beamline store ~ 20/0.67 = 30 days • Intermediate duration ~ 180/2 = 90 days (say 2 months) • Number of files > 10,000 NCD and Imaging (I18, I22) • 4 MB/sec peak × 2 (2 beamlines) • Assume 25% duty cycle: 2 × (4/4) × 3600 × 24 = 0.86 TB/day • ÷3 for compression, assuming CPU available ~ 0.5 TB/day • Intermediate duration ~ 20/0.25 = 80 days • Number of files > 10,000 Experiment (I06, I15, I16) • 0.5 MB/sec × 3 (3 beamlines) • Assume 25% duty cycle: 3 × (0.5/4) × 3600 × 24 = 0.15 TB/day • ÷3 for compression, assuming CPU available ~ 0.05 TB/day • Intermediate duration ~ 20/0.05 = 400 days • Number of files > 20,000 Load on central server ~ 2.6 TB/day
Data Visualization • Integrated Data Visualisation Tools • 1D, 2D, 3D • Data fitting (integration in progress) • Integration with IDL • Direct access to IDL objects to display data
Visualization in GDA (screenshots): 1D plot, 2D plot, 3D plot, Fourier transforms, image overlay
Data Analysis Framework The central concept is to allow data processing to proceed as a series of discrete steps, with a decision process possible between each. Typically the overview of this data processing pipeline would be a UML modelling diagram or, more commonly, a flow diagram. The advantages of separating the data analysis into a discrete sequence of steps: • The processing programs themselves may be step based • The programs may be available in binary only for a particular computer architecture • The programs may be distributed over different machines, particularly should their processing requirements be large. • Assuming that Single Sign On (SSO) is functioning, it should be practical to distribute this processing to GRID resources such as SCARF or HPCx and avoid the necessity to re-enter authentication at every step. • It is possible to use the decision process to prescribe different processing branches depending on the results of a particular step. • Potentially large numbers of processing steps can be automated and performed without user intervention.
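The step-plus-decision structure described above can be sketched as below: each step is a function, and a decision function chooses the next step (or stops) from its result. The step names and branching rule are invented for illustration — real steps might be external binaries or GRID jobs.

```python
def run_pipeline(steps, decide, data, first="integrate"):
    """Run named steps in order, letting `decide` branch or stop between them."""
    current = first
    trace = []
    while current is not None:
        data = steps[current](data)
        trace.append(current)
        current = decide(current, data)   # decision point between steps
    return data, trace

steps = {
    "integrate": lambda d: d + ["integrated"],
    "scale":     lambda d: d + ["scaled"],
    "rescale":   lambda d: d + ["rescaled"],
}

def decide(step, data):
    if step == "integrate":
        return "scale"
    if step == "scale" and "bad" in data:
        return "rescale"                  # branch taken only on poor results
    return None                           # stop the pipeline

result, trace = run_pipeline(steps, decide, [])
print(trace)  # ['integrate', 'scale']
```

Because each step is just a named callable, the same driver can dispatch steps to local processes or remote resources without changing the pipeline description.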
Visualization in GDA 1D Plot 3D Plot 2D Plot Fourier Transforms Image overlay
Extension mechanism and futures • Support 3rd party software as plugins to GDA • Simple to use: copy/drag-drop and configure • Currently only GUI panel based • Motion control OE plugin • Data analysis plugin • Standard location for the extension jars • Enables automatic class loading at start-up • Configuration - XML <Plugin> <name>PanelName</name> <pluginName>package.Classname</pluginName> </Plugin>
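The mechanism behind that configuration entry — read the class name from XML, then load the class dynamically — can be sketched as below. In GDA this is done in Java via the class loader; here `collections.Counter` stands in for a real GUI-panel plugin class, and the helper name is invented.

```python
import importlib
import xml.etree.ElementTree as ET

# Hypothetical <Plugin> entry, mirroring the configuration shown above.
CONFIG = """
<Plugin>
  <name>PanelName</name>
  <pluginName>collections.Counter</pluginName>
</Plugin>
"""

def load_plugin(xml_text):
    """Return (panel name, plugin class) from one <Plugin> element."""
    root = ET.fromstring(xml_text)
    module_path, class_name = root.findtext("pluginName").rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return root.findtext("name"), cls

name, cls = load_plugin(CONFIG)
print(name, cls.__name__)  # PanelName Counter
```

Keeping the plugin class name in configuration rather than code is what makes the copy/drag-drop deployment possible: no rebuild is needed to add a panel.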
Linking into experiment pipelines e.g. Target Selection → Protein Production → Crystallisation → Data Collection → Phasing → Protein Structure → Structure analysis → Deposition
Acknowledgement • SRS Staff: Greg Diakun (Head), Geoff Mant, Paul Stephenson, Karen Ackroyd, Glenys McBain, Steve Kinder, Christine Ramsdale, Mike Miller • DLS Staff: Bill Pulford (Head), Paul Gibbons, Alun Ashton, Stuart Campbell, Vasanthi Nagaralingam, Matt Pearson, Eric Ren, Jun Aishima, Tobias Richter, Rob Walton, Stuart Robertson, Richard Woolliscroft, Fajin Yuan, Karl Levik, Frank Sun