
ARCS Workshop on Software for Data Analysis of Inelastic Scattering Data

An introduction to NeXus, a portable, self-describing, and extensible data format for inelastic scattering data analysis, covering its history, design criteria, benefits, and some common myths.


Presentation Transcript


  1. ARCS Workshop on Software for Data Analysis of Inelastic Scattering Data, 15-16 March 2002, California Institute of Technology. Ray Osborn, Argonne National Laboratory

  2. Aims of a Common Format • Remove need for local expertise • Reduce number of conversion utilities • Reduce redundant software development • Increase cooperation in software development • Increase sophistication of visualization software • Increase utility of generic software • Remove problems of data portability • Help when all documentation is lost

  3. Criteria of a Modern Data Format • It must be portable • It must be self-describing • It must be extensible • It must be flexible in data organization • It must be efficient in data storage • It must be available in the public domain

  4. History of Format
     Three parallel developments have led to NeXus:
     1. Jon Tischler (ORNL) proposed an HDF-based format as a standard for data storage at the Advanced Photon Source (Argonne National Laboratory)
     2. Mark Koennecke (PSI) made a similar proposal using netCDF for the European neutron scattering community while working at the ISIS pulsed neutron facility
     3. Przemek Klosowski (NIST) produced a first draft of the NeXus proposal drawing on ideas from both sources
     • This formed the basis for the current design of the NeXus standard, which was developed at two workshops, SoftNeSS'95 (NIST, Sept. 1995) and SoftNeSS'96 (Argonne, Oct. 1996), attended by representatives of a range of neutron and x-ray facilities.
     • The NeXus API was released in late 1997.

  5. Choice of HDF • Hierarchical Data Format (HDF) • Developed at NCSA (UIUC) • Portable - Macs, VMS, U**x, Windows NT/98 • Extensible - Add anything you like when you like • Self-describing - you don’t need to have a manual • Binary (with internal compression) • Hierarchical - i.e. comprehensible and flexible • Widely used - e.g. astronomers, geophysicists • Commercially accessible - IDL, IGOR, PVwave

  6. What’s the Alternative?
     • imgCIF/CBF
       – There is a proposal to extend the Crystallography Information File (CIF) format to include binary images. This has insufficient flexibility for all neutron/x-ray instrumentation.
     • FITS
       – The Flexible Image Transport System, the astronomical data format, is not self-describing.
     • ISO STEP/Express
       – This is a standard for describing database structures rather than a data format itself.
     • netCDF
       – The Unidata netCDF standard is a flat-file format. This means that all data sets must have unique names and cannot be organized into hierarchical groups. It is no longer developed.
     • XML
       – The eXtensible Markup Language is gaining widespread acceptance as a means of organizing database information, e.g. for web display. It may have a role in NeXus.

  7. What is NeXus? • a set of subroutines - the NeXus API • to make reading and writing NeXus data easy • a set of design principles • to help people to understand what is in the files • a set of instrument definitions • to allow development of more portable analysis software

  8. Myths about NeXus
     • HDF is too complicated
       – the NeXus API is conceptually extremely simple
     • HDF does not have adequate performance
       – HDF5 is state-of-the-art in performance
       – automatic data compression can speed i/o
       – NeXus does not appear to impact HDF performance
     • NeXus is only for storing raw data
       – it can store any kind of data
       – in principle, both raw and analyzed data can be stored in the same file transparently
       – it would also be ideal for Monte Carlo results

  9. Example NeXus Program in F90

     program NXlrmecs
        use NXUmodule
        ...
     !Open NeXus output file and write global attributes
        if (NXopen ("sys$scratch:lrcs"//run_no//".nxs", NXACC_CREATE, file_id) /= NX_OK) stop
        if (NXUwriteglobals (file_id, user_name, "Argonne National Laboratory", "Argonne, IL 60439, USA", &
            "(630) 252-9011", "(630) 252-7777", "ROsborn@anl.gov") /= NX_OK) stop
        if (NXmakegroup (file_id, entry, "NXentry") /= NX_OK) stop
     !Open NXsource
        if (NXmakegroup (file_id, "source", "NXsource") /= NX_OK) stop
        if (NXUwritedata (file_id, "distance", -L1, "m") /= NX_OK) stop
        if (NXUwritedata (file_id, "moderator", "liquid methane") /= NX_OK) stop
        if (NXclosegroup (file_id) /= NX_OK) stop
     !Open NXdata
        if (NXmakegroup (file_id, "data", "NXdata") /= NX_OK) stop
        if (NXUwritedata (file_id, "title", char_value) /= NX_OK) stop
        if (NXUwritedata (file_id, "data", sgarray%counts, "counts") /= NX_OK) stop
        if (NXputattr (file_id, "signal", 1) /= NX_OK) stop
        if (NXmakelink (file_id, time_id) /= NX_OK) stop
        if (NXmakelink (file_id, phi_id) /= NX_OK) stop
        if (NXclosegroup (file_id) /= NX_OK) stop
        if (NXclosegroup (file_id) /= NX_OK) stop
        if (NXclose (file_id) /= NX_OK) stop
     end program NXlrmecs

  10. Current Status
     • NeXus is in regular use at a number of facilities
     • The NeXus design is on the web
       – <http://www.neutron.anl.gov/NeXus/>
     • The core API is available for downloading
       – C, F77, F90, Java, IDL
     • The NXdict and utility APIs simplify certain tasks
       – C, F90
     • NeXus Data Server released in 2000
       – Allows pure Java browsers
     • Various browsers now exist
     • Some visualization software recognizes NeXus files
       – LAMP, open Genie, ISAW

  11. Facility Support
     • APS: recommended standard for APS CATs
     • FRM-II: under development as format for new reactor instruments
     • ILL: accepted as an interchange format; readable by LAMP
     • IPNS: will use as the run-file format in future DAS
     • ISIS: being used in open Genie and under development for new instruments
     • JKJ project: proposed as run-file format for new facility
     • KEK: under development as run-file format for new instrument
     • LANSCE: under development as format for new instruments, e.g. HIPPO
     • NIST: to be used as format for new instruments, e.g. Disk-Chopper Spectrometer
     • LLB: in use on several spectrometers
     • PSI: in use as run-file format for current instruments
     • µSR: under development as a standard interchange format
     • SNS: proposed as standard run format as long as based on HDF5

  12. NeXus API Team • The main people involved in developing the NeXus API are: • Mark Koennecke, PSI, Switzerland • Przemek Klosowski, NIST, USA • Freddie Akeroyd, ISIS, UK • Ray Osborn, ANL, USA

  13. NeXus Design Principles The design of the NeXus data files should follow two principles as much as possible : • Files must be completely self-explanatory • no need for local information, e.g. zero angles, focusing formulae • Data must be automatically plottable • automatic identification of axes, titles, units, etc.

  14. NeXus Objects
     There are only three types of data object in NeXus:
     • Data
       – scalar or multidimensional arrays
       – integer (1, 2, or 4 bytes), real (single or double), or character
       – equivalent to HDF SDSs
     • Data attributes
       – meta-data attached to a data item, e.g. units
     • Groups
       – folders containing sets of data items and/or other groups
       – can have any name, but must have a predefined class, e.g. NXsample
       – equivalent to HDF4 Vgroups
       – group attributes not explicitly used in NeXus (although they are used to define group classes in HDF5)
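The three object types on this slide can be pictured with a small in-memory model. What follows is a minimal Python sketch, not the NeXus API: the names `DataItem` and `Group`, their fields, and the example temperature value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class DataItem:
    """A scalar or array value plus its attributes (meta-data such as units)."""
    name: str
    value: Union[int, float, str, List]
    attrs: Dict[str, Union[int, float, str]] = field(default_factory=dict)


@dataclass
class Group:
    """A folder of data items and/or other groups; every group has a class."""
    name: str
    nx_class: str
    children: Dict[str, Union["Group", DataItem]] = field(default_factory=dict)

    def add(self, child):
        # Two members of the same group may not share a name.
        if child.name in self.children:
            raise ValueError(f"duplicate name in {self.name!r}: {child.name!r}")
        self.children[child.name] = child
        return child


# Hypothetical example: one entry holding a sample with a temperature.
entry = Group("Run1101", "NXentry")
sample = entry.add(Group("sample", "NXsample"))
sample.add(DataItem("temperature", 10.0, {"units": "K"}))
```

Keeping attributes on the data item rather than in a separate table mirrors the slide's point that units and similar meta-data travel with the data they describe.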

  15. NeXus Hierarchies [Diagram: a file tree with two entries, Run1101 (NXentry) and Run1102 (NXentry). Groups shown: sample (NXsample), monitor (NXmonitor), data (NXdata). Data items shown: counts, time_of_flight, integral, start_time.]

  16. NeXus Classes • Every NeXus group is assigned both a name and a class. • A NeXus class defines the expected contents of the group. • It is not necessary for every variable defined for a class to be present in every instance of that class. • NeXus class names begin with NX, followed without a break by lower-case words separated by underscores. • In general, a group can contain more than one group of the same class, provided they do not share the same name. • The NX class names are a defined part of the NeXus standard and may not be modified by the user, i.e. if users want to define their own classes, they must not use the NX prefix.
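The naming rule for classes is simple enough to check mechanically. The sketch below assumes the rule exactly as stated on the slide (the prefix NX, then lower-case words joined by underscores). The function names are hypothetical, and passing the check only means a name has the right shape, not that it is actually defined in the NeXus standard.

```python
import re

# "NX", then one or more lower-case words separated by single underscores.
NX_CLASS_PATTERN = re.compile(r"NX[a-z]+(?:_[a-z]+)*")


def follows_nx_convention(class_name: str) -> bool:
    """True if the name has the shape of a standard NeXus class name."""
    return NX_CLASS_PATTERN.fullmatch(class_name) is not None


def is_allowed_user_class(class_name: str) -> bool:
    """User-defined classes must not use the reserved NX prefix."""
    return not class_name.startswith("NX")
```

For example, `follows_nx_convention("NXsample")` is true while `follows_nx_convention("NXSample")` is not, and a user-defined class would need a name such as `MYsample` rather than `NXmine`.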

  17. Structure of NeXus Files

  18. NXentry Class [Diagram: a file tree with two entries, Histogram1 (NXentry) and Histogram2 (NXentry). Groups shown: sample (NXsample), LRMECS (NXinstrument), monitor1 (NXmonitor), monitor2 (NXmonitor), data (NXdata). Data item shown: start_time.]

  19. Simplest NeXus file [Diagram: Scan (NXentry) containing data (NXdata), which holds counts and two_theta.] N.B. The programmers who produce intermediate files for storing analyzed data should agree on simple interchange rules.

  20. NXentry Class • This is the only group class allowed at the top level of a file. • It should contain at least one plottable dataset. • It contains data that can sensibly be classified as a single data entry. • It should contain all the data necessary for the intended analysis.

  21. NXinstrument [Diagram: NXinstrument with component groups NXsource, NXchopper, NXcollimator, NXguide, NXdetector]
     • The NXinstrument group contains all the beamline components
       – defined by their distance from the sample
       – positive (negative) distances are after (before) the sample
       – the sample is not considered a beamline component
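The sign convention means the order of components along the beam falls straight out of a sort. This is a minimal Python sketch under that convention. The function names and the distances are hypothetical and are not taken from any real instrument.

```python
def order_along_beam(distances):
    """Return component names from most upstream to most downstream.

    `distances` maps a component name to its distance from the sample in
    metres: negative values lie before the sample, positive values after it.
    """
    return sorted(distances, key=distances.get)


def is_before_sample(distance_m):
    """True if a component at this distance sits upstream of the sample."""
    return distance_m < 0


# Hypothetical distances in metres, for illustration only.
components = {"detector": 2.5, "chopper": -6.5, "source": -20.0, "collimator": -1.2}
print(order_along_beam(components))
# ['source', 'chopper', 'collimator', 'detector']
```

The sample itself never appears in the mapping, consistent with the slide's remark that it is not a beamline component; it is simply the origin of the distance scale.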

  22. Structure of NeXus Data Groups • NXdata groups encapsulate plottable data • Multi-dimensional data • Dimension scales • Axis labels • Title

  23. Defining NeXus data: C Version of the API
     data (NXdata)
       counts[n,m] (signal=1, units="counts")
       errors[n,m]
       time_of_flight[m] (axis=1, units="microseconds")
       two_theta[n] (axis=2, units="degrees")

  24. Defining NeXus data: Fortran Version of the API
     data (NXdata)
       counts(m,n) (signal=1, units="counts")
       errors(m,n)
       time_of_flight(m) (axis=1, units="microseconds")
       two_theta(n) (axis=2, units="degrees")

  25. Defining NeXus data (Version II): Alternative Axis Labelling Scheme
     data (NXdata)
       counts[n,m] (signal=1, units="counts", axes="[two_theta,time_of_flight]")
       errors[n,m]
       time_of_flight[m] (units="microseconds")
       two_theta[n] (units="degrees")
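Both labelling schemes let a program find what to plot without prior knowledge of the file. The sketch below is a simplified Python illustration, not the NeXus API: an NXdata group is reduced to a dictionary mapping each data item's name to its attributes, and the function names are hypothetical. The attribute names and values are the ones shown on the slides.

```python
def find_signal(group):
    """Return the name of the data item marked with signal=1."""
    for name, attrs in group.items():
        if attrs.get("signal") == 1:
            return name
    raise KeyError("no data item with signal=1")


def axes_by_number(group):
    """First scheme: each axis item carries axis=N; return names ordered by N."""
    numbered = [(attrs["axis"], name) for name, attrs in group.items() if "axis" in attrs]
    return [name for _, name in sorted(numbered)]


def axes_by_list(group):
    """Second scheme: the signal carries axes="[a,b,...]"; return names as listed."""
    listed = group[find_signal(group)]["axes"]
    return [name.strip() for name in listed.strip("[]").split(",")]


# The two NXdata layouts from the slides, reduced to name -> attributes.
scheme_one = {
    "counts": {"signal": 1, "units": "counts"},
    "errors": {},
    "time_of_flight": {"axis": 1, "units": "microseconds"},
    "two_theta": {"axis": 2, "units": "degrees"},
}
scheme_two = {
    "counts": {"signal": 1, "units": "counts", "axes": "[two_theta,time_of_flight]"},
    "errors": {},
    "time_of_flight": {"units": "microseconds"},
    "two_theta": {"units": "degrees"},
}
```

With these inputs, `axes_by_number(scheme_one)` yields the axes in order of their `axis` number, while `axes_by_list(scheme_two)` yields them in the order written in the `axes` string. The sketch deliberately does not convert one ordering into the other.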

  26. Data Linking [Diagram: groups shown are NXinstrument, NXsource, NXdetector, NXmonochromator, and NXdata. Data items shown are solid_angle[100], gas_pressure[100], two_theta[100], and counts[100]; two_theta[100] appears twice.]
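One way to picture a link is as a second name for data that is stored only once. The Python sketch below rests on an assumption about the diagram, namely that `two_theta` is stored with the detector and linked into the NXdata group. The dictionaries and values are hypothetical stand-ins for groups and arrays, not NeXus API calls.

```python
# One array, stored once (hypothetical values).
two_theta = {"value": [10.0, 20.0, 30.0], "attrs": {"units": "degrees"}}

# The detector group owns the array.
detector = {"two_theta": two_theta}

# The data group refers to the very same object: a link, not a copy.
data = {
    "counts": {"value": [5, 7, 3], "attrs": {"signal": 1, "units": "counts"}},
    "two_theta": two_theta,
}

# A change made through one path is visible through the other.
detector["two_theta"]["value"][0] = 12.5
print(data["two_theta"]["value"][0])  # 12.5
```

The benefit suggested by the slide is that the plottable group can expose its axis without duplicating the detector's description of it.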

  27. Naming Conventions • Lower case letters are used throughout, except for common symbols and abbreviations such as FWHM. • Names are constructed from full words separated by the underscore character, e.g. time_of_flight. • For sequentially indexed group names, the sequential number is simply appended to the name, e.g. filter1, filter2. This convention should be used only for data group names. • The hierarchical structure of NeXus files should be used to simplify data names, e.g. “temperature”, not “sample_temperature”.
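These conventions can be expressed as a small checker. This is a minimal Python sketch with hypothetical function names. The set of permitted abbreviations contains only the slide's own example, FWHM, and would need extending in practice.

```python
import re

# Abbreviations allowed to keep their capitals; FWHM is the slide's example.
ALLOWED_ABBREVIATIONS = {"FWHM"}


def follows_naming_convention(name):
    """True if the name is lower-case words (or allowed abbreviations) joined by underscores.

    A trailing number on a word is accepted so that indexed names such as
    filter1 pass the check.
    """
    return all(
        word in ALLOWED_ABBREVIATIONS or re.fullmatch(r"[a-z]+[0-9]*", word) is not None
        for word in name.split("_")
    )


def indexed_names(base, count):
    """Append sequential numbers directly to the base name: filter1, filter2, ..."""
    return [f"{base}{index}" for index in range(1, count + 1)]
```

For instance, `follows_naming_convention("time_of_flight")` is true, `follows_naming_convention("TimeOfFlight")` is false, and `indexed_names("filter", 2)` gives `["filter1", "filter2"]`.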

  28. Units It is very important that, whenever possible, data units are specified. • Physical Units • We recommend the use of S.I. units while recognizing that other units are very common in the neutron and x-ray community, e.g. meV and Angstroms. Whatever units are used, they must be specified as a character string in the format used by the Unidata UDunits utility. • Dates and Times • NeXus dates and times should be stored using the ISO 8601 format, e.g. 1996-07-31 21:15:22+0600. This will avoid confusion, e.g. between U.S. and European conventions, and is appropriate for machine sorting.
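A timestamp in the layout of the slide's example can be produced with standard date formatting. This is a minimal Python sketch. The function name `nexus_timestamp` is hypothetical, and the format string is an assumption chosen to reproduce the example shown (date, space, time, then a four-digit UTC offset).

```python
from datetime import datetime, timedelta, timezone


def nexus_timestamp(moment):
    """Format a timezone-aware datetime as 'YYYY-MM-DD hh:mm:ss+hhmm'."""
    if moment.tzinfo is None:
        # Without an offset the time would be ambiguous between facilities.
        raise ValueError("timestamp needs a UTC offset")
    return moment.strftime("%Y-%m-%d %H:%M:%S%z")


# The date and offset from the slide's example.
offset = timezone(timedelta(hours=6))
stamp = nexus_timestamp(datetime(1996, 7, 31, 21, 15, 22, tzinfo=offset))
print(stamp)  # 1996-07-31 21:15:22+0600
```

Because the fields run from most to least significant, strings written with the same offset sort chronologically as plain text, which is the machine-sorting property the slide refers to.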

  29. Recent Developments
     • Shift to HDF5
       – Mark Koennecke has produced an HDF5 version of the NeXus API
       – If both HDF4 and HDF5 libraries are available, it is possible to read either type transparently and to choose which type to write
     • Use of XML
       – Chris Moreton-Smith has written a sample DTD file and XML file compatible with NeXus
       – Allows format validation and higher-level APIs

  30. Wishlist • Organization • Establish groups to develop and maintain instrument definitions. • Make NeXus more self-sustaining. • NeXus tools • Develop validation tools in XML • Develop a NeXus editor • Develop filters to standard tools (Origin, Excel, Python) • Develop high-level tools (GUI browsers, visualization) • Develop better installation kits
