TDAS Usage of the CDF File Format Jim Lewis UCB Space Sciences Lab
Introduction • TDAS has adopted the CDF file format as its primary data storage mechanism. • CDF libraries and documentation are available from http://cdf.gsfc.nasa.gov/ • Supported languages include IDL, C, Fortran, Java, and Perl. • Supported platforms include Windows, MacOS, Linux, Solaris, and others. • CDF files contain data variables with fixed-size records. Available data types include signed or unsigned integers of various bit depths, single- or double-precision floating point, CDF_EPOCH (8-byte) or CDF_EPOCH16 (16-byte) timestamps, and fixed-length character strings.
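Because every record of a CDF variable has the same size, the raw payload of a variable can be estimated from its data type and dimensions alone. The sketch below assumes the per-element sizes given in the CDF documentation and deliberately ignores file headers, internal indexing, and compression, so it estimates uncompressed payload only; the function names are illustrative.

```python
import math

# Bytes per element for common CDF data types (uncompressed).
CDF_TYPE_SIZES = {
    "CDF_INT1": 1, "CDF_UINT1": 1, "CDF_BYTE": 1,
    "CDF_INT2": 2, "CDF_UINT2": 2,
    "CDF_INT4": 4, "CDF_UINT4": 4,
    "CDF_INT8": 8,
    "CDF_REAL4": 4, "CDF_FLOAT": 4,
    "CDF_REAL8": 8, "CDF_DOUBLE": 8,
    "CDF_EPOCH": 8, "CDF_EPOCH16": 16,
    "CDF_CHAR": 1, "CDF_UCHAR": 1,
}


def record_size(cdf_type, dims=(), num_elements=1):
    """Bytes in one record: element size x elements per value x product of dims.

    num_elements is the fixed string length for CDF_CHAR/CDF_UCHAR, else 1.
    """
    return CDF_TYPE_SIZES[cdf_type] * num_elements * math.prod(dims)


def payload_size(cdf_type, n_records, dims=(), num_elements=1):
    """Uncompressed payload of n_records records (headers and indexing ignored)."""
    return n_records * record_size(cdf_type, dims, num_elements)


# A 3-component single-precision vector takes 12 bytes per record;
# a CDF_EPOCH16 timestamp takes 16.
print(record_size("CDF_REAL4", dims=(3,)))  # 12
print(record_size("CDF_EPOCH16"))           # 16
```

The same arithmetic explains why storing redundant timestamp variables, or redundant copies of a data variable, grows a file linearly with the number of records.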
CDFs contain metadata describing global properties (mission name and description, PI, informative links) and per-variable properties (units, coordinate systems, associations with related “support data” variables specifying sample times, axis labels, etc.) • The CDF libraries hide platform-specific details like floating point representations or big-endian versus little-endian conventions. • The CDF format imposes very few constraints on how data may be represented, but for interoperability between projects, some agreed-upon data and metadata conventions are desirable.
L0, L1, and L2 processing flows for THEMIS • L0 CCSDS packets (a raw binary, non-CDF format) are extracted from incoming telemetry files. Packet-level decompression is applied if necessary. Corrupted data is identified when possible and removed from L0 products. • L0 packets require deep knowledge of the S/C and instrument subsystems to be interpreted usefully, and are not intended for end users. Exception: TDAS still uses L0 products for the ESA instrument, but plans are in place to migrate to L1 data. • L0 packets are processed to produce L1 CDFs. For THEMIS, the most important operation at this stage is time-tagging individual samples, since each L0 packet contains multiple samples but only a single timestamp.
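The time-tagging step can be illustrated with a minimal sketch. It assumes, hypothetically, that the packet timestamp marks the first sample and that samples within a packet are evenly spaced at a known rate; real THEMIS packets follow instrument-specific timing rules that this sketch does not model.

```python
from typing import List, Sequence, Tuple


def time_tag_packet(packet_time: float, samples: Sequence[float],
                    sample_rate_hz: float) -> List[Tuple[float, float]]:
    """Pair each sample in one packet with its own timestamp.

    Illustrative assumptions: packet_time is the time of the first sample,
    and samples are evenly spaced at sample_rate_hz.
    """
    dt = 1.0 / sample_rate_hz
    return [(packet_time + i * dt, s) for i, s in enumerate(samples)]


def time_tag_stream(packets, sample_rate_hz):
    """Flatten a sequence of (packet_time, samples) packets into tagged samples."""
    tagged = []
    for packet_time, samples in packets:
        tagged.extend(time_tag_packet(packet_time, samples, sample_rate_hz))
    return tagged


print(time_tag_packet(100.0, [7.0, 8.0, 9.0], 2.0))
# [(100.0, 7.0), (100.5, 8.0), (101.0, 9.0)]
```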
L1 CDFs are uncalibrated, and are expressed in each instrument’s native units and coordinate system. • L1 CDFs are combined with calibration, ephemeris, and attitude data to produce L2 CDFs. • L2 CDFs are calibrated, and data is expressed in standard engineering units and geophysically relevant coordinate systems. They represent the THEMIS “key parameter data”.
For interoperability with SPDF, THEMIS uses the ISTP metadata conventions described at http://spdf.gsfc.nasa.gov/sp_use_of_cdf.html • ISTP specifies a required set of global attributes describing the mission, the instrument that produced the data set, and links to mission documentation. • ISTP specifies a required set of per-variable attributes describing each variable, its associations with other variables appearing in that CDF, units, coordinate systems, etc. • A detailed list of ISTP requirements is beyond the scope of this presentation, but we can provide examples of how they’re implemented on THEMIS. • Further documentation is available in the THEMIS L1 File Definition document.
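The kind of checking these requirements imply can be sketched against an in-memory view of a CDF's attributes. Plain dictionaries stand in for whatever CDF reader is used, and the attribute lists are an illustrative subset of the ISTP requirements (the guidelines also permit alternatives such as pointer attributes), not the authoritative list; the variable names in the example are hypothetical.

```python
# Illustrative subset of ISTP-required global attributes.
REQUIRED_GLOBAL = (
    "Project", "Source_name", "Discipline", "Data_type", "Descriptor",
    "Data_version", "Logical_file_id", "Logical_source",
    "Logical_source_description", "Mission_group", "PI_name",
    "PI_affiliation", "Instrument_type", "TEXT",
)

# Illustrative subset of attributes expected on each data variable.
REQUIRED_DATA_VAR = (
    "CATDESC", "DEPEND_0", "DISPLAY_TYPE", "FIELDNAM", "FILLVAL",
    "FORMAT", "UNITS", "VALIDMIN", "VALIDMAX", "VAR_TYPE",
)


def istp_problems(global_attrs, var_attrs):
    """Return readable problems; an empty list means this subset is satisfied.

    global_attrs: dict of global attribute name -> value
    var_attrs: dict of variable name -> dict of that variable's attributes
    """
    problems = [f"missing global attribute {a}"
                for a in REQUIRED_GLOBAL if a not in global_attrs]
    for name, attrs in var_attrs.items():
        if attrs.get("VAR_TYPE") != "data":
            continue  # support_data / metadata variables have looser rules
        problems += [f"{name}: missing {a}"
                     for a in REQUIRED_DATA_VAR if a not in attrs]
        # Associations must point at variables that exist in the same CDF.
        dep = attrs.get("DEPEND_0")
        if dep is not None and dep not in var_attrs:
            problems.append(f"{name}: DEPEND_0 refers to missing variable {dep}")
    return problems


globals_ok = {a: "..." for a in REQUIRED_GLOBAL}
data_attrs = {a: "..." for a in REQUIRED_DATA_VAR}
data_attrs.update(VAR_TYPE="data", DEPEND_0="epoch")
print(istp_problems(globals_ok, {"bfield": data_attrs}))
# ['bfield: DEPEND_0 refers to missing variable epoch']
```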
THEMIS-specific metadata conventions • TDAS uses a variable attribute DEPEND_TIME to associate each data variable with a time variable, which should be a double-precision floating point quantity representing a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC). • ISTP requires a variable attribute DEPEND_0, which should refer to a variable of type CDF_EPOCH or CDF_EPOCH16. • Storing both sets of timestamps is impractical because of file size. The CDF_EPOCH variable exists in the CDF but contains no data; instead, the CDF_EPOCH values are to be calculated from the Unix timestamps. • THEMIS and SPDF use the DATA_VERSION global attribute and file naming convention somewhat differently.
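Computing CDF_EPOCH values from the stored Unix timestamps is a fixed offset plus a unit change. CDF_EPOCH counts milliseconds since 0000-01-01T00:00:00, and 1970-01-01 falls 719528 days later, so the offset is 719528 × 86400 × 1000 = 62167219200000 ms. The sketch assumes, as both conventions do, that leap seconds are ignored; CDF_EPOCH16 is shown as its (seconds, picoseconds) pair. The function names are illustrative.

```python
import math

# Seconds from 0000-01-01T00:00:00 to 1970-01-01T00:00:00 (719528 days).
UNIX_EPOCH_IN_CDF_SECONDS = 719528 * 86400          # 62167219200
UNIX_EPOCH_IN_CDF_MS = UNIX_EPOCH_IN_CDF_SECONDS * 1000.0


def unix_to_cdf_epoch(unix_seconds: float) -> float:
    """Unix time (s since 1970-01-01) -> CDF_EPOCH (ms since 0000-01-01)."""
    return unix_seconds * 1000.0 + UNIX_EPOCH_IN_CDF_MS


def cdf_epoch_to_unix(epoch_ms: float) -> float:
    """CDF_EPOCH (ms since 0000-01-01) -> Unix time (s since 1970-01-01)."""
    return (epoch_ms - UNIX_EPOCH_IN_CDF_MS) / 1000.0


def unix_to_cdf_epoch16(unix_seconds: float) -> tuple:
    """Unix time -> CDF_EPOCH16 pair (whole seconds since 0000-01-01, picoseconds).

    Precision is limited by the double-precision Unix input.
    """
    whole = math.floor(unix_seconds)
    picoseconds = round((unix_seconds - whole) * 1e12)
    return (float(whole + UNIX_EPOCH_IN_CDF_SECONDS), float(picoseconds))


print(unix_to_cdf_epoch(0.0))                     # 62167219200000.0
print(cdf_epoch_to_unix(unix_to_cdf_epoch(1.5)))  # 1.5
print(unix_to_cdf_epoch16(1.25))                  # (62167219201.0, 250000000000.0)
```

Because the mapping is linear, a reader can regenerate the empty CDF_EPOCH variable from DEPEND_TIME data on demand, which is what makes storing only one set of timestamps workable.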
How TDAS load routines work • Basic operating principle: given a data source (e.g., tha, thb, FYKN), a data level (L1 or L2), a data version (V00, V01, V02), a data type (FGM, SCM, ASI, GMAG, etc.), and a time range (usually dates, sometimes dates + times), generate a list of filenames or URLs from which data variables need to be loaded. • File management: For each supported mission, TDAS maintains user-configurable parameters describing the remote filesystem or URL root, and a local filesystem root. TDAS can be configured to ignore the remote data location and work directly from the local data repository. • Since THEMIS data is freely available as soon as it’s processed, TDAS contains no provisions for passwords, cookies, certificates, or other authentication or authorization methods.
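The core of a load routine, turning (source, level, type, version, date range) into file locations, can be sketched as follows. The path pattern is modeled on THEMIS-style names such as tha_l2_fgm_YYYYMMDD_v01.cdf but should be treated as an assumption, the FileConfig field names are hypothetical rather than the actual TDAS configuration structure, and example.org is a placeholder host.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


def daily_paths(source: str, level: str, dtype: str, version: str,
                start: date, end: date) -> List[str]:
    """Relative paths for one daily CDF per day in [start, end)."""
    paths = []
    day = start
    while day < end:
        name = f"{source}_{level}_{dtype}_{day:%Y%m%d}_{version}.cdf"
        paths.append(f"{source}/{level}/{dtype}/{day:%Y}/{name}")
        day += timedelta(days=1)
    return paths


@dataclass
class FileConfig:
    """Per-mission file-management settings (hypothetical field names)."""
    remote_root: str
    local_root: str
    no_download: bool = False  # True: work only from the local repository


def resolve(cfg: FileConfig,
            rel_paths: List[str]) -> List[Tuple[Optional[str], str]]:
    """Pair each path with its remote URL (None if downloads are off) and local path."""
    return [(None if cfg.no_download else f"{cfg.remote_root}/{p}",
             f"{cfg.local_root}/{p}") for p in rel_paths]


cfg = FileConfig("http://example.org/themis", "/data/themis")
for url, local in resolve(cfg, daily_paths("tha", "l2", "fgm", "v01",
                                           date(2007, 3, 1), date(2007, 3, 3))):
    print(url, "->", local)
```

Keeping path generation separate from the remote/local decision is what lets the same routine run against a download server or a purely local repository.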
After the remote data is downloaded to the local repository, the requested variables, time tags, and metadata attributes are read from the local repository using CDF library commands. • Data from multiple files is aggregated into a single entity (a TPLOT variable), which contains data values, timestamps, and metadata values. • The TDAS load routines contain “hooks” for post-processing steps that can be performed after each data set is loaded. This allows data loading, calibration, and coordinate transformations to be performed with a single call to the load routine. • Calibration and coordinate transforms may require combining data from several types of CDFs. For example, the attitude and ephemeris data in the THEMIS L1 STATE CDFs are required (and can be automatically loaded) for many choices of output coordinates.
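Aggregation and post-processing hooks can be sketched in a few lines. TplotVar is a hypothetical stand-in for a TPLOT variable (times, values, metadata), per-file chunks are assumed to arrive already sorted and in time order, and the calibration hook is a made-up constant gain used only to show where such a step plugs in.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


@dataclass
class TplotVar:
    """Hypothetical stand-in for a TPLOT variable."""
    name: str
    times: List[float]
    values: List[float]
    meta: Dict[str, str] = field(default_factory=dict)


Hook = Callable[[TplotVar], TplotVar]


def aggregate(name: str,
              chunks: Iterable[Tuple[Sequence[float], Sequence[float]]],
              meta: Dict[str, str],
              hooks: Sequence[Hook] = ()) -> TplotVar:
    """Concatenate per-file (times, values) chunks, then apply hooks in order."""
    times: List[float] = []
    values: List[float] = []
    for t, v in chunks:  # assumed to be in time order (e.g. one chunk per day)
        times.extend(t)
        values.extend(v)
    var = TplotVar(name, times, values, dict(meta))
    for hook in hooks:
        var = hook(var)
    return var


def gain_hook(gain: float, units: str) -> Hook:
    """Hypothetical calibration step: multiply by a gain and relabel units."""
    def hook(var: TplotVar) -> TplotVar:
        return replace(var, values=[gain * x for x in var.values],
                       meta={**var.meta, "units": units})
    return hook


var = aggregate("demo_var", [([0.0, 1.0], [1.0, 2.0]), ([2.0], [3.0])],
                {"units": "counts"}, hooks=[gain_hook(0.5, "nT")])
print(var.times, var.values, var.meta)
```

Passing hooks as a list mirrors the single-call design described above: loading, calibration, and coordinate transforms all happen inside one invocation, in a caller-chosen order.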
Developing TDAS-compatible CDFs • New CDF types should have all required ISTP metadata. • Data variables should have the THEMIS DEPEND_TIME attribute defined to point to a variable containing double-precision Unix timestamp data. • The CDF naming convention should support the mapping of data source, processing level, data type, data version, and time range onto a list of URLs or filenames. • The CDF data should be made available either by HTTP download or on a locally available filesystem. If HTTP downloads are not feasible due to access control issues, the TDAS remote/local file management functionality can be bypassed so that TDAS only looks in the local repository, and files are distributed to end users outside of TDAS.
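A pre-release check for the DEPEND_TIME requirement might look like the sketch below. As before, plain dictionaries stand in for a CDF reader, the variable names are hypothetical, and accepting either CDF_DOUBLE or CDF_REAL8 as the double-precision type is an assumption of this sketch.

```python
DOUBLE_TYPES = {"CDF_DOUBLE", "CDF_REAL8"}


def depend_time_problems(variables):
    """Check that each data variable's DEPEND_TIME names a double-precision variable.

    variables: dict of name -> {"type": CDF type name, "attrs": dict of attributes}
    """
    problems = []
    for name, var in variables.items():
        attrs = var["attrs"]
        if attrs.get("VAR_TYPE") != "data":
            continue
        target = attrs.get("DEPEND_TIME")
        if target is None:
            problems.append(f"{name}: no DEPEND_TIME attribute")
        elif target not in variables:
            problems.append(f"{name}: DEPEND_TIME refers to missing variable {target}")
        elif variables[target]["type"] not in DOUBLE_TYPES:
            problems.append(f"{name}: DEPEND_TIME variable {target} is "
                            f"{variables[target]['type']}, not double precision")
    return problems


cdf = {
    "demo_time": {"type": "CDF_DOUBLE", "attrs": {"VAR_TYPE": "support_data"}},
    "demo_data": {"type": "CDF_REAL4",
                  "attrs": {"VAR_TYPE": "data", "DEPEND_TIME": "demo_time"}},
}
print(depend_time_problems(cdf))  # []
```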
Tradeoff considerations for design of CDF file structures • Download time, versus processing time, versus level of interoperability. Example: THEMIS L2 FGM CDFs contain the same magnetometer data in several coordinate systems. We discovered that it was faster to load, calibrate, and coordinate-transform (cotrans) the smaller L1 data sets “on the fly” than to download the larger L2 files. So why include redundant data in the L2 files? Because end users might want to import the data into some other toolchain that doesn't support the required calibration or coordinate transform functions. • Multiple CDF types, versus everything in one file. The original THEMIS concept was to have one daily CDF file combining all instruments. But this would have introduced a great deal of “churn” whenever reprocessing was necessary to fix a processing bug or add a CDF variable. Splitting the CDFs by instrument type gave us finer-grained control of the reprocessing required to implement changes or bug fixes.
Summary • A CDF is a platform-independent container for storing data, support data, and metadata. • SPDF and TDAS have different, but compatible, conventions for how CDF metadata is expressed. • TDAS naming conventions allow client software to generate a list of filenames or URLs to load, given the type and time range of data requested. • TDAS load routines take CDFs as input, and produce TPLOT variables as output. • TDAS calibration, cotrans, and analysis routines take TPLOT variables as input, and produce new TPLOT variables as output. • Command-line plotting routines generate plots from TPLOT variables. • The TDAS GUI imports/exports between TPLOT variables and its own internal format.
Overview • TDAS software extends SSL routines for the THEMIS mission. • Collaborative software development through SVN. • Combines mission-specific routines for loading, calibration, and coordinate transformation with general-purpose analysis routines. • Made possible by acceptance of common conventions for data representation inside IDL.
SVN • SVN provides automatic version management. • Merges additions/changes from multiple programmers at different locations. • Identifies conflicts if programmers modify the same code. • Tracks modification history. • Provides a central repository for code. • Server-side scripts create releases/nightly builds/processing builds automatically.
Command Line Loading • If CDF conventions are followed, load routines can be developed using either of two techniques. • #1 thm_load_xxx: a highly parameterized routine does all the work. • #2 Use file_retrieve/file_http_copy to download the files, then cdf2tplot to load the data into tplot variables. • Simple example of #1: themis/spacecraft/thm_load_bau.pro • Simple example of #2: themis/spacecraft/thm_load_scmode.pro
GUI Loading • Requires a command-line load routine. • Requires a mission-specific interface panel to load data. Its top-level base must fit inside an IDL widget_tab; the widget ID is provided as a parameter. (see: themis/thm_ui_new/thm_ui_init_load_window.pro) • The load panel must call the CL load routine, add the loaded tplot variables to the GUI thm_ui_loaded_data object, and clean up the tplot variables after the load. • Tplot variables and GUI variables are stored in separate namespaces to prevent accidental collision/interaction with CL routines.
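The load-panel contract (call the command-line loader, copy its new variables into the GUI's store, then clean up) can be sketched in Python. The dictionary and the LoadedData class are hypothetical stand-ins for the tplot namespace and the thm_ui_loaded_data object, not their real interfaces. This sketch only tracks newly created names, so a loader that overwrites an existing tplot variable would need extra handling.

```python
from typing import Callable, Dict, List


class LoadedData:
    """Hypothetical stand-in for the GUI's separate data namespace."""

    def __init__(self) -> None:
        self._items: Dict[str, object] = {}

    def add(self, name: str, value: object) -> None:
        self._items[name] = value

    def names(self) -> List[str]:
        return sorted(self._items)


def gui_load(cl_load: Callable[[Dict[str, object]], None],
             tplot_store: Dict[str, object],
             gui_store: LoadedData) -> List[str]:
    """Run a CL loader, move its new variables into the GUI store, and clean up."""
    before = set(tplot_store)
    cl_load(tplot_store)  # loader writes into the tplot namespace
    new_names = sorted(set(tplot_store) - before)
    for name in new_names:
        gui_store.add(name, tplot_store.pop(name))  # copy to GUI, remove from tplot
    return new_names


def fake_erg_load(store):  # hypothetical CL load routine
    store["erg_demo"] = [1.0, 2.0]


tplot = {"user_var": [0.0]}
gui = LoadedData()
print(gui_load(fake_erg_load, tplot, gui), sorted(tplot), gui.names())
# ['erg_demo'] ['user_var'] ['erg_demo']
```

The user's pre-existing command-line variable survives untouched, which is the point of keeping the two namespaces separate.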
Topic #1 Development/Management of TDAS programs. A: ERG folder/development branch. Discussion Item: How often do you plan to update the ERG branch? Multiple times a day? Once a week? How quickly will ERG code be changing? We agree that maintaining separate repositories is ideal, but we need a policy that prevents a development fork and maintains compatibility between ERG & TDAS. We suggest providing you with authentication for our SVN server. An automated process (nightly) or a programmer (periodically) would check out the ERG branch from the ERG servers and check it into the TDAS servers. We can provide example scripts that perform automatic SVN builds. NOTE: The less frequently the copies are synchronized, the greater the probability that a fork occurs that prevents backward compatibility. This has occurred in the past between TDAS and WIND, and we would like to avoid it in the future if doing so doesn't require too much effort. We also have SVN email notifications that are sent when changes are made to our repositories. We can put your developers and/or scientists on this list, so that you will be aware of changes throughout TDAS.
Topic #1 Development/Management of TDAS programs. B: General purpose • General-purpose (GP) routines present a greater risk of compatibility problems. • But we recognize that modifications/additions to GP routines are necessary and beneficial. • Initially, we recommend adding any such routines to the ERG branch with an 'erg_' prefix. After code review, additions will be checked into ssl_general with the prefix removed. • Discussion Items: • Do you have any formal QA processes in mind? (QA scripts, test suites, and/or automated unit tests) • What policies do you have in mind for help & bug reports? • Any plans for synchronization with TDAS repositories?
Topic #2 Naming conventions • Informal convention is: • A: All routines for a mission should use the same prefix (ex: thm_, erg_). • B: All tplot variables for a mission should use the same prefix (ex: tha_fgs_dsl, erg_state_pos). • C: To simplify loading, it is generally useful to maintain these designations inside the CDF as well. • D: We can set up a mailing list and/or periodic meetings for technical discussions as needed.
Topic 3: Rules of the road • Currently we maintain rules of the road in CDFs and on the THEMIS website, but do not explicitly post them from within the software. • The external package IDL_GEOPACK posts its rules of the road to the command line upon first usage during a session. • Most TDAS missions use common initialization routines (i.e., thm_init). The notice could be added to erg_init, which would present the notification and update a data structure to remember the user's preference; the same technique used to store configuration parameters could remember the user's response. • The notice should appear automatically when using the GUI if the erg_load routine were added to themisgui. • Side note: It is important not to assume the presence of a graphical display. We recommend either using only a command-line query, or checking display availability and falling back to the command line.
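The erg_init idea (show the notice on first use, remember the answer, and fall back to the command line when no display exists) can be sketched as follows. The config dictionary stands in for whatever persistent configuration structure is used, the rules text is a placeholder, and has_display() is only a heuristic based on $DISPLAY on X11 systems; none of these names come from TDAS.

```python
import os
import sys
from typing import Callable, Dict, Optional

RULES_TEXT = "Rules of the road: <placeholder text>"


def has_display() -> bool:
    """Heuristic: assume Windows/macOS have a display; elsewhere require $DISPLAY."""
    return sys.platform in ("win32", "darwin") or bool(os.environ.get("DISPLAY"))


def show_rules_once(config: Dict[str, bool],
                    display_available: bool,
                    ask_gui: Optional[Callable[[], bool]] = None,
                    ask_cli: Callable[[str], str] = input) -> bool:
    """Show the rules unless already acknowledged; return True if they were shown."""
    if config.get("rules_acknowledged"):
        return False
    print(RULES_TEXT)
    if display_available and ask_gui is not None:
        accepted = ask_gui()  # e.g. a dialog box
    else:
        # No display (or no dialog supplied): fall back to a command-line query.
        reply = ask_cli("Acknowledge rules of the road? (y/n) ")
        accepted = reply.strip().lower().startswith("y")
    config["rules_acknowledged"] = accepted  # stored alongside other config parameters
    return True


cfg: Dict[str, bool] = {}
show_rules_once(cfg, display_available=has_display(), ask_cli=lambda prompt: "y")
```

Injecting ask_gui and ask_cli as parameters keeps the decision logic testable without a terminal or a window, and a declined answer leaves the flag false so the notice reappears next time.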
Topic #4 Access control • We are uncertain whether we can support CLUSTER-style access controls at this time. • THEMIS doesn't have any explicit access control system. • TDAS does support multiple URL-configurable repositories; if a repository grants access, TDAS can download from it. • Individual repositories can control access based upon their own criteria. • For more restricted access, users must manually download the data and place it in the appropriate directory. • Potential ways to extend access control: extend file_http_copy to support HTTPS, or create a download utility that uses IDLnetURL objects. • On the command line, access control would probably be coupled with the download process, but cdf2tplot and the analysis routines could be completely independent.
Topic #5: 2d+ plotting routines • TDAS supports several 2d+ plotting routines: • tplot/specplot: for time-series data. • plotxy & tplotxy: for isotropic 2d line data. • plotxyz: for isotropic 3d line data. • plotxyvec: for isotropic 2d vector arrows. • accgm_plot: for generating maps/coordinate grids using AACGM coordinates. • thm_map_set & thm_map_add: for creating map mosaics. • Additional routines & improvements are made as specific requests are received and prioritized.
Topic #6 IDL VM • The GUI has not yet been tested in the IDL Virtual Machine (VM). (We may have missed places, or there may be undocumented VM compatibility issues.) • Running the GUI in the VM should be possible: when developing the THEMIS GUI, we made sure to avoid all IDL VM restrictions. • Releasing on the VM would require creating an IDL save file of the GUI and debugging any platform-related issues. • The difficulty would depend largely on the extent of undocumented compatibility issues and the time needed to fix them. • TDAS-Web: We are unaware of any technical limitations.
Topic #7 GUI development • What is CUI? • The GUI has many more interdependencies than the CL routines. • Development coordination will be a much more significant task. • There is currently no standard API for extending the GUI; at the moment, extension would require direct modification of GUI internals. • The GUI interfaces via several objects: thm_ui_loaded_data (read/write data products), thm_ui_windows, thm_ui_window (top level of the display settings hierarchy), thm_ui_draw_object (redraw, query information about displayed panels, manage real-time features), thm_ui_call_sequence (record actions for document replay), thm_ui_message_bar, thm_ui_history (logging). • There are also a number of standard panel widgets (calendar, spinner, data tree).
Topic #8 User statistics • THEMIS statistics are collected by SSL servers. • What statistics are required? • The current mechanism does not provide more detailed statistics.
Examples: Other missions • TDAS CL supports ACE, FAST, GOES, KYOTO (DST), LANL, STEREO, and WIND. Several GMAG & ASI networks: UCLA-GBO, UCLA-EPO, CARISMA (UAlberta), MACCS (Augsburg), GIMA (UAlaska). See ssl_general/missions/ace/ for examples (ace_init.pro, ace_load_swepam.pro). • TDAS GUI supports ACE, WIND, GOES, plus all GMAGs. See themis/thm_ui_new/panels/thm_ui_load_data_file/ for examples (thm_ui_init_load_window.pro, thm_ui_goes_data.pro, thm_ui_goes_data_load.pro).
THEMIS Data Analysis Software Graphical User Interface
Resulting Display from Plot/Layout Selections
Results from Line Option Selections
Special Features: Zoom Control, Legend Box, Marked Area of Interest, Cursor Tracking, Status Bar