Virtual Observatories and Data Interfaces for Atmospheric Science 12th EISCAT International Workshop Incoherent Scatter Radar School Swedish Institute of Space Physics, Space Campus, Kiruna, Sweden Bill Rideout MIT Haystack Route 40, Westford, MA, USA 1-781-981-5624 brideout@haystack.mit.edu 26 August 2005
Outline • Virtual Observatories • Madrigal • Web interface • Remote API • Extending/Contributing • Madrigal 2.4 • Cedar Database • Other data sources
A day in the life of an Atmospheric Scientist I have done an experiment with my instrument, but now I need to … • Search numerous websites for data • Figure out their parameters and units • Figure out their coordinate systems and date formats • Figure out how to determine data quality • Write code to download data, or (worse) download it manually • Write code to convert it to my format • Finally, do science
How can Virtual Observatories help? • Virtual Observatories: one-stop data shopping!
Virtual Observatories • Ideally, a virtual observatory… • Provides a single interface to access all data • Knows about all data sources • Allows simple, powerful searches to discover unknown data sources • Always gets the most up-to-date data • Uses a single set of well-defined parameters • Provides data in consistent format(s) • Provides data in consistent coordinates • Informs the user of contact information and rules of the road for all data
Two approaches • Top down: build an interface • Bottom up: build a standard data source
How do they work? • Top-down approach: • Accept that all data sources will be forever incompatible • Build a data model so metadata can be shared • Build a unique interface to each new data source • Effort scales linearly with the number of data sources • Works best with more uniform data (e.g., astronomical images) • Bottom-up approach: • Standardize data format and semantics • Standardize data provider API • Approach taken by Madrigal/Cedar • Try for community acceptance
Outline • Virtual Observatories • Madrigal • Web interface • Remote API • Extending/Contributing • Madrigal 2.4 • Cedar Database • Other data sources
What is the Madrigal database? • An open-source, web-based database designed to hold one group’s data • www.openmadrigal.org has all code and downloads • Built upon the Cedar database format established over 20 years ago • Fundamentally a data source – allows local owners to improve/correct their data • Designed to be used for a wide variety of instruments • New installations always welcome!
Madrigal Data Model Madrigal site (typically a facility with scientists and a Madrigal installation) ↓ Instruments (ground-based, typically with a set location) ↓ Experiments (typically of limited duration, with a single contact) ↓ Experiment Files (each represents data from one analysis of the experiment) ↓ Records (measurement over one period of time) • Diagram legend: data shared among all Madrigal sites; data unique to one Madrigal site
Madrigal Records Records (measurement over one period of time) Three types: • Catalog record • descriptive information about entire experiment • Header record • descriptive information about one section of experiment • Data record • Stores values • All parameters defined by Cedar Database standard • Contains 3 parts • Prolog • 1D records • 2D records
Madrigal Data Records • Prolog • Start and end time • Instrument id • Kind of data id • 1D records (scalar) • Single value parameters • 2D records (vector) • Multiple value parameters • All parameters must have same number of rows • Meant to allow multiple spatial measurements • Not meant for time variation, which would conflict with the Prolog! • Example data record: Prolog; 1D (scalar): S/N = 2.5; 2D (vector): Altitudes = 100, 150, 200, 250, 300, 350
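The record layout on this slide can be sketched in Python. This is only an illustration, not Madrigal's actual API: the class name `DataRecord`, its field names, and the sample instrument and kind-of-data ids are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataRecord:
    """Hypothetical model of one Madrigal data record (names are illustrative)."""
    # Prolog: applies to the whole record
    start_time: datetime
    end_time: datetime
    instrument_id: int
    kind_of_data_id: int
    # 1D (scalar) parameters: one value per record
    one_d: dict = field(default_factory=dict)
    # 2D (vector) parameters: one value per row
    two_d: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.end_time < self.start_time:
            raise ValueError("prolog end time precedes start time")
        # All 2D parameters must have the same number of rows
        lengths = {len(values) for values in self.two_d.values()}
        if len(lengths) > 1:
            raise ValueError("2D parameters differ in number of rows")

    @property
    def num_rows(self):
        """Number of 2D rows (0 if the record has no 2D parameters)."""
        return len(next(iter(self.two_d.values()), []))


record = DataRecord(
    start_time=datetime(2005, 8, 26, 12, 0, 0),
    end_time=datetime(2005, 8, 26, 12, 5, 0),
    instrument_id=1,     # made-up id
    kind_of_data_id=1,   # made-up id
    one_d={"S/N": 2.5},
    two_d={"altitude": [100, 150, 200, 250, 300, 350]},
)
print(record.num_rows)
```

Because the prolog carries a single start and end time, every row of the 2D part shares that time span; the rows differ only in space (here, altitude).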
Cedar/Madrigal Database • All parameters in a file are defined at http://cedarweb.hao.ucar.edu/documents/parameters_list.txt • Ranges of parameters for each instrument • Data stored in one or two 16-bit ints (the second via additional increment parameters) • Error parameters: mnemonics start with D; code is the negative of the parameter's code
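A rough sketch of how a stored value might be rebuilt from its integers, and how an error parameter's code relates to its parameter. It assumes each parameter has a scale factor and that the additional increment contributes a second, finer-scaled term; the scale factors and the parameter code 550 are made up for illustration and are not taken from the Cedar parameter list.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def error_code(parameter_code):
    """Code of the error parameter: the negative of the parameter's code."""
    return -abs(parameter_code)


def decode(main_int, main_scale, incr_int=0, incr_scale=0.0):
    """Combine one or two 16-bit integers into a float.

    main_scale and incr_scale are per-parameter scale factors; the
    values used below are made up for illustration.
    """
    for raw in (main_int, incr_int):
        if not INT16_MIN <= raw <= INT16_MAX:
            raise ValueError(f"{raw} does not fit in a signed 16-bit integer")
    return main_int * main_scale + incr_int * incr_scale


# Hypothetical parameter: main scale 1e-3, additional increment scale 1e-7
value = decode(12345, 1e-3, incr_int=6789, incr_scale=1e-7)
print(value)             # approximately 12.3456789
print(error_code(550))   # 550 is a made-up parameter code
```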
Cedar/Madrigal Database, continued • Special values: missing, assumed (error value only), knownbad (error value only) • Defined in http://cedarweb.hao.ucar.edu/cgi-bin/cedar_file_access.pl?filename=documents/cedar_fmt.pdf
Cedar Database parameters • Example additional increment parameter
Cedar parameters - continued • Madrigal contains many “derived only” parameters • Not included in Cedar standard • Cannot be stored in Cedar file • New Python API hides the existence of additional increment parameters • All values are doubles • Exceptions occur on overflow • More later…
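One way an API could hide additional increment parameters is to accept a plain double and split it internally, raising an exception when the value does not fit. The function below is a sketch under that assumption; `encode` and the scale factors are hypothetical and are not part of the Madrigal Python API.

```python
INT16_MAX = 32767


def encode(value, main_scale, incr_scale):
    """Split a float into (main, additional increment) 16-bit integers.

    Raises OverflowError when either part does not fit in 16 bits.
    """
    main_int = int(value / main_scale)   # truncate toward zero
    if abs(main_int) > INT16_MAX:
        raise OverflowError(f"{value} is out of range for scale {main_scale}")
    remainder = value - main_int * main_scale
    incr_int = round(remainder / incr_scale)
    if abs(incr_int) > INT16_MAX:
        raise OverflowError(f"increment for {value} does not fit in 16 bits")
    return main_int, incr_int


print(encode(12.3456789, 1e-3, 1e-7))

try:
    encode(1.0e6, 1e-3, 1e-7)   # main part would need far more than 16 bits
except OverflowError as err:
    print("overflow:", err)
```

The caller only ever sees doubles; the two-integer representation stays an implementation detail.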
Madrigal Derivation Engine • Derived parameters appear to be in file • Assumes information can be derived from records • Time from prolog • Position either as 1D or 2D • Other parameters • Engine determines all parameters that can be derived
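The idea of an engine that "determines all parameters that can be derived" can be sketched as a fixed-point computation over dependency rules. The rule table and parameter names below are illustrative assumptions, not Madrigal's real dependency list.

```python
def derivable_parameters(measured, rules):
    """Return every parameter obtainable from the measured ones.

    rules maps a derived parameter to the set of inputs it needs.
    Rules are applied repeatedly until no new parameter appears, so
    a derived parameter can itself feed further derivations.
    """
    available = set(measured)
    changed = True
    while changed:
        changed = False
        for output, inputs in rules.items():
            if output not in available and inputs <= available:
                available.add(output)
                changed = True
    return available


# Illustrative rules only; the real engine's dependencies may differ
rules = {
    "local_time": {"time", "longitude"},
    "kp": {"time"},
    "bmag": {"time", "latitude", "longitude", "altitude"},
    "conjugate_lat": {"bmag"},
}
measured = {"time", "latitude", "longitude"}   # no altitude in this file
print(sorted(derivable_parameters(measured, rules)))
```

In this example the magnetic parameters are not offered because altitude is missing; adding altitude to the measured set would unlock `bmag` and, through it, `conjugate_lat`.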
Classes of derived parameters • Space, time • Examples: Local time, shadow height • Geophysical • Examples: Kp, Dst, IMF, F10.7 • Magnetic • Examples: Bmag, Mag conjugate lat and long, Tsyganenko magnetic equatorial plane intercept • MSIS • Examples: Tn, Nol
Outline • Virtual Observatories • Madrigal • Web interface • Remote API • Extending/Contributing • Madrigal 2.4 • Cedar Database • Other data sources
Madrigal web interface - homepage • Access Data • All Madrigal sites
Three ways to access Madrigal data • Data in individual experiments • Data across experiments • Plot data across experiments
Searching for experiments • Choose one or more instruments • By default, view only most recent files • By default, all Madrigal sites are searched • Find any experiments that overlap these dates at all
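The "any overlap" rule for the date search amounts to a standard interval test. The sketch below assumes inclusive boundaries; the function name and sample experiments are invented for illustration.

```python
from datetime import datetime


def overlaps(exp_start, exp_end, search_start, search_end):
    """True if the experiment shares at least one instant with the search range."""
    return exp_start <= search_end and search_start <= exp_end


# Made-up experiments: (name, start, end)
experiments = [
    ("before", datetime(2005, 8, 1), datetime(2005, 8, 3)),
    ("straddles start", datetime(2005, 8, 9), datetime(2005, 8, 11)),
    ("inside", datetime(2005, 8, 12), datetime(2005, 8, 13)),
    ("after", datetime(2005, 8, 25), datetime(2005, 8, 26)),
]
search = (datetime(2005, 8, 10), datetime(2005, 8, 20))

matches = [name for name, start, end in experiments
           if overlaps(start, end, *search)]
print(matches)
```

An experiment that only partly falls inside the requested dates is still returned, which matches the behavior described on the slide.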
Madrigal experiment listing • These links could be to experiments at any site
Madrigal experiment files – part 1 • These two files have no catalog or header records; otherwise there would be a link • Data browser (isprint) allows viewing both measured and derived parameters, with filtering
Madrigal experiment files – part 2 • Madrigal allows any additional web-compatible files to be added to the experiment • Image-conversion feature written at EISCAT • Notes can be added by users (feature also written at EISCAT)
Data browser (isprint) – part 1 • Users can save a set of filters and parameters and reselect it with one click • Filters to reduce data: • Time • Altitude • Azimuth • Elevation …
Data browser (isprint) – part 2 • Filters, continued • Filter data using any parameter, or the sum, difference, product, or quotient of two parameters • Example: Nel - DNel > 1
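A filter on one parameter, or on an arithmetic combination of two, could be modeled as below. This is a sketch of the concept only: the function names, the exclusive bounds, and the sample values are assumptions, not isprint's implementation.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def filter_value(row, first, op=None, second=None):
    """Value the filter tests: one parameter, or two combined with op."""
    if op is None:
        return row[first]
    return OPERATORS[op](row[first], row[second])


def filter_rows(rows, first, op=None, second=None, lower=None, upper=None):
    """Keep rows whose filter value lies strictly between lower and upper."""
    kept = []
    for row in rows:
        value = filter_value(row, first, op, second)
        if lower is not None and not value > lower:
            continue
        if upper is not None and not value < upper:
            continue
        kept.append(row)
    return kept


# Made-up rows; Nel and DNel are used as on the slide
rows = [
    {"Nel": 11.2, "DNel": 9.8},
    {"Nel": 10.5, "DNel": 10.1},
    {"Nel": 12.0, "DNel": 10.0},
]
# The slide's example: Nel - DNel > 1
print(filter_rows(rows, "Nel", "-", "DNel", lower=1))
```

A real implementation would also need to decide how to treat missing values and division by zero, which this sketch ignores.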
Data browser (isprint) – part 3 • See a full description of all parameters • Choose parameters to display • Measured in bold • Derived in normal font • Listed by category • Click on any parameter for a full description
Data browser (isprint) – part 4 • User clicked on CHISQ • Some parameters have a more complete description
Data browser (isprint) – part 5 • Longer description of CHISQ
Data browser (isprint) – part 6 • String to indicate missing data • Option to show a header for each record
Data output • Display only text • Save text version to file • Summary of selected filters • Headers were on in this example
Second approach – Global search • Global search for data
Global search – part 1 • Choose kinds of data • Choose one or more instruments • Choose seasonal filter (optional) • Choose date range (optional)
Global search – part 2 • Filter by experiment name (optional) • Select parameters to display • Filter using any parameter, just like isprint
Global search – selecting parameters • Parameters with categories and pop-up definitions, as on the isprint page
Global search – review search • Review all aspects of the global search before submitting
Global search – returned message • Message returns the number of files being searched, along with a rough estimate of the time required • Since reports may take a long time to generate, an email with a link is sent when the report is done
Third approach – Plotting across experiments • Plotting data from various instruments across experiments
Creating plots • Select one or more instruments; in this example, Svalbard and Millstone (not visible) are selected • Click here to see a list of all experiments • Select date range (can cross experiment boundaries) • Select a scatter plot or pcolor of altitude versus time
Choose single parameter to plot • Same pop-up listing of parameters as in isprint • Radio buttons, since only one parameter can be selected
Set up limits and filters • Set limits on the parameter you selected • Data can also be filtered using another parameter • For a pcolor plot, altitude limits can be set
Pcolor plot output • Rules of the road for each site shown • Single request generates Millstone and EISCAT plots • Plots are requested from each site simultaneously to improve performance
Pcolor plot output – part 2 • Can add more stacked plots with different parameters, or start over
Adding additional plots • Now add a scatter plot of DST with same time scale
Adding additional plots – part 2 • Time scales align with stacked plots if times are not changed
Outline • Virtual Observatories • Madrigal • Web interface • Remote API • Extending/Contributing • Madrigal 2.4 • Cedar Database • Other data sources
Remote Access to Madrigal Data • Built on web services • Like the web, available from anywhere on any platform • Complete Matlab and Python API written • More APIs available on request or via contribution
Madrigal Web Services • Simple delimited output via CGI scripts • Not based on SOAP or XML-RPC, since languages such as Matlab do not support them • CGI arguments and output fully documented at http://www.haystack.edu/madrigal/remoteAPIs.html
Madrigal Web Services – part 2 • To write a new API, each method must • Take input arguments and generate the correct CGI URL • Parse the delimited text • Return data to user
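The three steps above can be sketched in Python. Everything specific here is a placeholder: the host `madrigal.example.org`, the script name `exampleService.py`, its arguments, the comma delimiter, and the sample response are assumptions for illustration; the real script names, arguments, and output formats are in the remote API documentation cited on the previous slide.

```python
from urllib.parse import urlencode


def build_url(base_url, script, **arguments):
    """Step 1: turn input arguments into a CGI URL."""
    query = urlencode(sorted(arguments.items()))
    return f"{base_url.rstrip('/')}/{script}?{query}"


def parse_delimited(text, field_names, delimiter=","):
    """Step 2: parse delimited text, one record per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        records.append(dict(zip(field_names, line.split(delimiter))))
    return records


# Hypothetical service and arguments
url = build_url("http://madrigal.example.org/cgi-bin", "exampleService.py",
                startyear=2005, endyear=2005)
print(url)

# Made-up response text standing in for what the server would return
sample_response = "Instrument A,1,10.0\nInstrument B,2,20.0\n"

# Step 3: return structured data to the user
instruments = parse_delimited(sample_response, ["name", "code", "latitude"])
print(instruments)
```

In a real client the response text would come from fetching the URL (for example with `urllib.request.urlopen`), and the parser would convert fields to numbers as the documented format dictates. Because each method is just a URL and a text parser, the same pattern ports to any language with basic HTTP support.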