SCIAMACHY features and usage with respect to end-users
The typical fate of retrieval people dealing with large datasets…
C. Frankenberg, SRON team, IUP Heidelberg team
SCIAMACHY on ENVISAT, a brief introduction
SCIAMACHY
ADAGUC meeting, KNMI, De Bilt, 03/04 October 2006
SCIAMACHY data viewer (1 orbit = 300 MB)
Scientific question in my case: retrieval of CH4 and CO2
Spectra → vertical column densities of CO2 and CH4 → xVMR(CH4)
CH4 VMR, August through November 2003
Source: Frankenberg et al., Assessing methane emissions from global space-borne observations, Science 2005
Issues related to ADAGUC
• SCIAMACHY data access: 5 GB/day, direct download from the Netherlands SCIAMACHY Data Center
• Data access: binary PDS file
• No library available at that time
• Official reading tool not useful for nearly operational retrievals
• Own C/C++ access routine was written
• Complex code structure; retrieval and data access are difficult to separate
Too instrument-specific to be of general interest in ADAGUC
Issues related to ADAGUC
• General procedure:
1) Level 1 PDS file: each geographic entity (usually a 60 × 120 km rectangle) comprises spectra and numerous auxiliary datasets
2) Retrieval via own C++ code, results stored in a so-called level 2 file
3) Level 2 file (own format, so far ASCII): each geographic entity comprises e.g. CH4 total column and additional parameters such as cloud cover, albedo, fit error, etc.
4) Generating gridded plots of the level 2 files depending on filter criteria (e.g. CloudTopHeight < 1 km, fitError < 2%)
5) Comparing data (raw and gridded) with other datasets (e.g. model output, retrievals of other groups, other satellite sensors)
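Step 4 of the procedure (filter, then grid) can be sketched in a few lines of Python. This is only an illustration, not the retrieval group's actual code: the record keys (`lat`, `lon`, `ch4`, `cloud_top_km`, `fit_error_pct`) and the function name `grid_mean` are hypothetical, and the thresholds simply mirror the example criteria on the slide.

```python
import math
from collections import defaultdict

def grid_mean(records, cell_deg=1.0, max_cloud_top_km=1.0, max_fit_error_pct=2.0):
    """Average CH4 per lat/lon box, using only records that pass the filters.

    `records` is an iterable of dicts with hypothetical keys:
    lat, lon, ch4, cloud_top_km, fit_error_pct.
    """
    sums = defaultdict(lambda: [0.0, 0])  # box -> [sum of ch4, count]
    for rec in records:
        # Filter criteria as on the slide: CloudTopHeight < 1 km, fitError < 2%
        if rec["cloud_top_km"] >= max_cloud_top_km:
            continue
        if rec["fit_error_pct"] >= max_fit_error_pct:
            continue
        # Identify the box by its lower-left corner
        box = (math.floor(rec["lat"] / cell_deg) * cell_deg,
               math.floor(rec["lon"] / cell_deg) * cell_deg)
        sums[box][0] += rec["ch4"]
        sums[box][1] += 1
    return {box: total / n for box, (total, n) in sums.items()}
```

Each pixel is assigned to a box by its centre coordinate only, which corresponds to the "very simple" lat/lon box gridding mentioned later in the talk.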
What is of general interest?
• Points 3-5:
3) Output file generation (file format, no standards!)
4) Gridding and plotting data based on predefined selection criteria
5) Comparing datasets
Output file generation
• Why ASCII?
  • Human-readable
  • Easiest exchange between different groups (preferred format for the comparison between SRON, IUP Bremen, IUP Heidelberg)
  • Variety of Linux tools available for processing, most notably awk
• Drawbacks…
  • Slow access, big files, files not self-describing
• Why didn’t I use HDF/netCDF/GIS formats?
  • Lazy (additional work, new skills necessary)
  • Awk tools not available
Gridding, projections, plotting
• What did I use?
  • Admittedly very simple methods: lat/lon box gridding with own routines, IDL plotting/projection routines
• What would be nice?
  • Better gridding options (e.g. weighting by the overlapping area)
  • Data conversion tools for easier access to tools such as GMT (Generic Mapping Tools)
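"Weighting by the overlapping area" could look roughly like the sketch below. It rests on strong simplifying assumptions that are not from the talk: each pixel is treated as a lat/lon-aligned rectangle (real SCIAMACHY footprints are general quadrilaterals), and overlap is measured in square degrees, ignoring the shrinking of cells towards the poles. The function names are hypothetical.

```python
import math
from collections import defaultdict

def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def grid_area_weighted(pixels, cell_deg=1.0):
    """Area-weighted mean per grid cell.

    `pixels` is an iterable of (lat0, lat1, lon0, lon1, value) tuples,
    i.e. lat/lon-aligned rectangles (a simplification).
    Returns {(lat_index, lon_index): weighted mean}.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for lat0, lat1, lon0, lon1, value in pixels:
        # Range of cell indices the pixel can touch
        i0, i1 = math.floor(lat0 / cell_deg), math.floor(lat1 / cell_deg)
        j0, j1 = math.floor(lon0 / cell_deg), math.floor(lon1 / cell_deg)
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                # Weight = overlap area between pixel and cell (square degrees)
                w = (overlap_1d(lat0, lat1, i * cell_deg, (i + 1) * cell_deg)
                     * overlap_1d(lon0, lon1, j * cell_deg, (j + 1) * cell_deg))
                if w > 0.0:
                    weighted_sum[(i, j)] += w * value
                    weight_total[(i, j)] += w
    return {cell: weighted_sum[cell] / weight_total[cell] for cell in weight_total}
```

Unlike centre-point box gridding, a large pixel here contributes to every cell it covers, in proportion to the covered area.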
Comparing datasets
• A headache
• Even within SCIA: pixel sizes differ, so comparing different species requires averaging to the lowest resolution. How should the averaging be done?
• Processing a lot of files is slow due to the ASCII format
• Data exchange
  • In my case only within the atmospheric community, so no direct problems, as people were experienced with the formats; ASCII is no problem anyway (but slow and large)
  • What is needed for the GIS community: level 2 and/or level 3 (gridded) data?
What I find ideal…
• Results stored in a relational database management system (RDBMS), with routines for extracting subsets to HDF, netCDF, ASCII
• Why? Database systems are meant for large datasets and complex queries to derive subsets
• Simple example in SQL:
  select avg(CH4) from results where latitude>50 and latitude<51 … and albedo>0.2 and cloudCover<0.05
• FAST due to indexing (tested with a test database with 5 million entries; a single query takes practically no time)!
• Selection criteria are easy (no awk necessary)
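The database idea can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch and not the setup tested in the talk: the table layout, the index name and the helper functions are assumptions, and the WHERE clause keeps only the conditions spelled out on the slide (the elided ones are left out).

```python
import sqlite3

def build_db(rows):
    """Create an in-memory database with a hypothetical `results` table.

    `rows` is an iterable of (latitude, longitude, CH4, albedo, cloudCover).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        "latitude REAL, longitude REAL, CH4 REAL, albedo REAL, cloudCover REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?)", rows)
    # The index lets the latitude range condition avoid a full table scan
    conn.execute("CREATE INDEX idx_results_latitude ON results (latitude)")
    conn.commit()
    return conn

def mean_ch4(conn, lat_min, lat_max, min_albedo, max_cloud_cover):
    """Average CH4 over a latitude band with albedo and cloud-cover filters."""
    row = conn.execute(
        "SELECT avg(CH4) FROM results "
        "WHERE latitude > ? AND latitude < ? AND albedo > ? AND cloudCover < ?",
        (lat_min, lat_max, min_albedo, max_cloud_cover),
    ).fetchone()
    return row[0]  # None if no row matches
```

Usage: `mean_ch4(build_db(rows), 50, 51, 0.2, 0.05)` corresponds to the query on the slide. Changing a filter means changing a parameter, not rewriting an awk script.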
Even better: Spatial SQL
• Spatial SQL: spatial extension of the database system (e.g. points, polygons, etc.)
• Example syntax (Postgres):
  SELECT ch4_total_column FROM results WHERE distance( center_point, GeomFromText( 'POINT(10.0 20.0)', -1 ) ) < 100
• Dumpers to e.g. “shape files” available:
  pgsql2shp [<options>] <database> <query>
• Direct connection to data viewers such as QGIS possible
• Web interface via the interactive plotting tool MapServer
What takes most of the time?
• SCIA data format: especially the level 2 files for validation are far too complex and frustrate people
• Data filtering → plotting → interpreting → changing filters, and so forth
An interactive data viewer would be great (as in GIS: click on a point and you get additional information)
Lots of time for discussion
Website for spatial RDBMS: www.postgis.org