Reading HDF family of formats via NetCDF-Java / CDM John Caron, UCAR/Unidata
NetCDF-Java library • 100% Java • Open Source (LGPL, MIT) • Independent implementation • Used as a component in other software (partial) • Integrated Data Viewer, THREDDS Data Server (Unidata) • Panoply (NASA) • ncBrowse (EPIC/NOAA) • Java NEXRAD Viewer (NCDC/NOAA) • MyWorld GIS (Northwestern) • EDC for ArcGIS, ERDDAP (SWFSC/NOAA) • Live Access Server (PMEL/NOAA) • ncWMS (Reading) • Matlab plug-in (USGS)
NetCDF-Java / CDM architecture [diagram] • Labels: THREDDS, Catalog.xml, Application, Scientific Feature Types, Datatype Adapter, NetcdfDataset, CoordSystem Builder, NetcdfFile, I/O service provider • Data sources: OPeNDAP, NcML, NetCDF-3, NetCDF-4, HDF5, GRIB, GINI, NIDS, Nexrad, DMSP, …
Format Readers (IOSP) • General: NetCDF, HDF5, HDF4, OPeNDAP • Gridded: GRIB-1, GRIB-2, GEMPAK • Radar: NEXRAD 2&3, DORADE, CINRAD, Universal Format • Point: BUFR, ASCII • Satellite: DMSP, GINI, McIDAS AREA • Misc: GTOPO, Lightning, etc • Others in development (partial): • AVHRR, GPCP, GACP, SRB, SSMI, HIRS (NCDC)
Why all the trouble? • ~20-40% C/C++ time spent on portability issues • Platform Independence • Linux, Solaris, Windows (Sun) • Mac OS X (Apple) • AIX, Linux, Windows, z/OS (IBM) • HP-UX (Hewlett-Packard) • Programmer productivity • Object-Oriented • Garbage Collected – no memory leaks • Rich libraries • Open source • Faster than C for some applications
Independent implementation • Written entirely from reading HDF4, HDF5 file specifications • Helped debug (HDF5), validate file specs • File format spec is what will be needed in 100 years to read legacy data • OTOH, semantics not always obvious • Don’t confuse reference implementation with the file/protocol specification
HDF family of formats • HDF5/NetCDF-4 • HDF4 • HDF-EOS • Note: read-only, no parallel I/O, etc
HDF5/NetCDF-4 • Goal is to read all HDF5 • Can read all HDF5 files for which we have examples • including references, soft links • Complete coverage difficult to guarantee – combinatorial explosion • Some esoteric features we are skipping • File drivers, external files, szip compression • Working on a comprehensive test harness • JNI interface to NetCDF-4/HDF5 library • read every byte and compare
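The "read every byte and compare" idea can be sketched as follows. This is a minimal illustration, not the actual Unidata test harness: `first_mismatch`, `compare_readers`, and the two stand-in readers are hypothetical names, and in a real harness the reference reader would go through the JNI interface to the C library while the candidate would be the pure-Java reader.

```python
import struct


def first_mismatch(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One buffer is a prefix of the other: they diverge where the shorter ends.
        return min(len(a), len(b))
    return None


def compare_readers(variable_names, read_reference, read_candidate):
    """Read each variable with both readers and report the first byte that differs."""
    return {
        name: first_mismatch(read_reference(name), read_candidate(name))
        for name in variable_names
    }


# Stand-in readers (assumption): each returns a variable's raw bytes.
reference = {
    "lat": struct.pack("<3f", 1.0, 2.0, 3.0),
    "time": struct.pack("<2f", 0.0, 1.0),
}
candidate = {
    "lat": struct.pack("<3f", 1.0, 2.0, 3.5),   # third value deliberately wrong
    "time": struct.pack("<2f", 0.0, 1.0),
}

report = compare_readers(["lat", "time"], reference.__getitem__, candidate.__getitem__)
for name, offset in report.items():
    print(name, "identical" if offset is None else f"differs at byte {offset}")
```

Comparing raw bytes rather than decoded values avoids any tolerance question: two correct readers of the same file should return identical bytes.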
HDF4 / HDF-EOS • Complete, works against all examples • Tested against 400 sample files (27 GB) • thanks to Ruth Duerr (NSIDC) • Spot checked against HDFView • Need systematic test to compare reading against the HDF4 C Library
Swath Float lat(245, 33477); Float lon(245, 33477); Float time(33477); Float data(245, 33477); Just know that it's swath data • 245 points cross track • 33477 along the track • Each scan has a time coordinate
Swath Float lat(33477, 245); Float lon(33477, 245); Float time(33477); Float data(245, 33477);
Swath Float lat(999,999); Float lon(999,999); Float time(999); Float data(999,999);
Swath Float v1(999, 999); Float v2(999, 999); Float v3(999); Float v4(999,999);
If you write data • Don’t rely on variable name conventions • Don’t rely on index ordering • Don’t rely on matching index sizes • Minimize “you just have to know that…”
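Why matching index sizes is fragile can be shown with a minimal sketch, assuming a reader that has nothing but variable shapes to work with. The function `match_axes` is hypothetical and is not part of NetCDF-Java; the shapes are the ones from the swath slides.

```python
def match_axes(data_shape, coord_shape):
    """Return every way the coordinate's axes could map onto the data's axes.

    Each candidate is a tuple of data-axis indices, one per coordinate axis,
    chosen purely by matching sizes (no names, no ordering convention).
    """
    candidates = []

    def extend(partial):
        if len(partial) == len(coord_shape):
            candidates.append(tuple(partial))
            return
        size = coord_shape[len(partial)]
        for axis, axis_size in enumerate(data_shape):
            if axis_size == size and axis not in partial:
                extend(partial + [axis])

    extend([])
    return candidates


# Distinct sizes: time(33477) can only belong to axis 1 of data(245, 33477).
print(match_axes((245, 33477), (33477,)))       # [(1,)]

# lat(33477, 245) against data(245, 33477): only the transposed mapping fits.
print(match_axes((245, 33477), (33477, 245)))   # [(1, 0)]

# Equal sizes: time(999) could belong to either axis of data(999, 999).
print(match_axes((999, 999), (999,)))           # [(0,), (1,)]
```

As soon as two dimensions happen to have the same length, size matching returns more than one candidate and the reader has to guess.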
Dimensions Dimensions: d1=999; d2=999; Variables: float v1(d1=999, d2=999); float v2(d1=999, d2=999); float v3(d2=999); float v4(d2=999,d1=999);
Good Variables: float v1(d1=999, d2=999); v1:standard_name = "Latitude"; float v2(d1=999, d2=999); v2:standard_name = "Longitude"; float v3(d2=999); v3:standard_name = "Time"; float v4(d2=999,d1=999); Data_type = "Swath"; Conventions = "My unique name";
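With named dimensions and `standard_name` attributes, a reader no longer has to guess. The sketch below is only an illustration: the in-memory `variables` dictionary and the function `coordinates_for` are assumptions made for this example, not the NetCDF-Java data model.

```python
# The "Good" slide, transcribed into plain Python structures (illustrative only).
variables = {
    "v1": {"dims": ("d1", "d2"), "attrs": {"standard_name": "Latitude"}},
    "v2": {"dims": ("d1", "d2"), "attrs": {"standard_name": "Longitude"}},
    "v3": {"dims": ("d2",), "attrs": {"standard_name": "Time"}},
    "v4": {"dims": ("d2", "d1"), "attrs": {}},
}


def coordinates_for(data_name, variables):
    """Find coordinate variables for a data variable.

    A variable qualifies if it declares a standard_name and all of its
    dimensions are also dimensions of the data variable.
    """
    data_dims = set(variables[data_name]["dims"])
    found = {}
    for name, var in variables.items():
        role = var["attrs"].get("standard_name")
        if role is not None and set(var["dims"]) <= data_dims:
            found[role] = (name, var["dims"])
    return found


data_dims = variables["v4"]["dims"]
for role, (name, dims) in coordinates_for("v4", variables).items():
    # Dimension names also tell us which axes of v4 each coordinate spans.
    axes = tuple(data_dims.index(d) for d in dims)
    print(f"{role}: {name}{dims} spans v4 axes {axes}")
```

Note that every dimension here has length 999, the case where size matching fails, yet the lookup is unambiguous because it never looks at sizes.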
If you write data • Unique signature • Specify dimensions • Identify georeferencing coordinates • Identify data type • Units are not optional
HDF-EOS, HDF-EOS2 • Read “structural metadata” field to obtain more semantics • Parse text in “ODL” • Data type: Swath, Grid, Point • Dimensions • Geolocation coordinate variable types: Latitude, Longitude, Time
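Parsing the ODL text can be sketched roughly as below. The sample string is a simplified, made-up fragment in the style of HDF-EOS structural metadata, and `parse_odl` is a hypothetical helper that handles only nested GROUP/OBJECT blocks and single-line `key=value` pairs; real ODL has more syntax than this covers.

```python
def parse_odl(text):
    """Parse GROUP/OBJECT blocks and key=value pairs into nested dicts."""
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":
            break
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in ("GROUP", "OBJECT"):
            child = {}
            stack[-1][value] = child      # nest under the enclosing block
            stack.append(child)
        elif key in ("END_GROUP", "END_OBJECT"):
            stack.pop()
        else:
            stack[-1][key] = value.strip('"')
    return root


# Illustrative sample only (assumption), not taken from a real file.
SAMPLE = """
GROUP=SwathStructure
  GROUP=SWATH_1
    SwathName="Swath1"
    GROUP=Dimension
      OBJECT=Dimension_1
        DimensionName="GeoTrack"
        Size=33477
      END_OBJECT=Dimension_1
      OBJECT=Dimension_2
        DimensionName="GeoXtrack"
        Size=245
      END_OBJECT=Dimension_2
    END_GROUP=Dimension
  END_GROUP=SWATH_1
END_GROUP=SwathStructure
END
"""

meta = parse_odl(SAMPLE)
swath = meta["SwathStructure"]["SWATH_1"]
dims = {obj["DimensionName"]: int(obj["Size"]) for obj in swath["Dimension"].values()}
print(swath["SwathName"], dims)
```

The top-level group name gives the data type (here a swath), and the Dimension objects give the named dimensions that the raw variable shapes lack.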
HDF-EOS, HDF-EOS2 • Good • Unique signature, identify coordinates and data type • Not so good • ODL • Not using HDF4/5 constructs • Bad • No data units • No time coordinate units!
Better EOS Variables: float v1(999, 999); v1:standard_name = "Latitude"; v1:dims = "d1 d2"; float v2(999, 999); v2:standard_name = "Longitude"; v2:dims = "d1 d2"; float v3(999); v3:standard_name = "Time"; v3:dims = "d2"; float v4(999,999); v4:dims = "d2 d1";
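A reader could recover shared dimensions from a `dims` attribute along these lines. This is a sketch under the assumption that each variable is available as a (shape, `dims` string) pair; `shared_dimensions` is a hypothetical helper, not an existing API.

```python
def shared_dimensions(variables):
    """Build {dimension name: size} from each variable's shape and dims attribute.

    Raises ValueError if the attribute does not match the variable's rank, or
    if two variables disagree about the size of a dimension.
    """
    sizes = {}
    for name, (shape, dims_attr) in variables.items():
        names = dims_attr.split()
        if len(names) != len(shape):
            raise ValueError(f"{name}: {len(names)} dimension names for rank {len(shape)}")
        for dim, size in zip(names, shape):
            if sizes.setdefault(dim, size) != size:
                raise ValueError(f"{name}: dimension {dim} has size {size}, expected {sizes[dim]}")
    return sizes


# The "Better EOS" slide as (shape, dims attribute) pairs.
eos_variables = {
    "v1": ((999, 999), "d1 d2"),
    "v2": ((999, 999), "d1 d2"),
    "v3": ((999,), "d2"),
    "v4": ((999, 999), "d2 d1"),
}

print(shared_dimensions(eos_variables))
```

The consistency check is the useful part: a size conflict between two variables that claim the same dimension indicates a metadata error in the file.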
NPP (i1.4.0.3_NPP_QUAL) • Good • XML better than ODL • Not so good • Not using HDF4/5 constructs • Bad • No data units • No time coordinate units! • Fatal Error: please reboot • Metadata not in the same file
Summary • NetCDF-Java reads entire HDFx family • Good for Java-philes • Needs more testing • Send example files, $ • Dimensions are not optional • Keep structural and georeferencing metadata in the same file as the data • Can also have specialized external files
Contact caron@ucar.edu Google “netcdf java”
NetCDF-4 and Common Data Model (Data Access Layer)
Dimension primer Float lat(180); Float lon(360); Float alt(20); Float time(1200); Float data(1200,20,180,360);
Unique Name! Float lfip(lfip=180); Float lflop(lflop=180); Float zorg(zorg=20); Float skdf(skdf=1200); Float dglot(skdf=1200,zorg=20, lfip=180,lflop=180);
Float lfip(180); Float lflop(180); Float zorg(20); Float freebish(1200); Float dglot(1200,20,180,180);
Float lat(180); Float lon(180); Float alt(20); Float time(1200); Float data(1200,20,180,180);