Data Standards Workflow. Extract. Load. Provide. Transform. Raw data. Scripts. Database. Charts & Maps. Store raw data in subversion to keep track of history. Add meta information Script to convert raw data into netcdf. Stored files (netcdf) accessible through the web.
Data Standards Workflow
Extract • Load • Provide • Transform
Raw data • Scripts • Database • Charts & Maps
• Store raw data in subversion to keep track of history
• Add meta information
• Script to convert raw data into netcdf
• Stored files (netcdf) accessible through the web
Tools and websites: OpenEarthRawData • OpenEarth • OPeNDAP • OpenEarthTools
Transform • Add metadata • Store in netcdf • Save script in subversion
Transform Add metadata • Use the INSPIRE metadata form to store information about the dataset. • http://www.inspire-geoportal.eu/inspireEditor.htm • Click launch editor
Transform – add metadata validation Turn validation on
Transform – add metadata File identification Location in subversion micore
Transform – add metadata quality History of your data.
Transform – add metadata constraints Please fill in limitations of use.
Transform – add metadata Save metadata file • Save metadata file (local) • Add to subversion (local) • Commit => metadata into subversion (remote) • Store in • course/Pcnumber/inspire_description.xml
Transform • Add metadata • Store in netcdf • Save script in subversion
Transform Store in netcdf • What’s netcdf? • Write a script to transform data into netcdf • Using CF convention
Transform – store in netcdf - netcdf What is netcdf • Data format defined by Unidata • Data store used for coverage data and multidimensional data • CF metadata convention
Transform – store in netcdf - netcdf What is netcdf • An array-based data structure for storing multidimensional data • N-dimensional coordinate systems • X coordinate (e.g. longitude) • Y coordinate (e.g. latitude) • Z coordinate (e.g. altitude) • Time dimension • … other dimensions • Variables – support for multiple variables • Temperature, humidity, pressure, salinity, etc. • Geometry – implicit or explicit • Regular grid (implicit) • Irregular grid • Points
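As a rough illustration of the array-based model, a variable defined on X, Y, Z and time can be pictured as one N-dimensional MATLAB array with one coordinate vector per dimension. The grid sizes, coordinate values and the variable name below are made up for this sketch.

```matlab
% Hypothetical grid sizes, chosen only for illustration
nx = 4; ny = 3; nz = 2; nt = 5;

% One coordinate vector per dimension
x = linspace(3, 6, nx);      % e.g. longitude
y = linspace(51, 53, ny);    % e.g. latitude
z = [0 10];                  % e.g. altitude
t = 1:nt;                    % time steps

% One variable on the (x, y, z, t) grid, initially all missing
temperature = NaN(nx, ny, nz, nt);

% Set the value at a single grid point: 2nd x, 1st y, 1st z, 3rd time step
temperature(2, 1, 1, 3) = 12.5;

disp(size(temperature))      % 4 3 2 5
```

A second variable (humidity, pressure, ...) would be another array of the same size sharing the same coordinate vectors.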
Transform – store in netcdf - netcdf Storing Multidimensional Data [Figure: data arranged along X, Y and Z axes; labels "14 numbers" and "32 numbers"]
Transform – store in netcdf - netcdf Data Model Data model for netcdf and others. Also usable for hdf, opendap, grib, etc. See the java library for details
Transform – store in netcdf – netcdf - applications ArcGIS ArcGIS also reads and writes netcdf files.
Transform – store in netcdf - netcdf Your favorite text editor xml representation of a netcdf file
Transform – store in netcdf - netcdf Other Tools Not so stable. Very useful IDV NCO
# diff
ncdiff -v time file1.nc file2.nc
# compression & packing
ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% lossy compression)
# selecting variables by regex
ncks -v '^Q..' in.nc # Q01--Q99, QAA--QZZ, etc.
Web hyperslabs, cool!
Transform – store in netcdf - script Store in netcdf • What’s netcdf? • Write a script to transform data into netcdf • Using CF convention
Transform – store in netcdf - script Write script • Read raw data • Read header line • Read data • Read all data • Create function to read all data • Use function in Matlab • Raw data into empty netcdf file • Create empty netcdf file • Add dimensions and variables • Store variables • Read values
Transform – store in netcdf - script Reading raw data into memory • Use one of the following matlab functions to read the file data into an array • fscanf
Transform – store in netcdf - script Example: Transect.txt file
Header line (year, number of points): 1999 58
Points (X Z X Z …): -135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951 … 200 -2415 210 -2995 220 -3595 99999999999 99999999999
Header line (year, number of points): 2000 58
Points (X Z X Z …): -135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951 …. 9999999
Location: OpenEarthRawData\course\example\raw
Transform – store in netcdf - script Read header line
>> fid = fopen('..\raw\transect.txt')
fid =
    15
>> header = fscanf(fid, '%d', 2)
header =
    2000
    58
>> year = header(1)
year =
    2000
>> npoint = header(2)
npoint =
    58
Transform – store in netcdf - script Read data
1
>> % read data
>> data = fscanf(fid, '%d', npoint*2)
data =
    -150
    3741
    -140
    3581
    -135
2
>> data = reshape(data, [2, npoint])
data =
  Columns 1 through 7
    -150 -140 -135 -130
    3741 3581 3531 3541
3
>> % use column vectors
>> data = data'
data =
    -150 3741
    -140 3581
    -135 3531
Corresponding lines in the script:
% read header
header = fscanf(fid, '%d', 2);
year = header(1);
% store year in time
time(i) = year;
npoint = header(2);
% read data
data = fscanf(fid, '%d', npoint*2);
data = reshape(data, [2, npoint]);
% use column vectors
data = data';
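The reshape step works because fscanf returns the interleaved X Z X Z … values as one long column and reshape fills its output column by column. A minimal sketch with the first three points of the example file (the variable names raw, pairs, x and z are only for this illustration):

```matlab
% Interleaved X Z X Z ... values, in the order fscanf returns them
raw = [-135; 3531; -130; 3541; -125; 3631];
npoint = numel(raw) / 2;

% reshape fills column by column, so each column becomes one (X, Z) pair:
% row 1 holds the X values, row 2 holds the Z values
pairs = reshape(raw, [2, npoint]);

% transpose so that each row is one point: column 1 = X, column 2 = Z
pairs = pairs';

x = pairs(:, 1);   % [-135; -130; -125]
z = pairs(:, 2);   % [3531; 3541; 3631]
```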
Transform – store in netcdf - script Read all data
% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the current line
    fgetl(fid);
    i = i + 1;
end
% close the file when done
fclose(fid);
Transform – store in netcdf - script Create a function
function transect = readtransect(filename)
    % preallocate all data
    % (time, coastward)
    transectseries = NaN(3, 58);
    coastward_distance = NaN(58, 1);
    time = NaN(3, 1);
    % open file and get file id
    fid = fopen(filename);
    i = 1;
    while (~feof(fid))
        % read header
        header = fscanf(fid, '%d', 2);
        year = header(1);
        % store year in time
        time(i) = year;
        npoint = header(2);
        % read data
        data = fscanf(fid, '%d', npoint*2);
        data = reshape(data, [2, npoint]);
        % use column vectors
        data = data';
        % store data in transect series
        transectseries(i,:) = data(:,2);
        coastward_distance(:) = data(:,1);
        % skip the rest of the current line
        fgetl(fid);
        i = i + 1;
    end
    % close the file when done
    fclose(fid);
    transect = struct('series', transectseries, ...
        'distance', coastward_distance, 'time', time);
end
Transform – store in netcdf - script Use the new function
>> data = readtransect('..\raw\transect.txt')
data =
    series: [3x58 double]
    distance: [58x1 double]
    time: [3x1 double]
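The reader in the slides assumes exactly 3 years of 58 points and relies on feof to stop. A possible variant, sketched below, grows its output as it reads and stops as soon as no complete header or data block is left. The name readtransect_flexible is hypothetical, and the sketch assumes the same file layout as the slides, with the same number of points in every year.

```matlab
function transect = readtransect_flexible(filename)
% Sketch of a transect reader that does not hard-code the array sizes.
% Assumes every block in the file has the same number of points.
fid = fopen(filename);
if fid == -1
    error('Could not open %s', filename);
end
cleaner = onCleanup(@() fclose(fid));   % close the file even after an error

time = [];
series = [];
distance = [];
while true
    % read header: year and number of points
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break                           % no complete header left
    end
    npoint = header(2);
    % read npoint (X, Z) pairs
    data = fscanf(fid, '%d', npoint*2);
    if numel(data) < npoint*2
        break                           % incomplete block
    end
    data = reshape(data, [2, npoint])'; % one point per row: [X Z]
    time(end+1, 1) = header(1);         % append the year
    series(end+1, :) = data(:, 2)';     % append one row of Z values
    distance = data(:, 1);              % X values, same for every year
    fgetl(fid);                         % skip the rest of the line, as in the slides
end
transect = struct('series', series, 'distance', distance, 'time', time);
end
```

Growing arrays inside a loop is slower than preallocating, which hardly matters for a file of this size; for large files, reading the header of the first block to size the arrays would be the usual compromise.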
Transform – store in netcdf - script Loading data into netcdf • What does a netcdf file look like • Required meta information
Transform – store in netcdf - script Netcdf file transect.nc
netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
    year = 1999, 2000, 2001 ;
    height = 353, 354, …
        -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, …
        -32, …
        -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
Transform – store in netcdf - script Create an empty netcdf file
>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
variables:
}
Transform – store in netcdf - script Add dimensions
>> nc_add_dimension(outputfile, 'coastward', 58)
>> nc_add_dimension(outputfile, 'time', 3)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
}
help nc_add_dimension
Transform – store in netcdf - script Add variables
coastwardVariable = struct(...
    'Name', 'coastward_distance', ...
    'Nctype', 'float', ...
    'Dimension', {{'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, coastwardVariable);
timeVariable = struct(...
    'Name', 'year', ...
    'Nctype', 'float', ...
    'Dimension', {{'time'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'year') ...
    );
nc_addvar(outputfile, timeVariable);
heightVariable = struct(...
    'Name', 'height', ...
    'Nctype', 'float', ...
    'Dimension', {{'time', 'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, heightVariable);
nc_dump(outputfile)
help nc_addvar
Transform – store in netcdf - script Result
netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward), shape = [58]
        coastward_distance:unit = "metre"
    float year(time), shape = [3]
        year:unit = "year"
    float height(time,coastward), shape = [3 58]
        height:unit = "metre"
}
Transform – store in netcdf - script Store variables
nc_varput(outputfile, 'height', data.series)
nc_varput(outputfile, 'year', data.time)
nc_varput(outputfile, 'coastward_distance', data.distance)
help nc_varput
Transform – store in netcdf - script Result: Netcdf file transect.nc
netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
    year = 1999, 2000, 2001 ;
    height = 353, 354, …
        -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, …
        -32, …
        -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
Transform – store in netcdf - script Read values surface(nc_varget(outputfile, 'height')')
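The nc_* helpers used in these slides come from an add-on toolbox. Recent MATLAB releases also ship built-in netcdf functions (nccreate, ncwrite, ncwriteatt, ncread, ncdisp), so the same create, store and read sequence could be sketched as follows. The data here are stand-in values with the same shapes as the output of readtransect, and the temporary file name is only for illustration.

```matlab
% Stand-in data with the same shapes as the struct returned by readtransect
data.series   = rand(3, 58);                 % (time, coastward), made-up values
data.distance = linspace(-135, 220, 58)';    % made-up, evenly spaced distances
data.time     = [1999; 2000; 2001];

outputfile = [tempname '.nc'];               % temporary file, illustration only

% Create the variables; each dimension is created the first time it is named
nccreate(outputfile, 'coastward_distance', ...
    'Dimensions', {'coastward', 58}, 'Datatype', 'single');
nccreate(outputfile, 'year', ...
    'Dimensions', {'time', 3}, 'Datatype', 'single');
% MATLAB lists the fastest-varying dimension first, so this variable
% appears as height(time, coastward) in an ncdump listing
nccreate(outputfile, 'height', ...
    'Dimensions', {'coastward', 58, 'time', 3}, 'Datatype', 'single');

% Attributes, with the same names as in the slides
ncwriteatt(outputfile, 'coastward_distance', 'unit', 'metre');
ncwriteatt(outputfile, 'year', 'unit', 'year');
ncwriteatt(outputfile, 'height', 'unit', 'metre');

% Store the values; height is transposed to match the 58 x 3 layout
ncwrite(outputfile, 'coastward_distance', data.distance);
ncwrite(outputfile, 'year', data.time);
ncwrite(outputfile, 'height', data.series');

% Inspect the file and read the values back
ncdisp(outputfile)
height = ncread(outputfile, 'height');       % 58 x 3 array
```

The transpose is the main difference from the nc_varput call in the slides: with the built-in functions the array is written in MATLAB's own dimension order.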
Transform – store in netcdf - convention Store in netcdf • What’s netcdf? • Write a script to transform data into netcdf • Using CF convention
Transform – store in netcdf - convention CF convention Standard used by USGS, NOAA, ArcGIS, GDAL Climate and Forecast (CF) Convention http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html Initially developed for • Climate and forecast data • Atmosphere, surface and ocean model-generated data • Also used for observational datasets • CF is the most widely used convention for geospatial netCDF data.
Transform – store in netcdf - convention Improve output • Store extra attributes • Title • Author • Standard_name
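A minimal sketch of storing such attributes with MATLAB's built-in ncwriteatt, assuming a file that already contains a height variable. The title and author strings are placeholders, and the standard name "altitude" is an assumption for this dataset; the right choice should be looked up in the CF standard name table. Note that CF spells the unit attribute "units" (plural).

```matlab
% Small example file with one variable, created only for this sketch
outputfile = [tempname '.nc'];
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3});

% Global attributes are written to the location '/'
ncwriteatt(outputfile, '/', 'title', 'Example transect heights');  % placeholder title
ncwriteatt(outputfile, '/', 'author', 'Your name');                % placeholder author
ncwriteatt(outputfile, '/', 'Conventions', 'CF-1.6');

% Variable attributes
ncwriteatt(outputfile, 'height', 'units', 'm');
ncwriteatt(outputfile, 'height', 'standard_name', 'altitude');     % assumed standard name

% Read an attribute back to check it
name = ncreadatt(outputfile, 'height', 'standard_name');
disp(name)
```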
Transform • Add metadata • Store in netcdf • Save script in subversion
Transform – save script Save script • Save script (local, using matlab https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/) • Add to subversion (local) • Commit => script into subversion (remote)