Data Standards Workflow

Data Standards Workflow: Extract, Load, Provide, Transform. Raw data, scripts, database, charts and maps. Store raw data in subversion to keep track of history. Add meta information. Script to convert raw data into netcdf. Stored files (netcdf) accessible through the web.

marlis

Presentation Transcript


  1. Data Standards Workflow
     Extract, Load, Provide, Transform
     Raw data, Scripts, Database, Charts & Maps
     • Store raw data in subversion to keep track of history
     • Add meta information
     • Script to convert raw data into netcdf
     • Stored files (netcdf) accessible through the web
     Tools and websites: OpenEarthRawData, OpenEarth, OPeNDAP, OpenEarthTools

  2. Data Standards Workflow (overview slide repeated)

  3. Transform
     • Add metadata
     • Store in netcdf
     • Save script in subversion

  4. Transform: Add metadata
     • Use the INSPIRE metadata form to store information about the dataset.
     • http://www.inspire-geoportal.eu/inspireEditor.htm
     • Click launch editor

  5. Transform – add metadata: validation
     Turn validation on.

  6. Transform – add metadata: File identification
     Location in subversion micore

  7. Transform – add metadata: quality
     History of your data.

  8. Transform – add metadata: constraints
     Please fill in limitations of use.

  9. Transform – add metadata: Save metadata file
     • Save metadata file (local)
     • Add to subversion (local)
     • Commit => metadata into subversion (remote)
     • Store in course/Pcnumber/inspire_description.xml

  10. Transform
      • Add metadata
      • Store in netcdf
      • Save script in subversion

  11. Transform: Store in netcdf
      • What’s netcdf?
      • Write a script to transform data into netcdf
      • Using CF convention

  12. Transform – store in netcdf - netcdf: What is netcdf
      • Data format defined by Unidata
      • Data store used for coverage data and multidimensional data
      • CF Metadata convention

  13. Transform – store in netcdf - netcdf: What is netcdf
      [Figure: axes X, Y, Z and T]
      • An array-based data structure for storing multidimensional data
      • N-dimensional coordinate systems
        • X coordinate (e.g. longitude)
        • Y coordinate (e.g. latitude)
        • Z coordinate (e.g. altitude)
        • Time dimension
        • … other dimensions
      • Variables – support for multiple variables
        • Temperature, humidity, pressure, salinity, etc.
      • Geometry – implicit or explicit
        • Regular grid (implicit)
        • Irregular grid
        • Points
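
To make the array-based model on this slide concrete, here is a minimal MATLAB sketch. The variable names, sizes and coordinate values are illustrative assumptions, not taken from the slides.

```matlab
% Coordinate vectors: one per dimension (values are made up for illustration)
x = linspace(3.0, 4.0, 5);      % e.g. longitude, 5 points
y = linspace(51.0, 52.0, 4);    % e.g. latitude, 4 points
z = [0 10 20];                  % e.g. altitude, 3 levels
t = [1999 2000];                % time, 2 steps

% One variable defined on the (t, z, y, x) grid; NaN marks "no data yet"
temperature = NaN(numel(t), numel(z), numel(y), numel(x));

% On a regular grid the geometry is implicit: the position of a value
% follows from its indices and the coordinate vectors.
temperature(2, 1, 3, 4) = 12.5;
fprintf('T = %.1f at t=%d, z=%g, y=%g, x=%g\n', ...
    temperature(2, 1, 3, 4), t(2), z(1), y(3), x(4));
```

A second variable (humidity, salinity, ...) would be another array of the same shape that shares the same coordinate vectors.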

  14. Transform – store in netcdf - netcdf: Storing Multidimensional Data
      [Figure: axes X, Y, Z; labels "14 numbers" and "32 numbers"]

  15. Transform – store in netcdf - netcdf: Data Model
      Data model for netcdf and others. Also usable for HDF, OPeNDAP, GRIB, etc. See the Java library for details.

  16. Transform – store in netcdf – netcdf - applications: ArcGIS
      ArcGIS also reads and writes netcdf files.

  17. Transform – store in netcdf - netcdf: Your favorite text editor
      XML representation of a netcdf file

  18. Transform – store in netcdf - netcdf: Other Tools
      Not so stable. Very useful.
      • IDV
      • NCO
          # diff
          ncdiff -v time file1.nc file2.nc
          # compression & packing
          ncpdq -4 -L 9 in.nc out.nc   # Deflated packing (~80% lossy compression)
          # selecting variables by regex
          ncks -v '^Q..' in.nc         # Q01--Q99, QAA--QZZ, etc.
      • Web hyperslabs, cool!

  19. Data Standards Workflow (overview slide repeated)

  20. Transform – store in netcdf - script: Store in netcdf
      • What’s netcdf?
      • Write a script to transform data into netcdf
      • Using CF convention

  21. Transform – store in netcdf - script: Write script
      • Read raw data
        • Read header line
        • Read data
        • Read all data
        • Create function to read all data
        • Use function in MATLAB
      • Raw data into empty netcdf file
        • Create empty netcdf file
        • Add dimensions and variables
        • Store variables
        • Read values

  22. Transform – store in netcdf - script: Reading raw data into memory
      • Use one of the following MATLAB functions to read the file data into an array
        • fscanf

  23. Transform – store in netcdf - script: Example: Transect.txt file
      Header line: year, number of points
      Points: X Z X Z ….
          1999 58
          -135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951
          …
          200 -2415 210 -2995 220 -3595 99999999999 99999999999
          2000 58
          -135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951
          ….
          9999999
      Location: OpenEarthRawData\course\example\raw

  24. Transform – store in netcdf - script: Read header line
          >> fid = fopen('..\raw\transect.txt')
          fid =
              15
          >> header = fscanf(fid, '%d', 2)
          header =
              2000
              58
          >> year = header(1)
          year =
              2000
          >> npoint = header(2)
          npoint =
              58

  25. Transform – store in netcdf - script: Read data
      Step 1:
          >> % read data
          data = fscanf(fid, '%d', npoint*2)
          data =
              -150
              3741
              -140
              3581
              -135
      Step 2:
          >> data = reshape(data, [2, npoint])
          data =
            Columns 1 through 7
              -150  -140  -135  -130
              3741  3581  3531  3541
      Step 3:
          >> % use column vectors
          data = data'
          data =
              -150  3741
              -140  3581
              -135  3531
      Script:
          % read header
          header = fscanf(fid, '%d', 2);
          year = header(1);
          % store year in time
          time(i) = year;
          npoint = header(2);
          % read data
          data = fscanf(fid, '%d', npoint*2);
          data = reshape(data, [2, npoint]);
          % use column vectors
          data = data';
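
The reshape step works because MATLAB fills arrays column by column. A small sketch with the first three points of the example transect shows how the interleaved X Z values end up as one point per row; the variable names `raw`, `pairs` and `points` are illustrative only.

```matlab
% Interleaved values as fscanf would return them: x1 z1 x2 z2 x3 z3
raw = [-135; 3531; -130; 3541; -125; 3631];
npoint = numel(raw) / 2;

% reshape fills column by column, so each column becomes one (x, z) pair
pairs = reshape(raw, [2, npoint]);

% transpose so that each row is one point: column 1 = x, column 2 = z
points = pairs';
x = points(:, 1);
z = points(:, 2);
disp(points)
```

Reshaping directly to `[npoint, 2]` would not give the same result, because the first column would then hold the first `npoint` values of `raw` rather than every other value.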

  26. Transform – store in netcdf - script: Read all data
          % preallocate all data
          % (time, coastward)
          transectseries = NaN(3, 58);
          coastward_distance = NaN(58, 1);
          time = NaN(3, 1);
          % open file and get file id
          fid = fopen('..\raw\transect.txt');
          i = 1;
          while (~feof(fid))
              % read header
              header = fscanf(fid, '%d', 2);
              year = header(1);
              % store year in time
              time(i) = year;
              npoint = header(2);
              % read data
              data = fscanf(fid, '%d', npoint*2);
              data = reshape(data, [2, npoint]);
              % use column vectors
              data = data';
              % store data in transect series
              transectseries(i,:) = data(:,2);
              coastward_distance(:) = data(:,1);
              % skip the rest of the current line
              fgetl(fid);
              i = i + 1;
          end
          % close the file when done
          fclose(fid);

  27. Transform – store in netcdf - script: Create a function
          function transect = readtransect(filename)
              % preallocate all data
              % (time, coastward)
              transectseries = NaN(3, 58);
              coastward_distance = NaN(58, 1);
              time = NaN(3, 1);
              % open file and get file id
              fid = fopen(filename);
              i = 1;
              while (~feof(fid))
                  % read header
                  header = fscanf(fid, '%d', 2);
                  year = header(1);
                  % store year in time
                  time(i) = year;
                  npoint = header(2);
                  % read data
                  data = fscanf(fid, '%d', npoint*2);
                  data = reshape(data, [2, npoint]);
                  % use column vectors
                  data = data';
                  % store data in transect series
                  transectseries(i,:) = data(:,2);
                  coastward_distance(:) = data(:,1);
                  % skip the rest of the current line
                  fgetl(fid);
                  i = i + 1;
              end
              % close the file when done
              fclose(fid);
              transect = struct('series', transectseries, ...
                  'distance', coastward_distance, 'time', time);
          end
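
The function above hard-codes 3 years and 58 points. As a hedged variation, the sketch below takes the sizes from the file itself and stops when no complete header can be read. It assumes a file layout of one header line "year npoint" followed by npoint "x z" pairs, and it writes its own tiny sample file (with made-up values) so that it runs on its own.

```matlab
% Write a tiny sample file in the assumed layout (values are illustrative)
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '1999 3\n-135 3531 -130 3541 -125 3631\n');
fprintf(fid, '2000 3\n-135 3535 -130 3545 -125 3625\n');
fclose(fid);

fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end

time = [];
series = [];
distance = [];
while true
    % fscanf returns fewer than 2 values once the records run out
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break
    end
    npoint = header(2);
    data = fscanf(fid, '%d', npoint * 2);
    if numel(data) < npoint * 2
        break                          % incomplete record, stop reading
    end
    data = reshape(data, [2, npoint])';
    time(end + 1, 1) = header(1);      % one year per record
    series(end + 1, :) = data(:, 2)';  % assumes every record has npoint values
    distance = data(:, 1);
    fgetl(fid);                        % skip the rest of the current line
end
fclose(fid);
delete(filename);

transect = struct('series', series, 'distance', distance, 'time', time);
disp(transect)
```

Growing the arrays inside the loop is slower than preallocating, which is fine for a handful of records; for large files a first pass to count the records would be the more usual choice.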

  28. Transform – store in netcdf - script: Use the new function
          >> data = readtransect('..\raw\transect.txt')
          data =
                series: [3x58 double]
              distance: [58x1 double]
                  time: [3x1 double]

  29. Transform – store in netcdf - script: Loading data into netcdf
      • What does a netcdf file look like
      • Required meta information

  30. Transform – store in netcdf - script: Netcdf file transect.nc
          netcdf transect {
          dimensions:
              coastward = 58 ;
              time = 3 ;
          variables:
              float coastward_distance(coastward) ;
                  coastward_distance:unit = "metre" ;
              float year(time) ;
                  year:unit = "year" ;
              float height(time, coastward) ;
                  height:unit = "metre" ;
          data:
              coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
              year = 1999, 2000, 2001 ;
              height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
          }

  31. Transform – store in netcdf - script: Create an empty netcdf file
          >> nc_create_empty(outputfile)
          >> nc_dump(outputfile)
          netcdf transect.nc {
          dimensions:
          variables:
          }

  32. Transform – store in netcdf - script: Add dimensions
          >> nc_add_dimension(outputfile, 'coastward', 58)
          >> nc_add_dimension(outputfile, 'time', 3)
          >> nc_dump(outputfile)
          netcdf transect.nc {
          dimensions:
              coastward = 58 ;
              time = 3 ;
          variables:
          }
      help nc_add_dimension

  33. Transform – store in netcdf - script: Add variables
          coastwardVariable = struct(...
              'Name', 'coastward_distance', ...
              'Nctype', 'float', ...
              'Dimension', {{'coastward'}}, ...
              'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
              );
          nc_addvar(outputfile, coastwardVariable);
          timeVariable = struct(...
              'Name', 'year', ...
              'Nctype', 'float', ...
              'Dimension', {{'time'}}, ...
              'Attribute', struct('Name', 'unit', 'Value', 'year') ...
              );
          nc_addvar(outputfile, timeVariable);
          heightVariable = struct(...
              'Name', 'height', ...
              'Nctype', 'float', ...
              'Dimension', {{'time', 'coastward'}}, ...
              'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
              );
          nc_addvar(outputfile, heightVariable);
          nc_dump(outputfile)
      help nc_addvar

  34. Transform – store in netcdf - script: Result
          netcdf transect.nc {
          dimensions:
              coastward = 58 ;
              time = 3 ;
          variables:
              float coastward_distance(coastward), shape = [58]
                  coastward_distance:unit = "metre"
              float year(time), shape = [3]
                  year:unit = "year"
              float height(time,coastward), shape = [3 58]
                  height:unit = "metre"
          }

  35. Transform – store in netcdf - script: Store variables
          nc_varput(outputfile, 'height', data.series)
          nc_varput(outputfile, 'year', data.time)
          nc_varput(outputfile, 'coastward_distance', data.distance)
      help nc_varput

  36. Transform – store in netcdf - script: Result: Netcdf file transect.nc
          netcdf transect {
          dimensions:
              coastward = 58 ;
              time = 3 ;
          variables:
              float coastward_distance(coastward) ;
                  coastward_distance:unit = "metre" ;
              float year(time) ;
                  year:unit = "year" ;
              float height(time, coastward) ;
                  height:unit = "metre" ;
          data:
              coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
              year = 1999, 2000, 2001 ;
              height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
          }

  37. Transform – store in netcdf - script: Read values
          surface(nc_varget(outputfile, 'height')')
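
The same create, define, store and read sequence can also be sketched with MATLAB's built-in high-level netCDF functions (`nccreate`, `ncwriteatt`, `ncwrite`, `ncread`, `ncdisp`) instead of the `nc_*` functions used on the slides. This is an alternative under stated assumptions, not the course's method: the data set below is a made-up 2-year, 3-point stand-in for the output of `readtransect`, and the file is written to a temporary location.

```matlab
% Small illustrative data set (2 years, 3 points); values are made up
data.time = [1999; 2000];
data.distance = [-135; -130; -125];
data.series = [353 354 363; 354 355 362];   % size: time x coastward

outputfile = [tempname '.nc'];
ntime = numel(data.time);
npoint = numel(data.distance);

% nccreate defines the dimensions together with the variable
nccreate(outputfile, 'coastward_distance', ...
    'Dimensions', {'coastward', npoint}, 'Datatype', 'single');
nccreate(outputfile, 'year', ...
    'Dimensions', {'time', ntime}, 'Datatype', 'single');
% The dimension order given here matches the MATLAB array shape
nccreate(outputfile, 'height', ...
    'Dimensions', {'time', ntime, 'coastward', npoint}, 'Datatype', 'single');

% Attributes, using the same attribute name as the slides
ncwriteatt(outputfile, 'coastward_distance', 'unit', 'metre');
ncwriteatt(outputfile, 'year', 'unit', 'year');
ncwriteatt(outputfile, 'height', 'unit', 'metre');

% Store the values
ncwrite(outputfile, 'coastward_distance', data.distance);
ncwrite(outputfile, 'year', data.time);
ncwrite(outputfile, 'height', data.series);

% Read back and inspect
height = ncread(outputfile, 'height');
ncdisp(outputfile);
```

With the real transect data, `surface(double(height)')` would give a plot comparable to the one on this slide.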

  38. Transform – store in netcdf - convention: Store in netcdf
      • What’s netcdf?
      • Write a script to transform data into netcdf
      • Using CF convention

  39. Transform – store in netcdf - convention: CF convention
      Climate and Forecast (CF) Convention
      http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html
      Standard used by USGS, NOAA, ArcGIS, GDAL
      • Initially developed for
        • Climate and forecast data
        • Atmosphere, surface and ocean model-generated data
      • Also used for observational datasets
      • CF is the most widely used convention for geospatial netCDF data.

  40. Transform – store in netcdf - convention: Improve output
      • Store extra attributes
        • Title
        • Author
        • Standard_name
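
As a rough illustration of storing such extra attributes, the sketch below uses MATLAB's built-in `ncwriteatt` on a freshly created temporary file. The title and author strings are placeholders, and `altitude` is a real CF standard name chosen here only as an assumption; whether it is the right name for these transect heights needs checking against the CF standard name table. Note that CF spells the unit attribute `units`, not `unit` as in the earlier slides.

```matlab
% Create a file with one variable to attach the attributes to
outputfile = [tempname '.nc'];
nccreate(outputfile, 'height', ...
    'Dimensions', {'time', 3, 'coastward', 58}, 'Datatype', 'single');

% Global attributes ('/' addresses the file itself); strings are placeholders
ncwriteatt(outputfile, '/', 'title', 'Example coastal transect heights');
ncwriteatt(outputfile, '/', 'author', 'Your name here');

% Variable attributes: CF expects "units" and, where one exists, a standard_name
ncwriteatt(outputfile, 'height', 'units', 'm');
ncwriteatt(outputfile, 'height', 'standard_name', 'altitude');  % assumption
ncwriteatt(outputfile, 'height', 'long_name', 'height of the transect profile');

ncdisp(outputfile);
```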

  41. Transform
      • Add metadata
      • Store in netcdf
      • Save script in subversion

  42. Transform – save script: Save script
      • Save script (local, using MATLAB: https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/)
      • Add to subversion (local)
      • Commit => script into subversion (remote)
