Introduction to Observational DataBase (ODB)
sami.saarinen@ecmwf.int, paul.burton@ecmwf.int
25-Apr-2007
Overview
• Introduction to ODB
• Creating a simple database
• Use of the simulobs2odb program
• Visualizing data using the basic odbviewer
• More complex databases
• ODB within IFS/4DVAR-system
• Manipulating ODB data from Fortran90
• A few tools: odbsql, odbdiff, odbcompress, odbdup, odb2netcdf
• ODBTk: a GUI-based ODB visualisation toolkit
  • A separate presentation & demo by Paul Burton
Introduction to ODB
• ODB is a tailor-made (hierarchical) database software developed at ECMWF to manage very large observational data volumes through the ECMWF IFS/4DVAR-system on highly parallel supercomputer systems
• ODB also enables flexible post-processing of observational data even on a desktop computer
• ODB software is written in C and Fortran90 and is available on virtually any Unix system (and now also for Windows/CYGWIN)
• The software can be installed from source code ("tar-ball"), normally in less than an hour
… Introduction to ODB
• An observational database usually contains the following items:
  • Observation identification, position and time coordinates
  • Observation value, pressure levels, channel numbers
  • Various quality control flags
  • Obs. departures from background and analysis fields
  • Satellite specific information
  • Other closely related information
• All information can be accessed via the ODB/SQL language and the Fortran90 interface
• Direct (read-only) access to ODB data is now also available
  • No programming effort is needed to "scan" ODB data
Basic components of ODB
• ODB/SQL language
  • Data Definition Language: describes which data items belong to the database, what their data types are and how they are related (if at all) to each other
  • Data Query Language: queries and returns a subset of data that satisfies certain user-specified conditions. This is the key feature of the ODB software !!
• Fortran90 interface layer
  • Data manipulation: create, update & remove data
  • Execute ODB/SQL queries and retrieve filtered data
  • Control MPI and OpenMP parallelization
Creating a simple ODB database
• We will create a very simple database using text files
• The 3 text files describe
  • Data layout, i.e. what data items will go into ODB
  • Location and time information of observations
  • Actual observation measurement information for each location at the given pressure levels
• Feed these files into the simulobs2odb program
• Inspect the data values in the database by using odbviewer
Data definition layout : MYDB.ddl
    CREATE TABLE hdr AS (
      seqno pk1int,
      obstype pk1int,
      codetype pk1int,
      lat pk9real,
      lon pk9real,
      date yyyymmdd,
      time hhmmss,
      body@LINK,
    );
    CREATE TABLE body AS (
      entryno pk1int,
      varno pk1int,
      vertco_type pk1int,
      press pk9real,
      obsvalue pk9real,
    );
Input file #2 : hdr.txt
    #hdr
    obstype = 2
    codetype = 141
    seqno lat lon date time body.len
    1 45 -15 20041101 000000 1
Input file #3 : body.txt
    #body
    entryno varno vertco_type press obsvalue
    1 2 1 50000 251.0
Running simulobs2odb
• Initialize the ODB interactive environment:
    use odb
• Create the database using the following simple command:
    simulobs2odb -l MYDB -i hdr.txt -i body.txt
• As a result of these commands, a small database called MYDB has been created. It contains one data pool with two tables, hdr and body, which are linked (related) to each other via the special @LINK data type
• It is now easy to extend the database by providing more data, specifying more data items, adding more tables, or all of the above at the same time
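To make the hdr-to-body relationship concrete, here is a minimal Python sketch of a toy reader for the two text files shown above. It is not how simulobs2odb works internally. It assumes that "name = value" lines set a constant column for every row, and that each hdr row owns the next body.len rows of body in file order (the running offset is bookkeeping invented for this illustration). The function names parse_table and attach_links are hypothetical.

```python
# Sketch only: a toy reader for the hdr.txt / body.txt layout shown above.
# Assumptions: "name = value" lines are per-table constants, and body rows
# are assigned to hdr rows in file order using the body.len counts.

HDR_TXT = """#hdr
obstype = 2
codetype = 141
seqno lat lon date time body.len
1 45 -15 20041101 000000 1
"""

BODY_TXT = """#body
entryno varno vertco_type press obsvalue
1 2 1 50000 251.0
"""


def parse_table(text):
    """Return (table_name, rows); each row is a dict of column -> string."""
    name, defaults, columns, rows = None, {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            name = line[1:].strip()            # "#hdr" -> "hdr"
        elif "=" in line:
            key, value = line.split("=", 1)    # constant column for every row
            defaults[key.strip()] = value.strip()
        elif columns is None:
            columns = line.split()             # first plain line names the columns
        else:
            row = dict(defaults)
            row.update(zip(columns, line.split()))
            rows.append(row)
    return name, rows


def attach_links(hdr_rows, body_rows, link="body"):
    """Pair each hdr row with its slice of body rows via the <link>.len count."""
    offset = 0
    linked = []
    for hdr in hdr_rows:
        length = int(hdr[link + ".len"])
        linked.append((hdr, body_rows[offset:offset + length]))
        offset += length                       # next hdr row starts after this slice
    return linked


_, hdr_rows = parse_table(HDR_TXT)
_, body_rows = parse_table(BODY_TXT)
for hdr, bodies in attach_links(hdr_rows, body_rows):
    for body in bodies:
        print(hdr["lat"], hdr["lon"], body["press"], body["obsvalue"])
```

With the single observation from the slides, the loop prints one line combining the header position with the body measurement, which is the kind of joined row a query over hdr and body is expected to return.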
Visualizing with odbviewer
• History: odbviewer was originally written as a debugging tool for ODB software development
• Linked with the ECMWF graphics package MAGICS/MAGICS++
  • Displays coverage plots
• Also a textual report generator
  • Displays output of data queries
• "Sensitive" to the ODB/SQL language: tries to automatically produce both a coverage plot and a textual report for the user
• The textual report itself can be an invaluable source of information for further post-processing tasks
• Makes use of the new and more economical tool odbsql
Running odbviewer
• Go to the database directory
    cd MYDB
• Run
    odbviewer -q 'SELECT lat,lon,press,obsvalue \
                  FROM hdr, body \
                  WHERE obstype = 2'
odbviewer coverage plot
[Figure: coverage plot, annotated "Our observation !!"]
Some odbviewer options
    -h                   List of options (gimme some "help" !)
    -q 'SQL-stmt'        Provide ODB/SQL-statement inline
    -v viewname/poolno   Choose SQL name (& optionally pool number)
    -p "1-10,12,15"      Choose from a subset of pools
    -R                   No radians-to-degrees conversion for (lat,lon)
    -r                   Enforce radians-to-degrees conversion
    -k                   Show (lat,lon) in degrees even if they were in radians in DB
    -c                   Clean start (i.e. recompile all)
    -e editor            Choose preferred editor
    -e batch             Run in batch mode (same as -e pipe)
    -N                   Do not produce a report at all
    -I                   Do not show plot immediately
    -P projection        Change display projection
    -C file.cmap         Supply a color map file
    -A plot_area         Choose plotting area
    -F                   Force use of the old-style odbviewer instead of 'odbsql'
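The pool subset given to -p mixes ranges and single numbers. The Python sketch below shows one plausible reading of such a specification. How odbviewer actually parses the string is not stated in the slides, so the inclusive-range interpretation and the helper name parse_pool_spec are assumptions.

```python
# Sketch only: one plausible interpretation of a pool list such as "1-10,12,15".
# Assumption: "a-b" means every pool from a to b inclusive.

def parse_pool_spec(spec):
    """Expand a comma-separated list of pool numbers and ranges."""
    pools = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue                      # tolerate stray commas
        if "-" in part:
            low, high = part.split("-", 1)
            pools.update(range(int(low), int(high) + 1))
        else:
            pools.add(int(part))
    return sorted(pools)


selected = parse_pool_spec("1-10,12,15")
print(selected)
```

For the example string this selects pools 1 through 10 plus 12 and 15, twelve pools in total.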
More complex databases
• In reality databases usually contain many more tables (>>5) than in the simple example earlier
• Each table can contain 10 to 50 data columns
• There can also be a sophisticated data hierarchy (see the next slide) to describe potentially quite complex relationships between tables
• To provide good parallel performance on supercomputers, data tables are furthermore divided into data pools, which also enables parallel I/O:
  • They behave like sub-databases within a database
  • They allow much bigger data sets than otherwise possible
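The idea behind data pools can be sketched in a few lines of Python: split the rows into independent partitions, let each worker process one partition, then combine the partial results. This illustrates the concept only. The round-robin assignment, the thread pool standing in for MPI tasks, and the names split_into_pools and partial_sum are all assumptions made for this sketch.

```python
# Sketch only: pools as independent partitions that can be processed in parallel.
# Assumptions: round-robin row assignment; threads stand in for MPI tasks.
from concurrent.futures import ThreadPoolExecutor


def split_into_pools(rows, npools):
    """Distribute rows over npools partitions, round-robin."""
    return [rows[i::npools] for i in range(npools)]


def partial_sum(pool):
    """Per-pool work: return (row count, sum of values)."""
    return len(pool), sum(pool)


# Hypothetical observation values
obsvalues = [250.0 + 0.5 * i for i in range(10)]
pools = split_into_pools(obsvalues, 4)

with ThreadPoolExecutor(max_workers=4) as executor:
    partials = list(executor.map(partial_sum, pools))

# Combine the per-pool results into a global statistic
total_count = sum(count for count, _ in partials)
total_sum = sum(value for _, value in partials)
mean = total_sum / total_count
print([len(p) for p in pools], total_count, mean)
```

Because each pool is self-contained, the per-pool step needs no communication. Only the final reduction touches results from all pools.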
ODB within IFS/4DVAR-system
[Figure: data flow diagram; labels: ECMA/ODB, CCMA/ODB, Output BUFRs]
AMSU-A data after screening
[Figure, annotated "Under 10% left active !!"]
Typical ODB usage at ECMWF …
• A database can be created interactively or in batch mode
  • We usually run our in-house BUFR2ODB in batch mode
  • New observation types can also be fed in via text files
• For complete database manipulation the Fortran90 interface is preferred, but any read-only database can also be accessed via a rudimentary client-server interface (C/C++)
• Another possibility is to run the new tool, odbsql
  • No need to use the ODB/SQL compilation system
  • No need to write a single line of Fortran90
  • The tool is under development
… Typical ODB usage at ECMWF
• Once a database has been created, the application program queries data via precompiled ODB/SQL and places the result data (also known as a view) into a data matrix allocated by the user program
• There can be virtually any number of active views at any given time. These can be updated and fed back to the database
• Thanks to ODB, the use of WMO BUFR has been minimized at ECMWF in order to enable faster and more robust processing of observations
ECMWF BUFR to ODB conversion
• ODBs at ECMWF are normally created by using bufr2odb
  • Enables efficient MPI-parallel database creation
  • Allows retrospective inspection of Feedback BUFR data by converting it into ODB (slow & not all data in BUFR)
• bufr2odb can also be used interactively, for example:
    bufr2odb -i bufr_input_file -I 1-20 -n 4
• The preceding example creates 4 pools of an ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive)
• Feedback BUFR to ODB works similarly:
    fb2odb -i feedback_bufr_file -n 8 -u 2
Manipulating ODB from Fortran90
• Currently Fortran90 is the only way to fill an ODB database
  • simulobs2odb is also a Fortran90 program underneath
  • Likewise odbviewer and practically any other ODB tool
• Also: Fortran90 is necessary to fetch and update data
• The ODB Fortran90 interface layer offers a comprehensive set of functions to
  • Open & close a database
  • Attach to & execute precompiled ODB/SQL queries
  • Load, update & store queried data
An example ODB program
    program main
      use odb_module
      implicit none
      integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp
      real(8), allocatable :: x(:,:)
      npools = 0
      h = ODB_open('MYDB', 'OLD', npools=npools)
      ! < data manipulation loop ; see next page >
      rc = ODB_close(h, save=.TRUE.)
    end program main
Data manipulation loop
    DO jp=1,npools
      ! Execute SQL, allocate space, get data into matrix
      rc = ODB_select(h,'sqlview',nrows,ncols,poolno=jp)
      allocate(x(nrows,0:ncols))
      rc = ODB_get(h,'sqlview',x,nrows,ncols,poolno=jp)
      ! Update data, put back to DB, deallocate space
      call update(x,nrows,ncols) ! Not an ODB-routine
      rc = ODB_put(h,'sqlview',x,nrows,ncols,poolno=jp)
      deallocate(x)
      rc = ODB_cancel(h,'sqlview',poolno=jp)
      ! Use the following only with READONLY-databases
      ! rc = ODB_release(h,poolno=jp)
    ENDDO
Compile, link and run
    (1) use odb                                        # once per session
    (2) odbcomp MYDB.ddl                               # once only; often from file MYDB.sch
    (3) odbcomp sqlview.sql                            # recompile only when changed
    (4) odbf90 main.F90 update.F90 -lMYDB -o main.x    # compile & link
    (5) ./main.x                                       # run
odbsql
• A new tool to access ODB data in read-only mode
• Does not generate C code, but dives directly into the data
• Usually faster than generated C code, with the exception of accessing large amounts of satellite data (investigated)
• The tool is under active development right now
• Usage:
    odbsql -q 'SELECT column(s) FROM table(s) WHERE …' \
           -s starting_row -n number_of_rows_to_display \
           [-X] [other_options]
ODB/SQL – examples (1)
    SET $t2m = 39;    // Scalar parameters, whose values …
    SET $synop = 1;   // … can be overridden in Fortran90
    CREATE VIEW t2m AS
      SELECT an_depar, fg_depar, lat, lon, obsvalue
      FROM hdr, body
      WHERE obstype = $synop        // Give me synops
        AND varno@body = $t2m       // Give me 2 meter temperatures
        AND obsvalue IS NOT NULL ;  // Don't want missing data
ODB/SQL – examples (2)
    SELECT count(*), avg(obsvalue), stdev(fg_depar)
    FROM hdr, body
    WHERE obstype = $synop && varno = $t2m
      AND obsvalue IS NOT NULL;

    // Observation count per (obstype,codetype)-pair :
    SELECT obstype, codetype, count(*)
    FROM hdr ;

    SELECT varno, avg(fg_depar), CORR(fg_depar, an_depar)
    FROM body
    WHERE fg_depar IS NOT NULL ;
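For readers who want to check what the first aggregate query above computes, the following Python sketch evaluates the same filter and aggregates over a handful of rows. The row values are invented for illustration. Whether ODB's stdev is the sample or the population form is not stated in the slides, so the sample form used here is an assumption, and pearson is a hypothetical helper standing in for CORR.

```python
# Sketch only: the filter and aggregates of the first query, in plain Python.
# Assumptions: invented rows; stdev taken as the sample standard deviation.
import math
import statistics

SYNOP, T2M = 1, 39   # values from the SET statements in examples (1)

# (obstype, varno, obsvalue, fg_depar, an_depar); None marks a missing value
rows = [
    (1, 39, 280.0, 1.0, 0.5),
    (1, 39, 282.0, -1.0, -0.5),
    (1, 39, 281.0, 0.0, 0.0),
    (1, 39, None, 0.3, 0.1),     # dropped: obsvalue IS NOT NULL fails
    (1, 1, 1000.0, 2.0, 1.0),    # dropped: varno is not $t2m
    (2, 39, 275.0, 3.0, 1.5),    # dropped: obstype is not $synop
]

selected = [r for r in rows
            if r[0] == SYNOP and r[1] == T2M and r[2] is not None]

count = len(selected)
avg_obsvalue = statistics.mean(r[2] for r in selected)
stdev_fg_depar = statistics.stdev(r[3] for r in selected)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


corr = pearson([r[3] for r in selected], [r[4] for r in selected])
print(count, avg_obsvalue, stdev_fg_depar, corr)
```

Three of the six rows survive the WHERE clause. Changing SYNOP or T2M mimics overriding the $-parameters from the calling program.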
odbdiff
• Enables comparison of two ODB databases for differences
• A very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs
  • Usually a non-trivial task
• Usage:
    odbdiff -q 'SELECT …' /dir1/DATABASE1 /dir2/DATABASE2
• By default the command brings up an xdiff window showing the differences
• If latitude and longitude are also given in the data query, it also produces a difference plot using the odbviewer tool
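The comparison idea can be illustrated in Python with the standard difflib module: run the same query against both databases, then diff the two textual reports. The report lines below are invented, and this is not a description of how odbdiff is implemented.

```python
# Sketch only: diffing two textual query reports with the standard library.
# Assumption: the report lines are invented stand-ins for real query output.
import difflib

operational = [
    "lat lon obsvalue",
    "45.00 -15.00 251.0",
    "46.00 -14.00 252.5",
]
experimental = [
    "lat lon obsvalue",
    "45.00 -15.00 251.0",
    "46.00 -14.00 252.9",
]

diff = list(difflib.unified_diff(
    operational, experimental,
    fromfile="/dir1/DATABASE1", tofile="/dir2/DATABASE2",
    lineterm="",
))

# Keep only the changed rows, skipping the two file-header lines
changed = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
for line in changed:
    print(line)
```

Only the row whose obsvalue differs between the two reports shows up, once with a leading "-" and once with a leading "+".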
odbcompress
• Enables creation of very compact databases from existing ones
  • for archiving purposes, or
  • for a smaller database footprint (disk occupancy)
• Makes post-processing considerably faster
• The user can choose to
  • Truncate the data precision, and/or
  • Leave out columns that are of lesser importance
• Typical compression ratios vary between 2.5X … 11X
  • the high compression ratios are achieved for satellite data !!
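A back-of-the-envelope Python sketch shows how the two user choices (truncating precision and leaving out columns) combine into a size ratio. ODB's actual packing methods are not described in the slides. Here 8-byte doubles are simply re-stored as 4-byte floats and two of five columns are dropped, and the column layout is an assumption.

```python
# Sketch only: how precision truncation and column dropping reduce row size.
# Assumptions: five 8-byte columns originally; three kept as 4-byte floats.
import struct

# Hypothetical rows: lat, lon, press, obsvalue, extra
rows = [(45.0, -15.0, 50000.0, 251.0, 0.25)] * 1000

# Original layout: five double-precision values per row
full = b"".join(struct.pack("<5d", *row) for row in rows)

# Compact layout: keep lat, lon, obsvalue only, in single precision
compact = b"".join(struct.pack("<3f", row[0], row[1], row[3]) for row in rows)

ratio = len(full) / len(compact)
print(len(full), len(compact), round(ratio, 2))
```

Each row shrinks from 40 to 12 bytes, a ratio of about 3.3, which falls inside the quoted range. The price is the precision caveat: values that need more than single precision no longer round-trip exactly.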
odbdup/odbmerge
• Allows, for example, database sharing between multiple users
  • Over shared (e.g. NFS, Lustre, GPFS, GFS) disks
• Duplicates [merges] database(s) by copying the metadata (low in volume), but shares the actual (high-volume) binary data
• Also enables creation of a time-series database, for example:
    odbdup -i "200701*/ECMA.conv" -o USERDB
• The previous example creates a new database labelled USERDB, which presumably spans all conventional observations during January 2007
• The main point: the user now has access to a whole month of data as if it were a single database !!
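To see which inputs a wildcard such as "200701*/ECMA.conv" would pick up, the Python sketch below matches it against a list of candidate paths. The directory naming (date plus a two-digit hour) is purely hypothetical, and the matching is done with Python's fnmatch rather than by odbdup itself.

```python
# Sketch only: which paths a wildcard like "200701*/ECMA.conv" selects.
# Assumption: hypothetical directory names of the form YYYYMMDDHH.
from fnmatch import fnmatchcase

pattern = "200701*/ECMA.conv"

candidates = [f"200701{day:02d}00/ECMA.conv" for day in range(1, 32)]
candidates += [
    "2007020100/ECMA.conv",    # February: must not match
    "2007010100/ECMA.amsua",   # other database name: must not match
]

matched = [path for path in candidates if fnmatchcase(path, pattern)]
print(len(matched), matched[0], matched[-1])
```

All 31 January directories match and the two extra entries are rejected, which is why the resulting database can span the whole month.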
odb2netcdf
• Translates the result of a given ODB query (or a whole ODB table) into a series of NetCDF files, by default one file for each ODB data pool (i.e. partition)
• Usage:
    odb2netcdf -q 'SELECT …' [-p pool_number] [-P]
• The result files can be viewed with standard NetCDF tools like ncdump and ncview
• The files can also be created in the NetCDF packed format if the -P option is used (caveat: truncated data precision)
Some interesting facts on ODB
• Written mainly in C
  • Except the Fortran90 interface and the IFS/4DVAR interface
  • Except BUFR2ODB (by Milan Dragosavac, ECMWF)
• ODB/SQL is currently converted into C code
  • 10 lines of SQL generate >> 100 lines of C code
• Standalone ODB installation (w/o IFS) is also available
• Tested at least on the following machines
  • SGI/Altix, IBM Power3/4/5, Linux Intel/AMD
  • Fujitsu VPPs, NEC SX, Cray XT3/4
• Automatic binary data conversion guarantees database portability between different machines
… and some ODB "limitations"
• ODB software is clearly meant for large-scale computation. Given lots of memory and disk space and fast CPUs:
  • A single program can handle up to 2^31 ODB databases
  • A single database can have up to 2^31 data pools
  • A single database can have any number of tables
  • A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns
  • A single ODB/SQL query over active data pools can retrieve up to 2^31 rows in one go
• These really big numbers show that ODB's potential is on parallel computers. Yet we haven't forgotten the PCs!
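A quick piece of arithmetic in Python puts these limits in perspective. It assumes that every value of a fully populated table is held as an unpacked 8-byte real, like the real(8) matrix in the Fortran90 example. Packed storage on disk would be smaller.

```python
# Sketch only: size of one maximal table in one pool, held as 8-byte reals.
# Assumption: unpacked real(8) storage for every value.
MAX_ROWS = 2 ** 31          # rows per table in a data pool
MAX_COLS = 9999             # default column limit
BYTES_PER_VALUE = 8         # real(8)

table_bytes = MAX_ROWS * MAX_COLS * BYTES_PER_VALUE
table_tib = table_bytes / 2 ** 40

print(MAX_ROWS, table_bytes, table_tib)
```

A single maximal table would need roughly 156 TiB, far beyond one machine's memory, which is consistent with the remark that these limits only become relevant on large parallel systems.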
Finally…
• ODB software was developed to allow unprecedented amounts of satellite data through the IFS/4DVAR system
• The software has been operational at ECMWF since June 2000, but is still evolving
• Emphasis is now on graphical post-processing and on enabling fast access to very large amounts of data
• Who is using ODB outside ECMWF? At least …
  • MeteoFrance, Hungarian MS, SMHI, FMI
  • Aladin and some HIRLAM nations
  • Australian Bureau of Meteorology
  • University of Vienna via the ERA40 re-analysis collaboration