Introduction to Observational DataBase (ODB) Sami Saarinen, Paul Burton ECMWF 22-Mar-2006

Introduction to Observational DataBase (ODB) Sami Saarinen, Paul BurtonECMWF22-Mar-2006

Overview • Introduction to ODB • Creating a simple database • Use of simulobs2odb –program • Visualizing data using odbviewer, ODBTk • The bigger picture • ODB within IFS/4DVAR-system • A more complex database • Manipulating ODB from Fortran90 • Tools: odbless, odbdiff, odbcompress, odbdup, odb2netcdf

Introduction to ODB • ODB is a tailor made database software developed at ECMWF to manage very large observational data volumes through the IFS/4DVAR-system, and to enable flexible post-processing of observational data • Observational database usually contains following items: • Observation identification, position and time coordinates • Observation value, pressure levels, channel numbers • Various quality control flags • Obs. departures from background and analysis fields • Satellite specific information • Other closely related information

AMSU-A data before screening

Basic components of ODB • ODB/SQL-language • Data Definition Language: To describe what data items belong to database, what are their data types and how they are related (if any) to each other • Data Query Language: To query and return a subset of data which satisfies certain user specified conditions. This is the key feature of the ODB software !! • Fortran90 interface layer • Data manipulation : create, update & remove data • Execute ODB/SQL-queries and retrieve filtered data • To control MPI and OpenMP-parallelization

ODB/SQL compilation system

Typical ODB usage patterns • Database can be created interactively or in batch mode • We usually run our in-house BUFR2ODB in batch • New observation types can also be fed in via text file • Complete database manipulation currently prefers using Fortran90-interface, but read/only database can also be accessed via rudimentary client-server –interface (C/C++) • When database has been created, the application program normally queries data and places the result (also known as view) into a data matrix allocated by the user • There can be virtually any number of active views at any given time. These can be updated and fed back to database

Overview • Introduction to ODB • Creating a simple database • Use of simulobs2odb –program • Visualizing data using odbviewer, ODBTk • The bigger picture • ODB within IFS/4DVAR-system • A more complex database • Manipulating ODB from For • Tools: odbless, odbdif, odbcompress, odbdup, odb2netcdf

Creating a simple database • We will create a very simple database using text files • The 3 text files describe • Data layout i.e. what data items comprise this ODB • Location and time information of observations • Actual observation measurement information for each location at the given pressure levels • Feed these files into simulobs2odb-program • Discover the data values in database by using odbviewer

Data definition layout : MYDB.ddl CREATE TABLE hdr AS ( seqno pk1int, obstype pk1int, codetype pk1int, lat pk9real, lon pk9real, date yyyymmdd, time hhmmss, body @LINK, ); CREATE TABLE body AS ( entryno pk1int, varno pk1int, vertco_type pk1int, press pk9real, obsvalue pk9real, );

Input file#2 : hdr.txt #hdr obstype = 2 codetype = 141 seqno lat lon date time body.len 1 45 -15 20041101 000000 1

Input file#3 : body.txt #body entryno varno vertco_type press obsvalue 1 2 1 50000 251.0

Running simulobs2odb • Initialize ODB interactive environment : • use odb • Create database using the following simple command : • simulobs2odb –l MYDB –i hdr.txt –i body.txt • As a result of these commands, a small database called MYDB has been created and it contains one data pool with two tables hdr and body, which are linked (related) to each other via special @LINK data type • It is now easy to extend database by providing more data, or specifying more data items, or adding more tables, or all above at the same time

Visualizing with odbviewer • History: odbviewer was originally written to be used as a debugging tool for ODB software development • Linked with ECMWF graphics package MAGICS/MAGICS++ it displays coverage plots • Also a textual report generator • Displays output of data queries • “Sensitive” to ODB/SQL-language : tries automatically produce both coverage plot and textual report for the user • Textual report itself can be invaluable source of information for further post-processing tasks

Running odbviewer • Go to database directory • cd MYDB • Run • odbviewer –q ‘SELECT lat,lon,press,obsvalue\ FROM hdr, body \ WHERE obstype = 2’

odbviewer coverage plot Our observation !!

Some odbviewer [options] -h List of options (gimme some “help” !) -q ‘SQL-stmt’ Provide ODB/SQL-statement inline -v viewname/poolno Choose SQL name (& optionally pool number) -p “1-10,12,15” Choose from a subset of pools -R No radians-to-degrees conversion for (lat,lon) -r Enforce radians-to-degrees conversion -c Clean start (i.e. recompile all) -e editor Choose preferred editor -e batch Run in batch mode (same as –e pipe) -N Do not produce a report at all -I Do not show plot immediately -P projection Change projection -C file.cmap Supply a color map file -A plot_area Choose plotting area

ODBTk : The ODB Toolkit • GUI based ODB visualisation tool • Easy way for non-experts to build SQL • Interactive viewing of observational data • Can refine SQL “WHERE” statement as you view the data • Portable, lightweight application • Requires ODB, perl, Fortran90 & C compilers

ODBTk : Building an SQL • Twin views on structure • Hierarchical structure • Allows relationship between tables/columns to be seen • “Flat structure” • Easy to find a given column/member or table • Allows user to sort structure • SQL library • Both local & shared

Visualising Coverage

Visualising X-Y plots

AMSU-A data after screening Under 10% left active !!

ECMA/ODB CCMA/ODB Output BUFRs ODB within IFS/4DVAR-system

A more complex database • In the real world a database may contain many more tables (>>5) than in the simple example earlier • Each table can contain 10—50 data columns • There can also be a sophisticated data hierarchy (next slide) to describe potentially complex relationships between tables • In order to provide a good parallelperformance on supercomputers, data tables are furthermore divided into data pools • They behave like sub-databases within a database • Allows much bigger data sets than otherwise possible

Comprehensive data hierarchy

ECMWF BUFR to ODB conversion • ODBs at ECMWF are normally created by using bufr2odb • Enables MPI-parallel database creation  efficient • Allows retrospective inspection of Feedback BUFR data by converting it into ODB • bufr2odb can also be used interactively, for example: bufr2odb –i bufr_input_file –I 1-20 –n 4 • The preceding example creates 4 pools of ECMA database from the given BUFR input file, but includes only BUFR subtypes from 1 to 20 (inclusive) • Feedback BUFR to ODB works similarly: fb2odb –i feedback_bufr_file –n 8 –u 2

Manipulating ODB from Fortran90 • Currently Fortran90 is the only way to fill an ODB database • simulobs2odb is also a Fortran90-program underneath • likewise odbviewer or practically any other ODB-tool • Also: to fetch and update data, Fortran90 is necessary • ODB Fortran90 interface layer offers a comprehensive set of functions to • Open & close database • Attach to & execute precompiled ODB/SQL queries • Load, update & store queried data

An example ODB program program main use odb_module implicit none integer(4) :: h, rc, nra, nrows, ncols, npools, j, jp real(8), allocatable :: x(:,:) npools = 0 h = ODB_open(‘MYDB’, ’OLD’, npools=npools) < data manipulation loop ; see next page > rc = ODB_close(h, save=.TRUE.) end program main

Data manipulation loop DO jp=1,npools ! Execute SQL, allocate space, get data into matrix rc = ODB_select(h,’sqlview’,nrows,ncols,poolno=jp) allocate(x(nrows,0:ncols)) rc = ODB_get(h,’sqlview’,x,nrows,ncols,poolno=jp) ! Update data, put back to DB, deallocate space call update(x,nrows,ncols) ! Not an ODB-routine rc = ODB_put(h,’sqlview’,x,nrows,ncols,poolno=jp) deallocate(x) rc = ODB_cancel(h,’sqlview’,poolno=jp) ! Use the following only with READONLY-databases ! rc = ODB_release(h,poolno=jp) ENDDO

Compile, link and run • use odb # once per session • (2) odbcomp MYDB.ddl # once only;often from file MYDB.sch • (3) odbcomp sqlview.sql # recompile only when changed • (4) odbf90 main.F90 update.F90 –lMYDB –o main.x # link • (5) ./main.x # run

odbless • A textual browser that allows to look at ODB data page-by-page –basis (a little like Unix less-command): • By default calculates statistical summary for each retrieved data column • Cheap with near-optimal ODB data access pattern • User has a choice of specifying starting row • Usage: odbless –q ‘SELECT column(s) FROM table(s) WHERE …’ \ –s starting_row –n number_of_rows_to_display \ [–b buffer_size –X]

odbdiff • Enables to compare two ODB databases for differences • Very useful tool when trying to identify errors/differences between operational and experimental 4DVAR runs • Usage: odbdiff –q ‘SELECT …’ DATABASE1 DATABASE2 • By default brings up an xdiff-window with respect to diffs • If latitude and longitude were given in the data query, then also produces a difference plot using odbviewer-tool

odbcompress • Enables creation of very compact database from the existing one for • archiving purposes, or for smaller footprint • Makes post-processing considerably faster • At this point the user has choices of both • Truncating the data precision • Leaving out columns that are less of importance • Early tests show that this new tool achieves compression factors from 2.5X to 11X • the higher compression being for satellite data !!

odbdup • Duplicates database(s) by copying metadata (low volume), but shares the actual data (high volume) • Allows database sharing between multiple users • Over shared (e.g. NFS mounted) disk • Enables creation of time-series database, for example: odbdup –i “200601*/ECMA.conv” –o USERDB • The previous example creates a new database labelled as USERDB, which presumably spans over all the conventional observations during January 2006 • The heureka is : user has now access to a whole month of data as if it was situated in one single database !!

odb2netcdf • Translates the given ODB-query (or whole ODB-table) into a series of NetCDF-files, by default one file for each ODB data pool • Usage: odb2netcdf –q ‘SELECT …’ • The result files can be viewed with standard NetCDF tools like ncdump and ncview • The files can also be produced in NetCDF packed format (with a caveat of truncated precision)

Also … Some interesting facts • Written mainly in C-language • Except Fortran90-interface and IFS/4DVAR interface • Except BUFRODB (by Milan Dragosavac) • ODB/SQL is currently converted into C-code • 10 lines of SQL generates >> 100 lines of C-code • Standalone ODB installation (w/o IFS) is also available • Can be built in about 30 minutes for Linux/laptop • Tested at least on the following machines • SGI/Altix, IBM Power3/4, Linux Intel/AMD, VPP, … • Automatic binary data conversion guarantees database portability between different machines

… and some ODB “limitations” • ODB software is clearly meant for large scale computation since – given lots of memory and disk space, fast CPUs: • A single program can handle up to 2^31 ODB databases • A single database can have up to 2^31 data pools • A single database can have any number of tables • A single table in a data pool can have up to 2^31 rows and (by default) 9999 columns • A single ODB/SQL-query over active data pools can retrieve up to 2^31 rows in one go • These really big numbers show that ODBs potential is on parallel computers, but we haven’t forgotten desktop PCs!

Finally… • ODB software is developed to allow unprecedented amounts of satellite data through the IFS/4DVAR system • Software has been operational at ECMWF since June’2000, but is still evolving • Emphasis is now on graphical post-processing and how to enable fast access to very large amounts of data • Other ECMWF member states and co-operating countries that are also using or just becoming users of ODB • MeteoFrance, DWD, Hungary, Aladin/HIRLAM-nations • MetOffice is considering via collaboration with BoM • University of Vienna via re-analysis ERA40 collaboration

Introduction to Observational DataBase (ODB) Sami Saarinen, Paul Burton ECMWF 22-Mar-2006

Introduction to Observational DataBase (ODB) Sami Saarinen, Paul Burton ECMWF 22-Mar-2006

Presentation Transcript

Introduction to Observational DataBase ODB sami.saarinenecmwft paul.burtonecmwft 25-Apr-2007

ASTR_2011 Introduction to Observational Astronomy

ASTR_2011 Introduction to Observational Astronomy

Supplementary Fig . 1

ASTR_2011 Introduction to Observational Astronomy

Introduction to Database

Introduction to Database

Introduction to Database

Object-oriented DBMS

Introduction to Database

Introduction to Database

ASTR_2011 Introduction to Observational Astronomy

Introduction to Database

Observations Preprocessing Carla Cardinali

Overview

Introduction to Database

Introduction to Database

Introduction to Database

Introduction to Database

Introduction to Database