The NERC DataGrid: Making data interdisciplinary! Bryan Lawrence (BADC), Roy Lowry (BODC), Kerstin Kleese van Dam, Kevin O’Neill, Andrew Woolf (CLRC), Dean Williams (PCMDI) and the rest of the NDG team.
Outline • Brief intro to the NDG goals: • Fat clients, and inter-disciplinarity … • Evolution: from BODC & BADC to the wider community. • Modular design • Keeping track of the metadata • Harvesting • Where we are now … (we only started in September 2002) • What lies ahead?
ESG/VCDAT: Example of a Client Application • We will: • Provide Python-based classes for our observational data to complement the access to 3D gridded data. • Make it possible to overlay model and observational data using grid tools.
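As a rough illustration of what overlaying observational data on gridded model data could look like, here is a minimal sketch in plain Python. It is not the NDG or CDAT API: the class names `PointObservation` and `RegularGrid`, the `overlay` function, and the sample numbers are all hypothetical. It pairs each observation with the nearest grid cell and ignores longitude wrap-around.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PointObservation:
    """A single observed value at a latitude/longitude (hypothetical class)."""
    lat: float
    lon: float
    value: float


@dataclass
class RegularGrid:
    """A 2D model field on a regular lat/lon grid (hypothetical class).

    values[i][j] is the field at (lats[i], lons[j]).
    """
    lats: Sequence[float]
    lons: Sequence[float]
    values: Sequence[Sequence[float]]

    def nearest(self, lat: float, lon: float) -> float:
        # Pick the grid indices whose coordinates are closest to the point.
        # Longitude wrap-around at the date line is deliberately ignored.
        i = min(range(len(self.lats)), key=lambda k: abs(self.lats[k] - lat))
        j = min(range(len(self.lons)), key=lambda k: abs(self.lons[k] - lon))
        return self.values[i][j]


def overlay(
    grid: RegularGrid, observations: Sequence[PointObservation]
) -> List[Tuple[PointObservation, float, float]]:
    """Return (observation, model value, observation minus model) triples."""
    result = []
    for obs in observations:
        model_value = grid.nearest(obs.lat, obs.lon)
        result.append((obs, model_value, obs.value - model_value))
    return result


# Made-up sample data, for illustration only.
grid = RegularGrid(
    lats=[50.0, 52.0, 54.0],
    lons=[-4.0, -2.0, 0.0],
    values=[[10.0, 11.0, 12.0], [9.0, 10.0, 11.0], [8.0, 9.0, 10.0]],
)
observations = [
    PointObservation(lat=51.8, lon=-1.9, value=10.5),
    PointObservation(lat=54.2, lon=-3.6, value=7.0),
]
pairs = overlay(grid, observations)
```

A real client would more likely interpolate than take the nearest cell, and would carry units and time coordinates as well; the point here is only the shape of the pairing.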
NDG: Required “Data” Metadata Need a tool to generate B!
Using Globus GridFTP Statement: GridFTP is a “faster-stronger” FTP. Is it? Yes: tests between DL and RAL, and between RAL and POL, suggest that GridFTP is on average about twice as fast as FTP for large file transfers (although peak FTP rates can reach ¾ of the GridFTP rate). And: at 2 Mbit/s, network reliability and bandwidth are not good enough for sustained large file transfers without GridFTP! (A 500 Mbyte file requires 40 minutes, cf. 80 minutes.) But: 2 Mbit/s is too slow to deal with the file sizes of interest! Big problem for NERC … The same 40-minute transfer would take under a minute between DL and RAL. • This means a different trade-off between client and server processing. • Fat clients for fat pipes, thin clients for thin pipes.
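The transfer times quoted on this slide can be sanity-checked with a little arithmetic. The sketch below assumes 1 Mbyte = 8 Mbit (decimal units) and a constant transfer rate; the function names are illustrative and are not part of Globus or NDG.

```python
def transfer_minutes(size_mbyte: float, rate_mbit_s: float) -> float:
    """Ideal time in minutes to move size_mbyte over a rate_mbit_s link."""
    if rate_mbit_s <= 0:
        raise ValueError("rate must be positive")
    size_mbit = size_mbyte * 8  # assumption: 1 Mbyte = 8 Mbit
    return size_mbit / rate_mbit_s / 60


def effective_rate_mbit_s(size_mbyte: float, minutes: float) -> float:
    """Average throughput implied by moving size_mbyte in the given minutes."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    return size_mbyte * 8 / (minutes * 60)


# Best case for a 500 Mbyte file on a fully used 2 Mbit/s link.
ideal_minutes = transfer_minutes(500, 2)

# Throughput implied by the two quoted durations for the same file.
rate_40min = effective_rate_mbit_s(500, 40)
rate_80min = effective_rate_mbit_s(500, 80)

# Rate a link must sustain to move the same file in one minute.
rate_one_minute = effective_rate_mbit_s(500, 1)

print(f"ideal time at 2 Mbit/s: {ideal_minutes:.1f} min")
print(f"implied rate for 40 min: {rate_40min:.2f} Mbit/s")
print(f"implied rate for 80 min: {rate_80min:.2f} Mbit/s")
print(f"rate needed for 1 min:   {rate_one_minute:.1f} Mbit/s")
```

Under these assumptions the 40-minute transfer uses most of the 2 Mbit/s link, the 80-minute transfer uses less than half of it, and finishing in under a minute needs a sustained rate above roughly 67 Mbit/s.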
Under-the-hood, the power of XML … ECMWF ERA40 • Many TB in spectral format • Double that in NetCDF! • Want to avoid using tape drives! We have: • Implemented a new caching system based on CDAT to do “on-demand” conversion from spectral to NetCDF. • Information in the LAS database drives the CDAT back-end: if possible, use existing NetCDF; otherwise, convert on the fly. Next: • Will link this and other datasets to our intermediate schema. • Need to add Met Office data (working on new drivers).
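The "use existing NetCDF if possible, otherwise convert on the fly" rule can be sketched as a small cache lookup. This is only an illustration under stated assumptions: the slide says the LAS database drives the decision, whereas this sketch simply checks the file system, and `get_netcdf`, the `.spec` extension, and the stand-in converter are hypothetical names rather than CDAT or LAS calls.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def get_netcdf(
    spectral_path: Path,
    cache_dir: Path,
    convert: Callable[[Path, Path], None],
) -> Path:
    """Return a cached NetCDF file for spectral_path, converting on a miss.

    `convert(src, dst)` is a caller-supplied function that writes the
    converted file to dst; the real spectral-to-NetCDF step is not shown.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (spectral_path.stem + ".nc")
    if cached.exists():
        return cached  # cache hit: reuse the existing NetCDF file

    # Cache miss: write to a temporary name first, then rename, so a
    # failed conversion never leaves a half-written file in the cache.
    partial = cache_dir / (cached.name + ".part")
    convert(spectral_path, partial)
    partial.replace(cached)
    return cached


# Demonstration with a stand-in converter that only records its calls.
calls: List[str] = []


def fake_convert(src: Path, dst: Path) -> None:
    calls.append(src.name)
    dst.write_text("placeholder NetCDF content for " + src.name)


with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    spectral = root / "era40_sample.spec"
    spectral.write_text("placeholder spectral content")

    first = get_netcdf(spectral, root / "cache", fake_convert)
    second = get_netcdf(spectral, root / "cache", fake_convert)
    first_existed = first.exists()
```

The second request is served from the cache, so the converter runs only once. A production cache would also need an eviction policy, since the slide notes that NetCDF takes about twice the space of the spectral form.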
Problems ahead? • Access control: technical issues, policy, social habits, trust. • Quality of existing metadata: how do we collect what we need? • Joining it all together, joining all of us together … • Making sure we are OGC and ISO compliant. • Making it robust … • (Ten times as much effort as making it work, Tony Hey, June 2003)