An ESG Walkthrough ESG Federation website - DCC File system for ESG Muhammad Atif
ESG-NCI Gateway • Search • We highly recommend subscribing to our tweets • Latest news
Search and Access Data • You can search without logging in, but you cannot download • Quick Links → Create Account • Follow the on-screen instructions • You will receive an email to confirm your registration; the administrators are notified at the same time. • An administrator has to validate your account before you can download data. • After the confirmation email from the administrator arrives, log in to the NCI node • Account → Apply for Group Membership • Mk-3.6 • CMIP-5 research (not in our control; the request goes to PCMDI) • Requests to NCI are usually approved quickly. • The same OpenID can be used on all gateways. Your OpenID: https://esg.nci.org.au/esgcet/myopenid/<username>
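The OpenID on this slide is simply the gateway's base URL followed by your username. A small Python sketch that builds it; escaping the username with `quote()` is an assumption (the slide does not say which characters usernames may contain), and `nci_openid` is a hypothetical helper name:

```python
from urllib.parse import quote

# Base URL taken from the slide; the username becomes the last path segment.
NCI_OPENID_BASE = "https://esg.nci.org.au/esgcet/myopenid/"


def nci_openid(username: str) -> str:
    """Return the NCI gateway OpenID URL for a username (hypothetical helper)."""
    if not username:
        raise ValueError("username must not be empty")
    # Assumption: escape anything unsafe in a URL path segment; valid
    # gateway usernames may never actually need this.
    return NCI_OPENID_BASE + quote(username, safe="")


print(nci_openid("dcc000"))
```

With the temporary workshop account this yields the OpenID listed on the next slide.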
Searching for data • We recommend browsing the website to get familiar with it. • If asked to authenticate, you may use a temporary OpenID • https://esg.nci.org.au/esgcet/myopenid/dcc000 • Password: abc123 • Note that this OpenID will be removed after the workshop, so we strongly recommend creating your own OpenID
Download data from the Gateway • Three download methods • Using the web browser • A set of wget scripts • Via GridFTP (Data Mover Lite, DML)
Download Method-1 (web browser based) • Intuitive but slow • Follow the on-screen instructions and nothing can go wrong. • Works like any normal download from the browser of your choice, i.e. click to download. • Internet Explorer, Firefox, Chrome and Safari are supported. • Works well if you only need a couple of files
Download Method-2 (wget scripts) • Download multiple files at once • Select the files (variables) you are interested in. • The gateway presents a wget-download.sh script that you save and run. • Command-line based; no GUI • Two methods of authentication • Authorization token (deprecated) • PCMDI gateway only. • No login required; however, the token expires after 24 hours and the script is of no use after that. • MyProxy login (official) • Needs a separate authentication step. • Requires running the Java applet or the MyProxyClient software • If authentication expires, just run the MyProxyClient software again • Note: if you plan to do lots of downloads, we can provide a custom script to speed up the process on the DCC; an example follows later.
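With the deprecated authorization-token method, the generated script stops working 24 hours after the token is issued. A minimal Python sketch of that rule, assuming you note the issue time yourself (the slide does not say whether the script records it anywhere); `token_still_valid` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Token lifetime stated on the slide for the PCMDI authorization token.
TOKEN_LIFETIME = timedelta(hours=24)


def token_still_valid(issued_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while a token issued at `issued_at` is inside its 24-hour window.

    Assumption: `issued_at` is a timezone-aware time you recorded yourself
    when the script was generated.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - issued_at < TOKEN_LIFETIME
```

If this returns `False`, regenerate the script from the gateway, or switch to MyProxy login, whose credentials can simply be renewed.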
Process for MyProxyLogon and download on the DCC • ssh abc123@dcc.nci.org.au -Y • Download the MyProxyLogon-ESG.jar file • wget http://esg.nci.org.au/esgcet/webstart/myProxyLogon/MyProxyLogon-ESG.jar • Run the MyProxyLogon-ESG.jar file: • module load java • java -jar MyProxyLogon-ESG.jar • It writes the certificates to your $HOME/.esg directory • Run the wget-download.sh script
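Because wget-download.sh depends on the certificates MyProxyLogon writes to $HOME/.esg, a wrapper can check for them before starting. A minimal Python sketch; it only checks that the directory exists and is non-empty, because the slide does not name the files inside, and both function names are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import Optional


def esg_credentials_present(home: Path) -> bool:
    """True if $HOME/.esg exists and contains at least one entry.

    Assumption: a non-empty ~/.esg means MyProxyLogon has run; the
    certificate file names and their expiry are not checked.
    """
    esg_dir = home / ".esg"
    return esg_dir.is_dir() and any(esg_dir.iterdir())


def run_download(script: Path, home: Optional[Path] = None) -> int:
    """Run a gateway-generated wget script once credentials appear to exist."""
    home = home if home is not None else Path.home()
    if not esg_credentials_present(home):
        raise RuntimeError("No credentials in ~/.esg; run: java -jar MyProxyLogon-ESG.jar")
    # Invoke through bash so the script need not be marked executable.
    return subprocess.run(["bash", str(script)], check=False).returncode
```

For example, `run_download(Path("wget-download.sh"))` from the directory where the script was saved.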
Download Method-3 (DML) • Parallel downloads • Set the number of parallel transfers under DML Preferences → Concurrency • Faster than wget • Uses GridFTP • Caveat: not available on all ESG nodes • NCI is one of the few nodes that has the facility
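DML is faster than wget because it runs several GridFTP transfers at once, up to the Concurrency preference. The same idea in a Python sketch: `fetch` is a hypothetical stand-in for a single-file transfer (DML's own GridFTP client is not shown), and `concurrency` plays the role of the DML setting:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def parallel_download(urls: Iterable[str],
                      fetch: Callable[[str], str],
                      concurrency: int = 4) -> Dict[str, str]:
    """Call `fetch(url)` for every URL with at most `concurrency` in flight.

    `fetch` is a hypothetical single-transfer function. Returns url -> result;
    a failed transfer is recorded as "error: ..." so the others continue.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failure should not stop the batch
                results[url] = f"error: {exc}"
    return results
```

Threads suit this case because each transfer spends most of its time waiting on the network; raising `concurrency` helps only until the link or the remote node is saturated.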
ESG Data on the DCC • IPCC AR5/CMIP5 • CSIRO-QCCCE Mk-3.6 • CAWCR ACCESS • Replicated data from the other ESG nodes. • Other data • CMIP3 • Observational data • Processed data
DCC File system organization • All ESG data is in /projects/ESG: • Authoritative • Unofficial-ESG-replica • CAWCR_CVC_processed • /projects/ESG/Authoritative • Serves data according to the policies of the ESG Federation • This is the directory our ESG software serves data from • All data here is the current official copy • Try it: log in to the DCC and have a look.
Unofficial Replica • /projects/ESG/Unofficial-ESG-replica • IPCC • The IPCC directory holds data we have downloaded from other nodes (it is not an official replica). Subdirectories may hold partial or complete datasets. • IPCC_tmp_flat • Direct symlinks to files, in a flat directory structure • tmp • You can download your data here, into your own $USER directory. • We can provide scripts to help download data here • GlobalObs_and_Reanalysis • Data from various sources, maintained by Lawrie Rikus and Ben Hu. • Also served through a THREDDS service for remote access
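IPCC_tmp_flat exposes files from a deep tree as symlinks in one flat directory. A Python sketch of how such a view could be built; `build_flat_view` is hypothetical, and skipping duplicate file names is an assumed policy, not necessarily how the DCC directory is maintained:

```python
import os
from pathlib import Path


def build_flat_view(tree: Path, flat_dir: Path, pattern: str = "*.nc") -> int:
    """Symlink every file matching `pattern` under `tree` into `flat_dir`.

    Hypothetical helper. Returns the number of links created. Existing names
    are left alone, so a second file with the same name is skipped (assumed
    policy), and re-running the function is safe.
    """
    flat_dir.mkdir(parents=True, exist_ok=True)
    created = 0
    for src in sorted(tree.rglob(pattern)):
        if not src.is_file():
            continue
        link = flat_dir / src.name
        # is_symlink() also catches dangling links that exists() misses.
        if link.exists() or link.is_symlink():
            continue
        os.symlink(src.resolve(), link)
        created += 1
    return created
```

A flat view works because CMIP5 file names already encode variable, table, model, experiment, ensemble and time range, so collisions should be rare.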
Unofficial Replica • /projects/ESG/Unofficial-ESG-replica/IPCC • Populated by user downloads via wget scripts or DML. • Partial data: not all of the data has been downloaded. • May not contain the most up-to-date version • The remote node may have changed the data since the last download. • ESG (and the official replica directory) always has the latest version. • Organised according to the Data Reference Syntax (DRS)
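Since the replica can lag behind ESG, it helps to compare the newest version directory you hold with the latest version the gateway lists. A Python sketch, assuming version directories are named vYYYYMMDD as in the DRS example path (v20110901) and that you look up the remote version on the gateway yourself; both function names are hypothetical:

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: version directories look like v20110901 (vYYYYMMDD).
VERSION_RE = re.compile(r"^v\d{8}$")


def latest_local_version(dataset_dir: Path) -> Optional[str]:
    """Newest vYYYYMMDD subdirectory of a dataset directory, or None."""
    if not dataset_dir.is_dir():
        return None
    versions = [p.name for p in dataset_dir.iterdir()
                if p.is_dir() and VERSION_RE.match(p.name)]
    # Zero-padded dates sort correctly as plain strings.
    return max(versions, default=None)


def replica_is_stale(local: Optional[str], remote: str) -> bool:
    """True if the gateway's version is newer than the local copy (or none exists)."""
    return local is None or remote > local
```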
Data Reference Syntax (how files are organized on the DCC) • This is how the directory tree compares to the DRS • DRS: cmip5.<product>.<institute>.<model>.<experiment>.<time_frequency>.<realm>.<cmor_table>.<ensemble> • File system: /projects/ESG/unofficial-ESG-replica/IPCC/CMIP5/output1/NCC/NorESM1-M/historical/mon/seaIce/OImon/r1i1p1/v20110901/sic/<FILE> • Each DRS component becomes one directory level; the version (v20110901) and the variable (sic) follow the ensemble member.
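The mapping between a DRS dataset identifier and the directory tree can be sketched in Python. This assumes exactly one directory per DRS component, with the activity upper-cased (cmip5 → CMIP5) as in the example path, followed by version and variable; the root is a parameter because the slides spell the replica directory in more than one way. Both function names are hypothetical:

```python
from pathlib import PurePosixPath
from typing import Tuple

# Facet order of a CMIP5 DRS dataset identifier, as listed on the slide.
DRS_FACETS = ("activity", "product", "institute", "model", "experiment",
              "time_frequency", "realm", "cmor_table", "ensemble")


def drs_to_path(dataset_id: str, version: str, variable: str, root: str) -> PurePosixPath:
    """Directory for one variable of one dataset version under `root`."""
    parts = dataset_id.split(".")
    if len(parts) != len(DRS_FACETS):
        raise ValueError(f"expected {len(DRS_FACETS)} facets, got {len(parts)}: {dataset_id!r}")
    # Assumption: the tree writes the activity in upper case, as in the example.
    parts[0] = parts[0].upper()
    return PurePosixPath(root, *parts, version, variable)


def path_to_drs(path: str, root: str) -> Tuple[str, str, str]:
    """Inverse mapping: (dataset_id, version, variable) from a path under `root`."""
    rel = PurePosixPath(path).relative_to(root).parts
    n = len(DRS_FACETS)
    if len(rel) < n + 2:
        raise ValueError(f"path too short for DRS layout: {path!r}")
    facets = list(rel[:n])
    facets[0] = facets[0].lower()
    return ".".join(facets), rel[n], rel[n + 1]


print(drs_to_path("cmip5.output1.NCC.NorESM1-M.historical.mon.seaIce.OImon.r1i1p1",
                  "v20110901", "sic", "/projects/ESG/unofficial-ESG-replica/IPCC"))
```

Keeping the facet order in one tuple means the validation, the forward mapping and the inverse mapping cannot drift apart.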
Downloading data to the DCC file system • If you need a significant amount of data that we don't have, please contact us. • Reasons: • It may already be downloaded but not linked • Downloading data is still tricky • Space management • That said, we would like to facilitate downloads of priority data. • How? Let's do it
Demo • Download the wget script from the ESG gateway • ssh -Y user@dcc.nci.org.au • java -jar MyProxyLogon-ESG.jar • Copy the wget script into a new directory on the DCC (scp, or copy and paste) • ./esg-download.py wget-download.sh • List the directory; it should contain a number of wget-split-* files • ./esg-qsub-download.py -i wget-split- • Press “y” when prompted • Check the files after some time • We will be streamlining this further
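In the demo, esg-download.py splits one wget script into several wget-split-* pieces so they can be submitted as separate batch jobs. Its internals are not shown in these slides, so the following is only a hypothetical Python sketch of the splitting step. It assumes the file list has already been extracted from the wget script as one entry per line; a real split file would also need the script's header and download logic, which this sketch omits. The three-digit suffix is likewise an assumption:

```python
from pathlib import Path
from typing import List, Sequence


def split_entries(entries: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Deal entries round-robin into at most n_chunks non-empty lists."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: List[List[str]] = [[] for _ in range(min(n_chunks, len(entries)))]
    for i, entry in enumerate(entries):
        chunks[i % len(chunks)].append(entry)
    return chunks


def write_splits(entries: Sequence[str], n_chunks: int, out_dir: Path,
                 prefix: str = "wget-split-") -> List[Path]:
    """Write each chunk to <out_dir>/<prefix>NNN (hypothetical naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for k, chunk in enumerate(split_entries(entries, n_chunks)):
        path = out_dir / f"{prefix}{k:03d}"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

Round-robin dealing balances the number of files per job, not their sizes; if file sizes vary a lot, sorting entries by size before dealing would balance the jobs better.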
Help • The DCC and ESG are both evolving continuously • Comments and suggestions are always welcome • Help desks • Anything related to the ESG Federation website, or to models that are not native to NCI • Cmip5-helpdesk@stfc.ac.uk • Anything related to the DCC compute cluster • help@nf.nci.org.au
Downloads managed by us • GridFTP • Fast • Downloads managed by you as a user
Controlled vocabulary: • http://esg-pcmdi.llnl.gov/internal/esg-data-node-documentation/cmip5_controlled_vocab.txt
Search and Access Data • You can search without logging in, but you cannot download • Quick Links → Create Account • Follow the on-screen instructions • You will receive an email to confirm your registration; the administrator(s) are notified at the same time. • An administrator has to validate your account before you can download the data. • After the confirmation email, log in to the NCI node • Account → Apply for Group Membership • Mk-3.6 • CMIP-5 research (not in our control; the request goes to PCMDI) • Requests to NCI are usually approved quickly. • For other groups, approval may take one to two days
Process for MyProxyLogon and download on the DCC • ssh abc123@dcc.nci.org.au -Y • Download the MyProxyLogon-ESG.jar file • wget http://esg.nci.org.au/esgcet/webstart/myProxyLogon/MyProxyLogon-ESG.jar • Run the MyProxyLogon-ESG.jar file; instructions are also provided in the wget download script you have already downloaded via the web browser. • module load java • java -jar MyProxyLogon-ESG.jar • It writes the certificates to your $HOME/.esg directory