CMIP5 Download Tutorial
Jennifer M. Adams
12 January 2012
/data/cmip5/extras/CMIP5_Tutorial.pptx
Vocabulary Lesson
/data/cmip5/docs/data_reference_syntax.pdf
Eleven keywords uniquely describe a data set:
Project, Institute, Model, Experiment, Frequency, Realm, MIP Table, Product, Variable, Ensemble, Version.
Project (Activity): cmip5
Institute: The institute responsible for the model results, e.g. MOHC, NOAA-GFDL, NCAR, MPI-M, NASA-GMAO, …
Model: The name of the model used, e.g. HadCM3, GFDL-ESM2M, CCSM4, MPI-ESM-LR, …
Experiment: The name of the experiment family and type, e.g. amip, amip4xCO2, decadal1960, piControl, historical, …
Frequency: The interval between time samples. Options are: yr, mon, day, 6hr, 3hr, subhr, monClim, fx
Realm: The high-level modeling component. Options are: atmos, ocean, land, landIce, seaIce, aerosol, atmosChem, ocnBgchem
MIP Table: A spreadsheet* entry for realm and variable components, e.g. Amon, Omon, Lmon, LImon, day, …
* /shared/cmip5/docs/standard_output.xlsx
Variable: A short name that identifies a physical quantity, e.g. pr, ps, psl, tas, tauu, tauv, uas, vas, …
Product: A designation of CMIP5 files. Options are: output, output1, output2, unsolicited
Ensemble: A name that distinguishes among closely related simulations and includes 3 numbers: realization (rN), initialization method (iM), and perturbed physics (pL), e.g. r1i1p1, r7i1p1, r1i3p1, r1i1p2, …
Version: Uniquely identifies a particular version of the data set, e.g. v20110923, v20111208, v1, v2, …
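
To illustrate the ensemble naming described above, here is a minimal sketch that splits a member name such as r1i2p1 into its three numbers. The function name split_ensemble is hypothetical and is not one of the tutorial's scripts.

```shell
# Hypothetical helper (not part of the tutorial's scripts): split an ensemble
# member name of the form r<N>i<M>p<L> into its three numbers.
split_ensemble() {
    name=$1
    # Reject anything that is not r<digits>i<digits>p<digits>.
    if ! printf '%s\n' "$name" | grep -Eq '^r[0-9]+i[0-9]+p[0-9]+$'; then
        echo "not an ensemble member name: $name" >&2
        return 1
    fi
    # Rewrite "r1i2p1" as "realization=1 initialization=2 physics=1".
    printf '%s\n' "$name" |
        sed -E 's/^r([0-9]+)i([0-9]+)p([0-9]+)$/realization=\1 initialization=\2 physics=\3/'
}

split_ensemble r1i2p1
# prints: realization=1 initialization=2 physics=1
```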
Desired & Acquired Data Sets
/project/cmip5/jma/desired.txt
This list, based on user requests, is managed by Tim and Jennifer.
/project/cmip5/jma/acquired.txt
This list, based on the contents of /data/cmip5/data, is auto-updated.
Each Downloader will be assigned an item from the desired list and must grab all available models and ensemble members for the given Experiment/Realm/Frequency/MIP Table.
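
One way to see which desired items have not been acquired yet is sketched below. It rests on an assumption the slides do not confirm: that both lists hold one entry per line and write each entry the same way. The function name still_needed is hypothetical.

```shell
# Hypothetical helper: print entries that appear in the desired list but not
# in the acquired list.
# ASSUMPTION: both files hold one entry per line, written identically in each
# file; the tutorial does not describe their layout.
still_needed() {
    desired=$1
    acquired=$2
    # -F: fixed strings, -x: match whole lines, -v: invert the match,
    # -f: read the strings to exclude from the acquired list.
    grep -Fxv -f "$acquired" "$desired"
}

# Example with the paths from the slide:
#   still_needed /project/cmip5/jma/desired.txt /project/cmip5/jma/acquired.txt
```

If the two files turn out to use different layouts, the entries would need to be normalized before a comparison like this is meaningful.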
ESGF Gateways
For Downloading:
http://pcmdi3.llnl.gov/esgcet/home.htm
http://cmip-gw.badc.rl.ac.uk/home.htm
http://ipcc-ar5.dkrz.de/home.htm
For Searching:
http://esg.prototype.ucar.edu/home.htm
Always Use Firefox!
Authentication
OpenID: https://pcmdi3.llnl.gov/esgcet/myopenid/jennifer
Username: jennifer
Password: sdf,WER.5
Gateway Search Results
[Screenshot of gateway search results, with callouts marking the Dataset URL and the Dataset ID]
Rename Wget Script
mv wget-download.sh wget.cmip5.output1.MOHC.HadCM3.decadal2000.mon.atmos.Amon.r1i2p1.v20110708.sh
IT IS VERY IMPORTANT THAT YOU DO THIS CORRECTLY! Because …
• Otherwise there is no way to tell what dataset wget-download.sh is configured to grab
• I need the keywords in the new script name to enable several automation scripts
• Some of the metadata (esp. version number) can’t be captured any other way
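
Because a mistyped name breaks the automation, it may help to build the name from the Dataset ID instead of typing it by hand. The sketch below assumes the ID always has ten dot-separated keywords, which is inferred from the single example on the slide; the function name wget_script_name is hypothetical.

```shell
# Hypothetical helper: build the script name from a Dataset ID copied from the
# gateway.
# ASSUMPTION: the ID has ten dot-separated keywords, as in the slide's example.
wget_script_name() {
    id=$1
    # Count the dot-separated fields in the ID.
    nfields=$(printf '%s\n' "$id" | awk -F. '{ print NF }')
    if [ "$nfields" -ne 10 ]; then
        echo "expected 10 keywords, found $nfields: $id" >&2
        return 1
    fi
    printf 'wget.%s.sh\n' "$id"
}

wget_script_name cmip5.output1.MOHC.HadCM3.decadal2000.mon.atmos.Amon.r1i2p1.v20110708
# prints: wget.cmip5.output1.MOHC.HadCM3.decadal2000.mon.atmos.Amon.r1i2p1.v20110708.sh

# To rename for real (the mv runs only if the name was built successfully):
#   new=$(wget_script_name "$id") && mv wget-download.sh "$new"
```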
Set Up Work Environment
Create a top level working directory (you will need my help with this):
mkdir /shared/working/cmip5/jma
Make several subdirectories under your top level directory, one for each wget:
cd /shared/working/cmip5/jma
mkdir a01 a02 a03 a04 a05 a06 a07 a08 a09 a10 a11 a12 a13 a14 a15 a16
sftp wget.renamed.sh from laptop to /shared/working/cmip5/jma/a01/
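
The numbered subdirectories can also be created with a loop, which is convenient if more than sixteen are needed. This is a sketch only; make_workdirs is a hypothetical name, and it deliberately refuses to create the top level directory, since the slide says that step needs help.

```shell
# Hypothetical helper: create numbered subdirectories a01..aNN under an
# existing top level working directory, one per wget script.
make_workdirs() {
    top=$1
    count=$2
    # The top level directory must already exist; this function does not create it.
    if [ ! -d "$top" ]; then
        echo "top level directory not found: $top" >&2
        return 1
    fi
    i=1
    while [ "$i" -le "$count" ]; do
        # %02d zero-pads the counter: 1 -> a01, 16 -> a16.
        mkdir -p "$top/$(printf 'a%02d' "$i")" || return 1
        i=$((i + 1))
    done
}

# Example matching the slide:
#   make_workdirs /shared/working/cmip5/jma 16
```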
Certificates
mkdir $HOME/.esg
cd $HOME/.esg
cp /data/cmip5/extras/MyProxyLogon-ESG.jar .
cp /homes/jma/.esg/update .
./update
Launch the Wget
Log in to a server: cpu1-cpu6
cd /shared/working/cmip5/jma/a01
/project/cmip5/jma/dorun.sh &
This script filters out unwanted files, notes the wget format, runs the edited wget script, captures output in a log file, and monitors the script’s progress. Run it in the background.
/project/cmip5/jma/ckrun.sh
When dorun.sh is no longer running, this script evaluates the success of the wgets, checks whether all desired files are here, and returns the status of the download.
Repeat As Necessary
• If ckrun.sh determines that the download is complete, it will create a file called ‘done’. Take a moment to celebrate, then move on to the next data set.
• If ckrun.sh determines that the download is incomplete, there was an error.
• Look for a file in the working subdir named ‘formatA’ or ‘formatB’ or ‘formatC’.
• If format=A:
    • Go back to the Gateway and get a new wget script
    • Copy the new wget.renamed.sh into the working subdirectory
    • Rerun dorun.sh
• If format=B or C:
    • You may need to update your certificates
    • You do not need a new wget script; just rerun dorun.sh
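
The decision steps above can be sketched as a small check on the marker files in a working subdirectory. The function name next_step is hypothetical, and the final branch (no marker file at all) is an assumption, since the slide does not cover that case.

```shell
# Hypothetical helper: report the next step for one working subdirectory,
# based on the marker files named on the slide ('done', 'formatA/B/C').
next_step() {
    dir=$1
    if [ -e "$dir/done" ]; then
        echo "complete: move on to the next data set"
    elif [ -e "$dir/formatA" ]; then
        echo "format A: get a new wget script from the Gateway, then rerun dorun.sh"
    elif [ -e "$dir/formatB" ] || [ -e "$dir/formatC" ]; then
        echo "format B or C: update certificates if needed, then rerun dorun.sh"
    else
        # ASSUMPTION: the slide does not say what a missing marker file means.
        echo "no marker file found: check whether dorun.sh is still running"
    fi
}

# Example with a path from the slides:
#   next_step /shared/working/cmip5/jma/a01
```

Checking for ‘done’ first means a subdirectory that once failed and later succeeded is reported as complete.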
Pitfalls & Shortcuts
• Watch out for zombies, a.k.a. processes that are hung and can’t be killed
• Use dataset URLs to circumvent the search
• Start by downloading only these high-priority models first: