Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data
Jennifer Miletta Adams, IGES/COLA
AMS 2013
Workflow Requirements
- No , , , , et al.
- Script-based
- Flexible
- Automated
- Runs in a UNIX environment
Workflow Elements
- Create a list of desired data: all available models and ensembles for a subset of experiments, realms, frequencies, and variables
- Keep track of what has already been acquired
- Identify what is available
- Get needed data
- Make data user-friendly
Specification of Desired Data
piControl/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas
piControl/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas uas vas
piControl/ocean/mon/Omon/ msftmyz psu rhopoto so thetao tos uo vo
piControl/ocean/day/day/ tos
piControl/land/mon/Lmon/ evspsblsoi evspsblveg lai mrro mrso mrsos tran
piControl/land/day/day/ mrsos
piControl/landIce/mon/LImon/ snc snw
piControl/seaIce/mon/OImon/ sic sit
historical/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas
historical/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas ua uas va vas
historical/atmos/fx/fx/ sftlf
historical/ocean/mon/Omon/ msftmyz psu tauuo tauvo tos
historical/ocean/day/day/ tos
historical/land/mon/Lmon/ cropFrac evspsblsoi evspsblveg lai mrro mrso mrsos tran
historical/land/day/day/ mrsos
historical/landIce/mon/LImon/ snc snw
historical/seaIce/mon/OImon/ sic sit transix transiy
Keep Track of Acquired Data I
Local CMIP5 data files are stored under the following directory structure (10 keywords):
    /cmip5/data/Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version/datafiles.nc
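A minimal bash sketch of how an ESGF dataset ID (in the format listed under Discovery of Available Data III) plus a variable name could be mapped onto this layout. It assumes that no DRS field contains a '.' and that the local Version directory reuses the version string from the dataset ID; `local_dir_for` and `CMIP5_ROOT` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the local directory for one variable of a dataset.
# Assumes no DRS field contains '.' and that the local Version directory
# reuses the version string from the dataset ID.
CMIP5_ROOT=${CMIP5_ROOT:-/cmip5/data}

local_dir_for() {
    local dataset_id=${1%%|*} variable=$2   # strip the "|datanode" suffix
    local _project _product institute model experiment frequency realm table ensemble version
    # Dataset ID order: project.product.institute.model.experiment.frequency.realm.table.ensemble.version
    IFS=. read -r _project _product institute model experiment frequency realm table ensemble version <<< "$dataset_id"
    # Local order: Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version
    printf '%s/%s/%s/%s/%s/%s/%s.%s/%s/%s\n' "$CMIP5_ROOT" "$experiment" "$realm" "$frequency" \
        "$table" "$variable" "$institute" "$model" "$ensemble" "$version"
}
```

For example, `mkdir -p "$(local_dir_for "$id" clt)"` would create the target directory before files are moved into it.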
Keep Track of Acquired Data II
- Use the "find" command to create a master list of all subdirectory names under /cmip5/data
- Updated daily, this master list shows all acquired data residing on local disk
- Users can sort or filter this list to discover whether the data they need has already been acquired
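The daily "find" step might look like the following sketch. Limiting find to depth 8 (the Version level) is an assumption that keeps only complete paths in the list; `build_master_list`, `is_acquired`, and master_list.txt are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the daily "find" step.

# Write every Version-level directory (8 levels below the root), relative to
# the root and sorted, to the master list file.
build_master_list() {
    local root=${1:-/cmip5/data} out=${2:-master_list.txt}
    find "$root" -mindepth 8 -maxdepth 8 -type d | sed "s|^$root/||" | sort > "$out"
}

# Succeed if any line of the master list starts with "<prefix>/", e.g.
# piControl/atmos/mon/Amon/clt/BCC.bcc-csm1-1-m/r1i1p1 (any version).
is_acquired() {
    local list=$1 prefix=$2
    awk -v p="$prefix/" 'index($0, p) == 1 { found = 1; exit } END { exit !found }' "$list"
}
```

A cron job could run `build_master_list /cmip5/data /cmip5/master_list.txt` once a day; `index()` is used instead of grep so that the dots in names like BCC.bcc-csm1-1-m are matched literally.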
Discovery of Available Data I
Build a Dataset Search URL:
    http://pcmdi9.llnl.gov/esg-search/search?type=Dataset&latest=true&replica=false&facets=id&limit=0&project=CMIP5&experiment=piControl&realm=atmos&time_frequency=mon&cmor_table=Amon&variable=clt&variable=hfls….&variable=vas
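Because each variable is a repeated `variable=` parameter, the URL is easy to assemble in a loop. A sketch, assuming the query parameters shown above; the pcmdi9 host from 2013 may no longer be in service, so `ESGF_SEARCH` (a hypothetical name, like `dataset_search_url`) can be pointed at another ESGF index node.

```bash
#!/usr/bin/env bash
# Hypothetical builder for the Dataset Search URL shown above.
ESGF_SEARCH=${ESGF_SEARCH:-http://pcmdi9.llnl.gov/esg-search/search}

# usage: dataset_search_url EXPERIMENT REALM FREQUENCY TABLE VAR [VAR ...]
dataset_search_url() {
    local experiment=$1 realm=$2 frequency=$3 table=$4 v
    shift 4
    local url="$ESGF_SEARCH?type=Dataset&latest=true&replica=false&facets=id&limit=0"
    url+="&project=CMIP5&experiment=$experiment&realm=$realm&time_frequency=$frequency&cmor_table=$table"
    for v in "$@"; do
        url+="&variable=$v"     # one repeated parameter per variable
    done
    printf '%s\n' "$url"
}
```

Typical use: `URL=$(dataset_search_url piControl atmos mon Amon clt hfls vas)`, then run the wget capture step with that `$URL`.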
Discovery of Available Data II
Capture the dataset search results in a text file called "tmp" using wget:
    wget -O tmp "$URL"
Remove debris text from "tmp" to extract the relevant information:
    grep 'name="cmip5\.' tmp > tmp2
    sed 's/<int name="//g' tmp2 > tmp3
    sed 's/">1<\/int>//g' tmp3 > tmp4
    sed 's/">2<\/int>//g' tmp4 > result
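The grep/sed steps on this slide strip only counts of 1 or 2, so a dataset ID reported with any other count would keep its closing tag in "result". A one-pass sketch that accepts any count, assuming each facet line has the shape `<int name="cmip5....|datanode">N</int>` implied by those steps; `extract_dataset_ids` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical one-pass filter: print the "dataset_id|datanode" string from
# every <int name="cmip5...">N</int> line, whatever the count N is.
# Reads the files given as arguments, or stdin if none.
extract_dataset_ids() {
    sed -n 's/.*<int name="\(cmip5\.[^"]*\)">[0-9]*<\/int>.*/\1/p' "$@"
}
```

Usage: `extract_dataset_ids tmp > result` replaces the four intermediate files.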
Discovery of Available Data III
The "result" text file contains a list of dataset IDs and data nodes for all available data that match my search criteria:
    cmip5.output1.BCC.bcc-csm1-1-m.piControl.mon.atmos.Amon.r1i1p1.v20120705|bcccsm.cma.gov.cn
    cmip5.output1.BCC.bcc-csm1-1.piControl.mon.atmos.Amon.r1i1p1.v1|bcccsm.cma.gov.cn
    cmip5.output1.BNU.BNU-ESM.piControl.mon.atmos.Amon.r1i1p1.v20120626|esg.bnu.edu.cn
    cmip5.output1.CCCma.CanESM2.piControl.mon.atmos.Amon.r1i1p1.v20120623|dapp2p.cccma.ec.gc.ca
    cmip5.output1.CMCC.CMCC-CM.piControl.mon.atmos.Amon.r1i1p1.v20120627|adm07.cmcc.it
    etc.
This list of what is available is compared to the master list of what has been acquired to determine what is needed.
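One way to do the comparison is to rebuild the local path for every (dataset, variable) pair and look it up in the master list. A sketch in bash and awk, assuming the master list holds paths relative to /cmip5/data with one Version directory per line, and that no DRS field contains a '.'; `needed_pairs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical comparison: print "dataset_id|datanode variable" for every
# desired variable whose local Version directory is not in the master list.
# usage: needed_pairs RESULT_FILE MASTER_LIST VAR [VAR ...]
needed_pairs() {
    local result=$1 master=$2
    shift 2
    awk -v vars="$*" '
        FILENAME == ARGV[1] { have[$0] = 1; next }   # first file: master list
        {
            split($0, idnode, "[|]")                 # dataset_id | datanode
            if (split(idnode[1], f, "[.]") < 10) next
            nv = split(vars, v, " ")
            for (i = 1; i <= nv; i++) {
                # Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version
                path = f[5] "/" f[7] "/" f[6] "/" f[8] "/" v[i] "/" f[3] "." f[4] "/" f[9] "/" f[10]
                if (!(path in have)) print $0 " " v[i]
            }
        }' "$master" "$result"
}
```

Matching on `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the sketch correct even when the master list is still empty.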
Get Needed Data
- Determine the number of files in each dataset
- Download the wget script and give it a unique name
- Keep authentication certificates up to date
- Execute the wget script
- Put the files in the proper place under /cmip5/data/
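The last step, putting files in the proper place, needs the institute and version from the dataset ID and the variable from each file name. A sketch, assuming CMIP5 file names begin with the variable followed by "_" (e.g. clt_Amon_..._r1i1p1_185001-201212.nc) and that no DRS field contains a '.'; `file_downloads` and `CMIP5_ROOT` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical filer: move downloaded .nc files from SRC_DIR into the
# /cmip5/data tree. usage: file_downloads DATASET_ID [SRC_DIR]
CMIP5_ROOT=${CMIP5_ROOT:-/cmip5/data}

file_downloads() {
    local dataset_id=${1%%|*} src=${2:-.} f var dest
    local _p _q inst model expt freq realm table ens vers
    IFS=. read -r _p _q inst model expt freq realm table ens vers <<< "$dataset_id"
    for f in "$src"/*.nc; do
        [ -e "$f" ] || continue                   # glob matched nothing
        var=$(basename "$f"); var=${var%%_*}      # variable = first "_" field
        dest="$CMIP5_ROOT/$expt/$realm/$freq/$table/$var/$inst.$model/$ens/$vers"
        mkdir -p "$dest" && mv "$f" "$dest/"
    done
}
```

Run it in the directory where the wget script saved its files, once the script reports success.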
Determine Number of Files
Build a File Search URL:
    http://pcmdi9.llnl.gov/esg-search/search?type=File&dataset_id=cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu&variable=clt&variable=hfls….&variable=vas
Extract the number of files from the result:
    wget -q -O - "$URL" | grep numFound
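`grep numFound` prints the whole matching line; for scripting, the bare number is more useful. A sketch, assuming the response is Solr-style XML carrying an attribute `numFound="N"`; `parse_num_found` and `count_files` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print only the N from numFound="N".
parse_num_found() {
    sed -n 's/.*numFound="\([0-9]*\)".*/\1/p' | head -n 1
}

# usage: count_files "$URL"   (queries the File Search URL)
count_files() {
    wget -q -O - "$1" | parse_num_found
}
```

With the count in a variable (`n=$(count_files "$URL")`), the script can decide how many wget-script URLs are needed.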
Download WGET Script I
Build a wget URL:
    http://pcmdi9.llnl.gov/esg-search/wget?dataset_id=cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu&limit=1000&variable=clt&variable=hfls….&variable=vas
If the number of files is greater than 1000, you need separate URLs:
- Append "&offset=1000" to the first URL to get the second group of files
- Append "&offset=2000" to the first URL to get the third group of files
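The offset rule generalizes to a loop over blocks of 1000. A sketch, assuming the base URL already contains `limit=1000` as above; `wget_script_urls` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical generator: print one wget-script URL per block of files.
# usage: wget_script_urls BASE_URL NUM_FILES [LIMIT]
wget_script_urls() {
    local base=$1 nfiles=$2 limit=${3:-1000} offset
    printf '%s\n' "$base"                       # first group: no offset
    for (( offset = limit; offset < nfiles; offset += limit )); do
        printf '%s&offset=%d\n' "$base" "$offset"
    done
}
```

For 2500 files this prints the base URL plus the offset=1000 and offset=2000 variants, matching the rule on the slide.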
Download WGET Script II
Build a meaningful name for the wget script:
    wget.cmip5.output1.NCAR.CCSM.rcp85.mon.atmos.Amon.r1i1p1.v1.sh
Use wget to download the wget script:
    wget -q -O wgetname.sh "$URL"
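Deriving the name directly from the dataset ID keeps it unique without manual typing. A sketch of one such rule (prefix "wget.", drop the "|datanode" suffix, append ".sh"); the rule and the names `wget_script_name` and `fetch_wget_script` are illustrative assumptions, not an ESGF convention.

```bash
#!/usr/bin/env bash
# Hypothetical naming rule: wget.<dataset_id without |datanode>.sh
wget_script_name() {
    printf 'wget.%s.sh\n' "${1%%|*}"
}

# usage: fetch_wget_script DATASET_ID WGET_URL
fetch_wget_script() {
    local name
    name=$(wget_script_name "$1")
    wget -q -O "$name" "$2"
}
```

When more than 1000 files require several URLs, an offset suffix could be added to the name so the scripts do not overwrite each other.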
User Access and Authentication
- Register with ESGF and get an OpenID and password, e.g. https://pcmdi9.llnl.gov/esgf-idp/openid/jennifer
- Enroll in the appropriate group (e.g. CMIP5 research)
- Obtain or renew certificates for user authentication
- Use the Python utility "MyProxyClient" to renew certificates automatically:
    #!/bin/bash
    export X509_CERT_DIR=$HOME/.esg/certificates
    export X509_USER_PROXY=$HOME/.esg/credentials.pem
    # -S reads the password from stdin (here the file /homes/jma/pass);
    # -T also retrieves the trust roots
    < /homes/jma/pass /usr/local/bin/myproxyclient \
        logon -s pcmdi9.llnl.gov -o "$X509_USER_PROXY" \
        -p 7512 -T -l jennifer -S
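To keep certificates up to date without contacting the MyProxy server before every download, a script can renew only when the credential file is missing or old. A sketch based on file age; the 720-minute threshold (`MAX_AGE_MIN`) is a hypothetical value that should be set below the lifetime your server actually grants, and `credential_is_fresh` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if the credential file exists and was
# written less than MAX_AGE_MIN minutes ago.
MAX_AGE_MIN=${MAX_AGE_MIN:-720}

credential_is_fresh() {
    local pem=$1
    [ -f "$pem" ] && [ -n "$(find "$pem" -mmin -"$MAX_AGE_MIN")" ]
}
```

Usage: `credential_is_fresh "$X509_USER_PROXY" || ./renew_cert.sh`, where renew_cert.sh stands for the MyProxyClient script above.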
Execute WGET Script
Run the wget script and capture all output in a log file:
    bash wgetname.sh -v > wget.log 2>&1 &
A wget may fail for any number of reasons:
- Data node down
- Data node throttling the number of simultaneous wgets
- File not found
- Checksum failure
- Certificate expired, or authorization failed
- Connection timeout
- Forbidden
If at first you don't succeed, try, try, try again.
Failure is an option.
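"Try, try, try again" can be automated with a bounded retry loop. A sketch, assuming a non-zero exit status means some files still need fetching and that re-running the script is safe (i.e. it skips files already downloaded); `run_with_retries` and its defaults of 5 tries and a 600-second pause are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical retry loop: re-run a wget script until it exits 0 or
# MAX_TRIES attempts have been made, appending every attempt to one log.
# usage: run_with_retries SCRIPT LOG [MAX_TRIES] [PAUSE_SECONDS]
run_with_retries() {
    local script=$1 log=$2 max_tries=${3:-5} pause=${4:-600} try
    for (( try = 1; try <= max_tries; try++ )); do
        echo "=== attempt $try of $max_tries: $(date) ===" >> "$log"
        if bash "$script" -v >> "$log" 2>&1; then
            return 0
        fi
        if (( try < max_tries )); then
            sleep "$pause"      # give a down or throttling data node time
        fi
    done
    return 1
}
```

A long pause between attempts suits the transient causes listed above (node down, throttling, timeouts); expired certificates need renewal before another attempt can succeed.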
Make Data User-Friendly
- Create GrADS descriptor files:
  - Aggregate files over the time dimension
  - Make use of the ensemble dimension when appropriate
  - Identify missing or overlapping time periods
  - Assign non-standard dimensions (e.g. basin averages or fixed fields)
  - Handle 365_day calendars
- Create PDEF files for non-rectilinear grids:
  - For the ocean and sea ice realms, ESMF's RegridWeightGen utility generates the interpolation weights
  - Vector fields must be rotated from grid-relative to Earth-relative coordinates before interpolation
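The rotation itself is a 2-D rotation by the local grid angle: east = u cos(a) - v sin(a), north = u sin(a) + v cos(a). A sketch on a whitespace-separated column file, assuming each row holds u, v, and the angle a in degrees measured counterclockwise from local east to the grid's x-axis; sign conventions for this angle differ between models, and `rotate_to_earth` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical rotation: read "u v angle_deg" rows, print "east north".
# Assumes angle_deg is counterclockwise from local east to the grid x-axis.
rotate_to_earth() {
    awk 'BEGIN { d2r = atan2(0, -1) / 180 }        # pi / 180
         {
             a = $3 * d2r
             printf "%.6f %.6f\n", $1 * cos(a) - $2 * sin(a), $1 * sin(a) + $2 * cos(a)
         }' "$@"
}
```

On real model output the same arithmetic would be applied array-wise using the grid's angle field before the interpolation weights are used.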
Get Additional Information
- About CMIP5: http://cmip-pcmdi.llnl.gov/cmip5/ , cmip5-helpdesk@stfc.ac.uk
- About ESGF: http://esgf.org/wiki/ESGF_Index/ , esgf-user@lists.llnl.gov
- About this presentation: jma@iges.org