Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data

Jennifer Miletta Adams, IGES/COLA, AMS 2013


Presentation Transcript


  1. Successful Strategies for Overcoming the Obstacles in Acquisition, Management, and Analysis of CMIP5 Data
     Jennifer Miletta Adams, IGES/COLA, AMS 2013

  2. Workflow Requirements
     - No [...], et al.
     - Script-based
     - Flexible
     - Automated
     - Runs in a UNIX environment

  3. Workflow Elements
     - Create a list of desired data: all available models and ensembles for a subset of experiments, realms, frequencies, and variables
     - Keep track of what has already been acquired
     - Identify what is available
     - Get the needed data
     - Make the data user-friendly

  4. Specification of Desired Data
     piControl/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas
     piControl/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas uas vas
     piControl/ocean/mon/Omon/ msftmyz psu rhopoto so thetao tos uo vo
     piControl/ocean/day/day/ tos
     piControl/land/mon/Lmon/ evspsblsoi evspsblveg lai mrro mrso mrsos tran
     piControl/land/day/day/ mrsos
     piControl/landIce/mon/LImon/ snc snw
     piControl/seaIce/mon/OImon/ sic sit
     historical/atmos/mon/Amon/ clt hfls hfss hurs pr prsn prw ps psl rlds rlus rlut rlutcs rsdt rsut rsutcs tas uas vas
     historical/atmos/day/day/ clt hfls hfss huss pr prsn psl rlds rlus rlut tas ua uas va vas
     historical/atmos/fx/fx/ sftlf
     historical/ocean/mon/Omon/ msftmyz psu tauuo tauvo tos
     historical/ocean/day/day/ tos
     historical/land/mon/Lmon/ cropFrac evspsblsoi evspsblveg lai mrro mrso mrsos tran
     historical/land/day/day/ mrsos
     historical/landIce/mon/LImon/ snc snw
     historical/seaIce/mon/OImon/ sic sit transix transiy
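If the specification above is kept in a plain text file, one "path/ variables" group per line, it can be expanded into one entry per variable before searching. A minimal sketch; `expand_spec` and the file names in the comment are hypothetical, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines of the form
#   experiment/realm/frequency/table/ var1 var2 ...
# on stdin and print one experiment/realm/frequency/table/variable per line.
expand_spec() {
  local path vars var
  # "|| [ -n "$path" ]" also processes a last line lacking a trailing newline
  while read -r path vars || [ -n "$path" ]; do
    [ -z "$path" ] && continue   # skip blank lines
    path=${path%/}               # drop the trailing slash
    for var in $vars; do         # unquoted on purpose: split on whitespace
      printf '%s/%s\n' "$path" "$var"
    done
  done
}

# Example (desired.txt is a hypothetical file holding the list above):
#   expand_spec < desired.txt > desired_vars.txt
```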

  5. Keep Track of Acquired Data I
     Local CMIP5 data files are stored under the following directory structure (10 keywords):
       /cmip5/data/Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version/datafiles.nc
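The directory layout above maps one-to-one onto path components, so a small function can compute where a downloaded file belongs. A sketch under assumptions: `local_dir` is a hypothetical name, and `CMIP5_ROOT` is an assumed override that defaults to /cmip5/data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the local directory for one variable of one
# dataset, following the 10-keyword layout
#   /cmip5/data/Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version
local_dir() {
  if [ "$#" -ne 9 ]; then
    echo "usage: local_dir experiment realm frequency table variable institute model ensemble version" >&2
    return 1
  fi
  # Institute and model are joined with a dot to form Institute.Model
  printf '%s/%s/%s/%s/%s/%s/%s.%s/%s/%s\n' "${CMIP5_ROOT:-/cmip5/data}" \
    "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9"
}

# Illustrative use: create the directory, then move files into it
#   dir=$(local_dir rcp85 atmos mon Amon tas NCAR CCSM4 r1i1p1 v1)
#   mkdir -p "$dir" && mv tas_Amon_CCSM4_rcp85_r1i1p1_*.nc "$dir"/
```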

  6. Keep Track of Acquired Data II
     - Use the "find" command to create a master list of all subdirectory names under /cmip5/data
     - Updated daily, this master list shows all acquired data residing on local disk
     - Users can sort or filter this list to discover whether the data they need has been acquired
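One way to build the master list with find, assuming only the Version level is of interest: in the layout above, every Version directory sits exactly 8 levels below /cmip5/data. `make_master_list` is a hypothetical name, and `-mindepth`/`-maxdepth` are GNU/BSD find options rather than POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every acquired Version directory, relative to
# the data root, one per line and sorted. Depth 8 corresponds to
# Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version
make_master_list() {
  local root=${1:-/cmip5/data}
  # Subshell keeps the caller's working directory unchanged
  (cd "$root" && find . -mindepth 8 -maxdepth 8 -type d | sed 's|^\./||' | LC_ALL=C sort)
}

# Run once a day (e.g. from cron); the output file name is hypothetical:
#   make_master_list /cmip5/data > "$HOME/cmip5_master_list.txt"
```

Users can then filter the list with ordinary tools, e.g. `grep '/Amon/tas/' "$HOME/cmip5_master_list.txt"`.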

  7. Discovery of Available Data I
     Build a Dataset Search URL:
       http://pcmdi9.llnl.gov/esg-search/search?type=Dataset&latest=true&replica=false&facets=id&limit=0&project=CMIP5&experiment=piControl&realm=atmos&time_frequency=mon&cmor_table=Amon&variable=clt&variable=hfls….&variable=vas
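The Dataset Search URL shown above can be assembled from one experiment/realm/frequency/table group plus its variables, so a single script can loop over the whole specification. A sketch; `dataset_search_url` is a hypothetical name and `SEARCH_BASE` an assumed override defaulting to the endpoint on the slide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the Dataset Search URL for one group.
# Arguments: experiment realm frequency table var1 [var2 ...]
dataset_search_url() {
  local base=${SEARCH_BASE:-http://pcmdi9.llnl.gov/esg-search/search}
  local experiment=$1 realm=$2 frequency=$3 table=$4 var
  shift 4
  local url="$base?type=Dataset&latest=true&replica=false&facets=id&limit=0"
  url+="&project=CMIP5&experiment=$experiment&realm=$realm"
  url+="&time_frequency=$frequency&cmor_table=$table"
  for var in "$@"; do
    url+="&variable=$var"   # one variable= parameter per requested variable
  done
  printf '%s\n' "$url"
}

# Example:
#   URL=$(dataset_search_url piControl atmos mon Amon clt hfls vas)
```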

  8. Discovery of Available Data II
     Capture the dataset search results in a text file called "tmp" using wget:
       wget -O tmp "$URL"
     Remove debris text from "tmp" to extract the relevant information:
       grep 'name="cmip5\.' tmp > tmp2
       sed 's/<int name="//g' tmp2 > tmp3
       sed 's/">1<\/int>//g' tmp3 > tmp4
       sed 's/">2<\/int>//g' tmp4 > result
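The grep/sed chain above strips only counts of 1 or 2 and relies on one entry per line. A sketch of a single-pass alternative, assuming the response contains entries of the form `<int name="DATASET_ID|NODE">N</int>` (the pattern the chain above targets). `extract_dataset_ids` is a hypothetical name; `grep -o` and `grep -h` are GNU/BSD options rather than POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "dataset_id|data_node" for every cmip5 facet
# entry, whatever the count inside <int>...</int> and however many entries
# share a line. Reads the named files, or stdin if none are given.
extract_dataset_ids() {
  grep -h -o '<int name="cmip5\.[^"]*">[0-9]*</int>' "$@" |
    sed -e 's/^<int name="//' -e 's/">[0-9]*<\/int>$//'
}

# Example:
#   wget -q -O - "$URL" | extract_dataset_ids > result
```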

  9. Discovery of Available Data III
     The result text file contains a list of dataset IDs and data nodes for all available data that match my search criteria:
       cmip5.output1.BCC.bcc-csm1-1-m.piControl.mon.atmos.Amon.r1i1p1.v20120705|bcccsm.cma.gov.cn
       cmip5.output1.BCC.bcc-csm1-1.piControl.mon.atmos.Amon.r1i1p1.v1|bcccsm.cma.gov.cn
       cmip5.output1.BNU.BNU-ESM.piControl.mon.atmos.Amon.r1i1p1.v20120626|esg.bnu.edu.cn
       cmip5.output1.CCCma.CanESM2.piControl.mon.atmos.Amon.r1i1p1.v20120623|dapp2p.cccma.ec.gc.ca
       cmip5.output1.CMCC.CMCC-CM.piControl.mon.atmos.Amon.r1i1p1.v20120627|adm07.cmcc.it
       etc.
     This list of what is available is compared to the master list of what has been acquired to determine what is needed.
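A sketch of the comparison step, under several assumptions: the master list holds paths relative to /cmip5/data down to the Version level (Experiment/Realm/Frequency/MIP-Table/Variable/Institute.Model/Ensemble/Version); dataset IDs have the ten dot-separated fields seen in the examples above; the local Version directory uses the same string as the ID's version field; and, as a simplification, a dataset counts as acquired once any of its variables is on disk. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# cmip5.output1.INST.MODEL.EXP.FREQ.REALM.TABLE.ENS.VERSION|node
#   -> EXP/REALM/FREQ/TABLE/INST.MODEL/ENS/VERSION (data node dropped)
id_to_key() {
  awk -F'|' '{ n = split($1, f, ".")
               if (n == 10) printf "%s/%s/%s/%s/%s.%s/%s/%s\n",
                 f[5], f[7], f[6], f[8], f[3], f[4], f[9], f[10] }'
}

# EXP/REALM/FREQ/TABLE/VAR/INST.MODEL/ENS/VERSION -> same key without VAR
path_to_key() {
  awk -F/ 'NF == 8 { printf "%s/%s/%s/%s/%s/%s/%s\n", $1, $2, $3, $4, $6, $7, $8 }'
}

# Print dataset keys that are available but not yet on local disk.
needed_datasets() {
  local available=$1 master=$2
  LC_ALL=C comm -23 <(id_to_key < "$available" | LC_ALL=C sort -u) \
                    <(path_to_key < "$master" | LC_ALL=C sort -u)
}

# Example (master list file name is hypothetical):
#   needed_datasets result "$HOME/cmip5_master_list.txt"
```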

  10. Get Needed Data
      - Determine the number of files for each dataset
      - Download the wget script and give it a unique name
      - Keep authentication certificates up to date
      - Execute the wget script
      - Put the files in their proper place under /cmip5/data/

  11. Determine Number of Files
      Build a File Search URL:
        http://pcmdi9.llnl.gov/esg-search/search?type=File&dataset_id=cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu&variable=clt&variable=hfls….&variable=vas
      Extract the number of files from the result:
        wget -q -O - "$URL" | grep numFound
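The grep above prints the whole matching line; a script needs the bare number. A sketch that assumes the count appears as a `numFound="N"` attribute in the response, which is what the grep above looks for. `num_found` is a hypothetical name; it prints 0 when the attribute is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a File Search response on stdin and print the
# value of the first numFound="N" attribute, or 0 if none is present.
num_found() {
  local n
  n=$(grep -o 'numFound="[0-9]*"' | head -n 1 | tr -cd '0-9')
  printf '%s\n' "${n:-0}"
}

# Example:
#   nfiles=$(wget -q -O - "$URL" | num_found)
```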

  12. Download WGET Script I
      Build a wget URL:
        http://pcmdi9.llnl.gov/esg-search/wget?dataset_id=cmip5.output1.NCAR.CCSM4.rcp85.v1|tds.ucar.edu&limit=1000&variable=clt&variable=hfls….&variable=vas
      If the number of files is greater than 1000, you need separate URLs:
      - Append "&offset=1000" to the 1st URL to get the 2nd group of files
      - Append "&offset=2000" to the 1st URL to get the 3rd group of files
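The offset rule above generalizes to any file count: one URL per group of 1000, with the offset increasing by 1000 each time. A sketch; `wget_page_urls` is a hypothetical name, and the page size defaults to the 1000 used in the URL above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the base wget URL (already containing
# &limit=1000) and the number of files, print one URL per group of files.
# Arguments: base_url nfiles [limit]
wget_page_urls() {
  local base=$1 nfiles=$2 limit=${3:-1000} offset
  printf '%s\n' "$base"                  # 1st group: no offset
  for (( offset = limit; offset < nfiles; offset += limit )); do
    printf '%s&offset=%d\n' "$base" "$offset"
  done
}

# Example: 2500 files -> base URL, then &offset=1000 and &offset=2000
#   wget_page_urls "$URL" 2500
```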

  13. Download WGET Script II
      Build a meaningful name for the wget script:
        wget.cmip5.output1.NCAR.CCSM4.rcp85.mon.atmos.Amon.r1i1p1.v1.sh
      Use wget to download the wget script:
        wget -q -O wgetname.sh "$URL"
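The naming rule above (prefix "wget.", the dataset ID without its data node, suffix ".sh") can be derived directly from a line of the result file. A sketch; `wget_script_name` is hypothetical, and the optional ".partN" suffix is an assumption added here to keep names unique when a dataset needs several offset URLs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the wget script name from "dataset_id|node".
# Arguments: dataset_id_with_node [part_number]
wget_script_name() {
  local id=${1%%[|]*} part=${2:-}   # strip everything from the "|" onward
  if [ -n "$part" ]; then
    printf 'wget.%s.part%s.sh\n' "$id" "$part"
  else
    printf 'wget.%s.sh\n' "$id"
  fi
}

# Example:
#   name=$(wget_script_name 'cmip5.output1.NCAR.CCSM4.rcp85.mon.atmos.Amon.r1i1p1.v1|tds.ucar.edu')
#   wget -q -O "$name" "$URL"
```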

  14. User Access and Authentication
      - Register with ESGF and get an OpenID and password, e.g. https://pcmdi9.llnl.gov/esgf-idp/openid/jennifer
      - Enroll in the appropriate group (e.g. CMIP5 Research)
      - Obtain or renew certificates for user authentication
      - Use the Python utility "MyProxyClient" to renew certificates automatically:
          #!/bin/bash
          export X509_CERT_DIR=$HOME/.esg/certificates
          export X509_USER_PROXY=$HOME/.esg/credentials.pem
          /usr/local/bin/myproxyclient logon -s pcmdi9.llnl.gov -o "$X509_USER_PROXY" \
              -p 7512 -T -l jennifer -S < /homes/jma/pass
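To keep certificates current without renewing on every run, a script can renew only when the credential file is missing or old. This is a heuristic sketch based on file age, not certificate expiry; `credential_needs_renewal` is hypothetical, and the 24-hour default is an assumption that should be shorter than the lifetime your MyProxy server actually issues. An exact check could read the certificate's end date instead (e.g. `openssl x509 -checkend`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the credential file is missing
# or was last modified more than max_age_hours ago.
# Arguments: credential_file [max_age_hours]
credential_needs_renewal() {
  local cred=$1 max_age_hours=${2:-24}
  [ -f "$cred" ] || return 0   # missing credential: renew
  # find prints the file only if it was modified more than N minutes ago
  [ -n "$(find "$cred" -mmin +$(( max_age_hours * 60 )) -print)" ]
}

# Example (renew_cert.sh is a hypothetical name for the MyProxyClient script):
#   if credential_needs_renewal "$X509_USER_PROXY" 24; then
#     bash renew_cert.sh
#   fi
```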

  15. Execute WGET Script
      Run the wget script, capturing all output in a log file:
        bash wgetname.sh -v > wget.log 2>&1 &
      A wget may fail for any number of reasons:
      - Data node down
      - Data node throttling the number of simultaneous wgets
      - File not found
      - Checksum failure
      - Certificate expired, or authorization failed
      - Connection timeout
      - Forbidden
      If at first you don't succeed, try, try, try again. Failure is an option.
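"Try, try, try again" can be automated with a bounded retry loop around the wget script. A sketch; `retry` is a hypothetical helper, and the attempt count and delay in the example are arbitrary choices, not values from the slides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until it succeeds or the attempt
# limit is reached. Arguments: attempts delay_seconds command [args...]
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if (( n >= attempts )); then
      echo "retry: giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $n failed, sleeping ${delay}s" >&2
    sleep "$delay"
    n=$(( n + 1 ))
  done
}

# Example: up to 5 tries, 10 minutes apart, in the background as above
#   retry 5 600 bash wgetname.sh -v >> wget.log 2>&1 &
```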

  16. Make Data User-Friendly
      Create GrADS descriptor files:
      - Aggregate files over the time dimension
      - Make use of the ensemble dimension when appropriate
      - Identify missing or overlapping time periods
      - Assign non-standard dimensions (e.g. basin averages or fixed fields)
      - Handle 365_day calendars
      - Create PDEF files for non-rectilinear grids (ocean and sea ice realms): ESMF's RegridWeightGen utility generates the interpolation weights; vector fields must be rotated from grid-relative to Earth-relative coordinates before interpolation
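Aggregating over time and spotting missing or overlapping periods can both be driven by the date ranges in the file names. A sketch under assumptions: monthly files named `..._YYYYMM-YYYYMM.nc`, listed in time order, with one time step per month; the output lines follow GrADS's `CHSUB t1 t2 string` record used with the `%ch` template. `chsub_lines` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one CHSUB line per monthly file and warn on
# stderr when a file does not start right after the previous one ends.
chsub_lines() {
  local f range start end y0 m0 t1 t2 prev=0 first=1
  for f in "$@"; do
    range=${f##*_}; range=${range%.nc}   # e.g. 185001-189912
    start=${range%-*}; end=${range#*-}
    if (( first )); then                 # first file defines time step 1
      y0=$(( 10#${start:0:4} )); m0=$(( 10#${start:4:2} )); first=0
    fi
    # 10# forces base 10 so months like 08 are not read as octal
    t1=$(( (10#${start:0:4} - y0) * 12 + 10#${start:4:2} - m0 + 1 ))
    t2=$(( (10#${end:0:4} - y0) * 12 + 10#${end:4:2} - m0 + 1 ))
    if (( t1 != prev + 1 )); then
      echo "warning: gap or overlap before $f" >&2
    fi
    printf 'CHSUB %d %d %s\n' "$t1" "$t2" "$range"
    prev=$t2
  done
}

# Example:
#   chsub_lines tas_Amon_CCSM4_historical_r1i1p1_*.nc >> tas.ctl
```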

  17. Get Additional Information
      About CMIP5: http://cmip-pcmdi.llnl.gov/cmip5/ and cmip5-helpdesk@stfc.ac.uk
      About ESGF: http://esgf.org/wiki/ESGF_Index/ and esgf-user@lists.llnl.gov
      About this presentation: jma@iges.org
