Climate Application Final Report
UC-Spain. SENAMHI-Perú. UDEC-Chile.
Richard Miguel San Martín Mauricio Carrillo Gabriela Rosas Amelia Diaz Delia Acuña.
Rodrigo Abarca Claudio Baeza.
Jose M. Gutierrez Valvanuz Fernandez Antonio S. Cofiño Fernando García Jesús Fernandez.
Challenges
• Enabling grid computing for climate model simulation: Global circulation models provide a coarse description of the ocean and atmosphere (200 km resolution) and have to be linked to regional models to obtain useful representations over areas of interest. Regional models depend on many parameters related to sub-grid physical processes (multi-parametric jobs).
• CAM + WRF: CAM and WRF are open-source, state-of-the-art global and regional models, respectively. They need to be run in cascade.
[Diagram labels: Sea surface temperature, CAM, output converter, WRF, NCAR Graphics library]
EGRIS-1, Itacuruçá (Brasil), 4.12.2006
Challenges
• Enabling data mining applications on simulations: The high-dimensional character of the data involved in climate simulations requires efficient data mining techniques to extract useful knowledge.
• Unsupervised clustering allows partitioning the simulation databases, producing characteristic weather or climate types (or groups) governing the global dynamics.
• Self-Organizing Maps (SOM) is one of the most popular clustering algorithms and is especially suitable for visualizing and modeling high-dimensional data.
• The weather types can be locally projected to obtain statistical regional forecasts of variables of interest.
[Figure (right): Precipitation at two different stations in Peru for an El Niño period.]
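The clustering step described on this slide can be illustrated with a minimal sketch of a one-dimensional SOM in plain Python. This is not the code used in the project: the function names (`bmu`, `train_som`, `type_frequencies`), the linear decay schedules and the toy data are all assumptions chosen for brevity.

```python
import math
import random


def bmu(codebook, x):
    """Return the index of the codebook vector closest to x (best-matching unit)."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))


def train_som(data, n_nodes, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM (a chain of n_nodes) on a list of equal-length vectors."""
    rng = random.Random(seed)
    # Assumption: each node starts as a copy of a randomly chosen sample.
    codebook = [list(rng.choice(data)) for _ in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 1e-3)   # neighbourhood radius shrinks
            winner = bmu(codebook, x)
            for j, w in enumerate(codebook):
                # Gaussian neighbourhood measured along the chain of nodes.
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook


def type_frequencies(codebook, data):
    """Fraction of samples assigned to each node, i.e. to each 'weather type'."""
    counts = [0] * len(codebook)
    for x in data:
        counts[bmu(codebook, x)] += 1
    return [c / len(data) for c in counts]


# Toy two-dimensional "daily patterns" (hypothetical values).
days = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.3], [9.8, 0.2], [10.1, 0.0]]
weather_types = train_som(days, n_nodes=3)
print(type_frequencies(weather_types, days))
```

Each trained node plays the role of one weather type, and the frequencies give the share of simulated days assigned to it. Real climate fields would have far more dimensions and would normally use a two-dimensional map.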
Climate Cascade Demo
• Ensemble prediction systems comprise multiple runs of a weather model with slightly different initial conditions and/or model parameterizations. The resulting simulations contain valuable information about the sampled sources of uncertainty.
• Compare the SOM distribution of each parameterization.
[Diagram labels: Sea surface temperature, CAM, WRF (par 1), WRF (par 2), …, WRF (par n), SE, SOM. One El Niño year, 365 simulations …]
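One possible way to compare the SOM distributions of several parameterizations is to compare how often each weather type occurs in each run. The sketch below uses the total variation distance for this. The slides do not say which measure was used, so the metric, the function names and the frequency values are all assumptions.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of weather types")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def compare_parameterizations(freqs):
    """Pairwise distances between the weather-type frequencies of each run.

    freqs maps a parameterization name to a list of frequencies (one per SOM node).
    """
    names = sorted(freqs)
    return {
        (a, b): total_variation(freqs[a], freqs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical weather-type frequencies for two WRF parameterizations.
example = {
    "par1": [0.5, 0.25, 0.25],
    "par2": [0.25, 0.5, 0.25],
}
print(compare_parameterizations(example))
```

A distance of 0 means both runs visit the weather types equally often, and 1 means they never share a type.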
WRF and CAM Application Achievements
• CAM and WRF are running in the Lost Island Grid.
• CAM is a data producer and WRF a data consumer.
• WRF is fed with CAM data.
• We use the SYSTEM call from Fortran 90 to upload information to LFC and AMGA.
• All this is done using shell scripts.
• Progress has been made on task management and monitoring using AMGA.
• After polling which data are available, the user decides which job to run.
• Start, restart or cancel CAM experiment jobs.
• Start or cancel WRF tasks.
CAM Status
[Diagram labels: CAM Simulation; LFC and SE (DATA); AMGA (Metadata); R-GMA (Information); CAM job Status; CAM job Status & Checkpoint; CAM job Status & Data Output; To be implemented]
CAM: Community Atmosphere Model
The Community Atmosphere Model (CAM) is the latest in a series of global atmosphere models developed at NCAR for the weather and climate research communities.
• Grid size: 128 x 64 x 27 (XYZ) = 221184 grid points.
• 6 output time steps = 197 MB of NetCDF, i.e. 33 MB per time step.
• This includes ALL default variables (32 3D + 56 2D).
• WRF only requires 5 3D and 9 2D variables as input (effectively 5 MB per step = 620 MB per month with 6-hourly input).
• 720 GB per 100 years.
• 1 year of simulation takes 48 CPU hours.
• A climate simulation of 100 years takes 7 CPU months.
A case study simulating the climate of the past century will require a CAM job running for 7 months, so checkpointing is an important feature.
For ENSEMBLES studies (multi-parametric), those figures increase.
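The figures on this slide can be checked with a few lines of arithmetic. The sketch below reproduces them under stated assumptions: a 31-day month for the 620 MB figure, a 360-day year and 1 GB = 1000 MB for the 720 GB figure, and 30-day months for the CPU time. The slide itself does not state these conventions.

```python
# Grid size quoted on the slide.
NX, NY, NZ = 128, 64, 27
gridpoints = NX * NY * NZ                       # 221184 grid points

# Full default output: 197 MB for 6 time steps.
full_output_mb_per_step = 197 / 6               # about 32.8, quoted as 33 MB

# Reduced WRF input: 5 MB per step, 6-hourly input means 4 steps per day.
WRF_INPUT_MB_PER_STEP = 5
STEPS_PER_DAY = 24 // 6
mb_per_month = WRF_INPUT_MB_PER_STEP * STEPS_PER_DAY * 31          # assumes a 31-day month
gb_per_century = WRF_INPUT_MB_PER_STEP * STEPS_PER_DAY * 360 * 100 / 1000  # assumes a 360-day year

# CPU time: 48 CPU hours per simulated year.
cpu_hours_per_century = 48 * 100
cpu_days = cpu_hours_per_century / 24
cpu_months = cpu_days / 30                      # assumes 30-day months

print(gridpoints, round(full_output_mb_per_step), mb_per_month, gb_per_century, round(cpu_months, 1))
```

With a 365-day year the century total would be slightly higher than 720 GB, so the slide's value is best read as an order-of-magnitude estimate.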
Data and Metadata
• Datasets produced by the WRF and CAM models are stored in the LFC catalog.
• The metadata from these datasets is extracted and uploaded to AMGA.
• CAM produces a checkpoint dataset, which is uploaded to LFC; the user is notified through AMGA.
Application Workflow
• The user queries the status of the CAM job. If the job is not running, the user queries AMGA to find out whether it finished. If it did not, the user checks which checkpoint file was the last one and restarts the CAM job.
• While the CAM job is running, the user queries AMGA about the datasets produced by CAM and then triggers the WRF jobs.
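The decision logic of this workflow can be sketched as a small function. This only illustrates the steps on the slide and is not the project's code. The function name, its arguments and the returned strings are hypothetical, and the final branch (no checkpoint available) is an assumption, since the slide does not cover that case.

```python
from typing import Optional


def next_cam_action(is_running: bool, is_done: bool, last_checkpoint: Optional[str]) -> str:
    """Decide what the user does next, following the workflow on the slide."""
    if is_running:
        # While CAM runs, the user polls AMGA for new datasets and launches WRF.
        return "query AMGA for CAM datasets and trigger WRF jobs"
    if is_done:
        # AMGA reports that the job finished, so there is nothing to restart.
        return "CAM job finished"
    if last_checkpoint is not None:
        # The job stopped early: restart it from the last checkpoint file.
        return f"restart CAM from {last_checkpoint}"
    # Assumption: with no checkpoint, the job is started again from scratch.
    return "start CAM from the beginning"


print(next_cam_action(False, False, "checkpoint_0042"))
```

In the real application the three inputs would come from the grid job status and from AMGA queries.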
Application Workflow Schema
[Diagram: UI, Resource Broker, WN (CAM) and AMGA, connected by arrows numbered 1 to 4]
• supersubmiter.sh: Generate CAM.jdl; Submit CAM.jdl; Insert entry in CAM collection (AMGA)
• upload_info_CAM.sh: Insert entry in WRF collection (AMGA)
• change_status_CAM.sh: Update status; Insert runon CAM collection (AMGA)
• checkpoint_CAM.sh: Insert entry in CHECKPOINT collection (AMGA)
• update_history_CAM.sh: Insert entry in HISTORYCAM collection (AMGA)
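The "Generate CAM.jdl" step can be sketched as follows. The project did this with shell scripts, so the Python below is a swapped-in illustration and not the original implementation. The JDL attributes (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard, but the file names `run_cam.sh` and `namelist` and the argument `exp01` are hypothetical.

```python
def render_jdl(executable, arguments, sandbox_files):
    """Render a minimal JDL job description as text."""

    def quoted_list(items):
        # JDL lists are written as {"a", "b"}.
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # The executable itself travels in the input sandbox with its inputs.
        f"InputSandbox = {quoted_list([executable] + list(sandbox_files))};",
        'OutputSandbox = {"std.out", "std.err"};',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical wrapper script and input file for one CAM experiment.
print(render_jdl("run_cam.sh", "exp01", ["namelist"]))
```

The resulting text would be written to CAM.jdl and handed to the Resource Broker with the middleware's job submission command.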
Coordinator: Task States
[Diagram: state-transition graph over states 1 to 6]
1: Task is ready to be scheduled
2: Task has been submitted to the Grid by the coordinator
3: Task is running on the Grid
4: Task finished successfully
5: Task execution or submission failed
6: Task was cancelled by the user through the coordinator
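The six task states map naturally onto an enumeration with a transition check. In the sketch below the state numbers come from the slide, while the set of allowed transitions is an assumption, because the arrows of the original diagram did not survive in the text.

```python
from enum import IntEnum


class TaskState(IntEnum):
    READY = 1       # ready to be scheduled
    SUBMITTED = 2   # submitted to the Grid by the coordinator
    RUNNING = 3     # running on the Grid
    DONE = 4        # finished successfully
    FAILED = 5      # execution or submission failed
    CANCELLED = 6   # cancelled by the user through the coordinator


# Assumed transitions; the original diagram may differ.
ALLOWED = {
    TaskState.READY: {TaskState.SUBMITTED, TaskState.FAILED, TaskState.CANCELLED},
    TaskState.SUBMITTED: {TaskState.RUNNING, TaskState.FAILED, TaskState.CANCELLED},
    TaskState.RUNNING: {TaskState.DONE, TaskState.FAILED, TaskState.CANCELLED},
    TaskState.DONE: set(),
    TaskState.FAILED: set(),
    TaskState.CANCELLED: set(),
}


def transition(current, new):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {new.name}")
    return new


state = TaskState.READY
for target in (TaskState.SUBMITTED, TaskState.RUNNING, TaskState.DONE):
    state = transition(state, target)
print(state.name)
```

Storing the integer value of the state in the metadata catalog would keep it compatible with the numbering on the slide.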
CAM Info & Data Flow
[Diagram labels: CAM; WRF; LFC and SE (DATA); AMGA (Metadata); R-GMA (Information); To be implemented]
WRF Info & Data Flow
[Diagram labels: WRF; LFC and SE (DATA); AMGA (Metadata); R-GMA (Information); To be implemented]
WRF and User Interaction
[Diagram labels: UI; coordinator; WRF; LFC and SE (DATA); AMGA (Metadata); R-GMA (Information); To be implemented]
Application Workflow Improvement
[Diagram labels: Portal; Coordinator; RB; WRF; CAM; AMGA; R-GMA; management; monitoring]
Issues about application development
• A uniform framework for application development is missing.
• gLite is a mixture of different initiatives.
• LFC and SE operations are unreliable; sometimes it is not possible to delete or recover data.
• Is the application responsible for reliability? Or the GRID?
• An application workflow framework is required to have more monitoring and control over the applications.
• Job submission is a quest: you never know what is going to happen, and there is little chance of post-mortem analysis.
• Metadata is an important issue in data management that has not been well established.
• A metadata system is really useful for development.
• APIs are not well tested.
• The C and Python APIs certainly work, but the Perl and Java releases are not well tested. (Not enough of a user community?)
What was expected from EGRIS
• DAGs and checkpointable job submission.
• Restart of jobs with dependencies.
• Using the metadata catalog from worker nodes:
• Loading metadata with the AMGA API from the WN.
• Integration of the metadata catalogs and the dataset catalogue.
• Data access protocol to datasets.
• OPeNDAP service in the Storage Element.
• Development of a portal for job submission and monitoring:
• Authentication management from the portal.
• Monitoring the status of jobs.
• Retrieval of information from the metadata catalog.
Thanks
Thanks to the EELA project for organizing EGRIS-1.
Thanks to the local committee for setting up the Lost Island GRID.
Thanks to the tutors for their help.
And thanks to Valva, Claudio and Mauricio for their effort to migrate the Climate Application to the GRID.
Questions?