Project Athena: Origins
• The World Modeling Summit (WMS) in May 2008 called for a revolution in climate modeling to more rapidly advance improvements in accuracy and reliability.
• The WMS recommended petascale supercomputers dedicated to climate modeling, based in at least three international facilities.
• Dedicated petascale machines are needed to provide sufficient computational capability and a controlled environment to support long runs and the management, analysis and stewardship of very large (petabyte) data sets.
• The U.S. National Science Foundation, recognizing the importance of the problem, realized that a suitable resource was available and offered to dedicate the Athena supercomputer for six months in 2009-2010 to meet the challenge posed by the World Modeling Summit.
• An international collaboration was formed among groups in the U.S., Japan and the U.K. to use Athena to take up the challenge.
Project Athena: Collaborating Groups
• COLA – Center for Ocean-Land-Atmosphere Studies, USA (NSF-funded)
• ECMWF – European Centre for Medium-Range Weather Forecasts, UK
• JAMSTEC – Japan Agency for Marine-Earth Science and Technology, Research Institute for Global Change, Japan
• University of Tokyo, Japan
• NICS – National Institute for Computational Sciences, USA (NSF-funded)
• Cray Inc.
Codes
• NICAM: Nonhydrostatic Icosahedral Atmospheric Model
• IFS: ECMWF Integrated Forecast System
Supercomputers
• Athena: Cray XT4 – 4512 quad-core Opteron nodes (18,048 cores); #30 on the Top500 list (November 2009); dedicated Oct 2009 – Mar 2010
• Kraken: Cray XT5 – 8256 dual hex-core Opteron nodes (99,072 cores); #3 on the Top500 list (November 2009); replaced Athena, with an allocation of 5M SUs
Athena Experiments: http://wxmaps.org/athena/home/
[Figure: Surface pressure and potential vorticity]
Blocking Index
13-month integrations of the ECMWF model (at T159 and T1279) compared with ERA-40, DJFM 1960-2003.
DJFM Weather Regimes – Euro-Atlantic Region
500 hPa geopotential height (ERA), DJFM 1960-2007.
Atmospheric Spectra Power Laws
Two scaling regimes are seen in the log-log plot of energy vs. wavenumber (100 – 10 km).
Aircraft observations show spectra of the wind components and temperature, plotted as log(E) vs. log(k), so that the slopes of the straight lines indicate the exponent n in the previous slide. Other in-situ observations have confirmed these results.
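For reference, the power-law form referred to above can be written as below; the two exponent values are the ones commonly reported from aircraft and other in-situ observations, and are an assumption here since the "previous slide" that defines n is not part of this section:

```latex
% Kinetic-energy spectrum power law and the two commonly reported
% scaling regimes (exponent values assumed, not taken from this slide deck):
E(k) \propto k^{-n}, \qquad
n \approx 3 \ \text{(synoptic scales)}, \qquad
n \approx 5/3 \ \text{(mesoscales)}
```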
ECMWF Dec–March Simulations: Eddy Kinetic Energy Spectrum at 250 hPa
• Total eddy kinetic energy at 250 hPa for 5 DJF seasons; black: T511 (40 km grid), red: T1279 (16 km grid), blue: T2047 (10 km grid).
• Hint of two regimes at T1279 and T2047, but not at T511.
• Note the sudden downturn in the spectra: suggests a dissipation regime.
Local Spectral Slope b
• E_n ~ n^(-b); the y-axis is b, the x-axis is log10(n). A large slope indicates the dissipation regime.
• Slope of total eddy kinetic energy at the 250 hPa level for 5 DJF seasons; black: T511 (40 km grid), red: T1279 (16 km grid), blue: T2047 (10 km grid).
• T2047 and T1279 show a weak shallowing of the spectra at higher wavenumbers.
• b is obtained from a least-squares fit of log10(eddy kinetic energy) to a line with slope –b, locally over a window of constant width in log10(n) (a minimal sketch of this fit follows below).
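The following is a minimal sketch of the local slope estimate described above, not the authors' code. It assumes the spectrum is available as eddy kinetic energy E_n on total wavenumbers n, and the window width in log10(n) is an illustrative choice.

```python
import numpy as np

def local_spectral_slope(n, E, window=0.3):
    """Local spectral slope b, where E_n ~ n**(-b).

    For each wavenumber, fit log10(E) to a straight line in log10(n) over a
    window of constant width in log10(n) and return -slope.  The window width
    (in log10 units) is an assumed choice, not taken from the talk.
    """
    logn, logE = np.log10(n), np.log10(E)
    b = np.full_like(logn, np.nan)
    for i, x0 in enumerate(logn):
        sel = np.abs(logn - x0) <= window / 2   # points inside the local window
        if sel.sum() >= 3:
            slope, _ = np.polyfit(logn[sel], logE[sel], 1)
            b[i] = -slope                       # E ~ n**(-b)
    return b

# Purely illustrative example: a synthetic two-regime spectrum
n = np.arange(1, 2048)
E = np.where(n < 100, n ** -3.0, 100.0 ** (-3.0 + 5 / 3) * n ** (-5 / 3))
b = local_spectral_slope(n, E)
```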
Athena Results
• Seasonal-length runs
• Results shown for 5 DJF seasons and 5 JJA seasons
• Results for both the ECMWF and NICAM models
Cluster analysis methodology
The (modified) K-means cluster analysis method (Straus et al. 2007), in which the number of clusters K must be specified in advance, can be summarized in the following four steps (a minimal sketch of the procedure is given below):
1. Clusters are identified in the reduced phase space defined by the empirical orthogonal functions (EOFs); the leading EOFs (explaining about 80% of the space-time variance) are retained.
2. For a given number k of clusters, the optimum partition of the data into k clusters is found by an algorithm that takes an initial cluster assignment (based on the distance from pseudo-random seed points) and iteratively changes it by assigning each element to the cluster with the closest centroid, until a "stable" classification is achieved. (A cluster centroid is defined by the average of the PC coordinates of all states that lie in that cluster.)
3. This process is repeated many times (using different seeds), and for each partition the ratio r*_k of the variance among cluster centroids (weighted by the cluster populations) to the average intra-cluster variance is recorded.
4. The partition that maximises this ratio is the optimal one.
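A minimal sketch of this procedure, assuming the data have already been projected onto their leading PCs; this is an illustrative implementation, not the Straus et al. (2007) code, and the function and variable names are invented for this sketch:

```python
import numpy as np

def kmeans_best_partition(pcs, k, n_seeds=100, n_iter=100, rng=None):
    """K-means in EOF/PC space; keep the seed that maximises the ratio of the
    (population-weighted) variance among centroids to the mean intra-cluster
    variance, as described on the slide above.

    pcs : (n_samples, n_pcs) array of principal-component coordinates.
    Returns (labels, centroids, ratio) for the best partition found.
    """
    rng = np.random.default_rng(rng)
    best = (None, None, -np.inf)
    for _ in range(n_seeds):
        # initial assignment: distance to k pseudo-random seed points
        seeds = pcs[rng.choice(len(pcs), size=k, replace=False)]
        labels = np.argmin(((pcs[:, None, :] - seeds) ** 2).sum(-1), axis=1)
        degenerate = False
        for _ in range(n_iter):
            if any(not np.any(labels == j) for j in range(k)):
                degenerate = True            # an empty cluster appeared
                break
            centroids = np.array([pcs[labels == j].mean(0) for j in range(k)])
            new = np.argmin(((pcs[:, None, :] - centroids) ** 2).sum(-1), axis=1)
            if np.array_equal(new, labels):  # "stable" classification reached
                break
            labels = new
        if degenerate:
            continue
        ratio = _variance_ratio(pcs, labels, centroids, k)
        if ratio > best[2]:
            best = (labels, centroids, ratio)
    return best

def _variance_ratio(pcs, labels, centroids, k):
    """Population-weighted variance among centroids / mean intra-cluster variance."""
    pops = np.array([(labels == j).sum() for j in range(k)])
    grand = pcs.mean(0)
    among = (pops * ((centroids - grand) ** 2).sum(1)).sum() / pops.sum()
    within = np.mean([((pcs[labels == j] - centroids[j]) ** 2).sum(1).mean()
                      for j in range(k)])
    return among / within
```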
Cluster analysis – Significance
The goal is to assess the strength of the clustering compared to that expected from an appropriate reference distribution, such as a multidimensional Gaussian distribution (a minimal sketch is given below).
• To assess whether the null hypothesis of multi-normality can be rejected, Monte Carlo simulations are performed using a large number M of synthetic data sets.
• Each synthetic data set has precisely the same length as the original data set against which it is compared, and is generated from a set of n-dimensional first-order Markov processes whose mean, variance and first-order auto-correlation are obtained from the observed data set.
• A cluster analysis is performed on each of the simulated data sets. For each k-partition, the ratio r^m_k of the variance among cluster centroids to the average intra-cluster variance is recorded.
• Since the synthetic data are assumed to have a unimodal distribution, the proportion P_k of red-noise samples for which r^m_k < r*_k is a measure of the significance of the k-cluster partition of the actual data, and serves as the confidence level for the existence of k clusters.
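A minimal sketch of this Monte Carlo test, reusing the illustrative kmeans_best_partition helper from the previous sketch; the AR(1), i.e. first-order Markov, surrogate generation below follows the description on this slide, and all names are assumptions of this sketch:

```python
import numpy as np

def ar1_surrogate(pcs, rng):
    """One synthetic data set: an independent AR(1) (first-order Markov) series
    per PC, matching the mean, variance and lag-1 autocorrelation of `pcs`."""
    nt, npc = pcs.shape
    mean, std = pcs.mean(0), pcs.std(0)
    z = (pcs - mean) / std
    a = (z[1:] * z[:-1]).mean(0)                    # lag-1 autocorrelation per PC
    noise = rng.standard_normal((nt, npc)) * np.sqrt(1.0 - a ** 2)
    synth = np.empty((nt, npc))
    synth[0] = rng.standard_normal(npc)
    for t in range(1, nt):
        synth[t] = a * synth[t - 1] + noise[t]
    return synth * std + mean

def cluster_significance(pcs, k, M=100, rng=None):
    """Proportion P_k of red-noise samples whose variance ratio is smaller
    than the ratio r*_k obtained from the actual data."""
    rng = np.random.default_rng(rng)
    _, _, r_star = kmeans_best_partition(pcs, k, rng=rng)
    r_synth = [kmeans_best_partition(ar1_surrogate(pcs, rng), k, rng=rng)[2]
               for _ in range(M)]
    return float(np.mean(np.array(r_synth) < r_star))
```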
Cluster analysis – How many clusters?
The need to specify the number of clusters in advance can be a disadvantage of the K-means method if the best cluster partition of the data set in question is not known beforehand. However, several criteria can be used to choose the optimal partition:
• Significance: choose the partition with the highest significance with respect to the reference multi-normal distributions.
• Reproducibility: a measure of reproducibility can be based on the mean-squared error between best-matching cluster centroids obtained from N pairs of randomly chosen half-length data sets drawn from the full one; the partition with the highest reproducibility (smallest mismatch) is chosen (a minimal sketch follows below).
• Consistency: consistency can be assessed both with respect to variable (for example, comparing clusters obtained from dynamically linked variables) and with respect to domain (tests of sensitivity to the lateral or vertical domain).
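A minimal sketch of the reproducibility check, again reusing the illustrative kmeans_best_partition helper; the exact centroid matching and normalisation used in the original study are not spelled out on this slide, so a brute-force permutation match and a raw mean-squared error are assumed here:

```python
import numpy as np
from itertools import permutations

def reproducibility(pcs, k, n_pairs=20, rng=None):
    """Mean-squared error between best-matching centroids from pairs of
    randomly chosen half-length subsets (smaller = more reproducible)."""
    rng = np.random.default_rng(rng)
    errs = []
    for _ in range(n_pairs):
        idx = rng.permutation(len(pcs))
        half = len(pcs) // 2
        c1 = kmeans_best_partition(pcs[idx[:half]], k, rng=rng)[1]
        c2 = kmeans_best_partition(pcs[idx[half:]], k, rng=rng)[1]
        # best matching of the two centroid sets: minimise MSE over permutations
        best = min(((c1 - c2[list(p)]) ** 2).mean() for p in permutations(range(k)))
        errs.append(best)
    return float(np.mean(errs))
```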