

A Multivariate Analysis on the 2004 Summer Olympic Games Wei Xiong, M.Sc Student, Department of Mathematics and Statistics, University of Guelph. May 12-13, 2005. OUTLINE. 1. Introduction 2004 Summer Olympic Games


Presentation Transcript


  1. A Multivariate Analysis on the 2004 Summer Olympic Games. Wei Xiong, M.Sc. Student, Department of Mathematics and Statistics, University of Guelph. May 12-13, 2005

  2. OUTLINE 1. Introduction • 2004 Summer Olympic Games • Multivariate techniques: cluster analysis, multivariate analysis of variance, multivariate regression analysis • Literature review of analyses on Olympic Games 2. Data Analysis and Discussion 3. Conclusions

  3. 2004 Summer Olympic Games • The largest event: 11,000 athletes from 202 countries; 929 medals won by 75 countries/regions. • Multivariate (>1 response variable) Techniques • Cluster Analysis: observations (countries) are classified into clusters (groups) based on the similarity of their multiple variables (number of gold, silver, bronze and total medals), measured by the distance or dissimilarity between any two clusters.
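The distance idea behind cluster analysis can be illustrated with a minimal sketch. It assumes plain Euclidean distance on the raw medal counts, which is a simplification: the SAS run in this deck standardizes the variables and uses the EML method. The medal counts come from the appendix table; the function name `euclidean` is only illustrative.

```python
import math

# Medal counts (gold, silver, bronze, total) taken from the appendix table.
medals = {
    "USA": (35, 39, 29, 103),
    "RUS": (27, 27, 38, 92),
    "NZL": (3, 2, 0, 5),
}

def euclidean(a, b):
    """Euclidean distance between two medal profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_usa_rus = euclidean(medals["USA"], medals["RUS"])
d_usa_nzl = euclidean(medals["USA"], medals["NZL"])

# A smaller distance means more similar medal profiles.
print(round(d_usa_rus, 2), round(d_usa_nzl, 2))
```

Under this metric USA is much closer to RUS than to NZL, which is the kind of similarity a clustering procedure uses when forming groups.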

  4. Multivariate Analysis of Variance (MANOVA): a generalization of ANOVA, used to compare more than two population mean vectors. Hypothesis: $H_0: \mu_1 = \dots = \mu_t$ versus $H_a: \mu_j \neq \mu_k$ (for some $j \neq k$). $H_0$ is rejected if $H = \mathrm{SS(Treatment)} \gg E = \mathrm{SS(Error)}$. Wilks' statistic: $\Lambda = |E| / |E + H|$
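The Wilks' statistic can be traced by hand on a tiny example. The sketch below uses hypothetical toy data (two groups, two response variables), not the Olympic data, and the helper names `mean_vec`, `sscp` and `det2` are illustrative only. It builds the error matrix E and the treatment matrix H as sums of squares and cross-products and then forms |E| / |E + H|.

```python
def mean_vec(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def sscp(rows, center):
    """Sum of squares and cross-products of rows about a center."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[k] - center[k] for k in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical toy data: two responses for two groups of three units each.
groups = {
    "A": [(10, 12), (12, 11), (11, 13)],
    "B": [(2, 1), (1, 3), (3, 2)],
}

all_rows = [r for rows in groups.values() for r in rows]
grand = mean_vec(all_rows)

# E: within-group (error) SSCP; H: between-group (treatment) SSCP.
E = [[0.0, 0.0], [0.0, 0.0]]
H = [[0.0, 0.0], [0.0, 0.0]]
for rows in groups.values():
    gm = mean_vec(rows)
    within = sscp(rows, gm)
    between = sscp([gm], grand)
    for i in range(2):
        for j in range(2):
            E[i][j] += within[i][j]
            H[i][j] += len(rows) * between[i][j]

E_plus_H = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
wilks = det2(E) / det2(E_plus_H)
print(round(wilks, 4))
```

The statistic lies between 0 and 1; values near 0 arise when H is large relative to E, which is the situation in which the hypothesis of equal mean vectors is rejected.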

  5. Multivariate Regression model: $Y_{n \times p} = X_{n \times q}\,\beta_{q \times p} + E_{n \times p}$, where n: observations, p: response variables, q: explanatory variables. The least squares estimator of $\beta$ is $\hat{\beta} = (X'X)^{-1} X'Y$
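The least squares formula can be checked on a small case. This is a minimal sketch on hypothetical toy data with an intercept and one explanatory variable (q = 2) and two responses (p = 2); the helper names `transpose`, `matmul` and `inv2` are illustrative, and the 2x2 inverse is written out by hand only because q = 2 here.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical toy data: the responses are exact linear functions of x,
# y1 = 2 + 3x and y2 = 1 - x, so the estimator should recover them.
xs = (1.0, 2.0, 3.0, 4.0)
X = [[1.0, x] for x in xs]            # n x q, first column is the intercept
Y = [[2 + 3 * x, 1 - x] for x in xs]  # n x p

Xt = transpose(X)
B_hat = matmul(inv2(matmul(Xt, X)), matmul(Xt, Y))  # (X'X)^{-1} X'Y, q x p
print(B_hat)
```

Each column of the estimate holds the intercept and slope for one response, so all p regressions are fitted in one matrix expression.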

  6. Literature review • Condon et al. [1] tried to predict a country's success at the Olympic Games using linear regression models and neural network models. • Lins et al. [2] developed a Data Envelopment Analysis (DEA)-based model to rank each country based on its ability to win medals in relation to its available resources. • Churilov and Flitman [3] improved the DEA-based model by combining different sets of input parameters with the DEA model. • This study: uses multivariate techniques to analyze the 2004 Summer Olympic Games and explores the factors that influence the number of medals won.

  7. Table 1: Rankings for Participating Countries. Note: the numbers of countries in clusters 1, 2, 3, 4 and 5 are 2, 3, 7, 7 and 56, respectively.

  8. Table 2: Least Squares Means for Medals by Group. Note: # means close to each other

  9. Multivariate Analysis of Variance (MANOVA): compares the medal means for the 5 groups
     MANOVA Test: Hypothesis of No Overall Group Effect
     Statistic        Value         F Value    Pr > F
     Wilks' Lambda    0.02126952    49.34      <.0001
     SAS code:
     proc glm;
       class group;
       model y1-y4=group;
       manova h=group;
       lsmeans group/pdiff;
     run;

  10. Least Squares Means for effect group for silver (y2)
      Pr > |t| for H0: LSMean(i)=LSMean(j)
      i/j    1         2         3         4
      2      <.0001
      3      <.0001    <.0001
      4      <.0001    <.0001    0.0572
      5      <.0001    <.0001    <.0001    <.0001
      Note: p-values for other medals < 0.0001

  11. Why? • Why did some countries win more medals and others fewer? • Hypothesis: the larger the population and GDP, the more medals won. Population: the larger the population (x1), the more outstanding athletes are available. GDP (Gross Domestic Product): the higher the GDP (x2), the more funding for athletes' training.

  12. Table 3: Multivariate Regression of Medals (y's) on Population (x1) [5] and GDP (x2) [6] (x's)
      proc glm;
        model y1-y4 = x1-x2/xpx i;
      run;

  13. Conclusions • The 2004 Summer Olympic Games are analyzed using multivariate methods: Cluster Analysis, Multivariate Analysis of Variance, Multivariate Regression Analysis. • Participating countries are classified into 5 groups based on their number of medals won. It is found that each group differs significantly in terms of the number of medals in that group.

  14. Population and GDP are two significant factors for each group's number of medals: an increase of 1 million in population increases the number of gold medals by 0.0116, or the total number of medals by 0.019. An increase of 1 billion in GDP increases the number of gold medals by 0.0031, silver by 0.0033, bronze by 0.0027, or the total by 0.0091. References [1] Edward M. Condon, Bruce L. Golden and Edward A. Wasil (1999). Predicting the success of nations at the Summer Olympics using neural networks. Computers & Operations Research, 26(13), 1243-1265.

  15. [2] Marcos P. Estellita Lins, Eliane G. Gomes, João Carlos C. B. Soares de Mello and Adelino José R. Soares de Mello (2003). Olympic ranking based on a zero sum gains DEA model. European Journal of Operational Research, 148(2), 312-322. [3] L. Churilov and A. Flitman (2004). Towards fair ranking of Olympics achievements: the case of Sydney 2000. Computers & Operations Research. Available online 6 November 2004. [4] http://www.athens2004.com/en/OlympicMedals/medals, accessed May 11, 2005. [5] http://www.geohive.com/global/index.php, accessed Nov. 25, 2004. [6] http://www.geohive.com/global/geo.php?xml=ec_gdp1&xsl=ec_gdp1, accessed May 11, 2005.

  16. Questions and comments are welcome!

  17. Appendix 1. Table 1. Number of medals for each country/region
      Country/Region,Gold,Silver,Bronze,Total
      USA,35,39,29,103
      CHN,32,17,14,63
      RUS,27,27,38,92
      AUS,17,16,16,49
      JPN,16,9,12,37
      GER,14,16,18,48
      FRA,11,9,13,33
      ITA,10,11,11,32
      KOR,9,12,9,30
      GBR,9,9,12,30
      CUB,9,7,11,27
      UKR,9,5,9,23
      HUN,8,6,3,17
      ROM,8,5,6,19
      GRE,6,6,4,16
      NOR,5,0,1,6
      NED,4,9,9,22
      BRA,4,3,3,10
      SWE,4,1,2,7
      ESP,3,11,5,19
      CAN,3,6,3,12
      TUR,3,3,4,10
      POL,3,2,5,10
      NZL,3,2,0,5
      THA (Thailand),3,1,4,8
      BLR (Belarus),2,6,7,15
      AUT (Austria),2,4,1,7
      ETH (Ethiopia),2,3,2,7
      IRI (I.R. Iran),2,2,2,6
      SVK (Slovakia),2,2,2,6
      TPE (Chinese Taipei),2,2,1,5
      GEO (Georgia),2,2,0,4
      BUL (Bulgaria),2,1,9,12
      JAM (Jamaica),2,1,2,5
      UZB (Uzbekistan),2,1,2,5
      MAR (Morocco),2,1,0,3
      DEN (Denmark),2,0,6,8
      ARG (Argentina),2,0,4,6
      CHI (Chile),2,0,1,3
      KAZ (Kazakhstan),1,4,3,8
      KEN (Kenya),1,4,2,7
      CZE (Czech Republic),1,3,4,8
      RSA (South Africa),1,3,2,6
      CRO (Croatia),1,2,2,5
      LTU (Lithuania),1,2,0,3
      EGY (Egypt),1,1,3,5
      SUI (Switzerland),1,1,3,5
      INA (Indonesia),1,1,2,4
      ZIM (Zimbabwe),1,1,1,3
      AZE (Azerbaijan),1,0,4,5
      BEL (Belgium),1,0,2,3
      BAH (Bahamas),1,0,1,2
      ISR (Israel),1,0,1,2
      CMR (Cameroon),1,0,0,1
      DOM (Dominican Rep),1,0,0,1
      IRL (Ireland),1,0,0,1
      UAE (U Arab Emirates),1,0,0,1
      PRK (DPR Korea),0,4,1,5
      LAT (Latvia),0,4,0,4
      MEX (Mexico),0,3,1,4
      POR (Portugal),0,2,1,3
      FIN (Finland),0,2,0,2
      SCG (Serbia.Monteneg),0,2,0,2
      SLO (Slovenia),0,1,3,4
      EST (Estonia),0,1,2,3
      HKG (Hong Kong),0,1,0,1
      IND (India),0,1,0,1
      PAR (Paraguay),0,1,0,1
      NGR (Nigeria),0,0,2,2
      VEN (Venezuela),0,0,2,2
      COL (Colombia),0,0,1,1
      ERI (Eritrea),0,0,1,1
      MGL (Mongolia),0,0,1,1
      SYR (Syrian Arab Rep),0,0,1,1
      TRI (Trinidad.Tobago),0,0,1,1

  18. SAS coding-1
      data Anthemn2004SummerOlympic;
        input Country $ y1-y4;
        cards;
      see Table 1 for data
      ;
      proc cluster method=eml standard rmsstd rsquare outtree=tree;
        var y1-y4;
        id country;
      run;
      proc tree data=tree noprint n=5 out=countryout;
        id country;
      run;
      proc tree data=tree n=5;
        id country;
      run;
      proc sort;
        by country;
      proc sort data=Anthemn2004SummerOlympic out=new;
        by country;
      data temp;
        merge new countryout;
        by country;
      proc sort;
        by cluster;
      proc print;
        id country;
      proc factor heywood rotate=varimax, quartimax;
        var y1-y4;
        by cluster;
      proc princomp;
        var y1-y4;
      run;
      proc factor heywood rotate=varimax, quartimax;
        var y1-y4;
      run;

  19. SAS coding-2
      data Anthemn2004SummerOlympic;
        input group y1-y4 x1-x2;
        cards;
      5 35 39 29 103 273 10882
      5 27 27 38 92 146 433
      4 32 17 14 63 1247 1410
      4 17 16 16 49 19 518
      4 14 16 18 48 82 2401
      ;
      proc glm;
        class group;
        model y1-y4=group;
        manova h=group/printe printh;
        lsmeans group/pdiff;
      run;

  20. SAS coding-3
      data Anthemn2004SummerOlympic;
        input group y1-y4 x1-x2;
        cards;
      ……………..
      ;
      proc corr;
        var y1-y4 x1-x2;
      run;
      proc glm;
        model y1-y4 = x1-x2/xpx i;
        manova h=x1 x2/printe printh;
      run;

  21. Cluster analysis: Countries Classified into 5 Groups [Figure: cluster tree with groups labelled 5, 4, 3, 2, 1; CAN marked]

  22. Table 2: Factor Analysis on Medals. Notes: * cumulative eigenvalues: percentage of the total variation in the four variables (medals) explained. # Factor loadings: correlations between the latent factor and the variables (Factor Analysis, rotation = quartimax, which makes the latent factor either strongly or weakly correlated with each variable).

  23. Correlation Between y's and x's: x1 (Population #) [5], x2 (GDP #, Gross Domestic Product) [6]
      Pearson Correlation Coefficients
      Prob > |r| under H0: Rho=0
              y1         y2         y3         y4
      x1      0.46543    0.3038     0.23199    0.34887
              <.0001     0.0081     0.0452     0.0022
      x2      0.70219    0.76180    0.60769    0.71640
              <.0001     <.0001     <.0001     <.0001
      Note: reasonable correlation between the y's and x1, large correlation between the y's and x2.
      # Both population and GDP are for 2003.
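A Pearson correlation coefficient of the kind tabulated above can be computed directly from its definition. The sketch below uses only the five data rows shown on the SAS coding-2 slide (gold medals y1 and GDP x2), so its result is not expected to match the full-data coefficient in the table; the function name `pearson` is illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y1 (gold) and x2 (GDP) from the five rows listed on the SAS coding-2 slide.
gold = [35, 27, 32, 17, 14]
gdp = [10882, 433, 1410, 518, 2401]

r = pearson(gold, gdp)
print(round(r, 3))
```

With so few rows the coefficient is dominated by the single largest GDP value, which is one reason the deck computes it on all participating countries.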
