1 / 55

Geometric Interpretation of Random Sampling and Variance in Multivariate Data

Understanding the geometric aspects of random sampling, joint distributions, and variance in multivariate datasets. Explore proofs, examples, and interpretations to enhance statistical analysis.

joelgill
Download Presentation

Geometric Interpretation of Random Sampling and Variance in Multivariate Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Sample Geometry and Random Sampling Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia

  2. Array of Data *a sample of size n from a p-variate population

  3. Row-Vector View

  4. Example 3.1

  5. Column-Vector View

  6. Example 3.2

  7. Geometrical Interpretation of Sample Mean and Deviation

  8. Decomposition of Column Vectors

  9. Example 3.3

  10. Lengths and Angles of Deviation Vectors

  11. Example 3.4

  12. Random Matrix

  13. Random Sample • Row vectors X1’, X2’, …, Xn’ represent independent observations from a common joint distribution with density function f(x)=f(x1, x2, …, xp) • Mathematically, the joint density function of X1’, X2’, …, Xn’ is

  14. Random Sample • Measurements of a single trial, such as Xj’=[Xj1,Xj2,…,Xjp], will usually be correlated • The measurements from different trials must be independent • The independence of measurements from trial to trial may not hold when the variables are likely to drift over time

  15. Geometric Interpretation of Randomness • Column vector Yk’=[X1k,X2k,…,Xnk] regarded as a point in n dimensions • The location is determined by the joint probability distribution f(yk) = f(x1k, x2k,…,xnk) • For a random sample, f(yk)=fk(x1k)fk(x2k)…fk(xnk) • Each coordinate xjk contributes equally to the location through the same marginal distribution fk(xjk)

  16. Result 3.1

  17. Proof of Result 3.1

  18. Proof of Result 3.1

  19. Proof of Result 3.1

  20. Some Other Estimators

  21. Generalized Sample Variance

  22. Geometric Interpretation for Bivariate Case q

  23. Generalized Sample Variance for Multivariate Cases

  24. Interpretation in p-space Scatter Plot • Equation for points within a constant distance c from the sample mean

  25. Example 3.8: Scatter Plots

  26. Example 3.8: Sample Mean and Variance-Covariance Matrices

  27. Example 3.8: Eigenvalues and Eigenvectors

  28. Example 3.8: Mean-Centered Ellipse

  29. Example 3.8: Semi-major and Semi-minor Axes

  30. Example 3.8:Scatter Plots with Major Axes

  31. Result 3.2 • The generalized variance is zero when the columns of the following matrix are linear dependent

  32. Proof of Result 3.2

  33. Proof of Result 3.2

  34. Example 3.9

  35. Example 3.9

  36. Examples Cause Zero Generalized Variance • Example 1 • Data are test scores • Included variables that are sum of others • e.g., algebra score and geometry score were combined to total math score • e.g., class midterm and final exam scores summed to give total points • Example 2 • Total weight of chemicals was included along with that of each component

  37. Example 3.10

  38. Result 3.3 • If the sample size is less than or equal to the number of variables ( ) then |S| = 0 for all samples

  39. Proof of Result 3.3

  40. Proof of Result 3.3

  41. Result 3.4 • Let the p by 1 vectors x1, x2, …, xn, where xj’ is the jth row of the data matrix X, be realizations of the independent random vectors X1, X2, …, Xn. • If the linear combination a’Xj has positive variance for each non-zero constant vector a, then, provided that p < n, S has full rank with probability 1 and |S| > 0 • If, with probability 1, a’Xj is a constant c for all j, then |S| = 0

  42. Proof of Part 2 of Result 3.4

  43. Generalized Sample Variance of Standardized Variables

  44. Volume Generated by Deviation Vectors of Standardized Variables

  45. Example 3.11

  46. Total Sample Variance

  47. Sample Mean as Matrix Operation

  48. Covariance as Matrix Operation

  49. Covariance as Matrix Operation

  50. Covariance as Matrix Operation

More Related