Statistics 359a Regression Analysis
Necessary Background Knowledge - Statistics • expectations of sums • variances of sums • distributions of sums of normal random variables • t distribution – assumptions and use • calculation of confidence intervals • simple tests of hypotheses and p-values
Necessary Background Knowledge – Linear Algebra • multiplication of conformable matrices • transpose of a matrix • determinant of a square matrix • inverse of a square matrix • eigenvalues of a square matrix • quadratic forms
Origin of Least Squares: Introduction of the metric system and the length of a meter • 1790 – French National Assembly commissions the French Academy of Sciences to design a simple decimal-based system of weights and measures • 1791 – French Academy defines the meter to be $10^{-7}$, or one ten-millionth, of the length of the meridian through Paris from the north pole to the equator.
Adrien-Marie Legendre • Legendre on the French commission in 1792 to determine the length of the meridian quadrant • measurements of latitude made in 1795 • complex calculations made from the measurements in 1799 • Legendre proposes the method of least squares in 1805 to determine the length of a meter
Data • old French units of measurement: 1 module = 2 toises • old French to imperial English: 1 toise = 6.395 feet • metric to imperial: 1 meter = 3.2808 feet
Solution is: D = 28497.78 modules
90D = 2564800.2 modules = length of the meridian quadrant
Therefore 1 meter = 0.256480 modules = 0.512960 toises = 3.280 feet
Modern meter = 3.2808 feet
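The arithmetic on this slide can be rechecked in a few lines. The sketch below is only a check of the quoted figures, not Legendre's least-squares computation: it takes D = 28497.78 modules as given, reads D as the length of one degree of meridian arc (an assumption, based on the slide multiplying it by 90), and uses the conversion factors from the Data slide. All variable names are illustrative.

```python
# Conversion factors quoted on the Data slide
TOISES_PER_MODULE = 2.0
FEET_PER_TOISE = 6.395

# Least-squares solution quoted on the slide, in modules.
# Assumption: D is the length of one degree of meridian arc.
D_MODULES = 28497.78

# The quadrant from the north pole to the equator spans 90 degrees
quadrant_modules = 90 * D_MODULES

# The meter is defined as one ten-millionth of the quadrant
meter_modules = quadrant_modules / 1e7
meter_toises = meter_modules * TOISES_PER_MODULE
meter_feet = meter_toises * FEET_PER_TOISE

print(f"quadrant = {quadrant_modules:.1f} modules")
print(f"1 meter  = {meter_modules:.6f} modules")
print(f"1 meter  = {meter_toises:.6f} toises")
print(f"1 meter  = {meter_feet:.4f} feet")
```

Comparing `meter_feet` with the modern value of 3.2808 feet shows how close the 1799 determination came.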
Origin of the Term “Regression” • Francis Galton, 1886, ‘Regression towards mediocrity in hereditary stature.’ Journal of the Anthropological Institute, 15: 246 – 263 • See JSTOR under UWO library databases
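Theoretical Basis For X and Y bivariate normal with equal means $\mu$, equal variances $\sigma^2$ and correlation $\rho$, the conditional mean is $E(Y \mid X = x) = \mu + \rho (x - \mu)$. For $0 < \rho < 1$: $E(Y \mid X = x) < x$ for $x > \mu$, and $E(Y \mid X = x) > x$ for $x < \mu$.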
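A small simulation can make the regression effect concrete. The sketch below relies on the standard result that, for a bivariate normal pair with common mean mu, common variance and correlation rho, the conditional mean of Y given X = x is mu + rho (x - mu). The parameter values (mean 68, standard deviation 2.5, correlation 0.5) are hypothetical and are not taken from Galton's paper; the function names are illustrative.

```python
import random


def simulate_pairs(n, mu, sigma, rho, seed=0):
    """Draw n pairs (x, y) from a bivariate normal distribution with
    common mean mu, common standard deviation sigma and correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu + sigma * z1
        # Mixing z1 and z2 this way gives corr(x, y) = rho
        y = mu + sigma * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
        pairs.append((x, y))
    return pairs


def conditional_mean(mu, rho, x):
    """Theoretical E(Y | X = x) when means and variances are equal."""
    return mu + rho * (x - mu)


def mean_y_given_x_between(pairs, lo, hi):
    """Average of y over the simulated pairs with lo <= x < hi."""
    ys = [y for x, y in pairs if lo <= x < hi]
    return sum(ys) / len(ys)


# Hypothetical parameter values, chosen only for illustration
MU, SIGMA, RHO = 68.0, 2.5, 0.5

pairs = simulate_pairs(100_000, MU, SIGMA, RHO)
above = mean_y_given_x_between(pairs, 72.0, 73.0)  # x well above the mean
below = mean_y_given_x_between(pairs, 63.0, 64.0)  # x well below the mean

print(f"x in [72, 73): mean of y = {above:.2f}")
print(f"x in [63, 64): mean of y = {below:.2f}")
```

In both bands the average of y should lie between the band and the common mean, which is the "regression towards mediocrity" of Galton's title.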
Example in Data Analysis Through Regression • Relationship between the price of a violin bow and its attributes such as age, shape and ornamentation on the bow
Price and Date of Sale • 1995 seems to be a more expensive year • Is the effect confounded with some other attribute common to 1995?
Price and Year of Manufacture • Is there anything special about 1920? • Is there a quadratic trend in the data?
Price and Weight of the Bow • Is there any trend with respect to the weight?
Octagonal vs. Round Bows • No apparent trend
The Gold Standard? • The presence of gold on a bow generally makes it more expensive
Tortoise Shell Frogs • Some evidence of added expense for tortoise shell
Price and Pearl Accessories • No apparent effect
Prediction • Can we use the model built with the current data to predict the future price of a bow? • Example: some 1999 data from auctions • 1920 bow, 60.5 g, round, with gold and pearl accessories: $4098 • 1933 bow, 61 g, octagonal, with pearl accessories only: $2421
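As a rough sketch of the mechanics, a fitted linear model predicts a price by taking the dot product of its coefficient vector with a row of attributes. The code below encodes the two 1999 bows using the attributes discussed on the earlier slides. The `Bow` class, the column order and the helper functions are all hypothetical, the tortoise shell indicator for the first bow is assumed to be false because the slide does not mention it, and no coefficients are supplied since the fitted model is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Bow:
    year_made: int
    weight_g: float
    octagonal: bool       # False means round
    gold: bool
    tortoise_shell: bool
    pearl: bool


def design_row(bow):
    """Encode a bow as one row of a design matrix.
    Hypothetical column order: intercept, year of manufacture, weight,
    octagonal, gold, tortoise shell, pearl."""
    return [
        1.0,
        float(bow.year_made),
        float(bow.weight_g),
        float(bow.octagonal),
        float(bow.gold),
        float(bow.tortoise_shell),
        float(bow.pearl),
    ]


def predict(beta, bow):
    """Predicted price: dot product of coefficients and the design row."""
    row = design_row(bow)
    if len(beta) != len(row):
        raise ValueError("beta must have one coefficient per column")
    return sum(b * x for b, x in zip(beta, row))


# The two 1999 auction results quoted on the slide: (bow, observed price)
# Assumption: no tortoise shell on the first bow (not stated on the slide)
bows_1999 = [
    (Bow(1920, 60.5, False, True, False, True), 4098.0),
    (Bow(1933, 61.0, True, False, False, True), 2421.0),
]


def prediction_errors(beta):
    """Observed minus predicted price for each 1999 bow."""
    return [price - predict(beta, bow) for bow, price in bows_1999]
```

Calling `prediction_errors(beta)` with coefficients estimated from the earlier auction data would show how far the model's predictions fall from the 1999 sale prices.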