Finishing the Wiki

Finishing the Wiki • Attempt to follow through on the Wiki as best you can • Refine hypotheses • Find data to support/refute hypotheses. • Produce graphs/charts to support your claim • If possible, add additional analysis • I will assess you

Assessing the Wiki • I will base your Wiki grade on • Amount of contribution • Number of times you worked on it • Quality of thought and analysis, as shown by well-articulated discussion • Supporting information such as data, graphs, charts, etc.

Introduction: The General Linear Model • The General Linear Model is a phrase used to indicate a class of statistical models which includes simple linear regression analysis. • Regression is the predominant statistical tool used in the social sciences due to its simplicity and versatility. • Also called Linear Regression Analysis.

Simple Linear Regression: The Basic Mathematical Model • Regression is based on the concept of the simple proportional relationship - also known as the straight line. • We can express this idea mathematically! • Theoretical aside: All theoretical statements of relationship imply a mathematical theoretical structure. • Just because it isn’t explicitly stated doesn’t mean that the math isn’t implicit in the language itself!

Simple Linear Relationships • Alternate Mathematical Notation for the straight line - don’t ask why! • 10th Grade Geometry • Statistics Literature • Econometrics Literature

Alternate Mathematical Notation for the Line • These are all equivalent. We simply have to live with this inconsistency. • We won’t use the geometric tradition, and so you just need to remember that B0 and a are both the same thing.
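The equations from the original slides are not reproduced in this transcript. As a hedged reconstruction (an assumption about what the slides showed, based on the conventions the text names), the three notations for the same straight line are usually written as:

```latex
\begin{align*}
\text{Geometry:}     \quad & y = mx + b \\
\text{Statistics:}   \quad & Y = a + bX \\
\text{Econometrics:} \quad & Y = \beta_0 + \beta_1 X
\end{align*}
```

Note that the letter b plays different roles: in the geometric form it is the intercept, while in the statistical form it is the slope. The intercept is a in the statistical form and B0 in the econometric form.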

Linear Regression: the Linguistic Interpretation • In general terms, the linear model states that the dependent variable is directly proportional to the value of the independent variable. • Thus if we state that some variable Y increases in direct proportion to some increase in X, we are stating a specific mathematical model of behavior - the linear model.

Linear Regression: A Graphic Interpretation

The linear model is represented by a simple picture

The Mathematical Interpretation: The Meaning of the Regression Parameters • a = the intercept • the point where the line crosses the Y-axis. • (the value of the dependent variable when all of the independent variables = 0) • b = the slope • the increase in the dependent variable per unit change in the independent variable (also known as the 'rise over the run')
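A minimal sketch of what the two parameters mean in practice. The values of a and b here are hypothetical, chosen only for illustration, and `predict` is an illustrative helper name.

```python
def predict(x, a, b):
    """Value of the dependent variable on the line Y = a + b*X."""
    return a + b * x

# Hypothetical parameter values, for illustration only.
a, b = 2.0, 0.5

# The intercept is the prediction when X = 0.
at_zero = predict(0, a, b)

# The slope is the change in the prediction for a one-unit change in X.
rise = predict(4, a, b) - predict(3, a, b)

print("prediction at X = 0:", at_zero)
print("change per unit of X:", rise)
```

Whichever pair of adjacent X values is used, the difference between the two predictions equals b.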

The Error Term • Such models do not predict behavior perfectly. • So we must add a component to adjust or compensate for the errors in prediction. • Having fully described the linear model, the rest of the semester (as well as several more) will be spent on the error.
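In symbols, adding the error term to the line gives the model below. The subscript i (one observation) and the hat on the predicted value are notational assumptions, since the slide's own equation is not in the transcript.

```latex
\begin{align*}
Y_i &= a + bX_i + e_i \\
e_i &= Y_i - \hat{Y}_i, \qquad \hat{Y}_i = a + bX_i
\end{align*}
```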

The 'Goal' of Ordinary Least Squares • Ordinary Least Squares (OLS) is a method of finding the linear model which minimizes the sum of the squared errors. • Such a model provides the best explanation/prediction of the data.

Why Least Squared error? • Why not simply minimum error? • The errors about the line sum to 0.0! • Minimum absolute deviation (error) models now exist, but they are mathematically cumbersome. • Try algebra with | Absolute Value | signs!
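A small sketch of why the plain sum of errors cannot pick out a best line. Any line passing through the point of means has errors that sum to zero, no matter how poor its slope is. The data are made up for illustration, and `residuals` is a hypothetical helper name.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

def residuals(slope):
    """Errors about the line with this slope that passes through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

for slope in (0.0, 1.0, 1.99):
    e = residuals(slope)
    sum_e = sum(e)                      # essentially zero for every slope
    sum_sq = sum(ei ** 2 for ei in e)   # differs a great deal between slopes
    print(f"slope={slope:5.2f}  sum of errors={sum_e:+.6f}  sum of squares={sum_sq:.4f}")
```

The sum of errors is essentially zero for all three lines, so it cannot rank them. The sum of squared errors can.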

Other models are possible... • Best parabola...? • (i.e. nonlinear or curvilinear relationships) • Best maximum likelihood model ... ? • Best expert system...? • Complex Systems…? • Chaos models • Catastrophe models • others

The Notion of Linear Change • The linear aspect means that the same increase in unemployment will have the same effect on crime at both low and high unemployment. • A nonlinear change would mean that as unemployment increased, its impact upon the crime rate might increase at higher unemployment levels.
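The contrast can be sketched with two toy functions. Both functions and all of their coefficients are hypothetical, chosen only to show the difference in behavior. They are not estimates of any real relationship between unemployment and crime.

```python
def linear_crime(unemployment):
    """Hypothetical linear model: constant effect per point of unemployment."""
    return 10.0 + 2.0 * unemployment

def nonlinear_crime(unemployment):
    """Hypothetical nonlinear model: the effect grows as unemployment rises."""
    return 10.0 + 0.2 * unemployment ** 2

# Effect of a one-point increase at low (3 -> 4) and high (10 -> 11) unemployment.
lin_low = linear_crime(4) - linear_crime(3)
lin_high = linear_crime(11) - linear_crime(10)
non_low = nonlinear_crime(4) - nonlinear_crime(3)
non_high = nonlinear_crime(11) - nonlinear_crime(10)

print("linear:   ", lin_low, lin_high)
print("nonlinear:", non_low, non_high)
```

In the linear model the two effects are equal. In the nonlinear one the effect at high unemployment is larger.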

Why squared error? • Because: • (1) the sum of the errors expressed as deviations would be zero, just as deviations from the mean sum to zero when computing a standard deviation, and • (2) some feel that big errors should be more influential than small errors. • Therefore, we wish to find the values of a and b that produce the smallest sum of squared errors.

Minimizing the Sum of Squared Errors • Who put the Least in OLS? • In mathematical jargon we seek to minimize the Unexplained Sum of Squares (USS), where: USS = Σ e_i² = Σ (Y_i − a − bX_i)²
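For a single independent variable, the values of a and b that minimize the sum of squared errors have a well-known closed form. A minimal sketch follows, using made-up data. The function names `ols_fit` and `unexplained_ss` are illustrative and do not come from the slides.

```python
def ols_fit(x, y):
    """Return (a, b) minimizing the sum of squared errors for Y = a + b*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through the means
    return a, b

def unexplained_ss(x, y, a, b):
    """Sum of squared errors about the line Y = a + b*X."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_fit(x, y)
best = unexplained_ss(x, y, a, b)
print(f"a = {a:.4f}, b = {b:.4f}, USS = {best:.4f}")

# Nudging either parameter away from the OLS values makes the USS larger.
print(unexplained_ss(x, y, a + 0.1, b) > best)
print(unexplained_ss(x, y, a, b + 0.1) > best)
```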

Decomposition of the error in LS
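The slide's figure is not in the transcript. As a sketch of the standard decomposition, the total variation of Y about its mean splits into a part explained by the line and the unexplained part (USS). The names TSS and ESS are assumptions here, since the slides name only USS, and the data are made up.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates for Y = a + b*X.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)               # total: Y about its mean
ess = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained: line about the mean
uss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained: Y about the line

print(f"TSS = {tss:.4f}")
print(f"ESS + USS = {ess + uss:.4f}")
```

The identity TSS = ESS + USS holds for a least-squares line with an intercept. It does not hold for an arbitrary line.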

T-Tests • Since we wish to make probability statements about our model, we must do tests of inference. • Fortunately,

Measures of Goodness of fit • The Correlation coefficient • r-squared • The F test

The correlation coefficient • A measure of how closely the data points cluster around the regression line • It ranges between -1.0 and +1.0 • It is closely related to the slope.

Tests of Inference • t-tests for coefficients • F-test for entire model • Since we are interested in how well the model performs at reducing error, we need to develop a means of assessing that error reduction. Since the mean of the dependent variable represents a good benchmark for comparing predictions, we calculate the improvement in the prediction of Yi relative to the mean of Y (the best guess of Y with no other information).
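A sketch of how the two test statistics are computed for simple regression, using the usual textbook formulas and made-up data. Only the statistics are shown. Turning them into p-values needs the t and F distributions (from tables or a statistics library), which this sketch leaves out.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

uss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained sum of squares
ess = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained sum of squares

# Error variance estimate: two parameters (a and b) were estimated.
s2 = uss / (n - 2)

# t statistic for the slope: the estimate divided by its standard error.
se_b = math.sqrt(s2 / sxx)
t_stat = b / se_b

# F statistic for the whole model: explained variation per model degree of
# freedom (1 here) relative to the error variance.
f_stat = (ess / 1) / s2

print(f"t = {t_stat:.3f}, F = {f_stat:.3f}, t squared = {t_stat ** 2:.3f}")
```

With a single independent variable the F statistic equals the square of the slope's t statistic, so the two tests agree.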

Goodness of fit • The correlation coefficient • A measure of how closely the data points cluster around the regression line • It ranges between -1.0 and +1.0 • r2 (r-square) • The r-square (or R-square) is also called the coefficient of determination • Ranges between 0.0 and 1.0 • Expresses the % of the variance in Y explained by X
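A sketch tying these measures together on made-up data. It computes r directly, computes r-square as the share of variation explained, and checks the link between r and the slope. The variable names are illustrative.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in Y
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# OLS line and its unexplained sum of squares.
b = sxy / sxx
a = y_bar - b * x_bar
uss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# r-square as the proportion of the variation in Y explained by X.
r_squared = 1 - uss / syy

# The slope is r rescaled by the spread of Y relative to the spread of X.
slope_from_r = r * math.sqrt(syy / sxx)

print(f"r = {r:.4f}, r squared = {r ** 2:.4f}, 1 - USS/TSS = {r_squared:.4f}")
print(f"b = {b:.4f}, slope from r = {slope_from_r:.4f}")
```

In simple regression, squaring r gives the same number as 1 - USS/TSS. The slope and r always share the same sign.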
