Statistics for CS 312
Descriptive vs. inferential statistics • Descriptive – used to describe an existing population • Inferential – used to draw conclusions about a larger population from a sample
Graphical descriptions • Histograms • Frequency polygons/curves • Pie charts
Measures of central tendency • Mean – average – used most often • Median – midpoint value – used when data is skewed • Mode – most frequently occurring value – used when interested in what most people think
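The three measures can be compared on a small skewed data set. This is a minimal sketch using Python's standard `statistics` module; the `hours` list is made-up illustrative data, not taken from the slides.

```python
from statistics import mean, median, mode

# Hypothetical data: weekly hours of computer use for nine students.
# The single large value (30) skews the distribution to the right.
hours = [2, 3, 3, 4, 5, 5, 5, 6, 30]

avg = mean(hours)     # arithmetic average, pulled upward by the outlier
mid = median(hours)   # middle value of the sorted data
most = mode(hours)    # most frequently occurring value

print(avg, mid, most)
```

Here the mean (7) sits above both the median and the mode (5 each), which is why the median is usually preferred for skewed data.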
Measures of variability • Range – highest value minus lowest value • Standard deviation – typical distance of the individual values from the mean (square root of the average squared deviation)
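Both measures of variability can be computed directly. The sketch below uses made-up `scores` and shows the population form of the standard deviation (divide by n) next to the sample form (divide by n - 1); which one applies depends on whether the data are the whole population or a sample.

```python
from statistics import pstdev, stdev

# Hypothetical scores, chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# Population standard deviation: sqrt(sum of squared deviations / n).
sigma = pstdev(scores)

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1)).
s = stdev(scores)

print(value_range, sigma, s)
```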
Normal curve • Bell shaped curve – about 68% of values lie within one standard deviation of the mean • Non-normal – skewed either negatively (tail to left) or positively (tail to right) • Percentiles – values below which a given percentage of the data falls • Standard scores – distance from mean in terms of the standard deviation – z = (X - m) / s • Z scores – transformed standard scores – Z = 10z + 50
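The two score formulas and the 68% figure can be checked numerically. This is a sketch under assumptions: the `scores` list is made up, it is treated as a complete population (so `pstdev` is used), and the function names are illustrative.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical scores, treated here as a complete population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m = mean(scores)      # mean of the scores
s = pstdev(scores)    # standard deviation of the scores

def standard_score(x):
    """z = (X - m) / s: distance from the mean in standard deviations."""
    return (x - m) / s

def transformed_score(x):
    """Z = 10z + 50: rescaled so the mean is 50 and one sd is 10 points."""
    return 10 * standard_score(x) + 50

z = standard_score(9)
Z = transformed_score(9)

# Share of a normal distribution within one standard deviation of the mean.
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

print(z, Z, round(within_one_sd, 4))
```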
Variables • Quantitative – things that can be measured (age, income, number of credits) • Qualitative – things without an inherent order (college major, address)
Populations and samples • Population – entire universe from which a sample is drawn • Sample – subset of population • Symbols (sample, population) – mean m, µ; standard deviation s, σ; variance s², σ²
How representative is the sample • Random sample – use random numbers to choose members of the sample • Stratified sample – sample that represents subgroups proportionally
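The two sampling methods can be sketched as follows. The population of majors, the seed, and the helper `stratified_sample` are all hypothetical, introduced only for illustration. Rounding each subgroup's share is a simplification that may not add up to the requested size for other proportions.

```python
import random

rng = random.Random(312)  # fixed seed so the example is repeatable

# Hypothetical population: 60 CS majors and 40 math majors.
population = [("CS", i) for i in range(60)] + [("Math", i) for i in range(40)]

# Random sample: every member has the same chance of being chosen.
simple = rng.sample(population, 10)

def stratified_sample(pop, key, size, rng):
    """Sample each subgroup in proportion to its share of the population."""
    strata = {}
    for member in pop:
        strata.setdefault(key(member), []).append(member)
    chosen = []
    for group in strata.values():
        k = round(size * len(group) / len(pop))  # proportional allocation
        chosen.extend(rng.sample(group, k))
    return chosen

stratified = stratified_sample(population, lambda member: member[0], 10, rng)
print(simple)
print(stratified)
```

The stratified sample always contains 6 CS and 4 math majors, while the simple random sample only matches those proportions on average.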
Hypothesis testing • Hypothesis as to relationship of variables – similar or different • Inference from a sample to the entire population
Statistical significance • Accept true hypotheses and reject false ones • Based on probability (10 heads in 10 coin tosses occurs by chance once in 1024 tries) • Significant result means a significant departure from what might be expected from chance alone • Example – a result two or more standard deviations above the mean occurs about 2.3% of the time in a normally distributed population
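Both probabilities on this slide can be reproduced with the standard library. A minimal sketch, assuming a fair coin and a standard normal distribution:

```python
from fractions import Fraction
from statistics import NormalDist

# Probability that 10 tosses of a fair coin all come up heads: (1/2)^10.
p_ten_heads = Fraction(1, 2) ** 10

# Probability that a normally distributed value lies two or more
# standard deviations above the mean (upper tail only).
upper_tail = 1 - NormalDist().cdf(2)

print(p_ten_heads, round(upper_tail, 4))
```

The tail probability comes out near 0.023, matching the 2.3% figure; counting both tails would double it.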
Null hypothesis • Assumption that there is no difference between groups or no relationship between variables • Example – Male and female college students do similar amounts of music downloading using BitTorrent. • Example – School use of computers is unrelated to income of the students’ families
Levels of significance • 5 percent level – Event could occur by chance only 5 times in 100 • 1 percent level – Event could occur by chance only 1 time in 100 • Significance level should be chosen before doing experiment
Types of errors • Type I error – Rejection of a true null hypothesis • Type II error – Acceptance of a false null hypothesis (failing to reject it) • For a fixed sample size, decreasing one type increases the other
One and two tailed tests • One tailed test – The null hypothesis is rejected only for experimental values in one direction • Two tailed test – Values could occur on either the positive or negative tail of the curve
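The difference between the two kinds of test shows up when the same result is judged both ways at the 5 percent level. This sketch assumes the test statistic is a standard score from a normal distribution; the observed value `z = 1.8` and the function names are made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def p_one_tailed(z):
    """Chance of a value at least this far out in the upper tail."""
    return 1 - std_normal.cdf(z)

def p_two_tailed(z):
    """Chance of a value at least this far out in either tail."""
    return 2 * (1 - std_normal.cdf(abs(z)))

alpha = 0.05  # 5 percent significance level, chosen before the experiment

# Critical values: how extreme z must be to reject the null hypothesis.
crit_one = std_normal.inv_cdf(1 - alpha)       # about 1.645
crit_two = std_normal.inv_cdf(1 - alpha / 2)   # about 1.960

z = 1.8  # hypothetical observed standard score
reject_one = p_one_tailed(z) < alpha
reject_two = p_two_tailed(z) < alpha

print(round(crit_one, 3), round(crit_two, 3), reject_one, reject_two)
```

With these numbers the one tailed test rejects the null hypothesis and the two tailed test does not, because the two tailed test splits the 5 percent between both ends of the curve.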
Estimation • Concerns the magnitude of relationships between variables • Hypothesis testing asks “is there a relationship” • Estimation asks “how large is the relationship” • Confidence interval – range of values that is likely to contain the population mean at a stated confidence level
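A confidence interval for a mean can be sketched as below. This uses the normal approximation (mean plus or minus z times the standard error), which is an assumption made here to stay within the standard library; for a small sample like this one, a t distribution would normally give a somewhat wider interval. The data and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate interval for the population mean (normal approximation)."""
    n = len(sample)
    m = mean(sample)
    standard_error = stdev(sample) / sqrt(n)         # sample sd / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m - z * standard_error, m + z * standard_error

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, confidence=0.99)

print(round(low, 2), round(high, 2))
```

Asking for more confidence (99% instead of 95%) widens the interval around the same sample mean.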
Sequence of activities • Description • Tests of hypotheses • Estimation • Evaluation
Correlation • Quantifiable relationship between two variables • Example – relationship between age and type of computer games played • Example – relationship between family income and speed of home computer connection.
Correlation chart • Two (or more) dimensional table • Variables on the axes, could be intervals • Scattergram – positively correlated values scatter with positive slope, negatively correlated values with negative slope
Product-moment coefficient • Formula based on deviations from means • If the paired deviations have the same sign and similar size, values are positively correlated • If the paired deviations have opposite signs, values are negatively correlated • Most correlations are somewhere in between +1 and -1
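The coefficient can be computed directly from the deviations. A minimal sketch: the function name `pearson_r` and the rank data are illustrative, and the formula used is the usual sum of products of deviations divided by the square root of the product of the summed squared deviations.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations from the means."""
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]   # deviations of X from its mean
    dy = [y - my for y in ys]   # deviations of Y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical ranks of four items on variable X.
x_ranks = [1, 2, 3, 4]

r_pos = pearson_r(x_ranks, [1, 2, 3, 4])   # same order on Y
r_neg = pearson_r(x_ranks, [4, 3, 2, 1])   # reversed order on Y
r_mid = pearson_r(x_ranks, [2, 1, 4, 3])   # partly matching order

print(r_pos, r_neg, r_mid)
```

Identical orderings give +1, reversed orderings give -1, and the partly matching ordering falls in between.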
Perfect positive correlation: r = +1 (figure: items A, B, C, D plotted on X and Y axes)
Perfect negative correlation: r = -1 (figure: items A, B, C, D plotted on X and Y axes)