Math 15: Introduction to Scientific Data Analysis. Lecture 4: Descriptive Statistics. UC Merced
Course Lecture Schedule • There is a quiz today!
http://ccb.ucmerced.edu/home.php?id=jobs
How to display data • There are some steps missing!
Histogram • A histogram is a graphical display of tabulated frequencies. • Frequency • The number of observations in a given statistical category (or group). • The total count of outcomes in a class (bin or group). • A grouping is called a class (or bin). • Data are often summarized with 5 to 15 classes (bins). • Bin width = (Range of data)/(# of bins)
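Example: Let's Make a Histogram by Hand! Data: 62, 66, 74, 51, 73, 71 • Range: (Max – Min) = 74 – 51 = 23 • Bin width: (Range)/(# of bins) = 23/5 = 4.6 • Make a table! How frequently does each group (bin) appear?
Bin | Frequency | Relative frequency
51 to 55.6 | 1 | 0.167
55.6 to 60.2 | 0 | 0.0
60.2 to 64.8 | 1 | 0.167
64.8 to 69.4 | 1 | 0.167
69.4 to 74 | 3 | 0.5
Total | 6 | 1.0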
Data: 62, 66, 74, 51, 73, 71 • Remember these steps to create a histogram!
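The by-hand steps (range, then bin width, then counting per bin) can be sketched in Python. This is a minimal illustration, not the course's Excel workflow. The function name `frequency_table` is hypothetical. It assumes equal-width bins that include their lower edge, with the maximum value placed in the last bin. Excel's FREQUENCY function instead counts values up to and including each upper bin limit, so a value sitting exactly on an edge can land in a different bin. For this data set both conventions give the counts 1, 0, 1, 1, 3.

```python
# Hypothetical helper (not from the lecture): the by-hand histogram steps.
# Assumes equal-width bins that include their lower edge; the maximum
# value is placed in the last bin.


def frequency_table(data, num_bins):
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; the range is 0")
    width = (hi - lo) / num_bins           # bin width = range / # of bins
    counts = [0] * num_bins
    for x in data:
        i = int((x - lo) / width)           # index of the bin x falls into
        counts[min(i, num_bins - 1)] += 1   # the maximum goes in the last bin
    edges = [lo + k * width for k in range(num_bins + 1)]
    rel_freq = [c / len(data) for c in counts]  # relative frequency = count / n
    return edges, counts, rel_freq


data = [62, 66, 74, 51, 73, 71]
edges, counts, rel_freq = frequency_table(data, 5)
for k in range(len(counts)):
    print(f"{edges[k]:.1f} to {edges[k + 1]:.1f}: {counts[k]} ({rel_freq[k]:.3f})")
print("Total:", sum(counts))
```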
Any Questions?
A single research study may generate masses of data. • For example, a comparatively small study that distributes 200 questionnaires with perhaps 20 items each can potentially generate 4,000 items of raw data. • To make sense of these data, they need to be summarized in some way, so that • the reader has an idea of the typical values in the data and how they vary, and • the reader can construct a mental picture (including a chart or graph) of the data and of the people, events, or objects they relate to. • To do this, researchers use descriptive (or summary) statistics: they describe or summarize the data. • Descriptive statistics describe the quantitative nature of the data.
1. Statistics • The science of collecting, organizing, presenting, analyzing, and interpreting numerical data in relation to the decision-making process. • Goals of statistics • Get a “feel” for the data • Types of variables • Descriptive (summary) statistics • Pictorial representation: graphs and charts (we did this in the last lab!)
Basic Statistics Definitions • Population • The totality under study, e.g. • Students attending UC Merced • The US population (300 million) • Fish in Lake Yosemite, etc. • Sample • A subset of a population, e.g. • Students taking Math 15 (a sample of UC Merced students) • Fish caught in Lake Yosemite, etc. • You may need a number of samples to obtain good statistics that reflect the nature of the population!
2. Types of descriptive statistics • All quantitative studies will report some descriptive statistics, as well as frequency tables (histograms). • For example: sample size, maximum and minimum values, averages, and measures of how the data vary about the average. • The two main types of descriptive statistics encountered in research papers are: • Measures of central tendency • Measures of dispersion
Measures of Central Tendency • These statistics measure which values lie at the center of the distribution. • The most common is called the MEAN, or sometimes the AVERAGE (or the EXPECTED VALUE). • The sample mean is the sum of all values divided by the number of observations: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
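As a worked example, applying the sample mean (the sum of all values divided by the number of observations) to the six values from the histogram example:

$$\bar{x} = \frac{62 + 66 + 74 + 51 + 73 + 71}{6} = \frac{397}{6} \approx 66.17$$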
Measures of Central Tendency • The MODE is the most frequently occurring value. • Visually, the mode corresponds to the tallest bar in a histogram. • The MEDIAN is the 50th percentile: half of the values lie above the median and half lie below. • Median example (sorted data, n = 6): 51, 62, 66, 71, 73, 74
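A from-scratch sketch of the median and mode definitions, in Python. The helper names `median` and `modes` are hypothetical. With an even number of values there is no single middle value, so the median is the average of the two middle ones. Returning an empty list when no value repeats is a choice made here; Excel's MODE returns #N/A in that case.

```python
from collections import Counter


def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2    # even n: average of the two middle values


def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    # All values that reach the highest count; if nothing repeats
    # (top == 1), report no mode at all.
    return [v for v, c in counts.items() if c == top] if top > 1 else []


data = [51, 62, 66, 71, 73, 74]
print(median(data))                     # 68.5, i.e. (66 + 71) / 2
print(modes(data))                      # [] because no value repeats
print(modes([11, 12, 13, 13, 14, 15]))  # [13]
```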
Measures of Central Tendency • Mean (Average) • Excel: =AVERAGE(cell range) • Median • Excel: =MEDIAN(cell range) • Mode • Excel: =MODE(cell range)
The choice of which descriptive statistics to report will affect the “picture” that is presented of the data, and there is the potential to mislead. Example: Redmond, WA, where Bill Gates and his family live. The city has about 46,000 people, with a mean income of about $36,000. What would be the effect on the mean and on the median of including Bill Gates (assuming his income is $2.5 billion per year)?
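A rough worked answer, treating the slide's round figures (about 46,000 people, mean income about $36,000, Gates's income $2.5 billion) as exact for illustration, with all amounts in dollars:

$$\text{total income} \approx 46{,}000 \times 36{,}000 = 1.656 \times 10^{9}$$

$$\bar{x}_{\text{with Gates}} \approx \frac{1.656 \times 10^{9} + 2.5 \times 10^{9}}{46{,}001} \approx 90{,}346$$

One extreme income raises the mean from about $36,000 to about $90,000, roughly 2.5 times its old value. The median depends only on the middle of the sorted list, so adding one value at the top moves it by at most one position: it stays essentially unchanged.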
Simple Examples: Mean, Median, Mode • Respiratory rate • Group 1 = (11, 12, 13, 13, 14, 15) • Group 2 = (11, 12, 13, 13, 14, 39) • The mean (average) is more susceptible to extreme values: it is 13 for Group 1 but 17 for Group 2. • The median is 13 in both groups: the value that divides the data 50:50. • The mode is 13 in both groups: the most common value.
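The same comparison can be checked with Python's standard `statistics` module, whose `mean`, `median`, and `mode` play roughly the role of Excel's AVERAGE, MEDIAN, and MODE. This is offered as an alternative to the Excel functions, not as part of the course materials. One difference to keep in mind: when no value repeats, `statistics.mode` (Python 3.8 and later) returns the first value, whereas Excel's MODE reports #N/A.

```python
import statistics

group1 = [11, 12, 13, 13, 14, 15]
group2 = [11, 12, 13, 13, 14, 39]  # identical except for one extreme value

for name, g in (("Group 1", group1), ("Group 2", group2)):
    # The extreme value 39 pulls the mean up; the median and mode do not move.
    print(name, "mean:", statistics.mean(g),
          "median:", statistics.median(g),
          "mode:", statistics.mode(g))
```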
Example Question: What is the average or median of these 400 data points?
Excel can do this for you! [Figure: the mode and the average of the data marked]
Any Questions?
3. Measures of dispersion or variability • These measure how “spread out” the data are around the center. • One measure of variability is the RANGE. • This is simply the maximum value minus the minimum value.
Measures of dispersion or variability • RANGE • This is simply the maximum value minus the minimum value. • The range is highly susceptible to extreme values. • There is no RANGE function in Excel: use =MAX(cell range)-MIN(cell range) • [Figure: a distribution with the mode, average, and range marked]
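Because the range uses only the two most extreme values, a single outlier can change it dramatically. Here is a minimal Python sketch. The name `data_range` is hypothetical. It reuses the respiratory-rate groups from the mean/median/mode example, where the two groups differ only in one value.

```python
def data_range(values):
    # Same idea as =MAX(cell range)-MIN(cell range) in Excel
    return max(values) - min(values)


# Respiratory-rate groups from the earlier example
group1 = [11, 12, 13, 13, 14, 15]
group2 = [11, 12, 13, 13, 14, 39]
print(data_range(group1))  # 4
print(data_range(group2))  # 28
```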
Measures of dispersion or variability • The most common measures of variability are the STANDARD DEVIATION and the VARIANCE. • The variance is the standard deviation squared; equivalently, the standard deviation is the square root of the variance. • Personally, I prefer to think in terms of standard deviations, because a standard deviation represents the “typical” (or “standard”) deviation of values from the mean and is in the same units as the data.
Measures of dispersion or variability • The sample variance is the sum of squared deviations from the mean divided by the number of observations minus 1: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where the numerator is the sum of the squares of the distances from the mean. • The sample standard deviation $s$ is simply the square root of the sample variance: $s = \sqrt{s^2}$.
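A direct translation of the sample variance formula into Python, checked against the standard library's `statistics.variance` and `statistics.stdev` (which also divide by n − 1). The helper names `sample_variance` and `sample_std` are hypothetical. The data are the six values from the histogram example.

```python
import math
import statistics


def sample_variance(values):
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared distances from the mean
    return ss / (n - 1)                        # divide by n - 1 for a sample


def sample_std(values):
    return math.sqrt(sample_variance(values))  # s = square root of s^2


data = [62, 66, 74, 51, 73, 71]
print(sample_variance(data), statistics.variance(data))
print(sample_std(data), statistics.stdev(data))
```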
Measures of dispersion or variability • Variance or Standard Deviation • [Figure: two distributions, each with its mode and average marked] • The one on the left is more dispersed than the one on the right: it has a higher variance.
Measures of Dispersion • Range: • Excel: =MIN(cell range) and =MAX(cell range) give the minimum and maximum values • Range: =MAX(cell range)-MIN(cell range) • Sample Standard Deviation and Sample Variance: • Excel: =STDEV(cell range) or =VAR(cell range)
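For readers working outside Excel, here is a suggested mapping to Python's `statistics` module. Excel's STDEV and VAR (also available as STDEV.S and VAR.S) are the sample versions that divide by n − 1, matching `statistics.stdev` and `statistics.variance`. The population versions STDEV.P and VAR.P correspond to `statistics.pstdev` and `statistics.pvariance`, which divide by n. The data are Group 1 of the respiratory-rate example.

```python
import statistics

values = [11, 12, 13, 13, 14, 15]  # Group 1 respiratory rates

value_range = max(values) - min(values)  # like =MAX(...)-MIN(...)
s = statistics.stdev(values)             # like =STDEV(...): sample SD, divides by n - 1
s2 = statistics.variance(values)         # like =VAR(...): sample variance, divides by n - 1
sigma2 = statistics.pvariance(values)    # like =VAR.P(...): population variance, divides by n
print(value_range, s2, sigma2, s)
```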
4. Shape of Curve
4. Measures of Shape • Skewness is a measure of the lack of symmetry in a distribution, i.e. whether the distribution is skewed to the left or to the right. • “Positive skewness”: values clustered toward the lower range, with a long tail extending to the upper range. • “Negative skewness”: values clustered toward the upper range, with a long tail extending to the lower range. • If you are interested, the formula for the skewness is:
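One widely used version is the adjusted sample skewness that Microsoft documents for Excel's SKEW function. It is written in terms of the sample mean $\bar{x}$ and the sample standard deviation $s$:

$$\text{skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$$

Other textbooks use slightly different normalizations. All of them are positive for a long upper tail and negative for a long lower tail.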
Illustrating the notion of skewness • [Figure: a positively skewed and a negatively skewed distribution, each with its mode and average marked] • Both have the same average and variance.
Measures of Shape • Skewness • Excel: =SKEW(cell range)
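Here is a minimal Python sketch of the adjusted sample skewness that Microsoft documents for Excel's SKEW function, n/((n − 1)(n − 2)) · Σ((xᵢ − x̄)/s)³. The function name `excel_style_skew` is hypothetical. It assumes at least three observations that are not all equal. The example data reuse the respiratory-rate groups: Group 1 is symmetric, while Group 2 has one long upper tail.

```python
import statistics


def excel_style_skew(values):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)**3).

    Assumes n >= 3 and that the values are not all equal (s > 0).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


print(excel_style_skew([11, 12, 13, 13, 14, 15]))  # symmetric data: about zero
print(excel_style_skew([11, 12, 13, 13, 14, 39]))  # long upper tail: positive
```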
Any Questions?
One more Example: How to make an analysis? • Creating wheelchair lanes in the sidewalk
Important Announcements