Statistics Descriptive Mathematics
Introductory Video • https://www.youtube.com/watch?v=edEXEyvG4Wk • https://www.youtube.com/watch?v=84H8HNV9mk0
Data vs. Statistics Data is: information that is gathered from a population (or a sample of the population). Statistics are: numbers that give information about a population (or a sample of the population). Examples of statistics are: mean, median, mode, range, variance, and standard deviation.
Descriptive Statistics • Descriptive statistics includes the statistical procedures that we use to describe the population we are studying. The data could be collected from either a sample or a population, but the results help us organize and describe the data. Descriptive statistics can only be used to describe the group that is being studied. That is, the results cannot be generalized to any larger group. • Descriptive statistics are useful if you do not need to extend your results to any larger group. However, much of the social sciences involves studies that aim to establish “universal” truths about segments of the population, such as all parents, all women, all victims, etc. • Frequency distributions, measures of central tendency (mean, median, and mode), and graphs like pie charts and bar charts that describe the data are all examples of descriptive statistics.
Inferential Statistics • Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample. That is, we can take the results of an analysis using a sample and generalize them to the larger population that the sample represents. In order to do this, however, it is imperative that the sample is representative of the group to which it is being generalized. • To address this issue of generalization, we have tests of significance. A chi-square test or a t-test, for example, can tell us how likely it is that results like ours would appear in a sample purely by chance. In other words, these tests of significance tell us the probability of obtaining results at least as strong as those in our sample if there were no relationship at all between the variables in the population we studied. • Examples of inferential statistics include linear regression analyses, logistic regression analyses, ANOVA, correlation analyses, structural equation modeling, and survival analysis, to name a few.
Difference between these two • As seen above, descriptive statistics is concerned with describing certain features of a data set. Although this is helpful in learning things such as the spread and center of the data we are studying, nothing in the area of descriptive statistics can be used to make any sort of generalization. In descriptive statistics, measurements such as the mean and standard deviation are stated as exact numbers. However much we use descriptive statistics to examine a sample, this branch of statistics does not allow us to say anything about the population. • Inferential statistics differs from descriptive statistics in several ways. Even though there are similar calculations, such as those for the mean and standard deviation, the focus is different for inferential statistics. Inferential statistics starts with a sample and then generalizes to a population. Conclusions about a population are usually not stated as a single exact number. Instead, we express these parameters as a range of plausible values, along with a degree of confidence. • It is important to know the difference between descriptive and inferential statistics. This knowledge is helpful when we need to apply statistical methods to a real-world situation.
Population • A population is the total set of individuals, groups, objects, or events that the researcher is studying. • For example, if we were studying employment patterns of recent U.S. college graduates, our population would likely be defined as every college student who graduated within the past year from any college across the United States.
The unit of population is whatever you are counting: there can be a population of people, a population of households, a population of events, institutions, transactions, and so forth. Anything you can count can be a population unit. • But if you can't get information from it, and you can't measure it in some way, it's not a unit of population that is suitable for survey research.
Sample • To study the population, we select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Data are the result of sampling from a population. • A sample is a relatively small subset of people, objects, groups, or events, that is selected from the population. Instead of surveying every recent college graduate in the United States, which would cost a great deal of time and money, we could instead select a sample of recent graduates, which would then be used to generalize the findings to the larger population.
Survey research is based on sampling, which involves getting information from only some members of the population.
Why is there a need for SAMPLING when we have the POPULATION?
Question Time! • A proposal before a state's legislature would increase the gasoline tax. The additional funds would be used to improve the state's roads. Some state legislators are concerned about how the voters view this proposal. To gain this information, a pollster randomly selects 1,009 registered voters in the state and asks each whether or not he or she favors the additional tax for the designated purpose. Describe the population and sample.
Answer • The population is all registered voters in the state. The sample is made up of the 1,009 registered voters who were polled.
Simple random sample • Before we discuss random sampling, you need to be clear about the exact meaning of "random." • In common speech, it means "anything will do", but the meaning used in statistics is much more precise: a person is chosen at random from a population when every member of that population has the same chance of being sampled. • If some people have a higher chance than others, the selection is not random. To maximize accuracy, surveys conducted on scientific principles always use random samples.
Imagine a complete list of the population, with one line for every member: for example, a list of 1500 members of an organization, numbered from 1 up to 1500. Suppose you want to survey 100 of them. To draw a simple random sample, choose 100 different random numbers, between 1 and 1500. Any member whose number is chosen will be surveyed. If the same number comes up twice, the second occurrence is ignored, as nobody will be surveyed more than once.
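The numbered-list procedure above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample` and the fixed seed are assumptions added so the run is repeatable. Each draw gives every member number the same chance, and a number that comes up a second time is simply skipped.

```python
import random
from typing import List, Optional


def simple_random_sample(population_size: int, sample_size: int,
                         seed: Optional[int] = None) -> List[int]:
    """Pick sample_size distinct member numbers from 1..population_size."""
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    rng = random.Random(seed)
    chosen: List[int] = []
    already_chosen = set()
    while len(chosen) < sample_size:
        # Every number from 1 to population_size is equally likely on each draw.
        number = rng.randint(1, population_size)
        if number in already_chosen:
            continue  # a repeat is ignored: nobody is surveyed twice
        already_chosen.add(number)
        chosen.append(number)
    return chosen


# Survey 100 of the 1500 members; the seed only makes the run repeatable.
members_to_survey = simple_random_sample(1500, 100, seed=2024)
print(sorted(members_to_survey))
```

Python's standard library can also do this in one call: `random.sample(range(1, 1501), 100)` draws 100 different numbers without replacement, which gives the same kind of simple random sample.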
Sample or Census? • A sample involves looking only at some items selected from the population, but a census is an examination of all items in a defined population. The accuracy of a census can be illusory. • For example, the U.S. decennial census cannot locate every individual in the United States (the 1990 census is thought to have missed 8 million people, while the 2000 census is believed to have overcounted 1.3 million people). • Reasons include the extreme mobility of the U.S. population and the fact that some people do not want to be found (e.g., illegal immigrants) or do not reply to the mailed census form. Further, budget constraints make it difficult to train enough census field workers, install data safeguards, and track down incomplete responses or nonresponses. For these reasons, U.S. censuses have long used sampling in certain situations. • Many statistical experts advised using sampling more extensively in the 2000 decennial census, but the U.S. Congress decided that an actual headcount must be attempted. • When the quantity being measured is volatile, there cannot be a census. • For example, the Arbitron Company tracks American radio listening habits using over 2.6 million “Radio Diary Packages.” For each “listening occasion,” participants note start and stop times for each station. Panelists also report their age, sex, and other demographic information. Table 2.5 outlines some situations where a sample rather than a census would be preferred, and vice versa.
CENSUS • A census is a method which involves collecting data about every individual in a whole population. • The individuals in a population may be people or objects. • A census is detailed and accurate, but is expensive, time consuming and often impractical.
Parameters Vs. Statistic • A parameter is a numerical measure of a population. • A statistic is a numerical measure of a sample. • For a given population, a parameter is fixed, while the value of a statistic may vary from sample to sample.
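To see why a parameter is fixed while a statistic varies, here is a small simulation sketch. The population is entirely made up (1,000 random scores from 0 to 100, chosen only for illustration), and the sample size of 30 and the seed are arbitrary assumptions. The population mean is computed once; each random sample produces its own sample mean.

```python
import random
import statistics

# Hypothetical population: 1,000 made-up scores between 0 and 100 (illustration only).
rng = random.Random(42)
population = [rng.randint(0, 100) for _ in range(1000)]

# The parameter uses the whole population, so it has one fixed value.
parameter = statistics.mean(population)

# Each statistic uses one random sample of 30, so its value changes from sample to sample.
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(5)]

print(f"Population mean (parameter): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean (statistic): {m:.2f}")
```

Running it shows five sample means scattered around the single population mean; rerunning with a different seed changes the samples but not the way the parameter is defined.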
Examples • If we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. • Since we consider all math classes to be the population, the average number of points earned per student over all the math classes is an example of a parameter.
List the population, sample, parameter, statistic, variable, and data. • Determine what the key terms refer to in the following study. We want to know the average (mean) amount of money first year college students spend at ABC College on school supplies that do not include books. We randomly survey 100 first year students at the college. Three of those students spent $150, $200, and $225, respectively.
Solution • The population is all first year students attending ABC College this term. • The sample is the 100 first year students at ABC College who were randomly surveyed. • The parameter is the average (mean) amount of money spent (excluding books) by first year college students at ABC College this term. • The statistic is the average (mean) amount of money spent (excluding books) by first year college students in the sample. • The variable could be the amount of money spent (excluding books) by one first year student. Let X = the amount of money spent (excluding books) by one first year student attending ABC College. • The data are the dollar amounts spent by the first year students. Examples of the data are $150, $200, and $225.
TRY IT • Determine what the key terms refer to in the following study. We want to know the average (mean) amount of money spent on school uniforms each year by families with children at Knoll Academy. We randomly survey 100 families with children in the school. Three of the families spent $65, $75, and $95, respectively.
Variable • A variable is a characteristic that changes or varies over time and/or for different objects under consideration. • A variable, notated by capital letters such as X and Y, is a characteristic of interest for each person or thing in a population. • E.g. If you are measuring the height of adults in a certain area, the height is a variable that changes with time for an individual and from person to person.
Numerical (Quantitative) Data • They are numeric. • They represent a measurable quantity. • Quantitative data are the result of counting or measuring attributes of a population. • E.g. heights of students at school, pulse rate, weight, number of people living in your town, number of students who are in Math HL or SL.
TRY IT • The data are the number of machines in a gym. You sample five gyms. One gym has 12 machines, one gym has 15 machines, one gym has 10 machines, one gym has 22 machines, and the other gym has 20 machines. What type of data is this? • The data are the areas of lawns in square feet. You sample five houses. The areas of the lawns are 144 sq. feet, 160 sq. feet, 190 sq. feet, 180 sq. feet, and 210 sq. feet. What type of data is this?
Categorical (Qualitative) Data • Qualitative data are the result of categorizing or describing attributes of a population. • They measure a quality or characteristic. • E.g. Hair color, blood type, the color of a ball (red, blue, black, etc.)
TRY IT • You go to the supermarket and purchase three cans of soup (19 ounces tomato bisque, 14.1 ounces lentil, and 19 ounces Italian wedding), two packages of nuts (walnuts and peanuts), four different kinds of vegetables (broccoli, cauliflower, spinach, and carrots), and two desserts (16 ounces Cherry Garcia ice cream and two pounds (32 ounces) of chocolate chip cookies). • Name data sets that are quantitative discrete, quantitative continuous, and qualitative.
Solution • The three cans of soup, two packages of nuts, four kinds of vegetables and two desserts are quantitative discrete data because you count them. • The weights of the soups (19 ounces, 14.1 ounces, 19 ounces) are quantitative continuous data because you measure weights as precisely as possible. • Types of soups, nuts, vegetables and desserts are qualitative data because they are categorical.
NOTE • You may collect data as numbers and report it categorically. • For example, the quiz scores for each student are recorded throughout the term. • At the end of the term, the quiz scores are reported as A, B, C, D, or F.
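Turning numeric scores into categories can be sketched as a simple lookup. The cutoffs below (90 and above is A, 80 and above is B, 70 and above is C, 60 and above is D, otherwise F) and the five quiz scores are assumptions for illustration only; a real course would use its own grading scale.

```python
def letter_grade(score: float) -> str:
    """Map a numeric score to a letter grade using hypothetical cutoffs."""
    # Assumed cutoffs: 90+ A, 80+ B, 70+ C, 60+ D, below 60 F.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


quiz_scores = [95, 82, 67, 74, 55]  # hypothetical quantitative data
grades = [letter_grade(s) for s in quiz_scores]  # the same data reported categorically
print(grades)  # ['A', 'B', 'D', 'C', 'F']
```

The scores are quantitative, while the letter grades derived from them are qualitative (categorical) data.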
Univariate Vs. Bivariate Data • Statistical data are often classified according to the number of variables being studied. • Univariate data. • When we conduct a study that looks at only one variable, we say that we are working with univariate data. • Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.
Bivariate Data • When we conduct a study that examines the relationship between two variables, we are working with bivariate data. • Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.
Representation of data • When data is first collected, there are some simple ways of beginning to organize the data. • Data in row form • Data in ordered array form, from smallest to largest or vice versa. • Stem-and-leaf display • The most useful are the frequency distribution and the histogram.
Key terms involved with frequency distributions • Frequency - It is the number of times a particular data point occurs in the set of data. • Frequency distribution - It is a table that lists each data point and its frequency. • Relative frequency - It is the frequency of a data point expressed as a fraction (or percentage) of the total number of data points.
Example • Consider the data set • 1, 3, 1, 2, 4, 2, 4, 1, 5, 3, 1, 3, 2, 2, 4, 1, 3, 4, 1, 2, 3, 2, 4, 1, 3, 2, 1, 2, 5, 2 • Represent the above data in a frequency table form.
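One way to build the frequency table for this data set is to count each value and divide by the total. The sketch below uses Python's `collections.Counter`; it reads the stray empty entry in the original list (", ,1") as a typo, so the data set has 30 values.

```python
from collections import Counter

data = [1, 3, 1, 2, 4, 2, 4, 1, 5, 3,
        1, 3, 2, 2, 4, 1, 3, 4, 1, 2,
        3, 2, 4, 1, 3, 2, 1, 2, 5, 2]

frequencies = Counter(data)  # value -> number of times it occurs
total = len(data)

print(f"{'Value':>5} {'Frequency':>9} {'Relative frequency':>18}")
for value in sorted(frequencies):
    freq = frequencies[value]
    # Relative frequency = frequency / total number of data points.
    print(f"{value:>5} {freq:>9} {freq / total:>18.3f}")
print(f"{'Total':>5} {total:>9} {1:>18.3f}")
```

For this data set the frequencies are 1: 8, 2: 9, 3: 6, 4: 5, and 5: 2, for a total of 30, so the relative frequencies are about 0.267, 0.300, 0.200, 0.167, and 0.067.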
Histograms • A histogram consists of continuous (adjoining) boxes. It has both a horizontal axis and a vertical axis. • The horizontal axis is labelled with what the data represents (for instance, distance from your home to school). The vertical axis is labelled either frequency or relative frequency (or percent frequency or probability). The graph will have the same shape with either label. • The histogram (like the stem plot) can give you the shape of the data, the centre, and the spread of the data.
When data is recorded for a continuous variable, there are likely to be many different values. • We organize the data in a frequency table by grouping it into class intervals, usually of equal width.
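Grouping continuous data into equal-width class intervals can be sketched as follows, using the lawn areas from the earlier TRY IT slide. The starting point of 140 sq. feet, the width of 20 sq. feet, and the convention that each interval includes its lower limit but not its upper limit are assumptions chosen for this example. The resulting counts are the heights of the bars in a frequency histogram.

```python
from typing import List, Tuple


def group_into_classes(values: List[float], start: float, width: float,
                       num_classes: int) -> List[Tuple[Tuple[float, float], int]]:
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for v in values:
        k = int((v - start) // width)  # index of the class interval v falls into
        if not 0 <= k < num_classes:
            raise ValueError(f"{v} falls outside the chosen class intervals")
        counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), counts[k])
            for k in range(num_classes)]


lawn_areas = [144, 160, 190, 180, 210]  # sq. feet, from the TRY IT slide
table = group_into_classes(lawn_areas, start=140, width=20, num_classes=4)
for (low, high), count in table:
    print(f"{low} to under {high} sq. feet: {count}")
```

With these choices the class counts are 1, 1, 2, and 1; a different starting point or width would change the table and the shape of the histogram.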
Some similar features of Column graphs and Frequency Histograms
Class limits, boundaries, and intervals • http://wizznotes.com/mathematics/statistics/class-limits-boundaries-and-intervals
Some extra knowledge • http://www.umsl.edu/~lindquists/sample.html