The discipline of statistics provides methods for organizing and summarizing data and for drawing conclusions based on information contained in data. Examples of its use: Politics: election polls, economic forecasting. Business: stock market trends, market share prediction, customer satisfaction evaluation. Engineering: quality control, product reliability. Science: testing scientific hypotheses (a wide range of applications, depending on the field). Medicine: licensing of new medicines, studying the association between smoking and lung cancer.
The goal of statistics is usually to draw conclusions and make decisions about some collection of objects constituting a population of interest. There are two types of populations. Concrete, well-defined populations: for example, the group of all graduates of the engineering program of UI in 2008. We already know all these graduates; they passed their exams and were given a degree. Conceptual or hypothetical populations: for example, the group of all graduates of the engineering program of UI. Some individuals have graduated and some will graduate in the future. The latter are currently unknown to us.
The branch of statistics in which we draw a conclusion or make a decision about a population is called inferential statistics. Inferential statistics involves taking a sample from the population of interest and using that sample to draw conclusions about the population. We take a sample because the population is too big to study as a whole or because it is impossible to study all of its members (as in conceptual populations). If an entire population can be studied, we have a census, and statisticians are out of business.
Samples should be collected in an objective manner to avoid erroneous conclusions (that is, to avoid biasing our inference). Samples need to be representative of the population of interest and reflect its composition. Random samples are usually taken to accomplish this goal. An example of biased inference occurs when we sample from one population and make inferences about another. For example, to study the distribution of weights of newborns in the US (target population) we can't just sample from Idaho (sampled population); the composition of the Idaho population might be different from that of the US population (for example, fewer African Americans and Asian Americans live in Idaho than in other areas).
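The effect of sampling from the wrong population can be illustrated with a small simulation. This is only a sketch: the two regions, their sizes, and the weight distributions are made-up assumptions chosen for illustration, not real data.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical target population made of two regions whose newborn
# weights (kg) follow different distributions. All numbers are invented.
region_a = [random.gauss(3.5, 0.5) for _ in range(9000)]
region_b = [random.gauss(3.1, 0.5) for _ in range(1000)]
population = region_a + region_b

population_mean = sum(population) / len(population)

# Simple random sample drawn from the whole target population.
random_sample = random.sample(population, 200)
random_mean = sum(random_sample) / len(random_sample)

# Biased sample: drawn only from region B (the "sampled population").
biased_sample = random.sample(region_b, 200)
biased_mean = sum(biased_sample) / len(biased_sample)

print(f"population mean:    {population_mean:.3f}")
print(f"random-sample mean: {random_mean:.3f}")
print(f"biased-sample mean: {biased_mean:.3f}")
```

Under these assumptions the mean of the random sample should land close to the population mean, while the mean of the sample taken from a single region is pulled toward that region's own average.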
Another branch of statistics, in which we are only interested in summarizing the collected sample, is called descriptive statistics. Descriptive statistics provides tools that can also give insight into the behavior of the population. It includes graphical tools (such as histograms and stem-and-leaf diagrams) and numerical summaries (such as sample means and sample variances).
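As a minimal sketch of the numerical summaries mentioned above, the sample mean and sample variance can be computed with Python's standard `statistics` module. The weights below are hypothetical values invented for the example.

```python
import statistics

# Hypothetical sample of newborn weights in kilograms (made-up values).
weights = [3.2, 3.5, 2.9, 4.1, 3.8, 3.3, 3.0, 3.6]

sample_mean = statistics.mean(weights)
# statistics.variance is the sample variance (it divides by n - 1).
sample_variance = statistics.variance(weights)
# The sample standard deviation is the square root of the sample variance.
sample_sd = statistics.stdev(weights)

print(f"n = {len(weights)}")
print(f"sample mean = {sample_mean:.3f}")
print(f"sample variance = {sample_variance:.3f}")
print(f"sample standard deviation = {sample_sd:.3f}")
```

Note that `statistics.variance` divides by n - 1, which is the usual convention for a sample; `statistics.pvariance` divides by n and is meant for a complete population.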
In both inferential and descriptive statistics we are interested in studying one or more characteristics (variables) of a population. For example: The grade point average of UI engineering graduates in 2008 (univariate: one variable of interest). The weight and height of newborn babies in the year 2008 (bivariate: two variables of interest). The grade point average, height, weight, and gender of UI engineering graduates in the year 2008 (multivariate: multiple variables of interest).
Variables can be categorical (such as gender [M, F]) or numerical (such as weight, height, or grade point average). We usually refer to these variables using letters from the end of the alphabet (for example, x can refer to weight, y to height, z to grade point average, and so on). Numerical variables can also be discrete or continuous (we'll come to that).
Probability is the underlying area of mathematics that provides the theoretical framework for statistics. Probability assumes perfect knowledge about the population of interest and uses this knowledge to study and draw conclusions about a sample. That is, in probability we study the chance of obtaining a particular sample from a population about which we know everything we need to know (deductive reasoning). In statistics, on the other hand, we are given a sample and are asked to draw a conclusion about the population (inductive reasoning).
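[Figure: diagram relating Population and Sample. Probability leads from the population to the sample; Inferential Statistics leads from the sample back to the population.]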
Example: Say that the population is the set of outcomes of flipping a coin a large number of times. Probability point of view: if we know that the coin is fair (that is, there is a 50/50 chance that we will observe a head on each coin flip), what is the chance that we will observe 60 heads and 40 tails in a sample of size 100? Inferential statistics point of view: if we observe 60 heads and 40 tails in a sample of size 100, is it justified to say that the coin is fair? Or does the sample give evidence against this conclusion?
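Both points of view in the coin example can be sketched numerically. The probability question is answered with the binomial formula. For the inferential question, the sketch uses a two-sided tail probability (a p-value), which is one common approach and an assumption here, not something the example itself specifies. The helper name `binom_pmf` is introduced only for this illustration.

```python
from math import comb

n = 100   # number of coin flips in the sample
p = 0.5   # probability of heads for a fair coin

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability point of view: the coin is known to be fair;
# what is the chance of exactly 60 heads and 40 tails?
prob_exactly_60 = binom_pmf(60, n, p)

# Inferential point of view (one possible approach): if the coin were
# fair, how likely is a result at least as far from 50 heads as the
# observed 60, in either direction?
p_value = sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k - 50) >= 10)

print(f"P(exactly 60 heads | fair coin) = {prob_exactly_60:.4f}")
print(f"P(result at least this extreme | fair coin) = {p_value:.4f}")
```

The first number is roughly 0.011 and the second roughly 0.057. How small the second number must be before we doubt the fairness of the coin is a question for inferential statistics.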
We will progress in this class as follows: We will study descriptive statistics in the next few lectures. After that, we will focus on probability theory for a while. Then we will come back and use what we know from probability and what we learned in descriptive statistics to do inferential statistics.