100 likes | 150 Views
Here are some additional methods for describing data. 0 10 100 Here I have the number line. We start on the left at zero and as we move to the right we increase in value. (We can move to the left of zero into the negative numbers.)
E N D
0 10 100 Here I have the number line. We start on the left at zero and as we move to the right we increase in value. (We can move to the left of zero into the negative numbers.) Now, if we want a number line vertically we tip it from the 100 side up, as my arrow shows. This is the normal convention in all of math. EXCEPT for the stem-and-leaf plot or display. In this one area we tip the line up from the zero side. As an example of some data say 50 people take a test that has up to 150 points. Possible values on the test are 70, 120, 132, 79 and so on.
With the 50 people who took the test, let’s imagine we sort the data from low score to high score. In other words, we might arrange the data so that the lowest score is first, and then the next highest and so on. The last score is the highest. When we have a 3-digit number, like with test score, we have in general xxy. I put xxy because we will call the xx part the stem and y the leaf. The number 69, for example would be stem 06, or 6, and leaf 9. 123 would be steam 12 and leaf 3. Say we had two people on the test with score 69 and 68. In sorted order we would have 68, 69. In stem-and-leaf form we would have stem 6 and leafs 8 and 9. On the next slide I will have the beginning of a stem-and-leaf display that is in the book on page 53.
9 • 1 2 2 2 4 5 5 6 7 8 8 8 6 7 8 9 10 11 12 13 Here I have stem and leaf for the scores in the 60’s and the 90’s. Not the frequency of scores in the 60’s was two (two leafs) and the frequency of scores in the 90’s was 12. The stem-and-leaf is like a histogram, but the leafs of each number are used in a stack to the side of the stem. Longer stacks represent more frequent groups.
Crosstabulations, or crosstabs for short. Sometimes in statistics we will want to work with two variables at a time. The reason is that we think the two variables are somehow related. As an example, what do you think is the relationship between student grades and the number of classes they skip in a semester. My hunch would be the more classes skipped the lower the grade. But we may want to research this idea. Crosstabs is a tabular summary of two variables. On the next screen I create a basic crosstabs based on an example of how an “expert” rates a restaurant quality and the price of meals in the restaurant. 300 restaurants were visited. Notice the expert had to rate each restaurant as being good, very good, or excellent. Plus the expert had to note the meal price.
Here I have a frequency crosstab. Note that 42 restaurants were rated good and had prices in the $10-19 range. The “total” row is really the frequency distribution for the variable Meal Price, and the “total” column is the frequency distribution for the variable quality. In this table, we could put percentages in several formats.
Percentages in the cells We call each part of the table a cell and depending on what we want to think about we can calculate different percentages. Column percentages When we look at a given column we have a meal price and each row is a quality rating. If we use a column total as the basis of a percentage then we see percents of quality rating at each price range. Row percentages When we look at a given row we have a quality rating and each column is a price range. If we use the row total as the basis for a percentage then we see the percents of price range for a given quality.
Overall percentages Since we had 300 in the data, each cell divided by 300 tells us the percent of times that cell came up. Scatter Diagram Say we talk to people and we ask them their years of schooling and income last year. For each person we would have two values. On the next screen I show a scatter plot, where each person is a dot in the graph.
Scatterplot y - income note here in the scatterplot that in general the higher the schooling, the higher the income. Thus, knowing schooling will permit better prediction of income. x - years of schooling In a graph we put the variable of interest on the y axis. Here it is thought that knowing the years of schooling for a person will better help us understand income and schooling is put on the x axis. In other words, certain values of income are ‘matched’ with schooling amounts.
In a scatterplot, when the dots seem to be going up hill we say there is a positive, or direct, relationship between the variables. This means that the higher the value on one variable, the higher the value of the other variable. Dots going down hill suggest a negative, or indirect, relationship. This means as the value of one variable is getter higher the value on the other is getting lower. y y y x x x Positive relationship Negative relationship No relationship?