Transforming data: Some very valuable tools. S-012.
Transforming scores: Shifting scales can be a big help
Some common transformations
• Proportions or percentages
• Rank order
• The Z transformation (standardizing)
• Square root
• Logarithm
1. Raw scores to proportions or percentages Probably the most common transformation. We do this all the time.
1. Raw scores to proportions or percentages: Another example
Analyzing conversations at dinner tables.
• Recordings of conversations
• Adjust for length of conversation
• Proportion of turns, or proportion of utterances
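A minimal sketch of the adjustment for conversation length. The family names and turn counts are hypothetical (they are not from the slides); the point is only that dividing by the total turns raw counts into comparable proportions.

```python
# Hypothetical turn counts for three dinner-table recordings (not from the source).
turns = {
    "family_a": {"child": 30, "total": 120},
    "family_b": {"child": 45, "total": 300},
    "family_c": {"child": 12, "total": 40},
}


def proportion(part, whole):
    """Return part / whole as a proportion between 0 and 1."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Share of all turns taken by the child in each recording.
child_share = {name: proportion(c["child"], c["total"]) for name, c in turns.items()}

for name, share in child_share.items():
    print(f"{name}: {share:.2f} ({share:.0%} of turns)")
```

In this made-up data, family_b has the most child turns as a raw count (45) but the smallest share (15%), because its conversation was the longest.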
2. Transforming to ranks Example: Grade 4 students reading. Number of pages reported in one week.
2. Transforming to ranks Example: Grade 4 students reading. Number of pages reported in one week. (Stata likes to rank from lowest to highest.) Ranking preserves the order, but it ignores the distances between the scores. Ranking is a very common and very useful transformation.
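A small sketch of ranking from lowest to highest, written in Python rather than Stata. The page counts are hypothetical, and giving tied scores the average of their ranks is an assumption of this sketch; the slides do not say how ties are handled.

```python
def rank_lowest_first(scores):
    """Rank scores from lowest (rank 1) to highest; tied scores share the average rank."""
    ordered = sorted(scores)
    # Collect the 1-based positions that each distinct score occupies.
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    # A tied score gets the mean of the positions it covers.
    average_rank = {score: sum(p) / len(p) for score, p in positions.items()}
    return [average_rank[score] for score in scores]


# Hypothetical pages read in one week by six students (not the slide's data).
pages = [12, 40, 7, 40, 150, 25]
ranks = rank_lowest_first(pages)
print(ranks)
```

The student who read 150 pages gets rank 6, just one step above the tied students at 4.5, even though the raw distance is 110 pages. The order is preserved and the distances are ignored.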
The “Z” transformation My favorite! The best! Example: Students’ scores at two different times.
The “Z” transformation How well did student #1 do at time 1? How about student 2? 3? Etc.? How did they do at time 2?
The “Z” transformation Use the group mean and SD to create z-scores.
The “Z” transformation The z-scores now help us a lot in comparing individual performance at time 1 and time 2.
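A sketch of creating z-scores from the group mean and SD at each time point. The scores below are hypothetical and deliberately on two different scales; the sample SD (n - 1 in the denominator) is used, which is one reasonable choice and not necessarily the one behind the slides' table.

```python
from statistics import mean, stdev


def z_scores(scores):
    """Convert raw scores to z-scores using the group's own mean and sample SD."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD (n - 1 in the denominator)
    return [(x - m) / sd for x in scores]


# Hypothetical scores for the same five students on two differently scaled tests.
time1 = [40, 50, 60, 70, 80]
time2 = [10, 14, 12, 20, 19]

z1 = z_scores(time1)
z2 = z_scores(time2)

for student, (a, b) in enumerate(zip(z1, z2), start=1):
    print(f"student {student}: z at time 1 = {a:+.2f}, z at time 2 = {b:+.2f}")
```

After the transformation both sets of scores have mean 0 and SD 1, so a student's standing in the group can be compared directly across the two times.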
The “Z” transformation formula
There are two versions of the formula.
1. Here we use the sample mean and the sample SD: $z = (X - \bar{X}) / s$
• Use the mean and SD of the sample.
• How far is each score from the sample mean?
• How many standard deviations away?
2. Here we use the population mean and the population SD: $z = (X - \mu) / \sigma$
• Use the mean and SD of a population.
• How far is each score from the population mean?
• How many standard deviations away?
Z-transformation example: GRE scores
• The old version of the GRE was scaled so that the mean was 500, with a standard deviation of 100. (Mean = 500, SD = 100)
• If a student had a score of 600, how good is that score?
• X = 600, so $z = (600 - 500) / 100 = 1.0$. (One SD above the GRE population mean.)
• The new version of the GRE is rescaled so that the mean is 150, with a standard deviation of 9.0. (Mean = 150, SD = 9)
• If a student had a score of 160, how good is that score?
• X = 160, so $z = (160 - 150) / 9 \approx 1.1$. (A bit more than 1 SD above the GRE population mean.)
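The GRE arithmetic can be checked with a short sketch. The means and SDs are the ones stated on the slide; the function name is just an illustrative choice.

```python
def z_from_population(score, pop_mean, pop_sd):
    """z = (score - population mean) / population SD."""
    return (score - pop_mean) / pop_sd


# Old GRE scale: mean 500, SD 100. New GRE scale: mean 150, SD 9.
old_gre = z_from_population(600, pop_mean=500, pop_sd=100)
new_gre = z_from_population(160, pop_mean=150, pop_sd=9)

print(f"old GRE 600 -> z = {old_gre:.1f}")
print(f"new GRE 160 -> z = {new_gre:.1f}")
```

The two raw scores (600 and 160) look nothing alike, but on the z scale they are nearly the same: about one SD above the mean.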
The “Z” transformation
My favorite! The best!
• Key idea:
• How many SDs away from mean
• How far from mean – in SD units
• What a great idea!
• Lets us compare things even when we use different tests or different scoring systems
Other examples?
The square-root transformation: Often useful when things are positively skewed
The square-root transformation: Often useful when things are positively skewed
Mean = 26.6, Median = 9. Mean pulled way up beyond the median. Not “normal” at all. (Not bell-shaped.) Skewed to the right, positively skewed.
The square-root transformation: Often useful when things are positively skewed
• Let’s look at each score and take the square root.
• This will pull in the high scores.
The square-root transformation: Often useful when things are positively skewed
Original scores: Mean = 26.6, Median = 9
Square-root scores: Mean = 4.2, Median = 3
The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)
These (the exponents) are the logs (the logarithms). Here I am using “base 10” logs.
• $10^{1} = 10$, so the log of 10 is 1
• $10^{2} = 100$, so the log of 100 is 2
• $10^{3} = 1000$, so the log of 1000 is 3
• $10^{4} = 10000$, so the log of 10000 is 4
What will the log of 90 be? We need $10^{?} = 90$. The answer is 1.95.
The logs of scores that fall between the powers of 10:
• $10^{0.95} = 9$
• $10^{1.70} = 50$
• $10^{1.95} = 90$
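The base-10 logs on these slides can be checked with Python's standard `math.log10`. This is only a quick verification sketch; the rounding to two decimals mirrors the slide.

```python
import math

# Print the base-10 log of each score that appears on the slides.
for value in [9, 10, 50, 90, 100, 1000, 10000]:
    log_value = math.log10(value)
    print(f"log10({value}) = {log_value:.2f}")

# The three "in-between" scores, rounded to two decimals as on the slide.
logs = {value: round(math.log10(value), 2) for value in [9, 50, 90]}
```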
The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)
The log transformation has a dramatic effect on the scores. This changes the distances between the scores. This has a huge effect on the distribution.
When scores are spread out widely on the scale (e.g., 10, 100, 1000, etc.) the log helps to pull in the very high scores. Actually, it pulls in the high scores, and it can help to spread out the low scores.
This is a very useful and very common transformation. (Widely used in economics, biology, demography, etc.)
The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)
Mean = 1608, Median = 90. The low scores (9, 10, 50, 100) are all clustered together at the left side. We cannot really see them. The large values are far away from the small values.
The log transformation: Often useful when things are positively skewed, or when the range is very wide (over several orders of magnitude)
Original scores: Mean = 1608, Median = 90.
Log scores: High scores pulled in. Lower scores more spread out. The scale has changed.
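A sketch of the original-versus-log comparison. The seven scores below are an assumption: they are the values mentioned across the log slides, and together they reproduce the stated mean (1608) and median (90), but the slides never list the full data set.

```python
import math
from statistics import mean, median

# Assumed scores: these seven values reproduce the slide's mean and median,
# but the slide itself does not list the full data set.
scores = [9, 10, 50, 90, 100, 1000, 10000]
log_scores = [math.log10(x) for x in scores]

print(f"original: mean = {mean(scores):.0f}, median = {median(scores)}")
print(f"log10:    mean = {mean(log_scores):.2f}, median = {median(log_scores):.2f}")
```

On the original scale the mean is nearly 18 times the median. On the log scale the mean (about 2.09) and the median (about 1.95) are close together.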
The log transformation: Also often useful when we are studying growth over time
Example: Studying children’s vocabulary growth. How many words are they learning?
During early months, the “scores” (the vocabulary sizes) are low, so they are bunched together. But at older ages, the growth continues, and so the scores are much more spread out.
• The scale changes quite a bit here.
• The early scores are 4, 5, 7, 10, 20.
• The later scores are 400, 600, 1100.
• So this is another example where the log transform may be helpful.
The log transformation: Also often useful when we are studying growth over time
Check these graphs
Original scale: Vocabulary is growing, and it seems to be growing faster and faster!
Log scale: Wait! Now we see that the growth is steady. (Here the growth is 20% per month.)
The log transformation is helpful here!
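A sketch of why steady percentage growth looks like a straight line on the log scale. The 20% monthly rate is from the slide; the starting vocabulary of 5 words and the 12-month span are hypothetical choices for illustration.

```python
import math

start_words = 5        # hypothetical starting vocabulary size
monthly_growth = 0.20  # the 20% per month mentioned on the slide

# Vocabulary size at months 0 through 12 under constant percentage growth.
vocab = [start_words * (1 + monthly_growth) ** month for month in range(13)]
log_vocab = [math.log10(v) for v in vocab]

# Month-to-month change on each scale.
raw_steps = [b - a for a, b in zip(vocab, vocab[1:])]
log_steps = [b - a for a, b in zip(log_vocab, log_vocab[1:])]

print("first and last raw step:", round(raw_steps[0], 1), round(raw_steps[-1], 1))
print("first and last log step:", round(log_steps[0], 4), round(log_steps[-1], 4))
```

The raw monthly gains keep getting bigger, but every step on the log scale is the same size, the log of 1.2. That constant step is what shows up as a straight line.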
The log transformation: Also often useful when we are studying growth over time
Check these graphs
Growth in vocabulary for two children. (Both growing rapidly!) But the gap is getting larger and larger over time.
The log transformation shows us the differences in the growth rates. (Here the difference is only one percent per month.) But this monthly difference is steady, so it ends up producing a big difference over time.
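A sketch of the two-child comparison. The specific rates (20% and 21% per month) and the starting size of 5 words are hypothetical; the slide says only that the rates differ by one percent per month.

```python
import math


def gaps(month, start=5, rate_a=0.20, rate_b=0.21):
    """Return (raw gap, log10 gap) between two children after `month` months.

    The start size and both rates are hypothetical illustration values.
    """
    a = start * (1 + rate_a) ** month
    b = start * (1 + rate_b) ** month
    return b - a, math.log10(b) - math.log10(a)


for month in (0, 12, 24, 36):
    raw_gap, log_gap = gaps(month)
    print(f"month {month:2d}: raw gap = {raw_gap:10.1f} words, log gap = {log_gap:.3f}")
```

On the raw scale the gap accelerates. On the log scale it grows by the same amount every month, which is the steady difference in growth rates.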
Transformations: They can help, but they require lots of thought
• Percentages are useful
  • Very common
  • Adjusts things to rates rather than simple counts
  • Easy to understand
• Rank ordering
  • Preserves the order
  • Ignores the distances
  • Very common
  • Several important statistical tests use the rank order
• Square-root transformation
  • Often useful with count data (days absent, household size)
  • When there is positive skew (skewed to the right)
  • Pulls in the long tail
  • Works with positive values
• Log transformation
  • Useful when scores are spread out over a very wide scale
  • When we look at things that change in percentage terms (e.g., growth rates: 1-percent growth, or 5-percent growth)
  • Works only with positive values (Sometimes we add a constant so we can use square root or log transform.)
  • Sometimes harder to interpret
  • Very commonly used in economics, biology, ecology, etc.
But the best, most important, most valuable, most versatile, all-around most-cool transformation is . . . Z
Call it “Zee.” Call it “Zed.” However you pronounce it, it is a great concept.
• How far away? How many SDs away?
• Z is a standard score. On a standard scale.
• Helps us compare results on different tests (different test scales).
• Helps compare results of different studies.
• Helps us judge differences when we are comparing groups.