
Distributions & Descriptive statistics Dr William Simpson Psychology, University of Plymouth




Presentation Transcript


  1. Distributions & Descriptive statistics • Dr William Simpson • Psychology, University of Plymouth

  2. Defining and measuring variables

  3. Independent & dependent variables • Independent variable: something we manipulate in an experiment • Dependent variable: something we measure • By manipulating the IV, we expect to produce a change in the DV

  4. Scales of measurement • Variables are classified according to type of scale • Type of analysis depends on type of scale • From least to most informative: nominal, ordinal, interval, ratio

  5. Nominal • Nominal data: assign categorical labels to observations • Not really measurement • E.g. male/female; married/single/widowed/divorced • Numbers on football jerseys

  6. Ordinal • Ordinal data: values can be ranked (ordered). Categorical but rankable • E.g. small, medium, large; movie rating 1-5; Likert scale • Can only be ranked. A rating scale is not like cm: the difference between one pair of adjacent ratings is not necessarily the same as the difference between another pair

  7. Averaging a response of "strongly agree" (5) with two responses of "disagree" (2) would give us a mean of 3, but what is the meaning of that number?
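The arithmetic on this slide can be checked directly in R. This is a minimal sketch; the vector name `responses` is an assumption for illustration, and the coding (5 = strongly agree, 2 = disagree) follows the slide.

```r
# One "strongly agree" (5) and two "disagree" (2) responses
responses <- c(5, 2, 2)

# Arithmetic mean: (5 + 2 + 2) / 3
likert_mean <- mean(responses)
print(likert_mean)

# The median depends only on rank order, so it is defined for ordinal data
likert_median <- median(responses)
print(likert_median)
```

The mean lands on a value that nobody actually chose, whereas the median is one of the observed responses.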

  8. Interval • Interval data: ordinary measurement, e.g. temperature • Unlike ordinal data, we can say the diff between 1 & 2 deg C is same as diff between 4 & 5 deg

  9. Ratio • Ordinary measurements, but with an absolute, non-arbitrary zero point • E.g. weight, length: any scale must start at zero • deg C: not ratio, because 0 arbitrarily set at freezing pt of water
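Why an arbitrary zero rules out ratio statements can be seen with a short R sketch. The two temperatures are made up for illustration, and the conversion to kelvin (adding 273.15) is brought in here only as an example of a temperature scale with a non-arbitrary zero; it is not part of the slides.

```r
# Two hypothetical temperatures in degrees Celsius (interval scale)
temp_c <- c(10, 20)

# The naive ratio suggests the second is "twice as hot"
ratio_celsius <- temp_c[2] / temp_c[1]

# Kelvin has an absolute zero, so ratios on this scale are meaningful
temp_k <- temp_c + 273.15
ratio_kelvin <- temp_k[2] / temp_k[1]

print(ratio_celsius)
print(ratio_kelvin)

# Differences, by contrast, are the same on both scales
print(diff(temp_c))
print(diff(temp_k))
```

The ratio changes when the zero point moves, but the difference does not, which is the distinction between interval and ratio scales.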

  10. Discrete & continuous variables • variables measured on interval & ratio scales are further identified as either: • discrete – Integers, no intermediate values. E.g. #Smarties in a box • continuous - measurable to any level of accuracy. E.g. Weight of Smarties contents

  11. Frequency distributions

  12. We have a pile of scores • Not all scores are equally likely • How were scores distributed?

  13. Subjects were timed (in sec) while completing a problem-solving task: • 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2

  14. Stem & leaf • Two components: the stem and the leaf • In problem-solving example, stem = ones, leaf = tenths • Stems range between 5 and 9

  15. 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2 • 5|98 • 6|821 • 7|6347 • 8|1182 • 9|2 • Key: 9|2 means 9.2
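The display can be rebuilt step by step in R, as a rough sketch of the stem = ones, leaf = tenths rule. The names `times`, `stems`, `leaves` and `display` are assumptions for illustration; base R's `stem()` produces a similar display directly, though it may sort the leaves and choose its own stems.

```r
# Problem-solving times in seconds, copied from the slide
times <- c(7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2)

# Stem = ones digit, leaf = tenths digit
stems  <- floor(times)
leaves <- round((times - stems) * 10)

# Group the leaves by stem and paste each group into one string
display <- vapply(split(leaves, stems), paste, character(1), collapse = "")

# Print one line per stem, e.g. "5|98"
for (s in names(display)) {
  cat(s, "|", display[[s]], "\n", sep = "")
}
```

Leaves are kept in the order the scores were recorded, which matches the slide.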

  16. Heights in cm:154, 143, 148,139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157, 149, 146 • - Put 2 digits in stem; split stems 0-4, 5-9 • 13|969 • 14|334323 • 14|87796 • 15|431 • 15|67 • 16|24 • Key: 13|6 means 136

  17. GSR values: 23.25, 24.13, 24.76, 24.81, 24.98, 25.31, 25.57, 25.89, 26.28, 26.34, 27.09 • - Round each value to 1 decimal place • 23|3 • 24|188 • 25|0369 • 26|33 • 27|1 • Key: 23|3 means 23.3

  18. Histogram • Alternative way to look at distribution • It is like a version of stem-and-leaf turned 90 deg

  19. Example • Time to complete task (min): • 8 2 6 12 9 14 1 7 7 9 11 8 12 10 5 7 10 9 10 11 4 8 2 11 10 11 13 13 14 11 13 10 12 13 5 16 11 17 10 6 13 11 5 9 12 14 8 2 12 4

  20. Sort scores into about 10 or so bins (similar to stem in stem-and-leaf)

  21. Decide on sensible bins • Count the number of observations in each bin (length of each leaf in stem-and-leaf) • This number in each bin is called the frequency

  22. Frequency table for the task times:
      Time (min)   Frequency
      0-1          1
      2-3          3
      4-5          5
      6-7          5
      8-9          8
      10-11        13
      12-13        10
      14-15        3
      16-17        2
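The same frequency table can be produced in R by sorting the scores into bins with `cut()` and counting with `table()`. This is a minimal sketch; the names `breaks`, `bins` and `freq` are assumptions for illustration.

```r
# Task completion times in minutes, copied from the slides
x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
       4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
       10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)

# Bin edges 0, 2, 4, ..., 18; right = FALSE gives bins [0,2), [2,4), ...
breaks <- seq(0, 18, by = 2)
labels <- paste(breaks[-length(breaks)], breaks[-1] - 1, sep = "-")

# Assign each score to a bin, then count the scores per bin
bins <- cut(x, breaks = breaks, right = FALSE, labels = labels)
freq <- table(bins)
print(freq)
```

Note that `hist(x)` with default settings picks its own breaks and closes bins on the right, so its counts may not match this table unless `breaks` and `right = FALSE` are passed explicitly.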

  23. This table is then used to make the histogram • Histogram is bar chart with frequency on y axis and score on x axis • Sometimes done other ways, e.g. connect the dots (frequency distrib polygon)

  24. [Figure: histogram of the task times; x axis: Time (min), y axis: Frequency]

  25. In R • x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11, 4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17, 10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4) • hist(x) • stem(x) • boxplot(x)

  26. Probability distributions • Histogram is estimate of true probability distribution • Many theoretical probability distributions exist • Basis of statistical models used to make inferences about population

  27. Binomial distribution • Binomial distribution is a discrete distribution • the binomial distribution applies when: • there is a series of n trials (e.g., 10 coin tosses) • only 2 possible outcomes per trial • outcomes are mutually exclusive (head or tail) • outcome of each trial independent of others

  28. The binomial distribution gives the chance of getting each total number of ‘successes’ after doing all the (binary) trials of the experiment • E.g. it gives the chance of getting 0, 1, 2, ... or 6 girls after giving birth to 6 children • p = p(success) = p(girl) = 0.5 each trial • q = p(failure) = p(boy) = 1-p = 0.5 • n = number of trials = 6
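The probabilities for this example can be tabulated with base R's `dbinom()`. This is a minimal sketch using the slide's values n = 6 and p = 0.5; the names `girls` and `prob` are assumptions for illustration.

```r
# Number of trials (children) and probability of a girl on each trial
n <- 6
p <- 0.5

# Every possible number of girls, from 0 to n
girls <- 0:n

# Binomial probability of each total
prob <- dbinom(girls, size = n, prob = p)
print(data.frame(girls, prob))

# The probabilities cover every possible outcome, so they sum to 1
print(sum(prob))
```

A bar chart of the distribution could then be drawn with `barplot(prob, names.arg = girls)`.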

  29. Probability distribution where n = 6 and the probability of each outcome is 0.5 on each trial • [Figure: bar chart; x axis: number of girls, y axis: probability]

  30. For any probability distribution, the y-axis is given by a formula • For the binomial, it looks like this: $P(X = k) = \binom{n}{k} p^k q^{n-k}$ • k successes in n trials; $\binom{n}{k}$ is the binomial coefficient • you don’t need to know it
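For the curious, the binomial formula can be written out by hand and compared with R's built-in. This sketch assumes the standard form, choose(n, k) * p^k * q^(n - k), with q = 1 - p; the function name `binomial_prob` is hypothetical.

```r
# Probability of exactly k successes in n trials, written out by hand
binomial_prob <- function(k, n, p) {
  q <- 1 - p                       # probability of failure
  choose(n, k) * p^k * q^(n - k)   # choose(n, k) is the binomial coefficient
}

# The girls example: n = 6 children, p = 0.5
by_hand  <- binomial_prob(0:6, n = 6, p = 0.5)
built_in <- dbinom(0:6, size = 6, prob = 0.5)

print(by_hand)
print(all.equal(by_hand, built_in))
```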

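  31. Normal distribution • Continuous probability distribution • Every probability distribution’s y-axis is given by a formula • For normal distribution, the y-axis (probability density) is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$, where $\mu$ is the mean and $\sigma$ is the standard deviation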
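The normal density can likewise be coded by hand and checked against base R's `dnorm()`. This sketch assumes the standard density formula with mean `mu` and standard deviation `sigma`; the function name `normal_density` is hypothetical.

```r
# Normal probability density, written out from the standard formula
normal_density <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Evaluate at a few points and compare with the built-in dnorm()
x_vals   <- c(-2, -1, 0, 1, 2)
by_hand  <- normal_density(x_vals)
built_in <- dnorm(x_vals)

print(by_hand)
print(all.equal(by_hand, built_in))
```

The curve could be drawn with `curve(normal_density(x), from = -4, to = 4)`.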

  32. Descriptive statistics

  33. We have a pile of scores • Have made stem-and-leaf, histogram • Want to summarise further: descriptive statistics

  34. 1. Centre (location) • What is the ‘typical’ score? If you were to make a prediction for a new score, what would it be?

  35. a) Mean (average) • Mean = sum(x)/n

  36. Mean as balance point • Imagine that each observation is a toy block • Place the blocks on a ruler; the position (1, 2, etc inches) represents the value • The balance point is the mean

  37. 1 2 2 3 (mean = 2) • 1 2 2 5 (mean = 2.5) • 1 2 2 9 (mean = 3.5) • Mean is pulled towards extreme observation (outlier)
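Both the pull of the outlier and the balance-point idea can be checked in R. This is a minimal sketch using the three sets of scores from the slide; the names `datasets`, `means` and `balance` are assumptions for illustration.

```r
# The three sets of scores; only the largest value changes
datasets <- list(c(1, 2, 2, 3), c(1, 2, 2, 5), c(1, 2, 2, 9))

# The mean moves upward as the extreme score grows
means <- sapply(datasets, mean)
print(means)

# Balance point: deviations from the mean cancel out in every set
balance <- sapply(datasets, function(d) sum(d - mean(d)))
print(balance)
```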

  38. b) Median • Median is middle score; 50th percentile • useful when extreme scores (outliers) lie in one tail of distribution (skewed)

  39. Calculate the median • Sort scores • If odd n, median is middle value • If even n, median is mean of 2 middle values • 25 13 9 18 1 -> 1 9 13 18 25; med=13 • 25 13 9 18 -> 9 13 18 25 • Median= (13+18)/2 = 15.5
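The steps above can be sketched as a small R function and compared with the built-in `median()`. The function name `manual_median` is hypothetical; it simply follows the sort, odd n, even n rule from the slide.

```r
# Median by hand: sort, then take the middle value (odd n)
# or the mean of the two middle values (even n)
manual_median <- function(scores) {
  s <- sort(scores)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    mean(s[c(n / 2, n / 2 + 1)])
  }
}

odd_scores  <- c(25, 13, 9, 18, 1)
even_scores <- c(25, 13, 9, 18)

print(manual_median(odd_scores))
print(manual_median(even_scores))
```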

  40. Median and outliers • 1 2 2 3 • 1 2 2 5 • 1 2 2 9 • Median = 2 in all cases
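A side-by-side comparison in R makes the contrast with the mean explicit. This is a minimal sketch using the three sets of scores from the slide; the data frame `summary_table` and its column names are assumptions for illustration.

```r
# The three sets of scores; only the largest value changes
datasets <- list(c(1, 2, 2, 3), c(1, 2, 2, 5), c(1, 2, 2, 9))

# One row per set: largest score, mean, and median
summary_table <- data.frame(
  largest = sapply(datasets, max),
  mean    = sapply(datasets, mean),
  median  = sapply(datasets, median)
)
print(summary_table)
```

The mean column changes with the largest score while the median column does not.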

  41. c) Mode • Mode is most frequently occurring score • Mean should really be used only for interval/ratio data. Mode good otherwise • E.g. mean movie rating – not really sensible. Mode sensible • Sometimes no unique mode exists (e.g. bimodal)

  42. Bimodality can be due to mixture of two different populations (e.g. male and female)

  43. [Figure: histogram of time to complete task; x axis: Time (min), y axis: Frequency] • Mean = 9.36 • Median = 10 • Mode = 11

  44. mean(x) • median(x) • Mode <- function(x) { ux <- unique(x); ux[which.max(tabulate(match(x, ux)))] } • Mode(x)
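A mode function built on `which.max()` returns only the first of any tied values, so it hides bimodality. A possible variant that returns every tied mode is sketched below; the name `all_modes` and the example vectors are hypothetical and not part of the slides.

```r
# Return every value that shares the highest frequency
all_modes <- function(x) {
  ux <- unique(x)                      # distinct values
  counts <- tabulate(match(x, ux))     # how often each distinct value occurs
  sort(ux[counts == max(counts)])      # keep all values tied for the top count
}

# A made-up bimodal set of scores: 1 and 2 both occur twice
print(all_modes(c(1, 1, 2, 2, 3)))

# With a single most frequent value, only that value is returned
print(all_modes(c(4, 11, 11, 10)))
```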

  45. Likert scale • e.g. Brief Psychiatric Rating scale (BPRS) • Interview + observations of patient's behaviour over preceding 2–3 days • Each item scored 0-7

  46. Suppose we have a new treatment • Does it reduce anxiety? • Define “anxiety” as score on Q2

  47. We use BPRS on lots of patients • Compare treatment and placebo • How? Find mean(treatment) vs mean(placebo)?

  48. NO

  49. The numbers 0-7 are not really numbers! • They have only rank (order) info • Ordinal
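One way to summarise ordinal scores without treating them as measurements is to report medians and frequency counts, which use only rank and category information. The sketch below uses HYPOTHETICAL scores invented purely for illustration; they are not data from the slides or from any study.

```r
# HYPOTHETICAL item scores (0-7), invented for illustration only
treatment <- c(1, 2, 2, 3, 3, 4, 5)
placebo   <- c(2, 3, 4, 4, 5, 5, 6)

# Medians rely only on the ordering of the scores
median_treatment <- median(treatment)
median_placebo   <- median(placebo)
print(c(treatment = median_treatment, placebo = median_placebo))

# Frequency of each possible score, including scores nobody received
print(table(factor(treatment, levels = 0:7)))
print(table(factor(placebo, levels = 0:7)))
```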
