Collecting and interpreting acceptability judgments using Magnitude Estimation

Caroline Heycock, with Zakaris Svabo Hansen and Antonella Sorace, University of Edinburgh. NLVN-course/NORMS-seminar, Tórshavn, Faroe Islands, 8–16 August 2008.


Presentation Transcript


  1. Collecting and interpreting acceptability judgments using Magnitude Estimation. Caroline Heycock, with Zakaris Svabo Hansen and Antonella Sorace, University of Edinburgh. NLVN-course/NORMS-seminar, Tórshavn, Faroe Islands, 8–16 August 2008

  2. Outline • Why do we need acceptability judgments? • What are the problems with acceptability judgments? • How can Magnitude Estimation help with any of these problems? • Exemplification from ongoing studies on Faroese (and related languages)

  3. Why do we need judgment data? Need Problems ME Examples • There is no direct way to access I-language (the speaker’s knowledge of their language), so we need to triangulate from all available sources of data. • Corpus data typically: aggregate across speakers; include performance errors; allow no straightforward distinction between non-occurring and ungrammatical forms; may not exist.

  4. Outline • Why do we need acceptability judgments? • What are the problems with acceptability judgments? • How can Magnitude Estimation help with any of these problems? • Exemplification from ongoing studies on Faroese (and related languages)

  5. Validity. Judgments are also a type of behaviour, known to be affected by • processing constraints • personality and mental state • presentation (order, context, mode) • absolute vs relative task • linguistic training

  6. Reliability • Interspeaker variation: this may or may not be considered a problem of reliability, depending on assumptions about individuals’ grammars, but it is at least a methodological problem • Intraspeaker inconsistency

  7. Conventional measurements of acceptability • Judgments of linguistic acceptability usually form category scales (ok/*) or limited ordinal scales (ok/?/?*/* or 1,2,3,4,5) • These scales require absolute rating judgments, rather than relative ranking judgments • Ordinal scales provide no information about the relative distance between adjacent points on the scale

  8. Problems arising with conventional scales for acceptability judgments • Limited in their range of values • Lack of statistical power: these scales cannot be analysed using parametric statistics, because this type of analysis requires the data to be on at least an interval scale. • Inconsistency: even trained linguists use diacritics in different ways, so comparison between different studies is extremely difficult. • Uninterpretability: what do the middle points on a rating scale actually mean? How can we distinguish between lack of certainty and intermediate acceptability?

  9. Judgment data: interpreting midpoints (Thráinsson 2003, Petersen 2000)

  10. Judgment data: interpreting midpoints (Thráinsson 2003, Petersen 2000)

  11. Outline • Why do we need acceptability judgments? • What are the problems with acceptability judgments? • How can Magnitude Estimation help with any of these problems? • Exemplification from ongoing studies on Faroese (and related languages)

  12. M[agnitude] E[stimation] in psychophysics • ME is an experimental technique used to determine quickly and easily how much of a given sensation a person is having. • In an ME experiment subjects are presented with a standard stimulus (a modulus) and are asked to express its magnitude as a number. • They are then presented with a series of stimuli that vary in intensity and are asked to assign each of the stimuli a number relative to the modulus.

  13. ME in psychophysics • Subjects assign a number: • to the modulus, to reflect the magnitude of the pertinent characteristic (length, loudness, brightness) • to each successive stimulus, to indicate its apparent magnitude relative to the first (or to a previous stimulus)

  14. ME in psychophysics: Scaling • Scaling in ME is not about the absolute accuracy of judgments; it is about the relative relationships between judgments of stimuli of different intensities.

  15. ME in psychophysics: modalities • The numerical modality is the most common, but other modalities are possible (e.g. line length). • Other modalities can be more user-friendly, particularly if you are testing people who (think they) are numerically challenged.

  16. ME in psychophysics: can people do it? • Many magnitude estimation experiments use a control condition in which subjects are asked to perform magnitude estimations of the length of a line. • Magnitude estimations of line length have been shown to be proportional to the actual length of the lines.
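
One way to check the proportionality claim on this slide is to regress log estimates on log line lengths: a slope close to 1 means the estimates scale proportionally with actual length. The sketch below is only an illustration. The function name `loglog_slope` and the data are hypothetical, not taken from the talk or from any published study.

```python
import math

def loglog_slope(actual, estimates):
    """Least-squares slope of log(estimate) regressed on log(actual)."""
    xs = [math.log(a) for a in actual]
    ys = [math.log(e) for e in estimates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical control data: line lengths (mm) and one subject's estimates.
# These estimates are exactly half the length, i.e. perfectly proportional.
line_lengths = [10, 20, 40, 80, 160]
estimates = [5, 10, 20, 40, 80]

slope = loglog_slope(line_lengths, estimates)
print(round(slope, 3))
```

A subject whose slope in this control condition is far from 1 may not be making proportional judgments, which is worth knowing before interpreting their acceptability estimates.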

  17. ME in Linguistics • Unlike other dimensions, linguistic acceptability has no obvious “physical” continuum to plot against subjects’ impressions. • However, Bard, Robertson & Sorace (1996) applied standard cross-modality matching techniques and were able to show that the technique is reliable.

  18. Typical instructions • Here’s an example of what the instructions look like...

  19. Instructions The purpose of this exercise is to get you to judge the acceptability of some English sentences. You will see a series of sentences on the screen. These sentences are all different. Some will seem perfectly okay to you, but others will not. What we're after is not what you think of the meaning of the sentence, but what you think of the way it's constructed.

  20. Your task is to judge how good or bad each sentence is by assigning a number to it. • You can use any number that seems appropriate to you. For each sentence after the first, assign a number to show how good or bad that sentence is in proportion to the reference sentence.

  21. For example, if the first sentence was: (1) cat the mat on sat the. and you gave it a 1, and if the next example: (2) the dog the bone ate. seemed 20 times better, you'd give it twenty. If it seems half as good as the reference sentence, give it the number 0.5

  22. You can use any range of positive numbers you like including, if necessary, fractions or decimals. • You should not restrict your responses to, say, an academic marking scale. • You may not use minus numbers or zero, of course, because they aren't proper multiples or fractions of positive numbers. • If you forget the reference sentence don't worry; if each of your judgments is in proportion to the first, you can judge the new sentence relative to any of them that you do remember.

  23. There are no 'correct' answers, so whatever seems right to you is a valid response. Nor is there a 'correct' range of answers or a 'correct' place to start. • Any convenient positive number will do for the reference. • We are interested in your first impressions, so don't spend too long thinking about your judgment.

  24. Remember: • Use any number you like for the first sentence. • Judge each sentence in proportion to the reference sentence. • Use any positive numbers you think appropriate.

  25. Choices about the modulus: face validity • The experimenter has the option of assigning a fixed number to the modulus. • Another option is to leave the modulus in sight throughout the experiment. • This option has good face validity, but it isn’t clear to what extent it affects the ultimate reliability of the estimates. • People don’t need to remember the modulus; if they are making judgments proportionally, the reference point shifts as they move on.

  26. Advantages of quasi-randomization • The experimenter can impose constraints on the randomization to prevent certain experimental items from occurring consecutively. • The modulus can be chosen to represent an intermediate degree of acceptability. • A number (or a line) of intermediate size can be assigned to the modulus by the experimenter.
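
The constraint mentioned on this slide (no two items of the same kind in a row) can be sketched as a simple reshuffle-until-valid procedure. This is only one possible approach, not necessarily the one used in the studies reported here. The function `quasi_randomize`, the condition labels, and the item list are hypothetical.

```python
import random

def quasi_randomize(items, key, rng, max_tries=10_000):
    """Shuffle items until no two neighbours share the same key(item)."""
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(key(a) != key(b) for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; relax the constraint")

# Hypothetical item list: (item id, condition label), four items per condition.
items = [(i, cond) for i, cond in enumerate(["V-Adv", "Adv-V", "filler"] * 4)]

# A seeded generator makes the presentation order reproducible.
order = quasi_randomize(items, key=lambda item: item[1], rng=random.Random(1))
print([cond for _, cond in order])
```

Reshuffling is wasteful when the constraints are tight, but it keeps every valid order equally likely, which a greedy repair step would not guarantee.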

  27. Timed vs untimed ME • Timing the intervals between sentences may reduce the likelihood that people consult metalinguistic or prescriptive knowledge. • Intervals have to be different for non-native speakers: they have to be piloted carefully.

  28. Varying the instructions • There is a tendency in some people to use a fixed (usually 10-point) scale. This is possibly because of familiarity with school marking systems. • If the instructions contain an explicit warning against using a restricted range of numbers, the tendency is much reduced. • People are very sensitive to instructions: these have to be as explicit and clear as possible. • A detailed practice session is essential!

  29. Advantages • ME yields interval scales, which allow the use of parametric statistics • Mathematical operations can be applied to the estimates, allowing: • a direct indication of the speaker’s ability to discriminate between more or less acceptable sentences • a direct measure of the strength of speakers’ preferences

  30. Advantages • Informants are enabled to express their intuitions without any restrictions of the judgment scale. • They are asked to provide purely comparative judgments: these are relative both to a reference item and to the individual subject’s own previous judgments. • At no point is an absolute criterion of grammaticality applied. • The subjects themselves fix the value of the reference item relative to which subsequent judgments are made.

  31. Advantages • The scale used by informants is open-ended and has no minimum division: subjects can always add a further highest score or produce an additional intermediate rating. • The result is that subjects are able to produce judgments which distinguish all and only the differences they perceive.

  32. Data analysis: normalisation • ME data need to be normalised because people use different ranges of estimates. • Raw magnitude values are often transformed into logs in order to yield a normal distribution. • Each number is divided by the modulus that the subject had assigned to the reference sentence, or alternatively the z-scores are used. • Any statistical package can easily do these transformations.
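
The two normalisation routes named on this slide can be sketched in a few lines. The order of operations (divide by the modulus, then take logs; or take logs, then standardise within each subject) is an assumption here, as are the function names and the raw numbers, which are invented to show two subjects using very different ranges.

```python
import math
from statistics import mean, stdev

def normalise_by_modulus(ratings, modulus):
    """Divide each raw estimate by the subject's modulus, then take log10."""
    return [math.log10(r / modulus) for r in ratings]

def z_scores(ratings):
    """Log-transform the raw estimates, then standardise within one subject."""
    logs = [math.log10(r) for r in ratings]
    m, s = mean(logs), stdev(logs)
    return [(x - m) / s for x in logs]

# Hypothetical raw estimates: subject B uses numbers ten times larger than
# subject A, but both express the same proportions relative to the modulus.
subject_a = {"modulus": 10, "ratings": [10, 20, 5, 40]}
subject_b = {"modulus": 100, "ratings": [100, 200, 50, 400]}

norm_a = normalise_by_modulus(subject_a["ratings"], subject_a["modulus"])
norm_b = normalise_by_modulus(subject_b["ratings"], subject_b["modulus"])
z_a = z_scores(subject_a["ratings"])
z_b = z_scores(subject_b["ratings"])
print(norm_a)
print(norm_b)
```

After either transformation the two subjects' scores coincide, which is the point of normalising: the analysis then compares proportions rather than each subject's arbitrary choice of number range.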

  33. Outline • Why do we need acceptability judgments? • What are the problems with acceptability judgments? • How can Magnitude Estimation help with any of these problems? • Exemplification from ongoing studies on Faroese (and related languages)

  34. Faroese. Some questions: • Do current speakers of Faroese have V-to-I as part of their competence grammar(s)? That is, do they allow the order Finite Verb > Negation in all types of subordinate clause? • Do current speakers of Faroese allow “generalised embedded Verb Second” (V2)? That is, do they allow a wide range of subordinate clauses to begin with something other than the subject? • With respect to these phenomena, how is Faroese situated with respect to Icelandic and Danish?

  35. How acceptable is V-to-I in Faroese? We looked at the effect of two variables and their interaction (2 within-subjects variables, with 2 and 3 levels): • Order: Verb-Adverb; Adverb-Verb • Type of “adverb”: Negation (ikki); “High” adverb (kanska); “Low” adverb (ofta). These orders were all contained in relative clauses.

  36. Examples
     • Adverb: Negation; Order: V-Adv
       Hatta er filmurin, sum Hanus hevur ikki sæð
       That is film-def that Hanus has neg seen
     • Adverb: Negation; Order: Adv-V
       Hetta er brævið, sum Elin ikki hevur lisið
       That is letter-def that Elin neg has read
     • Adverb: Low Adv; Order: V-Adv
       Hetta er lagið, sum Teitur hevur ofta spælt
       That is piece-the that Teitur has often played
     • Adverb: Low Adv; Order: Adv-V
       Hatta er sangurin, sum Eivør ofta hevur sungið
       That is song-def that Eivør often has sung

  37. How “generalized” is V2 in Faroese? We looked at the effect of two variables and their interaction (2 within-subjects variables, with 2 and 5 levels): • Order: Subject-Initial; Adjunct-Initial • Clause type: Main clause; “Bridge verb” complement; Nonbridge verb A complement (regret, admit); Nonbridge verb B complement (deny, doubt, be proud); Indirect question
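
Crossing the two within-subjects variables gives the full set of conditions for which test sentences are needed. The sketch below enumerates them for the 2 × 5 V2 design; the helper `cross` and the short level labels are illustrative names only. The same helper would apply to the 2 × 3 verb/adverb design.

```python
from itertools import product

def cross(**factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

v2_conditions = cross(
    order=["Subject-Initial", "Adjunct-Initial"],
    clause_type=["Main", "Bridge", "Nonbridge A", "Nonbridge B",
                 "Indirect question"],
)

print(len(v2_conditions))  # 2 orders x 5 clause types = 10 conditions
for condition in v2_conditions:
    print(condition)
```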

  38. Examples
     • Clause Type: Bridge; Order: Subject-Initial
       Lív segði, at hon kom seint til arbeiðis í gjár
       Lív said that she came late to work yesterday
     • Clause Type: Bridge; Order: Adjunct-Initial
       Beinir segði, at í morgin kemur hann seint til arbeiðis
       Beinir said that tomorrow comes he late to work
     • Clause Type: NonBridge B; Order: Subject-Initial
       Sámal noktaði, at hann hevði verið alla náttina á barrini í fleiri førum
       Sámal denied that he had been all night in bar-def frequently
     • Clause Type: NonBridge B; Order: Adjunct-Initial
       Einar noktaði, at í fleiri førum hevði hann drukkið alla náttina á barrini
       Einar denied that frequently had he drunk all night in bar-def

  39. Faroese 1 vs Faroese 2: geographic? • Jonas (1996) argues that there are two distinct “dialects” in Faroese: Faroese 1, which optionally allows V-to-I, and Faroese 2, which does not allow V-to-I. • Jonas suggests that these two dialects may correlate both with age and with dialect area: Faroese 1 is more common in the southern islands and among older speakers. • We investigated the geographic dialect suggestion by collecting data from 25 subjects from Tórshavn (North) and 22 subjects from Suðuroy (South). Subjects were, as much as possible, matched for age.

  40. No geographic dialect difference • The main effect of dialect group was not significant • There was no significant interaction between dialect group and position of verb, or between dialect group and type of adverb • We did not find any evidence for a geographic dialect difference with respect to V-to-I in our subjects

  41. Comparison with Danish, Icelandic • There is a significant interaction between language and order of the verb with respect to Negation/Adverb. • I.e. the effect of the different orders is different, depending on the language...

  42. Comparing Verb/Adverb orders • To see whether there is any difference between the different adverbs in terms of whether or not the verb can move past them, we can look at the difference between the Verb-Adverb and Adverb-Verb orders with respect to each of the three adverbs. • We’d expect no difference between verb movement over the three adverbs in Icelandic (all should be good) and in Danish (all should be bad). • If Faroese is just intermediate between Icelandic and Danish, we’d also expect no effect of the different adverb types here.

  43. Comparing Verb/Adverb orders • Our Faroese subjects dispreferred the order Finite Verb - Negation in an unambiguously non-V2 context to the same extent that the Danish subjects did. • However, our Faroese subjects found Verb-Adverb orders better than Verb-Negation orders (this effect was found neither in Danish nor in Icelandic). • It is possible that to the extent that IP-internal verb movement is still grammatical in Faroese, for some speakers it is to an intermediate position.

  44. Looking at the effect of V2. The best measure of the effect of V2 is the difference between the Subject-Initial and Adjunct-Initial orders for each clause type. That is, what is the difference between the scores for sentences of type (a) and type (b) for each clause type?
     (a) Order: Subject-Initial
       Lív segði, at hon kom seint til arbeiðis í gjár
       Lív said that she came late to work yesterday
     (b) Order: Adjunct-Initial
       Beinir segði, at í morgin kemur hann seint til arbeiðis
       Beinir said that tomorrow comes he late to work
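
The difference measure described on this slide can be computed per clause type as the mean Subject-Initial score minus the mean Adjunct-Initial score. The sketch below assumes the scores have already been normalised (e.g. logged), and the numbers are invented purely for illustration; they are not results from the study. The function name `v2_effect` is hypothetical.

```python
from statistics import mean

def v2_effect(scores):
    """Mean Subject-Initial minus mean Adjunct-Initial score, per clause type."""
    effects = {}
    for clause_type, by_order in scores.items():
        effects[clause_type] = (
            mean(by_order["Subject-Initial"]) - mean(by_order["Adjunct-Initial"])
        )
    return effects

# Hypothetical normalised scores (NOT data from the study).
scores = {
    "Main": {"Subject-Initial": [0.5, 0.25], "Adjunct-Initial": [0.5, 0.0]},
    "Nonbridge B": {"Subject-Initial": [0.5, 0.5], "Adjunct-Initial": [-0.5, -0.25]},
}

effects = v2_effect(scores)
print(effects)
```

A larger difference means a stronger penalty for the Adjunct-Initial (V2) order in that clause type. Comparing these differences across clause types, rather than the raw scores, factors out any overall preference that has nothing to do with V2.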

  45. The effect of V2: Danish • In Danish there was a significant difference between the effect of V2 in a main clause and after the second category of “nonbridge” verbs (deny, doubt, be proud). • There was, however, no significant difference between the effect of V2 in a main clause and after the first category of “nonbridge” verbs (regret, admit). • Taken together, this suggests that for this language Vikner’s original categorisation of “bridge” verbs for V2 is not correct; instead these results are more consistent with the proposals in Bentzen et al. (2007) or Julien (2007).

  46. The effect of V2: Faroese and Icelandic • In Faroese and Icelandic, however, there is no significant difference between the effect of V2 in a main clause and after the second category of “nonbridge” verbs. • This suggests that V2 in these languages targets a different projection than in Danish (and the other mainland Scandinavian languages?)
