
Using semantic associations for the detection of real-word spelling errors

Jennifer Pedler, School of Computer Science & Information Systems, Birkbeck, University of London

Presentation Transcript


  1. Using semantic associations for the detection of real-word spelling errors. Jennifer Pedler, School of Computer Science & Information Systems, Birkbeck, University of London.

  2. Real-word spelling errors: one word mistakenly produced for another.
     “There is a bored (board) for messages.”
     “She wrote the appointment in her dairy (diary).”

  3. Confusable Pairs
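The table of pairs on this slide did not survive extraction. As a minimal sketch, the pairs that do appear in examples elsewhere in these slides can be stored as a two-way lookup, so that any word in a sentence belonging to a pair is picked out as a candidate for checking. The names `CONFUSABLE_PAIRS`, `build_confusion_map` and `find_confusables` are hypothetical, and the sketch matches exact word forms only (it would not link *carved* to *carve*).

```python
# Hypothetical pair list, drawn only from examples that appear in these slides;
# it is not the full set of confusables used in the work described.
CONFUSABLE_PAIRS = [
    ("board", "bored"),
    ("diary", "dairy"),
    ("carve", "crave"),
    ("unite", "untie"),
    ("ensure", "ensue"),
    ("ear", "era"),
    ("manor", "manner"),
]


def build_confusion_map(pairs):
    """Map each word to its partner, in both directions."""
    confusion = {}
    for a, b in pairs:
        confusion[a] = b
        confusion[b] = a
    return confusion


def find_confusables(tokens, confusion):
    """Return (index, token, alternative) for each token that belongs to a pair.

    Matching is on the lower-cased exact form only; inflected forms such as
    "carved" are not reduced to "carve" here.
    """
    return [
        (i, tok, confusion[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in confusion
    ]


confusion = build_confusion_map(CONFUSABLE_PAIRS)
print(find_confusables("There is a bored for messages".split(), confusion))
# [(3, 'bored', 'board')]
```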

  4. Semantic ‘flavour’
     carve: stone, wood, knife, oak, walnut, marble, granite, chisel, ...
     crave: man, food, success, people, chocolate, attention, ...

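  5. WordNet relationships: hyponymy/hypernymy, the IS-A relation. In the example hierarchy, stone and wood are kinds of material, marble and granite are kinds of stone, and oak and walnut are kinds of wood.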
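The IS-A relation can be pictured as a child-to-parent map over the small hierarchy on this slide. This is a toy sketch: real WordNet links word senses (synsets) rather than bare words, and the `HYPERNYM` dictionary and helper names are hypothetical.

```python
# Toy IS-A hierarchy from the slide: each word points to its immediate hypernym.
HYPERNYM = {
    "stone": "material",
    "wood": "material",
    "marble": "stone",
    "granite": "stone",
    "oak": "wood",
    "walnut": "wood",
}


def hypernym_chain(word, hypernym=HYPERNYM):
    """Follow IS-A links upward from `word` until a node with no parent is reached."""
    chain = [word]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain


def is_a(word, ancestor, hypernym=HYPERNYM):
    """True if `ancestor` is a direct or indirect hypernym of `word`."""
    return ancestor in hypernym_chain(word, hypernym)[1:]


print(hypernym_chain("granite"))  # ['granite', 'stone', 'material']
print(is_a("oak", "material"))    # True
print(is_a("oak", "stone"))       # False
```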

  6. WordNet senses for stone:
     • stone, rock (countable, as in “he threw a stone at me”)
     • stone, rock (uncountable, as in “stone is abundant in New England”)
     • stone (building material)
     • gem, gemstone, stone
     • stone, pit, endocarp (e.g. cherry stone)
     • stone (unit used to measure ... weight)
     • stone (lack of feeling...)

  7. Branches for two senses of stone:
     • stone, rock: material > stone > sandstone, granite, marble, limestone (with wood > oak, walnut, beech, ash as a sibling branch under material)
     • stone, pit, endocarp: covering > pericarp > stone
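Because each sense of *stone* hangs from a different part of the hierarchy, looking up the word means following one hypernym path per sense. A minimal sketch follows, with hypothetical sense labels (`stone.rock`, `stone.pit`) standing in for WordNet synsets. With real data, NLTK's WordNet interface gives the same view: `wn.synsets("stone", pos=wn.NOUN)` lists the noun senses, and each synset's `hypernym_paths()` returns its IS-A chains (the slides do not say which WordNet interface the work used).

```python
# Hypothetical sense inventory: each sense of a word is its own node,
# so one spelling can sit on several branches of the IS-A tree.
SENSES = {
    "stone": ["stone.rock", "stone.pit"],
}

# Parent links for the two branches shown on the slide.
PARENT = {
    "stone.rock": "material",
    "stone.pit": "pericarp",
    "pericarp": "covering",
}


def paths_for_word(word):
    """Return one upward hypernym path per sense of `word`."""
    paths = []
    for sense in SENSES.get(word, []):
        path = [sense]
        while path[-1] in PARENT:
            path.append(PARENT[path[-1]])
        paths.append(path)
    return paths


for path in paths_for_word("stone"):
    print(" -> ".join(path))
# stone.rock -> material
# stone.pit -> pericarp -> covering
```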

  8. Section of the final carve hypernym tree:
     entity 0 (1028) P 0.6
       substance 0 (342) P 0.2
         material 0 (91) P 0.05
           stone 19 (39) P 0.02
             granite 4 P 0.002
             marble 12 P 0.007
           wood 10 (52) P 0.03
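The bracketed figures are consistent with each node's count being rolled up from its hyponyms: material's 91 equals stone's 39 plus wood's 52. Assuming that reading, and assuming P is a node's rolled-up count divided by a common total (the slide's P values are roughly proportional to the bracketed counts, but the total is not given), the propagation step could be sketched as below. The function names and the total of 100 in the example are hypothetical.

```python
from collections import defaultdict


def propagate_counts(direct, parent):
    """Roll direct counts up the IS-A tree.

    direct: node -> times the node itself was counted for the target word.
    parent: node -> immediate hypernym.
    Returns node -> count for the node plus all of its hyponyms.
    """
    cumulative = defaultdict(int)
    for node, count in direct.items():
        current = node
        while current is not None:
            cumulative[current] += count
            current = parent.get(current)
    return dict(cumulative)


def to_probabilities(cumulative, total):
    """Divide each rolled-up count by a common total (assumed normalisation)."""
    return {node: count / total for node, count in cumulative.items()}


# Direct counts for the four nodes visible on the slide. Other hyponyms of
# stone and wood are not shown there, so the rolled-up totals below are
# smaller than the slide's bracketed figures.
parent = {
    "substance": "entity",
    "material": "substance",
    "stone": "material",
    "wood": "material",
    "granite": "stone",
    "marble": "stone",
}
direct = {"stone": 19, "wood": 10, "granite": 4, "marble": 12}

cumulative = propagate_counts(direct, parent)
print(cumulative["stone"], cumulative["material"])  # 35 45
probs = to_probabilities(cumulative, total=100)  # hypothetical total
```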

  9. Merged tree (carve / crave share at each node):
     entity: carve 0.96, crave 0.04
     substance: carve 0.87, crave 0.13
     material: carve 0.99, crave 0.01
     food: carve 0.41, crave 0.59
     stone: carve 1.0, crave 0.0
     foodstuff: carve 0.22, crave 0.78
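The slide does not say how the two trees are combined. One plausible reading, offered only as an assumption, is that each node gets each word's probability divided by the sum of the two, so the pair of shares at every node sums to 1; a node found in only one tree (like stone, with carve 1.0 and crave 0.0) falls out of the same formula. The function name and the probabilities in the example are invented for illustration.

```python
def merge_trees(prob_a, prob_b):
    """Combine two words' hypernym-tree probabilities into per-node shares.

    Assumed rule: share_a = p_a / (p_a + p_b), share_b = p_b / (p_a + p_b).
    A node missing from one tree is treated as probability 0 for that word.
    Returns node -> (share_a, share_b).
    """
    merged = {}
    for node in set(prob_a) | set(prob_b):
        pa = prob_a.get(node, 0.0)
        pb = prob_b.get(node, 0.0)
        if pa + pb > 0:
            merged[node] = (pa / (pa + pb), pb / (pa + pb))
    return merged


# Invented probabilities for illustration only.
carve = {"entity": 0.6, "material": 0.05, "stone": 0.02}
crave = {"entity": 0.2, "material": 0.05}

merged = merge_trees(carve, crave)
print(merged["stone"])     # (1.0, 0.0)
print(merged["material"])  # (0.5, 0.5)
```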

  10. Scoring at run-time
      “Seventeenth century dolls carved from wood fetch very high prices...”
      Final scores: carve 0.776, crave 0.223
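The slide gives only the final scores, not how they were reached. As an assumption-laden sketch: for each noun in the context, climb its hypernym chain to the first node present in the merged tree, take that node's carve/crave shares, and average over the nouns. The shares below are the ones listed on the merged-tree slide; the hypernym links for *doll* and *wood* and the helper `score_context` are hypothetical, so the result does not reproduce the 0.776 / 0.223 reported here.

```python
# Node shares (carve, crave) as listed on the merged-tree slide.
MERGED = {
    "entity": (0.96, 0.04),
    "substance": (0.87, 0.13),
    "material": (0.99, 0.01),
    "food": (0.41, 0.59),
    "stone": (1.0, 0.0),
    "foodstuff": (0.22, 0.78),
}

# Hypothetical, heavily simplified hypernym links for two context nouns.
HYPERNYM = {"doll": "entity", "wood": "material"}


def score_context(context_nouns, merged=MERGED, hypernym=HYPERNYM):
    """Average the (carve, crave) shares over the context nouns.

    Each noun is mapped to the nearest node (itself or an ancestor) that is in
    the merged tree; nouns with no such node are skipped. With no usable
    nouns, both words get an even 0.5.
    """
    sums = [0.0, 0.0]
    used = 0
    for noun in context_nouns:
        node = noun
        while node is not None and node not in merged:
            node = hypernym.get(node)
        if node is None:
            continue
        sums[0] += merged[node][0]
        sums[1] += merged[node][1]
        used += 1
    if used == 0:
        return 0.5, 0.5
    return sums[0] / used, sums[1] / used


carve_score, crave_score = score_context(["doll", "wood", "price"])
print(round(carve_score, 3), round(crave_score, 3))  # 0.975 0.025
```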

  11. Spellchecking behaviour:
      Correct spelling, accepted as correct: correct behaviour
      Correct spelling, flagged as error: false alarm
      Error, accepted as correct: ignored error
      Error, flagged as error: correct behaviour
      Optimum performance: minimise false alarms and maximise the number of actual errors flagged.
      Test data: Flob Corpus, 1 million words, 1310 confusables.
      Flob Original: “Seventeenth century dolls carved from wood fetch very high prices...”
      Flob Reversed: “Seventeenth century dolls craved from wood fetch very high prices...”
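The evaluation design on this slide can be sketched in two steps: build the reversed text by swapping every confusable for its partner, then sort each of the checker's decisions into one of the four cells of the behaviour table. The pair map and function names are hypothetical, and the swap works on exact word forms only (the slide's example changes *carved* to *craved*, which would need inflection handling that is omitted here).

```python
from collections import Counter


def reverse_confusables(tokens, confusion):
    """Swap each confusable for its partner, turning correct text into errors."""
    return [confusion.get(tok, tok) for tok in tokens]


def outcome(is_error, flagged):
    """Place one decision in a cell of the behaviour table."""
    if is_error:
        return "error flagged" if flagged else "ignored error"
    return "false alarm" if flagged else "accepted as correct"


def tally(decisions):
    """Count outcomes for an iterable of (is_error, flagged) pairs."""
    return Counter(outcome(is_error, flagged) for is_error, flagged in decisions)


confusion = {"carve": "crave", "crave": "carve"}
print(reverse_confusables("dolls carve from wood".split(), confusion))
# ['dolls', 'crave', 'from', 'wood']

counts = tally([(False, False), (False, True), (True, True), (True, False)])
print(counts["false alarm"], counts["error flagged"])  # 1 1
```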

  12. Selecting the confusable with the highest score - Examples
      Flob Original (confusable correctly accepted):
      “There is a famous early 15th century chest in the chapel bearing a carved scene of two jousters in action.” Scores: carve 0.8326, crave 0.1674; prefer carve.
      “This method in particular has interested dairy farmers because, being alkaline, it helps the animal's rumen to function efficiently.” Scores: dairy 0.6797, diary 0.3203; prefer dairy.
      Flob Reversed (error correctly flagged):
      “Black hands caught him, united the rope...” Scores: unite 0.2428, untie 0.7572; prefer untie.
      “The ensuring fight is real Douglas Fairbanks Jnr stuff.” Scores: ensure 0.3233, ensue 0.6767; prefer ensue.
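The decision rule on this slide amounts to: flag the word as it appears in the text whenever its confusable partner scores higher, and offer the higher-scoring word instead. A minimal sketch using the scores from these examples; `check_word` is a hypothetical name.

```python
def check_word(written, scores):
    """Return (flagged, preferred) for the word as written.

    scores: confusable -> score, e.g. {"carve": 0.8326, "crave": 0.1674}.
    The word is flagged whenever a different confusable has the highest score.
    """
    preferred = max(scores, key=scores.get)
    return preferred != written, preferred


# Flob Original: the correct word has the highest score, so it is accepted.
print(check_word("carve", {"carve": 0.8326, "crave": 0.1674}))  # (False, 'carve')
# Flob Reversed: "united" stands for "untied", so the error is flagged.
print(check_word("unite", {"unite": 0.2428, "untie": 0.7572}))  # (True, 'untie')
```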

  13. Selecting the confusable with the highest score - Results

  14. Setting a confidence threshold
      “... reforms now in the final making should signify a new era in local government in which results count for more than ructions.” Scores: ear 0.5136, era 0.4864; prefer ear.
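In this example the preferred word wins by only a small margin, so flagging the *era* that the text actually contains would be a false alarm. A confidence threshold flags a word only when the alternative's score reaches a minimum. The slides do not give the threshold used; the 0.9 below is a hypothetical value, as is the function name. The trade-off shows up directly: at 0.9 the ear/era case is no longer flagged, but neither is the correctly flagged unite/untie example from slide 12 (untie 0.7572).

```python
def check_word_with_threshold(written, scores, threshold=0.9):
    """Flag `written` only if a different confusable wins with score >= threshold.

    threshold=0.9 is a hypothetical value, not one given in the slides.
    Returns (flagged, preferred).
    """
    preferred = max(scores, key=scores.get)
    flagged = preferred != written and scores[preferred] >= threshold
    return flagged, preferred


# "... a new era in local government ...": ear wins narrowly, so no flag.
print(check_word_with_threshold("era", {"ear": 0.5136, "era": 0.4864}))
# (False, 'ear')
# "... all out on our ears": era wins with high confidence, so it is still flagged.
print(check_word_with_threshold("ear", {"ear": 0.0093, "era": 0.9907}))
# (True, 'era')
```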

  15. The one remaining false alarm
      “But the firm... it's gone bankrupt and we're all out on our ears.” Scores: ear 0.0093, era 0.9907; prefer era.
      Final examples (less frequent word preferred with a high level of confidence):
      “A fierce fight ensured..” Scores: ensure 0.0646, ensue 0.9354; prefer ensue.
      “You've bought the manner House and you've got a Ferrari.” Scores: manor 0.9951, manner 0.0049; prefer manor.
