Writing II

Writing II • Word processors • Spell checkers • Grammar checkers

Word processors • Typical functions (with linguistic relevance) • Text formatting via a graphical user interface • Automatic completion/expansion/correction • Spelling correction • Grammar and style correction • Dictionary and thesaurus functions • Sorting and collating (of tables) • Count words • Compare and merge documents • All of the above according to local norms • As determined by language and/or area

Low-level functions • Text justification • Usually involves inserting spaces, though not randomly • Arabic allows letter shapes to be stretched • Counting words • What counts as a word? • Collation, sorting • Alphabetical order may differ from language to language
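A minimal Python sketch of two of these points: the word count depends on the tokenisation rule, and alphabetical order depends on the language. The simplified Swedish alphabet and German folding table below are assumptions for illustration only; real software uses locale-aware collation (for example the Unicode Collation Algorithm), which handles far more cases.

```python
import re

text = "The spell-checker isn't perfect: it flags e-mail."

# Rule 1: a word is anything between whitespace
count_by_spaces = len(text.split())
# Rule 2: a word is a run of letters, so hyphens and apostrophes split words
count_by_letters = len(re.findall(r"[A-Za-z]+", text))

# Simplified Swedish alphabet (assumption): å, ä, ö come after z, in that order
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(word):
    # Characters outside the alphabet sort after all of its letters
    return [SWEDISH.index(ch) if ch in SWEDISH else len(SWEDISH) + ord(ch)
            for ch in word.lower()]

# Simplified German dictionary order (assumption): umlauts sort as their base vowel
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}

def german_key(word):
    return "".join(GERMAN_FOLD.get(ch, ch) for ch in word.lower())

swedish_words = ["zebra", "äpple", "ål", "apa"]
german_words = ["Zucker", "Äpfel", "Birne"]

print(count_by_spaces, count_by_letters)        # 7 10
print(sorted(swedish_words))                    # ['apa', 'zebra', 'äpple', 'ål']
print(sorted(swedish_words, key=swedish_key))   # ['apa', 'zebra', 'ål', 'äpple']
print(sorted(german_words, key=german_key))     # ['Äpfel', 'Birne', 'Zucker']
```

Plain `sorted()` compares Unicode code points, which happens to put ä before å and puts German Äpfel after Zucker; neither matches the respective language's dictionary order.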

Low-level functions • Hyphenation • Printing conventions (don’t hyphenate short words, or isolate too few characters) • There are rules, which differ by language • Eng (morph) present-ation ~ Fr (phon) présen-tation • Some rules change spelling • Nor trafikk+kultur = trafikkultur, but trafikk-kultur • Ger (before the 1996 spelling reform) ck → k-k, eg dicke → dik-ke • Swiss Ger ss can be split, but not across a compound boundary, eg Stras-se but gross-artig (* gros-sartig) • Hyphen is repeated on second line in some languages • Hyphenation should avoid misleading the reader, or other unfortunate consequences, eg mish-ap, leg-end, Arse-nal
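A rough sketch of two of the hyphenation points above, assuming candidate break positions come from elsewhere (a pattern file or hyphenation dictionary, not shown). `allowed_breaks` applies the printing conventions, with `left_min`, `right_min` and `min_word` as illustrative values rather than any standard; `join_compound` models the Norwegian triple-consonant rule from the slide. Both function names are hypothetical.

```python
VOWELS = set("aeiouyæøå")

def allowed_breaks(word, candidates, left_min=2, right_min=3, min_word=5):
    """Filter candidate break positions by printing conventions.

    The three limits are illustrative defaults (assumptions): never hyphenate
    short words, and never strand too few letters on either line.
    """
    if len(word) < min_word:
        return []
    return [i for i in candidates if i >= left_min and len(word) - i >= right_min]

def join_compound(first, second):
    """Norwegian-style rule: three identical consonants at the seam are written
    as two in the solid compound, but all three reappear when it is split there."""
    triple = (len(first) >= 2 and first[-1] == first[-2] == second[0]
              and first[-1] not in VOWELS)
    solid = first + second[1:] if triple else first + second
    hyphenated = first + "-\n" + second
    return solid, hyphenated

# Candidate breaks at positions 1, 7 and 10 of "presentation": only present-ation survives
print(allowed_breaks("presentation", [1, 7, 10]))   # [7]
print(join_compound("trafikk", "kultur"))           # ('trafikkultur', 'trafikk-\nkultur')
```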

Spelling checkers • Widely available from the early 1980s (earlier systems ran on mainframes in the 1970s) • Now standardly available for all text-based software (not just word processors) • Available for many languages (at least, those for which “spelling” is a relevant concept) • Nevertheless, still quite crude in design and application

I have a spelling checker,
It came with my PC,
It plain lee marks four my revue
Miss steaks aye can knot sea.
Eye ran this poem threw it,
Your sure reel glad two no.
Its vary polished in it's weigh,
My checker tolled me sew.
[...]

Candidate for a Pullet Surprise (or Owed to a Spell Checker), Jerrold H. Zar, http://www.bios.niu.edu/zar/poem.pdf

Spelling checkers • Operate only at the word level • Checking each word independently against a word list (dictionary) • For most languages this implies some knowledge of morphology for handling inflections • Though see what happens when you add a word to the dictionary
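A toy illustration of word-level lookup with a little morphology, assuming a hypothetical stem list and suffix list. Each word is checked on its own, and stripping one inflectional suffix lets a small word list cover several forms; the same rules also accept non-words.

```python
import re

# Hypothetical miniature word list (stems only)
LEXICON = {"check", "spell", "word", "language", "walk"}
# Hypothetical inflectional suffixes
SUFFIXES = ("ing", "ed", "es", "s")

def is_known(word):
    w = word.lower()
    if w in LEXICON:
        return True
    # Strip one suffix and look the remainder up
    return any(w.endswith(suf) and w[:-len(suf)] in LEXICON for suf in SUFFIXES)

def unknown_words(text):
    """Return the words not accepted, each checked independently of its neighbours."""
    return [tok for tok in re.findall(r"[A-Za-z]+", text) if not is_known(tok)]

print(unknown_words("Spell checking words langauge"))   # ['langauge']
print(is_known("walkes"))                               # True: the rules overgenerate
```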

Spelling checkers • For words not found, possible alternatives are suggested • Calculated using “Levenshtein distance” • A simple string difference calculation • May or may not take account of likely errors due to • Transpositions of adjacent letters (eg langauge) • Substitutions of neighbouring keys (eg levture, with v for c) • Phonetic misspellings (eg fizix)
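One common way to produce suggestion candidates, not necessarily what any particular product does, is to generate every string one edit away (deletion, adjacent transposition, substitution, insertion) and keep those in the word list. The word list here is hypothetical. It shows why typing slips are easy to catch while phonetic misspellings are not.

```python
import string

def edits1(word):
    """All strings one edit away: deletion, adjacent transposition, substitution, insertion."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts)

# Hypothetical word list
WORDS = {"language", "lecture", "lecturer", "physics"}

def suggest(word):
    return sorted(edits1(word.lower()) & WORDS)

print(suggest("langauge"))  # ['language']  (one transposition)
print(suggest("levture"))   # ['lecture']   (one substitution, v for c)
print(suggest("fizix"))     # []            (phonetic error, too many edits away)
```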

Levenshtein distance • Smallest number of substitutions, insertions and deletions needed to change one string into another • Defined by V.I. Levenshtein in 1965; the standard dynamic-programming algorithm for computing it (Wagner–Fischer, 1974) takes time proportional to the product of the two string lengths • (Particular) substitutions, transpositions, etc. may be “weighted” to bias the score • Given the size of the dictionary, processing must be lightning fast
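A compact sketch of the usual dynamic-programming computation of Levenshtein distance, with an optional substitution-cost function to show the idea of weighting. The `NEIGHBOURS` set and the 0.5 cost are hypothetical values chosen for illustration, not taken from any real checker.

```python
def levenshtein(a, b, sub_cost=lambda x, y: 1):
    """Dynamic-programming edit distance with unit insertion/deletion costs.

    sub_cost lets particular substitutions be weighted. Only two rows of the
    table are kept, so memory grows with len(b) rather than len(a) * len(b).
    """
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else sub_cost(ca, cb)
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Hypothetical weighting: a few neighbouring QWERTY keys are cheaper to confuse
NEIGHBOURS = {frozenset("cv"), frozenset("xc"), frozenset("vb"), frozenset("er")}

def keyboard_cost(x, y):
    return 0.5 if frozenset((x, y)) in NEIGHBOURS else 1

print(levenshtein("kitten", "sitting"))                  # 3
print(levenshtein("langauge", "language"))               # 2
print(levenshtein("levture", "lecture", keyboard_cost))  # 0.5
```

Plain Levenshtein distance charges two edits for a transposition such as langauge; the Damerau–Levenshtein variant adds transposition as a single operation.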

Spelling checkers • The order in which suggestions are displayed may or may not take account of their likelihood, considering • Levenshtein distance score (is the word with the lowest score necessarily the likeliest correction?) • Frequency of use • Matching part of speech • Readability (a long list of alternatives is not helpful to a poor speller, eg someone with dyslexia)
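A sketch of one possible display policy: sort by edit distance, break ties by frequency, and cap the list length. The frequencies are invented, and the distances are those for the misspelling "thier" with a transposition counted as one edit. Note that the very frequent "the" loses to the rare "tier" purely because of distance.

```python
# Hypothetical corpus frequencies, invented for illustration
FREQ = {"the": 5000, "their": 500, "tier": 20, "thee": 5}

def rank(candidates, max_shown=3):
    """candidates: (word, edit distance) pairs; lower distance first, then higher frequency."""
    ordered = sorted(candidates, key=lambda wd: (wd[1], -FREQ.get(wd[0], 0)))
    return [w for w, _ in ordered][:max_shown]

# Candidates for "thier" (distances count a transposition as one edit)
print(rank([("tier", 1), ("thee", 2), ("their", 1), ("the", 2)]))
# ['their', 'tier', 'the']
```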

Spelling checkers • In general do not handle real-word errors (where the mistaken form is itself a word, eg a homophone) • They could quite easily deal with • Very frequent errors that can be identified by immediate context (eg its~it’s, there~their, no~know, ...) • (Some) errors that can be identified by part-of-speech tagging (eg practice~practise) • Errors that depend on meaning are more difficult to deal with
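A sketch of how confusions like its~it’s and their~there could be caught from immediate context alone, by looking only at the next word. The rule list is hypothetical and deliberately tiny; real rules would need many more contexts and exceptions.

```python
import re

# Hypothetical context rules:
# (word as typed, next words that suggest the alternative was meant, alternative)
RULES = [
    ("its", {"a", "an", "the", "been", "not", "very"}, "it's"),
    ("their", {"is", "are", "was", "were"}, "there"),
]

def flag_confusions(text):
    """Flag likely real-word errors by looking only at the next token."""
    tokens = re.findall(r"[\w']+|[^\w\s]", text.lower())
    flags = []
    for i, tok in enumerate(tokens[:-1]):
        for typed, next_words, suggestion in RULES:
            if tok == typed and tokens[i + 1] in next_words:
                flags.append((tok, tokens[i + 1], suggestion))
    return flags

print(flag_confusions("Its a shame their is no cake."))
# [('its', 'a', "it's"), ('their', 'is', 'there')]
print(flag_confusions("The dog wagged its tail."))   # []
```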

Spelling checkers • Dictionary size: “the bigger the better?” • Including rare words is disadvantageous • especially if they are the same as common misspellings (eg bhat) • They clutter up the list of suggestions • Most spelling checkers now compromise • 90,000 entries according to Wikipedia • Sensible handling of morphology (inflections and derivations) can reduce size considerably

Spelling checkers for other languages • The concept of “spelling” is not appropriate for some writing systems • If the writing system is really phonetic, a spell checker only has to deal with true typos (miskeying), not alternative phonetic realisations • Compounding rules in languages like Ger, Dutch mean many “new” words – the checker should not flag these if they are potentially correct • Spelling is much less standardized for some languages, eg Heb ערק ~ עיראק ~ עראק ‘Iraq’ • Languages with very rich morphology have potentially infinite different word forms, so simple dictionary lookup is not appropriate
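A sketch of accepting German-style compounds as “potentially correct”: a word passes if it splits recursively into lexicon entries, optionally with a linking s between parts. The stem list is hypothetical, and allowing only "" and "s" as linking elements is a simplifying assumption (German has others).

```python
# Hypothetical German stem list (lower case)
LEXICON = {"haus", "tür", "schlüssel", "bahn", "hof", "arbeit", "zeit"}
# Linking elements allowed between parts: none, or the linking s (assumption)
LINKERS = ("", "s")

def is_compound(word, lexicon=LEXICON):
    """Accept a word if it is in the lexicon or splits into lexicon entries."""
    w = word.lower()
    if w in lexicon:
        return True
    for i in range(1, len(w)):
        head, rest = w[:i], w[i:]
        if head not in lexicon:
            continue
        for link in LINKERS:
            tail = rest[len(link):]
            if rest.startswith(link) and tail and is_compound(tail, lexicon):
                return True
    return False

print(is_compound("Haustürschlüssel"))  # True
print(is_compound("Arbeitszeit"))       # True (arbeit + s + zeit)
print(is_compound("Hausbahnhof"))       # True: well-formed, whether or not anyone uses it
print(is_compound("Hausxyz"))           # False
```

The naive recursion is exponential in the worst case; a real checker would memoise splits or use a finite-state lexicon.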

Grammar checking • “Grammar” as in “good grammar”? • Early grammar checkers were really style checkers • Still word-based: they flag “weak” words like nice, very, etc., and clichés • They also flag mechanical errors, eg double words, apparent punctuation errors • Now grammar checking involves genuine text analysis • Several companies were involved, but Microsoft has now become dominant • Arguably resulting in stagnation (see Wikipedia)
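A sketch of the word-based checks early style checkers performed: flagging words from a “weak word” list and immediately repeated words. The word list is just an example, not taken from any product.

```python
import re

# Example "weak" words of the kind early style checkers flagged
WEAK = {"nice", "very", "really"}

def style_flags(text):
    words = re.findall(r"[A-Za-z']+", text)
    flags = []
    for i, w in enumerate(words):
        if w.lower() in WEAK:
            flags.append(("weak word", w))
        if i > 0 and w.lower() == words[i - 1].lower():
            flags.append(("double word", w))
    return flags

print(style_flags("It was a very nice nice day."))
# [('weak word', 'very'), ('weak word', 'nice'), ('weak word', 'nice'), ('double word', 'nice')]
```

Being purely word-based, this also flags legitimate repeats such as “had had”, and because punctuation is discarded it flags “that, that” too.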

Grammar checking • Who wants it and what for? • What mistakes do native speakers make? • Borderline between style and grammar? • media/data is/are; less~fewer; compared to/with • Comma after subject of sentence • however as a conjunction • Some mistakes clear-cut • Do people type ungrammatical sentences? • Mistakes introduced by editing

Grammar checking for learners • Language learners have different needs from ordinary users • Mistakes are somewhat predictable • They make different mistakes • They might also like an explanation or link to a grammar (in the pedagogic sense) tutorial • Grammar checker can be predictive, i.e. go looking for specific mistakes • Could be set at an appropriate level

Grammar checking • True grammar checking would involve syntactic analysis ... • Needs a dictionary indicating parts of speech • Morphological processing (as before) • Rules of grammar • ... and possibly some semantic processing • Actually, it’s too hard to do completely • But a lot can be done

Grammar checking • Subject-verb agreement • Modifier-noun agreement (Eng this~these etc, but more extensive for other languages) • Verb complement checking (wait for, depend on, etc) • Inclusion of a main clause • All of the above only if the sentence is fairly simple
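A toy sketch of agreement and verb-complement checking using a dictionary that records part of speech and number. The lexicon, complement table and the rule that only adjacent words are compared are all hypothetical simplifications.

```python
# Hypothetical mini-lexicon: word -> (part of speech, number)
LEXICON = {
    "this": ("DET", "sg"), "these": ("DET", "pl"),
    "book": ("N", "sg"), "books": ("N", "pl"),
    "he": ("PRON", "sg"), "they": ("PRON", "pl"),
    "depends": ("V", "sg"), "depend": ("V", "pl"),
    "waits": ("V", "sg"), "wait": ("V", "pl"),
}
# Hypothetical verb complements: verb -> required preposition
COMPLEMENTS = {"depend": "on", "depends": "on", "wait": "for", "waits": "for"}
PREPOSITIONS = {"on", "for", "of", "to", "at", "from"}
# Adjacent category pairs that must agree in number
AGREEING = {("DET", "N"), ("PRON", "V")}

def check_agreement(sentence):
    words = sentence.lower().rstrip(".").split()
    errors = []
    for left, right in zip(words, words[1:]):
        l, r = LEXICON.get(left), LEXICON.get(right)
        if l and r and (l[0], r[0]) in AGREEING and l[1] != r[1]:
            errors.append(f"agreement: '{left} {right}'")
        if left in COMPLEMENTS and right in PREPOSITIONS and right != COMPLEMENTS[left]:
            errors.append(f"complement: '{left} {right}' (expected '{left} {COMPLEMENTS[left]}')")
    return errors

print(check_agreement("They depends of these book."))
print(check_agreement("They depend on this book."))   # []
```

Because only neighbouring words are compared, a sentence such as “The books on the shelf is ...” slips through, which is the “fairly simple sentence” limitation in miniature.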

Grammar checking • For real grammar-checking, use a tagger and/or parser (see later) • Some things can be done with statistical models • Learn probability of word sequences (n-grams) from a large corpus • Use this model to judge grammaticality of text
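A minimal sketch of the statistical idea: learn bigram counts from a corpus and score alternative sentences, preferring the more probable one. Add-one (Laplace) smoothing and the three-sentence corpus are simplifying assumptions; a usable model needs a large corpus and better smoothing.

```python
import math
from collections import Counter

def train(sentences):
    """Count context words and bigrams, with sentence boundary markers."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(toks[1:])                 # possible next tokens
        unigrams.update(toks[:-1])             # tokens used as contexts
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams, len(vocab)

def log_prob(sentence, model):
    """Log probability under an add-one smoothed bigram model."""
    unigrams, bigrams, v = model
    toks = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
               for a, b in zip(toks, toks[1:]))

# Toy corpus standing in for "a large corpus"
corpus = ["I could have gone home", "she could have known", "we have gone"]
model = train(corpus)
print(log_prob("I could have gone", model) > log_prob("I could of gone", model))  # True
```

Scores are most meaningful when comparing alternatives of the same length, as here, since every extra word lowers the probability.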

Using a language model • Language model can also be used to distinguish between • Homophones • Near synonyms • In either case by looking at collocations • Again, n-grams • Or co-occurrence of words in the sentence
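A sketch of the co-occurrence approach: count how often word pairs appear in the same sentence, then pick the candidate (eg desert or dessert) that co-occurs most with the surrounding words. The tiny corpus is invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sentences):
    """Count how many sentences contain each unordered pair of words."""
    counts = Counter()
    for s in sentences:
        for pair in combinations(sorted(set(s.lower().split())), 2):
            counts[pair] += 1
    return counts

def choose(candidates, context, counts):
    """Pick the candidate that co-occurs most often with the context words."""
    def score(c):
        return sum(counts[tuple(sorted((c, w)))] for w in context)
    return max(candidates, key=score)

# Toy corpus (invented for illustration)
corpus = [
    "the camel crossed the desert",
    "a desert camel needs little water",
    "chocolate cake is my favourite dessert",
    "we ate dessert after dinner",
]
counts = cooccurrence(corpus)
print(choose(["dessert", "desert"], ["camel"], counts))      # desert
print(choose(["dessert", "desert"], ["chocolate"], counts))  # dessert
```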

Using a language model: n-grams • Counting hits on Google • Homophone distinction • principle reason (110k) ~ principal reason (1.03m) • stationary cupboard (831) ~ stationery cupboard (37.7k) • could of gone (27.7k) ~ could have gone (1.92m) • I wonder weather (2.78k) ~ I wonder whether (1.74m) • Near synonym distinction • “strong coffee” (443k) ~ “powerful coffee” (668) • “strong engine” (86k) ~ “powerful engine” (614k)

Using a language model: collocation • Homophone distinction • dessert + camel (307k) ~ desert + camel (2.07m) • Near synonym distinction • strong + coffee (17.6m) ~ powerful + coffee (8.9m) • strong + engine (28.6m) ~ powerful + engine (28.8m) • Similar distinctions can also be measured with reference to a structured thesaurus such as WordNet (next week’s topic)
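A sketch of turning the hit counts on these slides into a decision: prefer one variant only when it is clearly more frequent, otherwise give no advice. The counts are copied from the slides (dated, approximate Google figures); the `min_ratio` threshold of 2 is a hypothetical choice.

```python
# Counts as given on the slides (Google hit counts at the time; dated and approximate)
COUNTS = {
    "principle reason": 110_000, "principal reason": 1_030_000,
    "could of gone": 27_700, "could have gone": 1_920_000,
    "strong coffee": 443_000, "powerful coffee": 668,
    "strong + engine": 28_600_000, "powerful + engine": 28_800_000,
}

def prefer(a, b, counts, min_ratio=2.0):
    """Return the clearly more frequent variant, or None if the evidence is too close."""
    ca, cb = counts.get(a, 0), counts.get(b, 0)
    if max(ca, cb) == 0:
        return None
    if ca >= cb * min_ratio:
        return a
    if cb >= ca * min_ratio:
        return b
    return None

print(prefer("principle reason", "principal reason", COUNTS))   # principal reason
print(prefer("strong coffee", "powerful coffee", COUNTS))       # strong coffee
print(prefer("strong + engine", "powerful + engine", COUNTS))   # None
```

The co-occurrence counts for engine are almost equal, so they give no basis for a choice.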
