Stylistics and stylometry
What is “style”? • Term not much loved by linguists • Too vague • Has connotations in neighbouring fields (“style” = good style, ie a value judgment) • Many books/articles make reference to etymology of the word (Lat. stilus = ‘pen’), so it follows that style is mainly about written language • Various definitions, some very close to things already seen (especially “register”) • Two main aspects widely supposed: • style is choice • style is described by reference to something else
Style as choice • For any intended meaning there is a range of alternative ways of expressing that meaning • Different choices express nuances • of meaning • of other things (style?) eg buy vs purchase • Example: • Visitors are respectfully informed that the coin required for the meter is 50p; no other coin is acceptable • 50p pieces only • Propositional meaning is the same; difference in expression conveys something else (register etc)
Style as choice • Style is a choice, but often the “choice” is somewhat predetermined • ie a choice between appropriate and inappropriate style • So maybe “style” is just another word for register?
Style and the norm • Some writers define style as • “individual characteristics of a text” • “total sum of deviations from a norm” • But what is the “norm”? • Is there some form of the language that is neutral as regards style/register? • Note also that the norm shifts: eg Bible AV was written in the vernacular of its time • Literary stylistics focuses on the exceptional
Even if there is no norm, we can describe style comparatively • Stylistics mainly involves comparing and contrasting texts • and associating linguistic variance with contextual explanation • Some authors see style as being what is added to the text
Stylistic analysis • Gulf between literary and linguistic stylistics • Lit crit focuses on effect on the reader, intended or otherwise, so largely intuitive and subjective • Linguistic stylistics looks for characterisations of style (including literary style) in terms of linguistic phenomena at the various levels of linguistic description
Stylistic analysis • Inventory of linguistic devices and their effect • usually in a contrastive way: • in contrast with other writers in a similar genre • in contrast with other genres • Linguistic devices described in terms of the usual linguistic levels of description: phonology, morphology, lexis, grammar, etc. • Effects can be direct (expressive) or indirect (by association) • example: onomatopoeia vs alliteration as a phonological device
Stylistic analysis: Crystal & Davy (1969), Investigating English Style • Informally identify stylistic features felt to be significant • Devise a method of analysis which facilitates comparison between usages • Identify the stylistic function of the features so identified
Types of features • “Invariable” features due to the individual or the time – usually of little interest • Discourse features • medium (= Halliday’s mode), what features distinguish written language from spoken language • participation: eg monologue vs dialogue • Province (= field) lexis and syntax • Status (= tenor) features relating to relative social standing of writer/speaker and reader/listener • Modality (= text type) eg message delivered as a letter, postcard, text message, email, etc • Singularity: deliberate occasional idiosyncrasies
Method and function • Methods and features determine each other • you can only measure features that you can extract • simple counting features are easy to extract • more complex features can be extracted thanks to NLP techniques of corpus annotation (tagging, parsing, etc) • Describing the function of observed differences • could be based on intuition • or (see later) partially automated (factor analysis)
What to count • Simple things may characterise different styles • average sentence length • average word length • type:token ratio (vocabulary richness) • number of types = number of different words • number of tokens = total number of words • vocabulary growth (homogeneity of text) • number of new types in 1st, 2nd, …, nth 1000 words • in rich varied text, number will climb steadily • Especially when used comparatively
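The simple counts listed above can be sketched in a few lines of Python. This is only an illustration: the function names and the regular expression that decides what counts as a "word" are assumptions, not part of the original slides. Average sentence length is left out because it depends on how sentence breaks are detected.

```python
import re

def tokenize(text):
    """Lowercased word tokens. The pattern is an assumed, deliberately
    naive definition of a word (letters, with inner apostrophes/hyphens)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def type_token_ratio(tokens):
    """Number of different words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def average_word_length(tokens):
    """Mean number of characters per token."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0

def vocabulary_growth(tokens, block_size=1000):
    """Number of previously unseen types in each successive block of tokens."""
    seen = set()
    growth = []
    for start in range(0, len(tokens), block_size):
        new_types = set(tokens[start:start + block_size]) - seen
        growth.append(len(new_types))
        seen |= new_types
    return growth

sample = tokenize("The cat sat on the mat. The dog sat too.")
print(type_token_ratio(sample))       # 7 types over 10 tokens
print(vocabulary_growth(sample, 5))   # new types per block of 5 tokens
```

Note that the type:token ratio falls as a text gets longer, so it is only meaningful when the texts compared are of similar length or are sampled to the same length.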
What to count • More complex analyses can give a more interesting picture • specific syntactic structures • degree of modification in NPs • types of verbs (eg verbs of persuasion, speech verbs, action verbs, descriptive verbs) • distribution of pronouns (1st/2nd/3rd person) • etc … (anything you can think of) • Quite sophisticated mathematical techniques can give an overall picture • eg factor analysis: identifies from a (big) range of variables which ones best identify/characterize differences
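As one example of the slightly richer features above, the distribution of pronouns by person can be counted with a word list. The sketch below is an assumption-laden illustration: the pronoun lists, the tokenizer and the function name are all hypothetical choices, and ambiguous forms such as "her" or "its" are counted without any disambiguation.

```python
import re
from collections import Counter

# Assumed, non-exhaustive pronoun lists for illustration only.
PRONOUNS = {
    "first": {"i", "me", "my", "mine", "myself",
              "we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
    "third": {"he", "him", "his", "himself", "she", "her", "hers", "herself",
              "it", "its", "itself", "they", "them", "their", "theirs",
              "themselves"},
}

def pronoun_distribution(text):
    """Count 1st/2nd/3rd person pronouns; return (counts, total tokens)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter({person: 0 for person in PRONOUNS})
    for token in tokens:
        for person, forms in PRONOUNS.items():
            if token in forms:
                counts[person] += 1
    return dict(counts), len(tokens)

counts, total = pronoun_distribution("I told you that they saw us and we saw them")
print(counts, total)
```

A tagger would do this more reliably, since it can tell a pronoun from a homograph in another word class.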
Normalization and significance • Always important to compare like with like • It is usual when counting things to “normalize” over the length of the text • If one text is longer than the other, of course you would expect higher frequencies of everything • Issue of statistical significance • Small differences may not really tell you anything • Various measures can confirm whether difference is statistically significant or due to random fluctuation
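To make both points concrete, here is a minimal sketch of normalizing a count to a rate per 1,000 words and testing whether two rates differ. The slides do not say which significance measure to use; the chi-square test on a 2x2 table is an assumption chosen here because it needs no external library. The function names are hypothetical.

```python
def per_thousand(count, total_tokens):
    """Frequency normalized to occurrences per 1,000 tokens."""
    return 1000.0 * count / total_tokens

def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square statistic for a feature occurring count_a times in
    total_a tokens of text A and count_b times in total_b tokens of text B."""
    observed = [[count_a, total_a - count_a],
                [count_b, total_b - count_b]]
    row_totals = [total_a, total_b]
    col_totals = [count_a + count_b,
                  (total_a - count_a) + (total_b - count_b)]
    grand_total = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip empty cells to avoid dividing by zero
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# With 1 degree of freedom, a statistic above 3.841 corresponds to p < 0.05.
CRITICAL_VALUE_P05 = 3.841

print(per_thousand(60, 2000))                                # same rate as 30 in 1000
print(chi_square_2x2(50, 1000, 20, 1000) > CRITICAL_VALUE_P05)
```

Raw counts of 30 and 60 look different, but once normalized over texts of 1,000 and 2,000 words they are the same rate, and the chi-square statistic is zero.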
How to count • How to recognize paragraph breaks? • How to recognize sentence breaks? • Headlines don’t end in a full stop • Not all sentences end in a full stop • Not all full stops are sentence ending (abbreviations) • How to count words • Hyphenated words, contractions e.g. don’t • How to measure word-length/complexity • length only roughly corresponds to complexity • number of characters vs number of syllables • cf. through vs idea • counting syllables implies either a dictionary or an algorithm
More sophisticated counting • Tagging and parsing allow you to look at grammatical and lexical issues • Use of particular POSs (conjunctions, pronouns, auxiliaries, modals) • Use of particular features (tenses, …) • Use of particular constructions (passives, interrogatives)
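The difficulties above show up as soon as you try to write the counting code. The following is a rough sketch, not a robust segmenter: the abbreviation list, the word pattern and the syllable heuristic are all assumptions made for illustration.

```python
import re

# Assumed, far from complete list of abbreviations that end in a full stop.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "cf.", "vs."}

def split_sentences(text):
    """Split on tokens ending in . ! or ? unless the token is a known
    abbreviation. Text with no final punctuation (eg a headline) is kept
    as a sentence of its own."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "!", "?")) and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

def count_words(text):
    """One possible decision: hyphenated words and contractions count as one word."""
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))

def count_syllables(word):
    """Heuristic: count groups of vowel letters, discounting a silent final e."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

print(split_sentences("Dr. Smith arrived late. He had missed the train, i.e. the 9.15. No headline stop"))
print(count_words("well-known don't stop"))
print(len("through"), count_syllables("through"))
print(len("idea"), count_syllables("idea"))
```

The heuristic gets "through" right (seven letters, one syllable) but undercounts "idea" as two syllables rather than three, which is exactly why a pronunciation dictionary may be needed. The splitter also fails when an abbreviation such as "etc." really does end a sentence.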
Quantifying register differences • Much work based on corpora trying to quantify and characterize register differences • Work pioneered by Douglas Biber • Simple counts like the ones suggested • Also, more complex computations
Example: from D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 1998. Ch 5: the study of discourse characteristics
Multidimensional analysis • Collect a huge range of measures of a wide variety • some simple word counts • syntactic features • classes and subclasses of N, V, Adj, Adv • Factor analysis
Factor analysis • Statistical method that takes a large number of apparently unrelated variables and groups together those that vary together into “factors” • Factors will be groups of (+ve and –ve) features • Linguist might then try to characterize the factors in terms of some psycholinguistic feature
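The idea of features grouping into a factor with positive and negative members can be illustrated with a much simpler relative of factor analysis: the first principal component of the feature correlation matrix, found here by power iteration. This is explicitly a stand-in, not full factor analysis and not Biber's procedure. The feature values are invented for illustration, and the code assumes no feature is constant across texts.

```python
import math

def correlation_matrix(rows):
    """Pearson correlations between columns (features) across rows (texts)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
           for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(k)]
            for a in range(k)]

def first_component(matrix, iterations=200):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    k = len(matrix)
    # Asymmetric start vector, so it is unlikely to be orthogonal to the answer.
    vec = [1.0 / (j + 1) for j in range(k)]
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

# Invented rates per 1,000 words for four hypothetical texts:
# columns are pronouns, contractions, nouns.
texts = [
    [50, 30, 200],
    [45, 28, 210],
    [10, 5, 300],
    [12, 4, 310],
]
loadings = first_component(correlation_matrix(texts))
print(loadings)
```

In this toy data, pronouns and contractions receive loadings of the same sign and nouns the opposite sign, so they form one dimension with positive and negative features. Interpreting that dimension is still the linguist's job.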
Example • Biber took two Google classifications of text types: “Home” and “Science” • Harvested ~1500 webpages in each category (3.74m words) • originally got ~2500 webpages, but some were not suitable • Source: http://jan.ucc.nau.edu/biber/Web text types.ppt