Peter Grzybek & Ernst Stadlober. Quantitative Text Typology. http://www-gewi.uni-graz.at/quanta http://quanta.uni-graz.at Austrian Research Fund, Project #15485
Let's suppose there is … A Universe of Texts
Is the Universe Structured? Or Can We Structure It? How Can the Text Universe Be Structured?
Corpus Analysis vs. Text Analysis • (Re-)Construction of a norm, of a standard, of "language" • Text as a Homogeneous Entity • "Text Mixture" • Self-regulating System ("Quasi Text") • Complete Text
What Is a Text? • Complete novel, composed of books? • Complete book of a novel, consisting of several chapters? • Individual chapters? • Dialogical vs. narrative sequences within a text?
Two Major Problems: • Data homogeneity • Definition of basic analytical units • Both problems are relevant for quantitative approaches.
WHY QUANTITATIVE APPROACHES? • ASSUMPTION: If a 'text' is governed by synergetic processes, these processes can and must be described quantitatively. • The descriptive models obtained for each 'text' can be compared to each other, possibly resulting in one or more general model(s). • Thus, a quantitative typology of texts can be obtained.
WHY WORD LENGTH? • Synergetics in a Nutshell – Frequencies and Dependencies • Word Length: Graphemes, Phonemes, Syllables, Morphemes, …
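To illustrate how the choice of unit changes the measured word length, here is a minimal sketch that counts graphemes and estimates syllables. The function names and the vowel-run heuristic are assumptions made for illustration; the heuristic only suits simple English spellings (it miscounts silent vowels), and the slides do not say how syllables were counted in the project.

```python
import re

# Assumption: one syllable per run of vowel letters (crude, English-only).
VOWEL_RUN = re.compile(r"[aeiouy]+", re.IGNORECASE)


def length_in_graphemes(word):
    """Count letters only, ignoring apostrophes, hyphens and digits."""
    return sum(1 for ch in word if ch.isalpha())


def length_in_syllables(word):
    """Estimate syllables as the number of vowel runs, with a minimum of one."""
    return max(1, len(VOWEL_RUN.findall(word)))


# The same word has a different "length" depending on the unit chosen.
sample_words = ["text", "letter", "typology"]
lengths = {
    w: (length_in_graphemes(w), length_in_syllables(w)) for w in sample_words
}
```

Phoneme and morpheme counts would need a pronunciation lexicon or a morphological analyser, which is why they are left out of this sketch.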
TYPES OF TEXT TYPOLOGIES • I. Qualitative • II. Quantitative-Qualitative • Tabula Rasa Principle (Clustering Methods) • A-priori / A-posteriori Principle (Discrimination Methods)
Structuring the Text Universe (Ia): Text Sorts
Structuring the Text Universe (Ib): Functional Styles
In a qualitative approach, the text universe is structured with regard to external (pragmatic) factors ("with reference to the world"): • general communicative functions of language (functional styles) • specific situational functions (text sorts)
Top-Down Bottom-Up
First and Second Order Cross Comparisons • Bottom-Up • Top-Down
Intended Emphasis on Letters • 'Letter' as a Prototype of Language • Located between Oral and Written Communication • Result of One Homogeneous Process of Text Generation
A Small World of Texts • Word Length Frequencies (in %) of Four Texts: Literary Prose Text (#256), Versified Poetic Text (#359), Journalistic Comment (#324), Private Letter (#1)
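A word length frequency distribution in percent, as plotted for the four texts, can be computed along the following lines. This is a sketch only: the function name is hypothetical and the input list is invented toy data, not taken from any of the four texts.

```python
from collections import Counter


def length_frequencies_percent(lengths):
    """Map each word length to its share of all words, in percent."""
    if not lengths:
        raise ValueError("need at least one word length")
    counts = Counter(lengths)
    total = len(lengths)
    return {k: 100.0 * counts[k] / total for k in sorted(counts)}


# Invented toy data: the length (e.g. in syllables) of eight words.
toy_lengths = [1, 1, 2, 2, 2, 3, 3, 4]
freqs = length_frequencies_percent(toy_lengths)
```

Expressing the counts as percentages makes texts of different size comparable, which is what a side-by-side comparison of several texts requires.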
Post-Hoc Tests (Text Sorts) • Groups without significant differences form "homogeneous subgroups" • Homogeneous subgroups do exist • All four letter types in different subgroups!
Post-Hoc Analyses: Homogeneous Subgroups • Discriminant Analyses: Cases are attributed to groups on the basis of specific predictor variables • The variables are submitted to linear transformations in order to arrive at an optimal discrimination of the individual cases
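The idea of a linear transformation that separates groups can be sketched for the simplest case: two groups and two predictor variables, using Fisher's linear discriminant with a pooled covariance matrix and equal priors. This is an illustrative assumption, not the procedure or software used in the project; the function names are hypothetical and the data points are invented.

```python
def mean_vector(rows):
    """Mean of each of the two predictor variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mu):
    """Within-group sums of squares and cross products (sxx, sxy, syy)."""
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def fit_two_group_lda(group_a, group_b):
    """Return weights w and a threshold; score > threshold means group A."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    dof = len(group_a) + len(group_b) - 2
    sxx, sxy, syy = [
        (a + b) / dof
        for a, b in zip(scatter(group_a, mu_a), scatter(group_b, mu_b))
    ]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    dx, dy = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    # w = S^-1 (mu_a - mu_b), using the closed-form inverse of a 2x2 matrix.
    w = [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]
    # Equal priors assumed: the cut-off is the score of the midpoint of the means.
    threshold = w[0] * (mu_a[0] + mu_b[0]) / 2 + w[1] * (mu_a[1] + mu_b[1]) / 2
    return w, threshold


def classify(case, w, threshold):
    """Attribute one case to group 'A' or 'B' from its discriminant score."""
    score = w[0] * case[0] + w[1] * case[1]
    return "A" if score > threshold else "B"


# Invented toy cases: two predictor values per text.
group_a = [(1.9, 0.30), (2.0, 0.32), (2.1, 0.29), (1.8, 0.33)]
group_b = [(2.6, 0.22), (2.7, 0.25), (2.5, 0.21), (2.8, 0.24)]
w, threshold = fit_two_group_lda(group_a, group_b)
predicted = [classify(c, w, threshold) for c in group_a + group_b]
```

With more than two groups or more predictor variables, a statistics package would normally be used instead of this hand-written two-variable version.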
Discriminant Analysis: Eight Text Sorts • Discrimination variables: m1, m2, v, p1 • 56.30%
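The slides do not define the discrimination variables. The sketch below assumes one plausible reading: m1 is the mean word length, m2 the second central moment (variance), v the coefficient of variation (standard deviation divided by m1), and p1 and p2 the relative frequencies of words of length one and two. Both these definitions and the function name are assumptions, and the input is invented toy data.

```python
import math


def length_statistics(lengths):
    """Summary statistics of a list of word lengths (assumed definitions)."""
    n = len(lengths)
    if n == 0:
        raise ValueError("need at least one word length")
    m1 = sum(lengths) / n                           # assumed: mean word length
    m2 = sum((x - m1) ** 2 for x in lengths) / n    # assumed: variance
    v = math.sqrt(m2) / m1                          # assumed: coefficient of variation
    p1 = sum(1 for x in lengths if x == 1) / n      # assumed: share of 1-unit words
    p2 = sum(1 for x in lengths if x == 2) / n      # assumed: share of 2-unit words
    return {"m1": m1, "m2": m2, "v": v, "p1": p1, "p2": p2}


# Invented toy data: the length of eight words.
stats = length_statistics([1, 1, 2, 2, 2, 3, 3, 4])
```

Each text is thus reduced to a handful of numbers, which can then serve as predictor variables in a discriminant analysis.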
Discriminant Analysis: Four Letter Types (n=213) • {Private L.} {Ep. Novel} {Readers' L.} {Open L.} • Discrimination variables: m1, v • 70.40%
Discriminant Analysis: Three Letter Types (n=213) • {Private L., Ep. Novel} {Readers' L.} {Open L.} • Discrimination variables: m1, p2 • 86.90% • Distinction of Literary Letters Irrelevant?
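Assuming the percentages on these slides are shares of correctly classified cases (the slides do not say so explicitly), the following sketch shows why merging two groups that are mostly confused with each other raises that share. The labels, the predictions and the function names are invented for illustration. In the analyses reported here the discriminant functions were presumably re-estimated for the merged groups, which this sketch does not do.

```python
def percent_correct(actual, predicted):
    """Share of cases whose predicted group equals the actual group, in %."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)


def merge_labels(labels, mapping):
    """Rename labels according to mapping; unmapped labels stay unchanged."""
    return [mapping.get(label, label) for label in labels]


# Invented toy labels: the only errors are private/ep_novel confusions.
actual = ["private", "private", "ep_novel", "ep_novel", "readers", "open"]
predicted = ["private", "ep_novel", "private", "ep_novel", "readers", "open"]

merged = {"private": "private+ep_novel", "ep_novel": "private+ep_novel"}
before = percent_correct(actual, predicted)
after = percent_correct(
    merge_labels(actual, merged), merge_labels(predicted, merged)
)
```

In this toy case every error is a confusion between the two merged groups, so the share of correct cases rises from four out of six to six out of six.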
Discriminant Analysis: Private vs. Public Letters (n=213) • {Private L., Ep. Novel}, {Readers' & Open L.} • Discrimination variables: m1, p2 • 92.00% • Distinction of Private vs. Public Styles?
Discriminant Analysis: Private vs. Public Texts (n=248) • {Private L., Ep. Novel}, {Readers' & Open L., Comments} • Discrimination variables: m1, p2 • 91.10% • Public vs. Private Styles?
Discriminant Analysis: Private/Oral vs. Public/Written Texts (n=290) • {Private L., Ep. Novel, Drama}, {Readers' & Open L., Comments} • Discrimination variables: m1, p2 • 92.40% • Oral vs. Written Styles?
Discriminant Analysis: Three Text Types (n=330) • {Private / Oral} {Public / Written} {Verse} • Discrimination variables: m1, p2, v • 91.20% • Towards a New Typology?
Discriminant Analysis: Four Text Types (n=398) • {Private / Oral} {Public / Written} {Prose} {Verse} • Discrimination variables: m1, p2, v • 79.90%
Discriminant Analysis: Three Text Types (n=398) • {Private / Oral} {Public / Written / Prose} {Verse} • Discrimination variables: m1, p2, v • 92.70%