2 Modern Approaches to Corpus Linguistics
Dominique Longrée, LASLA, Université de Liège and FUSL (Brussels)
• automatic taggers as heuristic tools
• multilevel approaches: the "motives"
• what do they have in common?

1. Automatic taggers as heuristic tools
• A LASLA research project:
• testing various pieces of automatic recognition software, known as taggers
• Biber, 1993; Illouz, 1999, etc.: the quality of the output can vary significantly
  - from one type of text to another
  - from one tagger to another
• Questions:
  - are the results better when a tagger trained on one author or on a given text is applied to another text by the same author, or to another text within the same type of discourse?
  - what can we deduce from those results about the tagger itself, or about the homogeneity of the corpora?

1. Automatic taggers as heuristic tools
• The test texts:
  - book 3 of The Gallic Wars by Caesar: BGall3 (3673 tokens)
  - The Conspiracy of Catilina by Sallust: SalCat. (10688 tokens)
  - book 3 of The History of Alexander the Great by Quintus Curtius: QC3 (7261 tokens)
  - The First Oration Against Catilina by Cicero: CicCat1 (3333 tokens)
  - poem 66 of Catullus: Catu66 (586 tokens)
• The nature of the training and evaluation corpora is varied in order to identify and measure the factors of variation:
  - style of the work
  - style of the author
  - diachrony
  - literary genre
  - type of discourse
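The cross-evaluation design above (train on one text, evaluate on another) can be sketched as follows. This is a minimal illustration under stated assumptions, not the LASLA setup: the slides do not name the taggers that were tested, so a most-frequent-tag baseline stands in for them, and the tiny tagged samples and tag labels (NOUN, VERB, CONJ) are invented for the example.

```python
from collections import Counter, defaultdict

def train_baseline(text):
    """Learn the most frequent tag of each word form.

    text: list of (word_form, tag) pairs. This most-frequent-tag baseline
    is a stand-in for a real tagger (an assumption, not the taggers tested).
    """
    counts = defaultdict(Counter)
    for word, tag in text:
        counts[word.lower()][tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Unseen word forms receive the most frequent tag of the training text.
    default = Counter(tag for _, tag in text).most_common(1)[0][0]
    return lexicon, default

def accuracy(model, text):
    """Share of tokens in `text` whose tag the model predicts correctly."""
    lexicon, default = model
    correct = sum(lexicon.get(w.lower(), default) == tag for w, tag in text)
    return correct / len(text)

def cross_evaluate(corpora):
    """Train on each text and evaluate on every text, itself included.

    Returns {(training_text, evaluation_text): accuracy}.
    """
    models = {name: train_baseline(t) for name, t in corpora.items()}
    return {(train, test): accuracy(models[train], corpora[test])
            for train in corpora for test in corpora}

# Invented toy samples with hypothetical tags, standing in for annotated texts.
corpora = {
    "A": [("Caesar", "NOUN"), ("venit", "VERB"), ("et", "CONJ"), ("vicit", "VERB")],
    "B": [("Cicero", "NOUN"), ("dixit", "VERB"), ("et", "CONJ"), ("tacuit", "VERB")],
}
results = cross_evaluate(corpora)
print(results[("A", "B")])  # 0.75: "Cicero" is unseen and mistagged as VERB
```

Reading the off-diagonal cells against the diagonal shows how much accuracy is lost when a tagger moves from its training text to another one; with real annotated texts, this drop is the kind of signal the approach interprets heuristically.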

1. Automatic taggers as heuristic tools
• In theoretical terms:
• taggers appear to have some value as heuristic instruments
• For instance, they highlight
  - the homogeneity of the historical style over and above diachronic development
  - the gap between narration and discourse (speeches)
  - the gap between the styles of Caesar and Cicero
  - a smaller gap between Catullus and Cicero, or between Catullus and Quintus Curtius/Tacitus, than between Catullus and Caesar
  - etc.
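One way to turn such comparisons into a number, assuming a cross-evaluation table of tagger accuracies keyed by (training text, evaluation text), is to average the accuracy each tagger loses when it leaves its own text. Both the measure and the placeholder accuracies for the hypothetical texts X, Y and Z are illustrative assumptions, not figures from the project.

```python
from itertools import combinations

def gap(acc, a, b):
    """Symmetric gap between texts a and b.

    acc maps (training_text, evaluation_text) -> accuracy. The gap is the
    mean accuracy lost when a tagger trained on one text is applied to the
    other, relative to its score on its own training text.
    """
    loss_ab = acc[(a, a)] - acc[(a, b)]
    loss_ba = acc[(b, b)] - acc[(b, a)]
    return (loss_ab + loss_ba) / 2

def rank_gaps(acc, texts):
    """All pairs of texts, ordered from the smallest gap to the largest."""
    return sorted(combinations(texts, 2), key=lambda pair: gap(acc, *pair))

# Placeholder accuracies for three hypothetical texts (not real results).
acc = {
    ("X", "X"): 0.95, ("X", "Y"): 0.80, ("X", "Z"): 0.90,
    ("Y", "Y"): 0.93, ("Y", "X"): 0.85, ("Y", "Z"): 0.82,
    ("Z", "Z"): 0.94, ("Z", "X"): 0.91, ("Z", "Y"): 0.84,
}
print(rank_gaps(acc, ["X", "Y", "Z"]))  # [('X', 'Z'), ('Y', 'Z'), ('X', 'Y')]
```

Averaging both directions makes the measure symmetric, which suits statements such as "a smaller gap between Catullus and Cicero than between Catullus and Caesar"; a directional version would keep the two losses separate.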

2. Multilevel approaches: the "motives"
• Some indicators intuitively catalogued in Latin narrative prose:
  - sequences of verb tenses
  - lexical elements: repente, subito 'suddenly', 'abruptly'
  - syntactic structures / 'linking clichés': Quibus rebus cognitis 'Those things being known'; Quod ubi animaduertit 'When he had noticed that'
• Limits:
  - no real analysis of them as indicators of text structure
  - no study of their interaction
  - of little use for characterising text genre and style
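Indicators like these can at least be located mechanically along the text. The sketch below, an assumption rather than the project's own tooling, finds single-word markers and fixed multi-word clichés in a tokenized text and reports their token positions. It matches exact lowercase word forms only, so inflected variants would be missed, and the Latin sample sentence is invented.

```python
def find_markers(tokens, markers):
    """Return sorted (position, marker) pairs for every marker occurrence.

    tokens: the text as a list of lowercase word forms.
    markers: single words ("subito") or multi-word sequences
             ("quibus rebus cognitis"), given as space-separated strings.
    """
    hits = []
    for marker in markers:
        seq = marker.split()
        n = len(seq)
        # Slide over the text and compare each n-token slice with the marker.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == seq:
                hits.append((i, marker))
    return sorted(hits)

markers = ["repente", "subito", "quibus rebus cognitis", "quod ubi animaduertit"]
# Invented sample sentence, already lowercased and tokenized.
tokens = "quibus rebus cognitis caesar subito castra mouit quod ubi animaduertit".split()
print(find_markers(tokens, markers))
# [(0, 'quibus rebus cognitis'), (4, 'subito'), (7, 'quod ubi animaduertit')]
```

Keeping token positions, rather than just counts, preserves where each indicator occurs in the linear text, which is what an analysis of their interaction would need.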

2. Multilevel approaches: the "motives"
• The Discourse Modes and Bases approach
  - Kroon, 2007, 2009; Adema, 2007, 2008, 2009
  - a priori definition of typical features for each discourse mode, in order to evaluate text homogeneity
• The LASLA and BCL approach
  - to develop endogenous exploratory methods
  - to take into account the linearity of texts
  - to specify functional convergences between several indicators
• Methods
  - calling upon mathematical models (neighborhoods, bursts)
  - combining a small-scale qualitative approach with a large-scale quantitative analysis
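The slides name neighborhoods and bursts without defining them. A simplified, window-based reading, assumed here for illustration only, treats the text as a line of token positions: a neighborhood links occurrences of two indicators that fall within k tokens of each other, and a burst is a stretch of `window` tokens holding at least `threshold` occurrences of one indicator. The function names, parameters and position lists are hypothetical.

```python
def neighborhoods(pos_a, pos_b, k):
    """Pairs (i, j) where indicator A at token i and indicator B at token j
    lie within k tokens of each other."""
    return [(i, j) for i in pos_a for j in pos_b if abs(i - j) <= k]

def bursts(positions, window, threshold):
    """Start positions s of windows [s, s + window) that contain at least
    `threshold` occurrences of an indicator. Windows are anchored on
    occurrences, so only those candidate starts are scanned."""
    positions = sorted(positions)
    starts = []
    j = 0
    for i, s in enumerate(positions):
        # Advance j to the first occurrence at or beyond s + window.
        while j < len(positions) and positions[j] < s + window:
            j += 1
        if j - i >= threshold:
            starts.append(s)
    return starts

# Hypothetical token positions of one indicator (e.g. a tense shift).
tense_shift = [3, 5, 6, 40, 90, 92, 93, 95]
print(bursts(tense_shift, window=10, threshold=3))  # [3, 90, 92]
print(neighborhoods([0, 50], [4, 80], k=5))         # [(0, 4)]
```

Overlapping burst starts (90 and 92 here) describe the same dense stretch and could be merged. The pair scan in `neighborhoods` is quadratic, which is fine for a single text but would call for sorted positions and a sliding pointer on large corpora.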

3. What do these approaches have in common?
• They take texts and discourses into account in both their dimensions:
  - the multilevel nature of texts and of languages, from phonetics to pragmatics
  - the fact that texts and discourses
    - are organized according to linearity
    - can be considered as topological entities