Some statistical methods on syntactic variables in L1 writing Report from an ongoing study. Bård Uri Jensen PhD student UiB / Hedmark University College (Hamar) Solstrand 2010-03-26. Contents. Introducing the project The ELEV corpus vs the ASK corpus Extracting data Analysing data.
Some statistical methods on syntactic variables in L1 writing. Report from an ongoing study. Bård Uri Jensen, PhD student, UiB / Hedmark University College (Hamar). Solstrand, 2010-03-26
Contents • Introducing the project • The ELEV corpus vs the ASK corpus • Extracting data • Analysing data
My doctoral project • Research question: Do people tend to make different grammatical choices when they type on a keyboard rather than write by hand? • Hypotheses • Higher production speed affects the choices in a "spontaneous" direction • Skilled writers may utilise the enhanced functionality and shift features in the opposite direction • Other psychological factors may affect the choices: motivational factors, social media norms
The ELEV corpus • A "parallel" corpus of hand-written and keyboarded texts • Two texts by each pupil • The ASK corpus system • Manual syntactic segmentation: t-units, clauses, fragments • No error tags
<t-unit> All humans are different, </t-unit>
<t-unit> Women use computers </t-unit>
<t-unit> and boys read books </t-unit>
<t-unit> I like cross-country skiing. Because it gives me better stamina. </t-unit>
<t-unit> Alle mennesker er forskjellige, </t-unit>
<t-unit> Kvinnfolk driver på data </t-unit>
<t-unit> og gutter leser bøker </t-unit>
<t-unit> Jeg liker å få på ski. Fordi det gir meg bedre kondisjon. </t-unit>
<t-unit type="imp"> get (yourself) drunk. </t-unit>
<t-unit type="spm"> Is this a healthy development? </t-unit>
<t-unit type="imp"> drikk deg full. </t-unit>
<t-unit type="spm"> Er dette en sunn utvikling? </t-unit>
<t-unit> The police know <clause type="nominal"> there are people under 18 <clause type="relativ"> who drink there, </clause> </clause> </t-unit>
<t-unit> Politiet vet <clause type="nominal"> det er folk under 18 <clause type="relativ"> som drikker der, </clause> </clause> </t-unit>
<frag> But what about other books? </frag>
<t-unit type="frag"> but [I] know about several girls <clause type="relativ"> who don't do it also! </clause> </t-unit>
<frag> Men hva med andre bøker? </frag>
<t-unit type="frag"> men veit da om flere jenter <clause type="relativ"> som ikke gjør det også! </clause> </t-unit>
<t-unit type="spm"> Is this a <corr sic="helthy"> healthy </corr> development? </t-unit>
<t-unit type="spm"> Er dette en <corr sic="sund"> sunn </corr> utvikling? </t-unit>
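The annotation shown on the preceding slides can also be counted outside the corpus system. The following is a minimal Python sketch, not part of the ASK tooling: it assumes the annotated text is well-formed XML wrapped in a hypothetical `<text>` root element, and the function names `clause_depth` and `summarise` are invented for illustration. Tokens are approximated by whitespace splitting, which is cruder than the corpus system's own tokenisation.

```python
import xml.etree.ElementTree as ET

# Sample built from the slide examples; the <text> wrapper is an assumption.
SAMPLE = """<text>
<t-unit> The police know <clause type="nominal"> there are people under 18
<clause type="relativ"> who drink there, </clause> </clause> </t-unit>
<frag> But what about other books? </frag>
<t-unit type="spm"> Is this a <corr sic="helthy"> healthy </corr> development? </t-unit>
</text>"""


def clause_depth(elem):
    """Deepest chain of nested <clause> elements at or below elem."""
    deepest_child = max((clause_depth(child) for child in elem), default=0)
    return deepest_child + (1 if elem.tag == "clause" else 0)


def summarise(xml_text):
    """Count t-units, fragments and clauses, and measure clause embedding."""
    root = ET.fromstring(xml_text)
    t_units = list(root.iter("t-unit"))
    # Whitespace tokens per t-unit (punctuation stays attached to words).
    lengths = ["".join(t.itertext()).split() for t in t_units]
    return {
        "t_units": len(t_units),
        "fragments": len(list(root.iter("frag"))),
        "clauses": len(list(root.iter("clause"))),
        "max_clause_depth": max((clause_depth(t) for t in t_units), default=0),
        "tokens_per_t_unit": [len(tokens) for tokens in lengths],
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

Counts of this kind (clauses per t-unit, embedding depth, t-unit length) are the sort of per-text measures that can then be fed into the statistical analyses.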
Corpus searches [features='.* subst .*']; <t-unit>[]*</t-unit>; <t-unit_type="imp">[]*</t-unit>; <t-unit>[]{5,10}</t-unit>; <t-unit>([lemma='\$.']*[!lemma='\$.']){5,10}[lemma='\$.']*</t-unit>;
Corpus searches : frontal subclauses <t-unit> [features='.* konj .*']?(<clause_type="nominal"> | <clause_type="relativ"> | <clause_type="adverbial">) [];
Corpus searches : embedding <t-unit>[!clause]+<clause>[]*</clause>[!clause]+</t-unit>; <t-unit>[!clause]+<clause_type!="relativ">[]*</clause>[!clause]+</t-unit>;
Corpus searches : lexical distribution [lemma!='\$.']; [features=".* verb .*"];
Statistics : Three examples • Some simple analyses • differences of mean • correlations • Classification analysis • Clustering
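Because each pupil contributes one hand-written and one keyboarded text, a difference of means can be computed on paired observations. The sketch below is only an illustration under that assumption: the slides do not say which test is used, the numbers are invented, and the names `paired_t` and `pearson` are hypothetical. It uses the Python standard library only.

```python
import math
import statistics

# Invented example values: subclauses per t-unit for five pupils,
# one hand-written and one keyboarded text each (NOT real ELEV data).
hand = [0.40, 0.55, 0.35, 0.60, 0.50]
keyboard = [0.45, 0.50, 0.45, 0.70, 0.55]


def paired_t(x, y):
    """Return (mean difference, t statistic) for paired samples y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample st. dev.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    mean_diff, t = paired_t(hand, keyboard)
    print(f"mean difference = {mean_diff:.3f}, t = {t:.3f}, df = {len(hand) - 1}")
    print(f"correlation hand ~ keyboard = {pearson(hand, keyboard):.3f}")
```

The t statistic would still have to be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; that step is left out here.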
Classification analysis • Independent variables (parameters) • writing mode: hand ~ keyboard • writing skills: medium ~ high • gender • essay question • Dependent variable • freq of attributive adjectives • subclause freq
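The variable layout on this slide can be pictured as one record per text, with the independent variables as categorical fields and the dependent variables as numeric ones. The following Python sketch is not the classification analysis itself, only a descriptive first step (group means per factor level). The record type `TextRecord`, the helper `mean_by`, the field names, and all values are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class TextRecord:
    # Independent variables (names are hypothetical)
    mode: str        # "hand" or "keyboard"
    skill: str       # "medium" or "high"
    gender: str
    question: str    # essay question identifier
    # Dependent variables
    adj_freq: float        # frequency of attributive adjectives
    subclause_freq: float  # subclause frequency


def mean_by(records, factor, measure):
    """Mean of one dependent variable for each level of one factor."""
    groups = defaultdict(list)
    for rec in records:
        groups[getattr(rec, factor)].append(getattr(rec, measure))
    return {level: mean(values) for level, values in groups.items()}


# Invented records, NOT real corpus data.
records = [
    TextRecord("hand", "medium", "f", "q1", 0.10, 0.40),
    TextRecord("keyboard", "medium", "f", "q2", 0.20, 0.50),
    TextRecord("hand", "high", "m", "q1", 0.30, 0.60),
    TextRecord("keyboard", "high", "m", "q2", 0.40, 0.70),
]

if __name__ == "__main__":
    print(mean_by(records, "mode", "subclause_freq"))
    print(mean_by(records, "skill", "adj_freq"))
```

Looking at one factor at a time ignores interactions between factors, which is one reason a multivariate method is of interest here.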