Kilgarriff: Preparing a corpus for SkE Preparing a corpus for the Sketch Engine
Vertical format
• One word per line in a plain text file

Suddenly
,
their
luck
changed
.
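As a rough illustration of the one-token-per-line idea, here is a minimal Python sketch. The function name to_vertical and the regular expression are assumptions made for this example, not part of the Sketch Engine. A real tokeniser would also need to handle abbreviations, clitics, numbers and so on.

```python
import re


def to_vertical(text: str) -> str:
    """Return the text with one token per line (hypothetical helper)."""
    # \w+ matches a run of letters or digits; [^\w\s] matches a single
    # punctuation mark, so "Suddenly," becomes two tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return "\n".join(tokens)


if __name__ == "__main__":
    print(to_vertical("Suddenly, their luck changed."))
```

Run on the slide's sentence, this prints the six tokens shown above, each on its own line.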
With lemmas and POS-tags

Suddenly suddenly RR
, - PUN
their their PRP
luck luck NN1
changed change VVD
. - PUN
With XML structure markup

<doc id="ABC" region="UK" genre="fiction">
<s>
Suddenly suddenly RR
<g/>
, - PUN
their their PRP
luck luck NN1
changed change VVD
<g/>
. - PUN
</s>
</doc>
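The markup above could be generated along these lines. This is a sketch under stated assumptions: doc_to_vertical is a hypothetical helper, each token is given as (word, lemma, tag, glued), where glued means "no space before this token", and the columns are joined with tabs, which the slides do not specify.

```python
from xml.sax.saxutils import quoteattr


def doc_to_vertical(doc_attrs, sentences):
    """Build vertical text for one document (hypothetical helper).

    doc_attrs: dict of attributes for the <doc> tag.
    sentences: list of sentences; each is a list of
               (word, lemma, tag, glued) tuples.
    """
    # quoteattr escapes the value and wraps it in quotes.
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in doc_attrs.items())
    lines = [f"<doc {attrs}>"]
    for sentence in sentences:
        lines.append("<s>")
        for word, lemma, tag, glued in sentence:
            if glued:
                # <g/> marks that the next token attaches to the previous one.
                lines.append("<g/>")
            # Assumption: columns are tab-separated.
            lines.append(f"{word}\t{lemma}\t{tag}")
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    sentence = [
        ("Suddenly", "suddenly", "RR", False),
        (",", "-", "PUN", True),
        ("their", "their", "PRP", False),
        ("luck", "luck", "NN1", False),
        ("changed", "change", "VVD", False),
        (".", "-", "PUN", True),
    ]
    attrs = {"id": "ABC", "region": "UK", "genre": "fiction"}
    print(doc_to_vertical(attrs, [sentence]))
```

Every structure that is opened is also closed, so the output stays well-formed however many sentences are passed in.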
Corpus configuration file
• Tells the system
  • Where data and other files are
  • What attributes (word, tag, lemma) and structures (<doc> <p> <s> <g/>) it contains
  • How to display
Simple example

PATH /corpora/test2
ATTRIBUTE word
ATTRIBUTE lemma
ATTRIBUTE tag
STRUCTURE doc {
    ATTRIBUTE region
    ATTRIBUTE genre
}
STRUCTURE s
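To make the layout of the example explicit, the sketch below reads it into a Python dictionary. parse_config is a hypothetical helper written for this illustration, not the Sketch Engine's own parser, and it only understands the directives that appear in the example (PATH, ATTRIBUTE, STRUCTURE with an optional { } block), one per line.

```python
def parse_config(text: str) -> dict:
    """Parse the simple configuration syntax shown in the example."""
    config = {"PATH": None, "attributes": [], "structures": {}}
    current = None  # name of the structure whose { } block we are inside
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "}":
            current = None
        elif parts[0] == "PATH":
            config["PATH"] = parts[1]
        elif parts[0] == "ATTRIBUTE":
            if current is None:
                # Top-level attribute: one column of the vertical file.
                config["attributes"].append(parts[1])
            else:
                # Attribute of the enclosing structure, e.g. region on <doc>.
                config["structures"][current].append(parts[1])
        elif parts[0] == "STRUCTURE":
            config["structures"][parts[1]] = []
            if parts[-1] == "{":
                current = parts[1]
    return config


EXAMPLE = """\
PATH /corpora/test2
ATTRIBUTE word
ATTRIBUTE lemma
ATTRIBUTE tag
STRUCTURE doc {
    ATTRIBUTE region
    ATTRIBUTE genre
}
STRUCTURE s
"""

if __name__ == "__main__":
    print(parse_config(EXAMPLE))
```

The top-level ATTRIBUTE lines appear in the same order as the columns of the vertical file (word, lemma, tag), while the attributes inside the braces belong to the <doc> tag.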