Preparing a corpus for the Sketch Engine


  1. Kilgarriff: Preparing a corpus for SkE
     Preparing a corpus for the Sketch Engine

  2. Vertical format
     • One token (word or punctuation mark) per line in a plain text file:
         Suddenly
         ,
         their
         luck
         changed
         .
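
Converting running text into this one-token-per-line layout can be sketched in Python. The regular expression is an assumption made for illustration, not the tokenizer the Sketch Engine uses: it treats any run of word characters, or any single punctuation mark, as a token, so a contraction such as "don't" would be split naively into three tokens. The function name `to_one_token_per_line` is hypothetical.

```python
import re

# Assumption: a crude regex tokenizer (runs of word characters, or single
# punctuation marks). A real corpus would use a proper tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_one_token_per_line(text: str) -> str:
    """Return the text in vertical format: one token per line."""
    return "\n".join(TOKEN_RE.findall(text)) + "\n"


if __name__ == "__main__":
    print(to_one_token_per_line("Suddenly, their luck changed."), end="")
```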

  3. With lemmas and POS-tags (columns: word, lemma, tag)
     Suddenly   suddenly   RR
     ,          -          PUN
     their      their      PRP
     luck       luck       NN1
     changed    change     VVD
     .          -          PUN
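
A minimal sketch of writing annotated tokens out as vertical text, assuming the Sketch Engine convention of tab-separated columns. The name `to_vertical` and the `(word, lemma, tag)` tuple layout are hypothetical; the columns are assumed to follow the same order as the `ATTRIBUTE` lines in the corpus configuration file.

```python
from typing import Iterable, Tuple

# One token as (word, lemma, tag); assumed to match the order of the
# ATTRIBUTE lines in the corpus configuration.
Token = Tuple[str, str, str]


def to_vertical(tokens: Iterable[Token]) -> str:
    """Write one token per line with tab-separated word, lemma and tag."""
    lines = []
    for word, lemma, tag in tokens:
        # Tabs and newlines delimit columns and lines, so reject them in fields.
        if any("\t" in field or "\n" in field for field in (word, lemma, tag)):
            raise ValueError(f"tab or newline inside token {word!r}")
        lines.append(f"{word}\t{lemma}\t{tag}")
    return "\n".join(lines) + "\n"


SENTENCE = [
    ("Suddenly", "suddenly", "RR"),
    (",", "-", "PUN"),
    ("their", "their", "PRP"),
    ("luck", "luck", "NN1"),
    ("changed", "change", "VVD"),
    (".", "-", "PUN"),
]

if __name__ == "__main__":
    print(to_vertical(SENTENCE), end="")
```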

  4. With XML structure markup
     <doc id="ABC" region="UK" genre="fiction">
     <s>
     Suddenly   suddenly   RR
     <g/>
     ,          -          PUN
     their      their      PRP
     luck       luck       NN1
     changed    change     VVD
     <g/>
     .          -          PUN
     </s>
     </doc>
     (<g/> marks "glue": the neighbouring tokens were not separated by a space.)
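
The `<g/>` lines can be generated mechanically by noting where a token starts exactly where the previous one ended. This sketch assumes a naive regex tokenizer and a hypothetical `annotate` callback returning a `(lemma, tag)` pair per word; a toy lexicon stands in for a real lemmatizer and tagger. The helper names `sentence_lines` and `document` are illustrative, and tokens containing `<` or `&` are not escaped here.

```python
import re
from xml.sax.saxutils import quoteattr

# Assumption: naive tokenizer (runs of word characters or single punctuation marks).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def sentence_lines(text, annotate):
    """Return the vertical lines for one sentence, wrapped in <s> ... </s>.

    A <g/> line is emitted whenever a token starts exactly where the previous
    one ended, i.e. with no whitespace between them in the original text.
    """
    lines = ["<s>"]
    prev_end = None
    for match in TOKEN_RE.finditer(text):
        word = match.group()
        if prev_end is not None and match.start() == prev_end:
            lines.append("<g/>")
        lemma, tag = annotate(word)
        lines.append(f"{word}\t{lemma}\t{tag}")
        prev_end = match.end()
    lines.append("</s>")
    return lines


def document(doc_attrs, sentences, annotate):
    """Wrap sentences in a <doc> element carrying text-level metadata."""
    attrs = " ".join(f"{key}={quoteattr(value)}" for key, value in doc_attrs.items())
    lines = [f"<doc {attrs}>"]
    for sentence in sentences:
        lines.extend(sentence_lines(sentence, annotate))
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


# Toy lexicon standing in for a real lemmatizer and tagger.
LEXICON = {
    "Suddenly": ("suddenly", "RR"),
    ",": ("-", "PUN"),
    "their": ("their", "PRP"),
    "luck": ("luck", "NN1"),
    "changed": ("change", "VVD"),
    ".": ("-", "PUN"),
}

VERT = document(
    {"id": "ABC", "region": "UK", "genre": "fiction"},
    ["Suddenly, their luck changed."],
    lambda word: LEXICON[word],
)

if __name__ == "__main__":
    print(VERT, end="")
```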

  5. Corpus configuration file
     • Tells the system
       • where the data and other files are
       • what attributes (word, tag, lemma) and structures (<doc>, <p>, <s>, <g/>) the corpus contains
       • how to display the corpus
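
Before indexing, it can help to check that a vertical file agrees with what the configuration declares. This hypothetical checker (not part of the Sketch Engine tooling) assumes three positional attributes (word, lemma, tag) and the structures doc, p, s and g. It reports token lines with the wrong number of tab-separated columns, tags naming undeclared structures, and tags that do not nest, for example an `<s>` written where `</s>` was meant.

```python
import re

# Hypothetical declarations mirroring a configuration with three positional
# attributes and the structures doc, p, s and g.
ATTRIBUTES = ("word", "lemma", "tag")
STRUCTURES = frozenset({"doc", "p", "s", "g"})

# Matches a whole line that is an opening, closing or self-closing tag.
TAG_RE = re.compile(r"^<(/?)([A-Za-z_][\w.-]*)(?:\s[^>]*?)?(/?)>$")


def check_vertical(lines, attributes=ATTRIBUTES, structures=STRUCTURES):
    """Return a list of human-readable problems found in a vertical file."""
    problems = []
    stack = []  # names of currently open structures
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        tag = TAG_RE.match(line)
        if tag is None:
            # A token line: one tab-separated column per declared attribute.
            columns = line.split("\t")
            if len(columns) != len(attributes):
                problems.append(
                    f"line {n}: {len(columns)} columns, expected {len(attributes)}"
                )
            continue
        closing, name, self_closing = tag.groups()
        if name not in structures:
            problems.append(f"line {n}: undeclared structure <{name}>")
        if self_closing:
            continue  # e.g. <g/> opens and closes at once
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
            else:
                problems.append(f"line {n}: unexpected </{name}>")
        else:
            stack.append(name)
    problems.extend(f"end of file: <{name}> never closed" for name in reversed(stack))
    return problems
```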

  6. Simple example
     PATH /corpora/test2
     ATTRIBUTE word
     ATTRIBUTE lemma
     ATTRIBUTE tag
     STRUCTURE doc {
         ATTRIBUTE region
         ATTRIBUTE genre
     }
     STRUCTURE s
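
The configuration can also be generated from a small description of the corpus. This sketch uses a hypothetical `make_config` helper that emits only the PATH, ATTRIBUTE and STRUCTURE directives seen in the simple example and reproduces it line for line; real configuration files may carry further settings that it does not cover.

```python
def make_config(path, attributes, structures):
    """Build corpus configuration text.

    `structures` maps each structure name to a list of its attribute names;
    structures without attributes are written on a single line.
    """
    lines = [f"PATH {path}"]
    lines += [f"ATTRIBUTE {name}" for name in attributes]
    for name, attrs in structures.items():
        if attrs:
            lines.append(f"STRUCTURE {name} {{")
            lines += [f"    ATTRIBUTE {attr}" for attr in attrs]
            lines.append("}")
        else:
            lines.append(f"STRUCTURE {name}")
    return "\n".join(lines) + "\n"


CONFIG = make_config(
    "/corpora/test2",
    ["word", "lemma", "tag"],
    {"doc": ["region", "genre"], "s": []},
)

if __name__ == "__main__":
    print(CONFIG, end="")
```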
