Linguistic Annotation Framework
ISO TC37 SC4 Working Group 1
4/11/11, Brandeis University
Status of LAF document
• Nearly final version
• Needs to be finalized in the next few weeks
• A prior draft was distributed for comment to the member-country groups, and changes have been made on that basis
• GrAF schemas have been extensively implemented in two major corpora:
  • Open American National Corpus (OANC)
  • Manually Annotated Sub-Corpus (MASC)
Remaining Issues
• Definitions of:
  • Anchors and regions (segmentation)
  • Layers
  • Media
• Need to verify that these definitions are adequate for all media, including speech, images, etc.
• Suggestions for wording concerning application to media would be appreciated
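One way to test whether the anchor and region definitions generalize is to write down the simplest case concretely. The Python sketch below assumes that, for text, an anchor is a character offset and a region is the span between two anchors; the names `Region` and `segment` are hypothetical, and splitting on whitespace is only a placeholder segmentation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A region of primary data delimited by two anchors.

    Assumption: for text, anchors are character offsets (start inclusive,
    end exclusive). Speech or image data would need other anchor types,
    e.g. time offsets or pixel coordinates.
    """
    id: str
    start: int
    end: int

    def __post_init__(self) -> None:
        # Anchors must be non-negative and ordered.
        if not 0 <= self.start <= self.end:
            raise ValueError(f"bad anchors for {self.id}: {self.start}..{self.end}")

    def text(self, primary: str) -> str:
        """Return the stretch of primary text covered by this region."""
        if self.end > len(primary):
            raise ValueError(f"{self.id} extends past the end of the primary data")
        return primary[self.start:self.end]


def segment(primary: str) -> list[Region]:
    """Placeholder segmentation: one region per run of non-whitespace characters."""
    return [Region(f"r{i}", m.start(), m.end())
            for i, m in enumerate(re.finditer(r"\S+", primary))]


primary = "The cat saw it."
regions = segment(primary)
for r in regions:
    print(r.id, r.start, r.end, repr(r.text(primary)))
```

Keeping the anchor type separate from the region makes it easier to ask whether the same region definition still holds when anchors are times or coordinates rather than offsets.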
Remaining Issues
• Resource header
• Definitions of various entities
• Consistency of attribute names, etc.
• Consistency of references from annotation documents, etc.
Remaining Issues
• Replace <tagUsage> in the annotation document header with a means of providing the annotation categories used (and the number of times each is used). Possibilities:
  • A list of the categories (or ISOCat references) and their frequencies
    • Need to define an element for this
  • An external document containing this information
    • XML?
  • A specification of the categories without frequencies, e.g. documentation of the annotation scheme
  • ???
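If the first option (a list of categories with frequencies) were adopted, the list could be generated mechanically from the annotation document. The Python sketch below assumes GrAF-style `<a>` elements whose `label` attribute names the category; the output element names `categoryUsage` and `category`, and the `occurs` attribute, are hypothetical placeholders for whatever element the group defines. Namespaces are ignored by matching on local tag names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified annotation document (real GrAF files declare a namespace).
SAMPLE = """<graph>
  <node xml:id="n1"/><node xml:id="n2"/><node xml:id="n3"/>
  <a label="tok" ref="n1" as="penn"/>
  <a label="tok" ref="n2" as="penn"/>
  <a label="NP" ref="n3" as="penn"/>
</graph>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_categories(xml_text: str) -> Counter:
    """Count how often each annotation label occurs on <a> elements."""
    root = ET.fromstring(xml_text)
    return Counter(el.get("label") for el in root.iter()
                   if local_name(el.tag) == "a" and el.get("label") is not None)


def category_usage_element(counts: Counter) -> ET.Element:
    """Build a hypothetical header element listing categories and frequencies."""
    usage = ET.Element("categoryUsage")
    for name, n in sorted(counts.items()):
        ET.SubElement(usage, "category", name=name, occurs=str(n))
    return usage


counts = count_categories(SAMPLE)
print(ET.tostring(category_usage_element(counts), encoding="unicode"))
```

An ISOCat reference could be added as a further attribute on each category entry; how that reference is expressed is left open here.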
Remaining Issues
• Reword the feature structure specification to reflect the change in the feature structure (fs) spec that accommodates the GrAF format for <f>, e.g.:
  <f name="FE" value="perceiver"/>
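The practical effect of the GrAF form is that a simple value sits directly in a `value` attribute on `<f>`, rather than in a nested value element. The Python sketch below reads both forms into the same dictionary. It assumes the nested form uses `<symbol value="..."/>` or `<string>...</string>` children, as in TEI-style feature structures. The function names are hypothetical, namespaces are ignored, and only flat feature structures are handled.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def feature_value(f: ET.Element) -> str | None:
    """Return the value of one <f>, in either GrAF or nested form."""
    # GrAF form: <f name="FE" value="perceiver"/>
    if "value" in f.attrib:
        return f.get("value")
    # Nested form (assumed): <f name="FE"><symbol value="perceiver"/></f>
    for child in f:
        if child.tag == "symbol":
            return child.get("value")
        if child.tag == "string":
            return child.text or ""
    return None


def fs_to_dict(fs_xml: str) -> dict[str | None, str | None]:
    """Map feature names to values for the direct <f> children of an <fs>."""
    fs = ET.fromstring(fs_xml)
    return {f.get("name"): feature_value(f) for f in fs.findall("f")}


graf_form = '<fs><f name="FE" value="perceiver"/></fs>'
nested_form = '<fs><f name="FE"><symbol value="perceiver"/></f></fs>'
print(fs_to_dict(graf_form), fs_to_dict(nested_form))
```

Both inputs yield the same mapping, which is the equivalence the reworded specification would need to state.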
Remaining Issues
• Document format
• Placement of examples