AnCoraPipe: A tool for multilevel annotation. Manu Bertran, Bàrbara Soriano, Oriol Borrega, Marta Recasens. Universitat de Barcelona. CBA 2008.
Contents • Data format • Annotation interface • Installation • Description • Future improvements
Data format • Data are stored as UTF-8 encoded XML. • Design principles: • Reduced inventory of node names. • Attributes are atomic. • Attributes describe only the node they depend on. • There is no redundancy in the data. • Adding new annotation levels or values is fast and easy. • Annotation time has been reduced by 50%.
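As a rough illustration of these design principles, the sketch below parses a made-up UTF-8 XML fragment and adds a new annotation level as a single atomic attribute. The element names (`sentence`, `sn`, `d`, `n`, `v`), the attribute names (`wd`, `lem`, `pos`, `func`, `sense`), and the helper `add_level` are assumptions chosen for the example; they are not taken from the actual AnCora schema or from AnCoraPipe itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are illustrative only,
# not the actual AnCora schema. Each attribute holds one atomic value and
# describes only the node it is attached to.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<sentence>
  <sn func="suj">
    <d wd="La" lem="el" pos="da0fs0"/>
    <n wd="niña" lem="niño" pos="ncfs000"/>
  </sn>
  <v wd="canta" lem="cantar" pos="vmip3s0"/>
</sentence>
"""


def add_level(root, tag, attr, values):
    """Add a new annotation level as one attribute on matching nodes.

    `values` maps a lemma to the value to store; nodes whose lemma is
    not in the mapping are left untouched.
    """
    for node in root.iter(tag):
        lemma = node.get("lem")
        if lemma in values:
            node.set(attr, values[lemma])


# Parse from UTF-8 bytes, as the data would be read from disk.
root = ET.fromstring(SAMPLE.encode("utf-8"))

# A new level ("sense") is added without touching existing attributes.
add_level(root, "n", "sense", {"niño": "hypothetical-sense-id"})
```

Because every attribute belongs to exactly one node, a new level amounts to one extra attribute per annotated node and leaves the rest of the tree unchanged.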
Annotation interface • Installation requirements: • Java 1.5 or higher. • The SWT Java graphics library (included in our package for Windows XP). • On other systems, the graphics library can be obtained with the free Eclipse package.
Annotation interface • Description • The interface is organized as a series of screens, each showing the data relevant to one annotation level. • To make the annotator's work easier, the interface highlights all nodes that can be annotated and the sentences that have not been annotated yet.
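The highlighting logic described above could be sketched as follows. This is only an assumption about how such a check might work, not the tool's actual code: the corpus fragment, the `id` and `func` attributes, and the helpers `pending_nodes` and `pending_sentences` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-sentence corpus; names are illustrative only.
CORPUS = """<corpus>
  <sentence id="s1">
    <sn func="suj"><n wd="niña"/></sn>
    <v wd="canta"/>
  </sentence>
  <sentence id="s2">
    <sn><n wd="perro"/></sn>
    <v wd="ladra"/>
  </sentence>
</corpus>"""


def pending_nodes(sentence, tags, attr):
    """Return nodes of an annotatable type that still lack `attr`."""
    return [node for node in sentence.iter()
            if node.tag in tags and node.get(attr) is None]


def pending_sentences(root, tags, attr):
    """Return ids of sentences with at least one unannotated node."""
    return [sent.get("id") for sent in root.iter("sentence")
            if pending_nodes(sent, tags, attr)]


root = ET.fromstring(CORPUS)

# For a level that annotates the `func` attribute on `sn` nodes,
# only the second sentence still needs work.
print(pending_sentences(root, {"sn"}, "func"))  # prints ['s2']
```

An interface built on this idea would pass each screen its own set of annotatable tags and target attribute, so the same check serves every annotation level.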
Annotation interface • The system allows for the addition of external tools for specific annotation levels: • WordNet • Coreference
Future improvements • Making the tool available online and adapting it to Linux and Mac environments. • Implementing corpus query methods in the interface. • Implementing statistical corpus description methods. • Adding tools to handle verbal and nominal lexicons. • Adding semiautomatic methods and machine learning functions for the partial annotation of corpora.
AnCora • http://clic.ub.edu/ • http://clic.ub.edu/ancora/ • http://clic.ub.edu/mbertran/tbfeditor/