Using Xaira to explore corpora Tony McEnery
What is Xaira? • An acronym for XML Aware Indexing and Retrieval Architecture • The XML-aware version of SARA for the BNC corpus • Including the Index Toolkit and the Client • Working reliably with all writing systems supported by Unicode
Getting your corpus ready for use with the Xaira Client • Mark up the corpus in XML • The markup can be very complex or very simple • If your corpus is not marked up in XML, use the Index Toolkit (Tools – Preprocess) to add simple XML markup • For a corpus in a non-alphabetic language, convert it into Unicode (e.g. UTF-8 or UTF-16) • Use the Index Toolkit (Tools – Index Wizard) to index your corpus
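Xaira's Preprocess tool does this work itself. Purely to illustrate what "simple XML markup" on a Unicode text might look like, here is a minimal Python sketch. It assumes the source is a plain-text file in a known encoding (for example GB2312 for Chinese), treats blank lines as paragraph breaks, splits sentences naively on Western and CJK sentence-final punctuation, and uses hypothetical `<text>`, `<p>` and `<s>` elements that are not necessarily what Xaira produces.

```python
import re
from xml.sax.saxutils import escape


def plain_text_to_xml(raw: bytes, source_encoding: str, text_id: str) -> str:
    """Decode a plain-text file and wrap it in minimal XML markup.

    The element names <text>, <p> and <s> are illustrative assumptions;
    they are not necessarily what Xaira's Preprocess tool produces.
    """
    # Decode from the original (possibly legacy) encoding into Unicode.
    text = raw.decode(source_encoding)
    # Treat blank lines as paragraph boundaries.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    safe_id = escape(text_id, {'"': "&quot;"})
    out = [f'<text id="{safe_id}">']
    for para in paragraphs:
        # Naive sentence split after Western or CJK sentence-final punctuation.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", para) if s.strip()]
        out.append("<p>")
        # escape() turns &, < and > into entities so the output stays well-formed.
        out.extend(f"<s>{escape(s)}</s>" for s in sentences)
        out.append("</p>")
    out.append("</text>")
    return "\n".join(out)
```

The returned string is Unicode; writing it with `open(path, "w", encoding="utf-8")` gives a UTF-8 file ready for indexing.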
Word query • Click the first icon • Type a search word and click Query
Quick Query (Phrase Query) • Search for a phrase
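The idea behind a phrase query can be sketched in a few lines of Python. This is not Xaira's matching logic; it assumes the text is already tokenised into a list of words and that matching is case-insensitive and token-by-token. The function name `phrase_query` is hypothetical.

```python
from typing import List


def phrase_query(tokens: List[str], phrase: str) -> List[int]:
    """Return the start index of every occurrence of `phrase` in `tokens`.

    Matching is case-insensitive and token-by-token (an assumption made
    for illustration, not a description of Xaira's own rules).
    """
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return []
    lowered = [t.lower() for t in tokens]
    # Slide a window of n tokens across the text and compare it to the phrase.
    return [i for i in range(len(lowered) - n + 1) if lowered[i:i + n] == target]
```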
Addkey (POS/Lemma) Query • Search for a POS class • Search for a word of a particular POS class
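Both kinds of POS search can be illustrated on a tagged token list. The minimal Python sketch below assumes each token is a `(word, tag)` pair and compares tags by prefix, so a query for `NN` would match both `NN1` and `NN2` (CLAWS-style tags, assumed here for illustration). The function `pos_query` is hypothetical and does not reflect how Xaira stores or searches its annotation.

```python
from typing import List, Optional, Tuple

# Hypothetical token representation: (word form, POS tag).
Token = Tuple[str, str]


def pos_query(tokens: List[Token], pos: str, word: Optional[str] = None) -> List[int]:
    """Return indices of tokens whose tag starts with `pos`, optionally
    restricted to a given word form (compared case-insensitively)."""
    hits = []
    for i, (form, tag) in enumerate(tokens):
        if not tag.startswith(pos):
            continue  # wrong POS class
        if word is not None and form.lower() != word.lower():
            continue  # right class, but not the requested word
        hits.append(i)
    return hits
```

With `word=None` this corresponds to searching for a POS class; passing a word corresponds to searching for that word in a particular POS class, e.g. *can* as a noun rather than a modal verb.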
XML Query • Search for an XML element
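To show what searching for an XML element means, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It is not Xaira's query engine; the function name `xml_element_query` and the element names in any sample document (such as `<name type="place">`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def xml_element_query(xml_text: str, tag: str, attr: Optional[str] = None,
                      value: Optional[str] = None) -> List[str]:
    """Return the text content of every <tag> element in document order,
    optionally keeping only elements whose attribute `attr` equals `value`."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter(tag):
        if attr is not None and elem.get(attr) != value:
            continue  # attribute filter does not match
        # itertext() also collects text inside nested child elements.
        results.append("".join(elem.itertext()))
    return results
```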
Query Builder • A powerful combination of all query types
Reference • Corpus name: Freiburg • Subcorpus: Null (not defined) • Total number of hits: 380 (found in 28 files) • The mouse pointer is on hit No. 379 • Location of this line: sentence No. 1529 in Sample No. 16 in the file FLOB_C in the FLOB folder
Page mode vs. line mode • Page mode: one concordance per page (use Page Up/Down to turn pages) • Line mode: KWIC (key word in context), one concordance per line
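Line mode lays out concordances in the familiar KWIC format. A minimal Python sketch of that layout, assuming a pre-tokenised text and a fixed-width left-context column (both assumptions, not Xaira's display code), might look like this; `kwic` is a hypothetical name.

```python
from typing import List


def kwic(tokens: List[str], node: str, width: int = 3, column: int = 30) -> List[str]:
    """Return one KWIC line per occurrence of `node` (case-insensitive).

    The left context is right-aligned in a fixed column (truncated from the
    left if it is longer) so that every node word starts at the same position.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != node.lower():
            continue
        left = " ".join(tokens[max(0, i - width):i])[-column:]
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines.append(f"{left:>{column}} [{tok}] {right}")
    return lines
```

Aligning the node word vertically is what makes patterns in the immediate left and right context easy to scan.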
Sorting • 1st, 2nd and 3rd sort keys
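A multi-key concordance sort can be sketched as sorting hits by a tuple of context words. The Python sketch below assumes each sort key is a word position relative to the node (1 = first word to the right, -1 = first word to the left); Xaira's own sort options may be defined differently, and `sort_concordance` is a hypothetical name.

```python
from typing import List, Sequence


def sort_concordance(tokens: List[str], hits: List[int], keys: Sequence[int]) -> List[int]:
    """Order hit positions by the words at the given offsets from the node.

    keys=(1, 2) means: 1st sort on the word right after the node (R1),
    2nd sort on the word after that (R2); -1 would be the word before (L1).
    """
    def word_at(i: int, offset: int) -> str:
        j = i + offset
        # Positions outside the text sort first, as an empty string.
        return tokens[j].lower() if 0 <= j < len(tokens) else ""

    # Python's sort is stable, so ties keep their original corpus order.
    return sorted(hits, key=lambda i: tuple(word_at(i, k) for k in keys))
```

The 2nd key only matters where the 1st key ties, and the 3rd only where both tie, which is why a tuple key is a natural fit.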
Copy, select concordances • Right-click a concordance line to copy it, select one concordance or a block of concordances, etc.
Plain text vs. XML display • XML format
Thinning and editing • Removing unwanted concordances • Selection: keep the selected concordances • Reverse selection: keep the unselected concordances • Random: keep a random sample • One per text: keep one concordance from each text
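The thinning options can be sketched as simple filters over a list of hits. In this minimal Python sketch each hit is assumed to be a hypothetical `(text id, token position)` pair; the random option keeps the sample in corpus order, and "one per text" keeps the first hit in each text, which may not be how Xaira chooses.

```python
import random
from typing import List, Optional, Set, Tuple

# Hypothetical hit representation: (text id, token position).
Hit = Tuple[str, int]


def keep_selected(hits: List[Hit], selected: Set[int], reverse: bool = False) -> List[Hit]:
    """Keep hits whose index is in `selected` (or, with reverse=True, those that are not)."""
    return [h for i, h in enumerate(hits) if (i in selected) != reverse]


def thin_random(hits: List[Hit], n: int, seed: Optional[int] = None) -> List[Hit]:
    """Keep a random sample of n hits, preserving their original order."""
    if n >= len(hits):
        return list(hits)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(hits)), n))
    return [hits[i] for i in chosen]


def one_per_text(hits: List[Hit]) -> List[Hit]:
    """Keep only the first hit from each text."""
    seen = set()
    kept = []
    for text_id, pos in hits:
        if text_id not in seen:
            seen.add(text_id)
            kept.append((text_id, pos))
    return kept
```

Passing a fixed `seed` makes a random thinning reproducible, which helps if the same sample has to be revisited later.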
Collocation • Compute collocations of the search term
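Xaira computes collocations itself and offers its own choice of statistics. As an illustration only, the Python sketch below scores collocates with plain pointwise mutual information within a symmetric window of ±`window` tokens, MI = log2(f(node, c) × N / (f(node) × f(c))), without the span-size correction some tools apply. The function name and the `min_cooc` threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def collocates_mi(tokens: List[str], node: str, window: int = 2,
                  min_cooc: int = 1) -> Dict[str, float]:
    """Score each word co-occurring with `node` within +/- `window` tokens by
    pointwise mutual information: log2(f(node, c) * N / (f(node) * f(c)))."""
    words = [t.lower() for t in tokens]
    node = node.lower()
    n_tokens = len(words)
    freq = Counter(words)
    cooc: Counter = Counter()
    for i, w in enumerate(words):
        if w != node:
            continue
        # Count every token in the window around this occurrence of the node.
        for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
            if j != i:
                cooc[words[j]] += 1
    return {
        c: math.log2(f_nc * n_tokens / (freq[node] * freq[c]))
        for c, f_nc in cooc.items()
        if f_nc >= min_cooc
    }
```

MI favours rare words, so a minimum co-occurrence threshold such as `min_cooc` is a common way to filter out collocates seen only once.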
Defining subcorpora • “Texts – Column control” in the Client • “Texts – Define partition” in the Client
Distribution • “Texts – Open Partition” in the Client • As per corpus • As per text class
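The per-text-class view of a distribution boils down to counting hits per class and normalising by class size. The Python sketch below assumes you have a mapping from text id to text class and from text id to size in words (both hypothetical inputs, not Xaira data structures) and reports raw hits alongside hits per million words.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def distribution(hit_text_ids: Iterable[str], text_class: Dict[str, str],
                 text_size: Dict[str, int]) -> Dict[str, Tuple[int, float]]:
    """Return {text class: (raw hits, hits per million words)}."""
    words_per_class: Dict[str, int] = defaultdict(int)
    for text_id, cls in text_class.items():
        words_per_class[cls] += text_size[text_id]
    hits_per_class: Dict[str, int] = defaultdict(int)
    for text_id in hit_text_ids:
        hits_per_class[text_class[text_id]] += 1
    result = {}
    for cls, n_words in words_per_class.items():
        hits = hits_per_class[cls]
        # Normalise by class size so that classes of different sizes are comparable.
        per_million = hits * 1_000_000 / n_words if n_words else 0.0
        result[cls] = (hits, per_million)
    return result
```

Raw counts alone can mislead when one class contains far more text than another; the normalised figure makes the classes comparable.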
Save and export a query • Save: “File – Save (as)” for later use • Export (XML): “Query – Listing” • Edit a query so you don’t need to type everything in again when making a related new query
Xaira FAQs • Is Xaira free and where can I get it? • Yes, it is completely free. You can get a copy (a Windows binary, and source code for compiling on Unix/Linux/Mac systems) from the SourceForge website. The latest release is 115. http://sourceforge.net/project/showfiles.php?group_id=130289 • Where can I get more documentation? • More documentation is available at the Xaira site: http://www.oucs.ox.ac.uk/rts/xaira/ • Where can I get technical help? • You can sign up for the Xaira Preview List to get help: http://www.tei-c.org.uk/tei-bin/betatest