Using Xaira to explore corpora

Tony McEnery. What is Xaira? An acronym for XML Aware Indexing and Retrieval Architecture: the XML-aware version of SARA for the BNC corpus, including the Index Toolkit and the Client, and working reliably with all writing systems supported by Unicode.

Presentation Transcript


  1. Using Xaira to explore corpora Tony McEnery

  2. What is Xaira? • An acronym for XML Aware Indexing and Retrieval Architecture • The XML-aware version of SARA for the BNC corpus • Including the Index Toolkit and the Client • Working reliably with all writing systems supported by Unicode

  3. Getting your corpus ready for use with the Xaira client • Mark up the corpus in XML • Markup can be very complex or very simple • If your corpus is not marked up in XML, use the Index Tool (Tools – Preprocess in the Index Toolkit) to add simple XML markup • For a corpus in a language with a non-alphabetic writing system, convert it to a Unicode encoding (e.g. UTF-8, UTF-16) • Use the Index Tool (Tools – Index Wizard in the Index Toolkit) to index your corpus
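The two preparation steps on this slide (converting to Unicode and adding simple XML markup) can be illustrated outside Xaira with a minimal Python sketch. This is not what the Index Tool does internally; the function name `to_simple_xml`, the `<text>` and `<p>` element names, and the GB18030 sample are all assumptions chosen for illustration.

```python
from xml.sax.saxutils import escape


def to_simple_xml(raw: bytes, source_encoding: str) -> str:
    """Decode legacy-encoded text and wrap each non-empty line in a <p> element.

    The <text>/<p> element names are hypothetical, not a Xaira requirement.
    """
    text = raw.decode(source_encoding)
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # escape() protects &, < and > so the result stays well-formed XML.
    body = "\n".join(f"  <p>{escape(p)}</p>" for p in paragraphs)
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<text>\n{body}\n</text>\n'


# Illustrative input: two lines of Chinese text in a legacy encoding.
sample = "你好 & 世界\n\n第二行".encode("gb18030")
xml_doc = to_simple_xml(sample, "gb18030")
utf8_bytes = xml_doc.encode("utf-8")  # bytes ready to be written to a .xml file
```

Writing `utf8_bytes` to a file gives a UTF-8 document with very simple markup, which is the kind of input the slide describes as ready for indexing.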

  4. Word query • Click on the first icon • Type in a search word, click on Query

  5. Quick Query (Phrase Query)

  6. Addkey (POS/Lemma) Query • Search for a POS class • Search for a word of a particular POS class

  7. XML Query • Search for an XML element
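What searching by XML element or by POS class means can be sketched with Python's standard `xml.etree.ElementTree` module. This only illustrates the idea on a toy document; it is not Xaira's query engine, and the `<s>`, `<w>` and `pos` names and tag values are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# A toy corpus fragment; element and attribute names are hypothetical.
doc = """<text>
  <s n="1"><w pos="DET">the</w> <w pos="NOUN">corpus</w></s>
  <s n="2"><w pos="VERB">explore</w> <w pos="NOUN">corpora</w></s>
</text>"""

root = ET.fromstring(doc)

# "XML query": find every <s> element anywhere in the document.
sentences = root.findall(".//s")

# "POS query": find every word tagged with a given POS class.
nouns = [w.text for w in root.iter("w") if w.get("pos") == "NOUN"]

# "Word of a particular POS class": match on both the word form and the tag.
explore_as_verb = [
    w for w in root.iter("w") if w.text == "explore" and w.get("pos") == "VERB"
]
```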

  8. Query Builder • A powerful combination of all query types

  9. Reference • Corpus name: Freiburg • Subcorpus: Null (not defined) • Total number of hits: 380 (found in 28 files) • The cursor is on hit No. 379 • Location of this line: sentence No. 1529 in Sample No. 16 in the file FLOB_C in the FLOB folder

  10. Page mode vs. line mode • Page mode: one concordance per page (use Page Up/Down to turn pages) • Line mode: KWIC
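Line mode shows hits in KWIC (key word in context) format: the search word in the middle, with a few words of context on each side. A minimal Python sketch of that layout follows; the function name `kwic`, the whitespace tokenisation and the sample sentence are assumptions, and the Xaira client's own display is richer than this.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, node, right context) for each match of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


tokens = "the cat sat on the mat and the dog sat on the rug".split()
hits = kwic(tokens, "sat", width=2)

# Right-align the left context so the node words line up in one column.
for left, node, right in hits:
    print(f"{left:>12}  {node}  {right}")
```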

  11. Sorting • 1st/2nd/3rd sort
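A first, second and third sort can be read as a multi-key sort over the context words around the search term. The Python sketch below assumes one plausible choice of keys (first word to the right, then second word to the right, then first word to the left); the helper names and the sample lines are invented for illustration and are not taken from the Xaira client.

```python
# Each hit is (left context, node word, right context); sample data is invented.
hits = [
    ("we went to", "bank", "of the river"),
    ("she robbed the", "bank", "and fled"),
    ("by the", "bank", "of england"),
]


def right_word(hit, n):
    """n-th word to the right of the node, or '' if the context is too short."""
    words = hit[2].lower().split()
    return words[n - 1] if len(words) >= n else ""


def left_word(hit, n):
    """n-th word to the left of the node, counting outward from the node."""
    words = hit[0].lower().split()
    return words[-n] if len(words) >= n else ""


# 1st sort: first word right; 2nd sort: second word right; 3rd sort: first word left.
sorted_hits = sorted(
    hits, key=lambda h: (right_word(h, 1), right_word(h, 2), left_word(h, 1))
)
```

Sorting on a tuple of keys means the second key only matters where the first key ties, which matches the usual meaning of a secondary sort.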

  12. Copy, select concordances • Right click on a concordance line to copy, (block) select concordance(s), etc.

  13. Plain text vs. XML display • XML format

  14. Thinning and editing • Removing unwanted concordances • Selection: Keep the selected concordances • Reverse selection: Keep the unselected concordances • Random • One per text
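The "Random" and "One per text" thinning options can be illustrated with a short Python sketch. It assumes each hit is recorded as a (text identifier, sentence number) pair and that "one per text" keeps the first hit in each text; how the Xaira client actually chooses is not stated on the slide. The function names and all sample values except the FLOB_C file name are hypothetical.

```python
import random

# (text id, sentence number) pairs; the values are invented for illustration.
hits = [("FLOB_A", 12), ("FLOB_A", 40), ("FLOB_C", 7), ("FLOB_C", 1529), ("FLOB_K", 3)]


def thin_random(hits, k, seed=0):
    """Keep a random sample of at most k hits (seeded so it is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(hits, min(k, len(hits)))


def thin_one_per_text(hits):
    """Keep only the first hit encountered in each text (an assumed policy)."""
    seen = set()
    kept = []
    for text_id, sentence_no in hits:
        if text_id not in seen:
            seen.add(text_id)
            kept.append((text_id, sentence_no))
    return kept


random_subset = thin_random(hits, 2)
one_each = thin_one_per_text(hits)
```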

  15. Collocation • Compute collocations of the search term
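At its simplest, computing collocations means counting which words occur within a fixed window around the search term. The Python sketch below shows only that raw counting step; the function name `collocates`, the window size and the sample sentence are assumptions, and it does not reproduce whatever association statistics the Xaira client reports.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count the words occurring within `window` tokens of each hit for `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts


tokens = "strong tea and strong coffee but weak tea".split()
counts = collocates(tokens, "tea", window=2)
top = counts.most_common(1)
```

Raw counts favour frequent words, which is why collocation tools usually rank candidates with an association measure rather than frequency alone.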

  16. Defining subcorpora • “Texts – Column control” in the Client • “Texts – Define partition” in the Client

  17. Distribution • “Texts – Open Partition” in the Client • As per corpus • As per text class
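A distribution by text class amounts to grouping hits by the partition each text belongs to and counting them. The Python sketch below assumes a hypothetical partition that maps text identifiers to class labels; the labels, the extra file names and the hit list are invented for illustration.

```python
from collections import Counter

# Hypothetical partition: which class each text belongs to.
text_class = {"FLOB_A": "press", "FLOB_C": "press", "FLOB_K": "fiction"}

# The text in which each hit was found (invented data).
hits = ["FLOB_A", "FLOB_C", "FLOB_C", "FLOB_K"]

# Distribution per text, then per text class.
per_text = Counter(hits)
per_class = Counter(text_class[text_id] for text_id in hits)
```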

  18. Save and export a query • Save: “File – Save (as)” for later use • Export (XML): “Query – Listing” • Edit a query so that you do not need to retype everything when making a related new query

  19. Truly multilingual - Chinese

  20. Truly multilingual - Bengali

  21. Truly multilingual - Hindi

  22. Truly multilingual - Punjabi

  23. Truly multilingual - Urdu

  24. Xaira FAQs • Is Xaira free and where can I get it? Yes, it is free. You can get a copy (a binary for Windows, and source code for compilation on Unix/Linux/Mac systems) at the SourceForge website. The latest release is 115. http://sourceforge.net/project/showfiles.php?group_id=130289 • Where can I get more documentation? More documentation is available at the Xaira site: http://www.oucs.ox.ac.uk/rts/xaira/ • Where can I get technical help? You can sign up for the Xaira Preview List to get help: http://www.tei-c.org.uk/tei-bin/betatest
