
UCCTS, 2010 (Ormskirk)

An overview of IAC (Access Interface Corpus), developed by Barcelona Media and Universitat Pompeu Fabra: corpus searching methods, the advantages and disadvantages of search interfaces, examples of sequence and statistics queries, and how to insert a corpus into IAC.



Presentation Transcript


  1. IAC (ACCESS INTERFACE CORPUS). DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA. TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA), JUDITH DOMINGO (BARCELONA MEDIA), CARME COLOMINAS (UNIVERSITAT POMPEU FABRA). UCCTS, 2010 (Ormskirk)

  2. IAC CORPORA USE: REQUIREMENTS • It is easy to build a corpus from the web, but difficult to search it • We need tools that provide frequency statistics, sorting of results, searches over linguistically annotated sequences, etc.

  3. IAC CORPORA: SEARCHING METHODS • Concordance software (MonoConc, Concordance) • Databases • Corpus query systems (e.g. CQP, Emdros) • Useful but hard to learn • Not suitable for training, as students spend too much time learning the query system

  4. IAC CORPORA: INTERFACES (SEARCHING METHODS) ADVANTAGES • User-friendly • No training necessary DISADVANTAGES • Users have to learn more than one interface • Interface programming and design skills are needed (external resources) • If new attribute types are added > the interface must be redesigned > new funding is needed • Usually more expensive than other options

  5. IAC (ACCESS INTERFACE CORPUS) The Translation Department (UPF) had many corpora (constantly changing and growing). IAC was born (developed by Barcelona Media and UPF). GOALS • Monolingual and aligned corpora • Fast and easy creation of interfaces for corpora • One interface design for all the corpora

  6. IAC INTERFACES • Simple: Key Words Out of Context (KWOC) • Advanced: Key Words In Context (KWIC) • Statistics: KWIC and frequency-based results *** For corpus searching and indexing, IAC uses the Corpus Workbench (CWB), developed by IMS Stuttgart
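
To make the KWIC display named on this slide concrete, here is a minimal sketch in plain Python. It is not how IAC or CWB implement concordances; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `width` tokens on each side of the hit
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, tokenized on whitespace
TOKENS = "The boy buys pencils and the girl buys books".split()

for left, kw, right in kwic(TOKENS, "buys", width=2):
    # Right-align the left context so the keywords line up in a column
    print(f"{left:>12} [{kw}] {right}")
```

Aligning the keyword in a fixed column is what lets a reader scan the contexts quickly; a real concordancer would also offer sorting by left or right context.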

  7. IAC CORPUS FORMAT • XML for metadata • Tabular, verticalized text (one token per line) • XML for structural data

     <metadata title="Demo" year="2010">
     <func=subj>
     The      Det   sg
     boy      Noun  sg
     </func>
     buys     Verb  sg
     <func=DO>
     pencils  Noun  pl
     </func>
     </metadata>
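
The verticalized format on this slide can be read with a few lines of code. The sketch below is only an illustration, not IAC's loader: it assumes tab-separated columns, and the column names `word`, `pos`, and `number` are labels chosen here. The tag syntax (`<func=subj>`) is copied from the slide as is.

```python
# The slide's example, assuming one token per line and tab-separated columns
SAMPLE = """\
<metadata title="Demo" year="2010">
<func=subj>
The\tDet\tsg
boy\tNoun\tsg
</func>
buys\tVerb\tsg
<func=DO>
pencils\tNoun\tpl
</func>
</metadata>
"""


def parse_vertical(text):
    """Return one dict per token, tagged with the enclosing <func=...> region."""
    tokens = []
    func = None  # syntactic function of the region we are inside, if any
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("<func="):
            func = line[len("<func="):-1]  # "<func=subj>" -> "subj"
        elif line == "</func>":
            func = None
        elif line.startswith("<"):
            continue  # other structural or metadata tags are skipped here
        else:
            word, pos, number = line.split("\t")
            tokens.append(
                {"word": word, "pos": pos, "number": number, "func": func}
            )
    return tokens


for tok in parse_vertical(SAMPLE):
    print(tok)
```

Keeping the structural region as an attribute of each token is one simple way to let later queries combine word-level and structural conditions.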

  8. IAC CORPORA: INSERTING A CORPUS INTO IAC • Upload the corpus (txt file) to the server • Design the search interface with the graphical tool included in IAC, according to the corpus type and its linguistic annotation

  9. IAC CONCLUSIONS IAC is a flexible and powerful tool that goes beyond the limitations of current corpus interfaces • User-friendly tool • Access to multiple corpora from the same platform • No need for an external developer or programming background • Fast interface creation; interfaces can be modified easily

  10. Thank you! • judith.domingo@barcelonamedia.org Temporary website: http://webconsultaiactemporal.barcelonamedia.org

  11. SOME EXAMPLES…

  12. ADVANCED SEARCH To demonstrate the advanced search, we use an annotated corpus with translations. Let's look for examples of sequences of one or more words with syntax errors.

  13. ADVANCED SEARCH

  14. ADVANCED SEARCH

  15. ALIGNED CORPORA WITH METADATA As an example of aligned corpora, we use a Spanish > English corpus. Search term: poder (verb). Our goal is to get examples of poder (verb) translated as may or might in Economics texts.
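
The query described on this slide combines three conditions: a lemma and part of speech on the source side, a word on the target side, and a metadata value. The sketch below shows that logic in plain Python. It does not reflect IAC's internals; the data structure, the function name `find_translations`, and the toy sentences are all invented for illustration.

```python
import re

# Toy aligned segments, invented for illustration only.
# Source tokens are (word, lemma, part of speech) triples.
SEGMENTS = [
    {"domain": "Economics", "en": "Inflation may rise",
     "es": [("La", "el", "Det"), ("inflación", "inflación", "Noun"),
            ("puede", "poder", "Verb"), ("aumentar", "aumentar", "Verb")]},
    {"domain": "Economics", "en": "Prices might fall",
     "es": [("Los", "el", "Det"), ("precios", "precio", "Noun"),
            ("podrían", "poder", "Verb"), ("bajar", "bajar", "Verb")]},
    {"domain": "Economics", "en": "The bank can pay",
     "es": [("El", "el", "Det"), ("banco", "banco", "Noun"),
            ("puede", "poder", "Verb"), ("pagar", "pagar", "Verb")]},
    {"domain": "Economics", "en": "Market power grows",
     "es": [("El", "el", "Det"), ("poder", "poder", "Noun"),
            ("del", "del", "Prep"), ("mercado", "mercado", "Noun"),
            ("crece", "crecer", "Verb")]},
    {"domain": "General", "en": "It may rain",
     "es": [("Puede", "poder", "Verb"), ("llover", "llover", "Verb")]},
]


def find_translations(segments, lemma, pos, targets, domain):
    """Return (source text, target text, matched target word) for each hit."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, targets)) + r")\b", re.IGNORECASE
    )
    hits = []
    for seg in segments:
        if seg["domain"] != domain:
            continue  # metadata filter
        if not any(l == lemma and p == pos for _, l, p in seg["es"]):
            continue  # source-side lemma and part-of-speech filter
        match = pattern.search(seg["en"])
        if match:  # target-side filter
            source = " ".join(word for word, _, _ in seg["es"])
            hits.append((source, seg["en"], match.group(1).lower()))
    return hits


for hit in find_translations(
    SEGMENTS, "poder", "Verb", ("may", "might"), "Economics"
):
    print(hit)
```

Searching on the lemma rather than the surface form is what lets a single query catch puede and podrían, while the part-of-speech condition excludes the noun poder.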

  16. ALIGNED CORPORA WITH METADATA

  17. ALIGNED CORPORA WITH METADATA

  18. STATISTICS Statistics give quantitative results for sequences. Our goal in this case is to count the prepositions that follow the verb pensar (to think) in Spanish.
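
The frequency query described on this slide can be sketched as a count over adjacent token pairs. This is an illustration only, not IAC's statistics module; the function name `following_prepositions`, the tag names `Verb` and `Prep`, and the toy tokens are assumptions.

```python
from collections import Counter

# Toy annotated tokens (word, lemma, part of speech), invented for illustration
TOKENS = [
    ("Pienso", "pensar", "Verb"), ("en", "en", "Prep"), ("ti", "tú", "Pron"),
    (".", ".", "Punct"),
    ("Pensamos", "pensar", "Verb"), ("en", "en", "Prep"),
    ("el", "el", "Det"), ("futuro", "futuro", "Noun"), (".", ".", "Punct"),
    ("Piensa", "pensar", "Verb"), ("de", "de", "Prep"),
    ("otra", "otro", "Det"), ("manera", "manera", "Noun"), (".", ".", "Punct"),
    ("Pensaba", "pensar", "Verb"), ("que", "que", "Conj"), ("no", "no", "Adv"),
]


def following_prepositions(tokens, lemma="pensar"):
    """Count prepositions that immediately follow a verb form of `lemma`."""
    counts = Counter()
    # Pair every token with its right-hand neighbour
    for (_, lem, pos), (next_word, _, next_pos) in zip(tokens, tokens[1:]):
        if lem == lemma and pos == "Verb" and next_pos == "Prep":
            counts[next_word.lower()] += 1
    return counts


for prep, freq in following_prepositions(TOKENS).most_common():
    print(prep, freq)
```

Only the immediately following token is checked here; a fuller query would probably allow intervening material such as adverbs.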

  19. STATISTICS

  20. STATISTICS
