
Using the Corpógrafo

Belinda Maia & Luís Sarmento, PoloFLUP, LINGUATECA


Presentation Transcript


  1. Using the Corpógrafo
     Belinda Maia & Luís Sarmento, PoloFLUP, LINGUATECA
     USP workshop

  2. First steps
     • Get a username and password
     • You will receive one automatically

  5. Working with the Corpógrafo
     • Corpógrafo is a suite of integrated tools for INDIVIDUAL or GROUP research
     • All research is done ONLINE
     • Each username/password = a separate space on our server
     • At present, anyone can work with it using 10 MB of space for FREE
     • BUT you get an empty space + tools + tutorial!

  6. Help Files
     • Introdução à utilização do Corpógrafo - um pequeno tutorial (Introduction to using the Corpógrafo: a short tutorial)
       A tutorial, to be translated into English, describing the whole process of terminology research using the Corpógrafo. Available in PDF.
     • Corpógrafo Roadmap
       In English and Portuguese: a panoramic view of the Corpógrafo and how it works. Available in PDF.
     • The Corpógrafo in Easy Stages
       In English and Portuguese: user's guide to the Corpógrafo and FAQ. Available in PDF.
     • Also note: the entry page has a glossary of terms and instructions, PT > EN

  7. File Manager
     Area where each individual or group can:
     • upload texts to their space on the server
     • convert various text formats to .txt
     • 'clean' them of unnecessary material
     • check tokenization and sentence divisions
     • register full information on source, domain and text type
     • group, and re-group, texts into corpora
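The cleaning, sentence division and tokenization steps listed on this slide can be illustrated with a minimal sketch. This is not the Corpógrafo's own code: the function names and the splitting rules are assumptions chosen for illustration, and a real tool would need to handle abbreviations, numbers and similar cases.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse runs of spaces/tabs and drop empty lines (illustrative cleaning only)."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def split_sentences(text: str) -> list[str]:
    """Naive sentence division: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    """Words (with internal hyphens kept) or single punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


if __name__ == "__main__":
    raw = "  Hello   world. \n\n  Second line!  "
    for sentence in split_sentences(clean_text(raw)):
        print(tokenize(sentence))
```

Printing the tokens sentence by sentence is one simple way to "check tokenization and sentence divisions" by eye before a text is added to a corpus.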

  8. File Manager
     1. Files
        > List Files on Server
        > Add Files
        > Add Files from URL (Experimental!)
     2. Corpora
        > List Corpora
        > Compile New Corpus

  11. EXTEX
     • Tool for converting file formats to .txt, at:
     • http://poloclup.linguateca.pt/ferramentas

  17. General corpus analysis
     Corpus analysis area:
     • Concordancing tools for regular expressions
     • at sentence level
     • KWIC concordancing
     • Collocations
     • N-gram tool
     • Case-sensitive
     • Alphabetical or frequency ordering
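As an illustration of what KWIC (keyword in context) concordancing with regular expressions involves, here is a minimal sketch. It does not reproduce the Corpógrafo's implementation: the function names, the character-based context window and the ordering options are assumptions made for this example.

```python
import re
from collections import Counter


def kwic(text, pattern, width=30, case_sensitive=True):
    """Return (left context, keyword, right context) for each regex match."""
    flags = 0 if case_sensitive else re.IGNORECASE
    rows = []
    for match in re.finditer(pattern, text, flags):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


def keyword_counts(rows, by="frequency"):
    """Count matched forms, ordered by frequency or alphabetically."""
    counts = Counter(keyword for _, keyword, _ in rows)
    if by == "frequency":
        return counts.most_common()
    return sorted(counts.items())


if __name__ == "__main__":
    sample = "The corpus is small. A Corpus can grow. The corpus helps."
    for left, keyword, right in kwic(sample, r"corpus", width=12, case_sensitive=False):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>12} | {keyword} | {right}")
```

The case-sensitivity switch and the two orderings mirror the options named on the slide; a sentence-level concordance would return the whole sentence around each match instead of a fixed-width window.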

  22. Corpora + TDB
     • Choose a corpus
     • Choose the related TDB
     = All terms, examples and definitions extracted from the corpus are (semi-)automatically transferred to the TDB
     = All metadata on the texts in the corpus can be automatically transferred to the TDB

  23. Term extraction
     • N-grams
     • Unfiltered
     • Filtered with restrictions on term in PT, EN, FR, IT, ES, DE
     • Filtered with restrictions on term and context in PT, EN, FR, IT, ES, DE
     • Singular + plural terms can be combined
     • Existing terms in TDB need not appear
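The difference between unfiltered and filtered n-gram extraction can be sketched as follows. The slides do not say which restrictions the Corpógrafo applies, so the filter below (reject n-grams that begin or end with a stopword) is only an assumed example, and the stopword list, function names and `known_terms` parameter are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative English stopword list; a real filter would be language-specific.
STOPWORDS = {"the", "of", "a", "an", "in", "and", "is", "for"}


def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_candidates(text, n=2, filtered=True, known_terms=frozenset()):
    """Return (n-gram, frequency) pairs, most frequent first.

    filtered=True drops n-grams that start or end with a stopword (assumed restriction).
    known_terms lists terms already stored, which are left out of the result.
    Note: this sketch ignores sentence boundaries when building n-grams.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(ngrams(tokens, n))
    result = {}
    for gram, freq in counts.items():
        if filtered and (gram[0] in STOPWORDS or gram[-1] in STOPWORDS):
            continue
        term = " ".join(gram)
        if term in known_terms:
            continue
        result[term] = freq
    return sorted(result.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = ("The terminology database stores terms. "
              "The terminology database is multilingual.")
    print(extract_candidates(sample, n=2))
```

Passing `known_terms` corresponds to the last bullet: candidates already present in the terminology database do not have to be listed again.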

  25. Term selection from n-grams
     • Consult the list of n-grams
     • Check the term status of each n-gram via the underlying concordances
     • Check sources
     • Send to TDB

  29. Search for definition candidates
     • Already possible via TDB
     • Under development
     • Research area for Master's (Mestrado) dissertations and research grant holders (bolseiros)

  30. TDB - Terminology database
     Databases are designed to be multilingual
     • Terms listed alphabetically + language tag
     • General data
     • Morphological data
     • Source metadata: authors, texts, etc.
     • Definitions + search for candidates
     • Translation equivalents
     • Semantic relations
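To make the kinds of data listed on this slide concrete, here is a minimal sketch of what one terminology record might look like. The slides do not describe the TDB schema, so every field name below is hypothetical and the structure is only one plausible reading of the list.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One term record; all field names are illustrative assumptions."""
    term: str
    language: str                                      # language tag, e.g. "PT" or "EN"
    part_of_speech: str = ""                           # morphological data
    sources: list = field(default_factory=list)        # source metadata: authors, texts
    definitions: list = field(default_factory=list)
    translations: dict = field(default_factory=dict)   # language tag -> equivalents
    relations: list = field(default_factory=list)      # semantic relations

    def label(self) -> str:
        return f"{self.term} [{self.language}]"


def list_terms(entries):
    """Terms listed alphabetically, each with its language tag."""
    ordered = sorted(entries, key=lambda e: (e.term.casefold(), e.language))
    return [e.label() for e in ordered]


if __name__ == "__main__":
    entries = [
        TermEntry("corpus", "EN"),
        TermEntry("base de dados", "PT", translations={"EN": ["database"]}),
    ]
    print(list_terms(entries))
```

Keeping translation equivalents keyed by language tag is one simple way to model a database that is "designed to be multilingual".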

  43. Future developments: general policy
     • General testing and improvement
     • Development of new ideas or functions, using isomorphic relationships between researchers' needs and our possibilities
     • Coordination of individual corpus projects into bigger projects, when possible or necessary
