Using the Corpógrafo
Belinda Maia & Luís Sarmento
PoloFLUP, LINGUATECA
USP workshop

First steps
• Get a username and password
• You will receive one automatically

Working with the Corpógrafo
• The Corpógrafo is a suite of integrated tools for INDIVIDUAL or GROUP research
• All research is done ONLINE
• Each username/password pair has a separate space on our server
• At present, anyone can work with it using 10 MB of space for FREE
• BUT you get an empty space + tools + tutorial!

Help Files
• Introdução à utilização do Corpógrafo - um pequeno tutorial (Introduction to using the Corpógrafo: a short tutorial): a tutorial, to be translated into English, describing the whole process of terminology research using the Corpógrafo. Available in PDF.
• Corpógrafo Roadmap: in English and Portuguese, a panoramic view of the Corpógrafo and how it works. Available in PDF.
• The Corpógrafo in Easy Stages: in English and Portuguese, a user's guide to the Corpógrafo and FAQ. Available in PDF.
• Also note: on the entry page there is a Glossary of terms and instructions, PT > EN

File Manager
Area where each individual or group can:
• upload texts to space on server
• convert various text formats to .txt
• ‘clean’ them of unnecessary material
• check tokenization and sentence divisions
• register full information on source, domain and text type
• group, and re-group, texts into corpora
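
The "check tokenization and sentence divisions" step can be pictured with a minimal sketch. The rules below are deliberately naive assumptions for illustration, not the Corpógrafo's own segmentation rules, and the function names `split_sentences` and `tokenize` are hypothetical.

```python
import re

def split_sentences(text):
    # Assumed rule: a sentence ends at ".", "!" or "?" followed by whitespace.
    # Abbreviations such as "Dr." will be split wrongly, which is exactly
    # the kind of error a manual check is meant to catch.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    # A token is a run of word characters (accented letters included),
    # optionally joined by internal hyphens, or a single punctuation mark.
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)

if __name__ == "__main__":
    text = "O corpus é pequeno. Tem dez textos!"
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

Printing one token list per sentence makes it easy to spot where a division or a token boundary needs correcting before the text is grouped into a corpus.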

File Manager
1. Files
> List Files on Server
> Add Files
> Add Files from URL (Experimental!)
2. Corpora
> List Corpora
> Compile New Corpus

EXTEX
• Tool for converting file formats to .txt, at: http://poloclup.linguateca.pt/ferramentas

General corpus analysis
Corpora analysis area:
• Concordancing tools for regular expressions
• at sentence level
• KWIC concordancing
• Collocations
• N-gram tool
• Case-sensitive
• Alphabetical or frequency ordering
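
As a rough illustration of KWIC (keyword in context) concordancing with regular expressions at sentence level, the sketch below searches a list of sentences and returns each hit with its left and right context. It is an assumption-laden sketch, not the Corpógrafo's implementation: the function names `kwic` and `format_kwic`, the `width` parameter and the sample sentences are all hypothetical.

```python
import re

def kwic(sentences, pattern, width=30, ignore_case=False):
    # Case-sensitive by default, mirroring the option listed above.
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for sentence in sentences:
        for match in regex.finditer(sentence):
            # Context never crosses the sentence boundary.
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            hits.append((left, match.group(0), right))
    return hits

def format_kwic(hits, width=30):
    # Right-align the left context so the keywords line up in one column.
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in hits]

if __name__ == "__main__":
    sample = ["The term bank stores terms.", "Terminology is fun."]
    for line in format_kwic(kwic(sample, r"\bterms?\b")):
        print(line)
```

Sorting the returned tuples by their left or right context would give the alphabetical ordering mentioned above; counting the keywords would give a frequency ordering.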

Corpora + TDB
• Choose corpus
• Choose related TDB (terminology database)
= All terms, examples and definitions extracted from the corpus are (semi-)automatically transferred to the TDB
= All metadata on texts in the corpus can be automatically transferred to the TDB

Term extraction
• N-grams
• Unfiltered
• Filtered with restrictions on term in PT, EN, FR, IT, ES, DE
• Filtered with restrictions on term and context in PT, EN, FR, IT, ES, DE
• Singular + plural terms can be combined
• Existing terms in TDB need not appear
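
The idea of unfiltered versus filtered n-gram extraction can be sketched as follows. The slides do not say which restrictions the Corpógrafo applies, so the filter here (dropping n-grams that begin or end with a stopword) is an assumed stand-in, the stopword list is illustrative, and the names `ngrams`, `candidate_terms` and `known_terms` are hypothetical.

```python
from collections import Counter

# Illustrative English stopword list; an assumption, not the tool's own list.
STOPWORDS = {"the", "of", "a", "in", "and", "is", "to"}

def ngrams(tokens, n):
    # Unfiltered: every run of n consecutive tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_terms(tokens, n, known_terms=frozenset(), stopwords=STOPWORDS):
    counts = Counter(ngrams([t.lower() for t in tokens], n))
    kept = {}
    for gram, freq in counts.items():
        # Assumed restriction on the term: no stopword at either edge.
        if gram[0] in stopwords or gram[-1] in stopwords:
            continue
        # Terms already in the database need not appear again.
        if " ".join(gram) in known_terms:
            continue
        kept[gram] = freq
    # Frequency ordering, with alphabetical order breaking ties.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    tokens = "the term bank is a term bank of the corpus".split()
    print(candidate_terms(tokens, 2))
```

Passing the terms already stored in a database as `known_terms` keeps the candidate list limited to new material.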

Term selection from n-grams
• Consult the list of n-grams
• Check the term status of each n-gram via the underlying concordances
• Check sources
• Send to TDB

Search for definition candidates
• Already possible via TDB
• Under development
• Research area for Master's dissertations and scholarship holders

TDB - Terminology database
Databases are designed to be multilingual
• Terms listed alphabetically + language tag
• General data
• Morphological data
• Source metadata: authors, texts, etc.
• Definitions + search for candidates
• Translation equivalents
• Semantic relations
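
One way to picture a record holding the kinds of data listed above is the sketch below. The field names and types are hypothetical and chosen only for illustration; the slides do not describe the actual TDB schema.

```python
from dataclasses import dataclass, field

@dataclass
class TermEntry:
    # Hypothetical record layout, not the Corpógrafo's real schema.
    term: str
    language: str                                      # language tag, e.g. "PT" or "EN"
    part_of_speech: str = ""                           # morphological data
    sources: list = field(default_factory=list)        # source metadata (authors, texts)
    definitions: list = field(default_factory=list)
    translations: dict = field(default_factory=dict)   # language tag -> equivalent term
    relations: list = field(default_factory=list)      # (relation name, other term) pairs

def sort_entries(entries):
    # Alphabetical listing by term, then by language tag.
    return sorted(entries, key=lambda e: (e.term.lower(), e.language))

if __name__ == "__main__":
    entries = [
        TermEntry("corpus", "PT", translations={"EN": "corpus"}),
        TermEntry("concordance", "EN", part_of_speech="noun"),
    ]
    for entry in sort_entries(entries):
        print(entry.term, entry.language)
```

Keeping translations as a mapping from language tag to term is one simple way to make each entry multilingual.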

Future developments: general policy
• General testing and improvement
• Development of new ideas or functions, using isomorphic relationships between researchers’ needs and our possibilities
• Coordination of individual corpus projects into bigger projects, when possible or necessary
