Overview
LING 5200 Computational Corpus Linguistics
Martha Palmer

What’s a corpus?
• McEnery & Wilson:
  • (i) (loosely) any body of text
  • (ii) (most commonly) a body of machine-readable text
  • (iii) (more strictly) a finite collection of machine-readable text, sampled to be maximally representative of a language or variety
Based on Kevin Cohen’s LING 5200

What’s corpus linguistics?
• “the study of language based on examples of ‘real life’ language use” (McEnery & Wilson)
• A methodology, not a branch of linguistics
• Biber et al.:
  • Uses computers
  • “Natural” texts
  • Large & principled collection
  • Both quantitative and qualitative

What was Chomsky’s complaint?
• Linguistics should model competence, not performance: what are the underlying rules that allow us to generate language?
• Context: the structuralists believed in collecting linguistic data about a language without taking meaning and communication into consideration.
• The dispute mirrors the debate between the rationalists and the empiricists.
• But does Chomsky account for meaning? (see Searle)

Which branches of linguistics can make use of corpus linguistics?
• Phonetics
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Psycholinguistics
• Computational linguistics
• Descriptive linguistics
• Historical linguistics
• Sociolinguistics

Corpus linguistics in context
[Diagram relating Corpus Linguistics, Natural Language Processing, and Computational Linguistics, with the labels "data", "applications", and "models"]

What’s LING 5200 Corpus Linguistics?
• Tools
• Techniques

Overview
• Quick intro to Unix
• A little corpus design
• Quick tour of corpora and annotation
• Tools for working with corpora
• Programming in Python
• Some software engineering

Why Python?
• It works
• Many advantages
• It’s a bona fide programming language
• You’ll need it for CSCI 5832

Administrative things
• Textbooks: Unix, Python
• Office hours: Mon 5-6, Tues 1-2
• verbs.colorado.edu/mpalmer/ling5200
• Prerequisites: none
• Grades: homework and project
• Accounts on babel

Logging on for the first time
• First thing to do: change your password.
• passwd
• Give it your current password, then your new password. Repeat the new one (to catch typos).

Connecting with another computer
ssh -l your_name babel.colorado.edu
You are prompted to log in.

Logging on for the first time, again
• First thing to do: change your password.
• passwd
• Give it your current password, then your new password. Repeat the new one. (Why?)

Where am I?
• Type pwd
• You see something like this: /home/mpalmer

What does that mean?

Important directories
/
  bin
  home
    mpalmer
      ling5200      /home/mpalmer/ling5200
  etc
  usr
    local
      bin           /usr/local/bin
[The slide also shows a directory named RCS; its place in the tree is not recoverable from the text.]

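A full path such as /home/mpalmer/ling5200 can be read as a walk down the tree from the root, one directory per slash. As a small sketch of that idea, the standard utilities dirname and basename split a path into the part above and the final name. They work on the text of the path only, so the directories do not need to exist on your machine.

```shell
# Split the two full paths from the slide into "parent" and "last name".
# dirname and basename only manipulate the path text; nothing is looked up on disk.
path=/home/mpalmer/ling5200

parent=$(dirname "$path")     # everything above the last component
name=$(basename "$path")      # the last component on its own

echo "$name lives in $parent"

tools_parent=$(dirname /usr/local/bin)
echo "bin lives in $tools_parent"
```
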
Navigating directories
• ls to list contents, cd to change directory
• Directories are just like Windows folders
• Shortcut for your home directory (e.g. /home/mpalmer): ~
• “The directory above this one”: ..
• “This directory”: .

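The shortcuts above can be tried out safely with a sketch like the following. It assumes a throwaway directory created with mktemp, and the directory names "corpora" and "brown" are made up for the illustration; none of this comes from the course account.

```shell
# Practice in a throwaway directory so nothing in your real home is touched.
# The names "corpora" and "brown" are made up for this example.
scratch=$(mktemp -d)
mkdir -p "$scratch/corpora/brown"

cd "$scratch/corpora/brown"
here=$(basename "$PWD")        # name of the directory we are in now

cd .                           # "." is this directory, so nothing changes
same=$(basename "$PWD")

cd ..                          # ".." is the directory above this one
parent=$(basename "$PWD")
listing=$(ls)                  # contents of the current directory

home_dir=~                     # "~" expands to your home directory

echo "started in: $here, after 'cd .': $same, after 'cd ..': $parent"
echo "ls shows: $listing"
echo "home is: $home_dir"

cd /                           # step out before deleting the scratch directory
rm -rf "$scratch"
```
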
What's in the neighborhood?
• Type ls
• You see a list of the directories and files contained in the current directory:
Homework_1.txt tools buglog.txt

I'd like to go somewhere else…
• Type pwd
• Type cd
• Where are you?
• Type cd ..
• Where are you?
• Type cd your_user_id
• Where are you?

Unix is a verb-initial language
cd ..
• cd: "go"
• ..: where to go

Unix is a verb-initial language
cd
• cd: "go"
• no argument: I assume you mean "home"

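The shell takes "home" from the HOME environment variable, so the verb-alone behavior can be sketched without touching a real account. The example below is only an illustration: it points HOME at a scratch directory inside a subshell, so your actual setting is unchanged, and the directory name "tools" is borrowed from the sample listing purely as a placeholder.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/tools"

landed=$(
  HOME=$scratch        # for this subshell only: pretend the scratch directory is home
  cd "$scratch/tools"  # verb + argument: go to tools
  cd                   # verb alone: go home
  pwd
)

echo "cd with no argument landed in: $landed"
rm -rf "$scratch"
```
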
Making a new directory
• Type cd
• Type ls
• Type mkdir ling5200
• Type ls
• Go to the directory you just made (how?)
• Type pwd
• Type ls

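One way the exercise might play out is sketched below. It is run in a scratch directory standing in for your home directory (an assumption made so the example is safe to repeat), and it records what ls reports at each step.

```shell
scratch=$(mktemp -d)   # stand-in for your home directory
cd "$scratch"

before=$(ls)           # nothing here yet
mkdir ling5200
after=$(ls)            # the new directory now appears

cd ling5200            # go into the directory just made
where=$(basename "$PWD")
inside=$(ls)           # a brand-new directory starts out empty

echo "before: [$before] after: [$after] now in: $where, contents: [$inside]"

cd /                   # step out before deleting the scratch directory
rm -rf "$scratch"
```

On the course machine the same commands would be typed one at a time at the prompt, starting from your real home directory.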