Linguistics in an Age of Engineering Christopher Manning Depts of Computer Science and Linguistics http://www.stanford.edu/~manning/ manning@cs.stanford.edu
Roadmap • Philosophy • Sizing up the market • Problems and negatives • A hopeful future • Practical stuff • Size up the institution! • What to teach • What’s the aim
The market: expanding • Web companies (search, B2B e-commerce, web agents, etc.) • Lots of statistical/machine learning stuff • Speech companies • Surprising vertical opportunities (there seems to be a lot of demand for lexicons – medical, chemical) • “Core NLP” is now a rather small part of what’s currently out there
The need for marketing In the technical/recruiting area, linguistics isn’t sufficiently recognized: • engineers/recruiters don’t understand it • one often sees vague descriptions like “language/linguistics background”, “love of language” • The field needs to develop a product, and to be able to supply it – e.g., undergrads
Attitudes • Linguistics departments do not generally see themselves as training students for technical careers • Linguistics departments generally don’t train students for technical careers • More of the excitement in CL these days seems to be on the engineering side than the cog sci side • Maybe they could/should do more training?
Attitudes • Theoretical and applied physics are different, but by and large applied physicists think theoretical physicists do useful work • Problem: by and large “language engineering” people think theoretical linguistics is wrong/largely worthless • Truth is, the language engineering people aren’t completely wrong
Basis • (Especially in America) there has been a divorce between the concerns of theoretical linguistics and the empirical data on which language processing applications are built • The hottest theory is often the most contorted explanation of the smallest set of facts – sometimes facts that aren’t even right • I-language vs. E-language • Ideal speaker/hearer vs. speech communities
Directional problems • Best time for linguistics/CL interface: • c. 1988?: “Unification-based” feature grammars were at the height of their influence in both computational linguistics and linguistics • Worst time for linguistics/CL interface: • Around now? CL is increasingly dominated by statistical and other machine learning methods for which linguists lack training, interest, and appropriate math background
A hopeful future • Easily manipulable linguistic data is everywhere! • Solves the “linguistics of paucity of evidence” (J. Sinclair) of the 20th century • Computer competence is becoming more widespread • the statistics of the 21st century? • Exciting new and growing opportunities to apply work in areas like CL/machine learning to linguistic problems (lang acq, syntax, fieldwork).
A hopeful future • Resurgence of empirical linguistics • Hugely growing interest in doing empirical research using samples of real language • From many directions: not only traditional haunts like sociolx/lang acq., but also syntax, discourse, phonology, … • This usually means quantitative approaches and at any rate computation to support this • These people need CL theory too. Desperately.
Practical stuff There’s no one answer! 1: general atmosphere • Stanford: • 70% of undergrads complete CS106 (programming methodology, intro to C) • “Symbolic Systems Program” has a vital role • USyd: • Ling students nearly all without programming • CL classes have strongly bimodal enrollment
Internal and external links • Internal: • Empiricists are your friends (socio, phonetics) • At Stanford this is rapidly meaning everybody • External • Speech in EE/elsewhere • Learning/language processing in psychology • And, of course, CS departments • Extra-mural • companies in the local area are important
A tale of three cities • Carnegie Mellon: • comp ling program housed in a philosophy dept, mainly masters, mainly CS intake • University of Sydney: • linguistics dept, arts/sciences divide, shrinking opportunities in EE/CS, mainly arts, fieldwork • Stanford: • joint appointment; engineering/H&S division but permeable, SSP, pro-CL mind set
Institutional Context • The possibilities for CL depend more than one might think on the broader intellectual climate • Working to develop the right institutional context is difficult – perhaps even impossible – for an assistant professor • It’s probably easiest/best if you can find a place that already has what you want!
Stanford • 1/2 + 1/2 (or 1?) or is it 3 ling CL faculty • UG -> Masters • Key driver is SSP. Produces lots of students • Currently no independent masters • May start one up • PhD level: in past many drifted out of CL • Now changing. A lot of opportunities • R&D labs crucial (Xerox, SRI, startups, …)
Programming Languages The five defensible choices: • Prolog: call me old-fashioned, but… • C: base-level workhorse (Stanford?) • Java: modern OO (plus GUI, etc.) • Perl: scripting is accessible and practical • none: use packages and environments Perl may be the most successful language to teach in CL, but it can’t support an old-fashioned “parsing” course
Curriculum (Can’t really do in 1 slide) • Corpus-based/finite state/IR good starting point • Remember IR as well as speech! • Parsing, interpretation as 2nd course? • Coverage of modern statistical/corpus-based techniques is vital (jobs, interest, future) • Computational discourse, knowledge rep’n • Bridges: topics courses • E.g., OT learnability, HPSG grammar writing
Entry points • Can be quite diverse • From phonetics into speech • From empirical/field data into computational corpus linguistics • From syntactic theory into parsing • Lexicography into computational lexicography • Applied linguistics into CALL • For CL in linguistics to work, you need good connections to other departmental areas
Exit points • I think it is good to be clear on what one is aiming at…
NLP = NL Programming • Excellent UNIX, C/C++, Java, OOA/OOD experience desirable • C, C++ and/or Java is required. Perl, sed, awk, and similar tools is helpful • Proven programming skills in C, C++, and/or Perl are essential • PhD in speech, applied math, DSP, or NLP • Statistical language processing experience
The bigger part of the jobs • These people need a thorough CS background (don’t kid yourself) • You’ll get the odd enterprising kid appropriate for this anyway, but many more if there are interdisciplinary programs (Stanford: symbolic systems) • Connects to mainstream of academic NLP
(Computational) Linguists • Linguists … strong background in semantics, pragmatics, text analysis, or CL • Experience writing computational grammars essential. FST a plus. • Creating, updating and maintaining phonetic transcriptions. • Development of knowledge resources, building lexicons. • Knowledge of WordNet a plus
Smaller part of the jobs. Needs branding. • But there’s a reasonable number out there • A skill level that can easily be adjoined on to a linguistics program, even by one person • These people may not need to struggle through C(++) classes at all • Perl, other scripting (VBA?), general savvy, XML • Lexicon/grammar/morphology development
Conclusions • It can be depressing to see smart, capable students going off into a market without very many jobs for them. For those students who want to specialize in areas with other high-level job options, we should provide the skills and tools they need to get and succeed in those jobs • Moreover, I suspect a lot of the most scientifically interesting linguistics of the 21st century will centrally involve computational work