
Linguistics in an Age of Engineering

Linguistics in an Age of Engineering. Christopher Manning Depts of Computer Science and Linguistics http://www.stanford.edu/~manning/ manning@cs.stanford.edu. Roadmap. Philosophy Sizing up the market Problems and negatives A hopeful future Practical stuff Size up the institution!


Presentation Transcript


  1. Linguistics in an Age of Engineering Christopher Manning Depts of Computer Science and Linguistics http://www.stanford.edu/~manning/ manning@cs.stanford.edu

  2. Roadmap • Philosophy • Sizing up the market • Problems and negatives • A hopeful future • Practical stuff • Size up the institution! • What to teach • What’s the aim

  3. The market: expanding • Web companies (search, B2B e-commerce, web agents, etc.) • Lots of statistical/machine learning stuff • Speech companies • Surprising vertical opportunities (there seems to be a lot of demand for lexicons – medical, chemical) • “Core NLP” is now a rather small part of what’s currently out there

  4. The need for marketing In the technical/recruiting area, linguistics isn’t sufficiently recognized: • engineers/recruiters don’t understand it • one often sees vague descriptions like “language/linguistics background”, “love of language” • The field needs to develop a product, and to be able to supply it – e.g., undergrads

  5. Attitudes • Linguistics departments do not generally see themselves as training students for technical careers • Linguistics departments generally don’t train students for technical careers • More of the excitement in CL these days seems to be on the engineering side than the cog sci side • Maybe they could/should do more training?

  6. Attitudes • Theoretical and applied physics are different, but by and large applied physicists think theoretical physicists do useful work • Problem: by and large “language engineering” people think theoretical linguistics is wrong/largely worthless • Truth is, the language engineering people aren’t completely wrong

  7. Basis • (Especially in America) there has been a divorce between the concerns of theoretical linguistics and the empirical data on which language processing applications are built • The hottest theory is often the most contorted explanation of the smallest set of facts – sometimes facts that aren’t even right • I-language vs. E-language • Ideal speaker/hearer vs. speech communities

  8. Directional problems • Best time for linguistics/CL interface: • c. 1988?: “Unification-based” feature grammars were at the height of their influence in both computational linguistics and linguistics • Worst time for linguistics/CL interface: • Around now? CL is increasingly dominated by statistical and other machine learning methods for which linguists lack training, interest, and appropriate math background

  9. A hopeful future • Easily manipulable linguistic data is everywhere! • Solves the “linguistics of paucity of evidence” (J. Sinclair) of the 20th century • Computer competence is becoming more widespread • the statistics of the 21st century? • Exciting new and growing opportunities to apply work in areas like CL/machine learning to linguistic problems (lang acq, syntax, fieldwork).

  10. A hopeful future • Resurgence of empirical linguistics • Hugely growing interest in doing empirical research using samples of real language • From many directions: not only traditional haunts like sociolx/lang acq., but also syntax, discourse, phonology, … • This usually means quantitative approaches and at any rate computation to support this • These people need CL theory too. Desperately.

  11. Practical stuff There’s no one answer! 1: general atmosphere • Stanford: • 70% of undergrads complete CS106 (programming methodology, intro to C) • “Symbolic Systems Program” has a vital role • USyd: • Ling students nearly all without programming • CL classes have strongly bimodal enrollment

  12. Internal and external links • Internal: • Empiricists are your friends (socio, phonetics) • At Stanford this is rapidly meaning everybody • External • Speech in EE/elsewhere • Learning/language processing in psychology • And, of course, CS departments • Extra-mural • companies in the local area are important

  13. A tale of three cities • Carnegie Mellon: • comp ling program housed in a philosophy dept, mainly masters, mainly CS intake • University of Sydney: • linguistics dept, arts/sciences divide, shrinking opportunities in EE/CS, mainly arts, fieldwork • Stanford: • joint appointment; engineering/H&S division but permeable, SSP, pro-CL mind set

  14. Institutional Context • The possibilities for CL depend more than one might think on the broader intellectual climate • Working to develop the right institutional context is difficult – perhaps even impossible – for an assistant professor • It’s probably easiest/best if you can find a place that already has what you want!

  15. Stanford • 1/2 + 1/2 (or 1?) or is it 3 ling CL faculty • UG -> Masters • Key driver is SSP. Produces lots of students • Currently no independent masters • May start one up • PhD level: in past many drifted out of CL • Now changing. A lot of opportunities • R&D labs crucial (Xerox, SRI, startups, …)

  16. Programming Languages The five defensible choices: • Prolog: call me old-fashioned, but… • C: base-level workhorse (Stanford?) • Java: modern OO (plus GUI, etc.) • Perl: scripting is accessible and practical • none: use packages and environments Perl may be the most successful to teach in CL, but can’t do old-fashioned “parsing” course

  17. Curriculum (Can’t really do in 1 slide) • Corpus-based/finite state/IR good starting point • Remember IR as well as speech! • Parsing, interpretation as 2nd course? • Coverage of modern statistical/corpus-based techniques is vital (jobs, interest, future) • Computational discourse, knowledge rep’n • Bridges: topics courses • E.g., OT learnability, HPSG grammar writing

  18. Entry points • Can be quite diverse • From phonetics into speech • From empirical/field data into computational corpus linguistics • From syntactic theory into parsing • Lexicography into computational lexicography • Applied linguistics into CALL • For CL in linguistics to work, you need good connections to other departmental areas

  19. Exit points • I think it is good to be clear on what one is aiming at…

  20. NLP = NL Programming • Excellent UNIX, C/C++, Java, OOA/OOD experience desirable • C, C++ and/or Java is required. Perl, sed, awk, and similar tools is helpful • Proven programming skills in C, C++, and/or Perl are essential • PhD in speech, applied math, DSP, or NLP • Statistical language processing experience

  21. The bigger part of the jobs • These people need a thorough CS background (don’t kid yourself) • You’ll get the odd enterprising kid appropriate for this anyway, but many more if there are interdisciplinary programs (Stanford: symbolic systems) • Connects to mainstream of academic NLP

  22. (Computational) Linguists • Linguists … strong background in semantics, pragmatics, text analysis, or CL • Experience writing computational grammars essential. FST a plus. • Creating, updating and maintaining phonetic transcriptions. • Development of knowledge resources, building lexicons. • Knowledge of WordNet a plus

  23. Smaller part of the jobs. Needs branding. • But there’s a reasonable number out there • A skill level that can easily be adjoined on to a linguistics program, even by one person • These people may not need to struggle through C(++) classes at all • Perl, other scripting (VBA?), general savvy, XML • Lexicon/grammar/morphology development

  24. Conclusions • In a field in which it can be depressing to see smart, capable students going off into a market without very many jobs for them, we should give those students who want to specialize in areas with other high-level job options the skills and tools they need to get and succeed in those jobs • Moreover, I suspect a lot of the most scientifically interesting linguistics of the 21st century will centrally involve computational work
