The Rôle of Linguistics for the Future of Language Processing • Hans Uszkoreit • German Research Center for Artificial Intelligence • and Saarland University at Saarbruecken
Outline • The development of linguistics • Linguistics and the computer • The relevance of CL for theoretical linguistics • The role of linguistics for language technology • Current trends and outlook
IT in Science • Data-Gathering and Maintenance • automatic handling of large volumes of data • Scientific Computing • data and model visualization • data exploitation • simulation • modelling • Electronic scientific information • data on research (centers, people, resources, projects, literature) • Electronic scientific content • reports, articles, books, e-journals, e-print archives
Development of Linguistics • first half of 20th century: linguistics becomes concrete (structuralist linguistics): ontological concepts (entities and structures) • second half of 20th century: linguistics becomes formal (generative linguistics): formalisms for syntax and semantics • first half of 21st century: linguistics becomes empirical (empirical linguistics): quantitative models, graded grammaticality
The Rôle of Computation • formalization led to highly complex systems of formal rules, principles or constraints that cannot be tested, validated and modified without sophisticated information processing • language data of sufficient size cannot be gathered, searched, and maintained anymore without powerful computing
Empirical Linguistics • discrete findings • statistical findings • replicability • shared interpretations of data • connection with data and results
Empirical Linguistics (diagram labels) • introspective data • experimental psycholinguistic data • corpus data • DB of relevant data • research
Driving Forces of CL • Linguistics: linguistic theory • Engineering: language technology applications • Cognition: models of human language processing
Role of Computing in Linguistics (diagram labels) • theoretical linguistics • applied linguistics • linguistics w/o the computer • linguistics with the computer
Until 1980 • Diagram: Linguistics and Computational Linguistics
1980-1990 • Diagram: Linguistics and Computational Linguistics
1990-2000 • Diagram: Linguistics and Computational Linguistics
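LT Methods • Chart axes: non-discrete, discrete, hybrid; shallow, deep • Example on the chart: HMM-based POS Tagger
LT Methods • Chart axes: non-discrete, discrete, hybrid; shallow, deep • Example on the chart: HPSG Parser with MRS
LT Methods • Chart axes: non-discrete, discrete, hybrid; shallow, deep • Example on the chart: PCF Parser
LT Methods • Chart axes: non-discrete, discrete, hybrid; shallow, deep • Example on the chart: syntactic LFG parser with ME selection
LT Methods (Trends) • Chart axes: non-discrete, discrete, hybrid; shallow, deep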
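The HMM-based POS tagger named on the methods slides can be illustrated with a minimal Viterbi sketch. Everything concrete here is an assumption made for illustration: the tagset (`PN`, `V`, `Det`, `A`, `N`), the probability tables, and the smoothing constant are invented toy values, not taken from the talk or estimated from any corpus.

```python
import math

# Hypothetical toy tagset and made-up probabilities (illustration only).
TAGS = ["PN", "V", "Det", "A", "N"]

START = {"PN": 0.6, "Det": 0.3, "V": 0.05, "A": 0.03, "N": 0.02}

TRANS = {
    "PN": {"V": 0.6, "PN": 0.1, "Det": 0.2, "A": 0.05, "N": 0.05},
    "V": {"PN": 0.4, "Det": 0.4, "A": 0.1, "N": 0.05, "V": 0.05},
    "Det": {"A": 0.4, "N": 0.55, "PN": 0.02, "V": 0.02, "Det": 0.01},
    "A": {"N": 0.8, "A": 0.15, "PN": 0.02, "V": 0.02, "Det": 0.01},
    "N": {"V": 0.5, "N": 0.2, "Det": 0.1, "A": 0.1, "PN": 0.1},
}

EMIT = {
    "PN": {"sue": 0.5, "paul": 0.5},
    "V": {"gave": 1.0},
    "Det": {"an": 1.0},
    "A": {"old": 1.0},
    "N": {"penny": 0.9, "old": 0.1},
}

SMOOTH = 1e-6  # assumed floor probability for unseen (tag, word) pairs


def emit_logp(tag, word):
    """Log emission probability, with a small floor for unseen pairs."""
    return math.log(EMIT[tag].get(word, SMOOTH))


def viterbi(words):
    """Return the most probable tag sequence for a list of lowercase words."""
    # Each column maps tag -> (best log probability, backpointer to previous tag).
    columns = [{t: (math.log(START[t]) + emit_logp(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            score, prev = max(
                (columns[-1][p][0] + math.log(TRANS[p][tag]), p) for p in TAGS
            )
            column[tag] = (score + emit_logp(tag, word), prev)
        columns.append(column)

    # Follow backpointers from the best final tag.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi("sue gave paul an old penny".split()))
```

The tagger is shallow in the sense of the chart: it assigns one label per word and builds no phrase structure. It is non-discrete because the choice between `A` and `N` for "old" is made by comparing probabilities, not by a categorical rule.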
Simulation and Modelling • Phrase-structure tree for "Sue gave Paul an old penny.": [S [NP Sue] [VP [V gave] [NP Paul] [NP [Det an] [N [A old] [N penny]]]]]
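A small sketch of how such a phrase-structure tree can be held as a data structure. The bracketing is an assumption reconstructed from the node labels on the slide (S, VP, three NPs, V, Det, two Ns, A); the tuple encoding and the helper names `leaves` and `rules` are hypothetical choices for illustration.

```python
# Nested tuples of the form (label, child, child, ...); leaves are plain strings.
# The structure is an assumed reading of the labels shown on the slide.
TREE = (
    "S",
    ("NP", "Sue"),
    (
        "VP",
        ("V", "gave"),
        ("NP", "Paul"),
        ("NP", ("Det", "an"), ("N", ("A", "old"), ("N", "penny"))),
    ),
)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def rules(node):
    """Collect the (parent, children) rewrite rules used in the tree, top-down."""
    if isinstance(node, str):
        return []
    label, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    found = [(label, rhs)]
    for child in children:
        found.extend(rules(child))
    return found


print(" ".join(leaves(TREE)))
for lhs, rhs in rules(TREE):
    print(lhs, "->", " ".join(rhs))
```

Reading the rules off the tree gives the discrete grammar fragment that licenses the sentence, for example `S -> NP VP` and `N -> A N`.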
Parse trees for the German sentence "Sue gab Paul einen alten Pfennig." and the English sentence "Sue gave Paul an old penny." (node labels: S, S/NP, VP, NP, V, Det, N, A) • Semantic representation: ∃x[(old'(penny'))(x) ∧ (Past(give'(sue', paul', x)))]
APPLICATIONS • Machine Translation, e.g. Systran, Logos, METAL-Comprendium, IBM PT • Access to Databases, e.g. Core Language Engine • New: Information Extraction and Text Enrichment, e.g. WHITEBOARD, DEEP THOUGHT
Problems with Deep Analysis • Coverage (Development Time) • Robustness (Coping with Out-of-Grammar Input) • Efficiency (Runtime and Space Efficiency) • Specificity (Selection among Readings)
Outlook • Linguistics will develop hybrid discrete and nondiscrete models of language • More subareas of linguistics will employ computational modelling • Computational linguistics will play a central role in the empirical branch of linguistic research • Computational linguistics methods and results do have a future in language technology • Language technology will have to get more deeply into semantics • The field provides some grand challenges
Grand Challenges • hybrid models of language processing and learning • models of language change • empirical methodology of language science: large multilevel linguistically interpreted data collections • ambient computing: ubiquitous natural access to information and assistance • turning the WWW as well as personal and collective digital information repositories into digital memories and knowledge bases