Role of NLP in Linguistics
Dipti Misra Sharma
Language Technologies Research Centre, International Institute of Information Technology, Hyderabad, India
16-07-2010
NLP and Linguistics
• Have similar goals
  • Understanding human language(s)
• NLP relies on the theoretical models provided by linguistics
• Therefore, NLP definitely needs linguistics
What about linguistics? Does it benefit from NLP?
NLP is useful
• NLP tools can be useful for certain linguistic tasks such as
  • collecting, organizing and classifying data
  • providing statistics, etc.
• This saves effort and brings forth facts which help in making generalizations
Makes life easier for linguists
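As a minimal sketch of the "providing statistics" point above, the Python snippet below counts token frequencies over a tiny corpus. The function name `token_frequencies` and the whitespace tokenization are assumptions made for illustration; the two sentences are the transliterated Hindi examples used elsewhere in this talk.

```python
from collections import Counter

def token_frequencies(sentences):
    """Count how often each whitespace-separated token occurs."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: splitting on whitespace is an adequate tokenizer here.
        counts.update(sentence.lower().split())
    return counts

# Transliterated Hindi sentences taken from the examples in this talk.
corpus = [
    "dillii mein sthit qutub miinaar ek darshaniiy sthal hai",
    "unke dvaaraa kathit kahaaniyaan bahut pracalit hain",
]

freqs = token_frequencies(corpus)
print("tokens:", sum(freqs.values()))
print("distinct types:", len(freqs))
```

On a real corpus the same counts, grouped by category or context, are the kind of fact that can support or undermine a generalization.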
NLP and Linguistic Resources
• NLP techniques are useful for creating linguistic resources such as
  • verb frames, transfer grammars, bilingual lexicons, etc.
• Studies in computational linguistics (CL) have shown that NLP techniques are useful in historical linguistics as well (e.g. phylogenetic trees)
Thus, NLP is useful not only for data-related tasks but also for the creation of linguistic resources
What else?
• NLP researchers and linguists look at language from different perspectives
• NLP researchers look for solutions which provide higher coverage
  • exceptions can be dealt with later
• Linguistic researchers find exceptions more interesting
  • these help identify problem areas for the theory
However
• Resource creation for NLP (e.g. linguistic annotation) involves a close study of large-scale real data
• A close look at such data often raises linguistic issues which have theoretical implications
Our experience
• Hindi has a long list of lexical items which
  • are historically derived from Sanskrit verb roots
  • but are categorized as adjectives in Hindi
For example, ‘sthit’ (situated), ‘sviikrit’ (accepted), ‘sviikaarya’ (acceptable), ‘likhit’ (written), ‘kathit’ (told) ...
However
These ‘adjectives’ of Hindi have modifiers which have argument-like properties, both semantically and syntactically. For example:

dillii  mein  sthit     qutub  miinaar  ek   darshaniiy      sthal  hai
Delhi   in    situated  Qutub  Minar    one  worth-watching  place  is
‘Qutub Minar, situated in Delhi, is a place worth visiting’

unke  dvaaraa  kathit  kahaaniyaan  bahut  pracalit  hain
them  by       told    stories      very   popular   are
‘The stories told by them are very popular’
The issue (1/2)
• Both ‘dillii mein’ and ‘unke dvaaraa’ have appropriate case markers
  • ‘mein’ is locative and ‘dvaaraa’ agentive
• These adjectives are historically non-finite verbs
• However, Hindi grammars no longer treat them as such
• Nor can they be decomposed morphologically within Hindi
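The case-marker observation above can be sketched in a few lines of Python. This is an illustration only: the lookup table `CASE_MARKERS` contains just the two markers named on the slide, and the helper `find_case_marked_modifiers` is a hypothetical name, not part of any existing tool.

```python
# Assumption: a toy lookup table with only the two markers named on the slide.
CASE_MARKERS = {"mein": "locative", "dvaaraa": "agentive"}

def find_case_marked_modifiers(tokens):
    """Return (word, marker, case) triples for each known marker and the word before it."""
    found = []
    for i, tok in enumerate(tokens):
        if tok in CASE_MARKERS and i > 0:
            found.append((tokens[i - 1], tok, CASE_MARKERS[tok]))
    return found

print(find_case_marked_modifiers("dillii mein sthit qutub miinaar".split()))
print(find_case_marked_modifiers("unke dvaaraa kathit kahaaniyaan".split()))
```

A lookup like this only locates the case-marked modifiers; it does not explain why an ‘adjective’ licenses them, which is the issue raised here.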
The issue (2/2)
• Morphological decomposition of ‘sthit’ (situated) and ‘kathit’ (told) would lead to a Sanskrit analysis and NOT a Hindi analysis
• Hindi, for example, does not have ‘sthaa’ or ‘kath’ as verb roots
• It doesn’t have ‘ita’ as an active participial suffix either
How do we explain the argument-like properties of their modifiers?
What does it indicate?
• Linguists understand the relation, but not through a linguistic process of Hindi
• A linguistic process (or at least the roots and suffixes) from Sanskrit will have to be brought in
• Is it that languages have elements which are at different stages of development/evolution?
Another example
• Indian languages show frequent use of complex predicates
  Examples: pratiikshaa karnaa (wait do), kshamaa karnaa (forgive do)
• The problem: when is a noun-verb (NV) sequence a complex predicate, and when is it not?
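As a hedged sketch of how an NLP tool might surface such cases for a linguist to inspect, the Python snippet below collects adjacent word pairs whose second member is a light verb. The one-item list `LIGHT_VERBS` and the function `candidate_nv_sequences` are assumptions for illustration. Without part-of-speech information the first word is not guaranteed to be a noun, and the code only gathers candidates; it does not decide whether a pair is a complex predicate.

```python
# Assumption: a one-item light-verb list, taken from the examples above.
LIGHT_VERBS = {"karnaa"}

def candidate_nv_sequences(tokens):
    """Return adjacent (word, light_verb) pairs as candidate NV sequences."""
    return [
        (prev, tok)
        for prev, tok in zip(tokens, tokens[1:])
        if tok in LIGHT_VERBS
    ]

print(candidate_nv_sequences("pratiikshaa karnaa".split()))
print(candidate_nv_sequences("kshamaa karnaa".split()))
```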
Complex Predicates
• The problem has long been discussed in the linguistics literature
• Several diagnostics have also been proposed
• However, quite a few NV sequences are a single unit semantically, yet syntactically they fail the diagnostics
The question remains: do we consider such cases ‘complex verbs’ or instances of ‘verb + argument’?
Conclusions
• NLP tools and techniques can be useful for linguists
• NLP throws up rich examples which need to be handled
  • These pose challenges for the theory