DLSI Lexical Analysis
Prof Brook Wu and Ph.D. student Xin Chen
Lexical Analysis
• Focus on processing “text”
• Difficulties:
  • Word sense ambiguities, e.g., an ordinary “mouse” (the animal) vs. a computer “mouse”
  • Irregular word forms, e.g., “datum” (singular) vs. “data” (plural)
  • Part-of-speech tag ambiguities, e.g., an “offer” (noun) vs. “Prof Bieber offers …” (verb)
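To make the irregularity problem concrete, here is a minimal sketch of word normalization in Python. It is an illustration only, not the project's method: the `IRREGULAR` table, the `normalize` function, and the crude plural rule are all hypothetical. The point is that a simple suffix rule cannot map “data” to “datum”, so irregular forms need an explicit exception list.

```python
# Hypothetical exception table for irregular plurals (illustrative, not exhaustive).
IRREGULAR = {
    "data": "datum",
    "mice": "mouse",
    "criteria": "criterion",
}


def normalize(word):
    """Map a word to a rough base form: exceptions first, then a naive plural rule."""
    w = word.lower()
    if w in IRREGULAR:
        # Irregular forms cannot be derived by rule, so look them up.
        return IRREGULAR[w]
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        # Assumption: a trailing "s" marks a regular plural or verb ending.
        return w[:-1]
    return w


print(normalize("Data"))    # looked up in the exception table
print(normalize("offers"))  # handled by the naive suffix rule
```

Note that this only addresses irregular forms. Normalizing “offers” to “offer” does not tell us whether it is a noun or a verb, and nothing here separates the two senses of “mouse”; those ambiguities need part-of-speech tagging and context.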
Lexical Analysis in DLSI project
• Purpose: generate link anchors for important concepts in returned documents.
• Work involved:
  • Find glossaries/thesauri on the web or contact DLSI partners for information.
  • Organize them into a master file.
  • Find glossary/thesaurus terms in the text using lexical analysis techniques, including tokenization, part-of-speech tagging, parsing, and matching.
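The tokenization and matching steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the DLSI implementation: the `GLOSSARY` entries and link targets are made up, the function names are hypothetical, and part-of-speech tagging and parsing are left out entirely. It tokenizes the text with a regular expression and then greedily matches the longest glossary term starting at each token, returning character offsets where a link anchor could be placed.

```python
import re

# Hypothetical master glossary: lowercase term -> link target (made-up values).
GLOSSARY = {
    "lexical analysis": "glossary#lexical-analysis",
    "digital library": "glossary#digital-library",
    "thesaurus": "glossary#thesaurus",
}

# Word-like tokens, allowing internal hyphens (e.g. "part-of-speech").
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tokenize(text):
    """Return (token, start, end) triples for each word-like token."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def find_anchors(text, glossary=GLOSSARY):
    """Greedy longest-match of glossary terms over the token sequence.

    Returns (start, end, matched_text, target) tuples. Limitation of this
    sketch: tokens are joined with single spaces, so a match may span
    punctuation that separates the words in the original text.
    """
    tokens = tokenize(text)
    max_len = max((len(term.split()) for term in glossary), default=0)
    anchors = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok.lower() for tok, _, _ in tokens[i:i + n])
            if candidate in glossary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                anchors.append((start, end, text[start:end], glossary[candidate]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return anchors


text = "Lexical analysis helps a digital library link each thesaurus entry."
for start, end, surface, target in find_anchors(text):
    print(start, end, surface, target)
```

Trying the longest candidate first keeps a multi-word term such as “lexical analysis” from being split into shorter matches. A fuller version would normalize word forms before lookup and use part-of-speech tags to filter out matches in the wrong grammatical role.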
Qualifications and Supervision
• You should participate because text processing and lexical analysis are becoming popular: a great deal of rich information is available in text. Industry will want people who know how to process documents effectively.
• Qualifications:
  • Proficiency in Java or C++
• Supervision:
  • A team of up to 3 students will be supervised by Prof Wu but will mainly be led by Xin Chen, a Ph.D. candidate in IS.