Linguistic Analysis of Noisy Text Data: Working Group Notes
Summary
• Motivations for LA of NTD
• Sources of Complexity
• Perspective View
LT×NDA: Motivations
• NTD focuses on texts as the target units of meaningful information, which is central to any useful communication
• Linguistic Analysis is the suitable tool for managing the complexity of such information
• Linguistic abstractions are crucial for increasing the quality of NTD analysis:
    • Words
    • Patterns (e.g. n-grams, phrases, …)
    • Topics
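As an illustration of the "patterns" abstraction, here is a minimal sketch of word and character n-gram extraction. The function names (`word_ngrams`, `char_ngrams`, `shared_ngrams`) and the SMS-style sample are hypothetical, chosen only for this example; they do not come from the working group notes. The sketch assumes whitespace tokenization and lowercasing, which is a simplification for real noisy text.

```python
def word_ngrams(text, n):
    """Return the list of word n-grams (as tuples) from whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(word, n):
    """Return the list of character n-grams of a single string."""
    s = word.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def shared_ngrams(a, b, n=3):
    """Return the set of character n-grams that two strings have in common."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))


# Hypothetical SMS-style input, invented for illustration only.
sample = "c u 2morrow"
bigrams = word_ngrams(sample, 2)

# Character trigrams shared by a noisy form and its standard spelling.
overlap = shared_ngrams("2morrow", "tomorrow")
```

In this sketch, the noisy form "2morrow" and the standard form "tomorrow" still share most of their character trigrams, which is one reason sub-word patterns are often used when the word level is unreliable.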
LT×NDA: Complex Issues
• Sources of noise vary with respect to tasks/domains/applications
• Dynamics: NTD changes over time according to
    • Language changes
    • Noise changes
    • Social/Technological changes (e.g. SMS styles, new products/nicknames, verbs)
• Some domains and tasks require models for quite different forms of noise
    • Audio Typing vs. Dialectal expressions in interviews
    • Integration between authoritative information and noisy communication (medical domain: doctor vs. patient, written notes)
LT×NDA: Perspective View
• Please avoid local minima
• Underspecification: Good learning algorithms may be better than very rich features
• Examples in the WS proceedings