HLT Industry in the Netherlands
Piek Vossen
Faculteit der Letteren, Vrije Universiteit Amsterdam
Irion Technologies, Delft
Workshop HLT Collaboration SA & Low Countries
24-26 November 2008, Cape Town
NLT Text (33)
• Thesaurus based text processing: Collexis, GridLine, Knowledge Concepts
• Text mining: Textkernel, Irion
• Spelling: Polderland, *TALO
• Search: Irion, WiseGuys, Ilse, Intelligent
• Classification: Irion, Collexis, Textkernel
• Summarization: Carp Technologies
• User profiling, data mining: AskNow, Sentient Machine Research
NLT Text (33)
• Dialogue/Q&A: AskNow, Elitech, Q-go, Irion
• Lexicons: Van Dale
• Translation tools: Lingvistica, Topterm, Linguistic Systems
• Document & knowledge management: Getronics, CIBIT, AI Engineering, ZyLAB Europe, Niceware, Sopheon, Human Inference, LibRT
• Manual text complexity: Bureau Taal
• Manual language analysis, trends and politics: Kieskompas, Trendlight
• Medical language tools: Lexima, ViaTaal, inTaal
• Semantic web: LibRT
NLT Speech (18)
• !Effective (ASR): telephone applications, stock market
• Comsys (ASR, TTS): telephone applications, call centres
• Dedicon (TTS): spoken documents for disabled
• Dialogues Unlimited (ASR): telephone applications
• DutchEar (ASR, TTS): telephone applications, self services, colleague connect, stock market, traffic support, helpdesk support, speaker identification
• Fluency (TTS): text to synthetic speech
• FORUS-P (ASR): database management
• G2 Speech (ASR): dictating, work flow management, medical domain, legal domain
• Group 2000 (ASR): telephone applications
NLT Speech (18)
• Kompagne (ASR, TTS): medical domain
• Logica (ASR, TTS): contact & call centers
• ORCAvoice (ASR, TTS): telephone applications
• Philips (ASR, TTS): dictating, medical domain, legal domain
• Sound Intelligence (Sound): hearing aid, medical domain
• Telecats (ASR, TTS): telephone applications, information retrieval, messaging, routing, call handling and large platforms
• VoCognition (ASR, TTS): logistics of storage centres
• Voice Data Bridge (ASR, TTS): telephone applications, information retrieval, telecom operators
• YPCA (ASR): interactive services, automobile concepts
NLT-Text Collexis:
• http://www.collexis.com/
• Technology:
  • Fingerprints of documents using the knowledge residing in a thesaurus or multiple thesauri
  • Fingerprints from existing results used to generate new results with higher precision
  • Discovering the relationships between the elements of different content sources and uncovering unique information
• Application: Search, Knowledge management, Text mining
• Market: Government, Legal, Health science
• Projects & software
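To make the idea of a thesaurus-based document fingerprint concrete, here is a minimal sketch. It is not Collexis' actual method: the mini-thesaurus, the function names `fingerprint` and `similarity`, and the weighting scheme are all assumptions chosen for illustration. Each document is reduced to a normalized profile of thesaurus concepts, and two profiles are compared by their overlap.

```python
from collections import Counter

# Hypothetical mini-thesaurus: surface term -> preferred concept label.
THESAURUS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "aspirin": "acetylsalicylic acid",
    "acetylsalicylic acid": "acetylsalicylic acid",
}

def fingerprint(text, thesaurus=THESAURUS):
    """Return a dict mapping each matched concept to its relative weight."""
    lowered = text.lower()
    counts = Counter()
    for term, concept in thesaurus.items():
        # Naive substring counting; a real system would tokenize and lemmatize.
        counts[concept] += lowered.count(term)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {concept: n / total for concept, n in counts.items() if n}

def similarity(fp_a, fp_b):
    """Overlap of two fingerprints: sum of the shared weight per concept."""
    return sum(min(weight, fp_b.get(concept, 0.0)) for concept, weight in fp_a.items())
```

Because synonyms map to one concept, a document mentioning "heart attack" and one mentioning "myocardial infarction" end up with overlapping fingerprints even though they share no surface term.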
NLT-Text Gridline:
• http://www.gridline.nl/
• Technology: Semi-automatic development of thesauri and ontologies
• Application: Search, Authoring
• Market: Government, Law firms
• Projects & software
NLT-Text KnowledgeConcepts:
• http://www.knowledge-concepts.com/
• Technology:
  • Relation detection in text through use of semantic networks, thesauri and taxonomies
  • Part-Of-Speech taggers, lemmatisers, entity extractors, stopword lists, and language identifiers
• Application:
  • multilingual search
  • classification and analytical products
• Market:
  • Government, Banks, PTT, Publishers
• Projects & software
NLT-Text Polderland:
• http://www.polderland.biz/
• Technology:
  • spelling suggestion/correction
  • fuzzy matching
  • semantic expansion
• Application:
  • search
  • content and document management software
  • authoring
  • automatic classification and metadata extraction
• Market:
  • CRM systems
  • contact center software
  • publishing systems
  • SharePoint and portal-server systems
• Projects & software
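As an illustration of spelling suggestion through fuzzy matching, the following sketch uses Python's standard `difflib` module. It is not Polderland's technology: the function name `suggest`, the sample lexicon, and the cutoff value are assumptions made for this example.

```python
import difflib

def suggest(word, lexicon, n=3, cutoff=0.75):
    """Return up to n lexicon entries that closely resemble the given word."""
    lowered = word.lower()
    if lowered in lexicon:
        # Known word: nothing to correct.
        return [lowered]
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    return difflib.get_close_matches(lowered, lexicon, n=n, cutoff=cutoff)

# Hypothetical lexicon for demonstration only.
LEXICON = ["document", "dossier", "docent"]
```

Calling `suggest("documnet", LEXICON)` ranks "document" first, since only the transposed letters differ. A production speller would add frequency information and language-specific rules on top of such string similarity.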
NLT-Text Q-GO:
• http://www.q-go.nl/
• Technology:
  • question analysis and normalization
  • search
  • dialogue, Q&A
• Application:
  • Search through dialogue/Q&A
  • Online customer support
• Market:
  • Banks, Insurance companies
• Projects & software
NLT-Text Elitech:
• http://www.elitech.nl/
• Technology:
  • Question analysis, user profile
  • question-answer matching, answer database
• Application: multimodal Q&A, self-service
• Market: Railways, cities, Banks, Energy, Insurance, Travel agents, Telecom, Government
• Projects & software
NLT-Text TextKernel:
• http://www.textkernel.com/
• Technology:
  • memory-based learning (analogical or similarity-based reasoning)
  • text classification
  • string extraction (names, numbers, formulations, zip codes)
  • Hidden Markov Models, Decision Trees, Naive Bayes, SVMs, or Stochastic Grammars
• Application:
  • Text classification, Information extraction
• Market:
  • Recruiting, Tangram, WiseGuys
  • Cooperates with system integrators (e.g. Capgemini, WCC Search & Match, Connexys)
• Projects & software
NLT-Text Carp:
• http://www.carp-technologies.nl/nld/Home/
• Technology:
  • Parsing
  • Semantic network
• Application:
  • summarizers
  • search
  • anonymizer
  • text analysis
• Market:
  • local governments (province cities)
  • Department of Justice
• Projects & software
NLT-Text Irion:
• http://www.irion.nl/
• Technology:
  • statistical and phrase retrieval
  • text classification
  • language identification, taggers, grammars, word sense disambiguation (WSD)
  • information extraction
  • dialogue modelling
  • multilingual semantic networks, thesauri
• Application:
  • Text classification, text mining, cross-lingual retrieval, dialogue systems, language analysis for text complexity
• Market:
  • Governments (local and national)
  • Libraries
  • Publishers
• Projects & software
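[Figure: positioning chart of HLT companies. Labels recoverable from the slide:
• Technology levels, from Complex to Simple: Reason, Semantics, Syntax, Tagging, Lemmatize, Statistics
• Degree of automation: Manual, Co-training, Automatic
• Applications: Index, Decide, Search, Classification, Mining, Dialogue, Q&A, Analysis
• Companies plotted: ViaTaal, inTaal, KiesKompas, Bureau Taal, TrendLight, LibRT, Irion, Collexis, Carp, Q-go, Gridline, KnowledgeConcepts, TextKernel, Polderland, Autonomy, Fast, Endeca, Google, Microsoft, AskNow, WiseGuys]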
Discussion
• Is the technology mature enough?
• Long way from technology to software products > software development.
• More money from investors required
• Small company syndrome: sales & marketing
• Need for commercial software developers (salaries)
• Need for NLP developers
Some cases of failure
• Government departments & university/education libraries
• VWS bought Verity
  • cheap license but still expensive (100K and 30K maintenance per year)
  • does not work for:
    • diacritics
    • morphology
    • compounds
    • upper/lower case
  • expensive IT consultancy to investigate a solution: alternative search was not an option (no money & people for another integration)
  • classify text, index thesaurus labels, match queries to labels
• Many RFIs involving Autonomy, Verity, Fast and Irion
  • best system
  • too small to be trustworthy
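Two of the failure points listed above, diacritics and upper/lower case, can be illustrated with a small normalization sketch. It is an assumption-laden example, not a description of Verity or of any product named in these slides: the function names `normalize`, `build_index` and `search` and the sample documents are hypothetical. Morphology and compounds need language-specific analysis and are not covered here.

```python
import unicodedata

def normalize(term):
    """Fold case and strip diacritics, e.g. 'Coördinatie' -> 'coordinatie'."""
    decomposed = unicodedata.normalize("NFKD", term.casefold())
    # Drop combining marks (accents, diaereses) left after decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_index(docs):
    """Map each normalized token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index

def search(index, query):
    """Look up a single-word query after applying the same normalization."""
    return index.get(normalize(query), set())

# Hypothetical sample documents.
DOCS = {1: "Beleid voor coördinatie", 2: "CAFÉ regels"}
INDEX = build_index(DOCS)
```

Applying the same normalization at indexing time and at query time is the design point: a user typing "coordinatie" or "café" then finds documents 1 and 2 respectively, regardless of how the text was originally written.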