Widening the scope: Using language technologies in publishing systems • ICT PSP call identifier: CIP-ICT-PSP-2009-3 • Theme 5: Multilingual Web • 5.3 Multilingual Web content management - methods, tools and processes
The information today • A flood of multilingual and heterogeneous information • The challenge: this information must be processed and analyzed before it can be used efficiently
The information today • Increasing amount of multilingual and heterogeneous information
The information today • Widening the scope!
The Language Technologies (LT) • Computers process information; humans understand it. • Computers have limited capacity to understand information; humans have limited capacity to process it. • NLP technologies raise the level of understanding that computers can reach and thus increase human productivity.
Overview • The NLP technologies by examples • NLP in practice – the ATLAS project • Conclusions • Questions
NLP by examples (1) • Divide and Conquer • Grouping the information • By importance • Automatic text categorization • Politics (24) • Sports (5) • Entertainment (5) • Technologies (12) • Science (20) • Rumors (6) • Other (10)
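Automatic text categorization assigns each document to one of a fixed set of categories learned from labelled examples. The slides do not say which classifier ATLAS uses; the sketch below swaps in a multinomial Naive Bayes classifier, one common baseline, and the class name `NaiveBayesCategorizer`, the training sentences and the two categories are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial Naive Bayes categorizer with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of training docs
        self.vocabulary = set()

    def train(self, text: str, category: str) -> None:
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def classify(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log P(category) + sum over tokens of log P(token | category)
            score = math.log(self.doc_counts[category] / total_docs)
            for token in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


categorizer = NaiveBayesCategorizer()
categorizer.train("parliament votes on the new election law", "Politics")
categorizer.train("the minister met foreign diplomats", "Politics")
categorizer.train("the team won the football match", "Sports")
categorizer.train("the striker scored two goals in the match", "Sports")
print(categorizer.classify("the football team lost the match"))  # Sports
```

Working in log space avoids numeric underflow when a document contains many words.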
NLP by examples (1) • Divide and Conquer • Grouping the information: • By importance • Automatic categorization • Text clustering • Politics (24) International affairs (12), Conflicts (3), Terrorism (5), Nature and Environment (8), ... • Science (20) Math (2), Physics (5), Nature and Environment (3), NLP technologies (4), ... • Other (10) Money and Banks (3), Richard Branson (4), Learning materials (3), ...
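Clustering, unlike categorization, needs no predefined labels: it groups documents by similarity so that subtopics such as "Terrorism" or "Nature and Environment" can emerge inside a category. A minimal sketch, assuming bag-of-words vectors and cosine similarity, is spherical k-means seeded deterministically with the first k documents; the function names and headlines are hypothetical and not taken from ATLAS.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts for one document."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans_clusters(texts: list[str], k: int, max_iterations: int = 10) -> list[int]:
    """Return a cluster index for each text (spherical k-means on term counts)."""
    vectors = [vectorize(t) for t in texts]
    centroids = [Counter(v) for v in vectors[:k]]  # deterministic seeding
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each document joins its most similar centroid
        new_assignment = [
            max(range(k), key=lambda c: cosine(vector, centroids[c]))
            for vector in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        # Update step: the summed member vectors point in the same direction
        # as their mean, which is all that matters for cosine similarity
        for c in range(k):
            members = Counter()
            for vector, cluster in zip(vectors, assignment):
                if cluster == c:
                    members.update(vector)
            if members:
                centroids[c] = members
    return assignment


headlines = [
    "election results parliament",
    "earthquake flood climate",
    "parliament coalition election",
    "climate change flood warning",
]
print(kmeans_clusters(headlines, k=2))  # [0, 1, 0, 1]
```

Real systems would usually remove stop words and weight terms (for example with TF-IDF) before clustering; both are omitted here for brevity.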
NLP by examples (1) • Temporal dynamics • Before, Now, Tomorrow? • Politics (24 + 3) International affairs (10 -2), Conflicts (3), Terrorism (6 +1), Nature and Environment (10 +2), ...
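The "+3" and "-2" annotations compare topic counts between two time windows. Assuming the documents of each window have already been categorized, the change per topic is a simple difference of counts; `topic_changes` is a hypothetical helper, and the counts below are the ones shown on the slides.

```python
from collections import Counter


def topic_changes(previous: Counter, current: Counter) -> dict[str, tuple[int, int]]:
    """Map each topic to (current count, change since the previous window)."""
    topics = previous.keys() | current.keys()  # topics seen in either window
    # Counter returns 0 for missing topics, so new or vanished topics work too
    return {t: (current[t], current[t] - previous[t]) for t in sorted(topics)}


before = Counter({"International affairs": 12, "Conflicts": 3,
                  "Terrorism": 5, "Nature and Environment": 8})
now = Counter({"International affairs": 10, "Conflicts": 3,
               "Terrorism": 6, "Nature and Environment": 10})

for topic, (count, change) in topic_changes(before, now).items():
    print(f"{topic} ({count} {change:+d})")
```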
NLP by examples (2) • We do value your opinion! • Positive, negative or objective?
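Sorting opinions into positive, negative or objective is sentiment (polarity) classification. The simplest approach is lexicon-based: count positive and negative words and flip a word's polarity when it follows a negator. The word lists below are a hypothetical toy lexicon; production systems typically rely on much larger curated resources or trained models. Note that this sketch also labels a text whose positive and negative words cancel out as objective.

```python
import re

# Hypothetical toy lexicon, for illustration only
POSITIVE = {"good", "great", "excellent", "love", "innovative"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "boring"}
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> str:
    """Classify text as 'positive', 'negative' or 'objective'."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1 or 0
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "objective"


print(polarity("The new phone is great"))        # positive
print(polarity("The battery life is not good"))  # negative
```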
NLP by examples (3) • Salient excerpts • Persons • politicians, actors, scientists, fictional characters • Organizations and institutions • NATO, EU, BAS, Bank of England, Google, Apple, … • Geographical locations • Bulgaria, Sofia, EU, Western Europe, Tibet • Dates • Steven Paul Jobs [person] was born in San Francisco [city] on February 24, 1955 [date]
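Named-entity recognition tags spans of text with types such as person, organization, location and date. A trained statistical model is the usual tool; the sketch below instead swaps in the simplest possible approach, a gazetteer (a list of known names) plus a regular expression for English-style dates, so it can only find names it already knows. The gazetteer contents and the function name `extract_entities` are hypothetical.

```python
import re

# Hypothetical gazetteer; real systems learn to recognize unseen names
GAZETTEER = {
    "PERSON": ["Steven Paul Jobs"],
    "LOCATION": ["San Francisco", "Sofia", "Bulgaria", "Tibet"],
    "ORGANIZATION": ["NATO", "Bank of England", "Google", "Apple"],
}
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")  # e.g. "February 24, 1955"


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(rf"\b{re.escape(name)}\b", text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(span, label) for _, span, label in sorted(found)]


sentence = "Steven Paul Jobs was born in San Francisco on February 24, 1955"
print(extract_entities(sentence))
```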
NLP by examples (3) • Salient excerpts • Jobs was a demanding perfectionist who always aspired to position his businesses and their products at the forefront of the information technology industry by foreseeing and setting trends, at least in innovation and style ... • As of October 9, 2011, Jobs is listed as primary inventor related to a range of technologies from actual computer and portable devices to user interfaces ...
NLP by examples (4) • You might also be interested in this and that … • Suggestions for similar content • Based on the text • Based on persons, locations and dates • Based on key concepts and ideas • Based on genre and fictional characters • Cross-lingual information retrieval
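Suggesting similar content based on the text can be sketched as ranking documents by the cosine similarity of their TF-IDF vectors, so that words shared by few documents count for more than common ones. This is one standard technique, not necessarily the one ATLAS uses; the function names and sample articles are hypothetical, and the smoothed IDF formula log((1 + n) / (1 + df)) + 1 is an assumed choice.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    return [
        {term: count * (math.log((1 + n) / (1 + df[term])) + 1)
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs: list[str], index: int, top_n: int = 3) -> list[int]:
    """Indices of the top_n documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scored = [(cosine(vectors[index], v), i)
              for i, v in enumerate(vectors) if i != index]
    return [i for _, i in sorted(scored, reverse=True)[:top_n]]


articles = [
    "Steve Jobs founded Apple in California",
    "Apple launched a new iPhone designed by Jobs",
    "The central bank raised interest rates",
    "Interest rates and inflation worry the bank",
]
print(most_similar(articles, 0, top_n=1))  # [1]
```

The same ranking works on other features, such as the extracted persons, locations and dates, by building the vectors from entity lists instead of words.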
NLP by examples (5) • Machine translation • Text summarization • Of a single document • Of a collection of documents
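Text summarization can be extractive (selecting the most representative sentences) or abstractive (generating new text). A classic extractive baseline scores each sentence by the average frequency of its content words across the whole text and keeps the top-scoring sentences in their original order. The slides do not state which summarization method ATLAS applies; the stop-word list and the function name `summarize` are hypothetical, and averaging (rather than summing) is chosen so long sentences are not automatically favored.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list
STOPWORDS = {"a", "an", "and", "as", "at", "by", "in", "is",
             "of", "on", "the", "to", "was", "who"}


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[^\W\d_]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 1) -> str:
    """Extractive summary: keep the highest-scoring sentences in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)


text = ("ATLAS processes multilingual content. "
        "ATLAS extracts key phrases from multilingual content. "
        "The weather was nice.")
print(summarize(text))  # ATLAS processes multilingual content.
```

Passing the concatenated text of several documents gives a crude summary of a collection.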
NLP in practice – ATLAS project • ATLAS – a multilingual content management system that harnesses NLP technologies • Supported languages: Bulgarian, English, German, Polish, Romanian and Greek. www.atlasproject.eu • Using ATLAS • Software-as-a-service: http://i-publisher.atlasproject.eu • API for integration with third-party systems • ATLAS extracts and provides • Key phrases and named entities • A list of similar documents • Automatic categorization and a text summary • Machine translation
The ATLAS project • ICT PSP project • ATLAS consortium: • Coordinator: Tetracom Interactive Solutions – Bulgaria • DFKI - Deutsches Forschungszentrum für Künstliche Intelligenz GmbH – Germany • Atlantis Consulting SA – Greece • Institute for Bulgarian Language “Professor Lyubomir Andreychin” at the Bulgarian Academy of Sciences – Bulgaria • Instytut Podstaw Informatyki Polskiej Akademii Nauk – Poland • Universität Hamburg – Germany • Universitatea Alexandru Ioan Cuza – Romania • Sveučilište u Zadru – Croatia • ITD - Institute of Technologies and Development – Bulgaria • Project duration • 3 years, starting 1 March 2010
Conclusion? • What are NLP technologies? • They provide a way to harness the computational power of computers for a better understanding of information • What can they be used for? • Handling the growing amount of multilingual information more effectively • Who can use these technologies? • Libraries • Publishing houses • Media • Online bookstores • Lawyers • Banks, companies and organizations