CS Real-time Multilingual Classification

The client is a major telecom company providing nationwide telecom services. They wanted a system that performs real-time, multi-lingual classification and sentiment analysis of text data.

https://www.streamanalytix.com

Presentation Transcript


  1. Case Study
     Industry: Telecommunications
     Real-time Multi-lingual Classification and Sentiment Analysis of Text

     Client Overview
     The client is a major telecom company providing nationwide telecom services. They wanted a system that performs real-time, multi-lingual classification and sentiment analysis of text data.

     Highlights and Benefits
     • Rapid and accurate real-time text categorization and sentiment analysis
     • Adjustable text categorization for domain-specific classes

     Challenges
     The client was looking for a solution that allows storing, indexing, and querying petabytes (PB) of data with very high throughput. Some of the critical requirements were:
     • Multi-lingual support
     • Ingest and parse a high volume of data [250M records (15 TB) per day] of varied types (for example, weblogs, email, chat, and files)
     • Enhanced sentiment analysis to focus on feature-specific opinion mining
     • Apply real-time multi-lingual classification and sentiment analysis with very high accuracy (four nines)
     • Linear scalability to increase the number of nodes in the cluster
     • Store metadata and raw binary data for querying
     • Query SLA: 5 s on cold data
     • Provision to add custom components for added functionality
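To put the ingest and accuracy requirements above in perspective, the stated daily figures can be converted into rough per-second rates. The following is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes), a load spread evenly across 24 hours, and that "four nines" means 99.99% of records classified correctly. None of these assumptions is stated in the case study, and real peak rates would likely be higher than the averages.

```python
# Back-of-the-envelope sizing for the stated ingest and accuracy targets.
# Assumptions (not from the case study): decimal units (1 TB = 10**12 bytes),
# load spread evenly over 24 hours, "four nines" read as 99.99% accuracy.

RECORDS_PER_DAY = 250_000_000      # 250M records/day, from the requirements
BYTES_PER_DAY = 15 * 10**12        # 15 TB/day, from the requirements
SECONDS_PER_DAY = 24 * 60 * 60
ACCURACY = 0.9999                  # assumed meaning of "four nines"

# Average sustained rates implied by the daily totals.
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
avg_record_bytes = BYTES_PER_DAY / RECORDS_PER_DAY
megabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**6

# Number of records per day that may be misclassified within the target.
max_errors_per_day = round(RECORDS_PER_DAY * (1 - ACCURACY))

print(f"records/s (average):   {records_per_second:,.0f}")
print(f"average record size:   {avg_record_bytes:,.0f} bytes")
print(f"throughput (average):  {megabytes_per_second:,.1f} MB/s")
print(f"error budget per day:  {max_errors_per_day:,} records")
```

Under these assumptions the cluster must sustain roughly 2,900 records per second at about 60 KB per record (around 174 MB/s), and at most 25,000 of the 250 million daily records may be misclassified.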

  2. Our Solution
     The solution provided by Impetus had three modules:
     • Analytics Module: Responsible for performing text categorization and sentiment analysis. It implements a matrix decomposition-based text-classification algorithm. Each incoming test document had to pass through a series of pre-processing and numerical computation steps. Impetus designed the classifier to achieve very low latency.
     • Event Store/Indexer Abstraction Layer: Responsible for storing and indexing the information based on the configuration.
     • Publish Module: Responsible for publishing the analytical result or event data to the external system.

     Technologies
     R, Latent Semantic Analysis, Text Mining, Apache Kafka, Apache HBase, Elasticsearch, Apache Storm

     Figure 1: Solution Architecture (diagram not reproduced; its labels are Client System, Pull Event Data, Processing & Analytics Layer, Event Parsing/Processing, Analytics Module, Event Store+Indexer Module, Publish Module, Store Processed Events, Publish Analytics Results, Publish Event Data, Persistence Store, Persistence Layer, Event Store/Indexer Abstraction Layer, Event Store, Event Indexer)

     © 2014 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. August 2014

     StreamAnalytix, a product of Impetus Technologies, enables enterprises to analyze and respond to events in real-time at Big Data scale. Based on Apache Storm, StreamAnalytix is designed to rapidly build and deploy streaming analytics applications for any industry vertical, any data format, and any use case. Visit www.streamanalytix.com or write to us at inquiry@streamanalytix.com
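The case study does not describe the Analytics Module's classifier beyond calling it matrix decomposition-based. One common technique of that kind is Latent Semantic Analysis (LSA): project term-count vectors onto the leading singular vectors of a term-document matrix and classify by similarity in that reduced space. The sketch below illustrates that general idea only and is not the Impetus implementation. It uses plain Python rather than R, obtains the singular vectors by power iteration to avoid third-party libraries, and every function name, the toy corpus, and the labels are hypothetical. It handles neither sentiment nor multiple languages.

```python
import math

def tokenize(text):
    return text.lower().split()

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_vector(text, vocab):
    """Term-count vector; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def top_eigenvectors(matrix, k, iterations=300):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    m = [row[:] for row in matrix]
    n = len(m)
    basis = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]       # deterministic start vector
        start_norm = math.sqrt(dot(v, v))
        v = [x / start_norm for x in v]
        for _ in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in m])
        for i in range(n):                         # deflate the found component
            for j in range(n):
                m[i][j] -= eigenvalue * v[i] * v[j]
        basis.append(v)
    return basis

def project(vec, basis):
    return [dot(vec, b) for b in basis]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def train(docs, labels, k=2):
    terms = sorted({t for d in docs for t in tokenize(d)})
    vocab = {t: i for i, t in enumerate(terms)}
    vectors = [to_vector(d, vocab) for d in docs]
    n = len(vocab)
    # Term-term matrix A * A^T; its top eigenvectors are the left
    # singular vectors of the term-document matrix A.
    gram = [[sum(v[i] * v[j] for v in vectors) for j in range(n)] for i in range(n)]
    basis = top_eigenvectors(gram, k)
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(project(vec, basis))
    centroids = {label: [sum(col) / len(points) for col in zip(*points)]
                 for label, points in grouped.items()}
    return vocab, basis, centroids

def classify(text, model):
    vocab, basis, centroids = model
    point = project(to_vector(text, vocab), basis)
    return max(centroids, key=lambda label: cosine(point, centroids[label]))

# Hypothetical toy corpus, for illustration only.
docs = ["bill charge refund", "bill refund charge payment",
        "signal network outage", "network signal drop"]
labels = ["billing", "billing", "network", "network"]
model = train(docs, labels, k=2)
print(classify("refund my bill", model))
print(classify("network outage", model))
```

Because the decomposition is computed once at training time, classifying a new document reduces to a few dot products, which is one reason approaches of this kind can suit low-latency pipelines.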
