The SPEED framework uses semantic technology to detect economic events in news sources in a timely and accurate manner. It processes unstructured data to extract financial information in real time, and its pipeline leverages domain knowledge for event recognition.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011)
Introduction (1)
• News greatly impacts financial markets
• Some of many recent examples:

Google buys Motorola Mobility for $12.5B
(VentureBeat) - This morning, Google announced that it will buy Motorola Mobility, Moto’s mobile device arm, for $12.5 billion. Google will acquire Motorola Mobility for $40 per share in cash, a 63 percent premium over the company’s Friday closing price. Google says it will run Motorola Mobility as a separate business. Motorola spun off its business into two divisions last year, Mobility and Solutions (the data and telecom portion), as a response to declining profits. Google shares were down around 1.5 percent, while Motorola Mobility’s stock jumped 57 percent. The company says Motorola Android phones won’t be receiving any special treatment as a consequence of the deal, but that’s a tough nut to swallow, since Google often plays favorites.

Steve Jobs resigns from Apple, Cook becomes CEO
(Reuters) - On Wednesday, Silicon Valley legend Steve Jobs resigned as chief executive of Apple Inc in a stunning move that ended his 14-year reign at the technology giant he co-founded in a garage. Apple shares dived as much as 7 percent in after-hours trade after the pancreatic cancer survivor and industry icon, who has been on medical leave for an undisclosed condition since January 17, announced he will be replaced by COO and longtime heir apparent Tim Cook.
Introduction (2)
• It is important to identify economic events in news items automatically, accurately, and in a timely manner
• This involves processing large amounts of unstructured data from heterogeneous sources
• Domain-specific information captured in domain semantics facilitates detection of relevant concepts
Introduction (3)
• SPEED: a Semantics-based Pipeline for Economic Event Detection
• Our approach:
  • Extracts financial events from emerging news (RSS feeds)
  • Annotates news messages with meta-data
  • Aims for fast processing in order to enable real-time use
SPEED: Framework
SPEED: Implementation (1)
• Java-based pipeline using the General Architecture for Text Engineering (GATE)
• GATE components used:
  • English Tokenizer
  • Sentence Splitter
  • Part-Of-Speech Tagger
  • Morphological Analyzer
• Adaptations and additions required:
  • Word Sense Disambiguation
  • Ontology-based components
SPEED: Implementation (2)
• Ontology Gazetteer:
  • GATE uses an inefficient list of ontology concepts
  • We employ a look-up tree based on hash maps
• Word Group Look-Up:
  • Tree-based approach using WordNet
• Word Sense Disambiguator:
  • Adaptation of the Structural Semantic Interconnections (SSI) algorithm
• Event Phrase Gazetteer:
  • Matches event concepts
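The look-up tree mentioned for the Ontology Gazetteer can be illustrated with a minimal Python sketch. It is not the SPEED implementation (which is Java-based and built on GATE); it only assumes that concept labels are split into tokens and stored in nested hash maps, so that multi-word labels are matched token by token and the longest match is preferred. The class names, the example labels, and the concept names `Company` and `CEO` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class LookupNode:
    """One tree node; children are stored in a hash map keyed by token."""

    def __init__(self) -> None:
        self.children: Dict[str, "LookupNode"] = {}
        self.concept: Optional[str] = None  # set when a complete label ends here


class ConceptLookupTree:
    """Hypothetical gazetteer: maps (multi-word) labels to concept names."""

    def __init__(self) -> None:
        self.root = LookupNode()

    def add(self, label: str, concept: str) -> None:
        node = self.root
        for token in label.lower().split():
            node = node.children.setdefault(token, LookupNode())
        node.concept = concept

    def annotate(self, tokens: List[str]) -> List[Tuple[int, int, str]]:
        """Return (start, end, concept) spans, preferring the longest match."""
        spans: List[Tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            node, j, best = self.root, i, None
            # Walk down the tree for as long as the next token is a known child.
            while j < len(tokens) and tokens[j].lower() in node.children:
                node = node.children[tokens[j].lower()]
                j += 1
                if node.concept is not None:
                    best = (i, j, node.concept)
            if best is not None:
                spans.append(best)
                i = best[1]  # continue after the matched span
            else:
                i += 1
        return spans


tree = ConceptLookupTree()
tree.add("Google", "Company")              # illustrative entries only
tree.add("Motorola Mobility", "Company")
tree.add("chief executive", "CEO")

tokens = "Google will acquire Motorola Mobility".split()
matches = tree.annotate(tokens)
print(matches)
```

Each token costs one hash-map access per tree level, so the work per token depends on the length of the longest label rather than on the number of concepts, which is the usual motivation for replacing a flat list with such a tree.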
SPEED: Implementation (3)
• Event Pattern Recognition:
  • Based on GATE Rule Transducer, utilizing JAPE patterns
  • Additionally operates on event concepts
• Ontology Instantiator:
  • Retrieves event annotations in text
  • Creates event individuals in ontology
  • Updates affected concepts
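The two steps on this slide can be sketched roughly as follows. The sketch is plain Python, not JAPE, and does not use the GATE API: it assumes that earlier components have already produced a sequence of typed annotations, matches a single hypothetical pattern (subject concept, event phrase, object concept) against that sequence, and stores each match as an individual in a dictionary standing in for the ontology. The annotation types, the pattern, and the naming scheme for individuals are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    text: str
    kind: str  # hypothetical annotation type, e.g. "Company" or "BuyEvent"


# Hypothetical pattern: subject concept, event phrase, object concept.
PATTERNS: Dict[str, Tuple[str, ...]] = {
    "Acquisition": ("Company", "BuyEvent", "Company"),
}


def recognize_events(annotations: List[Annotation]) -> List[dict]:
    """Slide each pattern over the annotation sequence and collect matches."""
    events = []
    for event_type, pattern in PATTERNS.items():
        width = len(pattern)
        for start in range(len(annotations) - width + 1):
            window = annotations[start:start + width]
            if all(a.kind == k for a, k in zip(window, pattern)):
                events.append({
                    "type": event_type,
                    "subject": window[0].text,
                    "object": window[-1].text,
                })
    return events


def instantiate(events: List[dict], ontology: Dict[str, dict]) -> Dict[str, dict]:
    """Create one individual per event; the dict is a stand-in for an ontology."""
    for event in events:
        individual = f"{event['type']}_{len(ontology) + 1}"  # assumed naming scheme
        ontology[individual] = event
    return ontology


annotations = [
    Annotation("Google", "Company"),
    Annotation("buys", "BuyEvent"),
    Annotation("Motorola Mobility", "Company"),
]
events = recognize_events(annotations)
ontology = instantiate(events, {})
print(ontology)
```

A real ontology instantiator would also update the affected concepts (for example, the subsidiaries of the acquiring company); that step is omitted here because the slides do not describe how it is done.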
Evaluation (1)
• Word Sense Disambiguator:
  • Evaluated on SemCor
  • Original SSI: precision 53%, recall 31%
  • Adapted SSI: precision 59%, recall 59%
• Entire framework:
  • Evaluated on 200 news messages from Yahoo! Business & Technology feeds, annotated by three domain experts (with an inter-annotator agreement, IAA, of 66% or higher) for 10 events regarding:
    • CEOs (60)
    • Partners (23)
    • Revenues (22)
    • Presidents (22)
    • Subsidiaries (46)
    • Profits (33)
    • Products (136)
    • Share values (45)
    • Losses (27)
    • Competitors (50)
  • Event instances: precision 86%, recall 81%
  • Fully decorated events: precision 62%, recall 53%
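To compare the precision and recall pairs above on a single scale, one option is the F1 score, the harmonic mean of precision and recall. The slides do not report F1; the values computed below are derived here from the reported percentages and are only as exact as those rounded figures. The dictionary keys are labels chosen for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported on the slide.
reported = {
    "original SSI": (0.53, 0.31),
    "adapted SSI": (0.59, 0.59),
    "event instances": (0.86, 0.81),
    "fully decorated events": (0.62, 0.53),
}

f1 = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
```

Under this measure the adapted SSI improves on the original mainly through its much higher recall, and fully decorated events remain clearly harder to detect than bare event instances.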
Evaluation (2)
• Latency:
  • Total pipeline: 632 milliseconds per document
  • Linguistic and syntactic analysis: 30%
  • Word Sense Disambiguation: 60%
  • Remaining tasks: 10%
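The percentages above translate into approximate per-stage times as follows. This is plain arithmetic on the reported figures; the throughput estimate additionally assumes that documents are processed strictly one after another on a single pipeline, which the slides do not state.

```python
TOTAL_MS = 632  # reported total latency per document, in milliseconds

# Reported share of the total latency per group of tasks.
SHARES = {
    "linguistic and syntactic analysis": 0.30,
    "word sense disambiguation": 0.60,
    "remaining tasks": 0.10,
}

# Approximate time per group, rounded to one decimal place.
breakdown_ms = {stage: round(TOTAL_MS * share, 1) for stage, share in SHARES.items()}

# Assumption: sequential processing, one document at a time.
docs_per_second = 1000 / TOTAL_MS

for stage, ms in breakdown_ms.items():
    print(f"{stage}: about {ms} ms")
print(f"throughput: about {docs_per_second:.2f} documents per second")
```

Word sense disambiguation alone accounts for roughly 380 ms of the 632 ms, so it is the most promising place to look for further speed-ups.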
Conclusions
• SPEED framework:
  • Components are semantically enabled
  • Pipeline outputs are ontology instances
  • Adapted SSI algorithm
  • Evaluation underlines fast and accurate performance
• Future work:
  • Applications in algorithmic trading
  • Linking sentiment to discovered events (e.g., trends, moods, opinions)
Questions