Technical Overview and Challenges. Achim Klein, University of Hohenheim. 1st Review Meeting, Luxembourg, 30 November 2011. Major Expected Outcomes: a financial market information system providing new insights and improved decision making with respect to three challenging use cases.
Technical Overview and Challenges
Achim Klein, University of Hohenheim
1st Review Meeting, Luxembourg, 30 November 2011
Major Expected Outcomes
• Financial market information system, providing
  • new insights
  • improved decision making
  • with respect to three challenging use cases
• Real-time and scalable pipeline for
  • financial unstructured data acquisition,
  • information extraction,
  • sentiment analysis,
  • information integration,
  • visualization and decision-support models
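The pipeline stages listed above can be pictured as chained stream transformations. The following is only a minimal sketch, not the project's architecture: the `Document` type, the stage functions, and the tiny word lists are hypothetical placeholders, and the integration and decision-support stages are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

@dataclass
class Document:
    """Hypothetical container for one acquired text plus its annotations."""
    text: str
    annotations: dict = field(default_factory=dict)

Stage = Callable[[Iterator[Document]], Iterator[Document]]

def acquire(raw_texts: Iterable[str]) -> Iterator[Document]:
    # Data acquisition: wrap each raw text in a Document.
    for text in raw_texts:
        yield Document(text=text.strip())

def extract_entities(docs: Iterator[Document]) -> Iterator[Document]:
    # Placeholder information extraction: capitalised tokens as candidate entities.
    for doc in docs:
        doc.annotations["entities"] = [t for t in doc.text.split() if t[:1].isupper()]
        yield doc

def score_sentiment(docs: Iterator[Document]) -> Iterator[Document]:
    # Placeholder sentiment analysis with an invented two-word lexicon per polarity.
    positive, negative = {"strong", "gains"}, {"weak", "losses"}
    for doc in docs:
        tokens = doc.text.lower().split()
        doc.annotations["sentiment"] = (
            sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
        )
        yield doc

def run_pipeline(raw_texts: Iterable[str], stages: List[Stage]) -> Iterator[Document]:
    # Chain the generator stages so each document flows through one at a time.
    stream = acquire(raw_texts)
    for stage in stages:
        stream = stage(stream)
    return stream

results = list(run_pipeline(
    ["Acme reports strong gains", "Globex posts weak results"],
    [extract_entities, score_sentiment],
))
```

Because every stage is a generator, documents are processed as they arrive rather than in batches, which is the property a real-time pipeline needs.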
Main Innovations and Challenges
Innovations
1. Structured → unstructured data
   • Noise and uncertainty
2. Offline processing → real-time stream
   • Ontology evolution, extraction, analysis
   • Online decision-support models, visualization
3. Small → vast amounts of data
4. Financial decision-support models based on high-level features
Challenges
• Accuracy
• Time efficiency
• Throughput
• Usefulness
Compass slide (project overview diagram)
• Diagram labels: Architecture, Integration & Scaling Strategy; UC#1 Market Surveillance; UC#2 Reputational Risk Management; UC#3 Online Retail Brokerage; Domain-independent GUI (Open Source); Data Acquisition; Ontology Infrastructure; Information Extraction; Sentiment Analysis; Decision Support Infrastructure; Information Integration; Data, Information & Knowledge Base; Management; Dissemination & Exploitation
• Work-package labels: WP2 & WP7, WP1 & WP8, WP3, WP4, WP5, WP6, WP9, WP10
Data Acquisition
• Main objectives
  • Large-scale acquisition of unstructured data
  • Uniform access to streams
  • Initial noise handling
• Main challenges
  • Web data clean-up and duplicate detection
  • Scalability
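Duplicate detection, named above as a challenge, can be illustrated with a minimal sketch. It is not the project's method: it assumes a hash of the normalized text for exact duplicates and Jaccard similarity over word 3-grams for near-duplicates, and the function names and the 0.8 threshold are illustrative.

```python
import hashlib
import re

def normalize(text):
    # Lower-case and collapse whitespace so trivial variations compare equal.
    return re.sub(r"\s+", " ", text.lower()).strip()

def fingerprint(text):
    # Hash of the normalized text; identical fingerprints mean exact duplicates.
    return hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()

def shingles(text, k=3):
    # Set of overlapping k-word sequences; short texts yield a single shingle.
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # Share of shingles the two texts have in common.
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

def deduplicate(texts, threshold=0.8):
    # Keep a text only if it is neither an exact nor a near duplicate of a kept one.
    seen_hashes, kept = set(), []
    for text in texts:
        digest = fingerprint(text)
        if digest in seen_hashes:
            continue
        if any(jaccard(text, other) >= threshold for other in kept):
            continue
        seen_hashes.add(digest)
        kept.append(text)
    return kept

docs = [
    "Shares of Acme rose sharply today",
    "shares of  ACME rose sharply today",
    "Shares of Acme rose sharply today after earnings",
    "Bond yields fell",
]
unique = deduplicate(docs)
```

The pairwise comparison is quadratic in the number of kept documents, so at the volumes discussed here it would have to be replaced by an approximate scheme such as MinHash with locality-sensitive hashing.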
Ontology Infrastructure
• Main objectives
  • Provide financial domain ontology for information extraction tasks
• Main challenges
  • (Semi-)automatic construction and evolution of ontology and word lists
Information Extraction
• Main objectives
  • Natural language pre-processing
  • Extraction of named entities
  • Topic classification
• Main challenges
  • Training data for topics
Sentiment Analysis
• Main objectives
  • Extract sentiments with respect to use-case-specific sentiment objects' features
• Main challenges
  • Accuracy
  • Time efficiency
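To make "sentiment with respect to an object's features" concrete, here is a toy lexicon-based sketch. It is not the project's extraction approach: the feature list, the polarity lexicon, and the three-token window are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical resources; a real system would derive these from an ontology.
FEATURES = {"earnings", "management"}
POLARITY = {"strong": 1, "improved": 1, "weak": -1, "poor": -1}

def feature_sentiment(text, window=3):
    """Sum polarity words found within `window` tokens of each feature mention."""
    scores = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            if token in FEATURES:
                nearby = tokens[max(0, i - window): i + window + 1]
                scores[token] += sum(POLARITY.get(t, 0) for t in nearby)
    return dict(scores)

example = feature_sentiment(
    "Acme earnings were strong. The management looked weak and poor."
)
```

In the example, "poor" lies outside the window around "management", so only "weak" counts toward that feature. The window size therefore trades coverage against attributing sentiment to the wrong feature.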
Decision Support Infrastructure
• Main objectives
  • Provide event detection and prediction
  • Machine learning and qualitative models based on high-level features
  • Advanced real-time visualization
• Main challenges
  • Usefulness for decision makers
  • Time efficiency
Information Integration
• Main objectives
  • Storage of acquired and extracted data
  • Integration of existing structured data
  • Uniform access
• Main challenges
  • Heterogeneity of information and storage systems
  • Throughput
Architecture and Integration
• Main objectives
  • Scalable architecture
  • Integration of pipeline components
  • Integrated financial market information system
• Main challenges
  • Real-time streams
  • Massive data volume
Implementation Roadmap
Phases: Prototypes, Live-Feeds, Large-scale
• Requirements report
• State-of-the-art analysis
• Set up infrastructure
• Start data acquisition
• Architecture
• Scaling plan
• Corpus
• Preliminary prototypes
• Integrated Financial Market Information System
• Improved prototypes
• Real-time streaming
• Decision-support models and visualization
• Scale data volume
• Function complete
• End-user prototypes
• Final demonstration
• Evaluation reports
Summary of Y1 Tech. Achievements
• Infrastructure
  • Collecting documents since April 2011 (~8 million documents, 200 GB/month)
  • Corpus of sentence-level annotated documents (~900 and growing)
  • Financial ontology (~4000 instances)
  • First knowledge base
• Prototypes
  • Data acquisition
  • Sentiment extraction
• Technology Evaluation and Experiments
  • Integration (ZeroMQ)
  • Scaling (storage, messaging)
  • Portfolio selection experiment
1ST YEAR ACHIEVEMENTS
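As a rough sense of scale, the 200 GB/month figure can be converted into an average sustained rate. This is back-of-the-envelope arithmetic only, assuming a 30-day month and decimal gigabytes (10^9 bytes); neither assumption is stated on the slide.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumed 30-day month

def average_rate_bytes_per_second(gb_per_month):
    # Decimal gigabytes (1 GB = 1e9 bytes) is an assumption.
    return gb_per_month * 1e9 / SECONDS_PER_MONTH

rate = average_rate_bytes_per_second(200)  # about 77 kB/s on average
yearly_tb = 200 * 12 / 1000                # 2.4 TB per year at a constant rate
```

The average hides bursts: acquisition from news and web sources is unlikely to be evenly spread over the day, so peak throughput requirements may be considerably higher.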
Thank you
Summary of Y1 Tech. Achievements
• Multi-core project server up and running
• Collecting millions of documents since April 2011
• Data storage experiments and first knowledge base
• First version of financial ontology available
• Sentiment extraction for all use cases
• Initial decision-making experiment (portfolio selection)
• Scaling experiments (storage, messaging)
• Initial (more advanced) integration tests
• ++ (not foreseen) annotated document corpus for sentiment analysis (gold standard)
Main Innovations and Challenges
• Structured → unstructured
  • Address noise and uncertainty
• Offline → online streams (near real time)
  • Ontology infrastructure
  • Machine learning
  • Sentiment extraction
  • Qualitative modeling
  • Visualization
• Small → vast amounts of data
• Financial decision support
  • Based on high-level semantic features
  • Glass-box models