Technical Overview and Challenges

Technical Overview and Challenges. Achim Klein, University of Hohenheim. 1st Review Meeting, Luxembourg, 30 November 2011. Major Expected Outcomes: a financial market information system providing new insights and improved decision making with respect to three challenging use cases.

Presentation Transcript


  1. Technical Overview and Challenges
     Achim Klein, University of Hohenheim
     1st Review Meeting, Luxembourg, 30 November 2011

  2. Major Expected Outcomes
     • Financial market information system, providing
        • new insights
        • improved decision making
        • with respect to three challenging use cases
     • Real-time and scalable pipeline for
        • financial unstructured data acquisition,
        • information extraction,
        • sentiment analysis,
        • information integration,
        • visualization and decision-support models

  3. Main Innovations and Challenges
     • Innovations
        1. Structured → unstructured data
           • Noise and uncertainty
        2. Offline processing → real-time stream
           • Ontology evolution, extraction, analysis
           • Online decision-support models, visualization
        3. Small → vast amounts of data
        4. Financial decision-support models based on high-level features
     • Challenges
        • Accuracy
        • Time efficiency
        • Throughput
        • Usefulness

  4. Compass slide: project structure diagram
     [Diagram boxes: Data Acquisition; Ontology Infrastructure; Information Extraction; Sentiment Analysis; Decision Support Infrastructure; Information Integration (Data, Information & Knowledge Base); Architecture, Integration & Scaling Strategy; UC#1 Market Surveillance, UC#2 Reputational Risk Management, UC#3 Online Retail Brokerage; Domain-independent GUI (open source); Management; Dissemination & Exploitation. Work packages labelled WP1 to WP10.]

  5. Data Acquisition
     • Main objectives
        • Large-scale acquisition of unstructured data
        • Uniform access to streams
        • Initial noise handling
     • Main challenges
        • Web data clean-up and duplicate detection
        • Scalability
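Duplicate detection is listed as a main challenge without further detail. One common approach, shown here only as a sketch, compares documents by the Jaccard similarity of their word shingles and drops any document that is too similar to one already kept. The shingle size `k=3` and the threshold `0.8` are illustrative assumptions, not values from the project.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lower-cased, tokenized text."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        # Very short texts become a single (possibly shorter) shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs, threshold=0.8, k=3):
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept = []
    kept_shingles = []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept
```

For example, two copies of the same headline that differ only in punctuation produce identical shingle sets and collapse to one. The pairwise comparison is quadratic in the number of documents; at millions of documents, approximate methods such as MinHash with locality-sensitive hashing are the usual way to avoid comparing every pair.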

  6. Ontology Infrastructure
     • Main objectives
        • Provide financial domain ontology for information extraction tasks
     • Main challenges
        • (Semi-)automatic construction and evolution of ontology and word lists

  7. Information Extraction
     • Main objectives
        • Natural language pre-processing
        • Extraction of named entities
        • Topic classification
     • Main challenges
        • Training data for topics
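Slides 6 and 7 pair a financial ontology with named-entity extraction. A simple way to connect the two, sketched here under the assumption that each ontology instance carries a list of surface labels, is a gazetteer lookup with greedy longest match. The `ONTOLOGY` contents and entity IDs are hypothetical.

```python
import re

# Hypothetical mini-ontology: each entity ID maps to its surface forms (labels).
ONTOLOGY = {
    "company:acme": ["ACME", "ACME Corp", "ACME Corporation"],
    "index:dax": ["DAX", "DAX 30"],
}


def build_gazetteer(ontology):
    """Map the lower-cased token tuple of every label to its entity ID."""
    gazetteer = {}
    for entity_id, labels in ontology.items():
        for label in labels:
            # Tokenize labels exactly like the text so lookups line up.
            gazetteer[tuple(re.findall(r"\w+", label.lower()))] = entity_id
    return gazetteer


def extract_entities(text, gazetteer):
    """Greedy longest-match lookup of ontology labels in the token stream.

    Returns (matched_text, entity_id) pairs in document order.
    """
    tokens = re.findall(r"\w+", text)
    max_len = max((len(key) for key in gazetteer), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                found.append((" ".join(tokens[i:i + n]), gazetteer[key]))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return found
```

Longest match ensures that "ACME Corporation" is tagged once as a single mention rather than as "ACME" followed by an untagged token. A real pipeline would also need disambiguation for labels shared by several instances, which this sketch ignores.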

  8. Sentiment Analysis
     • Main objectives
        • Extract sentiment with respect to use-case-specific features of sentiment objects
     • Main challenges
        • Accuracy
        • Time efficiency

  9. Decision Support Infrastructure
     • Main objectives
        • Provide event detection and prediction
        • Machine learning and qualitative models based on high-level features
        • Advanced real-time visualization
     • Main challenges
        • Usefulness for decision makers
        • Time efficiency
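Event detection is named as an objective without a method. One hypothetical baseline flags days on which an aggregated sentiment value deviates strongly from its recent history, using a rolling z-score. The window of 5 observations and the threshold of 3 standard deviations are arbitrary choices for this sketch.

```python
from statistics import mean, stdev


def detect_events(series, window=5, threshold=3.0):
    """Flag indices whose value deviates from the mean of the preceding
    `window` values by more than `threshold` sample standard deviations.

    Returns a list of (index, z_score) pairs.
    """
    events = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, skip this point
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            events.append((i, round(z, 2)))
    return events
```

Applied to a daily sentiment series for one company, a sudden drop such as a burst of negative news would show up as a large negative z-score. Because only past values enter each window, the check can run incrementally on a live stream.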

  10. Information Integration
     • Main objectives
        • Storage of acquired and extracted data
        • Integration of existing structured data
        • Uniform access
     • Main challenges
        • Heterogeneity of information and storages
        • Throughput
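To illustrate the "uniform access" objective, the sketch below hides heterogeneous stores behind one hypothetical `find(entity_id)` interface and merges the results with a source tag. The in-memory classes stand in for real back ends (for example, a document store for extracted texts and a relational table for prices); none of these names come from the project.

```python
from typing import Iterable, Mapping, Protocol


class DocumentStore(Protocol):
    """Hypothetical uniform read interface over heterogeneous storage back ends."""

    def find(self, entity_id: str) -> Iterable[dict]: ...


class InMemoryTextStore:
    """Stand-in for an unstructured document store (e.g. extracted news items)."""

    def __init__(self, docs):
        self.docs = docs

    def find(self, entity_id):
        return [d for d in self.docs if entity_id in d["entities"]]


class InMemoryPriceStore:
    """Stand-in for existing structured data (e.g. price rows keyed by entity)."""

    def __init__(self, rows):
        self.rows = rows

    def find(self, entity_id):
        return [{"entity": entity_id, **row} for row in self.rows.get(entity_id, [])]


def integrated_view(entity_id: str, stores: Mapping[str, DocumentStore]) -> list:
    """Query every registered store through the same interface and merge results."""
    merged = []
    for name, store in stores.items():
        for record in store.find(entity_id):
            merged.append({"source": name, **record})
    return merged
```

Callers depend only on `find`, so a new back end can be added by registering another object with that method, without changing the query code.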

  11. Architecture and Integration
     • Main objectives
        • Scalable architecture
        • Integration of pipeline components
        • Integrated financial market information system
     • Main challenges
        • Real-time streams
        • Massive data volume
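A minimal sketch of a staged streaming pipeline: each component runs in its own thread and passes items downstream through a queue. Slide 13 names ZeroMQ for integration; this sketch uses Python's in-process `queue.Queue` as a stand-in, whereas ZeroMQ's PUSH/PULL sockets would let the same chain of stages span processes or machines. The stage functions in the usage check are hypothetical placeholders for acquisition, cleaning, and analysis steps.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def stage(func, inbox, outbox):
    """Apply `func` to every item from inbox; forward non-None results to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            return
        result = func(item)
        if result is not None:  # None means "drop this item"
            outbox.put(result)


def run_pipeline(items, funcs, maxsize=100):
    """Connect one thread per processing step with queues and stream items through."""
    # Intermediate queues are bounded (backpressure); the final one is unbounded
    # so feeding all input before draining results cannot deadlock.
    queues = [
        queue.Queue(maxsize=0 if i == len(funcs) else maxsize)
        for i in range(len(funcs) + 1)
    ]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]), daemon=True)
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while (out := queues[-1].get()) is not SENTINEL:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

Bounded intermediate queues keep a fast acquisition stage from flooding slower analysis stages, and a single thread per stage preserves input order.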

  12. Implementation Roadmap
     • Prototypes
        • Requirements report
        • State-of-the-art analysis
        • Set up infrastructure
        • Start data acquisition
        • Architecture
        • Scaling plan
        • Corpus
        • Preliminary prototypes
     • Live-Feeds
        • Integrated Financial Market Information System
        • Improved prototypes
        • Real-time streaming
        • Decision-support models and visualization
     • Large-scale
        • Scale data volume
        • Function-complete end-user prototypes
        • Final demonstration
        • Evaluation reports

  13. Summary of Y1 Tech. Achievements
     • Infrastructure
        • Collecting documents since April 2011 (~8 million documents, 200 GB/month)
        • Corpus of sentence-level annotated documents (~900 and growing)
        • Financial ontology (~4000 instances)
        • First knowledge base
     • Prototypes
        • Data acquisition
        • Sentiment extraction
     • Technology evaluation and experiments
        • Integration (ZeroMQ)
        • Scaling (storage, messaging)
        • Portfolio selection experiment

  14. Thank you

  15. Summary of Y1 Tech. Achievements
     • Multi-core project server up and running
     • Collecting millions of documents since April 2011
     • Data storage experiments and first knowledge base
     • First version of financial ontology available
     • Sentiment extraction for all use cases
     • Initial decision-making experiment (portfolio selection)
     • Scaling experiments (storage, messaging)
     • Initial (more advanced) integration tests
     • Additionally (not originally foreseen): annotated document corpus for sentiment analysis (gold standard)

  16. Main Innovations and Challenges
     • Structured → unstructured
        • Address noise and uncertainty
     • Offline → online streams (near real time)
        • Ontology infrastructure
        • Machine learning
        • Sentiment extraction
        • Qualitative modeling
        • Visualization
     • Small → vast amounts of data
     • Financial decision support
        • Based on high-level semantic features
        • Glass-box models
