1 / 43

Financial Information Grid –an ESRC e-Social Science Pilot

Financial Information Grid –an ESRC e-Social Science Pilot. Khurshid Ahmad Department of Computing, University of Surrey; Jon Nankervis Department of Accountancy and Finance, University of Essex. FINGRID Project.

shada
Download Presentation

Financial Information Grid –an ESRC e-Social Science Pilot

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Financial Information Grid –an ESRC e-Social Science Pilot Khurshid Ahmad Department of Computing, University of Surrey; Jon Nankervis Department of Accountancy and Finance, University of Essex

  2. FINGRID Project • The FINGRID project is a collaboration between econometricians at Essex, computing academics, particularly in grid computing and artificial intelligence, at Surrey (plus financial traders). • The FINGRID project aims to provide a solution for the information management/ processing challenge in social sciences: analysis and fusion of distributed quantitative and qualitative data and programs. • FINGRID is the third project at Surrey that deals with qualitative data (news and reports) and qualitative data (time series) EU Projects ACE (1996-99), GIDA (2001-03).

  3. FINGRID Objectives • Create a Grid environment based on Open Grid Services Architecture to provide a demonstrable software application, for analysing financial information in the form of quantitative and qualitative data. • Evaluate the benefits of the Grid approach.

  4. FINGRID Reflections • DAME (York): Engine Behaviour Time-series + Reports in a controlled language; Case-based Reasoning; • Belfast e-Science Centre: Value at Risk Computation; • MYGRID and MIAKT: Information Extraction + Image Annotation

  5. FINGRID Project Team David Cheng, Research Officer, Text Analysis; (ESRC funded) Tuğba Taşkaya-Temizel, Tutor, Grid Computing, Grid Architect; Lee Gillam, Research Officer, Grid Implementation; Pensiri Manumapousat, Research Student, Text Categorisation; Saif Ahmad, Research Student, Wavelet Analysis; Hayssam Trablousi, Research Student, Named Entity Extraction; Ademola Popoula, Research Student, Fuzzy Logic Analysis; Gary Dear, Computing Officer, Grid Implementation; Khurshid Ahmad, Principal Investigator; Jon Nankervis, Co-Investigator (Essex) ESRC Funding: Fifty Thousand Pounds Sterling (Gross).

  6. The Problem Social science research requires the capture and analysis of data that is quantitative - numerical data - and data that is qualitative - opinions expressed in language or other sign systems. The fusion of multi-modal information, is critical to social sciences research.

  7. The Problem – Decision Making Challenges: Hypothesis formation and theory development in financial and political economics,both by researchers and financial traders, now involves analysis of streaming time serial data and financial and political news. The Data:

  8. Streaming Time-serial Data and News Service STREAMING ECONOMIC/POLITICAL NEWS- Reuters; Yahoo; Bloomberg, BBC! Al Jazeera

  9. The Problem – Decision Making • Financial and political analysis requires data over short time periods (daily) or longer time periods (5-10 years). • This is large volume of data which requires instant processing – much like data emerging from particle or gene factories- except that the data is in two or more modalities in our case. • The financial/political analysis requires access to data tombs(archives) and data nurseries(streaming news and time-series)

  10. The Problem – Decision Making • Decision making involves dealing with factual news (who, where, what, when) and news related to ‘market sentiment’ news • Decision making involves dealing with time-ordered data which lacks stochastic stability and has considerable variance changes.

  11. Market Sentiment? • In addition to the very quantitative data related to trading volumes and price movements, the financial traders, and increasingly economists, rely on market sentiment. • Behaviour of the investors, security analysts, and financial/monetary theoreticians, is influenced by information other than market data: investor credulity; herding  sentiment analysis

  12. Market Sentiment? Motivation Bounded Rationality Herbert Simon(Nobel Prize in Economics 1978) Rational Decision Making in Business Organisations: Mechanisms of Bounded Rationality –failures of knowing all of the alternatives, uncertainty about relevant exogenous events, and inability to calculate consequences . Daniel Kahneman (Nobel Prize in Economics 2002) Maps of bounded rationality –intuitive judgement & choice: Two generic modes of cognitive function: an intuitive mode: automatic and rapid decision making; controlled mode deliberate and slower. E-Economics? FINGRID? Computing at the limits of rationality  distributed multi-modal data analysis and fusion

  13. Market Sentiment, Behavioral Psychology Investor sentiment & stock market bubbles has some causal relationship with: Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and cross-section of stock returns. Proc. Conf on Investor Sentiment.

  14. Market Sentiment, Quantitative Behavioral Psychology • Investor sentiment can be affected by: • Closed-end fund discount (CEFD); • Turnover ratio (in NYSE for example) (TURN) • Number of Initial Public Offerings (N-IPO); • Average First Day Returns on R-IPO • Equity share S • Dividend Premium • Age of the firm, external finance, ‘size’(log(equity))……. • A novel composite index: • Sentiment = -0.358CEFDt+0.402TURNt-1+0.414NIPOt +0.464RIPOt+0.371 St-0.431Pt-1 A very complex non-linear regression on large data sets – computed on monthly basis Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and cross-section of stock returns. Proc. Conf on Investor Sentiment.

  15. FINGRID Contribution • Extraction of market sentiments using a ‘local grammar’ of rise/fall, growth/decay coupled with attributed and un-attributed news (rumours). • Automatic analysis of terminology and ontology: Financial Trading has 25 sub-domains. • An integrated framework of time-series analysis (pre-processing, filtering, trend and seasonality, variance change) using wavelet analysis and fuzzy-logic. • Neural network based classifiers for classifying streaming news. • Implementation of a Grid-based solution and ‘daily’ market report service.

  16. Fusing Qualitative and Quantitative Data Analysis • We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series both as text (and numbers) and graphically as well.

  17. What we need… • A common infrastructure: • for interoperability and reusability • for aggregating distributed resources to create a single-source computing power and provides seamless access • which allows sharing geographically distributed resources

  18. Is Grid Computing the Solution? IBM on Financial Grid Computing: Grid computing enables the virtualisation of distributed computing and data resources @ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml

  19. Is Grid Computing the Solution? • GRID • Resource Sharing; • Collaboration: Financial Economics, Sociology of Poverty, Policy Formation • Working with living data • much Grid work relates to data tombs  social sciences with data nurseries • living data is unstable, incomplete, and requires at least two interdependent modalities – one compensates for the other • Software, including legacy, is in silos and its operation based on tradition. Packages come with experts! • ‘Home’ punters – everybody plays the market • Speed up – factor of 5 in text analysis; 3-4 in Monte Carlo simulations @ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml

  20. FINGRID Infrastructure in Surrey A 24-node data and compute Grid interfaced to a ‘real world’ data stream (Reuters News and Financial Time series Feed) for capturing, analysing and fusing quantitative and ‘qualitative’ data. Reuters Feed: 2 dedicated data lines, PC and Sun for feed management and associated networking

  21. FINGRID Infrastructure: Reuters Financial Services Streaming Data and News Service

  22. FINGRID Architecture A 3 tier Architecture • The first tier facilitates the client in sending a request to one of the services: Text Processing Service or Time Series Service; • The second tier facilitates the execution of parallel tasks in the main cluster and is distributed to a set of slave machines (nodes); • The third tier comprises the connection of the slave machines to the data providers

  23. Text and Time Series Service Streaming Textual Data Distribute Tasks 1 2 Send Service Request Streaming Numeric Data Notify user about results Main Cluster Receive Results 4 3 GRID Cluster 24 Slaves Surrey Grid FINGRID Architecture • Given an allocated task, the corresponding data is retrieved from the data providers by the slave machines. • The main cluster monitors the slave machines until they have completed their tasks, and subsequently combines the interim results. • The final result is sent back to the client machine.

  24. FINGRIDTechnology • Globus Toolkit 3.0 (based on Open Grid Services Architecture (OGSA)) • Java CogKit (Java Commodity Grid) for resource management • Languages for Development  JAVA + Reuters SSL Developer’s Kit (Java) for the connection with the Reuters streaming data • Applications Integrated: • Existing statistical programs in FORTRAN • Matlab: JMatlink (adapted to Linux environment for the communication with Matlab environment) • Other Technologies: • XML (NewsML) for the news information • CGI for communication of Java Applet with the server side

  25. FINGRID Services • News Analysis: service for extracting MARKET SENTIMENT. • Correlation: Market sentiment correlation with financial time series. • Bootstrapping: service for computing standard errors, confidence intervals and hypothesis testing by a simulation of the time series or market sentiment series.

  26. FINGRID Service: Market Sentiment At one level market sentiment is often expressed in news reports and editorials, and ranges from views about national economies to the imminent take-overs, mergers and acquisitions and from people leaving/joining an organization to news about political and economic successes and failures.

  27. Market Sentiment • Sentiments are expressed using metaphors. • The metaphors, bullish and bearish, so-called animal metaphors, refer to the aggressive or recessive (shy) mood of the investors and perhaps of the traders. • The sentiment words are typically used metaphorically and in general are ambiguous (‘rose’ may be used in different contexts and indeed as a proper noun). • The local grammar reduces the ambiguity by constraining the use of the sentiment words.

  28. Market Sentiment A finite state automata (local grammar), learnt by our system, from a news corpus, for identifying ‘sentiments’ in free text unambiguously, was used for extracting sentiment information.

  29. Market Sentiment A finite state automata (local grammar), was learnt by our system, from a news corpus, for identifying names of persons and organisations in free text unambiguously, was used for attributing sentiment information to people and organisations.

  30. Brown RCV1 Words/s (1 machine) 7,120 - Words/s (2 machines) 14,091 5,334 Words/s (4 machines) 23,944 10,532 Words/s (8 machines) 31,453 14,590 Case Studies & Results • Text Analysis Service • For the Brown Corpus, the number of words processed per second is similar to Hughes et al.: 7,120 versus 6,670 in a single CPU system. • Our 2-node grid implementation shows a 98% gain of performance, whereas Hughes et al. (SMP configuration, equivalent to our 2-node grid) implementation shows a 27% gain. • Relative performance of the word frequency counting experiment on the RCV1 corpus is lower than the Brown corpus - it is necessary to parse the XML files prior to processing.

  31. Case Studies & Results • Text Analysis Service • A Java program for sentiment extraction has been developed. • Experiments on Reuters RCV1 corpus (2.3GB) were conducted. Significant improvement on processing time: 15.9 hours on a 4-node grid to 13.1 hours on a 8-node grid. Time required to process a month news with different configurations

  32. FINGRID Service: Fusing quantitative & qualitative information • Time serial data related to financial instruments, for example, currency, stocks, derivatives, often exhibit nonstationarity. • In order to extract long-term trends, seasonal variation, and the random component, in a complex time-series, increasingly multi-scale analysis and fuzzy-logic is used. • The positive and negative sentiments related to a financial instrument may be ordered as a time series. • This sentiment series is then correlated with the movement of a financial instrument. • Such correlation can be used for prediction, or better still for the analysis of (volatile) movements in the market.

  33. Fusing Qualitative and Quantitative Data Analysis • We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series both as text (and numbers) and graphically as well.

  34. FINGRID Service: Bootstrapping & Large-scale simulations • Bootstrap method assumes that the observed data is a representative of the unknown population. • Bootstrap procedures are data-based simulation methods that estimate the distribution of estimators by re-sampling observed data. • Statistical inferences obtained from distributions of simulated data are reported to be more reliable than inferences gained from asymptotic theory when the sample size is infinitely large (MacKinnon2002). • Bootstrap tests and Monte Carlo tests are examples of simulation-based tests.

  35. Case Studies & Results Bootstrapping • Java-wrapped (Fortran) implementations of bootstrapping algorithm. • processing time of the bootstrapping program with different grid node configurations, starting from two-node to eight-node, was measured. When the number of bootstrap replications set to 1000, 1050 seconds was required on a 2- node grid; and 404 seconds on a 8-node grid

  36. Fusing Qualitative and Quantitative Data Analysis • We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series both as text (and numbers) and graphically as well.

  37. Fusing Qualitative and Quantitative Data Analysis

  38. Fusing Qualitative and Quantitative Data Analysis

  39. Fusing Qualitative and Quantitative Data Analysis

  40. Fusing Qualitative and Quantitative Data Analysis

  41. Conclusion • We have identified the following problems that may cause performance degradation in a grid environment: • The configurations of the machines: During the distribution of tasks, we did not consider the configuration of the machines  faster machines were idling while the rest were still processing. • One common data source: Network latency occurs due to the number of nodes using the same bandwidth to retrieve files. • Amdahl’s law: Amdahl’s law is applicable to our grid, where the fraction of code f, which cannot be parallelised,affects speedup factor. • Program constraints: In the task distribution process, the file size is not considered.

  42. Conclusions • The FinGrid project has achieved three major objectives. • The project demonstrates how both quantitative and qualitative data from multiple sources can be processed, analysed, and fused. • It has raised considerable interest in the financial news information market ( Ahmad et al. 2004). • Contribution in terms of improvements to goods and services and financial software houses, and news vendors have shown interest in the project. • A Master’s level Grid Computing module has been developed based on our experience in FinGrid.

  43. Next Steps • Investigate and evaluate Condor-G, MPICH2 and OGSA-DAI for effective job management, parallel processing and database management. Towards a knowledge grid PARALLELand DISTRIBUTED KNOWLEDGE DISCOVERY: Continual analysis and fusion of text and numerical data both real-time and historical data. KNOWLEDGE GRID SERVICES: KNOWLEDGE RETRIEVAL: Adapt information extraction methods and systems (e.g. Surrey’s SYSTEM QUIRK) onto a GRID architecture for extended semantic analysis. KNOWLEDGE MODELLING: Representation of non-stationary time series using Wavelet Analysis, Neural Networks and Fuzzy Logic, such that the system learns from its past experience.

More Related