Data Sourcing, Statistical Processing and Time Series Analysis. Presented at EDAMBA summer school, Soreze (France), 23 July – 27 July 2009. An Example from Research into Hedge Fund Investments.
‘In the business world, the rearview mirror is always clearer than the windshield’ - Warren Buffett -
Research Purpose
• Developing accurate parametric pricing models for hedge funds and funds of hedge funds
• Accounting for the special statistical properties of alternative investment funds
• Providing practitioners and statisticians with a framework to assess, categorize and predict hedge fund investments
Research Approach
• Research Philosophy: Positivistic, deductive research. Postulation of hypotheses that are tested via standard statistical procedures
• Research Approach: Empirical analysis. Interpreting the quality of pricing models on the basis of historical data
• Primary Data: External secondary data. Historic time series adjusted for data-bias effects
Data Sourcing • DATA POOL
Data Treatment
• FACTOR ANALYSIS
• DATA POOL
• MODEL BUILDING
• STATISTICAL CLUSTERING
Data Import
• Access Database
• Excel Pivot table report
Access Database Management
• Introduce AutoNumber fields as primary keys
• Define foreign keys for data queries
• Define table relationships (one-to-many)
• Build junction tables (many-to-many)
• Write SQL queries to display relevant data
• Integrate SQL in VBA code
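The steps above can be sketched with a small relational schema. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Access, and the table names, column names and sample rows (fund, provider, fund_provider, "Alpha Fund" and so on) are hypothetical. Access uses AutoNumber fields and its own SQL dialect, so the DDL would differ there.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for an Access file.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce foreign keys in SQLite

conn.executescript("""
CREATE TABLE fund (
    fund_id INTEGER PRIMARY KEY AUTOINCREMENT,   -- plays the role of AutoNumber
    name    TEXT NOT NULL UNIQUE                 -- blocks duplicate entries
);
CREATE TABLE provider (
    provider_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL UNIQUE
);
-- one-to-many: one fund has many monthly returns
CREATE TABLE monthly_return (
    return_id INTEGER PRIMARY KEY AUTOINCREMENT,
    fund_id   INTEGER NOT NULL REFERENCES fund(fund_id),
    month     TEXT NOT NULL,
    value     REAL NOT NULL,
    UNIQUE (fund_id, month)
);
-- junction table (many-to-many): a fund may report to several providers
CREATE TABLE fund_provider (
    fund_id     INTEGER NOT NULL REFERENCES fund(fund_id),
    provider_id INTEGER NOT NULL REFERENCES provider(provider_id),
    PRIMARY KEY (fund_id, provider_id)
);
""")

# Hypothetical sample rows; the generated ids start at 1 in a fresh database.
conn.executemany("INSERT INTO fund (name) VALUES (?)",
                 [("Alpha Fund",), ("Beta Fund",)])
conn.executemany("INSERT INTO provider (name) VALUES (?)",
                 [("Provider A",), ("Provider B",)])
conn.executemany("INSERT INTO fund_provider VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# Query: funds that appear in more than one provider database.
multi_listed = conn.execute("""
    SELECT f.name, COUNT(*) AS n_providers
    FROM fund AS f
    JOIN fund_provider AS fp ON fp.fund_id = f.fund_id
    GROUP BY f.fund_id, f.name
    HAVING COUNT(*) > 1
""").fetchall()
print(multi_listed)
```

The UNIQUE constraint and the composite primary key on the junction table are one way to prevent duplicate entries when the same fund arrives from several sources.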
Why Access?
• Avoiding duplicate entries
• Cross-referencing data from various sources
• Combining and aggregating different databases
• Efficient storage due to relational data management
• Queries allow for retrieval/display of specific data
• Integration with Microsoft VBA and Excel (data displayable as Pivot table reports)
• Searching for specific entries via SQL
Data Validity
• Consistency of performance history across different database providers
• Degree of history-backfilling bias
• Exclusion of defaulted funds/non-reporting funds from databases (survivorship bias)
• Extent of infrequent or inconsistent pricing of assets (managerial bias)
Data Bias
• Survivorship
• Self-Selection
• Database
• Instant History
• Look-ahead
Adjustments:
• Inclusion of graveyard funds
• Multiple databases
• Rolling-window observation / Incubation period
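Two of these adjustments can be illustrated with a minimal sketch. The fund records and return figures below are made up purely for illustration, and the helper names (average_return, drop_incubation) are hypothetical, not taken from the research itself.

```python
from statistics import mean

# Hypothetical monthly returns; "alive" marks funds still reporting.
funds = [
    {"name": "live_fund", "alive": True, "returns": [0.02, 0.01, 0.03]},
    {"name": "graveyard_fund", "alive": False, "returns": [-0.04, -0.06, -0.02]},
]

def average_return(fund_records, include_graveyard):
    """Pooled mean return, with or without defunct (graveyard) funds."""
    pooled = [r
              for f in fund_records
              if include_graveyard or f["alive"]
              for r in f["returns"]]
    return mean(pooled)

def drop_incubation(returns, incubation_months):
    """Discard the first months of a track record, which may be backfilled."""
    return returns[incubation_months:]

survivors_only = average_return(funds, include_graveyard=False)
with_graveyard = average_return(funds, include_graveyard=True)
print(survivors_only, with_graveyard)

# Instant-history adjustment on a hypothetical backfilled track record.
trimmed = drop_incubation([0.08, 0.07, 0.01, 0.02], incubation_months=2)
print(trimmed)
```

In this toy data the survivors-only average is higher than the average that includes the graveyard fund, which is the direction of distortion usually associated with survivorship bias.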
Statistical tests
• Regression Alpha
• Average Error term
• Information Ratio
• Normality (Chi-squared, Jarque-Bera)
• Goodness of fit, phase-locking and collinearity (Akaike, Hannan-Quinn and Schwarz information criteria)
• Serial Correlation (Durbin-Watson, Portmanteau)
• Non-stationarity (unit root)
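Three of the diagnostics listed above can be written out in a few lines. The sketch below uses only the Python standard library and textbook formulas: a single-factor OLS regression for alpha, the Durbin-Watson statistic on the residuals, and the Jarque-Bera statistic. The single-benchmark setup and the function names are assumptions made for illustration; the slides do not state which factors or software were used.

```python
from statistics import mean

def ols_alpha_beta(fund, benchmark):
    """Single-factor OLS: fund = alpha + beta * benchmark + residual."""
    mx, my = mean(benchmark), mean(fund)
    sxx = sum((x - mx) ** 2 for x in benchmark)
    sxy = sum((x - mx) * (y - my) for x, y in zip(benchmark, fund))
    beta = sxy / sxx
    alpha = my - beta * mx
    residuals = [y - (alpha + beta * x) for x, y in zip(benchmark, fund)]
    return alpha, beta, residuals

def durbin_watson(residuals):
    """Sum of squared first differences over sum of squares.
    Values near 2 suggest no first-order serial correlation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def jarque_bera(sample):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and
    K the sample kurtosis. Large values point away from normality."""
    n = len(sample)
    m = mean(sample)
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

# Hypothetical returns: the fund is an exact linear function of the benchmark.
benchmark = [0.01, -0.02, 0.03, 0.00]
fund = [0.01 + 2 * x for x in benchmark]
alpha, beta, _ = ols_alpha_beta(fund, benchmark)
print(alpha, beta)
```

The Jarque-Bera statistic is compared against a chi-squared distribution with two degrees of freedom; that lookup is left out here to keep the sketch dependency-free.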
Comparative Analysis
• Groups: Strategy 1 with leverage, Strategy 2 with leverage, Strategy 1 without leverage, Strategy 2 without leverage
• Unbalanced ANOVA (within and between treatments)
• t-test (leverage vs. no leverage)
• t-test (between strategies)
• Pairwise t-tests for equal means between the groups
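A pairwise test for equal means between two groups of unequal size can be sketched as follows. The slides do not say which t-test variant was used; Welch's version, which does not assume equal variances, is chosen here as an assumption, and the sample values are made up.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for H0: the two groups have equal means."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical monthly returns (in percent) for two groups of unequal size.
leveraged = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
unleveraged = [2.0, 3.0, 4.0, 5.0]
t_stat, dof = welch_t(leveraged, unleveraged)
print(t_stat, dof)
```

The statistic would then be compared with a Student t distribution with the computed degrees of freedom to obtain a p-value.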
Empirical Findings
• The accuracy of pricing models could be significantly improved when accounting for special statistical properties of hedge funds (non-normality, non-linearity)
• Hedge fund performance can be attributed to location choice as well as trading strategy
• A limited number of principal components explains a significant proportion of cross-sectional return variation
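The last finding refers to the share of return variance captured by principal components. A minimal sketch of that quantity for the first component is given below, using power iteration on the sample covariance matrix with the standard library only. The function names and data are hypothetical, and a real analysis would more likely use a linear algebra library and examine several components.

```python
from statistics import mean

def covariance_matrix(series):
    """Sample covariance matrix of equal-length return series (one per fund)."""
    n = len(series[0])
    centered = []
    for s in series:
        m = mean(s)
        centered.append([x - m for x in s])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]

def first_component_share(cov, iterations=200):
    """Share of total variance explained by the first principal component,
    estimated by power iteration. Assumes the start vector is not orthogonal
    to the leading eigenvector."""
    k = len(cov)
    v = [1.0 + i for i in range(k)]  # deliberately non-uniform start vector
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv gives the leading eigenvalue (v has unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k))
                     for i in range(k))
    total_variance = sum(cov[i][i] for i in range(k))
    return eigenvalue / total_variance

# Hypothetical funds: the second is exactly twice the first, so one component
# should explain essentially all of the variance.
fund_a = [0.01, 0.02, -0.01, 0.03]
fund_b = [2 * x for x in fund_a]
share = first_component_share(covariance_matrix([fund_a, fund_b]))
print(share)
```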
Literature Review
• Hedge Fund Linear Pricing Models
• Sharpe Factor Model (Sharpe, 1992)
• Constrained Regression (Otten, 2000)
• Fama-French Factor Model (Fama, 1992)
• Factor Component Analysis (Fung, 1997)
• Simulation of Trading component (lookback straddle)