Mixed Tools for Market Analysis and Their Applications

Explores the combination of the minimum spanning tree (MST) and the p-median problem in market graph analysis, using a spanning p-forest combined with p-stars for data interpretation. Includes applications in cell formation. Discusses the tolerance problem for the MST and experimental results.

jacobsenj
Presentation Transcript


  1. Mixed Tools for Market Analysis and Their Applications Boris Goldengorin LATNA – Laboratory of Algorithms and Technologies for Network Analysis Higher School of Economics, Moscow, Russian Federation bgoldengorin@hse.ru Joint work with M. Batsyn, V. Kalyagin, A. Kocheturov, P.M. Pardalos, A. Vizgunov

  2. Dedicated to Boris Mirkin's Birthday • Professor, Department of Applied Mathematics, Higher School of Economics, Moscow, RF • Research areas: clustering, decision making, mathematical classification, evolutionary trees, data and text interpretation • Citation indices: all citations 3865, h-index 28, i10-index 50

  3. Mirkin visited me in Alma-Ata, Kazakhstan, in 1981 • The USSR Workshop on Statistical and Discrete Analysis of Non-Numerical Information, Expert’s Estimations and Discrete Optimization. Abstracts. Moscow-Alma-Ata, VINITI AN SSSR, 1981, pp.356 (in Russian)

  4. Abstract • Efficient daily trading requires aggregating positions that are correlated with each other according to one of the trader’s criteria. Aggregating positions is one possible way to increase an online trader’s capacity. • In this talk we analyse the well-known minimum spanning tree (forest) approach used for market graph analysis and combine it with the less-known pseudo-Boolean approach based on the p-median problem. • We illustrate our mixed tools (a spanning p-forest combined with p-stars) by applying them to different sources of data, including market graphs and cell formation in group technology.

  5. Outline of the talk • The Market Graph • The Minimum Spanning Tree (MST) Problem • MST and Its Tolerances • Stars and the p-Median Problem • Pseudo-Boolean polynomial • Mixed Boolean pseudo-Boolean Model (MBpBM) • Experimental results • Concluding Remarks • Directions for Future Research

  6. Market Graph • Vertices are stocks, and an edge connects two stocks if the correlation between their price fluctuations over a certain period is greater than a specified threshold • ~6000 vertices (stocks)
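The construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `pearson` and `market_graph`, the toy tickers, and the toy return series are all hypothetical, and the talk does not say which correlation estimator or threshold was actually used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def market_graph(returns, threshold):
    """Return the edges (u, v) whose correlation exceeds `threshold`.

    `returns` maps a ticker to its list of price fluctuations.
    """
    edges = set()
    for u, v in combinations(sorted(returns), 2):
        if pearson(returns[u], returns[v]) > threshold:
            edges.add((u, v))
    return edges


# Made-up data: A and B move together, C moves against both.
toy_returns = {
    "A": [0.01, -0.02, 0.03, 0.00, 0.01],
    "B": [0.02, -0.01, 0.04, 0.01, 0.02],
    "C": [-0.01, 0.02, -0.03, 0.00, -0.01],
}
toy_market_edges = market_graph(toy_returns, threshold=0.5)
```

With roughly 6000 stocks this pairwise loop visits about 18 million pairs, so a vectorised correlation matrix would be the more practical choice at that scale.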

  7. Market Graph • Correlation coefficients for the edges: Distribution of correlation coefficients in the US stock market for several overlapping 500-day periods during 2000–2002 (period 1 is the earliest, period 11 is the latest).

  8. Market Graph • Market graph (all the considered instances for different correlation thresholds) follows the power-law model • Using the combination of heuristic and exact algorithms, the exact solution of the maximum clique problem was found (Boginski, Butenko & Pardalos, 2005)

  9. Degree distribution of the Market graph

  10. Finding Cliques in the Market graph • Using the IP formulation of the maximum clique problem to find the exact solution:
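The formula on this slide is not present in the transcript. One standard integer programming formulation of the maximum clique problem, which may or may not be the exact variant shown in the talk, uses a binary variable $x_i$ per vertex (an assumed notation) and forbids selecting two non-adjacent vertices:

```latex
\begin{align*}
\max \quad & \sum_{i \in V} x_i \\
\text{s.t.} \quad & x_i + x_j \le 1 && \text{for all } i \ne j \text{ with } (i,j) \notin E, \\
& x_i \in \{0,1\} && \text{for all } i \in V.
\end{align*}
```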

  11. Maximum Clique size for different correlation thresholds • Large cliques despite very low edge density – confirms the idea about the “globalization” of the market

  12. The Minimum Spanning Tree (MST) Problem • For a given simple weighted undirected graph G = (V, E, W), find a spanning tree T = (V, E(T)) such that the total sum of the edge weights w(e) over all e ∈ E(T) is minimized. It is well known that an MST is a connected acyclic graph containing exactly n-1 edges, and that it can be computed by means of Kruskal’s (greedy-type) algorithm. • At each step, Kruskal’s algorithm selects a shortest edge such that the current graph remains a forest.

  13. Examples of Spanning Trees • Weekly volatility before technology crash • Daily return before technology crash

  14. Clique and Forest

  15. Kruskal’s Algorithm for the MST • Repeat the following step until the forest T has n-1 edges (initially E(T) is empty): add to T a shortest edge that does not form a cycle with the edges already in E(T). • Assume that all m = |E| edges are sorted in non-decreasing order of weight, w(e1) ≤ w(e2) ≤ … ≤ w(em). Kruskal’s algorithm then terminates with an MST in O(m log m) time, where m = n(n-1)/2 for a complete graph.
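A minimal Python sketch of the step described above, assuming vertices are numbered 0..n-1, edges are given as `(weight, u, v)` triples, and the graph is connected. The union-find helper is one common way to test whether an edge would close a cycle; the function name `kruskal` and the toy instance are illustrative only.

```python
def kruskal(n, edges):
    """Return (total_weight, tree_edges) of a minimum spanning tree.

    `edges` is a list of (weight, u, v) triples on vertices 0..n-1.
    """
    parent = list(range(n))

    def find(x):
        # Root of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):      # non-decreasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # the edge joins two components: no cycle
            parent[ru] = rv
            tree.append((w, u, v))
            if len(tree) == n - 1:
                break
    return sum(w for w, _, _ in tree), tree


toy_graph_edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
mst_weight, mst_edges = kruskal(4, toy_graph_edges)
```

The sort dominates the running time, which is where the O(m log m) bound comes from.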

  16. The tolerance problem for an MST • The tolerance problem is to find, for each e ∈ E, the maximum decrease l(e) and the maximum increase u(e) of the edge length w(e) that preserve the optimality of T, under the assumption that the lengths of all other edges remain unchanged. • The values l(e) and u(e) are called the lower and the upper tolerances, respectively, of an edge e ∈ E with respect to the given MST T and the edge length function w.

  17. An optimal MST and Its Tolerances in O(m log m) time • In what follows we show that an MST together with all its upper and lower tolerances can be computed in O(m log m) time by a small modification of Kruskal’s algorithm. • Recall that adding a single edge y not in T to the chosen spanning subtree S(T) creates a unique cycle C = {e1, e2, …, ek, y}, where the tail of y is the head of ek and the head of y is the tail of e1, or vice versa.
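The cycle property recalled above is enough to compute the finite tolerances directly. The sketch below is not the O(m log m) modification of Kruskal's algorithm mentioned in the talk; it is a simpler and slower illustration that walks the tree path for every non-tree edge. It relies on two standard facts: a non-tree edge y can be shortened by w(y) minus the longest tree edge on its cycle, and a tree edge e can be lengthened until the shortest non-tree edge whose cycle contains e becomes as short as e. The remaining tolerances (lower for tree edges, upper for non-tree edges) are infinite. The function name and the toy data are hypothetical.

```python
from math import inf


def mst_tolerances(n, edges, tree):
    """Upper tolerances of tree edges and lower tolerances of non-tree edges.

    `edges` and `tree` hold (weight, u, v) triples; `tree` is an MST.
    """
    adj = {v: [] for v in range(n)}
    for e in tree:
        _, u, v = e
        adj[u].append((v, e))
        adj[v].append((u, e))

    def tree_path(src, dst):
        """Edges on the unique tree path from src to dst (depth-first search)."""
        stack = [(src, None, [])]
        while stack:
            node, prev, path = stack.pop()
            if node == dst:
                return path
            for nxt, e in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, path + [e]))
        return []

    in_tree = set(tree)
    upper = {e: inf for e in tree}
    lower = {}
    for y in edges:
        if y in in_tree:
            continue
        wy, u, v = y
        cycle = tree_path(u, v)               # the cycle closed by y, minus y
        lower[y] = wy - max(w for w, _, _ in cycle)
        for e in cycle:
            upper[e] = min(upper[e], wy - e[0])
    return upper, lower


tol_edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 1, 3), (3, 2, 3)]
tol_tree = [(1, 0, 1), (2, 1, 2), (3, 2, 3)]   # an MST of tol_edges
upper_tol, lower_tol = mst_tolerances(4, tol_edges, tol_tree)
```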

  18. Cliques and spanning trees

  19. Equivalent Problems • The clique problem and the independent set problem are complementary: a clique in G is an independent set in the complement graph of G, and vice versa. • In the pictured graph, the set {1,2,3,4} is a maximum clique and the set {0,2,5} is a maximum independent set
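The complementarity can be checked mechanically. The graph from the slide's figure is not available in the transcript, so the sketch below uses a small made-up graph on five vertices; the helper names are hypothetical.

```python
from itertools import combinations


def complement(n, edges):
    """Edge set of the complement of a simple graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return {frozenset(p) for p in combinations(range(n), 2)} - present


def is_clique(vertices, edges):
    """True if every pair of the given vertices is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(vertices, 2))


def is_independent(vertices, edges):
    """True if no pair of the given vertices is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return not any(frozenset(p) in present for p in combinations(vertices, 2))


toy_graph = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
co_graph = complement(5, toy_graph)
```

Here {1, 2, 3} is a clique in `toy_graph` and an independent set in `co_graph`, while {0, 2, 4} plays the opposite roles.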

  26. The p-Median Problem (PMP) • I = {1,…,m} – a set of m facilities (location points) • J = {1,…,n} – a set of n users (clients, customers or demand points) • C = [cij] – an m×n matrix of distances travelled or costs incurred (measures of similarity or dissimilarity) • [Figure: costs matrix with location points as rows and clients as columns; legend: location point (cluster center), client (cluster point)]

  27. The PMP: combinatorial formulation • The p-Median Problem (PMP) consists of determining p locations (the median points), with 1 ≤ p ≤ m, such that the sum of distances (or transportation costs) over all clients is minimal. • [Figure: example with p = 3 and a complexity annotation; legend: opened facility, location point, client]

  28. The PMP: combinatorial formulation • I – set of locations • J – set of clients • cij – cost of serving the j-th client from the i-th location • p – number of facilities to be opened
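With these symbols, the combinatorial formulation amounts to choosing a p-element subset S of I that minimizes the sum, over all clients j, of the smallest cij with i in S. A brute-force sketch that enumerates every subset is shown below. It is only practical for tiny instances and is not the method used in the talk; the function name and the cost matrix are made up for illustration.

```python
from itertools import combinations


def p_median_brute_force(costs, p):
    """Return (best_cost, best_set) over all p-subsets of locations.

    `costs[i][j]` is the cost of serving client j from location i.
    """
    m, n = len(costs), len(costs[0])
    best_cost, best_set = float("inf"), None
    for subset in combinations(range(m), p):
        # Each client is served by its cheapest opened location.
        total = sum(min(costs[i][j] for i in subset) for j in range(n))
        if total < best_cost:
            best_cost, best_set = total, subset
    return best_cost, best_set


toy_costs = [
    [0, 3, 7, 8],
    [2, 0, 6, 7],
    [7, 6, 0, 2],
    [8, 7, 1, 0],
]
pm_cost, pm_set = p_median_brute_force(toy_costs, p=2)
```

The loop visits every one of the "m choose p" subsets, which is why exact enumeration stops being an option beyond small m.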

  29. The PMP: Applications • Facility location • Cluster analysis • Quantitative psychology • Telecommunications industry • Sales force territories design • Political and administrative districting • Optimal diversity management (assortment problems) • Cell formation in group technology (flexible manufacturing systems) • Vehicle routing • Topological design of computer and communication networks

  30. The PMP: Applications • Facility location - consumer (client) - possible location of supplier (server) - supplier (server), e.g. supermarket, bakery, laundry, etc.

  32. The PMP: Applications • Cluster analysis • Input: a finite set of objects and a measure of similarity • Output: clusters (cluster 1, cluster 2, cluster 3, cluster 4 in the figure) with their “best” representatives – the p-medians

  33. The PMP: Applications • Quantitative psychology • patients • symptoms (behavioural patterns) • type 1 mentality features • type 2 mentality features • “leaders” or typical representatives

  34. The PMP: Applications • Telecommunications industry

  35. The PMP: Applications • Sales force territories design • customers (groups of customers) • possible outlets for some product • entries of the costs matrix account for customers’ attitudes and spatial distance • Goal: select the p best outlets for promoting the product

  36. The PMP: Applications • Political and administrative districting districts, cities, regions degree of relationship: political, cultural, infrastructural connectedness districts, cities, regions

  37. The PMP: Applications • Optimal diversity management • given a variety of products (each having some demand, possibly zero) • select p products such that: • every product with a nonzero demand can be replaced by one of the p selected products • replacement overcosts are minimized

  38. The PMP: Applications • Optimal diversity management • Example: wiring designs, p=3 configurations with zero demand

  39. The PMP: Applications • Cell formation in group technology • functional layout • cellular layout • legend: machines, products, routes • see also video at http://www.youtube.com/watch?v=q_m0_bVAJbA

  40. The PMP: Applications • Vehicle routing - clients / storage - vehicle routes

  41. The PMP: Applications • Topological design of computer and communication networks

  44. Publications, more than 500 • Goldengorin et al., 2011, 2012 • Elloumi, 2010 • Brusco and Köhn, 2008 • Belenky, 2008 • Church, 2003, 2008 • Avella et al., 2007 • Beltran et al., 2006 • Reese, 2006 (overview, Networks) • ReVelle and Swain, 1970 • Senne et al., 2005

  45. Brusco and Köhn, Psychometrika, Vol. 73, No. 1, 89–105 • There is evidence that the p-median model can, for certain data structures, provide better cluster recovery than alternative clustering procedures (Klastorin, 1985). Klastorin provided a limited comparison of misclassification rates of the complete linkage (Johnson, 1967), average linkage (Sokal & Sneath, 1963), minimum variance (Ward, 1963), K-means (Hartigan & Wong, 1979; MacQueen, 1967), and p-median methods (Mulvey & Crowder, 1979). For data generated based on squared Euclidean measures of dissimilarity, Ward’s method provided the lowest misclassification rates, followed by the p-median method. The p-median model, however, provided the lowest misclassification rates when the pairwise measure of dissimilarity was based on Euclidean distance.

  46. The PMP: Boolean Linear Programming Formulation (ReVelle and Swain, 1970) • Constraints (s.t.): each client is served by exactly one facility; p opened facilities; clients are prevented from being served by closed facilities • xij = 1 if the j-th client is served by the i-th facility; xij = 0 otherwise
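The formulas themselves are missing from the transcript. The textbook form of this model, which matches the three constraint descriptions and the definition of xij above, is sketched below. The facility-opening variable $y_i$ is an assumption here, since its definition does not appear in the surviving slide text.

```latex
\begin{align*}
\min \quad & \sum_{i \in I} \sum_{j \in J} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i \in I} x_{ij} = 1 && \text{for all } j \in J
    && \text{(each client is served by exactly one facility)} \\
& \sum_{i \in I} y_i = p && && \text{($p$ opened facilities)} \\
& x_{ij} \le y_i && \text{for all } i \in I,\ j \in J
    && \text{(no service from closed facilities)} \\
& x_{ij},\, y_i \in \{0,1\} && \text{for all } i \in I,\ j \in J.
\end{align*}
```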

  47. The PMP: alternative formulation, Cornuejols et al. 1980 • For each client j, consider the sorted distinct distances (Kj is the number of distinct distances for the j-th client)

  49. The PMP: alternative formulation, Cornuejols et al. 1980 • For each client j, consider the sorted distinct distances (Kj is the number of distinct distances for the j-th client) • Decision variables • S – set of opened plants
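The symbols and decision variables of this formulation are missing from the transcript, so the following is a reconstruction of the underlying idea rather than the slide's own notation. If a client's distinct distances are sorted in increasing order, the client's cost under a set S of opened plants equals the smallest distance plus the sum of the gaps between consecutive distances for every level at which no opened plant is within reach. The indicator `z` and all names in the sketch are assumptions.

```python
def client_cost_telescoped(distances, opened):
    """Cost of one client computed from its sorted distinct distances.

    `distances[i]` is the client's distance to location i; `opened` is the
    (non-empty) set of opened locations.
    """
    levels = sorted(set(distances))
    cost = levels[0]
    for k in range(len(levels) - 1):
        # z = 1 when no opened location lies within distance levels[k].
        z = int(all(distances[i] > levels[k] for i in opened))
        cost += (levels[k + 1] - levels[k]) * z
    return cost


toy_distances = [4, 9, 4, 7, 12]
toy_opened = {1, 3}
telescoped = client_cost_telescoped(toy_distances, toy_opened)
direct = min(toy_distances[i] for i in toy_opened)
```

Both values coincide, which is why one such indicator per distinct distance level is enough to express the client's cost, in place of one assignment variable per client-location pair.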
