Mixed Tools for Market Analysis and Their Applications

Boris Goldengorin • LATNA – Laboratory of Algorithms and Technologies for Network Analysis, Higher School of Economics, Moscow, Russian Federation • bgoldengorin@hse.ru • Joint work with M. Batsyn, V. Kalyagin, A. Kocheturov, P.M. Pardalos, A. Vizgunov

Dedicated to Boris Mirkin's Birthday • Professor, Department of Applied Mathematics, Higher School of Economics, Moscow, RF • clustering • decision making • mathematical classification • evolutionary trees • data and text interpretation • Citation indices: all citations 3865 • h-index 28 • i10-index 50

Mirkin visited me in Alma-Ata, Kazakhstan, in 1981 • The USSR Workshop on Statistical and Discrete Analysis of Non-Numerical Information, Expert’s Estimations and Discrete Optimization. Abstracts. Moscow-Alma-Ata, VINITI AN SSSR, 1981, pp.356 (in Russian)

Abstract • Efficient daily trading requires aggregating positions that are correlated with each other according to one of the trader’s criteria. Position aggregation is one possible way to increase an online trader’s capacity. • In this talk we analyse the well-known minimum spanning tree (forest) approach used for market graph analysis and combine it with the lesser-known pseudo-Boolean approach based on the p-median problem. • We illustrate our mixed tools (spanning p-forest combined with p-stars) by applying them to different sources of data, including market graphs and cell formation in group technology.

Outline of the talk • The Market Graph • The Minimum Spanning Tree (MST) Problem • MST and Its Tolerances • Stars and the p-Median Problem • Pseudo-Boolean polynomial • Mixed Boolean pseudo-Boolean Model (MBpBM) • Experimental results • Concluding Remarks • Directions for Future Research

Market Graph • Vertices are stocks, and an edge connects two stocks if the correlation between their price fluctuations over a certain period is greater than a specified threshold • ~6000 vertices (stocks)
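The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `pearson` and `market_graph`, the toy return series and the threshold 0.5 are hypothetical and do not come from the talk.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def market_graph(series, threshold):
    """Return edges (i, j), i < j, whose correlation exceeds the threshold.

    series[i] is the list of price fluctuations (returns) of stock i.
    """
    n = len(series)
    return [(i, j)
            for i in range(n) for j in range(i + 1, n)
            if pearson(series[i], series[j]) > threshold]

# Toy data (hypothetical): stocks 0 and 1 move together, stock 2 mirrors stock 0.
toy_series = [
    [0.01, -0.02, 0.03, -0.01],
    [0.02, -0.01, 0.04, -0.02],
    [-0.01, 0.02, -0.03, 0.01],
]
edges = market_graph(toy_series, threshold=0.5)
print(edges)
```

Only the pair (0, 1) is strongly positively correlated here, so it is the only edge; raising or lowering the threshold changes the edge density of the resulting graph.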

Market Graph • Correlation coefficients for the edges: distribution of correlation coefficients in the US stock market for several overlapping 500-day periods during 2000–2002 (period 1 is the earliest, period 11 is the latest).

Market Graph • The market graph (all the considered instances for different correlation thresholds) follows the power-law model • Using a combination of heuristic and exact algorithms, the exact solution of the maximum clique problem was found (Boginski, Butenko & Pardalos, 2005)

Degree distribution of the Market graph

Finding Cliques in the Market graph • Using the IP formulation of the maximum clique problem to find the exact solution:
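The formula from this slide did not survive extraction. As a reference point only, the standard edge formulation of the maximum clique problem on G = (V, E) is given below; it is an assumption that the slide showed this particular variant. The binary variable x_i (notation chosen here) equals 1 when vertex i belongs to the clique.

```latex
\begin{align*}
\max \quad & \sum_{i \in V} x_i \\
\text{s.t.} \quad & x_i + x_j \le 1 \qquad \forall\, \{i,j\} \notin E,\ i \ne j, \\
& x_i \in \{0,1\} \qquad \forall\, i \in V.
\end{align*}
```

Each constraint forbids choosing two vertices that are not joined by an edge, so any feasible solution is a clique.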

Maximum Clique size for different correlation thresholds • Large cliques despite very low edge density – confirms the idea about the “globalization” of the market

The Minimum Spanning Tree (MST) Problem • For a given simple weighted undirected graph G = (V, E, W), find a spanning tree T = (V, E(T)) such that the total sum of the edge weights w(e) over all e ∈ E(T) is minimized. It is well known that an MST is a connected acyclic graph containing exactly n-1 edges, and that it can be computed by means of Kruskal’s (greedy-type) algorithm. • At each step Kruskal’s algorithm adds a shortest remaining edge that keeps the current graph a forest.

Examples of Spanning Trees • Weekly volatility before the technology crash • Daily return before the technology crash

Clique and Forest

Kruskal’s Algorithm for the MST • Repeat the following step until the forest T has n-1 edges (initially E(T) is empty): add to T a shortest edge that does not form a cycle with the edges already in E(T). • Assume that all m = |E| edges are sorted in non-decreasing order of weight, w(e1) ≤ w(e2) ≤ … ≤ w(em). Then Kruskal’s algorithm terminates with an MST in O(m log m) time, where m = n(n-1)/2 for a complete graph.
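The greedy step above can be sketched with a union-find structure for the cycle test. This is a minimal illustration, not the implementation used in the talk; the function name `kruskal` and the small example graph are hypothetical.

```python
def kruskal(n, edges):
    """Kruskal's algorithm on vertices 0..n-1.

    edges is a list of (weight, u, v) tuples.
    Returns (total_weight, tree_edges).
    """
    parent = list(range(n))

    def find(x):
        # Find the component representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, total = [], 0
    for w, u, v in sorted(edges):          # non-decreasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: no cycle
            parent[ru] = rv
            tree.append((w, u, v))
            total += w
            if len(tree) == n - 1:         # spanning tree is complete
                break
    return total, tree

# Hypothetical example graph on 4 vertices.
example_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
total, tree = kruskal(4, example_edges)
print(total, tree)
```

The sort dominates the running time, which gives the O(m log m) bound stated above.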

The tolerance problem for an MST • The problem is to find, for each e ∈ E, the maximum decrease l(e) and the maximum increase u(e) of the edge length w(e) that preserve the optimality of T, under the assumption that the lengths of all other edges remain unchanged. • The values l(e) and u(e) are called the lower and the upper tolerances, respectively, of an edge e ∈ E with respect to the given MST T and the function of edge lengths w.

An optimal MST and Its Tolerances in O(m log m) time • In what follows we show that an MST together with all its upper and lower tolerances can be computed in O(m log m) time by a tiny modification of Kruskal’s algorithm. • Recall that adding a single edge y not in T to the chosen spanning subtree S(T) creates a unique cycle C = {e1, e2, …, ek, y}, where the tail of y is the head of ek and the head of y is the tail of e1, or vice versa.

Cliques and spanning trees
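The unique-cycle property gives a direct way to compute the finite tolerances, sketched below. This is a naive illustration that walks the tree path for every non-tree edge, so it is slower than the O(m log m) modification of Kruskal’s algorithm mentioned above, which is not reproduced here. The function names and the example graph are hypothetical; a simple graph with edges stored as (weight, u, v) tuples is assumed.

```python
from collections import defaultdict

def tree_path(adj, src, dst):
    """Edges on the unique path from src to dst in a tree (iterative DFS)."""
    stack = [(src, None, [])]
    while stack:
        node, prev, path = stack.pop()
        if node == dst:
            return path
        for nxt, edge in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, path + [edge]))
    return []

def mst_tolerances(edges, tree):
    """Finite tolerances with respect to a given MST.

    Returns (upper, lower):
      upper[e] for tree edges e     = min over cycles through e of w(y) - w(e)
      lower[y] for non-tree edges y = w(y) - max weight on y's cycle in the tree
    Lower tolerances of tree edges and upper tolerances of non-tree edges
    are unbounded and therefore not listed.
    """
    adj = defaultdict(list)
    for e in tree:
        _, u, v = e
        adj[u].append((v, e))
        adj[v].append((u, e))
    tree_set = set(tree)
    upper = {e: float("inf") for e in tree}
    lower = {}
    for y in edges:
        if y in tree_set:
            continue
        w, u, v = y
        cycle = tree_path(adj, u, v)       # tree edges closing a cycle with y
        lower[y] = w - max(e[0] for e in cycle)
        for e in cycle:
            upper[e] = min(upper[e], w - e[0])
    return upper, lower

# Hypothetical example: a 4-vertex graph and one of its MSTs.
all_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = [(1, 0, 1), (2, 1, 3), (3, 1, 2)]
upper, lower = mst_tolerances(all_edges, mst)
print(upper, lower)
```

In this example the tree edge (3, 1, 2) has upper tolerance 1: raising its length by more than 1 makes the non-tree edge of length 4 a better choice.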

Equivalent Problems • The clique problem and the independent set problem are complementary: a clique in G is an independent set in the complement graph of G and vice versa. • The set {1,2,3,4} is the maximum clique; the set {0,2,5} is the maximum independent set
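The complementarity can be checked by brute force on a small graph. The original figure is not available, so the edge list below is a hypothetical graph on vertices 0..5 chosen only so that it reproduces the two sets named on the slide; the function names are likewise illustrative.

```python
from itertools import combinations

def max_clique(vertices, edges):
    """Largest clique by exhaustive search (only suitable for tiny graphs)."""
    adj = {frozenset(e) for e in edges}
    for k in range(len(vertices), 0, -1):
        for cand in combinations(vertices, k):
            if all(frozenset(p) in adj for p in combinations(cand, 2)):
                return set(cand)
    return set()

def complement(vertices, edges):
    """Edges of the complement graph."""
    adj = {frozenset(e) for e in edges}
    return [p for p in combinations(vertices, 2) if frozenset(p) not in adj]

# Hypothetical graph: a clique on {1,2,3,4} plus three extra edges.
V = [0, 1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (0, 1), (0, 3), (4, 5)]

clique = max_clique(V, E)
# A maximum independent set of G is a maximum clique of the complement of G.
independent = max_clique(V, complement(V, E))
print(clique, independent)
```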

The p-Median Problem (PMP) • I = {1,…,m} – a set of m facilities (location points) • J = {1,…,n} – a set of n users (clients, customers or demand points) • C = [cij] – an m×n matrix of distances travelled or costs incurred (measures of similarity or dissimilarity) • Costs matrix: rows are location points (cluster centers), columns are clients (cluster points)

The PMP: combinatorial formulation • The p-Median Problem (PMP) consists of determining p locations (the median points), 1 ≤ p ≤ m, such that the sum of distances (or transportation costs) over all clients is minimal. • Complexity: NP-hard • Illustration with p = 3; legend: opened facility, location point, client

The PMP: combinatorial formulation • I – set of locations • J – set of clients • cij – costs for serving the j-th client from the i-th location • p – number of facilities to be opened
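The formula itself did not survive extraction. With the symbols defined above, the usual combinatorial statement of the PMP reads as follows; S, the set of opened facilities, is notation assumed here.

```latex
\[
\min_{\substack{S \subseteq I \\ |S| = p}} \ \sum_{j \in J} \ \min_{i \in S} c_{ij}
\]
```

Each client is served from its cheapest opened location, and the p opened locations are chosen to minimize the total.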

The PMP: Applications • Facility location • Cluster analysis • Quantitative psychology • Telecommunications industry • Sales force territories design • Political and administrative districting • Optimal diversity management (assortment problems) • Cell formation in group technology (flexible manufacturing systems) • Vehicle routing • Topological design of computer and communication networks

The PMP: Applications • Facility location • Legend: consumer (client); possible location of supplier (server); supplier (server), e.g. supermarket, bakery, laundry, etc.

The PMP: Applications • Cluster analysis • Input: a finite set of objects and a measure of similarity • Output: clusters (cluster 1 to cluster 4 in the illustration) and their “best” representatives, the p-medians
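The clustering use of the PMP can be illustrated by exhaustive search on a toy instance. This sketch enumerates all p-subsets, so it is only meant for very small inputs and is not the method of the talk; the function name, the six points on a line and the absolute-difference dissimilarity are assumptions made for the example.

```python
from itertools import combinations

def p_median_bruteforce(cost, p):
    """Solve a tiny PMP by enumerating all p-subsets of locations.

    cost[i][j] is the cost of serving client j from location i.
    Returns (total_cost, medians, assignment), where assignment[j]
    is the median that serves client j.
    """
    m, n = len(cost), len(cost[0])
    best_total, best_set = None, None
    for subset in combinations(range(m), p):
        total = sum(min(cost[i][j] for i in subset) for j in range(n))
        if best_total is None or total < best_total:
            best_total, best_set = total, subset
    assignment = [min(best_set, key=lambda i: cost[i][j]) for j in range(n)]
    return best_total, best_set, assignment

# Hypothetical objects: six points on a line; every object is both a
# candidate median (location) and a client.
points = [0, 1, 2, 10, 11, 12]
cost = [[abs(a - b) for b in points] for a in points]

total, medians, assignment = p_median_bruteforce(cost, p=2)
print(total, medians, assignment)
```

The two medians found are the middle points of the two groups, and the assignment list gives the resulting clusters.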

The PMP: Applications • Quantitative psychology • patients • symptoms (behavioural patterns) • type 1 mentality features • type 2 mentality features • “leaders” or typical representatives

The PMP: Applications • Telecommunications industry

The PMP: Applications • Sales force territories design • customers (groups of customers) • possible outlets for some product • entries of the costs matrix account for customers’ attitudes and spatial distance • Goal: select the p best outlets for promoting the product

The PMP: Applications • Political and administrative districting • districts, cities, regions • degree of relationship: political, cultural, infrastructural connectedness

The PMP: Applications • Optimal diversity management • given a variety of products (each having some demand, possibly zero) • select p products such that: • every product with a nonzero demand can be replaced by one of the p selected products • replacement overcosts are minimized

The PMP: Applications • Optimal diversity management • Example: wiring designs, p = 3 • Legend: configurations with zero demand

The PMP: Applications • Cell formation in group technology • functional layout vs. cellular layout • Legend: machines; products’ routes • See also the video at http://www.youtube.com/watch?v=q_m0_bVAJbA

The PMP: Applications • Vehicle routing - clients / storage - vehicle routes

The PMP: Applications • Topological design of computer and communication networks

Publications: more than 500 • Goldengorin et al., 2011, 2012; Elloumi, 2010; Brusco and Köhn, 2008; Belenky, 2008; Church, 2003, 2008; Avella et al., 2007; Beltran et al., 2006; Reese, 2006 (overview, Networks); ReVelle and Swain, 1970; Senne et al., 2005.

Brusco and Köhn, Psychometrika, Vol. 73, No. 1, 89–105 • There is evidence that the p-median model can, for certain data structures, provide better cluster recovery than alternative clustering procedures (Klastorin, 1985). Klastorin provided a limited comparison of misclassification rates of the complete linkage (Johnson, 1967), average linkage (Sokal & Sneath, 1963), minimum variance (Ward, 1963), K-means (Hartigan & Wong, 1979; MacQueen, 1967), and p-median methods (Mulvey & Crowder, 1979). For data generated based on squared Euclidean measures of dissimilarity, Ward’s method provided the lowest misclassification rates, followed by the p-median method. The p-median model, however, provided the lowest misclassification rates when the pairwise measure of dissimilarity was based on Euclidean distance.

The PMP: Boolean Linear Programming Formulation (ReVelle and Swain, 1970) • Minimize the total service cost subject to (s.t.) the following constraints: • each client is served by exactly one facility • exactly p facilities are opened • clients are not served by closed facilities • xij = 1 if the j-th client is served by the i-th facility; xij = 0 otherwise
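The formulas on this slide were lost in extraction. A common way to write a model matching the three constraint descriptions is given below; the opening variable y_i (1 if facility i is opened, 0 otherwise) is notation assumed here, and the slide may have used a different but equivalent form.

```latex
\begin{align*}
\min \quad & \sum_{i \in I} \sum_{j \in J} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i \in I} x_{ij} = 1 && \forall\, j \in J, \\
& \sum_{i \in I} y_i = p, \\
& x_{ij} \le y_i && \forall\, i \in I,\ j \in J, \\
& x_{ij},\, y_i \in \{0,1\} && \forall\, i \in I,\ j \in J.
\end{align*}
```

The three constraint groups correspond, in order, to the three descriptions above.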

The PMP: alternative formulation (Cornuejols et al., 1980) • For each client j, take the sorted distinct distances (Kj is the number of distinct distances for the j-th client) • Decision variables • S – set of opened plants
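The symbols of this formulation did not survive extraction. For orientation, the formulation usually attributed to Cornuejols et al. (1980) in the p-median literature is sketched below. All notation is assumed here and may differ from the slide: D_j^1 < D_j^2 < … < D_j^{K_j} are the sorted distinct distances of client j, y_i = 1 if facility i is opened, and z_j^k = 1 if no facility is opened within distance D_j^k of client j.

```latex
\begin{align*}
\min \quad & \sum_{j \in J} \Bigl( D_j^{1} + \sum_{k=1}^{K_j - 1} \bigl( D_j^{k+1} - D_j^{k} \bigr)\, z_j^{k} \Bigr) \\
\text{s.t.} \quad & \sum_{i \in I} y_i = p, \\
& z_j^{k} + \sum_{i \,:\, c_{ij} \le D_j^{k}} y_i \ \ge\ 1 && \forall\, j \in J,\ k = 1, \dots, K_j - 1, \\
& z_j^{k} \ge 0, \quad y_i \in \{0,1\}.
\end{align*}
```

If the nearest opened facility of client j lies at distance D_j^r, then z_j^k = 1 exactly for k < r, and the telescoping sum in the objective evaluates to D_j^r.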
