350 likes | 508 Views
A criterion-based PLS approach to SEM. Michel Tenenhaus (HEC Paris) Arthur Tenenhaus (SUPELEC). Economic inequality Agricultural inequality GINI : Inequality of land distributions FARM : % farmers that own half of the land (> 50) RENT : % farmers that rent all their land
E N D
A criterion-based PLS approach to SEM Michel Tenenhaus (HEC Paris) Arthur Tenenhaus (SUPELEC)
Economic inequality Agricultural inequality GINI :Inequality of land distributions FARM : % farmers that own half of the land (> 50) RENT : % farmers that rent all their land Industrial development GNPR : Gross national product per capita ($ 1955) LABO : % of labor force employed in agriculture Political instability INST : Instability of executive (45-61) ECKS : Nb of violent internal war incidents (46-61) DEAT : Nb of people killed as a result of civic group violence (50-62) D-STAB : Stable democracy D-UNST : Unstable democracy DICT : Dictatorship Economic inequality and political instability Data from Russett (1964), in GIFI
Economic inequality and political instability (Data from Russett, 1964)
A SEM model Agricultural inequality (X1) INST GINI ECKS C13 = 1 FARM 1 DEAT RENT 3 C12 = 0 D-STB GNPR D-INS 2 C23 = 1 LABO DICT Industrial development (X2) Political instability (X3) 4
Some modified multi-block methods for SEM cjk = 1 if blocks are linked, 0 otherwise and cjj = 0 GENERALIZED CANONICAL CORRELATION ANALYSIS GENERALIZED PLS REGRESSION
Covariance-based criteria for SEM cjk = 1 if blocks are linked, 0 otherwise and cjj = 0
A general procedure to obtain critical points of the criteria • Construct the Lagrangian function related to the optimization problem. • Cancel the derivative of the Lagrangian function with respect to each wi. • Use the Wold’s procedure to solve the stationary equations ( Lohmöller’s procedure). • This procedure converges to a critical point of the criterion. • The criterion increases at each step of the algorithm.
Yj = Xjwj Outer Estimation Initial step wj Y1 Zj ej1 Inner estimation Y2 ej2 YJ ejJ • Choice of weights ejh: • Horst : ejh = cjh • Centroid : ejh = cjhsign(Cor(Yh , Yj) • Factorial : ejh = cjhCov(Yh , Yj) cjh = 1 if blocks are linked, 0 otherwise and cjj = 0 The general algorithm Iterate until convergence
With usual PLS-PM constraints: Specific cases PLS Mode B
Specific cases With usual PLS regression constraint: PLS New Mode A
X1 X2 x1 x2 (*) I. PLS approach : 2 blocks (*) Deflation: Working on residuals of the regression of X on the previous LV’s in order to obtain orthogonal LV’s.
PLS regression (2 components) Use of PLS-GRAPH dim 1 - Mode A for X - Mode A for Y - Deflate only X dim 2
PLS Regression in SIMCA-P : PLS Scores australie 3 royaume-uni 2 états-unis argentine nouvelle zélande belgique venezuela uruguay pays-bas irak 1 cuba espagne italie chili luxembourg équateur suède pèrou colombie autriche autriche suisse rép. dominicaine france West Germany grèce salvador bolivie 0 costa-rica guatémala t[2] taiwan brésil brésil norvège panama canada honduras nicaragua égypte sud vietnam philippines - 1 inde libye finlande danemark irlande - 2 japon japon - 3 pologne yougoslavie - 4 - 4 - 3 - 2 - 1 0 1 2 3 4 t[1]
1 . 0 0 RENT 0 . 8 0 GINI 0 . 6 0 FARM 0 . 4 0 GNPR DEMOSTB 0 . 2 0 DEAT pc(corr)[Comp. 2] 0 . 0 0 INST DEMOINST ECKS - 0 . 2 0 DICTATURE - 0 . 4 0 LABO - 0 . 6 0 - 0 . 8 0 - 1 . 0 0 - 1 . 0 0 - 0 . 8 0 - 0 . 6 0 - 0 . 4 0 - 0 . 2 0 0 . 0 0 0 . 2 0 0 . 4 0 0 . 6 0 0 . 8 0 1 . 0 0 pc(corr)[Comp. 1] Correlation loadings
- Mode A for X - Mode B for Y - Deflate only X Redundancy analysis of X on Y(2 components) dim 1 dim 2
dim 1 dim 2 Inter-battery factor analysis (2 components) - Mode A for X - Mode A for Y - Deflate both X and Y
dim 1 dim 2 Canonical correlation analysis (2 components) - Mode B for X - Mode B for Y - Deflate both X and Y
Barker & Rayens PLS DA - Mode A for X - Mode B for Y - Deflate only on X 20
Barker & Rayens PLS DA Economic inequality vs Political regime Separation between political regimes is improved compared to PLS-DA.
Generalized PLS regression Generalized CCA II. Hierarchical model : J blocs x1 X1 X1 XJ …….. x . . . xJ XJ Deflation : On original blocks and/or the super-block
III. Multi-block data analysis SABSCOR : PLS Mode B + Centroid scheme Use of XLSTAT-PLSPM
Multiblock data analysis SSQCOR : PLS Mode B + Factorial scheme
* Criterion optimized by the method Practice supports “theory” * * (checked on 50 000 random initial weights)
C13=1 C12=0 C23=1 Cij = 1 if i and j are connected = otherwise IV. Structural Equation Modeling Agricultural inequality (X1) INST GINI ECKS FARM 1 DEAT RENT 3 D-STB GNPR D-INS 2 LABO DICT Industrial development (X2) Political instability (X3)
SABSCOR-PLSPMMode B + Centroid scheme Y1 = X1w1 Y3 = X3w3 Y2 = X2w2
SSQCOR-PLSPMMode B + Factorial scheme Y1 = X1w1 Y3 = X3w3 Y2 = X2w2
* * * Criterion optimized by the method (checked on 50 000 random initial weights) Comparison between methods Practice supports “theory”
Mode A + Centroid scheme The criterion optimized by the algorithm, if any, is unknown. 30
SABSCOV-PLSPMNew Mode A + Centroid scheme weight 0.66 Cor=.429 0.17 0.74 Cov=1.00 0.44 0.11 0.50 -0.55 Cov=-1.69 0.69 0.46 Cor=-.764 -0.72 One-step hierarchical PLS Regression 31
SABSCOV-PLSPMNew Mode A + Factorial scheme weight 0.66 Cor=.4276 0.17 0.74 Cov=1.00 0.44 0.10 0.48 -0.56 Cov=-1.69 0.69 0.49 Cor=-.7664 -0.72 One-step hierarchical PLS Regression 32
Generalized Barker & Rayens PLS-DASSQCOV-PLSPMNew Mode A for X1 and X2 and Mode B for Y weight 0.62 Cov 0.75 -0.72 0.55 0.22 0.39 0.67 -1.04 -0.74 One-step hierarchical B&R PLS-DA
Generalized Barker & Rayens PLS-DA Industrial development Agricultural inequality
Conclusion • In the PLS approach of Herman Wold, the constraint is: • In the PLS regression of Svante Wold, the constraint is: • This presentation unifies both approaches.