APPLICATION OF MULTIVARIATE ANALYSES TO FIND PREDICTORS OF MULTIPLE GESTATIONS FOLLOWING IN VITRO FERTILIZATION. Krisztina Boda and Péter Kovács Department of Medical Informatics, University of Szeged, Hungary ( boda@dmi.uszeged.hu ), and Kaali Institute IVF Center, Budapest, Hungary.
Introduction • Multivariate methods are frequently used in medical research. The choice of the appropriate method depends on several criteria, but multicollinearity is a common problem of these methods. • The aim of this work is to show the application of multivariate methods to find the best predictors of multifetal pregnancy from several, highly correlated independent variables.
Background • Assisted reproductive technology (ART) has led to a dramatic increase in multiple gestations. • Multiple gestations, especially high-order multiple gestations, are an undesired outcome of ART. A multifetal pregnancy is associated with significant maternal, fetal and neonatal morbidity and mortality.
Data • Retrospective analysis of 896 fresh in vitro fertilization (IVF) cycles from 2002-2003 that resulted in pregnancy. • The following were evaluated and compared between cycles resulting in singleton and multiple gestations: • Patient characteristics: age, baseline FSH, etiology of infertility • Stimulation parameters: protocol, number of follicles, oocytes, mature oocytes (MII), fertilization rate, endometrial thickness • Embryology parameters: number of embryos transferred, quality of best embryo transferred, embryo score (ESC)
Frequencies of pregnancies by the number of embryos transferred (What can we do with the zero cells?)
Dealing with zeros: pooling cells • The dependent variable is dichotomous (singleton vs. multiple pregnancies). • Number of embryos transferred: omit „1”, pool 4-5.
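The pooling rule above can be sketched as a small recoding function. This is only an illustration: the indicator names `et23` and `et24` are borrowed from the PROC GENMOD model in these slides, and reading them as "2 vs. 3" and "2 vs. ≥4" dummy variables is an assumption, as is the function name.

```python
def code_embryos_transferred(n_embryos):
    """Return (et23, et24) dummy indicators, or None if the cycle is omitted.

    Assumed coding: 2 embryos is the reference category,
    et23 marks 3 embryos, et24 marks the pooled 4-5 category.
    """
    if n_embryos == 1:
        return None        # single-embryo transfers are omitted
    if n_embryos == 2:
        return (0, 0)      # reference category
    if n_embryos == 3:
        return (1, 0)      # 2 vs. 3
    if n_embryos in (4, 5):
        return (0, 1)      # 2 vs. >=4 (4 and 5 pooled)
    raise ValueError(f"unexpected number of embryos: {n_embryos}")


coded = [code_embryos_transferred(k) for k in (1, 2, 3, 4, 5)]
```

Pooling sparse categories this way removes the empty cells from the cross-tabulation at the cost of treating 4 and 5 embryos as equivalent.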
Methods • Two groups: singleton vs. multiple gestations • Three groups: singleton, twin and higher-order multiple gestations • Factors that could influence outcome were first compared using univariate methods. • Multiple logistic regression was used to evaluate the association between cycle outcome and the factors that potentially influence the order of pregnancy: binary logistic regression to compare two groups, and multinomial logistic regression to compare three groups. • Poisson regression • Strong correlation was found between several independent variables, so multicollinearity diagnostics were performed.
Variables and p-values of univariate analyses when comparing singleton vs. multiple pregnancies Candidate variables for binary logistic regression Red: p<0.05 Blue: p is „small”
Structure of variables based on correlations by cluster analysis
The phenomenon of multicollinearity • When the independent variables are correlated, there are problems in estimating regression coefficients. • The greater the multicollinearity, the greater the standard errors. • Slight changes in model structure result in considerable changes in the magnitude or sign of parameter estimates.
Identification of problematic multicollinearity I. Collinearity statistics • Tolerance. Measures how strongly an independent variable is linearly related to the others: the proportion of the variable's variance not accounted for by the other independent variables in the equation, Tolerance_j = 1 - R_j^2. • Variance inflation factor (VIF). The reciprocal of the tolerance, VIF_j = 1 / (1 - R_j^2). As the VIF increases, so does the variance of the regression coefficient, making it an unstable estimate. Large VIF values (>4) indicate multicollinearity. • R_j^2: the coefficient of determination for the regression of the jth independent variable on all other independent variables.
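As a rough illustration of these two statistics, the sketch below regresses each column on the others and derives tolerance and VIF from the resulting R². The data are synthetic and the variable names (`follicles`, `oocytes`, `age`) are only placeholders; this is not the authors' data or software.

```python
import numpy as np


def tolerance_and_vif(X):
    """Tolerance (1 - R_j^2) and VIF (1 / tolerance) for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tolerance = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ beta
        ss_res = residual @ residual
        ss_tot = np.sum((y - y.mean()) ** 2)
        tolerance[j] = ss_res / ss_tot      # equals 1 - R_j^2
    return tolerance, 1.0 / tolerance


# Synthetic example: two strongly correlated predictors and one independent one.
rng = np.random.default_rng(0)
n = 500
follicles = rng.normal(size=n)
oocytes = follicles + 0.3 * rng.normal(size=n)
age = rng.normal(size=n)

tol, vif = tolerance_and_vif(np.column_stack([follicles, oocytes, age]))
```

With data built like this, the two correlated columns should show VIF values well above the >4 threshold quoted on the slide, while the independent column stays close to 1.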
Identification of problematic multicollinearity II. Factor analysis • Extraction method: principal components analysis • Rotation method: varimax with Kaiser normalization • Number of factors: eigenvalues >1 • Results: number of factors = 6, total variance explained = 69.62%
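The extraction step and the eigenvalue >1 rule can be sketched as follows. The example uses synthetic data with two latent factors, omits the varimax rotation, and the function name is hypothetical; it only shows how the number of factors and the percentage of variance explained are obtained from the correlation matrix.

```python
import numpy as np


def kaiser_factor_count(X):
    """Principal components of the correlation matrix, keeping eigenvalues > 1."""
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]       # sorted largest first
    keep = eigenvalues > 1.0
    explained_pct = 100.0 * eigenvalues[keep].sum() / eigenvalues.sum()
    return int(keep.sum()), explained_pct, eigenvalues


# Synthetic data: six observed variables driven by two latent factors.
rng = np.random.default_rng(0)
n = 1000
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=n) for _ in range(3)]
)

n_factors, explained_pct, eigenvalues = kaiser_factor_count(X)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the percentage explained is simply the sum of the retained eigenvalues divided by that number.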
For each factor, the parameter with the strongest association was in most cases the one included in the multivariate model
Binary logistic regression • Dependent variable: • pregnancy (singleton vs. multiple pregnancies) • Independent variables: • No. of mature oocytes • Mean embryo score • No. of embryos transferred (categorical) - 2 vs. 3 - 2 vs. ≥4 • Age • Rate of fertilized oocytes • Baseline FSH (IU/l) • Endometrial thickness
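A minimal sketch of how such a binary logistic model can be fitted and how one term can be assessed by a likelihood ratio statistic. It uses Newton-Raphson on synthetic data; the coefficients, the sample, and the names `fsh` and `et3` are invented for illustration and are not the study's estimates.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic model; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    log_lik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, log_lik


# Synthetic data only: the "true" coefficients below are made up.
rng = np.random.default_rng(1)
n = 2000
fsh = rng.normal(7.65, 2.0, size=n)
et3 = rng.integers(0, 2, size=n).astype(float)   # 1 = three embryos, 0 = two
true_logit = -0.2 - 0.15 * fsh + 0.8 * et3
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X_full = np.column_stack([np.ones(n), fsh, et3])
X_reduced = X_full[:, [0, 2]]                    # same model without FSH

beta_full, ll_full = fit_logistic(X_full, y)
_, ll_reduced = fit_logistic(X_reduced, y)
lr_statistic = 2.0 * (ll_full - ll_reduced)      # compare with chi-square, 1 df
```

In practice a statistics package would be used instead; the point is only that the likelihood ratio statistic for a term is twice the difference in log-likelihood between the models with and without it.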
Results of stepwise binary logistic regression (main effects) The significance of model terms in logistic regression was assessed by the likelihood ratio test. Mean embryo score and the number of embryos transferred were positively, while baseline FSH level was negatively associated with multiple gestations.
Estimated probability of multiple pregnancy at a mean FSH level 7.65 The model-equation
Examination of interactions Specific interactions between parameters of interest were also investigated.
Poisson regression model • Dependent variable: • Number of pregnancies • Independent variables: • No. of mature oocytes • Mean embryo score • No. of embryos transferred (categorical) - 2 vs. 3 - 2 vs. ≥4 • Age • Rate of fertilized oocytes • Baseline FSH (IU/l) • Endometrial thickness
Poisson regression results by PROC GENMOD

proc genmod data=KOVACSP.diakhoz;
  model tsz = fsh et23 et24 meanesc / dist=poi link=log obstats dscale;
  ods output ObStats=temp;
run;

Criteria For Assessing Goodness Of Fit

Criterion             DF     Value         Value/DF
Deviance              810    190.1216      0.2347
Scaled Deviance       810    810.0000      1.0000
Pearson Chi-Square    810    219.2273      0.2707
Scaled Pearson X2     810    934.0028      1.1531
Log Likelihood               -3246.1994

Analysis Of Parameter Estimates

Parameter    DF    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
Intercept    1      0.1497     0.0693             0.0139    0.2855             4.67          0.0307
fsh          1     -0.0095     0.0056            -0.0205    0.0015             2.89          0.0891
et23         1      0.1492     0.0389             0.0730    0.2255             14.70         0.0001
et24         1      0.1883     0.0496             0.0910    0.2856             14.40         0.0001
meanesc      1      0.0063     0.0020             0.0024    0.0102             9.99          0.0016
Scale        0      0.4845     0.0000             0.4845    0.4845

NOTE: The scale parameter was estimated by the square root of DEVIANCE/DOF.
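The printed output can be checked by hand. The sketch below recomputes the scale parameter from the deviance and, for each coefficient, the Wald confidence limits, the Wald chi-square and the rate ratio exp(estimate) from the printed estimate and standard error. Small differences from the listing are expected because the inputs are already rounded; the helper name is hypothetical.

```python
import math

# Values copied from the PROC GENMOD listing.
deviance, dof = 190.1216, 810
scale = math.sqrt(deviance / dof)          # DSCALE: sqrt(deviance / DF)

estimates = {                              # parameter: (estimate, standard error)
    "fsh": (-0.0095, 0.0056),
    "et23": (0.1492, 0.0389),
    "et24": (0.1883, 0.0496),
    "meanesc": (0.0063, 0.0020),
}


def wald_summary(estimate, std_error, z=1.96):
    """Wald 95% limits, Wald chi-square and rate ratio for one coefficient."""
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    chi_square = (estimate / std_error) ** 2
    rate_ratio = math.exp(estimate)
    return lower, upper, chi_square, rate_ratio


summary = {name: wald_summary(b, se) for name, (b, se) in estimates.items()}
```

For `et23`, for example, the rate ratio is exp(0.1492), roughly 1.16, which under this model corresponds to about a 16% higher expected count than in the reference category.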
Estimated number of pregnancies at a mean FSH level 7.65 The model-equation
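With the log link, the fitted model has the form log(expected count) = 0.1497 - 0.0095·fsh + 0.1492·et23 + 0.1883·et24 + 0.0063·meanesc, using the estimates from the PROC GENMOD listing. The sketch below evaluates it at the mean FSH level of 7.65. Treating `et23` and `et24` as 0/1 indicators for 3 and ≥4 embryos is an assumption, and the embryo score of 20 is a made-up input, not a value from the study.

```python
import math

# Estimates copied from the PROC GENMOD listing.
COEF = {
    "intercept": 0.1497,
    "fsh": -0.0095,
    "et23": 0.1492,
    "et24": 0.1883,
    "meanesc": 0.0063,
}


def expected_count(fsh, n_embryos, meanesc):
    """Expected count under the fitted Poisson model (log link)."""
    if n_embryos < 2:
        raise ValueError("single-embryo transfers were omitted from the model")
    et23 = 1 if n_embryos == 3 else 0      # assumed coding: 2 vs. 3
    et24 = 1 if n_embryos >= 4 else 0      # assumed coding: 2 vs. >=4
    eta = (COEF["intercept"] + COEF["fsh"] * fsh + COEF["et23"] * et23
           + COEF["et24"] * et24 + COEF["meanesc"] * meanesc)
    return math.exp(eta)


MEAN_FSH = 7.65
HYPOTHETICAL_SCORE = 20                    # illustrative value only
by_embryos = {k: expected_count(MEAN_FSH, k, HYPOTHETICAL_SCORE) for k in (2, 3, 4)}
```

Because the model is multiplicative on the count scale, moving from 2 to 3 embryos multiplies the expected count by exp(0.1492) regardless of the other inputs.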
Pooling pregnancies into three groups: singletons, twins and higher-order multiple pregnancies Frequencies of pregnancies by the number of embryos transferred and by age
Variables and p-values of univariate analyses when comparing singleton, twin and higher-order multiple pregnancies Candidate variables for multinomial logistic regression Red: p<0.05 Blue: p is „small”
Multinomial logistic regression • Dependent variable: pregnancy • Reference category: singleton pregnancy • Independent variables (based on univariate results and factor analysis): • Embryo score • Baseline FSH • Cryopreservation • Age (group) • (Number of embryos transferred was suppressed: only >2 could be taken into account)
Results

Likelihood Ratio Tests

Effect       -2 Log Likelihood of Reduced Model    Chi-Square    df    Sig.
Intercept    1181.617(a)                           .000          0     .
ESC          1197.148                              15.531        2     .000
FSH          1188.177                              6.560         2     .038
CRYOS        1188.188                              6.570         2     .037
AGE          1189.922                              8.305         2     .016

Embryo score, FSH, age less than 35 years and the availability of surplus embryos for cryopreservation were linked to high-order multiple gestations.
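The table can be reproduced arithmetically: each chi-square is the difference between the -2 log likelihood of the reduced model (the effect removed) and that of the final model, which is the value shown in the Intercept row. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2). The sketch below uses only the printed numbers; the function name is hypothetical.

```python
import math

NEG2LL_FINAL = 1181.617                    # Intercept row: equals the final model
NEG2LL_REDUCED = {
    "ESC": 1197.148,
    "FSH": 1188.177,
    "CRYOS": 1188.188,
    "AGE": 1189.922,
}


def lr_test_df2(neg2ll_reduced, neg2ll_final):
    """Likelihood ratio chi-square and its p-value for 2 degrees of freedom."""
    chi_square = neg2ll_reduced - neg2ll_final
    p_value = math.exp(-chi_square / 2.0)  # exact chi-square tail for df = 2
    return chi_square, p_value


lr_results = {effect: lr_test_df2(v, NEG2LL_FINAL) for effect, v in NEG2LL_REDUCED.items()}
```

Each effect has 2 degrees of freedom because the outcome has three categories, so every predictor contributes one coefficient for twins and one for higher-order multiples.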
Examination of interactions • The interactions between age and the other variables in the model and all two-way interactions were examined and tested by the likelihood ratio test. • None of these interactions was significant.
Results of multinomial logistic regression • Embryo score was also significantly associated with higher-order multiple gestations. • Baseline FSH was lower in patients whose cycle resulted in twins. • When surplus embryos were available for cryopreservation, the risk of high-order multiple gestation was increased. • The risk of a high-order multiple gestation was increased 3.649-fold among women under the age of 35 years.
Conclusions • The different multivariate methods gave similar results. • Multicollinearity diagnostics and factor analysis were helpful in choosing the independent variables for the multivariate models: the final models used original and „relatively” uncorrelated variables.
Conclusions By limiting the number of high quality embryos transferred, especially among young women who have several good quality embryos, one could reduce the number of multifetal gestations and the perinatal outcome could be improved.
Drawbacks • The number of embryos transferred was decided by the clinician on the basis of personal experience; this subjective element may bias the model and the parameter estimates. Randomisation, however, could not be used for ethical reasons. • The data set contained no information about unsuccessful in vitro fertilizations, i.e. cycles that did not result in pregnancy.
Identification of problematic multicollinearity II. Collinearity diagnostics • Condition index: the square root of the ratio of the largest eigenvalue to each successive eigenvalue; the largest index (largest to smallest eigenvalue) is the condition number.
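A minimal sketch of this diagnostic, assuming the common convention of adding an intercept column, scaling every column to unit length and taking the eigenvalues of the resulting cross-product matrix. The data are synthetic and the function name is hypothetical.

```python
import numpy as np


def condition_indices(X):
    """Condition indices sqrt(lambda_max / lambda_k) of the scaled design matrix."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # add intercept column
    A = A / np.linalg.norm(A, axis=0)              # scale columns to unit length
    eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]  # sorted largest first
    return np.sqrt(eigenvalues[0] / eigenvalues)


# Synthetic example: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)

indices = condition_indices(np.column_stack([x1, x2, x3]))
condition_number = indices[-1]                     # largest to smallest eigenvalue
```

The first index is always 1; a large final index signals that at least one near-linear dependency exists among the columns.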
Result of stepwise binary logistic regression (main effects) The model-equation
Result of stepwise binary logistic regression (main effects)

The significance of model terms in logistic regression was assessed by the likelihood ratio test.

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    20.7141       4     0.0004
Score               19.9532       4     0.0005
Wald                19.4124       4     0.0007

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square    DF    Pr > ChiSq
3.1985        8     0.9213
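The p-values in both tables follow from the chi-square distribution. For an even number of degrees of freedom the upper-tail probability has a closed form, exp(-x/2) times a finite sum, so the printed values can be checked without a statistics library. The function below is an illustrative helper, not part of the original analysis.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


global_tests = {
    "Likelihood Ratio": chi2_sf_even_df(20.7141, 4),
    "Score": chi2_sf_even_df(19.9532, 4),
    "Wald": chi2_sf_even_df(19.4124, 4),
}
hosmer_lemeshow_p = chi2_sf_even_df(3.1985, 8)
```

The small p-values of the three global tests reject the hypothesis that all coefficients are zero, while the large Hosmer-Lemeshow p-value gives no evidence of lack of fit.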