8th Iranian Workshop on Chemometrics, IASBS, 7-9 Feb 2009

QSAR/QSPR Model Development and Validation: Essential for successful application and interpretation. Mohsen Kompany-Zareh. Selwood data: 31 molecules, 53 descriptors; D (31x53), y (31x1).


Presentation Transcript


  1. 8th Iranian Workshop on Chemometrics, IASBS, 7-9 Feb 2009. QSAR/QSPR Model Development and Validation: Essential for successful application and interpretation. Mohsen Kompany-Zareh

  2. Content:

  3. Selwood data: 31 molecules, 53 descriptors. D (31x53), y (31x1)
     >> load selwood.txt;
     >> D = selwood(:,1:end-1);
     >> y = selwood(:,end);
     Diagram: D -> Model -> y

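  4. Simplest model: multiple linear regression (MLR). D b = y, so b = D+ y, where D+ is the pseudoinverse of D.
     >> b0 = D\y;
     >> yEST = D*b0;
     The model is developed, but has it been validated? 22 of the 53 coefficients in b0 are zero! With only 31 molecules for 53 descriptors the system is underdetermined, and D\y returns a basic solution with at most 31 nonzero coefficients.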

  5. Problem: a model that fits the training set very accurately can still predict validation sets poorly. Such a model is not reliable.

  6. External validation. There are many different methods for selecting the members of the training and test sets. Here the data are divided into calibration and test sets by assigning every third molecule to the test set:
     calD = [D(1:3:end,:); D(2:3:end,:)];
     valD = D(3:3:end,:);
     caly = [y(1:3:end,:); y(2:3:end,:)];
     valy = y(3:3:end,:);
     Diagram: calD and caly are used for model development; valD and valy are used for validation.
     b1 = calD\caly; % model development

  7. >> calyEST = calD*b1;
     >> valyEST = valD*b1; % external model validation
     Result: not a good prediction.

  8. >> calDr = size(calD,1); valDr = size(valD,1); % number of molecules in each set
     >> calyEST = calD*b1;
     >> rmsec1 = sqrt(((caly-calyEST)'*(caly-calyEST))/calDr) % root mean square error of calibration
     RMSEC = 2.9396e-014
     >> valyEST = valD*b1; % external model validation
     >> rmsep1 = sqrt(((valy-valyEST)'*(valy-valyEST))/valDr) % root mean square error of prediction
     RMSEP = 2.2940 (not a good prediction)
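Written as equations, the two error measures on this slide are root mean square errors over the calibration and test sets. This is only a restatement of the MATLAB expressions; the symbols n_cal and n_test (the numbers of molecules in each set) and the hat notation for estimated responses are introduced here for illustration.

```latex
\mathrm{RMSEC} = \sqrt{\frac{1}{n_{\mathrm{cal}}}\sum_{i=1}^{n_{\mathrm{cal}}}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{RMSEP} = \sqrt{\frac{1}{n_{\mathrm{test}}}\sum_{j=1}^{n_{\mathrm{test}}}\left(y_j - \hat{y}_j\right)^2}
```

An RMSEC at the level of 1e-14 is effectively zero (machine precision), so the calibration data are reproduced exactly while the test-set error remains large.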

  9. Figure: residual sum of squares (SS) for the training and test sets.

  10. Figure: total variance sum of squares (SS) for the training and test sets.

  11. Training set: R2 = 1.0000. Test set: q2 = -8.5220
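The two statistics relate the residual SS to the total SS of the preceding slides. A sketch of the usual definitions follows, under the assumption (not stated on the slides) that each total SS is taken about the mean response of the set in question:

```latex
R^2 = 1 - \frac{SS_{\mathrm{res,train}}}{SS_{\mathrm{tot,train}}},
\qquad
q^2 = 1 - \frac{SS_{\mathrm{res,test}}}{SS_{\mathrm{tot,test}}},
\qquad
SS_{\mathrm{res}} = \sum_i \left(y_i - \hat{y}_i\right)^2,
\quad
SS_{\mathrm{tot}} = \sum_i \left(y_i - \bar{y}\right)^2
```

With q2 = -8.5220, the ratio of residual SS to total SS on the test set is 9.5220, so the model predicts the test molecules far worse than simply using the mean response would.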

  12. Internal validation on the training set: cross-validation, leave-one-out (LOO)

  13. Training set

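  14. Diagram: each subsample is split into a development part and a validation part. Number of subsamples = number of molecules in the training set. The squared prediction errors are summed into cumPRESS.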

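  15. LOO CV (X is the descriptor matrix, y the response vector)
      Dr = size(X,1);                          % number of molecules
      for i = 1:Dr
          calX = [X(1:i-1,:); X(i+1:Dr,:)];    % leave molecule i out
          valX = X(i,:);
          caly = [y(1:i-1,:); y(i+1:Dr,:)];
          valy = y(i,:);
          b = (calX\caly)';                    % model without molecule i
          valyEST(i) = valX*b';                % predict the left-out molecule
          press(i) = (valyEST(i)-valy).^2;     % squared prediction error
      end
      cumpress = sum(press);
      rmsecv = sqrt(cumpress/Dr);
      q2LOO = 1-((y-valyEST')'*(y-valyEST'))/...
          ((y-mean(y))'*(y-mean(y)))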
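The quantities computed by the loop can be summarised as equations. This is a restatement of the code above, with n standing for Dr and with the notation \hat{y}_{(i)} (introduced here, not on the slide) for the prediction of molecule i from the model built without it:

```latex
\mathrm{PRESS} = \sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2,
\qquad
\mathrm{RMSECV} = \sqrt{\frac{\mathrm{PRESS}}{n}},
\qquad
q^2_{\mathrm{LOO}} = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```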

  16. q2LOO = -4.8574
      RMSECV = 2.0397
      >> q2ASYMPTOT = 1-(1-R2)*(calDr/(calDr-calDc))^2
      q2ASYMPTOT = 1.0000
      >> if q2LOO-q2ASYMPTOT<0.005, disp('reject'), end
      REJECT
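In equation form, the asymptotic check on this slide reads as follows, assuming n and p stand for calDr and calDc (taken here to be the numbers of rows and columns of the calibration matrix, which the slides do not state explicitly):

```latex
q^2_{\mathrm{ASYM}} = 1 - \left(1 - R^2\right)\left(\frac{n}{n - p}\right)^2,
\qquad
\text{reject the model if } q^2_{\mathrm{LOO}} - q^2_{\mathrm{ASYM}} < 0.005
```

Because R2 = 1.0000, the second term vanishes and q2ASYMPTOT = 1 whatever the values of n and p. The difference is then -4.8574 - 1.0000 = -5.8574, well below 0.005, so the model is rejected.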

  17. QUIK rule, illustrated with 4 correlated descriptors (the matrix M and the response y were shown on the slide)
      >> corr(M)
      >> p = size(M,2);
      >> CorrEV = svds(corr(M),p);
      It seems possible to use svd(M)

  18. >> K = sum(abs((CorrEV/sum(CorrEV))-(1/p)))/(2*(p-1)/p);
      All in a function:
      >> [KM] = QUIK(M)
      KM = 1.0000 (maximum correlation between descriptors)
      >> [KMY] = QUIK([M Y])
      KMY = 1.0000
      >> if KMY-KM<0.05, disp('reject'), else, disp('NOT reject'), end
      REJECT
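The K index computed by the MATLAB expression on this slide can be written as an equation, where \lambda_j are the values stored in CorrEV (obtained from the correlation matrix) and p is the number of columns. The symbols K_M and K_MY for the two function outputs are notation introduced here.

```latex
K = \frac{\displaystyle\sum_{j=1}^{p}\left|\frac{\lambda_j}{\sum_{k=1}^{p}\lambda_k} - \frac{1}{p}\right|}{2\,(p-1)/p},
\qquad
\text{reject the model if } K_{MY} - K_{M} < 0.05
```

K equals 0 when all \lambda_j are equal (uncorrelated columns) and 1 when a single \lambda_j carries the whole sum (perfectly correlated columns). In the slide's example K_MY - K_M = 1.0000 - 1.0000 = 0, which is below 0.05, hence the rejection.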
