In the name of God.
An Introduction to multi-way analysis. Mohsen Kompany-Zareh, IASBS, Nov 1-3, 2010. Session one.
Kronecker product • Khatri-Rao product • Multi-way data • Matricizing the data • Interaction triad • G • PARAFAC • Panel performance • Matricizing and subarray • Rank • Dimensionality vector • Rank-deficiency in three-way arrays • Tucker3 rotational freedom • Unique solution • Tucker2 model • Tucker1 model
Kronecker product (A ⊗ B)
>> A=[2 3 4; 2 3 4]
>> B=[3 4; 3 5]
>> krnAB=[A(1,1)*B A(1,2)*B A(1,3)*B ; A(2,1)*B A(2,2)*B A(2,3)*B]
krnAB =
     6     8     9    12    12    16
     6    10     9    15    12    20
     6     8     9    12    12    16
     6    10     9    15    12    20
Kronecker product with kron
>> A=[2 3 4; 2 3 4]
>> B=[3 4; 3 5]
>> p=kron(A,B)
p =
     6     8     9    12    12    16
     6    10     9    15    12    20
     6     8     9    12    12    16
     6    10     9    15    12    20
All columns in A see all columns in B.
Kronecker product, column by column
>> A=[2 3 4; 2 3 4]
>> C=[3 4 5; 3 5 2]
>> krnAC=[kron(A(:,1),C(:,1))... column 1
 kron(A(:,1),C(:,2))... column 2
 kron(A(:,1),C(:,3))... column 3
 kron(A(:,2),C(:,1))... column 4
 kron(A(:,2),C(:,2))... column 5
 kron(A(:,2),C(:,3))... column 6
 kron(A(:,3),C(:,1))... column 7
 kron(A(:,3),C(:,2))... column 8
 kron(A(:,3),C(:,3))]  % column 9
krnAC =
     6     8    10     9    12    15    12    16    20
     6    10     4     9    15     6    12    20     8
     6     8    10     9    12    15    12    16    20
     6    10     4     9    15     6    12    20     8
Columns 1, 5 and 9 (column pairs 1 1, 2 2 and 3 3) form the Khatri-Rao product.
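Kronecker product: interaction terms
>> A=[2 3 4; 2 3 4]
>> C=[3 4 5; 3 5 2]
krnAC =
     6     8    10     9    12    15    12    16    20
     6    10     4     9    15     6    12    20     8
     6     8    10     9    12    15    12    16    20
     6    10     4     9    15     6    12    20     8
Columns, left to right: vec(a1 b1), vec(a1 b2), vec(a1 b3), vec(a2 b1), vec(a2 b2), vec(a2 b3), vec(a3 b1), vec(a3 b2), vec(a3 b3).
The columns with unequal indices (a1 b2, a1 b3, a2 b1, a2 b3, a3 b1, a3 b2) are the interaction terms.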
Khatri-Rao product
>> A=[2 3 4; 2 3 4]
>> B=[3 4 5; 3 5 2]
khtrAB =
     6    12    20
     6    15     8
     6    12    20
     6    15     8
The number of columns in A should be the same as the number of columns in B.
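The column-wise construction can be sketched in a few lines of MATLAB. This is a minimal sketch rather than a library routine; the names khtrAB and krnAB follow the slides, and the loop assumes A and B have the same number of columns.

```matlab
% Khatri-Rao product as a column-wise Kronecker product (sketch)
A = [2 3 4; 2 3 4];
B = [3 4 5; 3 5 2];
R = size(A, 2);                              % must equal size(B, 2)
khtrAB = zeros(size(A, 1) * size(B, 1), R);
for r = 1:R
    khtrAB(:, r) = kron(A(:, r), B(:, r));   % r-th column of A with r-th column of B
end

% The same columns sit inside the full Kronecker product (columns 1, 5, 9)
krnAB = kron(A, B);
sameCols = isequal(khtrAB, krnAB(:, [1 5 9]));
```

The remaining six columns of kron(A,B) are the cross terms between different columns of A and B.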
Multi-way data (generalization of matrix algebra)
• A zero-order tensor: a scalar
• A first-order tensor: a vector
• A second-order tensor: a matrix. One matrix per sample => three-way data for analysis
• A third-order tensor: a three-way array. One three-way array per sample => four-way data for analysis
• A fourth-order tensor: a four-way array, and so on.
One component, HPLC-DAD a1 b1
One component, HPLC-DAD, different concentrations (elution profile) Only the intensities are changed... These 9 matrices form a TRIAD, the simplest trilinear data
A triad: X
A cube of data, 12x7x7. Third-order data for one sample.
>> a1'
    0.0033    0.0971    0.8131    1.9506    1.3406    0.2640    0.0149
>> b1'
    0.0222    1.7650    0.4060    0.8826    0.0111    0.0000    0.0000
>> c1'
     1     2     3     4     5     6     7     8     9    10    11    12
Obtained from the tensor product of the three vectors a1, b1 and c1.
% A triad by outer product
% X111 = a1 ⊗ b1 ⊗ c1
...
for l=1:length(a1)
    for m=1:length(b1)
        for n=1:length(c1)
            disp([l m n])
            Xtriad(l,m,n)=a1(l)*b1(m)*c1(n);
        end
    end
end
X=Xtriad;
....
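The triple loop can also be written without loops. The following is a minimal sketch using the vectors printed on the slide; it relies on the fact that kron(c1, kron(b1, a1)) lists the products a1(l)*b1(m)*c1(n) with l running fastest, so the resulting array is ordered as length(a1) x length(b1) x length(c1), here 7 x 7 x 12.

```matlab
% One triad without loops (sketch); a1, b1, c1 are the vectors from the slide
a1 = [0.0033 0.0971 0.8131 1.9506 1.3406 0.2640 0.0149]';
b1 = [0.0222 1.7650 0.4060 0.8826 0.0111 0.0000 0.0000]';
c1 = (1:12)';
I = length(a1); J = length(b1); K = length(c1);

% Stack all products in one long vector, then fold it into a cube
X = reshape(kron(c1, kron(b1, a1)), [I J K]);

% Compare with the element-wise definition used in the loop version
Xloop = zeros(I, J, K);
for l = 1:I
    for m = 1:J
        for n = 1:K
            Xloop(l, m, n) = a1(l) * b1(m) * c1(n);
        end
    end
end
maxDiff = max(abs(X(:) - Xloop(:)));
```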
Matricizing the data X111= Unfold3D(X111, 1) (in three directions) The first chemical component
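Unfold3D is not listed on the slides. As a rough illustration of what matricizing in three directions does, the sketch below uses a hypothetical helper, unfold3, built from permute and reshape; its column ordering is an assumption and may differ from the one used by Unfold3D.

```matlab
% Matricizing a three-way array in three directions (sketch)
% unfold3 is a hypothetical helper, not the Unfold3D function from the slide
unfold3 = @(X, mode) reshape(permute(X, [mode, setdiff(1:3, mode)]), size(X, mode), []);

X  = reshape(1:24, [4 3 2]);     % small example array with I=4, J=3, K=2
X1 = unfold3(X, 1);              % 4 x 6   (I x JK)
X2 = unfold3(X, 2);              % 3 x 8   (J x IK)
X3 = unfold3(X, 3);              % 2 x 12  (K x IJ)
```

Every element of X appears exactly once in each matricized form; only the arrangement changes.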
...and for the 2nd and the next chemical components:
X111 = a1 ⊗ b1 ⊗ c1
X222 = a2 ⊗ b2 ⊗ c2
X333 = a3 ⊗ b3 ⊗ c3
Each component is in a separate triad (no interaction):
X = X111 + X222 + X333
Trilinear: PARAFAC
Interaction triad
In the presence of interaction:
X111 = a1 ⊗ b1 ⊗ c1
X222 = a2 ⊗ b2 ⊗ c2
X121 = a1 ⊗ b2 ⊗ c1
X = X111 + X222 + X121
Non-trilinear!! Tucker
G: how many interaction triads?
For two components in three modes:
X111 = a1 ⊗ b1 ⊗ c1    G(111) = 2
X112 = a1 ⊗ b1 ⊗ c2    G(112) = 0
X121 = a1 ⊗ b2 ⊗ c1    G(121) = 1
X122 = a1 ⊗ b2 ⊗ c2    G(122) = 0
X211 = a2 ⊗ b1 ⊗ c1    G(211) = 0
X212 = a2 ⊗ b1 ⊗ c2    G(212) = 0
X221 = a2 ⊗ b2 ⊗ c1    G(221) = 0
X222 = a2 ⊗ b2 ⊗ c2    G(222) = -3
Of the 8 triads, 6 are possible interaction triads (all except X111 and X222). Here only 1 interaction triad (X121) is present.
Tucker3 with core G(2x2x2): A(11x2), B(100x2), C(3x2). Nonzero core elements: G(111) = 2, G(222) = -3, G(121) = 1.
For three components in three modes: (3 × 3 × 3) - 3 = 24 possible interactions
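The count can be checked by enumerating all index triples and discarding the pure triads (111, 222, 333). This is only an illustrative sketch; the variable names are made up for the example.

```matlab
% Count possible interaction triads for n components in three modes (sketch)
n = 3;
[p, q, r] = ndgrid(1:n, 1:n, 1:n);                % all index triples (p,q,r)
isPure = (p(:) == q(:)) & (q(:) == r(:));         % pure triads: 111, 222, 333
nInteractions = sum(~isPure);                     % n^3 - n
```

Setting n = 2 gives the 6 possible interaction triads of the two-component case.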
G(?x?x?) with A(15x4), B(100x3), C(20x2): how many G elements?
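% Tucker3 by outer products
% A (I x 4), B (J x 3) and C (K x 2) are assumed to exist already
G=rand(4,3,2);
X=zeros(size(A,1),size(B,1),size(C,1));
Xtriad=zeros(size(X));
for p=1:size(G,1)
    for q=1:size(G,2)
        for r=1:size(G,3)
            % one triad: column p of A, column q of B, column r of C
            for i=1:size(A,1)
                for j=1:size(B,1)
                    for k=1:size(C,1)
                        Xtriad(i,j,k)=A(i,p)*B(j,q)*C(k,r)*G(p,q,r);
                    end
                end
            end
            X=X+Xtriad;
        end
    end
end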
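The same Tucker3 array can be built in matricized form, X(I x JK) = A G(P x QR) (C ⊗ B)', which ties the model back to the Kronecker product introduced earlier. The sketch below uses small made-up sizes and random factors purely for illustration, and checks one element against the explicit sum over all triads.

```matlab
% Tucker3 reconstruction via the matricized model (sketch, illustrative sizes)
I = 5; J = 4; K = 3;                     % array dimensions
P = 4; Q = 3; R = 2;                     % number of components per mode
A = rand(I, P); B = rand(J, Q); C = rand(K, R);
G = rand(P, Q, R);                       % core array

G1 = reshape(G, P, Q * R);               % core unfolded to P x QR
X1 = A * G1 * kron(C, B)';               % I x JK
X  = reshape(X1, [I J K]);               % fold back into a cube

% Check one element against the sum over all P*Q*R triads
ii = 2; jj = 3; kk = 1; s = 0;
for p = 1:P
    for q = 1:Q
        for r = 1:R
            s = s + A(ii, p) * B(jj, q) * C(kk, r) * G(p, q, r);
        end
    end
end
err = abs(X(ii, jj, kk) - s);
```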
What about Tucker4?
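% PARAFAC by outer products
% A (I x 3), B (J x 3) and C (K x 3) are assumed to exist already
G=zeros(3,3,3);
G(1,1,1)=1;G(2,2,2)=1;G(3,3,3)=1;   % superdiagonal core: no interactions
X=zeros(size(A,1),size(B,1),size(C,1));
Xtriad=zeros(size(X));
for p=1:size(G,1)
    for q=1:size(G,2)
        for r=1:size(G,3)
            % one triad: column p of A, column q of B, column r of C
            for i=1:size(A,1)
                for j=1:size(B,1)
                    for k=1:size(C,1)
                        Xtriad(i,j,k)=A(i,p)*B(j,q)*C(k,r)*G(p,q,r);
                    end
                end
            end
            X=X+Xtriad;
        end
    end
end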
PARAFAC: A(15x3), B(100x3), C(20x3). Simple interpretation.
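Because the PARAFAC core is superdiagonal, the matricized model only needs the Khatri-Rao product: X(I x JK) = A (C ⊙ B)', where ⊙ denotes the column-wise Kronecker product. The sketch below uses small made-up sizes and random loadings, and compares the result with an explicit sum of triads.

```matlab
% PARAFAC reconstruction with the Khatri-Rao product (sketch, illustrative sizes)
I = 5; J = 4; K = 3; F = 3;              % F = number of components
A = rand(I, F); B = rand(J, F); C = rand(K, F);

CB = zeros(K * J, F);                    % Khatri-Rao product of C and B
for f = 1:F
    CB(:, f) = kron(C(:, f), B(:, f));
end
X1 = A * CB';                            % I x JK
X  = reshape(X1, [I J K]);

% The same array as a sum of F triads a_f, b_f, c_f
Xsum = zeros(I, J, K);
for f = 1:F
    Xsum = Xsum + reshape(kron(C(:, f), kron(B(:, f), A(:, f))), [I J K]);
end
err = max(abs(X(:) - Xsum(:)));
```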
Monitoring panel performance within and between experiments by multi-way models. Rosaria Romano and Mohsen Kompany-Zareh, Copenhagen Univ, 2007
Organic Milk of High Quality: sensory studies, 2007, University of Copenhagen
Two different experiments were conducted in 2007:
- Spring experiment (May, weeks 21 and 22)
- Autumn experiment (September, weeks 36 and 37)
The objective is to establish knowledge about production of high-quality organic milk with a composition and flavour different from conventionally produced milk.
Spring experiment data
Data description:
• 7 varieties of milk with respect to:
  - 2 cow races: Holstein-Friesian (HF), Jersey (JE);
  - 7 farms: WB, EMC, UGJ, JP, HM, OA, KI.
• panel: 9 assessors, 2 sessions (focus on the second!), 3 replicates for each session.
• 12 descriptors: odor (green), appearance (yellow), flavor (creamy, boiled-milk, sweet, bitter, metallic, sourness, stald-feed), aftertaste (astringent0, fatness, astringent20).
• measurement scale: continuous scale anchored at 0 and 15.
PARAFAC on the spring experiment (1)
Model: PARAFAC with two components (27.9% ExpVar), on data averaged across the samples mode. Groups shown: JE and HF.
• high reproducibility of the replicates in both groups;
• big variation in the JE group:
  - WB is the least yellow JE milk;
  - UGJ seems to have something in common with the HF group.
Parafac on the spring experiment(2) Model: Parafac with two components (27.9% ExpVar), on data averaged across the samples mode Best Reliability on Multi-way Assessment (Bro and Romano, 2008)
Rank
• A (I × J) has full rank if and only if r(A) = min(I, J).
• If r(A) = R [Schott 1997], then A = t1p1' + ... + tRpR', a sum of R rank-one matrices (the trpr' are the components).
• The bases are not unique: rotational freedom, intensity (or scale) indeterminacy, sign indeterminacy.
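A small numerical illustration of both points, using made-up score and loading matrices T and P: the product T*P' has rank R, and any nonsingular matrix Q gives a different pair of bases that reproduces exactly the same A.

```matlab
% Rank-R bilinear decomposition and rotational freedom (sketch, made-up numbers)
T = [1 0; 0 1; 1 1; 2 1];            % scores, I x R with R = 2
P = [1 2; 0 1; 1 0];                 % loadings, J x R
A = T * P';                          % sum of R rank-one matrices t_r * p_r'
rA = rank(A);                        % equals R = 2

Q  = [2 1; 1 1];                     % any nonsingular R x R matrix
T2 = T * Q;                          % rotated scores
P2 = P * inv(Q)';                    % counter-rotated loadings
err = max(max(abs(T2 * P2' - A)));   % same A from different bases
```

Scale and sign indeterminacy are the special cases where Q is diagonal.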
• If X (I × J) is generated with I × J random numbers, the probability that X has less than full rank is zero.
• => Measured data sets in chemistry always have full mathematical rank, because of measurement noise.
• Example: UV spectra (100 wavelengths) of ten different samples, each containing the same absorbing species at a different concentration: X (10 × 100).
• If the Lambert-Beer law holds, X has rank one. With measurement errors added, the mathematical rank is ten.
X = cs' + E = Xhat + E (model of X)
• vector c: the concentrations
• s: the pure UV spectrum of the absorbing species
• E: the noise part
• Two parts: 1. systematic variation (Xhat); 2. noise (undesirable).
• pseudo-rank = mathematical rank of Xhat = one < mathematical rank of X.
• 'Chemical rank': the number of chemical sources of variation in the data.
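The UV example can be simulated in a few lines. The sketch below uses a made-up Gaussian band for s, concentrations 1 to 10 for c and an arbitrary noise level; all of these are assumptions chosen only to make the rank behaviour visible.

```matlab
% Pseudo-rank versus mathematical rank for X = c*s' + E (sketch, simulated data)
c = (1:10)';                               % concentrations of the ten samples
s = exp(-((1:100)' - 50).^2 / 200);        % made-up pure spectrum, 100 wavelengths
Xhat = c * s';                             % systematic part, rank one
E = 1e-3 * randn(10, 100);                 % measurement noise (arbitrary level)
X = Xhat + E;

rankXhat = rank(Xhat);                     % pseudo-rank
rankX    = rank(X);                        % mathematical rank
sv = svd(X);                               % one large singular value, nine small ones
```

The singular values show the gap between the one systematic component and the nine noise components.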
Rank deficiency: pseudo-rank < chemical rank (linear relations in, or restrictions on, the data).
Example: X = c1s1' + c2s2' + c3s3' + E with s1 = s2 (linear relation)
=> X = (c1 + c2)s1' + c3s3' + E
Chemical rank (X) = 3, pseudo-rank (X) = 2: rank deficient.
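A noise-free numerical version of this example, with made-up concentration profiles and spectra: three chemical components, but two of them share the same spectrum, so the systematic part has rank two.

```matlab
% Rank deficiency: three species, two identical spectra (sketch, made-up numbers)
C  = [1 0 2; 0 1 1; 1 1 0; 2 1 1; 1 3 2];   % columns c1, c2, c3 (5 samples)
s1 = [1 2 3 4]';
s3 = [1 0 1 0]';
S  = [s1 s1 s3];                            % s2 = s1: the linear relation
Xhat = C * S';                              % c1*s1' + c2*s2' + c3*s3'

chemicalRank = rank(C);                     % three independent concentration profiles
pseudoRank   = rank(Xhat);                  % only two after the spectra coincide
```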
A randomly generated 2 × 2 × 2 array has a positive probability of having a rank lower than three [Kruskal 1989]:
• probability 0.79 of obtaining a rank-two array;
• probability 0.21 of obtaining a rank-three array;
• the probability of obtaining rank one or lower is zero.
Generalized to 2 × n × n arrays [Ten Berge 1991].
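These probabilities can be approximated by simulation. The sketch below assumes standard normal entries and uses a criterion that is not stated on the slide: for a generic real 2 × 2 × 2 array with frontal slices X1 and X2, the rank is two when inv(X1)*X2 has real, distinct eigenvalues and three when its eigenvalues are complex.

```matlab
% Monte Carlo estimate of the rank of random 2 x 2 x 2 arrays (sketch)
% Assumption: entries are standard normal; rank is decided from inv(X1)*X2
N = 20000;
nRank2 = 0;
for t = 1:N
    X1 = randn(2);  X2 = randn(2);          % the two frontal slices
    M = X1 \ X2;                            % inv(X1) * X2
    disc = trace(M)^2 - 4 * det(M);         % discriminant of the characteristic polynomial
    if disc > 0                             % real, distinct eigenvalues: rank two
        nRank2 = nRank2 + 1;
    end
end
p2 = nRank2 / N;                            % estimated probability of rank two
p3 = 1 - p2;                                % estimated probability of rank three
```

With other entry distributions the estimated proportions can differ.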
2 × 2 × 2 array:
• maximum rank: three
• typical rank: {2, 3} (the ranks that almost all such arrays have)
• individual rank: very hard to establish.
Three-way rank is important in second-order calibration and curve resolution, and for the degrees of freedom used in significance testing.
Matricizing and sub-arrays: X(4 × 3 × 2). Boldface entries lie in the foremost frontal slice.
Dimensionality vector: row-rank, column-rank, tube-rank
• For a two-way X: rank(X) = rank(X'), so column rank = row rank.
• This does not hold for three-way arrays.
• A three-way array X(I × J × K) can be matricized in three different ways:
  (i) row-wise, giving X(J × IK), a two-way array;
  (ii) column-wise, giving X(I × JK);
  (iii) tube-wise, giving X(K × IJ).
  Three more matricized forms exist with the same ranks; they are not listed here.
• The ranks of the arrays X(J × IK), X(I × JK) and X(K × IJ) are (P, Q, R): the dimensionality vector of X.
P, Q and R are not necessarily equal. In contrast, for a two-way array P = Q = r(X).
The dimensionality vector (P, Q, R) of a three-way array X with rank S obeys certain inequalities [Kruskal 1989]:
(i) P ≤ QR; Q ≤ PR; R ≤ PQ
(ii) max(P, Q, R) ≤ S ≤ min(PQ, QR, PR)
Three matricized forms: these arrays have ranks 4, 3 and 2, so the dimensionality vector is [4 3 2]. P, Q and R can be unequal.
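The numbers of the 4 × 3 × 2 array on the slide are not reproduced here, so the sketch below uses a made-up array of the same size to show how the three ranks are computed from the matricized forms.

```matlab
% Dimensionality vector of a 4 x 3 x 2 array (sketch, made-up entries)
X = zeros(4, 3, 2);
X(:, :, 1) = [1 0 0; 0 1 0; 0 0 1; 1 1 1];       % first frontal slice
X(4, 1, 2) = 1;                                  % second frontal slice, one nonzero

rI = rank(reshape(X, 4, []));                    % rank of X(I x JK)
rJ = rank(reshape(permute(X, [2 1 3]), 3, []));  % rank of X(J x IK)
rK = rank(reshape(permute(X, [3 1 2]), 2, []));  % rank of X(K x IJ)
dimvec = [rI rJ rK];                             % [4 3 2] for this array
```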
Pseudo-rank, rank deficiency and chemical sources of variation
The pseudo-rank of three-way arrays is a straightforward generalization of the two-way definition.
X = Xhat + E, where E is the array of residuals.
Pseudo-rank of X = the minimum number of PARAFAC components necessary to fit Xhat exactly.
Rank-deficiency in three-way arrays
Spectrophotometric acid-base titration of mixtures of three weak monoprotic acids (or flow injection analysis with a pH gradient):
HA2 ⇌ H+ + A2-
HA3 ⇌ H+ + A3-
HA4 ⇌ H+ + A4-
Six components. Models of the separate titrations of the three analytes (HA2, HA3, HA4):
XHA2 = ca,2 sa,2' + cb,2 sb,2' + EHA2
XHA3 = ca,3 sa,3' + cb,3 sb,3' + EHA3
XHA4 = ca,4 sa,4' + cb,4 sb,4' + EHA4
10 samples, 15 titration points and 20 wavelengths => X(10 × 15 × 20)
X = Xhat + E
• ca,2 + cb,2 = α(ca,3 + cb,3) = β(ca,4 + cb,4)
• => only four independently varying concentration profiles: pseudo-rank (X(J × IK)) = four.
• pseudo-rank (X(I × JK)) = three.
• There are six different ultraviolet spectra: pseudo-rank (X(K × IJ)) = six.
• ==>> a Tucker3 (6,4,3) model is needed to fit X.
The core then has 3 × 6 × 4 = 72 nonzero elements!!
Inequality laws:
(i) P ≤ QR; Q ≤ PR; R ≤ PQ
(ii) max(3, 6, 4) ≤ S ≤ min(PQ, QR, PR), that is, 6 ≤ S ≤ 12
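The arithmetic behind these bounds, written out as a short check with (P, Q, R) = (3, 6, 4); the variable names are chosen only for this illustration.

```matlab
% Bounds on the three-way rank S from the dimensionality vector (sketch)
P = 3; Q = 6; R = 4;
okInequalities = (P <= Q * R) && (Q <= P * R) && (R <= P * Q);   % law (i)
lowerS = max([P Q R]);                    % law (ii), lower bound: 6
upperS = min([P * Q, Q * R, P * R]);      % law (ii), upper bound: min(18, 24, 12) = 12
nCore  = P * Q * R;                       % number of Tucker3 core elements: 72
```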
The three-way rank of X is ≥ 6 (six PARAFAC components fit the data).
The pseudo-rank (S = 6) is not less than the chemical rank (6) => no three-way rank deficiency.
Rank deficiencies in one loading matrix of a three-way array are not the same as a three-way rank deficiency.