MixeR Package for CDA: graphical display of three- and four-part (sub)compositions

Matevž Bren (1,2), Vladimir Batagelj (2,3)
(1) University of Maribor, Slovenia, matevz.bren@fov.uni-mb.si
(2) Institute of Mathematics, Physics and Mechanics, Slovenia
(3) University of Ljubljana, Slovenia
IAMG 2005, August 21-26, Toronto, Canada

Presentation Transcript


  1. MixeR Package for CDA: graphical display of three- and four-part (sub)compositions

     Matevž Bren and Vladimir Batagelj

  2. Introduction

     The groundwork for Compositional Data Analysis (CDA) is John Aitchison's 1986 book The Statistical Analysis of Compositional Data. From the book we quote: “The properties of many substances or objects, such as gasoline, metal alloys and cakes, depend on the particular mixture, or composition, of their ingredients. The purpose of the experiments with different mixtures is to obtain some understanding of the nature and extent of the dependence of the properties on the composition. In the analysis of such experiments the composition is confined to the role of a covariate.”

  3. Introduction…

     Example 1: Glacial data set, from Aitchison (1986). 92 samples of pebbles of glacial tills sorted into four categories: red sandstone, gray sandstone, crystalline and miscellaneous. The percentages by weight of these four categories and the total pebble counts are recorded.

             RedSandstone GraySandstone Crystalline Misc Counts
         1           91.8           7.1         1.1  0.0    282
         2           88.9          10.1         0.5  0.5    368
         ...
         90          15.9          83.3         0.8  0.0    245
         91          16.9          74.3         1.2  5.9    575
         92          31.4          65.9         2.7  0.0    698

     “The glaciologist is interested in describing the pattern of variability of his data and whether the compositions are in any way related to abundance.”

  4. Introduction…

     Compositions (compounds, mixtures, alloys…) can be represented by vectors of the proportions of their individual components. The proportions are nonnegative and have a constant sum, equal to 100 (percentages) or 1 (proportions). The sample space for compositions is the (unit) simplex S^D.
     • For D=3 it is represented graphically by a ternary diagram.
     • For D=4 it is represented graphically by a tetrahedron.

  5. Introduction…

     Left: three-part compositions x = (x1, x2, x3), with x1 + x2 + x3 = 1, in a ternary diagram.
     Right: four-part compositions x = (x1, x2, x3, x4), with x1 + x2 + x3 + x4 = 1, in a tetrahedron.

  6. Introduction…

     R, available at http://www.r-project.org, is 'GNU S': a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering...). Further extensions can be provided as packages.

  7. Introduction…

     In 2003 we started the MixeR project, a library of R functions to support CDA, i.e. the statistical analysis of mixtures:
     • operations on compositions: perturbation and power multiplication, subcomposition with or without residuals, computing Aitchison's, Euclidean and Bhattacharyya distances, the compositional Kullback-Leibler divergence etc.
     • graphical presentation of three- and four-part (sub)compositions in ternary diagrams and tetrahedrons, with additional features: barycentre, geometric mean of the data set, percentile and ratio lines, marking and colouring of subsets of the data set, centring of the data, notation of individual data in the set etc.
     • logratio transformations of compositions into real vectors that are amenable to standard multivariate statistical analysis etc.
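The operations named on this slide have standard textbook definitions, which can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper names (`closure`, `perturb`, `power_mult`, `clr`, `aitchison_dist`) are hypothetical and are not the MixeR function names, which the slides do not list for these operations.

```r
# Illustrative base-R helpers; hypothetical names, NOT the MixeR API.
closure    <- function(x) x / sum(x)              # rescale parts so they sum to 1
perturb    <- function(x, y) closure(x * y)       # perturbation of x by y
power_mult <- function(x, a) closure(x^a)         # power multiplication by scalar a
clr        <- function(x) log(x) - mean(log(x))   # centred logratio transform

# Aitchison distance = Euclidean distance between clr-transformed compositions
aitchison_dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

x <- c(0.2, 0.3, 0.5)
y <- c(0.1, 0.6, 0.3)

perturb(x, y)          # again a composition with unit sum
power_mult(x, 2)
aitchison_dist(x, y)
```

All of these helpers require strictly positive parts, because they take logarithms or ratios; compositions containing zeros need separate treatment.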

  8. Compositional Data Analysis SW tools
     • CoDa, 1986, by John Aitchison: written in Quick Basic, available with Aitchison's book
     • CoDa upgraded by John Bacon-Shone
     • CoDaPack, 2001: freeware SW in Excel by Santiago Thió and Raimon Tolosana, http://ima.udg.es/Recerca/EIO/inici_cat.html
     • attempts in R by Joel Raynolds and Dean Billheimer at http://www.biostat.wustl.edu/archives/html/s-news/2003-12/msg00139.html

  9. Compositional data analysis SW tools…
     • MixeR, 2003, by Batagelj and Bren at http://vlado.fmf.uni-lj.si/pub/MixeR
     • ‘compositions’ package, 2005, by K. Gerald van den Boogaart and Raimon Tolosana Delgado at http://cran.r-project.org/src/contrib/Descriptions/compositions.html

  10. Mixture class in R

      The input mixture data object m consists of
      • m$tit: the title,
      • m$mat: the data matrix,
      • m$sum: the value of the row sums, if constant, and
      • m$sta: the status of the mix object, with values
          -2: the matrix contains negative elements
          -1: a zero row sum exists
           0: the matrix contains zero elements
           1: the matrix contains positive elements, rows with different row sums
           2: matrix with constant row sum
           3: normalized mixture, the row sums are equal to 1
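The status codes above can be read as a sequence of checks on the data matrix. The sketch below is one plausible reading, not the MixeR source: the function name `mix_status`, the order of the checks, and the use of a tolerance `eps` for "constant row sum" are all assumptions.

```r
# Hypothetical re-implementation of the status codes; NOT the MixeR code.
# Assumption: checks run from the most severe problem to the least severe.
mix_status <- function(mat, eps = 1e-6) {
  rs <- rowSums(mat)
  if (any(mat < 0))  return(-2)              # negative elements
  if (any(rs == 0))  return(-1)              # a zero row sum exists
  if (any(mat == 0)) return(0)               # zero elements
  if (max(rs) - min(rs) >= eps) return(1)    # positive, differing row sums
  if (abs(rs[1] - 1) < eps) return(3)        # normalized: row sums equal 1
  2                                          # constant row sum other than 1
}

mix_status(rbind(c(0.5, 0.5), c(0.2, 0.8)))  # normalized
mix_status(rbind(c(50, 50), c(20, 80)))      # constant row sum 100
mix_status(rbind(c(91.8, 7.1, 1.1, 0.0)))    # contains a zero element
```

Under this ordering a matrix with zero parts reports status 0 regardless of its row sums, which agrees with the glacial example on the next slide (status 0 and `$sum` equal to NA).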

  11. Mixture class in R…

      Example 1: The glacial mixture object

          > m <- mix.Read('glacial.dat')
          $tit
          [1] "GLACIAL DATA 92 samples of pebbles of glacial tills sorted into four categories percentages by weight"
          $sum
          [1] NA
          $sta
          [1] 0
          $mat
             RedSandstone GraySandstone Crystalline Misc
          1          91.8           7.1         1.1  0.0
          2          88.9          10.1         0.5  0.5
          ...
          91         16.9          74.3         1.2  5.9
          92         31.4          65.9         2.7  0.0
          attr(,"class")
          [1] "mixture"

  12. The 'mix' procedures in R
      • mix.Read(file, eps=1e-6): reads mix data from the file and returns a mix object. If |m$sum - 1| < eps it sets m$sta = 3.
      • mix.Check(m, eps=1e-6): determines the m$sum and m$sta of a given mixture object m.
      • mix.Normalize(m, c=1): normalizes a given mixture object m if m$sta > 0. The row sums are normalized to the constant c, with default value c=1.
      • mix.Random(nr, nc, s=1): constructs a random mix object with nr rows and nc columns and constant row sum s.
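Normalizing rows to a constant and generating random compositions can be sketched on plain matrices as follows. The names `normalize_rows` and `random_rows` are hypothetical, and drawing uniform random numbers before rescaling is an assumption; the slides do not say which distribution `mix.Random` uses.

```r
# Hypothetical helpers on plain matrices; NOT the MixeR functions.
normalize_rows <- function(mat, const = 1) {
  # rowSums(mat) is recycled down the columns, so every row is divided
  # by its own sum and then scaled to the requested constant
  const * mat / rowSums(mat)
}

random_rows <- function(nr, nc, s = 1) {
  # Assumption: uniform draws followed by rescaling to row sum s
  raw <- matrix(runif(nr * nc), nrow = nr, ncol = nc)
  normalize_rows(raw, const = s)
}

set.seed(1)
r   <- random_rows(22, 3)               # 22 three-part compositions, row sum 1
pct <- normalize_rows(r, const = 100)   # the same data as percentages
```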

  13. The 'mix' procedures in R…

      Subcompositions of mixture objects
      • mix.Sub(m, k, Normalize=TRUE): subcomposition of m without the k=(k_1,...,k_r) columns, normalized if Normalize=TRUE.
      • mix.Extract(m, k, Normalize=TRUE): subcomposition of m with only the k=(k_1,...,k_r) columns, normalized if Normalize=TRUE.
      • mix.ExtractRes(m, k): subcomposition with the k=(k_1,...,k_r) columns; all the rest is amalgamated into the residual. The output is a normalized mixture object with r+1 columns.
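The three kinds of subcomposition described above can be illustrated on a plain matrix. The function names below are hypothetical stand-ins, not the MixeR routines, and they operate on a matrix rather than on a mix object.

```r
# Hypothetical stand-ins for the three subcomposition routines.
sub_drop <- function(mat, k, normalize = TRUE) {
  out <- mat[, -k, drop = FALSE]              # remove the k columns
  if (normalize) out / rowSums(out) else out
}

sub_keep <- function(mat, k, normalize = TRUE) {
  out <- mat[, k, drop = FALSE]               # keep only the k columns
  if (normalize) out / rowSums(out) else out
}

sub_keep_res <- function(mat, k) {
  kept <- mat[, k, drop = FALSE]
  res  <- rowSums(mat[, -k, drop = FALSE])    # amalgamate everything else
  out  <- cbind(kept, Residual = res)
  out / rowSums(out)                          # always normalized, r + 1 columns
}

# Rows 2 and 91 of the glacial data shown on the earlier slides
glac <- rbind(c(88.9, 10.1, 0.5, 0.5),
              c(16.9, 74.3, 1.2, 5.9))
colnames(glac) <- c("RedSandstone", "GraySandstone", "Crystalline", "Misc")

sub_drop(glac, 4)           # three-part subcomposition without Misc
sub_keep_res(glac, 1:2)     # two sandstones plus an amalgamated residual
```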

  14. The 'mix' procedures in R…

      Visualization in a ternary diagram: routine

          mix.Ternary(m, dist, distG, cls, Center, Borders, Gmean)

      Draws a ternary diagram of the mixture data m, with optional additional features: centring, border percentile lines and the geometric mean of the data. The default value for Center, Borders and Gmean is FALSE.
      • dist: additional distances to the numbers marking the percentile lines,
      • distG: additional distances to the numbers marking the percentile lines of the geometric mean,
      • cls: colours of the percentile lines.

  15. The 'mix' procedures in R…

      LEFT: the three-part subcomposition with its geometric mean.

          > mix.Ternary(mix.Sub(m,4), Gmean=T)

      RIGHT: the same subcomposition centred for better visualization of the differences between cases, with border percentile lines showing the actual variation.

          > mix.Ternary(mix.Sub(m,4), Borders=T, Center=T)
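The geometry behind a ternary plot and behind centring can be sketched independently of the package. The block below assumes the vertex layout stated on the percentile-lines slide (vertex 1 = top, 2 = right, 3 = left) and uses the usual CDA centring, i.e. perturbing the data by the inverse of its geometric-mean centre; whether `mix.Ternary` does exactly this is an assumption, and all function names are hypothetical.

```r
# Hypothetical helpers; NOT the MixeR implementation.

# Map three-part compositions (rows) to 2D points in a unit-side triangle
# with vertex 1 at the top, vertex 2 at the right, vertex 3 at the left.
ternary_xy <- function(mat) {
  p <- mat / rowSums(mat)
  cbind(x = 0.5 * p[, 1] + p[, 2],
        y = sqrt(3) / 2 * p[, 1])
}

# Closed geometric mean of the columns (the compositional centre)
geo_centre <- function(mat) {
  g <- exp(colMeans(log(mat)))
  g / sum(g)
}

# Centring: divide each part by the centre, then re-close every row,
# which moves the centre of the data to the barycentre of the triangle
centre_data <- function(mat) {
  out <- sweep(mat, 2, geo_centre(mat), "/")
  out / rowSums(out)
}

dat <- rbind(c(0.90, 0.08, 0.02),
             c(0.85, 0.10, 0.05),
             c(0.80, 0.17, 0.03))
ternary_xy(dat)                 # points crowded near the top vertex
ternary_xy(centre_data(dat))    # the same points spread around the middle
```

The coordinates can be passed to `plot()` or `points()` to draw the data; centring requires strictly positive parts because it uses logarithms.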

  16. The 'mix' procedures in R…

      Visualization in a tetrahedron: routine

          mix.Q2kin(fkin, m)

      transforms the quadray coordinates of a four-part mixture m into 3-dimensional XYZ coordinates and writes them to a .kin file. The kin file is displayed as a 3D animation with the MAGE viewer, free software available at http://kinemage.biochem.duke.edu/software/software1.html/#mage

  17. The 'mix' procedures in R…

      Snapshots of glac.kin: 3D MAGE view of the tetrahedral display of the glacial data (four-part compositions).

          > mix.Q2kin("glac.kin", m)
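The coordinate transformation behind the tetrahedral display can be sketched as a weighted average of four vertex positions. The vertex placement below (alternate corners of a cube centred at the origin) is an assumption chosen for convenience, the name `quad_to_xyz` is hypothetical, and the kinemage file format written by `mix.Q2kin` is not reproduced here.

```r
# Hypothetical sketch: four-part compositions to 3D points in a regular
# tetrahedron. The vertex positions are an assumption, not taken from MixeR.
quad_to_xyz <- function(mat) {
  V <- rbind(c( 1,  1,  1),    # vertex for part 1
             c( 1, -1, -1),    # vertex for part 2
             c(-1,  1, -1),    # vertex for part 3
             c(-1, -1,  1))    # vertex for part 4
  p   <- mat / rowSums(mat)    # close each row to unit sum
  xyz <- p %*% V               # barycentric combination of the vertices
  colnames(xyz) <- c("X", "Y", "Z")
  xyz
}

# Rows 2 and 91 of the glacial data
glac <- rbind(c(88.9, 10.1, 0.5, 0.5),
              c(16.9, 74.3, 1.2, 5.9))
quad_to_xyz(glac)
```

With this choice a pure single-part composition lands on its vertex and the equal-parts composition lands at the origin.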

  18. The 'mix' procedures in R…

      Percentile lines: routine

          percentile.lines(y, direction, cls, dist, lt)

      draws percentile lines into an already drawn ternary diagram.
      • y: percents or proportions for the percentile lines.
      • direction: directions for the percentile lines: value 1, percentile lines to vertex No. 1 = top; value 2, to vertex No. 2 = right; value 3, to vertex No. 3 = left. The default value is direction = 1:3 (all directions).
      • cls: the vector of colours, first for percentile lines to vertex No. 1, second … The default value is cls = c("yellow", "yellow2", "yellow3").
      • dist: additional distances to the numbers marking the percentile lines, first for perc. lines to vertex No. 1 … The default value is dist = c(0.05, 0.05, 0.05).
      • lt: the vector of line types (values 1, 2, ..., 10), first for … The default value is lt = c(4,3,2).
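One way to read "percentile line to vertex No. i" is as the set of compositions whose i-th part equals a fixed proportion q; in the triangle that set is a straight segment parallel to the side opposite vertex i. The sketch below computes the end points of such a segment under that reading and under the vertex layout given above (1 = top, 2 = right, 3 = left). The function name is hypothetical and this is not the MixeR code.

```r
# Hypothetical helper: end points (in 2D plot coordinates) of the line
# on which part number `direction` equals the proportion q.
percentile_segment <- function(q, direction = 1) {
  others <- setdiff(1:3, direction)
  a <- numeric(3)
  b <- numeric(3)
  a[direction] <- q; a[others[1]] <- 1 - q   # remainder all in one other part
  b[direction] <- q; b[others[2]] <- 1 - q   # remainder all in the third part
  # same triangle layout: vertex 1 top, vertex 2 right, vertex 3 left
  to_xy <- function(p) c(x = 0.5 * p[1] + p[2], y = sqrt(3) / 2 * p[1])
  rbind(to_xy(a), to_xy(b))
}

seg <- percentile_segment(0.5, direction = 1)   # the 50% line for part 1
seg
```

On an existing plot the segment could be drawn with `segments(seg[1, "x"], seg[1, "y"], seg[2, "x"], seg[2, "y"])`.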

  19. The 'mix' procedures in R…

      Example 2: mix object m with nine cases and three variables, i.e. a 9x3 matrix with the values 0.1 to 0.9 in the first column and the ratio between the second and third columns equal to ½.

          $tit
          [1] "Deciles values in the first column"
          $sum
          [1] 1
          $sta
          [1] 3
          $mat
             aa         bb         cc
          1 0.1 0.30000000 0.60000000
          2 0.2 0.26666670 0.53333330
          3 0.3 0.23333330 0.46666670
          ...
          9 0.9 0.03333333 0.06666667
          attr(,"class")
          [1] "mixture"
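The data matrix of Example 2 follows directly from its description: the first column holds the deciles, and the remainder of each row is split between the second and third columns in the ratio 1:2. A minimal base-R construction of the matrix alone (not of the full mix object, whose constructor the slides do not show) could look like this:

```r
# Rebuild the 9 x 3 matrix of Example 2 from its description.
aa   <- (1:9) / 10           # deciles 0.1, ..., 0.9 in the first column
rest <- 1 - aa               # what is left for the other two parts
mat  <- cbind(aa = aa,
              bb = rest / 3,       # one third of the remainder
              cc = 2 * rest / 3)   # two thirds, so bb / cc = 1/2
mat
```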

  20. The 'mix' procedures in R…

      We draw a ternary diagram with these nine points in different colours (cls), shapes (pch) and size cex=1:

          > cls <- c("khaki", "pink", "sienna", "tan", ...,"purple" )
          > mix.Ternary(m, col=cls, pch=0:8, cex=1)
          > perc.lines(10*1:9,dir=1, cls="cyan", lt=1)

      Example 3

          > mix.Ternary(mix.Random(22,3))
          > perc.lines(10*1:9, cls=c("blue", "blueviolet", "violet"))

  21. The 'mix' procedures in R…

      LEFT: three-part compositions with decile values in the first variable and a constant ratio of ½ between the second and the third variable (simulated data), with decile lines in the first direction.
      RIGHT: ternary diagram with 22 random points and decile lines in all three directions.

  22. Conclusions

      We have demonstrated some mix routines and features for the visualization of three- and four-part (sub)compositions, available at http://vlado.fmf.uni-lj.si/pub/MixeR

      Enabling complementary use of the ‘compositions’ package and the MixeR routines would be a most welcome step. Our future work is therefore to code transformation routines from the mix object to objects of the five different classes implemented in the ‘compositions’ package (rplus, rcomp, acomp, aplus and mult) and, of course, transformations from the four classes to mix objects. With these routines we hope to enable users to benefit from both the ‘compositions’ package and the MixeR library routines.
