
Menu GCDkit


pembroke

Presentation Transcript


  1. Menu GCDkit

  2. GCDkit I. Loads an ASCII file or imports the clipboard (e.g. data copied from Excel) • Data are separated by tabs, commas or semicolons • The 1st line contains unique labels for the data columns (e.g. ‘SiO2’, ‘Fe2O3’, ‘Rb’, ‘Nd’), the 1st column unique sample IDs • Decimal commas are converted to decimal points if appropriate • Missing values are allowed anywhere in the data file; values ≤ 0 and any of ‘NA’, ‘N.A.’, ‘-’, ‘b.d.’, ‘bd’ are also interpreted as missing
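The import rules above can be approximated in base R. This is only a sketch under stated assumptions, not GCDkit's own loader: the sample data in `raw` are invented for illustration, and the handling of values ≤ 0 is done in a separate step after reading.

```r
# Invented example data: semicolon-separated, decimal commas, mixed missing-value codes
raw <- "Sample;SiO2;Rb;Nd\nBl-1;65,2;120;b.d.\nKoz-1;48,7;-;35\nKoz-2;52,1;0;NA"

# Read with the first line as column labels and the first column as sample IDs
dat <- read.table(text = raw, sep = ";", dec = ",", header = TRUE,
                  row.names = 1,
                  na.strings = c("NA", "N.A.", "-", "b.d.", "bd"),
                  stringsAsFactors = FALSE)

# Treat values <= 0 as missing as well (column by column)
dat[] <- lapply(dat, function(x) replace(x, !is.na(x) & x <= 0, NA))

print(dat)
```

Reading from a real file would use `file = "..."` instead of `text = raw`; the separator and decimal mark would then have to be chosen to suit that file.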

  3. GCDkit I. • Total iron as ferrous oxide: ‘FeOt’ or ‘FeO*’ • Structurally bound water: ‘H2O.PLUS’, ‘H2O+’, ‘H2OPLUS’ or ‘H2O_PLUS’ • Column ‘Symbol’ (if any): plotting symbols (as codes or single characters) • A column whose name starts with ‘Col’ (if any): code for colour of the symbols • Avoid special symbols in the column names, and accented characters throughout the file!

  4. GCDkit I. Appends new samples (= new rows) to the data in memory. • The structures of the two data files are matched as far as possible. • If necessary, empty columns are added to either data set.
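A rough base-R sketch of appending rows while matching column structure. The data frames `file1` and `file2` and the helper `pad` are hypothetical names used for illustration; GCDkit's actual routine may differ.

```r
# Two invented data sets with partly different columns
file1 <- data.frame(SiO2 = c(65.2, 48.7), Rb = c(120, 15),
                    row.names = c("Bl-1", "Koz-1"))
file2 <- data.frame(SiO2 = 52.1, Nd = 35, row.names = "Koz-2")

all.cols <- union(names(file1), names(file2))

# Hypothetical helper: add the missing columns as empty (NA) ones, then order them
pad <- function(d) {
  for (nm in setdiff(all.cols, names(d))) d[[nm]] <- NA_real_
  d[all.cols]
}

merged <- rbind(pad(file1), pad(file2))
print(merged)
```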

  5. GCDkit I. Adds new data (columns) to the samples stored in memory. • Samples that occur in only one of the files are not added.
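One way to picture adding columns without introducing new samples is to keep only the sample IDs common to both data sets. This is a sketch with invented data; `mem` and `extra` are hypothetical names, and GCDkit's own matching rules may be more elaborate.

```r
# Data in memory and a second file with extra columns (invented values)
mem   <- data.frame(SiO2 = c(65.2, 48.7), row.names = c("Bl-1", "Koz-1"))
extra <- data.frame(Rb = c(120, 15, 60), row.names = c("Bl-1", "Koz-1", "Ri-1"))

# Keep only samples present in both data sets
common <- intersect(rownames(mem), rownames(extra))
out <- cbind(mem[common, , drop = FALSE], extra[common, , drop = FALSE])
print(out)
```

Sample Ri-1 occurs only in the second data set, so it does not appear in the result.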

  6. GCDkit I. Saves the modified data set stored in memory under a specified filename. • The data can be retrieved again into GCDkit using the ‘Load data file’ command.

  7. GCDkit I. Information about the current dataset: • levels and frequencies for each of the labels, • list + no. of numeric columns, • for each of the numeric variables no. of available values, • total no. of samples, • list of samples in the selected subset (or all samples if none is defined), • current grouping information.

  8. GCDkit I. Prints a cross table (contingency table) for 1-3 labels and plots corresponding barplots.

  9. Contingency tables An example of a contingency table involving two labels
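In plain R, a two-label contingency table of this kind can be produced with `table()`. The labels and values below are invented for illustration.

```r
# Invented textual labels for five samples
labels <- data.frame(
  Intrusion = c("Rhum", "Rhum", "Skye", "Skye", "Skye"),
  Rock      = c("gabbro", "granite", "gabbro", "gabbro", "granite")
)

# Cross table of the two labels
tab <- table(labels$Intrusion, labels$Rock)
print(tab)

# Corresponding barplot (drawn only in an interactive session)
if (interactive()) barplot(tab, beside = TRUE, legend.text = TRUE)
```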

  10. GCDkit I. Restricts the textual output to an absolute minimum (which is useful for large data files)

  11. GCDkit II.

  12. GCDkit II.

  13. Data handling

  14. Intermezzo 1: Specifying a variable in GCDkit • Enter the complete name of a variable (e.g., ‘SiO2’) • Type only part of the variable name. If the result is ambiguous, the desired variable has to be selected by mouse from the list of multiple matches (this also applies to empty patterns) • Specify the variable sequence number (2 for the second one). • In many cases a formula can be entered instead; it is then interpreted and evaluated by the calculation core.

  15. Intermezzo 2: Formulae & calculation core A formula can involve any combination of names of existing numeric columns, constants, brackets, arithmetic operators + - * / ^ and R functions. Examples of valid formulae: • (Na2O+K2O)/CaO • Rb^2 • log10(Sr) • mean(SiO2)/10
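To see how such formulae can be evaluated against the data columns, here is a minimal sketch in base R. The helper `calc` and the data frame `WR` with its values are assumptions made for illustration; this is not the actual calculation core.

```r
# Invented whole-rock data
WR <- data.frame(SiO2 = c(50, 60, 70), Na2O = c(2, 3, 4), K2O = c(1, 2, 4),
                 CaO = c(10, 5, 2), Rb = c(10, 50, 200), Sr = c(100, 1000, 10))

# Hypothetical helper: evaluate a formula given as text, using the columns as variables
calc <- function(formula, data) eval(parse(text = formula), envir = data)

alk  <- calc("(Na2O+K2O)/CaO", WR)
rb2  <- calc("Rb^2", WR)
lsr  <- calc("log10(Sr)", WR)
msil <- calc("mean(SiO2)/10", WR)
print(list(alk, rb2, lsr, msil))
```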

  16. Data handling I. Displays a single numeric variable or a result of a calculation # Works as a simple R shell too! • summary(Rb,na.rm=T) • cbind(SiO2/2,TiO2,Na2O+K2O) • cbind(major) • hist(SiO2,col="red") • boxplot(Rb~factor(groups))

  17. Intermezzo 3: Specifying multiple variables • List of column name(s), in full, separated by commas • Sequence numbers of variables or their ranges (1,10:15) • Name of a built-in list, such as ‘LILE’, ‘REE’, ‘major’ and ‘HFSE’ or their combinations with the column names • User-defined list = simple character vector. Currently only a single, stand-alone user-defined list can be employed as a search criterion • For empty patterns, the correct name(s) has to be selected by mouse click(s) (± Shift ± Ctrl) from the list of the available variables

  18. Intermezzo 3: Specifying multiple variables - examples • Search pattern = major → SiO2, TiO2, Al2O3, Fe2O3, FeO, MnO, MgO, CaO, Na2O, K2O, P2O5 • Search pattern = LILE → Rb, Sr, Ba, K, Cs, Li • Search pattern = HFSE → Nb, Zr, Hf, Ti, Ta, La, Ce, Y, Ga, Sc, Th, U • Search pattern = REE → La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu • Search pattern = 1:5,7 → Numeric data columns number 1, 2, ...5, 7 • # User-defined list: my.elems<-c("Rb","Sr","Ba") • Search pattern = my.elems → Rb, Sr, Ba
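The resolution of such patterns could be sketched as follows. The function `select.columns`, the `built.in` list and the column names are hypothetical and cover only three of the cases (a named list, sequence numbers, and full column names); GCDkit itself handles more, including combinations.

```r
# Assumed built-in list (content as given above) and a user-defined one
built.in <- list(LILE = c("Rb", "Sr", "Ba", "K", "Cs", "Li"))
my.elems <- c("Rb", "Sr", "Ba")

# Hypothetical resolver for a multiple-variable search pattern
select.columns <- function(pattern, data, lists = built.in) {
  pattern <- trimws(pattern)
  # 1. Name of a built-in or user-defined list
  if (pattern %in% names(lists)) return(intersect(lists[[pattern]], names(data)))
  # 2. Sequence numbers and ranges such as 1:2,4
  if (grepl("^[0-9:, ]+$", pattern)) {
    idx <- eval(parse(text = paste0("c(", pattern, ")")))
    return(names(data)[idx])
  }
  # 3. Full column names separated by commas
  trimws(strsplit(pattern, ",")[[1]])
}

WR <- data.frame(SiO2 = 50, TiO2 = 1, Rb = 10, Sr = 100, Ba = 300)
by.list  <- select.columns("LILE", WR)
by.range <- select.columns("1:2,4", WR)
by.name  <- select.columns("SiO2, Rb", WR)
by.user  <- select.columns("my.elems", WR, c(built.in, list(my.elems = my.elems)))
print(list(by.list, by.range, by.name, by.user))
```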

  19. Intermezzo 4: Searching and subsetting • The search pattern is first tested to see whether it can be interpreted as a query of the sample name(s). A list of exact sample names separated by commas is allowed. • Next, the pattern is assumed to correspond to a selection of sample sequence numbers. • Lastly, the search pattern is interpreted as a Boolean condition. • Entering an empty pattern usually returns all the samples in the data set.

  20. Intermezzo 4: Searching and subsetting - examples 1. By sample name • Search pattern = oz → Samples with names Koz, KozD-5, Roz-5 … • Search pattern = Bl-1,Bl-2,Koz-3 → Samples with names Bl-1, Bl-2, Koz-3 • Regular expressions (advanced technique, see later)

  21. Intermezzo 4: Searching and subsetting - examples 2. By sample range In this case the search pattern is treated as a selection of sample sequence numbers (effectively a list separated by commas that may also contain ranges expressed by colons). • Search pattern = 1:5 # First to fifth samples in the data set • Search pattern = 1,10 # First and tenth samples • Search pattern = 1:5, 10:11, 25 # Samples number 1, 2, ...5, 10, 11, 25

  22. Intermezzo 4: Searching and subsetting - examples 3. By Boolean conditions Patterns may employ variable names and the comparison operators common in R (see Table). • Character strings should be quoted. • Conditions can be combined using logical and, logical or, and brackets. • Logical and can be expressed as ‘.and.’ ‘.AND.’ ‘&’ • Logical or can be expressed as ‘.or.’ ‘.OR.’ ‘|’ • Regular expressions can be employed to search in the textual labels (advanced technique, see later).

  23. Intermezzo 4: Searching and subsetting - examples 3. By Boolean conditions • Search pattern: Intrusion="Rhum" # Finds all analyses from Rhum • Search pattern: Intrusion="Rhum".and.SiO2>65 • Search pattern: Intrusion="Rhum".AND.SiO2>65 • Search pattern: Intrusion="Rhum"&SiO2>65 # All analyses from Rhum with silica greater than 65 (all three expressions are equivalent) • Search pattern: MgO>10&(Locality="Skye"|Locality="Islay") # All analyses from Skye or Islay with MgO greater than 10
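Plain R writes equality as `==` and has no `.and.` / `.or.` operators, so patterns like these need translating before they can be evaluated. The sketch below shows one possible translation; the helpers `to.r.condition` and `select.samples` and the data are hypothetical, and this is not necessarily how GCDkit does it internally.

```r
# Hypothetical helper: rewrite a GCDkit-style condition as a plain R expression
to.r.condition <- function(pattern) {
  out <- gsub("\\.and\\.", "&", pattern, ignore.case = TRUE)
  out <- gsub("\\.or\\.", "|", out, ignore.case = TRUE)
  # A lone '=' (not part of ==, <=, >= or !=) becomes '=='
  gsub("(?<![<>=!])=(?!=)", "==", out, perl = TRUE)
}

# Hypothetical helper: return the names of samples fulfilling the condition
select.samples <- function(pattern, data) {
  hit <- eval(parse(text = to.r.condition(pattern)), envir = data)
  rownames(data)[which(hit)]
}

# Invented data
WR <- data.frame(Intrusion = c("Rhum", "Rhum", "Skye", "Islay"),
                 Locality  = c("Rhum", "Rhum", "Skye", "Islay"),
                 SiO2 = c(70, 60, 48, 47), MgO = c(2, 12, 11, 8),
                 row.names = c("A", "B", "C", "D"))

s1 <- select.samples('Intrusion="Rhum"', WR)
s2 <- select.samples('Intrusion="Rhum".and.SiO2>65', WR)
s3 <- select.samples('Intrusion="Rhum".AND.SiO2>65', WR)
s4 <- select.samples('MgO>10&(Locality="Skye"|Locality="Islay")', WR)
print(list(s1, s2, s3, s4))
```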

  24. Data handling I. Displays specified combination of numeric variable(s) and/or labels for selected range of samples. • So far only names of existing numeric data columns and not formulae can be handled.

  25. Data handling I. Deletes a single numeric variable or a label. • Some fields are mandatory and cannot be removed.

  26. Data handling I. Appends an empty numeric data column or new label to the current data set.

  27. Data handling I. Simultaneous editing of all labels for individual samples using a spreadsheet-like interface. • When the desired changes have been made, click the close button.

  28. Data handling I.

  29. Data handling I. Global replacement of selected discrete values (levels) for a given label.

  30. Data handling I. Simultaneous editing of all numeric data using a spreadsheet-like interface.

  31. Intermezzo 5: Regular expressions Many queries in GCDkit employ regular expressions. This is a powerful search mechanism, more familiar to people working in Unix. • Most characters, including letters and digits, are regular expressions that match themselves. • Dot ‘.’ matches any character. • Metacharacters with a special meaning (‘?’ ‘+’ ‘{’ ‘}’ ‘|’ ‘(’ ‘)’) must be preceded by a backslash to be matched literally. • Brackets can be used to group subexpressions.

  32. Intermezzo 5: Regular expressions

  33. Intermezzo 5: Regular expressions

  34. Intermezzo 5: Regular expressions - examples # The list of localities searched: Mull, Rhum, Skye, Coll, Colonsay, Hoy, Westray, Sanday, Stronsay, Tiree, Islay • Search pattern = ol → Coll, Colonsay • Search pattern = n.a → Colonsay, Sanday, Stronsay • Search pattern = ^S → Skye, Sanday, Stronsay • Search pattern = e$ → Skye, Tiree • Search pattern = [ds]ay → Colonsay, Sanday, Stronsay • Search pattern = [p-s]ay → Colonsay, Westray, Stronsay

  35. Intermezzo 5: Regular expressions - examples # The list of localities searched: Mull, Rhum, Skye, Coll, Colonsay, Hoy, Westray, Sanday, Stronsay, Tiree, Islay • Search pattern = ol|oy → Coll, Colonsay, Hoy • Search pattern = l{2} → Mull, Coll # Sample names are: Bl-1, Bl-3, Koz-1, Koz-2, Koz-5, Koz-11, KozD-1, Ri-1 • Search pattern = oz-|Bl- → Bl-1, Bl-3, Koz-1, Koz-2, Koz-5, Koz-11 • Search pattern = oz-[1-3] → Koz-1, Koz-2, Koz-11 • Search pattern = oz-1{1,} → Koz-1, Koz-11
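These patterns can be tried directly in R with `grep(..., value = TRUE)`, which is a convenient way to check a search pattern before using it. A short sketch with the locality and sample names listed above (the variable names are arbitrary):

```r
loc <- c("Mull", "Rhum", "Skye", "Coll", "Colonsay", "Hoy", "Westray",
         "Sanday", "Stronsay", "Tiree", "Islay")
smp <- c("Bl-1", "Bl-3", "Koz-1", "Koz-2", "Koz-5", "Koz-11", "KozD-1", "Ri-1")

r1 <- grep("ol", loc, value = TRUE)        # plain substring
r2 <- grep("n.a", loc, value = TRUE)       # dot matches any character
r3 <- grep("^S", loc, value = TRUE)        # anchored at the start
r4 <- grep("e$", loc, value = TRUE)        # anchored at the end
r5 <- grep("[p-s]ay", loc, value = TRUE)   # character range
r6 <- grep("ol|oy", loc, value = TRUE)     # alternation
r7 <- grep("l{2}", loc, value = TRUE)      # exactly two repetitions
r8 <- grep("oz-[1-3]", smp, value = TRUE)
r9 <- grep("oz-1{1,}", smp, value = TRUE)  # one or more repetitions
print(list(r1, r2, r3, r4, r5, r6, r7, r8, r9))
```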

  36. Data handling I. Selecting subsets of the data stored in memory by searching sample names or levels of a single label. • Regular expressions are implemented. Example label: Lokalita (Czech for ‘Locality’)

  37. Data handling I. Selecting subsets of the data stored in memory by their range. Example: 1:5

  38. Data handling I. Selecting subsets of the current dataset using Boolean conditions. • Both numeric fields and labels (or combinations thereof) can be queried. • Regular expressions can be employed to search the labels. Example: Suita="Ricany"

  39. Data handling I. Restores data for all samples in the same form as they were loaded from a data file.

  40. Data handling II. Grouping the data according to the levels of a single label. Example label: Suita

  41. Data handling II. Grouping the data according to the interval a single numerical variable falls into. • Enter a comma-delimited list of one or more breakpoints defining the intervals • The default breakpoint is the mean, which is supplemented by 0 and the maximum (i.e. two intervals) • The names of individual groups can be specified • The vector containing the information on the groups can be appended to the labels.

  42. Data handling II. Example: variable SiO2, breakpoints 52,63, group names Basic,Intermediate,Acid
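Grouping by intervals of a numeric variable corresponds roughly to `cut()` in base R. The sketch below uses the breakpoints 52 and 63 supplemented by 0 and the maximum; the SiO2 values are invented, and the treatment of values lying exactly on a breakpoint (here: assigned to the lower interval) is an assumption.

```r
# Invented silica contents
SiO2 <- c(48, 55, 70, 60)

# Breakpoints 52 and 63, supplemented by 0 and the maximum
breaks <- c(0, 52, 63, max(SiO2))
groups <- cut(SiO2, breaks = breaks,
              labels = c("Basic", "Intermediate", "Acid"))
print(data.frame(SiO2, groups))
```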

  43. Data handling II. Grouping the data using selected classification diagram. • The vector containing the information on the current groups can be appended to the labels.

  44. Data handling II. Grouping the data using cluster analysis. • After the dendrogram is drawn, the user is asked how many clusters the dataset should be split into. • The vector containing the information on the current groups can be appended to the labels. • The groups are initially numbered, but the names can be changed readily using the function Edit labels as factor. Example number of clusters: 5
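The same idea can be sketched with `hclust()` and `cutree()` from base R. The data are invented, and the distance measure and linkage method are simply R's defaults, which may not match the settings GCDkit uses.

```r
# Invented data: two obvious clusters in SiO2-MgO space
x <- cbind(SiO2 = c(48, 49, 47, 70, 72, 71),
           MgO  = c(10, 11, 9, 1, 0.5, 1.2))
rownames(x) <- paste0("S", 1:6)

# Hierarchical clustering on Euclidean distances
hc <- hclust(dist(x))
if (interactive()) plot(hc)   # dendrogram

# Split the dendrogram into the requested number of clusters
groups <- cutree(hc, k = 2)
print(groups)
```

The result is a vector of cluster numbers named by sample, which could then be stored as a new label.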

  45. Data handling II. Enables merging several groups into a single one. • The vector containing the information on the current groups can be appended to the labels. Example group names: Old, Young

  46. Intermezzo 4: Plotting symbols Use codes from the table or single characters such as ‘*’, ‘B’, ‘s’

  47. Intermezzo 5: Plotting colours NB that only numeric codes can be used to specify plotting colours so far.

  48. Data handling III. Assigns plotting symbols and colours simultaneously according to the levels of the defined groups.
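In plain R, one simple way to derive symbols and colours from groups is to use the integer codes of a factor. This is only an illustration with invented groups; the codes GCDkit actually assigns may differ.

```r
# Invented grouping of four samples
groups <- factor(c("Basic", "Acid", "Basic", "Intermediate"))

# Factor levels are sorted alphabetically: Acid = 1, Basic = 2, Intermediate = 3
pch <- as.integer(groups)   # numeric plotting symbol codes
col <- as.integer(groups)   # numeric colour codes

if (interactive()) plot(c(50, 70, 48, 58), c(8, 1, 10, 4), pch = pch, col = col,
                        xlab = "SiO2", ylab = "MgO")
print(data.frame(groups, pch, col))
```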

  49. Data handling III. Assign plotting symbols or colours according to the levels of a single label.

  50. Data handling III. Assign uniform plotting symbols or colours to all the analyses in the current data set.
