Menu GCDkit
GCDkit I. Loads an ASCII file or imports the clipboard (e.g. data copied from Excel) • Values are separated by tabs, commas or semicolons • The first line contains unique labels for the data columns (e.g. ‘SiO2’, ‘Fe2O3’, ‘Rb’, ‘Nd’), the first column unique sample IDs • Decimal commas are converted to decimal points if appropriate • Missing values are allowed anywhere in the data file; values ≤ 0 and any of ‘NA’, ‘N.A.’, ‘-’, ‘b.d.’, ‘bd’ are also interpreted as missing
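The loading rules above can be illustrated with a minimal base-R sketch. It is not GCDkit's own loader: the sample data, the object names `txt`, `raw`, `WR` and the helper `to_numeric` are all hypothetical, and only the missing-value codes and the decimal-comma rule are taken from the slide.

```r
# Hypothetical tab-separated input; sample IDs are in the first column
txt <- "Sample\tSiO2\tRb\tSr
Koz-1\t65.2\t120\tb.d.
Koz-2\t58,7\t-\t340
Bl-1\t71.0\t0\tNA"

# Read everything as text first, so that decimal commas survive
raw <- read.table(text = txt, sep = "\t", header = TRUE, row.names = 1,
                  colClasses = "character", check.names = FALSE)

# Codes listed on the slide as missing values
missing_codes <- c("NA", "N.A.", "-", "b.d.", "bd")

to_numeric <- function(x) {
  x[x %in% missing_codes] <- NA                    # textual codes -> NA
  x <- as.numeric(sub(",", ".", x, fixed = TRUE))  # decimal comma -> point
  x[!is.na(x) & x <= 0] <- NA                      # values <= 0 -> NA
  x
}

WR <- raw
WR[] <- lapply(raw, to_numeric)  # convert all columns, keep the sample IDs
print(WR)
```

Reading all columns as character first is a design choice of this sketch; it keeps the decimal-comma conversion and the missing-value codes in one place.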
GCDkit I. • Total iron as ferrous oxide: ‘FeOt’ or ‘FeO*’ • Structurally bound water: ‘H2O.PLUS’, ‘H2O+’, ‘H2OPLUS’ or ‘H2O_PLUS’ • Column ‘Symbol’ (if any): plotting symbols (as codes or single characters) • A column whose name starts with ‘Col’ (if any): code for colour of the symbols • Avoid special symbols in the column names, and accented characters throughout the file!
GCDkit I. Appends new samples (= new rows) to the data in memory. • The structures of the two data files are matched as closely as possible. • If necessary, empty columns are added to either data set.
GCDkit I. Adds new data (columns) to the samples stored in memory. • No new samples are introduced: samples that occur in only one of the files are not added.
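A rough idea of how appending rows with mismatched columns can work is sketched below in base R. The function `append_rows` and the two small data frames are hypothetical and only mimic the behaviour described on the slide; GCDkit's actual matching logic may differ.

```r
# Two hypothetical data sets with partly different columns
file1 <- data.frame(SiO2 = c(65.2, 58.7), Rb = c(120, 95),
                    row.names = c("Koz-1", "Koz-2"))
file2 <- data.frame(SiO2 = 71.0, Sr = 210, row.names = "Bl-1")

append_rows <- function(a, b) {
  all_cols <- union(names(a), names(b))
  # Introduce empty (NA) columns wherever one data set lacks a variable
  for (nm in setdiff(all_cols, names(a))) a[[nm]] <- NA_real_
  for (nm in setdiff(all_cols, names(b))) b[[nm]] <- NA_real_
  rbind(a[all_cols], b[all_cols])
}

merged <- append_rows(file1, file2)
print(merged)
```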
GCDkit I. Saves the modified data set stored in memory under a specified filename. • The data can be retrieved again into GCDkit using the ‘Load data file’ command.
GCDkit I. Information about the current dataset: • levels and frequencies for each of the labels, • list + no. of numeric columns, • for each of the numeric variables no. of available values, • total no. of samples, • list of samples in the selected subset (or all samples if none is defined), • current grouping information.
GCDkit I. Prints a cross table (contingency table) for 1-3 labels and plots corresponding barplots.
Contingency tables An example of a contingency table involving two labels
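For readers who want to reproduce such a table outside the menu, a minimal base-R sketch follows. The labels `Intrusion` and `Rock` and their values are made up for illustration; the menu item itself may format its output differently.

```r
# Hypothetical labels for five samples
labels <- data.frame(
  Intrusion = c("Rhum", "Rhum", "Skye", "Skye", "Skye"),
  Rock      = c("granite", "granite", "gabbro", "granite", "gabbro")
)

# Cross table (contingency table) of the two labels
tab <- table(labels$Intrusion, labels$Rock)
print(tab)

# barplot(tab, beside = TRUE, legend.text = TRUE) would draw a matching barplot
```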
GCDkit I. Restricts the textual output to an absolute minimum (which is useful for large data files)
Intermezzo 1: Specifying a variable in GCDkit • Enter the complete name of a variable (e.g., ‘SiO2’). • Type only part of the variable name. If the result is ambiguous, the desired variable has to be selected by mouse from the list of matches (this also applies to empty patterns). • Specify the sequence number of the variable (e.g. 2 for the second one). • In many cases a formula can be entered instead; it is interpreted and evaluated by the calculation core.
Intermezzo 2: Formulae & calculation core A formula can involve any combination of names of existing numerical columns, constants, brackets, the arithmetic operators +-*/^ and R functions. Examples of valid formulae: • (Na2O+K2O)/CaO • Rb^2 • log10(Sr) • mean(SiO2)/10
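One plausible way to evaluate such formulae in plain R is to parse the text and evaluate it with the data columns in scope. The sketch below does exactly that; it is an illustration, not the calculation core itself, and the data frame `WR` and the helper `calc` are hypothetical.

```r
# Hypothetical whole-rock data
WR <- data.frame(SiO2 = c(50, 60, 70), Na2O = c(2, 3, 4), K2O = c(1, 2, 4),
                 CaO = c(10, 5, 2), Rb = c(10, 100, 200), Sr = c(1000, 100, 10))

# Parse the formula text and evaluate it using the columns of 'data'
calc <- function(formula, data) eval(parse(text = formula), envir = data)

print(calc("(Na2O+K2O)/CaO", WR))  # one value per sample
print(calc("Rb^2", WR))
print(calc("log10(Sr)", WR))
print(calc("mean(SiO2)/10", WR))   # a single number
```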
Data handling I. Displays a single numeric variable or a result of a calculation # Works as a simple R shell too! • summary(Rb,na.rm=T) • cbind(SiO2/2,TiO2,Na2O+K2O) • cbind(major) • hist(SiO2,col="red") • boxplot(Rb~factor(groups))
Intermezzo 3: Specifying multiple variables • List of column name(s), in full, separated by commas • Sequence numbers of variables or their ranges (1,10:15) • Name of a built-in list, such as ‘LILE’, ‘REE’, ‘major’ and ‘HFSE’, or their combinations with the column names • User-defined list = simple character vector. Currently only a single, stand-alone user-defined list can be employed as a search criterion • For empty patterns, the correct name(s) have to be selected by mouse click(s) (± Shift ± Ctrl) from the list of the available variables
Intermezzo 3: Specifying multiple variables - examples • Search pattern = major → SiO2, TiO2, Al2O3, Fe2O3, FeO, MnO, MgO, CaO, Na2O, K2O, P2O5 • Search pattern = LILE → Rb, Sr, Ba, K, Cs, Li • Search pattern = HFSE → Nb, Zr, Hf, Ti, Ta, La, Ce, Y, Ga, Sc, Th, U • Search pattern = REE → La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu • Search pattern = 1:5,7 → Numeric data columns number 1, 2, ...5, 7 • # User-defined list: my.elems<-c("Rb","Sr","Ba") Search pattern = my.elems → Rb, Sr, Ba
Intermezzo 4: Searching and subsetting • The search pattern is first tested to see whether it can be interpreted as a query on the sample name(s). A comma-separated list of exact sample names is allowed. • Next, the pattern is assumed to correspond to a selection of sample sequence numbers. • Lastly, the search pattern is interpreted as a Boolean condition. • Entering an empty pattern usually returns all the samples in the data set.
Intermezzo 4: Searching and subsetting - examples 1. By sample name • Search pattern = oz → Samples with names Koz, KozD-5, Roz-5 … • Search pattern = Bl-1,Bl-2,Koz-3 → Samples with names Bl-1, Bl-2, Koz-3 • Regular expressions (advanced technique, see later)
Intermezzo 4: Searching and subsetting - examples 2. By sample range In this case the search pattern is treated as a selection of sample sequence numbers (effectively a list separated by commas that may also contain ranges expressed by colons). • Search pattern = 1:5 # First to fifth samples in the data set • Search pattern = 1,10 # First and tenth samples • Search pattern = 1:5, 10:11, 25 # Samples number 1, 2, ...5, 10, 11, 25
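The expansion of such a range pattern into sequence numbers can be sketched in a few lines of base R. The helper `parse_ranges` is hypothetical and makes no claim about how GCDkit parses the pattern internally.

```r
# Expand a pattern such as "1:5, 10:11, 25" into sample sequence numbers
parse_ranges <- function(pattern) {
  parts <- trimws(strsplit(pattern, ",", fixed = TRUE)[[1]])
  unlist(lapply(parts, function(p) {
    bounds <- as.integer(strsplit(p, ":", fixed = TRUE)[[1]])
    # "a:b" becomes the whole range, a single number stays as it is
    if (length(bounds) == 2) seq(bounds[1], bounds[2]) else bounds
  }))
}

print(parse_ranges("1:5"))
print(parse_ranges("1,10"))
print(parse_ranges("1:5, 10:11, 25"))
```

The resulting integer vector can be used directly to index the rows of a data frame.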
Intermezzo 4: Searching and subsetting - examples 3. By Boolean conditions Patterns may employ variable names and the comparison operators common in R (see Table). • Character strings should be quoted. • The conditions can be combined using logical and, or and brackets. • Logical and can be expressed as ‘.and.’ ‘.AND.’ ‘&’ • Logical or can be expressed as ‘.or.’ ‘.OR.’ ‘|’ • Regular expressions can be employed to search in the textual labels (advanced technique, see later).
Intermezzo 4: Searching and subsetting - examples 3. By Boolean conditions • Search pattern: Intrusion="Rhum" # Finds all analyses from Rhum • Search pattern: Intrusion="Rhum".and.SiO2>65 • Search pattern: Intrusion="Rhum".AND.SiO2>65 • Search pattern: Intrusion="Rhum"&SiO2>65 # All analyses from Rhum with silica greater than 65 (all three expressions are equivalent) • Search pattern: MgO>10&(Locality="Skye"|Locality="Islay") # All analyses from Skye or Islay with MgO greater than 10
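To show how such patterns map onto ordinary R conditions, here is a hedged sketch that rewrites the GCDkit-style operators into R syntax and evaluates the result. The functions `translate` and `select` and the data frame `WR` are hypothetical, and the simple text substitution is an assumption of this example rather than GCDkit's real parser (it would, for instance, be confused by ‘.and.’ inside a quoted string).

```r
# Hypothetical data set
WR <- data.frame(
  Intrusion = c("Rhum", "Rhum", "Other", "Other"),
  Locality  = c("Rhum", "Skye", "Islay", "Skye"),
  SiO2      = c(70, 60, 48, 50),
  MgO       = c(1, 2, 12, 8)
)

# Rewrite .and./.or. and a single '=' into R's '&', '|' and '=='
translate <- function(pattern) {
  p <- gsub(".and.", "&", pattern, fixed = TRUE)
  p <- gsub(".AND.", "&", p, fixed = TRUE)
  p <- gsub(".or.", "|", p, fixed = TRUE)
  p <- gsub(".OR.", "|", p, fixed = TRUE)
  # Only a lone '=' is doubled; '<=', '>=', '!=' and '==' stay untouched
  gsub("(?<![<>=!])=(?!=)", "==", p, perl = TRUE)
}

# Return the sequence numbers of the samples fulfilling the condition
select <- function(pattern, data) {
  which(eval(parse(text = translate(pattern)), envir = data))
}

print(translate('Intrusion="Rhum".and.SiO2>65'))
print(select('Intrusion="Rhum".and.SiO2>65', WR))
print(select('MgO>10&(Locality="Skye"|Locality="Islay")', WR))
```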
Data handling I. Displays a specified combination of numeric variable(s) and/or labels for a selected range of samples. • So far, only names of existing numeric data columns can be handled, not formulae.
Data handling I. Deletes a single numeric variable or a label. • Some fields are mandatory and cannot be removed.
Data handling I. Appends an empty numeric data column or new label to the current data set.
Data handling I. Simultaneous editing of all labels for individual samples using a spreadsheet-like interface. • Once the desired changes have been made, click the close button.
Data handling I. Global replacement of selected discrete values (levels) for a given label.
Data handling I. Simultaneous editing of all numeric data using a spreadsheet-like interface.
Intermezzo 5: Regular expressions Many queries in GCDkit employ regular expressions. This is a powerful searching mechanism, more familiar to people working in Unix. • Most characters, including letters and digits, are regular expressions that match themselves. • Dot ‘.’ matches any character. • Metacharacters with a special meaning (‘?’ ‘+’ ‘{’ ‘}’ ‘|’ ‘(’ ‘)’) must be preceded by a backslash if they are to be matched literally. • Brackets can be used to group subexpressions.
Intermezzo 5: Regular expressions - examples # Searched is the list of localities: Mull, Rhum, Skye, Coll, Colonsay, Hoy, Westray, Sanday, Stronsay, Tiree, Islay • Search pattern = ol → Coll, Colonsay • Search pattern = n.a → Colonsay, Sanday, Stronsay • Search pattern = ^S → Skye, Sanday, Stronsay • Search pattern = e$ → Skye, Tiree • Search pattern = [ds]ay → Colonsay, Sanday, Stronsay • Search pattern = [p-s]ay → Colonsay, Westray, Stronsay
Intermezzo 5: Regular expressions - examples # Searched is the list of localities: Mull, Rhum, Skye, Coll, Colonsay, Hoy, Westray, Sanday, Stronsay, Tiree, Islay • Search pattern = ol|oy → Coll, Colonsay, Hoy • Search pattern = l{2} → Mull, Coll # Sample names are: Bl-1, Bl-3, Koz-1, Koz-2, Koz-5, Koz-11, KozD-1, Ri-1 • Search pattern = oz-|Bl- → Bl-1, Bl-3, Koz-1, Koz-2, Koz-5, Koz-11 • Search pattern = oz-[1-3] → Koz-1, Koz-2, Koz-11 • Search pattern = oz-1{1,} → Koz-1, Koz-11
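These patterns can be tried out directly in R with `grep()`, which returns the matching elements when called with `value = TRUE`. The sketch below uses the locality and sample-name lists quoted on the slides; the vector names `localities` and `samples` are only chosen for this example.

```r
localities <- c("Mull", "Rhum", "Skye", "Coll", "Colonsay", "Hoy",
                "Westray", "Sanday", "Stronsay", "Tiree", "Islay")
samples <- c("Bl-1", "Bl-3", "Koz-1", "Koz-2", "Koz-5", "Koz-11",
             "KozD-1", "Ri-1")

# Each call prints the elements matched by the pattern
print(grep("n.a", localities, value = TRUE))       # dot matches any character
print(grep("^S", localities, value = TRUE))        # anchored at the start
print(grep("e$", localities, value = TRUE))        # anchored at the end
print(grep("[p-s]ay", localities, value = TRUE))   # character range
print(grep("ol|oy", localities, value = TRUE))     # alternation
print(grep("l{2}", localities, value = TRUE))      # exactly two repetitions
print(grep("oz-|Bl-", samples, value = TRUE))
print(grep("oz-[1-3]", samples, value = TRUE))
print(grep("oz-1{1,}", samples, value = TRUE))     # one or more '1'
```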
Data handling I. Selecting subsets of the data stored in memory by searching sample names or levels of a single label. • Regular expressions are supported. Example label: ‘Lokalita’ (Locality)
Data handling I. Selecting subsets of the data stored in memory by their range. Example: 1:5
Data handling I. Selecting subsets of the current dataset using Boolean conditions. • Both numeric fields and labels (or combinations thereof) can be queried. • Regular expressions can be employed to search the labels. Example: Suita="Ricany"
Data handling I. Restores data for all samples in the same form as they were loaded from a data file.
Data handling II. Grouping the data according to the levels of a single label. Example label: ‘Suita’
Data handling II. Grouping the data according to the interval a single numerical variable falls into. • Enter a comma-delimited list of one or more breakpoints defining the intervals. • The default breakpoint is the mean, which is supplemented by 0 and the maximum (i.e. two intervals). • The names of individual groups can be specified. • The vector containing the information on the groups can be appended to the labels.
Data handling II. Example: variable = SiO2, breakpoints = 52,63, group names = Basic,Intermediate,Acid
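The same grouping can be approximated in base R with `cut()`. This is a sketch under stated assumptions: the SiO2 values are invented, the breakpoints 52 and 63 are supplemented by 0 and the maximum as the previous slide describes, and the treatment of values lying exactly on a breakpoint follows the default of `cut()` (intervals closed on the right), which may differ from GCDkit.

```r
# Hypothetical silica contents (wt. %)
SiO2 <- c(48.5, 55.0, 63.5, 71.2, 51.9)

# User breakpoints 52 and 63, supplemented by 0 and the maximum
breaks <- c(0, 52, 63, max(SiO2))

groups <- cut(SiO2, breaks = breaks,
              labels = c("Basic", "Intermediate", "Acid"))
print(data.frame(SiO2, groups))
```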
Data handling II. Grouping the data using selected classification diagram. • The vector containing the information on the current groups can be appended to the labels.
Data handling II. Grouping the data using cluster analysis. • After the dendrogram is drawn, the user is asked how many clusters the dataset is to be broken into (e.g. 5). • The vector containing the information on the current groups can be appended to the labels. • The groups are initially numbered but the names can be changed readily using the function Edit labels as factor.
Data handling II. Enables merging several groups into a single one. • The vector containing the information on the current groups can be appended to the labels.
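The general workflow (build a dendrogram, then cut it into a chosen number of groups) can be sketched with the base-R functions `hclust()` and `cutree()`. The data matrix is invented, and the distance measure and linkage method are simply the R defaults (Euclidean distance, complete linkage); which settings GCDkit uses is not stated on the slide.

```r
# Hypothetical data: two variables for five samples
x <- rbind(c(50, 1.0), c(51, 1.2), c(70, 5.0), c(71, 5.1), c(60, 3.0))
rownames(x) <- paste0("S", 1:5)

hc <- hclust(dist(x))         # hierarchical clustering
# plot(hc) would draw the dendrogram

groups <- cutree(hc, k = 3)   # break the data set into three clusters
print(groups)                 # groups are numbered at first

# The numbers can be replaced by names afterwards
named <- factor(groups, labels = c("A", "B", "C"))
print(named)
```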
Intermezzo 6: Plotting symbols Use codes from the table or single characters such as ‘*’, ‘B’, ‘s’
Intermezzo 7: Plotting colours NB: so far, only numeric codes can be used to specify plotting colours.
Data handling III. Assigns plotting symbols and colours simultaneously according to the levels of the defined groups.
Data handling III. Assigns plotting symbols or colours according to the levels of a single label.
Data handling III. Assigns uniform plotting symbols or colours to all the analyses in the current data set.