CHEM 801.6 Physical and Analytical Chemistry Module Lecture 1 – Some basic math Ferenc Borondics 2013
Practical information Where to find me: Ferenc Borondics ferenc.borondics@usask.ca ferenc.borondics@lightsource.ca Canadian Light Source (Rm2021) Not easy to get in, but not impossible…
Practical information Analytical and Physical chemistry module Homework 40%, final exam 60% Each homework carries equal weight. Submit homework every Monday by midnight via email. If the email is not in my mailbox by Tuesday morning it means 0 (zero) score.
Practical information You’ll find the lectures on the internet: http://midir.lightsource.ca/talks http://goo.gl/sqXVbo
Practical information 5 lectures • Some basic math, statistics and fundamental theories • Optical spectroscopy and spectromicroscopy • Scanning probe and electron microscopy • Clever analytical devices and the working principles – Ian Burgess • Synchrotron X-ray techniques – Eli Stavitski
Causality and correlation Correlation is important in scientific discovery. If we increase the temperature T of a gas tank, then the pressure p increases: pV=nRT (bi-directional causation). Four basic scenarios when A and B are correlated: • A -> B • A <-> B • C -> (A,B) • By chance – this is the one to watch out for!
Accuracy vs. Precision Let’s do a measurement. Repeat N times to be sure…
Vocabulary • What is accuracy? closeness of agreement between a measured quantity value and a true quantity value of a measurand • What is precision? closeness of agreement between indications or measured quantity values obtained by replicate measurements on the same or similar objects under specified conditions • What is trueness??? closeness of agreement between the average of an infinite number of replicate measured quantity values and a reference quantity value http://goo.gl/MsD8vC
Accuracy vs. Precision This is the true value in the center
Accuracy vs. Precision Measurement 1: Accurate? N Precise? N
Accuracy vs. Precision Measurement 2: Accurate? N Precise? Y
Accuracy vs. Precision Measurement 3: Accurate? Y Precise? N
Accuracy vs. Precision Measurement 4: Accurate? Y Precise? Y
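The four target pictures can also be put into numbers. The sketch below is only an illustration: the readings and the true value are invented, and the function name is hypothetical. It reports the offset of the mean from the true value (what the slides call accurate, and what the vocabulary slide calls trueness) and the sample standard deviation of the replicates (precision).

```python
from statistics import mean, stdev

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread): bias = mean - true value, spread = sample SD."""
    return mean(readings) - true_value, stdev(readings)

# Hypothetical replicate readings of a quantity whose true value is 10.0
true_value = 10.0
precise_not_accurate = [12.0, 12.1, 11.9, 12.0]  # tight cluster, off-centre
accurate_not_precise = [8.0, 12.0, 9.0, 11.0]    # centred, widely scattered

for name, data in [("precise, not accurate", precise_not_accurate),
                   ("accurate, not precise", accurate_not_precise)]:
    bias, spread = accuracy_and_precision(data, true_value)
    print(f"{name}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

A small bias with a large spread corresponds to Measurement 3, a large bias with a small spread to Measurement 2.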
How to describe datasets Descriptive statistics (some useful quantities) • Mean • Median • Standard deviation • Variance • RMS of values • Data min/max
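A minimal sketch of these quantities using only the Python standard library. The function name `describe` is an assumption made for illustration, and the sample convention (dividing by N - 1) is assumed for the standard deviation and variance. The data are the "hours studied" values from the covariance example later in the lecture.

```python
import math
from statistics import mean, median, stdev, variance

def describe(values):
    """Summarise a 1D dataset with the quantities listed on the slide."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),          # sample SD (divides by N - 1)
        "variance": variance(values),  # sample variance (divides by N - 1)
        "rms": math.sqrt(sum(v * v for v in values) / len(values)),
        "min": min(values),
        "max": max(values),
    }

data = [1, 2, 4, 3, 2, 4, 2]
for name, value in describe(data).items():
    print(f"{name:>8}: {value:.4f}")
```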
Types and sources of error • Person • Subject of measurement • Measurement method • Ambient effects • Individual measurement error • Random error • Constant systematic error • Changing systematic error
[Figure: measured values, with labels: true value, expected value, random error, systematic error, outlier]
Absolute and relative error X – true value x – value determined by measurement Absolute error: $\Delta x = x - X$ Relative error: $\Delta x / X$ (often quoted in %)
Error propagation y can be determined from the measurements x1, x2, x3, …, xn by the calculation y=f(x1, x2, x3, …, xn). The absolute errors Δx1, Δx2, Δx3, …, Δxn are known. We need the contributions Δy1, Δy2, Δy3, …, Δyn to the total error. http://goo.gl/y0PcGL
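Error propagation 2 General formula for each contribution: $\Delta y_i = \frac{\partial f}{\partial x_i}\,\Delta x_i$ Not the plain sum Δy = Δy1+Δy2+Δy3+ … + Δyn, but instead: $\Delta y = \sqrt{\Delta y_1^2 + \Delta y_2^2 + \dots + \Delta y_n^2}$ ! For uncorrelated variables… otherwise the covariances (σ²) must be included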
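For uncorrelated variables, the individual contributions are commonly combined in quadrature (square root of the sum of squares). The sketch below assumes that rule and approximates the partial derivatives numerically. The function name `propagate_error` and the rectangle example are hypothetical and not taken from the lecture.

```python
import math

def propagate_error(f, values, errors, rel_step=1e-6):
    """Estimate the error of y = f(*values) for uncorrelated inputs.

    Each partial derivative is approximated by a central finite difference,
    then the contributions (df/dx_i * dx_i) are combined in quadrature.
    """
    contributions = []
    for i, (x, dx) in enumerate(zip(values, errors)):
        h = rel_step * (abs(x) if x != 0 else 1.0)
        up = list(values)
        up[i] = x + h
        down = list(values)
        down[i] = x - h
        dfdx = (f(*up) - f(*down)) / (2 * h)
        contributions.append(dfdx * dx)
    total = math.sqrt(sum(c * c for c in contributions))
    return total, contributions

# Hypothetical example: area of a rectangle, A = a * b
def area(a, b):
    return a * b

d_area, parts = propagate_error(area, [2.0, 3.0], [0.1, 0.2])
print(f"A = {area(2.0, 3.0):.2f} +/- {d_area:.2f}")
```

Here the two contributions are 3 × 0.1 = 0.3 and 2 × 0.2 = 0.4, which combine to 0.5 rather than the plain sum 0.7.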
Error propagation 4 - homework Calculate the error of a density measurement. Calculate the error for the pV=nRT equation for both sides. ρ=3.052 g/cm3, Δm=0.01 g, ΔV=0.01 cm3, V=10 cm3. What is the absolute error of the density measurement? ρ±… What is the relative error of the density measurement?
Principal Component Analysis (PCA) SD and variance describe the “spread” of 1D datasets. In multiple dimensions they give the variation of the data in each dimension independently of the others. How do we describe how multidimensional data vary between dimensions, with respect to each other? Answer: Covariance. If you are interested in the details of PCA: http://goo.gl/r4ID3r
PCA 2 Covariance $\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$ Sign and magnitude! Example: hours studied = 1, 2, 4, 3, 2, 4, 2 marks obtained = 2, 4, 5, 4, 5, 5, 2 covariance = [1.2857 1.0952; 1.0952 1.8095] It’s a matrix.
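The example can be checked with a few lines of Python. This is a sketch that assumes the sample covariance (division by N - 1), which is the convention that reproduces the numbers on the slide. The function names are chosen here for illustration.

```python
def covariance(x, y):
    """Sample covariance of two equally long sequences (divides by N - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def covariance_matrix(columns):
    """Matrix whose (i, j) entry is the covariance of columns i and j."""
    return [[covariance(ci, cj) for cj in columns] for ci in columns]

hours_studied = [1, 2, 4, 3, 2, 4, 2]
marks_obtained = [2, 4, 5, 4, 5, 5, 2]

for row in covariance_matrix([hours_studied, marks_obtained]):
    print("  ".join(f"{value:.4f}" for value in row))
```

The diagonal holds the variances of the two variables. The positive off-diagonal value says that marks tend to increase with hours studied.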
PCA 3 Visualization Larger datasets More than 2D…?
PCA 4 Example: students… Find the most important parameters to describe students Measure: height, weight, hair color, eye color, grades, shoe size, gender, … Now what?
PCA 5 Which parameters to ignore? • constants: number of heads • near-constants with low variance: hair thickness • linearly dependent: head size and height Which parameters should we keep? • not dependent on others: eye color • have large variance: grades Goal: find high variance variables and do a transformation that maximizes their variance (with some conditions) -> this is PCA
PCA Two objectives: - reduce data dimensionality - discover “hidden” correlations/structures Creating new variables called principal components (PCs): - a few PCs account for most of the total variance - PCs are orthogonal to each other, i.e. mutually uncorrelated (zero covariance) http://goo.gl/L9m66l
PCA Loadings: PC1 = a1x1 + a2x2 + a3x3 + … + anxn, where ak is the loading of variable xk. Where to stop / how many PCs to use? Rule of thumb: stop once the cumulative variance is > 70%
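A minimal sketch of these ideas, restricted to two variables so that the eigenvalues of the 2x2 covariance matrix can be written in closed form. Real applications have many variables and would use a linear algebra library. The function name `pca_2d` is hypothetical, and the data are the hours/marks example from the covariance slide.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by N - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((p - mx) * (q - my) for p, q in zip(x, y)) / (n - 1)

def pca_2d(x, y):
    """PCA of two variables via the eigen-decomposition of their 2x2
    covariance matrix. Returns [(variance, (a1, a2)), ...], largest first,
    where (a1, a2) are the loadings of the component."""
    a, b, c = covariance(x, x), covariance(x, y), covariance(y, y)
    if b == 0:  # variables already uncorrelated: the axes are the PCs
        return sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))], reverse=True)
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    components = []
    for eigenvalue in (centre + radius, centre - radius):
        v = (b, eigenvalue - a)  # solves (C - eigenvalue * I) v = 0
        norm = math.hypot(*v)
        components.append((eigenvalue, (v[0] / norm, v[1] / norm)))
    return components

hours_studied = [1, 2, 4, 3, 2, 4, 2]
marks_obtained = [2, 4, 5, 4, 5, 5, 2]
components = pca_2d(hours_studied, marks_obtained)
total = sum(variance for variance, _ in components)

cumulative = 0.0
for k, (variance, loadings) in enumerate(components, start=1):
    cumulative += variance / total
    print(f"PC{k}: loadings = ({loadings[0]:.3f}, {loadings[1]:.3f}), "
          f"cumulative variance = {cumulative:.1%}")
    if cumulative > 0.70:  # rule of thumb from the slide
        break
```

For this dataset the first component alone carries about 86% of the total variance, so the 70% rule stops after PC1.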
PCA Application in spectroscopy
PCA imaging [Figure: spectrum, Absorbance vs. Wavenumbers (cm-1), 4000 to 1000 cm-1; image panels PC1 and PC2]
PCA An interesting application: Eigenfaces http://goo.gl/k6Qm37
PCA 6 • Used in exploratory statistics • Multivariate statistical method to • Reduce data dimensions • Help to discover “hidden” relationships/patterns • Shift variances into a few variables
Cluster analysis Given a huge amount of data, e.g. several tens of thousands of spectra: sort them into sets by some criteria
Cluster analysis (HCA) Many different algorithms • Hierarchical cluster analysis (HCA) 1. Calculate the distance matrix (n x n): it contains information on the similarity of the spectra. 2. The two most similar spectra, that is, the spectra with the smallest “distance”, are combined to form a new object (cluster). 3. The spectral distances between all remaining spectra and the new object are re-calculated. 4. A new search for the two most similar objects (spectra or clusters) is initiated. These objects are merged and, again, the distance values for the newly formed cluster are determined. 5. Loop n-1 times until only one cluster remains.
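The steps above can be sketched in plain Python. Two choices here are assumptions, not something the slide specifies: the Euclidean distance between spectra, and average linkage for the distance between clusters. The tiny "spectra" are invented, and the code recomputes distances naively on every pass, which is fine for a demonstration but far too slow for tens of thousands of spectra.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def hca(spectra, distance=euclidean):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are indices into `spectra`.
    """
    clusters = [(i,) for i in range(len(spectra))]

    def linkage(ca, cb):
        # Average of all pairwise distances between the two clusters
        pairs = [distance(spectra[i], spectra[j]) for i in ca for j in cb]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:  # step 5: n - 1 merges in total
        # Steps 1, 2 and 4: find the two most similar objects
        d, a, b = min((linkage(ca, cb), ia, ib)
                      for ia, ca in enumerate(clusters)
                      for ib, cb in enumerate(clusters) if ia < ib)
        merges.append((clusters[a], clusters[b], d))
        # Step 3: replace the pair by the merged cluster; distances to it
        # are recomputed by `linkage` on the next pass
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical three-point "spectra"
spectra = [(0.0, 0.1, 0.9), (0.0, 0.2, 1.0), (0.8, 0.1, 0.0), (0.9, 0.0, 0.1)]
for left, right, d in hca(spectra):
    print(f"merge {left} + {right} at distance {d:.3f}")
```

The merge history is what a dendrogram draws: each merge becomes a branch point at the height of its distance.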
Cluster analysis (HCA) http://goo.gl/HF4nCZ
Cluster analysis (HCA) Distance definitions (D-values) • Euclidean distance • Normalized Euclidean distance • And others… HW: define your own distance. Hint: Google is your friend
Cluster analysis (HCA) Clustering methods (again, many…) And others…
Cluster analysis (k-means) The number of clusters (K) is chosen by the user. 1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids. 2. Assign each object to the group that has the closest centroid. 3. When all objects have been assigned, recalculate the positions of the K centroids. 4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups, from which the metric to be minimized can be calculated. Minimize $J = \sum_{j=1}^{K} \sum_{i \in \text{group } j} \lVert x_i - c_j \rVert^2$, where $c_j$ is the centroid of group j. Distance can be calculated in many ways (see HCA) http://goo.gl/X6kPSH
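A minimal sketch of the four steps. Several choices are assumptions: the Euclidean distance, initial centroids drawn at random from the data, and the sum of squared distances to the assigned centroid as the metric. The function names and the sample points are invented for illustration.

```python
import math
import random

def assign(points, centroids):
    """Step 2: index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Step 3: recalculate centroids (keep the old one if a group is empty)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # step 4: centroids stopped moving
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Metric: sum of squared distances to the assigned centroid
    metric = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return centroids, labels, metric

# Hypothetical 2D objects forming two well separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, metric = kmeans(points, k=2)
print("labels:", labels)
print("centroids:", centroids)
print(f"metric: {metric:.3f}")
```

The result of k-means generally depends on the initial centroids, so in practice the run is often repeated with different starting points and the solution with the smallest metric is kept.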