Preprocessing


Presentation Transcript


  1. Preprocessing Chapter 11

  2. What is Preprocessing • Preliminary image manipulations prior to analysis • Typical operations • Radiometric processing to adjust for haze • Geometric processing to register image to map • Cloud masking or land masking • There is no “standard” set of preprocessing steps

  3. Preprocessing Categories • Three main types • Feature extraction • Radiometric correction • Geometric correction

  4. Feature Extraction • Here, features are not geographic entities visible on an image • Features are statistical characteristics of the data • Individual bands or combinations of bands carry information about systematic variation in the scene • In theory, the data that are discarded contain mostly noise and errors • Aim is to improve the accuracy of the analysis

  5. Feature Extraction • Feature extraction can also reduce the number of bands that have to be analyzed • Reduces computations • Think about a hyperspectral scanner with over 200 bands

  6. Feature Extraction • Correlation matrix • Comes from regression analysis

  7. Feature Extraction • The correlation matrix is just the correlations among all possible pairs of bands


  11. Feature Extraction • So bands 3, 5, and 6 carry almost as much information as all seven bands • We could ignore bands 1, 2, 4, and 7 • This is a very simplistic form of feature extraction • In practice, feature extraction is usually more complex and based on statistical relationships
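
The band-by-band correlation matrix described above can be sketched in a few lines. This is only an illustration on synthetic data, not the matrix from the slides: the array layout `(bands, rows, cols)` and the helper name `band_correlation_matrix` are assumptions made for the example.

```python
import numpy as np

def band_correlation_matrix(image):
    """Return the (bands x bands) correlation matrix of an image.

    `image` is assumed to have shape (bands, rows, cols).
    """
    n_bands = image.shape[0]
    pixels = image.reshape(n_bands, -1).astype(float)  # one row per band
    return np.corrcoef(pixels)                         # rows are treated as variables

# Synthetic 3-band example: band 1 is nearly a scaled copy of band 0,
# band 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 50))
image = np.stack([
    base,
    2.0 * base + 0.05 * rng.normal(size=(50, 50)),
    rng.normal(size=(50, 50)),
])
corr = band_correlation_matrix(image)
```

A pair of bands with a correlation close to 1 is largely redundant, which is the reasoning behind dropping one band of such a pair.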

  12. Feature Extraction • Principal Component Analysis (PCA) is such a statistical technique • PCA involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. • The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.

  13. We’ll use an example from a TM image of Morro Bay, California

  14. Feature Extraction • This transformation is a rotation of the original axes to new orientations that are orthogonal to each other, chosen so that there is no correlation between the new variables. • In the following example there is a high correlation between bands 1 and 2.

  15. New axes Original axes

  16. MSS (Multi-Spectral Scanner) data • For 4 bands, PC1 = a*DN1 + b*DN2 + c*DN3 + d*DN4 • Typically 95% of all variance can be explained by 2 components, so the 'intrinsic dimensionality' of MSS data is 2 • However, lower-order components can display features that are not visible on individual bands because they are otherwise obscured by the main components
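
As a rough sketch of where the coefficients a, b, c, d and the "variance explained" figure come from, the example below runs PCA on synthetic 4-band data through an eigen-decomposition of the band covariance matrix. The data, the mixing matrix, and the function name `pca_bands` are all assumptions for illustration. The bands are mean-centered first, which is the usual convention and only shifts each component by a constant.

```python
import numpy as np

def pca_bands(pixels):
    """PCA of `pixels` with shape (n_pixels, n_bands).

    Returns (weights, explained, scores); weights[:, k] holds the
    coefficients of component k+1 (a, b, c, d for PC1 with 4 bands).
    """
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()          # fraction of total variance
    scores = centered @ eigvecs                  # component values per pixel
    return eigvecs, explained, scores

# Synthetic 4-band data driven by two underlying factors plus a little noise
rng = np.random.default_rng(1)
factors = rng.normal(size=(5000, 2))
mixing = np.array([[1.0, 0.9, 0.2, 0.1],
                   [0.1, 0.2, 1.0, 0.9]])
pixels = factors @ mixing + 0.05 * rng.normal(size=(5000, 4))

weights, explained, scores = pca_bands(pixels)
a, b, c, d = weights[:, 0]                       # PC1 = a*DN1 + b*DN2 + c*DN3 + d*DN4
```

Because the synthetic data were built from two factors, the first two entries of `explained` account for nearly all of the variance, mirroring the "intrinsic dimensionality of 2" idea.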

  17. TM (Thematic Mapper) and ETM data • The seven bands yield up to 7 components, although many users ignore Thermal (band 6) as it is so different. • The components vary according to the scene (forested, mountainous, etc.), but generally we see these patterns: • PC1: 'brightness', the weighted sum of all bands; topography is strong (or cultural data), hydrography weak • PC2: 'greenness' (Band 4 v Visible); superior land/water and vegetation discrimination • PC3: 'Swirness' (Near IR and Visible v Mid (ShortWave) IR); highlights hydrography and moisture • PC4: Thermal band influence • PC5: Bands 5 v 7 (the two mid IR bands) • PC6: Blue v Red (similar to 1/3 ratio) • PC7: Green v Red (similar to 2/3 ratio)

  18. Caveats • Since each PC is a linear combination of the original channels, one has to be prepared to interpret what each component means • Each image will have its own PC set, which cannot be applied to another image • The components would have to be recalculated for each image

  19. Feature Extraction • PCA determines the optimum linear combination of bands that can account for the variation in pixels • Since the rotation is a linear combination of the original measurements, if all of the axes are included in the rotation, no information is lost. • "No information is lost" means that the original measurements can be recovered from the principal components. • If the original data set is singular, then principal components will produce a new representation that is not singular. • There are several ways of viewing this transformation:
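
The claim that no information is lost when all axes are kept can be checked numerically. The sketch below uses synthetic 7-band data (an assumption, not the Morro Bay scene): it rotates the pixels into principal-component space and back, then shows that dropping components is what introduces error.

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated synthetic 7-band data, shape (n_pixels, n_bands)
pixels = rng.normal(size=(1000, 7)) @ rng.normal(size=(7, 7))

mean = pixels.mean(axis=0)
cov = np.cov(pixels - mean, rowvar=False)
_, eigvecs = np.linalg.eigh(cov)            # columns are orthonormal axes

scores = (pixels - mean) @ eigvecs          # rotate into PC space
recovered = scores @ eigvecs.T + mean       # rotate back using all 7 axes

# Keep only the 3 largest-variance components (eigh sorts ascending,
# so these are the last 3 columns) and rotate back.
truncated = scores[:, -3:] @ eigvecs[:, -3:].T + mean

max_error_full = np.abs(recovered - pixels).max()
max_error_truncated = np.abs(truncated - pixels).max()
```

With all seven axes the round trip reproduces the input to floating-point precision, while the truncated reconstruction does not.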

  20. Feature Extraction • It can be viewed as a rotation of the existing axes to new positions in the space defined by the original variables. • In this new rotation, there will be no correlation between the new variables defined by the rotation. • The first new variable contains the maximum amount of variation, the second new variable contains the maximum amount of variation unexplained by the first and orthogonal to the first, etc...
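
A two-band toy case makes the rotation view concrete. In this sketch the two synthetic "bands" are strongly correlated; after rotating onto the eigenvectors of their covariance matrix, the new variables are uncorrelated and ordered by variance. All data and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two strongly correlated synthetic "bands"
band1 = rng.normal(size=2000)
band2 = 0.9 * band1 + 0.2 * rng.normal(size=2000)
data = np.column_stack([band1, band2])

centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # largest variance first

rotated = centered @ eigvecs                          # coordinates on the new axes
corr_before = np.corrcoef(data, rowvar=False)[0, 1]
corr_after = np.corrcoef(rotated, rowvar=False)[0, 1]
variances = rotated.var(axis=0, ddof=1)               # equals the eigenvalues
```

The eigenvector matrix is orthonormal, so the transformation is a rigid change of axes rather than a stretch: distances between pixels in band space are preserved.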

  21. Feature Extraction • It can be viewed as finding a projection of the observations onto orthogonal axes contained in the space defined by the original variables. • The criterion is that the first axis "contains" the maximum amount of variation, or "accounts" for the maximum amount of variation. • The second axis contains the maximum amount of variation orthogonal to the first. • The third axis contains the maximum amount of variation orthogonal to the first and second axes, and so on until the last new axis, which contains whatever variation is left.

  22. How do we get this figure? • The elliptical cloud that lies parallel to the X axis is what we might expect. • But we need to remember that this is a rigid rotation of axes in a 7-dimensional space, one dimension for each band (or variable). • We can see here that the original data were not multivariate normal, an assumption that would need to be met if one wanted to carry out any parametric statistical tests. • This non-normality is indicated by the anomalous cloud of points going diagonally across the graph. If the data were multivariate normal in 7 dimensions, then the plot would only have a cloud like the horizontal one in the above plot.

  23. Let's check TM bivariate scatter plots for the Morro Bay bands. • Various combinations were tried, but an interesting one is Band 2 (abscissa) vs Band 4 (ordinate):

  24. This plot shows a bimodal distribution of data points. • The upper cluster (blue and purple dots) shows strong correlation between the two bands (the spread around a mean line is small). • This cluster is for all the water in the scene: the DNs for water extend over a wide range of values, but value changes in one band are matched by similar changes in the other band. • The second cluster (orange/yellow/green) is for all other classes in the image. • There is strong correlation when DN values are low, but as these increase for both bands the cluster widens. • This means that for much of the DN value range, the two bands are less correlated and should serve increasingly well as discriminators in any classification.

  25. PCA output is a new set of DN pixels for each derived component. • The DN set can be made to appear as an image that resembles to some extent any of the individual TM bands. • We will now look at each of these components as images, keeping in mind that many of the tonal patterns in individual components do not seem to spatially match specific features or classes identified in the TM bands and represent linear combinations of the original values instead. • We make only limited comments on the nature of those patterns that lend themselves to some interpretation.

  26. The first Principal Component contains the maximum amount of variation in the 7-dimensional space defined by the seven Thematic Mapper bands. • The image produced from PC 1 data commonly resembles an actual aerial photograph. • This is normal for the first component, in that it broadly simulates standard black and white photography and it contains most of the pertinent information inherent to a scene. • The hills appear more realistic because the sharp light-dark contrast in most TM bands is subdued. • Note the internal structure of the waves and the absence of any indication of sediment load in the sea.

  27. The histogram of PC 1 shows two peaks • The first, on the left, corresponds to the ocean pixels and the second, to the right, to the land pixels.

  28. For PC 2, the bulk of the pixels fall in such a narrow range that the image does not seem to have many interpretable patterns

  29. PC 2 image after a histogram equalization stretch • Breaking waves are singled out as very bright
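
A histogram equalization stretch spreads a narrow range of values across the full display range by mapping each pixel through the cumulative histogram. The following is a minimal sketch on synthetic data; the function name `equalize`, the 256-level output, and the stand-in "PC 2" array are assumptions, and image-processing packages provide their own implementations.

```python
import numpy as np

def equalize(band, levels=256):
    """Histogram-equalize a 2-D array to integers 0..levels-1 (levels <= 256)."""
    flat = band.ravel()
    hist, bin_edges = np.histogram(flat, bins=levels)
    cdf = hist.cumsum() / flat.size                  # cumulative fraction, 0..1
    # Bin index of each pixel, using the interior bin edges
    bin_index = np.clip(np.digitize(flat, bin_edges[1:-1]), 0, levels - 1)
    stretched = np.round(cdf[bin_index] * (levels - 1)).astype(np.uint8)
    return stretched.reshape(band.shape)

# Synthetic component: most values in a narrow range, a few bright outliers
rng = np.random.default_rng(4)
pc2 = rng.normal(loc=100.0, scale=1.0, size=(64, 64))
pc2[:2, :] = 200.0                                   # stand-in for very bright pixels
stretched = equalize(pc2)
```

Before the stretch, the bulk of the pixels occupy only a small fraction of the value range; afterwards they are spread over most of the 0 to 255 range, which is why patterns become visible.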

  30. Some of the gray patterns in the PC3 image below can be broadly correlated with two combined classes of vegetation: • The brighter tones come from the fairways in the golf course and many of the agricultural fields. • Moderately darker tones coincide with some of the grasslands, forest or tree areas, and coastal marshland. • Note that both the beach and waves almost disappear as patterns.

  31. The breakers completely disappear in the PC4 image, while the rest of the scene is rather flat and mostly dark but with several areas of medium grays.

  32. You may be wondering what the remaining PCs (through PC7) look like, and if they show any useful information. • The response, after examining, for example, PC6, is that the features we are familiar with do appear but probably offer little new in interpretation. • Note that the waves in the image below now are black - interesting but perhaps meaningless; the golf course pattern is also black.

  33. The information in PCA images can be revealed better by combining them visually as registered overlays. • Any three of these PC images can be made into color composites with various assignments of blue, green, and red. • Choosing from four PC images, 24 different combinations are possible (4 x 3 x 2 ordered assignments).
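
The figure of 24 matches the number of ordered ways to assign three out of four component images to the red, green, and blue channels (4 x 3 x 2), assuming four components are under consideration. The sketch below counts those assignments and builds one composite from synthetic stand-in images; the helper names `to_byte` and `composite` are hypothetical.

```python
from itertools import permutations
import numpy as np

components = ["PC1", "PC2", "PC3", "PC4"]          # assumed: four candidate components
assignments = list(permutations(components, 3))    # ordered (red, green, blue) choices
n_assignments = len(assignments)

def to_byte(band):
    """Linearly rescale a band to 0..255 for display."""
    lo, hi = band.min(), band.max()
    return ((band - lo) / (hi - lo) * 255).astype(np.uint8)

def composite(pc_images, red, green, blue):
    """Stack three named component images into a (rows, cols, 3) RGB array."""
    return np.dstack([to_byte(pc_images[name]) for name in (red, green, blue)])

# Synthetic stand-ins for registered component images of equal size
rng = np.random.default_rng(5)
pc_images = {name: rng.normal(size=(32, 32)) for name in components}
rgb = composite(pc_images, red="PC3", green="PC1", blue="PC4")
```

The images must be registered to one another (same grid, same size) for the channels to line up pixel by pixel.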

  34. The next image is composed of PC 4 = blue, PC 1 = green, and PC 3 = red. • In this rendition, the golf course has a singular color signature (orange-red) and a unique internal structure. • Most other vegetation shows as red to purple-red tones, but the grasslands (v) have an unusual color, describable as greenish-orange. • The brighter slopes of the hills and mountains appear as medium green, while some areas in shadow are bluish. • The urban areas also have a deep blue color. • The beach bar now appears as turquoise and the adjacent breakers are olive-green.

  35. Subsets • Does not seem to be very challenging • Often have to “register” by matching to another data set • Convenient to prepare subsets before registration because computation increases with large images • But if the subset is too small, there may not be enough Ground Control Points • Have to use an intermediate size subset

  36. Subsets • Subsets should be large enough to provide context • Enough training sites for accuracy

  37. Radiometric Preprocessing • Many operations are image restoration • Help to remove interferences and noise • Brightness of the surface is what is desired • Brightness from atmosphere can contribute to overall brightness

  38. Radiometric Preprocessing • Atmospheric corrections in 3 categories • Model physical behavior of EM radiation through the atmosphere • Advantages are rigor and wide applicability • Disadvantages – very complex, require detailed meteorological information

  39. Radiometric Preprocessing • Examination of reflectances from objects of known brightness • IR is absorbed by water, so water should appear black in the IR bands • If the observed value is not zero, could subtract that value from all the pixels • Known as the histogram minimum method (HMM) or the dark object subtraction (DOS) technique.
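
Dark object subtraction can be sketched as follows, under the simplifying assumption that the darkest pixel in each band should be zero and that any offset is a uniform additive haze contribution. The synthetic scene, the per-band offsets, and the function name `dark_object_subtraction` are all hypothetical.

```python
import numpy as np

def dark_object_subtraction(image):
    """Subtract each band's minimum value from that band.

    `image` has shape (bands, rows, cols). Assumes the darkest pixel in
    every band would be 0 without the atmospheric contribution.
    """
    dark_values = image.reshape(image.shape[0], -1).min(axis=1)
    corrected = image - dark_values[:, None, None]
    return corrected, dark_values

# Synthetic 2-band scene with a truly dark pixel and an additive offset per band
rng = np.random.default_rng(6)
surface = rng.integers(0, 120, size=(2, 40, 40))
surface[:, 0, 0] = 0                        # e.g. deep clear water
haze = np.array([12, 5])[:, None, None]     # hypothetical per-band offsets
observed = surface + haze

corrected, dark_values = dark_object_subtraction(observed)
```

The method needs no meteorological data, but it only removes a constant per-band offset and relies on the scene actually containing a dark object.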
