GG450 March 6 & 11, 2008 Elementary Digital Analysis
Before getting into seismic methods and processing, we’ll spend some time on the principles of digital data analysis. Since effectively all geophysical analysis is now done on computers, a basic understanding of the power and limitations of digital methods is important.
The output of seismometers and other instruments often requires considerable work before the information it contains can be made useful. The operations necessary may require large numbers of repeated calculations - the type of operations efficiently handled by a computer. Unfortunately we can't simply 'feed' the data into the computer and "tell" the computer what information we want (not yet, anyway; computers are still very stupid). We need to understand both the benefits possible and the limitations imposed by computer, or digital, analysis.
What must be done to the data to make it computer-readable? Analog data: Analog data are the type we normally deal with when using electronic systems such as seismometers. At any time we can find a continuous signal, or voltage, that is related to the motion detected by our seismometer. Example of an analog time series. The signal is continuous in amplitude and time: if the amplitude is V0 at one time and V0+V at some later time, then there is some time in between when the amplitude is V0+dV, an infinitesimally small change from V0. This is NOT true for digital signals.
Amplitude-Modulated Data: In amplitude-modulated signals, AM radio for example, the AMPLITUDE of the carrier at a particular frequency changes as the signal level changes. This is still analog data. Example of an amplitude-modulated signal (similar to the data in the figure above). The amplitude of the carrier frequency is changed according to the signal amplitude.
Frequency-Modulated Data: For FM signals, as signal amplitude changes, the frequency of a "carrier" changes - as in FM radio. Example of frequency-modulated signal (similar to the data in the figures above). The frequency of the carrier is increased for large signals and decreased for small signals. FM data sound like a warbling tone. FM signals have higher FIDELITY than AM signals, and are used for higher quality music. What is fidelity?
FM and AM signals are still analog – computers can’t read them. To transform your signals into a computer-readable format, you need to DIGITIZE them. We need to do two things: 1) Change the continuous signal into discrete SAMPLES in time (or space, or whatever, depending on the type of data), and 2) QUANTIZE the samples by setting the smallest resolvable step in amplitude and the largest value a sample can hold. We tell the computer the SCALE factor of the smallest number it can recognize and limit the size of the sample.
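The two steps, sampling and quantizing, can be sketched in a few lines of Python. This is only an illustration: the function name `digitize`, the 1 Hz test signal, and the choice of a signed integer count centered on zero are assumptions, not part of any particular digitizer.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, n_bits, full_scale):
    """Sample a continuous function of time, then quantize each sample.

    Assumed convention: samples are signed integer counts, and the
    range -full_scale..+full_scale is divided into 2**n_bits steps.
    """
    step = 2 * full_scale / (2 ** n_bits)   # smallest resolvable amplitude step
    max_count = 2 ** (n_bits - 1) - 1       # largest count the sample can hold
    min_count = -2 ** (n_bits - 1)          # most negative count
    counts = []
    for i in range(int(duration_s * sample_rate_hz)):
        t = i / sample_rate_hz              # 1) discrete sample time
        count = round(signal(t) / step)     # 2) quantize to the nearest step
        counts.append(max(min_count, min(max_count, count)))  # limit the size
    return counts, step

def analog(t):
    """Hypothetical analog input: a 1 Hz sine wave with 0.8 V amplitude."""
    return 0.8 * math.sin(2 * math.pi * 1.0 * t)

counts, step = digitize(analog, duration_s=1.0, sample_rate_hz=8,
                        n_bits=4, full_scale=1.0)
print(step)    # scale factor: volts per count
print(counts)  # the digital samples
```

Multiplying each count by `step` gives back an approximation of the voltage; the difference from the true value is the quantization error.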
Computers, being very stupid, can understand only two states, on and off, or whatever you like to call it, true-false, 1-0, yes-no, etc. They can't handle analog data directly because analog signals are defined at all times - no matter how small the change since a previous time - and the analog signal level is defined exactly. Computers can't understand anything continuous either in time or in amplitude.
Number systems. We humans use the base-10 or “decimal” number system, because we have ten fingers. The characters 0-9 represent the ten “states” of a decimal digit. In the BINARY number system, which computers CAN understand, the two characters representing the states are 0 and 1. In decimal numbers, when a number is greater than 9, we add another digit to the left which gets multiplied by 10. In binary, when we have a number greater than 1, we add a digit to the left that is multiplied by 2. OCTAL and HEXADECIMAL numbers are derivatives of the binary system as shown below:
The dark lines mark addition of another digit. Observe that the binary, octal, and hex systems are related, and that they are particularly well suited to a computer's binary logic, while the decimal system is good for counting on human fingers, but cumbersome for computers. Changing a number from binary to octal or hex is easy, compared to changing it to decimal.
We can represent any number as a series of digits with 10 possible values from 0 to 9, called decimal numbers; or with 8 possible values from 0 to 7, called octal numbers; or with 16 possible values, 0 to 9 and A, B, C, D, E, and F, called hexadecimal (or just hex) numbers.
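As a rough sketch of why binary converts so easily to octal and hex, the Python below converts a decimal number by repeated division, then converts a binary string by simply translating groups of 3 bits (octal) or 4 bits (hex). The function names and the example value 77 are illustrative assumptions.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    """Convert a non-negative integer to a digit string by repeated division."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))

def binary_by_groups(bits, group):
    """Convert a binary string to base 2**group, one group of bits per digit.

    group=3 gives octal, group=4 gives hex. No arithmetic on the whole
    number is needed, only a lookup for each small group.
    """
    width = (len(bits) + group - 1) // group * group
    padded = bits.zfill(width)              # pad on the left with zeros
    return "".join(DIGITS[int(padded[i:i + group], 2)]
                   for i in range(0, width, group))

n = 77
binary = to_base(n, 2)
print(binary, to_base(n, 8), to_base(n, 16))
print(binary_by_groups(binary, 3), binary_by_groups(binary, 4))
```

Both lines print the same octal and hex digits, but the second route never had to divide the full number.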
1) Change the numbers below to the other number systems: 2) When we change a number in the decimal system from 9 to 10 or 99 to 100 we need to add another digit. This happens every order of magnitude. In the binary system, how often do we need to add a digit? What happens in the octal and hex systems?
DIGITAL DEFINITIONS: • BIT: One binary digit, with only two possible values, 0 or 1 • BYTE: 8 bits • How many values can a byte have? 2^8 = 256, from 0 up to 2^8-1 = 255. This is a handy number - more than the total number of symbols we often use - numerals, punctuation, letters (small and capitals), etc. So text characters are usually coded in a byte. • WORD: A group of bits that constitute a number. Words have different numbers of bits depending on how the word is to be used and the type of computer. An INTEGER word has a whole number value; an ASCII CHARACTER is a number associated with a letter or alphanumeric character. • For example, the letter “A” in ASCII is “01000001” in binary, “101” in octal, or “41” in hex. The letter “a” in ASCII is “01100001”, “141”, or “61” respectively.
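The ASCII example can be checked with Python's built-in `ord` and `format`. This is just a quick check; the helper name `describe` is made up for this illustration.

```python
def describe(ch):
    """Return the ASCII code of one character as (binary, octal, hex) strings."""
    code = ord(ch)                      # integer code of the character
    return (format(code, "08b"),        # 8 binary digits, i.e. one byte
            format(code, "o"),          # octal
            format(code, "X"))          # hexadecimal, upper case

for ch in "Aa":
    print(ch, *describe(ch))

print(2 ** 8)   # number of distinct values one byte can hold
```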
SAMPLE: A group of bits that represent a data value at a particular time (or place). How do we change a voltage into a sample? We use an Analog-to-Digital Converter, A/D, or ADC. Most A/D's work by successive approximation. A sample may not need to contain lots of bits to have the required resolution. If data consist of “yes” or “no” values, then 1 bit is enough. DYNAMIC RANGE: The range from the smallest number that can effectively be placed in a sample to the largest number before the word "overflows" is called the dynamic range, usually measured in dB. Since increasing the number of bits in a sample by one increases the largest number by a factor of two, dynamic range increases by about 6 dB/bit. For example, a 10 bit word has 60 dB of potential dynamic range (60 dB = 6 dB/bit * 10 bits). The effective low-end numbers are often limited by system noise, rather than the number of bits.
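The 6 dB/bit rule of thumb can be checked numerically. The sketch below assumes the amplitude form of the decibel, 20 log10(largest/smallest), with the smallest number taken as one count; the function name is an illustrative choice.

```python
import math

def dynamic_range_db(n_bits):
    """Potential dynamic range of an n-bit sample, in dB.

    Assumes the smallest usable number is 1 count and the largest is
    2**n_bits counts, so each added bit doubles the amplitude ratio.
    """
    return 20 * math.log10(2 ** n_bits)

for bits in (1, 10, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
```

One bit is worth 20 log10(2), about 6.02 dB, so the round figure of 6 dB/bit slightly underestimates the range for long words.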
What’s a dB? A salesman wants to sell you an amplifier with a dynamic range of 40 dB ! Wow! Is that good? What does it mean? dB stands for decibel, one tenth of a ‘bel’, a unit named after Alexander Graham Bell. dB = 10 log10(E/E0), where E is the energy or power of a signal and E0 is a reference energy. Alternatively: dB = 20 log10(A/A0), where A is the amplitude of a signal and A0 is a reference amplitude.
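The two forms of the definition agree if we assume, as is usual, that the energy of a signal is proportional to the square of its amplitude. A short worked version, applied to the salesman's 40 dB amplifier:

```latex
\begin{align*}
\mathrm{dB} &= 10\log_{10}\!\left(\frac{E}{E_0}\right)
             = 10\log_{10}\!\left(\frac{A^2}{A_0^2}\right)
             = 20\log_{10}\!\left(\frac{A}{A_0}\right), \\
40\ \mathrm{dB} &\;\Rightarrow\; \frac{A}{A_0} = 10^{40/20} = 100,
\qquad \frac{E}{E_0} = 10^{40/10} = 10^{4}.
\end{align*}
```

So 40 dB means the largest signal is only 100 times the amplitude of the smallest, which corresponds to fewer than 7 bits at 6 dB/bit.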
A seismometer has a noise level of 3 µV, and the largest expected signal will be ±10 volts. How many bits are necessary per sample to resolve this signal? • [Smallest # = 3x10^-6, largest # = 10. Largest/smallest = 3.3x10^6.]
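One way to finish this calculation is sketched below, taking the smallest number as 3x10^-6 V and the largest as 10 V as in the bracketed hint. The function name is hypothetical, and the extra bit for the sign of a ± signal is an assumption about how the samples are stored.

```python
import math

def bits_needed(smallest, largest):
    """Smallest whole number of bits whose range covers largest/smallest."""
    ratio = largest / smallest
    return math.ceil(math.log2(ratio))

smallest_v = 3e-6   # noise level in volts, from the hint
largest_v = 10.0    # largest expected amplitude in volts

ratio = largest_v / smallest_v
range_db = 20 * math.log10(ratio)            # required dynamic range in dB
magnitude_bits = bits_needed(smallest_v, largest_v)
total_bits = magnitude_bits + 1              # assumed: one more bit for the sign

print(ratio, range_db, magnitude_bits, total_bits)
```

Dividing the dB figure by 6 dB/bit gives nearly the same bit count, which is a quick way to do the estimate by hand.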
DATA RATE: The number of samples of an analog signal taken per time unit. The data rate is particularly critical in digital analysis. The data are only defined at the times when samples are taken. How can we do this and not lose any information? Nyquist Theorem: An analog signal sampled at a rate that is at least twice the frequency of the highest frequency present in the signal will contain all the information that was in the original signal. YOU MUST HAVE AT LEAST TWO SAMPLES FOR EVERY CYCLE TO RETAIN ALL THE INFORMATION IN A SIGNAL.
We won't prove this, but you should know what it means: If you have a signal without appreciable energy at periods shorter than 10 seconds, and that signal is sampled at a rate of at least one sample every 5 seconds, then all the information in the signal will be retained. The highest frequency where information is available (1/2 the sampling frequency) is called the NYQUIST frequency. This is also true for spatially sampled data – like data used to make a map. If your sample points are not closer together in distance than 1/2 the wavelength of the shortest “signal”, then your resulting map will be distorted.
You obtain a gravity profile like the one below. You want to digitize it for input into a computer. How closely do you need to sample the data?
Let's see what happens if we try to cheat on Dr. Nyquist: • In the figure above, a wheel is rolling from left to right in each row. We take snapshots of it that are shown as colored wheels. A mark on the wheel shows how far it has rotated. • In the top row, the snapshots are close together, and we see that the wheel rotates 90° between snapshots. • In the second row, the darkened circles show what we would see if we took snapshots 1/8th as fast. The wheel would appear to not be rotating, since the black line is always at the top. • In the third row, the snapshots are taken 1/9th as fast as the top row, and the wheel appears to be rotating much slower than expected. • In the fourth row, the snapshots are taken 1/7th as fast as in the top row, and the wheel appears to be rotating BACKWARDS.
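The wheel rows can be reproduced with a little modular arithmetic. In this sketch (function names are illustrative) the wheel truly turns 90° per snapshot in the top row, and we keep only every 8th, 9th, or 7th snapshot. An observer is assumed to pick the smallest rotation consistent with what is seen, so angles are wrapped into the range -180° to +180°.

```python
def apparent_step_deg(true_step_deg, keep_every):
    """Apparent rotation between kept snapshots, wrapped to (-180, 180]."""
    total = (true_step_deg * keep_every) % 360   # whole turns are invisible
    return total - 360 if total > 180 else total

def apparent_rate(true_step_deg, keep_every):
    """Apparent rotation per ORIGINAL snapshot interval, in degrees."""
    return apparent_step_deg(true_step_deg, keep_every) / keep_every

for keep_every in (1, 8, 9, 7):
    print(keep_every,
          apparent_step_deg(90, keep_every),
          round(apparent_rate(90, keep_every), 2))
```

Keeping every 8th snapshot gives no apparent rotation, every 9th gives a slow forward rotation, and every 7th gives a negative (backwards) rotation, matching the rows described above.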
ALIASING: When a signal is sampled such that the Nyquist rule is not followed, the information at frequencies above the Nyquist is not LOST; it is moved to a lower frequency. If we fold a plot of amplitude vs. frequency at the Nyquist frequency, the energy in frequencies above the Nyquist will show up in our digital data "folded back" around the Nyquist frequency. If the analog frequency of a signal is 45 Hz, and the sampling frequency is 50 Hz, then the Nyquist frequency is 25 Hz. The signal in question is 20 Hz above the Nyquist, and after digitizing, it will appear at 20 Hz below the Nyquist, or at 5 Hz.
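The folding rule can be written as a small function and checked against actual samples. This is a sketch with made-up names; it uses cosines so that the 45 Hz signal and its 5 Hz alias give identical sample values.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Frequency at which f_signal appears after sampling at f_sample."""
    f_nyquist = f_sample / 2
    f = f_signal % f_sample          # the spectrum repeats every f_sample
    return f_sample - f if f > f_nyquist else f   # fold about the Nyquist

def sample_cosine(freq_hz, f_sample, n_samples):
    """Sample cos(2*pi*freq*t) at times t = k / f_sample."""
    return [math.cos(2 * math.pi * freq_hz * k / f_sample)
            for k in range(n_samples)]

f_alias = alias_frequency(45, 50)
print(f_alias)

# The sampled 45 Hz cosine is indistinguishable from a sampled 5 Hz cosine.
high = sample_cosine(45, 50, 50)
low = sample_cosine(f_alias, 50, 50)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))
```

Once the samples are taken, nothing in the digital data can tell the two frequencies apart, which is why the high frequencies have to be removed from the analog signal before digitizing.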
WHY DIGITAL ANALYSIS? Seismic data are often contaminated by noise, and recorded in weird formats. With careful processing, we can increase the signal-to-noise ratio, and remove unwanted effects. We may want to combine seismograms from different times and different geophones, or "stretch" seismograms depending on the velocity of the material the arrivals traveled through, so that they have a "depth" axis, rather than a time axis. If the seismic source for our data is complex, we may want to remove its effects so that we can see the effects of the earth instead. These operations are FAR easier and more accurate to do digitally than with analog operations.
Also, digital data can be copied nearly forever without loss of information or introduction of noise. This is NOT true of analog data - as you may know from trying to copy music or video from tape-to-tape. In fact, the GENETIC CODE is DIGITAL. If it weren't, mutations would be much more common. However, it is possible for digital signals to lose or distort information that was in the analog signal if digitization is not performed well. If the signal amplitude is too large for the dynamic range of the system, the signal will be clipped. Clipping adds noise to a signal. If the RESOLUTION of the digitizing system is too low, the uncertainty in the signal level will add noise to the data. If the TIME at which a particular sample was taken is uncertain, then noise is added to the data.
SIGNAL COMPRESSION is often applied to digital signals to make them fit into a smaller space in memory. Such compression is LOSSLESS if NO information is lost from the original digital signal, or LOSSY if information is lost. Lossy compression is NOT a good idea if the data are to be further processed, but it can be a great space saver for many applications. JPEG and MPEG are lossy compression schemes.
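The difference between lossless and lossy compression can be shown with Python's standard `zlib` module. The sample values are invented, and the "lossy" step here is only a toy stand-in (throwing away the low 4 bits of each byte), not how JPEG or MPEG actually work.

```python
import zlib

# Made-up digital samples, one byte each, with a lot of repetition.
samples = bytes([10, 10, 10, 11, 11, 12] * 100)

# LOSSLESS: the original samples come back exactly.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)
print(len(samples), len(packed), restored == samples)

# LOSSY (toy example): discard the low 4 bits of every sample.
# The discarded detail cannot be recovered afterwards.
lossy = bytes(b & 0xF0 for b in samples)
print(lossy == samples)
```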
What types of operations can be done efficiently on a computer? Several operations are important, and they are based on the operations you already know - like addition, subtraction, and multiplication, plus two new operations that we will study - convolution and Fourier Transform. These two processes form the backbone of nearly all digital analysis.