Data Mining Using Eigenpattern Analysis in Simulations and Observed Data. Woodblock Print, from “Thirty-Six Views of Mt. Fuji”, by K. Hokusai, ca. 1830. John B. Rundle Department of Physics and Colorado Center for Chaos & Complexity University of Colorado, Boulder, CO.
Data Mining Using Eigenpattern Analysis in Simulations and Observed Data
John B. Rundle, Department of Physics and Colorado Center for Chaos & Complexity, University of Colorado, Boulder, CO
Presented at the GEM/ACES Workshop, Maui, HI, July 30, 2001
Title image: Woodblock Print, from “Thirty-Six Views of Mt. Fuji”, by K. Hokusai, ca. 1830
Activity Correlation Operators

Let $y(x_i,t)$ be the number of earthquakes per unit time at location $x_i$ and time $t$. Now center each time series (subtract its mean and divide by its standard deviation):

$$y(x_i,t) \rightarrow z(x_i,t)$$

where $z(x_i,t)$ is the centered time series. Define two correlation operators, a static correlation operator $C(x_i,x_j)$ and a rate correlation operator $K(x_i,x_j)$:

$$C(x_i,x_j) = \int z(x_i,t)\, z(x_j,t)\, dt \qquad \text{(Static)}$$

$$K(x_i,x_j) = (2\pi)^{2} \int \frac{\partial z(x_i,t)}{\partial t}\, \frac{\partial z(x_j,t)}{\partial t}\, dt \qquad \text{(Rate)}$$
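The two operators can be sketched numerically by replacing the time integrals with sums over sampled time steps. This is a minimal illustration under stated assumptions, not the author's code: the function names, the finite-difference time derivative, the `dt` sampling interval, and the toy Poisson data are all hypothetical, and the constant in front of the rate operator is left as a `prefactor` argument rather than fixed.

```python
import numpy as np

def center(y):
    """Standardize each site's series: subtract its mean, divide by its std."""
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=1, keepdims=True)
    std = y.std(axis=1, keepdims=True)
    std[std == 0.0] = 1.0  # a site with no variation simply stays at zero
    return (y - mean) / std

def static_operator(z, dt=1.0):
    """Discrete form of the static operator: sum over time of z_i(t) * z_j(t) * dt."""
    return (z @ z.T) * dt

def rate_operator(z, dt=1.0, prefactor=1.0):
    """Discrete form of the rate operator, using finite differences for dz/dt.

    `prefactor` stands in for the constant in front of the integral (assumption:
    left to the caller rather than hard-coded).
    """
    dzdt = np.diff(z, axis=1) / dt
    return prefactor * (dzdt @ dzdt.T) * dt

# Toy data (hypothetical): 4 sites, 200 time steps of Poisson event counts.
rng = np.random.default_rng(0)
y = rng.poisson(2.0, size=(4, 200))
z = center(y)
C = static_operator(z)
K = rate_operator(z)
```

Rows index sites and columns index time steps, so both `C` and `K` come out as square site-by-site matrices.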
Diagonalize the Correlation Operators

$C(x_i,x_j)$ and $K(x_i,x_j)$ are both symmetric, square, and positive definite matrix operators. We can therefore apply singular value decomposition to find the eigenvectors and eigenvalues:

$$C(x_i,x_j) = \Phi\, \Lambda^{2}\, \Phi^{T}$$

$$K(x_i,x_j) = \Psi\, \Omega^{2}\, \Psi^{T}$$

where $T$ denotes the transpose.
$\Phi$ is a matrix of static eigenpatterns $\phi_n(x_i)$
$\Lambda^{2}$ is a diagonal matrix of eigenprobabilities $\lambda_i^{2}$
$\Psi$ is a matrix of rate eigenpatterns $\psi_n(x_i)$
$\Omega^{2}$ is a diagonal matrix of eigenfrequencies $\omega_i^{2}$
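A minimal sketch of the diagonalization step, assuming NumPy. It uses `numpy.linalg.eigh` (a symmetric eigensolver) in place of a general SVD; for a symmetric matrix with non-negative eigenvalues the two give the same decomposition up to the signs of the eigenvectors. The names `eigenpatterns`, `Phi`, and `lam2`, and the random toy matrix, are assumptions made for illustration.

```python
import numpy as np

def eigenpatterns(op):
    """Eigenvalues (largest first) and eigenvectors (as columns) of a symmetric operator."""
    vals, vecs = np.linalg.eigh(op)   # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]    # reorder so the dominant pattern comes first
    return vals[order], vecs[:, order]

# Toy symmetric operator (hypothetical): built from 5 random standardized-like series.
rng = np.random.default_rng(1)
z = rng.standard_normal((5, 300))
C = z @ z.T

lam2, Phi = eigenpatterns(C)

# Check the factorization: eigenvectors * diagonal eigenvalues * eigenvectors^T
reconstructed = Phi @ np.diag(lam2) @ Phi.T
```

Each column of `Phi` is one eigenpattern over the sites, and `lam2` holds the matching eigenvalues; the same function applies unchanged to the rate operator.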
Comparison of Eigenpatterns 1,2 for 0 (Top) with Eigenpatterns 1,2 for = 0 (Bottom)

Positively correlated: (red - red) & (blue - blue).
Negatively correlated: (red - blue).
Uncorrelated: (red - green) & (blue - green).
JBR et al., Phys. Rev. E, v. 61, 2000, & AGU Monograph “GeoComplexity & the Physics of Earthquakes”
Patterns of Earthquakes in Southern California

Earthquakes in southern California have been systematically recorded since 1932. The rate at which these events occur can be used to define activity time series in 10 km x 10 km spatial boxes, and these time series can be used to find the spatial patterns.

Figures courtesy KF Tiampo:
Relative intensity map: the relative intensity of seismic activity in southern California, 1932-1999. This can be considered to be a seismic “hazard map”.
First PCA mode, which we call the “Hazard Mode”: red areas tend to be active or inactive at the same time.
Second PCA mode, which we call the “Landers Mode”: red areas tend to be inactive when blue areas are active, and vice versa. All sites in a blue or red area tend to be active (or inactive) at the same time.
Comparison of Log Likelihoods

Log likelihoods for the Phase Dynamical Probability Change (PDPC) from 500 random catalogs of seismic activity in Southern California, scored against the occurrence of future events (M > 5), compared with the log likelihoods of the hazard map and of the actual catalog via PDPC.

Example of a PDPC arising from a catalog that has been randomized in space and time.
Histogram: log likelihoods for 500 random catalogs.
RSV: use hazard map as predictor.
Actual PDPC: plot at left.
Actual catalog: PDPC for 1978-Dec 31, 1991.
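The logic of this comparison can be sketched as a randomization test: score many randomized catalogs, then see where the actual catalog's score falls in that histogram. The slide does not state how the catalogs were randomized or how the log likelihood is computed, so everything below is an assumption: events are redrawn uniformly in space and time, and `toy_score` is a hypothetical placeholder for the real log-likelihood calculation.

```python
import random

def randomize_catalog(events, n_boxes, t_max, rng):
    """Redraw a box and a time for every event, keeping the event count (assumed scheme)."""
    return [(rng.randrange(n_boxes), rng.uniform(0.0, t_max)) for _ in events]

def null_scores(events, score, n_boxes, t_max, n_catalogs=500, seed=0):
    """Score `n_catalogs` randomized copies of the catalog."""
    rng = random.Random(seed)
    return [score(randomize_catalog(events, n_boxes, t_max, rng))
            for _ in range(n_catalogs)]

def fraction_at_least(null, actual):
    """Fraction of randomized catalogs scoring at least as well as the actual one."""
    return sum(1 for s in null if s >= actual) / len(null)

def toy_score(catalog):
    """Hypothetical stand-in for a log likelihood: number of events in box 0."""
    return sum(1 for box, _ in catalog if box == 0)

# Toy catalog (hypothetical): 20 events, all clustered in box 0.
actual_catalog = [(0, float(k)) for k in range(20)]
null = null_scores(actual_catalog, toy_score, n_boxes=10, t_max=20.0)
frac = fraction_at_least(null, toy_score(actual_catalog))
```

A small `frac` means few randomized catalogs score as well as the actual one, which is the sense in which the actual result stands apart from the histogram.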
Earthquake Forecasting via the Mathematics of Quantum Mechanics

Pattern techniques suggest a new approach to forecasting earthquakes. The idea is to view the patterns in the context of PHASE DYNAMICAL SYSTEMS, whose mathematics can be mapped into the mathematics of QUANTUM MECHANICS. See JB Rundle et al. (2000); KF Tiampo et al. (2000).

Using this new technique, one can compute the Phase Dynamical Probability Change (PDPC) anomalies that develop during the years 1988-1999. Our retrospective studies indicate that colored anomalies can be regarded as indicating high probability for current and future major earthquakes (M > 6) over the period ~ 1999-2009, and have considerable forecast skill.

In the PDPC method, intensity of seismic activity is mapped to a “wave function” $\psi(x,t)$.

Figure: Intensity of seismic activity, 1932-1999
Testing the Forecast

One way to test the forecast for events from 2000-2010 is to plot all events with M > 4.0 that have occurred since Jan 1, 2000, superposed upon the colored forecast anomalies. These events are the small circles at right. Note that our method should really only forecast events with M > 6.0.
Space-Time Patterns in Complex Multi-Scale Earthquake Fault Systems

Since much of the dynamics is not accessible to direct observation, we must focus on learning about the system through analysis of the observable patterns.
Space-time patterns in the system are mathematical expressions of the strong statistical correlations between various parts of the system.
The system state vector characterizes the current state of the system: it has an amplitude and a phase angle.
Mapping Earthquake Dynamics into the Mathematics of Quantum Mechanics (or “Phase Dynamics”)
(JB Rundle et al., Phys. Rev. E, v61, 2416, 2000)

This new technique can be regarded as a novel data-mining method.
Quantum mechanical systems are strongly correlated systems (QM is a nonlocal theory).
The mathematics of QM describe systems with periodic and quasiperiodic observables, as well as hidden variables.
Relative probabilities are well-defined quantities in QM.
Normalized system state vectors are actually “WAVE FUNCTIONS” that describe earthquake probability amplitudes.
An Earthquake Forecast?

Using our technique, we can compute the PDPC anomalies that develop during the years 1988-1999. Our retrospective studies indicate that these anomalies can be regarded as forecasts for major earthquakes (M > 6) over the period ~ 1999-2009.
Data from Last Tuesday
PDPC Forecast for ~ 1999-2010
Earthquake Fault System Dynamics are Strongly Correlated in Space and Time and Lead to Patterns
Summary & Future Directions

The methods described here can be used to understand many classes of driven threshold systems.
Network dynamics are determined in important ways by the network connectivity as well as the details of the nonlinear threshold process.
Meanfield threshold systems appear to have locally ergodic behavior.
Space-time patterns of observable failures (earthquakes) can be used to understand many facets of the underlying, unobservable dynamics (physical state variables).
Boolean Correlation Operators and Space-Time Patterns

We can define a set of basis patterns of earthquake activity using Boolean correlation operators. To do so, we need to define a Boolean activity time series y(xi,t).

As a first step, we coarse grain the domain in space and time, i.e., we divide the region of interest into N boxes (say, ~10 km on a side) and time into a series of Q short intervals (say, 8 hours).

If an earthquake occurs in the spatial box centered at xi during the time interval at t, we set y(xi,t) = 1; otherwise y(xi,t) = 0.

We therefore have a set of N time series, each Q elements long:

y(xi,t) = 0,0,0,1,0,0,0,0,0,0,1,0,0,0,0… etc.
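The coarse-graining step can be sketched in a few lines of plain Python. This is an illustration under assumptions: events are taken to be already assigned to a box index, times are measured in hours from the start of the catalog, and the function name, argument names, and toy events are all hypothetical.

```python
def boolean_activity(events, n_boxes, n_intervals, interval_length):
    """Build N Boolean time series of length Q from (box_index, time) event pairs.

    An entry is 1 if at least one event fell in that box during that interval,
    and 0 otherwise. Events outside the grid or the time window are ignored.
    """
    y = [[0] * n_intervals for _ in range(n_boxes)]
    for box, t in events:
        q = int(t // interval_length)  # index of the time interval containing t
        if 0 <= box < n_boxes and 0 <= q < n_intervals:
            y[box][q] = 1
    return y

# Toy catalog (hypothetical): N = 3 boxes, Q = 5 intervals of 8 hours each.
events = [(0, 1.0), (0, 9.5), (2, 30.0), (2, 31.0), (1, 100.0)]
y = boolean_activity(events, n_boxes=3, n_intervals=5, interval_length=8.0)
# y == [[1, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 1, 0]]
```

Two events in the same box and interval still give a single 1, which is what makes the series Boolean rather than a count; the event at 100 hours lies outside the 40-hour window and is dropped.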
Boolean Activity Eigenpatterns from Simulations

Here we show Static or Activity Eigenpatterns from the simulation. These constitute one possible basis set for all possible space-time patterns displayed by the system.

Key to Correlation Patterns:
Red sites are positively correlated with red (and blue with blue)
Red sites are negatively correlated with blue
Red sites & Blue sites are uncorrelated with green

The Activity Eigenpatterns are RELATIVE PROBABILITY AMPLITUDES.
(JBR et al., Phys. Rev. E, v. 61, 2000, & AGU Monograph “GeoComplexity & the Physics of Earthquakes”)
Summary & Future Directions

Numerical simulations (“Third Leg of Science”) are now being used to understand many classes of driven threshold systems (systems with many scales of length and time).
Network dynamics of these complex systems are determined in important ways by the network connectivity as well as the details of the nonlinear threshold process.
Meanfield threshold systems have dynamics that demonstrate first and second order (phase) transitions.
Threshold systems are capable of universal computation such as that which occurs in the human brain.
Space-time patterns of observable failures (earthquakes) can be used to understand many facets of the underlying, unobservable dynamics (physical state variables).