Attributes for Resource Plays Kurt J. Marfurt (The University of Oklahoma) Satinder Chopra (Arcis) Supervised Multiattribute Classification
Course Outline
• A Short Overview of Spectral Decomposition
• Geometric Attributes
• Attribute Prediction of Fractures and Stress
• Interactive Multiattribute Analysis
• Statistical Multiattribute Analysis
• Unsupervised Multiattribute Classification
• Supervised Multiattribute Classification
• Impact of Acquisition and Processing on Seismic Attributes
• Post-stack Data Conditioning
• Prestack Data Conditioning
• Inversion for Acoustic and Elastic Impedance
• Attributes and Hydraulic Fracturing of Shale Reservoirs
• Attributes Applied to the Mississippi Lime
Multiattribute Analysis Tools
Interpreter-Driven Attribute Analysis:
• Interactive Analysis: cross-correlation on maps; cross-plotting and geobodies; connected component labeling; component analysis; image grand tour
• Statistical Analysis: analysis of variance (ANOVA, MANOVA); multilinear regression; kriging with external drift; collocated co-kriging
Machine Learning Attribute Analysis:
• Unsupervised Learning: k-means; mixture models; Kohonen self-organizing maps; generative topographic maps
• Supervised Learning: statistical pattern recognition; support vector machines; projection pursuit; artificial neural networks
Artificial Neural Nets (ANN) Neurons
Artificial Neural Nets (ANN)
Objective: from continuous input measurements (e.g., seismic attributes):
• Predict a continuous output (e.g., porosity)
• Predict discrete lithologies (e.g., wet sand, gas sand, limestone, shale, ...)
Artificial Neural Nets (ANN)
(figure: duck-test cartoon. The attributes are observations: looks like a duck? quacks like a duck? walks like a duck? Each answer is yes (+1) or no (0).)
Linear Neurons used in Predictive Deconvolution
(figure: a bias input a0 = 1 with weight w0 and inputs a1 ... aN with weights w1 ... wN feed a single linear neuron; the N-long operator w predicts the trace value one prediction distance ahead along a 0-3 s seismic trace.) (Courtesy Rock Solid Images)
The Perceptron
Input attributes ai (with bias a0 = 1) and unknown weights wi give y = w0*a0 + w1*a1 + ... + wn*an.
Output r = 1 ("yes") if y >= +0.5 and r = 0 ("no") if y <= -0.5.
(figure: hard-threshold activation, r plotted against y from -1.5 to +1.5)
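As a minimal sketch (the function name and threshold handling are illustrative, not from the slides), the perceptron above can be written as:

```python
import numpy as np

def perceptron(attributes, weights):
    """Hard-threshold perceptron.

    attributes: input attribute values a1..aN (the bias a0 = 1 is prepended here).
    weights:    w0 (bias weight) followed by w1..wN.
    Returns 1 ("yes") if y >= +0.5, else 0 ("no"), matching the slides'
    worked examples, where y always lands on +/-0.5.
    """
    a = np.concatenate(([1.0], attributes))  # prepend bias a0 = 1
    y = float(np.dot(weights, a))
    return 1 if y >= 0.5 else 0

# Inverter from the next slide: w0 = 0.5, w1 = -1
print(perceptron([0], [0.5, -1]))  # input 0 -> output 1
print(perceptron([1], [0.5, -1]))  # input 1 -> output 0
```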
The Inverter
Weights w0 = 0.5 (bias) and w1 = -1:
a1 = 0: y = -1*0 + 0.5*1 = +0.5 (output 1, "yes")
a1 = 1: y = -1*1 + 0.5*1 = -0.5 (output 0, "no")
Boolean OR
Weights w0 = -0.5 (bias), w1 = 1, w2 = 1:
(a1, a2) = (0, 0): y = 1*0 + 1*0 - 0.5*1 = -0.5 (output 0)
(a1, a2) = (0, 1): y = 1*0 + 1*1 - 0.5*1 = +0.5 (output 1)
(a1, a2) = (1, 0): y = 1*1 + 1*0 - 0.5*1 = +0.5 (output 1)
(a1, a2) = (1, 1): y = 1*1 + 1*1 - 0.5*1 = +1.5 (output 1)
Boolean AND
Weights w0 = -1.5 (bias), w1 = 1, w2 = 1:
(a1, a2) = (0, 0): y = 1*0 + 1*0 - 1.5*1 = -1.5 (output 0)
(a1, a2) = (0, 1): y = 1*0 + 1*1 - 1.5*1 = -0.5 (output 0)
(a1, a2) = (1, 0): y = 1*1 + 1*0 - 1.5*1 = -0.5 (output 0)
(a1, a2) = (1, 1): y = 1*1 + 1*1 - 1.5*1 = +0.5 (output 1)
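The OR and AND truth tables above can be verified with a short script (the `fires` helper is an illustrative stand-in for the perceptron, not from the slides):

```python
import itertools

def fires(a, w, bias_w):
    """Hard-threshold perceptron: y = bias_w + sum(w_i * a_i); fires if y >= 0.5."""
    y = bias_w + sum(wi * ai for wi, ai in zip(w, a))
    return 1 if y >= 0.5 else 0

# Boolean OR:  bias weight w0 = -0.5, w1 = w2 = 1
# Boolean AND: bias weight w0 = -1.5, w1 = w2 = 1
for a1, a2 in itertools.product([0, 1], repeat=2):
    or_out = fires((a1, a2), (1, 1), -0.5)
    and_out = fires((a1, a2), (1, 1), -1.5)
    print(f"a1={a1} a2={a2}  OR={or_out}  AND={and_out}")
```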
Boolean XOR
Doesn't work! No single set of weights (w0, w1, w2) reproduces XOR.
Linear Separability
(figure: in the (a1, a2) plane, the 0s and 1s of Boolean OR and Boolean AND can each be separated by a single straight line: OK! For Boolean XOR, no single line can separate them: can't separate!)
Boolean XOR: add a hidden layer!
Hidden unit h1 computes Boolean OR (w0 = -0.5, w1 = 1, w2 = 1); hidden unit h2 computes Boolean AND (w0 = -1.5, w1 = 1, w2 = 1). The output perceptron uses bias weight w0 = -0.5, with weights +1 on h1 and -1 on h2:
(a1, a2) = (0, 0): y = 1*0 - 1*0 - 0.5*1 = -0.5 (output 0)
(a1, a2) = (0, 1): y = 1*1 - 1*0 - 0.5*1 = +0.5 (output 1)
(a1, a2) = (1, 0): y = 1*1 - 1*0 - 0.5*1 = +0.5 (output 1)
(a1, a2) = (1, 1): y = 1*1 - 1*1 - 0.5*1 = -0.5 (output 0)
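A sketch of this two-layer network, with the hidden OR and AND units wired as on the slide (function names are illustrative):

```python
def step(y):
    """Hard threshold used throughout these slides: fire when y >= 0.5."""
    return 1 if y >= 0.5 else 0

def xor_net(a1, a2):
    """Two-layer perceptron network: XOR = OR and not AND."""
    h1 = step(-0.5 + 1 * a1 + 1 * a2)    # hidden unit 1: Boolean OR
    h2 = step(-1.5 + 1 * a1 + 1 * a2)    # hidden unit 2: Boolean AND
    return step(-0.5 + 1 * h1 - 1 * h2)  # output: fires for OR but not AND

for a1 in (0, 1):
    for a2 in (0, 1):
        print(f"XOR({a1}, {a2}) = {xor_net(a1, a2)}")
```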
A typical neural network
(figure: input layer, hidden layer, output layer) (Ross, 2002)
Decision workflow
1. Choose the classes you wish to discriminate
2. Choose attributes that differentiate these classes
3. Train using calibrated or "truth" data
4. Validate with "truth" data not used in the training step
5. Apply to the target data
6. Interpret the results
(van der Baan and Jutten, 2000)
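A toy end-to-end sketch of this workflow on synthetic data (all names, class definitions, and numbers are illustrative assumptions; a single logistic unit stands in for a full neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: two classes and two attributes that partly separate them (synthetic).
n = 200
labels = rng.integers(0, 2, n)                     # class 0 / class 1 ("truth" data)
attrs = rng.normal(size=(n, 2)) + labels[:, None]  # class 1 shifted up and right

# Step 3: train on calibrated data; a logistic unit fit by gradient ascent.
train, val = slice(0, 150), slice(150, n)
A = np.hstack([np.ones((n, 1)), attrs])            # prepend bias column a0 = 1
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-A[train] @ w))        # sigmoid perceptron output
    w += 0.1 * A[train].T @ (labels[train] - p) / 150

# Step 4: validate with truth data not used in training.
val_pred = (1.0 / (1.0 + np.exp(-A[val] @ w))) > 0.5
accuracy = np.mean(val_pred == labels[val])
print(f"validation accuracy: {accuracy:.2f}")

# Steps 5-6: apply the trained weights to the target data and interpret the map.
```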
Alternative perceptrons
(figure: three activation functions applied to the perceptron output r(w): a hard threshold fh[r(w)] for discrete output classes (e.g., lithology), and a sigmoid fs[r(w)] and a Gaussian fG[r(w)], both differentiable, for continuous output classes (e.g., porosity) and for intermediate results in the hidden layer.) (van der Baan and Jutten, 2000)
2-attribute example with a single decision boundary
(figure: attributes a1 and a2, weighted by w1 and w2 with bias weight w0, feed a perceptron y = r(w) whose output is 0 or 1; a single decision boundary splits the attribute plane.) (van der Baan and Jutten, 2000)
Example of two attributes with a single decision boundary
The decision boundary y = w0 + w1*a1 + w2*a2 = 0 is the line a2 = -(w1/w2)*a1 - w0/w2, separating Class 1 from Class 2 in the (a1, a2) plane.
Brad says: "We could have more than one decision boundary!"
(van der Baan and Jutten, 2000)
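A quick numeric check of the boundary line (the weight values are hypothetical, chosen only for illustration): every point on the line should give y = 0, i.e. the perceptron is undecided there.

```python
import numpy as np

# Hypothetical weights, not from the slides.
w0, w1, w2 = -0.5, 1.0, 2.0

def boundary_a2(a1):
    """Decision boundary y = w0 + w1*a1 + w2*a2 = 0, solved for a2."""
    return -(w1 / w2) * a1 - w0 / w2

for a1 in np.linspace(-2.0, 2.0, 5):
    a2 = boundary_a2(a1)
    y = w0 + w1 * a1 + w2 * a2
    print(f"a1={a1:+.2f}  a2={a2:+.2f}  y={y:+.1e}")
```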
Example of two attributes with three decision boundaries
(figure: explicit representation: attributes a1 and a2 feed three perceptrons in a hidden layer, each with its own weights; a final weighted perceptron combines them into an output of 0 or 1. Each hidden perceptron contributes one decision boundary.) (van der Baan and Jutten, 2000)
Example of two attributes with three decision boundaries
(figure: the same network as in the previous image, drawn in a more compact representation.) (van der Baan and Jutten, 2000)
Example of two attributes with three decision boundaries
(figure: in the (a1, a2) plane, boundaries 1, 2, and 3 fence off a Class 1 region surrounded on all sides by Class 2.) (van der Baan and Jutten, 2000)
The danger of too many boundaries (hidden neurons)
Brad says: "You can overfit your data by putting in too many decision boundaries, thereby overdividing your attribute space!"
(courtesy Brad Wallet, OU)
The danger of too many degrees of freedom (polynomial fitting)
(figure: the same (a1, a2) data fit with 1st-, 2nd-, and 7th-order polynomials; the apparent prediction error shrinks as the order increases, but the 7th-order curve chases the noise.)
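This danger is easy to reproduce on synthetic data (the trend, noise level, and sample counts below are illustrative assumptions): the training misfit always shrinks as the polynomial order grows, while the misfit on held-out points does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a gently curved trend (synthetic stand-in for an attribute pair).
a1 = np.linspace(-1.0, 1.0, 15)
a2 = 0.5 * a1**2 + a1 + rng.normal(scale=0.1, size=a1.size)

# Held-out validation points drawn from the same trend.
v1 = np.linspace(-0.9, 0.9, 10)
v2 = 0.5 * v1**2 + v1 + rng.normal(scale=0.1, size=v1.size)

train_rms, val_rms = {}, {}
for degree in (1, 2, 7):
    coeffs = np.polyfit(a1, a2, degree)  # least-squares polynomial fit
    train_rms[degree] = np.sqrt(np.mean((np.polyval(coeffs, a1) - a2) ** 2))
    val_rms[degree] = np.sqrt(np.mean((np.polyval(coeffs, v1) - v2) ** 2))
    print(f"degree {degree}: train RMS {train_rms[degree]:.3f}, "
          f"validation RMS {val_rms[degree]:.3f}")
```

Because the lower-order polynomials are special cases of the higher-order ones, the training RMS error can only decrease with degree; the validation RMS error is the honest measure of predictive skill.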
The danger of too many attributes
(figure: with two attributes (a1, a2) the decision boundary is a line; with three (a1, a2, a3), a plane; with four, a 3D hyperplane. Adding attributes lets the training data be fit ever more closely, while the validation data expose the overfitting.)
A feed-forward network
One of several ways of estimating the weights w (easily understood by geophysicists) is a Taylor-series expansion about an initial guess of random weights w. Let's define:
• a = input attributes; z = output measurements
• the equation z = f(a; w) predicts the output from the input (note that f must be differentiable!)
• the prediction error given the current weights w is e = z - f(a; w)
• the sensitivity of the output to the weights is the Jacobian matrix Jij = dzi/dwj
(van der Baan and Jutten, 2000)
Tomography
• Known output (measurements)
• Unknown model parameters
• Known previous model result
• Differentiable model system
Neural networks
• Known input (attributes)
• Unknown weights
• Known output ("truth" data)
• Differentiable model system
Computing the weights, w
(figure: a differentiable perceptron, the activation f[r(w)] applied to the output r(w)) (van der Baan and Jutten, 2000)
Iterative least-squares solution using the normal equations, with Levenberg-Marquardt (or Tikhonov) regularization
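A standard damped least-squares update consistent with this slide is dw = (J'J + lambda*I)^-1 J'e, applied iteratively. A minimal sketch on a toy differentiable (sigmoid) perceptron, with synthetic data and hypothetical "true" weights:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

rng = np.random.default_rng(2)
A = np.hstack([np.ones((30, 1)), rng.normal(size=(30, 2))])  # bias + 2 attributes
w_true = np.array([-1.0, 2.0, -1.5])  # hypothetical "true" weights, for illustration
z_obs = sigmoid(A @ w_true)           # noiseless "truth" observations

w = np.zeros(3)  # initial guess
lam = 0.01       # Levenberg-Marquardt damping factor
for _ in range(50):
    z = sigmoid(A @ w)
    e = z_obs - z                   # prediction error given current weights
    J = (z * (1 - z))[:, None] * A  # Jacobian: dz_i/dw_j = z(1-z) * a_ij
    dw = np.linalg.solve(J.T @ J + lam * np.eye(3), J.T @ e)
    w += dw

mse = np.mean((sigmoid(A @ w) - z_obs) ** 2)
print("recovered weights:", np.round(w, 3), " misfit:", mse)
```

The damping term lam keeps the normal equations well conditioned when J'J is nearly singular, exactly as in regularized tomography.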
A typical neural network
(figure: input layer, hidden layer, output layer) (Ross, 2002)
Example 1. Mapping a stratigraphic depositional system (Ruffo et al., 2009)
Seismic line perpendicular to channel system (Ruffo et al., 2009)
Seismic facies classification using a neural network classifier (Ruffo et al., 2009)
Use 4-way averaged vertical 2D GLCM attributes parallel to dip at a suite of azimuths (Ruffo et al., 2009)
Seeding the facies classification algorithm (Ruffo et al., 2009)
Lithofacies classification (Ruffo et al., 2009)
Lithofacies classification scheme (Ruffo et al., 2009)
Lithofacies classification (Ruffo et al., 2009)
Seismic facies overlain on seismic data (Ruffo et al., 2009)
Horizon slice (Ruffo et al., 2009)
Example 2. Clustering of λρ- and μρ-volumes (Chopra and Pruden, 2003)
Neural network estimation
(figure: two panels, the gamma ray response and the predicted porosity, with a mask generated from the gamma ray response) (Chopra and Pruden, 2003)
San Luis Pass weather prediction exercise
Exercise: flip 6 coins, one per day. Heads = sunny, tails = stormy.
Observed weather:
August 24, 2005: sunny
August 25, 2005: storms
August 26, 2005: sunny
August 27, 2005: sunny
August 28, 2005: sunny
August 29, 2005: storms
Read out your correlation rate:
0/6 = -1.00, 1/6 = -0.67, 2/6 = -0.33, 3/6 = 0.00, 4/6 = +0.33, 5/6 = +0.67, 6/6 = +1.00
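For two ±1-coded sequences of length 6 (heads/sunny = +1, tails/stormy = -1), the sample correlation reduces to (matches - mismatches)/6, which generates the readout table above (the helper name is illustrative):

```python
def coin_correlation(matches, n=6):
    """Correlation between +/-1-coded predictions and outcomes
    that agree `matches` times out of n: (matches - mismatches) / n."""
    return (2 * matches - n) / n

for matches in range(7):
    print(f"{matches}/6 correct -> correlation {coin_correlation(matches):+.2f}")
```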
San Luis Pass weather prediction exercise Which coins best predict the weather in San Luis Pass? Should Marfurt go fishing?
Potential risks when using seismic attributes as predictors of reservoir properties
• When the sample size is small, the uncertainty about the value of the true correlation can be large.
• Given 10 wells with a correlation of r = 0.8, the 95% confidence interval is [0.34, 0.95].
• Given only 5 wells with a correlation of r = 0.8, the 95% confidence interval is [-0.28, 0.99]!
(Kalkomey, 1997)
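These intervals follow from the standard Fisher z-transform for sample correlations (a textbook result, not spelled out on the slide); a minimal sketch:

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a true correlation,
    given sample correlation r from n points, via the Fisher z-transform."""
    z = math.atanh(r)                 # transform r to an approximately normal variable
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    return (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))

print(correlation_ci(0.8, 10))  # roughly (0.34, 0.95)
print(correlation_ci(0.8, 5))   # roughly (-0.28, 0.99)
```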
Spurious Correlations A spurious correlation is a sample correlation that is large in absolute value purely by chance. (Kalkomey, 1997)
The more attributes, the more spurious correlations! (Kalkomey, 1997)
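A quick simulation (entirely synthetic and illustrative) shows why screening more attributes inflates the best chance correlation: with only a few wells, some pure-noise attribute is bound to correlate well.

```python
import numpy as np

rng = np.random.default_rng(3)

n_wells = 8
reservoir = rng.normal(size=n_wells)         # "true" reservoir property at 8 wells
attributes = rng.normal(size=(50, n_wells))  # 50 candidate attributes, all pure noise

def best_abs_corr(k):
    """Largest |sample correlation| between the reservoir property
    and the first k candidate attributes."""
    return max(abs(np.corrcoef(attributes[i], reservoir)[0, 1]) for i in range(k))

for k in (1, 5, 20, 50):
    print(f"best of {k:2d} random attributes: |r| = {best_abs_corr(k):.2f}")
```

Because the candidate sets are nested, the best spurious correlation can only grow as more attributes are screened.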
Risk = expected loss due to our uncertainty about the truth * cost of making a bad decision Cost of a Type I Error (using a seismic attribute to predict a reservoir property which is actually uncorrelated) is: • Inaccurate prediction biased by the attribute. • Inflated confidence in the inaccurate prediction — apparent prediction errors are small. Cost of a Type II Error (rejecting a seismic attribute for use in predicting a reservoir property when in fact they are truly correlated) is: • Less accurate prediction than if we’d used the seismic attribute. • Larger prediction errors than if we’d used the attribute. (Kalkomey, 1997)
Validation of Attribute Anomalies • 1. Basic QC • is the well tie good? • are the interpreted horizons consistent and accurate? • are the correlations statistically meaningful? • is there a physical or well-documented reason for an attribute to correlate with the reservoir property to be predicted? • 2. Validation • does the prediction correlate to control not used in training? • does the prediction make geologic sense? • does the prediction fit production data? • can you validate the correlation through forward modeling? (Hart, 2002)