Selected Topics in Particle Physics
Avner Soffer, Spring 2007
Lecture 4

Simplest variable combination: diagonal cut
Combining variables
• Many variables that weakly separate signal from background
• Often correlated distributions
• Complicated to deal with or to use in a fit
• Easiest to combine into one simple variable, the Fisher discriminant: F = Σ_i a_i v_i
Neural networks
(figures: NN output distributions for signal MC vs. BB and continuum qq background MC)
Input variables for neural net
(figures: each input shown for signal, BB background, and cc+uds continuum)
• Legendre
• Fisher
• log(Δz)
• cos θ_T
• log(K-D DOCA)
• Lepton tagging (BtgElectronTag & BtgMuonTag)
Uncorrelated, (approximately) Gaussian-distributed variables
• "Gaussian-distributed" means the distribution of v is P(v) ∝ exp[−(v − ⟨v⟩)² / 2σ²]
• How to combine the information?
• Option 1: V = v_1 + v_2
• Option 2: V = v_1 − v_2
• Option 3: V = a_1 v_1 + a_2 v_2
• What are the best weights a_i?
• How about a_i = ⟨v_i^s⟩ − ⟨v_i^b⟩, the difference between the signal & background means?
(figures: v_1 and v_2 distributions for signal and background)
Incorporating spreads in v_i
• ⟨v_1^s⟩ − ⟨v_1^b⟩ > ⟨v_2^s⟩ − ⟨v_2^b⟩, but v_2 has a smaller spread and more actual separation between S and B
• a_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) / ((σ_i^s)² + (σ_i^b)²), where (σ_i^s)² = ⟨(v_i^s − ⟨v_i^s⟩)²⟩ = Σ_e (v_ie^s − ⟨v_i^s⟩)² / N is the RMS spread in the v_i distribution of a pure signal sample (similarly defined for σ_i^b)
• You may be familiar with the form ⟨(v − ⟨v⟩)²⟩ = ⟨v²⟩ + ⟨v⟩² − 2⟨v⟨v⟩⟩ = ⟨v²⟩ − ⟨v⟩²
(figures: v_1 and v_2 distributions for signal and background)
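As a concrete illustration, here is a minimal Python sketch (not from the slides; the toy sample sizes and Gaussian parameters are invented for illustration) that computes the spread-normalized weights a_i and the combined variable V = Σ_i a_i v_i on toy data:

```python
# Toy illustration of a_i = (<v_i^s> - <v_i^b>) / ((sigma_i^s)^2 + (sigma_i^b)^2).
# All distribution parameters below are assumptions, not values from the lecture.
import numpy as np

rng = np.random.default_rng(0)
N = 10000
# Two uncorrelated Gaussian variables; v_2 has a small spread
sig = rng.normal(loc=[1.0, 0.5], scale=[1.0, 0.2], size=(N, 2))
bkg = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.2], size=(N, 2))

mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)      # <v_i^s> - <v_i^b>
a = mean_diff / (sig.var(axis=0) + bkg.var(axis=0))  # spread-normalized weights

V_sig = sig @ a    # combined variable V = sum_i a_i v_i, event by event
V_bkg = bkg @ a
print("weights:", a)
```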
Linearly correlated, Gaussian-distributed variables
• Linear correlation: ⟨v_1⟩ = ⟨v_1⟩_0 + c v_2, with (σ_1)² independent of v_2
• a_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) / ((σ_i^s)² + (σ_i^b)²) doesn't account for the correlation
• Recall (σ_i^s)² = ⟨(v_i^s − ⟨v_i^s⟩)²⟩; replace it with the covariance matrix C_ij^s = ⟨(v_i^s − ⟨v_i^s⟩)(v_j^s − ⟨v_j^s⟩)⟩
• a_i = Σ_j (C^s + C^b)⁻¹_ij (⟨v_j^s⟩ − ⟨v_j^b⟩), using the inverse of the sum of the S and B covariance matrices
• Fisher discriminant: F = Σ_i a_i v_i
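The covariance-based weights are a one-line change from the previous sketch. This is again a toy (the correlated Gaussian parameters are assumptions), solving (C^s + C^b) a = Δ⟨v⟩ rather than inverting explicitly:

```python
# Toy Fisher weights a = (C^s + C^b)^-1 (<v^s> - <v^b>) for correlated variables.
import numpy as np

rng = np.random.default_rng(1)
N = 10000
cov = [[1.0, 0.6], [0.6, 1.0]]                   # assumed common covariance
sig = rng.multivariate_normal([1.0, 0.5], cov, size=N)
bkg = rng.multivariate_normal([0.0, 0.0], cov, size=N)

C_s = np.cov(sig, rowvar=False)                  # C_ij^s from the signal sample
C_b = np.cov(bkg, rowvar=False)                  # C_ij^b from the background sample
mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)

a = np.linalg.solve(C_s + C_b, mean_diff)        # inverse of the summed covariances
F_sig = sig @ a                                  # Fisher discriminant F = sum_i a_i v_i
F_bkg = bkg @ a
```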
Fisher discriminant properties
• Best S-B separation for a linearly correlated set of Gaussian-distributed variables
• Non-Gaussian-ness of v is usually not a problem…
• There must be a mean difference: ⟨v_i^s⟩ − ⟨v_i^b⟩ ≠ 0 (take the absolute value)
• Need to calculate the a_i coefficients using (correctly simulated) Monte Carlo (MC) signal and background samples
• Should validate using control samples (true for any discriminant)
More properties
• F is more Gaussian than its inputs (virtual calorimeter example)
• Central limit theorem: if x_j (j = 1…n) are independent random variables with means ⟨x_j⟩ and variances σ_j², then for large n the sum Σ_j x_j is a Gaussian-distributed variable with mean Σ_j ⟨x_j⟩ and variance Σ_j σ_j²
• F can usually be fit with 2 Gaussians or a bifurcated Gaussian
• A cut on F corresponds to an (n−1)-dimensional plane cut through the n-dimensional variable space
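The central-limit-theorem statement is easy to check numerically. A small sketch (toy values, assumed uniform inputs):

```python
# Sum of n independent uniform variables: mean -> sum of means, variance -> sum
# of variances, and the distribution of the sum is approximately Gaussian.
import numpy as np

rng = np.random.default_rng(2)
n = 20
x = rng.uniform(0.0, 1.0, size=(100000, n))
s = x.sum(axis=1)
print(s.mean(), n * 0.5)        # ~ sum_j <x_j> = n/2
print(s.var(), n / 12.0)        # ~ sum_j sigma_j^2 = n/12 for uniform inputs
```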
Nonlinear correlations
• Linear methods (Fisher) are not optimal for such cases
• May fail altogether if there is no S-B mean difference
Artificial neural networks
• "Complex nonlinearity"
• Each neuron takes many inputs and outputs a response function value
• The output of each neuron serves as input for the others
• Neurons are divided among layers for efficiency
• The weight w_ij^l between neuron i in layer l and neuron j in layer l+1 is calculated using a MC "training sample"
Response functions
• Neuron output = ρ(inputs, weights) = α(κ(inputs, weights)), where κ combines the inputs and weights and α is the activation function
Common usage:
• α = linear in the output layer
• α = tanh in the hidden layer
• κ = weighted sum in the hidden & output layers
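Putting the pieces together, a minimal sketch of this common usage (the layer sizes and random weights are placeholders, not the lecture's network):

```python
# One-hidden-layer perceptron: kappa = weighted sum, alpha = tanh in the hidden
# layer and linear in the output layer.
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    hidden = np.tanh(W1 @ x + b1)   # alpha = tanh applied to kappa = weighted sum
    return W2 @ hidden + b2         # linear alpha in the output layer

rng = np.random.default_rng(3)
n_in, n_hidden = 4, 6               # assumed sizes
W1 = rng.normal(size=(n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = rng.normal(size=n_hidden);         b2 = 0.0
print(mlp_forward(rng.normal(size=n_in), W1, b1, W2, b2))
```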
Training (calculating the weights)
• Event a (a = 1…N) has input variable vector x_a = (x_1…x_nvar)
• For each event, calculate the deviation of the network output y(x_a, w) from the desired value ŷ_a (0 for background, 1 for signal)
• Calculate the error function for random starting values w of the weights: E(w) = Σ_a ½ [y(x_a, w) − ŷ_a]²
… Training
• Change the weights so as to cause the steepest decline in E: w → w − η ∇_w E
• "Online learning": drop the sum over events and update the weights after each event
• Requires a randomized training sample
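A hedged sketch of online training for the one-hidden-layer network above, stepping the weights after each event along the negative gradient of that event's squared error (learning rate, epoch count, and layer size are assumptions):

```python
# Online (per-event) gradient descent for a one-hidden-layer tanh network.
import numpy as np

def train_online(X, y, n_hidden=6, eta=0.01, n_epochs=20, seed=4):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(n_hidden, X.shape[1])); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden);               b2 = 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):      # randomized training sample
            h = np.tanh(W1 @ X[i] + b1)        # hidden-layer response
            d = (W2 @ h + b2) - y[i]           # deviation from desired value (0 or 1)
            g = d * W2 * (1.0 - h**2)          # gradient back-propagated through tanh
            W2 -= eta * d * h;  b2 -= eta * d
            W1 -= eta * np.outer(g, X[i]);  b1 -= eta * g
    return W1, b1, W2, b2

# Toy usage: separate two Gaussian blobs labeled 1 (signal) and 0 (background)
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(1.0, 1.0, size=(500, 2)),
               rng.normal(-1.0, 1.0, size=(500, 2))])
y = np.concatenate([np.ones(500), np.zeros(500)])
weights = train_online(X, y)
```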
What architecture to use?
• Weierstrass theorem: for a multilayer perceptron, 1 hidden layer is sufficient to approximate a continuous correlation function to any precision, if the number of neurons in the layer is high enough
• Alternatively: several hidden layers with fewer neurons may converge faster and be more stable
• Instability problem: the output distribution changes with different samples
What variables to use?
• Improvement with added variables:
• Importance of variable i:
More info
• A cut on a NN output = a non-linear slice through the n-dimensional variable space
• The NN output shape can be (approximately) Gaussianized:
q → q′ = tanh⁻¹[(q − ½(q_max + q_min)) / (½(q_max − q_min))]
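The quoted transform is a one-liner in code; a small sketch (the clipping tolerance eps is an added safeguard, not from the slides) that keeps tanh⁻¹ finite at the endpoints:

```python
# q -> q' = atanh[(q - (qmax+qmin)/2) / ((qmax-qmin)/2)], mapping (qmin, qmax)
# onto (-1, 1) before applying the inverse tanh.
import numpy as np

def gaussianize(q, qmin, qmax, eps=1e-6):
    u = (q - 0.5 * (qmax + qmin)) / (0.5 * (qmax - qmin))
    return np.arctanh(np.clip(u, -1.0 + eps, 1.0 - eps))

# Example: NN outputs in (0, 1)
q = np.array([0.02, 0.5, 0.98])
print(gaussianize(q, 0.0, 1.0))
```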