
Geometric margin domain description with instance-specific margins


Presentation Transcript


  1. Geometric margin domain description with instance-specific margins Adam Gripton Thursday, 5th May, 2011

  2. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  3. High-level motivation Task as originally stated: • Expert replacement system to deal with Non-Destructive Assay (NDA) data provided by a sponsor for analysis and classification • Involves automatic feature extraction and inference via a classification step

  4. High-level motivation Data consignment: • Fissile elements: Californium-252, Highly Enriched Uranium, Weapons-Grade Plutonium • Shielding methods: Aluminium, steel ball, steel planar to detector, lead, CHON (high-explosive simulant) • Detectors: Sodium Iodide scintillator (NaI), High-Resolution Germanium (semiconductor) spectrometer (HRGS), neutron array counter (N50R) [Figure: experimental arrangement of source, shield, NaI detectors, neutron counter and HRGS]

  5. High-level motivation Data consignment: Spectroscopy experiments

  6. High-level motivation Data consignment: Neutron multiplicity arrays

  7. High-level motivation • Features (columns) based on physically relevant projections of raw experimental data • Class vector: refers to fissile material or shielding method • Some data absent: either not measured or not applicable (structurally missing)

  8. High-level motivation Two principal aims: • Devise a novel contribution to existing literature on classification methods • Provide system of classification of abstract data that is applicable to provided dataset

  9. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  10. Development of system Overview [Diagram: Aim 1 (Novel Contribution) and Aim 2 (Applicability to Dataset), linked through the two strands Multi-Class Kernel Methods and Missing Data]

  11. Development of system Overview • Multi-Class Kernel Methods: SVDD (Tax, Duin) and Multi-Class Hybrid (Lee) • Missing Data: Geometric SVM (Chechik)

  12. Development of system Working with Kernels • “Kernel trick”: ML algorithms that query data values only implicitly, via the dot product • Replace <x,y> ← k(x,y) to imitate a mapping {x → φ(x)} such that k(x,y) = <φ(x), φ(y)> • Valid if Mercer's condition holds (the Gram matrix {k(xi,xj)} is positive semidefinite) • Allows analysis in a complex superspace without needing to address its Cartesian form directly
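To make the kernel trick concrete, here is a minimal Python sketch of the two kernels compared later in the talk (linear and inhomogeneous quadratic), together with a numerical check of the Mercer condition on a Gram matrix. The function names are illustrative, not from the original slides.

```python
import numpy as np

def linear_kernel(X, Y):
    """Linear kernel k(x, y) = <x, y>, computed for all pairs of rows."""
    return X @ Y.T

def quadratic_kernel(X, Y):
    """Inhomogeneous quadratic kernel k(x, y) = (1 + <x, y>)^2."""
    return (1.0 + X @ Y.T) ** 2

def is_mercer(K, tol=1e-10):
    """Check the Mercer condition on a Gram matrix: symmetric and
    positive semidefinite (smallest eigenvalue >= 0, up to tolerance)."""
    return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

X = np.random.randn(20, 4)      # 20 points in a 4-D input space
K = quadratic_kernel(X, X)      # Gram matrix of the implicit feature map
print(is_mercer(K))             # True: the quadratic kernel is valid
```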

  13. Development of system Support Vector Domain Description • “One-class classification” • Fits sphere around cluster of data, allowing errors {ξi} • Extends in kernel space to more complex boundary • Hybrid methods: multi-class classification

  14. Development of system Support Vector Domain Description • Dual formulation allows the centre to be described in kernel space via weighting factors αi: a = Σi αi φ(xi), with Σi αi = 1 and 0 ≤ αi ≤ C

  15. Development of system Support Vector Domain Description • Values of αi partition the data: • αi = 0 → strictly inside the sphere • 0 < αi < C → on the boundary (support vectors) • αi = C → outside the sphere (margin errors) Only the support vectors determine the size and position of the sphere
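For reference, a minimal sketch of the Tax and Duin SVDD dual as stated above, solved with a generic constrained optimiser rather than a dedicated QP solver; the helper name and the choice of SciPy are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual(K, C=1.0):
    """Solve the SVDD dual for a Gram matrix K:
    maximise sum_i a_i K_ii - sum_ij a_i a_j K_ij
    subject to sum_i a_i = 1 and 0 <= a_i <= C."""
    n = K.shape[0]
    d = np.diag(K)
    obj = lambda a: a @ K @ a - a @ d              # negated dual objective
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n), bounds=[(0.0, C)] * n,
                   constraints=cons, method='SLSQP')
    return res.x

# Squared kernel-space distance from phi(x_j) to the centre a = sum_i alpha_i phi(x_i):
#   ||phi(x_j) - a||^2 = K_jj - 2 (K @ alpha)_j + alpha @ K @ alpha
```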

  16. Development of system Dealing with Missing Data • Cannot use kernel methods directly with missing features • Must impute (fill in) missing values or assume a probability distribution for them: a pre-processing step • Missing features describe complex parametric curves in kernel space • Seek a method which can address incomplete data directly: minimise point-to-line distances
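In the linear (input-space) case the point-to-line distance has a closed form: the completion nearest the centre simply copies the centre's values into the missing coordinates, so only the observed features contribute. A minimal sketch, with NaN used (as an assumption) to encode a structurally missing feature:

```python
import numpy as np

def point_to_line_dist(x, a):
    """Squared distance from centre a to the affine set of all completions
    of x (NaN marks a structurally missing feature). The minimiser copies
    a's values into the missing slots, so only observed coordinates count."""
    obs = ~np.isnan(x)
    return np.sum((x[obs] - a[obs]) ** 2)

x = np.array([0.7, np.nan, -0.2])   # second feature structurally missing
a = np.array([0.1, 0.5, 0.0])
print(point_to_line_dist(x, a))     # measured over features 1 and 3 only
```

For non-trivial kernels the completion set maps to a curve in feature space and no such closed form exists, which is what the methods below address.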

  17. Development of system Dealing with Missing Data • Chechik's GM-SVM method provides an analogue of the binary SVM for structurally missing data • Uses two loops of optimisation to replace instance-specific norms with scalings of the full norm • Questionable applicability to kernel spaces: proper scaling terms are difficult to choose, and the method is ultimately equivalent to zero imputation

  18. Development of system Synopsis for Novel System • Abstract, context-free • Kernel extension • Domain description (one-class) • Structurally missing features • Avoid imputation / probabilistic models • Applicable to provided data

  19. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  20. Exact-centre method • Seeks solution in input space only • Demonstrates concept of optimisation-based distance metric

  21. Exact-centre method • Cannot sample from entire feature space! • Selects centre point a such that φ(a) is optimal centre (hence solves a slightly different problem) • Tricky (but possible) to optimise for soft margins
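A hard-margin sketch of the idea, assuming complete data for clarity: search over input-space candidates a and minimise the worst-case kernel distance, using ||φ(a) − φ(x)||² = k(a,a) − 2 k(a,x) + k(x,x). The optimiser choice is illustrative; incomplete points would contribute via the point-to-line distance above.

```python
import numpy as np
from scipy.optimize import minimize

def quad_k(x, y):
    """Inhomogeneous quadratic kernel k(x, y) = (1 + <x, y>)^2."""
    return (1.0 + x @ y) ** 2

def exact_centre(X, k=quad_k):
    """Find a point a in *input* space whose image phi(a) minimises the
    worst-case kernel distance to the data (hard-margin sphere radius)."""
    def radius_sq(a):
        return max(k(a, a) - 2.0 * k(a, x) + k(x, x) for x in X)
    res = minimize(radius_sq, X.mean(axis=0), method='Nelder-Mead')
    return res.x, res.fun       # centre in input space, squared radius
```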

  22. Exact-centre method • Always performs at least as well as imputation in linear space w.r.t. sphere volume • Often underperforms in quadratic space (expected, since the centre is restricted to the image of an input-space point)

  23. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  24. Dual-optimisation method • Motivated by the desire to search over the entire kernel feature space, to match imputation methods for non-trivial kernel maps • Takes its lead from the dual formulation of SVDD, where weighting factors αi are appended to the dataset and implicitly describe the centre a

  25. Dual-optimisation method • a must itself have full features, and therefore so must the “xi” in the sum • Must therefore provide an auxiliary dataset X* with full features to perform this computation • Choice of X* is largely arbitrary, but it must span the feature space • Weighting factors are no longer “tied” to the dataset

  26. Dual-optimisation method Given an initial guess α: • First produce a full dataset Xa optimally aligned to a, by optimising over all possible imputations of the incomplete dataset • Then perform a minimax optimisation step on the vector of point-to-centre distances → new candidate α at each outer optimisation step
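A sketch of this alternating loop, specialised (as an assumption) to the linear kernel so that both inner steps are explicit; the talk's method performs the analogous steps in kernel space. NaN encodes a structurally missing feature.

```python
import numpy as np
from scipy.optimize import minimize

def dual_opt_linear(X, X_star, n_iters=20):
    """Alternating sketch of the dual-optimisation method for the linear
    kernel. alpha weights the auxiliary full-feature set X_star and
    implicitly defines the centre a = X_star.T @ alpha."""
    n = len(X_star)
    alpha = np.full(n, 1.0 / n)                   # initial guess
    bounds = [(0.0, 1.0)] * n
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    for _ in range(n_iters):
        a = X_star.T @ alpha
        # Step 1: imputation optimally aligned to a. For the linear kernel
        # the nearest completion copies a into the missing coordinates.
        Xa = np.where(np.isnan(X), a, X)
        # Step 2: minimax step. Choose alpha minimising the worst
        # point-to-centre distance over the (now complete) dataset.
        radius = lambda al: np.max(np.sum((Xa - X_star.T @ al) ** 2, axis=1))
        alpha = minimize(radius, alpha, bounds=bounds, constraints=cons,
                         method='SLSQP').x
    return alpha
```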

  27. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  28. Experimental data collection Synthetic Data Preparatory trials on datasets constructed to exhibit degrees of “structural missingness”: • 2-D cluster of data with censoring applied to all values |x| > 1 • Two disjoint clusters, one in [f1,f2] and one in [f3,f4] • One common dimension plus three other dimensions, each common to one part of the set
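A possible construction of the first two synthetic sets, with NaN encoding a structurally missing value; sizes, scales and offsets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Set 1: a 2-D Gaussian cluster in which any value with |x| > 1 is
# censored, i.e. recorded as structurally missing.
X1 = rng.normal(scale=0.8, size=(200, 2))
X1[np.abs(X1) > 1.0] = np.nan

# Set 2: two disjoint clusters living in features [f1, f2] and [f3, f4]
# respectively; each point observes only its own two features.
A = np.full((100, 4), np.nan); A[:, :2] = rng.normal(size=(100, 2))
B = np.full((100, 4), np.nan); B[:, 2:] = rng.normal(loc=2.0, size=(100, 2))
X2 = np.vstack([A, B])
```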

  29. Experimental data collection Synthetic Data Structure of comparisons: • Methods: imputation with [zeros, feature means, 3 nearest neighbours] vs. our XC and DO methods • Kernels: linear K(x,y) = <x,y> and quadratic K(x,y) = (1 + <x,y>)² • Margins: hard (all data within sphere) and soft (50% outwith sphere)

  30. Experimental data collection Synthetic Data Structure of comparisons (continued): • Dual-optimisation method on hard margins only • Particle-Swarm Optimisation also used to provide a cross-validated classification study • Main study is into the effect on sphere size

  31. Experimental data collection Feature Extraction Four main feature groups selected for analysis: • Compton edge position (6 features) • Area under graph up to Compton edge (6) • Mean multiplicity of neutron data (1) • Poisson fit on neutron data (9) and chi-squared goodness-of-fit (3) Total: 25 features

  32. Experimental data collection Feature Extraction • PCA used on groups of features with identical presence flags to reduce the dataset to 10 principal components → missingness pattern kept intact
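One way this groupwise reduction might look, assuming NaN-coded missingness and scikit-learn's PCA; the grouping interface is hypothetical, since the talk only states that PCA is run per presence-flag group.

```python
import numpy as np
from sklearn.decomposition import PCA

def groupwise_pca(X, groups, n_components):
    """Run PCA separately on each group of columns sharing a presence flag,
    so a row that misses a group stays missing in the reduced dataset.
    `groups` maps a group name to its column indices; `n_components` maps
    it to the reduced dimensionality (totalling 10 in the talk)."""
    blocks = []
    for g, cols in groups.items():
        present = ~np.isnan(X[:, cols]).any(axis=1)   # rows observing group g
        block = np.full((len(X), n_components[g]), np.nan)
        block[present] = PCA(n_components[g]).fit_transform(X[present][:, cols])
        blocks.append(block)
    return np.hstack(blocks)
```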

  33. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  34. Conclusions • Dual-Opt method generally equalled or surpassed imputation methods in hard-margin cases; the XC method, predictably, did not perform as well in the quadratic case • Unreasonably small spheres start appearing with a soft-margin classifier, as datapoints with few features come to hold too much weight • Cross-validation study using a joint optimiser shows improvement with the quadratic kernel

  35. Conclusions • Insight provided into the behaviour of a kernel method with missing data: little of the literature deals with this issue • A link exists with the Randomised Maximum Likelihood (RML) sampling technique • Deliberate concentration, for now, on entirely uninformed methods; scope exists to incorporate prior information, where known, to improve efficiency

  36. Conclusions Caveats • Sphere size ≠ overall classification accuracy (cf. a delta-function Parzen window), but this is arguably not what we set out to achieve • Divergent remit: not a catch-all procedure for handling all types of data, but gives insight into how structural missingness can be analysed

  37. Conclusions Room for Improvement • Fuller exploration of the PSJO technique to provide an alternative to the auxiliary dataset • Heavily reliant on optimisation procedures: these could be made more efficient than the current nested loop • Extension to the popular radial-basis-function (RBF) kernel • A more concrete application to the sponsor dataset

  38. Thank you for listening…
