Geometric margin domain description with instance-specific margins Adam Gripton Thursday, 5th May, 2011
Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions
High-level motivation Task as originally stated: • Expert replacement system to deal with Non-Destructive Assay (NDA) data provided by a sponsor for analysis and classification • Involves automatic feature extraction and inference via classification step
High-level motivation Data consignment: [Diagram: source and shield arrangement with NaI, HRGS and neutron detectors] • Fissile elements: Californium-252, Highly Enriched Uranium, Weapons-Grade Plutonium • Shielding methods: Aluminium, Steel ball, Steel planar to detector, Lead, CHON (HE sim.) • Detectors: Sodium Iodide scintillator (NaI), High-Resolution Germanium (semiconductor) spectrometer (HRGS), Neutron array counter (N50R)
High-level motivation Data consignment: Spectroscopy experiments
High-level motivation Data consignment: Neutron multiplicity arrays
High-level motivation • Features (columns) based on physically relevant projections of raw experimental data • Class vector: refers to fissile material or shielding method • Some data absent: either not measured or not applicable (structurally missing)
High-level motivation Two principal aims: • Devise a novel contribution to existing literature on classification methods • Provide system of classification of abstract data that is applicable to provided dataset
Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions
Development of system Overview • Aim 1 (Novel Contribution): Multi-Class Kernel Methods, Missing Data • Aim 2: Applicability To Dataset
Development of system Overview • Multi-Class Kernel Methods: SVDD (Tax, Duin) and Multi-Class Hybrid (Lee) • Missing Data: Geometric SVM (Chechik)
Development of system Working with Kernels • “Kernel trick”: ML algorithms that only query data values implicitly via the dot product • Replace <x,y> → k(x,y) to imitate a mapping {x → φ(x)} such that k(x,y) = <φ(x), φ(y)> • Valid if the Mercer condition holds ({k(xi,xj)} positive semidefinite) • Allows analysis in a complex superspace without needing to address its Cartesian form directly
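To make the kernel trick concrete, here is a minimal Python sketch (not from the original slides; all function names are illustrative): it builds a Gram matrix for the quadratic kernel used later in the experiments and checks the Mercer condition numerically by confirming the matrix is positive semidefinite.

```python
import numpy as np

def quadratic_kernel(x, y):
    # k(x, y) = (1 + <x, y>)^2, equivalent to an explicit quadratic feature map
    return (1.0 + np.dot(x, y)) ** 2

def gram_matrix(X, kernel):
    # Build the Gram matrix K[i, j] = k(x_i, x_j) for a dataset X (rows are points).
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

def is_mercer(K, tol=1e-9):
    # Mercer condition (on the sample): the Gram matrix is symmetric positive semidefinite.
    eigvals = np.linalg.eigvalsh((K + K.T) / 2.0)
    return np.all(eigvals >= -tol)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 4))
    K = gram_matrix(X, quadratic_kernel)
    print("Mercer condition holds on sample:", is_mercer(K))
```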
Development of system Support Vector Domain Description • “One-class classification” • Fits sphere around cluster of data, allowing errors {ξi} • Extends in kernel space to more complex boundary • Hybrid methods: multi-class classification
Development of system Support Vector Domain Description • Dual formulation allows the centre to be described in kernel space via weighting factors αi: a = Σi αi φ(xi), with Σi αi = 1
Development of system Support Vector Domain Description • Values of αi form a partition: • αi = 0 (strictly inside the sphere) • 0 < αi < C (on the boundary: unbounded support vectors) • αi = C (outside the sphere: bounded support vectors / errors) Only the support vectors (αi > 0) determine the size and position of the sphere
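A schematic sketch of the SVDD dual described above, assuming the standard Tax & Duin formulation (maximise Σi αi k(xi,xi) − Σij αi αj k(xi,xj) subject to 0 ≤ αi ≤ C and Σi αi = 1); the solver choice and helper names are illustrative, not taken from the thesis.

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual(K, C=1.0):
    """Solve the SVDD dual for a Gram matrix K: maximise a.diag(K) - a.K.a
    subject to 0 <= a_i <= C and sum(a) = 1 (Tax & Duin formulation)."""
    n = K.shape[0]
    diag = np.diag(K)

    def neg_objective(a):
        return -(a @ diag - a @ K @ a)

    res = minimize(neg_objective,
                   np.full(n, 1.0 / n),
                   method="SLSQP",
                   bounds=[(0.0, C)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}])
    return res.x

def dist2_to_centre(alpha, K, k_zx, k_zz):
    # Squared kernel-space distance ||phi(z) - a||^2 from a point z to the
    # implicit centre a = sum_i alpha_i phi(x_i), using kernel evaluations only.
    return k_zz - 2.0 * alpha @ k_zx + alpha @ K @ alpha
```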
Development of system • Cannot use kernel methods directly with missing features • Must impute (fill in) or assume probability distribution of missing values: pre-processing • Missing features describe complex parametric curves in kernel space • Seek a method which can address incomplete data directly: minimise point-to-line distances Dealing with Missing Data
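To illustrate the point-to-line (more generally, point-to-subspace) distance mentioned above in its simplest setting: with a linear kernel, the set of completions of a point with missing features is an affine subspace of input space, and its closest point to the centre is found by copying the centre's values into the missing coordinates, so only observed features contribute. This is an illustrative sketch, not thesis code; non-trivial kernels turn the completion set into a curve and remove this closed form.

```python
import numpy as np

def point_to_subspace_dist2(x, missing_mask, a):
    """Squared distance (linear kernel / input space) from the centre a to the
    affine subspace of all completions of x: the minimiser simply copies the
    centre's values into the missing coordinates, so only observed features
    contribute to the distance."""
    observed = ~missing_mask
    diff = x[observed] - a[observed]
    return float(diff @ diff)

# Example: x has its third feature missing (encoded by a mask, not imputed).
x = np.array([0.5, -1.2, np.nan])
mask = np.isnan(x)
a = np.array([0.0, 0.0, 0.0])
print(point_to_subspace_dist2(x, mask, a))  # 0.25 + 1.44 = 1.69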
Development of system • Chechik’s GM-SVM method provides analogue of binary SVM for structurally missing data • Uses two loops of optimisation to replace instance-specific norms with scalings of full norm • Questionable applicability to kernel spaces – difficult to choose proper scaling terms and ultimately equivalent to zero imputation Dealing with Missing Data
Development of system Synopsis for Novel System • Abstract, context-free • Kernel extension • Domain description (one-class) • Structurally missing features • Avoid imputation / probabilistic models • Applicable to provided data
Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions
Exact-centre method • Seeks solution in input space only • Demonstrates concept of optimisation-based distance metric
Exact-centre method • Cannot sample from entire feature space! • Selects centre point a such that φ(a) is optimal centre (hence solves a slightly different problem) • Tricky (but possible) to optimise for soft margins
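A rough sketch of the exact-centre idea, assuming a hard margin: search over candidate centres a in input space, where the radius needed at a is the largest of the per-point distances, and each incomplete point contributes the smallest kernel-space distance over its possible completions. The nested derivative-free optimisers and the quadratic kernel here are illustrative choices, not the thesis's actual implementation.

```python
import numpy as np
from scipy.optimize import minimize

def kernel(x, y):
    return (1.0 + np.dot(x, y)) ** 2          # quadratic kernel, as in the experiments

def d2(x, a):
    # ||phi(x) - phi(a)||^2 expressed via kernel evaluations only.
    return kernel(x, x) - 2.0 * kernel(x, a) + kernel(a, a)

def min_d2_over_completions(x, mask, a):
    # Smallest kernel-space distance from phi(a) to any completion of x.
    if not mask.any():
        return d2(x, a)
    def obj(vals):
        z = x.copy()
        z[mask] = vals
        return d2(z, a)
    res = minimize(obj, np.zeros(mask.sum()), method="Nelder-Mead")
    return res.fun

def exact_centre(X, masks, a0):
    # Hard-margin radius at a candidate centre: the largest per-point minimal distance.
    def radius2(a):
        return max(min_d2_over_completions(x, m, a) for x, m in zip(X, masks))
    res = minimize(radius2, a0, method="Nelder-Mead")
    return res.x, res.fun
```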
Exact-centre method • Always performs at least as well as imputation in linear space w.r.t. sphere volume • Often underperforms in quadratic space (expected, as the search domain is restricted to the input space)
Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions
Dual-optimisation method • Motivated by desire to search over entire kernel feature space, to match imputation methods for non-trivial kernel maps • Takes lead from dual formulation of SVDD, where weighting factors αi are appended to dataset and implicitly describe centre a
Dual-optimisation method • a must itself have full features, and therefore so must the “xi” in the sum • Must therefore provide auxiliary dataset X* with full features to perform this computation • Choice is largely arbitrary, but must span the feature space • Weighting factors no longer “tied” to dataset
Dual-optimisation method Given an initial guess α: • Need to first produce full dataset Xa optimally aligned to a, by optimisation over all possible imputations of incomplete dataset • Then need to perform minimax optimisation step on vector of point-to-centre distances: New candidate α at each optimisation step
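The following is a schematic reconstruction of the alternating scheme described above, under several assumptions: NaN encodes a missing feature, the auxiliary set X* is supplied by the caller, the weights α are unconstrained, and generic derivative-free optimisers stand in for whatever the thesis actually uses.

```python
import numpy as np
from scipy.optimize import minimize

def kernel(x, y):
    return (1.0 + np.dot(x, y)) ** 2

def d2_to_centre(z, alpha, Xstar, Kstar):
    # ||phi(z) - sum_j alpha_j phi(x*_j)||^2 expressed with kernel evaluations only.
    k_zx = np.array([kernel(z, xs) for xs in Xstar])
    return kernel(z, z) - 2.0 * alpha @ k_zx + alpha @ Kstar @ alpha

def align_dataset(X, masks, alpha, Xstar, Kstar):
    # Step 1: complete each incomplete point so it lies as close as possible
    # to the current implicit centre (optimal alignment Xa).
    Xa = X.copy()
    for i, (x, m) in enumerate(zip(X, masks)):
        if m.any():
            def obj(vals):
                z = x.copy()
                z[m] = vals
                return d2_to_centre(z, alpha, Xstar, Kstar)
            res = minimize(obj, np.zeros(m.sum()), method="Nelder-Mead")
            Xa[i, m] = res.x
    return Xa

def dual_opt(X, masks, Xstar, n_iter=10):
    # Outer loop: alternate alignment (step 1) with a minimax update of alpha (step 2),
    # producing a new candidate alpha at each iteration.
    Kstar = np.array([[kernel(a, b) for b in Xstar] for a in Xstar])
    alpha = np.full(len(Xstar), 1.0 / len(Xstar))
    for _ in range(n_iter):
        Xa = align_dataset(X, masks, alpha, Xstar, Kstar)
        def worst_dist(a_vec):
            return max(d2_to_centre(z, a_vec, Xstar, Kstar) for z in Xa)
        alpha = minimize(worst_dist, alpha, method="Nelder-Mead").x
    return alpha
```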
Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions
Experimental data collection Synthetic Data Preparatory trials of datasets constructed to exhibit degree of “structural missingness”: • 2-D cluster of data with censoring applied to all values |x| > 1 • Two disjoint clusters – in [f1,f2] and in [f3,f4] • One common dimension and three other dimensions each common to one part of the set
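An illustrative way to generate the first two kinds of synthetic data (sample sizes, cluster means and scales are arbitrary choices, not taken from the slides): censoring is recorded as a missing value rather than clipped, and each of the two disjoint clusters is observed only in its own pair of features.

```python
import numpy as np

rng = np.random.default_rng(42)

# Dataset 1: 2-D Gaussian cluster, values censored (recorded as missing) when |x| > 1.
X1 = rng.normal(scale=0.8, size=(100, 2))
X1_censored = np.where(np.abs(X1) > 1.0, np.nan, X1)

# Dataset 2: two disjoint clusters, one observed only in features (f1, f2),
# the other only in (f3, f4); the unobserved pair is structurally missing.
n = 50
X2 = np.full((2 * n, 4), np.nan)
X2[:n, 0:2] = rng.normal(loc=1.0, scale=0.3, size=(n, 2))    # cluster A in [f1, f2]
X2[n:, 2:4] = rng.normal(loc=-1.0, scale=0.3, size=(n, 2))   # cluster B in [f3, f4]

print(np.isnan(X1_censored).mean(), np.isnan(X2).mean())
```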
Experimental data collection Structure of comparisons: Synthetic Data Imputation with [zeros, feature means, 3 nearest neighbours] vs. Our XC and DO methods Linear Kernel K(x,y)=<x,y> Quadratic Kernel K(x,y)=(1+<x,y>)² Hard Margin (all within sphere) Soft Margin (50% outwith sphere)
Experimental data collection Structure of comparisons: Synthetic Data • Dual-optimisation method on hard margins only • Particle-Swarm Optimisation also used to provide cross-validated classification study • Main study is into effect on sphere size Linear Kernel K(x,y)=<x,y> Quadratic Kernel K(x,y)=(1+<x,y>)² Hard Margin (all within sphere) Soft Margin (50% outwith sphere)
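As a small configuration sketch of the comparison grid (method and setting names are placeholders, not from the thesis): the two kernels, the hard/soft margin settings, and the five ways of handling missing features are crossed, with sphere size recorded for each cell.

```python
import numpy as np
from itertools import product

# Kernels compared in the study.
kernels = {
    "linear":    lambda x, y: float(np.dot(x, y)),
    "quadratic": lambda x, y: (1.0 + float(np.dot(x, y))) ** 2,
}

# Margin settings: hard (every point enclosed) vs. soft (roughly half the
# points allowed outside the sphere); the fraction is taken from the slide.
margins = {"hard": 0.0, "soft": 0.5}

# Methods compared: three imputation baselines against the XC and DO methods.
methods = ["impute_zeros", "impute_means", "impute_3nn", "exact_centre", "dual_opt"]

for (kname, _kfun), (mname, frac_outside), method in product(
        kernels.items(), margins.items(), methods):
    # placeholder for the experiment run that records the resulting sphere size
    print(f"would run: kernel={kname}, margin={mname} "
          f"({frac_outside:.0%} outside), method={method}")
```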
Experimental data collection Feature Extraction Four main feature groups selected for analysis: • Compton edge position (6 features) • Area under graph up to Compton edge (6) • Mean multiplicity of neutron data (1) • Poisson fit on neutron data (9) and chi-squared goodness-of-fit (3) Total 25 features
Experimental data collection Feature Extraction PCA used on groups of features with identical presence flags to reduce the dataset to 10 principal components, keeping the missingness pattern intact
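A minimal sketch of the group-wise reduction described above, assuming NaN encodes an absent feature: columns are grouped by their presence pattern, PCA (via SVD) is run within each group on the rows where the group is observed, and the scores are written back only for those rows so the missingness pattern survives. The per-group component count is a free parameter here; the split that yields 10 components overall is the thesis's choice.

```python
import numpy as np

def groupwise_pca(X, n_components):
    """Apply PCA separately to each group of columns that shares the same
    presence (non-missing) pattern, so the reduced dataset keeps the original
    missingness structure. NaN is assumed to encode an absent feature."""
    present = ~np.isnan(X)
    groups = {}
    for j in range(X.shape[1]):
        groups.setdefault(tuple(present[:, j]), []).append(j)

    blocks = []
    for pattern, cols in groups.items():
        rows = np.array(pattern)                    # rows where this group is observed
        block = X[np.ix_(rows, cols)]
        block = block - block.mean(axis=0)
        _, _, Vt = np.linalg.svd(block, full_matrices=False)  # PCA via SVD
        k = min(n_components, Vt.shape[0])
        scores = np.full((X.shape[0], k), np.nan)
        scores[rows] = block @ Vt[:k].T             # components only where observed
        blocks.append(scores)
    return np.hstack(blocks)
```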
Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions
Conclusions • Dual-Opt method generally equalled or surpassed imputation methods in hard margin cases; XC method, predictably, did not operate as well in quadratic case • Unreasonably small spheres start appearing with a soft-margin classifier as datapoints with few features start holding too much weight • Cross-validation study using a joint optimiser shows improvement with quadratic kernel
Conclusions • Insight provided into the behaviour of a kernel method with missing data – not much literature deals with this issue • Link exists with the Randomised Maximum Likelihood (RML) sampling technique • Deliberate concentration for now on entirely uninformed methods; scope exists for incorporation of this information where known to improve efficiency
Conclusions Caveats • Sphere size ≠ Overall classification accuracy (cf. a delta-function Parzen window), but this is arguably not what we set out to achieve • Divergent remit – not a catch-all procedure for handling all types of data, but gives insight into how structural missingness can be analysed
Conclusions Room for Improvement • Fuller exploration of PSJO technique to provide alternative to auxiliary dataset • Heavily reliant on optimisation procedures: these could be made more efficient than the current nested loops • Extension to popular radial-basis (RBF) kernel • A more concrete application to sponsor dataset
Thank you for listening…