
Geometric margin domain description with instance-specific margins


Presentation Transcript


  1. Geometric margin domain description with instance-specific margins Adam Gripton Thursday, 5th May, 2011

  2. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  3. High-level motivation Task as originally stated: • Expert replacement system to deal with Non-Destructive Assay (NDA) data provided by a sponsor for analysis and classification • Involves automatic feature extraction and inference via a classification step

  4. High-level motivation Data consignment: • Fissile elements: Californium-252, Highly Enriched Uranium, Weapons-Grade Plutonium • Shielding methods: Aluminium, steel ball, steel planar to detector, lead, CHON (high-explosive simulant) • Detectors: Sodium Iodide scintillator (NaI), High-Resolution Germanium (semiconductor) spectrometer (HRGS), neutron array counter (N50R) [Figure: experimental arrangement of source, shield, NaI detectors, neutron counter and HRGS]

  5. High-level motivation Data consignment: Spectroscopy experiments

  6. High-level motivation Data consignment: Neutron multiplicity arrays

  7. High-level motivation • Features (columns) based on physically relevant projections of raw experimental data • Class vector: refers to fissile material or shielding method • Some data absent: either not measured or not applicable (structurally missing)

  8. High-level motivation Two principal aims: • Devise a novel contribution to existing literature on classification methods • Provide system of classification of abstract data that is applicable to provided dataset

  9. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  10. Development of system Overview [Diagram: Aim 1 (Novel Contribution) and Aim 2 (Applicability to Dataset), linked through the two strands Multi-Class Kernel Methods and Missing Data]

  11. Development of system Overview • Multi-Class Kernel Methods: SVDD (Tax, Duin) and Multi-Class Hybrid (Lee) • Missing Data: Geometric SVM (Chechik)

  12. Development of system Working with Kernels • “Kernel trick”: ML algorithms that query data values only implicitly, via the dot product • Replace <x,y> ← k(x,y) to imitate a mapping {x → φ(x)} such that k(x,y) = <φ(x), φ(y)> • Valid if Mercer's condition holds (the Gram matrix {k(xi,xj)} is positive semidefinite) • Allows analysis in a complex superspace without needing to address its Cartesian form directly
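To make the kernel trick concrete, here is a minimal Python sketch of the two kernels compared later in the talk (linear and inhomogeneous quadratic), together with a numerical check of the Mercer condition on a Gram matrix. The function names are illustrative, not from the original slides.

```python
import numpy as np

def linear_kernel(X, Y):
    """Linear kernel k(x, y) = <x, y>, computed for all pairs of rows."""
    return X @ Y.T

def quadratic_kernel(X, Y):
    """Inhomogeneous quadratic kernel k(x, y) = (1 + <x, y>)^2."""
    return (1.0 + X @ Y.T) ** 2

def is_mercer(K, tol=1e-10):
    """Check the Mercer condition on a Gram matrix: symmetric and
    positive semidefinite (smallest eigenvalue >= 0, up to tolerance)."""
    return np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol

X = np.random.randn(20, 4)      # 20 points in a 4-D input space
K = quadratic_kernel(X, X)      # Gram matrix of the implicit feature map
print(is_mercer(K))             # True: the quadratic kernel is valid
```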

  13. Development of system Support Vector Domain Description • “One-class classification” • Fits sphere around cluster of data, allowing errors {ξi} • Extends in kernel space to more complex boundary • Hybrid methods: multi-class classification

  14. Development of system Support Vector Domain Description • Dual formulation allows the centre to be described in kernel space via weighting factors αi: a = Σi αi φ(xi), with Σi αi = 1 and 0 ≤ αi ≤ C

  15. Development of system Support Vector Domain Description • Values of αi partition the data: • αi = 0 → strictly inside the sphere • 0 < αi < C → on the boundary (support vectors) • αi = C → outside the sphere (margin errors) Only the support vectors determine the size and position of the sphere
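For reference, a minimal sketch of the Tax and Duin SVDD dual as stated above, solved with a generic constrained optimiser rather than a dedicated QP solver; the helper name and the choice of SciPy are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual(K, C=1.0):
    """Solve the SVDD dual for a Gram matrix K:
    maximise sum_i a_i K_ii - sum_ij a_i a_j K_ij
    subject to sum_i a_i = 1 and 0 <= a_i <= C."""
    n = K.shape[0]
    d = np.diag(K)
    obj = lambda a: a @ K @ a - a @ d              # negated dual objective
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    res = minimize(obj, np.full(n, 1.0 / n), bounds=[(0.0, C)] * n,
                   constraints=cons, method='SLSQP')
    return res.x

# Squared kernel-space distance from phi(x_j) to the centre a = sum_i alpha_i phi(x_i):
#   ||phi(x_j) - a||^2 = K_jj - 2 (K @ alpha)_j + alpha @ K @ alpha
```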

  16. Development of system Dealing with Missing Data • Cannot use kernel methods directly with missing features • Must impute (fill in) missing values or assume a probability distribution for them: a pre-processing step • Missing features describe complex parametric curves in kernel space • Seek a method which can address incomplete data directly: minimise point-to-line distances
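In the linear (input-space) case the point-to-line distance has a closed form: the completion nearest the centre simply copies the centre's values into the missing coordinates, so only the observed features contribute. A minimal sketch, with NaN used (as an assumption) to encode a structurally missing feature:

```python
import numpy as np

def point_to_line_dist(x, a):
    """Squared distance from centre a to the affine set of all completions
    of x (NaN marks a structurally missing feature). The minimiser copies
    a's values into the missing slots, so only observed coordinates count."""
    obs = ~np.isnan(x)
    return np.sum((x[obs] - a[obs]) ** 2)

x = np.array([0.7, np.nan, -0.2])   # second feature structurally missing
a = np.array([0.1, 0.5, 0.0])
print(point_to_line_dist(x, a))     # measured over features 1 and 3 only
```

For non-trivial kernels the completion set maps to a curve in feature space and no such closed form exists, which is what the methods below address.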

  17. Development of system Dealing with Missing Data • Chechik's GM-SVM method provides an analogue of the binary SVM for structurally missing data • Uses two loops of optimisation to replace instance-specific norms with scalings of the full norm • Questionable applicability to kernel spaces: proper scaling terms are difficult to choose, and the method is ultimately equivalent to zero imputation

  18. Development of system Synopsis for Novel System • Abstract, context-free • Kernel extension • Domain description (one-class) • Structurally missing features • Avoid imputation / probabilistic models • Applicable to provided data

  19. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  20. Exact-centre method • Seeks solution in input space only • Demonstrates concept of optimisation-based distance metric

  21. Exact-centre method • Cannot sample from entire feature space! • Selects centre point a such that φ(a) is optimal centre (hence solves a slightly different problem) • Tricky (but possible) to optimise for soft margins
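A hard-margin sketch of the idea, assuming complete data for clarity: search over input-space candidates a and minimise the worst-case kernel distance, using ||φ(a) − φ(x)||² = k(a,a) − 2 k(a,x) + k(x,x). The optimiser choice is illustrative; incomplete points would contribute via the point-to-line distance above.

```python
import numpy as np
from scipy.optimize import minimize

def quad_k(x, y):
    """Inhomogeneous quadratic kernel k(x, y) = (1 + <x, y>)^2."""
    return (1.0 + x @ y) ** 2

def exact_centre(X, k=quad_k):
    """Find a point a in *input* space whose image phi(a) minimises the
    worst-case kernel distance to the data (hard-margin sphere radius)."""
    def radius_sq(a):
        return max(k(a, a) - 2.0 * k(a, x) + k(x, x) for x in X)
    res = minimize(radius_sq, X.mean(axis=0), method='Nelder-Mead')
    return res.x, res.fun       # centre in input space, squared radius
```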

  22. Exact-centre method • Always performs at least as well as imputation in linear space w.r.t. sphere volume • Often underperforms in quadratic space (expected, since the centre is restricted to the image of an input-space point)

  23. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  24. Dual-optimisation method • Motivated by the desire to search over the entire kernel feature space, to match imputation methods for non-trivial kernel maps • Takes its lead from the dual formulation of SVDD, where weighting factors αi are appended to the dataset and implicitly describe the centre a

  25. Dual-optimisation method • a must itself have full features, and therefore so must the “xi” in the sum • Must therefore provide an auxiliary dataset X* with full features to perform this computation • Choice of X* is largely arbitrary, but it must span the feature space • Weighting factors are no longer “tied” to the dataset

  26. Dual-optimisation method Given an initial guess α: • First produce a full dataset Xa optimally aligned to a, by optimising over all possible imputations of the incomplete dataset • Then perform a minimax optimisation step on the vector of point-to-centre distances → new candidate α at each outer optimisation step
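A sketch of this alternating loop, specialised (as an assumption) to the linear kernel so that both inner steps are explicit; the talk's method performs the analogous steps in kernel space. NaN encodes a structurally missing feature.

```python
import numpy as np
from scipy.optimize import minimize

def dual_opt_linear(X, X_star, n_iters=20):
    """Alternating sketch of the dual-optimisation method for the linear
    kernel. alpha weights the auxiliary full-feature set X_star and
    implicitly defines the centre a = X_star.T @ alpha."""
    n = len(X_star)
    alpha = np.full(n, 1.0 / n)                   # initial guess
    bounds = [(0.0, 1.0)] * n
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    for _ in range(n_iters):
        a = X_star.T @ alpha
        # Step 1: imputation optimally aligned to a. For the linear kernel
        # the nearest completion copies a into the missing coordinates.
        Xa = np.where(np.isnan(X), a, X)
        # Step 2: minimax step. Choose alpha minimising the worst
        # point-to-centre distance over the (now complete) dataset.
        radius = lambda al: np.max(np.sum((Xa - X_star.T @ al) ** 2, axis=1))
        alpha = minimize(radius, alpha, bounds=bounds, constraints=cons,
                         method='SLSQP').x
    return alpha
```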

  27. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  28. Experimental data collection Synthetic Data Preparatory trials on datasets constructed to exhibit degrees of “structural missingness”: • 2-D cluster of data with censoring applied to all values |x| > 1 • Two disjoint clusters, one in [f1,f2] and one in [f3,f4] • One common dimension plus three other dimensions, each common to one part of the set
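A possible construction of the first two synthetic sets, with NaN encoding a structurally missing value; sizes, scales and offsets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Set 1: a 2-D Gaussian cluster in which any value with |x| > 1 is
# censored, i.e. recorded as structurally missing.
X1 = rng.normal(scale=0.8, size=(200, 2))
X1[np.abs(X1) > 1.0] = np.nan

# Set 2: two disjoint clusters living in features [f1, f2] and [f3, f4]
# respectively; each point observes only its own two features.
A = np.full((100, 4), np.nan); A[:, :2] = rng.normal(size=(100, 2))
B = np.full((100, 4), np.nan); B[:, 2:] = rng.normal(loc=2.0, size=(100, 2))
X2 = np.vstack([A, B])
```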

  29. Experimental data collection Synthetic Data Structure of comparisons: • Methods: imputation with [zeros, feature means, 3 nearest neighbours] vs. our XC and DO methods • Kernels: linear K(x,y) = <x,y> and quadratic K(x,y) = (1 + <x,y>)² • Margins: hard (all data within sphere) and soft (50% outwith sphere)

  30. Experimental data collection Synthetic Data Structure of comparisons (continued): • Dual-optimisation method on hard margins only • Particle-Swarm Optimisation also used to provide a cross-validated classification study • Main study is into the effect on sphere size

  31. Experimental data collection Feature Extraction Four main feature groups selected for analysis: • Compton edge position (6 features) • Area under graph up to Compton edge (6) • Mean multiplicity of neutron data (1) • Poisson fit on neutron data (9) and chi-squared goodness-of-fit (3) Total: 25 features

  32. Experimental data collection Feature Extraction • PCA used on groups of features with identical presence flags to reduce the dataset to 10 principal components → missingness pattern kept intact
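One way this groupwise reduction might look, assuming NaN-coded missingness and scikit-learn's PCA; the grouping interface is hypothetical, since the talk only states that PCA is run per presence-flag group.

```python
import numpy as np
from sklearn.decomposition import PCA

def groupwise_pca(X, groups, n_components):
    """Run PCA separately on each group of columns sharing a presence flag,
    so a row that misses a group stays missing in the reduced dataset.
    `groups` maps a group name to its column indices; `n_components` maps
    it to the reduced dimensionality (totalling 10 in the talk)."""
    blocks = []
    for g, cols in groups.items():
        present = ~np.isnan(X[:, cols]).any(axis=1)   # rows observing group g
        block = np.full((len(X), n_components[g]), np.nan)
        block[present] = PCA(n_components[g]).fit_transform(X[present][:, cols])
        blocks.append(block)
    return np.hstack(blocks)
```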

  33. Presentation Contents • High-level motivation • Development of system • Exact-centre method • Dual-optimisation method • Experimental data collection • Conclusions

  34. Conclusions • Dual-Opt method generally equalled or surpassed imputation methods in hard-margin cases; the XC method, predictably, did not perform as well in the quadratic case • Unreasonably small spheres start appearing with a soft-margin classifier, as datapoints with few features come to hold too much weight • Cross-validation study using a joint optimiser shows improvement with the quadratic kernel

  35. Conclusions • Insight provided into the behaviour of a kernel method with missing data: little of the literature deals with this issue • A link exists with the Randomised Maximum Likelihood (RML) sampling technique • Deliberate concentration, for now, on entirely uninformed methods; scope exists to incorporate prior information, where known, to improve efficiency

  36. Conclusions Caveats • Sphere size ≠ overall classification accuracy (cf. a delta-function Parzen window), but this is arguably not what we set out to achieve • Divergent remit: not a catch-all procedure for handling all types of data, but gives insight into how structural missingness can be analysed

  37. Conclusions Room for Improvement • Fuller exploration of the PSJO technique to provide an alternative to the auxiliary dataset • Heavily reliant on optimisation procedures: these could be made more efficient than the current nested loop • Extension to the popular radial-basis-function (RBF) kernel • A more concrete application to the sponsor dataset

  38. Thank you for listening…
