GSC-II Classifications
Oct 2000 Annual Meeting
V. Laidler, G. Hawkins, R. White, R. Smart, A. Rosenberg, A. Spagna
Preliminary Classification
Goal: Classify as well as possible to the plate limit
Metric: Minimize the overall number of errors
Procedure:
• Use ranks to handle plate-to-plate variation
• Match the training population to the sky population
• OC1 oblique decision tree (Murthy et al.)
• Build several decision trees and let them vote
• Classification categories: star / nonstar / defect
Classification / Laidler
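The "build several trees and let them vote" step can be sketched as a simple majority vote over per-tree labels. This is an illustration only, not the OC1 implementation; the per-tree labels below are hypothetical, and the tie-break rule here is a placeholder (the actual pipeline breaks ties with the multi-plate weights described later).

```python
from collections import Counter

def tree_vote(labels):
    """Majority vote among the class labels produced by several decision trees.

    Ties are broken deterministically by taking the alphabetically first
    label among the leaders (placeholder rule; the real pipeline uses
    plate weights to break ties).
    """
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)

# three trees vote on one object
print(tree_vote(["star", "star", "nonstar"]))  # star
```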
Next Step Classification
Goal: Provide reliable guide stars to V ~ 19(?)
Metric: Minimize contamination of "stars" to Vlim while maintaining sufficient completeness for adequate coverage
Contamination: we called it a star, but it is really nonstellar
Completeness: everything that is really a star is called a star
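The two metrics defined above reduce to confusion-matrix ratios. A minimal sketch (the counts are made-up illustrative numbers, not survey results):

```python
def contamination(stars_called_star, nonstars_called_star):
    """Fraction of objects we called 'star' that are really nonstellar."""
    return nonstars_called_star / (stars_called_star + nonstars_called_star)

def completeness(stars_called_star, stars_missed):
    """Fraction of real stars that we called 'star'."""
    return stars_called_star / (stars_called_star + stars_missed)

# illustrative counts: 90 real stars called star, 10 nonstellar objects
# called star, 30 real stars classified as something else
print(contamination(90, 10))  # 0.1
print(completeness(90, 30))   # 0.75
```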
Development Areas
• Multi-plate weighted voting
• Training set magnitude distribution
• Training set sources
• Classification categories
• Classification features
• Object selection
(Each area is marked on the slide as available, in progress, or future.)
Multi-plate Weighted Voting
• Weights calculated empirically from percentages of misclassifications (NED, NPM, ~4 plates per survey)
• Compensates for observed bias in the classifier and breaks ties
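A sketch of how such a weighted vote might combine single-plate classifications. The weight table, its keys, and all numbers below are hypothetical; the slide says only that the weights come from empirical misclassification percentages.

```python
def weighted_vote(plate_votes, weights):
    """Combine per-plate classifications with empirical reliability weights.

    plate_votes: list of (plate_id, class_label) pairs for one object.
    weights:     weights[plate_id][class_label] = empirical reliability,
                 e.g. 1 - misclassification rate for that class on that plate.
    """
    totals = {}
    for plate, label in plate_votes:
        totals[label] = totals.get(label, 0.0) + weights[plate][label]
    # highest weighted total wins; sorted() makes exact ties deterministic
    return max(sorted(totals), key=totals.get)

# hypothetical reliability weights for two plates
weights = {"XP330": {"star": 0.95, "nonstar": 0.90},
           "XP853": {"star": 0.80, "nonstar": 0.85}}
print(weighted_vote([("XP330", "star"), ("XP853", "nonstar")], weights))  # star
```

Because each plate's weight reflects how often that plate's classifier is right for that class, a reliable plate can outvote a less reliable one even one-against-one, which is how the scheme breaks ties.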
Multi-plate Weighted Voting Compared to the Mendez Galaxy Model
• Current classification comes from a single plate
• Multi-plate weighted voting is a straightforward database operation
• Conservative star selection further reduces contamination; coverage remains adequate
Training Set Magnitude Distribution: What Happens to V < Vlim Objects?
Preliminary approach: V < Vlim objects
• occupy 20% of the ranked hyperspace
• are outnumbered when counting errors
• contain the same classification bias as the sky
Optimized approach: V < Vlim objects
• have more dynamic range
• contribute all the weight when counting errors
• are free of classification bias
Training Set Sources / Classification Categories
• Decision trees can be improved by using training sets with smaller dispersion in parameter space
• Catalog objects will likely provide cleaner, better separated populations
• Galaxies and blends are different => they reside in different areas of parameter space => individually they constitute better defined populations than when combined
• Galaxy / blend classifications are value added to the catalog
New Training Set
• Magnitude-balanced to F = 17: bright only
• Star / galaxy / blend classifications
• Stars and galaxies from catalogs: NED, NPM, CAMC, LCRS
• Blends from deblender "parent" objects
• 1200 objects; XP330, XP853, XP005; b = {48, 41, 28}
New Training Set: Compare to the Production Classifier
• "Above all, do no harm"
• Visually examine objects that changed classifications
New Training Set: Compare to External Catalogs
• Significant improvement in the magnitude range of the training set
• Extend the training set: can we extend this performance to Vlim?
• Possibly use star / galaxy / blend to Vlim, star / nonstar / defect below
Future Work: Classification Features
• The "curse of dimensionality" tells us that tree performance can be improved by reducing the number of features
• The Edinburgh group has used two features specifically to separate blends from galaxies
Current classification features:
• Maximum density
• Integrated density
• Semimajor axis
• Semiminor axis
• Ellipticity
• Unweighted semimajor axis
• Unweighted semiminor axis
• Unweighted ellipticity
• 4 texture features
• 2 spike features
• 16 areas
Future Work: Object Selection
• Object selection can be considered an additional classification step
• Select based on: blend status, multi-plate information, probability
• Select for functional or science goals: minimize contamination, maximize completeness
• Probability comes from the leaf population; the final probability comes from averaging the probabilities from each tree
• Can we use probabilities to further optimize guide star selection?
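The probability bullets above can be sketched directly: the per-tree probability is the class fraction among the training objects in the leaf the object reaches, and the final probability averages those over trees. The leaf counts below are made-up illustrative numbers.

```python
def leaf_probability(leaf_counts, label):
    """P(label) at one leaf = fraction of the leaf's training objects with that label."""
    return leaf_counts.get(label, 0) / sum(leaf_counts.values())

def ensemble_probability(leaves, label):
    """Final probability = average of the per-tree leaf probabilities."""
    return sum(leaf_probability(counts, label) for counts in leaves) / len(leaves)

# one object reaches these leaves in two different trees
leaves = [{"star": 9, "nonstar": 1}, {"star": 3, "nonstar": 1}]
print(ensemble_probability(leaves, "star"))  # (0.9 + 0.75) / 2 = 0.825
```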
What Do the Probabilities Mean?
• Do the probabilities measure the observed population? No. This is not unexpected: decision trees are optimized to produce correct answers, not accurate models of the probability function.
• Do the probabilities indicate reliability? Yes.
• Conclusion: we can use the probabilities to construct a "class quality" field, but should not take them at face value.
How to Improve a Classifier
Using Ranks
• Sort the objects in order by the raw feature
• Assign a ranked feature based on position in the list
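The two steps above amount to a rank transform of each feature. A minimal sketch (the function name is illustrative; ties are broken by input order here, and the slide does not specify a tie rule):

```python
def ranked_feature(raw_values):
    """Replace each raw feature value by its position in the sorted list.

    Ranks are invariant under any monotonic plate-to-plate change in the
    raw feature's scale, which is why ranking helps handle plate variation.
    """
    order = sorted(range(len(raw_values)), key=raw_values.__getitem__)
    ranks = [0] * len(raw_values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

print(ranked_feature([10.2, 3.1, 7.7]))  # [2, 0, 1]
```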