Interactive Pattern Discovery with Large Imaging Databases. Tin Kam Ho Computing Sciences Research Center Bell Labs, Lucent Technologies In collaboration with David Wittman, J. Anthony Tyson of UC Davis Samuel Carliles, William O’Mullane, Alex Szalay of JHU.
Solving the Puzzle with a 3-step Approach • Describe each symbol shape with a numerical vector [23 12 17 28 11 …] • Find clusters of symbol shapes • Interpret each cluster using context
10.10.10 51.37.50.54.41.35.37 39.47.33.44 13.13 33.52.6.52 83.65.73.68 73.84 72.65.83 83.69.84 65 71.79.65.76 79.70 82.69.83.84.79.82.73.78.71 83.69.82.86.73.67.69 79.78 73.84.83 76.79.78.71.13.68.73.83.84.65.78.67.69 78.69.84.87.79.82.75 70.65.83.84.69.82 84.72.65.78 84.72.69 83.89.83.84.69.77 73.84.83.69.76.70 68.73.83.67.79.78.78.69.67.84.83 67.65.76.76.83 65.70.84.69.82 65 67.65.66.76.69 66.82.69.65.75.14 *** SERVICE GOAL -- AT&T said it has set a goal of restoring service on its long-distance network faster than the system itself disconnects calls after a cable break.
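The number sequence on this slide can be checked against the plain text that follows it. In this transcript each number equals the character's ASCII code minus 32. That offset is an observation about the slide, not something the talk states, and in the actual puzzle the mapping from cluster to character has to be inferred from context. A minimal Python sketch (the function name `decode` is hypothetical):

```python
def decode(sequence: str, offset: int = 32) -> str:
    """Turn dot-separated codes (words separated by spaces) into text.

    Assumption: each code is the ASCII value of the character minus `offset`.
    """
    words = []
    for token in sequence.split():
        words.append("".join(chr(int(code) + offset) for code in token.split(".")))
    return " ".join(words)


header = decode("10.10.10 51.37.50.54.41.35.37 39.47.33.44 13.13 33.52.6.52")
body = decode("83.65.73.68 73.84 72.65.83 83.69.84 65 71.79.65.76")
print(header)  # *** SERVICE GOAL -- AT&T
print(body)    # said it has set a goal
```

With a real set of symbol clusters the offset would be replaced by a lookup table built up while interpreting each cluster in context.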
The Deep Lens Survey (Tyson, Wittman, …) BVRz to 26 mag over 28 sq. degrees http://dls.physics.ucdavis.edu/
Weak Gravitational Lensing: uses distortion of background galaxies to map foreground mass concentrations J.A. Tyson, DLS 2002
Stars or Galaxies? J.A. Tyson, DLS 2002
Discrimination task depends on tiny differences in color and shape • Survey reaches an unprecedented depth: most objects have never been observed before and nobody knows their true classification • How does one build confidence in the results of the classifier? • Need to correlate several perspectives: object characteristics in the color space, shape parameters, the brightness statistics • Visualization can help verify correctness of preprocessing steps, clean up undesirable artifacts, choose relevant samples, spot explicit patterns, select useful features, and suggest algorithms and models
The Virtual Observatory http://www.us-vo.org/ http://www.ivoa.net/
Essential Steps in Automatic Pattern Recognition • Samples → Feature Extraction → features • Supervised learning: features → Classifier Training → classifier → Classification → class membership • Unsupervised learning: features → Clustering → Cluster Validation → Cluster Interpretation [Diagram: scatter plot of feature 1 vs. feature 2]
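The supervised branch of this flow (features, classifier training, classification, class membership) can be illustrated with a nearest-centroid classifier. This is only a sketch under assumed data: the two-feature samples, the "star"/"galaxy" labels, and the function names are hypothetical, and the talk does not say which classifier was used.

```python
import math
from collections import defaultdict


def train_centroids(samples, labels):
    """Classifier training: store the mean feature vector of each class."""
    groups = defaultdict(list)
    for features, label in zip(samples, labels):
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in groups.items()
    }


def classify(centroids, features):
    """Classification: assign the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical (feature 1, feature 2) pairs, e.g. a color index and a shape measure.
samples = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["star", "star", "galaxy", "galaxy"]
centroids = train_centroids(samples, labels)
print(classify(centroids, (0.15, 0.15)))
print(classify(centroids, (0.70, 0.95)))
```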
Data Relationships Across Multiple Feature Sets [Diagram labels: Data Mining; Simulation Analysis; Parameters; Responses; Feature Set A; Feature Set B; Feature Computation; Filtering, Clustering; Clustering; Unknown Relationship]
Key Algorithms • Clustering: find natural groups in data, construct index structures to facilitate proximity queries • Dimensionality reduction: embed high-dimensional data in 2D displays • Navigation: traverse index structures in systematic ways
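As one illustration of the dimensionality-reduction step, the sketch below projects points onto their two leading principal axes using power iteration. Principal component analysis is a stand-in chosen here; the talk does not name the embedding method, and all function names are hypothetical.

```python
import math


def center(rows):
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=500):
    """Power iteration. Caveat: fails if the start vector is orthogonal
    to the leading eigenvector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    value = sum(a * b for a, b in zip(v, matvec(matrix, v)))
    return value, v


def pca_2d(rows):
    """Return (x, y) display coordinates for each high-dimensional row."""
    centered = center(rows)
    cov = covariance(centered)
    value1, axis1 = top_eigenpair(cov)
    d = len(cov)
    # Deflation: remove the first axis so the next iteration finds the second.
    deflated = [[cov[i][j] - value1 * axis1[i] * axis1[j] for j in range(d)]
                for i in range(d)]
    _, axis2 = top_eigenpair(deflated)
    return [(sum(a * b for a, b in zip(r, axis1)),
             sum(a * b for a, b in zip(r, axis2))) for r in centered]


# Hypothetical 3-D data lying close to a line.
rows = [(float(t), 2.0 * t, 0.1 if t % 2 == 0 else -0.1) for t in range(10)]
coords = pca_2d(rows)
```

The first display axis captures the direction of greatest variance, so points that are spread along a line in the original space stay spread out in the 2D view.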
Clustering Methods • Model-based clustering: identification of finite mixtures • Partitional clustering: divides data set into N mutually exclusive subsets • Hierarchical clustering: top-down procedures (tree splitting); bottom-up, agglomerative procedures (merge similar clusters successively)
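The bottom-up procedure can be sketched in a few lines. This version uses single linkage (distance between the closest members) and Euclidean distance; both choices, and the function name `agglomerate`, are assumptions made for illustration. It is quadratic per merge and only suitable for small examples.

```python
import math


def agglomerate(points, target_clusters):
    """Merge the two closest clusters until `target_clusters` remain.

    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > target_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
result = agglomerate(points, 3)
print(result)  # [[0, 1], [2, 3], [4]]
```

Recording the merge order instead of stopping at a fixed count would give the cluster tree that a hierarchical view can traverse.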
Similarity / Clustering of Objects from Different Perspectives • Objects can be described by many types of attributes: position, weight, shape, spectrum, time variability, … • Meaningful similarity metric exists only for the same type of attributes • Clusters found from one perspective need to be correlated to those from others, e.g. are the objects similar in color also similar in shape? [Diagram: Shape clusters, Color clusters]
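One way to quantify whether objects that are similar in color are also similar in shape is to cross-tabulate the two cluster assignments and compute an agreement score. The sketch below uses the Rand index, which is a choice made here rather than a measure named in the talk; the labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def contingency(labels_a, labels_b):
    """Count objects for each (cluster in A, cluster in B) combination."""
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two clusterings agree
    (both group the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


color = ["red", "red", "blue", "blue"]
shape_same = ["round", "round", "long", "long"]
shape_mixed = ["round", "long", "round", "long"]

print(contingency(color, shape_same))
print(rand_index(color, shape_same))   # 1.0
print(rand_index(color, shape_mixed))
```

A score near 1 means the two perspectives group the objects the same way; a low score means the color clusters cut across the shape clusters.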
Exploratory Tools Needed • To bring in domain expertise, interpretation context • To visualize data or classifier geometry • To track point/class correlations • To test tentative classifications • To compare groupings from different perspectives • To relate numerical data to other data types • To facilitate systematic, repeatable explorations
Mirage for Interactive Pattern Recognition http://www.cs.bell-labs.com/who/tkh/mirage Data Display in Linked Views • Show patterns in histograms, scatter plots, parallel coordinates, tables, and images Selection and Tracking • Select points in any view, broadcast to all others Traversal of Data Structures • Walk in histograms, cluster graphs or trees, echoed in all other views Graphical Utilities • Open multiple-page plots with arbitrary configuration Command Scripts • Run prepared groups of operations as an animation Intuitive Graphical Tool for • Exploratory Data Analysis • Visualization of Clusters and Classes • Correlation of Proximity Structures • Manual or Automatic Classification
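The "select points in any view, broadcast to all others" behavior can be pictured as a small publish/subscribe arrangement. The sketch below is not Mirage's implementation (Mirage is built on Java Swing); the class and method names are hypothetical and the views only record which row indices are highlighted.

```python
class SelectionHub:
    """Relays a selection made in one view to every other registered view."""

    def __init__(self):
        self._views = []

    def register(self, view):
        self._views.append(view)

    def broadcast(self, source, indices):
        chosen = frozenset(indices)
        for view in self._views:
            if view is not source:
                view.highlight(chosen)


class View:
    """Stand-in for a histogram, scatter plot, table, or image view."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.highlighted = frozenset()
        hub.register(self)

    def user_select(self, indices):
        # The user selects rows here; all other views echo the selection.
        self.highlighted = frozenset(indices)
        self.hub.broadcast(self, indices)

    def highlight(self, indices):
        self.highlighted = indices


hub = SelectionHub()
scatter = View("scatter plot", hub)
histogram = View("histogram", hub)
table = View("table", hub)

scatter.user_select([1, 4])
print(sorted(histogram.highlighted))  # [1, 4]
```

Because every view shares the same row indices, a selection in one plot identifies the same objects in all the others.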
Software Features • Based on Java Swing library • Intuitive, easy-to-use graphical operations • Multiple-page, arbitrary plot configurations • Online or offline cluster analysis • GUI or script-driven command execution • Database interface via JDBC • Ready to be adapted for on-line monitoring • Ready to be integrated with database access and decision support systems
Design Motivated by the Needs • Interactive plays, intuitive operations: to bring domain experts into the loop • Multiple types of plots, extensible for more: to visualize data or classifier geometry • Linked views, traversal actions: to track point/class correlations • Highlights, colors: to test tentative classifications • Projection to arbitrary subspaces: to compare groupings in different perspectives • Linking data with images: to relate numerical data to other types • Command scripting: to facilitate systematic, repeatable explorations
Challenges for the Analysis Tool • Separate treatment of non-comparable groups of variables • Versatile visualization utilities allowing many perspectives • Support for exploratory discovery across diverse data types • Integrate manual & automatic pattern recognition methods • Also, a good tool should: - leverage existing visualization and analysis methods - enable continued growth: new visualization, analysis tools - support interface with existing databases - be scalable in data volume and processing speed
Towards Extensibility [Diagram labels: Mirage Core; External Rendering Code; VO Data Archives; Custom Data Views; Data Access Clients; Cone Search, CAS; FITS viewer, …; Python? Matlab?; Extinction Calculator; Data Analysis Methods; Data Exchange Pipes; Other Analysis Platforms; Web Services]
VO Enabled Mirage (with Samuel Carliles, William O’Mullane, and Alex Szalay)
VO Enabled Mirage • http://skyservice.pha.jhu.edu/develop/vo/mirage/ • Load VOTable data and perform VO Cone/SIAP and SDSS CAS searches using IVOA Client Package • Astronomical imaging module loads FITS images using JSky classes, supporting image operations: select data points and broadcast selection to other views; cut levels; colormap; SAO DS9-style brightness/contrast enhancement; zoom
Extinction Web Service (with Chris Miller, Simon Krughoff) Using DIRBE/IRAS Dust Maps by Schlegel et al. • Object selection in Mirage Core • Extracts RA, DEC, [mag] from Mirage data set (positions, mags) • SOAP client calls Extinction server (positions, mags, filterIDs) • Extinction Service returns result stream (E(b-v), dered_mags) • Merges results with Mirage data set (enhanced data set)
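The extract, call, and merge steps of this workflow can be sketched as follows. The SOAP call is replaced by a local stand-in function, the row layout is hypothetical, and the per-filter coefficients are the commonly quoted A/E(B-V) ratios for the SDSS ugriz bands from Schlegel et al.; whether the actual service uses these bands or values is an assumption.

```python
# Assumed A_filter / E(B-V) ratios for SDSS bands (Schlegel et al. dust maps).
R_FILTER = {"u": 5.155, "g": 3.793, "r": 2.751, "i": 2.086, "z": 1.479}


def mock_extinction_service(positions):
    """Hypothetical stand-in for the SOAP call: one E(B-V) per (ra, dec)."""
    return [0.05 for _ in positions]


def deredden(rows, service=mock_extinction_service):
    """Extract positions, query the service, merge results into new rows."""
    positions = [(row["ra"], row["dec"]) for row in rows]
    ebv_values = service(positions)
    enhanced = []
    for row, ebv in zip(rows, ebv_values):
        merged = dict(row)  # copy so the original data set is left untouched
        merged["ebv"] = ebv
        merged["dered_mag"] = row["mag"] - R_FILTER[row["filter"]] * ebv
        enhanced.append(merged)
    return enhanced


rows = [{"ra": 150.0, "dec": 2.0, "mag": 20.0, "filter": "r"}]
enhanced = deredden(rows, service=lambda positions: [0.1] * len(positions))
print(enhanced[0]["dered_mag"])
```

Keeping the service behind a plain function argument makes it easy to swap the stand-in for a real web-service client later.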
More at NVO Public Release 1.0 205th Meeting of the American Astronomical Society 9-13 January 2005 San Diego, CA Wednesday, 12 January Astronomical Research with the Virtual Observatory
Analysis of Simulations of Control Dynamics in Optical Transport Systems (with the FROG collaboration) [Diagram: fiber link from Head End Terminal through Repeaters and a Gain Equalizer to Tail End Terminal; signal spectrum with noise floor]
Monitoring Network Traffic (with Marina Thottan, Ken Swanson) Software tool for online monitoring and analysis of QoS in IP networks • continuously monitors traffic statistics at edge and core devices • synthesizes statistics in real time to obtain network-wide QoS status and general network element health indicators • Mirage refreshes displays on alerts of database updates via Java Messaging Service [Diagram labels: SEQUIN; SLA verification; Provisioning; Billing; SNMP polling; MPLS IP Core (QoS-guaranteed paths); DiffServ Edge (aggregation and classification)]