This research paper presents the FlowString interface that uses a shape invariant similarity measure to match partial streamlines for exploratory flow visualization. The approach includes generating a character-level alphabet and word-level vocabulary, resampling streamlines, calculating dissimilarity measures, clustering, constructing a streamline suffix tree, and performing exact vs. approximate search.
FlowString: Partial Streamline Matching using Shape Invariant Similarity Measure for Exploratory Flow Visualization Jun Tao, Chaoli Wang, Ching-Kuang Shene Michigan Technological University Presented at IEEE Pacific Visualization Symposium March 5, 2014 Yokohama, Japan
FlowString interface [Figure labels: Alphabet and vocabulary; Streamline query; Query result; Query string; Streamline set; Textual; Visual; Parameters]
Streamline similarity measures • Proximity-based measures • Leverage spatial proximity between integral curves • Feature-based measures • Extract geometrical, topological or domain specific features for similarity analysis • Distribution-based measures • Capture feature distributions for more robust similarity comparison • Transformation-based measures • Map data properties or features into a transformed space for similarity measuring
Our solution • Shape-based measure • Extract features that are invariant under translation, rotation and scaling • Support flexible partial streamline matching • Approach • Advocate a vocabulary approach • Construct character-level alphabet and word-level vocabulary • Design intuitive and convenient user interface and interaction
Terms (1/2) • Character (low-level shape descriptor) • Unique local shape primitive extracted from streamlines • Alphabet • A set of characters describing various local shapes • Word (high-level shape descriptor) • A sequence of characters encoding a streamline shape pattern • Vocabulary • A set of words describing various regional shapes
Terms (2/2) • String • Mapping of a global streamline to a sequence of characters • Substring • Encoding a portion of the corresponding streamline
Notations • Character • a (same order) • a’ (reversed order) • A (both orders) • Multiple characters with common features (|) • (a1 | a2 | … am) • Word concatenation (| and &) • [abc]|[bbc] (segments that match either abc or bbc) • [abc]&[bbc] (segments that match both abc and bbc, some distance apart) • Other symbols • a+ (single character repetition) • ? and * (wildcard symbols)
Outline of FlowString approach • Alphabet generation • Streamline resampling • Dissimilarity measure • Affinity propagation clustering • String operation • Streamline suffix tree • Vocabulary construction • Exact vs. approximate search
Streamline resampling (1/2) • Goal: local features with the same shape but different scales should receive a similar number of sample points • Criteria: • A streamline segment between two sample points should be simple enough (no feature is ignored) • The density of sample points should be related to the local feature size • Solution: maintain a constant accumulated curvature between two neighboring sample points along the streamline
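The resampling idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the streamline is a polyline, uses the turning angle at each vertex as a discrete stand-in for accumulated curvature, and the names `turning_angle`, `resample_by_curvature`, `circle`, and the `threshold` parameter are hypothetical.

```python
import math


def turning_angle(p, q, r):
    """Angle in radians between the segments p->q and q->r."""
    u = [b - a for a, b in zip(p, q)]
    v = [b - a for a, b in zip(q, r)]
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # degenerate segment: treat as no turning
    cos_t = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos_t)))


def resample_by_curvature(points, threshold):
    """Return indices of kept vertices.

    A vertex is kept whenever the turning angle accumulated since the
    previous kept vertex reaches `threshold` (an assumed parameter).
    The two end points are always kept.
    """
    if len(points) < 3:
        return list(range(len(points)))
    kept = [0]
    accumulated = 0.0
    for i in range(1, len(points) - 1):
        accumulated += turning_angle(points[i - 1], points[i], points[i + 1])
        if accumulated >= threshold:
            kept.append(i)
            accumulated = 0.0
    kept.append(len(points) - 1)
    return kept


def circle(radius, n=360):
    """Hypothetical test curve: n vertices, one degree apart, on a circle."""
    return [
        (radius * math.cos(math.radians(i)), radius * math.sin(math.radians(i)), 0.0)
        for i in range(n)
    ]


# The same shape at two scales yields the same sample indices.
small = resample_by_curvature(circle(1.0), math.radians(29.5))
large = resample_by_curvature(circle(5.0), math.radians(29.5))
```

Because the turning angle does not depend on scale, the two circles receive the same number of samples, which is the stated goal. A straight segment accumulates no angle and keeps only its end points.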
Streamline resampling (2/2) Neighborhood size r = 7
Character concatenation • (a): characters assigned to all sample points, which produces a deterministic shape • (b) and (c): characters assigned to every r-1 sample points, which produces different shapes
Dissimilarity measure • Dissimilarity between the local shapes of two sample points (Pa and Pb) • Use the Procrustes distance, which minimizes a measure of shape difference • Ignore geometric positions and orientations • Require a registration (Procrustes superimposition) before distance calculation
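To make the registration step concrete, here is a hedged sketch of a Procrustes-style distance. It is deliberately simplified to planar (2D) point sequences, where the optimal rotation has a closed form via complex numbers; streamlines are 3D, and the slides do not specify which Procrustes variant is used. The function names are hypothetical.

```python
import math


def normalize(points):
    """Center 2D points at the origin and scale them to unit size.

    Returns the points as complex numbers (x + iy).
    """
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [v - centroid for v in z]
    norm = math.sqrt(sum(abs(v) ** 2 for v in z))
    if norm == 0.0:
        raise ValueError("all points coincide; shape is undefined")
    return [v / norm for v in z]


def procrustes_distance_2d(a, b):
    """Full Procrustes distance between two planar point sequences.

    Translation and scale are removed by `normalize`; the best rotation
    is implicit in taking the modulus of the complex inner product.
    Both sequences must have the same number of points, in
    corresponding order.
    """
    if len(a) != len(b):
        raise ValueError("point sequences must have equal length")
    za, zb = normalize(a), normalize(b)
    inner = sum(u * v.conjugate() for u, v in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))
```

A shape compared with a translated, rotated, and scaled copy of itself gives a distance of approximately zero, while genuinely different shapes give a positive value of at most 1.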
Affinity propagation clustering • Apply affinity propagation for clustering • Simultaneously consider all data points as potential exemplars • Automatically determine the best number of clusters • Perform two-level clustering to generate characters
Character generation (1/3) Second-level clustering result
Character generation (2/3) First-level clustering result
Character generation (3/3) Original shape primitives
Streamline suffix tree • Convert each streamline to a string using the alphabet • Construct a suffix tree to enable efficient operations on these strings • Linear time and space cost to construct the tree • Transform the problem of searching for a string into searching for a node in the tree • O(m+z) search time, where m is the length of the string and z is its number of occurrences
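The "search for a node" idea can be illustrated with a much simpler structure than a true suffix tree. The sketch below builds an uncompressed suffix trie, which takes quadratic time and space rather than the linear cost stated above; it is an assumed stand-in for illustration only, and `Node`, `build_suffix_trie`, and `find` are hypothetical names.

```python
class Node:
    """Trie node holding child links and the occurrences that reach it."""

    __slots__ = ("children", "occurrences")

    def __init__(self):
        self.children = {}
        self.occurrences = []  # (string_id, start_index) pairs


def build_suffix_trie(strings):
    """Insert every suffix of every string into one shared trie."""
    root = Node()
    for string_id, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                node.occurrences.append((string_id, start))
    return root


def find(root, pattern):
    """Walk the pattern from the root (m steps), then report occurrences."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return node.occurrences


# Two streamline strings over a small hypothetical alphabet.
trie = build_suffix_trie(["abcabc", "bbcab"])
hits = find(trie, "ab")  # [(0, 0), (0, 3), (1, 3)]
```

Each hit identifies a streamline and the position of the matching segment on it, so a partial match maps back to a portion of the streamline.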
Vocabulary construction • Automatically identify meaningful words to construct the vocabulary • Select the most common patterns from the streamlines (i.e., detect the most frequently occurring substrings) • Achieve through a simple depth-first search traversal of the streamline suffix tree • O(n) time, where n is the total length of the original strings (i.e., the number of nodes is linear in n)
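A depth-first traversal for frequent substrings might look like the following sketch. It counts occurrences in a plain trie capped at `max_len` characters rather than in a real suffix tree, and the selection criteria (`min_len`, `min_count`) are assumptions made for illustration; the slides do not state how words are ranked or filtered.

```python
def build_counting_trie(strings, max_len):
    """Count every substring of length <= max_len in a nested-dict trie."""
    root = {"count": 0, "children": {}}
    for s in strings:
        for start in range(len(s)):
            node = root
            for ch in s[start:start + max_len]:
                node = node["children"].setdefault(
                    ch, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return root


def frequent_words(root, min_len, min_count):
    """Depth-first traversal collecting substrings seen >= min_count times."""
    words = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            if child["count"] < min_count:
                continue  # no extension can occur more often than its prefix
            word = prefix + ch
            if len(word) >= min_len:
                words.append((word, child["count"]))
            stack.append((child, word))
    # Most frequent first, then longer words, then alphabetical order.
    return sorted(words, key=lambda wc: (-wc[1], -len(wc[0]), wc[0]))


trie = build_counting_trie(["abcabc", "bbcab"], max_len=3)
vocabulary = frequent_words(trie, min_len=2, min_count=2)
```

The pruning step relies on the fact that a substring can never be more frequent than any of its prefixes, so whole subtrees are skipped once the count drops below the threshold.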
Exact vs. approximate search (1/2) • The need for approximate search • Some pairs of characters represent more similar shapes than others • Segments that repeat a certain shape a different number of times often look similar • k-approximate search using dynamic programming, where k is a threshold on the edit distance • Extend to handle single character repetition (+) and multiple characters with common features (|)
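The dynamic-programming search can be sketched with the classic approximate substring matching recurrence. This is an illustration under assumptions, not the FlowString algorithm itself: it ignores the + and | extensions, charges a fixed cost of 1 for insertions and deletions, and takes an optional `sub_cost` function as a hypothetical hook for character-to-character shape dissimilarity.

```python
def unit_cost(a, b):
    """Default substitution cost: 0 for equal characters, 1 otherwise."""
    return 0 if a == b else 1


def approx_find(pattern, text, k, sub_cost=unit_cost):
    """Return 1-based end positions in `text` where some substring is
    within edit distance k of `pattern`."""
    m = len(pattern)
    # prev[i] = cost of matching pattern[:i] against a substring ending
    # at the previous text position; before any text, that cost is i.
    prev = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cur.append(min(
                prev[i - 1] + sub_cost(pattern[i - 1], tc),  # match/substitute
                prev[i] + 1,       # extra character in the text
                cur[i - 1] + 1,    # pattern character left unmatched
            ))
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


exact = approx_find("EE", "AEFEEB", k=0)  # [5]
loose = approx_find("EE", "AEFEEB", k=1)  # [2, 3, 4, 5, 6]
```

Passing a `sub_cost` that assigns a small cost to visually similar characters lets a query such as EE also match EF at a low threshold, which reflects the first motivation listed above.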
Exact vs. approximate search (2/2) • Exact matching: (E|F)(E|F) • Approx. matching (k = 15): (E|F)(E|F) • Exact matching: EE • Exact matching: FF • E: spiral with large torsion • F: spiral with small torsion
Summary • FlowString • Robust partial streamline matching using shape invariant features • Characters / alphabet and words / vocabulary metaphors • Intuitive user interface and interaction support • Future work • Conduct domain expert evaluation • Extend FlowString to handle multiple data sets • Release FlowString to benefit the community • Acknowledgements • U.S. National Science Foundation