Hierarchical Segmentation: Finding Changes in a Text Signal • Malcolm Slaney and Dulce Ponceleon • IBM Almaden Research Center
Problem Statement • Problem • How do we browse video? • Goal • Create a table-of-contents • Solution • Look for topic changes in text
TOC Example (figure labels: Chapter 1, Chapter 2)
Overview of This Talk • Goal and approach • Latent semantic indexing (LSI) • Scale space • Combination • Results (figure labels: Scale Space Filter, LSI, Segment)
Approach • Sentences -> Semantic Space • Filter at multiple scales • Look for large jumps • Three subjects (loops) shown • Loop 1: Polychromaticity Artifacts • Loop 2: Emission Tomography • Loop 3: Ultrasound Tomography
Building on Previous Work • LSI and clustering • Text tiling • Change point analysis • Segmentation • Scale space (image courtesy of Jianbo Shi, CMU)
Latent Semantic Indexing • Collect histogram of word frequencies • Use SVD to capture frequent combinations • Orthogonal decomposition • Represent in low-dimensional space (figure labels: Words, Docs, 10D)
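A minimal sketch of the LSI step, assuming a words-by-documents count matrix and NumPy's SVD. The function name `lsi_project`, the toy matrix, and the choice to scale document coordinates by the singular values are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def lsi_project(counts, k=10):
    """Project documents into a k-dimensional latent space.

    counts: (n_words, n_docs) array of word frequencies.
    Returns an (n_docs, k) array, one row per document.
    """
    counts = np.asarray(counts, dtype=float)
    # Thin SVD: counts = U @ diag(s) @ Vt, with orthonormal columns in U
    # and orthonormal rows in Vt (the orthogonal decomposition).
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Keep the k strongest directions; scale each document's
    # coordinates by the corresponding singular value.
    return (Vt[:k] * s[:k, None]).T

# Toy matrix: rows are words, columns are documents.
# Documents 0-1 share one vocabulary, documents 2-3 another.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
])
docs_2d = lsi_project(counts, k=2)
```

With all singular vectors kept, inner products between the projected documents match those of the original count columns; truncating to small k keeps only the dominant word combinations.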
LSI Within a Document • Split into chunks • Fixed size • Sentences • Compute histograms • Perform SVD • Look at results • Sources • “Principles of Computerized Tomographic Imaging” • PBS News Hour
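The chunking and histogram steps could look roughly like this. It is a sketch under simple assumptions: sentences end at `.`, `!`, or `?`, and words are runs of lowercase letters. The helper names `split_sentences` and `word_histograms` and the sample text are hypothetical.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

def word_histograms(chunks):
    """Return (vocab, matrix) where matrix[i][j] counts word i in chunk j."""
    counts = [Counter(re.findall(r"[a-z']+", c.lower())) for c in chunks]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for words that do not occur in a chunk.
    matrix = [[c[w] for c in counts] for w in vocab]
    return vocab, matrix

text = ("The scanner rotates. The detector measures the beam. "
        "Ultrasound uses sound waves.")
sentences = split_sentences(text)
vocab, matrix = word_histograms(sentences)
```

The resulting words-by-chunks matrix is the input to the SVD step; fixed-size chunks would replace `split_sentences` with a sliding window over the word list.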
LSI – 2D Projection (source: Chapter 4 of “Principles of Computerized Tomographic Imaging”)
LSI – Self-similarity • Measure similarity • Cosine of angle between “documents” • Plot all pairs of chunks/sentences • Look for block diagonal (source: Chapter 4 of “Principles of Computerized Tomographic Imaging”)
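The self-similarity matrix can be sketched as the cosine between every pair of chunk vectors. The function name `self_similarity` and the four toy vectors are assumptions made for illustration.

```python
import numpy as np

def self_similarity(vectors):
    """Cosine of the angle between every pair of row vectors."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero (empty) chunks as zero vectors
    unit = X / norms
    return unit @ unit.T

# Two chunks on one topic followed by two chunks on another.
chunks = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]])
S = self_similarity(chunks)
```

Plotting `S` as an image shows bright square blocks along the diagonal wherever consecutive chunks stay on one topic.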
Scale-space Filtering • What size are the features? • Look at different scales! • Continuous scale • Used for • Object Recognition • Feature Detection
Scale-space Movie • 10D semantic space • Scale varies from 1 to 400 sentences • Green line marks best high-level segmentation
Scale-space Segmentation • Low pass filter signal • Form image of scale vs. time • Look for changes • Track peaks of vector derivative across scale
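A rough sketch of the first three steps, assuming a Gaussian low-pass filter and edge padding (the slide specifies neither). The names `gaussian_kernel` and `scale_space_derivative` and the toy two-topic signal are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Truncate the Gaussian at about three standard deviations.
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def scale_space_derivative(signal, sigmas):
    """signal: (n_sentences, n_dims) array of sentence vectors.

    Returns an (n_scales, n_sentences - 1) image holding the length of
    the vector difference between neighbouring smoothed sentences.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    for sigma in sigmas:
        k = gaussian_kernel(sigma)
        pad = len(k) // 2
        # Repeat the first and last sentence so the output keeps its length.
        padded = np.pad(signal, ((pad, pad), (0, 0)), mode="edge")
        smooth = np.stack(
            [np.convolve(padded[:, d], k, mode="valid")
             for d in range(signal.shape[1])],
            axis=1,
        )
        rows.append(np.linalg.norm(np.diff(smooth, axis=0), axis=1))
    return np.array(rows)

# Toy signal: 10 sentences on one topic, then 10 on another.
signal = np.vstack([np.tile([1.0, 0.0], (10, 1)),
                    np.tile([0.0, 1.0], (10, 1))])
image = scale_space_derivative(signal, sigmas=[1.0, 2.0, 4.0])
```

Each row of `image` is one scale; the topic change between sentences 9 and 10 appears as a ridge that persists as the scale grows.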
Scale-space Example • Derivative as function of scale and sentence
LSI and Scale Space • Putting it all together • Split document/transcript • Perform LSI analysis • Look at change in angle • Perform scale-space segmentation • Show tree
Scale-Space Image • Peaks in scale-space derivative • Peaks traced to their origin
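One way to trace a peak back to its origin is to start at the coarsest scale and, at each finer scale, step to the nearest local peak. The slides do not give the tracking rule, so the nearest-neighbour rule, the `max_shift` limit, and the names `local_peaks` and `trace_to_origin` are all assumptions in this sketch.

```python
import numpy as np

def local_peaks(row):
    """Interior indices that exceed the left neighbour and are not below the right."""
    row = np.asarray(row, dtype=float)
    inner = np.arange(1, len(row) - 1)
    mask = (row[inner] > row[inner - 1]) & (row[inner] >= row[inner + 1])
    return inner[mask]

def trace_to_origin(image, start, max_shift=2):
    """Follow a peak from the coarsest row (last) down to the finest row (first).

    image: (n_scales, n_positions) array, rows ordered fine to coarse.
    start: position of the peak in the coarsest row.
    """
    pos = int(start)
    for row in image[-2::-1]:  # skip the coarsest row, walk toward the finest
        peaks = local_peaks(row)
        if len(peaks) == 0:
            continue
        nearest = int(peaks[np.argmin(np.abs(peaks - pos))])
        # Only follow the peak if it has not drifted too far between scales.
        if abs(nearest - pos) <= max_shift:
            pos = nearest
    return pos

# Toy derivative image: the coarse peak at 3 drifts to 4, then to 5.
image = np.array([
    [0, 1, 0, 0, 0, 3, 0, 0],   # finest scale
    [0, 0, 0, 0, 2, 1, 0, 0],
    [0, 0, 0, 2, 1, 0, 0, 0],   # coarsest scale
], dtype=float)
boundary = trace_to_origin(image, start=3)
```

The coarse scale decides which boundaries matter; the fine scale supplies the exact sentence position.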
Results – CT • Comparison • Scale-Space • Book Headings
Results – News • Comparison • Scale-Space • Ground Truth
Results – Autocorrelation • Block sentences • Measure correlation • Positive Peak • Anti-correlation
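The slide does not define the correlation measure. One possible reading, sketched here purely as an assumption, is the mean cosine between mean-centered sentence vectors as a function of their separation (lag). The name `lag_similarity` and the toy data are hypothetical.

```python
import numpy as np

def lag_similarity(vectors, max_lag):
    """Mean cosine between centered sentence vectors 1..max_lag apart."""
    X = np.asarray(vectors, dtype=float)
    # Remove the document-wide mean so that opposite topics can
    # produce negative (anti-correlated) values.
    X = X - X.mean(axis=0)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = X / norms
    return np.array([
        np.mean(np.sum(unit[:-lag] * unit[lag:], axis=1))
        for lag in range(1, max_lag + 1)
    ])

# Six sentences on one topic followed by six on another.
vectors = np.vstack([np.tile([1.0, 0.0], (6, 1)),
                     np.tile([0.0, 1.0], (6, 1))])
r = lag_similarity(vectors, max_lag=6)
```

In this toy case the value is positive at short lags and falls to -1 at a lag equal to the topic length, which is the kind of positive peak followed by anti-correlation the slide lists.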
Discussion Issues • Evaluation (and ground truth) • Lafferty’s measure • Temporal properties • Histogram/SVD chunking size • Autocorrelation
Computational Effort • Histogram: O(N) • SVD: O(N^3) • Scale space: O(N^2) • N < 1000 • Number of sentences in a video or document is not large
LSI Document Lookup • Histogram documents • Entropy term weighting • Compute SVD • Use first 10-100 vectors to model space • Encode query as histogram • Look for documents in similar direction
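The lookup steps might be sketched as follows. The slide names entropy term weighting without giving a formula, so this uses the common log-entropy scheme as an assumption. The function names, the toy counts, and the choice of k are illustrative, and the sketch assumes at least two documents.

```python
import numpy as np

def log_entropy_weights(counts):
    """Global weight per word (row) of an (n_words, n_docs) count matrix."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    p = counts / totals
    # p * log(p), using the convention 0 * log(0) = 0.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return 1.0 + plogp.sum(axis=1) / np.log(n_docs)

def build_index(counts, k):
    counts = np.asarray(counts, dtype=float)
    g = log_entropy_weights(counts)
    weighted = np.log1p(counts) * g[:, None]
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(k, len(s))
    doc_coords = (Vt[:k] * s[:k, None]).T  # one row per document
    return g, U[:, :k], doc_coords

def rank_documents(query_counts, g, U_k, doc_coords):
    # Weight the query histogram like a document, then project it
    # onto the same k left singular vectors.
    q = np.log1p(np.asarray(query_counts, dtype=float)) * g
    q_k = q @ U_k
    denom = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q_k)
    denom[denom == 0] = 1.0
    scores = doc_coords @ q_k / denom  # cosine with each document
    return np.argsort(-scores), scores

# Rows: words 0-1 belong to one subject, words 2-3 to another.
counts = np.array([
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
])
g, U_k, doc_coords = build_index(counts, k=2)
order, scores = rank_documents([1, 1, 0, 0], g, U_k, doc_coords)
```

Here the query uses only the first subject's words, so documents 0 and 1 rank first. The slide suggests 10 to 100 vectors for a real collection; k=2 is only for the toy data.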
LSI Example • Collection of book titles • Differential equations vs. algorithms and applications