Multimodal Alignment of Scholarly Documents and Their Presentations
Bamdad Bahrani
JCDL 2013 Submission, Feb 2013
Motivation
• How many papers do you read every week?
• How many do you read deeply?
• How many do you just skim?
• Are the title, abstract and conclusion enough?
• A summary of the paper: the most important issues
Motivation
• The slide presentation as a summary
  • It includes important content from the paper
  • It is made by the same author
• But
  • It is not detailed enough
  • It misses some technical parts of the paper
Introduction
• The paper
• Its slide presentation
• The alignment map between them
Previous Work
• Hayama et al., 2005
  • Japanese technical papers and presentation sheets
  • Using HMM
• Kan, 2007
  • SlideSeer
  • Crawling of paper-presentation pairs, aligning them, and a GUI
• Beamer and Girju, 2009
  • Detailed analysis of different similarity measures
• Only textual content is used
Error Analysis
• Around 70% show “Evaluation and Result” content
Alignment Modalities
• Text similarity
  • Between each slide and each section
  • The core aligner unit
  • The baseline
  • A cosine similarity measure over TF·IDF weights
• Linear ordering
  • The ordering between slides and sections is monotonic
• Visual appearance of slides
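The text-similarity baseline above can be sketched in a few lines of plain Python. This is a minimal illustration, not the system's actual code: the whitespace tokenizer, the smoothed IDF formula, and the function names (`tfidf_vectors`, `similarity_matrix`, `align_by_text`) are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): keeps every weight strictly positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(slides, sections):
    """sim[i][j] is the cosine similarity between slide i and section j."""
    vectors = tfidf_vectors(slides + sections)
    slide_vecs = vectors[:len(slides)]
    section_vecs = vectors[len(slides):]
    return [[cosine(s, p) for p in section_vecs] for s in slide_vecs]


def align_by_text(slides, sections):
    """Assign each slide to its most similar section (text only)."""
    sim = similarity_matrix(slides, sections)
    return [max(range(len(sections)), key=lambda j: row[j]) for row in sim]
```

Calling `align_by_text(slide_texts, section_texts)` returns one section index per slide. Each slide is scored independently here, so nothing prevents the resulting alignment from jumping backwards through the paper.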
Text Extraction Unit
• Presentation: slides (MS PowerPoint) → VB compiler → slide title text, slide body text, slide number
• Paper: PDF → PDFx → XML → parser (via Python) → section title, section body
Slide Image Classifier Unit
• Pipeline: slides → take snapshot → image → image classifier
• Classes
  • 1. Text
  • 2. Outline
  • 3. Drawing
  • 4. Results
Image Class Instructions
• 1. Text
  • Text similarity alignment weight: increase 2/3
• 2. Outline
  • Text similarity alignment weight: decrease 1/3
  • Linear ordering alignment weight: decrease 1/3
• 3. Drawing
  • Uniform probability for all weights
• 4. Result
  • Exceptional rule: align directly to the “Experiment and Result” section
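The per-class rules above can be read as a small fusion step that mixes a text-similarity score and an ordering score for each candidate section. The sketch below is hedged: the slide does not say whether "2/3" and "1/3" are resulting weights or adjustments, so the table treats them as resulting weights, which is only one possible reading. The names `CLASS_WEIGHTS` and `fuse_scores`, and the fallback used when no matching section title exists, are hypothetical.

```python
# Hypothetical (text_weight, ordering_weight) per image class.
# ASSUMPTION: the fractions on the slide are read as resulting weights.
CLASS_WEIGHTS = {
    "text": (2 / 3, 1 / 3),     # text similarity weighted up
    "outline": (1 / 3, 1 / 3),  # both weights reduced
    "drawing": (1 / 2, 1 / 2),  # uniform weights
}


def fuse_scores(image_class, text_sim, order_score, section_titles,
                result_title="Experiment and Result"):
    """Pick a section index for one slide.

    text_sim and order_score hold one score per section.
    """
    if image_class == "result":
        # Exceptional rule: align directly to the results section if present.
        for j, title in enumerate(section_titles):
            if title.lower() == result_title.lower():
                return j
        # ASSUMPTION: fall back to uniform weights when no such section exists.
        image_class = "drawing"
    w_text, w_order = CLASS_WEIGHTS[image_class]
    scores = [w_text * t + w_order * o for t, o in zip(text_sim, order_score)]
    return max(range(len(scores)), key=lambda j: scores[j])
```

For example, a slide classified as "result" is sent straight to the section titled "Experiment and Result" regardless of its text scores, while a "text" slide is decided mostly by text similarity.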
Image Classifier: Experiment and Result
• 750 manually annotated slides
• Linear SVM
• Feature extraction: Histogram of Oriented Gradients
• Blurring filters
• Normalization
• 10-fold cross-validation
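The evaluation protocol above (10-fold cross-validation over the annotated slides) can be sketched with the standard library alone. The HOG features and the linear SVM are not implemented here; `train_fn` and `predict_fn` are hypothetical callbacks standing in for whatever classifier is plugged in, and the shuffling seed is an arbitrary choice.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the mean accuracy over k folds.

    train_fn(train_features, train_labels) returns a model;
    predict_fn(model, feature) returns a predicted label.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

With 750 samples and `k=10`, each fold holds 75 slides for testing and 675 for training, and every slide is tested exactly once.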
Experiments
• Experiment 1:
  • Baseline
  • Paragraph-to-slide alignment
  • Only textual data
• Experiment 2:
  • Section-to-slide alignment
  • Only textual data
Experiments
• Experiment 3:
  • The effect of linear ordering alignment was added.
  • Textual data and ordering information
• Experiment 4:
  • The effect of image classification was added.
  • Textual data, ordering information and visual content
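One way to picture what ordering information adds in Experiment 3 is a dynamic program that picks, for every slide, a section index that never decreases from one slide to the next while maximizing total similarity. This is an illustrative sketch that enforces monotonicity as a hard constraint; the presentation does not state how the ordering signal is actually combined with text similarity, and the function name `monotonic_alignment` is an assumption.

```python
def monotonic_alignment(sim):
    """Return one section index per slide, non-decreasing across slides,
    maximizing the sum of sim[slide][section]."""
    n_slides, n_sections = len(sim), len(sim[0])
    # best[i][j]: best total score for slides 0..i with slide i on section j.
    best = [[0.0] * n_sections for _ in range(n_slides)]
    back = [[0] * n_sections for _ in range(n_slides)]
    best[0] = list(sim[0])
    for i in range(1, n_slides):
        arg = 0  # best predecessor among sections 0..j of the previous slide
        for j in range(n_sections):
            if best[i - 1][j] > best[i - 1][arg]:
                arg = j
            best[i][j] = sim[i][j] + best[i - 1][arg]
            back[i][j] = arg
    # Backtrack from the best final section.
    j = max(range(n_sections), key=lambda c: best[-1][c])
    path = [j]
    for i in range(n_slides - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path
```

For a similarity matrix such as `[[0.9, 0.1, 0.0], [0.2, 0.3, 0.8], [0.7, 0.6, 0.3]]`, a per-slide argmax would send the third slide back to the first section, whereas the monotonic version keeps it at or after the section chosen for the second slide.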
Results
[Chart: results for the Baseline, Section, Ordering and Image Class experiments; the only value legible in the transcript is 25%]
Conclusion
• Many slides contain images and drawings
• Textual data alone is not enough
• The approach takes advantage of graphical features of slides
Future Tasks
• Bigger dataset
• More efficient text similarity measures
• Differentiate between title and body text weights
• Support more input file formats
• A GUI to view aligned documents
System Architecture
• Inputs: presentation, document
• Components: Text Extraction, Textual Similarity, Linear Ordering, Slide Image Classifier (1. Text, 2. Index, 3. Drawing, 4. Results), Multimodal Fusion
• Output: alignment (the diagram also shows a nil target)