
Passive Capture and Structuring of Lectures

Presentation Transcript


  1. Passive Capture and Structuring of Lectures Sugata Mukhopadhyay, Brian Smith Department of Computer Science Cornell University

  2. Introduction • Multimedia presentations • Manual authoring is labor-intensive • Experience-on-Demand (EOD) at CMU • Captures & abstracts personal experiences (audio / video) • Synchronizes audio, video & position data

  3. Introduction Contd. • Classroom 2000 (C2K, Georgia Tech) • Authoring multimedia documents from live events • Data from whiteboards, cameras, etc. are combined to create multimedia documents of classroom activities • Similarities (EOD & C2K) • Automatic capture • Authoring of multimedia documents

  4. Introduction Contd. • Differences: • C2K: invasive capture (capture must be started explicitly), structured environment (purpose-built rooms) • EOD: passive capture, unstructured environment

  5. Motivation • Structured multimedia documents from seminars, talks, or classes • A speaker can walk in, press a button, and give a presentation using blackboards, whiteboards, 35mm slides, overheads, or computer projection • One hour later, the structured presentation is on the web

  6. Overview • Cameras (output encoded in MPEG format) • Overview camera (captures the entire lecture) • Tracking camera (hardware tracker) follows the speaker, capturing head & shoulders • The speaker uploads the slides to a server

  7. Overview of the Index • Video region: RealVideo • Index: title & duration of the current slide, synchronized with the video; Prev / Next skip between slides • Timeline: boxes represent the duration of each slide (Screenshot regions: video, slides, timeline)

  8. Problems Handled • Synchronization is transitive: given the position of event A in a timeline and a synchronization A↔B, B can be added to the same timeline • Errors accumulate: if the synchronization error of (A,B) is E1 and of (B,C) is E2, then the error of (A,C) is E1 + E2 • Collected data • Timed (T-data, e.g. video) • Untimed (U-data, e.g. electronic slides)
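
The transitive chaining described above can be sketched in a few lines; the function name and the simple scalar offset/error model are illustrative, not from the paper:

```python
def chain_sync(offset_ab, err_ab, offset_bc, err_bc):
    """Combine two pairwise synchronizations A<->B and B<->C.

    Each pair is a time offset between the two streams plus an error
    bound. Chaining places C on A's timeline: offsets add along the
    chain, and so do the error bounds (error(A,C) = E1 + E2).
    """
    offset_ac = offset_ab + offset_bc
    err_ac = err_ab + err_bc
    return offset_ac, err_ac
```

For example, two links each accurate to 26 ms yield a combined bound of 52 ms, which is why tight per-link bounds matter.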

  9. Problems Handled Contd. • Synchronization • Timed-timed synchronization (TTS): two video streams • Timed-untimed synchronization (TUS): slides with video • Untimed-untimed synchronization (UUS): slide titles, obtained by parsing the HTML produced by PowerPoint • Automatic editing • Rule-based structuring of the synchronized data

  10. Timed-Timed Synchronization • Temporal link between streams captured from independent cameras • To solve this, consider one or more synchronization points (Figure: streams V1(t) and V2(t) with offsets Δ1, Δ2 around a synchronization point)

  11. Timed-Timed Synchronization Contd. • Sync tone: artificial creation of a synchronization point, 1 second long • Recorded on one channel of each MPEG stream • A sound card generates the tone • Later, the position of the tone is detected in each stream (Figure: two camera machines encode MPEG audio; each sound card records the wireless mic receiver's speaker audio on one channel and the sync tone on the other)

  12. Timed-Timed Synchronization Contd. • Detection of the synchronization tone • Brute-force approach: fully decode the MPEG audio • Proposed method • Scale factors indicate the overall volume of audio packets • Summing the scale factors gives a volume estimate • The tone is found where the estimate exceeds a threshold • Assuming MPEG audio at 22.05 kHz with 1152-sample frames (≈52 ms per frame): worst expected error 26 ms, maximum error 52 ms • For video at 30 FPS, the error must stay below 1/30 s

  13. Timed-Timed Synchronization Contd. • Tighter bound: locating the tone edge to sample accuracy at 22.05 kHz gives an error of about 1/22050 s ≈ 44 µs, far below the 26 ms frame-level bound • For video at 15 FPS, the maximum tolerable error is 66 ms • Using this method on an MPEG System stream, a tone of 70 seconds can be located in under 2 seconds
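
A minimal sketch of the scale-factor idea, assuming the per-frame scale-factor sums have already been extracted from the MPEG audio stream (the function name, threshold, and frame counts below are illustrative, not the paper's values):

```python
def find_tone_start(scale_sums, threshold, min_frames):
    """Return the index of the first frame where the summed scale
    factors stay above `threshold` for at least `min_frames`
    consecutive frames (the 1 s sync tone), or None if absent.

    This avoids fully decoding the MPEG audio: only the scale
    factors, a cheap per-frame volume estimate, are inspected.
    """
    run = 0
    for i, s in enumerate(scale_sums):
        if s > threshold:
            run += 1
            if run >= min_frames:
                return i - run + 1  # first frame of the loud run
        else:
            run = 0
    return None
```

With ≈52 ms frames, a 1-second tone spans roughly 19 frames, so `min_frames` would be set near that in practice.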

  14. Timed-Untimed Synchronization • Synchronization of the slides with one of the video streams • A tolerance of 0.5 s is used for the synchronization

  15. Timed-Untimed Synchronization Contd. • Segmentation of slide changes from the video V(t) • Color histograms fail: the slides share the same background and the video is low resolution • Feature-based algorithm: frames are clipped, low-pass filtered, and adaptively thresholded • Let B1 and B2 be two consecutive processed frames

  16. Timed-Untimed Synchronization Contd. • Assumption: slides contain a dark foreground on a light background • Applied to the I-frames of the MPEG video at 0.5 s intervals • Matching • Candidate changes are matched against the original slides to confirm a slide change • If similarity > 95%, a match is declared and the search terminates • Otherwise, if similarity > 90%, the slide with the highest similarity is returned • Below that, the frame is too noisy to match
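
The comparison of consecutive processed frames might be sketched as follows; the threshold values and the simple changed-pixel-ratio test are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def binarize(frame, thresh=128):
    """Dark-foreground pixels become 1 (assumes dark text on a
    light background, as the slides above do)."""
    return (frame < thresh).astype(np.uint8)

def is_candidate_change(b1, b2, change_ratio=0.2):
    """Flag a slide-change candidate when the fraction of pixels
    that differ between two consecutive binarized frames B1, B2
    exceeds `change_ratio` (an assumed threshold)."""
    diff = np.count_nonzero(b1 != b2)
    return diff / b1.size > change_ratio
```

A candidate flagged here would then be confirmed by matching the frame against the original slides, as described above.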

  17. Timed-Untimed Synchronization Contd. • Unwrapping • The video contains a foreshortened version of each slide • Map the quadrilateral F onto a rectangle (same size as the original slide) • Camera & projector are fixed, so the corner points of F stay the same • A perspective transform maps F to the rectangle • Bilinear interpolation fills in the rectangle's pixels
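
The unwrapping step can be sketched by solving for the 3×3 homography that maps the four corners of the quadrilateral F onto the corners of the slide rectangle; this is the standard four-point construction (the paper does not spell out its solver):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective transform mapping the four src
    corners (the foreshortened quadrilateral F) onto the four dst
    corners (the upright slide rectangle). Fixes h22 = 1 and solves
    the resulting 8x8 linear system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one point (homogeneous coordinates)."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

In the full system the inverse map would be sampled with bilinear interpolation to fill every rectangle pixel, per the slide above.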

  18. Timed-Untimed Synchronization Contd.

  19. Timed-Untimed Synchronization Contd. • Similarity is based on a Hausdorff-like distance • Dilate (radius 3) the black pixels of the original binary slide image G • b = number of black pixels in the extracted frame F • b' = number of black pixels of F that fall inside the dilated G • Forward match ratio = b' / b • The reverse match ratio is computed symmetrically, by dilating F and keeping G undilated
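
A sketch of the forward match ratio, using a simple Chebyshev-distance dilation as a stand-in for the paper's radius-3 dilation (the exact structuring element is not specified in the slide):

```python
import numpy as np

def dilate(img, radius):
    """Binary dilation: a pixel becomes 1 if any 1-pixel lies
    within `radius` in Chebyshev distance. Implemented as an OR
    over all shifts of the image within the radius."""
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            src = img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
            out[max(dy, 0):h - max(-dy, 0), max(dx, 0):w - max(-dx, 0)] |= src
    return out

def match_ratio(F, G, radius=3):
    """Forward match ratio b'/b: the fraction of F's black pixels
    that fall inside the dilation of G's black pixels. Swap the
    arguments' roles to get the reverse ratio."""
    b = np.count_nonzero(F)
    b_prime = np.count_nonzero(F & dilate(G, radius))
    return b_prime / b if b else 0.0
```

The dilation makes the match tolerant to small misalignments between the extracted frame and the original slide.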

  20. Timed-Untimed Synchronization Contd. • Evaluation • 106 slides, 143 transitions • Accuracy: 97.2% • Would need re-tuning for slides with a dark background and light foreground

  21. Automatic Editing • Combining the captured videos into a single stream • Constraints • Footage from the overview camera must be shown 3 s before and 5 s after each slide change • Every shot lasts between 3 s and 25 s • A heuristic algorithm produces an Edit Decision List (EDL) • Each shot is taken from one video source; consecutive shots come from different sources • A shot records its start time, duration, and video source • The final edited video concatenates the footage of the shots

  22. Automatic Editing Contd.

  23. Automatic Editing Contd. • Shots from the overview camera that are shorter than 3 s and separated from the tracking camera are merged • Shots from the tracking camera longer than 25 s are broken up into 5 s shots
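
The two clean-up rules above might be sketched as an EDL post-processing pass; the shot representation and the exact merge/split policy here are simplified assumptions (a real editor would also alternate sources across the split pieces):

```python
def clean_edl(shots, min_shot=3.0, max_shot=25.0):
    """shots: list of (source, duration) pairs in playback order.
    Pass 1 merges too-short overview shots into the previous shot;
    pass 2 splits any shot exceeding the maximum length."""
    merged = []
    for src, dur in shots:
        if src == "overview" and dur < min_shot and merged:
            psrc, pdur = merged[-1]
            merged[-1] = (psrc, pdur + dur)  # absorb into previous shot
        else:
            merged.append((src, dur))
    out = []
    for src, dur in merged:
        while dur > max_shot:
            out.append((src, max_shot))  # carve off a maximum-length piece
            dur -= max_shot
        out.append((src, dur))
    return out
```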

  24. Conclusion • Automatic synchronization and editing system • Classification of the different kinds of synchronization • Slide-change detection for dark foreground on light background (textual part) • Slide identification confirms the detected slide changes • Rotation and translation can affect the matching

  25. Future Work • Motion-vector analysis and scene-cut detection (to trigger a switch to the overview camera) • Automatic enhancement of poor lighting • Using the orientation and position of the speaker for editing • Shots from more cameras • Use of blackboards, whiteboards and transparencies

  26. Looking at Projected Documents: Event Detection & Document Identification

  27. Introduction • Documents play a major role in presentations, meetings, lectures, etc. • Captured as a video stream or as images • Goal: annotation & retrieval using the visible documents • Temporal segmentation of meetings based on (projected) document events: • Inter-document (slide change, etc.) • Intra-document (animation, scrolling, etc.) • Extra-document (pointing sticks, laser beams, etc.) • Identification of the extracted low-resolution document images

  28. Motivation • Detection & identification from low-resolution devices • Extendable to documents on a table • Current focus on projected documents • Captured as a video stream (web-cam)

  29. Slide Change Detection • Presentation slides captured as a video stream • Slides in a slideshow share the same layout, background, pattern, etc. • The web-cam auto-focuses (nearly 400 ms until the image is stable) • Lighting conditions vary

  30. Slide Change Detection (Cont'd) (Figures: frames captured during the auto-focusing period; fading during auto-focusing; different slides with a similar text layout)

  31. Slide Change Detection (Cont'd) • Existing methods for scene-cut detection • Histograms (color and gray) • Cornell method (Hausdorff distance) • Histogram methods fail due to: a) low resolution b) low contrast c) auto-focusing d) fading • Cornell: uses identification to validate the changes • Fribourg method: slide stability • Assumption: a slide visible for less than 2 seconds is a slide skip

  32. Proposed Slide Change Detection (Figure: frames x0 … xN-1 sampled every 0.5 s; a candidate change is first checked for stability, then confirmed only after 2 s of stability)
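
The stability-based detection can be sketched as follows, assuming each sampled frame has been reduced to a comparable fingerprint; the run-length logic is an illustrative reading of the figure (0.5 s sampling, 2 s stability requirement):

```python
def detect_stable_slides(frames, stable_count=4):
    """frames: sequence of frame fingerprints sampled at 0.5 s
    intervals. A new slide is confirmed only after it stays
    unchanged for `stable_count` samples (2 s at 0.5 s sampling);
    shorter runs are treated as slide skips and ignored.
    Returns a list of (sample_index, fingerprint) changes."""
    changes = []
    current = None
    i = 0
    while i < len(frames):
        j = i
        while j < len(frames) and frames[j] == frames[i]:
            j += 1  # extend the run of identical frames
        if j - i >= stable_count and frames[i] != current:
            changes.append((i, frames[i]))
            current = frames[i]
        i = j
    return changes
```

Skipped slides (visible under 2 s) and auto-focus transients both produce short runs, so neither triggers a false change.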

  33. Ground-Truth Preparation • Based on SMIL • ~300 slideshows collected from the web • Automatic generation of a SMIL file with a random duration for each slide • Contains slide id, start time, stop time and type (skip or normal)

  34. Evaluation • Ground truth: SMIL → XML • Slideshow video → slide change detection → XML • Evaluation: compare the two XML files • Metrics used: Recall (R), Precision (P), F-measure (F) • An example ground-truth SMIL file:
<slide id="1" imagefile="Slide1.JPG" st="0000000" et="9.641000" type="normal" />
<slide id="2" imagefile="Slide2.JPG" st="9.641000" et="12.787199" type="normal" />
<slide id="3" imagefile="Slide15.JPG" st="12.787199" et="13.775500" type="skip" />
<slide id="4" imagefile="Slide11.JPG" st="13.775500" et="14.341699" type="skip" />
<slide id="5" imagefile="Slide25.JPG" st="14.341699" et="15.885400" type="skip" />
<slide id="6" imagefile="Slide20.JPG" st="15.885400" et="16.476199" type="skip" />
<slide id="7" imagefile="Slide9.JPG" st="16.476199" et="18.094100" type="skip" />
<slide id="8" imagefile="Slide3.JPG" st="18.094100" et="23.160102" type="normal" />
<slide id="9" imagefile="Slide4.JPG" st="23.160102" et="26.523102" type="normal" />
……
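
The comparison of detected slide changes against the ground truth might be sketched like this; the greedy nearest-match policy within a frame tolerance is an illustrative choice, not necessarily the evaluation tool's exact algorithm:

```python
def evaluate(gt, detected, tol=1):
    """Match detected slide-change frame indices against ground-truth
    indices within `tol` frames (greedy, each ground-truth change
    matched at most once); return (recall, precision, F-measure)."""
    gt_left = list(gt)
    tp = 0
    for d in detected:
        for g in gt_left:
            if abs(d - g) <= tol:
                tp += 1
                gt_left.remove(g)  # consume the matched ground-truth change
                break
    r = tp / len(gt) if gt else 0.0
    p = tp / len(detected) if detected else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return r, p, f
```

Raising `tol` from 1 frame to 4 frames relaxes the matching, which is why the results below improve for every method at the larger tolerance.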

  35. Results • 1 frame tolerance: Fribourg (R:0.84, P:0.82, F:0.83), Cornell (R:0.40, P:0.21, F:0.23), Color Hist (R:0.07, P:0.04, F:0.05), Gray Hist (R:0.18, P:0.12, F:0.13) (Chart residue also shows R:0.80, P:0.83, F:0.81 at 4-frame tolerance and R:0.92, P:0.96, F:0.93 at 1-frame tolerance)

  36. Results (Cont'd) • 4 frames tolerance: Fribourg (R:0.93, P:0.91, F:0.92), Cornell (R:0.80, P:0.51, F:0.54), Color Hist (R:0.13, P:0.09, F:0.10), Gray Hist (R:0.27, P:0.17, F:0.19)

  37. Low-resolution Document Identification • Difficulties in identification • Hard to use existing document analysis systems (DAS) at 50-100 dpi • OCR performance is very poor • Hard to extract a complete layout (physical, logical) • Rotation, translation and resolution affect global image matching • Captured images vary: lighting, flash, distance, auto-focusing, motion blur, occlusion, etc.

  38. Proposed Document Identification • Based on a visual signature • Shallow layout with zone labeling • Hierarchically structured using the features' priority • Identification: matching of signatures • Matching: simple heuristics following the hierarchy of the signature

  39. Visual Signature Extraction • Conversion to a common resolution, then RLSA • Zone labeling (text, image, solid bars, etc.) • Block separation via projection profiles • Text blocks (one line per block) • Bullet and vertical text-line extraction
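
A one-dimensional sketch of RLSA (the run-length smoothing algorithm), which fills short white gaps between black pixels so characters merge into text-line blocks; the threshold values below are illustrative:

```python
def rlsa_row(row, c):
    """Horizontal RLSA on one binary row (1 = black): fill white (0)
    runs shorter than `c` that lie between black pixels, merging
    adjacent characters into a single block. Leading and trailing
    white runs are never filled."""
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            j = i
            while j < n and row[j] == 0:
                j += 1  # find the end of the white run
            if i > 0 and j < n and (j - i) < c:
                for k in range(i, j):
                    out[k] = 1  # interior run shorter than c: fill it
            i = j
        else:
            i += 1
    return out
```

The full algorithm applies this horizontally and vertically and combines the results, after which projection profiles can separate the smoothed blocks as the slide describes.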

  40. Visual Signature Extraction (Cont'd) • Feature vector for images, bars (horizontal and vertical), and bullets • Feature vector for each text line and for each bar with text (horizontal and vertical) (Figure: bounding boxes of the various features)

  41. Structuring the Visual Signature • The hierarchy depends on the extraction process & real-world slideshows • Narrows the search path during matching • Example:
<VisualSign>
 <BoundingBox NoOfBb="10">
  <Text NoOfLine="7">
   <HasHorizontalText NoOfSentence="7">
    <S y="53" x="123" width="436" height="25" NoOfWords="4" PixelRatio="0.40" /> …
   </HasHorizontalText>
   <HasVerticalText NoOfSentence="0" />
  </Text>
  <HasImage NoOfImage="3">
   <Image y="1" x="16" width="57" height="533" PixelRatio="0.88" /> …
  </HasImage>
  <HasBullet NoOfBullets="2">
   <Bullet y="122" x="141" width="12" height="12" PixelRatio="1.0" /> …
  </HasBullet>
  <Line NoOfLine="0"><HasHLine NoOfLine="0" /><HasVLine NoOfLine="0" /></Line>
  <BarWithText NoOfBar="0">
   <HBarWithText NoOfBar="0" /><VBarWithText NoOfBar="0" />
  </BarWithText>
 </BoundingBox>
</VisualSign>

  42. Structured Signature-based Matching • Search technique: takes advantage of the hierarchical structure of the visual signature • Higher-level features are compared first → lower-level features are matched only afterwards (Figure: tree representation of the features in a visual signature, rooted at the bounding box with text, image, bullet, line and bar-with-text branches f1-f8)
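
A sketch of the hierarchical matching idea, using hypothetical signature dictionaries (the field names and the 10% box tolerance are assumptions, not the paper's representation): cheap top-level counts are compared first, and the more expensive per-block comparison runs only when they agree:

```python
def signatures_match(sig_a, sig_b, tol=0.1):
    """Hierarchical matching sketch. sig_a/sig_b are dicts with
    top-level feature counts and a list of per-block bounding boxes
    (y, x, width, height). A high-level count mismatch prunes the
    comparison immediately, narrowing the search path."""
    for key in ("n_lines", "n_images", "n_bullets"):
        if sig_a[key] != sig_b[key]:
            return False  # higher-level features disagree: prune
    # Lower level: corresponding bounding boxes must agree within tol.
    for box_a, box_b in zip(sig_a["boxes"], sig_b["boxes"]):
        if any(abs(a - b) > tol * max(a, b, 1) for a, b in zip(box_a, box_b)):
            return False
    return True
```

Against a repository, a query signature would be compared to every stored signature this way, with most candidates rejected at the count level before any box is inspected.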

  43. Matching Performance Results • Evaluation based on recall and precision • ~200 slide images (web-cam) queried against a repository of ~300 slides • (R:0.94, P:0.80, F:0.86)

  44. Conclusion • Proposed slide change detection • Automatic evaluation • Performance: best compared to the state of the art • Lower time and computational complexity • Overcomes the auto-focusing and fading behaviour of web-cams • Accuracy improved over the Cornell method (at low tolerance) • Could be used for meeting indexing: high precision

  45. Conclusion • Proposed slide identification: • Based on a visual signature • No classifier needed • Fast: only signature matching (no global image matching) • No OCR required • Could be helpful for real-time applications (translation, mobile OCR, etc.) • Applicable to digital cameras and mobile phones • Finally: documents as a means of indexing & retrieval

  46. Future Work • Evaluation on animations • Detection and identification of pointed-at and partially occluded documents • Identification with complex background structures • Evaluation on digital cameras and mobile phones • Adding background pattern and color information to the visual signature • Identification of documents on a table

  47. Possible Projects • Deformation correction (Perspective, Projective, etc.) • Automatic detection of projected documents in the captured video • Detection of occluded objects • Background pattern recognition

  48. Thank You !
