
An Architecture for Mining Resources Complementary to Audio-Visual Streams




Presentation Transcript


  1. An Architecture for Mining Resources Complementary to Audio-Visual Streams J. Nemrava, P. Buitelaar, N. Simou, D. Sadlier, V. Svátek, T. Declerck, A. Cobet, T. Sikora, N. O'Connor, V. Tzouvaras, H. Zeiner, J. Petrák

  2. Introduction • Video retrieval can strongly benefit from textual sources related to the A/V stream • Vast textual resources available on the web can be used for fine-grained event recognition • A good example is sport-related video • Summaries of matches • Tabular (lists of players, cards, substitutions) • Textual (minute-by-minute reports)

  3. Available Resources • Audio-Video Streams • A/V analysis captures features from the video using suitable detectors • Primary Complementary • Directly attached to the media • Overlay text, spoken commentaries • Secondary Complementary • Independent of the media • Written commentaries, summaries, analyses

  4. Audio-Video Analysis • Crowd image detector • Speech-Band Audio Activity • On-Screen Graphics Tracking • Motion activity measure • Field Line orientation • Close-up

  5. Primary Complementary Resources • Video track • Overlay text OCR • Text region detection • Time synchronization • Merging 16 frames to distinguish moving from static objects in the video (see the sketch below) • Textual information such as overlay text and player numbers provides an additional primary resource • Audio track • Speech commentaries
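As a minimal illustration of the 16-frame merging step: computing the per-pixel variance across consecutive grayscale frames separates static content such as overlay text (low temporal variance) from moving scene content (high variance). The function name and threshold below are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def static_overlay_mask(frames, var_thresh=15.0):
    """frames: 16 consecutive grayscale frames as (H, W) uint8 arrays.
    Overlay text barely changes between frames, so pixels with low
    temporal variance are candidate overlay regions; moving objects
    produce high variance and are masked out."""
    stack = np.stack(frames).astype(np.float32)  # shape (16, H, W)
    variance = stack.var(axis=0)                 # per-pixel temporal variance
    return variance < var_thresh                 # True where content is static
```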

  6. Secondary Complementary Resources • Tabular • Summaries, list of players, goals, cards • “meta” information • Location, referee, attendance, date

  7. Secondary Complementary Resources • Unstructured • Several minute-by-minute sources • Text analysis and event extraction using SProUT • Player actions • Player names • German and English • ‘A beautiful pass by Ruud Gullit set up the first Rijkaard header.’ • SProUT: an ontology-based IE tool
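To make the extraction step concrete, here is a toy stand-in for what SProUT does on a minute-by-minute sentence; the action lexicon and the name heuristic are illustrative assumptions and bear no relation to SProUT's actual ontology-backed grammar rules.

```python
import re

# Illustrative action lexicon; SProUT uses ontology-backed grammar rules.
ACTIONS = {"pass", "header", "shot", "goal", "foul", "cross"}

def extract_events(sentence):
    """Return the player actions and candidate player names found in one
    minute-by-minute report sentence."""
    actions = {a for a in ACTIONS
               if re.search(rf"\b{a}\b", sentence, re.IGNORECASE)}
    # Crude heuristic: capitalized words not at sentence start are names.
    names = [m.group() for m in re.finditer(r"\b[A-Z][a-z]+\b", sentence)
             if m.start() > 0]
    return actions, names

actions, names = extract_events(
    "A beautiful pass by Ruud Gullit set up the first Rijkaard header.")
print(actions, names)   # {'pass', 'header'} ['Ruud', 'Gullit', 'Rijkaard']
```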

  8. Ontology • SProUT uses the SmartWeb football ontology for • Player actions • Referee actions • Trainer actions

  9. Architecture Overview [architecture diagram; components marked 1 and 2]

  10. Architecture Overview [architecture diagram, continued]

  11. Reasoning over Complementary Resources of Football Games • Textual sources (per coarse-grained minute) • Extraction of semantic concepts from unstructured texts using the DFKI ontology-based information extraction tool • Video analysis (for every second) – DCU • Crowd image detector – values ∈ [0,1] • Speech-band audio activity – values ∈ [0,1] • Motion activity measure – values ∈ [0,1] • Close-up – values ∈ [0,1] • Field line orientation – values ∈ [0,90]
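Since the textual events are per minute but the detector values are per second, the two granularities must be aligned before reasoning. A minimal sketch of that grouping, assuming (second, detector, value) triples as the reading format:

```python
from collections import defaultdict

def group_by_minute(readings):
    """Group per-second detector readings into the coarse per-minute
    granularity of the textual sources."""
    minutes = defaultdict(lambda: defaultdict(list))
    for sec, detector, value in readings:
        minutes[sec // 60][detector].append(value)
    return {m: dict(d) for m, d in minutes.items()}

readings = [(80, "crowd", 0.231), (80, "motion", 0.060), (125, "crowd", 0.4)]
print(group_by_minute(readings))
# {1: {'crowd': [0.231], 'motion': [0.06]}, 2: {'crowd': [0.4]}}
```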

  12. Video Analysis Fuzzification • A period of 20 seconds is evaluated • A threshold value is set according to the detector's mean value during the game • The top value is mapped to 1, scaling the degrees into [0,1] • A similar process is applied to the motion, close-up and crowd detectors
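A minimal sketch of this fuzzification, under the assumption that readings below the game-wide mean map to degree 0, the game-wide maximum maps to 1, and each 20-second window yields one degree (its mean); the exact normalization in the paper may differ:

```python
import numpy as np

def fuzzify(values, window=20):
    """Map per-second detector readings to one fuzzy degree per window."""
    values = np.asarray(values, dtype=float)
    mean, top = values.mean(), values.max()
    if top <= mean:                        # flat signal -> degree 0 everywhere
        return np.zeros(len(values) // window)
    scaled = np.clip((values - mean) / (top - mean), 0.0, 1.0)
    n = len(scaled) // window * window     # drop the incomplete tail window
    return scaled[:n].reshape(-1, window).mean(axis=1)
```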

  13. Video Analysis Fuzzification • Line angle • Values between 0 and 7 indicate Middle Field • Values between 17 and 27 indicate End of Field • Fuzzification according to occurrence counts within the 20-second period • Example • Middle Field: 13 occurrences, fuzzy value = 0.65 • End of Field: 4 occurrences, fuzzy value = 0.2 • Other: 3 occurrences, fuzzy value = 0.15
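This occurrence-based fuzzification is simple enough to reproduce exactly: with 20 per-second readings per window, each zone's degree is its relative frequency. The zone boundaries below are the ones given on the slide; the function name is ours.

```python
from collections import Counter

def fuzzify_angles(angles):
    """Map a window of per-second line angles to fuzzy zone degrees:
    the degree of a zone is its relative frequency in the window."""
    def zone(a):
        if 0 <= a <= 7:
            return "MiddleField"
        if 17 <= a <= 27:
            return "EndOfField"
        return "Other"
    counts = Counter(zone(a) for a in angles)
    return {z: counts[z] / len(angles)
            for z in ("MiddleField", "EndOfField", "Other")}

# 13 middle-field, 4 end-of-field and 3 other readings in a 20 s window:
angles = [3] * 13 + [20] * 4 + [45] * 3
print(fuzzify_angles(angles))
# {'MiddleField': 0.65, 'EndOfField': 0.2, 'Other': 0.15}
```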

  14. Declaring the Alphabet • Concepts = {Scoringopportunity, Outofplay, Handball, Kick, Scoregoal, Cross, Foul, Clear, Cornerkick, Dribble, Freekick, Header, Trap, Shot, Throw, Pass, Ballpossession, Offside, Charge, Lob, Challenge, Booked, Goalkeeperdive, Block, Save, Substitution, Tackle, EndOfField, MiddleField, Other, Crowd, Motion, CloseUp, Audio} • Roles = {consistOf} • Individuals = {min0, sec20, sec40, sec60, min1, sec80, sec100, sec120, min2, sec140, sec160, sec180, min3, sec200, …}

  15. Knowledge Representation - ABox ⟨min1 : Kick ≥ 1⟩ ⟨min1 : Scoregoal ≥ 1⟩ ⟨sec80 : Audio ≥ 0.06⟩ ⟨sec80 : Crowd ≥ 0.231⟩ ⟨sec80 : Motion ≥ 0.060⟩ ⟨sec80 : EndOfField ≥ 0.05⟩ ⟨(min1, sec60) : consistOf ≥ 1⟩ ⟨(min1, sec80) : consistOf ≥ 1⟩ ⟨(min1, sec100) : consistOf ≥ 1⟩ ⟨(min1, sec120) : consistOf ≥ 1⟩
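The same ABox can be held as plain data structures before being handed to a reasoner; the tuple layout below is our own convenience representation, not FiRE's input syntax.

```python
# Fuzzy concept assertions: (individual, concept, lower bound on degree).
concept_assertions = [
    ("min1", "Kick", 1.0),
    ("min1", "Scoregoal", 1.0),
    ("sec80", "Audio", 0.06),
    ("sec80", "Crowd", 0.231),
    ("sec80", "Motion", 0.060),
    ("sec80", "EndOfField", 0.05),
]

# Fuzzy role assertions: (subject, object, role, lower bound on degree).
role_assertions = [("min1", sec, "consistOf", 1.0)
                   for sec in ("sec60", "sec80", "sec100", "sec120")]
```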

  16. Knowledge Representation - TBox
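The TBox itself appears on the slide as a figure. Purely as an illustration of the kind of axiom such a fuzzy TBox could contain (this axiom is our assumption, not taken from the slide), a textual event concept can be tied to video evidence in the seconds a minute consists of:

```latex
% Hypothetical fuzzy TBox axiom (illustration only): a minute some of
% whose seconds show crowd, audio and end-of-field activity supports
% a scoring opportunity.
\exists \mathit{consistOf}.(\mathit{Crowd} \sqcap \mathit{Audio}
  \sqcap \mathit{EndOfField}) \sqsubseteq \mathit{Scoringopportunity}
```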

  17. Query Examples
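The queries on this slide are likewise shown as a figure. In the spirit of those examples, a query asks for the individuals whose inferred degree for a concept meets a threshold; the sketch below runs such a query over precomputed degrees and does not reproduce FiRE's actual query syntax.

```python
def query(degrees, concept, threshold=0.8):
    """degrees: dict mapping (individual, concept) -> inferred fuzzy degree.
    Return the individuals satisfying the concept to at least `threshold`."""
    return [ind for (ind, c), d in degrees.items()
            if c == concept and d >= threshold]

degrees = {("min1", "Scoregoal"): 1.0, ("min2", "Scoregoal"): 0.3}
print(query(degrees, "Scoregoal"))   # ['min1']
```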

  18. Architecture Overview [architecture diagram; components marked 1 and 2]

  19. Cross-Media Features • Basic idea • Identify which video detectors are most prominent for which event class • For instance, for CORNERKICK the “end-zone” video detector should be significantly high • Strategy • Analyze the distribution of video detector values over event classes (sketched below) • Identify the significant detectors for each class • Feed back into the video event detection algorithm
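A minimal sketch of the first two strategy steps: average each detector's fuzzified output over all segments labelled with an event class, so that a class's most prominent detectors stand out. The sample format is an assumption.

```python
from collections import defaultdict

def detector_profile(samples):
    """samples: iterable of (event_class, {detector: value}) pairs.
    Return, per event class, the mean value of every detector."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for event, detectors in samples:
        counts[event] += 1
        for name, value in detectors.items():
            sums[event][name] += value
    return {e: {n: s / counts[e] for n, s in dets.items()}
            for e, dets in sums.items()}

samples = [("Cornerkick", {"EndOfField": 0.8, "Crowd": 0.4}),
           ("Cornerkick", {"EndOfField": 0.7, "Crowd": 0.6})]
print(detector_profile(samples))
# {'Cornerkick': {'EndOfField': 0.75, 'Crowd': 0.5}}
```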

  20. Cross-Media Features • The purpose of the cross-media descriptors is to capture the features and relations in multimodal data, so that complementary information can be retrieved when dealing with only one of the data sources • Build up a model to classify events in video independently from the video • Cross-media features are used for event-type classification of video segments by fuzzy reasoning with the FiRE inference engine • FiRE is focused on event retrieval
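One simple way to realize such a classifier, offered only as a sketch (the system itself uses fuzzy reasoning with FiRE, not this similarity measure): compare a segment's detector vector against the per-class profiles from the previous sketch and pick the closest class.

```python
import math

def classify(segment, profiles):
    """segment: {detector: value}; profiles: {event_class: {detector: mean}}.
    Return the event class whose profile is closest (cosine) to the segment."""
    def cos(u, v):
        dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0
    return max(profiles, key=lambda e: cos(segment, profiles[e]))
```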

  21. Thank you for your attention
