Final Year Project 2003/2004 LYU0302 PVCAIS – Personal Video Conference Archives Indexing System

Supervisor: Prof Michael Lyu. Presented by: Lewis Ng, Philip Chan. 25 November 2003.

Presentation Transcript


  1. Final Year Project 2003/2004 • LYU0302 • PVCAIS – Personal Video Conference Archives Indexing System • Supervisor: Prof Michael Lyu • Presented by: Lewis Ng, Philip Chan • 25 November 2003

  2. Outline • Introduction • Motivation • Architecture of PVCAIS - Media Acquisition Module - Archive Indexing Module - Videoconference Accessing Module • Implementation in First Term • Future Work • Conclusion

  3. Introduction • PVCAIS stands for Personal Video Conference Archives Indexing System • A system that gives videoconferencing users convenient searching and browsing support over their past videoconference archives

  4. Introduction • What is a videoconference? A real-time communication technology that combines different media: audio, video, text chat, file transfer, whiteboard and shared communications - More precisely, it is a “multimedia conference”

  5. Motivation • Videoconferencing is becoming popular in education, business and personal communication • Participants wish to keep videoconference archives for later reference • Ordinary video and audio files are not searchable and do little to help users recall their contents • Indexing of videoconference archives has not been investigated until now

  6. Architecture of PVCAIS • Consists of 3 modules: - Media Acquisition Module - Archive Indexing Module - Videoconference Accessing Module

  7. Architecture of PVCAIS • [Diagram of the three modules: Media Acquisition, Archive Indexing, Videoconference Accessing]

  8. Media Acquisition • Extracts channel data and forms media files • A videoconference physically contains 4 types of channels: Audio, Video, Data and Control • Audio and Video channels: transmit incoming/outgoing audio and video information • Data channel: carries information for user applications such as Text Chat, Whiteboard and File Transfer • Control channel: transmits system control information such as Member Information

  9. Media Acquisition • Video-in and Video-out channels • Reduce redundancy: store only key frames • Detect scene changes in real time • Each key frame picture is stored with a timestamp
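
The key-frame idea on this slide can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: frames are assumed to be non-empty flat lists of 8-bit grayscale pixel values, a scene change is assumed to be detected by comparing intensity histograms, and the names `histogram`, `histogram_distance`, `select_key_frames` and the `threshold` value are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Frame = Sequence[int]  # assumed: flat list of 8-bit grayscale pixel values


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalised intensity histogram of a non-empty 8-bit grayscale frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(a: List[float], b: List[float]) -> float:
    """Half the L1 distance between two normalised histograms (range 0..1)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def select_key_frames(
    frames: Iterable[Tuple[float, Frame]], threshold: float = 0.3
) -> List[Tuple[float, Frame]]:
    """Keep a frame (with its timestamp) only when it differs enough
    from the most recently stored key frame."""
    key_frames: List[Tuple[float, Frame]] = []
    last_hist = None
    for timestamp, frame in frames:
        hist = histogram(frame)
        if last_hist is None or histogram_distance(hist, last_hist) > threshold:
            key_frames.append((timestamp, frame))
            last_hist = hist  # compare later frames against this key frame
    return key_frames
```

Comparing each frame with the last stored key frame, rather than with the immediately preceding frame, is a design choice of this sketch: it avoids missing slow transitions where consecutive frames differ only slightly.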

  10. Media Acquisition • Audio-in and Audio-out channels • Mixed into one stream after the videoconference • Will be used for Speech Recognition • Text Chat channel • Sender, receiver • Message • Stored with a timestamp

  11. Media Acquisition • Whiteboard channel • Consists of a text-based index file and a number of snapshot pictures • The index file records the timestamp of each whiteboard update event and the path of the corresponding snapshot picture • An update on this channel takes place over a period of time, so the system needs to detect when each update begins and ends by monitoring data transfer on this channel
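
One way to find where a whiteboard update begins and ends is to group data-transfer activity by idle gaps. The sketch below assumes the monitor yields packet arrival times in seconds and treats a pause longer than `idle_gap` as the end of an update; the function names, the gap value, the snapshot file names and the tab-separated index layout are all hypothetical, since the slides do not specify them.

```python
from typing import List, Tuple


def detect_updates(
    packet_times: List[float], idle_gap: float = 2.0
) -> List[Tuple[float, float]]:
    """Group packet arrival times into (begin, end) update intervals.

    A new update starts whenever the channel has been idle for longer
    than idle_gap seconds.
    """
    updates: List[Tuple[float, float]] = []
    begin = end = None
    for t in sorted(packet_times):
        if begin is None:
            begin = end = t
        elif t - end > idle_gap:
            updates.append((begin, end))
            begin = end = t
        else:
            end = t
    if begin is not None:
        updates.append((begin, end))
    return updates


def index_lines(
    updates: List[Tuple[float, float]], snapshot_dir: str = "whiteboard"
) -> List[str]:
    """One index line per update: end timestamp, then the snapshot path."""
    return [
        f"{end:.1f}\t{snapshot_dir}/snapshot_{i:04d}.png"
        for i, (_, end) in enumerate(updates, start=1)
    ]
```

Taking the snapshot at the end of each interval means the stored picture shows the whiteboard after the update has completed.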

  12. Media Acquisition • File Transfer channel • Copies of the sent/received files are saved to the archive directory, together with an index file • The index file includes the sender’s and recipient’s user names and the paths of the files • Control channel • Contains the timestamp and information of each event, such as member joined and member left

  13. Media Acquisition • [Figure: Paradigm of storing the videoconference archives. Along a common time axis starting at 0:00:00, the channels Video_in, Video_out, Audio_in, Audio_out, Text_chat, Whiteboard, File_in, File_out and Control are recorded into the Video_in archive, Video_out archive, Audio archive, Text chat archive, Whiteboard archive, Document archive and Control archive]

  14. Archive Indexing • 7 raw files are extracted in the Media Acquisition Module • Indexing functions are needed to retrieve more information from them • These include: Face Detection, Face Recognition, Speech Recognition, OCR, Time-based Text Merging, Keyword Selection, Title Generation

  15. Archive Indexing • Face Detection - distinguish between Slides and Faces - if a face is detected, find the face region • [Diagram: Face Detection classifies each key frame as either Slide or Face]

  16. Archive Indexing • Face Recognition - Associate human faces in Video-in with names - Need to keep a face base - If there is no match in the face base, ask the remote user to enter the name
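
The face-base lookup with a fallback to asking the user can be sketched as below. The slides do not say how faces are represented or compared, so this example assumes each face is already reduced to a numeric feature vector and matched by nearest Euclidean distance; `nearest_name`, `recognise`, `max_distance` and the `ask_name` callback are hypothetical, and storing the newly entered name back into the face base is also an assumption.

```python
import math
from typing import Callable, Dict, List, Optional


def nearest_name(
    face_base: Dict[str, List[float]],
    features: List[float],
    max_distance: float = 0.5,
) -> Optional[str]:
    """Return the closest known name, or None if nothing is close enough."""
    best_name, best_dist = None, max_distance
    for name, known in face_base.items():
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(known, features)))
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


def recognise(
    face_base: Dict[str, List[float]],
    features: List[float],
    ask_name: Callable[[], str],
) -> str:
    """Look the face up; on a miss, ask for the name and remember it."""
    name = nearest_name(face_base, features)
    if name is None:
        name = ask_name()            # e.g. prompt the remote user
        face_base[name] = features   # assumption: new faces are added to the base
    return name
```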

  17. Archive Indexing • Speech Recognition - Generates a speech script from the audio archive - The speech in a videoconference carries the most information - Can use a commercial library: Microsoft SAPI, IBM ViaVoice • OCR - Takes the slide archive as input and recognizes text in it - Needs to identify and localize text on a complex background

  18. Archive Indexing • Time-based Text Merging - Merges the Speech transcript, Chat script, Whiteboard script and slide text archive into a single Text source according to their timestamps • Keyword Selection - takes the Text source as input - generates keywords for the videoconference
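
The two steps on this slide can be illustrated together. The sketch assumes each text archive is already a list of `(timestamp, source, text)` tuples sorted by timestamp, and it swaps in plain word-frequency counting for keyword selection, since the slides do not state which method the project uses. `merge_by_time`, `select_keywords` and `STOP_WORDS` are hypothetical names.

```python
import heapq
import re
from collections import Counter
from typing import List, Tuple

Entry = Tuple[float, str, str]  # (timestamp in seconds, source name, text)

# Assumed minimal stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "please", "the", "to"}


def merge_by_time(*sources: List[Entry]) -> List[Entry]:
    """Merge already-sorted text archives into one timestamp-ordered source."""
    return list(heapq.merge(*sources, key=lambda entry: entry[0]))


def select_keywords(text_source: List[Entry], top_n: int = 3) -> List[str]:
    """Return the top_n most frequent non-stop-words in the merged text."""
    counts: Counter = Counter()
    for _, _, text in text_source:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

For example, merging a speech transcript, a chat script and slide text that all mention "project" produces one chronological Text source, and `select_keywords(merged, top_n=1)` would return that word.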

  19. Archive Indexing • Title Generation - takes the Text source as input - automatically generates a title for the videoconference • Generate XML index file - integrates all the archives - stores all the related files of a videoconference into a single directory
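
A possible shape for the XML index file is sketched below using Python's standard `xml.etree.ElementTree`. The element and attribute names (`conference`, `title`, `keywords`, `members`, `archives`, `type`, `path`) are invented for illustration; the slides do not define the actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_index(
    title: str,
    keywords: List[str],
    members: List[str],
    archives: Dict[str, str],
) -> str:
    """Build a hypothetical XML index that ties one conference's files together.

    archives maps an archive type (e.g. "audio") to a path relative to the
    conference directory.
    """
    root = ET.Element("conference")
    ET.SubElement(root, "title").text = title

    keywords_el = ET.SubElement(root, "keywords")
    for keyword in keywords:
        ET.SubElement(keywords_el, "keyword").text = keyword

    members_el = ET.SubElement(root, "members")
    for member in members:
        ET.SubElement(members_el, "member").text = member

    archives_el = ET.SubElement(root, "archives")
    for kind, path in archives.items():
        ET.SubElement(archives_el, "archive", type=kind, path=path)

    return ET.tostring(root, encoding="unicode")
```

Keeping only relative paths in the index fits the slide's point that all related files of a videoconference live in a single directory, so the directory can be moved as a unit.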

  20. Videoconference Accessing • Provides an interface for the user to manage, search and review all indexed conferences. • Allows the user to modify the content of a conference, such as editing its title or keywords, or to delete a conference. • Allows the user to search for a conference by different criteria, such as member name or keyword. • Allows the user to review a conference by playing back the audio or the key frames.
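
Searching by member name or keyword can be sketched as a simple filter over index records. The `ConferenceRecord` class and `search` function are hypothetical; they assume each indexed conference has been loaded into memory with its title, member names and keywords.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConferenceRecord:
    """Assumed in-memory view of one indexed conference."""
    title: str
    members: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def search(
    records: List[ConferenceRecord],
    member: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[ConferenceRecord]:
    """Return records matching every criterion given (case-insensitive)."""
    results = []
    for record in records:
        if member is not None and member.lower() not in (
            m.lower() for m in record.members
        ):
            continue
        if keyword is not None and keyword.lower() not in (
            k.lower() for k in record.keywords
        ):
            continue
        results.append(record)
    return results
```

Criteria left as `None` are ignored, so the same function covers searching by member only, by keyword only, or by both.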

  21. Implementation • NetMeeting 3.0 • A Windows feature that provides Internet conferencing functions. • Supports video, audio and data conferencing, including application sharing, chat, whiteboard and file transfer. • Other features include remote desktop sharing.

  22. Implementation • NetMeeting 3.0 SDK • An extension of NetMeeting that provides an interface for programmers and Web developers to integrate conferencing capabilities into their applications. • The API takes the form of COM interfaces and functions.

  23. Implementation • A simple NetMeeting-compatible videoconference program built on top of the NetMeeting 3.0 SDK. • Supports: • Video • Audio • Text message • File Transfer • Whiteboard

  24. Implementation • By calling the API functions directly, the following raw data can be obtained: • member information • file transfer records • text message records • Video, audio and whiteboard data cannot be obtained directly.

  25. Implementation • Video • A thread is created to check the display of the video windows. • If a scene change is detected, the current video frame is captured and stored as a still image. • The stored images are the key frames of the conference and will be used for face detection and recognition after the conference.

  26. Implementation • Audio • A thread is created to record the local audio from the microphone. • When a certain amount of audio data has been recorded, it is sent to all members of the conference. • All the received audio files and the locally recorded audio files are combined to generate a single audio file. • The final audio file is used for voice recognition; the voice engine used is Microsoft SAPI.
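
Combining the local and received recordings into one file could look like the sketch below. The slides do not describe the combining method, so this assumes each recording is a list of 16-bit signed PCM samples with a known start time, and mixes them by summing overlapping samples with clipping. `mix_streams` and its parameters are hypothetical.

```python
from typing import List, Tuple


def mix_streams(
    streams: List[Tuple[float, List[int]]], sample_rate: int = 8000
) -> List[int]:
    """Mix (start_seconds, samples) recordings into one 16-bit PCM stream.

    Each recording is placed at its start offset; overlapping samples are
    summed and clipped to the signed 16-bit range.
    """
    placed = [(int(round(start * sample_rate)), samples) for start, samples in streams]
    length = max((offset + len(samples) for offset, samples in placed), default=0)
    mixed = [0] * length
    for offset, samples in placed:
        for i, sample in enumerate(samples):
            mixed[offset + i] += sample
    return [max(-32768, min(32767, value)) for value in mixed]
```

Clipping is the simplest way to keep the sum within range; a real mixer might scale the inputs instead to avoid distortion when several members speak at once.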

  27. Implementation • Whiteboard • The NetMeeting whiteboard information cannot be captured because the data format is not documented in the API. • Solution: create our own whiteboard function and data format.

  28. Conclusion • We developed a videoconferencing agent. • All channel data except whiteboard can be collected. • Speech Recognition and Face Detection & Recognition are integrated into the system, but their accuracy needs to be improved. • Simple searching can be performed on stored archives.

  29. Future Work • Whiteboard • Improve accuracy of Voice Recognition • XML • Better searching method • OCR for slides in video • Improve User Interface

  30. Q & A Session
