Video Classification By: Maryam S. Mirian For: Multimedia & Pattern Recognition Joint Courses Project
Outline • What is Video Classification? • Straightforward or Difficult? • What are its Applications? • What are its Methods? • Review of Video Classification Methods • What is my own Project, exactly?
What is Video Classification? Classify a Video (Shot) into one of Nc predefined Classes: • Indoor / outdoor • News / Sports • …
Is Video Classification Difficult? Why? • YES, because: • The data stream is a multi-dimensional signal. • Its interpretation is subjective.
Required Steps for Classification • Object Observations → Feature Extraction → Feature Reduction → Classification → Class Labels • Feature Extraction: the most important and the most difficult part • Feature Reduction: using methods like PCA, LDA
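The feature-reduction step can be illustrated with PCA. Below is a minimal pure-Python sketch, assuming features arrive as equal-length numeric vectors; it recovers only the first principal component (by power iteration on the sample covariance) and does not cover LDA. The function name `pca_first_component` is hypothetical.

```python
import math
import random

def pca_first_component(X, iters=100, seed=0):
    """Project rows of X (n x d feature vectors) onto their first principal component.

    Returns (projections, component). Assumes the features are not all constant
    (otherwise the covariance matrix is zero and normalization fails).
    """
    n, d = len(X), len(X[0])
    # 1. Mean-center each feature dimension
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]
    # 2. Sample covariance matrix (d x d)
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    # 3. Power iteration from a random start: converges to the leading eigenvector
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(d)]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # 4. Reduced 1-D feature: dot product of each centered row with the component
    return [sum(r[j] * v[j] for j in range(d)) for r in Xc], v
```

For example, points lying on the line y = x reduce to their signed distance along that line. Further components could be obtained by deflating the covariance matrix, which is omitted here for brevity.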
Methods of Classification • Bayesian Classification • kNN Classification • Neural Classification • MLP • RBF • Classification based on Support Vector Machines • Rule-based Classification
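Bayesian Decision Making • Decide the class ωi with the largest posterior P(ωi | x), which is proportional to p(x | ωi) P(ωi) • So, x belongs to ω2 (the class with the larger posterior at x in the example)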
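The Bayesian decision rule can be sketched in a few lines. This is a hedged illustration assuming one-dimensional Gaussian class-conditional densities with known priors, means, and standard deviations; the class names `w1`/`w2` and all numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_decide(x, classes):
    """Pick the class maximizing p(x | w) * P(w), i.e. the largest posterior.

    classes: dict mapping class name -> (prior, mean, std).
    The evidence p(x) is the same for every class, so it can be ignored.
    """
    return max(classes, key=lambda w: classes[w][0] * gaussian_pdf(x, classes[w][1], classes[w][2]))

equal = {"w1": (0.5, 0.0, 1.0), "w2": (0.5, 3.0, 1.0)}
skewed = {"w1": (0.9, 0.0, 1.0), "w2": (0.1, 3.0, 1.0)}
print(bayes_decide(1.6, equal), bayes_decide(1.6, skewed))
```

With equal priors the boundary sits halfway between the means (1.5), so x = 1.6 goes to w2; a strong prior for w1 moves the boundary and the same x goes to w1.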
kNN Decision Making • k = 5: of the 5 nearest neighbors, 2 are red and 3 are black, so by majority vote X is labeled black!
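A minimal kNN sketch of the majority vote above, assuming Euclidean distance and labeled training points given as `(feature_vector, label)` pairs; the function name and the toy data are illustrative only.

```python
import math
from collections import Counter

def knn_classify(x, train, k=5):
    """Label x by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    An odd k avoids ties when there are only two classes.
    """
    nearest = sorted(train, key=lambda sample: math.dist(x, sample[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 0), "red"), ((0, 1), "red"),
         ((1, 1), "black"), ((-1, 0), "black"), ((0, -1), "black"),
         ((5, 5), "red"), ((6, 6), "red")]
print(knn_classify((0, 0), train))  # 2 red vs 3 black among the 5 nearest
```

Note that the two distant red points do not matter: only the 5 nearest neighbors vote, which reproduces the 2-red / 3-black situation on the slide.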
Applications of Automatic Video Classification • Automatic video segmentation • Content-based retrieval • Browsing and retrieving digitized video • Identifying close-up video frames before running a computationally expensive face recognizer • Effective management of the ever-increasing amount of broadcast news video, e.g., personalization of news video
Classify Shot or Video? • One effective way to organize the video is to segment the video into small, single-story units and classify these units according to their semantics. • A shot represents a contiguous sequence of visually similar frames. It is a syntactical representation and does not usually convey any coherent semantics to the users.
Ide et al. [1998] • Problem Domain: News video • Features: • Videotext • Motion • Face • Segmented the video into shots • Used clustering techniques • Classified each shot into 1 of 5 classes: Speech/report, Anchor, Walking, Gathering, and Computer graphics shots • Quite simple, but seems effective for this restricted class of problems.
Huang et al. [1999] • Problem Domain: TV Programs • news report • weather forecast • Commercials • basketball games • football games • Features: • Audio • Color • motion
Chen and Wong [2001] • Problem Domain: • news video: • News • Weather • Reporting • Commercials • Basketball • Football • Features: • Motion • Color • text caption • cut rate • used a rule-based approach
Basic Ideas • Proposes a two-level, multi-modal framework. • The video is analyzed at the shot level and at the story-unit (or scene) level. • At the shot level, a Decision Tree classifies each shot into one of 13 pre-defined categories. • At the scene level, HMM (Hidden Markov Model) analysis is used to reduce shot classification errors. • Results indicate an accuracy of over 95% for shot classification. • HMM analysis helps improve the accuracy of shot classification and achieves over 89% accuracy on story segmentation.
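One way an HMM can clean up per-shot labels is to treat the decision tree's outputs as noisy observations and decode the most likely label sequence with the Viterbi algorithm. The sketch below assumes two hypothetical shot categories and made-up probabilities; it shows the technique only and is not necessarily how the cited framework configured its HMM.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for obs (log-space to avoid underflow).

    All probabilities must be > 0.
    """
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = score[t - 1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Backtrack from the best final state
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical model: shots tend to stay in the same category ("sticky" transitions),
# and the per-shot classifier is right 80% of the time.
states = ("anchor", "report")
start = {"anchor": 0.5, "report": 0.5}
trans = {"anchor": {"anchor": 0.9, "report": 0.1}, "report": {"anchor": 0.1, "report": 0.9}}
emit = {"anchor": {"anchor": 0.8, "report": 0.2}, "report": {"anchor": 0.2, "report": 0.8}}
```

With these settings, an isolated "report" label inside a run of "anchor" shots is treated as a classification error and smoothed away, while a sustained switch to "report" is kept.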
Features at Shot Level • Low-level Visual Content Feature • Color Histogram • Temporal Features • Background scene change • Speaker change • Audio • Motion activity • Shot duration • High-level Object-based features • Face • Shot type • Videotext • Centralized Videotext
Feature Vector of a Shot • Si = (a, m, d, f, s, t, c) • a: the class of audio, a ∈ {t=speech, m=music, s=silence, n=noise, tn=speech+noise, tm=speech+music, mn=music+noise} • m: the motion activity, m ∈ {l=low, m=medium, h=high} • d: the shot duration, d ∈ {s=short, m=medium, l=long} • f: the number of faces, f ∈ ℕ • s: the shot type, s ∈ {c=close-up, m=medium, l=long, u=unknown} • t: the number of lines of text in the scene, t ∈ ℕ • c: set to "true" if the videotext present is centralized, c ∈ {t=true, f=false}
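The shot feature vector Si can be represented as a small validated record. This is a hypothetical sketch: the symbol sets come from the slide, but the class name `ShotFeatures` and the one-hot `encode()` layout are assumptions chosen so the vector could be fed to a generic classifier.

```python
from dataclasses import dataclass

# Allowed symbols for each categorical component (taken from the slide)
AUDIO = ("t", "m", "s", "n", "tn", "tm", "mn")
MOTION = ("l", "m", "h")
DURATION = ("s", "m", "l")
SHOT_TYPE = ("c", "m", "l", "u")

@dataclass(frozen=True)
class ShotFeatures:
    a: str   # audio class
    m: str   # motion activity
    d: str   # shot duration
    f: int   # number of faces
    s: str   # shot type
    t: int   # number of lines of text
    c: bool  # is the videotext centralized?

    def _categoricals(self):
        return ((self.a, AUDIO), (self.m, MOTION), (self.d, DURATION), (self.s, SHOT_TYPE))

    def __post_init__(self):
        for value, domain in self._categoricals():
            if value not in domain:
                raise ValueError(f"{value!r} not in {domain}")
        if self.f < 0 or self.t < 0:
            raise ValueError("face and text-line counts must be non-negative")

    def encode(self):
        """One-hot encode the categorical parts, then append f, t, c as numbers."""
        vec = []
        for value, domain in self._categoricals():
            vec.extend(1 if value == sym else 0 for sym in domain)
        return vec + [self.f, self.t, int(self.c)]
```

For instance, `ShotFeatures("tn", "h", "s", 0, "l", 0, False).encode()` yields a 20-element vector (7 + 3 + 3 + 4 one-hot slots plus 3 numeric slots).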
About Problem Domain… • Sports classification seems a suitable domain • Interesting enough • Useful for sports lovers
About Extracting Features… • Features used in video analysis: color, texture, shape, motion vectors… • Criterion for choosing features: they should have similar statistical behavior across time • Color histogram: simple and robust • Motion vectors: invariant to color and lighting
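As a concrete example of why the color histogram is considered simple, here is a sketch of a quantized joint RGB histogram and the histogram-intersection similarity. It assumes frames are given as lists of `(r, g, b)` tuples with 0..255 components; the bin count is an arbitrary choice.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel.

    pixels: iterable of (r, g, b) tuples with 0..255 components.
    """
    hist = [0.0] * (bins ** 3)
    n = 0
    for r, g, b in pixels:
        # Quantize each channel to 0..bins-1
        qr, qg, qb = (v * bins // 256 for v in (r, g, b))
        hist[(qr * bins + qg) * bins + qb] += 1
        n += 1
    return [h / n for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

green = [(0, 200, 0)] * 10
red = [(200, 0, 0)] * 10
print(histogram_intersection(color_histogram(green), color_histogram(green + red)))  # 0.5
```

Because the histogram ignores where pixels are, it is robust to object motion within a shot, which is one reason it behaves consistently across time.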
So, My Own Project Is • Sports Video Classification: Football, Basketball, … (well-defined sports for which I can find video!) • Steps I should take: • Finding or gathering a video collection • Shot detection • Feature extraction: • Key frame(s) extraction: • Selecting the I-frame in the middle of the shot • Use of clustering • … • Motion-vector-based features • Straight-line detection • Design a classifier • Test the approach
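The shot-detection and key-frame steps could start from a simple histogram-difference cut detector. This sketch assumes each frame is already summarized as a normalized color histogram, and the threshold value is hypothetical. It picks the middle frame index of each shot; selecting the nearest I-frame instead would require the MPEG GOP structure, which is not modeled here.

```python
def detect_cuts(hists, threshold=0.5):
    """Frame indices i where a cut is declared between frame i-1 and frame i.

    hists: per-frame normalized histograms. The L1 distance between two
    normalized histograms lies in [0, 2].
    """
    cuts = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts

def split_shots(n_frames, cuts):
    """Turn cut positions into half-open (start, end) frame ranges."""
    bounds = [0] + cuts + [n_frames]
    return [(bounds[j], bounds[j + 1]) for j in range(len(bounds) - 1)]

def middle_key_frame(shot):
    """Index of the middle frame of a (start, end) shot."""
    start, end = shot
    return (start + end - 1) // 2

hists = [[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0]]
shots = split_shots(len(hists), detect_cuts(hists))
print(shots, [middle_key_frame(s) for s in shots])
```

A single global threshold handles hard cuts reasonably; gradual transitions (fades, dissolves) usually need an adaptive or twin-threshold scheme.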
A Look at Ekin and Tekalp [2003]: A Study on Football Video Classification
Features • Cinematic • Result from common video composition and production rules • Shot types, camera motions, and replays • Object-based • Described by spatial features (e.g., color, texture, and shape) and spatio-temporal features, such as object motions and interactions
Robust Dominant Color Region Detection • A soccer field has one distinct dominant color (a tone of green) that may vary from stadium to stadium, and also due to weather and lighting conditions within the same stadium. • The statistics of this dominant color, in the HSI space, are learned by the system at start-up, and then automatically updated to adapt to temporal variations.
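A simplified sketch of dominant color learning and adaptation. Two loud assumptions: it uses HSV hue from Python's `colorsys` module as a stand-in for the HSI space on the slide, and it tests hue only (ignoring saturation and intensity, so e.g. white pixels get hue 0). The threshold and update rate are hypothetical. The function returns G, the fraction of dominant-colored pixels in the frame.

```python
import colorsys

def hue(rgb):
    """Hue in [0, 1) of an (r, g, b) pixel with 0..255 components (HSV, not HSI)."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0]

def hue_distance(h1, h2):
    d = abs(h1 - h2)
    return min(d, 1.0 - d)  # hue is an angle, so it wraps around

def learn_dominant_hue(pixels, bins=36):
    """Start-up step: mean hue of the most populated hue bin."""
    def bin_of(h):
        return min(int(h * bins), bins - 1)
    hues = [hue(p) for p in pixels]
    counts = [0] * bins
    for h in hues:
        counts[bin_of(h)] += 1
    peak = counts.index(max(counts))
    in_peak = [h for h in hues if bin_of(h) == peak]
    return sum(in_peak) / len(in_peak)  # plain mean: fine for green, far from the red wrap-around

def dominant_ratio(pixels, dom_hue, t_color=0.05):
    """G: fraction of pixels whose hue lies within t_color of the dominant hue."""
    hits = sum(1 for p in pixels if hue_distance(hue(p), dom_hue) < t_color)
    return hits / len(pixels)

def update_dominant_hue(dom_hue, frame_pixels, alpha=0.1, t_color=0.05):
    """Adapt to lighting drift: blend in the mean hue of this frame's dominant-colored pixels."""
    close = [h for h in map(hue, frame_pixels) if hue_distance(h, dom_hue) < t_color]
    if not close:
        return dom_hue
    return (1 - alpha) * dom_hue + alpha * sum(close) / len(close)

frame = [(0, 128, 0)] * 70 + [(255, 255, 255)] * 30  # 70% "grass", 30% white
dom = learn_dominant_hue(frame)
print(round(dom, 4), dominant_ratio(frame, dom))
```

A real system would keep full color statistics (mean and variance per channel) rather than a single hue, but the start-up learning plus running update follows the same pattern.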
Shot Classification • Long Shot • A long shot displays the global view of the field. • In-Field Medium Shot • A whole human body is usually visible. • Close-Up Shot • Shows the above-waist view of one person • Out-of-Field Shot • Shows the audience, the coach, and other views outside the field
How to Extend from a Frame to a Shot? • Because frame classification is computationally simple, they find the class of every frame in a shot and assign the shot the label held by the majority of its frames.
Decision Scheme Based on G • The first stage uses the G value (the ratio of dominant-color pixels in the frame) and two thresholds, TcloseUp and Tmedium, to determine the frame view label.
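The first-stage frame labeling and the majority vote that extends it to a shot can be sketched together. This assumes that a larger G means a wider view (more visible field); the threshold values are hypothetical, and the second stage that separates close-up from out-of-field frames is not modeled.

```python
from collections import Counter

def frame_view(g, t_closeup=0.2, t_medium=0.6):
    """First-stage view label from the dominant-color ratio G (thresholds hypothetical)."""
    if g > t_medium:
        return "long"
    if g > t_closeup:
        return "medium"
    return "close-up/out-of-field"  # would be split by a later stage

def shot_view(g_values, t_closeup=0.2, t_medium=0.6):
    """Label every frame, then give the shot the majority frame label."""
    labels = [frame_view(g, t_closeup, t_medium) for g in g_values]
    return Counter(labels).most_common(1)[0][0]

print(shot_view([0.8, 0.75, 0.3, 0.9]))  # one medium-looking frame is outvoted
```

The vote makes the shot label tolerant of a few misclassified frames, such as frames where a player briefly blocks the field.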
Soccer Event Detection • Goal Detection • Referee Detection • Controversial calls, such as red/yellow cards and penalties • Penalty Box Detection
Goal Detection • The occurrence of a goal is generally followed by a special pattern of cinematic features: • A goal event leads to a break in the game. • During the break, one or more close-up views of the players involved in the goal are shown, • followed by one or more replays. • The restart of the game is usually captured by a long shot.
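A rule-based sketch of matching this cinematic pattern over a sequence of classified shots. Assumptions: each shot is a hypothetical `(view, is_replay)` pair, replay shots count as part of the break whatever their view, and the pattern only checks counts of close-ups and replays (not their order or durations). It flags candidates only; it is not presented as the authors' exact detector.

```python
def find_goal_candidates(shots, min_closeups=1, min_replays=1):
    """Return start indices of breaks that match the goal pattern.

    shots: list of (view, is_replay). A break is a maximal run of shots that
    are not live long shots; it matches if it contains enough non-replay
    close-ups and replays and is ended by a live long shot (the restart).
    """
    candidates = []
    i, n = 0, len(shots)
    while i < n:
        view, is_replay = shots[i]
        if view == "long" and not is_replay:
            i += 1
            continue
        # Extend the break until the next live long shot
        j = i
        while j < n and (shots[j][0] != "long" or shots[j][1]):
            j += 1
        run = shots[i:j]
        closeups = sum(1 for v, r in run if v == "close-up" and not r)
        replays = sum(1 for _, r in run if r)
        if j < n and closeups >= min_closeups and replays >= min_replays:
            candidates.append(i)
        i = j
    return candidates

shots = [("long", False), ("long", False), ("close-up", False), ("out-of-field", False),
         ("medium", True), ("long", True), ("long", False), ("medium", False), ("long", False)]
print(find_goal_candidates(shots))
```

Here the break starting at shot 2 (close-up, crowd, two replay shots, then a live long shot) is flagged, while the lone medium shot at index 7 is not.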
Referee Detection • Assumes that there is a single referee in a: • Medium shot • Out-of-field shot • Close-up shot • So there is no search for the referee in a long shot
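A hypothetical sketch of how the shot-type assumption prunes the referee search. It assumes the referee wears a distinctive uniform color `ref_rgb` learned beforehand, and uses a crude per-channel tolerance and a minimum pixel ratio, both invented for illustration.

```python
def referee_present(view, pixels, ref_rgb, tol=40, min_ratio=0.05):
    """Hypothetical referee check for one frame.

    Long shots are skipped outright, as the slide assumes; otherwise report a
    referee if enough pixels are close (per channel, within tol) to the
    referee's uniform color ref_rgb.
    """
    if view == "long":
        return False
    close = sum(1 for p in pixels if all(abs(c - rc) <= tol for c, rc in zip(p, ref_rgb)))
    return close / len(pixels) >= min_ratio

yellow_kit = (255, 230, 10)
frame = [(250, 220, 0)] * 10 + [(0, 128, 0)] * 90  # 10% referee-colored pixels
print(referee_present("medium", frame, yellow_kit), referee_present("long", frame, yellow_kit))
```

Skipping long shots saves computation and avoids false detections from small, distant players whose colors may resemble the referee's.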
Penalty Box Detection • Field lines in a long view can be used to localize the view and/or register the current frame on the standard field model
Interesting Summaries • Goal summaries • summaries with Referee and Penalty box objects
Adaptation of Parameters • Parameters • Tcolor in dominant color region detection • TcloseUp and Tmedium in shot classification • Referee color statistics • The training stage can be performed in a very short time: it only estimates the mean and variance of a normal pdf.
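Why training is fast: fitting a normal pdf only needs a mean and a variance. The sketch below uses the standard-library `statistics` module; turning the fitted pdf into an acceptance interval of k standard deviations is an assumption made for illustration, as is the value of k.

```python
import math
import statistics

def fit_normal(samples):
    """Maximum-likelihood mean and variance of a 1-D normal pdf."""
    return statistics.mean(samples), statistics.pvariance(samples)

def threshold_from_normal(mean, var, k=2.5):
    """Accept values within k standard deviations of the mean (k is hypothetical)."""
    sd = math.sqrt(var)
    return mean - k * sd, mean + k * sd

mu, var = fit_normal([1, 2, 3, 4, 5])
print(mu, var, threshold_from_normal(mu, var, k=1))
```

Both statistics need a single pass over the training samples, so re-training when the stadium or lighting changes is cheap.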
Results for High-Level Analysis and Summarization • Goal detection results
Results for High-Level Analysis and Summarization (2) • Referee detection results
Results for High-Level Analysis and Summarization (3) • Penalty box detection results
References • A. Ekin and A. M. Tekalp, "Automatic Soccer Video Analysis and Summarization," in Symp. Electronic Imaging: Science and Technology: Storage and Retrieval for Image and Video Databases IV, IS&T/SPIE'03, CA, Jan. 2003. • "The Segmentation and Classification of Story Boundaries in News Video," in Proc. 6th IFIP Working Conference on Visual Database Systems (VDB6), Australia, 2002. • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley, 2000.
Thanks for Your Attention. Any Questions or Comments?