
Computer Science Department

A Speech / Music Discriminator using RMS and Zero-crossings. Costas Panagiotakis and George Tziritas, Department of Computer Science, University of Crete, Heraklion, Greece. EUSIPCO 2002, Toulouse, France.


Presentation Transcript


  1. A Speech / Music Discriminator using RMS and Zero-crossings. Costas Panagiotakis and George Tziritas, Department of Computer Science, University of Crete, Heraklion, Greece

  2. Presentation Organization
     • I. Introduction
     • II. Segmentation
     • III. Classification
     • IV. Results
     • V. Conclusion

  3. Introduction (1/3)
     Input
     • Figure 1: Original sound signal (44100 or 22050 Hz sample rate)
     Output
     • Figure 2: Real-time segmentation and classification (speech, music, silence)

  4. Introduction (2/3)
     Approaches
     • Feature extraction (energy, frequency)
     • Feature-based segmentation and classification
     Basic purpose
     • Real-time segmentation and classification
     • Algorithmic and computational constraints
     • Small number of features
     • Low change extraction error (20 msec)
     • Low minimum distance between two changes (1 sec)
     • High accuracy (95%)

  5. Introduction (3/3)
     Basic Features
     • Computed every 20 msec
     • Independent characteristics
     Root Mean Square (RMS)
     • Signal energy: $A = \sqrt{\frac{1}{N} \sum_{n=1}^{N} x^2(n)}$, where x(n) are the N samples of a 20 msec frame
     • Figure 3: RMS in music
     • Figure 4: RMS in speech
     Zero Crossings (ZC)
     • Mean frequency
     • Figure 5: ZC in music
     • Figure 6: ZC in speech
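The two basic features can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes non-overlapping 20 msec frames and counts a zero crossing whenever consecutive samples change sign. The names `frame_features` and `zc_to_hz` are hypothetical, and the conversion of a crossing count to a mean frequency (two crossings per period) is an assumption about how ZC is read as "mean frequency".

```python
import math

def frame_features(samples, sample_rate, frame_ms=20):
    """Return per-frame RMS values and zero-crossing counts.

    Assumption: frames are non-overlapping and frame_ms long.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    rms_values, zc_values = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Root mean square of the frame (signal energy measure).
        rms_values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
        # A crossing is counted when consecutive samples differ in sign.
        zc_values.append(
            sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        )
    return rms_values, zc_values

def zc_to_hz(zc_count, frame_ms=20):
    """Assumed conversion: two zero crossings correspond to one period."""
    return zc_count / (2 * frame_ms / 1000)

# Example: one second of a 440 Hz sine sampled at 22050 Hz.
sample_rate = 22050
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
rms, zc = frame_features(tone, sample_rate)
```

One second of audio yields 50 values per feature, which matches the 1 sec frames of 50 RMS values used in the segmentation stage.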

  6. Segmentation (1/3)
     Basic characteristics
     • RMS based
     • The χ² distribution fits the RMS histograms well
     • [Equation not recovered; it involves Γ(a + 1), with m: mean and s²: variance]
     • Figure 7: Histogram of RMS in speech, approximation by χ² distribution
     • Figure 8: Histogram of RMS in speech, approximation by χ² distribution
     Two-stage algorithm
     • Stage 1: 1 sec accuracy (low computation cost)
     • Stage 2: 20 msec accuracy (high computation cost)
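The density formula did not survive in the transcript, so the following is only a sketch under an explicit assumption: that the χ² approximation is the gamma-type density p(x) = b^(a+1) x^a e^(-bx) / Γ(a + 1), with a and b obtained from the mean m and variance s² by moment matching (a = m²/s² - 1, b = m/s²). The function names are hypothetical.

```python
import math

def fit_params(values):
    """Moment-matched parameters (a, b) of the assumed gamma-type density."""
    n = len(values)
    m = sum(values) / n                          # mean
    s2 = sum((v - m) ** 2 for v in values) / n   # variance
    a = m * m / s2 - 1.0
    b = m / s2
    return a, b

def density(x, a, b):
    """Assumed density b^(a+1) * x^a * exp(-b*x) / Gamma(a+1), for x > 0."""
    log_p = (a + 1) * math.log(b) + a * math.log(x) - b * x - math.lgamma(a + 1)
    return math.exp(log_p)

# Example with a tiny set of RMS values.
a, b = fit_params([1.0, 2.0, 3.0, 4.0, 5.0])

# Numerical check that the fitted density integrates to about 1.
step = 0.01
total = sum(density((i + 0.5) * step, a, b) * step for i in range(4000))
```

Under this parameterization the fitted density has mean (a + 1)/b and variance (a + 1)/b², which reproduce m and s².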

  7. Segmentation (2/3)
     Stage 1
     • Partitioning into 1 sec frames (50 RMS values each)
     • Change in frame i ⇒ frame i-1 and frame i+1 have to differ
     • Computation of the frame distance D (Matusita distance) using the frame similarity p(p1, p2)
     • Frame i is a candidate for Stage 2 (there is a change) if D(i) > threshold and D(i) is a local maximum
     • [Figure: RMS over time in 1 sec frames (frame i-1, frame i, frame i+1, frame i+2; low and high levels) with a change in frame i, and the distance D]
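Stage 1 can be sketched as follows. Several choices here are assumptions rather than details from the slides: each 1 sec frame is summarized by a moment-matched gamma-type density, the similarity is the Bhattacharyya coefficient ∫ sqrt(p1 p2) dx in closed form, the Matusita distance is taken in its standard form sqrt(2(1 - similarity)), and the threshold value 0.5 is purely illustrative.

```python
import math

def gamma_fit(values):
    """Moment-matched (a, b) of the assumed density x^a e^(-bx) b^(a+1) / Gamma(a+1)."""
    n = len(values)
    m = max(sum(values) / n, 1e-9)               # guard against all-zero frames
    s2 = max(sum((v - m) ** 2 for v in values) / n, 1e-12)
    return m * m / s2 - 1.0, m / s2

def similarity(p, q):
    """Closed-form integral of sqrt(p * q) for two gamma-type densities."""
    (a1, b1), (a2, b2) = p, q
    k = (a1 + a2) / 2 + 1
    log_rho = (math.lgamma(k)
               + 0.5 * (a1 + 1) * math.log(b1)
               + 0.5 * (a2 + 1) * math.log(b2)
               - 0.5 * (math.lgamma(a1 + 1) + math.lgamma(a2 + 1))
               - k * math.log((b1 + b2) / 2))
    return min(1.0, math.exp(log_rho))

def matusita(p, q):
    """Standard Matusita distance expressed through the similarity."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - similarity(p, q))))

def stage1_candidates(rms, frame_size=50, threshold=0.5):
    """Return candidate frame indices and the distance D per frame."""
    frames = [rms[i:i + frame_size]
              for i in range(0, len(rms) - frame_size + 1, frame_size)]
    params = [gamma_fit(f) for f in frames]
    dist = [0.0] * len(frames)
    for i in range(1, len(frames) - 1):
        # A change in frame i means frames i-1 and i+1 differ.
        dist[i] = matusita(params[i - 1], params[i + 1])
    candidates = [i for i in range(1, len(frames) - 1)
                  if dist[i] > threshold
                  and dist[i] >= dist[i - 1] and dist[i] >= dist[i + 1]]
    return candidates, dist

# Synthetic RMS track: low level, then high level from value index 270 on.
rms = ([0.10 + 0.01 * ((j % 5) - 2) for j in range(270)]
       + [0.50 + 0.05 * ((j % 5) - 2) for j in range(230)])
candidates, distances = stage1_candidates(rms)
```

Because only the neighbors of frame i are compared, a frame that contains the change itself does not blur the distance; this keeps Stage 1 cheap at the cost of 1 sec resolution.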

  8. Segmentation (3/3)
     Stage 2
     • 20 msec accuracy
     • For each candidate frame i from Stage 1:
       1. Move two successive frames (1 sec each) located before and after frame i
       2. Find the time instant where the two successive frames have the maximum Matusita distance in RMS distribution
     • Possible oversegmentation
     • Figure 10: The RMS data and the distance D
     • Figure 11: The segmentation result and the RMS data
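A possible reading of Stage 2 is a sliding search: two adjacent 1 sec windows are moved in 20 msec steps across the candidate frame, and the split point with the largest distance is kept. The sketch below assumes this reading, reuses the same assumed gamma-type fit and standard Matusita distance, and limits the search to the candidate frame itself; `refine_change` is a hypothetical name.

```python
import math

def gamma_fit(values):
    """Moment-matched (a, b) of the assumed gamma-type density."""
    n = len(values)
    m = max(sum(values) / n, 1e-9)
    s2 = max(sum((v - m) ** 2 for v in values) / n, 1e-12)
    return m * m / s2 - 1.0, m / s2

def matusita(p, q):
    """sqrt(2 * (1 - similarity)), similarity in closed form."""
    (a1, b1), (a2, b2) = p, q
    k = (a1 + a2) / 2 + 1
    log_rho = (math.lgamma(k)
               + 0.5 * (a1 + 1) * math.log(b1)
               + 0.5 * (a2 + 1) * math.log(b2)
               - 0.5 * (math.lgamma(a1 + 1) + math.lgamma(a2 + 1))
               - k * math.log((b1 + b2) / 2))
    return math.sqrt(max(0.0, 2.0 * (1.0 - min(1.0, math.exp(log_rho)))))

def refine_change(rms, frame_index, frame_size=50):
    """Return the RMS index inside the candidate frame that best splits
    the preceding and following 1 sec windows."""
    first = max(frame_index * frame_size, frame_size)
    last = min((frame_index + 1) * frame_size, len(rms) - frame_size)
    best_t, best_d = None, -1.0
    for t in range(first, last + 1):
        before = gamma_fit(rms[t - frame_size:t])
        after = gamma_fit(rms[t:t + frame_size])
        d = matusita(before, after)
        if d > best_d:
            best_t, best_d = t, d
    return best_t

# Same synthetic track as a Stage 1 example: the level changes at index 270.
rms = ([0.10 + 0.01 * ((j % 5) - 2) for j in range(270)]
       + [0.50 + 0.05 * ((j % 5) - 2) for j in range(230)])
change_index = refine_change(rms, frame_index=5)
change_time_sec = change_index * 0.02
```

Each step refits two windows, which is why this stage costs more than Stage 1 and is run only on candidate frames.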

  9. Classification (1/4)
     Basic purpose
     • Classification of each segment into one of the following classes: music, speech, silence
     Main algorithm
     • Hypothesis: segmentation gives homogeneous segments
     • Input: basic characteristics RMS, ZC
     • Computation of the actual features of the segment
     • Classification based on the actual feature values

  10. Classification (2/4)
      Actual Features specification
      • Normalized RMS variance, σ²A (defining formula not recovered)
        • Usually (86%) σ²A(music) < σ²A(speech)
      • The probability of null ZC, ZC0
        • Always ZC0(music) = 0
        • Usually (40%) ZC0(speech) > 0
      • Maximal mean frequency, max(ZC)
        • Almost always in speech max(ZC) < 2.4 kHz
        • In 2% of the cases in music max(ZC) > 2.4 kHz
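The three features on this slide can be computed per segment as sketched below. The formula for the normalized RMS variance is missing from the transcript; the sketch assumes the common normalization, variance divided by the squared mean. ZC0 is taken as the fraction of 20 msec frames whose ZC value is zero, and ZC values are assumed to be expressed in Hz. `segment_features` is a hypothetical name.

```python
def segment_features(rms, zc_hz):
    """Compute segment-level features from per-frame RMS and ZC values.

    Assumptions: normalized variance = variance / mean^2,
    ZC0 = fraction of frames with zero ZC, ZC given in Hz.
    """
    n = len(rms)
    mean = sum(rms) / n
    var = sum((r - mean) ** 2 for r in rms) / n
    norm_var = var / (mean * mean) if mean > 0 else 0.0
    zc0 = sum(1 for z in zc_hz if z == 0) / len(zc_hz)
    return {
        "norm_rms_var": norm_var,   # sigma^2_A
        "zc0": zc0,                 # probability of null ZC
        "max_zc": max(zc_hz),       # maximal mean frequency
    }

# Toy segment of five frames.
features = segment_features(
    rms=[1.0, 2.0, 3.0, 4.0, 5.0],
    zc_hz=[0, 100, 200, 0, 3000],
)
```

For this toy segment the mean RMS is 3 and the variance is 2, so the assumed normalized variance is 2/9.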

  11. Classification (3/4)
      Actual Features specification
      • Joint RMS/ZC measure, Cz
        • Speech: high correlation between RMS and ZC (many void intervals ⇒ low RMS and ZC)
        • Music: essentially independent RMS and ZC
      • Void intervals frequency, Fu
        • Void interval detection (20 msec): (RMS < T1) && (RMS < 0.1·max(RMS(i))) && (RMS < T2) || (ZC = 0)
        • Group neighboring silent intervals
        • Fu: frequency of grouped silent intervals
        • Always in speech Fu > 0.6
        • In at least 65% of music Fu < 0.6
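The void-interval feature can be sketched as below, with several assumptions stated up front: the detection rule is evaluated with C-style precedence, that is (A && B && C) || (ZC = 0); the thresholds T1 and T2 are not given on the slide, so the values used here are placeholders; and Fu is read as the number of grouped void intervals per second. The function names are hypothetical.

```python
def void_flags(rms, zc, t1, t2):
    """Mark each 20 msec frame as void or not.

    Assumed grouping: (RMS < T1 and RMS < 0.1 * max(RMS) and RMS < T2) or ZC == 0.
    """
    peak = max(rms)
    return [((r < t1) and (r < 0.1 * peak) and (r < t2)) or (z == 0)
            for r, z in zip(rms, zc)]

def void_interval_frequency(flags, frame_sec=0.02):
    """Count runs of consecutive void frames and divide by the duration.

    Assumption: Fu is measured in grouped void intervals per second.
    """
    groups = sum(1 for i, f in enumerate(flags)
                 if f and (i == 0 or not flags[i - 1]))
    return groups / (len(flags) * frame_sec)

# One second of frames with two short pauses (placeholder thresholds).
rms = [1.0] * 20 + [0.01] * 5 + [1.0] * 20 + [0.01] * 5
zc = [10] * 50
flags = void_flags(rms, zc, t1=0.05, t2=0.05)
fu = void_interval_frequency(flags)
```

Grouping neighboring void frames before counting keeps one long pause from being counted as many intervals.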

  12. Classification (4/4)
      Silence segment recognition
      • Segment is silence ⇔ E < threshold
      Decision making algorithm
      • [Flowchart: silence segment check (silence), then actual features check (speech or music)]
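The flowchart itself is not recoverable, so the following is only an illustrative rule cascade, not the authors' decision algorithm. It uses the silence check and the thresholds quoted on the classification slides (ZC0 > 0, 2.4 kHz, Fu = 0.6); the order of the checks, the silence threshold value, and the "undecided" fallback (where σ²A and Cz would be consulted) are assumptions.

```python
def classify_segment(energy, zc0, max_zc_hz, fu, silence_threshold):
    """Toy decision cascade based on the thresholds quoted in the slides.

    Assumptions: check order, silence_threshold value, and the fallback.
    """
    if energy < silence_threshold:
        return "silence"            # segment is silence iff E < threshold
    if zc0 > 0:
        return "speech"             # ZC0 is reported as always 0 in music
    if max_zc_hz > 2400:
        return "music"              # speech almost always stays below 2.4 kHz
    if fu < 0.6:
        return "music"              # speech is reported to always have Fu > 0.6
    return "undecided"              # would need the remaining features

labels = [
    classify_segment(0.001, 0.0, 500, 0.0, silence_threshold=0.01),
    classify_segment(0.2, 0.1, 1500, 1.2, silence_threshold=0.01),
    classify_segment(0.2, 0.0, 3000, 1.0, silence_threshold=0.01),
    classify_segment(0.2, 0.0, 1500, 0.2, silence_threshold=0.01),
    classify_segment(0.2, 0.0, 1500, 1.0, silence_threshold=0.01),
]
```

The rule on max(ZC) is a heuristic, since the slides describe it as holding "almost always" rather than always.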

  13. Results
      Data
      • 11,328 sec speech
      • 3,131 sec music
      Data source
      • 70% audio CDs
      • 15% WWW
      • 15% recordings
      Segmentation performance
      • 97% detection probability
      • Change accuracy ~ 0.2 sec
      Actual features performance
      • [Chart: accuracy by feature set; labels include ZC0, Cz, σ²A, Fu, their combinations with σ²A, and all features]

  14. Conclusion
      Complexity
      • Minimum complexity O(N)
      • Low computation cost
      Summary
      • Real-time segmentation and classification into three classes
      • Energy distribution (RMS) suffices for segmentation
      • RMS and ZC suffice for classification
      • Purpose: minimum cost and high performance
      Future extension
      • Content-based indexing and retrieval of audio signals
      • Pre-processing stage for speech recognition

  15. Segmentation - Classification Demo

  16. Sound Player Demo
