
Speaker Recognition Research in Joensuu

Speech and Image Processing Unit (SIPU) http://cs.joensuu.fi/sipu/. Puheteknologian talviseminaari. Speaker Recognition Research in Joensuu. Pasi Fränti. Joensuu 10.3.2006. Goals for PUMS season 3 (1/2). Usability of automatic speaker identification in forensic applications


Presentation Transcript


  1. Speech and Image Processing Unit (SIPU), http://cs.joensuu.fi/sipu/. Puheteknologian talviseminaari (Speech Technology Winter Seminar). Speaker Recognition Research in Joensuu. Pasi Fränti. Joensuu, 10.3.2006

  2. Goals for PUMS season 3 (1/2) • Usability of automatic speaker identification in forensic applications • Compatibility with large databases • Automatization of LTAS + fusion with MFCC. • Voice activity detection

  3. Goals for PUMS season 3 (2/2) • Speaker verification in real (noisy) environment • Prototype for access control • Solving technical requirements for prototype in elevator. • Usability for detecting sound sources in general • Key word search (using HTK or Lingsoft Recognizer)

  4. Research Group PUMS personnel Pasi Fränti Professor Ilja Sidoroff Marko Tuononen, BSc Rosa Gonzalez-Hautamäki, MSc Doctoral researchers Collaborators Juhani Saastamoinen, PhLic Ismo Kärkkäinen, MSc Ville Hautamäki, MSc Tomi Kinnunen, PhD (Singapore) Victoria Yanulevskaya Evgeny Karpov, MSc (NRC)

  5. 1. Applicability to forensic applications • Automatic speaker recognition study has been done. • Results are not reported but actions taken within tasks 3 and 4. • Material can be found in Kinnunen’s PhD thesis [4] and Niemi-Laitinen’s presentation.

  6. 2. Support for large databases - Not yet done -

  7. 3. LTAS and other features • Automatic calculation of LTAS done. Integration to WinSprofiler in progress. Reporting in progress. • Benefit of LTAS is merely its speed and ease of use: no difficult control parameters. • No additional benefit to recognition accuracy. MFCC includes the same information. • Could be used for preliminary pruning in case of large datasets.
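The LTAS computation mentioned above can be sketched as follows. This is a minimal illustration, assuming one common definition of the long-term average spectrum (the power spectrum of windowed frames averaged over the whole recording); it is not the WinSprofiler implementation, and the function name, frame length, hop size and test tone are all hypothetical choices. It also shows why there are no difficult control parameters: only the frame length and hop need to be set.

```python
import cmath
import math


def ltas(samples, frame_len=64, hop=32):
    """Average the power spectrum of Hann-windowed frames (naive DFT)."""
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_bins = frame_len // 2 + 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    # Precompute the DFT basis once: basis[k][n] = exp(-2*pi*i*k*n/N).
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len)
              for n in range(frame_len)] for k in range(n_bins)]
    acc = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        for k in range(n_bins):
            coeff = sum(x * b for x, b in zip(frame, basis[k]))
            acc[k] += abs(coeff) ** 2
        n_frames += 1
    return [a / n_frames for a in acc]


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1024)]
spectrum = ltas(tone)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_hz = peak_bin * fs / 64  # bin spacing is fs / frame_len = 125 Hz
```

The result is a single fixed-length vector per recording, which is what makes a cheap preliminary comparison over a large dataset plausible.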

  8. Noise robustness of F0 feature Results reported in [3, 5]

  9. 4. Voice activity detection • Software for speech segmentation (VoiceGrep). • Command line version for Linux. • Windows version in WinSprofiler. • Testing done in SIPU laboratory. • Labtec® pc mic 333, 44.1 kHz • Recordings were amplified by 24 dB with the Audacity audio editor
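The slides do not describe how VoiceGrep decides what is speech. As a stand-in, the sketch below uses plain short-time energy thresholding, which is an assumption and not VoiceGrep's method; the function names, frame length and threshold are hypothetical.

```python
import math


def frame_energies_db(samples, frame_len):
    """Mean power of each non-overlapping frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))
    return energies


def detect_segments(samples, frame_len=160, threshold_db=-30.0):
    """Return (start_frame, end_frame) runs above the threshold, end exclusive."""
    energies = frame_energies_db(samples, frame_len)
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy >= threshold_db and start is None:
            start = i
        elif energy < threshold_db and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


def sine(amplitude, length):
    # Period of 16 samples, so every 160-sample frame holds whole periods.
    return [amplitude * math.sin(2 * math.pi * n / 16) for n in range(length)]


# Hypothetical signal: quiet, loud, quiet (800 samples each).
signal = sine(0.001, 800) + sine(0.5, 800) + sine(0.001, 800)
segments = detect_segments(signal)
```

A pure energy detector reacts to any loud sound, so it cannot by itself separate speech from doors or running water. For reference, a 24 dB gain corresponds to an amplitude factor of 10^(24/20), roughly 15.8.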

  10. 4a. Test material and results • Material • 4 hours in total. • Bad quality recordings: 11-bit data, of which 4-5 bits are information and the rest noise. • VoiceGrep made 168 detections: • 56 speech (33%) • 112 non-speech (67%) • Material included 71 real speech segments: • Average segment length 16 s. • VoiceGrep found 25 of these (35%)
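The percentages on this slide can be reproduced directly from the counts. The sketch below only redoes that arithmetic; calling the two ratios "precision" and "recall" is an interpretation added here, and the function name is hypothetical. The first ratio is counted per detection and the second per speech segment, so they are not measured on the same units.

```python
def detection_rates(detections, speech_detections, true_segments, found_segments):
    """Return the share of detections that were speech and the share of segments found."""
    precision = speech_detections / detections
    recall = found_segments / true_segments
    return precision, recall


# Counts taken from the slide.
precision, recall = detection_rates(
    detections=168, speech_detections=56, true_segments=71, found_segments=25
)
false_alarm_share = 1 - precision
```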

  11. 4b. VoiceGrep overall results

  12. 4c. VoiceGrep example (correct detection): start of the speech is detected correctly; end of the speech is missed. Play sample #1

  13. 4d. VoiceGrep example (false detections): door opening, running water, walking, door. Play sample #2, Play sample #3

  14. 4e. VoiceGrep example (missed speech segment): door, door, speech and walking. Play sample #4

  15. 4f. Entire data set (4 hours): data, speech segments, result of VoiceGrep

  16. 5. Speaker verification in noisy environment • Systematic testing of the effective parameters has been reported in [1]. • Applicability of speaker verification in real environment has been reported in [2] and in Kinnunen’s PhD thesis [4]. • Additional testing will be done if time permits.

  17. 5a. Text-dependent verification in access control • Utilizing time series information improves recognition. • Best result if everyone has their own password.

  18. 6. Prototype for access control Emergency button Microphone Motion detector

  19. 7. Calling elevator (technical requirements) • Communication with OPC-server: • Implemented with Matrikon server. • Program logic to elevator implemented: • Reads variables from OPC-server. • Interprets and shows elevator status. • Includes recording logic. • Speaker- and voice-related functionality: • Not yet implemented. • Main window does not show anything yet.

  20. 8. Usability for detecting sound sources in general - Not yet done -

  21. 9. Keyword search - Not yet done -

  22. Publications (season 3) [1] J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 503-506, October 2005. [2] H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 551-554, October 2005. [3] T. Kinnunen, R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 567-570, October 2005.

  23. Theses (season 3) [4] T. Kinnunen, "Optimizing Spectral Feature Based Text-Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005. [5] R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.

  24. Application scenarios. Speaker recognition divides into speaker verification (a claimed identity is checked: "Is this Bob’s voice?"; a rejected claim means an impostor) and speaker identification (no claim is given: "Whose voice is this?").
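The difference between the two scenarios can be sketched in a few lines. This is a toy illustration, assuming each enrolled speaker is represented by a single feature vector and that matching uses a mean squared error distance; the models, sample, threshold and function names are all hypothetical and not taken from the project's software.

```python
def distance(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def identify(sample, models):
    """Identification: pick the enrolled speaker closest to the sample."""
    return min(models, key=lambda name: distance(sample, models[name]))


def verify(sample, claimed, models, threshold):
    """Verification: accept the claim only if the claimed model is close enough."""
    return distance(sample, models[claimed]) <= threshold


# Hypothetical enrolled speakers and an unknown voice sample.
models = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
sample = [0.1, 0.9]

who = identify(sample, models)
bob_accepted = verify(sample, "Bob", models, threshold=0.05)
alice_accepted = verify(sample, "Alice", models, threshold=0.05)
```

Identification always returns some enrolled speaker, while verification can reject, which is why only verification needs a threshold here.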

  25. Software 1: Console program

  26. Software 2: WinSprofiler

  27. Software 3: Symbian Port to Symbian OS with Series 60 UI platform

  28. Software 4: Door SProfiler Opening laboratory door by speaking

  29. Software 5: Lift SProfiler (to appear in season 4 perhaps…)

  30. Future development (1): Software integration. Diagram labels: keyword search; WinSprofiler, Windows (JoY); Mobile, Series 60 (JoY); DB support; SRLIB: VAD, MSE, F0 extraction, fusion by weighted MSE, VQ, GMM, MFCC, LTAS
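The slide names fusion by weighted MSE without giving details. One possible reading, shown here purely as an assumption, is score-level fusion: each feature set yields an MSE-type distance per speaker, and the fused score is a normalized weighted sum of those distances. The feature names match the slide, but the distances, weights and function name are made up for illustration and this is not SRLIB code.

```python
def fuse(distances, weights):
    """Weighted average of per-feature distances (lower means a better match)."""
    total = sum(weights.values())
    return sum(weights[f] * distances[f] for f in weights) / total


# Hypothetical per-feature distances between a test sample and each speaker.
per_speaker = {
    "Alice": {"mfcc": 0.20, "ltas": 0.50, "f0": 0.40},
    "Bob": {"mfcc": 0.30, "ltas": 0.10, "f0": 0.20},
}
# Hypothetical weights; in practice they would be tuned on development data.
weights = {"mfcc": 0.6, "ltas": 0.2, "f0": 0.2}

fused = {name: fuse(d, weights) for name, d in per_speaker.items()}
best = min(fused, key=fused.get)
```

In this toy case MFCC alone would favor Alice, while the fused score favors Bob, which shows how the weights decide how much the secondary features can override the primary one.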

  31. Future development (2): Applications. Diagram labels: call center; forensic applications; calling elevator; speech analyzer tool; access control; common speaker recognition app. interface; verification; classifier fusion; segmentation; keyword search; srlib; VAD; DB

  32. Future development (3) Technical development • Implement and integrate F0, possibly also formants (F1, F2). • Automatic voiced/unvoiced segmentation. • User enrollment. • Use of sequence information (triplets). • Development of WinSprofiler software toward a voice profiler and speech analyzer tool.

  33. Future development (4): Elevator prototype. Diagram labels: machine room; lift car & hardware; CAN; GW box; Ethernet TCP/IP; display; microphone; approach detection; our PC; OPC server; DCOM; OPC client; LiftCaller; SRLIB 3.0

  34. Vision 1: Teleconferencing. Diagram labels: Alice, Bob, Paul and Minna, each with a speaker recognition step; VPN; Alice: verified & allowed; Minna: not registered, shown as Unknown

  35. Vision 2: Call-center • Speech is the main tool for people in call-center • Voice login of personnel • Removes the need for manual entry

  36. Vision 3: Language recognition • Related problem to speaker recognition: the same research groups usually study both problems. • Not trivial to solve. • Studied a lot for Asian languages, even for rare languages that do not have any "written form".

  37. Vision 4: Medical applications • Doctors use voice to record summaries of patient meetings. • Access by keyword search. • Annotation. • Authentication of speaker.

  38. Thank you for your patience! Questions?
