
Latent Semantic Mapping (LSA)

Presentation Transcript


  1. Latent Semantic Mapping (LSA) 阮鶴鳴 (CS, senior year) 李運寰 (CS, senior year) “A multi-span language modeling framework for large vocabulary speech recognition,” J.R. Bellegarda, 1998

  2. LSM feature • Let M be an inventory of M individual units. • Let N be a collection of N compositions. • Define a mapping between (M, N) and a continuous vector space L. • Each unit in M is represented as a vector in L. • Each composition in N is represented as a vector in L.

  3. Feature extraction • c_ij : the number of times unit w_i occurs in composition c_j • n_j : the total number of units present in composition c_j • ε_i : the normalized entropy of w_i across the corpus, ε_i = -(1/log N) Σ_j (c_ij / t_i) log(c_ij / t_i), where t_i = Σ_j c_ij • The (i, j) entry of the unit-composition matrix is w_ij = (1 - ε_i) c_ij / n_j
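A minimal NumPy sketch of this entropy-weighted feature extraction, following the definitions above (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def lsm_matrix(counts):
    """Build the entropy-weighted unit-composition matrix W.

    counts: (M, N) array, counts[i, j] = number of times unit i occurs in composition j.
    """
    counts = np.asarray(counts, dtype=float)
    M, N = counts.shape
    n_j = counts.sum(axis=0)                 # total units per composition
    t_i = counts.sum(axis=1, keepdims=True)  # total occurrences of each unit
    p = np.divide(counts, t_i, out=np.zeros_like(counts), where=t_i > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    eps = -plogp.sum(axis=1) / np.log(N)     # normalized entropy of each unit
    W = (1.0 - eps)[:, None] * counts / n_j  # w_ij = (1 - eps_i) * c_ij / n_j
    return W
```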

  4. Singular Value Decomposition (SVD) • W ≈ U S V^T (order-R truncation) • U : left singular vectors of W (eigenvectors of W W^T) • S : diagonal matrix of singular values ((eigenvalues of W W^T)^(1/2)) • V : right singular vectors of W (eigenvectors of W^T W)
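A minimal sketch (NumPy; the order R is a free parameter) of the truncated decomposition and the resulting unit and composition vectors:

```python
import numpy as np

def lsm_space(W, R):
    """Order-R SVD of the unit-composition matrix, W ≈ U_R S_R V_R^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_R, S_R, V_R = U[:, :R], np.diag(s[:R]), Vt[:R, :].T
    unit_vectors = U_R @ S_R   # rows: units represented in the space L
    comp_vectors = V_R @ S_R   # rows: compositions represented in L
    return U_R, S_R, V_R, unit_vectors, comp_vectors
```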

  5. LSM framework extension

  6. Improved Topic Separability Under LSA

  7. Improved Topic Separability Under LSA • Inter-topic: • LSA does not appreciably affect the average distance between inter-topic pairs. • Intra-topic: • LSA dramatically reduces the average distance between intra-topic pairs. • Conclusion: • Maybe this is because the mapping reflects the semantic similarity between units and compositions.

  8. Improved Topic Separability Under LSA - Something Correlative • LSI (Latent Semantic Indexing) • It also uses SVD to span the LSI space. • The example above comes from an experiment in a paper on LSI: • 2000 terms • 20 topics • 100 terms per topic in the prime set • 95% from the prime set / 5% from others • Distances are in radians
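Since the distances are quoted in radians, they are presumably angles between vectors in the LSA/LSI space; a minimal sketch of that measure (an assumption, not stated on the slide):

```python
import numpy as np

def angular_distance(a, b):
    """Angle in radians between two vectors in the LSA space."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```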

  9. Improved Topic Separability Under LSA -Something Correlative

  10. LSM applications • Mapping (M, N) to the same vector space L. • In LSA: M is an inventory of individual words/units; N is a collection of meaningful compositions. • Using the SVD formalism.

  11. Example • Unit set M: • what • is • the • time • day • meeting • cancel • Four sentences: • 1) “what is the time.” • 2) “what is the day.” • 3) “what time is the meeting.” • 4) “cancel the meeting.”
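As an illustration, a small sketch that builds the 7 x 4 count matrix for this example and maps it through the lsm_matrix and lsm_space sketches above (tokenization is hypothetical: lowercase, punctuation dropped):

```python
units = ["what", "is", "the", "time", "day", "meeting", "cancel"]
sentences = [
    "what is the time",
    "what is the day",
    "what time is the meeting",
    "cancel the meeting",
]

# 7 x 4 count matrix: counts[i][j] = occurrences of unit i in sentence j
counts = [[s.split().count(u) for s in sentences] for u in units]

W = lsm_matrix(counts)                              # entropy-weighted matrix
_, _, _, unit_vecs, comp_vecs = lsm_space(W, R=2)   # units and sentences in L
```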

  12. LSM additional application -- junk e-mail filtering • Separate legitimate from junk mail. • (M, N): M is an inventory of words or symbols; N is a collection of legitimate and junk e-mail messages.
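One way such a filter could be sketched (an illustration, not necessarily the method in the talk): fold a new message into the LSM space and compare it with the centroids of the legitimate and junk training e-mails by cosine similarity. U_R and S_R are assumed to come from the SVD sketch above.

```python
import numpy as np

def fold_in(d, U_R, S_R):
    """Map a new message (weighted count vector d, length M) into the LSM space."""
    return d @ U_R @ np.linalg.inv(S_R)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def classify(d, U_R, S_R, legit_centroid, junk_centroid):
    """Nearest-centroid decision in the LSM space (illustrative)."""
    v = fold_in(d, U_R, S_R)
    return "legitimate" if cosine(v, legit_centroid) >= cosine(v, junk_centroid) else "junk"
```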

  13. LSM additional application -- pronunciation modeling • Assigning phonetic transcriptions to graphemic words (grapheme-to-phoneme conversion, GPC). • (M, N): M is an inventory of letter n-tuples; N is a collection of words. • What happens when an out-of-vocabulary word occurs? • It is straightforward to gather the corresponding set of pronunciations from the existing dictionary.

  14. Example: pronunciation modeling • Unit set M: • tho • hou • oug • ugh • gh~ • ~th • rou • thr • hro • ~ro • Four words (composition set N): • “rough” • “though” • “through”
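A small sketch (assuming “~” marks the word boundary, as in the unit list above) of decomposing words into the letter 3-tuples that populate the (M, N) matrix:

```python
def letter_ngrams(word, n=3, boundary="~"):
    """Decompose a word into boundary-marked letter n-tuples."""
    padded = boundary + word + boundary
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# letter_ngrams("rough")   -> ['~ro', 'rou', 'oug', 'ugh', 'gh~']
# letter_ngrams("through") -> ['~th', 'thr', 'hro', 'rou', 'oug', 'ugh', 'gh~']
```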

  15. LSM additional application -- TTS unit selection • (M,N) : • M an inventory of centered pitch periods from the boundary region • N a collection of time slices

  16. LSM additional application -- TTS unit selection

  17. LSM additional application -- TTS unit selection

  18. LSM additional application -- TTS unit selection

  19. LSM additional application -- TTS unit selection • TTS: • NLP + DSP • What are MLDS and FSS?

  20. LSM additional application -- TTS unit selection • NLP: • Accurate phonetic transcription can only be achieved provided the part-of-speech category of some words is available and the dependency relationships between successive words are known. • Natural prosody relies heavily on syntax. It also obviously has a lot to do with semantics and pragmatics, but since very little data is currently available on the generative aspects of this dependence, TTS systems merely concentrate on syntax. Yet few of them are actually provided with full disambiguation and structuring capabilities.

  21. The LSM inherent tradeoffs/drawbacks • Descriptive power vs. mathematical tractability -- e.g., the choice of the dimension R. • Updating data vs. computation -- e.g., SVD recomputation when new data arrives. • Narrow training data and polysemy.

  22. Word clustering example • Cluster 1: antique, art, artist, collector, drawing, painter, Picasso, … • Cluster 2: attorney, court, criminal, judge, jury, rule, witness, … • Note the polysemy: “drawing a conclusion” and “breaking a rule” use “drawing” and “rule” in senses outside their clusters.
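Clusters of this kind could be obtained, for example, by running k-means on the unit vectors U_R S_R from the SVD sketch above (a sketch, not the procedure used in the talk; word_vectors and vocabulary are assumed to be defined):

```python
from sklearn.cluster import KMeans

# word_vectors: rows are the U_R @ S_R unit vectors from the SVD sketch above;
# vocabulary: the corresponding list of words
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(word_vectors)
clusters = {k: [w for w, l in zip(vocabulary, labels) if l == k] for k in set(labels)}
```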

  23. References • “Latent semantic indexing: A probabilistic analysis,” in Proc. 17th ACM Symposium on Principles of Database Systems, 1998. • Thierry Dutoit, “High-quality text-to-speech synthesis: An overview,” Faculté Polytechnique de Mons, TCTS Lab.
