Enhancing Multilingual Learning Representations: Insights from International MLR Issues

This presentation outlines MLR issues in international markets, chiefly Japan, with notes on China and Germany. International MLR has to do more with less data, may need new or different features, must handle different query types (English, bracket, phrase, and so on), and relies on metrics that were designed for English. Whether to blend different languages is raised as an open question. China has much more spam and Japan much less, while Germany looks 10-20% ahead of Google by DCG. For Japanese, Query Word Length is very important, query type is important, and phonetic URL match is a new feature. Planned future features are vcano match and matching segmented chunks.

Presentation Transcript


  1. Japanese MLR

  2. International/JP MLR Issues
     • Have to do more with less data
     • Blending different languages?
     • Can’t necessarily filter adult
     • May need new/different features
     • Different types of queries: English/Bracket/Phrase/etc.
     • Metrics designed for English
     • China has lots more spam
     • Japan has much less spam
     • Germany looks 10-20% ahead of Google by DCG
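
Slide 2 compares markets by DCG. The slides do not say which DCG variant was used, so the sketch below assumes the common formulation in which the result at rank i contributes rel_i / log2(i + 1). The function names and the idea of expressing the lead as a relative difference are illustrative assumptions, not taken from the presentation.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain for graded relevance labels given in ranked order.

    Assumes gain = rel and discount = log2(rank + 1), with ranks starting at 1.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def relative_lead(dcg_ours, dcg_theirs):
    """Fractional lead of one engine's DCG over another's (0.1 means 10% ahead)."""
    return (dcg_ours - dcg_theirs) / dcg_theirs


# Hypothetical relevance labels for the same query on two engines.
ours = dcg([3, 2, 1, 0], k=4)
theirs = dcg([2, 3, 0, 1], k=4)
print(f"relative lead: {relative_lead(ours, theirs):.1%}")
```

In practice the per-query DCG values would be averaged over a query sample before the comparison; a single query is shown here only to keep the example short.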

  3. JP MLR vs. English MLR

  4. Different features important for JP
     • http://internal.inktomi.com/~lukeb/FeatureImportance.html
     • “Linkflux”
     • How soon the word appears in the document
     • Is the first word in query in the title
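
Two of the features on slide 4 are simple enough to sketch. The code below is a rough illustration, assuming the query, title, and document have already been split into tokens (Japanese text needs a word segmenter for this, which is not shown). The function names and the choice to normalize position by document length are assumptions, not the presentation's definitions.

```python
def first_occurrence_position(term, doc_tokens):
    """How soon a term appears: 0.0 for the first token, approaching 1.0 for the last.

    Returns 1.0 when the term does not occur (or the document is empty).
    """
    for index, token in enumerate(doc_tokens):
        if token == term:
            return index / len(doc_tokens)
    return 1.0


def first_query_word_in_title(query_tokens, title_tokens):
    """True when the first word of the query occurs anywhere in the title."""
    if not query_tokens:
        return False
    return query_tokens[0] in title_tokens


# Hypothetical, pre-segmented inputs.
query = ["東京", "天気"]
title = ["東京", "の", "天気"]
body = ["今日", "の", "東京", "は", "晴れ"]

print(first_occurrence_position(query[0], body))
print(first_query_word_in_title(query, title))
```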

  5. New features for JP
     • Query Word Length very important
     • Query type important
     • Phonetic url match
     • Future:
       • vcano match
       • Matching segmented chunks
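
The slides name Query Word Length and query type as features but do not define them. The sketch below is one possible reading, offered only as an illustration: word length is taken as the character count of each pre-segmented query word, and the query types follow the English/Bracket/Phrase categories from slide 2 with hypothetical detection rules (surrounding double quotes for a phrase, any bracket character for a bracket query, all-ASCII text for English).

```python
# Bracket characters treated as marking a "bracket" query (an assumption).
BRACKETS = set("[]()「」『』【】（）")


def query_word_lengths(query_tokens):
    """Character length of each query word; tokens are assumed to be pre-segmented."""
    return [len(token) for token in query_tokens]


def query_type(query):
    """Classify a raw query string as 'phrase', 'bracket', 'english', or 'other'."""
    stripped = query.strip()
    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
        return "phrase"
    if any(ch in BRACKETS for ch in stripped):
        return "bracket"
    if stripped and stripped.isascii():
        return "english"
    return "other"


# Hypothetical queries.
for q in ['"tokyo weather"', "「東京」 天気", "tokyo weather", "東京 天気"]:
    print(q, "->", query_type(q))
```

The order of the checks matters: a quoted query is classed as a phrase before the bracket and ASCII tests run, so each query gets exactly one type.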
