
Cross-Lingual Linking of News Stories using ESA

Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Tamara Polajnar, Jorge Gracia. DERI, NUI Galway, Ireland; OEG, UPM, Madrid, Spain. Tuesday, 18 Dec, 2012. CL!NSS, FIRE-2012. Overview: Problem Space, Approach, Search Space Reduction



Presentation Transcript


  1. Cross-Lingual Linking of News Stories using ESA • Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Tamara Polajnar, Jorge Gracia • DERI, NUI Galway, Ireland • OEG, UPM, Madrid, Spain • Tuesday, 18 Dec, 2012 • CL!NSS, FIRE-2012

  2. Overview • Problem Space • Approach • Search Space Reduction • Semantic Ranking • Cross-Lingual Explicit Semantic Analysis (CL-ESA) • Evaluations • Conclusion & Future Work

  3. Problem Space • Cross-lingual news story linking • identify the same news articles in different languages • Cross-Lingual Plagiarism detection • Data set • 50 English News Stories • 50K Hindi News Stories • Challenge • Not directly Translated • Similar keywords in different stories • Different keywords in similar stories

  4. Approach • Search Space Reduction • News publication dates • by taking a window of K days • Vocabulary overlap • Translating English news stories using Google Translate • Semantic Ranking • Rank the news stories by their semantic relatedness • CL-ESA semantic relatedness score
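The date-based reduction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Story` record, its field names, and the `candidates_in_window` helper are assumptions made for the example, and the window is assumed to be symmetric around the English story's publication date with inclusive boundaries.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Story:
    """Hypothetical record for a news story; field names are illustrative."""
    story_id: str
    published: date


def candidates_in_window(query: Story, pool: list[Story], k_days: int) -> list[Story]:
    """Keep only stories published at most k_days before or after the query."""
    window = timedelta(days=k_days)
    return [s for s in pool if abs(s.published - query.published) <= window]


# Made-up dates, used only to show the filter at work.
english = Story("en-1", date(2012, 6, 10))
hindi_pool = [
    Story("hi-1", date(2012, 6, 9)),   # 1 day before: kept for k_days=2
    Story("hi-2", date(2012, 6, 12)),  # 2 days after: kept (boundary is inclusive)
    Story("hi-3", date(2012, 6, 20)),  # 10 days after: dropped for k_days=2
]
kept = candidates_in_window(english, hindi_pool, k_days=2)
print([s.story_id for s in kept])
```

A wider window keeps more true matches in the candidate set at the cost of ranking more stories, which is the trade-off the later runs vary.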

  5. Semantic Ranking/Relatedness • Corpus-based Relatedness • Semantic meaning as a distributional vector • Words that occur in similar contexts tend to have similar/related meanings, i.e. the meaning of a word can be defined in terms of its context (Distributional Hypothesis; Harris, 1954) • Latent Semantic Analysis (LSA) • Latent or implicit semantics (unsupervised) • Explicit Semantic Analysis (ESA) • Explicit semantics from explicitly derived concepts (supervised)
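To make the distributional idea concrete, the sketch below builds co-occurrence vectors from a toy corpus and compares them with cosine similarity. The corpus, the window size, and the helper names `context_vectors` and `cosine` are assumptions for illustration; this is plain context counting, not LSA or ESA.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy corpus (made up): two verbs share their contexts, the third does not.
corpus = [
    "minister announced budget today",
    "minister presented budget today",
    "striker scored goal yesterday",
]
vecs = context_vectors(corpus)
sim_related = cosine(vecs["announced"], vecs["presented"])
sim_unrelated = cosine(vecs["announced"], vecs["scored"])
print(sim_related > sim_unrelated)
```

Here "announced" and "presented" never co-occur, yet they come out as similar because they appear in the same contexts, which is exactly what the distributional hypothesis predicts.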

  6. Cross lingual ESA (CL-ESA) • Multilingual Wikipedia Index • EN, DE, ES, PT, FR, NL, HI • Easily extendable for other languages • Performed better than CL-latent models • [Diagram: per-language inverted indexes (EN, HI, ES) map each word (Word1 … Wordn) to a weighted concept vector w1*URI1 + w2*URI2 + … + wn*URIn; the concept vectors of Term@en and Term@hi are compared by cosine to give the semantic relatedness score]
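The CL-ESA comparison described on this slide can be sketched with toy data. Everything concrete below is an assumption for illustration: the two tiny indexes, the concept URIs, the weights, and the function names. In ESA the weights are typically TF-IDF scores of a word in the Wikipedia article behind each concept; here they are made-up numbers. The point is only that both languages map into the same concept space, so their vectors can be compared directly.

```python
import math
from collections import Counter

# Hypothetical per-language indexes: word -> {shared concept URI: weight}.
EN_INDEX = {
    "election": {"wiki:Election": 0.9, "wiki:Parliament": 0.4},
    "vote": {"wiki:Election": 0.7, "wiki:Parliament": 0.2},
    "cricket": {"wiki:Cricket": 1.0},
}
HI_INDEX = {
    "चुनाव": {"wiki:Election": 0.8, "wiki:Parliament": 0.5},
    "क्रिकेट": {"wiki:Cricket": 0.9},
}


def concept_vector(text, index):
    """Sum the concept vectors of all known words in the text."""
    vec = Counter()
    for token in text.lower().split():
        for uri, weight in index.get(token, {}).items():
            vec[uri] += weight
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cl_esa_relatedness(text_a, index_a, text_b, index_b):
    """Relatedness of two texts in different languages via shared concepts."""
    return cosine(concept_vector(text_a, index_a), concept_vector(text_b, index_b))


same_topic = cl_esa_relatedness("election vote", EN_INDEX, "चुनाव", HI_INDEX)
different_topic = cl_esa_relatedness("cricket", EN_INDEX, "चुनाव", HI_INDEX)
print(same_topic > different_topic)
```

No translation step is involved: the English and Hindi texts are never compared word by word, only through the concepts they activate.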

  7. Experiments • Run1 • window of 4 days (2 days before and 2 days after) • Rank all news stories using CL-ESA • Run2 • window of 14 days (7 days before and 7 days after) • Rank all news stories using Modified CL-ESA • Run3 • English stories were translated into Hindi using Google Translate • Took the top 1000 Hindi news stories by vocabulary overlap • Re-rank all news stories using CL-ESA
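The two-stage shape of Run3 (shortlist by vocabulary overlap, then re-rank) can be sketched as below. The slides do not say how overlap was measured, so Jaccard overlap on token sets is an assumption, as are the function names, the transliterated placeholder tokens, and the precomputed scores that stand in for CL-ESA relatedness.

```python
from typing import Callable


def vocabulary_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets (an assumed overlap measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def top_k_by_overlap(translated_query: str, candidates: dict[str, str], k: int) -> list[str]:
    """Return the ids of the k candidates sharing the most vocabulary with the query."""
    ranked = sorted(
        candidates,
        key=lambda cid: vocabulary_overlap(translated_query, candidates[cid]),
        reverse=True,
    )
    return ranked[:k]


def rerank(shortlist: list[str], score: Callable[[str], float]) -> list[str]:
    """Re-order the shortlist by a relatedness score, highest first."""
    return sorted(shortlist, key=score, reverse=True)


# Placeholder tokens standing in for a machine-translated query and Hindi stories.
translated_query = "chunav parinam aaj"
candidates = {
    "hi-1": "chunav parinam ghoshit",
    "hi-2": "cricket match jeet",
    "hi-3": "chunav prachar shuru",
}
shortlist = top_k_by_overlap(translated_query, candidates, k=2)

# Made-up scores standing in for CL-ESA relatedness to the original English story.
stand_in_scores = {"hi-1": 0.4, "hi-3": 0.7}
ranked = rerank(shortlist, lambda cid: stand_in_scores.get(cid, 0.0))
print(shortlist, ranked)
```

Any story dropped by the overlap stage can never be recovered by the re-ranker, so the quality of the translation and of the overlap measure bounds the final result.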

  8. Evaluation: Results • CL!NSS challenge

  9. Conclusion • Initial approach for cross-lingual linking of news stories • Bigger window with modified CL-ESA works best • Translated vocabulary overlap did not work well • Use other ranking scores • LSA, LDA • Evaluate separate effect of components • Bigger window size vs. ranking function

  10. Thank You Questions?
