TopicTrend

Discover Emerging and Novel Research Topics. TopicTrend. By: Jovian Lin. Introduction. Formulating a research idea is the first step toward success in academia. A worthy research idea must be original and innovative.

Presentation Transcript


  1. Discover Emerging and Novel Research Topics TopicTrend By: Jovian Lin

  2. Introduction • Formulating a research idea is the first step toward success in academia. • A worthy research idea must be original and innovative. • To come up with innovative research ideas, researchers have to read many published articles… • … which is time-consuming.

  3. Search Engines in Digital Libraries • “Is there any shortcut to success?” “No.” • “There are efficient ways to achieve success.”

  4. Introduction • Search engines support information seeking and retrieval. • Flow: “Search Query” → Search Engine → List of titles (of articles)

  5. Search Results

  6. Introduction • Search engines support information seeking and retrieval. • However, is this enough for the junior researcher? How useful is this result to the junior researcher? • Junior researchers: FYP students, 1st-year PhD students. • What they need: • Define a research topic (from zero knowledge) • Help with surveys • Identify emerging/new research areas to explore • Determine related topics

  7. Problem Definition • Junior researchers want to: • Understand research topics and trends. • Recognize HOT topics. • Understand how topics interact and influence research activity.

  8. Problem Definition • Junior researchers want to: • Understand research topics and trends. • Recognize HOT topics. • Understand how topics interact and influence research activity. • Current inefficient method: Enter a search query → View results → Select a few articles to read → Extract new terms from the selected article (and repeat)

  9. Search Results

  10. Information overload !

  12. Problem Definition • Junior researchers want to: • Understand research topics and trends. • Recognize HOT topics. • Understand how topics interact and influence research activity. • Desired efficient method: Enter a search query → TopicTrend → View results: • List of HOT research topics (related to the search query) • Visualization of the research topics • Do it quick!

  13. Quick Demo

  14. Evaluation • Recruited 4 participants (domain / level): • Chemistry / PhD • Engineering (Transportation) / PhD • Computer Science (AI) / PhD • Engineering / FYP • Participants: • Tested TopicTrend using queries from their respective domains. • Rated TopicTrend’s output (w.r.t. their query). [Quantitative] • Filled out a questionnaire. [Qualitative]

  15. Evaluation • Example ratings for the query “machine learning” (1 or 0 per topic): • Topic A: 1 • Topic B: 0 • Topic C: 1 • Topic D: 1 • Topic E: 1 • Topic F: 1 • Topic G: 1 • Topic H: 1 • Topic I: 1 • Topic J: 1 • Score: 9/10 • [Figure: visualization of Topics A to J]
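
The 9/10 score in the “machine learning” example appears to be the fraction of listed topics that the participant rated 1. A minimal sketch of that arithmetic follows; the function name `query_score` is illustrative and not part of TopicTrend, and the per-topic ratings are read off the example slide.

```python
def query_score(ratings):
    """Fraction of topics rated 1 for a single query (ratings are 0 or 1)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


# Ratings from the "machine learning" example: only Topic B was rated 0.
example_ratings = {
    "Topic A": 1, "Topic B": 0, "Topic C": 1, "Topic D": 1, "Topic E": 1,
    "Topic F": 1, "Topic G": 1, "Topic H": 1, "Topic I": 1, "Topic J": 1,
}

score = query_score(list(example_ratings.values()))
print(f"Score: {score:.0%}")  # 9 of the 10 topics were rated 1
```

How the individual query scores were combined into the reported average is not stated in the slides, so that step is left out here.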

  16. Evaluation: Quantitative • Average score = 68.125%

  17. Evaluation: Qualitative • Questionnaire using a five-point Likert scale. • 1 = Disagree, 5 = Agree. • Some examples: • “The system was easy to use.” • “The system gave interesting results.” • “I was able to get a better understanding of the topics.” • “I was able to discover trends.” • “I was able to discover relationships between topics.” • “I was able to discover potential, novel topics.” • Ratings shown on the slide: 4.75 / 5, 4 / 5, 4 / 5, 4 / 5, 4 / 5, 4 / 5. • Details in Project Report.

  18. Conclusion • TopicTrend is a visualization tool that helps junior researchers: • Understand research topics and trends. • Recognize HOT topics. • Understand how topics interact and influence research activity. • However, results were mediocre. • Due to the presence of stop phrases (e.g., “problem set”, “proposed model”). • Solutions and Future Work: • TF-IDF weighting: no need to manually enter stop words. • A statistical measure of how important a word is. • The importance increases in proportion to the number of times a word appears in the document... • ...but is offset by the frequency of the word in the corpus. • Latent Dirichlet Allocation (LDA): view each abstract as a mixture of topics. (David Blei) • Online LDA: find topics faster than normal LDA; analyze in a stream. • Dynamic Topic Models (DTM): capture the word evolution of each topic over time. • Search by exemplar (instead of search by keyword). • Benefits users who have difficulty expressing their query.
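
To illustrate why TF-IDF could remove the need for a manual stop-phrase list, here is a minimal sketch of one common variant (relative term frequency times the natural log of inverse document frequency). It is not TopicTrend's code; the function name `tf_idf` and the toy token lists are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) * ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical abstracts, already tokenized. "proposed" and "model" occur in
# every document, so their inverse document frequency is ln(3/3) = 0.
docs = [
    ["proposed", "model", "topic", "model"],
    ["proposed", "model", "search", "engine"],
    ["proposed", "model", "noun", "phrase"],
]
weights = tf_idf(docs)
print(weights[0])
```

Terms that appear in every abstract get weight 0 without being listed anywhere, while a term unique to one abstract keeps a positive weight.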

  20. Thank You

  21. Backup Slides

  22. Implementation • OpenNLP: a machine-learning-based toolkit for processing natural language text. • Used OpenNLP to retrieve a list of NPs (noun phrases). • Pipeline: An article → OpenNLP Tools (Sentence Detection → Tokenization → Part-of-Speech (POS) Tagging → Chunking and Retrieving NPs) → NP A, NP B, NP C, NP D, NP E, NP F

  23. Implementation • Sentence Detection • Input: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate. Those contraction-less sentences don't have boundary/odd cases...this one does. • Output (one sentence per item): • Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. • Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. • Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate. • Those contraction-less sentences don't have boundary/odd cases...this one does.

  24. Implementation • Tokenization • Input: • Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. • Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. • Output: • [Pierre] [Vinken] [,] [61] [years] [old] [,] [will] [join] [the] [board] [as] [a] [nonexecutive] [director] [Nov.] [29] [.] • [Mr.] [Vinken] [is] [chairman] [of] [Elsevier] [N.V.] [,] [the] [Dutch] [publishing] [group] [.]

  25. Implementation • Part-of-Speech Tagging • Input: • Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. • Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. • Output (one tag per token): • [NNP] [NNP] [,] [CD] [NNS] [JJ] [,] [MD] [VB] [DT] [NN] [IN] [DT] [JJ] [NN] [NNP] [CD] [.] • [NNP] [NNP] [VBZ] [NN] [IN] [NNP] [NNP] [,] [DT] [JJ] [NN] [NN] [.]

  26. Implementation • Text Chunking and Extracting NPs • Text chunking divides a text into syntactically correlated groups of words. • Uses the Tokenization and POS Tagging data. • For example: He reckons the current account deficit will narrow to only # 1.8 billion in September. • Becomes: [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ] .

  27. Implementation • Text Chunking and Extracting NPs • Text chunking divides a text into syntactically correlated groups of words. • Uses the Tokenization and POS Tagging data. • Note the chunk tags: • B-Chunk: marks the first word of a chunk (e.g., B-NP). • I-Chunk: marks a word that continues the current chunk (e.g., I-NP).
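
The B-/I- chunk tags can be turned back into noun phrases by starting a phrase at each B-NP and extending it with the I-NP tags that follow. The sketch below shows that grouping step only. It does not call OpenNLP: the chunk tags are written by hand to match the bracketed example in the slides, and the function name `extract_nps` is illustrative.

```python
def extract_nps(tokens, chunk_tags):
    """Group tokens into noun phrases using B-NP / I-NP chunk tags."""
    phrases, current = [], []
    for token, tag in zip(tokens, chunk_tags):
        if tag == "B-NP":
            # A new NP starts; close any NP that was still open.
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-NP" and current:
            current.append(token)
        else:
            # Any other tag (B-VP, I-VP, B-PP, O, ...) ends the open NP.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = ["He", "reckons", "the", "current", "account", "deficit", "will",
          "narrow", "to", "only", "#", "1.8", "billion", "in", "September", "."]
# Hand-written tags (an assumption) mirroring the bracketed chunks.
chunk_tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP",
              "I-VP", "B-PP", "B-NP", "I-NP", "I-NP", "I-NP", "B-PP", "B-NP", "O"]

noun_phrases = extract_nps(tokens, chunk_tags)
print(noun_phrases)
# ['He', 'the current account deficit', 'only # 1.8 billion', 'September']
```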

  29. Implementation • An algorithm to calculate the score of an NP from its counts (#) in three age bands. • Example 1: # (0 ~ 2 years) = 10, # (2 ~ 4 years) = 2, # (4 yrs & beyond) = 1: $\text{Score} = \frac{10 + 1}{10 + 2 + 1 + 20} = \frac{11}{33} = 0.333$ • Example 2: # (0 ~ 2 years) = 1, # (2 ~ 4 years) = 2, # (4 yrs & beyond) = 10: $\text{Score} = \frac{1 + 1}{1 + 2 + 10 + 20} = \frac{2}{33} \approx 0.061$

  30. Implementation • An algorithm to calculate the score of an NP. • NP list: NP A, NP B, NP C, NP D, NP E, NP F

  31. Implementation • Re-rank the list of NPs based on the score. • Before: NP A, NP B, NP C, NP D, NP E, NP F • After re-ranking (new): NP B, NP D, NP E, NP C, NP A, NP F
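
The scoring and re-ranking steps can be sketched together. The formula below is inferred from the two worked examples on the scoring slide, where the numerator is the 0 ~ 2 years count plus 1 and the denominator is the sum of all three counts plus 20; treat it as an assumption rather than the confirmed algorithm. The names `np_score` and `rerank` and the sample counts are hypothetical.

```python
def np_score(recent, middle, old):
    """Score an NP from its counts in the three age bands.

    recent: # (0 ~ 2 years), middle: # (2 ~ 4 years), old: # (4 yrs & beyond).
    The constants 1 and 20 come from the worked examples (assumed formula).
    """
    return (recent + 1) / (recent + middle + old + 20)


def rerank(counts):
    """Return NP names ordered by score, highest first."""
    return sorted(counts, key=lambda name: np_score(*counts[name]), reverse=True)


# Hypothetical counts: (recent, middle, old) per noun phrase.
counts = {
    "topic x": (1, 2, 10),   # mostly older mentions
    "topic y": (10, 2, 1),   # mostly recent mentions
    "topic z": (3, 0, 0),    # few mentions, all recent
}

for name in rerank(counts):
    print(f"{name}: {np_score(*counts[name]):.3f}")
```

Under this formula an NP whose mentions are concentrated in the newest band rises to the top, and the constant in the denominator keeps NPs with very few mentions from scoring close to 1.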

  32. Implementation • Calculate the relationship strength between NPs by considering the common articles (PIIs) that they have. • The more articles they have in common, the thicker the edge.
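
One way to compute this relationship strength is to keep, for each NP, the set of article identifiers it appears in and count the overlap for every pair. This is a minimal sketch under that assumption; the function name `edge_weights` and the identifiers such as `pii-1` are made up for illustration.

```python
from itertools import combinations


def edge_weights(np_to_piis):
    """Map each NP pair to the number of articles (PIIs) they share.

    np_to_piis: dict of NP name -> set of article identifiers.
    Pairs with no article in common get no edge.
    """
    weights = {}
    for a, b in combinations(sorted(np_to_piis), 2):
        shared = len(np_to_piis[a] & np_to_piis[b])
        if shared:
            weights[(a, b)] = shared
    return weights


# Hypothetical data: which articles mention which NP.
np_to_piis = {
    "NP A": {"pii-1", "pii-2", "pii-3"},
    "NP B": {"pii-2", "pii-3"},
    "NP C": {"pii-9"},
}

edges = edge_weights(np_to_piis)
print(edges)  # {('NP A', 'NP B'): 2}
```

The resulting count can be mapped directly to edge thickness in the visualization.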

  33. The End
