
Where Do You Go for Biomedical Funding?

Yi Liu, Ahmet Altay


Presentation Transcript


  1. Where Do You Go for Biomedical Funding? Yi Liu, Ahmet Altay

  2. Background
     • Problem
       • In biomedical research there are many sources of federal funding.
       • How do you choose the right funding institution for a given research idea?
     • Data
       • Biomedical grant summaries from 20 institutions, covering 1972 to 2009

  3. Pre-Processing
     • Cleaned mark-up, meta words, and duplicates out of the texts
     • Removed institutions with fewer than 5000 grant records
     • Bag-of-words approach with a pre-determined dictionary
     • Removed 319 stop words from the text
     • Used stemming (Porter) to further collapse the text
     • Dictionary size of 83485 with 120636 distinct spellings
     • Used mgrep to annotate our data with dictionary words
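The pre-processing steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the stop-word list, the sample dictionary, and the `crude_stem` helper are all hypothetical, and `crude_stem` is only a simple suffix stripper standing in for the Porter stemmer. The mgrep annotation step is not reproduced.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the slides removed 319 stop words that are not listed.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "for", "is", "on"}


def crude_stem(word):
    """Strip a few common suffixes.

    A stand-in for Porter stemming, not the real algorithm.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text, dictionary):
    """Count stemmed, non-stop-word tokens that appear in a fixed dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = (crude_stem(t) for t in tokens if t not in STOP_WORDS)
    return Counter(s for s in stems if s in dictionary)


# Hypothetical dictionary of stems and a made-up abstract.
dictionary = {"protein", "bind", "tumor", "cell"}
abstract = "Binding of proteins in tumor cells and the signaling of the cell"
counts = bag_of_words(abstract, dictionary)
print(dict(counts))
```

Restricting the counts to a pre-determined dictionary keeps the vocabulary fixed, so every abstract maps onto the same set of columns regardless of which words it happens to contain.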

  4. Histogram for Stems per Abstract

  5. Processing
     • Generate a TFIDF matrix given the dictionary and abstracts
     • TFIDF matrix is huge (83435 by 561769)
     • Reduce the TFIDF matrix for computational efficiency
     • Remove dictionary words with zero counts and abstracts with no dictionary words
     • Use SVD to represent the data in a smaller sub-space of the original matrix
     • Singular values decrease quickly. We used the first 100 eigenvectors without losing much precision.
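As a rough sketch of the processing steps above, the following builds a small TFIDF matrix and projects it onto its leading singular directions with NumPy. The exact TFIDF weighting used in the slides is not stated, so the variant here (relative term frequency times log inverse document frequency) is an assumption, and the function names `tfidf` and `project` are hypothetical.

```python
import numpy as np


def tfidf(counts):
    """Build a TFIDF matrix from a terms-by-documents array of raw counts.

    The weighting (relative term frequency times log(N / df)) is an assumed
    variant; the slides do not say which one was used.
    """
    counts = np.asarray(counts, dtype=float)
    # Drop terms that never occur and documents with no dictionary terms.
    counts = counts[counts.sum(axis=1) > 0]
    counts = counts[:, counts.sum(axis=0) > 0]
    n_docs = counts.shape[1]
    tf = counts / counts.sum(axis=0, keepdims=True)
    df = (counts > 0).sum(axis=1, keepdims=True)
    return tf * np.log(n_docs / df)


def project(matrix, k):
    """Return k-dimensional document coordinates and all singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each column of the result describes one document in the top-k sub-space.
    return s[:k, None] * vt[:k], s


# Toy counts: 4 dictionary terms (rows) by 3 abstracts (columns).
raw_counts = [[2, 0, 1],
              [0, 3, 0],
              [0, 0, 0],   # a term that never occurs; it is dropped
              [1, 1, 0]]
weights = tfidf(raw_counts)
reduced, singular_values = project(weights, k=2)
print(reduced.shape, singular_values)
```

A dense SVD like this is only practical for small matrices. For a sparse matrix of the size quoted in the slides, a truncated sparse solver such as `scipy.sparse.linalg.svds` would be a more plausible choice, although the slides do not say which tool was used.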

  6. Distribution of Singular Values

  7. Effect of Using Eigen Sub-space
     • Tested performance on a smaller data set (400).
     • Performance with the raw TFIDF matrix is similar to performance with the eigen sub-space.

  8. Evaluation
     • For a given test abstract we used kNN search to find the 100 closest abstracts.
     • Used a custom scoring algorithm to pick the grantor that best represents the 100 nearest neighbors found.
     • Tested the entire data set using leave-one-out cross-validation
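The evaluation loop above can be illustrated with a small, self-contained sketch. The custom scoring algorithm is not given in the transcript, so a plain majority vote stands in for it here. Euclidean distance, the toy points, and the grantor labels "A" and "B" are also assumptions made only for illustration.

```python
import math
from collections import Counter


def knn_predict(query, points, labels, k):
    """Return the most common label among the k points nearest to query.

    Majority voting is a stand-in for the custom scoring in the slides.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, k):
    """Hold out each point in turn and predict its label from the rest."""
    hits = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(points[i], rest_points, rest_labels, k) == labels[i]:
            hits += 1
    return hits / len(points)


# Toy 2-D "abstract" coordinates with hypothetical grantor labels.
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 5.0), (5.2, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]
accuracy = leave_one_out_accuracy(points, labels, k=2)
print(accuracy)
```

In the slides the search uses 100 neighbors; `k=2` is used here only because the toy data set has six points.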

  9. Results (1)

  10. Results (2)
