Where Do You Go for Biomedical Funding? Yi Liu, Ahmet Altay
Background
• Problem
  • In biomedical research there are many sources of federal funding.
  • How do you choose the right institution to fund a given research idea?
• Data
  • Biomedical grant summaries from 20 institutions, covering 1972 to 2009
Pre-Processing
• Clean the texts of mark-up, meta words, and duplicates
• Remove institutions with fewer than 5000 grants
• Bag-of-words approach with a pre-determined dictionary
• Removed 319 stop words from the text
• Used stemming (Porter) to further collapse the text
• Dictionary size of 83485 with 120636 distinct spellings
• Use mgrep to annotate our data with dictionary words
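The pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: `STOP_WORDS` and `DICTIONARY` are tiny hypothetical stand-ins for the 319-word stop list and the 83485-entry dictionary, `crude_stem` is a toy suffix stripper standing in for the Porter stemmer, and a plain set lookup stands in for the mgrep annotation step.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the real stop list and dictionary are not
# given in the slides, so tiny examples are used here.
STOP_WORDS = {"the", "of", "in", "and", "a", "to", "for"}
DICTIONARY = {"protein", "cell", "tumor", "gene", "express"}


def crude_stem(word):
    """Toy suffix stripper. NOT the Porter algorithm the slides mention;
    it only shows where stemming sits in the pipeline."""
    for suffix in ("ions", "ion", "ing", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "express" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text):
    """Lowercase, tokenize, drop stop words, stem, keep dictionary terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = (crude_stem(t) for t in tokens if t not in STOP_WORDS)
    return Counter(s for s in stems if s in DICTIONARY)


counts = bag_of_words("Expression of tumor genes in the cells and proteins")
print(counts)
```

Each abstract becomes a term-count mapping restricted to dictionary words, which is the input a TFIDF matrix is built from.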
Processing
• Generate a TFIDF matrix from the dictionary and abstracts
• The TFIDF matrix is huge (83435 by 561769)
• Reduce the TFIDF matrix for computational efficiency
• Remove zero-count dictionary terms and empty abstracts
• Use SVD to represent the data in a smaller sub-space of the original matrix
• Singular values decrease quickly. We used the first 100 eigenvectors without losing much precision.
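A small pure-Python sketch of these processing steps follows. Several things here are assumptions rather than details from the slides: the TFIDF weighting is the common "raw count times log(N / df)" variant, the function names are hypothetical, and a power iteration with deflation stands in for a real SVD routine (at the scale quoted above, a sparse SVD library would be needed instead).

```python
import math
import random


def tfidf_matrix(doc_counts, vocab):
    """Terms-by-documents matrix with weight count * log(N / df)."""
    n_docs = len(doc_counts)
    rows = []
    for term in vocab:
        df = sum(1 for counts in doc_counts if counts.get(term, 0) > 0)
        idf = math.log(n_docs / df) if df else 0.0
        rows.append([counts.get(term, 0) * idf for counts in doc_counts])
    return rows


def drop_empty(matrix):
    """Remove all-zero rows (terms) and all-zero columns (abstracts)."""
    row_ids = [i for i, row in enumerate(matrix) if any(row)]
    col_ids = [j for j in range(len(matrix[0]))
               if any(matrix[i][j] for i in row_ids)]
    reduced = [[matrix[i][j] for j in col_ids] for i in row_ids]
    return reduced, row_ids, col_ids


def top_singular(matrix, k, iters=500, seed=0):
    """Approximate the k largest singular values and left singular
    vectors by power iteration with deflation on A * A^T."""
    rng = random.Random(seed)
    m = len(matrix)
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in matrix]
            for ri in matrix]
    values, vectors = [], []
    for _component in range(min(k, m)):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        lam = 0.0
        for _step in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            lam = math.sqrt(sum(x * x for x in w))
            if lam < 1e-12:
                break
            v = [x / lam for x in w]
        if lam < 1e-12:
            break  # remaining singular values are numerically zero
        values.append(math.sqrt(lam))
        vectors.append(v)
        for i in range(m):  # deflate: remove the component just found
            for j in range(m):
                gram[i][j] -= lam * v[i] * v[j]
    return values, vectors


def project(matrix, vectors):
    """Coordinates of each document (column) in the reduced sub-space."""
    n_docs = len(matrix[0])
    return [[sum(u[i] * matrix[i][j] for i in range(len(matrix)))
             for u in vectors] for j in range(n_docs)]


vocab = ["cell", "gene", "tumor"]
doc_counts = [{"cell": 2, "gene": 1}, {"gene": 1, "tumor": 3},
              {"cell": 1, "gene": 1}]
tfidf = tfidf_matrix(doc_counts, vocab)
reduced, kept_terms, kept_docs = drop_empty(tfidf)
values, vectors = top_singular(reduced, k=1)
coords = project(reduced, vectors)
```

In this toy input "gene" occurs in every abstract, so its IDF is zero and its row is dropped before the decomposition. Keeping only the leading singular vectors is reasonable when the singular values fall off quickly, as the slide reports.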
Effect of Using Eigen Sub-space
• Tested performance on a smaller data set (400).
• Performance of raw TFIDF is similar to that of the eigen sub-space.
Evaluation
• For a given test abstract, we used kNN search to find the 100 closest abstracts.
• Used a custom scoring algorithm to pick the grantor that best represents the 100 nearest neighbors found
• Tested the entire data set using leave-one-out cross-validation
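The evaluation loop can be sketched as follows. The slides do not specify the custom scoring algorithm or the distance measure, so this example assumes cosine similarity and swaps in a similarity-weighted vote among the neighbors. The function names, the toy vectors, and the labels `inst_A` and `inst_B` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_grantor(query, vectors, labels, k, skip=None):
    """Find the k nearest abstracts and return the grantor with the
    largest summed similarity (an assumed stand-in for the custom
    scoring algorithm, which the slides do not describe)."""
    scored = sorted(
        ((cosine(query, vec), labels[i])
         for i, vec in enumerate(vectors) if i != skip),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, label in scored:
        votes[label] += similarity
    return max(votes, key=votes.get)


def leave_one_out_accuracy(vectors, labels, k):
    """Hold out each abstract in turn and predict it from the rest."""
    hits = sum(
        predict_grantor(vec, vectors, labels, k, skip=i) == labels[i]
        for i, vec in enumerate(vectors)
    )
    return hits / len(vectors)


# Toy two-dimensional "abstract" vectors; the deck used k = 100.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3],
           [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
labels = ["inst_A", "inst_A", "inst_A", "inst_B", "inst_B", "inst_B"]
accuracy = leave_one_out_accuracy(vectors, labels, k=2)
print(accuracy)
```

The `skip` argument excludes the held-out abstract from its own neighbor list, which is what makes the estimate leave-one-out rather than a resubstitution score.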