UNC Knowledge Base • 21 Sept 2012 • Ryan Cronk, Ashraf Farrag, John Pu
Discussion Points • Overview of the Knowledge Base • User definition • Proposal • Challenges • Study methodology and timeline
Knowledge Base Project • Custom-built content management tool & search engine • Collect, maintain, and provide content in the form of questions and answers on health informatics topics • “Next-gen” Frequently Asked Questions (FAQ) – help SILS users find questions and answers
End Goal • How can this platform improve the learning experience of new HIT students? • Is it better for HIT searches than general search engines like Google?
Who are the system users? • Students at UNC interested in HIT (CHIP) • BSIS • Certificate students (CIS, PHI, Health Care Informatics) • MSIS/PhD • What are they interested in? (working assumptions) • Coursework • They want an A+ in INLS 523 • Getting jobs • Knowing and understanding industry terminology • Learning more about both efficiently
The Problem for Users • Search engines do not give authoritative or specific information on specialized topics • Designed for generalized Q&A search responses • Additionally, for a specialized topic, finding answers using keywords is not always the best approach • Users may lack the necessary vocabulary • Example: A Google search for “informatics” yields all kinds of results. Are we interested in Bioinformatics? Health Informatics? Environmental Informatics? Laboratory Informatics? Etc…
Proposed Solution • Generating authoritative and accurate information • Providing a search function that goes beyond a basic keyword query approach • Instead of immediately yielding answers, the search uses query expansion to match a term with a set of questions • Ultimately, we want: Curated content – carefully selected and relevant to the domain (in this case, HIT)
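To make the query-expansion idea concrete, here is a minimal sketch in Python. It assumes a hand-built synonym map and a tiny list of curated questions; the names `SYNONYMS`, `QUESTIONS`, `expand`, and `match_questions` are hypothetical and are not taken from the Knowledge Base itself, whose implementation the slides do not describe.

```python
# Hypothetical synonym map: the slides do not specify the real vocabulary.
SYNONYMS = {
    "ehr": {"electronic health record", "emr", "electronic medical record"},
    "hie": {"health information exchange"},
}

# Hypothetical curated questions standing in for Knowledge Base content.
QUESTIONS = [
    "What is an electronic health record?",
    "How does a health information exchange share patient data?",
    "What is the difference between an EMR and an EHR?",
]


def expand(query):
    """Return the lowercased query plus any known synonyms."""
    term = query.strip().lower()
    return {term} | SYNONYMS.get(term, set())


def match_questions(query, questions=QUESTIONS):
    """Return questions that contain the query or any of its expansions."""
    terms = expand(query)
    # Naive substring matching; a real system would tokenize and rank.
    return [q for q in questions if any(t in q.lower() for t in terms)]


if __name__ == "__main__":
    for question in match_questions("EHR"):
        print(question)
```

With this toy data, a plain keyword match on "EHR" would find only the third question, while the expanded query also reaches the first one, which never uses the abbreviation.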
(Working) Research Questions • Does incorporating query expansion improve the functionality of the Knowledge Base for the user? • Test: Query expansion enabled and disabled, etc. (more on this in a bit) • Does the Knowledge Base provide value to users over a traditional search engine? • How do we define value?
General Challenges • Including sources of information • Generating strong questions • System architecture is more complex than what current search engine technology typically supports • Iterative improvements through user testing and evaluation take time
Challenges: Building a Database of Terms • Need an HIT database before we can effectively run the study • A small database can be put together quickly but could introduce bias • A large database takes time but will be comprehensive • Options • HIMSS • Wikipedia • Other ideas: • Mining blogs, webpages, etc.
Challenges: Query expansion for HIT terms • WordNet provides synonyms for general language and doesn’t necessarily make the connections between HIT terms • Need to develop a specialized vocabulary resource • We need the CHIP HIT WordNet!
Options for Testing • Query expansion enabled vs. disabled • Query expansion off vs. query expansion w/ WordNet vs. query expansion w/ HIT WordNet • Knowledge Base search with query expansion (WordNet or HIT WordNet) vs. Google search
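The three-way comparison (expansion off, general vocabulary, HIT vocabulary) could be wired up as a single switch, sketched below. The `Condition` enum, both vocabularies, and their entries are hypothetical placeholders; in particular, treating the HIT condition as the general vocabulary plus domain terms is an assumption, not something the slides state.

```python
from enum import Enum


class Condition(Enum):
    """Hypothetical labels for the three test conditions."""
    OFF = "off"
    GENERAL = "general"  # stands in for general-English WordNet
    HIT = "hit"          # stands in for the proposed HIT WordNet


# Toy vocabularies for illustration only, not real WordNet data.
GENERAL_SYNONYMS = {"record": {"document", "file"}}
HIT_SYNONYMS = {
    "cpoe": {"computerized provider order entry"},
    "record": {"chart"},
}


def expand(term, condition):
    """Expand a term according to the active test condition."""
    term = term.strip().lower()
    expanded = {term}
    if condition in (Condition.GENERAL, Condition.HIT):
        expanded |= GENERAL_SYNONYMS.get(term, set())
    if condition is Condition.HIT:
        # Assumption: the HIT condition layers domain terms on top
        # of the general vocabulary rather than replacing it.
        expanded |= HIT_SYNONYMS.get(term, set())
    return expanded


if __name__ == "__main__":
    for condition in Condition:
        print(condition.value, sorted(expand("CPOE", condition)))
```

In this toy setup the general vocabulary adds nothing for "CPOE", which mirrors the concern that a general-English resource misses HIT-specific connections.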
But wait… • Follies of an engineering mindset… • Is it actually worth designing our own search? • What if we just curate really good content and use a Google custom search? • Most websites that have curated content are indexed by Google anyway, and you can find their content by just using the white page with a search box • Ex. Stackoverflow.com – you can find all the web content just by searching google.com • If you can’t beat ‘em, join ‘em? (or at least borrow their custom search functionality?)
So now what??? • “A general state of uneasiness is the start of wisdom” - Dr Javed Mostafa
Next Steps/Project Timeline • In Parallel over the next month: • Submit IRB and develop study design (Ryan) • Import content for DB from Wikipedia & other sources (Ashraf) • Query expansion for HIT terms (John) • Start writing paper • Recruit Participants • Run Study • Evaluate Results • Paper complete by January 2013
Study Design • Participants randomly assigned to query expansion on or query expansion off • Computer with two screens – one with Qualtrics survey questions, one with the KB • Data collection • Qualtrics Survey • Post-hoc interview • Feedback and comments from the users on certain tasks, etc.
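Random assignment to the two conditions can be sketched as follows. The function name `assign_conditions`, the condition labels, and the participant IDs are hypothetical, and balancing the groups by shuffling and alternating is an assumed design choice; the slides say only that assignment is random.

```python
import random


def assign_conditions(participants, conditions=("query-on", "query-off"), seed=None):
    """Shuffle participants, then alternate conditions so groups stay balanced."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {
        person: conditions[index % len(conditions)]
        for index, person in enumerate(shuffled)
    }


if __name__ == "__main__":
    # Hypothetical participant IDs.
    ids = ["P01", "P02", "P03", "P04", "P05", "P06"]
    for person, condition in sorted(assign_conditions(ids, seed=2012).items()):
        print(person, condition)
```

Alternating after a shuffle keeps group sizes within one of each other, which matters when the participant pool is small.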
Novel Components • Curated HIT database of terms (none exists that we know of) • HIT WordNet: current WordNet is just general English • Whole package – search + query expansion