Linking Open Drug Data • Susie Stephens, Principal Research Scientist, Eli Lilly
The Linked Data Cloud • Source: Chris Bizer
Linking Open Drug Data • HCLSIG task started October 1, 2008 • Primary Objectives • Survey publicly available data sets about drugs • Publish and interlink these data sets on the Web • Explore interesting questions in competitive intelligence that could be answered if the data sets are linked • Participants: Bosse Andersson, Chris Bizer, Kei Cheung, Don Doherty, Oktie Hassanzadeh, Anja Jentzsch, Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Susie Stephens, Jun Zhao
Assessment of Data Sources • Source: Mark Sharp et al., “A Framework for Characterizing Drug Information Sources,” AMIA 2008
Published Data Sets • LinkedCT (http://linkedct.org) • Online registry of more than 60,000 clinical trials • Published in XML • 7,011,000 triples (290,000 interlinking) • DrugBank (http://www4.wiwiss.fu-berlin.de/drugbank) • A repository of almost 5,000 FDA-approved drugs • Published as DrugBank DrugCards • 1,153,000 triples (23,000 interlinking) • DailyMed (http://www4.wiwiss.fu-berlin.de/dailymed/) • High quality information about marketed drugs • Flat file representation • 124,000 triples (29,600 interlinking) • Diseasome (http://www4.wiwiss.fu-berlin.de/diseasome) • Information about 4,300 disorders and disease genes linked by known disorder-gene associations • Published in XML • 88,000 triples (23,000 interlinking)
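The triple counts on this slide also show how densely each data set is interlinked. The sketch below simply takes the reported (already rounded) figures and computes the share of each data set's triples that are interlinking triples. The dictionary and function names are illustrative only and not part of any LODD tooling.

```python
# Triple counts reported on the "Published Data Sets" slide: (total, interlinking).
DATASETS = {
    "LinkedCT": (7_011_000, 290_000),
    "DrugBank": (1_153_000, 23_000),
    "DailyMed": (124_000, 29_600),
    "Diseasome": (88_000, 23_000),
}


def interlink_share(total: int, links: int) -> float:
    """Return the percentage of a data set's triples that are interlinking triples."""
    return 100.0 * links / total


def summarize(datasets: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each data set name to its interlinking share, rounded to one decimal place."""
    return {name: round(interlink_share(total, links), 1)
            for name, (total, links) in datasets.items()}


if __name__ == "__main__":
    for name, share in summarize(DATASETS).items():
        print(f"{name:10s} {share:5.1f}% interlinking")
    all_total = sum(total for total, _ in DATASETS.values())
    all_links = sum(links for _, links in DATASETS.values())
    print(f"All four: {all_links:,} of {all_total:,} triples "
          f"({interlink_share(all_total, all_links):.1f}%)")
```

On these figures, DailyMed and Diseasome are small but densely linked (about a quarter of their triples), while LinkedCT's 290,000 links are only about 4% of its triples; across all four sets, 365,600 of 8,376,000 triples (about 4.4%) are interlinking.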
Classes of Links • Based on common identifiers • Links present in the source data sets • Based on link discovery and record linkage techniques • String matching • E.g., “Alzheimer’s disease” in LinkedCT was matched with “Alzheimer_disease” in Diseasome • Semantic matching • E.g., “Varenicline” has the synonym “Varenicline Tartrate” and the brand names “Champix” and “Chantix”, all of which should be linked to the same drug
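The two link-discovery classes can be illustrated with a minimal sketch. String matching is approximated here by comparing labels after normalization (case, underscores, possessives, punctuation); semantic matching adds a synonym table. The normalization rules and the `SYNONYMS` table are assumptions for illustration, not the rules used by LODD; a real system would draw synonyms and brand names from the source data (for example, DrugBank).

```python
import re
import unicodedata

# Hypothetical synonym table (normalized keys -> canonical name), built from the
# Varenicline example on the slide; a real table would come from the data sets.
SYNONYMS = {
    "varenicline tartrate": "varenicline",
    "champix": "varenicline",
    "chantix": "varenicline",
}


def normalize(label: str) -> str:
    """Reduce a label to a comparison key for simple string matching."""
    text = unicodedata.normalize("NFKC", label).replace("\u2019", "'")
    text = text.lower().replace("_", " ")
    text = re.sub(r"'s\b", "", text)          # drop possessive: "alzheimer's" -> "alzheimer"
    text = re.sub(r"[^a-z0-9 ]+", " ", text)  # keep only ASCII letters, digits, spaces
    return " ".join(text.split())             # collapse runs of whitespace


def string_match(a: str, b: str) -> bool:
    """String matching: the labels are identical after normalization."""
    return normalize(a) == normalize(b)


def semantic_match(a: str, b: str) -> bool:
    """Semantic matching: the labels resolve to the same canonical name."""
    ka, kb = normalize(a), normalize(b)
    return SYNONYMS.get(ka, ka) == SYNONYMS.get(kb, kb)


if __name__ == "__main__":
    print(string_match("Alzheimer\u2019s disease", "Alzheimer_disease"))  # True
    print(string_match("Chantix", "Varenicline"))                         # False
    print(semantic_match("Chantix", "Varenicline"))                       # True
```

The example shows why both classes are needed: normalization alone links the two Alzheimer's labels but cannot relate a brand name such as “Chantix” to “Varenicline”, which requires synonym knowledge.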
Business Use Case • A neuroscience-focused business manager wants an update on new clinical trials by competitors on Alzheimer’s disease (AD) • A phase III trial by Pfizer for a drug called Varenicline has just been listed in LinkedCT • Further information of interest is found in DBpedia, DailyMed, and DrugBank • DailyMed indicates that the drug is already on the market for nicotine addiction and has minimal side effects • DrugBank allows the manager to see the targets of Varenicline • Diseasome, however, indicates that the corresponding genes are implicated only in nicotine addiction, not in AD • This suggests a more complex relationship between the diseases than the drug target alone • Extending the browsing to the SWAN Knowledgebase shows hypotheses relating AD to nicotine receptors through amyloid beta
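The manager's browsing path amounts to following links from a trial to its drug, from the drug to its targets and their genes, and from the genes to associated disorders. The sketch below models that path over a tiny in-memory set of triples. Every identifier and predicate (`trial:T1`, `target:X`, `gene:G`, `ct:condition`, and so on) is a hypothetical placeholder, not a real LODD URI or vocabulary term; against the published data this would more likely be a SPARQL query over the linked data sets.

```python
# Toy triple store of (subject, predicate, object). All identifiers and predicates
# are hypothetical placeholders standing in for LinkedCT, DailyMed, DrugBank and
# Diseasome resources.
TRIPLES = {
    ("trial:T1", "ct:condition", "Alzheimer's disease"),
    ("trial:T1", "ct:sponsor", "Pfizer"),
    ("trial:T1", "ct:intervention", "drug:Varenicline"),
    ("drug:Varenicline", "dailymed:indication", "Nicotine addiction"),
    ("drug:Varenicline", "drugbank:target", "target:X"),
    ("target:X", "drugbank:encodedBy", "gene:G"),
    ("gene:G", "diseasome:associatedDisorder", "Nicotine addiction"),
}


def objects(subject: str, predicate: str) -> set[str]:
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def trace(trial: str) -> dict[str, set[str]]:
    """Follow links from a trial to its drugs, their indications, and the disorders
    associated with the genes behind each drug's targets."""
    result: dict[str, set[str]] = {
        "conditions": objects(trial, "ct:condition"),
        "drugs": set(),
        "indications": set(),
        "gene_disorders": set(),
    }
    for drug in objects(trial, "ct:intervention"):
        result["drugs"].add(drug)
        result["indications"] |= objects(drug, "dailymed:indication")
        for target in objects(drug, "drugbank:target"):
            for gene in objects(target, "drugbank:encodedBy"):
                result["gene_disorders"] |= objects(gene, "diseasome:associatedDisorder")
    return result


if __name__ == "__main__":
    report = trace("trial:T1")
    # No overlap between the trial's condition and the target genes' disorders is
    # the gap that prompts the manager to look further (e.g. at SWAN hypotheses).
    print(report["conditions"] & report["gene_disorders"] or "no direct gene link")
```

The empty intersection mirrors the slide's observation that the drug's target genes are linked only to nicotine addiction, which is what sends the manager on to the SWAN Knowledgebase.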
Technical Challenges • Life sciences data is difficult to connect because of inconsistent terminology and the prevalence of synonyms and homonyms • Refinement of tools and techniques to enable more automatic linking of entities across data sets • Selection of ontologies to enable consistent mappings • Development of a platform robust enough to support inferencing • Provision of a user interface that supports browsing, querying, and filtering data • Persuading data providers to publish in RDF, which would remove the need for us to update the data and would provide some of the interlinking
Next Steps • Ensure that existing data are accurately and comprehensively linked • Incorporate into the LODD cloud additional data sources of interest to competitive intelligence (e.g., Traditional Chinese Medicine) • Use novel link discovery tools and frameworks, including Silk and LinQuer • Explore using SIOC to aggregate information about what patients are saying about drugs • Submit paper to the iTriplify Challenge
Task Alignment • LODD is looking to use Pharma Ontology’s work to help inform the mappings • Data converted to RDF is also loaded into BioRDF’s HCLS KB
Conclusions • Added four drug-related data sets to the cloud for competitive intelligence • Will add further data sources to the LODD cloud so that more insights can be gleaned • Will continue to explore and test tools that are being developed for LOD