Mining the Web: Discovering New Biomedical Knowledge Aly Khan
The Human Genome Project • Goal: sequence the human genome • Completed in 2003 • Public effort led by the National Institutes of Health and the U.S. Department of Energy, with a parallel private effort by Celera Genomics • ~25,000 genes
25,000 Genes • What do they do? • How do they interact?
Finding context • Use vast amounts of published works to find novel relationships between genes • 17,000,000 records from more than 5,000 biomedical journals
On searching • The biomedical literature is effectively unbounded • Knowledge is locked in unstructured text in biomedical publications
Applications • NLP: parse text for matches using POS tags • [Query noun phrase term] “is a” [noun phrase class], e.g. “hiv is a virus” • [Noun phrase class] “such as” [query noun phrase term], e.g. “genes such as 4fgf”
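The two patterns on this slide can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: it assumes a single word-like token stands in for a noun phrase, where a real system would use a POS tagger and a noun-phrase chunker. The names `NP`, `IS_A`, `SUCH_AS` and `extract_is_a` are hypothetical.

```python
import re

# Assumption: one word-like token approximates a noun phrase.
# A real pipeline would chunk noun phrases from POS tags instead.
NP = r"[A-Za-z0-9][\w-]*"

# [term] "is a" [class], e.g. "hiv is a virus"
IS_A = re.compile(rf"\b(?P<term>{NP}) is an? (?P<cls>{NP})\b", re.IGNORECASE)

# [class] "such as" [term], e.g. "genes such as 4fgf"
SUCH_AS = re.compile(rf"\b(?P<cls>{NP}) such as (?P<term>{NP})\b", re.IGNORECASE)


def extract_is_a(text):
    """Return (term, class) pairs matched by either pattern."""
    pairs = []
    for pattern in (IS_A, SUCH_AS):
        for match in pattern.finditer(text):
            pairs.append((match.group("term"), match.group("cls")))
    return pairs


print(extract_is_a("hiv is a virus"))
print(extract_is_a("genes such as 4fgf"))
```

Both patterns return the pair in the same (term, class) order, so downstream code does not need to know which pattern fired.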
Applications • “The results demonstrated that KaiC interacts rhythmically with KaiA, KaiB, and SasA.” (Ozgur et al.) • Path1: KaiC – nsubj – interacts – obj – SasA • Path2: KaiC – nsubj – interacts – obj – SasA – conj_and – KaiA • Path3: KaiC – nsubj – interacts – obj – SasA – conj_and – KaiB • Path4: SasA – conj_and – KaiA • Path5: SasA – conj_and – KaiB • Path6: KaiA – prep_with – SasA – conj_and – KaiB
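Paths like these can be read off a dependency graph with a breadth-first search. The sketch below assumes the parse is already available and hand-codes only the edges implied by Path1 to Path5, with the labels copied as they appear on the slide; no parser is invoked. `EDGES`, `build_graph` and `dependency_path` are hypothetical names.

```python
from collections import deque

# Assumption: (head, label, dependent) edges entered by hand from Path1-Path5.
EDGES = [
    ("interacts", "nsubj", "KaiC"),
    ("interacts", "obj", "SasA"),
    ("SasA", "conj_and", "KaiA"),
    ("SasA", "conj_and", "KaiB"),
]


def build_graph(edges):
    """Build an undirected adjacency list keyed by word."""
    graph = {}
    for head, label, dep in edges:
        graph.setdefault(head, []).append((label, dep))
        graph.setdefault(dep, []).append((label, head))
    return graph


def dependency_path(graph, start, goal):
    """Return the shortest word/label path from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths alternate word, label, word, ...
        if node == goal:
            return " - ".join(path)
        for label, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [label, neighbour])
    return None


graph = build_graph(EDGES)
print(dependency_path(graph, "KaiC", "SasA"))
print(dependency_path(graph, "KaiC", "KaiA"))
```

Treating the edges as undirected is what lets a path run between two dependents of the same word, as in Path4 and Path5.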
Contextual representation • PTEN is transcriptionally regulated by transcription factors such as p53 and Egr-1. • In response to DNA damage, the cell-cycle checkpoint kinase CHEK2 can be activated by ATM kinase to phosphorylate p53 and BRCA1, which are involved in cell-cycle control and apoptosis.
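One simple way to turn sentences like these into candidate gene relationships is sentence-level co-occurrence counting. The sketch below is an assumption for illustration, not the representation used in the talk: it uses a small hand-written lexicon (`GENES`) where a real system would use a gene-name recognizer, and it treats every pair of genes mentioned in the same sentence as a candidate link.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a tiny hand-written lexicon stands in for gene-name recognition.
GENES = ["PTEN", "p53", "Egr-1", "CHEK2", "ATM", "BRCA1"]

SENTENCES = [
    "PTEN is transcriptionally regulated by transcription factors "
    "such as p53 and Egr-1.",
    "In response to DNA damage, the cell-cycle checkpoint kinase CHEK2 can be "
    "activated by ATM kinase to phosphorylate p53 and BRCA1, which are "
    "involved in cell-cycle control and apoptosis.",
]


def mentioned_genes(sentence, lexicon=GENES):
    """Return lexicon entries that occur as whole tokens in the sentence."""
    found = []
    for gene in lexicon:
        # Reject matches embedded in longer words or hyphenated names.
        pattern = r"(?<![\w-])" + re.escape(gene) + r"(?![\w-])"
        if re.search(pattern, sentence):
            found.append(gene)
    return found


def cooccurrence_counts(sentences, lexicon=GENES):
    """Count how often each unordered gene pair shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        genes = sorted(mentioned_genes(sentence, lexicon))
        counts.update(combinations(genes, 2))
    return counts


for pair, n in cooccurrence_counts(SENTENCES).items():
    print(pair, n)
```

Co-occurrence only says that two genes were mentioned together; it does not capture the direction or type of the relationship, which is where the pattern-based and dependency-based approaches come in.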
Goals • Creating a global ontology for genes, diseases, etc. • Automated discovery of relationships.