Chemical name interpretations & molecular timelines. This shows the detailed record view with molecular links. This shows the chemicals report with molecular timeline and mouse-over of chemical names. Exploring co-table analysis of molecules with Gene IDs.
Chemical name interpretations & molecular timelines
This shows the chemicals report with molecular timeline and mouse-over of chemical names.
Exploring co-table analysis of molecules with Gene IDs. For example: show me all of the co-occurrences of these (x) molecules with these (any / all) genes.
1 From the main menu, select the Analyze tab.
2 Enter the InChI keys for the molecules of interest.
3 Alternatively, click to enter a sample (test) set of molecules.
4 These InChI keys are the molecules of interest. Select the patent field to explore patents.
5 Set the facet to patent field + Gene (Molecules, Facet = Patents + Genes), then click Analyze.
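Before a pasted list of InChI keys is sent for analysis, it can help to check that each entry has the standard InChIKey shape (14 letters, hyphen, 10 letters, hyphen, 1 letter). The following is a minimal sketch of such a check, not part of the tool shown in these slides; the function name `clean_inchikeys` is hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def clean_inchikeys(raw_text):
    """Split pasted text into keys and return (valid, rejected) lists.

    Keys are upper-cased and de-duplicated while preserving input order.
    This checks only the shape of the key, not whether the molecule exists.
    """
    valid, rejected, seen = [], [], set()
    for token in re.split(r"[\s,;]+", raw_text.strip()):
        if not token:
            continue
        key = token.upper()
        if key in seen:
            continue
        seen.add(key)
        if INCHIKEY_RE.fullmatch(key):
            valid.append(key)
        else:
            rejected.append(key)
    return valid, rejected


if __name__ == "__main__":
    # The first entry is the InChIKey of aspirin; the second is deliberately malformed.
    pasted = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N\nNOT-A-KEY"
    print(clean_inchikeys(pasted))
```

Rejected entries are returned rather than silently dropped, so a user can see which lines of the pasted set need correcting.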
This shows the co-table results: co-occurrences of molecules with NCBI Gene IDs. The labels shown are the NCBI Gene ID numbers. The charts can be transposed, or the data exported, from this view.
This shows the transposed chart of co-occurrences of molecules with NCBI Gene IDs. Click a cell to see the patents containing that molecule and that particular gene.
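The co-table idea can be sketched in a few lines: count, for every (molecule, gene) pair, the number of documents that mention both, then lay the counts out as a matrix that can be transposed. This is an illustrative sketch only, assuming each patent has already been annotated with molecule keys and NCBI Gene IDs; the record layout, the function names, and the `MOL_A` / `MOL_B` placeholders are hypothetical and do not describe the system's actual data model.

```python
from collections import Counter


def build_cotable(documents):
    """Count the documents in which each (molecule, gene) pair co-occurs."""
    counts = Counter()
    for doc in documents:
        # Use sets so a pair is counted once per document.
        for mol in set(doc["molecules"]):
            for gene in set(doc["genes"]):
                counts[(mol, gene)] += 1
    return counts


def as_matrix(counts, transpose=False):
    """Return (row_labels, column_labels, rows) for display or export."""
    mols = sorted({m for m, _ in counts})
    genes = sorted({g for _, g in counts})
    if transpose:
        return genes, mols, [[counts[(m, g)] for m in mols] for g in genes]
    return mols, genes, [[counts[(m, g)] for g in genes] for m in mols]


def docs_for(documents, mol, gene):
    """Drill-down: ids of the documents behind one cell of the co-table."""
    return [d["id"] for d in documents
            if mol in d["molecules"] and gene in d["genes"]]


if __name__ == "__main__":
    # Hypothetical annotated patents; molecule names are placeholders.
    patents = [
        {"id": "P1", "molecules": ["MOL_A", "MOL_B"], "genes": ["1956"]},
        {"id": "P2", "molecules": ["MOL_A"], "genes": ["1956", "7157"]},
        {"id": "P3", "molecules": ["MOL_B"], "genes": ["672"]},
    ]
    table = build_cotable(patents)
    print(as_matrix(table))
    print(as_matrix(table, transpose=True))
    print(docs_for(patents, "MOL_A", "1956"))
```

Keeping the counts in a single `Counter` keyed by pair means the transposed view is only a different layout of the same data, and the drill-down can be answered from the original records.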
Co-table Analysis. For example: show me all documents where Imitrex was mentioned with any sign and/or symptom (note: these are terms such as headache, vomiting, nausea, etc.; there are > 680 of them).
1 Draw a compound of interest.
2 Click to view the compound in the co-table.
3 Select a MeSH category for co-occurrence analysis.
4 Click Analyze.
This shows the number of documents that contained the source molecule and ANY of the MeSH C23 terms. Click on the numbers to link to the documents.
Type in a new MeSH code to change the analysis from 'signs & symptoms' (C23) to diseases (C01).
This shows the number of documents that contained the source molecule and ANY of the MeSH disease (C01) terms.
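Switching the analysis from C23 to C01 amounts to changing a prefix filter on MeSH tree numbers. The sketch below illustrates that idea under stated assumptions: each document carries a list of (term, tree number) pairs, the function name `mesh_cooccurrence` is hypothetical, and the tree numbers in the sample data are made-up placeholders under the C23 and C01 branches, not real MeSH entries.

```python
from collections import Counter


def mesh_cooccurrence(documents, molecule, tree_prefix):
    """For one source molecule, count documents per MeSH term whose
    tree number falls under tree_prefix (e.g. "C23" or "C01")."""
    counts = Counter()
    for doc in documents:
        if molecule not in doc["molecules"]:
            continue
        # A set ensures each term is counted once per document.
        terms = {
            name
            for name, tree in doc["mesh"]
            if tree == tree_prefix or tree.startswith(tree_prefix + ".")
        }
        counts.update(terms)
    return counts


if __name__ == "__main__":
    # Hypothetical annotated abstracts; tree numbers are placeholders.
    abstracts = [
        {"molecules": ["imitrex"],
         "mesh": [("Headache", "C23.001"), ("Nausea", "C23.002")]},
        {"molecules": ["imitrex"],
         "mesh": [("Headache", "C23.001"), ("Example infection", "C01.001")]},
        {"molecules": ["other drug"],
         "mesh": [("Nausea", "C23.002")]},
    ]
    print(mesh_cooccurrence(abstracts, "imitrex", "C23"))
    print(mesh_cooccurrence(abstracts, "imitrex", "C01"))
```

Comparing two drugs, as on the following slides, would then be a matter of calling the same function once per drug with the same prefix and placing the two result sets side by side.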
This shows the comparison of 2 drugs and the co-occurrence of MeSH Symptoms (C23) terms.
This shows the comparison of different statins and the co-occurrence of MeSH terms. Chemical Structures vs. Signs and Symptoms: Medline co-occurrence of statin structures vs. MeSH –
Chemical Search using ChemAxon w/ DB2 Search Proximal Search Nearest Neighbor Search
Clustering Claims Originality BioTerm Analysis Discovery
Landscape Analysis Visualization Networks
IBM’s Massively Parallel Probabilistic Architecture
[Diagram. Stage labels: Question; Question/Topic Analysis; Query Decomposition; Hypothesis Generation (Primary Search, Candidate Answer Generation); Soft Filtering; Hypothesis & Evidence Scoring (Evidence Retrieval, Supporting Evidence Retrieval, Deep Evidence Scoring, Answer Scoring); Synthesis; Final Merging & Ranking; Trained Models; Answer, Confidence. Inputs: A. Sources, E. Sources.]
Watson generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.
Source: J Kreulen
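The generate, filter, score and rank flow named in the diagram can be illustrated with a toy example. This is emphatically not Watson's implementation: the word-overlap search, the two scorers, the fixed `WEIGHTS` (standing in for trained models) and all function names are assumptions chosen only to make the sequence of stages concrete.

```python
from collections import defaultdict

# Assumed fixed weights; in a real system these would come from trained models.
WEIGHTS = {"overlap": 0.7, "trust": 0.3}


def tokens(text):
    return set(text.lower().split())


def generate_hypotheses(question, passages):
    """Primary search: any passage sharing a word with the question
    proposes its answer as a candidate."""
    q = tokens(question)
    return [{"answer": p["answer"], "passage": p}
            for p in passages if q & tokens(p["text"])]


def soft_filter(question, hypotheses, min_overlap=2):
    """Cheap pruning before the more expensive scoring step."""
    q = tokens(question)
    return [h for h in hypotheses
            if len(q & tokens(h["passage"]["text"])) >= min_overlap]


def score_evidence(question, hypothesis):
    """Two toy evidence scorers: word overlap and source trust."""
    q = tokens(question)
    overlap = len(q & tokens(hypothesis["passage"]["text"])) / len(q)
    return {"overlap": overlap, "trust": hypothesis["passage"]["trust"]}


def merge_and_rank(question, hypotheses):
    """Merge evidence per answer, then normalise into confidences."""
    totals = defaultdict(float)
    for h in hypotheses:
        scores = score_evidence(question, h)
        totals[h["answer"]] += sum(WEIGHTS[k] * v for k, v in scores.items())
    norm = sum(totals.values())
    return sorted(((a, t / norm) for a, t in totals.items()),
                  key=lambda pair: -pair[1])


def answer(question, passages):
    kept = soft_filter(question, generate_hypotheses(question, passages))
    return merge_and_rank(question, kept) if kept else []


if __name__ == "__main__":
    # Hypothetical knowledge source with per-passage trust values.
    passages = [
        {"text": "imitrex is a brand name of sumatriptan",
         "answer": "sumatriptan", "trust": 0.9},
        {"text": "sumatriptan is the drug sold as imitrex",
         "answer": "sumatriptan", "trust": 0.8},
        {"text": "a brand name is a trademark",
         "answer": "trademark", "trust": 0.5},
    ]
    for candidate, confidence in answer(
            "which drug is sold under the brand name imitrex", passages):
        print(candidate, round(confidence, 3))
```

The point of the sketch is the shape of the pipeline: many candidates are generated cheaply, most are discarded early, and the survivors are ranked by combining several independent evidence scores into one confidence.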
Technical issues to consider when applying QA systems like Watson
Nature of domain (open vs. closed): A closed domain implies that all knowledge is contained within a specific domain characterized by ontologies, and there is no need to go outside the domain. Jeopardy is an open-domain example, where the questions draw on general knowledge.
Knowledge/data sources (availability): QA systems are natural language search engines; Watson goes beyond NL search. If knowledge sources are incomplete, unavailable, insufficient or inadequate, the system cannot provide an answer. In some cases one would need to envisage interactive QA that requires human interaction to guide the search. Another very important consideration is the availability of sufficient sample data for training (i.e. a training corpus).
Need for multi-modality: Is there a need for transcription from speech to text before a question is answered? This would require integration of speech-to-text capabilities that are not really ready for real-time applications.
Latency: Watson is capable of processing 500 GB of information per second with a 3 sec response to questions, and it held most of its knowledge sources in memory (as opposed to on disk) for speed. What is the latency requirement for the application?
Multi-lingual or cross-lingual support: Watson can support only English at this time; with language-specific parsers, other languages can be supported. If knowledge sources or QA are required in multiple languages, the application would not be a good candidate. Additionally, if cultural context has to be accommodated in the answer, it would not be prudent to deploy QA systems that interact directly with users.
Question type: Decomposition and classification of the question is critical to how QA systems work. The bulk of the question types in Jeopardy were factoid questions. Watson did not include 2 question categories: audio/video questions that require looking at a video to answer, and questions that require special instructions (e.g. verbal instructions to explain a question).
Answer types: Watson is not designed to curate a task-oriented system. It can handle temporal and geo-spatial reasoning in its answers. As it stands, it cannot handle business-process reasoning (e.g. to do task B, tasks A and C must first be completed).
Software stack: DeepQA Application (Java/C++); Apache Hadoop + Apache UIMA; SUSE Linux Enterprise Server 11
Watson Infrastructure
• 90 Power 750 Servers
• Each server: 3.5 GHz POWER7 8-core processor with 4 threads/core
• Total: 2880 POWER7 cores with 16 TB RAM
• Processing speed: 500 GB/sec; 80 TeraFLOPS
• 94th on Top 500 Supercomputers
• Note: This hardware is for Jeopardy. Any other application of Watson will require appropriate sizing and optimization for purpose.
I would like to acknowledge the IBM Almaden Research team: Jeff Kreulen, Ying Chen, Scott Spangler, Alfredo Alba, Tom Griffin, Eric Louie, Su Yan, Issic Cheng, Prasad Ramachandran, Bin He, Ana Lelescu, Qi He, Linda Kato, Brad Wade, John Colino, Meenakshi Nagarajan, Timothy J Bethea, German Attanasio, Laura Anderson, Robert Prill, plus a host of folks from IBM China Labs.
Challenges ahead
• Access to full text
• Language issues
  • Chinese
  • Japanese
  • Korean
  • Other
• Legal issues
• Web data
• Integration with medical content
Attempts to process Chinese patent documents: extracting chemical structures from Chinese patents… Chemicals from Chinese patents
Computer Curation Process Overview & integration with our collaborators
[Diagram: Services Hosted at IBM Almaden. Data Sources: U.S. Patents (1976-2009), U.S. Pre-Grants (All), PCT & EPO Apps, Medline Abstracts (>18 M), Selected Internet Content, In-House Content. Other labeled components: Parse & Extract data; Annotation Factory (Annotator 1, Annotator 2); Computational Analytics; Classifier & Other Data Associations; ADU*; Database + computed Meta Data; IP Database (e.g. DB2); ChemVerse db (Semantic Associations); ChemVerse; ChemAxon Search; SIMPLE; BIW; Knime or Pipeline Pilot; Cognos/DDQB/Other Apps; User Applications; View selected Documents & Reports.]
* ADU = Automated Data Update