
Ontology-based Annotation & Query of TMA data

Ontology-based Annotation & Query of TMA data. Nigam Shah Stanford Medical Informatics (nigam@stanford.edu). Tissue Microarrays. www.nature.com/clinicalpractice/onc. Stanford tissue microarray database. http://tma.stanford.edu/tma_portal/. Key analysis issue.


Presentation Transcript


  1. Ontology-based Annotation & Query of TMA data Nigam Shah Stanford Medical Informatics (nigam@stanford.edu)

  2. Tissue Microarrays www.nature.com/clinicalpractice/onc

  3. Stanford tissue microarray database http://tma.stanford.edu/tma_portal/

  4. Key analysis issue • Tissue microarrays query a large number of samples/patients for one protein. • The key query dimension in TMA data is the tissue sample. • Because TMAD lacks a commonly used ontology to describe the diagnosis [or annotations] for a given TMA sample, it is not easy to perform such a query.

  5. Ontologies considered • The NCI Thesaurus, version 05.09g • The SNOMED-CT, from UMLS 2005 AA

  6. Available annotations for a block • Each donor block in the TMA has semi-structured text associated with it.

  7. Map text to ontology terms • Make all possible permutations • Rules to weed out bad permutations • Check for an exact match with NCI and SNOMED-CT terms (and/or synonyms) • Rules to weed out bad matches • Example: the word set {Prostate, Carcinoma, Adeno, intraductal} gives 24 permutations ("Prostate Carcinoma Adeno intraductal", "Carcinoma Prostate intraductal Adeno", "Adeno Carcinoma intraductal Prostate", "Prostate intraductal Adeno Carcinoma", …); matched term: Prostate_Ductal_Adenocarcinoma
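The permutation-and-exact-match idea on this slide could be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `LEXICON` entries, the `normalize` rule, and the function names are all hypothetical, and the slide's rules for weeding out bad permutations and bad matches are not modelled because the transcript does not spell them out.

```python
from itertools import permutations

# Hypothetical toy lexicon: normalized label or synonym -> ontology term.
# Real entries would come from the NCI Thesaurus or SNOMED-CT; these are made up.
LEXICON = {
    "prostate ductal adenocarcinoma": "Prostate_Ductal_Adenocarcinoma",
    "prostate intraductal adeno carcinoma": "Prostate_Ductal_Adenocarcinoma",
}


def normalize(text):
    """Lower-case, turn underscores into spaces, collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def candidate_strings(words):
    """Return every ordering of the word set as one normalized string."""
    return [normalize(" ".join(p)) for p in permutations(words)]


def map_term_set(words, lexicon=LEXICON):
    """Return the ontology terms whose label or synonym exactly matches a permutation."""
    matches = set()
    for candidate in candidate_strings(words):
        if candidate in lexicon:
            matches.add(lexicon[candidate])
    return matches


words = ["Prostate", "Carcinoma", "Adeno", "intraductal"]
print(len(candidate_strings(words)))  # 4! orderings of four words
print(map_term_set(words))
```

The number of orderings grows factorially with the number of words, which is presumably one reason the slide lists rules for discarding bad permutations before matching.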

  8. Sample matches (from NCI-T)

  9. Results and validation • Mapped the term-sets for 8495 records, which correspond to 783 distinct term-sets. • 577 term-sets (6614 records) matched to the NCI Thesaurus • 365 term-sets (3465 records) matched to SNOMED-CT • In total, 6871 records (about 80% of the annotated records in TMAD; 641 distinct term-sets) were mapped to one or more ontology terms.

  10. Browsing interface

  11. • Parent and sibling nodes with data (burlywood) • Child nodes with no data (grey) • Child nodes with data (yellow)

  12. Click on the “anchor” link to get data

  13. Updates since February

  14. How does ontology-based annotation help? • Better search: we can retrieve samples of, for example, all retroperitoneal tumors or all malignant uterine neoplasms. • Better integration of data: we can correlate gene expression with protein expression across multiple tumor types, using tissue microarray data from TMAD and gene expression data from GEO.
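The "better search" point amounts to retrieving every sample annotated with a term or with any of its descendants. A minimal sketch under stated assumptions: the is-a edges in `PARENTS` and the sample annotations in `ANNOTATIONS` are invented for illustration and are not actual NCI Thesaurus or TMAD content.

```python
from collections import defaultdict

# Hypothetical is-a edges (child -> parents); not real ontology content.
PARENTS = {
    "Malignant_Uterine_Neoplasm": ["Uterine_Neoplasm"],
    "Endometrial_Carcinoma": ["Malignant_Uterine_Neoplasm"],
    "Uterine_Leiomyoma": ["Uterine_Neoplasm"],
}

# Hypothetical sample -> ontology term annotations.
ANNOTATIONS = {
    "s1": "Endometrial_Carcinoma",
    "s2": "Uterine_Leiomyoma",
    "s3": "Malignant_Uterine_Neoplasm",
}


def descendants(term, parents):
    """Return the term together with everything below it in the is-a hierarchy."""
    children = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def samples_for(term, parents=PARENTS, annotations=ANNOTATIONS):
    """Return sorted sample ids annotated with the term or any descendant."""
    wanted = descendants(term, parents)
    return sorted(s for s, t in annotations.items() if t in wanted)


print(samples_for("Malignant_Uterine_Neoplasm"))
print(samples_for("Uterine_Neoplasm"))
```

A query on the broader term returns the benign sample as well, which is the behaviour a free-text search on the original annotations would not give.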

  15. Integrating mRNA and protein expression [Figure: two data matrices, Genes × Samples and Proteins × Samples]

  16. Partial alignment of NCI-T and SNOMED-CT as a “bonus”

  17. Steps in Alignment • Anchor identification: identify similar class labels in the ontologies to be aligned; usually done by string matching • Ontology structure: use the “similar” classes as anchors and examine the local [graph] structure around them to inform the “similarity” metric [Figure: two ontology graphs, one rooted at R with nodes t1 to t7, the other rooted at Root with nodes Term-1 to Term-5]
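The two alignment steps can be illustrated with a small sketch. It assumes a very simple label normalization for the string-matching step and a Jaccard-style neighbor overlap as the structural score; the slides name neither, so both choices, along with all function names and the toy labels, are hypothetical.

```python
def normalize(label):
    """Keep only lower-cased letters and digits so 'Term-1' and 'term 1' compare equal."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def find_anchors(labels_a, labels_b):
    """Step 1: pair classes whose normalized labels are identical (string matching)."""
    index_b = {normalize(label): label for label in labels_b}
    return {a: index_b[normalize(a)] for a in labels_a if normalize(a) in index_b}


def neighbor_similarity(a, b, neighbors_a, neighbors_b, anchors):
    """Step 2: score a candidate pair by how many neighbors they share via anchors.

    Neighbors of `a` are translated into ontology B through the anchor map;
    neighbors without an anchor count as unshared. Assumes anchors are one-to-one.
    """
    na = set(neighbors_a.get(a, ()))
    nb = set(neighbors_b.get(b, ()))
    mapped = {anchors[n] for n in na if n in anchors}
    shared = mapped & nb
    total = len(na) + len(nb) - len(shared)
    return len(shared) / total if total else 0.0


# Toy labels and local graph structure, invented for illustration.
anchors = find_anchors(["Root", "Term-1", "Term-2", "X"],
                       ["root", "term 1", "term_2", "Y"])
score = neighbor_similarity(
    "X", "Y",
    {"X": ["Term-1", "Term-2"]},
    {"Y": ["term 1", "root"]},
    anchors,
)
print(anchors)
print(score)
```

Here "X" and "Y" have different labels, so string matching alone would never pair them; the shared anchored neighbor is what gives the pair a non-zero score.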

  18. We might improve alignment … • Ontology [graph] structure based step • Provide anchors from annotated data [Figure: the two ontology graphs (roots R and Root) with annotated samples such as S2 linking t5 and Term-5]

  19. Better Text-mapping → Better Alignment

  20. Summary Ability to map word-groups to ontology terms

  21. Credits and acknowledgements • Pathology: Robert Marinelli, Matt van de Rijn • Medical Informatics: Kaustubh Supekar, Daniel Rubin, Mark Musen • Funding: NIH
