2006 B4GM Term Project
Extension of disease-related pathway using text mining
2006.04.24, Molecular Genomic Medicine, 정희준
Introduction
• The disease process is important for understanding and treating a disease
• MeSH
  • MeSH is NLM’s controlled vocabulary used for indexing articles for MEDLINE/PubMed
  • MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts
• OMIM
  • OMIM archives mature, high-quality data of high significance, the standard in rare Mendelian disorders
• ArrayXPath
  • ArrayXPath has 3,088 genes or gene products
  • It has a repository of meta-information for the public pathway databases GenMAPP, KEGG, BioCarta and PharmGKB
  • If the input disease name matches the corresponding MeSH heading or entry term, PathMeSH outputs the list of pathways containing the disease-related gene products
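The lookup described in the last bullet can be illustrated with a minimal sketch. It is not PathMeSH's actual implementation: the tables `MESH_ENTRY_TERMS`, `DISEASE_GENES` and `PATHWAY_GENES` and both function names are hypothetical toy stand-ins, and matching is assumed to be a case-insensitive exact match on the heading or one of its entry terms.

```python
# Hypothetical toy tables; none of this data is taken from ArrayXPath or PathMeSH.
MESH_ENTRY_TERMS = {
    "Diabetes Mellitus, Type 2": ["NIDDM", "Type 2 Diabetes"],
}
DISEASE_GENES = {
    "Diabetes Mellitus, Type 2": {"INS", "PPARG"},
}
PATHWAY_GENES = {
    "Insulin signaling (toy)": {"INS", "INSR", "IRS1"},
    "Adipogenesis (toy)": {"PPARG", "CEBPA"},
    "Cell cycle (toy)": {"CDK1", "CCNB1"},
}


def resolve_heading(name):
    """Map an input disease name to a MeSH heading via heading or entry term."""
    key = name.strip().lower()
    for heading, entries in MESH_ENTRY_TERMS.items():
        if key == heading.lower() or key in (e.lower() for e in entries):
            return heading
    return None


def pathways_for_disease(name):
    """List pathways that contain at least one gene linked to the disease."""
    heading = resolve_heading(name)
    if heading is None:
        return []
    genes = DISEASE_GENES.get(heading, set())
    return sorted(p for p, members in PATHWAY_GENES.items() if members & genes)
```

Calling `pathways_for_disease("NIDDM")` resolves the entry term to its heading first, so an input that uses a synonym still reaches the same gene set.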
Problem
• OMIM catalogs the genes that are genetic factors of diseases
[Concept diagram. Labels: OMIM MorbidMap, New GRIP, MeSH hierarchies, Disease, Gene, pathway, PubMed]
Method
• Step 1. Collect PubMed abstracts
  • Collect the PubMed abstracts retrieved by searching with a MeSH heading and a gene symbol
• Step 2. Build Gene/Gene product dictionary
  • Build a dictionary of the symbols and gene names provided by Entrez Gene, HGNC and SWISSPROT
• Step 3. Extract gene/gene product in abstract
  • Extract gene/gene products from the abstracts collected for each disease in Step 1
• Step 4. Apply filtering
  • Test the significance of the gene/gene products extracted in Step 3
  • Include the gene/gene products that pass the filter in the disease-gene relations
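The four steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: every function name is hypothetical, the query only builds a PubMed search string (no network call), extraction is plain case-insensitive dictionary matching, and the slides do not say which significance test is used, so a one-sided hypergeometric test against a set of background abstracts is assumed here.

```python
import math
import re


def build_query(mesh_heading, gene_symbol):
    # Step 1 (assumed form): PubMed query combining a MeSH heading and a gene symbol.
    return f'"{mesh_heading}"[MeSH Terms] AND {gene_symbol}[Title/Abstract]'


def build_dictionary(records):
    # Step 2: records is an iterable of (symbol, [gene names or aliases]).
    dictionary = {}
    for symbol, names in records:
        for term in [symbol, *names]:
            dictionary[term.lower()] = symbol
    return dictionary


def extract_genes(abstract, dictionary):
    # Step 3: return the symbols whose dictionary terms occur as whole words.
    text = abstract.lower()
    found = set()
    for term, symbol in dictionary.items():
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text):
            found.add(symbol)
    return found


def hypergeom_pvalue(k, n, K, N):
    # P(X >= k) when n of N abstracts are drawn and K of the N mention the gene.
    tail = sum(
        math.comb(K, i) * math.comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / math.comb(N, n)


def significant_genes(disease_abstracts, background_abstracts, dictionary, alpha=0.05):
    # Step 4 (assumed test): keep genes over-represented in the disease abstracts.
    all_abstracts = list(disease_abstracts) + list(background_abstracts)
    N, n = len(all_abstracts), len(disease_abstracts)
    mentions = [extract_genes(a, dictionary) for a in all_abstracts]
    kept = {}
    for symbol in set().union(*mentions):
        k = sum(symbol in genes for genes in mentions[:n])
        K = sum(symbol in genes for genes in mentions)
        p = hypergeom_pvalue(k, n, K, N)
        if p <= alpha:
            kept[symbol] = p
    return kept
```

Plain dictionary matching is the simplest choice and will produce false positives for gene symbols that are also common words, which is one reason a filtering step such as Step 4 is needed.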