
Learning Causal Relationships From Medical Abstracts

Jonathan Elsas, Jaime Arguello. Goal: extract entity pairs (A, B, P), where it is said that A → B with Pr = P, and extract contexts (pre, mid, post, direction, P) that mark causal relationships with Pr = P.


Presentation Transcript


  1. Learning Causal Relationships From Medical Abstracts. Jonathan Elsas, Jaime Arguello

  2. Goal
    • Extract entity pairs (A, B, P), where it is said that A → B with Pr = P
    • Extract contexts (pre, mid, post, direction, P) that mark causal relationships with Pr = P
    • Use the best entity pairs to extract more contexts
    • Use the best contexts to extract more entity pairs
    • BEST = highest P(+c | (A, B)) and P(+c | (pre, mid, post))

  3. Examples (each sentence is marked on the slide with pre, mid, and post context spans)
    • Toxic dermatitis due to TB 1 therapy.
    • In 518 cases, a single factor caused the abortion.
    • Risk of cancer persists for years following smoking cessation.

  4. Algorithm
    Input: ContextList seeds, EntityPairList seeds
    [1] Use all Contexts in ContextList to compute: $P(+c \mid (A_i, B_i)) = \sum_k P(+c \mid S_k)\, P(S_k \mid (A_i, B_i))$
    [2] Keep N best $(A_i, B_i)$ in EntityPairList
    [3] Use all EntityPairs in EntityPairList to compute: $P(+c \mid S_i) = \sum_k P(+c \mid (A_k, B_k))\, P((A_k, B_k) \mid S_i)$
    [4] Keep N best $S_i$ in ContextList to compute [1]
    [5] Increment N and return to [1]
    The conditional probabilities are estimated from counts:
    $P(S_k \mid (A_i, B_i)) \approx \#[S_k \,\&\, (A_i, B_i)] \,/\, \#(A_i, B_i)$
    $P((A_i, B_i) \mid S_k) \approx \#[S_k \,\&\, (A_i, B_i)] \,/\, \#S_k$
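The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bootstrap` and `top_n`, the toy counts, and the choice to start from context seeds only are all hypothetical, and #(A, B) and #S are approximated by summing the observed co-occurrence counts rather than by true corpus frequencies.

```python
from collections import Counter, defaultdict


def top_n(scores, n):
    """Return the n highest-scoring items as a dict."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def bootstrap(cooccur, seed_contexts, n_start=1, n_iters=3):
    """cooccur maps (context, (A, B)) -> co-occurrence count.
    seed_contexts maps context -> assumed P(+c | context)."""
    # Assumption: marginal counts are sums over observed co-occurrences.
    context_total, pair_total = Counter(), Counter()
    for (ctx, pair), count in cooccur.items():
        context_total[ctx] += count
        pair_total[pair] += count

    context_scores = dict(seed_contexts)
    pair_scores = {}
    n = n_start
    for _ in range(n_iters):
        # [1] P(+c | pair) = sum_k P(+c | S_k) * P(S_k | pair)
        scores = defaultdict(float)
        for (ctx, pair), count in cooccur.items():
            if ctx in context_scores:
                scores[pair] += context_scores[ctx] * count / pair_total[pair]
        # [2] keep the N best entity pairs
        pair_scores = top_n(scores, n)
        # [3] P(+c | S_i) = sum_k P(+c | pair_k) * P(pair_k | S_i)
        scores = defaultdict(float)
        for (ctx, pair), count in cooccur.items():
            if pair in pair_scores:
                scores[ctx] += pair_scores[pair] * count / context_total[ctx]
        # [4] keep the N best contexts
        context_scores = top_n(scores, n)
        # [5] widen the beam and repeat
        n += 1
    return context_scores, pair_scores


# Toy counts, invented for illustration only.
cooccur = Counter({
    ("caused by", ("smoking", "cancer")): 3,
    ("due to", ("smoking", "cancer")): 1,
    ("due to", ("rubber", "dermatitis")): 2,
    ("and", ("smoking", "cancer")): 1,
    ("and", ("men", "women")): 4,
})
contexts, pairs = bootstrap(cooccur, {"caused by": 1.0})
print(sorted(pairs, key=pairs.get, reverse=True))
print(sorted(contexts, key=contexts.get, reverse=True))
```

With these toy counts, the third pass picks up the pair (rubber, dermatitis) through the learned context "due to", and the generic context "and" also enters the context list once N grows to 3, which is a small-scale version of the drift that a widening beam can cause.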

  5. Corpus and Tools
    • 300,000 MEDLINE abstracts
    • Annotated with POS tags & sentence boundaries
    • Limited to within-sentence contexts
    • Use the Indri IR toolkit to index annotations
    • Allows sentence retrieval
    • Structured query language, e.g. #1(<#any:nn #any:nnp> cause.vb <#any:nn #any:nnp>)
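The example query on this slide follows a fixed shape: a noun slot, a verb restricted to the verb field, and a second noun slot, all inside an exact-order window. A small helper can generate the same string for other verbs. This sketch only builds the query text; it does not call Indri, and the function name `causal_verb_query` and its parameters are hypothetical.

```python
def causal_verb_query(verb, noun_tags=("nn", "nnp"), verb_tag="vb"):
    """Build a query string shaped like the slide's example for a given verb."""
    # A noun slot matches any token annotated with one of the noun tags.
    noun_slot = "<" + " ".join(f"#any:{tag}" for tag in noun_tags) + ">"
    # #1(...) asks for the three parts to appear adjacent and in order.
    return f"#1({noun_slot} {verb}.{verb_tag} {noun_slot})"


print(causal_verb_query("cause"))
```

Called with "cause", the helper reproduces the query shown on the slide; other verbs would be passed the same way.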

  6. Observations (1)
    • The algorithm seems to learn well during the first few iterations
    • One seed: (Heavy smoking, cancer)
    • Learns contexts: “caused by” and “due to”
    • Learns entity pairs: (rubber, dermatitis), (bromobenzene, necrosis)
    • All this within 2-3 iterations!

  7. Observations (2)
    • Quality rapidly degrades
    • Quickly converges to a specialized area of the corpus
    • [A quickly growing population of people with] [appears to be non] [women]
    • Need a better way of limiting the window of pre- and post-context, or of detecting when pre- and post-context are irrelevant
    • [Yet while men are increasingly kicking the habit doctors fear that] [induced] [could become as much of a threat to women in the future.]
    • Doesn’t depend on whether we start with context-seeds or entity-pair-seeds
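One simple way to limit the pre- and post-context window is to cap both at a fixed number of tokens next to the entities. The sketch below assumes tokenized sentences and entity spans given as half-open token index ranges; the function name `extract_context`, the `max_window` parameter, and the entity spans chosen in the example are illustrative assumptions, not the spans used on the slides.

```python
def extract_context(tokens, a_span, b_span, max_window=3):
    """Split a sentence into (pre, mid, post) around two entity spans.

    tokens: list of token strings for one sentence.
    a_span, b_span: (start, end) token indices, end exclusive, A before B.
    max_window: maximum number of tokens kept in pre and in post.
    """
    (a_start, a_end), (b_start, b_end) = a_span, b_span
    if a_end > b_start:
        raise ValueError("entity A must end before entity B starts")
    # Keep only the tokens nearest to the entities on each side.
    pre = tokens[max(0, a_start - max_window):a_start]
    mid = tokens[a_end:b_start]
    post = tokens[b_end:b_end + max_window]
    return " ".join(pre), " ".join(mid), " ".join(post)


tokens = "In 518 cases , a single factor caused the abortion".split()
# Assumed spans: A = "a single factor", B = "the abortion".
print(extract_context(tokens, (4, 7), (8, 10), max_window=2))
```

Setting `max_window=0` drops pre and post entirely, which corresponds to the case where only the mid-context is considered relevant.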

  8. Next Steps
    • More data!
    • 300,000 abstracts is way too sparse
    • Several million abstracts are available from NLM
    • Add better entity extraction and canonicalization
    • Problem: the NLM Metathesaurus needs 26 GB!

  9. Last Minute Google Results
    • Pre-context seems to help a lot
    • e.g. “studies show that” _____ “causes”
    • Some good contexts learned:
    • [null] [is less in people who quit] [null]
    • [null] [is the major single cause of] [null]
    • [null] [deaths in 2002 attributed to passive] [null]
    • [null] [accounts for at least 30 of all] [null]
    • Need to find a way to transform learned contexts into a more general form
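The slides leave open how learned contexts should be generalized. One possible direction, shown here purely as an assumption and not as something the presentation proposes, is to normalize surface details such as case and specific numbers so that near-identical contexts collapse into one. The function name `generalize_context` and the `<NUM>` placeholder are hypothetical.

```python
import re


def generalize_context(text):
    """Lowercase a context, replace numbers with <NUM>, collapse whitespace."""
    text = text.lower().strip()
    # Replace integers and simple decimals or thousands groups with a placeholder.
    text = re.sub(r"\d+(?:[.,]\d+)*", "<NUM>", text)
    return re.sub(r"\s+", " ", text)


print(generalize_context("deaths in 2002 attributed to passive"))
print(generalize_context("accounts for at least 30 of all"))
```

Under this normalization, a context mentioning 2002 and the same context mentioning another year would map to a single pattern, so their counts could be pooled.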

  10. Ideas?
