Learning Causal Relationships From Medical Abstracts
Jonathan Elsas, Jaime Arguello
Goal
• Extract entity pairs (A, B, P), where it is said that A → B (A causes B) with Pr = P
• Extract contexts (pre, mid, post, direction, P) that mark causal relationships with Pr = P
• Use the best entity pairs to extract more contexts
• Use the best contexts to extract more entity pairs
• BEST = highest P(+C | (A, B)) and P(+C | (pre, mid, post))
Examples (on the slide, each sentence was annotated with its pre, mid, and post context spans)
• Toxic dermatitis due to TB 1 therapy.
• In 518 cases, a single factor caused the abortion.
• Risk of cancer persists for years following smoking cessation.
Algorithm
Input: ContextList seeds, EntityPairList seeds
[1] Use all contexts in ContextList to compute: P(+c | (A_i, B_i)) = Σ_k P(+c | S_k) · P(S_k | (A_i, B_i))
[2] Keep the N best (A_i, B_i) in EntityPairList
[3] Use all entity pairs in EntityPairList to compute: P(+c | S_i) = Σ_k P(+c | (A_k, B_k)) · P((A_k, B_k) | S_i)
[4] Keep the N best S_i in ContextList, to be used in [1]
[5] Increment N and return to [1]
Here S_k denotes a context, and the conditional probabilities are estimated from co-occurrence counts:
P(S_k | (A_i, B_i)) ≈ #[S_k & (A_i, B_i)] / #(A_i, B_i)
P((A_i, B_i) | S_k) ≈ #[S_k & (A_i, B_i)] / #S_k
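The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `bootstrap`, `top_n`, and `TOY_COOC` are hypothetical, the co-occurrence counts are invented toy data, #S_k and #(A_i, B_i) are approximated by summing the co-occurrence table, and a step is skipped when its input list is still empty (for example when starting from a single entity-pair seed).

```python
from collections import Counter, defaultdict


def top_n(scores, n):
    """Keep the n highest-scoring items (ties broken by repr for determinism)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return dict(ranked[:n])


def bootstrap(cooc, seed_contexts, seed_pairs, n=1, rounds=3):
    """cooc maps (context, pair) -> count #[S & (A, B)]; seeds map item -> P(+c)."""
    # Assumption: marginal counts are the sums over the co-occurrence table.
    ctx_total, pair_total = Counter(), Counter()
    for (ctx, pair), count in cooc.items():
        ctx_total[ctx] += count
        pair_total[pair] += count

    ctx_scores, pair_scores = dict(seed_contexts), dict(seed_pairs)
    for _ in range(rounds):
        if ctx_scores:  # steps [1]-[2]: score pairs from the current contexts
            scores = defaultdict(float)
            for (ctx, pair), count in cooc.items():
                if ctx in ctx_scores:
                    scores[pair] += ctx_scores[ctx] * count / pair_total[pair]
            pair_scores = top_n(scores, n)
        if pair_scores:  # steps [3]-[4]: score contexts from the current pairs
            scores = defaultdict(float)
            for (ctx, pair), count in cooc.items():
                if pair in pair_scores:
                    scores[ctx] += pair_scores[pair] * count / ctx_total[ctx]
            ctx_scores = top_n(scores, n)
        n += 1  # step [5]
    return ctx_scores, pair_scores


# Invented toy counts, for illustration only.
TOY_COOC = {
    ("due to", ("heavy smoking", "cancer")): 2,
    ("caused by", ("heavy smoking", "cancer")): 2,
    ("due to", ("rubber", "dermatitis")): 3,
    ("and", ("rubber", "dermatitis")): 1,
    ("and", ("men", "women")): 4,
}

if __name__ == "__main__":
    seed = {("heavy smoking", "cancer"): 1.0}
    contexts, pairs = bootstrap(TOY_COOC, {}, seed, n=2, rounds=2)
    print(contexts)
    print(pairs)
```

Because N grows each round, lower-scoring contexts such as "and" eventually enter the list, which is one plausible route to the drift reported later in the deck.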
Corpus and Tools
• 300,000 MEDLINE abstracts
• Annotated with POS tags and sentence boundaries
• Limited to within-sentence contexts
• Indri IR toolkit used to index the annotations
• Allows sentence retrieval
• Structured query language, e.g. #1(<#any:nn #any:nnp> cause.vb <#any:nn #any:nnp>)
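As a small illustration of how such queries might be generated for many candidate verbs, the sketch below fills the verb slot of the query shown on the slide. The helper name `verb_query` is hypothetical, it only does string templating on the slide's own example, and its output has not been checked against an Indri installation.

```python
def verb_query(verb, noun_tags=("nn", "nnp")):
    """Build a query string of the same shape as the slide's example.

    Assumption: POS tags are indexed as fields named after the lowercase tag,
    as the slide's `cause.vb` and `#any:nn` suggest.
    """
    noun_slot = "<" + " ".join(f"#any:{tag}" for tag in noun_tags) + ">"
    return f"#1({noun_slot} {verb}.vb {noun_slot})"


if __name__ == "__main__":
    for verb in ("cause", "induce"):
        print(verb_query(verb))
```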
Observations (1)
• The algorithm seems to learn well during the first few iterations
• One seed: (heavy smoking, cancer)
• Learns contexts: “caused by” and “due to”
• Learns entity pairs: (rubber, dermatitis), (bromobenzene, necrosis)
• All this within 2-3 iterations!
Observations (2)
• Quality rapidly degrades
• Quickly converges to a specialized area of the corpus
• [A quickly growing population of people with] [appears to be non] [women]
• Need a better way of limiting the window of pre- and post-context, or of detecting when pre- and post-context are irrelevant
• [Yet while men are increasingly kicking the habit doctors fear that] [induced] [could become as much of a threat to women in the future.]
• Doesn’t depend on whether we start with context seeds or entity-pair seeds
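One simple way to limit the pre- and post-context window is to cap both at a fixed number of tokens around the entity pair. The sketch below assumes whitespace-tokenized sentences and exact string matches for the entities. The names `find_span` and `extract_context`, the `window` parameter, and the "A->B" / "B->A" encoding of direction are all hypothetical; the deck does not define how direction is represented.

```python
def find_span(tokens, entity_tokens):
    """Return (start, end) of the first occurrence of entity_tokens, or None."""
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            return i, i + n
    return None


def extract_context(sentence, a, b, window=3):
    """Split a sentence around entities a and b into (pre, mid, post, direction).

    pre and post are truncated to at most `window` tokens each.
    """
    tokens = sentence.split()
    span_a = find_span(tokens, a.split())
    span_b = find_span(tokens, b.split())
    if span_a is None or span_b is None:
        return None
    first, second = sorted([span_a, span_b])
    if first[1] > second[0]:
        return None  # overlapping mentions are skipped
    pre = tokens[max(0, first[0] - window):first[0]]
    mid = tokens[first[1]:second[0]]
    post = tokens[second[1]:second[1] + window]
    direction = "A->B" if span_a < span_b else "B->A"
    return " ".join(pre), " ".join(mid), " ".join(post), direction


if __name__ == "__main__":
    print(extract_context(
        "In 518 cases , a single factor caused the abortion",
        "a single factor", "the abortion", window=2))
    print(extract_context(
        "Toxic dermatitis due to TB 1 therapy .",
        "TB 1 therapy", "Toxic dermatitis"))
```

A small window keeps contexts such as [Yet while men are increasingly kicking the habit doctors fear that] from being stored whole, at the cost of discarding pre-context that may be informative.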
Next Steps
• More data!
• 300,000 abstracts are far too sparse
• Several million abstracts are available from NLM
• Add better entity extraction and canonicalization
• Problem: the NLM Metathesaurus needs 26 GB!
Last Minute Google Results
• Pre-context seems to help a lot
• e.g. “studies show that” _____ “causes”
• Some good contexts learned:
• [null] [is less in people who quit] [null]
• [null] [is the major single cause of] [null]
• [null] [deaths in 2002 attributed to passive] [null]
• [null] [accounts for at least 30 of all] [null]
• Need to find a way to transform learned contexts into a more general form