Learning Causal Relationships From Medical Abstracts

Jonathan Elsas, Jaime Arguello

Goal
• Extract entity pairs (A, B, P), where the text states that A → B (A causes B) with probability P
• Extract contexts (pre, mid, post, direction, P) that mark causal relationships with probability P
• Use the best entity pairs to extract more contexts
• Use the best contexts to extract more entity pairs
• BEST = highest P(+c | (A, B)) and P(+c | (pre, mid, post))

Examples
[Slide annotations: pre, mid, and post context spans marked on each sentence]
• Toxic dermatitis due to TB 1 therapy.
• In 518 cases, a single factor caused the abortion.
• Risk of cancer persists for years following smoking cessation.
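One way to read the examples above is that, given a sentence and a known entity pair, the context is whatever text surrounds the two mentions. The sketch below illustrates that reading only. The function name `split_context`, the `"forward"`/`"backward"` direction labels, and the entity boundaries chosen for the two sentences are assumptions for illustration, not taken from the slides.

```python
def split_context(sentence, cause, effect):
    """Split a sentence into (pre, mid, post, direction) around two entity mentions.

    Assumption: pre is the text before the first mention, mid the text between
    the mentions, post the text after the second mention. direction is
    "forward" when the cause is mentioned first, otherwise "backward".
    """
    i, j = sentence.find(cause), sentence.find(effect)
    if i < 0 or j < 0:
        return None  # one of the entities is not in the sentence
    if i < j:
        first, second, direction = (i, cause), (j, effect), "forward"
    else:
        first, second, direction = (j, effect), (i, cause), "backward"
    first_end = first[0] + len(first[1])
    if first_end > second[0]:
        return None  # mentions overlap, no usable middle context
    pre = sentence[:first[0]].strip()
    mid = sentence[first_end:second[0]].strip()
    post = sentence[second[0] + len(second[1]):].strip()
    return pre, mid, post, direction


example_forward = split_context(
    "In 518 cases, a single factor caused the abortion",
    cause="a single factor",
    effect="the abortion",
)
example_backward = split_context(
    "Risk of cancer persists for years following smoking cessation.",
    cause="smoking",
    effect="cancer",
)
```

Here `example_forward` has "caused" as its middle context, while `example_backward` has the effect mentioned first, which is why a direction field is needed alongside the three text spans.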

Algorithm
Input: ContextList seeds, EntityPairList seeds
[1] Use all contexts in ContextList to compute: P(+c | (A_i, B_i)) = Σ_k P(+c | S_k) P(S_k | (A_i, B_i))
[2] Keep the N best (A_i, B_i) in EntityPairList
[3] Use all entity pairs in EntityPairList to compute: P(+c | S_i) = Σ_k P(+c | (A_k, B_k)) P((A_k, B_k) | S_i)
[4] Keep the N best S_i in ContextList, to be used in [1]
[5] Increment N and return to [1]
Estimates from co-occurrence counts:
P(S_k | (A_i, B_i)) ~ #[S_k & (A_i, B_i)] / #(A_i, B_i)
P((A_i, B_i) | S_k) ~ #[S_k & (A_i, B_i)] / #S_k
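The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the co-occurrence table `cooc`, the function names `bootstrap` and `top_n`, the fixed iteration count, and the choice to let seeds keep their seed probability in every round are all hypothetical. The toy counts are invented purely to exercise the code.

```python
from collections import defaultdict


def top_n(scores, n):
    """Keep the n highest-scoring items as a dict."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def bootstrap(cooc, context_seeds, pair_seeds, n_start=2, iterations=2):
    """cooc maps (context, pair) -> number of sentences where both occur."""
    # Marginal counts #S_k and #(A_i, B_i), derived from the co-occurrence table.
    context_total = defaultdict(int)
    pair_total = defaultdict(int)
    for (context, pair), count in cooc.items():
        context_total[context] += count
        pair_total[pair] += count

    contexts, pairs = dict(context_seeds), dict(pair_seeds)
    n = n_start
    for _ in range(iterations):
        # [1] P(+c | pair) = sum_k P(+c | S_k) * P(S_k | pair)
        pair_scores = defaultdict(float)
        for (context, pair), count in cooc.items():
            if context in contexts:
                pair_scores[pair] += contexts[context] * count / pair_total[pair]
        # [2] Keep the N best pairs (assumption: seeds keep their seed score).
        pairs = top_n({**pair_scores, **pair_seeds}, n)
        # [3] P(+c | S) = sum_k P(+c | pair_k) * P(pair_k | S)
        context_scores = defaultdict(float)
        for (context, pair), count in cooc.items():
            if pair in pairs:
                context_scores[context] += pairs[pair] * count / context_total[context]
        # [4] Keep the N best contexts (same seed assumption).
        contexts = top_n({**context_scores, **context_seeds}, n)
        # [5] Widen the beam for the next round.
        n += 1
    return contexts, pairs


# Toy, invented counts; only the seed pair comes from the slides.
SMOKE = ("heavy smoking", "cancer")
RUBBER = ("rubber", "dermatitis")
TEA = ("tea", "coffee")
toy_cooc = {
    ("caused by", SMOKE): 3,
    ("due to", SMOKE): 1,
    ("caused by", RUBBER): 2,
    ("and", RUBBER): 2,
    ("and", TEA): 4,
}
learned_contexts, learned_pairs = bootstrap(toy_cooc, {}, {SMOKE: 1.0})
```

On this toy table, two rounds starting from the single seed pair pick up "caused by" and "due to" and then the second pair, but the uninformative context "and" also enters with a low score once N grows, which is the kind of drift a growing beam invites.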

Corpus and Tools
• 300,000 MEDLINE abstracts
• Annotated with POS tags and sentence boundaries
• Limited to within-sentence contexts
• Use the Indri IR toolkit to index annotations
  • Allows sentence retrieval
  • Structured query language, e.g. #1(<#any:nn #any:nnp> cause.vb <#any:nn #any:nnp>)
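The example query asks for a noun-tagged span, then the term "cause" tagged as a verb, then another noun-tagged span, all adjacent. The sketch below mimics that pattern over an already POS-tagged sentence in plain Python. It does not call Indri, and it simplifies in two labeled ways: each annotation is assumed to cover a single token, and the verb is matched as the literal token "cause" with no stemming. The function name `match_cause_pattern` and the lowercase tag strings are assumptions.

```python
NOUN_TAGS = {"nn", "nnp"}  # assumed tag names, mirroring the query's field names


def match_cause_pattern(tagged):
    """Return (left_noun, right_noun) for each adjacent noun / cause(vb) / noun triple.

    tagged is a list of (token, pos_tag) tuples for one sentence.
    """
    hits = []
    for (w1, t1), (w2, t2), (w3, t3) in zip(tagged, tagged[1:], tagged[2:]):
        is_cause_verb = w2.lower() == "cause" and t2 == "vb"
        if t1 in NOUN_TAGS and is_cause_verb and t3 in NOUN_TAGS:
            hits.append((w1, w3))
    return hits


tagged_sentence = [
    ("Viruses", "nn"),
    ("cause", "vb"),
    ("disease", "nn"),
    ("in", "in"),
    ("humans", "nn"),
]
candidate_pairs = match_cause_pattern(tagged_sentence)
```

Each hit is a candidate entity pair whose middle context is the verb "cause"; in the deck's setup the index performs this matching at retrieval time over the whole corpus.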

Observations (1)
• The algorithm seems to learn well during the first few iterations
• One seed: (heavy smoking, cancer)
• Learns contexts: “caused by” and “due to”
• Learns entity pairs: (rubber, dermatitis), (bromobenzene, necrosis)
• All this within 2-3 iterations!

Observations (2)
• Quality rapidly degrades
• Quickly converges to a specialized area of the corpus
  • [A quickly growing population of people with] [appears to be non] [women]
• Need a better way of limiting the window of pre- and post-context, or of detecting when pre- and post-context are irrelevant
  • [Yet while men are increasingly kicking the habit doctors fear that] [induced] [could become as much of a threat to women in the future.]
• The outcome doesn’t depend on whether we start with context seeds or entity-pair seeds

Next Steps
• More data!
  • 300,000 abstracts are far too sparse
  • Several million abstracts are available from NLM
• Add better entity extraction and canonicalization
  • Problem: the NLM Metathesaurus needs 26 GB!

Last-Minute Google Results
• Pre-context seems to help a lot
  • e.g. “studies show that” _____ “causes”
• Some good contexts learned:
  • [null] [is less in people who quit] [null]
  • [null] [is the major single cause of] [null]
  • [null] [deaths in 2002 attributed to passive] [null]
  • [null] [accounts for at least 30 of all] [null]
• Need to find a way to transform learned contexts into a more general form

Ideas?
