Knowledge Space Map for Organic Reactions
• Knowledge Space Theory
• Existing Rule Set Basis for Chemistry Knowledge Space Model
• Data Model Proposal
• Constructing and Learning the Map
Knowledge Space Map
[Diagram: example knowledge space map with nodes Addition, Subtraction, Multiplication, Division, Fractions, Exponents, Logarithms, Spelling, Vocabulary, Grammar]
• Isolate atomic knowledge units / nodes / elements
• Determine the dependency graph of knowledge units (a topological sort of the graph defines a learning order)
• Enables targeted and purposeful lesson plans based on the “fringes” of a student’s current knowledge state
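A minimal sketch of the two ideas on this slide: deriving a learning order by topological sort, and finding the "fringe" of what a student could learn next. The node names come from the diagram, but the prerequisite edges, `PREREQS`, `learning_order`, and `fringe` are assumptions made for illustration, not the deck's actual map.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite edges: skill -> skills that must be known first.
# Only the node names appear on the slide; these edges are assumed.
PREREQS = {
    "Addition": set(),
    "Subtraction": {"Addition"},
    "Multiplication": {"Addition"},
    "Division": {"Multiplication", "Subtraction"},
    "Fractions": {"Division"},
    "Exponents": {"Multiplication"},
    "Logarithms": {"Exponents"},
}


def learning_order(prereqs):
    """Return one valid learning order (every prerequisite comes first)."""
    return list(TopologicalSorter(prereqs).static_order())


def fringe(prereqs, known):
    """Skills not yet known whose prerequisites are all known."""
    return {
        skill
        for skill, deps in prereqs.items()
        if skill not in known and deps <= known
    }


order = learning_order(PREREQS)
next_topics = fringe(PREREQS, {"Addition", "Multiplication"})
```

With these assumed edges, a student who knows Addition and Multiplication has Subtraction and Exponents on the fringe, so a lesson plan would target those next.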
Chemistry Knowledge Space?
• Current system: the user selects which chapter(s) to work on, then the system randomly generates a problem
• Idealized approach: assess the student’s current knowledge state and auto-generate the next problem to target the next most useful subject
• The existing tutorial is based on the predictive power of 80+ reagents, which are in turn based on 1500+ elemental rules. These could be interpreted as 1500+ knowledge units
Rule Clustering
• Many rules are just variants of the same concept / knowledge unit
  • Alkene, Protic Acid Addition, Alkoxy
  • Alkene, Protic Acid Addition, Benzyl
  • Alkene, Protic Acid Addition, Allyl
  • Alkene, Protic Acid Addition, Tertiary
  • Alkene, Protic Acid Addition, Secondary
  • Alkene, Protic Acid Addition, Generic
  • …
• Some rules will always be used in conjunction with another (like the letters “q” and “u” in spelling)
  • There is no real learning dependency order between such rules: you essentially know one of them if and only if (IFF) you know the others
Data Model Proposal
• Want a general framework for representing relationships
• Each reaction rule represents an elementary knowledge unit node
• A weighted, directed edge between two nodes represents a learning dependency relationship: A → B (90%)
  • Given that a student “knows” rule B, there is a 90% probability that they “know” rule A
  • Conversely, if they do NOT know rule A, there is a 90% probability that they do NOT know rule B
• Define “know”: a student should consistently answer correctly any problem that is based only on rules that they “know”
• Define the rule similarity measure as the average of the reciprocal dependency relationships
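One way the proposed data model could be represented is sketched below. The class name `KnowledgeMap`, its method names, and the 50% default for unseen pairs are assumptions for illustration. Each stored value is the conditional probability described on the slide, and similarity is the average of the two reciprocal values.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeMap:
    """Hypothetical container for weighted, directed rule relationships."""

    # p[(a, b)] = estimated P(student knows a | student knows b)
    p: dict = field(default_factory=dict)
    default: float = 0.5  # assumed value for pairs with no data yet

    def set_dependency(self, a, b, prob):
        self.p[(a, b)] = prob

    def dependency(self, a, b):
        return self.p.get((a, b), self.default)

    def similarity(self, a, b):
        """Average of the two reciprocal dependency relationships."""
        return (self.dependency(a, b) + self.dependency(b, a)) / 2


km = KnowledgeMap()
km.set_dependency("A", "B", 0.90)  # knowing B implies knowing A, 90% of the time
sim_ab = km.similarity("A", "B")   # averages 0.90 with the 0.50 default
```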
Major Relationship Cases
• Strong learning dependency
  • A → B (99%)
  • A ← B (50%)
• Strong similarity / mutual dependency
  • A → B (99%)
  • A ← B (99%)
• No relation (random correlation)
  • A → B (50%)
  • A ← B (50%)
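The three cases can be told apart from the pair of reciprocal probabilities alone. The sketch below is one possible classifier. The function name and the `strong` and `weak` cutoffs are hypothetical, since the slides give only the 99% and 50% examples and no thresholds.

```python
def classify_relation(p_a_given_b, p_b_given_a, strong=0.9, weak=0.6):
    """Classify a rule pair from its two reciprocal probabilities.

    p_a_given_b: P(student knows A | student knows B)
    p_b_given_a: P(student knows B | student knows A)
    The strong/weak cutoffs are illustrative assumptions.
    """
    if p_a_given_b >= strong and p_b_given_a >= strong:
        return "mutual dependency"
    if p_a_given_b >= strong and p_b_given_a <= weak:
        return "B depends on A"
    if p_b_given_a >= strong and p_a_given_b <= weak:
        return "A depends on B"
    if p_a_given_b <= weak and p_b_given_a <= weak:
        return "no relation"
    return "indeterminate"


cases = [
    classify_relation(0.99, 0.50),
    classify_relation(0.99, 0.99),
    classify_relation(0.50, 0.50),
]
```

Values that fall between the two cutoffs are reported as indeterminate, not forced into one of the three cases.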
Additional Enhancements
• Add a baseline probability of “knowing” each node, instead of assuming a uniform 50%
  • Analogous to using background weights for the amino acid distribution in protein sequences
• Add a confidence number for each of these probability weights to reflect how trustworthy our prior data is
  • Analogous (maybe equal) to n, the number of data points used to arrive at the current estimate
Learning Relationship Map
• Give students assessment exams based on the rule sets, with criteria to distinguish problems that students get “right” vs. “wrong”
• This defines two sets of rules
  • R: all rules used in problems students got right
  • W: all rules used in problems students got wrong (that are not in R)
• Adjust rule relation values
  • Decrease relations between Ri and Wj
  • Increase relations between Ri and Rk
  • Scale each adjustment based on confidence in the prior
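A minimal sketch of this update loop, under stated assumptions. Each relation is stored as a probability plus a count n of supporting data points, and a new observation is folded in as a running average, so a high-confidence prior moves less. The running-average rule, the function names, and the choice of which direction to decrease for a right/wrong pair are illustrative assumptions, not details given in the slides.

```python
from itertools import permutations


def rule_sets(results):
    """results: iterable of (rules_used, answered_correctly) per problem."""
    right, wrong = set(), set()
    for rules, correct in results:
        (right if correct else wrong).update(rules)
    return right, wrong - right  # W excludes anything already in R


def update(p, n, a, b, observed, prior_p=0.5, prior_n=1.0):
    """Fold one observation into p[(a, b)] = P(know a | know b)."""
    key = (a, b)
    old_p, old_n = p.get(key, prior_p), n.get(key, prior_n)
    p[key] = (old_p * old_n + observed) / (old_n + 1)
    n[key] = old_n + 1


def learn_from_exam(p, n, right, wrong):
    # Both rules known: evidence for the relation in each direction.
    for a, b in permutations(sorted(right), 2):
        update(p, n, a, b, 1.0)
    # Ri known but Wj not: evidence against P(know Wj | know Ri).
    for r in sorted(right):
        for w in sorted(wrong):
            update(p, n, w, r, 0.0)


p, n = {}, {}
exam = [({"R1", "R2"}, True), ({"R2", "W1"}, False)]
right, wrong = rule_sets(exam)
learn_from_exam(p, n, right, wrong)
```

Because each update divides by n + 1, the same observation shifts a relation backed by many data points far less than one that still sits at its prior.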
Learning Propagation
• Each assessment exam may only cover a handful of specific rules in R and W
• When updating the relation between rules R1 and R2, look for all rules similar to R1 and all rules similar to R2
• Assume respective updates for all relations between similar rule pairs, scaled by the magnitude of their similarity to R1 and R2
• Technically, all rules are similar to all others to some degree, but we don’t want to update 1500² relations every time. Set a similarity threshold, which effectively defines clusters around rules
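The propagation step might look like the sketch below. Scaling by the product of the two similarities, the 0.8 threshold, the rule names, and the `SIM` lookup table are all assumptions chosen for illustration. The slides say only that updates are scaled by similarity and cut off at some threshold.

```python
# Hypothetical pairwise similarities; unlisted pairs default to 0.1.
SIM = {("A1", "A2"): 0.9}


def sim(a, b):
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.1))


def propagate(p, rules, sim, r1, r2, delta, threshold=0.8, default=0.5):
    """Apply a scaled copy of the (r1, r2) adjustment to similar pairs.

    Returns the number of relations touched.
    """
    near1 = [(s, sim(s, r1)) for s in rules if sim(s, r1) >= threshold]
    near2 = [(s, sim(s, r2)) for s in rules if sim(s, r2) >= threshold]
    touched = 0
    for s1, w1 in near1:
        for s2, w2 in near2:
            if s1 == s2:
                continue
            old = p.get((s1, s2), default)
            # Clamp so the stored value stays a valid probability.
            p[(s1, s2)] = min(1.0, max(0.0, old + delta * w1 * w2))
            touched += 1
    return touched


relations = {}
count = propagate(relations, ["A1", "A2", "B1", "C1"], sim, "A1", "B1", delta=0.1)
```

Here only the (A1, B1) and (A2, B1) relations change. C1 falls below the threshold for both rules, so none of its relations are touched.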
Constructing Relationship Map
• An initial pass should be able to find many “similarity” relationships automatically, based only on existing structured data
  • Rule names
  • Combined usage in test examples
  • Inclusion in common reagents, chapters, etc.
• Use book chapter order as an initial guess for dependency orders
• Similarity analysis could reduce 1500+ rules to ~100? rule “clusters”, a more tractable number for manually assigning major dependencies not automatically addressed by book chapter order
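As a rough illustration of name-based similarity, rules whose comma-separated names differ only in the last component can be grouped into one cluster. The grouping heuristic and the function name are assumptions. The first six names are the "Alkene, Protic Acid Addition" variants from the deck, and the last is a hypothetical rule added only to show a second cluster.

```python
from collections import defaultdict

RULE_NAMES = [
    "Alkene, Protic Acid Addition, Alkoxy",
    "Alkene, Protic Acid Addition, Benzyl",
    "Alkene, Protic Acid Addition, Allyl",
    "Alkene, Protic Acid Addition, Tertiary",
    "Alkene, Protic Acid Addition, Secondary",
    "Alkene, Protic Acid Addition, Generic",
    "Alkene, Halogen Addition, Generic",  # hypothetical name, not from the deck
]


def cluster_by_name(rule_names):
    """Group rules by every name component except the last (the variant)."""
    clusters = defaultdict(list)
    for name in rule_names:
        parts = [part.strip() for part in name.split(",")]
        key = ", ".join(parts[:-1]) if len(parts) > 1 else parts[0]
        clusters[key].append(name)
    return dict(clusters)


clusters = cluster_by_name(RULE_NAMES)
```

A real pass would presumably combine this with the other signals listed above, such as combined usage in test examples and shared reagents or chapters.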
Open Questions
• Student knowledge evolves over time, maybe even within one exam. How do we hit the “moving target” of their current knowledge state?
• Baseline probabilities of knowing a rule: use a random sample of all students? They will differ greatly based on the population sample chosen
SMILES Extensions
[Diagram: atom-mapped reaction scheme, R1-C(=O)-OH + H-NH-R2 → R1-C(=O)-NH-R2 + H2O, with numbered atoms]
• Atom Mapping
  • Necessary to map reactant atoms to product atoms
  • Proper transform requires balanced stoichiometry
  • Hydrogens generally must be explicitly specified
Carboxylic acid: [O:1]=[C:2]([*:9])[O:3][H:7]
Primary amine: [H:8][N:4]([*:10])[H:5]
Amide: [O:1]=[C:2]([*:9])[N:4]([*:10])[H:5]
Water: [H:7][O:3][H:8]
Reaction: [O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]
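The balanced-stoichiometry requirement can be checked mechanically: every mapped atom on the reactant side should reappear on the product side with the same element and map number. The sketch below uses a simplified regular expression that only understands bracket atoms of the form `[symbol:N]`. It is not a full SMILES parser, and the function names are assumptions.

```python
import re

# Reaction SMILES from the slide: acid + amine >> amide + water.
REACTION = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]"
    ">>"
    "[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)

# Simplified pattern: element symbol (or * wildcard) followed by :map-number.
MAPPED_ATOM = re.compile(r"\[([A-Za-z*]+):(\d+)\]")


def mapped_atoms(side):
    """Return {map_number: symbol} for one side of a reaction."""
    atoms = {}
    for symbol, number in MAPPED_ATOM.findall(side):
        idx = int(number)
        if idx in atoms:
            raise ValueError(f"map number {idx} used twice")
        atoms[idx] = symbol
    return atoms


def is_balanced(reaction):
    """True if both sides carry the same mapped atoms."""
    reactants, products = reaction.split(">>")
    return mapped_atoms(reactants) == mapped_atoms(products)


balanced = is_balanced(REACTION)
```

Dropping the explicit hydrogens from the water product would make the check fail, which is why hydrogens generally have to be written out.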
Transformation Rules
• Chemical state machine modeling at a mechanistic level of detail
  • State information: molecular structure
  • State transition: transformation rules
[Diagram labels: carbocation halide addition; π-bond protic acid addition]