Application of the AR2NL system for reporting association rules in Finnish 21.10.2004 Emilia Ylirinne emilia@iki.fi Tampere University of Technology
Introduction • Based on the LISp-Miner system • Uses data from the STULONG medical project • AR2NL translates association rules into Czech and English • Translating into Finnish
Topics • Reporting data mining results in natural language • The AR2NL system • Translating into Finnish • Concluding remarks (based on Dr. Petr Strossa’s articles)
Reporting Data Mining results in Natural Language • Association rule φ ≈ ψ • Founded implication φ ⇒_{p,n} ψ • Four-fold contingency table
Example of association rule ED(univ) ∧ RS(mng) ⇒_{0.95,76} AJ(sits) • Four-fold table
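A minimal sketch of how a founded implication could be checked against a four-fold table. It assumes the common reading that the rule holds when a / (a + b) ≥ p and a ≥ n, where a counts the patients satisfying both sides of the rule and b those satisfying only the antecedent. Only a = 76 and the 95 % figure come from the slides; the other cell values and the names `FourFoldTable` and `founded_implication` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourFoldTable:
    a: int  # antecedent true, succedent true
    b: int  # antecedent true, succedent false
    c: int  # antecedent false, succedent true
    d: int  # antecedent false, succedent false

    def confidence(self) -> float:
        """Share of antecedent-satisfying patients that also satisfy the succedent."""
        return self.a / (self.a + self.b)


def founded_implication(table: FourFoldTable, p: float, n: int) -> bool:
    """Assumed definition: at least n supporting patients and confidence at least p."""
    return table.a >= n and table.confidence() >= p


# Hypothetical table: b, c and d are invented so that 76 / (76 + 4) = 0.95.
table = FourFoldTable(a=76, b=4, c=120, d=200)
holds = founded_implication(table, p=0.95, n=76)
print(holds)
```

With these assumed counts the rule ED(univ) ∧ RS(mng) ⇒_{0.95,76} AJ(sits) is accepted; raising either threshold rejects it.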
Natural Language (NL) Formulations • several ways to formulate 1. 76 (i.e. 95 %) of the observed patients confirm this dependence: if the patient has university education and responsibility of a manager, then he mainly sits in his job. 1. 76 (eli 95 %) havainnoiduista potilaista toteuttaa seuraavan riippuvuuden: jos potilaalla on korkeakoulutus ja työ johtotehtävissä, hän istuu enimmäkseen työssään.
Natural Language (NL) Formulations 2. 95 % of the observed patients that have reached university education and work in a managerial position also mainly sit in their job. 2. 95 % havainnoiduista potilaista, jotka ovat saaneet korkeakoulutuksen ja työskentelevät johtotehtävissä, myös enimmäkseen istuvat työssään.
Natural Language (NL) Formulations 3. It is characteristic for the patients that have reached university education and work in a managerial position that they also have a sedentary job. This fact is confirmed by 76 (i.e. 95 %) observed patients. 3. Potilaille, jotka ovat saaneet korkeakoulutuksen ja työskentelevät johtotehtävissä, on ominaista, että heillä on myös istumatyö. Tämän toteuttaa 76 (eli 95 %) havainnoitua potilasta.
Natural Language (NL) Formulations • X ∧ Y ⇒_{0.95,76} Z 1. a (i.e. 100p %) of the observed patients confirm this dependence: if the patient has NLF(X) and NLF(Y), then he NLF(Z). 2. 100p % of the observed patients that NLF(X) and NLF(Y) also NLF(Z). 3. It is characteristic for the patients that NLF(X) and NLF(Y) that they also have NLF(Z). This fact is confirmed by a (i.e. 100p %) observed patients.
Noun phrase, NP: university education -> korkeakoulutus; a managerial position -> työ johtotehtävissä; a sedentary job -> istumatyö • Verb phrase, VP: works in a managerial position -> työskentelee johtotehtävissä; has reached university education -> on saavuttanut korkeakoulutuksen • Adjectival phrase, AP: university-educated -> korkeakoulutettu • Participial phrase: working as a manager -> johtotehtävissä työskentelevä; mainly sitting in his job -> työssään istuva
Natural Language (NL) Formulations 1. a (i.e. 100p %) of the observed patients confirm this dependence: if the patient has NP(X) and NP(Y), then he VP(Z). 1. a (eli 100p %) havainnoiduista potilaista toteuttaa seuraavan riippuvuuden: jos potilaalla on NP(X) ja NP(Y), hän VP(Z).
Natural Language (NL) Formulations 2. 100p % of the observed patients that VP(X) and VP(Y) also VP(Z). 2. 100p % havainnoiduista potilaista, jotka VP(X) ja VP(Y), myös VP(Z).
Natural Language (NL) Formulations 3. It is characteristic for the patients that VP(X) and VP(Y) that they also have NP(Z). This fact is confirmed by a (i.e. 100p %) observed patients. 3. Potilaille, jotka VP(X) ja VP(Y), on ominaista, että heillä on myös NP(Z). Tämän toteuttaa a (eli 100p %) havainnoitua potilasta.
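The slot-filling idea behind these patterns can be sketched as follows. This is only an illustration of formulation 2: the dictionary layout, the `{pct}` field, and the names `ELEMENTARY`, `PATTERNS` and `formulate` are assumptions made for the example and do not reflect the actual AR2NL file formats. The phrases themselves are taken from the example sentences in the slides.

```python
import re

# Hypothetical stand-in for the phrase data: literal -> language -> phrase type.
ELEMENTARY = {
    "ED(univ)": {
        "en": {"NP": "university education", "VP": "have reached university education"},
        "fi": {"NP": "korkeakoulutus", "VP": "ovat saaneet korkeakoulutuksen"},
    },
    "RS(mng)": {
        "en": {"NP": "a managerial position", "VP": "work in a managerial position"},
        "fi": {"NP": "työ johtotehtävissä", "VP": "työskentelevät johtotehtävissä"},
    },
    "AJ(sits)": {
        "en": {"NP": "a sedentary job", "VP": "mainly sit in their job"},
        "fi": {"NP": "istumatyö", "VP": "enimmäkseen istuvat työssään"},
    },
}

# Formulation pattern 2 in both languages; {pct} stands for "100p".
PATTERNS = {
    "en": "{pct} % of the observed patients that VP(X) and VP(Y) also VP(Z).",
    "fi": "{pct} % havainnoiduista potilaista, jotka VP(X) ja VP(Y), myös VP(Z).",
}

SLOT = re.compile(r"\b(NP|VP)\(([XYZ])\)")


def formulate(lang: str, binding: dict, p: float) -> str:
    """Replace each NP(.)/VP(.) slot with the phrase of the bound literal."""
    def fill(match: re.Match) -> str:
        phrase_type, variable = match.group(1), match.group(2)
        return ELEMENTARY[binding[variable]][lang][phrase_type]

    text = SLOT.sub(fill, PATTERNS[lang])
    return text.format(pct=round(100 * p))


binding = {"X": "ED(univ)", "Y": "RS(mng)", "Z": "AJ(sits)"}
english = formulate("en", binding, 0.95)
finnish = formulate("fi", binding, 0.95)
print(english)
print(finnish)
```

Keeping the sentence pattern separate from the per-literal phrases means a new language only needs its own patterns and phrase entries, which mirrors the split between pattern files and phrase data described in the slides.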
Finnish language • Belongs to the Uralic family of languages • More than a dozen cases (http://www.cs.tut.fi/~jkorpela/finnish-cases.html) • Synthetic language: uses suffixes to express grammatical relations and also to derive new words, e.g. in my house, too -> talossanikin; after you had written -> kirjoitettuasi • ”Free” word order
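The suffix stacking in talossanikin can be made explicit with a small sketch. The segmentation talo + -ssa + -ni + -kin (house + "in" + "my" + "too") is standard; the function name `agglutinate` is hypothetical, and the code deliberately ignores vowel harmony and consonant gradation, which a real morphology component would have to handle.

```python
def agglutinate(morphemes):
    """Join (form, gloss) pairs into a word and a hyphenated gloss line."""
    word = "".join(form for form, _ in morphemes)
    gloss = "-".join(g for _, g in morphemes)
    return word, gloss


# Stem followed by case ending, possessive suffix and clitic.
morphemes = [
    ("talo", "house"),
    ("ssa", "in"),    # inessive case
    ("ni", "my"),     # 1st person singular possessive suffix
    ("kin", "too"),   # clitic particle
]
word, gloss = agglutinate(morphemes)
print(word, "=", gloss)
```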
Pete loves Anna - Anna loves Pete Pete rakastaa Annaa. This is the normal word order, the same as in English. Annaa Pete rakastaa. This emphasizes the word Annaa: the object of Pete's love is Anna, not someone else. Rakastaa Pete Annaa. This emphasizes the word rakastaa, and such a sentence might be used as a response to some doubt about Pete's love; so one might say it corresponds to Pete does love Anna. Pete Annaa rakastaa. This word order might be used, in conjunction with special stress on Pete in pronunciation, to emphasize that it is Pete and not someone else who loves Anna. Annaa rakastaa Pete. This might be used in a context where we mention some people and tell about each of them who loves them. So this roughly corresponds to the English sentence Anna is loved by Pete. Rakastaa Annaa Pete. This does not sound like a normal sentence, but it is quite understandable. source: http://www.cs.tut.fi/~jkorpela/finnish-intro.html
Finnish language • no definite or indefinite article • no grammatical gender • negation, corresponding to English not, behaves as a verb • ownership or possession (have and be in English) I have a dog -> Minulla on koira ("at me (there) is (a) dog")
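The possession construction can be sketched as a tiny generator: the owner goes into the adessive case ("at the owner") and is followed by on ("is") and the thing owned. The lookup table and the function name `possession` are hypothetical; the adessive forms are listed by hand rather than derived, so this is an illustration and not a morphological rule.

```python
# Hand-listed adessive forms ("at X"); a real system would derive them.
ADESSIVE = {
    "minä": "minulla",      # I
    "sinä": "sinulla",      # you (singular)
    "hän": "hänellä",       # he / she
    "he": "heillä",         # they
    "potilas": "potilaalla",  # the patient
}


def possession(owner: str, thing: str) -> str:
    """Build 'OWNER-adessive on THING', the Finnish counterpart of 'OWNER has THING'."""
    return f"{ADESSIVE[owner].capitalize()} on {thing}."


sentence = possession("minä", "koira")
print(sentence)
```

The same shape appears in the formulation patterns, where "the patient has NP(X)" becomes "potilaalla on NP(X)".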
System AR2NL • Main features • Written in the XML standard • Files which contain the data needed in translations • Translates association rules with founded implication
FP-file • Formulation Patterns • Base of (NL) sentences • File which contains data needed in translations • Translates association rules with founded implication
FPA-file • Formulation Patterns - Auxiliary • Substitutions for higher-order non-terminal symbols • Variability of sentences
Entitynames-file • Entities • e.g. ”Patient”, ”which”
MN-file • Morphology - Nouns • Language dependent • Singular and plural case endings • 7 cases in Czech, 14 cases in Finnish
MV-file • Morphology - Verbs • Singular and plural case endings • Participial form and case
Elementary-file • Important part • Contains data of the literals • Noun phrase, Adjectival phrase, Verb phrase
Conversion process • example of the process
Problems • word order in participial form: drink beer - drinking beer; juoda olutta - olutta juova • cases in participial form: many cases • ja (and) in logic and in Finnish: Patients drinking beer and smoking mainly sit in their job. Olutta juovat ja tupakoivat potilaat istuvat enimmäkseen työssään. • ownership
Concluding Remark • The AR2NL system can translate association rules into Finnish, too