Machine Learning Applications in Biological Classification of River Water Quality


  1. Machine Learning Applications in Biological Classification of River Water Quality • Saso Dzeroski, Jasna Grbović and William J. Walley • 98419-548 조 동 연

  2. Contents • Introduction • Learning Rules for Biological Classification of British Rivers • The Data • The Experiment • Analysis of Data about Slovenian Rivers • The Influence of Physical and Chemical Parameters on Selected Organisms • Biological Classification • Discussion

  3. Introduction • Indicator Organisms (Bioindicators) • Given a biological sample, information on the presence and density of all indicator organisms in the sample is usually combined to derive a biological index that reflects the quality of the water at the site where the sample was taken. • Saprobic Index • The main problem: subjectivity • The subjectivity introduced at intermediate levels can and should be minimized.
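One common way to combine presence and density into a saprobic index is a weighted average of each indicator taxon's saprobic value, weighted by its abundance and, in the Zelinka-Marvan variant, by an indicator weight. The slides do not say which variant is used, so the sketch below simply assumes this form; the `(saprobic_value, abundance, indicator_weight)` tuple layout and the function name are hypothetical.

```python
def saprobic_index(taxa):
    """Weighted-average saprobic index of one sample (assumed formula).

    taxa: iterable of (saprobic_value, abundance, indicator_weight) tuples,
    one per indicator taxon found in the sample (hypothetical layout).
    With every indicator_weight equal to 1 this reduces to the plain
    abundance-weighted (Pantle-Buck style) average.
    """
    numerator = 0.0
    denominator = 0.0
    for s, h, g in taxa:
        numerator += s * h * g   # saprobic value weighted by abundance and indicator weight
        denominator += h * g
    if denominator == 0:
        raise ValueError("sample contains no indicator taxa")
    return numerator / denominator
```

The expert's subjectivity enters through the saprobic values and weights assigned to each taxon, which is the kind of intermediate-level judgment the slide refers to.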

  4. Learning Rules for Biological Classification of British Rivers • Data • 292 samples, abundances of 80 families of benthic macroinvertebrates • Abundance of animals • 0: no members of the particular family • 1: 1-2 • 2: 3-9 • 3: 10-49 • 4: 50-99 • 5: 100-999 • 6: 1000 or more • Sparse matrix (most families are absent from most samples) • Five classes
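The abundance scale above turns a raw count of individuals into an ordinal level from 0 to 6. A minimal sketch of that encoding, assuming level 6 covers counts of 1000 or more; the function names are hypothetical:

```python
import bisect

# Lower bounds of levels 1..6: counts 1-2, 3-9, 10-49, 50-99, 100-999, >= 1000
# (assumption: level 6 starts at exactly 1000).
_LEVEL_LOWER_BOUNDS = [1, 3, 10, 50, 100, 1000]


def abundance_level(count: int) -> int:
    """Return the 0-6 abundance level for a non-negative count of individuals."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # bisect_right counts how many lower bounds are <= count, which is the level.
    return bisect.bisect_right(_LEVEL_LOWER_BOUNDS, count)


def encode_sample(counts: dict[str, int], families: list[str]) -> list[int]:
    """Encode one sample as a fixed-length vector of levels over all families.

    Families missing from `counts` get level 0; since most families are
    absent from any given sample, the resulting matrix is sparse.
    """
    return [abundance_level(counts.get(f, 0)) for f in families]
```

For example, a hypothetical sample `{"Baetidae": 12, "Gammaridae": 1}` encoded over `["Baetidae", "Gammaridae", "Asellidae"]` gives `[3, 1, 0]`.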

  5. Experiment 1 • Modified CN2 algorithm • Measures the relative information score • Uses the m-estimate instead of the Laplace estimate • Rules were required to be highly significant (99% level). • 15 different values of m were tried (0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024). • Criteria for choosing among the resulting rule sets • Information score • Accuracy • Smaller value of the parameter m
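The two probability estimates and the significance test mentioned above can be sketched as follows. The formulas are the standard ones (Laplace: (n_c + 1)/(n + k); m-estimate: (n_c + m·p_a)/(n + m); CN2's likelihood ratio statistic compared with a chi-square quantile); the function names are hypothetical, and the 99% threshold is hard-coded for five classes (4 degrees of freedom) rather than taken from the authors' implementation.

```python
import math

# The 15 values of m tried in the experiment.
M_VALUES = [0, 0.01, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# 99% quantile of the chi-square distribution with 4 degrees of freedom
# (five classes -> 5 - 1 = 4); assumption for this five-class problem.
CHI2_99_DF4 = 13.277


def laplace_estimate(n_class: int, n_covered: int, n_classes: int) -> float:
    """Laplace estimate of P(class | rule): (n_c + 1) / (n + k)."""
    return (n_class + 1) / (n_covered + n_classes)


def m_estimate(n_class: int, n_covered: int, prior: float, m: float) -> float:
    """m-estimate of P(class | rule): (n_c + m * p_a) / (n + m).

    m = 0 gives the relative frequency; larger m pulls the estimate
    towards the prior class probability p_a.
    """
    return (n_class + m * prior) / (n_covered + m)


def likelihood_ratio(class_counts: list[int], priors: list[float]) -> float:
    """CN2-style significance statistic 2 * sum_i f_i * ln(f_i / e_i).

    f_i: class counts among the examples a rule covers;
    e_i: counts expected if those examples followed the prior distribution.
    """
    n = sum(class_counts)
    stat = 0.0
    for f, p in zip(class_counts, priors):
        if f > 0:  # terms with f_i = 0 contribute nothing
            stat += f * math.log(f / (n * p))
    return 2.0 * stat


def is_significant(class_counts, priors, threshold=CHI2_99_DF4) -> bool:
    """True if the rule's class distribution differs significantly from the prior."""
    return likelihood_ratio(class_counts, priors) > threshold
```

With 8 of 10 covered examples in a class whose prior is 0.2, the m-estimate moves from 0.8 at m = 0 to 0.5 at m = 10; larger m values therefore favor rules that cover more examples.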

  6. Result 1 • 12 rules, m = 32 • 83% accuracy on the training set, 75% information content • On average, each rule covered 25 examples and contained 5 conditions. • The rules were confirmed by the expert.
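The "information content" figure is presumably the relative information score named in Experiment 1. A sketch of the Kononenko-Bratko score as we understand it: each example earns positive information when the classifier raises the probability of its true class above the prior and negative information when it lowers it, and the average is divided by the entropy of the prior class distribution. Function names and the dict-based inputs are hypothetical.

```python
import math


def information_score(prior: float, posterior: float) -> float:
    """Information score in bits for one example.

    prior: P(C), the prior probability of the example's true class C.
    posterior: P'(C), the probability the classifier assigns to C.
    """
    if posterior >= prior:
        # Useful information: the classifier moved towards the true class.
        return -math.log2(prior) + math.log2(posterior)
    # Misleading information (negative score).
    return -(-math.log2(1 - prior) + math.log2(1 - posterior))


def relative_information_score(true_labels, predicted_probs, priors) -> float:
    """Average information score divided by the prior class entropy.

    true_labels: list of class names.
    predicted_probs: list of dicts mapping class name -> predicted probability.
    priors: dict mapping class name -> prior probability.
    """
    scores = [information_score(priors[c], probs[c])
              for c, probs in zip(true_labels, predicted_probs)]
    average = sum(scores) / len(scores)
    entropy = -sum(p * math.log2(p) for p in priors.values() if p > 0)
    return average / entropy
```

Unlike accuracy, this measure gives no credit to a classifier that merely reproduces the prior class distribution, which is why it is reported alongside accuracy.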

  7. Experiment 2 • The main criticism was that the rules use only a small number of taxa, whereas the expert takes the whole community into account. • Six additional attributes • MoreThan0, MoreThan1, …, MoreThan5 • count the families whose abundance level exceeds 0, 1, …, 5, respectively • Result 2 • 13 rules, m = 64 • accuracy 84%, information content 80%
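A sketch of the six community-level attributes, assuming MoreThanK is the number of families in a sample whose abundance level (0-6) is greater than K. The function names are hypothetical.

```python
def more_than_attributes(levels: list[int]) -> dict[str, int]:
    """Compute MoreThan0 .. MoreThan5 from one sample's abundance levels.

    Assumed definition: MoreThanK = number of families with level > K.
    """
    return {f"MoreThan{k}": sum(1 for lv in levels if lv > k) for k in range(6)}


def extend_example(levels: list[int]) -> list[int]:
    """Append the six MoreThanK counts to the original abundance attributes."""
    extra = more_than_attributes(levels)
    return list(levels) + [extra[f"MoreThan{k}"] for k in range(6)]
```

Appended to the 80 family attributes, this gives 86 attributes per example, so a single rule condition such as a threshold on MoreThan2 summarizes the whole community rather than one taxon.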

  8. Experiment 3 • 195 training examples, 97 test examples • Performance clearly improved from the original to the extended problem (with the six additional attributes).

  9. Analysis of Data about Slovenian Rivers • Data • 4 years (1990-1993) • Biological samples are taken twice a year (summer and winter). • Physical and chemical analyses are performed several times a year at each sampling site. • 698 examples • Training set: 70% (489 cases); test set: 30% (209 cases)
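The slides give only the sizes of the training and test sets, not how examples were assigned. Assuming a simple random hold-out split, the following sketch reproduces both the 489/209 split of the 698 Slovenian examples and the 195/97 split of the 292 British samples; the function name and seed are hypothetical.

```python
import random


def holdout_split(examples, train_fraction: float, seed: int = 0):
    """Randomly split examples into a training set and a test set.

    The training set gets round(train_fraction * n) examples and the
    test set gets the rest; a fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

`holdout_split(range(698), 0.7)` yields sets of 489 and 209 examples, and `holdout_split(range(292), 2 / 3)` yields 195 and 97.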

  10. The Influence of Physical and Chemical Parameters on Selected Organisms • From an ecological and water quality point of view, this is an important research topic. • Binary classification: present / absent • Attributes • Plants: Hardness, NO2, NO3, NH4, PO4, SiO2, Fe, Detergents, COD, BOD • Animals: Temperature, pH, O2, Saturation, COD, BOD
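Each selected organism becomes its own binary learning problem: the physical and chemical measurements are the attributes, and the class says whether the organism was found in the sample. A sketch under a hypothetical record layout (each sample a dict with a `"chem"` dict of measurements and a `"taxa"` dict of abundances); the attribute lists are copied from the slide.

```python
# Attribute subsets from the slide (plants vs. animals).
PLANT_ATTRIBUTES = ["Hardness", "NO2", "NO3", "NH4", "PO4", "SiO2",
                    "Fe", "Detergents", "COD", "BOD"]
ANIMAL_ATTRIBUTES = ["Temperature", "pH", "O2", "Saturation", "COD", "BOD"]


def presence_dataset(samples, taxon: str, attributes: list[str]):
    """Build (attribute vector, label) pairs for one taxon.

    samples: list of dicts with a "chem" dict of measurements and a
    "taxa" dict mapping taxon name -> abundance (hypothetical layout).
    The label is "present" if the taxon's abundance is above zero.
    """
    data = []
    for s in samples:
        x = [s["chem"][a] for a in attributes]
        label = "present" if s["taxa"].get(taxon, 0) > 0 else "absent"
        data.append((x, label))
    return data
```

Plants and animals get different attribute sets, so each taxon's rules can only mention the parameters judged relevant to that group.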

  11. Results • Accuracy: 66%-85% • Information score: 23%-50% • 10-20 rules for each taxon • The average rule length was less than 5 conditions. • Average rule coverage was 15 to 45 examples.
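Rule length and coverage, the two statistics reported above, can be computed directly from an IF-THEN rule set. A minimal sketch, assuming each rule is a conjunction of threshold conditions on the measured parameters; the `Rule` class and operator table are hypothetical, not the CN2 implementation's data structures.

```python
import operator
from dataclasses import dataclass

# Comparison operators allowed in rule conditions (assumption).
_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}


@dataclass
class Rule:
    conditions: list[tuple[str, str, float]]  # (attribute, operator, threshold)
    prediction: str                           # e.g. "present" or "absent"

    def covers(self, example: dict[str, float]) -> bool:
        """A rule covers an example if all of its conditions hold."""
        return all(_OPS[op](example[attr], value)
                   for attr, op, value in self.conditions)


def rule_set_summary(rules: list[Rule], examples: list[dict[str, float]]):
    """Return (average rule length, average rule coverage)."""
    lengths = [len(r.conditions) for r in rules]
    coverages = [sum(r.covers(e) for e in examples) for r in rules]
    return sum(lengths) / len(rules), sum(coverages) / len(rules)
```

Short rules with high coverage are what make the induced rule sets easy for a domain expert to inspect.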

  12. Nitzschia palea • Elmis sp.

  13. Biological Classification • 13 physical and chemical parameters • 27 bioindicators • 7 classes • The majority class comprises 339 of the 698 examples, so the default accuracy is 48.6%.
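The default accuracy is the accuracy of a trivial classifier that always predicts the majority class, and it is the baseline any induced rule set has to beat:

```latex
\text{default accuracy} = \frac{n_{\text{majority}}}{n} = \frac{339}{698} \approx 0.486 = 48.6\%
```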

  14. Discussion • We have described several applications of rule induction in the domain of biological water quality classification. • The produced rules are transparent and can be easily understood by experts. • The induced rules contained valuable knowledge about the domain studied. • Machine learning techniques can be useful tools for classification and data analysis in the domain of river water quality and other ecological domains.
