Analysis of Air Pollution (PM10) and Respiratory Morbidity Rate using K-Maximum Sub-array (2-D) Algorithm

Kyoko Fukuda
Environmental Science Programme, University of Canterbury, Christchurch, NZ
K.Fukuda@math.canterbury.ac.nz

Tadao Takaoka
Computer Science and Software Engineering, University of Canterbury, Christchurch, NZ
T.Takaoka@cosc.canterbury.ac.nz
Introduction
• Background: air pollution and health.
• Introducing the K-Maximum Sub-array (2-D) algorithm in environment and health science.
• Comparisons to statistical analysis.
• Data sets.
• Results and discussion.
• Conclusions.
Beautiful views of Christchurch and New Zealand
Photos: www.christchurch.org.nz, www.canada.com, http://z.about.com, www.niwa.cri.nz, http://www.4x4adventures.co.nz
Air pollution problem in Christchurch
[Photo: smog at low height, 20-40 m; www.ecan.govt.nz]
Air pollution problem in Christchurch
Photo: www.starproductions.co.nz
• Acceptable annual air quality guideline for [PM10]: 20 and 50 µg m⁻³.
• Mean annual: 21.2 ± 21.5.
• Mean winter: 36.1 ± 31.0.
• Mean summer: 14.2 ± 5.4.
• Levels below the guideline may still affect health (Scott and Gunatilake, 2002).
Air pollution (PM10) and Health
Sources: http://www.ecan.govt.nz, http://www.epa.gov
• Increases and aggravates asthma, and increases hospital admissions for lung and heart disease.
• Creates disease in the airways of children and increases respiratory illness in children.
• Damages the lungs and causes permanent changes in lung structure.
• Increases deaths from respiratory and cardiovascular disease.
• Causes chest pain and nausea.
• Causes shortness of breath and (faster) laboured breathing.
Statistics in Environment and Health Sciences
• Regression models can take multiple confounding factors into account.
• Detects relative risk factors, e.g. for nighttime chest symptoms in people over 55 years old (Harre et al., 1997), the relative risk for a 100 µg m⁻³ increase in PM10 is 1.38 (95% CI 1.07-1.78).
• Requires data pre-processing: imputation and noise reduction (time series analysis).
• Assumptions?

K-MSA in Environment and Health Sciences
• Introduces an uncommon field: computer algorithms.
• Two-dimensional observations can be extended to multiple dimensions.
• Flexible with respect to the nature of the data.
• Detects the maximum events of interest.
• No specific assumptions.
• Similar to clustering, but …
Maximum Subarray
• To find a consecutive portion (of any size) having the largest sum in an array.
[Figure: example array with highlighted sub-arrays, S=193 and S=21]
K-Maximum Subarrays
• What about the 2nd, 3rd, …, Kth maximum sum?
• Two cases: disjoint and overlapping.
[Figure: disjoint and overlapping examples; labelled sums 13, 19, 21, 21]
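To make the overlapping case concrete, here is a minimal brute-force sketch in Python. It is not the method used in the talk: the function name `k_max_subarrays_overlapping` and the example array are assumptions chosen for illustration. It enumerates every contiguous segment and keeps the K largest sums, so segments are allowed to share elements.

```python
import heapq
from itertools import accumulate


def k_max_subarrays_overlapping(a, k):
    """Return the k largest (sum, start, end) triples over all contiguous
    segments a[start..end] (0-based, inclusive). Segments may overlap.

    Brute force, O(n^2) segments; illustrative only.
    """
    # prefix[i] holds the sum of a[0..i-1], so a segment sum is a difference.
    prefix = [0] + list(accumulate(a))
    candidates = (
        (prefix[end + 1] - prefix[start], start, end)
        for start in range(len(a))
        for end in range(start, len(a))
    )
    return heapq.nlargest(k, candidates)


top3 = k_max_subarrays_overlapping([3, -5, 4, 2, -1], 3)
```

For the example array, the two largest segments are a[2..3] (sum 6) and a[2..4] (sum 5), which overlap. In the disjoint case the second segment would have to avoid a[2..3] entirely.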
Kadane’s Algorithm – one dimension
• For maximum sub-array a[k..l] of a[1..n]:

    (k, l) := (0, 0); s := -∞; t := 0; j := 1;
    for i := 1 to n do begin
      t := t + a[i];
      if t > s then begin (k, l) := (j, i); s := t end;
      if t < 0 then begin t := 0; j := i + 1 end
    end

• This algorithm takes O(n) time, since it makes a single pass over the array.
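The pseudocode above can be sketched in Python as follows. This is a direct translation with 0-based indices instead of the slide's 1-based ones. The function name `max_subarray` and the variable names are assumptions for illustration.

```python
def max_subarray(a):
    """Kadane's algorithm. Returns (best_sum, k, l) so that a[k..l]
    (0-based, inclusive) is a maximum-sum contiguous segment of a.
    For an empty list it returns (-inf, -1, -1).
    """
    best_sum = float("-inf")  # s in the pseudocode
    k = l = -1                # bounds of the best segment found so far
    running = 0               # t: sum of the current candidate segment
    start = 0                 # j: where the current candidate begins
    for i, value in enumerate(a):
        running += value
        if running > best_sum:
            best_sum, k, l = running, start, i
        if running < 0:
            # A negative running sum can never help a later segment,
            # so restart the candidate at the next position.
            running = 0
            start = i + 1
    return best_sum, k, l


result = max_subarray([3, -5, 4, 2, -1])  # (6, 2, 3): the segment [4, 2]
```

Because `best_sum` starts at minus infinity and is updated before the reset, an all-negative array returns its largest single element rather than an empty segment.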
Two-dimensional problem
Kadane’s algorithm is extended to two dimensions in the following way, for an array a with m rows.
1. For each row k of array a (k ≥ 1)
2. For each row i ≥ k of array a
3. Solve the one-dimensional maximum sub-array problem for the strip from row k to row i, using the column sums of the strip as the one-dimensional input.
4. Let the solution be a[k .. i, l .. j]
5. Take the maximum of the m(m+1)/2 solutions, one for each pair (k, i) with i ≥ k.
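The steps above can be sketched in Python as below. This is a minimal sketch: it assumes a rectangular list of lists as input, and the names `kadane` and `max_subarray_2d` are chosen here for illustration. Column sums for each strip are built incrementally as the bottom row moves down, so each strip costs one O(n) pass.

```python
def kadane(a):
    """One-dimensional maximum sub-array: returns (sum, left, right)."""
    best, left, right = float("-inf"), -1, -1
    running, start = 0, 0
    for i, value in enumerate(a):
        running += value
        if running > best:
            best, left, right = running, start, i
        if running < 0:
            running, start = 0, i + 1
    return best, left, right


def max_subarray_2d(a):
    """Return (sum, top, bottom, left, right) of a maximum-sum rectangle
    a[top..bottom][left..right] (0-based, inclusive). O(m^2 * n) time.
    """
    m, n = len(a), len(a[0])
    best = (float("-inf"), -1, -1, -1, -1)
    for top in range(m):                 # step 1: first row of the strip
        col_sums = [0] * n
        for bottom in range(top, m):     # step 2: last row of the strip
            for c in range(n):
                col_sums[c] += a[bottom][c]
            # Step 3: the strip collapses to a 1-D problem on column sums.
            s, left, right = kadane(col_sums)
            if s > best[0]:              # step 5: keep the overall maximum
                best = (s, top, bottom, left, right)
    return best


grid = [[1, -2, -1],
        [-3, 4, 5],
        [-1, 2, -6]]
result = max_subarray_2d(grid)  # (9, 1, 1, 1, 2): row 1, columns 1..2
```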
k-disjoint maximum subarray problem
[Figure: night view of North America (www.nasa.gov), with detected regions 1. New York, 2. Ohio, 3. Chicago, 4. Los Angeles]
K-Maximum Sub-array (K-MSA) (Bae and Takaoka, 2005)
• The average of all the values in the data matrix P is calculated.
• A new matrix, Q, of the same dimensions as P, is then created.
• Finally, the 2D maximum sub-array of Q is derived, with Q as the input data array a.
• Detects K maximum subarrays.
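The pipeline above can be illustrated with a short Python sketch. Two assumptions are made here and are not confirmed by the slide text: first, that Q is formed by subtracting the average of P from every entry (q[i][j] = p[i][j] - mean(P)), which gives the array the negative entries that make the maximum sub-array problem non-trivial; second, that the K disjoint sub-arrays are found greedily, by masking each detected region with minus infinity and searching again. This is a simple stand-in, not necessarily the algorithm of Bae and Takaoka. All function names are hypothetical.

```python
NEG_INF = float("-inf")


def kadane(a):
    """One-dimensional maximum sub-array: returns (sum, left, right)."""
    best, left, right = NEG_INF, -1, -1
    running, start = 0, 0
    for i, value in enumerate(a):
        running += value
        if running > best:
            best, left, right = running, start, i
        if running < 0:
            running, start = 0, i + 1
    return best, left, right


def max_subarray_2d(q):
    """Maximum-sum rectangle of q as (sum, top, bottom, left, right)."""
    m, n = len(q), len(q[0])
    best = (NEG_INF, -1, -1, -1, -1)
    for top in range(m):
        col_sums = [0] * n
        for bottom in range(top, m):
            for c in range(n):
                col_sums[c] += q[bottom][c]
            s, left, right = kadane(col_sums)
            if s > best[0]:
                best = (s, top, bottom, left, right)
    return best


def mean_centre(p):
    """Assumed construction of Q: subtract the overall mean of P."""
    cells = [v for row in p for v in row]
    mean = sum(cells) / len(cells)
    return [[v - mean for v in row] for row in p]


def k_disjoint_max_subarrays(p, k):
    """Greedy sketch: up to k disjoint rectangles of Q, best first.
    Sums are reported in mean-centred units, not in the units of P.
    """
    q = mean_centre(p)
    found = []
    for _ in range(k):
        s, top, bottom, left, right = max_subarray_2d(q)
        if s == NEG_INF:      # every cell is already masked
            break
        found.append((s, top, bottom, left, right))
        for r in range(top, bottom + 1):      # mask the detected region
            for c in range(left, right + 1):
                q[r][c] = NEG_INF
    return found


p = [[9, 1, 1, 1],
     [9, 1, 1, 1],
     [1, 1, 1, 7],
     [1, 1, 1, 1]]
regions = k_disjoint_max_subarrays(p, 2)
```

For this hypothetical matrix the mean is 2.375, so the first region is the column of two 9s (centred sum 13.25) and the second is the single cell holding 7 (centred sum 4.625).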
Respiratory morbidity rate for female (annual)
[Figure: age against categorised air pollution levels, with detected sub-arrays of sum 164, 127 and 5]
• Detects the direct relationship between air pollution (AP) and morbidity rate by K-MSA.
• K=1: sum=164 for 0-5 years, from Low to Ex. High.
• K=2: sum=127 for >50 years, from Low to V. High.
• K=3: sum=5 for 76-85 years, at Ex. High.
• Incorporates two sets of one-dimensional information.
Studied data set
• Christchurch (residential area) in NZ: 1998-2002.
• Six different PM10 levels: V. Low (LQ/2), Low (LQ), Med, High (UQ), V. High (upper fence), and Ex. High (>UF).
• Acute respiratory morbidity rate (ICD-9: 460-519)*: 0 to 98 years old; female, male and both; four seasons.
*CDHB permission (URB/06/05/031)
Results
• All seasons: 0-5 years; female >50 years; male >70 years; Low to Ex. High.
• Winter: wider impact.
  • Female: all ages at high [PM10].
  • Male: 0-5 years at a wide range of [PM10].
• Summer: specific impact.
  • All ages at low to high [PM10].
  • Older ages at very high [PM10].
Conclusions on findings
• K-MSA (2D) extracts information about the relationship between [PM10] and acute respiratory morbidity rate:
  • The range of threshold criteria for the maximum associations of different age groups and admission counts with different [PM10].
• Very young and old age groups are susceptible to changes in the environment, e.g., [PM10].
• Winter: a wider range of [PM10] affects specific age groups.
  • Female: generally all age groups, or very young or old age groups.
  • Male: very young, >65 years, or young age groups (11-15 years).
• Summer: a specific range of [PM10] is associated with all age groups.
Future challenges:
• Higher dimensions: climate and confounding factors (e.g., health conditions).
• Longer-term effects.
• Time lag between air pollution levels and the short-term effects.
• Comparisons to different algorithms, e.g., heterogeneous clustering algorithms.
• Questions about computing time.
• Overlapping maximum subarrays.
Acknowledgements
Data was provided by Environment Canterbury and the Canterbury District Health Board for research at the University of Canterbury. The K-MSA technique was developed by S. E. Bae as part of his PhD thesis. Thanks to P. Pearson for processing the data set. Thanks to T. Aberkane (ECan) for the air pollution data and I. Hudson and C. Williamson for the health data.*
* The health data was originally obtained in 2003 with grant support from I. Hudson and C. Williamson, and permission for its use for research at the University of Canterbury was obtained from the CDHB in 2006.