Who uses functional data analysis, and for what?

Who uses functional data analysis, and for what? Marian Scott, NERC workshop, University of Glasgow, March 2014. A few examples: clustering, principal components analysis, regression, geostatistics, other.


Presentation Transcript


  1. Who uses functional data analysis, and for what? Marian Scott, NERC workshop, University of Glasgow, March 2014

  2. A few examples • Clustering • Principal components analysis • Regression • Geostatistics • Other • We have identified a few published pieces of work where functional data analysis is presented.

  3. Example 1: Functional Data Analysis in Ecosystem Research: The Decline of Oweekeno Lake Sockeye Salmon and Wannock River Flow. “Functional regression is a natural tool for exploring the potential impact of the physical environment (continuously monitored) on biological processes (often only assessed annually). This paper explores the potential use of functional regression analysis and the closely related functional principal component analysis for studying the relationship between river flow (continuously monitored) and salmon abundance (measured annually).”

  4. Example 1: In FDA, the resolution of the raw data is an important consideration. In order to capture the peaks and valleys in functional data, the data need to be collected on a sufficiently dense time scale. Daily river flow measurements are felt to have sufficiently high resolution to capture important fluctuations which might relate to marine survival of salmon. The functional principal components of the daily river flow highlight variability in different time frames: Component 1 (Figure 2a), during January and February with a prolonged tail extending over March and April; Component 2 (Figure 2b), the contrast between December and February–March flows; and Component 3 (Figure 2c), March and early April flows.

  5. Example 1: The first three functional principal components account for approximately 36%, 24%, and 18% of the variability in the residual river flow curves, respectively. The first and third FPCA components are correlated with marine survival (r = −0.38 and r = −0.45), while the second is not (r = 0.08). Standard multiple linear regression indicates that the first and third functional principal component scores are significant predictors of marine survival. Both functional regression analysis and regression using functional principal component scores indicate a strong negative association between marine survival and river flow in March and early April.
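
The two steps on this slide (functional principal components of the flow curves, then ordinary regression of survival on the component scores) can be sketched as follows. This is only an illustration under stated assumptions: the curves are treated as vectors on a common daily grid, FPCA is approximated by an SVD of the centered data matrix, and the `flow` and `survival` arrays are synthetic stand-ins, not the Wannock River or Oweekeno Lake data.

```python
import numpy as np

def fpca(curves, n_components=3):
    """Discretized FPCA: each row of `curves` is one curve on a common grid."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # SVD of the centered matrix; rows of vt are the principal component curves.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # share of variance per component
    scores = u[:, :n_components] * s[:n_components]
    return mean_curve, vt[:n_components], scores, explained[:n_components]

# Synthetic stand-in data (assumption): 30 years of 120 daily flow values.
rng = np.random.default_rng(0)
n_years, n_days = 30, 120
t = np.linspace(0.0, 1.0, n_days)
a = rng.normal(size=n_years)
b = rng.normal(size=n_years)
flow = (a[:, None] * np.sin(np.pi * t)
        + 0.5 * b[:, None] * np.cos(2 * np.pi * t)
        + 0.05 * rng.normal(size=(n_years, n_days)))
# Hypothetical annual response that depends negatively on the first pattern.
survival = -0.8 * a + 0.1 * rng.normal(size=n_years)

mean_curve, components, scores, explained = fpca(flow, n_components=3)

# Multiple linear regression of the annual response on the FPC scores.
X = np.column_stack([np.ones(n_years), scores])
beta, *_ = np.linalg.lstsq(X, survival, rcond=None)
fitted = X @ beta
r2 = 1 - np.sum((survival - fitted) ** 2) / np.sum((survival - survival.mean()) ** 2)

print("variance explained:", np.round(explained, 3))
print("R^2 of score regression:", round(float(r2), 3))
```

In practice the curves would first be smoothed onto a basis, and a dedicated FDA package would handle quadrature weights; the SVD shortcut is reasonable only on an equally spaced grid.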

  6. FPCA of flow

  7. Example 2: Analysis of air quality monitoring networks by functional clustering. Air quality monitoring networks are important tools in management and evaluation of air quality. Classifying monitoring stations via homogeneous clusters allows identification of similarities in pollution, of representative sites, and of spatial patterns. We classify using functional cluster analysis, applied to the air quality monitoring network in Piemonte (Northern Italy).

  8. FDA approach: Time series gathered as discrete observations are converted into functional data. This conversion reduces thousands of observations to a few coefficients while preserving information about the functional structure, that is, the temporal pattern, of the time series. It is carried out by fitting a non-parametric regression model to each series via B-splines with a fixed number of knots.
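
As a rough illustration of this conversion, the sketch below fits a least-squares cubic B-spline with a fixed knot vector to one noisy series and keeps only the coefficients. It assumes SciPy's `make_lsq_spline`; the series, the number of knots, and the helper name `bspline_coefficients` are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def bspline_coefficients(t, y, interior_knots, degree=3):
    """Least-squares B-spline fit; returns the coefficients and the spline."""
    # Full knot vector: each boundary knot is repeated degree + 1 times.
    knots = np.concatenate([
        np.repeat(t[0], degree + 1),
        interior_knots,
        np.repeat(t[-1], degree + 1),
    ])
    spline = make_lsq_spline(t, y, knots, k=degree)
    return spline.c, spline

# Synthetic stand-in series (assumption): 2000 noisy readings over one year.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 365.0, 2000)
signal = 40 + 15 * np.cos(2 * np.pi * t / 365)
y = signal + rng.normal(scale=3.0, size=t.size)

# Ten equally spaced interior knots, shared by every series in the network.
interior_knots = np.linspace(0.0, 365.0, 12)[1:-1]
coefs, spline = bspline_coefficients(t, y, interior_knots)

print(len(y), "observations reduced to", len(coefs), "coefficients")
```

Because the degree and knot vector are held fixed, the coefficient vectors of different stations live in the same space and can be compared directly.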

  9. Clustering: Since we fix the same degree and the same knot vector when using B-splines, the model coefficients have the same meaning for all time series. Hence, after this first stage, we classify the sites of the air quality monitoring network using the non-hierarchical Partitioning Around Medoids (PAM) algorithm.
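
A minimal sketch of the second stage is given below: a small k-medoids routine in the spirit of PAM (greedy build, then swaps that lower the total distance), applied to Euclidean distances between coefficient vectors. It is a simplified first-improvement variant written for illustration, not the reference PAM implementation, and the coefficient vectors are synthetic.

```python
import numpy as np

def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return dist[:, medoids].min(axis=1).sum()

def pam(dist, k, max_iter=100):
    """Simplified PAM-style k-medoids on a precomputed distance matrix."""
    n = dist.shape[0]
    # BUILD: start from the most central point, then add greedily.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [total_cost(dist, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    # SWAP: exchange a medoid with a non-medoid whenever the cost drops.
    best = total_cost(dist, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    labels = np.argmin(dist[:, medoids], axis=1)
    return np.array(medoids), labels

# Synthetic stand-in (assumption): 12 stations, 8 spline coefficients each,
# drawn from two well-separated groups.
rng = np.random.default_rng(2)
coefs = np.vstack([
    rng.normal(0.0, 0.3, size=(6, 8)),
    rng.normal(3.0, 0.3, size=(6, 8)),
])
dist = np.linalg.norm(coefs[:, None, :] - coefs[None, :, :], axis=2)

medoids, labels = pam(dist, k=2)
print("medoid stations:", medoids)
print("cluster labels:", labels)
```

Medoids are actual stations, which fits the stated aim of identifying representative sites.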

  10. Air monitoring network

  11. Examples of curves

  12. Sets of curves

  13. The identified clusters

  14. Example 3 • Looking for similar patterns among monitoring stations. Venice Lagoon application. The main purpose of this paper is to provide a classification, in terms of different trophic variables, of the sites of a water quality monitoring network located in Venice Lagoon. We apply a classification method based on functional data analysis (FDA), which allows us to take into account sample information about the temporal dynamics of the variables of interest.

  15. Venice lagoon

  16. Sets of curves for sites

  17. Sets of smooth curves

  18. Cluster results

  19. Example 4 Investigating fine-scale spatio-temporal predator–prey patterns in dynamic marine ecosystems: a functional data analysis approach. • Studies of fine-scale changes in oceanography, prey and predator behaviour with tidal currents require repeated surveys of the same location over brief time-scales. Such data are highly temporally and spatially autocorrelated and require appropriate analytical tools.

  20. Example 4 We used functional principal component analysis (FPCA) to analyse repeated, fine-scale survey data collected in the North Sea. FPCA was used to explore the relationship between the behaviour of an important North Sea prey species (sandeel Ammodytes spp.) and a vulnerable surface-foraging predator (black-legged kittiwake Rissa tridactyla) with fine-scale tidally driven changes in bio-physical characteristics (temperature stratification and maximum subsurface chlorophyll concentration).

  21. Fig. 2. MS1: (a) number of sandeel schools by 5-min position for each repeat; (b) MS1 circuit showing the position of each 5-min bin; (c) corresponding functional principal component analysis showing the first two PCs (solid line = PC1, dotted line = PC2); (d) corresponding PC scores plot (1–12 = repeat number); (e) topography over a single circuit.

  22. Fig. 4. Functional principal component analysis of number of sandeel schools for the three surveys: (d, e, f) first two PCs (solid line = PC1, dotted line = PC2), and (a, b, c) PC scores plots (1–12 = repeat number) for (a, d) MS1; (b, e) MS2; and (c, f) MS3. Corresponding fluorescence profiles for (g) MS1 repeat 12; (h) MS2 repeat 8; and (i) MS3 repeat 7.

  23. Example 4 In our study, we used FDA to demonstrate the importance of tidal currents in the physical–biological coupling of trophic transfer to top predators. Physical–biological coupling is an important aspect of the marine ecosystem influencing everything from primary production (e.g. Weston et al. 2005) to trophic transfer to fish and top predators (Hunt et al. 1998; Genin 2004; Bertrand et al. 2008).

  24. Example 5 Exploring between site differences in water quality trends: a functional data analysis approach Functional data analysis represents a philosophy for analysing data that are curves. Functional equivalents exist for many of the well-known statistical methods. In this article we consider principal components analysis, linear modelling and hierarchical cluster analysis, and illustrate how they might be used to better understand and contrast water quality trends. The focus is on a single water quality variate at multiple sites and the site-to-site differences in trend. The questions of interest surrounding these trends include: Which monitoring sites are similar? Are there notable differences between sites? What effect does depth have?

  25. Example 5 • Monthly nutrient and sediment data are available from mid-1997 to mid-2001 in Wivenhoe, Somerset and North Pine dams from 13 sites, each sampled at the surface and the bottom. The smooth time trend for each water quality variate at each site and depth combination is obtained from a smoothing spline with 4 degrees of freedom, after allowing for seasonal effects represented using a B-spline basis expansion with 44 terms to ensure that they are faithfully reproduced in the functional data analysis that follows.

  26. Example 5 Smooth trends for nitrate at each dam, surface

  27. Example 5 • Functional principal components analysis was carried out to identify the primary sources of variation in the smooth trends after adjusting for the average trend. Figure 2 gives the average trend for total nitrogen and shows a strong peak in 1999. The first three components are also presented in the same figure and explain 97 per cent of the variation in the curves. Principal component 1 identifies extra peakedness in mid-1999, while principal components 2 and 3 largely identify elevated total nitrogen towards the end and at the beginning of the monitoring period, respectively. • The scores from these three principal components are plotted for the 26 smooth trends.

  28. Functional PCA

  29. Example 5 • Functional linear modelling allows us to ask more direct questions of these functional objects. It extends linear modelling and enables us to include functional objects as the response or explanatory variables. • The case we are most interested in here is the functional response and establishing whether we can describe variation in the curves through site-level covariates.
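
One simple way to picture a functional response model is to fit an ordinary least-squares regression at every time point, so that each coefficient becomes a curve. The sketch below does this for a single site-level covariate, depth (surface versus bottom). It is an illustration under assumptions: the curves are synthetic, the grid is common to all sites, and no smoothing or inference on the coefficient curves is attempted.

```python
import numpy as np

# Synthetic stand-in (assumption): 13 sites, each with a surface and a bottom
# trend curve evaluated at 48 monthly time points.
rng = np.random.default_rng(3)
n_sites, n_times = 13, 48
t = np.linspace(0.0, 4.0, n_times)
depth = np.repeat([0.0, 1.0], n_sites)           # 0 = surface, 1 = bottom

base = 0.5 + 0.2 * np.exp(-((t - 2.0) ** 2))     # shared trend with one peak
true_effect = 0.05 * np.ones(n_times)            # constant depth offset
curves = (base
          + depth[:, None] * true_effect
          + 0.01 * rng.normal(size=(2 * n_sites, n_times)))

# Design matrix with an intercept and the depth indicator.
X = np.column_stack([np.ones(depth.size), depth])

# One least-squares solve handles all time points: beta has shape (2, n_times),
# so each row is a coefficient function of time.
beta, *_ = np.linalg.lstsq(X, curves, rcond=None)
intercept_curve, depth_effect_curve = beta

print("mean estimated depth effect:", round(float(depth_effect_curve.mean()), 3))
```

Plotting `depth_effect_curve` against `t` shows when, over the monitoring period, the bottom and surface curves differ.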

  30. For Wivenhoe the shape of the two curves is very similar, though total nitrogen appears to be consistently around 0.05 units higher at the bottom throughout the monitoring period. The bottom may also peak marginally later. In the Somerset dam, the curves are similar in both shape and magnitude, indicating that the depth effect is small. Total nitrogen is, however, greater at the bottom than that at the surface until mid-1999, when it switches to being slightly over. At North Pine, total nitrogen is higher at the dam bottom, with the difference being at its maximum during 1998 before converging in early 2000.

  31. Example 5 Hierarchical cluster analysis applied to the functional distance matrix for the 26 total nitrogen site trends yields the dendrogram in Figure 7. An average linkage is used. Its portrayal of site similarity is consistent with that of Figures 3 and 4. Sites within each dam tend to group together. There is some overlap between Wivenhoe and Somerset. In particular, note that Wivenhoe sites 16 and 17 at the bottom are closer to the Somerset sites than to the Wivenhoe sites.
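
The clustering step can be sketched as below: an L2 distance between every pair of curves is computed by trapezoidal integration, and average-linkage hierarchical clustering is run on the resulting matrix with SciPy. The curves are synthetic stand-ins for three groups of site trends, and the L2 metric is an assumed choice of functional distance; the article's exact distance is not given on the slide.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def l2_distance_matrix(curves, t):
    """Pairwise L2 distances between curves sampled on the grid t."""
    sq = (curves[:, None, :] - curves[None, :, :]) ** 2
    # Trapezoidal rule for the integral of the squared difference.
    integral = np.sum(0.5 * (sq[..., 1:] + sq[..., :-1]) * np.diff(t), axis=-1)
    return np.sqrt(integral)

# Synthetic stand-in (assumption): three groups of four trend curves each.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 4.0, 50)
shapes = [np.sin(t), np.cos(t), np.full_like(t, 2.0)]
curves = np.vstack([s + 0.02 * rng.normal(size=(4, t.size)) for s in shapes])

dist = l2_distance_matrix(curves, t)

# linkage expects the condensed (upper-triangle) form of the distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")

print("cluster labels:", labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the tree that a figure like the one described here displays.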

  32. References • Ainsworth L M, Routledge R, Cao J (2011). Functional Data Analysis in Ecosystem Research: The Decline of Oweekeno Lake Sockeye Salmon and Wannock River Flow. JABES, 16(2). • Henderson B (2006). Exploring between site differences in water quality trends: a functional data analysis approach. Environmetrics, 17. • Ignaccolo R, Ghigo S, Giovenali E (2008). Analysis of air quality monitoring networks by functional clustering. Environmetrics, 19. • Pastres R, Pastore A, Tonellato S F (2010). Looking for similar patterns among monitoring stations. Venice Lagoon application. Environmetrics. • Embling et al. (2012). Investigating fine-scale spatio-temporal predator–prey patterns in dynamic marine ecosystems: a functional data analysis approach. Journal of Applied Ecology, 49.
