Elsevier

Ecological Modelling

Volume 221, Issue 2, 24 January 2010, Pages 152-160

Clustering species using a model of population dynamics and aggregation theory

https://doi.org/10.1016/j.ecolmodel.2009.10.013

Abstract

The high species diversity of some ecosystems, such as tropical rainforests, goes hand in hand with a scarcity of data for most species. This hinders the development of models that require enough data for fitting. The solution commonly adopted by modellers is to group species to form larger data sets. Classical methods for grouping species, such as hierarchical cluster analysis, do not take into account the variability of the species characteristics used for clustering. In this study, a clustering method based on aggregation theory is presented. It takes this variability into account by searching for the grouping that minimizes the quadratic error (squared bias plus variance) of a model’s predictions. The method makes it possible to check whether the gain in variance brought by data pooling compensates for the bias that it introduces. It was applied to a data set on 94 tree species in a tropical rainforest in French Guiana, using a Usher matrix model to predict species dynamics. An optimal trade-off between bias and variance was found when grouping species. Grouping species decreased the quadratic error, except when the number of groups was very small. This clustering method yielded species groups similar to those of hierarchical cluster analysis using Ward’s method when variance was small, that is, when the number of groups was small.

Introduction

The high species diversity of some terrestrial or marine ecosystems, such as tropical rain forests or coral reefs, has raised many questions about their functioning (Hubbell and Foster, 1986, Hubbell, 1997, Whitmore, 1998). Ecologists have tried to simplify this diversity by assigning species to functional groups, i.e. groups of species that have the same functions in the ecosystem (Díaz and Cabido, 1997, Köhler et al., 2000, Fonseca and Ganade, 2001, Baker et al., 2003, McGill et al., 2006). Even though marked patterns have been identified, such as the dichotomy between pioneer and climax species in tropical rain forests (Swaine and Whitmore, 1988, Baker et al., 2003), the definition of functional groups has remained an inaccessible Holy Grail, because the distribution of species along functional gradients is always continuous rather than discrete. To build functional groups, ecologists have typically grouped species on the basis of their similarity with respect to ecological characteristics or functional traits (Gourlet-Fleury et al., 2005). The methods used to group species were mainly cluster analyses, when they were not simply an educated guess.

Modellers of the dynamics of species-rich ecosystems have also paid attention to the grouping of species. Their motivation was not primarily to find functional groups, but rather to compensate for the scarcity of data for the less abundant species, which are also the most numerous. The scarcity of data for these rare species prevented the parameters of the models of population dynamics from being estimated with enough precision. By pooling species, larger data sets could be formed and reliable parameter estimates could be obtained. Despite this motivation, modellers have mainly stuck to the paradigm of functional groups, i.e. species were grouped on the basis of the similarity of their characteristics (Köhler and Huth, 1998, Köhler et al., 2000). Often the groups of species were built independently of the model of population dynamics (e.g. Favrichon, 1998). Sometimes the building of the groups was linked to the model of population dynamics, the grouping being based on the residuals of the model (Vanclay, 1991a, Vanclay, 1992, Gourlet-Fleury and Houllier, 2000).

When species are pooled into a group, the number of available observations increases and thus the variance of the estimators of model parameters decreases. At the same time, an estimation bias is introduced, since the parameter values of a given species are confounded with those of the group. The larger the group, the larger the bias and the smaller the variance. The bias vanishes when each group is a singleton restricted to a single species, but the variance is then at its maximum. To assess the value of a species grouping from the modeller’s point of view, it is thus necessary to compute the quadratic error that results from the groups, where the quadratic error is the squared bias plus the variance.
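The trade-off can be illustrated with a toy calculation. The sketch below is not the Usher model used in this study: it assumes that each species parameter is a simple mean, estimated from observations that share a common, known variance `sigma2`, and that the true species means `theta` are known. The function names and the numerical values are hypothetical.

```python
def error_separate(n_k, sigma2):
    """Quadratic error when species k is fitted on its own data only.

    The sample mean is unbiased, so the error reduces to its variance.
    """
    return sigma2 / n_k


def error_pooled(theta, n, sigma2, k):
    """Quadratic error for species k when all species of a group are pooled.

    theta: true means of the species in the group (known only in this toy setting)
    n: number of observations per species
    """
    total = sum(n)
    # Expectation of the pooled sample mean: the sample-size-weighted
    # mean of the true species means.
    pooled_mean = sum(n_i * t_i for n_i, t_i in zip(n, theta)) / total
    bias = pooled_mean - theta[k]
    variance = sigma2 / total
    return bias ** 2 + variance


# Two similar species: pooling lowers the error (about 0.11 versus 0.2).
print(error_separate(5, 1.0), error_pooled([1.0, 1.2], [5, 5], 1.0, 0))
# Two dissimilar species: the bias dominates (about 1.1 versus 0.2).
print(error_separate(5, 1.0), error_pooled([1.0, 3.0], [5, 5], 1.0, 0))
```

In this simplified setting, pooling pays off only when the squared difference between species is small compared with the variance saved by the larger sample.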

This study aims to assess the value of species groups from the modeller’s point of view, i.e. on the basis of the quadratic error that a grouping induces in the model’s predictions. The null grouping is the one with as many groups as species, each group consisting of a single species (in other words, no effective grouping is made). A grouping of species will be considered justified if it yields a lower quadratic error than the null grouping. The quadratic error will be interpreted as a disaggregation error in the context of aggregation theory. Aggregation theory deals with the error incurred when shifting the description of a system from a detailed level to an aggregated, less detailed level (Iwasa et al., 1987, Iwasa et al., 1989, Ritchie and Hann, 1997). In the present case, the aggregation consists in replacing s species with g groups of species. Once the disaggregation error is defined, a method for defining groups of species follows: for a given number g of groups, search for the grouping that minimizes the disaggregation error.
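The search for the grouping that minimizes the disaggregation error can be sketched on a toy problem. The code below makes strong assumptions that are not taken from the study: each species parameter is a simple mean with known true value `theta[k]`, all observations share a known variance `sigma2`, and the search is an exhaustive enumeration, which is feasible only for a handful of species. All names and values are hypothetical, and this is not the search procedure used for the 94 species.

```python
from itertools import product


def groupings(s, g):
    """Yield every partition of s species into exactly g non-empty groups.

    Each partition is a tuple of group labels in canonical form (labels
    appear in increasing order of first use), so no partition is repeated.
    """
    for labels in product(range(g), repeat=s):
        highest = -1
        canonical = True
        for label in labels:
            if label > highest + 1:
                canonical = False
                break
            highest = max(highest, label)
        if canonical and highest == g - 1:
            yield labels


def disaggregation_error(labels, theta, n, sigma2):
    """Sum over species of squared bias plus variance under pooled estimation."""
    error = 0.0
    for group in set(labels):
        members = [k for k, label in enumerate(labels) if label == group]
        size = sum(n[k] for k in members)
        pooled_mean = sum(n[k] * theta[k] for k in members) / size
        for k in members:
            error += (pooled_mean - theta[k]) ** 2 + sigma2 / size
    return error


def best_grouping(theta, n, sigma2, g):
    """Exhaustive search for the grouping with the lowest error."""
    return min(
        groupings(len(theta), g),
        key=lambda labels: disaggregation_error(labels, theta, n, sigma2),
    )


theta = [1.0, 1.1, 3.0, 3.1]      # made-up species parameters
n = [5, 5, 5, 5]                  # made-up sample sizes
best = best_grouping(theta, n, 1.0, g=2)
null = tuple(range(len(theta)))   # null grouping: one group per species
print(best, disaggregation_error(best, theta, n, 1.0))
print(null, disaggregation_error(null, theta, n, 1.0))
```

With these made-up values, the search pairs the two low-valued species and the two high-valued species, and the resulting error is lower than that of the null grouping. In practice the true parameters are unknown, so the error itself has to be estimated from the data.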

In this study, we present a general framework usable with any model of population dynamics. We then apply the grouping strategy to 94 well-represented tree species of a tropical rain forest in French Guiana. We chose a matrix model for size-structured populations to model population dynamics, and we address three questions: (i) How should the disaggregation error be built? (ii) Is there a statistical benefit in building groups, compared to the null grouping? (iii) What happens if the groups are built according to a different strategy, either using the same model of population dynamics (groups of Favrichon, 1994, resulting from a cluster analysis), or using a different model (groups of Gourlet-Fleury and Houllier, 2000)?
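For readers unfamiliar with matrix models for size-structured populations, the following is a minimal sketch of a Usher-type projection. In a Usher model, an individual can stay in its size class, move up exactly one class, or die during a time step. The sketch assumes one constant recruitment rate `f`, mortality rate `m` and upgrowth transition rate `p` per species, and assumes that every individual contributes recruits to the first class at rate `f`; the exact parametrization used in the study may differ. Function names and values are hypothetical.

```python
def usher_matrix(f, m, p, n_classes):
    """Build a Usher transition matrix with constant vital rates.

    f: recruitment rate, m: mortality rate, p: upgrowth transition rate.
    """
    U = [[0.0] * n_classes for _ in range(n_classes)]
    for j in range(n_classes):
        last = j == n_classes - 1
        # Probability of staying in class j; the last class has no upgrowth.
        U[j][j] = 1.0 - m if last else 1.0 - m - p
        if not last:
            U[j + 1][j] = p  # move up exactly one class
        U[0][j] += f  # recruits enter the first class (assumption)
    return U


def project(U, N):
    """One time step of the model: N(t+1) = U N(t)."""
    return [sum(u * x for u, x in zip(row, N)) for row in U]


U = usher_matrix(f=0.1, m=0.1, p=0.2, n_classes=3)
N0 = [10.0, 0.0, 0.0]   # made-up initial numbers per size class
N1 = project(U, N0)
print(N1)
```

Starting from ten individuals in the first class, one step leaves about eight in the first class (seven survivors that did not grow plus one recruit) and about two in the second class.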

Section snippets

Aggregation diagram

Let $s$ be the number of species. For each species $k \in \{1, \ldots, s\}$, $n_k$ observations $X_{1k}, \ldots, X_{n_k k}$ are available. Each observation is considered as a random variable drawn from a distribution $F_k(\theta_k)$ that depends on unknown parameters $\theta_k$. These parameters are those of the model of population dynamics. Expectations and variances will refer to the distributions $F_k$. Parameters $\theta_k$ are estimated from observations using an estimator $\hat{\theta}_k$. The model of population dynamics is here considered as a mapping $\xi$

Species characteristics

Fig. 3 shows the correlation circle of the PCA of the table giving the vital rates $(\hat{f}, \hat{m}, \hat{p})$ for each species. The recruitment rate is positively correlated with the mortality rate, and together these two rates define the turnover rate. The turnover rate explains the first axis of the PCA. The upgrowth transition rate $\hat{p}$ is almost independent of the turnover rate and explains the second axis of the PCA. The mortality rate is actually close to the recruitment rate for all species, so that

Clustering method

On the basis of the Usher matrix models and for the 94 species studied at Paracou, the choice of modellers to build groups of species is justified: for reasonably well-chosen groupings, the gain in variance that results from data pooling more than compensates, in terms of quadratic error, for the bias that the groups introduce. This positive balance is obtained for a wide range of numbers of groups ($3 \leq g < s$ in the present case) and for different grouping methods. Only when

References (39)

  • H. Caswell. Matrix Population Models: Construction, Analysis and Interpretation (2001)
  • M. Delcamp et al. Can functional classification of tropical trees predict population dynamics after disturbance? J. Veg. Sci. (2008)
  • S. Díaz et al. Plant functional types and ecosystem function in relation to global change. J. Veg. Sci. (1997)
  • B. Efron, R.J. Tibshirani. An Introduction to the Bootstrap. No. 57 in Monographs on Statistics and Applied... (1993)
  • V. Favrichon. Classification des espèces arborées en groupes fonctionnels en vue de la réalisation d’un modèle de dynamique de peuplement en forêt guyanaise. Rev. Écol. (Terre et Vie) (1994)
  • V. Favrichon. Modeling the dynamics and species composition of tropical mixed-species uneven-aged natural forest: effects of alternative cutting regimes. For. Sci. (1998)
  • C.R. Fonseca et al. Species functional redundancy, random extinctions and the stability of ecosystems. J. Ecol. (2001)
  • S. Gourlet-Fleury et al. Grouping species for predicting mixed tropical forest dynamics: looking for a strategy. Ann. For. Sci. (2005)
Cited by (20)

    • Comparing strategies for representing individual-tree secondary growth in mixed-species stands in the Acadian Forest region

      2020, Forest Ecology and Management
      Citation Excerpt :

      Species groupings according to taxonomy may seem fitting in a biological sense, but may be inappropriate in terms of quantifying growth (Zhao et al., 2006). Lastly, while the method of grouping may successfully categorize the most common species into unique groups, determining how rare and/or infrequent species are accounted for remains an important and unanswered question (Picard et al., 2010). These infrequent species may be placed into groups subjectively (Phillips et al., 2002), while some argue that the characteristics of these rare species should not go into defining groupings (Picard et al., 2010).

    • A neutral vs. non-neutral parametrizations of a physiological forest gap model

      2014, Ecological Modelling
      Citation Excerpt :

      A cluster analysis was able to allocate 64 species to a specific group, a discriminant analysis assigned further 72 species and the “remaining” 439 species were assigned manually to groups based on maximum stem-diameter at breast height (DBH) or “phylogenetic information”. Picard et al. (2010) used a clustering method based on an aggregation concept for a dataset of 94 tree species from French Guiana and investigated the optimal number of groups. They found that the minimum number of groups for optimizing the aggregation is 26 which is still a high number for a relatively low number of species.

    • A comparison of five classifications of species into functional groups in tropical forests of French Guiana

      2012, Ecological Complexity
      Citation Excerpt :

      While Favrichon (1994) used multivariate analysis and k-means clustering of species attributes, Köhler (2000) used a priori classification based on ecological traits collected in the literature or inferred from field data. The three other classifications used model-based clustering: Gourlet-Fleury and Houllier (2000) used an individual-based growth model, Picard et al. (2010) used a matrix projection model with aggregation theory, and Mortier et al. used a finite mixture of matrix models in a Bayesian context. These different classification techniques reflect the diversity of cluster analyses used in ecology (Gourlet-Fleury et al., 2005).
