Elsevier

Agricultural and Forest Meteorology

Volume 216, 15 January 2016, Pages 68-81

Extending and improving regionalized winter wheat and silage maize yield regression models for Germany: Enhancing the predictive skill by panel definition through cluster analysis

https://doi.org/10.1016/j.agrformet.2015.10.003

Highlights

  • We introduce parameter-based clustering of separate time series models.

  • The predictive power increases depending on the cluster analysis method.

  • Spatial correspondences between different clusterings could be observed.

Abstract

Regional agricultural yield assessments allowing for weather effect quantifications are a valuable basis for deriving scenarios of climate change effects and developing adaptation strategies. Assessing weather effects by statistical methods is a classical approach, but obtaining robust results requires attention to many details and individual decisions, as is demonstrated in this paper. We evaluated regression models for annual yield changes of winter wheat and silage maize in more than 300 German counties and revised them to increase their predictive power. A major effort of this study was, however, aggregating separately estimated time series models (STSM) into panel data models (PDM) based on cluster analyses. The cluster analyses were based on the per-county estimates of STSM parameters. The original STSM formulations (adopted from a parallel study) also contained the non-meteorological input variables acreage and fertilizer price. The models were revised to use only weather variables as the estimation basis. These consisted of time aggregates of radiation, precipitation, temperature, and potential evapotranspiration. Altering the input variables generally increased the predictive power of the models, as did their clustering into PDM. For each crop, five alternative clusterings were produced by three different methods, and similarities between their spatial structures seem to confirm the existence of objective clusters around common model parameters. Observed smooth transitions of STSM parameter values in space suggest, however, spatial autocorrelation effects that could also be modeled explicitly. Both clustering and autocorrelation approaches can effectively reduce the noise in parameter estimation through targeted aggregation of input data.

Introduction

Winter wheat (Triticum aestivum L.) and silage maize (Zea mays subsp. mays L.) are major crops in Germany, currently grown on 3.1 and 2.0 million hectares, respectively (Statistisches Bundesamt, 2014). Large parts of their relative inter-annual yield changes can be explained by a combination of time-aggregated weather variables. Profound knowledge of the weather effects and their robust modeling are required for assessing climate change effects, which in turn are necessary for decision making on climate change adaptation strategies. Owing to its simplicity, effectiveness and objectivity, the assessment of weather effects by statistical methods has a long tradition, but there is no standardized approach for performing such an assessment optimally. Developing a robust model with high predictive skill requires extensive formulation and testing of alternatives for gradual enhancements and decisions on development routes. This paper reports such a kaizen process for the German winter wheat and silage maize yields.

For these two example crops, Gornott and Wechsung (2015a, 2015b) present spatially distributed yield modeling approaches for Germany for the years 1991–2010. They fit separate time series models for 289 counties (Landkreise) and utilize panel data models for federal states (Länder) and river basins. A third approach bridges the spatial levels through random coefficient modeling, a hybrid method requiring restricted maximum likelihood estimation. All three methods are, however, driven by the same county-level data. Coefficients of determination are generally high and exceed 0.9 in the northeastern parts of the country, where crop growth strongly depends on precipitation. However, yield estimations for single years whose input data were excluded from calibration (leave-one-out validation) regularly show reduced correlations with the observed yields.
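The leave-one-out validation mentioned above can be illustrated with a minimal sketch. It assumes a regression with a single predictor and uses invented numbers; the function names (`fit_line`, `loo_predictions`, `r_squared`) and the data are hypothetical and do not reproduce the models of the cited studies, which use several weather variables.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def loo_predictions(x, y):
    """Predict each year from a model calibrated on all other years."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

# Hypothetical example: one weather aggregate (x) and yield changes (y)
# for eight years of a single county.
x = [1.2, -0.4, 0.3, 2.1, -1.5, 0.8, -0.9, 1.6]
y = [0.9, -0.1, 0.5, 1.4, -1.2, 0.2, -0.3, 1.5]

a, b = fit_line(x, y)
r2_calibration = r_squared(y, [a + b * xi for xi in x])
r2_validation = r_squared(y, loo_predictions(x, y))
```

Comparing `r2_calibration` with `r2_validation` shows how much of the apparent fit survives when each year is predicted without having contributed to the parameter estimates.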

The predictive skill of these models may be further challenged in climate change impact studies lacking the additional variables of the calibration data, namely acreage and fertilizer price. It is argued that the effect of the weather can only be estimated when the other relevant factors are simultaneously considered in the calibration process (Gornott and Wechsung, 2015a, 2015b; Kaufmann and Snell, 1997). But does this really apply when noise and further, unknown factors are involved?

We start the investigations presented in this article with the separate time series models (STSM) of Gornott and Wechsung (2015a, 2015b), which are based on earlier work by Wechsung et al. (2008), because their aggregated results show the best performance compared to the other approaches of that study. Our main idea to improve the predictive power of the regression models is to stabilize the parameter estimations by optimized aggregation of the STSM to panel data models (PDM). The arbitrary definition of panel groups by political boundaries or river basins did not lead to better predictions (Gornott and Wechsung, 2015a, 2015b), because the model parameters depend on climatic and landscape units which do not match the prescribed regions. For Germany, relatively homogeneous soil-climate zones have already been delineated (Roßberg et al., 2007) which could also be considered for paneling (cf. Mirschel et al., 2014); Tao et al. (2014) and You et al. (2009) present corresponding examples for wheat cultivation zones in China.

Comparative clustering trials utilizing such additional spatial information might be the subject of future assessments. Here we follow a more general approach: We assume that (1) the individually estimated parameter vectors of the STSM are scattered about one vector for the given crop, climate, and soil combination, (2) there are groups of counties with comparable soil and climate and thus sharing the same parameter vectors, and (3) the parameter variation of the STSM does not exceed the variation owing to soil and climate. A cluster analysis of the estimated parameter vectors will unveil the groups with common parameter values. The latter can then be estimated with much higher confidence through respective PDM, and the predictive power of these should exceed that of the STSM.
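The idea of grouping counties by their estimated parameter vectors can be sketched as follows. This is only an illustration under stated assumptions: plain k-means (Lloyd's algorithm) stands in for the clustering methods, which are not specified in this excerpt, and the county identifiers, the two-dimensional parameter vectors and the starting centroids are invented.

```python
from statistics import mean

def squared_distance(p, q):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's algorithm; `centers` are the starting centroids."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign every parameter vector to its nearest centroid.
        labels = [
            min(range(len(centers)),
                key=lambda k: squared_distance(pt, centers[k]))
            for pt in points
        ]
        # Move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(mean(dim) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centers:
            break
        centers = updated
    return labels, centers

def panels_from_labels(county_ids, labels):
    """Group county identifiers by cluster label to define the panels."""
    panels = {}
    for county, lab in zip(county_ids, labels):
        panels.setdefault(lab, []).append(county)
    return panels

# Hypothetical per-county STSM estimates (two regression coefficients each).
county_ids = ["A", "B", "C", "D", "E", "F"]
stsm_params = [(0.9, -0.2), (1.1, -0.1), (1.0, -0.3),
               (-0.5, 0.8), (-0.4, 1.0), (-0.6, 0.9)]

labels, centers = kmeans(stsm_params, centers=[stsm_params[0], stsm_params[3]])
panels = panels_from_labels(county_ids, labels)
```

Each entry of `panels` then lists the counties whose data would be pooled into one panel data model. In practice the parameters would probably need scaling before clustering, since coefficients of different weather variables have different units.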

Cluster analyses have already been applied to yield observations or their trends (Lee et al., 1993; Trethowan et al., 2003). They are partly used in combination with topographical or soil characteristics to define management zones (Roel and Plant, 2004; Yang et al., 2006). Another application of cluster analyses is the comparison of the performance of different crop cultivars (Mądry et al., 2011). The idea of grouping multiple yield regression models by clustering results (of topographical features) was realized by McKinion et al. (2010), who report corresponding improvements in the R² values of their yield simulations. To our knowledge, we present the first application of a cluster analysis of STSM parameters in crop yield modeling.

The general principle that separately estimated models tend to produce higher errors than their aggregations can be explained by noise reduction in the combined models through a larger input data basis (cf. Woodard and Garcia, 2008). For crop yield regression modeling, this has already been demonstrated by spatial averaging of STSM parameters or input variables according to the spatial autocorrelations of yield-governing factors like climate or soil properties. Examples include Lee et al. (2013), who use county-wise spatial lag modeling for wheat yields and wheat quality in Oklahoma, and Cai et al. (2014), who developed geographically weighted panel regression for corn yields in the United States. Bornn and Zidek (2012) coupled the parameter estimation for STSM by Bayesian methods for enhancing wheat yield predictions in Canada, accounting both for spatial correlations between agricultural regions (model units) and management differences between provinces. The main difference between these approaches oriented towards spatial correlations and a cluster-based PDM definition is that the former presume spatial agglomerations and transitions of parameter values, while the latter calls for a limited number of valid parameter combinations but without the need for spatial relationships between the model sites or areas.
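The noise-reduction argument can be made concrete with a small simulation. The sketch below assumes a single predictor, a common true slope for all counties and county-wise centering of the data; all names and numbers are hypothetical. Under these assumptions the pooled slope is a weighted average of the separate slopes (weights proportional to each county's predictor variance), so it can never be further from the true value than the worst separate estimate.

```python
import random
from statistics import mean

def slope_parts(x, y):
    """Centered cross product and sum of squares for one county."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy, sxx

def separate_slopes(panel):
    """One least-squares slope per county (separate time series models)."""
    slopes = []
    for x, y in panel:
        sxy, sxx = slope_parts(x, y)
        slopes.append(sxy / sxx)
    return slopes

def pooled_slope(panel):
    """Common slope from all counties after county-wise centering."""
    parts = [slope_parts(x, y) for x, y in panel]
    return sum(p[0] for p in parts) / sum(p[1] for p in parts)

def simulate_panel(n_counties, n_years, true_slope, noise_sd, seed=0):
    """Hypothetical counties sharing one true slope, with independent noise."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_counties):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        y = [true_slope * xi + rng.gauss(0.0, noise_sd) for xi in x]
        panel.append((x, y))
    return panel

TRUE_SLOPE = 0.5
panel = simulate_panel(n_counties=10, n_years=19,
                       true_slope=TRUE_SLOPE, noise_sd=1.0)

b_separate = separate_slopes(panel)
b_pooled = pooled_slope(panel)
worst_separate_error = max(abs(b - TRUE_SLOPE) for b in b_separate)
pooled_error = abs(b_pooled - TRUE_SLOPE)
```

The scatter of `b_separate` around `TRUE_SLOPE` corresponds to the noise in the per-county estimates, while `b_pooled` shows the stabilizing effect of aggregation within a correctly defined panel.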

We start our investigations by extending the original STSM setup of Gornott and Wechsung (2015a, 2015b) to counties without a weather station; this is achieved by spatial interpolation of the aggregated weather variables. Then we test the effects of dropping the non-meteorological variables and splitting a compound variable into its constituents, temperature and radiation. With the accordingly improved model formulae, we perform several types of cluster analysis with the STSM parameters, and finally present the best resulting PDM. The aim of improving the predictive skill could be largely achieved, but there are of course some observations to be discussed and finally some caveats to be named.
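How a weather aggregate might be transferred to a county without its own station can be sketched with inverse distance weighting. This is an assumption made purely for illustration: the interpolation method actually used is not stated in this excerpt, and the station coordinates, values and the function name `idw` are hypothetical.

```python
from math import hypot

def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    `stations` is a list of ((x, y), value) pairs in planar coordinates;
    `target` is the (x, y) location of the county without a station.
    """
    numerator = 0.0
    denominator = 0.0
    for (sx, sy), value in stations:
        distance = hypot(sx - target[0], sy - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = distance ** -power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical seasonal precipitation sums (mm) at three stations,
# with coordinates in kilometres.
stations = [((0.0, 0.0), 180.0), ((40.0, 0.0), 220.0), ((0.0, 30.0), 200.0)]
county_centroid = (15.0, 10.0)
estimate = idw(stations, county_centroid)
```

The estimate is a weighted mean of the station values and therefore always lies between the smallest and the largest of them; geostatistical methods such as kriging would additionally provide an uncertainty estimate.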

Section snippets

Study area and data

Germany consists of 16 federal states which are currently subdivided into 402 administrative units on county level. The latter define the finest spatial resolution for which annual harvest yield statistics are published. Because the two most frequent crops in Germany, winter wheat and silage maize, are grown in most of the counties, there is a large spatiotemporal data base for statistical yield modeling. Due to frequent mergers and restructurings of the political geography, older county data

Starting point

The high correlations between the STSM estimations and the official yield statistics reported by Gornott and Wechsung (2015a, 2015b), with most coefficients of determination (R²) in the range of 0.7–0.9, are far from being reached for non-aggregated county results in validation mode. Fig. 3 shows the respective maps, which indicate practically no predictive power of the chosen approaches for large parts of the country. Both the wheat and the maize models work only well in

Discussion

The German agricultural landscape is not at all homogeneous. While the spatial pattern in general cropping intensity can already be spotted in Fig. 1, the regional preferences for certain crops also differ considerably. The figures in Table 1 illustrate these inhomogeneities already at federal state level, but at county level the yield data, regularly given in dt ha−1 by the statistical offices, is partly based on marginal cropping areas which have been observed to increase noise levels and

Conclusions and outlook

We have increased the predictive power of simple time series regression models (STSM) for inter-annual winter wheat and silage maize yield changes in more than 300 German counties over a two-decade time period (1991–2010) by alterations of the model equations and clustering into panel data models (PDM). First, we could largely improve the leave-one-out validation R² values of the STSM by splitting a theoretically well-founded input variable into its constituents, simple time aggregates of

Acknowledgments

The authors would like to thank all the people involved in sampling and collecting the agricultural yield data, especially Andrea Lüttger and Richard Mommertz for feeding and maintaining our in-house database. This work was carried out within an international research project named “FACCE MACSUR: Modelling European Agriculture with Climate Change for Food Security, a FACCE JPI knowledge hub”. Funding was provided by the German Federal Ministry for Education and Research (BMBF), grant no. FKZ

References (53)

  • F. Tao et al. Responses of wheat growth and yield to climate change in different climate zones of China, 1981–2009. Agric. For. Meteorol. (2014)

  • G. Wessolek et al. Trade-off between wheat yield and drainage under current and climate change conditions in northeast Germany. Eur. J. Agron. (2006)

  • T.R. Wheeler et al. Temperature variability and the yield of annual crops. Agric. Ecosyst. Environ. (2000)

  • L. You et al. Impact of growing season temperature on wheat productivity in China. Agric. For. Meteorol. (2009)

  • G. Zhao et al. Demand for multi-scale weather data for regional crop modeling. Agric. For. Meteorol. (2015)

  • D.A. Belsley et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. (1980)

  • R.S. Bivand et al. Interpolation and geostatistics.

  • R. Cai et al. Estimating the spatially varying responses of corn yields to weather variations using geographically weighted panel regression. J. Agric. Resour. Econ. (2014)

  • J.-P. Chilès et al. Geostatistics: Modeling Spatial Uncertainty. (2012)

  • A.P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B: Methodological (1977)

  • EEA. CLC2006 technical guidelines. Tech. Report 17/2007 (2007)

  • B.S. Everitt et al. Cluster Analysis. (2011)

  • C. Fraley et al. mclust Version 4 for R: Normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report No. 597 (2012)

  • C. Fraley et al. Model-based clustering, discriminant analysis and density estimation. J. Am. Stat. Assoc. (2002)

  • C. Gornott et al. Niveauneutrale Modellierung der Ertragsvolatilität von Winterweizen und Silomais auf mehreren räumlichen Ebenen in Deutschland. J. Kulturpflanz. (2015)

  • C. Gornott et al. Statistical regression models for assessing climate impacts on crop yields—a validation study for winter wheat and silage maize in Germany. Agric. For. Meteorol. (2015)