Extending and improving regionalized winter wheat and silage maize yield regression models for Germany: Enhancing the predictive skill by panel definition through cluster analysis
Introduction
Winter wheat (Triticum aestivum L.) and silage maize (Zea mays subsp. mays L.) are major crops in Germany, currently grown on 3.1 and 2.0 million hectares, respectively (Statistisches Bundesamt, 2014). A large part of their relative inter-annual yield changes can be explained by a combination of time-aggregated weather variables. Profound knowledge of these weather effects and their robust modeling are required for assessing climate change effects, which in turn is necessary for decisions on climate change adaptation strategies. Owing to their simplicity, effectiveness and objectivity, statistical methods have a long tradition in the assessment of weather effects, but there is no standardized approach for performing such an assessment optimally. Developing a robust model with high predictive skill requires extensive formulation and testing of alternatives, gradual enhancements, and decisions on development routes. This paper reports such a kaizen (continuous improvement) process for German winter wheat and silage maize yields.
For these two example crops, Gornott and Wechsung (2015a, 2015b) present spatially distributed yield modeling approaches for Germany for the years 1991–2010. They fit separate time series models for 289 counties (Landkreise) and use panel data models for federal states (Länder) and river basins. A third approach bridges the spatial levels through random coefficient modeling, a hybrid method requiring restricted maximum likelihood estimation. All three methods are, however, driven by the same county-level data. Coefficients of determination are generally high and exceed 0.9 in the northeastern parts of the country, where crop growth strongly depends on precipitation. However, yield estimates for single years whose input data were excluded from calibration (leave-one-out validation) regularly show considerably weaker correlations with the observed yields.
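The difference between calibration fit and leave-one-out validation can be illustrated with a minimal sketch. It assumes a single hypothetical weather aggregate as predictor and takes R² as the squared correlation between observed and estimated values; the models referred to above use several variables, and all names and numbers below are invented for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def leave_one_out_predictions(x, y):
    """Estimate each year from a model calibrated on all other years."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds

def r_squared(obs, est):
    """Squared Pearson correlation between observed and estimated values."""
    mo, me = mean(obs), mean(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov * cov / (var_o * var_e)

# Hypothetical county series: weather aggregate and relative yield change
weather = [0.2, -0.5, 1.1, 0.4, -1.3, 0.9, -0.2, 0.6]
yield_change = [0.05, -0.02, 0.08, 0.01, -0.10, 0.09, -0.04, 0.02]

a, b = fit_line(weather, yield_change)
fitted = [a + b * w for w in weather]
loo = leave_one_out_predictions(weather, yield_change)

r2_calibration = r_squared(yield_change, fitted)
r2_validation = r_squared(yield_change, loo)
print(f"calibration R2 = {r2_calibration:.2f}, validation R2 = {r2_validation:.2f}")
```

Comparing the two printed values for a county shows how much of the calibration fit survives when the estimated year is withheld from parameter estimation.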
The predictive skill of these models may be further challenged in climate change impact studies that lack the additional variables of the calibration data, namely acreage and fertilizer price. It is argued that the effect of the weather can only be estimated when the other relevant factors are considered simultaneously in the calibration process (Gornott and Wechsung, 2015a, 2015b; Kaufmann and Snell, 1997). But does this really apply when noise and further, unknown factors are involved?
We start the investigations presented in this article with the separate time series models (STSM) of Gornott and Wechsung (2015a, 2015b), which are based on earlier work by Wechsung et al. (2008), because their aggregated results show the best performance among the approaches of that study. Our main idea for improving the predictive power of the regression models is to stabilize the parameter estimates by an optimized aggregation of the STSM into panel data models (PDM). The arbitrary definition of panel groups by political boundaries or river basins did not lead to better predictions (Gornott and Wechsung, 2015a, 2015b), because the model parameters depend on climatic and landscape units which do not match the prescribed regions. For Germany, relatively homogeneous soil-climate zones have already been delineated (Roßberg et al., 2007) and could also be considered for paneling, cf. Mirschel et al. (2014). Tao et al. (2014) and You et al. (2009) present corresponding examples for wheat cultivation zones in China.
Comparative clustering trials utilizing such additional spatial information may be the subject of future assessments. Here we follow a more general approach. We assume that (1) the individually estimated parameter vectors of the STSM are scattered about one vector for the given crop, climate, and soil combination, (2) there are groups of counties with comparable soil and climate which thus share the same parameter vectors, and (3) the parameter variation of the STSM does not exceed the variation owing to soil and climate. A cluster analysis of the estimated parameter vectors will reveal the groups with common parameter values. These values can then be estimated with much higher confidence through the respective PDM, whose predictive power should exceed that of the STSM.
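The chain from separately estimated parameters to cluster-defined panels can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the data are synthetic, each model has one predictor, plain k-means stands in for whichever clustering method is chosen, the number of clusters is fixed by hand, and the panel model is a simple pooled regression with common coefficients.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return (my - b * mx, b)

def dist2(p, q):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(col) for col in zip(*members))
    return labels

# Synthetic data: six hypothetical counties in two groups with different
# weather sensitivities (slopes), 20 years each, plus random noise.
rng = random.Random(42)
true_slopes = [0.10, 0.10, 0.10, -0.05, -0.05, -0.05]
counties = []
for slope in true_slopes:
    x = [rng.uniform(-2, 2) for _ in range(20)]
    y = [slope * xi + rng.gauss(0, 0.02) for xi in x]
    counties.append((x, y))

# Step 1: one separately estimated parameter vector per county
stsm_params = [fit_line(x, y) for x, y in counties]

# Step 2: cluster the parameter vectors (k = 2 is an assumption)
labels = kmeans(stsm_params, k=2)

# Step 3: pool the data of each cluster and estimate common parameters
pdm_params = {}
for j in sorted(set(labels)):
    xs = [xi for (x, _), lab in zip(counties, labels) if lab == j for xi in x]
    ys = [yi for (_, y), lab in zip(counties, labels) if lab == j for yi in y]
    pdm_params[j] = fit_line(xs, ys)

print(labels, pdm_params)
```

With several predictors on different scales, the parameter vectors would normally be standardized before computing distances; the sketch omits this because both coefficients here have comparable magnitude.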
Cluster analyses have already been applied to yield observations or their trends (Lee et al., 1993; Trethowan et al., 2003). They are partly used in combination with topographical or soil characteristics to define management zones (Roel and Plant, 2004; Yang et al., 2006). Another application of cluster analysis is the comparison of the performance of different crop cultivars (Mądry et al., 2011). The idea of grouping multiple yield regression models by clustering results (of topographical features) was realized by McKinion et al. (2010), who report corresponding improvements in the R² values of their yield simulations. To our knowledge, we present the first application of a cluster analysis of STSM parameters in crop yield modeling.
The general principle that separately estimated models tend to produce higher errors than their aggregations can be explained by noise reduction in the combined models through a larger input data basis (cf. Woodard and Garcia, 2008). For crop yield regression modeling, this has already been demonstrated by spatially averaging STSM parameters or input variables according to the spatial autocorrelations of yield-governing factors such as climate or soil properties. Examples include Lee et al. (2013), who use county-wise spatial lag modeling for wheat yields and wheat quality in Oklahoma, and Cai et al. (2014), who developed geographically weighted panel regression for corn yields in the United States. Bornn and Zidek (2012) coupled the parameter estimation for STSM by Bayesian methods to enhance wheat yield predictions in Canada, accounting both for spatial correlations between agricultural regions (model units) and for management differences between provinces. The main difference between these approaches oriented towards spatial correlations and a cluster-based PDM definition is that the former presume spatial agglomerations and transitions of parameter values, while the latter calls for a limited number of valid parameter combinations but does not need spatial relationships between the model sites or areas.
We start our investigations by extending the original STSM setup of Gornott and Wechsung (2015a, 2015b) to counties without a weather station; this is achieved by spatial interpolation of the aggregated weather variables. We then test the effects of dropping the non-meteorological variables and of splitting a compound variable into its constituents, temperature and radiation. With the correspondingly improved model formulae, we perform several types of cluster analysis on the STSM parameters and finally present the best resulting PDM. The aim of improving the predictive skill was largely achieved, but some observations remain to be discussed and some caveats to be named.
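How an aggregated weather variable might be transferred to a county without its own station can be sketched with inverse distance weighting. The interpolation method is not specified in the text above, so this technique is only a stand-in assumption; the station coordinates, values, and the function name `idw` are hypothetical.

```python
from math import hypot

def idw(target, stations, power=2.0):
    """Inverse distance weighted estimate at target (x, y).

    stations: list of (x, y, value) tuples; power controls how quickly
    the influence of a station decays with distance.
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical stations: coordinates in km, seasonal precipitation sum in mm
stations = [(0.0, 0.0, 200.0), (10.0, 0.0, 260.0), (0.0, 10.0, 230.0)]
county_centroid = (5.0, 0.0)  # hypothetical county without a station

estimate = idw(county_centroid, stations)
print(round(estimate, 1))
```

The estimate always lies between the smallest and largest station value, which makes the method easy to check but also unable to extrapolate beyond the observed range.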
Study area and data
Germany consists of 16 federal states, which are currently subdivided into 402 administrative units at county level. The latter define the finest spatial resolution for which annual harvest yield statistics are published. Because the two most frequent crops in Germany, winter wheat and silage maize, are grown in most of the counties, there is a large spatiotemporal database for statistical yield modeling. Due to frequent mergers and restructurings of the political geography, older county data
Starting point
The high correlations between the STSM estimations and the official yield statistics reported by Gornott and Wechsung (2015a, 2015b), with most coefficients of determination (R²) in the range of 0.7–0.9, are far from being reached for non-aggregated county results in validation mode. Fig. 3 shows the respective maps, which indicate practically no predictive power of the chosen approaches for large parts of the country. Both the wheat and the maize models work well only in
Discussion
The German agricultural landscape is not at all homogeneous. While the spatial pattern in general cropping intensity can already be spotted in Fig. 1, the regional preferences for certain crops also differ considerably. The figures in Table 1 illustrate these inhomogeneities already at federal state level, but at county level the yield data, regularly given in dt ha⁻¹ by the statistical offices, are partly based on marginal cropping areas which have been observed to increase noise levels and
Conclusions and outlook
We have increased the predictive power of simple time series regression models (STSM) for inter-annual winter wheat and silage maize yield changes in more than 300 German counties over a two-decade period (1991–2010) by altering the model equations and by clustering into panel data models (PDM). First, we substantially improved the leave-one-out validation R² values of the STSM by splitting a theoretically well-founded input variable into its constituents, simple time aggregates of
Acknowledgments
The authors would like to thank all the people involved in sampling and collecting the agricultural yield data, especially Andrea Lüttger and Richard Mommertz for feeding and maintaining our in-house database. This work was carried out within an international research project named “FACCE MACSUR – Modelling European Agriculture with Climate Change for Food Security, a FACCE JPI knowledge hub”. Funding was provided by the German Federal Ministry for Education and Research (BMBF), grant no. FKZ
References (53)
- et al. Efficient stabilization of crop yield prediction in the Canadian Prairies. Agric. For. Meteorol. (2012)
- et al. Evaluation of the Integrated Canadian Crop Yield Forecaster (ICCYF) model for in-season prediction of crop yield across the Canadian agricultural landscape. Agric. For. Meteorol. (2015)
- et al. Heat stress induced ethylene production in developing wheat grains induces kernel abortion and increased maturation in a susceptible cultivar. Plant Sci. (2007)
- et al. Site-specific impacts of climate change on wheat production across regions of Germany using different CO2 response functions. Eur. J. Agron. (2014)
- et al. Pre-harvest forecasting of county wheat yield and wheat quality using weather information. Agric. For. Meteorol. (2013)
- et al. Spatial analyses to evaluate multi-crop yield stability for a field. Comput. Electron. Agric. (2010)
- et al. YIELDSTAT – a spatial yield model for agricultural crops. Eur. J. Agron. (2014)
- et al. River flow forecasting through conceptual models Part I – A discussion of principles. J. Hydrol. (1970)
- et al. Simulating regional winter wheat yields using input data of different spatial resolution. Field Crops Res. (2013)
- Multivariate geostatistics in S: the gstat package. Comput. Geosci. (2004)