1 Introduction

Probabilistic Seismic Hazard Assessment (PSHA) has often been shown to be strongly influenced by the uncertainty in strong ground motion estimation, especially at long return periods, i.e., low annual rates of exceedance (e.g. Bommer and Abrahamson 2006). Ground motion prediction is primarily done through the use of empirical relations usually called Ground Motion Prediction Equations (GMPEs) and its variability, commonly referred to as sigma, σ, is broadly interpreted as aleatory variability, i.e., scatter attributed to the random and complicated nature of the physical processes of the generation and propagation of seismic waves in the earth’s interior. However, several studies (e.g. Anderson and Brune 1999) have suggested that such an interpretation is not accurate and that a fraction of sigma should be treated as epistemic uncertainty, i.e., uncertainty that can be resolved with the appropriate amount of knowledge and data. According to Anderson and Brune (1999), this is because part of the variability in ground motion is due to path and site effects, which may be repeated in subsequent earthquakes. Identifying and quantifying this epistemic fraction could optimally lead to a decrease in σ and hence to more realistic PSHA results. This is of primary importance when designing ground motion for critical facilities such as nuclear power plants and related infrastructure.

Accepting an epistemic fraction of σ is equivalent to dropping the ergodic assumption (Anderson and Brune 1999). In the ergodic approach, the lack of data in time is compensated with data in space, and the spatial and temporal variability of ground motion are considered equal. Hence, the expected variability of strong ground motion at a specific site is assumed to be equal to the σ of the GMPE that is adopted for that site. However, GMPEs are rarely constrained by data from a specific site, fault, or even region, but are usually constructed on the basis of more global sets of data from regions of similar (or not so similar) seismotectonic characteristics. In fact, as strong ground motion data sets have rapidly augmented during the past decades, it has been made possible to test the hypothesis of ergodicity and it appears that variability of strong ground motion at a specific site, commonly referred to as single-station sigma, σ SS , is usually much lower that the variability of global GMPEs (e.g. Chen and Tsai 2002; Atkinson 2006; Morikawa et al. 2008; Rodriguez-Marek et al. 2011, 2013). The same conclusion applies to a certain source-station path, repetition of which may lead to even lower variability (Lin et al. 2011).

In the quest of sigma components, Joyner and Boore (1981) were the first to separate the aleatory variability into inter- and intra-event variability. Nowadays, most scientists seem to adopt the corresponding terms of “between-event” and “within-event” variability suggested by Al Atik et al. (2010), which are illustrated in, what tends to be classic, Fig. 3 of Strasser et al. (2009). The notation of Al Atik et al. (2010) is followed throughout this paper, whenever possible.

Between-event variability is observed on events of the same magnitude and style-of-faulting and is attributed to differences in the source rupture process such as different stress-drop, rupture velocity, slip velocity, etc. Within-event variability is practically the spatial variability observed during a specific event between sites at the same distance from the source. This latter component of σ is attributed to phenomena pertinent to the deep geological structure and the geotechnical characteristics of the sites, such as ground motion amplification, near-surface attenuation, propagation effects and nonlinear site response. The between- and within-event residuals are generally assumed to be uncorrelated and, thus, the total standard deviation is computed as the square root of the sum of the squares of the between- and within-event standard deviation, commonly referred to as tau (τ) and phi (φ), respectively.

In the present work, we choose to study a small region, that of the EUROSEISTEST in Northern Greece, and work with a dataset that is as well constrained as possible with respect to what is encountered in practice, in order to be able to better constrain epistemic uncertainty and investigate the contribution of different parameters on the global and single-station uncertainty. We initially create a simple GMPE to remove the mean from our data and analyze residuals, aiming to investigate how we can decrease the components of global uncertainty by improving source and site metadata. We then move to single-station components of uncertainty. We compare our values to results found across literature and further examine the behavior of different σ components with period and number of recordings used. We identify the sites with the highest variability, φ SS , and discuss possible explanations. We examine tendencies with magnitude, distance, and depth at different periods, as well as possible effects of the azimuthal coverage of source-site paths around the examined sites.

Finally, we investigate an alternative approach towards computing φ SS : the use of existing predictive GMPEs in lieu of creating an ad hoc GMPE with local data. Up to now, two approaches have been used in previous studies: either a local model is calibrated in order to remove the mean and analyse residuals (this reflects most cases, e.g. Al Atik 2013; Rodriguez-Marek et al. 2011), or a global model is adopted (Chen and Faccioli 2013). This study shows that these approaches indeed yield comparable φ SS . This is a test not performed before, and important for cases where not enough data is available to create a local model, or when site-specific PSHA is performed.

2 The site under study

To perform our study we chose to use the accelerometric data of the EUROSEISTEST, a multidisciplinary European experimental site, which is located in the Mygdonia valley, at about 30 km to NE of the city of Thessaloniki in Northern Greece (Fig. 1a). EUROSEISTEST is one of the longest running test sites of its kind, being in operation since 1993. It consists of a three dimensional strong motion array of 16 surface and 6 downhole stations (Fig. 1b, c). The surface stations form a cross-shape array, expanding along the two axis of the elongated Mygdonia valley. At the center of the surface array, there is a group of five borehole stations installed at various depths down to the bedrock, at circa 200 m depth. The span of the valley in the NNW-SSE direction is about 5 km with site conditions ranging from soft sediments to hard rock, with average shear-wave velocities down to 30 m (Vs30) from 190 to 1840 m/s (site classes spanning from A to C/D, according to EC8; CEN 2003).

Fig. 1
figure 1

a Regional map showing the location of EUROSEISTEST and event epicentres. b Cross-section of the basin showing the two boreholes (adapted from Pitilakis et al. 1999). c Layout of the array. Stations are colour-coded according to EC8 classification (classes A, B, C) (adapted from Pitilakis et al. 1999)

The primary reason for selecting the EUROSEISTEST area is the good existing knowledge of site conditions, which is a result of more than 20 years of geophysical, geotechnical, and site response studies (e.g. Jongmans et al. 1998; Raptakis et al. 1998, 2000; Kudo et al. 2002; Manakou et al. 2010). Another advantage is that this dataset can be well controlled in terms of the quality of its seismological metadata, as it refers to a relatively small, well-studied area and to events after 2003, i.e., after the modernization and densification of permanent monitoring networks in Greece. Metadata is of fundamental importance in our study, as we are dealing mostly with small distances from the source and small magnitude events (Fig. 2).

Fig. 2
figure 2

Distribution of moment magnitude and epicentral distance for the final dataset for this study, colour-coded by the number of recordings per event. Events with less than 3 recordings (red markers) were rejected

3 Data and metadata

3.1 Accelerometric data

Accelerometric data used in our analysis correspond to the ‘filtered’ data set of the EUROSEISTEST database (http://euroseisdb.civil.auth.gr; last accessed February 2016; Pitilakis et al. 2013). These data have been filtered using an acausal, 4th-order band-pass Butterworth filter and a threshold value of 3 for the signal-plus-noise to noise ratio. Starting from more than 1100 recordings available, we created a subset of 691 recordings from 74 events, with magnitudes from M2.0-M5.6 and distances from 5 to 220 km. The selection of these recordings was based on several criteria that we set. We neglected events prior to 2003 as most data corresponding to them were recorded by low-resolution instruments, and parts of the array were not yet in operation (i.e., the E–W branch). We did not include events at distances longer than 220 km, or recordings that do not have both horizontal components. We rejected events coming from the southern Aegean subduction regime, thus kept focal depths to a maximum of 25 km. The recordings that passed these criteria are all the data points shown in Fig. 2. But most importantly, given the nature of this study, we rejected all events that were not recorded by at least 3 stations, because these could lead to inadequate resolution of the event terms and hence inflated values of τ. These rejected recordings are marked in the figure in red.

The lowest usable frequencies in the recordings of the dataset range from 0.1 to 3 Hz (decreasing with magnitude) and the highest usable frequencies range from 10 to 50 Hz. Corner frequencies, and especially the low-cut corners, vary significantly among the different recordings and this should be kept in mind when single station analysis results are interpreted at discrete periods.

To obtain the peak values of ground motion required in the subsequent step of the regression analysis, we computed the “average” horizontal component of each recording defined by the median rotated direction RotD50 (Boore 2010). These “average” components were then used to compute the Peak Ground Acceleration (PGA) and Spectral Acceleration (SA) at various periods from 0.01 to 2 s.

3.2 Source metadata

The components of ground motion variability in the EUROSEISTEST area were evaluated using two different sets of seismological metadata, which we will refer to as the “initial” and “refined” set from now on.

In the initial set, locations and magnitudes of the examined earthquakes were taken from the monthly seismicity bulletins of the Department of Geophysics of the Aristotle University of Thessaloniki (THE, http://seismology.geo.auth.gr; last accessed February 2016), complemented, wherever THE did not provide a solution, by the revised relative catalog of the National Observatory of Athens (NOA, http://www.gein.noa.gr, http://bbnet.gein.noa.gr; last accessed February 2016). Until 2008, the two Institutes were using different data sets, with THE having more stations around our study region. In 2008, NOA and THE, joined the Hellenic Uniform Seismological Network (HUSN) and since then they share the same dataset, although phase picking and routine earthquake location on a 24/7 basis is performed independently by the two Institutes.

In the refined set, we adopted the relocated foci of Galanis (2010), available for events up to December 2005, and those of Ktenidou and Roumelioti (2014) for events that occurred in the period 2006–2013 at distances less than 30 km from the center of the EUROSEISTEST array. For more distant events in the same period (2006–2013), we kept the routine solutions of THE. Magnitude values in the refined dataset are as in Ktenidou and Roumelioti (2014), as well, i.e., homogenized and re-estimated when Mw values were undefined or uncertain. For more details on the magnitude estimation problems in the area and their re-assessment, the reader is referred to the pertinent bibliographic source.

3.3 Site metadata

In Fig. 3a we show the distribution of recordings per station for the final dataset used in the study (i.e., after applying all selection criteria mentioned in section ‘Accelerometric data’). Another of the dataset’s advantages is the large number of recordings per station (ranging from 9 to 90, with an average of about 33 recordings per site). In typical global datasets one cannot expect such well-recorded sites and thus cannot place such good constraints on site term estimates, as this study will. We note here that the large range of recordings per station is due to several factors. On the one hand, some stations were placed later than others. On the other hand, downhole stations have experienced several malfunction issues, mostly associated with moisture getting into the underground equipment. These problems affect the quality of the recorded data and can set stations out of operation from time to time. This is directly reflected in the smaller number of recording at downhole stations compared to the surface station at TST.

Fig. 3
figure 3

a Vs30 values for all stations in the array. Dashed lines indicate the EC8 classification into classes A, B and C. b Number of recordings per station

In Fig. 3b we show the distribution of Vs30 values per station. We assigned a Vs30 value to each recording site after having examined all the available geological, geotechnical and geophysical information provided in the EUROSEISTEST web portal (http://euroseisdb.civil.auth.gr; last accessed February 2016). In Fig. 3a, the dashed lines indicate limiting Vs30 values for EC8 site classification. We note that we have adapted the definition of Vs30 to downhole stations, for which we compute it over the first 30 m directly under the buried station. There is overall a satisfactory diversity in soil categories ranging from quite soft soils at the center of the valley (EC8 class C, e.g. stations TST_000, GRA, FRM) to hard rock at the edges and at the deepest downhole station (EC8 class A, stations PRO_033 and TST_196).

4 Theoretical background on breaking down sigma

As mentioned earlier, the study of ground motion variability requires a median ground-motion prediction model (GMPE) that will be used as a reference. To describe the error term in the GMPE, we adopted a mixed (fixed and random) effects model as suggested by Abrahamson and Youngs (1992). The model can be described by the general form:

$$y_{es} = f\left( {X,\theta } \right) + \delta B_{e} + \delta W_{es}$$
(1)

where y es is the natural logarithm of the observed ground motion parameter, f(X, θ) is the median ground motion model, Χ is the vector of predictor variables of the model (e.g. earthquake magnitude, M, closest distance to ruptured area, R rup , etc.) and θ is the vector of model coefficients; δB e is the random effect of the eth event representing between-event variations and δW es is the within-event residual, which corresponds to the difference between an individual observation of an event e at a station s and the event-corrected median model estimate.

Residuals δB e and δW es are zero-mean, independent and random variables with normal distributions and standard deviations τ and φ, respectively. These two sets of residuals, when added, form the total residuals, ∆ es , of a GMPE:

$$\Delta _{es} = \delta B_{e} + \delta W_{es}$$
(2)

The within-event residuals can be broken down to:

$$\delta W_{es} = \delta S2S_{s} + \delta WS_{es}$$
(3)

where given multiple recordings of ground motion at a specific site s, δS2S S is the systematic deviation of the observed amplification at this site from the empirically predicted median amplification that could be attributed, for example, to site effects. Then, δWS es is the remaining within-event residual at site s from event e, i.e., the part of the residual that does not appear in a systematic manner. The standard deviations of δS2S S and δWS es are denoted by φ S2S and φ SS , respectively. This latter term, φ SS , is what is commonly referred to as single-station phi and is used to compute the single-station sigma (σ SS ), i.e., the aleatory variability of the ground motion model at a single site under the partially non-ergodic assumption. The standard deviations of Eq. (3) and the single-station sigma constitute the parameters to be primarily investigated in the present study. It is important to note that partially (single-station) or fully (single-path) non-ergodic standard deviations (phis) should not generally be used unless there is sufficient data to estimate—either empirically, theoretically, or numerically—the site response. This is because φ S2S can only be dropped from the aleatory uncertainty of the global GMPE if the error of the site response calculations is added to the epistemic part of the uncertainty within the PSHA framework. Furthermore, this causes the mean of the global GMPE to change, so that the non-ergodic sigmas should be used with an ad-hoc, site- or path-specific GMPE.

5 Study of ground motion uncertainty and variability

5.1 Single-station standard deviations

In the first part of our analysis we aimed to investigate how we can decrease the components of global uncertainty by improving source and site data. We started with the computation of the prediction model using a simple functional form, which included quadratic scaling and magnitude-dependent geometric spreading, as per Eq. 4:

$$f = b_{1} M + b_{2} M^{2} + \left( {b_{3} + b_{4} \left( {M - 4} \right)} \right)\ln \left( {R_{rup} + 10} \right) + b_{5} R_{rup} + b_{6} S + b_{7}$$
(4)

where f is the natural logarithm of the spectral acceleration, R rup is the closest distance to the ruptured surface assumed to be equal to the hypocentral distance, R hyp , S is the site effect term and b i the coefficients to be determined by the regression analysis. The goal of this model is not to be used outside this study for prediction purposes, but to capture the mean of our data and allow us to compute well-balanced residuals for studying standard deviation.

For our first set of regressions, we assumed complete lack of knowledge as to the geotechnical characteristics of the recording sites, (i.e., S = 0 in Eq. 4) whereas magnitude and distance values were defined based on available seismicity bulletins as described in the “Source Metadata” section (‘initial’ set of metadata). Resulting values for the different components of variability at PGA and at 1 Hz are plotted in the left part of Fig. 4, at the first two columns of data points. In the second column of data points on the same plot, we show the corresponding results after the use of the ‘refined’ set of metadata, i.e., the relocated epicenters and magnitude re-assessments. At PGA, the value of τ drops from about 0.43–0.29, thus decreasing by more than 30%. The decrease of τ is even more significant at the much lower frequency of 1 Hz, reaching 50%. This eventually leads to a decrease in the total variability, σ, while φ S2S and φ are practically stable. This implies that the decrease in σ is entirely due to the decrease in τ. We thus find that having good quality seismological metadata (magnitude and location) can be very important in reducing uncertainty. In this dataset, differences between the initial and refined metadata are 0.1–0.7 in magnitude, up to 4 km in epicentral and up to 10 km in hypocentral distance. A sensitivity check showed that improving magnitude accuracy for this dataset is more important than improving location metadata.

Fig. 4
figure 4

Left two columns the decrease of τ (and hence of σ) with the improvement of source parameter knowledge (for the ergodic case). Middle three columns the decrease of φ S2S (and hence of φ and σ) with the improvement of site parameter knowledge (for the ergodic case, using the relocated source metadata). Right column the passing from ergodic to partially non-ergodic values in the site-specific analysis context. All values are in natural log and are shown for PGA

Our second goal was to reduce uncertainty related to the site metadata, so we sought to simulate a gradual improvement in site information, and expected it to primarily affect the φ component of the total uncertainty. To account for differences in sites, we initially introduced a switch to differentiate between soil and rock. This is roughly the actual state of practice in Greece, where switches are implemented based on soil classification (e.g. Skarlatoudis et al. 2003, 2004). In the middle panel (three columns) of Fig. 4, this took us from the first to the second column of data points, with φ S2S , i.e., the systematic site-to-site variability, decreasing by 10% for PGA. Next, we replaced the soil/rock switch by the actual measured value of Vs30, which is the typical way of accounting for site response in current GMPEs. This brought about an additional decrease of φ S2S by approximate 10% more for PGA, which in turn led to a reduction in φ and σ. For longer periods, the overall improvement was larger, of the order of 30%. At these periods we observed that while φ S2S decreases, τ somewhat increases. This is most likely because the two quantities are anti-correlated, and in our dataset -as in most- it is inevitable to have some correlation between parameters, as all events are not recorded at all distances.

Examples of the breaking-down of the σ components are illustrated in Figs. 5 and 6. In Fig. 5 we plot the computed values of the between-event residuals, δBe, in relation to earthquake magnitude (top panel), as well as the computation of their standard deviation, τ (bottom panel). δBe values are not correlated with magnitude, thus verifying that the magnitude scaling of our GMPE describes the center of our data well. Figure 6a shows the within-event, δWes, components and their standard deviations. For each station, we have computed the mean of the δWes and thus estimated the systematic, site-specific terms, δS2S, whose standard deviation gives us φ S2S (Fig. 6b). Correcting each station’s δWes with the site term, δS2S, we then have the site-corrected within-event residuals, or δWSes (Fig. 6c). Figure 6d shows that these residuals are not correlated to distance, indicating that the distance scaling in our GMPE describes the center of the data satisfactorily. The standard deviation of each station’s δWSes gives us that site’s single-station variability, or φ SS,S (Fig. 6e). We verify that φ SS,S and δS2S,s are not correlated to Vs 30 (Fig. 6e, f) or the number of events each station recorded (Fig. 6g).

Fig. 5
figure 5

Example of the correlation of between-event residuals, δBe, with magnitude (top) and the estimation of τ (bottom). Results are for PGA, using the relocated source metadata and Vs30 as a site proxy

Fig. 6
figure 6

The break-up of a within-event residuals δWes, into b the systematic site term δS2S, and c site-corrected within-event residuals δWSes, and of the estimation of φ, φ S2S , and φ SS , respectively, d correlation of δWSes with distance, e correlation of φ SS,S with Vs30 at the stations, f correlation of δS2S,S and φ SS,S with Vs30, and g correlation of φ SS,S with the number of recordings per site. Results are for PGA using the relocated source metadata and Vs30 as a site proxy

After optimizing the global standard deviations, we proceeded to examine the single-station uncertainty. In order to use non-ergodic sigma values, site-specific estimates must be made for the site terms, using either empirical or numerical methods. With that in mind, we show the values computed for single-station φ and σ, namely φ SS and σ SS , in the last column of data points in Fig. 4 (panel on the right).

In Fig. 7 we compare the values we computed at PGA with corresponding estimates retrieved from the literature (Chen and Tsai 2002; Atkinson 2006; Boore and Atkinson 2008; Chiou and Youngs 2008; Morikawa et al. 2008; Anderson and Uchiyama 2011; Lin et al. 2011; Ornthammarath et al. 2011; Rodriguez-Marek et al. 2011, 2013; Al Atik 2013; Chen and Faccioli 2013; Luzi et al. 2014) and found them to be lower than φ SS , τ and σ values reported elsewhere. Especially for σ SS , the values that we obtain (of the order of 0.4 in natural logarithms) are closer to values reported for single-path cases, σ SP (e.g. Atkinson 2006; Morikawa et al. 2008; Lin et al. 2011).

Fig. 7
figure 7

Comparison of the results of this study (white bars) with corresponding results from published studies. All values pertain to PGA and are natural logarithms

5.2 Single station standard deviations: stability, dependencies and site variability

In the previous section we presented results for two selected periods, PGA and 1 s. Figure 8a–d shows the dependence of global and single-station uncertainty components with period, from 0.01 s up to 2 s. Looking at the number of recordings (Fig. 8e) at longer periods, our dataset decreases. This is because we only used data within its range of validity and we did not compute spectral accelerations outside the filtering frequencies, i.e., we had no data below the lowest usable frequency. Compared to the data we used at high frequencies, at 1 s we have already lost roughly half our data in terms of recordings and events, and at 2 s we have lost over 2/3 of them. This is why we do not show any results for periods longer than 2 s. The gradual increase in τ (Fig. 8a) is often observed as periods grow longer. It is reasonable given the decrease in the number of earthquakes we can use at long periods. As is often observed, τ and φ are anti-correlated, and we see this in Fig. 8a, where φ decreases as τ increases. Figure 8b shows the components of φ; we observe a spike in φ and φ S2S values around 0.7–0.8 s. These periods correspond to the resonant period of some of the sites in this study, and the spikes indicate an increase in variability due to site response at these resonances. Figure 8c shows the single-station values (dashed lines) compared to total φ and σ values (solid lines). At PGA (roughly 0.01 s), the ratio of σ SS /σ is near 80% and the ratio of φ SS /φ is near 70% (Fig. 8d). These are slightly lower than some published values, and this may indicate that our dataset has somewhat stronger site terms than usual sites.

Fig. 8
figure 8

ad Dependence of uncertainty components with period, e decrease of the number of recordings and events with period

The number of recordings used may have an effect on our results. In this study we used a criterion for the minimum number of recordings per event, which we set at 3. We checked the robustness of our estimates by running a parametric analysis for PGA, where we shifted this criterion towards stricter values, namely 5, 8, 10, 12, and 14 (Fig. 9). The results were found to be very stable from 3 up to 8 recordings per event (Fig. 9a). Whenever the criterion became stricter than 10, τ decreased, as was expected, since each event term was even better determined. The downside of this is that there were fewer events left in the dataset (half or less), as shown in Fig. 9b. At the same time, when the criterion became stricter than 10, the φ S2S increased slightly. This has been observed before, and for the US this increase is of the order of 10% (N. Abrahamson, unpublished results), which is similar to what we observe in this study. Our final decision was to keep our criterion value at 3 recordings per event. This value gives us a good estimate of all components of uncertainty, without sacrificing any of the data, especially as this would have a heavy effect on longer periods.

Fig. 9
figure 9

a Dependence of uncertainty components with the criterion of minimum required number of recordings per event. Results are for PGA. b Shrinkage of the dataset as the minimum number of recordings/event criterion increases

Another aspect of interest to the stability and robustness of our estimates is the question of representation of different site types in the dataset. As we mentioned in the “Data and Metadata” section, rock and soil sites are not represented equally, as is almost always the case. Added to the magnitude-distance correlations in our dataset, this may cause further trade-offs between τ and φ. We compute the average Vs30 per event, as an index of the prevalent type of station that recorded each event. In Fig. 10 we plot our δBe residuals for each event against this average Vs30 value: e.g. for Vs30 = 300 m/s, the event was recorded mostly by soft soil stations, while for 800 m/s, the event was recorded by mostly rock stations. The latter case is very rare, but it is evident from the plot that there is some correlation between residuals and Vs30 values. This correlation is still there even if we use a stronger minimum recording per event criterion, e.g. 10 rather than 3. This indicates that there are likely some correlations in our dataset between source and site terms, but—as in most datasets—this cannot be overcome.

Fig. 10
figure 10

Dependence of between-event residuals with the averaged Vs30 per event for different minimum recordings per event criteria (3 and 10 recordings/event)

Having tested the robustness and stability of the results, we proceeded to observe tendencies in them, and understand dependencies with various parameters. First we use the φ SS,S results to find which sites have the strongest and weakest variability in EUROSEISTEST. In Fig. 11a, we plot the stations of our array, colour-coded as to their φ SS,S values, at different frequencies: PGA, 1 and 2 Hz. In the plots, the scale from blue to orange orders stations from the least to the most variable. The plots indicate that the stations in the EW axis are less variable overall than some of the stations along the NS axis. Station PRO_000 has the highest φ SS,S at high frequencies, but is rather stable at low frequencies. This could be related to topographic effects, as it is the only station of the array located on a hill. Rai et al. (2016) observed higher φ SS,S due to topography, but mostly due to troughs rather than peaks in the terrain. Station TST_196 has the highest φ SS,S at 1 Hz. The resonant frequency of the soil column at TST is close to 1 Hz, so this could be related to the presence of a downgoing wavefield. The variability near the basin edges along the NS axis (stations PRR, STC) could be related to basin edge and lateral discontinuity effects, but if it is so, it is not clear how it behaves with frequency. The variability of most other stations is harder to speculate upon without complementing these results with simulations.

Fig. 11
figure 11

a EUROSEISTEST stations colour-coded with respect to their variability φ SS,S for PGA (left), 2 (middle) and 1 Hz (right). The scale of blue, green, yellow and orange goes shows the transition from the least variable to the most variable station. b The effect of including a site-related variable in the GMPE (no site information in solid lines vs. knowledge Vs30 in dashed lines) on φss,s for the surface and downhole stations at TST and PRO over all frequencies studied. c Same as (b), for δS2Ss

At this point is it useful to try to understand how the site-related variable in the GMPE can affect results. Figure 11b, c show the φss,s, i.e., the variability per station, and δS2S, i.e., the systematic deviations from the average site response scaling for each site, for the two limit cases presented in the first section: a GMPE with no site information (solid lines), and one with measured Vs30 as a proxy for site response scaling (dashed lines). We focus on the surface and downhole stations in the two boreholes, namely stations PRO_000, PRO_033, TST_000, and TST_196. Figure 11b shows that φss,s is very similar for both models, with and without Vs30 knowledge. In Fig. 11c we see that adding the knowledge of Vs30 brings δS2Ss closer to zero at all frequencies studied. When models include a site variable, δS2Ss values do not indicate absolute site response, but rather the deviation from the response predicted for the particular site class or Vs30; i.e., they indicate how far the prediction is from reality, and if the deviation is large, then this means that there would be a lot to be gained from site-specific analyses at that site. For the two downhole stations, PRO_033, and TST_196, δS2Ss changes noticeably when including Vs30 in the GMPE (meaning the initial δS2S values are rather representative of actual site response at those stations), while for the surface stations, less so. This indicates that use of Vs30 in the model improves the prediction for the hard-rock (downhole) sites more than it does for softer (surface) sites. The possible effect of the GMPE on φ S2S will be discussed again in the following section, when we test different existing models to compute single-station sigmas.

We also examined the existence of any dependencies of the between-event and the corrected within-event residuals with parameters such as magnitude, distance, and depth (Fig. 12). Global datasets have shown certain trends with magnitude and distance, but their magnitude range is well above the magnitudes of our dataset, so it is interesting to see to which extent we could detect such trends in our data. Figure 12a shows that τ tends to decrease for smaller magnitudes (rather than increase, as is sometimes observed), and that this tendency is stronger for longer periods. This could be related to correlations between source and site terms, possibly due to the fact that for very small magnitudes, the “kappa” factor, κ, of Anderson and Hough (1984) may often mask the true corner frequency of the source. Conversely, φ SS tends to increase for smaller magnitudes (Fig. 12c), and this is clearer for higher frequencies. This effect has been observed before down to M4.5 (e.g. Rodriguez-Marek et al. 2013), and is seen here to hold also for magnitude ranges down to M2. There are several possible reasons for this effect, including poor locations and depth estimates for very small events, or higher stress drop variability and kappa effects, which may cause some of the source uncertainty to map onto φ. The correlation of φ SS with distance (Fig. 12d) is not as clear. For shorter distances, φ SS seems to decrease at longer periods and increase for higher frequencies. The latter is found by Rodriguez-Marek et al. 2013. To the extent that events recorded at short distances are generally small events, this latter tendency is consistent with the previous observation. Finally, the dependency with depth (Fig. 12b) is not very clear either, but φ SS seems to increase slightly for shallower events at longer periods. This could be related to oblique incidence of the incoming waves onto the basin edges in the case of very shallow events. However, depth is the least constrained parameter and we can merely speculate as to its effect.

Fig. 12
figure 12

Dependence of a τ on magnitude, b φ SS on hypocentral depth, c φ SS on magnitude, and d of φ SS on closest distance to the rupture at 1, 2 Hz, and PGA. The solid lines indicate the average value and the symbols are the binned values

Another possible interpretation for the increase of site variability for small events compared to large ones has to do with the azimuthal coverage. Larger events in our dataset are very few to provide a good azimuthal coverage of the EUROSEISTEST, while small events tend to surround the site better. Poor azimuthal coverage may lead to poor path sampling and hence to lower variability for larger events, while the good coverage may lead to better sampling of all paths surrounding the site, and hence higher variability for smaller events. This is difficult to quantify. One index that has been proposed is the closeness index (CI) defined by Lin et al. (2011). The CI value ranges from 0 for collocated events to almost 2 for events at 180° difference in azimuth from a site. A CI higher than 1 indicates that there are no significant single-path effects that would decrease the φ SS towards φ SP . Following the methodology those authors lay out, we computed the closeness index for each pair of events recorded by each station, together with the normalized difference in the within-events variability for each pair of events. We computed the average CI per station and found that they range from 1.3 to 1.6, so this index does not show significant single-path effects for our dataset, likely due to poor coverage.

6 Using existing GMPEs to compute single-station sigma

Our reasoning for using a local GMPE to study residuals and standard deviations in our dataset was given in the previous section. In this section, we further investigate the possibility of using existing global GMPEs for the purpose of computing single-station variability (this was done e.g. by Chen and Faccioli 2013, for New Zealand). Of the very large number of existing models, we choose a few representative ones based on certain criteria:

  • Akkar et al. (2014), as an example of the most recent generation of European models from the RESORCE database,

  • Cauzzi et al. (2015), as a recent model for active crustal regions outside Europe

  • Bindi et al. (2011), as a European GMPE which contains a considerable amount of data between M4.0 and M4.5,

  • Danciu and Tselentis (2007) and Skarlatoudis et al. (2003), as the most recent models developed for crustal events in Greece, and

  • Skarlatoudis et al. (2004), as a model specifically developed for smaller magnitudes in Greece.

Table 1 presents some of the salient characteristics of these models, including magnitude and distance range, distance metric, style-of-faulting (normal, reverse, strike-slip, and thrust), use of quadratic scaling for magnitude and site characterization information. Many of these models predict quantities the scope of the present study, so in the last column we only mention whether they predict SA at a full range of periods or merely PGA. (which we denote as a and b), while Skarlatoudis et al. (2004) proposes two formulae for PGA from small events (which we denote as a and b) and one for the mixed dataset of small and large together (which we denote as x). Cauzzi et al. (2015) propose one equation with three alternative ways of computing site amplification (we denote these as a,b,c). We also note here that all models but one have a range of applicability above M4.0–M4.5. From Fig. 2 it is clear that most of our data come from smaller earthquakes. We will revisit this point in detail in what follows.

Table 1 Comparison of the salient features of the following GMPEs: Akkar et al. (2014), Cauzzi et al. (2015), Bindi et al. (2011), Danciu and Tselentis (2007), Skarlatoudis et al. (2003, 2004)

We analyze the residuals between the predictions of these models and our dataset with refined metadata. We assume normal faulting mechanisms for our events, as this is by far the prevalent mechanism in the region. We use hypocentral distance, R hyp , for Akkar et al. (2014) and Cauzzi et al. (2015), and the epicentral distance, R epi , for all other models. We are interested first in observing how each model’s scaling fits our dataset, and secondly in whether any scaling problems are ‘absorbed’ by the between-event uncertainty, τ, and possibly by the systematic deviation from average site scaling (φ S2S ), so as to allow an adequate estimate of the single-station variability, φ SS .

Figure 13 shows a comparison of all the standard deviation components (τ, φ, σ, φ S2S , φ SS , σ SS ) computed at all periods available using the chosen GMPEs. The standard deviations computed in the first part of this study, where we fit a new GMPE to the data, are shown for comparison. The first observation is that the φ SS seems practically independent of the model used, as all data points coincide for each period. The systematic deviation, φ S2S , is rather stable for PGA, but at longer periods exhibits some discrepancies between existing models and our model, as well as between individual existing models. These discrepancies are stronger as periods become longer and will be discussed later. But the most striking discrepancy is between τ values, which appear to be strongly dependent on the model used. All models show larger τ values than what we achieved by fitting a new GMPE, especially around 2–10 Hz. Up to 1 s, the model of Akkar et al. (2014) shows the largest increase in τ, while that of Danciu and Tselentis (2007)—and at short periods, also that of Cauzzi et al. (2015)—lie closest to the τ of the herein computed local GMPE.

Fig. 13
figure 13

Comparison of the standard deviation components (τ, φ, σ, φ S2S , φ SS , σ SS ) as computed at all periods available using the following GMPEs: Akkar et al. (2014; ABS14), Cauzzi et al. (2015), Bindi et al. (2011; Bindi11), Danciu and Tselentis (2007; Danciu07), Skarlatoudis et al. (2003; Ska03, 2004; Ska04). The standard deviations computed in the first part of this study, fitting a local GMPE to the data, are shown by the black solid line. We note that the φ SS is practically independent of the model used, while τ is very strongly dependent on the model. All models show much greater τ values than what we achieved by fitting a new GMPE. Up to 1 s, the model of Akkar et al. (2014) shows the largest increase in τ, while that of Danciu and Tselentis (2007) lies closest to the results of the new GMPE. All values are natural logarithms

In Fig. 14a we show δΒ e residuals for all GMPEs tested. Residuals trends with magnitude are evident in all cases, suggesting that the magnitude scaling used in these models is not appropriate for our dataset. This was to be expected, given that the magnitude scaling becomes steeper as magnitude decreases and our data are beneath typical model magnitude ranges. The figure indicates the equation of the magnitude trend (a + bM) and its coefficient of correlation (R2) to quantify the deviation of each model’s magnitude scaling from our data. Based on the metrics, the model of Danciu and Tselentis (2007) has the least deviation. However, in Fig. 14b we see that the δWS es residuals are well balanced with distance. This means that, for all models tested, the magnitude scaling errors were successfully absorbed by the events terms, whereas there is no discernible error in the distance scaling due to the fact that the tested GMPEs come from the same or similar seismotectonic regions (Greece or Europe). We stress that it is these two conditions that allow us to accurately compute single-station sigma despite the obviously large σ values and the biased total residuals.

Fig. 14
figure 14

a Between-event residuals for observed spectral accelerations at PGA for all GMPEs used, showing clear trends with magnitude, b Within-event residuals for observed spectral accelerations at PGA for all GMPEs used, showing no trend with distance. GMPE codes as in the caption of Fig. 13

Finally, we examined the differences between tested models in the final products: φ S2S and φ SS . In Fig. 15 we focus on PGA, and break up the average values of φ S2S and φ SS into their site-specific components, φ SS,S and δS2S. We recall at this point the unusually high number of recordings per station in this study compared to typical global datasets, which ensures good estimates of site-specific terms. In Fig. 15 we plot φ SS,S and δS2S per station and also show the average values for comparison. φ SS,S is remarkably stable and models converge the best at the most and least variable sites of the array (PRO_000 and PRS, respectively). δS2S is rather stable for most stations, but shows striking variability for stations PRO_033 and TST_196, which are the only two very hard rock sites in the array. We note that models that do not account for site information in their formulation (i.e., Skarlatoudis et al. 2004) plot far from models that use Vs30 as a predictor variable (i.e., our ad hoc GMPE, the Akkar et al. 2014, and the Cauzzi et al. 2015 b model), while models that use a site classification switch (i.e., Skarlatoudis et al. 2003; Danciu and Tselentis 2007; Bindi et al. 2011) tend to plot in between. The difference at PRO_033 between our model and the others that use Vs30 is significant, but this is in line with what we observed during testing different formulations of our own GMPE. Similar observations can be made for the results at 1 Hz (Fig. 15b), i.e., φ SS,S is again rather stable, while differences in δS2S are even greater at the two hard rock stations.

Fig. 15
figure 15

a Comparison of the standard deviation components δS2S and φ SS,S computed for PGA using different GMPEs for each station of the array (Vs30 values in parenthesis—stations are ordered by increasing Vs30). The standard deviations computed in the first part of this study, fitting a local GMPE to the data, are shown by the black solid line. We note that the overall φ SS is practically independent of the model used, while φ SS,S shows some variability for stations E01 and TST_136. φ S2S,S shows the largest variability for the two hardest downhole stations: PRO_033 and TST_196, b Same as in (a) but for 1 Hz. GMPE codes as in the caption of Fig. 13

7 Conclusions

Ground motion variability was studied using a carefully selected dataset, well constrained in terms of source parameters and site information. Refinement of earthquake locations and magnitudes led to a significant reduction of the between-event component τ of global aleatory uncertainty σ (by 30% at PGA and up to 50% at 1 Hz). This underlines once more the importance of the seismological metadata quality in constraining the uncertainty in a PSHA framework.

Having a good knowledge of the recording sites categories, we computed single-station components of uncertainty and found that, within a partially non-ergodic framework, the total σ and φ can be decreased by about 20 (σ SS ) and 30% (φ SS ), respectively. Our single-station values, when compared to corresponding values published in the literature, were found to be lower and rather closer to published single-path estimates. We cannot exclude the possibility of some effect of poor ray coverage in our dataset, although the classic single-path index (the closeness index) does not indicate such effects in our case.

We examined the behavior of uncertainty components with period from 0.01 to 2 s, and found that τ increases as expected. We found a small peak in the φ S2S around 0.7–0.8 s, which may indicate greater variability around some of the sites’ prevalent resonance period. We also investigated the sensitivity of our results to the number of recordings used. Despite some correlations in the data, we found that a criterion of 3 recordings per event is sufficient for high frequencies, and is actually necessary to preserve enough data for low frequencies. However, most stations in the dataset have much higher numbers of recordings, yielding good estimates of site terms. We next identified the stations of the array with the highest and lowest site variability (φ SS,S ). φ SS,S does not correlate with Vs30, but we speculate some correlation to 2D/3D wave propagation effects, such as topographic amplification at PRO_000, down-going wave fields at TST_196, and basin edge effects at PRR, PRS and STC, all of which would exhibit sensitivity to azimuth, among other factors. By binning our data with respect to magnitude, distance and depth, we observed tendencies at different periods. At smaller magnitudes, τ tends to decrease and φ SS tends to increase. This may be due to poorer constraints for smaller events, which may map on φ SS rather than τ due to κ and source-site trade-offs. Dependencies with distance and depth can be seen in our data but are less straightforward.

Finally, we investigated an alternative approach towards computing φ SS : the use of existing predictive GMPEs in lieu of creating an ad hoc GMPE with local data. We select some models from Greece and Europe, mostly calibrated for larger magnitudes, but compatible in terms of regional attenuation. All models greatly overpredict ground motion with respect to our data, mainly because of errors in magnitude scaling, which is steeper for our small events. However, we found that the event terms correct for these large errors (the price being much higher τ values) and that within-event residuals are well balanced, which indicates that there is no significant error in distance scaling. Under these two conditions, we suggest that the φ SS component can also be computed this way, i.e., by resorting to existing global GMPEs rather than fitting a new one through the data. However, this implies the need to use a regionally applicable model, so that the distance scaling is adequate. Up to now, either a local model is calibrated in order to remove the mean and analyse residuals (this reflects most cases), or a global model is adopted. This comparison test has not been performed before, and is important for cases where not enough data is available to create a local model, or when site-specific PSHA is performed.

We also point out that the GMPE formulation (namely, its site response predictor variable) can affect the systematic deviation δS2S. δS2S can vary between models that account for site response directly through a Vs30 proxy and models that do not account for it, or use site-class-based switches. These differences are large for the two hard rock downhole stations of our array, TST_196 and PRO_33, at all frequencies, especially short ones. This may indicate that GMPEs may be poorly constrained where they are sometimes most needed, i.e., for hard rock; this may be due to the lack of hard-rock data in global datasets, the greater difficulty in characterizing hard-rock sites compared to soft sites, and the smaller representativeness of Vs30 as a proxy (see also Ktenidou and Abrahamson 2016). However, the discrepancy between GMPEs in δS2S does not affect φ SS,S values overall, which are rather consistent between models at all sites and over all periods.