Introduction

The real estate industry is an important manifestation of urbanization, and the housing price is a vital economic indicator of the sustainability of regional development. The housing market has recently attracted particular attention because of rampant housing price growth, especially in Chinese cities. With the post-1978 reforms, China shifted from a centrally planned economy to a more market-based one1, in which the market plays a dominant role in capital allocation and factor production2. A market-based system of housing provision has been gradually established since 1988, which promoted a vigorous urban housing market and caused housing prices to skyrocket3. Because housing prices are a barometer of national economic development, their soaring growth in China’s cities has concerned many observers and analysts4,5,6. The price-income ratio is the nominal house price divided by the nominal disposable income per head (https://data.oecd.org/price/housing-prices.htm). From 2005 to 2015, this ratio for China’s leading real estate cities, Beijing and Shenzhen, increased from 7.69 to 13.37 and from 5.95 to 15.54, respectively, as calculated from data published by the Shenzhen Municipal Statistics Bureau (http://www.sztj.gov.cn/), the Beijing Municipal Bureau of Statistics (http://www.bjstats.gov.cn/tjsj/) and the National Bureau of Statistics of the People’s Republic of China (http://data.stats.gov.cn/search.htm). For housing markets, this ratio measures affordability, one of the key indicators of a region’s socio-economic stability; housing is generally described as “affordable” when its price does not exceed three times gross annual household income7. The ten-year rise in the price-income ratio shows that the prices of commercial housing in China have increased much faster than residents’ ability to pay. 
The implementation of the purchase restriction policy did not significantly affect housing prices but eased the impact of rising housing prices on technological innovation activities by suppressing excess. These cases demonstrate that housing prices in China urgently need to be studied.

China’s economic hypergrowth and urban ascent over the past three decades have been driving forces behind the fast growth of urban housing markets8. At the macro scale, the housing price is related to factors such as population migration and distribution, gross domestic product (GDP) and urbanization, and its ties to these socio-economic components have been confirmed. Many economists have pointed out the correlation between GDP and housing prices9,10. The process of urbanization, through its effects on land prices and in combination with restrictive land use regulation, brings about increasing housing prices11,12, and regional variations in urbanization levels also affect housing prices13. Population issues may also be related to housing prices: Saiz14 noted that immigration pushes up rents and housing values in US cities, and Gonzalez and Ortega15, in causal estimates of the effect of immigration on housing prices in Spain over the period 2000–2010, found that immigration was responsible for one-quarter of the increase in housing prices. Therefore, these indicators can be used to represent the development of regional housing prices.

Traditionally, housing price data came from the census, which does not reflect timely market activity or the full scope of the regional real estate market. Night-time light imagery, however, has been used to estimate the indicators that influence the housing price in real time and at the city scale. As a surrogate measure, night-time light imagery has the potential to stand in for multiple indicators of economic, social, resource and environmental circumstances16,17,18,19. For example, Elvidge et al.20 used Defence Meteorological Satellite Program Operational Linescan System (DMSP-OLS) data to study the relationship between GDP, electric power consumption and light area in 21 countries and found that the light area was highly correlated with GDP and electric power consumption. Moreover, Doll et al.21 analysed night-time light remote sensing data of 11 European Union countries and the United States and showed that such data correlate with national-level figures of GDP. Studies in China22,23, Africa24 and the United States25 have led to similar conclusions. Additionally, DMSP-OLS night-time light remote sensing data have been used to study urbanization and urban spatial expansion26,27,28,29 and population migration and distribution30,31,32,33. Overall, previous studies have demonstrated that night-time light remote sensing data can be used successfully to estimate socio-economic factors such as population migration and distribution, GDP and urbanization.

Because the housing price is related to factors such as population migration and distribution, GDP and urbanization, and these factors can be predicted from night-time light remote sensing data, it can be deduced that night-time light remote sensing data and the housing price are correlated. The more frequent and dense human activities are, the brighter the lights recorded in night-time light remote sensing data, and the greater the economic expansion and development. Night-time light remote sensing data and socio-economic factors are positively correlated, meaning that they increase and decrease together34. The housing price is closely tied to these socio-economic factors. Therefore, the quantitative connections between night-time light remote sensing data and housing prices should be robust and are worth studying. In fact, there are few studies on the correlation between night-time light and the real estate market. For example, Zhang35 estimated the time lags in Chinese provincial real estate development between land being purchased and the property being occupied using DMSP/OLS and real estate statistical data, and Wang et al.36 estimated the Chinese housing vacancy rate using night-time light data and OpenStreetMap37 data. However, these studies examined factors that influence housing prices rather than housing prices themselves. Furthermore, owing to the spatial heterogeneity of China’s housing market, these studies indicated that night-time lights alone are insufficient for spatial modelling of housing markets. Therefore, this study provides a new perspective for mining the relationship between night-time light imagery and the regional housing price using time-series analysis of individual cities, which avoids the spatial differentiation of average housing prices among cities. 
Meanwhile, night-time light data have specific advantages for studying housing prices. Firstly, night-time light data are objective. Most night-time light is emitted by human activities, so the data directly reflect those activities and serve as a more objective input for socio-economic parameter estimation. By comparison, the indicators used in economics to measure economic development, such as GDP, are less objective and can hardly avoid statistical errors and human influence. Secondly, night-time light data are easily available: they can be downloaded directly from the official website of the National Oceanic and Atmospheric Administration (NOAA). By contrast, housing price studies in economics often require diverse indicator data for modelling, and these data are not easily available. Thirdly, night-time light data have the advantages of dynamic updating and global coverage. Against this background, we propose, for each target provincial capital city in inland China, a regression model between the annually average night-time light intensity (ANLI) and the annually average commercial residential housing price (ACRHP). The work and contributions of this article are as follows:

(1) Based on the time-series analysis of individual cities, a new reliable data mining model between ANLI and ACRHP is proposed for the first time. To make the study data more reliable, we eliminated the abnormal errors of a few years and selected an optimal mining model from several candidate models.

(2) The uncertainty of the quantitative prediction of ACRHP in the field of remote sensing is studied for the first time, both with and without adjustment for the annual inflation rate. Traditional prediction usually yields a single value, whereas we propose an interval estimation to quantitatively measure the uncertainty of ACRHP predicted using remote sensing.

(3) A new prediction method of night-time light intensity is proposed for the case of missing data for some years. The DMSP-OLS night-time light data are provided only until 2013; therefore, we propose a method for predicting intervals of night-time light intensity in subsequent years.

(4) The mining mechanism between ACRHP and ANLI is also revealed for the first time. Moreover, the influence and lag of policy on ACRHP are discussed by trend analysis.

(5) This paper not only enriches the application research of night-time light data but also provides a new point of view (i.e. using DMSP-OLS ANLI) for mining ACRHP in inland capital cities of China. It has great theoretical and practical significance for the real estate market.

Study area and data

Study area

The study area included 18 capital cities in inland China: Changchun, Changsha, Chengdu, Chongqing, Guiyang, Harbin, Hefei, Hohhot, Kunming, Lanzhou, Nanchang, Taiyuan, Urumqi, Wuhan, Xi’an, Xining, Yinchuan and Zhengzhou. Because official statistics on Lhasa’s ACRHP are lacking, this paper does not consider the inland capital city of Lhasa. The locations of these cities are shown in Fig. 1. Compared with the coastal cities of China, these inland cities have an appropriate scale of urban real estate development and appropriate night-time light imagery quality.

Figure 1

The capital cities of inland China.

Geographically, these inland capital cities cover most of the economically developing regions of China at present. All of these capital cities are important hubs because they connect the other parts of their provinces. Take Wuhan as an example: it plays a key role in China’s domestic transportation and has been regarded as the “thoroughfare to nine provinces”. With population agglomeration and urban expansion, the economy of Wuhan has developed rapidly in the past decade, notably its real estate economy. The local government took varied policy measures to stimulate the steady rise of housing prices, which provides suitable conditions for studying housing prices.

In addition, compared with the coastal region of China, saturation of the digital number (DN)38 values of the light images in the inland region before 2013 is not serious because economic development there lagged. Only a few inland provincial capital cities have saturation problems close to 2013, and the problem is concentrated in parts of these cities’ core areas. A high degree of DN saturation may indeed have a certain impact on related research, but there is currently no widely recognized approach for reducing it. The existing methods for mitigating the saturation problem mainly include radiation calibration, non-radiation calibration and the vegetation adjusted night-time light39,40,41. However, these methods have shortcomings and none is officially recognized, so the accuracy of the desaturated data cannot be guaranteed. Therefore, choosing inland China as the study area is appropriate and ensures the credibility of the results to a large extent.

Study data

DMSP-OLS night-time light data

In this article, we use the DMSP-OLS night-time light data to study housing prices. Compared with the NPP-VIIRS sensor, launched in 2011 and therefore without earlier historical data, and the Earth observation satellite Luojia 1-01, launched in mid-2018, the DMSP-OLS dataset provides annual average composites over a long historical time series. The DMSP-OLS dataset was downloaded from the website of the NOAA (http://www.ngdc.noaa.gov/eog/dmsp/downloadV4composites.html). The data include average visible light, cloud-free coverage and stable light average data from 1992 to 2013. Accidental noise sources, such as clouds, lightning, flames and burning gases, have been eliminated in the stable light average dataset, which has values ranging from 1 to 63. We selected this dataset because some major outliers (such as those from fires) had already been discarded. Figure 2 shows the DMSP-OLS data for the 18 inland provinces and provincial capitals in China.

Figure 2

The 2013 DMSP-OLS data of the 18 inland provinces and provincial capitals in China.

In this article, preprocessing of the DMSP-OLS data mainly included three steps:

(a) Reprojecting the imagery. To make it convenient to clip the imagery, the projection coordinate system was converted into the Lambert Conformal Conic system and the spheroid was converted into WGS 1984.

(b) Clipping the imagery. To make the imagery clearer, we clipped the DMSP-OLS stable light average data imagery and only kept the imagery of target cities.

(c) Intercalibrating radiometric information. To automatically extract the reference pixels with stable lights, the LMedS-based method42 was used to intercalibrate radiometric information.

Land area of administrative region data

To ensure that all the statistical data are unified and accurate, the land area data used in this paper are all from the China City Statistical Yearbook (2013). Table 1 shows the land area data of the 18 provincial capitals in China.

Table 1 The land area data of the administrative regions of the 18 provincial capitals.

Housing price data

To ensure that all the statistical data from 2002 to 2013 are unified and accurate, ACRHP data used in this paper are all from the China Statistical Yearbook (2002–2013). Table A (in the Appendix) shows the ACRHP data from 2002 to 2013 of 18 inland provincial capitals in China.

Methodology

In this study, we requested the cities’ polygon data from the National Geomatics Centre of China (http://ngcc.sbsm.gov.cn/). We then overlaid the vector polygon data on the DMSP-OLS data and clipped out the imagery of the target capital cities. After data preprocessing, the ANLI of each region is calculated, and the correlation between ANLI and ACRHP is studied by establishing a regression model, which is one of the important data mining methods. Next, we conduct a feasibility assessment to obtain the optimal mining model. Finally, we obtain and compare the results of the experiment. The process flow of our study is illustrated in Fig. 3.

Figure 3

Flow chart of research processing.

ACRHP adjusted by inflation rate

To correct for price-level variation in the data, the inflation rate was used to adjust the ACRHP. We obtained official inflation rate data from the World Bank (https://data.worldbank.org/). Table 2 shows the Chinese annual inflation rate from 2002 to 2014. Table B (in the Appendix) shows the ACRHP data adjusted by the inflation index.

Table 2 Chinese inflation as measured by the annual growth rate of the GDP implicit deflator.

Calculation of ANLI

First, given the inter-annual variation of night-time light, the exponential smoothing method was used in this study to obtain stable regional total night lights43,44. Then ANLI, which represents the city night-time light intensity per unit of land area, was calculated as follows:

$${\rm{ANLI}}=\mathop{\sum }\limits_{i=1}^{63}{N}_{i}{B}_{i}/S$$
(1)

In this formula, \({N}_{i}\) represents the number of pixels with brightness i, \({B}_{i}\) represents the brightness value itself, and S represents the land area of the target capital city’s administrative region. Table 3 shows the ANLI calculation results from 2002 to 2013 of 18 provincial capitals in inland China.
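As an illustration of Eq. (1), the following minimal Python sketch computes ANLI from a brightness histogram. The function name `anli`, the dictionary layout and the pixel counts are hypothetical and are not taken from the study; only the land area of Taiyuan (6988 km²) is quoted from this paper.

```python
def anli(pixel_counts, land_area_km2):
    """Eq. (1): ANLI = sum_{i=1}^{63} N_i * B_i / S.

    pixel_counts maps a brightness value B_i (1..63) to the number of
    pixels N_i with that brightness; land_area_km2 is S.
    """
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    total_light = sum(
        count * brightness
        for brightness, count in pixel_counts.items()
        if 1 <= brightness <= 63  # values outside 1..63 are ignored
    )
    return total_light / land_area_km2


# Hypothetical histogram for one city and one year (illustrative only).
example_counts = {5: 1200, 20: 300, 63: 40}
example_area = 6988.0  # km^2, the area reported for Taiyuan in this paper
print(anli(example_counts, example_area))
```

Because the total is divided by the administrative area S, a handful of saturated pixels (brightness 63) contributes little to the result when S is large.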

Table 3 ANLI values of 18 provincial capitals in inland China (2002–2013).

Optimal regression model selection

Regression analysis is one of the classical statistical methods for data mining45,46 and can help to identify whether two or more variables are correlated. In this study, the response variable is ACRHP and the explanatory variable is ANLI. Owing to spatial differentiation, the economic development levels of the provinces usually differ, and the ACRHP and ANLI data vary greatly. Hence, different empirical models are established for different cities in this paper, including the linear regression model, the exponential regression model, the logarithm regression model, the quadratic regression model, and the power regression model.

Linear regression model:

$${{\rm{ACRHP}}}_{j}=a{({\rm{ANLI}})}_{j}+b$$
(2)

Exponential regression model:

$${{\rm{ACRHP}}}_{j}=a{e}^{b{({\rm{ANLI}})}_{j}}$$
(3)

Logarithm regression model:

$${{\rm{ACRHP}}}_{j}=a\,{\log }_{b}{({\rm{ANLI}})}_{j}$$
(4)

Quadratic regression model:

$${{\rm{ACRHP}}}_{j}=a{{({\rm{ANLI}})}_{j}}^{2}+b{({\rm{ANLI}})}_{j}+c$$
(5)

Power regression model:

$${{\rm{ACRHP}}}_{j}=a({{\rm{ANLI}}}_{j}^{b})$$
(6)

where a, b and c are regression coefficients and j = 1, …, 18 indexes the capital cities. The optimal model for the j-th city can then be determined by:

$$\mathop{\max }\limits_{k=1,\ldots ,5}\left\{1-\frac{\sum _{i}{({{\rm{ACRHP}}}_{ik}-{\widehat{{\rm{ACRHP}}}}_{ik})}^{2}}{\sum _{i}{({{\rm{ACRHP}}}_{ik}-{\overline{{\rm{ACRHP}}}}_{ik})}^{2}}\right\}$$
(7)

where k = 1, …, 5 indexes the regression models of Eqs. (2)–(6), respectively; i = 1, …, 12 indexes the years of observation for the j-th capital city; \(\widehat{{\rm{ACRHP}}}\) denotes the value estimated by the regression; and \(\overline{{\rm{ACRHP}}}\) denotes the mean value.

We calculate the coefficient of determination (R2) of each regression and compare them to obtain the optimal model with the highest R2. Statistically, the number of samples used in this experiment is sufficient: the essential observation number is 2 (because Eqs. (2)–(6) usually include two parameters, a and b) and the observation number is 12, so the degree of freedom (i.e. the number of redundant observations) is 12 − 2 = 10 (9 for the quadratic model, which has three parameters). Hence the redundant observations are sufficient.
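The selection rule of Eq. (7) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors’ implementation: all function names are hypothetical, the sample data are invented, and the fitting procedure is assumed because the text does not specify it. The exponential and power models are fitted by ordinary least squares after a logarithmic transform, and Eq. (4) is fitted as c·ln(ANLI) with c = a/ln b, since a and b cannot be identified separately. R2 is always evaluated on the original ACRHP scale.

```python
import math


def _solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _lstsq(basis, xs, ys):
    """Least-squares coefficients for y ~ sum_k coef_k * basis_k(x)."""
    cols = [[f(x) for x in xs] for f in basis]
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    aty = [sum(p * y for p, y in zip(ci, ys)) for ci in cols]
    return _solve(ata, aty)


def r_squared(ys, fitted):
    """Coefficient of determination, the quantity maximised in Eq. (7)."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


def fit_models(xs, ys):
    """Fit Eqs. (2)-(6); xs and ys must be strictly positive."""
    ln_x = [math.log(x) for x in xs]
    ln_y = [math.log(y) for y in ys]
    one, ident, square = (lambda x: 1.0), (lambda x: x), (lambda x: x * x)

    b0, b1 = _lstsq([one, ident], xs, ys)            # linear
    e0, e1 = _lstsq([one, ident], xs, ln_y)          # exponential (log-linear)
    (c,) = _lstsq([math.log], xs, ys)                # logarithm, c = a / ln(b)
    q0, q1, q2 = _lstsq([one, ident, square], xs, ys)  # quadratic
    p0, p1 = _lstsq([one, ident], ln_x, ln_y)        # power (log-log)

    models = {
        "linear": lambda x: b1 * x + b0,
        "exponential": lambda x: math.exp(e0) * math.exp(e1 * x),
        "logarithm": lambda x: c * math.log(x),
        "quadratic": lambda x: q2 * x * x + q1 * x + q0,
        "power": lambda x: math.exp(p0) * x ** p1,
    }
    return {name: (f, r_squared(ys, [f(x) for x in xs]))
            for name, f in models.items()}


def select_optimal(xs, ys):
    """Return the name of the model with the highest R2 and all results."""
    results = fit_models(xs, ys)
    best = max(results, key=lambda name: results[name][1])
    return best, results


# Hypothetical 12-year series for one city (illustrative values only).
anli_values = [5.1, 5.8, 6.4, 7.3, 8.1, 9.0, 10.2, 11.5, 12.9, 14.0, 15.6, 17.1]
prices = [1900, 2100, 2500, 2900, 3200, 3500, 3600, 4200, 5900, 6300, 6100, 6700]
best_name, all_results = select_optimal(anli_values, prices)
print(best_name, {k: round(v[1], 4) for k, v in all_results.items()})
```

A true nonlinear least-squares fit of the exponential and power models would give slightly different coefficients; the log-transform is used here only to keep the sketch free of external dependencies.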

Abnormal error elimination

To prevent gross errors (i.e. abnormal errors) from affecting the accuracy of the regression model between ANLI and ACRHP, the least median of squares (LMedS)42,47,48,49 is used to eliminate them. The objective function can be written:

$$\min \,[\mathop{{\rm{med}}}\limits_{i}({r}_{i}^{2})]$$
(8)

where \({r}_{i}\) is the residual of the ith observation from Eqs. (2)–(6) and “med” denotes the median. Each observation is then assigned a weight:

$${w}_{i}=\left\{\begin{array}{ll}1 & {\rm{if}}\,{r}_{i}^{2}\le {(2.5\times 1.4826(1+5/(n-l))\sqrt{{M}_{J}})}^{2}\\ 0 & {\rm{otherwise}}\end{array}\right.$$
(9)

where \({M}_{J}\) is the minimal median of the squared residuals over the subsamples indexed by J, n is the number of observations and l is the essential observation number of regression Eqs. (2)–(6). The threshold corresponds to the 2.5-standard-deviation rule: observations with \({w}_{i}=0\) are treated as outliers and removed by the LMedS.

After abnormal error elimination, the regression models are established again and the R2 of each regression model is recalculated. By comparing the former R2 and the current R2 of each model, the regression model with the highest R2 is selected as the optimal mining model.
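The weighting rule of Eqs. (8) and (9) can be sketched for the linear model of Eq. (2) as follows. This is a simplified illustration, not the procedure used in the study: the function names are hypothetical, and instead of random subsampling the sketch enumerates every pair of observations as a candidate subsample, which is feasible only because there are 12 observations per city.

```python
from itertools import combinations
from statistics import median


def lmeds_line(xs, ys):
    """Exhaustive LMedS for y = a*x + b (Eq. 8).

    Every pair of points defines a candidate line; the line whose squared
    residuals have the smallest median is kept. Returns (M_J, a, b).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair cannot define y = a*x + b
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        med = median((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
        if best is None or med < best[0]:
            best = (med, a, b)
    return best


def lmeds_weights(xs, ys, n_params=2):
    """Eq. (9): weight 1 for inliers, 0 for observations flagged as outliers."""
    m_j, a, b = lmeds_line(xs, ys)
    n = len(xs)
    # Robust scale estimate, then the 2.5-standard-deviation threshold.
    sigma = 1.4826 * (1.0 + 5.0 / (n - n_params)) * m_j ** 0.5
    threshold = (2.5 * sigma) ** 2
    return [1 if (y - (a * x + b)) ** 2 <= threshold else 0
            for x, y in zip(xs, ys)]


# Hypothetical data: a clean linear trend with one gross error at index 5.
xs_demo = [float(i) for i in range(1, 13)]
ys_demo = [2.0 * x + 1.0 for x in xs_demo]
ys_demo[5] += 50.0
print(lmeds_weights(xs_demo, ys_demo))
```

The observations with weight 0 would then be dropped before the regression models are fitted again; for the nonlinear models of Eqs. (3)–(6), the candidate fit inside the loop would have to be replaced accordingly.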

Uncertainty estimation and performance evaluation

The ANLI of future years is required for housing price prediction, but the DMSP-OLS night-time light dataset was only updated to 2013. Considering that night-time light data are dynamic, stable and objective, we use time series prediction to avoid image distortion. The steps of housing price prediction are as follows:

(a) ANLI regression models for each provincial capital are established according to the time series; then, the function with the highest R2 is selected as its regression model to predict the future ANLI of the target cities.

(b) The assumption of linear regression is used, and the nonlinear function of ANLI prediction should be transformed into the linear function: \({Y}_{0}={b}_{0}+{b}_{1}{x}_{0}\). Combined with the target cities, the predicted function is assumed to be:

$${\hat{Y}}_{0}={\hat{b}}_{0}+{\hat{b}}_{1}{x}_{0}$$
(10)

where \({\hat{b}}_{1}\) and \({\hat{b}}_{0}\) represent coefficients of the linear function predicting ANLI for target cities by parameter estimation.

The interval estimation of \({Y}_{0}\) is as follows:

$$({\hat{Y}}_{0}\pm {t}_{\alpha /2}\,(n-2)\hat{\sigma }\sqrt{1+\frac{1}{n}+\frac{{({x}_{0}-\bar{x})}^{2}}{{\sum }_{i=1}^{n}{({x}_{i}-\bar{x})}^{2}}})$$
(11)

where n represents the sample size, \(\hat{{\rm{\sigma }}}\) represents the estimated standard deviation of the regression residuals, \(\bar{x}\) represents the sample mean, \({t}_{\alpha /2}(n-2)\) represents the critical value of the t-distribution with n − 2 degrees of freedom at significance level α, and \(\alpha =0.05\) (i.e. a 95% confidence level).

(c) The ANLI interval estimation for the target cities in future years is calculated in MATLAB (the software package).
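Steps (a) to (c) can be illustrated for the linearized case of Eqs. (10) and (11) with the short Python sketch below. The study itself used MATLAB; this sketch is only an assumed equivalent, the function name and the ANLI series are hypothetical, and the critical value \({t}_{0.025}(10)=2.228\) is taken from a standard t-table and passed in explicitly because the Python standard library has no t-distribution quantile function.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Eqs. (10)-(11): point prediction and interval for a new x0.

    Returns (lower, predicted, upper). t_crit is t_{alpha/2}(n - 2).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))  # residual standard deviation
    y0 = b0 + b1 * x0
    half_width = t_crit * sigma_hat * math.sqrt(
        1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx
    )
    return y0 - half_width, y0, y0 + half_width


# Hypothetical ANLI series for 2002-2013 (illustrative values only).
years = list(range(2002, 2014))
anli_series = [5.1, 5.8, 6.4, 7.3, 8.1, 9.0, 10.2, 11.5, 12.9, 14.0, 15.6, 17.1]
low, mid, high = prediction_interval(years, anli_series, 2014, t_crit=2.228)
print(round(low, 4), round(mid, 4), round(high, 4))
```

The interval widens as \({x}_{0}\) moves away from \(\bar{x}\), which is why extrapolating several years beyond 2013 yields progressively less informative ANLI bounds.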

To ensure the validity of the model, the optimal data mining model should undergo a feasibility assessment. The ANLI interval estimate for future years is substituted into the optimal data mining model between ACRHP and ANLI to calculate the ACRHP interval estimate. Finally, we compare this result with the official ACRHP statistics of the target cities published by the National Bureau of Statistics to demonstrate feasibility. In this way, the optimal data mining model is verified and can be used to predict housing prices.

Experimental results and analysis

The coupling results of ANLI and ACRHP

Hefei is taken as an example and five regression models between ANLI and ACRHP are established so that the optimal regression model can be obtained by comparing the R2. Table 4 shows five regression models between ANLI and ACRHP for Hefei.

Table 4 All the regression models of ANLI and ACRHP for Hefei.

(R2 represents the coefficient of determination used to evaluate the accuracy and reasonableness of the coupling models.)

After comparing the linear, exponential, logarithm, quadratic and power regression models, we conclude that the model with the highest R2 is the quadratic regression model (88.95%). Therefore, the quadratic regression model is taken as the optimal mining model for predicting housing prices in Hefei.

The regression models of the other 17 target cities are calculated in the same way. Table 5 shows five regression models for Changchun, Changsha, Chengdu, Chongqing, Guiyang, Harbin, Hohhot, Kunming, Lanzhou, Nanchang, Taiyuan, Urumqi, Wuhan, Xi’an, Xining, Yinchuan, and Zhengzhou.

Table 5 All regression models for the other 17 cities.

Observing all the experimental results, we can conclude that the optimal mining model for Changchun, Changsha, Chengdu, Chongqing, Guiyang, Harbin, Hefei, Hohhot, Kunming, Lanzhou, Taiyuan, Wuhan, Xi’an, Yinchuan, Urumqi, and Zhengzhou is the quadratic regression model, while the optimal mining method for Nanchang and Xining is the exponential regression model.

Abnormal error elimination and optimal model determination

Figure 4 shows the curve fittings and the abnormal errors of each capital city. The abnormal errors of the optimal model of each capital city are eliminated by the LMedS algorithm.

Figure 4

The abnormal errors and curve fittings for each capital city.

Table 6 shows the regression models re-established after eliminating the abnormal errors.

Table 6 The optimal regression models of ANLI and ACRHP after eliminating the abnormal errors.

Abnormal error elimination can significantly improve the accuracy of the mining model. To reduce the impact of abnormal errors, the abnormal errors of ANLI and ACRHP are eliminated after the optimal model is obtained. Comparing the regression models before and after elimination (Table 5, Table 6) shows that the accuracies of the models are significantly improved. The comparison also shows that the optimal mining relationship between ACRHP and ANLI for Changchun, Changsha, Chengdu, Chongqing, Guiyang, Harbin, Hefei, Hohhot, Kunming, Lanzhou, Taiyuan, Wuhan, Xi’an, Yinchuan, Urumqi, and Zhengzhou is the quadratic function, while that for Nanchang and Xining is the exponential function.

Uncertainty estimation of prediction

Predicted future housing prices

ANLI regression models of each provincial capital are established according to their time series. The explained variable is ANLI and the explanatory variable is year Y. The calculation is based on the ANLI of the previous time series, and the function with the highest R2 is selected as its regression model. The optimal regression model of the ANLI time series prediction of each provincial capital is shown in Table 7.

Table 7 The optimal regression model for the ANLI time series prediction of each provincial capital.

Using the principle of least-squares curve fitting for regression analysis and prediction, the ANLI of the 18 provincial capitals in future years can be obtained. Figure 5 shows the results of taking 2014 as an example to evaluate the rationality of each city’s model and predict the future ANLI. Table 8 lists the ANLI prediction intervals for each capital city.

Figure 5

Prediction of ANLI values of cities in 18 inland provinces (The ANLI Time Series of the capital city and its 95% Confidence Interval).

Table 8 ANLI prediction interval for each capital city in 2014.

The obtained ANLI prediction interval is substituted into the optimal regression model of ANLI and ACRHP of each capital city; the housing price range for 2014 is then calculated and compared with the price published by the National Bureau of Statistics to test the model. Taking Hefei as an example, the data show that \({{\rm{ANLI}}}_{{\rm{MIN}}}\) = 14.0021 and \({{\rm{ANLI}}}_{{\rm{MAX}}}\) = 22.5585. The corresponding prediction interval of the average housing price is \({{\rm{ACRHP}}}_{{\rm{MIN}}}\) = 5640.0554 yuan per square metre and \({{\rm{ACRHP}}}_{{\rm{MAX}}}\) = 8606.2131 yuan per square metre. The housing price of Hefei in the 2014 official statistics is 7157 yuan per square metre, which is within this prediction interval. Table 9 shows the ACRHP prediction range and actual housing price for each provincial capital.
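The substitution step can be sketched as follows for a quadratic model. The ANLI bounds are Hefei’s values quoted above, but the coefficients a, b and c are hypothetical placeholders (the fitted values belong to Table 6 and are not reproduced here), so the printed range is illustrative only. The sketch also checks whether the vertex of the parabola lies inside the ANLI interval, since in that case the extreme price does not occur at an interval end point.

```python
def quadratic_range(a, b, c, x_min, x_max):
    """Range of a*x**2 + b*x + c for x in [x_min, x_max]."""
    def f(x):
        return a * x * x + b * x + c

    candidates = [f(x_min), f(x_max)]
    if a != 0:
        vertex = -b / (2 * a)
        if x_min < vertex < x_max:
            # The extremum lies inside the interval, so include it.
            candidates.append(f(vertex))
    return min(candidates), max(candidates)


# ANLI bounds for Hefei in 2014 as quoted in the text.
anli_min, anli_max = 14.0021, 22.5585
# Hypothetical coefficients, for illustration only.
a_demo, b_demo, c_demo = 5.0, 100.0, 3000.0
price_low, price_high = quadratic_range(a_demo, b_demo, c_demo, anli_min, anli_max)
print(round(price_low, 2), round(price_high, 2))
```

An official price is counted as consistent when it lies between the two returned bounds.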

Table 9 ACRHP prediction range and actual housing price for each provincial capital.

From the results above, the prediction results are mostly accurate. As seen in Table 9, one unanticipated finding was that the ACRHP of Chengdu was overestimated by 88 yuan, while that of Wuhan was underestimated by 107 yuan.

Optimization prediction results

In the results above, the uncertainty of ANLI is considered, while the uncertainty of ACRHP is ignored. To improve the accuracy of our optimal model, the ACRHP was adjusted using official inflation rate data acquired from the World Bank.

Table 10 shows the ACRHP prediction range and actual housing price for each provincial capital after the ACRHP was corrected by the Chinese inflation rate. After this correction, the actual prices all fall within our prediction intervals, which further confirms the feasibility and accuracy of our method.

Table 10 ACRHP prediction range and actual housing price for each provincial capital after ACRHP correction.

Discussion

The experimental sample size

According to the principle of statistical inference, when an event of very small probability occurs, it cannot be regarded as accidental. We selected m = 18 inland provincial capital cities in China as test areas. Predicting ACRHP has two possible outcomes for each city, consistency or inconsistency (i.e. R2 is high or low), so for m provincial capital cities there are \({2}^{m}\) cases. Two sets of experiments were undertaken to compare the performance.

The first set of experiments forecasted ACRHP from ANLI directly. Two cities, Chengdu and Wuhan, fell outside the prediction interval. If such a result arose purely by chance, its probability would be \({C}_{18}^{2}/{2}^{18}=153/262144=0.0005836487,\) which is a very small probability event. The second set of experiments used the ACRHP adjusted by the inflation index for prediction. All cities fell within the prediction interval, a result whose chance probability is only \(1/{2}^{18}=1/262144=0.0000038147\), which is a much smaller probability event.

All in all, the experimental results verify that our sample size is sufficient and reliable, so there is a strong statistical correlation between ACRHP and ANLI for the 18 inland provincial capital cities in China.
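The two probabilities can be checked with a few lines of Python. The sketch assumes, as the counting of \({2}^{18}\) cases implies, that consistency and inconsistency are equally likely and independent for each city; the function name is hypothetical.

```python
from math import comb


def chance_probability(m, misses):
    """Probability of exactly `misses` inconsistent cities among m
    when all 2**m outcome patterns are equally likely."""
    return comb(m, misses) / 2 ** m


p_two_misses = chance_probability(18, 2)   # first experiment
p_no_misses = chance_probability(18, 0)    # second experiment
print(f"{p_two_misses:.10f}")  # prints 0.0005836487
print(f"{p_no_misses:.10f}")   # prints 0.0000038147
```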

The influence of saturation problem

The saturation problem of DMSP/OLS data has little effect on this research, for the following reasons.

(a) Only a few inland provincial capital cities in China have saturation problems close to 2013. Moreover, the problem is concentrated in a few areas of the developed city centres, such as those of Wuhan and Chengdu.

(b) The research object is the city-scale ACRHP, so the “average” night-time light intensity (ANLI) is used to analyse housing prices, which smooths (i.e. “averages”) or decreases the saturation error of the night-time light brightness values. In other words, the error from brightness that would exceed 63, once divided by the very large S, is almost negligible. S represents the administrative area of the provincial capital city in Eq. (1), and provincial capital cities always have large areas; e.g., the smallest inland provincial capital city, Taiyuan, covers 6988 km².

(c) Saturation processing of DMSP-OLS data may introduce new errors owing to spatial heterogeneity. Therefore, we selected inland China, where the saturation problem is not serious, as the study area to ensure the credibility of the results to a large extent.

The mechanism between ACRHP and ANLI

It can be seen from the experimental results that the degrees of correlation between ANLI and ACRHP for the 18 provincial capitals in inland China are satisfactory. For most cities, the optimal mining model is the quadratic regression model. In addition, ANLI can be used to predict the future ACRHP. The relevant information can be summarized as follows.

(a) ANLI and ACRHP are highly correlated. Firstly, the correlation between ANLI and ACRHP can be explained by an internal mechanism. As mentioned in the Introduction, there is a transmission mechanism between night-time light and ACRHP. From a macro point of view, the housing price is related to social-economy factors such as population migration and distribution, gross domestic product (GDP) and urbanization, and these factors can in turn be reflected and represented by night-time light imagery. This conduction effect can be generalized by the substitution shown in Eq. (12). Secondly, the experimental results demonstrate that there is indeed a strong correlation between ANLI and ACRHP: as shown in Table 6, the \({R}^{2}\) of each regression model of ANLI and ACRHP is above 0.80.

$$\begin{array}{c}ACRHP=f({x}_{1},{x}_{2},\cdots ,{x}_{n})\\ {x}_{1}={g}_{1}(NTL)\\ {x}_{2}={g}_{2}(NTL)\\ \cdots \cdots \\ {x}_{n}={g}_{n}(NTL)\\ ACRHP=f({g}_{1}(NTL),{g}_{2}(NTL),\cdots ,{g}_{n}(NTL))={\rm{F}}(NTL)\end{array}$$
(12)

where \(f\) and \({g}_{i}\) represent functional relationships; \({x}_{i}\) represents socio-economic factors such as population, gross domestic product, human activities and urbanization; \({g}_{i}({\rm{NTL}})\) represents the quantitative relationship between these socio-economic factors and the brightness value of night-time light imagery. By substitution, we obtain a composite function, \(ACRHP=F(NTL)\), which reflects the transmission mechanism between NTL and ACRHP.
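The substitution in Eq. (12) can be sketched in a few lines of code. The functions below (`g_population`, `g_gdp`, `f`) and all of their coefficients are hypothetical and chosen purely for illustration; the study does not specify these forms.

```python
def g_population(ntl):
    """Hypothetical g_1: population proxy as a function of NTL brightness."""
    return 0.8 * ntl


def g_gdp(ntl):
    """Hypothetical g_2: GDP proxy as a function of NTL brightness."""
    return 1.5 * ntl + 0.1 * ntl ** 2


def f(population, gdp):
    """Hypothetical f: housing price as a function of socio-economic factors."""
    return 1000 + 50 * population + 20 * gdp


def F(ntl):
    """Composite function F(NTL) = f(g_1(NTL), g_2(NTL)), as in Eq. (12)."""
    return f(g_population(ntl), g_gdp(ntl))


price_at_ntl_10 = F(10)
print(price_at_ntl_10)
```

Once the intermediate factors are substituted, the housing price depends on NTL alone. With these particular illustrative choices, F happens to be a quadratic polynomial in NTL.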

(b) Overall, the optimal mining model between ANLI and ACRHP for most inland provincial capitals in China is the quadratic function, which can be regarded as an empirical formula. The fact that the optimal mining model is a quadratic function can also be explained by the Taylor series. In mathematics, a Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function’s derivatives at a single point. The Taylor series of a real or complex-valued function f(x) that is infinitely differentiable at a real or complex number a is the power series

$$f(a)+\frac{{f}^{\prime }(a)}{1!}(x-a)+\frac{{f}^{\prime\prime }(a)}{2!}{(x-a)}^{2}+\frac{{f}^{\prime\prime\prime }(a)}{3!}{(x-a)}^{3}+\cdots $$
(13)

which can be written in the more compact sigma notation as

$$\mathop{\sum }\limits_{n=0}^{\infty }\frac{{f}^{(n)}(a)}{n!}{(x-a)}^{n}$$
(14)

where n! denotes the factorial of n and \({f}^{(n)}(a)\) denotes the nth derivative of f evaluated at the point a. In a neighbourhood of a, a sufficiently smooth function can be approximated by a finite number of terms of its Taylor series. The quadratic polynomial of the optimal mining model corresponds to truncating the series after the second-order term, so it can approximate a smooth functional relationship over a limited range. For this reason, the quadratic polynomial can describe the relationship between ANLI and ACRHP reasonably accurately. However, the quadratic function is the optimal model only for the capital cities of most provinces in inland China, and it is an empirical model. Due to spatial differentiation, different cities may have different optimal models.
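The quality of a second-order truncation of Eq. (13) can be checked numerically. The sketch below uses the exponential function expanded around a = 0 purely as a convenient test case (it is not the ANLI-ACRHP relationship) and shows that the quadratic approximation is accurate near the expansion point and degrades further away.

```python
import math


def taylor2(f_a, df_a, d2f_a, a, x):
    """Second-order Taylor polynomial: f(a) + f'(a)(x-a) + f''(a)/2! (x-a)^2."""
    return f_a + df_a * (x - a) + 0.5 * d2f_a * (x - a) ** 2


def taylor2_exp(x, a=0.0):
    """Quadratic approximation of exp(x); every derivative of exp at a is exp(a)."""
    e_a = math.exp(a)
    return taylor2(e_a, e_a, e_a, a, x)


near_error = abs(math.exp(0.1) - taylor2_exp(0.1))  # close to the point a = 0
far_error = abs(math.exp(2.0) - taylor2_exp(2.0))   # far from the point a = 0

print(f"error at x = 0.1: {near_error:.6f}")
print(f"error at x = 2.0: {far_error:.3f}")
```

This is consistent with treating the quadratic model as an empirical approximation that is valid over the observed range of ANLI rather than as a universal law.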

(c) ANLI can be used to predict the future ACRHP of provincial capitals in China. Based on the conclusion that ANLI and ACRHP are highly correlated, we predicted the ACRHP of the target cities in the following years and compared the predictions with the data published by the National Bureau of Statistics, with satisfactory results (Table 9). Among the results, the actual housing prices of Chengdu and Wuhan in 2014 deviate slightly from the predicted housing prices. The 2014 DMSP-OLS night-time light intensity is itself estimated from a regression curve fitted to the time series, which makes it uncertain; when it is used to predict the housing price, it may lead to some deviation. However, these unusual cases can also be reasonably explained by socio-economic factors.

For Chengdu, the ACRHP is overestimated by 88 yuan. There are several possible explanations for this result. Firstly, natural disasters may influence purchasing behaviour, especially earthquakes, which usually cause a temporary crisis in the real estate market because of the damage to buildings50. Prior studies have noted that the 2008 Wenchuan earthquake (the deadliest earthquake to hit China in the past three decades) changed the consumption attitudes and behaviour of residents51. According to official statistics, the ACRHP of Chengdu was 4778 yuan in 2008, while the ACRHP of Wuhan was 4781 yuan. By 2016, the ACRHP in Chengdu had increased to 7504 yuan, whereas the ACRHP in Wuhan had exceeded 10,000 yuan. In addition, urban planning by the local government is another important factor behind the low ACRHP in Chengdu. In 2006, the government set a goal of constructing a high-density city, which raised the floor area ratio and reduced the costs of real estate developers. All in all, natural disasters and land policies have jointly led to the moderate growth of the ACRHP in Chengdu.

For Wuhan, the ACRHP is underestimated by 107 yuan. This finding was unexpected and suggests that the size of the city's economy and changes in the corresponding policies may be the main factors. Table 11 shows the GDP of each capital city and its rank among all Chinese cities in 2014. Wuhan has a high economic level, with its GDP in 2014 ranked eighth among cities across China, so the saturation of digital number38 values in the light images is a serious problem. Furthermore, in 2014 Wuhan abolished the housing purchase restriction that began in 2012, which heated up the real estate market and boosted house sales. Therefore, the housing price of Wuhan rose sharply in 2014.

Table 11 The GDP of each capital city and its rank among all Chinese cities in 2014.

Although the official statistical housing prices of Chengdu and Wuhan deviate slightly from the predicted housing prices, for the possible reasons given above, the actual values of the other 16 cities fall within the prediction range. Therefore, the average DMSP-OLS night-time light intensity can be used to predict the future ACRHP.
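The fit-then-predict workflow discussed in (b) and (c) can be sketched as follows. This is a minimal illustration, not the exact procedure of the study: the yearly ANLI and ACRHP values are synthetic (generated from an arbitrary quadratic plus small fixed perturbations), and `next_anli` stands in for an extrapolated light intensity.

```python
import numpy as np

# Synthetic yearly values for one hypothetical city (not from the study tables).
anli = np.array([4.0, 4.6, 5.1, 5.9, 6.4, 7.2, 7.9, 8.5])
perturbation = np.array([15.0, -20.0, 10.0, -5.0, 20.0, -15.0, 5.0, -10.0])
acrhp = 30.0 * anli ** 2 + 500.0 * anli - 400.0 + perturbation  # yuan

# Least-squares quadratic regression: ACRHP = c2 * ANLI^2 + c1 * ANLI + c0.
coeffs = np.polyfit(anli, acrhp, deg=2)
model = np.poly1d(coeffs)

# Coefficient of determination R^2 of the fitted model.
fitted = model(anli)
ss_res = np.sum((acrhp - fitted) ** 2)
ss_tot = np.sum((acrhp - acrhp.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Predict the price for a hypothetical extrapolated ANLI of the following year.
next_anli = 9.1
predicted_acrhp = float(model(next_anli))

print(f"R^2 = {r_squared:.4f}")
print(f"predicted ACRHP at ANLI = {next_anli}: {predicted_acrhp:.0f} yuan")
```

Because the prediction relies on an extrapolated ANLI, any uncertainty in that input carries over to the predicted price, which is one reason a prediction range is more informative than a single value.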

Moreover, the reasons why policy factors have only a slight impact on our regression model in most inland capital cities can be explained as follows:

Firstly, direct government intervention cannot radically change the driving mechanism of housing prices, especially in a market economy52. Meanwhile, indirect intervention is already reflected in night-time light. For example, the Chinese government reformed the hukou system to adapt to the current trend of diverse populations moving from rural areas to urban centres, which promotes urbanization in China, and the resulting changes in anthropogenic factors are already reflected in night-time light imagery.

Secondly, the available DMSP-OLS data extend only to 2013, when the local governments of central China had not yet regulated house prices strictly. Even where housing prices were regulated by policy, they still rose steadily, especially in the provincial capital cities studied and in urban centres53,54. Meanwhile, there are lags between policy implementation and housing price changes, so housing restriction policies do not affect housing prices immediately55. To evaluate the trend of the historical ACRHP, the Mann-Kendall test56,57 was applied at a 0.05 significance level (Fig. 6). The results show that the historical ACRHP has increased continuously and significantly. In other words, the actual influence of policy is smaller than commonly assumed. For example, home purchase restrictions are well known as one of China’s harshest housing market interventions, intended to curb the overheating real estate market by restricting purchasing power. The Chinese central government has implemented basic purchase restrictions in 40 major cities designated by the Ministry of Housing and Urban-Rural Development since 2011. In Wuhan, a purchase restriction order was issued on February 23, 2011. However, only about three years later, on September 23, 2014, the Wuhan Housing Security and Management Bureau held an internal meeting and then announced the complete cancellation of the purchase restriction58. The trend analysis shows that Wuhan housing prices were not significantly affected by the purchase restriction policy and instead continued to rise at an accelerating pace. The results also suggest that the market itself has become the decisive factor in the real estate market, which is contrary to our expectations about the effect of the home purchase restriction policy. In short, the actual impact of the policy on housing prices is limited and smaller than we expected.

Figure 6
Mann-Kendall trend test of ACRHP for the 18 inland capital cities in China during 2002–2017 (UF > 0 represents an increasing trend, while UF < 0 represents a decreasing trend; a UF value beyond the 95% confidence interval line indicates that the increasing or decreasing trend is significant).
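The UF statistic referred to in the caption can be computed with the sequential form of the Mann-Kendall test. The sketch below follows the standard textbook definition (rank sums with mean k(k-1)/4 and variance k(k-1)(2k+5)/72) and is not necessarily the exact implementation used for Fig. 6; the 16-year price series is invented for illustration.

```python
import math


def mann_kendall_uf(series):
    """Forward sequential Mann-Kendall statistic UF_k for k = 1..n (UF_1 = 0)."""
    uf = [0.0]
    s = 0  # running count of earlier values exceeded by each new value
    for idx in range(1, len(series)):
        s += sum(1 for j in range(idx) if series[idx] > series[j])
        k = idx + 1  # number of observations included so far
        mean = k * (k - 1) / 4
        variance = k * (k - 1) * (2 * k + 5) / 72
        uf.append((s - mean) / math.sqrt(variance))
    return uf


# Hypothetical, steadily rising yearly prices for 2002-2017 (16 values, yuan).
prices = [2000 + 300 * i for i in range(16)]
uf_values = mann_kendall_uf(prices)

CRITICAL_VALUE = 1.96  # two-sided threshold at the 0.05 significance level
significant_increase = uf_values[-1] > CRITICAL_VALUE

print(f"UF for the final year: {uf_values[-1]:.2f}")
print(f"significant increasing trend: {significant_increase}")
```

For a strictly increasing series such as this one, UF stays positive and eventually crosses the 1.96 line, which corresponds to the pattern described as a continuously significant increase.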

In all, it is true that Chinese government policy, such as the monopoly of land supply, has affected housing prices to a certain extent. However, the land supply monopoly has not changed during the period of continuous and rapid housing price increases since 2002. Explaining a rapidly changing variable with a relatively invariant one is therefore an incomplete research approach.

Conclusions

Time-series analysis of individual cities, exploring the relationship between annual ANLI and annual ACRHP at the city scale, shows that night-time light imagery and regional housing prices are highly correlated. This is a very encouraging result for the use of night-time light imagery to estimate and predict ACRHP in areas that lack timely socio-economic statistics. Regression analysis indicates that the quadratic function is the optimal mining model in most capital cities. Given the complexity of the factors that affect housing prices, our research is based on the city-scale ACRHP to reduce data noise and simplify the model parameters. We demonstrated that night-time light imagery has great potential for mining ACRHP. Except for Chengdu and Wuhan, where the actual ACRHP deviates slightly from the prediction, the actual values of the other cities fall within the prediction interval, which indicates that our regression mining model has important reference value. Furthermore, the method enriches applied research on night-time light data and provides a new perspective for exploring housing prices at the city scale. In addition, there is a lag between government policy and housing prices, and the impact of government policies on housing prices is limited. The experimental results show that although the government takes measures to regulate the real estate market, Chinese housing prices continue to soar.

In addition, this paper uses the regression mining model, together with the time-series prediction of ANLI, to predict future average housing prices and thereby evaluate the rationality of the model. The results show that the predicted housing prices are largely consistent with the officially reported statistics, which supports the rationality of the mining model. Therefore, we conclude that using the ANLI of a city is a feasible method to predict ACRHP. How to apply this mining model in regions with developed economies and a high degree of DN saturation in light images is a question that needs further study. Next, we will study and analyse the coupling relationship between ANLI and ACRHP in economically developed regions such as the Yangtze River Delta Economic Zone and the Pearl River Delta Economic Zone in China. We will also consider optimizing the ANLI data by fusing recent NPP-VIIRS data or China’s Luojia-1 data after 2013 with historical DMSP-OLS data before 2013.

All in all, this study has theoretical significance for the real estate market: it not only identified a new pattern, namely that average night-time light intensity (ANLI) is a fair indicator of average commercial residential housing price (ACRHP), but also established a functional relationship between ANLI and ACRHP. This study also has practical significance. Its results can provide a useful reference for the public when choosing cities for employment or settlement and offer a reference point for real estate market investment.