Conclusion

The best availability to the public transport system is in the city center. With increasing radius from the city center, the availability decreases as the highest values of the nearest public transport terminal distances tends to be clustered on the outskirts of the city, especially for all rail modes. It was found out that the municipalities that were joined to the capital most recently have smaller pubic transport service covarage and the activities in the development of the tram network was very small in the past decades. Furthermore, analysis, whether there are significant spatial clustering patterns was performed, by investigating Global and Local Autocorrelation in the data. For each explanatory variable and the dependent variable as well, neighbouring areas with similar spatial patterns were found.
The significant finding from this analysis is that the residential land price can be predicted using calculable distance variables. The distance from the centroid of the polygon to the nearest metro station is the strongest predictor. This finding confirms the role of the underground rail as core public transport system in Prague.

Analysing Spatial distributions of fitted model coefficients

On the figure below, spatial distribution of the area coefficient estimate is ploted. The values seems to have plausible sign, indicating an increasing effect of the residential land price. Unlike for R^2, the highest values are not clustered in the historical city center. The highest values can be found on the eastern outskirt of the city, this is most likely to be caused by the actual polygon size as this one is the largest in the dataset. Another high values of the coefficients can be found north and west of the city center. The north area is called Troja and the west area is named Bertramka and these are the one of the most lucrative neighbourhood in the Prague consisting of luxurious mansions. The lowest values corresponds with the smallest polygons. Overall the values seems to be randomly scattered as no spatial clustering pattern is obvious.

Area coefficient spatial distribution

 

On the figures below, spatial distributions of the distance coefficients estimates are plotted. For the nearest metro distance coefficient, it can be observed that the lowest (red) values tends to be clustered in the city center. As the most valuable land is located there, it corresponds with the fact that the model assigns the strongest decreasing effect on these polygons. When going more from the center to outskirts it seems that the decreasing effect of metro distance on price tends to be weaker, as with increasing radius from city center, the polygons‘ color shade is getting lighter. Although some positively signed coefficients are present, especially in the Horní Počernice Municipality, their occurrence is not frequent. The green polygon near the city center is located very close to the Prague Castle and it is most likely affected by one of the highest price value.

 

Nearest Metro Stop coefficient spatial Distribution

When investigating spatial distribution of the nearest tram stop distance coefficient, it can be observed that the model has assigned positively signed coefficients to the polygons in the city center. Although this not seem to be plausible result, it is most likely caused by the fact that the prices in the center are several times higher and it creates disbalances in the model. The solution can be to assign penalization for the most valuable residential land areas. However, this would have created bias in the model, violating linear relationship assumption between variables. Overall, the coefficients seems to have plausible sign on the eastern half of the city. On the western bank of the Vltava river and further, slightly positive values tends to cluster there. When comparing with metro coefficient estimates, the values for tram seems to be more randomly scattered as no buffering pattern is apparent in the distribution.

Nearest Tram Stop coefficient spatial Distribution

The Geographically weighted regression is a powerful tool to quantify how much the distance to nearest public transport terminal affects the residential land price. Although some limitations have appeared such as unplausible coefficients signs in some areas, in global average, the model presents plausible and realistic results.
When the multicolineratiy of the explanatory variables was found, several solutions were analyzed. By applying dimensionality reduction by Principal Component Analysis the explained variation in the data could have improved. However this solution was discarded as it would not be possible to correctly interpret, which public transport mode makes the largest contribution to the residential land price.

Fitting GWR model

After reprojecting the data to UTM grid, several types of Geographical Weighted Regression models were estimated. After fitting basic GWR model to the data, it has turned out that the suffers from multicolinearity of the independent variables. The coefficeints for distances in the full GWR model did not have plausible signs. It was expected for the coefficients to be negative which would imply that with increasing distance to nearest public transport terminal price of the land decreases. However, in some cases, these coefficients had positive sign, resulting in non realistic behaviour, compared with empirical findings. As the independent variables are highly correlated, Geographically Weighted Ridge and Geographically Weighted Lasso regression models were fitted, as in these models, the assumption of non-correlated independent variables is relaxed. Although these models provided improvement, some coefficients did not have plausible sign anyway. Therefore, reduction of the model was performed. The selection of the best performing model was based on the highest R^2 and the lowest AIC criterion. Several combination of the independent variables and kernel functions were investigated. Removing the nearest train and the nearest bus stop distance independent variables provided the best fit, in terms of the above mentioned criteria. The summary table of the model’s output from R is printed below.
Coefficient values imply that there is increasing effect of the area and decreasing effects of distances on price. In average, if the area increase by one square meter, the mean residential land price per square meter increase by 0.00763. In average, if the distance to nearest metro station increase by one meter, the mean price per square meter in Czech Crowns decrease by 0.624. If the distance to nearest tram stop increase by one meter, the mean price decrease by 0.183. GWR model provides a powerful predictor of the residential land price, explaining on average 84% of the variability in the dataset.

GWR model summary

To investigate how the values of the coefficients are distributed across space, values of \beta estimates and the local R^2 values were mapped to corresponding polygons. The results are depicted on the next page. When investigating Local R^2 statistics we can observe that them model fits to the data very well in the city center, as the highest values of local R^2 can be found there. The lowest values of R^2 can be found south of the the city center, another low R^2 polygons can be found on the eastern band of the Vltava river meander. As the range of the magnitude is from 0.064 to 0.981 , it can be concluded that in the lightly shaded polygons the price of the residential land does not follow linear trend, based on the summary statistics of R^2 printed in the table ?? below, it can be concluded that these regions are minor and their effect on the model is low.

Min.1st Qu.MedianMean3rd QuMax.
0.063720.539940.709680.679720.838770.98145
Local R2 spatial distribution

 

Fitting Ordinary Least Squares Model

At first, an OLS model is investigated, summary output table is printed below:

EstimateStd. Errort valuePr(> |t|)
(Intercept)10739.3519323.281233.220.0000
AREA_SQ_M0.01020.00254.040.0001
HUBDISTBUS-1.56660.9151-1.710.0870
HUBDISTMET-1.14450.1295-8.840.0000
HUBDISTTRA-0.07380.1357-0.540.5867
HUBDISTTRAM-0.43520.1164-3.740.0002
ScoresF:65.78 (5,2302)M. R^2: 0.125Adj. R^2: 0.1231AIC: 46814.1

With p value above 0.05, it can be concluded that bus and tram distance are not significant. This is most likely to be caused by multicolinearity of independent variables as all predictors are related with distance and even have the same unit. A model reduction is performed to have only significant variables. Summary table of the reduced model is printed below:

EstimateStd. Errort valuePr(>|t|)
(Intercept)10304.5593210.620148.920.0000
AREA_SQ_M0.01010.00254.020.0001
HUBDISTMET-1.15070.1283-8.970.0000
HUBDISTTRAM-0.44000.1149-3.830.0001
ScoresF:108.4 (3,2302)M. R^2: 0.1237Adj. R^2: 0.1226AIC: 46813.49

Adjusted R squared metrics of the reduced model is 0.1226. It can be concluded that 12.26 percent of the variance in the data can be explained. Although model reduction was performed and model performance has slightly increased the model still fits poorly to the data.

Geographically Weighted Regression

Geographically weighted regression is a multivariate model that is taking into account non-stationarity across space. As the coefficients in GWR may vary across the inspected area, the method is an adequate tool to analyse local properties of the dependent variable.
Comparing with Ordinary least square regression, that generates a single equation for the global model:

(1)   \begin{equation*} \mathbf{Y} = \beta_0 + \boldsymbol{\beta_i} \mathbf{X_i} + \boldsymbol{\epsilon} \end{equation*}

GWR constructs a separate equation for every feature in the dataset incorporating the dependent and explanatory variables of features falling within the bandwidth of each target feature. The Geographically weighted regression equation is

(2)   \begin{equation*} Y= \beta_0 (u_i,v_i) +\beta_1 (u_i,v_i)x_i1 + \beta_2 (u_i,v_i)x_i2 + ... +\beta_k (u_i,v_i)x_ik + \epsilon_i \end{equation*}

where (ui, vi) represents the coordinates of the ith point in space. The cordinates for this model are the latitude and longitude coordinates of each polygons‘ centroid. The weight assigned is based on a distance decay function centered at location i and observations nearer to i are given greater weight than observations further away. The (global) ordinary least squares linear regression model assumes that the observations being
used are independent. As it was already explored in the previous chapter, this assumption is violated as there are significant spatial clustering patterns in the data. In Geographically Weighted Regression, this assumption is relaxed, as it allows for spatially varying coefficients by producing estimates
of the parameter at each data location, factoring in spatial heterogeneity.
Using OLS, the parameters for a linear regression model can be obtained by solving:

(3)   \begin{equation*} \boldsymbol{\beta} = (\mathbf{X}^\intercal \mathbf{X})^{-1} \mathbf{X}^\intercal \mathbf{y} \end{equation*}

The parameter estimates for GWR may be solved using a weighting scheme:

(4)   \begin{equation*} \boldsymbol{\beta} = (\mathbf{X}^\intercal W(u,v) \mathbf{X})^{-1} \mathbf{X}^\intercal W(u,v) \mathbf{y} \end{equation*}

The weights are chosen such that those observations near the point in space where the parameter estimates are desired to have more influence on the result than observations further away.
The Gaussian function is used for the weight calculation, the weight for the ith observation is:

(5)   \begin{equation*} w_i = exp (-d_i/h)2 \end{equation*}

where d_i is the Euclidean distance between the location of observation i and location (u,v), and h is a bandwidth.

 

Spatial distribution of moran statistics for nearest train station distance

Spatial distribution of local Moran statistics for nearest train station distance is illustrated. It can be seen that the train network in Prague is shaped into a star with the main terminal (Praha Hlavní nádraží) in the city center. This corresponds with the original purpose of the system as it was designed to be used as a commuter rail for the suburban regions. There are 10 sections of the train lines that leads to Central Bohemian Region and the majority of low distance regions are aligned around them. Red polygons can be seen in the geographical center of the map however this cluster is completely surrounded by rail lines and it is well connected to other modes. North of the city center, large cluster of red polygons can be observed near Bohnice Municipality. It seems that this clustering pattern is repeating as some of these clustered areas has turned out red for buses and metro distances as well, implying poor connectivity which apparently leads to lower residential land prices in this region, concluding similar results with the findings in empirical analysis.

Spatial distribution of moran statistics for price

When investigating spatial distribution of local moran statistic of price, we can observe several high value polygons neighbouring with another high value polygons. The most apparent is red area north west. This neighborhood is called Hanspaulka and it consists of historical mansions. Smaller red polygons nearby consists of historical villas as well and they offer beautiful view on the Prague Castle which is nearby. The mentioned factors explain that these areas follow similar spatial clustering pattern. East of the Vltava river meander another large red polygon can be observed. This neighbourhood is called Harfa and it contains newly built luxury penthouses. Generally it can be concluded that high area regions neighbouring with another high area regions consist of either historical mansions, luxury penthouses or historical apartment blocks. Large blue polygons on the eastern part of the city contains mostly separated concrete panel apartment blocks, that were built during communist era. Although the outlying blue polygon on the south end of the city consist of detached houses, its located near highway and ring road crossing and no metro station is nearby, these factors contributes to result of low value of moran statistics.

Spatial distribution of moran statistics for price

 

Modelling

In this section, selected mathematical modelling methods will be applied to the processed data. First, inspection whether the data are spatially autocorrelated will be conducted. Afterwards, Geographically Weighted Regression will be applied, with Residential Land Price as dependent variable.

Nearest train station spatial distribution

The spatial distribution of the nearest train station distances for each polygon is depicted on the figure below. When inspecting the picture, it seems that the values are scattered as no low distances cluster of polygons is apparent and any other spatial pattern is not clear on the first look. When comparing with tram and metro spatial distribution of distances, we can observe that the areas on the eastern end of the city, near the border with Central Bohemian Region is better connected to the train network, rather than any other mode of public transport. This corresponds with the fact, that this part of the city has suburban character and it was designed for its residents to commute by rail.

Distance to nerest train station

Autorof thumbmail picture: Petr Novák:

Source

 

Data Analysis

In this section empirical data analysis will be conducted. Starting with the summary statistics of the dataset and then proceeding with spatial analysis of model’s variables. For each component of the model, dedicated section is listed below.