Fitting Generalized Linear Models – Poisson GLM

Poisson GLM

Compared with the linear regression model, the Poisson log-linear model allows the variance to depend on the mean and models the log of the mean as a linear function of the covariates.

Given a set of parameters θ and an input vector x, the mean of the predicted Poisson distribution, as stated above, is given by

    \begin{equation*} \lambda := E(\mathbf{Y} \mid \mathbf{X}) = e^{\mathbf{\theta}' \mathbf{x}} \end{equation*}

where \mathbf{Y} is the vector of response variables, \mathbf{X} is the vector of explanatory variables, and \mathbf{\theta} is the vector of parameters.
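
As a small illustration of the formula above, the following sketch evaluates the Poisson mean for a single observation. The function name `poisson_mean` and the numeric values are hypothetical and are not taken from the ozone data; the first entry of `x` is assumed to be 1 so that the first parameter acts as an intercept.

```python
import math

def poisson_mean(theta, x):
    """Return lambda = exp(theta' x) for one observation."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    # Linear predictor: inner product of parameters and inputs.
    linear_predictor = sum(t * xi for t, xi in zip(theta, x))
    # The log link is inverted by exponentiating, so the mean is always positive.
    return math.exp(linear_predictor)

# Hypothetical values: an intercept (x[0] = 1) plus one covariate.
theta = [0.5, 0.02]
x = [1.0, 60.0]
lam = poisson_mean(theta, x)

# For a Poisson distribution the variance equals the mean.
print(f"mean = variance = {lam:.3f}")
```

Because the mean is an exponential of the linear predictor, a one-unit increase in a covariate multiplies the mean by the exponential of its coefficient rather than adding a constant to it.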

Inspecting the histogram of ozone concentration suggests that a Poisson regression model is a good starting point: we can view the ozone concentration as a particle count, and we use the log as the link function. We start without adding interactions or modifying the link function; in this model, 4 of the 8 variables are significant at the 5\% significance level.
We can describe this model as follows:

    \begin{align*} \log(\lambda_i) &= \beta_0 +\beta_{Temp}*Temp_i +\beta_{InvHt}*InvHt_i +\beta_{Pres}*Pres_i +\beta_{Vis}*Vis_i\\ &+\beta_{Hgt}*Hgt_i +\beta_{Hum}*Hum_i +\beta_{InvTmp}*InvTmp_i +\beta_{Wind}*Wind_i \end{align*}

where \lambda_i = E[Y_i|X_i] is the expected value of the i-th response, which has to be positive, \beta_0 is the intercept, and \beta_p are the coefficients of the predictors.
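
To make the fitting step concrete, here is a minimal sketch of iteratively reweighted least squares (IRLS), a standard way to estimate the coefficients of a Poisson log-linear model. It is not the code used for the models in this post. The function names, the starting values, and the tiny noise-free data set are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Fit log(lambda_i) = sum_j X[i][j] * beta[j] by IRLS."""
    n, p = len(X), len(X[0])
    mu = [yi + 0.1 for yi in y]          # assumed starting means (kept positive)
    eta = [math.log(m) for m in mu]      # linear predictor under the log link
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working response; for the log link the IRLS weights are the means.
        z = [eta[i] + (y[i] - mu[i]) / mu[i] for i in range(n)]
        xtwx = [[sum(mu[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        xtwz = [sum(mu[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve_linear(xtwx, xtwz)
        eta = [sum(X[i][j] * new_beta[j] for j in range(p)) for i in range(n)]
        mu = [math.exp(e) for e in eta]
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Hypothetical noise-free data generated from beta_0 = 0.5, beta_1 = 0.3.
X = [[1.0, float(t)] for t in range(4)]
y = [math.exp(0.5 + 0.3 * row[1]) for row in X]
beta_hat = fit_poisson_irls(X, y)
print([round(b, 4) for b in beta_hat])
```

Since the responses here equal the true means exactly, the fit should recover the generating coefficients; with real count data the estimates would of course differ from any "true" values.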

We try adding visibility and wind speed as factors, but quickly find that this is not appropriate, so we turn to interactions instead. We first add all possible interactions; however, as discussed earlier, this is not a good approach, so we start removing correlated variables together with their interactions. Although the AIC scores do not change much during this tuning, the lowest AIC scores are obtained for the model in which we remove visibility (AIC = 1744.5) and the model in which we remove the interactions between the heavily correlated variables (AIC = 1744.1). Of these two models, the latter has quite a high number of statistically significant explanatory variables. This can also be seen in the summary of the model (not presented here because of its size, but it can be reproduced by running the code for fit3).
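
For readers who want to see how an AIC score of this kind is computed, the sketch below evaluates the Poisson log-likelihood and the usual definition AIC = 2k - 2 log L, where k is the number of estimated parameters. The function names and the example numbers are hypothetical; they do not reproduce the AIC values reported above.

```python
import math

def poisson_log_likelihood(y, mu):
    """Sum of log Poisson probabilities for counts y with fitted means mu."""
    # log P(Y = y) = y * log(mu) - mu - log(y!), with log(y!) = lgamma(y + 1)
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical counts and fitted means from two candidate models.
y = [3, 5, 2, 8]
mu_small = [4.0, 4.0, 4.0, 6.0]   # assumed fit with 2 parameters
mu_large = [3.2, 4.9, 2.3, 7.6]   # assumed fit with 4 parameters

aic_small = aic(poisson_log_likelihood(y, mu_small), 2)
aic_large = aic(poisson_log_likelihood(y, mu_large), 4)
print(f"AIC small model: {aic_small:.2f}, AIC large model: {aic_large:.2f}")
```

The penalty term 2k is why adding interactions does not automatically lower the AIC: the extra parameters have to improve the log-likelihood by more than they cost.
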
We tuned the Poisson regression model by backward selection from the full model. After removing the right combination of parameters and interactions, we obtained the following model, in which all parameters are significant at the 5 percent level except Temperature, which is significant at the 10 percent level. The best-performing Poisson model in terms of parameter significance and lowest AIC is the following:

    \begin{align*} \log(\lambda_i) &= \beta_0+\beta_{Temp}*Temp_i+\beta_{InvHt}*InvHt_i+\beta_{Pres}*Pres_i +\beta_{Vis}*Vis_i+\beta_{Hgt}*Hgt_i \\
    &+\beta_{Hum}*Hum_i + \beta_{Temp:InvHt}*(Temp_i*InvHt_i) +\beta_{Temp:Pres}*(Temp_i*Pres_i)\\
    &+\beta_{Temp:Hgt}*(Temp_i*Hgt_i)+\beta_{Temp:Hum}*(Temp_i*Hum_i)+\beta_{InvHt:Pres}*(InvHt_i*Pres_i)\\
    &+\beta_{InvHt:Hgt}*(InvHt_i*Hgt_i)+\beta_{InvHt:Hum}*(InvHt_i*Hum_i)+\beta_{InvHt:Wind}*(InvHt_i*Wind_i)\\
    &+\beta_{Pres:Hum}*(Pres_i*Hum_i) +\beta_{Pres:Hgt}*(Pres_i*Hgt_i) +\beta_{Vis:Wind}*(Vis_i*Wind_i)\\
    &+\beta_{Hgt:Hum}*(Hgt_i*Hum_i)+\beta_{Wind:InvTmp}*(Wind_i*InvTmp_i)\\
    &+\beta_{Temp:InvHt:Hum}*(Temp_i*InvHt_i*Hum_i)+\beta_{Temp:Pres:Hgt}*(Temp_i*Pres_i*Hgt_i)\\
    &+\beta_{Temp:Pres:Hum}*(Temp_i*Pres_i*Hum_i)+\beta_{Temp:Hgt:Hum}*(Temp_i*Hgt_i*Hum_i)\\
    &+\beta_{InvHt:Pres:Hum}*(InvHt_i*Pres_i*Hum_i)+\beta_{InvHt:Vis:InvTmp}*(InvHt_i*Vis_i*InvTmp_i)\\
    &+\beta_{InvHt:Vis:Wind}*(InvHt_i*Vis_i*Wind_i)+\beta_{InvHt:Hgt:Hum}*(InvHt_i*Hgt_i*Hum_i)\\
    &+\beta_{InvHt:Wind:InvTmp}*(InvHt_i*Wind_i*InvTmp_i)+\beta_{Pres:Hgt:Hum}*(Pres_i*Hgt_i*Hum_i)\\
    &+\beta_{Temp:Pres:Hgt:Hum}*(Temp_i*Pres_i*Hgt_i*Hum_i)+\beta_{InvHt:Vis:Wind:InvTmp}*(InvHt_i*Vis_i*Wind_i*InvTmp_i) \end{align*}
