After following the algorithm described above, we found that the best model is SARIMA(0,1,1)x(1,1,2)7:
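# Fit SARIMA(0,1,1)x(1,1,2)7; simple_differencing=True differences the series before estimation.
model = sm.tsa.statespace.SARIMAX(dfRevClean["daily_rev"], order=(0, 1, 1), seasonal_order=(1, 1, 2, 7), simple_differencing=True)
results = model.fit()
results.summary()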
| Dep. Variable: | D.DS7.daily_rev | No. Observations: | 594 |
|---|---|---|---|
| Model: | SARIMAX(0, 0, 1)x(1, 0, [1, 2], 7) | Log Likelihood | -5980.084 |
| Date: | Sun, 16 May 2021 | AIC | 11970.168 |
| Time: | 22:52:08 | BIC | 11992.102 |
| Sample: | 0 - 594 | HQIC | 11978.710 |
| Covariance Type: | opg | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ma.L1 | -0.5369 | 0.052 | -10.284 | 0.000 | -0.639 | -0.435 |
| ar.S.L7 | -0.6119 | 7.303 | -0.084 | 0.933 | -14.925 | 13.701 |
| ma.S.L7 | -0.2659 | 7.331 | -0.036 | 0.971 | -14.635 | 14.103 |
| ma.S.L14 | -0.5294 | 6.414 | -0.083 | 0.934 | -13.101 | 12.042 |
| sigma2 | 4.936e+07 | 1.78e-06 | 2.77e+13 | 0.000 | 4.94e+07 | 4.94e+07 |
| Ljung-Box (L1) (Q): | 0.18 | Jarque-Bera (JB): | 60.71 |
|---|---|---|---|
| Prob(Q): | 0.67 | Prob(JB): | 0.00 |
| Heteroskedasticity (H): | 2.57 | Skew: | 0.43 |
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 4.31 |
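
For reference, the SARIMA(0,1,1)x(1,1,2)7 specification can be written out with the backshift operator $B$, where $B^k y_t = y_{t-k}$. The symbols $\theta_1$, $\Phi_1$, $\Theta_1$, $\Theta_2$ and $\varepsilon_t$ are notation introduced here for illustration; under the sign convention that statsmodels documents for SARIMAX, they should correspond to `ma.L1`, `ar.S.L7`, `ma.S.L7` and `ma.S.L14` in the coefficient table.

```latex
\[
(1 - \Phi_1 B^{7})\,(1 - B)\,(1 - B^{7})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{7} + \Theta_2 B^{14})\, \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)
\]
```

The factor $(1 - B)(1 - B^{7})$ is the differencing that `simple_differencing=True` applies before estimation, which is why the summary lists the dependent variable as `D.DS7.daily_rev` and the model as `SARIMAX(0, 0, 1)x(1, 0, [1, 2], 7)`.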
results.plot_diagnostics(figsize=(15, 12))
plt.show()
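Top left: The residuals fluctuate around a mean of zero with no obvious trend. Their variance looks roughly uniform in the plot, although the heteroskedasticity test in the summary (H = 2.57, Prob(H) = 0.00) indicates that it is not constant over the sample.
Top right: The density plot suggests an approximately normal distribution with mean zero, although the Jarque-Bera test (JB = 60.71, Prob(JB) = 0.00) rejects exact normality.
Bottom left: Ideally, all the dots would fall on the red line. Here there are some noticeable deviations, which imply that the distribution is skewed (the summary reports a skew of 0.43).
Bottom right: The correlogram (ACF plot) shows that the residuals are not autocorrelated, in line with the Ljung-Box result (Prob(Q) = 0.67). Any autocorrelation would imply that some pattern in the residuals is not explained by the model.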
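
The visual checks can be backed by the test statistics that the summary already prints. The following is a minimal pure-Python sketch of the lag-1 Ljung-Box and Jarque-Bera statistics; the function names are hypothetical, and statsmodels may preprocess the residuals differently (for example by standardizing them), so the sketch is meant to show how the p-values follow from the statistics rather than to reproduce the library's internals.

```python
import math

def ljung_box_lag1(resid):
    """Return (Q, p-value) of the Ljung-Box test at lag 1."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    # Lag-1 sample autocorrelation of the residuals.
    r1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)
    q = n * (n + 2) * r1 ** 2 / (n - 1)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return q, math.erfc(math.sqrt(q / 2))

def jarque_bera(resid):
    """Return (JB, p-value) of the Jarque-Bera normality test."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n
    m3 = sum((r - mean) ** 3 for r in resid) / n
    m4 = sum((r - mean) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    return jb, math.exp(-jb / 2)

# p-values implied by the statistics reported in the summary table
print(round(math.erfc(math.sqrt(0.18 / 2)), 2))  # Ljung-Box Q = 0.18 -> 0.67
print(math.exp(-60.71 / 2) < 0.005)              # Jarque-Bera = 60.71 -> True
```

A large Ljung-Box p-value means there is no evidence of lag-1 autocorrelation, while a tiny Jarque-Bera p-value means the residuals depart from normality, which matches the deviations seen in the Q-Q plot.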
Also, the AIC is lower than that of our first model and of the other candidates we tried before arriving at the final one.
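
To make the comparison concrete, the information criteria can be recomputed from the log-likelihood in the summary. This is a small sketch using the standard definitions; the function name is hypothetical, and `k=5` assumes that the five rows of the coefficient table (including `sigma2`) are the estimated parameters.

```python
import math

def information_criteria(llf, k, n):
    """Return (AIC, BIC, HQIC) for log-likelihood llf, k parameters, n observations."""
    aic = 2 * k - 2 * llf
    bic = k * math.log(n) - 2 * llf
    hqic = 2 * k * math.log(math.log(n)) - 2 * llf
    return aic, bic, hqic

# Values taken from the summary: Log Likelihood = -5980.084, No. Observations = 594.
aic, bic, hqic = information_criteria(llf=-5980.084, k=5, n=594)
print(f"AIC={aic:.3f}  BIC={bic:.3f}  HQIC={hqic:.3f}")
```

The results should land very close to the AIC, BIC and HQIC printed in the summary. Note that these criteria are only comparable between models fitted to the same differenced series, since differencing changes the data being modelled.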
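# One-step-ahead in-sample predictions from observation 480 onward (dynamic=False uses the observed history at each step).
pred = results.get_prediction(start=480, dynamic=False)
pred_ci = pred.conf_int()
ax = timeseries1['Daily_Rev First Difference'][487:].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 4))
# Optional: shade the confidence interval around the predictions.
# ax.fill_between(pred_ci.index,
#                 pred_ci.iloc[:, 0],
#                 pred_ci.iloc[:, 1], color='k', alpha=.2)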
ax.set_xlabel('Date')
ax.set_ylabel('Daily Revenue')
plt.legend()
plt.show()
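
Because the model is fitted with `simple_differencing=True`, the summary lists `D.DS7.daily_rev` as the dependent variable, so `pred.predicted_mean` is on that doubly differenced scale rather than in revenue units. Below is a minimal sketch of how such one-step-ahead values could be mapped back to the original scale; the helper names and the toy series are hypothetical and not part of the notebook.

```python
def difference(y, s=7):
    """Apply (1 - B)(1 - B^s): w_t = y_t - y_{t-1} - y_{t-s} + y_{t-s-1}."""
    return [y[t] - y[t - 1] - y[t - s] + y[t - s - 1] for t in range(s + 1, len(y))]

def one_step_to_original_scale(y, w_hat, s=7):
    """Map one-step-ahead predictions of the differenced series back to the scale of y.

    w_hat[i] is the prediction for the differenced value at original position i + s + 1,
    and the observed history of y up to t - 1 is used at each step.
    """
    offset = s + 1
    return [
        w_hat[i] + y[t - 1] + y[t - s] - y[t - s - 1]
        for i, t in enumerate(range(offset, offset + len(w_hat)))
    ]

# Toy series (made up): differencing drops the first s + 1 = 8 observations.
y = [(3 * t * t + 5 * t) % 17 for t in range(30)]
w = difference(y)
print(len(y) - len(w))  # 8

# Feeding the exact differenced values back in recovers the original series.
print(one_step_to_original_scale(y, w) == y[8:])  # True
```

With real predictions in place of `w`, the same identity gives one-step-ahead forecasts that can be plotted directly against the observed daily revenue.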
So, how should the diagnostic plots be interpreted?
Top left: The residuals seem to fluctuate around a mean of zero and have a roughly uniform variance.
Top right: The density plot suggests an approximately normal distribution with mean zero.
Bottom left: Ideally, all the dots would fall on the red line. Here there are some noticeable deviations, which imply that the distribution is skewed.
Bottom right: The correlogram (ACF plot) shows that the residuals are not autocorrelated. Any autocorrelation would imply that some pattern in the residuals is not explained by the model.
### Final Model