Final Model

After following the algorithm described above, we found that the best model is a SARIMA(0,1,1)x(1,1,2) with a seasonal period of 7:

In [92]:
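# simple_differencing=True differences the series before estimation,
# so the model is fitted to D.DS7.daily_rev (see the summary output).
model = sm.tsa.statespace.SARIMAX(dfRevClean["daily_rev"], order=(0, 1, 1), seasonal_order=(1, 1, 2, 7), simple_differencing=True)

results = model.fit()

results.summary()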
Out[92]:
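SARIMAX Results
============================================================================
Dep. Variable:    D.DS7.daily_rev                      No. Observations:  594
Model:            SARIMAX(0, 0, 1)x(1, 0, [1, 2], 7)   Log Likelihood     -5980.084
Date:             Sun, 16 May 2021                     AIC                11970.168
Time:             22:52:08                             BIC                11992.102
Sample:           0 - 594                              HQIC               11978.710
Covariance Type:  opg
============================================================================
                 coef    std err          z      P>|z|     [0.025     0.975]
----------------------------------------------------------------------------
ma.L1         -0.5369      0.052    -10.284      0.000     -0.639     -0.435
ar.S.L7       -0.6119      7.303     -0.084      0.933    -14.925     13.701
ma.S.L7       -0.2659      7.331     -0.036      0.971    -14.635     14.103
ma.S.L14      -0.5294      6.414     -0.083      0.934    -13.101     12.042
sigma2      4.936e+07   1.78e-06   2.77e+13      0.000   4.94e+07   4.94e+07
============================================================================
Ljung-Box (L1) (Q):       0.18   Jarque-Bera (JB):   60.71
Prob(Q):                  0.67   Prob(JB):            0.00
Heteroskedasticity (H):   2.57   Skew:                0.43
Prob(H) (two-sided):      0.00   Kurtosis:            4.31
============================================================================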


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.09e+30. Standard errors may be unstable.
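
The very large standard errors on the three seasonal terms, together with warning [2], are what one would expect if the seasonal AR factor nearly cancels part of the seasonal MA factor. The sketch below checks this by comparing the roots of the two seasonal polynomials, using the coefficients from the summary. It assumes the usual statsmodels sign convention (AR polynomial 1 - Φx, MA polynomial 1 + Θ1·x + Θ2·x², with x standing for the 7-day lag operator); the variable names are illustrative only.

```python
import math

# Seasonal coefficients copied from the summary table.
ar_s_l7 = -0.6119   # ar.S.L7
ma_s_l7 = -0.2659   # ma.S.L7
ma_s_l14 = -0.5294  # ma.S.L14

# Assumed convention: the seasonal AR polynomial is 1 - ar_s_l7 * x and the
# seasonal MA polynomial is 1 + ma_s_l7 * x + ma_s_l14 * x**2, with x = L**7.
ar_root = 1.0 / ar_s_l7

# Roots of ma_s_l14 * x**2 + ma_s_l7 * x + 1 = 0 via the quadratic formula.
disc = ma_s_l7 ** 2 - 4.0 * ma_s_l14
ma_roots = [
    (-ma_s_l7 + sign * math.sqrt(disc)) / (2.0 * ma_s_l14)
    for sign in (1.0, -1.0)
]

# Distance between the AR root and the nearest MA root.
closest_ma_root = min(ma_roots, key=lambda r: abs(r - ar_root))
gap = abs(closest_ma_root - ar_root)

print(f"seasonal AR root: {ar_root:.4f}")
print(f"seasonal MA roots: {ma_roots[0]:.4f}, {ma_roots[1]:.4f}")
print(f"gap between AR root and nearest MA root: {gap:.4f}")
```

A small gap means the seasonal AR term and one seasonal MA factor are close to redundant, which may be worth checking by also fitting a SARIMA(0,1,1)x(0,1,1)7 and comparing. This is a heuristic reading of the point estimates, not a formal test.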
In [93]:
results.plot_diagnostics(figsize=(15, 12))
plt.show()

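Top left: The standardized residuals fluctuate around a mean of zero. Their variance looks roughly constant by eye, although the heteroskedasticity test in the summary (H = 2.57, Prob(H) = 0.00) indicates that it changes over the sample.

Top right: The density plot is roughly bell-shaped and centered on zero, but the Jarque-Bera test (JB = 60.71, Prob(JB) = 0.00), the skew of 0.43 and the kurtosis of 4.31 show that the residuals are not exactly normal.

Bottom left: If the residuals were normally distributed, all the dots would fall on the red line. We see some noticeable deviations, which imply that the distribution is skewed.

Bottom right: The correlogram (ACF plot) shows that the residuals are not autocorrelated, in line with the Ljung-Box result (Prob(Q) = 0.67). Any autocorrelation would imply a pattern in the residuals that the model does not explain.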
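
The visual reading can be cross-checked against the test statistics printed in the summary. The sketch below recomputes the Ljung-Box p-value and the Jarque-Bera statistic from the reported numbers using only the standard library. It relies on two standard facts: with one lag, the Ljung-Box statistic is referred to a chi-square distribution with 1 degree of freedom, and the Jarque-Bera statistic is n/6 · (S² + (K - 3)²/4), referred to a chi-square with 2 degrees of freedom. Because the inputs are rounded summary values, the results are approximate.

```python
import math

# Values copied from the summary output.
n_obs = 594        # No. Observations
lb_q = 0.18        # Ljung-Box (L1) (Q)
skew = 0.43        # Skew
kurtosis = 4.31    # Kurtosis

# Chi-square(1) survival function: P(X > q) = erfc(sqrt(q / 2)).
lb_pvalue = math.erfc(math.sqrt(lb_q / 2.0))

# Jarque-Bera statistic from sample skewness and kurtosis.
jb_stat = n_obs / 6.0 * (skew ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Chi-square(2) survival function: P(X > q) = exp(-q / 2).
jb_pvalue = math.exp(-jb_stat / 2.0)

print(f"Ljung-Box p-value: {lb_pvalue:.2f}")
print(f"Jarque-Bera statistic: {jb_stat:.2f}, p-value: {jb_pvalue:.2e}")
```

A Ljung-Box p-value near 0.67 gives no evidence of autocorrelation at lag 1, which agrees with the correlogram. A Jarque-Bera p-value near zero agrees with the Q-Q plot: the residuals depart from normality.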

The AIC is also lower than that of our first model and of every other candidate we tried before arriving at this final model.
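
For reference, the information criteria in the summary follow directly from the log-likelihood. The sketch below reproduces them, assuming k = 5 estimated parameters (the four coefficients plus sigma2) and the standard definitions AIC = 2k - 2 ln L, BIC = k ln n - 2 ln L and HQIC = 2k ln(ln n) - 2 ln L.

```python
import math

# Values copied from the summary output.
log_likelihood = -5980.084
n_obs = 594
k_params = 5  # assumption: ma.L1, ar.S.L7, ma.S.L7, ma.S.L14 and sigma2

aic = 2 * k_params - 2 * log_likelihood
bic = k_params * math.log(n_obs) - 2 * log_likelihood
hqic = 2 * k_params * math.log(math.log(n_obs)) - 2 * log_likelihood

print(f"AIC:  {aic:.3f}")
print(f"BIC:  {bic:.3f}")
print(f"HQIC: {hqic:.3f}")
```

Because `simple_differencing=True` fits the model to the already differenced series, these values are only directly comparable with models that use the same differencing orders.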

In [95]:
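# One-step-ahead (non-dynamic) in-sample predictions from observation 480 onward.
pred = results.get_prediction(start=480, dynamic=False)
pred_ci = pred.conf_int()
# Note: the predictions are for D.DS7.daily_rev (regular plus seasonal difference),
# so the observed series should be differenced the same way for a like-for-like plot.
ax = timeseries1['Daily_Rev First Difference'][487:].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 4))
# Optional: shade the confidence interval.
# ax.fill_between(pred_ci.index, pred_ci.iloc[:, 0], pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Daily Revenue')
plt.legend()
plt.show()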
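
Because the model was fitted with `simple_differencing=True`, its predictions are on the scale of the dependent variable `D.DS7.daily_rev`, that is, the series after one regular and one seasonal (lag 7) difference. A one-step-ahead prediction can be mapped back to revenue with the identity y[t] = w[t] + y[t-1] + y[t-7] - y[t-8], where w is the differenced series. The following is a minimal sketch on made-up numbers; the helper names `difference` and `undo_difference_one_step` are hypothetical and not part of statsmodels.

```python
def difference(y, season=7):
    """Apply (1 - B)(1 - B**season): w[t] = y[t] - y[t-1] - y[t-season] + y[t-season-1]."""
    lag = season + 1
    return [y[t] - y[t - 1] - y[t - season] + y[t - lag] for t in range(lag, len(y))]


def undo_difference_one_step(w_t, history, season=7):
    """Recover y[t] from w[t] and the observed values up to t-1."""
    lag = season + 1
    return w_t + history[-1] + history[-season] - history[-lag]


# Made-up revenue figures, for illustration only.
y = [100, 120, 90, 95, 110, 150, 160, 105, 125, 92, 99, 113, 158, 170]
w = difference(y)

# Rebuild each y[t] from its differenced value and the preceding observations.
rebuilt = [undo_difference_one_step(w[i], y[: i + 8]) for i in range(len(w))]
print(rebuilt == y[8:])
```

In practice `w_t` would be an entry of `pred.predicted_mean` and `history` the observed revenue up to the previous day, provided both series are aligned on the same index. Fitting with `simple_differencing=False` instead lets statsmodels return predictions on the original scale.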