### Final Model

After following the algorithm described above, we found that the best model is a SARIMA(0,1,1)x(1,1,2) with a seasonal period of 7 days:

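import matplotlib.pyplot as plt
import statsmodels.api as sm

# simple_differencing=True differences the series before estimation, so the
# model is fitted to (and predicts) the differenced data, not raw revenue.
model = sm.tsa.statespace.SARIMAX(dfRevClean["daily_rev"], order=(0, 1, 1), seasonal_order=(1, 1, 2, 7), simple_differencing=True)
results = model.fit()

print(results.summary())

# Four panels: standardized residuals, histogram with density, normal Q-Q plot, correlogram
results.plot_diagnostics(figsize=(15, 12))
plt.show()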

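The four diagnostic panels can be read as follows.

Top left: The standardized residuals fluctuate around a mean of zero with no obvious trend. Their spread looks roughly constant by eye, although the heteroskedasticity test in the model summary (H = 2.57, Prob(H) = 0.00) suggests the residual variance changes over the sample.

Top right: The density plot is roughly bell-shaped and centered on zero, which is close to a normal distribution, but the Jarque-Bera test in the summary (JB = 60.71, Prob(JB) = 0.00) rejects exact normality.

Bottom left: In the Q-Q plot, the dots would fall on the red line if the residuals were normally distributed. We see noticeable deviations, consistent with the skew (0.43) and kurtosis (4.31) reported in the summary.

Bottom right: The correlogram (ACF plot) shows that the residuals are not autocorrelated, in line with the Ljung-Box test at lag 1 (Prob(Q) = 0.67). Any remaining autocorrelation would imply a pattern in the residuals that the model does not explain.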
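As a rough cross-check of the normality reading, the Jarque-Bera statistic can be recomputed from the skew and kurtosis in the model summary. The sketch below assumes the textbook formula JB = n/6 · (S² + (K − 3)²/4) and plugs in the rounded summary values (n = 594, S = 0.43, K = 4.31), so it matches the reported 60.71 only approximately. The function name `jarque_bera_from_moments` is ours for illustration, not a statsmodels API.

```python
import math

def jarque_bera_from_moments(n, skew, kurtosis):
    """Jarque-Bera statistic and p-value from sample skewness and (non-excess) kurtosis."""
    jb = n / 6.0 * (skew ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality, JB is asymptotically chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Rounded values taken from the model summary
jb, p_value = jarque_bera_from_moments(n=594, skew=0.43, kurtosis=4.31)
print(f"JB = {jb:.2f}, p-value = {p_value:.2e}")
```

With these inputs the statistic comes out at about 60.78, and the p-value is far below 0.05, so the residuals are close to bell-shaped but not exactly normal.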

The AIC is also lower than for our first model and for the other candidates we tried before settling on this one.
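To make the comparison concrete, the information criteria in the model summary can be reproduced from the log-likelihood. This is a minimal sketch, assuming the standard definitions AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L and HQIC = 2k ln(ln n) − 2 ln L, with k = 5 estimated parameters (ma.L1, ar.S.L7, ma.S.L7, ma.S.L14, sigma2) and n = 594 observations. The helper name `information_criteria` is hypothetical.

```python
import math

def information_criteria(llf, k_params, nobs):
    """Return (AIC, BIC, HQIC) for a model with log-likelihood llf."""
    aic = 2 * k_params - 2 * llf
    bic = k_params * math.log(nobs) - 2 * llf
    hqic = 2 * k_params * math.log(math.log(nobs)) - 2 * llf
    return aic, bic, hqic

# Values taken from the model summary: Log Likelihood = -5980.084, 594 observations
aic, bic, hqic = information_criteria(llf=-5980.084, k_params=5, nobs=594)
print(f"AIC={aic:.3f}  BIC={bic:.3f}  HQIC={hqic:.3f}")
```

One caveat when ranking candidates this way: AIC values are only comparable between models fitted to the same data. With `simple_differencing=True`, candidates that use different differencing orders are fitted to different series, so their AICs should be compared with care.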

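# In-sample one-step-ahead predictions from observation 480 onwards.
# The model was fitted on differenced data, so predictions are on that scale.
pred = results.get_prediction(start=480, dynamic=False)
pred_ci = pred.conf_int()  # 95% interval by default
ax = timeseries1['Daily_Rev First Difference'][487:].plot(label='observed', figsize=(14, 4))
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7)
# Uncomment to shade the confidence band:
# ax.fill_between(pred_ci.index, pred_ci.iloc[:, 0], pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Daily Revenue (differenced)')
plt.legend()
plt.show()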
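Because the model was fitted with `simple_differencing=True`, its predictions are on the differenced scale (the summary lists the dependent variable as `D.DS7.daily_rev`). A one-step-ahead forecast can be mapped back to revenue levels by undoing the first and seasonal differences. The sketch below assumes the differenced series is w_t = y_t − y_{t−1} − y_{t−7} + y_{t−8} and runs on synthetic data. `difference` and `to_level_one_step` are hypothetical helpers, not part of statsmodels.

```python
import numpy as np

SEASON = 7  # weekly period, as in seasonal_order=(1, 1, 2, 7)

def difference(y, season=SEASON):
    """Apply (1 - B)(1 - B^season); the first season + 1 values are lost."""
    y = np.asarray(y, dtype=float)
    lag = season + 1
    return y[lag:] - y[lag - 1:-1] - y[1:-season] + y[:-lag]

def to_level_one_step(w_hat, y, season=SEASON):
    """One-step-ahead level forecasts from differenced-scale forecasts.

    w_hat[i] is the forecast of the differenced value at time t = i + season + 1;
    observed levels up to t - 1 are used to undo the differencing.
    """
    y = np.asarray(y, dtype=float)
    w_hat = np.asarray(w_hat, dtype=float)
    lag = season + 1
    return w_hat + y[lag - 1:-1] + y[1:-season] - y[:-lag]

# Synthetic revenue-like series: trend + weekly pattern + noise (illustrative only)
rng = np.random.default_rng(0)
t = np.arange(60)
y = 1000 + 5 * t + 200 * np.sin(2 * np.pi * t / SEASON) + rng.normal(0, 50, t.size)

w = difference(y)
# Sanity check: feeding the true differenced values back in recovers the levels
y_back = to_level_one_step(w, y)
print(np.allclose(y_back, y[SEASON + 1:]))
```

In practice `w_hat` would be the model's `predicted_mean`, aligned so that each forecast is paired with the observed revenue from the preceding days. That alignment depends on how the series was indexed and is not shown here.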

| | | | |
|---|---|---|---|
| Dep. Variable: | D.DS7.daily_rev | No. Observations: | 594 |
| Model: | SARIMAX(0, 0, 1)x(1, 0, [1, 2], 7) | Log Likelihood | -5980.084 |
| Date: | Sun, 16 May 2021 | AIC | 11970.168 |
| Time: | 22:52:08 | BIC | 11992.102 |
| Sample: | 0 - 594 | HQIC | 11978.710 |
| Covariance Type: | opg | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| ma.L1 | -0.5369 | 0.052 | -10.284 | 0.000 | -0.639 | -0.435 |
| ar.S.L7 | -0.6119 | 7.303 | -0.084 | 0.933 | -14.925 | 13.701 |
| ma.S.L7 | -0.2659 | 7.331 | -0.036 | 0.971 | -14.635 | 14.103 |
| ma.S.L14 | -0.5294 | 6.414 | -0.083 | 0.934 | -13.101 | 12.042 |
| sigma2 | 4.936e+07 | 1.78e-06 | 2.77e+13 | 0.000 | 4.94e+07 | 4.94e+07 |

| | | | |
|---|---|---|---|
| Ljung-Box (L1) (Q): | 0.18 | Jarque-Bera (JB): | 60.71 |
| Prob(Q): | 0.67 | Prob(JB): | 0.00 |
| Heteroskedasticity (H): | 2.57 | Skew: | 0.43 |
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 4.31 |
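The coefficient table can be cross-checked in the same spirit: each z value is the coefficient divided by its standard error, and the interval is coef ± 1.96 · std err. The sketch below assumes Wald-type intervals with a normal critical value and uses the rounded figures from the table, so it agrees with the summary only up to rounding. `wald_summary` is a hypothetical helper name.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # about 1.96 for a 95% interval

def wald_summary(coef, std_err):
    """Return (z, two-sided p-value, lower bound, upper bound)."""
    z = coef / std_err
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value, coef - Z_CRIT * std_err, coef + Z_CRIT * std_err

# (coef, std err) pairs copied from the summary table
rows = {
    "ma.L1": (-0.5369, 0.052),
    "ar.S.L7": (-0.6119, 7.303),
    "ma.S.L7": (-0.2659, 7.331),
    "ma.S.L14": (-0.5294, 6.414),
}
table = {name: wald_summary(coef, se) for name, (coef, se) in rows.items()}
for name, (z, p, lo, hi) in table.items():
    print(f"{name:9s} z={z:8.3f}  p={p:.3f}  95% CI=[{lo:.3f}, {hi:.3f}]")
```

Read this way, only `ma.L1` has an interval that excludes zero. The three seasonal coefficients have intervals that comfortably include zero, so individually they are not distinguishable from zero at the 5% level.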
