The final part of our analysis examines the store's sales and attempts to forecast future sales and revenue using time series analysis techniques and models.
We used the tables "olist_orders_dataset" and "olist_order_items_dataset". The orders dataset includes the timestamp of every completed order, and the order_items dataset includes the price of the items in every order. We applied standard data manipulations (data cleaning, dropping unneeded columns, merging, etc.) and created a final dataset that contains the dates (as the index) and the total daily revenue (under the column "price").
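The preparation described above can be sketched roughly as follows. This is a minimal illustration on invented toy rows, not the original notebook code. It assumes that "completed" means order_status == "delivered" and that the order date is taken from order_purchase_timestamp. The helper name daily_revenue is hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two Olist tables (all values invented for illustration).
orders = pd.DataFrame({
    "order_id": ["a", "b", "c", "d"],
    "order_status": ["delivered", "delivered", "canceled", "delivered"],
    "order_purchase_timestamp": [
        "2018-01-01 10:00:00",
        "2018-01-01 15:30:00",
        "2018-01-02 09:00:00",
        "2018-01-03 12:00:00",
    ],
})
items = pd.DataFrame({
    "order_id": ["a", "a", "b", "c", "d"],
    "price": [10.0, 5.0, 20.0, 7.5, 8.0],
})


def daily_revenue(orders: pd.DataFrame, items: pd.DataFrame) -> pd.DataFrame:
    """Return total revenue per calendar day, indexed by date (hypothetical helper)."""
    # Assumption: only delivered orders count as completed.
    completed = orders.loc[
        orders["order_status"] == "delivered",
        ["order_id", "order_purchase_timestamp"],
    ].copy()
    # Drop the time of day so that all orders of one day share the same key.
    completed["date"] = pd.to_datetime(
        completed["order_purchase_timestamp"]
    ).dt.normalize()
    # Attach item prices to each order, then sum them per day.
    merged = completed.merge(items[["order_id", "price"]], on="order_id", how="inner")
    daily = merged.groupby("date")[["price"]].sum()
    # Make the index a regular daily grid; days without sales get 0 revenue.
    return daily.asfreq("D", fill_value=0.0)


revenue = daily_revenue(orders, items)
print(revenue)
```

Filling missing days with zero keeps the series on a regular daily grid, which the time series models in the following sections expect.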
Investigating Autocorrelation and Partial Autocorrelation
Creating Train/Test split
ARMA model
Investigating Non Stationarity
The next step is to analyze the stationarity of the process. The first step here is to difference the data, that is, to create a new time series consisting of the differences between each observation and the previous one.
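As a small sketch of this step, the snippet below applies first-order differencing with pandas to an invented ten-day revenue series. It also shows a seasonal difference at lag 7. That second step is an assumption, added because daily sales data often shows a weekly pattern.

```python
import pandas as pd

# Invented daily revenue values, for illustration only.
dates = pd.date_range("2018-01-01", periods=10, freq="D")
revenue = pd.Series(
    [100.0, 120.0, 90.0, 130.0, 150.0, 200.0, 180.0, 110.0, 125.0, 95.0],
    index=dates,
    name="price",
)

# First difference: y_t - y_{t-1}. The first value has no predecessor, so drop it.
first_diff = revenue.diff().dropna()

# Seasonal difference of the differenced series at lag 7 (weekly period, assumed).
seasonal_diff = first_diff.diff(7).dropna()

print(first_diff.head())
print(seasonal_diff)
```

Each differencing step shortens the series: the first difference loses one observation and the lag-7 difference loses seven more.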
Selecting Final Model
After this step it was clear that a seasonal ARIMA (SARIMA) model should be used. The analysis concludes that SARIMA(0,1,1)x(1,1,2)7, that is, non-seasonal order (p,d,q) = (0,1,1) and seasonal order (P,D,Q) = (1,1,2) with a period of 7 days, gives the best fit to our data. In the next article, a non-parametric method, a Recurrent Neural Network, will be applied to forecast sales in the Olist dataset.
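For reference, the SARIMA(0,1,1)x(1,1,2)7 model named above can be written with the backshift operator B, where y_t is the daily revenue and ε_t is white noise. The moving-average terms are written here with plus signs, which is an assumption about convention (the classical Box-Jenkins notation uses minus signs). The coefficient symbols are placeholders, not fitted values from the analysis.

```latex
(1 - \Phi_1 B^{7})\,(1 - B)\,(1 - B^{7})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{7} + \Theta_2 B^{14})\,\varepsilon_t
```

The factors (1 - B) and (1 - B^7) are the regular and weekly differences, Φ_1 is the single seasonal autoregressive term, θ_1 is the non-seasonal moving-average term, and Θ_1, Θ_2 are the two seasonal moving-average terms.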