Applying Autoregressive models for sales prediction

The final part of our analysis examines the store's sales and attempts to forecast future sales and revenue using time series analysis techniques and models.

We used the tables “olist_orders_dataset” and “olist_order_items_dataset”. The orders dataset includes the timestamp of every completed order, and the order_items dataset includes the cost of every order. We applied some standard data manipulations (data cleaning, dropping columns, merging, etc.) and created a final dataset that contains the dates (as the index) and the overall daily revenue (under the column “price”).
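The preparation described above can be sketched roughly as follows. This is a minimal illustration, not the exact pipeline we ran: the column names `order_id` and `order_purchase_timestamp`, the helper `daily_revenue`, and the tiny inline tables are assumptions made for the example.

```python
import pandas as pd


def daily_revenue(orders: pd.DataFrame, items: pd.DataFrame) -> pd.DataFrame:
    """Merge orders with their items and aggregate the price per calendar day.

    Column names are assumed for illustration; adjust them to the real tables.
    """
    # Keep only the columns needed for the time series.
    orders = orders[["order_id", "order_purchase_timestamp"]].copy()
    orders["order_purchase_timestamp"] = pd.to_datetime(
        orders["order_purchase_timestamp"]
    )

    # An order can contain several items, so sum the price per order first.
    order_totals = items.groupby("order_id", as_index=False)["price"].sum()

    # Attach each order's total to its timestamp.
    merged = orders.merge(order_totals, on="order_id", how="inner")

    # Resample to daily frequency; days without orders get a revenue of 0.
    daily = (
        merged.set_index("order_purchase_timestamp")["price"]
        .resample("D")
        .sum()
        .to_frame("price")
    )
    daily.index.name = "date"
    return daily


# Toy stand-ins for the two tables (hypothetical values).
orders = pd.DataFrame(
    {
        "order_id": ["a", "b", "c"],
        "order_purchase_timestamp": [
            "2018-01-01 10:00:00",
            "2018-01-01 18:30:00",
            "2018-01-03 09:15:00",
        ],
    }
)
items = pd.DataFrame(
    {
        "order_id": ["a", "a", "b", "c"],
        "price": [10.0, 5.0, 20.0, 7.5],
    }
)

daily = daily_revenue(orders, items)
print(daily)
```

Resampling to a regular daily index matters for the later modelling steps, because ARMA-type models expect evenly spaced observations.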

Investigating Autocorrelation and Partial Autocorrelation

Creating Train/Test split

ARMA model

Investigating Non Stationarity

The next step is to analyze the stationarity of the process. The first step here is to difference the data, that is, to create a new time series consisting of the differences between each observation and the previous one.
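As a small illustration of differencing, the sketch below uses pandas on made-up revenue values. The series itself is hypothetical, and the lag-7 (weekly) difference is shown only as an assumption that fits daily data with a weekly pattern.

```python
import pandas as pd

# Hypothetical daily revenue values, for illustration only.
revenue = pd.Series(
    [100.0, 120.0, 90.0, 130.0, 150.0, 200.0, 210.0, 110.0, 125.0, 100.0],
    index=pd.date_range("2018-01-01", periods=10, freq="D"),
    name="price",
)

# First difference: y'_t = y_t - y_{t-1}. The first value has no
# predecessor, so it becomes NaN and is dropped.
first_diff = revenue.diff().dropna()

# Seasonal difference with lag 7: y_t - y_{t-7}, which removes a
# repeating weekly pattern.
weekly_diff = revenue.diff(7).dropna()

print(first_diff)
print(weekly_diff)
```

Each round of differencing shortens the series: the first difference loses one observation and the lag-7 difference loses seven.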

Selecting Final Model

After this, it was clear that a SARIMA model should be used. The analysis concludes that SARIMA(0,1,1)x(1,1,2)7 gives the best fit to our data. In the next article, a non-parametric method, a recurrent neural network, will be applied to forecast sales in the Olist dataset.