Ensemble Methodology for Demand Forecasting


Abstract—Generating forecasts is one of the most important activities in any industry. An accurate forecast helps not only the demand planner, who focuses primarily on future sales, but also inventory management and its handling cost. Down the line of a company's supply chain, a wrong forecast can incur heavy costs, while a correct forecast with good accuracy directly shapes the company's profit and loss. Both depend heavily on the forecasting engine the company uses. Demand planners look for consistently accurate results over a period of time: the accuracy of the forecast should not only be good, but the algorithm should also remain stable over a long horizon. We illustrate this through our analysis, and the model accounts for the stability of the algorithm selected for each SKU. In this paper we present a new ensemble technique based on an averaging method, which gives priority to the algorithm that consistently maintains good accuracy and decreases the deviation from actual sales. Based on the history, it gives importance to the model that predicts better and penalizes the algorithms that deviate from actual sales.

I. Introduction

In any supply chain industry, demand forecasting plays a vital role, not only in bringing profit but also in maintaining the right quantity of products at the right time. It is one of the key driving factors in planning and decision making at the supply chain management and enterprise levels. The accuracy of the demand forecast is a major input to decisions such as capacity building, resource allocation, expansion, and forward or backward integration. Forecasting is a part of machine learning known as "prediction" [1], and it falls mainly under the regression techniques defined in the literature. The major concern in forecasting is the deviation from actual sales: achieving one hundred percent accuracy is a difficult job for all researchers in this field, and as accuracy increases, the deviation from actual sales decreases. A further concern is under- or over-forecasting [2]. The accuracy curve can be assumed to be bell-shaped, so that for a given accuracy there can be two forecast values, one on either side of one hundred percent accuracy; the kind of forecast the machine generates therefore matters. Here we discuss a new ensemble technique [3][4], a kind of averaging method, that improves forecast accuracy over state-of-the-art statistical or time-series algorithms [5] such as ARIMA, Moving Average, and Exponential Smoothing, and over regression techniques such as Linear Regression and Support Vector Regression. Our method combines these two families of algorithms to decrease the forecast deviation from actual sales.

In the second section we present the existing models in the market for dealing with the time-series problem. The third section discusses the ensemble technique, followed by results that support our approach.

II. Existing Methodology

Demand forecasting is basically a regression problem, but with time as the primary constraint. The basic intelligence behind the existing models in the market consists of statistical or time-series algorithms and other regression-based models. The time-series models include algorithms such as ARIMA, Moving Average, Weighted Moving Average, the Croston model, and ETS; the regression-based models include Support Vector Regression, Decision Tree Regression, Random Forest Regression, Linear Regression, Lasso Regression, Artificial Neural Networks, and Recurrent Neural Networks [6]. All the models try to minimize the error, or deviation from the actuals, on the validation set. This leads to the selection of the better model from the bag of models present in the software for predicting the future. After considerable experimentation, MSE was fixed as the evaluation parameter. The flow used to forecast the future is as follows:

1) Data Processing

• Outlier Detection: Points in the time line whose value (sales, in this case) lies outside mean ± 3 × standard deviation are considered outliers and are imputed with the average of the 3 nearest neighbours in the time line, or capped at the defined threshold [7].
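A minimal sketch of this outlier rule in Python; the function and parameter names are ours for illustration, not part of the production system:

```python
import numpy as np

def treat_outliers(sales, n_sigma=3, k_neighbors=3):
    """Flag points outside mean +/- n_sigma * std and impute each one
    with the average of its k nearest non-outlier neighbours on the
    time line (illustrative sketch of the outlier step)."""
    sales = np.asarray(sales, dtype=float)
    lo = sales.mean() - n_sigma * sales.std()
    hi = sales.mean() + n_sigma * sales.std()
    cleaned = sales.copy()
    for i, v in enumerate(sales):
        if v < lo or v > hi:
            # indices ordered by temporal distance from the outlier
            order = sorted(range(len(sales)), key=lambda j: abs(j - i))
            neighbours = [sales[j] for j in order
                          if j != i and lo <= sales[j] <= hi][:k_neighbors]
            cleaned[i] = np.mean(neighbours)
    return cleaned
```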

• Missing Value Imputation: Points in the time line having no value, or zero sales, are candidates for missing values. If the dataset has more than 40% zeros, it is considered highly sparse and the zeros are not treated or imputed. If the proportion of zeros is below this pre-defined threshold, they are imputed with either the mean of that year's sales or the average of the nearest neighbours in the time line [8].
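A sketch of the sparsity rule: the 40% threshold comes from the text, and the mean-based fill-in is one of the two imputation options mentioned (names are illustrative):

```python
import numpy as np

def impute_missing(sales, sparsity_threshold=0.40):
    """Leave highly sparse series (more than 40% zeros) untouched;
    otherwise replace zero/missing months with the mean of the
    non-zero sales (illustrative sketch)."""
    sales = np.asarray(sales, dtype=float)
    if np.mean(sales == 0) > sparsity_threshold:
        return sales                      # highly sparse: do not impute
    fill = sales[sales != 0].mean()       # mean of the observed sales
    return np.where(sales == 0, fill, sales)
```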

• Dataset Creation: We use two sets of models for generating the forecast: (1) state-of-the-art time-series algorithms and (2) other regression-based models, with the regressors for each defined accordingly. For the time-series algorithms, a moving window is used to create the dataset, with the lag optimized for each model and SKU (stock keeping unit). For the regression-based algorithms, after considerable experimentation against the accuracy metric, we fixed the lag at 12, so that part of the seasonality pattern recurs in the data.
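The fixed-lag dataset creation for the regression models can be sketched as follows (the function name and array layout are assumptions of this illustration):

```python
import numpy as np

def make_lag_dataset(series, lag=12):
    """Turn a monthly sales series into a supervised (X, y) dataset with
    a fixed moving window of `lag` months, so that yearly seasonality is
    visible to the regression models (sketch; lag = 12 as in the text)."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y
```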

2) Creating the Validation Sets: We use the hold-out method with a 70/30 ratio. The 70 percent of the dataset forming the training set is used for the following two things:

• Hyper-parameter optimization: each algorithm has hyper-parameters to optimize, whose values are chosen at the desired minimum/maximum of the cost function.

• Generating the forecast on the training set, to see how the forecast behaves on this set.

The validation set of 30 percent is used for fixing the model/algorithm that has the minimum MSE.

3) Model Input: The training part of the data is passed through all the algorithms in the particular model set (time-series algorithms or regression-based algorithms) to generate the forecast and to check how it deviates from the actual training data; the hyper-parameters for each algorithm are also fixed here. On the testing/validation part, each optimized algorithm generates a forecast, and we check which one gives the result closest to the actuals [9].
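Steps 2) and 3) together amount to a chronological hold-out selection, sketched below; the `models` mapping of names to forecast functions is a hypothetical interface used only for this illustration:

```python
import numpy as np

def select_model(series, models, train_frac=0.70):
    """Chronological 70/30 hold-out: generate each candidate's forecast
    from the first 70% of the series and keep the model with the minimum
    MSE on the remaining 30%. `models` maps a name to a function
    f(train, horizon) returning `horizon` forecasts (assumed interface)."""
    split = int(len(series) * train_frac)
    train, valid = series[:split], series[split:]
    mses = {name: float(np.mean((np.asarray(fn(train, len(valid)))
                                 - np.asarray(valid)) ** 2))
            for name, fn in models.items()}
    best = min(mses, key=mses.get)
    return best, mses
```

For example, on a trending series a naive last-value forecast should beat a global-mean forecast, and `select_model` would report it as `best`.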

4) Model Fix and Stability Test: The model/algorithm that forecasts consistently better over a range of 3 consecutive months is fixed for forecasting.

5) Forecast Generation: The model chosen in the above step is used to generate the forecast on the out-of-sample set of unseen months to come.

A. Forecast Accuracy Check:

There are many methods and metrics for checking accuracy in time-series prediction. In this case we follow two methods to measure the accuracy of the models.

The first measure is

FACC_k = 1 - |Σ_{i=1..k} F_i - Σ_{i=1..k} A_i| / Σ_{i=1..k} F_i

where F_i is the forecast for month i, A_i the corresponding actuals, and k the range of months over which the forecast accuracy is checked. If k = 3, the measure is termed MT, or Medium-Term, accuracy, which is very important in product management. When k = 1, it is termed the N-1 accuracy. For MT, the accuracy considers the cumulative sums over November-17, December-17 and January-18. Thus,

FACC_MT = 1 - |(F_Nov + F_Dec + F_Jan) - (A_Nov + A_Dec + A_Jan)| / (F_Nov + F_Dec + F_Jan)

On the other hand, when the N-1 accuracy is taken into consideration, for the same case the accuracy is checked on November-17 alone. Thus,

FACC_{N-1} = 1 - |F_Nov - A_Nov| / F_Nov

Apart from the above forecasting measures, some industries also use a standard accuracy metric with the actual sales in the denominator:

Accuracy = 1 - |Σ F_i - Σ A_i| / Σ A_i
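Assuming the readings above (cumulative forecast in the FACC denominator, cumulative actuals in the Accuracy denominator, both reconstructed from the surrounding discussion), the two metrics can be computed as:

```python
def facc(forecasts, actuals):
    """FACC with the cumulative forecast in the denominator; heavy
    under-forecasting can therefore drive it negative."""
    f, a = sum(forecasts), sum(actuals)
    return 1 - abs(f - a) / f

def accuracy(forecasts, actuals):
    """Accuracy with the cumulative actuals in the denominator, so only
    the deviation term varies with the model."""
    f, a = sum(forecasts), sum(actuals)
    return 1 - abs(f - a) / a
```

For a perfect forecast both return 1; forecasting 50 against actuals of 200 gives a negative FACC but a positive Accuracy, which is exactly the asymmetry discussed later in the text.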

Here we will show the results of our work on both accuracy measures and how our improved methodology proves better in both cases. Instead of checking model performance on only one month, we also concentrate on the stability of the models selected for each product or stock-keeping unit (SKU). Most importantly, the selected model should be good in both bias and variance at the same time; since achieving both is an ideal case, we will explain how this trade-off is handled in our setting.

B. Base Model:

The market uses its own intelligence to adjust the forecast. The models used are standard models in the market, enriched with market intelligence on top of the forecast, which is termed the Enriched Forecast or Final Forecast. The base models have the following steps:

1) Data Processing: Treatment of the outliers using the standard mean ± 3 × standard deviation method.

2) Model Input: The algorithms used are the SES model, the Croston model, Seasonal Linear Regression, and Double Exponential Smoothing. The model is chosen based on the kind of SKU: if the SKU behaves seasonally, only the seasonal models are chosen, and likewise for the other behaviors. In this existing scenario, mostly only seasonal behavior and trend behavior are taken into consideration.

3) Forecast Generation: Once the kind of SKU is fixed, the SKU is passed through that kind of model and the forecast is generated accordingly. On top of it, for those SKUs that do not follow a seasonal pattern, an external seasonality is applied, and if the generated forecast does not match the level of the actuals, it is corrected again externally.

4) Enrichment: Once the forecast is generated, the market uses its own intelligence to adjust the sales of each SKU and vary the output reported by the model. For example, in a particular market there may be a ban on a particular type of product whose sales history has been in a particular range; because the ban has not been implemented in the model as an external regressor, the model output will remain in that range.

This model is very saturated in terms of providing a good base forecast. The tools mostly used by the market, by tweaking the parameters and models present in them, are JDA and SAP-APO, which are among the top forecasting tools available. In the rest of this section we discuss the state-of-the-art machine learning algorithms used for the time-series application and the steps used to generate the forecast.

C. Models and Algorithms:

Here we discuss in more detail the existing scenario, which we have implemented to generate forecasts for a number of SKUs. State-of-the-art time-series algorithms such as ARIMA, ETS, Moving Average, and Weighted Moving Average are used, along with naive forecasting, which repeats the last year's sales as the forecast. Different regression-based methods are also used to generate the forecast: Support Vector Regression, Decision Tree Regression, Random Forest Regression, Lasso Regression, Ridge Regression, Linear Regression, and Multilayer Perceptron Regression. 1) Process: the procedure mostly consists of:

• generating the forecast on the validation set for both model families;

• checking the deviation from the actuals for both model families;

• fixing which model will be used to generate the forecast.

2) Model Description: The forecast is generated for three consecutive months, and importance is given to whichever of the time-series-based and regression-based models is chosen in the historical forecast generation. For example, if in the 3 historical runs the time-series-based model beats the regression-based model on FACC, then in the current forecast generation the time-series model is chosen over the regression model, and vice versa.

3) Shortcomings of this method: The accuracy compared at the end over all the SKUs suffers a lot because of the form of the Forecast Accuracy, defined by

FACC = 1 - |Σ F_i - Σ A_i| / Σ F_i

Removing the absolute value introduces a ±, which depends on over-forecasting versus under-forecasting: the forecast generated by the model is either more than the actuals for the particular accuracy-calculation month, or less than them, respectively. Elaborating the derivation:

Case 1: Over-forecasting (Σ F > Σ A):

FACC = 1 - (Σ F - Σ A) / Σ F = Σ A / Σ F, which stays between 0 and 1.

Case 2: Under-forecasting (Σ F < Σ A):

FACC = 1 - (Σ A - Σ F) / Σ F = 2 - Σ A / Σ F, which becomes negative when the actuals exceed twice the forecast.

Thus, in the under-forecasting case, if the forecast deviates a lot, the accuracy (FACC) can even go negative. As the above model discussion shows, the output provided is the best model according to the validation; of the two methods, the one that gives a better forecast most of the time is chosen for a set/cluster of the market, where a cluster mostly corresponds to one of the different kinds of market present in the data. Examining the two cases closely, the forecast accuracy is closely tied to the forecast generated by the model. In the current scenario, if we combine the forecasts generated by both model outputs for each cluster, the accuracy drops. This is due to the following reasons:

• Both over-forecast: the combined forecast deviates even further above the actuals.
• Both under-forecast: the combined forecast falls further below the actuals, toward negative forecast accuracy.
• One over-forecasts and the other under-forecasts: the combination tends to cancel the errors and minimize the deviation.

In the existing scenario, having the two models at two different levels of forecasting (one under and the other over) drives the accuracy badly when only one of them is selected.
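A toy example (the numbers are invented for illustration) of the third bullet: combining one over- and one under-forecast pulls the result toward the actuals:

```python
actuals = 100.0
over_forecast, under_forecast = 120.0, 85.0   # hypothetical model outputs

# A simple average of the two opposing forecasts lands between them,
# closer to the actuals than either individual forecast.
combined = (over_forecast + under_forecast) / 2

assert abs(combined - actuals) < abs(over_forecast - actuals)
assert abs(combined - actuals) < abs(under_forecast - actuals)
```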

In the second case, if the accuracy measure is

Accuracy = 1 - |Σ F_i - Σ A_i| / Σ A_i

the only non-constant term is the deviation |Σ F_i - Σ A_i|, as the forecast is the variable term while the actuals are fixed. This accuracy measure with k = 1 is the one used by most industries doing demand planning.


III. Proposed Ensemble Methodology

In the above method we can see that there are two kinds of forecast being generated:

• using the state-of-the-art time-series algorithms;

• using the other regression-based algorithms.

The new method ensembles the two forecasts with a specified weightage for each forecast and gives the result.

In the above figure, the model is basically a kind of neural network that decides the weightage for each model based upon the deviations/errors generated in the previous months [10]. The weightage is decided from the errors generated over the last three months when the forecast is checked against the actuals. Hence, unlike the usual ensemble, where the weightage is decided on the validation part of the model, here the weightage is decided from the errors the models generated in the previous history; the weightage thus comes into the picture after the greedy selection of the model. The steps followed in this method are:

1) Step 1: Data Processing. The outliers are treated using the MAD concept: instead of the standard mean ± 3 × standard deviation, we use median ± 3 × (median absolute deviation) [11] to decide whether a particular point (here, the sales for a particular month) is actually an outlier. Once outliers are flagged, the seasonality pattern of the flagged points is checked; points identified as seasonal are not imputed. The steps for outlier treatment using MAD (median absolute deviation) are:

1. Compute the MAD for each point over a rolling window of 12. For the first 6 points in the dataset the window uses the next 6 points, and for the last 6 points the previous points, instead of the full window of 12.

2. Mark the points that lie above/below median ± 3 × (median absolute deviation).

3. Check the seasonality of the points flagged as potential outliers.

4. Impute the top outlier points with median + 2 × (median absolute deviation) and the bottom outlier points with median - 1.5 × (median absolute deviation).

The factors 2 and 1.5 used above can be changed based on the data used for forecasting.
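The MAD treatment of Step 1 can be sketched as follows; the seasonality check from the text is omitted, and the edge handling is simplified to a truncated window:

```python
import numpy as np

def mad_treat(series, window=12, upper_k=2.0, lower_k=1.5):
    """Rolling median/MAD outlier treatment: flag points outside
    median +/- 3*MAD of their local window, cap top outliers at
    median + upper_k*MAD and floor bottom outliers at
    median - lower_k*MAD (illustrative sketch of Step 1)."""
    s = np.asarray(series, dtype=float)
    out = s.copy()
    half = window // 2
    for i in range(len(s)):
        win = s[max(0, i - half):min(len(s), i + half)]
        med = np.median(win)
        mad = np.median(np.abs(win - med))
        if s[i] > med + 3 * mad:
            out[i] = med + upper_k * mad      # cap top outlier
        elif s[i] < med - 3 * mad:
            out[i] = med - lower_k * mad      # floor bottom outlier
    return out
```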

2) Step 2: Dataset Creation. To generate the weights to be assigned to the forecast of each SKU from each model (the time-series model output and the regression-based model output) for a particular month, each model is executed 4 times: with the current month's actual sales file and with each of the 3 previous months'. For example, if the forecast is to be generated for September'17, the files considered are:

• Dataset with actuals till September'17: d_t
• Dataset with actuals till August'17: d_{t-1}
• Dataset with actuals till July'17: d_{t-2}
• Dataset with actuals till June'17: d_{t-3}

3) Step 3: Model Run. Each dataset d_i, for i = t, t-1, t-2, t-3, is input to the two models defined above.

4) Step 4: Forecast Generation. The outputs generated by the two models are f_i, i = t+1, ..., t+17 from the time-series model and h_i, i = t+1, ..., t+17 from the regression-based model.

5) Step 5: Weights Generation. The weight for each model is found from the deviation of its forecast from the actuals in the back-test runs. Let a_i be the actuals for the runs i = t-1, t-2, t-3, with f_i and h_i the corresponding one-step forecasts of the two models, and let w1_i and w2_i be the weights assigned to the two models. Then

w1_i = |h_i - a_i| / (|f_i - a_i| + |h_i - a_i|)
w2_i = |f_i - a_i| / (|f_i - a_i| + |h_i - a_i|)

so that each model's weight grows with the other model's error and w1_i + w2_i = 1.

This penalizes the model that performs badly in the validation model runs (the previous months used to generate the weights), thus decreasing the variance obtained by treating both the time-series-based model and the regression-based model as weak learners.

The weight to be multiplied with the forecast generated for the t+1 months (or time frames) onwards is the average over the three runs:

w1 = (w1_{t-1} + w1_{t-2} + w1_{t-3}) / 3

This happens for each model. The number of terms inside the average can vary depending on the type of dataset, and the number of model runs varies with it.
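Steps 5 and 6 can be sketched as follows; the exact weight formula does not survive in the text, so a normalized inverse-error form consistent with the stated penalization behaviour is assumed (all names are ours):

```python
import numpy as np

def ensemble_weights(errors_ts, errors_reg):
    """Average, over the last three back-test runs, of normalized
    inverse-error weights: the model with the smaller historical
    deviation gets the larger weight (assumed form, consistent with
    the penalization described in Step 5)."""
    e1 = np.asarray(errors_ts, dtype=float)   # |forecast - actual| per run
    e2 = np.asarray(errors_reg, dtype=float)
    w1 = float(np.mean(e2 / (e1 + e2)))       # time-series model weight
    w2 = float(np.mean(e1 / (e1 + e2)))       # regression model weight
    return w1, w2                             # w1 + w2 == 1

def ensemble_forecast(f_ts, f_reg, w1, w2):
    """Step 6: final forecast as the weighted sum of the two outputs."""
    return w1 * np.asarray(f_ts, dtype=float) + w2 * np.asarray(f_reg, dtype=float)
```

For instance, if the time-series model's errors over the three runs are [1, 1, 1] and the regression model's are [3, 3, 3], the weights come out as 0.75 and 0.25, shifting the final forecast toward the historically more accurate model.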

6) Step 6: Final Forecast Generation. The weights generated in the above step are multiplied with the respective forecasts generated by the models to give the final forecast output. In the next and final step, some market intelligence is incorporated to give the Final Forecast.

IV. Results

In this section we showcase the accuracy comparison against the accuracies of the time-series model and the regression-based model. The overall results are shown as a three-month consistent accuracy comparison. All results are compared for one market; for confidentiality, the name of the market and the SKU names are masked.

A. Accuracy Comparison using FACC measure:

In the above table, the total number of SKUs taken into consideration is 2909. The accuracy measure in the table is FACC with k = 3. For Jun-MT'17, the months considered for the weights generation are the previous months; thus, for the results generation, the actuals run till January'17, the forecast is generated from February'17 onwards, and the accuracy is calculated for April'17, May'17 and June'17 as the absolute sum of deviations from the corresponding sum of actuals. From these results we can see that the proposed method proves better than the individual models by a significant margin. The stability is also better, as the method is consistent across the time frame.

B. Accuracy Comparison using the Accuracy measure:

In the above table, the accuracy measure taken into account is Accuracy = 1 - |Σ F_i - Σ A_i| / Σ A_i. Thus we can see that in this table as well, our proposed ensemble methodology outperforms the individual models, the time-series model and the regression model. The results also show that the proposed solution is stable over the span of three months.

C. Accuracy Comparison for optimizing the weight measure:

For deciding which time period should be used to optimize the generation of the weights, the above results show that optimizing using N-1 (i.e., two months after the forecast generation) and optimizing using MT are comparable. In this work we have taken the results with the weights optimized using N-1.

D. Figures and Plots:

The figure below is an example of the dashboard generated for an SKU as the output of the forecasting methodology. It is clearly visible that the accuracy of the proposed result is better than the individual model outputs.

For the three-month stability of the model output, we show the graphs below.

E. Plot for June Forecast

F. Plot for July Forecast

G. Plot for Aug Forecast

From the above figures, we can see that the proposed model smooths the difference generated by the two models across the consecutive months; thus, in line with the theory, the variance decreases. If this scenario did not prevail, the accuracy would not improve and the model would not perform better.

V. Conclusion

The model we have presented uses the state-of-the-art statistical methods from the demand forecasting field. The results above show that ensembling the results of the time-series model and the regression-based model gives a better result, by nullifying over-forecasting against under-forecasting, bringing the forecast values near the actuals in most cases, and giving weightage to the model that performed better in the previous history.

On top of this, the ensembling minimizes the deviation: averaging two low-bias, high-variance models yields a result with low bias and low variance. Though the bias increases slightly, the decrease in variance dominates, improving the accuracy measures beyond those of the individual models. These results are far better than considering the individual algorithms used in the two models.

Accurate forecasts are very important for the demand planning team. The model in this research was built using sales-in data for one particular market and then tested across different data. Important considerations are the stability of the model and the removal of game-playing. Two open-source platforms were used to build the model: the time-series model was developed in RStudio, and the regression-based model, using data mining algorithms, was developed in Python. After the results were generated, the ensemble of results was validated and generated using Microsoft Excel. As future scope, the time-series and machine learning models are to be built on one platform, to check how the minimization of MSE shapes the forecast.


References

[1] Hsu, Che-Chiang, and Chia-Yon Chen. "Applications of improved grey prediction model for power demand forecasting." Energy Conversion and Management 44.14 (2003): 2241-2249.

[2] Witt, Stephen F., and Christine A. Witt. "Forecasting tourism demand: A review of empirical research." International Journal of Forecasting 11.3 (1995): 447-475.

[3] Dietterich, Thomas G. "Ensemble methods in machine learning." Multiple Classifier Systems 1857 (2000): 1-15.

[4] Giorgi, Filippo, and Linda O. Mearns. "Calculation of average, uncertainty range, and reliability of regional climate changes from AOGCM simulations via the reliability ensemble averaging (REA) method." Journal of Climate 15.10 (2002): 1141-1158.

[5] Shumway, Robert H., and David S. Stoffer. "An approach to time series smoothing and forecasting using the EM algorithm." Journal of Time Series Analysis 3.4 (1982): 253-264.

[6] Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.

[7] Caussinus, Henri, and Olivier Mestre. "Detection and correction of artificial shifts in climate series." Journal of the Royal Statistical Society: Series C (Applied Statistics) 53.3 (2004): 405-425.

[8] Giering, Michael. "Retail sales prediction and item recommendations using customer demographics at store level." ACM SIGKDD Explorations Newsletter 10.2 (2008): 84-89.

[9] Yuan, Ming, and Yi Lin. "Model selection and estimation in regression with grouped variables." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68.1 (2006): 49-67.

[10] Kumar, P. Arun Raj, and S. Selvakumar. "Detection of distributed denial of service attacks using an ensemble of adaptive and hybrid neuro-fuzzy systems." Computer Communications 36.3 (2013): 303-319.

[11] Leys, Christophe, et al. "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median." Journal of Experimental Social Psychology 49.4 (2013): 764-766.
