Identifying outliers: The first step is to find the outliers, either by inspecting the data manually or by applying a statistical rule. Some common methods for identifying outliers include:
- The interquartile range (IQR): The IQR measures the spread of the middle half of the data. To calculate it, find the first quartile (Q1) and the third quartile (Q3); the IQR is Q3 minus Q1. Data points that fall more than 1.5 times the IQR below Q1 or above Q3 are considered outliers.
- The z-score: The z-score measures how far a data point is from the mean, in units of standard deviation. A z-score with an absolute value of 2 or more means the data point lies at least 2 standard deviations from the mean, in either direction. Data points at or beyond this cutoff are considered outliers; a stricter cutoff of 3 is also widely used.
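- Cook’s distance: Cook’s distance measures how much the fitted model changes when a single data point is left out. Data points with a Cook’s distance of 1 or more are considered highly influential and should be examined as possible outliers. Some practitioners use a lower cutoff of 4/n, where n is the number of observations.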
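The three rules above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the sample `demand` series, and the choice of a straight-line trend as the model behind Cook's distance are all assumptions made for the example.

```python
import numpy as np


def iqr_outliers(x, k=1.5):
    """Flag points more than k * IQR below Q1 or above Q3."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x, threshold=2.0):
    """Flag points whose absolute z-score is at least `threshold`."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)  # sample standard deviation
    return np.abs(z) >= threshold


def cooks_distance(t, y):
    """Cook's distance of each point in a straight-line fit y ~ a + b*t."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t])  # design matrix: intercept, slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    mse = resid @ resid / (n - p)
    # Leverage of each point: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid**2 / (p * mse) * h / (1 - h) ** 2


# Hypothetical weekly demand with one spike in week 8.
demand = np.array([100, 102, 98, 101, 99, 103, 97, 100, 250, 101, 99, 102])
weeks = np.arange(len(demand))

print("IQR rule:      ", np.flatnonzero(iqr_outliers(demand)))
print("z-score rule:  ", np.flatnonzero(zscore_outliers(demand)))
print("largest Cook's:", int(np.argmax(cooks_distance(weeks, demand))))
```

The IQR and z-score rules look only at the demand values, while Cook's distance depends on the model being fitted, so the same point can score differently under a different forecasting model.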
Removing outliers: Once you have identified the outliers, you can decide whether to remove them from the data set. There are a few things to consider when making this decision:
- The cause of the outlier: If the outlier is caused by a data entry error or an equipment malfunction, you should correct it or remove it from the data set. However, if the outlier is caused by a real event, you may want to keep it in the data set.
- The impact of the outlier: If the outlier has a small impact on the fit of the model, you may not need to remove it from the data set. If it has a large impact, removal is easier to justify, although an outlier caused by a real event may still be worth keeping and handling separately.
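One way to gauge the impact of an outlier is to fit the model twice, with and without the suspect point, and compare the resulting forecasts. The sketch below assumes a simple straight-line trend as the forecasting model and a hypothetical demand series in which week 8 has already been flagged; `trend_forecast` is an illustrative helper, not part of any library.

```python
import numpy as np


def trend_forecast(t, y, t_next):
    """Fit a straight line y ~ a + b*t and extrapolate it to t_next."""
    slope, intercept = np.polyfit(t, y, deg=1)
    return intercept + slope * t_next


# Hypothetical weekly demand; week 8 is assumed to have been flagged already.
demand = np.array([100, 102, 98, 101, 99, 103, 97, 100, 250, 101, 99, 102],
                  dtype=float)
weeks = np.arange(len(demand))
is_outlier = np.zeros(len(demand), dtype=bool)
is_outlier[8] = True

next_week = len(demand)
with_outlier = trend_forecast(weeks, demand, next_week)
without_outlier = trend_forecast(weeks[~is_outlier], demand[~is_outlier], next_week)
impact = with_outlier - without_outlier

print(f"forecast with outlier:    {with_outlier:.1f}")
print(f"forecast without outlier: {without_outlier:.1f}")
print(f"difference:               {impact:.1f}")
```

A large difference relative to typical demand suggests the point is driving the forecast and deserves a closer look before deciding whether to keep it.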
Treating outliers as separate data points: Another option is to treat outliers as separate data points and forecast them separately. This can be done by using a different forecasting method for outliers or by creating a separate data set for outliers.
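As a rough sketch of the separate-data-set idea, the series can be split into a baseline and a set of spikes, with each part forecast on its own. Everything here is an assumption made for illustration: the sample data, the fixed cutoff standing in for a detection rule, the use of simple averages as the forecasting method, and the `forecast` helper.

```python
import numpy as np

# Hypothetical weekly demand with two spikes.
demand = np.array([100, 102, 98, 250, 101, 99, 103, 240, 97, 100, 101, 99],
                  dtype=float)

# Stand-in for whichever detection rule was used to flag the outliers.
is_outlier = demand > 150

baseline = demand[~is_outlier]  # ordinary weeks
spikes = demand[is_outlier]     # outlier weeks, kept as their own data set

# Forecast each data set separately, here with plain averages.
baseline_forecast = baseline.mean()
spike_uplift = spikes.mean() - baseline_forecast


def forecast(event_expected):
    """Baseline forecast, plus the average spike uplift if an event is expected."""
    return baseline_forecast + (spike_uplift if event_expected else 0.0)


print(forecast(event_expected=False))
print(forecast(event_expected=True))
```

Keeping the spikes in their own data set means the baseline forecast is not inflated by them, while the information they carry is still available for periods where a similar event is expected.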
Using outliers to improve the forecast: Finally, you can use outliers to improve the forecast. If you can identify what caused an outlier and expect that cause to recur, you can adjust the forecast for the affected periods accordingly.
The best way to manage outliers in demand forecasting will depend on the specific data set and the forecasting method. However, by understanding the different statistical methods for managing outliers, you can choose the method that is best for your data set.