Predicting Next Month's Number of Passengers with AI

OptiWisdom
5 min read · Feb 10, 2020


An airline needs to know how many passengers it will carry in a given period and how much profit it will generate. To do so, it has to predict the future, which is where artificial intelligence comes in. The OptiScorer engine, developed by OptiWisdom and built on artificial intelligence, predicts the number of passengers in the coming months from historical passenger data. The predictions here are based on the airline passenger data from https://www.kaggle.com/andreazzini/international-airline-passengers.

The dataset contains two columns. The first column, named “Month”, holds the timestamps in date-time format; the second column, named “International airline passengers”, holds the passenger counts. This is enough for the OptiScorer 2.0 engine to make predictions. According to the usage rules of OptiScorer 2.0, the first column must be in date-time format with day, month, year and hour, and the last column must contain the numeric value to be predicted. Since the dataset has only two columns, its first and last columns are already in the required format, and the predicted value is a number. The data in this analysis covers international airline passengers from January 1949 to December 1960. Here is a view of the dataset:
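The format rule above can be checked before a file is uploaded. The following is a minimal sketch in Python and is not part of OptiScorer: the `check_format` helper, the timestamp pattern `%Y-%m-%d %H:%M` and the sample rows are assumptions made for illustration; only the two column names come from the dataset.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout (year, month, day, hour); adjust it to the real file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"

def check_format(csv_text):
    """Parse the CSV text and return (header, rows); raise ValueError on a bad row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        try:
            stamp = datetime.strptime(row[0], TIMESTAMP_FORMAT)  # first column: date-time
            value = float(row[-1])                               # last column: a number
        except ValueError as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
        rows.append((stamp, value))
    return header, rows

# Illustrative rows in the two-column layout described above.
sample = """Month,International airline passengers
1949-01-01 00:00,112
1949-02-01 00:00,118
1949-03-01 00:00,132
"""

header, rows = check_format(sample)
print(header)
print(rows[0])
```

A row whose first field is not a timestamp, or whose last field is not a number, raises a `ValueError` that names the offending line.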

As a result, a single file is sufficient to train the model. Once the file is uploaded to OptiScorer 2.0, the predictions are produced; they are shown, together with the time series graphs, in the following figures.

Figure 1 — Time Series

In Figure 1, the grey line shows the actual number of passengers and the orange line shows the predicted number of passengers. The red line shows how daily trends change over the course of the year; it also makes trends around holidays and long-term trends in the number of passengers visible. The green line shows the seasonal component, so seasonal effects can be identified from the time series. Finally, the thin grey line behind the green line is the residual: the part of the data that the trend and seasonal components do not explain and that disturbs the stability of the series.

Figure 2 — Time Series

Figure 2 is a zoomed-in version of Figure 1. The orange line shows the number of international passengers predicted by OptiScorer 2.0 using machine learning. It gives the predictions for the next six months, which are derived from the number of international passengers in the earlier months.

Figure 3 — Time Series

In this figure, the dataset, trend, seasonal and residual series shown in Figure 1 are plotted separately.
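To make the split into trend, seasonal and residual components concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is an illustration only: the article does not say which method OptiScorer 2.0 uses, the `decompose_additive` helper is hypothetical, and the series is synthetic rather than the airline data. An even period (12 months) is assumed.

```python
def decompose_additive(series, period=12):
    """Split a series into trend, seasonal and residual parts (even period assumed)."""
    n = len(series)
    half = period // 2

    # Trend: centered moving average; the two end points of each window get half weight.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period

    # Seasonal: average the detrended values for each position in the cycle,
    # then shift the averages so that they sum to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    seasonal = [means[t % period] - offset for t in range(n)]

    # Residual: whatever the trend and seasonal parts do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual

# Synthetic monthly series: a linear trend plus a fixed yearly pattern (sums to zero).
pattern = [-20, -15, 0, -5, 0, 15, 30, 30, 10, -10, -25, -10]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]

trend, seasonal, residual = decompose_additive(series)
print(trend[6], seasonal[6], residual[6])
```

Because the synthetic series contains no noise, the residual is essentially zero wherever the trend is defined; on real data the residual carries the irregular movements. The first and last six months have no trend value because the moving-average window does not fit there.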

Figure 4 — Predictive Time Series

This figure shows the predictive time series. Figure 5 shows a zoomed-in version of the graph.

Figure 5 — Predictive Time Series with zoomed version

Figure 5 shows the predictive upper, predictive and predictive lower values explicitly. Together, these lines form a value corridor for future passenger numbers. Points where the actual data breaks out of this corridor are considered anomalies.
The figure shows a specific day of the year. The remaining months and years predicted by OptiScorer 2.0 can be seen in the Excel table below.
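The corridor rule can be written in a few lines. The sketch below is a hedged illustration, not OptiScorer code: the `find_anomalies` helper and all numbers are made up for the example, and a point is flagged only when it lies strictly outside the lower and upper bounds.

```python
def find_anomalies(actual, predictive_lower, predictive_upper):
    """Return the indices of points that fall outside the prediction corridor."""
    return [
        i
        for i, (value, low, high) in enumerate(
            zip(actual, predictive_lower, predictive_upper)
        )
        if value < low or value > high
    ]

# Hypothetical values for three consecutive periods.
actual = [500, 540, 600]
predictive_lower = [480, 513, 520]
predictive_upper = [520, 569, 580]

print(find_anomalies(actual, predictive_lower, predictive_upper))  # [2]
```

Only the third point is reported, because 600 lies above its upper bound of 580.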

Before explaining Figure 6 and Figure 7, it is useful to know the following.

In statistics, mean absolute error (MAE) is a measure of the difference between two continuous variables. Assume X and Y are variables of paired observations that express the same phenomenon. Examples of Y versus X include comparisons of predicted versus observed, subsequent time versus initial time, and one technique of measurement versus an alternative technique of measurement. Consider a scatter plot of n points, where point i has coordinates (xi, yi)… Mean Absolute Error (MAE) is the average vertical distance between each point and the identity line. MAE is also the average horizontal distance between each point and the identity line.

The mean absolute error is given by:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$
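In code, the same quantity is the mean of the absolute differences between paired values. This is a small sketch with made-up numbers; the `mean_absolute_error` function is written here for illustration and is not taken from OptiScorer.

```python
def mean_absolute_error(observed, predicted):
    """Average of |predicted - observed| over all paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical passenger counts: the absolute errors are 10, 5 and 15.
observed = [100, 120, 140]
predicted = [110, 115, 155]

print(mean_absolute_error(observed, predicted))  # 10.0
```

MAE is expressed in the same units as the data, so a value of 10 here means the predictions are off by 10 passengers on average.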

Figure 6 — Predictive-Real Comparison
Figure 7 — Predictive-Real Comparison with zoomed version

In these figures, you can see the predictive-real comparison for the dataset. The success score achieved for the data in Figure 6 and Figure 7 is 0.591, and the mean absolute error (MAE) is 24.827. The comparison is considered successful: an average error of about 25 is small relative to predicted values in the 500s, such as those quoted below. (The success score and the MAE are on different scales, so they should not be compared with each other directly.) In Figure 7, the orange area above the red line represents the maximum predicted value and the yellow area below the red line represents the minimum predicted value. On Jun 31, 1960, the highest predicted value is 569.2381, the lowest is 513.2834 and the net predicted value is 539.9818. The range between predictiveLower and predictiveUpper shown in the figure is called the corridor, and parts of the dataset line that fall outside this range are identified as anomalies. You can see the values for the remaining months and years in the table below.
