Previous research on sales prediction has typically relied on a single prediction model. However, no single model performs best for all kinds of merchandise, and accurate predictions for just one commodity are of little use to sellers, who need predictions for all of their commodities. This paper presents a novel trigger system that matches each kind of commodity with a suitable prediction model, giving better prediction results across different kinds of commodities. We identify several factors relevant to this classification and include several classical prediction models as basic models. We compare the results of the trigger model with those of the single models. The results show that the trigger model is more accurate than any single model. The implication for business is that sellers can use the proposed system to predict the sales of many commodities effectively.
Sales prediction is playing a growing and important role in many fields, such as economic forecasting, electric power forecasting, resource prediction, etc. Sales prediction is an important prerequisite for enterprise planning and correct decision making, allowing companies to better plan their business activities (
Sales prediction is important for offline businesses, especially car sales, real estate, and other everyday ventures (
Sales prediction for online sales has been studied less than for offline sales because real data on the subject are scarce. With the popularity of smart mobile terminals, e-commerce, especially B2C (Business-to-Customer), has been booming in recent years. Finding an appropriate sales prediction method that improves the efficiency of online sales operations is therefore a significant issue. In comparison with offline sales, e-commerce has its own sales characteristics, such as detailed basic user information and Web browsing information.
This paper takes a new perspective: how to choose an appropriate approach so that sales are forecast more effectively and more accurately. The data were provided by a well-known, competitive Chinese online shopping company in the B2C e-commerce book market. We apply real e-commerce sales data to several classical prediction models, aiming to build a trigger model that selects the appropriate forecasting model for a given product. Such a model can support an enterprise in making sales decisions in actual operations, and it enriches the theoretical basis and research methods for sales prediction in the context of big data.
The remainder of the paper is organized as follows. Section 2 discusses several existing classical sales prediction research methods and models, which are the theoretical background of our study. Section 3 presents the data analysis and processing method and then the trigger model. Section 4 verifies this model through empirical analysis. Section 5 contains the conclusion and discussion.
In this section, we will briefly review the previous studies on sales prediction and several classic prediction models. More than 200 kinds of prediction methods have been developed, which can be divided into two categories, subjective and objective methods.
The subjective prediction method is based on the experience of experts who judge and estimate. It is strongly subjective and flexible. Examples are the Delphi method (
Most conventional sales prediction methods introduce either factors or time series to determine the forecast. McElroy and Burmeister (
However, the above research focused mainly on improving the accuracy of sales prediction by optimizing a single model algorithm or by analyzing the factors that influence sales. In special cases, such as when the sales volume was zero, a single prediction model did not perform well. In addition, most of the previous methods predicted results for only one object, for example, the sales of one kind of book. In practice, the approach needs to cover a large range of products. Thus, the traditional single model optimization method has significant limitations in sales prediction.
We built a trigger model system instead of depending on a single model algorithm. Based on data about factors that influence sales, the system triggers one of the prediction models discussed previously, leading to better prediction results than a single model gives. Our method can also be applied to sales prediction on a much larger scale. We therefore offer a new proposal for sales prediction research, which our validation shows to be a clear improvement over single-model methods.
In this section, a trigger system is proposed for online sales prediction. An overview of the proposed framework is illustrated in Figure
The framework of our Trigger Model System.
As can be seen from Figure
As it is difficult to collect a large quantity of data, we selected books to be our research objects. Sales records for books are more stable than for other commodities such as household appliances, which fluctuate with the seasons.
In the book sales market, there are significant differences between the sales data of physical stores and those of B2C platforms. Both share basic information (name, category, author, etc.) and trading information (time, sales volume, etc.), but additional information, such as total attention, reference price, dealing price, and comments, is very important on a B2C platform. Total attention is the sum of the “clicking”, “searching”, and “following” quantities: when customers search for a book, click its web page, or follow it, the behavior is recorded. A product’s total attention represents its degree of popularity. The reference price is the book’s original price, while the dealing price is the price at which the book was actually sold. “Comments” refers to customers’ comments on the book.
The raw data we used was taken from 5 different database tables. For reasons of privacy, personal information was excluded. We used Excel to manipulate the raw data, aiming to filter redundant data and irrelevant attributes. Figure
Entity relationship diagram.
Several basic models are the foundation of our research. We first divided each book’s data into two parts. The data from Feb 1st to Apr 30th were used for training the prediction models, and the data from May 1st to May 31st were used for testing them. We ran each basic model on the whole data set. We then proposed a trigger model that chooses which basic model performs best for a given SKU. The basic models are described as follows.
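As a minimal sketch of this date-based split, assuming each book’s history is available as a list of (day, sales) pairs (a hypothetical layout, not the format of the original database tables), the two periods can be separated as follows. The year 2013 is taken from the data description.

```python
from datetime import date

# Split boundaries from the text; the year 2013 comes from the data description.
TRAIN_START, TRAIN_END = date(2013, 2, 1), date(2013, 4, 30)
TEST_START, TEST_END = date(2013, 5, 1), date(2013, 5, 31)


def split_by_date(records):
    """Split one book's daily records into training and testing lists.

    `records` is a hypothetical list of (day, sales) tuples for one SKU.
    """
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in records if TEST_START <= r[0] <= TEST_END]
    return train, test


# Toy usage with made-up sales numbers.
toy_records = [
    (date(2013, 2, 1), 5),
    (date(2013, 4, 30), 3),
    (date(2013, 5, 1), 4),
    (date(2013, 5, 31), 0),
]
train_records, test_records = split_by_date(toy_records)
```

Splitting by calendar date rather than at random keeps the test period strictly after the training period, which matches how a forecast would be used in practice.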
The Artificial Neural Network (ANN) is a mathematical model that imitates the distributed, parallel information processing of animal neural networks. It is widely used in the prediction field. The back-propagation neural network (BPNN), which we built in MATLAB, is a feedforward neural network.
The Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model is a regression model designed for financial data. It additionally models the variance of the error term, which makes it especially suitable for analyzing and forecasting volatility.
The Extreme Learning Machine (ELM) is a neural network algorithm for generalized single-hidden-layer feedforward networks. Because its hidden-layer weights are not trained iteratively, it typically trains much faster than gradient-based networks. Besides regression, it can also be used for classification. Thus, we use ELM not only as a basic prediction model but also as the classification model for our trigger system.
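The core idea of an ELM can be sketched in a few lines: the hidden-layer weights are drawn at random and left fixed, and only the output weights are solved in closed form. The Python sketch below is illustrative and is not the MATLAB implementation used in this study; the class name `TinyELM`, the sigmoid activation, the hidden-layer size, and the small ridge term added for numerical stability are all assumptions.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class TinyELM:
    """Single-hidden-layer network: random fixed hidden weights, solved output weights."""

    def __init__(self, n_inputs, n_hidden=10, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = [0.0] * n_hidden

    def _hidden(self, x):
        # Sigmoid activation of each hidden unit.
        out = []
        for w, b in zip(self.w, self.b):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            out.append(1.0 / (1.0 + math.exp(-z)))
        return out

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        k = len(self.beta)
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T y
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve_linear(hth, hty)
        return self

    def predict(self, x):
        return sum(hj * bj for hj, bj in zip(self._hidden(x), self.beta))


# Toy regression data (made up): y = 2 * x1 + x2.
toy_x = [[i / 10, (i * 7 % 10) / 10] for i in range(20)]
toy_y = [2 * a + b for a, b in toy_x]
elm = TinyELM(n_inputs=2).fit(toy_x, toy_y)
```

For classification, the same network can be fitted to one-hot class targets and the largest output taken as the predicted class, which is one common way an ELM is used as a classifier.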
We selected 4/5 of the total dataset as the training set of our trigger model. The data were selected randomly; thus, the property features were evenly distributed. 3/4 of the training set was treated as the base of the trigger model while the other 1/4 of the training set was used to test the model’s effects. Figure
Trigger model research procedure.
Before the experiment, we hypothesized that many properties, such as total sales and sales variance, were related to model selection. We made full use of all provided data and found that the following are closely related to classification.
CV Sales (CVS): coefficient of variation for sales.
CV Attention (CVA): coefficient of variation for attention.
Sold Price Variation (SPV): the variation of the sold price. The sold price differs from the market price because it includes the price difference due to promotions. We analyzed the variation of both prices, and the SPV gave better results.
The SPV is calculated as follows.
$$\mathrm{SPV} = \frac{S_{\max} - S_{\min}}{\bar{S}}$$
where $S_{\max}$ and $S_{\min}$ are the highest and the lowest sold prices during the 3 months, and $\bar{S}$ is the average sold price.
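The three factors can be computed from a book’s daily series roughly as follows. This is a sketch under stated assumptions: the coefficient of variation is taken as the population standard deviation divided by the mean, SPV is read as the range of the sold price divided by its average, and the function names and toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation over the mean; 0.0 if the mean is zero (assumption)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def sold_price_variation(prices):
    """(max - min) / mean of the sold price: this sketch's reading of SPV."""
    m = mean(prices)
    return (max(prices) - min(prices)) / m if m else 0.0


def trigger_features(daily_sales, daily_attention, daily_sold_price):
    """Collect the three classification factors for one SKU."""
    return {
        "CVS": coefficient_of_variation(daily_sales),
        "CVA": coefficient_of_variation(daily_attention),
        "SPV": sold_price_variation(daily_sold_price),
    }


# Toy series with made-up values.
features = trigger_features(
    daily_sales=[4, 6, 5, 5],
    daily_attention=[40, 60, 50, 50],
    daily_sold_price=[18.0, 20.0, 22.0, 20.0],
)
```

All three factors are dimensionless ratios, so they can be compared across books whose sales volumes and prices differ widely.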
Besides the prediction models mentioned above, other prediction models can also be used as basic models. A new model is first run on the base data, and its results are then added to the system.
The raw data were collected from a professional B2C online shopping market. Sales-related data for 199 different books from February 2013 to May 2013 were provided. The books were randomly chosen from the 1000 best-selling books.
We identify several important and representative attributes to support our study. Data are processed in order by SKU and time. The involved attributes are listed in Table
Details of attributes.
| Attribute | Description |
| --- | --- |
| SKU | the unique identification number of the book |
| Searching | how many times the book is searched daily |
| Clicking | how many times the book’s webpage is clicked daily |
| Following | how many times the book is followed daily |
| Comments | daily number of comments (due to time limits, only the count is used in this research) |
| Reference price | the book’s original price |
| Dealing price | the book’s actual sold price |
| Sales | past sales |
In particular, the daily searching, clicking, and following counts were sparse. Thus, we used total attention, the sum of the three, as the degree of attention. All of these factors may be used in the prediction models.
Among the 199 SKUs, we selected 160 to be the training set of our trigger model. Because the data were selected randomly, the property features were evenly distributed. 120 items from the training set were treated as the base of the trigger model while the other 40 items from the training set were used to test the model.
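The partition sizes above can be reproduced with a simple random shuffle. The sketch below is illustrative: the SKU identifiers are placeholders and the fixed seed is an assumption made only so that the split is repeatable.

```python
import random


def split_skus(skus, n_train=160, n_base=120, seed=0):
    """Randomly partition SKUs into base, check, and held-out test groups."""
    rng = random.Random(seed)
    shuffled = list(skus)
    rng.shuffle(shuffled)
    training, holdout = shuffled[:n_train], shuffled[n_train:]
    base, check = training[:n_base], training[n_base:]
    return base, check, holdout


# Placeholder identifiers standing in for the 199 real SKUs.
all_skus = [f"SKU{i:03d}" for i in range(199)]
base_skus, check_skus, holdout_skus = split_skus(all_skus)
```

With 199 SKUs this yields 120 base items and 40 check items inside the 160-item training set, and leaves 39 items outside it.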
We divided every book’s data into two parts, the training set and the testing set. The data from Feb 1st to Apr 30th were for training the prediction models, and the data from May 1st to May 31st were for testing the prediction models. Next, we applied the training set data (160 books’ attributes and sales data) and the 3 selected prediction models to train our trigger model in the appropriate environment. The details are shown in Table
Details of prediction models.
| Model | Prediction Method | Variables | Environment |
| --- | --- | --- | --- |
| Model 1 | BPNN Model | previous sales, total attention, daily average price | MATLAB |
| Model 2 | GARCH Dynamic Model | previous sales, total attention, daily average price | EVIEWS |
| Model 3 | ELM Model | 10 Variables | MATLAB |
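The trigger step can be illustrated as follows: each base SKU is labelled with the basic model that gave it the lowest error, and a classifier then maps a new SKU’s factors to one of those labels. In this sketch a nearest-neighbour rule stands in for the ELM classifier used in the study, purely to keep the example short; the function names, feature values, and error values are all made up.

```python
import math


def best_model_label(mape_by_model):
    """Return the name of the basic model with the lowest MAPE for one SKU."""
    return min(mape_by_model, key=mape_by_model.get)


def nearest_neighbour_trigger(base_features, base_labels, query):
    """Stand-in classifier: copy the label of the closest base SKU."""
    distances = [math.dist(f, query) for f in base_features]
    return base_labels[distances.index(min(distances))]


# Made-up base data: (CVS, CVA, SPV) per SKU and the MAPE of each basic model.
base_features = [(0.2, 0.3, 0.05), (1.5, 1.2, 0.40), (0.8, 0.9, 0.10)]
base_mapes = [
    {"BPNN": 0.4, "GARCH": 0.6, "ELM": 0.7},
    {"BPNN": 1.1, "GARCH": 0.5, "ELM": 0.9},
    {"BPNN": 0.9, "GARCH": 0.8, "ELM": 0.3},
]
base_labels = [best_model_label(m) for m in base_mapes]

# A new SKU is routed to the model that worked best for its nearest neighbour.
chosen_model = nearest_neighbour_trigger(base_features, base_labels, (1.4, 1.0, 0.35))
```

The chosen label only selects which basic model to run; the sales forecast itself still comes from that basic model.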
The Mean Absolute Percentage Error (MAPE) was used to evaluate the performance of our proposed method. Because sales volumes differed greatly between books, a relative error measure such as MAPE allows results to be compared across books. MAPE is calculated as follows.
$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\frac{\lvert F_t - F_t' \rvert}{F_t}$$
where $F_t$ is the actual sales, $F_t'$ is the forecast sales, and $N$ is the number of items.
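A minimal Python sketch of the MAPE computation is given below. The ratio is undefined when the actual sales are zero; skipping such items, as done here, is an assumption of this sketch and not necessarily how the study handled them. The result is a fraction rather than a percentage, in line with the values reported in this paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a fraction, skipping zero actual values."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Toy example with made-up numbers; the third item is skipped (actual == 0).
example_score = mape(actual=[10, 20, 0, 40], forecast=[12, 15, 3, 40])
```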
We used the testing set data (data for 39 books) in the trigger model’s validation experiment. The experiment results are shown in Figure
Trigger model prediction results.
The figure shows that the prediction error of the trigger model is fairly stable across books. For most of the books the MAPE is below 1, and for about half of them it is below 0.5. The average MAPE is 0.540844.
In this section, we compare the predictive performance of Model 1, Model 2, Model 3, and the trigger model. In addition, we select ARMA as the baseline model and include its prediction results in the comparison.
As can be seen from Figure
Testing data set comparison results.
Testing data set MAPE comparison.
| Model | Model 1 | Model 2 | Model 3 | ARMA | Trigger Model |
| --- | --- | --- | --- | --- | --- |
| MAPE | 0.99771 | 0.797165 | 0.985958 | 1.144498 | 0.540844 |
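The size of the improvement can be read directly from the reported averages. The short sketch below uses only the values listed above and computes the relative MAPE reduction of the trigger model against the best single model; the variable names are illustrative.

```python
# Average MAPE values as reported for the testing data set.
reported_mape = {
    "Model 1": 0.99771,
    "Model 2": 0.797165,
    "Model 3": 0.985958,
    "ARMA": 1.144498,
    "Trigger Model": 0.540844,
}

single_models = {k: v for k, v in reported_mape.items() if k != "Trigger Model"}
best_single = min(single_models, key=single_models.get)

# Relative reduction of the trigger model against the best single model.
reduction = (single_models[best_single] - reported_mape["Trigger Model"]) / single_models[best_single]
print(f"{best_single}: relative MAPE reduction {reduction:.1%}")
```

On these figures the best single model is Model 2, and the trigger model lowers its average MAPE by roughly a third.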
This paper presents a new approach, building a trigger model for forecasting selection, to improve accuracy and efficiency in the area of e-commerce. We applied three typical forecasting models and several dimensions to the trigger model by training and testing the classification model with real sales data. We obtained more accurate forecasting results than could be obtained by executing a single model. However, the study has some weak points. First, the amount of raw data is not sufficient, and the forecasting accuracy needs to be increased further. Second, we selected only three forecasting models to classify; more models need to be introduced to broaden the trigger model’s application scope.
In conclusion, we present the idea of using a “trigger model” in the area of sales prediction. This approach focuses on the correlation between two subjects and ignores the causal relationship between them, which reflects the basic idea of “Big Data”. In the future, the trigger model could be made smarter and more mature. If successful, the trigger model is likely to have a considerable impact on sales prediction.
This research work was partly supported by 973 Project (Grant No. 2012CB316205), National Natural Science Foundation of China (Grant No. 71001103, 91224008, 91324015), Humanities and Social Sciences Foundation of the Ministry of Education (No. 14YJA630075), Beijing Natural Science Foundation (No. 9122013), Beijing Social Science Fund (No. 13JGB035), Beijing Nova Program (No. Z131101000413058), and Program for Excellent Talents in Beijing.