1 Introduction

With rapid economic growth, city sizes are expanding, and city traffic flow is surging. Thus traffic congestion, traffic accidents, and serious air pollution are becoming more common. Priority has been given to the development of urban public transportation policy by using GPS, GIS, the internet, and communication technology to realize the collection, transmission, storage, and processing of the massive historical and real-time data gathered from bus IC cards, in order to build an intelligent, modern city public transport dispatching platform that will solve social problems such as urban traffic congestion, energy shortages, and air pollution. In addition, when passengers choose a reasonable travel plan, they can lessen not only their travel costs but can also increase the efficiency of public transport vehicles. Therefore, the choice of a reasonable bus route is of great importance to the daily operation and management of urban traffic.

In recent years both in China and abroad, urban public transport has been the subject of extensive research and discussion. However, there are still many complaints about crowded buses, the length of waiting time, and the unpunctuality of the buses. How to effectively and efficiently recommend comfortable bus routes to bus passengers is a challenging and complex task. To address this critical challenge, we must consider factors reflecting the passengers’ demands such as waiting time, crowded time, and driving time. To recommend comfortable bus routes for bus passengers, we suggest using multi-objective programming with various constraints and have developed a genetic algorithm to search for solutions. As a result, a bus route according to the differing requirements of passengers can be recommended.

The rest of the paper is organized as follows. In Section 2, we review related work on the data processing of bus IC cards and personalized information recommendation services. In Section 3, we present a multi-objective program with various constraints to recommend comfortable bus routes for bus passengers and use a genetic algorithm to search for solutions. In Section 4, we discuss our implementation and empirical analysis of the proposed method. Finally, we conclude the paper and point out future research directions.

2 Literature review

In recent years, much attention has been paid to programming city bus routes and information recommendation services. Most studies focus on bus scheduling and planning (Cheng, 2010). Bo et al. (2002) presented an optimal method of peak curve according to the Fisher algorithm of serial specimen clustering and set a random optimum model by the theory of random service system. Ge et al. (2005) proposed an improved discrete method to produce a bus timetable that takes both the cost of dispatching a vehicle and the value of passengers’ waiting time into consideration under the constraint of the vehicle capacity. Avishai (1984) put forward four methods of passenger flow statistics that are used to determine the departure time and intervals between buses. Haghani and Chiang (2003) took bus lines as a network and completed the route of the journey for every passenger to manage the vehicles centrally. The goal was to make the passengers’ waiting time shorter and the cost of vehicles lower under the conditions of limited parameters.

Moreover, a group of researchers focused on how to plan bus routes from both the passengers’ and bus companies’ perspectives. Chen et al. (2004) analyzed urban bus dispatching in terms of service, quality, and efficiency and built a bus service frequency optimal model with a multi-target optimization model to fit passengers’ satisfaction as well as the benefit of bus corporations. Li et al. (2009) set up a bus scheduling optimization model with bus revenue as the object function and chose passengers’ satisfaction with waiting time and congestion using a simulated annealing evolutionary algorithm based on a self-adaptive operator.

In terms of data collection and data analysis, both domestic and overseas scholars have made great progress. Dai et al. (2006) have done research on an analysis of passengers’ travel information and traffic statistics through the bus IC card records of passengers. Bagchi (2004) examined the impact of smart-card data in relation to two case studies: one was concerned with the impact on the data collection process and the other explored the impact on travel behavior.

Meanwhile, several works exist that discuss personalized information recommendation services, which have been applied into various contexts such as job search (Al-Otaibi, 2012; Malinowski, 2006; Paparrizos, 2011), expert finding (Cao, 2005; Fang, 2007; Macdonald, 2008), travel planning (Park, 2008; Staab, 2002; Yu, 2009), and restaurant recommendation (Lee, 2006; Liang, 2008; Mui, 2001). For example, Al-Otaibi and Ykhelf (2012) built recommender systems to provide job information for job hunters. Balog et al. (2009) proposed language models to find appropriate experts to help users solve tricky problems. In tourism, a personalized route recommendation service was designed for theme parks, and RFID information and tourist behaviors were explored to support decisions (Park, 2008). In recent years, online information service has become more and more personalized. As a representative of applications in the field of online maps, a real-time bus route can be easily and quickly discovered on Google maps. In individual cities such as Beijing and Xiamen, citizens can get information about the location and arrival time for buses through a mobile phone app.

However, most of this research and these applications focus on convenience when choosing a bus route while few are concerned with the information recommendation services. To cover the research gap discussed above, a multi-objective program with various constraints is suggested to recommend comfortable bus routes for bus passengers, and a genetic algorithm is developed to search for the expected solutions.

3 The proposed approach

Based on the existing research results in Section 2, this section will present the proposed approach. It is composed of three steps, which are shown in Figure 1. First, data from IC cards are merged and processed. The information about bus stations and bus lines is integrated with the IC card data. Second, the average driving time between different bus stations by different bus lines is computed. The number of passengers on a bus can also be estimated although only the number of passengers who get on the bus is available. Finally, to make a personalized recommendation, the preference of the individual passenger must be taken into account by allowing passengers to give different weights to different factors. Through the model proposed, bus lines have integrated waiting time, crowdedness, and driving time according to passengers’ needs. The method provides a tool for bus passengers when choosing comfortable bus routes.

Figure 1

Procedure of the proposed approach.

3.1 Data merge and process

Because the data of bus IC cards is massive, some indicators need to be further processed. The following are the calculation methods.

3.1.1 Calculating at which bus station to get off

Bus passengers don’t swipe cards when getting off the bus in several cities. Therefore, we cannot directly find out the number of passengers getting off a bus at any bus station. The original method used to discover this number can be described like this. If a passenger swiped an IC card at bus station A and then swiped an IC card at bus station B and bus stations A and B belong to the same bus route, where the passenger got off can be inferred. However, the above method may not work when the same card is reused. Then the attraction rate of the bus station can solve the problem. Attraction rate reflects the attractive intensity of a bus station to passengers. Considering the symmetry of passenger flow from up-link and down-link, the rule can be deduced.

3.1.2 Calculate the times when a bus is crowded

Mathematically, a bus’s crowded time is formulated as

(1)

where dij is the driving time from bus station i to bus station j; cij is the crowded time from bus station i to bus station j; hi is the number of passengers from the bus station i; s1 is the first threshold value of number of crowded passengers on the bus. s2 is the second threshold value of the number of passengers on the crowded bus. From Eq. (1), we can deduce that the more passengers on the bus, the more complaints will be made by the passengers.

3.2 Building the model

First, the decision variables and parameters to be used in the model are defined as follows: xijpq is the decision variable if the bus route p is recommended from bus station i to bus station j at time period q then xijpq = 1; otherwise xijpq = 0; S is the total number of bus stations, Q is the total number of time periods and B is the total number of bus routes; wijpq, dijpq, cijpq stand for the waiting time, driving time, and crowded time from bus station i to bus station j at time period q; wmax, dmax, cmax are the maximum values of wijpq, dijpq, cijpq; T1, T2, T3 are three objectives standing for waiting time, driving time, and crowded time.

Then, a model is built as follows:

(2)

(3)

(4)

subject to

(5)
$0<{w}_{ij}^{pq}<{w}_{\mathrm{max}}$

(6)
$0<{d}_{ij}^{pq}<{d}_{\mathrm{max}}$

(7)
$0<{c}_{ij}^{pq}<{c}_{\mathrm{max}}$

(8)

(9)

In the above model, there are three objectives. Objective function (2) minimizes the waiting time, objective function (3) minimizes the driving time, and objective function (4) minimizes the crowded time. Aside from the objectives, constraint functions (5), (6), (7) guarantee that waiting time, driving time, and crowded time are within certain limits. Constraint function (8) guarantees that a comfortable bus route between any two bus stations can be recommended.

In order to solve the model more conveniently, we next transform a multi-objective programming problem to a single objective programming problem as follows:

(10)

The score is the general performance of the bus route, taking T1, T2, T3 into account. ωs is the weight of the object. ωs is assigned between 0 and 1. The sum of ωs is 1. In a real world application, the weights totally depend on decisions made by passengers. They can choose the significance among different factors and offer different weights for these factors. As a result, the recommended bus lines may change as the weights of different factors change.

4 The proposed algorithm for solving the model

There are many different methods used to solve problems with multiple objectives (Steuer, 1992; Vassilew, 1993). The most common one is the linear weighting method (Hwang, 1981; Yu, 1985). In this, multiple objectives are transferred into an ordinary linear program by adding them with different weights. Although it is easy to understand, it is still a little bit subjective because of its man-made weight (Deb, 2001). Because more and more researchers are focusing on multi-objective optimization, there are many effective methods to use, in particular the multi-objective genetic algorithm (GA). Multi-objective GA has almost a similar main procedure as GA. The GA solution process to solve the model includes computing fitness, selection, crossover, mutation, and stopping criteria.

The main GA procedure is shown in Figure 2.

Figure 2

The main procedure of GA.

4.1 Initial population

In this part, before the initialization, we do encoding for the chromosome that is a specific solution vector, usually recognized as x in the solution set X. Genes, which always represent certain features, are distributed in a chromosome. Encoding is designed for matching the solution set and the chromosomes (Konak, 2006). A great number of encoding schemes have been used due to the features and identities of problems (Fan, 2011). Binary encoding and real encoding are the most familiar techniques. Encoding takes a vital role in GA programming as it determines the mode and efficiency of the algorithm operation. Deciding the population size and initial population are the next step. Aside from special demands, the population is generally randomly initialized.

4.2 Computing fitness

The fitness function is calculated according to its objective function and its constraint condition, in order to choose the fitter chromosomes to be parents in the next step, selection. The smaller the value of the fitness function and the higher the rank of the chromosome, the better the resulting vector is.

4.3 Selection

As mentioned above, chromosomes are ranked with a fitness value. There are several methods with which to select fitter ones to be the parents, such as roulette wheel selection, stochastic universal sampling, local selection, and tournament selection. For example, the roulette wheel is based on a randomly generated percentage. Each chromosome has a probability based on its fitness value (Fan, 2011).

4.4 Crossover & mutation

In the crossover step, parents swap some of their genes with each other and generate new offspring (Mirrazavi, 2001). The resulting mutation increases the diversity of the genes and chromosomes by changing the genetic value with a given probability (Fan, 2011). There are several crossover and mutation methods. The easiest one is to employ the existing function in a GA toolbox, such as the gatbx, which is made by Sheffield University in the UK.

In our model using the multi-objective GA, we choose the parallelism selection genetic algorithm to solve the model. The parallelism selection genetic algorithm is a typical measure used in multi-objective problems, which need to divide the population into several groups to solve the respective objectives. The number of groups is determined by the quantity of the objective. Each group has its own fitness function and selection results. We merge all groups into one and implement the crossover and mutation step afterwards.

Considering that the three objectives in the model have different scales and dimensions, we normalize them using objective attainment (Chen, 2008). The definitions of the three are as follows:

(11)
$u\left({T}_{1}\right)=1-\frac{{T}_{1}^{\mathrm{max}}-{T}_{1}}{{T}_{1}^{\mathrm{max}}-{T}_{1}^{\mathrm{min}}}$

(12)
$u\left({T}_{2}\right)=1-\frac{{T}_{2}^{\mathrm{max}}-{T}_{2}}{{T}_{2}^{\mathrm{max}}-{T}_{2}^{\mathrm{min}}}$

(13)
$u\left({T}_{3}\right)=1-\frac{{T}_{3}^{\mathrm{max}}-{T}_{3}}{{T}_{3}^{\mathrm{max}}-{T}_{3}^{\mathrm{min}}}$

where the values of u(T1), u(T2) and u(T3) are between 0 and 1.

We encode the chromosome just like the encoding methods in the 0–1 integer programming problem. Each gene represents a decision variable that either chooses a bus route or not. The binary code is necessary and suitable here. The notation for chromosome representation is the following:

${x}_{1}{x}_{2}{x}_{3}\dots .{x}_{m-1}{x}_{m}$

where m is the number of decision variables (alternative routes). One chromosome indicates a one solution scheme.

4.5 The proposed GA algorithm

On the basis of the model, the fitness function considers the constraints mentioned above. The constraint is that there is only one alternative route that can be the final recommendation result. The pseudo-code of our fitness function is as follows:

while each row

if(constraint)

result=inner product of decision variables & coefficient

(refer to waiting time, driving time…)

else

result=max value

end if

end

As for the selection, crossover, and mutation methods, we adopt the existing function in gabtx with a population size of 100, generation gap of 0.9, maximum number of generations of 100, and a crossover rate of 0.7.

5 Empirical analysis

5.1 Data description and evaluation criteria

The data resources are the bus IC card records for three days in June 2012 in Chongqing, China. In order to achieve the visualization experimental results, the experiment used records in the period from 8:00–9:00, which is in the morning rush hour. According to the bus IC card records, bus stations with high passenger flow volume are considered as hot stations. The geographical distribution of hot stations in Chongqing, China is shown in Figure 3.

Figure 3

The geographical distribution of hot stations in Chongqing, China.

According to the distribution of hot stations, 20 hot lines between two hot stations are chosen randomly for the experiment.

5.2 Experimental results

The three objects of waiting time, driving time, and crowded time are given equal weight. Using the data provided and the model above, we finally arrive at the empirical results in Table 1. In the meanwhile, computing time for each recommended result between the original bus stop and the destination is less than one second.

Table 1

The recommended comfortable bus route results.

Origin Destination The alternative bus routes The most comfortable route Score of the best route

1 2 128,118,815 128 1.758934
3 2 341,823,148,263 823 1.108361
1 3 416 416 2.875463
4 3 403,416,463,473,471 403 1.047812
5 2 454,231,148 148 1.705109
3 5 454,148 148 1.151822
6 2 128,118,107,815 815 1.198029
4 6 818,128,118,815 815 1.366495
5 6 113,181,863 181 1.967453
7 1 132,118,605,181,114,872 114 1.171361
2 7 139,118,106,815,148 815 1.479173
5 7 113,821,270,181,202,863,201,148,801 863 0.924748
8 6 113 113 2.280954
7 8 113,270,202,201 201 1.684108
9 2 467,464,209,472 464 1.289752
5 9 133,270,166,236,807 236 1.078292
10 4 138,818,128,416,118 128 0.937499
11 1 818,416,411 818 1.194276
2 11 839,873 839 1.568126
10 11 818,416 416 1.578496

To do the comparison, we search on Google (https://www.google.com/maps/) to compare the recommendation results. Because the data for the experiments were produced in 2012 and the search results were done in 2014, bus line changes might have occurred, leading to inconformity (Table 2).

Table 2

Origin Destination The alternative bus route The fastest route

1 2 815,118,421,138 815
3 2 823,469,148,454 823
1 3 411,412,416 411
4 3 403,476,471 403
5 2 231,148,875 231
3 5 148,454,504 148
6 2 128,118 128
4 6 818,815 818
5 6 863,201,270,202 863
7 1 118,132,112 118
2 7 202,815 202
5 7 201,863,202,270 201
8 6 863 863
7 8 201,113,163,270 201
9 2 467,464,819 467
5 9 807,166,464 807
10 4 416,412,411 416
11 1 411,416,606,873 411
2 11 231,839,429,836 231
10 11 606,416,412 606

To evaluate the degree of satisfaction for the recommendation results, the questionnaire presented in the appendix was designed for passengers to complete (Table 3). In the questionnaire, the recommendation results from the proposed method and Google Map are randomly shown for a fair comparison. The final results show that the performance of our proposed method outperforms the one using Google Map in terms of waiting time, driving time, crowdedness, and other factors that influence bus route selection.

6 Conclusion

This paper proposes an approach for the problem of how to effectively and efficiently recommend comfortable bus routes to bus passengers. In this approach, multi-objective programming with various constraints and a genetic algorithm are utilized to solve the problems. Using the method proposed, waiting time, crowded time, and driving time between different bus stations in different bus routes at different times are calculated using data from bus IC cards. The proposed method was implemented using data from bus IC cards in Chongqing, China. In order to make the result of experiment visual, 20 hot lines between two hot stations were chosen randomly to verify the results. An empirical analysis was carried out. The most comfortable bus route between the starting point and destination was recommended, which integrated the factors of waiting time, crowdedness, and driving time that passengers were most concerned about. In summary, the proposed approach allows bus passengers to effectively and efficiently choose a comfortable bus route according to their own needs.

Some research issues remain for further studies. First, the massive amount of bus IC cards data needs to be processed in a more detailed fashion. More relationships and knowledge may be found by further data analysis. In addition, presenting the results in the form of a mobile app would be more effective. Finally, the method proposed would be improved by reading evaluations done by bus passengers.