How to Find a Comfortable Bus Route - Towards Personalized Information Recommendation Services

on different routes at different calculated from bus IC cards data history. a multi-objective program with various constraints is suggested to recommend comfortable bus routes for bus passengers, and a genetic algorithm is developed to search for expected solutions. The proposed method is implemented using bus IC cards data from Chongqing, China and will be a promising tool for bus passengers when choosing comfortable bus routes.


Literature review
In recent years, much attention has been paid to programming city bus routes and information recommendation services. Most studies focus on bus scheduling and planning (Cheng, 2010). Bo et al. (2002) presented an optimal method of peak curve according to the Fisher algorithm of serial specimen clustering and set a random optimum model by the theory of random service system. Ge et al. (2005) proposed an improved discrete method to produce a bus timetable that takes both the cost of dispatching a vehicle and the value of passengers' waiting time into consideration under the constraint of the vehicle capacity. Avishai (1984) put forward four methods of passenger flow statistics that are used to determine the departure time and intervals between buses. Haghani and Chiang (2003) took bus lines as a network and completed the route of the journey for every passenger to manage the vehicles centrally. The goal was to make the passengers' waiting time shorter and the cost of vehicles lower under the conditions of limited parameters.
Moreover, a group of researchers focused on how to plan bus routes from both the passengers' and bus companies' perspectives. Chen et al. (2004) analyzed urban bus dispatching in terms of service, quality, and efficiency and built a bus service frequency optimal model with a multi-target optimization model to fit passengers' satisfaction as well as the benefit of bus corporations. Li et al. (2009) set up a bus scheduling optimization model with bus revenue as the object function and chose passengers' satisfaction with waiting time and congestion using a simulated annealing evolutionary algorithm based on a self-adaptive operator.
In terms of data collection and data analysis, both domestic and overseas scholars have made great progress. Dai et al. (2006) have done research on an analysis of passengers' travel information and traffic statistics through the bus IC card records of passengers. Bagchi (2004) examined the impact of smart-card data in relation to two case studies: one was concerned with the impact on the data collection process and the other explored the impact on travel behavior.
Meanwhile, several works exist that discuss personalized information recommendation services, which have been applied into various contexts such as job search (Al-Otaibi, 2012;Malinowski, 2006;Paparrizos, 2011), expert finding (Cao, 2005;Fang, 2007;Macdonald, 2008), travel planning (Park, 2008;Staab, 2002;Yu, 2009), and restaurant recommendation (Lee, 2006;Liang, 2008;Mui, 2001). For example, Al-Otaibi and Ykhelf (2012) built recommender systems to provide job information for job hunters. Balog et al. (2009) proposed language models to find appropriate experts to help users solve tricky problems. In tourism, a personalized route recommendation service was designed for theme parks, and RFID information and tourist behaviors were explored to support decisions (Park, 2008). In recent years, online information service has become more and more personalized. As a representative of applications in the field of online maps, a real-time bus route can be easily and quickly discovered on Google maps. In individual cities such as Beijing and Xiamen, citizens can get information about the location and arrival time for buses through a mobile phone app.
However, most of this research and these applications focus on convenience when choosing a bus route while few are concerned with the information recommendation services. To cover the research gap discussed above, a multi-objective program with various constraints is suggested to recommend comfortable bus routes for bus passengers, and a genetic algorithm is developed to search for the expected solutions.

The proposed approach
Based on the existing research results in Section 2, this section will present the proposed approach. It is composed of three steps, which are shown in Figure 1. First, data from IC cards are merged and processed. The information about bus stations and bus lines is integrated with the IC card data. Second, the average driving time between different bus stations by different bus lines is computed. The number of passengers on a bus can also be estimated although only the number of passengers who get on the bus is available. Finally, to make a personalized recommendation, the preference of the individual passenger must be taken into account by allowing passengers to give different weights to different factors. Through the model proposed, bus lines have integrated waiting time, crowdedness, and driving time according to passengers' needs. The method provides a tool for bus passengers when choosing comfortable bus routes.

Data merge and process
Because the data of bus IC cards is massive, some indicators need to be further processed. The following are the calculation methods.
3.1.1 Calculating at which bus station to get off Bus passengers don't swipe cards when getting off the bus in several cities. Therefore, we cannot directly find out the number of passengers getting off a bus at any bus station. The original method used to discover this number can be described like this. If a passenger swiped an IC card at bus station A and then swiped an IC card at bus station B and bus stations A and B belong to the same bus route, where the passenger got off can be inferred. However, the above method may not work when the same card is reused. Then the attraction rate of the bus station can solve the problem. Attraction rate reflects the attractive intensity of a bus station to passengers. Considering the symmetry of passenger flow from up-link and down-link, the rule can be deduced.

Calculate the times when a bus is crowded
Mathematically, a bus's crowded time is formulated as 1 1 2 where d ij is the driving time from bus station i to bus station j; c ij is the crowded time from bus station i to bus station j; h i is the number of passengers from the bus station i; s 1 is the first threshold value of number of crowded passengers on the bus. s 2 is the second threshold value of the number of passengers on the (1), we can deduce that the more passengers on the bus, the more complaints will be made by the passengers.

Building the model
First, the decision variables and parameters to be used in the model are defined as follows: x ij pq is the decision variable if the bus route p is recommended from bus station i to bus station j at time period q then x ij pq = 1; otherwise x ij pq = 0; S is the total number of bus stations, Q is the total number of time periods and B is the total number of bus routes; w ij pq , d ij pq , c ij pq stand for the waiting time, driving time, and crowded time from bus station i to bus station j at time period q; w max , d max , c max are the maximum values of w ij pq , d ij pq , c ij pq ; T 1 , T 2 , T 3 are three objectives standing for waiting time, driving time, and crowded time.
Then, a model is built as follows: In the above model, there are three objectives. Objective function (2) minimizes the waiting time, objective function (3) minimizes the driving time, and objective function (4) minimizes the crowded time. Aside from the objectives, constraint functions (5), (6), (7) guarantee that waiting time, driving time, and crowded time are within certain limits. Constraint function (8) guarantees that a comfortable bus route between any two bus stations can be recommended. In order to solve the model more conveniently, we next transform a multi-objective programming problem to a single objective programming problem as follows: The score is the general performance of the bus route, taking T 1 , T 2 , T 3 into account. ω s is the weight of the object. ω s is assigned between 0 and 1. The sum of ω s is 1. In a real world application, the weights totally depend on decisions made by passengers. They can choose the significance among different factors and offer different weights for these factors. As a result, the recommended bus lines may change as the weights of different factors change.

The proposed algorithm for solving the model
There are many different methods used to solve problems with multiple objectives (Steuer, 1992;Vassilew, 1993). The most common one is the linear weighting method (Hwang, 1981;Yu, 1985). In this, multiple objectives are transferred into an ordinary linear program by adding them with different weights. Although it is easy to understand, it is still a little bit subjective because of its man-made weight (Deb, 2001). Because more and more researchers are focusing on multi-objective optimization, there are many effective methods to use, in particular the multi-objective genetic algorithm (GA). Multi-objective GA has almost a similar main procedure as GA. The GA solution process to solve the model includes computing fitness, selection, crossover, mutation, and stopping criteria.
The main GA procedure is shown in Figure 2.

Initial population
In this part, before the initialization, we do encoding for the chromosome that is a specific solution vector, usually recognized as x in the solution set X. Genes, which always represent certain features, are distributed in a chromosome. Encoding is designed for matching the solution set and the chromosomes (Konak, 2006). A great number of encoding schemes have been used due to the features and identities of problems (Fan, 2011). Binary encoding and real encoding are the most familiar techniques. Encoding takes a vital role in GA programming as it determines the mode and efficiency of the algorithm operation. Deciding the population size and initial population are the next step. Aside from special demands, the population is generally randomly initialized.

Computing fitness
The fitness function is calculated according to its objective function and its constraint condition, in order to choose the fitter chromosomes to be parents in the next step, selection. The smaller the value of the fitness function and the higher the rank of the chromosome, the better the resulting vector is.

Selection
As mentioned above, chromosomes are ranked with a fitness value. There are several methods with which to select fitter ones to be the parents, such as roulette wheel selection, stochastic universal sampling, local selection, and tournament selection. For example, the roulette wheel is based on a randomly generated percentage. Each chromosome has a probability based on its fitness value (Fan, 2011).

Crossover & mutation
In the crossover step, parents swap some of their genes with each other and generate new offspring (Mirrazavi, 2001). The resulting mutation increases the diversity of the genes and chromosomes by changing the genetic value with a given probability (Fan, 2011). There are several crossover and mutation methods. The easiest one is to employ the existing function in a GA toolbox, such as the gatbx, which is made by Sheffield University in the UK. In our model using the multi-objective GA, we choose the parallelism selection genetic algorithm to solve the model. The parallelism selection genetic algorithm is a typical measure used in multi-objective problems, which need to divide the population into several groups to solve the respective objectives. The number of groups is determined by the quantity of the objective. Each group has its own fitness function and selection results. We merge all groups into one and implement the crossover and mutation step afterwards.
Considering that the three objectives in the model have different scales and dimensions, we normalize them using objective attainment . The definitions of the three are as follows: where the values of u(T 1 ), u(T 2 ) and u(T 3 ) are between 0 and 1. We encode the chromosome just like the encoding methods in the 0-1 integer programming problem. Each gene represents a decision variable that either chooses a bus route or not. The binary code is necessary and suitable here. The notation for chromosome representation is the following: where m is the number of decision variables (alternative routes). One chromosome indicates a one solution scheme.

The proposed GA algorithm
On the basis of the model, the fitness function considers the constraints mentioned above. The constraint is that there is only one alternative route that can be the final recommendation result. The pseudo-code of our fitness function is as follows: while each row if(constraint) result=inner product of decision variables & coefficient (refer to waiting time, driving time…) else result=max value end if end As for the selection, crossover, and mutation methods, we adopt the existing function in gabtx with a population size of 100, generation gap of 0.9, maximum number of generations of 100, and a crossover rate of 0.7.

Data description and evaluation criteria
The data resources are the bus IC card records for three days in June 2012 in Chongqing, China. In order to achieve the visualization experimental results, the experiment used records in the period from 8:00-9:00, which is in the morning rush hour. According to the bus IC card records, bus stations with high passenger flow volume are considered as hot stations. The geographical distribution of hot stations in Chongqing, China is shown in Figure 3.
According to the distribution of hot stations, 20 hot lines between two hot stations are chosen randomly for the experiment.

Experimental results
The three objects of waiting time, driving time, and crowded time are given equal weight. Using the data provided and the model above, we finally arrive at the empirical results in Table 1. In the meanwhile, computing time for each recommended result between the original bus stop and the destination is less than one second.
To do the comparison, we search on Google (https://www.google.com/maps/) to compare the recommendation results. Because the data for the experiments were produced in 2012 and the search results were done in 2014, bus line changes might have occurred, leading to inconformity ( Table 2).
To evaluate the degree of satisfaction for the recommendation results, the questionnaire presented in the appendix was designed for passengers to complete ( Table 3). In the questionnaire, the recommendation results from the proposed method and Google Map are randomly shown for a fair comparison. The final

Conclusion
This paper proposes an approach for the problem of how to effectively and efficiently recommend comfortable bus routes to bus passengers. In this approach, multi-objective programming with various constraints and a genetic algorithm are utilized to solve the problems. Using the method proposed, waiting time, crowded time, and driving time between different bus stations in different bus routes at different times are calculated using data from bus IC cards. The proposed method was implemented using data from bus IC cards in Chongqing, China. In order to make the result of experiment visual, 20 hot lines between two hot stations were chosen randomly to verify the results. An empirical analysis was carried out. The most comfortable bus route between the starting point and destination was recommended, which integrated the factors of waiting time, crowdedness, and driving time that passengers were most concerned about. In summary, the proposed approach allows bus passengers to effectively and efficiently choose a comfortable bus route according to their own needs. Some research issues remain for further studies. First, the massive amount of bus IC cards data needs to be processed in a more detailed fashion. More relationships and knowledge may be found by further data analysis. In addition, presenting the results in the form of a mobile app would be more effective. Finally, the method proposed would be improved by reading evaluations done by bus passengers.

Acknowledgments
This research work was partly supported by the National Natural Science Foundation of China (Grant No. 71001103, 91224008, 91324015, 91324019), the Humanities and Social Sciences Foundation of the Ministry of Education (No. 14YJA630075), the Beijing Natural Science Foundation (No. 9122013), the Beijing Social Science Fund (No. 13JGB035), the Beijing Nova Program (No. Z131101000413058), and the Program for Excellent Talents in Beijing.