This study demonstrates the use of mobile phone data to derive country-wide mobility patterns. We identified significant user locations (home, work, and other) based on a combined measure of the frequency, duration, time, and day of mobile phone interactions. Consecutive mobile phone records of users are used to identify stay and pass-by locations. A stay location is one where a user spends a significant amount of time, as measured through mobile phone usage. Trips are constructed for each user between two consecutive stay locations in a day and then categorized by purpose and time of day. Three measures of entropy are used to further understand the regularity of users’ spatiotemporal mobility patterns. The results show that users in the high entropy cluster have a high percentage of non-home based trips (77%), while users in the low entropy cluster have a high percentage of commuting trips (49%), indicating high regularity. A set of doubly constrained trip distribution models is estimated. To measure travel cost, the conventional assumption that the origins and destinations of all trips are concentrated at a single arbitrary point, such as the centroid of a zone, is replaced by multiple origins and destinations represented by cell tower locations. Note that a cell tower location can only be used as a trip’s origin or destination when a stay is detected there. Measuring travel cost between cell tower locations results in shorter trip distances, and the estimated model shows less sensitivity to the distance-decay effect.

Travel demand modeling involves analysis of how many trips are generated, where these trips go, by which mode, and on which routes. With few exceptions, people travel to satisfy needs such as work or leisure, or to perform some activity at a location that is not nearby. To understand travel demand, transport planners must understand the spatiotemporal distribution of these activity locations (

Origin-destination (OD) flows are among the key inputs required for accurate travel forecasts by a transport planning model. This information is also vital for policy making and for devising travel demand management measures. Traditional methods for collecting trip origins and destinations, such as surveys, are costly, laborious, and time-consuming for trip makers. In recent years, mobile phone data, which include the passively recorded spatiotemporal trajectories of a large portion of the population, have emerged as promising inputs for travel demand model development (

Trip distribution is one of the main stages of the traditional four-step transportation planning model. It reflects the pattern of trip-making behavior in terms of the number of trips between trip origins and destinations. Over the years, different types of trip distribution models have been developed. The simplest models, such as the growth-factor model, are appropriate for short-term studies where no major change in the transportation network is foreseen. However, there are circumstances that cause changes in transport network costs. One of the best-known models suitable for long-term strategic studies is the gravity model. This model responds better to changes in the trip pattern when important changes in the transport network take place (

The main motivation behind our study relates to measurement problems that arise during trip distribution model development, such as those involving travel cost, which in our case is represented by travel distance. Travel cost is usually estimated by the centroid-to-centroid distance between the origin and destination zones. This is only an approximation of the true average trip distance between the two zones. In addition, the centroid-to-centroid distance can lead to a zero-distance separation for intra-zonal flows, whose separation in reality is always positive (

The remainder of the paper is organized as follows. Section 2 reviews previous studies on OD estimation and trip distribution modeling using mobile phone data. Section 3 describes the datasets and data preparation procedures. Section 4 presents the detailed methods used to infer OD trips. Section 5 presents three measures of entropy used to understand the regularity of users’ OD trips. Section 6 presents the results of the trip distribution models. The final section outlines the limitations of our research and the conclusions drawn.

The use of mobile phone data has been explored for the development of large scale mobility sensing since the early 2000s (

Previous studies show that mobile phone data have the advantage of allowing OD flow estimates to be updated more frequently, which reduces the extensive time required to derive OD flows through traditional methods. The process can also be repeated with new datasets, which can be obtained at lower cost than data from traditional surveys (

Estimating flows between two zones is a classic problem that appears in a variety of fields, such as the distribution of raw materials or goods, flows of capital in economics, or flows of particles in physics. One prior work suggested that the number of trips between two zones follows the gravity law (

Where, p_{i} and p_{j} are the populations of zone i and zone j, d_{ij} is the distance between zone i and zone j, α is the gravity constant for trip distribution, and T_{ij} is the number of undirected trips between the two zones. The gravity law in

Where, the single gravity constant for trip distribution α is replaced by two sets of balancing factors, A_{i} = 1/Σ_{j}B_{j}D_{j}f_{ij} and B_{j} = 1/Σ_{i}A_{i}O_{i}f_{ij}, so that T_{ij} is constrained by the origin and destination totals O_{i} and D_{j}; f(C_{ij}) is a generalized cost function (travel cost).
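The balancing factors of a doubly constrained model are usually found iteratively, since each set depends on the other. The following is a minimal sketch of that iteration, assuming the standard form T_{ij} = A_{i}O_{i}B_{j}D_{j}f(C_{ij}) and an inverse-power deterrence function; the function name `balance_gravity`, the exponent `beta`, and the toy totals are illustrative assumptions, not values from this study.

```python
def balance_gravity(origins, destinations, cost, beta, iterations=100):
    """Doubly constrained gravity model (illustrative sketch).

    origins[i] is O_i, destinations[j] is D_j, cost[i][j] is C_ij.
    Assumes sum(origins) == sum(destinations) and strictly positive costs.
    """
    n, m = len(origins), len(destinations)
    # Assumed deterrence function: f_ij = C_ij ** (-beta)
    f = [[cost[i][j] ** (-beta) for j in range(m)] for i in range(n)]
    a = [1.0] * n  # balancing factors A_i
    b = [1.0] * m  # balancing factors B_j
    for _ in range(iterations):
        # A_i = 1 / sum_j B_j D_j f_ij
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m))
             for i in range(n)]
        # B_j = 1 / sum_i A_i O_i f_ij
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n))
             for j in range(m)]
    # T_ij = A_i O_i B_j D_j f_ij
    return [[a[i] * origins[i] * b[j] * destinations[j] * f[i][j]
             for j in range(m)] for i in range(n)]


# Toy example with two zones (all numbers are made up).
flows = balance_gravity([100.0, 200.0], [150.0, 150.0],
                        [[1.0, 2.0], [2.0, 1.0]], beta=2.0)
row_sums = [sum(row) for row in flows]
col_sums = [sum(flows[i][j] for i in range(2)) for j in range(2)]
```

After convergence the row sums reproduce the origin totals and the column sums reproduce the destination totals, which is what "doubly constrained" refers to.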

Batty (

Where, C_{ij} is the average inter-zonal trip length between zone i and zone j. T_{xiyj} is the number of trips between origin subzone x at zone i and destination subzone y at zone j, and C_{xiyj} is the trip distance between subzone x at zone i and subzone y at zone j.
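One reading consistent with these definitions is a trip-weighted mean, C_{ij} = Σ_{x}Σ_{y}T_{xiyj}C_{xiyj} / Σ_{x}Σ_{y}T_{xiyj}. The sketch below computes it under that assumption; the function name and the sample numbers are hypothetical.

```python
def average_trip_length(subzone_flows):
    """Trip-weighted mean distance between zone i and zone j.

    subzone_flows is a list of (trips, distance) pairs, one per
    subzone pair (x in zone i, y in zone j). Returns None when no
    trips were observed between the two zones.
    """
    pairs = list(subzone_flows)
    total_trips = sum(trips for trips, _ in pairs)
    if total_trips == 0:
        return None
    return sum(trips * dist for trips, dist in pairs) / total_trips


# Made-up example: 30 trips over 2 km and 10 trips over 6 km.
inter_zonal = average_trip_length([(30, 2.0), (10, 6.0)])

# Intra-zonal case (i == j): distances between distinct subzones of the
# same zone are positive, so the average is positive rather than zero.
intra_zonal = average_trip_length([(5, 0.8), (15, 1.2)])
```

Heavily used subzone pairs dominate the average, so the result tracks where trips actually start and end rather than the geometric centre of each zone.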

Despite the recent advances in the use of mobile phone data for travel demand modeling, such as OD flows of different modes (

This study uses anonymized Call Detail Records (CDRs) of mobile phone users collected across the entire country of Senegal over a two-week period from January 7 to January 20, 2013. In 2013, Senegal had an estimated population of 13,508,715. The country is divided into 14 regions, which are further divided into 45 departments and 123 arrondissements (districts) (

Consecutive mobile phone records of users are used to identify whether the user stays at a particular location, engaged in some activity, or passes by the location en route to a destination. Hariharan and Toyama (

One of the key differences between the study by Zheng et al. (

Because of the nature of the CDR data in our study, we cannot follow the stay location detection procedure provided by Zheng et al. (

In total, 44.4 million mobile phone connections from 319,508 users over a period of two weeks are analyzed. After consecutive traces with a time duration of less than 10 minutes are eliminated, the data points are reduced to 7.5 million stay locations (the number of times the time duration between a group of consecutive traces/calls is more than 10 minutes). Figure

Frequency of stays.

We identified the most significant locations visited by each user: home, work, and “other” locations. The trips made to these locations are then linked to the activity types work, home, or other. To identify the significant locations, a combined measure of the frequency, duration, time, and day of mobile phone calls is used. For each user, a home district is identified based on the aforementioned criteria during the night-time (10pm–7am) (

The analysis presented in Figure

For each user, the consecutive traces associated with each stay are arranged by date and time of day. A trip can be identified if the trip maker has more than one stay location within a 24-hour (one-day) period, where midnight is taken as the transition from one day to the next. It is assumed that a trip is made between two consecutive stay locations. Thus, the first interaction time at location i and the last interaction time at location i + 1 should fall within the same one-day period.
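The stay and trip rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used in the study: a stay is taken to be a run of consecutive records at the same tower spanning at least 10 minutes, a pair of consecutive stays at the same tower is not counted as a trip, and the record layout `(timestamp, tower_id)` and function names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby

STAY_THRESHOLD = timedelta(minutes=10)  # threshold used in this study


def detect_stays(records):
    """Return (tower, first_time, last_time) for each stay of one user.

    records is a list of (timestamp, tower_id) tuples. Runs of
    consecutive records at one tower that span less than the threshold
    are treated as pass-by locations and dropped.
    """
    stays = []
    ordered = sorted(records)  # chronological order
    for tower, run in groupby(ordered, key=lambda rec: rec[1]):
        times = [timestamp for timestamp, _ in run]
        if times[-1] - times[0] >= STAY_THRESHOLD:
            stays.append((tower, times[0], times[-1]))
    return stays


def build_trips(stays):
    """Link consecutive stays into (origin, destination) trips.

    A trip is kept only if the first interaction at the origin and the
    last interaction at the destination fall on the same calendar day.
    """
    trips = []
    for (orig, first_seen, _), (dest, _, last_seen) in zip(stays, stays[1:]):
        if orig != dest and first_seen.date() == last_seen.date():
            trips.append((orig, dest))
    return trips


# Made-up records for one user.
records = [
    (datetime(2013, 1, 7, 8, 0), "A"), (datetime(2013, 1, 7, 8, 30), "A"),
    (datetime(2013, 1, 7, 9, 0), "B"),  # single record: pass-by
    (datetime(2013, 1, 7, 10, 0), "C"), (datetime(2013, 1, 7, 10, 20), "C"),
    (datetime(2013, 1, 8, 9, 0), "D"), (datetime(2013, 1, 8, 9, 15), "D"),
]
stays = detect_stays(records)
trips = build_trips(stays)
```

In this example tower B is discarded as a pass-by, the trip from A to C is kept, and the pair C to D is rejected because it crosses midnight.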

OD trips are categorized by time of day (24 hours) and purpose. Home-based work (HBW) trips are trips between a person’s home and workplace. Home-based other (HBO) trips are trips between a person’s home and other destinations for purposes other than work. A non-home based (NHB) trip is a trip that neither begins nor ends at a person’s home, regardless of the purpose of the trip. The relative shares of average weekday trips are 19.5% for HBW, 13.6% for HBO, and 66.9% for NHB. Alexander et al., (

A number of studies have explored the use of mobile phone data for OD estimation. One of the cornerstones of these studies is the precise inference of activity locations (

Three entropy measures are calculated for each user’s mobility pattern: (i) Entropy 1: H1_{x} = log_{2}N_{x}, where N_{x} is the number of unique locations visited by user x. H1_{x} captures the degree of irregularity of a user assuming each visited location has equal probability; (ii) Entropy 2: H2_{x} = –Σ_{y}p_{x}(y) log_{2}p_{x}(y), where p_{x}(y) is the probability of user x visiting location y. The probability depends on the frequency of previous visits. H1_{x} and H2_{x} are based solely on the user’s spatial pattern and do not capture the time of location visits. To incorporate the time parameter and the direction of OD trips (pairs of visited locations), a joint entropy of the user’s mobility is introduced; (iii) Entropy 3: H3_{x} = –Σ_{o}Σ_{d}Σ_{t}p(o, d, t) log_{2}p(o, d, t), where o and d are spatial parameters representing the origin and destination of a trip, respectively. The origin and destination of a trip can be home (H), work (W), or other (O). The temporal parameter t takes one of four values: t_{1} (9am to 2pm), t_{2} (2pm to 7pm), t_{3} (7pm to 10pm), and t_{4} (10pm to 9am). p(o, d, t) is the joint probability of o, d, and t. At this level, locations other than home and work are not modeled explicitly and are classified as “other”. If “other” is treated as one location, a user can make 28 trip types: {(H, W, t_{1}), (H, O, t_{1}), (W, H, t_{1}), (W, O, t_{1}), (O, H, t_{1}), (O, W, t_{1}), (O, O, t_{1}), …, (O, H, t_{4}), (O, W, t_{4}), (O, O, t_{4})}. However, based on the two weeks of data, a user visited 4.85 distinct locations on average. Thus, there can be more than 28 trip combinations.
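The three measures can be computed from a user's visit and trip lists as sketched below. The sketch assumes that probabilities are estimated as relative frequencies, that H2_{x} is the Shannon entropy of visited locations, and that time-bin boundaries are closed on the left; the function names and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def shannon(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts)


def h1(visits):
    """Entropy 1: log2 of the number of unique visited locations."""
    return log2(len(set(visits)))


def h2(visits):
    """Entropy 2: entropy of locations weighted by visit frequency."""
    return shannon(Counter(visits).values())


def time_bin(hour):
    """Map an hour (0-23) to the four intervals t1..t4 from the text."""
    if 9 <= hour < 14:
        return "t1"
    if 14 <= hour < 19:
        return "t2"
    if 19 <= hour < 22:
        return "t3"
    return "t4"  # 10pm to 9am


def h3(trips):
    """Entropy 3: joint entropy over (origin, destination, time bin).

    trips is a list of (origin, destination, hour) tuples.
    """
    return shannon(Counter((o, d, time_bin(h)) for o, d, h in trips).values())


# Enumerate trip types when "other" counts as a single location:
# H-H and W-W are excluded, O-O is allowed.
TRIP_TYPES = [(o, d, t) for t in ("t1", "t2", "t3", "t4")
              for o in "HWO" for d in "HWO" if o != d or o == "O"]

regular_user = h3([("H", "W", 10)] * 5)              # same trip every day
commuter = h3([("H", "W", 10), ("W", "H", 17)] * 5)  # two equally likely trips
```

A user who always repeats the same trip in the same interval gets H3_{x} = 0, and each additional equally likely trip type raises the value, which matches the interpretation of low entropy as high regularity.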

To calculate the entropy values, 243,928 users with inferred home and workplace locations are selected. The accompanying figure shows the distribution of the entropy. H1_{x} and H2_{x} measure a user’s regularity in terms of visited locations. In fact, p(H1_{x}) peaks at H1_{x} = 1.86, suggesting that on average it takes around four locations to identify a user’s randomly chosen next location (2^{1.86} = 3.63). The value of H3_{x} ranges between 0 and 12.7. Users with an entropy value of 0 are regarded as highly regular: they travel between the same origins and destinations within the same time interval every day. Users with high entropy, on the other hand, travel between different origins and destinations at different time intervals.

The distribution of the entropy

To further understand users’ travel patterns, the k-means clustering method is used to categorize the users into groups based on their entropy value (H3_{x}). A plot of the within-group sum of squares against the number of clusters is used to determine the appropriate number of clusters. The resulting number of clusters is three. As the accompanying figure shows, the entropy values (H3_{x}) range between 5.5 and 12.7 in the high entropy cluster, between 3.2 and 5.5 in the moderate entropy cluster, and between 0 and 3.2 in the low entropy cluster. The results show that 35% of the users are in the high entropy cluster, and 77% of their trips are non-home based. Users in the low entropy cluster account for 21% of all users, and 49% of their trips are commuting trips, indicating high regularity. The remaining 44% of users are in the moderate entropy cluster.
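The clustering step can be illustrated with a small one-dimensional k-means, together with the within-group sum of squares used to choose the number of clusters. This is a sketch with a deterministic quantile-based initialisation, which is an assumption made here for reproducibility; the entropy values below are made up and the function names are hypothetical.

```python
def assign(values, centres):
    """Group each value with its nearest centre."""
    clusters = [[] for _ in centres]
    for v in values:
        nearest = min(range(len(centres)), key=lambda c: abs(v - centres[c]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centres, clusters)."""
    ordered = sorted(values)
    # Assumed initialisation: centres spread evenly across the sorted data.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iterations):
        clusters = assign(values, centres)
        updated = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    return centres, assign(values, centres)


def within_ss(centres, clusters):
    """Within-group sum of squares, the quantity plotted against k."""
    return sum((v - m) ** 2 for m, c in zip(centres, clusters) for v in c)


# Made-up H3 values with three visible groups.
entropy = [0.5, 1.0, 1.5, 4.0, 4.2, 4.4, 9.0, 10.0, 11.0]
wss = {k: within_ss(*kmeans_1d(entropy, k)) for k in (1, 2, 3)}
centres, clusters = kmeans_1d(entropy, 3)
```

Plotting `wss` against k and looking for the point where the decrease flattens is the usual way to read such a curve; in the study this procedure led to three clusters.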

The modeling framework set out in this study focuses on inter-zonal trips between origin zone

where τ is the overall component representing the level of inter-zonal flows,

Once the sample OD flows are detected (Section 4), the next step is to expand them so that they represent the mobility behavior of the total population. There are no available model outputs or comprehensive travel survey data (land use, number of employees, floor area, socioeconomic characteristics, etc.) in Senegal that can be used to develop trip generation models. A simplified method is developed to produce the following two pieces of information: (i) total daily person trips originating from each district (

Two approaches are used to estimate the average inter-district trip distances. Approach 1: the average inter-district trip distance is based on the Euclidean distance measured between the centroids of the origin and destination districts. Approach 2: Eq. 3 is used to measure the average inter-district trip distance. Figure

Two doubly constrained log-linear models are estimated using the average daily inter-district OD flows derived from sample users and expanded to the general population based on census data of the region (in Section 6.1): (i) Model 1 – trip distance is obtained based on Approach 1; and (ii) Model 2 – trip distance is obtained based on Approach 2. The inverse power

We use the glm() function implemented in R to fit the log-linear model in Eq. 5 to the data (

Eq. 6 shows the estimation of Model 1, where _{1} and dest_{1}, respectively. The coefficients of the origin categorical variables vary from 4.53 (Mbane), which is one of the districts that generates a high number of daily trips to –1.54 (Rufisque), which is one of the districts that generates a low number of daily trips. The coefficients of the destination categorical variables go from 0.77 (Dakar Plateau) to –4.11 (Fongolimbi), which is one of the districts that attracts few daily trips. The coefficient of distance variable is negative and significant.

Eq. 7 shows the estimation of Model 2.

In addition to the comparison in the accompanying figure, the R^{2} value is used to compare the observed trips against the model outputs; we found 0.78 for Model 1 and 0.91 for Model 2.

The trip distance (or travel cost) (lnC_{ij}) parameters of the two doubly constrained log-linear models are shown in

Additional analysis is done to check the reasonableness of the estimated OD flows using the Orientation Ratio (OR). This ratio is a simple indicator of the tendency of trips to move from a given production area to a given attraction area. In its definition, OR_{ij} is the orientation ratio between trip production district i and trip attraction district j, T_{ij} is the number of trips from i to j, D_{j} is the destination total of district j, and O_{i} is the origin total of district i; there are 123 districts in the study area. Figure

In developing countries, obtaining mobility data is challenging because of the limited budgets available for large-scale mobility surveys. In this study, we used CDR data obtained through the D4D Senegal challenge to analyze the country-wide mobility patterns of people. The main focus of the study was to explore the potential of CDR data for detecting origin and destination flows and for measuring the spatial and temporal variability of users’ OD trips. We then developed two trip distribution models. In this regard, we attempted to make a case for replacing the centroid-to-centroid trip distance with the cell tower-to-cell tower trip distance as the measure of travel cost, so that the average inter-zonal travel cost can be measured more realistically.

The extracted trips are categorized by purpose and time of day. The results show that the share of NHB trips is considerably high, in contrast with previous findings from another region of the world (

Two approaches were presented to estimate the inter-zonal trip distance by analysing Senegal’s CDR data on a traffic analysis zone system of 123 districts. The centroid-to-centroid distance produced a relatively larger average trip distance. As a result, the estimated model is highly sensitive to the distance-decay effect. In the second approach, the origin and destination locations within a zone are approximated by the locations of cell towers where users spend a significant amount of their time (measured through their mobile phone usage). This resulted in a shorter trip distance, and the model shows less sensitivity to the distance-decay effect. We also detected stay time at the workplace and time spent away from the workplace, which provides only part of the information required for transportation planning and operation. The timing of working hours, that is, when employees arrive at and depart from their workplace, would be more informative, but mobile phone data are limited in their ability to provide actual arrival and departure times.

One of the aims of our study is to provide transport planners in developing countries with an option that can be considered when detailed transport data for transport planning are absent. The results can also be used to support decisions on inter-district public transport planning as well as major national and regional road network development projects. In our analysis, the residence of sample users is used to obtain the expansion factor required to expand the results to the general population. However, future studies should incorporate detailed profiles of sample users, such as socio-economic and demographic characteristics, to properly represent the composition of the sample data.

The presented results are not validated against ground truth because of the lack of mobility data: Senegal does not conduct traffic counts on a regular basis and has no prior travel demand data from previous surveys, which tend to be costly, labour intensive, and disruptive to trip makers. Future studies should also apply CDR data from a longer period in order to capture the seasonality of travel demand and improve the representativeness of the extracted movements and flows. In our study, the choice of a 10-minute threshold to categorize traces into stay or pass-by is arbitrary. In reality, the threshold value should be area specific. Future studies should consider cell tower service area coverage as well as traffic and pedestrian congestion to determine area-specific threshold values.

The dataset used in this study was obtained through the framework of Data for Development (D4D) Senegal Challenge. The researchers are not allowed to share the data directly. However, the D4D Senegal Challenge data is available on request to other researchers for academic, non-commercial purposes by D4D organizers.

This work was supported by the Eyes High Postdoctoral Fellowship at the University of Calgary. This research work was partially supported by Chiang Mai University. The authors would like to thank the Data for Development Senegal Challenge for providing the mobile phone data.

The authors have no competing interests to declare.

MGD designed the study and processed the data. All authors analyzed the results and wrote the manuscript. All authors have read and approved the final manuscript.