These Research Outputs are not official statistics. Rather they are published as outputs from research into an Administrative Data Census approach. These outputs must not be reproduced without this disclaimer and warning note and should not be used for policy- or decision-making.
The Administrative Data Census Project is working to assess whether the government’s stated ambition that “censuses after 2021 will be conducted using other sources of data” can be met. We’re aiming to produce population estimates, household estimates and population and housing characteristics using a combination of administrative and survey data. This is to meet demands for improved population statistics and as a possible alternative to the census.
This publication contains early research to evaluate the current potential of using anonymised data generated by mobile phones to estimate commuting flows (and the mode of transport used for commuting) within an Administrative Data Census. We’d appreciate your feedback on our methodology and would like to hear any ideas you have for improvements. Please send your feedback to Admin.Data.Census.Project@ons.gov.uk.
This publication focuses specifically on a set of standard census outputs: the origin-destination (O-D) flows of people in employment from their usual residence (origin) to their main workplace (destination). These outputs are commonly referred to as census travel to work (TTW) data. The 2011 Census produced a number of travel to work outputs such as commuting by age, by sex and by main mode of transport. These outputs are available for different spatial scales ranging from Middle Layer Super Output Area (MSOA) up to country level. The 2011 Census TTW data are used for transport planning and policy areas such as economic activity and the labour market.
The government holds information, or “administrative data”, about individuals’ home and work locations; however, this is captured inconsistently. For example, data in HM Revenue and Customs may use a company’s head office for individuals employed by that company. This means it doesn’t meet the specific census definition of main workplace and cannot be used to generate the statistics that users need.
Our nationwide Annual Population Survey provides some information on employment patterns. However, estimates of employment are based on a complete year of survey responses and are only produced for larger geographies such as local authority (LA) level due to the sample size.
In the private sector, one of the more promising administrative data sources for informing on commuting flows (as documented together with advantages and disadvantages in an ONS literature review) is mobile phone data (MPD). Using the location of masts or cell-towers that mobile users frequently connect to, mobile network operators (MNOs) are able to estimate the geographical areas containing the usual residence and place of work for their subscribers. With appropriate weighting, the MNOs are then able to produce transport-related estimates including commuter flows, denoting commuters who are resident in each LA and their movement to their main workplace LA. The advantages of using MPD include the ability to model estimates for smaller areas than LA level as well as more frequent and timely outputs.
This analysis compares commuter flow estimates modelled from a sample of one MNO’s subscribers with equivalent 2011 Census TTW data. It describes how the MPD estimates may be used to either replace or add to the census questions. The MPD that were provided to us are anonymised (that is, no personal identifiable data were provided). Further information on how the commuter flows have been derived from MPD is given in Annex A.
These Research Outputs are not official statistics.
These outputs use commuter flow estimates modelled by a data analytics company (CitiLogik) from mobile phone data (MPD) from one of the main UK mobile network operators (MNO), Vodafone UK. When we refer to MNO in this publication, we are referring to Vodafone UK.
The MPD are from mobile subscribers aged 18 and over, as indicated by the age on contracts.
We’ve received aggregated MPD to protect individuals’ privacy – no personal identifiable data were provided.
This analysis examines commuter flows from local authority (LA) to LA across England and Wales.
The MPD were collected over a four-week period during March and April 2016.
The analysis is restricted to commuter flows starting or ending in three target LAs: Southwark and the neighbouring LAs of Croydon and Lambeth.
As mobile phone data (MPD) constitutes personal data, all processing and handling of it is subject to the Data Protection Act 1998. The Office for National Statistics is also subject to obligations set out in the Statistics and Registration Service Act 2007. Any intention to use MPD within the future production of official statistics will involve extensive evaluation including privacy impacts.
The MPD used to produce commuter flows for this research are the location and timestamps created when mobile phones interact with the mobile network. No identifiable information such as phone numbers, details of owners or recordings have been used. The measure of location is the cell-tower to which the mobile phone is connected. As cell-towers cover the surrounding area, it is not possible to determine a mobile’s location precisely enough to identify a home or work address.
The MPD location data have been generated from the subscribers (both those on contracts and on Pay As You Go terms) of one large UK mobile network operator. Subscribers who have opted out of having their data processed are excluded. The data have been anonymised and then processed by the analytical company CitiLogik. We have received aggregated (total numbers of) commuter flows with a minimum threshold of 15.
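The minimum threshold on aggregated flows can be pictured with a minimal sketch. The function name and example flows are hypothetical, and treating a flow of exactly 15 as retained is an assumption, since the supplier's exact rule is not described here.

```python
MIN_FLOW = 15  # threshold stated in the text


def apply_threshold(flows, minimum=MIN_FLOW):
    """Keep only origin-destination flows at or above the minimum count.

    Assumption: a flow of exactly 15 is retained; the source does not say.
    """
    return {od: count for od, count in flows.items() if count >= minimum}


# Hypothetical aggregated flows keyed by (origin LA, destination LA).
example_flows = {("LA1", "LA2"): 420, ("LA1", "LA3"): 14, ("LA2", "LA1"): 15}
released = apply_threshold(example_flows)
print(released)
```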
To make the 2011 Census travel to work (TTW) data and the mobile phone data (MPD) more comparable, two adjustments were made, both described more fully in Annex B. These adjustments were:
adjusting the 2011 Census TTW data to be more timely to represent mid-year 2015, similar to the MPD flows
re-weighting the MPD flows to represent the age range 16 and over, similar to standard 2011 Census TTW data (note 1)
The flows of people in employment between their usual residence and their main workplace are referred to as commuter flows. The 2011 Census TTW data show that for each of the three target local authorities (LAs), there were around 300 commuter flows leaving the LA for other LAs and around 300 commuter flows with a destination of the target LA. Most of these commuter flows were small with over 70% of them collectively representing only around 4% of total commuters. In our target LAs, the largest commuter flow tended to be the “intra-LA” commuter flow (where the residence LA was the same as the workplace LA). The intra-LA flow represented 27% of all commuters living or working in Croydon, 9% in Lambeth (note 2) and 10% in Southwark.
Initial inspection of the MPD flows indicated that the intra-LA flow for each target LA was very high compared with the census TTW data (note 3). The remaining MPD flows were generally underestimated. Possible reasons for these conflicting results are detailed later in this publication. These three intra-LA flows were removed from the analysis and considered separately.
The results in Figure 1 show how the MPD flows compare with the 2011 Census TTW data. In the diagrams, each point represents a specific LA origin to LA destination. For each combination of LAs, Figure 1 shows the estimated MPD flows against the 2011 Census TTW flow. The LA origin is the target LA for all points in the residence basis charts, while the workplace basis charts set the target LA to be the LA destination. For commuter flows larger than around 100 commuters, a strong linear relationship on both a residence and workplace basis is seen.
Given the strong linear relationship, a linear regression (note 4) was fitted through the data as shown in each of the panels in Figure 1. This measures how well MPD flows represent the 2011 Census TTW outputs. The gradients of the regression lines are shown in Table 1 and indicate that the MPD flows are generally 70% to 80% of the 2011 Census TTW flows (once intra-LA flows are removed). Commuter flows into Lambeth are underestimated to a greater degree in the MPD as they are around 55% of census totals. The excellent correlation at LA level (0.97 average) suggests the underestimation in the MPD flows is reasonably consistent.
One explanation for this underestimation is that the method used to identify a commuter relies on observing movement that indicates a standard working pattern. This means repeat daily journeys on weekdays from an area of residence to another area and back. The time of the journey and the time spent in the other area are also reflective of standard day time working. Commuters with non-standard work patterns, such as night or shift workers, depot workers and those on zero-hours contracts, might not be identified by this method. Additionally, commuters who are on holiday, ill or otherwise absent from work may not be identified due to the limited period of MPD available for study in this analysis.
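The two statistics reported in Table 1 can be illustrated with a short sketch: an ordinary least squares gradient with the intercept fixed at zero, and a Pearson correlation. The flow values below are made up for illustration and are not taken from this research; the function names are hypothetical.

```python
import math


def slope_through_origin(x, y):
    """OLS gradient for y = b * x with the intercept fixed at zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up LA-to-LA flows for illustration only (not values from this research).
census_flows = [200, 500, 1000, 2000, 4000]
mpd_flows = [150, 375, 750, 1500, 3000]

gradient = slope_through_origin(census_flows, mpd_flows)
correlation = pearson(census_flows, mpd_flows)
print(gradient, correlation)
```

A gradient below 1 in this setup means the MPD flows are, on average, a fixed proportion of the census flows, which is how the 70% to 80% figures above are read.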
|Local authority||Population basis||Linear regression coefficient (proportion of 2011 Census TTW flows represented by MPD flows)||Correlation (Pearson method)|
Table 1: Gradients of regression lines for mobile phone data against census flows
Intra-local authority flows
The intra-local authority (LA) flows were much higher in the mobile phone data (MPD) flows than in the 2011 Census travel to work (TTW) data. One main reason for this overestimation will be students being mistakenly inferred as commuters as their movement behaviour will be similar. Although the MPD uses data from subscribers known to be aged 18 or over, it’s also likely that some parents will take out subscriptions for their children’s mobiles. Children of secondary school age and in higher education might therefore also be included in the MPD flows.
Another factor explaining the high overestimation might be the different concepts behind what the MPD flows and the 2011 Census TTW data represent. The 2011 Census TTW data don’t include people who worked mainly from home or those who had no fixed place of work. Correspondingly, the methodology used in the MPD flows requires a workplace to be at least a short distance away from residence and therefore should also exclude home workers. Workers with very short commutes will also be missing from MPD flows. However, it’s possible that people with no fixed place of work might be seen to spend time at a work location regularly enough for MPD to identify them as being a worker. Table 2 illustrates the different categories of worker expected to be within the 2011 Census TTW and MPD flows.
|Data||Commute more than 2 kilometres||Commute less than 2 kilometres||Work mainly at or from home||No fixed workplace|
|2011 Census TTW||Yes||Yes||No||No|
Table 2: Categories of worker expected within census and mobile phone data
To further examine the difference between the 2011 Census TTW and MPD estimates of the intra-LA flows, it is assumed that census respondents who commuted less than two kilometres or had no fixed place of work lived and worked in the same local authority (LA). It is also assumed that students aged 16 and over will live and study in the same LA.
Figure 2 charts the 20 largest 2011 Census TTW flows from residents of the target LAs and the percentage difference between the 2011 Census TTW and the MPD flows. This chart is interactive. You can select the target LA, and include students aged 16 and over or commuters with no usual workplace within the 2011 Census intra-LA flows, and similarly exclude commuters who commute less than two kilometres.
Figure 2: Top 20 commuting flows originating in the target local authorities
Figure 2 highlights the high over-count of the MPD estimate in the intra-LA flow and the general undercount in flows into other LAs.
There are exceptions in that commuter flows into the City of London and Tower Hamlets (which contains Canary Wharf) are overestimated in the MPD data for all three target LAs. Some explanations for this include:
it’s easier to correctly identify a commuter into these areas, as they’re well known as centres of industry; this means it’s less likely that someone would go there regularly for any other purpose than work
these areas contain commuters belonging to higher socio-economic groups who are more likely to carry more than one mobile phone and to be identified as multiple commuters
these two areas are likely to have many frequent visits by workers who are normally based elsewhere
there might be real change in the number of commuters into these areas that isn’t accounted for in the adjustment to census to make the data more comparable; for example, the number of workplaces might have increased at a higher rate than increases in the population aged 16 to 64 years
The comparison between the MPD and 2011 Census intra-LA flows greatly improves if full-time students aged 16 and over are included within the census estimates, suggesting that they may be an important factor for the differences. This will be examined in further analyses at Middle Layer Super Output Area (MSOA) level.
Figure 3 shows the equivalent chart of the top 20 commuting flows from the 2011 Census into each of the target LAs. Again, it is an interactive chart.
Figure 3: Top 20 commuting flows with a destination in the target local authorities
Commuting by main mode of transport
The mobile phone data (MPD) flows were also broken down by predicted main mode of transport used for commuting. The modes available were road, rail and other (such as walking). As per the definition in the 2011 Census, the main mode of transport for any multi-mode commuting journey was the one used to cover the greatest distance.
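The "greatest distance" rule for multi-mode journeys can be sketched as follows. The journey legs are hypothetical, and summing distance per mode across legs before comparing is an assumption; the supplier's actual classification method is not described here.

```python
def main_mode(legs):
    """Return the mode that covers the greatest total distance.

    legs: list of (mode, distance_km) tuples for one commuting journey.
    Assumption: distances are summed per mode before comparing.
    """
    totals = {}
    for mode, distance_km in legs:
        totals[mode] = totals.get(mode, 0.0) + distance_km
    return max(totals, key=totals.get)


# Hypothetical journey: walk to the station, train, bus, short walk.
journey = [("other", 0.5), ("rail", 12.0), ("road", 3.0), ("other", 0.4)]
print(main_mode(journey))
```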
Figure 4 shows the relationship between MPD and 2011 Census travel to work (TTW) data for the target local authorities (LAs). The LA to LA commuter flows by main mode of transport were smaller than the overall commuting flows. Therefore, the focus is on commuter flows up to 2,000 in size for both the census and the MPD data. The flows originating and having destinations in the target LAs are included together for ease.
The patterns in the breakdown of the commuter flows by mode of transport show that the different modes are generally similar across each target LA.
Commuter flows by road are quite well represented in the MPD data for Croydon with more variability present in Lambeth and Southwark. There’s also an indication that the MPD flows by road are too high for commuters travelling into and out of Southwark, especially for 2011 Census TTW commuter flows less than 100.
However, rail travel is underestimated in the MPD data for commuters into and out of all three target LAs. This could be partly due to the difficulty in differentiating road and rail travel within dense metropolitan areas, such as the three target LAs. Of note is that the MPD only classifies underground trips as “rail” if the journey interacts with a train station as well. This means many underground trips might be categorised as being road based.
The “other” mode includes commutes on foot as well as those that can’t be categorised as road or rail. It appears that there’s a linear relationship – MPD being higher for commuter flows into and out of Croydon, but lower for Lambeth and Southwark. It is of note that commuters using this mode are likely to be mainly within the intra-LA flows, which have been removed from this analysis.
Further information on the methodology used to infer mode of transport is needed to understand more clearly the reasons for the differences. It should also be noted that, in part, these could be real changes in commuting modes that have happened in the past five years and which the limited modelling undertaken on census data has been unable to match. For example, commuting flows may have been affected by the London 2012 Olympics or the move of public sector employees to Croydon.
Notes for: What do the outputs show?
1. The travel to work (origin-destination) data from the 2011 Census covers all usual residents in the UK aged 16 and over who were in employment the week before the census.
2. The largest commuter flow for Lambeth is for commuters who live in Lambeth and work in Westminster. This flow represents 13% of all commuters either living or working in Lambeth.
3. The intra-local authority MPD flows were 93% larger than 2011 Census TTW in Croydon, 196% larger in Lambeth and 180% larger in Southwark.
4. Linear regression using ordinary least squares method with an intercept set to zero.
This research has shown, for local authority (LA) areas, that mobile phone data (MPD) flows and 2011 Census travel to work (TTW) data have good correlation for longer distance commuter flows over a magnitude of around 100 commuters. Although the MPD flows underestimate TTW, this is broadly consistent for the largest flows into and out of the target LAs. Future enhancement of the method to infer a commuter might be to remove the constraint of implied standard working hours to try and include commuters with non-standard work patterns, such as night or shift-workers, weekend workers and workers with varying hours of work.
Commuters whose journeys start and end in the same LA are greatly overestimated using MPD. As there is a methodological limitation of MPD in detecting home-workers or commuters who travel very short distances, this suggests many people, for more local areas, are being incorrectly identified as workers. For example, students and people who visit nearby shopping areas twice a week might be identified as workers in MPD. Future enhancement of the MPD algorithms might be able to separate students from workers by looking at movement behaviour across a longer period and considering the different movements during term times against school holiday times.
These intra-LA flows represent a large proportion of all commuters (27% of all workers living or working in Croydon, around 10% for Lambeth and Southwark). We therefore propose to replicate this research using Middle Layer Super Output Area (MSOA) geographies to see if MPD can provide good correlation for smaller areas than LA.
At MSOA level, the research will also look more closely at the different modes of transport used. It will put special focus on MSOAs containing train stations or tube stations. This will further add to understanding why rail journeys are underestimated in the MPD-flows at LA level. Other areas of interest for MSOA level analysis will include examining the built environment (such as the presence of colleges or retail centres) and whether the MPD overestimates workers due to likely confusion with students and shoppers. We’ll also examine MSOAs containing new housing developments built since the 2011 Census to see if MPD might provide more timely information on changes to commuter flows.
Office for National Statistics (ONS) anticipates accessing more non-disclosive MPD for the investigation into commuter flows. These new data might be produced using the traces from mobile subscribers on alternative mobile networks to the data used in this research. It is also expected that any MPD sourced will likely include improvements to the underlying methodology. This could include appropriately incorporating demographic information like age and sex within the weighting procedure to produce estimates representative of the population.
We’re keen to get your feedback on these Research Outputs and the methodology used to produce them. This includes how they might be improved and potential uses of the data. Please email your feedback to Admin.Data.Census.Project@ons.gov.uk. Please include the title of the output in your response.
The methodology used to derive commuter flows from mobile phone data (MPD) is owned by the mobile network operator (MNO) and the data analytics company. We can only describe it in a general sense to give you a broad understanding. Continued improvements in the methodology mean that results from this analysis are also likely to improve over time.
The anonymised MPD has been generated from the subscribers of one large UK MNO. Subscribers known to be under 18 years old are not included. The MNO also allows subscribers to opt out of having their data processed. These could all have an effect on the analysis. As MPD constitutes personal data, all processing and handling of it is subject to the Data Protection Act 1998. Office for National Statistics (ONS) is also subject to obligations set out in the Statistics and Registration Service Act 2007.
Mobile location data
Mobile phones connect via radio waves to a nearby cell-tower or base station that covers a specific geographic area called a “cell” (note 1). Groups of cells form larger geographies called “location areas” (note 2). It’s assumed that when a mobile is connected to a specific cell-tower, the mobile and its subscriber are somewhere within that tower’s cell area.
Two types of location information are generated from mobile phones. These are:
call detail records (CDRs), which are generated when the phone is active; this means when the user is taking or placing a phone call, sending or receiving a mobile message or using the internet on a smart-phone – in these situations, the time and the cell ID of the cell-tower forming the mobile connection is recorded
network data – when the phone is switched on but in idle mode, time and location data are generated from regular location updates (known as “pinging”); further location information is generated when the mobile phone transfers to a different cell-tower, suggesting that the phone is moving
Home location (origin)
Home location is worked out for each subscriber and is based on the cell where the mobile is located during the night or when switched on first thing in the morning. The MPD used in this research cover a period spanning four full weeks during March and April 2016. This is to show repeated patterns and give greater confidence about identifying home location. MNOs don’t use the address given on personal contracts to work out home location, as this information is considered to be less reliable.
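As a rough illustration of the idea, a home cell could be taken as the cell seen most often at night across the study period. This is a sketch only: the night window, the function name and the example events are assumptions, and the supplier's actual method is proprietary.

```python
from collections import Counter
from datetime import datetime

NIGHT_HOURS = range(0, 6)  # assumed night window, 00:00 to 05:59


def infer_home_cell(events):
    """Return the cell observed most often during night hours, or None.

    events: iterable of (timestamp, cell_id) pairs for one subscriber.
    """
    night_cells = Counter(cell for ts, cell in events if ts.hour in NIGHT_HOURS)
    if not night_cells:
        return None
    return night_cells.most_common(1)[0][0]


# Hypothetical events for one anonymised subscriber.
events = [
    (datetime(2016, 3, 7, 2, 0), "cell_A"),
    (datetime(2016, 3, 7, 13, 0), "cell_B"),
    (datetime(2016, 3, 8, 3, 30), "cell_A"),
    (datetime(2016, 3, 8, 14, 0), "cell_B"),
    (datetime(2016, 3, 9, 1, 15), "cell_C"),
]
home_cell = infer_home_cell(events)
print(home_cell)
```

Using several weeks of events, as in this research, makes the most frequent night-time cell less sensitive to one-off nights away.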
Work location (destination)
Work location is more difficult to model than home location. It’s usually set to the location where a mobile phone is found during standard work hours on standard work days (Mondays to Fridays). The method used to detect a workplace also needs the mobile subscriber to repeat journeys at least twice a week over the four-week period. They also need to spend a sufficient period of time there. Many mobile subscribers won’t have movement patterns that fit with this definition and won’t be identified as commuters. Equally, some mobile subscribers who are not commuters, such as students, will have movement patterns similar to those of a commuter.
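The rules described above can be sketched as a simple filter. Only the "twice a week" and "four-week" conditions come from the text; the four-hour minimum stay, the exclusion of the home cell, the input format and all names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

MIN_DAYS_PER_WEEK = 2  # "at least twice a week" (from the text)
WEEKS_REQUIRED = 4     # the four-week study period (from the text)
MIN_HOURS = 4.0        # assumed meaning of "a sufficient period of time"


def infer_work_cell(stays, home_cell):
    """Return a likely work cell for one subscriber, or None.

    stays: iterable of (date, cell_id, hours present in standard work hours).
    """
    days_seen = defaultdict(set)  # (cell, ISO week) -> qualifying weekdays
    for day, cell, hours in stays:
        if day.weekday() >= 5 or cell == home_cell or hours < MIN_HOURS:
            continue  # skip weekends, the home cell and short visits
        days_seen[(cell, day.isocalendar()[1])].add(day)

    qualifying_weeks = defaultdict(int)
    for (cell, _week), days in days_seen.items():
        if len(days) >= MIN_DAYS_PER_WEEK:
            qualifying_weeks[cell] += 1

    candidates = [c for c, n in qualifying_weeks.items() if n >= WEEKS_REQUIRED]
    return max(candidates, key=qualifying_weeks.get) if candidates else None


first_monday = date(2016, 3, 7)
# Hypothetical commuter: Mondays and Wednesdays at cell_W for four weeks.
commuter = [(first_monday + timedelta(weeks=w, days=d), "cell_W", 7.5)
            for w in range(4) for d in (0, 2)]
# Hypothetical shopper: short Saturday visits to cell_S.
shopper = [(first_monday + timedelta(weeks=w, days=5), "cell_S", 2.0)
           for w in range(4)]
print(infer_work_cell(commuter, "cell_H"), infer_work_cell(shopper, "cell_H"))
```

A student attending college on weekdays would pass the same filter, which is one way the overestimation of intra-LA flows could arise.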
Mapping cell areas to other geographies
An important step in producing MPD-flows involves mapping cell areas to other geographies. In this research, Middle Layer Super Output Areas (MSOAs) were chosen. The method to do this is proprietary to the data supplier and complex, as it requires consideration of how to allocate cell areas when they cover more than one MSOA. This causes problems with correctly identifying the home MSOA location of a subscriber identified as living in a certain cell area.
Producing the commuter estimates
All subscribers aged 18 and over (as recorded in the mobile phone contract) identified as living in an MSOA are given a weight set to the 2015 mid-year population estimate for people aged 16 to 64 years (note 3) divided by the total number of subscribers resident there. This weight attempts to correct for both the mobile phone penetration (average number of mobile devices owned by a resident irrespective of the mobile network operator (MNO)) and the MNO market share (of mobiles) in the area.
Subscribers identified as having a usual MSOA workplace are then totalled into flows of origin MSOA to destination MSOA, incorporating the weight given to subscribers resident in each origin MSOA.
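The weighting and aggregation steps can be sketched as below. The MSOA labels, population figure and subscriber records are hypothetical, and the supplier's actual implementation may differ.

```python
from collections import Counter, defaultdict


def msoa_weights(population_16_64, subscribers):
    """Weight per home MSOA: population estimate / resident subscribers."""
    residents = Counter(home for home, _work in subscribers)
    return {msoa: population_16_64[msoa] / n for msoa, n in residents.items()}


def weighted_flows(population_16_64, subscribers):
    """Sum subscriber weights into origin MSOA to destination MSOA flows.

    subscribers: list of (home_msoa, work_msoa) pairs; work_msoa is None
    when no usual workplace was identified.
    """
    weights = msoa_weights(population_16_64, subscribers)
    flows = defaultdict(float)
    for home, work in subscribers:
        if work is not None:
            flows[(home, work)] += weights[home]
    return dict(flows)


# Hypothetical inputs: one origin MSOA with four resident subscribers.
population = {"MSOA_A": 1000}
subscribers = [("MSOA_A", "MSOA_B"), ("MSOA_A", "MSOA_B"),
               ("MSOA_A", "MSOA_C"), ("MSOA_A", None)]
flows = weighted_flows(population, subscribers)
print(flows)
```

Note that every resident subscriber counts towards the weight's denominator, including those with no identified workplace, so the weighted flows sum to less than the population estimate.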
Notes for Annex A: How have the commuter flows been derived from mobile phone data?
1. Cell areas vary in size. In urban areas, where cell-towers are densely situated, they may have a range of 300 to 400 metres. In rural areas, cell-tower density is very sparse and a cell may have a range of five kilometres or more.
2. Location areas can also vary in size. Some can be very large.
3. Mobile phone data can only include data from subscribers aged 18 and over. Weighting to the age range 16 and over, as in this analysis, gives a better comparison with the age range of typical published census outputs. The proportion of commuters aged 16 and 17 years is small compared to commuters aged 18 and over.
Census travel to work (TTW) data are collected using well-defined concepts and definitions and are available in a variety of formats. Mobile phone data (MPD) is a by-product of mobile telephony and needs to be modelled into commuter flows to represent the census data definitions. This is a difficult task requiring an understanding of how mobile phone behaviour relates to the census TTW data and an appreciation of the possible biases involved. This annex describes adjustments made to both of these data to make them more comparable.
Two main adjustments were made to the data. The first adjusts the census estimates so that they relate to mid-year 2015 population estimates. The second involves re-weighting the MPD flows to represent the population aged 16 and over.
Making census more timely
Census data indicate that there’s a large reduction in the proportion of workers over the age of 64 compared with those aged 16 to 64 years. It’s therefore reasoned that the working age population – taken to be those aged 16 to 64 years – is one of the main factors affecting the number of people commuting out of an area. As such, the 2011 Census TTW data were adjusted based on the change in the aged 16 to 64 population from 2011 to 2015. The exact adjustment for each local authority (LA) is given by the following multiplier:
This multiplier for LA(i) was applied to the total commuters for all journeys originating in LA(i), and was calculated for each origin LA(i), where i ranges from one to the number of LAs. The revised totals for census data were used for all analysis in this publication. The median percentage change in working age population per LA from mid-2011 to mid-2015 was negative 0.345% with a maximum of 14.7% growth in Tower Hamlets and a minimum of 6.38% reduction in West Somerset. For the target LAs, Croydon’s working age population grew by 2.1%, Lambeth by 6.8% and Southwark by 6.5%. This adjustment therefore increased the census flows originating from these LAs.
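A minimal sketch of this adjustment follows, assuming the multiplier for LA(i) is the ratio of its mid-2015 to its mid-2011 population aged 16 to 64 years. That form is inferred from the description above rather than quoted, and the LA labels and population figures are hypothetical.

```python
def adjust_census_flows(flows, pop_2011, pop_2015):
    """Scale each 2011 flow by the working-age growth of its origin LA.

    Assumed multiplier for origin LA(i): pop_2015[i] / pop_2011[i],
    where both are populations aged 16 to 64 years.
    """
    adjusted = {}
    for (origin, destination), commuters in flows.items():
        multiplier = pop_2015[origin] / pop_2011[origin]
        adjusted[(origin, destination)] = commuters * multiplier
    return adjusted


# Hypothetical origin LA whose working-age population grew by 2.1%.
pop_2011 = {"LA1": 200_000}
pop_2015 = {"LA1": 204_200}
flows_2011 = {("LA1", "LA2"): 1_000, ("LA1", "LA3"): 500}
flows_2015 = adjust_census_flows(flows_2011, pop_2011, pop_2015)
print(flows_2015)
```

Because the multiplier depends only on the origin LA, every flow leaving the same LA is scaled by the same factor.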
Working age versus total population for MPD weighting
The MPD estimates use geo-location data collected from one mobile network operator’s (MNO’s) subscribers during four weeks in March and April 2016. Even though official estimates indicate that 93% of the adult population own and use a personal mobile phone, there are multiple MNOs. This means the subscribers from only one represent a much smaller fraction of the total population. This makes it necessary to weight the modelled estimates from any single MNO so the full population is represented.
The MPD estimates were weighted to cover the usually resident population using the Population estimates for the UK: mid-2015. These were the most recent available when the MPD estimates were commissioned and best reflected the period of the MPD collection. In contrast, the most recent population census was carried out in March 2011 and it’s reasonable to presume that TTW data may have changed over time.
The MPD estimates were based on data from mobile subscribers from one MNO aged 18 and over. The age and sex of each subscriber weren’t used within the modelling of these MPD estimates.
To account for the difference between the number of mobile phone subscribers who commute and the total commuting population, these estimates were weighted by the MNO to ONS’s 2015 estimate of the aged 16 to 64 population. This age range was chosen as it represents the working age population and is a standard age range for many published employment statistics.
However, as the methodology to produce MPD-flows first needs the locations of subscribers’ homes to be worked out, this suggests that weighting to the population aged 18 and over would be more appropriate. It was decided to re-weight the MPD flows to the age range 16 and over to match standard published TTW tables and as the proportion of the population aged 16 and 17 years is very small.
Although the weighting method used by the data supplier is based on Middle Layer Super Output Area geographies, the re-weighting was made using equivalent estimates at LA level. The ratio between the “aged 16 and over” and “aged 16 to 64 years” totals was calculated for each LA and then used to multiply the relevant LA’s MPD-flows originating there.
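The re-weighting ratio can be written out as a short sketch. The LA label, population totals and flow below are hypothetical.

```python
def reweight_mpd_flows(mpd_flows, pop_16_plus, pop_16_64):
    """Multiply each MPD flow by its origin LA's 16+ to 16-64 ratio."""
    reweighted = {}
    for (origin, destination), commuters in mpd_flows.items():
        ratio = pop_16_plus[origin] / pop_16_64[origin]
        reweighted[(origin, destination)] = commuters * ratio
    return reweighted


# Hypothetical LA where the population aged 16 and over is 20% larger
# than the population aged 16 to 64 years.
pop_16_64 = {"LA1": 100_000}
pop_16_plus = {"LA1": 120_000}
mpd_flows = {("LA1", "LA2"): 500.0}
up_weighted = reweight_mpd_flows(mpd_flows, pop_16_plus, pop_16_64)
print(up_weighted)
```

An LA with relatively few residents aged 65 and over has a ratio closer to 1, so its flows rise by less.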
The general impact of this re-weighting was that the MPD flows increased on average by around 25% although this increase was heavily dependent on the age composition of the LA. Flows from residents of Croydon rose by 20% but flows from Lambeth and Southwark only rose by around 10%, due to the larger proportion of residents aged 16 to 64 years. These “up-weighted” totals for MPD-derived data were used for the analysis presented in this publication.