Measuring uncertainty in international migration estimates

1. Overview of measuring uncertainty in international migration

Since April 2021, there has been a shift towards using administrative data as the main data source for producing estimates of long-term international migration. This is because the International Passenger Survey (IPS) has been recognised as stretched beyond its original purpose. The Office for Statistics Regulation (OSR) recommended that users need a clear understanding of the uncertainty associated with international migration estimates, and information on how the estimates can be used appropriately. Uncertainty measures will serve users as quality indicators for the estimates.

In this study, we investigate how a simulation-based approach can quantify some sources of uncertainty in methods used to generate international migration estimates. Our approach applies a resampling with replacement method to generate plausible intervals of uncertainty. "Uncertainty" is defined here as the quantification of doubt about a measurement.

Our study is partial and does not cover all sources of uncertainty. Future research will more thoroughly investigate other sources of uncertainty, such as bias in administrative data sources, and construct a more comprehensive measure of uncertainty in international migration estimates. Once further research has been completed, we will bring together all the individual measures of uncertainty into comprehensive measures of uncertainty for international migration, for both immigration and emigration estimates.

There are limitations to consider when interpreting the results presented in this paper:

we do not measure all the main sources of uncertainty and do not provide an uncertainty estimation for overall international migration estimates
net migration results should not be derived from this paper and further analysis will be required to estimate uncertainty associated with net migration
our measures of central tendency (such as mean and median) are derived from the simulation results and are not the same as the point estimates in official figures published
comprehensive measures of uncertainty that take into consideration all the main sources of uncertainty may produce wider intervals than those presented in this paper
any future transformation in international migration estimates (for example, data or method changes) will change uncertainty estimation

The results in this paper should not be taken as a full approximation of uncertainty around, or directly applied to, published international migration estimates.

2. Aims

This research aims to provide a preliminary estimate of uncertainty (primarily variance) in the adjustment stages of administrative-based migration estimates (ABMEs), for EU, non-EU and British nationals.

An adjustment is a correction made in the estimation process for some known bias in the data sources. These adjustments are based on some assumptions (such as historical patterns in data), and we measure the variance introduced by these assumptions.

The adjustments assessed in this report include the following:

EU nationals, under-16s adjustment, for immigration and emigration
EU nationals, student adjustment, for immigration and emigration
EU nationals, late-registration adjustment for immigration only
EU nationals, temporal disaggregation and prediction using the Denton-Cholette method, for both immigration and emigration
non-EU nationals, early leavers adjustment for immigration only
non-EU nationals, re-arrivals adjustment for emigration only

We also assess uncertainty around estimates for British nationals, and produce a partial composite measure of uncertainty based on all adjustments for EU nationals.

Figures and estimates published here are not comparable with those published in our most recent Long-term international migration, provisional: year ending December 2022 bulletin.

Since this analysis was conducted, some methodological changes have been applied to the estimates of international migration. Each change to the estimation method means that the uncertainty measurement method requires updating as well.

The estimates of uncertainty presented here are partial, based purely on simulation-based variance, and do not account for any bias in the migration estimates. The uncertainty intervals shown here are therefore likely to be narrower than the actual intervals around the point estimate.

3. Methodology

Data

To quantify uncertainty in long-term international migration (LTIM), we used the same raw data that were used to produce our LTIM estimates for the year ending (YE) December 2022. LTIM estimates are composed of three nationality groupings (EU, non-EU and British) and different data sources are used for each.

For EU nationals, we use the Registration and Population Interaction Database (RAPID). RAPID is created by the Department for Work and Pensions (DWP) to provide a single coherent view of citizens' interactions across the breadth of systems in the DWP, HM Revenue and Customs (HMRC) and local authorities via Housing Benefit. RAPID covers every National Insurance number (NINo) and for each record, the number of weeks of "activity" within these systems is summarised in each tax year. Records are then categorised as either long-term or short-term by looking for patterns of interactions with the tax and benefits system.

For non-EU nationals, we use Home Office Borders and Immigration Systems data. These are created by the Home Office. They combine visa and travel information to link an individual's travel movements into and out of the country using passport information, and assign long-term international migration status based on the derived length of stay in the UK.

The International Passenger Survey (IPS) is the main data source for estimating migration of British nationals. To estimate migration during the period when the IPS was suspended (March to December 2020), state space modelling was used.

Methods

The quantification of uncertainty is advised in The Aqua Book (PDF, 1.045MB). Qualitative expressions of uncertainty (for example, moderate and high) are ambiguous and mean different things to different people, and the degree of uncertainty may be misunderstood by others. It is recommended to express the impact of uncertainty quantitatively, when possible, in terms of the range of outcomes and their likelihoods, even if this is approximate and/or subjective.

A simulation approach was applied to quantify uncertainty in the adjustments, the modelling, and the survey-based estimates. For processes that incorporated surveys, the surveyed population was resampled with replacement to produce large numbers of replicates which had the same structure as the original sample. In the case of British national estimates, the International Passenger Survey (IPS) was directly sampled, while for EU estimates based on modelling, the IPS confidence intervals were used as plausible ranges. We did not use the raw IPS sample data for the EU national under-16s adjustment because of low sample counts and volatility.

For parts of the process that were without an immediate sample, the process was modelled statistically using probability distributions that mimic migration patterns. When required, an underlying normal distribution was assumed when the observed data could not be directly sampled. Further information on the selection of probability distributions can be found in Section 11: Appendix.

Further analysis, not presented here, checked the impact of this assumption by comparing our underlying normal distribution with other probability distributions. The uniform distribution was used to reflect an uninformed distribution that allocates equal probability across the defined ranges, producing an upper-bound point of comparison for uncertainty. Other distributions, such as the log-normal and the beta distribution, were used to check some of the parameter assumptions chosen for our normal distributions. The simulations generate a range of possible values that are used for interval estimation.

Further details are published in our Methods to produce provisional long-term international migration estimates methodology.

Non-EU nationals

Final-year immigration adjustment

The Home Office Borders and Immigration Systems data include a visa start and end date, as well as arrival and departure dates, for each record in the dataset. For those individuals whose first arrival occurred within the 12 months before the end of the dataset, we do not yet have enough data to estimate a long-term stay of 12 months or more. Instead, we use their visa end date as a proxy for a future departure date. All individuals in this group with a visa lasting at least 12 months are therefore counted as long-term immigrants and given by N^v_x, for year x and visa type v (work, study, family, other).

This assumption leads to an overestimation of long-term immigration, as a proportion of people leave before their visa end date and would not be classified as long-term migrants. To adjust for that overestimate, the number of people in year x who emigrated before their visa end date, by visa type v, E^v_x, is counted in the three previous non-coronavirus (COVID-19) years: 2017, 2018 and 2019.

For the three years this gives proportions p^v_x = E^v_x / N^v_x, for x = 2017, 2018 and 2019.

The proportions are then used to estimate p^v₂₀₂₂ by fitting the past proportions to constrained normal probability distributions. The number of early leavers, E^v₂₀₂₂, is then produced from the proportion by simulating from a binomial distribution.

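We assume that E^v₂₀₂₂ follows a binomial distribution with N^v₂₀₂₂ trials and probability p^v₂₀₂₂, that is, E^v₂₀₂₂ ~ Binomial(N^v₂₀₂₂, p^v₂₀₂₂).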

The simulated number of early leavers is subtracted from N^v₂₀₂₂ to produce final counts by visa type.

The process is repeated 10,000 times to produce a distribution of immigration counts that is used for interval estimation.
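As an illustration only, the steps above can be sketched in Python for a single visa type. All input counts are hypothetical, and two choices are assumptions rather than details taken from the method: the "constrained normal" is read as a normal distribution truncated to the range 0 to 1, and its mean and standard deviation are taken from the three past proportions.

```python
import numpy as np

rng = np.random.default_rng(2022)
N_SIMS = 10_000

# Hypothetical inputs for one visa type (illustrative numbers only).
past_long_term = np.array([100_000, 110_000, 120_000])   # N^v_x, x = 2017 to 2019
past_early_leavers = np.array([9_000, 12_100, 10_800])   # E^v_x, x = 2017 to 2019
n_2022 = 150_000                                          # N^v_2022

# Past proportions of early leavers, p^v_x = E^v_x / N^v_x.
past_p = past_early_leavers / past_long_term
mu, sigma = past_p.mean(), past_p.std(ddof=1)


def truncated_normal(mu, sigma, size, rng, low=0.0, high=1.0):
    """Draw from a normal distribution, redrawing any value outside [low, high]."""
    draws = rng.normal(mu, sigma, size)
    outside = (draws < low) | (draws > high)
    while outside.any():
        draws[outside] = rng.normal(mu, sigma, outside.sum())
        outside = (draws < low) | (draws > high)
    return draws


# Simulate the 2022 proportion, then the number of early leavers.
p_2022 = truncated_normal(mu, sigma, N_SIMS, rng)
early_leavers = rng.binomial(n_2022, p_2022)

# Subtract the simulated early leavers to get adjusted immigration counts.
adjusted = n_2022 - early_leavers
lower, median, upper = np.percentile(adjusted, [2.5, 50, 97.5])
print(f"95% interval: {lower:,.0f} to {upper:,.0f} (median {median:,.0f})")
```

Drawing the proportion first and the count second means that each simulated count reflects both the year-to-year variation in the proportion and the binomial variation around it.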

Emigration re-arrivals adjustment

Long-term migrants with a last departure from the UK are recorded as emigrants if their visa has ended and they do not return to the UK within 12 months, or if they only return for a short-term stay. Some of these people classified as long-term emigrants may later return to the UK for another long-term stay. Like the final-year immigration adjustment, the people that would have been classified as emigrants but then return are counted for the three previous non-COVID-19 years. We simulate the proportion of early returns for the current year by fitting probability distributions to past data, and then simulate from a binomial distribution to produce current year counts. The process is repeated 10,000 times and used for interval estimation.

EU nationals

Late-registration adjustment

RAPID data relies on people applying for a NINo upon first arrival into the country. In practice, there is a gap between date of arrival and date of registration for a substantial proportion of first-time arrivals, who could then be counted as immigrants in an incorrect year. To account for the gap, counts of NINo registration, based on one year and two years after arrival to the country, are created for previous years. The proportions of late registration for NINo are calculated for previous years and, as in the non-EU final-year adjustment, are estimated for the current year by simulating from fitted probability distributions. Counts for the current year, as well as the two previous years, are then adjusted by simulating from their respective binomial distributions. The process is repeated 10,000 times to produce a range of values.

Student adjustment

RAPID uses interactions with the breadth of benefits and earnings systems in the DWP and HMRC to estimate migration into and out of the UK. Any students who do not work alongside their studies will not be identified as long-term migrants using RAPID. The international student inflows for the academic YE 2016 to the academic YE 2018 are linked to their corresponding records in HMRC Pay As You Earn Real Time Information (PAYE RTI) from April 2014 to April 2019 via the Demographic Index. The Demographic Index is described in our report, Evaluating Statistical Quality in the Demographic Index (PDF, 550KB). Anyone without any payments from employment in the tax year of arrival and the following tax year is considered not to be included in our RAPID estimates. The proportion of "students not working" was calculated for the three academic years (ending 2016 to 2018). An average of these years was used for years of Higher Education Statistics Agency (HESA) data where PAYE RTI is not yet available. Furthermore, the proportion of "students departing the country" was calculated for the same three years from the Longitudinal Education Outcomes (LEO) dataset from the Department for Education.

For immigration, the proportion of working students is modelled from past data points and simulated to derive samples of the proportion of non-working students. These proportions of non-working students are then applied to the HESA total to provide a plausible set of simulated student counts where each iteration is added to the RAPID number, producing a range of estimates.

For emigration, the process is the same, except that the proportions of departing students are also simulated to provide a plausible set of simulated students not included in RAPID. Independence between students being in RAPID and emigrating from the UK is assumed.
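A minimal sketch of the student adjustment for both flows is given below. Every number and variable name is hypothetical. The use of a normal distribution clipped to the range 0 to 1, and of binomial draws to turn a proportion into a count, are assumptions made for illustration. The independence assumption appears as the product of the two simulated proportions.

```python
import numpy as np

rng = np.random.default_rng(7)
N_SIMS = 10_000

# Hypothetical past proportions for the three academic years ending 2016 to 2018.
working_share = np.array([0.42, 0.45, 0.40])     # students with PAYE RTI payments
departing_share = np.array([0.30, 0.33, 0.31])   # students departing the country

# Hypothetical totals.
hesa_total = 60_000          # international student total from HESA
rapid_immigration = 140_000  # immigration count already captured in RAPID


def simulate_share(past, size, rng):
    """Draw proportions from a normal fitted to past values, clipped to [0, 1]."""
    draws = rng.normal(past.mean(), past.std(ddof=1), size)
    return np.clip(draws, 0.0, 1.0)


# Immigration: simulate working students, derive non-working students.
p_working = simulate_share(working_share, N_SIMS, rng)
p_not_working = 1.0 - p_working
missing_immigrants = rng.binomial(hesa_total, p_not_working)
immigration = rapid_immigration + missing_immigrants

# Emigration: also simulate departures. Under independence, the chance that a
# student is both missing from RAPID and departing is the product of the two.
p_departing = simulate_share(departing_share, N_SIMS, rng)
p_missing_and_departing = p_not_working * p_departing
missing_emigrants = rng.binomial(hesa_total, p_missing_and_departing)

print(np.percentile(immigration, [2.5, 50, 97.5]).round(-2))
print(np.percentile(missing_emigrants, [2.5, 50, 97.5]).round(-2))
```

If working and departing were correlated, for example if non-working students were more likely to leave, the product would no longer be the correct joint proportion and the simulated interval would change.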

Under-16s adjustment

Because RAPID relies primarily on interaction with financial administrative sources, it has a coverage gap for minors (those aged under 16 years). For this reason, the IPS data are used to estimate the current-year adult-to-child ratio separately for immigration and emigration. The IPS estimates produce counts and variance by age group. They are assumed to follow a normal distribution with the given count as the mean. The adult-to-child proportion is then simulated by resampling counts of those aged under 16 years and those aged 16 years and over, assuming independence between the two groups. The estimated counts of immigrants and emigrants aged under 16 years are then added to the RAPID totals. The process is repeated 10,000 times to produce a range of values.
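The under-16s adjustment can be illustrated with the following sketch. The IPS counts, standard errors and RAPID total are hypothetical, and flooring negative simulated child counts at zero is an added assumption, since a normal distribution can produce negative draws when the standard error is large relative to the count.

```python
import numpy as np

rng = np.random.default_rng(16)
N_SIMS = 10_000

# Hypothetical IPS estimates (count and standard error) by age group.
ips_child_count, ips_child_se = 6_000, 2_000
ips_adult_count, ips_adult_se = 90_000, 9_000

# Hypothetical RAPID total, which covers adults only.
rapid_adult_total = 150_000

# Independent normal draws for each age group, centred on the IPS counts.
children = rng.normal(ips_child_count, ips_child_se, N_SIMS)
adults = rng.normal(ips_adult_count, ips_adult_se, N_SIMS)
children = np.clip(children, 0, None)  # assumption: no negative counts

# Child-to-adult ratio in each simulation, applied to the RAPID adult total.
ratio = children / adults
under16 = ratio * rapid_adult_total
total_with_children = rapid_adult_total + under16

print(np.percentile(under16, [2.5, 50, 97.5]).round(-2))
```

Because the child count is small relative to its standard error, most of the spread in the simulated adjustment comes from the numerator of the ratio, which is one reason this adjustment can show wide intervals.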

Modelling

Temporal disaggregation is used to generate EU-national estimates to account for timeliness limitations of RAPID, as the data are only provided up to the end of the financial year (ending March). The temporal disaggregation process performs both distribution and extrapolation of RAPID data up to the reference date for publication. Distribution is the disaggregation of already known totals in the annual source to the monthly totals, which does not go beyond the end time of RAPID data in March. Extrapolation is the process of generating monthly figures beyond the timeframe of RAPID data, based on the signals and trends in the higher frequency time series of the IPS.
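To make distribution and extrapolation concrete, the sketch below implements one simple form of the Denton-Cholette approach: the additive first-difference variant, solved as a constrained least-squares problem. The variant, the indicator series and the annual totals are all assumptions chosen for illustration; the configuration used in production is not described in this section.

```python
import numpy as np


def denton_cholette_additive(indicator, annual_totals, months_per_year=12):
    """Additive first-difference Denton with Cholette's modification (sketch)."""
    p = np.asarray(indicator, dtype=float)
    y = np.asarray(annual_totals, dtype=float)
    n, m = p.size, y.size

    # Aggregation matrix: row k sums the months of year k. Months after the
    # last annual total carry no constraint, so they are extrapolated.
    C = np.zeros((m, n))
    for k in range(m):
        C[k, k * months_per_year:(k + 1) * months_per_year] = 1.0

    # First differences of the adjustment (x - p), with no initial condition.
    D = np.diff(np.eye(n), axis=0)

    # Minimise ||D (x - p)||^2 subject to C x = y via the KKT linear system.
    kkt = np.block([[D.T @ D, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), y - C @ p])
    adjustment = np.linalg.solve(kkt, rhs)[:n]
    return p + adjustment


# Hypothetical inputs: 30 months of a monthly indicator and 2 annual totals,
# leaving the final 6 months to be extrapolated.
months = np.arange(30)
indicator = 5_000 + 300 * np.sin(2 * np.pi * months / 12)
annual_totals = np.array([150_000, 160_000])

monthly = denton_cholette_additive(indicator, annual_totals)
print(monthly[:12].sum(), monthly[12:24].sum())
```

In this form, the months covered by annual totals are adjusted so that they sum to those totals (distribution), while the months beyond the last total keep the final adjustment level and follow the movements of the indicator (extrapolation).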

To quantify the uncertainty in temporal disaggregation we have used the IPS monthly confidence intervals for EU immigration and emigration, based on an assumption that the intervals provide a plausible range to capture the uncertainty in the higher frequency time series. We have then resampled with replacement from this range, with an underlying normal distribution. The result is 10,000 plausible IPS time series for EU immigration and emigration, which are then used as inputs for the temporal disaggregation modelling. The 10,000 outputs from the model provide the range of values which make up the uncertainty interval.
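One way to generate plausible IPS time series from published confidence intervals is sketched below. It assumes that each monthly interval is a symmetric 95% interval, so that the half-width divided by 1.96 gives a standard error, and that months are simulated independently. The monthly estimates and half-widths are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SIMS = 10_000
Z_95 = 1.96  # normal quantile for a two-sided 95% interval

# Hypothetical monthly IPS estimates and 95% confidence interval half-widths.
ips_estimate = np.array([12_000, 13_500, 11_000, 14_000, 12_500, 13_000], dtype=float)
ci_half_width = np.array([5_000, 5_500, 4_800, 6_000, 5_200, 5_400], dtype=float)

# Standard error implied by each interval under a normal assumption.
standard_error = ci_half_width / Z_95

# One row per simulated time series, one column per month.
simulated_series = rng.normal(
    ips_estimate, standard_error, size=(N_SIMS, ips_estimate.size)
)
simulated_series = np.clip(simulated_series, 0, None)  # assumption: no negative flows

# Check: roughly 95% of draws should fall inside the original intervals.
inside = np.abs(simulated_series - ips_estimate) <= ci_half_width
share_inside = inside.mean()
print(f"Share of draws inside the intervals: {share_inside:.3f}")
```

Each row of `simulated_series` could then be used as the higher frequency input to a temporal disaggregation step, giving one model output per simulated series.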

British nationals

British national migration is a survey-based estimate from the IPS. To measure uncertainty in estimates of British national migration we have used non-parametric bootstrapping of the observed survey data from January 2022 to December 2022. We make no distributional assumptions about the survey data.

The interval estimate is derived through generating 10,000 new samples. We generated these new samples from resampling with replacement directly from the IPS-observed data. The 10,000 new samples provide a range of plausible estimates for British national migration. The complex sample design of the IPS and non-sampled international travel routes and time periods mean that weights are used to create national estimates of international travellers. Based on previous research [Note 1], and the proportional contribution of British nationals to international migration estimates, we have taken a simple resampling approach, with no reruns of survey weighting or imputation. This approach will therefore understate the uncertainty. Incorporating these sources of uncertainty is a possible area for future improvement.
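The simple resampling approach can be sketched as follows. The survey weights are randomly generated stand-ins for real IPS weights, and the sample size is illustrative. As in the approach described above, the weights are held fixed, with no reweighting or imputation inside each replicate.

```python
import numpy as np

rng = np.random.default_rng(11)
N_SIMS = 10_000

# Hypothetical survey records: one weight per sampled migrant contact.
n_records = 116
weights = rng.gamma(shape=2.0, scale=380.0, size=n_records)

# Point estimate: the weighted total of the observed sample.
point_estimate = weights.sum()

# Non-parametric bootstrap: resample record indices with replacement,
# keeping each replicate the same size as the original sample.
indices = rng.integers(0, n_records, size=(N_SIMS, n_records))
bootstrap_totals = weights[indices].sum(axis=1)

lower, median, upper = np.percentile(bootstrap_totals, [2.5, 50, 97.5])
print(f"Point estimate: {point_estimate:,.0f}")
print(f"95% interval: {lower:,.0f} to {upper:,.0f} (median {median:,.0f})")
```

Because each replicate reuses the original weights, variation from the weighting and imputation steps is not captured, which is consistent with the statement that this approach understates the uncertainty.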

[Note 1] Unpublished report by X. Ou and P. Smith (2012), 'The Developed Methodology to Estimate Statistical Error of Mid-year Local Authority Emigration Estimates'.

4. Results

Our results on measuring uncertainty include:

EU nationals, immigration and emigration, for adjustments and for modelling
a composite measure for EU nationals, immigration and emigration, combining adjustments and modelling
non-EU nationals, immigration and emigration, for adjustments
British nationals from survey-based estimation

All results presented in this section are rounded to the nearest hundred.
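The summary statistics reported in the tables in this section (mean, standard deviation and selected percentiles of the simulated values, rounded to the nearest hundred) could be computed along the following lines. The function name and the use of the sample standard deviation are assumptions made for illustration.

```python
import numpy as np


def summarise(draws, round_to=100):
    """Summarise simulation draws, rounding each statistic to the nearest `round_to`."""
    draws = np.asarray(draws, dtype=float)
    stats = {"mean": draws.mean(), "sd": draws.std(ddof=1)}
    for q in (2.5, 25, 50, 75, 97.5):
        stats[f"p{q:g}"] = np.percentile(draws, q)
    return {name: int(round(value / round_to) * round_to) for name, value in stats.items()}


# Hypothetical simulation output: 10,000 draws around 12,000.
rng = np.random.default_rng(0)
example_draws = rng.normal(12_000, 4_000, 10_000)
print(summarise(example_draws))
```

The 2.5th and 97.5th percentiles together give the range that contains 95% of the simulated values.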

EU-national adjustments

Here, we show estimates of uncertainty around adjustments used to calculate EU migration into and out of the UK, in the year ending (YE) March 2022. The results presented assume that the adjustment time series (adult-to-child ratio, proportion of students in the Registration and Population Interaction Database (RAPID), and proportion of people registering for a National Insurance number (NINo) a year after arriving in the UK) follow a normal distribution. Once that proportion is sampled, it is used to simulate the people being selected in their respective group. Three adjustments are made for immigration (RAPID with late-registration adjustment, student adjustment, and under-16s adjustment) and two adjustments are made for emigration (student adjustment and under-16s adjustment).

Tables 1 and 2 present the results for 2021 and 2022, as the late-registration adjustment affects the prior two years of data that then feed into the disaggregation model.

Table 1: Statistics from 10,000 simulations for year ending March 2021 EU immigration, impact of adjustments only
Adjustment	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
RAPID with late-registration adjustment	136,700	3,900	129,200	134,100	136,700	139,400	144,600
Student adjustment	33,300	500	32,200	32,900	33,300	33,600	34,300
Under-16s adjustment	10,000	3,400	3,500	7,600	9,900	12,200	16,700

Table 2: Statistics from 10,000 simulations for year ending March 2022 EU immigration, impact of adjustments only
Adjustment	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
RAPID with late-registration adjustment	161,800	5,600	151,000	158,000	161,700	165,600	173,000
Student adjustment	20,100	300	19,500	19,900	20,100	20,300	20,700
Under-16s adjustment	11,800	4,000	4,200	9,000	11,700	14,400	19,900

The mean from the under-16s adjustment based on the simulations is 11,800, with a standard deviation of 4,000. What the simulations additionally show us is that 95% of the results lie somewhere between a value of 4,200 and 19,900. There is therefore a large amount of uncertainty.

In theory, if the upwards adjustment to the data to account for the under-16s missing from the RAPID dataset was assumed to be the mean of 11,800 for YE March 2022 for EU immigrants, the "true" number could be between 4,200 and 19,900. Quantifying this uncertainty is essential to inform users of these statistics through providing a plausible range where the "true" number could lie.

The student adjustment contribution is estimated to have lower uncertainty than the under-16s adjustment, with 95% of simulations between 19,500 and 20,700.

Estimates of uncertainty around adjustments used to calculate the number of EU nationals emigrating from the UK, in YE March 2022, are presented in Table 3. These adjustments include an under-16s adjustment and a student adjustment. Because of the lower complexity of the emigration process, our method introduces less variability to the estimates and very likely underreports the true emigration uncertainty.

Table 3: Statistics from 10,000 simulations for year ending March 2022 EU emigration, impact of adjustments only
Adjustment	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
Student adjustment	11,800	700	10,400	11,400	11,800	12,300	13,200
Under-16s adjustment	3,400	1,500	700	2,400	3,400	4,400	6,500

The simulation results suggest, for EU immigration and emigration, that there is proportionally more uncertainty associated with the under-16s adjustment in comparison with the student adjustment. However, these results should be interpreted with caution. This is because the data used for the student adjustment is a proportion of EU students in work from pre-coronavirus (COVID-19) and pre-Brexit years, which is projected forward to the current year. The pattern of EU students' migration to and from the UK, and their tendency to appear in RAPID through their paid work, could have changed significantly since.

Modelling

Here we present uncertainty estimates for temporal disaggregation and the prediction of EU nationals' migration using the Denton-Cholette method. This method is detailed in the report, ESS guidelines on temporal disaggregation, benchmarking and reconciliation (PDF, 2.397MB). The immigration and emigration simulation results from the resampling are presented in Table 4.

Table 4: Statistics from 10,000 simulations for year ending December 2022 EU immigration and emigration after modelling
Year ending quarter	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
Dec 2022 immigration	152,700	20,400	117,800	139,000	150,700	164,100	198,500
Dec 2022 emigration	205,500	29,700	151,000	185,900	203,600	223,000	270,600

For immigration, our simulated results indicate that 95% of the estimates were between 117,800 and 198,500 (a range of approximately 81,000). For emigration, 95% of the estimates were between 151,000 and 270,600 (a range of approximately 120,000). The simulation results suggest that temporal disaggregation, which uses the International Passenger Survey (IPS) as a higher frequency input to disaggregate RAPID data, is a main source of uncertainty.

EU composite measure

Table 5 shows the results for a preliminary measure of uncertainty for EU national migration. The composite measure is derived from combining the adjustments and modelling. This is done through combining the simulation results from the RAPID adjustments and the simulation results from generating new IPS time series for EU migration. We can use these as two sources to input into the Denton-Cholette model. The model requires a higher frequency input (such as simulated IPS-based EU migration time series) and a lower frequency input (such as simulated results after applying adjustments to RAPID data as the main source for EU migration estimates). This means that we have 10,000 simulated versions of RAPID (with adjustments) and 10,000 simulated versions of IPS EU migration, both for immigration and emigration. These act as inputs into the Denton-Cholette model, and the outputs provide our composite measure.

The adjustments are in place to correct for the structural missingness in RAPID data. As such, the composite measure does not contain any uncertainty estimation in relation to bias within the underlying data.

Table 5: Statistics from 10,000 simulations for year ending December 2021 and December 2022 EU immigration and emigration, accounting for adjustments and modelling
Estimate	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
YE Dec 2021 immigration	191,100	10,200	172,100	184,200	190,900	197,700	212,400
YE Dec 2021 emigration	237,300	8,000	221,100	232,100	237,500	242,600	252,500
YE Dec 2022 immigration	148,600	20,900	112,800	134,100	146,800	160,800	195,600
YE Dec 2022 emigration	205,300	29,700	151,000	185,700	203,400	222,800	270,600

For immigration in YE December 2022, 95% of the estimates were between 112,800 and 195,600 (a range of approximately 83,000). For emigration, 95% of the estimates were between 151,000 and 270,600 (a range of approximately 120,000).

Figures 1 and 2 display YE quarterly uncertainty estimates for the composite of EU estimates. Both figures show uncertainty increases when EU estimates include forecasting as a component of the estimation. The implication is that extrapolation is a main source of uncertainty. We see smaller uncertainty estimates up until March 2022, which is the timeframe of RAPID, the main data source. We then see increasing uncertainty estimates as we progress beyond March 2022, when we need to extrapolate beyond the timeframe of the main data source.

Figure 1: Boxplot of 10,000 simulations for immigration totals since year ending June 2021

EU immigration simulation boxplots, year ending (YE) quarterly, composite

Source: Office for National Statistics, Department for Work and Pensions, Higher Education Statistics Agency

Notes:

The boxplots in the grey area, beginning YE June 2022, show estimates that are based on forecasts.

Figure 1 shows the results from the simulations for immigration. The boxplots by quarter show the range of results for year ending immigration totals. For instance, YE March 2022 is the immigration count for April 2021 to March 2022. For boxplots up to YE March 2022, the data used in the simulation was drawn from observed RAPID data plus adjustments and disaggregation. For boxplots after that point, the data used in the simulation included estimated RAPID data points. Wider uncertainty is associated with the estimated data.

Figure 2: Boxplot of 10,000 simulations for emigration totals since year ending June 2021

EU emigration simulation boxplots, year ending (YE) quarterly, composite

Source: Office for National Statistics, Department for Work and Pensions, Higher Education Statistics Agency

Notes:

The boxplots in the grey area, beginning YE June 2022, show estimates that are based on forecasts.

Figure 2 follows the same layout as Figure 1. The data up to YE March 2022 are based on observed RAPID totals, while after that point the emigration data are estimated.

Non-EU adjustments

Immigration

Estimates of uncertainty around adjustments used to calculate the number of non-EU nationals immigrating into the UK, in YE December 2022, are presented here. Table 6 shows results for the early leavers adjustment for non-EU nationals.

Table 6: Statistics from 10,000 simulations for year ending December 2022 non-EU immigration by visa type, after applying the early-leavers adjustment
Visa type	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
Work	232,700	8,500	215,800	227,800	232,800	238,400	249,000
Study	359,100	26,300	306,800	341,100	359,000	377,000	410,600
Family	50,900	1,200	48,400	50,000	50,900	51,700	53,200
Other	34,300	1,300	31,700	33,500	34,400	35,200	36,600

The simulation results suggest that the data for immigrants on study visas are most susceptible to uncertainty following the application of the early leavers adjustment: in Table 6, 95% of the estimates are between 306,800 and 410,600 (a range of approximately 104,000). A main reason is the year-to-year variability in the proportion of early leavers with study visas. This makes sense because, if the behaviour of this population changes over time, our estimates will be more uncertain.

Other visa types also have large uncertainty, such as visit visas. These contribute less to overall immigration uncertainty as their counts are smaller.

Emigration

Estimates of uncertainty around adjustments used to calculate the number of non-EU nationals emigrating out of the UK, in YE December 2022, are shown in Table 7. These adjustments include an emigration re-arrivals adjustment for potential emigrants who return to the country when they would have been counted as emigrating.

Table 7: Statistics from 10,000 simulations for year ending December 2022 non-EU emigration, after applying the emigration re-arrivals adjustment
Estimate	Mean	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
Dec 2022 emigration	187,800	1,200	185,400	187,000	187,800	188,700	190,200

For emigration, 95% of the estimates were between 185,400 and 190,200 (a range of approximately 5,000). The uncertainty is small compared with immigration because the emigration calculation process is relatively simple. This likely results in an underestimation of uncertainty and is an area for future improvement.

British national survey estimates

Estimates of uncertainty around the number of British nationals immigrating into and emigrating from the UK, in YE December 2022, are given here. The results from our resampling with replacement from the IPS are presented in Table 8.

Table 8: Statistics from 10,000 simulations for year ending December 2022 British migration
Estimate	Mean	Sample size	Standard deviation	2.5th percentile	25th percentile	50th percentile	75th percentile	97.5th percentile
Immigration	88,000	116	9,800	69,400	81,200	87,800	94,700	107,600
Work	19,000	33	3,600	12,300	16,400	18,800	21,300	26,500
Study	3,600	6	1,900	700	2,200	3,400	4,800	7,800
Other	65,400	77	9,000	48,400	59,300	65,200	71,400	83,400
Emigration	91,900	188	8,400	76,100	86,100	91,600	97,300	109,300

The method estimates that immigration is between 69,400 and 107,600 for 95% of the simulation results, with a median of 87,800. "Other" is the main reason for immigration by British nationals, and the estimate is between 48,400 and 83,400 for 95% of the simulation results, with a median of 65,200. Emigration is between 76,100 and 109,300 for 95% of the simulation results, with a median of 91,600.

5. Conclusion

Showing uncertainty in estimates is essential to improving the interpretation of statistics and to bring clarity to users about what the statistics can and cannot be used for. This is stated in a report by the Office for Statistics Regulation (OSR), Approaches to communicating uncertainty in the statistical system (PDF, 605KB). Providing uncertainty measures can enable users to make informed decisions based on the level of the uncertainty associated with the estimates. The purpose of this research was to explore how a simulation-based method can be applied to help quantify uncertainty in international migration estimates, with a focus on some of the sources of uncertainty in the statistical system. Our work on measuring uncertainty in international migration estimates, when completed, will be important for providing users with quality indicators around the estimates. Our research is still progressing, and not all sources of uncertainty are included in this paper, with some assumptions being made in the approach we have outlined.

6. Future developments

Our research is partial, and further work is needed to develop more comprehensive measures of uncertainty for international migration estimates.

Bias in estimates from administrative data

We plan further work to quantify the uncertainty stemming from the main data sources used to create administrative-based migration estimates (ABMEs), including Home Office Border Systems Data and the Registration and Population Interaction Database (RAPID). More information on these important data sources, including coverage issues that result in bias, can be found in the Home Office statistics on exit checks: user guide and our Methods for measuring international migration using RAPID administrative data methodology.

Model and parameter uncertainty

To fill the gap between the end date of RAPID data and the date to which migration estimates are published, the Denton-Cholette method is used to predict EU nationals' migration. This method relies on input data, model selection and parameter settings. Future research will be conducted to incorporate data uncertainty, model uncertainty, and parameter uncertainty associated with temporal disaggregation of RAPID data for EU national estimates.

Extension of current work on the uncertainty of adjustments

Previous feedback has suggested that more work could be done investigating the impact of the assumed distributions around which our simulation-based uncertainty calculations are constructed. The work in this paper presents uncertainty based on an assumption that the distribution underlying the data is normal, and previous work has been done to compare this with a uniform distribution assumption. However, further sensitivity analyses demonstrating the effect of variations in the assumed distribution on the uncertainty estimates are ongoing.
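
A sensitivity check of this kind can be sketched as follows. This is a minimal illustration and not the analysis carried out for this paper: the mean, standard deviation, and the choice to give the uniform distribution the same mean and standard deviation as the normal are all assumptions made for the example.

```python
import math
import random
import statistics


def interval_95(draws):
    """Return the 2.5th and 97.5th percentiles of the draws."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], cuts[-1]


rng = random.Random(42)
mean, sd, n_sims = 1_000.0, 100.0, 10_000  # illustrative values only

normal_draws = [rng.gauss(mean, sd) for _ in range(n_sims)]

# A uniform distribution on [mean - h, mean + h] has standard deviation
# h / sqrt(3), so h = sd * sqrt(3) gives it the same spread as the normal.
half_width = sd * math.sqrt(3)
uniform_draws = [
    rng.uniform(mean - half_width, mean + half_width) for _ in range(n_sims)
]

normal_low, normal_high = interval_95(normal_draws)
uniform_low, uniform_high = interval_95(uniform_draws)
print(f"normal 95% interval width:  {normal_high - normal_low:.0f}")
print(f"uniform 95% interval width: {uniform_high - uniform_low:.0f}")
```

Under these assumptions the uniform interval is narrower (roughly 1.65 standard deviations either side of the mean, compared with roughly 1.96 for the normal), because the uniform distribution has no tails. A different way of matching the two distributions would give a different comparison.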

Comprehensive composite measure of uncertainty

We presented in this paper a composite measure of uncertainty for the adjustments and modelling steps used to estimate the migration of EU nationals, as well as an overall measure of uncertainty in the migration of British nationals. Our ambition in the future is to provide a similar composite measure of uncertainty for migration of non-EU nationals and for migration as a whole (all nationalities combined). This would take into account any bias stemming from administrative data (particularly from RAPID and Home Office data) in addition to the simulation-based variance described in this paper.

Further work will be undertaken to consider the type of uncertainty intervals that are produced with international migration estimates. We will consider, for example, empirical uncertainty intervals, bias-adjusted uncertainty intervals, and empirically centered uncertainty intervals.

7. Glossary

Administrative data

Collections of data maintained for administrative reasons, for example, registrations, transactions, or record keeping. They are used for operational purposes and their statistical use is secondary. These sources are typically managed by other government bodies.

EU

EU is the sum of EU14, EU8, and EU2, plus Malta, Cyprus and Croatia (from 1 July 2013). British nationals are excluded from these numbers.

Home Office Borders and immigration data

Combines data from different administrative sources to link an individual's travel in or out of the UK with their immigration history. This system has data for all non-European Economic Area (non-EEA) visa holders.

International Passenger Survey (IPS)

Our International Passenger Survey (IPS) collects information about passengers entering and leaving the UK and has been running continuously since 1961. The IPS was resumed in January 2021, after being suspended since March 2020 because of the coronavirus (COVID-19) pandemic. Currently, we use it for our British national estimates and for providing information on reason for migration.

Non-EU

Non-EU is the sum of the rest of the world, including the rest of Europe. British nationals are excluded from these numbers.

"Other" reason for migration

Non-EU

For non-EU migrants, the reason for migration is based on their visa type. "Other" reason includes people who immigrated into the UK under visas classified as:

admin
visit
other
settlement
protection
those that did not fit into any of our designated classifications

Registration and Population Interaction Database (RAPID)

Registration and Population Interaction Database (RAPID) is a database created by the Department for Work and Pensions (DWP). It provides a single coherent view of interactions across the breadth of benefits and earnings datasets for anyone with a National Insurance number (NINo).

8. Feedback and acknowledgements

We are keen to receive feedback and observations on our measuring uncertainty work, both from those who find it useful and from those who think it needs further thought and refinement. Please contact us at demographic.methods@ons.gov.uk with any comments.

We are grateful to colleagues within the Office for National Statistics for their support, knowledge, and feedback on the topic.

9. Related links

How we are improving population and migration statistics
Article | Released 15 November 2021
Methods and notes on the ongoing transformation of migration statistics.

Review of Migration Statistics produced by the Office for National Statistics
Article | Released 1 March 2022
Summary of recommendations for improvement of migration statistics from the Office for Statistics Regulation.

Long-term international migration, provisional: year ending December 2022
Bulletin | Released 25 May 2023
Experimental and provisional estimates of UK international migration, 2018 to 2022. Covers the period since coronavirus (COVID-19) travel restrictions eased.

Methods to produce provisional long-term international migration estimates
Methodology | Released 25 May 2023
An explanation of the methods used to produce the latest provisional experimental statistics on migration flows into and out of the UK.

10. Cite this working paper

Office for National Statistics (ONS), published 01 June 2023, ONS website, methodology, Measuring uncertainty in international migration estimates

11. Appendix

Our simulation-based approach requires making some assumptions about the underlying probability distribution. We used probability distributions to inform the chance of an observation being selected. In this paper, we make use of two distributions.

Normal distribution

A normal, or Gaussian, distribution is symmetrical, with no skew, so its mean and median have the same value. It is a reasonable probability model for many real-world quantities. We selected a normal distribution on the assumption that the probability of an observation being resampled decreases the further it is from the mean.

In cases where the underlying data were approximately normal but needed boundary conditions, we used a truncated normal distribution. This has the same shape as a normal distribution between the boundaries (rescaled so that the total probability is 1) and zero probability outside them. This was used, for instance, to sample International Passenger Survey (IPS) counts, which in certain cases had a small mean and a large variance. With an untruncated normal distribution, some of our samples could have been negative, implying negative counts. The truncated normal distribution limits the samples to realistically possible values only (greater than zero).
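
The effect of truncation can be illustrated with a short sketch. Rejection sampling is used here because it is simple to read; it is not necessarily how the samples in this paper were generated. The function name `sample_truncated_normal` and the mean and standard deviation are hypothetical.

```python
import random
import statistics


def sample_truncated_normal(rng, mean, sd, lower=0.0, upper=float("inf")):
    """Draw one value from a normal(mean, sd) restricted to (lower, upper).

    Rejection sampling: draw from the untruncated normal and discard any
    value that falls outside the boundaries.
    """
    while True:
        value = rng.gauss(mean, sd)
        if lower < value < upper:
            return value


rng = random.Random(7)
# Hypothetical count with a small mean and a large spread (illustrative only).
draws = [sample_truncated_normal(rng, mean=5.0, sd=10.0) for _ in range(10_000)]
print(f"smallest draw: {min(draws):.3f}")
print(f"mean of draws: {statistics.fmean(draws):.2f}")
```

All draws are greater than zero. Discarding the negative values also pulls the mean of the draws above the mean of the untruncated distribution, which is worth bearing in mind when interpreting intervals built this way.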

Binomial distribution

Some events have only two possible outcomes. For example, a coin toss can only come up heads or tails. A binomial distribution is appropriate in these circumstances. When measuring uncertainty in international migration estimates, some sources of uncertainty have only two possible outcomes. For example, a person on a long-term visa (12 or more months) either becomes or does not become an early leaver (someone who leaves before 12 months). In these cases, our estimation of uncertainty has been informed by applying a binomial distribution.
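
The early-leaver example can be sketched as repeated binomial draws. The number of visa holders and the early-leaver probability below are hypothetical values chosen for illustration, and the function name `simulate_early_leavers` is an assumption; none of them come from the paper.

```python
import random
import statistics


def simulate_early_leavers(rng, n_visa_holders, p_early_leaver):
    """One binomial draw: count the visa holders who leave early.

    Each person is an independent two-outcome trial with the same probability.
    """
    return sum(rng.random() < p_early_leaver for _ in range(n_visa_holders))


rng = random.Random(11)
n_visa_holders = 1_000  # hypothetical
p_early_leaver = 0.1    # hypothetical

simulated_counts = [
    simulate_early_leavers(rng, n_visa_holders, p_early_leaver)
    for _ in range(2_000)
]
cuts = statistics.quantiles(simulated_counts, n=40, method="inclusive")
print(f"mean early leavers: {statistics.fmean(simulated_counts):.1f}")
print(f"95% of simulations between {cuts[0]:.0f} and {cuts[-1]:.0f}")
```

The spread of the simulated counts gives an interval for the number of early leavers, in the same way that the percentile columns in the tables summarise the simulation results.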
