Size of the population: further information

1. What Research Outputs did you publish in October 2015?

For our first Research Outputs we produced population estimates for each local authority, by 5-year age groups and sex for 2011, 2013 and 2014. We compared these estimates with official Census estimates in 2011 and official mid-year population estimates (MYEs) in 2013 and 2014. We call the methodology we used for these outputs Statistical Population Dataset (SPD) V1.0.

2. What’s new about the November 2016 Research Outputs?

This year we present an analysis of population estimates produced from a new, improved Statistical Population Dataset (SPD) V2.0 methodology. This improves the coverage of SPD estimates through 3 changes: the addition of school census records; improvements to the matching methodology; and the use of “activity” data from the NHS, the Department for Work and Pensions (DWP) and Her Majesty’s Revenue and Customs (HMRC) to assign records on the SPD to the most likely address.

This release contains population estimates for:

each local authority by single year of age and sex, 2011 and 2015 (both original and new methodologies) – additionally, this will be produced for 2013 and 2014 on the original methodology to complete the time series
each Middle and Lower Layer Super Output Area (MSOA and LSOA) by 5-year age groups and sex for 2011 and 2015 (both original and new methodologies)

The accompanying materials analyse the impact of these changes by looking at the differences in performance between SPD V1.0 and SPD V2.0 when compared with the official statistics we have already produced: the 2011 Census and the 2015 mid-year population estimates (MYEs).

3. What is a Statistical Population Dataset (SPD)?

Once individual records have been matched across data sources the information is pulled together into a single, coherent dataset that forms the basis for estimating the population. This is called a Statistical Population Dataset (SPD).

4. What is “activity” data?

“Activity” can be defined as an individual interacting with an administrative system, for example, for National Insurance or tax purposes, when claiming a benefit, attending hospital appointments or updating information on government systems in some other way. Only demographic information (such as name, date of birth and address) and dates of interaction are needed from such data sources to improve the coverage of our population estimates.

In future years we will explore the potential use of “activity” data for removing records from the Statistical Population Dataset (SPD) if there is no evidence that they are still part of the “usually resident” population. This should result in a reduction in males of working age on the SPD, where the SPD is generally higher than the official estimates.

5. What status do these statistics have? Are they official/national statistics?

These Research Outputs do not have National Statistics status and are not to be used as a substitute for census data and official mid-year population estimates. Our Research Outputs series has been developed to keep population data users up-to-date with our assessments of administrative data and, in the longer term, to show progress towards a possible future census alternative.

6. What sources do you use?

We currently use the following administrative sources to produce our Research Outputs on population estimates: the NHS Patient Register, the Department for Work and Pensions (DWP)’s Customer Information System, data from the Higher Education Statistics Agency, England and Wales school census data, and “activity data” from the NHS, DWP and Her Majesty’s Revenue and Customs (HMRC). We also use aggregate statistics for home and foreign armed forces personnel, supplied by the Ministry of Defence.

7. How are these Research Outputs produced and how was a decision made on which admin sources to use?

Our research so far is based on pseudonymously linking individual records (see question 8, “What is pseudonymisation?”) across the following data sources: the NHS Patient Register; the Department for Work and Pensions (DWP)’s Customer Information System; data from the Higher Education Statistics Agency; and the England and Wales school census from the Department for Education (DfE).

We also link “activity data” from the NHS, DWP and Her Majesty’s Revenue and Customs (HMRC) to assign records on the SPD to the most likely address. Only demographic information (such as name, date of birth and address) and dates of interaction are needed from such data sources to improve the coverage of our population estimates.
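
To illustrate how dates of interaction could help place a record at an address, here is a minimal sketch. It assumes a simple rule, that the address attached to the most recent interaction is the most likely one; this rule, the function name and the example values are hypothetical and are not taken from the SPD V2.0 methodology.

```python
from datetime import date

def most_likely_address(interactions):
    """Return the address from the most recent dated interaction, or None.

    `interactions` is a list of (interaction_date, address_id) pairs.
    Assumption: the most recent interaction gives the most likely address.
    """
    if not interactions:
        return None
    latest = max(interactions, key=lambda item: item[0])
    return latest[1]

# Hypothetical interactions for one pseudonymised record.
interactions = [
    (date(2014, 3, 1), "ADDR_A"),
    (date(2015, 6, 12), "ADDR_B"),
    (date(2013, 11, 30), "ADDR_A"),
]
chosen = most_likely_address(interactions)
```

Only dates and address identifiers appear in the sketch, in line with the point that no other information is needed from the activity sources.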

We apply a series of rules to include or exclude records from our estimate, depending on which sources individual records appear on and other indicators of residency available on the datasets. In selecting these sources we published a series of Administrative Data Source Reports that explore their suitability for producing population statistics, based on coverage, variable content, and the administrative processes underpinning data collection. This year, we have published 2 new data source overview reports on the Personal Demographic System (PDS) data from NHS Digital and on Income and Benefits data from DWP and HMRC. These are 2 new data sources that we have used to make improvements to our SPD methodology this year.
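
As a rough illustration of what a rule based on “which sources individual records appear on” might look like, the sketch below includes a linked record only if it appears on at least two sources. The threshold, the source labels and the record identifiers are all assumptions made for the example; the actual inclusion and exclusion rules are not reproduced here.

```python
# Hypothetical labels for the linked administrative sources.
SOURCES = {
    "patient_register",
    "customer_information_system",
    "higher_education",
    "school_census",
}

def include_record(sources_present, min_sources=2):
    """Return True if the record appears on enough recognised sources.

    Assumption: a fixed minimum number of sources is required; real rules
    may also depend on other residency indicators.
    """
    recognised = set(sources_present) & SOURCES
    return len(recognised) >= min_sources

# Hypothetical linked records: identifier -> sources the record was found on.
linked = {
    "id_001": {"patient_register", "customer_information_system"},
    "id_002": {"patient_register"},
    "id_003": {"school_census", "patient_register"},
}
spd_ids = sorted(pid for pid, found in linked.items() if include_record(found))
```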

8. What is pseudonymisation?

Pseudonymisation is a procedure by which identifying fields (that is, names, dates of birth and addresses) within a data record are replaced by one or more artificial identifiers to protect the privacy of individuals. Consequently, our researchers have access only to the de-identified data.
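
One common way to create such an artificial identifier is a keyed hash. The sketch below is illustrative only: it uses HMAC-SHA256 from the Python standard library, and the key, the field normalisation and the function name are assumptions rather than a description of the procedure actually applied to these data.

```python
import hashlib
import hmac

# Hypothetical key for the example; a real key would be kept away from researchers.
SECRET_KEY = b"example-key-not-a-real-secret"

def pseudonymise(name, date_of_birth, address):
    """Replace identifying fields with a single artificial identifier."""
    # Lower-case and collapse whitespace so trivial formatting differences
    # do not produce different identifiers for the same person.
    parts = [name, date_of_birth, address]
    normalised = "|".join(" ".join(part.lower().split()) for part in parts)
    digest = hmac.new(SECRET_KEY, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

record_a = pseudonymise("Jane Smith", "1980-01-31", "1 High Street")
record_b = pseudonymise("  jane  SMITH ", "1980-01-31", "1 high street")
record_c = pseudonymise("John Smith", "1980-01-31", "1 High Street")
```

Because the same inputs always give the same identifier, records for one person can still be linked across sources without the identifying fields being visible.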

9. Can individuals in these Research Outputs be identified?

Individuals cannot be identified in these outputs nor in the report accompanying the outputs. We have obligations under the Statistics and Registration Service Act (SRSA) 2007 and the Data Protection Act 1998 to protect the identity of individuals, households, businesses and their characteristics in published outputs. In addition, the Code of Practice for Official Statistics requires that our statistics do not reveal the identity of, or any private information about, an individual or organisation, even when combined with other relevant sources.

An important aspect of protecting information in tables is that there must be uncertainty whether a small count represents a true value. Uncertainty in Research Outputs arises from the quality of the administrative data and the methods used in their construction. Consequently it is not possible to be certain that a small number represents a true value. The methods used to derive numbers at a local level introduce additional uncertainty for statistics at that level.

10. What is the population they are measuring/what are the definitions used?

The population we are aiming to measure is the usually resident population, as defined by the UN. This is consistent with the target population for census and mid-year population estimates (MYEs). It includes people who reside in England and Wales for at least 12 months, regardless of their nationality, and excludes short-term migrants and visitors (United Nations, 2008).

However, administrative data are primarily collected for operational purposes and not designed to specifically capture usual residence. The methodology for the Research Outputs series has been developed with the intention of following this definition as far as the data will allow. This has been done by selecting administrative data sources that have wide population coverage and by using the inclusion, exclusion and distribution rules described above.

11. Do these data include UK residents living abroad?

No. The target population covers only people who are usually resident in England and Wales; see question 10, “What is the population they are measuring/what are the definitions used?”

12. What do these Research Outputs show?

Comparing the outputs released in October 2015 with the official 2011, 2013 and 2014 statistics showed that administrative data have potential for estimating the size of the population. The methodology used at that time (Statistical Population Dataset (SPD) V1.0) produced an estimate slightly lower than the official estimates at national level. Differences are more pronounced for certain age groups and for smaller areas. However, the administrative data-based population estimates for local authorities are fairly close to the official estimates in the majority of cases.
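
A comparison of this kind can be expressed as a percentage difference between an SPD estimate and the corresponding official estimate. The sketch below shows one such measure; the function name and the figures are invented for illustration and are not results from the Research Outputs.

```python
def percentage_difference(spd_estimate, official_estimate):
    """Return the SPD estimate's difference from the official estimate, in percent.

    A negative value means the SPD estimate is lower than the official one.
    """
    if official_estimate == 0:
        raise ValueError("official estimate must be non-zero")
    return 100.0 * (spd_estimate - official_estimate) / official_estimate

# Invented figures for a single area.
example = percentage_difference(98_000, 100_000)
```

With these invented figures the SPD estimate is 2% below the official estimate, the direction of difference described above at national level.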

Our research released in November 2016 uses our new SPD V2.0 methodology. Comparing its results with the official 2011 and 2015 estimates showed improvements in the accuracy and detail of the Research Outputs. For example, we have: used new sources of data to assign records to their most likely address, improving accuracy at the local level; improved our coverage of children aged 5 to 15; and published outputs at Middle and Lower Layer Super Output Area (MSOA and LSOA) level.

13. Why are these Research Outputs being published now? Does this mean that the mid-year population estimates (MYEs) are incorrect?

No. We are publishing Research Outputs from 2015 onwards to demonstrate our progress in assessing whether administrative data could, after 2021, provide the information on housing, households and people that is currently provided by a 10-yearly census. Our methodology for assessing administrative data will continue to develop each year. MYEs, by contrast, are produced using an established methodology and remain the official population estimates.

14. What are the main differences between these Research Outputs and the mid-year population estimates (MYEs)?

The main difference is that the Research Outputs use administrative data to estimate the population at a point in time by linking records across multiple sources. While some administrative data are used in the MYE methodology, they are primarily used to measure flows in internal migration and to distribute estimates for international migration. Another important difference is that the MYE series is rebased every 10 years using the census, whereas the Research Outputs will continue to be produced independently of census data.

15. Are you going to produce a version of the national population projections (NPPs) based on these Research Outputs?

No. The projections are based on the official mid-year population estimates, the best estimate of the population available. In addition, they require measures of changes in births, deaths and migration over time that are consistent with the estimates to set the assumptions used. The Research Outputs are not official estimates and do not provide consistent measures of these components of change. Also, NPPs are required on a consistent basis for the whole of the UK. Research Outputs do not provide this.

16. Why aren’t there any Research Outputs on the size of the population for 2012?

Not all data sources were available to ONS over the reference period in 2012. Datasets unavailable for 2012 will not be supplied in future and so estimates for 2012 will be permanently omitted from this series.

17. Can I get these Research Outputs at single year of age?

Yes. Our latest Statistical Population Dataset (SPD) V2.0 methodology produces research estimates by single year of age at local authority level, a step forward from last year’s research estimates by 5-year age group.

18. Why can’t I get single year of age data for Lower Layer Super Output Areas (LSOAs)?

At present we have not done sufficient research to understand the differences between Research Outputs and our official small area population estimates when compared at single year of age at this level of geography. We intend to explore this and make data available in the future.

19. Which geographies do these Research Outputs cover?

Our 2015 outputs covered the national, regional and local authority levels in England and Wales. Our latest 2016 Research Outputs extend this coverage to Middle and Lower Layer Super Output Area (MSOA and LSOA) level.

20. How do I provide feedback?

A short survey accompanies these Research Outputs on the size of the population. We particularly value feedback that helps improve and develop the methodology for producing census and population statistics using administrative data and identifies new sources that may help make these improvements. Although we welcome feedback at any time, it would be helpful to provide feedback on this year’s research report and outputs by 31 January 2017. We would also appreciate any insights into why the Statistical Population Dataset (SPD) figures and the small area population estimates differ, which may come from local knowledge of an area’s characteristics.
