1. Introduction

In this report, you will find information about the administrative data sources that have been used by the Office for National Statistics (ONS) within the Census and Data collection Transformation Programme (CDCTP). Administrative data are collected by government and other organisations primarily for administrative (not research or statistical) purposes, such as registration, transaction and record keeping, usually for the provision of public services. 

The sources are important for research that will deliver improvements to official statistics that are used for the public good. They support the core principles of the UK Statistics Authority on delivering a revised and comprehensive system of population and migration statistics.  

The research covers various topics of significant importance to users, including: 

  • population and migration statistics 

  • population sub-groups and characteristics  

  • households and living arrangements  

  • housing and housing characteristics  

  • longitudinal analysis and outcomes  

Further information about our research on these topics and the administrative data used is provided on our Research outputs using administrative data page and in our Population and migration statistics transformation in England and Wales, population characteristics update: 2023 article. To enable this research, administrative data are acquired from other organisations outside of the ONS. This involves us understanding the needs of our users, the public, and governing bodies so that we can find the best data to use for future statistical development. 

Data have been acquired in line with our Data acquisition policy and Data ethics policy to assist with our research into transforming our Census 2021 and data collection activities. 

The data are subject to robust controls to ensure that individuals cannot be identified. The ONS does not share or disclose any personal information. Furthermore, the ONS complies with all data protection legislation, including the General Data Protection Regulation, the Data Protection Act 2018, the Statistics and Registration Act 2007 and the Digital Economy Act. Further information, including the ONS's privacy statement and data protection policy, can be found on our Data protection page.

2. Overview of administrative data sources

The sections that follow provide an overview of the administrative data sources, including information about why a source has been used in our research and its importance. Data sources have been chosen based on their coverage of the population, how well they capture the required attributes of the population, and their quality. The sources are important for statistics that are used to ensure there are the right services and associated infrastructure to support the current and future population. This includes health, education, employment, housing, transport, retail, and recreation services. The statistics are also essential for understanding and addressing inequalities across regions and groups of the population. 

3. Health data sources

Birth and death registrations, and birth notifications  

The Local Registration Service, in partnership with the General Register Office (GRO), records all births and deaths in England and Wales. The data are collected under the Births and Deaths Registration Act 1874, under which parents have a legal duty to register a birth within 42 days. 

Births are also recorded by a midwife or doctor, which is generally done soon after a baby is born (a birth notification). These data are timelier than birth registrations and include other information, such as ethnicity. 

Births and deaths data are essential for research into administrative-based population estimates, as they account for population change because of natural causes. The data are also used for ethnicity statistics.  

Further information about the births and deaths datasets and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Hospital Episode Statistics and the Emergency Care Dataset 

The Hospital Episode Statistics (HES) and the Emergency Care Dataset (ECDS) record attendances, appointments, and admissions to NHS Hospitals in England. An extract of the data (excluding information about health) is used along with the Personal Demographics Service to ensure our administrative-based population estimates adequately capture the resident population of England through the population's interactions with health services. The data also include information about characteristics of the population, including ethnicity, so are an important source for administrative-based ethnicity statistics. 

Further information about the HES and ECDS and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Patient Episode Database for Wales and Emergency Department Data Set Wales  

Patient Episode Database for Wales (PEDW) and the Emergency Department Data Set (EDDS) include information on attendances, appointments, and admissions to NHS Hospitals in Wales. In common with HES and ECDS, the datasets are important for both population and ethnicity statistics, providing coverage for Wales. 

Improving Access to Psychological Therapies 

Improving Access to Psychological Therapies (IAPT) data, from the programme also known as NHS Talking Therapies, contain information about the population that has accessed NHS-commissioned adult psychological therapies and services in England. Data on ethnicity from IAPT are used in combination with other sources to produce admin-based ethnicity statistics. 

Further information about IAPT data and their quality is provided in our Producing admin-based ethnicity statistics for England: methods, data and quality article.  

Personal Demographics Service 

The Personal Demographics Service (PDS) contains demographic data for those who have interacted with an NHS service in England, Wales, and the Isle of Man, including through GP practices and hospital visits. PDS data have been used since 2016. Before then, the Patient Register (PR) was used for GP registrations. 

The PDS provides information on the resident population in England and Wales through people's interaction with NHS services. Despite known time lags and some known coverage limitations (which we apply statistical methods to account for), it is one of the most important sources for our administrative-based population estimates. It is also a long-established source for capturing population moves between local authorities and across the countries of the UK for our existing population National Statistics.  

Further information, including about the quality of PDS, is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

More information about health data sources that are used by the Office for National Statistics (ONS), along with ONS's health data policy, is available on our Sources of data page. The page also includes information about the use of the data for health statistics, including statistics on the impact of COVID-19.

4. Housing data sources 

Valuation Office Agency property attributes data 

The Valuation Office Agency (VOA) is an executive agency, sponsored by HM Revenue and Customs (HMRC). Since the 1990s, it has been responsible for banding dwellings liable for Council Tax (CT) in England and Wales. To fulfil this function, the VOA collects data on the attributes of residential properties. These include property type, number of rooms and floor area, which are used to produce administrative-based statistics on accommodation type and overcrowding. 

The ONS has applied the Quality Assurance of Administrative Data (QAAD) Toolkit to the VOA property attribute data. The summary of this assessment can be found in our Valuation Office Agency property attribute data: quality assurance of administrative data used in Census 2021 methodology. The data were also used as part of Census 2021 to provide information on number of rooms. Details about the data quality are provided in our Administrative data used in Census 2021, England and Wales methodology.

Local authority supplied Council Tax data 

Each local authority (LA) in England and Wales is responsible for the collection of Council Tax (CT), a yearly charge for all domestic properties. CT data include information about exemptions, discounts, and premiums applied to certain types of properties at a dwelling level.  

CT data provide important information about population change at local level, including by type of household. The data were used in our Quality assurance of Census 2021 and are also important for our future research into admin-based population estimates.   

Further information about CT data and their quality is provided in our Administrative data used in Census 2021, England and Wales methodology.

Energy Performance Certificate 

The Energy Performance of Buildings Register holds all Energy Performance Certificates (EPCs) for England and Wales. EPCs are valid for 10 years and published on the Department for Levelling Up, Housing and Communities (DLUHC) website. EPCs indicate the energy efficiency of a building to prospective tenants or buyers, with the intention to improve it. EPCs also include information on floor area and number of rooms in a property. 

EPC data will be used for administrative-based statistics on housing and energy efficiency, including statistics on overcrowding. The data were also used as part of Census 2021 quality assurance, to validate the statistics on central heating type and accommodation type. Details about the quality of EPC data are provided in our Administrative data sources used in Census 2021, England and Wales methodology.

Tenancy Deposit Protection Scheme  

Tenancy Deposit Protection Scheme (TDPS) data are provided by the DLUHC. The DLUHC receives tenancy deposit agreement data from government-approved schemes to fulfil its legislative role in providing protection of tenancy deposits and to dispense a dispute resolution service. 

TDPS covers tenancy agreements in the private rental sector in England and Wales. The TDPS data are being used alongside Zero Deposit data and DLUHC Continuous Recording of Lettings and Sales in Social Housing in England (CORE) data to explore the feasibility of producing sub-regional tenure estimates without a census. 

Zero Deposit 

Zero Deposit data are supplied by Zerodeposit.com, a private business offering a replacement to the traditional security deposit. The Zero Deposit scheme enables tenants in England and Wales to move into private rental properties without putting down a five-week cash deposit.  

Zero Deposit data are being used alongside TDPS data and DLUHC Continuous Recording of Lettings and Sales in Social Housing in England (CORE) data to explore the feasibility of producing sub-regional tenure estimates without a census. 

Continuous Recording of Lettings and Sales in Social Housing in England 

The CORE dataset is a national information source collected by DLUHC that records information on the characteristics of both private registered providers' and local authorities' new social housing rentals and purchases. 

CORE data are being used alongside TDPS data and Zero Deposit data to explore the feasibility of producing sub-regional tenure estimates without a census. 

Additional housing datasets 

For housing and household statistics, it is important to maintain a comprehensive list of residential addresses, which is achieved via the ONS Census Address Frame. The Address Frame was used for Census 2021 collection and covers all residential addresses in England and Wales. The Address Frame is built using several administrative and commercial data sources linked to AddressBase Premium (ABP). For further information about the Address Frame and the associated administrative sources, see our Administrative data sources used in Census 2021, England and Wales methodology.

5. Education data sources 

National Pupil Database 

The National Pupil Database (NPD) is held by the Department for Education (DfE) and includes multiple datasets split into categories of population, attainment, and post-school destinations. Students' socio-demographic characteristics are obtained from the record-level sources and linked to attainment data recorded by awarding bodies via the unique Pupil Matching Reference Number.  

The NPD is used alongside Individualised Learner Record (ILR) data and Higher Education Statistics Agency (HESA) data for research into education statistics, including highest level of qualification attained. The ONS is also working with the Welsh Government to acquire the Welsh equivalents of the NPD.  

English and Welsh schools censuses 

The English School Census and Welsh School Census include all pupils in state-funded schools in England and Wales, respectively, along with characteristics data such as a pupil's ethnicity. They are important sources for ensuring children are adequately represented in our research into admin-based population estimates and our statistics on ethnicity.  

Further information about the English School Census (ESC) and Welsh School Census (WSC) and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Individualised Learner Record 

The Individualised Learner Record (ILR) data contain information about individuals who attend training from providers in the Further Education and Skills sector in England. The ILR data are important for admin-based population estimates, as they capture those who are in further education who may be missing from other admin sources. The ILR is used alongside National Pupil Database (NPD) data and Higher Education Statistics Agency (HESA) data for research into education statistics, including highest level of qualification attained. The data include characteristics information, such as ethnicity, so are important for the ONS's research on ethnicity statistics. 

Further information about the ILR data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Lifelong Learning Wales Record 

The Lifelong Learning Wales Record (LLWR) is a collection of data on learners and the learning undertaken by them from learning providers funded directly or in part by the Welsh Government. The data cover further education institutions, other work-based learning providers and community learning provision. Apprenticeships and work-based learning are also included. 

The LLWR provides excellent coverage of students in further education, so is an important source for ensuring this population sub-group is represented in administrative-based population statistics. Information on students' characteristics, such as ethnicity, is also important for administrative-based ethnicity statistics. The LLWR will be used for research into education statistics, including highest level of qualification attained. 

Higher Education Statistics Agency Student Data 

Higher Education Statistics Agency (HESA) student data contain information about students at publicly funded higher education institutions in the United Kingdom, including international students. Students in higher education drive changes in populations at local levels, as they move to be close to their place of study. This includes the movement of international students into the UK. For this reason, the data are needed to adequately capture students in the admin-based population and migration estimates. 

HESA data are used alongside National Pupil Database (NPD) data and Individualised Learner Record (ILR) data for research into education statistics, including highest level of qualification attained. The data are also important for admin-based ethnicity, labour market status and education statistics. 

Further information about HESA data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

6. Income, tax, and benefits data sources 

Department for Work and Pensions' (DWP's) Customer Information System and Benefits and Income Datasets  

The DWP Customer Information System (CIS) contains demographic information on everyone who has a National Insurance Number (NINo) in the United Kingdom. The data include children whose parents have claimed Child Benefit, as well as individuals who require a NINo to work or receive benefits in the UK, including migrants. 

DWP's Benefits and Income Datasets (BIDs) contain information on benefits distributed by DWP (including the state pension); HMRC Pay As You Earn (PAYE), Tax Credits and Child Benefit data; and LA Housing Benefit data. 

The CIS and BIDs data are used to ensure administrative-based population statistics adequately capture the working age and pensioner population. The data are also essential for research into admin-based labour market status and income statistics, which are important for understanding inequalities down to local levels of geography. 

Further information about the CIS and BIDs datasets is provided in publications outlining the administrative data sources used for both the Statistical Population Dataset (SPD) and our Admin-based income statistics Quality and Methodology Information (QMI).

From 2023, the deliveries of Child Benefit and Tax Credits data into the ONS will be provided directly by HMRC. 

HMRC Self-Assessment data and PAYE Real Time Information data 

Self-Assessment is the system used by HMRC to collect income tax from individuals who are self-employed or have other forms of income not registered to a PAYE scheme. Tax is usually deducted automatically from wages, pensions and savings. However, individuals and businesses with other types of income must report their income in a Self-Assessment tax return. The data are used alongside data on other sources of income, such as income from employment and benefits, to produce statistics on labour market status and individual and occupied address incomes. 

HMRC's PAYE data have historically been received as part of the BIDs delivery from DWP. However, the latest and future deliveries will come directly from HMRC. HMRC also provides the ONS with data from its PAYE Real Time Information (RTI) system, which contain more detailed information from employers on earnings, tax deductions, National Insurance Contributions (NICs) and workplace pensions.  

The data around income are essential for understanding and addressing inequalities across population groups and regions in England and Wales. 

Further information about the HMRC datasets is provided in our Admin-based income statistics Quality and Methodology Information (QMI) publication. 

The Registration and Population Interaction Database 

The Registration and Population Interaction Database (RAPID) is created by the DWP to provide a single coherent view of citizens' interactions across the breadth of systems in DWP, HMRC, and LAs via Housing Benefit. 

RAPID data have proven important for research into admin-based migration estimates, particularly for migrants from the European Union. The ONS currently receives aggregate data from RAPID for use in migration statistics. 

As part of work on admin-based migration estimates, RAPID is also used alongside the Migrant Worker Scan, which is a record of non-UK nationals who have been issued with a NINo.   

Further information about RAPID can be found in our Methods for measuring international migration using RAPID administrative data methodology and our Administrative data used in Census 2021, England and Wales methodology.

7. Migration and travel data sources

Home Office Border Systems, Refugee and Asylum Seeker Data 

Home Office Border Systems Data combines visa and travel information to link an individual's travel movements into and out of the country. The data are an essential part of our research into producing migration statistics. Aggregate refugee and asylum seeker data published by the Home Office are also used to include resettled refugees and asylum applicants in the ONS's long-term international migration estimates. More detailed information on how the ONS uses these data to estimate non-EU immigration and emigration is published in our Long-term international migration, provisional: year ending December 2022 bulletin and our Methods to produce provisional long-term international migration estimates methodology.  

Further information on Home Office Border Systems Data and their quality is provided in the Home Office statistics on exit checks: user guide.

Vulnerable Persons Resettlement Scheme (VPRS) and Vulnerable Children's Resettlement Scheme (VCRS) data and Asylum Refugee Route (ARR) data 

There are two main sources of data on refugees. The first covers refugees who arrive in the UK via resettlement schemes overseen by government, and who have been granted a protection status prior to arrival. These schemes include the VPRS and VCRS. The second covers refugees who arrive here in other disorganised or irregular ways, and who subsequently apply for and are granted refugee status (ARR). 

VPRS, VCRS and ARR data are used with other sources, including the Home Office Border Systems Data, NHS Personal Demographics Service (PDS) and Census 2021 data as part of the Refugee Integration Outcomes (RIO) Cohort Study. RIO is a collaboration between the ONS and the Home Office aimed at improving the evidence base around integration outcomes for refugees in the UK.  

RIO uses data for refugees resettled in England and Wales under VPRS and VCRS between 2015 and 2020. This includes 16,350 people resettled under the schemes. Further data for resettled refugees are available in regularly published Home Office Immigration Systems statistics. 

The ARR data in RIO contain approximately 97,000 individuals who were granted asylum between 2015 and 2020 in England and Wales. This sample excludes those still awaiting a decision on their asylum claim and those who were denied asylum. The majority of the ARR population in RIO are from Iran, Eritrea, Sudan, Syria, and Afghanistan. Further data for asylum refugees are available in regularly published Home Office Immigration Systems statistics.  

The ONS and the Home Office plan to expand RIO in the future to incorporate more recent cohorts of refugees and link to a wider range of economic, health and education data. Information about RIO is provided in:  

8. Other population groups data sources

Electoral Register 

The Electoral Register (ER), sometimes called the "electoral roll", includes everyone registered to vote in the UK. The dataset contributes to our admin-based population statistics and was used to quality assure Census 2021 for England and Wales. 

Further information about the ER and its quality is provided in our Administrative data sources used in Census 2021, England and Wales methodology.

Ministry of Justice (prisoners' data) 

Annual prisoner data are supplied to the ONS by HM Prison and Probation Service (HMPPS), an executive agency within the Ministry of Justice (MoJ). The data cover all prison establishments in England and Wales, which are required to record prisoner details on the Prison National Offender Management Information System (Prison-NOMIS). The data include the length and type of sentence, which are used to determine whether to count someone as resident at the prison or at their home address. They provide a useful snapshot of the resident prison population and will be used in future admin-based population estimates, in addition to their use in current official population statistics. 

Armed Forces  

The Ministry of Defence (MOD) Armed Forces personnel data include the number of serving UK Armed Forces personnel and civilian personnel with a Defence Medical Services (DMS) registration. Personnel with a DMS registration have their primary care (GP services) provided by the MOD rather than by the NHS. The data are important for ensuring the armed forces are included in population statistics and are also used for statistics on ethnicity. 

Further information about the data and their quality is provided in our Administrative data sources used in Census 2021, England and Wales methodology.

Service Leavers 

The Ministry of Defence (MOD) Service Leavers Database (SLD) provides information on service personnel who have left the UK armed forces, irrespective of regular or reserve status and length of service. The data are sourced from legacy personnel systems and the current system, Joint Personnel Administration (JPA). We receive a subset of variables from the SLD for data back to 1975. 

The MOD has collaborated with the ONS to set up a data linkage study looking at the feasibility of producing statistics on UK armed forces veterans by linking data from the SLD and Census 2021 to our Statistical Population Dataset version 4.2 (SPD V4.2). Further information is provided in our Feasibility research on producing UK armed forces veteran statistics for England and Wales: 2021 article.

9. Data sources used to transform and carry out a successful Census 2021 

The Office for National Statistics (ONS) used additional administrative data to support a high-quality Census 2021 in England and Wales.  

Our Administrative data used in Census 2021, England and Wales methodology gives details about each data source, including its coverage, accuracy and timeliness against the needs of the census.
