Design for Census 2021

1. Overview

This article describes our end-to-end statistical design for Census 2021, which aims to ensure census results are of high quality and fit for purpose. It is part of a series of articles published on 1 October 2020 updating our design and plans for Census 2021.

These plans take into account our past experience and lessons learned from the 2019 Collection Rehearsal and feedback from local authorities on our quality assurance plans. We have adapted our plans and are planning contingencies in light of the coronavirus (COVID-19) pandemic. More information is available in our overview.

The census is the largest statistical exercise that the Office for National Statistics (ONS) undertakes, producing statistics that inform all areas of public life and underpin social and economic policy. It provides a wealth of information at small geographies to inform local planning and decision-making.

Our aim for Census 2021 is to count everyone once, in the right place, and for questions to be completed accurately. However, we recognise that this is challenging to achieve. We need to ensure that there is a methodological basis for optimising the count and then to estimate or adjust for individuals who were not counted or were counted more than once. We call this the "statistical design".

The statistical design for Census 2021 has four main phases: design and build (Section 2), monitor and counter (Section 3), process and estimate (Section 4), and outputs and disclosure control (Section 5).

We know from the rehearsal and international experience that the best way to achieve a high quality census is through high response rates from all communities. Our rehearsal showed the importance of community engagement and high quality management information to enable intervention during live operations. This monitor and counter phase is vital.

Our process and outputs phase is built upon the high quality Census Coverage Survey (CCS) and quality assurance approach. We will also have more administrative data than ever before and will utilise the best of all available sources to ensure a high quality census for all.

We set out here a range of approaches we may take, should we need to, to ensure we deliver the high quality census and population estimates that users need. We are confident that these plans enable this delivery (see Contingency planning in Section 4: Process and estimate).

The statistical design

The main measures of the quality of census statistics are the levels of accuracy (bias and variance) of the estimates of population size at both national and local levels. As for the 2011 Census, our quality objectives are to be nationally accurate as measured by a confidence interval of plus or minus 0.2%, with bias less than 0.5% for England and Wales, and to produce data that are high quality locally with 95% confidence intervals for all local authorities within plus or minus 3%.

To help achieve these, there are census response rate targets of 94% for England and Wales as a whole and no less than 80% for each local authority. Of equal importance is minimising variability in response – 90% of Lower layer Super Output Areas (LSOAs) within a local authority or hard-to-count category must have response rates that occur within 10% of the mean response rate for that local authority or "hard-to-count" group (see Maximising response in Section 2: Design and build for definition of "hard-to-count").
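As an illustration only, the response targets above could be checked for a single local authority along the following lines. This is a minimal sketch rather than a description of any ONS system: the function names and the example figures are hypothetical, and it assumes that "within 10%" means within 10 percentage points of the unweighted mean of the LSOA response rates.

```python
from statistics import mean

NATIONAL_TARGET = 94.0   # % target for England and Wales as a whole
LA_MINIMUM = 80.0        # % minimum for each local authority
SPREAD_TOLERANCE = 10.0  # assumption: interpreted as percentage points
SHARE_WITHIN = 0.90      # share of LSOAs that must fall within the tolerance


def share_within_tolerance(lsoa_rates):
    """Share of LSOAs whose response rate lies within the tolerance of the mean."""
    centre = mean(lsoa_rates)
    inside = sum(1 for rate in lsoa_rates if abs(rate - centre) <= SPREAD_TOLERANCE)
    return inside / len(lsoa_rates)


def check_local_authority(la_rate, lsoa_rates):
    """Report whether one local authority meets the minimum and variability targets."""
    return {
        "meets_minimum": la_rate >= LA_MINIMUM,
        "meets_variability": share_within_tolerance(lsoa_rates) >= SHARE_WITHIN,
    }


# Hypothetical figures: nine LSOAs close to the mean and one clear outlier.
example = check_local_authority(91.0, [95, 93, 92, 90, 94, 91, 89, 92, 93, 70])
```

In this made-up example, nine of the ten LSOAs fall within the tolerance, so the variability target is only just met. A national check would compare the overall response rate with NATIONAL_TARGET in the same way.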

We check for reliability through quality assuring the census (see Quality assurance in Section 4: Process and estimate), implementing contingencies where necessary (see Contingency planning in Section 4: Process and estimate) and seeking assurance on our methods through the independent Methodological Assurance Review Panel.

In terms of timeliness and punctuality of the census outputs (see Section 5: Outputs and disclosure control), we have set a challenging target to release the first results by end March 2022 (four months earlier than in 2011). As well as an earlier first release, the publication of the main results will be compressed into a shorter period than 2011, aiming to be complete by March 2023.

The design has four main phases: design and build (Section 2), monitor and counter (Section 3), process and estimate (Section 4), and outputs and disclosure control (Section 5).

Design and build

This covers the essential components of preparing for a census: the topics and questions for the census; who should be counted and where; the census questionnaire; identifying the households and communal establishments (CEs) to be contacted; and the strategy for reminding people to respond if they have not already done so.

Monitor and counter

Census 2021 will be online first, meaning we will be asking everyone to complete the census online, if possible. This approach will enable us to monitor response in real time and react quickly, targeting resources to maximise response. Support will be available through online help, our Census Contact Centre and Census Support Centres, where people can go to get help to complete their forms. We will send paper questionnaires to people who are unlikely, unable or reluctant to respond online, and there will be capability for telephone capture where necessary.

Process and estimate

Even with a variety of strategies to maximise response, it is inevitable that some people will not be counted or some questions not answered when they should have been. To meet the quality objectives, the data must therefore be captured, cleaned, edited and adjusted for non-response before any outputs can be produced.

Outputs and disclosure control

The benefits of the census will be realised by users having access as soon as possible to high-quality statistics, released free at the point of use. We want to maximise the data available at the lowest geographic levels and make it as easy as possible for users to access the data they need, while protecting confidentiality (disclosure control). This will include data tables, analysis, microdata, and data about commuting and internal migration flows. We plan to publish Census 2021 data via the ONS website in a combination of pre-determined tables and a web-based interactive dissemination system where users can specify the data they require.

Use of administrative data in Census 2021

We are making greater use than ever before, and far greater use than in the 2011 Census, of the administrative data collected by public and some private sector organisations. As well as supporting our response to unexpected events, these data are being used throughout to help drive up census quality, to produce new outputs by integrating census and administrative data, and to help quality assure the statistics before release. Further details are given in the following sections but in summary, we are planning to use administrative data to:

improve the quality of the Census 2021 Address Frame; we have been able to draw on more admin data sources than were available in 2011 (see Address Frame in Section 2: Design and build)
predict how and when people are likely to respond to ensure that we have the right resources available in the right places to provide help to those who need it (see Identifying those who might need help in Section 2: Design and build)
potentially support the imputation of missing date of birth data and the imputation of missing households (see Background and Processing in Section 4: Process and estimate)
help quality assure the census results (see Quality assurance in Section 4: Process and estimate)
support statistical contingencies for a range of lower than expected response scenarios (see Contingency planning in Section 4: Process and estimate)
produce integrated (combined census and administrative data) outputs (see Section 5: Outputs and disclosure control); for example, by using Valuation Office Agency (VOA) data on number of rooms (instead of asking such a question as we have done in previous censuses) and using Department for Work and Pensions (DWP) and HM Revenue and Customs (HMRC) data on income to meet user needs for small area income statistics

Keeping your data safe

Of utmost importance is the protection of personal data. All personal information collected in the census is protected and kept confidential for 100 years. We only use data to produce statistics and undertake statistical analysis and research, and there are strict penalties under legislation for unlawful disclosure. A process called disclosure control (see Section 5: Outputs and disclosure control) is used to ensure that statistical outputs provide as much value and utility as possible to users while protecting the confidentiality of information about individuals, households and organisations. Further details on our data strategy and how we will keep your data safe are available, and we have published an updated Equality Impact Assessment (PDF, 440KB).

National Statistics accreditation

The Office for National Statistics (ONS) has a responsibility to ensure the results of Census 2021 in England and Wales are accurate and adhere to the Code of Practice for Statistics set by the UK Statistics Authority.

The Code ensures that the statistics published by government serve the public. When producers of official statistics comply with the Code, it gives users of statistics and citizens confidence that published government statistics are:

of public value
high quality
produced by trustworthy people and organisations

The conduct of the entire Census 2021 operation will be assessed by the Office for Statistics Regulation (OSR), the regulatory arm of the UK Statistics Authority, against the Code of Practice for Statistics.

The most recent review of our response to OSR preliminary findings in the Census 2021 assessment process was published on 11 September. You can find out more about the different phases of the National Statistics assessment process, and our response, from the National Statistics accreditation web page.

2. Design and build

Extensive research has been undertaken to develop and evaluate methods and approaches that will optimise census responses to meet our overall quality objectives. Alongside this research, the 2017 Census Test and 2019 Census Rehearsal have provided opportunities to evaluate the effectiveness of the statistical design.

This section describes the essential components of preparing for a census: the topics and questions for the census; who should be counted and where; the census questionnaire; identifying the households and communal establishments (CEs) to be contacted (addressing); and the strategy for reminding people to respond if they have not done so.

Topics and user needs

Decisions about the inclusion or exclusion of topics on the census were made based on extensive consultation with a wide range of stakeholders as well as detailed research and rigorous testing.

We conducted a public consultation in 2015, The 2021 Census – Initial view on content for England and Wales (PDF, 3.6MB). In 2016, we published our consultation response (PDF, 796KB), which detailed the scoring mechanism we used to evaluate users' data needs, provided the rationale for our decisions, and outlined our proposals and research plans.

Our evidence-based recommendations for Census 2021 topics were further developed through meetings and correspondence with stakeholder organisations, consultation with subject experts, feedback at census events and survey responses on specific topics. We also sought advice from and worked closely with Census Advisory Groups (CAGs) (which represent the interests of local authorities, central government departments, academics, third-sector bodies, business and professional bodies) and topic experts from Welsh Government, National Records of Scotland (NRS) and the Northern Ireland Statistics and Research Agency (NISRA).

We published information on our research and testing, including two census topic research updates: 2021 Census topic research: December 2017 and 2021 Census topic research update: December 2018. These articles focused on decisions around which topics to include rather than the exact wording of the questions. For information regarding the approach taken to topic choice and consultation in Scotland and Northern Ireland, please see the NRS and NISRA census websites respectively.

The 2018 government White Paper, Help Shape Our Future: The 2021 Census of Population and Housing in England and Wales, detailed the topics the Office for National Statistics (ONS) recommended for inclusion in Census 2021. Most topics are the same as in 2011. However, three new topics are included: armed forces veterans, sexual orientation and gender identity. Questions regarding year last worked and the number of rooms in the accommodation, which were included in the 2011 Census, are no longer included.

Our web page dedicated to our question development work contains links to a range of detailed topic and question development publications, for further information.

Following the publication of the White Paper, we continued our programme of research and testing for the recommended question designs for Census 2021 (see the next subsection). Final approval on our topic and question recommendations was given by Parliament through the passing of the necessary secondary legislation.

Population bases

One of the main aspects in designing a census is the definition of who exactly should be counted and where. Respondents to the consultation described in the previous subsection stated a need for the enumeration base to remain the same for Census 2021 as it was in the 2011 Census; that is, the main outputs will continue to be on a resident basis. Further information is available.

Question and questionnaire design

We have conducted extensive stakeholder engagement, research and testing to inform the design of the questions and the questionnaires for Census 2021 in England and Wales. This has included working with the NRS and NISRA in seeking to harmonise questions across the UK censuses where possible while recognising that each country has its own user and respondent needs.

Each question's development was unique, based on our findings at each stage. However, all testing followed a basic structure, beginning with engagement with data users to understand their requirements, followed by a programme of qualitative, quantitative and user experience (UX) testing.

The same data will be collected on both the electronic and paper questionnaires. However, we have optimised the question designs separately for each version of the questionnaire to ensure that we collect the best quality data.

Separately, as noted in Topics and user needs at the beginning of this section, we have published a full description of the approach taken to question and questionnaire development including:

the design process of the online and paper census questionnaires
how we ordered the census questions
how we approached the design of the questions in Welsh
the criteria we used to evaluate the final question designs

Throughout the question development, an important consideration was ensuring that, on all identity questions, respondents were clear that they could identify as they wish and were not constrained by the response options provided. This has primarily been achieved by making the option to write in an alternative answer clear and by developing search-as-you-type functionality for the online questionnaire.

The 2019 Census Rehearsal, while not designed primarily to test questions, was used to test near final question designs. Feedback from respondents, and rehearsal data, were used to help finalise the questions.

While the exact questions and response options have been finalised, we are considering whether additional guidance may be needed to help respondents interpret the questions in different scenarios, for example, on students' term-time address.

Update March 2021: Further details on the enumeration of students, including revised and enhanced guidance, are available as an addendum at Census 2021 - How we are ensuring an accurate estimate of students.

We will share more information in due course and look forward to engaging with our users to discuss the guidance that will be needed in interpreting the outputs, for example, in measuring change over time.

Address Frame

Overview and quality targets

An accurate list of addresses is central to the statistical design of Census 2021 to ensure high quality statistics. "Invitation to respond" letters will be sent to all residential households, and we will monitor return rates based on those addresses (see the next subsection and Section 3: Monitor and counter). As a result of the inevitable and continuous changes in residential addressing, we have developed processes to maximise the accuracy of the residential household frame used as the basis for inviting households to participate in Census 2021. Additional arrangements will be in place for people who live in CEs and those in Special Population Groups (SPGs): CEs are managed residential accommodation such as hotels, care homes, student halls of residence and prisons, and SPGs are residential accommodation that are counted in the same way as other households but where additional arrangements are needed, such as caravan sites and family housing on military bases.

We have developed quality targets for addresses that may be missed in the frame (the list used as the basis for making initial contact), those that should no longer be included and those that are misclassified. There are also targets to ensure we minimise concentration of undercoverage and overcoverage in any given local authority.

Undercoverage:

Identifies 99.25% of residential addresses (no more than 0.75% undercoverage).
No more than 2% undercoverage in any local authority.
Identifies 100% of CEs with over 50 bed spaces.

Overcoverage:

Less than 1% overall.
Wrongly classify no more than 0.3% of addresses.
Includes no more than 0.3% duplicates.
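
The address frame targets listed above lend themselves to a simple threshold check. The following is a minimal sketch for illustration, not ONS tooling: the FrameMetrics class, its field names and the example figures are hypothetical, and only the threshold values come from the lists above.

```python
from dataclasses import dataclass


@dataclass
class FrameMetrics:
    """Hypothetical summary of an address frame evaluation (all values in %)."""
    undercoverage_pct: float           # residential addresses missed overall
    worst_la_undercoverage_pct: float  # highest undercoverage in any local authority
    large_ce_identified_pct: float     # CEs with over 50 bed spaces identified
    overcoverage_pct: float            # addresses that should not be on the frame
    misclassified_pct: float           # wrongly classified addresses
    duplicate_pct: float               # duplicate addresses


def failed_targets(metrics):
    """Return the names of the quality targets that the frame does not meet."""
    checks = {
        "overall undercoverage": metrics.undercoverage_pct <= 0.75,
        "local authority undercoverage": metrics.worst_la_undercoverage_pct <= 2.0,
        "large CEs identified": metrics.large_ce_identified_pct >= 100.0,
        "overall overcoverage": metrics.overcoverage_pct < 1.0,
        "misclassification": metrics.misclassified_pct <= 0.3,
        "duplicates": metrics.duplicate_pct <= 0.3,
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up figures: every target is met except the duplicates target.
failures = failed_targets(FrameMetrics(0.5, 1.8, 100.0, 0.9, 0.2, 0.4))
```

Returning the names of failed targets, rather than a single pass or fail flag, makes it clear which part of the frame would need further work.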

Census Address Frame

The frame is based on AddressBase Premium (ABP), which is widely used across both the public and private sector and is continually updated by Geoplace. It uses Local Land and Property Gazetteers (LLPGs) in conjunction with a range of address intelligence sources such as from the Valuation Office Agency (VOA), Royal Mail and Ordnance Survey. As updated versions of the product are released every six weeks, Geoplace have developed processes for understanding and assuring the quality of the address information.

We supplement AddressBase with administrative and commercial sources to create the frame required for Census 2021. For CEs, we use other sources to supplement AddressBase with information and to carry out checks to identify any missed addresses. Supplementary sources include:

Cushman and Wakefield (Student Halls)
Care Quality Commission and Care Inspectorate Wales (Care Homes)
Ministry of Defence and US Armed Forces (Armed Forces Bases)
Ministry of Justice (Prisons)
Edubase (Boarding Schools)
Ministry of Communities and Local Government Survey of Traveller Sites

The initial frame is based on a copy of AddressBase taken in summer 2020 to give time to print 26 million initial contact letters. A further copy will be taken later in 2020 to account for any new addresses and to flag addresses that should no longer be included on the frame.

A final assessment of new addresses will be made in Quarter 1 (Jan to Mar) 2021. This will include evaluation of AddressBase to identify any new addresses. Census field officers will visit such addresses to make initial contact where required.

Anyone who does not receive an initial contact card (see next subsection) will be encouraged to inform the ONS (either online or through the contact centre) that their address has been missed. They will then be provided with the means to complete a questionnaire.

Demonstrating the quality of the frame

Research has been undertaken to understand and correct for missing addresses. Further information about this work will be published in due course.

An evaluation of missed addresses was carried out through an assessment of addresses in other administrative sources that have not been used in constructing the frame. The sources used included the NHS Patient Demographic Service (PDS) and the English and Welsh School Censuses (PDF, 2.94MB). This work found no significant clustering of missed addresses.

As a result of the coronavirus pandemic, it was not possible to carry out the planned field address check in summer 2020 involving field staff visiting uncertain addresses to confirm their status. Instead, the planned clerical address check was increased in size and scope, covering a sample of addresses and CEs. This involved a range of online searches to make an assessment of whether there was evidence of addresses being active. This was supplemented by an exercise linking the frame to up-to-date sources such as Council Tax information. By clerically checking in this way, the ONS was able to check more addresses than initially planned by the field address check. The process has also led to the development of linkage tools, which can be used to improve the quality of the frame up to Census Day and for subsequent use (such as in the sampling frame for ONS social surveys).

Clerical checking of addresses of CEs was also undertaken to ensure accurate classification of CE type and number of bed spaces. This work also included identifying room-level address information within halls of residence. This information had already been provided by some local authorities within AddressBase. The 2011 Census demonstrated that obtaining responses could be particularly challenging in student halls of residence. Monitoring response by individual room should enable the ONS to track overall response.

As a consequence of the coronavirus pandemic, it was not possible to collect room-level addresses for care homes as initially planned. No further clerical work was undertaken given the high response rates achieved for care homes in 2011 and because direct contact could not be made.

Maximising response

Maximising response as well as minimising variability in response to the census is vital to meeting the quality targets (see The statistical design in Section 1: Overview). Everyone therefore needs to be aware of their responsibility to take part in the census and be able to do so. Census 2021 in England and Wales will be the most inclusive ever, with tailored support available in many different forms to ensure that everyone can take part. A respondent-centric approach will ensure that it is as easy as possible for people to respond however, and wherever, they wish to.

A digital-first census

For the first time, the default mode of completion for Census 2021 will be via an online questionnaire. Households will be sent a letter with an access code unique to their address, enabling them to complete a census questionnaire online. People will have the option of responding using a mobile device, tablet, desktop or laptop computer. There will be online help available and assistance will be available via Census 2021 Support Centres (at a variety of locations such as libraries, community centres, youth groups or social housing communal areas) for those who wish to complete the census online but may need help to do so.

While we will encourage those who can complete online to do so, we are aware that not everyone will have the skills or capability to do this. Paper questionnaires will therefore be available to ensure that there are no barriers to completion for those who cannot, or choose not to, complete their census online. Paper questionnaires may be completed and posted back, but they will also include an access code to enable online response should the recipient wish to do so.

There are four different routes for the public in residential addresses receiving paper questionnaires:

paper as first contact: in some areas of England and Wales, where the take-up of the online option is likely to be relatively low, the initial contact will be a paper questionnaire, delivered by post
paper questionnaire as a reminder: in some other areas, where the initial contact letter includes only a unique access code for online completion but we expect lower levels of response, a paper questionnaire will be posted out at a later date to non-responding households as a reminder to respond
on request via contact centre or website: anyone who wants to complete a paper questionnaire will be able to request one at any time either via the website or by calling our helpline
handed out by census officers: census field officers will visit all addresses where no response has been received to remind people to respond; at this point, field officers will also be able to supply paper questionnaires on the doorstep if requested

Those living in CEs will receive either a letter with an access code or a paper questionnaire, also with an access code, dependent on type of establishment and the likely propensity to respond online.

Telephone capture will also be available as a mode of completion for those who need it. The initial contact letter provides a phone number for anyone to call if they need help. At this point, people may be offered telephone capture if deemed appropriate.

Evidence from the 2019 Collection Rehearsal has shown that our strategy (based on administrative data for online take-up of services) for when and how to provide paper questionnaires generated the expected levels of response and has given us confidence that we will meet our quality targets. Of those who responded to the rehearsal, 82% responded online. However, as the rehearsal was voluntary, those who may have had difficulties in responding online may have been less likely to take part. Most addresses in the rehearsal areas were also urban, so the online share of responses in the rehearsal is expected to be higher than for England and Wales as a whole.

Helping everyone to respond

Providing the right support and ensuring that everyone is aware of what they need to do is vital to ensuring that we meet our response targets and to achieving accessibility and meeting equality requirements (PDF, 660KB). Respondent needs are, therefore, at the heart of our design. Around three weeks before Census Day, all residential addresses in England and Wales will receive a card through the post letting them know when Census Day is and to expect an "invitation to take part" soon. The initial contact letter or paper questionnaire will be delivered at least a week before Census Day and will provide the information respondents need to take part. Additional support will be available via the online help facility, the Contact Centre helpline or via Census Support Centres. Translation leaflets will be available in more than 40 languages. Interpretation services will also be provided if required.

If we have not received a census return from an address, we will send reminder letters and Census Officers will visit to encourage people to respond. The 2019 Collection Rehearsal demonstrated the effectiveness of reminder letters and led to a decision to increase their use. All non-responding households will receive at least two reminder letters. The non-response follow-up phase starts two days after Census Day and will continue for around the next six weeks.

For those living in CEs, Census Officers will make contact with the establishment manager and liaise over the best way to ensure those living there can be counted and identify any support needed.

Responding privately

Everyone's personal information is protected and kept confidential for 100 years. However, the introduction of voluntary questions about sexual orientation and gender identity has meant that we are more aware than ever before that some people may wish to keep their answers private from the rest of their household.

It may also be impractical to share an online access code for a household questionnaire with the rest of the household in certain circumstances, such as in houses of multiple occupation where residents do not know each other. Anyone may therefore request an individual questionnaire (either a paper form or a code to access an individual online questionnaire), and answers provided on an individual form will take precedence over answers provided for the same person on a household form.
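
The precedence rule described above can be sketched as a simple merge. This is an illustrative example only: it assumes that each person can be matched between forms by a hypothetical person identifier and that an individual form replaces the whole household-form record for that person, neither of which is specified here.

```python
def resolve_person_records(household_form, individual_forms):
    """Combine responses so that an individual form overrides the household form.

    Both arguments map a hypothetical person identifier to that person's answers.
    """
    resolved = dict(household_form)    # start from the household form answers
    resolved.update(individual_forms)  # individual forms take precedence
    return resolved


# Made-up example: person "p2" also submitted an individual form.
household = {
    "p1": {"age": 34, "sexual_orientation": "prefer not to say"},
    "p2": {"age": 19, "sexual_orientation": "straight"},
}
individual = {"p2": {"age": 19, "sexual_orientation": "bisexual"}}
resolved = resolve_person_records(household, individual)
```

Here the answers for "p1" come from the household form, while those for "p2" come from the individual form.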

Identifying those who might need help

To ensure that we have the right resources available in the right places to follow up all non-responding households, we need to be able to predict how and when people are likely to respond.

We have developed two indices to help this prediction. The first is a hard-to-count index (DOCX, 1.24MB) to categorise the relative likelihood of households in a small area (Lower layer Super Output Area (LSOA) level) responding without reminder letters or field visits. This is modelled on 2011 Census patterns of response, updated using administrative data sources to account for demographic changes since 2011. The second is a separate geographic index to indicate the relative propensity of households in an area to respond to Census 2021 online. This model uses Driver and Vehicle Licensing Agency (DVLA) data on the mode of driving licence application and renewals (online or paper) as a proxy for the mode of census return and includes Office of Communications (Ofcom) data on broadband take-up, age of residents and Government Office Region.

The two indices will be used together to determine where to send paper first, to determine where to send paper questionnaires as a reminder, and to help guide planning for Census Support Centres. These indices also feed into the volumetric modelling that determines how many field staff we need to recruit, and where, as well as the volumes of reminder letters and other printed material required.

Certain population groups have been identified as needing either additional support, engagement or guidance to enable them to take part in the census, for example, access to translation materials. We have worked with certain population groups to identify where we can adapt our field operations, communication campaigns and support services to maximise the inclusivity of Census 2021 as well as maximising response.

Work is ongoing to establish the best way to support those who might need help in a situation where they may be unable to leave their homes – such as those who may be self-isolating or under a COVID-19 lockdown scenario. Operational planning response to the coronavirus (COVID-19) for Census 2021, England and Wales provides more details.

3. Monitor and counter

One advantage of a predominantly online census is the rapid feedback of information during live operations. We will start processing online returns as soon as they are submitted. We will also be able to monitor live patterns of responses and, more importantly, react quickly to ensure that we maximise the effectiveness of resources and increase the likelihood of reaching response rate targets.

We have developed an approach that automates the prioritisation of non-response follow-up resources in real time. The priority order of follow-up visits can also be altered to reflect the patterns of response in the area, a critical tool for minimising variability in response rates.

Response-chasing Algorithm (RCA)

The Response-chasing Algorithm (RCA) is a decision support tool designed to enable rapid identification and prioritisation of any shortfalls in response and recommend appropriate interventions to get back on track. Expected patterns of response are produced by a Field Operations Simulation (FOS) model (DOCX, 971KB), based on assumptions including the willingness of respondents to respond without follow-up, propensity to respond online, the effectiveness of reminder letters, the contact rates of field staff and field visit durations.

The RCA compares expected and live return rates for Lower layer Super Output Areas (LSOAs) in England and Wales. Based on the extent of any shortfall or excess, it recommends appropriate interventions to return response levels to those required to meet the quality targets. Interventions are determined based on availability of resources and their effectiveness in each area and include additional reminder letters and/or additional field staff hours.
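The comparison described above can be illustrated with a minimal sketch. It is not the RCA itself: the thresholds, the `AreaStatus` type and the intervention labels are assumptions made for this example, and the real tool also weighs resource availability and the effectiveness of each intervention in each area.

```python
from dataclasses import dataclass


@dataclass
class AreaStatus:
    lsoa: str
    expected_rate: float  # expected return rate, e.g. from a simulation model
    live_rate: float      # observed return rate so far


def recommend(area: AreaStatus, minor: float = 0.02, major: float = 0.05) -> str:
    """Map the size of a shortfall to a hypothetical intervention."""
    shortfall = area.expected_rate - area.live_rate
    if shortfall >= major:
        return "additional field staff hours"
    if shortfall >= minor:
        return "additional reminder letter"
    return "no action"


# Illustrative figures only, not real census data.
areas = [
    AreaStatus("LSOA-A", 0.60, 0.59),
    AreaStatus("LSOA-B", 0.60, 0.57),
    AreaStatus("LSOA-C", 0.60, 0.50),
]
recommendations = {a.lsoa: recommend(a) for a in areas}
```

In this sketch a larger shortfall triggers a more resource-intensive intervention; areas that are on track receive none.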

Being able to analyse and visualise census response patterns during the live operations means that not only can we identify shortfalls in returns, but we can also identify the likely characteristics of non-responders and target engagement activities and digital communications towards those groups. The rapid turnaround between observed returns and interventions recommended is matched by the speed and flexibility with which such interventions can be deployed. Digital media allows messages to be rapidly adapted and tailored in response to specific events and the needs of the operation. Dedicated teams of mobile field force will be available to re-locate to wherever in the country they are needed the most. Furthermore, the availability of part-time field officer contracts will increase the flexibility of the workforce to meet demand.

The RCA was successfully tested in the 2019 Collection Rehearsal to identify areas where additional resources were required and ensured that targets were met despite field staff recruitment challenges. Areas of improvement were identified and are currently being developed for 2021, including an improved forecasting method that takes into account differences between predicted and observed behaviours.

Field Prioritisation Algorithm (FPA)

If we were to just focus on maximising overall response, we would risk increasing variability in response by increasing response rates in "easier" areas while making little progress in the "harder" areas, potentially resulting in increased variability in quality in the final outputs. The Field Prioritisation Algorithm (FPA) has been developed specifically to focus on minimising return rate variability.

The FPA works at a lower level of geography than the RCA and, unlike the RCA, does not alter existing resource levels but automatically reshuffles existing field staff within Team Leader Areas to prioritise resource to areas with the lowest levels of response. The algorithm analyses return rates at Output Area (OA) level within each Team Leader Area and prioritises the order of visits for field officers.
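A simplified sketch of this kind of prioritisation is shown below. It assumes the only input is the current return rate for each Output Area and simply orders visits from lowest to highest rate within each Team Leader Area; the function name and data layout are hypothetical, and the real FPA is likely to use more information than this.

```python
from collections import defaultdict


def prioritise_visits(return_rates):
    """Order Output Areas within each Team Leader Area, lowest return rate first.

    return_rates: iterable of (team_leader_area, output_area, return_rate).
    Returns a dict mapping each team leader area to an ordered list of OAs.
    """
    by_tla = defaultdict(list)
    for tla, oa, rate in return_rates:
        by_tla[tla].append((rate, oa))
    # Sorting the (rate, oa) tuples puts the lowest-responding OAs first.
    return {tla: [oa for _, oa in sorted(items)] for tla, items in by_tla.items()}


# Illustrative figures only.
visit_order = prioritise_visits([
    ("TLA-1", "OA-001", 0.72),
    ("TLA-1", "OA-002", 0.48),
    ("TLA-1", "OA-003", 0.61),
    ("TLA-2", "OA-101", 0.80),
    ("TLA-2", "OA-102", 0.55),
])
```

Note that no staff are added or removed: the same team is pointed at the areas that are furthest behind.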

A comprehensive, flexible and intelligent approach – changes since the 2011 Census

Both the RCA and FPA provide a significant improvement from 2011, when data available during live operations were less timely and detailed. They represent an evolution in streamlining local decision-making in response to poor levels of response or high response variability. The RCA and FPA use the live data feeds to provide quick and intelligent response to problem areas and to reallocate existing resources. Combined with a more flexible operation, and aided by interactive management information dashboards enabling rapid access to live data, the statistical design for the Census 2021 data collection makes the most of the technological advances over the last decade to improve the efficiency of the operation and ensure that we have the resources where we need them most.

4. Process and estimate

Background

For Census 2021, we are building on the statistical processing and estimation used in previous censuses in England and Wales. Our aim for Census 2021 is to count everyone once, in the right place, but we recognise that this is challenging to achieve. Since 2001, we have made adjustments to all census outputs to account for estimated non-response using information from a follow-up census coverage survey. This means the statistics represent our best estimate of the whole population and its characteristics, not just those who responded.

We have also developed additional contingency plans for a range of scenarios, for example, a lower than expected response (as experienced in New Zealand in 2018), the coronavirus (COVID-19) pandemic and other unforeseen circumstances.

The next subsection describes the processing of collected census data – coding, data cleaning, editing for obvious errors and processes to estimate (impute) where answers to questions are missing. The Coverage estimation and adjustment subsection follows and describes the statistical processes to estimate and adjust for overall non-response error. The Quality assurance subsection describes the quality assurance processes, and the Contingency planning subsection discusses our contingency plans for lower than expected response.

Processing

Coding

This is the process by which we convert the collected questionnaires into an efficient machine-readable format and assign codes to variables based on how the respondent answered a question. These coding rules will also specify how to assign a code for more complex scenarios such as coding missing data for subsequent use in imputation (see later in this section); where two or more responses have been received for a question but only one is required; and where both a box has been ticked and a write-in response given.

Responses to text-based "write-in" boxes will be coded by automatically comparing the written text against a pre-defined index. If a match is assessed to be sufficiently close (and unique), then a numerical code is assigned. The development of search-as-you-type functionality (see Question and questionnaire design in Section 2: Design and build) for the online questionnaire makes this relatively straightforward in most cases. Answers that are deemed uncoded from a first pass are further assessed to determine if they can be coded with updates to the indexes or using parsing strategies, a process by which words are broken down to then determine if a match to the coding index can be made.
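The idea of accepting a match only when it is "sufficiently close (and unique)" can be sketched as follows. This is an illustration using Python's standard `difflib` module, not the matching method used for the census; the index entries, the numerical codes and the 0.85 similarity cut-off are all assumptions.

```python
import difflib

# Hypothetical coding index: write-in text -> numerical code.
INDEX = {"france": 101, "germany": 102, "poland": 103}


def code_write_in(text, index=INDEX, cutoff=0.85):
    """Return a code for a write-in answer, or None if it stays uncoded."""
    cleaned = " ".join(text.lower().split())
    if cleaned in index:
        return index[cleaned]  # exact match after basic cleaning
    # Ask for up to two candidates so that ambiguity can be detected.
    matches = difflib.get_close_matches(cleaned, list(index), n=2, cutoff=cutoff)
    if len(matches) == 1:
        return index[matches[0]]  # close and unique
    return None  # no match, or more than one plausible match


coded = [code_write_in(t) for t in ["  FRANCE ", "Germny", "Atlantis"]]
```

Answers returned as `None` here correspond to the "uncoded" answers that would go on to further assessment.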

Write-in responses (particularly on paper forms) can be error prone because of spelling mistakes, unexpected characters, or new terms that have not been encountered before. Checks will be undertaken on samples of write-in responses (such as country of birth and language) to ensure that responses have been coded accurately and consistently. Quality targets, taking account of the complexity of the question response, have been set to achieve at least that achieved in the 2011 Census (see Section 5 of the 2011 Census General Report).

Cleaning, edit and imputation

Once collected, census data records are passed through a validation and cleaning process. This involves removing invalid records and responses, removing duplicates, and imputing responses to mandatory questions where they have not been completed by a respondent. These are standard data-cleaning processes used in the production of most official statistics.

Online validation has been built into the online questionnaire, for example, to prompt the respondent to check for apparent errors like dates of birth before 1900. This will help to improve overall data quality and reduce the time taken to clean data compared with previous censuses. Our online-first approach will reduce the number of responses by paper questionnaire, which will also reduce associated scanning errors and the need for clerical capture. This will increase the quality of data capture overall.

The main stages involved in cleaning, editing and imputation are: removing "false persons" (RFP); resolving multiple responses (RMR); rules-based edits; and edit and imputation.

Removing "false persons" (RFP)

Some questionnaires are completed but contain so little information it is difficult to determine whether the questionnaire is a genuine response. In the absence of some core census variables, these questionnaires are difficult to process and risk creating overcoverage by including false records. Building on analyses of the 2001 and 2011 Census data, it has been possible to confirm which combinations of variables are most often present on a genuine response.

RFP is a process that checks the data for this combination. For a person record to be counted as a genuine response, the following information must be present on the record:

name and date of birth or
one of name or date of birth and one of sex or marital status

If a person record does not meet these requirements, it is considered to be a "false person" (DOCX, 506KB) and flagged as such. These records are not included in the outputs.
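The rule above can be written as a short check. The field names in this sketch are hypothetical, and an empty or absent value is treated as "not present".

```python
def is_genuine_person(record):
    """Apply the stated rule: name and date of birth, or one of those two
    plus one of sex or marital status."""
    def present(field):
        return bool(record.get(field))

    name, dob = present("name"), present("date_of_birth")
    sex, marital = present("sex"), present("marital_status")
    if name and dob:
        return True
    return (name or dob) and (sex or marital)


# Illustrative records only.
records = [
    {"name": "A. Example", "date_of_birth": "1980-01-01"},
    {"name": "B. Example", "sex": "F"},
    {"sex": "M", "marital_status": "single"},
    {"name": "C. Example"},
]
flags = [is_genuine_person(r) for r in records]
```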

Resolving multiple responses (RMR)

It is possible to receive multiple responses for the same address, either through design or error. For example, a legitimate reason for receiving multiple responses could be if the household reference person (HRP) included all residents but one of them did not want to disclose some information so also submitted a private individual form. An example of an error is if one member of the household submits a response on paper for the whole household and another independently does it for the whole household online. This is a form of overcoverage.

RMR is a process that seeks to resolve these duplicate and conflicting responses to end up with a single response for each person, household or communal establishment (CE) at an address. It uses a series of rules to construct a single record. It also assigns persons captured on an individual form or continuation form to a household or CE, ensuring that after RMR, every person has an assigned household or CE. It only deals with multiple responses from the same location; a separate method assesses remaining overcoverage as detailed in the next subsection.

Rules-based edits

Rules are used to identify and determine the correct response to an individual question (for example, qualifications) where appropriate. This is only applied where there is a high degree of certainty about what the correct response should be, for example, a form issued to a prison that is returned with the establishment type missing. A further set of "filter rules" are used in association with the questionnaire's "skip pattern", where some respondents are not required to respond to all questions. This is to correct "errors" such as children aged under 16 years answering labour market questions and is more likely to occur on paper returns as the online version has in-built "filter rules". Again, this is only applied where there is a high degree of certainty about what the correct response should be.
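As an illustration of a filter rule, the sketch below blanks labour market answers given for a child aged under 16 years. The field names and the `NOT_APPLICABLE` marker are assumptions for this example and do not reflect the actual census edit specification.

```python
NOT_APPLICABLE = "not applicable"
# Hypothetical names for questions that only apply from age 16.
LABOUR_MARKET_FIELDS = ("economic_activity", "occupation")


def apply_age_filter(record, min_age=16):
    """Return a copy of the record with out-of-scope answers set to not applicable."""
    edited = dict(record)
    age = edited.get("age")
    if age is not None and age < min_age:
        for field in LABOUR_MARKET_FIELDS:
            edited[field] = NOT_APPLICABLE
    return edited


child = {"age": 9, "economic_activity": "employed", "occupation": "teacher"}
adult = {"age": 34, "economic_activity": "employed", "occupation": "teacher"}
edited_child = apply_age_filter(child)
edited_adult = apply_age_filter(adult)
```

The rule only fires when age is known, which reflects the principle of editing only where the correct response is clear.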

Edit and imputation

This process identifies missing, invalid or inconsistent data, and where necessary it imputes a value based on the likely response. This results in a dataset with no gaps or not stated answers, apart from the voluntary questions on religion, sexual orientation and gender identity.

As in previous censuses since 1981, we use a donor-based, minimum-change imputation strategy based on the original methodology first published by Fellegi and Holt in 1976. This is widely recognised as a methodological standard for imputing census and social survey data. Inconsistencies are identified by a set of pre-defined edit rules specifying invalid relationships between variables and identifying how they could be resolved causing the minimum amount of change to the observed data.

Missing values are replaced by drawing an observed value from another record in the data, referred to as a donor. A donor is selected from a small pool of potential donors with characteristics similar to the record currently being imputed. Similarity is measured by comparing the differences between the record needing imputation and each potential donor across a set of demographic and other predictive matching variables. We use the Canadian Census Edit and Imputation System (CANCEIS) to do this.
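The donor selection described above can be sketched in a few lines. This is a generic nearest-neighbour donor illustration, not CANCEIS: the distance measure (a simple count of mismatched matching variables), the pool size and all field names are assumptions.

```python
import random


def impute_from_donor(recipient, candidates, target, match_vars, pool_size=3, rng=None):
    """Fill recipient[target] with the value from a similar donor record."""
    rng = rng or random.Random(0)
    # Only records with an observed value can act as donors.
    donors = [c for c in candidates if c.get(target) is not None]
    if not donors:
        raise ValueError("no donor has an observed value for " + target)

    def distance(donor):
        # Number of matching variables on which donor and recipient differ.
        return sum(recipient[v] != donor[v] for v in match_vars)

    pool = sorted(donors, key=distance)[:pool_size]  # the closest few donors
    donor = rng.choice(pool)                         # random draw from the pool
    imputed = dict(recipient)
    imputed[target] = donor[target]
    return imputed


# Illustrative records only.
recipient = {"age_band": "30-39", "sex": "F", "region": "Wales", "tenure": None}
candidates = [
    {"age_band": "30-39", "sex": "F", "region": "Wales", "tenure": "rented"},
    {"age_band": "60-69", "sex": "M", "region": "London", "tenure": "owned"},
    {"age_band": "30-39", "sex": "F", "region": "Wales", "tenure": None},
]
imputed = impute_from_donor(
    recipient, candidates, "tenure", ["age_band", "sex", "region"], pool_size=1
)
```

Drawing at random from a small pool, rather than always taking the single closest record, avoids the same donor being reused too often.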

CANCEIS will also be used in the imputation part of the coverage estimation and adjustment process described in the next subsection.

We are considering using administrative data from the NHS Personal Demographic Service (patients registered with an NHS GP) as an "auxiliary variable" to improve imputation of age (where an individual has not completed date of birth). Our research concluded that this is the best way of utilising administrative data for imputation of missing answers in Census 2021.

Coverage estimation and adjustment

Coverage estimation

The census aims to capture the entire population, but even with a variety of strategies to maximise response (see Maximising response in Section 2: Design and build) it is inevitable that some people will not be counted. When a member of the population is not counted, this is called undercoverage. Overcoverage, on the other hand, occurs when people are counted more than once or in the wrong location. Both are forms of coverage error.

Coverage estimation processes estimate the extent of undercoverage and overcoverage errors in the census data. This allows us to provide population size estimates with higher accuracy than by simply using raw census counts. In the 2001 and 2011 Censuses of England and Wales, the coverage-adjusted population size estimates were the main published census outputs.

The undercoverage and overcoverage errors vary substantially by demographic characteristics (such as age and sex) and geography. In the 2011 Census of England and Wales, we estimated there to be 6% undercoverage and 0.6% overcoverage.

To estimate these levels of undercoverage and overcoverage, we use an independent re-count known as the Census Coverage Survey (CCS). The CCS consists of short interviews with every household in a random sample of unit postcodes. The CCS interviewers identify all the households in the sampled postcodes and then interview them. They do not use the Census Address Frame. The sample is stratified by local authority and the hard-to-count index described in Maximising response in Section 2: Design and build. The total sample size will be approximately 350,000 households.

CCS data will be linked to the corresponding census data where available, which allows us to identify undercoverage and overcoverage in those areas. From this, we create statistical models (principally via a process known as capture and recapture (YouTube) or dual system estimation (PDF, 817KB)) to provide population size estimates for both sampled and non-sampled areas.
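In its simplest textbook form, dual system estimation multiplies the two counts and divides by the number of people found in both. The sketch below shows that basic calculation only; it assumes the two counts are independent and perfectly linked, uses made-up numbers, and omits the modelling used to extend estimates to non-sampled areas.

```python
def dual_system_estimate(census_count, ccs_count, matched_count):
    """Basic capture-recapture estimate of the true population size.

    census_count:  people counted by the census in the sampled area
    ccs_count:     people counted by the coverage survey in the same area
    matched_count: people found in both sources
    """
    if matched_count <= 0:
        raise ValueError("at least one matched person is needed")
    return census_count * ccs_count / matched_count


# Illustrative figures only: the survey finds 500 people, 450 of whom
# were also in the census, so census coverage is estimated at 90%.
estimate = dual_system_estimate(census_count=900, ccs_count=500, matched_count=450)
missed = estimate - 900
```

Here the estimated population is 1,000, implying that around 100 people were missed by the census in this hypothetical area.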

In 2011, the estimation approach was to measure undercoverage and overcoverage independently for around 100 areas by the hard-to-count stratum. This was because of the paper processing schedule and the time taken to process. For Census 2021, we will be using a modified version of the previous approach using a (mixed effects) logistic regression-based method that can be applied to larger areas. It will use information from across the whole of England and Wales at once, allowing a higher precision of estimates while still taking into account local differences. This means the estimates can be produced more quickly as there is just one (albeit more complex) model to fit and assess rather than 100 simpler models.

Confidence intervals are calculated at national and local authority level as part of the coverage estimation process. These are the confidence intervals that will be reported against the quality criteria outlined in The statistical design in Section 1: Overview.

Coverage adjustment

The purpose of the census coverage adjustment process is to statistically "amend" the household and person census database so that it is consistent with population estimates derived from the coverage estimation process. By adding households and people to the census database so that it agrees with the coverage estimates, it will account for people and households that are missed by the census. This means that robust census population outputs can be obtained for lower-level geographic areas (such as census Output Areas (OAs)).

The 2021 adjustment strategy has been designed to address the practical difficulties that were experienced during implementation of the 2011 methods and to make best use of the new strategy and outputs from the coverage estimation. It assumes that census population estimates are provided from coverage estimation for a variety of basic demographic characteristics for both households and persons in local authorities. For 2021, to improve accuracy, the adjustment process will be extended to use local authority level population estimates rather than constraining to estimates at higher geographic levels as in 2011.

The population estimates "inform" the adjustment process as to how many persons and households are estimated to have been missed, and so they need to be added to the census database. Since 2011, we have developed a new algorithm (Combinatorial Optimisation method – a synthetic micro-simulation technique) to better optimise the selection of donor households to be used for this purpose.

In 2011, the adjustment first added persons who were estimated to have been missed into counted census households and then added whole households and the people within them. We have reviewed the 2011 approach to seek to improve it. As a result, for 2021, to improve quality while reducing complexity, we will only impute whole households and the persons within them. We will be closely monitoring the impact on the household size distribution and, if necessary, use established methods to impute missing people into counted households. This revised approach has addressed the problems experienced in 2011, providing significant gains in quality (DOCX, 65.3KB).

Imputed households will be placed into a geographic location by assigning postcodes from areas that they would have likely been missed from, using a mixture of census operations and administrative data to inform this where appropriate.

The adjustment process also imputes persons into CEs. The process follows the same stages as for households, except CEs are adjusted separately according to size and persons are added into the CE they were selected from. More detail on the coverage estimation and coverage adjustment methodologies is available in papers presented to the Methodological Assurance Review Panel.

Quality assurance

The census results will undergo extensive quality assurance before being published. This work will be designed to:

ensure the census results provide a reliable basis for decision-making
give data users confidence that the census results are fit for purpose
allow the release of census results as soon as possible
leave a legacy of methods, tools and skills for the quality assurance of future statistics from a transformed population statistics system

Our approach to quality assurance of the Census 2021 data builds on the approach and lessons learned from the 2011 Census as described in the 2011 Census Quality Assurance Methodology Evaluation report.

The main lessons from 2011 included the importance of engaging users and understanding their perspectives in the development of quality assurance methods and output content, the benefits of using quality assurance panels to provide challenge to the results of the quality assurance work, and the need to understand how the census processes affected the data and each other.

For 2021, we have split the quality assurance into two parts: the assurance of processes, which looks at how the data have been collected and processed, and validation of the census estimates, which considers the accuracy of the census results.

The first extends the assurance conducted within each part of the census operation to ensure that processes have worked as designed. It looks to identify where errors in the data may have been introduced and whether they will have an impact on either the process or the outputs. Examples of this include "respondent error", where the information provided on a census questionnaire is not correct (for example, unexpected combinations of responses), and inconsistent coding of responses (for example, in how a write-in answer for ethnic group has been coded).

The validation of estimates will include both the census population estimates and the census estimates for all other census topics. As we want to give data users confidence that the census results are fit for purpose, we want to "see quality assurance through the eyes of a user". We want to understand what data sources users are likely to check the census results against and what information would provide reassurance that the census results can be relied on for decision-making. To help achieve that, we have established a working group with selected local authorities to provide feedback on our plans and to suggest where these should be extended. More information is available in the Annex.

As in 2011, our validation of the census population estimates will involve demographic analysis (for example, looking at the sex and age structure of populations within each local authority area) and comparisons with other administrative and survey sources. These include using Council Tax data as indicators of occupied addresses, Higher Education Statistics Agency data (PDF, 350KB) to understand the location and age structure of student populations, and data on recipients of pensions as a check on results for population at older ages. We are investigating how social changes resulting from the coronavirus pandemic might affect those data sources or comparisons based on them.

We will take advantage of the increased availability of these other sources within the ONS and work closely with teams developing transformed population statistics to understand the strengths and weaknesses of these sources as measures of the population and to develop, as far as possible, standard approaches and tools to quality assure population statistics.

The latest data will be available to the quality assurance team each day from the start of the data collection period. Carrying out initial quality assurance in parallel to data collection and processing will allow anomalies in the data to be identified and investigated as soon as possible and reduce the time needed for the final quality assurance of data before the published outputs are produced.

In quality assuring the census results for the full range of topics, we will compare results with those from the 2011 Census and from other available sources and make use of expertise across the Government Statistical Service. This is to ensure that topic experts on subjects such as demography, housing, labour market and health are assessing the stories shown by the provisional census results in the context of other evidence and trends.

The evidence from the quality assurance investigations will be presented for assessment to quality assurance panels of experts from across the ONS. Where we identify an issue that needs to be addressed, this will go through a formal governance process to decide the appropriate action.

More detail on how we intend to develop and conduct our quality assurance of the census results is available in Approach and processes for assuring the quality of the 2021 Census data.

Contingency planning

The overall coverage strategy has been developed to produce robust estimates on the assumption that census (and CCS) collection response quality targets will be met. We know from previous census experience both here and abroad, and from the coronavirus pandemic, that we need to be prepared for the unexpected. We are therefore developing a coverage contingency strategy that is able to be integrated into the processing and estimation phases, if needed. The strategy makes greater use of administrative data than the standard design as set out earlier and is based on a number of possible scenarios.

The scenarios include:

expected individual questionnaires not returned (as found in the 2018 New Zealand census)
localised census count issues (as found in the 2016 Canadian Census with Fort McMurray)
broader census count issues (such as missing the overall census quality targets)
population subgroup issues (such as higher than expected non-response for a particular ethnic group or community)

If these scenarios occur, there is likely to be an impact on the time taken to process estimates and to produce final estimates.

Our contingency plans include how administrative data could be used and how we could adapt our methods if needed.

To make use of administrative data, work has been undertaken to understand and match a range of sources. This has been carried out alongside existing research into admin-based population estimates (ABPEs). Preparation has included the development of both person- and address-centred linkage approaches, which will have the flexibility to be used in a range of scenarios.

Methodological research has been based around adaptations to the existing statistical design. Research has included how alternative coverage estimation approaches could be used such as synthetic estimation (based on existing models) and weighting class approaches, which could be applied at national or local levels. The ONS will seek advice from the Methodological Assurance Review Panel (MARP) to help us decide which contingencies are most appropriate. Some early thinking on the impact that COVID-19 could have is reflected in the panel papers. Detailed methodological papers presented to MARP will be published after this assurance.

5. Outputs and disclosure control

Outputs

The main benefits of the census will be realised by users having access as soon as possible to high-quality statistics, released free at the point of use.

We want to maximise the data available at the lowest geographic levels and make it as easy as possible for users to access the data they need, while protecting confidentiality. This will include data tables, analysis, microdata, and data about commuting and internal migration flows.

Changes since the 2011 Census

Since the 2011 Census, we have listened to users' views and plan to publish Census 2021 data via the Office for National Statistics (ONS) website in a combination of pre-determined tables and a web-based interactive dissemination system where users can specify the data required. This means Census 2021 data will be:

flexible – the system allows users to build their own tables selecting the geography, population base and variables they require
timely – we are aiming to disseminate national and local authority level estimates for England and Wales within 12 months of Census Day and all other main data within 24 months of Census Day
accessible – we will aim to host the system on the ONS website, meaning the majority of census data will be available from one location and follow Government Digital Service guidelines on accessibility

User consultation

From 28 February to 23 May 2018, we invited feedback on our Initial View on Census 2021 Output Content Design. This consultation outlined our initial proposed design of Census 2021 outputs and the dissemination channels for England and Wales.

The overall design was supported by users. Our full response to the consultation was published on 18 December 2018.

We have since continued to develop all the products we outlined in the consultation. We have been focusing our recent research and testing on each of the following areas.

First, a dissemination system. To support the release of Census 2021 data, we are working on different approaches to provide analysis, commentary and visualisations. We want to make our census data and analysis as engaging and accessible to as wide a range of audiences as possible.

We are continuing to develop a flexible dissemination system that will enable users to find and access Census 2021 data and create their own datasets by selecting the geography, population base and variables they require.

Second, integrated outputs. We plan to produce integrated (combined census and administrative data) outputs by using Valuation Office Agency (VOA) data on number of rooms instead of asking a specific question as we have done in previous censuses. We published a blog on 31 July 2020, Census 2021 – For the first time the ONS is using administrative data to count number of rooms, which provides an update on progress.

We are planning to use other administrative data on income (using Department for Work and Pensions (DWP) and HM Revenue and Customs (HMRC) data) alongside census data. These additional components depend on having timely access to underlying data sources and overcoming the complexity in combining data from different sources. Further information on income and other administrative data-related research can be found on the population characteristics web page of the ONS website.

Third, geography. Response to the 2018 consultation provided insight into user interest and the importance of different geographies. There is continued interest in census outputs for wards (divisions in Wales) and parishes, and we understand the importance of these geographies for users. We are currently exploring different approaches to producing these data to better meet the needs of users in cases where Output Areas (OAs) (the basic geographic building bricks) do not align well with these geographies.

A geography consultation is planned for autumn 2020, which will provide an opportunity for users to comment on any aspect of the policy and the proposed products and services prior to finalisation of plans for the dissemination of census outputs and geography products and services.

Fourth, consistent and accessible metadata. The responses to the 2018 consultation clearly indicated the importance of providing metadata (for example, supporting documentation about the quality of outputs and definitions used) that are consistent and easily accessible to all users.

We are currently analysing the results of a user survey that aims to help us understand users' preferences when searching for census data, metadata and any other supporting information. This will help inform the best ways to provide metadata in terms of location, quantity and detail.

Future research and user testing is planned to understand how we will provide metadata through the flexible dissemination system and when using an application programming interface (API) and Geographic Information System (GIS) software.

Updates on this work, and our approach to providing metadata, will be available on our website.

Fifth, origin-destination data. We still plan to release these origin-destination data earlier than we did in 2011 and are looking at the best way to disseminate this information to users. More information on these data is available on the origin-destination web page.

Sixth, anonymised microdata. We will continue to provide access to anonymised microdata samples for users following the Census 2021 and widely promote their benefits. We will also ensure the samples do not present a statistical disclosure risk. Future updates on microdata will be published on the microdata web page.

Seventh, a longitudinal study. The ONS Longitudinal Study will be extended by adding data from Census 2021 for England and Wales to those records already included from the 1971 to 2011 censuses.

We are continuing to research the best possible arrangements for access to microdata samples, origin-destination products and the ONS Longitudinal Study.

Eighth, UK harmonisation of outputs. To support harmonisation activities and coordinate engagement with users of UK data, we meet regularly with National Records of Scotland (NRS), Northern Ireland Statistics and Research Agency (NISRA) and Welsh Government. NRS will be undertaking the census in Scotland in 2022. We are working closely with NRS and NISRA to ensure we continue to produce the UK-wide population statistics necessary and will continue to engage with users through a new UK Data User Working Group (see Section 7: Further information and engagement). UK population estimates for mid-2021 will still be delivered, but they will be based on the 2021 Censuses in England and Wales and Northern Ireland and "rolled-forward" from the 2011 Census in Scotland.

We will also ensure that harmonisation information is more easily accessible to users when outputs are made available. Updates on UK harmonisation can be found on the harmonisation web page.

Further consultation

We plan to consult on the timing of output products and then make an indicative release schedule available as part of our Census Outputs Prospectus in autumn 2021.

Disclosure control

The ONS has a legal obligation, under the Statistics and Registration Service Act 2007 and the General Data Protection Regulation (in force since May 2018), to protect the confidentiality of census data. Statistical disclosure control methods ensure that statistical outputs provide as much value and utility as possible to users while protecting the confidentiality of information about individuals, households and organisations.

Census 2021 statistical disclosure control

For Census 2021, we are building on the methods of statistical disclosure control used for 2011 Census outputs. These methods were well received by users, but there were some limitations on table designs to maintain confidentiality.

The main targeted record swapping method is similar to the approach applied to 2011 Census data. Individuals and households were assessed for risk based on their uniqueness (on a small number of characteristics). Pairs of households in different areas were swapped to add uncertainty to data viewed at a low geography. The individuals or households with rare or unique characteristics were more likely to be swapped, though every household had a chance of being swapped. Swaps were made between "similar" households, based on a different set of characteristics, to minimise the impact on utility.
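As an illustration only, the following Python sketch shows the general idea of targeted record swapping described above. It is not the method applied to census data: the field names (`area`, `size`, `ethnic_group`), the swap rates and the choice of matching characteristics are all assumptions made for the example.

```python
import random
from collections import Counter


def targeted_swap(households, risk_keys, match_keys,
                  base_rate=0.02, risky_rate=0.5, seed=0):
    """Return copies of the household records with some area codes swapped.

    base_rate and risky_rate are hypothetical swap probabilities.
    """
    rng = random.Random(seed)
    out = [dict(h) for h in households]  # never modify the input records

    def risk_profile(h):
        return (h["area"],) + tuple(h[k] for k in risk_keys)

    # A household is treated as risky if it is unique within its area
    # on the small set of characteristics in risk_keys.
    counts = Counter(risk_profile(h) for h in households)
    swapped = set()
    for i, h in enumerate(households):
        if i in swapped:
            continue
        risky = counts[risk_profile(h)] == 1
        rate = risky_rate if risky else base_rate
        if rng.random() >= rate:
            continue
        # Partner must be unswapped, in a different area, and "similar"
        # on the separate set of characteristics in match_keys.
        candidates = [
            j for j, c in enumerate(households)
            if j != i and j not in swapped
            and c["area"] != h["area"]
            and all(c[k] == h[k] for k in match_keys)
        ]
        if not candidates:
            continue
        j = rng.choice(candidates)
        out[i]["area"] = households[j]["area"]
        out[j]["area"] = households[i]["area"]
        swapped.update((i, j))
    return out
```

Because each swap exchanges the area codes of two households that match on `match_keys`, the number of households in each area is unchanged, which is one way of limiting the impact on utility.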

For users to access the data they need, we are developing a website application to enable users to design their own data requests. While record swapping is the main protection for outputs in 2021, the targeting has been updated to protect against risks in the flexible dissemination system, and the cell-key method will protect against "differencing" attacks, which are more common when a more flexible system produces a greater number of tables. The combined methods will ensure that apparent information on individuals is uncertain: it may be true, or it may be the result of the disclosure control methods. Adding this uncertainty does not significantly damage the utility of the data.
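The cell-key method can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the parameters used for Census 2021: the record-key range (`KEY_RANGE`) and the perturbation rule in `perturbation` are hypothetical stand-ins for a real perturbation table.

```python
import random

KEY_RANGE = 256  # assumed range for record keys and cell keys


def assign_record_keys(n, seed=0):
    """Give each of n records a fixed pseudo-random key."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_RANGE) for _ in range(n)]


def perturbation(count, cell_key):
    """Hypothetical perturbation rule standing in for a lookup table."""
    if count == 0:
        return 0  # empty cells are left unchanged in this sketch
    if cell_key < 32:
        return -1
    if cell_key >= KEY_RANGE - 32:
        return 1
    return 0


def perturbed_count(record_keys, members):
    """Perturb a cell count using the keys of the records in the cell.

    members is a list or set of record indices contributing to the cell.
    """
    count = len(members)
    cell_key = sum(record_keys[i] for i in members) % KEY_RANGE
    return count + perturbation(count, cell_key)
```

The cell key depends only on which records fall in the cell, so the same group of records receives the same perturbation in every table in which it appears. Repeated requests therefore cannot be averaged to recover the true count, and a difference between two published tables no longer reveals an exact value.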

When a user defines a table they require, that table will have automated "disclosure checks" applied to determine whether the combination of variables, categories and geographies could be accessed from the flexible dissemination system. Work is ongoing to investigate the parameters of these checks, which aim to release as much data as possible without tables being too detailed or posing a disclosure risk.
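To show what an automated check of this kind might look like, here is a minimal Python sketch. The rules and thresholds (`max_cells`, `max_share_ones`, `min_mean_count`) are invented for illustration; the actual parameters are still being investigated and are not stated in this article.

```python
def passes_disclosure_checks(cell_counts, max_cells=1000,
                             max_share_ones=0.2, min_mean_count=5.0):
    """Decide whether a requested table is safe enough to release.

    All three thresholds are hypothetical.
    """
    counts = list(cell_counts)
    if not counts or len(counts) > max_cells:
        return False  # nothing to release, or table too detailed
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        return False  # no populated cells
    # Share of populated cells that contain exactly one record.
    share_ones = sum(1 for c in nonzero if c == 1) / len(nonzero)
    # Average number of records per cell, a rough measure of sparsity.
    mean_count = sum(counts) / len(counts)
    return share_ones <= max_share_ones and mean_count >= min_mean_count
```

For example, `passes_disclosure_checks([50, 40, 30, 20])` returns `True`, while a sparse table such as `[1, 1, 0, 2]` is rejected because too many of its populated cells contain a single record.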

To reach this decision, we consulted with users on our approach as part of the 2021 Census Outputs Consultation and received a positive response. We also sought external approval from the UK Census Committee (the most senior governance for the UK censuses and chaired by the National Statistician) and the UK Statistics Authority Methodological Assurance Review Panel.

More information and the alternative methods considered are detailed in a paper presented to the UK Statistics Authority Methodological Assurance Review Panel (DOCX, 256KB).

Harmonisation

The ONS, NRS and NISRA are broadly harmonised on the use of targeted record swapping and the cell-key method. However, there will be differences between the countries on some of the details within the approach, which are outlined in Statistical Disclosure Control in 2021 (DOCX, 303KB).

This approach has been shared with the Eurostat SDC Expert Working Group and adopted as the recommended approach for the 28 EU National Statistical Institutes (NSIs) for Harmonised protection of census data.

6. Leaving a legacy for the future

Census 2021 will lay a foundation for the future where we aspire to make even greater use of administrative data combined with surveys to provide ever more frequent, timely and responsive statistics and analysis to meet policy and other user needs. Census 2021 data and the systems and methods used to carry it out will provide an important legacy for the future. Census 2021 data will still have considerable value in the years following the census, and we are looking at ways we can "roll forward" that information and update it with administrative and survey data in ways that we have not done before.

The methods developed for the census on addressing, edit and imputation, and coverage estimation are similar to those required in the future. We also intend to use the disclosure control approach, the flexible table builder system and other output dissemination approaches for wider Office for National Statistics (ONS) statistical dissemination.

A recommendation from the National Statistician at the end of 2023 will set out what is needed to support this new statistical system and whether another census is needed in 2031. You can find out more about our transformation in our 2019 progress update.

7. Further information and engagement

We need support from our stakeholders to make Census 2021 a success and enable us to provide users with the census outputs that they need. From our experience of the 2011 Census, we know that local authorities' knowledge and understanding of their local areas and communities significantly contributed to the success of the census. We are working in partnership with local authorities and other stakeholder groups to ensure that we:

raise awareness and understanding of Census 2021
explain to local authorities the role they can play in participating and supporting Census 2021
build confidence and trust in Census 2021 methodology and outputs

We do this through regular scheduled meetings and on an ad hoc basis as needed, including Local Authority Operational Management Group meetings and our Census Advisory Groups (CAGs) that represent the interests of our main user communities. You can find out more about how we are working with local authorities and community groups or email our Local Authorities Partnership Team.

We explain more about how we have been engaging with users to ensure Census 2021 will serve the public in our response to the Office for Statistics Regulation (OSR) recommendations as part of the National Statistics accreditation process for Census 2021 statistics. Examples of how we are working with other users to get their feedback and input into our plans include the Local Authority Quality Assurance Working Group (LAQAWG), which helps ensure our plans and approach to quality assurance will build users' trust in our census estimates and confidence in their use (see the Annex).

We are also setting up a new UK Data User Working Group, aimed at those who would like to engage with us about developing and using UK data outputs. If you would like to register an interest in joining this group, or get involved in the development of Census 2021 output content and design, please email the Census Outputs Team.

We have also consulted the UK Statistics Authority Methodological Assurance Review Panel, chaired by Sir Bernard Silverman, to provide independent scrutiny of all aspects of the statistical design. The panel is drawn from a range of academic disciplines, and members have experience in statistical methods and are users of census, survey and administrative data.

Because of the coronavirus (COVID-19) pandemic, we were unable to hold our annual conference in July 2020, where we had planned to engage more widely with users about our plans for Census 2021. We are always interested in our users' input and feedback, and we are developing different ways to inform our users about our plans and statistical design for Census 2021, which will also provide opportunities to ask questions and help us better meet your information needs.

More information about the latest opportunities to get involved is available. You can also register for email updates about the 2021 Census and/or other areas of the Office for National Statistics (ONS) or email us at 2021census.engagement@ons.gov.uk.

8. Annex: Quality assurance approach for Census 2021

In January 2020, we published an initial proposal for the approach and processes for assuring the quality of the 2021 Census data and invited comments from users of the census results.

To help develop that initial proposal, we have been working with our local authority users to refine our approach to quality assuring the results. We are grateful to local authority staff who have contributed to this work and welcome any comments on the proposed approach via email at census.quality.assurance@ons.gov.uk.

Involving local authorities in the development of the approach to quality assuring census results builds on the approach we adopted in the 2011 Census through the Quality assurance studies project.

The Local Authority Quality Assurance Working Group (LAQAWG) has provided a useful forum to review our Census 2021 quality assurance plans and better understand the actions that local authorities will take to understand the quality of our census estimates for their areas. We plan to publish an update to our initial proposed approach to quality assuring Census 2021 results later in 2020.

Our approach

While there is a wide range of users of census results, we recognise that local authority users have a unique perspective, not only as users of the data but also as experts on their local population. The information we initially sought from authorities could be summarised as three important questions:

what aspects of a local authority's population will need particular attention in our quality assurance?
what data sources will local authorities use to check their Census 2021 results when they are first published?
are there checks we should be carrying out on the Census 2021 results beyond those described in the initial proposal?

To address these questions, we established LAQAWG as a group of representative local authorities to talk through the proposed approach to quality assurance and local authority concerns about quality. We also continued with the wider engagement with local authorities, reminding them of the publication of the initial proposal and inviting comments on that. This approach combined the benefits of giving all authorities the opportunity to influence our approach with those of having detailed discussions of the proposed approach with a range of authorities.

Local Authority Quality Assurance Working Group (LAQAWG)

As it would have been impractical to hold detailed discussions with every local authority, we invited a representative selection of local authorities to form a working group to discuss the approach to quality assuring the census results. In choosing a representative group, we took into account the criteria used to select the census rehearsal areas, such as the presence of large elderly, student or migrant populations. We also ensured there was a good geographic spread of areas including at least one authority in Wales. Applying these criteria produced a list of 10 local authorities.

We held the first meeting of the group in November 2019 and held four further meetings between then and September 2020. At the suggestion of the group, we expanded its membership from the initial 10 to 19, allowing us to ensure that each region was represented. The authorities forming the group are listed later in this annex.

As well as the full meetings, we have also held individual meetings with some group members to discuss issues specific to their area. For example, Brighton and Hove were able to supply us with insight on their LGBTQ+ population, which may be useful for us when it comes to quality assuring the census estimates in that area.

Wider engagement with local authorities

In 2019, we had discussed our preliminary thoughts on quality assuring the Census 2021 results with external users at the Census Advisory Groups and the Local Authority Operational Management Group. In January 2020, we published our initial proposal for the approach and processes for assuring the quality of the 2021 Census data and invited comments from all users. We circulated this proposal to local authorities through the Census 2021 local authority newsletter in March and August 2020.

What they told us

The discussions at the LAQAWG and comments received through the other engagement described earlier are summarised in this subsection.

General

After an initial meeting, three priorities emerged:

knowledge – local authority users wanted to understand the Census 2021 design so they could be reassured that possible quality issues were being addressed
communication – authorities wanted us to be transparent about our plans for quality assurance
understanding – authorities wanted us to listen to and take on board their concerns about potential quality issues

This feedback, including the concerns expressed by authorities, was taken into account in planning our statistical design for Census 2021 and in developing the final approach to quality assurance, which is planned for publication later in 2020.

Aspects of local authorities' populations that need particular attention

Local authorities identified a range of aspects of their populations that might require particular attention. The most common concern was houses in multiple occupancy (HMOs). These are residential properties that have three or more tenants who share bathroom, living room or kitchen facilities. While authorities did not express general concern about these properties being missed, several were concerned that people living within these addresses would be more likely to be missed.

Other characteristics or populations that were mentioned by multiple local authorities were:

student populations
Gypsy, Traveller and Roma populations
other minority ethnic groups
migrants and asylum seekers
populations whose first language is not English
hotels and bed and breakfasts

Other aspects identified by individual authorities included:

less digitally engaged or able populations
rough sleepers
prisons
LGBTQ+ populations

This information is largely consistent with our previous expectations, and we expect to cover all these points in our standard assurance checks. We recognise the concern over HMOs and will investigate using local authority data on these to inform our quality assurance.

Data sources used to check Census 2021 results

As part of quality assurance activities prior to the publication of census estimates, we will check against a range of sources to demonstrate coherence. We were keen to understand the sources local authorities would compare with published census estimates for them to have confidence in coherence. The majority of the data sources identified by local authorities had already been identified by us for use in quality assurance and are listed in Annex C of our initial proposed approach to quality assurance.

The Electoral Register (PDF, 553KB) was discussed as a comparator for adult residents and occupied households, but it was derived from a source feeding into the Office for National Statistics (ONS) Address Register, so it could not provide an independent check of addresses. There was a common expectation to check Census 2021 results against Council Tax data and English and Welsh School Census data (PDF, 2.94MB), though local knowledge could be helpful in interpreting the latter.

In addition to the sources identified in the published proposal, authorities also suggested using data on communal establishment (CE) (such as care homes) bed spaces and HMOs.

We are working to increase the number of local authorities that provide us with Council Tax data and are looking to acquire data on CEs and HMOs (as noted earlier).
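A coherence check against a comparator source could be as simple as the Python sketch below. It is only an illustration: the 5% `tolerance` and the example figures are assumptions, and in practice differences need interpreting with local knowledge rather than being treated as errors.

```python
def flag_for_review(census, comparator, tolerance=0.05):
    """Return areas where the census estimate differs from a comparator
    source by more than a hypothetical relative tolerance.

    census and comparator map area names to counts.
    """
    flagged = {}
    for area, estimate in census.items():
        reference = comparator.get(area)
        if not reference:
            continue  # no usable comparator figure for this area
        relative_diff = (estimate - reference) / reference
        if abs(relative_diff) > tolerance:
            flagged[area] = relative_diff
    return flagged


# Made-up figures, for example occupied households against Council Tax data.
census_households = {"Area A": 1000, "Area B": 1200}
council_tax_households = {"Area A": 990, "Area B": 1000}
to_review = flag_for_review(census_households, council_tax_households)
```

Here only "Area B" is flagged, because its census figure is 20% above the comparator while "Area A" is within the tolerance.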

Further quality assurance local authorities suggest we should do

The working group confirmed our need to implement the checks detailed in the approach and processes for assuring the quality of the 2021 Census data. However, they also felt that we need to communicate clearly both the work taking place in readiness to quality assure and validate the numbers prior to the release of the census estimates, and the design and methodology of Census 2021.

We plan to publish an update to the initial proposed approach to quality assuring Census 2021 results later in 2020.

Local Authority Quality Assurance Working Group members

Birmingham
Blackpool
Brighton and Hove
Bristol
Cardiff
Cornwall
Dover
Enfield
Kensington and Chelsea
Leicester
Manchester
Newcastle
Nottingham
Plymouth
Scarborough
Swansea
West Suffolk

Next steps

The Local Authority Quality Assurance Working Group (LAQAWG) has provided a useful forum to review our Census 2021 quality assurance plans and better understand the actions that local authorities will take to understand the quality of our census estimates for their areas. The broad support for our planned approach is positive, and we will incorporate suggestions for further actions that could help build user confidence in our estimates where we can.
