1. Summary

Administrative data can provide information on housing characteristics. Our previous assessment of the potential uses of Valuation Office Agency (VOA) data identified that they are a primary data source to be incorporated into the design and development of Census 2021. This was acknowledged in the 2021 Census White Paper, which states that the Office for National Statistics (ONS) is committed to replacing the number of rooms question using VOA data.

Incorporating VOA data into the census requires linking the two data sources together. An imputation process is required to ensure that the quality of Census 2021 outputs is not impacted by households that are missing the number of rooms because of either a failure to link to VOA data or missing values in VOA data. This report provides an overview of research using linked 2011 Census and VOA data to ensure VOA number of rooms is suitable to undergo imputation. A more detailed paper on the methodology used has also been published.

To test the methodology ahead of Census 2021, the comparisons reported here have been made at address level by linking 2011 Census and VOA records (as at 2011) using the unique property reference number (UPRN; see note 1). Records with duplicate UPRNs, which may be indicative of more than one household at an address, have been included in these analyses. This differs from our previous publications, which removed records with duplicate UPRNs. The aim of imputation is to create a full dataset with no missing data, and removing duplicated records would mean the data may not capture certain types of households.

Our analysis demonstrates that it is feasible to predict missing VOA number of rooms values through donor-based imputation based on 2011 Census household variables. Based on this analysis, it is the ONS’ view that linked VOA number of rooms data are suitable to undergo imputation in Census 2021 for England and Wales.

Notes for: Summary

  1. A unique property reference number (UPRN) is a unique alphanumeric identifier for every address in Great Britain and can be found in Ordnance Survey’s address products.

2. Things you need to know about this release

We are transforming the way we produce population, migration and social statistics to better meet the needs of our users and to produce the best statistics from all available data. This includes the use of alternative sources to provide information on number of rooms in Census 2021 (see Section 3).

More information about our plans to do this and how we are progressing a programme of work to put administrative data at the core of population, migration and social statistics is available. More information about the Valuation Office Agency (VOA) data can be found in the source overview, and a summary of the quality assurance we have undertaken on it for Census 2021 is available. Our method for linking the two data sources is described in Section 9.

We welcome users providing feedback on these research outputs and the methodology used to produce them, including how they might be improved and potential uses of the data. Please email your feedback to admin.based.characteristics@ons.gov.uk and include “Housing” in the subject line of your response.

Imputation is the process of identifying and treating errors in data. Errors in VOA number of rooms data refer to the number of rooms value being missing, either in the original VOA data prior to linkage with the census or because of a failure to link VOA data with the census.

Crucially, these results apply to this one variable from this one administrative source. They should not be seen as a general endorsement that all linked survey-administrative data are suitable to undergo donor-based imputation.

3. Background

A Census 2021 topic consultation recommended that the census continue to collect information on the number of bedrooms, as this is used to derive measures of overcrowding and underoccupancy. Number of rooms on the census primarily meets the same information need as number of bedrooms.

The Office for National Statistics’ (ONS’) intention to reduce respondent burden by using alternative sources to provide information on number of rooms was announced in the Census 2021 White Paper, Help Shape Our Future: The 2021 Census of Population and Housing in England and Wales.

Our previous assessment of the feasibility of using Valuation Office Agency (VOA) data to replace the number of rooms question concluded that the direct agreement rate between the 2011 Census and VOA data for number of rooms was 16%. This was primarily attributable to definitional differences between the 2011 Census and VOA rooms variables. The census included kitchens, utility rooms and conservatories in its number of rooms estimates, which the VOA data do not. Since most properties have a kitchen, the number of rooms in the census data was generally higher than the corresponding number of rooms in the VOA data (see Figure 1). If we assume that the number of rooms derived using VOA data records is at least one room less than when derived using the census data, then the agreement rate increases to 48%.
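The two agreement rates described above can be illustrated with a minimal Python sketch. The data are invented toy values, the function name `agreement_rate` is hypothetical, and treating "one room fewer in VOA" as an exact difference of one is only one possible reading of the adjustment; the rule actually used in the assessment is not reproduced here.

```python
# Toy address-level pairs of (census_rooms, voa_rooms). Illustrative values only.
pairs = [(5, 4), (4, 4), (6, 5), (3, 2), (5, 3), (7, 6), (4, 3), (5, 5)]


def agreement_rate(pairs, allowed_differences):
    """Share of pairs whose (census - VOA) difference is in allowed_differences."""
    matches = sum(1 for census, voa in pairs if census - voa in allowed_differences)
    return matches / len(pairs)


# Direct agreement: both sources report the same number of rooms.
direct = agreement_rate(pairs, {0})

# Adjusted agreement (assumption): VOA reports exactly one room fewer,
# for example because the kitchen is counted by the census but not by VOA.
adjusted = agreement_rate(pairs, {1})

print(f"direct: {direct:.3f}, adjusted: {adjusted:.3f}")
```

Passing a set of allowed differences keeps the sketch flexible: `{0, 1}` would count either an exact match or a one-room gap as agreement.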

Comparatively, the quality of the census responses for number of rooms was measured by the 2011 Census Quality Survey (CQS) at 67%. The survey found that differences occurred because respondents had misunderstood the question. Most of these differences (93%) were within plus or minus one room.

This does not mean that the VOA data are of low statistical quality. Using VOA number of rooms for Census 2021 does imply a discontinuity with 2011 Census estimates (because of the definitional difference) that users need to be aware of. It will not be appropriate to measure change in number of rooms from 2011 to 2021; instead, the census bedroom question can be used for comparisons over time. Using the number of rooms in the VOA data for Census 2021 will provide a high-quality relative measure of size, enabling the comparison of households across areas within the same time period. Therefore, VOA number of rooms can be used for the derivation of the Carstairs Index and Indices of Multiple Deprivation (IMD).

This is the first time we are using administrative data linked to the census to produce a census statistical output. We need to confirm that administrative data are suitable to undergo imputation so that the quality of future census outputs is not impacted by households that are missing number of rooms.

4. Linking 2011 Census with Valuation Office Agency data

The current research linked 2011 Census data to 2016 Valuation Office Agency (VOA) data at address level using unique property reference numbers (UPRNs). Properties in the VOA data that were built after 2011 were removed prior to linkage to enable better comparison to the census data. A more detailed description of the linkage methodology can be found in Section 9.

The 2011 Census captured address information at household level. The 2011 Census defines a household as “one person living alone, or a group of people (not necessarily related) living at the same address who share cooking facilities and share a living room or sitting room or dining area”.

Most residential addresses in England and Wales are used by a single household, but we identified that 1% of households had a duplicate address on the 2011 Census, which may be because there was more than one household at an address. In contrast, the VOA data hold information on addresses, and it is not currently possible to identify multiple households at an address from address information alone. Without additional information about the residents and the relationships between them, it is difficult to tell when there are multiple households living at the same address.

For the purpose of these analyses, 2011 Census records with duplicate UPRNs have been included. As these records are not from an administrative data source, we are able to determine multiple households at these addresses. Note that these records were not linked to the VOA data and were treated the same as census records that did not link to VOA data.

2011 Census records that could not be assigned a UPRN (1.7%) have been removed in these analyses as we could not link these records to the VOA data. The 2011 Census did not use UPRNs as address identifiers at the time of capture. There are methodological differences between the 2011 Census and Census 2021 in the way UPRNs are assigned, which means the number of census records that cannot be assigned a UPRN should reduce. Our plans to address this in the future are outlined in Section 7.
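The treatment of records described in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the linkage method itself: the field names (`household_id`, `uprn`, `voa_rooms`, `duplicate_uprn`), the function `link_census_to_voa` and the toy data are all hypothetical.

```python
from collections import Counter

# Toy census household records (hypothetical field names and values).
census = [
    {"household_id": "H1", "uprn": "100"},
    {"household_id": "H2", "uprn": "200"},
    {"household_id": "H3", "uprn": "200"},  # shares a UPRN with H2
    {"household_id": "H4", "uprn": None},   # could not be assigned a UPRN
    {"household_id": "H5", "uprn": "300"},  # UPRN has no VOA record
]

# Toy VOA lookup: UPRN -> number of rooms.
voa = {"100": 4, "200": 5, "400": 3}


def link_census_to_voa(census, voa):
    """Attach VOA number of rooms to census records by UPRN."""
    # Records without a UPRN are removed, as they cannot be linked.
    with_uprn = [r for r in census if r["uprn"] is not None]
    counts = Counter(r["uprn"] for r in with_uprn)
    linked = []
    for record in with_uprn:
        is_duplicate = counts[record["uprn"]] > 1
        # Duplicate-UPRN records are kept but treated as unlinked,
        # so their number of rooms is left missing for imputation.
        rooms = None if is_duplicate else voa.get(record["uprn"])
        linked.append({**record, "duplicate_uprn": is_duplicate, "voa_rooms": rooms})
    return linked


linked = link_census_to_voa(census, voa)
for record in linked:
    print(record)
```

In this sketch only H1 receives a VOA value; H2, H3 and H5 are retained with a missing number of rooms, and H4 is dropped.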

5. Missingness and viability of using a donor-based imputation approach

Missing data in the number of rooms variable occurred where 2011 Census data did not link to Valuation Office Agency (VOA) data or number of rooms was missing on the VOA data. Imputation is required in both cases. Imputation is also required where data violate an edit rule. For example, the 2011 Census edit and imputation strategy contained the rule: “A household cannot have more bedrooms than rooms.” In these cases, the VOA number of rooms value would be removed and a new one imputed to be consistent with the census number of bedrooms value.
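As an illustration of the edit rule quoted above, the following Python sketch blanks the VOA number of rooms value when it is smaller than the census number of bedrooms, so that it can later be imputed. The function name `apply_rooms_edit_rule` and the field names are hypothetical, and this is not the edit and imputation system's own logic.

```python
def apply_rooms_edit_rule(record):
    """Return a copy of the record with voa_rooms set to None if it breaks the rule.

    Rule: a household cannot have more bedrooms than rooms.
    The census bedrooms value is never changed by this edit.
    """
    rooms = record.get("voa_rooms")
    bedrooms = record.get("bedrooms")
    if rooms is not None and bedrooms is not None and bedrooms > rooms:
        return {**record, "voa_rooms": None}
    return dict(record)


consistent = apply_rooms_edit_rule({"bedrooms": 2, "voa_rooms": 4})
violating = apply_rooms_edit_rule({"bedrooms": 3, "voa_rooms": 2})
print(consistent, violating)
```

Only the administrative value is removed, which mirrors the principle that census questionnaire responses are not edited by administrative data.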

Our analyses used a donor-based imputation method (Canadian Census Edit and Imputation System (CANCEIS)). This method was used in 2011 Census processing, and it is our intention to use it for Census 2021 processing. Records with errors in responses (“recipients”) are assigned values from other similar records that are error-free (“donors”). In these analyses, recipients were records where the number of rooms was missing. The majority of these records were 2011 Census records that did not link to VOA data.

To ensure imputation remains unbiased, donor records are selected based on having similar characteristics to the recipient records. The characteristics used for donor selection were the 2011 Census household variables (see note 1). For example, a donor record may have the same accommodation type as the recipient record and have its number of rooms value copied (imputed) into the recipient record.
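The general idea of donor-based imputation can be sketched in Python as below. This is a deliberately simplified nearest-neighbour hot-deck, not CANCEIS: the distance measure (a count of mismatched variables), the subset of matching variables, the field names and the function names are all assumptions made for illustration.

```python
import random

# Hypothetical subset of the census household variables used for matching.
MATCH_VARS = ["accommodation_type", "tenure", "bedrooms", "usual_residents"]


def distance(recipient, donor, variables=MATCH_VARS):
    """Count the matching variables on which two records differ."""
    return sum(1 for v in variables if recipient[v] != donor[v])


def impute_rooms(records, rng):
    """Fill missing voa_rooms by copying the value from a closest donor."""
    donors = [r for r in records if r["voa_rooms"] is not None]
    if not donors:
        raise ValueError("no donor records available")
    result = []
    for record in records:
        if record["voa_rooms"] is not None:
            result.append({**record, "imputed": False})
            continue
        best = min(distance(record, d) for d in donors)
        closest = [d for d in donors if distance(record, d) == best]
        donor = rng.choice(closest)  # ties are broken at random
        result.append({**record, "voa_rooms": donor["voa_rooms"], "imputed": True})
    return result


records = [
    {"accommodation_type": "flat", "tenure": "rented", "bedrooms": 1,
     "usual_residents": 1, "voa_rooms": 2},
    {"accommodation_type": "detached", "tenure": "owned", "bedrooms": 4,
     "usual_residents": 4, "voa_rooms": 7},
    {"accommodation_type": "flat", "tenure": "rented", "bedrooms": 1,
     "usual_residents": 2, "voa_rooms": None},  # recipient
]

imputed = impute_rooms(records, random.Random(0))
print(imputed[2])
```

Here the recipient differs from the first donor on one variable and from the second on all four, so it receives the first donor's value. A production system would use more refined distance measures and would check edit rules when selecting donors.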

2011 Census records that were not linked to VOA data formed a subpopulation of census records but crucially were not completely distinct from linked records. This means there were enough suitable donors available for imputation. Figure 2 is representative of how the distributions differed between the linked and unlinked records.

Missingness in the data was also considered prior to imputation. We found that the probability of a missing value can be predicted by the values of the census household responses. Therefore, a suitable unbiased imputation strategy could be designed. We discuss this in more detail in our methodology article.
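One simple way to explore whether missingness is related to an observed household variable is to tabulate the missing rate within each category of that variable. The Python sketch below does this on invented data; the function name `missing_rate_by` and the field names are hypothetical, and the analysis actually carried out is the one described in the methodology article, not this sketch.

```python
from collections import defaultdict


def missing_rate_by(records, variable):
    """Proportion of records with missing voa_rooms within each category."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        totals[record[variable]] += 1
        if record["voa_rooms"] is None:
            missing[record[variable]] += 1
    return {category: missing[category] / totals[category] for category in totals}


# Toy records (illustrative values only).
records = [
    {"accommodation_type": "flat", "voa_rooms": None},
    {"accommodation_type": "flat", "voa_rooms": 2},
    {"accommodation_type": "flat", "voa_rooms": None},
    {"accommodation_type": "flat", "voa_rooms": 3},
    {"accommodation_type": "detached", "voa_rooms": 6},
    {"accommodation_type": "detached", "voa_rooms": 7},
]

rates = missing_rate_by(records, "accommodation_type")
print(rates)
```

If the rates differ markedly between categories, that variable carries information about missingness and is a candidate for donor selection.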

We also investigated the number of properties where the number of bedrooms was equal to the number of rooms: 1.6% of records in the 2011 Census had an equal number of rooms and bedrooms, compared with 6.4% when comparing census number of bedrooms and VOA number of rooms. This is because the number of rooms in the VOA data is typically one less than in the 2011 Census. This will have implications for the continuity of outputs. We welcome user feedback on this.

Notes for: Missingness and viability of using a donor-based imputation approach

  1. The census household variables are:
    • number of usual residents
    • accommodation type
    • landlord
    • tenure
    • number of bedrooms
    • central heating
    • number of cars or vans

6. Demonstration of donor-based imputation using linked 2011 Census and VOA data

We demonstrate the viability of the approach outlined using a linked dataset, with the addition of the unlinked census records, focusing on the 10 local authorities with the highest percentages of missing data in the Valuation Office Agency (VOA) number of rooms variable after linkage (between 18.0% and 45.8%; more detail can be found in our methodology article). We chose these local authorities because they could present the most challenges to imputation. Data were imputed using a similar strategy to that detailed in the 2011 Census Item Edit and Imputation Process report (PDF, 204KB).

The current 2021 processing plan is that census questionnaire response data will be processed first, and administrative data will be linked and processed second. This retains the integrity of census questionnaire responses and ensures that they cannot be edited by administrative data during imputation.

To reproduce this, we first imputed missing census household variables and then imputed VOA number of rooms data using fully imputed census household variables to select appropriate donor values.

Given the definitional differences between the 2011 Census and VOA number of rooms variables, we did not compare the distributions after imputation. Instead, we compare the post-imputation distribution of VOA number of rooms to the pre-imputation distribution of VOA number of rooms data.

As Figure 3 shows, the VOA post-imputation distribution appears similar to the pre-imputation distribution. This is in line with what is expected in a successful imputation based on our knowledge of the causes of missingness. Overall, imputation did not have a large effect on the distribution of number of rooms, which gives us confidence in using this method of imputation in the production of statistics about number of rooms in Census 2021.
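The comparison above is visual. As a hedged illustration of how such a comparison could also be summarised numerically, the Python sketch below computes the share of records at each number of rooms before and after imputation and the total variation distance between the two distributions. The choice of this measure, the function names and the toy values are assumptions; they are not taken from the analysis reported here.

```python
from collections import Counter


def proportions(values):
    """Share of records at each number of rooms."""
    counts = Counter(values)
    total = len(values)
    return {rooms: count / total for rooms, count in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data: observed values only, then observed plus imputed values.
pre_imputation = [3, 4, 4, 5]
post_imputation = [3, 4, 4, 5, 4, 4, 4, 5]

tvd = total_variation_distance(
    proportions(pre_imputation), proportions(post_imputation)
)
print(f"total variation distance: {tvd:.3f}")
```

A value of 0 means the two distributions are identical and a value of 1 means they do not overlap at all, so small values are consistent with imputation having little effect on the overall distribution.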

7. Next steps

We aim to repeat these analyses using 2019 Census Rehearsal data linked to equivalent Valuation Office Agency (VOA) data to ensure we are operationally ready for Census 2021.

This research demonstrates a worst-case scenario. In the future, data linkage between VOA and census data should improve. This is because there are methodological differences between the 2011 Census and Census 2021 in the way unique property reference numbers (UPRNs) are assigned. Our current research removed 1.7% of 2011 Census records that could not be assigned a UPRN. Census 2021 will use an address frame that has an implicit link to VOA data for most records; therefore, the number of census records that cannot be linked to VOA data should be lower.

There were timeframe differences between the 2011 Census and the VOA data used in the current research. The VOA data were a snapshot captured in 2016 and filtered to include only addresses built before 2012, to enable a better comparison with census data. Some error in the linked data may be attributed to this difference. In the future, we will have VOA data that align with the census data collection period. This will likely reduce any discrepancies resulting from the data being captured in different years, hence reducing overall rates of error.

8. Feedback

We are keen to get feedback on this research and the methodology used, including how they might be improved and potential uses of the data. Please email your feedback to admin.based.characteristics@ons.gov.uk and include “Housing” in the subject line of your response.

10. Contact details for this methodology

Sarah Collyer
Email: admin.based.characteristics@ons.gov.uk
