Administrative data can provide information on housing characteristics. Our previous assessment of the potential uses of Valuation Office Agency (VOA) data identified them as a primary data source to be incorporated into the design and development of Census 2021. This was acknowledged in the 2021 Census White Paper, which states that the Office for National Statistics (ONS) is committed to replacing the number of rooms question using VOA data.
Incorporating VOA data into the census requires linking the two data sources together. An imputation process is required to ensure that the quality of Census 2021 outputs is not impacted by households that are missing the number of rooms because of either a failure to link to VOA data or missing values in VOA data. This report provides an overview of research using linked 2011 Census and VOA data to ensure VOA number of rooms is suitable to undergo imputation. A more detailed paper on the methodology used has also been published.
To test the methodology ahead of Census 2021, the comparisons reported here have been made at address level by linking 2011 Census and VOA records (as at 2011) using the unique property reference number (UPRN; see note 1). Records with duplicate UPRNs, which may be indicative of more than one household at an address, have been included in these analyses. This differs from our previous publications, which removed records with duplicate UPRNs. The aim of imputation is to create a full dataset with no missing data, and removing duplicated records would mean the data may not capture certain types of households.
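The address-level linkage described above can be sketched as follows. This is a minimal illustration only, not the linkage method used in this research: the field names (`household_id`, `uprn`, `rooms`, `voa_rooms`) and the records are hypothetical, and the sketch assumes at most one VOA record per UPRN. It shows how census records that share a UPRN are all retained, and how both sources of missingness (no link, or a missing VOA value) end up as records needing imputation.

```python
def link_by_uprn(census_records, voa_records):
    """Attach the VOA number of rooms to every census record with a matching UPRN.

    All census records are kept, including those sharing a UPRN.
    Records with no VOA match, or a missing VOA value, get voa_rooms = None.
    """
    # Assumption: one VOA record per UPRN (a later duplicate would overwrite).
    voa_rooms = {rec["uprn"]: rec.get("rooms") for rec in voa_records}
    linked = []
    for rec in census_records:
        out = dict(rec)  # copy so the input records are not modified
        out["voa_rooms"] = voa_rooms.get(rec["uprn"])
        linked.append(out)
    return linked


# Made-up records for illustration only.
census = [
    {"household_id": "A", "uprn": "100"},
    {"household_id": "B", "uprn": "100"},  # second household at the same address
    {"household_id": "C", "uprn": "200"},  # links, but VOA value is missing
    {"household_id": "D", "uprn": "300"},  # no VOA record to link to
]
voa = [{"uprn": "100", "rooms": 4}, {"uprn": "200", "rooms": None}]

linked = link_by_uprn(census, voa)
needs_imputation = [r["household_id"] for r in linked if r["voa_rooms"] is None]
print(needs_imputation)
```

In this sketch, both households at the duplicated UPRN receive the same VOA value, while households C and D are left for imputation.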
Our analysis demonstrates that it is feasible to predict missing VOA number of rooms using donor-based imputation using 2011 Census household variables. Based on this analysis, it is the ONS’ view that linked VOA number of rooms data are suitable to undergo imputation in Census 2021 for England and Wales.
Notes for: Summary
- A unique property reference number (UPRN) is a unique alphanumeric identifier for every address in Great Britain and can be found in Ordnance Survey’s address products.
We are transforming the way we produce population, migration and social statistics to better meet the needs of our users and to produce the best statistics from all available data. This includes the use of alternative sources to provide information on number of rooms in Census 2021 (see Section 3).
More information about our plans to do this and how we are progressing a programme of work to put administrative data at the core of population, migration and social statistics is available. More information about the Valuation Office Agency (VOA) data can be found in the source overview, and a summary of the quality assurance we have undertaken on it for Census 2021 is available. Our method for linking the two data sources is described in Section 9.
We welcome users providing feedback on these research outputs and the methodology used to produce them, including how they might be improved and potential uses of the data. Please email your feedback to email@example.com and include “Housing” in the subject line of your response.
Imputation is the process of identifying and treating errors in data. Errors in VOA number of rooms data refer to the number of rooms value being missing in the original VOA data prior to linkage with the census or because of failure to link VOA data with the census.
Crucially, these results apply to this one variable from this one administrative source. They should not be seen as a general endorsement that all linked survey-administrative data are suitable to undergo donor-based imputation.
A Census 2021 topic consultation recommended continuing to collect information on the number of bedrooms from the census, as this is used to derive measures of overcrowding and underoccupancy. Number of rooms on the census primarily meets the same information need as number of bedrooms.
The Office for National Statistics’ (ONS’) intention to reduce respondent burden by using alternative sources to provide information on number of rooms was announced in the Census 2021 White Paper, Help Shape Our Future: The 2021 Census of Population and Housing in England and Wales.
Our previous assessment of the feasibility of using Valuation Office Agency (VOA) data to replace the number of rooms question concluded that the direct agreement rate between the 2011 Census and VOA data for number of rooms was 16%. This was primarily attributable to definitional differences between the 2011 Census and VOA rooms variables. The census included kitchens, utility rooms and conservatories in its number of rooms estimates, which the VOA data do not. Since most properties have a kitchen, the number of rooms in the census data was generally higher than the corresponding number of rooms in the VOA data (see Figure 1). If we assume that the number of rooms derived using VOA data records is at least one room less than when derived using the census data, then the agreement rate increases to 48%.
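The two agreement rates described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the calculation used in the previous assessment: the pairs of values are made up, and the adjusted rate is interpreted here as the census value being exactly one room more than the VOA value (one possible reading of the definitional adjustment).

```python
def agreement_rate(pairs, offset=0):
    """Share of (census_rooms, voa_rooms) pairs where census == voa + offset."""
    if not pairs:
        raise ValueError("no linked records to compare")
    matches = sum(1 for census, voa in pairs if census == voa + offset)
    return matches / len(pairs)


# Made-up (census_rooms, voa_rooms) pairs for linked addresses.
pairs = [(5, 4), (5, 5), (6, 5), (4, 3), (7, 5)]

direct = agreement_rate(pairs)              # values agree exactly
adjusted = agreement_rate(pairs, offset=1)  # census is one room more than VOA
print(direct, adjusted)
```

With these illustrative values, one pair in five agrees directly and three in five agree once the one-room difference is allowed for, which mirrors the direction (though not the size) of the change reported above.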
Comparatively, the quality of the census responses for number of rooms was measured by the 2011 Census Quality Survey (CQS) at 67%. The survey found that differences occurred because respondents had misunderstood the question. Most of these differences (93%) were within plus or minus one room.
This does not mean that the VOA data are of low statistical quality. Using VOA number of rooms for Census 2021 does imply a discontinuity with 2011 Census estimates (because of the definitional difference) that users need to be aware of. It will not be appropriate to measure change in number of rooms from 2011 to 2021; instead, the census bedroom question can be used for comparisons over time. Using the number of rooms in the VOA data for Census 2021 will provide a high-quality relative measure of size enabling the comparison of households across areas within the same time period. Therefore VOA number of rooms can be used for the derivation of the Carstairs Index and Indices of Multiple Deprivation (IMD).
This is the first time we are using administrative data linked to the census to produce a census statistical output. We need to ensure administrative data are suitable to undergo imputation to ensure that the quality of future census outputs is not impacted by households that are missing number of rooms.
Missing data in the number of rooms variable occurred where 2011 Census data did not link to Valuation Office Agency (VOA) data or number of rooms was missing on the VOA data. Imputation is required in both cases. Imputation is also required where data violate an edit rule. For example, the 2011 Census edit and imputation strategy contained the rule: “A household cannot have more bedrooms than rooms.” In these cases, the VOA number of rooms value would be removed and a new one imputed to be consistent with the census number of bedrooms value.
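The edit rule quoted above can be sketched as a simple check. This is an illustration only, with hypothetical field names (`bedrooms`, `voa_rooms`, `needs_imputation`) and made-up records; it is not the 2011 Census edit and imputation code. A VOA number of rooms value that is smaller than the census number of bedrooms is removed, so the record is treated in the same way as one that was missing from the start.

```python
def apply_bedroom_edit_rule(records):
    """Blank VOA rooms where a household would have more bedrooms than rooms."""
    checked = []
    for rec in records:
        out = dict(rec)  # copy so the input records are not modified
        rooms = out.get("voa_rooms")
        if rooms is not None and out["bedrooms"] > rooms:
            out["voa_rooms"] = None  # value violates the edit rule, so remove it
        out["needs_imputation"] = out["voa_rooms"] is None
        checked.append(out)
    return checked


# Made-up records for illustration only.
records = [
    {"household_id": "A", "bedrooms": 2, "voa_rooms": 4},     # consistent
    {"household_id": "B", "bedrooms": 3, "voa_rooms": 2},     # violates the rule
    {"household_id": "C", "bedrooms": 1, "voa_rooms": None},  # already missing
]

checked = apply_bedroom_edit_rule(records)
print([r["household_id"] for r in checked if r["needs_imputation"]])
```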
Our analyses used a donor-based imputation method (Canadian Census Edit and Imputation System (CANCEIS)). This method was used in 2011 Census processing, and it is our intention to use it for Census 2021 processing. Records with errors in responses (“recipients”) are assigned values from other similar records that are error-free (“donors”). In these analyses, recipients were records where the number of rooms was missing. The majority of these records were 2011 Census records that did not link to VOA data.
To ensure imputation remains unbiased, donor records are selected based on having similar characteristics to the recipient records. The characteristics used for donor selection were the 2011 Census household variables (see note 1). For example, a donor record may have the same accommodation type as the recipient record and have its number of rooms value copied (imputed) into the recipient record.
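The general idea of donor-based imputation can be sketched as below. This is a deliberately simplified nearest-neighbour illustration and is not CANCEIS or the method used in these analyses: the field names are hypothetical, the distance is simply a count of household variables that differ, ties are broken by taking the first closest donor, and the records are made up. The sketch also only accepts donors whose number of rooms is at least the recipient's number of bedrooms, so the imputed value respects the edit rule described earlier.

```python
# Hypothetical field names for the five census household variables.
MATCH_VARS = ("usual_residents", "accommodation_type", "bedrooms",
              "central_heating", "cars_or_vans")


def distance(a, b):
    """Number of matching variables on which two records differ."""
    return sum(1 for var in MATCH_VARS if a[var] != b[var])


def impute_rooms(records):
    """Copy voa_rooms from the most similar donor into each recipient."""
    donors = [r for r in records if r["voa_rooms"] is not None]
    result = []
    for rec in records:
        out = dict(rec)
        out["imputed"] = False
        if out["voa_rooms"] is None:
            # Only donors consistent with "no more bedrooms than rooms".
            eligible = [d for d in donors if d["voa_rooms"] >= out["bedrooms"]]
            if eligible:
                donor = min(eligible, key=lambda d: distance(out, d))
                out["voa_rooms"] = donor["voa_rooms"]
                out["imputed"] = True
        result.append(out)
    return result


# Made-up records for illustration only.
records = [
    {"usual_residents": 2, "accommodation_type": "flat", "bedrooms": 1,
     "central_heating": True, "cars_or_vans": 0, "voa_rooms": 2},
    {"usual_residents": 4, "accommodation_type": "detached", "bedrooms": 4,
     "central_heating": True, "cars_or_vans": 2, "voa_rooms": 7},
    {"usual_residents": 2, "accommodation_type": "flat", "bedrooms": 1,
     "central_heating": True, "cars_or_vans": 1, "voa_rooms": None},
]

completed = impute_rooms(records)
print(completed[2]["voa_rooms"], completed[2]["imputed"])
```

Here the recipient (the third record) differs from the first donor on one variable and from the second on four, so it takes its number of rooms from the first.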
2011 Census records that were not linked to VOA data formed a subpopulation of census records but crucially were not completely distinct from linked records. This means there were enough suitable donors available for imputation. Figure 2 is representative of how the distributions differed between the linked and unlinked records.
Missingness in the data was also considered prior to imputation. We found that the probability of a missing value can be predicted by the values of the census household responses. Therefore, a suitable unbiased imputation strategy could be designed. We discuss this in more detail in our methodology article.
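One simple way to explore whether missingness is related to a census household variable is to compare missing rates across its categories. The sketch below is illustrative only and is not the analysis described in the methodology article: the field names and records are hypothetical, and a fuller analysis would consider the household variables jointly rather than one at a time.

```python
from collections import Counter


def missing_rate_by(records, variable):
    """Proportion of records with missing voa_rooms, by category of a variable."""
    totals, missing = Counter(), Counter()
    for rec in records:
        key = rec[variable]
        totals[key] += 1
        if rec["voa_rooms"] is None:
            missing[key] += 1
    return {key: missing[key] / totals[key] for key in totals}


# Made-up records for illustration only.
records = [
    {"accommodation_type": "flat", "voa_rooms": None},
    {"accommodation_type": "flat", "voa_rooms": 3},
    {"accommodation_type": "detached", "voa_rooms": 6},
    {"accommodation_type": "detached", "voa_rooms": 7},
    {"accommodation_type": "flat", "voa_rooms": None},
    {"accommodation_type": "flat", "voa_rooms": 2},
]

rates = missing_rate_by(records, "accommodation_type")
print(rates)
```

If missing rates differ markedly between categories, as they do in this made-up example, the variable carries information about missingness and is a candidate for use in donor selection.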
We also investigated the number of properties where the number of bedrooms was equal to the number of rooms: 1.6% of records in the 2011 Census had an equal number of rooms and bedrooms, compared with 6.4% when comparing census number of bedrooms and VOA number of rooms. This is because the number of rooms in the VOA data is typically one less than in the 2011 Census. This will have implications for the continuity of outputs. We welcome user feedback on this.
Notes for: Missingness and viability of using a donor-based imputation approach
- The census household variables are:
  - number of usual residents
  - accommodation type
  - number of bedrooms
  - central heating
  - number of cars or vans
We aim to repeat these analyses using 2019 Census Rehearsal data linked to equivalent Valuation Office Agency (VOA) data to ensure we are operationally ready for Census 2021.
This research demonstrates a worst-case scenario. In the future, data linkage between VOA and census data should improve. This is because there are methodological differences between the 2011 Census and Census 2021 in the way unique property reference numbers (UPRNs) are assigned. Our current research removed 1.7% of 2011 Census records that could not be assigned a UPRN. Census 2021 will use an address frame that has an implicit link to VOA data for most records; therefore, the number of census records that cannot be linked to VOA data should be lower.
There were timeframe differences between the 2011 Census and the VOA data used in the current research. The VOA data were a snapshot captured in 2016 and filtered to include only addresses built before 2012, to enable a better comparison with census data. Some error in the linked data may be attributed to this difference. In the future, we will have VOA data that align with the census data collection period. This will likely reduce any discrepancies resulting from the data being captured in different years, hence reducing overall rates of error.
We are keen to get feedback on this research and the methodology used, including how they might be improved and potential uses of the data. Please email your feedback to firstname.lastname@example.org and include “Housing” in the subject line of your response.
You might also be interested in:
- ONS working paper series no 20 – Feasibility of using donor-based imputation for census outputs on number of rooms using Valuation Office Agency data
- Estimating the number of rooms and bedrooms in the 2021 Census: An alternative approach using Valuation Office Agency data
- Valuation Office Agency data
- Administrative Data Census Research Outputs