We have published multivariate data from Census 2021, based on the usual resident population of England and Wales. These data allow you to combine different variables and explore the relationships between them, giving you detailed insight into the characteristics of the population of England and Wales.
We released multivariate data from Census 2021 during phase two of the Census 2021 outputs release schedule. The data were released in four stages:
28 March 2023: Data combining multiple variables, England and Wales: Census 2021. New functionality that allows you to make your own datasets by selecting specific variables, classifications and levels of geography.
4 April 2023: Sexual orientation and gender identity combining multiple variables, England and Wales: Census 2021. Pre-built multivariate datasets combining sexual orientation and gender identity data with other variables from Census 2021.
18 April 2023: Person-level data combining multiple variables, England and Wales: Census 2021. Pre-built multivariate datasets for person-level data not already released through the Create a custom dataset tool.
25 April 2023: Data about households and families combining multiple variables, England and Wales: Census 2021. Pre-built multivariate datasets about households and families not already released through the Create a custom dataset tool.
Create a custom dataset tool
We have introduced new functionality for Census 2021 data that allows you to make your own datasets by selecting different combinations of census variables. You can access the Create a custom dataset tool from the Census hub.
This release offers millions of possible combinations of variables and classifications from which you can create a custom dataset.
Sometimes we need to make changes to data if it would otherwise be possible to identify individuals. This is known as statistical disclosure control (SDC), and all census data are subject to it. The impact is greater for sexual orientation and gender identity (SOGI) data because small counts are more likely at low levels of geography. Therefore, we are publishing pre-built datasets including SOGI variables at the lowest possible non-disclosive geography.
How to use the Create a custom dataset tool
We have created a short video showing how to create a custom dataset.
Start by choosing a population base. Then, before downloading your dataset, you will be able to:
select a level of geography, for example, lower-tier local authorities
select whether you want data for all areas or filter for specific areas, for example, Southampton and/or Cardiff
choose and change the variables for Census 2021 data
Once you have downloaded your data, you can also filter it for a specific category of a variable in the dataset, for example, the population aged 16 to 24 years from the age variable.
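If you work with a downloaded file in code rather than in a spreadsheet, filtering for one category can look like the minimal Python sketch below. It assumes the download is a CSV file; the column names (area_name, age_category, observation), the category label and the values in SAMPLE_CSV are all hypothetical placeholders, not the real layout or content of a census download, so check the headers in your own file.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file. Column names, category
# labels and values are made up for illustration; they are not census data.
SAMPLE_CSV = """area_name,age_category,observation
Area A,Aged 15 years and under,10
Area A,Aged 16 to 24 years,20
Area B,Aged 16 to 24 years,30
Area B,Aged 25 to 34 years,40
"""


def filter_rows(csv_text: str, column: str, value: str) -> list[dict]:
    """Return only the rows whose `column` exactly matches `value`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]


rows = filter_rows(SAMPLE_CSV, "age_category", "Aged 16 to 24 years")
for row in rows:
    print(row["area_name"], row["observation"])
```

For a real file, pass the text read with open(path, newline="", encoding="utf-8") instead of SAMPLE_CSV.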
For most datasets, there are additional features that allow you to:
select a classification (a level of detail for a variable) if there are multiple options
add or remove variables
Single year of age data
In all our census outputs, we have to balance the risk of identifying individuals against the usefulness of the data. We have listened to feedback and understand the need for single year of age data at small geographies. As a result, we are releasing single year of age data at middle layer super output area (MSOA) level as part of the Create a custom dataset tool.
Ethnic group classifications
The Ethnic group variable has 3 classifications, containing 19, 7 and 5 categories respectively.
To provide information on people who identified through the write-in response option, we produced detailed datasets for 287 ethnic groups, 57 religious groups and 93 main languages. These were published as part of our topic summaries: Ethnic group, national identity, language, and religion: Census 2021 in England and Wales - Office for National Statistics (ons.gov.uk). You can read more about this in our blog post: How am I represented in Census 2021 data? | National Statistical (ons.gov.uk). These detailed datasets will not be included in the Create a custom dataset tool but will instead form part of our blended offering.
Statistical disclosure control
The new flexible functionality for Census 2021 is made possible by a dynamic statistical disclosure control methodology that protects data confidentiality.
When you request a dataset, the system runs automated disclosure checks in real time to determine whether the requested data are safe to share. If the dataset passes these checks, you can download the data. The Create a custom dataset tool also suggests how you could edit your request to maximise the amount of data returned.
There is a trade-off between detail and geography. If you request more detailed classifications of variables, you are more likely to receive data for larger geographies than for smaller ones. Likewise, if you request data for very small geographies, you are more likely to receive them if you request less detailed classifications.
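The trade-off between detail and geography follows from simple arithmetic: the number of cells in a table is the product of the numbers of categories in each variable, so adding detail spreads the same population across more, smaller cells. The Python sketch below shows this using the 5- and 19-category Ethnic group classifications mentioned above; the area populations and the 10-category second variable are hypothetical round numbers, and an even spread is a simplifying assumption. Real disclosure checks are more involved than an average.

```python
from math import prod
from typing import Sequence


def mean_count_per_cell(population: int, categories: Sequence[int]) -> float:
    """Average people per cell if the population were spread evenly.

    Real counts are skewed, so many cells fall well below this average.
    """
    return population / prod(categories)


# Hypothetical figures, for illustration only.
SMALL_AREA = 8_000    # people in one small area
LARGE_AREA = 80_000   # people in a larger area covering 10 such small areas
OTHER_VARIABLE = 10   # categories in a second, hypothetical variable

for label, ethnic_categories in [("5-category", 5), ("19-category", 19)]:
    for area_label, pop in [("small area", SMALL_AREA), ("large area", LARGE_AREA)]:
        mean = mean_count_per_cell(pop, [ethnic_categories, OTHER_VARIABLE])
        print(f"{label} ethnic group, {area_label}: {mean:.1f} people per cell")
```

Moving from 5 to 19 ethnic group categories cuts the average cell in the small area from 160 to about 42 people, while moving to the larger area raises it again to about 421.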
Cell key perturbation
Perturbation makes small changes to some cell values, but it does not materially affect how the data should be interpreted.
Where tables are constructed in different ways, the perturbation applied will be different. As a result, totals may differ between tables, and the cells of a table may not add up exactly to its total. To minimise the effect of perturbation, we recommend, where possible, using totals from tables with fewer cells, at higher levels of geography.
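Why tables built in different ways may not add up can be illustrated with a simplified sketch of the general cell key method. Each record carries a fixed random record key; a cell's key is the sum of its records' keys modulo a range, and the adjustment is looked up from the cell value and cell key. The key range of 256 and the toy_adjustment lookup below are assumptions made for illustration, not the parameters or perturbation table used for Census 2021, and the records are randomly generated, not census data.

```python
import random
from collections import defaultdict

KEY_RANGE = 256  # assumption: record keys are integers from 0 to 255


def toy_adjustment(value: int, cell_key: int) -> int:
    """Hypothetical stand-in for a perturbation table lookup.

    Returns -1, 0 or +1 depending on the cell key. A real table is designed
    with specific statistical properties; this mapping is illustrative only.
    """
    return (-1, 0, 0, 1)[cell_key % 4]


def perturbed_table(records, variables):
    """Count records by the given variables, then perturb each non-empty cell."""
    counts = defaultdict(int)
    cell_keys = defaultdict(int)
    for record in records:
        cell = tuple(record[v] for v in variables)
        counts[cell] += 1
        # The cell key depends only on which records fall in the cell.
        cell_keys[cell] = (cell_keys[cell] + record["record_key"]) % KEY_RANGE
    return {
        cell: count + toy_adjustment(count, cell_keys[cell])
        for cell, count in counts.items()
    }


rng = random.Random(2021)  # fixed seed so the sketch is reproducible
records = [
    {
        "region": rng.choice(["North", "South"]),
        "age_band": rng.choice(["0-15", "16-64", "65+"]),
        "record_key": rng.randrange(KEY_RANGE),
    }
    for _ in range(1_000)
]

total = perturbed_table(records, [])[()]
by_region = perturbed_table(records, ["region"])
by_region_and_age = perturbed_table(records, ["region", "age_band"])

print("perturbed total:", total)
print("sum of region cells:", sum(by_region.values()))
print("sum of region x age cells:", sum(by_region_and_age.values()))
```

Each cell is perturbed independently, so the sums printed for the two tables can differ from each other and from the perturbed total. Because record keys stay attached to records, requesting the same table again returns exactly the same values.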
Perturbation will be applied to all outputs from Census 2021, so small differences between totals are expected in all outputs. The extra protection against differencing (identifying individuals by subtracting one table from another) allows us to release a wider variety of outputs and to use an automated, flexible dissemination system, which we hope will be of great benefit to users.
Population bases for which statistical disclosure rules would likely suppress data are not included in the Create a custom dataset tool.
Instead, we have published pre-built multivariate datasets for them. These datasets have already passed disclosure checks and include data for the lowest possible non-disclosive geographies. Broadly, these pre-built multivariate datasets contain data relating to sexual orientation and gender identity, short-term residents, students, families, dependent children and Houses in Multiple Occupation.
Where to find the data
We have published Census 2021 multivariate datasets on our website, where you can use the full flexibility on offer. See our Release Calendar for details. Guidance about definitions, variables and classifications can be found in the Census 2021 dictionary.
We will also provide the standard set of pre-built multivariate datasets on Nomis, a service of the Office for National Statistics.
If you want to access multivariate data programmatically, you can use our API. Guidance about using our API is available through the ONS Developer Hub.
We have detailed the variables and classifications used in the Create a custom dataset tool and the pre-built datasets in the corresponding tabs of our multivariate specifications spreadsheet.
If you need more information about our plans for providing data from Census 2021, please contact us at firstname.lastname@example.org.
- Census 2021 Phase 2 multivariates (727.1 kB xlsx)