1. Background

In late 2017, the Office for National Statistics (ONS) began an audit of the data sources and publications available for understanding equalities in the UK today, including outcomes for all nine of the protected characteristic groups covered by the Equality Act 2010 [1]. This was the first step in developing a Centre for Equalities and Inclusion. The aim of the Centre is to work with others to ensure that the right data are available to address the main social and policy questions about fairness and equity in society, that relevant analyses are taken forward and that the best methods are in use.

In the last decade, we’ve seen an increasing demand for robust data to monitor equalities, driven by a range of developments including:

These events have increased the demand for equalities data in the UK as well as having an impact on its availability and accessibility. Also during this period, the Digital Economy Act 2017 came into force, increasing opportunities for making better use of existing data, while data science methods have expanded our horizons, creating new opportunities for sourcing, analysing and presenting data.

Taken together, this suggests the time is right to take stock of the state of equalities data in the UK, to understand where we are now and the challenges and opportunities ahead. On the premise that better data is the basis for better decisions, this audit is a first step in that direction.

This report presents the findings from the audit on the state of the existing evidence on the protected characteristics. It describes how we carried out the audit and the work we undertook to refine its content. It then summarises some of the more general findings before looking at each of the protected characteristics in turn. These sections define the characteristic, describe any relevant harmonised principles [2] and then summarise what the audit has shown in terms of the coverage and quality of the data. We then set out the issues we have identified with the existing evidence base, including any known data gaps. In the final section we describe our future plans to build on what the audit has shown us.

Notes for: Background

  1. The Equality Act applies in England, Wales and Scotland and defines the following as protected characteristics: age; disability; gender reassignment; marriage and civil partnership; pregnancy and maternity; race; religion or belief; sex; and sexual orientation. Although there is some overlap with the protected characteristics in the Equality Act, separate legislation applies in Northern Ireland.

  2. These are a set of principles that the Government Statistical Service (GSS) has produced to be used when collecting and analysing data.

2. Our approach

Our initial approach to the audit was to catalogue publications, datasets and sources relating to the nine protected characteristics, first by searching GOV.UK to identify relevant sources of official statistics and then by successive stages of crowd-sourcing for feedback on its content and further input. To ensure its completeness, the audit was distributed initially within ONS. It was then advertised on the Government Statistical Service (GSS) website and finally was targeted at non-governmental organisations with a demonstrated interest in the issues.

For each of the resources listed, we asked respondents to provide information on the following:

  • the protected characteristic(s) that it covers

  • the organisation responsible for it

  • the underlying dataset – the survey source or administrative dataset

  • the theme – this was a free text field to capture the broad subject matter of the source

  • its geographical coverage – UK, Great Britain, country-specific or lower levels of geography

  • its geographical granularity – whether the source is currently broken down into lower levels of geography, such as country, region, local authority

  • the date of the most recent publication

  • how often it is released

  • the time period for which the data are available
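The fields listed above can be pictured as one record per resource. The following is a minimal sketch only: the class, field and function names are hypothetical and are not taken from the published audit spreadsheet, and the two example entries are invented placeholders rather than real sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuditEntry:
    """One resource in the audit. All names here are hypothetical."""
    name: str
    protected_characteristics: List[str]   # characteristic(s) covered
    organisation: str                      # organisation responsible
    underlying_dataset: str                # survey source or administrative dataset
    theme: str                             # free text subject matter
    geographical_coverage: str             # e.g. UK, Great Britain, country-specific
    geographical_granularity: List[str] = field(default_factory=list)
    latest_publication: Optional[str] = None   # date of most recent publication
    release_frequency: Optional[str] = None    # how often it is released
    time_period: Optional[str] = None          # period for which data are available


def sources_covering(entries, characteristic):
    """Return the names of entries that list the given characteristic."""
    wanted = characteristic.strip().lower()
    return [e.name for e in entries
            if wanted in (c.strip().lower() for c in e.protected_characteristics)]


def missing_fields(entry):
    """List the optional fields that are still unpopulated for an entry."""
    optional = ("latest_publication", "release_frequency", "time_period")
    return [f for f in optional if getattr(entry, f) is None]


# Invented placeholder entries, for illustration only.
example_entries = [
    AuditEntry("Example survey A", ["Age", "Sex"], "Example organisation",
               "survey", "health", "UK"),
    AuditEntry("Example administrative dataset B", ["Sex", "Disability"],
               "Example organisation", "administrative dataset", "education",
               "England", ["region"], latest_publication="2018"),
]

print(sources_covering(example_entries, "sex"))
print(missing_fields(example_entries[0]))
```

Keeping the unpopulated fields as explicit empty values, rather than dropping them, makes it straightforward to report which parts of an entry still need to be filled in.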

Prior to our initial publication, we did some further work to refine the results (see our initial report for full details). This included organising the resources into the six domains defined in the Measurement framework for equality and human rights: health; education; justice and personal security; living standards; work; and participation.

However, our initial look at the results highlighted the need to work collaboratively with data producers and users to make a more detailed assessment of the available data and to be able to identify where improvements are needed. We therefore convened a working group, including representatives from:

  • the Equality and Human Rights Commission

  • the Government Equalities Office

  • the Race Disparity Unit

  • the Department for Work and Pensions

  • the Welsh Government

  • the Northern Ireland Executive Office

  • the Department for Culture, Media and Sport

  • the GSS harmonisation team

  • the Office for Statistics Regulation

  • Lancaster University’s Division of Health Research

  • representatives from relevant ONS teams

Following the recommendations from this group, the audit was changed to capture the raw data sources underlying the initial resources provided by the crowd-sourcing exercise [1]. Additional fields were added to capture harmonisation information and to reflect the extent to which each data source included characteristics on socio-economic group and people at higher risk of harm, abuse, discrimination or disadvantage (see Intersectionality). One of the fundamentals of the human rights-based approach to data is that the individual has their own voice; to capture this element, a field was also included to reflect the extent to which data collected on a specific characteristic are self-reported. (Due to the current level of missing information in some of these additional fields and the continuing work to populate them, not all of them have been published at this time; see Equalities data audit for further information.)

It is important to note that the spreadsheet published alongside this report, and used as the basis for our findings, is a snapshot of the state of the evidence as we knew it at the time of publication. The approach taken to populate the fields has involved both a search of the publicly available metadata and direct contact with data producers. Due to the volume and complexity of the data sources captured, this work is still ongoing and there are gaps to fill. The intention is that the audit will be a live document that will continue to evolve over time as new data sources are added and existing data sources are updated. We are relying on users to let us know if there are sources of data that are not currently captured or any amendments needed to the existing information. The aim is for the audit to be a resource for researchers and we welcome feedback on its content and format to ensure that it is valuable and relevant (see also Next steps).

We have also been engaging with a range of stakeholders via conferences, seminars and meetings to capture their experiences of the existing data, in particular, where they perceive there are gaps in the current evidence base. These are also reflected in this report.

Notes for: Our approach

  1. This was restricted to sources published in the last five years. As a result, the 2011 Census was excluded from this exercise, although we recognise the breadth and depth of information that is available from it.

3. Summary of the audit

Using the method described, almost 230 unique sources of data that contain information on the protected characteristics have been captured by the audit to date. While almost half of these sources cover the health domain, the coverage of the remaining domains in terms of numbers of sources is fairly evenly distributed, with approximately 30 to 40 sources for each.

Administrative data make up most of what is available in the justice and personal security domain and approximately half of the data in the health and education domains. In contrast, the remaining domains are covered mainly by surveys or censuses.

One issue that the audit has revealed is the difficulty researchers face in accessing the necessary information about the sources that are available, even before they try to access the data. Although this varies by source, in general, while the majority of surveys include a wealth of metadata, including variable guides, user guides and technical reports, getting access to this kind of information for administrative sources is much harder. This has implications for those domains or protected characteristics that are mainly covered by administrative sources. There are some examples of good practice in this area, including NHS Digital’s easily accessible data dictionaries, but these tend to be the exception.

It is recognised that disadvantage may be experienced differently by those with multiple protected characteristics, for example, a woman from an ethnic minority group. This is referred to as intersectionality. For this reason, it is important that data are available to effectively monitor the intersection of different protected characteristics.

The Equality and Human Rights Commission (EHRC) defines intersectionality as:

“An analytical tool that we use… to show the distinct forms of harm, abuse, discrimination and disadvantage experienced by people when multiple categories of social identity interact with each other.” [1]

In line with their five components of evidence collection and analysis, the audit aims to capture the extent to which data are available on these different components, that is, the protected characteristics, socio-economic group, people at higher risk of harm, abuse, discrimination or disadvantage and geographical analysis.

It is important to note that while the audit flags data sources that include these characteristics, because of the multiple ways in which they could be combined, it is not possible to draw any conclusions about the ability of these sources to be used for intersectional analysis. It is simply intended as a starting point to establish in which sources it might be possible to consider intersectionality. In general, users indicate a demand for data that can be used for intersectional analysis.

Similarly, the audit gives an indication of the geographical information included as a guide to what might be possible in terms of granularity. However, the ability of the data to be broken down to this level will depend on the sample size and the population of interest. For this reason, no comment is made on the granularity of the different data sources in this report. Users have indicated an unmet need across the board for data at lower levels of geography.

The audit also captures around 15 longitudinal data sources that contain information on the protected characteristics, almost all of which are survey sources. There is coverage across the protected characteristics and at least one source covering each domain, though most sources cover multiple domains. These include Understanding Society, the Millennium Cohort Study, the ONS Longitudinal Study and the Annual Population Survey. In general, users report a lack of relevant longitudinal data over a sufficiently long time period to allow for analysis of people’s life trajectories in relation to the protected characteristics.

A complication arises in the living standards domain where the household situation is applied equally to all members of the household. This assumes that resources are equally shared among all members of the household, but this may well not be the case.

Notes for: Summary of the audit

  1. See EHRC’s Measurement framework for equality and human rights for more information.

4. Age

The Equality Act 2010 defines age as follows:

"(1) In relation to the protected characteristic of age—
(a) a reference to a person who has a particular protected characteristic is a reference to a person of a particular age group;
(b) a reference to persons who share a protected characteristic is a reference to persons of the same age group.
(2) A reference to an age group is a reference to a group of persons defined by reference to age, whether by reference to a particular age or to a range of ages."

The harmonised principles for the collection of data on age are contained within the principles for the characteristics of those living in a household, within the Demographic Information principles. These are designed for household surveys and stipulate that collection should be through a single grid, covering name, sex and age. Age is collected by asking “What is your date of birth?” or “What was your age last birthday?”, although interviewers are instructed to give their best estimate if the respondent refuses to give their age.

As an important piece of demographic information, age is one of the most frequently collected protected characteristics. However, while most of the sources in the audit contain information on age, different age groups are not necessarily equally represented in each.

While the majority of sources in the audit for which information was available on the population coverage did include children, the extent to which they are able to provide information on their own behalf can vary. Where children are included as part of a wider household data collection, information on children is often collected by proxy.

The audit identified over 40 sources that provide data specifically on children, for example, the Children and young people’s health services dataset, or directly collect responses from those aged under 16 years, including Understanding Society’s youth questionnaire (completed by those aged 10 to 15 years), the Active Lives Survey (completed by 14- and 15-year-olds) and the Crime Survey for England and Wales (interviews with those aged 10 to 15 years). Where the data collection involves a questionnaire, these are designed to be completed by the child, so the assumption is that the child is free to express their identity and opinions, though this assumption may not always hold.

While there are clearly ethical and practical considerations to be taken into account when undertaking data collection on children, the involvement of a responsible adult in completing the information raises issues of data quality. Where guidance is available, it generally stipulates that the adult involved in completing this information should provide a response from the point of view of the child. However, there is no way of knowing to what extent this applies in practice (the same also applies in the case of responses from adults who may need support in interpreting and completing questionnaires). Even in those cases where the child is consulted over their response, they may be reluctant to share their true responses, particularly when data are being collected on some of the more sensitive characteristics, including sexual orientation and gender reassignment.

Even where the age group of interest is part of the target population, coverage of different age groups can vary. This impacts on the ability to carry out meaningful analysis on people of different ages. Poor granularity for age is particularly evident amongst older age groups, commonly reported in publications as “over 60” with no further breakdown provided. This is likely to be more of a problem where data collection is online, a mode that is being used increasingly and which is known to be associated with lower response rates for older people.

5. Sex

The Equality Act 2010 defines sex as follows:

"In relation to the protected characteristic of sex—
(a) a reference to a person who has a particular protected characteristic is a reference to a man or to a woman;
(b) a reference to persons who share a protected characteristic is a reference to persons of the same sex."

Like age, the harmonised principles for collection of data on sex are contained within the principles for the characteristics of those living in a household within the Demographic Information principles. The principles specify that the grid used to collect the characteristics of the household should include specific questions except in the case of sex, “which will often be volunteered or observed”. The guidance stipulates that the question should only be asked if this is not the case, in which case, the question “What is your sex?” should be used. The principles further identify two response options: male or female.

Around 200 of the sources included in the audit contain information on sex, making this the protected characteristic with the greatest coverage alongside age. Around half of the sources are administrative data sources, with surveys or censuses accounting for the remainder. As would be expected for the characteristic with the greatest coverage, the coverage across domains reflects the overall domain coverage of the audit, with half of the sources covering health, but fewer covering work and participation.

There is a strong user demand for data to be collected on sex, because of its use in planning of services and allocation of resources across central and local government, among other uses (see 2021 Census topic consultation). The Equality Act defines the protected characteristic in binary terms, male or female. However, it is important to be clear in the data collection about how “sex” is defined. While the terms “sex” and “gender” are often used interchangeably (and there are examples of this in the audit) they have different meanings; sex is biologically determined while gender is how an individual perceives themselves.

Perhaps in part because the harmonised principles stipulate that sex will generally be observed or volunteered, there is limited information within the audit on how this information is collected. For those sources for which information is available, seven include options other than male or female, five of which are in the health domain and three of which cover multiple domains. Only one of these seven sources covers the UK (the LGBT survey) with five covering England and the last one covering Northern Ireland.

Both the UK LGBT survey and the Northern Irish Young Life and Times Survey are intentionally collecting gender information, with both including a range of gender identity options in their questionnaires. In contrast, the sources of data within the health domain are focused on capturing sex so the options are framed in biological terms. The Adult Social Care Survey includes the option for respondents to choose an “other” category and the three NHS Digital administrative sources (Hospital Episode Statistics, the Improving Access to Psychological Therapies (IAPT) dataset and the Patient Reported Outcome Measures data) include a “not specified” option with guidance indicating that this should be used when sex cannot be determined. The Emergency Care Dataset similarly includes an “indeterminate” option.

The demand for data on gender identity is discussed further in the section on Gender reassignment. However, it is apparent that there is a need to define the concepts of sex and gender more clearly in data collection. In addition, there is a need to clarify how the data will be used so that data collection can be targeted to meet that use; for example, there may be a need to capture information on those who do not identify as either male or female.

6. Race and ethnicity

The Equality Act 2010 defines race as follows:

"(1) Race includes—
(a) colour;
(b) nationality;
(c) ethnic or national origins.
(2) In relation to the protected characteristic of race—
(a) a reference to a person who has a particular protected characteristic is a reference to a person of a particular racial group;
(b) a reference to persons who share a protected characteristic is a reference to persons of the same racial group.
(3) A racial group is a group of persons defined by reference to race; and a reference to a person's racial group is a reference to a racial group into which the person falls.
(4) The fact that a racial group comprises two or more distinct racial groups does not prevent it from constituting a particular racial group."

There is no harmonised principle covering race. Although race is the protected characteristic, ethnicity is the primary source of data collection in the UK and is therefore used in monitoring equality [1].

The harmonised principles for ethnicity are different for England, Wales, Scotland and Northern Ireland, in part reflecting differences in legislation between the countries. However, there are specific recommendations for their use across Great Britain and the UK to deal with these differences.

The current harmonised principles, which have been in place since 2011, define 18 categories of ethnic group in England and Wales, 19 categories in Scotland and 16 categories for Northern Ireland [2]. Where sample sizes do not allow the full categorisation to be applied, or to aid comparability between countries with different categorisations, the data should be aggregated to five main broad headings, with notes to explain the differences.

After age and sex, ethnicity is the most prevalent characteristic captured in the audit data sources, with approximately 150 sources including it; approximately half are administrative data sources and the other half are surveys or censuses.

The audit includes ethnicity data sources across all the domains, with around half of the sources including information relating to health. Around 20 of the sources included information that covered more than one domain, including some of the main government household surveys, for example, the Annual Population Survey, the LGBT Survey and the Family Resources Survey.

In our initial report on the findings of the audit, we were reporting on the publications that were available on the protected characteristics. The focus on ethnicity revealed numerous publications where analysis had been produced using aggregations that were not harmonised. Many others were also using earlier versions of the harmonised principles, which are now out of date. This applied in some cases even for relatively recent pieces of analysis and the Ethnicity Facts and Figures website lists 20 different categorisations of ethnicity based on 2001 Census or 2011 Census categorisations. We are still populating the information on the use of the harmonised standards in the current version of the audit, but for those that are complete, almost one-third have been found to not be using the current version of the harmonised standards. This has implications for the comparability of the data.

In the guide for collecting and classifying ethnicity data, ONS recognises ethnicity as a subjective and multi-faceted concept that is self-defined and reflects how people see themselves. Because of this, a person’s ethnicity can change at different times, depending on the social and political context. ONS and the United Nations Statistics Division describe some of the criteria used to identify ethnic group as nationality, country of birth, language, religion, national or geographical origin and skin colour.

The subjective nature of ethnicity means that it should always be self-reported wherever possible, though the guide notes that some individuals, for example, children, may need help to understand the categories. For the sources of data included in the audit, it is not always clear whether ethnicity has been self-reported. In general, where guidance is available, it stipulates that ethnicity should be self-reported. However, there are examples where ethnicity is known not to be self-reported, for example, data on youth cautions [3]. Where ethnicity is not self-reported, the quality of the data can be compromised and this will also impact comparability across different sources. Ethnicity data that are not self-reported may also result in excessive use of “other” categories, as seen in data about detentions under the Mental Health Act [4], or high levels of missing data.

Findings from the Race Disparity Audit revealed a lack of data showing detailed breakdowns for minority ethnic groups. This is a common problem, particularly with survey data where sample sizes are too small for these groups to allow meaningful analysis. There are ways in which this can be dealt with, for example, by boosting the data collection to target groups of interest or combining multiple years of data collection, but these have their issues, including implications for sample design and costs and identifying year-on-year changes. Although collecting data at this level of detail presents a challenge, it is essential so that we can adequately understand the situation of different ethnic groups.

Notes for: Race and ethnicity

  1. An exception to this is in the recording of hate crimes in the justice system; these are described as racial hate crimes, resulting from the perception of a person’s real or perceived race.

  2. To comply with Northern Irish legislation, there is only one White category and Irish traveller is a main category, separate from White.

  3. See Ethnicity facts and figures, youth cautions, for more information.

  4. See Ethnicity facts and figures, Detentions under the Mental Health Act, for more information.

7. Religion or belief

The Equality Act 2010 defines religion or belief as follows:

"(1) Religion means any religion and a reference to religion includes a reference to a lack of religion.
(2) Belief means any religious or philosophical belief and a reference to belief includes a reference to a lack of belief.
(3) In relation to the protected characteristic of religion or belief—
(a) a reference to a person who has a particular protected characteristic is a reference to a person of a particular religion or belief;
(b) a reference to persons who share a protected characteristic is a reference to persons who are of the same religion or belief."

Religion involves several concepts [1]:

  • religious affiliation – “how respondents connect or identify with a religion, irrespective of whether they actively practice it”

  • religious belief – “those beliefs followers of a religion are expected to hold”

  • religious practice – the activities that people undertake in relation to religion, “includes activities such as worship, prayer, participation in special sacraments, and fasting”

The harmonised principles on religion state that “Where a single question on religion is required for data collection in the UK, religious affiliation is the recommended concept to measure.” The harmonised question “What is your religion?” has been specifically chosen to capture an individual’s religious affiliation, as this was deemed to most closely align with the definition in the legislation. While the question remains the same for all the countries of the UK, the response categories are different for each country.

Although the Equality Act includes philosophical beliefs under this protected characteristic, the decision was made not to include it in the harmonised principles because testing revealed a negative impact on the data collected on religious affiliation; reference to belief in the same question as religion altered responses to reflect religious belief rather than affiliation.

There are almost 60 unique sources of data on religion in the data audit, covering all six of the domains, in proportions similar to that overall. Most of these sources are surveys, many of which cover multiple domains, including the Family Resources Survey, the National Survey for Wales and the LGBT survey.

When Parliament agreed that data relating to religion could be captured on the census, it stipulated that this should be voluntarily provided, due to public acceptability concerns. In most of the data sources for which we have this information, religion is an optional entry or a “prefer not to say” option is included. The voluntary nature of the religion question may have implications for the number of responses received as well as the quality of the data collected.

In terms of the number of responses, high levels of missingness in survey samples that are already limited can result in insufficient data to enable meaningful analysis by different religions. Quality may be an issue, particularly if non-response is not distributed evenly across the different religions. Further work is needed to establish the extent to which the sources included are missing data on this characteristic, which may indicate quality concerns. However, over 90% of respondents answered this question in the 2011 Census, suggesting that response rates can remain high even when the question is voluntary.

Just under half of the data sources use the harmonised standards, though currently there are almost as many for which we do not know whether the harmonised principles are being followed. A variety of other questions were included either instead of the harmonised question or sometimes alongside it. These included:

  • Do you consider yourself as belonging to any particular religion or denomination?

  • What is your religion, even if you are not currently practicing?

  • Do you have a specific religion?

  • Do you regard yourself as belonging to any specific religion?

  • Which, if any, of the following best describes your religion?

  • What is your religion or belief?

Similar to ethnicity, data on religion should be self-reported and there are implications for the quality of the data collected where this is not the case. In common with the other protected characteristics that should be self-reported, where guidance is available, it generally stipulates that the question should be answered either by the respondent themselves or from their point of view, though again there is no way of knowing to what extent this is happening. For example, in administrative data, there might be over-use of non-specific categories, such as “other” or “prefer not to say” options.

Although the 2021 Census topic consultation identified strong user need for data on religious affiliation for policy development and in relation to equality issues, there was also evidence of a demand for data covering religious beliefs and practices.

For those sources for which we have this information, there are some examples in the audit of attempts to capture more than religious affiliation. These include the Taking Part Survey, the Northern Ireland Crime Survey and the Continuous Household Survey, all of which ask whether the respondent is currently practising. In addition, the Adult Psychiatric Morbidity Survey includes questions related to beliefs, specifically, respondents’ beliefs about life after death, whether prayer has value and whether they believe there is a god. However, these are rare and do not provide for effective estimates of the religious population based on beliefs or practices.

Notes for: Religion or belief

  1. See The 2021 Census: Assessment of initial user requirements on content for England and Wales: Religion topic report for more information, published in 2016.

8. Disability

The Equality Act 2010 defines disability as follows:

"(1) A person (P) has a disability if—
(a) P has a physical or mental impairment, and
(b) the impairment has a substantial and long-term adverse effect on P's ability to carry out normal day-to-day activities.
(2) A reference to a disabled person is a reference to a person who has a disability.
(3) In relation to the protected characteristic of disability—
(a) a reference to a person who has a particular protected characteristic is a reference to a person who has a particular disability;
(b) a reference to persons who share a protected characteristic is a reference to persons who have the same disability."

The harmonised principles relating to disability were developed following the Equalities Data Review (2007), which suggested improvement in the coordination, comparability, quality and accessibility of disability statistics and the application of a principled approach to data collection. A programme of research and consultation followed, which led to the development of three harmonised principles, each relating to a separate aspect of disability. The three principles are:

The aim of this multi-faceted approach is to provide an agreed suite of questions that “separate out the components leading to a simplified classification of disability, which is defined as activity restriction and participation restriction”. This is aligned with the social model of disability, defined by Scope in the following way:

“The social model of disability is a way of viewing the world, developed by disabled people. The model says that people are disabled by barriers in society, not by their impairment or difference. Barriers can be physical, like buildings not having accessible toilets. Or they can be caused by people's attitudes to difference, like assuming disabled people can't do certain things.”

Although all three elements are required to fully capture disability as defined in the Equality Act, the harmonised principle on impairment includes a pragmatic note that, “the questions are intended as a suite, not all of which has to be used in every situation.”

A fourth harmonised principle, Statistical measures of disability based on long-lasting health conditions and illnesses and activity restriction, draws together the essential elements of the others in a succinct format.

Overall, more than one-third of the sources in the audit include information on disability in some way and cover each domain of life, with several covering multiple domains (including Understanding Society, the Annual Population Survey, the Millennium Cohort Study, the English Longitudinal Study of Ageing, and the National Survey for Wales). Health is the domain most extensively covered.

Looking at the types of data that contain information on disability, there is a roughly even split between surveys and censuses on the one hand and administrative data sources on the other. All four countries of the UK have data on disability, including country-specific datasets.

Of the UK-wide sources, the Family Resources Survey (FRS) is commonly used for estimates of prevalence of disability at the UK level. The FRS has used the harmonised principles for the measurement of disability (introduced in 2011) since 2012 to 2013. Prior to this (between 2004 to 2005 and 2011 to 2012), a different definition of disability was used based on barriers in nine areas of life. A consistent time series providing annual prevalence estimates of disability at the UK level is therefore available only from 2012 to 2013.

In keeping with the approach taken in the harmonised principles, the FRS user guidance advises that there are several groups excluded from the core definition of disability included in the Equality Act and these groups are not captured in the survey (see page 27):

“The definition of disability used in the FRS publication is consistent with the core definition of disability under the Equality Act 2010. A person is considered to have a disability if they have a long-standing illness, disability or impairment which causes substantial difficulty with day-to-day activities. However, some individuals classified as disabled and having rights under the Equality Act 2010 are not captured by this definition:

  • People with a long-standing illness or disability who would experience substantial difficulties without medication or treatment

  • People who have been diagnosed with cancer, HIV infection or multiple sclerosis and who are not currently experiencing difficulties with their day to day activities

  • People with progressive conditions, where the effect of the impairment does not yet impede their lives

  • People who were disabled in the past and are no longer limited in their daily lives are still covered by the Act.”
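The core definition quoted above, and the groups it does not capture, can be sketched as a simple classification rule. This is an illustrative sketch only: the class and field names are hypothetical and do not correspond to FRS variable names or to the wording of any survey question.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # All field names are hypothetical, chosen for illustration only.
    long_standing_condition: bool  # long-standing illness, disability or impairment
    substantial_difficulty: bool   # substantial difficulty with day-to-day activities now
    difficulty_without_treatment: bool = False  # would have difficulty without medication or treatment
    cancer_hiv_or_ms_diagnosis: bool = False    # diagnosed, but not currently experiencing difficulties
    progressive_condition: bool = False         # effect does not yet impede daily life
    past_disability: bool = False               # disabled in the past, no longer limited


def core_disabled(r: Respondent) -> bool:
    """Core definition: a long-standing condition AND substantial difficulty."""
    return r.long_standing_condition and r.substantial_difficulty


def covered_but_not_captured(r: Respondent) -> bool:
    """True for the groups the quoted guidance lists as having rights under
    the Act but falling outside the core definition."""
    if core_disabled(r):
        return False
    return (
        r.difficulty_without_treatment
        or r.cancer_hiv_or_ms_diagnosis
        or r.progressive_condition
        or r.past_disability
    )
```

For example, `Respondent(True, False, difficulty_without_treatment=True)` is not counted under the core definition, but falls into the first of the groups listed in the guidance.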

The Labour Force Survey (LFS) is another important UK-wide source of statistics on the participation of disabled people in the labour market. Recently, the release of disability statistics from the LFS was suspended temporarily due to apparent discontinuities in the data between Quarter 2 (Apr to June) and Quarter 3 (July to Sept) of 2017. The discontinuity was explored in an article published by Office for National Statistics (ONS), but no conclusive reasons for the change were found and the releases were reinstated with a note to treat with caution comparisons between Quarter 2 2017 and subsequent quarters.

Another important survey comparing the experiences of disabled and non-disabled people in Great Britain, the Life Opportunities Survey, ran from 2009 until it was discontinued in 2014. It provided statistics on work, education, social participation, transport and use of public services.

Together, these developments suggest that further consideration may be needed of the quality, quantity and coverage of UK sources of data on disability.

Among the sources in the audit for which we have information on harmonisation, the harmonised disability standard is not consistently applied. The audit currently contains less information about harmonisation of impairment types. It appears that more surveys collect data on disability than on impairment type, but as there is a lot of missing information on impairments, we cannot currently be sure of this.

Looking at both disability and impairment together (presumably the gold standard for understanding the impacts of disability on people’s lives), there is a wide range of different approaches in practice among those that are collecting data on one or the other, or both:

  • harmonised for disability and impairment (for example, Understanding Society)

  • harmonised for disability but not impairment (for example, GP Patient Survey, LFS)

  • harmonised for disability but impairment not collected (for example, British Social Attitudes, Deprivation of Liberty Safeguards, National Survey for Wales)

  • harmonised for impairment but not disability (for example, What About Youth Survey)

  • collecting data on both disability and impairment but not harmonised for either (for example, Adult Psychiatric Morbidity Survey, Active Lives Survey, Corporate Parenting Returns)

There are also examples of data sources with information about impairment, but not disability (for example, Children and Young People’s Inpatient Day Care Survey, Emergency Department Survey, Maternity Services Survey).

In relation to children, definitions of disability may also relate to special educational needs. This adds further complexity and results in the use of a range of different definitions of disability for children and young people, as highlighted by Parsons and Platt (2013, page 3):

“The very definition of disability is, however, a contested area. Many surveys have employed questions that align with the Equality Act definition, as well as questions about impairments (i.e. medical or functional) and activities and participation (Porter, et al., 2008; Read, 2007). Other studies adopt definitions based on the classifications of SEN used in schools. Yet these questions are not necessarily consistent across studies. This has resulted in different studies using different definitions to estimate prevalence.”

There is insufficient information about harmonisation of disability in data collected through non-survey modes to be able to say confidently whether the definition proposed in the harmonised principle is being applied regularly (though there are some examples of administrative data that are not harmonised). In many cases, the data relate to users of services, such as those accessing mental health or learning disability services or other types of care (for example, corporate parenting returns). In these cases, it is likely that people are determined to have a disability prior to or as part of the process of accessing services.

In some administrative datasets, there may be a possibility of under-reporting of disability or impairments due to the way the data are recorded. For example, the Deprivation of Liberty Safeguards dataset notes highlight:

“…Caution must be exercised when analysing this dataset by Disability code. The dataset requires only one 'Primary Disability' to be recorded so does not reflect those instances where an individual has more than one type of disability. Furthermore, the nature of Deprivation of Liberty Safeguards may result in the mental capacity of a client determining the Primary Disability recorded over any other disabilities, and therefore that non-mental disabilities may be undercounted in any analysis.”

In such cases, it is also unlikely that individuals have self-reported a disability or impairment; it is more likely that it has been recorded by a professional on their behalf.
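The undercounting risk described in the Deprivation of Liberty Safeguards note above can be illustrated with a small sketch. The records and category labels below are invented for illustration and are not drawn from the dataset; the sketch assumes that the first disability listed for each person is the one recorded as the 'Primary Disability'.

```python
from collections import Counter

# Hypothetical records: each inner list holds one person's disabilities,
# with the first entry treated as the recorded 'Primary Disability'.
records = [
    ["mental health", "physical"],
    ["mental health", "sensory"],
    ["physical"],
    ["mental health"],
]

# Counts available when only the primary disability is recorded.
primary_counts = Counter(person[0] for person in records)

# Counts if every disability for every person were recorded.
all_counts = Counter(d for person in records for d in person)

# Categories that appear less often as a primary disability than overall.
undercounted = {
    category: all_counts[category] - primary_counts[category]
    for category in all_counts
    if all_counts[category] > primary_counts[category]
}
```

In this invented example, analysis by primary disability alone would miss one person with a physical disability and one with a sensory disability, because both were recorded under a different primary category.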


9. Sexual orientation

The Equality Act 2010 defines sexual orientation as follows:

"(1) Sexual orientation means a person's sexual orientation towards—
(a) persons of the same sex,
(b) persons of the opposite sex, or
(c) persons of either sex.
(2) In relation to the protected characteristic of sexual orientation—
(a) a reference to a person who has a particular protected characteristic is a reference to a person who is of a particular sexual orientation;
(b) a reference to persons who share a protected characteristic is a reference to persons who are of the same sexual orientation."

The harmonised principles on sexual orientation describe it as “an umbrella concept which is informed by a person’s sexual identity, attraction and behaviour”, which may well not align. The principles include guidance for interviewer-led collection, telephone interviewing and self-completion, though in every case the intention is to capture self-perceived sexual identity. The principles include the following explanation for the collection of sexual identity rather than any of the other aspects of sexual orientation:

“The measurement of Sexual Identity was identified within the research as the component of Sexual Orientation most closely related to experiences of disadvantage and discrimination. The question was not designed for specific or detailed studies of sexual behaviour or attraction where a series of more detailed questions and answer categories might be more appropriate.”

The question has four response options:

  • heterosexual or straight

  • gay or lesbian

  • bisexual

  • other

The audit includes around 40 unique sources that have been classified under the “sexual orientation” category, with coverage across all the domains, in similar proportions to the audit as a whole. Almost two-thirds are surveys with the remainder being administrative data sources covering this category. The surveys include major household surveys like the Annual Population Survey, which are weighted to be representative of the UK population, though most ask these questions of those aged 16 years and over only. The more comprehensive surveys specifically targeting the lesbian, gay and bisexual (LGB) population, including the LGBT survey, are self-selected samples from which it is not possible to weight up to the UK population. Most of the administrative sources cover very specific sub-populations.

Half of the sources are using the harmonised principles, which were designed to capture information on sexual identity, though describing it as sexual orientation. Six sources are not using the harmonised standard and most of these refer to sexual orientation rather than identity, though it is difficult to know if this has any impact on responses.

As with ethnicity and religion, data on sexual orientation and/or identity should be self-reported, though it is not always possible to identify where this is the case. Where guidance is available, it generally stipulates that the response should be provided from the point of view of the person it is about, though there is no way of knowing if this is the case. Given the sensitivity of the topic, it is possible that the person the data are about may not wish to disclose their sexual orientation to the person answering the question on their behalf, which may impact on the quality of the data collected.

Responses to the 2021 Census topic consultation identified the need for a reliable estimate of the size of the LGB population, describing its importance in developing policy, monitoring, resource allocation and service planning. Although the terminology referred to the LGB population, the user need actually encompasses the full range of the non-heterosexual population. However, different views were expressed on whether this should encompass the full range of identity, attraction and behaviour, or follow the harmonised principles in capturing only identity. Some users identified sexual behaviour as the most important element of sexual orientation to measure, at least in relation to health, as they felt this was more relevant to outcomes than either identity or attraction.

ONS is currently considering how best to meet user needs for this topic (see the Sexual identity research and testing plan and December 2017 update).


10. Gender reassignment

The Equality Act 2010 defines gender reassignment as follows:

"(1) A person has the protected characteristic of gender reassignment if the person is proposing to undergo, is undergoing or has undergone a process (or part of a process) for the purpose of reassigning the person's sex by changing physiological or other attributes of sex.
(2) A reference to a transsexual person is a reference to a person who has the protected characteristic of gender reassignment.
(3) In relation to the protected characteristic of gender reassignment—
(a) a reference to a person who has a particular protected characteristic is a reference to a transsexual person;
(b) a reference to persons who share a protected characteristic is a reference to transsexual persons."

The Equality and Human Rights Commission (EHRC) approach to the protected characteristic “gender reassignment” is that trans and non-binary people are protected against discrimination regardless of whether they have had any kind of medical supervision or intervention.

There is currently no harmonised principle for gender reassignment or gender identity.

The audit revealed a general lack of data on gender reassignment with only 12 unique sources of data captured, coming from surveys and administrative sources. Data are available for each country of the UK in at least one source, though the coverage for the different countries across the domains varies, as would be expected with only a small number of unique sources. There was a particular lack of data on living standards, with only three sources, between them covering Northern Ireland and England, providing very limited information in this domain. In particular, the English source was specific to the homeless population.

The lack of data and geographical coverage has serious implications for our ability to draw robust conclusions about the population as a whole. This is further hampered by the fact that the surveys with the broadest coverage, the LGBT survey and the Stonewall survey on LGBT in Great Britain, rely on respondents self-selecting. There is currently no source that provides an estimate of the magnitude of the population.

Although there is a lack of data on gender reassignment, the few sources identified by the audit all tended to include information on at least one of the other protected characteristics, with different sources including age, sex, ethnicity, disability, sexual orientation, religion or belief, and marriage or civil partnership. Only one source included pregnancy and maternity information.

In its overview on gender identity, Office for National Statistics (ONS) recognises it as a subjective and self-defined concept that reflects how people see themselves. This emphasises the importance of self-reporting of this characteristic. For the sources of data included in the audit, most are self-reported, but not all are; police recorded crime is one example. Where gender reassignment is not self-reported, the quality of the data can be compromised.

As there is currently no harmonised principle for the collection of data on gender identity, it is captured differently in all the sources. The EHRC have developed a recommended suite of four questions for capturing gender identity. The first three are required to identify the population with a transgender identity and, within this, the population with the protected characteristic of gender reassignment. None of the data sources in the audit use the EHRC questions. Two use the Equality Challenge Unit (ECU) suggested question; other sources include a range of response options, including:

  • male

  • female

  • refusal

  • prefer not to say

  • don’t know

  • trans man

  • trans woman

  • male to female transgender

  • female to male transgender

  • non-binary

  • other

One data source’s interview guidance allows the answer to the gender question to be inferred by observation for those unable to declare their gender; the question also includes the options “intermediate (unable to be classified as male or female)” or “not known”, though it is unclear whether the question is attempting to capture sex or gender identity.

There is a strong user demand for data on gender identity from a wide range of stakeholders. The 2021 Census topic consultation also highlighted a need for it in order to understand inequality, inform and monitor policy development and allocate services for this population. ONS is currently considering how best to meet user needs for this topic (see the Gender identity research and testing plan and December 2017 update).


11. Marriage and civil partnership

The Equality Act 2010 defines marriage and civil partnership as follows:

"(1) A person has the protected characteristic of marriage and civil partnership if the person is married or is a civil partner.
(2) In relation to the protected characteristic of marriage and civil partnership—
(a) a reference to a person who has a particular protected characteristic is a reference to a person who is married or is a civil partner;
(b) a reference to persons who share a protected characteristic is a reference to persons who are married or are civil partners."

The harmonised principles relating to marriage and civil partnership are contained within the Demographic Information principles. These include either a question with a number of marital status options or guidance on questions to ask to complete the household grid. Both options relate only to those aged 16 years and over, reflecting that anyone can get married or form a civil partnership in the UK if they are aged 16 years and over, though permission is needed from the individual’s parents or guardians if the individual is aged under 18 years in England, Wales and Northern Ireland.

The Civil Partnership Act 2004 granted civil partnerships to same-sex couples in the UK with rights and responsibilities identical to civil marriage. The Marriage (Same Sex Couples) Act 2013 makes provision for the marriage of same-sex couples in England and Wales, either in a civil ceremony (in a register office or approved premises, for example, a hotel) or on religious premises. Civil partners have been able to convert their civil partnership into a marriage since 10 December 2014. More recently it has been announced that couples of opposite sexes can also form civil partnerships.

The audit identified around 50 unique data sources including information on marriage and civil partnerships, 18 of which are administrative data sources. There is coverage across all the domains. Similar to the overall picture, over half were in the health domain, 16 in the participation domain but only six in the justice and personal security domain. The sources covered all of the UK.


12. Pregnancy and maternity

The Equality Act 2010 identifies pregnancy and maternity as one of the protected characteristics and prohibits pregnancy or maternity discrimination. The Equality and Human Rights Commission (EHRC) specifies what is covered under this characteristic in the following way:

“Pregnancy is the condition of being pregnant or expecting a baby. Maternity refers to the period after the birth, and is linked to maternity leave in the employment context. In the non-work context, protection against maternity discrimination is for 26 weeks after giving birth, and this includes treating a woman unfavourably because she is breastfeeding.”

There is no harmonised principle that applies to the statistical measurement of pregnancy and maternity.

Almost 40 of the data sources included in the audit contain information about pregnancy or maternity. These are predominantly in the health domain, but there are also examples of surveys containing data on maternity and pregnancy and covering multiple domain areas such as education, work, living standards and participation, including the Labour Force Survey, the Millennium Cohort Study, the National Survey for Wales and the Taking Part Survey. The Work Life Balance Employer Survey looks specifically at employer responses to flexible working and employee leave, and would seem particularly relevant to understanding issues in relation to aspects of pregnancy and maternity leave from the employer perspective, but the survey was last updated in 2015.

None of the datasets currently included in the audit appear to provide data directly relevant to women’s perceived experiences of discrimination in relation to pregnancy or maternity. The EHRC and the (former) Department for Business, Innovation and Skills (BIS) jointly commissioned a programme of research in 2015 among employers and mothers to address this gap. It updated research previously done by the EHRC in 2005.

Among the sources in the audit containing data on pregnancy and maternity, there is an almost even mix of survey and administrative data sources, including some linked administrative datasets. The administrative data are almost exclusively focused on the health domain, but there is currently one example relating to work and education (the Higher Education Statistics Agency Staff Record). There are sources covering the UK, Great Britain and some country-specific sources.

Much of the current data in the audit focuses on the health of women throughout pregnancy and maternity. Although we have some survey sources focusing on a range of different areas of life and also covering pregnancy and maternity, we did not find any regularly updated sources that would provide evidence of women’s perceived experiences of pregnancy or maternity discrimination.


13. Emerging issues with the existing evidence base

Our work on the audit, together with discussions with stakeholders and users, has identified a number of issues with the existing evidence base. These relate to:

  • transparency and accessibility

  • coverage and comprehensiveness (including gaps)

  • granularity

  • harmonisation and comparability

  • inclusiveness in data collection and reporting

Transparency and accessibility

Better metadata are needed for the data sources that exist to enable researchers to better understand the quality and any other issues with the data.

Specifically, greater clarity is needed over:

  • who has provided the response in data collection exercises, particularly in relation to children and those who may need support in understanding or interpreting the questions

  • the extent to which protected characteristics are self-reported, so researchers are aware of any issues in this respect

  • the use of the terms “sex” and “gender” and better understanding of what is being collected

  • the levels of non-response to the question on religion in different data collections and whether this may be introducing bias in estimates

Coverage and comprehensiveness

More data are needed that encompass the full range of protected characteristics alongside socio-economic groups and those at higher risk of harm, abuse, discrimination or disadvantage to allow for intersectional analysis.

Some of these requirements reflect a need for basic information on specific groups, to understand their demographic characteristics, the size of their populations, their experiences and outcomes across all the domains. These groups are:

  • refugees and asylum seekers

  • people not living in private residential households who are generally excluded from participation in surveys using household-based samples

  • the population who do not fall into the binary male or female categorisation of sex

  • the transgender population

  • the non-heterosexual population that includes those aged under 16 years

Other data needs reflect a requirement for better data in specific areas for the protected characteristics groups to:

  • capture the full range of age groups in sufficient numbers to be able to report on the differing experiences of those of different ages

  • allow for a better understanding of minority ethnic groups’ experiences

  • enable analysis across the full range of religions

  • enable analysis of those who fall outside the definition of “core disability”

  • provide evidence of women’s perceived experiences of discrimination in relation to the protected characteristic of pregnancy and maternity

  • provide information on the effect of pregnancy and maternity on women’s employment trajectories over time

  • cover the experiences of different groups over the lifecourse by using longitudinal data


Granularity

This includes granularity in terms of lower-level geographical data, household-level data, and individual-level data for those living within and outside private households.

More data are needed at lower levels of geography across all the protected characteristics (as well as those at greater risk of disadvantage such as homeless people, and refugees and asylum seekers) to provide policy-makers at a local level with the necessary evidence to inform their decisions.

Data on how resources are shared within households are also needed to establish where individuals within a household may experience different levels of disadvantage in relation to living standards.

Harmonisation and comparability

The audit has shown that harmonised principles are not applied consistently, undermining comparability across sources; this is particularly notable in relation to ethnicity where a wide range of approaches are in use.

Inclusiveness in data collection and reporting

The Human rights-based approach to data stipulates that an inclusive approach should be taken to data collection, reporting and dissemination, but our work has shown that this approach is not being consistently applied:

  • frequently-used approaches to data collection and sampling may systematically exclude some groups at greatest risk of disadvantage such as those not living in private residential households

  • survey administration may inadvertently prevent some groups from self-identification (for example, not asking respondents directly about their sex)

  • online data collection may inhibit the participation of some population groups (for example, older people)

  • although surveys may collect data on protected characteristics, they may not always report on the data collected; this can result in unnecessarily limited sharing of data and evidence on equalities


14. Next steps

As noted previously, the audit that forms the basis of this report provides a snapshot of what we currently know about the data that are available on the protected characteristics. However, there is still work to do on it. We will continue with the work to populate the audit both to include additional sources that it does not currently capture and to ensure that the information contained within it is complete. This will enable us to determine any genuine gaps in the existing evidence base that need to be filled.

Because of the approach we have taken, the audit currently only identifies administrative data sources that are routinely used in the production of statistics. We will work with teams within Office for National Statistics (ONS) and elsewhere who are working on alternative sources of data, to identify the potential of any additional sources to provide estimates of the populations of interest and to capture the relevant quality information for these new sources.

Our longer-term aim has always been to present the audit in a more user-friendly format, ideally in the form of an explorable tool. We will continue to look at options for making this happen.

There are areas where we think the audit could be improved. It currently focuses on large-scale quantitative data sources, predominantly produced by government. However, we recognise that there are other sources with potential to provide valuable evidence on inequalities in UK society. We will consider how best to accommodate some of these sources, including qualitative sources, setting minimum quality standards for data to be included.

Our aim has always been to work collaboratively to achieve our goals. We will be convening our Strategic Advisory Board and our Technical Advisory Group to consider the findings of the audit, in particular, the areas that have been identified as limitations with the existing data. Considering current policy priorities, we will work together to prioritise the work going forward, to set our longer-term objectives and develop our workplans.
