1. Main points

  • This analysis was produced by academics outside of the Office for National Statistics (ONS), meaning the methodology used differs from existing ONS outputs and therefore estimates may differ.
  • The project led by Nazrul Islam (University of Oxford) found that the coronavirus (COVID-19) pandemic has had a disproportionate impact on those in the most deprived areas.
  • The project led by Sarah Rhodes (University of Manchester) found occupational differences in long COVID symptoms, and that occupational differences in prevalence could not be fully explained by differences in vaccine uptake, ethnicity, or viral load.
  • The project led by James Munday (London School of Hygiene and Tropical Medicine) found that further improvement is needed to make more effective forecasts of COVID-19 infections.

2. Overview of the projects

The coronavirus (COVID-19) pandemic has had a profound impact across the UK. In response, the COVID-19 Infection Survey (CIS) measures levels of infection and antibody positivity, and provides additional analyses covering the characteristics of people testing positive, reinfections, and vaccine effectiveness.

The Office for National Statistics (ONS) announced funding awards for three academic projects on 24 December 2021, to use CIS data in innovative ways. The funding period for the projects is now complete and this article summarises methods and results for:

  • Coronavirus (COVID-19) and social inequalities, led by Dr Nazrul Islam, University of Oxford

  • Occupational analyses using the Coronavirus (COVID-19) Infection Survey, led by Sarah Rhodes, University of Manchester

  • Producing forecasts of COVID-19 infection by age-group in England, led by Dr James Munday, London School of Hygiene and Tropical Medicine

These analyses were produced by academics outside of the ONS. This means the methodology used differs from existing ONS outputs and therefore estimates may differ. The full academic teams are listed in Section 10, Collaboration.

3. Project 1: Coronavirus (COVID-19) and social inequalities

Research aims and methods

This project examined the coronavirus (COVID-19) pandemic’s impact on socio-economic inequalities in the UK. It was led by Dr Nazrul Islam, University of Oxford.

The three research objectives were to estimate socio-economic inequalities in:

  • unemployment, job loss, and re-employment
  • the exposure to and outcomes from COVID-19 infection
  • the risk of developing long COVID within and across occupation groups

The study used Office for National Statistics (ONS) Annual Population Survey data between 2012 and 2020 to estimate the impact of the coronavirus pandemic on unemployment by Index of Multiple Deprivation (IMD). The IMD divides areas into deciles, with 1 being the most deprived and 10 the least deprived. The study fitted a linear regression to the 2012 to 2019 data to estimate a counterfactual expected rate of unemployment in 2020, which was then compared with the observed rate.
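
As a rough illustration of this counterfactual approach, the sketch below fits a straight line to annual rates for 2012 to 2019 by ordinary least squares and extrapolates it to 2020. The yearly rates and the observed 2020 value are invented for illustration and are not Annual Population Survey figures; the function names are hypothetical, and the study's actual model specification may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def projected_rate(years, rates, target_year):
    """Extrapolate the fitted pre-pandemic trend to the target year."""
    slope, intercept = fit_line(years, rates)
    return intercept + slope * target_year


# Hypothetical unemployment rates (%) for 2012 to 2019, NOT real survey data.
years = list(range(2012, 2020))
rates = [8.0, 7.5, 6.2, 5.4, 4.9, 4.4, 4.1, 3.8]

expected_2020 = projected_rate(years, rates, 2020)
observed_2020 = 4.6  # hypothetical observed rate for the comparison
excess_pp = observed_2020 - expected_2020  # difference in percentage points
print(f"expected {expected_2020:.2f}%, excess {excess_pp:.2f}pp")
```

The difference between the observed and projected rates is the quantity reported in percentage points in the results; the confidence intervals reported there would need the regression's uncertainty, which this sketch does not compute.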

The study focused on analysis using COVID-19 Infection Survey (CIS) data between 26 April 2020 and 31 January 2022 in participants aged between 16 and 64 years to reflect the working age population.

To analyse the impact of the coronavirus pandemic on job loss, re-employment, exposure to COVID-19 and outcomes from COVID-19 infection, logistic regression modelling was used. Odds ratios (OR) and adjusted proportions were estimated for each outcome, split by deprivation decile to compare the most and least deprived areas. The model adjusted for socio-demographic factors such as age, sex, ethnicity, household size, urban or rural, and job type.

The second research objective focused on the exposure to, and outcomes from, COVID-19 infection. Risk of exposure to COVID-19 was analysed through the ability to work from home or maintain physical distancing at work. The analysis also investigated the infection risk from COVID-19 variants (Delta and Omicron). Outcomes after infection were analysed through self-reported NHS contact and self-reported hospital admission. To identify clusters of long COVID symptoms, association rule mining, a machine learning technique, was used to identify patterns of symptoms within deprivation deciles and occupation groups.
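
To give a sense of what association rule mining measures, the sketch below computes the three standard quantities (support, confidence, and lift) for a rule such as "fatigue implies anxiety" over a handful of symptom reports. The reports are invented for illustration, the function names are hypothetical, and the study's actual algorithm and thresholds are not stated in this article.

```python
def support(reports, items):
    """Share of reports that contain every symptom in `items`."""
    items = set(items)
    return sum(items <= report for report in reports) / len(reports)


def confidence(reports, antecedent, consequent):
    """Share of reports with the antecedent that also have the consequent."""
    both = set(antecedent) | set(consequent)
    return support(reports, both) / support(reports, antecedent)


def lift(reports, antecedent, consequent):
    """Confidence relative to the consequent's overall frequency.

    Values above 1 suggest the symptoms co-occur more often than
    they would if they were independent.
    """
    return confidence(reports, antecedent, consequent) / support(reports, consequent)


# Hypothetical self-reported symptom sets, NOT survey data.
reports = [
    {"fatigue", "anxiety"},
    {"fatigue", "muscle pain"},
    {"fatigue", "anxiety", "difficulty concentrating"},
    {"muscle pain"},
]

print(support(reports, {"fatigue"}))
print(confidence(reports, {"fatigue"}, {"anxiety"}))
print(lift(reports, {"fatigue"}, {"anxiety"}))
```

Running the same calculation separately within each deprivation decile or occupation group would allow symptom patterns to be compared between groups.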

Results

Summary

Nazrul Islam’s results show that the coronavirus pandemic has had a disproportionate impact on those in the most deprived areas in relation to employment outcomes, the risk of exposure, the outcomes of infection, and increased risk of long COVID. This analysis used a different methodology to existing ONS outputs and results cannot be compared.

Employment

The projected unemployment rate in 2020 using ONS Annual Population Survey data was 2.8% (95% confidence interval (CI): 2.0 to 3.6). The observed unemployment rate was 4.6% (95% CI: 4.3 to 4.9), a difference of 1.8 percentage points (pp) (95% CI: 1.0 to 2.7). Figure 1 shows that the difference in unemployment was highest in the most deprived decile (3.6pp, 95% CI: 1.5 to 5.6) compared with the least deprived decile (1.1pp, 95% CI: 0.1 to 2.1).

Using CIS data, Nazrul Islam’s analysis found that 21.9% (95% CI: 20.9 to 22.8) of participants in the most deprived areas either lost their job or were furloughed at any point between 26 April 2020 and 31 January 2022 compared with 14.3% (95% CI: 13.9 to 14.7) in the least deprived areas. Similarly, in the least deprived decile, the proportion of re-employment was 64.6% (95% CI: 63.5 to 65.6), while in the most deprived decile it was 51.5% (95% CI: 50.1 to 52.9). The difference between the most and least deprived was strongest in the hospitality and social care sectors.

Exposure to and outcomes from COVID-19 infection

The analysis found that the coronavirus pandemic had a greater impact on individuals in the most deprived areas in exposure to and outcomes from COVID-19 infection. Focusing on the risk of infection, 80.6% (95% CI: 79.9 to 81.3) of participants in the most deprived areas reported going into a workplace during the coronavirus pandemic rather than working from home. This was compared with 62.6% (95% CI: 62.1 to 63.1) of participants in the least deprived areas. The proportion of participants reporting difficulty maintaining physical distancing at work was 64.3% (95% CI: 63.4 to 65.2) in the most deprived decile compared with 55.7% (95% CI: 55.1 to 56.2) in the least deprived. Participants in healthcare and teaching sectors were also more likely to report difficulty maintaining physical distancing compared with other professions.

On infection risk, the analysis found that individuals in the most deprived decile were more likely to test positive for the Delta or Omicron variants compared with individuals in the least deprived decile. For Delta, the difference was 1.9pp (most deprived: 7.3%, 95% CI: 6.9 to 7.7; least deprived: 5.4%, 95% CI: 5.2 to 5.6). For Omicron, the difference was 1.8pp (most deprived: 6.7%, 95% CI: 6.2 to 7.1; least deprived: 4.9%, 95% CI: 4.6 to 5.1).

Finally, on infection outcomes, the self-reported proportion of people contacting the NHS and hospital admissions was also higher in the most deprived decile compared with the least. For contacting the NHS, the difference was 5.4pp (46.6%, 95% CI: 45.4 to 47.8 against 41.2%, 95% CI: 40.4 to 42.0). The difference in self-reported hospital admission was 2.8pp (6.7%, 95% CI: 5.8 to 7.6 against 3.9%, 95% CI: 3.4 to 4.4), as shown in Figure 2.

Ongoing symptoms following COVID-19 (self-reported long COVID)

This analysis included participants with a positive COVID-19 test who reported that symptoms persisted for at least four weeks. The analysis found that 11.1% (95% CI: 10.5 to 11.8) of participants in the most deprived decile reported any long COVID symptom compared with 8.1% (95% CI: 7.8 to 8.4) in the least deprived. The results also show that long COVID symptoms varied across deprivation deciles. Weakness or tiredness was the most common symptom, affecting 67% of individuals in the most deprived areas and 57% in the least. The largest difference in prevalence was in anxiety or worry, which was reported by 40% of individuals with self-reported long COVID in the most deprived areas and by 25% of individuals in the least deprived areas. This is shown in Figure 3.

4. Project 2: Occupational analyses using the Coronavirus (COVID-19) Infection Survey

Research aims and methods

This project focused on the interaction between coronavirus (COVID-19) and occupation. It was led by Sarah Rhodes, University of Manchester. The three research aims were to:

  • understand the drivers of occupational differences in rates of COVID-19 infection by examining viral load and vaccination
  • examine how occupational differences vary by ethnic group and region
  • understand whether the prevalence and severity of long COVID symptoms differed by occupational group

The analysis used COVID-19 Infection Survey (CIS) data from 26 April 2020 to 31 January 2022 for participants aged between 20 and 64 years. The analysis focused on a new occupational grouping scheme derived from four-digit standard occupational codes (SOC). The Office for National Statistics (ONS) classifies occupational data by SOCs to group occupations according to the level and specialisation of skill. This was triangulated with other occupational groupings used in similar studies.

The analysis focused on three time periods, based on the dominant COVID-19 variant:

  • Alpha-dominant period, 1 April 2020 to 31 May 2021
  • Delta-dominant period, 1 June to 31 October 2021
  • Omicron-dominant period, 1 November 2021 to 31 January 2022
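
A minimal sketch of how a test date could be assigned to one of these periods is shown below. The period boundaries are taken from the list above; treating both the start and end dates as inclusive is an assumption, and the names `VARIANT_PERIODS` and `variant_period` are hypothetical.

```python
from datetime import date

# Period boundaries as listed in the article; inclusive ends are assumed.
VARIANT_PERIODS = [
    ("Alpha", date(2020, 4, 1), date(2021, 5, 31)),
    ("Delta", date(2021, 6, 1), date(2021, 10, 31)),
    ("Omicron", date(2021, 11, 1), date(2022, 1, 31)),
]


def variant_period(test_date):
    """Return the dominant-variant period for a date, or None if outside all periods."""
    for name, start, end in VARIANT_PERIODS:
        if start <= test_date <= end:
            return name
    return None


print(variant_period(date(2021, 7, 15)))
```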

The analyses used regression methods. The first used quantile regression to compare viral load across occupational groups using cycle threshold (Ct) values. The second analysis used a series of time-varying Cox regression models to explore whether occupation explains differences in positivity rates by ethnic group and region. The third analysis used logistic regression to compare the probability of self-reported long COVID across worker sector groups. All regression models adjusted for socio-demographic variables such as age and sex.

Results

Summary

For the first research aim, Sarah Rhodes’ results show that the differences in viral load were small across occupations, while differences by vaccination status did not explain occupational differences in infection risk. The second analysis found that occupation did not explain differences in infection risk by ethnicity or region. For the third research aim, results show that self-reported long COVID prevalence varied by occupation. This analysis used a different methodology to existing ONS outputs and results cannot be compared.

Viral load by occupation and sector

The analysis found that between 26 April 2020 and 31 January 2022 variation in Ct values across occupations was small. Evidence suggested that viral load was higher for participants in the teaching and education sectors, while transport workers also had higher viral loads when compared with IT or non-essential workers. However, the differences were small, and these findings do not imply causality.

Within the Alpha period, workers in education, patient-facing healthcare, personal care, police and protective services, and public-facing transport workers had higher viral loads compared with other office-based workers. Again, the differences were small and do not imply causality. In the Delta and Omicron periods there was little variation between occupation groups.

Vaccination as a mediator in the relationship between occupation and COVID-19 infection

While 94.14% of participants had received two vaccinations by 31 January 2022, Figure 4 shows there was substantial variation by occupation group in the rate of individuals who were not double-vaccinated. The highest rate of non-vaccination was in food processing, where 9.04% of workers had not received two doses. Other professions with higher rates of non-vaccination included personal care (8.71%), hospitality (8.55%), and manual (8.38%).

When vaccination was included in the regression model, variation in COVID-19 infection risk between occupational groups remained. This means that the differences in infection risk between occupations are not explained by different vaccination rates. The main exception to this was in manual workers where low rates of vaccination appeared to partially explain elevated relative risks.

COVID-19 infection by ethnic group and occupation

Initial results show that there was variation in infection risk by ethnicity. However, the results showed that differences in occupational exposure do not account for differences in infection risk by ethnicity overall. Interactions between occupation and ethnic group suggested that the relative risk of infection varied by occupation across ethnic groups. However, results were imprecise, making it difficult to describe the exact nature of the variation.

COVID-19 infection by region and occupation

Similar to the findings on ethnicity, the results indicated that despite differences in infection risk between regions, occupation did not account for a substantial portion of the difference. For instance, risk of infection in the South was estimated to be 18% lower than in the North both when adjusting only for socio-demographic variables (hazard ratio: 0.82, 95% confidence interval (CI): 0.80 to 0.84) and when also including occupation (hazard ratio: 0.82, 95% CI: 0.80 to 0.84). Interactions between occupation and region were imprecise.

Ongoing symptoms following COVID-19 (self-reported long COVID) by occupation

This analysis focused on participants with a positive COVID-19 test who reported that symptoms persisted for at least four weeks. The results indicated that rates of self-reported long COVID varied by occupation, even after adjusting for other factors such as age and sex. Overall, the probability of reporting at least one long COVID symptom was 15%. Prevalence was highest in the police and protective services (25%), education (22%), and social care sector (22%).

Occupational differences in risk of self-reported long COVID have reduced over time, especially in health and social care professions compared with low-risk groups. Patient-facing healthcare professionals were at 60% greater risk compared with low-risk groups in the Alpha period. This reduced to 17% greater risk by the Omicron period.

A small proportion of workers in all groups reported long COVID symptoms that affected their life a lot, with the highest prevalence among workers in the education, social care, police and protective services, personal care, hospitality, and transport sectors.

5. Project 3: Producing forecasts of coronavirus (COVID-19) infection by age group in England

Research aims and methods

The aim of this research project was to understand how coronavirus (COVID-19) surveillance data and social contact data can forecast COVID-19 infection by age group. It was led by Dr James Munday, London School of Hygiene and Tropical Medicine.

Forecasting infections is important as it helps to support public health responses by informing preparation and mitigation strategies.

The project comprised three main stages:

  • preparing the infection, antibody, and social contact data
  • developing a framework for creating forecasts
  • evaluating the forecasts

This study used innovative methods of semi-mechanistic forecasting, which combines the strengths of statistical approaches to forecasting with plausible infection dynamics and enables forecasts to be produced for separate age groups. The project used social contact data from the CoMix survey, which asks participants about their social contact and compliance with social distancing measures. The analysis also used COVID-19 Infection Survey (CIS) data on positivity and antibody levels, and public NHS vaccination data.

These were included in four forecasting models, which forecast five days (approximately one generation of infection) into the future. The four models were based on the level of social contact data included and are outlined in this section. Cases were projected at weekly intervals between August 2020 and December 2021 in England.

The models were evaluated on how well they predicted infections at 61 historical dates, using the interval score, the absolute error of the mean (AEM), and the forecast bias. For each of these, a lower score means a forecast is performing better; detail is provided in Section 7, Glossary. The performance of the four models was also compared with two basic models (one generation prior and linear extrapolation), each with simpler assumptions about future infections. The six models are:

  1. full contact matrix model - uses the full CoMix contact matrix information, mean and standard deviation for the expected contact rate between age groups
  2. mean contacts by age group - uses the mean and standard deviation of the overall contact rate of each age-group
  3. mean contacts overall - uses the mean and standard deviation of contact rate of the total population
  4. no contact data - does not use contact data, fits the interaction based on CIS age-specific infection incidence over time
  5. one generation prior model - estimates infections based on incidence data five days prior to the projected data
  6. linear extrapolation - estimates infections based on an extrapolation from the previous two generations (10 and 5 days prior)
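
The two baseline models (5 and 6) can be sketched in a few lines. The version below assumes a daily incidence series indexed by day and a five-day generation interval, and reads "linear extrapolation" as continuing the straight line through the values 10 and 5 days before the target date. That reading, the function names, and the example series are assumptions for illustration rather than the project's code.

```python
def one_generation_prior(series, target_day, generation=5):
    """Baseline 5: carry forward the value from one generation earlier."""
    return series[target_day - generation]


def linear_extrapolation(series, target_day, generation=5):
    """Baseline 6: extend the line through the two previous generations."""
    recent = series[target_day - generation]
    earlier = series[target_day - 2 * generation]
    # The change over the last generation is assumed to repeat once more.
    return recent + (recent - earlier)


# Hypothetical daily incidence that rises by 10 per day, NOT survey data.
incidence = [10 * day for day in range(30)]

print(one_generation_prior(incidence, 20))
print(linear_extrapolation(incidence, 20))
```

On a steadily rising series the one generation prior baseline lags behind, while linear extrapolation follows the trend; both can be expected to do worse when the trend changes direction within a generation.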

Results

Summary

The six methods of forecasting were tested and evaluated. Preliminary results found that the full contact model showed improved relative performance during some periods of the coronavirus pandemic and performed best for young children and older adults. However, the baseline models performed best overall. Further improvement is needed to make more effective forecasts of COVID-19 infections using semi-mechanistic forecasting.

Forecasts

For all four models, the inferred infectiousness by age did not vary overall. However, infectiousness did vary by time, mostly increasing over the course of the coronavirus pandemic. This may reflect the growth of infection and rise of more infectious variants over time.

For the full-contact-data model there were substantial differences in age-specific susceptibility. However, this distribution varied by forecast date. In September to November 2021 there was notably lower susceptibility in 2- to 15-year-olds compared with the rest of the population, with peak susceptibility in young adults. Later in the coronavirus pandemic, the difference between age groups reduced and the model inferred lower susceptibility in older adults when compared with other age groups.

Evaluation

The overall scores for each model are presented in Table 1. A score closer to zero means the model is forecasting infections closer to the observed outcome. The forecast with the best interval score and absolute error was the linear extrapolation baseline, scoring 887.03 and 1219.11 in each metric respectively. This was substantially lower than the full contact model (1213.88 and 1464.02). This means the linear extrapolation method produced forecasted estimates closest to the observed infections when evaluated over the entire study period. The full contact data model was however the best performing of the four models that used contact data.

The relative performance of all the models varied over the course of the coronavirus pandemic. Between November 2020 and January 2021, the full contact data model performed comparably well to the linear extrapolation. The contact models performed particularly poorly during the summer of 2021. This period coincided with the emergence of the Delta variant in the UK. This means that the mean rate of infection on contact was changing rapidly. The model, which fits to 30 days of data, struggled with this change.

The performance of the forecasts also varied by age. The full contact model performed particularly poorly in the age groups between 11 and 35 years. One potential reason for this is that these groups had the highest contact rates, particularly among peers. This may indicate that within-group mixing accounts for a large component of transmission dynamics in these age groups. However, the model performed better in young children and older adults, achieving the best score in age groups 2 to 10 years, 50 to 69 years and 70 years and over. This apparent reliance on contact data may indicate that many infections in these age groups result from transmission from other groups.

6. Coronavirus (COVID-19) Infection Survey data

Coronavirus (COVID-19) Infection Survey: England
Dataset | Released 24 June 2022
Findings from the Coronavirus (COVID-19) Infection Survey for England.

Coronavirus (COVID-19) Infection Survey: Northern Ireland
Dataset | Released 24 June 2022
Findings from the Coronavirus (COVID-19) Infection Survey for Northern Ireland.

Coronavirus (COVID-19) Infection Survey: Scotland
Dataset | Released 24 June 2022
Findings from the Coronavirus (COVID-19) Infection Survey for Scotland.

Coronavirus (COVID-19) Infection Survey: Wales
Dataset | Released 24 June 2022
Findings from the Coronavirus (COVID-19) Infection Survey for Wales.

Coronavirus (COVID-19) Infection Survey: technical data
Dataset | Released 24 June 2022
Technical and methodological data from the Coronavirus (COVID-19) Infection Survey, England, Wales, Northern Ireland and Scotland.

7. Glossary

Cycle threshold (Ct) values

The strength of a positive coronavirus (COVID-19) test is determined by how quickly the virus is detected, measured by a cycle threshold (Ct) value. The lower the Ct value, the higher the viral load and stronger the positive test. Positive results with a high Ct value can be seen in the early stages of infection when virus levels are rising, or late in the infection, when the risk of transmission is low.

Odds ratio

An odds ratio (OR) compares the odds of an outcome in one population with the odds of the same outcome in a different population. An OR greater than one indicates the outcome is more likely in the first population, while an OR less than one indicates it is less likely.
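
As a small worked example, the sketch below converts two proportions into odds and takes their ratio. The proportions are invented for illustration and the function names are hypothetical; the ORs in this article come from adjusted regression models and cannot be reproduced from published proportions in this way.

```python
def odds(probability):
    """Convert a probability (between 0 and 1) into odds."""
    return probability / (1 - probability)


def odds_ratio(p_group, p_reference):
    """Odds of the outcome in one group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Hypothetical proportions: 20% in one group against 10% in the reference group.
print(odds_ratio(0.20, 0.10))
```

Here the ratio of the two proportions is 2, while the ratio of the odds is larger, which is why an odds ratio should not be read directly as "times as likely" unless the outcome is rare.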

Deprivation

Deprivation is based on an index of multiple deprivation (IMD) score or equivalent scoring method for the devolved administrations, from 1, which represents most deprived, up to 100, which represents least deprived. The odds ratio shows how a 10-unit increase in deprivation score, which is equivalent to 10 percentiles or 1 decile, affects the likelihood of testing positive for COVID-19.

Hazard ratio

A hazard ratio is a measure of how often a particular event happens in one group compared with how often it happens in another group, over time. When a characteristic (for example, being male) has a hazard ratio of one, this means that there is neither an increase nor a decrease in the risk of re-infection compared with a reference category (for example, being female).

Semi-mechanistic forecasting

Semi-mechanistic methods are a hybrid of statistical and mechanistic models of forecasting. They use time-series dynamics and data of infectious disease dynamics to estimate a small number of epidemiological parameters under a framework which is consistent with scientific understanding of the dynamics of the system. These are used to create short term forecasting models.

Interval Score

The Interval Score is a Proper Scoring Rule to score quantile predictions, following Strictly proper scoring rules, prediction, and estimation, Gneiting and Raftery (2007). Smaller values are better.
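
For a single central prediction interval, the interval score in Gneiting and Raftery (2007) is the width of the interval plus a penalty, scaled by 2/alpha, for how far the observation falls outside it. The sketch below implements that single-interval form; how the project combined scores across several intervals, age groups, and forecast dates is not described here, so any aggregation is left out. The function name is hypothetical.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    Smaller is better: narrow intervals are rewarded, and observations
    outside the interval are penalised in proportion to the miss.
    """
    width = upper - lower
    penalty_below = (2 / alpha) * max(lower - observed, 0)
    penalty_above = (2 / alpha) * max(observed - upper, 0)
    return width + penalty_below + penalty_above


# A 90% interval (alpha = 0.1) of 10 to 20, with hypothetical observations.
print(interval_score(10, 20, 15, alpha=0.1))  # observation inside the interval
print(interval_score(10, 20, 25, alpha=0.1))  # observation above the interval
```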

Absolute Error of the Mean (AEM)

The Absolute Error of the Mean is calculated as the average error between the mean of the forecast and the true value over the n forecasts made (number of forecast dates multiplied by five days). Lower values are better.
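
Read literally, this definition takes the mean of each forecast distribution, measures its absolute distance from the observed value, and averages those distances over all forecasts. A minimal sketch under that reading follows; the sample values are invented and the function name is hypothetical.

```python
def absolute_error_of_mean(forecast_samples, observed):
    """Average absolute gap between each forecast's mean and the observed value.

    `forecast_samples` holds one list of predictive samples per forecast,
    and `observed` holds the matching true values.
    """
    errors = [
        abs(sum(samples) / len(samples) - truth)
        for samples, truth in zip(forecast_samples, observed)
    ]
    return sum(errors) / len(errors)


# Two hypothetical forecasts, each with two predictive samples.
print(absolute_error_of_mean([[1, 3], [10, 10]], [2, 7]))
```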

Bias score

Bias is calculated from predictive Monte-Carlo samples, automatically recognising whether forecasts are continuous or integer valued.

Confidence interval

A confidence interval gives an indication of the degree of uncertainty of an estimate, showing the precision of a sample estimate. The 95% confidence intervals are calculated so that if we repeated the study many times, 95% of the time the true unknown value would lie between the lower and upper confidence limits. A wider interval indicates more uncertainty in the estimate. Overlapping confidence intervals indicate that there may not be a true difference between two estimates.

For more information, see our methodology page on statistical uncertainty.

Long COVID

The estimates presented in this analysis relate to self-reported long COVID, as experienced by individuals at any time, rather than clinically diagnosed ongoing symptomatic coronavirus (COVID-19) or post-COVID-19 syndrome. There is no universally agreed definition of long COVID, but it covers a broad range of symptoms such as fatigue, muscle pain and difficulty concentrating. The list of long COVID symptoms within the COVID-19 Infection Survey (CIS) can be found on our CIS questionnaires.

8. Data sources and quality

Our Coronavirus Infection Survey (CIS) methodology article provides further information around the survey design and how we process data.

More information on the strengths and limitations of the data, data uses and users is available in our Coronavirus (COVID-19) Infection Survey QMI and our Coronavirus (COVID-19) Infection Survey statistical bulletin.

More information on the Annual Population Survey is available in the Annual population survey (APS) QMI.

Further information on the CoMix study can be found in CoMix study - Social contact survey in the UK.

9. Future developments

This project has highlighted the new ways in which the Office for National Statistics (ONS) is engaging with external experts and stakeholders. The funded part of this work has now been completed. The academics are completing their manuscripts for publication in journals and plan to present at conferences in the next year.

10. Collaboration

The Coronavirus (COVID-19) Infection Survey analysis was produced by the Office for National Statistics (ONS) in collaboration with our research partners at the University of Oxford, the University of Manchester, UK Health Security Agency (UK HSA) and Wellcome Trust.

This article presents the methods and results of the three short-term, collaborative academic CIS projects funded by ONS, announced on 24 December 2021. These were led by three research teams:

COVID-19 and social inequalities

Project lead:

  • Dr Nazrul Islam – University of Oxford

Project team:

  • University of Oxford: Prof. Eva Morris, Prof. Sarah Lewington, Dr. Ben Lacey
  • University of Leicester: Prof. Kamlesh Khunti, Dr. Francesco Zaccardi, Dr. Clare Gillies, Dr. Sharmin Shabnam, Dr. Cameron Razieh, Dr. Yogini Chudasama, Dr. Manish Pareek
  • University of Southampton: Dr. Hajira Dambha-Miller
  • ONS: Daniel Ayoubkhani, Dr. Vahe Nafilyan
  • University College London: Prof. Amitava Banerjee
  • Harvard University: Prof. Ichiro Kawachi
  • University of Cambridge: Prof. Martin White

Occupational analyses using the ONS Coronavirus (COVID-19) Infection Survey

Project lead:

  • Sarah Rhodes – University of Manchester

Project team:

  • University of Manchester: Dr. Jack Wilkinson, Dr. Matthew Gittins, Prof. Martie van Tongeren
  • University of Glasgow: Dr. Evangelia Demou, Dr. Theocharis Kromydas, Prof. Srinivasa Vittal Katikireddi
  • London School of Hygiene and Tropical Medicine: Prof. Neil Pearce
  • University of Lancaster: Dr. Rhiannon Edge
  • ONS: Dr. Vahe Nafilyan

Producing forecasts of COVID-19 infection by age-group in England

Project lead:

  • James Munday – London School of Hygiene and Tropical Medicine

Project team:

  • London School of Hygiene and Tropical Medicine: Prof. Sebastian Funk

Contact details for this Article

Simeon North, in collaboration with Nazrul Islam, Sarah Rhodes, James Munday
infection.survey.analysis@ons.gov.uk
Telephone: +44 1633 560499