1. COVID-19 Schools Infection Survey

The COVID-19 Schools Infection Survey is jointly led by the Office for National Statistics (ONS), the London School of Hygiene and Tropical Medicine (LSHTM) and Public Health England (PHE).

The COVID-19 Schools Infection Survey aims to investigate the prevalence of current coronavirus (COVID-19) infection and presence of antibodies to COVID-19 among pupils and staff in sampled primary and secondary schools in England.

Repeated surveys are being carried out to collect risk factor information together with virus and antibody samples in a cohort of pupils and staff. Antibody conversion and viral prevalence at points in the academic year are important outcome measures.

This methodology guide is intended to provide information on the methods used to collect the data, process it, and calculate the statistics produced from the COVID-19 Schools Infection Survey. We will continue to expand and develop methods as the study progresses.

This methodology guide can be read alongside the COVID-19 Schools Infection Survey statistical bulletins.


2. Study design: the sample

The COVID-19 Schools Infection Survey (SIS) has a stratified, multi-stage sample design. Strata are formed by a cross-classification of prevalence (each local authority being classified as either “high” or “low” prevalence according to coronavirus (COVID-19) rates in the week 2 to 8 September 2020) and school type (primary or secondary). The first stage of sampling was the selection of local authorities, and it is from each selected local authority that samples of primary and secondary schools have been drawn.

The following schools were excluded from the sampling frame:

  • special schools, independent schools, pupil referral units and further education colleges
  • schools taking part in other school-based COVID-19 studies

Sampling of upper-tier local authority areas in England

The study oversampled schools in “high prevalence” areas of the country. High prevalence upper-tier local authorities (N=30) were defined as those in the top 20% when ranked by the rate of confirmed cases of COVID-19 infection per 100,000 population from Pillar 2 testing in the week 2 to 8 September 2020; low prevalence upper-tier local authorities (N=119) were the remaining 80% on the same ranking.

Ten upper-tier local authorities were randomly sampled from the “high prevalence” group and five from the “low prevalence” group.

High prevalence

Bradford, Gateshead, Knowsley, Lancashire, Leicester, Liverpool, Manchester, Salford, Sunderland, and Warrington.

Low prevalence

Barking and Dagenham, Bournemouth, Christchurch and Poole, Norfolk, Reading, and Redcar and Cleveland.
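
As a minimal sketch of this first sampling stage (illustrative only: the function name, data and random seed are hypothetical, and the real selection used the Pillar 2 rates described above):

```python
import random

def sample_local_authorities(rates_per_100k, n_high=10, n_low=5, seed=1):
    """Stage 1: stratify upper-tier local authorities by prevalence and
    draw a simple random sample from each stratum.

    rates_per_100k: dict mapping local authority name to its confirmed
    COVID-19 case rate per 100,000 population (week 2 to 8 September 2020).
    """
    ranked = sorted(rates_per_100k, key=rates_per_100k.get, reverse=True)
    cutoff = round(0.2 * len(ranked))        # top 20% = "high prevalence"
    high, low = ranked[:cutoff], ranked[cutoff:]
    rng = random.Random(seed)
    return rng.sample(high, n_high), rng.sample(low, n_low)

# Illustrative usage with made-up rates for 149 upper-tier local authorities
# (giving the 30/119 high/low split described above):
rates = {f"LA_{i:03d}": 150 - i for i in range(149)}
high_sample, low_sample = sample_local_authorities(rates)
```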

Sampling of schools

The aim of the study was to recruit 100 secondary schools and 50 primary schools across the 15 selected upper-tier local authorities, with approximately 70% of schools (70 secondary and 35 primary) in high prevalence areas and 30% (30 secondary and 15 primary) in low prevalence areas.

Within each of the four strata (high prevalence primary, high prevalence secondary, low prevalence primary, low prevalence secondary), a further stratification by upper-tier local authority (among those selected in the first stage of sampling) was applied. The aim was to achieve an equal number of schools in each upper-tier local authority within each prevalence-by-type stratum.
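
A rough sketch of this equal allocation (a hypothetical helper; the real procedure also had to respect the practical constraints described below):

```python
def allocate_schools(stratum_target, local_authorities):
    """Spread a stratum's school target as evenly as possible across the
    selected upper-tier local authorities; any remainder is spread one
    school at a time over the first few authorities."""
    base, extra = divmod(stratum_target, len(local_authorities))
    return {la: base + (1 if i < extra else 0)
            for i, la in enumerate(local_authorities)}

# e.g. the 70 high prevalence secondary schools across 10 local authorities:
print(allocate_schools(70, [f"LA_{i}" for i in range(10)]))  # 7 schools each
```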

To compensate for non-response and refusal, we selected 250 schools to approach.

Practical modifications

Several modifications were made to the sample selection procedure because of practical constraints on the selection of schools.

The number of academy trusts (which manage some schools, the rest being managed by the local authorities themselves) that could be selected in each upper-tier local authority was limited to a maximum of four across primary and secondary school types combined. This was because of limited capacity to approach academy trusts for enrolment. Thus, once the cap on academy trusts was reached, no schools from other academy trusts could be included in the sample. This amendment has not been explicitly factored into the calculation of inclusion probabilities or design weights, but the later calibration applied should help mitigate the effects of this.

Changes were also made with respect to one of the selected local authorities. We stopped recruiting schools run by that local authority as it decided to withdraw its managed schools from the study. An attempt to mitigate this was made by sampling more schools run by academy trusts within the upper-tier local authority.

Lower than anticipated response rates led us to expand the school samples. Additional schools were selected in November 2020, and a further sample within particular upper-tier local authorities was drawn in February 2021. We have attempted to ensure there are at least three responding schools of each type (primary and secondary) in each upper-tier local authority, to increase the reliability of individual local authority comparisons at the data publication stage. This criterion was not met in five upper-tier local authorities (Norfolk, Lancashire, Bournemouth, Leicester and Sunderland) in the achieved sample for Rounds 1 and 2.

Sampling of Individuals: staff and pupils

Within the selected schools, primary and secondary, all staff were eligible and invited to participate in the study. Within primary schools, all pupils were eligible to participate; however, because of the larger number of pupils in secondary schools, eligibility was restricted to two consecutive year groups in each secondary school. Year groups in secondary schools were chosen at random and in equal proportions across the schools and local authorities. However, low response in Rounds 1 and 2 led to a decision to widen eligibility to pupils in all year groups (except Year 11) within secondary schools from Round 4 onwards.

Pupils from Year 11 are not eligible for enrolment. It was deemed that this study would be too disruptive for these pupils during their final year of secondary school.


3. Study design: data we collect

In each school that agreed to participate, head teachers were asked to register and complete a short questionnaire. They were also provided with information about the survey to forward to staff, parents of pupils aged under 16 years, and pupils aged 16 years or over. After completing a consent form, participants (or their parent if the pupil was under 16 years) were asked to complete a short online “enrolment” questionnaire. A questionnaire collecting more detailed information is delivered to participants following each round of current coronavirus (COVID-19) infection and SARS-CoV-2 (COVID-19) antibody tests.

A study team visited each school to collect the biological samples for testing from the staff and pupils who had enrolled in the study. Tests for pupils involved a nose swab for current coronavirus (COVID-19) infection and an oral fluid (saliva) sample for antibodies against SARS-CoV-2 (COVID-19). Tests for staff involved a nose swab for current COVID-19 infection and a finger-prick blood test for antibodies against the virus. Everyone enrolled was offered testing regardless of whether they were experiencing COVID-19 symptoms, although people experiencing COVID-19-like symptoms should not be attending school.

For each subsequent round of testing, participants receive advance notification of the date of the sample collection day, with a short follow-up questionnaire.


4. Timing

The first round of testing took place between 3 and 19 November 2020. The second round of testing took place between 2 and 10 December 2020. Only those who had enrolled in the study and were in school on the day of testing were tested. This means those with coronavirus (COVID-19) symptoms and those instructed to self-isolate would not be present in the school building to be tested on the assigned test day.

The closure of schools during the lockdown from 5 January 2021 meant that the third round of the Schools Infection Survey was cancelled. However, anyone who had enrolled in the study but had not had an antibody test was offered a home testing kit.

In future testing rounds, those absent from school on the day of testing will receive a testing kit at home so they can take part in that round of the study.


5. Participation rates

Recruitment of schools to the study began on 12 October 2020. At the time of the December 2020 testing period, the sample included 41 primary schools, 78 secondary schools and two all-through schools (which were included with secondary schools for the purpose of the sample selection) across the 15 sampled local authorities. In Round 1 of testing, 48,100 participants (9,900 staff and 38,200 pupils) were estimated to be eligible to take part in at least one coronavirus (COVID-19) current infection or SARS-CoV-2 (COVID-19) antibody test. In Round 2 of testing, 57,400 participants (12,200 staff and 45,200 pupils) were estimated to be eligible to take part in testing (Table 1).

In Round 1 of testing, 9,732 (4,337 staff and 5,395 pupils) participated in at least one current COVID-19 infection or COVID-19 antibody test. In Round 2 of testing, 12,203 (5,114 staff and 7,089 pupils) participated in at least one test. Across the two rounds of testing, 14,185 (6,129 staff and 8,056 pupils) participated in at least one COVID-19 current infection or antibody test.


6. Weighting

Weighting is applied to the data collected from responding schools, pupils and staff to make the data representative of the wider target population of the local authority from which they have been drawn. The weighting takes account of the design of the sample and reflects the response patterns observed, as well as the total numbers of staff and pupils in the selected schools and local authorities.

Accounting for response patterns in the weighting is particularly important. Response rates to the study may differ between various subgroups of the eligible population, and if this response propensity is correlated with the study outcomes, positivity rates and other estimates will be biased if they are computed from the unweighted, observed data. It is important to note that weighting can only be carried out to adjust for observed biases in response rates. There may be other unobserved biases that have an impact on an individual’s likelihood of taking part, which cannot be controlled for by the weights calculated.

Generally, our aim in weighting the Schools Infection Survey data has been to achieve representativeness at the individual local authority level. More specifically, separate sets of weights have been computed for each combination of local authority, participant group (pupils, staff), school type (primary or secondary), type of test (antibody or current infection) and target population (the local authority level enrolled pupil population for pupils, and the local authority level school staff population for staff).

For each of these sets, weighting has been performed as a single calibration1 step with uniform input weights, which reflect the sample design within each local authority. After calibration, the weights within each subgroup of the sample sum to a prespecified population total. The following calibration groups were used:

  • sex (male, female)
  • ethnicity (white British, not white British)
  • age group for staff (under 25 years, 25 to 29 years, 30 to 39 years, 40 to 49 years, 50 to 59 years, 60 years and over)
  • year group for pupils (Reception and Years 1 and 2, Years 3 and 4, Years 5 and 6, Years 7 and 8, Years 9 and 10, Years 12 and 13)

Because of the lack of data for the current academic year, totals for the local authority level pupil and staff populations were computed from the 2019 to 2020 school census tables published by the Department for Education (DfE) and applied without further correction. In some cases, calibration groups had to be combined or omitted because of low counts in certain categories.
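
The production weights are computed in the single calibration step described above; as a rough, hypothetical sketch of what calibrating to group totals involves, the raking (iterative proportional fitting) routine below adjusts uniform input weights until the weighted sample matches each set of population margins. All names and data are illustrative, not the ONS implementation:

```python
import pandas as pd

def rake(sample, margins, max_iter=50, tol=1e-8):
    """Iterative proportional fitting: adjust uniform input weights until
    the weighted sample matches each set of population margins.

    sample: DataFrame with one column per calibration variable.
    margins: dict mapping a variable name to {category: population total}.
    """
    w = pd.Series(1.0, index=sample.index)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, totals in margins.items():
            current = w.groupby(sample[var]).sum()       # weighted counts
            factors = pd.Series(totals) / current        # per-category ratio
            w = w * sample[var].map(factors)             # rescale weights
            max_shift = max(max_shift, (factors - 1).abs().max())
        if max_shift < tol:                              # margins matched
            break
    return w

# Illustrative usage with made-up respondents and population totals:
resp = pd.DataFrame({"sex": ["M", "F", "F", "M"],
                     "year_group": ["Y3-4", "Y3-4", "Y5-6", "Y5-6"]})
margins = {"sex": {"M": 500, "F": 520},
           "year_group": {"Y3-4": 510, "Y5-6": 510}}
weights = rake(resp, margins)
```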

For calibration to staff and pupil totals, the effect of weighting on the positivity estimates was found to be moderate in most cases. This implies that either the amount of oversampling or undersampling with respect to calibration groups has usually been limited or that the positivity estimates are similar across calibration groups. For example, for staff antibody testing, the absolute difference between weighted and unweighted estimated prevalence rates was at most 0.5 percentage points (pp) in 24 out of 51 cases and at most 1 pp in a further 9. The largest difference was 4.1 pp.

Weighting for the first and second rounds of testing was produced in February 2021. For further rounds of testing, we will continue to use weighting and will refine our procedure through modifications to the design weights and explicit non-response modelling.

Notes for: Weighting

  1. Mathematically, calibration can be viewed as a constrained optimisation whereby the input weights are transformed into output weights such that the two sets of weights are as similar as possible (for example, as measured by the squared difference) while the output weights simultaneously have to fulfil the calibration group constraints.
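
In symbols, one standard formulation of this optimisation (a sketch: here $d_i$ are the input weights, $w_i$ the output weights, $x_i$ a vector of calibration-group indicators for respondent $i$ in sample $s$, and $t_x$ the vector of known population totals) is:

```latex
\min_{w} \sum_{i \in s} (w_i - d_i)^2
\quad \text{subject to} \quad
\sum_{i \in s} w_i x_i = t_x
```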

7. Linking survey data and biological samples

Information collected from each participant who agreed to take part is anonymised. Each pupil or staff member is assigned an individual serial number and identifier (Participant ID), which distinguishes the data collected for each person. Each Participant ID is linked to its school by the school’s unique reference number.

Each biological sample is given a barcode, and this barcode is also recorded against the Participant ID by the study team. This allows the test results to be matched to the correct individual. Personal identifiers (for example, name) are not used to link the data.
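
As a minimal, hypothetical sketch of this linkage (the frame and column names are invented for illustration):

```python
import pandas as pd

# One row per enrolled participant, one row per collected sample,
# one row per laboratory result (all data invented for illustration).
participants = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003"],
    "school_urn": [100001, 100001, 100002],  # school unique reference number
})
samples = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003"],
    "barcode": ["BC-17", "BC-18", "BC-19"],  # recorded at collection
})
lab_results = pd.DataFrame({
    "barcode": ["BC-17", "BC-19"],
    "result": ["negative", "positive"],
})

# Results flow back to individuals via barcode -> Participant ID -> school;
# no personal identifiers are involved at any step.
linked = (samples.merge(lab_results, on="barcode", how="left")
                 .merge(participants, on="participant_id"))
```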


8. Test sensitivity and specificity

The coronavirus (COVID-19) infection estimates provided in the COVID-19 School Infection Survey bulletin are the percentage of the school-based population testing positive for current COVID-19 infection on the day of testing. The proportion testing positive for current COVID-19 infection should not be interpreted as the prevalence rate. To calculate prevalence rates, we would need an accurate understanding of the swab test’s sensitivity (true-positive rate) and specificity (true-negative rate).
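
For illustration, the standard adjustment such a calculation would use (the Rogan-Gladen estimator, not applied in the bulletin) corrects the observed proportion testing positive, $p_{\mathrm{obs}}$, into an estimated true prevalence $\hat{\pi}$:

```latex
\hat{\pi} = \frac{p_{\mathrm{obs}} + \mathrm{specificity} - 1}{\mathrm{sensitivity} + \mathrm{specificity} - 1}
```

With made-up figures, an observed positivity of 1%, a sensitivity of 90% and a specificity of 99.5% would give an adjusted prevalence of (0.01 + 0.995 − 1) / (0.9 + 0.995 − 1), or roughly 0.6%.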

Test sensitivity

Test sensitivity measures how often the test correctly identifies those who have the virus, so a test with high sensitivity will not have many false-negative results. Studies suggest that sensitivity may be somewhere between 85% and 98%.

Our study involves participants self-swabbing under the supervision of a study healthcare worker. It is possible that some participants may take the swab incorrectly, which could lead to more false-negative results. However, research suggests that self-swabbing under supervision is likely to be as accurate as swabs collected directly by healthcare workers.

Test specificity

Test specificity measures how often the test correctly identifies those who do not have the virus, so a test with high specificity will not have many false-positive results.

We can assume that the specificity of our test must be very close to 100%, because the number of positive tests in our study is low: even in the extreme case where every positive were false, specificity would still be very high. For example (with illustrative figures), if 0.5% of all tests returned positive and every one of those positives were false, specificity would still be 99.5%. We know that the virus is still circulating, so it is extremely unlikely that all these positives are false. However, it is important to consider whether any of the small number of positive tests were false-positives.

Type of tests

The nasal swabs were sent to one of the UK’s national laboratories for COVID-19 detection using an accredited reverse transcriptase polymerase chain reaction (RT-PCR) test. This assay has been shown to have a sensitivity of 70% and a specificity of 95%.

Capillary blood samples from staff were collected and tested using a commercial immunoassay for antibodies against SARS-CoV-2 (Roche cobas® Elecsys Anti-SARS-CoV-2 assay). The assay has been shown to have a high sensitivity (97.2%) and specificity (99.8%).

Oral fluid samples from students were collected and sent for detection of antibodies against the SARS-CoV-2 Nucleoprotein (NP) using an Immunoglobulin G (IgG)-capture-based enzyme immunoassay (EIA). The assay has been shown to have 80% sensitivity and 99% specificity. Although immunoglobulins are present in oral fluids at concentrations of at least one-thousandth of those found in blood, the reactivity of salivary immunoglobulins mirrors that of serum. Oral fluids are therefore an attractive non-invasive alternative to blood samples, particularly in children.


9. Uncertainty in the data

The estimates presented in the Schools Infection Survey statistical bulletin are subject to uncertainty. There are many causes of uncertainty, but the main sources in the analysis and data presented include the following.

Uncertainty in the test (false-positives, false-negatives)

These results derive directly from the tests, and no test is perfect: there will be some false-positives and false-negatives. In addition, false-negatives could also arise because participants in this study are self-swabbing, and some may not produce a sample that can provide a conclusive result.

The data are based on a sample of people, so there is some uncertainty in the estimates

Any estimate based on a sample contains some uncertainty as to whether it reflects the broader population of interest, because only part of that population is observed. A confidence interval gives an indication of the degree of uncertainty of an estimate, showing the precision of a sample estimate. The 95% confidence intervals are calculated so that, if we repeated the study many times, 95% of the intervals produced would contain the true proportion testing positive. A wider interval indicates more uncertainty in the estimate. Overlapping confidence intervals indicate that there may not be a true difference between two estimates.
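
The bulletin’s exact interval method is not reproduced here; as a hedged sketch, the Wilson score interval below is one common choice for a proportion and shows how such limits are computed (the counts are invented):

```python
from math import sqrt

def wilson_ci(positives, n, z=1.96):
    """95% Wilson score interval for a proportion."""
    p = positives / n
    adjust = z**2 / n
    centre = (p + adjust / 2) / (1 + adjust)
    half = (z / (1 + adjust)) * sqrt(p * (1 - p) / n + adjust / (4 * n))
    return max(0.0, centre - half), min(1.0, centre + half)

# e.g. 12 positive results out of 4,000 swabs:
low, high = wilson_ci(12, 4000)
print(f"{12/4000:.2%} (95% CI: {low:.2%} to {high:.2%})")
```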

Pupils and staff who choose to enrol in the study may be different to those who do not enrol

As well as uncertainty from sampling, estimates can be affected by non-response bias. This can occur when there is a systematic difference between those who take part in the study and those who do not, meaning participants are not representative of the study population. If this difference is also associated with the likelihood of contracting the coronavirus (COVID-19), then the estimates produced from the data collected cannot be generalised to the study population as a whole. Weighting is used to compensate for non-response.
