Table of contents
- Background of social survey data collection changes since the coronavirus (COVID-19) pandemic began
- Methodology
- Impact of operational changes during the coronavirus (COVID-19) pandemic
- Impact of introducing knock to nudge (KtN) as an additional measure during the coronavirus (COVID-19) pandemic
- Discussion of the impact of COVID-19 on ONS social survey data collection
- Glossary
- Annex
2. Methodology
Datasets used for analysis
We will compare survey response rates and characteristics of responding households and individuals for all surveys across three different time periods and modes of data collection.
The first group of unweighted data covers April 2019 to February 2020, when respondents were interviewed face-to-face in their homes for all surveys studied. These datasets do not include the month of March 2020 because data collection was suspended on 17 March 2020.
The second group of unweighted data covers a time period when respondents were interviewed via telephone, without knock-to-nudge (KtN) intervention. This time period varies between surveys because KtN was implemented at different times for different surveys.
Telephone numbers of respondents to the Labour Force Survey (LFS), the Survey on Living Conditions (SLC), the Living Costs and Food Survey (LCF), and the Wealth and Assets Survey (WAS) were obtained via the online portal, a telematching provider, or in response to interviewer letters. However, telephone numbers of respondents to the National Survey for Wales (NSW) were already available, because the sample was re-sampled from the previous financial year.
The dataset for NSW for the telephone period includes May to July 2020; this is so we can compare sample characteristics with the previous financial year. After July 2020, NSW sampled respondents that took part in NSW in 2017 to 2018; therefore, we could not explore any direct influence of the mode from one financial year to the other.
The third group of unweighted data includes cases where respondents were contacted by interviewers via KtN and subsequently interviewed by telephone. Most of these respondents would have had a doorstep interaction with an interviewer, with very few having only received a "called today" card. For the purpose of this paper, no dataset from June 2021 onwards was included; this controlled for the impact that the gradual lifting of most coronavirus (COVID-19) restrictions may have had on the characteristics of respondents.
The third group of data only includes respondents whose telephone numbers were not obtained via the online portal or by telematching, which introduces a self-selection bias. Also, not all cases without a phone number were allocated to KtN.
However, looking at the distribution of the characteristics of respondents who received KtN intervention is useful. It shows us whether KtN helps obtain respondents with characteristics that were under-represented in our survey datasets prior to the KtN intervention (when telephone numbers were obtained via an online portal or by telematching only).
Dataset geography and time periods
LFS (wave one)
The geography covered in LFS datasets was Great Britain (GB). Dataset 1 (face-to-face mode) took place from April 2019 to February 2020. Dataset 2 (telephone mode) took place from April 2020 to March 2021. Dataset 3 (telephone mode filtered for cases that received KtN) took place from April to May 2021.
SLC (wave one)
The geography covered in SLC datasets was GB. Dataset 1 (face-to-face mode) took place from April 2019 to February 2020. Dataset 2 (telephone mode) took place from April to September 2020. Dataset 3 (telephone mode filtered for cases that received KtN) took place from October 2020 to March 2021.
LCF
The geography covered in LCF datasets was GB. Dataset 1 (face-to-face mode) took place from April 2019 to February 2020. Dataset 2 (telephone mode) took place from April to September 2020. Dataset 3 (telephone mode filtered for cases that received KtN) took place from October 2020 to March 2021.
WAS (wave one)
The geography covered in WAS datasets was GB. Dataset 1 (face-to-face mode) took place from April 2019 to February 2020. Dataset 2 (telephone mode) took place from April to December 2020. Dataset 3 (telephone mode filtered for cases that received KtN) took place from January to March 2021.
NSW
The geography covered in NSW datasets was Wales. Dataset 1 (face-to-face mode) took place from April 2019 to February 2020. Dataset 2 (telephone mode) took place from May to July 2020. Dataset 3 (telephone mode filtered for cases that received KtN) took place from January to March 2021.
Northern Ireland data were not included in the analysis for LFS, SLC, LCF and WAS. Data in Northern Ireland are collected by the Northern Ireland Statistics and Research Agency (NISRA), which has not implemented KtN.
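The time periods listed above can be held in a small lookup structure, which makes it easy to check which dataset window a given interview month falls into. The sketch below is illustrative only: the `PERIODS` name, the tuple layout and the `dataset_for` helper are assumptions made for this example and are not part of the ONS processing. The dates themselves are those given above.

```python
# Illustrative lookup of the dataset windows described in this section.
# Names and layout are assumptions; only the dates come from the text.
# Each entry: (dataset number, mode, first month, last month),
# with months written as (year, month).
PERIODS = {
    "LFS": [(1, "face-to-face", (2019, 4), (2020, 2)),
            (2, "telephone", (2020, 4), (2021, 3)),
            (3, "telephone with KtN", (2021, 4), (2021, 5))],
    "SLC": [(1, "face-to-face", (2019, 4), (2020, 2)),
            (2, "telephone", (2020, 4), (2020, 9)),
            (3, "telephone with KtN", (2020, 10), (2021, 3))],
    "LCF": [(1, "face-to-face", (2019, 4), (2020, 2)),
            (2, "telephone", (2020, 4), (2020, 9)),
            (3, "telephone with KtN", (2020, 10), (2021, 3))],
    "WAS": [(1, "face-to-face", (2019, 4), (2020, 2)),
            (2, "telephone", (2020, 4), (2020, 12)),
            (3, "telephone with KtN", (2021, 1), (2021, 3))],
    "NSW": [(1, "face-to-face", (2019, 4), (2020, 2)),
            (2, "telephone", (2020, 5), (2020, 7)),
            (3, "telephone with KtN", (2021, 1), (2021, 3))],
}


def dataset_for(survey, year, month):
    """Return the dataset number whose window covers (year, month), else None.

    Dataset 3 also requires that the case actually received KtN;
    this helper only checks the time window.
    """
    for number, _mode, start, end in PERIODS[survey]:
        if start <= (year, month) <= end:
            return number
    return None
```

For example, `dataset_for("LFS", 2020, 3)` returns `None`, reflecting that March 2020 is excluded from every dataset.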
Variables used for analysis
We wanted to compare characteristics of respondents and households across surveys and different modes of collection. For the analysis in this paper, we therefore included person-level variables (age, ethnicity, marital status, and National Statistics Socio-economic classification (NS-SEC)) and household-level variables (tenure, household size, and indices of multiple deprivation in quintiles).
NSW explored variables at household reference person-level only, whereas the analysis for the other surveys accounted for all household members when exploring person-level variables.
More details about the variables explored can be found in Annex 2.
Reference estimates of the GB population are used to assess the biases introduced by the change of data collection mode for some of the variables. However, the main ONS social surveys do not target the whole GB population but exclude certain small subgroups. For example:
LFS includes residents in private households, residents in NHS accommodation, and young people living away from their parental home during term time (about 98.5% of the total UK population), but excludes people not in households (such as people in care homes and prisoners)
SLC includes slightly fewer people than LFS because residents in NHS accommodation are also excluded
LCF and WAS only include residents in private households (about 97% of the population)
Because of the differences in target populations, the proportions for the various categories (such as age categories) of the variables of interest (such as age) in the GB population may therefore vary slightly between surveys.
Statistical tests
Pearson's chi-square test of association was used to see whether there was any association between the selected person-level or household-level variables and the mode of data collection or the way contact details were gathered.
A chi-square test compares the observed frequencies with those you would expect to get by chance if there was no association. This test is used on categorical or ordinal data.
A p-value of less than 0.05 is reported as significant. This means that the association between a variable and, for example, the mode of collection is greater than would be expected by chance. It should be noted that a chi-square test is highly sensitive to sample size. For large sample sizes, even a weak association between two variables could become significant.
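As an illustration of the test described above, the following is a minimal sketch of Pearson's chi-square test for a table of unweighted counts, written with the Python standard library only. It is not the code used for this paper. The function names and the example counts are hypothetical, and the p-value is obtained from a standard recurrence for the chi-square upper tail.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), starting from
    Q(1, y) = exp(-y) for even df or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def chi_square_test(observed):
    """Return (statistic, degrees of freedom, p-value) for a table of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected by chance if there were no association.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)


# Hypothetical counts: rows are two collection modes, columns two categories.
small = [[52, 48], [48, 52]]
# The same proportions with a sample 20 times larger.
large = [[20 * count for count in row] for row in small]

stat_small, df_small, p_small = chi_square_test(small)
stat_large, df_large, p_large = chi_square_test(large)
print(f"small sample: statistic={stat_small:.2f}, p={p_small:.3f}")
print(f"large sample: statistic={stat_large:.2f}, p={p_large:.3f}")
```

Both tables have identical proportions, but the statistic grows in proportion to the sample size, so only the larger table falls below the 0.05 threshold. This is the sample size sensitivity noted above.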
For a fully valid statistical test, the sampling design should be taken into account. Surveys are clustered, and the dependence between observations that is introduced by clustering can lead to biased results. In particular, p-values may become too small and confidence intervals too narrow. However, design-adjusted tests require the full weighted microdata with cluster and stratum information, which are not yet available for all surveys. Standard statistical software packages offer tests that deal with weighted proportions (a more common scenario). None of these tests are appropriate for the KtN scenario, where the null hypothesis involves the comparison of raw unweighted proportions.
As it was not feasible to run design-adjusted tests, standard chi-square tests were used to provide some information on whether the observed results could simply occur by chance and sampling fluctuations. Very small p-values imply that a design-adjusted test would also produce a significant result. Similarly, large p-values imply that the design-adjusted p-values would also be large. No clear conclusion can be drawn in cases where p-values are neither very small nor large.
There is some ambiguity in this approach, because it is unclear when exactly a p-value should be considered neither very small nor large. However, this does not affect the key messages in this paper, which are largely based on descriptive statistics and exploring the characteristics of different datasets.
6. Glossary
Waves
A feature of longitudinal surveys, which re-interview sampled households over a period of time to understand changes in society. Each round of interviewing is a wave. This article only focuses on the first wave of longitudinal surveys.
Showcards
Used in a face-to-face survey so that respondents can choose an answer from the card rather than having the answer options read out to them. The answers listed can be in the form of numbers, scales, words, pictures or other graphical representations.
Telematching
A process whereby a contractor provides an online facility to match telephone numbers against sampled addresses, with landline and mobile numbers provided from sources such as the electoral register and British telecommunications. Prior to the coronavirus (COVID-19) pandemic, this was already a routine process on the Labour Force Survey (LFS) for sampled addresses located north of the Caledonian Canal.
Cross-sectional survey
Respondents are asked to take part in a survey at one specific point in time.
Longitudinal survey
Respondents are asked to take part in a survey over a period of time.
Survey and respondent burden
Reflects the time and effort involved in answering survey questions and can be influenced by questionnaire design, survey length, topic and mode.
Randomised controlled trials
A form of research trial where individuals are assigned randomly to different experimental conditions. There are usually one or more experimental conditions, whose impact is compared with a control condition that has no intervention applied.
Noncontacts
Sampled addresses where the interviewer has not been able to establish any contact. That is, the respondent has neither conducted the interview nor refused the survey request. This differs from ineligible addresses, which are unoccupied or not suitable for the survey request.
Incentives
A form of compensation for a respondent’s time and effort to fill in a survey. Incentives can be monetary or non-monetary and offered either ahead of survey participation (unconditionally) or after the respondent has taken part in the survey (conditionally). Incentives are also effective in increasing responses to the survey.
Online portal
Was set up at the beginning of the coronavirus pandemic, urging respondents to provide ONS with their telephone number to facilitate social survey interviewing over the phone. The online portal asked for the respondents’ Unique Access Code provided in the advance letter and subsequently their telephone number. This information was iteratively fed through to survey interviewers for telephone interviewing.
Quota
The number of addresses assigned to individual interviewers.
7. Annex
Annex 1
Survey | Unconditional incentive | Conditional incentive |
---|---|---|
LFS wave 1 | £10 | - |
SLC wave 1 | £5 | - |
LCF | £5 | £50 |
WAS wave 1 | £5 | £10 |
NSW | - | £15 |
Table 10: Incentive offered for Labour Force Survey (LFS) wave 1, Survey on Living Conditions (SLC) wave 1, Living Costs and Food Survey (LCF), Wealth and Assets Survey (WAS) wave 1, and National Survey for Wales (NSW), UK
Annex 2
Annex 2a
Household variables explored.
Three household-level variables are included in the social surveys analysed in this article. These are:
tenure
household size
Indices of Multiple Deprivation (IMD).
The tenure variable has three categorisations. These are:
own (the household is owned outright)
mortgage (the household is owned through a mortgage)
share or renting; this is a collapsed category including sharing, renting, living rent free, and squatting (squatting was only provided as a category by WAS)
NSW had differing categories for tenure. These were mapped onto the three main categories to produce a harmonised variable.
The household size variable has three categorisations. These are:
1 (a single occupant household)
2 (a two person household)
3 plus (three or more people living in the household)
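The tenure and household size groupings described above could be applied with a simple recode, sketched below. The raw tenure labels in `TENURE_GROUPS` are hypothetical, because the surveys' actual answer codes are not listed in this article. Only the three harmonised categories and the household size bands come from the text.

```python
# Hypothetical raw tenure labels mapped onto the three harmonised categories.
# The surveys' real answer codes are not given in the text.
TENURE_GROUPS = {
    "owned outright": "own",
    "owned with a mortgage": "mortgage",
    "sharing": "share or renting",
    "renting": "share or renting",
    "living rent free": "share or renting",
    "squatting": "share or renting",  # only provided by WAS
}


def harmonise_tenure(raw_label):
    """Map a raw tenure label onto own, mortgage, or share or renting."""
    return TENURE_GROUPS[raw_label.strip().lower()]


def household_size_band(n_people):
    """Band the number of household members into 1, 2, or 3 plus."""
    if n_people < 1:
        raise ValueError("a household must contain at least one person")
    return "3 plus" if n_people >= 3 else str(n_people)
```

An unrecognised tenure label raises a `KeyError` rather than being silently assigned to a category, which makes unmapped survey-specific codes visible during harmonisation.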
Indices of Multiple Deprivation (IMD) has five categorisations. These are:
most deprived 20%
most deprived 40%
deprived 50%
least deprived 40%
least deprived 20%
This variable is the official measure of relative deprivation for small areas (or neighbourhoods) in Great Britain (GB). It is common to describe how deprived an area is by saying whether it falls among the most deprived 10%, 20%, and so on.
In this report, IMD quintiles were used rather than deciles. The IMD scores are based on seven different domains of deprivation:
income
education, skills and training
employment
health and disability
crime
barriers to housing and services
living environment
IMD was also used for the NSW analysis, to keep the analysis consistent across surveys, although the Welsh Index of Multiple Deprivation (WIMD) is usually preferred for NSW. For this analysis, the November 2019 csv file was used.
Annex 2b
Person-level variables explored.
Four person-level variables are included in the social surveys analysed in this article. These are:
age
ethnicity
marital status
The National Statistics Socio-economic classification (NS-SEC)
The age variable has three classifications, which are:
aged 0 to 15 years
aged 16 to 45 years
aged 46 years and over
NSW was the only survey that did not collect any information for respondents aged under 16 years.
The ethnicity variable has three categorisations. These are:
White
Black, Asian, Arab, Mixed and Other
Missing
The ethnicity group categories Black, Asian, Arab, Mixed and Other were grouped together owing to small numbers within the sub-categories. The category of Missing was also included for some surveys when it was a substantial category and was not missing at random.
The collection of the ethnicity variable did vary across the surveys. For example:
WAS did not include the Arab ethnicity group as an answer option or allow for ethnicity to be captured by proxy
NSW did not ask this question in the telephone mode when operating without knock-to-nudge (KtN)
the ethnicity question was temporarily excluded from the SLC (April to May 2020) and LCF (April to July 2020) questionnaires to optimise the surveys for the telephone mode
The marital status variable has four categorisations. These are:
single
married, civil partnership or separated
divorced or dissolved civil partnership
widowed or surviving civil partner
This variable was filtered for respondents that were aged 16 years and over. Answer categories with a similar legal status were grouped together. For example, married, separated and in a civil partnership fall within the same category legally, so were grouped for the purposes of this research.
The NS-SEC variable has three categorisations. These are:
higher managerial, administrative, and professional occupations
intermediate occupations
routine and manual occupations
The NS-SEC variable was filtered for respondents that were aged 16 years and over. It was also filtered to include employed respondents only.
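The person-level groupings and filters described in this annex can be sketched as follows. This is an illustration under stated assumptions rather than the processing actually used: the raw marital status labels in `MARITAL_GROUPS` and all function names are hypothetical, while the age bands, grouped categories and filters follow the text.

```python
# Hypothetical raw marital status labels mapped onto the four grouped categories.
MARITAL_GROUPS = {
    "single": "single",
    "married": "married, civil partnership or separated",
    "civil partnership": "married, civil partnership or separated",
    "separated": "married, civil partnership or separated",
    "divorced": "divorced or dissolved civil partnership",
    "dissolved civil partnership": "divorced or dissolved civil partnership",
    "widowed": "widowed or surviving civil partner",
    "surviving civil partner": "widowed or surviving civil partner",
}


def age_band(age):
    """Band age in years into the three classifications used in the analysis."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 15:
        return "0 to 15"
    if age <= 45:
        return "16 to 45"
    return "46 and over"


def marital_group(age, raw_status):
    """Return the grouped marital status, or None for respondents aged under 16."""
    if age < 16:
        return None
    return MARITAL_GROUPS[raw_status.strip().lower()]


def in_nssec_analysis(age, employed):
    """NS-SEC analysis covers employed respondents aged 16 years and over only."""
    return age >= 16 and employed
```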
Annex 3
Variable | Population estimate |
---|---|
Age | |
0-15 | 19% |
16-45 | 38% |
46+ | 43% |
Ethnicity | |
White | 88% |
BAME | 12% |
Marital status | |
Single | 35% |
Married/Civil-Partner/Separated | 50% |
Divorced | 8% |
Widowed | 6% |
Household size | |
1 | 29% |
2 | 35% |
3+ | 36% |
Employment (NS-SEC) | |
Higher | 45% |
Intermediate | 24% |
Routine | 31% |
Tenure | |
Own | 35% |
Mortgage | 30% |
Rent | 35% |
Indices of multiple deprivation quintiles | |
Most deprived 20% | 20% |
Most deprived 40% | 20% |
Deprived 50% | 20% |
Least deprived 40% | 20% |
Least deprived 20% | 20% |
Table 11: Population estimates from 2019 to 2020, UK
We got estimates for:
age using mid-year estimates from 2019, including Northern Ireland
ethnicity using weighted LFS data from the financial year 2019 to 2020, including Northern Ireland
marital status using 2019 GB population estimate data from England and Wales, and Scotland (PDF, 381 KB)
household size using weighted LFS data from the financial year 2019 to 2020, including Northern Ireland
NS-SEC using weighted LFS data from the financial year 2019 to 2020, including Northern Ireland
tenure using weighted LFS data from 2019, including England, Scotland and Wales
IMD using statistical random sampling, which assumes each group should contain around 20% of households, owing to the construction of the category
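One way to use these reference estimates is to compare them with the unweighted distribution of a responding sample, category by category. The sketch below uses the age estimates from Table 11. The sample counts and the function name are hypothetical and are included only to show the arithmetic.

```python
# Reference age distribution from Table 11, in percent.
REFERENCE_AGE = {"0 to 15": 19, "16 to 45": 38, "46 and over": 43}


def percentage_point_gaps(counts, reference):
    """Return sample percentage minus reference percentage for each category.

    counts holds unweighted respondent counts per category. A positive gap
    means the category is over-represented relative to the reference estimate.
    """
    total = sum(counts.values())
    return {
        category: 100 * counts[category] / total - reference[category]
        for category in reference
    }


# Hypothetical unweighted counts for one dataset, for illustration only.
sample_counts = {"0 to 15": 150, "16 to 45": 350, "46 and over": 500}
gaps = percentage_point_gaps(sample_counts, REFERENCE_AGE)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.1f} percentage points")
```

In this made-up example the oldest age group is over-represented by 7 percentage points, while the two younger groups are under-represented by 4 and 3 percentage points respectively.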
Annex 4
NSW | FtF | Telephone |
---|---|---|
Time period | Apr 19 to Feb 20 | May 20 to Jun 20 |
Survey response | 58.5 | 72.6 |
Standard deviation | 2.3 | 1.6 |
Highest month response rate | 61.2 | 74.4 |
Lowest month response rate | 53.8 | 71.2 |
Table 12: Response rates for National Survey for Wales (NSW) when conducted face-to-face (FtF) and over the phone with a re-contact sample, April 2019 to June 2020, Wales
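The summary rows in these response rate tables can be derived from monthly response rates. The article does not state how the standard deviation was calculated, so the sketch below is an assumption. It uses the population standard deviation, which reproduces the published 1.6 for the telephone period if the highest and lowest monthly rates (74.4 and 71.2) are taken to be the only two monthly figures. The function name is hypothetical.

```python
import statistics


def summarise_monthly_rates(rates):
    """Summarise a list of monthly response rates (in percent).

    Assumes the population standard deviation; the article does not
    confirm which formula was used.
    """
    return {
        "standard deviation": round(statistics.pstdev(rates), 1),
        "highest": max(rates),
        "lowest": min(rates),
    }


# Telephone period, assuming the two monthly rates are the published
# highest and lowest month figures.
telephone = summarise_monthly_rates([74.4, 71.2])
print(telephone)
```

The simple mean of these two monthly figures is 72.8, slightly above the published survey response of 72.6, which would be consistent with the overall rate being calculated from pooled cases rather than as an average of monthly rates. That reading is an inference and is not stated in the article.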
NSW | Telephone | Telephone and KtN |
---|---|---|
Time period | May 20 to Jun 20 | Jan 21 to Mar 21 |
Survey response | 72.6 | 39.5 |
Standard deviation | 1.6 | 5.4 |
Highest month response rate | 74.4 | 45.5 |
Lowest month response rate | 71.2 | 35.4 |
Table 13: Response rates for National Survey for Wales (NSW) when conducted over the phone and through knock-to-nudge (KtN) from May 2020 to March 2021, Wales
Contact details for this Methodology
sabina.kastberg@ons.gov.uk, veronique.siegler@ons.gov.uk
Telephone: +44 1633 455934, +44 1329 447803