This research is the first conducted by the Office for National Statistics (ONS) to compare administrative and survey-based estimates of self-employment income.
Median self-employment income is consistently higher in the survey-based Effects of taxes and benefits (ETB) measure than in the administrative-based Self Assessment measure, while the gap between mean and median values is much larger for the administrative-based measure across all three years (tax years ending 2016, 2017 and 2018).
Initial analysis suggests this could be because the administrative data better capture the top and bottom of the income distribution, although further research is needed to confirm this.
This pattern also held when comparing administrative and survey-based records for the same person, with some survey respondents potentially reporting turnover rather than income; however, this analysis is complicated by data linkage challenges.
Evidence published by HM Revenue and Customs (HMRC) suggests some individuals under-report their self-employment income when submitting a Self Assessment tax return; however, our analyses cannot shed light on whether these individuals would feed this under-reporting through to their survey response.
The Office for National Statistics (ONS) publishes official estimates of income in the Effects of taxes and benefits on UK household income (ETB) yearly statistical bulletins, using Living Costs and Food Survey (LCF) data. The data and publications include breakdowns for self-employment income.
This article presents research into the use of HM Revenue and Customs (HMRC) Self Assessment data to produce an administrative-based measure of self-employment income, replicating methods used to produce ONS’ survey-based estimates.
Comparisons are made using survey and administrative data covering the tax years ending 2016, 2017 and 2018, as well as with HMRC's official estimates of self-employment income.
This research will contribute to two main work streams as part of our plans to transform population and social statistics:
- the development of a measure of self-employment income to incorporate into our Admin-based income statistics
- the potential use of administrative data to improve existing survey-based measures of income or as an alternative data source to replace survey income questions
Creating a Self Assessment measure of self-employment income using survey-aligned methodology
Our survey-based measure of self-employment income draws on existing methods used to produce the ONS' ETB estimates. To compare, we have developed an administrative data-based measure which aligns with these existing methods to ensure similar concepts are being compared throughout. This will help us understand how self-employment income estimates differ when substituting administrative data into existing survey processes.
Three rules have been identified to ensure consistency between these measures:
The LCF asks respondents to report self-employment income for up to three jobs from a specific box on their Self Assessment return, or other tax-related documentation. We have sourced income amounts from the same Self Assessment box to calculate our administrative-based measure (see note 1). This guidance is specific to self-employment profits, so the closest aligned box collecting self-employment loss has been sourced where appropriate.
During LCF processing, non-zero loss income amounts are converted to zeros before being included with profits to calculate self-employment income. The same processing has been applied to Self Assessment data.
The LCF asks respondents to report income from the most recent year that they prepared accounts for HMRC. For around two-thirds of respondents, this is the previous tax year, with around one-third reporting from two years ago, meaning collected figures are always out of date. LCF processing uprates these figures to the current tax year (see note 2). We have designed a similar process for the linked records analysis, with tax years aligned between the LCF and Self Assessment data.
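The loss-handling rule above can be sketched in a few lines. This is a minimal illustration rather than the ONS processing code: the function name is hypothetical, and it assumes that losses are represented as negative amounts and that each respondent has up to three job-level figures.

```python
def self_employment_income(job_amounts):
    """Sum job-level amounts, converting any loss to zero first.

    Assumption: a loss is stored as a negative number. The real LCF and
    Self Assessment data may store profits and losses in separate fields.
    """
    return sum(max(amount, 0) for amount in job_amounts)


# Three jobs: a profit, a loss and a small profit (made-up figures).
total = self_employment_income([12_000, -3_500, 800])
print(total)  # 12800
```

Because the loss is set to zero before summing, it does not offset the profits from the other two jobs.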
Notes for: About our transformation research:
- LCF respondents were asked additional questions to capture all self-employment income, including whether they have drawn money from a work account for non-business purposes. This has been considered when designing our measures.
- Uprating LCF self-employment income to the current tax year involves: 1) splitting figures into year quarters; 2) using employee income for the corresponding period to determine how much self-employment income should be increased or decreased.
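The two uprating steps in the note above could look something like the following. This is only a sketch under stated assumptions: the article does not give the actual calculation, so the even split across quarters, the use of an employee-income index, and every name here (`uprate`, `reference_index`, `current_index`) are hypothetical.

```python
def uprate(annual_income, reference_index, current_index):
    """Uprate an annual figure quarter by quarter.

    reference_index: four hypothetical employee-income index values, one
        per quarter of the accounting year the respondent reported.
    current_index: hypothetical index value for the current tax year.
    """
    if len(reference_index) != 4:
        raise ValueError("expected one index value per quarter")
    quarterly = annual_income / 4  # step 1: even split (an assumption)
    # Step 2: scale each quarter by the change in employee income.
    return sum(quarterly * current_index / q for q in reference_index)


# Made-up example: employee income rose by 2% since the reported year.
uprated = uprate(20_000, [100.0, 100.0, 100.0, 100.0], 102.0)
print(uprated)  # 20400.0
```

If employee income had fallen over the period, the same ratio would scale the figure down rather than up, matching the "increased or decreased" wording in the note.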
Aggregate comparisons are made between the Office for National Statistics' (ONS') survey-based ETB measure of self-employment income and individuals reporting self-employment income through a Self Assessment tax return.
Median self-employment income is consistently higher in ETB than our equivalent administrative-based measure across all three years (tax years ending 2016, 2017 and 2018). This pattern was consistent when comparing self-employment income within income deciles in all years.
Mean self-employment income is also higher in ETB estimates, with the gap between mean and median values larger for the administrative-based measure across all three years.
The ETB estimates that more individuals are in self-employed work in the tax years ending 2016 and 2018, although not 2017 (see note 1), and that this amounts to a progressively higher total self-employment income than the Self Assessment estimates in all years.
Table 1: Self Assessment and Effects of taxes and benefits estimates of self-employment income, tax years ending 2016, 2017 and 2018 (columns: Self Assessment, Effects of taxes and benefits; one panel per tax year)
This suggests that survey-based ETB measures of self-employment income are noticeably larger than the equivalent administrative-based Self Assessment measures.
One explanation for this difference is that the coverage of the two data sources differs. Given that the count of self-employed individuals is higher in the ETB than in Self Assessment in two of the three years, more individuals may be reporting self-employment income in the survey than are completing a Self Assessment return. Individuals are not legally required to submit a Self Assessment return for total self-employment receipts of less than £1,000. Such individuals could therefore report this self-employment income in the survey without having submitted a Self Assessment return.
Our analyses have shown that in the tax year ending 2016, 6% of Living Costs and Food Survey (LCF) respondents reported a self-employment income of less than £1,000, while 20% of individuals submitting Self Assessment returns reported self-employment incomes below this threshold. After removing returns reporting £0 incomes, the Self Assessment proportion falls to 8%, which is still above the LCF proportion. This suggests the differences in self-employment income between Self Assessment and ETB estimates are not the result of fewer individuals reporting incomes of less than £1,000 via Self Assessment returns.
Similarly, managing directors of their own company may report as self-employed on the survey, yet for tax purposes would be classified as a company employee, so would not have submitted a self-employment tax return. Future research will aim to identify these individuals using descriptive information provided, as well as other sub-groups who inconsistently report their income on the LCF and Self Assessment.
In comparison with survey data, administrative data are much better at capturing very high earners, with the highest self-employment income being over 100 times higher in the Self Assessment data than in the LCF. Very high incomes can be difficult to capture using surveys, with individuals either not responding or not being sampled for the survey.
While this does not explain why ETB estimates of self-employment income are higher than the Self Assessment estimates, it could explain why the differences between mean self-employment income are smaller than those between medians, given very high incomes will increase the mean.
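The effect of a very high income on the mean, but not the median, can be seen with a small worked example. The figures below are made up for illustration and are not taken from the LCF or Self Assessment data; the only feature borrowed from the article is a top income 100 times larger in one dataset than the other.

```python
from statistics import mean, median

# Made-up incomes; the two lists differ only in their highest value.
without_high_earner = [5_000, 9_000, 12_000, 18_000, 26_000]
with_high_earner = [5_000, 9_000, 12_000, 18_000, 2_600_000]

print(median(without_high_earner), mean(without_high_earner))  # 12000 14000
print(median(with_high_earner), mean(with_high_earner))        # 12000 528800
```

The median is unchanged, while the mean rises sharply, which widens the gap between mean and median in the dataset that captures the very high earner.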
Self Assessment data also capture a larger proportion of relatively small income values. Analyses found that in the tax year ending 2016, 42% of LCF respondents reported incomes below £10,000, whereas 60% of individuals submitting Self Assessment returns reported self-employment incomes below £10,000. After removing £0 incomes this fell to 53%, remaining above the LCF proportion reporting below £10,000. This may help to explain the lower mean and median administrative-based measures of self-employment income.
Alternative Self Assessment-based estimates of self-employment income
Our research uses Self Assessment data supplied by HM Revenue and Customs (HMRC), which include data from all tax returns submitted every year. We have used this supply to create an administrative-based measure of self-employment income, for comparison with the survey-based ETB measure.
Self Assessment data also feed directly into HMRC's collection of statistics about personal incomes, released as National Statistics. These statistics are produced using HMRC's Survey of Personal Incomes (SPI), which samples information held by HMRC about individuals liable for UK income tax. The SPI samples individuals from three operational systems: Pay As You Earn, Self Assessment and Claims. A slight difference between SPI and the ONS Self Assessment-based estimates is the exclusion of data from SA200 returns (see note 2), although this should have minimal impact on the findings reported in this section and in Figures 2, 3 and 4.
These SPI published statistics include a breakdown by income type, including self-employment income. This SPI-based measure of self-employment income is calculated as:
Self-employment income = profit minus (capital allowances plus losses brought forward)
This calculation aligns well with the variable used for our Self Assessment and ETB-based measures, with both sourcing the variable "Total taxable profits" (where ETB survey respondents refer to a Self Assessment return). This incorporates capital allowances and losses brought forward.
The SPI measure is produced both for all individuals with self-employment income and for only those self-employed individuals who are liable to tax. When produced for all self-employed individuals, instances in which a loss is made or profits are completely offset will be recorded as nil self-employment income. However, these individuals will be retained in the dataset and included in the count of self-employed individuals.
This methodology aligns well with the methods used to produce ETB, which we replicated for our Self Assessment measure. Figure 2 illustrates how the two SPI estimates differ from our Self Assessment and ETB estimates.
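As a rough illustration of the SPI calculation and the nil-income rule described above, the sketch below applies the formula to two made-up records. The function and argument names are hypothetical and this is not HMRC's implementation.

```python
def spi_self_employment_income(profit, capital_allowances, losses_brought_forward):
    """Profit minus (capital allowances plus losses brought forward),
    recorded as nil when the result is zero or negative."""
    income = profit - (capital_allowances + losses_brought_forward)
    return max(income, 0)


# Made-up records: (profit, capital allowances, losses brought forward).
records = [(30_000, 4_000, 1_000), (5_000, 4_000, 3_000)]
incomes = [spi_self_employment_income(*r) for r in records]

print(incomes)       # [25000, 0]
print(len(incomes))  # 2: the nil-income individual still counts
```

The second individual's profit is fully offset, so their income is recorded as nil, but they remain in the count of self-employed individuals, which lowers the mean for the all-self-employed measure.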
The SPI count of taxpaying-only self-employed individuals is lower than the Self Assessment and ETB measures. The mean self-employment income is higher for this taxpayer-only group, consistent with this measure excluding individuals with lower, non-taxable amounts.
In contrast, the SPI count of all self-employed individuals (regardless of tax liability) is noticeably higher than the Self Assessment and ETB counts, suggesting the SPI methodology has identified more self-employed individuals in the UK.
A similar trend is seen for total self-employment income, with the SPI (all self-employed) measure recording a higher total income figure compared with the other three measures, albeit only marginally higher than ETB.
Mean self-employment income for this SPI (all self-employed) measure is lower than the means estimated from the other three measures. This demonstrates that despite capturing a greater number of self-employed individuals, on average, lower self-employment incomes are being captured via this SPI measure.
It should be noted that the grossing factors used by HMRC to produce population-level estimates were revised for the tax year ending 2019 figures. This resulted in a decrease of around 630,000 self-employed individuals. Grossing factors were not adjusted for previously published years; however, we could expect a similar decline in self-employed individuals for the tax year ending 2016, bringing the SPI count more in line with our Self Assessment and ETB counts.
Notes for: Aggregate comparisons of self-employment income from the Effects of taxes and benefits (ETB) and Self Assessment data:
- Differences in ETB estimates between the three years may be because of volatility introduced by the underlying survey methodology.
- SA200 forms are issued directly by HMRC for certain individuals who have simple tax affairs. Future developments of ONS Self Assessment-based estimates will aim to incorporate these data, which are already included in SPI.
The Living Costs and Food Survey (LCF) and Self Assessment tax returns both capture self-employment income. The LCF does so by taking a sample of the self-employed population, regardless of income, while Self Assessment returns by law should capture all individuals with self-employment income above £1,000. As such, the two may be covering slightly different populations. If different populations underlie the self-employment income estimates for each collection method, it is reasonable to assume that the resulting estimates could differ because of these differences in coverage.
In linked individual-level comparisons, the group of individuals sampled from the LCF is identical to the group of individuals taken from the Self Assessment returns. As such, there is no difference in the coverage of the two groups. Given the same individuals form the base of the self-employment income estimates for each collection method, it is reasonable to assume that any differences in these estimates are the result of differences in what individuals are reporting via each of these collection methods. For example, a self-employed individual could report an income of £13,000 when completing the LCF, yet report an income of £11,500 when submitting their Self Assessment return. The coverage is identical, given they appear in both collection methods, yet they are reporting different incomes via each method.
Tax evasion
An illegal activity where registered individuals or businesses deliberately omit, conceal or misrepresent information in order to reduce their tax liabilities [definition taken from HMRC's Measuring tax gaps publication].
Tax avoidance
Exploiting the tax rules to gain a tax advantage that Parliament never intended. It often involves contrived, artificial transactions that serve little or no commercial purpose other than to produce a tax advantage. It involves operating within the letter but not the spirit of the law [definition taken from HMRC's Measuring tax gaps publication].
Living Costs and Food Survey (LCF)
The LCF is a household survey run by the Office for National Statistics (ONS) to collect information on spending patterns and the cost of living in the UK.
Further information is available in the LCF technical report.
Self Assessment data
Detailed information covering Self Assessment data is available in the Data sources and quality section of Measuring self-employment income using administrative data.
When completing a Self Assessment return, businesses should report either a profit or a loss, not both. Our aggregate and linked analyses only include Self Assessment returns that satisfy this criterion. A small proportion of returns have been removed from our sole trader (0.05%) and partner populations (0.25%).
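The profit-or-loss criterion above amounts to a simple filter. The sketch below is illustrative only: the field names `profit` and `loss` are hypothetical, and it assumes both are stored as non-negative amounts and that returns reporting neither a profit nor a loss are retained.

```python
def reports_profit_or_loss_only(profit, loss):
    """True unless a return reports both a profit and a loss."""
    return not (profit > 0 and loss > 0)


# Made-up returns with hypothetical field names.
returns = [
    {"profit": 15_000, "loss": 0},
    {"profit": 0, "loss": 2_000},
    {"profit": 4_000, "loss": 1_000},  # reports both, so it is removed
]
kept = [r for r in returns if reports_profit_or_loss_only(r["profit"], r["loss"])]
print(len(kept))  # 2
```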
Linking LCF and Self Assessment records
Records have been matched using four groups of identifiers available in both datasets, referred to as match keys (Table 5), as a common unique identifier is not available across these datasets.
Table 5: Match keys used to link LCF and Self Assessment data records
|Match key|Initial|Surname|Date of Birth|Gender|Postal district|
|4|Yes|Yes – trigram|Yes|Yes|Yes|
Levenshtein distance, a measure of the difference between two string sequences, was used to identify records which were very closely, but not perfectly matched (for example, "DE6" and "DE5"), resulting in either inclusion or clerical checking of these records.
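For readers unfamiliar with the measure, Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The function below is a standard dynamic-programming version written for illustration; it is not the linkage code used in this research, and the article does not state which distance thresholds led to inclusion or clerical checking.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein("DE6", "DE5"))  # 1
```

The two postal districts from the example differ by a single substitution, so their distance is 1, whereas identical strings have a distance of 0.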
Duplicate records were removed using survey unique identifiers to ensure one record per LCF respondent was included in each tax year.
Data quality for LCF and Self Assessment data linkage
Several factors may have reduced the linkage rate between the LCF and Self Assessment data.
Some LCF self-employed respondents may have failed to link because they do not have a Self Assessment record to link with. This could apply to individuals with incomes below the legal requirement for Self Assessment submission, or managing directors who report as self-employed via the LCF but are recorded as employed for tax purposes.
Quality issues with personal identifiers used for linkage may also have affected our linkage rates, resulting in missed links, or false negatives. For example, for tax years ending 2016, 2017 and 2018, between 120 and 215 LCF records had an invalid date of birth, and between 145 and 180 LCF records had dummy name variables. 17,513 Self Assessment records also had a null name, although there were no null or blank dates of birth recorded.
Many individuals submitting Self Assessment records use their tax agent's address for correspondence, meaning their personal address is not available to link with the LCF (which records a personal address only). There are also some specific industries, such as London Black Cab drivers, which have one tax agent covering several thousand individuals, all of whom record the same address in their Self Assessment returns. This makes it impossible to link these individuals to the LCF.
Contact details for this Article
Telephone: +44 (0)1329 444992