1. Main points

  • This research is the first conducted by the Office for National Statistics (ONS) to compare administrative and survey-based estimates of self-employment income.

  • Median self-employment income is consistently higher in the survey-based Effects of taxes and benefits measure than in the administrative-based Self Assessment measure, while the gap between mean and median values is much larger for the administrative-based measure across all three years (tax years ending 2016, 2017 and 2018).

  • Initial analysis suggests this could be because the administrative data better capture the top and bottom of the income distribution, although further research is needed to confirm this.

  • The same pattern appeared when comparing administrative and survey-based records for the same person, with some survey respondents potentially reporting turnover rather than income; however, this research is complicated by data linkage challenges.

  • Evidence published by HM Revenue and Customs (HMRC) suggests some individuals under-report their self-employment income when submitting a Self Assessment tax return; however, our analyses cannot shed light on whether these individuals would feed this under-reporting through to their survey response.

2. About our transformation research

The Office for National Statistics (ONS) publishes official estimates of income in the Effects of taxes and benefits on UK household income (ETB) yearly statistical bulletins, using Living Costs and Food Survey (LCF) data. The data and publications include breakdowns for self-employment income.

This article presents research into the use of HM Revenue and Customs (HMRC) Self Assessment data to produce an administrative-based measure of self-employment income, replicating methods used to produce ONS’ survey-based estimates.

Comparisons are made using survey and administrative data covering the tax years ending 2016, 2017 and 2018, as well as with HMRC's official estimates of self-employment income.

This research will contribute to two main work streams as part of our plans to transform population and social statistics:

  • the development of a measure of self-employment income to incorporate into our Admin-based income statistics

  • the potential use of administrative data to improve existing survey-based measures of income or as an alternative data source to replace survey income questions

Creating a Self Assessment measure of self-employment income using survey-aligned methodology

Our survey-based measure of self-employment income draws on existing methods used to produce the ONS' ETB estimates. To compare, we have developed an administrative data-based measure which aligns with these existing methods to ensure similar concepts are being compared throughout. This will help us understand how self-employment income estimates differ when substituting administrative data into existing survey processes.

Three rules have been identified to ensure consistency between these measures:

  1. The LCF asks respondents to report self-employment income for up to three jobs from a specific box on their Self Assessment return, or other tax-related documentation. We have sourced income amounts from the same Self Assessment box to calculate our administrative-based measure (see note 1). This guidance is specific to self-employment profits, so the closest aligned box collecting self-employment loss has been sourced where appropriate.

  2. During LCF processing, non-zero loss income amounts are converted to zeros before being included with profits to calculate self-employment income. The same processing has been applied to Self Assessment data.

  3. The LCF asks respondents to report income from the most recent year that they prepared accounts for HMRC. For around two-thirds of respondents, this is the previous tax year, with around one-third reporting from two years ago, meaning collected figures are always out of date. LCF processing uprates (see note 2) these figures to the current tax year. We have designed a similar process for the linked records analysis, with tax years aligned between the LCF and Self Assessment data.
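
Rules 1 and 2 above can be sketched as a short function. This is a minimal illustration rather than the ONS processing code: the function name is hypothetical, and it assumes each job's amount has already been read from the relevant Self Assessment box, with a loss represented as a negative number.

```python
from typing import Iterable


def survey_aligned_income(job_amounts: Iterable[float], max_jobs: int = 3) -> float:
    """Sum self-employment income over up to `max_jobs` jobs, treating losses as zero.

    Assumption: a loss is represented here as a negative amount.
    """
    amounts = list(job_amounts)[:max_jobs]
    # Rule 2: a loss contributes zero rather than offsetting profits from other jobs.
    return sum((max(amount, 0.0) for amount in amounts), 0.0)


# One job with a 12,000 profit and one with a 3,000 loss.
print(survey_aligned_income([12_000.0, -3_000.0]))  # 12000.0
```

Because losses are floored at zero before summing, a loss in one job does not reduce the profit recorded for another.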

Notes for: About our transformation research:

  1. LCF respondents were asked additional questions to capture all self-employment income, including whether they have drawn money from a work account for non-business purposes. This has been considered when designing our measures.
  2. Uprating LCF self-employment income to the current tax year involves: 1) splitting figures into quarters of the year; 2) using employee income for the corresponding period to determine how much self-employment income should be increased or decreased.

3. Aggregate comparisons of self-employment income from the Effects of taxes and benefits (ETB) and Self Assessment data

Aggregate comparisons are made between the Office for National Statistics' (ONS') survey-based ETB measure of self-employment income and individuals reporting self-employment income through a Self Assessment tax return.

Median self-employment income is consistently higher in ETB than our equivalent administrative-based measure across all three years (tax years ending 2016, 2017 and 2018). This pattern was consistent when comparing self-employment income within income deciles in all years.

Mean self-employment income is also higher in ETB estimates, with the gap between mean and median values larger for the administrative-based measure across all three years.

The ETB estimates that more individuals are in self-employed work in the tax years ending 2016 and 2018, although not 2017 (see note 1), and that this amounts to an increasingly higher total self-employment income when compared with Self Assessment estimates for all years.

This suggests that survey-based ETB measures of self-employment income are noticeably larger than the equivalent administrative-based Self Assessment measures.

One explanation for this difference is that the coverage of the two data sources differs. Given the count of self-employed individuals is higher in the ETB than in Self Assessment across two of the three years, more individuals may be reporting self-employment income in the survey than are completing a Self Assessment return. Individuals are not legally required to submit a Self Assessment return for total self-employment receipts less than £1,000. Such individuals could therefore report this self-employment income in the survey but may not have submitted a Self Assessment return.

Our analyses have shown that in the tax year ending 2016, 6% of Living Costs and Food Survey (LCF) respondents reported a self-employment income less than £1,000, while 20% of individuals submitting Self Assessment returns reported self-employment incomes below this threshold. After removing returns reporting £0 incomes, this proportion falls to 8%, still above the LCF proportion. This suggests the differences in self-employment income between Self Assessment and ETB estimates are not the result of fewer individuals reporting incomes of less than £1,000 via Self Assessment returns.
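
The threshold comparison can be illustrated with a short sketch. The function name and the income values are hypothetical, and it assumes that removing £0 incomes removes them from both the numerator and the denominator, which the text does not state explicitly.

```python
def share_below(incomes, threshold=1_000.0, drop_zeros=False):
    """Proportion of incomes strictly below `threshold`.

    Assumption: with drop_zeros=True, zero incomes are excluded from
    both the count below the threshold and the total.
    """
    values = [x for x in incomes if not (drop_zeros and x == 0)]
    if not values:
        raise ValueError("no incomes left after filtering")
    return sum(1 for x in values if x < threshold) / len(values)


# Made-up incomes for illustration only, not ONS or HMRC data.
incomes = [0.0, 0.0, 500.0, 900.0, 5_000.0, 12_000.0,
           20_000.0, 30_000.0, 45_000.0, 60_000.0]

print(share_below(incomes))                   # 0.4
print(share_below(incomes, drop_zeros=True))  # 0.25
```

The same function covers the £10,000 comparison by passing `threshold=10_000.0`.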

Similarly, managing directors of their own company may report as self-employed on the survey, yet for tax purposes would be classified as a company employee, so would not have submitted a self-employment tax return. Future research will aim to identify these individuals using descriptive information provided, as well as other sub-groups who inconsistently report their income on the LCF and Self Assessment.

In comparison with survey data, administrative data are much better at capturing very high earners, with the highest self-employment income being over 100 times higher in the Self Assessment data than in the LCF. Very high incomes can be difficult to capture using surveys, with individuals either not responding or not being sampled for the survey.

While this does not explain why ETB estimates of self-employment income are higher than the Self Assessment estimates, it could explain why the differences between mean self-employment income are smaller than those between medians, given very high incomes will increase the mean.
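
The effect of very high incomes on the mean, but not the median, can be seen in a small sketch. The values are invented for illustration, and the helper name is hypothetical.

```python
from statistics import mean, median


def mean_median_gap(values):
    """Difference between the mean and the median of a set of incomes."""
    return mean(values) - median(values)


# Made-up incomes for illustration only.
incomes = [8_000, 12_000, 15_000, 20_000, 25_000]
with_top_earner = incomes + [2_000_000]

print(mean_median_gap(incomes))          # 1000
print(median(with_top_earner))           # 17500.0
print(mean_median_gap(with_top_earner))  # far larger: the mean rises, the median barely moves
```

Adding one very high earner moves the median only slightly while pulling the mean up sharply, which is consistent with a wider mean-median gap in a source that captures the top of the distribution.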

Self Assessment data also capture a larger proportion of relatively small income values. Analyses found that in the tax year ending 2016, 42% of LCF respondents reported incomes below £10,000, whereas 60% of individuals submitting Self Assessment returns reported self-employment incomes below £10,000. After removing £0 incomes this fell to 53%, remaining above the LCF proportion reporting below £10,000. This may help to explain the lower mean and median administrative-based measures of self-employment income.

Alternative Self Assessment-based estimates of self-employment income

Our research uses Self Assessment data supplied by HM Revenue and Customs (HMRC), which include data from all tax returns submitted every year. We have used this supply to create an administrative-based measure of self-employment income, for comparison with the survey-based ETB measure.

Self Assessment data also feed directly into HMRC's collection of statistics about personal incomes, released as National Statistics. These statistics are produced using HMRC's Survey of Personal Incomes (SPI), which samples information held by HMRC about individuals liable for UK income tax. The SPI samples individuals from three operational systems: Pay As You Earn, Self Assessment and Claims. A slight difference between the SPI and the ONS Self Assessment-based estimates is that the ONS estimates exclude data from SA200 returns (see note 2), although this should have minimal impact on the findings reported in this section and in Figures 2, 3 and 4.

These SPI published statistics include a breakdown by income type, including self-employment income. This SPI-based measure of self-employment income is calculated as:

Self-employment income = profit minus (capital allowances plus losses brought forward)

This calculation aligns well with the variable used for our Self Assessment and ETB-based measures, with both sourcing the variable "Total taxable profits" (where ETB survey respondents refer to a Self Assessment return). This incorporates capital allowances and losses brought forward.

The SPI measure is produced both for all individuals with self-employment income and for only those self-employed individuals who are liable to tax. When produced for all self-employed individuals, instances in which a loss is made or profits are completely offset will be recorded as nil self-employment income. However, these individuals will be retained in the dataset and included in the count of self-employed individuals.
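
The SPI calculation and its treatment of losses can be sketched as follows. This is an illustration of the formula as described above, not HMRC's code; the function and parameter names are hypothetical.

```python
def spi_self_employment_income(profit, capital_allowances=0.0, losses_brought_forward=0.0):
    """Profit minus (capital allowances plus losses brought forward), floored at nil."""
    net = profit - (capital_allowances + losses_brought_forward)
    # A loss, or profits that are completely offset, is recorded as nil income.
    return max(net, 0.0)


# Hypothetical individuals: (profit, capital allowances, losses brought forward).
individuals = [(30_000.0, 4_000.0, 1_000.0), (5_000.0, 3_000.0, 4_000.0)]
incomes = [spi_self_employment_income(*person) for person in individuals]

print(incomes)       # [25000.0, 0.0]
print(len(incomes))  # 2: the nil-income individual is still counted
```

Retaining nil-income individuals raises the count of self-employed individuals without adding to total income, which lowers the mean.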

This methodology aligns well with the methods used to produce ETB, which we replicated for our Self Assessment measure. Figure 2 illustrates how the two SPI estimates differ from our Self Assessment and ETB estimates.

The SPI count of taxpaying-only self-employed individuals is lower than the Self Assessment and ETB measures. The mean self-employment income is higher for this taxpayer-only group, consistent with this measure excluding individuals with lower, non-taxable amounts.

In contrast, the SPI count of all self-employed individuals (regardless of tax liability) is noticeably higher than the Self Assessment and ETB counts, suggesting the SPI methodology has identified more self-employed individuals in the UK.

A similar trend is seen for total self-employment income, with the SPI (all self-employed) measure recording a higher total income figure compared with the other three measures, albeit only marginally higher than ETB.

Mean self-employment income for this SPI (all self-employed) measure is lower than the means estimated from the other three measures. This demonstrates that despite capturing a greater number of self-employed individuals, on average, lower self-employment incomes are being captured via this SPI measure.

It should be noted that the grossing factors used by HMRC to produce population-level estimates were revised for the tax year ending 2019 figures. This resulted in a decrease of around 630,000 self-employed individuals. Grossing factors were not adjusted for previously published years; however, we could expect a similar decline in self-employed individuals for the tax year ending 2016, bringing it more in line with our Self Assessment and ETB counts.

Notes for: Aggregate comparisons of self-employment income from the Effects of taxes and benefits (ETB) and Self Assessment data:

  1. Differences in ETB estimates between the three years may be because of volatility introduced by the underlying survey methodology.
  2. SA200 forms are issued directly by HMRC for certain individuals who have simple tax affairs. Future developments of ONS Self Assessment-based estimates will aim to incorporate these data, which are already included in SPI.

4. Comparing self-employment income from the Living Costs and Food Survey (LCF) and Self Assessment data using person-level linked records

We compared survey-based and administrative-based measures using person-level linked records to investigate whether coverage differences alone explain the self-employment income differences found in the aggregate comparisons. In this analysis, the same sub-group of individuals is present in both data sources.

Linking LCF and Self Assessment records

Records were linked for individuals appearing in both the Living Costs and Food Survey (LCF) and Self Assessment data for the tax years ending 2016, 2017 and 2018. Records were linked using several personal identifiers as a common unique identifier was not available across the datasets. Linkage methodology and rates are available in the Data sources and quality section.

Only 51 to 57% of self-employed respondents were successfully linked with a Self Assessment return, so caution should be taken when interpreting these research findings. This analysis provides useful insight into the differences in reported self-employment income between administrative and survey data sources; however, low linkage rates mean these findings may not fully represent the UK's self-employed population.

Aligning LCF and Self Assessment records

While a single Self Assessment collection year refers to one tax year, one LCF collection year covers multiple tax years, as LCF respondents are asked to report income from the last year they prepared accounts for HM Revenue and Customs (HMRC). Around two-thirds report income from the previous tax year, while one-third report income from two years ago.

To increase comparability, LCF collection years have been reconstructed into tax years (Table 3), with respondents reporting on a particular tax year being grouped together, regardless of the year in which they completed the LCF. This allows for direct comparisons between Self Assessment and LCF tax years.
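
The regrouping can be sketched as below. The record layout and field names (`collection_year`, `tax_year_ending`) are hypothetical and chosen only to show the idea of grouping by the tax year reported on rather than the year of interview.

```python
from collections import defaultdict


def group_by_tax_year(records):
    """Group survey records by the tax year they report on, ignoring collection year."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["tax_year_ending"]].append(record)
    return dict(grouped)


# Made-up records: respondents interviewed in different collection years
# can report on the same tax year.
records = [
    {"respondent": "a", "collection_year": 2017, "tax_year_ending": 2016},
    {"respondent": "b", "collection_year": 2018, "tax_year_ending": 2016},
    {"respondent": "c", "collection_year": 2018, "tax_year_ending": 2017},
]

by_tax_year = group_by_tax_year(records)
print(sorted(by_tax_year))    # [2016, 2017]
print(len(by_tax_year[2016])) # 2
```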

Analysis of person-level linked LCF and Self Assessment records

In the tax year ending 2016, both mean and median self-employment income were higher in the LCF, despite the same individuals being included in both samples (Table 4). This trend was evidenced across all income deciles in the sample (Figure 5). Total self-employment income was also substantially higher in the LCF than in the Self Assessment data.

Among linked individuals, 75% reported a higher self-employment income in the LCF, with a mean difference of £6,727 between the two measures, while 25% reported a higher income in the Self Assessment data, with a mean difference of £8,081.
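
A comparison of this kind can be sketched as follows, assuming a mean difference is calculated separately for each group. The function name and the income pairs are hypothetical, and records with identical incomes in both sources are counted in neither group here, which is an assumption.

```python
from statistics import mean


def compare_linked(pairs):
    """Summarise (lcf_income, sa_income) pairs by which source is higher."""
    lcf_higher = [lcf - sa for lcf, sa in pairs if lcf > sa]
    sa_higher = [sa - lcf for lcf, sa in pairs if sa > lcf]
    total = len(pairs)
    return {
        "share_lcf_higher": len(lcf_higher) / total,
        "mean_diff_lcf_higher": mean(lcf_higher) if lcf_higher else None,
        "share_sa_higher": len(sa_higher) / total,
        "mean_diff_sa_higher": mean(sa_higher) if sa_higher else None,
    }


# Made-up linked pairs for illustration only.
pairs = [(20_000, 15_000), (12_000, 10_000), (30_000, 22_000), (8_000, 16_000)]
summary = compare_linked(pairs)
print(summary["share_lcf_higher"], summary["mean_diff_lcf_higher"])  # 0.75 5000
print(summary["share_sa_higher"], summary["mean_diff_sa_higher"])    # 0.25 8000
```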

Explaining the difference between linked LCF and Self Assessment measures of self-employment income

By using person-level linked data with survey-aligned methodology, we have removed the possibility of coverage differences between our survey and administrative self-employment income measures. Clear differences remain, which suggests they are more likely the result of reporting differences between the two sources. See Glossary for definitions of coverage and reporting differences.

Reporting differences can include survey respondents not providing the information requested or the impacts of tax evasion and avoidance on Self Assessment returns. Self-employed individuals may also pay family members a wage, claiming this as an allowable expense, hence reducing their overall self-employment profit and ultimately their tax liability.

LCF respondents are asked to refer to tax documentation when supplying their self-employment income, with those using their Self Assessment return asked to report their total taxable profit.

Our analyses found that 13% of LCF respondents referred to their Self Assessment return for the tax year ending 2016, with 61% of respondents consulting other tax documentation and 26% not consulting any documentation. Mean and median self-employment income reported via Self Assessment and the LCF differ between these three groups (Figure 4). Individuals sourcing their LCF self-employment income from their Self Assessment return still reported a higher income in the survey than through their Self Assessment return. However, the difference between median self-employment income for this group is smaller than for individuals referring to other tax documentation, which again is smaller than those not referring to documentation.

Differences in the survey and administrative measures of self-employment income may be affected by the more complicated cases recorded via the LCF. For example, individuals may report post-tax self-employment income or drawings taken directly from business accounts. While all additional information is considered during LCF processing, this could have an impact on the estimates produced when comparing directly with Self Assessment returns.

Respondents may also be reporting their turnover rather than "total taxable profit". Analysis has identified 26 individuals in our linked dataset who reported self-employment income on the LCF that was within 10% of their Self Assessment turnover, yet more than 10% away from their total taxable profit (Figure 5). This could inflate self-employment income estimates in the LCF, given values that are normally deducted from turnover to calculate total taxable profit are not being accounted for (such as expenses or allowances).
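
The turnover check can be expressed as a simple rule. This is a sketch under stated assumptions: the function name is hypothetical, and "within 10%" is taken to mean within 10% of the Self Assessment value being compared against, which the text does not specify.

```python
def likely_reported_turnover(lcf_income, sa_turnover, sa_profit, tolerance=0.10):
    """Flag a linked record whose LCF income is near turnover but not near taxable profit.

    Assumption: distances are measured relative to the Self Assessment value.
    """
    near_turnover = abs(lcf_income - sa_turnover) <= tolerance * abs(sa_turnover)
    far_from_profit = abs(lcf_income - sa_profit) > tolerance * abs(sa_profit)
    return near_turnover and far_from_profit


# Made-up figures for illustration only.
print(likely_reported_turnover(50_000, 52_000, 20_000))  # True: close to turnover, far from profit
print(likely_reported_turnover(20_000, 52_000, 20_500))  # False: close to profit, not turnover
print(likely_reported_turnover(50_000, 52_000, 49_000))  # False: close to both, so ambiguous
```

The second condition matters because turnover and profit can be close to each other, in which case a match to turnover says nothing about which figure the respondent reported.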

The LCF and Self Assessment data also involve very different collection methods, which could have an impact on the quality of information provided. LCF respondents take a few minutes to answer self-employment income questions as part of a larger survey, whereas individuals have up to 10 months to submit their Self Assessment return, with the threat of heavy fines if the return is incorrect.

HMRC's tax gap estimates, which estimate the difference between the amount of tax that should be paid and the amount that is paid each year, provide evidence of tax evasion and avoidance in Self Assessment returns (see note 1). This suggests under-reporting of self-employment income in these returns.

The latest statistics (tax year ending 2019) estimated that self-employed individuals and small partnerships had a tax gap of 22.9%, or £4.1 billion, and that large partnerships had a tax gap of 5.4%, or £1.1 billion. In the tax year ending 2017, 29% of all self-employment tax returns under-declared tax liability, with 7% under-reporting by £1 to £500, 3% under-reporting by £501 to £1,000 and 19% under-reporting by more than £1,000.

These findings suggest that a proportion of self-employed individuals under-report their income when submitting a Self Assessment tax return. However, our analyses cannot shed light on whether these individuals would feed this under-reporting through to their LCF response (see note 2).

Notes for: Comparing self-employment income from the Living Costs and Food Survey (LCF) and Self Assessment data using person-level linked records:

  1. Tax gaps can have various causes, including tax evasion and avoidance, criminal attacks and error.
  2. LCF respondents are asked to supply self-employment figures for the last year they prepared accounts for HMRC, and therefore should never be providing figures for a tax year that does not have an existing Self Assessment return.

5. Glossary

Coverage differences

The Living Costs and Food Survey (LCF) and Self Assessment tax returns both capture self-employment income. The LCF does so by taking a sample of the self-employed population, regardless of income, while Self Assessment returns by law should capture all individuals with self-employment income above £1,000. As such, the two may be covering slightly different populations. If different populations form the base of the self-employment income estimates for each collection method, it is reasonable to assume the resulting estimates could differ, and are therefore affected by the differences in coverage.

Reporting differences

In linked individual-level comparisons, the group of individuals sampled from the LCF is identical to the group of individuals taken from the Self Assessment returns. As such, there is no difference in the coverage of the two groups. Given the same individuals form the base of the self-employment income estimates for each collection method, it is reasonable to assume that any differences in these estimates are the result of differences in what individuals are reporting via each of these collection methods. For example, a self-employed individual could report an income of £13,000 when completing the LCF, yet report an income of £11,500 when submitting their Self Assessment return. The coverage is identical, given they appear in both collection methods, yet they are reporting different incomes via each method.

Tax evasion

An illegal activity where registered individuals or businesses deliberately omit, conceal or misrepresent information in order to reduce their tax liabilities [definition taken from HMRC's Measuring tax gaps publication].

Tax avoidance

Exploiting the tax rules to gain a tax advantage that Parliament never intended. It often involves contrived, artificial transactions that serve little or no commercial purpose other than to produce a tax advantage. It involves operating within the letter but not the spirit of the law [definition taken from HMRC's Measuring tax gaps publication].

6. Data sources and quality

Living Costs and Food Survey (LCF)

The LCF is a household survey run by the Office for National Statistics (ONS) to collect information on spending patterns and the cost of living in the UK.

Further information is available in the LCF technical report.

Self Assessment data

Detailed information covering Self Assessment data is available in the Data sources and quality section of Measuring self-employment income using administrative data.

When completing a Self Assessment return, businesses should report either a profit or a loss, not both. Our aggregate and linked analyses only include Self Assessment returns that satisfy this criterion. A small proportion of returns have been removed from our sole trader (0.05%) and partner (0.25%) populations.

Linking LCF and Self Assessment records

Records have been matched using four groups of identifiers available in both datasets, referred to as match keys (Table 5), as a common unique identifier is not available across these datasets.

Levenshtein distance, a measure of the difference between two string sequences, was used to identify records which were very closely, but not perfectly matched (for example, "DE6" and "DE5"), resulting in either inclusion or clerical checking of these records.
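
For readers unfamiliar with the measure, a minimal sketch of Levenshtein distance follows. This is the standard dynamic-programming formulation, not the ONS linkage code; the `link_decision` helper and its threshold of one edit are hypothetical and only illustrate how near matches might be separated from exact matches.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def link_decision(value_a: str, value_b: str, max_distance: int = 1) -> str:
    """Hypothetical rule: exact match, near match for further checking, or no match."""
    distance = levenshtein(value_a, value_b)
    if distance == 0:
        return "exact"
    return "near" if distance <= max_distance else "no match"


print(levenshtein("DE6", "DE5"))    # 1
print(link_decision("DE6", "DE5"))  # near
```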

Duplicate records were removed using survey unique identifiers to ensure one record per LCF respondent was included in each tax year.

Data quality for LCF and Self Assessment data linkage

Several factors may have reduced the linkage rate between the LCF and Self Assessment data.

Some LCF self-employed respondents may have failed to link because they do not have a Self Assessment record to link with. This could apply to individuals with incomes below the legal requirement for Self Assessment submission, or managing directors who report as self-employed via the LCF but are recorded as employed for tax purposes.

Quality issues with personal identifiers used for linkage may also have affected our linkage rates, resulting in missed links, or false negatives. For example, for tax years ending 2016, 2017 and 2018, between 120 and 215 LCF records had an invalid date of birth, and between 145 and 180 LCF records had dummy name variables. In addition, 17,513 Self Assessment records had a null name, although there were no null or blank dates of birth recorded.

Many individuals submitting Self Assessment records use their tax agent's address for correspondence, meaning their personal address is not available to link with the LCF (which records a personal address only). There are also some specific industries, such as London Black Cab drivers, which have one tax agent covering several thousand individuals, all of whom record the same address in their Self Assessment returns. This makes it impossible to link these individuals to the LCF.

Contact details for this Article

Samantha Pendleton
Admin.Based.Characteristics@ons.gov.uk
Telephone: +44 (0)1329 444992