1. Abstract

This article provides an update on the use of commercial data on borrowing acquired from the credit reference agency Equifax for our Enhanced Financial Accounts (EFA) initiative.

The article highlights that the data we have had permission to use so far is not sufficiently detailed to meet the requirements of EFA as set out in an initial article in May 2017. However, we have recently gained permissions to access and use more granular data from Equifax, which should be able to meet these requirements.


2. Introduction

We have ambitious plans to transform our economic statistics over the coming years, informed by our Economic Statistics and Analysis Strategy and with the aim of increasing the robustness and quality of UK economic statistics. Working in partnership with the Bank of England, one main element of our transformation work is the development of Enhanced Financial Accounts (EFA) – in particular more detailed “Flow of Funds” statistics – to meet evolving user needs.

The main aims of the EFA initiative are to improve the quality, coverage and granularity of financial statistics, and one possible avenue for these improvements is the use of commercial data. Commercial data offers several potential benefits over traditional surveys:

  • data can be obtained in a timelier manner and has the potential for greater granularity

  • obtaining data from a single source, rather than from multiple survey respondents, helps ensure that the same definitions are applied across a common subject, leading to higher quality and a reduced burden on households and businesses

  • the coverage of the data can be far greater than from surveys

In the May 2017 article, we set out our plans to analyse commercial data sources covering loans, debt securities and equity, the potential benefits of using commercial data and our broader plans for the EFA initiative. This article reports on the progress made since then on commercial data covering loans.

We are in the process of acquiring commercial data relating to the issuance and ownership of debt securities and equity and will report back on this in the coming months.


3. Context

Credit reference agencies in the UK support lenders (for example, banks and building societies) in assessing the creditworthiness of potential borrowers.

Since the May 2017 article, we have acquired the services of Equifax, a credit reference agency, to supply anonymised and aggregated data on lending by various financial and non-financial companies. This data covers lending to individuals and households as well as lending to commercial organisations.

The matrix shown in Figure 1 highlights, by sector and transaction, the specific areas of the economy the Equifax data will provide information on. The data provides information on loans across most sectors of the economy, with the exception of rest of the world and government. For private non-financial corporations, monetary financial institutions and other financial institutions, the data covers assets and liabilities. For other sectors, it covers liabilities (that is, borrowing) only.


4. Accessing credit reference agency data

Provision of lending data by credit reference agencies is determined by a set of principles governed by the Steering Committee on Reciprocity (SCOR). SCOR is a cross-industry forum made up of representatives from credit industry trade associations, credit industry bodies and credit reference agencies. These principles determine how data can be shared and, as such, defined the service that Equifax could provide to us.

Since acquiring the services of Equifax, we have worked closely with them, building our understanding of the data and their understanding of our requirements. This has enabled us to gain further permissions from SCOR that allow us to access more detailed information from Equifax or another credit reference agency. We have reached agreement with Equifax and will gain access to these data shortly. This data will still be on an anonymised basis for households and unlisted companies.


5. Analysing the usability of commercial data

In the May 2017 article, we set out a series of questions that we needed to answer in order to ascertain whether and how commercial data could be used in the production of the financial accounts. Following our initial analysis of the Equifax data, we return to these questions and provide initial answers.

Question 1. Are the data gained from commercial sources any better than those sourced from surveys?

Approach:

The first question we need to address is what “better” looks like. Each data source will have its own strengths and weaknesses. Some of the sources will be more granular in nature and allow a lot more flexibility in the types of analyses we can carry out. Other commercial sources may offer a far greater coverage than surveys, which will be constrained by sampling. Alternatively, the commercial data may offer better quality or timeliness than the Office for National Statistics (ONS) survey data can. An initial assessment of the strengths and weaknesses of each data source has taken place as part of the procurement exercise but further analysis of the successful bidders’ submissions will take place when we acquire the data. These further analyses will be at the heart of the initial work we will carry out while assessing the quality of the data.

Desired outcome:

Data sources that deliver a product which is a marked improvement in terms of granularity, coverage and quality, when compared with the traditional survey option.

The potential for cost savings will also be considered a benefit.

Initial answer:

The initial permissions granted to us by the Steering Committee on Reciprocity (SCOR) put limitations on what data were initially made available by Equifax. This limited the usefulness of the data when compared with alternative sources, both those currently used in compilation of financial statistics and those alternative sources that are being investigated for future use. As such, the commercial data in its current format has both strengths and weaknesses as a potential source.

In terms of strengths:

Increased volume of data when compared with surveys: The volume of data making up the aggregates supplied to us far exceeds that which could be compiled from survey data, as the aggregates are based on all data available to the commercial provider (many millions of records), whereas surveys are limited to the responses received. Furthermore, all figures are based on real data, whereas estimation (imputation or grossing) has to be applied to survey data.

Improved granularity when compared with surveys: The granularity of lending types available in the aggregated data is also better than in surveys, because of the relatively limited number of questions currently asked of sampled companies. To reduce burden, data are generally requested only for the main categories (such as a long-term and short-term split and certain main lending types). In the detailed Equifax data that underpin these aggregates there are over 30 distinct categories.

In terms of weaknesses:

Sampling frame: It is difficult to accurately determine the coverage of the data as it has not been collected using a statistical sampling frame. However, high-level comparisons with other data sources indicate that a high proportion of the market is covered by the Equifax data. Reasons for under-coverage include the fact that there will be some lenders who do not supply their data to Equifax and differences due to categorisation (institutions self-classify when providing data to Equifax so slight coverage differences may occur). Quantifying the amount of under-coverage (or misclassification) is difficult as we are unable (within the current remit of the data provided to us) to identify which lenders are included in the data and which are not. Without this, we cannot make appropriate adjustments to the data to address gaps left by non-reporting lenders.

Timeliness: The timeliness of new data is currently approximately 11 weeks after the end of the reporting period. ONS currently publishes financial statistics as part of its publication ‘UK Economic Accounts’. This publication is released at the end of the quarter following the reference period (so the Quarter 2 (Apr to June) 2017 release is published at the end of Quarter 3 (July to Sept) 2017). For use within the production of regular statistics, this lag would therefore mean data would be too late to be processed for use within this first release of data.
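
The timing constraint above can be illustrated with simple date arithmetic. This is only a sketch: it assumes that “approximately 11 weeks” means exactly 77 days and that the release date is the last calendar day of the following quarter, neither of which is stated precisely in this article.

```python
from datetime import date, timedelta

# Reference period: Quarter 2 (Apr to June) 2017, as in the example above.
quarter_end = date(2017, 6, 30)

# Assumption: "approximately 11 weeks" is treated as exactly 77 days.
data_available = quarter_end + timedelta(weeks=11)

# Assumption: the release date is taken as the last day of the following quarter.
publication = date(2017, 9, 30)

# Days left between receiving the data and the release date.
processing_window = (publication - data_available).days

print(data_available.isoformat(), processing_window)
```

Under these assumptions the data would arrive roughly two weeks before the release date, which leaves little time for processing and quality assurance.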

Granularity required for the Enhanced Financial Accounts: The Equifax data for both lenders and borrowers is currently not sufficiently granular to meet the proposed breakdowns that will be produced as part of the Enhanced Financial Accounts (EFA). As such, the data would need to be used alongside (rather than replace) other data sources, which would allow this granularity to be produced.

For example, ONS uses data provided by the Bank of England to compile statistics on the monetary financial institutions sector (which largely covers banks and building societies). This would remain the main source of data for this sector’s statistics, but with commercial data able to add more depth (such as a granular geographical dimension) to these statistics. Producing a reliable “map” between the classifications of the commercial data as it is received to our own requirements will have to be based on some assumptions of how the data have been classified (though the aggregation method was derived through collaboration between Equifax and ONS to best meet the needs of EFA).

Annex A displays the current proposed breakdowns along with an indication of how the aggregated data matches these breakdowns. The last two columns in the table show how lenders and borrowers have been grouped or aggregated as part of the data provided. Note that where cells contain “N/A” this is where these sectors are either not covered by the data collected by Equifax or where lenders or borrowers in this category cannot explicitly be identified as included in the data.
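
The kind of “map” described above could be sketched as a simple lookup that aggregates supplier categories into target breakdowns and flags anything left unmapped. The category labels, the mapping and the balances below are all hypothetical and are not taken from Annex A or from the Equifax data.

```python
from collections import defaultdict

# Hypothetical mapping from supplier lending categories to EFA-style
# breakdowns; none of these labels are taken from Annex A.
CATEGORY_MAP = {
    "mortgage": "Loans secured on dwellings",
    "credit card": "Consumer credit",
    "personal loan": "Consumer credit",
}


def map_to_breakdown(records, category_map):
    """Sum balances by mapped breakdown; collect categories with no mapping."""
    totals = defaultdict(float)
    unmapped = set()
    for category, balance in records:
        target = category_map.get(category)
        if target is None:
            unmapped.add(category)
        else:
            totals[target] += balance
    return dict(totals), unmapped


# Made-up aggregated records: (supplier category, balance outstanding).
records = [
    ("mortgage", 100.0),
    ("credit card", 20.0),
    ("personal loan", 30.0),
    ("hire purchase", 5.0),
]

totals, unmapped = map_to_breakdown(records, CATEGORY_MAP)
print(totals)
print(unmapped)
```

Keeping the unmapped categories visible, rather than dropping them silently, makes it easier to see where the assumptions behind the mapping need to be reviewed.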

Summary

Whilst there are improvements in terms of the quality and quantity of data received from Equifax and the characteristics of that data, there are also considerations that would need to be made as to how this would be integrated into the financial accounts alongside numerous other data sources. Although the aggregated nature of the data doesn’t naturally lend itself towards meeting the requirements of EFA, we believe that obtaining a more granular level of data will allow most of these requirements to be met – though timeliness will remain an issue. Future work in this area is outlined in the next steps section at the end of this article.

Question 2. Are there gaps in the commercial data and if so, how would you propose to get an alternative data source?

Approach:

The European System of Accounts: ESA 2010 framework sets out the structure with which financial statistics must be compiled to ensure international comparability. However, the EFA proposals will look to provide further granularity to meet the needs of our varied stakeholders. It is already known that the commercial data available will not be a “silver bullet”, but rather it will be used alongside current and other new sources. Gaps will be identified through understanding the data being used and its coverage and from knowledge picked up through previous reviews of financial statistics. Administrative and regulatory data, as well as ONS surveys, will all be used alongside commercial data to complete the picture.

Desired outcome:

A comprehensive matrix that will highlight areas of concern in terms of coverage, which ONS can then investigate further with alternative data sources.

Initial answer:

In the context of all loans, we are already aware that there are certain parts of the framework that cannot be met by this data source. This includes lending and borrowing by government and rest of the world entities. This, however, was understood before the process of acquiring commercial data began. ONS already uses other data sources in these areas and is also investigating other potential new data sources.

For example, the majority of data on loans involving the government sector are obtained from HM Treasury or other government departments. Data on lending by UK banks to overseas residents is obtained from the Bank of England. Data on lending by overseas banks to the UK is obtained from the Bank for International Settlements for many countries (with ongoing work to incorporate newly-reporting countries). Lending by other overseas institutions is an area of investigation also being undertaken within the EFA development programme, although the framework already includes lending between the UK and overseas institutions where a direct investment relationship exists.

Within other sectors of the economy, there are still some gaps. Certain types of borrowing are not included in the scope of data held by credit reference agencies. Examples of this include pawnbroking and employer-to-employee loans.

Once we have access to the more detailed data from Equifax and the data are confronted with existing sources and compilation methods used in the financial accounts, it will become clear where other data sources are needed. Peer-to-peer lending, for example, is a possible gap, but at this stage we will need to work further with Equifax to understand how well this (relatively) new type of finance is being reported (if at all) in their data. There are some lending types we know are not covered within commercial data, such as lending by unregistered or unauthorised lenders. We will seek alternative sources for these.

In terms of missing institutions, the aforementioned limitations of the current aggregated data have made it harder to identify gaps. For example, whilst we have aggregated data covering lenders across a range of different industries, we are unable to identify which of these industries are well-covered and which are more likely to be missing lenders within the data. We can, however, use some of the aggregated data to get an indication as to how well certain types of institutions (especially in the banking sector) are covered by comparing with aggregated data from other sources.

Question 3. Will you be looking to make comparisons against current data sources?

Approach:

Commercial data will be “mapped” to the ESA 2010 framework and Experimental Statistics produced based on it. This will be compared with any current statistics being produced.

Where feasible, this will be used to determine the quality of the commercial data. In some instances, however, it is acknowledged that current data sources have some deficiencies in quality. In these scenarios, accompanying information and metadata linked to the commercial data will indicate the quality of the new (experimental) estimates. This will be done using the UK Statistics Authority’s Quality Assurance Toolkit.

Desired outcome:

A good outcome from these investigations will show that either:

  • the quality of existing data sources is proven, with commercial data adding further depth and insight or

  • commercial data can replace existing data sources due to its better quality

It is fully acknowledged that it is likely to be a combination of the existing and commercial data sources for different uses of the data.

Initial answer:

As touched upon previously, the usefulness of comparisons between the Equifax data and other data sources is determined by the extent to which the different sources use the same definitions and structures. While working with Equifax on the specification of the service to be provided, we took into account the structure by which ONS compiles the financial accounts (and plans to compile the enhanced financial accounts). Although the Equifax data have been provided as aggregates, they still allow comparisons to be made against other data sources.

It is also important to consider the strength or quality of the data source to which we compare. As stated in our desired outcome, we hope to be able to prove that commercial data can provide an alternative to other data sources due to its better quality. We are, however, making improvements in other areas. With that in mind, here are some comparisons between data compiled from the Equifax data against other sources.

Analysis 1: Comparing with ONS published data

It’s first worth looking at what the commercial data looks like when compared with published data covering similar scope. The UK Economic Accounts (UKEA), which is published in line with the international System of National Accounts framework, includes a category for “Loans secured on dwellings”. The expected coverage of this category is very similar to data received from Equifax in the “Mortgages” category and so we can look to compare these figures to get an indication as to whether the data is comparable.

Figure 2 shows that, while the UKEA figures are consistently above the value of mortgages outstanding in the Equifax data, both datasets show similar trends: a steady quarter-on-quarter increase in the total value of mortgage lending to households since Quarter 3 (July to Sept) 2015. The lower Equifax totals make sense given that we have not tried to account for gaps in coverage of the Equifax data at this stage. Slight definitional differences may also be responsible for some of the gap. We will look to estimate the coverage of Equifax data and compare with a broader range of sources in our ongoing analysis. The figures, however, give a positive indication that the Equifax data show similar characteristics to our other data sources.
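
A comparison of this kind can be sketched by computing, for each quarter, the ratio of the commercial total to the published total, alongside the quarter-on-quarter growth of each series. The figures below are made up for illustration and are not ONS or Equifax data; the function names are likewise hypothetical.

```python
# Made-up quarterly levels (not ONS or Equifax figures), same units in both.
ukea = [1000.0, 1010.0, 1020.0, 1030.0]
equifax = [900.0, 909.0, 918.0, 927.0]


def coverage_ratios(partial, benchmark):
    """Ratio of the partial-coverage series to the benchmark, per period."""
    return [p / b for p, b in zip(partial, benchmark)]


def growth_rates(series):
    """Period-on-period growth, as a proportion of the previous level."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


ratios = coverage_ratios(equifax, ukea)
ukea_growth = growth_rates(ukea)
equifax_growth = growth_rates(equifax)

print([round(r, 3) for r in ratios])
print([round(g, 4) for g in ukea_growth])
print([round(g, 4) for g in equifax_growth])
```

A stable ratio below one combined with matching growth rates would be consistent with a constant level of under-coverage, whereas a drifting ratio would point to changing coverage or definitional differences.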

Analysis 2: Comparisons with banking data

This next analysis looks at data on all lending to consumers (also referred to as households or individuals). The Bank of England publishes its Bankstats release monthly; it covers a range of statistics including lending to individuals.

Figure 3 shows lending to individuals from the Bankstats release against lending from the consumer lending data from Equifax. Again there is a good correlation between these two data sources. Both these datasets include lending from monetary financial institutions (MFIs, which mostly comprise banks and building societies) but also from other institutions, which may include specialist mortgage lenders or institutions providing consumer credit. Though not included in the graph, the underlying data shows the split between lending by MFIs and by non-MFIs to be in similar proportions in the Equifax and Bankstats data (MFIs constituting around 85% of lending).
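
The two checks described above, the correlation between the series and the share of lending accounted for by MFIs, can be sketched as follows. The series and the split are made-up numbers, and the Pearson coefficient is only one possible measure of how closely the two sources move together; the article does not say which measure was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mfi_share(mfi, non_mfi):
    """Proportion of total lending that comes from MFIs."""
    return mfi / (mfi + non_mfi)


# Made-up levels of lending to individuals (not Bankstats or Equifax figures).
bankstats = [100.0, 102.0, 105.0, 107.0]
equifax = [90.0, 92.0, 94.0, 97.0]

r = pearson(bankstats, equifax)
share = mfi_share(85.0, 15.0)

print(round(r, 3), round(share, 2))
```

Computing the MFI share separately for each source and comparing the two results mirrors the check on proportions described in the text.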

Overall conclusion

The comparisons we’ve been able to make with some of our other data sources give encouraging results. While we know that Equifax does not have complete coverage of the markets it reports on, the granularity of its data is very good and offers an improvement over, for example, running a sample survey. However, the aggregated nature of the data makes it difficult to make comparisons across some other sectors. More granular data will allow us to do this. That will, firstly, continue to assure us of the quality of the data and, secondly, move us further towards the granularity of statistics we are looking to produce as part of EFA, through a good-quality source that provides the whom-to-whom relationship between lender and borrower.

Whilst the current service provided (limited by ONS’s previous permissions over access to data) would not be fully fit for purpose in meeting some of the main aims of the EFA programme (including the production of whom-to-whom matrices), we expect this will change given access to more granular data and we are encouraged by the quality of information supplied thus far.
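
To make the idea of a whom-to-whom matrix concrete, the sketch below aggregates loan balances by lending sector and borrowing sector. The records and balances are made up, and the function name is hypothetical; the sector labels follow those used earlier in this article.

```python
from collections import defaultdict

# Made-up loan records: (lending sector, borrowing sector, balance outstanding).
loans = [
    ("Monetary financial institutions", "Households", 80.0),
    ("Other financial institutions", "Households", 15.0),
    ("Monetary financial institutions", "Private non-financial corporations", 40.0),
    ("Monetary financial institutions", "Households", 5.0),
]


def whom_to_whom(records):
    """Aggregate balances into a lender-by-borrower matrix of nested dicts."""
    matrix = defaultdict(lambda: defaultdict(float))
    for lender, borrower, balance in records:
        matrix[lender][borrower] += balance
    return {lender: dict(row) for lender, row in matrix.items()}


matrix = whom_to_whom(loans)

# Column total: all loan liabilities of households, whoever the lender is.
household_liabilities = sum(row.get("Households", 0.0) for row in matrix.values())

print(matrix)
print(household_liabilities)
```

Each row shows a sector's loan assets by counterparty and each column shows a sector's loan liabilities by counterparty. Building such a matrix requires the lender and borrower sectors to be identified on the same records, which is why aggregated data alone cannot support it.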

Question 4. How are you looking to improve data on lending by using commercial data?

Approach:

Lending data by banks is currently of good quality due to our ongoing working relationship with the Bank of England.

Lending by other financial institutions is where we see more scope for improvement due to the current reliance on surveys.

On both sides, we intend to look at adding a further level of detail that couldn’t be obtained via current means.

We have a number of areas of interest including breaking down the types of lending undertaken in greater detail, identifying the levels of unsecured lending and possibly introducing a geographical element to the analysis by looking at borrowing by region.

Desired outcome:

The desired outcome is that more detailed statistics on lending can be produced by using a combination of commercial data and other sources.

Initial answer:

We are required to produce statistics that are fit for use in the national accounts. Current data sources are sufficient to meet legal requirements but, in an ever-changing financial world, it is acknowledged that the methods and sources need to be improved to remain up-to-date with new and changing financial activity. As such, commercial data is seen as a possible source to make these improvements (along with increasing the use of administrative and regulatory data). This was highlighted by the Bean review of economic statistics and is an important component of our Better Statistics, Better Decisions strategy.

Furthermore, we are looking for data that allows us to meet the requirements and ambitions of the Enhanced Financial Accounts, specifically to better understand which sectors of the economy are lending to other sectors. We currently use surveys to meet some of these requirements, but the amount of information that can be obtained in this way is limited to the questions that are asked. Commercial data doesn’t have this restriction and so the intention is that we are able to produce much richer and informative statistics on lending. For example, with more granular detail on the type of lending being undertaken, changes in borrowing behaviour can be observed. Also, with more granular geographical data, it is hoped that it will be possible, alongside other data sources and information available, to identify whether there are areas of increased risk.

There has also been user interest in particular types of lending, such as car finance or the size of the unsecured lending market. While the current service provided by Equifax does not allow this to be explicitly identified (as the aggregated data means it is not possible to produce statistics at the specific levels intended), it is expected that future changes to the service will allow for some of these more targeted areas of interest to be met by a combination of commercial data and other administrative or regulatory sources. By identifying businesses and the type of lending they provide (though always processing and publishing this data in aggregated form), we will look to provide new types of analysis previously not possible due to the information available to us.

Question 5. Are there any other benefits of using commercial data you hope to be able to prove? Conversely, what risks are associated with the use of commercial data?

Approach:

The expectation is that commercial data can be provided in a timelier manner than other ONS data sources; working with commercial companies to design the service provided will show whether this is the case.

Commercial data represents a potential improvement over surveys as a more complete set of statistics can be produced in a quicker fashion, leading to fewer revisions. The quality of data should also prove to be greater than that of surveys due to its potentially wider coverage. This will be investigated.

Surveys can rely on the interpretation of those completing them, which can lead to potential bias. In the long run, there is also an expectation that the costs (both to ONS and business) will be reduced, though this will not be fully researchable at this time.

The risk to using commercially sourced data lies in terms of control. This will be a time of change for ONS and although we embrace it, there will undoubtedly be some moments of uncertainty as we step firmly out of our comfort zone. ONS has always prided itself on the quality of its data outputs and the outsourcing of the initial collection and validation process is something that we will need to monitor. Another risk lies in the fact that the data has not been collected for statistical purposes, which will mean we may need to adapt our method to make best use of it.

Desired outcome:

We will know the risks and benefits, have a clear view on where the commercial data can be best utilised and inform our final decisions on using commercial data.

Initial answer:

We should first look at the benefits. Many of these have been covered with answers to the previous questions, including the ability to produce more granular data and the identification of interactions between sectors of the economy, both main drivers of EFA. While the existing service offered by Equifax may not have allowed us to produce the level of detail outlined in our proposed framework (due to the limitations on what they were permitted to provide), we were able to learn about the level of detail available and use this information to request and gain access to more granular data. The new levels of information available will now allow us to consider commercial data as one of the components in meeting the main aims of EFA.

Conversely it is important to consider the risks of using commercial data. Question 1 covered some of these, such as the timeliness of data and the ability to fully understand the coverage of data. Furthermore, the ability to interrogate and quality assure the data in more detail when working with a “third party” is more limited than it is when working directly with the original source of the data.

In weighing up the benefits and risks, and whether to continue investigating the use of commercial data, we need to consider the availability of alternative data sources and the more granular data we expect to receive as part of the new service from Equifax.


6. Conclusion

Our initial analysis of the Equifax data has highlighted that, while coverage and quality appear promising, it does not contain sufficient detail in its aggregated form to fulfil the requirements of the Enhanced Financial Accounts initiative. However, we are confident that the more detailed anonymised data that we now have permission to access will enable us to fulfil these requirements.


7. Next steps

Figure 4 shows a broad timeline for the next steps that will be undertaken in using both Credit Reference Agency data and other data sources as part of the wider plans to enhance data on lending and borrowing for the Enhanced Financial Accounts.


8. Author and acknowledgements

Author: Pete Jones

Acknowledgement: The author would like to thank Richard Campbell, Sarah Adams, Abbe Williams, Laurence Day and Kay Adaramodu for their assistance.


10. Annex A: Equifax data classification mapped to EFA proposed breakdown


Contact details for this Article

Sarah Adams
FlowOfFundsDevelopment@ons.gov.uk
Telephone: +44 (0)1633 455787