## 1. Main points

• We have broken down the source data used in the regional measurement of gross value added (GVA) into three types and calculated the share of each type for both the income measure (GVA(I)) and the production measure (GVA(P)), using data for 2014.

• The first type covers data that are directly observed at a regional level, collected in a way that can be immediately and wholly assigned to a single region; at the aggregate UK level, there is around 32% observed data in GVA(I) and around 23% in GVA(P).

• The second type covers data that are not directly observed, but are estimated using sampling and weighting techniques common to all sample surveys; at the aggregate UK level, there is around 51% estimated data in GVA(I) and around 49% in GVA(P).

• The third type covers data that are modelled to provide regional estimates, most often by the apportionment of data collected for a larger area using some regional indicator; at the aggregate UK level, there is around 17% modelled data in GVA(I) and around 29% in GVA(P).

• Most countries and regions of the UK have similar shares of observed, estimated and modelled data; the exception is Northern Ireland, where there is a much higher proportion of observed data and much lower proportion of modelled data, due to the separate data collection that takes place there.

## 2. Introduction

Statistics are compiled using a variety of techniques and sources of information. Some are simple and therefore easy to understand. Others are complex, involving many different components measured in different ways and combined to provide aggregate estimates. The UK Regional Accounts measure of gross value added (GVA) is a very complex statistic that combines two independent measures, each compiled using hundreds of input datasets to represent individual components of GVA.

Users of regional GVA statistics have expressed interest in better understanding the extent to which these estimates are based on observed data, that is, information provided by businesses or government that relates directly to activities taking place in a known geographic area.

The assumption underlying this interest is that the more observed data are present in the estimates, the more accurate those estimates will be. Where statistical techniques, such as sampling and apportionment, are used to help derive regional estimates, there is an increased risk that the methods used will introduce sampling error or modelling imprecision to the data, with the possibility of producing misleading results.

This article presents an analysis of the UK balanced measure of regional gross value added (GVA(B)) published in December 2017, focusing on the data sources and methods used to measure activity in the countries and regions of the UK during 2014. In the analysis, we estimate the proportion of GVA that is directly observed, the proportion that is derived by sampling and estimation (using weighting to represent non-sampled activity) and the proportion that is derived by apportionment of data relating to larger units that cross regional borders, or by other forms of modelling that involve using different variables to guide the regional allocation of a component of GVA.

## 3. How do we measure regional gross value added?

In the UK Regional Accounts, we measure regional gross value added (GVA) twice, using different approaches and different data sources, to compile two independent estimates for each area. The two approaches are the income approach (GVA(I)) and the production approach (GVA(P)). We then put these two independent estimates through a “balancing process” that seeks to identify the relative strength of each and produce a single “balanced” estimate of GVA for each region (GVA(B)).

The two measures are conceptually equal but use different methods to derive GVA. The income approach sums the components of income (compensation of employees, mixed income, rental income, gross trading profit and surplus, non-market capital consumption, holding gains, taxes less subsidies on production) to give a measure of GVA. In the production approach, GVA is calculated as total output of goods and services less the value of goods and services used up in the production process (intermediate consumption).

In addition, there are several places in both measures where some element of GVA is separated out for a particular purpose, requiring an individual treatment to apportion it to the regions. Many of these have arisen following the implementation of changes necessary to comply with the terms of the European System of Accounts 2010: ESA 2010, as new data sources have been found to measure specific activities in both national and regional accounts. As a result, we have a fairly long list of components that are used in each of the two measures of regional GVA.

Once both income and production measures have been compiled, we assign quality measures to each of those components and weight them together, according to each component’s contribution to GVA, to give a pair of quality metrics for the two estimates. These quality metrics are used to inform a weighted arithmetic mean of the two independent estimates, which gives us GVA(B).

Further details of the balancing process can be found in the article Development of a balanced measure of regional gross value added.

## 4. Analysing regional gross value added data

Each of the components used in the measurement of regional gross value added (GVA) starts with a value for the UK as a whole, which is taken from the latest UK National Accounts Blue Book dataset. We then use the most appropriate available regional indicator to allocate the national total to parts of the UK, in a top-down hierarchical process. In this way we ensure that all of the regions sum to the published UK total and all sub-regions sum to their respective region total.

The regional data used to inform the distribution of the national value cover a wide range of sources, including several of the major Office for National Statistics (ONS) business surveys, surveys carried out by other government departments and various administrative data collected by government for non-statistical purposes, such as monitoring and regulation. Where possible we aim to use a regional measure of the same component or variable that we are trying to allocate, but in some cases no regional data are available. In these cases, we have to use a modelling approach in which we use the distribution of another variable that we believe to be similarly distributed to provide a pattern.

By understanding how each of these data sources collects its data, we can calculate or estimate the proportion of each source that represents directly observed data, data estimated using weights for non-sampled units, and data apportioned down from larger units or otherwise modelled using a proxy indicator. We can then weight the components together according to their contribution to GVA to provide aggregate measures of the three types of data.

We begin by ranking all the components in order of their contribution to GVA. That way we can address those with the largest weight first and build up our picture piece by piece until we have covered a sufficient amount of the total to provide reliable and representative results.

In the income measure, the largest components are (weights in 2014 UK GVA):

• compensation of employees (55%)

• rental income (13%)

• mixed income (5%)

• non-market capital consumption (2%)

• taxes on production (2%)

• all other components (0.5% or less)

In the production measure, the largest components are:

• output less intermediate consumption (71%)

• government including public corporations (13%)

• imputed rental of owner-occupiers (9%)

• self-employed (3%)

• research and development (1%)

• offshore oil and gas extraction (1%)

• non-profit institutions serving households (0.6%)

• agricultural output and intermediate consumption (0.6%)

• all other components (0.4% or less)

If we can cover all of the named components we will achieve approximately 99% coverage of both measures. All of the other components have negligible impact. We can then scale the aggregate results to provide representative percentage shares of the entire GVA measure.
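The ranking, weighting and rescaling steps described above can be sketched as follows. This is a minimal illustration, not the ONS processing system: the component weights are taken from the production-measure list, but the observed, estimated and modelled splits for each component are hypothetical, and the function name `aggregate_shares` is our own.

```python
from typing import Dict, Tuple

# A data-type split gives the fraction of a component that is observed,
# estimated or modelled; the three fractions sum to 1.
Split = Dict[str, float]


def aggregate_shares(components: Dict[str, Tuple[float, Split]]) -> Tuple[Split, float]:
    """Weight each component's data-type split by its share of GVA.

    Components are processed from largest to smallest weight, mirroring the
    ranking approach in the text. Returns the shares rescaled to the covered
    weight, together with that coverage.
    """
    totals = {"observed": 0.0, "estimated": 0.0, "modelled": 0.0}
    coverage = 0.0
    ranked = sorted(components.items(), key=lambda item: item[1][0], reverse=True)
    for _name, (weight, split) in ranked:
        for data_type, fraction in split.items():
            totals[data_type] += weight * fraction
        coverage += weight
    # Rescale so the shares describe the whole measure, not just the covered part.
    return {k: v / coverage for k, v in totals.items()}, coverage


# Weights follow the production-measure list above; the splits are hypothetical.
example = {
    "output less intermediate consumption": (0.71, {"observed": 0.20, "estimated": 0.50, "modelled": 0.30}),
    "government including public corporations": (0.13, {"observed": 0.30, "estimated": 0.70, "modelled": 0.00}),
    "imputed rental of owner-occupiers": (0.09, {"observed": 0.00, "estimated": 1.00, "modelled": 0.00}),
    "self-employed": (0.03, {"observed": 0.95, "estimated": 0.00, "modelled": 0.05}),
}
shares, coverage = aggregate_shares(example)
```

Dividing by the covered weight (here 0.96) is what turns shares of the covered components into representative shares of the whole measure.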

To provide useful results that allow comparisons to be made we will carry out the analysis for each Nomenclature of Units for Territorial Statistics (NUTS)1 country and region of the UK, including the Extra-Regio category for activity that cannot be assigned to a single mainland region, and also derive aggregates for England and the UK as a whole.

Results will be presented for each Standard Industrial Classification 2007: SIC 2007 section (A to T) and in aggregate for the whole economy measure of GVA.

## 5. Principal data sources used in regional gross value added

There are many data sources used to estimate the regional allocation of the various components of regional gross value added (GVA). In fact, there are over 400 input datasets that are used across the income and production measures. However, only relatively few of them have a weight sufficient to be worth considering in this analysis.

In this section we describe the main features of the data sources that have the greatest contribution to regional GVA, with particular focus on the extent of modelling and estimation they employ.

### Annual Business Survey

The Annual Business Survey (ABS), the principal structural business survey operated by the Office for National Statistics (ONS), collects a range of financial information from UK companies. Data are collected for the whole company (reporting unit), but regional results are compiled through a regression model that apportions data to individual sites (local units). Single-site businesses are not affected by this model, since all of their data relate to one location.

Data are collected by ONS from around 62,000 businesses in Great Britain, and by the Department of Finance Northern Ireland (DoF) from around another 9,000 businesses in Northern Ireland.

Sample selection is carried out using a stratified random sample design. Groups of reporting units (cells) are defined by three strata: employment size band; industry; and geographical region. There are around 4,000 of these cells in the ABS design.

All businesses in the largest employment size band are selected every year. For most industries, this band covers all businesses with 250 or more employees, but for a few industries with large numbers of businesses above 250 employees, the threshold is raised to 1,000 or more employees. The smallest businesses (0 to 9 employees) in any industry are selected for one year only and then not selected again for the next three years.
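The selection rules above can be expressed as a few small functions. This is a sketch under stated assumptions: the intermediate size bands (10 to 49 and 50 to 249) and the set of industries with a raised 1,000-employee threshold are hypothetical, since the text gives only the 0 to 9 band and the 250 and 1,000 thresholds.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical industries for which the full-enumeration threshold is raised
# to 1,000 employees; the actual list is not given in the text.
RAISED_THRESHOLD_INDUSTRIES = {"retail", "wholesale"}


@dataclass(frozen=True)
class ReportingUnit:
    industry: str
    region: str
    employment: int


def size_band(employment: int) -> str:
    # Only the 0 to 9 and 250+ bands come from the text; the others are assumed.
    if employment < 10:
        return "0-9"
    if employment < 50:
        return "10-49"
    if employment < 250:
        return "50-249"
    return "250+"


def cell(ru: ReportingUnit) -> Tuple[str, str, str]:
    """The sampling cell: employment size band x industry x region."""
    return (size_band(ru.employment), ru.industry, ru.region)


def fully_enumerated(ru: ReportingUnit) -> bool:
    """Businesses at or above the threshold are selected every year."""
    threshold = 1000 if ru.industry in RAISED_THRESHOLD_INDUSTRIES else 250
    return ru.employment >= threshold


def eligible(ru: ReportingUnit, years_since_selected: Optional[int]) -> bool:
    """Businesses with 0 to 9 employees rest for three years after selection."""
    if ru.employment >= 10 or years_since_selected is None:
        return True
    return years_since_selected > 3
```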

To calculate the estimates for an entire population from data collected from a sample, ABS uses standard statistical weighting methods. Essentially the results received from the sample are multiplied by two weights.

The a-weight, also known as the design weight, accounts for the sample design so that each business’s probability of selection is properly reflected. For example, a business with a small probability of being selected for the survey will have a large design weight.

The g-weight, or calibration factor, makes a correction for any potential bias in the selected sample. For example, in a random selection of five businesses out of a population of 10, it is possible that the five businesses selected have, by chance, higher values for the variables of interest than the non-sampled businesses. If no correction is made, the population total would be over-estimated. Auxiliary information, that is, information not collected by the survey, which acts as a proxy for the variable of interest, is used to correct for this effect. The ratio of the actual population total for the auxiliary variable to the population total estimated from the sample’s auxiliary variables is calculated and this is called the g-weight. For ABS, the auxiliary variables are the employment and turnover held on the Inter-Departmental Business Register (IDBR), with the choice dependent on the variable being estimated.

The weighted value is calculated using the following formula:

weighted value = returned value of the variable × a-weight × g-weight

Estimates of population totals are then found by summing the weighted values over the whole sample. This method of estimating population values from samples is common to most ONS business surveys, with only minor differences in detail between surveys. We describe it at some length here because, unless explicitly stated otherwise, it can be assumed to apply to the other ONS sources described in this section.
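As a worked illustration of the two weights, the sketch below estimates a total for a single cell, reusing the five-out-of-ten example. It assumes simple random sampling within the cell (so the a-weight is the population count divided by the sample count) and a single auxiliary variable; real ABS weighting is more detailed.

```python
from typing import List, Tuple


def weighted_total(
    sample: List[Tuple[float, float]],
    population_size: int,
    population_aux_total: float,
) -> float:
    """Estimate a population total for one cell.

    sample: (returned value, auxiliary value) pairs; the auxiliary value could
    be, for example, IDBR turnover. Assumes simple random sampling in the cell.
    """
    n = len(sample)
    a_weight = population_size / n  # design weight: inverse selection probability
    estimated_aux_total = sum(a_weight * aux for _, aux in sample)
    g_weight = population_aux_total / estimated_aux_total  # calibration factor
    # weighted value = returned value * a-weight * g-weight, summed over the sample
    return sum(value * a_weight * g_weight for value, _ in sample)


# Five businesses sampled from a population of 10, as in the example above.
sample = [(10.0, 8.0), (12.0, 10.0), (9.0, 7.0), (15.0, 12.0), (14.0, 13.0)]
estimate = weighted_total(sample, population_size=10, population_aux_total=90.0)
```

Here the sample's auxiliary values imply a population total of 100 against a known total of 90, so the g-weight of 0.9 pulls the unadjusted estimate of 120 down to 108.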

To produce ABS regional data, the reporting unit data must be apportioned amongst the local units of that business. Regional data are apportioned based on local unit industry classification, employment size and regional location.

The regional ABS methodology uses information held on the IDBR for local unit employment to compile detailed estimates below the national level. Since no local unit information is collected by the ABS, the reporting unit data are apportioned amongst the constituent local units in line with a regression model. The covariates used in this model represent industry, geography and employment size bands. The model parameter estimates are obtained by fitting the model which best predicts the data gathered from reporting units with very few local units.
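The text does not give the functional form of the regression model, so the following is only a hypothetical sketch of how model-based apportionment can work. It assumes a log-linear model for value per employee with additive industry, region and size-band effects, uses invented coefficients in `EFFECTS`, and then scales the local unit predictions so that they sum to the reporting unit total.

```python
import math
from typing import Dict, List, NamedTuple


class LocalUnit(NamedTuple):
    site_id: str
    industry: str
    region: str
    size_band: str
    employment: int


# Hypothetical fitted effects on log value per employee. In practice these
# would be estimated from reporting units with very few local units.
EFFECTS: Dict[str, Dict[str, float]] = {
    "industry": {"C": 0.20, "G": -0.10},
    "region": {"London": 0.30, "Wales": -0.15},
    "size_band": {"10-49": 0.00, "50-249": 0.05},
}


def apportion(ru_total: float, local_units: List[LocalUnit]) -> Dict[str, float]:
    """Split a reporting-unit total across its local units."""
    predicted = {}
    for lu in local_units:
        log_per_head = (
            EFFECTS["industry"][lu.industry]
            + EFFECTS["region"][lu.region]
            + EFFECTS["size_band"][lu.size_band]
        )
        # Model prediction for this site: employment times predicted value per head.
        predicted[lu.site_id] = lu.employment * math.exp(log_per_head)
    # Scale predictions so the local units sum exactly to the reporting-unit total.
    scale = ru_total / sum(predicted.values())
    return {site: value * scale for site, value in predicted.items()}


sites = [
    LocalUnit("A", "C", "London", "50-249", 60),
    LocalUnit("B", "C", "Wales", "10-49", 40),
]
allocation = apportion(1_000.0, sites)
```

A reporting unit with a single local unit receives its whole total, which is consistent with single-site businesses being unaffected by the model.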

The ABS provides several variables that are used in regional GVA. Foremost of these are estimates of approximate gross value added (aGVA) and total purchases, which are used in the production measure (GVA(P)) to estimate output less intermediate consumption for a wide range of industries. In the income measure (GVA(I)), we use estimates of total employee costs to represent the compensation of employees in the manufacturing industries and we derive estimates of approximate gross operating surplus by subtracting total employee costs from aGVA, which are used to represent gross trading profits for many industries.

The ABS data are not used in the agriculture or finance industries, or for the non-market public services or activities of households.

Of all the data sources used in regional GVA, the ABS has the greatest overall impact, representing around 71% of GVA(P) and 22% of GVA(I). It also includes elements corresponding to all three of the categories of data we wish to analyse: directly collected from businesses operating in a single region; weighted to represent non-sampled businesses; and apportioned to regions from UK-wide company information.

### Business Register and Employment Survey

The ONS Business Register and Employment Survey (BRES) publishes employee and employment estimates at detailed geographical and industrial levels. It collects comprehensive employment information from businesses in England, Scotland and Wales, representing the majority of the Great Britain economy. The Department of Finance Northern Ireland (DoF) collects the same BRES information independently in Northern Ireland. Both data sources are then combined to produce estimates on a UK basis.

BRES provides data on the number of employees in the UK in the public or private sector (determined by the legal status of the business) and those working on a full-time or part-time basis. For the purpose of the survey, part-time is classified as 30 hours per week or less. BRES data are also used to update the Inter-Departmental Business Register (IDBR), which is the main sampling frame used for most of our business surveys. The survey sample, of approximately 80,000 businesses, is weighted up to represent the economy covering all sectors, using methods similar to those described for the ABS.

If an enterprise is selected for BRES, then all its constituent local units are selected, and data are requested from each local unit. Broadly, the sample is stratified into large or complex enterprises, unusual enterprises, and medium and small enterprises. Medium and small enterprises are further stratified by country (England, Scotland and Wales) and two-digit SIC 2007. The strata containing large, complex or unusual enterprises, and those containing medium enterprises in Scotland and Wales, are fully enumerated.

The Northern Ireland sample in 2014 was around 12,000 businesses, and this is the sample represented in this analysis. In alternate years a larger sample is used, of either 30,000 businesses or a full census of around 70,000 businesses, so the proportions of observed and estimated data for Northern Ireland vary considerably from year to year. Because 2014 used the smaller sample, our results show the lowest proportion of observed data.

BRES employees data are used in conjunction with earnings estimates from the Annual Survey of Hours and Earnings (ASHE) to derive estimates of compensation of employees in regional GVA(I) for all industries except manufacturing and households.

BRES public sector employee estimates are used in regional GVA(P) to represent the non-market output of government and other public services. BRES employees data are also used in GVA(P) for the measurement of parts of the finance industry and the activities of non-profit institutions serving households.

BRES is the second-largest data source used in regional GVA, representing around 24% of GVA(I) and 7% of GVA(P). Because data are collected for each local unit, there is no apportionment of BRES data to derive regional results, so it only contributes to the observed and estimated data categories, the latter comprising the weighting of data to represent non-sampled units.
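One simple way to separate the observed and estimated parts of a weighted survey total is to treat the unweighted returned values as observed and the remainder of the weighted total as estimated. The sketch below assumes that definition; the article does not spell out the exact calculation, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def observed_estimated_shares(
    returned: Sequence[float], weights: Sequence[float]
) -> Tuple[float, float]:
    """Split a weighted survey total into observed and estimated parts.

    returned: values reported by responding units.
    weights: the combined weight for each unit (for example a-weight * g-weight).
    """
    observed = sum(returned)
    total = sum(value * weight for value, weight in zip(returned, weights))
    observed_share = observed / total
    return observed_share, 1.0 - observed_share


# Each response represents about 200 units: mostly estimated data.
heavily_weighted = observed_estimated_shares([500.0, 700.0], [200.0, 200.0])
# Fully enumerated responses with a weight of 1: entirely observed data.
enumerated = observed_estimated_shares([500.0, 700.0], [1.0, 1.0])
```

If units in fully enumerated strata carry a weight of 1, they contribute only observed data, while a response weighted by 200 contributes 0.5% observed and 99.5% estimated data.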

### Annual Survey of Hours and Earnings

The ONS Annual Survey of Hours and Earnings (ASHE) is the most comprehensive source of earnings information in the UK. It provides information about the levels, distribution and make-up of earnings and hours paid for employees by sex and full-time and part-time working. Estimates are available for various breakdowns, including industries, occupations, geographies and age groups. ASHE is used to produce hours and earnings statistics for a range of weekly, annual and hourly measures.

ASHE is based on a 1% sample of employee jobs taken from HM Revenue and Customs (HMRC) Pay As You Earn (PAYE) records. Information on earnings and hours is obtained from employers and treated confidentially. The total sample size is around 180,000 employee jobs. When non-response is taken into account, the achieved sample is around 0.5% of the working population.

ASHE is the official source of estimates for the number of jobs paid below the National Minimum Wage. ASHE is also used to produce estimates of the proportion of jobs within each workplace pension category. Since ASHE is a survey of employee jobs, it does not cover the self-employed or any jobs within the armed forces. Given the survey reference date in April, the survey does not fully cover certain types of seasonal work, for example, employees taken on for only summer or winter work.

Returned data are weighted to UK population totals from the Labour Force Survey (LFS) based on classes defined by occupation, region, age and sex. There are two processes involved in the weighting of responses for ASHE. The first allocates individual cases a design weight to adjust for non-response. For this purpose, responses are treated as being in one of four strata, depending on whether they were part of the original questionnaire despatch, one of the later supplementary surveys or have a special arrangement in place with ONS to return their data electronically.

For the second part of the weighting, the final file of responses is post-stratified to population estimates taken from the LFS in 108 post-strata. These post-strata are defined as a cross-classification of occupation (nine groups), age band (three groups), gender (two groups) and region (two groups: London and South East; and the rest of the UK), giving 9 × 3 × 2 × 2 = 108 cells.

ASHE average weekly earnings data are used in conjunction with estimates of employee numbers from the Business Register and Employment Survey (BRES) to derive estimates of compensation of employees in regional GVA(I) for all industries except manufacturing and households.
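The 108 post-strata can be enumerated from the four classification variables and used to derive post-stratification factors, as in this sketch. It assumes a simple cell-by-cell ratio to the LFS totals applied on top of the design weight; the age-band labels are placeholders because the band boundaries are not given, and empty cells, which would need collapsing in practice, are not handled.

```python
from itertools import product
from typing import Dict, Tuple

OCCUPATION_GROUPS = tuple(range(1, 10))    # nine groups
AGE_BANDS = ("band 1", "band 2", "band 3")  # placeholder labels
SEXES = ("female", "male")
REGIONS = ("London and South East", "rest of UK")

Cell = Tuple[int, str, str, str]
POST_STRATA = list(product(OCCUPATION_GROUPS, AGE_BANDS, SEXES, REGIONS))


def post_stratification_factors(
    design_weighted: Dict[Cell, float], lfs_totals: Dict[Cell, float]
) -> Dict[Cell, float]:
    """Factor per cell = LFS population total / sum of design weights in the cell.

    Each response's final weight is its design weight times its cell's factor.
    """
    return {cell: lfs_totals[cell] / design_weighted[cell] for cell in POST_STRATA}


factors = post_stratification_factors(
    {c: 100.0 for c in POST_STRATA}, {c: 250.0 for c in POST_STRATA}
)
```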

ASHE public sector earnings estimates are used in regional GVA(P) to represent the non-market output of government and other public services, and for the measurement of parts of the finance industry.

ASHE represents around 24% of regional GVA(I) and 6% of regional GVA(P), making it the third-largest data source used. Similarly to BRES, it involves no apportionment, but owing to the relatively small sample proportion the data are effectively 99.5% estimated through weighting to represent non-sampled workers.

### Rental data from the Valuation Office Agency

Actual rentals for housing paid by households are an estimate of the housing services consumed by households who are actually renting their residence. Imputed rentals can be described as the income that a homeowner effectively pays themselves, equivalent to what they could have received if they rented their house to a tenant. In the UK, actual rentals are substantially smaller than imputed rentals, as a majority of households own their homes.

For actual rentals we use Valuation Office Agency (VOA) data on rental prices in England and Wales, and similar data from the devolved administrations for Scotland and Northern Ireland. The same rental data are also used to estimate imputed rental, since owner-occupation involves no actual transaction that could be observed directly.

These administrative sources have a combined sample size of over 500,000 properties per year and are stratified by region and dwelling type. In addition, they enable furnished and unfurnished properties to be separately identified along with actual rentals for second homes. The regions used are Wales, Scotland, Northern Ireland and the nine English regions and the dwelling types are flats (including maisonettes), terraced houses, semi-detached houses and detached houses.

Within each of these strata, private actual rentals are calculated as the average price of privately rented dwellings multiplied by the number of dwellings, with different prices used for furnished and unfurnished properties. The number of dwellings is sourced from the Ministry of Housing, Communities and Local Government (MHCLG). Total private actual rentals are then the sum of private actual rentals across all the strata.
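The stratified calculation of private actual rentals amounts to multiplying and summing, as in this minimal sketch. The figures are invented for illustration, and the sketch also totals by region, which the stratification by region makes possible.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (region, dwelling type, furnished) -> (average annual rent, number of dwellings)
Stratum = Tuple[str, str, bool]


def private_actual_rentals(strata: Dict[Stratum, Tuple[float, int]]) -> Dict[str, float]:
    """Average rent times dwellings in each stratum, summed to regional totals."""
    by_region: Dict[str, float] = defaultdict(float)
    for (region, _dwelling_type, _furnished), (avg_rent, dwellings) in strata.items():
        by_region[region] += avg_rent * dwellings
    return dict(by_region)


# Illustrative figures only.
example = {
    ("Wales", "flat", True): (6_000.0, 10),
    ("Wales", "flat", False): (5_000.0, 20),
    ("Scotland", "detached house", False): (9_000.0, 5),
}
regional = private_actual_rentals(example)
total = sum(regional.values())
```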

We use the imputed rental data in both measures of regional GVA and the actual rental data in the income measure, since it has a separate rental income component. The production approach includes actual rental within the more general output of the real estate industry. The rental data represent around 10% of GVA(I) and around 9% of GVA(P). The actual rental data contain around 6% observed data, based on the approximate sample size of 500,000 properties, whereas the imputed data are by definition entirely estimated.

### Self-assessment data from HM Revenue and Customs

Her Majesty’s Revenue and Customs (HMRC) provide administrative data on the profits of sole traders and partnerships, collected from self-assessment forms required for taxation purposes.

The data are supplied on a financial year basis and are converted to calendar years. Once converted, the data are lagged by a year compared with the published GVA data, so they are not included in the provisional year estimates. The data are based on an extract of almost 100% of the self-assessment data, making it a virtual census of the self-employed workforce.
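The text does not describe how financial-year data are converted to calendar years. A common simple approach, assumed here, is to pro-rate: three months of a calendar year fall in one financial year and nine months in the next. The sketch approximates the tax year (6 April to 5 April) as April to March and assumes activity is spread evenly through the year; the function name is hypothetical.

```python
from typing import Dict


def calendar_year_value(fy_values: Dict[int, float], year: int) -> float:
    """Convert financial-year values to a calendar year by simple pro-rating.

    fy_values maps the starting year of each April-to-March financial year
    to its value. Assumes activity is spread evenly through the year.
    """
    # January to March of `year` belongs to the financial year starting in year - 1;
    # April to December belongs to the financial year starting in `year`.
    return 0.25 * fy_values[year - 1] + 0.75 * fy_values[year]


value_2014 = calendar_year_value({2013: 100.0, 2014: 200.0}, 2014)
```

Under this approach, calendar year 2014 cannot be completed until data for the financial year starting in April 2014 are available, which illustrates how such a conversion can contribute to a lag.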

The data provided by HMRC are allocated to regions of the UK according to the usual residence of the person completing the self-assessment form. In regional GVA, we allocate GVA according to the place where the activity takes place, what we term a “workplace basis”. While it is likely that the vast majority of self-employed people carry out the bulk of their work within their NUTS1 region of usual residence, this assumption loses credibility when applied to smaller geographic areas. For the purpose of this analysis, we have assumed that most of the self-assessment data are observed data, as we are only concerned with the larger geographic areas.

The exceptions are for industries and regions where the data have been suppressed by HMRC to avoid disclosure of personal information, owing to small numbers of contributors. In these cases, we have imputed values for the missing data and for this analysis, we have classified these as modelled data. In general, they represent a very small part of the self-assessment data.

For regional GVA(I), HMRC self-assessment data for partnerships and sole traders are used respectively as regional indicators for the gross trading profits of partnerships and the mixed income of self-employed workers. Together these represent around 7% of regional GVA(I).

For regional GVA(P), HMRC self-assessment data for sole traders are used as a regional indicator for the GVA of sole traders, for a subset of industries identified as having a significant proportion of self-employed workers. In this case, the self-assessment data are used because the ABS has a relatively poor coverage of sole trader enterprises. This represents around 3% of regional GVA(P).

### Business Enterprise Research and Development Survey

The purpose of the ONS Business Enterprise Research and Development (BERD) Survey is to provide estimates of businesses' expenditure and employment relating to research and development (R&D) performed in the UK. It provides information on expenditure on R&D performed by UK businesses, the source of funding for this R&D work, and the employment of people working on R&D.

The sample is drawn from a continually updated register of known R&D performers in England, Scotland and Wales. In addition, businesses in Northern Ireland are surveyed by the Department of Finance Northern Ireland (DoF) and their estimates added to those we collect to form UK totals. Approximately 5,400 (that is, 4,000 Great Britain and 1,400 Northern Ireland) questionnaires are sent to businesses known to perform R&D.

Smaller businesses identified as R&D performers are sampled using various sampling fractions. The selected businesses are sent a shorter version of the R&D questionnaire, which requests just the R&D expenditure and employment totals. The detailed information for these businesses that is not collected on the short questionnaire is estimated using the data received from the questionnaires of larger R&D performers. Totals for the non-sampled businesses are estimated using ratio estimation with business employment as the auxiliary variable.
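Ratio estimation with employment as the auxiliary variable can be sketched as below. It assumes the ratio of R&D expenditure to employment is computed from sampled businesses in the same stratum and then applied to the known employment of non-sampled businesses; the function name and arguments are hypothetical.

```python
from typing import Sequence


def ratio_estimate(
    sampled_rd: Sequence[float],
    sampled_employment: Sequence[float],
    nonsampled_employment_total: float,
) -> float:
    """Estimate R&D for non-sampled businesses from sampled R&D per employee."""
    ratio = sum(sampled_rd) / sum(sampled_employment)  # R&D per employee in the sample
    return ratio * nonsampled_employment_total


estimate = ratio_estimate([100.0, 300.0], [10.0, 30.0], 50.0)
```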

Changes introduced by the System of National Accounts 2008 (SNA 2008) and the European System of Accounts 2010 (ESA 2010) specify that, from 2014 onwards, R&D should no longer be treated as an ancillary activity used up in the production process as intermediate consumption. Instead, expenditure on R&D constitutes investment in R&D assets, which must therefore be capitalised in the UK National Accounts. Since this change, R&D expenditure has contributed to the value of the UK’s net worth, adding to the stock of assets, and has been included as part of UK gross domestic product (GDP) and regional GVA estimates.

To produce regional estimates of R&D, each business receiving the long questionnaire (the 400 largest R&D performers accounting for approximately 75% of total R&D expenditure) is asked to provide the workplace postcodes for all the sites at which the business performed R&D and to allocate the total expenditure figures of the business to the sites on a percentage basis.

Data for businesses receiving a short form and those not sampled, accounting for around 22% of the total, have their regional proportions estimated by using the county region code for each of these businesses on the business register as a proxy for where their R&D is being performed. The Northern Ireland returns are a virtual census and account for the remaining 3% of the UK total.
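The two allocation routes described above, site-level percentages for long-form respondents and the register region as a proxy for everyone else, can be sketched as follows. The sketch assumes workplace postcodes have already been mapped to regions, keeps track of how much expenditure each route allocates, and omits the Northern Ireland census returns; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def allocate_berd(
    long_form: List[Tuple[float, Dict[str, float]]],
    register_based: List[Tuple[float, str]],
) -> Tuple[Dict[str, float], float, float]:
    """Allocate business R&D expenditure to regions.

    long_form: (total expenditure, {region: percentage of expenditure}) for each
        business that reported its R&D sites.
    register_based: (expenditure, register region) for short-form and
        non-sampled businesses, whose region is used as a proxy.
    Returns regional totals plus the site-reported and proxy-allocated amounts.
    """
    regional: Dict[str, float] = defaultdict(float)
    site_reported = proxy_allocated = 0.0
    for total, percentages in long_form:
        if abs(sum(percentages.values()) - 100.0) > 1e-6:
            raise ValueError("site percentages must sum to 100")
        for region, pct in percentages.items():
            regional[region] += total * pct / 100.0
        site_reported += total
    for expenditure, region in register_based:
        regional[region] += expenditure
        proxy_allocated += expenditure
    return dict(regional), site_reported, proxy_allocated


regions, site, proxy = allocate_berd(
    [(1000.0, {"North East": 60.0, "Wales": 40.0})], [(200.0, "Wales")]
)
```

In the terms of this article, the site-reported amount would correspond broadly to observed data and the proxy-allocated amount to modelled data.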

The BERD sample and survey results only cover business enterprises. This excludes government organisations, higher education establishments and non-profit organisations. Information on R&D carried out by these bodies comes from the related Government Research and Development (including research councils) survey (GovERD), and the Private Non-Profit Research and Development (PNP) survey. Higher education R&D (HERD) data are collected from a census of higher education institutes and provided to us by the Higher Education Funding Councils (HEFCs).

### Oil and gas extraction data from Scottish Government and BEIS

The vast majority of oil and gas extraction takes place in the North Sea oilfields and is assigned to the Extra-Regio category for activity that cannot be assigned to a single mainland region. However, a small fraction of the activity does take place on land, and this part is allocated to countries and regions.

Scottish Government produces a regular annual analysis of the oil and gas extraction industry, which includes estimates for the total UK activity and also for the offshore part. This allows us to derive the onshore part by subtraction.

The offshore activity is all assigned to the Extra-Regio category. The onshore activity is allocated to regions using volume data from the Department for Business, Energy and Industrial Strategy (BEIS), which provides the actual amounts of oil and gas extracted from each field on UK land.

The data provided by BEIS also cover the offshore oilfields and are used by Scottish Government as the basis for their estimates, ensuring consistency between onshore and offshore allocation. The different types of output (for example, crude oil, natural gas) are all converted to a standard unit of tonnes of oil equivalent (TOE) so that activity can be allocated on an equivalent basis.
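The onshore calculation, deriving onshore activity by subtraction and allocating it using volumes converted to tonnes of oil equivalent, can be sketched like this. The conversion factors in `TOE_PER_UNIT` are placeholders rather than BEIS's published factors, the function name is hypothetical, and the offshore part, which goes wholly to Extra-Regio, is simply excluded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Placeholder conversion factors (tonnes of oil equivalent per unit of output);
# these are illustrative, not BEIS's published factors.
TOE_PER_UNIT = {"crude oil (tonnes)": 1.0, "natural gas (million m3)": 900.0}


def allocate_onshore(
    uk_total: float,
    offshore_total: float,
    fields: List[Tuple[str, str, float]],
) -> Dict[str, float]:
    """Derive onshore activity by subtraction and allocate it by TOE volume.

    fields: (region, product, volume in the product's own unit) per onshore field.
    The offshore total is assigned to Extra-Regio and is not allocated here.
    """
    onshore_total = uk_total - offshore_total
    toe_by_region: Dict[str, float] = defaultdict(float)
    for region, product, volume in fields:
        toe_by_region[region] += volume * TOE_PER_UNIT[product]
    total_toe = sum(toe_by_region.values())
    return {region: onshore_total * toe / total_toe for region, toe in toe_by_region.items()}


alloc = allocate_onshore(
    1000.0,
    900.0,
    [("East Midlands", "crude oil (tonnes)", 300.0), ("South East", "crude oil (tonnes)", 100.0)],
)
```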

Onshore and offshore oil and gas extraction feature in both measures of regional GVA and the same data source is used in both, although the data are broken down into different components for use in the income and production measures. In both cases, we consider the data used to be 100% observed, as they are based on actual volumes of oil and gas by geographic location.

### Agricultural accounts data from Defra

The agriculture industry is mostly measured using data provided by the Department for Environment, Food and Rural Affairs (Defra). Defra provides a range of variables from their agricultural accounts, which are themselves compiled from multiple data sources covering various aspects of the agriculture industry. It is therefore difficult to accurately assess the extent of modelling and estimation within the agricultural data without involving Defra in some rather burdensome work. We can, however, make some assumptions based on the information we know. Since agriculture represents only around 0.6% of UK GVA, the impact of these assumptions on our analysis is small and localised.

#### Extract from Defra’s Summary quality report for Total Income from Farming releases

The following extract is taken from Defra’s Summary quality report for Total Income from Farming releases published in July 2013:

> 3.1.1 First estimate.
>
> This estimate is published four months after the end of the reference year. It is based on 65 per cent ‘actual’ data by value from survey results and administration data, and on model-based estimates largely for output or production data, with most intermediate consumption and other costs being derived from price data, estimates of volume changes based on professional advice, and a variety of modelling techniques.
>
> A full dataset for the production and income account with revisions to previous years is published (see also ‘Third estimate’). Other analyses, such as of productivity and volume indices are published in the following month in the statistical compendium, ‘Agriculture in the United Kingdom’.
>
> 3.1.2 Second estimate.
>
> This estimate is published eleven months after the end of the reference year. The estimate of Total Income from Farming is improved by basing most estimates of intermediate consumption and other costs on the results of the Farm Business Survey results for England that are published in October. At this point, Total Income from Farming is based on 90 per cent of actual data by value. A revised dataset for the aggregate agricultural accounts is published.
>
> 3.1.3 Third estimate.
>
> This estimate is published in April of year n + 2 following the reference year at the same time as the first estimate for the next reference year (see ‘First estimate’ above). In this release, Defra publishes a full dataset incorporating estimates made by the devolved administrations in compiling agricultural accounts for Scotland, Wales and Northern Ireland. At this point, the estimate of Total Income from Farming is based on 100% ‘actual’ data by value.
>
> Methodological improvements may also be made and, where possible, applied to the whole series to ensure comparability of the time series is maintained.

#### How ONS will use Defra estimates

Provisional-year estimates of regional GVA will use Defra's first estimate, while all other years will be based on the third estimate, which rests on 100% “actual” data. It seems fair to assume that, at this point, there is no modelling of data, but Defra's partial use of surveys implies some hidden estimation within the process. For the purpose of this analysis, we will therefore use the 90% figure from Defra's second estimate, treating agriculture as 90% observed data and 10% estimated data.
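The assumption above can be expressed as a small lookup from Defra vintage to its share of ‘actual’ data, with the chosen 90% figure used to split agriculture GVA. This is a minimal sketch: the function and constant names are hypothetical, and the GVA value in the usage line is purely illustrative.

```python
# Share of Total Income from Farming based on 'actual' data, by Defra
# vintage, as stated in the Defra extract quoted above.
DEFRA_ACTUAL_SHARE = {"first": 0.65, "second": 0.90, "third": 1.00}

# Assumption used for this analysis (all years): the second-estimate share,
# with the remainder treated as estimated rather than modelled.
ASSUMED_VINTAGE = "second"


def split_agriculture_gva(gva: float, vintage: str = ASSUMED_VINTAGE) -> dict[str, float]:
    """Split an agriculture GVA value into observed, estimated and modelled parts.

    Hypothetical helper: no modelling is assumed, so 'modelled' is always zero.
    """
    observed_share = DEFRA_ACTUAL_SHARE[vintage]
    return {
        "observed": gva * observed_share,
        "estimated": gva * (1.0 - observed_share),
        "modelled": 0.0,
    }


if __name__ == "__main__":
    # Illustrative value (£ million), not a published figure.
    print(split_agriculture_gva(1_000.0))
```

Keeping the vintages in one table makes the assumption easy to change: passing `"third"` instead would treat agriculture as wholly observed.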

### Labour Force Survey

The primary purpose of the ONS Labour Force Survey (LFS) is to provide good-quality point-in-time and change estimates for various labour market outputs and related topics. The labour market covers all aspects of people's work, including the education and training needed to equip them for work, the jobs themselves, job-search for those out of work, and income from work and benefits.

The sample is made up of approximately 40,000 responding UK households and 100,000 individuals per quarter. Respondents are interviewed for five successive waves at three-monthly intervals and 20% of the sample is replaced every quarter. The LFS is intended to be representative of the entire population of the UK.
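The rotation pattern can be sketched by labelling each quarterly intake of households (a rotation group) by the quarter in which it enters. The sketch assumes equal-sized rotation groups, which is what makes the quarterly replacement exactly 20%; the function names are hypothetical.

```python
N_WAVES = 5  # each household is interviewed in five successive quarters


def groups_in_sample(quarter: int) -> list[int]:
    """Return the rotation groups (labelled by entry quarter) interviewed in `quarter`.

    A group entering in quarter q is interviewed in quarters q, q+1, ..., q+4,
    i.e. in waves 1 to 5.
    """
    return [quarter - k for k in range(N_WAVES)]


def overlap_with_previous(quarter: int) -> float:
    """Fraction of this quarter's rotation groups also interviewed last quarter."""
    current = set(groups_in_sample(quarter))
    previous = set(groups_in_sample(quarter - 1))
    return len(current & previous) / len(current)


if __name__ == "__main__":
    print(groups_in_sample(10))       # [10, 9, 8, 7, 6]
    print(overlap_with_previous(10))  # 0.8, so 20% of the sample is new
```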

The LFS uses calibration weighting. The weights are formed using a population weighting procedure that involves weighting data to sub-regional population estimates and then adjusting for the estimated age and sex composition by region. Income data are weighted separately.
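Calibration weighting of this kind is often implemented by raking (iterative proportional fitting), which repeatedly rescales weights until the weighted sample matches each set of population totals in turn. The sketch below is a generic raking routine under that assumption, not the LFS production method; the records, categories and population totals are hypothetical, and only population margins are shown (income weighting is not sketched).

```python
from collections import defaultdict


def rake(records, weights, margins, max_iter=100, tol=1e-9):
    """Generic raking (iterative proportional fitting) of survey weights.

    records: list of dicts mapping each calibration variable to a category.
    weights: starting weights, one per record.
    margins: {variable: {category: population total}}; every category is
        assumed to contain at least one record with a positive weight.
    Returns weights whose weighted counts match every set of margins.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, targets in margins.items():
            # Weighted sample count in each category of this variable.
            totals = defaultdict(float)
            for rec, wi in zip(records, w):
                totals[rec[var]] += wi
            # Rescale each record so its category total hits the target.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                w[i] *= factor
        # Stop once a full pass leaves the weights essentially unchanged.
        if largest_adjustment < tol:
            break
    return w


if __name__ == "__main__":
    # Hypothetical respondents classified by sub-region and by age-sex group.
    people = [
        {"subregion": "A", "agesex": "M"},
        {"subregion": "A", "agesex": "F"},
        {"subregion": "B", "agesex": "M"},
        {"subregion": "B", "agesex": "F"},
    ]
    # Hypothetical population totals; both margins sum to the same 100.
    population = {
        "subregion": {"A": 60.0, "B": 40.0},
        "agesex": {"M": 50.0, "F": 50.0},
    }
    print([round(x, 2) for x in rake(people, [1.0, 2.0, 3.0, 4.0], population)])
```

Raking only needs the marginal totals, not the full cross-classification, which is why it suits weighting to separate sub-regional and age-sex population estimates.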

One of the limitations of the LFS is that the sample design provides no guarantee of adequate coverage of any industry, as the survey is not stratified by type of industry. The LFS coverage also omits communal establishments, except for NHS housing and students in boarding schools and halls of residence. Members of the armed forces are only included if they live in private accommodation. Also, workers aged under 16 years are not covered.

Data from the LFS are used in both measures of regional GVA to represent the activity of households as employers. The LFS data are used because we lack any alternative source of information on this activity, since households are excluded from business surveys and tend to have very limited coverage in administrative data sources.

It is not possible to calculate the true sampling fraction for households that employ people, so we have assumed that these households are randomly distributed and have used the overall LFS sample coverage of the working-age population. This gives approximately 0.25% observed data, with the remainder estimated.
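The arithmetic behind this assumption is simply the sampling fraction: if employing households are randomly distributed, the share of their activity that is observed equals the share of the population that is sampled. A minimal sketch, with a hypothetical function name and illustrative inputs:

```python
def observed_estimated_split(sampled: float, population: float) -> tuple[float, float]:
    """Return (observed share, estimated share) under random distribution.

    Hypothetical helper: the observed share is the sampling fraction
    sampled / population; the rest of the activity is estimated by grossing up.
    """
    if population <= 0 or not 0 <= sampled <= population:
        raise ValueError("need 0 <= sampled <= population and population > 0")
    observed = sampled / population
    return observed, 1.0 - observed


if __name__ == "__main__":
    # Illustrative inputs only: 1 sampled person for every 400 in the population.
    observed, estimated = observed_estimated_split(1, 400)
    print(f"observed {observed:.2%}, estimated {estimated:.2%}")
```

A sample of 1 person in every 400 gives an observed share of 0.25%, matching the approximate figure above; the inputs are illustrative rather than actual LFS counts.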

## 6. Results for countries and regions of the UK

We have compiled the results for the data sources described in the preceding section and for many other components, each of which has a smaller individual impact on regional gross value added (GVA) estimates. In this section we present those results in aggregate form, for each country and region of the UK.

Note that results shown for the UK as a whole are the aggregation of values for all countries and regions in estimates of regional GVA and do not represent the data sources used in the national accounts measure of UK GVA. In general, we would expect the national accounts measure to have a higher proportion of observed data, since there is no need for modelling by apportionment. We would expect to see a similar amount of estimation for non-sampled businesses, as this is standard practice for sample surveys and affects nearly all ONS statistics except the census.
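Because UK results are aggregates of regional values, UK shares are obtained by summing each category across regions before dividing, which weights each region by its GVA. A minimal sketch, assuming each region's GVA has already been split into the three categories; the region names and values are hypothetical:

```python
CATEGORIES = ("observed", "estimated", "modelled")

# Hypothetical regional GVA (£ million) split into the three data types.
EXAMPLE_REGIONS = {
    "Region X": {"observed": 30.0, "estimated": 50.0, "modelled": 20.0},
    "Region Y": {"observed": 50.0, "estimated": 40.0, "modelled": 10.0},
    "Region Z": {"observed": 60.0, "estimated": 160.0, "modelled": 80.0},
}


def aggregate_shares(regions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Share of total GVA in each category, summing values across regions first."""
    totals = {c: sum(r[c] for r in regions.values()) for c in CATEGORIES}
    grand_total = sum(totals.values())
    return {c: totals[c] / grand_total for c in CATEGORIES}


if __name__ == "__main__":
    print(aggregate_shares(EXAMPLE_REGIONS))
```

In this made-up example the aggregate observed share is 28%, whereas an unweighted mean of the three regional observed percentages (30%, 50% and 20%) would give about 33.3%.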

In the tables and charts that follow, we show the percentage of each of the two regional GVA measures, using the income approach (GVA(I)) and the production approach (GVA(P)), that is represented by directly observed data, data estimated using standard statistical sampling methods, and data modelled by apportionment or other means.

The process of combining the two measures to produce a single balanced estimate of regional GVA (GVA(B)) is carried out using quality metrics at a lower level of geography than is used in this analysis. It is therefore difficult to apply these as weights to the GVA(I) and GVA(P) results to produce corresponding GVA(B) results. In any case, it is not obvious what additional value would be provided by having this aggregate over and above the information available for the two component measures, since both are used in the GVA(B) estimate.

Table 1 shows the results for all countries and regions at the whole economy level. These values are shown graphically in Figure 1 and Figure 2, which show the income and production measures respectively.

The results for the income measure (GVA(I)) are very consistent, with most regions having between 28.9% (South East) and 33.7% (Scotland) observed data, between 47.6% (Scotland) and 54.3% (East of England) estimated data, and between 15.2% (South West) and 19.4% (Wales) modelled data. The clear exception is Northern Ireland, with 50.9% observed, 39.8% estimated and only 9.3% modelled data.

The main reason for this difference is that, uniquely within the UK, data collection for business surveys in Northern Ireland is carried out separately by the Northern Ireland administration. Data are collected from businesses solely for their activity in Northern Ireland, so the need for apportionment is drastically reduced.

Variations between other regions are generally small and can be traced to differences in the industrial composition of the regions.
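One way to trace regional differences to industrial composition is a shift-share decomposition, which splits the gap between a region's observed-data share and the UK's into a composition effect (a different industry mix) and a within-industry effect (different shares within the same industry). This is offered as an illustrative technique, not as the method used in this analysis, and all weights and shares below are hypothetical:

```python
# Hypothetical industry GVA weights (each set sums to 1) and observed-data shares.
UK_WEIGHTS = {"A": 0.01, "K": 0.10, "L": 0.14, "other": 0.75}
UK_OBSERVED = {"A": 0.90, "K": 0.20, "L": 0.00, "other": 0.35}
REGION_WEIGHTS = {"A": 0.03, "K": 0.05, "L": 0.12, "other": 0.80}
# Same industry-level shares as the UK, as if common sources were used.
REGION_OBSERVED = dict(UK_OBSERVED)


def weighted_share(weights, shares):
    """Overall observed share as a GVA-weighted mean of industry shares."""
    return sum(weights[i] * shares[i] for i in weights)


def shift_share(region_w, region_s, uk_w, uk_s):
    """Split (regional share - UK share) into composition and within parts.

    composition: effect of the region's industry mix, valued at UK shares.
    within: effect of the region's own industry shares differing from the UK's.
    The two parts add up exactly to the total gap.
    """
    composition = sum((region_w[i] - uk_w[i]) * uk_s[i] for i in uk_w)
    within = sum(region_w[i] * (region_s[i] - uk_s[i]) for i in uk_w)
    return composition, within


if __name__ == "__main__":
    gap = weighted_share(REGION_WEIGHTS, REGION_OBSERVED) - weighted_share(UK_WEIGHTS, UK_OBSERVED)
    comp, within = shift_share(REGION_WEIGHTS, REGION_OBSERVED, UK_WEIGHTS, UK_OBSERVED)
    print(f"gap {gap:.4f} = composition {comp:.4f} + within {within:.4f}")
```

Because the hypothetical region uses the same industry-level shares as the UK, the within-industry effect is zero and the whole gap of 2.55 percentage points comes from industrial composition.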

The results for the production measure (GVA(P)) are slightly less consistent, but most regions are still broadly similar. All regions except Northern Ireland have between 18.7% (South East) and 24.9% (Scotland) observed data, between 44.7% (London) and 57.5% (East of England) estimated data, and between 20.8% (South West) and 35.5% (South East) modelled data.

Again, Northern Ireland is very different, with 57.5% observed, 39.4% estimated and only 3.1% modelled data. The effect of Northern Ireland’s separate data collection is even more pronounced here due to the greater use of business survey data in the GVA(P) measure.
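The contrast between the two measures for Northern Ireland can be checked with simple arithmetic on the percentages quoted above; the helper function is hypothetical:

```python
# Northern Ireland shares (%) as quoted in the text for each measure.
NI_SHARES = {
    "GVA(I)": {"observed": 50.9, "estimated": 39.8, "modelled": 9.3},
    "GVA(P)": {"observed": 57.5, "estimated": 39.4, "modelled": 3.1},
}


def point_changes(from_measure: str, to_measure: str) -> dict[str, float]:
    """Percentage-point change in each category between two measures."""
    a, b = NI_SHARES[from_measure], NI_SHARES[to_measure]
    return {c: round(b[c] - a[c], 1) for c in a}


if __name__ == "__main__":
    # Each measure's three categories should sum to 100% (within rounding).
    for measure, shares in NI_SHARES.items():
        print(measure, round(sum(shares.values()), 1))
    print(point_changes("GVA(I)", "GVA(P)"))
```

Moving from GVA(I) to GVA(P), Northern Ireland's observed share rises by 6.6 percentage points and its modelled share falls by 6.2 points, while the estimated share barely changes.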

It is also notable that London and the South East have greater proportions of modelled data than the other regions and correspondingly lower proportions of both observed and estimated data. This is because those regions have a greater representation of industries where modelling has been needed, and the effect is also likely to be accentuated by larger companies maintaining a presence, often their headquarters, in or around the capital.

Overall, there is a greater proportion of observed data in the income measure (31.6% at the aggregate UK level) than in the production measure (22.8%). The simplest, though not the only, explanation lies in the use of two major business surveys. The Annual Business Survey (ABS) involves a relatively large degree of modelling, because large companies’ data need to be apportioned to regions. The Business Register and Employment Survey (BRES) collects data for each individual site within sampled businesses, so no modelling is required. GVA(I) makes greater use of BRES data than GVA(P) does, while GVA(P) makes greater use of ABS data than GVA(I) does.
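The difference between the two surveys can be sketched with one hypothetical multi-site business. Site-level returns (BRES-style) already carry a region, so they aggregate directly into observed regional values; a single business-level total (ABS-style) has to be apportioned using a regional indicator, producing modelled values. The business, the figures and the choice of employment as the indicator are all assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical site-level returns for one business: (region, employees).
SITES = [("London", 200), ("London", 100), ("North West", 150), ("Wales", 50)]

# Hypothetical business-level value (£ thousand) reported once, with no regional split.
BUSINESS_TOTAL = 2_000.0


def site_level_totals(sites):
    """Sum site returns by region: every value is already located (observed)."""
    totals = defaultdict(int)
    for region, value in sites:
        totals[region] += value
    return dict(totals)


def apportion(total, indicator):
    """Share a total across regions in proportion to an indicator (modelled)."""
    indicator_sum = sum(indicator.values())
    return {region: total * value / indicator_sum for region, value in indicator.items()}


if __name__ == "__main__":
    employment = site_level_totals(SITES)
    print(employment)  # {'London': 300, 'North West': 150, 'Wales': 50}
    print(apportion(BUSINESS_TOTAL, employment))  # {'London': 1200.0, 'North West': 600.0, 'Wales': 200.0}
```

The apportioned values always add back to the business total, but their regional split is only as good as the indicator, which is why they count as modelled rather than observed.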

Table 2 shows the industrial breakdown of the results at the aggregate UK level. In general, the same data sources are used to measure an industry across all countries and regions of the UK, so most regions will follow a similar pattern. You can find tables showing the industrial breakdown of the results for each individual country and region in Annex A.

The two industries that stand out as having the greatest proportion of observed data in either measure are agriculture (A) and mining (B). These industries have very small GVA contributions in most regions, with the majority of mining GVA going to the Extra-Regio category. They do have slightly greater weight in the economy of Scotland, though, and this helps to explain Scotland’s slightly higher proportion of observed data.

Of the other industries, the public services (O, P and Q) tend to have higher proportions of observed data than most, helped by their use of BRES employee data. Because these are principally non-market industries, compensation of employees makes up a greater part of their total GVA than it does in the more profit-oriented market industries.

The two industries that stand out as having the smallest proportion of observed data in either measure are real estate (L) and households (T). Real estate is dominated by the imputed rental of owner-occupiers, which is entirely estimated. Households is measured using data from the Labour Force Survey (LFS), whose sample covers only a very small proportion of the total working population; the industry itself makes a very small contribution to GVA.

In the production measure, some other industries have smaller proportions of observed data because large companies’ ABS data need to be apportioned to regions. Most notable among these is the wholesale and retail trade (G), which tends to be dominated by large chains operating across the UK.

Industries with larger proportions of modelled data include electricity and gas (D) and finance (K) in both measures, with water supply (E) also standing out in the production measure. Again, this is due mainly to the dominance of large companies operating across regions.

Further details of regional variations in each industry can be found in the tables in Annex A.