We plan to introduce alternative data sources and new methods for consumer price statistics from 2023.
Rail fares and second-hand cars are the first categories we intend to transform.
By adopting weighted multilateral index methods, we can make better use of these data sources.
We are planning to use the GEKS-Törnqvist, a 25-month window and a mean splice on published series.
In this article we explain why multilateral methods such as the GEKS-Törnqvist are preferable and how these methods work in practice.
We are currently undertaking an ambitious programme of transformation across our consumer price statistics, including identifying new data sources, improving methods, developing systems and establishing new processes. From 2023, we are looking to introduce alternative data sources for second-hand cars and rail fares statistics. These data sources will allow us to measure inflation from an improved coverage of price quotes, introduce weights at a lower level of aggregation than before, and allow for automated data acquisition.
However, integrating these data sources comes with unique challenges when compared with the traditionally collected data sources that we are used to processing. Our index methods need to be updated to make the best use of these new, dynamic data sources.
Over the past few years, we have compared a variety of different methods, including traditional fixed-base methods, chained methods and multilateral methods. In our early work we produced a framework for understanding and studying these methods. We worked with the Economic Statistics Centre of Excellence (ESCoE) to commission an expert-led review of multilateral index methodology by Fox, Levell and O'Connell. We also took our empirical analyses to our Technical Advisory Panel on Consumer Prices and to the consumer price inflation-themed Ottawa Conference. As a result of this work, the Office for National Statistics (ONS) will use the GEKS-Törnqvist, a 25-month window and a mean splice on published series as the index methodology choices for integrating weighted alternative data sources.
There are several reasons for us to adopt this index methodology, including:
it allows us to make better use of entering and leaving products within our dynamic scanner data
the GEKS-Törnqvist is weighted, treating products in line with their economic importance
the GEKS-Törnqvist is free of chain drift (unlike the bilateral Törnqvist) and the use of the Törnqvist better accounts for substitution behaviour
it ensures that the chain drift introduced by extending our series is mild: a 25-month window strikes a good balance between mitigating drift and pragmatic considerations over when data can be used
geometric averaging within the mean splice better avoids overt influence of outliers
Alternative data sources
We are looking to introduce a variety of alternative data sources into consumer price statistics. We are initially focussing on introducing web-provided data for second-hand cars and transaction data for rail fares.
Comparison of traditional and alternative datasets
|Feature||Traditional data||Alternative data|
|Data collected||Manually||Automatically|
|Product coverage||Generally fewer products||Generally more products|
|Time coverage||Monthly prices (mostly)||Varies, sometimes transactions (for example, scanner); sometimes weekly or daily prices (for example, web-scraped)|
|Type of data||Static||Dynamic|
|Low-level weights||No||Varies, yes (for example, scanner data) or no (for example, web-scraped data)|
Alternative data sources allow us to obtain much greater product and time coverage of markets with an acquisition process that is more automated.
When it comes to using index formulae to aggregate prices into indices, it is important to consider the type of data being processed (static or dynamic) and the availability of product-level weights. The unweighted bilateral index formulae that are currently used in traditional consumer price statistics (such as the Jevons index) would not make the most of weighted alternative data sources because of these considerations.
Static and dynamic data: why multilateral methods are preferred for dynamic data
Real-world markets are "dynamic". Products enter and exit markets, causing inconsistencies in observed prices. An example is shown in Table 1, where product 2 leaving the market in April results in a period where no prices can be observed for product 2.
|Product||Price, Jan||Price, Feb||Price, Mar||Price, Apr|
Table 1: Dynamic data are inconsistent because of product entry and exit
In traditional consumer prices indices compilation, prices are collected for a sample of products and products are typically replaced (or in rarer cases, imputed) as they fall out of the market, using quality adjustments where necessary. As sample sizes are smaller with traditionally collected data, these replacements can be made manually.
Replacing products as they leave the market produces a "static" dataset. An example of this is shown in Table 2. In Table 2, products 1 to 3 are chosen from Table 1 to form an initial sample, and as products 2 and 3 leave the market they are directly replaced with products 4 and 5, respectively, which have been chosen as comparable replacements of the same quality.
|Product||Price, Jan||Price, Feb||Price, Mar||Price, Apr|
|2 --> 4||4||4||4.1||4.5|
|3 --> 5||4||3.8||3.6||4.1|
Table 2: A static data frame created by sampling from Table 1 and making comparable replacements as products leave the market
A bilateral index method compares the prices of a measurement month (for example, April) against a base month (for example, January) excluding information from other months. The bilateral method uses data from two ("bi") periods only.
A multilateral index method compares the prices of a measurement month (for example, April) against a base month (for example, January) including information from other months (for example, February and March). The multilateral method uses data from multiple periods.
Bilateral methods work well with static datasets since all rows have "matches" in prices between the base and measurement months, and so all rows can be used. They work less well with dynamic data, where an unobserved price in the base or measurement month prevents the product from being used in calculations. Multilateral methods, on the other hand, perform well with dynamic data. In Table 1, being able to use information from February and March to calculate the April index allows the April index to be influenced by the price increase that product 4 experiences between February and April.
Since we would like to make full use of scanner and web-scraped data, multilateral indices offer a means of working with the dynamic nature inherent to these large datasets.
In traditional consumer prices indices practice, we use national accounts data to determine the economic importance (weight) of higher-level aggregates, for example when aggregating apples and oranges into a fruit index. However, expenditure surveys do not have the granularity to weight varieties of products at the lowest levels of aggregation, and therefore unweighted indices such as the Jevons and Dutot are most appropriate.
In the Table 3 dataset, there are two products, with product 1 capturing 80% of consumer expenditure. Product 1 doubles in price and product 2 halves in price. The (unweighted) Jevons index treats each product with equal importance, giving an index of 1, whereas the (weighted) Törnqvist gives an index of 1.52, recognising the greater economic importance of product 1. Scanner data give us access to product-level weights, so it would be advantageous to use these to better reflect the economic importance of different products within the inflation index.
|Product||Price, Jan||Price, Feb||Weight|
Table 3: Weighted indices treat product 1 with greater importance because of its higher weight
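The Table 3 comparison can be checked with a minimal sketch. It assumes only what the text states: price relatives of 2 and 0.5, and weights of 0.8 and 0.2. The function names are illustrative and are not taken from any ONS system.

```python
import math

def jevons(relatives):
    """Unweighted geometric mean of price relatives."""
    return math.prod(relatives) ** (1 / len(relatives))

def weighted_geometric_mean(relatives, weights):
    """Weighted geometric mean; weights are expenditure shares summing to 1."""
    return math.prod(r ** w for r, w in zip(relatives, weights))

# Product 1 doubles in price, product 2 halves in price.
relatives = [2.0, 0.5]
# Product 1 captures 80% of expenditure (a single weight per product, as in Table 3).
weights = [0.8, 0.2]

print(round(jevons(relatives), 2))                            # 1.0
print(round(weighted_geometric_mean(relatives, weights), 2))  # 1.52
```

The equal-weight result sits at 1 because the doubling and halving cancel exactly, while the weighted result is pulled towards the product that accounts for most expenditure.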
Therefore, since scanner data are dynamic and include product-level weight information, we are looking to use a weighted multilateral index method to make best use of the scanner data. We are intending to use the GEKS-Törnqvist to do this. The GEKS-Törnqvist gets its name from five contributors, Gini, Éltető, Köves, Szulc and Törnqvist and is sometimes alternatively known in international literature as the CCDI (Caves-Christensen-Diewert-Inklaar) index. To understand the (multilateral) GEKS-Törnqvist, we first explore the (bilateral) Törnqvist.
Definitions and notation
In scanner data, we obtain expenditure (e) and quantity (q) sold from retailers. We then make the following transformations:
price (p) is calculated by dividing expenditure by quantity (p = e/q)
expenditure shares (s) are calculated as the proportion of expenditure for a product relative to the expenditure on all products within a category (s = e/∑e)
We indicate the month with a number (for example, e2 indicates expenditure in month 2). Table 4 shows an example of these transformations. For the rest of this article, we assume these transformations are performed and use prices and expenditure shares.
Table 4: How we transform month 2 expenditures and quantities to prices and expenditure shares
Note that the definition of a product varies depending on the type of good – definitions are given for rail fares and second-hand cars within our impact analysis. The expenditure and quantities given in Table 4 are the summed expenditure and quantities for all transactions for that product within the month.
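The two transformations can be sketched as follows. The product labels and the expenditure and quantity values are hypothetical, chosen only to illustrate p = e/q and s = e/∑e. They are not the Table 4 figures.

```python
# Hypothetical month-2 scanner data: product -> (expenditure e, quantity q).
month_2 = {"A": (200.0, 100.0), "B": (300.0, 60.0)}

# Total expenditure on all products within the category.
total_expenditure = sum(e for e, _ in month_2.values())

# Price (unit value) is expenditure divided by quantity.
prices = {k: e / q for k, (e, q) in month_2.items()}
# Expenditure share is the product's expenditure over the category total.
shares = {k: e / total_expenditure for k, (e, _) in month_2.items()}

print(prices)  # {'A': 2.0, 'B': 5.0}
print(shares)  # {'A': 0.4, 'B': 0.6}
```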
Suppose that we have obtained the prices and expenditure shares in Table 5, and we wish to calculate the Törnqvist from a base month (month 1) to a measurement month (month 2).
Table 5: Data used to calculate a Törnqvist index
We first calculate (in Table 6):
price relatives (r): divide the measurement month price by the base month price
average expenditure shares (h): the arithmetic mean of the base and measurement month expenditure shares
Table 6: We have expanded Table 5 to include price relatives and average expenditure shares
The Törnqvist formula is calculated as a weighted geometric average of price relatives (r), using average expenditure shares (h) as weights:
Note that the Jevons, which uses equal weights when calculating a geometric average over price relatives, can be understood as an unweighted variant of the Törnqvist:
Since the Jevons is the most common index method used within consumer price statistics, there is a consistency in using the Törnqvist alongside the Jevons to form inflation statistics at the elementary level.
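The calculation steps described in this section (price relatives, average expenditure shares, then a weighted geometric average) can be sketched as below. This is an illustrative sketch assuming static data, with every product present in both months. The prices and shares are hypothetical and are not the Table 5 values.

```python
import math

def tornqvist(p0, p1, s0, s1):
    """Bilateral Törnqvist from the base month (0) to the measurement month (1).

    p0, p1: dicts of prices; s0, s1: dicts of expenditure shares.
    Assumes every product is observed in both months.
    """
    log_index = 0.0
    for product in p0:
        r = p1[product] / p0[product]        # price relative
        h = (s0[product] + s1[product]) / 2  # average expenditure share
        log_index += h * math.log(r)
    return math.exp(log_index)

def jevons(p0, p1):
    """Unweighted geometric mean of price relatives over the same products."""
    return math.prod(p1[i] / p0[i] for i in p0) ** (1 / len(p0))

# Hypothetical two-product example.
p0 = {"A": 1.0, "B": 2.0}
p1 = {"A": 1.1, "B": 1.9}
s0 = {"A": 0.6, "B": 0.4}
s1 = {"A": 0.5, "B": 0.5}

print(tornqvist(p0, p1, s0, s1))
print(jevons(p0, p1))
```

When all average shares are equal, the two functions return the same value, which is the sense in which the Jevons is an unweighted variant of the Törnqvist.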
We can express these index methods as formulae:
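In the notation of this section, the standard forms of the two indices can be written as below. The product subscript i, the month subscripts 0 (base) and t (measurement), and N for the number of products are notational assumptions introduced here for illustration.

```latex
P_{\text{Törnqvist}} = \prod_{i=1}^{N} r_i^{\,h_i},
\qquad r_i = \frac{p_{i,t}}{p_{i,0}},
\qquad h_i = \frac{s_{i,0} + s_{i,t}}{2}

P_{\text{Jevons}} = \prod_{i=1}^{N} r_i^{\,1/N}
```

Setting every h_i equal to 1/N in the first expression gives the second.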
Consumers often substitute from products increasing in price to products decreasing in price, particularly at the product level. Index methods with fixed weights taken before (or after) this substitution occurs can understate (or overstate) substitution, resulting in too-high (or too-low) indices. This is recognised as "substitution bias". By using an average of base and measurement month weights, the Törnqvist does not suffer from substitution bias and is known as a "superlative index".
While accounting for product-level weights, in their natural form superlative indices such as the Törnqvist and Fisher are still only suitable for static data. To enable the methods to be suitable for use with dynamic data, they can be adopted into the GEKS multilateral method.
How to calculate the GEKS-Törnqvist
The GEKS multilateral index method is paired with an underlying bilateral index method. Common options include the GEKS-Törnqvist, GEKS-Fisher and GEKS-Jevons. As a multilateral method, the GEKS uses price observations from all months within a given "window", irrespective of where they are relative to the base or measurement month. The number of months used within a window is referred to as the "window length". Common choices of window length are 13, 25 and 37 months. For brevity, our demonstrations will use the GEKS-Törnqvist with a 13-month window, despite using a 25-month window as our final method choice.
We set out the following shorthand (as an example): GEKST(t1,t5,t1:t13).
This measures the GEKS-Törnqvist from month 1 to month 5 within a 13-month window covering months 1 to 13. Note that the base and measurement months must exist within the window.
GEKST(t1,t5,t1:t13) can be viewed as a "stepping stone" index, where the goal is to get from t1 to t5 by first stepping through one of the "stepping stone" months in t1:t13. We take one step from the base month to the "stepping stone", and one step from the "stepping stone" to the measurement month. Each "step" represents calculating a bilateral Törnqvist index. For each "stepping stone" we multiply the resulting "pair" of Törnqvists together, and then take a geometric average across all possible pairs.
We first calculate the following equations:
Note that if a new product appears in month 4 then increases in price in month 5, then it will influence the Törnqvist(t4,t5) index, which in turn influences the overall GEKS-Törnqvist index. This allows entering (and leaving) products to affect indices.
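Written out in full, and assuming the standard GEKS definition, the stepping-stone calculation for this example takes the following form. The index k for the stepping-stone month is notation introduced here, and Törnqvist(t, t) equals 1 for any month t.

```latex
\mathrm{GEKST}(t_1, t_5, t_1{:}t_{13})
  = \prod_{k=1}^{13}
    \Bigl[ \text{Törnqvist}(t_1, t_k) \times \text{Törnqvist}(t_k, t_5) \Bigr]^{1/13}
```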
GEKS-Törnqvist worked example
In Table 7 we have given prices and quantities for four months. Our goal is to calculate GEKST(t1,t3,t1:t4).
Table 7: Four months of price and quantity data to be used to calculate the GEKS-Törnqvist from month 1 to month 3.
To calculate GEKST(t1,t3,t1:t4), we need to calculate four pairs of underlying bilateral Törnqvist indices (since a window length of 4 means four possible "stepping stone" months). The Törnqvists are calculated using the same approach as in the Törnqvist index section:
We then calculate GEKST(t1,t3,t1:t4) as a geometric average:
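The whole calculation can be sketched in a few lines. This is a minimal illustration, not the production implementation. The price and quantity data are hypothetical (they are not the Table 7 values), and the sketch assumes that each bilateral Törnqvist is calculated over the products present in both months, with expenditure shares taken over those matched products only.

```python
import math

def tornqvist(data, a, b):
    """Bilateral Törnqvist from month a to month b over matched products.

    data[month] maps product -> (price, quantity). Expenditure shares are
    computed over the matched products only (an assumption of this sketch).
    """
    matched = data[a].keys() & data[b].keys()
    exp_a = {i: data[a][i][0] * data[a][i][1] for i in matched}
    exp_b = {i: data[b][i][0] * data[b][i][1] for i in matched}
    tot_a, tot_b = sum(exp_a.values()), sum(exp_b.values())
    log_index = 0.0
    for i in matched:
        h = (exp_a[i] / tot_a + exp_b[i] / tot_b) / 2  # average share
        log_index += h * math.log(data[b][i][0] / data[a][i][0])
    return math.exp(log_index)

def geks_tornqvist(data, base, current, window):
    """Geometric average, over every stepping-stone month k in the window,
    of Törnqvist(base, k) * Törnqvist(k, current)."""
    log_sum = sum(
        math.log(tornqvist(data, base, k) * tornqvist(data, k, current))
        for k in window
    )
    return math.exp(log_sum / len(window))

# Hypothetical data: product C enters in month 3, product B leaves after month 3.
data = {
    1: {"A": (1.00, 10), "B": (2.00, 5)},
    2: {"A": (1.10, 9), "B": (2.00, 5)},
    3: {"A": (1.20, 8), "B": (2.10, 4), "C": (3.00, 2)},
    4: {"A": (1.20, 8), "C": (3.30, 3)},
}

print(geks_tornqvist(data, base=1, current=3, window=[1, 2, 3, 4]))
```

Because each bilateral Törnqvist reverses exactly when its two months are swapped, the resulting GEKS indices chain consistently within a window: the index from month 1 to month 2 multiplied by the index from month 2 to month 3 equals the index from month 1 to month 3.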
The revision problem
However, note that the measurement month of GEKST(t1,t5,t1:t13) is May (t5), but it requires information beyond May (for example, Törnqvist(t1,t13) uses information from December). This poses a risk to timely publishing, as we would need to wait until December to publish the May index. This is described as "the revision problem" and means that we cannot directly use the GEKS-Törnqvist in the production of consumer price statistics. Instead we need to look at using extension methods alongside GEKS to solve this problem.
The rolling window
It is possible to calculate timely indices, avoiding the revision problem.
We construct a "rolling window" of indices. We start within window 1, t1:t13. We calculate all 13 months of indices within this first window. We then roll the window on by one month to t2:t14 and calculate all 13 months within window 2. We repeat this process, giving Table 8.
|Month||Window 1||Window 2||Window 3|
Table 8: Rolling window indices
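The way the window rolls forward can be sketched as follows. The function name and the choice of 15 months of available data are illustrative assumptions. Only the 13-month window length comes from the text.

```python
def rolling_windows(first_month, last_month, window_length):
    """Return the list of months covered by each successive window."""
    windows = []
    start = first_month
    # Keep rolling while a full window still fits in the available data.
    while start + window_length - 1 <= last_month:
        windows.append(list(range(start, start + window_length)))
        start += 1
    return windows

windows = rolling_windows(first_month=1, last_month=15, window_length=13)
for number, months in enumerate(windows, start=1):
    print(f"Window {number}: t{months[0]}:t{months[-1]}")
# Window 1: t1:t13
# Window 2: t2:t14
# Window 3: t3:t15
```

Consecutive windows share all but one month, so windows 1 and 2 here overlap on months 2 to 13.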
In Figure 1, we give an example of how the indices in Table 8 may look. Each window in Figure 1 has a separate reference point – month 1 for window 1, month 2 for window 2 and month 3 for window 3. Our goal is to combine these windows so that inflation can be measured from a single reference point. To do this we use extension methods. Several methods exist for this, but we will focus on splicing methods for the purpose of this article.
Extension methods: splicing
We can combine the indices given in each window in Figure 2 to produce a single index series with a single reference point. Splicing gives us a way of doing this. Windows 1 and 2 have 12 "overlapping" monthly indices. When splicing, we choose one (or more) of these months to be a link month, adjusting window 2 to link onto window 1 on the chosen month(s). The resulting new index is then appended to the published series.
For example, if we connect two consecutive windows by using the first overlapping month as the link month, then we obtain the "window splice" as shown in Figures 2, 3 and 4.
Figure 2 shows indices calculated for a rolling window covering the three months prior to splicing; window 1 forms our initial published series.
Figure 3 shows window 2 spliced onto window 1 and window 3 spliced onto the recalculated window 2 indices (giving the window splice) and onto the published series (giving the window splice on published); the window splice and window splice on published give the same result when splicing window 2.
In Figure 4 the final index values for the recalculated (post-spliced) windows 2 and 3 join the published series.
Figures 2, 3 and 4 show the window splice but there are alternatives. We could splice on any other month, with named options including the movement splice (splicing on the last shared month) and the half-splice (splicing on the middle shared month). Alternatively, we could splice on every month and take a geometric average, giving the mean splice. The international consensus is shifting towards use of half- and mean-splices that have a moderating effect on the relatively extreme behaviour found in the window and movement splices. In our research, the choice of half- or mean-splice had a mild effect on results. Each of these methods can be spliced on published or recalculated indices.
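The link-month choices described above can be sketched as follows, for the splice-on-published variant only. The function name, argument layout and index values are hypothetical. The sketch assumes that each link month t yields a candidate value of published[t] multiplied by the new window's movement from t to the new month, and that the mean splice is the geometric average of those candidates.

```python
import math

def splice_on_published(published, window, method="mean"):
    """Extend `published` by one month using a newly calculated window.

    published: dict month -> index on a single reference point.
    window: dict month -> index for the latest window; its last month is the
            new month, and all earlier months must already be in `published`.
    """
    months = sorted(window)
    new_month, overlap = months[-1], months[:-1]
    if method == "window":
        links = [overlap[0]]                  # first shared month
    elif method == "half":
        links = [overlap[len(overlap) // 2]]  # middle shared month
    elif method == "movement":
        links = [overlap[-1]]                 # last shared month
    elif method == "mean":
        links = overlap                       # every shared month
    else:
        raise ValueError(f"unknown method: {method}")
    # One candidate per link month, combined with a geometric average.
    logs = [math.log(published[t] * window[new_month] / window[t]) for t in links]
    return math.exp(sum(logs) / len(logs))

# Hypothetical three-month window rolled on by one month.
published = {1: 1.00, 2: 1.02, 3: 1.03}
window = {2: 1.00, 3: 1.01, 4: 1.04}

for method in ("window", "movement", "mean"):
    print(method, splice_on_published(published, window, method))
```

With only two shared months, as here, the mean splice is the geometric average of the window and movement splice results, which illustrates its moderating effect.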
Following extensive research and recommendations from external experts, we have decided to use a GEKS-Törnqvist with a 25-month window and a mean splice on published series extension method. The mean splice mitigates against outliers and moderates the more extreme behaviour found in the window or movement splices. A 25-month window is chosen for a suitable trade-off between characteristicity and transitivity (as described in paragraph 15.26 in the Australian Bureau of Statistics guidance on multilaterals).
Finally, in our rolling window, window 1 indices can only be published in month 13, window 2 indices in month 14 and so on. We therefore use month 13 as our reference point. For example, with a 13-month window, if we entered production in January 2023, then month 1 is January 2022 and month 13 is January 2023. Months 1 to 12 are historical data used to initialise the process. Since month 13 is our reference, we will need to re-reference our index to this month so that it can be aggregated with traditional data sources.
Splicing worked example
In Table 9 we have produced GEKS-Törnqvist indices using a five-month window for three windows. We enter production in month 5, using months 1 to 4 as back data. We are looking to use the movement splice to link these windows together so that there is a single reference point, and to obtain an index for months 6 and 7 in reference to month 5.
|Window 1||Window 2||Window 3|
Table 9: We look to splice windows 1 to 3 to produce an index from month 5 to months 6 to 7
The solution for this is given in Table 10. Since we are using a movement splice, we will splice on the last shared month between consecutive windows. We therefore take the following steps:
Splice window 2 onto window 1 using month 5.
We adjust window 2 by multiplying by 1.008 (1.026 divided by 1.018); these values are given in "Window 2 (spliced)".
Splice window 3 onto the spliced window 2 using month 6 and adjust window 3 by multiplying by 1.015 (1.031 divided by 1.016) (note that 1.031 comes from the recalculated post-spliced window 2, not the original pre-spliced series); these values are given in "Window 3 (spliced)".
The "published" series extends window 1 into windows 2 and 3, creating a series with month 1 as a single reference point.
We divide this series by 1.026 to obtain indices for months 6 and 7 in reference to month 5, when we entered production.
These final values are published.
|Month||Window 1||Window 2 (spliced)||Window 3 (spliced)||Published||Published (re-referenced)|
Table 10: Using a movement splice gives an index of 1.005 for month 6 and 1.004 for month 7 when referencing month 5 (the month we enter production)
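The arithmetic in these steps can be checked with a short sketch that uses only the link-month values quoted in the text. The variable names are illustrative.

```python
# Link-month values quoted in the steps above.
window_1_month_5 = 1.026
window_2_month_5 = 1.018          # before splicing
window_2_month_6_spliced = 1.031  # after splicing window 2 onto window 1
window_3_month_6 = 1.016          # before splicing

# Movement splice: scale each new window so it agrees on the last shared month.
factor_2 = window_1_month_5 / window_2_month_5
factor_3 = window_2_month_6_spliced / window_3_month_6

# Re-reference to month 5, the month production starts.
month_6_rereferenced = window_2_month_6_spliced / window_1_month_5

print(round(factor_2, 3))              # 1.008
print(round(factor_3, 3))              # 1.015
print(round(month_6_rereferenced, 3))  # 1.005
```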
Had we chosen to splice on the published series, we would splice on the published column rather than the recalculated spliced series.
The index methodology outlined in this article will form the starting point for how alternative data sources for most goods categories will be introduced into consumer price statistics. However, we will continue to explore the suitability of these methods with other data sources we look to introduce in future. If we look to introduce web-scraped data sources, we may need to consider the suitability of an unweighted variant (perhaps the GEKS-Jevons). We may also need to consider alternatives for when we experience rapid product turnover and product obsolescence because of rapidly changing quality.
Office for National Statistics (ONS), released 28 November 2022, ONS website, methodology article, GEKS-Törnqvist: introducing multilateral index methods into consumer price statistics