Geographic referencing (or 'georeferencing') is an increasingly important process in the production of National Statistics, allowing greater data accuracy and facilitating the sharing and aggregation of data.
This 'Beginners Guide to Geographic Referencing' describes the process and explains why geographic referencing is an improvement on the existing process of postcode referencing.
The production of National Statistics involves the collection, processing and output of statistical data.
Most data events can be referenced to a known location and this means that most statistics can be output using a geographic classification.
For example, we might produce statistics of unemployment rate by electoral ward, or birth rate by local authority district.
Since the late 1970s the approach to data referencing has been to use the event postcode, as described in section 3.
Although this has been a valuable method, it is not without its limitations, and we are therefore moving towards a new approach, geographic referencing.
This involves referencing events to a specific and fixed point, usually a grid reference; the many advantages of this are described in section 4.
3. Postcode Referencing
The traditional method of referencing data to the event postcode has a number of advantages:
most people know their postcode so can readily supply it when responding to a survey
postcode directories (such as the ONS Postcode Directory or National Statistics Postcode Look-up) can be used as a ready means of matching each postcode to a range of geographic areas
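As a rough illustration of the second point, the sketch below matches event postcodes to areas through a small in-memory directory. The postcodes, area names and the `DIRECTORY`, `normalise` and `lookup` names are all invented for illustration; a real postcode directory has its own field layout.

```python
# Hypothetical extract of a postcode directory: each postcode maps to the
# areas it has been assigned to. All values are invented for illustration.
DIRECTORY = {
    "AB1 2CD": {"ward": "Ward A", "district": "District X"},
    "AB1 2CE": {"ward": "Ward B", "district": "District X"},
}


def normalise(postcode: str) -> str:
    """Upper-case a postcode and put one space before its last 3 characters."""
    compact = "".join(postcode.split()).upper()
    return compact[:-3] + " " + compact[-3:]


def lookup(postcode: str, geography: str):
    """Return the area for a postcode, or None if it is not in the directory."""
    entry = DIRECTORY.get(normalise(postcode))
    return entry[geography] if entry else None


# Count events by ward, tolerating inconsistent spacing and case.
events = ["ab12cd", "AB1 2CE", "AB1  2CD"]
counts = {}
for pc in events:
    ward = lookup(pc, "ward")
    counts[ward] = counts.get(ward, 0) + 1
```

Once each event carries a postcode, producing statistics for any geography held in the directory is a simple look-up and count.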
3.1. Problems with postcode referencing
Although postcode referencing is very straightforward, it has a number of key weaknesses:
3.1.1. Postcodes do not map directly to other geographic areas
Postcode areas do not take account of administrative boundaries (or those of any other geography), so a single postcode can lie partly in one area and partly in another.
This 'straddling' of boundaries means that many postcodes can only be assigned to administrative areas on a 'best fit' basis.
The result is that addresses lying close to administrative boundaries are sometimes assigned to the wrong area.
For small areas such as electoral wards the resulting statistical errors can sometimes be considerable.
Fortunately the errors are less significant for larger areas as:
there are proportionally fewer postcodes straddling the boundaries
the errors are more likely to be cancelled out as data which are wrongly allocated to one area may be balanced by an opposite misallocation elsewhere on the area boundary. This cancellation effect is even stronger in datasets with a large number of observations
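The best-fit and cancellation effects can be sketched with a small worked example. It assumes a simple majority rule (the whole postcode goes to the ward holding most of its addresses); this rule, the postcode labels and the address counts are assumptions made for illustration, and the rule used in a real directory may differ.

```python
from collections import Counter

# Hypothetical straddling postcodes: the true ward of each address.
postcodes = {
    "PC1": ["Ward A"] * 12 + ["Ward B"] * 3,
    "PC2": ["Ward B"] * 10 + ["Ward A"] * 2,
}


def best_fit(true_wards):
    """Assign the whole postcode to the ward containing most of its addresses."""
    return Counter(true_wards).most_common(1)[0][0]


true_counts = Counter()
assigned_counts = Counter()
misallocated = 0
for wards in postcodes.values():
    chosen = best_fit(wards)
    true_counts.update(wards)
    assigned_counts[chosen] += len(wards)
    misallocated += sum(1 for w in wards if w != chosen)

# Net error per ward: assigned count minus true count.
net_error = {w: assigned_counts[w] - true_counts[w] for w in true_counts}
```

In this example five addresses are misallocated in total, yet the net error in each ward's count is only one address, because the two misallocations partly offset each other.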
3.1.2. Postcodes can move around
Royal Mail assigns postcodes to address locations for the sole purpose of providing an efficient mail delivery service.
Postcodes may be discarded, reassigned and reused as a result of demolitions and new building activity.
Although ONS Geography maintains a database of discarded postcodes, this cannot by itself be relied upon to provide an accurate locational reference.
Royal Mail may occasionally decide to reuse these discarded postcodes in another part of the same postcode sector and thus the physical location of a postcode may shift.
This could cause data to be assigned to the wrong area unless care is taken to use the correct year's directory (note though that Royal Mail will aim to not reuse a postcode for at least two years after it has been discarded).
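The need to use the correct year's directory can be sketched as follows. The `DIRECTORIES` structure, the postcode and the years are hypothetical; the point is only that the look-up is keyed by the year of the event as well as by the postcode.

```python
# Hypothetical directories keyed by year. "AB1 2CD" is discarded and
# later reused at a different location in the same postcode sector.
DIRECTORIES = {
    2005: {"AB1 2CD": "Ward A"},
    2006: {},                     # postcode discarded
    2009: {"AB1 2CD": "Ward B"},  # postcode reused at a new location
}


def ward_for_event(postcode, event_year):
    """Look the postcode up in the directory for the year the event occurred."""
    directory = DIRECTORIES.get(event_year)
    if directory is None:
        raise KeyError(f"no directory loaded for {event_year}")
    return directory.get(postcode)
```

Looking up a 2005 event in the 2009 directory would place it in Ward B rather than Ward A, which is the kind of error described above.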
3.1.3. Area boundaries keep changing
The UK has a very high level of electoral and administrative boundary change - for example, between 2001 and 2010 there were over 8,000 electoral ward/division boundary changes in England and Wales alone.
This further complicates postcode to area referencing.
Once a ward boundary has changed the allocation of some properties will be incorrect.
In addition, when the next version of the postcode directory is released, it will once again be affected by straddling.
All properties in a postcode that is split by the new boundary will still be referenced to a single ward (either Ward A or Ward B, say), and this means that a proportion of them are bound to be wrong.
3.2. Postcode referencing: Conclusion
Postcode referencing is a straightforward approach but has a number of weaknesses, relating both to the unstable nature of UK geography and to the fact that postcode boundaries do not match those of other geographic areas.
In general these problems are relatively insignificant when dealing with large areas, but can be more substantial for small areas.
With the advent of Neighbourhood Statistics and the associated demand for small area statistics, a better method of referencing is required.
We are therefore moving towards geographic referencing.
4. Geographic Referencing
As indicated, referencing to postcodes has a number of limitations.
If, however, we can reference to something that is fixed (for example, a grid reference), the problems are reduced.
There is also better potential for data visualisation as grid-referenced events can be located on a map and viewed in relation to other geographic features including administrative areas and boundaries, as well as physical features such as roads, coastline and buildings.
As well as simply viewing the data, we also have the potential to use Geographic Information Systems (GIS) to carry out detailed analysis and modelling.
We can also readily link between different datasets, as we simply need to identify events with a common grid reference.
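A minimal sketch of this kind of linkage is shown below. The two datasets and their values are invented, and the sketch assumes that both datasets hold grid references as (easting, northing) pairs at the same precision, so that matching references compare equal.

```python
# Two hypothetical datasets, each keyed by (easting, northing) in metres.
births = {(451200, 206300): 2, (451900, 207100): 1}
surveys = {(451200, 206300): "responded", (452500, 205000): "no reply"}

# Events are linked wherever both datasets share the same grid reference.
linked = {
    ref: (births[ref], surveys[ref])
    for ref in births.keys() & surveys.keys()
}
```

In practice, references recorded at different precisions would need to be brought to a common resolution before they could be matched in this way.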
There are a number of possibilities for geographic referencing:
4.1.1. Geographic referencing using the postcode centroid
Under the Gridlink® initiative, ONS Geography's postcode directories provide the grid reference of the property closest to the postcode centroid (the geographic centre of the postcode).
This is a good start, and may be the most accurate reference possible as we may not have any more detailed locational information for the data event.
However, although we can relate the grid reference of the postcode centroid to a map, and perform detailed analysis on the associated data, this method does not solve the problems of straddling and boundary change.
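The idea of taking the property closest to the postcode centroid can be sketched as below. This is an illustration only: it assumes the centroid is the simple mean of the address coordinates, which may not be how the directories calculate it, and the coordinates are invented.

```python
import math

# Hypothetical address points (easting, northing) sharing one postcode.
addresses = [(100.0, 100.0), (110.0, 100.0), (100.0, 130.0), (150.0, 140.0)]


def centroid(points):
    """Mean easting and northing of the points (an assumed definition)."""
    xs, ys = zip(*points)
    return sum(xs) / len(points), sum(ys) / len(points)


def nearest_property(points):
    """Return the address point closest to the centroid of all the points."""
    c = centroid(points)
    return min(points, key=lambda p: math.dist(p, c))
```

Every event in the postcode then inherits this single grid reference, which is why addresses on the far side of a boundary are still misallocated.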
4.1.2. Geographic referencing using address-level grid references
Address-level grid referencing, which we are working towards, is even more powerful.
Whereas the postcode centroid gives an approximate location of a data event, the address-level grid reference describes precisely where it occurs.
This has several advantages:
straddling is no longer an issue as postcodes are no longer considered
dealing with administrative boundary change is even easier. We simply load the new boundary set into a GIS and, knowing the events are precisely located, can very quickly produce accurate statistics for the new boundaries
outputs and analysis can be even more flexible. For example, if we wanted to consider whether there is a relationship between how close people live to a motorway and the incidence of a particular disease, our data is now referenced with the accuracy required to do this
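The boundary-change advantage can be sketched without a GIS, using a standard ray-casting point-in-polygon test. The ward boundaries and event locations are invented, and a real system would use GIS software and real boundary sets rather than this hand-written test, which also makes no special provision for points lying exactly on a boundary.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon (list of vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_ward(events, wards):
    """Count events falling inside each ward boundary."""
    counts = {name: 0 for name in wards}
    for event in events:
        for name, boundary in wards.items():
            if point_in_polygon(event, boundary):
                counts[name] += 1
                break
    return counts


# Hypothetical boundary sets: the A/B boundary moves from x=50 to x=60.
old_wards = {
    "Ward A": [(0, 0), (50, 0), (50, 100), (0, 100)],
    "Ward B": [(50, 0), (100, 0), (100, 100), (50, 100)],
}
new_wards = {
    "Ward A": [(0, 0), (60, 0), (60, 100), (0, 100)],
    "Ward B": [(60, 0), (100, 0), (100, 100), (60, 100)],
}
events = [(10, 20), (55, 40), (80, 90)]

old_counts = count_by_ward(events, old_wards)
new_counts = count_by_ward(events, new_wards)
```

Because each event keeps its own fixed grid reference, only the boundary set changes between the two counts; the event at (55, 40) moves from Ward B to Ward A with no re-referencing of the data.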
Note however that although address-level grid-referencing is powerful, it does have limitations:
not all data can be assigned to an address
automatically assigning grid references to addresses is more difficult than assigning them to postcodes. This is because, unlike postcodes, addresses can be lengthy, complicated and inconsistent. For example, the first line of an address may be a building number and street name, the number of a flat within a building, or the name of a property
because the data relates to individual addresses, greater security precautions may be required to protect the confidentiality of individuals
4.2. Other forms of locational referencing
Address-level grid referencing is appropriate for data events that relate to residential and business properties, but some events relate to other types of location.
For example, if the data event is the occurrence of a specific type of cereal crop, the location will be a field.
Such events can be assigned to land parcels via identifiers such as Ordnance Survey's topographic identifiers (TOIDs) or Land Registry parcel boundaries.
Other events (for example, the location of a street crime) may simply need a grid reference.
An alternative might be to reference to the nearest address.
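Referencing an event to the nearest address could be sketched as below. The address register, its identifiers and the coordinates are hypothetical, and straight-line distance is assumed as the measure of nearness.

```python
import math

# Hypothetical address register: identifier -> (easting, northing) in metres.
ADDRESSES = {
    "addr-1": (451200, 206300),
    "addr-2": (451260, 206310),
    "addr-3": (451900, 207100),
}


def nearest_address(event_location):
    """Return the identifier of the address closest to the event location."""
    return min(ADDRESSES, key=lambda a: math.dist(ADDRESSES[a], event_location))


# A street crime recorded only as a grid reference.
reference = nearest_address((451250, 206305))
```

A linear scan like this is adequate for a handful of addresses; a full address register would normally be searched through a spatial index.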
The key point though is that all data needs suitable, consistent and unambiguous geographic identifiers.
The approach of using postcodes to reference geographic data has been a valuable tool but is subject to a number of serious limitations, especially when trying to produce statistics for small areas.
The move towards geographic referencing based on the postcode centroid offers many advantages in terms of facilitating event linkage, data visualisation and data analysis, but does not eliminate the problems caused by straddling and boundary change.
If, however, a reference can be given at address level, which is something we are working towards, the potential is even greater, allowing for detailed and accurate small-area statistics.
Different types of data will of course require different types of referencing, and issues such as ensuring confidentiality are crucial.
The Office for National Statistics is therefore giving a great deal of attention to ensuring that we utilise geographic referencing in the best possible way.
The result should be a major contribution to the quality of UK statistics.