1. Scope
This document outlines the policy for data linkage and matching performed for research and statistical purposes.
The scope of this policy includes linking (also known as matching) of datasets as well as the dissemination of linked and matched data, where the Office for National Statistics (ONS) is either a data controller or a data processor. It applies to all ONS staff (including contractors) and any external parties accessing ONS systems for the purposes of linking (also known as matching).
2. Background
The Office for National Statistics (ONS) draws on an increasing number of data sources to produce the official statistics the country relies on. The number of sources has increased particularly since the passing of the Digital Economy Act in 2017. Data linking and matching is a powerful technique that combines datasets to enrich the information they contain. This has large potential benefits for government: it can reveal patterns and insights that add value to existing analyses and help answer important questions about our economy and society.
Good data linkage brings cost efficiencies by making better use of existing data, and reduces the burden on respondents by avoiding the re-collection of data. It also improves data quality through the identification and removal of duplication and other inconsistencies. Poor data linkage can cause inaccurate statistics and biased outputs.
As the UK’s recognised national statistical institute, the ONS takes very seriously its responsibility to link datasets securely, ethically and robustly. The ONS is taking a leading role in advancing data linkage techniques and showing its trustworthiness in sharing and linking data through robust data safeguarding and clear public communication.
3. Policy statement
Data linkage and matching is only conducted by the ONS for the purposes of producing statistics or undertaking research that serve the public good.
The ONS will ensure that data linkage and matching is done securely, legally and ethically, with due regard to privacy, and that it complies with well-established quality standards.
4. Policy detail
Data Security and Privacy
Linkage of data from multiple sources may involve sensitive personal or business information. When the data sources are combined, it may be possible to establish unusual or inconsistent attributes, for example, an individual's social, financial or health circumstances. It is essential that information used for linkage projects is kept secure and used in line with the General Data Protection Regulation (GDPR).
The resulting linked dataset is a new data asset that should be added to the appropriate data catalogue in line with the ONS Security Principles and the ONS Data Principles, with relevant data standards applied. A new data sensitivity assessment will need to be completed and the results included in the appropriate data catalogue. Where the new data sensitivity results are higher or lower than those of the multiple sources used, a check should be performed to ensure the new dataset is stored in a secure location commensurate with the new agreed data sensitivity result. Any datasets sent to internal or external recipients should be accompanied by appropriate data and metadata handling instructions.
When linking data sources containing sensitive personal information, consideration should be given to methods for protecting the identities of individuals and organisations, in line with the conditions imposed by data controllers, which will be determined by the potential privacy risks.
Business areas involved in linking should decide, on a case-by-case basis, the optimal method to protect the privacy of data subjects. Two main approaches should be considered:
separation - ensuring that individuals linking or matching data have access only to the variables required for the linkage or matching, and that there is a clear segregation of duties; or
anonymisation - ensuring that anonymised (or more strictly pseudonymised) data are used for linkage.
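As an illustration of the second approach, the sketch below shows one common way of pseudonymising a matching variable: replacing it with a keyed hash, so that records can still be compared for exact agreement without the identifier itself being visible. This is a minimal sketch and not a description of the ONS's own method. The function name `pseudonymise`, the normalisation rule and the example key are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a normalised identifier.

    Hypothetical helper: the normalisation below (trim, upper-case,
    collapse repeated spaces) is an assumption, not an ONS standard.
    """
    normalised = " ".join(value.strip().upper().split())
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held securely, away from the linkers.
key = b"hypothetical-shared-secret"

token_a = pseudonymise("  Jane   Smith ", key)
token_b = pseudonymise("JANE SMITH", key)

# The same person hashes to the same token, so exact matching still works.
print(token_a == token_b)
```

Because the hash is keyed, the same key must be applied to every dataset being linked, and anyone without the key cannot rebuild the tokens from a list of known names.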
The ONS is exploring the use of automated approaches to linkage to improve efficiency, in particular for large datasets, while enhancing data privacy.
Ethics
Data linkage activities must adhere to the UK Statistics Authority's ethical principles and comply with the ONS's Data Ethics Policy.
Quality Standards
In a linking project, errors will occur. Errors may be of the 'false positive' type, where a pair of records is incorrectly linked, or of the 'false negative' type, where a pair of records relating to the same entity is not linked.
Match rates can be used, for example, to show the proportion of records on a dataset that are linked with another dataset, or the proportion of links made by a particular linkage method. However, match rates indicate only the number of matches made, not the quality of the linkage, and so should not be used as a quality metric.
When any new pair of datasets is linked, even if one or both have been linked to another dataset previously, a new set of matching errors arises. The quality of each linkage project should be assessed in terms of the errors made, with estimates of precision (the proportion of links made that are true matches) and recall (the proportion of all true matches that have been found) always being reported. For the linkage of pseudonymised data, evaluation of matching quality will require the data owner to provide subsets of identifiable data.
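The definitions of precision and recall given above can be sketched in a few lines of code. The example assumes that a set of known true matches is available for evaluation (for instance from a clerically checked sample). The function name `linkage_quality` and the record identifiers are hypothetical.

```python
def linkage_quality(links_made, true_matches):
    """Estimate precision and recall for a set of links.

    Both arguments are sets of (record_id_a, record_id_b) pairs.
    """
    true_positives = links_made & true_matches
    false_positives = links_made - true_matches   # linked, but not a true match
    false_negatives = true_matches - links_made   # true match, but not linked
    precision = len(true_positives) / len(links_made) if links_made else 0.0
    recall = len(true_positives) / len(true_matches) if true_matches else 0.0
    return {
        "false_positives": len(false_positives),
        "false_negatives": len(false_negatives),
        "precision": precision,
        "recall": recall,
    }


# Hypothetical evaluation sample: four links made, five true matches.
links_made = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")}
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")}

quality = linkage_quality(links_made, true_matches)
print(quality)  # precision 0.75, recall 0.6
```

In this example four of the five records in the first dataset were linked, a match rate of 80%, yet one of those links is wrong and two true matches were missed. The match rate alone reveals neither error.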
The required quality standard should be determined for each data linkage project. It is anticipated that different projects will have different success criteria for linkage quality, and this may be a key determinant of the most appropriate matching methodology. For example, linkage using exact agreement on matching variables will yield results quickly, but will not find all the possible matches; whereas a strategy employing a range of methods, starting with exact matching and then following with probabilistic and finally clerical resolution would find more good quality matches but at a cost of additional time and resource.
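The staged strategy described above might be sketched as follows. This is an illustration only: a simple string-similarity score from Python's standard `difflib` module stands in for a proper probabilistic model, and the function name `staged_link`, the thresholds (0.9 and 0.75) and the example records are all assumptions rather than recommended values.

```python
from difflib import SequenceMatcher


def staged_link(records_a, records_b, accept=0.9, review=0.75):
    """Link two {record_id: (name, date_of_birth)} dictionaries in stages."""
    links, clerical = [], []
    unmatched_b = dict(records_b)

    # Stage 1: exact agreement on all matching variables.
    remaining_a = {}
    for id_a, key_a in records_a.items():
        exact = [id_b for id_b, key_b in unmatched_b.items() if key_b == key_a]
        if len(exact) == 1:
            links.append((id_a, exact[0], "exact"))
            del unmatched_b[exact[0]]
        else:
            remaining_a[id_a] = key_a

    # Stage 2: score the residuals (similarity score used as a stand-in
    # for a probabilistic model), comparing only records that agree on
    # date of birth.
    for id_a, (name_a, dob_a) in remaining_a.items():
        best_id, best_score = None, 0.0
        for id_b, (name_b, dob_b) in unmatched_b.items():
            if dob_a != dob_b:
                continue
            score = SequenceMatcher(None, name_a, name_b).ratio()
            if score > best_score:
                best_id, best_score = id_b, score
        if best_id is None:
            continue
        if best_score >= accept:
            links.append((id_a, best_id, "score"))
            del unmatched_b[best_id]
        elif best_score >= review:
            # Stage 3: borderline pairs are queued for clerical resolution.
            clerical.append((id_a, best_id, round(best_score, 2)))

    return links, clerical


# Hypothetical records.
dataset_a = {
    "a1": ("JANE SMITH", "1980-01-01"),
    "a2": ("JON DAVIES", "1975-05-05"),
    "a3": ("ALICE WONG", "1990-09-09"),
}
dataset_b = {
    "b1": ("JANE SMITH", "1980-01-01"),
    "b2": ("JOHN DAVIES", "1975-05-05"),
    "b3": ("ALICIA WONG", "1990-09-09"),
}

links, clerical = staged_link(dataset_a, dataset_b)
print(links)     # a1-b1 linked exactly, a2-b2 linked on score
print(clerical)  # a3-b3 queued for clerical review
```

Each later stage finds matches the earlier one missed, but costs more time: the scoring pass compares many more pairs, and every pair in the clerical queue needs a person to resolve it.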
Staff conducting linkage should select the appropriate methods, to make sure that resource and time invested are proportional to the potential use and impact of the linked data.
Regardless of the linkage methods used, the quality of the input data and the number of datasets being linked will also affect the quality of the resulting linked dataset. For example:
missing data values affect the ability of any matching algorithm to make good matching decisions and add uncertainty to the measurement of linkage errors, and
the more datasets that are linked together in sequence, the more potential there is for matching quality to drop
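The second point can be illustrated with simple arithmetic. The sketch below makes a strong simplifying assumption that is not part of this policy: that each linkage step misses true matches independently of the others, so that the share of entities carried correctly through the whole chain is the product of the recall at each step. The function name `chained_recall` and the recall values are hypothetical.

```python
import math


def chained_recall(step_recalls):
    """Overall recall across sequential linkages, assuming independent errors."""
    return math.prod(step_recalls)


# Three linkages in sequence, each with a hypothetical recall of 95%.
overall = chained_recall([0.95, 0.95, 0.95])
print(round(overall, 3))  # approximately 0.857
```

Under this assumption, three linkages that each look strong in isolation leave roughly one in seven entities unlinked somewhere along the chain. In practice errors are rarely independent, for example where the same populations are hard to match at every step, so the figure is indicative only.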
All data linkage projects should have an understanding of the linkage quality and the impact of any matching error or bias, for example, for hard-to-match populations, on subsequent analyses. In addition to reporting on linkage quality, it is good practice to report metrics on characteristics and quality of the original datasets, such as levels of missing data and the date of most recent update, as well as detailed metadata (see Metadata policy).
This policy is consistent with the recommendations made in the National Statistician's Quality Review.
5. Roles and responsibilities
Data Linkage Hub
Responsible for:
- ensuring the consistent application of this policy to its members and all ONS staff.
Accountable to:
- the Lead Data Architect and the Data Architecture Delivery Board.
Data Linkage Lead
Responsible for:
- updating the existing data linkage policy
- reviewing the data linkage policy on or before its review date
ONS Staff carrying out data linkage
Responsible for:
- complying with the data linking and matching policy
- consulting the Data Linkage Hub (in Data Architecture, ONS) before commencing any data linkage activities
Methodology
Responsible for:
- providing advice on data linkage methodology issues
- developing robust and consistent data linkage methods
- providing training and building capability on data linkage methods
Legal Services
Responsible for:
- providing advice on current and evolving legal issues if required.
Senior Information Risk Officer (SIRO)
Responsible for:
- data security.
Information Asset Owner
An Information Asset Owner (IAO) is the senior individual in the business area.
Responsible for:
- an information or data asset.
Accountable to:
- the SIRO.
Data Ethics team
Responsible for:
- providing oversight of the ethics self-assessment tool
- giving advice on ethical issues identified and
- referring any projects with high ethical risks to the National Statistician's Data Ethics Advisory Committee (NSDEC)
National Statistician's Data Ethics Advisory Committee (NSDEC)
Responsible for:
- providing independent advice on data ethics issues, where required.
Accountable to:
- the National Statistician.
Data Governance Committee (DGC)
Responsible for:
- ensuring the consistent application of this policy to all ONS staff
- assessing the organisational risk posed by matching or linking data
Accountable to:
- the National Statistician's Executive Group.