1. Introduction

Office for National Statistics (ONS) is the executive office of the UK Statistics Authority. It is the UK’s national statistical institute and largest producer of official statistics. ONS produces statistics on a range of important economic, social and demographic topics. Official statistics are for the benefit of society and the economy generally and help Britain make better decisions. They allow the formulation of better public policy and the effective measurement of those policies; they inform the direction of economic and commercial activities; they provide valuable information for analysts, researchers, public and voluntary bodies; and they enable the public to hold to account all organisations that spend public money, thus informing democratic debate.

ONS has seen a marked increase in the demand for ad hoc insights alongside traditional statistics, with a substantial increase in the variety and volume of data available and in the types of users who wish to access these data for statistics and research purposes.

The increasing volume and variety of data available, coupled with rapid technological development, will facilitate the processing and analysis of more data in richer and more complex forms. The ability to harness the power of data is critical in enabling official statistics to support the most important decisions facing the country.

To help meet these new challenges and constraints, ONS has developed a set of data principles to guide our data practices and management.

The ONS Data Principles are informed by the United Nations Economic Commission for Europe (UNECE) Common Statistical Data Architecture (CSDA) key principles, which themselves are based on The Open Group Architecture Framework (TOGAF) data principles.

The ONS Data Principles are categorised by their relevance to the “cradle-to-grave” Data Journey pillars and the cross-cutting Data Management contexts covered in the following sections.

2. Data ingestion

The following principles are related to the sourcing, acquisition and ingestion of data.

2.1. Source once, use many times

Statement

To facilitate reuse, collaborative solutions and negotiated data arrangements – in preference to issuing requests or notices to enable this access – will be established for comprehensive data feeds of all the relevant data from the source system, with periodic updates (preferably deltas). The data feed should include all records, entities and attributes at the lowest level of granularity (elementary-level).

Note: When applying this principle, due consideration must be given to legal and regulatory constraints, as well as any constraints imposed by the supplier.

Rationale

The main drivers are:

  • promote reuse – offers the best possibility for reuse of data across the organisation and ensures the data are fit for purpose

  • reduce burden on respondents and data suppliers – a comprehensive elementary-level data feed will reduce repetition or duplication and thus ensure efficient use; this applies equally to survey and other supplied data

  • future-proof data feeds – creating a comprehensive elementary-level data feed at the outset eliminates the need for rework later as requirements change thus reducing subsequent delivery timelines, effort and expenditure

  • simplify the sourcing process – not applying logic to restrict or manipulate the data content at source minimises the impact upon the source system (both during development and at runtime) and speeds up the delivery of data

Implications

The main implications are:

  • extra initial effort – ingesting a comprehensive elementary-level data feed will take extra effort; however, this increase is not linear and it will be significantly less than the effort required to modify the data feed later

  • higher storage costs – the larger datasets will incur higher storage costs; over time, as more of the data are used, this cost will become less of an issue; also, the costs incurred for the additional storage will be significantly lower than the cost of modifying the data feed later

  • increased network traffic – the larger datasets will result in higher network traffic; over the long-term, however, having a single extract of data for onward use will have a lower impact than multiple smaller extracts, and should give a predictable network load

  • higher up-front cost – the full sourcing cost will be incurred at the outset irrespective of whether this is above and beyond the initial data requirements; however, this should reduce the need to go back to the source system for subsequent data requirements

  • effective data management – requires a good understanding of the wider organisational requirements for managing the received data

  • effective data governance – data governance processes and policies need to be in place to ensure that data can be found and used effectively

  • consider future data needs – potential future requirements also need to be taken into consideration when negotiating the agreement with the supplier

  • synchronisation with source – a periodic full refresh of source data will be required in order to ensure the delta feeds are fully synchronised with the source system (a minimal sketch of this pattern follows)
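
To make the delta-plus-refresh pattern concrete, the following minimal Python sketch applies delta feeds (upserts and deletes) to a local copy of the data and periodically re-synchronises it from a full extract. The record layout and function names are illustrative assumptions, not a description of any ONS system.

    # Minimal sketch of applying delta feeds with a periodic full refresh.
    # The record layout ("id", "op", "data") is an illustrative assumption.

    def apply_delta(store: dict, delta: list[dict]) -> None:
        """Apply a delta feed of upserts and deletes to the local store."""
        for record in delta:
            if record.get("op") == "delete":
                store.pop(record["id"], None)         # remove records deleted at source
            else:
                store[record["id"]] = record["data"]  # insert or update (upsert)

    def full_refresh(store: dict, snapshot: list[dict]) -> None:
        """Replace local state with a full extract to re-synchronise with source."""
        store.clear()
        for record in snapshot:
            store[record["id"]] = record["data"]

    # Routine delta loads, then a periodic full refresh to correct any drift.
    store: dict = {}
    apply_delta(store, [{"id": 1, "op": "upsert", "data": {"name": "Acme Ltd"}}])
    apply_delta(store, [{"id": 1, "op": "delete"}])
    full_refresh(store, [{"id": 2, "data": {"name": "Widget Co"}}])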

2.2. One way in

Statement

Data feeds can arrive via any one of the agreed strategic routes. However, there is only one entry point for ingestion into a strategic data store – that is, data is not loaded directly into any of the downstream layers.

Rationale

The main drivers are:

  • border control – manage data and metadata quality, consistency, integrity and security at the point of entry

  • data inventory – simplifies the process for maintaining a detailed inventory of inbound data that can be made available to the users

  • avoid duplication – reduces the risk of duplicating feeds

Implications

The potential implication is:

  • perceived lack of flexibility – users will not be able to load data directly into any of the downstream layers which may be perceived as slowing them down; however, this can be mitigated with streamlined ingest processes.

2.3. No data without metadata

Statement

Metadata are relevant to all phases of the data journey and are driven by both ONS business and external user needs.

Specifically, appropriate metadata accompany ALL data received and are provided for data (datasets and/or data elements) that are created subsequently and/or derived from any received data.
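
As an illustration only, the sketch below shows the kind of dataset-level metadata record that could accompany a feed, and an ingest step that rejects data arriving without one; the field names are assumptions, not an ONS metadata standard.

    # Illustrative sketch: a dataset-level metadata record accompanies every
    # feed, and feeds arriving without metadata are rejected.
    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class DatasetMetadata:
        dataset_id: str            # unique identifier for the dataset
        supplier: str              # provenance: who supplied the data
        received: date             # when the feed was received
        variables: dict[str, str]  # variable name -> business definition
        derived_from: list[str] = field(default_factory=list)  # lineage links

    def ingest(data: list[dict], metadata: DatasetMetadata | None) -> None:
        if metadata is None:
            raise ValueError("feed rejected: no data without metadata")
        print(f"ingesting {len(data)} records of {metadata.dataset_id}")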

Rationale

Metadata are needed to correctly and efficiently manage the ingestion, storage and subsequent usage of the data.

The main drivers are:

  • data provenance and lineage – the metadata will provide data provenance as well as business and technical lineage of the ingested datasets and data elements

  • impact assessment – the lineage will allow timely and cost-effective technical assurance for any change projects, impact assessment or any new projects

  • data classification – metadata will facilitate the categorisation of data elements in business terminology, conforming to agreed frameworks and the identification of synonyms, as well as further enrichment by attaching business metadata and workflow for data stewardship

  • data discovery – metadata will enable users to search and explore datasets based upon concepts, variable names and links to similar concepts (semantics, and so on), which will be particularly important for data exploration and data science

  • data governance – a good understanding of the data will expedite the addressing of data quality issues as well as the enforcement of data security and privacy requirements

  • process resilience – data processing can adapt to changes in source data structures

Implications

The potential implications are:

  • user access – users will not have access to data unless metadata is provided

  • upfront effort – the organisation will need to invest in the creation and management of metadata

  • understand business process – both data as a service (DaaS) and the downstream developers must understand the relationship between business process and data to allow the correct processing of meaningful metadata throughout the “data journey”; this is part of treating data as an asset

  • standards and guidelines – need to define robust standards and guidelines for creating, processing, provisioning and subsequently disseminating metadata

  • effective metadata governance – governance processes and policies need to be in place to ensure that metadata is managed efficiently and effectively

  • solutions and processes – solutions and processes need to be put in place for the effective and efficient collation and management as well as search, discovery and utilisation of metadata

2.4. Retain as-received data

Statement

Data received from source systems are stored in an “as-received” state – that is, before any processing (filtering, transformation, aggregation, and so on) is applied.
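
A minimal sketch of a write-once landing area is shown below, assuming a file-based store; the paths, naming scheme and checksum choice are illustrative, not ONS implementation detail.

    # Sketch of an immutable "as-received" landing area: raw feeds are written
    # once, stamped and checksummed, and never modified in place.
    import hashlib
    from datetime import datetime, timezone
    from pathlib import Path

    LANDING = Path("landing")  # illustrative location for as-received data

    def land_feed(source: str, payload: bytes) -> Path:
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        digest = hashlib.sha256(payload).hexdigest()[:12]  # integrity check
        target = LANDING / source / f"{stamp}_{digest}.raw"
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            raise FileExistsError("as-received data is write-once")
        target.write_bytes(payload)  # stored before any transformation is applied
        return target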

Rationale

The main drivers are:

  • minimise source system impact – in the event of an error, it should be possible to roll back and reload the data into downstream layers without the source systems having to recreate the feeds unnecessarily

  • historic view of received data – provide the capability to retrospectively view the data as it was received; this is particularly useful as operational systems often tend to maintain only the current view

  • operational data archive – provide a read only data archiving service for decommissioned operational systems

Implications

The potential implications are:

  • higher storage costs – keeping a copy of all the feeds for the appropriate retention periods will require significant storage; however, this can be mitigated by opting for a cost-effective storage solution

  • retention policy – as-received records should be stored for the maximum length of time permitted by the data retention policy; after this point, the records must be deleted automatically

  • change management – effort is required to ensure that changes to the data structure of source feeds can be accommodated, and that retention of as-received data can continue; the metadata feed should enable automated processes to manage these changes

3. Data processing

The following principles are relevant to the internal preparation and storage of data.

3.1. Process all data

Statement

All incoming data are loaded, processed and made available for provisioning.

Subsets of data can be subsequently created as part of the provisioning process.

Note: When applying this principle, due consideration must be given to legal and regulatory constraints, as well as any constraints imposed by the supplier.

Rationale

The main drivers are:

  • business continuity – if a data feed contains an acceptable number of quality issues and/or errors, all data should be processed and provisioned to consumers

  • reduce requests for resubmission – managing any data quality issues and errors at this stage minimises the need to request resubmissions from the data suppliers

  • reduce future effort – the effort needed to augment an existing dataset will be considerably less if the additional data elements are readily available and there is no need to fetch them from the underlying layers and/or source

Implications

The potential implications are:

  • data error and exception handling – this approach will require error and exception handling routines to process all data (see the sketch after this list)

  • comprehensive error and exception reports – to ensure that all data quality issues are forwarded to the appropriate data stewards will require a comprehensive suite of error and exception reports

  • data correction process – wherever it is practical and/or feasible, data correction will be done within the strategic data store; only as a last resort should a replacement feed be requested from the data supplier

  • de-identification of sensitive data – sensitive personal identifying information must be appropriately masked, pseudonymised or anonymised to suit the relevant usage scenario
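
The sketch below illustrates one way such error and exception handling routines could work, quarantining failing records for data stewards rather than rejecting the whole feed; the validation rule and names are illustrative assumptions.

    # Sketch of "process all data" with an error quarantine: failing records
    # are routed to an exception report instead of blocking the feed.

    def validate(record: dict) -> None:
        # Illustrative rule only; real validation would be far richer.
        if record.get("turnover", 0) < 0:
            raise ValueError("turnover cannot be negative")

    def process_feed(records: list[dict]) -> tuple[list[dict], list[dict]]:
        accepted, exceptions = [], []
        for record in records:
            try:
                validate(record)
                accepted.append(record)
            except ValueError as err:
                # quarantined for the data steward's exception report
                exceptions.append({"record": record, "error": str(err)})
        return accepted, exceptions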

3.2. Store elementary-level data

Statement

Data are stored at the lowest level of granularity available (elementary-level). They can subsequently be rolled up as part of the provisioning process.
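
The asymmetry between rolling up and drilling down is easy to demonstrate. In the illustrative Python sketch below (the dataset and grouping variable are invented), regional totals are derived from elementary rows, while the reverse is impossible.

    # Roll-up is always possible from elementary-level data; the elementary
    # rows can never be recovered from the pre-aggregated totals alone.
    from collections import defaultdict

    elementary = [                       # one row per business: lowest granularity
        {"region": "North East", "turnover": 120},
        {"region": "North East", "turnover": 80},
        {"region": "Wales", "turnover": 200},
    ]

    totals: dict[str, int] = defaultdict(int)
    for row in elementary:
        totals[row["region"]] += row["turnover"]

    print(dict(totals))  # {'North East': 200, 'Wales': 200}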

Rationale

The strategic benefits for storing data at the elementary-level are as follows:

  • drill-down/roll-up – elementary-level data can always be rolled up or drilled into as and when needed, but elementary-level details cannot be derived from pre-aggregated information

  • future-proof data needs – storing elementary-level data not only fulfils the current requirements but it also provides the flexibility and extensibility to handle future requests with reduced effort

  • improve data quality process – elementary-level data are useful for troubleshooting data quality issues and they also provide lineage back to the underlying data in the source systems

Implications

The potential implications are:

  • higher storage costs – there will be higher storage costs due to the increased data volumes; over time as the benefits are realised, this cost will become less of an issue; also, the costs incurred for the additional storage will be significantly lower than the cost of altering the granularity later

  • extra initial effort – extra up-front effort may be needed to design, build and test the entities for housing the elementary-level data; however, subsequent rework will be significantly reduced

  • higher up-front cost – the full cost of storing elementary-level data will be incurred at the outset irrespective of whether this is above and beyond the initial requirements; however, this should reduce the need to go back to previous layers or even the source system to meet new granular data requirements

3.3. Maintain change history

Statement

A history of data changes is maintained using appropriate industry standard techniques.

Scenarios where this principle is relevant include:

  • a full dataset is supplied on a periodic basis where the temporal aspects of the dataset must be identified to support historical analysis

  • a full or partial dataset is re-supplied to correct issues with previously supplied data

In certain scenarios, it may be appropriate to maintain change history at the dataset level rather than the record level.
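
One widely used industry-standard technique for record-level history is the type 2 slowly changing dimension, sketched below in Python; the field names are illustrative assumptions, not a prescribed ONS schema.

    # Sketch of record-level change history (type 2 slowly changing dimension):
    # each change closes the current version of a record and opens a new one,
    # so historic views can be recreated for any date.
    from datetime import date

    history: list[dict] = []

    def upsert(key: str, attributes: dict, effective: date) -> None:
        for row in history:
            if row["key"] == key and row["valid_to"] is None:
                if row["attributes"] == attributes:
                    return                   # no change, nothing to record
                row["valid_to"] = effective  # close the superseded version
        history.append({"key": key, "attributes": attributes,
                        "valid_from": effective, "valid_to": None})

    def as_at(key: str, when: date) -> dict | None:
        """Recreate the historic view of a record as at a given date."""
        for row in history:
            if (row["key"] == key and row["valid_from"] <= when
                    and (row["valid_to"] is None or when < row["valid_to"])):
                return row["attributes"]
        return None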

Rationale

The main drivers are:

  • historic views of data – the main benefit of having the full change history is that it will provide the capability to recreate historic views of the data

  • audit purposes – change history is needed for audit purposes and may also be needed to support certain legal and regulatory requirements

  • data provenance and lineage – the change history will support data provenance as well as business and technical lineage of the datasets and data elements

Implications

The potential implications are:

  • logical deletes only – data will not be physically deleted but logical (or soft) deletes will be implemented instead; this gives the capability to recreate historic views of the data

  • higher storage costs – there will be higher storage costs due to the increased data volumes; over time as the benefits are realised, this cost will become less of an issue

  • extra initial effort – additional effort will be needed to design, build and test the history handling functionality; however, subsequent rework will be significantly reduced

  • higher up-front cost – the full cost of developing the history handling functionality and additional storage costs will be incurred at the outset irrespective of whether this is above and beyond the initial requirements; however, these costs will be significantly lower than the cost of retrofitting history handling later

3.4. Create consistent keys

Statement

Surrogate keys (system-generated unique identifiers) are assigned when the data is cleansed, conformed and integrated. These keys then remain unchanged in all subsequent downstream layers. These keys are used in place of business keys by a system to ensure data consistency. They enable data linking and the de-identification of sensitive personal identifying information.
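
A minimal sketch of surrogate key assignment follows; the in-memory mapping and integer key format are illustrative assumptions (a production system would persist the mapping securely).

    # Sketch of minting stable surrogate keys in place of business keys.
    import itertools

    _counter = itertools.count(1)
    _key_map: dict[str, int] = {}  # business key -> surrogate key

    def surrogate_key(business_key: str) -> int:
        """Return the existing surrogate key, or mint a new one exactly once."""
        if business_key not in _key_map:
            _key_map[business_key] = next(_counter)
        return _key_map[business_key]

    # The same business key always yields the same surrogate key, so datasets
    # can be linked on the surrogate while the identifying value is withheld.
    assert surrogate_key("NHS-123-456") == surrogate_key("NHS-123-456")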

Rationale

The main drivers are:

  • enables data linking – having consistent key values enables the linking of datasets by joining of common data elements

  • de-identification of sensitive data – surrogate keys enable the masking, pseudonymising and/or anonymisation of sensitive personal identifying information

  • traceability of data – simplifies the traceability of data for lineage and audit purposes

  • changes over time – surrogate keys allow the consistent storage and retrieval of record history over time

Implications

The potential implications are:

  • key management – a robust mechanism is required for key generation and validation; however, this one-off cost will be rapidly recovered through the benefits outlined previously

  • tracing masked data – it may be possible to trace masked data back to the underlying raw identifying data; however, this will not be an issue as all user access will be managed and controlled, as defined by the principle “appropriate access (role based)”

4. Data provisioning

The following principles cover the sharing of data by authorised users.

4.1. Share data

Statement

Data from the various source systems are conformed and consolidated into integrated views of a strategic data store, which can be shared by authorised users in accordance with ONS Data Security principles.

Note: When applying this principle, due consideration must be given to legal and regulatory constraints, as well as any constraints imposed by the supplier.

Rationale

The main drivers for sharing data from a strategic data store are:

  • single version of the truth – data from the numerous operational systems (as well as external sources) will be integrated to provide a “single version of the truth” and then made available to the users

  • eliminate data inconsistencies – having separate repositories will inherently result in data inconsistencies between them; having a strategic data store will eliminate such inconsistencies

  • reduce maintenance costs – having a strategic data store is less costly to maintain than having separate repositories of duplicated data for the various groups of users as changes are only implemented in one place

Implications

The potential implications for sharing data from a strategic data store across disparate groups of users are:

  • collaborative approach – the various user groups need to adopt a collaborative approach towards the maintenance and access of the data

  • enterprise data model – sharing metadata and conforming to an enterprise data model will go a long way towards ensuring the efficacy of the common shared environment

  • data security – giving users access to a common shared environment places a greater emphasis on managing data security more rigorously

5. Data publishing

5.1. Publish data via approved routes

Statement

Data are made available to external parties via a website or by direct access to provisioned data.

The full set of ONS Data Publishing principles is included in an Annex to this document.

Rationale

The main driver is:

  • self-service – enable both tailored and customised publishing of data

Implications

The potential implications are:

  • reduce requests for data – by making the data widely available on a self-service basis there will be a reduction in the number of requests for data made on an ad-hoc basis directly to ONS Customer Service staff

  • reduced costs – as a direct result ONS costs will be reduced

6. Data security

6.1. Secure data storage

Statement

Data at rest is appropriately protected in support of the Data Protection Act (DPA), General Data Protection Regulation (GDPR) and ONS Security Policy.

Rationale

The main drivers are:

  • The Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • ONS Security Policy – a definition of the various mechanisms employed by ONS to provide security for data and more widely

  • ONS Security Principles – the approach taken to security for the Data Access Platform (DAP)

Implications

The potential implications are:

  • higher administrative effort – data will need to be assessed and assigned a security classification

  • increased technical effort – various tiers of infrastructure will be required to match the data security classification and the data storage techniques identified

6.2. Secure data transmission

Statement

Data “in flight” is appropriately protected in support of the Data Protection Act (DPA), General Data Protection Regulation (GDPR) and ONS Security Policy.

Rationale

The main drivers are:

  • The Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • ONS Security Policy – a definition of the various mechanisms employed by ONS to provide security for data and beyond

  • ONS Security Principles – the approach taken to security for the Data Access Platform (DAP)

Implications

The potential implications are:

  • higher administrative effort – data will need to be assessed and assigned a security classification

  • increased technical effort – various transmission mechanisms will be required to match the data security classification

6.3. Legal and regulatory compliance

Statement

Data stores comply with all relevant laws, policies, regulations and standards of good practice regarding the storing, processing, sharing, viewing and disposal of data.

In addition, the data stores also comply with any restrictions to data usage that have been imposed by the data supplier via a Service Level Agreement (SLA) and/or Memorandum of Understanding (MoU).

Rationale

The main drivers are:

  • compliance is compulsory – there is generally a mandatory requirement to comply with all relevant laws, policies, regulations and standards of good practice

  • avoid the consequences – compliance can be extremely challenging particularly where there are conflicts between opposing requirements – for example, the regulatory requirement to store detailed information versus an individual’s right to anonymity; however, the consequences of non-compliance are severe and therefore should be taken seriously; they are as follows:

    • investigation by the relevant authorities
    • risk of prosecution, claims for damages or other civil proceedings
    • incurring fines, penalties and even custodial sentences
    • loss of reputation and public confidence
  • provides indirect benefits – in addition to avoiding the scenarios outlined previously, implementing a good compliance approach can also result in the following indirect benefits:

    • data quality and efficiency improvements
    • improves trust in the ONS brand
    • improvements in general risk management
    • getting to “know the customer” better

Implications

The implications of implementing a robust compliance approach are as follows:

  • compliance training – both technical staff and business users must be made aware of the compliance rules and regulations regarding the storage, processing, sharing, viewing and disposal of the data

  • high implementation cost – implementing the appropriate processes to meet the compliance requirements can be very complex, time consuming and hence costly

  • compliance specialists – there may be a need to recruit specialist consultants or provide training for the existing staff

  • evidence of compliance – evidence (for example, impact assessment reports) should be retained as it may be required for data governance purposes (for example, when investigating possible misuse of data)

7. Data access

7.1. One way out

Statement

User access for both internal and external consumers is provided in a controlled manner via a single logical data access layer.

Rationale

The main drivers are:

  • border control – manage data access, quality, consistency, integrity and security in one location – that is, data access layer; this will eliminate data anomalies, inconsistencies and “dirty” reads that could occur by accessing the underlying layers directly

  • lockdown data preparation layers – giving users direct access to the underlying data preparation layers not only raises security concerns but will frequently result in data read inconsistencies; there is also the possibility of inadvertent record-locking causing the data preparation processes to fail

Implications

The potential implications are:

  • tailored views – various “views” of the data will be required to support the concept of “lenses”; that is, different users will view and access data differently, according to the principle of appropriate access (which follows)

  • higher administrative effort – data will need to be classified dependent upon their security category and sensitivity

  • increased technical effort:

    • various transmission mechanisms will be required to match the data security category
    • the dissemination domain (website and open API) will need to support data that are fit for dissemination purposes

7.2. Appropriate access (role-based)

Statement

Access to data is controlled dependent upon the role of the party requiring access.

Rationale

The main drivers are:

  • compliance – compliance with legal and regulatory mandates and the ONS Data Confidentiality principle associated with sharing and viewing of data

  • user need – users who need and are entitled to access data for their work can do so

Implications

The potential implications are:

  • higher administrative effort:

    • data sensitivity must be assigned to data
    • user roles must be defined
    • user role/sensitivity matrix must be completed and maintained (a minimal sketch follows this list)
  • increased technical effort – various technical mechanisms will be required to support the assignment of data sensitivity and user roles
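
As a minimal sketch of such a matrix (the roles and sensitivity tiers below are invented for illustration):

    # Sketch of a user role / data sensitivity matrix for role-based access.
    MATRIX: dict[str, set[str]] = {
        "researcher":   {"public", "de-identified"},
        "data_steward": {"public", "de-identified", "identifiable"},
        "public_user":  {"public"},
    }

    def can_access(role: str, sensitivity: str) -> bool:
        return sensitivity in MATRIX.get(role, set())

    assert can_access("researcher", "de-identified")
    assert not can_access("public_user", "identifiable")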

7.3. Timely access to data

Statement

Access controls do not unduly delay access to data.

Rationale

The main drivers are:

  • service level agreements (SLAs) and memoranda of understanding (MoUs) – conformance with any SLA or MoU

  • efficient data provisioning – timely access to data following an initial user request

Implications

The potential implications are:

  • higher administrative effort – an SLA or MoU will be established for each data access route for each dataset

  • increased technical effort – various technical mechanisms will be required to support any SLA or MoU

8. Data retention and storage

8.1. Keep data for long enough

Statement

Retain personal data and metadata no longer than is necessary for the purpose for which they were obtained (Information Commissioner's Office (ICO), 2018).

Rationale

The main drivers are:

  • ICO Data Protection Principles – Schedule 1 to the Data Protection Act (DPA) defines the Data Protection Principles

  • the Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • National Statistician’s Data Ethics Committee (NSDEC) Principles – the use of data has clear benefits for users and serves the public good

Implications

The potential implications are:

  • increased storage requirements – as a direct result of retaining data, the amount of storage required will increase

  • higher administrative effort to:

    • define and agree a retention period for both data and metadata
    • allocate the retention period to data assets
    • record (as metadata) the purpose the data were obtained for
  • reputational damage – the reputation of ONS could suffer because of a lack of available data if they are deleted too soon

8.2. Don’t keep data for too long

Statement

Information is not to be kept forever – it is deleted when it is no longer needed for historical, statistical or research purposes (Information Commissioner's Office (ICO), 2018).
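
A minimal sketch of retention-driven disposal follows; the dataset names and retention periods are invented for illustration, and real disposal would also cover backups and record the action as metadata.

    # Sketch: each dataset carries its agreed retention period as metadata,
    # and datasets past that period are removed.
    from datetime import date, timedelta

    datasets = [
        {"id": "survey-a", "obtained": date(2015, 1, 1), "retain_days": 1825},
        {"id": "survey-b", "obtained": date(2018, 6, 1), "retain_days": 3650},
    ]

    def expired(ds: dict, today: date) -> bool:
        return today > ds["obtained"] + timedelta(days=ds["retain_days"])

    def purge(datasets: list[dict], today: date) -> list[dict]:
        for ds in datasets:
            if expired(ds, today):
                print(f"deleting {ds['id']} (retention period attained)")
        return [ds for ds in datasets if not expired(ds, today)]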

Rationale

The main drivers are:

  • ICO Data Protection Principles – Schedule 1 to the Data Protection Act (DPA) defines the Data Protection Principles

  • the Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • National Statistician’s Data Ethics Committee (NSDEC) Principles – the use of data has clear benefits for users and serves the public good

Implications

The potential implications are:

  • higher administrative effort to:

    • ensure data are removed when the retention period is attained or when the data are no longer required
    • need to have the ability to retain data and metadata on a case-by-case basis in order to ensure that evidence is not lost if supporting an ongoing data misuse investigation
  • increased technical effort:

    • to ensure the data, including any backups, are removed
    • need to have the ability to retain data and metadata on a case-by-case basis in order to ensure that evidence is not lost when supporting an ongoing data misuse investigation
  • record metadata – for data that have been backed up, restored from backup and disposed of (this covers the full data lifecycle, and includes the location of the data)

8.3. Respect local laws or regulations

Statement

Where necessary, data are retained in compliance with any local mandates.

Rationale

The main driver is:

  • Local Data Protection mandates – there is local jurisdictional data protection legislation that must be adhered to when data are either provided or consumed within those jurisdictions

Implications

Depending upon the nature of the local mandates, potential implications are:

  • increased storage requirements – the amount of storage required may increase

  • more complex storage requirements – local mandates may require varied storage requirements

  • higher administrative effort to:

    • understand and implement processes to address local mandates
    • allocate the retention period to data assets
  • reputational damage – the reputation of ONS could suffer because of a lack of adherence to local mandates

8.4. Archive

Statement

Data no longer required for immediate access are taken offline for future reference, but are held as securely as currently operational data.

Rationale

The main drivers are:

  • cost reduction – archived data storage is less costly than operational data storage

  • performance improvements – less data will be located on prime infrastructure, freeing capacity for processes that require operational data

Implications

The potential implications are:

  • more complex storage requirements – managing both archive and operational data storage will increase the complexity

  • higher administrative effort – to determine which data can be stored in an archive environment

  • increased technical effort – recording metadata for data which has been archived, restored from archive and disposed of (this covers the full data lifecycle, and includes the location of the data)

8.5. Back-up

Statement

Data that are copied for business readiness reasons are held as securely as currently operational data.

Rationale

The main driver is:

  • ONS Business Continuity Plan (BCP) – data must be copied so that in the event of a disaster, data can be made available to users and consumers from an alternative data store or the data restored

Implications

The potential implications are:

  • additional off-site storage requirements – to ensure that copied data is unaffected by a disaster, one approach is to store the data in a different location

  • higher administrative effort – to define and agree the data to be backed-up

  • the reputation of ONS could suffer – because of the lack of availability of data that have been backed up

  • increased technical effort:

    • to ensure the data are copied to the back-up infrastructure and on the correct media type
    • to periodically ensure the restore from back-up process functions correctly
  • record metadata – for data that have been backed up, restored from back-up and disposed of (this covers the full data lifecycle, and includes the location of the data)

9. Personal data identification

9.1. Data confidentiality

Statement

The confidentiality of data subjects is protected through appropriate application of the Five Safes Framework.

The Five Safes Framework reflects the different dimensions of data use that can be controlled to protect confidentiality. Depending on the sensitivity of the data, and the intended use, each dimension can be controlled to a greater or lesser extent. The five dimensions are:

  • people

  • projects

  • settings

  • outputs

  • data

Rationale

The main drivers are:

  • ensure confidentiality – demonstrate the steps we take to ensure confidentiality, to maintain the trust of the various parties (data subjects, data owners and regulators)

  • flexible framework – to enable different sensitivities of the same data to be used for different purposes (for example, ONS internal use of data versus public consumption of published statistics)

Implications

The potential implications are:

  • higher administrative effort – to assess that the proposed controls for each dataset and its use are appropriate and properly implemented

9.2. Data de-identification

Statement

Where necessary, data that enable personal identification are removed.

Rationale

The main drivers are:

  • the Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • National Statistician’s Data Ethics Committee (NSDEC) Principles – the use of data has clear benefits for users and serves the public good

Implications

The potential implications are:

  • higher administrative effort – to identify data items that could be used to identify individuals, typically described as “personal identifiers”

  • increased technical effort – to implement technical solutions to support the business rules

  • the reputation of ONS could suffer – if an individual’s data can be identified

9.3. Data anonymisation

Statement

Anonymising data requires that identifiers are:

  • removed

  • obscured

  • aggregated

  • altered in some way

The term “identifiers” is often misunderstood to simply mean formal identifiers such as the data subject’s name, address and unique identification numbers, for example, a Social Security or National Health Service number. But identifiers can in principle include any piece of information, or combination of pieces of information, that makes an individual unique in a dataset and as such vulnerable to re-identification.
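
To illustrate the basic operations (the field names, secret handling and banding rule below are assumptions, and real statistical disclosure control is considerably more involved):

    # Sketch of basic de-identification: remove direct identifiers, replace a
    # linking identifier with a keyed hash (pseudonymisation) and aggregate a
    # quasi-identifier into a band.
    import hashlib
    import hmac

    SECRET = b"replace-with-managed-secret"  # assumption: held in a key store

    def pseudonymise(record: dict) -> dict:
        out = dict(record)
        out.pop("name", None)                 # remove formal identifiers
        out.pop("address", None)
        out["person_key"] = hmac.new(         # obscure the linking identifier
            SECRET, out.pop("nhs_number").encode(), hashlib.sha256
        ).hexdigest()
        out["age_band"] = f"{(out.pop('age') // 10) * 10}s"  # aggregate age
        return out

    print(pseudonymise({"name": "A Person", "address": "1 High St",
                        "nhs_number": "123 456 7890", "age": 47}))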

Rationale

The main drivers are:

  • ONS Statistical Disclosure Control (best practice) – best practice for applying disclosure control to statistical data, as defined by ONS Methodology

  • the Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • National Statistician’s Data Ethics Committee (NSDEC) Principles – the use of data has clear benefits for users and serves the public good

Implications

The potential implications are:

  • higher administrative effort – to identify business rules for different anonymisation scenarios

  • increased technical effort – to implement technical solutions in support of the business rules

9.4. Data is sensitive

Statement

All data that ONS manages have a “value” for business use and a sensitivity based on their content. The security required to protect data is based on their sensitivity. Assessment of sensitivity is undertaken through compliance with the ONS Data Sensitivity Model.

Rationale

The main drivers are:

  • the Data Protection Act (DPA) – controls how personal information is used by organisations, businesses or the government

  • General Data Protection Regulation (GDPR) – the GDPR aims primarily to give control to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU

  • ONS Data Sensitivity Model

Implications

The potential implications are:

  • higher administrative effort – to identify business rules for the sensitivity of data

  • increased technical effort – to implement technical solutions in support of the business rules

10. Common terms and usage

These principles relate to consistent terminology being applied to data, in all phases of the “data journey”, enabling reuse of data consistently.

10.1 Common controlled vocabulary

Statement

All users have a common understanding of the various terms and their usage. These terms are unique and have an unambiguous, non-redundant definition that is managed centrally by a controlled vocabulary registration authority.

Note: The scope of this principle encompasses the entire process from inbound data feed through to consumption by the end user.
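
A minimal sketch of such a registration authority’s registry follows; the class, method names and sample term are invented for illustration.

    # Sketch of a controlled vocabulary registry: terms are unique, carry a
    # single unambiguous definition, and synonyms resolve to the canonical term.
    class VocabularyRegistry:
        def __init__(self) -> None:
            self._terms: dict[str, str] = {}     # canonical term -> definition
            self._synonyms: dict[str, str] = {}  # synonym -> canonical term

        def register(self, term: str, definition: str) -> None:
            if term.lower() in self._terms:
                raise ValueError(f"'{term}' is already registered")  # uniqueness
            self._terms[term.lower()] = definition

        def add_synonym(self, synonym: str, term: str) -> None:
            self._synonyms[synonym.lower()] = term.lower()

        def define(self, word: str) -> str:
            canonical = self._synonyms.get(word.lower(), word.lower())
            return self._terms[canonical]

    registry = VocabularyRegistry()
    registry.register("Respondent", "A person or business that supplies data")
    registry.add_synonym("Data supplier", "Respondent")
    assert registry.define("Data supplier") == registry.define("Respondent")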

Rationale

The main drivers are:

  • eliminate inconsistencies – having a common understanding of the various terms and their usage will eliminate inconsistencies and misunderstanding between the various user groups

  • efficient communication and data exchange – having a common understanding will facilitate effective communication between users as well as expediting data exchange between systems

  • single version of the truth – having a common understanding of the various terms and their usage will promote a “single version of the truth”

  • enable reuse – supports the reuse of data elements stored in a heterogeneous manner

Implications

The potential implications are:

  • enterprise-wide alignment – everyone across the enterprise needs to buy into the concept of a common controlled vocabulary and the value that it brings

  • upfront effort – to establish a common controlled vocabulary, a significant initial investment will be required to collate and agree the various terms and their usage across the enterprise

  • maintenance and governance – to ensure that the common controlled vocabulary remains up-to-date and is used correctly, processes need to be set up and resources allocated for ongoing maintenance and governance

Data principles as of January 2019.

For further details please contact data.architecture@ons.gov.uk
