1. Scope

This policy applies to all Office for National Statistics (ONS) staff activities involving web scraping of non-personal data. When obtaining or procuring web scraping services from a third party, the ONS will seek to ensure that the overarching principles contained in this policy are met. The policy outlines important principles of web scraping and provides practical guidance.

This policy does not cover the use of Application Programming Interfaces (APIs). It differs from the ONS Social Media Data Policy, which outlines procedures related to the collection, use and analysis of social media data obtained from social media platforms. It also differs from the ONS Open Data Policy, which provides guidance on collection and use of open government, academic, for-profit and non-profit organisation data for statistics and statistical research.

2. Background

The use of alternative data sources is an important element of the Office for National Statistics' (ONS') current five-year strategy, Statistics for the Public Good, for delivering high quality data and analysis to inform the UK, improve lives and build the future. Driven by this strategic goal, ONS staff may use web scraping as an alternative data collection mechanism that can complement and improve traditional forms of data collection such as surveys. Web scraping is the collection of data automatically retrieved from the internet.

The purpose of this policy is to ensure that web scraping at the ONS is carried out transparently, consistently, ethically, and in accordance with all relevant legislation.

3. Policy statement

This policy sets out the practices and principles that Office for National Statistics (ONS) staff will follow when scraping data from websites to produce statistics and conduct statistical research, including exploratory research, that serve the public good.

When web scraping, the ONS will minimise any burden on websites, respect the robots exclusion protocol and associated restrictions, abide by all applicable legislation, and monitor the evolving legal situation.

4. Policy detail

Web scraping is only conducted by the Office for National Statistics (ONS) for the purposes of one or more of its functions set out in the Statistics and Registration Service Act 2007 and the Census Act 1920. The ONS' functions are the production and publication of official statistics that serve the public good.

The ONS' web scraping will minimise the burden on website owners. The practices we will follow include, where applicable:

  • delaying accessing pages on the same domain

  • adding idle time between requests

  • limiting the depth of crawl within the given domain

  • when scraping multiple domains, parallelising the crawling operation to minimise successive requests to the same domain

  • scraping at a time of day when the website is unlikely to be experiencing heavy traffic (for example, early in the morning or at night)

  • optimising the web scraping strategy to minimise volumes of requests to domains

  • only collecting parts of pages required for the purpose
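
Two of the practices above, adding idle time between requests to the same domain and interleaving requests across domains, can be sketched in Python as follows. This is an illustrative sketch only, not the ONS implementation: the names `interleave_by_domain` and `DomainThrottle`, the five-second delay, and the example URLs are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import zip_longest
from urllib.parse import urlparse

# Hypothetical idle time between requests to one domain; the policy does
# not specify a value.
MIN_DELAY_SECONDS = 5.0


def interleave_by_domain(urls):
    """Order URLs round-robin across domains so that consecutive
    requests go to different domains wherever possible."""
    by_domain = defaultdict(list)
    for url in urls:
        by_domain[urlparse(url).netloc].append(url)
    ordered = []
    # Take one URL from each domain in turn until all are used.
    for batch in zip_longest(*by_domain.values()):
        ordered.extend(u for u in batch if u is not None)
    return ordered


class DomainThrottle:
    """Remembers the last request time per domain and reports how long
    the scraper should stay idle before contacting that domain again."""

    def __init__(self, min_delay=MIN_DELAY_SECONDS):
        self.min_delay = min_delay
        self._last_request = {}

    def seconds_to_wait(self, url, now):
        last = self._last_request.get(urlparse(url).netloc)
        if last is None:
            return 0.0  # domain not contacted yet
        return max(0.0, self.min_delay - (now - last))

    def record(self, url, now):
        self._last_request[urlparse(url).netloc] = now


urls = [
    "https://a.example/1",
    "https://a.example/2",
    "https://b.example/1",
]
ordered = interleave_by_domain(urls)
throttle = DomainThrottle()
```

In a real scraping loop, the caller would sleep for `throttle.seconds_to_wait(url, time.monotonic())` before each request and call `throttle.record(...)` afterwards. Passing the clock value in as an argument keeps the throttle easy to test without real waiting.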

If substantial amounts of data are extracted on a regular basis, and/or if web scraping lasts for longer than three months, this will be communicated clearly to website owners in writing.

Before web scraping activities commence, the ONS will contact website owners, providing information on the purpose and scope of web scraping, the duration of the project, how to identify an ONS web scraper, a link to this policy, and how to opt out. Website owners will have two weeks to respond to the ONS request. If no reply is received, the ONS will take it as no objection and web scraping activities will commence after this period.

There may be an exception to this notice period when there is a strong case on the basis of national interest. This case will be clearly explained to website owners in writing.

During web scraping, the ONS will only visit publicly accessible parts of sites. The ONS will respect the robots.txt file and its associated restrictions, and will use it to determine which parts of a site may be accessed. To distinguish the ONS from normal users, the ONS user agent string will look like the following example:

Office for National Statistics (https://www.ons.gov.uk/aboutus/transparencyandgovernance/datastrategy/datapolicies/webscrapingpolicy, data.acquisition@ons.gov.uk)
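
As an illustration of how a scraper might honour robots.txt and identify itself with the user agent string above, the following Python sketch uses the standard library's `urllib.robotparser`. It is a minimal example under stated assumptions, not the ONS implementation: the helper names `build_parser` and `may_fetch`, the example robots.txt content, and the `www.example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User agent string as quoted in this policy.
ONS_USER_AGENT = (
    "Office for National Statistics "
    "(https://www.ons.gov.uk/aboutus/transparencyandgovernance/"
    "datastrategy/datapolicies/webscrapingpolicy, "
    "data.acquisition@ons.gov.uk)"
)

# Hypothetical robots.txt content, used here instead of a network fetch.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent=ONS_USER_AGENT):
    """Return True only if robots.txt allows this user agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# Header to send with every request so site owners can identify the scraper.
headers = {"User-Agent": ONS_USER_AGENT}

blocked = may_fetch(parser, "https://www.example.com/private/page")
allowed = may_fetch(parser, "https://www.example.com/prices")
delay = parser.crawl_delay(ONS_USER_AGENT)  # seconds, or None if not set
```

Here the scraper would skip the `/private/` page, fetch the `/prices` page, and, if the site declares a `Crawl-delay`, wait at least that long between requests.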

The ONS is fully committed to compliance with the Data Protection Act 2018, ensuring that all processing of data is fair, lawful and transparent. The ONS recognises that there may be ethical and legal issues related to scraping and using data that could identify individuals. The ONS respects Section 39 of the Statistics and Registration Service Act 2007 on confidentiality of personal information, and the ONS will not disclose any data that are covered by this protection.

All ONS staff who wish to web scrape must complete the Ethics Self-Assessment form, which will be shared with the Data Ethics team. The Data Ethics team will refer projects to the National Statistician's Data Ethics Advisory Committee (NSDEC) in instances where ethical risks are identified as high.

5. Roles and responsibilities

Office for National Statistics (ONS) staff who request web scraping

Responsible for:

  • Complying with the ONS Web Scraping Policy, and consulting with the Data Acquisition team before commencing any web scraping activities.

Accountable to:

  • Line managers.

Data Acquisition, Data Acquisition and Operations (DAO)

Responsible for:

  • Advising ONS staff on any alternative and/or existing data sources.

  • Ensuring that web scrapers are fully compliant with the ONS Web Scraping Policy.

  • Receiving and evaluating the web scraping request from ONS staff.

  • Seeking advice from ONS Legal Services, UK Statistics Authority Data Ethics team, and/or the Data Governance Committee (DGC) when needed.

  • Responding to website owners' opt-out requests and any enquiries.

  • Keeping records of, and monitoring, all ONS web scraping activities.

Accountable to:

  • Data Governance Committee.

Email: Data.Acquisition@ons.gov.uk

ONS Legal Services

Responsible for:

  • Providing advice and guidance on current and evolving legal context of open data.

Email: legalservices@ons.gov.uk

UK Statistics Authority Data Ethics

Responsible for:

  • Providing advice on ethical issues as the first point of contact, and advising on the National Statistician's Data Ethics Advisory Committee (NSDEC) process if additional advice from NSDEC is required.

Accountable to:

  • Head of Data Governance Policy and Legislation.

Email: Data.Ethics@statistics.gov.uk

Data Governance Committee (DGC)

Responsible for:

  • Ensuring the consistent application of this policy to all ONS staff, and advising on and assessing the organisational risk of conducting web scraping.

Accountable to:

  • National Statistics Executive Group.

Email: DGSDC@ons.gov.uk

National Statistician's Data Ethics Advisory Committee (NSDEC)

Responsible for:

  • Providing independent advice on ethical issues if required.

Accountable to:

  • National Statistician.

See more information about the National Statistician's Data Ethics Advisory Committee.
