The amount and variety of available data is growing at an ever quicker pace. A wider range of data is now available in many formats and from many sources, including audio, video, computer logs, purchase transactions, sensors and social networking sites. This has created Big Data: large, often unstructured datasets that are available, potentially in real time. At the same time, new data science techniques for maximising the value of both Big Data and other data sources are constantly being developed.
Big Data is a big topic. As the UK's largest producer of official statistics, we want to understand the effect it may have on our statistical processes and outputs. Our Big Data Team are investigating the advantages and challenges of using alternative sources of data and data science techniques in official statistics. This includes projects such as exploring web-scraped price data, machine learning for matching addresses and natural language processing for coding textual survey responses.
We regularly publish the outcomes of our work, and occasionally blog about it too. You can find some of our reports in the download section of this page. For a complete list, visit our GitHub.io page, which also contains some of our code.
We are committed to protecting the confidentiality of all information we hold. To produce statistics using alternative data sources, we are only interested in trends or patterns that can be seen, not personal data about individuals. However, we recognise that accessing data from the private sector or from the internet may raise concerns around security and privacy. We ensure that all of our work fully complies with legal requirements and our obligations under the Code of Practice for Official Statistics. We also work closely with the National Statistician’s Data Ethics Advisory Committee to consider the ethical issues associated with using these types of data sources within official statistics. For instance, we have developed guidance for web-scraping in official statistics.
For more information about the Big Data Team and its projects, please contact us via email at email@example.com.
Want to work on data science within government? Go to the Government Statistical Service Data Scientist recruitment page.
You might also be interested in:
- ONS methodology working paper series no. 8 - Statistical uses for mobile phone data: literature review
- ONS methodology working paper series no. 11 - Identifying caravan homes in Zoopla data: June 2017
- ONS methodology working paper series no. 13 - Comparing the density of mobile phone cell towers with population estimates
- Progress report - January to March 2015 (256.0 kB pdf)
- Progress report - October to December 2014 (429.3 kB pdf)
- Progress report - July to September 2014 (370.6 kB pdf)
- Progress report - April to June 2014 (464.1 kB pdf)
- Progress report - January to March 2014 (53.9 kB pdf)
- Analysing low electricity consumption using DECC data (506.6 kB pdf)
- Web scraped data: Extreme price changes (107.0 kB pdf)
- University of Southampton report - Using energy metering data to support official statistics: A feasibility study (4.8 MB pdf)
- GSS methodology series No 40 - Modelling sample data from smart-type electricity meters to assess potential within official statistics (2.1 MB pdf)
- ONS methodology working paper series no. 5 - Comparing travel flows between 2011 Census and Oyster card data (1.5 MB pdf)
- ONS methodology working paper series no. 7 - Comparing counts of electricity meters and addresses by postcode in England and Wales (808.3 kB pdf)
- Research indices using web scraped data: May 2016 update (622.0 kB pdf)
- Using geolocated Twitter traces to infer residence and mobility (1.3 MB pdf)