1. A to E

Activity data

“Activity” can be defined as an individual interacting with an administrative system, for example, for National Insurance or tax purposes, when claiming a benefit, attending hospital or updating information on government systems in some other way. Only demographic information (such as name, date of birth and address) and dates of interaction are needed from such data sources to improve the coverage of our population estimates.

“Activity” data can also be referred to as “interactions”, “signs of life” and “signals” data.

AddressBase

A geographic dataset of addresses, properties and land areas where services are provided. It includes properties that have been demolished and those yet to be built.

Administrative data

This is data that people have already provided to government, for example, in the course of accessing public services. Some of this data could be re-used by the Office for National Statistics (ONS) to produce statistics about the population.

ONS has been using administrative data for many years. For example, annual births and deaths statistics, together with NHS patient registrations, are used to roll forward the population estimates between censuses.

Benefit unit

A single adult or a married or cohabiting couple and any dependent children within a benefit claim.

Child Benefit

A benefit paid for each child aged under 16 years, or aged under 20 and still in full-time non-advanced education (or on unwaged training). It is administered by HM Revenue and Customs (HMRC).

CIS

Customer Information System of the Department for Work and Pensions, an administrative data source which contains a list of people who have a National Insurance number.

Coherence

In our research outputs, coherence refers to the change in Statistical Population Dataset (SPD) population estimates, relative to official estimates.

Communal establishment

An establishment providing managed residential accommodation. “Managed” in this context means full-time or part-time supervision of the accommodation.

Confidence interval

A 95% confidence interval is a range constructed so that, if the sample survey were repeated many times, 95% of the intervals produced would contain the true population value. In other words, the true (unknown) value would be expected to lie within the interval 19 times out of 20. A more detailed explanation is provided in Paper O4.

It is a standard way of expressing the statistical accuracy of a survey-based estimate.
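As an illustration only, the sketch below computes an approximate 95% confidence interval for a sample mean. It assumes a simple random sample and a normal approximation (the 1.96 multiplier); the function name and the sample values are hypothetical, and it is not the method described in Paper O4.

```python
import math
import statistics


def confidence_interval_95(sample):
    """Return (lower, upper) for an approximate 95% interval around the sample mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    std_error = statistics.stdev(sample) / math.sqrt(n)
    # 1.96 is the two-sided 95% critical value of the standard normal distribution.
    margin = 1.96 * std_error
    return mean - margin, mean + margin


# Hypothetical household sizes from a small survey sample.
household_sizes = [2, 3, 1, 4, 2, 2, 3, 1, 2, 5]
lower, upper = confidence_interval_95(household_sizes)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```

A wider interval indicates a less precise estimate; larger samples generally give narrower intervals.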

Deterministic matching

Uses match-keys to link records across administrative data sources. If the match-keys are the same on both sources, the records are matched.

DWP

Department for Work and Pensions.

Earnings

Received by employees in return for employment.

Equivalised household income

Adjusted household income which takes into account the number of people living in the household and their ages. This allows for the fact that households with many members are likely to need a higher income to achieve the same standard of living as households with fewer members.
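The adjustment can be sketched as dividing household income by a factor that grows with household size. The glossary does not say which equivalence scale is used, so the weights below are an assumption: they follow the modified OECD scale rescaled so that a couple without children has a factor of 1.0. The function name and the incomes are hypothetical.

```python
# Assumed weights: modified OECD scale, rescaled so a couple without children = 1.0.
FIRST_ADULT = 0.67
ADDITIONAL_PERSON_14_PLUS = 0.33
CHILD_UNDER_14 = 0.20


def equivalised_income(household_income, ages):
    """Divide household income by the sum of the assumed weights for its members."""
    older = [age for age in ages if age >= 14]
    younger = [age for age in ages if age < 14]
    if not older:
        raise ValueError("expected at least one household member aged 14 or over")
    factor = (
        FIRST_ADULT
        + ADDITIONAL_PERSON_14_PLUS * (len(older) - 1)
        + CHILD_UNDER_14 * len(younger)
    )
    return household_income / factor


# A couple with two young children needs more income than a single adult
# to reach the same equivalised income.
print(round(equivalised_income(700, [40, 38, 9, 5]), 2))
print(round(equivalised_income(335, [30]), 2))
```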

2. F to L

Family Resources Survey (FRS)

A continuous household survey which collects information on the income and circumstances of a representative sample of private households in the UK.

Flows, Flow-based approach

This refers to population change between 2 time points, due to births, deaths and movements of people through internal and international migration.
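A minimal sketch of the arithmetic behind a flow-based approach is shown below. The function name and all figures are hypothetical; the point is only that the end population is the start population plus births, minus deaths, plus net migration.

```python
def roll_forward(start_population, births, deaths, inflow, outflow):
    """Apply the components of change between 2 time points."""
    natural_change = births - deaths
    net_migration = inflow - outflow  # internal and international moves combined
    return start_population + natural_change + net_migration


# Hypothetical figures for one area over one year.
end_population = roll_forward(
    start_population=100_000, births=1_200, deaths=900, inflow=5_000, outflow=4_600
)
print(end_population)
```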

FP69 flag

This is an indicator that there has been no contact between a GP practice and an individual in the last 15 months. An attempt to contact at the address has been made by the GP practice but no communication has been received. It cannot be assumed that the individual still resides at the address (these records are excluded from Statistical Population Dataset (SPD) V1.0 and SPD V2.0).

Government Statistical Service (GSS)

Community of civil servants who collect, analyse and publish official statistics to help government, business and the public understand the current state of the UK economy and society.

Gross income

Income before deductions such as National Insurance contributions and tax.

Half-weighting

The method used in Statistical Population Dataset (SPD) V1.0 to assign person records to a local authority (LA) or Lower Layer Super Output Area (LSOA). This happens when a match is made between Patient Register (PR) and Customer Information System (CIS) records (not Higher Education Statistics Agency (HESA)) but the addresses do not agree. The record is allocated a weight of 0.5 (a “half-weight”) in each of the 2 areas given by the PR and CIS addresses. In SPD V2.0, redistribution is used to overcome disagreement of addresses.
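The half-weighting rule can be sketched as follows. The record layout, field names and area codes are hypothetical; the sketch only shows that a matched record counts as 1 in an area when the PR and CIS addresses agree, and as 0.5 in each area when they do not.

```python
from collections import defaultdict


def area_counts(matched_records):
    """Sum person weights by area, half-weighting records whose areas disagree."""
    counts = defaultdict(float)
    for record in matched_records:
        if record["pr_area"] == record["cis_area"]:
            counts[record["pr_area"]] += 1.0
        else:
            # Addresses disagree: split the person equally between the 2 areas.
            counts[record["pr_area"]] += 0.5
            counts[record["cis_area"]] += 0.5
    return dict(counts)


# Hypothetical matched PR-to-CIS records.
matched = [
    {"person": "p1", "pr_area": "LA_A", "cis_area": "LA_A"},
    {"person": "p2", "pr_area": "LA_A", "cis_area": "LA_B"},
    {"person": "p3", "pr_area": "LA_B", "cis_area": "LA_B"},
]
print(area_counts(matched))
```

The weights for each person always sum to 1, so the total population count is unchanged by the split.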

HESA

Higher Education Statistics Agency, whose data contains a list of students who are registered on a higher education course in England and Wales.

HMRC

Her Majesty’s Revenue and Customs

Individual Savings Account (ISA)

A tax-free government savings scheme, usually arranged via a bank or building society.

Internal migration

This refers to residential moves between different geographic areas within the UK. This may be between local authorities, regions or countries of the UK. It excludes moves within a single local authority, as well as international moves into or out of the UK.

International migration

We use the UN recommended definition of a long-term international migrant: “A person who moves to a country other than that of his or her usual residence for a period of at least a year (12 months), so that the country of destination effectively becomes his or her new country of usual residence.”

Labour Force Survey (LFS)

A survey of the employment circumstances of the UK population. It is the largest household survey in the UK and provides the official measures of employment and unemployment.

Links

Are created by identifying similar or identical records across datasets.

Local authority

Geographical area with a total population of between 2,200 and 1,074,000 people.

Lower Layer Super Output Area (LSOA)

Geographical area with a total population of between 1,000 and 3,000 people, with an average of 1,600 people.

For an introduction to the different types of geography see our Beginner's Guide to UK Geography.

3. M to R

Match-keys

Are used in the deterministic matching of administrative datasets. They are created by combining identifying variables (or parts of them) such as name, sex, date of birth and postcode in all of the datasets to be matched.
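The sketch below illustrates how a match-key might be built and used for deterministic matching. The particular combination of variables (surname initial, sex, date of birth, postcode), the function names and the records are all hypothetical; the match-keys actually used are not listed in this glossary.

```python
def make_match_key(record):
    """Build a hypothetical match-key from parts of the identifying variables."""
    surname_initial = record["surname"].strip().upper()[:1]
    postcode = record["postcode"].replace(" ", "").upper()
    return f"{surname_initial}|{record['sex'].upper()}|{record['dob']}|{postcode}"


def deterministic_match(source_a, source_b):
    """Return (id_a, id_b) pairs whose match-keys are identical on both sources."""
    index_b = {}
    for record in source_b:
        index_b.setdefault(make_match_key(record), []).append(record["id"])
    pairs = []
    for record in source_a:
        for id_b in index_b.get(make_match_key(record), []):
            pairs.append((record["id"], id_b))
    return pairs


# Hypothetical records from 2 administrative sources.
patient_register = [
    {"id": "PR1", "surname": "Smith", "sex": "F", "dob": "1980-04-12", "postcode": "AB1 2CD"},
    {"id": "PR2", "surname": "Jones", "sex": "M", "dob": "1975-09-30", "postcode": "EF3 4GH"},
]
customer_information_system = [
    {"id": "CIS9", "surname": "SMITH", "sex": "f", "dob": "1980-04-12", "postcode": "ab12cd"},
    {"id": "CIS7", "surname": "Jones", "sex": "M", "dob": "1975-09-30", "postcode": "ZZ9 9ZZ"},
]
print(deterministic_match(patient_register, customer_information_system))
```

In this example only the first pair is matched: the second pair disagrees on postcode, so its match-keys differ and no link is made.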

Middle Layer Super Output Area (MSOA)

Geographical area with a total population of between 5,000 and 15,000 people.

MYEs

Mid-year population estimates

National Benefits Database (NBD)

Database containing information on a number of benefits administered by the Department for Work and Pensions (DWP) – Jobseeker’s Allowance, Income Support, Incapacity Benefit, Severe Disability Allowance, Employment and Support Allowance, Carer’s Allowance, Widow’s benefits or bereavement benefits, Disability Living Allowance, Pension Credit, State Pension and Attendance Allowance.

Net income

Income after deductions such as National Insurance contributions and tax.

Net migration

Net migration is the difference between people moving into an area and people moving out of the same area. If net migration is positive then it means that more people have moved to live in the area than have left to live elsewhere.

NHS

National Health Service.

Occupied address

Addresses with at least 1 “usual resident” included in our Statistical Population Dataset (SPD).

Official estimates

In our research outputs, official estimates refer to population statistics from the 2011 Census and annual mid-year population estimates (MYEs).

Output Area (OA)

This is the lowest geographical level at which census estimates are provided. It has a total population of between 100 and 625 people, with an average of 300 people.

P1 quality standard

Maximum quality standard set out in the evaluation criteria of the Beyond 2011 programme. This is equivalent to the maximum quality achieved in the current system (that is, what is achieved in a census year), every year (see Paper O2 for more information).

P3 quality standard

Average quality standard set out in the evaluation criteria of the Beyond 2011 programme. This is equivalent to the average quality achieved in the current system, every year (see Paper O2 for more information).

Pay As You Earn (PAYE)

PAYE is the system used by HM Revenue and Customs (HMRC) to collect and account for Income Tax on earnings from employment and pensions.

PDS

Personal Demographics Service from NHS Digital. A national electronic database of NHS patients, which contains only demographic information with no medical details. The PDS differs from the Patient Register (PR) since it is updated more frequently and by a wider range of NHS services. The PDS data available to ONS consists of a subset of the records, including those which show a change of postcode recorded throughout the year or a new NHS registration.

Pensioner unit

Benefit units who are single pensioners (individuals over State Pension age) or pensioner couples (married or cohabiting pensioners where 1 or more are over State Pension age).

Person Identification Number (PIN)

A unique identifier assigned to each person on a dataset. This must be common to every dataset they appear on. Such an identifier is not available in administrative data in the UK – instead, each data source tends to have its own unique identifier, such as the NHS number on the Patient Register (PR) and the National Insurance number on the Customer Information System (CIS).

Population Coverage Survey (PCS)

The aim of the PCS is to measure and adjust for coverage errors on the Statistical Population Dataset (SPD) and produce unbiased population estimates. The current assumption for the PCS design is that it would cover approximately 1% of the population in England and Wales, or around 350,000 households. The development of the proposed design is described fully in Paper M8.

PR

Patient Register from NHS Digital, which contains a list of all patients who are registered with a GP in England and Wales.

Probabilistic matching

This is an approach that identifies links between records in 2 datasets by comparing and quantifying the relative similarity of records (for example, giving a similarity score).

The main difference from deterministic matching is that probabilistic matching does not require record values to be identical in some respect (for example, identical names).

In the context of research outputs, this is used in Statistical Population Dataset (SPD) V2.0 to find additional Patient Register (PR) to Customer Information System (CIS) matches that are not found by deterministic matching.
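As a simplified illustration of scoring similarity rather than requiring exact agreement, the sketch below compares 2 records field by field and combines the results into one score. The field weights, function names and records are hypothetical, and string similarity from Python's `difflib` stands in for the statistical weights a full probabilistic method would estimate; it is not the method used in SPD V2.0.

```python
from difflib import SequenceMatcher

# Hypothetical weights for each compared field; they sum to 1.
FIELD_WEIGHTS = {"name": 0.5, "dob": 0.3, "postcode": 0.2}


def similarity(value_a, value_b):
    """Return a 0-to-1 similarity ratio between 2 strings, ignoring case."""
    return SequenceMatcher(None, value_a.lower(), value_b.lower()).ratio()


def match_score(record_a, record_b):
    """Weighted average of field similarities; 1.0 means identical on all fields."""
    return sum(
        weight * similarity(record_a[field], record_b[field])
        for field, weight in FIELD_WEIGHTS.items()
    )


# Hypothetical records: the first 2 differ only in the spelling of the forename.
pr_record = {"name": "Katherine Smith", "dob": "1980-04-12", "postcode": "AB1 2CD"}
cis_record = {"name": "Catherine Smith", "dob": "1980-04-12", "postcode": "AB1 2CD"}
other_record = {"name": "John Brown", "dob": "1962-11-03", "postcode": "ZZ9 9ZZ"}

print(match_score(pr_record, cis_record))
print(match_score(pr_record, other_record))
```

A pair would then be accepted as a link if its score exceeded a chosen threshold. A deterministic match-key that included the full name would miss the first pair because of the spelling difference.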

Pseudonymisation

Pseudonymisation is a procedure by which identifying fields (that is, names, dates of birth and addresses) within a data record are replaced by one or more artificial identifiers to protect the privacy of individuals.
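One way to picture this is a keyed hash that replaces the identifying fields with a single artificial identifier. The sketch below is an assumption for illustration (the choice of HMAC-SHA256, the field names and the key are hypothetical) and does not describe the procedure actually applied to the data.

```python
import hashlib
import hmac


def pseudonymise(record, secret_key, identifying_fields=("name", "dob", "address")):
    """Replace identifying fields with one keyed-hash identifier."""
    joined = "|".join(str(record[field]).strip().lower() for field in identifying_fields)
    pseudo_id = hmac.new(secret_key, joined.encode("utf-8"), hashlib.sha256).hexdigest()
    # Keep only the non-identifying fields, plus the artificial identifier.
    output = {k: v for k, v in record.items() if k not in identifying_fields}
    output["pseudo_id"] = pseudo_id
    return output


# Hypothetical record and key.
person = {"name": "Jane Doe", "dob": "1980-04-12", "address": "1 Example Street", "area": "LA_A"}
key = b"hypothetical-secret-key"
print(pseudonymise(person, key))
```

Because the same input and key always give the same identifier, records for the same person can still be linked across datasets without exposing names, dates of birth or addresses.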

Quinary age groups

5-year age groups.
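For example, an age in years can be assigned to its quinary group as sketched below. The open-ended top group starting at age 90 is an assumption for illustration, as is the function name.

```python
def quinary_age_group(age, top_band_start=90):
    """Return the 5-year age group label for an age in whole years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top_band_start:
        return f"{top_band_start}+"  # assumed open-ended top group
    lower = (age // 5) * 5
    return f"{lower} to {lower + 4}"


for age in (0, 17, 64, 93):
    print(age, quinary_age_group(age))
```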

Redistribution

The method used in SPD V2.0 to improve on the half-weighting used in SPD V1.0. This uses supplementary “activity” data and modelling to determine the most likely address from those available. Then the record can be assigned wholly to the most likely address, removing the need for “half-weights”.

Residual record

A record where it hasn’t been possible to link all of the address records from the Patient Register (PR), Customer Information System (CIS) and English School Census to AddressBase, or where it hasn’t been possible to lift a unique property reference number (UPRN) for all Higher Education Statistics Agency and Welsh School Census records that match on postcode to either the PR or CIS.

Resolution

The approach applied in Statistical Population Dataset (SPD) V2.0 to overcome combinations of matches that are inconsistent with each other, such as a Higher Education Statistics Agency (HESA) record that has links to 2 different Patient Register to Customer Information System pairs. The relative reliability of each match-key is used to decide which links are the most likely true matches and links conflicting with these are rejected.
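A minimal sketch of this idea follows, assuming each candidate link carries a reliability rank for the match-key that produced it (1 being the most reliable). The ranks, identifiers and function name are hypothetical; the sketch keeps the most reliable link for each HESA record and rejects the links that conflict with it.

```python
def resolve_links(candidate_links):
    """Keep, for each HESA record, the link made with the most reliable match-key."""
    best = {}
    for link in candidate_links:
        current = best.get(link["hesa_id"])
        # A lower rank means a more reliable match-key (assumed convention).
        if current is None or link["key_rank"] < current["key_rank"]:
            best[link["hesa_id"]] = link
    return list(best.values())


# Hypothetical links: HESA record H1 is linked to 2 different PR-to-CIS pairs.
links = [
    {"hesa_id": "H1", "pair_id": "PR1-CIS9", "key_rank": 3},
    {"hesa_id": "H1", "pair_id": "PR5-CIS2", "key_rank": 1},
    {"hesa_id": "H2", "pair_id": "PR7-CIS4", "key_rank": 2},
]
for kept in resolve_links(links):
    print(kept["hesa_id"], "->", kept["pair_id"])
```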

4. S to Z

School census (SC)

Refers to the English and Welsh school censuses from the Department for Education and the Welsh Government. It includes information on all pupils attending state schools in England and Wales. (State schools are those maintained by the Department for Education or local authorities.)

Single Housing Benefits Extract (SHBE)

Dataset containing information on Housing Benefit claims provided to DWP by local authorities.

Small area model-based income estimates (SAIE)

The official estimates of weekly household income at middle layer super output area level for England and Wales.

Statistical Population Dataset (SPD)

A single, coherent dataset that forms the basis for estimating the size of the resident population. It is produced by linking records across multiple administrative data sources and applying a set of inclusion and distribution rules.

Statistical Spine

The statistical spine is a dataset comprising all the unique records from the Patient Register, Customer Information System, Higher Education Statistics Agency and school census datasets. It contains information about the links that have been made between records across these administrative sources.

Stock counts or Stock-based approach

Refers to a snapshot of the population at a point in time.

Tax Credits

Tax Credits are paid by HM Revenue and Customs. There are 2 types: Child Tax Credit provides support to help with the cost of raising a child for those with low incomes; and Working Tax Credit is a payment to boost the income of working people who are on a low income.

Unique property reference number (UPRN)

A unique alphanumeric identifier for every spatial address in Great Britain.

Parent unique property reference number (UPRN)

Hierarchy of address information in AddressBase, for example, a block of flats but no individual entry for each flat contained within the block.

Child unique property reference number (UPRN)

Hierarchy of address information in AddressBase, for example, separate flats within a block each containing a unique identifier.

Usually resident population

We are currently adopting the UN definition of “usually resident” – that is, the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).

Working age population

People aged 16 to 64.
