1. Overview of the methods used to create the Health Index

  • This methodology report accompanies an article introducing the Health Index as an Experimental Statistic to measure and understand the health of the nation.
  • The Health Index has been designed with the support of health experts to present a single number measuring the health of an area, with a clear structure underneath that number showing how different measures of health are combined to produce this value.
  • The development of the Health Index has followed guidance by the Competence Centre on Composite Indicators and Scoreboards (COIN) on producing composite indices.
  • Data have been selected from a wide variety of sources to allow comparisons across time and by geography, down to upper-tier local authority level.
  • Data selection has been based on key principles, such as the aim to measure health and its determinants rather than health services.
  • Factor analysis has been used to group individual indicators of health into subdomains, guided by expert advice; factor analysis results have informed each indicator’s weight towards the total Index value.

2. Developing the Health Index

The Office for National Statistics (ONS) has released an article presenting work to date on a composite Health Index. The release is a provisional, or “beta”, version covering England at upper-tier local authority (UTLA) level for the years 2015 to 2018. It provides an illustrative presentation of what the Index could look like, what the results could show and how it could enable new analysis. Alongside this article, a public consultation was launched to gather feedback on the uses for the Health Index and the methods used to produce it.

The proposal for a Health Index was made in the 2018 annual report of the government’s then Chief Medical Officer (CMO), Dame Sally Davies, entitled Health 2040 – Better Health Within Reach. The report stated:

“We need to track progress in improving health and health outcomes, to and beyond 2040 with a new composite Health Index that reflects the multi-faceted determinants of the population’s health and equity in support of ensuring health is recognised and treated as one of our nation’s primary assets. This index should be considered by Government alongside GDP and the Measuring National Well-being programme. We regularly collect most of the datasets that have the individual measures that could be combined.”

Our aim is to develop the Health Index into a regular publication allowing differences in health to be tracked over time. The four UK health departments have been involved in its development, with a view to extending coverage beyond England in the future.

The work to develop the Health Index so far has been completed in consultation with an Expert Advisory Group (EAG) consisting of representatives from a range of government, academic and third sector organisations. This group includes:

  • Office for National Statistics (ONS)
  • Department of Health and Social Care (DHSC)
  • Public Health England (PHE)
  • Royal Society for Public Health (RSPH)
  • The Health Foundation
  • University College London
  • Association of Directors of Public Health
  • Cabinet Office
  • Department for Business, Energy and Industrial Strategy (BEIS)
  • Department for Environment, Food and Rural Affairs (Defra)
  • Department for Transport (DfT)
  • Institute for Fiscal Studies (IFS)
  • Institute for Social and Economic Research (ISER)
  • The King’s Fund
  • London Health Partnership
  • Ministry of Housing, Communities and Local Government (MHCLG)
  • The National Institute for Health and Care Excellence (NICE)
  • NHS England
  • Northern Ireland Health Department
  • The Organisation for Economic Co-operation and Development (OECD)
  • Scottish Government
  • Welsh Government

The short-term development of this Beta release was supported by a sub-group of this EAG:

  • Office for National Statistics (ONS)
  • Department of Health and Social Care (DHSC)
  • Public Health England (PHE)
  • Royal Society for Public Health (RSPH)
  • The Health Foundation
  • University College London

We extend our thanks to all members for their valuable input into the Health Index’s development, which will continue until mid-2021.

3. How the Health Index differs from existing products

The Chief Medical Officer’s report identified a need for a “single number” headline health indicator to act as a policy stimulus and public focus. There is no established example of a health index of the type we are currently developing, in England or elsewhere.

In terms of the existing health indicator “scene”, there are multiple frameworks in use in England, the UK and internationally, including:

These frameworks all have important uses, and most contain elements of all three domains defined for the Health Index, described in Section 6. What the Health Index offers, which these sources individually do not, is a single headline indicator of health that is transparent in its construction, can be compared over time, can be compared at different geographical levels, and can be broken down into the effects that drive changes.

4. Potential users of and uses for the Health Index

We expect there to be three broad groups of people using the Health Index:

  • the media and general public
  • policy-makers in government and local government
  • analysts outside of government

The media and general public can use the headline measures to present and follow change in the nation’s health, and inequality in health between different groups.

Policy-makers in government and local government can clearly identify which topics related to health are not improving over time, and measure health impacts when assessing policies. The Health Index enables measurement of impacts on health to become more regular and consistent. Local government decision-makers can compare health in their area with other places with similar characteristics, and learn about differences between them.

Analysts outside government, such as academics and those in think-tanks and charities, can use the Index to add to the body of evidence on different aspects of health and what that evidence shows.

5. Process for constructing the Health Index

Our process to construct the Health Index for England largely follows that outlined in the Organisation for Economic Co-operation and Development (OECD) and Joint Research Centre’s (JRC) Handbook on Constructing Composite Indicators and subsequently, in the Competence Centre on Composite Indicators and Scoreboards’ (COIN) 10-step guide.

The steps included in this guide are as follows:

  1. Theoretical framework
  2. Data selection
  3. Imputation of missing data
  4. Multivariate analysis
  5. Normalisation
  6. Weighting
  7. Aggregating indicators
  8. Sensitivity analysis
  9. Link to other measures
  10. Visualisation

This article focuses on Steps 1 to 8 and is structured accordingly. We have renamed Step 5 “Homogenising the data” to reflect that scale-based transformations are also involved at this stage. Links to other measures (Step 9) are discussed in Section 3, so are not detailed here.

6. Theoretical framework (COIN Step 1)

The concept of health that the Index covers is largely derived from the Chief Medical Officer’s (CMO) original recommendation, which suggested that the index should be:

“inclusive of health outcome measures, modifiable risk factors and the social determinants of health”.

This encompasses the World Health Organization’s definition of health – that health “is a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity” – and adds specificity to the idea of well-being.

The theoretical framework that the CMO alluded to is well-known in public health and epidemiology, and can be summarised as dividing the factors influencing health into three categories:

  • health status or outcomes: mortality or life expectancy, morbidity measures such as disease prevalence; wider well-being measures could also be considered in this category
  • modifiable risk factors (MRFs): things that affect health and can potentially be changed at individual level, such as health-related behaviours (for example, smoking and exercise) and actionable clinical findings (for example, blood pressure); it is important to understand that these factors sit in the middle of a bigger causal chain
  • wider or social determinants of health (WDHs): circumstances that have a major effect on life chances, including on both MRFs and health outcomes, but that cannot be addressed at individual level; examples include unemployment rates, availability of healthy food, quality of transport infrastructure and environmental pollution

The “rainbow” diagram of Dahlgren and Whitehead (Dahlgren G and Whitehead M (1991) “Policies and strategies to promote social equity in health”. Institute for Future Studies, Stockholm) is often used to illustrate the relationships between these different factors impacting health.

With such a broad concept of health in scope for the Health Index, the topics included typically cover general health issues that are applicable to the whole population.

Considering the definitions mentioned previously, elements of health are divided into three domains for the Health Index, each corresponding to one of these three categories:

  • healthy people – health outcomes, ensuring representation of the population as a whole
  • healthy lives – health-related behaviours and personal circumstances
  • healthy places – wider determinants of health, environmental factors

7. Data selection overview (COIN Step 2)

The first step in deciding what content to include was to conduct a review of existing indices and frameworks that had a relation to health. The aim was to understand what content they included, and what of that was relevant to the Health Index. This was conducted in the context of the broader definition of health explained in Section 6.

Following this we reviewed the wider literature to understand whether there was additional content the Health Index should include, as its aims, functions and purpose differ from these other products. In conjunction with both steps, a range of data sources that could potentially be used to measure these concepts were identified.

We reported these initial proposals for Health Index content to the complete Expert Advisory Group (EAG) to gain their feedback on the concepts included, how they were measured and whether there were additional concepts that should be added. Using this feedback, a detailed review of the content proposed for inclusion was carried out, including a critical review of how these concepts should be measured and what data were available to construct the Index. At all stages of this process the aim was to strike the right balance between concept and data, using the best available measure without unduly compromising on data quality.

Central to this process was ensuring that we measure health itself and its determinants, rather than healthcare activity, service performance or policy. This has been considered both in the choice of concepts and, especially, in the ways in which they are measured.

Some data sources have been ruled out because they are too directly linked to one of these aspects. For example, if the Index were to include the number of people receiving adult social care as a measure, the overall figure representing health would change if the national thresholds for social care eligibility changed, even if the nation’s health did not actually get better or worse. There are some concepts, however, for which there are too few comprehensive sources to fully distance an indicator from these aspects (for example, mental health). In these instances, we have considered carefully whether the benefits of their inclusion outweigh the limitations.

Data requirements for quality

The data that have been selected to develop the Index at this stage have come from already published sources as this means certain quality standards will already have been met. The data have also been checked to ensure they meet the needs for the Index, using the following criteria:

  • data must be available for enough years to make comparisons over time, which at this stage, means 2015 to 2018; there may be some exceptions to this where it is reasonable to assume that big changes would not occur from year to year
  • there must be reasonable certainty that the data will continue to be produced into the future, to ensure comparisons over time are based on consistent data as far as possible
  • data must be available for upper-tier local authority areas (UTLAs) or lower-tier local authority areas (LTLAs), which are the smallest geographical breakdowns available for most health data sources suitable for the Index’s needs; this allows Health Index values to be seen both for England as a whole and for specific geographical areas, so that comparisons can be made between areas of interest

Reviewing with expert support

When this review was complete, the conclusions reached were shared for further consideration with the EAG sub-group supporting the current release. Some further revisions were necessary at the point of data acquisition, where more detailed exploration revealed previously undiscovered issues with particular data sources, such as large amounts of missing data at upper-tier local authority (UTLA) level.

Management of data differences

Even when data are labelled as presenting the same years, the period covered by those labels differs between single points in time, calendar years, financial years, academic years and other periods. For the Health Index, we want to be able to report results for calendar years for consistency with other Office for National Statistics (ONS) health statistics, but not all data sources are published on this basis.

Where data differ from calendar years, we have assigned the data to the year in which most of the source period falls. For example, the financial year April 2016 to March 2017 was used for the 2016 calendar year. If there is a possibility of updating the Health Index more frequently than annually in future, this approach could change.
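The “most of the source period” rule can be sketched in Python as follows. This is a minimal illustration, assuming each source period is given as inclusive start and end dates; the function name `assign_calendar_year` and the choice to give ties to the earlier year are our assumptions rather than part of the published method.

```python
from datetime import date


def assign_calendar_year(start: date, end: date) -> int:
    """Return the calendar year containing most days of the inclusive period start..end."""
    if end < start:
        raise ValueError("period end must not precede its start")
    days_in_year: dict[int, int] = {}
    for year in range(start.year, end.year + 1):
        # Clip the source period to this calendar year and count its days
        first = max(start, date(year, 1, 1))
        last = min(end, date(year, 12, 31))
        days_in_year[year] = (last - first).days + 1
    # max() returns the first year with the largest count, so ties go to the earlier year
    return max(days_in_year, key=days_in_year.__getitem__)


# Financial year April 2016 to March 2017 is reported as 2016
print(assign_calendar_year(date(2016, 4, 1), date(2017, 3, 31)))  # 2016
# A hypothetical academic year September 2016 to August 2017 falls mostly in 2017
print(assign_calendar_year(date(2016, 9, 1), date(2017, 8, 31)))  # 2017
```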

For some data sources, some values are based on small numbers for individual years, which risk being disclosive. In these cases, three-year aggregates are used to present the data. Where this applies, the data have been assigned to the final year covered. For example, healthy life expectancy data for 2016 to 2018 are used to represent 2018. Typically a value calculated in this way would be counted as the value for the middle of the three years used to calculate it, but we are presenting in this way to ensure conclusions drawn from adding another year of data are understood to be based on data from that latest year.
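A minimal sketch of rolling three-year aggregation follows, assuming a pooled rate is formed by summing numerators and denominators across the three years before dividing (the pooling method is an assumption; the text above does not specify it). The hypothetical `label` parameter contrasts the final-year labelling used for the Health Index with the more usual middle-year convention.

```python
def pooled_rates(
    numerators: dict[int, float],
    denominators: dict[int, float],
    per: float = 1_000,
    label: str = "final",
) -> dict[int, float]:
    """Rate for each rolling three-year window, pooling numerators and denominators.

    label="final" keys each window by its last year (as described above);
    label="middle" keys it by its central year (the more usual convention).
    """
    if label not in ("final", "middle"):
        raise ValueError("label must be 'final' or 'middle'")
    rates: dict[int, float] = {}
    for end in sorted(numerators):
        window = (end - 2, end - 1, end)
        # Only report windows fully covered by both series
        if not all(y in numerators and y in denominators for y in window):
            continue
        rate = per * sum(numerators[y] for y in window) / sum(denominators[y] for y in window)
        rates[end if label == "final" else end - 1] = rate
    return rates


# Hypothetical annual event counts and denominators
events = {2014: 3, 2015: 4, 2016: 5, 2017: 2, 2018: 6}
denominators = {year: 1_000 for year in events}
print(pooled_rates(events, denominators)[2018])  # window 2016 to 2018, about 4.33 per 1,000
```

With `label="final"`, the window 2016 to 2018 is reported as 2018; with `label="middle"` the same value would be reported as 2017.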

Selecting and grouping data

The following sections detail the indicators that have been included for each of the domains of Healthy People, Healthy Lives and Healthy Places. After the sections on indicators are details of the limitations of the present data selection, where there are concepts we have not been able to include at present, and development plans for the inclusion of more data.

Where applicable, we use rates that have been standardised for age and/or sex in preference to unstandardised rates, to minimise the impact that changes in these demographics have on Health Index values.
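To illustrate why standardised rates are preferred, the sketch below implements direct age standardisation, which applies an area’s age-specific rates to a fixed standard population. The three broad age bands and the standard population weights are hypothetical; in practice a published standard such as the European Standard Population, with finer age bands, would be used.

```python
def crude_rate(events: list[float], population: list[float], per: float = 100_000) -> float:
    """Events per `per` people, ignoring age structure."""
    return per * sum(events) / sum(population)


def direct_standardised_rate(
    events: list[float],
    population: list[float],
    standard_population: list[float],
    per: float = 100_000,
) -> float:
    """Weight the area's age-specific rates by a fixed standard population."""
    if not len(events) == len(population) == len(standard_population):
        raise ValueError("each input needs one value per age band")
    weighted = sum(w * e / p for e, p, w in zip(events, population, standard_population))
    return per * weighted / sum(standard_population)


# Hypothetical standard population for three broad age bands (young, middle, old)
standard = [50_000, 35_000, 15_000]

# Two hypothetical areas with identical age-specific death rates but different age structures
younger = ([10, 70, 300], [100_000, 70_000, 20_000])
older = ([5, 70, 900], [50_000, 70_000, 60_000])

for name, (deaths, pop) in (("younger", younger), ("older", older)):
    print(name, crude_rate(deaths, pop), direct_standardised_rate(deaths, pop, standard))
```

Both areas have a standardised rate of 265 per 100,000, whereas their crude rates (200 and about 542 per 100,000) differ only because the second area is older.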

8. Data selection: healthy people (COIN Step 2)

This domain directly relates to the health outcome measures outlined in the Chief Medical Officer’s (CMO) recommendation. Much of the health literature defines “health outcomes” as the outcomes from healthcare procedures. However, we have worked on the basis that this domain (and the Index as a whole) should not include measures of healthcare activity, as these will likely reflect the performance and policy of healthcare rather than population health. As such, for this index, we consider “health outcomes” to comprise mortality, morbidity and mental health.

A focus on mortality and morbidity

For mortality, the general approach was to add indicators alongside life expectancy to broaden the measure of mortality to cover the whole of the lifespan. Therefore, indicators were included for infant mortality, avoidable deaths, suicide and healthy life expectancy. Healthy life expectancy and life expectancy were both considered, but as we wanted the Index as a whole to measure morbidity as well as mortality, healthy life expectancy was agreed to be the more appropriate measure.

To complement the aforementioned mortality indicators, the remainder of the Healthy People indicators largely focus on morbidity. In terms of the physical health conditions included, they have been selected based on their status as top contributors to mortality or morbidity according to the Health Profile for England.

Selecting data: measuring prevalence and incidence

These indicators have been measured using the prevalence of the relevant conditions, but incidence was also considered as a way to measure these conditions. Incidence relates to the number or rate of new diagnoses in a given period, and prevalence shows how many people in that period are living with the condition.
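The difference between the two measures can be sketched in Python, assuming each case record carries a diagnosis date and, where relevant, the date the person stopped living with the condition; the `Case` structure is hypothetical. Prevalence here is period prevalence, matching the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Case:
    diagnosed: date
    ended: Optional[date] = None  # date of death or cure; None while still living with it


def incidence(cases: list[Case], start: date, end: date) -> int:
    """New diagnoses made within the period start..end (inclusive)."""
    return sum(start <= c.diagnosed <= end for c in cases)


def period_prevalence(cases: list[Case], start: date, end: date) -> int:
    """People living with the condition at any point in the period."""
    return sum(
        c.diagnosed <= end and (c.ended is None or c.ended >= start) for c in cases
    )


cases = [
    Case(date(2010, 5, 1)),                           # long-standing, still living with it
    Case(date(2017, 3, 1)),                           # new diagnosis in 2017
    Case(date(2016, 6, 1), ended=date(2016, 12, 1)),  # ended before 2017
    Case(date(2019, 1, 1)),                           # diagnosed after 2017
]
start, end = date(2017, 1, 1), date(2017, 12, 31)
print(incidence(cases, start, end), period_prevalence(cases, start, end))  # 1 2
```

In this example, one diagnosis falls in 2017 (incidence), but two people live with the condition during 2017 (prevalence). Dividing either count by the relevant population gives a rate.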

Prevalence data are more widely available, and allow us to capture the impact on health of all those currently living with those conditions, thereby creating a measure of the stock of morbidity. This also creates more separation of these measures from mortality. This is particularly relevant for those conditions that are not a direct cause of death, such as musculoskeletal conditions, where morbidity is the health outcome rather than mortality.

The main drawback of using prevalence lies with conditions that are a direct cause of death. Including prevalence measures means the Index’s assessment of health will decline when prevalence increases. If treatments were to become increasingly effective at managing conditions and reducing mortality rates, without individuals entering remission or being cured, people would live with those conditions for longer; cancer is one current example. If this trend continued, it would increase the prevalence of these conditions, and therefore show declining health in this index, even though the increase in prevalence is not wholly negative.

Elsewhere in the Index we would see the positive effect of this in reduced mortality rates, and it is reasonable to expect a degree of increased morbidity if more people are living with a condition, regardless of the reason for this. Conditions and their treatment have side effects that will impact on this. However, the weight these indicators are given in the Index is important in determining to what degree decreased mortality and increased morbidity influence Index values.
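The stock-and-flow reasoning above can be made concrete with a small simulation of an incurable condition. All figures are hypothetical, and the model assumes constant incidence and a fixed annual probability of death among those living with the condition.

```python
def simulate_prevalence(
    years: int, prevalent: float, new_cases: float, death_prob: float
) -> list[tuple[float, float]]:
    """Yearly (people living with the condition, deaths) for a condition with no cure."""
    history = []
    for _ in range(years):
        deaths = prevalent * death_prob
        # No remission or cure, so the only exit from the stock is death
        prevalent = prevalent - deaths + new_cases
        history.append((prevalent, deaths))
    return history


# Hypothetical condition: 1,000 new diagnoses a year, 10,000 people currently living with it
before = simulate_prevalence(5, 10_000, 1_000, death_prob=0.10)  # stable stock
after = simulate_prevalence(5, 10_000, 1_000, death_prob=0.05)   # more effective treatment
print(before[0], after[0], after[1])
```

When the death probability halves, annual deaths fall from 1,000 to 500 and prevalence rises every year towards a new steady state of 20,000 (new cases divided by the death probability), even though no additional people are becoming ill.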

The morbidity indicators are therefore a range of physical health conditions detailed in this section, disability, and difficulty completing activities of daily living (ADLs). The latter has been included in an attempt to capture the impact of difficulties that do not necessarily relate to a specific condition but which still impact on quality of life. Including this may also help to indicate the severity of health conditions, their symptoms and treatment side effects, all of which can affect individuals’ ability to carry out day-to-day activities.

We have included as many indicators as possible relating to mental health and well-being, but the data availability is much more restrictive than for physical health conditions. We have attempted to capture the diagnosis of conditions as well as self-perceived well-being.

Indicators, data and sources

The indicators included for Healthy People, and the data and data sources to measure them, are as follows.

Infant mortality

This indicator consists of the Office for National Statistics (ONS) infant mortality rate, which is the number of infant deaths under 1 year of age per 1,000 live births. These data are produced using data from death and birth registrations but for the current purposes have been obtained from Public Health England’s (PHE’s) Fingertips tool, as this provides the three-year aggregates required to present otherwise small numbers for individual years.

Avoidable deaths

This indicator consists of ONS avoidable deaths, defined as the age-standardised mortality rate (deaths per 100,000 of the population) in those aged from 0 to 74 years old for all causes considered avoidable. These are produced using data from death registrations and population estimates.

Suicides

This indicator consists of the ONS suicide rate, defined as the age-standardised mortality rate (deaths per 100,000 of the population aged 10 years and over) from suicide and injury of undetermined intent. These data are produced using data from death registrations and population estimates.

Healthy life expectancy

This indicator consists of ONS healthy life expectancy at birth for males and females, which is a measure of the average number of years a person would expect to live in good health based on contemporary mortality rates and prevalence of self-reported good health. The prevalence of good health is derived from responses to a question on general health in the Annual Population Survey. For a particular area and time period, it is an estimate of the average number of years a new-born baby would live in good general health if he or she experienced the age-specific mortality rates and prevalence of good health for that area and time period throughout his or her life. The male and female values were combined using a population weighted average for each area. The ONS calculates values and produces the underlying data, which is from various sources: the Annual Population Survey, death registrations, population estimates and Census 2011.
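The population-weighted combination of male and female values can be sketched as follows. The figures are hypothetical, and which population estimates are used as weights (for example, the total resident population of each sex) is an assumption not specified above.

```python
def population_weighted_average(values: dict[str, float], populations: dict[str, float]) -> float:
    """Combine group-specific values, weighting each by its population."""
    total = sum(populations[group] for group in values)
    if total <= 0:
        raise ValueError("populations must sum to a positive number")
    return sum(values[group] * populations[group] for group in values) / total


# Hypothetical healthy life expectancy (years) and population by sex for one area
hle = population_weighted_average(
    {"male": 62.0, "female": 64.0},
    {"male": 48_000, "female": 52_000},
)
print(round(hle, 2))  # 63.04
```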

Dementia

This indicator consists of the Quality and Outcomes Framework (QOF) prevalence of dementia measure, defined as the percentage of General Practitioner (GP) patients on a practice register for dementia. These data are produced by NHS Digital.

Musculoskeletal conditions

This indicator consists of the QOF prevalence of rheumatoid arthritis measure, defined as the percentage of GP patients (aged 16 years old or over) on a practice register for rheumatoid arthritis, and the QOF prevalence of osteoporosis measure, defined as the percentage of GP patients (aged 50 years old or over) on a practice register for osteoporosis. These have been combined using an average of the values for each. These data are both produced by NHS Digital.
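Several indicators in this domain combine component measures with a simple average. A minimal sketch using Python’s `statistics.fmean`, with hypothetical prevalence values and a hypothetical function name:

```python
from statistics import fmean


def combine_by_average(measures: dict[str, float]) -> float:
    """Unweighted mean of the component prevalence measures for one area and year."""
    if not measures:
        raise ValueError("at least one component measure is required")
    return fmean(measures.values())


# Hypothetical QOF prevalences (%) for one UTLA in one year
musculoskeletal = combine_by_average({"rheumatoid_arthritis": 0.75, "osteoporosis": 1.25})
print(musculoskeletal)  # 1.0
```

Because the average is unweighted, each component contributes equally, regardless of how common the condition is or the age range of its register (16 years and over for rheumatoid arthritis, 50 years and over for osteoporosis).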

Respiratory conditions

This indicator consists of the QOF prevalence of asthma and QOF prevalence of chronic obstructive pulmonary disease (COPD) measures, defined as the percentage of GP patients on a practice register for asthma or COPD respectively. These have been combined using an average of the values for each. These data are produced by NHS Digital.

Cardiovascular conditions

This indicator consists of the following measures:

These are all defined as the percentage of GP patients on a practice register for the relevant condition. These have been combined using an average of the values for each. These data are all produced by NHS Digital.

Cancer

This indicator consists of the QOF prevalence of cancer measure, defined as the percentage of General Practitioner (GP) patients on a practice register for cancer. These data are produced by NHS Digital.

Diabetes

This indicator consists of the QOF prevalence of diabetes, defined as the percentage of GP patients (aged 17 years old and over) on a practice register for diabetes. These data are produced by NHS Digital.

Kidney disease

This indicator consists of the QOF prevalence of chronic kidney disease measure, defined as the percentage of GP patients (aged 18 years old and over) on a practice register for chronic kidney disease. These data are produced by NHS Digital.

Disability that impacts daily activities

This indicator consists of ONS statistics on the percentage of working-age adults (aged 16 to 64 years) who are disabled under the Equality Act or are work-limiting disabled. These data are produced using the Annual Population Survey and are sourced from Nomis.

Difficulty completing activities of daily living (ADLs)

This indicator consists of the GP Patient Survey proportion of adults with a long-term condition that reduces their ability to carry out day-to-day activities.

Frailty

This indicator is measured using Hospital Episode Statistics (HES) data on hip fractures, defined as emergency hospital admissions for fractured neck of femur in persons aged 65 years and over, as a directly age-standardised rate per 100,000. These data are produced by PHE using data on hospital admissions in HES from NHS Digital, and unrounded mid-year population estimates from the ONS.

Depression

This indicator consists of the QOF prevalence of depression measure, defined as the percentage of GP patients (aged 18 years old and over) on a practice register for depression. These data are produced by NHS Digital.

Life satisfaction

This indicator consists of the ONS average life satisfaction score, defined as the mean score (out of 10) of respondents (aged 16 years old and over) answering the question “Overall, how satisfied are you with your life nowadays?”. These data are from the ONS Annual Population Survey (APS) Integrated Household Survey.

Life worthwhileness

This indicator consists of the ONS average life worthwhileness score, defined as the mean score (out of 10) of respondents (aged 16 years old and over) answering the question “Overall, to what extent do you feel the things you do in your life are worthwhile?”. These data are from the APS Integrated Household Survey.

Happiness

This indicator consists of the ONS average happiness score, defined as the mean score (out of 10) of respondents (aged 16 years old and over) answering the question “Overall, how happy did you feel yesterday?”. These data are from the APS Integrated Household Survey.

Anxiety

This indicator consists of the ONS average anxiety score, defined as the mean score (out of 10) of respondents (aged 16 years old and over) answering the question “Overall, how anxious did you feel yesterday?”. These data are from the APS Integrated Household Survey.

Self-harm

This indicator consists of HES hospital admissions as a result of self-harm, defined as emergency hospital admissions for intentional self-harm using a directly age-standardised rate (per 100,000 of the population). These data are sourced from PHE Fingertips and are produced by PHE using data on hospital admissions in HES from NHS Digital, and unrounded mid-year population estimates from the ONS. Analysis uses single year of age grouped into quinary (five-year) age bands, by sex.
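Grouping single years of age into quinary bands by sex can be sketched as below. The function names are hypothetical, and the open-ended top band of 90 years and over is an assumption for illustration; the source does not state the top band.

```python
from collections import Counter


def quinary_band(age: int, top: int = 90) -> str:
    """Five-year age band for a single year of age, with an open-ended top band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def counts_by_sex_and_band(records: list[tuple[str, int]]) -> Counter:
    """Tally (sex, single year of age) records into (sex, quinary band) cells."""
    return Counter((sex, quinary_band(age)) for sex, age in records)


print(counts_by_sex_and_band([("F", 17), ("F", 19), ("M", 17), ("M", 93)]))
```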

Children’s social, emotional and mental health

This indicator consists of the Department for Education (DfE) proportion of school pupils with social, emotional and mental health needs, defined as the number of school children (primary and secondary) who are identified as having social, emotional and mental health needs expressed as a percentage of all school pupils. These data are from the DfE special educational needs statistics.

9. Data selection: healthy lives (COIN Step 2)

This domain covers both physiological and behavioural modifiable risk factors. It also relates to social and economic factors – from the wider determinants of health – that affect the population at the individual level.

The most prevalent modifiable risk factors are defined in the Health Profile for England, using data from the Global Burden of Disease; this is supported by the World Health Organization’s list of risk factors for non-communicable disease, as well as the Marmot Review. Many of the modifiable risk factor indicators can be found in PHE’s Public Health Outcomes Framework and the Global Burden of Disease. In addition to those more traditionally recognised metrics, literature suggests emerging risk factors including sleeping patterns and sedentary time, though it has been more difficult to acquire data that measure these.

A subset of the wider determinants of health are included here: the socioeconomic factors that impact at the individual level. Derived in particular from The Health Foundation’s Exploring the social determinants of health series, these refer to the individual’s education, employment, income, and social or support networks. This is supported by the Marmot Review and the Health Profile for England.

Indicators, data and sources

The indicators included for Healthy Lives, and the data and data sources used to measure them are as follows.

Overweight and obesity in children

This indicator consists of NHS Digital’s prevalence of overweight and obesity in reception pupils and the prevalence of overweight and obesity in Year 6 pupils. These are defined as the proportion of children (aged 4 to 5 and 10 to 11 years old, respectively) classified as overweight or obese. Children are classified as such if their body mass index (BMI) is on or above the 85th percentile of the British 1990 growth reference (UK90) according to age and sex. These data are produced by NHS Digital as part of the National Child Measurement Programme. The data for the two age groups have been combined using a simple average.
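The UK90 reference is based on the LMS method, which converts a BMI to a z-score using age- and sex-specific L (skewness), M (median) and S (coefficient of variation) parameters; a z-score at or above about 1.036 corresponds to the 85th centile. The sketch below assumes that approach, and the L, M and S values are hypothetical placeholders rather than real UK90 values. The last step applies the simple average of the two age groups described above.

```python
import math
from statistics import NormalDist

# Standard normal z-score at the 85th centile (about 1.036)
Z_85TH_CENTILE = NormalDist().inv_cdf(0.85)


def bmi_z_score(bmi: float, L: float, M: float, S: float) -> float:
    """LMS transformation of a child's BMI, given reference L, M, S for their age and sex."""
    if L == 0:
        return math.log(bmi / M) / S
    return ((bmi / M) ** L - 1) / (L * S)


def overweight_or_obese(bmi: float, L: float, M: float, S: float) -> bool:
    """True if BMI is on or above the 85th centile of the reference."""
    return bmi_z_score(bmi, L, M, S) >= Z_85TH_CENTILE


def prevalence_pct(bmis: list[float], lms: tuple[float, float, float]) -> float:
    """Percentage of children in a group classified as overweight or obese."""
    flags = [overweight_or_obese(b, *lms) for b in bmis]
    return 100 * sum(flags) / len(flags)


# Hypothetical LMS values; real UK90 values vary by sex and by age in months
RECEPTION_LMS = (-1.0, 16.0, 0.08)
YEAR6_LMS = (-1.5, 18.0, 0.12)

reception = prevalence_pct([15.5, 17.0, 18.0, 19.5], RECEPTION_LMS)
year6 = prevalence_pct([16.0, 19.0, 21.0, 23.0, 25.0], YEAR6_LMS)
indicator = (reception + year6) / 2  # simple average of the two age groups
print(reception, year6, indicator)  # 50.0 60.0 55.0
```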

Overweight and obesity in adults

This indicator consists of the Active Lives Survey’s percentage of adults classified as overweight or obese, based on a definition of adults as those aged 18 years and older. Adults are defined as overweight or obese if their BMI is greater than or equal to 25 kilograms per square metre (kg/m²). These data are modelled and age-standardised estimates, calculated from adjusted height and weight variables, produced by Public Health England (PHE) using Sport England’s Active Lives Survey.

Hypertension

This indicator consists of the Quality and Outcomes Framework’s (QOF’s) prevalence of hypertension, defined as the percentage of GP patients on a practice register for hypertension. These data are produced by NHS Digital.

Low birth weight

This indicator consists of PHE’s live births with low birth weight. This is defined as live births with a recorded birth weight of less than 2,500 grammes and a gestational age of at least 37 complete weeks, as a percentage of all live births with recorded birth weight and a gestational age of at least 37 complete weeks. These data are produced by the ONS but have been sourced from PHE’s Fingertips tool, as this provides the three-year aggregates required for presenting results when individual years have small numbers.
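The low birth weight definition, restricted to term births with recorded weight and gestation, can be expressed as a short function. The function name and input format (pairs of weight in grammes and completed weeks of gestation, with `None` for missing values) are assumptions for illustration.

```python
from typing import Optional


def low_birth_weight_pct(births: list[tuple[Optional[int], Optional[int]]]) -> float:
    """Percentage of term live births (37+ complete weeks) weighing under 2,500 grammes.

    Each birth is (weight in grammes, completed weeks of gestation); None marks a missing value.
    """
    # Denominator: live births with recorded weight and gestational age of at least 37 weeks
    term = [w for w, g in births if w is not None and g is not None and g >= 37]
    if not term:
        raise ValueError("no eligible births")
    return 100 * sum(w < 2_500 for w in term) / len(term)


# Hypothetical births: the 36-week birth and the birth with missing weight are excluded
births = [(2_400, 38), (3_200, 39), (2_450, 36), (None, 40), (2_600, 37)]
print(round(low_birth_weight_pct(births), 1))  # 33.3
```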

Physical activity

This indicator consists of the Active Lives Survey’s percentage of adults who are physically active for 150 minutes or more per week. This includes those aged 19 years and over and is calculated based on the minutes of activity being equivalent to moderate intensity activity in bouts of 10 minutes or more. These data are produced by PHE using Sport England’s Active Lives Survey.
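A sketch of the physical activity classification follows, assuming activity is recorded as bouts of a given length and intensity. The rule that a vigorous minute counts as two moderate-equivalent minutes is an assumption drawn from UK physical activity guidance, not from the text above; the `Bout` structure and function names are hypothetical, and light activity is not handled.

```python
from dataclasses import dataclass

# Assumption: one vigorous minute counts as two moderate-equivalent minutes
VIGOROUS_MULTIPLIER = 2
MIN_BOUT_MINUTES = 10
TARGET_MINUTES = 150


@dataclass
class Bout:
    minutes: int
    intensity: str  # "moderate" or "vigorous"


def moderate_equivalent_minutes(bouts: list[Bout]) -> int:
    """Weekly moderate-equivalent minutes, counting only bouts of 10 minutes or more."""
    total = 0
    for bout in bouts:
        if bout.minutes < MIN_BOUT_MINUTES:
            continue  # short bouts do not count towards the total
        if bout.intensity == "moderate":
            total += bout.minutes
        elif bout.intensity == "vigorous":
            total += VIGOROUS_MULTIPLIER * bout.minutes
        else:
            raise ValueError(f"unknown intensity: {bout.intensity}")
    return total


def is_physically_active(bouts: list[Bout]) -> bool:
    return moderate_equivalent_minutes(bouts) >= TARGET_MINUTES


week = [Bout(30, "moderate")] * 3 + [Bout(30, "vigorous"), Bout(8, "vigorous")]
print(moderate_equivalent_minutes(week), is_physically_active(week))  # 150 True
```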

Healthy eating

This indicator consists of the Active Lives Survey’s proportion of adults eating five or more portions of fruit and vegetables on a “usual day”. These data are produced by PHE using Sport England’s Active Lives Survey.

Smoking

This indicator consists of ONS’s smoking prevalence in adults, aged 18 years and over, based on those self-reporting as being a current smoker. These data are produced using the Annual Population Survey (APS).

Alcohol misuse

This indicator consists of Hospital Episode Statistics (HES) hospital admission episodes for alcohol-related conditions. This is defined as admissions to hospital where the primary diagnosis is an alcohol-related condition, or a secondary diagnosis is an alcohol-related external cause. This is a directly age-standardised rate per 100,000 population (standardised to the European standard population). These data are calculated by PHE from NHS Digital HES and ONS mid-year population estimates.

Drug misuse

This indicator consists of HES’s hospital admissions with a primary diagnosis of drug poisoning by illicit drugs and hospital admissions with a primary diagnosis of drug-related mental health and behavioural disorders, both age-standardised rates per 100,000 population. These data are produced by NHS Digital and have been combined by addition.

Cancer screening

This indicator consists of the following:

  • breast cancer screening coverage, which is the proportion of women eligible for screening who have had a test with a recorded result at least once in the previous 36 months
  • bowel cancer screening coverage, which is the proportion of eligible men and women aged 60 to 74 years who had an adequate faecal occult blood test (FOBt) screening result in the previous 30 months
  • cervical cancer screening coverage (25- to 49-year-olds), which is the proportion of eligible women aged 25 to 49 years at the end of the period reported who were screened adequately within the previous three and a half years
  • cervical cancer screening coverage (50- to 64-year-olds), which is the proportion of eligible women aged 50 to 64 years at the end of the period reported who were screened adequately within the previous five and a half years

These data are produced by PHE using data from NHS Digital National Health Application and Infrastructure Services (NHAIS). These data have been combined using a simple average.
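A simple average gives each screening programme equal weight. Below is a minimal sketch with a hypothetical function name and made-up coverage figures.

```python
from statistics import mean


def combine_by_simple_average(coverage: dict[str, float]) -> float:
    """Combine several screening coverage percentages with equal weights."""
    if not coverage:
        raise ValueError("at least one coverage measure is required")
    return mean(coverage.values())


# Hypothetical coverage percentages for one area.
screening = {
    "breast": 75.0,
    "bowel": 60.0,
    "cervical_25_to_49": 70.0,
    "cervical_50_to_64": 75.0,
}
print(combine_by_simple_average(screening))
```

These figures combine to 70.0. Because the weighting is equal, a programme with a small eligible population counts as much as one with a large population. That is the trade-off a simple average makes.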

Vaccination coverage

This indicator consists of the following:

  • Population vaccination coverage – Pneumococcal Conjugate Vaccine (PCV) (1-year-olds)
  • Population vaccination coverage – Meningitis B (MenB) (1-year-olds)
  • Population vaccination coverage – Diphtheria, Tetanus, Pertussis (DTaP), Inactivated Poliovirus Vaccine (IPV) and Haemophilus influenzae type b (Hib) (1-year-olds)
  • Rotavirus vaccination coverage (1-year-olds)
  • Measles, mumps and rubella (MMR) vaccination coverage (2-year-olds)
  • Population vaccination coverage – PCV (2-year-olds)
  • MenB booster vaccination coverage (2-year-olds)
  • Hib and Meningitis C (MenC) booster vaccination coverage (2-year-olds)
  • Population vaccination coverage – MMR for two doses (5-year-olds)
  • Diphtheria, Tetanus, Polio, Pertussis booster vaccination coverage (5-year-olds)
  • Human Papillomavirus (HPV) for one dose coverage at 12 to 13 years
  • HPV for two doses vaccination coverage at 13 to 14 years

These data are produced by NHS Digital using COVER data produced by PHE. Missing values were imputed, and the measures were then combined using a simple average.

Sexual health

This indicator consists of PHE’s new sexually transmitted infection (STI) diagnoses, excluding chlamydia in under 25-year-olds, per 100,000 of the population. Chlamydia diagnoses are excluded because large numbers of cases are asymptomatic, so increases in diagnosis rates could result from increased testing rather than increased infection. These data are collected and collated by the Blood Safety, Hepatitis, STIs and HIV Division of PHE.

Teenage pregnancy

This indicator consists of PHE’s number of conceptions in women aged 15 to 17 years, per 1,000 females aged 15 to 17 years. These data are produced by PHE from ONS conceptions data, which cover conceptions resulting in live births, stillbirths or legal abortions.

Early years development

This indicator uses PHE’s percentage of 5-year-olds achieving a good level of development. This is based on children defined as having reached a good level of development at the end of the Early Years Foundation Stage (EYFS) as a percentage of all eligible children. These data are produced by PHE using data from the EYFS Programme, from the Department for Education (DfE).

GCSE achievement

This indicator uses DfE’s percentage of pupils achieving grade 4 or above (equivalent to grades A* to C) in English and Mathematics GCSEs, based on pupils at state-funded schools.

Pupil absence

This indicator consists of DfE’s persistent absenteeism statistic, defined as the percentage of pupils at all schools (state-funded primary, state-funded secondary and special schools) who are persistent absentees, that is, have overall absences equating to 10% or more of their possible sessions.
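The persistent absence definition comes down to a threshold test on the share of possible sessions each pupil missed. The sketch below assumes pupil-level counts are available. This is for illustration only, since the published statistic is already aggregated. The function name, the exclusion of pupils with no possible sessions, and the data are hypothetical.

```python
def persistent_absence_rate(
    pupils: list[tuple[int, int]], threshold: float = 0.10
) -> float:
    """Percentage of pupils who are persistent absentees.

    Each pupil is a (sessions_missed, possible_sessions) pair. A pupil counts as
    a persistent absentee when they miss `threshold` (10%) or more of their
    possible sessions.
    """
    eligible = [(missed, possible) for missed, possible in pupils if possible > 0]
    if not eligible:
        raise ValueError("no pupils with any possible sessions")
    persistent = sum(
        1 for missed, possible in eligible if missed / possible >= threshold
    )
    return 100 * persistent / len(eligible)


# Hypothetical pupils, each with 300 possible sessions.
pupils = [(30, 300), (10, 300), (45, 300), (0, 300)]
print(persistent_absence_rate(pupils))
```

Two of the four pupils reach the 10% threshold, giving 50.0%. The first pupil misses exactly 30 of 300 sessions and is counted.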

Young people’s education, employment and training

This indicator is measured using DfE’s proportion of 16- and 17-year-olds recorded as not in education, employment or training (NEET). These data are produced by the DfE from local authority data.

Unemployment

This indicator consists of ONS estimates of unemployment using a model developed to expand upon ONS’s Labour Force Survey (LFS) data.

Workplace safety

This indicator consists of Reporting of Injuries, Diseases and Dangerous Occurrences Regulations (RIDDOR) reported non-fatal injuries, presented as the rate of reported non-fatal injuries per 100,000 employees. These data are produced by the Health and Safety Executive (HSE).

Job-related training

This indicator consists of ONS’s percentage of working-age adults who received job-related training in the last 13 weeks. These data are produced using the Annual Population Survey (APS).

Child poverty

This indicator uses Department for Work and Pensions (DWP) data on children living in absolute low income, defined as the percentage of children aged 0 to 15 years living in families with absolute low income.

Low pay

This indicator is measured using ONS’s percentage of employees earning below the National Living Wage (NLW). These data are produced using the Annual Survey of Hours and Earnings (ASHE).

Children in state care

This indicator consists of PHE’s rate of children in state care. This is defined as the number of children looked after at 31 March (including adoption and care leavers) per 10,000 of the population aged under 18 years. These data are produced by PHE using data from DfE.

10. Data selection: healthy places (COIN Step 2)

This domain includes those social and environmental factors that affect the population at a collective level. These relate to circumstances that can influence health outcomes and modifiable risk factors, but that cannot be addressed at the individual level. We used the Marmot Review and Public Health England’s (PHE’s) Spatial Planning for Health report, in particular, to inform the topics that we included here: physical environment, housing, and community services and safety. These are also supported by The Health Foundation’s Exploring the social determinants of health series.

As such, the wider, social determinants of health specified in the Chief Medical Officer’s (CMO’s) recommendation are divided across the Healthy Lives and Healthy Places domains according to the level at which they affect the population. This has been done to avoid a wider determinants domain containing many more indicators than the other domains. The contents of all three of these domains will be explored more fully with factor analysis as the Health Index is developed further, to ensure the specific measures of the indicators included are categorised appropriately into subdomains.

The proposed structure of three domains (Healthy People, Healthy Lives and Healthy Places) has been supported both by those who initially proposed a composite health index and by the Expert Advisory Group (EAG). This structure will nevertheless be scrutinised using factor analysis. If the analysis suggests that a different approach is more appropriate, the structure may be revised.

Indicators, data and sources

The indicators included for Healthy Places, and the data and data sources to measure them are as follows.

Air pollution

This indicator consists of the Department for Environment, Food and Rural Affairs (Defra) air pollution measure. This is defined as the annual concentration of fine particulate matter at area level, adjusted to account for population exposure. Fine particulate matter is also known as PM2.5 and is measured in micrograms per cubic metre (µg/m³). These data are produced by Defra from modelled Defra pollution data and Office for National Statistics (ONS) population estimates. The data used here differ from the PHE Fingertips air pollution indicator, which uses only the anthropogenic (human-made) component of PM2.5. We use total PM2.5 because all PM2.5, whether human-made or not, has an impact on health.
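One common way to adjust concentrations for population exposure is to take a population-weighted mean over small areas. The sketch below assumes that approach. Defra's modelling is more detailed than this, and the function name and figures are hypothetical.

```python
def population_weighted_mean(
    concentrations: list[float], populations: list[int]
) -> float:
    """Population-weighted mean PM2.5 concentration (µg/m³) for a local authority.

    Each small area's modelled concentration counts in proportion to the number
    of people living there, so heavily populated areas dominate the result.
    """
    if len(concentrations) != len(populations):
        raise ValueError("need one population per concentration")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted = sum(c * p for c, p in zip(concentrations, populations))
    return weighted / total_population


# Hypothetical: a sparsely populated cleaner area and a densely populated one.
print(population_weighted_mean([8.0, 12.0], [1_000, 3_000]))
```

The weighted result of 11.0 µg/m³ sits closer to the densely populated area's value than the unweighted mean of 10.0 µg/m³ would.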

Public green space

This indicator consists of ONS’s average distance to the nearest park or public garden, measured from postcode centroids to the nearest boundary of a park or public garden. These data are calculated by ONS using Ordnance Survey Open Greenspace data. This is currently a one-off release, published in 2020 with data as of 2018, and the 2018 value has been used for all years presented. We propose including this measure despite the lack of a regular time series. We consider this concept important to measuring how the place where a person lives affects their health. We also expect this indicator to be more stable over time than some others.

Private outdoor space

This indicator consists of ONS’s access to garden space, defined as the percentage of addresses (houses and flats) with access to private garden space. These data are produced using ONS and Ordnance Survey data. This is currently a one-off release, published in 2020 with data as of 2018, and the 2018 value has been used for all years presented. We propose including this measure despite the lack of a regular time series. We consider this concept important to measuring how the place where a person lives affects their health.

Transport noise

This indicator consists of Defra’s percentage of the population exposed to road, rail and air transport noise of 65 A-weighted decibels (dB(A)) or more during the daytime, and the percentage of the population exposed to road, rail and air transport noise of 55dB(A) or more during the night-time. For both, noise exposure is determined by strategic noise mapping (produced in connection with the Environmental Noise Directive (END)) using national calculation methods and input data from the relevant authorities. The results are overlaid on a residential population dataset to determine the number of people exposed per authority. The daytime and night-time measures have been combined using an average of the values for each. These data are produced by Defra for the Public Health Outcomes Framework (PHOF), with noise exposure data from Defra and population estimates from the ONS.

Neighbourhood noise

This indicator consists of Defra’s rate of complaints about noise, defined as the number of complaints about noise per year per local authority, per 1,000 population. These data are collated by the Chartered Institute of Environmental Health (CIEH), and the extrapolation is determined by Defra in association with CIEH. The indicator values are calculated by CIEH and PHE’s Centre for Radiation, Chemical and Environmental Hazards, and population data are from the ONS.

Road safety

This indicator consists of the number of road accidents per volume of traffic. Accidents are personal injury road traffic accidents on a public road that are reported to the police and classified as fatal, serious or slight. The indicator was produced for this publication by the ONS, using data from the Department for Transport (DfT), as the number of accidents per billion vehicle miles.

Road traffic volume

This indicator consists of the volume of traffic per area. This is defined as the annual number of billion vehicle miles travelled by all motor vehicles, per square kilometre of land (excluding inland water and measured to the average high tide mark). The indicator was calculated for this version of the Health Index by the ONS, using data from the DfT and ONS.
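Both road indicators are simple ratios of published totals. Below is a sketch that assumes annual vehicle miles are supplied as a plain count. The function names and figures are hypothetical.

```python
def accidents_per_billion_vehicle_miles(accidents: int, vehicle_miles: float) -> float:
    """Reported personal injury road accidents per billion vehicle miles."""
    if vehicle_miles <= 0:
        raise ValueError("vehicle miles must be positive")
    return accidents / (vehicle_miles / 1e9)


def billion_vehicle_miles_per_km2(vehicle_miles: float, land_area_km2: float) -> float:
    """Annual motor vehicle traffic (billion vehicle miles) per square kilometre of land."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return (vehicle_miles / 1e9) / land_area_km2


# Hypothetical UTLA: 600 accidents, 2.4 billion vehicle miles, 400 km² of land.
print(accidents_per_billion_vehicle_miles(600, 2.4e9))
print(billion_vehicle_miles_per_km2(2.4e9, 400))
```

These inputs give 250 accidents per billion vehicle miles and 0.006 billion vehicle miles per square kilometre. The safety indicator divides accidents by traffic volume rather than by population. This stops it rising simply because an area carries more traffic.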

Household overcrowding

This indicator consists of the Ministry of Housing, Communities and Local Government’s (MHCLG’s) measure of household overcrowding. This is defined as households with an occupancy rating of less than zero, meaning the number of rooms is less than the predicted “required” number of rooms.

Occupancy rating provides a measure of whether a household's accommodation is overcrowded or under-occupied. There are two measures of occupancy rating: one is based on the number of rooms in a household's accommodation and the other on the number of bedrooms. Here we use rooms. The ages of the household members and their relationships to each other are used to derive the number of rooms they require. The number of rooms required is subtracted from the number of rooms in the household's accommodation to obtain the occupancy rating. An occupancy rating of minus 1 implies that a household has one fewer room than required, whereas plus 1 implies that it has one more room than the standard requirement. These data are produced by the ONS from the 2011 Census. The 2011 data have been used as the basis for all years, with values for the remaining years imputed as detailed in Section 14.
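The occupancy rating and the overcrowding percentage can be sketched as follows. The number of rooms required is supplied as an input rather than derived, because the Census rules for deriving it are not reproduced here. The function names and households are hypothetical.

```python
def occupancy_rating(rooms: int, rooms_required: int) -> int:
    """Rooms in the accommodation minus rooms required; below zero means overcrowded."""
    return rooms - rooms_required


def overcrowding_percentage(households: list[tuple[int, int]]) -> float:
    """Percentage of households with an occupancy rating below zero.

    Each household is a (rooms, rooms_required) pair. The number of rooms
    required is taken as an input here rather than derived from the ages and
    relationships of household members.
    """
    if not households:
        raise ValueError("no households supplied")
    overcrowded = sum(
        1 for rooms, required in households if occupancy_rating(rooms, required) < 0
    )
    return 100 * overcrowded / len(households)


# Hypothetical households with ratings -1, 0, +2 and +1.
print(overcrowding_percentage([(4, 5), (5, 5), (6, 4), (5, 4)]))
```

Only the household with one fewer room than required counts as overcrowded, so the result is 25.0%. A household with exactly the required number of rooms (rating 0) is not overcrowded.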

Rough sleeping

This indicator consists of MHCLG’s number of people sleeping rough, which is the number of people sleeping outdoors on a single night in October or November per 100,000 residents. These data are produced annually by MHCLG and use ONS data for resident population estimates.

Housing affordability

This indicator uses ONS’s housing affordability statistic, defined as the ratio of lower quartile house prices to lower quartile gross annual (where available) residence-based earnings. These data are produced using earnings data from the Annual Survey of Hours and Earnings (ASHE) and house prices from the House Price Statistics for Small Areas.
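The ratio can be illustrated with Python's `statistics.quantiles`. This sketch assumes individual sale prices and earnings records are available for one area. The quartile convention (`method="inclusive"`) and the figures are assumptions, not the published method.

```python
from statistics import quantiles


def lower_quartile(values: list[float]) -> float:
    """First quartile of a list of values.

    method="inclusive" is an assumption for this sketch; the quartile convention
    used in the published statistics may differ.
    """
    return quantiles(values, n=4, method="inclusive")[0]


def affordability_ratio(house_prices: list[float], annual_earnings: list[float]) -> float:
    """Lower quartile house price divided by lower quartile annual earnings."""
    return lower_quartile(house_prices) / lower_quartile(annual_earnings)


# Hypothetical sale prices and residents' gross annual earnings for one area.
prices = [150_000, 180_000, 200_000, 250_000, 300_000]
earnings = [18_000, 20_000, 24_000, 30_000, 40_000]
print(affordability_ratio(prices, earnings))
```

With these figures the ratio is 9.0: a lower quartile home costs nine times lower quartile annual earnings.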

Distance to GP services

This indicator consists of the distance to the nearest GP practice, defined as the average minimum “as the crow flies” distance from households in the local authority to the nearest GP practice. This has been calculated by the ONS for the current purposes using GP practice addresses from NHS Digital and postcode centroids from the ONS National Statistics Postcode Lookup (NSPL).

Distance to pharmacies

This indicator consists of the distance to the nearest pharmacy (dispensary), defined as the average minimum “as the crow flies” distance from households in the local authority to the nearest dispensary. This has been calculated by the ONS for the current purposes using dispensary addresses from NHS Digital and postcode centroids from the ONS NSPL.

Distance to sports or leisure facilities

This indicator consists of the distance to the nearest sports or leisure facility, defined as the average minimum “as the crow flies” distance from households in the local authority to the nearest sports or leisure facility. This has been calculated by the ONS for the current purposes using sport facility addresses from Sport England and postcode centroids from the ONS NSPL.
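All three distance indicators share the same calculation: for each postcode centroid, find the nearest facility, then average those minimum distances. The minimal sketch below assumes latitude and longitude coordinates and a spherical Earth (the haversine formula). The published figures may instead use projected grid coordinates. The optional household weighting is a hypothetical refinement, and the function names are illustrative.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ("as the crow flies") distance between two points in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def average_nearest_distance_km(
    centroids: list[tuple[float, float]],
    facilities: list[tuple[float, float]],
    households: Optional[list[int]] = None,
) -> float:
    """Average distance from postcode centroids to their nearest facility.

    `households` optionally weights each centroid by its number of households
    (a hypothetical refinement); without it every centroid counts equally.
    """
    if not centroids or not facilities:
        raise ValueError("need at least one centroid and one facility")
    weights = households if households is not None else [1] * len(centroids)
    if len(weights) != len(centroids):
        raise ValueError("need one household count per centroid")
    # Brute force: compare every centroid with every facility.
    nearest = [
        min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)
        for lat, lon in centroids
    ]
    return sum(d * w for d, w in zip(nearest, weights)) / sum(weights)


# Hypothetical centroids and GP practice locations.
centroids = [(51.50, -0.12), (51.52, -0.10), (51.48, -0.15)]
practices = [(51.51, -0.11), (51.47, -0.16)]
print(average_nearest_distance_km(centroids, practices))
```

The brute-force search is fine for a sketch. At national scale a spatial index would usually be used to find each centroid's nearest facility.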

Personal crime

This indicator consists of ONS’s police recorded personal crime, which is the number of personal crimes per 1,000 people. Personal crime offences are defined as violence against the person, sexual offences, robbery, theft, criminal damage and arson.

11. Data selection limitations (COIN Step 2)

As mentioned previously, only published sources that required little or no manipulation have been included at this stage. There are further potential data sources that we may be able to include when developing the Index more fully. These could add to the depth and breadth of the included indicators or improve the way we measure certain concepts. In each case, no equivalent data meeting our principles for data inclusion have been identified in publicly available sources. These include:

  • Understanding Society data, which, if the sample size supports presenting results at upper-tier local authority level, may enable the inclusion of multiple concepts within the Health Index, such as perceptions of the places people live in, including fear of crime, feelings of safety in the neighbourhood and a sense of belonging to the local community
  • Monitor of Engagement with the Natural Environment (MENE) data, which would allow us to include measures of the user-perceived quality of the green space and of the perceived ease of walking to these green spaces

The specific concepts, or measures of concepts, that it has not yet been possible to include are detailed below for each domain.

Healthy People

Disability-free life expectancy

For the mortality measures, we considered using disability-free life expectancy, but the period for which this measure is available is not currently long enough.

Broader range of some physical health conditions

For physical health conditions, we would prefer to include a broader range of musculoskeletal and respiratory conditions but data do not permit this. The GP Patient Survey does measure these in a way that avoids double-counting individuals with both of the conditions captured in each existing indicator used, but there is a break in the data time series for 2018 because of changes made to the survey, which means that the measure is not comparable over time. We would also like to include a measure of multimorbidity because having more than one condition can have a greater impact on quality of life than the sum of the impact of each condition would suggest.

Impact of other health issues on daily life

We have included a measure of the impact of long-term health conditions on activities of daily living (ADLs). We also intended to capture the impact of other issues on daily life, such as frailty, brain injuries or other injuries with long-term consequences (not necessarily covered by musculoskeletal conditions). This has not been entirely possible because of the narrow scope of the available ADL measure. The Health Survey for England measures this more broadly but cannot be disaggregated to upper-tier local authority (UTLA) level. We attempted to compensate for the narrow focus of the ADL measure by adding a measure of frailty as a proxy for difficulty with ADLs. Other measures, such as the number of emergency hospital admissions due to falls (from HES), could be included in later versions of the Index if more detail is preferred here.

Disability for other age groups

We have measured disability for working age adults but ideally we would include measures of disability for other age groups, younger and older. No suitable data were identified.

Mental health

Mental health indicator selection has been limited by the availability of data that meet the criteria for inclusion in the Health Index. The indicators included therefore reflect data availability more than the richer definition of mental health that we would prefer. There are notable absences, such as eating disorders and several common mental health conditions.

Data are also available from NHS Digital on the number of Improving Access to Psychological Therapies (IAPT) referrals. Further exploration is needed to understand whether these data would add value to the measure of prevalence of depression. Alternatively, they may largely capture a subset of those diagnosed with depression, who are already included in that indicator. These data are also linked to service availability, patients’ willingness to engage with services and GPs’ referral practices, which give further reason for caution over their use. However, given the scarcity of mental health data, these data may be included in the full Index if further investigation addresses our concerns.

Healthy Lives

As mentioned previously, several indicators could be sourced from Understanding Society data, but we have not yet explored whether it is suitable to present these data at UTLA level.

Alcohol consumption and prevalence of drug misuse

In addition to this, for alcohol and drug misuse, ideally we would measure alcohol consumption above recommended levels and the prevalence of drug misuse. Alcohol consumption data are available on the Health Survey for England from NHS Digital, but these cannot be disaggregated sufficiently. Questions on alcohol consumption were also previously included on the Office for National Statistics’s (ONS’s) Opinions and Lifestyle Survey but these have been discontinued. We have instead used hospital admissions related to alcohol or drug misuse, using Hospital Episode Statistics.

For both, we have considered the use of data from the National Drug Treatment Monitoring System (NDTMS), but these are not available for all years currently presented and there are concerns over whether these would be more closely linked to service availability than some other measures. These therefore require further consideration to understand whether they should be adopted into the full version of the Index. Using them would give us measures of the number of adults with an alcohol dependency potentially in need of specialist treatment, and the prevalence of opiate and/or crack cocaine use.

Income and poverty at lower geographical levels

For income and poverty, there are many good measures at national level, but few can be disaggregated to the required geographies. Poverty data from the Department for Work and Pensions (DWP) Households below average income (HBAI) statistics cannot be disaggregated below Nomenclature of Territorial Units for Statistics level 1 (NUTS1) regions. Other measures are all available nationally but not at the required geography; these include persistent poverty or low income, material deprivation, and being at risk of poverty or social exclusion. There were also concerns that measures of average income mask inequality and fail to focus on the lower end of the income distribution, which is the most important aspect for health.

DWP benefits data, particularly on out-of-work benefits and Pension Credit, could be used to supplement the indicators currently used, which at present cover children and working adults. The move to Universal Credit is a particular area of focus. It also highlights the need to decide how we would handle any future changes to the benefit system. Such changes affect the time series for benefit recipients during rollout. They may also affect take-up, so that the data reflect a change in services rather than a change in income or poverty as a risk factor.

Quality and safety of employment

As well as measures of unemployment, we aim to capture the quality and safety of employment. Job satisfaction is a useful measure that we have not been able to include at present, but we may be able to include it if we can use the Understanding Society data. For safety, we would have preferred to include RIDDOR-reported fatal injuries from the Health and Safety Executive (HSE) in addition to non-fatal injuries. However, these data are missing for too many UTLAs, and the counts are small and volatile over time. Another variable of interest was the estimated prevalence of self-reported illness or injury caused or made worse by work. This is available from the Labour Force Survey but not at sufficient geographic detail for our purposes.

Social interaction

A key area where very little suitable data are available is for social interaction. We have not been able to include indicators for loneliness, support networks or social isolation, other than a measure for adult social care users. The Community Life Survey, produced by the Department for Digital, Culture, Media and Sport and Office for Civil Society, is a potential source for this but the geographic disaggregation is not sufficient. The GP Patient Survey also has a measure but the changes made to this mean the time series is not comparable at present.

Other indicators unable to include

In addition to this, other indicators we have not been able to include because of a lack of suitable data are:

  • high cholesterol
  • children’s physical activity
  • children’s eating behaviours
  • children’s dental health
  • breastfeeding
  • violence or abuse experienced within a household
  • maternal health characteristics (mental health, substance misuse, obesity)
  • sun and UV exposure
  • sleep
  • sedentary behaviour
  • problem gambling
  • internet safety
  • childhood bullying
  • quality of the early years workforce

Healthy Places

Access to quality green space

The measure of access to green space would ideally capture access to safe green space, but the data currently available do not support this. For the full version of the Index, we may include the measure of user-perceived quality of green space from the Monitor of Engagement with the Natural Environment (MENE) data mentioned previously, which may partly reflect safety. A measure of the perceived ease of walking to these green spaces would also add depth by capturing how accessible users perceive them to be.

Travel time to services

Rather than using “as the crow flies” distances for all indicators measuring access to services or spaces, we would prefer to include a measure of travel time to the nearest GP, pharmacy or sports and leisure facility. This would better reflect how accessible these services are. Calculating this was outside the scope of the beta release. The methods currently known to us are resource-intensive, and we welcome suggestions for alternatives.

Access to unhealthy goods

Access to services could also include access to unhealthy goods; this was out of scope for this version of the Health Index because of the resource required to produce this measure from OpenStreetMap or company registration data, but will be considered for the full version of the Index.

Quality of housing

For housing, the aim is to capture the quality of housing, whether homes are adequately heated, the extent of overcrowding, and homelessness, as these have all been shown to affect health. The potential source we identified for measures of housing quality, in terms of the state of repair of properties, is data produced by the Ministry of Housing, Communities and Local Government (MHCLG) using the English Housing Survey. These data were found to be unsuitable because they cannot be disaggregated to the geographies required for the Health Index. The same source includes data on property energy efficiency rating bands, which could have provided a measure of the adequacy of home heating had it been suitable.

12. Methods overview

The methods chosen for the purposes of the Health Index fall into one of two categories:

  1. The method is our preferred method and where there are viable alternatives, sensitivity analysis will be conducted during the development of the full index.
  2. The method is suitable for use but has been chosen for its simplicity for the purposes of the beta.

We intend to develop more refined methods for the full index. The methods used here are sufficient to produce experimental data that illustrate the index. These data are not intended to allow users to draw firm conclusions.

The sections that follow detail each step taken to create the beta and include the method used, the preferred method and the alternatives. In all cases rationales are given for why particular methods are or are not preferred. This covers:

  • geographical aggregation (Section 13)
  • imputation of missing data (Section 14)
  • multivariate analysis (Section 15)
  • homogenising the data (Section 16)
  • weighting (Section 17)
  • aggregating indicators (Section 18)
  • sensitivity analysis (Section 19)
  • scaling (Section 20)

13. Geographical aggregation

Data are collected for each indicator for all the geographies provided at lower-tier local authority (LTLA), upper-tier local authority (UTLA), region and country level for England. The LTLA and UTLA levels correspond to areas with geography codes beginning E06, E07, E08, E09 and E10.

UTLAs include unitary authorities (UAs), metropolitan boroughs, London boroughs and counties. There are 151 UTLAs that combine to form the nine regions of England. For the purposes of the Health Index, results for the Isles of Scilly and City of London UTLAs are not included for any indicators because of small sample sizes leading to unreliable underlying data. For some sources, these UTLAs are grouped with nearby UTLAs in the source data. Where this is the case we have not made an adjustment to separate them.

The Health Index is only presented at country, region and UTLA level, and only using 2020 administrative geographies, but for some data alternative levels are needed to aggregate to higher level geographies, or to more recent geographies such as where non-metropolitan districts have become unitary authorities.

Some data sources, such as the Quality and Outcomes Framework (QOF) and the GP Patient Survey, present data based on health geographies rather than administrative geographies. These data are published for individual GP practices, identified by GP practice code. Each practice is assigned to an LTLA using its postcode and the National Statistics Postcode Lookup (NSPL), and the LTLA values are then aggregated to UTLA level. Postcodes for GP practices are published by NHS Digital.

All the indicators used that are published for GP practices provide the numerators and denominators as well as the values, and the values are percentages. The numerator and denominator for each LTLA can therefore be calculated as the sums of the numerators and denominators, respectively, of the GP practices within that LTLA. The LTLA value can then be calculated from them.
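The practice-to-LTLA step sums counts, not percentages. This minimal sketch assumes a postcode-to-LTLA lookup has already been extracted from the NSPL. The function name, practice codes, postcodes and area codes are hypothetical.

```python
from collections import defaultdict


def aggregate_practices_to_ltla(
    practices: list[tuple[str, str, int, int]],
    postcode_to_ltla: dict[str, str],
) -> dict[str, tuple[int, int, float]]:
    """Aggregate GP practice counts to LTLA level.

    practices: (practice_code, postcode, numerator, denominator) for each practice.
    postcode_to_ltla: postcode -> LTLA code, for example built from the NSPL.
    Returns LTLA code -> (numerator, denominator, percentage); LTLAs with a
    zero total denominator are left out.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for _practice_code, postcode, numerator, denominator in practices:
        ltla = postcode_to_ltla[postcode]  # KeyError flags an unmatched postcode
        totals[ltla][0] += numerator
        totals[ltla][1] += denominator
    return {
        ltla: (num, den, 100 * num / den)
        for ltla, (num, den) in totals.items()
        if den > 0
    }


# Hypothetical practices and lookup; codes and postcodes are made up.
practices = [
    ("P1", "AA1 1AA", 50, 1_000),
    ("P2", "AA1 2AA", 30, 500),
    ("P3", "BB1 1BB", 20, 400),
]
lookup = {"AA1 1AA": "LTLA_A", "AA1 2AA": "LTLA_A", "BB1 1BB": "LTLA_B"}
print(aggregate_practices_to_ltla(practices, lookup))
```

LTLA_A comes out at 80 out of 1,500 (about 5.33%). Averaging its two practices' percentages (5% and 6%) would instead give 5.5% and over-weight the smaller practice. The same summing logic applies to the LTLA-to-UTLA step.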

There were no changes to the boundaries or structure of LTLAs and UTLAs between 2015 and 2018. In 2019 and 2020, some LTLAs were merged to form UAs or new non-metropolitan districts, and some counties were abolished and replaced by the resulting UAs. The areas that have changed are:

2020:

  • Buckinghamshire UA (E06000060) created from a merger of four non-metropolitan districts (E07000004-7)
  • The county of Buckinghamshire (E10000002), which had comprised the same four non-metropolitan districts as the new UA, was abolished

2019:

  • Dorset UA (E06000059) created from a merger of five non-metropolitan districts (E07000049-53)
  • Bournemouth, Christchurch and Poole UA (E06000058) created from a merger of one non-metropolitan district (E07000048) and two unitary authorities (E06000028 and E06000029)
  • The county of Dorset (E10000009), which had comprised the non-metropolitan districts E07000048-53 that were merged into the two UAs above, was abolished
  • East Suffolk non-metropolitan district (E07000244) created from a merger of two non-metropolitan districts (E07000205 and E07000206)
  • Somerset West and Taunton non-metropolitan district (E07000246) created from a merger of two non-metropolitan districts (E07000190 and E07000191)
  • West Suffolk non-metropolitan district (E07000245) created from a merger of two non-metropolitan districts (E07000201 and E07000204)

The new areas can be calculated or estimated from the relevant non-metropolitan districts and UAs. Values are calculated for the new non-metropolitan districts as well as the new UAs. Although the districts are not presented in the Health Index, they may be needed to calculate values for non-metropolitan counties on 2020 geography. Where further aggregations use 2020 geography for the years 2015 to 2018, we apply the populations of the predecessor geographies in those years, taken from the Office for National Statistics’s (ONS’s) mid-year population estimates.

Aggregation to higher geographies followed the Public Health England (PHE) Technical Document on Aggregations:

  • Method 1: if all the areas needed to form the new area are provided in the dataset, the new numerator and denominator are calculated as the sum of the numerators and denominators respectively of the comprising areas; the statistic is then calculated appropriately from the numerator and denominator (for example, a rate per 1,000 or a percentage)
  • Method 3: if the numerator or denominator is not provided, the new value is calculated by multiplying each of the values of the comprising areas by the population of that area divided by the population of the new area, and summing the adjusted values
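Methods 1 and 3 can be sketched as two small functions. The function names and district figures are hypothetical; the calculations follow the method descriptions above.

```python
def method_1(
    numerators: list[float], denominators: list[float], multiplier: float = 100
) -> float:
    """Sum component numerators and denominators, then recompute the statistic.

    multiplier=100 gives a percentage; 1_000 gives a rate per 1,000, and so on.
    """
    total_denominator = sum(denominators)
    if total_denominator <= 0:
        raise ValueError("total denominator must be positive")
    return multiplier * sum(numerators) / total_denominator


def method_3(values: list[float], populations: list[float]) -> float:
    """Estimate the new area's value as a population-weighted sum of component values."""
    if len(values) != len(populations):
        raise ValueError("need one population per value")
    new_area_population = sum(populations)
    if new_area_population <= 0:
        raise ValueError("total population must be positive")
    return sum(v * p / new_area_population for v, p in zip(values, populations))


# Two hypothetical districts merging into one new area.
print(method_1([50, 30], [1_000, 500]))  # aggregated value from counts
print(method_3([5.0, 6.0], [20_000, 10_000]))  # estimate from values and populations
```

Both calls give about 5.33, because here each district's denominator is proportional to its population. Suppose instead the two districts had equal populations of 10,000. Method 3 would then give 5.5, while the aggregated value stays at 5.33. This shows the assumption that the denominator follows population proportions.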

Method 1 calculates an aggregated value, while Method 3 provides an estimate. Each indicator was aggregated with Method 1 or Method 3 as appropriate. If an indicator’s source statistic was not a rate or a percentage, as with means (for example, Pe.3.1.c Self-reported well-being mean happiness score), Method 3 was used automatically, because a numerator and denominator are not applicable. Method 3 was also used to give an estimate where the statistic was an age-standardised rate. In subsequent versions of the Health Index, age-standardised values will be calculated using Method 1, with the age breakdown of the population in each of the geographies used to calculate the resulting value.

Some denominators are not based on the whole population of the local authority, or are not based on population at all. Where the numerator and denominator are not provided, estimating values from population proportions assumes that the denominator follows these same proportions. For example, if the denominator is the number of people aged 65 years and over, we are assuming that the proportion of the population aged 65 years and over is the same in each of the old areas as in the new area.

In future, to estimate values more accurately where numerators and denominators are not provided, we could weight by the size of the denominator population (for example, the number of people aged 65 years and over) in each of the areas we are calculating from and to, where these figures are available. This was beyond the scope of this version of the Health Index.
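The worked example below shows what goes wrong when the proportionality assumption fails. The two areas and their figures are invented. They have equal total populations but very different shares aged 65 and over. Weighting by total population (Method 3) then misses the exact Method 1 value. Weighting by the denominator population, as proposed for the future, recovers it.

```python
def weighted_mean(values, weights):
    """Weighted average of the comprising areas' values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical areas A and B: an event rate per 1,000 people aged 65 and over
events = [600, 100]
pop_65_plus = [30_000, 10_000]   # denominators differ sharply between areas
total_pop = [100_000, 100_000]   # total populations are equal
rates = [e * 1000 / d for e, d in zip(events, pop_65_plus)]  # 20.0 and 10.0

exact = sum(events) * 1000 / sum(pop_65_plus)            # Method 1: 17.5
method3 = weighted_mean(rates, total_pop)                 # 15.0: biased downwards
denominator_weighted = weighted_mean(rates, pop_65_plus)  # 17.5: matches Method 1
```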

If the former county of Buckinghamshire is provided but the 2020 UA Buckinghamshire is not, the latter is given the values of the former because of the 1:1 relationship.

Where the former county of Dorset is provided but its non-metropolitan districts are not, both the new UA of Dorset and the former non-metropolitan borough of Christchurch can be estimated from the county of Dorset. The Christchurch estimate is then used to calculate the new UA of Bournemouth, Christchurch and Poole.

The value for the former county of Dorset is assigned to both the new UA of Dorset and the former non-metropolitan borough of Christchurch. Numerators and denominators can be estimated by multiplying those for the former county of Dorset by the population of the calculated area divided by the population of the former county of Dorset. This process is carried out before the aggregation to 2020 geographies, so that the estimated value for Christchurch can be used to calculate the value for the UA of Bournemouth, Christchurch and Poole if needed.
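The apportioning step and the aggregation that follows it can be sketched as below. This assumes the statistic is a rate per 1,000. All figures for Dorset, Christchurch, Bournemouth and Poole are hypothetical, and the function names are illustrative only.

```python
def apportion_from_county(county_value, county_num, county_den, county_pop, area_pop):
    """Give a part of the former county the county's value, and scale the county's
    numerator and denominator by that part's share of the county population."""
    share = area_pop / county_pop
    return county_value, county_num * share, county_den * share


def method1_rate(numerators, denominators, multiplier=1000):
    """Method 1 aggregation: recompute the rate from summed numerators and denominators."""
    return sum(numerators) * multiplier / sum(denominators)


# Hypothetical figures for the former county of Dorset and for Christchurch
value, chr_num, chr_den = apportion_from_county(
    county_value=2.5, county_num=1_000, county_den=400_000,
    county_pop=500_000, area_pop=50_000,
)
# The Christchurch estimate then feeds the aggregation for Bournemouth, Christchurch and Poole
bcp_rate = method1_rate([400, 300, chr_num], [150_000, 120_000, chr_den])
```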

Back to table of contents

14. Imputation of missing data (COIN Step 3)

For the purposes of the beta, we have taken a simpler approach to imputation, which is in line with how other indices handle missing values. The approach is as follows, presented in the order in which the steps were applied:

  • if the back series had values either side of a missing value for an upper-tier local authority (UTLA), the value for each missing year was calculated by linear interpolation between the values either side
  • if one or more values were missing without values available on both sides in the time series, the missing values were replaced with the nearest adjacent value
  • if a value was suppressed because the numerator was small, that is, the value was too low to be presented, it was replaced with the lowest value presented for that data series
  • if a value was suppressed because the denominator was small, that is, there were too few observations to base it on, it was replaced with the median value from the data series
  • if a value was missing for a UTLA for all years, we imputed the mean for the region; this only occurred for two UTLAs for one indicator
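The five imputation rules above can be sketched for a single indicator as follows. Several details are assumptions, not taken from the text. Missing values are `None`. Suppressed values carry the hypothetical flags `LOW_NUM` and `LOW_DEN`. "That data series" is read as all published values of the indicator across areas and years. The "mean for the region" is read as the mean of the region's other UTLAs in each year.

```python
from statistics import mean, median

LOW_NUM = "low_numerator"    # hypothetical flag: suppressed because the numerator was small
LOW_DEN = "low_denominator"  # hypothetical flag: suppressed because the denominator was small


def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def fill_time_gaps(series):
    """Rules 1-2 for one UTLA: interpolate interior gaps, carry the nearest value into edge gaps."""
    known = [i for i, v in enumerate(series) if is_number(v)]
    filled = list(series)
    if not known:
        return filled  # left for the regional-mean rule
    for i, v in enumerate(series):
        if v is not None:
            continue  # published or suppressed, so not a gap
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            filled[i] = series[lo] + (i - lo) / (hi - lo) * (series[hi] - series[lo])
        else:
            filled[i] = series[before[-1] if before else after[0]]
    return filled


def impute_indicator(data, region_of):
    """data maps a UTLA code to its yearly values (a number, None, or a suppression flag)."""
    published = [v for s in data.values() for v in s if is_number(v)]
    lowest, middle = min(published), median(published)
    out = {}
    for area, series in data.items():
        s = fill_time_gaps(series)
        # Rules 3-4: small numerator -> lowest published value; small denominator -> median
        out[area] = [lowest if v == LOW_NUM else middle if v == LOW_DEN else v for v in s]
    # Rule 5: UTLAs with no values in any year take the mean of their region's other UTLAs
    empty = {a for a, s in out.items() if all(v is None for v in s)}
    for area in empty:
        peers = [out[a] for a in out if region_of[a] == region_of[area] and a not in empty]
        out[area] = [mean(p[y] for p in peers) for y in range(len(out[area]))]
    return out
```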

We are developing a more sophisticated approach to imputation with the Office for National Statistics’s (ONS’s) editing and imputation group. It has not been used for the Index’s beta release, for which this simpler method suited our needs given the nature of the missing values.

Back to table of contents

15. Multivariate analysis (COIN Step 4)

Typically for a composite index, we would aim to avoid collinearity between indicators as that suggests they are measuring similar topics, so one or more may be redundant to include.

Given the Health Index’s aims of presenting health at multiple levels and being transparent in its construction, the Index looks to capture multiple indicators measuring similar principles and to cluster these into subdomains for comparison. Similarly, for the beta version we want to present options for measures of different principles where these are available and appropriate. We also expect many of our indicators to be correlated, because the Index includes both risk factors and the outcomes we expect to be associated with those risk factors.

We assessed correlation matrices of all indicators within each domain. Where multiple data options were available for an indicator, we used these matrices together with factor analysis to assess which option was a better fit for the Index as a whole.

Factor analysis was used to produce the weights for each indicator, at which point the suitability for some indicators was assessed more thoroughly. Indicators that were removed from the Index at that stage are listed in COIN Step 6.

Back to table of contents

16. Homogenising the data (Normalisation, COIN Step 5)

It is necessary when constructing an index to transform all indicators to a homogeneous scale.

Scaling

Certain indicators needed to undergo directional adjustment such that for all indicators, a higher value corresponds with better health – this process is as simple as multiplying the indicator by negative one. For example, lower smoking prevalence is associated with better health. Therefore, the smoking prevalence indicator needed to be directionally adjusted.

Population differences between upper-tier local authorities (UTLAs) and regions were accounted for in scaling through the calculation of proportions or rates. To do this, the Office for National Statistics’s (ONS’s) population estimates were applied to all indicators that measure raw counts. Accounting for differing characteristics, namely age and sex, is more difficult and was not within the scope of this version of the Health Index. We have used age-standardised rates where they were applicable and available, but this was reliant on the data published and so was only possible for a minority of the data sources used.

It was important to address indicators that displayed skewness in their distribution, as this would distort the resultant index. Ideally, the indicators would also not be heavy-tailed (as measured by kurtosis). Functional transformations have been applied on a case-by-case basis to address this. We explored a number of commonly used transformation methods (log, square root, cube root, square, cube, reciprocal) and selected the method that most effectively reduced the skewness and kurtosis of the indicator. For the majority of indicators, either the log transformation was used or the data were left untransformed.
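The transformation choice and the directional adjustment described above can be sketched as follows. The selection score (absolute skewness plus absolute excess kurtosis) is an assumption, since the text does not give the exact criterion. The function names are hypothetical. Transformations are applied to the raw values before any sign flip, because log and square root need non-negative inputs. The reciprocal reverses the order of values, so the sign flip has to take it into account.

```python
import math
from statistics import fmean


def skewness(xs):
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def excess_kurtosis(xs):
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3.0


# Candidate transformations named in the text; "reciprocal" reverses the ordering of values
TRANSFORMS = {
    "none": lambda x: x,
    "log": math.log,
    "square_root": math.sqrt,
    "cube_root": lambda x: math.copysign(abs(x) ** (1 / 3), x),
    "square": lambda x: x ** 2,
    "cube": lambda x: x ** 3,
    "reciprocal": lambda x: 1 / x,
}
DECREASING = {"reciprocal"}


def choose_transform(xs):
    """Return (name, transformed values) minimising |skewness| + |excess kurtosis| (assumed rule)."""
    best = None
    for name, f in TRANSFORMS.items():
        try:
            ys = [f(x) for x in xs]
            score = abs(skewness(ys)) + abs(excess_kurtosis(ys))
        except (ValueError, ZeroDivisionError):
            continue  # e.g. log of zero, square root of a negative, or a constant series
        if best is None or score < best[0]:
            best = (score, name, ys)
    return best[1], best[2]


def orient(ys, name, higher_is_better):
    """Multiply by -1 where needed so that a higher transformed value means better health."""
    flip = higher_is_better if name in DECREASING else not higher_is_better
    return [-y for y in ys] if flip else ys
```

For smoking prevalence, where lower is better, a call such as `orient(ys, name, higher_is_better=False)` would apply the multiplication by negative one described in the text.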

Normalisation

There is a range of methods that can be used to normalise the indicators. The three most commonly used are ranking, scaling to range and standardisation.

The methods available for normalisation are narrowed greatly by the Health Index’s need to be comparable across time and geographic area, with additional years of data not affecting the back-series values. Time-series standardisation is the method used for the beta version of the Index, and is our preferred method for the more finalised version of the Index.

Regular standardisation involves subtracting the mean value and dividing by the standard deviation, for each indicator. For the Health Index, it is not suitable to employ this method across all observations as additional years of data would change the mean and standard deviation calculated and, consequently, all data from previous years. If standardisation is applied within years, the resultant values would no longer be comparable across years – a key attribute of the Health Index.

Temporal comparability, without enforcing annual revisions, can be achieved using a method discussed in the COIN (2020) 10-step guide. The standardisation method is modified so that the mean and standard deviation for each indicator are calculated for a base year and then applied to the whole time series. This allows comparisons across time and only causes back-series changes when the reference year is updated; this is a common practice across a number of national statistics.
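Time-series standardisation can be sketched as below. The base-year mean and standard deviation are computed once and applied to every year. The data layout (year mapped to a list of area values) and the use of the population standard deviation are assumptions, since the text specifies neither. The values are hypothetical.

```python
from statistics import mean, pstdev


def time_series_standardise(values_by_year, base_year):
    """Standardise every year using the base year's mean and standard deviation."""
    base = values_by_year[base_year]
    mu, sigma = mean(base), pstdev(base)
    return {year: [(v - mu) / sigma for v in vals] for year, vals in values_by_year.items()}


data = {2015: [1.0, 3.0], 2016: [2.0, 4.0]}
z = time_series_standardise(data, base_year=2015)
# Adding a later year does not change earlier standardised values
z_extended = time_series_standardise({**data, 2017: [5.0, 6.0]}, base_year=2015)
```

In this example, 2015 becomes `[-1.0, 1.0]` and 2016 becomes `[0.0, 2.0]`. The 2016 values stay on the 2015 scale, so the improvement is visible. Adding 2017 leaves both earlier years unchanged.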

Although we have used time-series standardisation for this beta release, and it is our preferred method, there is an alternative method that we propose to use as a comparison in sensitivity analysis when we refine this concept further into the more finalised version of the Health Index. This is time-series minimum-maximum scaling. With this technique, indicators are scaled to a normalised range (0, 1). Often, 0 is given to the minimum value observed and 1 to the maximum. As with standardisation, this method would need to be adapted to allow for temporal comparability, without requiring annual revisions of the back series. In the same fashion, the minimum and maximum are found for a given base year. If the minimum and maximum values in the base year are not the minimum and maximum across all years, the scale created will not lie in the range (0, 1), but will contain values slightly lower and higher than these boundaries.
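The time-series minimum-maximum variant can be sketched in the same way. The hypothetical data show how later years can fall outside the base-year range (0, 1). As before, the function name and data layout are illustrative assumptions.

```python
def time_series_min_max(values_by_year, base_year):
    """Scale every year using the base year's minimum and maximum."""
    base = values_by_year[base_year]
    lo, hi = min(base), max(base)
    return {year: [(v - lo) / (hi - lo) for v in vals] for year, vals in values_by_year.items()}


scaled = time_series_min_max({2015: [10.0, 20.0], 2016: [8.0, 25.0]}, base_year=2015)
```

Here 2015 maps to `[0.0, 1.0]`, while 2016 maps to about `[-0.2, 1.5]`. This also shows why the method depends heavily on just two values.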

There are limitations to the time-series minimum-maximum scaling method. If the variables to be normalised do not follow similar distributions, there can be issues of distortion. Furthermore, the normalisation for each indicator is dependent on just two values: the minimum and maximum. If these values are unreliable or outliers, the normalised distribution will also be distorted (COIN, 2020).

Given the Health Index’s need to compare across time and space without annual revisions, alternative methods of standardisation, minimum-maximum scaling and ranking were not deemed appropriate for this statistic.

Back to table of contents

17. Weighting (COIN Step 6)

Because of the Index’s hierarchical structure, there are multiple levels at which weighting must be applied. Indicators must be weighted within their subdomain, subdomains must be weighted within their domain and the domains must be weighted within the overall Index. Different weighting approaches have been used at these different levels. The approach to weighting is not always in line with our preferred methods for the fuller index, but it is still likely that weighting methods will be different for different levels.

Weighting indicators within subdomains: time-series factor analysis

The fundamental assumption of factor analysis is that there is a latent factor that underpins the variables in a group. This translates to this level of the Health Index: we assume that there is a single unobserved variable that underpins the indicators within each subdomain. The indicators within each subdomain will likely be highly correlated, which could lead to double counting in the index. Factor analysis directly addresses this issue, accounting for the correlation between indicators in their implied weights. Factor analysis also groups indicators into subdomains based on statistical information, and not just theorised concepts, as had been the case thus far in the Index’s development.

As with the normalisation methods, factor analysis cannot be used in its regular form to meet this index’s aims. If the factor analysis were carried out across all observations, the weights would change with each additional year of data. As such, the weights need to be calculated for a set time period, and these weights are held constant until a review date. Sensitivity analysis will be undertaken to ensure that these weights do not alter greatly when they are derived using different time periods.

To conduct factor analysis, the indicators must be standardised before they are weighted and combined. For this purpose we have standardised our indicators to have a mean of 0 and standard deviation of 1 for this step.

We conducted factor analysis on all indicators within each domain in turn, and assessed the most suitable results for grouping into subdomains using our correlation matrices and hypothesised indicator groupings. Where groupings were surprising, we re-ran factor analysis using only specific variables to confirm that the subdomains we are presenting would not split into separate factors (subdomains) if allowed to. For ease of user interpretation, each indicator could only be included in one subdomain, even if it loaded onto multiple factors. Where indicators did not load as our initial hypotheses expected, we critically considered the sources used for those indicators to check they were measuring the intended information, and tested the indicator in different subdomains, and even different domains, where applicable.

Each indicator’s factor loading measures how strongly that indicator is associated with the latent factor (subdomain). For standardised indicators, the squared loading is the proportion of the indicator’s variance that the factor explains. Weights were constructed for each indicator within each subdomain by scaling the factor loadings within that subdomain to sum to one. For example, if a subdomain had two indicators with factor loadings of 0.7 and 0.5 respectively, one indicator would receive a weight of 0.7/1.2 and the other of 0.5/1.2.
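The loading-to-weight step in the example above can be sketched as follows. It assumes the loadings have already been estimated and are positive after directional adjustment. The indicator names and standardised values are hypothetical.

```python
def loading_weights(loadings):
    """Scale a subdomain's factor loadings so that they sum to one."""
    total = sum(loadings.values())
    return {name: loading / total for name, loading in loadings.items()}


def subdomain_score(standardised, weights):
    """Weighted sum of an area's standardised indicator values within one subdomain."""
    return sum(weight * standardised[name] for name, weight in weights.items())


weights = loading_weights({"indicator_a": 0.7, "indicator_b": 0.5})
score = subdomain_score({"indicator_a": 1.0, "indicator_b": -1.0}, weights)
```

The weights come out at about 0.583 and 0.417, matching 0.7/1.2 and 0.5/1.2. The resulting subdomain score is 0.2/1.2, or about 0.167.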

These weights were compared with those produced using a regression method on the latent factor as a sensitivity test. While individual weights could be quite different with this alternative method, the overall subdomain score was highly correlated (0.97) between the two approaches.

Limitations of factor analysis

There are limitations involved with using factor analysis. This method only accounts for the collinearity between indicators and does not derive any measure of their importance (COIN, 2020). Furthermore, it gives lower weight to indicators that are not highly correlated with others, yet a low correlation is often the very reason for including an indicator in an index: it suggests that the indicator measures a different aspect of the whole. The process also involves subjective choices that affect the resultant weights.

Alternative methods

We have found no alternatives to time-series factor analysis that could be used appropriately for this step. Equal and random weighting of indicators within each subdomain will be used for comparisons with factor analysis in our sensitivity testing. Alternative methods that we did not deem suitable include:

  • equal weighting, which does not account for collinearity between indicators and oversimplifies results
  • regression analysis, which relies on an existing dependent variable to already measure the concept we are producing an index for
  • the unobserved components model method, which accounts for indicator variance within weightings, but does not address double-counting between indicators
  • budget allocation process and analytical hierarchy process, which involve asking experts to allocate points to indicators to denote their weight, or rank them; these are resource-intensive because of the sheer number of indicator comparisons required at this level of the index, and less directly informed by statistical process
  • conjoint analysis, which is similar to the budget allocation process but asks participants to rank what would be the end result of the Index (in this case, upper-tier local authorities (UTLAs) based on the Health Index’s definition of health); weights for indicators are then constructed to match the proposed order, which requires participants to have detailed knowledge of geographic areas, so this method is not seen as suitable
  • public opinion, which would produce weights that are representative of the population, but requires detailed response from a large number of users
  • price-based approach, where each indicator is weighted according to its monetary value; it would be a large task in itself to produce these values to then weight

Exclusion of certain indicators

The following indicators were all deemed suitable for inclusion within the Health Index but were removed during factor analysis. They did not fit neatly within factors, which led us to consider their inclusion more critically.

Reasons for exclusion include not measuring the full concept they were intended to; being based on a small subset of the population; and seeming unsuitable to become subdomains on their own, given the approach to subdomain weighting that follows. These sources can be considered further for future versions of the Health Index when both general content and subdomain weighting procedures are revised:

  • prevalence of schizophrenia, bipolar affective disorder and other psychoses
  • alcohol misuse in children and young people
  • drug misuse in young people
  • prevalence of adults with learning disabilities
  • primary school pupil attainment
  • access to further education
  • long-term unemployment
  • working hours
  • social isolation
  • child carers
  • cold homes
  • statutory homelessness
  • ease to walk and cycle

Weighting subdomains within domains: equal weighting

For the purposes of the beta version of the Health Index, all subdomains have equal weighting within their domain. This means if Healthy People as a domain has five subdomains, each subdomain has a weight of 1/5 of the overall domain. If Healthy Lives has seven subdomains, each subdomain has a weight of 1/7 of the overall domain.

This approach is not our preferred method for the fully developed version of the Index. However, it was the most suitable for the beta version, because our preferred methods, the budget allocation process and the analytical hierarchy process, are resource-intensive for participants. It does not make sense to apply these methods to an experimental version of the Index when content is likely to change as a result of consultation, which would require them to be repeated.

At this level of the Index there is less collinearity between variables (subdomains), as they attempt to measure separate concepts. Data-driven approaches, such as factor analysis, would therefore be less effective. A budget allocation process or analytical hierarchy process can be used because there are far fewer individual components to compare: 17 subdomains (for this beta version), rather than 58 indicators. As outlined previously, these approaches would ask a group of experts to assign weights of importance to the different subdomains.

If this method remains the preferred approach following consultation after publication of this beta version of the Index, we propose that our Expert Advisory Group (EAG) will be used as the participants in these methods, using their expertise in public health to assign appropriate weights of importance to the subdomains. It is important in this process that the pool of experts is selected carefully: if the pool does not represent “a wide spectrum of knowledge, experience and concerns” (COIN, 2020), it could be biased. We believe that the EAG provides a representative group of participants.

All methods considered for weighting indicators within subdomains were also considered for this level, and many were deemed unsuitable for reasons similar to those given for the indicator level.

Weighting domains to the overall Health Index score: equal weighting

Equal weighting will be used to weight the three domains. The Health Index’s aim is to offer a broad measure of health and not focus simply on health outcomes, and weighting each of these domains equally would satisfy this. If a participatory approach to subdomain weighting suggests the three domains should not be equally weighted, for example if all of the subdomains of Healthy Places are deemed of the highest importance, this decision will be reviewed.
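The two equal-weighting levels described above can be sketched as nested unweighted means. Subdomain scores are taken as already computed from the loading-based indicator weights. The scores and the split of subdomains across domains are hypothetical.

```python
from statistics import fmean


def domain_score(subdomain_scores):
    """Equal weighting: each of n subdomains contributes 1/n of the domain score."""
    return fmean(subdomain_scores)


def health_index_score(domains):
    """Equal weighting of domains: each domain contributes equally to the overall score."""
    return fmean([domain_score(scores) for scores in domains.values()])


# Hypothetical subdomain scores for one area (5 + 7 + 5 subdomains)
scores = {
    "Healthy People": [0.5, 1.0, -0.5, 0.0, 0.5],
    "Healthy Lives": [0.7] * 7,
    "Healthy Places": [-0.4, 0.4, 0.0, 0.2, -0.2],
}
overall = health_index_score(scores)
```

In this example the domain scores are 0.3, 0.7 and 0.0, so the overall score is about 0.333. A single subdomain in a domain with five subdomains therefore carries more weight (1/5 × 1/3) than one in a domain with seven (1/7 × 1/3).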

Back to table of contents

18. Aggregating indicators (COIN Step 7)

Different aggregation methods allow compensability between indicators to varying degrees. The two most popular methods used are linear and geometric aggregation, though some argue that a completely non-compensatory approach is ideal.

For the beta version of the Index we have used linear aggregation. Linear aggregation involves taking the (weighted) arithmetic mean of indicators to calculate the Index. This is the simplest aggregation method; however, it introduces compensability into the composite index. This means that poor performance in one area can be offset by good performance elsewhere (COIN, 2020). The Health Index should encourage improvements across the broad range of health indicators, so linear aggregation may not be the preferred method for future versions.

During consultation we will investigate geometric aggregation and the Mazziotta-Pareto index. Geometric aggregation refers to taking the (weighted) geometric mean of the indicators. Using geometric aggregation reduces the implied degree of compensability between indicators. This method is more critical of poor performance and offers a greater incentive to improve low-scoring indicators than to continue improving in an area that already scores highly (Organisation for Economic Co-operation and Development, and Joint Research Centre, 2008).
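The difference in compensability can be illustrated by comparing the two means on invented scores. One area has a balanced profile and one has an unbalanced profile, with equal weights. The geometric mean needs strictly positive inputs, so this sketch assumes values already on a positive scale, such as one based at 100.

```python
import math


def weighted_arithmetic(values, weights):
    """Linear aggregation: weighted arithmetic mean."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_geometric(values, weights):
    """Geometric aggregation: weighted geometric mean (values must be > 0)."""
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


balanced, unbalanced, equal = [100.0, 100.0], [80.0, 120.0], [1, 1]
linear = (weighted_arithmetic(balanced, equal), weighted_arithmetic(unbalanced, equal))
geometric = (weighted_geometric(balanced, equal), weighted_geometric(unbalanced, equal))
```

Linear aggregation scores both areas at 100. Geometric aggregation scores the unbalanced area at about 97.98, so the weak indicator is only partly offset by the strong one.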

Mazziotta-Pareto index construction assumes that all indicators are equally important (have the same weight) and are non-substitutable. This is achieved by extending the previously described methods, introducing a penalty for units that have unbalanced performance across indicators – this penalty is a function of the standard deviation and coefficient of variation across the unit’s indicator values. Units that perform highly across all indicators will have the highest composite index value, while those with low and unbalanced scores are disadvantaged. A variation on this method, the Adjusted Mazziotta-Pareto Index allows for spatio-temporal comparisons (Mazziotta and Pareto, 2016).
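A sketch of the unweighted Mazziotta-Pareto index follows, in the form in which it is usually presented. It has not been checked against the Adjusted variant cited above, which uses a different normalisation. Each indicator is rescaled to mean 100 and standard deviation 10 across units. The penalty (standard deviation × coefficient of variation of the unit's rescaled values) is then subtracted from the unit's mean, assuming all indicators have been oriented so that higher is better. The units and values are hypothetical.

```python
from statistics import fmean, pstdev


def mpi_minus(matrix):
    """Unweighted Mazziotta-Pareto index for indicators where higher is better.

    matrix[i][j] is the value of indicator j for unit i.
    """
    columns = list(zip(*matrix))
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    scores = []
    for row in matrix:
        # Rescale each indicator to mean 100 and standard deviation 10 across units
        z = [100 + 10 * (x - m) / s for x, m, s in zip(row, means, sds)]
        m_i = fmean(z)
        s_i = pstdev(z)
        cv_i = s_i / m_i
        # The penalty s_i * cv_i grows with how unbalanced the unit's rescaled values are
        scores.append(m_i - s_i * cv_i)
    return scores


# Hypothetical units: A is balanced, B and C have the same mean but unbalanced profiles
scores = mpi_minus([[2.0, 2.0], [3.0, 1.0], [1.0, 3.0]])
```

Unit A scores 100, while B and C both score 98.5, even though all three have the same mean rescaled value.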

As stated, this method assumes equal weights, which is not applicable for the Health Index. However, we are currently investigating the possibility of adapting this method to allow the inclusion of weights. If possible, then this method will be tested as a future aggregation approach. It may become the preferred method if it is more successful than linear or geometric aggregation in reducing compensability between indicators.

We considered but rejected the following aggregation methods as unsuitable. They have various advantages, but none allows a top-level, national index value to be calculated:

  • exponential transformation and aggregation, as used in the Index of Multiple Deprivation
  • non-compensatory multi-criteria approaches
  • data envelopment analysis (DEA) and benefit of the doubt approach

Back to table of contents

19. Sensitivity analysis (COIN Step 8)

Some sensitivity analysis has been conducted in earlier steps, such as testing different indicators and different methods for calculating weights within factor analysis in Step 6. However, there are several steps where we have not yet tested the alternative methods we have identified. It makes most sense to consider these alternatives after consultation, as changes to earlier methods and data will affect results in later steps.

Where we have already identified alternative methodologies, we will test those on the beta version during consultation to assess their suitability.

Back to table of contents

20. Scaling for use

At this stage, all values are standardised using the 2015 mean and standard deviation and are centred around a score of 0. We aimed to avoid presenting negative values for ease of user interpretation, and in general to be consistent with other indices’ scales where possible.

The Health Index has been scaled to a base of 100 for England, with base year of 2015. Values higher than 100 indicate better health than England in 2015, and values below 100 indicate worse health. The scale is such that a score of 110 represents a score one standard deviation higher than England 2015’s score for that same indicator. In this way comparisons both over time and within a single year are simple to understand.
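The rescaling described above can be written as a one-line mapping. This sketch assumes the input is a score already standardised against the 2015 mean and standard deviation, so that 1 unit is one standard deviation. The optional `england_2015` offset is a hypothetical parameter for the case where England's 2015 standardised score is not exactly 0.

```python
def to_health_index_scale(z, england_2015=0.0):
    """Map a standardised score so that England 2015 is 100 and one standard deviation is 10 points."""
    return 100 + 10 * (z - england_2015)


above = to_health_index_scale(1.0)    # one standard deviation better than England 2015
below = to_health_index_scale(-0.5)   # half a standard deviation worse
```

These examples give 110 and 95 respectively, so every value keeps its meaning as a distance from England in 2015, measured in tenths of a standard deviation.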

Back to table of contents

21. Future developments

Following the public consultation, the Health Index will be reviewed and refined, to produce a more finalised version of the index that will be made available for use. At present, the aim is for that version to be published in the first half of 2021 but this may be subject to change.

Back to table of contents

Contact details for this Methodology

Greg Ceely
health.data@ons.gov.uk
Telephone: +44 (0)207 592 8692