The purpose of this review has been to examine and assess existing, proposed and other potentially effective methods of producing population estimates by ethnic group (PEEGs), for local authority districts of England and Wales, through:
study of documents including a summary prepared for this review, an unpublished review dated June 2012, the comparison with the census published by the Office for National Statistics (ONS) (2013), and PEEGs documents published in 2012 and earlier
consideration of administrative and other relevant data sources, and methods used elsewhere
consideration of analysis, which would help to prioritise and choose the most effective methods for the current decade
This report recommends that:
PEEGs should be resumed after confirming the granularity of output required by users. A plausible required granularity and secondary priorities are provided in this report (section 4.2).
Promising methods should be progressed quickly to a short-list of “beta test” tools and these developed to be tested against the 2011 Census (section 4.5). This test is necessary because none of the methods clearly outshines all the others and each has clear weaknesses.
The most effective method or combination of methods should then be developed for annual production. A review of the promising methods and strategies is provided in this report (sections 5.1 to 5.4).
Since it is expected that no single method will provide the most reliable estimates and that further relevant data will become available during the next decade, a robust strategy should be sought for early implementation. It should be designed so that additional estimates can be integrated when shown to improve on detail or accuracy in some or all sub-populations.
Office for National Statistics (ONS) provided population estimates by ethnic group (PEEGs) for local authority districts (LAD) in England and Wales for mid year in each of 2001 to 2009, as Experimental Statistics.
For these estimates 2001 to 2009, the ONS methodological strategy was to disaggregate the cohort component population accounts for the mid year estimates (MYE) for each local authority, with an ethnic group dimension. Each of births, deaths, flows of migration into and out of each local authority district (LAD) and the special populations of armed forces, prisoners and school boarders, were estimated for males and females and each single year of age, summing across ethnic groups to the corresponding component in the MYE accounts.
The output was published by quinary age group by sex for England and for Wales, and for three broad age groups by sex for each LAD, along with the national totals of births, deaths, net migration and “other changes”. Results for primary care organisation areas (PCOs) were also published, derived by allocation of LAD results to PCOs. Results were published after rounding to the nearest 100 people. The method also provided the detailed age and sex structure of components of change for each group in each LAD, without rounding, which were felt to be insufficiently reliable to be published but could be made available for research projects.
Concerns about the accuracy of Office for National Statistics (ONS) population estimates by ethnic group (PEEGs) led to ONS reviews after the publication in 2011 of the 2009 PEEGs. ONS announced in June 2012 their decision to stop further production, pending evaluation against the outputs from the 2011 Census. The main findings of the concerns and reviews are listed in this section.
The concerns indicate information available to evaluate the PEEGs and therefore potentially useful in designing a future methodology.
The PEEGs showed faster movement of minorities out of the areas where they were the highest proportion of the population, than did either the 2001 Census, or the experience of 1991 to 2001. A summary of dispersal is the change in geographical concentration of minorities within England and Wales, measured by the index of dissimilarity between White groups and the rest of the population. The index decreased between 1991 and 2001 from 0.519 to 0.515, but according to the PEEGs for 2006 had already rapidly decreased in 5 years to 0.429 (Simpson, 2010: 3). The index of dissimilarity for 2011 is 0.494, confirming the PEEGs’ large over-estimate of the dispersal of minorities.
Net migration to each local authority district (LAD) without an ethnic group dimension is different when taken from the census (as used in the PEEGs with its ethnic group dimension) and when taken from patient re-registration (as used in mid year estimates (MYEs)) to which the PEEGs were subsequently controlled. The differences were highly related to LAD ethnic diversity for 2000 to 2001 (Fry, 2010), with unknown but likely impact on the PEEGs. For example, if the most diverse areas’ migration was controlled upwards and the least diverse areas controlled downwards, further tests might show that the result was faster movement out of diverse areas, noted in the previous point.
The PEEGs showed a higher White British population in London than published survey estimates (Travers, 2011).
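The index of dissimilarity quoted in the first concern above can be computed directly from area counts. The following is a minimal sketch, assuming the standard two-group form of the index (half the sum over areas of the absolute difference between each group's share of its own total). The function name and the four-area counts are hypothetical and are not census or PEEG figures.

```python
def index_of_dissimilarity(group_a, group_b):
    """Return half the sum over areas of |share of A - share of B|.

    Each argument holds one count per area, in the same area order.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per area")
    total_a, total_b = sum(group_a), sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a non-zero overall total")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts for four areas (illustrative only).
white_groups = [900, 800, 500, 300]
rest_of_population = [100, 200, 500, 700]
d = index_of_dissimilarity(white_groups, rest_of_population)
print(round(d, 3))
```

The index is 0 when both groups are spread identically across areas and 1 when they share no area, so a falling value over time indicates dispersal.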
ONS provided Quality and Methodology Information for PEEGs (ONS, Feb 2012). Its conclusion begins: “At present the PEEGs are Experimental Statistics and should not be confidently relied on in making major policy decisions. The estimates are likely to provide a reasonable broad estimate of the ethnic group composition of the population of England and Wales”. The report lists empirical results and observations on methodology that support this limited endorsement of the PEEGs.
PEEGs showed an ethnic distribution in 2009 different from the Annual Population Survey, for example, twice the size of the Chinese population (0.8% versus 0.4%).
PEEGs showed a higher percentage of White British children aged 5 to 15 than School Census (81.6% versus 76.9%) and more discrepancies for broad ethnic groups in London than elsewhere.
PEEGs showed more White British and fewer White Other than birth registrations linked to NHS birth notifications in 2008. London was again most discrepant and more so than could be explained by the 10% of records without ethnicity recorded, which were concentrated outside London and not related to ethnic diversity (ONS, 2011).
The PEEGs rely on assumptions about patterns of migration between LADs that are unlikely to hold, with insufficient graduation between LADs or types of LAD.
PEEG assumptions to allocate ethnic group to international migration using the International Passenger Survey information on country of birth could have been framed differently, with impact of over 5% on the final population estimates of African, White: Irish and Other White.
In June 2013, ONS released a comparison of unpublished PEEGs for 2010 with the 2011 Census ethnic group distribution. It confirmed discrepancies that were most evident in the region of London but equally large in other ethnically diverse LADs. The comparison was limited because cross-tabulations of ethnic group with age had not been released from the census and relative confidence intervals round the census estimates of ethnic group were wrongly applied as absolute values.
An unpublished ONS review dated June 2012 and a summary paper provided for this report, proposed alternative methods for future production of PEEGs. The ONS description of alternatives is reproduced at Appendix 1 and these and other potential strategies are discussed in section 5.
Office for National Statistics (ONS) considers the quality dimensions of relevance, timeliness and punctuality, comparability and coherence, accuracy, output quality trade-offs, user needs and perceptions, and accessibility and clarity (ONS, 2012).
For relevance and user needs, the clarification of the purposes of population estimates by ethnic group (PEEGs) would be useful. They are understood for this report to be to help in (a) the identification of social inequalities that government seeks to reduce, and (b) the identification of diversity of demand for services based on culture or tradition, that government seeks to satisfy. These services and policies vary subnationally and are delivered by local as well as central government.
It is assumed in this review that users require PEEGs (a) for local authority district (LAD) areas and (b) which identify ethnic groups more finely than the broad headings of White, Asian and Black. The ONS unpublished review from June 2012 suggested that future estimates would merge White: Irish with White: Other and this aggregation of categories was used for the comparison with the 2011 Census (ONS, 2013). Such a reduction in granularity seems unnecessary and unhelpful.
A high but secondary priority is a broad age structure to address policy areas such as adult care, youth services and employment, for example the age groups 0 to 4, 5 to 15, 16 to 24, 25 to 44, 45 to 64, and 65 and over. Important but lesser priorities are single year of age structure for re-aggregation to users’ needs, disaggregation by sex and smaller geographical units.
It is assumed in this review that users require PEEGs referring to mid-2014 to be produced by the end of 2015, by which time the 2011 Census will be considered out of date, given the considerable annual change in ethnic diversity. Average annual growth of minority populations as a whole was 6% in the 2000s and considerably greater for some groups. Some may consider this timescale too slow.
Limited comparability is an important issue for evaluation of PEEGs. Assessing the “accuracy” of PEEGs by measurement against another source is limited by the known patterns of unreliability in any measurement tool for ethnic group. One must accept that ethnic group will differ significantly when recorded for the same person at different times on the same register and expect larger differences when question layout or categories change, or when the context, mode and purpose of the record-filling changes. The unreliability is greater for all categories other than White British, greater for mixed groups than for “single” ethnic groups and is very high for residual groups titled “Other” in the census classification (Simpson and Akinwale, 2007; Saunders et al., 2013; Simpson et al., 2014).
Coherence should be used to evaluate potential methods. There are two structural aspects of changing ethnic composition, which should be observed in successful methods.
First, there is considerable “ageing in place” of each ethnic group, such that its age structure in later years is predictable from its age structure at earlier years, because the number aged “a” at an earlier year is related to the number aged “a plus t” at a time “t” years later. Since numbers of births and deaths are highly dependent on age structure, not only the future age structure but the growth of each ethnic group is predictable. Migration and mortality do reduce this predictability, but the relationships should be observable broadly.
For example, a projection of 12% growth for Birmingham was accounted for by age momentum, which was particularly responsible for growth in the Indian, Pakistani and Bangladeshi populations (Simpson, 2007: 14 to 15). The proposed methods apart from cohort component estimates suffer from ignoring this relationship. All the methods upset the relationship when they constrain results to totals that have been independently estimated without an ethnic group dimension. Indicators of cohort stability are discussed at section 4.5.8. It may be possible to use cohort stability to improve constraining methods, but we are not aware of existing methods to do this.
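The “ageing in place” relationship described above can be illustrated with a minimal sketch of cohort progression. It assumes counts by single year of age with an open-ended final age group, and it deliberately ignores births, deaths and migration, so it is not a cohort component model. The function name and the counts are hypothetical.

```python
def age_in_place(pop_by_age, years):
    """Shift counts by single year of age forward by `years` years.

    pop_by_age[a] is the count aged a; the last element is an open-ended
    group (for example "90 and over") that accumulates everyone ageing into it.
    Ages 0 to years-1 are returned as None because they depend on births
    since the base year, which this sketch does not estimate.
    """
    n = len(pop_by_age)
    if not 0 <= years < n:
        raise ValueError("years must be between 0 and the number of ages minus 1")
    aged = [None] * years + [0.0] * (n - years)
    for age, count in enumerate(pop_by_age):
        new_age = min(age + years, n - 1)  # cap at the open-ended group
        aged[new_age] += count
    return aged


# Hypothetical group with ages 0, 1, 2, 3 and "4 and over" (illustrative only).
base_year = [10, 12, 14, 16, 20]
two_years_later = age_in_place(base_year, 2)
print(two_years_later)
```

A fuller method would apply survival and migration to each cohort and add births for the youngest ages; the point here is only that the later age structure is largely determined by the earlier one.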
Second, the geographical spreading of immigrants and their descendants from areas in which they have settled has been observed in the UK and other countries over many decades and generations. The scale of this “spreading” or dispersal is well known in Britain and only upset by large student populations or other points of attraction to new streams of immigration. The existence and approximate pace of this structural change to ethnic composition of areas should be reproduced in PEEGs.
The potential of learning from comparison with the 2011 Census has not yet been realised. A further evaluation against the 2011 Census should be a high priority in order to test out the current and alternative methods. Without such an evaluation, it is hard to judge any method as suitable.
Alternative methods should now be developed to a “beta test” stage where it is shown they (a) can be practically implemented and (b) promise potentially accurate updates to the LAD ethnic group distribution going forward from the 2011 Census.
The closest possible implementation of each of these “beta test” methods should be applied to mid-2011 without use of the 2011 Census information.
The evaluation should include age and sex dimensions, for those methods that provide it. This is important in its own right, but also allows insights from the separate analysis of age groups highly dependent on fertility (age 0 to 9), on migration (age 16 to 34) and on mortality (age 65 and over).
The methods should include a benchmark of no change since the 2001 Census.
Methods that depend on the mid year estimate (MYE) will need to be constrained to the 2011 Census estimates without an ethnic group dimension, so that discrepancies due to the MYE are not included. However, it may also be of use to evaluate estimates both with and without constraint to the MYE, when this is possible, as the constraint itself may introduce a bias.
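One simple way to constrain ethnic group estimates to an independently estimated total is proportional scaling. The sketch below assumes that approach purely for illustration; the source does not specify the constraining method used, and the function name and counts are hypothetical.

```python
def constrain_to_total(estimates, control_total):
    """Scale every group's estimate by the same factor so they sum to control_total."""
    current_total = sum(estimates.values())
    if current_total == 0:
        raise ValueError("cannot scale estimates that sum to zero")
    factor = control_total / current_total
    return {group: value * factor for group, value in estimates.items()}


# Hypothetical district: unconstrained estimates sum to 1,000 but the
# control total without an ethnic group dimension is 1,100 (illustrative only).
unconstrained = {"Group A": 800, "Group B": 150, "Group C": 50}
constrained = constrain_to_total(unconstrained, 1100)
print(constrained)
```

Because every group is scaled by the same factor, the ethnic group shares are unchanged. If the shortfall actually belonged to one group only, this form of constraint spreads the correction across all groups, which is one way a constraint can introduce bias.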
Accuracy should be represented by the absolute percentage distance of a PEEG from the 2011 Census estimate. The approach taken in ONS (2013) compared absolute differences between ethnic group distributions, leading inevitably but misleadingly to the conclusion that smaller groups were relatively well estimated.
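The difference between the two accuracy measures can be shown with a small worked example. This is a sketch with hypothetical counts for a single district; the function names are illustrative and the numbers are not PEEG or census results.

```python
def absolute_percentage_error(estimate, census):
    """Distance of an estimate from the census count, as a % of the census count."""
    return abs(estimate - census) / census * 100


def share_point_difference(estimate, census, estimate_total, census_total):
    """Absolute difference between a group's share in the two distributions,
    in percentage points."""
    return abs(estimate / estimate_total - census / census_total) * 100


# Hypothetical district (illustrative counts only).
census = {"White British": 90000, "Pakistani": 9000, "Chinese": 1000}
peeg = {"White British": 90000, "Pakistani": 8000, "Chinese": 2000}
census_total = sum(census.values())
peeg_total = sum(peeg.values())

for group in census:
    ape = absolute_percentage_error(peeg[group], census[group])
    diff = share_point_difference(
        peeg[group], census[group], peeg_total, census_total
    )
    print(f"{group}: {ape:.1f}% error, {diff:.1f} points difference in share")
```

In this made-up case the Chinese estimate is double the census count, yet its share of the distribution differs by the same single percentage point as the Pakistani estimate, which is about 11% too low. Comparing shares therefore makes the smaller group look as well estimated as the larger one.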
A regression analysis will allow the separate impacts on accuracy to be assessed of: methods, ethnic group, age, sex, type of area including its ethnic composition, population change and characteristics such as presence of a university or armed forces. Interactions between these independent variables will indicate if one method appears to have particular strengths or weaknesses for types of population or area. Such an analysis is likely to first transform the accuracy variable to achieve an approximate normal distribution to allow tests of significance (see, for example, Lunn et al., 1998 for a similar analysis without the dimension of ethnic group).
Summary measures in the evaluation should include not only the average accuracy achieved across all LADs, but also:
the geographical spread of each group (for example, its index of dissimilarity with the rest of the population across all LADs)
cohort stability, which can be measured by the mean percentage deviation (MPD) and mean absolute percentage deviation (MAPD) of a group’s current population aged a plus t compared with its population aged a at the previous census t years before, with the mean taken across each age estimated within 0 to 15 and 34 to 59 (that is, before mortality is effective and omitting the years of highest migration). If the MPD is similar to the MAPD, it suggests that cohorts are being affected similarly by migration, as one would expect. If the MPD is much smaller than the MAPD, it suggests that age cohorts are being affected differently, consistent with errors introduced by constraining. The variation in MPD across the age groups would be an alternative measure of stability, with lower variation indicating greater stability.
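The cohort stability measures in the list above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the choice of which ages enter the mean is left to the caller, and the counts are invented.

```python
def cohort_deviations(census_by_age, current_by_age, years, base_ages):
    """Percentage deviation of each cohort: the current count aged a + years
    compared with the census count aged a, for every a in base_ages."""
    deviations = []
    for a in base_ages:
        before = census_by_age[a]
        after = current_by_age[a + years]
        deviations.append((after - before) / before * 100)
    return deviations


def mpd(deviations):
    """Mean percentage deviation: signed deviations can cancel out."""
    return sum(deviations) / len(deviations)


def mapd(deviations):
    """Mean absolute percentage deviation: no cancelling."""
    return sum(abs(d) for d in deviations) / len(deviations)


# Hypothetical group, ten years after the census (illustrative only).
census_2001 = {0: 100, 1: 100, 2: 100, 3: 100}
steady = {10: 110, 11: 108, 12: 112, 13: 110}   # all cohorts grow alike
erratic = {10: 110, 11: 90, 12: 110, 13: 90}    # cohorts move in opposite directions

steady_devs = cohort_deviations(census_2001, steady, 10, [0, 1, 2, 3])
erratic_devs = cohort_deviations(census_2001, erratic, 10, [0, 1, 2, 3])
print(mpd(steady_devs), mapd(steady_devs))
print(mpd(erratic_devs), mapd(erratic_devs))
```

In the steady case the MPD and MAPD coincide, because every cohort changes in the same direction. In the erratic case the MPD is close to zero while the MAPD remains large, the pattern described above as consistent with errors introduced by constraining.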
Alternative methods of disaggregating PEEGs from LADs to smaller areas should be included in the evaluation against the 2011 Census.
The intention of this section is to help identify the most likely “beta test” methods for evaluation against the 2011 Census. A table describing potential methods is followed by commentary on how the methods may be combined in a robust strategy for population estimates by ethnic group (PEEGs). A further table provides specific comments on methods and data sources.
The following table comments on proposed methods. It is assumed that each method’s results will also be considered after constraint to the current mid year estimate (MYE).
Proposed potential methods
|Methodological approach||Commentary||Potential for imminent usage (beta test)|
|The previous census without adjustment||Minimal resources: any alternative method will use considerably more resources and thus require proof of its improved quality compared with this approach.||Straightforward|
|Demographic cohort progression||A simple ageing in place since the previous census, at single year of age, respects the momentum of age structure.||Straightforward|
|Demographic cohort component modelling||(a) the current method as it is, (b) an elaboration by implementing suggested improvements, (c) a simplified application of ageing, fertility, mortality, UK and overseas migration (for example, the Hamilton-Perry approach of cohort change ratios from 2001 to 2011 to include all mortality and migration).||Work would be required to bring the current estimates to 2011 and to design improvements or simplifications.|
|Direct estimates from Annual Population Survey (APS) data||APS data may be pooled over several years, with allowance for communal establishments.||Straightforward, but only for regions and very large conurbations.|
|Direct estimation from administrative data||As data become available from administrative sources with ethnic group, or through name analysis.||Currently only Schools Census and birth notifications could be used as proxies for young population and indicators of total population.|
|Small area modelling with survey and auxiliary data (raking and contingency tables, binomial or multinomial regression)||Proxy data to estimate ethnic group percentage or counts. Modelling (a) uses sampling variability by "borrowing strength" from similar areas, (b) uses relationships between local auxiliary data and ethnic group found from the survey or from other sources, and (c) ensures consistency with known margins. The known margins are subtotals for any combination of age, sex, ethnic group and geographical units, derived from other methods or from MYEs.||Initial models could be implemented, for example, with APS, Schools Census, birth notifications and cohort progression estimates. In the long-term, further administrative datasets could be used including from primary care and the results from name analysis of administrative data.|
|Source: University of Manchester|
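The raking listed in the table above under small area modelling is commonly implemented as iterative proportional fitting, in which a seed table is scaled alternately to known row and column margins. The sketch below is a minimal two-dimensional version under that assumption; the function name, the seed counts and the margins are hypothetical, and a production method would add a convergence check and further dimensions.

```python
def rake(seed, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting of a two-way table to known margins.

    seed: list of rows (for example ethnic group by area) from a survey or
    earlier census; row_targets and col_targets are the known margins and
    must have the same grand total.
    """
    table = [list(row) for row in seed]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [cell * target / row_sum for cell in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
    return table


# Hypothetical 2 x 2 seed (rows: two groups, columns: two areas).
seed = [[40.0, 10.0], [20.0, 30.0]]
fitted = rake(seed, row_targets=[60.0, 60.0], col_targets=[70.0, 50.0])
print(fitted)
```

One property worth noting is that a cell that is zero in the seed stays zero after raking, whatever the margins say. This is relevant wherever small populations were absent from the previous census or a survey but may now be present.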
A successful strategy is likely to combine more than one methodological approach. These should be evaluated against the 2011 Census at the same time as each method is assessed individually. The following three types of combining methods are likely to be of practical importance for the PEEGs.
An evaluation will identify whether two or more methods’ errors have low or negative correlation, an indication that their average is likely to be a more accurate estimate than any method alone. In such a (possibly weighted) average, the aim is that each method counterbalances the major errors of the other(s). Evaluation against the 2011 Census will confirm whether feasible combinations outperform individual methods.
Methods that work well nationally or for regions but not for local authority districts (LADs), may be subject to “hierarchical constraining”. For example, the APS might be used for a national estimate, to constrain regional estimates based on a combination of cohort progression and the APS, which in turn could constrain LAD estimates based on modelled administrative data.
Methods may be appropriate only for some sub-populations. If the principle can be accepted that estimates should be the best possible in all cases, a method may be supplemented in some sub-populations (by area, group or age), so long as the decision to do so is triggered by evidence. This may be the case when administrative data is missing or of poor quality in some areas. It may also be appropriate where two datasets have inconsistent categories recorded for ethnic group (for example, from name analysis), suggesting a different method should be used for some ethnic groups.
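The first type of combination, averaging methods whose errors have low or negative correlation, can be illustrated with a short sketch. The method labels, function names and all counts are hypothetical; in practice the errors would come from the evaluation against the 2011 Census.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of errors."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_abs_error(estimates, truth):
    """Average absolute distance of the estimates from the benchmark counts."""
    return mean(abs(e - t) for e, t in zip(estimates, truth))


def weighted_average(est_a, est_b, weight_a=0.5):
    """Combine two sets of estimates, giving weight_a to the first."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(est_a, est_b)]


# Hypothetical counts for one group in four districts (illustrative only).
census = [1000, 2000, 500, 1500]
method_a = [1100, 1900, 560, 1400]
method_b = [920, 2080, 460, 1580]

errors_a = [e - t for e, t in zip(method_a, census)]
errors_b = [e - t for e, t in zip(method_b, census)]
combined = weighted_average(method_a, method_b)

print(pearson(errors_a, errors_b))
print(mean_abs_error(method_a, census), mean_abs_error(method_b, census))
print(mean_abs_error(combined, census))
```

With these invented figures the two methods err in opposite directions, so their simple average is much closer to the benchmark than either method alone. With positively correlated errors the gain would be smaller or absent, which is why the correlation needs to be checked empirically.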
The following table is intended to help reduce the promising avenues of research when developing the potential methods into practical implementation. It lists concerns and suggestions about methods and data sources, arising from Office for National Statistics (ONS) documents or during this review. It begins with aspects of specific methods and then lists concerns that apply to more than one method.
Potential research methods with comments
|Method, component, or data source and concern||Comment|
|Aspects of specific methods|
|Cohort component model: relies heavily on the previous census||The census is the most detailed source of relationships between age, sex, ethnic group and geography, and should be used where its relevant patterns are plausibly stable. As further updated data sources become available, use of the census can be reduced. This concern may have been over-stated in previous reports, in that some questioned census patterns were not then shown to be unstable. Other census patterns were certainly not well used (see internal migration mismatch section of this table).|
|Cohort component model: internal migration mismatch between census and MYE||A mismatch noted by ONS in 2010 between LAD total net internal migration from the census and the equivalent used in the MYE, was highly correlated to ethnic group diversity. Should this also be the case for 2010 to 2011 when census migration is released later in 2014, its impact on the estimation of that component should be investigated.|
|The cohort component model is complicated||The accumulation of components each with uncertain estimates makes evaluation of the model results difficult and therefore improvements are difficult to justify.|
|Cohort component model: lessons from other producers of PEEG in the UK, USA and Canada||Although these other PEEG are projections from the previous census, they should not be dismissed on that account. Projection and estimation methods have many common elements and those for the UK offer alternative elaboration of assumptions for a cohort component approach for LAD areas. See, for example, Wohland et al. (2010) and Rees et al. (2013), who are seeking funds from the Economic and Social Research Council (ESRC) Secondary Data Analysis Initiative for an update with 2011 Census information, with the support of ONS.|
|Cohort component: each ethnic group’s internal migration has a specific geographical pattern||The assumption in the current method that a group’s age-sex propensity to migrate out of a district is the same for each district, albeit adjusted in its net impact by “attraction factors”, is too crude. It may be responsible for the over-spreading of minorities noted in the current PEEGs. An alternative assumption was implemented by the Leeds Understanding Population Trends and Processes (UPTAP) projections, after estimating the propensity for each group in each district, which showed considerable variation. For example Indian outmigration varied from 0.02 to 0.03 for Leicester, Wolverhampton and Slough to 0.20 to 0.30 for areas with few Indian residents (Rees, 2014).|
|Cohort component model: sampling error from use of International Passenger Survey (IPS) for international migration||ONS (2012, Feb: Table 7) states that the 95% confidence interval around the IPS estimates of international migration used for the 2009 PEEG represents more than 1% of the total estimate of every ethnic group other than White: British, and 3% to 7% for eight of them. This is a substantial uncertainty from just one source.|
|Cohort component approach: other issues||Concern has been raised about other assumptions necessary for PEEG based on the cohort component approach, for which there is no supporting evidence. These may not significantly affect the results relative to the other issues raised previously, because they affect few people, but include: under-estimation of births due to mothers immigrating during a year; uncertainty in fertility estimates due to children not in their mothers’ households; propensities to move to Scotland and Wales assumed the same for each ethnic group; application of the single year’s experience in census data for allocation of international migration to specific LADs; the use of country of birth of asylum seekers for England and Wales as a whole to allocate ethnicity to asylum seekers in every LAD; the assumption of equal mortality rates by age for each group within a LAD, for which alternatives are possible (for example, Wohland et al., 2010).|
|Administrative records: reliability of ethnic group response on different records||The availability of ethnic group or proxies for ethnic group on administrative records will be of particular importance for improving sub-regional population estimates. It would be useful to understand the relationship between ethnic group or proxies for ethnic group on administrative records and that recorded by the census, through matching studies with primary and secondary care health records, birth notifications, School Census, HESA records and other potential datasets.|
|Administrative records: name analysis||Ethnic group proxies from name analysis seem a promising resource for small area modelling. Name analysis is of varying validity for each ethnic group category used in the census (more accurate for Asian, African and continental European groups than for Caribbean and Irish, for example). It provides complete analysis for datasets that have no or incomplete record of ethnic group, such as patient records. Name analysis and evaluation of its ability to indicate ethnic group have developed in recent years (Petersen et al., 2011; Mateos et al., 2011). Paul Longley et al. are seeking funds from the ESRC Secondary Data Analysis Initiative to develop methods that ONS can test against the 2011 individual records, with ONS support. This approach could be extended to other datasets.|
|Small area modelling: methods for multiple categories are not yet developed for practical use||Alternatives of multinomial and a series of binomial models will need to be considered. An evaluation against the 2011 Census must establish the potential accuracy and inaccuracy of these methods.|
|Concerns that apply to more than one method|
|Use of proxy information to estimate ethnicity from its relationship with nationality or country of birth||The relationship is derived from one dataset, usually from the previous census and applied to another dataset, which does not have ethnicity but does have country of birth or nationality. If that relationship is not accurate for the second dataset or if the relationship changes over time, biased estimation will occur. Used in the current cohort component method to allocate ethnic group to flows of international migration from the IPS, it may also be considered for other methods using other data sources. ONS (2012: Table 7) showed that the relationship from the census could be estimated plausibly either from residents or from immigrants, and that the impact was very significant after 8 years (creating a change of more than 5% of the final population estimate for three ethnic groups).|
|Special populations: armed forces, prisoners and, more generally, residents not in households||In the cohort component method, the distribution of ethnic group for special populations should be updated using the population age-sex-ethnic group distribution of the most recent estimate. In methods based on household surveys, the distribution of ethnic group for non-household populations should similarly respond to the changing age-sex-ethnic group distribution of the population as a whole.|
|Individuals change their identity over time, which will affect the ethnic group populations||Although there are individual changes that have an impact on population estimates, the net impact is thought to be small compared with the impact of changes in the census question and not clearly related to age (Simpson et al., 2014). Individual unreliability creates an inherent unreliability in PEEGs, but it is highly unlikely that it can be practically modelled in any way that would improve the PEEGs.|
|Reliance on MYEs||It is possible that constraint of PEEGs to the MYEs for the year would bias the ethnic group estimates. For example, if the constraint had more impact in areas of greater diversity, or the other way around, then it would induce a change in the national ethnic composition. If the impact of the constraint was to correct for mis-estimation of one ethnic group, then that mis-estimation will be spread to all groups, inducing more error than it corrects. The impact of constraining to the MYE can be estimated as part of the evaluation against the 2011 Census.|
|Treatment of zero population||Because of the increase in minority population size nationally and its faster increase in areas of low minority population, small populations should not be assumed to remain small or to be unimportant to policy analysis. For many age-sex-ethnic group combinations, there will be more zero populations in the previous census, in survey estimates and in past administrative datasets, than in current population counts. Methods should be adapted as necessary to ensure that estimation is not biased due to zero counts in data sources.|
|Allocation of estimates from LADs to smaller areas||Currently LAD estimates are shared to primary care organisation areas using the previous census distribution of ethnic-age-sex population. An improved method of providing small area PEEG would use School Census or other administrative data to reflect changes in distribution since the census.|
|Source: University of Manchester|
Fry, R (2010). Internal migration comparison (PEEGs compared against census). Email from Rob Fry, ONS, to Ludi Simpson. 13 January 2010.
Lunn, D. J., Simpson, S. N., Diamond, I. and Middleton, E. (1998). The accuracy of age-specific population estimates for small areas in Britain. Population Studies, 52, 327 to 344.
Mateos, P., Longley, P. and O’Sullivan, D. (2011). Ethnicity and Population Structure in Personal Naming Networks. PLoS ONE, 6(9): e22943.
Mathur, R., Grundy, E. and Smeeth, L. (2013). Availability and use of UK based ethnicity data for health research. NCRM Working Paper 01/13. http://eprints.ncrm.ac.uk/3040/1/Mathur-_Availability_and_use_of_UK_based_ethnicity_data_for_health_res_1.pdf
ONS (2011). Quality of ethnicity and gestation data subnationally for births and infant deaths in England and Wales, 2005-2008. Statistical Bulletin, 13 September. http://www.ons.gov.uk/ons/dcp171778_232681.pdf
ONS (2012). Population Estimates by Ethnic Group: Quality and Methodology Information. 6 February. http://www.ons.gov.uk/ons/guide-method/method-quality/quality/quality-information/social-statistics/summary-quality-report-for-population-estmates-by-ethnic-group.pdf
ONS (2013). Comparison of mid-2010 population estimates by ethnic group against the 2011 Census. 25 July. http://www.ons.gov.uk/ons/guide-method/method-quality/specific/population-and-migration/pop-ests/population-estimates-by-ethnic-group/comparison-of-pop-estimates-by-ethnic-group-against-2011-census-estimates.pdf
Rees, P (2014) Personal communication including the file 'Asian Indian 2001 Internal Mig V3.xlsx'.
Rees, Philip, Pia Wohland and Paul Norman (2013) Using 2011 Census data to evaluate and update ethnic group projections, Presentation at the Census Research User Conference, Friday 27 September 2013, Birkbeck College, London.
Petersen, J., Longley, P., Gibin, M., Mateos, P. and Atkinson, P. (2011). Names-based classification of accident and emergency department users. Health and Place, 17: 1162 to 1169.
Saunders, C. L., Abel, G. A., El Turabi, A., et al. (2013). Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: evidence from the English Cancer Patient Experience survey. British Medical Journal Open, 2013.
Simpson, L. (2007). Population forecasts for Birmingham, with an ethnic group dimension. Birmingham City Council, Birmingham. Reproduced as CCSR Working Paper 2007-12, University of Manchester. http://hummedia.manchester.ac.uk/institutes/cmist/archive-publications/working-papers/2007/2007-12-population-forecasts-for-birmingham.pdf
Simpson, L. (2010). ONS experimental population estimates with ethnic group dimension (PEEG): does their UK internal migration reflect evidence from the 2001 Census? Note to Rob Fry, Office for National Statistics. Ludi Simpson, University of Manchester, 4 January 2010.
Simpson, L. and Akinwale, B. (2007). Quantifying stability and change in ethnic group. Journal of Official Statistics 23, 185 to 208.
Simpson, L., Jivraj, S. and Warren, J. (2014). The stability of ethnic group and religion in the Censuses of England and Wales 2001-2011. CoDE Working Paper, University of Manchester.
Travers, T (2011). Correspondence between Tony Travers of London School of Economics and ONS after the publication of 2009 PEEGs.
Wohland, P., Rees, P., Norman, P., Boden, P. and Jasinska, M. (2010). Ethnic Population Projections for the UK and Local Areas, 2001-2051, Working Paper 10/2. School of Geography, University of Leeds.
1 Apply census distributions directly to the mid-year estimates.
Simple to apply and understand.
Less prone to error in production.
Heavy reliance on census.
Reliability drops over time since the census.
Comparison against population estimates by ethnic group (PEEGs) shows no real improvement in the estimates using this approach.
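As a minimal sketch of option 1, the census share of each ethnic group in a local authority can be scaled to that authority's mid-year estimate. The function name, group labels and counts below are illustrative assumptions, not ONS code or figures.

```python
def apply_census_shares(census_counts, mye_total):
    """Scale census ethnic-group shares to a mid-year estimate (MYE) total.

    census_counts: dict of ethnic group -> census count for one local authority
    mye_total: the mid-year population estimate for the same authority
    """
    census_total = sum(census_counts.values())
    if census_total <= 0:
        raise ValueError("census counts must sum to a positive number")
    # Each group keeps its census share; only the overall total is updated.
    return {group: mye_total * count / census_total
            for group, count in census_counts.items()}


# Hypothetical local authority: census total 10,000, later MYE 11,000.
census_counts = {"White": 8000, "Asian": 1500, "Black": 500}
estimates = apply_census_shares(census_counts, 11000)
```

Because the shares are fixed at their census values, any change in ethnic composition after the census is missed, which is why reliability drops over time.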
2 Use a combination of social survey sources (Annual Population Survey or Integrated Household Survey)
Sample sizes are reliable at Government Region level.
Reliability can be improved by merging 3 or 5 years’ data.
The survey ethnicity question is harmonised with 2011 Census ethnicity.
Sampling error and non-response create bias.
Despite the large sample sizes, estimates are not typically reliable at local authority level.
Does not cover the population living in communal establishments.
3 Improve the current PEEG methodology
Components of change could be improved, for example by using births data for fertility rates.
Administrative or survey data could be applied to allocate ethnicity to people born outside the UK.
Revisions could be made to previous years’ estimates to allow back series comparisons.
Although the estimates would be enhanced, they may also draw criticism for their complexity and heavy reliance on census.
4 Hierarchical constraining
Figure 1 summarises the proposed hierarchical constraining methodology. The intention is for a simplified alternative methodology based on social survey and census data in the short-term, with the later addition of administrative sources as these become available and are considered adequately robust.
Flexibility to incorporate new administrative or survey sources and cope with ethnicity or geography reclassifications.
Combines census, survey and administrative sources and so overcomes over-reliance on any one of these.
More likely to produce accurate estimates for areas with large non-White populations such as London and Birmingham.
In the short-term, there is still a reliance on census for local authority-level estimates.
Possibly less accurate for areas with small non-White populations.
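Figure 1 is not reproduced here, so the following is only one plausible reading of the constraining step, not the proposed ONS method. It uses iterative proportional fitting (raking): a census-based seed table of local authorities by ethnic group is adjusted until rows match local authority mid-year estimates and columns match survey-based regional totals by ethnic group. All names and numbers are hypothetical.

```python
def rake(seed, row_totals, col_totals, iterations=100, tol=1e-9):
    """Iterative proportional fitting of a 2-D table to row and column totals.

    seed: list of rows (local authorities), each a list of counts by ethnic group
    row_totals: target total per local authority (for example, the MYE)
    col_totals: target total per ethnic group (for example, a regional survey estimate)
    The two sets of targets are assumed to sum to the same overall total.
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so the local authority total matches its target.
        for i, target in enumerate(row_totals):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [value * target / row_sum for value in table[i]]
        # Scale each column so the ethnic group total matches its target,
        # recording how far the columns were from target before scaling.
        max_gap = 0.0
        for j, target in enumerate(col_totals):
            col_sum = sum(row[j] for row in table)
            max_gap = max(max_gap, abs(col_sum - target))
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        if max_gap < tol:
            break
    return table


# Hypothetical region with two local authorities and two ethnic groups.
seed = [[900.0, 100.0], [600.0, 400.0]]   # census-based starting table
row_totals = [1100.0, 1000.0]             # local authority MYEs
col_totals = [1550.0, 550.0]              # regional totals by ethnic group
fitted = rake(seed, row_totals, col_totals)
```

The fitted table keeps the census pattern of association between area and group while agreeing with both sets of more recent totals, which illustrates why a short-term reliance on census remains at local authority level.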
5 Use small area estimation
Small area estimation may provide an alternative framework for combining survey, administrative and census data to improve the precision of population estimates by ethnic group. Robust estimates are made directly from the Annual Population Survey at regional level but the sample data are insufficient to provide direct estimates at local authority level.
A model-based approach may provide robust estimates if auxiliary information available in administrative data (such as the School Census or the personal demographic spine) is sufficiently related to the variable of interest. The standard approach uses regression models to estimate the small area characteristics of interest and incorporates random area effects to account for between area variations beyond that explained by the model covariates. The feasibility of this approach would depend on the existence of suitable methods for estimating variables with multiple categories.
Breaks the reliance on census so the estimates will capture changes over time more reliably.
Small area estimation can include direct and synthetic estimates, using, for example, direct estimates where social survey data are adequately robust (for example, London or Birmingham) and drawing strength from auxiliary data through synthetic estimation for areas with little ethnic mix or population turnover.
Can incorporate new data sources as they become available.
Provides a formal framework for combining information from different data sources and involves less complex data manipulation than the current method.
The calculation of variance for these estimates will be straightforward.
The method for estimating variables with multiple categories is still in development.
The method relies on the availability of auxiliary data with a strong relationship to the variable of interest.
This methodology is less intuitive to communicate to stakeholders.
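The combination of direct and synthetic estimates described under option 5 can be sketched with a shrinkage weight of the kind used in area-level small area models. This is an illustration for a single ethnic group proportion only, since the report notes that methods for multiple categories are still in development. The function name, the variances and the proportions are assumptions, and the between-area model variance is taken as known rather than estimated.

```python
def composite_estimate(direct, direct_var, synthetic, model_var):
    """Combine a direct survey estimate with a model-based synthetic estimate.

    direct: direct survey estimate for the area (for example, a proportion)
    direct_var: sampling variance of the direct estimate
    synthetic: regression-based prediction from auxiliary data
    model_var: assumed between-area variance not explained by the covariates
    """
    if direct_var < 0 or model_var < 0 or direct_var + model_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # Weight on the direct estimate grows as its sampling variance shrinks.
    gamma = model_var / (model_var + direct_var)
    return gamma * direct + (1 - gamma) * synthetic


# Hypothetical area with a large survey sample: the direct estimate dominates.
large_sample = composite_estimate(0.30, 0.0001, 0.25, 0.0004)
# Hypothetical area with a small survey sample: the synthetic estimate dominates.
small_sample = composite_estimate(0.30, 0.0036, 0.25, 0.0004)
```

With these made-up inputs the weight on the direct estimate is 0.8 in the first area and 0.1 in the second, so the combined estimate leans on survey data where it is precise and on auxiliary data elsewhere.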
Contact details for this Article
Telephone: +44 (0) 1329 444661