## 1. Introduction

This paper is designed to assist users of the Office for National Statistics’s (ONS’s) social survey data in calculating standard errors.

Standard errors are a widely used measure of the precision of survey estimates. They describe sampling variability – the variability in estimates caused by the fact that we have taken a sample of a population rather than a census. They are useful in their own right, as a measure of statistical accuracy, and can also be used to calculate confidence intervals and coefficients of variation.

Social survey standard errors are typically influenced by a number of factors:

• the survey sample size – a larger sample size will reduce standard errors

• the variability in the population of the characteristic of interest – when measuring a more variable characteristic, standard errors will be larger

• the survey sample design – for example, any stratification or clustering used

• the estimation method used

Various options exist for estimating standard errors. The main body of this paper contains 3 sections:

• Section 2 describes how to calculate standard errors assuming a simple random sample – accounting for only the survey sample size and the variability in the population of the characteristic of interest – and how to use these standard errors along with published design factors to approximate “complex” standard errors

• Section 3 begins with a brief description of common social survey design features and their usual impact on accuracy, and then describes how to use statistical software to calculate standard errors that account for the survey sample design in addition to the sample size and the variability in the population

• Section 4 describes methods used at ONS to calculate standard errors accounting for the estimation method used in addition to all other factors

## 2. Approximating standard errors using design factors

### 2.1 Standard errors assuming a simple random sample

Standard errors “assuming a simple random sample” account for the survey sample size and the variability in the population, but not the survey sample design or the estimation method used. It can sometimes be appropriate to publish these simple random sample standard errors in their own right, but when calculating statistics based on complex surveys, it is normally more appropriate to use simple random sample standard errors alongside design factors to approximate the “complex” or “true” standard error, as described in section 2.3.

We do not attempt to present a comprehensive set of methods for calculating standard errors assuming a simple random sample in this paper, as the methods for some estimates (particularly non-linear estimates) are quite complex. We do briefly discuss standard errors assuming a simple random sample for proportions and means, as these tend to be more straightforward.

In the case of proportions, standard errors assuming a simple random sample can in some cases be calculated “by hand” using the standard formula:

SE(SRS) = √(p(1 − p) / n)

where p is a proportion and n is the number of cases that p is based on. For example, when calculating the proportion of adults aged 16 and over who are employed, p would be the proportion and n would be the number of respondents aged 16 and over. An example of this calculation is given in section 8.3 of volume 1 of the LFS user guide1.
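As a minimal sketch, the formula above can be applied as follows (the employment rate and sample size here are invented figures, and Python is used purely for illustration):

```python
import math

def se_srs_proportion(p: float, n: int) -> float:
    """Standard error of a proportion, assuming a simple random sample."""
    return math.sqrt(p * (1 - p) / n)

# Invented example: an employment rate of 60% based on 10,000 respondents.
se = se_srs_proportion(0.60, 10_000)
print(round(se, 4))  # 0.0049
```

The same calculation can be reproduced in any statistical package.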

Standard errors assuming a simple random sample for means are slightly more complex to calculate, as they require an estimate of the population variance s² or standard deviation s. It is important to distinguish the population standard “deviation” from the standard “error” – the standard deviation is a measure of the observed variability of the variable of interest, while the standard error is a measure of the theoretical variability of a survey estimate. The standard deviation refers to the data, while the standard error refers to the estimate – in technical terms, the standard error is the standard deviation of the sampling distribution.

Standard formulae utilising the population variance s² or population standard deviation s can in some cases be used to calculate standard errors assuming a simple random sample. For example, the standard error assuming simple random sampling of a mean is:

SE(SRS) = s / √n

All statistical software will have functionality to calculate population variances, standard deviations and standard errors assuming a simple random sample. Again, care should be taken in establishing whether the output is a standard deviation or a standard error.
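The deviation/error distinction can be illustrated with a short sketch (the data are invented, and Python's standard library stands in for whichever statistical package is being used):

```python
import math
import statistics

# Invented sample of a continuous variable (e.g. weekly hours worked).
hours = [35, 40, 38, 42, 37, 40, 36, 44, 39, 41]

s = statistics.stdev(hours)    # standard deviation: variability in the data
n = len(hours)
se_mean = s / math.sqrt(n)     # standard error of the mean: variability of the estimate

print(round(s, 3), round(se_mean, 3))
```

The standard error is always smaller than the standard deviation here, shrinking with the square root of the sample size.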

### 2.2 Domain estimates

“Domain estimates” will be referred to throughout this document. We use this term to refer to statistics that are calculated for subgroups of the surveyed population. More precisely, domain estimates are estimates produced for a particular subgroup where the size of that subgroup is not known.

For example, for a survey of adults only, the “proportion of adults who smoke” is not a domain estimate, since this estimate is calculated using the entire surveyed population. In contrast, the “proportion of employed adults who smoke” is a domain estimate, since it refers to employed individuals – and the number of employed individuals is not “known” – it is itself a survey estimate.

Standard errors for domain estimates will be larger than standard errors for estimates covering the entire surveyed population. If statistical software is used to calculate standard errors assuming a simple random sample, “domain” commands should be used to account for this.

### 2.3 Design factors and design effect

In many cases, a design factor, or “DEFT”, is available in a survey’s technical manual or user guide. Appendix 1 contains DEFTs for a number of main Labour Force Survey (LFS) estimates. The DEFT is defined as:

DEFT = SE / SE(SRS)

that is, the DEFT is the ratio of the “complex” or “true” standard error to the standard error calculated assuming a simple random sample.

The DEFT can therefore be used with SE(SRS) to approximate a standard error:

SE = DEFT × SE(SRS)

The design “factor” or DEFT should not be mistaken for the design “effect” or DEFF – the DEFF is the ratio of the variances, not of the standard errors, and the DEFT is therefore the square root of the DEFF.
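These relationships can be sketched in a few lines (the SE and DEFT values below are invented, not taken from any survey):

```python
def complex_se(se_srs: float, deft: float) -> float:
    """Approximate the 'true' standard error from a published design factor."""
    return deft * se_srs

# Invented values: SE assuming a simple random sample of 0.005, published DEFT of 1.2.
se = complex_se(0.005, 1.2)   # approximated standard error
deff = 1.2 ** 2               # the design EFFECT is the square of the DEFT
print(se, deff)
```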

Utilising standard errors assuming a simple random sample alongside DEFTs can be particularly useful where DEFTs from a similar estimate are available – for example, DEFTs for the same estimate from a previous period, or DEFTs for a similar variable in the same period.

When selecting a DEFT from a similar estimate in order to approximate a standard error, particular care should be taken with estimates where the clustering in the survey design may have a particularly large impact. For example, where the survey is clustered at the household level, care should be taken when using DEFTs to calculate standard errors for variables that are very homogeneous within households.

This can be seen in Appendix 1 by comparing DEFTs by ethnicity (Table 6) with the DEFTs in other tables. The DEFTs by ethnicity are considerably larger than most other DEFTs, mostly because ethnicity is highly homogeneous within households and the Labour Force Survey, which has been used to calculate the DEFTs, is clustered at the household level. It would therefore be inappropriate, for example, to utilise a DEFT for the overall employment rate to estimate an ethnicity-specific employment rate. For more details on clustering and when it has a large impact, see section 3.

### Notes for Approximating standard errors using design factors

1. Available on the “Labour Force Survey user guidance” pages on the ONS website.

## 3. Accounting for survey design using statistical software

### 3.1 Social survey design features

Section 2.1 described the calculation of standard errors “assuming a simple random sample” – that is, standard errors ignoring the sample design. In reality, almost all social surveys incorporate a variety of “complex” design features such as stratification, clustering, multi-stage sampling or systematic sampling, all of which will usually have an impact on standard errors. This section briefly outlines common features of social survey sample designs and their usual impact on standard errors, and sections 3.2 to 3.5 describe how these survey design features may be accounted for by calculating standard errors using statistical software.

#### 3.1.1 Stratification

Stratified sampling involves dividing the population into non-overlapping strata and sampling units (or clusters) independently in every stratum. If units within a stratum are similar to each other (“within-strata homogeneity”) then stratification will reduce standard errors. For example, if a survey is stratified by region, the number sampled in each region is “fixed” and not subject to random variability. This will reduce sampling variability to the extent that individuals within a region are similar to each other, because it would be impossible to randomly sample more or fewer cases than expected in any given region – some of the “randomness” in the random sampling disappears.

Some social survey samples, in addition to being “explicitly” stratified in the fashion described in the previous paragraph, are also selected systematically – ordering the sampling frame by some characteristic, selecting a random start-point and sampling units at a fixed interval. This effectively acts as “implicit” stratification. For example, if the sampling frame is ordered by postcode, a good geographic spread of the sample is guaranteed – it would usually be impossible, for example, to select multiple households very close to each other. This will reduce standard errors further.
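A systematic selection of this kind can be sketched in a few lines (a toy illustration only, not ONS's actual sample selection code):

```python
import random

def systematic_sample(frame, n, seed=1):
    """Random start-point, then selection at a fixed interval down the ordered frame."""
    k = len(frame) // n                       # the sampling interval
    start = random.Random(seed).randrange(k)  # random start-point in [0, k)
    return frame[start::k][:n]

# If the frame is ordered by (say) postcode, a selection like this guarantees
# a good spread across the list - the "implicit" stratification described above.
sample = systematic_sample(list(range(100)), 10)
print(sample)
```

Whatever the random start-point, the selected units are always a fixed interval apart, so they can never bunch together at one end of the ordered frame.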

#### 3.1.2 Clustering

Cluster sampling involves dividing the population into non-overlapping clusters and taking a sample of clusters followed by a sample of the individuals within the clusters. This will typically result in a cheaper data collection, but will increase standard errors to the extent that individuals within a cluster are similar to each other (“within-cluster homogeneity”). Many social surveys utilise small geographic areas as their clusters, while others are clustered only at the household level (that is, a sample of households is drawn and all eligible individuals within a household are interviewed). In both cases, clustering will increase standard errors. A household survey may be completely unclustered if it utilises no geographic clustering and samples only a single adult per household.
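The size of the clustering effect is often summarised using Kish's textbook approximation DEFF ≈ 1 + (b − 1)ρ, where b is the average cluster size and ρ is the within-cluster homogeneity (intra-cluster correlation). This is a general approximation, not an ONS-specific formula; a sketch:

```python
def deff_clustering(avg_cluster_size, rho):
    """Kish approximation to the design effect from clustering: 1 + (b - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * rho

# A household "cluster" of two adults: a highly homogeneous variable (such as
# ethnicity) nearly doubles the variance; a weakly homogeneous one barely inflates it.
print(deff_clustering(2, 0.9))   # 1.9
print(deff_clustering(2, 0.05))  # 1.05
```

This is why household-level clustering matters so much for variables like ethnicity and so little for variables that vary freely within households.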

#### 3.1.3 Weight variability

Variability in the survey weights will, all other things being equal, increase standard errors. This variability may be due to variability in the selection probabilities, adjustments to the weights to reduce non-response bias, or other adjustments to the weights. The methods discussed in this section will account for this increase to standard errors due to weight variability, but will not account for the reduction in standard errors caused by the estimation method, as discussed in section 4.
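The inflation from weight variability is often approximated by Kish's factor n·Σw² / (Σw)² – again a general textbook approximation rather than an ONS-specific method. A sketch with invented weights:

```python
def deff_weights(weights):
    """Kish approximation to the design effect from unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

print(deff_weights([1.0, 1.0, 1.0, 1.0]))  # equal weights: no inflation (1.0)
print(deff_weights([1.0, 1.0, 1.0, 3.0]))  # one large weight inflates the variance
```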

### 3.2 Accounting for the survey design using statistical software

The remainder of section 3 describes how SAS, R, SPSS and Stata can account for complex survey design features. Accounting for the estimation method is more complex, and is covered in section 4.

The important variables that will be required in all instances are:

• “variable of interest” (the variable for which an estimate is required)

• “strata variable”

• “cluster variable” if clustering is utilised in the survey design

• “domain variable” for domain estimates

• “weight variable”

The “variable of interest” is simply the binary, categorical or continuous variable for which the user requires an estimate and standard error.

The “strata variable” should represent any stratification present on the survey in question. On the Labour Force Survey (LFS) and Annual Population Survey (APS), it is usually appropriate to use the lowest level of geography available on the dataset as a “strata variable”, to reflect the use of systematic sampling. Details on “strata variables” for other surveys should be available in the survey user guide and will often be some kind of geographic variable.

The “cluster variable” should represent any clustering present on the survey in question. If the survey is clustered at the household level – for example, the LFS and APS – a household identifier, if available, should be used as the “cluster variable”. For surveys clustered at a higher geographic level – often the “postcode sector” level – details on the “cluster variable” should be available in the survey user guide. The statistical contact for the survey should be able to assist further if the relevant variable does not appear to be available.

The “domain variable” refers to the binary or categorical variable required for a breakdown of the estimates; for example, estimates of employment by ethnicity, where ethnicity is the “domain variable”. In most software packages it is possible to specify multiple “domain variables” and to specify a cross tab as a domain – for example, employment estimates by ethnicity and sex, where ethnicity and sex are cross tab “domain variables”.

As outlined in section 2.2, careful treatment of “domain estimates” – statistics that are calculated for subgroups of the surveyed population – is important. Statistical software will calculate “domain estimates” correctly, but care should always be taken to specify the required domain in the “domain variable” and not to simply subset the dataset. For example, if a standard error for the proportion of employed individuals who smoke is required, then the entire dataset should be used with the employment variable used as a “domain variable” – the dataset should not just be subset to contain employed individuals only. Subsetting the data would result in the software treating the survey as a survey of employed people, not a survey of the general population, and the standard error would be under-estimated.

The “weight variable” should be the appropriate weight for the analysis you are conducting. Some social surveys include specialist weights for certain estimates and these should be used in the calculation of standard errors.

### 3.3 SAS

The standard procedure used in SAS software to calculate standard errors accounting for a complex sample design is “proc surveymeans”. We note that guidance for utilising “proc surveymeans” to estimate standard errors specifically for the Family Resources Survey is also available1.

The procedure is set up in the following way:

```
proc surveymeans data = dataset command cl alpha = alpha;
  var variable of interest;
  class class variable;
  strata stratum variable;
  cluster cluster variable;
  weight weight variable;
  domain domain variable;
run;
```

The “dataset” should be the name of the dataset – SAS requires dataset names to be no more than 32 characters with no spaces.

The “command” section of the code can be used to specify the type of estimate that is required. Means, proportions and totals can be calculated using the commands “mean”, “ratio” and “sum” respectively. If a command is not specified the default option is to calculate estimates and standard errors for means.

The “cl” and “alpha” sections of the code construct a confidence interval for the estimate. The “cl” should be replaced with “clm” when a mean or ratio is required and “clsum” when a total is being calculated. The “cl” command should always be used when a confidence interval is required. As a default the “cl” command produces a 95% confidence interval. If the user requires a different interval then the “alpha” command should be used in conjunction with the “clm” or “clsum” command. The “alpha” value should be equal to the level of significance required; for example, “alpha = 0.05” would produce a 95% interval, “alpha = 0.1” produces a 90% interval and “alpha = 0.01” produces a 99% confidence interval.
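The alpha convention above can be sketched outside SAS as well – for example, the following Python illustration builds a normal-approximation interval from an invented estimate and standard error (it is not SAS code, and the figures are made up):

```python
from statistics import NormalDist

def confidence_interval(estimate, se, alpha=0.05):
    """Normal-approximation interval: estimate +/- z * SE, with alpha as above."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # alpha = 0.05 gives z of about 1.96
    return estimate - z * se, estimate + z * se

# Invented estimate of 0.60 with standard error 0.005:
low95, high95 = confidence_interval(0.60, 0.005)             # 95% interval
low90, high90 = confidence_interval(0.60, 0.005, alpha=0.1)  # 90% (narrower)
print(round(low95, 4), round(high95, 4))
```

A smaller alpha gives a wider interval, exactly as with the “alpha” option in “proc surveymeans”.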

The next command in the code, “variable of interest”, allows the user to specify the variable for which they require an estimate and standard error. This line can be used for continuous or categorical variables. If a categorical variable is used this should also be specified as the “class variable” – the “class variable” command should not be used for continuous variables.

Strata, cluster, weight and domain variables can be specified as outlined in section 3.2. The domain line is only needed if an estimate is required across a domain, or a number of domains – for example, if the user requires employment status estimates by ethnicity and sex. In the case where more than one domain is required, the domains must be specified with a space in between as follows:

```
domain domain1 domain2;
```

If the user wants to use cross-tabulated domains, such as age group by sex, this should be specified by using a multiplication sign as follows:

```
domain domain1*domain2;
```

### 3.4 R

Within the Office for National Statistics (ONS), the statistical software R2 is being used more frequently for research purposes; however, it is not currently used in the production of regular outputs.

R contains a number of packages with functionality to analyse complex survey data. This section focuses on the “survey” package3, which needs to be installed, if not already, and loaded into R prior to analysis.

Within the survey package there are a number of functions for calculating standard errors, including “svytotal”, “svymean”, “svyratio” and “svyby”. If estimates are required across a domain the “svyby” function should be used, which can be specified to calculate estimates of totals, means and ratios. The information in this section therefore focuses on the “svyby” function, as this is more appropriate for users calculating standard errors for subgroups or domains.

If the package has already been installed, it is loaded as follows:

```
library(survey)
```

Unlike SAS, in R the survey design is specified and stored in an object first using the “svydesign” command and subsequent commands are used to calculate standard errors. The survey design is specified and stored in an object named “svyname” (which may be replaced with alternative names) as follows:

```
svyname <- svydesign(id      = ~cluster variable,
                     strata  = ~stratum variable,
                     weights = ~weight variable,
                     data    = dataset,
                     nest    = TRUE)
```

The `nest` argument is `FALSE` by default and controls how primary sampling unit (PSU) identifiers that appear in more than one stratum are handled. With `nest = FALSE`, “svydesign” returns an error if the same PSU identifier appears in more than one stratum; with `nest = TRUE`, PSU identifiers are relabelled so that they are treated as nested within strata. Using `nest = TRUE` is generally safe: it has no impact on the output when PSU identifiers are already unique, and avoids spurious errors when identifiers are only unique within strata.

It is also important that the “~” is included in each argument, as this creates a formula telling R to look up the variables within the specified dataset.

Following the specification of the survey design, the “svyby” function is used to calculate estimates and the standard errors associated with them. It is specified as follows:

```
output <- svyby(formula  = ~variable of interest,
                by       = ~domain variable,
                design   = svyname,
                FUN      = command,
                keep.var = TRUE)
```

This “svyby” function, similar to the “svydesign” function above, uses many of the variables described in section 3.2. The “variable of interest” is simply the variable for which estimates and standard errors are required.

The “domain variable” refers to the domain over which estimates are required; the “domain variable(s)” should always be categorical in nature. If a cross-tabulated domain is required – for example, sex by age group – then the variables should be specified with a “+” in the middle; for example, “by = ~sex + agegroup”. Note: in the “svyby” command it is necessary to define a “domain variable”. Should a domain not be required, the user should create a dummy variable, where the value is the same for all observations, and specify this as the “domain variable”. Alternatively the “svytotal”, “svymean” or “svyratio” functions may be used directly, as mentioned previously.

The “svyname” line simply refers to the survey design object that was created in the “svydesign” function. The “command” refers to the type of estimate and standard error required, and should be replaced with “svymean”, “svytotal” or “svyratio”; these calculate estimates and standard errors for means, totals and ratios respectively. Should a user require estimates for more than one of these options, a new “svyby” function should be used.

The “keep.var = TRUE” command returns the standard error with the estimate. Without this command line in the function it would simply return estimates for the specified “variable of interest” and “domain variable(s)”.

### 3.5 SPSS and Stata

To calculate standard errors for complex sample designs in SPSS4 using in-built functions, it is necessary to download the paid-for Complex Samples add-on tool. This tool allows the user to take into account stratified, clustered and multi-stage sampling techniques when carrying out statistical analysis on survey data.

Calculating standard errors for complex sample designs in Stata5 is possible using the built-in survey commands. The survey commands allow the user to specify the survey design of the dataset using the “svyset” command before running procedures to calculate estimates of standard errors using the other “svy” commands such as “svy: mean”, in a similar fashion to R.

For further information on these procedures please refer to the SPSS Complex Samples 21 Guide6 and Stata Survey Data Reference Manual7.

### Notes for Accounting for survey design using statistical software

1. Available via the FRS pages at GOV.UK
2. R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing. [online]
3. T. Lumley (2014). survey: analysis of complex survey samples. R package version 3.0.2. [online]
4. IBM SPSS. (2012). IBM SPSS Complex Samples 21. Armonk, NY: IBM Corporation. [online]
5. StataCorp. (2009). Stata Statistical Software: Release 11. College Station, TX: StataCorp LP.
6. IBM SPSS. (2012). IBM SPSS Complex Samples 21. Armonk, NY: IBM Corporation. [online]
7. StataCorp. (2013). Stata Survey Data Reference Manual. Stata Press. [online]

## 4. Accounting for the estimation method

The methods outlined in section 3 will account for the survey sample design, but will not account for the impact of the estimation (weighting) method used. More precisely, the methods in section 3 will account for variability caused by variance in the weight variable itself, as discussed in section 3.1.3, but they will not account for the reduction in variance due to the use of “known” population totals. These totals differ for different surveys and weights, but are usually age, sex or region totals projected from the census, so are not strictly known with certainty, but can typically be treated as having negligible variability.

The use of “known” population totals will usually reduce standard errors. A technical explanation for this is that the estimator becomes a GREG estimator, which, if well-specified, has a lower variance than a simpler Horvitz-Thompson estimator – for details, see standard survey estimation texts1. An intuitive explanation is that the use of fixed totals will effectively remove some random imbalances due to sampling. For example, if more men are randomly sampled than women and the characteristic of interest is correlated with sex, with no weighting the estimate would be skewed towards men. The use of totals by sex will remove this imbalance in the estimate whenever it occurs, effectively removing some random variability.
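The sex-imbalance example can be made concrete with a toy sketch (all numbers invented) in which weighting the sample to a “known” 50/50 sex split corrects a random over-sample of men:

```python
# Invented population: 50% men, 50% women; the characteristic (say smoking)
# differs by sex. The random sample happens to contain 60 men and 40 women.
sample = [("M", 1)] * 30 + [("M", 0)] * 30 + [("F", 1)] * 28 + [("F", 0)] * 12

# Unweighted estimate is skewed towards the male rate:
unweighted = sum(y for _, y in sample) / len(sample)

# Weight each sex so the weighted sample hits the known 50/50 totals:
counts = {"M": 60, "F": 40}
weights = {sex: 50 / counts[sex] for sex in counts}
weighted = sum(weights[sex] * y for sex, y in sample) / 100

print(round(unweighted, 3), round(weighted, 3))  # 0.58 0.6
```

Because any random imbalance in the sex split is removed in the same way every time it occurs, the weighted estimator has less sampling variability than the unweighted one.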

The Office for National Statistics (ONS) uses a “linearised jackknife” variance estimation method2 to account for the impact of the use of known population totals in the weighting, in addition to the other factors affecting standard errors. This “linearised jackknife” method is applied using a suite of SAS macros developed in-house and is not at present suitable for sharing with survey users.

Some example Labour Force Survey design factors implied by the “linearised jackknife” standard error estimation method are given in Annex A. For comparison, we have also included design factors implied by using the “proc surveymeans” command in SAS, as outlined in section 3.3 (which gives identical results to the “survey” R package outlined in section 3.4).

Almost all “linearised jackknife” design factors are smaller than the “proc surveymeans” design factors, reflecting the reduction in standard errors from the use of “known” totals. However, the size of the reduction in the design factors varies substantially between estimates, with a much larger reduction for estimates of employment and inactivity at a high level of aggregation than for estimates of unemployment or estimates at a lower level of aggregation. This likely reflects the fact that the population totals used in the weighting – age, sex and region – tend to be better correlated with variables that cover more of the population. However, this result is hard to generalise to other surveys or estimates, and it may be safer to conclude that the use of “known” totals almost always reduces design factors, but by an amount that can vary substantially depending on the estimate in question.

### Notes for Accounting for the estimation method

1. For example: Carl-Erik Särndal et al., Model Assisted Survey Sampling (pp. 437–442). New York: Springer-Verlag.
2. For a fairly accessible description of the method, see David Holmes and Chris Skinner, “Variance estimation for Labour Force Survey estimates of level and change”, GSS Methodology Series No. 21.

## 5. Conclusion

There are 3 primary options for estimating social survey standard errors:

• calculating a standard error assuming a simple random sample and applying a design factor

• calculating standard errors accounting for the complex survey design using statistical software

• calculating standard errors accounting for the complex survey design and estimation method

The most appropriate method depends on the availability of statistical software and the expertise to use it, and whether an appropriate design factor is available.

Since published design factors are generally calculated accounting for both the complex survey design and the estimation method, if an appropriate design factor is available it may be preferable to use this to calculate a standard error. What counts as an “appropriate” design factor is subjective, but a design factor for the same estimate for a different period (assuming the survey design has not changed in the interim), or for a similar estimate, may be suitable. Care should be taken with estimates that are likely to suffer from a high degree of clustering – see the discussion in section 2 – and domain estimates should be treated appropriately – see section 2.2. Design factors from different surveys should not be used.

Using statistical software to account for the survey design using the methods outlined in section 3 may be a good option if software and expertise are available, or if appropriate design factors are not available. However, this will result in an over-estimation of most standard errors, as this method will not account for the impact of the estimation method, as outlined in section 4.

## 6. Authors

Matt Greenaway and Bethan Russ, Office for National Statistics

November 2016

methodology@ons.gov.uk