The Standard Occupational Classification 2000 (SOC 2000), which replaced SOC 90, was introduced to the Labour Force Survey (LFS) in spring 2001.
When new questions and classifications are introduced to the LFS, it is normal practice not to release these for public use until they have been quality assured over a number of quarters.
However, for SOC 2000 the questions and coding methods were already well established; only the categories of the classification were new.
Though SOC 2000 still has nine major groups, there have been considerable changes in the structure and composition of the classification.
Therefore, results based on one classification cannot be meaningfully compared with those based on the other, which is a problem for anyone wishing to compare data over time.
Two solutions to this comparability problem were available.
The first solution was to code the historical micro-data to SOC 2000. However, this would have been a very time-consuming and costly operation.
The second solution was to code certain data sources to both classifications. These dual-coded datasets could be used to estimate the correspondence between the two classifications, and the correspondences could then be used to backcast historical data at an aggregated level.
This solution was quicker and easier, but it has problems, which are outlined below.
The Office for National Statistics (ONS) made the decision to dual-code the LFS summer 2000 quarter to both SOC 90 and SOC 2000. Further details of this dual coding exercise can be found in an article in the July 2001 edition of Labour Market Trends.
Apart from this dual-coded quarter, other dual-coded LFS data were also available. Analysis of these datasets showed that the LFS winter 2000/2001 quarter provided the best basis for the backcasting probabilities.
Matrices showing the correspondence between SOC 90 and SOC 2000, derived from the LFS winter 2000/2001 dual-coded quarter, have been used to backcast the historical time series.
For each individual in the LFS winter 2000/2001 dual-coded quarter who had been assigned both a SOC 90 and a SOC 2000 code, the observed pair of codes was counted in a matrix.
The cell counts were then converted to percentages, showing how each SOC 90 minor group is distributed across the SOC 2000 categories.
Each cell in the resulting matrix therefore gives the probability that an observation in a given SOC 90 category would be classified to a specific SOC 2000 category.
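The construction of a single matrix can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ONS production code: it assumes the dual-coded records are available as (SOC 90 code, SOC 2000 code) pairs, one per unweighted respondent, and the function name `build_transformation_matrix` and the codes in the example are hypothetical placeholders rather than real SOC codes.

```python
from collections import Counter, defaultdict


def build_transformation_matrix(pairs):
    """Turn dual-coded (soc90, soc2000) pairs into row proportions.

    Returns {soc90: {soc2000: probability}}, where the probabilities in
    each SOC 90 row sum to 1. Records are counted unweighted.
    """
    pairs = list(pairs)                                   # allow any iterable
    cell_counts = Counter(pairs)                          # matrix cell counts
    row_totals = Counter(soc90 for soc90, _ in pairs)     # respondents per SOC 90 group
    matrix = defaultdict(dict)
    for (soc90, soc2000), n in cell_counts.items():
        matrix[soc90][soc2000] = n / row_totals[soc90]    # count -> proportion of its row
    return dict(matrix)


# Hypothetical dual-coded records; labels are placeholders, not real SOC codes.
records = [("old_1", "new_1")] * 6 + [("old_1", "new_2")] * 4 + [("old_2", "new_2")] * 5
matrix = build_transformation_matrix(records)
print(matrix)  # {'old_1': {'new_1': 0.6, 'new_2': 0.4}, 'old_2': {'new_2': 1.0}}
```

Here `old_1` splits 60:40 between two SOC 2000 groups, while `old_2` maps entirely to one.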
Separate matrices were calculated for each economic activity group at the lowest level, split by full-time/part-time status and by gender.
This preserved the distinct occupational characteristics of each group.
For example, a smaller percentage of part-time workers than of full-time workers are in managerial occupations.
The SOC 2000 probability distributions for each SOC 90 category were then applied to other datasets as a proxy for what respondents would have been coded to under SOC 2000.
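Applying the matrices at an aggregated level can be sketched in the same way. The sketch assumes that the historical estimates are held as counts by SOC 90 group within each stratum (economic activity group, full-time/part-time, sex) and that each stratum has its own matrix in the form {SOC 90 code: {SOC 2000 code: probability}}. The function name `backcast`, the stratum labels and all figures are hypothetical.

```python
from collections import defaultdict


def backcast(historical_counts, matrices):
    """Redistribute historical SOC 90 counts to SOC 2000, stratum by stratum.

    historical_counts: {stratum: {soc90: estimate}}
    matrices:          {stratum: {soc90: {soc2000: probability}}}
    Returns {soc2000: estimate}, summed over all strata.
    """
    estimates = defaultdict(float)
    for stratum, counts in historical_counts.items():
        matrix = matrices[stratum]                 # each stratum uses its own matrix
        for soc90, count in counts.items():
            for soc2000, p in matrix[soc90].items():
                estimates[soc2000] += count * p    # expected share of this SOC 90 count
    return dict(estimates)


# Hypothetical strata (economic activity, full-time/part-time, sex) and figures.
ft_male = ("employee", "full-time", "male")
pt_female = ("employee", "part-time", "female")
matrices = {
    ft_male: {"old_1": {"new_1": 0.8, "new_2": 0.2}},
    pt_female: {"old_1": {"new_1": 0.3, "new_2": 0.7}},
}
historical = {ft_male: {"old_1": 1000.0}, pt_female: {"old_1": 500.0}}
result = backcast(historical, matrices)
print(result)
```

With these figures, `new_1` receives 800 + 150 = 950 and `new_2` receives 200 + 350 = 550. Using one pooled matrix instead would ignore the fact that the same SOC 90 group divides differently for full-time men and part-time women.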
The estimates provided using the matrices from the LFS winter 2000/2001 quarter are considered the best available. However, any method that uses a single time period as a proxy for the relationship in other periods will be subject to a number of quality issues, which users should consider before using the data.
Transformation matrices for SOC 2000 - Quality issues
Caution should be exercised when analysing or interpreting the backcast data series. This section presents a number of issues to consider with respect to data quality.
The dual-coded LFS winter 2000/2001 quarter, from which the correspondence matrices were produced, cannot exactly replicate the way SOC 2000 was coded in the LFS spring 2001 quarter.
Different coding systems were used for the two quarters, which meant there were minor differences in the on-screen information available to coders.
These differences mainly concerned information on supervisory and managerial duties, and they may have caused discontinuities. This is particularly likely where the classification changed most, for example in major groups 1, 4 and 7.
The LFS is a sample survey so the data are subject to sampling error.
Estimates based on smaller subgroups tend to have larger relative sampling errors, although sampling errors also depend on the way the sample and population are distributed.
Therefore, both the historical data being transformed to SOC 2000 and the probabilities derived from the dual-coded dataset are subject to sampling error.
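To illustrate why small subgroups carry larger relative errors, the sketch below computes the relative standard error (standard error divided by the estimate) of an estimated proportion, sqrt(p(1 - p) / n) / p. It assumes simple random sampling purely for illustration; the LFS sample design is not simple random sampling, so actual LFS sampling errors will differ, and the function name is hypothetical. The same reasoning applies to a cell probability in a matrix, which is a proportion estimated from the respondents in one SOC 90 group.

```python
import math


def relative_standard_error(p, n):
    """Standard error of an estimated proportion p from n respondents,
    divided by p. Assumes simple random sampling (illustration only)."""
    if not 0 < p <= 1 or n <= 0:
        raise ValueError("need 0 < p <= 1 and n > 0")
    return math.sqrt(p * (1 - p) / n) / p


# The same 10% share, estimated from a large and a small subgroup.
large = relative_standard_error(0.10, 10_000)
small = relative_standard_error(0.10, 100)
print(f"{large:.2f} {small:.2f}")  # 0.03 0.30
```

Cutting the subgroup size by a factor of 100 raises the relative error by a factor of 10, which is why thinly populated SOC 90 groups give less reliable matrix probabilities.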
In addition to sampling error in the dual-coded dataset, the observed relationship in the LFS winter 2000/2001 quarter will have been affected by coder variance.
Occupational information on the LFS is coded to SOC by interviewers, so there will have been some variation in the way different interviewers assigned SOC codes.
This will have affected the distributions in the probability matrix and the historic time series data.
It is also difficult to assess whether seasonal differences affect the use of a probability matrix based on only one quarter of data.
SOC 2000 major group 5 (skilled trades occupations), which includes occupations such as skilled farm and construction workers, did show a seasonal pattern in data produced from a transformation matrix.
The strength of the seasonal pattern depends as much on the clarity of the relationship between the categories in the two classifications as on the seasonal changes in numbers for that group.
Thus, if a SOC 90 category corresponds to only one SOC 2000 category, its seasonal pattern is replicated in full, even though the relationship was based on data from a single time point (the LFS winter 2000/2001 dual-coded quarter).
However, if a SOC 90 group is spread over several SOC 2000 groups, its seasonal pattern is diffused across them. In that case, basing the relationship on a single time point, the LFS winter 2000/2001 quarter, is likely to affect the results.
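The effect of fixed winter proportions on a seasonal group can be shown with a small hypothetical example. Suppose one SOC 90 group maps entirely to a single SOC 2000 group, while another splits 50:50 between two SOC 2000 groups in the winter dual-coded quarter. In summer each grows from 100 to 140, and for the split group all 40 extra seasonal workers are assumed to belong to the first SOC 2000 group. All labels and numbers are invented for illustration.

```python
def backcast_group(count, row):
    """Split one SOC 90 count across SOC 2000 groups using fixed shares."""
    return {soc2000: count * share for soc2000, share in row.items()}


# Hypothetical shares observed in a winter dual-coded quarter.
spread_row = {"new_A": 0.5, "new_B": 0.5}   # SOC 90 group split over two groups
single_row = {"new_C": 1.0}                 # SOC 90 group maps to one group

winter_count, summer_count = 100.0, 140.0

# Winter: the split matches the winter data by construction (50/50).
winter_spread = backcast_group(winter_count, spread_row)

# One-to-one mapping: the seasonal rise of 40 passes through in full.
one_to_one = backcast_group(summer_count, single_row)

# Spread mapping: the summer rise is shared out using winter proportions.
spread = backcast_group(summer_count, spread_row)

# Hypothetical "true" summer split if all 40 seasonal workers belong to new_A.
true_summer = {"new_A": 90.0, "new_B": 50.0}
errors = {k: spread[k] - true_summer[k] for k in true_summer}
print(one_to_one, spread, errors)
```

The one-to-one group reproduces the seasonal rise exactly, whereas the split group understates `new_A` by 20 and overstates `new_B` by 20 in summer, even though the winter figures match.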
Changing occupational structure
Over time the structure of industry changes, and people's occupations change with it. It is therefore not meaningful to apply a classification containing new occupations to data for a period in which those occupations did not exist.
This problem grows the further back in time the data are backcast.
Balancing this risk against users' interest in a continuous time series, ONS has estimated occupations under the new classification for the period from the LFS spring 1995 quarter to the LFS winter 2000/2001 quarter.
The LFS winter 1996/1997 quarter, which was also recoded to SOC 2000, shows that the distribution of occupational groups had not changed significantly over the intervening period (further details of this dual-coding exercise can be found in an article in the July 2001 edition of Labour Market Trends).
Therefore, the matrix based on LFS winter 2000/2001 quarter should reasonably reflect, in most cases, the likely relationship between SOC 90 and SOC 2000 for those earlier periods.
The probabilities linking SOC 90 and SOC 2000 in the LFS winter 2000/2001 quarter were computed from unweighted data, because the aim was to capture the internal correspondence between the two classifications.
However, the backcast data could have been affected if any relationship between the two classifications was over- or under-represented in the correspondence tables.
This could have occurred because unweighted data do not correct for differences in response rates across the UK, and such differences may have affected the more subtle relationships.
For example, suppose that an area rich in energy-intensive industries, such as the North East, had a high response rate, while inner London, which has fewer of these industries, had a lower response rate. Occupations typical of energy-intensive industries in the North East could then have masked more subtle relationships for similar occupations in inner London that belonged to the same SOC 90 group.
This would have occurred simply because the sample contained a disproportionately high number of people from the North East.
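The following hypothetical example shows how unweighted pooling can shift an estimated proportion towards the region with the higher response rate. The regions' figures and shares are invented, and the weight used here (population divided by number of respondents) is a simplified stand-in for survey weighting, not the method ONS uses; `share_to` is a hypothetical helper. The example illustrates the risk only and does not imply that weighted probabilities would have been preferable.

```python
def share_to(category, regions, weighted):
    """Proportion of a SOC 90 group coded to one SOC 2000 category.

    If weighted, each respondent counts population / respondents times,
    a simplified (hypothetical) stand-in for survey weighting.
    """
    numerator = denominator = 0.0
    for region in regions:
        weight = region["population"] / region["respondents"] if weighted else 1.0
        numerator += weight * region["respondents"] * region["shares"][category]
        denominator += weight * region["respondents"]
    return numerator / denominator


# Invented figures for one SOC 90 group: equal populations, different response.
regions = [
    {"name": "North East", "population": 1000, "respondents": 800,
     "shares": {"new_X": 0.75, "new_Y": 0.25}},
    {"name": "Inner London", "population": 1000, "respondents": 400,
     "shares": {"new_X": 0.25, "new_Y": 0.75}},
]
unweighted = share_to("new_X", regions, weighted=False)
weighted = share_to("new_X", regions, weighted=True)
print(round(unweighted, 3), round(weighted, 3))  # 0.583 0.5
```

The unweighted share to `new_X` is 7/12 (about 0.58), against 0.5 when each region counts in proportion to its population, because the North East supplies twice as many respondents as inner London.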
Comparing the LFS spring 2001 data with the backcast historical datasets shows some discontinuities in the distribution.
These occur in major groups 4 (administrative and secretarial) and 7 (sales and customer service), where the historical estimates are lower.
Most of these unexplained changes in level between the historical series and the LFS spring 2001 quarter may be attributable to one or more of the quality issues described above. They could also reflect an unusual movement or sampling error in the spring data.
However, the differences are small, and the time series is broadly consistent over the periods covered.
SOC 2000 backcasting tables
The full range of backcasting tables covers people in employment, employees, the self-employed, full-time and part-time workers, temporary workers and second jobs.
Tables are also available for the long-term unemployed and for the unemployed by previous occupation.
All tables are split by gender, are given at the 1-digit (major group) and 2-digit (sub-major group) levels, and cover the period from the LFS spring 1995 quarter to the LFS autumn 2001 quarter. No other backcast data are available.
Please see the Downloads section for the full range of backcasting tables available.
For general information on methodology and background to the LFS, please see User Guide Volume 1.
For information on the structure of the SOC 2000 classification, see Related links.
If you have any specific comments or questions on the SOC 2000 backcasting please contact