This publication considers the use of patent life estimates as an indicator of R&D service lives, discussing the strengths and limitations of this approach in detail. Patent lives are derived for the UK using methods previously applied by researchers in other countries, and using survival analysis techniques to address the downward bias in those methods. Analysis is also carried out by industry section, using patent data matched to businesses’ Standard Industrial Classification (SIC) codes. Findings indicate that the average service life of UK patents lies between 8.0 and 20.0 years, with variation across industrial sections and methods of estimation.
The author would like to thank Christopher Steer, Helen Meaker, and Walter Mkandawire of the Office for National Statistics for their valued support and contributions.
The forthcoming capitalisation of Research and Development in the National Accounts of EU Member States and other countries creates a requirement for new information on the useful ‘service life’ of the resulting R&D assets. Two alternative methods have typically been used to estimate R&D asset lives: collecting information through survey questions, and using information on patent renewals extracted from national Intellectual Property Protection systems.
This publication is one of a suite of three which together provide a comprehensive overview of Office for National Statistics (ONS) research into R&D service lives. This paper derives estimates from patent renewals data; ‘Service Lives of R&D Assets: Questionnaire Approach’ (Ker, 2013b) describes the implementation and analysis of data collected through new survey questions; and ‘Service Lives of R&D Assets: Background and Comparison of Approaches’ (Ker, 2013a) provides context for the research, compares the methods and results for the UK, and draws conclusions. It is suggested that the three papers be read together.
This approach uses information about annual patent renewal fee payments to estimate the useful lives of patents, and then uses these estimates to draw conclusions about the lives of the R&D assumed to be embodied in those patents. Patents are granted by government to inventors in exchange for publishing the details of the invention, and they allow the holder to prevent others from using the protected invention or knowledge. They give a monopoly, for a number of years, on the use of protected processes, products, or ideas. Applicants must prove the novelty and originality of the innovation to receive protection.
In most patent systems, a fee is paid upon application and an annual renewal fee is paid in each year after a patent is successfully granted in order for it to remain in force, up to a maximum duration. In the UK, annual fees are payable from the fourth anniversary of filing and range from £70 up to £600 for the final permitted renewal, typically 21 years after filing (Intellectual Property Office, 2012), although lags in the approval process may extend this in some cases. If renewals are not paid or the patent is revoked, the intellectual property is no longer protected.
This system of annual renewals can be used to infer information about the economic usefulness of patents: a rational owner should pay the renewal in a given year only if the patent’s value outweighs the renewal fee. A (granted) patent’s ‘service life’ might therefore be viewed as the number of years between patent application and the final year in which a renewal is paid. The application (or priority) year is used rather than the approval year because the knowledge must already exist for the patent to be filed, and ‘patent pending’ status offers some protection during the period between filing and granting. The number of years between filing and expiry of the patent gives its ‘age at death’.
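The renewal rule can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a patent's value stays constant or decays at a fixed annual rate, and that fees rise linearly from £70 at the fourth anniversary to £600 at age 21; the real IPO fee schedule differs, and the decay model and the names `hypothetical_fee` and `age_at_death` are hypothetical.

```python
def hypothetical_fee(age):
    """Hypothetical renewal fee (GBP) due at a given age in years.

    Assumption for illustration only: no fee before the fourth anniversary,
    then a linear rise from 70 at age 4 to 600 at age 21. The real IPO
    schedule differs.
    """
    if age < 4:
        return 0.0
    return 70.0 + (600.0 - 70.0) * (age - 4) / (21 - 4)


def age_at_death(initial_value, decay_rate, max_age=21):
    """Last age at which a rational owner keeps the patent in force.

    The patent's value (GBP) falls by `decay_rate` each year, and the owner
    renews only while that value exceeds the fee due at that age.
    """
    value = initial_value
    age = 0
    for year in range(1, max_age + 1):
        value *= 1.0 - decay_rate
        if value <= hypothetical_fee(year):
            break  # renewal not worth paying: the patent lapses
        age = year
    return age


print(age_at_death(100, 0.0))  # 4: the fee due at age 5 first exceeds 100
print(age_at_death(1e6, 0.0))  # 21: renewed to the maximum age
```

Under this rule, age at death directly reveals when value first fell below the fee; in practice owners may deviate from it, for example by renewing by default.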
However, patent holders may not always pay only when the patent’s value outweighs the cost of renewal; some companies, especially those holding large numbers of patents, may renew by default because the cost of annually reassessing each patent may outweigh the relatively small renewal fees. Furthermore, with patent litigation continuing to increase amid growing awareness of the value of patents (PwC, 2012), patent holders may also choose to renew as a precautionary measure, or because the actions of other businesses could give the patent value in future.
In the case where a patent takes more than five years to grant, renewals are due annually from the year of approval for up to 20 years. This means that patents may have a life in excess of 21 years, accounted for by the additional lag between application and approval.
The Australian Bureau of Statistics (2009) presented the survival distribution of granted Australian patents filed between 1980 and 2001, based on the number of years for which renewals were paid; this approach is similar to that of Federico (1954), who examined UK data. The distribution showed that the population of patents gradually diminishes over time, and that on average Australian patents reach 11 years of age before lapsing.
Tanriseven, van Rooijen-Horsten, & de Haan (2010) conducted similar research using patent data for the Netherlands from 1968 onwards, finding a median patent life of around seven years. However, the authors argue that the value of patents should be taken into account, since the lives of more valuable patents are of most interest, and suggest that more valuable patents might generally be expected to have longer useful lives. They weighted Dutch patent lives by (assumed) value, combining the age distribution with information on the value distribution of Dutch patents from the EU PatVal report (Gambardella, Giuri, & Mariani, 2005). This was achieved by assuming perfect correlation between patent age and value, so that increasingly more weight is given to longer-lived patents.
They acknowledge that this assumption is simplistic and will bias the estimate upwards, and therefore treat the weighted median of 18 years as the upper bound of the range of possible average lives, with the unweighted median of seven years as the lower bound. The actual life of interest is believed to lie somewhere in between, so the mean of the two, 12.5 years, is adopted for most industries. However, this was deemed inappropriate for electro-technical engineering, which was given a shorter life of 9.5 years, and for the chemical industry, which was given a longer life of 15.5 years.
Gaining a patent requires proof of novelty, innovation, inventiveness, and industrial application (Intellectual Property Office, 2012). These characteristics are likely to apply to many of the results of R&D, which suggests that patents will often represent R&D. If the link between R&D and patents is sufficiently strong, patent lives may provide a reasonable indication of R&D lives.
However, patents are more likely to relate to certain types of R&D, particularly R&D relating to products, processes, and industrial applications. Basic research may be less likely to result in patents; indeed, scientific and mathematical discoveries, theories, and methods; methods of medical treatment or diagnosis; and animal or plant varieties (all of which are likely to rely on research and development) cannot be patented at all. Furthermore, patents might relate to the results of other innovative processes defined in the Oslo Manual (OECD, 2005), or filings may relate to an idea which is later shown to be unworkable.
Kirankabes (2010) found a very strong positive correlation between firms’ R&D expenditures and patent applications, using fixed effects, random effects, and feasible generalised least squares estimation on panel data for 33 countries (including EU countries) spanning 1997 to 2007. However, correlation is not the same as causality. By contrast, a pilot survey of 12 enterprises conducted by the Federal Statistical Office of Germany found that the share of respondents’ R&D activity that resulted in patents ‘ranged from between 1.5 to 90 per cent, indicating very strongly that considerable care is needed when using patent data to estimate the service lives of R&D’ (OECD, 2010, p. 65).
The 2011 UK Innovation Survey (UKIS) estimate of businesses’ internal R&D spending in 2010 was in line with the £16bn estimate from the Business Enterprise R&D (BERD) survey. BERD is the most authoritative source of UK business R&D expenditure data (Department for Business, Innovation, and Skills, 2012), and the similarity with the estimate from this source suggests that UKIS also provides a reasonable representation of R&D-performing businesses. UKIS results showed that 14 per cent of businesses conducted R&D in 2010 but only three per cent had protected innovations with patents, similar to observations in previous years.
Furthermore, internal R&D accounts for only 35 per cent of the total innovation expenditure measured by UKIS, and other types of spending may also contribute to patenting. This suggests that the link between patents and R&D in the UK is not firm.
There are various reasons why businesses might not use patents and so un-patented R&D may have different service lives to patented R&D:
there is typically a lag of around four years between application and granting of patents in the UK; if researchers believe the R&D will be of use for less time than it takes to gain a patent, they may choose not to apply at all, which implies that patent lives may be biased upwards compared with R&D more generally
if they believe that benefits are likely to last longer than the period of protection provided by patents, they might decide to use other approaches such as industrial secrecy, which implies that patent lives may be biased downwards compared with R&D more generally
some R&D, such as medical procedures, cannot be patented and will not be represented by patent lives
‘Unsuccessful R&D’ (R&D that is not expected to provide benefits) is also unlikely to be patented. However, the System of National Accounts (SNA) does not distinguish between successful and unsuccessful R&D because, in aggregate, rational investors will invest up to the point at which they do not expect to make a loss. Once this assumption has been made, only the life of successful R&D (as would be embodied by patents) is relevant.
Overall the evidence suggests that the theoretical link between R&D and patents is unclear and likely to vary between industries. However, the few alternatives available also have limitations and so the patent approach warrants investigation.
Three organisations issue patents valid in the UK: the UK Intellectual Property Office (IPO), the European Patent Office (EPO), and the World Intellectual Property Organization (WIPO). The ONS Virtual Micro-data Laboratory (VML) provides access to IPO and EPO patent data. WIPO data are not available, but it is likely that most UK patents are filed through the other channels.
The IPO data are an extract taken from the live patent register in 2010, covering patents granted since 1990 as well as published patents applied for since 1990: 115,568 patents in total. For each patent, the extract gives the date on which each status (published, amended, granted, renewed, dead, etc.) was achieved. Where a status has not yet been achieved, the field is empty.
The EPO ‘PATSTAT’ dataset covers 91,246 published patents filed with UK addresses between 1986 and 2009. The information is comparable to the IPO data, but PATSTAT is not a live updating register and there is a separate record for each status of an individual patent (e.g. one record for application, one for publication, etc.); such duplicates were removed.
The patent data provide only basic information about the owner (name, address, etc.). To facilitate more detailed analysis, patents were matched to the Inter-Departmental Business Register (IDBR), the most comprehensive list of UK businesses, covering 99 per cent of economic activity (Office for National Statistics, 2012). Matching replaced sensitive fields such as the business name and address with a unique ‘enterprise reference number’ used throughout ONS business surveys; this facilitates matching of patents to complementary data sources.
Thomas (2011) explains that names and addresses were extracted from the patent data and matched to the IDBR using ‘fuzzy matching’. A match score estimating the quality of the match (from zero for no match to 100 for a perfect match) was allocated to each case, and a threshold was applied to determine whether a sufficiently good match had been made. This is necessary because of the wide variety of ways in which company details can be recorded in the patent data and the IDBR.
Each patent was either matched, not matched, or (fairly often) had multiple possible matches arising from multiple subsidiaries of the same organisation operating from the same address. Manual matching was performed for owners with large patent holdings but was not feasible for all cases because of the volume of multiple matches.
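The scoring and threshold step can be sketched in Python with the standard library's `difflib.SequenceMatcher`, whose similarity ratio is scaled here to the 0 to 100 range. This is only an illustration: the name-cleaning rules, the threshold of 85, and the `register` mapping keyed by enterprise reference are assumptions, and unlike the process described by Thomas (2011) the sketch compares names only, ignoring addresses.

```python
import difflib
import re

# Hypothetical cleaning rule: legal-form suffixes are ignored when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "inc"}


def normalise(name):
    """Lower-case a business name, replace punctuation with spaces, drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(w for w in cleaned.split() if w not in LEGAL_SUFFIXES)


def match_score(a, b):
    """Similarity from 0 (no match) to 100 (perfect match) after normalisation."""
    return 100.0 * difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_matches(owner_name, register, threshold=85.0):
    """Return (enterprise_ref, score) pairs scoring at or above the threshold.

    `register` maps hypothetical enterprise references to business names.
    More than one pair means the patent needs manual resolution.
    """
    scored = ((ref, match_score(owner_name, name)) for ref, name in register.items())
    hits = [(ref, score) for ref, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


register = {
    "E001": "Acme Widgets Ltd",
    "E002": "Acme Widgets (Holdings) Ltd",
    "E003": "Zenith Foods plc",
}
print(candidate_matches("ACME WIDGETS LIMITED", register))  # [('E001', 100.0)]
```

Returning every candidate above the threshold, rather than only the best, mirrors the multiple-match problem: two register entries for subsidiaries with near-identical names would both score highly and need manual resolution.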
As the IDBR is a live register, a match was not possible where firms had changed their name or premises since the patent was filed. As such, while all the matched owners are businesses, it is likely that a considerable number of businesses’ patents remained unmatched because of the limitations of the method. Despite this, 74,356 patents were matched, enough to permit new analysis of patent lives by industrial section.
The patent datasets contain enterprise reference numbers, enabling matching with business data sources such as the Business Structural Data (BSD) files. These are annual extracts from the IDBR providing a snapshot that includes firms’ Standard Industrial Classification 2007 (SIC07) codes, which are matched to patents for analysis by industry section.
Before analysis, it was necessary to create a single dataset containing all required information. First, the EPO dataset was appended to the IPO data, with variables paired where possible. Some data items differ between the sources and were kept separate. For example, the EPO data contained a single enterprise reference, while the IPO data contained separate references relating to the time of filing and the time of granting. The references relating to the time of granting were chosen, though in most cases the two are the same.
In the EPO dataset, 3,168 duplicate records were identified; these were removed to avoid multiple counting. Also, while the EPO data contained only patents applied for from 1986 onwards, the IPO data included all patents applied for and all those granted from 1990 onwards, including some applied for before 1990. A number of these patents had an unusually long lag between the priority year and the grant year, with some having priority dates as far back as 1975. Including such patents would bias estimated lives upwards, because it is the unusually long lag which causes them to be in the sample at all. All patents with priority dates earlier than 1986 were therefore removed. This brings the IPO data approximately into line with the EPO data, and is reasonable because the average time between application and granting for patents filed at the IPO is 4 to 5 years. However, patents filed in or after 1986 and granted before 1990 (i.e. patents with a shorter than usual lag) would be excluded, although any resulting bias is likely to be insignificant.
Variables giving each patent’s ‘birth year’ and ‘death year’ were created, the latter simply by extracting the year from the date-of-death variable. The ‘death age’ was taken as the difference between the two. Birth years are ‘priority’ years for IPO patents and ‘application’ years for EPO patents. These can differ because, when the applicant refers to a previous filing in another member state of a patent union, the priority date may be up to 12 months before the application date. These years are used rather than the date of granting because the R&D is likely to be useful while the Patent Office makes its decision.
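The preparation steps (removing repeated records of the same patent, dropping patents born before 1986, and deriving the age at death) might look like this in Python. The `PatentRecord` fields are hypothetical stand-ins for the IPO and PATSTAT variables, identifying duplicates by source and patent identifier is an assumption, and falling back to the application year when an IPO priority year is missing is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatentRecord:
    # Hypothetical fields standing in for the IPO and PATSTAT variables.
    patent_id: str
    source: str                   # "IPO" or "EPO"
    priority_year: Optional[int]
    application_year: int
    death_year: Optional[int]     # None while the patent is still in force

    @property
    def birth_year(self) -> int:
        """Priority year for IPO patents, application year for EPO patents."""
        if self.source == "IPO" and self.priority_year is not None:
            return self.priority_year
        return self.application_year  # assumed fallback if priority is missing

    @property
    def death_age(self) -> Optional[int]:
        if self.death_year is None:
            return None
        return self.death_year - self.birth_year


def prepare(records, first_birth_year=1986):
    """Drop repeated records of the same patent and patents born too early."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec.source, rec.patent_id)
        if key in seen:
            continue  # e.g. one PATSTAT row per status of the same patent
        seen.add(key)
        if rec.birth_year >= first_birth_year:
            kept.append(rec)
    return kept
```

For example, an IPO patent with a 1990 priority year that died in 1998 is kept with a death age of 8, while one with a 1975 priority year is dropped however late it was granted.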
Of the 203,043 patents remaining in the dataset after duplicates and patents born before 1986 were removed, 55,673 had died during the period covered. Of these, 92 per cent died because renewals ceased, while four per cent expired at the maximum duration. Other causes of death include revocation, surrender, and invalidation for reasons such as not filing a translation. Invalidation tends to happen at younger ages (under four years), but these patents were kept: they must have embodied some invention to be granted, and their owners presumably found that the costs of compliance needed to avoid revocation outweighed the expected benefits. They constitute only 0.1 per cent of dead patents.
The 74,356 patents that had been successfully matched to enterprise reference numbers through the ‘fuzzy matching’ process could then be matched to businesses in the BSD datasets to find their Standard Industrial Classification 2007 (SIC07) codes, which identify the industry in which the patent holder operates. Matching was made with BSD datasets from 2007 onwards only, because earlier years have SIC 2003 codes. There is no practical way to convert between the two classifications at the business level, because patent lives cannot be apportioned to industries using a conversion matrix in the way that expenditure or employment can. To maximise SIC07 matches, the BSD files for 2007 to 2011 were merged, which increased matches compared with using a single BSD file; 70,307 patents were successfully matched to SIC07 codes. Where a business’s SIC07 code had changed between years, the most recent code was taken. Table 1 summarises this process by source and death status.
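Merging several annual BSD files so that the latest available SIC07 code wins can be sketched with plain Python dictionaries. The nested `{year: {enterprise_ref: sic07}}` layout and the function names are hypothetical; the real BSD files are annual extracts with many more variables.

```python
def latest_sic(bsd_by_year):
    """Merge annual snapshots into one {enterprise_ref: sic07} lookup.

    `bsd_by_year` is a hypothetical {year: {enterprise_ref: sic07}} mapping.
    Years are applied oldest first, so a later code overwrites an earlier one.
    """
    merged = {}
    for year in sorted(bsd_by_year):
        merged.update(bsd_by_year[year])
    return merged


def attach_sic(patent_refs, sic_lookup):
    """Map patent IDs to SIC07 codes, skipping enterprises not found in any year."""
    return {pid: sic_lookup[ref] for pid, ref in patent_refs.items() if ref in sic_lookup}
```

Because an enterprise missing from one year's file can still be found in another, the merged lookup matches at least as many patents as any single file would.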
|Source and status||Total patents||Duplicates removed||Born 1986 onward||Matched to entity references||Matched to SICs|
|All sources - of which dead||56,275||56,275||55,673||24,268||21,978|
|IPO - of which dead||37,982||37,982||37,380||13,594||11,731|
|EPO - of which dead||18,293||18,293||18,293||10,674||10,247|
The patents were divided into 24 groups: the 21 standard industrial sections (Office for National Statistics, 2009), with ‘Scientific research and development’ (SIC 72) and software development (SIC 62.011 ‘Ready-made interactive leisure and entertainment software development’ and SIC 62.012 ‘Business and domestic software development’) identified as separate groups, plus a group for patents with no SIC information.
Industry matching was considerably more successful for EPO patents than for IPO patents (56 per cent compared with 19 per cent). This may be because more EPO patents belong to large companies, which are more easily matched, or because recording is more standardised in EPO systems. It could be a source of bias if the propensity to file at the IPO rather than the EPO (or vice versa) differs between industries along with patent lives. Table 2 compares the IPO and EPO patents with SIC07 codes by industry section. If businesses are as likely to file at the EPO as at the IPO (and if there is no systematic bias in the overall success of industry identification), the proportion of patents relating to each industry should be similar in the two sources.
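The comparison in Table 2 amounts to converting each source's counts by section into percentage shares and differencing them. A small Python sketch follows; the function names are hypothetical and the counts passed in are up to the caller.

```python
def shares(counts):
    """Convert {section: count} into {section: per cent of the total}."""
    total = sum(counts.values())
    return {section: 100.0 * n / total for section, n in counts.items()}


def share_differences(ipo_counts, epo_counts):
    """IPO share minus EPO share for each section, in percentage points."""
    ipo, epo = shares(ipo_counts), shares(epo_counts)
    return {s: ipo.get(s, 0.0) - epo.get(s, 0.0) for s in ipo.keys() | epo.keys()}
```

Differencing unrounded shares can give a result that differs by 0.1 from differencing the rounded table entries, which may explain why some rows of Table 2 (for example section G) do not subtract exactly.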
|Industrial section||IPO Patents||EPO Patents||Difference|
|Total patents with industries identified||21,376||48,931|
|Proportions of total||Per cent||Per cent||Percentage points|
|A - Agriculture, forestry, and fishing||1.0||0.2||0.8|
|B - Mining and quarrying||0.6||0.6||0.0|
|C - Manufacturing||50.0||53.1||-3.1|
|D - Electricity, gas, steam, and air conditioning supply||0.1||0.1||0.1|
|E - Water supply, sewerage, waste management and remediation activities||0.5||0.2||0.3|
|F - Construction||2.8||0.6||2.2|
|G - Wholesale and retail trade, repair of motor vehicles and motorcycles||10.1||7.1||2.9|
|H - Transportation and storage||1.4||0.8||0.6|
|I - Accommodation and food service activities||0.3||0.3||0.0|
|J - Information and communication (ex. software)||2.8||3.3||-0.6|
|J - Software||0.6||0.6||0.0|
|K - Financial and insurance activities||5.6||5.1||0.5|
|L - Real estate activities||0.7||0.2||0.5|
|M - Professional, scientific and technical activities (ex. R&D)||8.5||4.5||4.0|
|M - Research & Development||5.9||12.8||-6.9|
|N - Administrative and support activities||3.9||5.0||-1.1|
|O - Public administration and defence; compulsory social security||0.0||0.1||0.0|
|P - Education||1.1||3.4||-2.3|
|Q - Human health and social work activities||0.5||0.5||0.0|
|R - Arts, entertainment, and recreation||0.6||0.1||0.5|
|S - Other service activities||2.8||1.2||1.6|
|T - Activities of households as employers||0.1||0.0||0.0|
|U - Activities of extraterritorial organisations and bodies||-||-||-|
In both sources around half of the patents relate to manufacturing, although there is a disparity of around three percentage points between the sources. The greatest disparity is in ‘Research and development’, which comprises 12.8 per cent of EPO patents but only 5.9 per cent of IPO patents. This suggests that R&D businesses may be more likely to file at the EPO, perhaps because firms classified in this sector perform R&D for sale or are more likely to be the R&D branches of wider multinational corporations, and the EPO can grant protection across multiple European states. Conversely, the proportion of IPO patents in ‘Professional, scientific, and technical activities (excluding R&D)’ is almost double that of the EPO.
While these differences should be noted, it is necessary to assume in analysis that the industry-matched patents are representative of all patents and that there are no systematic issues with the identification of industries.
The patent values used by Tanriseven, van Rooijen-Horsten, & de Haan (2010) are from the ‘PatVal’ report (Gambardella, Giuri, & Mariani, 2005), which presented the findings of an EU-funded survey of the inventors of 27,531 patents granted by the EPO with priority dates between 1993 and 1997 (i.e. part of the period covered by the EPO data outlined above). Patents filed by inventors located in six participating countries were chosen: France, Germany, Italy, the Netherlands, Spain, and the UK. The survey covered 42 per cent of total EPO patents, with samples proportionate to each country’s share of total patents.
The UK achieved a 20 per cent response rate from 7,846 questionnaires, lower than the other countries. However, because of the larger volume of patents covered by the UK survey, the responses covered 1,540 patents, second only to Germany.
The survey used a hypothetical question which asked inventors the minimum price for which the patent’s owner should have been willing to sell the patent on the day it was granted. Respondents were asked to consider all information available at the time of response, including information arising after the day of granting; this should improve the precision of estimates, as the passage of time since granting is likely to improve understanding of the value of the patent. The survey took place around seven years after the latest application dates of patents in the sample, and in many cases the true value of the patent had probably become clear in that time. However, hypothetical questions are known to be susceptible to bias; for example, the prospect of selling to a competitor may elicit over-estimation. Further information on the PatVal report can be found in Annex 1.
Results are presented in Table 3, where the value ranges have been converted to average weights by assuming that values are symmetrically distributed within each range (i.e. that the central value of the range is its average); the open-ended top interval is assigned its lower bound. UK patents are skewed toward higher values than those of other EU countries. The UK held the largest share of licensed patents and of patents that gave rise to new firms and, as this suggests, was found to hold a large share of high-value patents. However, the sample design focused on patents that were more likely to be valuable, so such patents are relatively over-represented in the sample. This will over-emphasise longer-lived patents when this information is used for weighting.
|Value Interval||Average value||All||UK|
|'000 euro||'000 euro||Per cent||Per cent|
|0 - 30||15||8||5|
|30 - 100||65||17||11|
|100 - 300||200||21||16|
|300 – 1,000||650||22||22|
|1,000 – 3,000||2,000||15||23|
|3,000 – 10,000||6,500||10||12|
|10,000 – 30,000||20,000||4||6|
|30,000 – 100,000||65,000||2||3|
|100,000 – 300,000||200,000||1||1|
|More than 300,000||300,000||1||1|
To add weights to the dataset, the patents were sorted by age at death and the cumulative proportion of the sample was calculated for each patent. This was coupled with the UK patent value distribution from Table 3, so that an average value (weight) of €15,000 was given to the first five per cent of patents, €65,000 to the next 11 per cent, and so on. The thresholds between these groups were determined to 14 decimal places to maximise the accuracy of allocation and avoid bias.
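The weighting procedure can be sketched in Python using the UK distribution from Table 3 (average values in thousands of euros). This sketch uses exact `fractions.Fraction` arithmetic for the band thresholds as an alternative to fixing them at 14 decimal places. How patents with equal ages that straddle a threshold were handled is not stated, so here they are assigned in input order (an assumption); the name `value_weights` is hypothetical.

```python
from fractions import Fraction

# UK distribution from Table 3: (average value in '000 euro, per cent of patents).
UK_VALUE_BANDS = [
    (15, 5), (65, 11), (200, 16), (650, 22), (2_000, 23),
    (6_500, 12), (20_000, 6), (65_000, 3), (200_000, 1), (300_000, 1),
]


def value_weights(death_ages, bands=UK_VALUE_BANDS):
    """Return a weight for each patent, in the same order as `death_ages`.

    Patents are ranked by age at death; a patent whose cumulative proportion
    of the sample falls within a band's share receives that band's average
    value. Fractions keep the band thresholds exact. Equal ages are ranked in
    input order (an assumption; the original tie handling is not stated).
    """
    total_pct = sum(pct for _, pct in bands)
    order = sorted(range(len(death_ages)), key=lambda i: death_ages[i])
    n = len(order)
    weights = [0] * n
    band = 0
    upper = Fraction(bands[0][1], total_pct)  # cumulative share of bands so far
    for rank, i in enumerate(order, start=1):
        position = Fraction(rank, n)  # cumulative proportion up to this patent
        while position > upper and band < len(bands) - 1:
            band += 1
            upper += Fraction(bands[band][1], total_pct)
        weights[i] = bands[band][0]
    return weights
```

Sorting by age before assigning bands is what builds in the assumed perfect correlation between age and value: the longest-lived patent always receives the top weight of 300,000. Dividing by the band total also copes with published percentages that, after rounding, do not sum to exactly 100.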
Table 4 presents mean and median patent lives. With positively skewed data such as these, the median is preferred because it is not pulled upwards by extreme values in the right tail of the distribution. The median life for all patents is 8.0 years, and this is the same for patents matched to businesses. This is slightly longer than the seven years found by Tanriseven, van Rooijen-Horsten, & de Haan (2010). Median lives vary from 6.0 years in the software industry to 9.0 years in ‘Manufacturing’, ‘Transportation and storage’, ‘Research and development’, and ‘Human health and social work’. Median absolute deviations (MADs) are consistently very low, though they are known to be less powerful when applied to skewed data such as these (Rousseeuw & Croux, 1993).
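For reference, the median and the (unscaled) median absolute deviation can be computed with Python's `statistics` module. The example lives are hypothetical, and Rousseeuw & Croux (1993) usually multiply the MAD by a consistency constant of about 1.4826, which this sketch omits.

```python
import statistics


def median_and_mad(values):
    """Median and unscaled median absolute deviation of a sample."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    return med, mad


lives = [4, 6, 8, 8, 9, 12, 21]          # hypothetical ages at death
print(median_and_mad(lives))             # (8, 2)
print(round(statistics.mean(lives), 1))  # 9.7
```

The single 21-year life pulls the mean above the median but leaves both the median and the MAD unaffected.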
The mean life for all patents is 9.2 years, slightly shorter than the 9.5 years for industry-matched patents. Both estimates are below the 11.0 years found by the Australian Bureau of Statistics (2009). The maximum life is 21 years, which is expected since patents can usually be renewed until 20 years after the first anniversary of the filing date.
The data permit detailed industry section analysis, although sections O and T have sample sizes below 10, so their results should be treated as indicative only. Median patent lives vary between industrial sections, from 6.0 years in ‘Software’ to 9.0 years in four sections, while the longest mean life, 10.0 years, is in ‘Transportation and storage’. Standard deviations are relatively large, at just below four years on average.
|Industrial section||Patents||Mean life||Min. life||Max. life||St. Dev||Median life||Median absolute deviation|
|No industry identified||33,695||9.1||2||21||4.35||8.0||0.000|
|A - Agriculture, forestry, and fishing||163||8.7||4||21||4.13||8.0||0.000|
|B - Mining and quarrying||62||8.6||4||20||3.24||8.0||0.000|
|C - Manufacturing||12,040||9.5||2||21||4.33||9.0||0.000|
|D - Electricity, gas, steam, etc.||26||7.6||5||13||2.73||7.0||0.000|
|E - Water supply, sewerage, etc.||87||8.3||4||18||3.38||7.0||0.000|
|F - Construction||363||8.4||4||21||4.07||7.0||0.000|
|G - Wholesale and retail trade, etc.||1,873||9.5||2||21||4.28||8.0||0.000|
|H - Transportation and storage||326||10.0||4||21||4.57||9.0||0.000|
|I - Accommodation and food service||60||9.5||4||21||4.28||8.0||0.000|
|J - Information and comms (ex. software)||482||9.6||4||21||4.31||8.0||0.000|
|J - Software||40||7.1||4||15||2.61||6.0||0.000|
|K - Financial and insurance activities||1,647||9.7||3||21||4.69||8.0||0.000|
|L - Real estate activities||114||9.0||4||21||3.94||8.0||0.000|
|M - Professional, scientific and tech (ex. R&D)||1,147||8.4||4||21||3.92||7.0||0.000|
|M - Research & Development||1,637||9.7||4||21||4.25||9.0||0.000|
|N - Administrative and support activities||940||9.4||4||21||4.37||8.0||0.000|
|O - Public administration and defence, etc.||9||7.2||5||10||1.64||8.0||0.000|
|P - Education||359||9.2||4||21||3.71||8.0||0.000|
|Q - Human health and social work activities||109||9.9||4||21||4.37||9.0||0.000|
|R - Arts, entertainment, and recreation||77||7.9||4||20||3.69||7.0||0.000|
|S - Other service activities||410||9.1||4||21||4.12||8.0||0.000|
|T - Activities of households as employers||6||7.7||4||12||3.20||7.0||0.000|
|U - Activities of extraterritorial organisations||-||-||-||-||-||-||-|
Table 5 presents value-weighted averages alongside the unweighted estimates. As in the Netherlands, weighting makes median lives much longer, at 20.0 years for all patents. There is less variation between industrial sections, with 16 sections having weighted median lives of 20 years. The lowest value-weighted median life, 8.0 years, is found in ‘Public administration and defence’, though this section has a sample of only nine patents.
The value-weighted mean life is 18.8 years for patents matched to industries, slightly longer than for other patents. By industry, the longest weighted mean life, 19.5 years, is in ‘Financial and insurance activities’, while the shortest is again in ‘Public administration and defence’ (8.1 years). Software patents remain relatively short-lived, at 10.8 years.
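Given a weight for each patent, the value-weighted mean and median follow directly. In this Python sketch the weighted median is taken as the smallest age at which the cumulative weight reaches half the total; that convention is an assumption, since other definitions (for example, interpolating between ages) can give slightly different results. The function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    if not pairs:
        raise ValueError("weighted_median() needs at least one value")
    half = sum(w for _, w in pairs) / 2
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value


ages = [5, 8, 20]
value_wts = [15, 65, 300_000]              # hypothetical value weights ('000 euro)
print(weighted_median(ages, [1, 1, 1]))    # 8: equal weights give the ordinary median
print(weighted_median(ages, value_wts))    # 20: the heavy weight dominates
```

With very large weights on the longest-lived patents, the weighted median jumps towards the maximum age, which is the pattern seen in Table 5.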
Following Tanriseven, van Rooijen-Horsten, & de Haan (2010), the unweighted and weighted estimates have been treated as the lower and upper bounds of the range of possible lives, and their average taken. The Dutch unweighted median of seven years is below the UK figure of eight years, as is the Dutch weighted median of 18 years compared with 20 years in the UK. The UK ‘average of averages’ is therefore longer: 14.0 years compared with 12.5 years.
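Written out, the ‘average of averages’ is simply the midpoint of the unweighted and value-weighted estimates. The symbols below are introduced here for illustration; the same formula applied to means gives, for example, 14.1 years for ‘Manufacturing’ in Table 5.

```latex
\bar{L} = \tfrac{1}{2}\left(L_{\text{unweighted}} + L_{\text{weighted}}\right),
\qquad
\bar{L}_{\text{UK}} = \tfrac{1}{2}(8.0 + 20.0) = 14.0,
\qquad
\bar{L}_{\text{NL}} = \tfrac{1}{2}(7 + 18) = 12.5
```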
|Industrial section||Patents||Mean life||Value-weighted mean life||Average of averages||Median Life||Value-weighted Median life||Average of averages|
|No industry identified||33,695||9.1||18.7||13.9||8.0||20.0||14.0|
|A - Agriculture, forestry, and fishing||163||8.7||18.6||13.6||8.0||20.0||14.0|
|B - Mining and quarrying||62||8.6||17.1||12.9||8.0||20.0||14.0|
|C - Manufacturing||12,040||9.5||18.7||14.1||9.0||20.0||14.5|
|D - Electricity, gas, steam, etc.||26||7.6||11.2||9.4||7.0||13.0||10.0|
|E - Water supply, sewerage, etc.||87||8.3||13.4||10.8||7.0||13.0||10.0|
|F - Construction||363||8.4||19.0||13.7||7.0||20.0||13.5|
|G - Wholesale and retail trade, etc.||1,873||9.5||18.6||14.0||8.0||20.0||14.0|
|H - Transportation and storage||326||10.0||18.8||14.4||9.0||20.0||14.5|
|I - Accommodation and food service||60||9.5||18.6||14.0||8.0||20.0||14.0|
|J - Information and comms (ex. software)||482||9.6||17.9||13.8||8.0||19.0||13.5|
|J - Software||40||7.1||10.8||8.9||6.0||12.0||9.0|
|K - Financial and insurance activities||1,647||9.7||19.5||14.6||8.0||20.0||14.0|
|L - Real estate activities||114||9.0||18.6||13.8||8.0||20.0||14.0|
|M - Professional, scientific and tech (ex. R&D)||1,147||8.4||18.5||13.5||7.0||20.0||13.5|
|M - Research & Development||1,637||9.7||18.6||14.2||9.0||20.0||14.5|
|N - Administrative and support activities||940||9.4||18.9||14.2||8.0||20.0||14.0|
|O - Public administration and defence, etc.||9||7.2||8.1||7.7||8.0||8.0||8.0|
|P - Education||359||9.2||17.3||13.3||8.0||20.0||14.0|
|Q - Human health and social work activities||109||9.9||18.8||14.3||9.0||20.0||14.5|
|R - Arts, entertainment, and recreation||77||7.9||18.1||13.0||7.0||20.0||13.5|
|S - Other service activities||410||9.1||19.0||14.1||8.0||20.0||14.0|
|T - Activities of households as employers||6||7.7||11.0||9.3||7.0||12.0||9.5|
|U - Activities of extraterritorial organisations||-||-||-||-||-||-||-|
|- not published: low sample size|
Figure 1 presents survival profiles based on all patents which had died. The unweighted profile declines smoothly from age 3 onwards with a slight acceleration in deaths toward age 21. Weighting by assumed value shows that the overall value of patents declines only gradually to leave 70 per cent remaining at age 19. Because the weight given to the longest living patents is so high, the remaining value declines rapidly beyond this age. The average profile falls between the two, indicating a more gradual decline in the stock than the unweighted profile, with around 38 per cent surviving to 19 years of age. This is followed by a rapid decline towards the maximum age at death (21 years).
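Survival profiles of this kind can be computed by counting, for each age, the share of dead patents (or of their total assigned value) whose age at death is at least that age. The Python sketch below is illustrative only and the function names are hypothetical. Like the profiles described here, it uses completed lives alone and so ignores patents still in force; incorporating those would require a survival-analysis estimator such as Kaplan-Meier.

```python
def survival_profile(death_ages, weights=None, max_age=21):
    """Per cent of patents (or of total weight) surviving to each age 0..max_age.

    A patent counts as surviving to age a if its age at death is at least a.
    """
    if weights is None:
        weights = [1] * len(death_ages)
    total = sum(weights)
    return [
        100.0 * sum(w for age, w in zip(death_ages, weights) if age >= a) / total
        for a in range(max_age + 1)
    ]


def average_profile(unweighted, weighted):
    """Age-by-age mean of the unweighted and value-weighted profiles."""
    return [(u + w) / 2 for u, w in zip(unweighted, weighted)]
```

Because the weights concentrate on the longest-lived patents, the weighted profile stays high until close to the maximum age and then falls steeply, as described for Figure 1.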
The median service life of 8.0 years for all patents, and the accompanying survival distribution, are drawn from data representing all patents applied for at the EPO with UK addresses between 1986 and 1989, and UK patents applied for between 1990 and 2010, plus those patents applied for from 1986 onward but granted after 1990. As such it is, to all intents and purposes, a census of patents which died between 1986 and 2010 and thus estimates at this level should be highly accurate.
The first level of disaggregation is between the 21,978 dead patents that have been matched to an industry and the other 33,695 that have not. The second group contains all patents not related to any business, but it is also likely to hold a large number of patents that are owned by businesses but were not successfully matched. The independence of these two populations therefore cannot be guaranteed: if only some of a business's patents were matched, patents relating to that business may appear in both groups. There is no way to estimate the extent of this. The shorter mean life for patents with no industry (9.1 years) does not necessarily indicate bias in the industry matching process, because the unmatched group includes patents belonging to non-businesses such as charities or individuals (which might generally have shorter lives) as well as unmatched patents belonging to businesses. Furthermore, the success of matching may vary between industries.
Quantile-quantile (Q-Q) plots, together with statistical checks, showed that the data are significantly skewed and also show kurtosis. Box-plots identified over 70 outliers across 10 industries. The data are therefore not normally distributed, and so they violate the assumptions of parametric significance tests, which require data to be normally distributed and to have homogeneous variance. The optimal Box-Cox transformation (Box & Cox, 1964) was applied, but the data remained significantly skewed, with kurtosis and some instances of multimodality.
Kitchen (2009) explains that a single arbitrarily large outlier is enough to make the mean arbitrarily large. The median, by contrast, will not break down as long as only a minority of observations are corrupted. For skewed data such as these, the median is therefore likely to be a more appropriate measure of central location.
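Kitchen's breakdown argument can be shown in a few lines of Python. This illustrative sketch uses made-up ages at death rather than ONS data. Corrupting a single observation moves the mean by an order of magnitude but leaves the median unchanged.

```python
from statistics import mean, median


def location_summary(lives):
    """Return (mean, median) of a list of service lives in years."""
    return mean(lives), median(lives)


# Hypothetical ages at death (years) for nine patents.
lives = [6, 7, 7, 8, 8, 8, 9, 10, 11]
clean_mean, clean_median = location_summary(lives)

# Replace one observation with an arbitrarily large value.
corrupted = lives[:-1] + [1000]
bad_mean, bad_median = location_summary(corrupted)

print(f"clean:     mean={clean_mean:.2f}, median={clean_median}")
print(f"corrupted: mean={bad_mean:.2f}, median={bad_median}")
```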
Non-parametric tests are more robust to such violations. The Kruskal-Wallis H test for differences in the distributions of service lives across industries was statistically significant. Pairwise Mann-Whitney U comparisons of each industry section against all others grouped together suggest that patent lives in ‘Manufacturing’, ‘Construction’, ‘Software’, ‘Professional, Scientific, and Technical Activities’, ‘R&D’, and ‘Arts, Entertainment, and Recreation’ differ significantly from the wider population. Pairwise Mann-Whitney U tests between all industry sections identified statistically significant differences between the majority of industries.
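The sketch below shows a single one-versus-rest comparison. It assumes the usual normal approximation with a correction for ties, which are pervasive here because lives are recorded in whole years. The function names and sample lives are hypothetical. The annex mentions SPSS, whose implementation may differ in detail. Reporting the smaller of the two U statistics gives a z that is zero or negative, which matches the sign convention of Table 8.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
    counts = Counter(values)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def mann_whitney_one_vs_rest(section, rest):
    """Return (U, z) comparing one section's lives with all other lives.

    U is the smaller of the two U statistics, so z is zero or negative.
    The variance includes the standard correction for tied values.
    """
    n1, n2 = len(section), len(rest)
    pooled = list(section) + list(rest)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, z


# Hypothetical whole-year lives for one section and for all other sections.
section = [5, 6, 6, 7, 8, 8]
rest = [7, 8, 9, 9, 10, 11, 12, 12]
u, z = mann_whitney_one_vs_rest(section, rest)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
print(f"U = {u}, z = {z:.2f}, p = {p_two_sided:.4f}")
```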
Bootstrapping was used to estimate 95 per cent confidence intervals for the mean and median service lives. For unweighted estimates, these are shown in Tables 9 and 10 in Annex 2 where a detailed overview of the methods applied is given.
In general, for unweighted means, the confidence intervals shown are narrow, averaging 0.1 year in width for the various totals and 1.3 years across the various industrial sections. ‘Activities of households as employers’ is an exception, with a confidence interval 4.5 years wide (largely due to its small sample size).
Confidence intervals around the medians are generally wider. This is partly because the data are not strictly continuous, being in round years (as most responses were given in round years rather than years and months), and partly because the bootstrapping method is better suited to means (Hesterberg et al., 2003). Otherwise the picture is similar to that for the means, with the same industries showing the greatest uncertainty.
Weighting by patent values (not presented) has little effect on the estimated uncertainty around the various totals. For the industry section estimates, however, it increases the average width of the confidence intervals from 1.9 to 2.9 years for medians and from 1.3 to 2.5 years for means. The impact is particularly notable for ‘Mining and quarrying’ (an interval 10.0 years wide around the median life), ‘Arts, entertainment, and recreation’ (8.0 years), ‘Accommodation and food’ (7.0 years), and ‘Software’ (7.0 years).
The average confidence interval around the means increases from 1.3 years to 1.8 years with expenditure weighting. As might be expected, the ‘average of averages’ estimates are somewhere in between with an average confidence interval of 2.7 years around the median. The aforementioned industries continue to have the widest intervals, though these are reduced considerably compared to the weighted estimates.
The results obtained above suggest that unweighted average lives lie in the range of eight to ten years. However, these estimates focus only on the 55,673 patents that died during the observation period (1986 to 2010), and so they ignore the information available for all the other patents in the data set. The age at which these other patents died is not known, because no death was observed by 2010. It is known, however, that each survived at least to the age it had reached by 2010. For example, 7,670 patents (3.8 per cent) in this dataset were observed to have survived to ages of between 22 and 24 years by 2010, which is longer than the maximum observed age at death of 21 years. The Kaplan-Meier approach makes use of this information.
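Kaplan-Meier survival analysis methods can take account of such ‘right censoring’ of cases (Jager, van Dijk, Zoccali, & Dekker, 2008). The Kaplan-Meier estimator gives the nonparametric maximum likelihood estimate of the probability that a member of the population will have a lifetime exceeding each given age $t$:

$$\hat{S}(t) = \prod_{i:\,t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where $t_i$ are the ages at which deaths are observed, $d_i$ is the number of patents dying at age $t_i$, and $n_i$ is the number of patents still at risk (alive and under observation) at age $t_i$.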
The result is a series of ‘cumulative survivals’ that decline as patents die, similar to the survival curve presented in Figure 1. Censored patents are included for as many years as they were observed living. When observations are no longer available (i.e. for ages beyond that reached by 2010), the cases are removed from both the numerator and the denominator, so they have a neutral impact on the survival probability. In this way the model makes use of all the available information, not only the information relating to patents that had died.
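The product-limit calculation and the treatment of censored cases described above can be sketched in plain Python. This illustrative implementation assumes ages in whole years. It also assumes the common convention that a patent censored at age t still counts as at risk at t and drops out afterwards. The function name and toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(ages, died):
    """Kaplan-Meier (product-limit) survival estimates.

    ages: age in whole years at death, or the age reached by the end of
          observation for patents still alive (right-censored)
    died: True where the death was observed, False where censored
    Returns a list of (age, cumulative survival) at each age with deaths.
    """
    deaths = Counter(age for age, d in zip(ages, died) if d)
    leaving = Counter(ages)  # deaths and censorings both leave the risk set
    at_risk = len(ages)
    survival = 1.0
    curve = []
    for age in sorted(leaving):
        if deaths[age]:
            survival *= 1 - deaths[age] / at_risk
            curve.append((age, survival))
        # Patents censored at this age were counted as at risk; now drop them
        at_risk -= leaving[age]
    return curve


# Hypothetical patents: ages in years and whether death was observed.
ages = [2, 3, 3, 5, 6, 8, 8]
died = [True, True, False, True, False, True, False]
for age, s in kaplan_meier(ages, died):
    print(age, round(s, 3))
```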
Table 6 presents Kaplan-Meier average service lives which take account of the patents which were observed to exist but did not die before the end of the period in 2010. In survival analysis the median provides a better measure of central location than the mean due to the typically skewed nature of the data. The mean is a statement about the observed survival durations and should not be taken as a statement about how long a patent is expected to survive. Conversely, the median can be interpreted in such a way.
However, the median can only be calculated if the estimated survival function falls below 50 per cent. If it does not, the data do not define the median survival time and it cannot be calculated, because the method makes no inference about survival times beyond the range of times found in the data. This affects the calculation of medians for three industrial sections, and the calculation of confidence intervals in a further three cases (IBM, 2010).
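Given a Kaplan-Meier curve held as a list of (age, survival) steps, the median and the mean can be read off as in the sketch below. It assumes the median is the first age at which survival falls to 50 per cent or below; packages differ slightly in this convention. It returns None when that never happens, which mirrors the starred cells in Table 6. The mean is taken here as the area under the step curve up to the largest observed age. This shows why the mean describes the observed durations rather than an expected lifetime. The function names and curves are hypothetical.

```python
def km_median(curve):
    """First age at which estimated survival is 0.5 or below, else None."""
    for age, survival in curve:
        if survival <= 0.5:
            return age
    return None  # curve never reaches 50%: median undefined by the data


def km_restricted_mean(curve, max_age):
    """Area under the Kaplan-Meier step curve from age 0 to max_age.

    max_age is the largest age observed in the data (death or censoring),
    so the result summarises observed durations only.
    """
    area, prev_age, prev_surv = 0.0, 0, 1.0
    for age, survival in curve:
        area += prev_surv * (age - prev_age)
        prev_age, prev_surv = age, survival
    return area + prev_surv * (max_age - prev_age)


# Hypothetical curves as (age, cumulative survival) steps.
falls_below_half = [(2, 0.8), (3, 0.6), (5, 0.3)]
stays_above_half = [(4, 0.9), (9, 0.75), (15, 0.6)]
print(km_median(falls_below_half), km_restricted_mean(falls_below_half, 6))
print(km_median(stays_above_half), km_restricted_mean(stays_above_half, 21))
```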
All medians are considerably longer than those based on dead patents alone. The median life is 21 years for all patents and 20 years for all patents matched to industries. Industries continue to show variation: four industrial sections share the longest median life of 20 years, and a further three have a median survival of 19 years. The shortest median life (10 years) is in ‘Public administration and defence, compulsory social security’, which now benefits from a larger sample of 39.
|Industrial section||Patents||Died||Censored||Proportion censored||Mean||95% confidence interval ±||Median||95% confidence interval ±|
|A - Agriculture, forestry, and fishing||342||163||179||52.3||13.5||0.92||12.0||1.14|
|B - Mining and quarrying||436||62||374||85.8||20.3||0.87||*||*|
|C - Manufacturing||36,698||12,040||24,658||67.2||16.7||0.09||19.0||0.25|
|D - Electricity, gas, steam, etc.||56||26||30||53.6||14.2||2.50||11.0||3.19|
|E - Water supply, sewerage, etc.||184||87||97||52.7||13.6||1.25||12.0||2.17|
|F - Construction||886||363||523||59.0||15.1||0.62||14.0||1.39|
|G - Wholesale and retail trade, etc.||5,633||1,873||3,760||66.7||16.1||0.25||17.0||0.77|
|H - Transportation and storage||705||326||379||53.8||15.7||0.62||15.0||2.08|
|I - Accommodation and food service||206||60||146||70.9||16.4||1.38||16.0||4.00|
|J - Information and comms (ex. software)||2,226||482||1,744||78.3||17.5||0.45||19.0||0.99|
|J - Software||449||40||409||91.1||19.3||1.67||*||*|
|K - Financial and insurance activities||3,693||1,647||2,046||55.4||16.7||0.26||20.0||0.74|
|L - Real estate activities||246||114||132||53.7||15.2||1.08||15.0||3.13|
|M - Professional, scientific and tech (ex. R&D)||4,046||1,147||2,899||71.7||16.4||0.34||19.0||0.72|
|M - Research & Development||7,534||1,637||5,897||78.3||17.9||0.23||20.0||0.41|
|N - Administrative and support activities||3,265||940||2,325||71.2||17.5||0.32||20.0||*|
|O - Public administration and defence, etc.||39||9||30||76.9||11.8||2.12||10.0||*|
|P - Education||1,916||359||1,557||81.3||18.5||0.48||*||*|
|Q - Human health and social work activities||371||109||262||70.6||16.7||0.98||16.0||2.40|
|R - Arts, entertainment, and recreation||181||77||104||57.5||12.8||1.17||12.0||2.55|
|S - Other service activities||1,179||410||769||65.2||16.9||0.51||20.0||1.69|
|T - Activities of households as employers||15||6||9||60.0||11.6||2.45||12.0||2.05|
|U - Activities of extraterritorial organisations||-||-||-||-||-||-||-||-|
|- not published: low sample size|
The mean life for all patents is 17.9 years, while the mean for those with industry detail is lower, at 16.9 years. Again, both are considerably longer than the estimates based on dead patents. The shortest mean life is in ‘Arts, entertainment, and recreation’ (12.8 years), while the longest is in ‘Mining and quarrying’ (20.3 years). The 95 per cent confidence intervals are generally narrow, although they are wider in sections D, O, and T (up to five years in total width). This is likely to be related to the smaller sample sizes in these groups.
A cautionary note concerns the high proportion of censored data (72.6 per cent for all patents). This is known to affect the efficacy of the Kaplan-Meier method, since the less that is observed, the more the conclusions rely on the method itself. However, the hard limit of 21 years imposed by the renewal system should partly mitigate this possible downward bias. Data covering a longer period (e.g. 30 or more years) would improve the quality of the results. Even with the data available here, there will be less downward bias than when only dead patents are considered. Kaplan-Meier based estimates are therefore an improvement on the preceding analysis.
If one is willing to make the intellectual ‘leap of faith’ needed to link patents with R&D, patents provide observed data that can give insight into the useful life of the R&D they embody. Furthermore, through matching with industry information, patent data can provide insight at the industry section level and generally offer good sample sizes.
However, the assumptions required are considerable:
patents represent solely the outcomes of R&D, so that patent lives are representative of the R&D they embody
furthermore, in the absence of alternative information, patent lives are also representative of all other (unpatented) R&D (which would not be subject to the maximum patent life of 20 years, etc.)
patents are renewed only as long as they provide benefit; however, low fees may encourage some firms to renew by default
while observed data may make results more accurate, they are backward looking and will lag possible changes such as increasing rates of obsolescence
Tanriseven, van Rooijen-Horsten, & de Haan (2010) suggest making the additional assumption that patent lives and values are perfectly correlated, so that the value of patents can be taken into account. The correlation is likely to be considerable, but the assumption of perfect correlation is very strong. The authors acknowledge this and use ‘average of averages’ estimates to moderate its effect. A further issue is that patent value data are only available for a small number of European countries, so other countries cannot produce comparable estimates.
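A toy calculation shows the effect of weighting lives by value. This sketch assumes that the ‘average of averages’ is the simple mean of the unweighted and value-weighted mean lives. That reading, the function name, and the numbers are illustrative assumptions and do not reproduce the exact procedure of Tanriseven, van Rooijen-Horsten, & de Haan (2010).

```python
def mean_life(lives, weights=None):
    """Mean life in years, optionally weighted (e.g. by assumed patent value)."""
    if weights is None:
        return sum(lives) / len(lives)
    return sum(life * w for life, w in zip(lives, weights)) / sum(weights)


# Hypothetical lives (years) and values; longer-lived patents are assumed
# to be worth more, in the spirit of the perfect-correlation assumption.
lives = [4, 8, 12, 20]
values = [1, 2, 4, 10]

unweighted = mean_life(lives)
weighted = mean_life(lives, values)
average_of_averages = (unweighted + weighted) / 2  # assumed definition
print(f"{unweighted:.2f} {weighted:.2f} {average_of_averages:.2f}")
```

Because the heaviest weights go to the longest lives, the weighted mean exceeds the unweighted one. This is consistent with value-weighted estimates being treated as an upper bound.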
Practical conclusions are that if patent data covering a sufficiently long time period (ideally over 30 years) can be accessed, patent analysis may offer a quick and cost-effective method for deriving at least some indication of the possible life of R&D. Survival analysis techniques represent an improvement in the treatment of data and seem likely to yield less biased estimates of patent lives.
Australian Bureau of Statistics. (2009). Implementation of New International Statistical Standards. Canberra: Australian Bureau of Statistics.
Box, G., & Cox, D. (1964). An Analysis of Transformations. Journal of the Royal Statistical Society, 211-252.
Department for Business, Innovation, and Skills. (2012). First Findings from the UK Innovation Survey, 2011. London: BIS.
Federico, P. J. (1954). Renewal Fees and Other Patent Fees in Foreign Countries. Journal of the Patent Office Society, 827-861.
Gambardella, A., Giuri, P., & Mariani, M. (2005). The Value of European Patents.
Hawkins, D., & Wixley, R. (1986). A note on the transformation of chi-squared variables to normality. The American Statistician, 296-298.
Hesterberg, T., Monaghan, S., Moore, D. S., Clipson, A., & Epstein, R. (2003). Bootstrap Methods and Permutation Tests. W. H. Freeman.
Holm, S. (1979). A Simple Sequential Rejective Multiple Test Procedure. Scandinavian Journal of Statistics, 65-70.
IBM. (2010, December 6). Mean vs Median Survival Time in Kaplan-Meier estimate. Retrieved March 14, 2013, from IBM Support Portal: http://www-01.ibm.com/support/docview.wss?uid=swg21476688
Intellectual Property Office. (2012). Renewing Your Patent. Retrieved October 4, 2012, from Intellectual Property Office: http://www.ipo.gov.uk/types/patent/p-manage/p-renew.htm
Intellectual Property Office. (2012). What is a Patent? Retrieved March 20, 2013, from Intellectual Property Office: http://www.ipo.gov.uk/p-whatis.htm
Jager, K. J., van Dijk, P. C., Zoccali, C., & Dekker, F. W. (2008). The analysis of survival data: the Kaplan-Meier method. Kidney International, 560-565.
Ker, D. (2013a). Service Lives of R&D Assets: Background and Comparison of Approaches. Newport: Office for National Statistics.
Ker, D. (2013b). Service Lives of R&D Assets: Questionnaire Approach. Newport: Office for National Statistics.
Kirankabes, M. C. (2010). Relationship Between Gross Domestic Expenditure on R&D (GERD) and Patent Applications. Middle Eastern Finance and Economics.
Kitchen, C. (2009). Nonparametric vs Parametric Tests of Location in Biomedical Research. Los Angeles: UCLA School of Public Health.
OECD. (2010). Handbook on Deriving Capital Measures of Intellectual Property Products. Paris: OECD Publishing.
OECD. (2005). Oslo Manual: Guidelines for Collecting and Interpreting Innovation Data. Paris: OECD Publishing.
Office for National Statistics. (2012). Inter-Departmental Business Register (IDBR). Retrieved October 10, 2012, from Office for National Statistics: http://www.ons.gov.uk/ons/about-ons/who-we-are/services/unpublished-data/business-data/idbr/index.html
Office for National Statistics. (2009). UK Standard Industrial Classification of Economic Activities 2007. Basingstoke: Palgrave Macmillan.
Osborne, J. W. (2010). Improving your data transformations: applying the Box-Cox transformation. Practical Assessment, Research & Evaluation, 1-9.
PwC. (2012). 2012 Patent Litigation Study. Delaware: PwC.
Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association, 1273-1283.
Tanriseven, M., van Rooijen-Horsten, M., & de Haan, M. (2010). Capitalisation of R&D: Preparing the new ESA. The Hague: Statistics Netherlands.
Thomas, A. (2011). Investigating the Characteristics of Patents and the Businesses Which Hold Them. Economic & Labour Market Review, 68-86.
PatVal was a survey of the inventors of 27,531 patents granted by the EPO with priority dates between 1993 and 1997. It aimed to find out about the economic value of patents, as well as the characteristics of inventors, the inventive process, and the motivations for protecting inventions with patents (Gambardella, Giuri, & Mariani, 2005). Inventors were traced by matching details from the patent filing to telephone directories, and three pilot surveys were conducted to test both the matching and the questionnaire itself. In the UK, a professional polling company conducted the postal and online survey on behalf of the IPO. To limit respondent burden, ‘multiple inventors’ were asked about a maximum of five of their patents, and those with three or more patents were interviewed directly by the IPO. However, covering only five patents might lead to the omission of other important patents. Where filings listed multiple inventors, co-inventors were contacted if possible.
The sample targeted ‘important’ patents, meaning those that had been opposed and/or had received one or more citations. This provides more information on patents likely to be valuable. However, it limits how far the results can be generalised, because any factor correlated with patents’ ‘importance’ (such as value) is overrepresented in the results. This is another reason why Tanriseven, van Rooijen-Horsten, & de Haan (2010) treat estimates weighted by patent values as an ‘upper bound’. Although more important patents were filed in the UK than in other countries, only 15 per cent of UK patents were ‘important’, a lower proportion than elsewhere in Europe.
Information on the economic value of patents was gathered through questions on the financial and time costs of the invention and on the strategic value of the patent relative to others in the industry. Financial value was estimated through a hypothetical question on the minimum price at which the owner would have been willing to sell the patent to a competitor on the day it was granted. This gives the present value of the patent's expected benefits (revenues from products embodying the knowledge, licence fees, etc.). Respondents were asked to consider all information available at the time of response. This should improve the precision of estimates, as the passage of time since granting is likely to improve understanding of the patent's value. The survey took place around seven years after the latest application dates of patents in the sample, so in many cases the true value of the patent had probably become clear by then. However, hypothetical questions are known to be susceptible to bias. For example, the prospect of selling to a competitor may elicit over-estimation.
Furthermore, inventors may not have the best information about the value of the patent, especially in larger firms (which apply for around 70 per cent of patents). Managers, or staff in sales and licensing, may have better knowledge of patents’ commercial value. The authors ran one- and two-tailed t-tests on French cases where both inventor and organisation responses were received. They found that inventors in large firms (>250 employees) do appear to overestimate the value of their patents compared with managers, while those in smaller firms and research institutions do not. However, no systematic way could be found to identify appropriate contacts other than inventors, and the overall effect was small.
The patent data provide a census of all EPO and IPO patents between 1986 and 2010, and the matched and unmatched groups will not be independent, because some business patents will remain in the unmatched group. The main interest for statistical testing is therefore differences in lives and survival distributions between industries, as these will help to determine the R&D stocks held by different industries.
However, Q-Q plots and histograms suggested positive skew. Z-scores computed from the skewness and kurtosis statistics, presented in Table 7, confirmed this. The data are not normally distributed: there is significant positive skew in all industries except D, O, and T, which are notable for their comparatively low sample sizes. Nine groups also had significant kurtosis. Additionally, boxplots identified over 70 outliers across 10 industries. The raw data therefore violate the assumptions of parametric tests.
|Industrial section||Skewness||Z (skewness)||Significance||Kurtosis||Z (kurtosis)||Significance|
|A - Agriculture, forestry, and fishing||1.22||6.40||1%||0.89||2.35||5%|
|B - Mining and quarrying||1.30||4.27||1%||2.25||3.76||1%|
|C - Manufacturing||0.91||40.67||1%||0.09||1.91|||
|D - Electricity, gas, steam, etc.||0.84||1.85||||-0.42||-0.47|||
|E - Water supply, sewerage, etc.||0.96||3.73||1%||0.27||0.54|||
|F - Construction||1.26||9.82||1%||1.11||4.33||1%|
|G - Wholesale and retail trade, etc.||0.90||15.85||1%||0.05||0.47|||
|H - Transportation and storage||0.85||6.33||1%||-0.25||-0.94|||
|I - Accommodation and food service||0.94||3.04||1%||0.09||0.14|||
|J - Information and comms (ex. software)||0.84||7.52||1%||-0.34||-1.51|||
|J - Software||1.59||4.25||1%||2.39||3.26||1%|
|K - Financial and insurance activities||0.92||15.32||1%||-0.05||-0.42|||
|L - Real estate activities||1.25||5.52||1%||1.19||2.66||1%|
|M - Professional, scientific and tech (ex. R&D)||1.33||18.36||1%||1.38||9.58||1%|
|M - Research & Development||0.89||14.66||1%||0.05||0.40|||
|N - Administrative and support activities||1.02||12.75||1%||0.30||1.87|||
|O - Public administration and defence, etc.||-0.02||-0.02||||-0.44||-0.31|||
|P - Education||1.03||8.00||1%||0.77||3.01||1%|
|Q - Human health and social work activities||0.69||3.00||1%||-0.31||-0.68|||
|R - Arts, entertainment, and recreation||1.62||5.90||1%||2.46||4.54||1%|
|S - Other service activities||1.09||9.08||1%||0.71||2.94||1%|
|T - Activities of households as employers||0.45||0.53||||-1.55||-0.89|||
|U - Activities of extraterritorial organisations||-||-||-||-||-||-|
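The Z-scores in Table 7 are the skewness and kurtosis statistics divided by their standard errors. The sketch below assumes the standard-error formulas commonly used by statistical packages; the paper does not state which it used. Applied to section A (163 dead patents in Table 6), these formulas closely reproduce the published row. The small gap in the skewness Z presumably reflects rounding of the published skewness. The 1.96 and 2.58 cut-offs are the usual two-sided 5 and 1 per cent critical values. Treating them as the basis of Table 7's significance column is an assumption, although it agrees with every row.

```python
import math


def skew_kurt_z(n, skewness, kurtosis):
    """Z-scores for sample skewness and (excess) kurtosis.

    Uses the standard-error formulas commonly reported by statistics
    packages for sample skewness and kurtosis.
    """
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skewness / se_skew, kurtosis / se_kurt


def significance(z):
    """Two-sided significance label in the style of Table 7."""
    if abs(z) >= 2.58:
        return "1%"
    if abs(z) >= 1.96:
        return "5%"
    return ""


# Section A: 163 dead patents (Table 6), skewness 1.22, kurtosis 0.89 (Table 7).
z_skew, z_kurt = skew_kurt_z(163, 1.22, 0.89)
print(f"{z_skew:.2f} {significance(z_skew)}  {z_kurt:.2f} {significance(z_kurt)}")
```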
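The Box-Cox family of power transformations can be used to normalise data, including chi-squared distributed data (Hawkins & Wixley, 1986). The method outlined by Osborne (2010) was followed, in which the data ($y$) were transformed such that:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1ex] \ln y, & \lambda = 0 \end{cases}$$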
A number of different values of lambda were tried, and these suggested that λ = -0.07 would optimally transform the data. This reduced the skewness of the dataset as a whole to 0.000 (SE = 0.017). However, the data remained strongly platykurtic, with a kurtosis of -0.763 (SE = 0.03), and histograms did not show a normal distribution. Furthermore, box-plots showed outliers remaining in three sectors, which violates another condition of parametric tests. Non-parametric tests were therefore performed.
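The transformation and a search over candidate λ values can be sketched as follows. The text reports that the chosen λ minimised skewness, so this sketch assumes a grid search that minimises absolute skewness over λ in [-2, 2]. That criterion and the data are assumptions. Maximum likelihood is a more common criterion, and it is what `scipy.stats.boxcox` uses when no λ is supplied. Box-Cox requires strictly positive values, which holds for lives of at least one year.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def best_lambda(y, grid=None):
    """Grid-search lambda in [-2, 2] minimising |skewness| after transforming."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # step 0.01; includes exactly 0.0
    return min(grid, key=lambda lam: abs(skewness([box_cox(v, lam) for v in y])))


# Hypothetical right-skewed lives in years.
lives = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11, 15, 20]
lam = best_lambda(lives)
print(lam, round(skewness([box_cox(v, lam) for v in lives]), 3))
```

As the paper found, removing skewness in this way does not guarantee normality, so kurtosis and outliers still need checking after transformation.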
The Kruskal-Wallis H test checks for statistically significant differences between the distributions of the industrial section groups. It does so by evaluating differences in mean ranks using a chi-squared statistic. Patent age at death differed significantly between industry groups, χ²(21) = 175.978, p < 0.005. Boxplots were used to check the data against the underlying assumption of the Kruskal-Wallis test, namely that groups have similar distributions around their respective medians. ‘Electricity, gas, steam, and air conditioning supply’, ‘Software’, and ‘Public administration and defence’ had markedly different distributions from the other groups. These groups were notable for their relatively low sample sizes of 26, 40, and 9 respectively, and the chi-squared statistic is more accurate with larger samples. When the test was re-run with these groups excluded, it remained highly significant, χ²(18) = 155.339, p < 0.005.
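The H statistic itself can be computed from pooled ranks, as in the sketch below. It assumes the standard tie correction, which matters here because lives are recorded in whole years. The section data are hypothetical. H is compared with a chi-squared distribution on k - 1 degrees of freedom. A p-value needs the chi-squared survival function (for example `scipy.stats.chi2.sf`). `scipy.stats.kruskal` offers an independent cross-check that also applies the tie correction.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
    counts = Counter(values)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1


# Hypothetical whole-year lives for three industry sections.
sections = [[5, 6, 7, 7, 8], [8, 9, 9, 10, 12], [6, 8, 9, 11, 13, 14]]
h, df = kruskal_wallis(sections)
print(f"H = {h:.3f} on {df} degrees of freedom")
```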
Pairwise Mann-Whitney U comparisons of each industry section against all other patents together suggest that patent lives in ‘Manufacturing’, ‘Construction’, ‘Software’, ‘Professional, Scientific, and Technical Activities’, ‘R&D’, and ‘Arts, Entertainment, and Recreation’ are significantly different, as shown in Table 8. These results have been adjusted using Holm’s sequential Bonferroni procedure (Holm, 1979). This procedure ranks the p-values and compares each against a progressively less stringent threshold, which reduces the chance of a Type I error across the full set of comparisons.
|Industrial Section||U||Z||Asymptotic Significance||Bonferroni-adjusted sig.*|
|A - Agriculture, forestry, and fishing||1,572,841||-2.55||0.01|
|B - Mining and quarrying||636,819||-0.86||0.39|
|C - Manufacturing||58,039,756||-3.82||0.00||1%|
|D - Electricity, gas, steam, etc.||216,082||-2.15||0.03|
|E - Water supply, sewerage, etc.||815,123||-2.33||0.02|
|F - Construction||3,306,060||-5.17||0.00||1%|
|G - Wholesale and retail trade, etc.||18,551,446||-1.05||0.29|
|H - Transportation and storage||3,319,752||-1.85||0.06|
|I - Accommodation and food service||655,441||-0.04||0.97|
|J - Information and comms (ex. software)||5,092,074||-0.64||0.52|
|J - Software||289,262||-3.74||0.00||1%|
|K - Financial and insurance activities||16,394,198||-1.41||0.16|
|L - Real estate activities||1,188,330||-0.86||0.39|
|M - Professional, scientific and tech (ex. R&D)||10,105,157||-8.83||0.00||1%|
|M - Research & Development||15,790,720||-3.49||0.00||1%|
|N - Administrative and support activities||9,745,049||-0.75||0.45|
|O - Public administration and defence, etc.||72,882||-1.37||0.17|
|P - Education||3,850,328||-0.25||0.80|
|Q - Human health and social work activities||1,120,476||-1.08||0.28|
|R - Arts, entertainment, and recreation||641,326||-3.65||0.00||1%|
|S - Other service activities||4,262,040||-1.26||0.21|
|T - Activities of households as employers||51,091||-0.96||0.34|
|U - Activities of extraterritorial organisations||-||-||-||-|
More detailed full pairwise comparisons (not presented) showed that the significant differences between industrial sections are concentrated in these groups. However, there were no clear patterns marking individual sections as consistently different from others, although ‘Construction’ had the most significant differences (7).
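Holm's sequential adjustment used for Table 8 can be sketched briefly. The smallest p-value is compared with α/m, the next with α/(m − 1), and so on, stopping at the first failure. The sketch applies this to the rounded asymptotic significances in Table 8, assuming a family-wise α of 0.05. Under that assumption it flags the same six sections as the table. The dictionary labels are shorthand introduced here.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # first failure: this and all larger p-values are retained
        reject[i] = True
    return reject


# Asymptotic significances from Table 8 (rounded to two decimal places).
table8 = {
    "A": 0.01, "B": 0.39, "C": 0.00, "D": 0.03, "E": 0.02, "F": 0.00,
    "G": 0.29, "H": 0.06, "I": 0.97, "J (ex. software)": 0.52,
    "J - Software": 0.00, "K": 0.16, "L": 0.39, "M (ex. R&D)": 0.00,
    "M - R&D": 0.00, "N": 0.45, "O": 0.17, "P": 0.80, "Q": 0.28,
    "R": 0.00, "S": 0.21, "T": 0.34,
}
flags = holm_reject(list(table8.values()))
print([name for name, f in zip(table8, flags) if f])
```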
As is often the case, the analysis presented here is based on only one sample from the population of interest, namely all UK patents across different industry groups. However, statistical testing is based on the sampling distribution of the statistic of interest (mean, median), which can only be found by taking repeated samples from the target population. Parametric tests, and to a lesser extent non-parametric tests, make assumptions about the sampling distribution of the test statistic. Evaluation of these data has shown strong positive skew and kurtosis. The central limit theorem states that the sampling distribution of the mean tends to normality as sample sizes increase, but this convergence may be slow for strongly skewed data.
Bootstrapping provides an alternative tool that uses the distribution of the data itself rather than relying on a theoretical distribution which may not hold. By taking 10,000 repeated re-samples of the composite lives with replacement a bootstrap sampling distribution of 10,000 sample statistics (means, medians) was created for each R&D type and industry section. Resample sizes equal the number of responses in the observed sample for that group. Weighting prior to this process produces a bootstrap distribution of the weighted statistic of interest.
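The resampling procedure can be sketched with Python's standard library in place of SPSS. This illustrative version returns the quantities reported in Tables 9 and 10: the sample statistic, a percentile 95 per cent interval, the bootstrap standard error, and the bias estimate (mean of the bootstrap distribution minus the sample statistic). It assumes a nearest-rank percentile rule, and the seed and data are hypothetical. Weighting, which the text describes as applied before resampling, is omitted for simplicity.

```python
import random
import statistics


def bootstrap(sample, stat=statistics.fmean, n_resamples=10_000, seed=2013):
    """Percentile bootstrap for a statistic of one group's lives.

    Returns (estimate, ci_lower, ci_upper, standard_error, bias), where bias is
    the mean of the bootstrap distribution minus the sample estimate.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimate = stat(sample)
    # Resamples are the same size as the observed sample, drawn with replacement
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    # Nearest-rank 2.5th and 97.5th percentiles of the bootstrap distribution
    lower = boot[round(0.025 * (n_resamples - 1))]
    upper = boot[round(0.975 * (n_resamples - 1))]
    se = statistics.stdev(boot)
    bias = statistics.fmean(boot) - estimate
    return estimate, lower, upper, se, bias


# Hypothetical ages at death (whole years) for one industry section.
lives = [7, 8, 8, 9, 10, 6, 9, 8, 7, 11] * 5
est, lo, hi, se, bias = bootstrap(lives)
print(f"mean {est:.1f}, 95% CI {lo:.1f} to {hi:.1f}, SE {se:.2f}, bias {bias:.3f}")
```

Passing `stat=statistics.median` gives the corresponding median estimates.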
Provided the sample is sufficiently large, ‘its shape and spread don’t depend heavily on the original sample and mimic the shape and spread of the sampling distribution’ (Hesterberg et al., 2003). As usual, however, results will be more reliable with larger samples. This is because ‘bootstrap distributions do not have the same centre as the sampling distribution; they mimic bias, not the actual centre’ (ibid.), and because the method implicitly assumes that the distribution of the sample data represents the true population distribution. Here the smallest sample was 45 in ‘J – Information and Communication (ex. software)’. With enough resamples, bootstrapping introduces very little variation beyond the original sampling variation, so ‘we can rely on a bootstrap distribution to inform us about the shape, bias, and spread of the sampling distribution’ (ibid.).
This bootstrap sampling distribution is used to draw conclusions about the distribution of the sample means. Histograms showed varying degrees of non-normality with combinations of skewness, kurtosis, and multimodality – especially in the bootstrap distributions of expenditure weighted means. Table 9 presents means, bootstrap 95 per cent confidence intervals, standard errors, and bias estimators for the unweighted estimates. These are based on 10,000 resamples except in the case of the totals for which computational limitations permitted only 1,000 resamples. Even so, these results should still be sufficiently robust.
The 95% confidence intervals around the means for the large totals are very narrow. They are estimated using the percentile method, because more robust methods are unavailable in SPSS 12. However, percentile intervals perform better than t-distribution based intervals when there is skewness, and they also improve with larger samples (Hesterberg et al., 2003), so the method should perform acceptably here. For industry sections, the confidence intervals average 1.3 years in width. The widest is in ‘Activities of Households as Employers’ (4.5 years), which has a notably small sample. Bootstrap standard errors are relatively small, and there is little evidence of bias in the observed means.
|Industrial section||Sample Mean life||Bootstrap 95% CI Lower||Bootstrap 95% CI Upper||Mean of bootstrap distribution (of mean life)||Bootstrap standard error||Bootstrap bias estimate|
|No industry identified||9.1||9.1||9.1||9.1||0.0||0.0|
|A - Agriculture, forestry, and fishing||8.7||8.1||9.3||8.7||0.3||0.0|
|B - Mining and quarrying||8.6||7.9||9.5||8.6||0.4||0.0|
|C - Manufacturing||9.5||9.5||9.6||9.5||0.0||0.0|
|D - Electricity, gas, steam, etc.||7.6||6.6||8.7||7.6||0.5||0.0|
|E - Water supply, sewerage, etc.||8.3||7.6||9.0||8.3||0.4||0.0|
|F - Construction||8.4||8.0||8.9||8.4||0.2||0.0|
|G - Wholesale and retail trade, etc.||9.5||9.3||9.7||9.5||0.1||0.0|
|H - Transportation and storage||10.0||9.5||10.5||10.0||0.3||0.0|
|I - Accommodation and food service||9.5||8.4||10.6||9.5||0.6||0.0|
|J - Information and comms (ex. software)||9.6||9.2||10.0||9.6||0.2||0.0|
|J - Software||7.1||6.3||7.9||7.1||0.4||0.0|
|K - Financial and insurance activities||9.7||9.5||10.0||9.7||0.1||0.0|
|L - Real estate activities||9.0||8.3||9.7||9.0||0.4||0.0|
|M - Professional, scientific and tech (ex. R&D)||8.4||8.2||8.7||8.4||0.1||0.0|
|M - Research & Development||9.7||9.5||9.9||9.7||0.1||0.0|
|N - Administrative and support activities||9.4||9.1||9.7||9.4||0.1||0.0|
|O - Public administration and defence, etc.||7.2||6.2||8.2||7.2||0.5||0.0|
|P - Education||9.2||8.9||9.6||9.2||0.2||0.0|
|Q - Human health and social work activities||9.9||9.1||10.7||9.9||0.4||0.0|
|R - Arts, entertainment, and recreation||7.9||7.1||8.7||7.9||0.4||0.0|
|S - Other service activities||9.1||8.7||9.5||9.1||0.2||0.0|
|T - Activities of households as employers||7.7||5.5||10.0||7.7||1.2||0.0|
|U - Activities of extraterritorial organisations||-||-||-||-||-||-|
Median life estimates are preferred as a measure of central location because they are less susceptible to upward bias. However, bootstraps for medians rely on the few data points around the centre of the sample, so bootstrap estimates for the median are less reliable than those for the mean (Hesterberg et al., 2003). This is less of a problem with sample sizes over 100, so the method should perform well in the majority of cases. However, industries D, O, and T contain fewer than 100 patents, and these potential limitations must be considered when interpreting bootstrap parameters for the medians.
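Because lives are recorded in whole years, the bootstrap distribution of a median can take only a handful of values. This is one concrete way in which a median bootstrap depends on the few observations near the centre. The sketch below illustrates this with simulated, hypothetical data. It compares the number of distinct bootstrap medians with the number of distinct bootstrap means.

```python
import random
import statistics


def bootstrap_distribution(sample, stat, n_resamples, rng):
    """Values of `stat` over resamples drawn with replacement."""
    return [stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)]


rng = random.Random(7)  # fixed seed; data below are simulated, not ONS data
lives = [rng.randint(3, 15) for _ in range(60)]  # hypothetical whole-year lives

medians = bootstrap_distribution(lives, statistics.median, 5000, rng)
means = bootstrap_distribution(lives, statistics.fmean, 5000, rng)

# Whole-year data give medians that can only be whole or half years,
# so the bootstrap distribution of the median takes very few distinct values.
print("distinct bootstrap medians:", sorted(set(medians)))
print("distinct bootstrap means:  ", len(set(means)))
```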
The medians in Table 10 tell a similar story to the means, with generally narrow confidence intervals, low median absolute deviations (MADs), and no evidence of bias. However, MADs are known to be relatively inefficient and problematic with skewed data such as these (Rousseeuw & Croux, 1993).
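In general, the unweighted means and medians appear relatively robust, with generally narrow confidence intervals (1.9 years wide on average) and no evidence of bias. A notable exception is section U, ‘Activities of extraterritorial organisations’, which has a very low sample size and for which no estimates are shown. Section T, ‘Activities of households as employers’, also has much wider intervals than the other sections (5.5 to 10.0 years for the mean and 4.5 to 11.5 years for the median).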
|Industrial section||Sample median life||Bootstrap 95% CI Lower||Bootstrap 95% CI Upper||Mean of bootstrap distribution (of median life)||Bootstrap standard error||Bootstrap bias estimate|
|No industry identified||8.0||8.0||8.0||8.0||0.0||0.0|
|A - Agriculture, forestry, and fishing||8.0||7.0||8.0||8.0||0.0||0.0|
|B - Mining and quarrying||8.0||7.0||9.0||8.0||0.0||0.0|
|C - Manufacturing||9.0||8.0||9.0||9.0||0.0||0.0|
|D - Electricity, gas, steam, etc.||7.0||5.0||8.0||7.0||0.0||0.0|
|E - Water supply, sewerage, etc.||7.0||6.0||8.0||7.0||0.0||0.0|
|F - Construction||7.0||7.0||8.0||7.0||0.0||0.0|
|G - Wholesale and retail trade, etc.||8.0||8.0||9.0||8.0||0.0||0.0|
|H - Transportation and storage||9.0||8.0||10.0||9.0||0.0||0.0|
|I - Accommodation and food service||8.0||7.0||10.0||8.0||0.0||0.0|
|J - Information and comms (ex. software)||8.0||8.0||9.0||8.0||0.0||0.0|
|J - Software||6.0||6.0||7.0||6.0||0.0||0.0|
|K - Financial and insurance activities||8.0||8.0||9.0||8.0||0.0||0.0|
|L - Real estate activities||8.0||7.0||9.0||8.0||0.0||0.0|
|M - Professional, scientific and tech (ex. R&D)||7.0||7.0||8.0||7.0||0.0||0.0|
|M - Research & Development||9.0||8.0||9.0||9.0||0.0||0.0|
|N - Administrative and support activities||8.0||8.0||9.0||8.0||0.0||0.0|
|O - Public administration and defence, etc.||8.0||5.0||8.0||8.0||0.0||0.0|
|P - Education||8.0||8.0||9.0||8.0||0.0||0.0|
|Q - Human health and social work activities||9.0||8.0||11.0||9.0||0.0||0.0|
|R - Arts, entertainment, and recreation||7.0||6.0||8.0||7.0||0.0||0.0|
|S - Other service activities||8.0||8.0||9.0||8.0||0.0||0.0|
|T - Activities of households as employers||7.0||4.5||11.5||7.0||0.0||0.0|
|U - Activities of extraterritorial organisations||-||-||-||-||-||-|