This paper reports the findings of a feasibility study in which data from the Annual Enterprise Survey (AES) were linked to data from the Linked Employer-Employee Database (LEED) and enterprise-level measures of labour productivity and financial performance were constructed. An assessment was then made of the strengths and weaknesses of these data for research purposes. The overall purpose of the exercise was to explore the benefits that may be gained from linking business data to LEED.
In the course of the project, many useful insights were gained into the strengths and weaknesses of both AES and LEED at unit record level. This paper summarises those insights for the benefit of researchers who are considering the use of AES microdata in their own research, with or without other data sources such as LEED.
The paper covers issues such as the size and representativeness of the AES sample; whether AES can provide a representative longitudinal sample for use in longitudinal analysis; the longitudinal correlation of AES responses; the quality of the match obtained between AES and LEED records; and whether labour productivity measures that are constructed using AES measures of value-added and LEED employment data are comparable with other firm-level labour productivity measures.
The findings of the investigation indicate that a longitudinally-linked AES-LEED dataset is complete enough and of sufficiently good quality to be used in exploring a class of research problems that require longitudinal enterprise data. However, there is measurement error in the data, caused by non-response in AES and other data collection limitations. Researchers need to be aware of the data quality issues that exist and take care when drawing inferences from the data.
Key Results
This paper has investigated the potential benefits of using unit record data from the Annual Enterprise Survey in conjunction with unit record data from the Linked Employer-Employee Database, to construct an enterprise-level dataset containing measures of labour productivity and financial performance. A trial dataset linking AES and LEED records was created and its strengths and weaknesses were assessed. The dataset included all privately-owned profit-oriented firms with employees. This section summarises the main findings of the investigation.
Strengths and weaknesses of the data sources
LEED is a rich source of employee data. It is considered to be the most accurate source of earnings data currently available. Its strengths include the monthly unit of observation and the fact that records are available for all wage and salary earners in the economy, and the enterprises that employed them. Significant limitations include the period of time covered by LEED at present (1999–2006) and the absence of a measure of hours worked per employee. The latter means that hourly wage estimates cannot be derived, either at individual or firm level. The poor measurement of the quantity of labour inputs makes it difficult to interpret results involving any variable or relationship that is likely to be influenced by variations in the quantity of labour inputs. Another limitation is that working proprietor measures are still under development.
AES provides detailed measures of sales, gross output, value-added, intermediate inputs, income, expenditure, and profits. AES data are derived from responses to industry-specific postal questionnaires (the AES survey), and from the company accounts data that are provided to Inland Revenue on IR10 forms. Because of the detailed approach taken, output and value-added are likely to be measured more accurately in AES than in any other data source.
Some data limitations arise because AES uses a stratified sample and a postal questionnaire to gather data from limited liability companies. The sample is stratified by industry and firm size. Sampling fractions tend to be high for large companies, so most of these are included, but they are much lower for small and medium-sized firms. As a result, most small and medium-sized limited liability firms in the economy are not included in the AES sample. In 2004, around 10,000 firms with employees that were classified as limited liability companies, joint ventures or companies incorporated overseas were sampled, from a total population that we roughly estimate at 90,000. At a detailed industry level, the AES sample may be too small for some types of analysis.
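As a rough illustration of what these sampling fractions imply, the sketch below assumes three hypothetical size strata (ignoring industry) whose totals match the approximate 2004 figures quoted above: about 10,000 sampled firms out of an estimated 90,000. The split across strata is invented, not taken from AES; the point is only that a firm's design weight is the inverse of its stratum's sampling fraction, so each sampled small firm stands in for many more population firms than a sampled large one.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """A hypothetical sampling stratum: population count N and sample count n."""
    name: str
    population: int
    sampled: int

    @property
    def sampling_fraction(self) -> float:
        # n / N: the probability that a firm in this stratum is selected.
        return self.sampled / self.population

    @property
    def design_weight(self) -> float:
        # N / n: how many population firms each sampled firm represents.
        return self.population / self.sampled


# Overall fraction implied by the approximate 2004 figures in the text.
overall_fraction = 10_000 / 90_000

# Invented strata whose totals match those figures: large firms are fully
# enumerated, while small firms are sampled sparsely.
strata = [
    Stratum("small", population=80_000, sampled=4_000),
    Stratum("medium", population=8_000, sampled=4_000),
    Stratum("large", population=2_000, sampled=2_000),
]

for s in strata:
    print(f"{s.name}: fraction={s.sampling_fraction:.2f}, weight={s.design_weight:.1f}")
```

In this invented split, each sampled small firm carries a weight of 20, while each large firm represents only itself.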
AES uses IR10 returns to obtain data on sole proprietorships, partnerships and businesses in agriculture. This approach leads to a fairly high level of coverage of the population, but the data quality tends to be poorer, and key variables are not measured in as much detail as in the postal AES survey.
The AES sample design means that weights must be used to derive population estimates. The current survey weights were designed for a specific purpose (to give good industry-level and national estimates of firms’ financial performance and financial position), and they may not be ideal for all estimation purposes.
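A minimal sketch of how weights enter a population estimate, assuming hypothetical unit records that carry a survey weight and a value-added figure: the estimated total is the weight-expanded sum of the sample values. The record type and the figures are illustrative assumptions, not the AES file layout or actual AES data.

```python
from typing import Iterable, NamedTuple


class SampledFirm(NamedTuple):
    """Hypothetical unit record: survey weight and value-added ($000)."""
    weight: float
    value_added: float


def weighted_total(firms: Iterable[SampledFirm]) -> float:
    """Estimate a population total as the weight-expanded sum of sample values."""
    return sum(f.weight * f.value_added for f in firms)


sample = [
    SampledFirm(weight=20.0, value_added=150.0),    # small firm, represents 20
    SampledFirm(weight=2.0, value_added=2_000.0),   # medium firm, represents 2
    SampledFirm(weight=1.0, value_added=50_000.0),  # large firm, represents itself
]

unweighted = sum(f.value_added for f in sample)
estimate = weighted_total(sample)
print(unweighted, estimate)
```

Here the unweighted sum (52,150) understates the weighted estimate (57,000) because the small and medium firms each stand in for several population firms.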
Other issues arise from the fact that one-third of AES records are imputed because of non-response (or because the response provided did not meet Statistics NZ editing checks). This paper compared the non-imputed subsample with the total AES sample. The non-imputed subsample appears to represent a reasonably balanced cross-section of enterprises in the full sample, but researchers need to be aware of the potential for sample biases if imputed records are dropped. In particular, smaller firms are under-represented in the non-imputed sample. Consideration could be given to the development and application of modified weights, to make the non-imputed sample match the intended population more closely on key attributes such as size and industry.
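One common way to build such modified weights is a weighting-class (cell-based) non-response adjustment, sketched below under stated assumptions: records are grouped into hypothetical industry-by-size cells, and the weights of non-imputed records in each cell are scaled up so that they sum to the total weight of all sampled records in that cell. This is a standard survey technique offered for illustration, not a description of any Statistics NZ procedure.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    firm_id: str
    cell: tuple[str, str]  # hypothetical (industry, size class) adjustment cell
    weight: float          # original survey weight
    imputed: bool


def adjust_weights(records: list[Record]) -> dict[str, float]:
    """Scale non-imputed weights so that, within each cell, they sum to the
    total weight of all sampled records in that cell."""
    total: dict[tuple[str, str], float] = defaultdict(float)
    responding: dict[tuple[str, str], float] = defaultdict(float)
    for r in records:
        total[r.cell] += r.weight
        if not r.imputed:
            responding[r.cell] += r.weight

    adjusted = {}
    for r in records:
        if not r.imputed:
            adjusted[r.firm_id] = r.weight * total[r.cell] / responding[r.cell]
    return adjusted


records = [
    Record("a", ("manufacturing", "small"), 20.0, imputed=False),
    Record("b", ("manufacturing", "small"), 20.0, imputed=True),
    Record("c", ("manufacturing", "small"), 20.0, imputed=False),
    Record("d", ("manufacturing", "large"), 1.0, imputed=False),
]
adjusted = adjust_weights(records)
```

Cells that contain no non-imputed records simply lose their weight in this sketch; in practice such cells would need to be collapsed with neighbouring cells before adjusting.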
Although AES gathers data on firms’ assets at balance-sheet values, it was not designed to measure productive capital assets or the flow of capital services used by firms. This could be a significant limitation for some research purposes.
AES-LEED record matching
The vast majority of enterprises in AES with employees can be matched with employing enterprises in LEED using Statistics NZ’s unique enterprise numbers. Although we believe these enterprise matches are largely accurate, a comparison of annual salary and wage expenditure variables (common to both AES and LEED) revealed that approximately 38 percent of enterprises had inconsistent data. Some of the possible reasons for this were discussed in this paper. One hypothesis, for example, is that payments for some types of labour inputs (such as contractors, agents on commission, and family members) may be recorded as salaries and wages in financial accounts but do not lead to PAYE deductions, or vice versa. It is also possible that some enterprise records are incorrectly matched because of incorrect enterprise numbers or administrative changes to enterprise numbers.
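The consistency check described above can be sketched as follows, assuming AES and LEED annual salary-and-wage totals keyed by enterprise number. The 20 percent tolerance and the choice to scale the gap by the larger of the two wage bills are hypothetical; this section does not state the criterion used to classify an enterprise's data as inconsistent.

```python
from collections.abc import Mapping

# Hypothetical tolerance; the criterion used in the paper is not stated here.
TOLERANCE = 0.20


def relative_gap(aes_wages: float, leed_wages: float) -> float:
    """Absolute difference scaled by the larger of the two annual wage bills."""
    larger = max(aes_wages, leed_wages)
    return 0.0 if larger <= 0 else abs(aes_wages - leed_wages) / larger


def flag_inconsistent(aes: Mapping[str, float], leed: Mapping[str, float],
                      tolerance: float = TOLERANCE) -> set[str]:
    """Enterprise numbers present in both sources whose salary-and-wage
    totals differ by more than `tolerance` (as a fraction)."""
    matched = aes.keys() & leed.keys()
    return {e for e in matched if relative_gap(aes[e], leed[e]) > tolerance}


aes = {"E1": 1_000.0, "E2": 500.0, "E3": 800.0}
leed = {"E1": 950.0, "E2": 300.0, "E4": 100.0}
flagged = flag_inconsistent(aes, leed)
```

Only enterprises found in both sources are compared; E3 and E4 are unmatched and so are neither flagged nor cleared.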
Further analysis indicated that the exclusion or separate identification of firms with large discrepancies in their salaries and wages data does have a statistically significant impact on estimates, although not necessarily an economically material one. It is recommended that future researchers be aware of this issue, undertake their own sensitivity analyses, and decide whether to exclude the categories of firm which are most likely to suffer from the mismeasurement of labour productivity.
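A simple sensitivity check of the kind recommended above might compare a summary statistic with and without the flagged firms. The sketch below assumes a hypothetical mapping from enterprise number to labour productivity and reports the unweighted mean of log labour productivity under both samples. A formal significance test would also need standard errors and survey weights, which are omitted here.

```python
import math
from collections.abc import Collection, Mapping
from statistics import fmean


def mean_log_productivity(lp: Mapping[str, float],
                          exclude: Collection[str] = frozenset()) -> float:
    """Unweighted mean of log labour productivity over firms not in `exclude`."""
    return fmean(math.log(v) for e, v in lp.items() if e not in exclude)


# Hypothetical labour productivity ($000 per person) and flagged firms.
lp = {"E1": 100.0, "E2": 20.0, "E3": 80.0}
flagged = {"E2"}

full = mean_log_productivity(lp)
trimmed = mean_log_productivity(lp, exclude=flagged)
print(f"difference in mean log productivity: {trimmed - full:.3f}")
```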
Linked sample coverage and representativeness
Because AES is a sample survey, and sampling fractions vary by size of firm and industry, weights must be used to generate valid estimates of population totals. In this investigation, we focused on private profit-oriented businesses with employees. The linked sample (with weights) covers a large proportion of total output as measured by AES. Firms in the linked AES-LEED sample contributed approximately 84 percent of the sum of value-added in the AES survey as a whole, and about 96 percent of total salaries and wages paid. After we excluded imputed records, records with missing data for value-added, and records with negative values of value-added, the final analytical sample accounted for more than 70 percent of the AES value-added total, and 76 percent of AES-estimated total expenditure on salaries and wages.
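The coverage shares quoted above are ratios of weighted totals. The sketch below shows one way to compute such a share, assuming hypothetical records that carry a survey weight, value-added (None where missing), an imputation flag and a LEED-link flag. The exclusions follow those listed in the text: unlinked records, imputed records, and records with missing or negative value-added.

```python
from typing import NamedTuple, Optional


class AesRecord(NamedTuple):
    """Hypothetical AES unit record with the fields needed for the exclusions."""
    weight: float
    value_added: Optional[float]  # None where value-added is missing
    imputed: bool
    linked: bool                  # matched to a LEED employing enterprise


def in_analytical_sample(r: AesRecord) -> bool:
    """Keep linked, non-imputed records with non-missing, non-negative value-added."""
    return (r.linked and not r.imputed
            and r.value_added is not None and r.value_added >= 0)


def weighted_va(r: AesRecord) -> float:
    # Missing value-added contributes nothing to either total.
    return r.weight * (r.value_added if r.value_added is not None else 0.0)


def coverage_share(records: list[AesRecord]) -> float:
    """Share of weighted AES value-added contributed by the analytical sample."""
    total = sum(weighted_va(r) for r in records)
    kept = sum(weighted_va(r) for r in records if in_analytical_sample(r))
    return kept / total


records = [
    AesRecord(weight=1.0, value_added=600.0, imputed=False, linked=True),
    AesRecord(weight=2.0, value_added=50.0, imputed=True, linked=True),
    AesRecord(weight=1.0, value_added=200.0, imputed=False, linked=False),
    AesRecord(weight=1.0, value_added=None, imputed=False, linked=True),
]
share = coverage_share(records)
```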
Smaller enterprises are more likely to be dropped from the sample when imputed records and outlying values are excluded. The industry composition of the sample is not much changed by these exclusions, at two-digit level at least. Nevertheless, researchers need to be aware that any significant sample exclusions have the potential to introduce bias. They should assess the fitness of their chosen subsample for the research purpose, and consider re-weighting to reduce the impact of selection biases.
Constructed labour productivity measures using linked records
Labour productivity per person measures derived from the AES-LEED linked dataset appear to be broadly plausible. However, we currently have very little ability to validate firm-level labour productivity measures against other evidence. The official approach to productivity measurement for New Zealand as a whole uses an index number approach. There are no plans to release ‘level’ or dollar-value measures of output per person or per hour.
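For concreteness, a per-person labour productivity measure of this kind divides AES value-added by a LEED-based employment count. The sketch below assumes, for illustration only, that annual employment is the average of twelve monthly LEED employee counts aligned to the firm's financial year. This section does not specify the exact employment measure, and the sketch ignores working proprietors, whose measures are still under development.

```python
from collections.abc import Sequence
from statistics import fmean


def labour_productivity(value_added: float, monthly_headcounts: Sequence[int]) -> float:
    """AES value-added per person, with employment taken (by assumption) as the
    average of twelve monthly LEED employee counts."""
    if len(monthly_headcounts) != 12:
        raise ValueError("expected 12 monthly employee counts")
    employment = fmean(monthly_headcounts)
    if employment <= 0:
        raise ValueError("average employment must be positive")
    return value_added / employment


# A firm employing 10 people for six months and 14 for the other six.
lp = labour_productivity(1_200_000.0, [10] * 6 + [14] * 6)
```

Because LEED has no hours measure, this is output per person rather than per hour, so part-time and full-time employees count equally.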
A comparison of the labour productivity estimates derived in this study with those obtained by Maré and Timmins (2006) and Law, Buckle and Hyslop (2006), who used Business Activity Indicator (BAI) data to construct a proxy measure of value-added and Business Demography data to measure employment, revealed large differences in some industries. We compared 1-digit industry averages. These results suggest that further work should be undertaken, using unit record data to compare the BAI and AES measures of value-added, to help users better understand the pattern of differences.
Longitudinal data availability
The AES sample was designed for cross-sectional estimation purposes. However, stable enterprises in the postal sample are mainly reselected each year, many firms in the tax sample provide data repeatedly through their IR10s, and researchers can link enterprise responses across years using each firm’s unique enterprise number. This means a panel sample can be constructed. A limitation on this comes from the fact that changes in enterprise numbers for legal or administrative reasons can prevent firms from being linked or lead to attrition from the sample (because no attempt is made to follow and reselect firms whose enterprise number has changed). Enterprises in the postal sample that expand or contract significantly are also less likely to be reselected in subsequent years, due to sample design features. In addition, longitudinal continuity is reduced by non-response and partial non-response.
We constructed several different longitudinal samples using the 2000 to 2004 AES data and analysed the characteristics of these samples. Sixty-nine percent of respondents in our 2000 analytical sample also had non-imputed responses in 2001, but only 40 percent had non-imputed responses in 2004. Longitudinal response continuity is higher among firms that had 10 or more employees in the base year, and among firms in the postal sample. In both cases approximately 50 percent of respondents in 2000 were also respondents in 2004. The majority of medium-sized and large firms in our main study samples had at least two responses during the 2000 to 2004 period.
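Continuity figures like those above can be computed by linking respondents across years on enterprise number. The sketch below assumes a hypothetical mapping from year to the set of enterprise numbers with non-imputed responses, and reports the share of base-year respondents that respond again in each later year. Because linking is by enterprise number alone, a firm whose number changes shows up here as attrition, mirroring the limitation noted for AES panels.

```python
from collections.abc import Mapping


def retention(respondents: Mapping[int, set[str]], base_year: int) -> dict[int, float]:
    """Share of base-year respondents (by enterprise number) that also have a
    non-imputed response in each later year."""
    base = respondents[base_year]
    return {
        year: len(base & ids) / len(base)
        for year, ids in sorted(respondents.items())
        if year > base_year
    }


# Invented respondent sets: E0-E9 respond in 2000; N1 and N2 are new entrants.
respondents = {
    2000: {f"E{i}" for i in range(10)},
    2001: {f"E{i}" for i in range(7)} | {"N1"},
    2004: {"E0", "E1", "E2", "E3", "N2"},
}
rates = retention(respondents, base_year=2000)
```

With these invented sets the function returns `{2001: 0.7, 2004: 0.4}`; new entrants do not affect the rates because only base-year respondents are tracked.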
Response probability is associated with firm size, so longitudinal samples invariably contain a higher proportion of larger firms. On other measured attributes, the longitudinal samples that we constructed from AES appeared to be reasonably similar to the cross-sectional samples. There were no pronounced differences in industry composition, for example. However, researchers need to be aware of the potential effects of longitudinal sample attrition and explore those effects when conducting a longitudinal analysis.
Consistency of firm responses across time and within years
In general, input and output measures are quite highly correlated across time at enterprise level, suggesting a reasonable level of consistency in responses. Our analysis of the economic relationships between inputs, outputs and performance measures also gave broadly plausible results, suggesting that the unit record AES data do capture meaningful information on these relationships and could be used in modelling firm behaviour.
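One way to summarise this kind of year-to-year consistency is a correlation between firms' values in consecutive years. The sketch below, assuming hypothetical dictionaries of strictly positive value-added keyed by enterprise number, computes the Pearson correlation of log value-added for firms present in both years. It is illustrative, not the paper's own procedure.

```python
import math
from collections.abc import Mapping, Sequence
from statistics import fmean


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_va_correlation(year1: Mapping[str, float], year2: Mapping[str, float]) -> float:
    """Correlate log value-added across two years for firms present in both.
    Values must be strictly positive."""
    common = sorted(year1.keys() & year2.keys())
    xs = [math.log(year1[e]) for e in common]
    ys = [math.log(year2[e]) for e in common]
    return pearson(xs, ys)


# Hypothetical value-added ($000); every continuing firm grows by 10 percent.
va_2000 = {"A": 100.0, "B": 1_000.0, "C": 10_000.0}
va_2001 = {"A": 110.0, "B": 1_100.0, "C": 11_000.0, "D": 5.0}
r = log_va_correlation(va_2000, va_2001)
```

Because every continuing firm grows at the same rate here, `r` is 1 up to rounding; real responses would give lower values. Firm D is ignored because it has no 2000 value. On Python 3.10 or later, `statistics.correlation` could replace the hand-written `pearson`.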