Reliability and validity of health status measurement by the TAPQOL
  1. E M Bunge1,
  2. M-L Essink-Bot1,
  3. M P H M Kobussen2,
  4. L W A van Suijlekom-Smit3,
  5. H A Moll3,
  6. H Raat1,4
  1. 1Department of Public Health, Erasmus MC, University Medical Center Rotterdam, Netherlands
  2. 2Community Care Salland, Ommen, Netherlands
  3. 3Department of Paediatrics, Erasmus MC–University Medical Center Rotterdam, Netherlands
  4. 4GGD–Municipal Health Service, Rotterdam, Netherlands
  1. Correspondence to:
    Ms E Bunge
    Department of Public Health, Erasmus MC, University Medical Center Rotterdam, PO Box 1738, 3000 DR Rotterdam, Netherlands; e.bunge@erasmusmc.nl

Abstract

Background: In addition to clinical measures in the evaluation of paediatric interventions, health related quality of life (HRQoL) is an important outcome. The TAPQOL (TNO-AZL Preschool children Quality of Life) was developed to measure HRQoL in preschool children. It is a generic instrument consisting of 12 scales that cover the domains physical, social, cognitive, and emotional functioning.

Aims: To evaluate the feasibility, score distribution, internal consistency, test-retest reliability, and discriminative and concurrent validity of the TAPQOL multi-item scales in preschool children, aged 2–48 months. Also to evaluate the feasibility, reliability, and validity separately for infants (2–12 months old) and toddlers (12–48 months old).

Methods: Parents of a random general population sample of 500 preschool children were sent a questionnaire by mail. A random subgroup of 158 parents who participated received a retest after two weeks.

Results: The response rate was 83% at the test and 75% at the retest. There were few missing answers. Six scales showed ceiling effects. Nine scales had Cronbach’s alphas >0.70. In general, score distributions and Cronbach’s alphas were comparable for infants and toddlers. Test-retest showed no significant differences in mean scale scores; two scales had intra-class correlations <0.50. Five scales showed significant differences between children with no conditions versus children with two or more parent reported chronic conditions.

Conclusion: The results show that the TAPQOL is a feasible instrument to measure HRQoL, and they support the reliability and discriminative validity of the majority of its scales for infants as well as toddlers.

  • TAPQOL
  • quality of life
  • reliability
  • test retest
  • validity

Health status and health related quality of life (HRQoL) measures are used for the evaluation of healthcare interventions in community medicine and clinical practice.1–6 Furthermore, HRQoL measures are used for descriptive studies; for example, burden of disease studies in public health7,8 and follow up studies of distinct patient groups.9 In the future, possibilities may arise for applications in daily medical practice in both community and clinical medicine.10

Few HRQoL measures are available for preschool children.11–17 A reason for this might be that young children show rapid development of cognitive, motor, and behavioural functions, especially during the first years of life.18 This means that instruments intended to cover a relatively wide age range (for example, 0–4 years) have to accommodate these developmental changes.

The TAPQOL is the first multi-dimensional HRQoL measure that was specifically designed for preschool children aged 1–5 years.14–17 As preschool children cannot complete questionnaires by themselves, the TAPQOL uses a proxy, mostly a parent. In this study we evaluated the psychometric properties of the TAPQOL including, for the first time, assessment of the test-retest reliability. Also for the first time, we applied the TAPQOL to infants (2–12 months old) and specifically evaluated its performance in this subgroup.

The aim of this study was to evaluate the feasibility, score distribution, internal consistency, test-retest reliability, and discriminative and concurrent validity of the TAPQOL multi-item scales in preschool children, aged 2–48 months. In addition, the feasibility, reliability, and validity were evaluated separately for infants (2–12 months old) and toddlers (12–48 months old).

METHODS

The Medical Ethical Committee of the Erasmus MC, University Medical Center Rotterdam, approved this study.

Population and data collection

Parents of a random general population sample of 500 preschool children (2–48 months old) in the eastern part of the Netherlands were sent the TAPQOL questionnaire by mail. The parents themselves decided which parent should participate. In case of non-response each household received at most two reminder letters; no incentives to participate were given. Two weeks later, a random subgroup of 158 participating parents received the same questionnaire to assess test-retest reliability. The completed TAPQOL questionnaires were returned by mail.

Only parents who were considered to be able to adequately read and write Dutch were eligible for analysis. This was operationalised as at least one parent being Dutch or, if both parents were of foreign origin, that they should have an education of higher vocational level or have a university degree.

Questionnaire

The TAPQOL is a 43 item questionnaire consisting of 12 multi-item scales that cover the domains physical, social, cognitive, and emotional functioning (see fig 1). The number of items per scale ranges from three to seven. TAPQOL items generally relate to the past three months, but this may be adjusted for specific research aims. For all scales, the presence of a specific complaint or limitation is scored on a three point scale, namely “never”, “occasionally”, and “often”. For seven TAPQOL scales (“stomach problems”, “skin problems”, “lung problems”, “sleeping”, “appetite”, “motor functioning”, and “communication”), first the presence of a specific complaint or limitation is recorded and, if it is present, the wellbeing of the child related to that complaint or limitation is measured on a four point scale, namely “fine”, “not so good”, “quite bad”, and “bad”. Scale scores were calculated by adding up item scores within scales and transforming the crude scale scores linearly to a 0–100 scale, with higher scores indicating better quality of life (see fig 2 for an example). The scales “social functioning”, “motor functioning”, and “communication” are only relevant for children aged 1½ years and older.19
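The linear transformation of crude scale scores to the 0–100 range can be illustrated with a short Python sketch. The function name, the assumption that all items in a scale share the same lowest and highest code, and the 0–4 item coding in the example are illustrative assumptions and are not taken from the TAPQOL manual; items are assumed to be coded so that higher values indicate better functioning.

```python
def scale_score(item_scores, item_min, item_max):
    """Sum the item scores of one scale and rescale the crude sum to 0-100.

    Assumes (hypothetically) that every item has the same coding range
    [item_min, item_max] and that higher codes mean better functioning.
    """
    n_items = len(item_scores)
    if n_items == 0:
        raise ValueError("a scale needs at least one item score")
    if item_max <= item_min:
        raise ValueError("item_max must be larger than item_min")
    crude = sum(item_scores)
    lowest = item_min * n_items    # crude score if every item is at its worst
    highest = item_max * n_items   # crude score if every item is at its best
    return 100.0 * (crude - lowest) / (highest - lowest)


# Example with a hypothetical four item scale coded 0-4 per item.
example = scale_score([4, 4, 3, 2], item_min=0, item_max=4)
```

In this example the crude sum is 13 out of a possible 16, so the transformed score is 81.25.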

Figure 1

 Items on the TAPQOL questionnaire.

Figure 2

 Example item scores TAPQOL.

The TAPQOL is available in a Dutch as well as in an English version, translated from Dutch according to international guidelines.20

Besides the TAPQOL, demographic variables and the prevalence of chronic conditions and visits to the general practitioner were assessed. Questions about chronic conditions covered: asthma or recurrent problems of the respiratory tract, recurrent otitis or having tympanostomy tubes, defective vision in which glasses are not helpful, regular abdominal pain, allergies, eczema, and other conditions.

Analyses

In accordance with the TAPQOL guidelines, all items of a three item scale had to be completed for the scale to be eligible for analysis. In scales with four items one missing answer was allowed; in the seven item scale, “problem behaviour”, two missing answers were allowed. In case of non-unique answers (more than one answer per question), one answer was imputed randomly.
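These eligibility rules can be written down as a small Python sketch. Missing answers are represented as None, which is an assumption about data coding; the helper names are hypothetical. Only the three rules stated above are encoded, and the sketch raises an error for other scale lengths rather than guessing a rule.

```python
import random

# Maximum number of missing answers allowed per scale length, as stated
# in the text. No rule is given there for other scale lengths.
ALLOWED_MISSING = {3: 0, 4: 1, 7: 2}


def scale_is_scorable(answers):
    """Return True if a scale has few enough missing answers (None) to be scored."""
    n_items = len(answers)
    if n_items not in ALLOWED_MISSING:
        raise ValueError(f"no missing-answer rule stated for {n_items} item scales")
    n_missing = sum(1 for answer in answers if answer is None)
    return n_missing <= ALLOWED_MISSING[n_items]


def resolve_non_unique(ticked_answers, rng=random):
    """Pick one of several ticked answers at random (hypothetical helper)."""
    return rng.choice(list(ticked_answers))
```

For instance, scale_is_scorable([2, None, 2]) is False for a three item scale, while scale_is_scorable([2, None, 2, 1]) is True for a four item scale.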

Feasibility of the TAPQOL was evaluated by assessing the response rate and by evaluating both the number of missing answers per item and the number of non-unique answers per item. Cronbach’s alpha was used to determine the internal consistency of the scales.21 Separate analyses were made for the subgroup with two or more parent reported chronic conditions. Average correlation coefficients were calculated between items and their own scale (without the item under consideration) and between items and every other scale, to determine whether the items were well chosen and whether the scales represent different domains. The average corrected item-own scale correlation coefficients are expected to be higher than the average item-other scale correlation coefficients. At the group level, test-retest reliability was assessed by the Wilcoxon signed ranks test. We used non-parametric tests because the data were skewed and TAPQOL scale scores are bounded by a lowest and a highest possible value, which can produce ceiling effects. Cohen’s effect size,22 which relates the difference in mean scores between test and retest to the dispersion of the scores at the test, was calculated: d = [mean(a) − mean(b)]/SD at the test.22 At the individual level, the intra-class correlation (ICC) was applied to assess test-retest reliability.23 Discriminant validity was evaluated by comparing TAPQOL scale scores of a subgroup of children with no parent reported chronic conditions with those of a subgroup of children with two or more parent reported chronic conditions. The Mann-Whitney U test was used to determine differences in mean scale scores between the two groups. Cohen’s effect sizes were calculated: d = [mean(a) − mean(b)]/SD of the subgroup with parent reported conditions. Comparisons were also made of TAPQOL scale scores between a subgroup of children with zero or one visit to the general practitioner and a subgroup of children with four or more visits to the general practitioner in the last year. Spearman’s rank order correlation coefficients were applied to evaluate concurrent validity of the TAPQOL with a single item general health rating: “In general, would you say your child’s health is: excellent, very good, good, fair, or poor”.
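Two of the statistics named above, Cronbach’s alpha and Cohen’s effect size, can be sketched in a few lines of Python. This is an illustration of the standard formulas and not the SPSS procedure used in the study; the function names, the use of the sample standard deviation (n − 1 in the denominator), and the example data are assumptions.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    `items` is a list with one list of scores per item; all lists must
    have the same length (one entry per respondent).
    """
    k = len(items)
    n_respondents = len(items[0])
    # Total score of each respondent across the k items.
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def cohens_d(scores_a, scores_b, reference=None):
    """d = [mean(a) - mean(b)] / SD of a reference group.

    The reference group defaults to `scores_a` (for example, the scores
    at the test); pass another list to use a different subgroup's SD.
    """
    if reference is None:
        reference = scores_a
    return (mean(scores_a) - mean(scores_b)) / stdev(reference)


# Hypothetical data: two perfectly consistent items, and a two point
# drop in mean score between test and retest.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
d = cohens_d([80, 90, 100], [78, 88, 98])
```

With these made-up numbers, alpha equals 1 (the two items are identical) and d is 0.2 (a difference of 2 points divided by an SD of 10 at the test).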

SPSS 10.0 was used for the analysis.

RESULTS

Response, feasibility, and sample characteristics

Response rate was 83.0%; five (1.2%) questionnaires were not eligible for analysis (see methods). Response rate at the retest was 75.3%; one questionnaire was not eligible for analysis; 115 retest questionnaires could be matched to a test questionnaire (same child, same respondent). Mean age of the parent respondents was 33.1 (SD 7.1) years; 97% of the respondents were mothers. Most lived together with their partner (98%); 50% of the respondents had a part-time job and 36% were homemakers; 33% of the respondents had an education at intermediate vocational level, 39% had a lower, and 27% a higher educational level than intermediate vocational education.

Fifty per cent of the children eligible for analysis were girls and 22% of the children were infants (between 2 and 12 months old).

There were few missing answers on the TAPQOL (circa 1% per item) and very few non-unique answers (less than 1% per item).

Score distribution and internal consistency

Six scales had ceiling effects (that is, >50% of the respondents had the maximum score). When the total group showed a ceiling effect, then both the infants and toddlers as subgroups did so. On only one scale (“appetite”) did the subgroup infants show a ceiling effect whereas the subgroup toddlers and total group did not. In the total group, nine scales had Cronbach’s alphas >0.70. The subgroup infants (except for “liveliness”) and the subgroup toddlers showed sufficient internal consistency for the same scales as the total group, but in general, the subgroup infants had somewhat lower Cronbach’s alphas than the subgroup toddlers (table 1). For the subgroup with two or more parent reported chronic conditions, five scales (same scales as for the total group except for “lung problems”) showed ceiling effects. Overall, the percentages of respondents with a maximum score were lower in this subgroup than in the total group. In this subgroup, eight scales had Cronbach’s alphas >0.70. These were the same scales as in the total group, except that “liveliness” had a Cronbach’s alpha below 0.70 in this subgroup.
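The ceiling effect criterion used here (more than 50% of respondents at the maximum score) is simple to express in code. The following Python sketch assumes scale scores on the 0–100 range; the function name and example scores are hypothetical.

```python
def has_ceiling_effect(scale_scores, max_score=100.0, threshold=0.5):
    """Return True if more than `threshold` of respondents have the maximum score."""
    if not scale_scores:
        raise ValueError("no scores supplied")
    n_at_max = sum(1 for score in scale_scores if score == max_score)
    return n_at_max / len(scale_scores) > threshold


# Three of four respondents at the maximum: 75% > 50%, so a ceiling effect.
mostly_max = has_ceiling_effect([100, 100, 100, 50])
# Exactly half at the maximum does not exceed the 50% criterion.
half_max = has_ceiling_effect([100, 100, 50, 50])
```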

Table 1

 Score distribution and internal consistency of TAPQOL scales in 410 children: 92 infants aged 2–12 months and 318 toddlers aged 12–48 months

There were no differences with regard to scale means between boys and girls, except that girls had a higher mean score than boys on the scale “communication” (p < 0.01).

All scales had higher average corrected item-own scale correlation coefficients than the corresponding average item-other scale correlation coefficients in the total group, as well as in the subgroups infants and toddlers (table 2).
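A corrected item-own scale correlation correlates each item with the sum of the remaining items of its scale, so that the item does not inflate its own correlation. The Python sketch below illustrates the idea with Pearson correlations; the text does not state which correlation coefficient was used, so that choice, the function names, and the example data are assumptions.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def corrected_item_scale_r(items, index):
    """Correlate item `index` with the sum of the other items in its scale."""
    n_respondents = len(items[index])
    rest = [
        sum(item[i] for j, item in enumerate(items) if j != index)
        for i in range(n_respondents)
    ]
    return pearson(items[index], rest)


def average_corrected_r(items):
    """Average corrected item-own scale correlation over all items of a scale."""
    return mean(corrected_item_scale_r(items, j) for j in range(len(items)))


# Hypothetical three item scale answered by five respondents.
scale_items = [[1, 2, 3, 4, 5], [2, 2, 4, 4, 5], [1, 3, 3, 5, 5]]
r_first_item = corrected_item_scale_r(scale_items, 0)
```

The item-other scale correlation would be obtained in the same way by correlating each item with the total score of a different scale, without any correction.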

Table 2

 Average inter-item, corrected item-own scale, and item-other scale correlations* of the TAPQOL scales in 410 children: 92 infants aged 2–12 months and 318 toddlers aged 12–48 months

Test-retest reliability

In the total group there were no significant differences in mean scale scores between test and retest. The subgroup infants showed significant differences between mean scores for “lung problems” and “anxiety”; toddlers did not show significant differences. Two scales in the total group, three scales in the subgroup infants, and two scales in the subgroup toddlers had an ICC <0.50. For most scales ICCs were lower in the subgroup of infants than in the subgroup of toddlers, except for “liveliness” and “positive mood” (table 3).

Table 3

 Test-retest reliability of the TAPQOL in a subgroup of 115 preschool children: 28 infants aged 2–12 months and 87 toddlers aged 12–48 months

Discriminant validity

The most prevalent parent reported chronic conditions were asthma (20%), eczema (14%), and regular otitis or having tympanostomy tubes (11%); the remainder of the conditions were prevalent in less than 6% of the children. For the total group, five scales (“sleeping”, “appetite”, “lung problems”, “skin problems”, and “problem behaviour”) showed significantly different mean scores between the subgroup of children with zero parent reported chronic conditions versus the subgroup of children with two or more conditions. Cohen’s effect sizes were large for the scales “lung problems” and “skin problems”. In general, the subgroup infants showed the same effect sizes as the subgroup toddlers, except for “sleeping” and “lung problems”, where the effect sizes for the subgroup infants were much larger than for the subgroup toddlers (table 4).

Table 4

 Mean (SD) scores of the TAPQOL scales, separately for all ages, infants, and toddlers, for the subgroup without parent reported conditions (n = 240; of which 57 infants and 183 toddlers), the subgroup with one condition (n = 113; of which 24 infants and 89 toddlers), and the subgroup with two or more conditions (n = 57; of which 11 infants and 46 toddlers)

For the number of visits to the general practitioner, six scales (“sleeping”, “appetite”, “lung problems”, “stomach problems”, “skin problems”, and “problem behaviour”) showed significant differences in mean scale scores in the total group between the subgroup of children with zero or one visit and the subgroup of children with four or more visits in the last year. “Appetite” and “stomach problems” showed no significant differences in the subgroup infants. The scales “sleeping” and “lung problems” had large Cohen’s effect sizes, especially in the subgroup infants.

Concurrent validity

There were low but significant Spearman’s correlation coefficients in the expected direction between nine TAPQOL scales and a single item general health rating. Six of the nine scales suitable for the subgroup infants showed larger correlation coefficients between TAPQOL scales and a single item general health rating than in the subgroup toddlers (table 5).

Table 5

 Concurrent validity: Spearman’s correlation coefficients and Pearson correlations between TAPQOL scales and the CHQ-IT general health question in 410 children: 92 infants aged 2–12 months and 318 toddlers aged 12–48 months

DISCUSSION

This study, with a very high response rate,24 established the feasibility of the parent completed TAPQOL questionnaire for preschool children in a large general population sample; psychometric properties were generally adequate in the total group as well as in the subgroups infants and toddlers.

Because our study was limited to a random general population sample, we could not evaluate the applicability of the TAPQOL in clinical populations. We had only one cross-sectional assessment and a retest; therefore, we could not evaluate responsiveness to change in health status over time. Another limitation is that we are unaware of the adequacy of the proxy ratings (by parents), which are indispensable for this age group. Proxy ratings may be confounded by many factors.13,25,26

Our results can be compared only with those from the study by Fekkes and colleagues,14 and our data confirm their results concerning ceiling effects. Ceiling effects may limit the use of the TAPQOL to detect changes and to describe health beyond the average in relatively healthy populations. In general, our Cronbach’s alphas were somewhat higher than in the study of Fekkes et al; for “skin problems”, “motor functioning”, “communication”, and “positive mood” in particular, they were much higher than those reported by Fekkes and colleagues.14 For discriminant validity, both Fekkes et al and our study found significant differences in mean scale scores in the physical functioning domain for children with and without parent reported chronic conditions.

Test-retest reliability was low for some scales, a phenomenon that has also been reported in evaluations of other instruments.27,28 For the scales “stomach problems” and “anxiety” the Cronbach’s alpha was also low. We suggest further research on this topic, as test-retest reliability should be adequately shown, especially in studies with repeated measurements.

Ceiling effects were present in the total group as well as in the subgroup with parent reported chronic conditions, although in the subgroup to a lesser degree (five scales instead of six scales with ceiling effects and fewer respondents with maximum scores). This can be interpreted as follows. The chronic conditions mentioned by the parent mostly affected physical functioning. The scales belonging to this domain did show differences between children with and without parent reported chronic conditions. The other domains seemed not to be affected in these conditions. We suggest further evaluation of the TAPQOL in patient groups with distinct conditions that affect the emotional, social, and cognitive TAPQOL domains, such as children with attention deficit hyperactivity disorder (ADHD) or mental retardation.

In conclusion, our study, conducted in the setting of community medicine, showed that the TAPQOL is a feasible and reliable instrument to measure health status and health related quality of life. Our results suggest that the TAPQOL will also be applicable in the clinical setting with conditions that affect physical functioning, since it clearly discriminated between children with and without parent reported chronic conditions of a physical nature. Although the TAPQOL was not originally designed for infants, our study supports the reliability and discriminative validity of the majority of its scales, not only for toddlers but also for infants. We propose further research, including cross-cultural validation,29 evaluations in clinical samples, and evaluations of responsiveness to community or clinical interventions.

Acknowledgments

Community Care Salland was responsible for the data collection. We would like to thank the physicians, nurses, physician’s assistants, and managers of the Home Care for facilitating this project. We are also very grateful to the parents who participated in this study.

REFERENCES

Footnotes

  • Funding: This study was funded by the Netherlands Organisation for Health Research and Development (ZonMw) NWO-Health Care Efficiency Research Program Grant # 2200.0128

  • Competing interests: none declared