Introduction

Table 1 Definitions of health-related quality of life and functional health status
Health-related quality of life [18, 20, 21, 22]
Functional health status [23, 26]

The present RCT on the effectiveness of pneumococcal vaccination in children with recurrent acute otitis media (rAOM) addresses two issues in evaluating treatment effects on HRQoL in RCTs: the use of generic versus disease-specific questionnaires, and the responsiveness of these questionnaires. The results lead to recommendations on the applicability of these questionnaires in clinical studies of children with rAOM.

Methods
Setting and procedure
FHS and HRQoL were assessed in 383 children with rAOM participating in a double-blind, randomized, placebo-controlled trial of pneumococcal conjugate vaccination versus control hepatitis vaccination. The study was conducted at the paediatric outpatient departments of a general hospital (Spaarne Hospital Haarlem) and a tertiary care hospital (University Medical Center Utrecht). Between April 1998 and February 2001, children were referred to the trial by general practitioners, paediatricians, or otolaryngologists, or were enrolled on their caregivers' own initiative.
Study population
Children were eligible if they were aged 12 to 84 months and had rAOM at study entry, defined in this study as at least two physician-diagnosed episodes of AOM in the year before study entry. Exclusion criteria were conditions with a known increased risk of AOM, such as known immunodeficiency (other than IgA or IgG2 subclass deficiency), cystic fibrosis, immotile cilia syndrome, cleft palate, and chromosomal abnormalities (e.g. Down syndrome), as well as a history of severe adverse events after vaccination. At each scheduled visit, two research physicians (C.N.M.B. and R.H.V.)
collected data on the number of AOM episodes (based on parental report at baseline and on physician report during follow-up), upper respiratory tract infections, and pneumonia. Information on medical treatment and on ear, nose, and throat surgery in the preceding 6 months was also collected. The primary caregivers completed questionnaires assessing the FHS and HRQoL of their child and family during the clinic visits at baseline and at 7, 14, and 26 months of follow-up. Caregivers were asked to have the same person complete the questionnaires each time and to rate their child's FHS and HRQoL with regard to the child's recurrent episodes of acute otitis media. Informed consent was obtained from the caregivers of all children before study entry. The medical ethics committees of both participating hospitals approved the study protocol.

Questionnaires

Table 2 Characteristics of FHS and HRQoL questionnaires used in this study
Questionnaire | Type; number of items; scale | Construct(s) measured | Application in other studies
Generic
RAND | FHS; 7; Likert | General health: current health; previous health; resistance to illness | 42, 44, 47, 48
FSQ generic | FHS; 14; Likert | Age-appropriate functioning and emotional behaviour | 41, 45, 47, 49, 51
FSQ specific | Idem
TAIQOL | HRQoL; 35/46*; Likert | Sleeping, appetite, lung problems, stomach problems, skin problems, motor functioning, problem behaviour, social functioning, communication, positive mood, anxiety, liveliness | 51, 54
Disease-specific
OM-6 | FHS; 6; Likert | Physical suffering; hearing loss; speech impairment; emotional distress; activity limitations; caregiver concerns | 14, 55, 57
Family Functioning Questionnaire (FFQ) [14] | FHS; 7; Likert | Parents: sleep deprivation; change of daily or social activities; emotional distress. Family: cancelling family plans or trips. Siblings: feeling neglected; demanding extra attention | None
* 46 items when age > 15 months

Generic questionnaires
Disease-specific questionnaires
Questionnaire application

Statistical analyses
Floor and ceiling effects
Floor and ceiling effects were estimated for the baseline assessment of each questionnaire as the percentages of respondents with the minimum and maximum possible scores, respectively. Questionnaires should show minimal floor and ceiling effects to detect differences and change optimally.
Reliability
Construct and discriminant validity
Responsiveness
The assessment of responsiveness is described in more detail below.
Sensitivity to change
Clinical relevance of change scores
Interpretation of change: distribution-based methods (ES-MCID and SEM-MCID)
Interpretation of change: anchor-based methods
All analyses were performed with the Statistical Package for the Social Sciences (SPSS), version 10.1.
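The floor and ceiling calculation described under Statistical analyses amounts to counting respondents at the two ends of each scale. The sketch below is a minimal illustration under stated assumptions: the function name `floor_ceiling_effects` and the 0–100 example scale are hypothetical, and each questionnaire's own minimum and maximum possible scores would have to be supplied.

```python
def floor_ceiling_effects(scores, scale_min, scale_max):
    """Return (floor %, ceiling %) for one questionnaire at one assessment.

    Floor effect: percentage of respondents with the minimum possible score.
    Ceiling effect: percentage of respondents with the maximum possible score.
    Hypothetical helper, not taken from the study's own analysis code.
    """
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == scale_min)
    at_ceiling = sum(1 for s in scores if s == scale_max)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n


# Hypothetical baseline scores on a 0-100 scale (not study data).
baseline = [0, 50, 100, 100, 75]
floor_pct, ceiling_pct = floor_ceiling_effects(baseline, 0, 100)
print(f"floor {floor_pct:.0f}%, ceiling {ceiling_pct:.0f}%")  # floor 20%, ceiling 40%
```

Scores are compared for exact equality, so they should be on the questionnaire's raw scale, or on a transformed scale whose end points are represented exactly.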
Results
Population

Table 3 Characteristics of study population* (n = 383)
Characteristic | Mean (SD) or % (95% CI)
Age (months) | 34 (19.7)
Male gender | 62% (57–67)
In the year prior to inclusion
  Number of AOM episodes/year | 5.0 (2.7)
    2–3 | 37% (32–42)
    4–5 | 31% (26–36)
    6 or more | 32% (27–37)
  Impaired hearing** | 35% (30–40)
  Language or speech problems** | 22% (18–26)
History
  Chronic airway problems or atopic symptoms*** | 51% (46–56)
  Adenoidectomy | 47% (42–52)
  Tympanostomy tubes | 51% (46–56)
  Other ear, nose, and throat surgery | 2% (0.6–3)
  Antibiotic prophylaxis | 15% (11–19)
  Ever had speech therapy | 9% (6–12)
* at inclusion in the study; ** reported by the caregiver; *** asthma, wheezing, hay fever, or eczema

Floor and ceiling effects

Table 4 Floor and ceiling effects*, internal consistency, and test–retest reliability of the questionnaires
Questionnaire | Minimum score (%) | Maximum score (%) | Internal consistency | Test–retest reliability (ICC)
Generic
RAND | 0 | 0 | 0.81 | 0.89
FSQ generic | 0 | 2 | 0.80 | 0.92
FSQ specific | 0 | 21 | 0.86 | 0.89
TAIQOL | N.A. | N.A. | 0.72–0.90 | 0.76–0.90
  Sleeping | 2 | 12 | 0.90 | 0.83
  Appetite | 0 | 22 | 0.86 | 0.82
  Positive mood | 0 | 80 | 0.90 | 0.81
  Liveliness | 0.6 | 81 | 0.88 | 0.76
  Problem behaviour | 1 | 4 | 0.86 | 0.85
  Communication | 0.4 | 53 | 0.88 | 0.82
Disease-specific
OM-6 | 0 | 14 | 0.85 | 0.89
NRS child | 2 | 3 | N.A. | 0.83
FFQ | 0.5 | 27 | 0.90 | 0.93
NRS caregiver | 0 | 0 | N.A. | 0.81
* percentage of respondents with the minimum (floor effect) and maximum (ceiling effect) scores; ICC: intraclass correlation coefficient

Reliability
Construct and discriminant validity

Table 5 Construct validity: correlations* between the questionnaires**
 | RAND | FSQ generic | FSQ specific | OM-6 | NRS child | FFQ | NRS caregiver
RAND | 1.00 | 0.52 | 0.49 | 0.34 | 0.33 | 0.43 | 0.49
FSQ generic | | 1.00 | 0.80 | 0.37 | 0.25 | 0.43 | 0.24
FSQ specific | | | 1.00 | 0.49 | 0.26 | 0.52 | 0.24
OM-6 | | | | 1.00 | 0.23 | 0.74 | 0.28
NRS child | | | | | 1.00 | 0.22 | 0.47
FFQ | | | | | | 1.00 | 0.39
NRS caregiver | | | | | | | 1.00
* Spearman correlation coefficients
** correlations that were appropriately predicted a priori are printed in bold

Table 6 Construct validity: correlations* between questionnaire scores and the frequency of physician visits for URTI** and of AOM** episodes
Questionnaire | Frequency of physician visits for URTI | Frequency of AOM episodes
Generic
RAND | −0.48 | −0.31
FSQ generic | −0.20 | #
FSQ specific | −0.27 | ##
Disease-specific
OM-6 | −0.32 | −0.41
NRS child | −0.41 | −0.49
FFQ | −0.29 | −0.39
NRS caregiver | −0.41 | −0.40
* Spearman's rho correlation coefficients
** URTI: upper respiratory tract infection; AOM: acute otitis media

Table 7 Discriminant validity: scores of children with 2–3 vs. 4 or more AOM episodes in the preceding year*
Questionnaire | 2–3 AOM episodes | ≥4 AOM episodes | P**
Generic
RAND | 21.1 | 19.6 | 0.004
FSQ generic | 76.5 | 72.2 | 0.002
FSQ specific | 83.9 | 78.4 | 0.001
TAIQOL
  Sleeping | 66.2 | 60.7 | 0.10
  Appetite | 74.7 | 73.2 | 0.44
  Liveliness | 93.2 | 91.3 | 0.81
  Positive mood | 92.0 | 92.5 | 0.97
  Problem behaviour | 64.8 | 60.9 | 0.24
  Communication | 83.8 | 84.5 | 0.69
Disease-specific
OM-6 | 18.9 | 17.0 | <0.001
NRS child | 5.2 | 5.4 | 0.48
FFQ | 84.9 | 78.5 | <0.001
NRS caregiver | 6.6 | 6.2 | 0.22
* 2–3 episodes indicates moderate AOM and ≥4 episodes severe AOM
** calculated with the Mann–Whitney test

Responsiveness
Sensitivity to change

Table 8 Sensitivity to change: mean change-scores and effect sizes* for changed subjects
Questionnaire | Mean change 0–7 months | P | Mean change 7–14 months | P | Effect size 0–7 months | Effect size 7–14 months
Generic
RAND | 10.2 | <0.001 | 7.7 | <0.001 | 0.60 | 0.54
FSQ generic | 7.0 | <0.001 | 4.9 | 0.001 | 0.37 | 0.29
FSQ specific | 9.1 | <0.001 | 6.0 | <0.001 | 0.37 | 0.32
TAIQOL
  Sleeping | 9.9 | <0.001 | 7.1 | 0.03 | 0.37 | 0.36
  Appetite | 6.8 | 0.001 | 0.0 | 1.0 | 0.28 | 0.00
  Problem behaviour | 0.4 | 0.80 | −2.8 | 0.33 | 0.02 | 0.13
  Positive mood | 1.5 | 0.30 | 3.9 | 0.11 | 0.06 | 0.25
  Liveliness | 2.3 | 0.19 | 1.6 | 0.51 | 0.22 | 0.11
  Communication | 2.9 | 0.12 | 1.7 | 0.32 | 0.16 | 0.11
Disease-specific
OM-6 | 16.6 | <0.001 | 11.5 | <0.001 | 0.60 | 0.73
NRS child | 28.3 | <0.001 | 14.2 | <0.001 | 0.91 | 0.64
FFQ | 13.6 | <0.001 | 8.0 | <0.001 | 0.55 | 0.60
NRS caregiver | 19.2 | 0.003 | 9.1 | 0.003 | 0.95 | 0.57
* calculated with Guyatt's responsiveness statistic (GRS)

Effect sizes for the generic FHS questionnaires ranged from small to moderate (0.29–0.60). For the generic TAIQOL subscales, however, effect sizes were lower, ranging from almost zero for 'Appetite' (0.00), 'Problem behaviour' (0.02), and 'Positive mood' (0.06) to small for 'Sleeping' (0.37) and 'Liveliness' (0.22). Effect sizes for the disease-specific questionnaires were moderate to large (0.55–0.95).
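The effect sizes in Table 8 and the distribution-based MCID indices reported in Table 9 can be illustrated with a short sketch. It assumes the usual definition of Guyatt's responsiveness statistic (mean change in subjects who changed, divided by the standard deviation of change scores in subjects who remained stable), an ES-MCID of 0.3 times the standard deviation of scores at the start of an interval, and an SEM-MCID of one standard error of measurement, SD × √(1 − reliability). Which SD and which reliability coefficient (for example the test–retest ICC) the study actually used is an assumption here, and the function names and example data are hypothetical.

```python
import math
from statistics import mean, stdev


def guyatt_responsiveness(change_changed, change_stable):
    """Guyatt's responsiveness statistic (GRS).

    Mean change score of subjects classified as changed, divided by the
    sample standard deviation of change scores of subjects classified as stable.
    """
    return mean(change_changed) / stdev(change_stable)


def es_mcid(scores_at_start, effect_size=0.3):
    """Distribution-based MCID: a fixed effect size times the SD of scores.

    Assumption: the SD is taken from the scores at the start of the interval.
    """
    return effect_size * stdev(scores_at_start)


def sem_mcid(scores_at_start, reliability):
    """Distribution-based MCID of one standard error of measurement (SEM).

    SEM = SD * sqrt(1 - reliability); `reliability` could be, for example,
    a test-retest intraclass correlation coefficient (assumption).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return stdev(scores_at_start) * math.sqrt(1.0 - reliability)


if __name__ == "__main__":
    # Hypothetical data (follow-up minus baseline change scores), not study data.
    changed = [10, 12, 8, 14, 6]
    stable = [2, -2, 0, 4, -4]
    baseline = [60, 70, 80, 90, 100]

    print(f"GRS      = {guyatt_responsiveness(changed, stable):.2f}")
    print(f"ES-MCID  = {es_mcid(baseline):.2f}")
    print(f"SEM-MCID = {sem_mcid(baseline, reliability=0.89):.2f}")
```

The GRS denominator comes from stable subjects, so it expresses change relative to measurement noise, whereas the ES-MCID scales with the spread of scores between subjects; the SEM-MCID combines that spread with the instrument's reliability.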
In general, effect sizes were fairly similar for the first (0–7 months) and second (7–14 months) intervals, whereas absolute change scores were smaller in the second interval. Because of its poor sensitivity to change, the TAIQOL was excluded from further analyses of the interpretation of change.

Interpretation of change: distribution-based methods

Table 9 Responsiveness: distribution-based indices for the minimal clinically important difference (MCID) using an effect size (ES) of 0.3 and one standard error of measurement (SEM)
Questionnaire | ES-MCID* 0–7 months | ES-MCID* 7–14 months | SEM-MCID** 0–7 months | SEM-MCID** 7–14 months
Generic
RAND | 5.0 | 4.3 | 5.3 | 4.5
FSQ generic | 5.7 | 5.1 | 5.4 | 4.8
FSQ specific | 7.4 | 5.6 | 7.8 | 5.9
Disease-specific
OM-6 | 8.3 | 4.7 | 8.8 | 5.0
NRS child | 9.4 | 6.7 | 12.5 | 8.9
FFQ | 7.4 | 4.0 | 6.1 | 3.3
NRS caregiver | 6.1 | 4.8 | 8.3 | 6.6
* MCID using an ES of 0.3 as benchmark
** MCID using one SEM as benchmark

Interpretation of change: anchor-based methods
Graph 1 (panels a and b)

Comparison of anchor- and distribution-based methods
Graph 2 Minimal clinically important difference (MCID) per questionnaire according to distribution-based (ES-MCID and SEM-MCID) and anchor-based (AOM frequency and AOM severity) methods

Discussion
Reliability and validity
Responsiveness
Generic versus disease-specific questionnaires
The reasons for the poor discriminant validity and sensitivity to change of the TAIQOL are not obvious. Possibly each subscale score represents an aspect of HRQoL that is too narrow to be sensitive to differences or change; combining subscales into more comprehensive constructs might then improve sensitivity. In addition, each TAIQOL item consists of two questions: a question about FHS followed by a request to rate the child's well-being in relation to this health status.
Response shift may have adjusted the caregivers' expectations of how their child feels to the child's changing health; that is, as caregivers adapt to the situation, they may rate their child's well-being as better than it actually is. Studies are needed on factors other than questionnaire type (generic versus disease-specific) that may influence sensitivity to change or responsiveness, such as questionnaire structure and content, disease severity, co-morbidity, and other population characteristics.

Bias and generalisability
Secondly, two different modes of administration were used in assessing test–retest reliability: completion at the clinic and completion at home. A tendency to give more socially desirable answers at the clinic, and other effects such as greater distraction when completing the questionnaires at home, may have caused differences in scores between the first (test) and second (retest) assessments. Although this effect may be larger for single-item questionnaires such as the NRSs than for multiple-item questionnaires, and might explain their somewhat smaller ICCs (0.81–0.83), its overall impact on the ICCs appears to be small. Thirdly, during the trial, 8 children (4.2%) in the pneumococcal vaccine group and 13 (6.7%) in the control vaccine group were lost to follow-up, and one child switched from the control to the pneumococcal vaccine group. These small numbers of dropouts and crossovers are unlikely to have influenced the trial results. Furthermore, indices of validity and reliability are not fixed characteristics of FHS and HRQoL questionnaires but depend on the study design, the intervention, and, in particular, the study population. Our study population had relatively severe ear disease, with frequent episodes, and was older than the average child with AOM.
In populations with less severe disease, the questionnaires may show more ceiling effects and less discriminant validity. Therefore, the results of this study should be generalized only to paediatric populations with moderate to severe recurrent acute ear infections at an older age (approximately 14–54 months).

Recommendations for clinical use
In conclusion, the generic (RAND, FSQ generic, and FSQ specific) and disease-specific (OM-6, FFQ, and, to a lesser extent, NRS caregiver) questionnaires showed similarly high reliability and adequate construct validity, discriminant validity, and responsiveness to justify their use in clinical studies of children with rAOM. However, the NRSs as used in this study may be less suitable for assessing HRQoL in this population. The TAIQOL, the only true generic HRQoL questionnaire in this study, showed poor discriminant validity and sensitivity to change and needs extensive revision before further use in clinical outcome studies of children with otitis media. We recommend using both a generic questionnaire (RAND or FSQ) and the OM-6 in clinical studies of FHS in children with rAOM: this combines the generalisability of the former with the sensitivity of the latter, and facilitates head-to-head comparisons of their performance in various paediatric populations with OM. More studies are needed that assess the responsiveness of paediatric QoL questionnaires with multiple methods, both distribution-based and anchor-based, to improve our understanding of minimal clinically important changes in various paediatric conditions. Further studies of the factors besides questionnaire type that may influence sensitivity to change or responsiveness, as outlined above, may deepen our understanding of the complex dynamics of HRQoL and FHS assessment.