Introduction

Methods

Instruments

Table 1 Direct quantification of three- and five-level (3L, 5L) descriptors

                                                    Number  Mean    Median  95% CI
3L
Mobility
    a                                               –       –       –       –
    Some problems in walking about                  74      26.70   22      22.82–30.59
    a                                               –       –       –       –
Self-care
    a                                               –       –       –       –
    Some problems washing or dressing self          74      30.18   28      26.10–34.25
    a                                               –       –       –       –
Usual activities
    a                                               –       –       –       –
    Some problems with performing usual activities  77      29.74   25      25.95–33.53
    a                                               –       –       –       –
Pain/Discomfort
    a                                               –       –       –       –
    Moderate pain or discomfort                     66      32.33   31      28.56–36.10
    Extreme pain or discomfort                      66      86.36   89      83.75–88.98
    a                                               –       –       –       –
Anxiety/Depression
    a                                               –       –       –       –
    Moderately anxious or depressed                 67      33.94   34      29.89–37.99
    a                                               67      88.82   90      86.88–90.77
    a                                               –       –       –       –
5L
Mobility
    a                                               –       –       –       –
    Mild problems in walking about                  75      11.31   11      9.73–12.88
    Some problems in walking about                  75      38.39   40      35.39–41.39
    Many problems in walking about                  75      79.80   82      76.81–82.79
    a                                               –       –       –       –
Self-care
    a                                               –       –       –       –
    Mild problems washing or dressing self          76      11.24   10      9.72–12.76
    Some problems washing or dressing self          76      37.14   38      34.14–40.15
    Many problems washing or dressing self          76      80.61   81      77.81–83.40
    a                                               –       –       –       –
Usual activities
    a                                               –       –       –       –
    Mild problems with performing usual activities  77      11.08   10      9.29–12.87
    Some problems with performing usual activities  77      39.01   40      36.12–41.90
    Many problems with performing usual activities  77      80.81   83      77.70–83.91
    a                                               –       –       –       –
Pain/Discomfort
    a                                               –       –       –       –
    Mild pain or discomfort                         53      8.85    8       7.43–10.26
    Moderate pain or discomfort                     53      32.32   31      29.58–35.06
    Severe pain or discomfort                       53      67.94   68      64.98–70.90
    Extreme pain or discomfort                      53      91.26   94      88.96–93.57
    a                                               –       –       –       –
Anxiety/Depression
    a                                               –       –       –       –
    A little anxious or depressed                   59      9.46    8       7.97–10.94
    Moderately anxious or depressed                 59      32.56   33      30.01–35.11
    Very anxious or depressed                       59      67.37   66      64.55–70.20
    Extremely anxious or depressed                  59      91.34   92      89.42–93.25
    a                                               –       –       –       –

CI confidence interval

To obtain quantitative values for each level descriptor of 3L and 5L, we used a visual analog scale (VAS), with one scale for each of the five EQ-5D dimensions. Each VAS consisted of a horizontal hash-marked line without corresponding numbers, with the extreme-level descriptors belonging to that dimension as anchors. Respondents were asked to indicate their score by marking the line. For the most severe category of Pain/Discomfort and Anxiety/Depression, the original descriptor was labeled “extreme”. Because the study was part of a larger process of choosing the definitive level descriptors for the official five-level version of the EQ-5D, we decided to use the entire continuum of disability (extreme included) and used “worst imaginable” as the upper VAS anchor for these two dimensions. This is analogous to the other three dimensions, which ranged from “no problems” to “unable to”.

Study design

All participants completed both the direct and the indirect quantification task. For the direct method, all 3L answers were obtained during the panel sessions and all 5L answers as part of the postal survey, to avoid memory effects. For the indirect method, participants scored ten health states in the panel sessions (acute pharyngitis, exacerbation of eczema, hip fracture, cerebrovascular accident/stroke with moderate impairments, moderate gastritis, low spinal cord lesion, mild depression, back and neck pain, severe dementia, and acute multiple injury) and the remaining five in the survey (otitis externa, severe stable brain injury, irritable bowel syndrome, acute large burn, and posttraumatic stress disorder), because we expected that more than ten health states within one session could lead to concentration problems. The two sets of health states were balanced according to severity and duration. Following this design, the indirect method provided 225 responses for each respondent: 15 diseases × 5 dimensions × 3 response scales.
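The response count follows directly from the design figures given above. As a quick check, the short Python sketch below recomputes it; the variable names are illustrative and do not come from the study.

```python
# Responses per respondent for the indirect method,
# using only the design figures stated in the text.
N_VIGNETTES_PANEL = 10   # health states scored in the panel sessions
N_VIGNETTES_SURVEY = 5   # health states scored in the postal survey
N_DIMENSIONS = 5         # EQ-5D dimensions
N_RESPONSE_SCALES = 3    # 3L, 5L, and VAS

n_vignettes = N_VIGNETTES_PANEL + N_VIGNETTES_SURVEY
responses_per_respondent = n_vignettes * N_DIMENSIONS * N_RESPONSE_SCALES

print(responses_per_respondent)  # 225
```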
Direct quantification of level descriptors

In the direct method, respondents were asked to place the 3L and the 5L descriptors on the VAS for each dimension separately. As the extreme levels were used as anchors of the VAS, for 3L only the middle-category (3L-2) descriptor needed to be scored, except for Pain/Discomfort and Anxiety/Depression, which needed additional scoring of 3L-3 (extreme). Similarly, the intermediate 5L-2, 5L-3, and 5L-4 descriptors were scored for each dimension, and for Pain/Discomfort and Anxiety/Depression 5L-5 was scored as well.

Indirect quantification of level descriptors

As an alternative to the direct method, we developed an indirect method that we believe lies closer to the actual use of the EQ-5D instrument, as it uses a (hypothetical) health state as a calibrator or medium to derive a VAS score. In contrast to the direct method, the object of measurement in the indirect method is not a 3L or 5L descriptor but a complete health scenario (vignette). Each vignette was scored independently with the 3L descriptors, with the 5L descriptors, and on a VAS, one for each separate dimension. Consequently, an indirect head-to-head comparison of 3L and 5L scores could be made, calibrated via the common VAS score.

Fig. 1 Disease vignette with empty EQ-5D descriptive system

The 5L and 3L response scales were presented on the left and the right side of one page (per dimension), respectively. The respondents were first invited to score the 5L descriptors for all dimensions and all vignettes while covering the right side of the page, which showed the 3L descriptors. Next, they were instructed to return to the first vignette, cover the left side with the 5L scores, and provide the 3L response for all vignettes. Pilot testing revealed that when respondents scored 3L first, they tended to avoid the in-between levels 2 and 4 of 5L; for this reason, all respondents were asked to score 5L first.
Adequate instruction was critical; it stressed that 3L and 5L were two independent ways of scoring (in the postal survey, these instructions were repeated in writing). Subsequently, VAS scores were obtained on a separate form, without respondents having access to their 3L and 5L scores. The demanding task of first providing 5L classifications on all five dimensions of all 15 vignettes minimized possible memory effects when the participants were instructed to return to the first vignette to score the 3L classifications while covering the 5L responses.

Analysis

Results of the direct and indirect methods are presented with conventional descriptive statistics. Results of the indirect method were derived by grouping 3L-VAS pairs and 5L-VAS pairs for each respondent per vignette and subsequently calculating level means over all vignettes and all respondents combined. For each respondent, a combined set of 3L, 5L, and VAS scores was removed if at least one of the three scores was missing, which equalized the number of VAS observations between 3L and 5L.

Characteristics

$y = ax + b$; $y = x^{1.5}$

Part of the evaluation of equidistance is an analysis of the position of the extreme levels according to the indirect method: are the VAS ratings for the extreme level descriptors close to the supposed anchor values? Ideally, 3L-1 and 5L-1 scores would equal 0 and 3L-3 and 5L-5 scores would equal 100, except for Pain/Discomfort and Anxiety/Depression, for which the 3L and 5L extreme level descriptors were not identical to the VAS anchors.

We regarded equidistance, transformed or untransformed, as a desirable characteristic of the new 5L system, as opposed to no systematic relation at all between the quantitative positions of the level descriptors.
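To make the indirect-method aggregation described under Analysis more concrete, the following Python sketch shows one way the removal of incomplete score sets and the calculation of level means could be implemented. It is an illustration under stated assumptions, not the authors' code: the record layout, the function name `level_means`, and all score values are hypothetical; only the vignette and dimension names are taken from the text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (respondent, vignette, dimension, 3L level, 5L level, VAS).
# None marks a missing score. All values are invented for illustration.
records = [
    (1, "hip fracture", "Mobility", 2, 4, 70),
    (1, "mild depression", "Mobility", 1, 1, 0),
    (2, "hip fracture", "Mobility", 3, 4, 80),
    (2, "mild depression", "Mobility", 1, 2, 10),
    (3, "hip fracture", "Mobility", 2, None, 60),  # incomplete set: dropped
]


def level_means(rows):
    """Return mean VAS per (dimension, level) for the 3L and the 5L system."""
    vas_by_3l = defaultdict(list)
    vas_by_5l = defaultdict(list)
    for _respondent, _vignette, dimension, level_3l, level_5l, vas in rows:
        # Drop the whole set if any of the three scores is missing, so that
        # 3L and 5L are based on the same VAS observations.
        if level_3l is None or level_5l is None or vas is None:
            continue
        vas_by_3l[(dimension, level_3l)].append(vas)
        vas_by_5l[(dimension, level_5l)].append(vas)
    means_3l = {key: mean(values) for key, values in vas_by_3l.items()}
    means_5l = {key: mean(values) for key, values in vas_by_5l.items()}
    return means_3l, means_5l


means_3l, means_5l = level_means(records)
for key in sorted(means_3l):
    print("3L", key, means_3l[key])
for key in sorted(means_5l):
    print("5L", key, means_5l[key])
```

Because the incomplete set of respondent 3 is dropped before grouping, the 3L and 5L level means in this sketch rest on the same four VAS observations.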
Consistency between identical-level descriptors across dimensions was also regarded as a desirable property, because it indicates that respondents have a consistent conceptualization of the grading terms used over different dimensions of health. Achieving consistency does not imply that utility values would also be consistent across dimensions, because utility values express an entire EQ-5D profile, whereas we investigated VAS scores within each dimension separately. Furthermore, a choice-based method presumably leads to different results than the dimension-specific VAS we used. We investigated isoformity to see whether the new 5L system was a refinement of 3L or a new system; whether or not isoformity was achieved says nothing about the 5L system in itself.

Results

The mean age of the participants was 53.6 years, and 42.7% were men. Of the 82 respondents who attended the panel sessions, 81 returned the survey. Three respondents (4%) were of Turkish nationality, two (2%) were of Moroccan nationality, and the remaining 75 (94%) were of Dutch origin.

In the Pain/Discomfort and Anxiety/Depression dimensions, respondents often failed to score the extreme-level descriptor when using the direct method (8 and 9 for 3L, respectively, and 22 and 16 for 5L, respectively). For these respondents, the remaining scorings were deleted for that dimension because of possible context effects (i.e., spreading out the VAS scores of the remaining 3L descriptors over the VAS scale). For the direct method, missing responses for 3L ranged from 6.1% (Usual Activities) to 19.5% (Pain/Discomfort) and for 5L from 4.9% (Usual Activities) to 34.6% (Pain/Discomfort). For the indirect method, missing responses ranged from 1.1% (Usual Activities) to 2.5% (Pain/Discomfort) for the three response scales (3L, 5L, and VAS) combined.

Characteristics: direct method

Fig. 2 Direct quantification of the three- and five-level (3L, 5L) descriptors. Visual analog scale (VAS) means by dimension

Table 2 Isoformity of identical three- and five-level (3L, 5L) descriptors for the direct quantification method

Dimension            Comparison       Mean difference  P
Mobility             3L-2 vs. 5L-3    −11.4            <0.001
Self-care            3L-2 vs. 5L-3    −8.0             0.002
Usual activities     3L-2 vs. 5L-3    −9.4             <0.001
Pain/Discomfort      3L-2 vs. 5L-3    −1.4             0.501
Pain/Discomfort      3L-3 vs. 5L-5    −4.9             0.012
Anxiety/Depression   3L-2 vs. 5L-3    2.8              0.276
Anxiety/Depression   3L-3 vs. 5L-5    −3.0             0.025

Characteristics: indirect method

Table 3 Indirect quantification of three- and five-level (3L, 5L) descriptors

                                                    Number  Mean    Median  CI
3L
Mobility
    No problems in walking about                    599     1.69    0       1.31–2.07
    Some problems in walking about                  403     42.94   40      40.24–45.64
    Unable to walk about                            180     91.70   99      89.28–94.12
Self-care
    No problems with self-care                      482     3.24    0       2.40–4.08
    Some problems washing or dressing self          435     39.18   34      36.58–41.78
    Unable to wash or dress self                    273     85.47   95      82.77–88.16
Usual activities
    No problems with performing usual activities    235     4.50    2       3.49–5.51
    Some problems with performing usual activities  582     36.55   30      34.40–38.71
    Unable to perform usual activities              378     88.54   95      86.87–90.22
Pain/Discomfort
    No pain or discomfort                           246     12.64   4       9.94–15.34
    Moderate pain or discomfort                     643     35.76   31      33.92–37.60
    Extreme pain or discomfort                      275     83.21   89      80.82–85.61
Anxiety/Depression
    Not anxious or depressed                        433     6.29    1       5.01–7.57
    Moderately anxious or depressed                 478     42.45   40      40.26–44.63
    Extremely anxious or depressed                  270     84.80   90      82.73–86.86
5L
Mobility
    No problems in walking about                    547     1.30    0       0.92–1.69
    Mild problems in walking about                  147     15.33   11      12.64–18.02
    Some problems in walking about                  159     36.48   31      33.00–39.97
    Many problems in walking about                  217     69.82   76      66.72–72.92
    Unable to walk about                            112     97.36   100     95.24–99.48
Self-care
    No problems with self-care                      398     2.45    0       1.43–3.48
    Mild problems washing or dressing self          204     12.70   9       10.76–14.64
    Some problems washing or dressing self          184     36.09   33      33.00–39.17
    Many problems washing or dressing self          257     71.20   78      68.33–74.06
    Unable to wash or dress self                    147     91.37   99      87.80–94.94
Usual activities
    No problems with performing usual activities    136     3.22    0       1.49–4.95
    Mild problems with performing usual activities  268     12.39   9       10.68–14.10
    Some problems with performing usual activities  228     32.53   30      29.97–35.09
    Many problems with performing usual activities  351     69.54   75      67.18–71.90
    Unable to perform usual activities              212     95.35   100     93.74–96.96
Pain/Discomfort
    No pain or discomfort                           145     8.34    0       5.32–11.37
    Mild pain or discomfort                         274     17.27   12      15.13–19.41
    Moderate pain or discomfort                     367     36.83   35      34.91–38.76
    Severe pain or discomfort                       263     71.72   79      69.05–74.39
    Extreme pain or discomfort                      115     92.76   98      89.86–95.65
Anxiety/Depression
    Not anxious or depressed                        305     4.75    0       3.07–6.43
    A little anxious or depressed                   241     16.48   10      14.33–18.63
    Moderately anxious or depressed                 271     41.98   41      39.72–44.25
    Very anxious or depressed                       248     74.19   80      71.69–76.70
    Extremely anxious or depressed                  116     92.33   97      89.61–95.04

CI confidence interval

Fig. 3 Indirect quantification of the three- and five-level (3L, 5L) descriptors. Visual analog scale (VAS) means by dimension

Table 4 Consistency between dimensions for the indirect quantification method.
Variance components estimates (percentages) and generalizability coefficients (G-coefficients) for comparable dimensions of three- and five-level (3L, 5L) instruments

Mobility/Self-care/Usual activities
                          3L      5L
Label                     66.12   71.52
Vignette                  8.05    6.35
Dimension                 0.26    0.04
Respondent                0.33    0.79
Label × vignette          5.60    2.91
Label × dimension         0.22    0.12
Label × respondent        2.20    2.59
Vignette × dimension      0.60    0.17
Vignette × respondent     3.77    2.57
Dimension × respondent    0.76    0.60
Residual                  12.09   12.34
G-coefficient             0.86    0.87

Pain/Discomfort; Anxiety/Depression
                          3L      5L
Label                     65.25   73.58
Vignette                  4.95    2.73
Dimension                 0.00    0.00
Respondent                0.65    0.77
Label × vignette          1.91    1.02
Label × dimension         0.04    0.00
Label × respondent        2.96    3.36
Vignette × dimension      1.06    0.17
Vignette × respondent     5.52    4.50
Dimension × respondent    0.88    0.45
Residual                  16.78   13.42
G-coefficient             0.81    0.86

Discussion

In this study, we compared the quantitative position of the level descriptors of the standard EQ-5D3L and a new five-level version using two independent methods. The study showed that extending the EQ-5D3L to a five-level version by inserting two extra levels, while leaving the existing descriptors unaltered, is not a simple refinement but a redesign. The inserted levels pushed the extreme levels closer to the anchors, which indicates that 5L makes better use of the measurement continuum and contributes to the superior descriptive power of the 5L version. Reassuringly, in both the 3L and 5L versions the position of the descriptors was independent of dimension.
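The statement that the inserted levels pushed the extreme levels closer to the anchors can be checked against the indirect-method means reported in Table 3. The Python sketch below is an illustration rather than part of the study's analysis: it assumes anchor values of 0 and 100, and the names `anchor_gaps`, `EXTREMES_3L`, and `EXTREMES_5L` are made up for this example. For Pain/Discomfort and Anxiety/Depression the upper VAS anchor was “worst imaginable”, so the upper gap for those two dimensions is not expected to reach zero.

```python
# Mean VAS of the extreme level descriptors, indirect method (Table 3).
# Each entry: dimension -> (lowest-level mean, highest-level mean).
EXTREMES_3L = {
    "Mobility": (1.69, 91.70),
    "Self-care": (3.24, 85.47),
    "Usual activities": (4.50, 88.54),
    "Pain/Discomfort": (12.64, 83.21),
    "Anxiety/Depression": (6.29, 84.80),
}
EXTREMES_5L = {
    "Mobility": (1.30, 97.36),
    "Self-care": (2.45, 91.37),
    "Usual activities": (3.22, 95.35),
    "Pain/Discomfort": (8.34, 92.76),
    "Anxiety/Depression": (4.75, 92.33),
}

# Assumed anchor values at the two ends of the VAS.
LOWER_ANCHOR = 0.0
UPPER_ANCHOR = 100.0


def anchor_gaps(extremes):
    """Return (gap to lower anchor, gap to upper anchor) per dimension."""
    return {
        dimension: (round(low - LOWER_ANCHOR, 2), round(UPPER_ANCHOR - high, 2))
        for dimension, (low, high) in extremes.items()
    }


gaps_3l = anchor_gaps(EXTREMES_3L)
gaps_5l = anchor_gaps(EXTREMES_5L)

for dimension in EXTREMES_3L:
    print(dimension, "3L gaps:", gaps_3l[dimension], "5L gaps:", gaps_5l[dimension])

# True if 5L is closer to both anchors than 3L in every dimension.
closer_everywhere = all(
    gaps_5l[d][0] < gaps_3l[d][0] and gaps_5l[d][1] < gaps_3l[d][1]
    for d in gaps_3l
)
print(closer_everywhere)
```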
Equidistance was not achieved for either system, with values in most cases lower than the equidistant values. Both methods revealed a large gap between the 5L-3 and 5L-4 levels, regardless of dimension. This could be caused by the wording of 5L-3 [some and moderate(ly)] being interpreted as fairly mild. In Pain/Discomfort, respondents tended to avoid the lower anchor of the scale, indicating some pain or discomfort on the VAS while scoring no problems on 3L and 5L. This suggests that respondents preferred a more refined response scale for scoring pain or discomfort, possibly a scale with even more than five response options (as in, e.g., the HUI3 or SF-36). Also noticeable were the gaps observed for the upper extreme in Self-Care, for which we cannot provide an explanation.

Isoformity between 3L and 5L showed mixed results. The 3L-1 vs. 5L-1 descriptors showed isoformity (indirect method only), as expected, because both indicate the ceiling (no problems). Isoformity was also established for the middle level descriptors of Pain/Discomfort and Anxiety/Depression for both methods. This could be due to the wording of the middle level descriptors, as the descriptor “some problems” represented a wider range, and hence more potential variation, than “moderate(ly)”, as used in Pain/Discomfort and Anxiety/Depression. Assuming that the descriptor “some problems” was a well-considered choice in the development of the original EQ-5D3L system in order to cover the entire range between the two extremes, it is questionable whether that descriptor is still suitable in a 5L version.

A potential weakness of the study procedure is that 3L and 5L were presented on one sheet, and panelists were asked to score the 5L dimensions first while covering 3L, and vice versa. We cannot be sure that respondents actually complied with the blinding procedure in the follow-up measurement. Also, there might have been an order effect, as 5L always preceded 3L.
The experimental five-level EQ-5D version presented here is likely to demonstrate a less severe ceiling effect. Assuming that milder states are more common in the general population, we expect increased benefit in the detection of mild problems and in measuring and monitoring general population health, although the extra 5L-4 level is expected to also lead to better differentiation and detection of more severe health states. The methodology presented here can be of use in the development of generic or disease-specific health status measures.