Introduction

(Mueller et al., 2005; Li et al., 2005; Jovicich et al., 2006; Littmann et al., 2006; Preboske et al., 2006; Ashburner and Friston, 2000; Tofts, 1998; Van Haren et al., 2003; Schnack et al., 2004; Han et al., 2006; Raz et al., 2005; Breillmann et al., 2001; Good et al., 2002)

Methods and results

(Table 1; Figs. 1-4; Ashburner and Friston, 2000, 2005; Ashburner, 2007; Benjamini and Hochberg, 1995; http://www.fil.ion.ucl.ac.uk/spm)

Discussion

In our data set, we found the effect of disease to be substantially larger than the effect of scanner, and we did not find a significant interaction of disease with scanner or with software upgrades. In general, the effect of disease in AD is likely to be larger than the effect of scanner, which is supported by our finding of no appreciable interaction between scanner and the effect of interest. Ideally, our findings should be validated by further studies with larger data sets, which would allow effect sizes to be estimated more precisely and the distance between the scanner-effect cluster and the group-effect cluster to be quantified. Nevertheless, comparing the magnitudes of the scanner and disease effects in the medial temporal lobe cluster of our data set shows that scanner differences had minimal effect in the regions most relevant to the study of AD. Furthermore, although some differences among data from different scanners are likely, our experiments were explicitly designed to detect such scanner-related differences, and it appears possible to remove these small effects by modeling appropriate confounds in the design matrix. We were able to model this interaction easily because relatively homogeneous groups of cases and controls were scanned on each machine.
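To make the idea of modeling scanner as a confound concrete, here is a minimal Python sketch, assuming a simple additive model in which each scanner shifts grey matter values at a voxel by a constant offset. It uses a cell-means parameterization (one indicator column per group-by-scanner cell, similar in spirit to a full factorial design), for which ordinary least squares reduces to cell averages. The offsets, effect size, noise level, group sizes, and all function names are hypothetical, and the unbalanced allocation in the usage block is chosen deliberately to show how an unmodeled scanner offset biases a pooled comparison.

```python
import random
from statistics import mean

# Hypothetical simulation parameters (illustrative only, not values from this study).
SCANNER_OFFSETS = [0.0, 0.05, -0.03, 0.20]  # additive intensity bias of each scanner
DISEASE_EFFECT = -0.30                       # AD minus control, the same on every scanner
NOISE_SD = 0.03


def simulate(n_control, n_ad, seed=0):
    """Return {(group, scanner): [values]} from an additive model:
    value = baseline + disease effect (AD only) + scanner offset + noise.
    n_control[s] and n_ad[s] give the number of subjects scanned on scanner s."""
    rng = random.Random(seed)
    data = {}
    for group, sizes, effect in (("control", n_control, 0.0), ("AD", n_ad, DISEASE_EFFECT)):
        for s, (offset, n) in enumerate(zip(SCANNER_OFFSETS, sizes)):
            data[(group, s)] = [0.6 + effect + offset + rng.gauss(0.0, NOISE_SD)
                                for _ in range(n)]
    return data


def naive_difference(data):
    """Pooled AD mean minus pooled control mean, ignoring scanner entirely."""
    pooled = {g: [v for (grp, _), vals in data.items() if grp == g for v in vals]
              for g in ("control", "AD")}
    return mean(pooled["AD"]) - mean(pooled["control"])


def fit_cell_means(data):
    """OLS with one indicator column per group x scanner cell:
    each parameter estimate is simply that cell's sample mean."""
    return {cell: mean(values) for cell, values in data.items()}


def group_effect(beta, n_scanners):
    """Main effect of group: AD minus control within each scanner, averaged
    over scanners, so additive scanner offsets cancel."""
    return mean(beta[("AD", s)] - beta[("control", s)] for s in range(n_scanners))


def interaction_contrasts(beta, n_scanners):
    """Group difference on each scanner relative to scanner 0; values near
    zero are consistent with no group x scanner interaction."""
    d0 = beta[("AD", 0)] - beta[("control", 0)]
    return [beta[("AD", s)] - beta[("control", s)] - d0 for s in range(1, n_scanners)]


if __name__ == "__main__":
    # Unbalanced allocation: most AD cases happen to be scanned on the biased scanner 3.
    data = simulate(n_control=[10, 10, 10, 10], n_ad=[5, 5, 5, 25])
    beta = fit_cell_means(data)
    k = len(SCANNER_OFFSETS)
    print("naive difference:     ", round(naive_difference(data), 3))
    print("scanner-adjusted:     ", round(group_effect(beta, k), 3))
    print("interaction contrasts:", [round(c, 3) for c in interaction_contrasts(beta, k)])
```

With the allocation above, the pooled comparison absorbs part of scanner 3's offset, whereas the within-scanner contrast recovers the simulated disease effect. With balanced, homogeneous groups on each machine, as in this study, the two estimates coincide in expectation, but the scanner columns still remove scanner-related variance from the residuals.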
We cannot say whether smaller, or more subtle and spatially distributed, changes would be equally robust to scanner effects. Although we did not detect a significant interaction of scanner with disease, we cannot be certain that this reflects the true absence of such an effect. The lack of significance may instead reflect insufficient statistical power, for example too few scans. High average residual variance or inhomogeneity of residual variance could also reduce the power to detect such effects. However, we accounted for variance inhomogeneity in our analysis by assuming unequal variances across the levels of each of the two factors (scanner and group) in our full factorial design. Post hoc, we examined the residual variance across scanners to assess whether it explained the lack of sensitivity. In the region of greatest disease effect, the residual variance was low and showed no significant inhomogeneity, whereas in the region of greatest scanner effect the average residual variance was lower still but more variable. Although our tests for variance inhomogeneity across the six scanners were not performed voxel by voxel over the whole brain, our findings suggest that variance inhomogeneity is location-dependent and should be accounted for when analyzing data acquired on different scanners, as we attempted to do in our analysis. The greatest scanner effect was in the thalamus. This effect was largely driven by the scanner whose resistive shim set was not cooled to superconducting temperatures, which suggests that such hardware differences affect thalamic segmentation. The tissue composition of the thalamus is debated: it is not purely grey matter, since it receives numerous white matter tracts from other parts of the brain. In addition, the grey matter intensity of the thalamus differs from that of cortical grey matter.
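The post hoc check of residual variance across scanners can be illustrated with a minimal sketch at a single location. Bartlett's test is used here purely as an example; the text does not state which test was applied. Residuals are taken about each group-by-scanner cell mean and pooled within scanner; the simulated data, noise levels, and function names are hypothetical. The critical value 11.07 is the 0.95 quantile of a chi-square distribution with 5 degrees of freedom, matching six scanners.

```python
import math
import random

# Chi-square 0.95 quantile for k - 1 = 5 degrees of freedom (six scanners).
CHI2_CRIT_DF5 = 11.07


def simulate(noise_sd, n_per_cell=15, seed=1):
    """Hypothetical data at one voxel: data[scanner][group] -> values,
    with a scanner-specific noise SD and no mean effects."""
    rng = random.Random(seed)
    return {
        s: {g: [rng.gauss(0.0, sd) for _ in range(n_per_cell)] for g in ("control", "AD")}
        for s, sd in enumerate(noise_sd)
    }


def residual_variances(data):
    """Residuals about each group-by-scanner cell mean, pooled within scanner.
    Returns {scanner: (variance, degrees_of_freedom)}."""
    out = {}
    for scanner, groups in data.items():
        ss, n = 0.0, 0
        for values in groups.values():
            m = sum(values) / len(values)
            ss += sum((v - m) ** 2 for v in values)
            n += len(values)
        dof = n - len(groups)  # one cell mean estimated per group
        out[scanner] = (ss / dof, dof)
    return out


def bartlett(var_dof):
    """Bartlett's statistic for equality of k >= 2 variances, each given with
    its degrees of freedom; approximately chi-square(k - 1) under H0."""
    k = len(var_dof)
    total = sum(d for _, d in var_dof)
    pooled = sum(v * d for v, d in var_dof) / total
    stat = total * math.log(pooled) - sum(d * math.log(v) for v, d in var_dof)
    correction = 1 + (sum(1 / d for _, d in var_dof) - 1 / total) / (3 * (k - 1))
    return stat / correction


if __name__ == "__main__":
    for label, sds in (("homogeneous", [0.05] * 6),
                       ("inhomogeneous", [0.03, 0.03, 0.05, 0.05, 0.08, 0.08])):
        stat = bartlett(list(residual_variances(simulate(sds)).values()))
        print(f"{label}: T = {stat:.2f}, reject equal variances: {stat > CHI2_CRIT_DF5}")
```

Repeating such a check at every voxel, rather than in a few regions, would be needed to map where the inhomogeneity occurs.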
This intrinsically poor intensity contrast makes the thalamus susceptible to small scanner-related differences in image contrast. Variability is also lower in this region, so statistical tests are more sensitive to such differences. Together with scanner effects, these factors may make accurate segmentation of the thalamus difficult (Preboske et al., 2006; Ashburner and Friston, 2005).

As long as an analysis makes provision for different scanners and/or upgrades, the scanner effect, whatever its magnitude, is unlikely to compromise the integrity of the results. However, if there is a true physical interaction between the biological effect of interest and the method of measurement, neither perfect calibration nor the use of a single scanner would prevent bias. Conversely, any unusually large effect from one scanner would be attenuated during normalization, because the template is built from the scans of all scanners and therefore averages their different effects.

(Zakzanis et al., 2003; Whitwell and Jack, 2005; Wahlund et al., 2005)
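The attenuation argument for the template can be written out under a deliberately simple assumption: that the template is the voxelwise mean of N spatially normalized images and that scanner s adds a constant offset \delta_s to each of its n_s images. Real template construction involves iterative registration, so this is only a first-order sketch, and all symbols below are introduced here for illustration.

```latex
x_{sj} = \mu + \delta_s + \varepsilon_{sj}, \qquad s = 1,\dots,K,\; j = 1,\dots,n_s, \qquad N = \sum_{s=1}^{K} n_s

\bar{x} = \frac{1}{N}\sum_{s=1}^{K}\sum_{j=1}^{n_s} x_{sj}
        = \mu + \sum_{s=1}^{K}\frac{n_s}{N}\,\delta_s + \bar{\varepsilon}
```

An offset confined to scanner 1 therefore shifts the template by only (n_1/N)\delta_1; with K = 6 scanners contributing equal numbers of scans, that is \delta_1/6.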