Power calculation

The concept of power is closely related to the two types of statistical errors: the Type I error (i.e., rejecting a true hypothesis, which occurs with probability α) and the Type II error (i.e., accepting a false hypothesis, which occurs with probability β). Power is defined as 1 − β, i.e., the probability of rejecting a false hypothesis, or the probability of not making a Type II error. The basic aim of a power study is to determine the sample size N that is required to achieve adequate power, given a chosen α and a particular effect size.

Fig. 1 Classical univariate ACE-twin model

$$ T = N\left[\log|\boldsymbol{\Sigma}| + \text{trace}(\boldsymbol{\Sigma}^{-1}\mathbf{S}) - \log|\mathbf{S}| - p + (\mathbf{m} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{m} - \boldsymbol{\mu})\right], \tag{1} $$

where N is the sample size, Σ and μ are the expected covariance matrix and mean vector, S and m are the observed covariance matrix and mean vector, and p is the number of observed variables.

The non-centrality parameter λ has the same form, with the covariance matrices and mean vectors Σ_0, μ_0 and Σ_A, μ_A taking the place of S, m and Σ, μ:

$$ \lambda = N\left[\log|\boldsymbol{\Sigma}_{A}| + \text{trace}(\boldsymbol{\Sigma}_{A}^{-1}\boldsymbol{\Sigma}_{0}) - \log|\boldsymbol{\Sigma}_{0}| - p + (\boldsymbol{\mu}_{0} - \boldsymbol{\mu}_{A})'\boldsymbol{\Sigma}_{A}^{-1}(\boldsymbol{\mu}_{0} - \boldsymbol{\mu}_{A})\right] \tag{2} $$
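The step from a non-centrality parameter to power, and from there to a required sample size, can be sketched as follows. This is a minimal illustration, not the routine used by Mx: the function names are hypothetical, the non-central χ² distribution is evaluated with a plain Poisson-mixture series using only the Python standard library, and the rescaling of N relies on λ being proportional to N, as in the expression for λ above. The example inputs are the first row of Table 3, under the assumption that the test has 6 degrees of freedom.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x), power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Critical value of the central chi-square distribution (bisection)."""
    lo, hi = 0.0, df + 10.0
    while gamma_p(df / 2.0, hi / 2.0) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_p(df / 2.0, mid / 2.0) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_cdf(x, df, ncp):
    """Non-central chi-square CDF as a Poisson mixture of central CDFs."""
    if ncp <= 0.0:
        return gamma_p(df / 2.0, x / 2.0)
    half = ncp / 2.0
    total = 0.0
    j = 0
    while True:
        log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        term = math.exp(log_w) * gamma_p(df / 2.0 + j, x / 2.0)
        total += term
        # Beyond the Poisson mode the terms decrease monotonically.
        if j > half and term < 1e-15:
            break
        j += 1
    return total


def power(ncp, df, alpha=0.05):
    """Probability that the test statistic exceeds the critical value."""
    return 1.0 - ncx2_cdf(chi2_critical(alpha, df), df, ncp)


def ncp_for_power(target, df, alpha=0.05):
    """Non-centrality parameter that yields the target power (bisection)."""
    crit = chi2_critical(alpha, df)
    lo, hi = 0.0, 1.0
    while 1.0 - ncx2_cdf(crit, df, hi) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - ncx2_cdf(crit, df, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def required_n(ncp_obs, n_obs, df, target=0.80, alpha=0.05):
    """Rescale the sample size, assuming the ncp is proportional to N."""
    return n_obs * ncp_for_power(target, df, alpha) / ncp_obs


# Illustrative inputs: first row of Table 3, df = 6 is an assumption.
lambda_80 = ncp_for_power(0.80, df=6)
n_needed = required_n(ncp_obs=837.824, n_obs=9639, df=6)
```

Because λ scales linearly with N, a single analysis of one (exactly simulated) data set suffices: the observed χ² is treated as the non-centrality parameter at the simulated N and rescaled to the value that gives the desired power.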
$$ \text{Var}(t_{i}) = (a + \beta_{a}\,\mathrm{mod}_{t_i})^{2} + (c + \beta_{c}\,\mathrm{mod}_{t_i})^{2} + (e + \beta_{e}\,\mathrm{mod}_{t_i})^{2} \tag{3} $$

$$ \text{Covar}_{MZ}(t_{i}, t_{j}) = (a + \beta_{a}\,\mathrm{mod}_{t_i})(a + \beta_{a}\,\mathrm{mod}_{t_j}) + (c + \beta_{c}\,\mathrm{mod}_{t_i})(c + \beta_{c}\,\mathrm{mod}_{t_j}) \tag{4} $$

$$ \text{Covar}_{DZ}(t_{i}, t_{j}) = \tfrac{1}{2}(a + \beta_{a}\,\mathrm{mod}_{t_i})(a + \beta_{a}\,\mathrm{mod}_{t_j}) + (c + \beta_{c}\,\mathrm{mod}_{t_i})(c + \beta_{c}\,\mathrm{mod}_{t_j}) \tag{5} $$

$$ \boldsymbol{\mu} = \left[\begin{array}{cc} m + \beta_{m}\,\mathrm{mod}_{t_i} & m + \beta_{m}\,\mathrm{mod}_{t_j} \end{array}\right] \tag{6} $$

Fig. 2 Univariate ACE-twin model including moderation on the variances and the means

Exact data simulation

Power calculations based on sufficient summary statistics are computationally relatively efficient to carry out.
However, the actual feasibility of this type of power calculation depends on the number of distinct groups. If the number of groups is large (i.e., >100), it may be more convenient to carry out Monte Carlo based power calculations. We now introduce the concept of exact data simulation, which shares the virtues of power studies based on summary statistics, but is more practicable given a large number of distinct groups.

$$ \mathbf{Z} = (\mathbf{Y} - \mathbf{J} \otimes \mathbf{m}^{t})\mathbf{S}^{-1/2}\boldsymbol{\Sigma}^{1/2} + \mathbf{J} \otimes \boldsymbol{\mu}^{t} \tag{7} $$

where Y is the matrix of simulated data with observed mean vector m and covariance matrix S, J is a vector of ones with as many elements as Y has rows, and Z is the transformed data matrix, whose mean vector and covariance matrix equal μ and Σ exactly.

Illustration 1: multivariate ACE-model with data MCAR

Fig. 3 Four-variate ACE-model with common factors for additive genetic and shared environmental effects, and specifics for A, C and E

Table 1 Illustration 1: Four-variate cross-trait-cross-twin MZ correlations (below diagonal) and DZ correlations (above diagonal) for data without missingness

                 Twin 1                           Twin 2
                 Trait1  Trait2  Trait3  Trait4   Trait1  Trait2  Trait3  Trait4
Twin 1  Trait1   1.00    .45     .45     .45      .55     .30     .30     .30
        Trait2   .45     1.00    .45     .45      .30     .55     .30     .30
        Trait3   .45     .45     1.00    .45      .30     .30     .55     .30
        Trait4   .45     .45     .45     1.00     .30     .30     .30     .55
Twin 2  Trait1   .80     .45     .45     .45      1.00    .45     .45     .45
        Trait2   .45     .80     .45     .45      .45     1.00    .45     .45
        Trait3   .45     .45     .80     .45      .45     .45     1.00    .45
        Trait4   .45     .45     .45     .80      .45     .45     .45     1.00

The three simulated data sets were subsequently analyzed in Mx. In the Mx-script, we specified different groups for the MZ and DZ twins.
Because we used full information maximum likelihood to accommodate the missingness, we did not need to specify different groups for all possible missing data patterns. The Mx command ‘option power’ (α = .05, df = 4) was used to obtain an estimate of the total sample size that would be required for a power of 80%, given the current proportions of subjects in each group.

Illustration 2: gene by environment interaction with latent G and measured, categorical E

Fig. 4 Univariate ACE-model for parents and twin-offspring, including moderation on the variances and the means

The data were analyzed in Mx: different groups were specified for the MZ and DZ twins, and the moderator featured as a so-called definition variable. The ‘option power’ command (α = .05, df = 3) was again used to obtain an estimate of the total sample size that would be required for a power of 80%, given the current proportions of subjects in each group.

Illustration 3: association for a tri-allelic locus with different allele frequencies

Table 2 Expectations for a tri-allelic locus following the standard biometric model when dominance is assumed absent

Genotype   AA     AB           BB     AC                  BC                  CC
f_ij       p^2    2pq          q^2    2pr                 2qr                 r^2
g_ij       x      (x + y)/2    y      (x + z)/2 = −y/2    (y + z)/2 = −x/2    z

$$ \mu_{qtl} = \sum_{ij} f_{ij} \times g_{ij} = p^{2}x + 2pq[(x + y)/2] + q^{2}y + 2pr(-y/2) + 2qr(-x/2) + r^{2}z $$

$$ \sigma^{2}_{qtl} = \sum_{ij} f_{ij}(g_{ij} - \mu_{qtl})^{2} = p^{2}(x - \mu_{qtl})^{2} + 2pq([(x + y)/2] - \mu_{qtl})^{2} + q^{2}(y - \mu_{qtl})^{2} + 2pr([-y/2] - \mu_{qtl})^{2} + 2qr([-x/2] - \mu_{qtl})^{2} + r^{2}(z - \mu_{qtl})^{2} $$

Note: p, q, and r are the frequencies of alleles A, B, and C; x, y, and z are the genotypic values of the homozygotes AA, BB, and CC, with $E(\bar{g}_{ij}) = 0$; μ_qtl and σ²_qtl are the mean and variance of the genotypic values.

Table 3 Results for illustration 3: Power calculations for sib-pair association with a tri-allelic locus with fixed genotypic values

Frequencies of      Effect     Actual N   Nr of groups   χ²        Observed   N required for
alleles A, B, C     size (%)              represented              power      power of 80%
.33/.33/.33         2.5        9,639      81             837.824   1          157
.25/.5/.25          1.7        9,985      81             734.357   1          185
.45/.45/.1          .6         9,993      81             352.963   1          386
.1/.45/.45          2.8        9,992      80             966.698   1          141

Conclusion

In this paper we discussed a third method of power calculation, which can be useful when sufficient summary statistics are available in principle, but the number of possible groups is so large as to render a multi-group analysis impractical. The illustrations presented in this paper represent only a few of the possible (behavior genetics) designs in which exact data simulation may prove useful.
Other models for which exact data simulation can be used include random-effects models, latent growth curve models, simplex models, and (hierarchical) structural models, whether or not in the context of genetics, to name just a few. Exact data simulation does not require more programming skills, or programming time, than Monte Carlo simulation, but it can save considerable time in analyzing the simulated data and calculating power, especially when one wishes to construct graphs of power vs. effect size.

In this paper, we used the Mx program to analyze the simulated data because of its built-in power calculation function. Another useful option of Mx in this context is the possibility to output individual likelihood statistics for each raw data group. This information can be used to identify the groups that contribute most to the power to detect the effects of interest. Of course, various other statistical software packages (e.g., QTDT, LISREL, MPlus, R) can also be used in combination with exact data simulation to obtain the non-centrality parameters required for power calculations.

Finally, we note that extending this method to discrete data would be very useful and seems feasible.
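The transformation in Eq. 7 can be illustrated with a short sketch. It rests on several assumptions that are not taken from the paper: the function names are hypothetical, Cholesky factors are used in place of the symmetric matrix square roots S^{-1/2} and Σ^{1/2} (either choice reproduces the target moments), the sample covariance uses the divisor N − 1, and the data form a single group with complete observations.

```python
import math
import random


def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (divisor N - 1, an assumption)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def exact_transform(rows, mu, sigma):
    """Rescale rows so their sample mean is mu and sample covariance is sigma."""
    mean, cov = mean_and_cov(rows)
    l_s, l_sigma = cholesky(cov), cholesky(sigma)
    p = len(mu)
    out = []
    for r in rows:
        # Forward substitution: solve l_s w = (r - mean), which whitens the row.
        w = []
        for i in range(p):
            acc = r[i] - mean[i] - sum(l_s[i][k] * w[k] for k in range(i))
            w.append(acc / l_s[i][i])
        # Recolor with the target factor and shift to the target mean.
        out.append([mu[i] + sum(l_sigma[i][k] * w[k] for k in range(i + 1))
                    for i in range(p)])
    return out


# Hypothetical target moments for a two-variable example.
rng = random.Random(2024)
raw = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
target_mu = [1.0, 2.0]
target_sigma = [[1.0, 0.5], [0.5, 2.0]]
z = exact_transform(raw, target_mu, target_sigma)
z_mean, z_cov = mean_and_cov(z)
```

After the transformation, `z_mean` and `z_cov` match the target values up to floating-point error, which is what distinguishes exact data simulation from ordinary Monte Carlo sampling, where the sample moments only approximate the population values.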