Introduction

The field of computational chemistry, particularly as applied to drug design, has become increasingly important in the practical application of predictive modeling to pharmaceutical research and development. Tools that exploit protein structures, or sets of ligands known to bind particular targets, can be used for binding-mode prediction, virtual screening, and quantitative prediction of activity. A serious weakness of the field is the lack of standards for statistical evaluation of methods, data set preparation, and data set sharing. Our goal should be to report new methods, and comparative evaluations of methods, in a manner that supports decision making for practical applications. In this editorial we propose a modest beginning: recommendations on requirements for statistical reporting, requirements for data sharing, and best practices for benchmark preparation and use.

Data sharing

The issues

Recommendations on data sharing

Benchmark data sets should be made publicly available in a usable form.

Preparation of datasets

The issues

Docking

Pose prediction

Cognate docking

Cross docking

Virtual screening

Scoring

Affinity prediction tests can be done in the absence of any affinity data on related analogs. However, to date, successful predictions made without prior affinity information have been so anecdotal and so untransferable that the field seems willing to accept any input of prior structural information. Hence, inclusion of information about the protein's disposition upon binding that would not be available in an operational setting is considered acceptable.

Ligand-based modeling

Pose prediction

Virtual screening

Scoring

The descriptions of test case construction above involve different degrees of challenge, in proportion to the amount of information provided to a method. A problem often encountered in reviewing or reading papers is that methods are claimed to use less information about the answers than is actually the case. This is seldom intentional, no matter the provocation to believe otherwise, but rather a reflection of the difficulty of preparing a 'clean' test.

Recommendations on dataset preparation

Protein structure selection and preparation

Decoy set construction

Active ligand set construction

Ligand preparation

Parameter tuning

Even within the constraints outlined above, data set preparation and parameter selection can yield a wide range of results. This is acceptable insofar as it illuminates which choices are of most benefit to users of the different methods. However, without strong requirements for data sharing (the subject of the previous section), this benefit will be diluted. Further, without baseline requirements for statistical reporting (the subject of the next section), this diversity will lead to an unacceptable degree of incomparability between different reports.

Reporting results

The issues

Pose prediction

Virtual screening

Affinity estimation

General

Recommendations for reporting results

Pose prediction

Virtual screening

Affinity estimation

General

Reporting of standard metrics should be a requirement; minimal sketches of such metrics for each task are given below.
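For pose prediction, the standard metric is the root-mean-square deviation (RMSD) between the predicted pose and the experimentally determined one, often judged against a success threshold such as 2 Å. The sketch below (in Python; the function name and data layout are illustrative) assumes both poses share a coordinate frame and have atoms in matched order, and does not handle symmetry-equivalent atom mappings.

```python
import math

def pose_rmsd(pred, ref):
    """RMSD between predicted and reference poses.

    `pred` and `ref` are equal-length lists of (x, y, z) atom coordinates
    in the same frame and in matched order; symmetry-equivalent atom
    mappings, which can lower RMSD for symmetric ligands, are not handled.
    """
    if len(pred) != len(ref):
        raise ValueError("poses must contain the same number of atoms")
    sq_dist = sum((px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
                  for (px, py, pz), (rx, ry, rz) in zip(pred, ref))
    return math.sqrt(sq_dist / len(pred))
```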
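For virtual screening, two widely reported metrics are the area under the ROC curve (the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy) and the enrichment factor at a fixed fraction of the ranked database. A minimal sketch, assuming higher scores are better and ignoring tied scores; the 1% cutoff is illustrative, and reporting enrichment at several fractions, or the full ROC curve, conveys more.

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity; `labels` are
    1 for actives and 0 for decoys. Ties in `scores` are not corrected for."""
    ranked = sorted(zip(scores, labels))  # ascending by score
    n_act = sum(labels)
    n_dec = len(labels) - n_act
    act_rank_sum = sum(rank for rank, (_, lab) in enumerate(ranked, 1) if lab)
    return (act_rank_sum - n_act * (n_act + 1) / 2) / (n_act * n_dec)

def enrichment_factor(scores, labels, fraction=0.01):
    """Active hit rate in the top `fraction` of the ranked list,
    divided by the hit rate over the whole database."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    n_top = max(1, round(fraction * len(scores)))
    top_hit_rate = sum(labels[i] for i in order[:n_top]) / n_top
    overall_hit_rate = sum(labels) / len(labels)
    return top_hit_rate / overall_hit_rate
```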
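For affinity estimation, the usual summary is the correlation between predicted and measured affinities; a rank correlation such as Kendall's tau has the advantage of being insensitive to any monotone rescaling of the score. A minimal sketch, computing Pearson's r and Kendall's tau-a (no tie correction); the choice of these two statistics is illustrative rather than prescriptive.

```python
import math

def pearson_r(pred, meas):
    """Pearson correlation between predicted and measured affinities."""
    n = len(pred)
    mp, mm = sum(pred) / n, sum(meas) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(pred, meas))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_m = sum((m - mm) ** 2 for m in meas)
    return cov / math.sqrt(var_p * var_m)

def kendall_tau(pred, meas):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(pred)
    net = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pred[i] - pred[j]) * (meas[i] - meas[j])
            net += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return net / (n * (n - 1) / 2)
```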
Conclusions

Molecular modeling is a relatively young field. As such, its growing pains include the slow development of standards. Our hope for this special issue of JCAMD is that, with the help of the arguments made in the contributed papers, the modest recommendations made here will form the kernel of standards that help us as a community both to improve the methods we develop and to reduce the disparity between reported performance and operational performance.