In large-scale epidemiological studies on endogenous sex steroids and cancer risk, direct immunoassays of circulating hormone levels have the advantage of being fast and comparatively inexpensive while requiring only small sample volumes. On the other hand, indirect assays after organic extraction and chromatographic prepurification have the advantage of reducing specific interferences and matrix effects and hence are thought to have better validity. We compared direct assays of testosterone (T, six different assays), Δ4-androstenedione (A, four assays), estrone (E(1), one assay), and 17β-estradiol (E(2), five assays) with measurements obtained by an indirect assay in a representative subset of 20 postmenopausal women who were part of a large prospective cohort study. Within-batch reproducibilities of the subject rankings by relative hormone levels were good (intraclass correlations >0.89) for all direct assays tested. Between batches, reproducibilities generally were also acceptable (r > 0.80) to good (r > 0.90) in terms of Pearson's correlations. The between-batch reproducibility in terms of intraclass correlations was systematically lower than in terms of Pearson's correlations, however, because of between-batch variations in the absolute scale of measurements. The relative validity of direct versus indirect assays in terms of the subjects' ranking by relative hormone levels was also high for most of the kits tested for T, A, and E(1) (Pearson's correlations between 0.70 and 0.89) but was high for only two of the five kits tested for E(2) (correlations of 0.86 and 0.84). On an absolute scale, mean measurement values were generally higher for direct assays than for the indirect assay and, for each hormone, varied substantially, depending on the kit used.
Overall, the results of this study show that, with careful selection, commercial kits for direct radioimmunoassays of steroid hormones in postmenopausal serum can be found that may allow a reliable estimation of relative risks in epidemiological studies. However, standardization of the absolute scale of assays remains problematic.
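The difference between Pearson's and intraclass correlations reported above can be illustrated with a minimal sketch. It assumes a one-way random-effects intraclass correlation, ICC(1,1); the abstract does not state which ICC form was used. The values are synthetic (20 hypothetical subjects measured in two batches, the second batch reading about 50% higher) and are not data from the study. The names `pearson_r`, `icc_oneway`, `batch1`, and `batch2` are illustrative only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: insensitive to shifts and rescaling between batches."""
    return float(np.corrcoef(x, y)[0, 1])


def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) for two measurements per subject.

    Assumption: this ICC form is chosen for illustration only.
    """
    data = np.column_stack([x, y]).astype(float)
    n, k = data.shape                      # n subjects, k = 2 batches
    grand_mean = data.mean()
    subject_means = data.mean(axis=1)
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


# Synthetic hormone levels (arbitrary units) for 20 hypothetical subjects.
batch1 = np.linspace(0.5, 2.5, 20)
# Second batch: same ranking, but the absolute scale is about 50% higher,
# plus a small deterministic perturbation standing in for assay noise.
batch2 = 1.5 * batch1 + 0.05 * np.sin(np.arange(20))

r = pearson_r(batch1, batch2)
icc = icc_oneway(batch1, batch2)
print(f"Pearson r = {r:.3f}, ICC(1,1) = {icc:.3f}")
```

In this sketch the subject ranking is almost perfectly preserved, so Pearson's correlation stays close to 1, while the intraclass correlation drops because it also penalizes the disagreement in absolute scale between batches.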