We compared 5 statistics (i.e., G index, gamma, d′, sensitivity, specificity) used in the social sciences and medical diagnosis literatures to assess calibration accuracy, both to examine the relationships among them and to explore whether any one statistic provided the best-fitting general measure of accuracy. College undergraduates completed separate 15-item vocabulary, probability, and paper-folding tests, answering each test item and then indicating whether they had answered it correctly. We computed scores for each of the 5 calibration statistics using the same raw scores for each test and compared 3 theoretical models: 1-, 2-, and 3-factor confirmatory factor analysis solutions. Results supported the 3-factor model over the 1-factor and 2-factor models with respect to goodness-of-fit indices and the fewest estimated parameters. The 3-factor solution was consistent with the hypothesis that the 5 individual calibration scores are related to 2 different types of 2nd-order processes (i.e., accuracy of judgments about correct and incorrect performance), as measured by sensitivity and specificity, which are subsumed under a general 3rd-order discrimination process, as measured by d′. Implications for a theory of calibration accuracy and for measurement practice are discussed.
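As an illustration of how the 5 statistics can be derived from the same raw scores, the sketch below computes each one from a single 2 × 2 table that crosses item performance (correct vs. incorrect) with the accuracy judgment (judged correct vs. judged incorrect). It uses common textbook definitions of each statistic, which may differ in detail from the formulas used in the study. The function name `calibration_scores`, the cell labels `a` through `d`, and the example counts are hypothetical, and the log-linear correction applied to d′ (adding 0.5 to each cell so that rates of 0 or 1 remain finite) is an assumption of this sketch rather than a procedure described above.

```python
from math import nan
from statistics import NormalDist


def calibration_scores(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Return five calibration statistics for one 2 x 2 table.

    Hypothetical cell labels (not taken from the abstract):
      a = item correct,   judged correct    (hit)
      b = item incorrect, judged correct    (false alarm)
      c = item correct,   judged incorrect  (miss)
      d = item incorrect, judged incorrect  (correct rejection)
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    n = a + b + c + d
    if n == 0:
        raise ValueError("at least one item is required")

    # Sensitivity: proportion of correct items that were judged correct.
    sensitivity = a / (a + c) if (a + c) else nan
    # Specificity: proportion of incorrect items that were judged incorrect.
    specificity = d / (b + d) if (b + d) else nan

    # G index: accurate judgments minus inaccurate judgments, over all items.
    g_index = ((a + d) - (b + c)) / n

    # Gamma for a 2 x 2 table: (ad - bc) / (ad + bc); undefined if ad + bc == 0.
    cross_sum = a * d + b * c
    gamma = (a * d - b * c) / cross_sum if cross_sum else nan

    # d-prime: z(hit rate) - z(false-alarm rate).
    # ASSUMPTION: a log-linear correction (+0.5 per cell) keeps both rates
    # strictly between 0 and 1, so the inverse normal CDF is always defined.
    z = NormalDist().inv_cdf
    hit_rate = (a + 0.5) / (a + c + 1)
    false_alarm_rate = (b + 0.5) / (b + d + 1)
    d_prime = z(hit_rate) - z(false_alarm_rate)

    return {
        "g_index": g_index,
        "gamma": gamma,
        "d_prime": d_prime,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Made-up counts for one respondent on one 15-item test.
    for name, value in calibration_scores(a=8, b=2, c=1, d=4).items():
        print(f"{name}: {value:.3f}")
```

Because all 5 scores are functions of the same four cell counts, they are necessarily interrelated, which is one reason a factor-analytic comparison of their structure is informative.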