Bennett, E.M., Alpert, R. and Goldstein, A.C. (1954). Communications through limited response questioning. Public Opinion Quarterly, 18, 303-308.
Bloch, D.A. and Kraemer, H.C. (1989). 2 x 2 kappa coefficients: measures of agreement or association. Biometrics, 45, 269-287.
Brennan, R.L. and Prediger, D. (1981). Coefficient kappa: some uses, misuses and alternatives. Educational and Psychological Measurement, 41,...
Byrt, T., Bishop, J. and Carlin, J.B. (1993). Bias, prevalence and kappa. Journal of Clinical Epidemiology, 46, 423-429.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.
Cronbach, L.J., Gleser, G.C., Nanda, H. and Rajaratnam, N. (1972). The dependability of behavioral measurements. New York, NY: Wiley.
Darroch, J.N. and McCloud, P.I. (1986). Category distinguishability and observer agreement. Australian Journal of Statistics, 28, 371-388.
Dillon, W.R. and Mulani, N. (1984). A probabilistic latent class model for assessing inter-judge reliability. Multivariate Behavioral Research,...
Dunn, G. (1989). Design and analysis of reliability studies: the statistical evaluation of measurement errors. Cambridge, UK: Cambridge University...
Feinstein, A. and Cicchetti, D. (1990). High agreement but low kappa: I. The problem of two paradoxes. Journal of Clinical Epidemiology, 43,...
Fleiss, J.L., Cohen, J. and Everitt, B.S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323-327.
Graham, P. (1995). Modelling covariate effects in observer agreement studies: the case of nominal agreement. Statistics in Medicine, 14,...
Guggenmoos-Holtzmann, I. (1993). How reliable are chance-corrected measures of agreement? Statistics in Medicine, 12, 2191-2205.
Guggenmoos-Holtzmann, I. (1996). The meaning of kappa: probabilistic concepts of reliability and validity revisited. Journal of Clinical Epidemiology,...
Guggenmoos-Holtzmann, I. and Vonk, R. (1998). Kappa-like indices of observer agreement viewed from a latent class perspective. Statistics in...
Hoehler, F.K. (2000). Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. Journal of Clinical Epidemiology,...
Holley, J.W. and Guilford, J.P. (1964). A note on the G-index of agreement. Educational and Psychological Measurement, 24, 749-753.
Hsu, L.M. and Field, R. (2003). Interrater agreement measures: comments on kappa^n, Cohen's kappa, Scott's π and Aickin's α. Understanding Statistics,...
Janson, S. and Vegelius, J. (1979). On generalizations of the G-index and the phi coefficient to nominal scales. Multivariate Behavioral Research,...
Lantz, C.A. and Nebenzahl, E. (1996). Behavior and interpretation of the kappa statistic: resolution of the two paradoxes. Journal of Clinical...
Lin, L., Hedayat, A.S., Sinha, B. and Yang, M. (2002). Statistical methods in assessing agreement: models, issues and tools. Journal of the...
Martín, A. and Femia, P. (2004). Delta: a new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology,...
Maxwell, A.E. (1977). Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry, 130, 79-83.
Schuster, C. (2002). A mixture model approach to indexing rater agreement. British Journal of Mathematical and Statistical Psychology, 55,...
Schuster, C. and von Eye, A. (2001). Models for ordinal agreement data. Biometrical Journal, 43, 795-808.
Schuster, C. and Smith, D.A. (2002). Indexing systematic rater agreement with a latent-class model. Psychological Methods, 7, 384-395.
Scott, W.A. (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19, 321-325.