Meta-heuristics in short scale construction

Reference. Schroeders, U., Wilhelm, O., & Olaru, G. (2016). Meta-heuristics in short scale construction: Ant Colony Optimization and Genetic Algorithm. PloS One, 11, e0167110.

Abstract. The advent of large-scale assessment, but also the more frequent use of longitudinal and multivariate approaches to measurement in psychological, educational, and sociological research, caused an increased demand for psychometrically sound short scales. Shortening scales economizes on valuable administration time, but might result in inadequate measures because reducing an item set could: a) change the internal structure of the measure, b) result in poorer reliability and measurement precision, c) deliver measures that cannot effectively discriminate between persons on the intended ability spectrum, and d) reduce test-criterion relations. Different approaches to abbreviate measures fare differently with respect to the above-mentioned problems. Therefore, we compare the quality and efficiency of three item selection strategies to derive short scales from an existing long version: a Stepwise COnfirmatory Factor Analytical approach (SCOFA) that maximizes factor loadings and two metaheuristics, specifically an Ant Colony Optimization (ACO) with a tailored user-defined optimization function and a Genetic Algorithm (GA) with an unspecific cost-reduction function. SCOFA compiled short versions were highly reliable, but had poor validity. In contrast, both metaheuristics outperformed SCOFA and produced efficient and psychometrically sound short versions (unidimensional, reliable, sensitive, and valid). We discuss under which circumstances ACO and GA produce equivalent results and provide recommendations for conditions in which it is advisable to use a metaheuristic with an unspecific out-of-the-box optimization function.

Comment. This is my first Open Access publication, funded by the University of Bamberg. With respect to Open Material, all syntax used is published in my GitHub repository. Finally, this paper uses data from the National Educational Panel Study (NEPS): Starting Cohort 4-9th Grade, doi:10.5157/NEPS:SC4:4.0.0, that is, Open Data. Thus, a hat trick: Open Access, Open Materials, and Open Data.
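
To give a rough idea of how an ant colony search over item subsets can work, here is a minimal sketch in base R. It is not the syntax from the paper: the data are simulated, the function names (cronbach_alpha, aco_select) and tuning values (n_ants, n_iter, evaporation) are hypothetical, and Cronbach's alpha stands in for the tailored optimization function described in the abstract.

```r
# Toy ACO item selection (illustrative assumptions throughout):
# choose k of p items so that a simple criterion is maximized.
set.seed(1)
n <- 500; p <- 20; k <- 6

# Simulated continuous item scores from one latent trait
theta    <- rnorm(n)
loadings <- seq(0.2, 0.9, length.out = p)
X <- sapply(loadings, function(l) l * theta + rnorm(n, sd = sqrt(1 - l^2)))

# Stand-in criterion: Cronbach's alpha of the selected items
cronbach_alpha <- function(x) {
  m <- ncol(x)
  v <- var(x)
  (m / (m - 1)) * (1 - sum(diag(v)) / sum(v))
}

aco_select <- function(X, k, n_ants = 20, n_iter = 30, evaporation = 0.9) {
  p <- ncol(X)
  pheromone <- rep(1, p)                 # equal selection weights at the start
  best <- list(items = NULL, value = -Inf)
  for (iter in seq_len(n_iter)) {
    for (ant in seq_len(n_ants)) {
      # each ant draws k items, with probability proportional to pheromone
      items <- sample(p, k, prob = pheromone)
      value <- cronbach_alpha(X[, items])
      if (value > best$value) best <- list(items = sort(items), value = value)
    }
    # evaporate all trails, then reinforce the items of the best solution so far
    pheromone <- pheromone * evaporation
    pheromone[best$items] <- pheromone[best$items] + max(best$value, 0)
  }
  best
}

res <- aco_select(X, k)
res$items   # indices of the selected items
res$value   # criterion value of the best subset found
```

In a real application the criterion would typically combine several aspects (for example model fit, reliability, and validity) into one value, which is the role of the tailored function mentioned in the abstract.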

The influence of item sampling on sex differences in knowledge tests

Reference. Schroeders, U., Wilhelm, O., & Olaru, G. (2016). The influence of item sampling on sex differences in knowledge tests. Intelligence, 58, 22–32. doi:10.1016/j.intell.2016.06.003

Abstract. Few topics in psychology have generated as much controversy as sex differences in intelligence. For fluid intelligence, researchers emphasize the high overlap between the ability distributions of males and females, whereas research on sex differences in declarative knowledge often uncovers a male advantage. However, on the level of knowledge domains, a more nuanced picture emerged: while females perform better in health-related topics (e.g., aging, medicine), males outperform females in domains of natural sciences (e.g., engineering, physics). In this paper we show that sex differences vary substantially depending on item sampling. Analyses were based on a sample of n = 3,306 German high-school students (Grades 9 and 10) who worked on the 64 declarative knowledge items of the Berlin Test of Fluid and Crystallized Intelligence (BEFKI) assessing knowledge within three broad content domains (science, humanities, social studies). Using two strategies of item sampling — stepwise confirmatory factor analysis and ant colony optimization algorithm — we deliberately manipulate sex differences in multi-group structural equation models. Results show that sex differences considerably vary depending on the indicators drawn from the item pool. Furthermore, ant colony optimization outperforms the simple stepwise selection strategy since it can optimize several criteria simultaneously (model fit, reliability, and preset sex differences). Taken together, studies reporting sex differences in declarative knowledge fail to acknowledge item sampling issues. On a more general stance, handling item sampling hinges on profound considerations of the content of measures.

Comment. I presented the central findings of this paper at the 50th Conference of the German Society for Psychology in a symposium on "Current methods in intelligence assessment: invariance, item sampling and scoring", which I organized together with Philipp Doebler. For everyone who is interested but didn't make it to the session, here are the slides.

A comparison of different psychometric approaches to modeling testlet structures

Reference. Schroeders, U., Robitzsch, A., & Schipolowski, S. (2014). A comparison of different psychometric approaches to modeling testlet structures: an example with c-tests. Journal of Educational Measurement, 51, 400–418. doi:10.1111/jedm.12054

Abstract. C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties, reliability estimates, and person parameter estimates for different modeling approaches: (a) Rasch, (b) testlet, (c) partial credit, and (d) copula models. The results are complemented with findings of a simulation study in which sample size, number of testlets, and strength of residual correlations between items were systematically manipulated. Results are discussed with regard to the pivotal question whether residual dependencies between items are an artifact or part of the construct.

Example code for the different models and data file


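# load packages: TAM provides tam(), sirt provides rasch.copula2()
library(TAM)
library(sirt)

# specify working directory, read in data
# (first column holds the person ID, remaining columns the item responses)
dat <- read.table( "text308.csv" , sep=";", dec=",", header=TRUE , na.strings="" )
# assignment of the 20 items to the 6 testlets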
testlet <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6)

# Rasch model
# --------------------
mod.rasch <- tam( resp = dat[,-1] , pid=dat[,1], 
			control=list(maxiter=300, snodes=5000, seed = 99081) )

# Testlet model
# --------------------
# testlets are dimensions, assign items to Q-matrix
TT <- length(unique(testlet))
Q <- matrix( 0 , nrow=ncol(dat[,-1]) , ncol= TT + 1)

Q[,1] <- 1 # First dimension constitutes g-factor
for (tt in 1:TT){ Q[ testlet == tt , tt+1 ] <- 1 }

# In a testlet model, all dimensions are uncorrelated among each other, 
# that is, all pairwise correlations are set to 0, which can be 
# accomplished with the "variance.fixed" command
variance.fixed <- cbind( t( combn( TT+1,2 ) ) , 0 )

mod.testlet <- tam( resp = dat[,-1] , pid=dat[,1], Q = Q , 
				    control=list( snodes = 5000 , maxiter = 300 , seed = 99081 ) , 
					variance.fixed = variance.fixed )

# Partial credit model
# --------------------
scores <- list()
testlet.names <- NULL
dat.pcm <- NULL
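for (l in 1:max(testlet)) {
	# sum score across all items that belong to testlet l
	scores[[l]] <- rowSums( dat[,-1][, testlet == l, drop=FALSE] )
	dat.pcm <- c(dat.pcm, list(c(scores[[l]])))
	testlet.names <- append(testlet.names, paste("testlet",l, sep=""))
}

# bind the testlet sum scores column-wise: one polytomous item per testlet
dat.pcm <- do.call( cbind , dat.pcm )
colnames(dat.pcm) <- testlet.names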

mod.pcm <- tam(resp= dat.pcm ,  control=list( snodes = 5000 , maxiter = 300 , seed = 99081 ) )

# Copula model
# --------------------
mod.copula <- rasch.copula2( dat=dat[,-1] , itemcluster = testlet )
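
The Q-matrix and the variance.fixed object from the testlet model can be inspected without the data file or any package. The following base-R sketch rebuilds both from the testlet vector used above and prints their layout; the column labels are added for illustration only and are not part of the original syntax.

```r
# Same item-to-testlet assignment as in the example above
testlet <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6)
TT <- length(unique(testlet))

# One row per item; column 1 is the general factor,
# columns 2..(TT+1) are the testlet-specific dimensions
Q <- matrix(0, nrow = length(testlet), ncol = TT + 1)
Q[, 1] <- 1
for (tt in 1:TT) { Q[testlet == tt, tt + 1] <- 1 }
colnames(Q) <- c("g", paste0("testlet", 1:TT))  # illustrative labels

# One row per pair of dimensions: (dimension i, dimension j, fixed value 0)
variance.fixed <- cbind(t(combn(TT + 1, 2)), 0)

dim(Q)                # items x dimensions
colSums(Q)            # number of items loading on each dimension
table(rowSums(Q))     # each item loads on g plus exactly one testlet
head(variance.fixed)  # first few fixed covariances
nrow(variance.fixed)  # number of dimension pairs
```

Every item loads on exactly two dimensions (the general factor and its own testlet), and every off-diagonal covariance is fixed to zero, which is what makes this a testlet model rather than a freely correlated multidimensional model.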