
Analyze your Zotero database with R

Why would you want to analyze your literature database? There are several good reasons, and you may find it useful if you have ever asked yourself one of the following questions:

  • Which journals should your local librarian add to the university bookshelves?
  • To which journal should you send your next ground-breaking manuscript?
  • Which journals are most interesting for you and should get an e-mail alert?

In all these cases you want data-driven recommendations and answers. Here is how to get them with just a few lines of R. Before starting your R console, you only have to export your library, a folder, or selected entries from Zotero to a CSV file.

library(plyr)

setwd("c:/Dropbox/workspace/Bibliothek")
ebf.bib <- read.csv("ebf-jp.csv", encoding="UTF-8")

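# my ebf-jp library: keep journal articles added more than one day
# after 2014-08-03 and count them per journal
ebf.jour.freq <- count(ebf.bib[ebf.bib$Item.Type=="journalArticle" &
  as.Date(substring(ebf.bib$Date.Added,1,10)) - as.Date("2014-08-03") > 1, ], 
  "Publication.Title")
# keep journals with more than 10 articles, most frequent first
ebf.jour <- subset(ebf.jour.freq, freq > 10)
arrange(ebf.jour, desc(freq))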

In case you are interested in what the output looks like, here are the results of my last year of reading:

  Publication.Title                            freq
1 Intelligence                                   37
2 Journal of Educational Psychology              27
3 Learning and Individual Differences            27
4 Zeitschrift für Pädagogische Psychologie       19
5 International Journal of Science Education     12
6 Educational Psychologist                       11
7 Learning and Instruction                       11
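
If you would rather not depend on plyr, the same count can be sketched in base R. The snippet below is only a minimal illustration on a made-up toy data frame: the entries ("Journal A", "Journal B") are hypothetical, and the column names Item.Type, Publication.Title and Date.Added are assumed to match the Zotero CSV export used above. The cutoff date is written as a plain comparison that is equivalent to the "more than one day after 2014-08-03" filter.

```r
# Toy stand-in for a Zotero CSV export (hypothetical entries, not real data)
toy.bib <- data.frame(
  Item.Type = c("journalArticle", "journalArticle", "book",
                "journalArticle", "journalArticle"),
  Publication.Title = c("Journal A", "Journal B", "", "Journal A", "Journal A"),
  Date.Added = c("2014-09-01 10:00:00", "2014-10-12 09:30:00",
                 "2014-11-02 14:00:00", "2015-01-20 08:15:00",
                 "2013-05-05 12:00:00"),
  stringsAsFactors = FALSE
)

# keep journal articles that were added after the cutoff date
added <- as.Date(substring(toy.bib$Date.Added, 1, 10))
keep <- toy.bib$Item.Type == "journalArticle" & added > as.Date("2014-08-04")

# count articles per journal and sort by frequency, most frequent first
toy.freq <- as.data.frame(table(Publication.Title = toy.bib$Publication.Title[keep]),
                          responseName = "freq", stringsAsFactors = FALSE)
toy.freq <- toy.freq[order(toy.freq$freq, decreasing = TRUE), ]
print(toy.freq)
```

With a real export you would replace toy.bib with the data frame returned by read.csv and add a threshold such as freq > 10 to keep only the most frequent journals.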

If you have any recommendations or examples, then drop me a line in the comment box below.

A comparison of different psychometric approaches to modeling testlet structures

Reference. Schroeders, U., Robitzsch, A. & Schipolowski, S. (2014). A comparison of different psychometric approaches to modeling testlet structures: An example with C-tests. Journal of Educational Measurement, 51, 400–418. doi:10.1111/jedm.12054

Abstract. C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties, reliability estimates, and person parameter estimates for different modeling approaches: (a) Rasch, (b) testlet, (c) partial credit, and (d) copula models. The results are complemented with findings of a simulation study in which sample size, number of testlets, and strength of residual correlations between items were systematically manipulated. Results are discussed with regard to the pivotal question whether residual dependencies between items are an artifact or part of the construct.

Example code for the different models and data file

library(TAM)

# specify working directory, read in data
setwd("c:\\temp\\")
dat <- read.table( "text308.csv" , sep=";", dec=",", header=TRUE , na.strings="" )
testlet <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6)

# Rasch model
# --------------------
mod.rasch <- tam( resp = dat[,-1] , pid = dat[,1] , 
                  control = list( snodes = 5000 , maxiter = 300 , seed = 99081 ) )

# Testlet model
# --------------------
# testlets are dimensions, assign items to Q-matrix
TT <- length(unique(testlet))
Q <- matrix( 0 , nrow=ncol(dat[,-1]) , ncol= TT + 1)

Q[,1] <- 1 # First dimension constitutes g-factor
for (tt in 1:TT){ Q[ testlet == tt , tt+1 ] <- 1 }

# In a testlet model, all dimensions are uncorrelated with each other, 
# that is, all pairwise covariances are fixed to 0, which can be 
# accomplished with the "variance.fixed" argument (one row per pair of
# dimensions: first index, second index, fixed value)
library(combinat)
variance.fixed <- cbind( t( combn( TT+1,2 ) ) , 0 )

mod.testlet <- tam( resp = dat[,-1] , pid = dat[,1] , Q = Q , 
                    control = list( snodes = 5000 , maxiter = 300 , seed = 99081 ) , 
                    variance.fixed = variance.fixed )

# Partial credit model
# --------------------
# sum the item scores within each testlet, so that every testlet becomes
# one polytomous "super-item"
testlet.ids <- sort(unique(testlet))
scores <- lapply( testlet.ids , function(l) {
	rowSums( dat[,-1][, testlet == l, drop=FALSE] )
} )
names(scores) <- paste0("testlet", testlet.ids)

dat.pcm <- as.data.frame(scores)

mod.pcm <- tam(resp= dat.pcm ,  control=list( snodes = 5000 , maxiter = 300 , seed = 99081 ) )

# Copula model
# --------------------
library(sirt)
mod.copula <- rasch.copula2( dat=dat[,-1] , itemcluster = testlet )
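
To see what the Q-matrix and the variance.fixed matrix of the testlet model look like, it can help to build them for a tiny example first. The following is a minimal sketch in base R only, assuming a hypothetical assignment of five items to two testlets (toy.testlet is made up for illustration and is not part of the data file); no model is estimated here.

```r
# Toy assignment of 5 items to 2 testlets (hypothetical, for illustration only)
toy.testlet <- c(1, 1, 2, 2, 2)
toy.TT <- length(unique(toy.testlet))

# Q-matrix: column 1 is the general factor, columns 2..TT+1 are the testlet factors
toy.Q <- matrix(0, nrow = length(toy.testlet), ncol = toy.TT + 1)
toy.Q[, 1] <- 1
for (tt in 1:toy.TT) { toy.Q[toy.testlet == tt, tt + 1] <- 1 }
print(toy.Q)

# one row per pair of dimensions: first index, second index, covariance fixed to 0
toy.fixed <- cbind(t(utils::combn(toy.TT + 1, 2)), 0)
print(toy.fixed)
```

Each item loads on the general factor and on exactly one testlet factor, and with three dimensions there are three pairs of dimensions whose covariance is fixed to zero.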