Projects

PINGUIN
Replication Crisis in Machine Learning

Potential Identification in Elementary School

In the project “Potential Identification in Elementary School for Individual Support” (PINGUIN), we are developing a screening tool to objectively and reliably assess students’ cognitive potential and initial learning conditions at school entry. The computer-based assessment of the PINGUIN project consists of four modules: (1) cognitive potential, (2) language skills, (3) early literacy, and (4) basic mathematical competencies. For each module, tasks are selected adaptively from a comprehensive item bank. The study is conducted in small groups at school using tablets. PINGUIN is designed to identify children’s potential at an early stage, to evaluate their initial learning conditions objectively and fairly, and to enable individual support. Teachers can use the knowledge of each child’s individual strengths and weaknesses to tailor their teaching.

Work Program

For Modules 2–4, which assess basic skills, a comprehensive item bank must be developed to fully capture the constructs and to ensure a differentiated assessment of students’ performance levels. From the end of first grade, the assessment requirements are aligned with the curriculum (criterion-based comparisons). Given the considerable variation in students’ initial learning conditions at school entry, adaptive testing is recommended to make the best use of the limited 10-minute testing time. Adaptive testing, in turn, requires extensive norming data to estimate item difficulties.
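The basic logic of adaptive testing from a calibrated item bank can be sketched as follows. Under a Rasch model (a standard IRT model; the selection and estimation rules below are an illustrative simplification, not the project’s actual algorithm, and all item difficulties are made up), the most informative next item is the unused one whose difficulty lies closest to the current ability estimate, which is updated after every response:

```python
import math
import random

def p_correct(theta, b):
    """Rasch model: probability of a correct response given ability theta
    and item difficulty b (both on the same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, bank, used):
    """Pick the unused item whose difficulty is closest to the current
    ability estimate -- the most informative item under the Rasch model."""
    return min((i for i in bank if i not in used),
               key=lambda i: abs(bank[i] - theta))

def estimate_theta(responses, bank):
    """Crude maximum-likelihood estimate of ability on a grid of theta values."""
    grid = [g / 10 for g in range(-40, 41)]          # theta in [-4, 4]
    def loglik(t):
        return sum(math.log(p_correct(t, bank[i]) if r
                            else 1.0 - p_correct(t, bank[i]))
                   for i, r in responses.items())
    return max(grid, key=loglik)

random.seed(1)
bank = {f"item{i:02d}": random.uniform(-3, 3) for i in range(30)}  # fake difficulties
true_theta = 1.0                                     # simulated student ability
theta, responses = 0.0, {}
for _ in range(8):                                   # short 8-item adaptive test
    item = next_item(theta, bank, responses)
    responses[item] = random.random() < p_correct(true_theta, bank[item])
    theta = estimate_theta(responses, bank)
print(f"final ability estimate: {theta:.1f}")
```

Operational adaptive tests add further constraints on top of such a rule, such as item exposure control and content balancing.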

Both the broad content coverage and the adaptive testing require analyses based on Item Response Theory (IRT), which is widely used in educational monitoring. IRT models map results onto a common metric even when different items are administered, allowing direct comparisons of student performance within the same grade level (social comparisons). In addition, IRT modeling allows developmental progress to be tracked over time (temporal comparisons). A further focus is the consideration of contextual factors, such as socio-economic status or immigrant background, to ensure a fair assessment of student performance.
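The common-metric property can be illustrated with a Rasch model (a hypothetical sketch with made-up difficulty values, not the project’s calibration): once item difficulties have been calibrated, two students who answered completely disjoint item sets still receive ability estimates on one logit scale and can be compared directly:

```python
import math

def theta_mle(responses):
    """Grid maximum-likelihood estimate of ability under the Rasch model.
    responses: list of (item_difficulty, answered_correctly) pairs."""
    def p(t, b):
        return 1.0 / (1.0 + math.exp(-(t - b)))
    grid = [g / 20 for g in range(-80, 81)]              # theta in [-4, 4]
    def loglik(t):
        return sum(math.log(p(t, b) if r else 1.0 - p(t, b))
                   for b, r in responses)
    return max(grid, key=loglik)

# Two students answered DISJOINT item sets with (illustrative) calibrated
# difficulties; the model still places both abilities on one logit scale.
student_a = theta_mle([(-2.0, True), (-1.5, True), (-1.0, True),
                       (-0.5, False), (0.0, False)])     # easier items
student_b = theta_mle([(0.5, True), (1.0, True), (1.5, True),
                       (2.0, False), (2.5, False)])      # harder items
print(f"student A: {student_a:+.2f}  student B: {student_b:+.2f}")
```

Although the two students share no items, their estimates are directly comparable: student B solved harder items and therefore receives the higher ability estimate.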

Collaborating Partners

The project is funded by the HECTOR Foundation (project duration: 09/2024 – 08/2027). It involves researchers from the universities of Tübingen (Prof. Dr. Ulrich Trautwein, Jun.-Prof. Jessica Golle, Dr. Benjamin Goecke), Ulm (Prof. Dr. Oliver Wilhelm), Kassel (Prof. Dr. Ulrich Schroeders, Priscilla Achaa-Amankwaa, Jonas Walter), Würzburg (Dr. Darius Endlich), and Bonn (Dr. Johanna Hartung), as well as the DIPF | Leibniz Institute for Research and Information in Education (Prof. Dr. Marcus Hasselhorn, Dr. Patrick Lösche).


Facing the Replication Crisis in Machine Learning

Predictive modeling using machine learning (ML) algorithms is gaining popularity in many scientific disciplines, including medicine, epidemiology, and psychology. However, transferring complex statistical methods to areas of application outside their core domain is prone to error. As a result, initially promising results have often been based on incorrectly validated models that yielded overly optimistic predictive accuracy (e.g., in predicting the risk of suicide). Because such methodological shortcomings can have serious negative consequences for both individuals and society, some researchers warn of a “new” replication crisis in ML-based research. Previous work has largely focused on the algorithmic aspects of this crisis, ignoring the specific challenges of psychological research, such as unreliable indicators, small samples, and missing data. We propose a workflow specifically tailored to ML research in psychology, highlighting typical challenges and pitfalls. It consists of five steps: (1) conceptualization, (2) preprocessing, (3) model training, (4) validation and evaluation, and (5) interpretation and generalizability. In addition to the more technical-statistical steps, the workflow also covers the conceptual aspects that need to be addressed to successfully implement ML modeling in psychological research.
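The validation pitfall can be made concrete with a small simulation (an illustrative sketch, not part of the proposed workflow itself): selecting features on the full data set before cross-validation leaks information from the test folds and inflates accuracy even on pure noise, whereas re-selecting features inside each training fold yields an honest, typically near-chance estimate:

```python
import random
import statistics

random.seed(0)
N, P, K_SEL, FOLDS = 40, 500, 10, 5     # small sample, many noise features

# Pure-noise data: the labels carry no real signal at all.
X = [[random.gauss(0, 1) for _ in range(P)] for _ in range(N)]
y = [i % 2 for i in range(N)]

def top_features(rows, labels, k):
    """Rank features by absolute difference in class means (a crude filter)."""
    scores = []
    for j in range(P):
        m1 = statistics.mean(r[j] for r, l in zip(rows, labels) if l == 1)
        m0 = statistics.mean(r[j] for r, l in zip(rows, labels) if l == 0)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_accuracy(train_idx, test_idx, feats):
    """Nearest-centroid classifier restricted to the selected features."""
    centroids = {}
    for c in (0, 1):
        members = [X[i] for i in train_idx if y[i] == c]
        centroids[c] = [statistics.mean(r[j] for r in members) for j in feats]
    hits = 0
    for i in test_idx:
        dist = {c: sum((X[i][j] - centroids[c][k]) ** 2
                       for k, j in enumerate(feats)) for c in (0, 1)}
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(test_idx)

folds = [list(range(f, N, FOLDS)) for f in range(FOLDS)]

# Leaky: features chosen ONCE on all data -- test folds included.
leaky_feats = top_features(X, y, K_SEL)
leaky = statistics.mean(
    centroid_accuracy([i for i in range(N) if i not in te], te, leaky_feats)
    for te in folds)

# Correct: features re-selected inside every training split.
fold_accs = []
for te in folds:
    tr = [i for i in range(N) if i not in te]
    feats = top_features([X[i] for i in tr], [y[i] for i in tr], K_SEL)
    fold_accs.append(centroid_accuracy(tr, te, feats))
correct = statistics.mean(fold_accs)

print(f"leaky CV accuracy:   {leaky:.2f}")    # optimistically inflated
print(f"correct CV accuracy: {correct:.2f}")  # typically near chance (0.5)
```

Since the labels are random, any accuracy clearly above 0.5 is an artifact; the leaky pipeline produces exactly this kind of overly optimistic result, which is why all preprocessing and selection steps must happen inside the cross-validation loop.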

Work Program

As a first project, we will conduct a comprehensive systematic review of the predictive modeling literature across different psychological subdisciplines over the past decade. The goal is to provide an overview of common practices in psychological research regarding conceptualization, data preprocessing, model training and validation, generalizability claims, and open science practices. In a second project, building on the systematic review, we will identify typical pitfalls and develop a checklist that helps authors navigate the ML workflow. In addition, we will compile a brief Risk of Bias assessment for ML modeling that can be used to judge the quality of ML studies, for example when conducting a meta-analysis. In a third project, we will create an ML prediction challenge and evaluate our best-practice recommendations in an experimental setting: in one condition, we will provide no guidance or restrictions beyond the description of the prediction task, while in the other, we will provide researchers with recommendations and information on how to identify and avoid common ML modeling pitfalls. We will then test whether following the recommendations leads to more robust, transparent, and reproducible predictions. In a fourth project, we will develop an open online learning course that teaches the logic and techniques of ML modeling. All four projects will provide tools and resources to mitigate the replication crisis in ML.

Collaborating Partners

Dr. Kristin Jankowsky and Prof. Dr. Ulrich Schroeders have received a grant from the German Research Foundation (DFG) for the project “Facing the Replication Crisis in Machine Learning Modeling” as part of the DFG priority program “META-REP” (project duration: 01/2025 – 12/2027).