Why Situational Judgement Tests Work: Fidelity and Generalizability

Situational judgement tests (SJTs) are a popular selection tool among researchers, admissions teams, and human resources departments – and their popularity is well-deserved.

SJTs offer many advantages to admissions teams in high-stakes education programs, but the greatest is their ability to predict important performance outcomes in school and, more importantly, on the job.

Research shows that SJTs are a highly effective method of gathering useful information about candidates, but why they’re so effective is still somewhat of a mystery. Researchers have suggested a number of different reasons, with varying levels of empirical support.

What is well established is that the SJT’s strength can be partially attributed to its fidelity: the extent to which the test realistically reflects a situation that would be encountered in the workplace. Generally speaking, higher fidelity means a closer match between what the assessment tool measures and the outcomes desired on the job, which in turn tends to raise the tool’s predictive validity.
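For readers who want to see what “predictive validity” looks like in practice, it is commonly summarized as the correlation between test scores and a later outcome measure. The sketch below is only an illustration of that idea, not the method used for any particular SJT: the function name, the variable names, and all of the numbers are hypothetical and made up for the example.

```python
from math import sqrt


def pearson_r(test_scores, outcomes):
    """Pearson correlation between test scores and a later outcome measure."""
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(outcomes) / n
    # Co-variation of the two measures around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(test_scores, outcomes))
    var_x = sum((x - mean_x) ** 2 for x in test_scores)
    var_y = sum((y - mean_y) ** 2 for y in outcomes)
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up values for illustration only (not real data).
sjt_scores = [52, 61, 58, 70, 66, 75, 80, 68]            # test scores at admission
job_ratings = [3.1, 3.4, 3.0, 4.0, 3.7, 4.2, 4.5, 3.6]   # later performance ratings

r = pearson_r(sjt_scores, job_ratings)
print(f"Predictive validity (Pearson r): {r:.2f}")
```

A value closer to 1 would mean that higher test scores go along with better later outcomes; a value near 0 would mean the test tells us little about them.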

Like multiple mini-interviews (MMIs), where applicants participate in an in-person circuit of assessments that include hypothetical situations, SJTs present test-takers with a series of hypothetical scenarios that they may face on the job. Unlike MMIs, SJTs are administered on paper or on a computer, so their fidelity is not as high as that of an in-person MMI. But SJTs have a distinct advantage. While it’s impossible for most programs to offer in-person assessments to every single applicant, offering an online SJT to all applicants is not only possible, but convenient.

SJTs can vary in their degree of fidelity, though, and there are a number of modifications that can be made to increase it:

  • Using video-based questions instead of text-based questions, so the question stimuli are closer to reality
  • Having test-takers provide open-ended responses (such as short answers) rather than closed-ended responses (like multiple choice), as life is not a multiple-choice test – challenging situations in the real world don’t come with a set of possible response choices
  • Asking “what would you do?” (behavioural tendency instructions) instead of “what should you do?” (knowledge instructions). Behavioural tendency questions better target the non-cognitive domains (professional and personal characteristics) and greatly reduce the score discrepancies between test-takers from different ethnic and socioeconomic backgrounds (a common disadvantage of many assessment tools)

But fidelity isn’t the only measure of the SJT’s effectiveness. It’s important to balance fidelity with generalizability — the extent to which scores from a test can be generalized to other contexts.

Let’s say you want to hire a mechanic, so you create a simulation task with a broken-down car, where the test-taker (the potential mechanic) needs to identify the issue and plan a course of action. This is a high-fidelity simulation, as the simulation is very close to what the mechanic will actually do on the job. But it’s also a highly specific context, as the ability to fix a car may not generalize to fixing a computer or repairing a fridge.

For jobs with very specific tasks, this is less of an issue. But for professions that have a wide range of different sub-disciplines, generalizability is a greater concern.

For instance, medicine encompasses various specialties, like family medicine, research, pediatrics, surgery, public health, diagnostic radiology, and pathology. They vary in the amount of direct interaction with patients, collaboration with peers, and time spent conducting independent work. In medicine, then, we would not want to create a selection tool that is overly specific, as we want scores to be equally applicable to all the different specialties.

An SJT that mostly includes scenarios addressing patient concerns would be highly applicable for family doctors, but much less so for pathologists and radiologists, who spend most of their time in the lab or the reading room.

That’s why it’s important that selection tools balance fidelity and generalizability. We want to use tools that are representative of real-life work situations, to increase the predictive validity of the test, but we also want the information derived from the tools to be generalizable to a potentially wide range of different sub-specialties.

CASPer®, an SJT, puts applicants in situations that they will likely face at some point in school and on the job, but the situations presented vary greatly so that the scores will be equally meaningful across different programs. Regardless of the specific path that an applicant chooses, whether it’s pediatrics or neurosurgery, CASPer® effectively assesses the competencies that are important in that discipline.

Published: October 16, 2017
By: Christopher Zou, Ph.D.
Education Researcher at Altus Assessments