Reliability and validity
Validity
Important for assessment, not only for research
It’s the scores that are valid and reliable, not the instrument
Sometimes the whole is greater than the sum of the parts: a student can get all the check marks without performing competently overall. For example, the examiner can tick each competency being assessed even though the student doesn’t establish rapport with the patient. This is difficult to address
What does the score mean?
Students are efficient in the use of their time i.e. they will study what is being assessed because the inference is that we’re assessing what is important
Validity can be framed as an “argument / defense” proposition
Our Ethics exam is a problem of validity. Written tests measure knowledge, not behaviour e.g. students can know and report exactly what informed consent is and how to go about getting it, but may not pay it any attention in practice. How do we make the Ethics assessment more valid?
“Face” validity doesn’t exist; it’s more accurately termed “content” validity. “Face” validity basically amounts to saying that something looks OK
What are the important things to score? Who determines what is important?
There are some things that standardised patients can’t do well e.g. trauma
Assessment should sample more broadly from a domain. This improves validity and also students don’t feel like they’ve wasted their time studying things that aren’t assessed. The more assessment items we include, the more valid the results
Scores drop if the timing of the assessment is inappropriate e.g. too much or too little time → lower scores, as students either rush or try to “fill” the time with something that isn’t appropriate for the assessment
First round scores in OSCEs are often lower than later rounds
Even though the assessment is meant to indicate competence, there’s no way to predict if practitioners are actually competent
Students really do want to learn!
Reliability
We want to ensure that a student’s observed score is a reasonable reflection of their “true ability”
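One common way to formalise the link between observed score and “true ability” is classical test theory, which these notes don’t cover explicitly. The symbols below (X for the observed score, T for the true score, E for measurement error) are assumptions for illustration, and the second expression assumes the error is uncorrelated with the true score:

```latex
X = T + E, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Read this way, reliability is the share of the variation in observed scores that comes from real differences between students rather than from error.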
In reliability assessments, how do you reduce the learning that occurs between assessments?
In OSCEs, use as many cases / stations as you can, and have a different assessor for each station. This is the most effective rating design
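A rough way to see why more stations help is the Spearman-Brown prediction formula, a standard result that these notes don’t mention. The sketch below assumes any added stations are comparable to the existing ones; the function name and the example reliability of 0.5 are made up for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length by `length_factor`.

    `reliability` is the current reliability (0 to 1) and `length_factor`
    is new length / old length, e.g. 2.0 when doubling the number of stations.
    Assumes the added stations behave like the existing ones.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a 5-station OSCE with reliability 0.5.
current = 0.5
doubled = spearman_brown(current, 2)     # 10 stations
quadrupled = spearman_brown(current, 4)  # 20 stations
print(round(doubled, 2), round(quadrupled, 2))
```

The gain from each extra station shrinks as the exam gets longer, so the formula is a planning aid rather than a guarantee.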
We did a long session on standard setting, which was fascinating especially when it came to having to defend the cut-scores of exams i.e. what criteria do we use to say that 50% (or 60 or 70) is the pass mark? What data do we have to defend that standard?
Didn’t even realise that this was something to be considered. Good to know that methods exist to use data to substantiate decisions about the standards that are set (e.g. Angoff Method)
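As a minimal sketch of how an Angoff-style cut score can be worked out: each judge estimates, for every item, the probability that a borderline (minimally competent) student would get it right, and the estimates are averaged and summed. The function name and the ratings are invented for illustration, and real panels usually add discussion rounds and performance data that this leaves out.

```python
from statistics import mean


def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Return the cut score in raw marks (one mark per item).

    ratings[j][i] is judge j's estimate of the probability that a
    borderline student answers item i correctly.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    # Average the judges' estimates per item, then sum across items.
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means)


# Hypothetical panel: 3 judges rating a 4-item test.
ratings = [
    [0.6, 0.8, 0.5, 0.7],
    [0.5, 0.9, 0.4, 0.6],
    [0.7, 0.7, 0.6, 0.8],
]
cut = angoff_cut_score(ratings)
cut_percent = 100 * cut / len(ratings[0])
print(f"cut score: {cut:.1f} of {len(ratings[0])} marks ({cut_percent:.0f}%)")
```

The pass mark then comes from the judges’ view of each item’s difficulty rather than from a fixed 50%.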
Should students be able to compensate for poor scores in one area with good scores in another? Should they have to pass every section that we identify as being important? If it’s not important, why is it being assessed?
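The two options can be compared directly. The sketch below contrasts a compensatory rule (the average across sections must reach one overall cut) with a conjunctive rule (every section must reach its own cut). The section names, scores and cut scores are hypothetical.

```python
from statistics import mean


def passes_compensatory(scores: dict[str, float], overall_cut: float) -> bool:
    """Pass if the mean across sections reaches the overall cut."""
    return mean(scores.values()) >= overall_cut


def passes_conjunctive(scores: dict[str, float], section_cuts: dict[str, float]) -> bool:
    """Pass only if every section reaches its own cut score."""
    return all(scores[name] >= cut for name, cut in section_cuts.items())


# Hypothetical student: strong on two sections, weak on communication.
scores = {"history": 80, "examination": 75, "communication": 40}
section_cuts = {"history": 50, "examination": 50, "communication": 50}

compensatory_result = passes_compensatory(scores, overall_cut=60)
conjunctive_result = passes_conjunctive(scores, section_cuts)
print(compensatory_result, conjunctive_result)
```

Here the student passes under the compensatory rule (mean of 65) but fails under the conjunctive rule because of the communication section, which is exactly the case the questions above are asking about.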
Norm-referenced criteria are not particularly useful to determine competence. Standards should be set according to competence, not according to the performance of others
Standard setting panels shouldn’t give input on the quality of the assessment items
You can use standard setting to lower the pass mark in a difficult assessment, and to raise the pass mark in an easier exam
Alignment of expectations with actual performance
Setting up an OSCE
- Design
- Evaluate
- Logistics
Standardised, compartmentalised (i.e. not holistic), variables removed / controlled, predetermined standards, variety of methods
Competencies broken into components
Is at the “shows how” part of Miller’s pyramid (Miller, 1990, The assessment of clinical skills, Academic Medicine, 65; S63-S67)
Design an OSCE, using the following guidelines:
- Summative assessment for undergraduate students
- Communication skill
- Objective
- Instructions (student, examiner, standardised patient)
- Score sheet
- Equipment list
Criticise the OSCE stations of another group
Assessing clinical performance
Looked at using mini-CEX (clinical evaluation exercise)
Useful for formative assessment
Avoid making judgements too soon → your impression may change over time