Objective Structured Clinical Exams

This is the first draft of the next piece of content that I’ll be publishing in my Clinical Teacher app.

Abstract

The Objective Structured Clinical Examination was introduced as an assessment method that aimed to address some of the challenges that arose with the assessment of students’ competence in clinical skills. In a traditional clinical examination there are several interacting variables that can influence the outcome, including the student, the patient, and the examiner. In the structured clinical examination, two of the variables – the patient and the examiner – are more controlled, allowing for a more objective assessment of the student’s performance.

The OSCE is a performance-based assessment that can be used in both formative and summative situations. It is a versatile, multipurpose tool for evaluating healthcare students in the clinical context, assessing competency through objective testing and direct observation. As an assessment method it is precise, objective and reproducible, which means that it allows consistent testing of students across a wide range of clinical skills. Unlike the traditional clinical exam, the OSCE can evaluate areas that are critical to the performance of healthcare professionals, such as communication skills and the ability to handle unpredictable patient behaviour. However, the OSCE is not without fault and is only as good as the team implementing it. Care should be taken not to assume that the method is in itself valid, reliable or objective. In addition, the OSCE cannot be used as a measure of all things important in medical education and should be used in conjunction with other assessment tasks.

 

Introduction and background

The OSCE was developed in an attempt to address some of the challenges with the assessment of clinical competence that were prevalent at the time (Harden, Stevenson, Wilson-Downie & Wilson, 1975). These included problems with validity, reliability, objectivity and feasibility. In the standard clinical assessment at the time, the student’s performance was assessed by two examiners who observed them with several patients. However, the patient and examiner selection meant that chance played too dominant a role in the examination, leading to variations in the outcome (ibid.). Thus there was a need for a more objective and structured approach to clinical examination. The OSCE assesses competencies that are based on objective testing through direct observation. It consists of several stations in which candidates must perform a variety of clinical tasks within a specified time period against predetermined criteria (Zayyan, 2011).

The OSCE is a method of assessment that is well-suited to formative assessment. It is a form of performance-based assessment, which means that a student must demonstrate the ability to perform a task under the direct observation of an examiner. Candidates are examined against predetermined criteria on the same or similar clinical scenarios or tasks, with marks recorded against those criteria, thus enabling recall, teaching audit and determination of standards.

 

Rationale for the OSCE

While the OSCE attempts to address issues of validity, reliability, objectivity and feasibility, it should be noted that it cannot be all things to all people. It is practically impossible to have an assessment method that satisfies all the criteria of a good test in terms of validity and reliability. For example, the OSCE cannot be used to measure aspects of students’ competence such as empathy, commitment to lifelong learning and care over time. These should be assessed with other methods. Having said that, we should discuss the four important aspects of accurate assessment that inform the implementation of the OSCE (Barman, 2005).

Validity

Validity is a measure of how well an assessment task measures what it is supposed to measure, and may be regarded as the most important factor to be considered in an assessment. For a test to have a high level of validity, it must contain a representative sample of what students are expected to have achieved. For example, if the outcome of the assessment task is to say that the student is competent in performing a procedure, then the test must actually measure the student’s ability to perform the procedure. In addition, the OSCE tests a range of skills in isolation, which does not necessarily indicate students’ ability to perform the separate tasks as an integrated whole.

Reliability

Reliability is a measure of the stability of the test results over time and across sample items. In the OSCE, reliability may be low if there are few stations and short timeframes. Other factors that influence reliability include unreliable “standardised” patients; personal scoring systems; fatigued patients, examiners and students; and noisy or disruptive assessment environments. The best way to improve the reliability of an OSCE is to have a high number of stations and to combine the outcomes with other methods of assessment.
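
One way to get a feel for how the number of stations relates to reliability is the Spearman-Brown prediction formula. This is a standard psychometric result rather than something taken from the sources cited here, and it assumes that any added stations are of similar quality to the existing ones. The sketch below is illustrative only: the function name and the starting figures (8 stations with a reliability of 0.55) are invented for the example.

```python
def predicted_reliability(current_reliability, length_factor):
    """Spearman-Brown prediction of reliability after changing test length.

    length_factor is new length / old length, e.g. 2.0 when the number
    of stations is doubled.
    """
    if not 0 <= current_reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    numerator = length_factor * current_reliability
    denominator = 1 + (length_factor - 1) * current_reliability
    return numerator / denominator


if __name__ == "__main__":
    # Invented starting point, for illustration only.
    base_stations = 8
    base_reliability = 0.55
    for stations in (8, 12, 16, 24):
        factor = stations / base_stations
        estimate = predicted_reliability(base_reliability, factor)
        print(f"{stations} stations: predicted reliability {estimate:.2f}")
```

The gain from each additional station gets smaller as the exam gets longer, which is one reason to weigh extra stations against the feasibility concerns discussed below.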

Objectivity

The objectivity of the OSCE relies on the standardisation of the stations and the checklist method of scoring student performance, which theoretically means that every student will be assessed on the same task in the same way. However, there is evidence that inter-rater reliability can be low on the OSCE as well, meaning that there is still a bias present in the method. In order to reduce the effect of this bias, the OSCE should include more stations.

Feasibility

When deciding whether or not to use the OSCE as an assessment method, i.e. whether or not it is feasible, there are a number of factors to consider. These include the number of students to be assessed, the number of examiners available, the physical space available for running the exam, and the associated cost of these factors. It is important to note that the OSCE is more time-consuming and more expensive in terms of human and material cost than other assessment methods, for example the structured oral examination. In addition, the time required for setting up the examination is greater than that needed in traditional assessment methods, which must be taken into account when making decisions about whether or not to use the OSCE.

 

Advantages of the OSCE format

The OSCE format allows for the direct observation of a student’s ability to engage with clinical ethics skills during a patient interaction. The OSCE can also be used effectively to evaluate students’ communication skills, especially if standardised instruments for assessing these skills are used. In addition, it (Shumway & Harden, 2003; Chan, 2009):

  • Provides a uniform marking scheme for examiners and consistent examination scenarios for students, including pressure from patients.
  • Generates formative feedback for both the learners and the curriculum, whereby feedback that is gathered can improve students’ competency and enhance the quality of the learning experience.
  • Allows for more students to be examined at any one time. For example, when a student is carrying out a procedure, another student who has already completed that stage may be answering the question at another station.
  • Provides for a more controlled setting because two of the variables, the patient and the examiner, are controlled.
  • Provides more insights about students’ clinical and interactive competencies.
  • Can be used to objectively assess other aspects of clinical expertise, such as physical examination skills, interpersonal skills, technical skills, problem-solving abilities, decision-making abilities, and patient treatment skills.
  • Has a positive impact on learning, because students’ attention is focused on the acquisition of clinical skills that are directly relevant to clinical performance.

 

Preparation for an OSCE

The first thing to do when considering developing an OSCE is to ask what it is to be assessed. It is important to realise that OSCEs are not appropriate for assessing all aspects of competence. For example, knowledge is best assessed with a written exam.

The venue where the OSCE is going to take place must be carefully considered, especially if it needs to be booked in advance. If there are large numbers of students, it may be worthwhile to have multiple tracks running in different venues. The advantages are that there will be less noise and fewer distractions. If space is not an issue, having separate rooms for each station is preferable, although multiple stations in a single room with partitions is also reasonable. If you will have real patients assisting, note that you will need rooms for them to rest in (Bouriscot, 2005).

Be aware that you will need to contact and confirm external examiners well in advance of running the OSCE. Clinicians are busy and will need plenty of advance warning. It may be useful to provide a grid of dates and times that are available to give examiners the option of choosing sessions that are most suitable for them (ibid.).

One of the key factors in the success of using the OSCE for assessment is the use of either real or standardised patients. This is a component that adds confidence to the reliability of the outcomes. Standardised patients are the next best thing to working with live patients. They are usually volunteers or actors who are trained in the role playing of different psychological and physiological aspects of patients. Finding and training standardised patients is a significant aspect of preparing for an OSCE (Dent & Harden, 2005).

If equipment is required, ensure that there are lists available at every station, highlighting what equipment should be present in order for the student to successfully complete the station. You should go through each station with the list the day before the OSCE to ensure that all equipment is present (Bouriscot, 2005).

Mark sheets to be used for the OSCE must be developed in advance. Each examiner at each station must be provided with an appropriate number of mark sheets for the students, including an estimation of spoilage. If there are going to be large numbers of students, it may be worthwhile developing mark sheets that can be electronically scanned. If results are to be manually entered, someone will need to ensure that they have been captured correctly (Bouriscot, 2005).

 

Developing scenarios for each station

The number of stations in an examination is dependent on a number of factors, including the number of students to be assessed, the range of skills and content areas to be covered, the time allocated to each station, the total time available for the examination and the facilities available to conduct the examination (Harden & Cairncross, 1980). Preparing the content for each station should begin well in advance so that others can review the stations and perhaps even complete a practice run before the event. It may happen that a scenario is good in theory but that logistical complications make it unrealistic to run in practice.

The following points are important to note when developing stations (Bouriscot, 2005):

  • Instructions to students must be clear so that they know exactly what is expected of them at each station
  • Similarly, instructions to examiners must also make it clear what is expected of them
  • The equipment required at each station should be identified
  • The marking schedule should identify the important aspects of the skill being assessed
  • The duration of the station should be specified

Stations should be numbered so that there is less confusion for students who are moving between them, and also for examiners who will be marking at particular stations. Note that it is recommended to have one rest station for every 40 minutes of assessment (Bouriscot, 2005). Arrows, either on the floor or on the wall, will help candidates move between stations and avoid any confusion about rotation.

While stations may be set up in any number of ways, one suggested format is for the student to rotate through two “types” of stations: a procedure station and a question station (Harden, Stevenson, Wilson-Downie & Wilson, 1975). There are two advantages to this approach. In the first place it reduces the effect of cueing, whereby the question that the student must answer is presented at the same time as the instruction for performing the procedure. The nature of the question may prompt the student towards the correct procedure. By using two stations, the candidate is presented with a problem to solve or an examination to be carried out without the questions that come later. When the student gets to the “question” station, they are then unable to go back to the previous station to change their response. Thus the questions do not provide a prompt for the examination. The second advantage of the two-station approach is that more students can be examined at any one time. While one student is performing a procedure, another student who has already completed that stage is answering the questions (ibid.).

 

Running an OSCE

It may be useful, if the venue is large, to have a map of the facility set up, including the location of specific stations. This can help determine early on which stations will be set up in which rooms, as well as determining the order of the exam. The number of available rooms will determine how many stations are possible, as well as how many tracks can be run simultaneously (and therefore how many times each track will need to be run). You will also need a space for subsequent groups of students to be sequestered while the previous round of students is finishing. If the exam is going to continue for a long time, you may need an additional room for examiners and patients to rest and eat.
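
The relationship between rooms, tracks and the number of runs is simple arithmetic, and a rough sketch may help with early planning. The function below is hypothetical and makes some simplifying assumptions that will not hold everywhere: each station occupies one room, every track has the same number of stations, and one student occupies each station at a time.

```python
import math


def plan_runs(num_students, num_rooms, stations_per_track):
    """Return (parallel_tracks, runs_per_track) for a simple OSCE layout.

    Assumes one room per station and one student per station at a time.
    """
    if min(num_students, num_rooms, stations_per_track) <= 0:
        raise ValueError("all inputs must be positive")
    # Each complete track needs a full set of rooms.
    tracks = num_rooms // stations_per_track
    if tracks == 0:
        raise ValueError("not enough rooms for a single track")
    # In one run, every station on every track holds one student.
    students_per_run = tracks * stations_per_track
    runs = math.ceil(num_students / students_per_run)
    return tracks, runs


if __name__ == "__main__":
    tracks, runs = plan_runs(num_students=60, num_rooms=25, stations_per_track=10)
    print(f"{tracks} parallel tracks, each run {runs} times")
```

Rooms left over after the tracks have been allocated (five in the example above) could be used for sequestering students or as rest areas.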

Students should be informed in advance how they will proceed from one station to another. For example, will one bell be used to signal the end of one station and the beginning of another? If the OSCE is formative in nature, or a practice round, will different buzzers be used to signal a period of feedback from the examiner? When the bell signalling the end of the station sounds, candidates usually have 1 minute to move to the next station and read the instructions before entering.
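
To estimate how long one full circuit will take, the 1-minute changeover can be combined with the guideline of one rest station for every 40 minutes of assessment. The sketch below is a rough planning aid only. It assumes that all stations are the same length, that rest stations last as long as assessed stations, and that the rest-station guideline is applied by rounding down; the function and parameter names are invented for this example.

```python
def circuit_timing(assessed_stations, station_minutes,
                   changeover_minutes=1, minutes_per_rest_station=40):
    """Return (rest_stations, total_minutes) for one student's circuit.

    Assumes equal-length stations and rest stations of the same length.
    """
    if assessed_stations <= 0 or station_minutes <= 0:
        raise ValueError("station count and duration must be positive")
    # Time spent being assessed, excluding changeovers and rest.
    assessed_minutes = assessed_stations * station_minutes
    # One rest station per full block of assessment time (rounded down).
    rest_stations = assessed_minutes // minutes_per_rest_station
    total_stations = assessed_stations + rest_stations
    # Each slot is a changeover followed by the station itself.
    total_minutes = total_stations * (station_minutes + changeover_minutes)
    return rest_stations, total_minutes


if __name__ == "__main__":
    rest, total = circuit_timing(assessed_stations=16, station_minutes=5)
    print(f"{rest} rest stations, about {total} minutes per circuit")
```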

On the day of the exam, time should be allocated for registering students, directing them to stations, setting the time, indicating station changes (buzzers, bells, etc.), and assisting with both setting up final changes and dismantling stations. Each station must have the station number and instructions posted at the entrance, and standardised patients, examiners and candidates matched to the appropriate stations. Examiners and patients should be set up at their stations sufficiently in advance of the starting time in order to review the checklists and prepare themselves adequately. It may be possible to have a dry run of the station in order to help the patient get into the role.

It is possible to use paper checklists or to capture the marks with handheld devices like iPads or smartphones (see Software later). The benefit of using digital capturing methods as opposed to paper checklists is that the data is already captured at the end of the examination, and feedback to students and the organisers can be provided more efficiently. If paper checklists are used, they must be collected at the end of the day and the data captured manually.

Some of the common challenges that are experienced during the running of the OSCE include (Bouriscot, 2005):

  • Examiners not turning up – send reminders the week before and have reserves on standby
  • Standardised patients not turning up – have reserves on standby
  • Patients not turning up – remind them the day before, provide transport, plan for more patients than are needed
  • Patient discomfort with the temperature – ensure that the venue is warmed up or cooled down before the OSCE begins
  • Incorrect / missing equipment – check the equipment the day before, have spares available in case of equipment malfunction, batteries dying, etc.
  • Patients getting ill – have medical staff on hand
  • Student getting ill – take them somewhere nearby to lie down and recover

The above list demonstrates the range of complications that can arise during an OSCE. You should expect that things will go wrong and try to anticipate them. However, you should also be aware that there will always be room for improvement, which is why attention must be paid to evaluating the process. It is essential that the process be continually refined and improved based on student and staff feedback (Frantz, et al., 2013).

 

Marking of the OSCE

The marking scheme for the OSCE is intentionally and objectively designed. It must be concise, well-focused and unambiguous, with the aim of discriminating between good and poor student performance. The marking scheme must therefore be cognisant of many possible choices and provide scores that are appropriate to each student performance (Zayyan, 2011).

The allocation of marks between the different parts of the examination should be determined in advance and will vary with, among other things, the seniority of the students. Thus, with junior students there will be more emphasis on their technique and fewer marks will be awarded for the findings and their interpretation (Harden, Stevenson, Wilson-Downie & Wilson, 1975).

The following example marking rubric for OSCE stations is taken from Chan (2009):

 

Diagnosis
  • Excellent: Able to give an excellent analysis and understanding on the patients’ problems and situations and applied medical knowledge to the clinical practice and determined the appropriate treatment.
  • Proficient: Able to demonstrate medical knowledge with a satisfactory analysis on the patients’ problems, and determined the appropriate treatment.
  • Average: Showed a basic analysis and knowledge on the patients’ problems, still provided the appropriate treatment.
  • Poor: Only able to show minimal level of analysis and knowledge on the patients’ problems, unable to provide the appropriate treatment.

Problem-solving skills
  • Excellent: Able to manage the time to suggest and bring out appropriate solutions to problems; more than one solutions were provided; logical approach to seek for solutions was observed.
  • Proficient: Able to manage the time to bring out only one solution; logical flow was still observed but there was a lack of relevance of the flow.
  • Average: Still able to bring out one solution on time; logical flow was hardly observed.
  • Poor: Failed to bring out any solution in specific time; logical flow was not observed.

Communication and interaction
  • Excellent: Able to get detail information needed for diagnosis; gave very clear and detail explanation and answers to patients; paid attention to patients’ responses and words.
  • Proficient: Able to get detail information needed for diagnosis; gave clear explanation and answers to patients; attempted but only paid some attention to patients’ responses and words.
  • Average: Only able to get basic information needed for diagnosis; attempted to give a clear explanation to patients but omitted some points; did not pay attention to patients’ responses and words.
  • Poor: Failed to get information for diagnosis; gave ambiguous explanation to patients.

Clinical skills
  • Excellent: Perfectly performed the appropriate clinical procedures for every clinical tasks with no omission; no unnecessary procedure was done.
  • Proficient: Performed the required clinical procedures satisfactorily; committed a few minor mistakes or unnecessary procedure which did not affect the overall completion of the procedure.
  • Average: Performed the clinical procedures at an acceptable standard; committed some mistakes and some unnecessary procedures were done.
  • Poor: Failed to carry out the necessary clinical procedures; committed lots of mistakes and misconception about operating clinical apparatus.

 

Common mistakes made by students during the OSCE

It may be helpful to guide students before the examination by helping them to understand what the OSCE is not (Medical Council of Canada, n.d.).

  • Not reading the instructions carefully – The student must elicit from the “patient” only the precise information that the question requires. Any additional or irrelevant information provided must not receive a mark.
  • Asking too many questions – Avoid asking too many questions, especially if the questions are disorganised and erratic, and seem aimed at hopefully stumbling across the few appropriate questions that are relevant to the task. The short period of time is designed to test candidates’ ability to elicit the most appropriate information from the patient.
  • Misinterpreting the instructions – This happens when candidates try to determine what the station is trying to test, rather than working through a clinically appropriate approach to the patient’s presenting complaint.
  • Using too many directed questions – Open-ended questions are helpful in this regard as they give the patient the opportunity to share more detailed information, while still leaving space for you to follow up with more directed questions.
  • Not listening to patients – Patients often report that candidates did not listen appropriately and therefore missed important information that was provided during the interview. In the case of using standardised patients, they may be trained to respond to an apparently indifferent candidate by withdrawing and providing less information.
  • Not explaining what you are doing in physical examination stations – The candidates may not explain what they are doing during the examination, leaving the examiner guessing as to what was intended, or whether the candidate observed a particular finding. By explaining what you see, hear and intend doing, you provide the examiner with context that helps them in scoring you appropriately.
  • Not providing enough direction in management stations – At stations that aim to assess management skills, candidates should give clear and specific directions rather than leaving the management plan vague.
  • Missing the urgency of a patient problem – When the station is designed to assess clinical priorities, work through the priorities first and then come back later for additional information if this was not elicited earlier.
  • Talking too much – The time that the candidate spends with their patient should be used effectively in order to obtain the most relevant information. Candidates should avoid showing off with their vast knowledge base. Speak to the patient with courtesy and respect, eliciting relevant information.
  • Giving generic information – The candidate should avoid giving generic information that is of little value to the patient when it comes to making an informed decision.

 

Challenges with the OSCE

While the OSCE has many positive aspects, it should be noted that there are also many challenges when it comes to setting up and running one. The main critique of the OSCE is that it is very resource intensive, but there are other disadvantages that include (Barman, 2005; Chan, 2009):

  • Requiring a lot of organisation. However, an argument can also be made that the increased preparation time occurs before the exam and allows for examiners’ time to be used more efficiently.
  • Being expensive in terms of manpower, resources and time.
  • Discouraging students from looking at the patient as a whole.
  • Examining a narrow range of knowledge and skills and not testing history-taking competency properly. Students only examine a number of different patients in isolation at each station instead of comprehensively examining a single patient.
  • Relying on manual scoring of stations, which is time-consuming and increases the probability of mistakes.
  • Making it nearly impossible to have children as standardised patients, or to have patients with similar physical findings.

In addition, while being able to take a comprehensive history is an essential clinical skill, the time constraints necessary in an OSCE preclude this from being assessed. Similarly, because students’ skills are assessed in sections, it is difficult to make decisions regarding students’ ability to assess and manage patients holistically (Barman, 2005). Even if one were able to construct stations that assessed all aspects of clinical skills, it would only test those aspects in isolation rather than comprehensively integrating them all into a single demonstration. Linked to that, the OSCE also has a potentially negative impact on students’ learning because it contains multiple stations that sample isolated aspects of clinical medicine. The student may therefore prepare for the examination by compartmentalising the skills and not completely understanding the connection between them (Shumway & Harden, 2003). There also seems to be some evidence that while the OSCE is an appropriate method of assessment in undergraduate medical education, it is less well-suited for assessing the in-depth knowledge and skills of postgraduate students (Patil, 1993).

Challenges with reliability in the clinical examination may arise from the fact that different students are assessed on different patients, and one may come across a temperamental patient who helps some students while obstructing others. In addition, test scores may not reflect students’ actual ability, as repetitive demands may fatigue the student, patient or examiner. Students’ fatigue due to lengthy OSCEs may affect their performance. Moreover, some students experience greater tension before and during the OSCE than with other assessment methods. In spite of efforts to control patient and examiner variability, inaccuracies in judgment due to these effects remain (Barman, 2005).

 

Software for managing an OSCE

There is an increasing range of software that assists with setting up and running an OSCE. These services often run on a variety of mobile devices, offering portability and ease of use for examiners. One of the primary benefits of using digital, instead of paper, scoring sheets is that the results are instantly available for analysis and for reporting to students. Examples of some of the available software include OSCE Online, OSCE Manager and eOSCE.

Ten OSCE pearls

The following list is taken from Dent & Harden (2005), and includes lessons learned from practical experiences of running OSCEs.

  1. Make all stations the same length, since rotating students through the stations means that you can’t have some students finishing before others.
  2. Linked stations require preparation. For example, if station 2 requires the student to follow up on what was done at station 1, then no student can begin at station 2. This means that a staggered start is required. In this case, one student would begin the exam before everyone else. Then, when the main exam begins, the student at station 1 will move to station 2. This student will finish one station before everyone else.
  3. Prepare additional standardised patients, and have additional examiners available to allow for unpredictable events detaining either one.
  4. Have backup equipment in case any of the exam equipment fails.
  5. Have staff available during the examination to maintain security and help students move between stations, especially those who are nervous at the beginning.
  6. If there is a missing student, move a sign labelled “missing student” to each station as the exam progresses. This will help avoid confusion when other students move into the unoccupied station by mistake.
  7. Remind students to remain in the exam room until the buzzer signals the end of the station, even if they have completed their task. This avoids having students standing around in the areas between rooms.
  8. Maintain exam security, especially when running the exam multiple times in series. Ensure that the first group of students are kept away from the second group.
  9. Make sure that the person keeping time and sounding the buzzer is well-prepared, as they have the potential to cause serious confusion among examiners and students. In addition, ensure that the buzzer can be heard throughout the exam venue.
  10. If the rotation has been compromised and people are confused, stop the exam before trying to sort out the problem. If a student has somehow missed a station, rather allow them the opportunity to return at the end and complete it then.
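
The staggered start described in the second pearl can be hard to picture, so here is a minimal sketch of one way to generate such a rotation. It assumes that station 2 follows on from station 1, that there are as many students as stations, and that everyone moves on by one station each round. The function name and student labels are made up for illustration.

```python
def staggered_rotation(num_stations):
    """Build a rotation in which station 2 follows on from station 1.

    Returns a list with one dict per round, mapping station number to
    student label. Index 0 is the early round with a single student.
    """
    if num_stations < 2:
        raise ValueError("need at least two stations for a linked pair")
    students = [f"student {i}" for i in range(1, num_stations + 1)]
    # The student who would otherwise start at station 2 begins early,
    # alone, at station 1.
    early = students[1]
    rounds = [{1: early}]
    for r in range(num_stations):
        occupancy = {}
        for i, student in enumerate(students):
            if student == early and r == num_stations - 1:
                # The early starter has already completed every station.
                continue
            station = (i + r) % num_stations + 1
            occupancy[station] = student
        rounds.append(occupancy)
    return rounds


if __name__ == "__main__":
    for number, occupancy in enumerate(staggered_rotation(4)):
        label = "early round" if number == 0 else f"round {number}"
        print(label, dict(sorted(occupancy.items())))
```

In this layout every student reaches station 1 before station 2, and station 1 stands empty in the final round because the early starter has already finished.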

 

Take home points

  • The OSCE aims to improve the validity, reliability, objectivity and feasibility of assessing clinical competence in undergraduate medical students
  • The method is not without its challenges, which include the fact that it is resource intensive and therefore expensive
  • Factors which can play a role in reducing confidence in the test results include student, examiner and patient fatigue.
  • The best way to limit the influence of factors that negatively impact on the OSCE is to have a high number of stations.
  • Being well-prepared for the examination is the best way to ensure that it runs without problems. However, even when you are well-prepared, expect there to be challenges.
  • The following suggestions are presented to ensure a well-run OSCE:
    • Set an exam blueprint
    • Develop the station cases with checklists and rating scales
    • Recruit and train examiners
    • Recruit and train standardised patients
    • Plan space and equipment needs
    • Identify budgetary requirements
    • Prepare for last-minute emergencies

 

Conclusion

The use of the OSCE format for clinical examination has been shown to demonstrate improvements in reliability and validity of the assessment, allowing examiners to say with more confidence that students are proficient in the competencies that are tested. While OSCEs are considered to be fairer than other types of practical assessment, they do require significant investment in terms of finance, time and effort. However, these disadvantages are offset by the improvement in objectivity that emerges as a result of the approach.

 

Bibliography

Categories
assessment curriculum education research teaching

SAFRI 2011 (session 2) – day 4

Reliability and validity

Validity

Important for assessment, not only for research

It’s the scores that are valid and reliable, not the instrument

Sometimes the whole is greater than the sum of the parts e.g. when a student gets all the check marks but doesn’t perform competently overall e.g. the examiner can tick each competency being assessed but the student doesn’t establish rapport with the patient. Difficult to address

What does the score mean?

Students are efficient in the use of their time i.e. they will study what is being assessed because the inference is that we’re assessing what is important

Validity can be framed as an “argument / defense” proposition

Our Ethics exam is a problem of validity. Written tests measure knowledge, not behaviour e.g. students can know and report exactly what informed consent is and how to go about getting it, but may not pay it any attention in practice. How do we make the Ethics assessment more valid?

“Face” validity doesn’t exist, it’s more accurately termed “content” validity. “Face” validity basically amounts to saying that something looks OK

What are the important things to score? Who determines what is important?

There are some things that standardised patients can’t do well e.g. trauma

Assessment should sample more broadly from a domain. This improves validity and also students don’t feel like they’ve wasted their time studying things that aren’t assessed. The more assessment items we include, the more valid the results

Scores drop if timing of assessment is inappropriate e.g. too much or too little time → lower scores as students either rush or try to “fill” the time with something that isn’t appropriate for the assessment

First round scores in OSCEs are often lower than later rounds

Even though the assessment is meant to indicate competence, there’s actually no way to predict if practitioners are actually competent

Students really do want to learn!

Reliability

We want to ensure that a student’s observed score is a reasonable reflection of their “true ability”

In reliability assessments, how do you reduce the learning that occurs between assessments?

In OSCEs, use as many cases / stations as you can, and have a different assessor for each station. This is the most effective rating design

We did a long session on standard setting, which was fascinating especially when it came to having to defend the cut-scores of exams i.e. what criteria do we use to say that 50% (or 60 or 70) is the pass mark? What data do we have to defend that standard?

Didn’t even realise that this was something to be considered, good to know that methods exist to use data to substantiate decisions made with regards to standards that are set (e.g. Angoff Method)
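
As a rough illustration of how data can back up a cut-score, here is a minimal sketch of one common form of the Angoff method. Each judge estimates the probability that a borderline (minimally competent) student would get each item right, the estimates are averaged per item, and the item averages are summed to give the pass mark. This is not necessarily the variant used in the session, and the judges’ ratings below are invented.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return the cut-score from a table of judges' estimates.

    ratings[j][i] is judge j's estimate (between 0 and 1) of the chance
    that a borderline student gets item i right.
    """
    if not ratings:
        raise ValueError("at least one judge is required")
    num_items = len(ratings[0])
    if any(len(judge) != num_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    # Average the judges' estimates for each item, then sum over items.
    item_means = [mean(judge[i] for judge in ratings) for i in range(num_items)]
    return sum(item_means)


if __name__ == "__main__":
    # Invented ratings: three judges, five checklist items.
    judges = [
        [0.8, 0.6, 0.5, 0.9, 0.7],
        [0.7, 0.5, 0.6, 0.8, 0.6],
        [0.9, 0.6, 0.4, 0.9, 0.8],
    ]
    cut = angoff_cut_score(judges)
    print(f"Cut-score: {cut:.2f} out of {len(judges[0])} items")
```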

Should students be able to compensate for poor scores in one area, with good scores in another. Should they have to pass every section that we identify as being important? If it’s not important, why is it being assessed?

Norm-referenced criteria are not particularly useful to determine competence. Standards should be set according to competence, not according to the performance of others

Standard setting panels shouldn’t give input on the quality of the assessment items

You can use standard setting to lower the pass mark in a difficult assessment, and to raise the pass mark in an easier exam

Alignment of expectations with actual performance

Setting up an OSCE

  • Design
  • Evaluate
  • Logistics

Standardised, compartmentalised (i.e. not holistic), variables removed / controlled, predetermined standards, variety of methods

Competencies broken into components

Is at the “shows how” part of Miller’s pyramid (Miller, 1990, The assessment of clinical skills, Academic Medicine, 65; S63-S67)

Design an OSCE, using the following guidelines:

  • Summative assessment for undergraduate students
  • Communication skill
  • Objective
  • Instructions (student, examiner, standardised patient)
  • Score sheet
  • Equipment list

Criticise the OSCE stations of another group

 

Assessing clinical performance

Looked at using mini-CEX (clinical evaluation exercise)

Useful for formative assessment

Avoid making judgements too soon → your impression may change over time