Categories
AI education

Why I think that AI-based grading in education is inevitable.

A few days ago I commented on an article that discusses the introduction of AI into education and why teachers shouldn’t worry about it. I also said that AI for grading was inevitable because it would be cheaper, more reliable, fairer and more valid than human beings. I got some pushback from Ben on Twitter and realised that I was making several assumptions in that post, so I’ve written this one to clarify some of what I said. I also wanted to use this post to test my assumptions around the claims I made, so it’s a bit longer than usual since I’m “thinking out loud” and trying to justify each point.

First, here are the four claims I make for why I think that AI-based assessment of things like essays is inevitable:

  • AI will be cheaper than human beings.
  • AI will be more reliable than human beings.
  • AI will be more fair than human beings.
  • AI will be more valid than human beings.

Cheaper: Over the past 60 years or so we’ve seen fairly consistent improvements in computing power, efficiency and speed at ever lower costs. Even if we assume that Moore’s Law is bottoming out, we’ll still see continued progress in cost reduction because of improvements in programming techniques, purpose-built chips and new technologies like quantum computers. This is important because, like any industry, education works on a budget. If a university can get close to the same outcomes with a significant reduction in cost, they’ll take it. Software vendors will offer an “essay grading” module that can be integrated into the institutional LMS, and the pricing will be such that universities would be crazy not to at least pilot it. And my thinking is that it’ll become very clear, very quickly, that a significant part of essay grading is really very simple for machines to do. Which brings me to the next claim…

More reliable: A lot of essay grading boils down to things that are relatively simple to programme into a system. For example, spelling is largely a problem that we’ve solved (barring regional inconsistencies) and can therefore express as a system of rules. These rules can be coded into an algorithm, which is why spell-checking works. Grammatical structure is also generally well understood, with most cultures having concepts like nouns, verbs, adjectives, etc., as well as an understanding of how these words are best positioned relative to each other to enhance readability and understanding. Whether we use prescriptive rules (“we should do this”) or descriptive rules (“we actually do this”) matters less than knowing what set of rules we’ll use for the task at hand. It seems reasonable that physiotherapy lecturers could tune an algorithm with a slider, specifying that grammatical structure is less important for their students (i.e. lower scores with respect to prescriptive rules are OK), while an English lecturer might insist that their students score higher on how words should be used. Reference formatting is also easy to check with a series of rules, as is the expectation that knowledge claims should be supported with evidence. Related to this, machines are getting better at identifying passages of text that are simply copied from a source. And I think it’s reasonable to assert that a computer can count more quickly, and more reliably, than a person. Of course this doesn’t take into account things like creativity, but I’ll get to that. For now, we should at least grant that an AI could plausibly be more reliable than a human being (i.e. it assesses the same things in the same way across multiple examples) when it comes to evaluating things like spelling, grammatical structure, essay structure, referencing, and plagiarism. And machines will do this consistently across tens of thousands of students.
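
Just to make this concrete, here’s a minimal sketch (in Python) of what a rule-based component with lecturer-tunable weights could look like. To be clear, the checker functions, vocabulary and weights below are hypothetical placeholders I’ve made up for illustration, not anything from an actual grading product:

```python
# A minimal sketch (not a real grading product) of combining rule-based
# checks with lecturer-tunable weights into a single score. The checker
# functions, vocabulary and weights are hypothetical placeholders.

import re
from typing import Callable, Dict


def spelling_score(text: str, known_words: set) -> float:
    """Fraction of words found in a known-word list (a stand-in for a real spell-checker)."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in known_words) / len(words)


def referencing_score(text: str) -> float:
    """Crude proxy: count author-year citations, assuming roughly three are expected."""
    citations = re.findall(r"\([A-Z][a-z]+,\s*\d{4}\)", text)
    return min(1.0, len(citations) / 3)


def weighted_grade(text: str,
                   checkers: Dict[str, Callable[[str], float]],
                   weights: Dict[str, float]) -> float:
    """Combine per-criterion scores (0-1) using the weights as 'sliders'."""
    total = sum(weights.values())
    return 100 * sum(weights[name] * check(text) for name, check in checkers.items()) / total


if __name__ == "__main__":
    vocab = {"the", "patient", "was", "treated", "with", "exercise",
             "therapy", "smith", "and", "improved"}
    essay = "The patient was treated with exercise therapy (Smith, 2019) and improved."

    checkers = {
        "spelling": lambda t: spelling_score(t, vocab),
        "referencing": referencing_score,
    }
    # A physiotherapy lecturer might down-weight language mechanics relative
    # to an English lecturer; the "slider" is simply the weight value.
    physio_weights = {"spelling": 0.3, "referencing": 0.7}
    english_weights = {"spelling": 0.7, "referencing": 0.3}

    print(round(weighted_grade(essay, checkers, physio_weights), 1))
    print(round(weighted_grade(essay, checkers, english_weights), 1))
```

The point is simply that the “slider” a lecturer adjusts is just a weight on each criterion; the reliability comes from the same checks being applied in the same way to every script.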

Fairer: Human beings are inherently unfair. Regardless of how fair we think we’re being, there are some variables that we simply can’t tune because we’re not even aware that they’re affecting us. There’s evidence that we’re more strict when we’re hungry or when we’re angry with a partner, and that we’re also influenced by the gender of the person we’re grading, the time of day, etc. We’re also affected by sequencing; my grading of the essays I read later is influenced by the earlier examples I’ve seen. This means that a student’s grade might be affected by where in the pile their script is lying, or by their surname if the submission is digital and sorted alphabetically. It may be literally true (warning: controversial opinion coming up) that a student’s mark is more strongly influenced by my current relationship with my children than by what they’ve actually written. Our cognitive biases make it almost impossible for human beings to be as fair as we think we are. And yes, I’m aware that biases are inherent to machine learning algorithms as well. The difference is that those kinds of biases can be seen and corrected, whereas human bias is – and is likely to remain – invisible and unaccountable.

More valid: And finally there’s the issue of validity; are we assessing what we say we’re assessing? For essays this is an important point. Essays are often written in response to a critical question and it’s easy for the assessor to lose sight of that during the grading process. Again, our biases can influence our perceptions without us even being aware of them. A student’s reference to a current political situation may score them points (availability bias) while another, equally valid reference to a story we’re not aware of wouldn’t have the same valence for the assessor. Students can tweak other variables to create a good impression on the reader, none of which are necessarily related to how well they answer the question. For example, even just taking a few minutes to present the essay in a way that’s aesthetically pleasing can influence an assessor, never mind the points received for simply following instructions on layout (e.g. margin size, line spacing, font selection, etc.). When you add things like the relationship between students and the assessor, you start to get a sense for how the person doing the grading can be influenced by many other factors besides the students’ ability to answer the essay question.

OK, so that’s why I think that the introduction of AI for grading – at least for grading essays – is inevitable. However, I’m aware that doesn’t really deal with the bulk of the concerns that Ben raised. I just wanted to provide some context and support for the initial claims I made. The rest of this post is in response to the specific concerns that Ben raised in his series of tweets. I’ve combined some of them below for easier reference.

Can we be sure [that AI-based grading of assessment] is not a bad thing? Is it necessarily fairer? Thinking about the last lot of essays I marked, the ones getting the highest grades varied significantly, representing different takes on a wide ranging and pretty open ended topic. As markers we could allow some weaknesses in an assignment that did other things extremely well and showed independence of thought. The ones getting very good but not excellent grades were possibly more consistent, they were polished and competent but didn’t make quite the same creative or critical jump.

I think I addressed the concern about fairness earlier in the post: I really do think that AI-based grading will be fairer to students. There’s also the point that the highest-graded essays tend to be quite different from one another. This is a good thing and represents the kind of open-ended response to questions that demonstrates how students can use their imagination to construct wide-ranging, unanticipated responses to difficult questions. I think that this would be addressed by the fact that AI-based systems are trained on tens of thousands of examples, all of which are labelled by human assessors. Instead of the system being narrowly constrained by the algorithm, I think that algorithms will open up the possible space of what “good” looks like. While I’m often delighted with variation and creative responses from students, not all of my colleagues feel the same way. An AI-based grading system will ensure that, if we highlight “creativity” as an attribute that we value in our assessments, individual lecturers won’t have as much power to constrain its development. And AI systems will also be able to “acknowledge” that some areas of a student’s submission are stronger than others, and will be able to grade across different criteria (for example, the output might read: ability to follow instructions – excellent; language, especially grammar – can be improved; ability to develop an argument from initial premises – good; and so on).
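
To illustrate what I mean (and this is purely hypothetical; the criteria, scores and bands are made up), per-criterion output from such a system might be structured something like this:

```python
# Purely illustrative: hypothetical criteria, scores and bands showing
# per-criterion feedback rather than a single overall mark.

from dataclasses import dataclass
from typing import List

BANDS = ["poor", "satisfactory", "good", "excellent"]


@dataclass
class CriterionResult:
    criterion: str
    score: float          # 0.0 - 1.0, e.g. from a model or a set of rules
    comment: str = ""

    def band(self) -> str:
        # Map the continuous score onto one of the feedback bands.
        index = min(int(self.score * len(BANDS)), len(BANDS) - 1)
        return BANDS[index]


def feedback_report(results: List[CriterionResult]) -> str:
    lines = []
    for r in results:
        line = f"- {r.criterion}: {r.band()}"
        if r.comment:
            line += f" ({r.comment})"
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    results = [
        CriterionResult("Following instructions", 0.95),
        CriterionResult("Language and grammar", 0.45, "can be improved"),
        CriterionResult("Argument development", 0.70),
    ]
    print(feedback_report(results))
```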

How will AI marking allow for the imaginative, creative assignments and avoid a cycle of increasingly standardised and sanitised assignments as students work out how to please the algorithm?

My first response to this is: how do we “…avoid a cycle of increasingly standardised and sanitised assignments as students work out how to please the lecturer?” And then there’s also the progress being made in “creative expressions” of AI-based systems; art (see here and here), music (see here, here, and here), and stories/poems (see here, and here). You can argue that an AI that uses human artifacts to generate new examples is simply derivative. But I’d counter by saying almost all human-generated art is similarly derivative. There are very few people who have developed unique insights that shift how we see the world at some fundamental level. You could also argue that some of these platforms aren’t yet very good. I’d suggest that they will only ever get better, and that you can’t say the same for people.

Is it fairer to aim for consistency of input/output or to allow for individual interpretations of an assignment? What, at heart is the point of assessment in higher education – consistent competence or individual critical thought?

Also who influences the algorithm? Is it on an institutional basis or wider? Is it fairer to allow for varied and localised interpretations of excellence or end up making everyone fit to one homogenous standard (we can guess which dominant cultural norms it would reflect…)

This is an excellent point and the main reason for why I think it’s incumbent on lecturers to be involved in the development of AI-based systems in education. We can’t rely on software engineers in Silicon Valley to be solely responsible for the design choices that influence how artificial intelligence should be used in education. I expand on these ideas in this book chapter (slideshow summary here).

On the whole I think that Ben has raised important questions and agree that these are valid concerns. For me, there are three main issues to highlight, which I’d summarise like so:

  1. There is a tension between creating assignments that enable open-ended (and therefore creative) student responses and those that are more closed, pushing students towards more standardised submissions. Will AI-based grading systems be able to deal with this nuance?
  2. There is a risk that students might become more concerned with gaming the system and aiming to “please the algorithm”, resulting in sanitised essays rather than imaginative and creative work. How can we avoid this “gaming the system” approach?
  3. There is a bias that’s built into machine learning which is likely to reflect the dominant cultural norms of those responsible for the system. Are we happy to have these biases influence student outcomes and if not, how will we counter them?

Looking back, I think that I’ve presented what I think are reasonable arguments for each of the points above. I may have misunderstood the concerns and I’ve definitely left out important points. But I think that this is enough for now. If you’re a university lecturer or high school teacher I think that the points raised by Ben in his tweets are great starting points for a conversation about how these systems will affect us all.

I don’t think that the introduction of AI-based essay grading will affect our ability to design open-ended assessments that enable student creativity and imagination. We’ve known for decades that rules cannot describe the complexity of human society because people – and the outcomes of interactions between people – are unknowable. And if we can’t specify in advance what these outcomes will look like, we can’t encode them in rules. But this has been the breakthrough that machine learning has brought to AI research. AI-based systems don’t attempt to have “reality” coded into them but rather learn about “reality” from massive sets of examples that are labelled by human beings. This may turn out to be the wrong approach but, for me at least, the argument for using AI in assessment is a plausible one.
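
For what it’s worth, here’s a toy sketch of that idea, assuming scikit-learn is available and using a tiny, made-up set of human-graded essays; a real system would obviously be trained on tens of thousands of labelled scripts and a far more sophisticated model:

```python
# A minimal sketch of the "learn from labelled examples" idea, assuming
# scikit-learn is available. The tiny toy dataset is purely illustrative;
# real systems are trained on tens of thousands of human-graded scripts.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Human-graded examples: (essay text, grade awarded by an assessor).
training_essays = [
    "The intervention improved outcomes, as shown by Smith (2019).",
    "Treatment good. Patient better.",
    "Evidence from several trials supports early mobilisation after surgery.",
    "I think it works because it works.",
]
training_grades = [78, 42, 81, 35]

# The model is never given explicit rules about spelling, structure or
# argumentation; it infers patterns from the labelled examples.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(training_essays, training_grades)

new_essay = "Several trials show the intervention improved patient outcomes."
predicted = model.predict([new_essay])[0]
print(f"Predicted grade: {predicted:.1f}")
```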

Categories
AI education

Comment: Teachers, the Robots Are Coming. But That’s Not a Bad Thing.

…that’s exactly why educators should not be putting their heads in the sand and hoping they never get replaced by an AI-powered robot. They need to play a big role in the development of these technologies so that whatever is produced is ethical and unbiased, improves student learning, and helps teachers spend more time inspiring students, building strong relationships with them, and focusing on the priorities that matter most. If designed with educator input, these technologies could free up teachers to do what they do best: inspire students to learn and coach them along the way.

Bushweller, K. (2020). Teachers, the Robots Are Coming. But That’s Not a Bad Thing. Education Week.

There are a few points in the article that confuse rather than clarify (for example, the conflation of robots with software) but on the whole I think this provides a useful overview of some of the main concerns around the introduction of AI-based systems in education. Personally, I’m not at all worried about having humanoid (or animal-type) physical robots coming into the classroom to take over my job.

I think that AI will be introduced into educational settings more surreptitiously, for example via the institutional LMS in the form of grading assistance, risk identification, timetabling, etc. And we’ll welcome this because it frees us from the very labour-intensive, repetitive work that we all complain about. Not only that, but grading seems to be one of the most expensive aspects (in terms of time) of a teacher’s job, and because of this we’re going to see a lot of interest in this area from governments. For example, see this project by Ofqual (the qualifications and exams regulator in England) to explore the use of AI to grade school exams.

In fact, I think that AI-based assessment is pretty much inevitable in educational contexts, given that it’ll probably be (a lot) cheaper, more reliable, fairer, and more valid than human graders.


Shameless self-promotion: I wrote a book chapter about how teachers could play a role in the development of AI-based systems in education, specifically in the areas of data collection, teaching practice, research, and policy development. Here is the full-text (preprint) and here are my slides from a seminar at the University of Cape Town where I presented an overview.

Categories
education learning

Comment: The game of school.

Schools are about learning, but it’s mostly learning how to play the game. At some level, even though we like to talk about schools as though they are about learning in some pure, liberal-arts sense, on a pragmatic level we know that what we’re really teaching students is to get done the things that they are asked to do, to get them done on time, and to get them done with as few mistakes as possible.

I think the danger comes from believing that those who by chance, genetics, temperament, family support, or cultural background find the game easier to play are actually somehow inherently better or have more human value than the other students.

The students who aren’t succeeding usually don’t have any idea that school is a game. Since we tell them it’s about learning, when they fail they then internalize the belief that they themselves are actual failures–that they are not good learners. And we tell ourselves some things to feel OK about this taking place: that some kids are smart and some are not, that the top students will always rise to the top, that their behavior is not the result of the system but that is their own fault.

Hargadon, S. (2019). The game of school. Steve Hargadon blog: The learning revolution has begun.

I thought that this was an interesting post with a few ideas that helped me to think more carefully about my own teaching. I’ve pulled out a few of the sentences from the post that really resonated with me but there are plenty more. Once you accept the idea that school (and university) is a game, it all makes a lot more sense; ranking students in leaderboards, passing and failing (as in quests or missions), levelling up, etc.

The author then goes on to present 4 hierarchical “levels” of learning that really describe frameworks or paradigms rather than any real description of learning (i.e. the categories and names of the levels in the hierarchy are, to some extent, arbitrary; it’s the descriptions in each level that count).

If I think about our own physiotherapy programme, we use all 4 “levels” interchangeably and have varying degrees of each of them scattered throughout the curriculum. However, I’d say that the bulk of our approach happens at the lowest level of Schooling, some at Training, a little at Education, and almost none at Self-regulated learning. While we pay lip service to the fact that we “offer opportunities for self-regulated learning”, what it really boils down to is that we give students reading to do outside of class time.

Categories
assessment clinical education research

Emotions and assessment: considerations for rater‐based judgements of entrustment

We identify and discuss three different interpretations of the influence of raters’ emotions during assessments: (i) emotions lead to biased decision making; (ii) emotions contribute random noise to assessment, and (iii) emotions constitute legitimate sources of information that contribute to assessment decisions. We discuss these three interpretations in terms of areas for future research and implications for assessment.

Source: Gomez‐Garibello, C. and Young, M. (2018), Emotions and assessment: considerations for rater‐based judgements of entrustment. Med Educ, 52: 254-262. doi:10.1111/medu.13476

When are we going to stop thinking that assessment – of any kind – is objective? As soon as you’re making a decision (about what question to ask, the mode of response, the weighting of the item, etc.) you’re making a subjective choice about the signal you’re sending to students about what you value. If the student considers you to be a proxy of the profession/institution, then you’re subconsciously signalling the values of the profession/institution.

If you’re interested in the topic of subjectivity in assessment, you may be interested in two of our In Beta episodes:

Categories
AI education

We Need Transparency in Algorithms, But Too Much Can Backfire

The students had also been asked what grade they thought they would get, and it turned out that levels of trust in those students whose actual grades hit or exceeded that estimate were unaffected by transparency. But people whose expectations were violated – students who received lower scores than they expected – trusted the algorithm more when they got more of an explanation of how it worked. This was interesting for two reasons: it confirmed a human tendency to apply greater scrutiny to information when expectations are violated. And it showed that the distrust that might accompany negative or disappointing results can be alleviated if people believe that the underlying process is fair.

Source: We Need Transparency in Algorithms, But Too Much Can Backfire

This article uses the example of algorithmic grading of student work to discuss issues of trust and transparency. One useful takeaway in this context is that full transparency may not be the goal; we should rather aim for medium transparency, and only in situations where students’ expectations are not met. For example, a student whose grade was lower than expected might need to be told something about how it was calculated, but when they got too much information it eroded trust in the algorithm completely. When students got the grade they expected, no transparency was needed at all, i.e. they didn’t care how the grade was calculated.

For developers of algorithms, the article also provides a short summary of what explainable AI might look like. For example, without exposing the underlying source code, which in many cases is proprietary and holds commercial value for the company, explainable AI might simply identify the relationships between inputs and outcomes, highlight possible biases, and provide guidance that may help to address potential problems in the algorithm.
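
As a rough illustration of what that “medium transparency” might look like (assuming scikit-learn and a simple linear text model; I’m not suggesting any vendor actually does it this way), a system could report which terms pushed a predicted grade up or down without exposing the model itself:

```python
# A hypothetical sketch of "medium transparency": reporting which features
# contributed most to a predicted grade, without exposing the model's
# source code or full parameters. Toy data for illustration only.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

essays = [
    "Evidence from trials supports the intervention (Smith, 2019).",
    "It works because it works.",
    "Randomised trials show improved outcomes after early mobilisation.",
    "Good treatment. Patient happy.",
]
grades = [80, 35, 82, 45]

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(essays)
model = Ridge(alpha=1.0).fit(X, grades)


def explain(essay: str, top_n: int = 5):
    """Return the terms that contributed most to this essay's predicted grade."""
    x = vectoriser.transform([essay])
    contributions = x.toarray()[0] * model.coef_   # per-term contribution
    terms = np.array(vectoriser.get_feature_names_out())
    order = np.argsort(np.abs(contributions))[::-1][:top_n]
    return [(terms[i], round(contributions[i], 2)) for i in order if contributions[i] != 0]


print(explain("Trials show the intervention improved outcomes."))
```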

Categories
education ethics students technology

Critical digital pedagogy in the classroom: Practical implementation

Update (12-02-18): You can now download the full chapter here (A critical pedagogy for online learning in physiotherapy education) and the edited collection here.

This post is inspired by the work I’ve recently done for a book chapter, as well as several articles on Hybrid Pedagogy but in particular, Adam Heidebrink-Bruno’s Syllabus as Manifesto. I’ve been wanting to make some changes to my Professional Ethics module for a while and the past few weeks have really given me a lot to think about. Critical pedagogy is an approach to teaching and learning that not only puts the student at the centre of the classroom but then helps them to figure out what to do now that they’re there. It also pushes teachers to go beyond the default configurations of classroom spaces. Critical digital pedagogy is when we use technology to do things that are difficult or impossible in those spaces without it.

One of the first things we do in each module we teach is provide students with a course overview, or syllabus. We don’t even think about it but this document might be the first bit of insight into how we define the space we’re going to occupy with our students. How much thought do we really give to the language and structure of the document? How much of it is informed by the students’ voice? I wondered what my own syllabus would look like if I took to heart Jesse Stommel’s suggestion that we “begin by trusting students”.

I wanted to find out more about where my students come from, so I created a shared Google Doc with a very basic outline of what information needed to be included in a syllabus. I asked them to begin by anonymously sharing something about themselves that they hadn’t shared with anyone else in the class before. Something that influenced who they are and how they came to be in that class. I took what they shared, edited it and created the Preamble to our course outline, describing our group and our context. I also added my own background to the document, sharing my own values, beliefs and background, as well as positioning myself and my biases up front. I wanted to let them know that, as I ask them to share something of themselves, so will I do the same.

The next thing was the learning outcomes for the module. We say that we want our students to take responsibility for their learning but we set up the entire programme without any input from them. We decide what they will learn based on the outcomes we define, as well as how it will be assessed. So for this syllabus I included the outcomes that we have to have, and then I asked the students to each define what “success” looks like in this module for them. Each student described what they wanted to achieve by the end of the year, wrote it as a learning outcome, decided on the indicators of progress they needed, and then set timelines for completion. So each of them would have the learning outcomes that the institution and professional body requires, plus one. I think that this goes some way toward acknowledging the unique context of each student, and also gives them skills in evaluating their own development towards personally meaningful goals that they set themselves.

I’ve also decided that the students will decide their own marks for these personal outcomes. At the end of the year they will evaluate their progress against the performance indicators that they have defined, and give themselves a grade that will count 10% towards their Continuous Assessment mark. This decision was inspired by this post on contract grading from HASTAC. What I’m doing isn’t exactly the same thing but it’s a similar concept in that students not only define what is important to them, but decide on the grade they earn. I’m not 100% sure how this will work in practice, but I’m leaning towards a shared document where students will do peer review on each other’s outcomes and progress. I’m interested to see what a student-led, student-graded, student-taught learning outcome looks like.

Something that is usually pretty concrete in any course is the content. But many concepts can actually be taught in a wide variety of ways and we just choose the ones that we’re most familiar with. For example the concept of justice (fairness) could be discussed using a history of the profession, resource allocation for patients, Apartheid in South Africa, public and private health systems, and so on. In the same shared document I asked students to suggest topics they’d like to cover in the module. I asked them to suggest the things that interest them, and I’d figure out how to teach concepts from professional ethics in those contexts. This is what they added: Income inequality. Segregation. #FeesMustFall. Can ethics be taught? The death penalty. Institutional racism. Losing a patient. That’s a pretty good range of topics that will enable me to cover quite a bit of the work in the module. It’s also more likely that students will engage considering that these are the things they’ve identified as being interesting.

Another area that we as teachers have completely taken over is assessment. We decide what will be assessed, when the assessment happens, how it is graded, what formats we’ll accept…we even go so far as to tell students where to put the full stops and commas in their referencing lists. That’s a pretty deep level of control we’re exerting. I’ve been using a portfolio for assessment in this module for a few years so I’m at a point where I’m comfortable with students submitting a variety of different pieces. What I’m doing differently this year is asking the students to submit each task when it’s ready rather than by some arbitrary deadline. They get to choose when it suits them to do the work, but I have asked them to be reasonable with this, mainly because if I’m going to give them decent feedback I need time before their next piece arrives. If they’re all submitted at once, there’s no time to use the feedback to improve their next submission.

The students then decided what our “rules of engagement” would be in the classroom. Our module guides usually have some kind of prescription about what behaviour is expected, so I asked the students what they thought appropriate behaviour looks like and then to commit as a class to those rules. Unsurprisingly, their suggestions looked a lot like what I would have written myself. Then I asked them to decide how to address situations when individuals contravened our rules. I don’t want to be the policeman who has to discipline students…what would it look like if students decided in advance what would work in their classroom, and then took action when necessary? I’m pretty excited to find out.

I decided that there would be no notes provided for this module, and no textbook either. I prepare the lecture outline in a shared Google document, including whatever writing assignments the students need to work on and links to open access resources that are relevant for the topic. The students take notes collaboratively in the document, which I review afterwards. I add comments and structure to their notes, and point them to additional resources. Together, we will come up with something unique describing our time together. Even if the topic is static our conversations never are, so any set of notes that focuses only on the topic is going to necessarily leave out the sometimes wonderful discussion that happens in class. This way, the students get the main ideas that are covered, but we also capture the conversation, which I can supplement afterwards.

Finally, I’ve set up a module evaluation form that is open for comment immediately and committed to having it stay open for the duration of the year. The problem with module evaluations is that we ask students to complete them at the end of the year, when they’re finished and have no opportunity to benefit from their suggestions. I wouldn’t fill it in either. This way, students get to evaluate me and the module at any time, and I get feedback that I can act on immediately. I use a simple Google Form that they can access quickly and easily, with a couple of rating scales and an option to add an open-ended comment. I’m hoping that this ongoing evaluation option in a format that is convenient for students means that they will make use of it to improve our time together.

As we worked through the document I could see students really struggling with the idea that they were being asked to contribute to the structure of the module. Even as they commented on each other’s suggestions for the module, there was an uncertainty there. It took a while for them to be comfortable saying what they wanted. Not just contributing with their physical presence in the classroom, but to really contribute in designing the module; how it would be run, how they would be assessed, how they could “be” in the classroom. I’m not sure how this is going to work out but I felt a level of enthusiasm and energy that I haven’t felt before. I felt a glimmer of something real as they started to take seriously my offer to take them seriously.

The choices above demonstrate a few very powerful additions to the other ways that we integrate technology into this module (the students’ portfolios are all on the IEP blog, they do collaborative authoring and peer review in Google Drive, course resources are shared in Drive, they do digital stories for one of the portfolio submissions, and occasionally we use Twitter for sharing interesting stories). It makes it very clear to the students that this is their classroom and these are their learning experiences. I’m a facilitator, but they get to make real choices that have a real impact in the world. They get to understand and get a sense of what it feels like to have power and authority, as well as the responsibility that comes with that.

Categories
assessment

Public posting of marks

My university has a policy where the marks for each assessment task are posted – anonymously – on the departmental notice board. I think it goes back to a time when students were not automatically notified by email and individual notifications of grades would have been too time consuming. Now that our students get their marks as soon as they are captured in the system, I asked myself why we still bother to post the marks publicly.

I can’t think of a single reason why we should. What is the benefit of posting a list of marks where students are ranked against how others performed in the assessment? It has no value – as far as I can tell – for learning. No value for self-esteem (unless you’re performing in the higher percentile). No value for the institution or teacher. So why do we still do it?

I conducted a short poll among my final year ethics students asking them if they wanted me to continue posting their marks in public. See below for their responses.

[Screenshot: poll results]

Moving forward, I will no longer post my students’ marks in public, nor will I publish class averages, unless specifically requested to do so. If I’m going to say that I’m assessing students against a set of criteria rather than against each other, I need to have my practice mirror this. How are students supposed to develop empathy when we constantly remind them that they’re in competition with each other?

Categories
assessment

Interrogating the mistakes

We tend to focus our attention on the things that students got right. This seems perfectly appropriate at first glance because we want to celebrate what they know. Their grades are reported in such a way as to highlight the number of questions answered correctly. The cut score (pass mark) is set based on what we (often arbitrarily) decide a reasonably competent student should know (there is no basis for setting 50% as the cut score, but that’s for another post). The emphasis is always on what is known rather than what is not known.

But if you think about it, getting the right answer is a bit of a dead end as far as learning is concerned. There’s nowhere to go from there. But the wrong answer opens up a whole world of possibility. If the capacity to learn and move forward sits in the spaces taken up by faulty reasoning, shouldn’t we pay more attention to the errors that students make? The mistakes give us a starting point from which to proceed with learning.

What if we changed our emphasis in the curriculum to focus attention on the things that students don’t understand? Instead of celebrating the points they scored for getting the right answer could we pay closer attention to the areas where they lost marks? And not in a negative way that makes students feel inferior or stupid. I’m talking about actually celebrating the wrong answers because it gives us a starting point and a direction to move. “You got that wrong. Great! Let’s talk about it. What was the first thing you thought when you read the question? Why did you say that? Did you consider this other option? What is the logical end point of the reasoning you used? Do you see now how your answer can’t be correct?” Imagine a conversation going like that. Imagine what it would mean for students’ ability to reflect on their thinking and practice.

We might end up with some powerful shared learning experiences as we get into students’ heads and try to understand what and how they think. The faulty reasoning that got them to the wrong answer is way more interesting than the correct reasoning that got them to the right answer. A focus on the mistakes that they make would actually help improve students’ ability to learn in the future because you’d be helping to correct their faulty reasoning.

But we don’t do this. We focus on counting up the right answers and celebrating them, which means that we deflect attention from the wrong answers. We make implicit the idea that getting the right answer is important and that getting the wrong answer is bad. But learning only happens when we interrogate the faulty reasoning that got us to the wrong answer.

Categories
assessment clinical physiotherapy

How my students do case studies in clinical practice

Our students do small case studies as part of their clinical practice rotations. The basic idea is that they need to identify a problem with their own practice; something that they want to improve. They describe the problem in the context of a case study which gives them a framework to approach the problem like a research project. In this post I’ll talk about the process we use for designing, implementing, drafting and grading these case studies.

There are a few things that I consider to be novel in the following approach:

  1. The case studies are about improving future clinical practice, and as such are linked to students’ practices i.e. what they do and how they think
  2. Students are the case study participants i.e. they are conducting research on themselves
  3. We shift the emphasis away from a narrow definition of “The Evidence” (i.e. journal articles) and encourage students to get creative ideas from other areas of practice
  4. The grading process has features that develop students’ knowledge and skills beyond “Conducting case study research in a clinical practice module”

Design

Early on in their clinical practice rotations, the students identify an aspect of that block that they want to learn more about. We discuss the kinds of questions they want to answer, both in class and by email. Once the topic and question are agreed, they do mini “literature” reviews (3-5 sources that may include academic journals, blogs, YouTube videos, Pinterest boards…whatever) to explore the problem as described by others. They also use the literature to identify possible solutions to their problems, which then get incorporated into the Method. They must also identify what “data” they will use to determine an improvement in their performance. They can use anything from personal reflections to grades to perceived level of comfort…anything that allows them to somehow say that their practice is getting better.

Implementation and drafting of early case studies

Then they try an intervention – on themselves, because this is about improving their own practice – and gather data to analyse as part of describing a change in practice or thinking.  They must also try to develop a general principle from the case study that they can apply to other clinical contexts. I give feedback on the initial questions and comment on early drafts to guide the projects and also give them the rubric that will be used to grade their work.

Examples of case studies from last semester include:

  • Exploring the impact of meditation and breathing techniques to lower stress before and during clinical exams, using heart rate as a proxy for stress – and learning that taking a moment to breathe can help with feeling more relaxed during an exam.
  • The challenges of communicating with a patient who has expressive aphasia – and learning that the commonly suggested alternatives are often 1) very slow, 2) frustrating, and 3) not very effective.
  • Testing their own visual estimation of ROM against a smartphone app – and learning that visual estimation is (surprise) pretty poor.
  • Exploring the impact of speaking to a patient in their own language on developing rapport – and learning that spending 30 minutes every day learning a few new Xhosa words made a huge difference to how likely the patient was to agree to physio.

Submission and peer grading

Students submit hard copies to me so that I can make sure all submissions are in. Then I take the hard copies to class and randomly assign 1 case study to each student. They pair up (Reviewer 1 and 2) and we go through the case studies together, using the rubric as a guide. I think out loud about each section of the rubric, explaining what I’m looking for in each section and why it’s important for clinical practice. For example, if we’re looking at the “Language” section I explain why clarity of expression is important for describing clinical presentations, and why conciseness allows them to practice conveying complex ideas quickly (useful for ward rounds and meetings). Spelling and grammar are important, as is legibility, to ensure that your work is clearly understandable to others in the team. I go through these rationales while the students are marking and giving feedback on the case studies in front of them.

Then they swap case studies and fill out another rubric for the case study that their team member has just completed. We go through the process again, and I encourage them to look for additional places to comment on the case study. Once that’s done they compare their rubrics for the two case studies in their team, explaining why certain marks and comments were given for certain sections. They don’t have to agree on the exact mark but they do have to come to consensus over whether each section of the work is “Poor”, “Satisfactory” or “Good”. Then they average their marks and submit it to me again.

I take all the case studies with their 2 sets of comments (on the rubric) and feedback (on the case study itself) and I go through them all myself. This means I can focus on more abstract feedback (e.g. appropriateness of the question, analysis, ethics, etc.) because the students have already commented on much of the structural, grammatical and content-related issues.

Outcomes of the process

For me, the following outcomes of the process are important to note:

  1. Students learn how to identify an area of their own clinical practice that they want to improve. It’s not us telling them what they’re doing wrong. If we want lifelong learning to happen, our students must know how to identify areas for improvement.
  2. They take definite steps towards achieving those improvements because the case study requires them to implement an intervention. “Learning” becomes synonymous with “doing” i.e. they must take concrete steps towards addressing the problem they identified.
  3. Students develop the skills they need to find answers to questions they have about their own practice. Students learn how to regulate their own learning.
  4. Each student gets 3 sets of feedback on their case study. It’s not just me – the external “expert” – telling them how to improve, it’s their peers as well.
  5. Students get exposed to a variety of other case studies across a spectrum of quality. The peer reviewers need to know what a “good” case study looks like in order to grade one. They learn what their next case study should look like.
  6. The marking time for 54 case studies goes down from about 10 hours (I give a lot of feedback) to about 3 hours. I don’t have to give feedback on everything because almost all of the common errors are already identified and highlighted.
  7. Students learn how I think when I’m marking their work, which helps them to make different choices for the next case study. This process allows them access to how I think about case study research in clinical practice, which means they are more likely to improve their next submission, knowing what I’m looking for.

In terms of the reliability of the peer marking and feedback, I noted the following when I reviewed the peer feedback and grades from earlier in the year:

  • 15 (28%) students’ marks went up when I compared my mark with the peer average, 7 (13%) students’ marks went up by 5% or more, and 4 (7%) students went from “Fail” to “Pass”.
  • 7 (13%) students’ marks went down, 3 (6%) by 5% or more, and 0 students went from “Pass” to “Fail”.
  • 28 (52%) students’ marks stayed the same.

The points I take from the above are that it’s really important for me to review the marks, and that I have a tendency to be more lenient with marking; more students had mark increases and only 3 students’ marks went down by what I would consider a significant amount. And finally, more than half the students didn’t get a mark change at all, which is pretty good when you think about it.


Categories
assessment

How do we choose what to assess?

Assessing content (facts) for the sake of it – for the most part – is a useless activity because it tells us almost nothing about how students can use the facts to achieve meaningful objectives. On the other hand, how do you assess students’ ability to apply what they’ve learned? The first is easy (i.e. assessing content and recall), while the second is very difficult (i.e. assessing how students work with ideas). If we’re honest with ourselves, we have a tendency to assess what is easy to assess, rather than what we should assess.

You can argue that your assessment is valid i.e. that you are, in fact, assessing what you say you’re assessing. However, even if the assessment is valid, it may not be appropriate. In other words, your assessment tasks might match your learning outcomes (i.e. they are valid) but are you questioning your outcomes to make sure that they’re the right outcomes?

Are we assessing the things that matter?