
Comment: There’s a new obstacle to getting a job after college: Getting approved by AI

Companies may not be ready to outsource vetting candidates for C-Suite and executive positions to algorithms, but the stakes are lower for entry-level roles and internships. That means some of today’s college students are effectively the guinea pigs for a largely unproven mechanism for evaluating applicants.

Metz, R. (2019). There’s a new obstacle to getting a job after college: Getting approved by AI. CNN Business.

I agree with the concern that we don’t have a good idea of how well these algorithms will work when it comes to narrowing the field of potential interviewees for a post. However, I think that it can’t be any worse than what currently happens.

We already know that unstructured interviews by human beings are completely unreliable predictors of future performance (structured interviews seem to work better, but the improvements in validity are marginal…better than chance, but not by much). What if we find out that AI is at least reliable? At first glance, the idea that an AI-based system will screen candidates to narrow the pool of applicants seems unfair, but we already know that being screened and interviewed by a human being is also unfair. So a human interview panel is likely to be both invalid and unreliable, whereas a computer might at least be reliable. I also suspect the AI will be a better predictor of performance than human beings, because it will probably be less likely to be influenced by irrelevant factors.

For me, this seems to be another example of having different expectations for outcomes, where an AI has to be perfect but a human being gets a pass. Self-driving cars are the same; they have to demonstrate near-perfect reliability, whereas human drivers are responsible for the preventable deaths of tens of thousands of people every year.


SAFRI 2011 (session 2) – day 4

Reliability and validity

Validity

Important for assessment, not only for research

It’s the scores that are valid and reliable, not the instrument

Sometimes the whole is greater than the sum of the parts, e.g. a student gets all the check marks but doesn’t perform competently overall: the examiner can tick off each competency being assessed, yet the student doesn’t establish rapport with the patient. This is difficult to address

What does the score mean?

Students are efficient in the use of their time i.e. they will study what is being assessed because the inference is that we’re assessing what is important

Validity can be framed as an “argument / defense” proposition

Our Ethics exam is a problem of validity. Written tests measure knowledge, not behaviour e.g. students can know and report exactly what informed consent is and how to go about getting it, but may not pay it any attention in practice. How do we make the Ethics assessment more valid?

“Face” validity doesn’t exist; it’s more accurately termed “content” validity. “Face” validity basically amounts to saying that something looks OK

What are the important things to score? Who determines what is important?

There are some things that standardised patients can’t do well e.g. trauma

Assessment should sample more broadly from a domain. This improves validity and also means students don’t feel like they’ve wasted their time studying things that aren’t assessed. The more assessment items we include, the more valid the results

Scores drop if the timing of an assessment is inappropriate, e.g. too much or too little time leads to lower scores because students either rush or try to “fill” the time with something that isn’t appropriate for the assessment

First-round scores in OSCEs are often lower than scores in later rounds

Even though the assessment is meant to indicate competence, there’s no real way to predict whether practitioners are actually competent in practice

Students really do want to learn!

Reliability

We want to ensure that a student’s observed score is a reasonable reflection of their “true ability”
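
A quick way to make this concrete is classical test theory, where each observed score is modelled as the student’s true score plus random measurement error, and reliability is the proportion of observed-score variance that comes from true scores. Here is a minimal simulation sketch; the sample size, score distribution and error size are invented purely for illustration:

```python
import numpy as np

# Classical test theory sketch: observed score = true score + random error.
# Reliability = proportion of observed-score variance that is true-score variance.
# All numbers below are invented for illustration.
rng = np.random.default_rng(0)

n_students = 1000
true_ability = rng.normal(loc=60, scale=10, size=n_students)  # hypothetical "true" scores
error_sd = 8                                                   # assumed measurement error

# Two parallel forms of the same exam, each adding fresh random error
form_a = true_ability + rng.normal(0, error_sd, n_students)
form_b = true_ability + rng.normal(0, error_sd, n_students)

variance_based = true_ability.var() / form_a.var()            # var(true) / var(observed)
parallel_forms = np.corrcoef(form_a, form_b)[0, 1]            # correlation between forms

print(f"variance-based reliability ≈ {variance_based:.2f}")
print(f"parallel-forms correlation ≈ {parallel_forms:.2f}")
```

With an assumed error SD of 8 against a true-score SD of 10, both estimates come out at around 0.6, i.e. well short of the observed score being a faithful reflection of “true ability”.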

In reliability assessments, how do you reduce the learning that occurs between assessments?

In OSCEs, use as many cases / stations as you can, and have a different assessor for each station. This is the most effective rating design
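
The “as many stations as you can” advice is essentially the test-length effect on reliability, which is usually quantified with the Spearman-Brown prophecy formula. A minimal sketch, with a single-station reliability of 0.25 assumed only for illustration:

```python
def spearman_brown(single_unit_reliability: float, k: float) -> float:
    """Predicted reliability when a test is lengthened by a factor of k."""
    return (k * single_unit_reliability) / (1 + (k - 1) * single_unit_reliability)

# Assume (purely for illustration) that one OSCE station on its own
# yields a score with a reliability of 0.25.
single_station = 0.25
for n_stations in (1, 5, 10, 15, 20):
    print(f"{n_stations:2d} stations -> predicted reliability "
          f"{spearman_brown(single_station, n_stations):.2f}")
```

On those assumed numbers, ten stations get you to roughly 0.77 and twenty to roughly 0.87, which is why broad sampling across stations (and assessors) matters so much.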

We did a long session on standard setting, which was fascinating, especially when it came to having to defend the cut-scores of exams, i.e. what criteria do we use to say that 50% (or 60 or 70) is the pass mark? What data do we have to defend that standard?

I didn’t even realise that this was something to be considered; it’s good to know that methods exist to use data to substantiate decisions about the standards that are set (e.g. the Angoff method)
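
For reference, the core arithmetic of the Angoff method is straightforward: each panellist estimates, for every item, the probability that a borderline (minimally competent) candidate would answer it correctly, and the cut score is derived from those judgments. A minimal sketch with invented panel ratings:

```python
# Angoff standard setting, reduced to its core arithmetic.
# The panel ratings below are invented for illustration: each judge estimates,
# per item, the probability that a borderline candidate answers correctly.
panel_ratings = {
    "judge_1": [0.6, 0.8, 0.4, 0.7, 0.9],
    "judge_2": [0.5, 0.7, 0.5, 0.6, 0.8],
    "judge_3": [0.7, 0.9, 0.3, 0.6, 0.85],
}

n_items = len(next(iter(panel_ratings.values())))

# Each judge's summed probabilities = the raw score they expect from a
# borderline candidate; the cut score is the mean across judges.
per_judge_cut = [sum(ratings) for ratings in panel_ratings.values()]
cut_score = sum(per_judge_cut) / len(per_judge_cut)

print(f"cut score ≈ {cut_score:.2f} out of {n_items} ({100 * cut_score / n_items:.0f}%)")
```

In practice there is usually also discussion between rating rounds and a check against actual performance data before the cut score is adopted; the point is simply that the pass mark is derived from judgments about a borderline candidate rather than defaulting to 50%.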

Should students be able to compensate for poor scores in one area with good scores in another? Should they have to pass every section that we identify as being important? If it’s not important, why is it being assessed?

Norm-referenced criteria are not particularly useful for determining competence. Standards should be set according to competence, not according to the performance of others

Standard setting panels shouldn’t give input on the quality of the assessment items

You can use standard setting to lower the pass mark in a difficult assessment, and to raise the pass mark in an easier exam

Alignment of expectations with actual performance

Setting up an OSCE

  • Design
  • Evaluate
  • Logistics

Standardised, compartmentalised (i.e. not holistic), variables removed / controlled, predetermined standards, variety of methods

Competencies broken into components

Is at the “shows how” level of Miller’s pyramid (Miller, 1990, The assessment of clinical skills, Academic Medicine, 65: S63–S67)

Design an OSCE, using the following guidelines:

  • Summative assessment for undergraduate students
  • Communication skill
  • Objective
  • Instructions (student, examiner, standardised patient)
  • Score sheet
  • Equipment list

Criticise the OSCE stations of another group

 

Assessing clinical performance

Looked at using mini-CEX (clinical evaluation exercise)

Useful for formative assessment

Avoid making judgements too soon → your impression may change over time

 


Assessment in an outcomes-based curriculum

I attended a seminar / short course on campus yesterday, presented by Prof. Chrissie Boughey from Rhodes University. She spoke about the role of assessment in curriculum development and the link between teaching and assessing. Here are the notes I took.

Assessment is the most important factor in improving learning because we get back what we test. Therefore assessment is acknowledged as a driver of the quality of learning.

Currently, most assessment tasks encourage the reproduction of content, whereas we should rather be looking for the production of new knowledge (analyse, evaluate and create, the top levels of Bloom’s taxonomy of cognitive processes).

Practical exercise: Pick a course / module / subject you currently teach (Professional Ethics for Physiotherapists), think about how you assess it (Assignment, Test, Self-study, Guided reflection, Written exam) and finally, what you think you’re assessing (Critical thinking / Analysis around ethical dilemmas in healthcare, Application of theory to clinical practice). I went on to identify the following problems with assessment in the current module:

  • I have difficulty assigning a quantitative grade to what is generally a qualitative concept
  • There is little scope in the current assessment structure for a creative approach

This led to a discussion about formal university structures that determine things like how subjects will be assessed, as well as the regimes of teaching and learning (“we do it this way because this is the way it’s always been done”). Do they remove your autonomy? It made me wonder what our university’s official assessment policy is.

Construct validity: Are we using assessment to assess something other than what we say we’re assessing? If so, what are we actually assessing?

There was also a question about whether or not we could / should assess only what’s been formally covered in class. How do you / should you assess knowledge that is self-taught? We could, for example, measure the process of learning rather than the product. I made the point that in certain areas of what I teach, I no longer assign a grade to an individual piece of work and rather give a mark for the progress that the student has made, based on feedback and group discussion in that area.

Outcomes-based assessment / criterion-referenced assessment

  1. Uses the principle of ALIGNMENT (aligning learning outcomes, passing criteria, assessment)
  2. Is assessing what students should be able to do
  3. “Design down” is possible when you have standardised exit level outcomes (we do, prescribed by the HPCSA)
  4. The actual criteria are able to be observed and are not a guess at a mental process, “this is what I need to see in order to know that the student can do it”
  5. Choosing the assessment tasks answers the question “How will I provide opportunities for students to demonstrate what I need to see?” When this is the starting point, it knocks everything else out of alignment
  6. You need space for students / teachers to engage with the course content and to negotiate meaning or understanding of the course requirements, “Where can they demonstrate competence?”

Criteria are negotiable and form the basis of assessment. They should be public, which makes educators accountable.

When designing outcomes, the process should be fluid and dynamic.

Had an interesting conversation about the privileged place of writing in assessment. What about other expressions of competence? Since speech is the primary form of communication (we learn to speak before we learn to write), we find it easier to convey ideas through conversation, as it includes other cues that we use to construct meaning. Writing is a more difficult form because we lack visual (and other) cues. Drafting is one way that constructing meaning through writing could be made easier. The other point I thought was interesting was that academic writing is communal (drafting, editors, reviewers all provide a feedback mechanism that isn’t as fluid as speech, but is helpful nonetheless), but we often don’t allow students to write communally.

Outcomes-based assessment focusses on providing students with multiple opportunities to practise what they need to do, and on providing feedback on that practice (formative). Eventually, students must demonstrate achievement (summative).

We should only assign marks when we evaluate performance against the course outcomes.

Finally, in thinking about the written exam as a form of assessment, we identified these characteristics:

  • It is isolated and individual
  • There is a time constraint
  • There is pressure to pass or fail

None of these characteristics are present in general physiotherapy practice. We can always ask a colleague / go to the literature for assistance. There is no constraint to have the patient fully rehabilitated by any set time, and there are no pass or fail criteria.

If assessment is a method we use to determine competence to perform a given task, and the way we assess isn’t related to the task physio students will one day perform, are we assessing them appropriately?

Note: the practical outcomes of this session will include the following:

  • Changing the final assessment of the Ethics module from a written exam to a portfolio presentation
  • Rewriting the learning outcomes of the module descriptors at this year’s planning meeting
  • Evaluating the criteria I use to mark my assignments to better reflect the module outcomes