The artificial intelligence (AI) landscape has evolved significantly from 1950 when Alan Turing first posed the question of whether machines can think. Today, AI is transforming societies and economies. It promises to generate productivity gains, improve well-being and help address global challenges, such as climate change, resource scarcity and health crises. Yet, as AI applications are adopted around the world, their use can raise questions and challenges related to human values, fairness, human determination, privacy, safety and accountability, among others. This report helps build a shared understanding of AI in the present and near-term by mapping the AI technical, economic, use case and policy landscape and identifying major public policy considerations. It is also intended to help co-ordination and consistency with discussions in other national and international fora.
OECD (n.d.). Artificial intelligence in society.
I thought that this was a decent overview of the state of AI and its potential to influence a variety of different aspects of society. Unfortunately, the PDF is behind a paywall but you can read the full text online. The Executive Summary is a handy introduction to the rest of the book.
In this study, we explored the availability and characteristics of the assisting tools for the peer-reviewing process. The aim was to provide a more comprehensive understanding of the tools available at this time, and to hint at new trends for further developments…. Considering these categories and their defining traits, a curated list of 220 software tools was completed using a crowdfunded database to identify relevant programs and ongoing trends and perspectives of tools developed and used by scholars.
This article provides a nice overview of the software tools and services that are available for authors, from the early stages of the writing process, all the way through to dissemination of your research more broadly. Along the way the authors also highlight some of the challenges and concerns with the publication process, including issues around peer review and bias.
This classification of the services is divided into the following nine categories:
Identification and social media: Researcher identity and community building within areas of practice.
Academic search engines: Literature searching, open access, organisation of sources.
Journal-abstract matchmakers: Choosing a journal based on links between their scope and the article you’re writing.
Collaborative text editors: Writing with others and enhancing the writing experience by exploring different ways to think about writing.
Data visualization and analysis tools: Matching data visualisation to purpose, and alternatives to the “2 tables, 1 figure” limitations of print publication.
Reference management: Features beyond simply keeping track of PDFs and folders; export, conversion between citation styles, cross-platform options, collaborating on citation.
Proofreading and plagiarism detection: Increasingly sophisticated writing assistants that identify issues with writing and suggest alternatives.
Data archiving: Persistent digital datasets, metadata, discoverability, DOIs, archival services.
Scientometrics and Altmetrics: Alternatives to citation and impact factor as means of evaluating influence and reach.
There’s an enormous amount of information packed into this article and I found myself with loads of tabs open as I explored different platforms and services. I spend a lot of time thinking about writing, workflow and compatibility, and this paper gave me even more to think about. If you’re fine with Word and don’t really get why anyone would need anything else, you probably don’t need to read this paper. But if you’re like me and get irritated because Word doesn’t have a “distraction free mode”, you may find yourself spending a couple of hours exploring options you didn’t know existed.
Note: I’m the editor and founder of OpenPhysio, an open-access, peer-reviewed online journal with a focus on physiotherapy education. If you’re doing interesting work in the classroom, even if you have no experience in publishing educational research, we’d like to help you share your stories.
There’s a lot going on in this article, some of which I agree with and some of which I think is not useful. For me, the takeaway is that AI-based systems in the workplace really have the potential to improve our interactions with each other, but that there will be powerful incentives to use them for surveillance of employees.
The article focuses on software that analyses the conversation between call centre agents and customers and provides on-screen guidance to the agent on how to “improve” the quality of the interaction. Using natural language processing to provide real-time feedback to call centre workers is, in my opinion, more like coaching than having an AI as “your boss”. We’re all biased, forgetful, get tired, have bad days, etc., and I think that a system that helped me to get around those issues would be useful.
The article presents this as some kind of dystopia where our decision making (or performance, or behaviour) will be subject to algorithmic manipulation. There are two things to note here:
We’re already subject to algorithmic manipulation (see Twitter, Netflix, email marketing, etc.);
Sometimes I want my performance to be optimised. When I’m running I get constant feedback on my pace, heart rate, distance, etc. all of which give me a sense of whether or not I’m working in an optimal zone for improving my cardiac fitness. Then I choose whether or not to adjust my pace, based on that real-time feedback.
Having said that, there are other aspects of the programme that move us into more problematic territory: your performance and behaviours (e.g. did you minimise the feedback and ignore it?) are reported to a supervisor, which may or may not influence your continued employment. This feels more like surveillance than coaching; employees are less likely to use the system to improve their performance, and more likely to figure out how to avoid the punishment. When the aim of the system is to improve the relationship or interaction with customers it’s easier to get behind. But when it moves into judgement, it becomes more difficult to support.
This brings me to another problematic aspect of the story: algorithms evaluating performance against a set of metrics that are undefined or invisible to the user (i.e. you don’t know what you’re being compared to) and then making decisions independently that have real-world consequences (e.g. you get fired because you’re “underperforming”). If supervisors regard the information from the system as representing some kind of ground truth and use it for their own decision making, it’s likely to have negative consequences. For example, when employees are ranked from “most productive” to “least productive” based on criteria that were easy to optimise for but which may have limited validity, and this output is simply accepted as “the truth”, then it is essentially the system making the decision rather than the supervisor.
But framing the problem as if it’s the algorithms that are the issue – “automated systems can dehumanize and unfairly punish employees” – misses the point that it’s human beings who are actually acting with agency in the real world. Unless we’re able to help people figure out how to use the information provided by algorithms, and understand that they don’t represent ground truth, we’re going to see more and more examples of people being taken out of the loop, with damaging consequences.
There’s another aspect of the story that I found worrying and it’s about the relationship between training data and user behaviour. In the example of the AI system that gives the user feedback on the quality of the conversation with a customer, the system uses different criteria to come up with an empathy score. When the agent scores low on empathy, the system suggests that they need to be more empathic. However, the way to do this is, apparently, to “mirror the customer’s mood”, which seems problematic for a few reasons:
If the customer is angry, should the agent reflect that anger back to them?
How do you determine the customer’s and agent’s moods?
Savvy employees will focus on getting higher empathy scores by using a checklist to work through the variables that the AI uses to calculate the score. But as a supervisor you don’t care about the empathy score, you care about satisfied customers. (See this earlier post about encouraging people to aim for higher scores on metrics, rather than the actual outcomes you care about).
Using AI to correct for human biases is a good thing. But as more AI enters the workplace, executives will have to resist the temptation to use it to tighten their grip on their workers and subject them to constant surveillance and analysis.
We used a machine learning approach to test the uniqueness and robustness of muscle activation patterns. Our results show that activation patterns not only vary between individuals, but are unique to each individual. Individual differences should, therefore, be considered relevant information for addressing fundamental questions about the control of movement.
Machine learning algorithms have been able to identify unique individuals based on their gait pattern for a while. Now we have this study showing that ML can identify individuals from their unique muscle activation patterns. For me the main takeaway is that technology has a level of insight into our bodies that is just going to keep getting better.
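To make the re-identification idea concrete, here is a toy sketch (my own illustration, not the method or data from the study): each hypothetical person gets a characteristic multi-channel activation “signature”, repeated trials are noisy copies of it, and a simple nearest-centroid classifier attributes unseen trials to the right individual.

```python
import math
import random

random.seed(0)

# Hypothetical setup: each person has a characteristic 8-channel
# activation "signature"; repeated trials are noisy copies of it.
N_PEOPLE, N_CHANNELS, N_TRIALS = 5, 8, 20

signatures = {p: [random.uniform(0, 1) for _ in range(N_CHANNELS)]
              for p in range(N_PEOPLE)}

def trial(person, noise=0.05):
    """One noisy recording of a person's activation pattern."""
    return [x + random.gauss(0, noise) for x in signatures[person]]

# "Train": average several trials per person to estimate each signature.
centroids = {p: [sum(channel) / N_TRIALS
                 for channel in zip(*[trial(p) for _ in range(N_TRIALS)])]
             for p in range(N_PEOPLE)}

def identify(pattern):
    """Nearest-centroid classification: whose signature is closest?"""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda p: dist(pattern, centroids[p]))

# Evaluate on new, unseen trials.
correct = sum(identify(trial(p)) == p
              for p in range(N_PEOPLE) for _ in range(50))
accuracy = correct / (N_PEOPLE * 50)
```

Even this crude setup re-identifies individuals almost perfectly, because the between-person differences in the signatures dwarf the trial-to-trial noise; the point of the study is that real muscle activation patterns behave the same way.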
As much as we may think that our observations, palpation, and special tests give us useful information to integrate into patient management, it’s not even close to the level of detail we can get from machines. I’m fairly convinced that pretty soon we’ll start seeing studies exploring which aspects of physiotherapy assessment are more accurate when conducted by algorithms.
I’m reading the collection of responses to John Brockman’s 2015 Edge.org question: What to think about machines that think and wanted to share an idea highlighted by Peter Norvig in his short essay called “Design machines to deal with the world’s complexity”.
Pessimists warn that we don’t know how to safely and reliably build large, complex AI systems. They have a valid point. We also don’t know how to safely and reliably build large, complex non-AI systems. We need to do better at predicting, controlling, and mitigating the unintended consequences of the systems we build.
For example, we invented the internal combustion engine 150 years ago, and in many ways it has served humanity well, but it has also led to widespread pollution, political instability over access to oil, more than a million traffic deaths per year, and (some say) a deterioration in the social cohesiveness of neighborhoods.
Norvig, P. (2015). Design machines to deal with the world’s complexity. In, Brockman, J. What to think about machines that think.
There’s a lot of justified concern about how we’re going to use AI in society in general, and healthcare in particular, but I think it’s important to point out that it does us no good to blame algorithms as if they had any agency (I’m talking about narrow, or weak AI, rather than artificial general intelligence, which will almost certainly have agency).
It’s human beings who will make choices about how this technology is used and, as with previous decisions, it’s likely that those choices will have unintended consequences. The next time you read a headline decrying the dangers presented by AI, take a moment to reflect on the dangers presented by human beings.
You can see the entirety of Norvig’s contribution here (all of the Edge.org responses are public), although note that the book chapters have different titles to the original contributions.
Across a variety of medical decisions ranging from prevention to diagnosis to treatment, we document a robust reluctance to use medical care delivered by AI providers rather than comparable human providers.
Whereas much is known about medical AI’s accuracy, cost-efficiency, and scalability, little is known about patients’ receptivity to medical AI. Yet patients are the ultimate consumers of medical AI, and will determine its adoption and implementation both directly and indirectly.
This is a long paper analysing 9 studies that look at patient preferences when comparing health services that are either automated or provided by human beings. I think it’s an important article that covers a wide range of factors that need to be considered in the context of clinical AI. We’re spending a lot of money on research and development into AI-based interventions but we know almost nothing about how patients will engage with it.
Note: This is a nice idea for a study looking at patient preferences in rehabilitation contexts where we’re likely to see the introduction of robots, for example. I’d be interested to know if there are any differences across geography, culture, etc. Let me know if you’re keen to collaborate.
Stuart Russell’s newest work, Human Compatible: Artificial Intelligence and the Problem of Control, is a cornerstone piece, alongside Superintelligence and Life 3.0, that articulates the civilization-scale problem we face of aligning machine intelligence with human goals and values. Not only is this a further articulation and development of the AI alignment problem, but Stuart also proposes a novel solution which brings us to a better understanding of what it will take to create beneficial machine intelligence.
It’s really hard to specify in advance what we mean when we say “human values” because it’s something that’s likely to be different depending on which humans we ask. This is a significant problem in health systems when clinical AI will increasingly make decisions that affect patient outcomes, considering all the points within that system where ethical judgement influences the choices being made. For example:
Micro: What is the likely prognosis for this patient? Do we keep them in the expensive ICU considering that the likelihood of survival is 37%, or do we move them onto the ward? Or send them home for palliative care? These all have cost implications that are weighted differently depending on the confidence we have in the predicted prognosis.
Macro: How are national health budgets developed? Do we invest more in infrastructure that is high impact (saves lives, usually in younger patients) but which touches relatively few people, or in services (like physiotherapy) that help many more patients improve quality of life but who may be unlikely to contribute to the state’s revenue base?
In the context of tool AI it’s relatively simple to specify what the utility function should be. In other words, we can be quite confident that we can simply tell the system what the goal is and then reward it when it achieves that goal. As Russell says, “this works when machines are stupid.” If the AI gets the goal wrong it’s not a big deal because we can reset it and then try to figure out where the mistake happened. Over time we can keep iterating until the goal achieved by the system starts to approximate the goal we care about.
But at some point we’re going to move towards clinical AI that makes a decision and then acts on it, which is where we need to have a lot more trust that the system is making the “right choice”. In this context, “right” means a choice that’s aligned with human values. For example, we may decide that in certain contexts the cost of an intervention shouldn’t be considered (because it’s the outcome we care about and not the expense), whereas in other contexts we really do want to say that certain interventions are too expensive relative to the expected outcomes.
Since we can’t specify up front what the “correct” decision in certain kinds of ethical scenarios should be (because the answer is almost always, “it depends”) we need to make sure that clinical AI really is aligned with what we care about. But, if we can’t use formal rules to determine how AI should integrate human values into its decision-making then how do we move towards a point where we can trust the decisions – and actions – taken by machines?
Russell suggests that, rather than begin with the premise that the AI has perfect knowledge of the world and of our preferences, we could begin with an AI that knows something about our preferences in context but doesn’t fully understand them. The AI model then has only imperfect or partial knowledge of the objective, which means that it can never be certain of whether it has achieved it. This may lead to situations where the AI must first check in with a human being, because it never knows what the full objective is or whether it has been achieved.
Instead of building AI that is convinced of the correctness of its knowledge and actions, Russell suggests that we build doubt into our AI-based systems. Considering the high value of doubt in good decision-making, this is probably a good idea.
…when I started reading about AI safety, I was great at finding reasons to dismiss it. I had already decided that AI couldn’t be a big danger because that just sounded bizarre, and then I searched for ways to justify my unconcern, quite certain that I would find one soon enough.
I thought I’d provide some light reading for the weekend by sharing this post about the extinction risks presented by our drive to build Artificial General Intelligence (AGI). Even if you don’t think that AGI is a thing we need to care about you should read this anyway; if nothing else you’ll get some insight into what concerns people about the long-term future of humanity.
The post I link to above presents arguments in support of the following claims and then provides short responses to common rebuttals:
Humans will eventually make a human-level intelligence that pursues goals.
That intelligence will quickly surpass human-level intelligence.
At that point, it will be very hard to keep it from connecting to the Internet.
Most goals, when pursued efficiently by an AI connected to the Internet, result in the extinction of biological life.
Most goals that preserve human existence still would not preserve freedom, autonomy, and a number of other things we value.
It is profoundly difficult to give an AI a goal such that it would preserve the things we care about, we can’t even check if a potential goal would be safe, and we have to get AI right on the first attempt.
If someone makes human-level-AI before anyone makes human-level-AI-with-a-safe-goal-structure, we will all die, and as hard as the former is, the latter is much harder.
To be honest, I find the argument that an Artificial General Intelligence poses a significant risk to humanity to be plausible and compelling.
Lately I’ve been thinking about metrics and all the ways that they can be misleading. Don’t get me wrong; I think that measuring is important. Measuring is the reason that our buildings and bridges don’t collapse. Measurements help tell us when a drug is working. GPS would be impossible without precise measurements of time. My Fitbit tells me when I’m exercising close to my maximum heart rate. So I’m definitely a fan of measuring things.
The problem is when we try to use measurements for things that aren’t easy to measure. For example, it’s hard to know when an article we publish has had an impact, so we look at the number of times that other researchers have used our articles as proxy indicators for their influence on the thinking of others. But this ignores the number of times that the articles are used to change a programme or trigger a new line of thinking in someone who isn’t publishing themselves. Or we use the number of articles being published in a department as a measure of “how much” science that department is doing. But this prioritises quantity over quality and ignores the fact that what we really want is a better understanding of the world, not “more publications”.
It sometimes feels like academia is just a weird version of Klout where we’re all trying to get better at increasing our “engagement” scores and we’ve forgotten the purpose of the exercise. We’ve confused achieving better scores on the metric with working to move the larger project forward. We publish articles because articles are evidence that we’re doing research, and we use article citations and journal impact factors as evidence that our work is influential. But when a metric becomes a target it ceases to be a good metric.
We see similar things happening all around us in higher education. We use percentages and scores to measure learning, even though we know that these numbers in themselves are subjective and sometimes arbitrary. We set targets in departments that ostensibly help us know when we’ve achieved an objective but we’re only mildly confident that the behaviours we’re measuring will help achieve the objective. For example, you have to be in the office for a certain number of hours each week so that we know that you’re working. But I don’t really care how often you’re in your office; I only really care about the quality of the work you do. But it’s hard to measure the quality of the work you do so I measure the thing that’s easy to measure.
This isn’t to say that we shouldn’t try to measure what we value, only that measurement is hard and that the metrics we choose will influence our behaviour. If I notice that people at work don’t seem to like each other very much I might start using some kind of likeability index that aims to score everyone. But then we’d see people trying to increase their scores on the index rather than simply being kinder to each other. What I care about is that we treat each other well, not how well we each score on a metric.
We’ve set up the system so that students – and teachers – care more about the score achieved on the assessment rather than learning or critical thinking or collaborating. We give students page limits for writing tasks because we don’t want them to write everything in the hope that some of what they write is what we’re looking for. But then they play around with different variables (margin and font sizes, line spacing, title pages, etc.) in order to hit the page limit. What we really care about are other things, for example the ability to answer a question clearly and concisely, from a novel perspective, and to support claims about the world with good arguments.
I don’t have any solutions to the problem of measurement in higher education and academia. It’s a hard problem. I’m just thinking out loud about the fact that our behaviours are driven by what we’ve chosen to measure, and I’m wondering if maybe it’s time to start using different metrics as a way to be more intentional about achieving what we say we care about. Maybe it doesn’t even matter what the metrics are. Maybe what matters is how the choice of metrics can change certain kinds of behaviours.