Michael Rowe

Trying to get better at getting better

Article: CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering

Pride, D., Cancellieri, M., & Knoth, P. (2023). CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering (arXiv:2307.04683). arXiv. http://arxiv.org/abs/2307.04683

In this paper, we present CORE-GPT, a novel question answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE. We first demonstrate that GPT3.5 and GPT4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT’s performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.
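The abstract doesn't spell out the mechanism, but the pattern it describes — retrieve real open-access articles first, then ground the answer in them so every claim carries a genuine citation — is a form of retrieval-augmented generation. A minimal sketch of that pattern might look like this; the corpus, the overlap-based scoring, and the answer assembly are illustrative stand-ins, not the authors' implementation:

```python
# Hypothetical sketch of retrieval-augmented question answering:
# rank a corpus of articles against the question, then return the
# answer's evidence together with links to the source articles.
# None of this is CORE-GPT's actual code; it only illustrates the idea.

def tokenize(text):
    return set(text.lower().split())

def retrieve(question, corpus, k=2):
    """Rank articles by simple token overlap with the question
    (a stand-in for the semantic search a real system would use)."""
    q = tokenize(question)
    scored = sorted(corpus,
                    key=lambda a: len(q & tokenize(a["abstract"])),
                    reverse=True)
    return scored[:k]

def answer_with_citations(question, corpus):
    """Return retrieved evidence plus citations for it. A real system
    would pass the evidence to the LLM as context for generation; here
    we stop at the grounding step so the citation trail is visible."""
    hits = retrieve(question, corpus)
    citations = [{"title": h["title"], "url": h["url"]} for h in hits]
    return {"question": question, "evidence": hits, "citations": citations}

# Toy two-article corpus (made-up titles and URLs, for illustration only)
corpus = [
    {"title": "Photosynthesis in C4 plants", "url": "https://example.org/1",
     "abstract": "carbon fixation pathways in c4 photosynthesis"},
    {"title": "Deep learning for protein folding", "url": "https://example.org/2",
     "abstract": "neural networks predict protein structure"},
]

result = answer_with_citations("How does c4 photosynthesis work?", corpus)
```

Because the citations come from the retrieval step rather than the model's generated text, they point at articles that actually exist — which is the whole trick.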

This is just another example of the idea that we can't take refuge in the things AI cannot do. A few months ago, academics were making fun of ChatGPT because it couldn't be trusted to produce real citations. And while that's a genuinely difficult technical problem to solve (because of the underlying architecture of language models), it's not an insurmountable one.

There are other problems with LLMs that make them suboptimal for certain kinds of interactions, but these will also be solved. Or we'll find that some questions will never be suitable for language models, and we'll adapt our expectations; we don't try to use calculators to write poems. In a few years' time there'll be 'obvious' use cases for LLMs, and it just won't make sense to use them outside that scope.

