Michael Rowe

Trying to get better at getting better

Use generative AI to extract information from photos of slides

I’ve shared my thoughts about taking photos of slides at conferences, and while I generally think that this isn’t very useful, I admitted that it’s something I’ve sometimes felt compelled to do. But then I usually leave those photos in the ‘I’ll get around to it someday’ folder, which basically means I’ll never look at them again.

However, in the spirit of exploring the use-cases of generative AI, I thought I’d experiment with the multimodality feature by asking if it could do the job for me.

Immediately, I could see that this was going to be very useful. ChatGPT made the obligatory noises about how it couldn’t be certain about its responses and that the output may include errors, but its analysis was excellent and made no mistakes in the transcription.

Example of one of the slides that ChatGPT was able to transcribe perfectly.

I then tried to upload multiple images (it can, and I uploaded 12 photos from a recent seminar) and learned that it can analyse the images in batches. But it wasn’t always obvious which order it analysed them in. So, after trying a few different versions, here’s the prompt I used:

Please extract the text from the attached photos of slides. In your response, clearly distinguish the content between slides, giving each slide a summary sentence describing the content. Where there are images in the slide, note this in your response. Please use sentence case for your output.


Share this


Discover more from Michael Rowe

Subscribe to get the latest posts to your email.