Molmo is an open source multimodal AI model

Oct 7, 2024

in AI

Allen Institute, Molmo, multimodal AI model, multimodality, OLMo

Molmo is a family of open source, multimodal AI models built by the Allen Institute for AI.

In Ai2’s own words: “Our most powerful model closes the gap between open and proprietary systems across a wide range of academic benchmarks as well as human evaluations. Our smaller models outperform models 10x their size.”

For now, the demo is focused on images and voice interaction. You can upload a picture and then ask follow-up questions, but you don’t seem to be able to interact with it using text only; you have to start with an image.

You can also have the model speak its response back to you, which sounds pretty good but would benefit from more audio controls. At the moment, the only thing you can change is the speed of the audio response.

I’m a fan of open source in general, and of open source language models in particular. I’m excited to see what else Ai2 creates in the future (I’ve posted before about their Semantic Reader, which I quite like).
