Yesterday I posted about a reasoning problem that’s common in language models: they can’t figure out how many R’s there are in the word ‘strawberry’.
I wasn’t aware that today OpenAI would release OpenAI o1 (codenamed ‘Strawberry’), which is aimed at addressing exactly the kind of reasoning problem I posted about yesterday.
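The irony, of course, is that the count itself is trivial to compute with ordinary code — here’s a minimal Python check (the lowercasing is just defensive, in case the input is capitalized):

```python
# Count how many times "r" appears in "strawberry".
word = "strawberry"
count = word.lower().count("r")
print(count)  # prints 3
```

What trips up a language model isn’t the arithmetic but tokenization: the model sees chunks of the word rather than individual letters, so it has to reason its way to the answer instead of simply looking.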
Strawberry / OpenAI o1 seems like a significant improvement in how ChatGPT processes the prompts we give it:
Similar to how a human may think for a long time before responding to a difficult question, o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn’t working. This process dramatically improves the model’s ability to reason.
Also worth noting, OpenAI remains on trend when it comes to choosing awful names for their products.