I was asked the other day to recommend an image creation platform for a colleague, and had to say that this isn’t something I’ve spent much time on. Most of what I do is text-based, and I’ve barely experimented with images.
So I thought I might play around with a few options. I’ve done a little bit with Midjourney but found the Discord interface unintuitive. I tried DALL-E 2 a few times but didn’t like the outputs (although, to be honest, I feel like I struggled with the prompts…others have clearly created amazing work). And I very briefly tried Stable Diffusion from Stability AI via Poe (which I mentioned here).
I saw recently that OpenAI have released DALL-E 3, and that this version is built into Microsoft’s Image Creator, so I thought I’d experiment with that a little, since this is most likely the version that most colleagues at work will come into contact with.
Image Creator is currently free, but it’s not integrated with ChatGPT, is very restrictive in what images it accepts for interpretation, and only offers square aspect ratios in the output. So the full DALL-E 3 package is currently only available from OpenAI.
I like to have a ‘standard’ prompt when testing, so that I have something consistent for evaluating outputs over time. That’s not to say I don’t use other prompts, but I tend to use a version of one that is easy to remember. For images, that prompt is a variation of Maurice Sendak’s “Everybody should be quiet near a little stream and listen”, which also has a handy image that conveys a sense of peaceful solitude that I love.
The prompt I used was, “A young boy sitting quietly by a mountain stream, lost in thought, looking into the distance, with trees in the background, at sunset”, and then added a short description of the style I wanted in the image. So, the images below were created with the prompt, “A young boy sitting quietly by a mountain stream, lost in thought, looking into the distance, with trees in the background, at sunset, digital art”.
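If you want to keep the base prompt fixed and only swap the style, a minimal Python sketch of that idea is below. It only builds the prompt strings and doesn't call any image service. The function name `build_prompts` and the `STYLES` list are illustrative choices (the two styles are the ones used in this post), so treat it as a starting point rather than a recipe.

```python
# The 'standard' base prompt, kept identical across every test.
BASE_PROMPT = (
    "A young boy sitting quietly by a mountain stream, lost in thought, "
    "looking into the distance, with trees in the background, at sunset"
)

# Styles used in this post; extend the list with whatever you want to compare.
STYLES = ["digital art", "pencil sketch"]


def build_prompts(base, styles):
    """Return one prompt per style by appending the style to the base prompt."""
    return [f"{base}, {style}" for style in styles]


prompts = build_prompts(BASE_PROMPT, STYLES)
for prompt in prompts:
    print(prompt)
```

Each printed line can then be pasted into whichever image tool you're testing, which keeps everything except the style constant between runs.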
Examples of outputs using Microsoft Image Creator (style in the captions)
DALL-E 2 vs DALL-E 3
To get a sense of the difference between DALL-E 2 and DALL-E 3, here is OpenAI’s implementation of DALL-E 2 (via the OpenAI interface), in the style of a pencil sketch…
…and here is the output of the same prompt using DALL-E 3 (via the Microsoft Image Creator interface)…
It’s hard to do a real comparison, because all versions will generate something different if you ask them to try again. I’m also not sure if the Image Creator version of DALL-E 3 has been modified in any way as part of the integration into Microsoft’s systems. But regardless, I think I prefer the rougher, less polished interpretation of DALL-E 2 (i.e. the first pencil sketch version above), although you can probably get DALL-E 3 to create something similar if you experiment with the prompts.
The thing that still blows me away is how many variations you can generate in a couple of minutes, not only across multiple styles, but within styles. I didn’t do this, but you can also experiment with an arbitrary number of prompt variations, adding, editing, and removing features through a dialogue with the model.
I used to spend a lot of time trying to find images for my presentations, and I anticipate that in future, I’ll be spending a lot of time trying to generate the ones I want, in the style I want. The point, for me, isn’t that this is going to save me time. It’s going to allow me to create exactly what I’m looking for.