Flux Dev is the first open model that can reliably be trained to produce logos, fonts, and brand guidelines. In this guide, I'll show you the results of my experiment: training an AI model on our company logo using a very small data set.
With this guide, you'll gain a solid understanding of how to train Flux on any logo. Hopefully, this will save you time on your next branding project!
Research Parameters - Flux Training
I conducted 7 training sessions, all run locally on an Nvidia RTX 4090 using AI-toolkit. (Of course, you can use online trainers like Astria, Civitai, Fal, and Replicate.) Across these 7 batches, I tested the following parameters:
Data set size and variety
Captions
Batch size
Overall steps
Do captions matter?
Training batches 1-4
All four batches had the same configuration and a very small data set - just 5 images. Each batch was trained for a total of 1,000 steps.
The only difference was the captions - the text describing each image.
Batch 1: "Fntnrs logo... (over a white background)"
Batch 2: "AI Finetuners logo... (over a white background)"
Batch 3: "AI FINETUNERS logo in bold purple, the AI is written inside a dial... (over a white background)"
Batch 4: No captions
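As a side note, these captions live on disk as plain-text sidecar files, one per image with a matching basename, which is the convention AI-toolkit and most trainers follow. Here's a minimal sketch of writing them out (the folder name and caption text are just placeholders from batch 3's style):

```python
from pathlib import Path

DATASET = Path("datasets/fntnrs_logo")  # hypothetical folder with the 5 training images
CAPTION = "AI FINETUNERS logo in bold purple, the AI is written inside a dial, over a white background"

# One sidecar .txt per image; trainers pick these up by matching basenames.
# Batch 4 ("no captions") simply means these .txt files don't exist.
for img in sorted(DATASET.glob("*.png")):
    img.with_suffix(".txt").write_text(CAPTION)
```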
Did it matter?
These images were not cherry-picked, and all settings used to create them (including the seed number) were identical. The logo wasn't captured perfectly, regardless of the captioning style. Unlike SDXL training, which relies heavily on captions, Flux is more forgiving.
However, when I trained the set without captions, the results weren't as good, though the essence of the logo still came through. So the takeaway here is simple:
Use captions.
You can probably rely on automatic captions, as Flux is flexible and can draw on many data points.
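If you'd rather not caption by hand, an off-the-shelf captioner gets you most of the way. Here's a minimal sketch using BLIP via Hugging Face transformers - the model choice and paths are my own assumptions, not what I used in the batches above:

```python
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Generate one caption per image and save it as a sidecar .txt file.
for img_path in sorted(Path("datasets/fntnrs_logo").glob("*.png")):
    inputs = processor(Image.open(img_path).convert("RGB"), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    img_path.with_suffix(".txt").write_text(caption)
```

Auto-captions won't know your brand name, so it's worth prepending your trigger word to each generated caption afterward.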
Data Set Size
This is a crucial factor. The data set needs to meet a baseline minimum for the logo to come out correctly. But what if we don't have enough images of the logo? We can artificially enlarge the set, as the sketch below shows.
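Here's a minimal sketch of that kind of augmentation with Pillow, assuming you have one clean logo render with transparency; the background colors, tilt range, and file names are illustrative:

```python
import random
from pathlib import Path
from PIL import Image

logo = Image.open("logo.png").convert("RGBA")  # one clean render with an alpha channel
out_dir = Path("datasets/fntnrs_logo_augmented")
out_dir.mkdir(parents=True, exist_ok=True)

backgrounds = ["white", "ivory", "lightgray", "lavender"]
for i in range(12):
    # Tilt slightly so the model doesn't overfit to one exact orientation.
    tilted = logo.rotate(random.uniform(-8, 8), expand=True, resample=Image.BICUBIC)
    # Drop the logo onto a fresh background color.
    canvas = Image.new("RGBA", (1024, 1024), random.choice(backgrounds))
    x = (canvas.width - tilted.width) // 2
    y = (canvas.height - tilted.height) // 2
    canvas.paste(tilted, (x, y), tilted)  # alpha channel doubles as the paste mask
    canvas.convert("RGB").save(out_dir / f"aug_{i:02d}.png")
```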
Here’s how the data sets for batches 5 and 6 looked:
In batch #5, I changed the background of a few images.
In batch #6, I made more changes:
Both sets were trained on 3,000 steps.
Both sets used a batch size of 2.
I used the longer captions for both sets.
Let's see the results:
This is good news. Flux was able to capture the essence of the logo in both data sets, with batch #6 producing slightly better results. All we needed were 16 images, many of which were created "artificially" by changing the background behind the logo or tilting it slightly.
When the image didn’t have enough space to display the logo horizontally, Flux got creative. Thanks to the abundance of logo references, it was able to position the text below the dial while maintaining the overall colors, shapes, and font of the original logo.
In cases where the logo didn't come out exactly right but I found the image appealing, a quick inpainting pass resolved the issue.
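For anyone who wants to reproduce that fix, here's a minimal sketch using the diffusers FluxInpaintPipeline with the trained LoRA loaded on top - the LoRA filename, image paths, and strength value are placeholders, not my exact setup:

```python
import torch
from diffusers import FluxInpaintPipeline
from diffusers.utils import load_image

# Base Flux Dev checkpoint plus the logo LoRA from training.
pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("output/fntnrs_logo.safetensors")  # placeholder path

image = load_image("generated_but_imperfect.png")
mask = load_image("logo_region_mask.png")  # white where the logo should be repainted

fixed = pipe(
    prompt="AI FINETUNERS logo in bold purple, the AI written inside a dial",
    image=image,
    mask_image=mask,
    strength=0.85,  # how strongly to repaint the masked region
).images[0]
fixed.save("fixed.png")
```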
Still, I wanted a better model, one that churns out good results most of the time, reducing the need for cherry-picking and inpainting. So I set out on a final training run. This time, the data set included some of the images I had generated with previous model iterations.
Final Batch
Yup. Here's the data set:
This batch was trained for 4,500 steps with a batch size of two. (It took 12 hours to complete and the room felt like a furnace, but hey, what we won't do for science.)
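Those self-generated training images came from the previous LoRA. A minimal sketch of that generation loop with diffusers - the LoRA path and prompts are illustrative, and the step count and guidance are common Flux Dev defaults rather than my exact settings:

```python
import torch
from pathlib import Path
from diffusers import FluxPipeline

# Load Flux Dev plus the LoRA from the previous training round.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("output/fntnrs_logo_v6.safetensors")  # placeholder path

Path("candidates").mkdir(exist_ok=True)
prompts = [
    "AI FINETUNERS logo on a frosted glass office door",
    "AI FINETUNERS logo printed on a coffee mug, studio lighting",
]
# Generate candidates, then hand-pick the best ones for the next data set.
for i, prompt in enumerate(prompts):
    img = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
    img.save(f"candidates/{i:03d}.png")
```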
Results were the best so far, but not drastically better, as previous attempts had already produced good results.
What does this all mean?
It means logos, pack shots, and other product images can now be easily reproduced with AI using a model like Flux. Even with just one image of the logo, it's possible.
Take a look at the differences between a 5-image model vs. a 20-image model - it’s not that drastic.
In conclusion, to create a model that is truly faithful to the source, you would need a large, varied, and well-captioned dataset. However, a functional model only requires a few images and works right out of the box. Model training is becoming a commodity.