Training AI Models Is a Lot Like Raising Younglings

What You'll Learn
craft mastery
intentional preparation
iteration over brute force
quality over quantity
teaching through variety
foundation before scale
patience

LORA + Checkpoint Model Training GUIDE - Get the BEST RESULTS super easy

Your AI model is learning from what you feed it. And just like any student... the quality of the teaching shapes everything that comes after.

Most people training LoRA models or Stable Diffusion checkpoints get frustrated early. Bad results. Weird faces. Mush where detail should be. They blame the tools. They blame the hardware.

The tools aren't the problem. The inputs are.

Let me break it down.

The Noise Underneath Everything

Diffusion models are trained by adding noise to your training images, a little at a time, until nothing but noise is left... and learning to undo that damage. That seed number you type in when generating an image? It picks the random noise the image starts from. The whole training process is the AI learning how to pull signal back out of chaos.
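Here's a toy sketch of that idea, not how any real trainer is written. It assumes the standard "blend signal with Gaussian noise" formulation, and the names `make_noise`, `add_noise`, and `keep` are made up for illustration. It shows two things: the same seed always produces the same noise, and the more noise you blend in, the less of the original image survives.

```python
import math
import random

def make_noise(seed, n):
    """Return n Gaussian noise values. The same seed always gives the same values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def add_noise(pixels, noise, keep):
    """Blend image and noise. keep=1.0 leaves the image intact, keep=0.0 is pure noise."""
    signal_scale = math.sqrt(keep)
    noise_scale = math.sqrt(1.0 - keep)
    return [signal_scale * p + noise_scale * e for p, e in zip(pixels, noise)]

# A four-value stand-in for an image, with values in [-1, 1] (illustrative only).
pixels = [0.9, 0.1, -0.4, 0.7]
noise = make_noise(seed=1234, n=len(pixels))

slightly_noisy = add_noise(pixels, noise, keep=0.9)  # image still mostly visible
pure_noise = add_noise(pixels, noise, keep=0.0)      # nothing of the image left
```

During training the model sees many noise levels between those two extremes and is asked to work out what was added. At generation time it starts from the seed's noise and runs the process in reverse.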

Once you understand that, everything else clicks.

If your training image has a tiny face occupying a sliver of the frame... the AI only ever sees that face as a handful of pixels. It can't reconstruct a close-up from that. It doesn't have enough information. So you need variety. Big faces. Small faces. Mid-range. Full body. The AI needs to see the subject at every scale to reproduce it at every scale.
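One way to check this before training is to measure how much of each frame the face actually fills. A minimal sketch, assuming you already have a face bounding box per image (from a detector or by hand). The function name, the bucket labels, and the 25% and 5% cutoffs are hypothetical... pick your own.

```python
from collections import Counter

def scale_bucket(image_size, face_box, close=0.25, mid=0.05):
    """Classify an image by the fraction of the frame the face covers."""
    image_w, image_h = image_size
    face_w, face_h = face_box
    fraction = (face_w * face_h) / (image_w * image_h)
    if fraction >= close:
        return "close-up"
    if fraction >= mid:
        return "mid-range"
    return "distant"

# (image size, face box size) in pixels. Made-up numbers for illustration.
dataset = [
    ((512, 512), (300, 340)),
    ((512, 512), (150, 170)),
    ((512, 768), (60, 70)),
    ((512, 512), (280, 300)),
]

coverage = Counter(scale_bucket(size, box) for size, box in dataset)
missing = [b for b in ("close-up", "mid-range", "distant") if coverage[b] == 0]

print(dict(coverage))
print("missing scales:", missing)
```

If `missing` isn't empty, that's a scale the model never gets to study.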

Variety Is the Curriculum

Think of your training dataset like a lesson plan for a youngling. You wouldn't teach someone to recognize a face by showing them the same photo 50 times. You'd show them:

- Different expressions... smiling, serious, mid-laugh, contemplative
- Different lighting... bright sun, overcast, neon, soft indoor
- Different angles... profile, three-quarter, straight on, looking up
- Different clothing and hairstyles... so the AI learns the face isn't defined by the outfit
- Different body framing... close-up, half-body, full body

Each variation teaches the neural network something new about how the subject exists in the world. More variation means more resilience in generation. Fewer surprises. Better results.

Quality Over Resolution

Here's where people trip up. They grab the biggest images they can find and assume bigger equals better.

Not quite.

A sharp 512×512 image will train better than a blurry 2048×2048 one. The AI dissolves every image into noise... and if your source image is a muddy mess of compression artifacts and soft edges, there are no distinct features left for it to recover. Eyelashes blur into shadows. Hair blurs into skin. The AI can't learn what it can't distinguish.

Sharp. Clean. Clear detail. That's the standard. Resolution helps, but image quality is the foundation.
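If you want a number instead of a gut feeling, one common focus measure is the variance of the Laplacian: sharp edges produce big swings, soft images produce almost none. This isn't from the video. It's a rough sketch in plain Python on a tiny grayscale grid, and any pass/fail threshold would be your own assumption.

```python
from statistics import pvariance

def laplacian_variance(gray):
    """gray is a 2D list of brightness values. A higher result means more edge detail."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # 4-neighbour Laplacian: how much a pixel differs from its surroundings.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)

# A hard black-to-white edge versus a gentle gradient across the same span.
sharp = [[0, 0, 255, 255] for _ in range(4)]
soft = [[96, 117, 138, 159] for _ in range(4)]

print(laplacian_variance(sharp))  # large
print(laplacian_variance(soft))   # 0 for a perfectly smooth ramp
```

Run it across your dataset, sort by score, and look hard at whatever lands at the bottom.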

Keywords Are Your Control Panel

Every training image gets a text caption file. Those caption keywords aren't just descriptions... they're variables.

If you label every hairstyle as just "hair," the AI can't differentiate between curly and straight. Between short and long. Between dark and blonde. You've given it no lever to pull.

But write "curly dark short hair" in one caption and "straight blonde long hair" in another... now the AI maps those words to specific visual patterns. Later, when you prompt for "short hair," it knows what to do. The specificity in your captions directly creates the controllability in your outputs.

BAM... that's the whole game. Describe what changes so the AI can learn the difference.
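A quick caption audit can catch the "just hair" problem before you train. This sketch assumes comma-separated captions held in a dict keyed by file name. The file names, the `subjectname` trigger word, and the helper names are all hypothetical.

```python
from collections import Counter

def split_tags(caption):
    """Split a comma-separated caption into clean tags."""
    return [tag.strip() for tag in caption.split(",") if tag.strip()]

def constant_tags(captions):
    """Tags that appear in every caption. These never vary, so they give no control."""
    counts = Counter()
    for text in captions.values():
        counts.update(set(split_tags(text)))
    return sorted(tag for tag, n in counts.items() if n == len(captions))

def bare_generic(captions, generic=("hair",)):
    """Caption files that use a generic tag with no describing words attached."""
    return sorted(name for name, text in captions.items()
                  if any(tag in generic for tag in split_tags(text)))

captions = {
    "img01.txt": "subjectname, curly dark short hair, smiling, bright sun",
    "img02.txt": "subjectname, straight blonde long hair, serious, soft indoor light",
    "img03.txt": "subjectname, hair, profile, overcast",
}

print(constant_tags(captions))  # ['subjectname']
print(bare_generic(captions))   # ['img03.txt']
```

A constant tag is fine for the subject's name. For anything you want to change at prompt time, the caption has to say what's different.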

LoRA or Full Model?

LoRA training produces a small, portable add-on. You can stack multiple LoRAs together. Train a face as one LoRA, a clothing style as another, combine them at generation time. LoRAs also tend to transfer across checkpoints built on the same base model family... train on photos, apply to an anime checkpoint. The face structure carries over into whatever style the checkpoint uses.
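Why is a LoRA so small? The video doesn't go into the internals, so treat this as background: a LoRA stores two thin matrices per layer instead of a full replacement, and their product gets added on top of the original weights. The sketch below uses plain Python lists, and the 768×768 layer and rank 8 are illustrative numbers, not settings from the source.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(weight, up, down, scale=1.0):
    """Return weight + scale * (up @ down). The base weight is left untouched."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

def full_params(d_out, d_in):
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    return rank * (d_out + d_in)

print(full_params(768, 768))     # 589824 values for the full layer
print(lora_params(768, 768, 8))  # 12288 values for a rank-8 LoRA on that layer

# Tiny worked example: a 2x2 layer with a rank-1 LoRA at half strength.
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]     # 2 x 1
down = [[3.0, 4.0]]     # 1 x 2
patched = apply_lora(base, up, down, scale=0.5)
print(patched)          # [[2.5, 2.0], [3.0, 5.0]]
```

Stacking is the same move repeated... call `apply_lora` again on `patched` with a second LoRA's matrices and its own strength.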

Checkpoint model training produces a full model. Bigger files. More storage. But here's the thing that matters... checkpoints are more forgiving to train. They don't have to be perfect because you can merge them with existing high-quality models to fill in the gaps. Your undertrained areas get patched by the model you merge with.

For faces and portable concepts... LoRA. For themes and architecture and styles where you want forgiveness... checkpoint with a merge.

The 10-Epoch Principle

Steps and training epochs confuse people. Here's the simple version.

One epoch of 1,000 steps is NOT the same as ten epochs of 100 steps... even though the total step count is identical. Each epoch is a full pass over your images, and each one builds on the last. The model iterates. Improves. Refines. Ten epochs with fewer steps creates a feedback loop that produces better results than one long brute-force pass.

For faces... 15 quality images, 10 steps each, 10 epochs. That can be enough. For complex subjects like architectural styles... more images, more steps, same epoch principle.
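The arithmetic for that face recipe, as a small sketch. The function and parameter names are hypothetical, and `batch_size` is an assumption added here because it changes the step count in most trainers.

```python
import math

def total_steps(images, steps_per_image, epochs, batch_size=1):
    """Return (steps per epoch, total steps) for a simple training plan."""
    steps_per_epoch = math.ceil(images * steps_per_image / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

per_epoch, total = total_steps(images=15, steps_per_image=10, epochs=10)
print(per_epoch, total)  # 150 1500
```

So the face recipe works out to 150 steps per epoch and 1,500 steps overall, split into ten passes.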

Start With What You Can Verify

Train on a face you already know well. A public figure with thousands of diverse images available online. Not for distribution... for learning.

Why? Because you'll spot problems instantly. A weird jawline. Eyes that don't quite track. Expressions that feel off. When you know the face, you can diagnose the training. Adjust your images. Refine your captions. Dial in your parameters.

You learn the craft by working with familiar materials first. Then you apply that knowledge to everything else.

The Merge Trick

Your trained checkpoint doesn't have to be perfect. Train it to 70-80% quality, then merge it with an existing model that already handles what yours is missing. The merge fills gaps. Smooths edges. Creates something better than either model alone.

This is quietly one of the most powerful techniques in the whole workflow... and most tutorials skip right past it.
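What does a merge actually do? The simplest version is a weighted sum: for every weight the two models share, take a blend of the two values. The sketch below assumes that method (merge tools offer others) and uses plain dicts of lists in place of real checkpoint tensors. The layer name and the 0.25 ratio are made up.

```python
def merge_weighted(model_a, model_b, alpha):
    """Blend two models key by key. alpha=0 returns model_a, alpha=1 returns model_b."""
    merged = {}
    for key, a_values in model_a.items():
        if key not in model_b:
            # Nothing to blend with, so keep model_a's values as they are.
            merged[key] = list(a_values)
            continue
        b_values = model_b[key]
        merged[key] = [(1 - alpha) * a + alpha * b for a, b in zip(a_values, b_values)]
    return merged

my_checkpoint = {"layer.weight": [0.0, 1.0]}   # your 70-80% model
strong_model = {"layer.weight": [1.0, 3.0]}    # the model you borrow strength from

merged = merge_weighted(my_checkpoint, strong_model, alpha=0.25)
print(merged)  # {'layer.weight': [0.25, 1.5]}
```

The ratio is the dial. Lean toward your own checkpoint and you keep more of what you trained... lean toward the other model and it patches more of your gaps.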

Training AI models isn't magic. It's craft. Patience. Iteration. Understanding what's happening underneath the interface so you can make intentional choices instead of throwing images at a wall and hoping.

Start small. Train a face you know. Study the results. Adjust. Train again.

The AI learns from what you give it. Give it your attention... focused, intentional, quality attention... and it'll show you what it can do. ✨

---

Source: https://www.youtube.com/watch?v=j-So4VYTL98

From TIG's Notebook

Thoughts that surfaced while watching this.

All change takes additional energy. Ruts get a bad rap, but when used with purpose, they are fantastic! They conserve energy, and empower you to focus more energy on other things.
— TIG's Notebook — Core Principles
Don't be afraid of take two.
— TIG's Notebook — On Failure & Perseverance
Living the lives we want not only requires doing the right things but also necessitates not doing the things we know we'll regret. — *Nir Eyal, Indistractable*
— TIG's Notebook — Core Principles
