Behind the Scenes · 10 min read

How We Keep Characters Looking the Same in Every Frame

A behind-the-scenes look at how Paintbrush maintains character consistency across scenes using multi-angle reference sheets and AI composition.


Character consistency is the core problem Paintbrush was built to solve. If you've ever tried generating a video with a specific character using a generic AI tool, you know the pain — they look different in every single frame. Hair color drifts, outfits change, facial features morph. It breaks the illusion immediately.

We spent months building a system that addresses this at every level of the pipeline, from character creation through final video generation. Here's how it works.

The reference sheet approach

When you create a character in Paintbrush, we don't just generate a single image. We create a multi-angle reference sheet — front view, left profile, right profile, and back view — all from the same generation seed.

This gives the AI model multiple perspectives of the same character, making it far more likely to reproduce their exact appearance when composing a scene. Think of it like giving an artist a character turnaround sheet before asking them to draw a comic — the more angles they can reference, the more consistent the character stays.

The generation process uses a specialized prompt structure that ensures all four angles share the same visual identity: same hair, same clothing, same proportions. We also run a background removal step on each angle so the reference images are clean and uncluttered.
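The reference sheet step can be sketched as a batch of per-angle requests pinned to one seed. This is an illustrative sketch, not Paintbrush's actual API: the function name, prompt wording, and request fields (`seed`, `remove_background`) are assumptions for clarity.

```python
# Hypothetical sketch of the multi-angle reference sheet request.
# Names and request fields are illustrative, not Paintbrush's real API.

ANGLES = ["front", "left profile", "right profile", "back"]

def build_reference_prompts(description: str, seed: int) -> list[dict]:
    """One generation request per angle, all pinned to the same seed
    so every view shares the character's visual identity."""
    return [
        {
            "prompt": (
                f"{description}, {angle} view, character turnaround sheet, "
                "same hair, same clothing, same proportions, plain background"
            ),
            "seed": seed,               # identical seed across all four angles
            "remove_background": True,  # clean, uncluttered reference images
        }
        for angle in ANGLES
    ]
```

The key design choice is that the seed, not just the prompt, is shared: the prompt keeps the description consistent, while the seed keeps the model's random choices consistent.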

@Mentions as composition glue

When you write a scene description like "@Aria walks through the forest", Paintbrush doesn't just insert a name. Behind the scenes, we attach Aria's reference images to the generation request. The AI literally sees what Aria looks like before generating the scene.

This is the key insight: text descriptions alone aren't enough to maintain consistency. You need visual references. By coupling character names with their reference sheets at the API level, we close the gap between "a girl with red hair" (vague) and "this specific girl, with this exact shade of red hair, this hairstyle, and this outfit" (precise).

You can include multiple characters in a single scene. Each @mention pulls in that character's full reference sheet, so even in group scenes, individual characters maintain their distinct appearances.
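Conceptually, resolving @mentions looks like the sketch below: scan the scene text for mentions, look each one up in the project's character library, and attach the matching reference sheets to the request. The function name, library shape, and request format are assumptions, not Paintbrush's actual internals.

```python
import re

# Illustrative sketch of @mention resolution; the request shape is
# an assumption, not Paintbrush's actual generation payload.
MENTION_RE = re.compile(r"@(\w+)")

def build_scene_request(scene_text: str, library: dict[str, list[str]]) -> dict:
    """Collect the reference sheet of every @mentioned character and
    attach it to the generation request alongside the text prompt."""
    references = {}
    for name in MENTION_RE.findall(scene_text):
        if name in library:                   # unknown mentions are ignored
            references[name] = library[name]  # full multi-angle sheet
    return {"prompt": scene_text, "reference_images": references}
```

Because each mention pulls in a full sheet, a two-character scene carries eight reference images alongside the text prompt.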

What affects consistency

Even with reference sheets, several factors influence how well a character's appearance is maintained across scenes:

  • Prompt clarity — Specific descriptions help the model match the reference. "Aria, wearing her red cloak, stands facing the camera" works better than "Aria stands there"
  • Style selection — Anime and cartoon styles tend to be more consistent than realistic ones. Stylized art has clearer visual anchors (bold colors, distinct shapes) that are easier for the model to reproduce
  • Number of characters — Fewer characters per scene means more attention to each. With three or more characters, the model has to juggle multiple reference sets simultaneously
  • Model choice — Pro models generally maintain better likeness than Standard. The extra compute time allows the model to more carefully match reference details
  • Scene complexity — Simpler backgrounds and fewer moving elements give the model more capacity to focus on character accuracy
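The factors above can be rolled into a simple pre-flight check. This is an illustrative heuristic built directly from the list, not logic Paintbrush actually runs; the style and model labels are assumptions.

```python
def consistency_risk(num_characters: int, style: str, model: str,
                     complex_scene: bool) -> list[str]:
    """Flag settings likely to reduce character consistency
    (illustrative heuristic, not Paintbrush's actual logic)."""
    warnings = []
    if num_characters >= 3:
        warnings.append("3+ characters: model juggles multiple reference sets")
    if style == "realistic":
        warnings.append("realistic style: fewer clear visual anchors")
    if model != "pro":
        warnings.append("standard model: less compute to match reference details")
    if complex_scene:
        warnings.append("complex scene: less capacity for character accuracy")
    return warnings
```

A scene that trips several of these flags isn't doomed, but it's a good candidate for simplification or a Pro model before you spend regenerations on it.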

Iterating on results

If a generation doesn't quite nail the character's look, you can regenerate just that scene. Each attempt uses the same reference images, so results converge toward consistency over a few tries. Most users find that 1–2 regenerations are enough to get a result they're happy with.

You can also use the "tweak" feature to make targeted adjustments without regenerating from scratch. This is useful when the overall composition is good but a specific detail — like the character's expression or pose — needs refinement.
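The regenerate flow amounts to a retry loop over the same references. A minimal sketch, assuming hypothetical `generate` and `score` callables standing in for the scene generator and a likeness check; neither is Paintbrush's real API.

```python
# Minimal sketch of the regenerate-with-same-references loop.
# `generate` and `score` are hypothetical stand-ins, not real APIs.

def regenerate_scene(generate, score, references,
                     max_attempts=3, threshold=0.8):
    """Retry generation with identical reference images, keeping the
    best-scoring attempt and stopping early once likeness is good enough."""
    best_frame, best_score = None, float("-inf")
    for _ in range(max_attempts):
        frame = generate(references)
        s = score(frame, references)
        if s > best_score:
            best_frame, best_score = frame, s
        if s >= threshold:
            break
    return best_frame, best_score
```

Holding the references fixed across attempts is what makes results converge rather than drift: only the model's sampling varies between tries.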

Multi-character scenes

Things get more complex when multiple characters share a scene. Each @mention attaches a full reference sheet, so a scene with three characters sends three sets of multi-angle references to the model. This is a lot of visual information for the model to process simultaneously.

In practice, we've found that two characters per scene is the sweet spot for consistency: both characters maintain their distinct appearances reliably. With three characters, results are still good, but you may see occasional drift on the least prominent character. Beyond three, we recommend splitting into multiple scenes or using a Pro model for better results.

Character placement also matters. If two characters have similar builds or colors, the model can sometimes blend their features. Giving characters distinct visual signatures — different hair colors, contrasting outfits, varied body types — helps the model keep them separate.

Working with custom styles

Character consistency behaves differently depending on the art style you've chosen for your project. Our built-in styles (Anime, Realistic, Cartoon, etc.) are tuned to work well with the reference sheet pipeline. Custom styles can sometimes interfere if they push the visual output far from the reference images.

If you're using a custom style, keep it specific but not contradictory. A custom style like "Studio Ghibli watercolor, warm earth tones" works well because it defines an aesthetic without overriding character-specific details. A style like "abstract cubist figures" will struggle with consistency because it fundamentally reshapes how characters are rendered.

What's next

We're constantly improving our reference pipeline. Upcoming updates will include automatic pose detection, smarter reference image selection based on the scene's camera angle, and a captioning step that generates text descriptions of each character to supplement the visual references.

We're also exploring fine-tuned character embeddings — where the model learns a compact representation of your character that can be applied more reliably than reference images alone. Early experiments are promising, and we expect to roll this out later this year. The goal is to make consistency feel automatic — something you never have to think about.