The naive approach (and why it fails)
The first thing every team tries: concatenate the style references into the prompt at generation time. Upload six reference images, pack their URLs into the prompt, generate. The result drifts anyway.
Three reasons. First, image models attend to text more than to image references when the two compete, so your six-word style cue gets crowded out by the scene description. Second, the model re-interprets the references from scratch on every call, with no canonical extraction step, so there's no commitment to "this style means warm grayscale, 70% white space, 2-3 pixel line weight". Third, reference images are token-heavy. Tools that bundle them into every generation either crop aggressively (losing signal) or include fewer of them (losing diversity).
The naive approach drifts because the style was never locked. It was just suggested.
What locking actually requires
Three things. A vision pass that runs once and extracts a written specification. A persistent record keyed to user × project × artist. A prompt compiler that splices the spec into every future generation as a hard rule, above the user's per-frame prompt.
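A minimal Python sketch of the persistent record and the prompt compiler, assuming an in-memory dict in place of a real database. `StyleKey`, `StyleSpec`, `StyleStore` and `compile_prompt` are hypothetical names; the only ideas taken from the text are that the spec is keyed by user × project × artist and spliced in above the per-frame prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleKey:
    """Composite key: one locked style per user x project x artist."""
    user_id: str
    project_id: str
    artist_id: str


@dataclass
class StyleSpec:
    rules: list[str]           # positive rules, e.g. "2-3 px line weight"
    negative_rules: list[str]  # things the style must never do


class StyleStore:
    """In-memory stand-in for the persistent record; a real system would use a database table."""

    def __init__(self) -> None:
        self._specs: dict[StyleKey, StyleSpec] = {}

    def put(self, key: StyleKey, spec: StyleSpec) -> None:
        self._specs[key] = spec

    def get(self, key: StyleKey) -> StyleSpec:
        return self._specs[key]


def compile_prompt(spec: StyleSpec, frame_prompt: str) -> str:
    """Place the locked style rules above the per-frame prompt so they read as hard rules."""
    lines = ["STYLE RULES (must be followed exactly):"]
    lines += [f"- {rule}" for rule in spec.rules]
    if spec.negative_rules:
        lines.append("NEVER:")
        lines += [f"- {rule}" for rule in spec.negative_rules]
    lines.append("FRAME:")
    lines.append(frame_prompt)
    return "\n".join(lines)
```

Putting the rules first and labelling them as mandatory is one way to express "hard rule, above the user's per-frame prompt"; the wording and layering a production compiler uses will differ.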
The vision pass is the heart of it. Hand your references to a vision model with a structured prompt: extract line weight ratios, value distribution percentages, palette hex codes, finishing technique, negative rules. What comes back is not an image. It's a written style schema in JSON. The schema lives in your database. Every subsequent generation reads it before the user's prompt is even compiled.
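One way to sketch the extraction step, assuming the vision model is asked to reply with JSON. The prompt text and the field names (`line_weight_ratios`, `value_distribution`, `palette`, `finishing_technique`, `negative_rules`) are illustrative assumptions that mirror the list above, and the call to the vision model itself is left out. The point is to validate the reply before it is written to the database, so a malformed hex code never reaches later generations.

```python
import json
import re
from dataclasses import dataclass

HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")

# Hypothetical structured prompt; the field names are illustrative, not a fixed standard.
EXTRACTION_PROMPT = (
    "Analyse the attached reference images and return ONLY JSON with keys: "
    "line_weight_ratios (name -> number), value_distribution (band -> percent), "
    "palette (list of #RRGGBB), finishing_technique (string), negative_rules (list of strings)."
)


@dataclass
class StyleSchema:
    line_weight_ratios: dict[str, float]
    value_distribution: dict[str, float]
    palette: list[str]
    finishing_technique: str
    negative_rules: list[str]


def parse_style_schema(raw: str) -> StyleSchema:
    """Validate the vision model's JSON reply before it is stored."""
    data = json.loads(raw)
    palette = [c.upper() for c in data["palette"]]
    bad = [c for c in palette if not HEX_RE.match(c)]
    if bad:
        raise ValueError(f"invalid hex codes: {bad}")
    values = {k: float(v) for k, v in data["value_distribution"].items()}
    # Percentages should cover the whole image; allow 1 point of rounding slack.
    if abs(sum(values.values()) - 100.0) > 1.0:
        raise ValueError("value distribution percentages must sum to ~100")
    return StyleSchema(
        line_weight_ratios={k: float(v) for k, v in data["line_weight_ratios"].items()},
        value_distribution=values,
        palette=palette,
        finishing_technique=str(data["finishing_technique"]),
        negative_rules=[str(r) for r in data["negative_rules"]],
    )
```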
Re-analysis should be free
Subtle but important: re-analysing an unchanged bible should cost nothing. Hash the inputs (artist key, style key, sorted image URLs) with SHA-256 and cache the schema under that digest. On re-sync, recompute the hash; if it matches, return the cached schema. Vision passes cost real money, usually around 12 credits per style, and heavy users sync several times a day. The hash cuts the bill to zero in the no-change case, which covers most syncs.
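A sketch of the hash-keyed cache, assuming an in-memory dict and a hypothetical `analyse` callable standing in for the paid vision pass. `bible_hash` follows the text: artist key, style key and sorted image URLs, hashed with SHA-256 via Python's standard `hashlib`.

```python
import hashlib
import json
from typing import Callable


def bible_hash(artist_key: str, style_key: str, image_urls: list[str]) -> str:
    """SHA-256 over the bible inputs; URLs are sorted so upload order doesn't change the hash."""
    payload = json.dumps(
        {"artist": artist_key, "style": style_key, "images": sorted(image_urls)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SchemaCache:
    def __init__(self, analyse: Callable[[list[str]], dict]) -> None:
        self._analyse = analyse  # hypothetical paid vision pass
        self._by_hash: dict[str, dict] = {}
        self.paid_calls = 0

    def sync(self, artist_key: str, style_key: str, image_urls: list[str]) -> dict:
        h = bible_hash(artist_key, style_key, image_urls)
        if h in self._by_hash:  # unchanged bible: no vision pass, no charge
            return self._by_hash[h]
        self.paid_calls += 1
        schema = self._analyse(sorted(image_urls))
        self._by_hash[h] = schema
        return schema
```

Serialising the inputs as JSON before hashing, rather than concatenating strings, keeps distinct inputs from colliding (for example `"ab" + "c"` versus `"a" + "bc"`). Sorting the URLs means re-uploading the same images in a different order still hits the cache.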
Per-frame fidelity scoring closes the loop
Locking the prompt is necessary but not sufficient. Models still drift sometimes — especially on complex multi-character scenes, or when the user's prompt accidentally fights the style ("a vibrant sunset" while the style is monochrome).
The closing step: hand the finished image back to the vision model alongside the bible. Score six dimensions (line, tone, colour, cinematic composition, photorealism risk, prompt accuracy), take a weighted average, and compare it against a threshold (82/100 is a sensible floor). Below the floor, regenerate once with an auto-composed correction prompt that names the failing dimension. The user pays for the first attempt; the regeneration is free.
This is the difference between "style lock" as marketing copy and as production capability.
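The scoring loop can be sketched as follows. The six dimensions and the 82/100 floor come from the text; the weights, the assumption that every dimension is scored 0 to 100 with higher meaning better (so a high `photorealism_risk` score means low risk), and the `generate`, `score` and `charge` callables are hypothetical.

```python
from typing import Callable

# Hypothetical weights: the six dimensions come from the text, the numbers do not.
WEIGHTS = {
    "line": 0.20,
    "tone": 0.20,
    "colour": 0.15,
    "composition": 0.15,
    "photorealism_risk": 0.15,  # assumed scored so that 100 = no photorealism drift
    "prompt_accuracy": 0.15,
}
THRESHOLD = 82.0


def fidelity(scores: dict[str, float]) -> float:
    """Weighted average of the per-dimension scores, on a 0-100 scale."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / total


def generate_with_check(
    prompt: str,
    generate: Callable[[str], str],            # hypothetical generator, returns an image id
    score: Callable[[str], dict[str, float]],  # hypothetical vision scorer against the bible
    charge: Callable[[], None],                # bills the user
) -> tuple[str, float]:
    charge()  # the user pays for the first attempt only
    image = generate(prompt)
    scores = score(image)
    result = fidelity(scores)
    if result >= THRESHOLD:
        return image, result
    worst = min(scores, key=scores.get)  # the failing dimension to name
    correction = (
        f"{prompt}\nCORRECTION: previous attempt failed on '{worst}'; "
        f"follow the style rules for {worst} strictly."
    )
    image = generate(correction)  # one free regeneration, no second charge
    return image, fidelity(score(image))
```

Naming the lowest-scoring dimension is a simple heuristic; ranking by weighted shortfall would be another reasonable choice.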
Practical tests for filmmakers
Three things to test on any AI storyboard tool:
Drift over distance: generate frames 1 and 50 from realistic prompts and compare line weight, value structure and palette.
The contradiction test: generate a frame whose prompt fights the style. The locked style should win.
The re-sync cost: click re-sync without changing the bible. If you get billed credits, the hashing isn't there.
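For the drift-over-distance test, one rough way to put a number on palette drift, assuming you already have hex palettes for frames 1 and 50 (for example, from running the same vision extraction on each frame): average the distance from each colour to its nearest neighbour in the other palette, in both directions. The function names are hypothetical, plain RGB distance is a crude stand-in for a perceptual colour space such as CIELAB, and line weight and value structure would need their own checks.

```python
import math


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 ints."""
    code = code.lstrip("#")
    return int(code[0:2], 16), int(code[2:4], 16), int(code[4:6], 16)


def _nearest_mean(src: list[str], dst: list[str]) -> float:
    """Mean Euclidean RGB distance from each colour in src to its closest colour in dst."""
    dst_rgb = [hex_to_rgb(c) for c in dst]
    return sum(min(math.dist(hex_to_rgb(c), d) for d in dst_rgb) for c in src) / len(src)


def palette_drift(frame_1: list[str], frame_50: list[str]) -> float:
    """Symmetric drift score: 0 means identical palettes, larger means more drift."""
    return (_nearest_mean(frame_1, frame_50) + _nearest_mean(frame_50, frame_1)) / 2
```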
What this looks like inside StoryboardCanvas
The AI Artist surface ships the full pipeline above: vision-pass extraction, persistent schemas keyed by user × project × artist, 7-layer prompt compiler, post-generation fidelity scoring, automatic re-run on sub-threshold scores, bible-hash caching for free re-syncs. Mitchell carries three canon styles, each locked via the same engine. Custom artists work identically — up to 4 user-named styles × up to 10 reference images per style.