---
url: https://lettuceai.app/docs/images
title: "Image Generation — LettuceAI"
description: "Generate images with scene generation, avatar creation, and design references using AUTOMATIC1111, OpenAI, Gemini, and more."
---

# Image Generation

LettuceAI does not treat every image feature as one generic mode. There is a normal image-generation path, a separate scene-writer path for roleplay images, and a design-reference writer for turning reference images into reusable visual notes.

## What most users need to know

You do not need to understand the full image stack to use this feature. For most people, it comes down to three simple actions:

-   generate a new image from a prompt
-   edit an existing image
-   let the app help draft a scene prompt before generating

**Simple mental model**

Some models can **create images**. Some models can **look at images and describe them**. Some can do both. Most of the advanced wording on this page is just explaining that split.

**Three distinct image systems**

Avatar generation and normal image jobs use image-output models. Scene prompt drafting and design-reference drafting use a different kind of model: one that can read images and output text.

## What the app actually supports

The image stack breaks down into three user-facing workflows:

-   **Image generation**: create or edit images directly with an image-capable model.
-   **Scene generation**: draft a scene prompt from recent roleplay context, then render that prompt into an image.
-   **Design reference drafting**: read an avatar and a small set of reference images, then write a clean visual description for future use.

If you only want basic image generation, focus on the first one. The scene writer and design reference tools are optional advanced helpers.

That is why the settings page separates regular image models from the **Scene Writer** model. They do different jobs.

![Image generation settings page](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEov95OMxgeUAy1BckbI0oS8FO2ihTxv7GqEl9)

Image settings split the normal image model from the separate scene-writer model, and also control scene generation mode.

## Capabilities matter more than provider names

The app enables features based on model capability scopes, not marketing labels. In practice, look at what a model can accept as input and what it can return as output.

![Model capability scopes for image features](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiE7iLisBS3XDOMPNhgrUjbZp9fWTYnCuB0v4wx)

The capability list is the real contract. Image generation needs image output. The scene writer needs text plus image input with text output.

| Feature | What the model must support |
| --- | --- |
| Avatar generation | Image output |
| Avatar editing | Image output, and ideally image input too for edit-style models |
| Scene image rendering | Image output |
| Scene writer / design reference writer | Text input + image input + text output |

A model that can _see_ images is not automatically a model that can _generate_ them. Those are separate capabilities.
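To make that split concrete, here is a minimal capability-gating sketch in TypeScript. The `Scope` names and types are assumptions for illustration, not LettuceAI's actual identifiers:

```typescript
// Hypothetical capability scopes mirroring the table above; the real
// scope identifiers inside LettuceAI may be named differently.
type Scope = "text-in" | "image-in" | "text-out" | "image-out";

interface ModelInfo {
  id: string;
  scopes: Scope[];
}

const has = (m: ModelInfo, ...needed: Scope[]): boolean =>
  needed.every((s) => m.scopes.includes(s));

// Feature gating follows the table, not the provider name.
const canGenerateImages = (m: ModelInfo) => has(m, "image-out");
const canWriteScenes = (m: ModelInfo) =>
  has(m, "text-in", "image-in", "text-out");

const visionOnly: ModelInfo = {
  id: "describer",
  scopes: ["text-in", "image-in", "text-out"],
};
console.log(canGenerateImages(visionOnly)); // false: it can see, not draw
console.log(canWriteScenes(visionOnly));    // true
```

The point of the sketch is the last two lines: a vision model passes the scene-writer check and still fails the image-generation check.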

## Regular image generation

The normal image pipeline is the most direct one. The app sends your prompt to the selected image model, saves the returned image locally, and then reuses it in the UI like any other media asset.

-   Generated images are saved locally after the provider returns them.
-   The returned result can be a hosted URL or raw image data.
-   The app records width, height, mime type, and the stored asset id so the image behaves like a normal attachment.

Behind the scenes, LettuceAI supports several provider adapters, so the exact payload shape varies by provider even though the UI flow stays the same.
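As a rough sketch of that flow, assuming hypothetical `saveLocally` and `readMetadata` helpers (LettuceAI's real storage layer is not documented here), normalizing a provider result might look like this:

```typescript
// All shapes and helpers here are assumptions for illustration, not
// LettuceAI's real storage layer.
type ProviderImageResult =
  | { kind: "url"; url: string }
  | { kind: "data"; base64: string };

interface StoredImageAsset {
  assetId: string; // id of the locally saved file
  width: number;
  height: number;
  mimeType: string;
}

declare function saveLocally(bytes: Uint8Array): Promise<string>;
declare function readMetadata(
  bytes: Uint8Array
): Promise<{ width: number; height: number; mimeType: string }>;

// Whatever form the provider returns, the app resolves it to bytes,
// saves them locally, and records the metadata the UI needs.
async function storeResult(
  result: ProviderImageResult
): Promise<StoredImageAsset> {
  const bytes =
    result.kind === "url"
      ? new Uint8Array(await (await fetch(result.url)).arrayBuffer())
      : Uint8Array.from(atob(result.base64), (c) => c.charCodeAt(0));
  const assetId = await saveLocally(bytes);
  const { width, height, mimeType } = await readMetadata(bytes);
  return { assetId, width, height, mimeType };
}
```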

### Supported image generators

The app currently includes built-in adapters for several image backends. That means you can use different providers without rewriting the rest of your workflow.

-   **AUTOMATIC1111** for local Stable Diffusion style txt2img and img2img setups.
-   **OpenAI** image generation and edit-style requests.
-   **OpenRouter** models that can return image output.
-   **Google Gemini** image-capable generation flows.
-   **Stability** text-to-image and image-to-image generation.
-   **xAI** image generation and edit flows.
-   **NanoGPT** OpenAI-style image generation requests.

The UI stays mostly the same across these providers, but the provider still matters for edit quality, reference-image handling, returned formats, and how reliable multimodal prompting feels in practice.
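A minimal sketch of what a common adapter surface could look like, with invented `callA1111Txt2Img`, `callOpenAIImages`, and `callOpenAIEdit` stand-ins rather than the app's real internals:

```typescript
// A hypothetical common adapter surface; the real adapters are internal.
interface ImageAdapter {
  generate(prompt: string): Promise<Uint8Array>;
  // Not every backend supports edits, so edit is optional here.
  edit?(prompt: string, source: Uint8Array): Promise<Uint8Array>;
}

declare function callA1111Txt2Img(body: object): Promise<Uint8Array>;
declare function callOpenAIImages(body: object): Promise<Uint8Array>;
declare function callOpenAIEdit(body: object): Promise<Uint8Array>;

// Each backend translates the same request into its own payload shape,
// which is why edit quality and reference handling still vary by provider.
const adapters: Record<string, ImageAdapter> = {
  automatic1111: {
    generate: (prompt) => callA1111Txt2Img({ prompt, steps: 20 }),
  },
  openai: {
    generate: (prompt) => callOpenAIImages({ prompt }),
    edit: (prompt, source) => callOpenAIEdit({ prompt, image: source }),
  },
};
```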

## Images generated inside chat

For images generated inside chat, the behavior is simple: if the selected chat model returns an image, LettuceAI shows that image as an attachment on the assistant message. If the model does not return an image, nothing visual is added.

-   The assistant message still keeps its normal text content.
-   Any returned image is saved locally and attached to that same message.
-   This depends on the chat model actually producing image output, not just on the provider existing in settings.

![Images generated directly inside chat](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEAlauYJv6HVGa4JAWoXSTlxK7bywCNnupM92j)

In-chat image generation is a post-processing step. The assistant reply lands first, then LettuceAI runs the attached image job and replaces the placeholder with the final saved image.
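A simplified sketch of that ordering, using invented helper names (`attachPlaceholder`, `replaceAttachment`) rather than LettuceAI's actual code:

```typescript
// Illustrative only: the function and helper names below are assumptions.
interface AssistantReply {
  text: string;
  imageJob?: () => Promise<string>; // resolves to a saved local asset id
}

declare function renderMessage(text: string): void;
declare function attachPlaceholder(): string;
declare function replaceAttachment(placeholderId: string, assetId: string): void;

async function handleAssistantReply(reply: AssistantReply): Promise<void> {
  // The text reply lands first, exactly as in normal chat.
  renderMessage(reply.text);
  if (!reply.imageJob) return; // no image output: nothing visual is added

  // A placeholder marks the pending image, then the saved asset replaces it.
  const placeholderId = attachPlaceholder();
  const assetId = await reply.imageJob();
  replaceAttachment(placeholderId, assetId);
}
```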

**Not every provider behaves the same way**

Some providers support clean image editing and multiple reference images better than others. A model being visible in settings does not mean all image workflows are equally strong on that provider.

## Avatar generation and avatar editing

Avatar tools are built on top of the normal image-generation pipeline, but the prompt itself comes from a dedicated avatar template. That means avatar generation is structured and reusable rather than just one freeform prompt box.

-   **Generate** writes a fresh avatar prompt from the subject name, subject description, and your request.
-   **Edit** reuses the current avatar image as the source and asks the model to preserve identity while changing only what you asked for.
-   Every accepted result becomes a local asset you can keep, replace, or regenerate later.

![Avatar generation interface](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEKx51RCjUThaiVjQ5lFnW8C6uS7RMmf4pPb9t)

Avatar generation is template-driven: the app renders a dedicated avatar prompt first, then sends it through the selected image model.
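As an illustration of template-driven prompting, here is a minimal sketch; the `{{subject_name}}`-style placeholders are invented for this example, since the real avatar template is internal and protected:

```typescript
// The placeholder names below are invented for illustration; the real
// avatar template is an internal, protected prompt.
interface AvatarRequest {
  subjectName: string;
  subjectDescription: string;
  userRequest: string;
}

function renderAvatarPrompt(template: string, req: AvatarRequest): string {
  return template
    .replaceAll("{{subject_name}}", req.subjectName)
    .replaceAll("{{subject_description}}", req.subjectDescription)
    .replaceAll("{{request}}", req.userRequest);
}

const template =
  "Portrait of {{subject_name}}. {{subject_description}}. Style: {{request}}.";
console.log(
  renderAvatarPrompt(template, {
    subjectName: "Mira",
    subjectDescription: "Silver-haired botanist",
    userRequest: "soft watercolor look",
  })
);
// "Portrait of Mira. Silver-haired botanist. Style: soft watercolor look."
```

Because the structure is fixed, regenerating or editing an avatar keeps the same phrasing for identity and only varies the request part.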

![Generated avatar result](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEGqSZXlvZNz7b9S6JOnaWIErcYV8jTRAoKUtB)

Generated avatars are stored locally and can be reused as profile art or as fallback visual references elsewhere in the app.

If a character or persona has no saved design-reference images, the base avatar can also act as the fallback visual reference for later scene generation.

## Prompting and visual consistency

Consistency is not just about picking a better model. In LettuceAI, it comes from how the app builds prompts and how it reuses visual anchors across different image features.

-   **Avatar templates** give the app a stable way to phrase who the subject is and what should change.
-   **Design reference notes** turn image observations into reusable text that can carry outfit cues, face coverage, materials, silhouette, and non-negotiables into future prompts.
-   **Saved design-reference images** are the strongest identity anchor for later scene generation.
-   **Base avatars** can still act as fallback references when no dedicated design references exist.

That is the main consistency loop: generate or choose a stable avatar, attach a few good design references, draft clean design notes, and then let later scene prompts reuse that same visual identity instead of starting from scratch every time.

**Consistency is a pipeline, not one prompt**

The app gets more reliable when the same character identity shows up in multiple layers: avatar prompt, design-reference images, design-reference text, and scene-generation prompt. Using only one of those layers usually makes results drift faster.

### What helps prompts stay stable

-   Use a small set of clean reference images instead of many inconsistent ones.
-   Keep one concise design description with durable visual facts.
-   Edit existing avatars when refining style, rather than regenerating from zero every time.
-   Use the scene writer for roleplay scenes so the prompt is based on recent context and your saved references, not only on a raw one-line request.

This is also why scene generation is split into writer plus renderer. One model can focus on producing a clean, identity-aware prompt from the chat context, while another model focuses on actually drawing the image.

## Scene generation in roleplay chats

Scene generation is a two-step system. First, a scene-writer model turns recent roleplay context into one polished scene prompt. Then an image-generation model renders that prompt into the final image.

1.  The app looks at the selected message and a short recent context window.
2.  A scene-writer template injects character info, persona info, recent messages, and optional reference images.
3.  The writer returns one final prompt, not an explanation or analysis.
4.  The image model receives that prompt plus any saved character or persona references.

The scene generation mode setting controls when this pipeline runs:

-   In **auto** mode, the app runs the scene image generation immediately.
-   In **ask first** mode, you can review and edit the drafted prompt before the image request is sent.
-   In **manual** mode, no automatic scene image job runs from assistant replies.
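Putting the steps and modes together, a sketch of the control flow might look like this; every function here (`draftScenePrompt`, `renderSceneImage`, `askUserToEdit`) is a hypothetical stand-in:

```typescript
// Sketch of the writer → renderer handoff under the three modes; every
// identifier here is illustrative, not LettuceAI's internal API.
type SceneMode = "auto" | "ask-first" | "manual";

declare function draftScenePrompt(recentContext: string[]): Promise<string>; // scene-writer model
declare function renderSceneImage(prompt: string): Promise<string>; // image model, returns asset id
declare function askUserToEdit(prompt: string): Promise<string | null>; // approval sheet
declare function attachToMessage(assetId: string): void;

async function onAssistantReply(mode: SceneMode, recentContext: string[]) {
  if (mode === "manual") return; // no automatic scene job runs

  // Step 1: the writer turns recent context into one polished prompt.
  let prompt = await draftScenePrompt(recentContext);

  // Step 2: in ask-first mode, the user reviews and can edit or cancel.
  if (mode === "ask-first") {
    const edited = await askUserToEdit(prompt);
    if (edited === null) return;
    prompt = edited;
  }

  // Step 3: the image model renders and the result is attached to the message.
  attachToMessage(await renderSceneImage(prompt));
}
```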

![Scene prompt approval sheet](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiE39hd88QQBzq82mPK4fsNVJMRDrejSU1A9Zan)

Ask-first mode lets you inspect and edit the drafted scene prompt before sending the final image request.

![Scene generation result in chat](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiE80xZv9qFdHBKrgh7YW2EbSeyfUJVmjDRkonu)

After approval or automatic generation, the rendered scene image is attached back onto the assistant message.

**Reference images are identity anchors**

For scene images, saved design-reference images are preferred. If those do not exist, LettuceAI can fall back to the base avatar image so the model still gets a stable face and outfit anchor.

## Design reference drafting

Design reference generation does not create a picture. It creates text notes from a subject avatar and optional reference images so future image prompts can stay visually consistent.

-   The scene-writer model reads the images and any current notes.
-   The returned result is a concise artist-facing description, not a narrative caption.
-   Those notes can then feed later prompt templates and scene generation.
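A hedged sketch of that drafting call, where `callSceneWriter` is an invented stand-in for the multimodal request:

```typescript
// Hedged sketch: the writer receives images plus any current notes and
// returns text. callSceneWriter is a stand-in, not a real API.
interface DesignReferenceInput {
  avatar: Uint8Array;
  references: Uint8Array[]; // a small set of reference images
  currentNotes?: string;
}

declare function callSceneWriter(
  instruction: string,
  images: Uint8Array[]
): Promise<string>;

async function draftDesignNotes(input: DesignReferenceInput): Promise<string> {
  const instruction =
    "Write a concise, artist-facing visual description covering outfit, " +
    "face, materials, silhouette, and non-negotiables. No narrative caption." +
    (input.currentNotes ? ` Current notes: ${input.currentNotes}` : "");
  // The result is reusable text, ready to feed later prompt templates.
  return callSceneWriter(instruction, [input.avatar, ...input.references]);
}
```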

![Design reference drafting UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEQHqKK0g9vKmiI437oLMurRteET6N0XOnDfwW)

Design reference drafting uses the scene-writer model to turn avatar and reference images into reusable visual notes.

## Prompt templates also affect image tools

Image features are not hardcoded to one prompt. Avatar generation, avatar editing, scene prompt generation, and design reference drafting all run through protected internal templates.

-   Avatar templates write image prompts.
-   Scene-generation templates write one scene prompt from chat context.
-   Design-reference templates can inject multimodal image payloads for avatar and reference images.

If you customize prompt templates elsewhere in the docs, remember that some of those changes affect image-related tools too, not just normal text chat.

## Privacy and local storage

Prompts and image inputs go only to the provider you selected for that specific workflow. After generation, LettuceAI saves the resulting image locally so it can be reused as an avatar, a chat attachment, or a design reference.

-   Generated assets are stored locally after the request finishes.
-   Design-reference images and avatars can later be reused as scene references.
-   No separate LettuceAI image-hosting step is needed to keep those results available in your workspace.

## What to configure first

1.  Choose at least one model with image output for avatar or image generation.
2.  Choose a separate scene-writer model if you want automatic scene prompt drafting or design-reference drafting.
3.  Add a few stable design-reference images if you want consistent faces, outfits, and proportions in scene images.
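As a mental model for that checklist, a hypothetical settings shape could look like the following; the field names are illustrative, not LettuceAI's actual configuration keys:

```typescript
// A hypothetical settings shape reflecting the checklist above; field
// names are illustrative, not LettuceAI's actual configuration keys.
interface ImageSettings {
  imageModel: string;        // needs image output (step 1)
  sceneWriterModel?: string; // needs text + image input, text output (step 2)
  sceneGenerationMode: "auto" | "ask-first" | "manual";
}

const settings: ImageSettings = {
  imageModel: "my-image-model",
  sceneWriterModel: "my-vision-model", // omit this and drafting helpers fail
  sceneGenerationMode: "ask-first",
};
```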

**The wrong model mix causes confusing failures**

If scene generation is enabled but no compatible scene-writer model is configured, prompt drafting helpers will fail even if you already have a normal image model set up.

