---
url: https://lettuceai.app/docs/images
title: "Image Generation — LettuceAI"
description: "Generate images in LettuceAI with scene generation, avatar creation, and design references. Connect ComfyUI, Diffusers, AUTOMATIC1111, OpenAI, Gemini, Stability, and more."
---

Menu 

# Image Generation

LettuceAI does not treat every image feature as one generic mode. There is a normal image-generation path, a separate scene-writer path for roleplay images, and a design-reference writer for turning reference images into reusable visual notes.

Image generation now runs through a provider

LettuceAI no longer ships an on-device image engine. Every image is produced by a provider you connect, whether that provider runs on your own machine (like ComfyUI or a Diffusers server) or is an online service (like OpenAI or Google Gemini). If no image-capable model is set up, nothing will generate.

## What most users need to know

You do not need to understand the full image stack to use this feature. For most people, it comes down to three simple actions:

-   generate a new image from a prompt
-   edit an existing image
-   let the app help draft a scene prompt before generating

Simple mental model

Some models can **create images**. Some models can **look at images and describe them**. Some can do both. Most of the advanced wording on this page is just explaining that split.

Three distinct image systems

Avatar generation and normal image jobs use image-output models. Scene prompt drafting and design-reference drafting use a different kind of model: one that can read images and output text.

## What the app actually supports

The image stack breaks down into three user-facing workflows:

If you only want basic image generation, focus on the first one. The scene writer and design reference tools are optional advanced helpers.

-   **Image generation**: create or edit images directly with an image-capable model.
-   **Scene generation**: draft a scene prompt from recent roleplay context, then render that prompt into an image.
-   **Design reference drafting**: read an avatar and a small set of reference images, then write a clean visual description for future use.

That is why the settings page separates regular image models from the **Scene Writer** model. They solve different jobs.

![Image generation settings page](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEov95OMxgeUAy1BckbI0oS8FO2ihTxv7GqEl9)

In Settings, the Image page splits avatar and scene image models from the separate scene-writer model, and also controls how scene prompts are handled.

## Capabilities matter more than provider names

The app chooses features from model scopes, not from marketing labels. In practice, you want to look at what a model can accept as input and what it can return as output.

![Model capability scopes for image features](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiE7iLisBS3XDOMPNhgrUjbZp9fWTYnCuB0v4wx)

The capability list is the real contract. Image generation needs image output. The scene writer needs text plus image input with text output.

| Feature | What the model must support |
| --- | --- |
| Avatar generation | Image output |
| Avatar editing | Image output, and ideally image input too for edit-style models |
| Scene image rendering | Image output |
| Scene writer / design reference writer | Text input + image input + text output |

A model that can _see_ images is not automatically a model that can _generate_ them. Those are separate capabilities.

## Regular image generation

The normal image pipeline is the most direct one. The app sends your prompt to the selected image model, saves the returned image locally, and then reuses it in the UI like any other media asset.

-   Generated images are saved locally after the provider returns them.
-   The returned result can be a hosted URL or raw image data.
-   The app records width, height, mime type, and the stored asset id so the image behaves like a normal attachment.

Behind the scenes, LettuceAI supports several provider adapters, so the exact payload shape varies by provider even though the UI flow stays the same.

### Supported image generators

The app currently includes built-in support for several image backends. That means you can use different providers without rewriting the rest of your workflow. Some run locally on your own hardware, and some are online services.

-   **ComfyUI** for local node-graph workflows you export from ComfyUI yourself.
-   **Diffusers** for a local Diffusers-style image server.
-   **AUTOMATIC1111** for local Stable Diffusion style txt2img and img2img setups.
-   **OpenAI** image generation and edit-style requests.
-   **OpenRouter** models that can return image output.
-   **Google Gemini** image-capable generation flows.
-   **Stability** text-to-image and image-to-image generation.
-   **xAI** image generation and edit flows.
-   **NanoGPT** OpenAI-style image generation requests.
-   **Pollinations** for simple hosted image generation.

The UI stays mostly the same across these providers, but the provider still matters for edit quality, reference-image handling, returned formats, and how reliable multimodal prompting feels in practice.

## Connecting a local image backend

Local backends like ComfyUI and Diffusers are added the same way as any other provider, under **Settings → Providers**. You pick the provider, then give it the **Base URL** where it is running (for example a ComfyUI or Diffusers server on your own machine or local network). An API key is only needed if your endpoint requires one.

### ComfyUI workflows

ComfyUI is driven by workflows rather than a single fixed request. In the ComfyUI provider editor you paste an **API-format workflow exported from ComfyUI**. You can paste a text-to-image workflow and, optionally, a separate image-to-image workflow that is used whenever reference images are present.

LettuceAI fills in the parts of the workflow that change per request by replacing placeholder tokens. The available tokens include the prompt and negative prompt, size, steps, CFG, seed, sampler, checkpoint, denoise, and image count, plus ordered reference-image tokens (the first reference image, the second, and so on). This is how the same saved workflow can be reused for every generation.

The image workflow is for reference-based jobs

Your text-to-image workflow handles plain prompts. The optional image-to-image workflow is what runs when one or more reference images are attached, so you can wire reference images into the nodes that expect them.

### Diffusers

A Diffusers endpoint is simpler to connect: set its Base URL and it works like the other Stable Diffusion style backends. Per-model details such as size, steps, CFG, sampler, seed, and denoise strength come from that model's settings, described further down.

### Self-signed and local endpoints

Local and self-hosted endpoints often do not have a normal public certificate. LettuceAI applies your trusted certificates to image requests, and for self-hosted providers you can turn on **Allow Invalid TLS** in the provider editor to skip certificate validation for that one endpoint.

Only relax TLS for endpoints you control

The Allow Invalid TLS option exists for your own local or private machines. Do not enable it for an endpoint you do not personally trust.

## Reference images and ordering

Several image features can send more than one reference image, and the order is meaningful. LettuceAI passes the references as an ordered set, so backends that care about position (like a ComfyUI workflow with separate reference nodes) receive the first reference, the second reference, and so on in a predictable sequence.

-   Character design references come first, followed by any persona references and an optional chat background image.
-   The first reference image is treated as the primary one for edit and image-to-image style requests.
-   Ordering lets a workflow or model tell, for example, the character reference apart from the background reference.

## Images generated inside chat

For normal chat image output, the behavior is simpler than that: if the selected chat model returns an image, LettuceAI shows that image as an attachment on the assistant message. If the model does not return an image, then nothing visual is added.

-   The assistant message still keeps its normal text content.
-   Any returned image is saved locally and attached to that same message.
-   This depends on the chat model actually producing image output, not just on the provider existing in settings.

![Images generated directly inside chat](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEAlauYJv6HVGa4JAWoXSTlxK7bywCNnupM92j)

In-chat image generation is a post-processing step. The assistant reply lands first, then LettuceAI runs the attached image job and replaces the placeholder with the final saved image.

Not every provider behaves the same way

Some providers support clean image editing and multiple reference images better than others. A model being visible in settings does not mean all image workflows are equally strong on that provider.

## Avatar generation and avatar editing

Avatar tools are built on top of the normal image-generation pipeline, but the prompt itself comes from a dedicated avatar template. That means avatar generation is structured and reusable rather than just one freeform prompt box.

-   **Generate** writes a fresh avatar prompt from the subject name, subject description, and your request.
-   **Edit** reuses the current avatar image as the source and asks the model to preserve identity while changing only what you asked for.
-   Every accepted result becomes a local asset you can keep, replace, or regenerate later.

![Avatar generation interface](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEKx51RCjUThaiVjQ5lFnW8C6uS7RMmf4pPb9t)

Avatar generation is template-driven: the app renders a dedicated avatar prompt first, then sends it through the selected image model.

![Generated avatar result](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEGqSZXlvZNz7b9S6JOnaWIErcYV8jTRAoKUtB)

Generated avatars are stored locally and can be reused as profile art or as fallback visual references elsewhere in the app.

If a character or persona has no saved design-reference images, the base avatar can also act as the fallback visual reference for later scene generation.

## Per-model image settings

Image models keep their own settings inside the model editor, so each model can be tuned without affecting the others. Open a model from the Models page to find its image options.

-   **Negative prompt**: things you never want in the image, applied to every request for that model.
-   **Extra prompt**: text that is always added before your prompt. This is the right place for quality tags and style boilerplate so you do not have to retype them.
-   **Prompt writer instructions**: format guidance for the scene writer when it composes prompts for this model. For example, you can tell it to write comma-separated tags instead of full sentences.
-   **Generation controls** such as size, steps, CFG, sampler, seed, and denoise strength for Stable Diffusion style backends.

Extra prompt vs writer instructions

Extra prompt is glued onto the final image prompt. Prompt writer instructions instead change how the scene writer phrases the prompt in the first place. One shapes the words sent to the image model, the other shapes how those words get written.

## Prompting and visual consistency

Consistency is not just about picking a better model. In LettuceAI, it comes from how the app builds prompts and how it reuses visual anchors across different image features.

-   **Avatar templates** give the app a stable way to phrase who the subject is and what should change.
-   **Design reference notes** turn image observations into reusable text that can carry outfit cues, face coverage, materials, silhouette, and non-negotiables into future prompts.
-   **Saved design-reference images** are the strongest identity anchor for later scene generation.
-   **Base avatars** can still act as fallback references when no dedicated design references exist.

That is the main consistency loop: generate or choose a stable avatar, attach a few good design references, draft clean design notes, and then let later scene prompts reuse that same visual identity instead of starting from scratch every time.

Consistency is a pipeline, not one prompt

The app gets more reliable when the same character identity shows up in multiple layers: avatar prompt, design-reference images, design-reference text, and scene-generation prompt. Using only one of those layers usually makes results drift faster.

### What helps prompts stay stable

-   Use a small set of clean reference images instead of many inconsistent ones.
-   Keep one concise design description with durable visual facts.
-   Edit existing avatars when refining style, rather than regenerating from zero every time.
-   Use the scene writer for roleplay scenes so the prompt is based on recent context and your saved references, not only on a raw one-line request.

This is also why scene generation is split into writer plus renderer. One model can focus on producing a clean, identity-aware prompt from the chat context, while another model focuses on actually drawing the image.

## Scene generation in roleplay chats

Scene generation is a two-step system. First, a scene-writer model turns recent roleplay context into one polished scene prompt. Then an image-generation model renders that prompt into the final image.

![How a scene image is built, approved, and rendered](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEj1NfOZyefClpeDuryxBRsUFaKGm9Jk3d017L)

A prompt writer turns the moment in the chat (plus your design references and extra prompt) into a scene prompt. Depending on your approval setting it generates automatically, after your review, or only when you ask, then a self-hosted or cloud backend renders the image into the chat.

1.  The app looks at the selected message and a short recent context window.
2.  A scene-writer template injects character info, persona info, recent messages, and optional reference images.
3.  The writer returns one final prompt, not an explanation or analysis.
4.  The image model receives that prompt plus any saved character or persona references, sent in order.

-   In **Automatic** mode, the app generates the scene image as soon as the model provides a scene prompt.
-   In **Ask first** mode, the detected scene prompt is shown so you can review and edit it before any image is generated.
-   In **Manual** mode, scene prompts in model responses are ignored and images only generate from actions you trigger yourself.

![Scene prompt approval sheet](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiE39hd88QQBzq82mPK4fsNVJMRDrejSU1A9Zan)

Ask-first mode lets you inspect and edit the drafted scene prompt before sending the final image request.

![Scene generation result in chat](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiE80xZv9qFdHBKrgh7YW2EbSeyfUJVmjDRkonu)

After approval or automatic generation, the rendered scene image is attached back onto the assistant message.

Reference images are identity anchors

For scene images, saved design-reference images are preferred. If those do not exist, LettuceAI can fall back to the base avatar image so the model still gets a stable face and outfit anchor.

## Design references

Design references live on each character and persona, in the editor. A design reference is a small set of clear reference images plus one canonical visual description. Together they tell scene generation what the same face, build, outfit cues, and style should keep looking like.

You can write the visual description yourself, or use **design reference drafting** to have the app write it for you. Drafting does not create a picture. It reads the subject avatar and any reference images and produces a concise, artist-facing description, not a narrative caption.

-   The scene-writer model reads the images and any current notes.
-   The returned result is a clean visual note covering things like face, hair, build, outfit cues, accessories, and art direction.
-   Those notes and images then feed later prompt templates and scene generation.

![Design reference drafting UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEQHqKK0g9vKmiI437oLMurRteET6N0XOnDfwW)

Design reference drafting uses the scene-writer model to turn avatar and reference images into reusable visual notes.

## Prompt templates also affect image tools

Image features are not hardcoded to one prompt. Avatar generation, avatar editing, scene prompt generation, and design reference drafting all run through protected internal templates.

-   Avatar templates write image prompts.
-   Scene-generation templates write one scene prompt from chat context, and can include your per-model prompt writer instructions.
-   Design-reference templates can inject multimodal image payloads for avatar and reference images.

If you customize prompt templates elsewhere in the docs, remember that some of those changes affect image-related tools too, not just normal text chat.

## Privacy and local storage

Prompts and image inputs go only to the provider you selected for that specific workflow. If that provider runs on your own machine, the data never leaves your network. After generation, LettuceAI saves the resulting image locally so it can be reused as an avatar, a chat attachment, or a design reference.

-   Generated assets are stored locally after the request finishes.
-   Design-reference images and avatars can later be reused as scene references.
-   The app does not need a separate LettuceAI image hosting step to keep those results available in your workspace.

## What to configure first

1.  Connect at least one provider with image output and choose it for avatar or scene image generation.
2.  Choose a separate scene-writer model if you want automatic scene prompt drafting or design-reference drafting.
3.  Add a few stable design-reference images if you want consistent faces, outfits, and proportions in scene images.

The wrong model mix causes confusing failures

If scene generation is enabled but no compatible scene-writer model is configured, prompt drafting helpers will fail even if you already have a normal image model set up.

[

PreviousHost API

](/docs/host-api)[

NextAccessibility

](/docs/accessibility)
