---
url: https://lettuceai.app/docs/tts
title: "Text-to-Speech — LettuceAI"
description: "Make characters speak aloud with ElevenLabs, Google Gemini TTS, OpenAI-compatible TTS, or local Kokoro voices in LettuceAI."
---

# Text-to-Speech (TTS)

Text-to-Speech lets the assistant read messages out loud using natural-sounding voices. You can choose from multiple providers, create custom voices, and assign them to your characters so their replies can be spoken automatically or on demand.

TTS is completely optional. If you don’t enable it, chats remain text-only.

## Audio Providers

Audio providers are services that generate speech audio. You can add and manage providers from the TTS settings screen:

![Audio providers UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEjKX4qbefClpeDuryxBRsUFaKGm9Jk3d017LH)

LettuceAI currently supports:

-   **ElevenLabs:** premium neural speech synthesis with voice cloning and voice design.
-   **Google Gemini TTS:** neural voice generation with natural-sounding personas.
-   **OpenAI-compatible TTS:** works with OpenAI's audio endpoint and any compatible third-party server. You supply the model ID, base URL, and voice ID.
-   **Kokoro (local):** fully offline TTS that runs on your device. Voices, model weights, and the eSpeak NG phonemizer are downloaded once and then used without network access.

Cloud providers (ElevenLabs, Gemini TTS, OpenAI-compatible) need an API key. Kokoro is local and does not need a key, but it does need its assets installed and (on desktop) eSpeak NG available on your system.
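To make the three OpenAI-compatible fields concrete, here is a minimal sketch of the request they could map to. It follows OpenAI's public `/audio/speech` endpoint shape. How LettuceAI builds its request internally is not documented here, so treat the function name `build_speech_request` and the sample values (`tts-1`, `alloy`, the placeholder key) as illustrative assumptions. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_speech_request(
    base_url: str, api_key: str, model: str, voice: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-style speech endpoint.

    Hypothetical helper for illustration; not part of LettuceAI.
    """
    # The base URL is what you enter in the provider settings, e.g. ".../v1".
    url = base_url.rstrip("/") + "/audio/speech"
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    base_url="https://api.openai.com/v1",
    api_key="sk-placeholder",  # assumption: replace with your real key
    model="tts-1",
    voice="alloy",
    text="Hello from LettuceAI.",
)
print(req.full_url)
```

A third-party server is expected to accept the same three fields at its own base URL; if it does not, the provider is not compatible in the sense used above.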

## My Voices

Voices are reusable presets you create for reading messages aloud. Once a provider is added, you can create and manage your voices:

![My Voices UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEkMFyWwNIHUl6r78TbaLPjui1K5EfoCqge3SR)

A voice includes:

-   provider + model
-   display name
-   a description of how the voice should sound
-   test text you can preview

Custom voices work with ElevenLabs, Gemini TTS, OpenAI-compatible TTS, and Kokoro. Each provider exposes the controls that make sense for it.

## Creating a Voice

Tap **Create Voice** to design your own TTS voice. You can describe tone, style, and personality, and preview how it sounds.

![Create Voice UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEm0ZsDMLsaiE1GPrgcTIvH6l03y9FhketVp2J)

Depending on the provider, you can:

-   clone or style voices (ElevenLabs)
-   describe vocal personality (Gemini TTS)
-   enter a model ID and voice ID for OpenAI-compatible endpoints
-   pick or blend installed Kokoro voices in a dedicated editor (Kokoro)

LettuceAI also recognizes voices created in the ElevenLabs dashboard and lets you use them.

## Provider Voices

Some providers also include pre-made voices you can preview and use:

![Provider voices UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEnBsGTIoaGgJtxP6TwRVdlImiLB7eNsWhfF0r)

These are great if you want to get started quickly.

## Assigning a Voice to a Character

You can assign a voice to a character so their replies are spoken aloud. This can be done:

-   while creating the character
-   later in **Edit Character → Voice**

![Character voice selection UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEgtsGWsz1uLhxGKCwTQRqZ4bmt2ikFOrvslNd)

If no voice is assigned, the character remains text-only.

Voice assignment is optional and per-character. You can mix silent and voiced characters in the same chat.

## Playing Voice Audio

When a character has a voice, you can either play messages manually or enable automatic playback.

### Per-Message Playback

Each message with TTS support shows a speaker icon. Tap it to play the message whenever you like.

![Per-message playback button UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEI3u4FPWH5ZchmM4LX9xYKk8DP0ard7TuUASv)

Use it to:

-   replay messages
-   listen selectively

Manual playback works even when Autoplay is disabled.

### Autoplay Voice

If you want replies to be spoken automatically, enable **Autoplay voice** in the character settings.

![Autoplay toggle UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEArTzpU6HVGa4JAWoXSTlxK7bywCNnupM92jq)

With Autoplay enabled:

-   new replies play automatically
-   you can still replay messages anytime
-   the setting applies per character

If no voice is assigned, Autoplay has no effect.
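The playback rules in this section can be summarized in a small sketch. The class and function names below are hypothetical and only model the behavior described above; they are not LettuceAI's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharacterVoiceSettings:
    """Hypothetical per-character settings, for illustration only."""

    voice_id: Optional[str] = None  # None means the character is text-only
    autoplay: bool = False


def can_play_manually(settings: CharacterVoiceSettings) -> bool:
    # Manual playback only needs an assigned voice; Autoplay is irrelevant.
    return settings.voice_id is not None


def should_autoplay(settings: CharacterVoiceSettings) -> bool:
    # Autoplay has no effect unless a voice is assigned.
    return settings.voice_id is not None and settings.autoplay


silent = CharacterVoiceSettings(autoplay=True)
manual_only = CharacterVoiceSettings(voice_id="narrator")
automatic = CharacterVoiceSettings(voice_id="narrator", autoplay=True)
```

Because both settings live on the character, a single chat can contain a silent character, a manually played one, and an automatically played one at the same time.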

## Kokoro Studio

Kokoro Studio is the management hub for the local Kokoro provider. You reach it from a Kokoro audio provider in TTS settings. It handles everything Kokoro needs to run offline:

-   **Model variant**: pick the Kokoro model variant you want to install. Switching variants triggers a new install for the chosen variant and reuses the existing voice library.
-   **Voice catalog**: browse the available voice list, filter by all or installed, search by name, and install or uninstall voices one at a time or in bulk.
-   **Install queue**: model and voice downloads run through a queue. Active and failed downloads show inline so you can retry or cancel.
-   **Storage stats**: see how much disk space Kokoro assets are using and uninstall the model entirely if you want to reclaim space.
-   **Try it**: a built-in preview field lets you type a phrase and hear any installed voice with the current model variant before you assign it.
-   **Saved blends**: blends you create appear alongside installed voices so you can preview, edit, or delete them in one place.

Kokoro assets are downloaded once and stored locally. On desktop, you also need eSpeak NG installed on your system for phonemization.

## Voice Blending

A Kokoro blend is a custom voice built by mixing two or more installed Kokoro voices with weights. Blends are saved as regular user voices and can be assigned to characters like any other voice.

You configure a blend in the Kokoro blend editor:

-   **Add voices**: pick from the installed voice list. Each voice you add starts with a weight of 50.
-   **Weights**: each voice has a 0 to 100 slider. Higher weights pull the blend toward that voice. Voices with weight 0 are skipped on save.
-   **Speed**: adjust playback speed between 0.5x and 2.0x.
-   **Test bench**: enter preview text and play the current blend before saving. You can stop and replay as you tweak weights.
-   **Name**: each blend has a display name. It is stored and listed next to your other voices.

Blends need at least one installed voice with a non-zero weight. If nothing is installed yet, open Kokoro Studio first and install a voice.
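One way to picture the weight rules is as a set of proportions. The sketch below is an assumption about how 0 to 100 slider values could be turned into mixing proportions; the docs do not specify the exact formula Kokoro blending uses, and the function names and voice names are hypothetical.

```python
def normalize_blend(weights: dict[str, int]) -> dict[str, float]:
    """Turn 0-100 slider weights into proportions that sum to 1.

    Assumption for illustration: proportional normalization.
    """
    for name, weight in weights.items():
        if not 0 <= weight <= 100:
            raise ValueError(f"weight for {name!r} must be between 0 and 100")

    # Voices with weight 0 are skipped, matching the save behavior above.
    active = {name: w for name, w in weights.items() if w > 0}
    if not active:
        raise ValueError("a blend needs at least one voice with a non-zero weight")

    total = sum(active.values())
    return {name: w / total for name, w in active.items()}


def clamp_speed(speed: float) -> float:
    """Keep playback speed inside the documented 0.5x to 2.0x range."""
    return min(2.0, max(0.5, speed))


blend = normalize_blend({"voice_a": 75, "voice_b": 25, "voice_c": 0})
print(blend)  # {'voice_a': 0.75, 'voice_b': 0.25}
```

Under this reading, two voices left at the default weight of 50 contribute equally, and raising one slider shifts the mix toward that voice without changing the other slider.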

## Audio Cache

Generated speech is cached locally so repeated lines don’t need to be regenerated. This improves performance and reduces cost when using paid providers.

![Audio cache UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEQXu4TEg9vKmiI437oLMurRteET6N0XOnDfwW)

You can clear cached audio anytime.
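As a rough illustration of why repeated lines do not need to be regenerated, a cache can key each audio clip on everything that affects how it sounds. This is a generic sketch, not LettuceAI's actual cache format; the `cache_key` function and its fields are assumptions.

```python
import hashlib
import json


def cache_key(provider: str, model: str, voice_id: str, text: str, speed: float = 1.0) -> str:
    """Derive a stable key from the inputs that determine the generated audio.

    Hypothetical helper for illustration only.
    """
    payload = json.dumps(
        {
            "provider": provider,
            "model": model,
            "voice_id": voice_id,
            "text": text,
            "speed": speed,
        },
        sort_keys=True,  # stable field order so equal inputs hash equally
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = cache_key("kokoro", "example-model", "voice_a", "Good morning!")
repeat = cache_key("kokoro", "example-model", "voice_a", "Good morning!")
other = cache_key("kokoro", "example-model", "voice_a", "Good evening!")
```

With a scheme like this, the same line spoken by the same voice maps to the same key and can be served from disk, while changing the text, voice, or speed produces a new key and a new generation request.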
