---
url: https://lettuceai.app/docs/tts
title: "Text-to-Speech — LettuceAI"
description: "Make characters speak aloud with ElevenLabs, Google Gemini TTS, OpenAI-compatible TTS, Fish, or local Kokoro voices in LettuceAI."
---


# Text-to-Speech (TTS)

Text-to-Speech lets the assistant read messages out loud using natural-sounding voices. You can choose from multiple providers, create custom voices, and assign them to your characters so their replies can be spoken automatically or on demand.

TTS is completely optional. If you don’t enable it, chats remain text-only.

## Audio Providers

Audio providers are services that generate speech audio. You can add and manage providers from the TTS settings screen:

![Audio providers UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEjKX4qbefClpeDuryxBRsUFaKGm9Jk3d017LH)

LettuceAI currently supports:

-   **ElevenLabs:** premium neural speech synthesis with voice cloning and voice design.
-   **Google Gemini TTS:** neural voice generation with natural-sounding personas.
-   **OpenAI-compatible TTS:** works with OpenAI's audio endpoint and any compatible third-party server. You supply the model ID, base URL, and voice ID.
-   **Fish (cloud):** Fish Audio's hosted voice synthesis. It uses an API key and lets you pick from the voice models in your Fish account.
-   **Fish (local):** connects to a Fish Speech server you run yourself. You point LettuceAI at the server address, and an API key is optional depending on how the server is configured.
-   **Kokoro (local):** fully offline TTS that runs on your device. Voices, model weights, and the eSpeak NG phonemizer are downloaded once and then used without network access.

Cloud providers (ElevenLabs, Gemini TTS, OpenAI-compatible, Fish cloud) need an API key. Kokoro is local and does not need a key, but it does need its assets installed and (on desktop) eSpeak NG available on your system. The local Fish option talks to a server you host, so it does not send audio to a third party.
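
To make the three OpenAI-compatible settings concrete, here is a minimal sketch of how a base URL, model ID, and voice ID map onto a request to OpenAI's `/audio/speech` endpoint. The function name `build_speech_request` and the placeholder API key are illustrative assumptions; how LettuceAI itself issues the request is not documented here. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_speech_request(base_url, api_key, model, voice, text):
    """Build (but do not send) a POST request for an OpenAI-style speech endpoint.

    Hypothetical helper for illustration; not LettuceAI's own code.
    """
    url = base_url.rstrip("/") + "/audio/speech"
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    base_url="https://api.openai.com/v1",  # or the address of a compatible server
    api_key="sk-placeholder",              # placeholder, not a real key
    model="tts-1",                         # model ID
    voice="alloy",                         # voice ID
    text="Hello from LettuceAI.",
)
# Sending it with urllib.request.urlopen(req) would return the audio bytes.
```

For a third-party server, only `base_url` and the model and voice IDs should need to change, provided the server follows the same request shape.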

## My Voices

Voices are reusable presets you create for reading messages aloud. Once a provider is added, you can create and manage your voices:

![My Voices UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEkMFyWwNIHUl6r78TbaLPjui1K5EfoCqge3SR)

A voice includes:

-   provider + model
-   display name
-   a description of how the voice should sound
-   test text you can preview

Custom voices work with ElevenLabs, Gemini TTS, OpenAI-compatible TTS, and Kokoro. Each provider exposes the controls that make sense for it.

## Creating a Voice

Tap **Create Voice** to design your own TTS voice. You can describe tone, style, and personality, and preview how it sounds.

![Create Voice UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEm0ZsDMLsaiE1GPrgcTIvH6l03y9FhketVp2J)

Depending on the provider, you can:

-   clone or style voices (ElevenLabs)
-   describe vocal personality (Gemini TTS)
-   enter a model ID and voice ID for OpenAI-compatible endpoints
-   pick or blend installed Kokoro voices in a dedicated editor (Kokoro)

LettuceAI also recognizes voices created in the ElevenLabs dashboard and lets you use them.

## Provider Voices

Some providers also include pre-made voices you can preview and use:

![Provider voices UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEnBsGTIoaGgJtxP6TwRVdlImiLB7eNsWhfF0r)

These are great if you want to get started quickly.

## Assigning a Voice to a Character

You can assign a voice to a character so their replies are spoken aloud. This can be done:

-   while creating the character
-   later in **Edit Character → Voice**

![Character voice selection UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEgtsGWsz1uLhxGKCwTQRqZ4bmt2ikFOrvslNd)

If no voice is assigned, the character remains text-only.

Voice assignment is optional and per-character. You can mix silent and voiced characters in the same chat.

## Playing Voice Audio

When a character has a voice, you can either play messages manually or enable automatic playback.

### Per-Message Playback

Each message with TTS support shows a speaker icon. Tap it to play the message whenever you like.

![Per-message playback button UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEI3u4FPWH5ZchmM4LX9xYKk8DP0ard7TuUASv)

This lets you:

-   replay messages
-   listen selectively

Manual playback works even when Autoplay is disabled.

### Autoplay Voice

If you want replies to be spoken automatically, enable **Autoplay voice** in the character settings.

![Autoplay toggle UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEArTzpU6HVGa4JAWoXSTlxK7bywCNnupM92jq)

With Autoplay enabled:

-   new replies play automatically
-   you can still replay messages anytime
-   the setting applies per character

If no voice is assigned, Autoplay has no effect.
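
The playback rules in this section can be summarized in a few lines. This is only a sketch of the described behavior: the `Character` class and its field names are hypothetical and do not reflect LettuceAI's internal data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Character:
    """Hypothetical per-character voice settings."""
    name: str
    voice_id: Optional[str] = None   # None means no voice is assigned
    autoplay_voice: bool = False     # the "Autoplay voice" toggle


def can_play_manually(character: Character) -> bool:
    # Manual playback only needs an assigned voice; Autoplay may be off.
    return character.voice_id is not None


def should_autoplay(character: Character) -> bool:
    # Autoplay has no effect unless a voice is assigned.
    return character.voice_id is not None and character.autoplay_voice


silent = Character("Narrator", autoplay_voice=True)
voiced = Character("Guide", voice_id="my-voice", autoplay_voice=False)
print(should_autoplay(silent), can_play_manually(voiced))
```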

## Setting up Kokoro (local voices)

Kokoro runs entirely on your device, so you need to download its model once before you can use it. Setup happens in a guided menu in the same place you add the provider, so you do not need to hunt for files manually.

-   Open **Settings → Providers** and switch to the **Audio** tab, then add or open the **Kokoro (Local)** provider.
-   A setup menu appears where you pick a **model variant**. The options trade size for quality: a smaller compressed build, a balanced build (recommended for most devices), and a full-size build.
-   You can also tick a **starter pack** to install a small set of ready-to-use voices alongside the model, so you have something to speak with immediately.
-   Tap download and let the model and any selected voices install. Once finished, Kokoro is ready to use offline.

On desktop, Kokoro also needs the eSpeak NG phonemizer installed on your system. On mobile this is bundled for you.
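
If you are unsure whether eSpeak NG is installed on a desktop system, a quick check is to look for its command-line executable, `espeak-ng`, on your `PATH`. The sketch below does only that. It is an assumption that this matches how LettuceAI locates eSpeak NG; the app may look elsewhere, so treat a negative result as a hint rather than a verdict.

```python
import shutil


def espeak_ng_path():
    """Return the full path to the espeak-ng executable, or None if it is not on PATH."""
    return shutil.which("espeak-ng")


path = espeak_ng_path()
if path:
    print(f"eSpeak NG found at {path}")
else:
    print("espeak-ng was not found on PATH")
```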

## Kokoro Studio

Once the model is installed, Kokoro Studio is the management hub for your local voices. You reach it from the **Voices** screen by opening your Kokoro provider. It handles everything Kokoro needs to run offline:

-   **Model variant**: pick the Kokoro model variant you want to install. Switching variants triggers a new install for the chosen variant and reuses the existing voice library.
-   **Voice catalog**: browse the available voice list, filter by all or installed, search by name, and install or uninstall voices one at a time or in bulk.
-   **Install queue**: model and voice downloads run through a queue. Active and failed downloads show inline so you can retry or cancel.
-   **Storage stats**: see how much disk space Kokoro assets are using and uninstall the model entirely if you want to reclaim space.
-   **Try it**: a built-in preview field lets you type a phrase and hear any installed voice with the current model variant before you assign it.
-   **Saved blends**: blends you create appear alongside installed voices so you can preview, edit, or delete them in one place.

Kokoro assets are downloaded once and stored locally. On desktop, you also need eSpeak NG installed on your system for phonemization.

## Voice Blending

A Kokoro blend is a custom voice built by mixing two or more installed Kokoro voices with weights. Blends are saved as regular user voices and can be assigned to characters like any other voice.

You configure a blend in the Kokoro blend editor:

-   **Add voices**: pick from the installed voice list. Each voice you add starts with a weight of 50.
-   **Weights**: each voice has a 0 to 100 slider. Higher weights pull the blend toward that voice. Voices with weight 0 are skipped on save.
-   **Speed**: adjust playback speed between 0.5x and 2.0x.
-   **Test bench**: enter preview text and play the current blend before saving. You can stop and replay as you tweak weights.
-   **Name**: each blend has a display name. It is stored and listed next to your other voices.

Blends need at least one installed voice with a non-zero weight. If nothing is installed yet, open Kokoro Studio first and install a voice.
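
One way to picture the weight rules is as a normalization step: zero-weight voices are dropped and the remaining weights are scaled into proportions. The sketch below assumes that interpretation. This page does not say how Kokoro actually combines the voices, and the function names and voice names are hypothetical.

```python
from typing import Dict


def normalize_blend(weights: Dict[str, float]) -> Dict[str, float]:
    """Drop zero-weight voices and scale the rest so the proportions sum to 1.

    Assumes slider values in the 0 to 100 range described above.
    """
    for name, weight in weights.items():
        if not 0 <= weight <= 100:
            raise ValueError(f"weight for {name!r} must be between 0 and 100")
    active = {name: w for name, w in weights.items() if w > 0}
    if not active:
        raise ValueError("a blend needs at least one voice with a non-zero weight")
    total = sum(active.values())
    return {name: w / total for name, w in active.items()}


def clamp_speed(speed: float) -> float:
    """Keep playback speed inside the 0.5x to 2.0x range."""
    return min(2.0, max(0.5, speed))


blend = normalize_blend({"voice_a": 50, "voice_b": 50, "voice_c": 0})
print(blend)  # {'voice_a': 0.5, 'voice_b': 0.5}
```

Under this reading, two voices at weight 50 contribute equally, which matches the default weight each newly added voice receives.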

## Audio Cache

Generated speech is cached locally so repeated lines don’t need to be regenerated. This improves performance and reduces cost when using paid providers.

![Audio cache UI](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiEQXu4TEg9vKmiI437oLMurRteET6N0XOnDfwW)

You can clear cached audio anytime.
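
As a rough illustration of why caching saves regeneration, the sketch below keys each clip on the provider, model, voice, and text, and only calls the generator on a cache miss. The key fields, the `AudioCache` class, and the in-memory storage are all assumptions made for illustration; LettuceAI's actual cache format is not described here.

```python
import hashlib
import json


def cache_key(provider, model, voice, text):
    """Derive a stable key from everything assumed to affect the generated audio."""
    fields = json.dumps([provider, model, voice, text], ensure_ascii=False)
    return hashlib.sha256(fields.encode("utf-8")).hexdigest()


class AudioCache:
    """Hypothetical in-memory cache of generated clips."""

    def __init__(self):
        self._clips = {}

    def get_or_generate(self, provider, model, voice, text, generate):
        key = cache_key(provider, model, voice, text)
        if key not in self._clips:
            # Cache miss: this is the only place the (possibly paid) provider is called.
            self._clips[key] = generate(text)
        return self._clips[key]

    def clear(self):
        """Forget every cached clip, as clearing the cache would."""
        self._clips.clear()
```

With a key like this, replaying the same line with the same voice reuses the stored clip, while changing the voice or the text produces a new one.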

## Audio Library

The Audio Library is one place to find every audio clip in the app. Open it from **Settings → Library** and switch to the **Audio** tab. It gathers two kinds of audio:

-   **Generated**: speech created by your TTS voices.
-   **Uploaded**: audio files you attached to a chat.

You can filter between all audio, generated only, or uploaded only. Each clip shows as a player card with a play button, a progress bar you can scrub, and the file name, size, and date. From a card you can:

-   **Open in chat**: jump back to the message the clip came from (available for clips tied to a conversation).
-   **Download**: save the audio file to your device.
-   **Delete**: remove the clip after a confirmation.
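
The filter and the **Open in chat** condition can be modeled roughly as follows. The `Clip` class, its fields, and the sample file names are hypothetical and serve only to illustrate the behavior described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    """Hypothetical record for one entry in the Audio Library."""
    file_name: str
    kind: str                      # "generated" or "uploaded"
    chat_id: Optional[str] = None  # set only for clips tied to a conversation


def filter_clips(clips: List[Clip], mode: str = "all") -> List[Clip]:
    """Return all clips, or only the generated or uploaded ones."""
    if mode not in ("all", "generated", "uploaded"):
        raise ValueError(f"unknown filter: {mode!r}")
    if mode == "all":
        return list(clips)
    return [clip for clip in clips if clip.kind == mode]


def can_open_in_chat(clip: Clip) -> bool:
    # "Open in chat" is only offered when the clip belongs to a conversation.
    return clip.chat_id is not None


library = [
    Clip("reply.mp3", "generated", chat_id="chat-1"),
    Clip("memo.wav", "uploaded", chat_id="chat-2"),
    Clip("preview.mp3", "generated"),
]
print([clip.file_name for clip in filter_clips(library, "generated")])
```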

**Reset removes downloaded models**

Resetting the app's data clears downloaded voice and speech models, including Kokoro and Whisper, along with cached and library audio. You can re-download anything you need afterward.

