---
url: https://lettuceai.app/docs/models
title: "Models — LettuceAI"
description: "Learn how AI models work and how to configure temperature, top-p, top-k, and other generation parameters in LettuceAI."
---


# Models

Models are the AI systems that generate the responses you see when you chat. When you type a message, a model reads the request, looks at the context it was given, and writes the reply.

These are sometimes called **LLMs**. In LettuceAI we just call them **models**.

## What is a model?

A model is the part that actually produces text. When you send a message, your provider runs the model and returns its answer to LettuceAI.

Different models behave differently. Some are better at creative writing, some at reasoning, some are faster, and some are cheaper.

You do not need to understand the full math. In practice, choosing a model mostly means choosing the quality, tone, speed, and price you want.

## What is an LLM?

LLM stands for **Large Language Model**. That is just the technical name for this kind of text-generating AI.

## Why choose different models?

Different models can be better for different tasks. For example:

-   some are faster
-   some are more detailed
-   some are cheaper to use
-   some have a unique tone or style

There is no single best model for everyone. You can switch depending on what kind of conversation you want.

## Do I need to configure anything?

Not necessarily. If you are unsure which model to use, the default options work fine for most conversations. You can change models later without losing chats or memory.

## Model Parameters

Some models expose extra settings that change how they generate replies. Most users do not need to touch them, but it helps to know what they mean.

### How generation works, simplified

The model does not write the whole answer in one go. It generates text one step at a time.

At each step, it looks at many possible next tokens (words or pieces of words) and asks "which one is most likely here?"

Example: if the text so far is `The cat sat on the`, the model will usually think words like `mat`, `floor`, or `chair` make more sense than something random like `galaxy`.

Then it has to pick one. Temperature, Top-K, and Top-P change how safe or adventurous that pick can be.

**Important:** These settings do not make the model smarter. They only change how it chooses between possible next words.
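The step-by-step picking described above can be sketched in a few lines of Python. This is only an illustration: the scores are made-up numbers, and `softmax` is a helper written for this example, not part of LettuceAI.

```python
import math
import random

# Hypothetical raw scores for the next word after "The cat sat on the".
# The numbers are invented for illustration only.
scores = {"mat": 4.0, "floor": 3.2, "chair": 2.9, "galaxy": -1.0}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

probs = softmax(scores)

# Pick one next word at random, weighted by its probability.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
next_word = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

for word, p in probs.items():
    print(f"{word:8s} {p:.3f}")
print("picked:", next_word)
```

Likely words such as `mat` get most of the probability, while `galaxy` is possible but rare. A real model repeats this step for every token in the reply.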

### Temperature

Temperature is the main setting for how **predictable vs random** the output feels.

-   **Lower temperature:** safer, more stable, more repetitive
-   **Higher temperature:** more creative, looser, more chaotic

Example prompt: `The wizard opened the ancient`

With **lower temperature**, the model is more likely to continue with something obvious like `door`, `book`, or `chest`.

With **higher temperature**, it is more willing to choose something less expected like `gateway`, `tomb`, or `void`.

Lower temperature usually feels more consistent. Higher temperature can feel more vivid, but also weirder or more off-track.
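A common way to apply temperature is to divide the model's raw scores by it before turning them into probabilities. The sketch below assumes that approach. The scores are invented, and `probabilities` is a helper written for this example.

```python
import math

# Hypothetical raw scores for the word after "The wizard opened the ancient".
scores = {"door": 3.0, "book": 2.6, "chest": 2.3,
          "gateway": 1.0, "tomb": 0.8, "void": 0.2}

def probabilities(scores, temperature):
    """Divide each score by the temperature, then apply softmax."""
    scaled = {w: s / temperature for w, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - peak) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

low = probabilities(scores, temperature=0.3)
high = probabilities(scores, temperature=1.5)

for word in scores:
    print(f"{word:8s} low={low[word]:.3f}  high={high[word]:.3f}")
```

At the lower temperature, almost all of the probability sits on `door`, `book`, and `chest`. At the higher temperature, the gap narrows and words like `void` get a real chance of being picked.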

Beginner advice

If you only want one creativity setting, use Temperature and leave Top-K and Top-P alone.

### Top-K (if supported)

Top-K is a **hard limit** on how many next-word options the model is allowed to consider.

Example: if `Top-K = 5`, the model keeps only the 5 most likely next options and ignores everything else.

| Rank | Possible next word | Kept? |
| --- | --- | --- |
| 1   | door | Yes |
| 2   | book | Yes |
| 3   | chest | Yes |
| 4   | hall | Yes |
| 5   | room | Yes |
| 6+  | void / galaxy / thunder | No  |

So if the best options are `door`, `book`, `chest`, `hall`, and `room`, the model must choose from those. A weirder option that ranked much lower is not even allowed into the final choice pool.

Top-K is useful when you want a **strict cap** on how wide the model's choice pool can be.
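As a rough sketch, the `Top-K = 5` example could look like this in Python. The probabilities are assumptions added for illustration (the table only gives ranks), and `top_k_filter` is a helper written for this example.

```python
# Hypothetical probabilities, ordered to match the ranks in the table.
probs = {"door": 0.30, "book": 0.20, "chest": 0.15, "hall": 0.10,
         "room": 0.08, "void": 0.07, "galaxy": 0.06, "thunder": 0.04}

def top_k_filter(probs, k):
    """Keep the k most likely options and rescale them to sum to 1."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {word: p / total for word, p in top}

pool = top_k_filter(probs, k=5)
print(list(pool))  # ['door', 'book', 'chest', 'hall', 'room']
```

The model then picks from `pool` only, so `void`, `galaxy`, and `thunder` can never be chosen at this step.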

Not all providers support this setting.

### Top-P

Top-P, also called **nucleus sampling**, is similar to Top-K but more flexible.

Instead of saying "keep exactly 5 options", Top-P says "keep enough of the likely options to cover most of the probability."

Example: if `Top-P = 0.90`, the model keeps adding likely next words until the combined likelihood reaches about 90%. Some moments might only need a few obvious options. Other moments might need many more.

| Word | Chance | Running total |
| --- | --- | --- |
| door | 30% | 30% |
| book | 20% | 50% |
| chest | 15% | 65% |
| hall | 10% | 75% |
| room | 8%  | 83% |
| gate | 4%  | 87% |
| tomb | 3%  | 90% |

In that example, the model stops after adding `tomb` because the running total has reached 90%. Lower-ranked words after that are ignored.
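The running-total logic from the table can be sketched as follows. The first seven entries come from the table. The last three are hypothetical tail words added for illustration, and `top_p_filter` is a helper written for this example. Whole-number percentages keep the arithmetic exact.

```python
# (word, chance in percent), most likely first.
ranked = [("door", 30), ("book", 20), ("chest", 15), ("hall", 10),
          ("room", 8), ("gate", 4), ("tomb", 3),
          # Hypothetical lower-ranked words, not from the table:
          ("void", 2), ("galaxy", 2), ("thunder", 1)]

def top_p_filter(ranked, top_p_percent):
    """Keep adding the most likely words until the running total reaches the target."""
    kept = []
    running_total = 0
    for word, chance in ranked:
        kept.append(word)
        running_total += chance
        if running_total >= top_p_percent:
            break  # enough probability is covered; ignore the rest
    return kept

print(top_p_filter(ranked, top_p_percent=90))
# ['door', 'book', 'chest', 'hall', 'room', 'gate', 'tomb']
print(top_p_filter(ranked, top_p_percent=50))
# ['door', 'book']
```

Unlike Top-K, the number of surviving words changes with the shape of the distribution: the same setting keeps seven words at 90% here but only two at 50%.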

-   **Lower Top-P:** fewer options survive, so replies feel safer
-   **Higher Top-P:** more options survive, so replies feel looser

The short version is:

-   **Top-K:** how many options can survive?
-   **Top-P:** how much of the likely option space should survive?

In practice, Top-P and Temperature both affect randomness. Most users should only adjust one of them, not both at the same time.

If you are unsure, leave Top-P at its default value. The model will behave normally without any tuning.

### Max Output Tokens

This setting caps the length of the model's reply, measured in tokens.

-   **Lower values:** shorter answers
-   **Higher values:** longer answers

Example: if you keep getting giant walls of text, lower this. If the AI keeps cutting itself off too early, raise it.
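A toy sketch of why replies get cut off: generation simply stops once the cap is reached, whether or not the sentence is finished. Here each word stands in for one token, which is a simplification, and `generate` is a helper written for this example.

```python
def generate(planned_tokens, max_output_tokens):
    """Emit tokens one at a time and stop once the cap is reached."""
    output = []
    for token in planned_tokens:
        if len(output) >= max_output_tokens:
            break  # cap reached: the reply ends here, finished or not
        output.append(token)
    return output

# What the model "wants" to say, one word per token for simplicity.
reply = "The wizard opened the ancient door and stepped inside".split()

print(" ".join(generate(reply, max_output_tokens=5)))
# The wizard opened the ancient
```

With a cap of 5 the reply ends mid-sentence. With a cap larger than the reply, nothing is cut.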

### Presence Penalty

Presence penalty pushes the model to bring in new ideas instead of staying on the same topic.

Example: if the AI keeps circling around the same scene or thought, raising presence penalty can make it introduce something new.

### Frequency Penalty

Frequency penalty reduces repeated wording. The more often a word has already been used, the more it is discouraged.

Example: if the model keeps repeating the same phrases, sentence shapes, or favorite words, a higher frequency penalty can help break that habit.
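One common formulation of the two penalties, similar to what OpenAI documents for its API, subtracts them from each token's score before sampling. Other providers may implement them differently, so treat this as a sketch. The scores and history are invented, and `apply_penalties` is a helper written for this example.

```python
from collections import Counter

def apply_penalties(scores, generated_so_far, presence_penalty, frequency_penalty):
    """Lower the score of words that already appeared in the reply."""
    counts = Counter(generated_so_far)
    adjusted = {}
    for word, score in scores.items():
        count = counts[word]
        adjusted[word] = (
            score
            - frequency_penalty * count                   # grows with every repeat
            - presence_penalty * (1 if count > 0 else 0)  # flat, applied once a word has appeared
        )
    return adjusted

# Hypothetical scores: all three words start out equally likely.
scores = {"dark": 2.0, "forest": 2.0, "river": 2.0}
history = ["dark", "dark", "dark", "forest"]

adjusted = apply_penalties(scores, history,
                           presence_penalty=0.5, frequency_penalty=0.2)
for word, score in adjusted.items():
    print(f"{word:7s} {score:.2f}")
```

`river` has not appeared, so its score is unchanged. `forest` appeared once and takes a small hit. `dark` appeared three times and ends up lowest, because the frequency penalty stacks with each repeat while the presence penalty applies only once.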

## Reasoning Mode

Some models support a special reasoning mode. This lets the AI spend more effort on difficult tasks before replying.

It is mostly useful for coding, planning, analysis, and logic-heavy tasks. For normal conversation or roleplay, you usually do not need it.

Different providers expose reasoning in different ways. LettuceAI groups them into four types:

-   **Effort:** you choose how hard the model should think
-   **Budget-only:** you set a token limit for reasoning
-   **Dynamic:** the provider supports both effort and budget, but only one can be active at a time
-   **None:** reasoning controls are not available

If you enable one of effort or budget, the other is ignored for that request.
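The four types and the "only one at a time" rule can be pictured with a small sketch. Everything here is hypothetical: the names `ReasoningSupport` and `resolve`, the `active` selector, and the returned keys are invented for illustration and are not LettuceAI's actual implementation or any provider's API.

```python
from enum import Enum

class ReasoningSupport(Enum):
    EFFORT = "effort"
    BUDGET_ONLY = "budget-only"
    DYNAMIC = "dynamic"
    NONE = "none"

def resolve(support, active=None, effort=None, budget_tokens=None):
    """Return only the reasoning setting that would apply to a request.

    `active` says which control the user enabled ("effort" or "budget").
    It matters only for DYNAMIC, where both controls exist.
    """
    if support is ReasoningSupport.EFFORT:
        return {"effort": effort} if effort is not None else {}
    if support is ReasoningSupport.BUDGET_ONLY:
        return {"budget_tokens": budget_tokens} if budget_tokens is not None else {}
    if support is ReasoningSupport.DYNAMIC:
        if active == "effort" and effort is not None:
            return {"effort": effort}  # budget is ignored
        if active == "budget" and budget_tokens is not None:
            return {"budget_tokens": budget_tokens}  # effort is ignored
        return {}
    return {}  # NONE: no reasoning controls available

print(resolve(ReasoningSupport.DYNAMIC, active="effort",
              effort="high", budget_tokens=2048))
# {'effort': 'high'}
```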

Models with Built-In Reasoning

Some AI models have built-in reasoning behavior. In those cases, your settings may be ignored if the provider does not expose control over it.
