---
url: https://lettuceai.app/docs/ai-basics
title: "AI Basics — LettuceAI"
description: "Learn the basics of AI chat and roleplay in plain English: characters, personas, providers, API keys, tokens, context, and memory, written for total beginners."
---

Menu 

# AI Basics

If you are new to chat AI, start here. This page is written for people who have never used API-based AI apps before and need the basics in plain English.

## What LettuceAI actually is

LettuceAI is not the AI model itself. It is the app you use to connect to models, manage chats, store memory, build characters, and organize everything around the conversation.

| Thing | What it means | Example |
| --- | --- | --- |
| LettuceAI | The app that manages your chats and settings | The thing you install and open |
| Provider | The service that gives you access to models | OpenRouter, Anthropic, Gemini, Ollama |
| Model | The actual AI that writes the reply | Claude, Gemini, Qwen, Mistral |
| API key | Your private access key for a provider account | A key you generate on the provider site |

Very short version

LettuceAI is the app, the provider is the service, and the model is the AI doing the writing.

## How chat AI works

When you send a message, LettuceAI builds a request and sends it to the provider and model you selected. That request usually includes:

-   your latest message
-   recent chat history
-   system instructions
-   character or persona data
-   any memory or lorebook entries that were retrieved

The model reads that request, predicts what text should come next, and sends back a reply. It does not "think" like a human or secretly know your entire chat history unless that information is included in the request.

This process is usually called **inference**. In simple terms, inference just means the model is generating an answer.

Important

A model only knows what is inside the current request window. If something is missing from that window, it can forget it, contradict it, or make up details.

## What AI roleplaying is

Most people first meet AI as an assistant: you ask a question, it gives an answer. AI roleplaying is different. Instead of asking for facts, you have a conversation with a **character** and play a part in a story yourself. The AI stays in character, you reply as yourself or as a role you have chosen, and the two of you make up the story as you go.

There is no script, no score, and no single correct path. It is closer to improv acting or collaborative writing than to a search engine. You steer where things go, and the character reacts.

The mindset that helps most

Think of yourself as a co-author, not a customer. The story is something you build together, and you are allowed to steer it, rewrite it, and try again.

## Characters, personas, and the opening scene

Three pieces shape almost every roleplay chat:

-   **Character:** who the AI plays. A character is defined by a profile (sometimes called a character card) that holds its name, personality, background, and speaking style.
-   **Persona:** who _you_ are in the story. Your persona tells the character your name and a little about you, so it can react to you specifically.
-   **Opening message:** the first message that sets the scene, also called a greeting or starting scene. It establishes where you are and how the story begins.

You can learn more in [Characters](/docs/characters), [Personas](/docs/personas), and [Chat Templates](/docs/chat-templates).

## How roleplay messages usually look

Roleplay writing has a few common habits. None of them are required, but recognizing them helps you read and write replies:

-   **Speech and actions:** spoken words are usually written plainly or in quotes, while actions and descriptions are often wrapped in asterisks, like `*she glances up from her book*`.
-   **First or third person:** some people write as `I walk closer` and others as `He walks closer`. Pick whichever feels natural, and the character will usually mirror your style.
-   **Out of character (OOC):** when you want to talk to the AI _about_ the story rather than inside it, people often label it OOC or put it in parentheses, like `(OOC: can we slow this scene down?)`.

You set the tone

The length and style you write in teaches the character what to match. Short replies tend to get short replies, and rich, descriptive writing tends to get more of the same back.

## Steering the story: swipe, edit, continue

Because you are a co-author, you are never stuck with a reply you did not like. The main tools are:

-   **Regenerate (swipe):** ask for a fresh version of the last reply when it misses the mark. Each try can go a different way.
-   **Edit:** change the wording of a message, including the character's, to fix a detail or nudge the direction before you continue.
-   **Continue:** let the character keep writing from where it stopped, instead of sending a new message.
-   **Branch:** split off an alternate version of the chat to try a different path without losing the original. See [Chat Branching](/docs/branching).

A small habit that pays off

If a character drifts off course, edit or regenerate early. Fixing one reply is much easier than arguing with the character for ten turns.

## Who can see my roleplay chats?

This matters a lot in roleplay, so it is worth saying plainly. LettuceAI does not run its own AI and does not store your chats on a LettuceAI server. Where your messages go depends only on the model you pick:

-   **Cloud models:** your messages are sent to the provider you connected, and are handled under that provider's privacy and content rules.
-   **Local models:** everything stays on your own device, and nothing is sent over the internet.

If privacy is a priority, a local model keeps the whole conversation on your machine. See [Security](/docs/security) for the full picture.

## What is a token?

A token is a small chunk of text that the model processes. Tokens are not exactly the same as words.

-   a short word may be one token
-   a longer word may be split into multiple tokens
-   punctuation and spaces can affect token count too

A rough rule of thumb for English is that **1 token is about 3-4 characters**, or **roughly 0.75 words**. This is only an estimate. The real count depends on the exact text and the model's tokenizer.

Example: the sentence `I love this character.` is a handful of tokens, not four exact "word units". The model sees tokenized text, not ordinary word counts.

Why this matters

Providers usually bill by tokens, and most model limits are also measured in tokens.

## Input tokens vs output tokens

There are usually two token buckets involved in a chat request:

-   **Input tokens:** everything sent to the model
-   **Output tokens:** the reply the model generates

Long system prompts, big lorebooks, too much chat history, and memory retrieval all increase input token usage. Longer replies increase output token usage.

## What is context length?

Context length, also called the context window, is the maximum number of tokens the model can handle in one request.

That limit usually covers **everything together**: system prompt, chat history, memory, lorebook entries, your latest message, and the model's reply.

Example: if a model has a `32k` context window, that space is shared by both the text you send and the text the model returns.

If the request gets too large, something has to give. Older messages may be removed, memory may be trimmed, or the output may need a lower max token limit.

Simple way to think about it

Context length is the size of the model's short-term working memory for a single request, not its permanent memory.

![What fills the context window and what happens when it is full](https://lhdgeo5fms.ufs.sh/f/m0TBUtMLsaiElfsvXoM7C73dLiJV9kSZtR4oG2cbhwPzgNan)

The context window is one fixed space shared by the system prompt, character and persona, lorebook entries, memories, recent history, and the room saved for the reply. When it fills up, the oldest messages drop out first, which is why saved memories matter.

## Why AI forgets things

Models do not continuously remember everything forever. They only see the tokens that fit inside the current request. If older information no longer fits, it may be dropped, summarized, or replaced by newer text.

This is why very long chats can drift over time, especially if you are not using a memory system.

LettuceAI helps with this through [Memory](/docs/memory), which tries to bring back the most relevant information instead of resending the entire chat every time.

## Context length is not the same as memory

These two terms are easy to confuse:

-   **Context length:** what fits into one request right now
-   **Memory:** a separate system for storing useful details and re-inserting them later

A bigger context window can help, but it does not automatically solve long-term consistency. Memory systems still matter for long chats and roleplay.

## What is a provider?

A **provider** is the service that gives you access to AI models. Examples include OpenAI, Anthropic, Gemini, OpenRouter, Ollama, and many others.

A **model** is the actual AI you pick inside that provider. One provider can offer many models.

If this part is still confusing, read [Providers](/docs/providers) and [Models](/docs/models) after this page.

## Why do I need an API key?

An API key is how a provider knows which account is making requests. It is similar to a password, but it is meant for apps and software rather than normal website logins.

LettuceAI uses **bring your own key**. That means you do not pay LettuceAI for messages. You connect your own provider account, and the provider bills that account for usage.

Why this exists

This gives you more control over price, privacy, and model choice, but it also means you need to understand the basic provider-model-key setup.

## Are there free providers?

Sometimes, yes. Some providers offer free tiers, free trial credits, or a small amount of free daily usage. Others are paid from the start.

The important part is that **free availability changes often**. A provider that is free today may add limits, remove the free tier, or require billing setup later.

-   free tiers are usually slower or rate-limited
-   free models are often smaller or less capable
-   daily caps and queueing are common
-   some "free" offers are really just temporary trial credits

Best expectation

Treat free provider access as a bonus, not something guaranteed forever. Always check the provider's pricing or usage page before assuming a model is free.

## Cloud models vs local models

There are two common ways to use models in LettuceAI:

-   **Cloud models:** the model runs on someone else's servers over the internet
-   **Local models:** the model runs on your own machine. LettuceAI can run local models directly, including ones you download from inside the app, or connect to a local server like Ollama

Cloud models (the bring-your-own-key option) are usually easier to start with and often stronger. Local models give you more privacy and can work offline, but they depend on your hardware. You can mix both and switch between them whenever you like.

## Why some replies are slow

Slow responses do not always mean something is broken. Speed depends on the provider, model size, current server load, your network, and how much text is being sent in the request.

-   bigger models are often slower
-   longer prompts are often slower
-   longer replies are often slower
-   busy providers can add queue time

## Why the same prompt can get different answers

Chat models are not perfectly fixed machines. Even with the same prompt, the answer can vary because of randomness settings, model updates, provider-side changes, or small differences in the surrounding context.

This is normal. AI chat is probabilistic, not deterministic. If you want more stable behavior, use clearer prompts and lower randomness settings such as temperature.

## What is a hallucination?

A hallucination is when the model states something incorrect, invented, or unsupported as if it were true.

-   it can invent facts
-   it can misremember earlier parts of the chat
-   it can sound confident while being wrong

This does not always mean the model is "bad". It means language models are prediction systems, not truth machines.

Practical rule

Do not treat confident wording as proof. For factual, legal, medical, or financial claims, verify important information separately.

## Why roleplay bots can feel inconsistent

In roleplay, consistency depends on more than just the model. Character cards, system prompts, memory retrieval, lorebooks, context space, and model quality all affect the result.

If a character suddenly acts out of character, common causes are missing context, weak instructions, overloaded prompts, or the model simply making a bad prediction.

## What costs money?

In most cases, you are paying the provider for token usage. More text in and more text out usually means higher cost.

-   long prompts cost more than short prompts
-   long replies cost more than short replies
-   resending huge chat histories costs more than targeted memory
-   some models are simply more expensive per token than others

This is why "free app" does not mean "free AI usage". LettuceAI is free software, but many cloud models are paid services.

## Settings most beginners should care about

-   **Model:** changes quality, style, speed, and price
-   **Max output tokens:** limits how long the reply can be
-   **Temperature:** changes how predictable or creative the output feels

Most other settings can stay at their defaults until you know exactly what behavior you want to change.

## The three beginner mistakes to avoid

-   expecting the model to remember everything forever without memory
-   changing five settings at once and then not knowing what caused the result
-   assuming a confident answer must be a correct answer

## What to learn next

If you only want the minimum needed to use the app well, learn these in this order:

1.  what a provider is
2.  what a model is
3.  what an API key is
4.  what a character and a persona are
5.  what tokens and context length mean
6.  how memory changes long chats

## Beginner glossary

-   **Prompt:** the text you send to the model
-   **System prompt:** hidden instructions that guide the model's behavior
-   **Model:** the AI that generates the reply
-   **Provider:** the service that hosts or serves the model
-   **Token:** a chunk of text used for limits and billing
-   **Context window:** the maximum tokens visible in one request
-   **Inference:** the act of the model generating a reply
-   **Roleplay:** chatting in character to build a story together, rather than asking an assistant for answers
-   **Character:** the personality the AI plays, defined by a character card
-   **Persona:** who you are in the story, so the character can react to you
-   **Greeting:** the opening message that sets the scene
-   **Regenerate (swipe):** asking for a fresh version of the last reply
-   **OOC:** out of character, talking to the AI about the story instead of inside it

Recommended next reads

Continue with [Quick Start](/docs/quickstart), [Providers](/docs/providers), and [Models](/docs/models).

[

PreviousQuick Start

](/docs/quickstart)[

NextAPI Keys

](/docs/api-keys)
