---
url: https://lettuceai.app/docs/ai-basics
title: "AI Basics — LettuceAI"
description: "Learn what tokens, context length, models, providers, and API keys mean in plain English for beginners new to AI apps."
---

# AI Basics

If you are new to chat AI, start here. This page is written for people who have never used API-based AI apps before and need the basics in plain English.

## What LettuceAI actually is

LettuceAI is not the AI model itself. It is the app you use to connect to models, manage chats, store memory, build characters, and organize everything around the conversation.

| Thing | What it means | Example |
| --- | --- | --- |
| LettuceAI | The app that manages your chats and settings | The thing you install and open |
| Provider | The service that gives you access to models | OpenRouter, Anthropic, Chutes, Ollama |
| Model | The actual AI that writes the reply | Claude, Gemini, Qwen, Mistral |
| API key | Your private access key for a provider account | A key you generate on the provider site |

Very short version

LettuceAI is the app, the provider is the service, and the model is the AI doing the writing.

## How chat AI works

When you send a message, LettuceAI builds a request and sends it to the provider and model you selected. That request usually includes:

-   your latest message
-   recent chat history
-   system instructions
-   character or persona data
-   any memory or lorebook entries that were retrieved

The model reads that request, predicts what text should come next, and sends back a reply. It does not "think" like a human or secretly know your entire chat history unless that information is included in the request.

This process is usually called **inference**. In simple terms, inference just means the model is generating an answer.
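The request the app assembles can be pictured as a plain list of messages, roughly in the OpenAI-compatible "chat completions" shape many providers accept. This is only a sketch: the function and field values below are illustrative, not LettuceAI's exact internals.

```python
# A sketch of the request body a chat app might assemble for one
# inference call. The exact structure is provider-specific; this
# mirrors the common OpenAI-compatible shape.

def build_request(system_prompt, memory_entries, history, latest_message):
    """Combine system instructions, retrieved memory, and recent chat
    history into one list of messages for a single request."""
    messages = [{"role": "system", "content": system_prompt}]
    for entry in memory_entries:  # retrieved memory / lorebook entries
        messages.append({"role": "system", "content": f"[Memory] {entry}"})
    messages.extend(history)      # recent chat turns
    messages.append({"role": "user", "content": latest_message})
    return {"model": "example-model", "messages": messages}

request = build_request(
    system_prompt="You are Mira, a cheerful botanist.",
    memory_entries=["The user's cat is named Pixel."],
    history=[{"role": "user", "content": "Hi!"},
             {"role": "assistant", "content": "Hello!"}],
    latest_message="What should I feed Pixel?",
)
print(len(request["messages"]))  # 5 messages in this example
```

Everything the model "knows" for this turn is inside that one list; nothing outside it reaches the model.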

Important

A model only knows what is inside the current request window. If something is missing from that window, it can forget it, contradict it, or make up details.

## What is a token?

A token is a small chunk of text that the model processes. Tokens are not exactly the same as words.

-   a short word may be one token
-   a longer word may be split into multiple tokens
-   punctuation and spaces can affect token count too

A rough rule of thumb for English is that **1 token is about 3-4 characters**, or **roughly 0.75 words**. This is only an estimate. The real count depends on the exact text and the model's tokenizer.

Example: the sentence `I love this character.` is a handful of tokens, not four exact "word units". The model sees tokenized text, not ordinary word counts.
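You can get a rough feel for token counts with the characters-per-token rule of thumb above. This is only an estimate, not a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters per token
    rule of thumb for English. Real tokenizers will differ."""
    return max(1, round(len(text) / 4))

sentence = "I love this character."
print(estimate_tokens(sentence))  # 22 characters -> about 6 tokens
```

Real counts come from the model's own tokenizer, so treat numbers like this as ballpark figures for budgeting, not exact billing math.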

Why this matters

Providers usually bill by tokens, and most model limits are also measured in tokens.

## Input tokens vs output tokens

There are usually two token buckets involved in a chat request:

-   **Input tokens:** everything sent to the model
-   **Output tokens:** the reply the model generates

Long system prompts, big lorebooks, too much chat history, and memory retrieval all increase input token usage. Longer replies increase output token usage.

## What is context length?

Context length, also called the context window, is the maximum number of tokens the model can handle in one request.

That limit usually covers **everything together**: system prompt, chat history, memory, lorebook entries, your latest message, and the model's reply.

Example: if a model has a `32k` context window, that space is shared by both the text you send and the text the model returns.

If the request gets too large, something has to give. Older messages may be removed, memory may be trimmed, or the reply may have to be capped with a lower max output token limit.

Simple way to think about it

Context length is the size of the model's short-term working memory for a single request, not its permanent memory.
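When a conversation outgrows the window, one common strategy is to drop the oldest turns until everything fits. A minimal sketch, using a rough character-based token estimate (real apps count actual tokens and also reserve room for the reply):

```python
def trim_to_fit(history, budget_tokens):
    """Keep the newest messages whose combined estimated token count
    fits the budget; drop the oldest messages first."""
    def est(msg):  # rough ~4 characters per token estimate
        return max(1, len(msg["content"]) // 4)

    kept, used = [], 0
    for msg in reversed(history):       # walk newest-first
        cost = est(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [
    {"role": "user", "content": "x" * 400},       # ~100 tokens (oldest)
    {"role": "assistant", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "z" * 200},       # ~50 tokens (newest)
]
print(len(trim_to_fit(history, budget_tokens=160)))  # 2: oldest turn dropped
```

This is exactly why older messages "fall out" of long chats: they were trimmed to keep the request inside the window.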

## Why AI forgets things

Models do not continuously remember everything forever. They only see the tokens that fit inside the current request. If older information no longer fits, it may be dropped, summarized, or replaced by newer text.

This is why very long chats can drift over time, especially if you are not using a memory system.

LettuceAI helps with this through [Memory](/docs/memory), which tries to bring back the most relevant information instead of resending the entire chat every time.

## Context length is not the same as memory

These two terms are easy to confuse:

-   **Context length:** what fits into one request right now
-   **Memory:** a separate system for storing useful details and re-inserting them later

A bigger context window can help, but it does not automatically solve long-term consistency. Memory systems still matter for long chats and roleplay.
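The difference can be sketched in a few lines: instead of resending everything, a memory system stores facts and re-inserts only the ones relevant to the current message. Below is a toy keyword-overlap retriever; real memory systems are more sophisticated, often using embeddings, and this is not LettuceAI's actual implementation.

```python
import string

def _words(text):
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def retrieve(memories, message, top_k=2):
    """Score stored facts by word overlap with the new message and
    return the best matches for re-insertion into the request."""
    query = _words(message)
    scored = sorted(memories, key=lambda m: len(query & _words(m)), reverse=True)
    return scored[:top_k]

memories = [
    "The user's cat is named Pixel.",
    "The user lives near the coast.",
    "The user dislikes spicy food.",
]
print(retrieve(memories, "What should I feed my cat Pixel?", top_k=1))
```

The point is the shape of the idea: memory selects a few relevant facts and spends a few tokens on them, rather than spending the whole context window on raw history.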

## What is a provider?

A **provider** is the service that gives you access to AI models. Examples include OpenAI-compatible endpoints, Anthropic, Gemini, Chutes, Ollama, and many others.

A **model** is the actual AI you pick inside that provider. One provider can offer many models.

If this part is still confusing, read [Providers](/docs/providers) and [Models](/docs/models) after this page.

## Why do I need an API key?

An API key is how a provider knows which account is making requests. It is similar to a password, but it is meant for apps and software rather than normal website logins.

LettuceAI uses a **bring-your-own-key** model. That means you do not pay LettuceAI for messages: you connect your own provider account, and the provider bills that account for usage.

Why this exists

This gives you more control over price, privacy, and model choice, but it also means you need to understand the basic provider-model-key setup.

## Are there free providers?

Sometimes, yes. Some providers offer free tiers, free trial credits, or a small amount of free daily usage. Others are paid from the start.

The important part is that **free availability changes often**. A provider that is free today may add limits, remove the free tier, or require billing setup later.

-   free tiers are usually slower or rate-limited
-   free models are often smaller or less capable
-   daily caps and queueing are common
-   some "free" offers are really just temporary trial credits

Best expectation

Treat free provider access as a bonus, not something guaranteed forever. Always check the provider's pricing or usage page before assuming a model is free.

## Cloud models vs local models

There are two common ways to use models in LettuceAI:

-   **Cloud models:** the model runs on someone else's servers over the internet
-   **Local models:** the model runs on your own machine, usually through something like Ollama

Cloud models are usually easier to start with and often stronger. Local models give you more privacy and can work offline, but they depend on your hardware.

## Why some replies are slow

Slow responses do not always mean something is broken. Speed depends on the provider, model size, current server load, your network, and how much text is being sent in the request.

-   bigger models are often slower
-   longer prompts are often slower
-   longer replies are often slower
-   busy providers can add queue time

## Why the same prompt can get different answers

Chat models are not perfectly fixed machines. Even with the same prompt, the answer can vary because of randomness settings, model updates, provider-side changes, or small differences in the surrounding context.

This is normal. AI chat is probabilistic, not deterministic. If you want more stable behavior, use clearer prompts and lower randomness settings such as temperature.
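Temperature controls how sharply the model's next-token probabilities are peaked before it samples. A toy softmax illustration with made-up scores for three candidate tokens (real models do the same thing over tens of thousands of tokens):

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities. Lower temperature makes
    the top choice more dominant; higher temperature flattens the
    distribution, so sampling becomes more varied."""
    scaled = [s / temperature for s in scores]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(scores, temperature=0.3)
high = softmax_with_temperature(scores, temperature=2.0)
print(round(low[0], 2), round(high[0], 2))  # top token dominates at low temp
```

At low temperature the best-scoring token is picked almost every time; at high temperature weaker candidates get a real chance, which is why the same prompt can wander.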

## What is a hallucination?

A hallucination is when the model states something incorrect, invented, or unsupported as if it were true.

-   it can invent facts
-   it can misremember earlier parts of the chat
-   it can sound confident while being wrong

This does not always mean the model is "bad". It means language models are prediction systems, not truth machines.

Practical rule

Do not treat confident wording as proof. For factual, legal, medical, or financial claims, verify important information separately.

## Why roleplay bots can feel inconsistent

In roleplay, consistency depends on more than just the model. Character cards, system prompts, memory retrieval, lorebooks, context space, and model quality all affect the result.

If a character suddenly acts out of character, common causes are missing context, weak instructions, overloaded prompts, or the model simply making a bad prediction.

## What costs money?

In most cases, you are paying the provider for token usage. More text in and more text out usually means higher cost.

-   long prompts cost more than short prompts
-   long replies cost more than short replies
-   resending huge chat histories costs more than targeted memory
-   some models are simply more expensive per token than others

This is why "free app" does not mean "free AI usage". LettuceAI is free software, but many cloud models are paid services.

## Settings most beginners should care about

-   **Model:** changes quality, style, speed, and price
-   **Max output tokens:** limits how long the reply can be
-   **Temperature:** changes how predictable or creative the output feels

Most other settings can stay at their defaults until you know exactly what behavior you want to change.

## The three beginner mistakes to avoid

-   expecting the model to remember everything forever without memory
-   changing five settings at once and then not knowing what caused the result
-   assuming a confident answer must be a correct answer

## What to learn next

If you only want the minimum needed to use the app well, learn these in this order:

1.  what a provider is
2.  what a model is
3.  what an API key is
4.  what tokens and context length mean
5.  how memory changes long chats

## Beginner glossary

-   **Prompt:** the text you send to the model
-   **System prompt:** hidden instructions that guide the model's behavior
-   **Model:** the AI that generates the reply
-   **Provider:** the service that hosts or serves the model
-   **Token:** a chunk of text used for limits and billing
-   **Context window:** the maximum tokens visible in one request
-   **Inference:** the act of the model generating a reply

Recommended next reads

Continue with [Quick Start](/docs/quickstart), [Providers](/docs/providers), and [Models](/docs/models).

