Introduction to LLMs

LLM stands for Large Language Model. Bolt generates code using Anthropic’s Claude 3.7 Sonnet and Claude Sonnet 4 LLMs, which provide robust programming performance. OpenAI’s ChatGPT is another popular LLM that you may have heard of or used yourself.

Without getting too detailed, an LLM works as a very smart autocomplete. It doesn’t ‘know’ information; rather, based on all of the training data it has been fed, it generates ‘new’ text from patterns it detected during training.

A prompt is a message you send to the AI. How you write your prompts is key to getting good results. Refer to Prompt effectively for more guidance.

Context is all the information about the project that’s available to the AI: your prompts, previous responses, and the existing code. The context window is the amount of text the LLM can process at once.

Different vibe coding tools support different context windows, but you still can’t rely on the AI remembering your whole conversation. There’s a tradeoff here, as larger context windows consume tokens faster, which increases costs.

A way to preserve context while keeping the context window small is to get the AI to summarize the conversation so far, then reset the context window. Refer to Reset the AI context window for instructions.
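For example, before resetting you might send a prompt like the one below. The wording is only a suggestion, not a required format:

```
Summarize this conversation so far: the app's purpose, the tech stack,
key design decisions, and any open bugs or TODOs. Keep it concise enough
that I can paste it into a fresh chat as starting context.
```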

Understanding tokens and token usage is critical to using Bolt effectively.

A simple definition of tokens: tokens are small pieces of text. For example, “I”, “love”, “cats”, and “!” are each a single token.

LLMs process text as tokens, analyze them, and then predict the next tokens to generate a response.
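To get a feel for this, you can split text into tokens yourself with OpenAI’s open source tiktoken library. This is a sketch for illustration only: Bolt’s Claude models use a different tokenizer, so their exact splits and counts will differ.

```python
# pip install tiktoken
import tiktoken

# Load a GPT-style tokenizer (illustrative; Claude tokenizes differently).
enc = tiktoken.get_encoding("cl100k_base")

text = "I love cats!"
token_ids = enc.encode(text)  # text -> list of integer token IDs

print(len(token_ids), "tokens")              # short English words are often ~1 token each
print([enc.decode([t]) for t in token_ids])  # the text piece behind each ID
```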

Tokens are a complex topic that applies to all AI apps, not just Bolt. For detailed background, see the Nvidia blog post Explaining Tokens — the Language and Currency of AI.

LLMs can only handle a certain number of tokens at a time, and that total includes both:

  • The input you give (for example, a long question or document)
  • The output it generates (for example, the response, or the code you get back)
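Here’s a minimal sketch of that budget, assuming a hypothetical 200,000-token context window (check your model’s actual limit):

```python
# Hypothetical numbers for illustration; real limits vary by model and plan.
CONTEXT_WINDOW = 200_000  # total tokens the model can handle at once

input_tokens = 150_000    # prompts + prior conversation + synced project files
max_output_tokens = CONTEXT_WINDOW - input_tokens

print(f"Room left for the response: {max_output_tokens} tokens")  # 50000 tokens
```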

If you’re using an LLM through a paid service such as Bolt, costs are often calculated based on the number of tokens processed: fewer tokens mean lower costs.
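As a back-of-the-envelope sketch (the per-token prices below are invented for illustration; check your provider’s actual rates):

```python
# Entirely hypothetical prices, expressed in dollars per million tokens.
PRICE_PER_M_INPUT = 3.00    # assumed input price
PRICE_PER_M_OUTPUT = 15.00  # assumed output price (output often costs more)

input_tokens = 120_000
output_tokens = 8_000

cost = (input_tokens * PRICE_PER_M_INPUT
        + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000
print(f"Estimated cost: ${cost:.2f}")  # $0.48 with these made-up numbers
```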

The table below gives an approximate guide to token costs for common code tasks in a vibe coding tool like Bolt:

| Task                           | Approx. token cost |
| ------------------------------ | ------------------ |
| Simple function (10 lines)     | 50-100 tokens      |
| Medium script (50 lines)       | 300-500 tokens     |
| Complex logic (100+ lines)     | 1000+ tokens       |
| Full application (~1000 lines) | 8000+ tokens       |
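If you want a programmatic version of that guide, here’s a rough sketch that maps line counts to the ballpark figures above (a heuristic, not a real pricing model):

```python
def estimate_tokens(lines_of_code: int) -> str:
    """Very rough token estimate for a code task, mirroring the table above."""
    if lines_of_code <= 10:
        return "50-100 tokens"
    if lines_of_code <= 50:
        return "300-500 tokens"
    if lines_of_code < 1000:
        return "1000+ tokens"
    return "8000+ tokens"

print(estimate_tokens(50))    # 300-500 tokens
print(estimate_tokens(1200))  # 8000+ tokens
```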

For Bolt specifically, it’s important to note that most token usage comes from syncing your project’s file system to the AI: the larger the project, the more tokens each message consumes.

Costs can grow very quickly, so refer to Maximizing token efficiency to learn how to keep costs down.

Remember: Longer prompts and larger outputs use more tokens, so different actions cost different amounts.

LLMs can produce inaccurate or outdated outputs. There are a couple of reasons for this:

  • Training set age: LLMs are trained on massive amounts of data, but they know nothing that happened after their training data was collected. When building software, be aware that the LLM may not know about the latest versions of the tools and frameworks you’re using.

  • Hallucination: LLMs are predictive, not deterministic. They can produce different results from the same prompt, and sometimes they generate plausible-sounding but false information.

It’s important to keep this in mind when building, and always test your applications carefully. The guide to Prompt effectively includes tips that can reduce errors.