LLM Fundamentals

Understand how large language models actually work — transformers, tokenization, prompting, sampling strategies, and the context window constraints that shape every LLM-powered system.


What is a Large Language Model?

A large language model (LLM) is a neural network trained on massive amounts of text to predict what token comes next. That tiny capability — predicting the next token — turns out to be enough to power conversation, code generation, translation, summarization, and reasoning.

The Core Idea: Next-Token Prediction

Every LLM, no matter how sophisticated, is fundamentally doing this loop:

  1. Take the input tokens so far
  2. Output a probability distribution over the next token
  3. Pick one (deterministically or by sampling)
  4. Append it and repeat

Everything else — instruction following, tool use, reasoning — emerges from how this loop is trained and prompted.
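The loop above can be sketched in a few lines of Python. This is a minimal illustration, not how any real model is implemented: `TOY_LOGITS` is a made-up lookup table standing in for the neural network, and it only looks at the last token, whereas a real LLM conditions on the whole context. The names `next_token_distribution` and `generate` are hypothetical.

```python
import math
import random

# Hypothetical toy "model": maps the last token to scores (logits) for the next token.
# A real LLM computes these scores with a transformer over the full context.
TOY_LOGITS = {
    "<s>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 2.0, "dog": 1.5},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.5, "</s>": 0.5},
    "dog": {"sat": 2.0, "</s>": 1.0},
    "sat": {"</s>": 3.0},
}


def next_token_distribution(tokens):
    """Step 2: turn the toy logits into probabilities with a softmax."""
    logits = TOY_LOGITS[tokens[-1]]
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - max_logit) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(prompt, max_new_tokens=10, greedy=True, rng=None):
    rng = rng or random.Random(0)
    tokens = list(prompt)                         # step 1: input tokens so far
    for _ in range(max_new_tokens):
        probs = next_token_distribution(tokens)   # step 2: distribution over next token
        if greedy:
            choice = max(probs, key=probs.get)    # step 3a: pick the most likely token
        else:
            choice = rng.choices(list(probs), weights=list(probs.values()))[0]  # step 3b: sample
        tokens.append(choice)                     # step 4: append and repeat
        if choice == "</s>":                      # stop at the end-of-sequence token
            break
    return tokens


print(generate(["<s>"]))  # ['<s>', 'the', 'cat', 'sat', '</s>']
```

Calling `generate(["<s>"], greedy=False)` samples from the distribution instead, so different random seeds can produce different continuations from the same prompt.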

The Transformer Architecture

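Modern LLMs use the transformer architecture (Vaswani et al., 2017). The key innovation is self-attention: each token can look at the other tokens in the input to decide what's relevant. In decoder-only models such as the GPT family, a causal mask restricts each token to itself and the tokens that precede it.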

  • Attention heads learn different relationship patterns (subject-verb, coreference, style)
  • Layers stack attention + feedforward blocks; deeper layers capture more abstract patterns
  • Parameters are the learned weights; bigger models can encode more knowledge
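To make self-attention concrete, here is a minimal single-head sketch of scaled dot-product attention in plain Python. It rests on several assumptions: it applies a causal mask (as decoder-only LLMs do), so each position attends only to itself and earlier positions, and it takes query, key, and value vectors as given, whereas a real transformer produces them with learned linear projections. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors, one per token position.
    Returns (outputs, weights), where weights[i] has i + 1 entries.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: position i only scores positions 0..i.
        scores = [dot(q, keys[j]) / math.sqrt(d_k) for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy 2-dimensional vectors for three tokens, reused as Q, K, and V for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for i, w in enumerate(weights):
    print(i, [round(v, 3) for v in w])
```

The first token can only attend to itself, so its weight list is `[1.0]` and its output equals its own value vector. Multi-head attention runs several such computations in parallel with different projections, which is how different heads can pick up different relationship patterns.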

Pretraining vs. Fine-Tuning

Stage                    | Data                                 | Goal
Pretraining              | Trillions of tokens of internet text | Learn general language + world knowledge
Fine-tuning              | Curated task-specific examples       | Specialize on a domain or behavior
RLHF / preference tuning | Human preferences                    | Align outputs with what people want

What "70B parameters" Actually Means

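Parameters are the weights in the network, so a "70B" model has roughly 70 billion learned numbers. More parameters mean more capacity, but also more compute and memory: stored at 16-bit precision, each parameter takes 2 bytes, so 70B parameters need about 140 GB for the weights alone. Rough modern landscape: small models (1-8B) can run on a laptop, medium models (30-70B) need one or more high-end GPUs (often with quantization), and frontier models (100B+) live in cloud datacenters.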
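A back-of-the-envelope way to see why parameter count drives hardware requirements is to multiply the number of parameters by the bytes used to store each one. The sketch below does only that. It counts weights alone and ignores activations, the attention cache, and runtime overhead, so real requirements are higher. The helper name `weight_memory_gb` and the precision labels are illustrative choices, not a library API.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the weights only, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name, n in [("8B", 8e9), ("70B", 70e9)]:
    for precision in ("fp16", "int4"):
        print(f"{name} @ {precision}: {weight_memory_gb(n, precision):.0f} GB")
# 8B @ fp16: 16 GB
# 8B @ int4: 4 GB
# 70B @ fp16: 140 GB
# 70B @ int4: 35 GB
```

These numbers suggest why quantization matters in practice: dropping from 16-bit to 4-bit storage cuts the weight footprint by a factor of four, usually at some cost in output quality.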

Key Takeaways
  • LLMs are next-token predictors — everything else emerges from that
  • Transformer self-attention lets each token weigh the other tokens in its context
  • Parameter count, training data, and alignment all shape capability
