LLM Fundamentals

What is a Large Language Model?

A large language model (LLM) is a neural network trained on massive amounts of text to predict which token (a word or word fragment) comes next. That single capability, applied one token at a time, turns out to be enough to power conversation, code generation, translation, summarization, and multi-step reasoning.

The Core Idea: Next-Token Prediction

Every LLM, no matter how sophisticated, is fundamentally doing this loop:

1. Take the input tokens so far.
2. Output a probability distribution over the next token.
3. Pick one (deterministically or by sampling).
4. Append it and repeat.

Everything else (instruction following, tool use, reasoning) emerges from how the model inside this loop is trained and how it is prompted.
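The loop above can be sketched in a few lines of Python. This is only an illustration: `TOY_MODEL` is a hypothetical hand-written lookup table standing in for a neural network, and it looks only at the last token, whereas a real LLM conditions on the whole context. The function names are likewise made up for this sketch.

```python
import random

# Hypothetical toy "model": maps the last token to a probability
# distribution over next tokens. A real LLM computes this distribution
# with a neural network that sees the entire context.
TOY_MODEL = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}


def next_token_distribution(tokens):
    """Return P(next token | tokens so far) from the toy table."""
    return TOY_MODEL[tokens[-1]]


def pick(distribution, greedy, rng):
    """Choose one token: the most likely one, or a weighted random sample."""
    if greedy:
        return max(distribution, key=distribution.get)
    candidates = list(distribution)
    weights = [distribution[t] for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def generate(prompt, greedy=True, max_new_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt)                        # take the tokens so far
    for _ in range(max_new_tokens):
        dist = next_token_distribution(tokens)   # distribution over next token
        token = pick(dist, greedy, rng)          # pick one
        if token == "<end>":
            break
        tokens.append(token)                     # append it and repeat
    return tokens


print(generate(["<start>"], greedy=True))
# ['<start>', 'the', 'cat', 'sat']
```

Calling `generate(["<start>"], greedy=False, seed=...)` with different seeds can produce different sentences from the same table, which is the difference between deterministic and sampled decoding.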

The Transformer Architecture

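Modern LLMs use the transformer architecture (Vaswani et al., 2017). The key innovation is self-attention: each token can look at the other tokens in the input to decide what's relevant. In the decoder-only models behind most LLMs, a token looks only at itself and the tokens that precede it, so it never sees the future it is trying to predict.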

Attention heads can each learn a different relationship pattern (for example subject-verb agreement, coreference, or style)
Layers stack attention and feedforward blocks; deeper layers tend to capture more abstract patterns
Parameters are the learned weights; larger models have more capacity to encode knowledge
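To make self-attention concrete, here is a minimal single-head sketch in plain Python. It makes several simplifying assumptions: the query, key, and value vectors are passed in directly (a real transformer derives them from token embeddings through learned projection matrices), there is only one head, and a causal mask is applied as in decoder-only LLMs, so each token attends only to itself and earlier tokens. The function names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument holds one vector (list of floats) per token.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d = len(keys[0])
    n = len(queries)
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: token i only scores positions 0..i.
        scores = [dot(q, keys[j]) / math.sqrt(d) for j in range(i + 1)]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w[j] * values[j][k] for j in range(i + 1))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(w + [0.0] * (n - i - 1))  # masked positions get 0
    return outputs, all_weights


# Three toy token vectors, used as queries, keys, and values alike.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row sums to 1, and the first row is `[1.0, 0.0, 0.0]` because the first token has nothing earlier to attend to.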

Pretraining vs. Fine-Tuning

Stage	Data	Goal
Pretraining	Trillions of tokens of internet text	Learn general language patterns and world knowledge
Fine-tuning	Curated task-specific examples	Specialize in a domain or behavior
RLHF (reinforcement learning from human feedback) / preference tuning	Human preference judgments on model outputs	Align outputs with what people want
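One way to see why pretraining can use raw text without hand-written labels: every position in a sequence supplies its own training example, with the text so far as input and the next token as the target. The sketch below illustrates that idea only. The helper name is hypothetical, and splitting on whitespace stands in for a real tokenizer.

```python
def next_token_examples(tokens):
    """Turn one token sequence into (context, target) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Whitespace splitting is a stand-in for a real tokenizer.
tokens = "the cat sat on the mat".split()
for context, target in next_token_examples(tokens):
    print(context, "->", target)
```

Supervised fine-tuning commonly reuses the same next-token setup, only on a much smaller set of curated examples.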

What "70B parameters" Actually Means

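Parameters are the weights in the network, so a 70B model has 70 billion of them. Stored at 16 bits each, that is roughly 140 GB for the weights alone. More parameters mean more capacity, but also more compute and memory. Rough modern landscape: small models (1-8B) can run on a laptop, medium models (30-70B) need one or more high-end GPUs (or heavy quantization to fit on a single one), and frontier models (100B+) live in cloud datacenters.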
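The memory side of this can be estimated with simple arithmetic: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a sizing tool. It counts the weights only (no activations or cache), takes 1 GB as 10^9 bytes, and the precision table and function name are assumptions made for illustration.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name, n in [("8B", 8e9), ("70B", 70e9)]:
    for precision in ("fp16", "int4"):
        print(f"{name} @ {precision}: ~{weight_memory_gb(n, precision):.0f} GB")
# 8B @ fp16: ~16 GB
# 8B @ int4: ~4 GB
# 70B @ fp16: ~140 GB
# 70B @ int4: ~35 GB
```

These figures suggest why an 8B model quantized to 4 bits fits in laptop memory while a 70B model at 16 bits does not fit on a single typical GPU.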

Key Takeaways

LLMs are next-token predictors; everything else emerges from how that prediction is trained and prompted
Transformer self-attention lets each token weigh the other tokens in its context
Parameter count, training data, and alignment all shape capability
