Machine Learning

Continual Pre-Training

Continuing a pre-trained model's training on domain-specific data instead of starting from scratch, adapting it to the new domain while preserving its general capabilities.

Why It Matters

Continual pre-training costs far less than training from scratch and typically yields stronger domain performance than fine-tuning alone, making it the sweet spot for domain adaptation.

Example

Taking Llama 3 and continuing pre-training on 100B tokens of financial documents, producing a finance-specialized model that retains general capabilities.
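The core mechanic can be sketched with a toy stand-in: a tiny bigram language model where "continuing pre-training" means updating existing counts rather than resetting them. This is an illustrative assumption, not how Llama 3 is actually trained; in practice one resumes the same causal language modeling objective on the domain corpus using the original training framework.

```python
import math

class BigramLM:
    """Toy count-based bigram LM, standing in for a large pre-trained model."""

    def __init__(self):
        self.counts = {}  # {word: {next_word: count}}

    def train(self, text):
        # Training ADDS to existing counts instead of resetting them --
        # the toy analogue of continuing from a pre-trained checkpoint.
        words = text.split()
        for a, b in zip(words, words[1:]):
            self.counts.setdefault(a, {})
            self.counts[a][b] = self.counts[a].get(b, 0) + 1

    def avg_logprob(self, text):
        # Add-one-smoothed average log-probability per bigram;
        # higher means the model fits the text better.
        words = text.split()
        vocab = len(self.counts) + 1
        pairs = list(zip(words, words[1:]))
        total = 0.0
        for a, b in pairs:
            c = self.counts.get(a, {})
            total += math.log((c.get(b, 0) + 1) / (sum(c.values()) + vocab))
        return total / len(pairs)

# Hypothetical corpora: a "general" pre-training text and a domain text.
general = "the cat sat on the mat and the dog sat on the rug"
finance = "the bond yield rose and the bond price fell on the news"

lm = BigramLM()
lm.train(general)                 # initial "pre-training" on general text
before = lm.avg_logprob(finance)  # domain fit before adaptation
lm.train(finance)                 # continual pre-training on domain text
after = lm.avg_logprob(finance)   # domain fit after adaptation
print(f"before={before:.3f} after={after:.3f}")
```

The model's fit to the domain text improves (`after > before`) while its general-text counts are kept, not discarded, which is the essence of adapting without starting over.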

Think of it like...

Like a doctor doing a fellowship after residency — they already have broad medical knowledge and are now deepening expertise in a specialty.

Related Terms