Continual Pre-Training
Continuing a pre-trained model's training on new, domain-specific data rather than starting from scratch. It adapts the model to the new domain while preserving its general capabilities.
Why It Matters
Continual pre-training is cheaper than training from scratch and produces better domain models than fine-tuning alone — the sweet spot for domain adaptation.
Example
Taking Llama 3 and continuing its pre-training on 100B tokens of financial documents, producing a finance-specialized model that still retains its general capabilities.
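The core idea can be sketched without any real LLM. Below is a minimal toy illustration: a count-based bigram "language model" stands in for a pre-trained model, and continual pre-training is simply further training of the same model on domain text, with a small replay slice of the original data mixed in to limit forgetting. All corpora, names, and the replay trick's exact form here are illustrative assumptions, not Llama 3's actual recipe.

```python
# Toy sketch of continual pre-training using a bigram count model as a
# stand-in for a real LM. Corpora below are illustrative, not real data.
from collections import defaultdict

class BigramLM:
    """A tiny count-based bigram language model."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, corpus):
        # Update the existing counts in place: training *continues*,
        # it does not restart from scratch.
        for sentence in corpus:
            tokens = sentence.split()
            for a, b in zip(tokens, tokens[1:]):
                self.counts[a][b] += 1

    def prob(self, a, b):
        total = sum(self.counts[a].values())
        return self.counts[a][b] / total if total else 0.0

# 1) "Pre-training" on broad, general text.
general = ["the cat sat", "the dog ran", "the market opened"]
model = BigramLM()
model.train(general)

# 2) Continual pre-training: keep the same model, keep training on
#    domain text, and replay a little general data to reduce forgetting.
finance = ["the market rallied", "the market closed higher"]
replay = general[:1]          # small slice of the original corpus
model.train(finance + replay)

# Domain behavior strengthens...
print(round(model.prob("market", "rallied"), 2))  # 0.33
# ...while general knowledge is retained.
print(model.prob("the", "cat") > 0)               # True
```

The key detail is in `train`: it accumulates onto the existing counts rather than resetting them, which is the whole distinction between continual pre-training and training from scratch.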
Think of it like...
Like a doctor doing a fellowship after residency — they already have broad medical knowledge and are now deepening expertise in a specialty.
Related Terms
Pre-training
The initial phase of training a model on a large, general-purpose dataset before specializing it for specific tasks. Pre-training gives the model broad knowledge and capabilities.
Fine-Tuning
The process of taking a pre-trained model and further training it on a smaller, domain-specific dataset to specialize its behavior for a particular task or domain. Fine-tuning adjusts the model's weights to improve performance on the target task.
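To make "adjusts the model's weights" concrete, here is a deliberately tiny sketch: a one-parameter linear model plays the role of a pre-trained network, and fine-tuning is a few gradient steps on a small task-specific dataset. The initial weight, data, and learning rate are all illustrative assumptions.

```python
# Toy sketch of fine-tuning: a one-parameter linear model "pre-trained"
# to w = 1.0 is further trained on a small task dataset where the
# target relationship is y = 2x. Values are illustrative.

def sgd_step(w, x, y, lr=0.1):
    # One gradient-descent step on squared error (y - w*x)^2.
    grad = -2 * x * (y - w * x)
    return w - lr * grad

w = 1.0                                # weight from "pre-training"
task_data = [(1.0, 2.0), (2.0, 4.0)]   # target task wants w ≈ 2

for _ in range(50):                    # fine-tuning loop
    for x, y in task_data:
        w = sgd_step(w, x, y)

print(round(w, 2))  # ≈ 2.0: the pre-trained weight has been adapted
```

The contrast with continual pre-training is one of scale and objective: fine-tuning typically uses a much smaller, task-focused dataset, whereas continual pre-training keeps running the original pre-training objective on large amounts of new domain text.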
Transfer Learning
A technique where a model trained on one task is repurposed as the starting point for a model on a different but related task. Instead of training from scratch, you leverage knowledge the model has already acquired.
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks. Foundation models serve as the base upon which specialized applications are built.