CatBoost
A gradient boosting library by Yandex that handles categorical features natively, without requiring manual encoding. CatBoost also mitigates prediction shift and target leakage through its ordered boosting scheme.
Why It Matters
CatBoost simplifies the ML pipeline by eliminating the need for manual categorical encoding — a common source of bugs and data leakage.
Example
Training a model on a dataset with features like 'city,' 'product_category,' and 'day_of_week' without converting them to numbers first — CatBoost handles it natively.
Think of it like...
Like a chef who can work with any ingredient in its raw form — no need to pre-process or convert before cooking.
Related Terms
XGBoost
Extreme Gradient Boosting — an optimized implementation of gradient boosting known for its speed and accuracy, and historically one of the most successful algorithms in machine learning competitions on tabular data.
LightGBM
Light Gradient Boosting Machine — Microsoft's gradient boosting framework optimized for speed and efficiency. LightGBM uses histogram-based splitting and leaf-wise growth for faster training.
Gradient Boosting
An ensemble technique that builds models sequentially, where each new model focuses on correcting the errors made by previous models. It combines many weak learners into a single strong learner.
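The sequential error-correcting loop described above can be shown in a minimal from-scratch sketch. This is a simplified illustration, not any library's implementation: it boosts one-split "stumps" for regression with squared-error loss, where fitting each stump to the residuals is exactly fitting it to the negative gradient of the loss.

```python
# Minimal gradient boosting sketch: each round fits a one-split "stump"
# to the residuals (negative gradient of squared error) of the ensemble.

def fit_stump(x, residuals):
    """Find the single threshold split minimizing squared error on residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, rounds=20, lr=0.5):
    base = sum(y) / len(y)              # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # Residuals = negative gradient of squared error at current predictions.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it.
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

# Example: many weak stumps combine to learn a step function.
x = [1, 2, 3, 4, 5, 6]
y = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
model = gradient_boost(x, y)
```

Each stump alone is a weak learner (a single threshold), but because every round targets what the ensemble still gets wrong, their shrunken sum converges toward the target function.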
Ensemble Learning
A strategy that combines multiple models to produce better predictions than any single model alone. Ensemble methods leverage the diversity of different models to reduce errors.