🟪 1-Minute Summary
Cross-validation (CV) evaluates model performance by splitting data into K folds, training on K-1 folds and testing on the remaining fold, repeating K times so every fold is tested once. The K scores are averaged for a more robust estimate than a single train-test split. Common variants: K-Fold (K=5 or 10), Stratified K-Fold (preserves class distribution), Leave-One-Out. Use to detect overfitting and compare models fairly.
🟦 Core Notes (Must-Know)
What is Cross Validation?
Cross-validation is a resampling technique for estimating how well a model generalizes to unseen data. Instead of relying on one fixed split, the data is partitioned into K folds; the model is trained K times, each time holding out a different fold for evaluation, and the K scores are averaged. Every observation is used for both training and validation, so the estimate depends far less on one lucky (or unlucky) split. The basic loop is sketched below.
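A minimal sketch of that loop, using sklearn's KFold for the index bookkeeping (the iris dataset and logistic regression are placeholder choices, not from these notes):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)  # placeholder dataset
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kf.split(X):
    # Train on K-1 folds, evaluate on the held-out fold
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(f"Mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")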
Types of Cross Validation
The main variants differ in how the folds are constructed; instantiating each splitter is sketched after this list.
- K-Fold CV: split the data into K equal folds; each fold serves as the test set exactly once.
- Stratified K-Fold CV: like K-Fold, but every fold preserves the overall class proportions; the default for classification, especially with imbalanced classes.
- Leave-One-Out CV (LOOCV): K equals the number of samples, so each fold is a single observation; nearly unbiased but expensive and high-variance.
- Time Series CV: folds respect temporal order, training on the past and validating on the future, so the model never sees data from after the prediction point.
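All four splitters live in sklearn.model_selection; a quick sketch of how each is instantiated (the n_splits values are just typical defaults):

from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut, TimeSeriesSplit

kfold = KFold(n_splits=5, shuffle=True, random_state=42)          # general purpose
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # classification
loo = LeaveOneOut()                                               # K = number of samples
tscv = TimeSeriesSplit(n_splits=5)                                # time-ordered data
# Any of these can be passed as cv, e.g. cross_val_score(model, X, y, cv=skf)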
Why Use Cross Validation?
- A single train-test split gives one noisy number; CV averages K estimates, lowering the variance of the evaluation.
- Every observation is used for both training and validation, which matters most on small datasets.
- It exposes overfitting: a model that fits the training folds but scores poorly on held-out folds generalizes badly.
- It puts candidate models and hyperparameter settings on equal footing for fair comparison.
How to Choose K
- K=5 or K=10 is the standard choice; empirically these balance bias, variance, and compute well.
- Larger K: each model trains on more data, so the estimate is less pessimistic, but training runs K times and the fold estimates grow more correlated.
- Smaller K: cheaper, but each model sees less data, so scores tend to be pessimistic.
- LOOCV (K = n) suits very small datasets; it is usually too expensive otherwise.
A quick sanity check is to compare scores across a few values of K, as sketched below.
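A minimal sketch of that sanity check (the breast-cancer dataset and scaled logistic regression are placeholder choices); note that passing an integer as cv gives stratified folds for classifiers:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
for k in (3, 5, 10):  # compare the CV estimate across a few values of K
    scores = cross_val_score(model, X, y, cv=k)
    print(f"K={k:2d}: mean={scores.mean():.3f}, std={scores.std():.3f}")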
🟨 Interview Triggers (What Interviewers Actually Test)
Common Interview Questions
- “Explain how K-Fold cross-validation works”
  - [Answer: Split into K parts, train on K-1, test on the held-out part, repeat K times so each part is tested once, average the K scores]
- “Why use cross-validation instead of a single train-test split?”
  - [Answer: More robust estimate with less variance, and every observation is used for both training and validation]
- “When would you use stratified K-Fold?”
  - [Answer: Imbalanced classification, to preserve the class distribution in every fold]
- “What’s the tradeoff of increasing K?”
  - [Answer: Larger training sets give a less pessimistic estimate, but training runs K times, so computational cost grows]
🟥 Common Mistakes (Traps to Avoid)
Mistake 1: Using regular K-Fold for imbalanced data
Random folds on imbalanced data can contain very few (or even zero) minority-class samples, so per-fold scores become meaningless. Use StratifiedKFold so every fold mirrors the overall class proportions, as the sketch below shows.
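A small illustration with made-up labels (90 negatives, 10 positives, purely illustrative) showing how plain K-Fold can scatter the minority class unevenly while stratification keeps it balanced:

import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

y = np.array([0] * 90 + [1] * 10)  # toy imbalanced labels, illustrative only
X = np.zeros((100, 1))             # features don't affect the split itself
for name, splitter in [("KFold", KFold(n_splits=5, shuffle=True, random_state=0)),
                       ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True, random_state=0))]:
    positives = [int(y[test].sum()) for _, test in splitter.split(X, y)]
    print(f"{name}: positives per test fold = {positives}")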
Mistake 2: Not using CV for time series correctly
Randomly shuffled folds let the model train on the future and predict the past, inflating scores. Use TimeSeriesSplit, which always trains on earlier observations and validates on later ones, as sketched below.
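A minimal sketch of TimeSeriesSplit on 12 time-ordered points (the data is a placeholder); each training window strictly precedes its test window:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 observations in time order
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Training indices always come before test indices
    print(f"train={train_idx.tolist()} test={test_idx.tolist()}")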
Mistake 3: Preprocessing before CV split
Fitting a scaler, imputer, or feature selector on the full dataset before splitting leaks test-fold statistics into the training folds (data leakage) and biases scores upward. Wrap preprocessing in a Pipeline so it is re-fit on each training fold only, as sketched below.
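A sketch of the leak-free pattern (dataset and model are placeholder choices): because the whole Pipeline is what gets cross-validated, the scaler is fit inside each training fold only.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
# The scaler is re-fit on each training fold only, never on the test fold
leak_free = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(f"Leak-free CV accuracy: {cross_val_score(leak_free, X, y, cv=5).mean():.3f}")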
🟩 Mini Example (Quick Application)
Scenario
Evaluate a classifier with stratified 5-fold cross-validation and report the mean accuracy and its spread.
Solution
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = load_iris(return_X_y=True)  # placeholder dataset; any X, y works
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(f"Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
🔗 Related Topics
- Train-Test Split
- Bias-Variance Tradeoff
- Hyperparameter Tuning (GridSearchCV)
- Overfitting and Regularization