🟪 1-Minute Summary

Imbalanced data occurs when classes have very different frequencies (e.g., 95% no-fraud, 5% fraud). Problems: the model becomes biased toward the majority class, and accuracy is misleading. Solutions: (1) resampling (over-/undersampling), (2) better metrics (precision/recall/F1, not accuracy), (3) class weights, (4) anomaly detection. Choose based on data size and how costly it is to miss the minority class.


🟦 Core Notes (Must-Know)

What is Imbalanced Data?

A dataset where one class heavily outnumbers the other(s), e.g., 95:5 or 99:1. Common in fraud detection, medical diagnosis, churn prediction, and defect detection. The minority class is usually the one you actually care about.

Why It’s a Problem

Most classifiers minimize overall error, so they learn to favor the majority class: a model that predicts "no fraud" for every transaction scores 95%+ accuracy while catching zero fraud. The decision boundary gets pushed into the minority region, and plain accuracy hides the failure.
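
A minimal sketch of the accuracy trap, assuming scikit-learn; the 95/5 synthetic dataset is illustrative:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic 95% / 5% binary problem
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Baseline that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
pred = baseline.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))           # ~0.95, looks great
print("recall (minority):", recall_score(y_test, pred))    # 0.0, catches nothing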

Detection

Check the class distribution before modeling: count the labels (value_counts / np.bincount), compute the imbalance ratio (majority count / minority count), and confirm your splits are stratified. There is no fixed threshold, but as a rough rule anything beyond about 80/20 deserves attention, and 99/1 or worse needs explicit handling.
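
A quick check, assuming the labels are in a pandas Series; the label values here are illustrative:

import pandas as pd

y = pd.Series([0] * 9_800 + [1] * 200)   # stand-in labels: 2% positive

counts = y.value_counts()
print(counts)                                             # 0: 9800, 1: 200
print((counts / len(y)).round(3))                         # class proportions
print("imbalance ratio:", counts.max() / counts.min())    # 49.0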

Solutions

There is no single fix; combine the options below and validate with minority-aware metrics (a short sketch of class weights and SMOTE follows the list).

  1. Resampling: oversample the minority class (SMOTE, random oversampling) or undersample the majority
  2. Class weights: penalize minority-class errors more heavily (e.g., class_weight="balanced")
  3. Different algorithms: tree ensembles and gradient boosting often cope better, especially with their weighting options (e.g., scale_pos_weight)
  4. Anomaly detection: treat the minority class as outliers when it is extremely rare
  5. Different metrics: precision, recall, F1, PR-AUC instead of accuracy
  6. Ensemble methods: e.g., balanced bagging, EasyEnsemble (train on multiple balanced subsets)
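
A minimal sketch of options 1 and 2 above, assuming scikit-learn and imbalanced-learn; the dataset and hyperparameters are illustrative:

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=0)

# Option 2: class weights, no resampling
weighted = LogisticRegression(class_weight="balanced", max_iter=1000)
print("weighted F1:", cross_val_score(weighted, X, y, scoring="f1", cv=5).mean())

# Option 1: SMOTE inside an imblearn Pipeline, so resampling happens per CV fold only
smote_pipe = Pipeline([("smote", SMOTE(random_state=0)),
                       ("clf", LogisticRegression(max_iter=1000))])
print("SMOTE F1   :", cross_val_score(smote_pipe, X, y, scoring="f1", cv=5).mean())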

🟨 Interview Triggers (What Interviewers Actually Test)

Common Interview Questions

  1. “How do you handle imbalanced data?”

    • Start with evaluation: replace accuracy with precision/recall/F1/PR-AUC. Then try class weights, resampling (SMOTE or undersampling) applied to the training set only, and threshold tuning; for extremely rare classes, consider anomaly detection.
  2. “Why is accuracy bad for imbalanced data?”

    • It is dominated by the majority class: predicting the majority label for every example already scores the majority’s share (e.g., 95%) while missing every minority example.
  3. “What metrics would you use instead?”

    • Precision, recall, F1, ROC-AUC, and PR-AUC (precision-recall curves are usually more informative than ROC when positives are rare), plus the confusion matrix.
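A short sketch of those metrics in scikit-learn, assuming a fitted probabilistic classifier; the model and dataset here are only placeholders:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)[:, 1]

print(classification_report(y_test, clf.predict(X_test), digits=3))  # per-class precision/recall/F1
print("ROC-AUC:", roc_auc_score(y_test, y_score))
print("PR-AUC :", average_precision_score(y_test, y_score))          # focuses on the rare positive class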

🟥 Common Mistakes (Traps to Avoid)

Mistake 1: Using accuracy

Accuracy rewards predicting the majority class: with a 2% fraud rate, a model that flags nothing is 98% accurate and useless. Report precision, recall, F1, and PR-AUC, and inspect the confusion matrix.

Mistake 2: Resampling before train-test split

Resampling before the split causes data leakage: synthetic (or duplicated) minority examples derived from the same originals end up in both train and test, so test scores are inflated. Split first (stratified), then resample the training set only, or put the sampler inside an imblearn pipeline so it runs per fold. A sketch of the correct order follows.
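
A minimal sketch of the correct order, assuming imbalanced-learn is installed; the synthetic dataset is illustrative:

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.98, 0.02], random_state=0)

# 1) Split first; stratify keeps the ~2% positive rate in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2) Resample the training set only; the test set stays untouched
X_train_res, y_train_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
# WRONG: calling fit_resample(X, y) before the split leaks into the test set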


🟩 Mini Example (Quick Application)

Scenario

Fraud detection on transaction data where only ~2% of transactions are fraud. Goal: train a model that actually catches fraud, and report precision/recall rather than accuracy.

Solution

from imblearn.over_sampling import SMOTE
from sklearn.utils.class_weight import compute_class_weight
import numpy as np

# Assumes X_train, y_train come from an earlier stratified train-test split
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)   # ~2% -> 50/50
# Alternative: skip resampling and weight errors on the rare class more heavily
weights = compute_class_weight("balanced", classes=np.unique(y_train), y=y_train)
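
Fit the model on the resampled data (or pass the weights via class_weight) and evaluate recall, precision, F1, and PR-AUC on the untouched test set.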

