🟪 1-Minute Summary

Undersampling reduces the majority class so its size matches (or approaches) the minority class. Simple and fast. Types: random undersampling, Tomek links, NearMiss. Pros: faster training, balanced classes without synthetic data. Cons: discards information, may underfit. Use when you have abundant data and can afford to throw some away. Alternative: oversampling (e.g., SMOTE) when data is limited.


🟦 Core Notes (Must-Know)

What is Undersampling?

Undersampling balances an imbalanced classification dataset by removing examples from the majority class until its size is comparable to (or equal to) the minority class. Nothing is synthesized; you simply train on fewer majority-class rows. It is the mirror image of oversampling, which adds minority-class examples instead.
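
A minimal sketch of the idea with plain pandas on a toy 90/10 frame (column names and sizes are illustrative):

import pandas as pd

# Toy frame: label 0 is the majority class (90 rows), label 1 the minority (10 rows)
df = pd.DataFrame({"feature": range(100), "label": [0] * 90 + [1] * 10})

minority = df[df["label"] == 1]
majority = df[df["label"] == 0].sample(n=len(minority), random_state=0)  # keep only 10 majority rows
balanced = pd.concat([majority, minority]).sample(frac=1, random_state=0)  # shuffle rows
print(balanced["label"].value_counts())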

Types of Undersampling

Three common approaches; all have ready-made implementations in imbalanced-learn (see the sketch after this list).

  • Random undersampling: randomly drops majority-class rows until the target ratio is reached. Fast and simple, but it may discard informative examples.
  • Tomek links: removes the majority-class member of each Tomek link (a pair of nearest neighbours from opposite classes), cleaning up noisy points near the class boundary rather than forcing an exact balance.
  • NearMiss: keeps only the majority-class examples closest to minority-class examples (several distance-based versions exist), concentrating the retained data near the decision boundary.
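
A sketch of the three samplers from imbalanced-learn on a synthetic 90/10 dataset, printing the class counts each one leaves behind:

from collections import Counter
from imblearn.under_sampling import RandomUnderSampler, TomekLinks, NearMiss
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

for sampler in (RandomUnderSampler(random_state=0), TomekLinks(), NearMiss(version=1)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, Counter(y_res))  # class counts after resampling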

When to Use Undersampling

Undersample when the majority class is so large that discarding rows costs little: you have abundant data, training time or memory is a constraint, and a rough balance is enough for the model to learn the minority class. Avoid it when the dataset is small or every majority example carries signal; in that case prefer oversampling (e.g., SMOTE).
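
A quick check of the class counts and imbalance ratio before deciding (the labels below are hypothetical):

from collections import Counter
import numpy as np

y = np.array([0] * 9500 + [1] * 500)   # hypothetical labels: 9,500 majority vs 500 minority
counts = Counter(y)
ratio = max(counts.values()) / min(counts.values())
print(f"class counts: {dict(counts)}, imbalance ratio: {ratio:.0f}:1")
# Plenty of majority rows to spare here, so undersampling is a reasonable first step;
# with only a few hundred rows in total, oversampling would be the safer choice.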

Pros & Cons

  • Pros: faster training and lower memory use on a smaller dataset, balanced classes without generating synthetic samples, works with any downstream model.
  • Cons: discards real data, can drop informative majority examples, may underfit, and results can vary with the random subset chosen.

Undersampling vs Oversampling

Undersampling shrinks the majority class; oversampling grows the minority class by duplicating examples or synthesizing new ones (e.g., SMOTE). Undersample when data is abundant and training cost matters; oversample when data is limited and you cannot afford to throw examples away. The two are often combined, e.g., moderate undersampling of the majority class plus SMOTE on the minority class.
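
A side-by-side sketch of the two approaches on the same synthetic data (the 90/10 ratio is an assumption for illustration):

from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
print("original:", Counter(y))

X_u, y_u = RandomUnderSampler(random_state=0).fit_resample(X, y)   # shrink the majority class
X_o, y_o = SMOTE(random_state=0).fit_resample(X, y)                # synthesize minority examples
print("undersampled:", Counter(y_u), "| oversampled:", Counter(y_o))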


🟨 Interview Triggers (What Interviewers Actually Test)

Common Interview Questions

  1. “What’s the tradeoff of undersampling?”

    • You get balanced classes and faster training, but you throw away real majority-class data, which can discard informative examples and lead to underfitting.
  2. “When would you undersample vs oversample?”

    • Undersample when data is abundant and you can afford to discard majority examples; oversample (e.g., SMOTE) when data is limited and every example counts.

🟥 Common Mistakes (Traps to Avoid)

Mistake 1: Undersampling before train-test split

Undersampling the whole dataset before splitting distorts evaluation: the test set no longer reflects the real, imbalanced class distribution, so precision and recall measured on it will not match production, and tuning decisions leak information about the held-out data. Split (or set up cross-validation) first, undersample only the training portion, and evaluate on the untouched, imbalanced test set.
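
A sketch of the safe ordering using imbalanced-learn's Pipeline, which applies the sampler only during fitting; the dataset and the logistic-regression model are illustrative choices:

from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# The sampler runs only when the pipeline is fit, so each CV validation fold
# and the final test set keep their original, imbalanced distribution.
pipe = Pipeline([("under", RandomUnderSampler(random_state=0)),
                 ("clf", LogisticRegression(max_iter=1000))])
print(cross_val_score(pipe, X_train, y_train, scoring="recall", cv=5).mean())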

Mistake 2: Random undersampling losing important patterns

Random undersampling can delete exactly the majority-class examples that define the decision boundary or a rare but important subgroup, and nothing will warn you that it happened. Mitigations: keep a moderate class ratio instead of forcing 1:1, prefer informed methods (Tomek links, NearMiss) when the boundary matters, and always compare test-set metrics with and without undersampling.
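
One mitigation is to stop short of a full 1:1 balance; the sampling_strategy=0.5 setting below (two majority rows kept per minority row) is an illustrative choice, not a recommendation:

from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

# sampling_strategy=0.5 -> after resampling, the minority count is 50% of the majority count
rus = RandomUnderSampler(sampling_strategy=0.5, random_state=0)
X_res, y_res = rus.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))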


🟩 Mini Example (Quick Application)

Scenario

Balance an imbalanced binary dataset with random undersampling, applied only to the training split so the test set stays representative.

Solution

A minimal, self-contained version using a synthetic 90/10 dataset (the ratio is illustrative):

from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy imbalanced data (~90% / 10% classes); split first so the test set stays untouched
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Undersample the training set only; the resampled classes are balanced 1:1
X_res, y_res = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)

Fit the model on X_res, y_res and evaluate it on the untouched X_test, y_test.

