🟪 1-Minute Summary
Undersampling reduces the majority class to match minority class size. Simple and fast. Types: random undersampling, Tomek links, NearMiss. Pros: faster training, reduces class imbalance. Cons: loses information, may underfit. Use when you have abundant data and can afford to discard some. Alternative: oversampling (SMOTE) when data is limited.
🟦 Core Notes (Must-Know)
What is Undersampling?
Undersampling balances an imbalanced dataset by removing examples from the majority class until the class distribution is even (or close to it). Nothing is added or synthesized; majority samples are simply discarded, so the resulting training set is smaller but balanced.
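A minimal sketch of the idea, assuming a synthetic 90/10 binary dataset built with scikit-learn's make_classification (the exact counts are illustrative):

from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Synthetic binary dataset with roughly 90% majority / 10% minority
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("Before:", Counter(y))      # roughly {0: 900, 1: 100}

# Randomly drop majority samples until both classes match the minority size
X_res, y_res = RandomUnderSampler(random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))  # roughly {0: 100, 1: 100}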
Types of Undersampling
The main methods differ in how they choose which majority samples to remove (see the sketch after this list):
- Random undersampling: drop majority-class samples uniformly at random until the classes are balanced. Fast and simple, but it may throw away informative examples.
- Tomek links: remove majority samples that form a Tomek link with a minority sample (the two points are each other's nearest neighbors but belong to different classes). This cleans up the class boundary rather than forcing an exact 1:1 ratio.
- NearMiss: keep only the majority samples that are closest to the minority class by nearest-neighbor distance; versions 1-3 differ in how "closest" is defined.
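All three are available in imbalanced-learn. A sketch comparing the class counts they produce on the same synthetic 90/10 dataset as above (sampler parameters are illustrative):

from collections import Counter
from imblearn.under_sampling import NearMiss, RandomUnderSampler, TomekLinks
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

samplers = {
    "Random": RandomUnderSampler(random_state=42),  # uniform random removal down to 1:1
    "TomekLinks": TomekLinks(),                     # removes only boundary-pair majority samples
    "NearMiss-1": NearMiss(version=1),              # keeps majority samples nearest to the minority
}
for name, sampler in samplers.items():
    _, y_res = sampler.fit_resample(X, y)
    print(f"{name:12s}", Counter(y_res))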
When to Use Undersampling
Undersample when the majority class is large enough that you can discard samples and still have plenty of training data, when training time or memory is a constraint, or when the majority class contains many redundant, near-duplicate examples. Avoid it when the overall dataset is small, because removing samples can lead to underfitting.
Pros & Cons
- Pros: faster training and lower memory use, balanced classes without generating synthetic samples, simple to implement and reason about.
- Cons: discards potentially useful information, can underfit when too much data is removed, and (for random undersampling) results depend on which samples happen to be dropped.
Undersampling vs Oversampling
Undersampling shrinks the majority class; oversampling (random duplication or SMOTE) grows the minority class. Undersample when data is abundant and you can afford to discard some of it; oversample when data is limited and every sample matters. The two are also combined in practice (e.g., SMOTE followed by Tomek-link cleaning).
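A side-by-side sketch on the same synthetic 90/10 dataset, contrasting RandomUnderSampler with SMOTE (dataset and parameters are illustrative):

from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Undersampling: shrink the majority class down to the minority size
_, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
# Oversampling: synthesize minority samples up to the majority size
_, y_over = SMOTE(random_state=42).fit_resample(X, y)

print("Original:    ", Counter(y))        # roughly 900 vs 100
print("Undersampled:", Counter(y_under))  # roughly 100 vs 100 (smaller training set)
print("Oversampled: ", Counter(y_over))   # roughly 900 vs 900 (larger training set)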
🟨 Interview Triggers (What Interviewers Actually Test)
Common Interview Questions
- “What’s the tradeoff of undersampling?”
  - Answer: balanced classes and faster training, but you lose majority-class information; if the discarded samples carried useful signal, the model can underfit.
- “When would you undersample vs oversample?”
  - Answer: undersample when you have abundant (often redundant) majority-class data; oversample, e.g. with SMOTE, when data is limited and you cannot afford to discard samples.
🟥 Common Mistakes (Traps to Avoid)
Mistake 1: Undersampling before train-test split
Resample only the training data, after the train-test split (or inside each cross-validation fold). If you undersample the full dataset before splitting, the test set no longer reflects the real-world class distribution and the resampling decision leaks into evaluation, so the reported metrics are misleadingly optimistic.
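The safe pattern is to put the sampler inside imbalanced-learn's Pipeline so resampling happens only on the training portion of each fold; a sketch (model choice and scoring metric are illustrative):

from imblearn.pipeline import Pipeline  # imblearn's Pipeline knows how to handle samplers
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# The sampler runs only during fit, on the training folds; each held-out fold
# keeps its original class distribution, so nothing leaks into evaluation.
pipe = Pipeline([
    ("undersample", RandomUnderSampler(random_state=42)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5, scoring="f1").mean())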
Mistake 2: Random undersampling losing important patterns
With purely random removal you may drop exactly the majority samples that define the decision boundary or that cover rare sub-groups of the majority class. If performance drops sharply after undersampling, prefer informed methods (Tomek links, NearMiss) or balance only partially instead of forcing a strict 1:1 ratio.
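One mitigation is a milder sampling_strategy, which keeps more of the majority class; a sketch (the 0.5 ratio is an arbitrary example):

from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# sampling_strategy=0.5 means minority/majority = 0.5 after resampling,
# i.e. keep roughly twice as many majority as minority samples
rus = RandomUnderSampler(sampling_strategy=0.5, random_state=42)
_, y_res = rus.fit_resample(X, y)
print(Counter(y_res))  # majority roughly 2x the minority instead of 1:1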
🟩 Mini Example (Quick Application)
Scenario
You have features X and labels y for a binary classification problem where roughly 90% of the samples belong to the negative class. Balance the training set with random undersampling before fitting a model.
Solution
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import train_test_split

# Split first so the test set keeps the real class distribution
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Balance only the training set; fit the model on X_res, y_res
X_res, y_res = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)
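Train on X_res, y_res, but evaluate on the untouched X_test, y_test so the reported metrics reflect the original class imbalance.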
🔗 Related Topics
Navigation: