🟪 1-Minute Summary
Decision trees make predictions by learning if-then decision rules from the features, building a tree of conditions that routes each sample to a leaf. The data is split recursively so that each split maximizes node purity (measured by Gini impurity or entropy). Pros: highly interpretable when shallow, handle non-linear relationships, no feature scaling needed. Cons: prone to overfitting and unstable (small data changes can produce a very different tree). Control depth (or prune) to prevent overfitting.
🟦 Core Notes (Must-Know)
How Decision Trees Work
Starting at the root with all training data, the algorithm tries candidate splits (a feature plus a threshold) and greedily keeps the one that reduces impurity the most. Each child node is then split the same way, recursively, until a stopping condition is hit (maximum depth, too few samples, or a pure node). To predict, a new sample is routed down the tree by answering each node's condition; the leaf it reaches supplies the output: the majority class for classification, the mean target for regression.
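A minimal sketch (scikit-learn, made-up numbers) of the kind of if-then rule a tree learns:

from sklearn.tree import DecisionTreeClassifier

# Toy data (made up): feature = hours studied, label = passed the exam?
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print(clf.tree_.threshold[0])  # the single learned rule, roughly "hours <= 4.5"
print(clf.predict([[5]]))      # a new sample follows that rule to a leaf -> [1]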
Splitting Criteria
Each candidate split is scored by how much it reduces impurity relative to the parent node (the information gain); the best-scoring split wins. Common criteria (computed by hand in the sketch after this list):
- Gini impurity: 1 - Σ p_i², the probability of mislabeling a random sample drawn from the node; scikit-learn's default for classification.
- Entropy (Information Gain): -Σ p_i log₂ p_i; the gain is the parent's entropy minus the weighted average entropy of the children.
- MSE (for regression): the variance of the targets within a node; splits are chosen to minimize the weighted variance of the children.
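A quick check of both impurity measures and the information gain of a perfect split, assuming NumPy:

import numpy as np

def gini(labels):
    p = np.bincount(labels) / len(labels)
    return 1 - np.sum(p ** 2)

def entropy(labels):
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

parent = np.array([0, 0, 0, 1, 1, 1])                    # 3 vs 3 -> maximally impure
left, right = np.array([0, 0, 0]), np.array([1, 1, 1])   # a candidate split
gain = entropy(parent) - (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print(gini(parent), entropy(parent), gain)                # 0.5 1.0 1.0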
Tree Structure
A fitted tree contains three kinds of nodes (printed for a real tree in the sketch after this list):
- Root node: the first condition, applied to the entire training set.
- Internal nodes: further conditions applied to progressively smaller subsets of the data.
- Leaf nodes: terminal nodes with no further splits; each stores the prediction for samples that land in it.
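A sketch that dumps the structure as text (iris is used purely as an illustrative dataset):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# The first test printed is the root, indented tests are internal nodes,
# and the "class:" lines are the leaves holding the final predictions
print(export_text(tree, feature_names=load_iris().feature_names))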
Hyperparameters
The main knobs for limiting tree size (scikit-learn names; a tuning sketch follows this list):
- max_depth: maximum number of levels; the most direct brake on overfitting.
- min_samples_split: minimum number of samples a node must contain before it is allowed to split.
- min_samples_leaf: minimum number of samples every leaf must keep; smooths out tiny, noisy leaves.
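A sketch of tuning these hyperparameters with cross-validation; the grid values and dataset are illustrative only:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
param_grid = {
    "max_depth": [3, 5, 10, None],   # None = grow until pure (overfit risk)
    "min_samples_split": [2, 10, 50],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))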
Pros & Cons
Pros:
- Highly interpretable when shallow; the learned rules can be read off directly.
- Capture non-linear relationships and feature interactions automatically.
- No feature scaling or normalization required.
Cons:
- Overfit easily if depth and leaf size are not constrained.
- Unstable: small changes in the training data can produce a very different tree.
- A single tree is usually outperformed by an ensemble of trees (random forest, gradient boosting).
🟨 Interview Triggers (What Interviewers Actually Test)
Common Interview Questions
- “How does a decision tree decide where to split?”
  - It evaluates candidate splits and picks the one that maximizes information gain (entropy) or minimizes Gini impurity, i.e. the split that leaves the child nodes as pure as possible.
- “What’s the difference between Gini and Entropy?”
  - Both measure node impurity and usually pick very similar splits. Gini (1 - Σ p_i²) is slightly cheaper to compute; entropy (-Σ p_i log₂ p_i) underlies information gain. In practice the choice rarely changes the final tree much.
- “Why do decision trees overfit easily?”
  - Left unconstrained, a tree keeps splitting until its leaves are pure, so it can grow deep enough to memorize the training data, noise included.
- “How do you prevent overfitting in decision trees?”
  - Limit max_depth, raise min_samples_split / min_samples_leaf, prune the tree (e.g. cost-complexity pruning, sketched below), or use an ensemble of trees.
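A pruning sketch using scikit-learn's cost-complexity pruning; the ccp_alpha value here is illustrative, in practice pick it via cost_complexity_pruning_path or cross-validation:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), random_state=0)
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)
# Pruning trades a little training fit for a smaller, better-generalizing tree
for name, model in [("unpruned", full), ("pruned", pruned)]:
    print(name, model.get_n_leaves(), round(model.score(X_test, y_test), 3))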
🟥 Common Mistakes (Traps to Avoid)
Mistake 1: Not limiting tree depth
By default a tree keeps splitting until every leaf is pure, which usually means it memorizes the training set (noise included) and generalizes poorly. Always cap max_depth, min_samples_leaf, or ccp_alpha, and compare train vs. test scores, as illustrated below.
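A quick illustration on synthetic data of the train/test gap an unlimited-depth tree produces, and how capping max_depth narrows it:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for depth in [None, 4]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # Unlimited depth typically scores ~1.0 on train but noticeably lower on test
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_te, y_te), 3))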
Mistake 2: Using decision tree without ensemble
A single tree has high variance: resampling or slightly perturbing the training data can yield a completely different tree and different predictions. When accuracy matters more than having one readable tree, prefer an ensemble (random forest, gradient boosting), which averages many trees to cancel out that variance.
Mistake 3: Treating decision trees as always interpretable
“Interpretable” only holds for shallow trees. A tree that is 20 levels deep with thousands of nodes is no easier to explain than a black-box model, so constrain tree size before promising interpretability.
🟩 Mini Example (Quick Application)
Scenario
Predict whether a customer will churn based on their account and usage features.
Solution
A minimal sketch; make_classification stands in for a real churn dataset (in practice, load your own feature table and labels):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
X, y = make_classification(n_samples=1000, n_features=6, random_state=42)  # stand-in churn data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("Test accuracy:", round(clf.score(X_test, y_test), 3))
plot_tree(clf, filled=True)  # visualize the learned splits from the root down
plt.show()
🔗 Related Topics
Navigation: