🟪 1-Minute Summary

Decision trees make predictions by learning if-then decision rules from features, organized as a tree of conditions. The data is split recursively, with each split chosen to maximize the purity of the resulting child nodes (measured with Gini impurity or entropy). Pros: highly interpretable, handles non-linear relationships, no feature scaling needed. Cons: prone to overfitting, unstable (small data changes can produce a very different tree). Control depth to prevent overfitting.


🟦 Core Notes (Must-Know)

How Decision Trees Work

A decision tree predicts by asking a sequence of questions about the features. Training starts with all samples at the root, greedily picks the feature and threshold that best separates the targets, and repeats the process on each resulting subset until a stopping condition is met (pure node, depth limit, or too few samples). Prediction walks a new sample from the root down to a leaf and returns that leaf's majority class (or mean value, for regression).
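
A fitted tree is equivalent to nested if/then rules. A minimal hand-written sketch of what such rules could look like for churn (feature names and thresholds are hypothetical, purely for illustration):

# What a small depth-2 churn tree amounts to, written out by hand (hypothetical features)
def predict_churn(tenure_months, monthly_charges):
    if tenure_months < 12:              # root node split
        if monthly_charges > 70:        # internal node split
            return "churn"              # leaf
        return "stay"                   # leaf
    return "stay"                       # leaf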

Splitting Criteria

At every node the algorithm evaluates candidate splits (feature + threshold) and keeps the one that most reduces impurity in the children, weighted by child size. Common impurity measures (a small worked sketch follows the list below):

  • Gini impurity: 1 - Σ p_k^2 (the chance of misclassifying a random sample drawn from the node); scikit-learn's default
  • Entropy (Information Gain): -Σ p_k log2(p_k); the split with the largest drop in entropy (the information gain) wins
  • MSE (for regression): splits are chosen to minimize the variance of the target within each child node
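
A minimal sketch of how a candidate split is scored with these criteria: compute the parent impurity, subtract the size-weighted impurity of the children, and keep the candidate with the largest decrease. Toy labels below, not library internals:

import numpy as np

def gini(y):
    p = np.bincount(y) / len(y)        # class proportions
    return 1 - np.sum(p ** 2)          # Gini impurity: 1 - sum(p_k^2)

def entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))     # entropy: -sum(p_k * log2(p_k))

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])    # 10 labels at a node
left, right = parent[:4], parent[4:]                 # one candidate split
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print("impurity decrease:", gini(parent) - weighted)  # 0.48 here: a very good split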

Tree Structure

A trained tree is a hierarchy of nodes: each non-leaf node stores one test (feature vs. threshold) and each leaf stores a prediction (an inspection sketch follows the list):

  • Root node: the top of the tree; holds all training samples and the first (most informative) split
  • Internal nodes: intermediate decision points, each with one feature/threshold test and two children
  • Leaf nodes: terminal nodes with no further splits; output the majority class (or mean target for regression)
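
On a fitted scikit-learn tree the structure can be inspected directly; a quick sketch using the iris dataset as a stand-in:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text dump: the first line is the root split, indented lines are internal
# nodes, and the "class:" lines are leaves
print(export_text(clf, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
print("nodes:", clf.tree_.node_count, "| leaves:", clf.get_n_leaves())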

Hyperparameters

The main knobs control tree size and therefore the bias/variance trade-off; smaller trees are simpler and less prone to overfitting (a tuning sketch follows the list):

  • max_depth: maximum number of levels; the single most effective regularizer
  • min_samples_split: minimum samples a node must contain before it may be split further
  • min_samples_leaf: minimum samples required in every leaf; smooths decisions near the boundaries
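
These settings interact, so they are usually tuned together with cross-validation; a sketch with GridSearchCV on synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 10, 50],
    "min_samples_leaf": [1, 5, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))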

Pros & Cons

Pros: highly interpretable when shallow (splits read like business rules), captures non-linear relationships and feature interactions, and requires no feature scaling. Cons: overfits easily when grown deep, is unstable (small data changes can produce a very different tree), makes only axis-aligned splits, and a single tree is usually outperformed by ensembles such as random forests or gradient boosting.


🟨 Interview Triggers (What Interviewers Actually Test)

Common Interview Questions

  1. “How does a decision tree decide where to split?”

    • Answer: It evaluates candidate feature/threshold pairs and keeps the split that maximizes information gain (entropy) or, equivalently, gives the largest decrease in Gini impurity.
  2. “What’s the difference between Gini and Entropy?”

    • Answer: Both measure node impurity and usually lead to very similar trees. Gini (1 - Σ p_k^2) is slightly cheaper to compute because it avoids logarithms, which is why it is scikit-learn's default; entropy (-Σ p_k log2 p_k) penalizes mixed nodes a bit more strongly.
  3. “Why do decision trees overfit easily?”

    • Answer: Left unconstrained, a tree keeps splitting until its leaves are pure, so it can memorize the training data (including noise) and generalize poorly.
  4. “How do you prevent overfitting in decision trees?”

    • Answer: Limit max_depth, raise min_samples_split / min_samples_leaf, apply cost-complexity pruning (ccp_alpha), or move to an ensemble (see the sketch below).
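
A quick sketch backing answers 3 and 4: with some label noise, an unconstrained tree memorizes the training set while a depth-limited one generalizes better (synthetic data, exact numbers will vary):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_informative=5, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in [None, 3]:   # None = grow until leaves are pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={clf.score(X_tr, y_tr):.2f}  test={clf.score(X_te, y_te):.2f}")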

🟥 Common Mistakes (Traps to Avoid)

Mistake 1: Not limiting tree depth

By default scikit-learn grows the tree until every leaf is pure, which almost always overfits. Set max_depth (or ccp_alpha / min_samples_leaf) and validate on held-out data; train accuracy near 100% with a much lower test accuracy is the tell-tale sign.

Mistake 2: Using decision tree without ensemble

A single tree has high variance: resampling or slightly perturbing the data can flip the early splits and change the whole structure. When predictive accuracy matters more than having one readable tree, prefer ensembles (random forest, gradient boosting), which average away much of that variance.
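
A minimal comparison, assuming accuracy (not a single readable tree) is the goal; on noisy synthetic data a random forest typically scores higher and varies less across folds:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, flip_y=0.1, random_state=0)
for model in [DecisionTreeClassifier(random_state=0), RandomForestClassifier(random_state=0)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3), "+/-", round(scores.std(), 3))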

Mistake 3: Treating decision trees as always interpretable

Interpretability is only real for shallow trees. An unconstrained tree can have hundreds or thousands of nodes and is no more explainable than a black-box model; if interpretability is the selling point, cap the depth and check that the resulting rules actually make sense.
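
A quick way to check whether a tree is still readable is to count its nodes; an unconstrained tree on noisy data easily ends up with hundreds (synthetic-data sketch):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, flip_y=0.2, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)    # no depth limit
print("nodes:", clf.tree_.node_count, "depth:", clf.get_depth())  # far too big to read as rules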


🟩 Mini Example (Quick Application)

Scenario

Predict customer churn from account features (e.g. tenure, monthly charges, contract type) and show stakeholders the learned rules as a tree diagram.

Solution

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Synthetic stand-in for a churn table (real features: tenure, charges, contract, ...)
X, y = make_classification(n_samples=1000, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=42).fit(X_train, y_train)  # shallow = readable
print("Test accuracy:", clf.score(X_test, y_test))
plot_tree(clf, filled=True, class_names=["stay", "churn"])  # tree visualization for stakeholders
plt.show()

