Null and Missing Value Treatment

🟪 1-Minute Summary

Missing values are inevitable in real datasets. Treatment options: (1) Drop rows or columns (a common rule of thumb: less than about 5% missing and the data are MCAR), (2) Impute (mean or median for numerical features, mode for categorical features, or advanced methods such as KNN or MICE), (3) Add a missing indicator (when the missingness itself is informative). The choice depends on the missingness mechanism (MCAR, MAR, MNAR) and the percentage missing.

🟦 Core Notes (Must-Know)

Types of Missingness

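MCAR (Missing Completely At Random): the probability that a value is missing is unrelated to any data, observed or unobserved.
MAR (Missing At Random): the probability that a value is missing depends only on other observed variables, not on the missing value itself.
MNAR (Missing Not At Random): the probability that a value is missing depends on the unobserved value itself.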
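
To make the three mechanisms concrete, here is a minimal simulation sketch. The column names (`age`, `income`), the masking probabilities, and the cutoffs are all hypothetical choices for illustration, not values from these notes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical synthetic data: age and income are generated independently
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
})

# MCAR: every income value has the same 20% chance of being masked
mcar_mask = rng.random(n) < 0.20

# MAR: masking probability depends on an observed column (age), not on income
mar_mask = rng.random(n) < np.where(df["age"] > 60, 0.40, 0.10)

# MNAR: masking probability depends on the income value that gets hidden
mnar_mask = rng.random(n) < np.where(df["income"] > 65_000, 0.50, 0.05)

for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    observed_mean = df.loc[~mask, "income"].mean()
    print(f"{name}: {mask.mean():.1%} missing, observed mean income = {observed_mean:,.0f}")
```

In this setup the MNAR mask hides high incomes more often, so the mean of the remaining observed values falls below the true mean. That is the kind of bias that simple imputation cannot undo.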

Detection Strategies

[Content to be filled in]
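
One possible detection pass, sketched on a hypothetical toy DataFrame: first quantify missingness per column, then check whether missingness in one column tracks the values of another observed column (a hint that the data may not be MCAR). The data and column names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "age": [22, 35, 58, 64, 41, 70, 29, 66],
    "income": [40_000, 52_000, np.nan, np.nan, 61_000, np.nan, 45_000, 58_000],
    "city": ["NY", None, "LA", "SF", "NY", "LA", None, "SF"],
})

# Per-column missing count and percentage
summary = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
})
print(summary)

# Does missingness in income track an observed column?
# Compare mean age between rows where income is missing and rows where it is present.
income_missing = df["income"].isna().rename("income_missing")
age_by_income_missing = df.groupby(income_missing)["age"].mean()
print(age_by_income_missing)
```

A large gap between the two group means suggests that missingness depends on `age`, which is consistent with MAR rather than MCAR. It does not rule out MNAR, which cannot be confirmed from the observed data alone.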

Treatment Options

Option 1: Drop

[Content to be filled in]
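
A minimal sketch of dropping with pandas, assuming a hypothetical toy DataFrame and an arbitrary 50% column cutoff chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "age": [25, 31, np.nan, 47, 52, 38],
    "income": [40_000, np.nan, 55_000, 61_000, 58_000, 49_000],
    "notes": [np.nan, np.nan, np.nan, "vip", np.nan, np.nan],
})

# Drop columns that are mostly empty (the 50% cutoff is an assumption)
col_missing = df.isna().mean()
df_cols = df.loc[:, col_missing <= 0.5]

# Drop rows with a missing value in the columns the model needs
df_rows = df_cols.dropna(subset=["age", "income"])

print(f"kept {len(df_rows)} of {len(df)} rows")
```

Reporting how many rows survive makes the cost of dropping visible before committing to it.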

Option 2: Simple Imputation

[Content to be filled in]
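
A short sketch of simple imputation: median for a numerical column via scikit-learn's `SimpleImputer`, and mode for a categorical column via pandas. The DataFrame and its columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 51.0, 30.0],
    "city": ["NY", "LA", None, "NY", "SF"],
})

df_imputed = df.copy()

# Numerical: fill with the column median
num_imputer = SimpleImputer(strategy="median")
df_imputed["age"] = num_imputer.fit_transform(df[["age"]]).ravel()

# Categorical: fill with the most frequent value (mode)
city_mode = df["city"].mode()[0]
df_imputed["city"] = df["city"].fillna(city_mode)

print(df_imputed)
```

Median is often preferred over mean when the column is skewed or has outliers, since a few extreme values do not move it.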

Option 3: Advanced Imputation

[Content to be filled in]
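
The sketch below compares two advanced imputers from scikit-learn on synthetic data: `KNNImputer` and `IterativeImputer`. Note the assumptions: `IterativeImputer` is marked experimental in scikit-learn and is inspired by MICE but returns a single imputation rather than multiple ones, and the synthetic data and parameters here are illustrative only.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before the next import)
from sklearn.impute import IterativeImputer, KNNImputer

rng = np.random.default_rng(0)

# Hypothetical synthetic data: column 2 is a noisy linear function of columns 0 and 1
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Mask about 20% of column 2 completely at random
X_missing = X.copy()
mask = rng.random(200) < 0.2
X_missing[mask, 2] = np.nan

X_knn = KNNImputer(n_neighbors=5).fit_transform(X_missing)
X_iter = IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing)


def rmse(filled):
    """Root mean squared error on the masked cells only."""
    return float(np.sqrt(np.mean((filled[mask, 2] - X[mask, 2]) ** 2)))


# Baseline: fill every masked cell with the observed column mean
X_mean = X_missing.copy()
X_mean[mask, 2] = np.nanmean(X_missing[:, 2])

print(f"mean fill RMSE:      {rmse(X_mean):.3f}")
print(f"KNN RMSE:            {rmse(X_knn):.3f}")
print(f"iterative fill RMSE: {rmse(X_iter):.3f}")
```

Because the masked column is strongly related to the other two, both model-based imputers should recover it far better than the mean baseline. On data without such relationships the gain can be small.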

Option 4: Missing Indicator Feature

[Content to be filled in]
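
Two ways to add a missing indicator, sketched on a hypothetical `income` column: a manual flag in pandas, and scikit-learn's `SimpleImputer` with `add_indicator=True`, which appends indicator columns to the imputed output.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data
df = pd.DataFrame({"income": [40_000.0, np.nan, 55_000.0, np.nan, 61_000.0]})

# pandas: record the flag first, then impute
df["income_missing"] = df["income"].isna().astype(int)
df["income_filled"] = df["income"].fillna(df["income"].median())

# scikit-learn: imputed values in column 0, indicator in column 1
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_out = imputer.fit_transform(df[["income"]])

print(df)
print(X_out)
```

The flag has to be created before the fill; once the gaps are imputed, the information about which rows were missing is gone.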

Decision Framework

[Content to be filled in]
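
One way to encode the rules of thumb from the summary as code. This is only a sketch: the function name `suggest_treatment`, its arguments, and the returned labels are hypothetical, and the 5% cutoff is the heuristic quoted in these notes, not a hard rule.

```python
def suggest_treatment(pct_missing: float, mechanism: str, informative: bool = False) -> str:
    """Return a rough treatment suggestion for one feature.

    pct_missing: percentage of missing values (0 to 100)
    mechanism:   assumed missingness mechanism: "MCAR", "MAR", or "MNAR"
    informative: True if the fact that a value is missing carries signal
    """
    mechanism = mechanism.upper()
    if mechanism not in {"MCAR", "MAR", "MNAR"}:
        raise ValueError(f"unknown mechanism: {mechanism}")

    # Informative missingness: keep the signal with an indicator feature
    if informative:
        return "impute + missing indicator"

    # Small amount of purely random missingness: dropping is usually safe
    if mechanism == "MCAR" and pct_missing < 5:
        return "drop rows"

    # Everything else: impute (simple or advanced, depending on the data)
    return "impute"


print(suggest_treatment(3, "MCAR"))
print(suggest_treatment(30, "MAR"))
print(suggest_treatment(30, "MNAR", informative=True))
```

In practice the mechanism is an assumption informed by domain knowledge, so the output is a starting point for discussion rather than a verdict.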

🟨 Interview Triggers (What Interviewers Actually Test)

Common Interview Questions

“30% of values are missing in a key feature. What do you do?”
- [Answer framework: Check if missingness is informative first]
“When would you drop rows vs impute?”
- [Answer: Drop if MCAR and < 5%, otherwise impute]
“What’s wrong with always using mean imputation?”
- [Answer: Reduces variance, doesn’t work for MNAR, ignores relationships]
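
The variance point in the last answer can be demonstrated with a quick simulation. The data below are synthetic and the 30% MCAR masking rate is an assumption chosen to mirror the first question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical synthetic feature x and a target y that depends on it
x = pd.Series(rng.normal(50, 10, size=1_000))
y = 2 * x + rng.normal(0, 5, size=1_000)

# Mask about 30% of x completely at random, then fill with the observed mean
x_missing = x.copy()
x_missing[rng.random(1_000) < 0.3] = np.nan
x_mean_filled = x_missing.fillna(x_missing.mean())

print(f"std before: {x.std():.2f}, after mean imputation: {x_mean_filled.std():.2f}")
print(f"corr with y before: {x.corr(y):.3f}, after: {x_mean_filled.corr(y):.3f}")
```

Every filled cell sits exactly at the mean, so the spread shrinks and the relationship with `y` is diluted, even though the missingness here is MCAR.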

🟥 Common Mistakes (Traps to Avoid)

Mistake 1: Always using mean/median imputation

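Mean or median imputation shrinks the feature's variance, ignores relationships with other features, and does not correct the bias when data are MNAR.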

Mistake 2: Dropping rows before checking patterns

[Content to be filled in]

Mistake 3: Imputing before train-test split

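Fitting the imputer on the full dataset before splitting lets test-set statistics (for example, the column mean) leak into training, which is data leakage. Split first, fit the imputer on the training set only, then apply it to both sets.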
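
A leakage-safe pattern, sketched with scikit-learn on synthetic data: split first, then put the imputer inside a pipeline so it is fitted on the training rows only. The data, model choice, and parameters are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Hypothetical synthetic data; the label is computed before values are masked
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # mask about 10% of cells

# Split BEFORE any imputation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The pipeline fits the imputer on X_train only, then reuses those medians on X_test
model = make_pipeline(SimpleImputer(strategy="median"), LogisticRegression())
model.fit(X_train, y_train)

print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

Wrapping the imputer in a pipeline also keeps cross-validation honest, since the imputer is refitted inside each training fold.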

🟩 Mini Example (Quick Application)

Scenario

[Missing value treatment example]

Solution

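import numpy as np
import pandas as pd
import missingno as msno  # third-party: pip install missingno
from sklearn.impute import SimpleImputer, KNNImputer

# Hypothetical toy data so the snippet runs on its own; replace with your DataFrame
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51, np.nan],
    "city": ["NY", "LA", None, "NY", "SF"],
})

# Detection: missing count and percentage per column
print(df.isnull().sum())
print(df.isnull().mean() * 100)  # Percentage

# Visualization: one row per record, gaps mark missing cells
msno.matrix(df)

# Treatment examples to be filled in...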

Arun Murali
