Data Science Interview Cheat Sheets
Quick reference guides organized by topic. These are meant for last-minute review before interviews.
📊 Statistics & Probability
Cheat Sheet: Descriptive Statistics
| Metric | Formula | Use Case |
|---|---|---|
| Mean | Σx / n | Central tendency, continuous data |
| Median | Middle value | Skewed distributions, outliers present |
| Mode | Most frequent value | Categorical data |
| Variance | Σ(x - μ)² / n | Data spread |
| Std Dev | √Variance | Same units as the data |
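A minimal Python sketch of these statistics, assuming NumPy and the standard-library `statistics` module; the sample data is made up for illustration:

```python
# Minimal sketch: the descriptive statistics above with NumPy and
# the standard-library statistics module. Sample data is made up.
import numpy as np
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Mean:    ", np.mean(data))          # Σx / n
print("Median:  ", np.median(data))        # middle value
print("Mode:    ", statistics.mode(data))  # most frequent value
print("Variance:", np.var(data))           # Σ(x - μ)² / n  (population variance, ddof=0)
print("Std Dev: ", np.std(data))           # √Variance
```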
Cheat Sheet: Probability Distributions
| Distribution | Type | Parameters | Use Case |
|---|---|---|---|
| Normal | Continuous | μ, σ | Natural phenomena, errors |
| Binomial | Discrete | n, p | Success/failure trials |
| Poisson | Discrete | λ | Rare events over time |
| Uniform | Continuous | a, b | Equal probability |
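A minimal sketch of the same four distributions via `scipy.stats`; the parameter values are arbitrary and chosen only for illustration:

```python
# Minimal sketch: the four distributions above via scipy.stats.
from scipy import stats

normal   = stats.norm(loc=0, scale=1)      # μ = 0, σ = 1
binomial = stats.binom(n=10, p=0.3)        # 10 trials, p(success) = 0.3
poisson  = stats.poisson(mu=2.5)           # λ = 2.5 events per interval
uniform  = stats.uniform(loc=0, scale=5)   # equal probability on [0, 5]

print(normal.pdf(0))        # density at x = 0
print(binomial.pmf(3))      # P(exactly 3 successes)
print(poisson.pmf(0))       # P(no events in the interval)
print(uniform.rvs(size=3))  # three random draws
```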
🔬 Hypothesis Testing Decision Tree
```text
Start: Do you have a question about relationships?
│
├─ YES → What type of data?
│   │
│   ├─ Categorical vs Categorical → Chi-Square Test
│   │
│   ├─ Numerical vs Categorical (2 groups) → t-test
│   │     ├─ Known population σ? → Z-test
│   │     └─ Unknown σ, small sample → t-test
│   │
│   ├─ Numerical vs Categorical (3+ groups) → ANOVA
│   │
│   └─ Numerical vs Numerical → Correlation / Regression
│
└─ NO → EDA / Descriptive Statistics
```
Cheat Sheet: Hypothesis Tests Comparison
| Test | Data Types | Null Hypothesis | When to Use |
|---|---|---|---|
| Chi-Square | Cat vs Cat | No association | Independence, goodness-of-fit |
| t-test | Num vs Cat (2 groups) | Means are equal | Compare 2 group means |
| Z-test | Num vs Cat (2 groups) | Means are equal | Large sample, known σ |
| ANOVA | Num vs Cat (3+ groups) | All means equal | Compare 3+ group means |
| F-test | Num vs Cat (2 groups) | Variances are equal | Compare two groups' variances |
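A minimal sketch of the most common calls behind this table, using `scipy.stats`; the group data and contingency table are made up:

```python
# Minimal sketch of the tests in the comparison table, using scipy.stats.
# All data below is invented purely to show the function calls.
import numpy as np
from scipy import stats

group_a = np.array([5.1, 4.9, 6.2, 5.8, 5.5])
group_b = np.array([6.5, 7.1, 6.8, 7.4, 6.9])
group_c = np.array([4.2, 4.8, 4.5, 4.9, 4.4])

# t-test: compare the means of two groups
t_stat, p_ttest = stats.ttest_ind(group_a, group_b)

# ANOVA: compare the means of three or more groups
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Chi-square test of independence on a 2x2 contingency table
table = np.array([[30, 10], [20, 40]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(p_ttest, p_anova, p_chi2)
```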
🤖 Machine Learning Algorithms
Cheat Sheet: Supervised Learning Algorithm Selection
| Algorithm | Problem Type | Pros | Cons | When to Use |
|---|---|---|---|---|
| Linear Regression | Regression | Fast, interpretable | Assumes linearity | Linear relationships |
| Logistic Regression | Classification | Interpretable, probabilities | Linear boundary | Binary/multi-class, need probabilities |
| Decision Tree | Both | Non-linear, interpretable | Overfits easily | Complex patterns, explainability needed |
| Random Forest | Both | Reduces overfitting, robust | Slow, black box | High accuracy, less interpretability OK |
| KNN | Both | Simple, no training | Slow prediction, sensitive to scale | Small datasets, simple patterns |
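A minimal sketch comparing these algorithms with scikit-learn and 5-fold cross-validation; the dataset and hyperparameters are arbitrary illustration choices, and scale-sensitive models are wrapped in `StandardScaler`:

```python
# Minimal sketch: comparing the supervised algorithms above with
# scikit-learn cross-validation on a built-in toy dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    print(f"{name:20s} accuracy = {scores.mean():.3f}")
```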
Cheat Sheet: Clustering Algorithms
| Algorithm | Type | Pros | Cons | When to Use |
|---|---|---|---|---|
| K-Means | Partitioning | Fast, scalable | Need to set K, spherical clusters | Large datasets, known # clusters |
| Hierarchical | Agglomerative/Divisive | No need to set K, dendrogram | Slow, memory intensive | Small datasets, explore # clusters |
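A minimal sketch of both clustering approaches in scikit-learn on synthetic blob data (the number of clusters is assumed known here):

```python
# Minimal sketch: K-Means vs. hierarchical (agglomerative) clustering
# with scikit-learn on synthetic blob data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)  # both methods are distance-based, so scale first

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print(kmeans_labels[:10])  # cluster assignments for the first 10 points
print(hier_labels[:10])
```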
📈 Model Evaluation Metrics
Cheat Sheet: Regression Metrics
| Metric | Formula | Range | Interpretation | When to Use |
|---|---|---|---|---|
| RMSE | √(Σ(y - ŷ)² / n) | [0, ∞) | Same units as target | Penalize large errors |
| MAE | Σ\|y - ŷ\| / n | [0, ∞) | Same units as target | Treat all errors equally |
| MAPE | (100/n) * Σ\|y - ŷ\| / \|y\| | [0, ∞)% | Percentage error | Relative error important |
| R² | 1 - (SS_res / SS_tot) | (-∞, 1] | Variance explained | Model comparison |
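A minimal sketch computing these metrics with scikit-learn and NumPy; `y_true` and `y_pred` are made-up values:

```python
# Minimal sketch: the regression metrics above via scikit-learn / NumPy.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.5])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))        # penalizes large errors
mae  = mean_absolute_error(y_true, y_pred)                # treats all errors equally
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # relative (%) error
r2   = r2_score(y_true, y_pred)                           # variance explained

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  MAPE={mape:.1f}%  R²={r2:.3f}")
```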
Cheat Sheet: Classification Metrics
| Metric | Formula | Range | When to Use |
|---|---|---|---|
| Accuracy | (TP + TN) / Total | [0, 1] | Balanced classes |
| Precision | TP / (TP + FP) | [0, 1] | Minimize false alarms |
| Recall | TP / (TP + FN) | [0, 1] | Find all positives (e.g., disease detection) |
| F1-Score | 2 * (Prec * Rec) / (Prec + Rec) | [0, 1] | Balance precision & recall |
| AUC-ROC | Area under ROC curve | [0, 1] | Overall classifier performance |
Confusion Matrix Quick Reference
```text
                 Predicted
                 Pos    Neg
Actual   Pos      TP     FN
         Neg      FP     TN
```
- Precision = “Of all predicted positives, how many were correct?”
- Recall = “Of all actual positives, how many did we find?”
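A minimal sketch of the classification metrics and confusion matrix in scikit-learn; the labels and scores are made up. Note that scikit-learn orders the matrix as [[TN, FP], [FN, TP]] (rows are actual, columns are predicted, class 0 first), which differs from the layout above:

```python
# Minimal sketch: classification metrics and the confusion matrix
# with scikit-learn. y_true, y_pred, and y_score are invented values.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_true  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred  = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1])  # predicted probabilities

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))
print("AUC-ROC:  ", roc_auc_score(y_true, y_score))    # uses scores, not hard labels
print(confusion_matrix(y_true, y_pred))                # [[TN, FP], [FN, TP]]
```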
🎯 Overfitting vs Underfitting
| Aspect | Underfitting | Good Fit | Overfitting |
|---|---|---|---|
| Training Error | High | Low | Very low |
| Validation Error | High | Low | High |
| Model Complexity | Too simple | Just right | Too complex |
| What's happening | Not learning patterns | Learning generalizable patterns | Memorizing noise |
| Fix | Add features, use a more complex model | ✅ Good to go | Regularization, more data, simpler model |
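A minimal sketch that makes the table concrete: sweep a decision tree's `max_depth` and compare training vs. cross-validated accuracy (the dataset and depth grid are arbitrary illustration choices):

```python
# Minimal sketch: diagnosing under/overfitting by sweeping model complexity
# (tree depth) and comparing training vs. cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import validation_curve

X, y = load_breast_cancer(return_X_y=True)
depths = [1, 2, 4, 8, 16]

train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # underfitting shows up as low scores on both; overfitting as a gap
    # between high training accuracy and lower validation accuracy
    print(f"max_depth={d:2d}  train={tr:.3f}  validation={va:.3f}")
```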
🔧 Regularization
| Technique | Type | Penalty Added to Loss | Effect | When to Use |
|---|---|---|---|---|
| Ridge (L2) | Linear | + λΣβ² | Shrinks coefficients | Multicollinearity, keep all features |
| Lasso (L1) | Linear | + λΣ\|β\| | Sets some β to 0 | Feature selection needed |
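A minimal sketch of Ridge vs. Lasso in scikit-learn, where `alpha` plays the role of λ; the toy regression data is randomly generated:

```python
# Minimal sketch: Ridge (L2) vs. Lasso (L1) regularization in scikit-learn.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks all coefficients toward 0
lasso = Lasso(alpha=1.0).fit(X, y)   # drives some coefficients exactly to 0

print("Ridge non-zero coefficients:", (ridge.coef_ != 0).sum())  # typically all 10
print("Lasso non-zero coefficients:", (lasso.coef_ != 0).sum())  # typically fewer
```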
🎲 Ensemble Methods
| Method | Type | How It Works | Best For |
|---|---|---|---|
| Random Forest | Bagging | Average of many trees | Reduce variance, high accuracy |
| AdaBoost | Boosting | Sequential, focus on errors | Weak learners, binary classification |
| Gradient Boosting | Boosting | Sequential, fit residuals | High accuracy, regression/classification |
| XGBoost | Boosting | Optimized gradient boosting | Competitions, production systems |
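A minimal sketch of the scikit-learn ensembles above; XGBoost is a separate library whose `XGBClassifier` exposes a similar fit/predict API. The dataset and `n_estimators` values are arbitrary choices:

```python
# Minimal sketch: comparing the ensemble methods above with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

ensembles = {
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting (boosting)": GradientBoostingClassifier(n_estimators=200, random_state=0),
}

for name, model in ensembles.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name:30s} accuracy = {score:.3f}")
```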
Cheat Sheet: Key Formulas
- Standard Error: SE = σ / √n
- Z-Score: z = (x - μ) / σ
- Confidence Interval: CI = x̄ ± z * SE
- R² (coefficient of determination): R² = 1 - (SS_residual / SS_total)
- Bias-Variance Tradeoff: Total Error = Bias² + Variance + Irreducible Error
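A minimal sketch of these formulas computed by hand with NumPy and SciPy; the sample values and the 95% confidence level are made up for illustration:

```python
# Minimal sketch: the key formulas above, computed on an invented sample.
import numpy as np
from scipy import stats

sample = np.array([4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2, 4.7])

# Standard error of the mean: SE = σ / √n (sample std as the estimate of σ)
se = sample.std(ddof=1) / np.sqrt(len(sample))

# Z-score of a single observation: z = (x - μ) / σ
z = (5.5 - sample.mean()) / sample.std(ddof=1)

# 95% confidence interval for the mean: x̄ ± z * SE
z_crit = stats.norm.ppf(0.975)
ci = (sample.mean() - z_crit * se, sample.mean() + z_crit * se)

print(f"SE={se:.3f}  z={z:.2f}  CI=({ci[0]:.2f}, {ci[1]:.2f})")
```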
Pro Tip: Print these cheat sheets and review them the night before your interview!