Random Forest: Notes and Interview Questions

What is bias? What is variance?

Bias is the difference between the average prediction of our model and the correct value we are trying to predict: the larger that difference, the higher the bias.
A model with high bias leads to high error on both training and test data, because it misses relevant relations between the input features and the target outputs. This is referred to as underfitting.

Low Bias: Suggests fewer assumptions about the form of the target function.
High Bias: Suggests more assumptions about the form of the target function.
Examples of low-bias: Decision Trees, k-Nearest Neighbors, Support Vector Machines.
Examples of high-bias: Linear Regression, Linear Discriminant Analysis, Logistic Regression.

Variance tells us how spread out the model's predictions are: with high variance, the predicted values are more scattered and change a lot when the training data changes; with low variance, they change little. A model with high variance pays too much attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data. This is most commonly referred to as overfitting.
Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.
Examples of low-variance: Linear Regression, Linear Discriminant Analysis, Logistic Regression.
Examples of high-variance: Decision Trees, k-Nearest Neighbors, Support Vector Machines.

Linear machine learning algorithms often have a high bias but a low variance.
Nonlinear machine learning algorithms often have a low bias but a high variance.
In most cases, attempting to minimize one of these two errors leads to increasing the other, so the two are usually seen as a trade-off.
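The trade-off can be seen empirically. Below is a minimal sketch (assuming scikit-learn and a small synthetic nonlinear dataset, neither of which comes from these notes) that compares a high-bias linear model with a high-variance unpruned decision tree by looking at their training and test errors.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Nonlinear target with noise (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [("Linear Regression (high bias, low variance)", LinearRegression()),
          ("Unpruned Decision Tree (low bias, high variance)", DecisionTreeRegressor(random_state=0))]
for name, model in models:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")

# Typically the linear model shows similar (but high) error on both sets (underfitting),
# while the unpruned tree shows near-zero training error but a larger test error (overfitting).
```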


What are Ensemble Methods? What is Bagging?

Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree. The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner.
Two common techniques for building ensembles of decision trees:
1. Bagging
2. Boosting
Bagging (Bootstrap Aggregation) is used when the goal is to reduce the variance of a decision tree.
The idea is to create several subsets of the training data by sampling randomly with replacement. Each of these bootstrap subsets is then used to train its own decision tree, so we end up with an ensemble of different models. The predictions of the different trees are then averaged (or combined by majority vote for classification), which is more robust than a single decision tree.
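A minimal sketch of bagging by hand, assuming scikit-learn decision trees and a synthetic dataset (both illustrative choices, not taken from these notes): bootstrap samples are drawn with replacement, one tree is trained per sample, and the predictions are combined by majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.RandomState(0)
n_estimators = 25
trees = []
for _ in range(n_estimators):
    # Bootstrap sample: same size as the training set, drawn with replacement.
    idx = rng.randint(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Aggregate: majority vote across the trees (averaging would be used for regression).
votes = np.stack([t.predict(X_test) for t in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)

print("single tree accuracy:   ", accuracy_score(y_test, trees[0].predict(X_test)))
print("bagged ensemble accuracy:", accuracy_score(y_test, ensemble_pred))
```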

What is Random Forest? How does it work?

In Random Forest, we grow multiple trees as opposed to a single tree, each trained on a bootstrap sample of the data and considering only a random subset of the features at each split (which helps decorrelate the trees). To classify a new object based on its attributes, each tree gives a classification, and the forest chooses the class having the most votes over all the trees; in the case of regression, it takes the average of the outputs of the different trees.
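A minimal usage sketch, assuming scikit-learn's RandomForestClassifier and RandomForestRegressor as the implementation (an assumption; the notes do not name a library). Classification takes the majority vote across the trees, and regression averages the trees' outputs.

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Classification: each tree votes for a class; the forest predicts the majority class.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))

# Regression: the forest averages the individual trees' numeric predictions.
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("regression R^2:", reg.score(X_te, y_te))
```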

Advantages of Random Forest.

- Much less prone to overfitting than a single decision tree.
- Requires relatively little parameter tuning.
- Handles both continuous and categorical variables, since it is built from decision trees.
- No feature scaling (standardization or normalization) is required, as it uses decision trees internally.
- Suitable for both classification and regression problems.

Disadvantages of Random Forest.

- Can be biased towards the more frequent classes in imbalanced multiclass classification problems.


**All questions and notes have been compiled from various sources.
