Decision Tree: Notes and Interview Questions

What is a Decision Tree?

A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label.

Advantages of Decision Tree.

- Simple to understand, interpret, and visualize.
- Can be used for both classification and regression problems.
- Handles both continuous and categorical variables.
- No feature scaling is required, since splits use rule-based thresholds rather than distance calculations.
- Handles non-linear relationships efficiently.
- Can handle missing values.
- Fairly robust to outliers.

Disadvantages of Decision Tree.

- Prone to overfitting the training data, which ultimately leads to poor predictions on new data.
- Because of this overfitting, the model tends to have high variance, which leads to many errors in the final estimation.
- Adding a new data point can require regenerating the entire tree, so all nodes may need to be recomputed.
- On large datasets a single tree tends to overfit; an ensemble such as Random Forest is then preferred.

To which kind of problems are decision trees most suitable?

Decision trees are most suitable when:
- The data is tabular.
- The outputs are discrete.
- Explanations for decisions are required.
- The training data may contain errors.
- The training data may contain missing attribute values.

Impact of outliers?

Decision trees are not very sensitive to outliers. Since extreme values rarely cause much reduction in the splitting criterion (e.g., RSS for regression trees), they are rarely involved in a split. Hence, tree-based methods are largely insensitive to outliers.

Performance Metrics?

Classification:
- Confusion Matrix
- Precision, Recall, F1 score
Regression:
- R2, Adjusted R2
- MSE, RMSE, MAE
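
As a quick illustration, here is a minimal sketch of computing these metrics with scikit-learn (the true/predicted values are made up purely for demonstration):

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, r2_score, mean_squared_error, mean_absolute_error)

# Classification metrics on hypothetical labels
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))

# Regression metrics on hypothetical values
y_true_r = [3.0, 2.5, 4.1, 6.2]
y_pred_r = [2.8, 2.9, 4.0, 5.9]
mse = mean_squared_error(y_true_r, y_pred_r)
print(r2_score(y_true_r, y_pred_r), mse, mse ** 0.5, mean_absolute_error(y_true_r, y_pred_r))
```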

What are Entropy and Information gain in Decision tree algorithm?

The core algorithm for building decision trees is called ID3. ID3 uses Entropy and Information Gain to construct a decision tree.

Entropy: A decision tree is built top-down from a root node and involves partitioning the data into increasingly homogeneous subsets. Entropy gives a measure of impurity in the dataset. If the sample is completely homogeneous/pure the entropy is 0, and if the sample is equally divided (in the binary case) the entropy is 1.

Information Gain: Information Gain is the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding attributes that return the highest information gain. The attribute with the highest information gain is used as the root node, and the same approach is applied recursively to each subtree.
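
A minimal sketch in plain Python/NumPy (with hypothetical binary labels) of how entropy and information gain are computed for one candidate split:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a list of class labels (0 for a pure node, 1 for a 50/50 binary node)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 1, 1]))            # 1.0: an equally divided binary sample

# Information gain = parent entropy minus the weighted entropy of the child nodes.
parent = [0, 0, 1, 1, 1, 1]
left, right = [0, 0], [1, 1, 1, 1]      # a candidate split that yields two pure children
n = len(parent)
gain = entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
print(gain)                              # ≈ 0.918
```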

On what basis is an attribute selected as a node in a decision tree?

Attribute selection is done using Information Gain in decision trees. The attribute with maximum information gain is selected.

What is the inductive bias of decision trees?

Shorter trees are preferred over longer trees. Trees that place high information gain attributes close to the root are preferred over those that do not.

How does a decision tree handle continuous attributes?

The splitting of a numerical feature can be performed by sorting the feature in ascending order, trying each value as a candidate threshold, and calculating the information gain for each threshold. Finally, whichever threshold gives the best information gain (or whichever splitting metric is used) on the training data is chosen.
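
A rough sketch of that threshold search for a single numeric feature (plain NumPy, hypothetical data):

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(x, y):
    """Try the midpoint between each pair of adjacent sorted values as a threshold
    and keep the one with the highest information gain."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    parent_entropy, n = entropy(y), len(y)
    best_gain, best_t = -1.0, None
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue                      # no real split between equal values
        t = (x[i] + x[i - 1]) / 2
        left, right = y[:i], y[i:]
        gain = parent_entropy - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

x = np.array([2.1, 3.5, 1.0, 4.8, 5.2, 3.9])   # hypothetical continuous feature
y = np.array([0,   0,   0,   1,   1,   1])     # class labels
print(best_threshold(x, y))                     # threshold between 3.5 and 3.9, gain 1.0
```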

What is a decision stump?

If the depth of the tree is exactly one (a single split), the tree is called a decision stump.
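
In scikit-learn, for example, a stump can be obtained simply by constraining the tree to max_depth=1 (the toy data below is hypothetical):

```python
from sklearn.tree import DecisionTreeClassifier

# A decision stump: a tree with a single split (depth exactly 1).
X = [[1.0], [2.0], [3.0], [4.0]]   # toy feature values
y = [0, 0, 1, 1]
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(stump.get_depth())           # 1
print(stump.predict([[3.5]]))      # [1]
```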

How to handle imbalanced class?

Class imbalance has a detrimental impact on the tree's structure, so it should be addressed, either by upsampling the minority class or by downsampling the majority class, depending on the dataset.
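
A minimal sketch of upsampling the minority class before fitting, assuming scikit-learn's resample utility and a hypothetical toy dataset:

```python
import numpy as np
from sklearn.utils import resample
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])          # class 1 is the minority

X_min, y_min = X[y == 1], y[y == 1]
X_maj, y_maj = X[y == 0], y[y == 0]

# Upsample the minority class (with replacement) to match the majority class size.
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=len(y_maj), random_state=42)

X_bal = np.vstack([X_maj, X_up])
y_bal = np.concatenate([y_maj, y_up])
tree = DecisionTreeClassifier().fit(X_bal, y_bal)
```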

How does a decision tree handle missing attribute values?

One way is to assign the most common value of that attribute to the missing value. Another way is to assign a probability to each possible value of the attribute, based on the other samples.
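
One way to implement the first approach is a most-frequent imputer; a minimal sketch using scikit-learn's SimpleImputer on a hypothetical feature matrix:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [2.0, 0.0],
              [np.nan, 0.0],
              [2.0, 1.0]])

# Replace each missing value with the most common value of that column.
imputer = SimpleImputer(strategy="most_frequent")
print(imputer.fit_transform(X))
```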

Algorithms used for deriving decision trees?

CART (Classification and Regression Trees) → uses the Gini Index (for classification) as its metric.
ID3 (Iterative Dichotomiser 3) → uses Entropy and Information Gain as its metrics.

Steps to decide which attribute to split in ID3.

1. Compute the entropy for the whole dataset.
2. For every attribute:
2.1 Calculate the entropy for each categorical value of the attribute.
2.2 Take the weighted average entropy for the attribute (weighted by the fraction of samples taking each value).
2.3 Calculate the information gain for the attribute (dataset entropy minus the weighted average).
3. Pick the attribute with the highest information gain.
4. Repeat on each subset until the desired tree is obtained.
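
A compact sketch of steps 1–3 for categorical attributes (plain NumPy, with a hypothetical toy dataset loosely modelled on the classic play-tennis example):

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, y):
    """Dataset entropy minus the weighted average entropy of each category's subset."""
    weighted = sum((np.sum(feature == v) / len(y)) * entropy(y[feature == v])
                   for v in np.unique(feature))
    return entropy(y) - weighted

# Hypothetical categorical dataset: pick the attribute with the highest gain as the split.
outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast", "overcast"])
windy   = np.array(["yes",   "no",    "yes",  "no",   "yes",      "no"])
play    = np.array([0,       0,       1,      0,      1,          1])

gains = {"outlook": information_gain(outlook, play),
         "windy":   information_gain(windy, play)}
print(max(gains, key=gains.get), gains)   # outlook has the higher gain here
```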

Steps to calculate Gini for a split.

1. Calculate the Gini score for each sub-node as the sum of the squared class probabilities (p² + q² for success and failure).
2. Calculate the Gini score for the split as the weighted Gini score of the sub-nodes.
3. Choose the split with the higher weighted Gini score (equivalently, the lower Gini impurity, 1 − (p² + q²)).
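
A sketch of those three steps, using the purity form of the Gini score described above (sum of squared class probabilities, so higher is better) on hypothetical labels:

```python
import numpy as np

def gini_score(labels):
    """Sum of squared class probabilities (1 for a pure node, 0.5 for a 50/50 binary node)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p ** 2)

def weighted_gini(left, right):
    """Weighted Gini score of a split; the split with the higher value is preferred."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_score(left) + (len(right) / n) * gini_score(right)

# Two candidate splits of the same 10 samples (hypothetical labels).
split_a = weighted_gini([1, 1, 1, 1, 0, 0], [0, 0, 0, 1])      # mixed children
split_b = weighted_gini([1, 1, 1, 1, 1], [0, 0, 0, 0, 0])      # pure children
print(split_a, split_b)   # split_b scores 1.0, so it is chosen
```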

Stopping criteria.

Setting constraints on tree size:
- A minimum number of samples required to split a node.
- A minimum number of samples required at a terminal node (leaf).
- A maximum depth of the tree (vertical depth).
- A maximum number of terminal nodes.
- A maximum number of features to consider for a split.
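
These constraints map directly onto common hyperparameters, for example in scikit-learn's DecisionTreeClassifier (the values below are purely illustrative, not recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

# Each constraint above has a corresponding hyperparameter; the values are illustrative only.
tree = DecisionTreeClassifier(
    min_samples_split=20,    # minimum samples required to split an internal node
    min_samples_leaf=10,     # minimum samples required at a terminal node (leaf)
    max_depth=5,             # maximum (vertical) depth of the tree
    max_leaf_nodes=30,       # maximum number of terminal nodes
    max_features="sqrt",     # maximum features considered when searching for a split
)
```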

Pruning is mostly done to reduce the chances of overfitting the tree to the training data and to reduce the overall complexity of the tree.
- Pre-pruning is also known as early stopping: the tree stops growing as soon as it meets any of the pre-pruning criteria or reaches pure nodes.
- In post-pruning, the idea is to allow the decision tree to grow fully, observe the CP (Complexity Parameter) values, and then prune/cut the tree back using the optimal CP value as the parameter.
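
In scikit-learn the analogous knob is the cost-complexity parameter ccp_alpha; a rough sketch of post-pruning on a synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)      # synthetic stand-in data

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)   # grow the tree fully
path = full_tree.cost_complexity_pruning_path(X, y)            # candidate alpha (CP-like) values

# Pick the alpha with the best cross-validated score, then refit the pruned tree.
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean()
          for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```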

Feature selection in decision trees.

The goal of feature selection is to find the attributes whose split produces child nodes whose combined (weighted) entropy is lower than the entropy of the data segment before the split, i.e., the attributes with the highest information gain.
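
Once a tree is fit, the total impurity decrease contributed by each feature is exposed in scikit-learn as feature_importances_; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, n_informative=2, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Importance of each feature = total (normalized) impurity decrease it produced across splits.
print(tree.feature_importances_)
```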


**All questions and notes have been compiled from various sources.
