Logistic Regression: Notes and Interview Questions

What is Logistic Regression?

It is a classification algorithm, used when the response variable is categorical. The idea of logistic regression is to find a relationship between the features and the probability of a particular outcome.
Binomial Logistic Regression - the response variable has two possible values, e.g. 0/1 or pass/fail.
Multinomial Logistic Regression - the response variable can have three or more possible values.

The idea of Logistic Regression.

f(z) = 1/(1 + e^(-z))
The value of z ranges from -infinity to +infinity, while the logistic function maps it to a value between 0 and 1. In other words, logistic regression converts logits (log-odds), which range from -infinity to +infinity, into probabilities between 0 and 1.
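The mapping above can be sketched in a few lines of Python (a minimal illustration, not tied to any particular library):

```python
import math

def sigmoid(z):
    """Map a logit z in (-inf, +inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative logits map near 0, large positive logits near 1,
# and a logit of 0 maps to exactly 0.5.
probs = {z: sigmoid(z) for z in (-10, -1, 0, 1, 10)}
```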

What are the assumptions of Logistic Regression?

Linear relation between the independent features and the log odds (logit of the outcome).
No multicollinearity among the predictors.
Observations must be independent of each other.

Advantages

Logistic regression models are very easy to understand and interpret.
It requires little training time.
It gives good accuracy on many simple data sets and performs well when the dataset is linearly separable.
It makes no assumptions about the distributions of classes in feature space.
Logistic regression is less inclined to overfitting, but it can overfit on high-dimensional datasets. Regularization (L1 and L2) techniques can be used to avoid overfitting in these scenarios.
Logistic regression is easy to implement and very efficient to train.

Disadvantages

A lot of feature engineering is sometimes required.
If the independent features are correlated, performance may suffer.
It is often quite prone to noise and overfitting.
If the number of observations is smaller than the number of features, logistic regression should not be used; otherwise it may overfit.
Non-linear problems cannot be solved with logistic regression because it has a linear decision surface, and linearly separable data is rarely found in real-world scenarios.
It is tough to capture complex relationships with logistic regression. More powerful algorithms such as neural networks can easily outperform it.
In linear regression, the independent and dependent variables are related linearly; logistic regression instead requires the independent variables to be linearly related to the log odds, log(p/(1-p)).

Whether Feature Scaling is required?

Yes. Gradient-based solvers converge faster on standardized features, and regularization penalizes all coefficients on a comparable scale only when the features share a scale.
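As a minimal sketch, standardization (z-scoring) can be done by hand; in practice a library utility such as scikit-learn's StandardScaler is typically used instead:

```python
def standardize(column):
    """Scale a feature to zero mean and unit variance (z-scores)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    return [(x - mean) / var ** 0.5 for x in column]

# Hypothetical raw feature on an arbitrary scale (e.g. age in years)
ages = [22, 35, 58, 41, 29]
scaled = standardize(ages)
```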

Missing Values

Logistic regression is sensitive to missing values; impute or remove them before training.

Impact of outliers?

Like linear regression, logistic regression estimates are sensitive to unusual observations: outliers, high-leverage points, and influential observations.

Why is logistic regression called regression and not classification?

In linear regression, we predict a real-valued output y based on a weighted sum of input variables. 
y=c+x1∗w1+x2∗w2+........+xn∗wn
The aim of linear regression is to estimate values for the model coefficients c, w1, w2...wn and fit the training data with minimal squared error and predict the output y.
Logistic regression does the same thing, but with one addition. It runs the result through a special non-linear function (called the logistic function or sigmoid function) to produce the output y.
y=logistic(c+x1∗w1+x2∗w2+....+xn∗wn) 
y = 1/(1 + e^(−(c + x1∗w1 + x2∗w2 + .... + xn∗wn)))
Here the model builds a regression model, just like linear regression, to predict the probability that a given data entry belongs to a category; hence it is called regression.
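The two equations above can be turned into a tiny prediction function (the coefficients below are illustrative only; a real model would learn them from data):

```python
import math

def predict_proba(x, weights, intercept):
    """Weighted sum (the 'regression' part) pushed through the sigmoid."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical two-feature model: z = 0.1 + 0.5*x1 - 0.25*x2
p = predict_proba([1.0, 2.0], weights=[0.5, -0.25], intercept=0.1)
```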

How to measure the accuracy of logistic regression?

Where the predicted probability is < 0.5, the predicted class is 0; where it is >= 0.5, the predicted class is 1 (0.5 is the default threshold and can be tuned).
A confusion matrix is used to measure the accuracy of the logistic regression.
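A minimal hand-rolled sketch of thresholding at 0.5 and tallying the confusion matrix (the labels and probabilities below are made up for illustration):

```python
def confusion_matrix(y_true, probs, threshold=0.5):
    """Count TP, FP, FN, TN after thresholding predicted probabilities."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0]
probs = [0.9, 0.4, 0.3, 0.8, 0.6]
tp, fp, fn, tn = confusion_matrix(y_true, probs)
accuracy = (tp + tn) / len(y_true)
```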

What are odds?

Ratio of the probability of an event occurring to the probability of the event not occurring.
Odds = p/q
p: probability of the event occurring
q = 1 - p: probability of the event not occurring

Odds Ratio: Ratio of two odds.
Odds Ratio = odds(1)/odds(0)
Eg: odds(heads of a fair coin)/odds(heads of a biased coin)

Odds Ratio for a variable in Log Reg represents how the odds change with 1 unit increase in that variable keeping other variables constant.
Eg: weight = X, sleep apnea = Y
odds ratio(weight) = 1.07
Means a 1 kg increase in weight multiplies the odds of having sleep apnea by 1.07 (equivalently, the log(odds) increase by log(1.07)).
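A quick numeric check of this interpretation (1.07 is the hypothetical odds ratio from the example above):

```python
import math

odds_ratio = 1.07                 # per 1 kg of weight (hypothetical)
beta_weight = math.log(odds_ratio)  # the model's log-odds coefficient

# Odds ratios compound multiplicatively: starting from odds of 0.5,
# a 10 kg gain multiplies the odds by 1.07 ** 10.
odds_after_10kg = 0.5 * odds_ratio ** 10
```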

What are the outputs of the logistic model and the logistic function?

Logistic model outputs the logits, i.e. log odds.
Logistic function outputs the probabilities.
Logistic model: z = c+x1∗w1+x2∗w2+........+xn∗wn
Logistic function: f(z) = 1/(1 + e^(−(c + x1∗w1 + x2∗w2 + .... + xn∗wn)))

ln(odds) = ln(p/(1-p)) = logit(p) = β0 + β1X1                    [where loge x = ln x]
p lies between 0 and 1.
Logit value ranges between -infinity to +infinity.

f(z) = 1/(1 + e^(-z))
The inverse logit is the sigmoid function; its output lies between 0 and 1.
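A small sketch showing that the logit and the sigmoid are inverses of each other:

```python
import math

def logit(p):
    """Log-odds: maps p in (0, 1) to (-inf, +inf)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Sigmoid: maps a logit back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Round trip: probability -> logit -> probability recovers the input.
p = 0.8
z = logit(p)
p_back = inv_logit(z)
```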

What is the likelihood function?

The likelihood function gives the probability of observing the data as a function of the unknown parameters.

What is the Maximum Likelihood Estimator (MLE)?

The MLE chooses the set of unknown parameters (estimates/coefficients) that maximizes the likelihood function. The method is to use calculus: set the derivative of the log-likelihood function with respect to each unknown parameter to zero and solve; the solutions are the MLEs.
The regression coefficients for Logistic Reg are calculated using MLE.
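As an illustration, the log-likelihood of a one-feature logistic model can be maximized by plain gradient ascent (toy data and learning rate chosen arbitrarily; real libraries use faster second-order solvers):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: one feature, binary label; the classes overlap so the MLE is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

# Gradient of the log-likelihood: sum(y - p) for the intercept b0,
# sum((y - p) * x) for the slope b1. Ascend until it is (nearly) stationary.
b0, b1, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
    b0 += lr * g0
    b1 += lr * g1
```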

What is the output of a standard MLE program?

Maximized likelihood value: This is the numerical value obtained by replacing the unknown parameter values in the likelihood function with the MLE parameter estimator.
Estimated variance-covariance matrix: The diagonal of this matrix consists of estimated variances of the ML estimates. The off-diagonal consists of the covariances of the pairs of the ML estimates.

Why can’t we use MSE as a cost function for logistic regression?

In logistic regression, we use the sigmoid function and perform a non-linear transformation to obtain the probabilities. Squaring this prediction (as we do in MSE) results in a non-convex function with many local minima. If our cost function has many local minima, gradient descent may not find the optimal global minimum.
Instead of MSE, we use a cost function called Cross-Entropy, also known as Log Loss. Cross-entropy loss can be divided into two separate cost functions: one for y=1 and one for y=0. In the cost function for logistic regression, the confident wrong predictions are penalized heavily. The confident right predictions are rewarded less. By optimizing this cost function, convergence is achieved.

There are three steps to find Log Loss:
- Find the corrected probabilities (the probability the model assigns to the true class).
- Take the log of each corrected probability.
- Take the negative average of the values from the 2nd step.
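The three steps translate directly into code (toy labels and probabilities for illustration):

```python
import math

def log_loss(y_true, probs):
    """Cross-entropy, computed in the three steps above."""
    # Step 1: corrected probabilities -- probability assigned to the TRUE class
    corrected = [p if y == 1 else 1 - p for y, p in zip(y_true, probs)]
    # Step 2: log of each corrected probability
    logs = [math.log(c) for c in corrected]
    # Step 3: negative average
    return -sum(logs) / len(logs)

loss = log_loss([1, 0, 1], [0.9, 0.2, 0.8])
```

Note how a confident wrong prediction (e.g. p = 0.1 for a true 1) contributes far more loss than a confident right one, matching the penalty behavior described above.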

What are the meanings of alpha and beta in a logistic regression model?

Alpha is the intercept: the log odds for an instance when all attributes are zero.
Beta is the value by which the log odds change with a unit change in a particular attribute, keeping all other attributes fixed.

Is the decision boundary linear or nonlinear in the case of a logistic regression model?

In a logistic regression model, the decision boundary is linear: a straight line in two dimensions, a hyperplane in general. Logistic regression is only suitable where such a linear boundary can separate the classes; if it cannot, nonlinear algorithms should be used to achieve better results.

How will you deal with the multiclass classification problem using logistic regression?

Using the one-vs-all approach. Under this approach, a number of models are trained, which is equal to the number of classes. For example, the first model classifies the datapoint depending on whether it belongs to class 1 or some other class; the second model classifies the datapoint into class 2 or some other class. This way, each data point can be checked over all the classes.
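A sketch of the one-vs-rest decision rule, assuming the per-class binary models have already been fitted (the coefficients below are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted binary models, one per class, each as (intercept, weight).
# Each scores "this class" vs "all other classes" on a single feature x.
models = {
    "class_0": (1.0, -2.0),
    "class_1": (0.0, 0.5),
    "class_2": (-1.0, 2.0),
}

def predict(x):
    """One-vs-rest: run every binary model, pick the highest probability."""
    scores = {label: sigmoid(b0 + b1 * x) for label, (b0, b1) in models.items()}
    return max(scores, key=scores.get)
```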


**All questions and notes have been compiled from various sources.
