Last time, in the data science basics article, we discussed the different types of Machine Learning. Today, we’ll discuss the different regression techniques available to us, and why we use them.

Regression for data science

Simple Linear Regression

Regression is a Machine Learning technique for predicting values from given information.

Consider, for example, a dataset containing details of employees and their salaries.

This dataset will include attributes such as ‘Age’, ‘Experience’, and ‘Salary’. To predict the salary of a person who has, say, been working in the industry for 8 years, we can use regression.

Simple linear regression finds the best-fit line through the data, and our values are predicted based on this line. The equation for this line looks like this:

y = b0 + b1 * x1

In the above equation, y is the dependent variable, which is predicted using the independent variable x1. Here b0 (the intercept) and b1 (the slope) are constants.
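
As a quick illustration (the coefficient values here are made up, not fitted from real data): suppose b0 = 30000 and b1 = 5000, with x1 measured in years of experience. For the 8-year employee from our example:

y = 30000 + 5000 * 8 = 70000

So the model would predict a salary of 70,000.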

Multiple Linear Regression

Multiple Linear Regression is an extension of Simple Linear Regression in which the model depends on more than one independent variable for its predictions. With multiple regressors, our equation looks as follows:

y = b0 + b1 * x1 + b2 * x2 + … + bn * xn

Here y is the dependent variable, and x1, x2, …, xn are the independent variables used to predict its value. The values b0, b1, …, bn act as constants.
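
To make this concrete (the coefficients are again made up): with x1 as years of experience and x2 as age, a fitted model might look like y = 25000 + 4000 * x1 + 300 * x2. Each coefficient tells you how much the predicted salary changes per unit of that variable, holding the others fixed.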

How to implement

SLR

It is easily done using the sklearn package in Python:

from sklearn.linear_model import LinearRegression

# Fit a simple linear regression model to the training data
regressor = LinearRegression()
regressor.fit(X_train, y_train)

Here X_train and y_train are the results of using sklearn’s train_test_split function on a dataset. I’ll show you full implementations of all of these in another article. For now, internalize the concepts.
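
For reference, the split itself is a one-liner (a minimal sketch; the names X, y and the 20% test size are assumptions for illustration):

from sklearn.model_selection import train_test_split

# Split features X and target y into training and test portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)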


MLR


The multiple LR case works the same way as the simple one; the only extra step is that categorical independent variables need to be encoded into numbers first:


from sklearn.preprocessing import LabelEncoder

# Encode a categorical feature column (here, column 0) into integer labels;
# non-ordinal categories would typically be one-hot encoded afterwards
labelEncoder_X = LabelEncoder()
X[:, 0] = labelEncoder_X.fit_transform(X[:, 0])
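
After encoding, fitting the model is identical to the simple case. A minimal sketch, assuming X_train and y_train now carry the encoded, multi-feature data:

from sklearn.linear_model import LinearRegression

# The same estimator handles any number of features
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)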

Other regression techniques


Logistic Regression


Logistic Regression lets you classify data by passing a linear combination of the inputs through the logit (sigmoid) function, matching the outcome of a binary dependent variable.

The logit function models the log-odds of the positive class, so the output can be read directly as the probability of the binary response.
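
In the same notation as our linear equations, the predicted probability of the positive class is:

p = 1 / (1 + e^-(b0 + b1 * x1 + … + bn * xn))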

from sklearn.linear_model import LogisticRegression

# class_weight='balanced' reweights classes inversely to their frequency
logit = LogisticRegression(class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = logit.predict(X_test)
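
If you want the underlying probabilities rather than hard 0/1 labels, the fitted model exposes those too (a sketch, reusing the logit model fitted above):

probs = logit.predict_proba(X_test)  # one probability per class for each sample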

Lasso Regression

Another regularised variant of linear regression is least absolute shrinkage and selection operator regression (usually just called lasso regression).

It adds a regularisation term to the cost function, just like ridge regression.

But instead of half the square of the l2 norm, it uses the l1 norm of the weight vector, which can shrink some weights all the way to zero and so also performs feature selection.
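
Written in the same style as our earlier equations, the lasso cost function is the usual mean squared error plus an l1 penalty, where alpha sets the regularisation strength:

J = MSE + alpha * (|b1| + |b2| + … + |bn|)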

from sklearn.linear_model import Lasso

lasso = Lasso(alpha=1.0)  # alpha sets the strength of the l1 penalty
lasso.fit(X_train, y_train)
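
Because the l1 penalty can drive weights exactly to zero, inspecting the fitted coefficients shows which features the lasso kept (assuming the model above has been fitted):

print(lasso.coef_)  # zero entries are features the lasso dropped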

Ridge Regression

Ridge regression is a regularised variant of linear regression. In addition to fitting the data, it forces the training algorithm to keep the model weights as small as possible.
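
Written in the same style, the ridge cost function penalises the squared weights instead:

J = MSE + alpha * (b1^2 + b2^2 + … + bn^2)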

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0)  # alpha sets the strength of the l2 penalty
ridge.fit(X_train, y_train)

Notice the squared weight terms in the ridge formula above; the lasso’s absolute values are what set the two approaches apart.

And that’s it for regression techniques! Well done, and keep coding.

Doubts? WhatsApp me!