Regression is a statistical technique used to analyze the relationship between a dependent variable and one or more independent variables. The objective is to find the function that best characterizes the link between these variables, yielding a best-fit model that can be used to make predictions or draw conclusions.
In machine learning, regression is one of the supervised learning techniques.
A story that makes regression easy to understand
Imagine you’re a detective trying to crack the case of house prices. You know there are some key suspects: the size of the house and the number of bedrooms. These are your independent variables, the things you think might influence the price (the dependent variable), the big mystery you are trying to solve. Here’s where a regression model comes in. It’s like a fancy interrogation technique. You gather information on a bunch of houses: their size, bedrooms, and selling price. The model then analyzes all this evidence and tries to establish a connection between the suspects (independent variables) and the crime scene (dependent variable).
Types of regression in machine learning
- Linear Regression
- Multiple Regression
- Polynomial Regression
- Logistic Regression
- Ridge Regression and Lasso Regression
Linear Regression:
In linear regression, the relationship between the dependent variable and the independent variable(s) is assumed to be linear. It’s a straightforward approach, suitable for situations where the variables follow a trend that can be approximated by a straight line.
import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Gradient descent: repeatedly nudge weights and bias to reduce error
        for _ in range(self.n_iterations):
            y_predicted = np.dot(X, self.weights) + self.bias
            # Compute gradients of the mean squared error
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            # Update parameters in the direction that reduces the error
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Example usage with small synthetic data (y = 2x, for illustration)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

model = LinearRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X, y)
print(model.predict(np.array([[5.0]])))  # approximately 10
Types of Linear Regression in Machine Learning
- Simple Linear Regression
- Multiple Linear Regression
Simple Linear Regression:
Simple linear regression is a statistical technique used to find the relationship between a single independent variable (X) and a dependent variable (Y). It assumes a linear relationship between the two variables, which can be represented by a straight-line equation:
Y = β0 + β1X + ε
Where:
- Y is the dependent variable (the value we want to predict),
- X is the independent variable (used to make predictions),
- β0 is the intercept of the line (the value of Y when X is 0),
- β1 is the slope of the line (the change in Y for a one-unit change in X),
- ε is the error term (the difference between the predicted and actual values of Y).
Here’s an example to illustrate simple linear regression:
Let’s say we want to predict the score a student will get on a test (Y) based on the number of hours they study (X). We have data from several students showing their study hours and corresponding test scores. We can use simple linear regression to model this relationship.
Using simple linear regression, we can fit a line to such a dataset that represents the relationship between study hours and test scores. After fitting the model, we can use it to predict test scores for new students based on the number of hours they study.
The regression equation might look like this:
Test Score = β0 + β1 * Hours Studied + ε
After fitting the model to the data, we might find that the equation is:
Test Score = 50 + 5 * Hours Studied
This equation tells us that for every additional hour a student studies, we expect their test score to increase by 5 points, assuming all else remains constant. The intercept (β0) indicates that if a student studied 0 hours, we would expect their test score to be 50.
We can then use this equation to predict the test score of a student who studies, for example, 7 hours:
Test Score = 50 + 5 * 7 = 85
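Here is a minimal sketch of this worked example in Python, using NumPy’s least-squares fit. The study-hours data below is hypothetical, chosen so the fitted line matches the equation above:

import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6])          # X: hours studied
scores = np.array([55, 60, 65, 70, 75, 80])   # Y: test scores (hypothetical)

# Fit a degree-1 polynomial: returns [slope, intercept]
b1, b0 = np.polyfit(hours, scores, deg=1)
print(f"Intercept (b0): {b0:.1f}, Slope (b1): {b1:.1f}")  # ~50.0 and ~5.0

# Predict the score for a student who studies 7 hours
print(b0 + b1 * 7)  # ~85.0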
Multiple Linear Regression:
Multiple linear regression is a method to predict an outcome based on two or more input factors. Imagine you’re trying to predict a person’s weight based on their diet and exercise habits. Instead of just considering one factor like diet, multiple linear regression allows you to consider both diet and exercise together.
The equation for multiple linear regression looks like this:
y = b0 + b1x1 + b2x2 + … + bnxn
Here,
- y is the predicted outcome.
- b0 is the intercept, the value of y when all the input factors are zero.
- b1, b2, …, bn are the coefficients that represent how much each input factor affects the outcome.
- x1, x2, …, xn are the values of the input factors.
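Here is a minimal sketch of this equation in Python, using NumPy’s least-squares solver on the weight-prediction example. The diet and exercise numbers below are made up for illustration:

import numpy as np

# Each row: [diet_score, exercise_hours]; target: weight in kg (hypothetical)
X = np.array([[3.0, 1.0], [2.0, 4.0], [4.0, 0.5], [1.0, 5.0]])
y = np.array([82.0, 70.0, 88.0, 65.0])

# Prepend a column of ones so the solver also estimates the intercept b0
X_with_bias = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)

b0, b1, b2 = coeffs
print(f"y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")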
Multiple Regression:
Multiple regression builds upon linear regression by incorporating various independent variables to predict the dependent variable. It’s utilized when two or more independent variables influence the dependent variable.
Polynomial Regression:
Polynomial regression is employed when the relationship between the independent and dependent variables isn’t linear. This method allows for fitting a curve to the data, offering flexibility for capturing more complex relationships beyond straight lines.
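As a minimal sketch, NumPy can fit a quadratic curve directly; the data points below are made up to follow a roughly y = x² trend:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.1, 3.8, 9.2, 15.9, 25.3])  # roughly quadratic (hypothetical)

# Fit a degree-2 polynomial: returns coefficients [a, b, c] for a*x^2 + b*x + c
a, b, c = np.polyfit(x, y, deg=2)
print(f"y = {a:.2f}*x^2 + {b:.2f}*x + {c:.2f}")

# Predict at a new point
x_new = 6.0
print(a * x_new**2 + b * x_new + c)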
Logistic Regression:
Logistic regression is specifically designed for situations where the dependent variable is binary (e.g., yes/no, 0/1). It models the probability of the dependent variable belonging to a particular category, making it suitable for classification tasks.
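As a sketch, here is a from-scratch logistic regression that mirrors the gradient-descent structure of the LinearRegression class above; the sigmoid function squashes the linear output into a probability between 0 and 1:

import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for _ in range(self.n_iterations):
            linear = np.dot(X, self.weights) + self.bias
            y_predicted = 1 / (1 + np.exp(-linear))  # sigmoid: probability of class 1
            # Gradients have the same form as in linear regression
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        probs = 1 / (1 + np.exp(-(np.dot(X, self.weights) + self.bias)))
        return (probs >= 0.5).astype(int)  # class 1 if probability >= 0.5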
Ridge Regression and Lasso Regression:
Ridge and Lasso regressions are regularization techniques applied to linear regression models to prevent overfitting and enhance performance. Ridge regression introduces a penalty term to the regression equation, while Lasso regression induces sparsity in the coefficient estimates, aiding in feature selection.
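As a quick sketch, scikit-learn provides ready-made Ridge and Lasso estimators; the alpha values and toy data below are illustrative, and alpha would normally be tuned to the dataset:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # toy features
y = np.array([3.0, 3.5, 7.2, 7.0])                              # toy targets

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: can drive coefficients exactly to zero

print("Ridge coefficients:", ridge.coef_)
print("Lasso coefficients:", lasso.coef_)  # some may be exactly 0 (feature selection)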