Linear regression is a statistical method to model the relationship between a dependent variable \( y \) and one or more independent variables \( x \). The simplest form, called simple linear regression, involves a single independent variable.
The equation of a simple linear regression line is given by:
\( y = mx + b \)
where \( m \) is the slope of the line and \( b \) is the y-intercept, the value of \( y \) when \( x = 0 \).
The goal of linear regression is to find the best-fitting line through the data points. This line minimizes the sum of the squared differences (errors) between the observed values and the predicted values. This method is known as the "least squares" method.
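The least squares fit described above has a well-known closed-form solution for the simple (one-variable) case: the slope is the ratio of the covariance of \( x \) and \( y \) to the variance of \( x \), and the intercept follows from the means. A minimal sketch in Python (the sample data here is made up for illustration):

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the fitted line passes through (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 8.1, 9.9]
m, b = fit_line(xs, ys)  # roughly m = 1.97, b = 0.15
```

In practice a library routine such as `numpy.polyfit(xs, ys, 1)` computes the same coefficients.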
The quality of the fit can be measured using the R-squared value, which indicates the proportion of the variance in the dependent variable that is predictable from the independent variable. An R-squared value closer to 1 indicates a better fit.
Mean Squared Error (MSE) is another metric used to evaluate the performance of the regression model. It is the average of the squared differences between the observed values and the predicted values. A lower MSE indicates a better fit.
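The MSE definition above translates directly into code. A short sketch with illustrative (invented) data:

```python
def mse(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

observed = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.5]
error = mse(observed, predicted)  # each squared error is 0.25, so MSE = 0.25
```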
R-squared is calculated as:
\( R^2 = 1 - \frac{\text{MSE}_{\text{r}}}{\text{MSE}_{\text{b}}} \)
where \(\text{MSE}_{\text{r}}\) is the Mean Squared Error of the regression model and \(\text{MSE}_{\text{b}}\) is the Mean Squared Error of the base model, which uses the mean of the observed values as the predictor.
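Putting the two metrics together, R-squared can be computed exactly as the formula states: one minus the ratio of the model's MSE to the MSE of the baseline that always predicts the mean. A minimal sketch, reusing the MSE definition from above:

```python
def r_squared(observed, predicted):
    """R^2 = 1 - MSE_regression / MSE_baseline, where the baseline predicts the mean."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # MSE of the regression model's predictions
    mse_r = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    # MSE of the baseline model, which predicts the mean for every point
    mse_b = sum((o - mean_obs) ** 2 for o in observed) / n
    return 1 - mse_r / mse_b

observed = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.5]
score = r_squared(observed, predicted)  # about 0.906, close to 1: a good fit
```

Note that a model worse than the baseline yields a negative R-squared, which is why values are not bounded below by 0.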