What is Gradient Descent?
Gradient Descent is an optimization algorithm used to minimize the loss function of a machine learning model. It works by iteratively adjusting the model's parameters (in this case, the weight and bias) to find the values that result in the lowest possible loss.
Key Concepts:
- Loss Function: Measures how far off the model's predictions are from the actual values. In this demo, we use Mean Squared Error (MSE) as the loss function.
- Gradient: The slope, or direction of change, of the loss function. It tells us how to adjust the weight and bias to reduce the loss.
- Learning Rate: A small positive number that determines the step size of each iteration of gradient descent. A smaller learning rate means smaller steps and more iterations, while a larger learning rate takes bigger steps but can overshoot the minimum.
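To make the loss concrete, here is a minimal Python sketch of the MSE for a line \( y = wx + b \). The data values are made up for illustration (the demo's actual numbers are not given here):

```python
import numpy as np

# Hypothetical demo data: number of stop signs vs. travel time in minutes.
stop_signs = np.array([0, 1, 2, 3, 4], dtype=float)
travel_time = np.array([10.2, 12.1, 14.3, 15.8, 18.1])

def mse(w, b):
    """Mean Squared Error of the line y = w*x + b over the data."""
    predictions = w * stop_signs + b
    return np.mean((predictions - travel_time) ** 2)

print(mse(0.0, 0.0))   # loss for the initial guess (large)
print(mse(2.0, 10.0))  # loss near a good fit (much smaller)
```

Gradient descent's job is to move `w` and `b` from the first situation toward the second.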
How it Works:
The algorithm starts with initial guesses for the weight and bias, often set to zero. It then calculates the gradient of the loss function with respect to each parameter. The weight and bias are updated in the direction that decreases the loss, using the formulas:

New Weight: \( w_{\text{new}} = w_{\text{old}} - \eta \cdot \frac{\partial L}{\partial w} \)

New Bias: \( b_{\text{new}} = b_{\text{old}} - \eta \cdot \frac{\partial L}{\partial b} \)
This process repeats for a specified number of iterations or until the loss converges to a minimum
value.
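The full update loop can be sketched in Python. The data and the learning rate of 0.05 are assumptions chosen for illustration, not the demo's actual settings:

```python
import numpy as np

# Hypothetical demo data: number of stop signs vs. travel time in minutes.
stop_signs = np.array([0, 1, 2, 3, 4], dtype=float)
travel_time = np.array([10.2, 12.1, 14.3, 15.8, 18.1])

w, b = 0.0, 0.0   # initial guesses for weight and bias
eta = 0.05        # learning rate (assumed value)
n = len(stop_signs)

for _ in range(5000):
    error = (w * stop_signs + b) - travel_time
    # Gradients of MSE: dL/dw = (2/n) * sum(error * x), dL/db = (2/n) * sum(error)
    grad_w = (2 / n) * np.sum(error * stop_signs)
    grad_b = (2 / n) * np.sum(error)
    # Step against the gradient to decrease the loss
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)  # converges toward the least-squares weight and bias
```

With these settings the loop settles close to the best-fit line; a learning rate that is too large would instead make the updates diverge.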
The "Optimal Regression Line" shown on the chart represents the best possible fit using all the data
points. The gradient
descent algorithm tries to approximate this line by minimizing the loss through iterative updates.
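For simple linear regression, that optimal line has a closed form, so we can compute it directly and see what gradient descent is converging toward. A sketch using the same made-up data as above:

```python
import numpy as np

# Hypothetical demo data: number of stop signs vs. travel time in minutes.
stop_signs = np.array([0, 1, 2, 3, 4], dtype=float)
travel_time = np.array([10.2, 12.1, 14.3, 15.8, 18.1])

# Closed-form least-squares fit: the "optimal regression line".
x_mean, y_mean = stop_signs.mean(), travel_time.mean()
w_opt = np.sum((stop_signs - x_mean) * (travel_time - y_mean)) / np.sum((stop_signs - x_mean) ** 2)
b_opt = y_mean - w_opt * x_mean

print(w_opt, b_opt)  # slope and intercept of the optimal line
```

Gradient descent approaches this same `(w_opt, b_opt)` pair iteratively rather than solving for it in one step, which is what makes it generalize to models where no closed form exists.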
Explanation of Symbols:
- \( w \): The weight parameter of the model. It determines how much influence the number of stop signs has on the travel time.
- \( b \): The bias parameter of the model. It represents the baseline travel time when the number of stop signs is zero.
- \( \eta \): The learning rate, a small positive number that controls the size of the steps taken toward the minimum loss.
- \( \frac{\partial L}{\partial w} \): The gradient of the loss function with respect to the weight. It shows how much the loss would change with a small change in the weight.
- \( \frac{\partial L}{\partial b} \): The gradient of the loss function with respect to the bias. It shows how much the loss would change with a small change in the bias.
- \( L \): The loss function. In this case, it is the Mean Squared Error (MSE), which measures the average squared difference between the predicted travel times and the actual travel times.
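For this MSE loss the two gradients have closed forms, obtained by applying the chain rule to each squared residual (with \( x_i \) the number of stop signs and \( y_i \) the actual travel time for trip \( i \)):

\[
L = \frac{1}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right)^2
\]

\[
\frac{\partial L}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right) x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right)
\]

These are the quantities plugged into the update formulas for \( w_{\text{new}} \) and \( b_{\text{new}} \) above.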