What is Gradient Descent?
Gradient Descent is an optimization algorithm used to minimize the loss function of a machine learning model. It works by iteratively adjusting the model's parameters (in this case, the weight and bias) to find the values that result in the lowest possible loss.
Key Concepts:
- Loss Function: Measures how far off the model's predictions are from the actual values. In this demo, we use Mean Squared Error (MSE) as the loss function.
- Gradient: The slope, or direction of change, of the loss function. It tells us how to adjust the weight and bias to reduce the loss.
- Learning Rate: A small positive number that determines the step size of each iteration of gradient descent. A smaller learning rate means smaller steps and more iterations, while a larger learning rate takes bigger steps but can overshoot the minimum.
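To make the loss concrete, here is a minimal Python sketch of the MSE for a line \( y = wx + b \). The data values are made up for illustration (the demo's actual numbers are not given here):

```python
import numpy as np

# Hypothetical demo data: number of stop signs vs. travel time in minutes.
stop_signs = np.array([0, 1, 2, 3, 4], dtype=float)
travel_time = np.array([10.2, 12.1, 14.3, 15.8, 18.1])

def mse(w, b):
    """Mean Squared Error of the line y = w*x + b over the data."""
    predictions = w * stop_signs + b
    return np.mean((predictions - travel_time) ** 2)

print(mse(0.0, 0.0))   # loss for the initial guess (large)
print(mse(2.0, 10.0))  # loss near a good fit (much smaller)
```

Gradient descent's job is to move `w` and `b` from the first situation toward the second.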
How it Works:
The algorithm starts with initial guesses for the weight and bias, often set to zero. It then calculates the gradient of the loss function with respect to each parameter. The weight and bias are updated in the direction that decreases the loss, using the formulas:

New Weight: \( w_{\text{new}} = w_{\text{old}} - \eta \cdot \frac{\partial L}{\partial w} \)

New Bias: \( b_{\text{new}} = b_{\text{old}} - \eta \cdot \frac{\partial L}{\partial b} \)
This process repeats for a specified number of iterations or until the loss converges to a minimum
value.
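The full update loop can be sketched in Python. The data and the learning rate of 0.05 are assumptions chosen for illustration, not the demo's actual settings:

```python
import numpy as np

# Hypothetical demo data: number of stop signs vs. travel time in minutes.
stop_signs = np.array([0, 1, 2, 3, 4], dtype=float)
travel_time = np.array([10.2, 12.1, 14.3, 15.8, 18.1])

w, b = 0.0, 0.0   # initial guesses for weight and bias
eta = 0.05        # learning rate (assumed value)
n = len(stop_signs)

for _ in range(5000):
    error = (w * stop_signs + b) - travel_time
    # Gradients of MSE: dL/dw = (2/n) * sum(error * x), dL/db = (2/n) * sum(error)
    grad_w = (2 / n) * np.sum(error * stop_signs)
    grad_b = (2 / n) * np.sum(error)
    # Step against the gradient to decrease the loss
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)  # converges toward the least-squares weight and bias
```

With these settings the loop settles close to the best-fit line; a learning rate that is too large would instead make the updates diverge.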
The "Optimal Regression Line" shown on the chart represents the best possible fit using all the data
points. The gradient
descent algorithm tries to approximate this line by minimizing the loss through iterative updates.
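For simple linear regression, that optimal line has a closed form, so we can compute it directly and see what gradient descent is converging toward. A sketch using the same made-up data as above:

```python
import numpy as np

# Hypothetical demo data: number of stop signs vs. travel time in minutes.
stop_signs = np.array([0, 1, 2, 3, 4], dtype=float)
travel_time = np.array([10.2, 12.1, 14.3, 15.8, 18.1])

# Closed-form least-squares fit: the "optimal regression line".
x_mean, y_mean = stop_signs.mean(), travel_time.mean()
w_opt = np.sum((stop_signs - x_mean) * (travel_time - y_mean)) / np.sum((stop_signs - x_mean) ** 2)
b_opt = y_mean - w_opt * x_mean

print(w_opt, b_opt)  # slope and intercept of the optimal line
```

Gradient descent approaches this same `(w_opt, b_opt)` pair iteratively rather than solving for it in one step, which is what makes it generalize to models where no closed form exists.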
Explanation of Symbols:
- \( w \): The weight parameter of the model. It determines how much influence the number of stop signs has on the travel time.
- \( b \): The bias parameter of the model. It represents the baseline travel time when the number of stop signs is zero.
- \( \eta \): The learning rate, a small positive number that controls the size of the steps taken toward the minimum loss.
- \( \frac{\partial L}{\partial w} \): The gradient of the loss function with respect to the weight. It shows how much the loss would change with a small change in the weight.
- \( \frac{\partial L}{\partial b} \): The gradient of the loss function with respect to the bias. It shows how much the loss would change with a small change in the bias.
- \( L \): The loss function. In this case, it is the Mean Squared Error (MSE), which measures the average squared difference between the predicted travel times and the actual travel times.
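For this MSE loss the two gradients have closed forms, obtained by applying the chain rule to each squared residual (with \( x_i \) the number of stop signs and \( y_i \) the actual travel time for trip \( i \)):

\[
L = \frac{1}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right)^2
\]

\[
\frac{\partial L}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right) x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} \left( w x_i + b - y_i \right)
\]

These are the quantities plugged into the update formulas for \( w_{\text{new}} \) and \( b_{\text{new}} \) above.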