Stochastic Gradient Descent (SGD) is an optimization algorithm commonly used in machine learning and deep learning to minimize a loss function. Unlike traditional (batch) gradient descent, which computes the gradient over the entire dataset, SGD updates the model parameters using only a single sample (or a small batch) at each iteration. This makes each update much cheaper, and the noise in the updates can help the optimizer escape shallow local minima. The update rule for SGD can be expressed as:

$$\theta \leftarrow \theta - \eta \, \nabla_\theta J(\theta; x^{(i)}, y^{(i)})$$

where $\theta$ represents the parameters, $\eta$ is the learning rate, and $\nabla_\theta J(\theta; x^{(i)}, y^{(i)})$ is the gradient of the loss function with respect to a single training example $(x^{(i)}, y^{(i)})$. While SGD can converge more quickly than standard gradient descent, the loss may fluctuate more between iterations because each update relies on individual samples. To mitigate this, techniques such as momentum, learning rate decay, and mini-batch gradient descent are often employed, as sketched in the example below.
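As an illustration, the following is a minimal NumPy sketch of mini-batch SGD with an optional momentum term, applied to a simple mean-squared-error loss for linear regression. The function name `sgd`, the quadratic loss, and the synthetic data are assumptions made for this example, not part of the original text.

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=100, batch_size=32, momentum=0.0):
    """Illustrative mini-batch SGD for a linear regression (MSE) loss.

    Implements theta <- theta - lr * grad, with an optional momentum buffer.
    """
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)      # model parameters
    velocity = np.zeros(n_features)   # momentum buffer

    for epoch in range(epochs):
        # Shuffle so each epoch visits the samples in a different order
        perm = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = perm[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]

            # Gradient of the mean squared error on this mini-batch
            grad = 2.0 / len(idx) * X_b.T @ (X_b @ theta - y_b)

            # Momentum update; momentum=0.0 recovers plain SGD
            velocity = momentum * velocity - lr * grad
            theta += velocity
    return theta

# Usage: recover known weights from noisy synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)
print(sgd(X, y, lr=0.05, epochs=50, momentum=0.9))
```

Setting `batch_size=1` gives the single-sample update from the formula above, while larger batches trade noisier updates for more stable (but more expensive) gradient estimates.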