AI Computation (self study analysis, pointers welcome)

SUMMARY

This discussion focuses on deriving a closed-form solution for the gradient descent problem applied to a single sigmoid neuron using calculus. The activation function is defined as ##\sigma(z) = \frac{1}{1 + e^{-z}}##, with inputs ##I## and weights ##w##. The cost function ##f(\theta) = (\theta - E)^2## is analyzed, and the gradients with respect to the weights and bias are calculated. The user seeks feedback on their gradient calculations and is looking for resources to better understand the learning rate in this context.

PREREQUISITES
  • Understanding of gradient descent algorithms
  • Familiarity with sigmoid activation functions
  • Basic knowledge of calculus, particularly derivatives
  • Experience with neural network concepts
NEXT STEPS
  • Research "Learning Rate Optimization Techniques" to enhance gradient descent performance
  • Study "Vector-Valued Functions and Gradients" for better understanding of multi-variable calculus
  • Explore "Stochastic Gradient Descent Variants" for advanced optimization methods
  • Review "Backpropagation in Neural Networks" for insights on weight updates
USEFUL FOR

Students and practitioners in machine learning, particularly those focusing on neural networks and optimization techniques, will benefit from this discussion.

Chenkel
TL;DR
AI Computation problem statement: mathematically represent a single sigmoid neuron with an arbitrary number of weights and inputs, and calculate the backpropagation formula using stochastic gradient descent.

I'm looking to improve my understanding of the algorithm, and hopefully create a thread that could be useful to someone facing a similar AI problem.
In this thread I attempt to find a closed-form solution to the gradient descent problem for a single sigmoid neuron using basic calculus.

If you would like to give pointers, feel free; and if you see me make a mistake, please let me know!

Thank you!
 
Activation function:
$$\sigma(z) = \frac 1 {1+e^{-z}}$$
Inputs to the sigmoid neuron:
$$I=(I_1, I_2, ..., I_m)$$
Weights:
$$w = (w_1, w_2, ..., w_m)$$
Output of the sigmoid neuron:
$$\theta(w, b) = \sigma(z(w, b))$$
where
$$z(w, b) = I \cdot w + b$$
Cost function ##f(\theta)## for expected value ##E##:
$$f(\theta) = (\theta - E)^2$$
Gradient of ##f(\theta)##:
$$\nabla f = \left(\left(\frac {\partial f} {\partial w_1}, ..., \frac {\partial f} {\partial w_m}\right), \frac {\partial f} {\partial b}\right)$$
$$\frac {\partial f} {\partial w_i} = 2(\theta - E)\frac {\partial \theta} {\partial w_i}$$
$$\frac {\partial \theta} {\partial w_i} = \frac {d\sigma} {dz}\frac {\partial z} {\partial w_i}$$
$$\frac {d\sigma} {dz} = \frac {e^{-z}} {(1 + e^{-z})^2}=\left(\frac 1 \sigma - 1\right)\sigma^2=\sigma(1 - \sigma)$$
$$\frac {\partial z} {\partial w_i} = I_i$$
$$\frac {\partial \theta} {\partial w_i} = I_i\sigma(1 - \sigma)$$
$$\frac {\partial f} {\partial w_i} = 2(\theta - E)I_i\sigma(1 - \sigma)$$
$$\frac {\partial f} {\partial b} = 2(\theta - E)\frac {\partial \theta}{\partial b}$$
$$\frac {\partial \theta}{\partial b}=\frac {d\sigma}{dz}\frac {\partial z}{\partial b}=\sigma(1 - \sigma)$$
$$\frac {\partial f} {\partial b} = 2(\theta - E)\sigma(1 - \sigma)$$
$$\frac {\partial f} {\partial w_i} = I_i\frac {\partial f} {\partial b}$$
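To sanity-check these partial derivatives numerically, here is a minimal NumPy sketch of the forward pass and the gradient. The function names and the use of NumPy are my own choices for illustration, not part of the derivation above.

```python
import numpy as np

def sigmoid(z):
    """Activation: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient(I, w, b, E):
    """Gradient of f(theta) = (theta - E)^2 with respect to (w, b).

    Implements df/dw_i = 2(theta - E) * I_i * sigma * (1 - sigma)
    and        df/db   = 2(theta - E) * sigma * (1 - sigma).
    """
    z = np.dot(I, w) + b            # z(w, b) = I . w + b
    theta = sigmoid(z)              # neuron output
    df_db = 2.0 * (theta - E) * theta * (1.0 - theta)
    df_dw = I * df_db               # df/dw_i = I_i * df/db
    return df_dw, df_db
```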
 
I'm still working on the problem; I want to add the learning rate parameter and show the equations for gradient descent. I'll be posting throughout the day. Feel free to post here to add to the discussion on the problem at hand.
 
I am a little unsure whether I calculated the gradient properly. I'm confused about how to go about calculating the gradient of a function whose arguments are the weight vector ##w## and the scalar bias ##b##. It seems a little messy, and I'm wondering if there is a conceptual flaw in the way I am doing it.

Any feedback is welcome, thank you!
 
The following is the update rule to compute the new weights and bias iteratively using learning rate ##\eta##:
$$(w, b) := (w, b) - \eta{\nabla}f = (w, b) - \eta\left(\left(I_1\frac {\partial f} {\partial b}, I_2\frac {\partial f} {\partial b}, ..., I_m\frac {\partial f} {\partial b}\right), \frac {\partial f} {\partial b}\right)$$
Does this look correct?
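As a rough illustration of that update rule, the gradient helper sketched earlier could be iterated like this. The learning rate, inputs, and expected value below are made-up numbers for demonstration, not values from the problem.

```python
eta = 0.1                           # assumed learning rate (made up)
I = np.array([0.5, -1.0, 2.0])      # example inputs (made up)
E = 1.0                             # expected output (made up)
w = np.zeros_like(I)                # initial weights
b = 0.0                             # initial bias

# Repeated update (w, b) := (w, b) - eta * grad f
for step in range(1000):
    df_dw, df_db = gradient(I, w, b, E)
    w -= eta * df_dw
    b -= eta * df_db
```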

Does anyone know the ideal way to choose the learning rate?
 
