AI Computation (self-study analysis, pointers welcome)


Discussion Overview

This thread explores the computation of gradient descent for a single sigmoid neuron, focusing on deriving a closed-form expression for the gradient using calculus. Participants discuss the formulation of the activation function, cost function, and gradients, while seeking pointers and resources for further understanding.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant presents the activation function and cost function for a sigmoid neuron, detailing the gradient calculations for weights and bias.
  • Another participant expresses uncertainty about the correctness of their gradient calculations, particularly regarding vector-valued functions and the inclusion of bias.
  • A participant shares external resources related to gradient descent and neural networks, inviting others to contribute additional materials.
  • There is a proposal to incorporate learning rate parameters into the gradient descent equations, with an intention to share further developments throughout the day.
  • A participant questions the correctness of the recursive formula for updating weights and bias using the learning rate and seeks clarification on the ideal solution for determining the learning rate.

Areas of Agreement / Disagreement

Participants express various levels of confidence in their calculations and understanding, with some uncertainties remaining about the gradient computation and the learning rate. No consensus has been reached regarding the ideal solution for the learning rate or the correctness of the gradient calculations.

Contextual Notes

Participants acknowledge potential conceptual flaws in their approaches and express confusion about the gradient of vector-valued functions, indicating that assumptions may be missing or definitions may need clarification.

Who May Find This Useful

This discussion may be of interest to those studying neural networks, gradient descent optimization, or anyone looking to deepen their understanding of the mathematical foundations of machine learning algorithms.

Chenkel
TL;DR
AI Computation problem statement: mathematically represent a single sigmoid neuron with an arbitrary number of weights and inputs, and derive the backpropagation update formula using stochastic gradient descent.

I'm looking to improve my understanding of the algorithm, and hopefully create a thread that could be useful to someone facing a similar AI problem.
In this thread I attempt to find a closed-form expression for the gradient descent update of a single sigmoid neuron using basic calculus.

If you would like to give pointers, feel free; if you see me make a mistake, please let me know!

Thank you!
 
Activation function:
$$\sigma(z) = \frac{1}{1+e^{-z}}$$
Inputs to the sigmoid neuron:
$$I = (I_1, I_2, \dots, I_m)$$
Weights:
$$w = (w_1, w_2, \dots, w_m)$$
Output of the sigmoid neuron:
$$\theta(w, b) = \sigma(z(w, b)), \quad \text{where} \quad z(w, b) = I \cdot w + b$$
Cost function ##f(\theta)## for expected value ##E##:
$$f(\theta) = (\theta - E)^2$$
Gradient of ##f(\theta)##:
$$\nabla f = \left(\left(\frac{\partial f}{\partial w_1}, \dots, \frac{\partial f}{\partial w_m}\right), \frac{\partial f}{\partial b}\right)$$
$$\frac{\partial f}{\partial w_i} = 2(\theta - E)\frac{\partial \theta}{\partial w_i}$$
$$\frac{\partial \theta}{\partial w_i} = \frac{d\sigma}{dz}\frac{\partial z}{\partial w_i}$$
$$\frac{d\sigma}{dz} = \frac{e^{-z}}{(1 + e^{-z})^2} = \left(\frac{1}{\sigma} - 1\right)\sigma^2 = \sigma(1 - \sigma)$$
$$\frac{\partial z}{\partial w_i} = I_i$$
$$\frac{\partial \theta}{\partial w_i} = I_i\,\sigma(1 - \sigma)$$
$$\frac{\partial f}{\partial w_i} = 2(\theta - E)\,I_i\,\sigma(1 - \sigma)$$
$$\frac{\partial f}{\partial b} = 2(\theta - E)\frac{\partial \theta}{\partial b}$$
$$\frac{\partial \theta}{\partial b} = \frac{d\sigma}{dz}\frac{\partial z}{\partial b} = \sigma(1 - \sigma)$$
$$\frac{\partial f}{\partial b} = 2(\theta - E)\,\sigma(1 - \sigma)$$
$$\frac{\partial f}{\partial w_i} = I_i\frac{\partial f}{\partial b}$$
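To sanity-check the algebra above, here is a minimal NumPy sketch. It is not from the thread; the input vector, weights, bias, and target value are made up purely for illustration. It evaluates the neuron output ##\theta## and the gradient components ##\partial f/\partial w_i## and ##\partial f/\partial b## exactly as derived:

```python
# Minimal sketch of the gradient formulas above (illustrative values only).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_gradients(I, w, b, E):
    """Return (df/dw, df/db) for the cost f(theta) = (theta - E)^2."""
    z = np.dot(I, w) + b                # z = I . w + b
    theta = sigmoid(z)                  # neuron output
    dsigma_dz = theta * (1.0 - theta)   # sigma' = sigma(1 - sigma)
    df_db = 2.0 * (theta - E) * dsigma_dz
    df_dw = I * df_db                   # df/dw_i = I_i * df/db
    return df_dw, df_db

# Arbitrary example values (assumptions, not from the thread)
I = np.array([0.5, -1.2, 0.3])
w = np.array([0.1, 0.4, -0.7])
b = 0.05
E = 1.0
print(neuron_gradients(I, w, b, E))
```

A finite-difference check on ##f## with respect to each ##w_i## and ##b## is one quick way to confirm these expressions numerically.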
 
I'm still working on the problem; I want to add the learning rate parameter and show the equations for gradient descent. I'll be posting throughout the day. Feel free to post here to add to the discussion of the problem at hand.
 
I am a little unsure whether I calculated the gradient properly. I'm confused about how to go about calculating the gradient of a function whose arguments are the weight vector ##w## and the scalar bias ##b## taken together. It seems a little messy, and I'm wondering if there is a conceptual flaw in the way I am doing it.

Any feedback is welcome, thank you!
 
The following is the formula to compute the new weights and bias recursively using learning rate ##\eta##:$$(w, b) := (w, b) - \eta{\nabla}f = (w, b) - \eta\left(\left(I_1\frac {\partial f} {\partial b}, I_2\frac {\partial f} {\partial b}, \dots, I_m\frac {\partial f} {\partial b}\right), \frac {\partial f} {\partial b}\right)$$Does this look correct?
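As a rough illustration of that update rule, here is a hedged sketch (again with arbitrary made-up values, including the learning rate ##\eta##) that repeatedly applies ##(w, b) := (w, b) - \eta\nabla f## for a single training example and watches the output move toward ##E##:

```python
# Sketch of the recursive update (w, b) := (w, b) - eta * grad f.
# All values here are illustrative assumptions, not from the thread.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(I, w, b, E, eta):
    theta = sigmoid(np.dot(I, w) + b)
    df_db = 2.0 * (theta - E) * theta * (1.0 - theta)
    w_new = w - eta * I * df_db   # w_i := w_i - eta * I_i * df/db
    b_new = b - eta * df_db       # b   := b   - eta * df/db
    return w_new, b_new

I = np.array([0.5, -1.2, 0.3])
w, b = np.array([0.1, 0.4, -0.7]), 0.05
E, eta = 1.0, 0.5                 # eta chosen arbitrarily here
for _ in range(100):
    w, b = step(I, w, b, E, eta)
print(sigmoid(np.dot(I, w) + b))  # should have moved toward E
```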

Does anyone know the ideal way to choose the learning rate?
 
