I am trying to minimize the function below, ##R##, to find the optimum ##K##, ##V_d##, and ##V_m##.
Currently I minimize ##V_d## and ##V_m## with gradient descent and find the best ##K## through a binary search, but, if possible, I would like to get rid of the binary search and use only gradient descent. I am having trouble, though, and I'm not sure whether the problem is in my code or my math, so I am hoping someone can help check the math.
I have ##N## experiments. ##K## is an unknown independent variable that, together with a given ##H_j##, controls my dependent ##X## variables through a function ##X_n=F(H,K)##, ##n \in \{1,2\}##. ##V_d## and ##V_m## are coefficients within a certain range that must be found through minimization. ##H_j## and ##V_{0j}## are given experimental values.
$$R=\sum_{j=0}^N[V_{0j}-\frac{1}{H_{j}}(V_mX_{1j}+2V_dX_{2j})]^2$$
$$\phi_j=V_{0j}-\frac{1}{H_j}(V_mX_{1j}+2V_dX_{2j})$$
$$\frac{\partial R}{\partial V_m}=\sum_{j=0}^N-\frac{2\phi_j}{H_j}X_{1j}$$
$$\frac{\partial R}{\partial V_d}=\sum_{j=0}^N-\frac{4\phi_j}{H_j}X_{2j}$$
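To make this concrete, here is a minimal vectorized sketch of ##\phi_j## and the two gradients in MATLAB (not my actual code; it assumes ##H_j##, ##V_{0j}##, ##X_{1j}##, ##X_{2j}## are stored as N-by-1 vectors `H`, `V0`, `X1`, `X2`, and ##V_m##, ##V_d## as scalars `Vm`, `Vd`):

```matlab
% Residual and gradients for Vm, Vd; all variable names are placeholders.
phi    = V0 - (Vm*X1 + 2*Vd*X2) ./ H;   % phi_j for every experiment j
dR_dVm = -2 * sum(phi .* X1 ./ H);      % dR/dVm
dR_dVd = -4 * sum(phi .* X2 ./ H);      % dR/dVd
R      = sum(phi.^2);                   % objective, handy for monitoring
```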
Per gradient descent, I update ##V_m## and ##V_d## using the partial derivatives of ##R## and a learning rate ##\alpha##:
$$V_m^{i+1}=V_m^i-\alpha\frac{\partial R}{\partial V_m}$$
$$V_d^{i+1}=V_d^i-\alpha\frac{\partial R}{\partial V_d}$$
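With those gradients, the descent loop for ##V_m## and ##V_d## alone (holding ##K## fixed) looks roughly like the following sketch; `alpha` and `nIter` are placeholder choices:

```matlab
alpha = 1e-3;                               % learning rate (placeholder)
for i = 1:nIter
    phi = V0 - (Vm*X1 + 2*Vd*X2) ./ H;      % recompute residuals
    Vm  = Vm - alpha * (-2 * sum(phi .* X1 ./ H));   % Vm update step
    Vd  = Vd - alpha * (-4 * sum(phi .* X2 ./ H));   % Vd update step
end
```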
The above calculations work, but I run into trouble when I try to do the same for ##K##. This is my math for taking the partial derivative of ##R## with respect to ##K##:
$$\frac{\partial R}{\partial K}=\sum_{j=0}^N-\frac{2\phi_j}{H_j}(\frac{\partial V_m}{\partial K}X_{1j}+\frac{\partial X_{1j}}{\partial K}V_m+2\frac{\partial V_d}{\partial K}X_{2j}+2\frac{\partial X_{2j}}{\partial K}V_d)$$
Because ##K## controls the ##X_{nj}##, a small change in ##K## would also shift the optimal ##V_d## and ##V_m## found by minimizing ##R## with gradient descent, but I don't have a relationship for that dependence. I am not sure, but I think ##\frac{\partial V_d}{\partial K} = 0## and ##\frac{\partial V_m}{\partial K} = 0##, since I treat them as independent optimization variables. Furthermore, for ##X##:
$$\frac{\partial X_n}{\partial K}=\frac{\partial X_n}{\partial F}\frac{\partial F}{\partial K}=\frac{\partial F}{\partial K}\approx\frac{F(H,K+\delta K)-F(H,K)}{\delta K}$$
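In code, that forward difference is roughly the following sketch, where `F` stands for the model function and is assumed to return an N-by-2 matrix whose columns are ##X_{1j}## and ##X_{2j}##; the step `dK` is a placeholder choice:

```matlab
dK    = 1e-6 * max(abs(K), 1);    % finite-difference step, scaled to K
Xbase = F(H, K);                  % N-by-2 matrix: columns X1, X2
Xplus = F(H, K + dK);
dX_dK = (Xplus - Xbase) / dK;     % column n approximates dX_n/dK
```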
So the partial of ##R## becomes:
$$\frac{\partial R}{\partial K}=\sum_{j=0}^N-\frac{2\phi_j}{H_j}(\frac{\partial X_{1j}}{\partial K}V_m+2\frac{\partial X_{2j}}{\partial K}V_d)$$
and the update step for ##K##:
$$K_{i+1}=K_i-\alpha\frac{\partial R}{\partial K}$$
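Putting it together, one descent step in ##K## looks roughly like this, reusing `Xbase` and `dX_dK` from the sketch above and a separate placeholder step size `alphaK` (the scale of ##K## can differ from that of ##V_m## and ##V_d##):

```matlab
phi   = V0 - (Vm*Xbase(:,1) + 2*Vd*Xbase(:,2)) ./ H;             % residuals
dR_dK = -2 * sum(phi .* (Vm*dX_dK(:,1) + 2*Vd*dX_dK(:,2)) ./ H); % dR/dK
K     = K - alphaK * dR_dK;                                      % K update
```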
Anyway, it does not converge correctly when I implement this in MATLAB. I am not sure whether I am wrong about the partials of ##V_d## and ##V_m##, whether the fact that ##V_m## multiplies ##X## has an effect I am missing, or whether my math is right and the problem is in my code. Any help is appreciated.