Is this system of equations (numerically) solvable?

In summary, the thread asks whether a system of ##N+1## coupled equations in the parameters ##\theta_1,\dots,\theta_N## and ##J## can be solved numerically. The suggested approach is to measure how badly the equations are violated for a given set of parameter values, treat that violation as a cost function, and minimize it, for example with gradient descent or an off-the-shelf optimizer.
  • #1
dumbperson
Hi,

In a project of mine I've encountered the following set of equations:

$$ \sum_{i=1}^N \left(\frac{1}{M}\sum_{\alpha=1}^Mg_{ij}^\alpha - u_{ij}^* \right) = 0 \qquad \forall\, j:\ 1\leq j \leq N$$
$$\sum_{i<j}\left( (u_{ij}^*)^2 - \frac{2}{M^2}\sum_{\alpha < \beta}^Mg_{ij}^\alpha g_{ij}^\beta \right) = 0 $$

and ##u_{ij}^*## is the ##u_{ij}## that solves the equation
$$ u_{ij}= \frac{1}{2} + \frac{1}{2}\tanh{\left( - \frac{\theta_i + \theta_j}{2} + 2\cdot J\cdot M \cdot u_{ij}\right)} $$
where the ##\theta_i (i=1...N)## and ##J## are the variables to be solved for (which are ##N+1## different variables) and ##M,N,g_{ij}^\alpha## are known values (observables).

Do you think it is possible to solve this set of equations for ##\theta_i, J##? I would naively say yes, since we have ##N+1## equations (the first two) and ##N+1## variables, but I'm not sure how I would solve this numerically.

I have been stuck on this for a long time so some thoughts from you smart guys would be greatly appreciated!
 
  • #2
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine
 
  • #3
You can always plug it into some numerical solver and see what comes out.

Some quick simplifications here, with suitable new variables ##c_j## and ##D##. First set of equations:
$$\sum_{i=1}^N u_{ij}^* = c_j \qquad \forall\, j:\ 1\leq j \leq N$$
Second equation:$$ \sum_{i<j} (u_{ij}^*)^2 = D$$
With a suitable redefinition of u (e.g. ##u'_{ij}=2u_{ij}##), you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle, ##\theta'_i = -\theta_i/2##, and for ##J'=JM## instead of ##J##.
$$u'_{ij} = 1+ \tanh \left( \theta'_i + \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and ##J'##, you can find ##u_{ij}##, and see how much the equalities are violated, and then you can try to minimize this violation.
 
  • #4
dumbperson said:
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine

If you mean it's schoolwork, it is fine here. Advanced schoolwork where you show lots of work qualifies for the technical forums. :smile:
 
  • #5
mfb said:
You can always plug it into some numerical solver and see what comes out.

With a suitable redefinition of u (e.g. ##u'_{ij}=2u_{ij}##), you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle, ##\theta'_i = -\theta_i/2##, and for ##J'=JM## instead of ##J##.
$$u'_{ij} = 1+ \tanh \left( \theta'_i + \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and ##J'##, you can find ##u_{ij}##, and see how much the equalities are violated, and then you can try to minimize this violation.

But how exactly do I solve here for ##\theta_i, J##? To use the equation in the quote, I would need to know the value of ##u_{ij}##, but from the equations I can only obtain the values of the sum ##\sum_{i=1}^N u_{ij}##, no? Maybe I misunderstood what you're saying.
 
  • #6
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
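For concreteness, a minimal sketch of such a "quality" function in Python (an illustration only: the array name `g`, its assumed layout of shape (M, N, N) holding the ##g_{ij}^\alpha##, and the plain fixed-point solve for ##u_{ij}## are all assumptions, not part of the posts above):

```python
import numpy as np

def solve_u(theta, J, M, n_iter=200):
    """Solve u_ij = 1/2 + 1/2*tanh(-(theta_i+theta_j)/2 + 2*J*M*u_ij) by fixed-point iteration.

    Plain iteration for illustration; it may need damping or a root-finder when J*M is large.
    """
    theta = np.asarray(theta, dtype=float)
    s = -(theta[:, None] + theta[None, :]) / 2.0
    u = np.full((len(theta), len(theta)), 0.5)
    for _ in range(n_iter):
        u = 0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u)
    return u

def quality(theta, J, g):
    """Single number measuring how badly the two equation sets are violated (0 = perfect match)."""
    M, N, _ = g.shape
    u = solve_u(theta, J, M)
    u_bar = g.mean(axis=0)                                   # (1/M) sum_alpha g_ij^alpha
    r1 = (u_bar - u).sum(axis=0)                             # first set: one residual per j
    counts = g.sum(axis=0)
    pair_sums = (counts**2 - (g**2).sum(axis=0)) / 2.0       # sum_{alpha<beta} g^alpha g^beta
    iu = np.triu_indices(N, k=1)
    r2 = (u[iu]**2).sum() - (2.0 / M**2) * pair_sums[iu].sum()   # second equation
    return np.sum(r1**2) + r2**2
```

Any general-purpose minimizer can then be pointed at `quality` as a function of the ##N+1## parameters.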
 
  • #7
mfb said:
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
Ah, so like minimizing some error function? Thanks, I'll think about it and try to implement it :)
 
  • #8
mfb said:
There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.

Could you recommend me some tools for this automatic optimization? :)
 
  • #9
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
 
  • #10
mfb said:
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
I have tried to implement my own algorithm, but it does not seem to work. Maybe you have some suggestions on why it fails, or can suggest alternatives? :-)

First of all, I forgot to mention that the value of ##g_{ij}^\alpha## is either 0 or 1.

First I make the following simplifications/definitions:

$$ \sum_{j=1}^N u_{ij}^* = \frac{1}{M}\sum_{\alpha=1}^M\sum_{j=1}^Ng_{ij}^\alpha \equiv \overline{k_i^*} $$
$$\sum_{i<j} (u_{ij}^*)^2 = \frac{2}{M^2}\sum_{\alpha<\beta} \sum_{i<j} g_{ij}^\alpha g_{ij}^\beta \equiv O^*$$
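For concreteness, these two data reductions can be computed once, up front, along the following lines (illustration only: I assume the ##g_{ij}^\alpha## sit in a 0/1 NumPy array of shape (M, N, N), and all names are mine):

```python
import numpy as np

def data_summaries(g):
    """g: 0/1 array of shape (M, N, N) holding g_ij^alpha.  Returns (k_bar, O_star)."""
    M, N, _ = g.shape
    k_bar = g.sum(axis=0).sum(axis=1) / M            # (1/M) sum_alpha sum_j g_ij^alpha
    counts = g.sum(axis=0)                           # k_ij = sum_alpha g_ij^alpha
    pair_sums = counts * (counts - 1) / 2.0          # sum_{alpha<beta} g^alpha g^beta, since g is 0/1
    iu = np.triu_indices(N, k=1)
    O_star = (2.0 / M**2) * pair_sums[iu].sum()
    return k_bar, O_star
```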

I now define a cost function:
$$C(\vec{\theta}, J) \equiv \frac{1}{2} \sum_{i=1}^N \bigg|\bigg|\overline{k_i^*} - \sum_{j=1}^Nu_{ij}^*(\theta_i,\theta_j, J) \bigg|\bigg|^2 + \frac{1}{2}\bigg|\bigg|O^* - \sum_{i<j} \big(u_{ij}^*(\theta_i, \theta_j, J)\big)^2 \bigg|\bigg|^2 $$
which we want to minimize.
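Written out, and assuming ##u_{ij}^*## has already been solved for the current parameters and stored as an ##N\times N## array (names are illustrative):

```python
import numpy as np

def cost(u, k_bar, O_star):
    """Cost C for a solved matrix of u_ij^* values and the data summaries k_bar_i, O_star."""
    iu = np.triu_indices(u.shape[0], k=1)
    term1 = 0.5 * np.sum((k_bar - u.sum(axis=1))**2)
    term2 = 0.5 * (O_star - np.sum(u[iu]**2))**2
    return term1 + term2
```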

A change in the value of the parameters ##(\theta_1,...,\theta_N,J)## leads to a change in the cost function

$$ \Delta C(\vec{\theta},J) \approx \sum_{k=1}^N \frac{\partial C(\vec{\theta},J)}{\partial \theta_k} \Delta \theta_k + \frac{\partial C(\vec{\theta},J)}{\partial J}\Delta J $$

The "optimal" direction in which we can move our parameters is in the direction of the negative gradient of the cost function:

$$ \Delta \theta_k = -\eta \frac{\partial C(\vec{\theta},J)}{\partial \theta_k}, \qquad \Delta J = -\eta \frac{\partial C(\vec{\theta}, J)}{\partial J} $$
where ##\eta## is some positive number that controls the size of this change.

Now, the derivative of the cost function with respect to some parameter ##x## is

$$ \frac{\partial C}{\partial x} = - \sum_{i=1}^N \bigg(\overline{k_i^*} - \sum_{j=1}^N u_{ij}^*\bigg)\sum_{j=1}^N\frac{\partial u_{ij}^*}{\partial x} - 2 \bigg( O^* - \sum_{i<j}\big(u_{ij}^*\big)^2\bigg)\sum_{i<j}u_{ij}^* \frac{\partial u_{ij}^*}{\partial x}$$
We can find the derivatives ##\partial u_{ij}^*/\partial J## and ##\partial u_{ij}^*/\partial \theta_k## by using the implicit relation we have for ##u_{ij}^*##:

$$u_{ij}^* = \frac{1}{2} + \frac{1}{2}\tanh{\left(- \frac{\theta_i + \theta_j}{2} + 2JMu_{ij}^* \right)}$$
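As a practical aside before taking derivatives: for fixed ##\theta_i, \theta_j, J## this is a scalar equation whose residual is negative at ##u_{ij}^*=0## and positive at ##u_{ij}^*=1##, so a bracketing root-finder always returns a solution in ##(0,1)##. A sketch using SciPy (my choice of tool, not something mentioned in the thread):

```python
import numpy as np
from scipy.optimize import brentq

def solve_u_pair(theta_i, theta_j, J, M):
    """Solve u = 1/2 + 1/2*tanh(-(theta_i+theta_j)/2 + 2*J*M*u) for one pair (i, j).

    The residual is negative at u=0 and positive at u=1, so a root always lies in (0, 1).
    Caveat: for large 2*J*M the equation can have several roots; this returns one of them.
    """
    s = -(theta_i + theta_j) / 2.0
    f = lambda u: u - 0.5 - 0.5 * np.tanh(s + 2.0 * J * M * u)
    return brentq(f, 0.0, 1.0)
```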

Taking the derivative of both sides with respect to ##J## and ##\theta_k## (using ##d(\tanh(x))/dx = 1-\tanh^2(x)##) leads to

$$\frac{\partial u_{ij}^*}{\partial \theta_k} = -\frac{\frac{1}{4}\big[1-(2u_{ij}^*-1)^2\big](\delta_i^k + \delta_j^k)}{1 - \big[1-(2u_{ij}^*-1)^2\big]\cdot JM}$$
$$\frac{\partial u_{ij}^*}{\partial J} = \frac{\big[1-(2u_{ij}^*-1)^2\big]\cdot Mu_{ij}^*}{1 - \big[1- (2u_{ij}^*-1)^2 \big]\cdot JM} $$

Note that the derivatives vanish when ##u_{ij}^* = 1##. Looking at the implicit relation for ##u_{ij}^*##, we can see that this happens when ##J## becomes relatively large.
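In code, both formulas share the factor ##1-(2u_{ij}^*-1)^2## and the same denominator, so they can be evaluated together once ##u_{ij}^*## is known (a small helper, names illustrative):

```python
def u_derivative_factors(u, J, M):
    """Return (coeff, du_dJ) for a solved NumPy matrix u of shape (N, N) holding u_ij^*.

    du/dtheta_k = coeff[i, j] * (delta_i^k + delta_j^k), and du/dJ is returned as a full matrix.
    """
    t = 1.0 - (2.0 * u - 1.0)**2        # equals 1 - tanh^2(...) at the solution
    denom = 1.0 - t * J * M             # approaches 0 when t*J*M is close to 1
    coeff = -0.25 * t / denom
    du_dJ = t * M * u / denom
    return coeff, du_dJ
```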

The algorithm is therefore:

  1. Initialize the parameters ##(\theta_1,...,\theta_N, J)## (randomly)
  2. Solve for the value of ##u_{ij}^*## using the implicit relation
  3. Calculate the value of the cost function
  4. Calculate the partial derivatives
  5. Update parameters using partial derivatives
  6. Repeat from step 2 until the desired cost is reached (a sketch of the full loop follows below)
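A self-contained sketch of this loop (plain gradient descent; the layout of `g`, the learning rate, the iteration counts and the initialisation are assumptions for illustration, and a real implementation should guard against the derivative denominator approaching zero):

```python
import numpy as np

def fit(g, eta=1e-3, n_steps=2000, seed=0):
    """Steps 1-6 above for a 0/1 observation array g of shape (M, N, N)."""
    M, N, _ = g.shape
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(N, k=1)
    mask = np.zeros((N, N))
    mask[iu] = 1.0                                   # selects each pair i<j once

    # data summaries (g is 0/1, so sum_{a<b} g^a g^b = k(k-1)/2 with k = sum_a g^a)
    k_bar = g.sum(axis=0).sum(axis=1) / M
    counts = g.sum(axis=0)
    O_star = (2.0 / M**2) * (counts * (counts - 1) / 2.0)[iu].sum()

    # step 1: random initialisation
    theta = rng.normal(scale=0.1, size=N)
    J = 0.0

    for _ in range(n_steps):
        # step 2: solve the implicit relation (damped fixed-point iteration)
        s = -(theta[:, None] + theta[None, :]) / 2.0
        u = np.full((N, N), 0.5)
        for _ in range(200):
            u = 0.5 * u + 0.5 * (0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u))

        # step 3: residuals and cost
        r1 = k_bar - u.sum(axis=1)
        r2 = O_star - (u**2 * mask).sum()
        current_cost = 0.5 * np.sum(r1**2) + 0.5 * r2**2

        # step 4: implicit-function derivatives of u_ij^*
        t = 1.0 - (2.0 * u - 1.0)**2
        denom = 1.0 - t * J * M                      # no safeguard here; see the caveat above
        du_dJ = t * M * u / denom
        coeff = -0.25 * t / denom                    # multiplies (delta_i^k + delta_j^k)

        # gradient of the cost, from the dC/dx formula above
        grad_J = -np.sum(r1[:, None] * du_dJ) - 2.0 * r2 * np.sum(u * du_dJ * mask)
        a = -r1[:, None] * coeff                     # contribution of the first term of C
        b = -2.0 * r2 * (u * coeff * mask)           # contribution of the second term of C
        grad_theta = a.sum(axis=1) + a.sum(axis=0) + b.sum(axis=1) + b.sum(axis=0)

        # step 5: gradient step
        theta -= eta * grad_theta
        J -= eta * grad_J

    return theta, J, current_cost
```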
I've experimented with different values of ##\eta##. Either the first iteration reduces the cost enormously but produces a large change in ##J##, so that in the next iteration ##u_{ij}^*## is close to 1 and the parameters essentially stop changing; or the first iteration leads to a small change, and so do all subsequent iterations (it would take years to converge). I can't seem to find a middle ground!

I've also considered the possibility that, even if the algorithm does converge, it may converge to a local minimum of the cost function.

Any thoughts?
 
  • #11
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things like simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.
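For illustration, a bare-bones annealing loop of that kind could look like this (a generic sketch rather than a tuned recipe: the step scale and cooling schedule are arbitrary choices, and `cost` can be any function mapping a parameter vector, e.g. ##(\theta_1,\dots,\theta_N, J)##, to a single number such as the cost function from post #10):

```python
import numpy as np

def anneal(cost, x0, n_steps=5000, step0=1.0, T0=1.0, seed=0):
    """Generic simulated annealing: `cost` maps a parameter vector to a number to minimize."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    c = cost(x)
    best_x, best_c = x.copy(), c
    for k in range(n_steps):
        frac = 1.0 - k / n_steps                        # goes from 1 towards 0 over the run
        step = step0 * frac + 1e-3                      # shrink the step size over time
        T = T0 * frac + 1e-9                            # shrink the "temperature" over time
        x_new = x + step * rng.normal(size=x.shape)     # random (Gaussian) direction
        c_new = cost(x_new)
        # always accept improvements; accept some "bad" steps with decreasing probability
        if c_new < c or rng.random() < np.exp(-(c_new - c) / T):
            x, c = x_new, c_new
            if c < best_c:
                best_x, best_c = x.copy(), c
    return best_x, best_c
```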
 
  • #12
mfb said:
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things like simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.

Thanks for your response. I'll think about the points you mentioned!

About pre-existing algorithms/tools: I assumed that they would be incompatible with my problem, since I do not have an explicit function that I want to minimize; the function I want to minimize involves a hidden function ##u_{ij}^*(\theta_1,...,\theta_N,J)## (that obeys the relation I posted). If I'm wrong about this, I would love to find a tool that I can use for my specific problem! If so, could you steer me in the right direction? I'd understand if you think I'm asking too much :P

I tried to implement the algorithm I described in Python by the way
 
  • #13
Some tools should allow a separate user-defined function to calculate the cost of a given set of parameters - in that function you can include your intermediate parameters. I never needed that, so I don't know how exactly.
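For example, with SciPy's general-purpose `minimize` (my suggestion; no specific library is named above), the user-defined cost function can simply perform the inner solve for ##u_{ij}^*## before returning a number, and a derivative-free method such as Nelder-Mead sidesteps differentiating through that inner solve. A sketch with the small helpers repeated inline so it runs on its own (the array layout of `g` and all settings are assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def cost(params, g):
    """params = (theta_1, ..., theta_N, J); g is a 0/1 array of shape (M, N, N)."""
    theta, J = params[:-1], params[-1]
    M, N, _ = g.shape
    # inner solve of the implicit relation (plain fixed-point iteration)
    s = -(theta[:, None] + theta[None, :]) / 2.0
    u = np.full((N, N), 0.5)
    for _ in range(200):
        u = 0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u)
    # data summaries and residuals of the two equation sets
    k_bar = g.sum(axis=0).sum(axis=1) / M
    counts = g.sum(axis=0)
    iu = np.triu_indices(N, k=1)
    O_star = (2.0 / M**2) * (counts * (counts - 1) / 2.0)[iu].sum()
    r1 = k_bar - u.sum(axis=1)
    r2 = O_star - (u[iu]**2).sum()
    return 0.5 * np.sum(r1**2) + 0.5 * r2**2

# toy example with random 0/1 data, for illustration only
rng = np.random.default_rng(0)
g = rng.integers(0, 2, size=(5, 10, 10))
x0 = np.append(rng.normal(scale=0.1, size=10), 0.0)        # initial (theta_1..theta_N, J)
res = minimize(cost, x0, args=(g,), method="Nelder-Mead")
print(res.x, res.fun)
```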
 

Frequently asked questions

1. Can every system of equations be solved numerically?

No, not every system of equations can be solved numerically. Some systems have no solution at all, and others have infinitely many; in those cases a numerical routine may fail to converge or can at best return one of many possible solutions.

2. How do I know if a system of equations is solvable?

A system of equations is solvable if it is consistent, i.e. the equations do not contradict each other. Whether the solution is unique depends, in addition, on the equations being independent and matching the number of unknowns.

3. What methods can be used to solve a system of equations numerically?

There are several methods that can be used to solve a system of equations numerically, including Gaussian elimination and the Jacobi and Gauss-Seidel iterations for linear systems, and the Newton-Raphson method for nonlinear systems. The most suitable method depends on the specific characteristics of the system of equations.

4. How accurate are numerical solutions compared to analytical solutions?

Numerical solutions are generally less accurate than analytical solutions, because numerical methods involve approximations and rounding errors that can accumulate and affect the final result. However, with sufficiently tight tolerances and a careful implementation, numerical solutions can come very close to analytical ones.

5. Are there any limitations to solving systems of equations numerically?

Yes, there are limitations to solving systems of equations numerically. Some systems may be too complex or have too many variables, making it computationally infeasible to find a numerical solution. Additionally, numerical methods may not work well for systems with discontinuous or rapidly changing functions.
