Is this system of equations (numerically) solvable?


Discussion Overview

The discussion revolves around the numerical solvability of a specific system of equations involving variables related to a project. Participants explore the feasibility of solving for certain parameters within a set of equations that include summations and hyperbolic tangent functions, focusing on the numerical methods that could be employed.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a set of equations and expresses uncertainty about the numerical solvability for the variables ##\theta_i## and ##J##, noting the number of equations matches the number of variables.
  • Another participant suggests using a numerical solver and proposes simplifications to the equations, indicating that with redefinitions, the system can be analyzed more easily.
  • Several participants discuss methods for minimizing the violation of the equations, including defining a cost function and using optimization techniques to adjust parameters iteratively.
  • One participant shares their experience with ROOT and suggests finding minimization tools in the reader's preferred programming language.
  • A later reply details an algorithm for optimizing the parameters, including the calculation of partial derivatives based on the implicit relations of the equations.
  • Concerns are raised about the behavior of the derivatives when certain conditions are met, particularly when ##u_{ij}^* = 1## and the implications for the optimization process.

Areas of Agreement / Disagreement

Participants generally agree on the approach of using numerical solvers and optimization techniques, but there is no consensus on the specific methods or tools to be used, and some uncertainty remains regarding the effectiveness of the proposed algorithms.

Contextual Notes

The discussion includes various assumptions about the behavior of the equations and the parameters involved, particularly regarding the implications of certain values leading to zero derivatives. There are also limitations in the clarity of how to derive certain values from the equations.

Who May Find This Useful

This discussion may be useful for individuals interested in numerical methods for solving complex systems of equations, particularly in the context of physics or engineering projects involving optimization and parameter estimation.

dumbperson
Hi,

In a project of mine I've encountered the following set of equations:

$$ \sum_{i=1}^N \left(\frac{1}{M}\sum_{\alpha=1}^Mg_{ij}^\alpha - u_{ij}^* \right) = 0 \qquad \forall: 1\leq j \leq N$$
$$\sum_{i<j}\left( (u_{ij}^*)^2 - \frac{2}{M^2}\sum_{\alpha < \beta}^Mg_{ij}^\alpha g_{ij}^\beta \right) = 0 $$

and ##u_{ij}^*## is the ##u_{ij}## that solves the equation
$$ u_{ij}= \frac{1}{2} + \frac{1}{2}\tanh{\left( - \frac{\theta_i + \theta_j}{2} + 2\cdot J\cdot M \cdot u_{ij}\right)} $$
where the ##\theta_i## (##i=1,\ldots,N##) and ##J## are the variables to be solved for (##N+1## variables in total) and ##M, N, g_{ij}^\alpha## are known values (observables).

Do you think it is possible to solve this set of equations for ##\theta_i, J##? I would naively say yes, since we have ##N+1## equations (##N## from the first line, one for each ##j##, plus the single second equation) and ##N+1## variables, but I'm not sure how I would solve this numerically.

I have been stuck on this for a long time so some thoughts from you smart guys would be greatly appreciated!
 
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine
 
You can always plug it into some numerical solver and see what comes out.

Some quick simplifications here, with suitable new variables ##c_j## and ##D##. First set of equations:
$$\sum_{i=1}^N u_{ij}^* = c_j \qquad \forall: 1\leq j \leq N$$
Second equation:$$ \sum_{i<j} (u_{ij}^*)^2 = D$$
With a suitable redefinition of ##u## (e.g. ##u'_{ij} = 2u_{ij}##), you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle ##\theta'_i = -\theta_i/2## and for ##J'=JM## instead of ##J##.
$$u'_{ij} = 1+ \tanh \left( \theta'_i + \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.
 
dumbperson said:
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine

If you mean it's schoolwork, it is fine here. Advanced schoolwork where you show lots of work qualifies for the technical forums. :smile:
 
mfb said:
You can always plug it into some numerical solver and see what comes out.

With a suitable redefinition of ##u## (e.g. ##u'_{ij} = 2u_{ij}##), you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle ##\theta'_i = -\theta_i/2## and for ##J'=JM## instead of ##J##.
$$u'_{ij} = 1+ \tanh \left( \theta'_i + \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.

But how exactly do I solve here for ##\theta_i, J##? To use the equation in the quote, I would need to know the value of ##u_{ij}##, but from the equations, I can only obtain the values of the sum ##\sum_{i=1}^N u_{ij}##, no? Maybe I misunderstood what you're saying.
 
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
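For concreteness, a minimal Python/NumPy sketch of such a "quality" function might look like the following. It assumes the observations are stored as an array g of shape (M, N, N); the plain fixed-point loop and the names solve_u and quality are illustrative choices, not anything prescribed in this thread.

```python
import numpy as np

def solve_u(theta, J, M, n_iter=500):
    """Solve u_ij = 1/2 + 1/2*tanh(-(theta_i + theta_j)/2 + 2*J*M*u_ij) for all
    i, j by simple fixed-point iteration (may need damping when J*M is large)."""
    u = np.full((len(theta), len(theta)), 0.5)
    s = -(theta[:, None] + theta[None, :]) / 2.0
    for _ in range(n_iter):
        u = 0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u)
    return u

def quality(theta, J, g):
    """Sum of squared violations of the two sets of equalities."""
    M, N = g.shape[0], g.shape[1]
    u = solve_u(theta, J, M)
    # first set: for every j, sum_i ( (1/M) sum_alpha g_ij^alpha - u_ij ) = 0
    r1 = (g.mean(axis=0) - u).sum(axis=0)
    # second equation: sum_{i<j} ( u_ij^2 - (2/M^2) sum_{alpha<beta} g^alpha g^beta ) = 0
    S, S2 = g.sum(axis=0), (g**2).sum(axis=0)
    pair = (S**2 - S2) / M**2            # equals (2/M^2) * sum_{alpha<beta} g^alpha g^beta
    iu = np.triu_indices(N, k=1)         # index pairs with i < j
    r2 = (u[iu]**2 - pair[iu]).sum()
    return (r1**2).sum() + r2**2
```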
 
mfb said:
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
Ah, so like minimizing some error function? Thanks, I'll think about it and try to implement it :)
 
mfb said:
There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.

Could you recommend me some tools for this automatic optimization? :)
 
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
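In Python (which the thread's author mentions using further down), handing this over to a stock derivative-free minimizer could look roughly like the sketch below. It assumes the quality(theta, J, g) helper sketched a few posts above and an observation array g of shape (M, N, N); the Nelder-Mead method, the starting point, and the options are illustrative choices, not recommendations made in the thread.

```python
import numpy as np
from scipy.optimize import minimize

# Assumes quality(theta, J, g) as sketched earlier in the thread, and a given array g.
def cost(x, g):
    theta, J = x[:-1], x[-1]       # the N+1 free parameters packed into one vector
    return quality(theta, J, g)

N = g.shape[1]
x0 = np.concatenate([np.random.randn(N), [0.1]])   # random start for theta_1..theta_N and J
res = minimize(cost, x0, args=(g,), method="Nelder-Mead",
               options={"maxiter": 50_000, "xatol": 1e-10, "fatol": 1e-12})
theta_fit, J_fit = res.x[:-1], res.x[-1]
```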
 
  • #10
mfb said:
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
I have tried to implement my own algorithm, but it does not seem to work. Maybe you have some suggestions on why it doesn't work, or can suggest alternatives? :-)

First of all, I forgot to mention that the value of ##g_{ij}^\alpha## is either 0 or 1.

First I make the following simplifications/definitions:

$$ \sum_{j=1}^N u_{ij}^* = \frac{1}{M}\sum_{\alpha=1}^M\sum_{j=1}^Ng_{ij}^\alpha \equiv \overline{k_i^*} $$
$$\sum_{i<j} (u_{ij}^*)^2 = \frac{2}{M^2}\sum_{\alpha<\beta} \sum_{i<j} g_{ij}^\alpha g_{ij}^\beta \equiv O^*$$
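For reference, assuming the observations are stored as a 0/1 array g of shape (M, N, N) (an assumed layout; the function name is illustrative), these two quantities can be computed directly from the data, for example:

```python
import numpy as np

def observables(g):
    """g: 0/1 array of shape (M, N, N) holding g_ij^alpha."""
    M, N = g.shape[0], g.shape[1]
    kbar = g.sum(axis=(0, 2)) / M        # (1/M) sum_alpha sum_j g_ij^alpha, one value per i
    S = g.sum(axis=0)                    # S_ij = sum_alpha g_ij^alpha
    pair = (S**2 - S) / 2.0              # sum_{alpha<beta} g^alpha g^beta, using g in {0, 1}
    iu = np.triu_indices(N, k=1)         # index pairs with i < j
    O = 2.0 * pair[iu].sum() / M**2
    return kbar, O
```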

I now define a cost function:
$$C(\vec{\theta}, J) \equiv \frac{1}{2} \sum_{i=1}^N \bigg|\bigg|\overline{k_i^*} - \sum_{j=1}^Nu_{ij}^*(\theta_i,\theta_j, J) \bigg|\bigg|^2 + \frac{1}{2}\bigg|\bigg|O^* - \sum_{i<j} \big(u_{ij}^*(\theta_i, \theta_j, J)\big)^2 \bigg|\bigg|^2 $$
which we want to minimize.

A change in the value of the parameters ##(\theta_1,...,\theta_N,J)## leads to a change in the cost function

$$ \Delta C(\vec{\theta},J) \approx \sum_{k=1}^N \frac{\partial C(\vec{\theta},J)}{\partial \theta_k} \Delta \theta_k + \frac{\partial C(\vec{\theta},J)}{\partial J}\Delta J $$

The "optimal" direction in which we can move our parameters is in the direction of the negative gradient of the cost function:

$$ \Delta \theta_k = -\eta \frac{\partial C(\vec{\theta},J)}{\partial \theta_k}, \qquad \Delta J = -\eta \frac{\partial C(\vec{\theta}, J)}{\partial J} $$
where ##\eta## is some positive number that controls the size of this change.

Now, the derivative of the cost function with respect to some parameter ##x## is

$$ \frac{\partial C}{\partial x} = - \sum_{i=1}^N \bigg(\overline{k_i^*} - \sum_{j=1}^N u_{ij}^*\bigg)\sum_{j=1}^N\frac{\partial u_{ij}^*}{\partial x} - 2 \bigg( O^* - \sum_{i<j}\big(u_{ij}^*\big)^2\bigg)\sum_{i<j}u_{ij}^* \frac{\partial u_{ij}^*}{\partial x}$$
We can find the derivatives ##\partial u_{ij}^*/\partial J## and ##\partial u_{ij}^*/\partial \theta_k## by using the implicit relation we have for ##u_{ij}^*##:

$$u_{ij}^* = \frac{1}{2} + \frac{1}{2}\tanh{\left(- \frac{\theta_i + \theta_j}{2} + 2JMu_{ij}^* \right)}$$

Taking the derivative of both sides with respect to ##J## and ##\theta_k## (using ##\frac{d}{dx}\tanh(x) = 1-\tanh^2(x)##) leads to

$$\frac{\partial u_{ij}^*}{\partial \theta_k} = -\frac{\frac{1}{4}\big[1-(2u_{ij}^*-1)^2\big](\delta_i^k + \delta_j^k)}{1 - \big[1-(2u_{ij}^*-1)^2\big]\cdot JM}$$
$$\frac{\partial u_{ij}^*}{\partial J} = \frac{\big[1-(2u_{ij}^*-1)^2\big]\cdot Mu_{ij}^*}{1 - \big[1- (2u_{ij}^*-1)^2 \big]\cdot JM} $$

Note that here we can see that the derivatives are 0 when ##u_{ij}^* = 1##. Looking at the implicit relation for ##u_{ij}^*##, we can see that this happens when ##J## becomes "relatively large".

The algorithm is therefore (a rough code sketch follows the list):

  1. Initialize the parameters ##(\theta_1,...,\theta_N, J)## (randomly)
  2. Solve for the value of ##u_{ij}^*## using the implicit relation
  3. Calculate the value of the cost function
  4. Calculate the partial derivatives
  5. Update parameters using partial derivatives
  6. Repeat from step 2 until the desired cost is achieved
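Here is a minimal NumPy sketch of these steps. It assumes the observations are stored as a 0/1 array g of shape (M, N, N); the plain fixed-point loop, the value of ##\eta##, and the step count are illustrative assumptions rather than values fixed by the description above.

```python
import numpy as np

def solve_u(theta, J, M, n_iter=500):
    """Step 2: solve u_ij = 1/2 + 1/2*tanh(-(theta_i + theta_j)/2 + 2*J*M*u_ij)
    by fixed-point iteration (may need damping when J*M is large)."""
    u = np.full((len(theta), len(theta)), 0.5)
    s = -(theta[:, None] + theta[None, :]) / 2.0
    for _ in range(n_iter):
        u = 0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u)
    return u

def gradient_descent(g, eta=1e-3, n_steps=10_000):
    M, N = g.shape[0], g.shape[1]
    iu = np.triu_indices(N, k=1)                     # index pairs with i < j

    # Observables kbar_i and O* as defined above (g is 0/1, so g**2 == g)
    kbar = g.sum(axis=(0, 2)) / M
    S = g.sum(axis=0)
    O = ((S**2 - S)[iu]).sum() / M**2

    # Step 1: random initialisation
    rng = np.random.default_rng()
    theta, J = rng.normal(size=N), 0.1

    for _ in range(n_steps):
        u = solve_u(theta, J, M)                     # step 2
        r1 = kbar - u.sum(axis=1)                    # residuals of the first N equations
        r2 = O - (u[iu] ** 2).sum()                  # residual of the last equation
        cost = 0.5 * (r1 ** 2).sum() + 0.5 * r2**2   # step 3

        # Step 4: derivatives of u* from the implicit relation
        sech2 = 1.0 - (2.0 * u - 1.0) ** 2           # 1 - tanh^2
        denom = 1.0 - sech2 * J * M                  # can become small when J*M is large
        du_dJ = sech2 * M * u / denom                # d u*_ij / d J
        A = -0.25 * sech2 / denom                    # d u*_ij / d theta_k = A_ij*(delta_ik + delta_jk)

        grad_theta = np.zeros(N)
        for k in range(N):
            du = np.zeros((N, N))
            du[k, :] += A[k, :]                      # delta_ik term
            du[:, k] += A[:, k]                      # delta_jk term
            grad_theta[k] = -(r1 * du.sum(axis=1)).sum() - 2.0 * r2 * (u[iu] * du[iu]).sum()
        grad_J = -(r1 * du_dJ.sum(axis=1)).sum() - 2.0 * r2 * (u[iu] * du_dJ[iu]).sum()

        # Step 5: move the parameters against the gradient
        theta -= eta * grad_theta
        J -= eta * grad_J

    return theta, J, cost
```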
I've experimented with different values of ##\eta##. Either the first iteration reduces the cost enormously, which leads to a large change in ##J##, so that in the next iteration ##u_{ij}^*## is close to 1 and the parameters barely change any more; or the first iteration makes only a small change, and so do all subsequent iterations (so it would take years to converge). I can't seem to find a middle ground!

I've also considered the possibility that, if the algorithm does converge, it may converge to a local minimum of the cost function.

Any thoughts?
 
  • #11
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things like simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.
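As one concrete option, SciPy ships a simulated-annealing-style global optimizer. A rough usage sketch, assuming the cost(x, g) wrapper and the array g from the earlier SciPy sketch; the parameter bounds here are pure guesses and would have to be chosen for the actual problem:

```python
from scipy.optimize import dual_annealing

# Assumes cost(x, g), g, and N as in the earlier SciPy sketch; the bounds are guesses.
bounds = [(-10.0, 10.0)] * N + [(0.0, 5.0)]      # ranges for theta_1..theta_N, then J
res = dual_annealing(cost, bounds, args=(g,), maxiter=1000)
theta_fit, J_fit = res.x[:-1], res.x[-1]
```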
 
  • #12
mfb said:
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things like simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.

Thanks for your response. I'll think about the points you mentioned!

About pre-existing algorithms/tools: I assumed that they would be incompatible with my problem, since I do not have an explicit function that I want to minimize; rather, the function that I want to minimize involves some hidden function ##u_{ij}^*(\theta_1,...,\theta_N,J)## (that obeys the relation I posted). If I'm wrong about this, I would love to find a tool that I can use for my specific problem! If so, could you steer me in the right direction? I'd understand if you think I'm asking too much :P

I tried to implement the algorithm I described in Python, by the way.
 
  • #13
Some tools should allow a separate user-defined function to calculate the cost of a given set of parameters - in that function you can include your intermediate parameters. I never needed that, so I don't know how exactly.
 
