Is this system of equations (numerically) solvable?

• A
Hi,

In a project of mine I've encountered the following set of equations:

$$\sum_{i=1}^N \left(\frac{1}{M}\sum_{\alpha=1}^Mg_{ij}^\alpha - u_{ij}^* \right) = 0 \qquad \forall: 1\leq j \leq N$$
$$\sum_{i<j}\left( (u_{ij}^*)^2 - \frac{2}{M^2}\sum_{\alpha < \beta}^Mg_{ij}^\alpha g_{ij}^\beta \right) = 0$$

and ##u_{ij}^*## is the ##u_{ij}## that solves the equation
$$u_{ij}= \frac{1}{2} + \frac{1}{2}\tanh{\left( - \frac{\theta_i + \theta_j}{2} + 2\cdot J\cdot M \cdot u_{ij}\right)}$$
where the ##\theta_i (i=1...N)## and ##J## are the variables to be solved for (which are ##N+1## different variables) and ##M,N,g_{ij}^\alpha## are known values (observables).

Do you think it is possible to solve this set of equations for ##\theta_i, J##? I would naively say yes, since we have ##N+1## equations (the first two) and ##N+1## variables, but I'm not sure how I would solve this numerically.

I have been stuck on this for a long time so some thoughts from you smart guys would be greatly appreciated!!

I'm sorry, I should've posted this in a different section of the forum since this is a project of mine

Last edited:
mfb
Mentor
You can always plug it into some numerical solver and see what comes out.

Some quick simplifications here, with suitable new variables cj and D. First set of equations:
$$\sum_{i=1}^N u_{ij}^* = c_j \qquad \forall: 1\leq j \leq N$$
Second equation:$$\sum_{i<j} (u_{ij}^*)^2 = D$$
With a suitable redefinition of u, you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle and J'=JM instead of M.
$$u'_{ij} = 1+ \tanh \left( \theta'_i \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.

berkeman
Mentor
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine

If you mean it's schoolwork, it is fine here. Advanced schoolwork where you show lots of work qualifies for the technical forums.

You can always plug it into some numerical solver and see what comes out.

With a suitable redefinition of u, you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle and J'=JM instead of M.
$$u'_{ij} = 1+ \tanh \left( \theta'_i \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.

But how exactly do I solve here for ##\theta_i, J##? To use the equation in the quote, I would need to know the value of ##u_{ij}##, but from the equations, i can only obtain the values of the sum ##\sum_{i=1}^N u_{ij}##, no? Maybe I misunderstood what you're saying

mfb
Mentor
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.

dumbperson
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
Ah, so like minimizing some error function? Thanks I'll think about it and try to implement it :)

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.

Could you recommend me some tools for this automatic optimization? :)

mfb
Mentor
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?

I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
I have tried to implement my own algorithm, but it does not seem to work and maybe you have some suggestions on why it doesn't work or can suggest alternativies? :-)

First of all, I forgot to mention that the value of ##g_{ij}^\alpha## is either 0 or 1.

First I make the following simplifications/definitions:

$$\sum_{j=1}^N u_{ij}^* = \frac{1}{M}\sum_{\alpha=1}^M\sum_{j=1}^Ng_{ij}^\alpha \equiv \overline{k_i^*}$$
$$\sum_{i<j} (u_{ij}^*)^2 = \frac{2}{M^2}\sum_{\alpha<\beta} \sum_{i<j} g_{ij}^\alpha g_{ij}^\beta \equiv O^*$$

I now define a Cost Function :
$$C(\vec{\theta}, J) \equiv \frac{1}{2} \sum_{i=1}^N \bigg|\bigg|\overline{k_i^*} - \sum_{j=1}^Nu_{ij}^*(\theta_i,\theta_j, J) \bigg|\bigg|^2 + \frac{1}{2}\bigg|\bigg|O^* - \sum_{i<j} \big(u_{ij}^*(\theta_i, \theta_j, J)\big)^2 \bigg|\bigg|^2$$
which we want to minimize.

A change in the value of the parameters ##(\theta_1,...,\theta_N,J)## leads to a change in the cost function

$$\Delta C(\vec{\theta},J) \approx \sum_{k=1}^N \frac{\partial C(\vec{\theta},J)}{\partial \theta_k} \Delta \theta_k + \frac{\partial C(\vec{\theta},J)}{\partial J}\Delta J$$

The "optimal" direction in which we can move our parameters is in the direction of the negative gradient of the cost function:

$$\Delta \theta_k = -\eta \frac{\partial C(\vec{\theta},J)}{\partial \theta_k}, \qquad \Delta J = -\eta \frac{\partial C(\vec{\theta}, J)}{\partial J}$$
where ##\eta## is some positive number that controls the size of this change.

Now, the derivative of the cost function with respect to some parameter ##x## is

$$\frac{\partial C}{\partial x} = - \sum_{i=1}^N \bigg(\overline{k_i^*} - \sum_{j=1}^N u_{ij}^*\bigg)\sum_{j=1}^N\frac{\partial u_{ij}^*}{\partial x} - 2 \bigg( O^* - \sum_{i<j}\big(u_{ij}^*\big)^2\bigg)\sum_{i<j}u_{ij}^* \frac{\partial u_{ij}^*}{\partial x}$$
We can find the derivatives ##\partial u_{ij}^*/\partial J## and ##\partial u_{ij}^*/\partial \theta_k## by using the implicit relation we have for ##u_{ij}^*##:

$$u_{ij}^* = \frac{1}{2} + \frac{1}{2}\tanh{\left(- \frac{\theta_i + \theta_j}{2} + 2JMu_{ij}^* \right)}$$

Taking the derivative of both sides with respect to ##J## and ##\theta_k## (using that ##d(tanh(x))/dx = 1-tanh^2(x)## )

$$\frac{\partial u_{ij}^*}{\partial \theta_k} = -\frac{\frac{1}{4}\big[1-(2u_{ij}^*-1)^2\big](\delta_i^k + \delta_j^k)}{1 - \big[1-(2u_{ij}^*-1)^2\big]\cdot JM}$$
$$\frac{\partial u_{ij}^*}{\partial J} = \frac{\big[1-(2u_{ij}^*-1)^2\big]\cdot JMu_{ij}^*}{1 - \big[1- (2u_{ij}^*-1)^2 \big]\cdot JM}$$

Note that here we can see that the derivatives are 0 when ##u_{ij}^* = 1##. Looking at the implicit relation for ##u_{ij}^*##, we can see that this happens when #J# becomes "relatively large".

The algorithm is therefore:

1. Initialize the parameters ##(\theta_1,...,\theta_N, J)## (randomly)
2. Solve for the value of ##u_{ij}^*## using the implicit relation
3. Calculate the value of the cost function
4. Calculate the partial derivatives
5. Update parameters using partial derivatives
6. Repeat step 2 till desired cost is achieved
I've experimented with different values of ##\eta##, and it seems that the first iteration may reduce the cost enormously , which will lead to a large change in ##J## , which means in the next iteration ##u_{ij}^*## may be close to 1, so it will not lead to further changes in the parameters anymore, or the first iteration leads to a small change, and so do all iterations (and it would take years to converge) and I can't seem to find a middle ground really!

I've also considered the possibility that if the algorith does converge, it may converge to a local minimum of the cost function

Any thoughts?

Last edited:
mfb
Mentor
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things lile simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.

Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things lile simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.