Is this system of equations (numerically) solvable?


Discussion Overview

The discussion revolves around the numerical solvability of a specific system of equations involving variables related to a project. Participants explore the feasibility of solving for certain parameters within a set of equations that include summations and hyperbolic tangent functions, focusing on the numerical methods that could be employed.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a set of equations and expresses uncertainty about the numerical solvability for the variables ##\theta_i## and ##J##, noting the number of equations matches the number of variables.
  • Another participant suggests using a numerical solver and proposes simplifications to the equations, indicating that with redefinitions, the system can be analyzed more easily.
  • Several participants discuss methods for minimizing the violation of the equations, including defining a cost function and using optimization techniques to adjust parameters iteratively.
  • One participant shares their experience with ROOT and suggests finding minimization tools in the reader's preferred programming language.
  • A later reply details an algorithm for optimizing the parameters, including the calculation of partial derivatives based on the implicit relations of the equations.
  • Concerns are raised about the behavior of the derivatives when certain conditions are met, particularly when ##u_{ij}^* = 1## and the implications for the optimization process.

Areas of Agreement / Disagreement

Participants generally agree on the approach of using numerical solvers and optimization techniques, but there is no consensus on the specific methods or tools to be used, and some uncertainty remains regarding the effectiveness of the proposed algorithms.

Contextual Notes

The discussion includes various assumptions about the behavior of the equations and the parameters involved, particularly regarding the implications of certain values leading to zero derivatives. There are also limitations in the clarity of how to derive certain values from the equations.

Who May Find This Useful

This discussion may be useful for individuals interested in numerical methods for solving complex systems of equations, particularly in the context of physics or engineering projects involving optimization and parameter estimation.

dumbperson
Hi,

In a project of mine I've encountered the following set of equations:

$$ \sum_{i=1}^N \left(\frac{1}{M}\sum_{\alpha=1}^Mg_{ij}^\alpha - u_{ij}^* \right) = 0 \qquad \forall: 1\leq j \leq N$$
$$\sum_{i<j}\left( (u_{ij}^*)^2 - \frac{2}{M^2}\sum_{\alpha < \beta}^Mg_{ij}^\alpha g_{ij}^\beta \right) = 0 $$

and ##u_{ij}^*## is the ##u_{ij}## that solves the equation
$$ u_{ij}= \frac{1}{2} + \frac{1}{2}\tanh{\left( - \frac{\theta_i + \theta_j}{2} + 2\cdot J\cdot M \cdot u_{ij}\right)} $$
where the ##\theta_i## (##i=1,\ldots,N##) and ##J## are the variables to be solved for (##N+1## variables in total) and ##M, N, g_{ij}^\alpha## are known values (observables).

Do you think it is possible to solve this set of equations for ##\theta_i, J##? I would naively say yes, since we have ##N+1## equations (##N## from the first line, one for each ##j##, plus the single second equation) and ##N+1## variables, but I'm not sure how I would solve this numerically.

I have been stuck on this for a long time so some thoughts from you smart guys would be greatly appreciated!
 
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine
 
You can always plug it into some numerical solver and see what comes out.

Some quick simplifications here, with suitable new variables ##c_j## and ##D##. First set of equations:
$$\sum_{i=1}^N u_{ij}^* = c_j \qquad \forall: 1\leq j \leq N$$
Second equation:$$ \sum_{i<j} (u_{ij}^*)^2 = D$$
With a suitable redefinition of ##u## (e.g. ##u'_{ij} = 2u_{ij}##), you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle ##\theta'_i = -\theta_i/2## and for ##J'=JM## instead of ##J##.
$$u'_{ij} = 1+ \tanh \left( \theta'_i + \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.
 
dumbperson said:
I'm sorry, I should've posted this in a different section of the forum since this is a project of mine

If you mean it's schoolwork, it is fine here. Advanced schoolwork where you show lots of work qualifies for the technical forums. :smile:
 
mfb said:
You can always plug it into some numerical solver and see what comes out.

With a suitable redefinition of ##u## (e.g. ##u'_{ij} = 2u_{ij}##), you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle ##\theta'_i = -\theta_i/2## and for ##J'=JM## instead of ##J##.
$$u'_{ij} = 1+ \tanh \left( \theta'_i + \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.

But how exactly do I solve here for ##\theta_i, J##? To use the equation in the quote, I would need to know the value of ##u_{ij}##, but from the equations, I can only obtain the values of the sum ##\sum_{i=1}^N u_{ij}##, no? Maybe I misunderstood what you're saying.
 
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
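For concreteness, a minimal Python/NumPy sketch of such a "quality" function might look like the following. It assumes the observations are stored as an array g of shape (M, N, N); the plain fixed-point loop and the names solve_u and quality are illustrative choices, not anything prescribed in this thread.

```python
import numpy as np

def solve_u(theta, J, M, n_iter=500):
    """Solve u_ij = 1/2 + 1/2*tanh(-(theta_i + theta_j)/2 + 2*J*M*u_ij) for all
    i, j by simple fixed-point iteration (may need damping when J*M is large)."""
    u = np.full((len(theta), len(theta)), 0.5)
    s = -(theta[:, None] + theta[None, :]) / 2.0
    for _ in range(n_iter):
        u = 0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u)
    return u

def quality(theta, J, g):
    """Sum of squared violations of the two sets of equalities."""
    M, N = g.shape[0], g.shape[1]
    u = solve_u(theta, J, M)
    # first set: for every j, sum_i ( (1/M) sum_alpha g_ij^alpha - u_ij ) = 0
    r1 = (g.mean(axis=0) - u).sum(axis=0)
    # second equation: sum_{i<j} ( u_ij^2 - (2/M^2) sum_{alpha<beta} g^alpha g^beta ) = 0
    S, S2 = g.sum(axis=0), (g**2).sum(axis=0)
    pair = (S**2 - S2) / M**2            # equals (2/M^2) * sum_{alpha<beta} g^alpha g^beta
    iu = np.triu_indices(N, k=1)         # index pairs with i < j
    r2 = (u[iu]**2 - pair[iu]).sum()
    return (r1**2).sum() + r2**2
```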
 
mfb said:
Pick random values for ##\theta_i##, ##J##. Calculate ##u_{ij}##. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive ##u_{ij}## again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.
Ah, so like minimizing some error function? Thanks, I'll think about it and try to implement it :)
 
mfb said:
There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.

Could you recommend me some tools for this automatic optimization? :)
 
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
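In Python (which the thread's author mentions using further down), handing this over to a stock derivative-free minimizer could look roughly like the sketch below. It assumes the quality(theta, J, g) helper sketched a few posts above and an observation array g of shape (M, N, N); the Nelder-Mead method, the starting point, and the options are illustrative choices, not recommendations made in the thread.

```python
import numpy as np
from scipy.optimize import minimize

# Assumes quality(theta, J, g) as sketched earlier in the thread, and a given array g.
def cost(x, g):
    theta, J = x[:-1], x[-1]       # the N+1 free parameters packed into one vector
    return quality(theta, J, g)

N = g.shape[1]
x0 = np.concatenate([np.random.randn(N), [0.1]])   # random start for theta_1..theta_N and J
res = minimize(cost, x0, args=(g,), method="Nelder-Mead",
               options={"maxiter": 50_000, "xatol": 1e-10, "fatol": 1e-12})
theta_fit, J_fit = res.x[:-1], res.x[-1]
```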
 
  • #10
mfb said:
I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?
I have tried to implement my own algorithm, but it does not seem to work. Maybe you have some suggestions on why it doesn't work, or can suggest alternatives? :-)

First of all, I forgot to mention that the value of ##g_{ij}^\alpha## is either 0 or 1.

First I make the following simplifications/definitions:

$$ \sum_{j=1}^N u_{ij}^* = \frac{1}{M}\sum_{\alpha=1}^M\sum_{j=1}^Ng_{ij}^\alpha \equiv \overline{k_i^*} $$
$$\sum_{i<j} (u_{ij}^*)^2 = \frac{2}{M^2}\sum_{\alpha<\beta} \sum_{i<j} g_{ij}^\alpha g_{ij}^\beta \equiv O^*$$
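For reference, assuming the observations are stored as a 0/1 array g of shape (M, N, N) (an assumed layout; the function name is illustrative), these two quantities can be computed directly from the data, for example:

```python
import numpy as np

def observables(g):
    """g: 0/1 array of shape (M, N, N) holding g_ij^alpha."""
    M, N = g.shape[0], g.shape[1]
    kbar = g.sum(axis=(0, 2)) / M        # (1/M) sum_alpha sum_j g_ij^alpha, one value per i
    S = g.sum(axis=0)                    # S_ij = sum_alpha g_ij^alpha
    pair = (S**2 - S) / 2.0              # sum_{alpha<beta} g^alpha g^beta, using g in {0, 1}
    iu = np.triu_indices(N, k=1)         # index pairs with i < j
    O = 2.0 * pair[iu].sum() / M**2
    return kbar, O
```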

I now define a cost function:
$$C(\vec{\theta}, J) \equiv \frac{1}{2} \sum_{i=1}^N \bigg|\bigg|\overline{k_i^*} - \sum_{j=1}^Nu_{ij}^*(\theta_i,\theta_j, J) \bigg|\bigg|^2 + \frac{1}{2}\bigg|\bigg|O^* - \sum_{i<j} \big(u_{ij}^*(\theta_i, \theta_j, J)\big)^2 \bigg|\bigg|^2 $$
which we want to minimize.

A change in the value of the parameters ##(\theta_1,...,\theta_N,J)## leads to a change in the cost function

$$ \Delta C(\vec{\theta},J) \approx \sum_{k=1}^N \frac{\partial C(\vec{\theta},J)}{\partial \theta_k} \Delta \theta_k + \frac{\partial C(\vec{\theta},J)}{\partial J}\Delta J $$

The "optimal" direction in which we can move our parameters is in the direction of the negative gradient of the cost function:

$$ \Delta \theta_k = -\eta \frac{\partial C(\vec{\theta},J)}{\partial \theta_k}, \qquad \Delta J = -\eta \frac{\partial C(\vec{\theta}, J)}{\partial J} $$
where ##\eta## is some positive number that controls the size of this change.

Now, the derivative of the cost function with respect to some parameter ##x## is

$$ \frac{\partial C}{\partial x} = - \sum_{i=1}^N \bigg(\overline{k_i^*} - \sum_{j=1}^N u_{ij}^*\bigg)\sum_{j=1}^N\frac{\partial u_{ij}^*}{\partial x} - 2 \bigg( O^* - \sum_{i<j}\big(u_{ij}^*\big)^2\bigg)\sum_{i<j}u_{ij}^* \frac{\partial u_{ij}^*}{\partial x}$$
We can find the derivatives ##\partial u_{ij}^*/\partial J## and ##\partial u_{ij}^*/\partial \theta_k## by using the implicit relation we have for ##u_{ij}^*##:

$$u_{ij}^* = \frac{1}{2} + \frac{1}{2}\tanh{\left(- \frac{\theta_i + \theta_j}{2} + 2JMu_{ij}^* \right)}$$

Taking the derivative of both sides with respect to ##J## and ##\theta_k## (using ##\frac{d}{dx}\tanh(x) = 1-\tanh^2(x)##) leads to

$$\frac{\partial u_{ij}^*}{\partial \theta_k} = -\frac{\frac{1}{4}\big[1-(2u_{ij}^*-1)^2\big](\delta_i^k + \delta_j^k)}{1 - \big[1-(2u_{ij}^*-1)^2\big]\cdot JM}$$
$$\frac{\partial u_{ij}^*}{\partial J} = \frac{\big[1-(2u_{ij}^*-1)^2\big]\cdot Mu_{ij}^*}{1 - \big[1- (2u_{ij}^*-1)^2 \big]\cdot JM} $$

Note that here we can see that the derivatives are 0 when ##u_{ij}^* = 1##. Looking at the implicit relation for ##u_{ij}^*##, we can see that this happens when ##J## becomes "relatively large".

The algorithm is therefore (a rough code sketch follows the list):

  1. Initialize the parameters ##(\theta_1,...,\theta_N, J)## (randomly)
  2. Solve for the value of ##u_{ij}^*## using the implicit relation
  3. Calculate the value of the cost function
  4. Calculate the partial derivatives
  5. Update parameters using partial derivatives
  6. Repeat from step 2 until the desired cost is achieved
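Here is a minimal NumPy sketch of these steps. It assumes the observations are stored as a 0/1 array g of shape (M, N, N); the plain fixed-point loop, the value of ##\eta##, and the step count are illustrative assumptions rather than values fixed by the description above.

```python
import numpy as np

def solve_u(theta, J, M, n_iter=500):
    """Step 2: solve u_ij = 1/2 + 1/2*tanh(-(theta_i + theta_j)/2 + 2*J*M*u_ij)
    by fixed-point iteration (may need damping when J*M is large)."""
    u = np.full((len(theta), len(theta)), 0.5)
    s = -(theta[:, None] + theta[None, :]) / 2.0
    for _ in range(n_iter):
        u = 0.5 + 0.5 * np.tanh(s + 2.0 * J * M * u)
    return u

def gradient_descent(g, eta=1e-3, n_steps=10_000):
    M, N = g.shape[0], g.shape[1]
    iu = np.triu_indices(N, k=1)                     # index pairs with i < j

    # Observables kbar_i and O* as defined above (g is 0/1, so g**2 == g)
    kbar = g.sum(axis=(0, 2)) / M
    S = g.sum(axis=0)
    O = ((S**2 - S)[iu]).sum() / M**2

    # Step 1: random initialisation
    rng = np.random.default_rng()
    theta, J = rng.normal(size=N), 0.1

    for _ in range(n_steps):
        u = solve_u(theta, J, M)                     # step 2
        r1 = kbar - u.sum(axis=1)                    # residuals of the first N equations
        r2 = O - (u[iu] ** 2).sum()                  # residual of the last equation
        cost = 0.5 * (r1 ** 2).sum() + 0.5 * r2**2   # step 3

        # Step 4: derivatives of u* from the implicit relation
        sech2 = 1.0 - (2.0 * u - 1.0) ** 2           # 1 - tanh^2
        denom = 1.0 - sech2 * J * M                  # can become small when J*M is large
        du_dJ = sech2 * M * u / denom                # d u*_ij / d J
        A = -0.25 * sech2 / denom                    # d u*_ij / d theta_k = A_ij*(delta_ik + delta_jk)

        grad_theta = np.zeros(N)
        for k in range(N):
            du = np.zeros((N, N))
            du[k, :] += A[k, :]                      # delta_ik term
            du[:, k] += A[:, k]                      # delta_jk term
            grad_theta[k] = -(r1 * du.sum(axis=1)).sum() - 2.0 * r2 * (u[iu] * du[iu]).sum()
        grad_J = -(r1 * du_dJ.sum(axis=1)).sum() - 2.0 * r2 * (u[iu] * du_dJ[iu]).sum()

        # Step 5: move the parameters against the gradient
        theta -= eta * grad_theta
        J -= eta * grad_J

    return theta, J, cost
```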
I've experimented with different values of ##\eta##. Either the first iteration reduces the cost enormously, which leads to a large change in ##J##, so that in the next iteration ##u_{ij}^*## is close to 1 and the parameters barely change any more; or the first iteration makes only a small change, and so do all subsequent iterations (so it would take years to converge). I can't seem to find a middle ground!

I've also considered the possibility that, if the algorithm does converge, it may converge to a local minimum of the cost function.

Any thoughts?
 
  • #11
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things like simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.
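As one concrete option, SciPy ships a simulated-annealing-style global optimizer. A rough usage sketch, assuming the cost(x, g) wrapper and the array g from the earlier SciPy sketch; the parameter bounds here are pure guesses and would have to be chosen for the actual problem:

```python
from scipy.optimize import dual_annealing

# Assumes cost(x, g), g, and N as in the earlier SciPy sketch; the bounds are guesses.
bounds = [(-10.0, 10.0)] * N + [(0.0, 5.0)]      # ranges for theta_1..theta_N, then J
res = dual_annealing(cost, bounds, args=(g,), maxiter=1000)
theta_fit, J_fit = res.x[:-1], res.x[-1]
```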
 
  • #12
mfb said:
Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things like simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.

Thanks for your response. I'll think about the points you mentioned!

About pre-existing algorithms/tools: I assumed that they would be incompatible with my problem, since I do not have an explicit function that I want to minimize; rather, the function that I want to minimize involves some hidden function ##u_{ij}^*(\theta_1,...,\theta_N,J)## (that obeys the relation I posted). If I'm wrong about this, I would love to find a tool that I can use for my specific problem! If so, could you steer me in the right direction? I'd understand if you think I'm asking too much :P

I tried to implement the algorithm I described in Python, by the way.
 
  • #13
Some tools should allow a separate user-defined function to calculate the cost of a given set of parameters - in that function you can include your intermediate parameters. I never needed that, so I don't know how exactly.
 
