# A Is this system of equations (numerically) solvable?

1. Jun 14, 2017

### dumbperson

Hi,

In a project of mine I've encountered the following set of equations:

$$\sum_{i=1}^N \left(\frac{1}{M}\sum_{\alpha=1}^Mg_{ij}^\alpha - u_{ij}^* \right) = 0 \qquad \forall: 1\leq j \leq N$$
$$\sum_{i<j}\left( (u_{ij}^*)^2 - \frac{2}{M^2}\sum_{\alpha < \beta}^Mg_{ij}^\alpha g_{ij}^\beta \right) = 0$$

and $u_{ij}^*$ is the $u_{ij}$ that solves the equation
$$u_{ij}= \frac{1}{2} + \frac{1}{2}\tanh{\left( - \frac{\theta_i + \theta_j}{2} + 2\cdot J\cdot M \cdot u_{ij}\right)}$$
where the $\theta_i (i=1...N)$ and $J$ are the variables to be solved for (which are $N+1$ different variables) and $M,N,g_{ij}^\alpha$ are known values (observables).

Do you think it is possible to solve this set of equations for $\theta_i, J$? I would naively say yes, since we have $N+1$ equations (the first two) and $N+1$ variables, but I'm not sure how I would solve this numerically.

I have been stuck on this for a long time so some thoughts from you smart guys would be greatly appreciated!!

2. Jun 14, 2017

### dumbperson

I'm sorry, I should've posted this in a different section of the forum since this is a project of mine

Last edited: Jun 14, 2017
3. Jun 14, 2017

### Staff: Mentor

You can always plug it into some numerical solver and see what comes out.

Some quick simplifications here, with suitable new variables cj and D. First set of equations:
$$\sum_{i=1}^N u_{ij}^* = c_j \qquad \forall: 1\leq j \leq N$$
Second equation:$$\sum_{i<j} (u_{ij}^*)^2 = D$$
With a suitable redefinition of u, you can simplify the third equation as well. And while I'm simplifying, solve for half the negative angle and J'=JM instead of M.
$$u'_{ij} = 1+ \tanh \left( \theta'_i \theta'_j + J' u'_{ij}\right)$$

Much easier to see how the system works now. For every set of angles and J', you can find uij, and see how much the equalities are violated, and then you can try to minimize this violation.

4. Jun 14, 2017

### Staff: Mentor

If you mean it's schoolwork, it is fine here. Advanced schoolwork where you show lots of work qualifies for the technical forums.

5. Jun 15, 2017

### dumbperson

But how exactly do I solve here for $\theta_i, J$? To use the equation in the quote, I would need to know the value of $u_{ij}$, but from the equations, i can only obtain the values of the sum $\sum_{i=1}^N u_{ij}$, no? Maybe I misunderstood what you're saying

6. Jun 15, 2017

### Staff: Mentor

Pick random values for $\theta_i$, $J$. Calculate $u_{ij}$. Calculate how much your equalities are violated and define some metric to reduce this to a single number (e.g. squared deviations).
Vary the initial values for your parameters a bit, derive $u_{ij}$ again, and see how the "quality" of the equations changes. Change the initial values in the direction that gives the largest improvement.
Repeat.

There are tools to do this optimization automatically, you just have to write a function that defines this "quality" - it should get minimal for a perfect match.

7. Jun 15, 2017

### dumbperson

Ah, so like minimizing some error function? Thanks I'll think about it and try to implement it :)

8. Jun 16, 2017

### dumbperson

Could you recommend me some tools for this automatic optimization? :)

9. Jun 16, 2017

### Staff: Mentor

I have mainly used ROOT so far (->RooFit, Minuit), but if you don't have a framework based on ROOT already I wouldn't recommend it. Find a minimization tool in your favorite language?

10. Jun 19, 2017

### dumbperson

I have tried to implement my own algorithm, but it does not seem to work and maybe you have some suggestions on why it doesn't work or can suggest alternativies? :-)

First of all, I forgot to mention that the value of $g_{ij}^\alpha$ is either 0 or 1.

First I make the following simplifications/definitions:

$$\sum_{j=1}^N u_{ij}^* = \frac{1}{M}\sum_{\alpha=1}^M\sum_{j=1}^Ng_{ij}^\alpha \equiv \overline{k_i^*}$$
$$\sum_{i<j} (u_{ij}^*)^2 = \frac{2}{M^2}\sum_{\alpha<\beta} \sum_{i<j} g_{ij}^\alpha g_{ij}^\beta \equiv O^*$$

I now define a Cost Function :
$$C(\vec{\theta}, J) \equiv \frac{1}{2} \sum_{i=1}^N \bigg|\bigg|\overline{k_i^*} - \sum_{j=1}^Nu_{ij}^*(\theta_i,\theta_j, J) \bigg|\bigg|^2 + \frac{1}{2}\bigg|\bigg|O^* - \sum_{i<j} \big(u_{ij}^*(\theta_i, \theta_j, J)\big)^2 \bigg|\bigg|^2$$
which we want to minimize.

A change in the value of the parameters $(\theta_1,...,\theta_N,J)$ leads to a change in the cost function

$$\Delta C(\vec{\theta},J) \approx \sum_{k=1}^N \frac{\partial C(\vec{\theta},J)}{\partial \theta_k} \Delta \theta_k + \frac{\partial C(\vec{\theta},J)}{\partial J}\Delta J$$

The "optimal" direction in which we can move our parameters is in the direction of the negative gradient of the cost function:

$$\Delta \theta_k = -\eta \frac{\partial C(\vec{\theta},J)}{\partial \theta_k}, \qquad \Delta J = -\eta \frac{\partial C(\vec{\theta}, J)}{\partial J}$$
where $\eta$ is some positive number that controls the size of this change.

Now, the derivative of the cost function with respect to some parameter $x$ is

$$\frac{\partial C}{\partial x} = - \sum_{i=1}^N \bigg(\overline{k_i^*} - \sum_{j=1}^N u_{ij}^*\bigg)\sum_{j=1}^N\frac{\partial u_{ij}^*}{\partial x} - 2 \bigg( O^* - \sum_{i<j}\big(u_{ij}^*\big)^2\bigg)\sum_{i<j}u_{ij}^* \frac{\partial u_{ij}^*}{\partial x}$$
We can find the derivatives $\partial u_{ij}^*/\partial J$ and $\partial u_{ij}^*/\partial \theta_k$ by using the implicit relation we have for $u_{ij}^*$:

$$u_{ij}^* = \frac{1}{2} + \frac{1}{2}\tanh{\left(- \frac{\theta_i + \theta_j}{2} + 2JMu_{ij}^* \right)}$$

Taking the derivative of both sides with respect to $J$ and $\theta_k$ (using that $d(tanh(x))/dx = 1-tanh^2(x)$ )

$$\frac{\partial u_{ij}^*}{\partial \theta_k} = -\frac{\frac{1}{4}\big[1-(2u_{ij}^*-1)^2\big](\delta_i^k + \delta_j^k)}{1 - \big[1-(2u_{ij}^*-1)^2\big]\cdot JM}$$
$$\frac{\partial u_{ij}^*}{\partial J} = \frac{\big[1-(2u_{ij}^*-1)^2\big]\cdot JMu_{ij}^*}{1 - \big[1- (2u_{ij}^*-1)^2 \big]\cdot JM}$$

Note that here we can see that the derivatives are 0 when $u_{ij}^* = 1$. Looking at the implicit relation for $u_{ij}^*$, we can see that this happens when #J# becomes "relatively large".

The algorithm is therefore:

1. Initialize the parameters $(\theta_1,...,\theta_N, J)$ (randomly)
2. Solve for the value of $u_{ij}^*$ using the implicit relation
3. Calculate the value of the cost function
4. Calculate the partial derivatives
5. Update parameters using partial derivatives
6. Repeat step 2 till desired cost is achieved
I've experimented with different values of $\eta$, and it seems that the first iteration may reduce the cost enormously , which will lead to a large change in $J$ , which means in the next iteration $u_{ij}^*$ may be close to 1, so it will not lead to further changes in the parameters anymore, or the first iteration leads to a small change, and so do all iterations (and it would take years to converge) and I can't seem to find a middle ground really!

I've also considered the possibility that if the algorith does converge, it may converge to a local minimum of the cost function

Any thoughts?

Last edited: Jun 19, 2017
11. Jun 19, 2017

### Staff: Mentor

Minimization algorithms have some clever ways to estimate good step sizes to avoid (or at least reduce) issues like that.
If your cost function doesn't have a single minimum, there is also the risk that you run into a local minimum. Typically algorithms use things lile simulated annealing to avoid this: start with larger steps in random (but not uniformly random) directions, prefer directions that reduce the cost but also do some steps that increase it. Reduce the step size and the probability of "bad" steps over time.

Existing algorithms have all this built in. You can reproduce all that, but it is probably easier to use existing tools.

12. Jun 19, 2017

### dumbperson

About pre existing algorithms/ tools: I assumed that they would be incompatible with my problem since I do not have an explicit function that I want to minimize, but that the function that I want to minimize involves some hidden function $u_{ij}^*(\theta_1,...,\theta_N,J)$ (that obeys the relation I posted). If I'm wrong about this, I would love to find a tool that I can use for my specific problem! If so, could you steer me into the right direction? I'd understand if you think I'm asking too much :P

I tried to implement the algorithm I described in Python by the way

13. Jun 19, 2017

### Staff: Mentor

Some tools should allow a separate user-defined function to calculate the cost of a given set of parameters - in that function you can include your intermediate parameters. I never needed that, so I don't know how exactly.