Closest approach of a parabola to a point, using Lagrange multipliers

In summary, the problem is to find the shortest distance from the point (1, 0) to a point on the parabola y^{2} = 4x using the Lagrange multipliers method. The distance function is d(x,y) = \sqrt{(x-1)^{2} + y^{2}}, and the constraint is the equation 0 = 4x - y^{2}. Setting the gradient of the squared distance equal to a multiple of the gradient of the constraint gives three equations; the branch with y ≠ 0 leads to an imaginary value of y and no valid closest point, while the branch y = 0 gives the closest point (0, 0) at distance 1.
  • #1
E'lir Kramer
Advanced Calculus of Several Variables, Edwards, problem II.4.1: Find the shortest distance from the point (1, 0) to a point of the parabola [itex]y^{2} = 4x[/itex].

This is the Lagrange multipliers chapter. There might be another way to solve this, but the only way I'm interested in right now is the Lagrange multipliers method.

The first tricky thing is that he has not supplied us with the function f that we are trying to minimize.

We need an equation for the distance between some point (x,y) and the point (1,0). [itex] d(x,y) = \sqrt{(x-1)^{2} + y^{2}} [/itex]. For convenience we can minimize [itex]d^{2}(x, y) = (x-1)^{2} + y^{2}[/itex] and get the same answer.

Now, this is a tricky part that I may have gotten wrong, since this author has given only a brief overview of the gradient function. [itex]\nabla d^{2}(x, y) = (2x-2, 2y) [/itex].

Now the parabola function is actually the constraint on this distance function. However, I am having a hard time with the form of this. Is the correct constraint [itex] 0 = 4x - y^{2} [/itex]? The thing is, that's not a function, it's just an equation. I don't understand how I'm supposed to take the gradient of it. Now I can imagine a function [itex] g : \Re^{2} \to \Re [/itex] such that [itex] g(x, y) = 4x - y^{2}[/itex]. And I know the parabola is the set of points [itex] \{ p \in \Re^{2} : g(p) = 0 \} [/itex]. Perhaps I just didn't understand the proof of the Lagrangian method, but I am not sure why my constraint has to be the zero set of some function.

If [itex] \nabla g = (4, -2y) [/itex] (is it?), then I have the Lagrangian equality:

[itex] (2x-2, 2y) = \lambda (4, -2y) [/itex].

The three equations to solve are:

[itex] 2x-2 = 4\lambda \\
2y = -2\lambda y \\
0 = 4x - y^{2}
[/itex]

Is this correct? If so, it's unfortunate, because I can't solve it.

I tried first solving for [itex] \lambda = \frac{2x-2}{4} [/itex]
and [itex] \lambda = -1 [/itex] if y ≠ 0. I think this is justified because if y = 0, then [itex] \lambda = 0 [/itex].

Then [itex] -1 = \frac{2x-2}{4} [/itex], so [itex] x = -1 [/itex].

But then we have [itex] 0 = -4 - y^{2} [/itex], so [itex] -4 = y^{2} [/itex]. So y is an imaginary number? That's impossible. There must be some real solution to this problem.
 
  • #3


I'm sorry, Zondrina. I'm not saying that your graph doesn't help me, but I don't see how it does.
 
  • #4


Your distance and constraint equations are fine, but you go very slightly astray in setting up the minimization equation. You should have, instead,
[tex]\nabla[d^2 + \lambda g] = 0.[/tex] This gives two (and only two) equations
[tex]2(x-1)+4\lambda x=0,[/tex] [tex]2y-2\lambda y = 0.[/tex]
The second is solved to give lambda = +1 and you can find x and y from there.
 
  • #5


How do I simplify [itex] \nabla[d^{2} + \lambda g][/itex]?

Is [itex] \nabla[d^{2} + \lambda g] = \nabla d^{2} + \lambda \nabla g[/itex]?

If so, and if [itex] \nabla d^{2} + \lambda \nabla g = 0 [/itex] is true for some [itex] \lambda \in \Re[/itex], then [itex] \nabla d^{2} - \lambda \nabla g = 0 [/itex] must be true for some other number [itex] \gamma \in \Re[/itex], where [itex] \gamma = -\lambda [/itex]. So aren't these just two formulations of the same insight?

Edit, oh, wait.

[itex] d^{2} + \lambda g = (x-1)^{2} + y^{2} + 4\lambda x - \lambda y^{2} [/itex], so
[itex] \nabla [d^{2} + \lambda g] = (2(x-1) + 4\lambda , 2y - 2\lambda y) = (0, 0) [/itex]

But you've got that [itex] 2(x-1) + 4\lambda x = 0 [/itex]. It seems like [itex] 2(x-1) + 4 \lambda = 0 [/itex] to me. If your extra x was a typo, then I still have the same problem: substituting 1 for [itex]\lambda[/itex] into your equations yields x = -1, which is what I had. And from then on our work is the same. Where does your extra x come from: was it a typo or did I mess up somewhere?
 
  • #6


My justification for y ≠ 0 is false. We have from [itex] 0 = 4x - y^{2} [/itex] that if y = 0, then x = 0. (0,0) could be the solution?
 
  • #7


I just tried to do the next problem in the chapter, which was also a minimum distance problem, and encountered the same thing with an imaginary root for y. I really hope someone can come in here and show me where I'm going wrong :).
 
  • #8


Sorry, the extra x was a typo. I see your point--that y ends up being imaginary, which solves the problem (it gives the minimum value d=0) but does not give a valid closest point. I'll think about this a bit more.
 
  • #9


E'lir Kramer said:
Advanced Calculus of Several Variables, Edwards, problem II.4.1: Find the shortest distance from the point (1, 0) to a point of the parabola [itex]y^{2} = 4x[/itex].

[...]

Now I can imagine a function [itex] g : \Re^{2} \to \Re [/itex] such that [itex] g(x, y) = 4x - y^{2}[/itex]. And I know the parabola is the set of points [itex] \{ p \in \Re^{2} : g(p) = 0 \} [/itex]. Perhaps I just didn't understand the proof of the Lagrangian method, but I am not sure why my constraint has to be the zero set of some function.

[...]

The three equations to solve are:

[itex] 2x-2 = 4\lambda \\
2y = -2\lambda y \\
0 = 4x - y^{2}
[/itex]

[...]

But then we have [itex] 0 = -4 - y^{2} [/itex], so [itex] -4 = y^{2} [/itex]. So y is an imaginary number? That's impossible. There must be some real solution to this problem.

To clear up some confusion: your constraint is a curve in 2 dimensions, written as g(x,y) = 0. You want to minimize (or maximize) a function f(x,y), subject to the restriction g(x,y) = 0. Your constraint really does involve a function! At a minimizing point ##p_0 = (x_0,y_0)## the two gradient vectors ##\nabla f## and ##\nabla g## must be parallel. Here is why.

For a small increment ##h = (h_x,h_y)## to be feasible, we need ##h_x g_x(p_0) + h_y g_y(p_0) = 0##, which says that the point ##p_0+h## lies on the tangent line of the constraint (and is therefore still feasible when we neglect "second-order" terms like ##h_x^2, \: h_y^2, \; h_x h_y## or higher order). In other words, if we "linearize" the problem, the condition ##h \cdot \nabla g = 0## keeps us feasible.

In order to have a constrained min of f at ##p_0## we need to have ##f(p_0 + h) \geq f(p_0)## for all feasible ##h = (h_x,h_y)##, so we need to have
[tex] h_x f_x(p_0) + h_y f_y(p_0) \geq 0 \text{ whenever }\; h_x g_x(p_0) + h_y g_y(p_0) = 0.[/tex] However, the same holds for ##-h = (-h_x,-h_y)## as well, so we finally need
[tex] h_x f_x(p_0) + h_y f_y(p_0) = 0 \text{ whenever }\; h_x g_x(p_0) + h_y g_y(p_0) = 0.[/tex] This means that the gradient vectors ∇g and ∇f must be linearly dependent, so if ∇g ≠ 0 there must be a scalar multiplier λ such that ∇f = λ ∇g.

Note that it does not really matter whether you write ∇f = λ ∇g (which is ∇f - λ ∇g = 0) or whether you write ∇f + λ ∇g = 0; you will just get different signs for λ. Later, when you deal with inequality-constrained problems like min f subject to g ≥ 0 it will then matter how you write the Lagrangian, but for now don't worry about it.

So, in your case you do have three equations in the three unknowns x, y and λ. Your equation ##2y = -2\lambda y## implies either (i) y = 0; or (ii) λ = -1. (Of course, you might have both!)

If you try (i), your other conditions ##2x-2 = 4\lambda## and ##4x = 0## give ## x = 0, \: \lambda = -1/2.##

If you try (ii) you have ##2x - 2 = 4(-1) = -4,## so ##x = -1## and then ##y^2 = -4 < 0,## which is impossible. Therefore, the appropriate case is (i).
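
A minimal symbolic check of these two cases, assuming SymPy is available (the script is only an illustrative sketch of the system already written out above):

[code]
import sympy as sp

x, y, lam = sp.symbols('x y lam')

eqs = [
    sp.Eq(2*x - 2, 4*lam),    # x-component of grad(d^2) = lam * grad(g)
    sp.Eq(2*y, -2*lam*y),     # y-component
    sp.Eq(4*x - y**2, 0),     # the constraint g(x, y) = 0
]

sols = sp.solve(eqs, [x, y, lam], dict=True)

# keep only the real solutions; the lam = -1 branch gives y^2 = -4 and drops out
real_sols = [s for s in sols if all(v.is_real for v in s.values())]
for s in real_sols:
    d = sp.sqrt((s[x] - 1)**2 + s[y]**2)
    print(s[x], s[y], s[lam], d)   # prints: 0 0 -1/2 1
[/code]

The only real critical point is (0, 0) with ##\lambda = -1/2##, so the shortest distance from (1, 0) to the parabola is 1.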
 
  • #10


Here's another way of thinking about problems like this, similar to what Ray Vickson said. In order to find the point closest to (a, b), starting from a given point (x, y), you would find the vector from (x, y) to (a, b), move in that direction a short distance, then repeat with this new point. Of course, in this simple situation, where the function to be minimized is just distance, you would keep getting the same vector and eventually end up at (a, b), where the vector from (x, y) to (a, b) is the 0 vector.

Now, suppose you want to find the point on the curve g(x,y)= constant that is closest to (a, b). You could start the same way- find the vector from an initial point (x, y) to (a, b). But now, unless that vector happens to be tangent to the curve, you can't follow it. What you can do is find its projection onto the curve and go "left" or "right" on the curve depending upon which way the projection vector points along the curve. You can continue doing that, following the projection vector and getting closer to (a, b), until the projection vector is 0 and you can't get any closer by going either "left" or "right".

That happens when the vector from (x, y) to (a, b) is perpendicular to the curve. And, because [itex]\nabla g(x,y)[/itex] is perpendicular to the curve, that happens when the vector from (x, y) to (a, b) is parallel to [itex]\nabla g(x,y)[/itex]- that is, one is a multiple of the other.

It should be obvious that the distance from (x, y) to (a, b) increases fastest when we move directly away from (a, b), in the direction of the vector from (a, b) to (x, y). That is, the vector from (a, b) to (x, y) is in the direction of the gradient of the distance function. As you found, minimizing (or maximizing) the distance function [itex]d(x,y)= \sqrt{(x- a)^2+ (y- b)^2}[/itex] is the same as minimizing (or maximizing) [itex]d^2(x,y)= (x- a)^2+ (y- b)^2[/itex], and its gradient is [itex]\nabla d^2(x, y)= 2(x- a)\vec{i}+ 2(y- b)\vec{j}= 2((x- a)\vec{i}+ (y- b)\vec{j})[/itex], a multiple of the vector from (a, b) to (x, y).

That is, to find the minimum distance from (x, y) to (a, b), subject to the condition that g(x,y)= constant, we solve [itex]\nabla g(x, y)= \lambda \nabla d^2(x, y)[/itex], where one vector is a multiple of the other.

More generally, to find (x, y) such that f(x,y) has a minimum or maximum, subject to the requirement that g(x,y)= constant, solve [itex]\nabla g(x,y)= k\nabla f(x,y)[/itex].

Here, the function to be minimized is the distance to (1, 0), and that, as you say, is equivalent to minimizing [itex]f(x,y)= (x- 1)^2+ y^2[/itex] subject to the condition that [itex]g(x, y)= y^2- 4x= 0[/itex]. We need to solve [itex]\nabla f(x, y)= 2(x- 1)\vec{i}+ 2y\vec{j}= \lambda\nabla g(x, y)= -4\lambda\vec{i}+ 2\lambda y\vec{j}[/itex]. That gives the two equations [itex]2(x- 1)= -4\lambda[/itex] and [itex]2y= 2\lambda y[/itex]. You have three equations, including the constraint [itex]y^2= 4x[/itex] to solve for x, y, and [itex]\lambda[/itex].

As long as y is not 0, you can divide that last equation by 2y to get [itex]\lambda= 1[/itex] as marcusl says. But since a value of [itex]\lambda[/itex] is not necessary to solve the problem, I find it often simplest to start by eliminating [itex]\lambda[/itex] by dividing one equation by another. Dividing each side of [itex]2(x- 1)= -4\lambda[/itex] by the corresponding side of [itex]2y= 2\lambda y[/itex], we have [itex]2(x- 1)/2y= -4/2y[/itex].
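
Carrying that last division through (a brief sketch of where it leads):

[tex]\frac{2(x-1)}{2y} = \frac{-4}{2y} \;\Longrightarrow\; x - 1 = -2 \;\Longrightarrow\; x = -1,[/tex]

which again forces [itex]y^2 = 4x = -4[/itex] with no real solution, so the assumption y ≠ 0 fails and the remaining case y = 0 gives the closest point (0, 0).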
 
  • #11


Sorry: where in my reply I said " ... ##h \cdot \nabla f = 0## keeps us feasible..." I meant "... ## h \cdot \nabla g = 0## keeps us feasible...". For some reason, the "edit" option is no longer available, so I can't correct the typo.
 
  • #12


Thank you both for your generous attention to my questions! I read both of your posts last night before bed, and again this morning over coffee. Gradually, the insight that the gradients must be parallel at the extrema is sinking into me. I am convinced of the intuitive argument now, and I just need some more time to internalize the formal proof that Edwards gives.
 
  • #13


Ray Vickson said:
Sorry: where in my reply I said " ... ##h \cdot \nabla f = 0## keeps us feasible..." I meant "... ## h \cdot \nabla g = 0## keeps us feasible...". For some reason, the "edit" option is no longer available, so I can't correct the typo.
The edit option is available only for a certain length of time after you post. I made the change for you.
 
  • #14


vela said:
The edit option is available only for a certain length of time after you post. I made the change for you.

Thank you.
 
  • #15


I have another related question. I am not sure if I should put it here or in a new thread, so I'll put it here.

II.4.6: The equation [itex]73x^{2} + 72xy + 52y^{2} = 100[/itex] defines an ellipse which is centered at the origin, but has been rotated about it. Find the semiaxes of this ellipse by maximizing and minimizing [itex]f(x, y) = x^{2} + y^{2}[/itex] on it.

When drawing out the diagram for this one, it really brings home that the level curves of the two functions must be tangent at the maximum and minimum points.

I *think* that I just have a mechanics problem in being unable to solve the system of equations that the Lagrangian generates. But, since I have been unable to do so for several hours of on-and-off tinkering, I'm hoping for a hint.

We have

[itex](2x, 2y) = \lambda(146x + 72y, 104y + 72x) [/itex], so,

[itex]2x = \lambda(146x + 72y) \\
2x - 146\lambda x= 72\lambda y \\
x(1-73\lambda) = 36 \lambda y [/itex]

[itex]2y = \lambda(104y + 72x) \\
y(1-52 \lambda) = 36 \lambda x[/itex]

[itex]73x^{2} + 72xy + 52y^{2} - 100 = 0[/itex]

No matter what I try, I can't eliminate one of the variables by substitution. Is this system even solvable?
 
  • #16


It might help to rewrite the two linear equations in this form:
\begin{align*}
(73-k)x + 36y &= 0 \\
36x + (52-k)y &= 0
\end{align*} where ##k=1/\lambda##. In matrix form, this would be
$$\begin{pmatrix} 73-k & 36 \\ 36 & 52-k \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0.$$ Obviously, x=y=0 is a solution to those equations, but it won't satisfy the constraint. The only way this system can have a non-trivial solution is if the determinant of the matrix vanishes. This will allow you to solve for k. Once you have that, you can solve for x in terms of y (or vice versa). Then the constraint will allow you to find specific values.
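
A minimal SymPy sketch of that procedure (illustrative only, assuming SymPy is available):

[code]
import sympy as sp

x, y, k = sp.symbols('x y k')

# the determinant of the matrix above must vanish for a non-trivial solution
ks = sp.solve(sp.det(sp.Matrix([[73 - k, 36], [36, 52 - k]])), k)
print(ks)                                          # [25, 100]

for kv in ks:
    # direction from (73 - k) x + 36 y = 0, scaled to lie on the ellipse
    pts = sp.solve([(73 - kv)*x + 36*y,
                    73*x**2 + 72*x*y + 52*y**2 - 100], [x, y])
    for xs, ys in pts:
        print(kv, xs, ys, sp.sqrt(xs**2 + ys**2))  # lengths 2 (k=25) and 1 (k=100)
[/code]

The two values of k give semiaxes of length 2 (for k = 25) and 1 (for k = 100).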
 
  • #17


Thanks, Vela. Taking your word on it that (73-k)(52-k) - 36*36 = 0, I was able to solve for k and then back substitute for x and y.

Is this result true in general? If I have a matrix equation equal to zero, does the equation only have non-trivial solutions if the determinant is zero?
 
  • #18


E'lir Kramer said:
Thanks, Vela. Taking your word on it that (73-k)(52-k) - 36*36 = 0, I was able to solve for k and then back substitute for x and y.

Is this result true in general? If I have a matrix equation equal to zero, does the equation only have non-trivial solutions if the determinant is zero?

Yes: that is a standard theorem in Linear Algebra 101: a square system Ax = 0 has a nonzero solution for x if and only if det(A) = 0.
 
  • #19


A square system?
 
  • #20


"Square system" is somewhat informal for a linear system whose matrix is square, i.e., the number of row equals the number of columns (alternatively, the number of equations equals the number of unknowns).
 
  • #21


E'lir Kramer said:
We have

[itex](2x, 2y) = \lambda(146x + 72y, 104y + 72x) [/itex], so
[itex]2x = \lambda(146x + 72y)[/itex]
[itex]2y = \lambda(104y + 72x)[/itex]

No matter what I try, I can't eliminate one of the variables by substitution. Is this system even solvable?
Another approach would be to do what HallsofIvy suggested earlier. Divide one equation by the other to eliminate ##\lambda##. If you assume ##y\ne 0##, you get
$$\frac{x}{y} = \frac{146x+72y}{104y+72x} = \frac{146\left(\frac{x}{y}\right)+72}{104+72\left(\frac{x}{y}\right)}$$ and then solve for x/y. You still need to go back and check that the assumption ##y \ne 0## is valid, which it is. You can avoid using math you're not familiar with this way.
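As a sketch of how that works out, write ##t = x/y## and cross-multiply:
$$t(104 + 72t) = 146t + 72 \;\Longrightarrow\; 12t^2 - 7t - 12 = 0 \;\Longrightarrow\; t = \tfrac{4}{3} \;\text{ or }\; t = -\tfrac{3}{4}.$$
Substituting ##x = \tfrac{4}{3}y## into the constraint gives ##x^2 + y^2 = 1##, and ##x = -\tfrac{3}{4}y## gives ##x^2 + y^2 = 4##, so the semiaxes are 1 and 2, matching the determinant approach above.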
 
  • #22


E'lir Kramer said:
Is this result true in general? If I have a matrix equation equal to zero, does the equation only have non-trivial solutions if the determinant is zero?
As Ray noted, it's a standard result from linear algebra, but you probably already know enough to convince yourself it's true. Typically in algebra, you learn some basic stuff about matrices, in particular, if det(A)≠0 then A is invertible. That's all you really need.

Let Ax=0. If det(A)≠0, you know ##A^{-1}## exists, so you can multiply both sides of the equation by ##A^{-1}##, which leads to x=0. Therefore, if you want a solution other than x=0, you must have det(A)=0.
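
As a small concrete illustration (a sketch assuming SymPy; the matrix is the k = 25 case from the ellipse problem above):

[code]
import sympy as sp

A = sp.Matrix([[73 - 25, 36], [36, 52 - 25]])   # the k = 25 matrix from post #16
print(A.det())        # 0, so A is singular and a non-trivial solution exists
print(A.nullspace())  # [Matrix([[-3/4], [1]])], i.e. x/y = -3/4, as found above
[/code]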
 
  • #23


Thanks, everyone. I took a few days off from math to clear my head, but I will revisit this in the morning and try to convince myself of this result in linear algebra.
 

1. What is the closest approach of a parabola to a point?

The closest approach of a parabola to a point is the shortest distance between the point and any point on the parabola. At the closest point, the segment from the given point to the parabola is perpendicular to the parabola's tangent line there, i.e., it lies along the normal to the curve.

2. How is the closest approach of a parabola to a point calculated?

The closest approach of a parabola to a point can be calculated using Lagrange multipliers, which find the minimum of a function subject to a constraint. In this case, the function is the (squared) distance between the given point and a point (x, y), and the constraint is that (x, y) lies on the parabola.

3. What is the role of Lagrange multipliers in finding the closest approach of a parabola to a point?

Lagrange multipliers are used to optimize a function subject to constraints. In the case of finding the closest approach of a parabola to a point, the multiplier condition says that at the minimizing point the gradient of the distance function is parallel to the gradient of the constraint function, which pins down the candidate points on the parabola.

4. Can Lagrange multipliers be used for any type of parabola and point?

Yes, Lagrange multipliers can be used to find the closest approach of a parabola to a point regardless of the type of parabola (e.g. horizontal, vertical, or rotated) and the location of the point. As long as the parabola can be written as the zero set of a differentiable function g(x, y), Lagrange multipliers can be applied.

5. Are there any alternative methods for finding the closest approach of a parabola to a point?

Yes. One alternative is to substitute the parabola's equation into the distance function and minimize the resulting single-variable function with ordinary calculus; another is to find the point where the normal line to the parabola passes through the given point. These approaches work well for a single explicit curve, while Lagrange multipliers generalize more readily to complicated or implicit constraints.
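
For the parabola discussed in this thread, the substitution route reduces to a one-variable minimization. A minimal sketch, assuming SymPy is available:

[code]
import sympy as sp

y = sp.symbols('y', real=True)
d2 = (y**2/4 - 1)**2 + y**2          # substitute x = y^2/4 from y^2 = 4x
crit = sp.solve(sp.diff(d2, y), y)   # real critical points
print(crit)                          # [0]
print(sp.sqrt(d2.subs(y, 0)))        # 1, the same shortest distance as before
[/code]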
