Is there a typo in this theorem in Apostol or not?

  • #1
zenterix
Homework Statement
The theorem shown below appears in Apostol Vol II, Ch. 7 "Systems of Differential Equations".
Relevant Equations
It seems the matrix ##B## should be ##n\times 1## rather than ##n\times n##.
[Attachment: the theorem in question — the initial-value problem ##F'(t)=AF(t)##, ##F(0)=B##, with ##A##, ##B##, and ##F(t)## all ##n\times n##]
 
  • #2
I mean, maybe there isn't a typo.

If ##B## is ##n\times n## then don't we have ##n## initial value problems?

##F(t)## is said to be ##n\times n##. Is not each column of ##F## a solution to one of the ##n## IVPs?
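
Here's a quick numerical check of that reading — just a sketch I put together, not from Apostol (random ##2\times 2## matrices, scipy's `expm` assumed): set ##F(t)=e^{tA}B## and verify ##F'=AF## and ##F(0)=B##, which then holds column by column.

```python
# Sketch (not Apostol's notation): columns of F(t) = e^{tA} B each solve the
# vector IVP Y' = A Y with Y(0) = the corresponding column of B.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))

F = lambda s: expm(s * A) @ B        # candidate solution of F' = AF, F(0) = B

t, h = 0.7, 1e-6
Fprime = (F(t + h) - F(t - h)) / (2 * h)          # numerical derivative F'(t)
print(np.allclose(Fprime, A @ F(t), atol=1e-4))   # True: F' = AF (per column)
print(np.allclose(F(0.0), B))                     # True: F(0) = B
```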

Just a little later in the book there is the following theorem

[Attachment: a theorem from slightly later in the chapter — the initial-value problem ##Y'=AY##, ##Y(0)=B##, with ##B## an ##n\times 1## column vector]


Now, ##B## is ##n\times 1## but the problem is also called an "initial value problem".

So, at a minimum, the language used in the theorems seems confusing. Is it also incorrect? How can ##B## have dimensions of either ##n\times n## or ##n\times 1## while in both cases the problem ##Y'=AY## with ##Y(0)=B## is called "an initial value problem"?
 
  • #3
It is not a typo. ##A## and ##B## are ##n\times n## matrices. The function ##F(t)## is the parameterization of a curve through the space of such matrices.
 
  • #4
I never really understood why there are these two separate theorems in the book.
 
  • #5
It's a bit unfortunate to use the same letter ##B## for a matrix and a vector, but the principles are the same, only that ##F(t)## is a curve in a matrix space and ##Y(t)## is a curve in a vector space.
 
  • #6
Okay, let me try to go through why there are the two theorems in steps here.

[Attachment: excerpt from Apostol]


OK, so ##A## is definitely ##n\times n##. Is ##F## here ##n\times 1##?

Right after the above, there is a proof that for any ##n\times n## matrix ##A## and any scalar ##t## we have ##e^{tA}e^{-tA}=I##, proving that ##e^{tA}## is nonsingular.
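
(That fact is easy to confirm numerically — a throwaway sketch assuming scipy's `expm`:)

```python
# Sketch: e^{tA} e^{-tA} = I for a square A and scalar t, so e^{tA} is nonsingular.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, 3.0]])
t = 1.3
print(np.allclose(expm(t * A) @ expm(-t * A), np.eye(2)))  # True
```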

Then we have the theorem

[Attachment: theorem from Apostol]


which, as far as I understand it, is like having ##n## initial value problems.

Then there is the following result

[Attachment: a further result from Apostol]


and

[Attachment: a result from Apostol involving a vector ##Y##]

So here they have a vector ##Y##. Was ##F## previously a vector or a matrix in ##F'=AF##?

Finally there is the theorem that I showed in the OP

[Attachment: the theorem quoted in post #1]
 
  • #8
##F## is not a vector; it's a curve. E.g. ##F\, : \,[0,1]\longrightarrow \mathbb{M}(2,\mathbb{R})## defined by
$$
F(t)=\begin{pmatrix}e^t&ct\\0&e^{-t}\end{pmatrix}
$$
would be such a function, with ##F(0)=B=I.##
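
A direct transcription of that curve, with an arbitrary choice ##c=2## just to make it concrete:

```python
# fresh_42's example curve F : [0,1] -> M(2,R); the constant c is arbitrary.
import numpy as np

c = 2.0  # hypothetical choice, just to make the demo concrete

def F(t):
    return np.array([[np.exp(t),  c * t],
                     [0.0,        np.exp(-t)]])

print(np.allclose(F(0.0), np.eye(2)))  # True: F(0) = B = I
```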
 
  • #9
Okay, but why are there the two theorems? What makes them so different?
 
  • #10
zenterix said:
Okay, but why are there the two theorems? What makes them so different?
I have no idea, and I don't have the book. I assume the proofs are very similar. Or you use the matrix theorem and simply apply it with a vector in place of the matrix.
 
  • #11
zenterix said:
Okay, but why are there the two theorems? What makes them so different?
The proof of the second theorem should give you some idea of why it is or is not a consequence of the first theorem.
 
  • #12
Here's my take on it, without having the book at hand. (I sometimes wish I hadn't given all these great books away.)

An initial value problem is any problem where the value of your desired solution is specified at ##t=0##, the initial time. If your function is matrix valued, the initial value is a matrix; if it is vector valued, the initial value is a vector; if it is real valued, the initial value is a number. If you think everything should be a number, then to you a vector valued initial value problem is ##n## initial value problems, and a matrix valued one is ##n^2##. One reason for introducing vectors and matrices is to be able to think of ##n## numbers, or a square array of ##n^2## numbers, as one object. When I was a young student, it was very challenging to me to have my book just write ##f(x)##, which meant sometimes a number, sometimes a vector, and sometimes a matrix. Your book is using ##B## purposely for a matrix and for a vector because a vector is the special case of a column matrix. In Theorem 7.7 he wants you to remember that previously, in 7.5, the same letter ##B## was a matrix, and you should realize this later result is a specialization of the earlier one.

I.e. Theorem 7.7 is just a special case of Theorem 7.5. It is stated separately because it is in this version that many books state the result. This shows you that the usual vector version is a special case of the general matrix version. They could also be stated in the other order, and one could then observe that the more common vector version has a generalization, with essentially the same proof, to a theorem about matrices. Other books give the result only for real valued functions, and then these are two further generalizations of that more elementary result, all with essentially the same proof. There is no doubt an infinite dimensional Banach space version, with the letters ##A## and ##B## representing linear transformations or elements of a Banach space. For this see Lang or Dieudonné.
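
In symbols (shorthand added here, not Apostol's notation): splitting ##F## and ##B## into columns, the matrix problem is ##n## vector problems sitting side by side,
$$
F=\big[\,Y_1\ \cdots\ Y_n\,\big],\quad B=\big[\,B_1\ \cdots\ B_n\,\big]:\qquad F'=AF,\ F(0)=B \iff Y_j'=AY_j,\ Y_j(0)=B_j\ \ (j=1,\dots,n),
$$
so the vector theorem is exactly the one-column case of the matrix theorem.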
 
  • #13
The original theorems 7.5 and 7.7 are almost trivial, as the proof in Apostol shows. I.e. existence, or the fact that ##e^{tA}## solves ##u' = Au##, is just a matter of knowing how to differentiate the exponential function. The uniqueness is the same argument used in beginning calculus to show that the only functions with derivative zero are constants. I.e. assuming ##u' = au##, look at ##h = u/e^{at}##. By the quotient (or product) rule we get ##h' = [u'e^{at} - uae^{at}]/e^{2at} = [aue^{at} - uae^{at}]/e^{2at} = 0##. Hence ##h## is a constant ##c##, so ##u = c\,e^{at}##, and ##c## is determined by the initial condition.
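
(That derivative computation can be machine-checked; here is a minimal sympy sketch of the scalar case, written as ##h = u\,e^{-at}## rather than ##u/e^{at}##:)

```python
# Scalar uniqueness argument: if u' = a u, then h = u e^{-at} has h' = 0,
# so h is a constant c and u = c e^{at}.
import sympy as sp

t, a = sp.symbols('t a')
u = sp.Function('u')

h = u(t) * sp.exp(-a * t)
hprime = sp.diff(h, t)                                   # u'(t) e^{-at} - a u(t) e^{-at}
hprime = hprime.subs(sp.Derivative(u(t), t), a * u(t))   # impose u' = a u
print(sp.simplify(hprime))  # 0
```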

But the question was why they are both stated separately, for which I stick with pedagogy: learning is facilitated by repetition, and in particular it is useful to see how to specialize general statements. The OP is thus to be commended for seeing that the statements are, in theory, needlessly repetitious.

If you are interested in more insight into the proofs in general, the usual argument for existence is by a sequence of approximations, which can also be used to see uniqueness. I.e. given a (possibly time dependent) vector field ##f(t,x)## (where ##x## is an arbitrary element of some vector space ##E##), and hence a differential equation ##u'(t) = f(t,u(t))## with initial condition ##u(0) = b##, where ##u## is a function from the reals to the vector space ##E##, then ##u## solves the differential equation if and only if it solves the integral equation ##u(t) = b + \int_0^t f(s,u(s))\,ds##.

Hence ##u## is a solution if and only if ##u## is a "fixed point" of the operator ##H## taking any function ##u## to the function ##H(u)(t) = b + \int_0^t f(s,u(s))\,ds##.

Moreover, with appropriate conditions on the domain and on ##f##, this operator ##H## is a "contraction" (of the space of functions ##u## from some ##t##-interval to ##E##), hence does have a unique fixed point. I.e. the contraction ##H## gradually squeezes the whole space down to that one point, which itself is left fixed, and this fixed point is the solution to both the integral and differential equations. Thus one can begin from any point ##u_0## at all in the function space, and repeated application of the operator ##H## will give a sequence ##u_0,\ H(u_0)=u_1,\ H(u_1)=u_2,\dots## converging to the fixed point.

In the simple case of ##f(t,u(t)) = Au(t)## and ##u(0) = B##, we get the operator ##H## taking a function ##u## to the function ##H(u)(t) = B + \int_0^t A\,u(s)\,ds##.

Beginning from the constant function ##u_0 = B## and applying the operator ##H## repeatedly, we get the sequence of approximations:
$$
\begin{aligned}
u_0 &= B,\\
u_1 &= H(u_0) = H(B) = B + ABt,\\
u_2 &= H(u_1) = B + ABt + A^2B\,\frac{t^2}{2!},\\
u_3 &= H(u_2) = B + ABt + A^2B\,\frac{t^2}{2!} + A^3B\,\frac{t^3}{3!},\\
&\;\;\vdots
\end{aligned}
$$
This is the famous exponential sequence, converging to ##e^{At}B = e^{tA}B##.
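
(Here is that sequence computed directly — a small sketch assuming numpy/scipy, comparing the partial sums ##u_k## against ##e^{tA}B##:)

```python
# The Picard iterates u_k = sum_{j<=k} A^j B t^j / j! converge to e^{tA} B.
import numpy as np
from math import factorial
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.eye(2)
t = 1.0

exact = expm(t * A) @ B
u = np.zeros_like(B)
for k in range(12):
    u = u + np.linalg.matrix_power(A, k) @ B * (t**k / factorial(k))
    print(k, np.linalg.norm(u - exact))   # error shrinks factorially
```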

Here we can take ##B## in any Banach space ##E##, i.e. ##E## any vector space of any dimension, even infinite, which has a notion of "length" or "norm" for which Cauchy sequences converge, and ##A## any continuous (i.e. "bounded") linear transformation on ##E##.

Technical note: an interesting subtlety arises here that I got wrong in an earlier comment (since deleted). Namely, in order to guarantee convergence of the sequence ##\{H(u_j)\}##, we must let ##H## act on a space of merely continuous functions ##u## from a ##t##-interval to a closed bounded ball in ##E##; i.e. a uniform limit of smooth functions may not be smooth in general. But in this case, even if ##u## is merely assumed continuous but is also a fixed point of ##H##, we have ##u = H(u)##. Then since ##H(u)## is an integral of a continuous integrand, ##H(u)## must be smooth by the fundamental theorem of calculus, hence ##u = H(u)## forces the fixed point ##u## to also be smooth.
[I have not defined a contraction, but it is an operator ##H## which in particular is continuous, so it follows that if ##u## is the limit of ##\{H(u_j)\}##, then ##H(u)## is also the limit of ##\{H(u_j)\}##, hence ##u = H(u)##. The definition of a contraction moreover forces the sequence ##\{H(u_j)\}## to be Cauchy, hence convergent under our assumptions.

Defn: ##H## is a contraction of the metric space ##S## (with distance function ##d##) iff ##H:S\to S## and there is a constant ##c##, with ##0 < c < 1##, such that for every pair of points ##x,y## in ##S## we have ##d(H(x),H(y)) \le c\,d(x,y)##. E.g. this holds if the distance between any pair of points is cut in half by applying ##H##.]
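
(And a rough numerical illustration of the contraction bound — a sketch, with ##H## acting on functions sampled on a grid over a short interval ##[0,T]## so that ##c \approx \|A\|\,T < 1##:)

```python
# On a short interval [0,T], H(u)(t) = B + \int_0^t A u(s) ds shrinks the
# sup-distance between two functions by a factor of roughly c = ||A|| T < 1.
import numpy as np
from scipy.integrate import cumulative_trapezoid

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # max row sum gives ||A|| = 1
B = np.array([1.0, 0.0])
T = 0.4                                   # chosen so c = ||A|| * T < 1
ts = np.linspace(0.0, T, 2000)

def H(u):
    """Apply H to a function u sampled on the grid ts (shape (len(ts), 2))."""
    return B + cumulative_trapezoid(u @ A.T, ts, axis=0, initial=0)

rng = np.random.default_rng(1)
u = rng.standard_normal((ts.size, 2))
v = rng.standard_normal((ts.size, 2))
sup = lambda w: np.abs(w).max()
print(sup(H(u) - H(v)) / sup(u - v), "<= c =", T)  # observed ratio vs. bound
```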
 
