for people trying to graduate to a higher level view of calculus, here is a bit of an old lecture on background for beginning students in differential topology:
Math 4220/6220, lecture 0,
Review and summary of background information
Introduction: The most fundamental concepts used in this course are those of continuity and differentiability (hence linearity), and integration.
Continuity
Continuity is at bottom the idea of approximation, since a continuous function is one for which f(x) approximates f(a) well whenever x approximates a well enough. The precise version of this is couched in terms of “neighborhoods” of a point. In that language we say f is continuous at a, if whenever a neighborhood V of f(a) is specified, there exists a corresponding neighborhood U of a, such that every point x lying in U has f(x) lying in V.
Then the intuitive statement “if x is close enough to a, then f(x) is as close as desired to f(a)”, becomes the statement: “for every neighborhood V of f(a), there exists a neighborhood U of a, such that if x is in U, then f(x) is in V”.
Neighborhoods in turn are often defined in terms of distances, for example an “r neighborhood” of a, consists of all points x having distance less than r from a. In the language of distances, continuity of f at a becomes: “if a distance r > 0 is given, there is a corresponding distance s > 0, such that if dist(x,a) < s, (and f is defined at x) then dist(f(x),f(a)) < r”.
More generally we say f(x) has limit L as x approaches a, if for every neighborhood V of L, there is a neighborhood U of a such that for every point x of U except possibly a itself, we have f(x) in V. Notice that the value f(a) plays no role in the definition of the limit of f at a. Then f is continuous at a iff f(x) has limit equal to f(a) as x approaches a.
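To make the distance version concrete, for f(x) = 2x one can take s = r/2 at every point a, since |x - a| < r/2 forces |f(x) - f(a)| = 2|x - a| < r. The following small Python sketch samples points to confirm this (the function f and the witness s = r/2 are illustrative choices):

```python
# Distance version of continuity for f(x) = 2x: given r > 0, the choice
# s = r/2 guarantees that |x - a| < s implies |f(x) - f(a)| < r.
# (f and the witness s = r/2 are illustrative choices.)

def f(x):
    return 2 * x

def witness_s(r):
    # a distance s that works for this particular f, at every point a
    return r / 2

def check_continuity_at(a, r, num_samples=1000):
    """Sample points x with |x - a| < s and confirm |f(x) - f(a)| < r."""
    s = witness_s(r)
    for k in range(num_samples):
        x = a - s + 2 * s * k / num_samples  # points spread through [a - s, a + s)
        if abs(x - a) < s and not abs(f(x) - f(a)) < r:
            return False
    return True
```

The same pattern, with a different witness s depending on a, verifies continuity of less uniform functions.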
Differentiability
Differentiability is the approximation of non linear functions by linear ones. Thus making use of differentiability requires one to know how to calculate the linear function which approximates a given differentiable one, to know the properties of the approximating linear function, and to know how to translate these into analogous properties of the original non linear function. Hence a prerequisite for understanding differentiability is understanding linear functions and the linear spaces on which they are defined.
Linearity
Linear spaces capture the idea of flatness, and allow the concept of dimension. A line with a specified point of origin is a good model of a one dimensional linear space. A Euclidean plane with an origin is a good model of a two dimensional linear space. Every point in a linear space is thought of as equivalent to the arrow drawn to it from the specified origin. This makes it possible to add points in a linear space by adding their position vectors via the parallelogram law, and to "scale" points by real numbers or "scalars", by stretching the arrows by this scale factor, (reversing the direction if the scalar is negative).
We often call the points of a linear space "vectors" and the space itself a "vector space". A linear function, or linear map, is a function from one linear space to another which commutes with these operations, i.e. f is linear if f(v+w) = f(v)+f(w) and f(cv) = cf(v), for all scalars c, and all vectors v,w.
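The two defining conditions are easy to test numerically. Here is a small Python sketch using numpy, comparing a map given by a matrix with a map that fails linearity (the matrix A, the shifted map g, and the random test vectors are illustrative choices):

```python
import numpy as np

# Numerical check of the two linearity conditions f(v+w) = f(v)+f(w) and
# f(cv) = cf(v), for a matrix map versus a map that fails them.
# (The matrix A and the random test vectors are illustrative choices.)

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def f(v):
    return A @ v          # linear: matrix multiplication

def g(v):
    return A @ v + 1.0    # not linear: the constant shift breaks both conditions

def is_linear(h, trials=20):
    for _ in range(trials):
        v, w = rng.standard_normal(2), rng.standard_normal(2)
        c = rng.standard_normal()
        if not np.allclose(h(v + w), h(v) + h(w)):
            return False
        if not np.allclose(h(c * v), c * h(v)):
            return False
    return True
```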
The standard model of a finite dimensional linear space is R^n. A fundamental example of an infinite dimensional linear space is the space of all infinitely differentiable functions on R.
Linear Dimension
This is an algebraic version of the geometric idea of dimension. A line is one dimensional. This means given any point except the origin, the resulting non zero vector can be scaled to give any other vector on the line. Thus a linear space is one dimensional if it contains a non zero vector v such that given any other vector x, there is a real number c such that x = cv. We say then v spans the line.
A plane has the two dimensional property that if we pick two distinct points both different from the origin, and not collinear with the origin, then every point of the plane is the vector sum of multiples of the two corresponding vectors. Thus a linear space S is two dimensional if it contains two non zero vectors v,w, such that w is not a multiple of v, but every vector in S has form av+bw for some real numbers a,b. We say the set {v,w} spans the plane S.
In general a set of vectors {vi} spans a space S if every vector in S has form a1v1 + ... + anvn for some finite subset {v1,...,vn} of {vi} and scalars a1,...,an. The space is finite dimensional if the set {vi} can be taken to be finite. A space has dimension r if it can be spanned by a set of r vectors but not by any set of fewer than r vectors. If S is inside T, and both are finite dimensional linear spaces of the same dimension, then S = T.
Linear maps
Unlike continuous maps, linear maps cannot raise dimension, and bijective linear maps preserve dimension. More precisely, if f:S-->T is a surjective linear map, then dim(T) <= dim(S), whereas if f:S-->T is an injective linear map, then dim(T) >= dim(S). Still more precisely, if ker(f) = f-1(0), and im(f) = {f(v): v is in S}, then ker(f) and im(f) are both linear spaces [contained in S,T respectively], and dim(ker(f)) + dim(im(f)) = dimS. This is the most fundamental and important property of dimension. This is often stated as follows. The rank of a linear map f:S-->T is the dimension of im(f) and the nullity is the dimension of ker(f). Then for f:S-->T, we have rank(f) + nullity(f) = dim(S).
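The count rank(f) + nullity(f) = dim(S) can be checked numerically for a concrete map. Here is a small Python sketch using numpy (the matrix M, representing a map R^4-->R^3, is an arbitrary illustrative choice):

```python
import numpy as np

# Rank-nullity for a linear map f: R^4 --> R^3 given by a matrix M.
# numpy's matrix_rank computes dim(im(f)); the nullity is dim(S) - rank.
# (The matrix M is an arbitrary illustrative example.)

M = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],   # a multiple of row 1, so the rank drops
              [0.0, 1.0, 0.0, 1.0]])

dim_S = M.shape[1]                    # dimension of the domain R^4
rank = np.linalg.matrix_rank(M)       # dim(im(f))
nullity = dim_S - rank                # dim(ker(f)), by rank + nullity = dim(S)
```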
It follows that f is injective if and only if ker(f) = {0}, and, when T is finite dimensional, surjective if and only if dim(im(f)) = dim(T). A linear map f:S-->T with a linear inverse is called an isomorphism. A linear map is an isomorphism if and only if it is bijective. If dimS = dimT is finite, a linear map f:S-->T is bijective if and only if f is injective, if and only if f is surjective. A simple and important example of a linear map is the projection R^nxR^m-->R^n taking (v,w) to v. This map is trivially surjective with kernel {0}xR^m.
The theory of dimension gives a strong criterion for proving the existence of solutions of linear equations f(x) = w in finite dimensional spaces. Assume dimS = dimT finite, f:S-->T linear, and f(x) = 0 only if x = 0. Then for every w in T, the equation f(x) = w has a unique solution.
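This criterion can be seen concretely in R^2. Here is a small Python sketch using numpy (the matrix A, representing f, and the right hand side w are arbitrary illustrative choices):

```python
import numpy as np

# The existence criterion in action: if a square matrix A has trivial kernel
# (full rank), every equation A x = w has exactly one solution.
# (A and w are arbitrary illustrative choices.)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
w = np.array([5.0, 10.0])

# trivial kernel: the only solution of A x = 0 is x = 0, i.e. A has full rank
assert np.linalg.matrix_rank(A) == 2

x = np.linalg.solve(A, w)   # the unique solution of A x = w
```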
More generally, if S,T are finite dimensional, f:S-->T linear, and dim(ker(f)) = dim(S) - dim(T) = r, then every equation f(x) = w has an r dimensional set of solutions. We describe the set of solutions more precisely below.
Differentiation D(f) = f' is a linear map from the space of infinitely differentiable functions on R to itself. The mean value theorem implies the kernel of D is the one dimensional space of constant functions, and the fundamental theorem of calculus implies D is surjective.
More generally, for every constant c the differential operator (D-c) is surjective with kernel the one dimensional space of multiples of e^(ct), hence a composition of n such operators has n dimensional kernel. One can deduce that a linear combination c0 + c1D + ... + cnD^n, with constant coefficients cj and cn not 0, of compositions of D with maximum order n, has n dimensional kernel.
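One can check numerically that e^(ct) lies in the kernel of (D-c), approximating the derivative by a difference quotient. A small Python sketch (the constant c, the step size h, and the sample points are illustrative choices):

```python
import math

# Numerical check that f(t) = e^(ct) lies in the kernel of (D - c): its
# difference quotient at sample points is close to c * f(t).
# (The constant c, step size h, and sample points are illustrative choices.)

c = 0.7
h = 1e-6

def f(t):
    return math.exp(c * t)

def Df(t):
    # symmetric difference quotient approximating f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)

# (D - c)f should be (approximately) zero at every sample point
residuals = [Df(t) - c * f(t) for t in (-1.0, 0.0, 0.5, 2.0)]
```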
Geometry of linear maps.
If f:S-->T is a linear surjection of finite dimensional spaces, then ker(f) = f-1(0) is a linear space of dimension r = dim(S)-dim(T), and for every w in T, the set f-1(w) is similar to a linear space of dimension r, except it has no specified origin. I.e. if v is any solution of f(v) = w, then the translation taking x--> x+v is a bijection from f-1(0) to f-1(w). Hence the choice of v as "origin" in f-1(w) allows us to define a unique structure of linear space making f-1(w) isomorphic to f-1(0). Thus f-1(w) is a translate of an r dimensional linear space.
In this way, f "fibers" or "partitions" the space S into the disjoint union of the "affine linear sets" f-1(w). There is one fiber f-1(w) for each w in T, each such fiber being a translate of the linear space ker(f) = f-1(0). If
f:S-->T is surjective and linear, and dimT = dimS - 1, then the fibers of f are all one dimensional, so f fibers S into a family of parallel lines, one line over each point of T. If f:S-->T is surjective (and linear), but dimT = dimS - r with r > 0, then f fibers S into a family of parallel affine linear sets f-1(w) each of dimension r.
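In the simplest case of a projection this fibering can be seen directly: translating the kernel by one solution sweeps out the fiber. A small Python sketch (the target value w and the sample kernel points are illustrative choices):

```python
import numpy as np

# The fiber picture for the projection f: R^2 x R^1 --> R^1, (x,y,z) |--> z.
# Its kernel is the plane z = 0, and the fiber over w is that plane translated
# by any one solution v of f(v) = w. (w and the sample points are illustrative.)

def f(p):
    return p[2]                       # project (x, y, z) to z

w = 5.0
v = np.array([0.0, 0.0, w])           # one particular solution of f(v) = w

# a few sample points of the kernel plane z = 0
kernel_points = [np.array([x, y, 0.0]) for x in (-1.0, 2.0) for y in (0.0, 3.0)]

# translating each kernel point by v lands it in the fiber f^(-1)(w)
fiber_points = [k + v for k in kernel_points]
```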
The matrix of a linear map R^n-->R^m
If S, T are linear spaces of dimension n and m and {v1,...,vn}, {w1,...,wm} are minimal sets of vectors spanning S,T respectively, then for every v in S, and every w in T, the scalar coefficients ai, bj in the expressions v = a1v1 + ... + anvn and w = b1w1 + ... + bmwm are unique. Then given these minimal spanning sets, a linear map f:S-->T determines and is determined by the "m by n matrix" [cij] of scalars where: f(vj) = c1jw1 + ... + cmjwm, for all j = 1,...,n. If S = T = R^n, we may take vi = wi = (0,...,0,1,0,...,0) = ei = the "ith unit vector", where the 1 occurs in the ith place.
If S is a linear space of dimension n and {v1,...,vn} is a minimal spanning set, we call {v1,...,vn} a basis for S. Then there is a unique isomorphism S-->R^n that takes vi to ei, where the set of unit vectors {e1,...,en} is called the "standard" basis of R^n. Conversely under any isomorphism S-->R^n, the vectors in S corresponding to the set {e1,...,en} in R^n form a basis for S. Thus a basis for an n dimensional linear space S is equivalent to an isomorphism of S with R^n. Since every linear space has a basis, after choosing one, a finite dimensional vector space can be regarded as essentially equal to some R^n.
In the context of the previous sentence, every linear map can be regarded as a map f:R^n-->R^m. The matrix of such a map, with respect to the standard bases, is the m by n matrix whose jth column is the coordinate vector f(ej) in R^m.
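This column-by-column recipe is easy to carry out. Here is a small Python sketch using numpy (the particular map f: R^3-->R^2 is an arbitrary illustrative choice):

```python
import numpy as np

# Building the matrix of a linear map f: R^3 --> R^2 column by column:
# the jth column is f(e_j), the image of the jth standard unit vector.
# (The particular map f is an arbitrary illustrative choice.)

def f(v):
    # a linear map R^3 --> R^2, written without any explicit matrix
    return np.array([v[0] + 2 * v[1], 3 * v[2] - v[0]])

n = 3
columns = [f(np.eye(n)[j]) for j in range(n)]   # f(e_0), f(e_1), f(e_2)
matrix = np.column_stack(columns)               # the 2 by 3 matrix of f
```

Applying the matrix to any vector now reproduces f, since both sides are linear and agree on the basis.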
If f:S-->T is any linear surjection of finite dimensional spaces, a careful choice of bases for S,T can greatly simplify the matrix of the corresponding map R^n-->R^m. In fact there are bases for S,T such that under the corresponding isomorphisms, f is equivalent to a projection
R^(n-m)xR^m-->R^m. I.e., up to linear isomorphism, every linear surjection is equivalent to the simplest example, a projection.
This illustrates the geometry of a linear surjection as in the previous subsection. I.e. a projection f:R^nxR^m-->R^m fibers the domain space R^nxR^m into the family of disjoint parallel affine spaces f-1(v) = R^nx{v}, with the affine space R^nx{v} lying over the vector v. Since every linear surjection is equivalent to a projection, every linear surjection fibers its domain into a family of disjoint affine spaces linearly isomorphic to this family. We will see that the implicit function theorem gives an analogous statement for differentiable functions.
The determinant of a linear map R^n-->R^n.
For each linear map f:R^n-->R^n there is an important associated number det(f) = det(cij) = the sum of signed products ∑p sgn(p) c1p(1)c2p(2)···cnp(n), where p ranges over all permutations of the integers (1,2,...,n). det(f) is the oriented volume of the parallelepiped (i.e. block) spanned by the images f(e1),...,f(en) of the ordered unit vectors. Then f is invertible iff det(f) is not 0. The intuition is that this block has non zero n dimensional volume iff the vectors f(e1),...,f(en) span R^n, iff f is surjective, iff f is invertible.
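Both readings of det(f), as an oriented volume and as an invertibility test, can be checked in a small Python sketch using numpy (the matrices A and B are illustrative examples):

```python
import numpy as np

# det(f) as oriented volume and as invertibility test: for a 2 by 2 matrix,
# |det| is the area of the parallelogram spanned by the columns f(e1), f(e2),
# and det != 0 exactly when the map is invertible.
# (The matrices A and B are illustrative examples.)

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])    # columns span a parallelogram of area 6

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])    # columns are collinear: zero area, not invertible

det_A = np.linalg.det(A)
det_B = np.linalg.det(B)
```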
