1] "Seeing" ultimately occurs in the occipital lobes at the back of the brain, and all the brain has to work with is nerve cells and their action potentials, which excite or inhibit subsequent nerve cells. The brain does not "see" physical objects, nor the light from objects; it has only the signals from upstream nervous activity from which to abstract the experience of seeing.
2] The detection of edges, angles, shapes, lateral movements, rotations, and many other features begins within the ten layers of the retina, before the signals leave through the optic nerve.
The cortex of the brain is likewise built of stacked processing layers (six in the neocortex), and the retina is very much a specialized outgrowth of the brain itself, projected through the optic nerve to the back of the eyeball.
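To make the idea of feature extraction by excitation and inhibition concrete, here is a toy sketch of lateral inhibition, where each unit is excited by its own input and inhibited by its neighbours. It is only an illustration under simplifying assumptions (a one-dimensional row of units, equal weights); the function name `center_surround` and the `surround` parameter are made up for this example and are not a model of real retinal circuitry.

```python
def center_surround(signal, surround=1):
    """Toy lateral inhibition: each unit outputs its own input minus
    the mean input of its neighbours within `surround` positions."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - surround)
        hi = min(n, i + surround + 1)
        # Neighbouring inputs, excluding the unit itself
        neighbours = [signal[j] for j in range(lo, hi) if j != i]
        out.append(signal[i] - sum(neighbours) / len(neighbours))
    return out


# A dark region followed by a bright region (hypothetical brightness values)
brightness = [1, 1, 1, 1, 5, 5, 5, 5]
response = center_surround(brightness)
print(response)
```

Uniform regions give a zero response and only the two units on either side of the brightness step respond, which is the sense in which an "edge" can be signalled by nothing more than cells exciting and inhibiting one another.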
3] A pair of lateral geniculate nuclei each take signals from one half of each retina and sort them into six layers, with each layer fed by one eye. Keeping the two eyes' signals in adjacent layers sets up the comparison between the eyes on which much of the processing of depth depends.
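The idea of getting depth by comparing the signals from the two eyes can be sketched with a toy example that finds the horizontal shift that best lines up two one-dimensional "views". This is an assumption-laden illustration, not a description of what the nervous system actually computes; the function `best_shift`, its `max_shift` parameter, and the sample data are invented for the example.

```python
def best_shift(left, right, max_shift=3):
    """Return the shift (in samples) that best aligns the two views,
    judged by mean squared difference over the overlapping region."""
    best, best_err = 0, float("inf")
    for s in range(max_shift + 1):
        # Pair left[i + s] with right[i] wherever both exist
        pairs = [(left[i + s], right[i]) for i in range(len(right) - s)]
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best


# The same bright spot lands at different positions in the two views
left_eye = [0, 0, 0, 0, 9, 0, 0, 0]
right_eye = [0, 0, 9, 0, 0, 0, 0, 0]
shift = best_shift(left_eye, right_eye)
print(shift)
```

In a simple two-camera setup, a larger shift between the views corresponds to a nearer object, so the shift itself carries depth information even though neither view contains "distance" as such.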
4] Very many more things are going on. For example, there is a very complicated system that lets you shift your eyes from one object to another without causing a mass movement of the background across the field of vision, so you can move around and move your eyes around and still perceive a "steady" field of view.
But ultimately we only perceive our own nervous system from the inside out; we don't actually see objects or light, we don't hear sounds, etc. We abstract our entire perception of the world: space, distance, size, color, perspective, time, motion, and all conceptual relations of these.
5] Just to be clear: when you watch a tennis match and think you are "seeing" the moving yellow ball, the part of the brain that identifies the shape as a ball, the part that processes the color yellow, and the part that processes the motion of the object are physically separate structures. Yet these features are integrated by further processing, and you "see" a moving yellow ball as a single whole object of perception, so the degree of feature detection, abstraction, and integration is very subtle and sophisticated. Even today, very little is known about how this integration really works.