They have sensor cameras that work side by side with the on-air camera to analyze what's going on in the field. The technology can distinguish players from the green of the field by the color of their uniforms, and can even tell which team they're on. It's pretty fascinating; it can even analyze player formations from the image alone.
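A toy sketch of that color-based idea: label each pixel by whichever reference color it is closest to. Real systems use calibrated color spaces and per-match tuning; the reference RGB values and labels here are made up for illustration.

```python
# Classify a pixel by its nearest reference color (squared RGB distance).
# The reference colors below are assumptions, not real broadcast values.

REFERENCE = {
    "field":  (40, 120, 40),    # grass green (assumed)
    "team_a": (200, 30, 30),    # red shirts (assumed)
    "team_b": (240, 240, 240),  # white shirts (assumed)
}

def classify(pixel):
    """Return the label whose reference color is closest to the pixel."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(REFERENCE, key=lambda label: dist2(pixel, REFERENCE[label]))
```

Running `classify` over every pixel gives you a crude player/field mask; formation analysis would then work on the clusters of team-colored pixels.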
Also, based on the pan and zoom of the on-air camera, it is possible to superimpose 3D-rendered elements onto the video image of the field. It's interesting stuff; google "image extraction technology".
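The core of that is knowing where a 3D field point lands in the image for the current pan and zoom. A minimal pinhole-camera sketch, assuming a pan-only rotation and using focal length in pixels as a stand-in for zoom (all parameter names here are my own):

```python
import math

def project(point, pan_rad, focal_px, cx=960.0, cy=540.0):
    """Project a 3D point (x, y, z) into pixel coordinates (u, v).

    pan_rad  -- camera pan angle around the vertical axis (assumed model)
    focal_px -- focal length in pixels, i.e. the zoom
    cx, cy   -- image center (defaults assume a 1920x1080 frame)
    """
    x, y, z = point
    # Rotate the world by -pan around the vertical (y) axis into camera space.
    c, s = math.cos(-pan_rad), math.sin(-pan_rad)
    xc = c * x + s * z
    yc = y
    zc = -s * x + c * z
    if zc <= 0:
        return None  # point is behind the camera
    u = cx + focal_px * xc / zc
    v = cy - focal_px * yc / zc
    return (u, v)
```

A real broadcast rig also needs tilt, roll, lens distortion, and a calibrated mapping from the lens's zoom encoder to focal length, but the projection step is this shape.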
There are even computer webcams out there that can track your face around by the whites of your eyes and superimpose funny images of glasses, googly eyes, and whatnot.
If the camera is fixed to a point, superimposing things is quite easy. If everything likely to move (players, crowd, etc.) is always entirely in front of or behind the virtual object (e.g. the players are always in front of the ground), you can pre-bake a texture map that decides, per pixel, whether to show the video feed or the 3D model. If you know anything about alpha blending, just consider extending it to a cubemap (i.e. a texture for each face of a cube) and rendering that alpha cubemap like a skybox. Not very flexible, but I'm sure it gets used for some of those effects, because as the camera wobbles the blending does not, and you can notice it.
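The per-pixel decision above boils down to a blend: a sketch, with a plain alpha value standing in for a sample from one face of the pre-baked alpha cubemap.

```python
# Composite one pixel: the alpha mask (baked per view direction) says how
# much of the live video feed vs. the rendered 3D model to show.

def composite(video_px, render_px, alpha):
    """alpha = 1.0 -> pure video feed, alpha = 0.0 -> pure 3D render."""
    return tuple(alpha * v + (1.0 - alpha) * r
                 for v, r in zip(video_px, render_px))
```

In practice the GPU does this for the whole frame at once; sampling the cubemap with the pixel's view direction gives the alpha, exactly like sampling a skybox.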
If you are not constrained to a point, you can model occlusion with low-detail 3D models standing in for real objects. Take a line from the camera to the point on the virtual object you want to test for visibility; if it intersects the plane of one of the "blocker" model's triangular faces, and the intersection lies within that triangle's bounds, then you render the video feed there instead. You would still need to track the path the camera takes, however.
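That line-of-sight test is a standard ray/triangle intersection; one common way to write it is the Möller-Trumbore algorithm, sketched here in plain Python:

```python
# Ray/triangle intersection (Moller-Trumbore). If the ray from the camera
# toward the virtual point hits a blocker triangle at a nearer distance,
# the real object occludes the graphic, so the video feed wins there.

EPS = 1e-9

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ray_hits_triangle(origin, direction, v0, v1, v2):
    """Return the distance t along the ray to the hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, e2)
    a = dot(e1, h)
    if abs(a) < EPS:
        return None  # ray is parallel to the triangle's plane
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:
        return None  # outside the triangle along the first edge
    q = cross(s, e1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None  # outside the triangle along the second edge
    t = f * dot(e2, q)
    return t if t > EPS else None  # hits behind the camera don't count
```

Run this for every blocker triangle and keep the nearest hit; if that hit is closer than the virtual point, show video at that pixel.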
The more complicated stuff I have no idea about, though... lots of digital image processing.