Framing versus Streaming

  1. Is it correct that all visual displays are frame based and all audio players deliver output in streaming format? What is the reason why we do not or cannot present visual displays in streaming format and we do not or cannot have frame based audio players? If you stop a film or a video monitor you see one of the frames. If you stop an audio playback you hear nothing. Why do we frame the orchestra but stream the music?

    As I understand it, in our visual system the eyes jump from one object to the next and we retain in memory the objects previously scanned so that a continually updated total picture is held in memory. There are no frames. When we listen to music we capture each sound and we retain in memory the notes previously played, producing an auditory experience, such as a song. If it's all streaming, why do we have to use frames for video?

    By comparison, can someone also tell me how computer vision and computer hearing works – does AI in mobile robotics use frames for sight and streaming for hearing? To achieve object recognition does AI stream the data or does it use frames too?
  3. As I didn’t get any replies, I am going to try to answer my own question. I think it goes like this:

    Originally eyes were simple light sensors and detected photon streams only. With evolution, today’s eyes developed into sophisticated scanning instruments which are capable of spatial differentiation, so enabling the brain to build a larger picture. All man-made displays are frame based in order to deliver spatial information, in line with the capabilities of the human visual system to receive and process it.

    However, as the human audio system can detect streamed data only, that is what man-made audio systems deliver. If our ears were constructed in the same manner as our eyes and we could build sound pictures in the brain, then man-made audio systems would be designed to deliver sound frames.

    So far so good, but I still don’t know if the brain sees frames. Does the brain build its pictures sequentially using streamed data from memory, or does it retrieve frames? In the case of sound, since there is no scanning by the ear, then logically there are no scanned audio pictures in the brain to be retrieved. But what do musicians retrieve when they are playing from memory? And how do/will the visual and audio systems of robots work? Do they/will they also have highly developed visual systems and substandard hearing systems as we have?
  4. rcgldr

    Visual displays are frame based, but some of them will somewhat smoothly transition the pixels of an image from frame to frame. Film projectors have a period of darkness between frames, but it is relatively short compared to the time each frame is shown. Digital projectors can use the same somewhat smooth pixel transition methods used for monitors. The persistence of the phosphors on CRT-type monitors or projectors determines how often the image has to be redrawn before flicker becomes noticeable.

    Oscilloscopes, some ancient computer monitors (like the ones on a CDC 6600 computer), and some old arcade game monitors (like Asteroids or Battlezone) don't use pixels. Instead they steer an electron beam along various shapes to produce a monochrome image (one color plus darkness), a technique called vector graphics. Some of these can only draw objects with straight lines and cannot handle smooth curves.

    The visual receptors in a person's eyes also have a form of "persistence" and do not respond immediately to changes, but frame-based perceived motion (where visual motion is emulated via a sequence of fixed frames as opposed to continuous smooth motion) is supposed to be due more to how a person's brain interprets a series of frames as apparent motion. Wiki article about how people perceive movement from visual displays:
    Last edited: Dec 3, 2012
  5. FYI, audio players are all "frame based" as well. An MPEG-1/2 Audio Layer III (aka MP3) stream is simply a stream of frames.

    The brain is arguably an analog device, but digital devices all build their outputs from discrete frames. The trick is playing the frames back fast enough to give the perception of continuity.
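
    As a rough illustration of "a stream is simply a stream of frames", here is a minimal Python sketch that cuts a flat list of samples into fixed-size frames and joins them back. It is an assumption-laden toy: the function names and the frame size are made up for the example, and it models plain uncompressed batching only, not the MP3 format, whose frames hold compressed data.

```python
from typing import List


def split_into_frames(samples: List[int], frame_size: int) -> List[List[int]]:
    """Cut a flat sample stream into consecutive frames of frame_size samples.

    The last frame may be shorter if the stream length is not a multiple
    of frame_size. (Hypothetical helper, for illustration only.)
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]


def join_frames(frames: List[List[int]]) -> List[int]:
    """Concatenate frames back into one flat sample stream."""
    return [sample for frame in frames for sample in frame]


stream = list(range(10))                 # stand-in for audio samples
frames = split_into_frames(stream, 4)    # frame size of 4 is arbitrary
print(frames)                            # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(join_frames(frames) == stream)     # True
```

    For this uncompressed case, framing only groups the data; playing the frames back in order yields the same sample sequence as the unframed stream.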

    "streaming" in the context of the internet just means that the content is not fully downloaded before playback starts.
  6. Thanks, I didn't know that audio players use frames. But are the audio frames not just the whole recorded data separated into batches with nothing missing? If so, it's not much different from a constant stream of data.

    As I understand it, video cameras generate sample data in a scanning process and the camera is not able to capture an entire picture at once. It only captures enough sample data so that the video display can do its job of tricking the eye and brain with speed and repetition. When we watch the monitor, the eye rescans the frames and sends the data to the visual cortex. Whether the brain then generates its own frames is one of my questions.

    In the case of audio there is no scanning, neither in the recording process nor in the playback. It's all sequential with no sound pictures or frames in this sense, right?
  7. rcgldr

    Depends on the camera technology. A rolling shutter (film or CMOS) captures a portion of a picture at a time, which can create strange effects on fast-moving objects, like what appears to be a bent spinning propeller. The wiki article includes some examples of rolling shutter effects:

    A global shutter captures an entire picture at once. For high speed film capture, one way to get a global shutter effect is to use a very high speed strobe in an otherwise dark environment and no physical shutter. Example photo of a rocket sled moving at 3300 mph, captured with a single very bright and short duration flash:


    What video devices don't do is provide images of continuous motion. To prevent an "animated" look to film or video, the shutter is kept open for a large part of each frame interval so that moving objects leave a blurry trail across the image, called motion blur.
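
    The rolling versus global shutter difference described above can be sketched with a toy simulation. This is only a sketch under stated assumptions: the scene is a single vertical bar moving sideways at constant speed, time is measured in arbitrary integer ticks, and `capture`, `speed`, and `row_delay` are hypothetical names invented for the example, not a model of any real sensor.

```python
from typing import List


def render_row(bar_x: int, width: int) -> str:
    """Draw one image row: '#' where the bar is, '.' elsewhere."""
    return "".join("#" if x == bar_x else "." for x in range(width))


def capture(height: int, width: int, speed: int, row_delay: int) -> List[str]:
    """Capture a frame of a vertical bar moving right at `speed` columns per tick.

    Row r is read out at time r * row_delay. With row_delay == 0 every row
    sees the same instant (global shutter); with row_delay > 0 each row sees
    the bar slightly later than the row above it (rolling shutter).
    """
    rows = []
    for r in range(height):
        t = r * row_delay              # moment this row is exposed
        bar_x = (speed * t) % width    # bar position at that moment
        rows.append(render_row(bar_x, width))
    return rows


print("global shutter:")
print("\n".join(capture(4, 8, speed=2, row_delay=0)))
print("rolling shutter:")
print("\n".join(capture(4, 8, speed=2, row_delay=1)))
```

    With the global shutter the bar stays vertical; with the rolling shutter the same bar comes out slanted, which is the toy version of the bent propeller effect.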

    Audio is continuous in the case of analog devices, such as analog tape or vinyl records. CDs, digital tape, and the audio tracks on a DVD are digitally sampled, the equivalent of frames. During playback these devices use a digital-to-analog converter that smooths the output so that it closely resembles the original continuous sound wave(s).
    Last edited: Dec 3, 2012
  8. Thanks for your explanations and the link to rolling shutters. The videos produce some nice effects in the frames shown on my laptop monitor! I think I am now more or less clear on the visual/audio technology part of my questions.

    Regarding your previous link to the phi phenomenon, I had already read it, but still didn't understand the difference between phi and beta. In case someone is interested, I found a better demo and explanation here:
  9. There is nothing analogous to frame scanning on the audio side, that's right, but you can effectively ignore this on the video side as well. You can either use a frame transfer CCD (expensive!) for recording, or record in analog and do a digital transfer afterwards. During playback, artifacts can be introduced by the player or compression that are not actually present in the recording itself.
  10. rcgldr

    This is my understanding of the difference.

    The wiki article doesn't explain the phi phenomenon well, but its example of a zoetrope may help. With a zoetrope, each frame is a "static" image, but the components of the image move from frame to frame. A similar effect is produced by the animation of a running horse in this wiki article. The legs of the horse have actually moved from frame to frame.

    Other examples of phi phenomenon are the mutoscope or flip book:

    The definition of beta phenomenon states that the components that create the image don't move, they just change color, such as the pixels on a monitor or the LEDs on an LED-based display.

    In the case of film, the components of images move from frame to frame which would be phi phenomenon, but if you watch that film on a computer monitor or digital projector, it's beta phenomenon.

    There's no equivalent to a rolling shutter for capturing audio (assuming that all channels of a multi-track audio source are sampled at the same moments). Digital audio recording normally just samples the audio amplitude (for each channel) at some fixed rate, such as 44.1 kHz for CD. During playback, the digital-to-analog conversion tries to recreate the original continuous sound wave from the samples.
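
    To make the sample-then-reconstruct idea concrete, here is a minimal sketch in Python. It samples a 1 kHz sine at the CD rate of 44.1 kHz and then estimates the wave's value at a moment that falls between two samples, using textbook sinc (Whittaker-Shannon) interpolation. The 1 kHz tone, the 2000-sample window, and the function names are assumptions chosen for the example; a real digital-to-analog converter uses a hardware or digital reconstruction filter rather than this direct sum.

```python
import math
from typing import List

SAMPLE_RATE = 44_100  # CD sample rate in Hz, as mentioned above


def sample_sine(freq_hz: float, n_samples: int, rate: int = SAMPLE_RATE) -> List[float]:
    """Sample a unit-amplitude sine wave at discrete moments n / rate."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


def sinc(x: float) -> float:
    """Normalized sinc function: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples: List[float], t: float, rate: int = SAMPLE_RATE) -> float:
    """Estimate the signal at time t (seconds) by sinc interpolation.

    Each sample contributes a sinc pulse centered on its own sampling
    moment; the estimate is the sum of all pulses evaluated at t.
    """
    return sum(s * sinc(t * rate - n) for n, s in enumerate(samples))


freq = 1000.0                          # assumed test tone, well below rate / 2
samples = sample_sine(freq, 2000)
t = 1000.5 / SAMPLE_RATE               # halfway between samples 1000 and 1001
estimate = reconstruct(samples, t)
actual = math.sin(2 * math.pi * freq * t)
print(f"estimate={estimate:.5f} actual={actual:.5f} error={abs(estimate - actual):.2e}")
```

    The small remaining error comes from using a finite window of samples; the point is that the value between two samples is recoverable from the samples themselves, as long as the tone is below half the sample rate.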
    Last edited: Dec 6, 2012
  11. The difference between Phi and Beta is difficult to "see" but easy to explain. Both are brain phenomena, though: the distinction is not that one takes place in the eyes, but how the illusion is created. Both lead to the same illusion (motion).

    Phi is a succession of images with a missing element (or elements), while Beta is a succession of images with a moving element.

    If you look at the Phi Phenomenon wiki page, you'll see two example images. The image demonstrating Phi is a succession of still images that are identical except for one element being removed. The image demonstrating Beta is a succession of still images as well, but instead of an element being removed from each image, a single foreground element is moved to a different position.

    The difference isn't in the display technology used, it's in "what" appears to be moving -- a foreground element, or a section of the background.

    Another way to look at phi is that the brain sees a "thing" where there isn't actually a thing to see. In the example image, it looks like "something" is moving around in a circle, covering up the dots, when in fact nothing is moving -- one of the dots is being removed. In beta movement, there is an actual thing, and it is actually moving.
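
    The two frame sequences described in this post can be sketched as text. This is a toy built on this post's reading of the wiki demo images only: `phi_frames` keeps every dot except one gap that advances each frame, and `beta_frames` shows a single dot that is drawn at a new position each frame. Both function names are made up for the example.

```python
from typing import List


def phi_frames(n_positions: int, n_frames: int) -> List[str]:
    """Frames that are identical except for one removed dot.

    The missing dot ('.') is at a different position in each frame, so the
    gap appears to travel even though no dot ever moves.
    """
    return [
        "".join("." if i == f % n_positions else "o" for i in range(n_positions))
        for f in range(n_frames)
    ]


def beta_frames(n_positions: int, n_frames: int) -> List[str]:
    """Frames in which a single foreground dot ('o') changes position."""
    return [
        "".join("o" if i == f % n_positions else "." for i in range(n_positions))
        for f in range(n_frames)
    ]


print(phi_frames(4, 3))   # ['.ooo', 'o.oo', 'oo.o']
print(beta_frames(4, 3))  # ['o...', '.o..', '..o.']
```

    Written out this way, the two sequences are exact inverses of each other, which may be part of why the distinction is hard to "see".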
  12. rcgldr

    That wiki article continues with this example:

    the zoetrope is a device that produces the illusion of motion by presenting static pictures in quick succession

    This would correspond to the same principle as the mutoscope, flip book, or film (movie). These devices do not involve toggling elements in an image on and off, but instead involve movement of elements of an image from frame to frame.

    As for the image demonstration, assume that the image is a static image with one dot missing, and that the static image is rotated to the right by one dot for each frame (as opposed to the dots being lights that are turned on and off).

    The beta description in the wiki article is even more confusing:

    Its illusion is that fixed images seem to move, even though of course the image does not change.

    The phi phenomenon can be considered to be an apparent movement caused by luminous impulses in sequence, (that is to say, it is lights going on and off at regular intervals), whereas the beta movement is an apparent movement caused by lights that do not move, but seem to.

    These descriptions would seem to imply optical illusions where a viewer gets a sense of movement in a single static image, but they conflict with the LED example, where LEDs are turned on and off in a pattern to give the illusion of motion. The toggling in the LED example is similar to how monitors or digital projectors work (except that the transitions may be smoothed out depending on the monitor or projector).
    Last edited: Dec 6, 2012
  13. I agree. Looking at the wiki article, the difference between the two is about as clear as mud. If the yellow dot is spun the other way and arranged into a circle, what are we left with? One image with a yellow dot appearing to travel in a circle, and one with an empty space appearing to travel in a circle.

    I also came across this (it's not actually XML) on the talk page for phi.

    It says in part:
    If this is correct it would certainly help clear things up -- no distinguishable difference between phi and beta if phi is a superset containing beta.
  14. Interesting