Is it correct that all visual displays are frame-based, while all audio players deliver output as a continuous stream? Why do we not (or cannot we not) present visual displays as streams, and why do we not have frame-based audio players? If you pause a film or a video monitor, you see one of the frames. If you pause an audio playback, you hear nothing. Why do we frame the orchestra but stream the music?

As I understand it, in our visual system the eyes jump from one object to the next, and we retain the previously scanned objects in memory, so that a continually updated overall picture is held in memory. There are no frames. When we listen to music, we capture each sound and retain the previously played notes in memory, producing a continuous auditory experience, such as a song. If it is all streaming, why do we have to use frames for video?

By comparison, can someone also tell me how computer vision and computer hearing work? Does AI in mobile robotics use frames for sight and streams for hearing? To achieve object recognition, does AI stream the data, or does it use frames too?
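To make the contrast I am asking about concrete, here is a toy sketch in Python. All the names and numbers are made up for illustration (no real media API is used): the "video" is a list of self-contained frames, while the "audio" is a stream of individual samples that mean nothing in isolation.

```python
import math

FPS = 24            # frames per second for the toy "video"
SAMPLE_RATE = 8000  # samples per second for the toy "audio"

def make_video(seconds):
    """Each frame is a complete, self-contained snapshot (here just a label)."""
    return [f"frame {i}" for i in range(int(seconds * FPS))]

def make_audio(seconds, freq=440.0):
    """Each sample is a single amplitude value of a sine tone;
    no individual sample is a 'picture' of the sound."""
    n = int(seconds * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]

video = make_video(1.0)
audio = make_audio(1.0)

# Pausing the video at any instant still yields a complete picture:
paused_frame = video[12]   # a whole frame, viewable on its own

# Pausing the audio at any instant yields one amplitude value,
# which carries essentially no perceptual content by itself:
paused_sample = audio[12]
```

This is what I mean by "if you stop a film you see a frame, but if you stop an audio playback you hear nothing": a single frame is a complete image, whereas a single audio sample only has meaning as part of a sequence over time.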