Resolution of the human visual system in physical terms

Graeme M · Sep 6, 2017

I've often wondered at the "resolution" of the human visual system but it's not at all clear from what I've read whether this question even makes sense. As a sort of general position, many of the articles I've read suggest that the human eye, over the full field of vision, delivers around 400-600 megapixels of detail. The central foveal field is considerably less, perhaps no more than 10 megapixels. And only around 1 million fibres extend from retina to brain.

However, I don't think this is really what I'm thinking about. The visual system does a lot of processing along the way and at some point in the system an image becomes conscious. And of course, vision is more like a video stream than a static image.

What I am curious about is whether we know how detailed is a conscious image. While we might only discern 10 megapixels in terms of visual field at the retina's foveal region, and only 1 million fibres extend to the brain, I assume that the constant stream being generated produces a far more detailed conscious image. That is, the "resolution" of a conscious image must be greater than is physically derived at a singular moment at the retina.

If that is so (or even if it isn't I suppose) is it known how many neurons contribute to each pixel in a conscious image? I mean by this that as conscious images are formed over time, does a single neuron contribute to one pixel or many pixels? Do we know the ratio of neurons to pixels at the stages of image processing that most likely give rise to conscious experience of a scene?

Or is this not really a valid question because I don't understand enough about visual processing?

FactChecker · Sep 6, 2017

I think that is a good question, but the answer is very complicated. The mind fills in missing parts of what the eye sees -- right or wrong. Mental recognition of the pattern that the eye sends it is very complicated. If you are asking about that, then I don't believe we know enough to answer your question or even how to rigorously express an answer.

jim mcnamara · Sep 6, 2017

How about a measure of resolution, or visual acuity? What the fovea picks up - a small subsection of the visual panorama. Is that what you mean?

@FactChecker is close to the heart of the perception issues. Your brain does a lot of postprocessing of visual input to fill in images it interprets as belonging to the whole picture, and it can be fooled pretty easily. Optical illusions like trompe l'oeil, and the illusions stage magicians perform, come to mind.

This helps:
https://en.wikipedia.org/wiki/Visual_acuity. It discusses what you asked and all of the attendant factors. IMO.

Ygggdrasil · Sep 6, 2017

Measuring the resolution of an image by number of pixels is a relatively new phenomenon that has largely come about because of how digital cameras store information. For images not stored digitally (e.g. on film), it's unclear how you could describe the resolution in terms of megapixels. Traditionally, the spatial resolution of an imaging system would be measured by how close together two objects could be while still being distinguishable in the image. Features such as the numerical aperture/f-number of the imaging system help determine its spatial resolution. Ultimately, the spatial resolution of an optical imaging system is limited by the diffraction of light.
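
To put a rough number on that diffraction limit, here is a minimal sketch using the Rayleigh criterion, theta = 1.22 * wavelength / aperture. The 550 nm wavelength and 3 mm pupil diameter are illustrative assumptions, not values from this thread, and the function name is made up for the example.

```python
import math

def rayleigh_limit_arcmin(wavelength_m, aperture_m):
    """Rayleigh criterion: smallest resolvable angle, in arcminutes."""
    theta_rad = 1.22 * wavelength_m / aperture_m
    return math.degrees(theta_rad) * 60.0

# Assumed values: green light (550 nm) and a 3 mm daylight pupil.
limit = rayleigh_limit_arcmin(550e-9, 3e-3)
print(f"Diffraction limit: {limit:.2f} arcmin")
```

Under those assumptions the limit comes out a little under 1 arcmin. A wider pupil would lower the diffraction limit, though in a real eye optical aberrations then tend to dominate, so this is only an order-of-magnitude guide.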

Measuring the resolution of the human eye in megapixels presupposes the image information gets stored/interpreted in a pixelated, digital format. Is this true?

FactChecker · Sep 6, 2017

Some issues with the mental image of what a person sees depend a lot on how long he gets to study the image. It is well established that eye witness testimony of a brief criminal event is very unreliable. The mind fills in many details that are not really there. On the other hand, given time to study a picture and take mental notes of the details, the results are much more reliable.

Andy Resnick · Sep 6, 2017

Graeme M said:

I've often wondered at the "resolution" of the human visual system but it's not at all clear from what I've read whether this question even makes sense. As a sort of general position, many of the articles I've read suggest that the human eye, over the full field of vision, delivers around 400-600 megapixels of detail. The central foveal field is considerably less, perhaps no more than 10 megapixels. And only around 1 million fibres extend from retina to brain.

However, I don't think this is really what I'm thinking about. The visual system does a lot of processing along the way and at some point in the system an image becomes conscious. And of course, vision is more like a video stream than a static image.

What I am curious about is whether we know how detailed is a conscious image. While we might only discern 10 megapixels in terms of visual field at the retina's foveal region, and only 1 million fibres extend to the brain, I assume that the constant stream being generated produces a far more detailed conscious image. That is, the "resolution" of a conscious image must be greater than is physically derived at a singular moment at the retina.

If that is so (or even if it isn't I suppose) is it known how many neurons contribute to each pixel in a conscious image? I mean by this that as conscious images are formed over time, does a single neuron contribute to one pixel or many pixels? Do we know the ratio of neurons to pixels at the stages of image processing that most likely give rise to conscious experience of a scene?

Or is this not really a valid question because I don't understand enough about visual processing?

"perfect vision" corresponds to about 1 arcmin of resolution. However, as you correctly note, that doesn't begin to address visual acuity: it is known that the angular resolution of vision depends on the contrast between object and background, if the object is moving, if the object is 'blinking' (and the temporal characteristics of the blinks- duration, repetition rate, etc.), if the image is located at the fovea or periphery... And everything depends on wavelength and if you mean scotopic (dim light, so the rods) or photopic (bright light, the cones).

Your retina has about 7 layers of processing: local averaging, edge detection, movement tracking, temporal averaging, and more.

The best book I have read on this topic is "Basic Vision" https://www.amazon.com/dp/019957202X/?tag=pfamazon01-20

BillTre · Sep 6, 2017

Ygggdrasil said:

For images not stored digitally (e.g. on film), it's unclear how you could describe the resolution in terms of megapixels.

I have heard pixel based digital picture resolutions compared to the number of grains/area in film and negatives, mostly from photographers.

--------------------------
The human visual system functions differently from film and digital images.

There is processing of visual image information in the retina itself.
Changes vs. time will affect how this processing proceeds, so the movie-like aspect can be important.
A flash, or a sudden increasingly large dark area, in the peripheral visual field will more strongly attract attention than areas of constant illumination.

Different areas of the retina serve different functions in the internal reconstruction of the visual area (like a stage), the objects in that area, and movements between the various parts. Information concerning the depth (distance; derived from eye convergence, eye lens focusing, other visual cues) of objects in a scene could also be included.
Retinal areas surrounding the fovea have rods (B&W), which are better for low light conditions. Spatially, they may provide a lower-resolution regional awareness of objects in the area.

The fovea is constantly moving, going from focusing on a spot for a short period of time and then moving to another part of the scene (eye movement information would also be combined with the photo-receptor derived information). This provides more information, at a higher density, from particular areas under observation, unconsciously selected for you by the visual system. Many areas just get filled in based on their surroundings. There are optical illusions based on this.

Therefore, your observed scene could vary quite a bit in what you might be able to resolve.
While pictures or movies (digital or non-digital) store an even density of picture components (pixels or film grains) across the image, the internal human visual image (as normally used to explore an environment) is more like a built-up model, based mostly on visual inputs, but sometimes also on other senses (such as sensing eye/lens movements or orientation vs. gravity).

When you observe a visual field, you are probably looking at an internally constructed model of the objects and areas you are observing.
The inputs to this system are not evenly distributed across the retina. Most color receptors are found at the highest densities at the fovea. The fovea is aimed at areas of interest in a visual field to get detailed information of a particular part of the visual field.

With continued observation from different observation points, points of increasingly high resolution could be built up, exceeding what might otherwise seem like limits. Of course technology (such as microscopes/telescopes) has allowed us to further extend our "vision" to even higher resolutions (which is still going through your visual system).
In that sense, there are no limits to the resolution of the human visual system.

If you are just interested in the limits of direct observation under controlled conditions, then a simple psychophysical approach can determine how close together two points can be and still be resolved.

However, normal use of your visual system involves much more than just this.

Graeme M · Sep 6, 2017

Thank you for the interesting responses. I'll get back to my question in a moment. Thanks too for the book recommendation, Andy Resnick. I think it's very interesting to learn that there are several layers of "processing" in the retina - I'd love to know what that actually means given that this physical area should be relatively open to study. How do retinal cells process information in that way?

In terms of resolution or visual acuity, of which I admit to only a very sketchy understanding, this page seems to be referenced quite a bit.
http://www.clarkvision.com/articles/human-eye/

Here, the author proposes that typical visual acuity of 1.7 when measured via a line pair corresponds to 0.59 arc minute per line pair, and hence the "pixel" spacing works out to 0.3 arc-minute. He makes some interesting calculations in terms of the pixel detail needed for an image to reach the limits of human visual acuity.
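
The arithmetic behind that kind of estimate can be sketched in a few lines. A line pair needs at least two "pixels" (one light, one dark), so 0.59 arc-minute per line pair gives roughly 0.3 arc-minute per pixel. The 120 x 120 degree field of view below is an assumption chosen for illustration, and the function name is hypothetical.

```python
def pixels_across(field_deg, pixel_arcmin):
    """Number of 'pixels' spanning a field of view at a given angular spacing."""
    return field_deg * 60.0 / pixel_arcmin

# 0.3 arcmin per pixel is the spacing quoted above.
# The 120 x 120 degree field is an assumption made for this example.
side = pixels_across(120, 0.3)
megapixels = side * side / 1e6
print(f"{side:.0f} x {side:.0f} pixels = {megapixels:.0f} megapixels")
```

With these inputs the result falls inside the 400-600 megapixel range mentioned in the opening post. The figure treats the whole field as if it were sampled at foveal acuity, which is only approached as the eye scans a scene.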

Returning to my question, I think what I am getting at is best addressed in the comments above that talk about how our brains/minds construct scenes and fill in details etc. Now, without straying into philosophy, I am going to assume that when I see a scene, regardless of how much in-filling or construction is going on, what I experience must be a physical artefact.

Assuming the reference above is fairly accurate, it seems that there could be assumed to be a finite limit to the number of points or pixels in an experienced scene. At some fine limit of detail, two points resolve into one as far as vision goes. I am asking whether we know how many neurons are required to represent any of these points at the limit of acuity.

Put another way, when I look at a wall of uniform colour and brightness, I experience something different from a scene of detail, such as a wall of fine coloured dots. As we decrease the size and spacing of the dots, at some point the wall ceases to be experienced as separate dots and becomes visible as a solid colour (I assume!). Is there a difference in how many neurons represent the separate dots versus the solid colour? I don't see that there can be as the initial sensory perception seems to be largely the same (scattered cones responding to photons of x wavelength). Those signals are then passed through the visual cortex and eventually make it to consciousness, but is it a one to one relationship (I doubt that), or is the final "image" composed of more points (ie neurons) than the original sensory response passed from the retina (I imagine that it must).

Does that make sense? I suppose we run into the problem of not knowing what makes something conscious, but in terms of the visual cortex processing, is there a stage at which processing finishes, so to speak, where my question might apply?

russ_watters · Sep 6, 2017

jim mcnamara said:

How about a measure of resolution, or visual acuity?

That's what I was thinking: [to OP] try calculating what angle the letters on an eye chart subtend for 20/20 vision...
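
As a rough sketch of that calculation: on a standard Snellen chart the 20/20 letters subtend 5 arcmin overall, with each stroke or gap subtending 1 arcmin, at a viewing distance of about 6 m (20 ft). The function name below is made up for the example, and the 6 m distance is the usual metric convention rather than a value from this thread.

```python
import math

def subtended_size_mm(angle_arcmin, distance_m):
    """Physical size (mm) of an object subtending a given angle at a distance."""
    angle_rad = math.radians(angle_arcmin / 60.0)
    return 2.0 * distance_m * math.tan(angle_rad / 2.0) * 1000.0

letter_mm = subtended_size_mm(5.0, 6.0)   # whole letter: 5 arcmin
stroke_mm = subtended_size_mm(1.0, 6.0)   # single stroke or gap: 1 arcmin
print(f"letter height ~ {letter_mm:.2f} mm, stroke width ~ {stroke_mm:.2f} mm")
```

So a 20/20 letter at 6 m is a bit under 9 mm tall, and the detail that has to be resolved to read it is a bit under 2 mm wide.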

Drakkith · Sep 7, 2017

Graeme M said:

Put another way, when I look at a wall of uniform colour and brightness, I experience something different from a scene of detail, such as a wall of fine coloured dots. As we decrease the size and spacing of the dots, at some point the wall ceases to be experienced as separate dots and becomes visible as a solid colour (I assume!). Is there a difference in how many neurons represent the separate dots versus the solid colour? I don't see that there can be as the initial sensory perception seems to be largely the same (scattered cones responding to photons of x wavelength). Those signals are then passed through the visual cortex and eventually make it to consciousness, but is it a one to one relationship (I doubt that), or is the final "image" composed of more points (ie neurons) than the original sensory response passed from the retina (I imagine that it must).

No, retinal cells are not linked to the brain on a one-for-one basis. There are far more receptors in the retina than there are "paths" in the optic nerve. Receptors are linked together in different ways and the signals from multiple receptors are usually added together and processed in some fashion before ever reaching the optic nerve. The receptors can also respond differently, with some being turned "on" by light and some being turned "off" instead. Interestingly, the different layers of this processing chain appear to perform things like edge detection and shape detection, among others.
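
A toy illustration of how adding and subtracting receptor signals can act as edge detection is sketched below. It is not a model of actual retinal circuitry: the one-dimensional row of receptors, the three-sample center-minus-surround rule, and all names are assumptions made for the example.

```python
def center_surround(signal):
    """Each output = center value minus the mean of its two neighbours."""
    out = []
    for i in range(1, len(signal) - 1):
        surround = (signal[i - 1] + signal[i + 1]) / 2.0
        out.append(signal[i] - surround)
    return out

# A row of receptor responses: a dark region, then a bright region.
receptors = [1.0] * 5 + [5.0] * 5
response = center_surround(receptors)

on_cells = [max(r, 0.0) for r in response]    # excited by a brighter center
off_cells = [max(-r, 0.0) for r in response]  # excited by a darker center
print(response)
```

The response is zero everywhere the input is uniform and non-zero only at the two positions on either side of the step, one driving the "off" channel and one driving the "on" channel. In this toy picture, a uniform wall produces almost no output at all, while an edge produces a strong one.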

Once the signals from the receptors make it through the beginning of the optic nerve, they end up being processed again at various points up the chain and the signals eventually spread out to a great many neurons in the form of a conscious image. Each stage does different things. For example, the lateral geniculate nucleus (LGN) performs ranging and velocity detection of major objects in the visual field before up-channeling the visual signals to the other parts of the visual system. Then these perform more processing to start to piece together all the different types of information into a coherent "global view", which is then passed on.

See the following image: https://upload.wikimedia.org/wikipedia/commons/c/cd/Lisa_analysis.png

Note that the output of the retina is very, very low quality compared to the input image. Instead of a single raw image you have many different lower quality "images" which give the rest of the visual system basic information to work with. I put the word "image" in quotes for a reason. It's easy to think that the output of the retina is an actual image, when it is very likely that it is more like streams of data which are all processed in parallel and added together to form the image in your head that you "see". Given all of the processing and compression that takes place, it seems almost a miracle that we can see anything at all!

If you were to think of this process in terms of digital imaging, then imagine that you're trying to film a scene at 60 FPS with a 100 megapixel camera, but the cable and equipment transmitting the image to your computer can only support a fraction of that amount of data. So the camera has to do a lot of pre-processing to compress the data without losing valuable information. Now, this transmitted image isn't a raw image, it may not even be an "image" at all. The computer has to do its own processing of the data to recover the image from the incoming data, correct for artifacts introduced by the compression process, store the image in memory, and keep track of changes in the image and all of the other things that need to happen.
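
To give the analogy some scale, here is a back-of-the-envelope sketch. The frame rate and pixel count come from the paragraph above and the 1 million optic nerve fibres from earlier in the thread; the 3 bytes per pixel and the figure of roughly 126 million photoreceptors per eye are assumptions (the latter a commonly cited textbook estimate).

```python
# Camera side of the analogy (frame rate and pixel count are from the post above;
# 3 bytes per pixel is an assumption for uncompressed 8-bit RGB).
pixels = 100e6
fps = 60
bytes_per_pixel = 3
raw_rate_gb_s = pixels * fps * bytes_per_pixel / 1e9

# Eye side: ~1 million optic nerve fibres is the figure quoted in the thread;
# ~126 million photoreceptors (rods plus cones) is a commonly cited textbook
# estimate, used here as an assumption.
photoreceptors = 126e6
optic_nerve_fibres = 1e6
convergence = photoreceptors / optic_nerve_fibres

print(f"raw camera stream: {raw_rate_gb_s:.0f} GB/s")
print(f"receptors per optic nerve fibre: about {convergence:.0f}")
```

On those assumptions, each optic nerve fibre carries a summary of something on the order of a hundred receptors on average, which is why heavy pre-processing in the retina is unavoidable. The average hides a large variation: convergence is thought to be far lower in the fovea than in the periphery.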

See the following links for more info:
https://en.wikipedia.org/wiki/Retina#Function
https://en.wikipedia.org/wiki/Visual_system

.Scott · Sep 7, 2017

There are two elements of image processing that are important in answering your question. The first is that the use of a priori information is fair game in image processing - and the human brain certainly employs this. The second is a form of integration - either spatially or over time.

Taken together, they allow the resolution of some image details well below the pixel level. For example, if you move a sword across the view of a camera, using certain assumptions, I can process that image to determine the position of the sword to high precision - much greater than my pixel resolution. Those assumptions, my a priori information, will include: the shape of the sword is not significantly changing; there are limits to the amount of acceleration in the velocity of that sword; the background image is also not changing or changing relatively slowly. I can combine images across time to deduce a precise shape of the sword - then, with each frame of imagery, I can spatially integrate the entire sword image to precisely determine its location.
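
A minimal sketch of the spatial-integration part of that idea: a blurred object sampled on whole-pixel positions can still be located to a small fraction of a pixel by taking an intensity-weighted centroid. The Gaussian profile, its width, and the function names are assumptions for illustration; the a priori information here is that the object is a single symmetric blob on a dark background.

```python
import math

def sample_blob(true_center, n_pixels=21, width=2.0):
    """Sample a blurred (Gaussian) object on an integer pixel grid."""
    return [math.exp(-0.5 * ((x - true_center) / width) ** 2)
            for x in range(n_pixels)]

def centroid(pixels):
    """Intensity-weighted mean position: spatial integration over the object."""
    total = sum(pixels)
    return sum(x * v for x, v in enumerate(pixels)) / total

true_center = 10.37          # deliberately between two pixel centers
estimate = centroid(sample_blob(true_center))
print(f"true {true_center:.2f}, estimated {estimate:.3f}")
```

Even though no single pixel "knows" the position to better than one pixel, pooling all of them recovers it to within a small fraction of a pixel in this noise-free case. With noisy frames, averaging estimates over time would play the role of the temporal integration described above.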

That the mammalian visual cortex processes edge and motion at a very early stage has been observed since the 1960's. Cats have been a favorite subject. For example: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1359523/pdf/jphysiol01247-0121.pdf

From there, this edge and motion information is passed on to other parts of the brain. Further processing produces both obvious and not-so-obvious results. It allows us to consciously recognize and appreciate the form and 3D location of the objects we see. Less obvious is how visual information can be used without direct conscious recognition. https://en.wikipedia.org/wiki/Blindsight

But the key is that we don't consciously see pixels. What we consciously see are objects in our field of vision - along with their relative 3D locations and an experience-based recognition of what those objects are.

Graeme M · Nov 6, 2018

Briefly coming back to this one and I guess the answer is that we don't know. It seems my thought that human visual images are composed of some kind of points or pixels was incorrect. To me it seemed that if the information collected at the retina was forwarded through the visual processing system and eventually an image becomes "conscious", then it must be that the image is simply some arrangement of neurons and there must be some sort of correlation between the image and those neurons at a level of "resolution". When I see things, they appear to be clear and lack graininess, though they can be blurry when not in focus. My question was more motivated by the problem of how images, which are really collections of neurons, can not be grainy. That is, at the level of the retina, images are really specific points of activation - cells - and that remains the case right through the system. I guess though that's kind of assuming an observer inside us seeing the image which isn't the case. So the question doesn't really make sense, and as .Scott points out above all we really see are objects that appear to be solid, clear and not grainy.

BillTre · Nov 7, 2018

I think of my experiential visual experience in a fundamentally different way.

You (and I) are now visually observing the world through a mature nervous system that has many levels of processing between the activation of photoreceptors and the elevation of a visual stimulus to conscious awareness.
Many visual inputs from the eye are already processed to convey information about lines, contrasts, and movement (probably among other things).
At this point in your development, your nervous system has become adept (through a combination of innate developmental processes combined with experience of the outside world) at observing and tracking the movements and changing properties of various objects as well as making sense of the stage (background and non-moving objects) upon which they move.
Rather than emphasizing (at a conscious level) a pixel-based representation of what the sensory world is telling us, I think of it as placing objects (with their associated properties) on a virtual stage.
Objects are non-pixelated because they are not being simply recreated (internally) direct from sensory input. Rather they are virtual objects called up from memory, continuously updated in their properties, location, movement etc, and on a virtual stage.
Many visual illusions (for example here), where your perception can switch between seeing two different objects, play on this object recall function.

.Scott · Nov 7, 2018

Graeme M said:

It seems my thought that human visual images are composed of some kind of points or pixels was incorrect.

The purpose behind your visual system is to give you information about the world. So if you look at a fuzzy photo of a tree, you see:
1) The tree - to the extent you can make it out;
2) The fact that it is a photo; and
3) The fact that the photo is not in focus.

But what if there was a hole in your vision? Would you see that?
Well there is such a hole - and you don't see it. If you did, it would be a distraction. It would be like being constantly aware of the socks you are wearing. You can feel them if you take deliberate notice - but not normally.

https://en.wikipedia.org/wiki/Blind_spot_(vision)

The brain will adapt to many visual distortions to present a sane view of the world to your conscious self. If the blind spot isn't impressive enough, read up on the work of Stratton:

https://en.wikipedia.org/wiki/George_M._Stratton#Work

Graeme M · Nov 8, 2018

BillTre said:

I think of my experiential visual experience in a fundamentally different way.

You (and I) are now visually observing the world through a mature nervous system that has many levels of processing between the activation of photoreceptors and the elevation of a visual stimulus to conscious awareness.
Many visual inputs from the eye are already processed to convey information about lines, contrasts, and movement (probably among other things).
At this point in your development, your nervous system has become adept (through a combination of innate developmental processes combined with experience of the outside world) at observing and tracking the movements and changing properties of various objects as well as making sense of the stage (background and non-moving objects) upon which they move.
Rather than emphasizing (at a conscious level) a pixel-based representation of what the sensory world is telling us, I think of it as placing objects (with their associated properties) on a virtual stage.
Objects are non-pixelated because they are not being simply recreated (internally) direct from sensory input. Rather they are virtual objects called up from memory, continuously updated in their properties, location, movement etc, and on a virtual stage.
Many visual illusions (for example here), where your perception can switch between seeing two different objects, play on this object recall function.

Bill, that's an interesting comment. I take your point that visual imagery isn't some kind of composition using points of light but rather a virtual object constantly being refined from both memory and perceptual input. That's kind of what I meant in my comment about my conceptual model being incorrect - I am applying some kind of digital imaging concept to a process that isn't even really generating actual images.

FactChecker · Nov 8, 2018

In addition to the process of interpreting a still picture, there are things that determine motion. The "Waterfall Illusion" demonstrated that. (see https://en.wikipedia.org/wiki/Motion_aftereffect )

256bits · Nov 8, 2018

Graeme M said:

Bill, that's an interesting comment. I take your point that visual imagery isn't some kind of composition using points of light but rather a virtual object constantly being refined from both memory and perceptual input. That's kind of what I meant in my comment about my conceptual model being incorrect - I am applying some kind of digital imaging concept to a process that isn't even really generating actual images.

What about two quite similar scenes, where one has N points randomly placed and the other has N-1 points in the same positions as the first ( with one missing )? No object is being defined or can be discerned from the random points, but after staring at the two pictures, one can spot the missing point in the second picture.
( An adaptation of this is found sometimes in the newspaper where they do have two scenes and one is asked to point out seven differences - maybe a flower petal missing, or a person's tie is shorter. )
One could say the scene is pixelated, the brain not using edge detection and all that other stuff, and nothing one can compare to from memory of a past perceived object with color and form.
Is that not different from recognizing familiar objects such as a cup, a face, a puppy in the general sense and then homing in on a particular one, such as his cup, your face, my puppy, with the higher image processing that the brain does?

BillTre · Nov 8, 2018

256bits said:

What about two quite similar scenes, where one has N points randomly placed and the other has N-1 points in the same positions as the first ( with one missing )? No object is being defined or can be discerned from the random points, but after staring at the two pictures, one can spot the missing point in the second picture.

If the N or N+1 points are not enough to define or discern an object, then:

there is not enough light to define or discern the object in question.
or, stated another way, there are not enough points of retinal data to define or discern the object in question (that is, to internally recall an object and its properties). Since the retina can respond to single photons of light, this makes me think it is one photon per point of illumination.
Turning up the lights or possibly watching longer (to accumulate more data points) might resolve the issue (identifying the object).

256bits said:

One could say the scene is pixelated, the brain not using edge detection and all that other stuff, and nothing one can compare to from memory of a past perceived object with color and form.

There are certain conditions in which you can manipulate the inputs in order to achieve a quantized light input to the visual system.
These would be like a visual psychophysics experiment (which I did some of, as an undergrad).
These are usually done in situations involving tightly controlled light.
Here is an example of a study, on humans, involved in detecting single photons of light.
The experimental subjects say it's not like seeing a dot of light:

“The most amazing thing is that it’s not like seeing light. It’s almost a feeling, at the threshold of imagination,” says Alipasha Vaziri, a physicist at the Rockefeller University in New York City, who led the work and tried out the experience himself.

The retina has some retinal ganglion cells (the cells that send signals to the brain) that receive sensory inputs from photoreceptors over large areas of the retina.
These cells are thought to respond to low light inputs from these large areas.
From the conscious side of this interaction, it might not make sense to expect a point of light.

256bits · Nov 9, 2018

BillTre said:

If the N or N+1 points are not enough to define or discern an object, then:

Good reply.
The photonic response of the retina is something to be considered for the brain to be trying to make sense of a scene under low lighting conditions.

I was originally thinking of a 4th choice.
4. the random points of light in the scene in their spatial extent do not form an object.
An example would be gazing at the night sky with the stars being the points of light. Studying a particular region, within it, one should become familiar as to a general spatial density of stars, their locations relative to one another, relative brightness. A good stargazer ( making an assumption ) should be able to pick out another point of light, such as Venus or Mars in particular, that moves into the scene.
Pattern recognition? there is no pattern for randomly placed points.
Edge detection? there is no edge to the scene
shape detection? no shape - all is black ( as black can be under presently lighted up cities and areas )
??
What is the brain and the visual system doing? - something similar to imaginary images that one sees in clouds and on muddy floors. - but not all people see the same imaginary image. Observing a truly random scene such as the continuously changing off-air static on a CRT television( which some folks will sadly never appreciate ), one feels that the scene is just about to form an image but does not.

BillTre · Nov 9, 2018

256bits said:

4. the random points of light in the scene in their spatial extent do not form an object.
An example would be gazing at the night sky with the stars being the points of light. Studying a particular region, within it, one should become familiar as to a general spatial density of stars, their locations relative to one another, relative brightness. A good stargazer ( making an assumption ) should be able to pick out another point of light, such as Venus or Mars in particular, that moves into the scene.
Pattern recognition? there is no pattern for randomly placed points.

I would argue that looking at a star-field in some area of the sky is not really like looking at random points of light. While they may be dusted across the sky in a kind of random-looking spatial manner, they are reproducibly found in the same positions when the area is looked at again. They are therefore not random in a temporal sense (individual points becoming landmarks) and could well be recognized as a pattern and perhaps as a group.

256bits said:

Edge detection? there is no edge to the scene
shape detection? no shape - all is black ( as black can be under presently lighted up cities and areas )
??

Points and similar things in a "field" can be used to define things unseen or non-existent. There are optical illusions based on this, where some object is defined by the lack of dots, stars, or whatever that partially defines a shape. The brain fills in the rest.
There is a real drive for the brain to put things together in patterns. It is used in design and art. Groupings are frequently implied.

256bits said:

What is the brain and the visual system doing? - something similar to imaginary images that one sees in clouds and on muddy floors. - but not all people see the same imaginary image.

Why not, in situations where there is not enough light to accurately discern what you are looking at?
The sensory system will be wanting to try different objects for some kind of fit to some perceptual pattern.
It might switch around a few times before settling on some pattern.

256bits · Nov 9, 2018

BillTre said:

Why not

I would expect a priori experience and cultural upbringing to shape the imaginary images seen in a scene.

Thanks for the replies and information.
Much appreciated.

Graeme M · Nov 9, 2018

BillTre, thanks for the link to the paper about humans detecting single photons. In a way that tackles my original question. If a single photon is detected, then I presume only one cell in the retina is affected, so its signal will be sent to a ganglion cell that in turn signals up through the visual processing chain. Apparently, it IS detected at some level of statistical significance. Sooo... once it makes it to the point of consciousness, I wonder how many neurons are involved? I realize no-one knows when something becomes conscious, but at the highest point in the processing hierarchy I wonder how many cells are involved in representing a single photon? Or even more interesting, what happens to the signal from that one photon as it progresses through the visual system?

Presumably too, the experience of being aware of a single photon is also written to episodic memory (else the subjects wouldn't be able to report the experience), so now we have more neurons representing the event itself. I wonder if the memory of the experience uses more neurons than the actual event (presuming of course the subjects are reliably aware of the event and not just the memory of it)?
