
What does it mean and how to convert picture to spatial frequency

  1. Mar 4, 2012 #1
    (This may sound like home work but it's not)

    What does it mean to display a picture in its frequency domain? A solid square box becomes a star shape?


    As per the title, thanks. Firstly, what does it mean to show a picture in its frequency domain? And secondly, how do you do it?

    I am looking at this webpage
    which shows a white solid square is actually star shaped in frequency domain. I don't understand how you go from the white square to the star below it.

    What does it mean to display a picture in its frequency domain? Asked differently, what does it mean to apply Fourier Transformation to a picture?

    Suppose I have an image. We can apply a Fourier transform to it and get the spatial frequencies of this image.
    What exactly does spatial frequency mean?
    What do you actually do to the pixels to convert them to frequency?

    This is how I understand it. Please correct me:

    An image is made up by pixels.
    If we lay the pixels out linearly, i.e. take the first line of pixels, then the next line immediately after it, then the third line and so on, we lay the 2D image out as a serial array of pixels, starting from the upper left-hand corner of the picture and ending at the lower right-hand corner.
    Each of these pixels has an intensity (brightness).
    I can plot them on a graph where y-axis is intensity of pixel, and x-axis is the number of pixel.
    So for example pixel 1, intensity 100; pixel 2, intensity 10; pixel 3, intensity 40 and so on. Just plot all of them on brightness vs pixel graph.
    The graph is going to be a very complicated wave.

    Fourier Transformation will let me break down this complicated wave to the sum of many many sine waves of various frequencies, amplitudes and phases. It might take 2000 different sine waves to make them add up to give the combined effect of the complicated wave, but that's what FT will do for me. I am happy with the maths.

    Is this what spatial frequency is about?
    I still don't get how the webpage manages to change the box to star.

  3. Mar 4, 2012 #2


    Staff: Mentor

    The second part is easier to answer: simply take the discrete Fourier transform of the image using your favorite FFT package.
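
    For anyone wanting to see what such a package computes, here is a minimal sketch of a direct 2D discrete Fourier transform in plain Python. It is far slower than a real FFT and only shows the arithmetic. The 16x16 image, the 4x4 white square and the name `dft2` are illustrative assumptions, not something taken from the thread.

```python
import cmath

def dft2(image):
    """Direct 2D DFT of a square image given as a list of rows."""
    n = len(image)
    spectrum = [[0j] * n for _ in range(n)]
    for u in range(n):              # vertical frequency index
        for v in range(n):          # horizontal frequency index
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2 * cmath.pi * (u * y + v * x) / n
                    total += image[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum

# Assumed test picture: black background with a 4x4 white square.
N = 16
image = [[1.0 if 6 <= y <= 9 and 6 <= x <= 9 else 0.0 for x in range(N)]
         for y in range(N)]

magnitude = [[abs(value) for value in row] for row in dft2(image)]
dc = magnitude[0][0]        # zero-frequency term: the sum of all pixel values
on_axis = magnitude[0][2]   # a frequency that varies along x only
off_axis = magnitude[2][2]  # a frequency that varies along both x and y
```

    With these numbers `dc` equals the total brightness of the image, and `on_axis` comes out larger than `off_axis`. That is a hint at why the spectrum of a square tends to show arms along the two frequency axes.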

    The first part is a little more complicated. It is a subject that you can spend an entire course on, so I think the best thing is to refer you to the relevant pages from my favorite online course. You have to register, but it is free:
  4. Mar 5, 2012 #3
    Hi DaleSpam, thanks a lot for the reply.
    I guess when I asked "how do you do it", I meant "if you use a software to do it, how would that software do it? ie. what would be its step by step algorithm?".

    Thanks a lot for the links. They have very nice pictures. They kind of half make sense to me at the moment. I just need to watch and re-read them a few times to be able to verbalise it in lay terms. I reckon it's a straightforward concept; one just needs to get the key idea of what's happening, and then one should be able to explain it in a few very simple sentences.
  5. Mar 5, 2012 #4
    The Fourier transform is very helpful in quantitative image analysis.

    In real space (or direct space), blurring by a lens turns the image into the convolution of the original image with the blur function (point spread function). In frequency space, the blurred image is the product of the FT of the original image and the FT of the point spread function.
    Also, if you know the inverse of the point spread function you can perform a "deconvolution", i.e. remove the blurring.
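
    A small 1D check of this statement, as a sketch only: the row of pixels, the blur kernel `psf` and the helper names are all made up for illustration, and the convolution is circular because that is what the discrete transform assumes.

```python
import cmath

def dft(signal):
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * x / n) for k in range(n)) / n
            for x in range(n)]

def circular_convolve(a, b):
    n = len(a)
    return [sum(a[m] * b[(x - m) % n] for m in range(n)) for x in range(n)]

image_row = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # one row through a bright box
psf = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]         # hypothetical blur, centred on index 0

# Blur in real space, then compare with the product of the two transforms.
blurred = circular_convolve(image_row, psf)
product = [f * g for f, g in zip(dft(image_row), dft(psf))]
max_error = max(abs(p - q) for p, q in zip(dft(blurred), product))

# Deconvolution: divide by the transform of the blur and transform back.
restored = [value.real for value in
            idft([b / p for b, p in zip(dft(blurred), dft(psf))])]
```

    The division works here only because the transform of this particular `psf` is never zero. With noise or a blur that wipes out some frequencies, plain division is not enough.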

  6. Mar 5, 2012 #5
    Just to set the scene, Paul Bourke's webpage shows this picture as the original signal:

    Viewed in intensity versus coordinates mode (spatial domain) is like this:

    Viewed in frequency domain looks like these:
    So that's where I'm coming from. I can't understand the spike picture, how that is frequency domain and what it means when referring back to the original box picture. After some thoughts during dinner and further look up about line spectra for acoustics for better appreciation of signals in simpler terms,
    I hoped I could but sadly still can't make any sense of why the solid white box in spatial domain becomes a star shape in frequency domain.

    Paul Bourke's pictures show 2D signals which for my own requirement, I will interpret as corresponding to a digital image (made of pixels and intensities; because I'm self studying digital image processing). In any case it's easier to look at it in one plane first.

    The picture in the top upper right (of his webpage) is a spatial representation of the solid white square to its left.
    Z axis is intensity of pixel, x and y axes correspond to (x,y) coordinates of each pixel.
    In other words, it's showing the intensity of individual pixels with respect to their coordinates.

    How does that top upper right picture of his webpage (one that looks like a hill with a plateau) become star shaped when viewed in frequency domain?
    Since that hill picture is symmetrical in x and y axes, I figured it'll be easier to just look at one plane first, then the other plane will sort itself out.
    So I only look at the xz plane of the hill for now. It's like looking at the 3D hill picture side-on. I have flat, flat, then the block rises up, then it comes down to flat, flat again. That's looking at the xz plane of the spatial domain.
    How does that plateau hill in spatial domain, become a spike in frequency domain? This is my question, and I can't get past this stumbling block :(
    So looking at side on view of frequency domain (ie. frequency domain's xz plane), I get this 2D graph: ______|_______ (basically side on view of the spike)

    Can someone label the x & z axes for the 3D spike picture for me please? (z axis is the vertical height of that spike)

    Obviously in frequency domain, x axis is frequency (and so is y axis). But frequency of what? What does "frequency" mean on the original white square box picture?

    I mean the spike picture tells me we have very few of low frequency, lots of middle frequency, then again very few of high frequency.
    What is it about the white square box picture that follows this pattern of: few of something, lots of something, few of something again? and is not with respect to coordinates.

    Lastly, what's the unit for z axis in the spike picture? Amplitude (as in brightness of pixel)? Change of amplitude?... utterly confused.


    Hope my question is clear: can someone label the x and z axes of the spike picture for me please? and explain what frequency means when we look at the flat white solid box picture. Assuming the flat white box picture is a photograph made of pixels with different brightnesses and have some noise in the picture.
    Last edited: Mar 5, 2012
  7. Mar 5, 2012 #6
    We can draw a sine wave alternating between a positive peak and a negative peak in time. Its frequency is found from the time period, the length of one oscillation.

    Similarly, we can draw a pattern of bright and dark bands whose brightness slowly increases and decreases along a spatial dimension (x or y or z, but not time). This pattern has a spatial period just like a sine wave has a period in time. Note that one is in time and the other in space. Also note that, unlike a sine wave in time, a spatial pattern is generally two-dimensional.

    Reciprocal of a period is frequency. If you understand frequency of a temporal Sine wave, you should have little difficulty understanding spatial frequency.
  8. Mar 6, 2012 #7
    If you have a broomstick, ruler and a black marker handy, start measuring from one end and place a mark every 6 inches on the stick. The one dimensional spatial frequency of the marks is 2 per foot.
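
    The broomstick can be checked numerically. This is a sketch under assumptions of my own: a 48 inch stick sampled once per inch, with a value of 1 where there is a mark and 0 elsewhere.

```python
import cmath

LENGTH_IN = 48     # assumed stick length in inches
SPACING_IN = 6     # a mark every 6 inches, as in the post

stick = [1.0 if x % SPACING_IN == 0 else 0.0 for x in range(LENGTH_IN)]

def dft(signal):
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

amplitude = [abs(value) for value in dft(stick)]

# Bin k means "k cycles along the whole stick"; find the lowest non-zero
# frequency that actually carries any amplitude.
fundamental_bin = next(k for k in range(1, LENGTH_IN) if amplitude[k] > 1e-6)
cycles_per_foot = fundamental_bin * 12 / LENGTH_IN
```

    The lowest frequency present is 8 cycles along the 4 foot stick, i.e. 2 per foot, the same number you get by counting marks.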

    It looks to me like the z axis is the amplitude of each frequency component. A continuum of possible frequency components is laid out along the flat bottom in the x and y directions, with higher frequencies toward the edge and the lowest in the center (with DC at the very center point). As you add comparatively more amplitude of high frequency components, the spatial form of the inverse transformed image becomes sharper or more spiky.
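
    To make that labelling concrete, here is a rough 1D sketch: the side-on "flat, block, flat" profile is transformed and the result is rotated so that DC sits in the middle, as in the plot being discussed. The 16 sample profile and the variable names are assumptions for illustration.

```python
import cmath

def dft(signal):
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

N = 16
profile = [1.0 if 6 <= x <= 9 else 0.0 for x in range(N)]   # flat, block, flat

amplitude = [abs(value) for value in dft(profile)]

# The raw transform puts DC at index 0. Rotating by half the length moves it
# to the centre, with negative frequencies on the left and positive on the right.
centred = amplitude[N // 2:] + amplitude[:N // 2]
centre = N // 2
peak_position = centred.index(max(centred))
```

    The tallest value sits at `centre` (the DC term, equal to the summed brightness of the profile), and the amplitudes fall away symmetrically on both sides of it.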
    Last edited: Mar 6, 2012
  9. Mar 6, 2012 #8


    Staff: Mentor

    This is the quintessential FFT algorithm, but I really don't think this is going to be useful information for you.
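
    One widely taught FFT is the radix-2 Cooley-Tukey scheme. Whether that is the algorithm meant in this post is an assumption on my part. The sketch below shows the idea for lengths that are powers of two, and a 2D version built by transforming the rows and then the columns. The function names are illustrative.

```python
import cmath

def fft(signal):
    """Recursive radix-2 FFT; len(signal) must be a power of two."""
    n = len(signal)
    if n == 1:
        return list(signal)
    even = fft(signal[0::2])    # transform of the even-indexed samples
    odd = fft(signal[1::2])     # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft2(image):
    """2D transform: 1D FFT of every row, then of every column."""
    rows = [fft(row) for row in image]
    cols = [fft(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]   # transpose back to [u][v]

# Assumed example: 8x8 black image with a 4x4 white square.
N = 8
image = [[1.0 if 2 <= y <= 5 and 2 <= x <= 5 else 0.0 for x in range(N)]
         for y in range(N)]
spectrum = fft2(image)
```

    The result is the same set of numbers a direct double sum over all pixels would give, only computed with far fewer multiplications.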
  10. Mar 6, 2012 #9


    Staff: Mentor

    We cannot label the frequency domain until you label the spatial domain. Then the label for the z axis in the frequency domain is the same as the label for the z axis in the spatial domain. And the label for the x axis in the frequency domain is the inverse (1/x) of the label in the spatial domain.

    On the second link in my first reply there is an interactive tool that will help you understand what spatial frequency means. As you move the point around, each "wave image" is a single frequency image. The FFT represents the original image as a weighted sum of these "wave images".
  11. Mar 7, 2012 #10
    You seem to think that frequency is always limited to oscillations as function of time. That is not the case.

    Fourier analysis decomposes a periodic function into a superposition of sin and cos functions (or complex exponentials, if you are more comfortable with that).

    In the case of images, the function is a function of position (x, y) instead of time, so Fourier analysis shows how the image is composed of sin and cos functions as function of spatial frequency, i.e. oscillating along the x and y directions. So the x and y axes in the "spike picture" can be labelled [itex] \omega_x[/itex] and [itex] \omega_y[/itex]
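
    Written out for an N x N image f(x, y), and assuming the common convention that puts the 1/N^2 factor on the inverse transform (the thread does not fix a convention), the decomposition reads:

```latex
F(u, v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-i(\omega_x x + \omega_y y)},
\qquad \omega_x = \frac{2\pi u}{N}, \quad \omega_y = \frac{2\pi v}{N}

f(x, y) = \frac{1}{N^2} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} F(u, v)\, e^{+i(\omega_x x + \omega_y y)}
```

    Here u and v count whole cycles across the image, so the angular spatial frequencies come out in radians per pixel. The height of the "spike picture" at (u, v) would then be |F(u, v)|.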
  12. Mar 7, 2012 #11
    Thanks guys. Really appreciate all the input.

    Suppose for the original solid white box picture, the X-axis is labelled "X coordinate of pixel" and the Y-axis is labelled "Y coordinate of pixel". The Z axis is "brightness of pixel at coordinate (x,y)".

    @ DaleSpam: Thanks for the links. I had a play with it but still don't understand.
    In frequency domain (of the solid box picture), the X and Y axes become "inverse of X coordinate" and "inverse of Y coordinate"? and Z remains the same ie. intensity??... What does inverse of x part of (x,y) coordinate even mean?

    I thought about it like this: suppose the white solid box has intensity A, and the noises have intensities between say a1 to a99
    Obviously that solid white box represents an area of pixels all at intensity A (+/- tiny fluctuations due to noise).

    Does that mean the spike diagram, looked at from the side, is saying "the intensity 'per x coordinate of each pixel' is initially close to 0, then comes the spike, which is 'A' units tall, and after that the intensity 'per x coordinate of each pixel' comes down to 0 again"? What does this even mean?

    @ M Quack: what is ω? is it like angular velocity of a pixel?

    I appreciate frequency does not always have to be against time ie. doesn't have to be cycles / second, can be cycles / pixel. However I don't quite understand you by "the function is a function of position (x, y)". Let's say we pick the pixel (0,0) ie. the top left hand corner of the picture. And we pick another pixel (50,50) ie. say it corresponds to the top left hand corner of the white box. And pick a third pixel (80,80) ie. say it corresponds to the dead centre of that white solid box.
    Spike diagram's top left hand corner would be saying, "at coordinate (0,0), the number of oscillations is nearly 0". Oscillations of what? Of a pixel? How does a pixel even oscillate? How does a pixel at coordinate (0,0) with a fixed intensity oscillate?
    Or take the actual spike itself in the spike diagram. Is it saying "at one particular coordinate, we are getting great oscillations, but all around that coordinate, we are getting very little oscillations"? What does this even mean when referring back to the original solid white box picture? Where in the original box image do we see one thing that oscillates hugely and everything else hardly oscillates? At first I thought okay, something obvious about that white box picture is clearly you have lots of pixels giving you nearly equal amplitude (or intensity) of say magnitude 'A' at the white box. But then I look around the white box and I see even more pixels having roughly equal amplitudes (intensities) of close to 0 all around the box. So if I really draw the spike diagram based on how many pixels give me intensities near A or intensities near 0, then the spike really should be talking about all the background noise, not the white box. If you know what I mean.

    I have pixels with coordinates (x,y) and brightness intensities in Z axis. How do I find its frequency in terms of "oscillation along x direction"? What does it even mean to look at a photograph full of pixels of different intensities and ask for "oscillation along x direction"?

    Sorry for all these basic questions. I must be asking the obvious but it just doesn't make sense to me what the spike means when referring back to the flat solid white box picture (Assume that white box picture represents a set of pixels of highest and nearly equal intensities in the middle (the square box) and low low intensities all around. X axis is x-coordinate of each pixel, y axis is y coordinate of each pixel, z axis is intensity of each pixel).

    @ PhilDSP

    I think your broomstick analogy makes sense to me.

    I use metric system, so just converting your units to cm. You are giving me an example like this:
    I get a broom and make a mark every 5cm. Then the one dimensional spatial frequency of the marks is 2 per 10cm (in other words, 2 marks every 10cm).
    I am describing how often the same mark occurs as I walk along the x-axis (or y axis).

    Going back to my white box picture and spike picture. Whitebox picture is easy: X, Y is just coordinate of pixel. Z is intensity of each pixel. That's easy.

    The spike picture I'd like to label the axes like this:

    X-axis is a spectrum of brightness intensity. Ie. along the axis we have for example "-1000 units of brightness (ie. very dark)", "-900", -800,.. , -100, 0, +100, +200, .... +1000 (ie. very bright).
    Y axis the same.
    The unit for the X axis (and Y axis) is brightness, for example Hounsfield units or whatever brightness unit, but it's a spectrum of different brightnesses.

    The Z axis is saying: "look at this particular brightness indicated by one mark on the X axis (for example where the x axis reads '50 units of brightness'); how many of the pixels from the original white box picture have this brightness?"
    Suppose the white solid box is of brightness 50 units, and suppose that white solid box is made of 100 pixels, then the spike would be 100 units tall in the Z direction. The tip of the spike would therefore be (50,50,100).

    Am I right so far?

    What then does not make sense to me is this: how about noise? We have so much noise surrounding the white solid box. Shouldn't we have a few ultra high spikes that correspond to brightness intensity of those noises? But the picture we have here is one single big spike. What I mean is, sure I get a big spike for the white solid box, but what happened to my one single even taller spike for all this sea of low intensity signals surrounding the box?
    Last edited: Mar 7, 2012
  13. Mar 7, 2012 #12


    Staff: Mentor

    So, suppose in the image domain your axes go from pixel 0 to pixel 100 in steps of 1 pixel. Then in the frequency domain the axes would go from -0.5/pixel to 0.5/pixel in steps of 0.01/pixel. The number refers to the number of cycles per pixel. So, if the wavelength of a given wave image is 5 pixels, then its location in the frequency domain would be at 0.2/pixel.
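
    A quick numerical check of that labelling, as a sketch: a row of exactly 100 pixels holding a cosine with a 5 pixel wavelength. The helper `frequency_of_bin` is an illustrative name of mine; it mirrors the usual convention in which the upper half of the bins stands for negative frequencies.

```python
import cmath
import math

N = 100            # assumed number of pixels in the row
WAVELENGTH = 5     # pixels per cycle, as in the example above

row = [math.cos(2 * math.pi * x / WAVELENGTH) for x in range(N)]

def dft(signal):
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def frequency_of_bin(k, n):
    """Frequency of DFT bin k in cycles per pixel, from -0.5 up to just under 0.5."""
    return (k if k < n / 2 else k - n) / n

amplitude = [abs(value) for value in dft(row)]
peak_bin = max(range(N // 2 + 1), key=lambda k: amplitude[k])
peak_frequency = frequency_of_bin(peak_bin, N)
```

    The peak lands in bin 20, which is 0.2 cycles per pixel, i.e. one cycle every 5 pixels. With exactly 100 samples the labels run from -0.5 to 0.49 in steps of 0.01.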

    In the interactive tool I linked you to as you moved the cursor around to different locations in the frequency domain you obtained different wave forms in the image domain. If you take the spike shape and you move around the frequency domain and at each point in your frequency domain you scale the brightness of the wave image by the brightness of the frequency domain location then you will get a large number of wave images with different spatial frequencies and intensities. If you then add all of those wave images together you will get the white box.
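
    The "add up all the weighted wave images" recipe can be tried directly. The sketch below is not the interactive tool, just plain Python on an assumed 8x8 picture with a 4x4 white box. The names `wave_image` and `weight` are illustrative.

```python
import cmath

N = 8
box = [[1.0 if 2 <= y <= 5 and 2 <= x <= 5 else 0.0 for x in range(N)]
       for y in range(N)]

def wave_image(u, v, n):
    """A single-frequency (complex) wave image: u cycles down, v cycles across."""
    return [[cmath.exp(2j * cmath.pi * (u * y + v * x) / n) for x in range(n)]
            for y in range(n)]

def weight(image, u, v):
    """Value of the frequency domain at (u, v): how much of that wave the image holds."""
    n = len(image)
    return sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * y + v * x) / n)
               for y in range(n) for x in range(n)) / (n * n)

# Visit every location in the frequency domain, scale its wave image by the
# value found there, and accumulate the results.
rebuilt = [[0j] * N for _ in range(N)]
for u in range(N):
    for v in range(N):
        w = weight(box, u, v)
        wave = wave_image(u, v, N)
        for y in range(N):
            for x in range(N):
                rebuilt[y][x] += w * wave[y][x]

max_error = max(abs(rebuilt[y][x] - box[y][x]) for y in range(N) for x in range(N))
```

    `max_error` comes out at rounding-error level, so the sum of the 64 weighted waves is the white box again.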
  14. Mar 7, 2012 #13
    Sorry, I couldn't map what you described into something I could interpret.

    The x and y axes could be labeled by what M Quack mentioned: ## \omega_x ## and ## \omega_y ##. As you travel along the x axis from the edge, the frequency varies from the Nyquist (or cut-off) frequency (at the edge) to 0 (at the center), then presumably to the negative Nyquist frequency at the opposite edge (that depends on the implementation).

    The z axis could be labeled ## W_{\omega} ## where ## W ## is the weight or amplitude of the particular frequency at that x-y position.

    The amplitude of the weighted sinusoidal waves of the FT then add up to the same intensity values as in the original picture.

    "Noise" is generally regarded as broadband noise (white noise, pink noise, etc.). That means it contains a fairly consistent amount of all frequencies when sampled in a large time or space window, as in your example. So in the transform it will look like a very slightly warbly or spiky low-level bottom to the image. You filter it by removing a small amount of every frequency, and the resulting FFT plot should have a very flat, smooth bottom.
  15. Mar 7, 2012 #14
    If you have access to some plotting software, please try the following: make a square NxN matrix of zeros, set a single point at (x,y) to some complex z ≠ 0, do an inverse 2D FFT on it, then plot the real and imaginary parts of the result as 3D surfaces. You'll get two "corrugated iron" functions, which are sine waves in the direction of the vector (x,y) and constant in the perpendicular direction. The real and imaginary parts are cosine and sine waves with period N/length(x,y), amplitude |z| and phase arg(z).

    So that's what 2D FFT does, it decomposes your image into linear combination of these "corrugated iron" functions.

    Now if you set (x,y) to z and (-x,-y) to conj(z), the real parts will reinforce and the imaginary parts will cancel out. That's what you get in the FFT of a real image. Note: it is either (-x,-y) or (N-x,N-y), depending on whether the origin is in the center or in the corner, which is a matter of convention.
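
    For readers without plotting software, the same experiment can be checked with numbers alone. This is a sketch under my own assumptions: N = 8, the point (x,y) = (2,1), z = 3+4j, the origin in the corner, and a slow direct inverse transform named `inverse_dft2` with a 1/N^2 factor (so the wave amplitude here is |z|/N^2 rather than |z|).

```python
import cmath

N = 8
FX, FY = 2, 1          # assumed frequency-domain point (x, y)
z = 3 + 4j

def inverse_dft2(spectrum):
    """Direct inverse 2D DFT with 1/n^2 normalisation; spectrum is indexed [fy][fx]."""
    n = len(spectrum)
    return [[sum(spectrum[fy][fx] * cmath.exp(2j * cmath.pi * (fy * y + fx * x) / n)
                 for fy in range(n) for fx in range(n)) / (n * n)
             for x in range(n)]
            for y in range(n)]

# One non-zero point: a complex "corrugated iron" wave.
single = [[0j] * N for _ in range(N)]
single[FY][FX] = z
wave = inverse_dft2(single)

# Add the conjugate at (-x, -y), which is (N-x, N-y) with the origin in the corner.
pair = [[0j] * N for _ in range(N)]
pair[FY][FX] = z
pair[(-FY) % N][(-FX) % N] = z.conjugate()
real_wave = inverse_dft2(pair)

max_imaginary = max(abs(real_wave[y][x].imag) for y in range(N) for x in range(N))
```

    `wave` takes the same value at every pixel where FX*x + FY*y is the same, i.e. it is constant along lines perpendicular to (x,y). `max_imaginary` is at rounding-error level, so the conjugate pair gives a purely real wave.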