
Need Advice on Spatial Statistics for a Lattice

  1. Jun 6, 2012 #1
    My professor gave me data points representing living cells on a rectangular plane and told me to analyze their spatial pattern, i.e., whether the data points on the finite plane have a tendency to be clustered, random, or dispersed. I successfully accomplished this for points on a continuous plane using the nearest-neighbor (NN) method, but now he wants it done for points on a regular 2D lattice!
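    For reference, the NN analysis I did on the continuous plane was along the lines of the Clark-Evans ratio (mean observed nearest-neighbor distance over its expectation under complete spatial randomness). A minimal sketch, assuming that method, with no edge correction and with function names of my own choosing:

```python
import math
import random

def clark_evans_R(points, area):
    """Clark-Evans ratio: mean observed nearest-neighbor distance
    divided by its expectation 0.5 / sqrt(density) under complete
    spatial randomness (CSR).
    R < 1 suggests clustering, R near 1 randomness, R > 1 dispersion."""
    n = len(points)
    nn_dists = []
    for i, (xi, yi) in enumerate(points):
        # distance to the nearest other point (O(n^2), fine for small n)
        d = min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
        nn_dists.append(d)
    observed = sum(nn_dists) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# 200 uniformly random points on a 100 x 100 plane: R should be near 1
random.seed(0)
pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
print(clark_evans_R(pts, 100 * 100))
```

    Squeezing the same 200 points into a small corner of that plane drives R well below 1, which is how the method flags clustering.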

    I know I cannot use the NN approach as-is, since it is not well suited to discrete space. Does anyone know of a method equivalent to NN but for a discrete lattice? I do not know spatial statistics all that well, and I have been searching books and the web for an appropriate method without success. Either what I read is too technical or I am mistaking what the methods can be used for. Any suggestions would help a great deal.

    Thank you
     
  3. Jun 6, 2012 #2
    I'm pretty much just guessing here, but do you think a k-means algorithm would work well with a lattice underpinning? It's the most geometrical method I know of, and perhaps that would make it useful.
     
  4. Jun 7, 2012 #3

    chiro

    Science Advisor

    Hey Paradise Lost and welcome to the forums.

    You should consider methods that use metrics on lattices. Things like the Manhattan metric for one.

    Consider these metrics in the context of general techniques rather than selecting techniques specific to lattices: you will find that many conventional techniques still work once a different metric is swapped in. You could also apply this metric to things like k-means, as suggested earlier in this thread.
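    For instance, swapping the Euclidean mean for the coordinate-wise median turns ordinary k-means into a Manhattan-metric version (sometimes called k-medians). A rough sketch just to illustrate the idea; none of this is tuned:

```python
import random

def manhattan(p, q):
    """Taxicab (L1) distance between two lattice points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def k_medians(points, k, iters=50, seed=0):
    """k-means-style clustering under the Manhattan metric.
    The coordinate-wise median minimizes total L1 distance, so it
    plays the role the mean plays in ordinary k-means."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest centre under the L1 metric
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: manhattan(p, centres[i]))
            clusters[nearest].append(p)
        # move each centre to the coordinate-wise median of its cluster
        new_centres = []
        for i, cluster in enumerate(clusters):
            if not cluster:                 # keep the old centre if emptied
                new_centres.append(centres[i])
                continue
            xs = sorted(p[0] for p in cluster)
            ys = sorted(p[1] for p in cluster)
            mid = len(cluster) // 2
            new_centres.append((xs[mid], ys[mid]))
        if new_centres == centres:          # converged
            break
        centres = new_centres
    return centres, clusters

# two obvious blobs of lattice points: one centre should land in each
pts = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 8), (8, 9)]
print(k_medians(pts, k=2)[0])
```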
     
  5. Jun 7, 2012 #4
    Hi Paradise,

    What strikes me the most is that your professor asks you to solve something but seems to give you no clue how; is that common in your school/country? In my case the professor would explain a problem and a group of solutions for it, and then we would exercise on that. So your professor gave you the data points and let you all loose in the wild to solve the problem?

    Anyway, it seems to me NN is suited to your problem whether you discretize the space or not. For instance, you haven't said how fine this discretization should be; if you make the lattice fine enough there would be no difference between analyzing that discretization and analyzing a continuum.

    So I am going to guess now: could it be that your professor wants you to discretize the rectangular plane so that you can use a Poisson distribution to test for randomness?

    Let me explain. NN is a better method than using a Poisson to test the randomness of points in a space, but since you mention you are working with living cells I can picture the following scenario: there is a biologist using a microscope, looking at living cells distributed over a rectangular area divided into squares. The biologist counts the number of living cells per square but has no accurate way to measure the distance between cells...

    Now, in this scenario, for obvious reasons, you cannot use an NN method, but you can resort to the fact that the number of cells per square must follow a Poisson distribution if the cells are randomly distributed. Therefore, by using one of the many goodness-of-fit tests for the Poisson (e.g., χ2) you can decide how randomly distributed the cells are.
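    In code the idea might look something like this; just an illustrative sketch where the function name, the binning, and the tail-pooling cutoff are arbitrary choices of mine. You would compare the statistic against a χ2 critical value with (bins − 2) degrees of freedom, one extra degree lost because the mean is estimated from the data:

```python
import math
from collections import Counter

def poisson_chi2_stat(counts, max_bin=5):
    """Chi-square goodness-of-fit statistic for cells-per-square counts
    against a Poisson whose rate is the sample mean. Counts of max_bin
    or more are pooled into one tail bin so no expected count is tiny."""
    n = len(counts)
    lam = sum(counts) / n                      # estimated Poisson rate
    observed = Counter(min(c, max_bin) for c in counts)
    stat = 0.0
    for k in range(max_bin + 1):
        if k < max_bin:
            p = math.exp(-lam) * lam ** k / math.factorial(k)
        else:  # pooled tail: P(X >= max_bin)
            p = 1.0 - sum(math.exp(-lam) * lam ** j / math.factorial(j)
                          for j in range(max_bin))
        expected = n * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    return stat
```

    With counts that follow a Poisson closely the statistic stays small; a pile of empty squares plus a few crowded ones blows it up.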

    I know I am doing a lot of Sherlock guessing work here but, could it be this is what you want/need?
     
    Last edited: Jun 7, 2012
  6. Jun 7, 2012 #5

    Stephen Tashi

    Science Advisor

    Paradise Lost,

    If your professor is a professor of statistics and you have been given a mathematical challenge, then you can concentrate on lattices. If your professor is a professor of biology and you have been given a real-world problem, then you should think about the real-world aspects of the data. For example, one reason a real-world problem might produce data on a lattice is that the cells are distributed on a continuous plane, but when a data reducer measures them, he uses a slide with a grid and assigns the location of each cell to the coordinates of the grid square that contains most of the cell. Or perhaps the data comes from some automatic image-recognition process that implements a similar method.

    There is also the question of what "randomly distributed" means in a real world problem. For example, suppose cells are grown on a culture dish. It may be that what an experimenter considers randomly distributed growth is more orderly than mathematically random distribution.
     
  7. Jun 9, 2012 #6
    Thank you, thus far, to everyone responding. I still haven't found anything even close to an analogous process for analyzing data on a discrete lattice.

    To clarify, the data are biological cells represented by values at actual lattice points, not within square areas--the data is laterally discrete. Some points have no cells (so their weight is zero) while others have one, or might have multiple copies per grid point (so those lattice points will have more weight). Suppose the lattice has a plane area of 1000×1000. Now I want to analyze the distribution of the cells and see whether they tend to 1) form clusters, 2) be randomly distributed, or 3) be dispersed.

    Using the standard nearest-neighbor (NN) method developed for a continuous plane on this lattice would only work if the lattice spacing were sufficiently small to approximate the continuous case.

    viraltux noted an earlier suspicion of mine that the lattice spacing needs to be fine, but arbitrarily changing my unit lattice distance presents problems with the NN derivation, since the notion of unit distance changes. E.g., too fine a lattice will make it look (from the perspective of NN) as if the whole lattice is clustering. Anyway, the literature even says NN is typically not used for lattices.

    theorem4.5.9, thanks for the suggestion of the k-means algorithm. I have been trying to understand it and see if it can be used but I am running into the same problem of no clear examples or discussions that I have been having with textbooks on spatial analysis. I just can't cut through the abstractness of their discussions. I will continue reading up on it.

    chiro, I thought about using a NN-like analysis using a taxicab metric but deriving a complete spatial randomness (CSR) model based on such a metric is proving exceedingly hard to do. I can't find anything in the literature about it. I'm still searching for other methods that use taxicab geometry.

    The approach I might use, though with only partial success, is viraltux's suggestion of analyzing the number of cells per lattice site: assume a Poisson distribution as the null hypothesis and apply a χ2 test to see if I can reject it. The one failure of this test is that if I reject the null (random distribution via a Poisson process) then I can't really claim the cells are distributed in any other particular way, no? In other words, all I can say is that the cells are not randomly distributed, but I CAN'T say whether they are then clustered or dispersed.

    Any further insight is welcomed. Thank you...
     
  8. Jun 9, 2012 #7

    Stephen Tashi

    Science Advisor

    Umm... I don't call that clarification! What would "laterally discrete" mean? Make this clear to non-biologists. Are these cells on a flat surface like a culture plate? Can the cells be piled on top of each other, as "might have multiple copies per grid point" indicates? What is the procedure for measuring the data? Does a person superimpose a grid on a collection of cells and only record what happens at each point on the grid?

    Or is this not real biological data? - just a conceptual model for biological data?
     
  9. Jun 9, 2012 #8
    Aside from the data, spatial analysis incorporates only a few parameters at most. I am not going into detail about what the data is intended to model in the real world because that is not germane to the topic at hand. I have points distributed on a finite 2D integer lattice (not a plane in ℝ[itex]^{2}[/itex])--say with an area of 1000×1000 sq. units. As I posted earlier, some sites can have more weight, indicating there are more points at a given lattice site. And to reiterate, the organisms are not in the squares but on the lattice points themselves. I want to characterize their distribution as 1) clustered, 2) random, or 3) dispersed.

    They can be biological cells, Christmas trees, or anything else one desires. It's kind of like the game Battleship. Suppose someone took an aerial snapshot of the game and wanted to know whether the battleships on the grid are clustered, dispersed, or randomly distributed.
     
  10. Jun 10, 2012 #9
    Well, if we are allowed to group all the cells at one point it actually simplifies the problem. One of the weaknesses of using a Poisson in ℝ2 is that we miss the spatial distribution patterns inside each square, but since we now have all the cells together on the lattice points that weakness disappears, and Poisson is the way to go.

    OK, then our problem is about what kind of non-randomness is present, if any, right? Checking for randomness is a tricky thing; strictly speaking, even if the lattice counts pass the Poisson test we still could not guarantee randomness. We might also want to check for autocorrelations and other non-random behavior, so keep this in mind.

    Anyway, in this example we only care about the dispersed vs. clustered kinds of non-randomness and, luckily, autocorrelations are sensitive to clusters and insensitive to dispersion, so you can use the autocorrelations to decide which kind of non-randomness you are dealing with.

    Summarizing, a simple approach could be:

    • If the data is autocorrelated: Clustered
    • If the data is not autocorrelated and fails the Poisson test: Dispersed
    • If the data is not autocorrelated and passes the Poisson test: Random
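    A toy version of these three rules, with simplifications of my own: a variance-to-mean ratio (near 1 for Poisson counts) stands in for a full Poisson goodness-of-fit test, a lag-1 correlation between horizontally adjacent counts stands in for a proper autocorrelation analysis, and the thresholds are purely illustrative:

```python
def classify_grid(grid, vmr_tol=0.2, ac_tol=0.15):
    """Classify a grid of cell counts as clustered, dispersed, or random.
    grid is a list of rows; each entry is the count at that lattice point."""
    vals = [v for row in grid for v in row]
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    vmr = var / mean if mean else 1.0          # variance-to-mean ratio

    # lag-1 correlation between horizontally adjacent counts
    pairs = [(row[i], row[i + 1]) for row in grid for i in range(len(row) - 1)]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)
    sx = (sum((a - mx) ** 2 for a, _ in pairs) / len(pairs)) ** 0.5
    sy = (sum((b - my) ** 2 for _, b in pairs) / len(pairs)) ** 0.5
    ac = cov / (sx * sy) if sx and sy else 0.0

    if abs(ac) > ac_tol:
        return "clustered"          # rule 1: autocorrelated
    if abs(vmr - 1.0) > vmr_tol:
        return "dispersed"          # rule 2: fails the Poisson check
    return "random"                 # rule 3: passes both
```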
     
    Last edited: Jun 10, 2012
  11. Jun 10, 2012 #10
    I was looking at autocorrelation but decided to put it on the back burner, so to speak, because I am unsure what attribute is actually being correlated. The example I saw used weather temperatures at certain grid points on a map; higher temperatures mean a higher weight. Could it be that, instead of temperatures, I use the number of biological cells?
    E.g., if no cells exist, then that lattice point gets a 0; if there is one cell, the value is 1; etc...
    If so, I think that should solve it.
    1) Is this a correct assumption?
    2) Should I use "Moran's I" or "Geary's C" as the autocorrelation method?

    Thank you viraltux, you have been a big help.
     
  12. Jun 10, 2012 #11
    Think about it in one dimension: if you have a clear trend, the autocorrelation will detect it. A cluster is, in a way, like these one-dimensional trends, so no autocorrelation means dispersion in a non-random scenario.

    1) Yeah, if no cells then 0, and then the number of cells per grid point; this should work just fine.
    2) They are both similar; my advice is to try them both and choose the one that fits your problem better--you can actually use both. There are many different tests you can use to test randomness, and it is for you to decide when you have the right collection for your needs, though either of these two methods should work just fine.
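    For what it's worth, Moran's I on a grid of counts is short enough to sketch directly; this is my own illustrative code assuming rook (4-neighbour) contiguity weights, not a reference implementation:

```python
def morans_I(grid):
    """Moran's I for a 2-D grid of counts with rook contiguity:
    w_ij = 1 when cells i and j are orthogonal neighbours, else 0.
    Positive values suggest clustering, values near 0 spatial
    randomness, negative values dispersion (alternation)."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    dev = [[grid[r][c] - mean for c in range(cols)] for r in range(rows)]
    num, w_sum = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            # cross-products with the 4 orthogonal neighbours
            for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += dev[r][c] * dev[rr][cc]
                    w_sum += 1
    denom = sum(d * d for row in dev for d in row)
    return (n / w_sum) * (num / denom) if denom else 0.0
```

    Geary's C can be written the same way by summing (x_i − x_j)² over neighbours instead of the cross-product.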

    You're welcome :smile:
     