Grouping Stores based on past retail sales.

  1. May 15, 2012 #1
    I have a set of retail data giving the number of sales per store for each of the past 52 weeks. I want to group the stores by how closely they follow each other over the course of the year. Let's say there will be 10-20 stores in each group, although this number will be refined based on the analysis. These groups will then be used to create location-specific distribution curves. By grouping I can smooth out anomalies while creating a bigger dataset to pull from. I also want to find the stores that deviate the most from the mean.
    It would also be nice to find the variance within each group, and to determine what is statistically significant in these groupings.

    I did a simple analysis: I created a curve for each store and compared it to the mean, found the percent difference for each week, and ranked each store by its average percent difference.
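
    For concreteness, the simple analysis described above might look roughly like the sketch below. The store names and numbers are made up (4 weeks instead of 52), and two details are assumptions: that the "mean" is the per-week average across all stores, and that the percent difference is taken in absolute value.

```python
from statistics import mean

# Hypothetical weekly sales per store (illustrative data, 4 weeks instead of 52).
sales = {
    "store_a": [100, 120, 110, 130],
    "store_b": [90, 115, 105, 140],
    "store_c": [200, 150, 300, 100],
}

def rank_by_deviation(sales):
    """Rank stores by average absolute percent difference from the weekly mean."""
    n_weeks = len(next(iter(sales.values())))
    # Assumption: "the mean" is the average over all stores, week by week.
    weekly_mean = [mean(s[w] for s in sales.values()) for w in range(n_weeks)]
    score = {
        store: mean(
            abs(s[w] - weekly_mean[w]) / weekly_mean[w] * 100
            for w in range(n_weeks)
        )
        for store, s in sales.items()
    }
    # Largest average deviation first.
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_by_deviation(sales)
for store, pct in ranking:
    print(f"{store}: {pct:.1f}% average deviation from the weekly mean")
```

    Note that this score only measures distance from the overall mean; it says nothing about whether two stores move together week to week.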

    What is a better way to compare the curves with one another in order to create groups?
  2. jcsd
  3. May 16, 2012 #2
    It seems that your analysis does not take into account that you are dealing with time series. Comparing the distributions of sales of two stores to see which ones follow each other best is not a good idea: you might have S1 = {1,2,3,4,5,6} and S2 = {6,5,4,3,2,1}, which have the same distribution, yet S1 is clearly doing better and better and S2 worse and worse. I do not think you want to put them both in the same group, is that correct?
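
    A quick way to see the S1/S2 point numerically is sketched below. Pearson correlation is used here only as one possible order-aware measure; it is an illustrative choice, not something prescribed in this thread, and the helper `pearson` is written out by hand for the example.

```python
s1 = [1, 2, 3, 4, 5, 6]
s2 = [6, 5, 4, 3, 2, 1]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Ignoring week order, the two stores have exactly the same set of values...
same_distribution = sorted(s1) == sorted(s2)

# ...but week by week they move in opposite directions.
r = pearson(s1, s2)

print("same distribution:", same_distribution)
print("week-by-week correlation:", round(r, 6))
```

    The values match once the ordering is discarded, while the week-by-week correlation comes out at (numerically) -1, so any measure that ignores the time ordering cannot tell these two stores apart.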

    In fact, it depends on how you define 'follow each other the best'. For example, imagine two stores that follow each other's ups and downs, but one has twice as many sales as the other; they definitely follow each other, even if at different levels.
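
    If level should not matter, one option (an assumption about what is wanted, not the only choice) is to convert each store's weekly sales to shares of its own yearly total before comparing. A minimal sketch with made-up numbers and a hypothetical helper `to_shares`:

```python
def to_shares(weekly_sales):
    """Express each week as a fraction of the store's total sales."""
    total = sum(weekly_sales)
    return [w / total for w in weekly_sales]

# Hypothetical data: the large store sells exactly twice as much every week.
small = [10, 12, 9, 15]
large = [2 * w for w in small]

small_shares = to_shares(small)
large_shares = to_shares(large)

# Largest week-by-week gap between the two normalized curves.
max_gap = max(abs(a - b) for a, b in zip(small_shares, large_shares))
print("max gap between normalized curves:", max_gap)
```

    After this normalization the two curves coincide, so the stores would land in the same group; comparing raw sales instead would keep them apart.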

    On the other hand, you could have stores whose behavior is totally independent and yet they might have the same average sales after a year... You see what I mean?

    So, to your question 'what is the best way to group', I think you really have to decide what you want to have in each group and define more precisely the criterion 'follow each other the best'. Is it just the average sales you care about? Do you care about the variance? Do you care about the evolution over time? ...