
Statistics for bursty data

  1. Jan 29, 2008 #1
    Hi -

    First timer here. Excuse me if this question is not up to the level I see posted on this forum, but here goes.

    I have been asked to provide a daily signal generated from the number of occurrences of a set of specified phrases in a news data feed. The first thing I did was compute a moving average of the daily count of each phrase in the feed and generate a signal whenever the current count was above the moving average by a specified percentage. With this approach I didn't think the signal provided much value, because the phrase counts are very bursty. The count can be in the low teens for a number of days in a row, then jump to 100 for a couple of days, and then settle back into the low teens.
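For concreteness, the moving-average rule described above might look like the following minimal sketch. The function name `moving_average_signal`, the window length, the percentage, and the sample counts are all hypothetical; the post does not say whether the current day is part of the average, so this sketch assumes it is not.

```python
from collections import deque


def moving_average_signal(counts, window=7, pct=0.5):
    """Flag each day whose count exceeds the trailing average by more than pct.

    Assumption: the average covers the `window` days before the current day
    and excludes the current day itself. Days without a full window of
    history are never flagged.
    """
    signals = []
    recent = deque(maxlen=window)  # holds the previous `window` counts
    for n in counts:
        if len(recent) == window:
            avg = sum(recent) / window
            signals.append(n > avg * (1 + pct))
        else:
            signals.append(False)
        recent.append(n)
    return signals


# Made-up counts: low teens, a two-day burst, then low teens again.
example_counts = [12, 14, 11, 13, 100, 95, 12, 13]
example_signals = moving_average_signal(example_counts, window=3, pct=0.5)
```

Once a burst enters the window it inflates the average for the next few days, which is one reason a fixed percentage over a short moving average behaves erratically on bursty counts.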

    What type of statistics should I use to determine a statistically significant event given my scenario described above?

    Thanks in advance
  3. Jan 30, 2008 #2


    Science Advisor
    Homework Helper

    One way is to:
    1. calculate the historical average up to day t: [itex]HA(t) = \frac{1}{t}\sum_{s=1}^t n_s[/itex], where [itex]n_s[/itex] is the number of occurrences on day s
    2. calculate the historical standard deviation HSD(t) similarly
    3. test whether [itex]n_t > HA(t) + 2\,HSD(t)[/itex].
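The three steps above can be sketched in Python roughly as follows. The function name `burst_signal`, the parameter `k`, and the sample counts are hypothetical. The post does not say which form of the standard deviation is meant by "similarly", so this sketch assumes the population form (dividing by t) and, following the formula for HA(t), includes day t itself in the history.

```python
from statistics import fmean, pstdev


def burst_signal(counts, k=2.0):
    """Flag day t when n_t > HA(t) + k * HSD(t).

    HA(t) and HSD(t) are computed over n_1..n_t, so the current day is
    included in its own history. Assumption: HSD is the population
    standard deviation.
    """
    signals = []
    for t in range(1, len(counts) + 1):
        history = counts[:t]        # n_1 .. n_t
        ha = fmean(history)         # historical average HA(t)
        hsd = pstdev(history)       # historical standard deviation HSD(t)
        signals.append(counts[t - 1] > ha + k * hsd)
    return signals


# Made-up counts: low teens, a two-day burst, then low teens again.
daily_counts = [12, 14, 11, 13, 12, 15, 13, 100, 95, 12]
signals = burst_signal(daily_counts)
```

On this made-up series only the first burst day is flagged: the burst itself raises both the average and the standard deviation, so the second burst day no longer clears the threshold. Computing HA and HSD from days 1 to t-1 instead would be a different variant from the one stated in the post.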
    Last edited: Jan 30, 2008