# Statistics for bursty data

1. Jan 29, 2008

### kmrstats

Hi -

First-timer here. Excuse me if this question is not up to the level I see posted on this forum, but here goes.

I have been asked to provide a daily signal generated from the number of occurrences of a set of specified phrases in a news data feed. The first thing I did was compute a moving average of the daily count of each phrase in the feed and generate a signal if the current count was above the moving average by a specified percentage. With this approach I didn't think the signal provided much value, because the phrase counts are very bursty. The count can be in the low teens for a number of days in a row, then jump to 100 for a couple of days, and then settle back into the low teens.

What type of statistics should I use to identify a statistically significant event in the scenario described above?
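
For concreteness, the moving-average rule described above might look roughly like the sketch below. The window length, the percentage, and the choice to average only the days before the current one are assumptions made for illustration; the function name `moving_average_signals` is hypothetical.

```python
from collections import deque


def moving_average_signals(counts, window=7, pct=0.5):
    """Flag a day when its count exceeds the trailing moving average by more than pct.

    Assumptions (not stated in the post): the average covers the `window`
    days before the current day, and no signal is raised until a full
    window of history is available.
    """
    signals = []
    recent = deque(maxlen=window)  # holds the previous `window` counts
    for n in counts:
        if len(recent) == window:
            avg = sum(recent) / window
            signals.append(n > avg * (1 + pct))
        else:
            signals.append(False)  # not enough history yet
        recent.append(n)
    return signals


# Made-up counts shaped like the bursty pattern described: low teens, then a jump.
example_counts = [12, 13, 11, 14, 12, 100, 95, 13]
example_signals = moving_average_signals(example_counts, window=3, pct=0.5)
```

With these made-up numbers, both burst days are flagged, but the burst also inflates the average for the days that follow it.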

2. Jan 30, 2008

### EnumaElish

One way is to:
1. calculate the historical average up to day $t$: $HA(t) = \frac{1}{t}\sum_{s=1}^{t} n_s$, where $n_s$ is the number of occurrences on day $s$
2. calculate the historical standard deviation $HSD(t)$ similarly
3. test whether $n_t > HA(t) + 2\,HSD(t)$.
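
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes HSD(t) is the population standard deviation (dividing by t), since "similarly" does not say which form is meant, and it includes day t in both statistics, as the sum in step 1 does. The function name `historical_signals` and the parameter `k` are hypothetical.

```python
import math


def historical_signals(counts, k=2.0):
    """For each day t, test n_t > HA(t) + k * HSD(t).

    HA(t) and HSD(t) are computed over days 1..t inclusive.
    Assumption: HSD(t) is the population standard deviation (divide by t).
    The running mean and variance use Welford's online update.
    """
    signals = []
    mean = 0.0  # running HA(t)
    m2 = 0.0    # running sum of squared deviations from the mean
    for t, n in enumerate(counts, start=1):
        delta = n - mean
        mean += delta / t
        m2 += delta * (n - mean)
        hsd = math.sqrt(m2 / t)
        signals.append(n > mean + k * hsd)
    return signals


# Made-up counts: low teens, then a two-day burst.
demo_counts = [12, 13, 11, 14, 12, 100, 95, 13]
demo_signals = historical_signals(demo_counts)
```

Because day t is included in its own average and standard deviation, a large count raises its own threshold. In this made-up series only the first burst day is flagged. Computing the statistics over days 1 to t-1 instead would be a variation on the steps as posted.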

Last edited: Jan 30, 2008