Is there a statistically significant increase in phrase occurrences?

  • Thread starter: kmrstats
  • Tags: Data, Statistics
kmrstats
Hi -

First timer here. Excuse me if this question is not up to the level I see posted on this forum, but here goes.

I have been asked to provide a daily signal generated from the number of occurrences of a set of specified phrases in a news data feed. The first thing I did was compute a moving average of the daily count of each phrase in the feed and generate a signal if the current count was above the moving average by a specified percentage. With this approach I didn't think the signal provided much value, because the phrase counts are very bursty: a count can sit in the low teens for several days in a row, jump to around 100 for a couple of days, and then settle back into the low teens.
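For reference, the moving-average rule described above might look like the following. This is a minimal sketch under assumptions the post does not state: the window length (`window`), the percentage threshold (`threshold_pct`), and the use of a trailing average over the previous days only (excluding the current day) are all hypothetical, and `moving_average_signals` is an illustrative name, not a library function.

```python
from collections import deque


def moving_average_signals(counts, window=7, threshold_pct=50.0):
    """Flag day t when counts[t] exceeds the trailing moving average of the
    previous `window` days by more than `threshold_pct` percent.

    Hypothetical parameters: the original post does not give the window
    length or the percentage threshold.
    """
    signals = []
    history = deque(maxlen=window)  # holds only the last `window` daily counts
    for count in counts:
        if len(history) == window:
            moving_avg = sum(history) / window
            signals.append(count > moving_avg * (1 + threshold_pct / 100.0))
        else:
            signals.append(False)  # not enough history yet to form an average
        history.append(count)  # today's count joins the baseline for later days
    return signals


# Illustrative, made-up series: quiet days, a two-day burst, then quiet again.
print(moving_average_signals([12, 13, 11, 12, 14, 13, 12, 100, 95, 13]))
```

On this made-up series both burst days are flagged. Note that the first burst day immediately becomes part of the baseline for the following days, which is one way bursty counts distort a short moving average.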

What statistics should I use to decide whether a day's count is a statistically significant event, given the scenario described above?

Thanks in advance
 
One way is to:
1. calculate the historical average up to day t: $HA(t) = \frac{1}{t}\sum_{s=1}^{t} n_s$, where $n_s$ is the number of occurrences on day s;
2. calculate the historical standard deviation similarly: $HSD(t) = \sqrt{\frac{1}{t}\sum_{s=1}^{t}\left(n_s - HA(t)\right)^2}$;
3. test whether $n_t > HA(t) + 2\,HSD(t)$.
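The three steps above can be sketched in Python as follows. Two assumptions go beyond the reply: the baseline HA and HSD are computed from days strictly before day t, so a spike does not inflate its own threshold, and no signal is emitted until `min_history` days are available. The function name and `min_history` are hypothetical. `statistics.pstdev` divides by the number of days (a population standard deviation), and `k=2.0` is the reply's two-standard-deviation cutoff.

```python
import statistics


def historical_sd_signals(counts, k=2.0, min_history=7):
    """Flag day t when counts[t] > HA + k * HSD, where HA and HSD are the
    mean and population standard deviation of all days before day t.

    Assumption (not in the reply): day t itself is excluded from its own
    baseline, and at least `min_history` earlier days are required.
    """
    signals = []
    for t, count in enumerate(counts):
        history = counts[:t]  # days strictly before day t
        if len(history) < min_history:
            signals.append(False)  # too little history for a stable baseline
            continue
        ha = statistics.fmean(history)   # historical average HA
        hsd = statistics.pstdev(history)  # historical standard deviation HSD
        signals.append(count > ha + k * hsd)
    return signals


# Same illustrative, made-up series as a quiet/burst/quiet example.
print(historical_sd_signals([12, 13, 11, 12, 14, 13, 12, 100, 95, 13]))
```

On that series, the two burst days (0-based indices 7 and 8) are flagged. Because every burst stays in the history, HSD grows after each burst, so later spikes must be larger to trigger a signal; a mean-plus-2-SD rule also implicitly treats the daily counts as roughly normally distributed.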
 