Statistical assessment of the quality of event detection

  • Context: Graduate
  • Thread starter: StarWars
  • Tags: Quality, Statistical

Discussion Overview

The discussion revolves around the statistical assessment of an algorithm designed to detect events in time-domain data, particularly focusing on the efficiency of this algorithm in terms of sensitivity and specificity. Participants explore the challenges of analyzing large datasets and the implications of sampling methods for valid statistical analysis.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks advice on valid statistical analysis methods for assessing the efficiency of their event detection algorithm, particularly in the context of large datasets.
  • Another participant notes that statistical applications involve subjective judgments and requests more details about the specific concerns, such as the nature of the events and the availability of ground truth data.
  • A participant describes the nature of the sounds being analyzed, emphasizing the challenge of detecting low-amplitude signals amidst noise and the variability in sound occurrence.
  • One reply suggests simulating data from "real" events to evaluate the algorithm's performance, highlighting the need for a null hypothesis and methods for hypothesis testing.
  • There is mention of standard methods in engineering and science that might address the problem, with a suggestion to seek further advice in relevant sections of the forum.

Areas of Agreement / Disagreement

Participants express differing views on the best approach to validate the algorithm's performance, with some advocating for simulation and hypothesis testing while others emphasize the need for more information to provide practical advice. The discussion remains unresolved regarding the optimal statistical methods to apply.

Contextual Notes

Limitations include the absence of prior information for comparison, reliance on the algorithm's output, and the need for a defined null hypothesis for hypothesis testing.

StarWars
Messages
2
Reaction score
0
Hello.

I developed an algorithm to detect events in time-domain data and I want to assess the efficiency of the algorithm.

The problem is the sheer time duration of the data.

Each file contains hundreds of minutes of data, and I have dozens of files.

Instead of calculating the specificity and sensitivity of the algorithm over the entire dataset, I was thinking of choosing random samples.

My question is:

What is the correct approach to obtain a valid statistical analysis?

Thank you.
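The random-sampling idea above can be sketched as follows. This is a minimal illustration, not the thread's method: the file durations, window size, and sampling scheme are all made-up assumptions.

```python
# Sketch: draw random time windows from long recordings, weighted by file
# duration, so they can be labeled by hand and used to estimate the
# detector's sensitivity and specificity. All numbers are hypothetical.
import random

def sample_windows(file_lengths, n_samples=20, window=60.0, seed=0):
    """Pick (file_index, start_time) pairs uniformly over total duration.

    file_lengths: duration of each file in seconds.
    window: length of each sampled window in seconds.
    """
    rng = random.Random(seed)
    picks = []
    for _ in range(n_samples):
        # Weight files by duration so long files are sampled proportionally.
        f = rng.choices(range(len(file_lengths)), weights=file_lengths)[0]
        start = rng.uniform(0.0, file_lengths[f] - window)
        picks.append((f, start))
    return picks

# Three files of 300, 450, and 120 minutes, expressed in seconds.
lengths = [300.0 * 60, 450.0 * 60, 120.0 * 60]
print(sample_windows(lengths, n_samples=5))
```

Each sampled window would then be inspected by hand (or against any available reference) to count true and false detections within it.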
 
Unfortunately, applications of statistics involve subjective judgements. If you want practical advice about a valid statistical approach, you need to give more practical details of the situation. For example, what are you concerned about? The number of events in a file? The exact time when an event occurs? Do you have information about when an event "really" happened versus when the algorithm said it happened?
 
I am studying sounds in the time domain. Usually the signal has a low-amplitude profile, just noise. Sometimes a sound is generated and there is an increase in the signal amplitude.

The goal of the algorithm is to detect this increase in the signal amplitude. Unfortunately, the generation of a sound can be regarded as random. It is possible for a low-amplitude profile to last for minutes or even hours without a single sound being generated. On the other hand, there may be a sequence of sounds lasting several minutes, with a time difference of just a few seconds between sound n and sound n+1.

I am concerned with the quality of the detection: sensitivity and specificity. That is, I want to know whether a generated sound is detected or missed, and whether a sound is "detected" when no sound was generated.
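A minimal sketch of how these two quantities are computed, assuming per-window ground-truth labels were available (the labels and windowing here are hypothetical, since the thread's point is precisely that no ground truth exists yet):

```python
# Sketch: sensitivity and specificity from per-window labels.
# Each time window carries a ground-truth label (event present or not)
# and the detector's verdict for the same window; both are hypothetical.

def sensitivity_specificity(truth, detected):
    """truth, detected: sequences of booleans, one entry per time window."""
    tp = sum(t and d for t, d in zip(truth, detected))              # detected events
    tn = sum((not t) and (not d) for t, d in zip(truth, detected))  # quiet windows left alone
    fn = sum(t and (not d) for t, d in zip(truth, detected))        # missed events
    fp = sum((not t) and d for t, d in zip(truth, detected))        # false alarms
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

truth    = [True, True, False, False, True, False]
detected = [True, False, False, True, True, False]
print(sensitivity_specificity(truth, detected))  # both 2/3 for this toy labeling
```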

I do not have any prior information, only what the algorithm gives me.

Thank you
 
StarWars said:
I do not have any prior information, only what the algorithm gives me.

If that means that you have no way to compare the algorithm's detections to real events, then I think you should resort to simulating data containing "real" events and seeing how well the algorithm detects them. Of course, to simulate such data you need an algorithm or model to generate it.
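This simulation approach might look like the following sketch. The noise model (Gaussian), the event shape (a simple amplitude step), and the threshold detector are all illustrative assumptions standing in for the poster's actual signals and algorithm:

```python
# Sketch: inject simulated events into noise, then score a detector
# against the known injection times. Noise model, event shape, and the
# detector itself are stand-in assumptions.
import random

def make_trace(n=2000, event_starts=(500, 1400), event_len=50, snr=3.0, seed=0):
    rng = random.Random(seed)
    trace = [rng.gauss(0.0, 1.0) for _ in range(n)]  # background noise
    for s in event_starts:
        for i in range(s, s + event_len):
            trace[i] += snr  # amplitude step standing in for a generated sound
    return trace

def detect(trace, window=50, threshold=2.0):
    """Flag windows whose mean amplitude exceeds the threshold."""
    hits = []
    for start in range(0, len(trace) - window, window):
        if sum(trace[start:start + window]) / window > threshold:
            hits.append(start)
    return hits

trace = make_trace()
print(detect(trace))  # should flag the windows holding the injected events
```

Because the injection times are known, hits and misses can be counted exactly, which is what makes sensitivity and specificity computable here.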

If you want to do statistical hypothesis testing on each set of data, you need a "null hypothesis", which could be that no sounds are present and that the data is generated by some specific random process. You need a way to compute the probability of getting similar data when those assumptions are true. If you have no algorithm or formula to compute this probability then you can't do hypothesis testing.
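For concreteness, here is a minimal Monte Carlo version of such a test, under the assumed null hypothesis that a stretch of data is pure Gaussian noise. The test statistic (maximum windowed mean) and the noise model are illustrative choices, not something the thread prescribes:

```python
# Sketch: Monte Carlo hypothesis test. Null hypothesis: the data is pure
# Gaussian noise. Statistic: the maximum mean over non-overlapping windows.
# The p-value is the fraction of noise simulations at least as extreme.
import random

def max_window_mean(x, window=50):
    return max(sum(x[i:i + window]) / window
               for i in range(0, len(x) - window + 1, window))

def p_value(data, window=50, n_sims=500, seed=1):
    rng = random.Random(seed)
    observed = max_window_mean(data, window)
    count = 0
    for _ in range(n_sims):
        sim = [rng.gauss(0.0, 1.0) for _ in range(len(data))]
        if max_window_mean(sim, window) >= observed:
            count += 1
    return count / n_sims
```

A trace containing a genuine amplitude increase should yield a small p-value, while pure noise should not; the point of the reply stands: without a formula or algorithm for this probability, no such test is possible.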

There may be situations in engineering and science where people have developed standard methods of dealing with your problem. You can try asking about your problem in the engineering or science sections of the forum and give more details.
 
