Second opinion for an Index Problem

  • Context: Undergrad 
  • Thread starter: JudasIscariot
  • Tags: Index
SUMMARY

The discussion centers on a proposed solution for identifying low-volume files based on record counts in a system that processes files every five minutes. The method involves calculating a mean A from historical data over six months, determining a mean B from the lowest 30 records, and establishing a minimum index point for comparison. A flaw identified in the approach is the lack of a clear definition for "low-volume," which may lead to inaccurate assessments. Additionally, considerations for outlier detection and order statistics are suggested as relevant concepts for improving the solution.

PREREQUISITES
  • Statistical analysis for mean calculations
  • Understanding of outlier detection methods
  • Familiarity with order statistics
  • Basic knowledge of data processing systems
NEXT STEPS
  • Research statistical methods for defining low-volume thresholds
  • Learn about outlier detection techniques in data analysis
  • Explore order statistics and their applications in data sets
  • Investigate alternative methods for monitoring file volume in real-time systems
USEFUL FOR

Data analysts, software engineers, and system architects involved in file processing and performance monitoring will benefit from this discussion.

JudasIscariot
I would like to hear a second opinion about this solution that was presented to us.

As background, we have a system that computes the total number of records in a given file. A new file arrives every 5 minutes. For monitoring purposes, the problem is how to determine whether a newly arrived file is a low-volume file, meaning its record count is very low compared to the other files.

The solution that was presented to us is this. First, the mean A of the total record counts from the past 6 months is computed. Then the data are sorted from lowest to highest, the 30 lowest values are taken, and their mean B is computed. Mean B is then divided by mean A, and the resulting value is called the minimum index point.

Afterwards, the total record count of the newly arrived file is divided by mean A. The result is compared against the minimum index point. If it is lower than the minimum index point, an alarm is generated informing the user that the current file is a low-volume file.
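
To make the steps concrete, here is a minimal Python sketch of the method as described. The function names, the parameter k (defaulting to 30), and the sample data are illustrative assumptions and not part of the actual system.

```python
from statistics import mean

def minimum_index_point(history, k=30):
    """Return (mean_a, min_index) from a list of historical record counts.

    history and k are hypothetical names; k=30 follows the description above.
    """
    if len(history) < k:
        raise ValueError("need at least k historical record counts")
    mean_a = mean(history)                # mean A: all historical counts
    if mean_a <= 0:
        raise ValueError("mean A must be positive")
    mean_b = mean(sorted(history)[:k])    # mean B: the k lowest counts
    return mean_a, mean_b / mean_a        # minimum index point = B / A

def is_low_volume(new_count, mean_a, min_index):
    """Alarm condition: the new file's index is below the minimum index point."""
    return new_count / mean_a < min_index

# Made-up history: 100 files with record counts 100, 101, ..., 199
history = list(range(100, 200))
mean_a, min_index = minimum_index_point(history)
print(is_low_volume(110, mean_a, min_index))  # True
print(is_low_volume(120, mean_a, min_index))  # False
```

One thing the sketch makes visible: because both sides of the comparison are divided by the same positive mean A, the test is mathematically the same as checking whether the new record count is below mean B. The division by mean A rescales the numbers but does not change which files trigger the alarm.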

Is there any flaw with this method? Or is there a more efficient solution for this particular problem?
 
The only flaw in it is that you have not defined what 'low-volume' means. As far as I can tell, you seem to have declared something to be low-volume if it fails this test. That may or may not be what you want - we don't know.
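
One way to see what definition of 'low-volume' the test implies is to run it against the historical data itself and count how many past files would have failed. The Python sketch below does that; the function name, the parameter k, and the sample data are assumptions made for illustration only.

```python
from statistics import mean

def flagged_fraction(history, k=30):
    """Fraction of historical record counts that the proposed test would flag."""
    mean_a = mean(history)                # mean A: all historical counts
    mean_b = mean(sorted(history)[:k])    # mean B: the k lowest counts
    min_index = mean_b / mean_a           # minimum index point
    flagged = [c for c in history if c / mean_a < min_index]
    return len(flagged) / len(history)

# Made-up history: 100 files with record counts 100, 101, ..., 199
history = list(range(100, 200))
print(flagged_fraction(history))  # 0.15
```

For this made-up history the test would have flagged 15% of past files. Whether a rate like that matches what you mean by low-volume is the part that still has to be decided.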
 
