Find Function/Transform for signal that minimizes CV of data

In summary, the goal is to find a function/transform "F" that can be applied to the variable recorded in the third column ("col3_val") such that the coefficient of variation (CV) of all the "R" values is minimized. The CV should definitely be less than 3%, but I expect a good solution could easily make it less than 1%. A plot of the R values against the VF values should show no trend/pattern, where VF = [#.log_val]/[Ts] and Ts is the total time for each log (i.e., the value in the last row of the first column for each log).
  • #1
johnpjust
Warning...this requires scripting and iteration, and is not theoretical -- it is a real problem I haven't been able to solve, but I'm sure someone here can... :-)

Data: each .csv file is one test sampled at 7.5 Hz, and each file has 3 columns. The first column is time in seconds, the second column is a multiplier (see formula below), and the third column is the measured value (to be "transformed"). There is also a corresponding value for each log in the "W.csv" file.

Formula (to produce a value for each file): R = [W_log_Val] / [#.log_val] -->
  • W_log_Val is the corresponding value for that file located in the "W.csv" file.
  • the #.log_val = ∑(col2_val)*(F(col3_val)) (a summation over the rows in the file)
  • F(col3_val) is the function/transformation of the measured value to be found
Goal: A function/transform "F" that can be applied on the variable recorded in the third column ("col3_val") such that the coefficient of variation of all the "R" values is minimized. The CV should definitely be less than 3%, but I expect a good solution could easily make it less than 1%.
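
The formula and goal above can be sketched in a few lines of Python. This is a minimal illustration, not the poster's script: the function names (`log_val`, `r_values`, `cv_percent`), the in-memory layout of the logs, and the numbers are all made up for the example, and the sample standard deviation is assumed for the CV.

```python
import math
import statistics

def log_val(rows, F):
    """#.log_val: sum of col2_val * F(col3_val) over the rows of one log."""
    return sum(col2 * F(col3) for _t, col2, col3 in rows)

def r_values(logs, w, F):
    """R = W_log_Val / #.log_val for every log."""
    return {name: w[name] / log_val(rows, F) for name, rows in logs.items()}

def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean, an assumption)."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

# Made-up logs standing in for the .csv files: (time_s, col2_val, col3_val)
logs = {
    "log1": [(0.0, 1.0, 4.0), (0.133, 1.0, 9.0)],
    "log2": [(0.0, 2.0, 16.0), (0.133, 1.0, 4.0)],
}
# Made-up stand-in for the values in W.csv
w = {"log1": 10.0, "log2": 20.0}

R = r_values(logs, w, math.sqrt)      # F = sqrt as a trial transform
cv = cv_percent(list(R.values()))
print(R, cv)
```

Any candidate F can be passed in place of `math.sqrt`; the quantity to minimize is the returned CV.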

Additional Constraint: A plot of the R values against VF values should show no trend/pattern, where -->
  • VF = [#.log_val]/[Ts]
  • "Ts" = the total time for each log (i.e., the value in the last row of the first column for each log)
See an example of the trend when no transform is applied in attached jpg.
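
One possible way to put a number on the "no trend" constraint is a correlation between R and VF. The sketch below is an assumption about how to check it, not something stated in the post: the helper names are hypothetical, the values are made up, and a Pearson correlation only detects linear trends, so a plot is still worth inspecting.

```python
import math
import statistics

def vf(rows, F):
    """VF = #.log_val / Ts, with Ts read from the last row's time value."""
    total = sum(col2 * F(col3) for _t, col2, col3 in rows)
    ts = rows[-1][0]
    return total / ts

def pearson_r(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up log: (time_s, col2_val, col3_val)
rows = [(0.0, 1.0, 4.0), (1.0, 1.0, 9.0), (2.0, 1.0, 16.0)]
vf_sqrt = vf(rows, math.sqrt)

# Made-up per-log results showing a strong downward trend of R with VF
vf_values = [1.0, 2.0, 3.0, 4.0]
r_values = [4.0, 3.0, 2.0, 1.0]
trend = pearson_r(vf_values, r_values)
print(vf_sqrt, trend)
```

A correlation near zero is necessary but not sufficient for "no pattern"; a curved relationship can hide behind a small linear correlation.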

Example:
Applying a square-root (SQRT) transformation to the measured value "col3_val" helps significantly, but the CV is still around 8-9% and the result does not satisfy the no-trend constraint.

See example of trend after applying sqrt in attached PDF
 

Attachments

  • data.zip
  • Fit R by VF.jpg
  • sqrtPlot.pdf
  • #2
Without additional constraints the problem will have a mathematical solution you clearly don't want: some weird jumping function that works exactly with the dataset used to produce it (and nothing else), based on tuning the function value for specific col3_val appearing in your dataset.

To generalize the sqrt attempt, you can take the nth power of the values and see which n works best (for real n).
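
A sweep over the exponent n could look like the sketch below. It runs on synthetic data generated here (not the attached data), with W built from a known exponent so the sweep has a known answer; `TRUE_N`, the grid spacing, and the function names are assumptions for illustration.

```python
import random
import statistics

def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

def cv_for_power(logs, w, n):
    """CV of R across logs when F(x) = x**n."""
    r = [w_k / sum(c2 * x ** n for c2, x in rows) for rows, w_k in zip(logs, w)]
    return cv_percent(r)

# Synthetic stand-in data: 8 logs of (col2_val, col3_val) rows whose col3
# ranges differ from log to log, so a wrong exponent shows up in the CV.
rng = random.Random(0)
TRUE_N = 0.3
logs = [
    [(rng.uniform(0.5, 1.5), rng.uniform(1.0, 10.0 * (k + 1))) for _ in range(20)]
    for k in range(8)
]
w = [2.0 * sum(c2 * x ** TRUE_N for c2, x in rows) for rows in logs]

# Sweep n over 0.05, 0.10, ..., 2.00 and keep the exponent with the lowest CV.
candidates = [i * 0.05 for i in range(1, 41)]
best_n = min(candidates, key=lambda n: cv_for_power(logs, w, n))
best_cv = cv_for_power(logs, w, best_n)
print(f"best n = {best_n:.2f}, CV = {best_cv:.4f}%")
```

On real data the minimum will not reach zero; the sweep only shows which exponent in the grid gives the lowest CV.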

With no transform applied, the plot looks close to $$ R \approx \frac{1}{VF} = \frac{Ts}{[\#.log\_val]} $$ The dependence on #.log_val follows directly from the definition of R, $$ R=\frac{[W\_log\_Val]}{[\#.log\_val]} $$ which is suspicious.
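
To spell out the step behind this remark, using only the two definitions from the first post: multiplying R by VF cancels #.log_val, so for any choice of F $$ R \cdot VF = \frac{[W\_log\_Val]}{[\#.log\_val]} \cdot \frac{[\#.log\_val]}{Ts} = \frac{[W\_log\_Val]}{Ts} $$ and therefore $$ R = \frac{[W\_log\_Val]/Ts}{VF} $$ holds exactly. R and VF share the same denominator term, so a 1/VF-shaped trend appears whenever [W_log_Val]/Ts varies little from log to log compared with #.log_val.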
 
  • #3
Thanks for your response.

I've tried the obvious stuff already (including a sweep over powers n). I agree with your implicit point that one does not want to over-fit the data when trying to generalize the fit. However, I'm at the point where I need to "step up" to another level of complexity, because I know this same issue exists in a larger data set. I'm not sure whether that means a sigmoidal function or something similar. My experience with the problem tells me that if someone knows of a more "flexible" nonlinear transformation (more tunable parameters that still produce monotonically increasing values post-transformation), then a solution is possible that will generalize well to the larger data set.
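
As one illustration of a "flexible but monotone" transform of the kind described here, a piecewise-linear function with positive increments stays increasing no matter how its parameters are tuned. This is a sketch of one possible parametrization, not a method proposed in the thread; `make_monotone_pwl`, the knot positions, and the step values are all hypothetical.

```python
import bisect
import math

def make_monotone_pwl(knots, raw_steps):
    """Build a piecewise-linear F that increases between the knots.

    knots:     strictly increasing x positions (length k)
    raw_steps: k-1 unconstrained reals; exp() turns each one into a
               positive height increment, which enforces monotonicity.
    """
    heights = [0.0]
    for s in raw_steps:
        heights.append(heights[-1] + math.exp(s))

    def F(x):
        # Clamp to the knot range, then interpolate linearly.
        x = min(max(x, knots[0]), knots[-1])
        i = min(bisect.bisect_right(knots, x) - 1, len(knots) - 2)
        t = (x - knots[i]) / (knots[i + 1] - knots[i])
        return heights[i] + t * (heights[i + 1] - heights[i])

    return F

# Hypothetical knots and parameters, chosen only to show the shape.
F = make_monotone_pwl([0.0, 1.0, 4.0, 10.0], [0.0, 0.5, -1.0])
samples = [F(0.1 * i) for i in range(0, 101)]
print(samples[0], samples[10], samples[-1])
```

An optimizer can vary `raw_steps` freely without ever producing a decreasing F. Outside the knot range this sketch is flat, so the knots should span the observed col3_val range, and more knots mean more freedom to over-fit.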

Also, I do realize the stipulation/constraint I have looks somewhat suspicious at first, but it does have physical meaning and the added "W_log_Val" term is what makes it an independent term (so it turns out OK).
 
  • #4
- (x+c)^n
- log(x+c)
- e^(cx)
- atan(x/c + d)
- arbitrary, scaled sums of the options above (including n=0 which gives a constant)
- sine, cosine, if there is a motivation to expect oscillations
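
Any of the parametric candidates above can be tuned with a small search over its parameters. The sketch below does a plain grid search for the (x+c)^n family on synthetic data with a known answer; the data, grids, and function names are assumptions for illustration. Because R is W divided by a sum of F values, multiplying F by a constant rescales every R equally and leaves the CV unchanged, so only shape parameters such as c and n need tuning.

```python
import itertools
import random
import statistics

def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean)."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)

def shifted_power(x, c, n):
    """(x + c)**n; increasing in x when n > 0 and x + c > 0."""
    return (x + c) ** n

def grid_search(logs, w, family, grid):
    """Try every parameter combination; return (best_params, best_cv)."""
    best = None
    for params in itertools.product(*grid):
        r = [w_k / sum(c2 * family(x, *params) for c2, x in rows)
             for rows, w_k in zip(logs, w)]
        score = cv_percent(r)
        if best is None or score < best[1]:
            best = (params, score)
    return best

# Synthetic stand-in data: (col2_val, col3_val) rows, W built from a known F.
rng = random.Random(1)
logs = [
    [(rng.uniform(0.5, 1.5), rng.uniform(1.0, 10.0 * (k + 1))) for _ in range(20)]
    for k in range(8)
]
w = [3.0 * sum(c2 * shifted_power(x, 2.0, 0.5) for c2, x in rows) for rows in logs]

c_grid = [j * 0.5 for j in range(0, 11)]   # c = 0.0, 0.5, ..., 5.0
n_grid = [i / 10 for i in range(1, 16)]    # n = 0.1, 0.2, ..., 1.5
best_params, best_cv = grid_search(logs, w, shifted_power, (c_grid, n_grid))
print(best_params, best_cv)
```

The same `grid_search` accepts any other family with the signature `family(x, *params)`, for example a log(x+c) or atan(x/c + d) candidate. A grid is the simplest option; a numerical optimizer would scale better once several candidates are summed.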
 

1. What is the purpose of finding a function/transform for a signal that minimizes CV of data?

The purpose of this task is to reduce the coefficient of variation (CV) of a signal, which is a measure of the variability in the data. By finding a function or transform that minimizes the CV, we can make the data more consistent and reliable for analysis.

2. How is the CV of data calculated?

The CV of data is calculated by dividing the standard deviation of the data by the mean and multiplying by 100. It is expressed as a percentage and is a measure of how much the data varies in relation to the mean. A higher CV indicates a greater variability in the data.
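
Written as a formula, with the sample standard deviation s assumed (the text above does not say whether the sample or population form is meant): $$ \mathrm{CV} = \frac{s}{\bar{x}} \times 100\% $$ For example, the values 9, 10 and 11 have mean 10 and sample standard deviation 1, giving $$ \mathrm{CV} = \frac{1}{10} \times 100\% = 10\% $$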

3. What are some common functions/transforms used to minimize the CV of data?

Some common functions/transforms used to minimize CV include logarithmic, square root, and power transformations. These transformations can help to reduce the impact of extreme values and improve the normality of the data. Other methods such as data smoothing and outlier removal can also help to minimize the CV.

4. How do you know if a function/transform has successfully minimized the CV of data?

The best way to determine if a function/transform has successfully minimized the CV of data is to compare the CV before and after the transformation. If the CV has decreased significantly, then the transformation can be considered successful. It is also important to visually inspect the data to ensure that it appears more consistent and normally distributed after the transformation.

5. Are there any limitations to using a function/transform to minimize the CV of data?

Yes, there are some limitations to using a function/transform to minimize the CV of data. One limitation is that the transformation may not work well for all types of data. Additionally, the choice of transformation may be subjective and could potentially alter the interpretation of the data. It is important to carefully consider the data and the purpose of the analysis before selecting a transformation method.
