MHB How to calculate max/min scales on a scatter plot

  • Thread starter: expertalmost
  • Tags: Plot
AI Thread Summary
The discussion focuses on establishing smooth maximum and minimum lines for log scatter plots using time series data. The user seeks a robust mathematical method, noting that traditional mean and standard deviation approaches are ineffective due to data clumping. They currently use a rank-type smoothing method, averaging the largest and smallest values from a sample to create a 0-1 scaling system. The goal is to ensure that no more than 10-15% of values are clipped outside the established max/min lines. The conversation highlights the challenges of adapting methods for different datasets while maintaining a consistent scaling approach.
expertalmost
Good morning!

I have 3 log scatter plots for which I want to establish smooth maximum and minimum lines. What is the usual mathematical method for doing that? (Image and Excel file links below.)

The black lines on the scatter plot images are hand-drawn. The third scatter plot is especially tricky and is not amenable to a moving average plus standard deviation because of the data clumping. Note: this is time-series data, so new data constantly comes in; in other words, I cannot just use the whole data population in one shot.

Any ideas would be greatly appreciated.

Excel File: https://dl.dropboxusercontent.com/u/44057708/Three%20Scatters.xls
Image at: https://dl.dropboxusercontent.com/u/44057708/ThreeScatters.jpg
 
Can you give us a little more context? Here are some questions I have:

1. How is this data generated? What are you measuring?

2. Is it important that every single data point in one cluster lies between your smooth max and min lines? Or is it enough that the vast majority lie between the two lines?

3. What is the data rate of this data? That is, how fast is the data coming in?

4. Are there any other features you'd like to know about the data? Local peaks, for example?
 
Thank you for your time and questions! I appreciate your efforts. Here are some brief answers to your questions.

1) These come from financial market analysis; the values are the log-transformed values of market data. Whether the data is truly log-normal is not really a concern, since extremes are clipped and indicated as such. Using a mean/standard-deviation analysis on the third series does not work well due to the data clumping. I am looking for a solution elegant, robust, and general enough for all three data sets, and I have many groups of three data sets.

2) Not every point needs to lie between my max/min lines. I was targeting 80% on the minimum side, due to the paucity of points there and because zero is a less critical component, and 95% on the maximum side.

3) The data is coming in slowly. Only using daily analysis now.

4) In this case, I am not interested in local peaks other than how well they get smoothed in the final scaling.

Hope this helps define the problem more clearly :)

Thank you again for your interest.
 
You say that the mean/std dev approach doesn't work. What if you computed a moving average on the basis of a lot more data points? For example:

1. Fit a cubic polynomial to the data. Excel will do this quite readily. Suppose the result to be $f(t)$.
2. Compute the maximum deviation from the cubic, and construct an envelope around $f(t)$ thus: $f(t) \pm \text{max dev}$. That would guarantee all the data would be in the envelope.

However, the envelope might not be tight enough. To help you more, I think I still need to know your design requirements better. By what criteria would you judge the "goodness" of the envelope?
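
For what it's worth, here is a minimal sketch of that suggestion in Python rather than Excel: fit a cubic by least squares and build an envelope at plus/minus the largest deviation from the fit. The function name and the synthetic data are illustrative only.

```python
# Sketch of the cubic-fit-plus-envelope idea, assuming the data is available
# as numeric arrays t (time) and y (values). Not the poster's actual data.
import numpy as np

def cubic_envelope(t, y):
    """Fit f(t) as a cubic polynomial and return (fit, lower, upper) arrays."""
    coeffs = np.polyfit(t, y, deg=3)      # least-squares cubic fit
    f = np.polyval(coeffs, t)             # fitted values f(t)
    max_dev = np.max(np.abs(y - f))       # largest deviation from the fit
    return f, f - max_dev, f + max_dev    # envelope guaranteed to contain all points

# Illustrative usage with synthetic data:
t = np.linspace(0, 10, 200)
y = 0.1 * t**3 - t**2 + 3 * t + np.random.normal(0, 2, t.size)
fit, lower, upper = cubic_envelope(t, y)
```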
 
Thank you for your suggestions! You obviously know considerably more math than me and I appreciate your insights and experience. I will have to investigate cubic polynomials.

Right now I take the 25 largest/smallest of the last 100 elements and average them. I also add a standard-deviation amount to the max (ad hoc, yes!), then smooth by damping (multiplying each change by 0.1 before applying it). I was hoping there was a more elegant, robust, and general solution, as I have to tweak the standard-deviation and damping factors for different data sets.

The issue I have with using a larger data sample is the lag it introduces.

The use is quite simple: I use previous data to establish stable max/min levels so I can scale new values as they come in. That gives me a 0-1 range that is consistent and meaningful across data sets. As for goodness, again, it is ad hoc: no more than 10-15 percent of the values should be clipped above/below my max/min scale. So the upper black line is my 1 and the lower is my 0, and as new data comes in it is scaled to the most recent 0-1 range and then used to update the population sample. Standard time-series analysis (hopefully).
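
As a small illustration of that scaling step, here is a sketch in Python, assuming the running max/min levels are already maintained elsewhere; the function name and clipping behaviour follow the description above, but the details are illustrative.

```python
def scale_to_unit(value, running_min, running_max):
    """Map a new observation onto the current 0-1 scale, clipping outliers."""
    if running_max == running_min:
        return 0.0                          # degenerate range: avoid division by zero
    scaled = (value - running_min) / (running_max - running_min)
    return min(max(scaled, 0.0), 1.0)       # clip values outside the max/min lines
```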

My apologies for the lengthy replies. Not having much formal training in this, I end up using more words than probably necessary. Thank you for your patience! Hope it helps clarify what I'm trying to do. :)
 
I've generated a working solution and just wanted to post it for future generations ;)

Someone pointed out that because the data does not really fall under "standard error" assumptions, due to the gaps, medians plus/minus a standard deviation would not really work. I have confirmed this with many days of attempts.

Therefore, the other typical solution, as far as I can tell, is called rank-type smoothing or moving quantiles, with damping of the end result. Basically, using a sample size of 100, I take the average of the 10 largest items for the maximum and the average of the 25 smallest items for the minimum. I then dampen the changes to 10% for smoothing. This gives me maximum and minimum lines smooth enough to use as a 0-1 scale.
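
A minimal Python sketch of that procedure is below. The window size (100), rank counts (10 largest, 25 smallest), and 10% damping factor are the values stated above; the class and method names are illustrative only.

```python
from collections import deque

class RankSmoother:
    """Rank-type smoothing / moving quantiles with damping, as described above."""

    def __init__(self, window=100, n_top=10, n_bottom=25, damping=0.1):
        self.window = deque(maxlen=window)   # rolling sample of the most recent values
        self.n_top, self.n_bottom, self.damping = n_top, n_bottom, damping
        self.max_line = None
        self.min_line = None

    def update(self, value):
        """Add a new observation and return the damped (min_line, max_line)."""
        self.window.append(value)
        ordered = sorted(self.window)
        n = len(ordered)
        raw_max = sum(ordered[-self.n_top:]) / min(self.n_top, n)      # mean of largest items
        raw_min = sum(ordered[:self.n_bottom]) / min(self.n_bottom, n)  # mean of smallest items
        if self.max_line is None:            # first call: initialise without damping
            self.max_line, self.min_line = raw_max, raw_min
        else:                                # damp: apply only 10% of each change
            self.max_line += self.damping * (raw_max - self.max_line)
            self.min_line += self.damping * (raw_min - self.min_line)
        return self.min_line, self.max_line
```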

Hope this helps!
 
Glad you found something that worked! I can't say I understand it - this shows that you might not be so far below me in mathematical knowledge as you thought. ;)
 
