How to calculate max/min scales on a scatter plot

  • Context: MHB
  • Thread starter: expertalmost
  • Tags: Plot

Discussion Overview

The discussion revolves around methods for establishing smooth maximum and minimum lines on log scatter plots, particularly in the context of time series data from financial market analysis. Participants explore various mathematical approaches to address challenges posed by data clumping and the need for ongoing updates as new data arrives.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks a mathematical method to create smooth maximum and minimum lines for log scatter plots, noting issues with data clumping and the continuous influx of new data.
  • Another participant asks for clarification on the data generation process, the importance of data point inclusion between the lines, the data rate, and any additional features of interest.
  • The original poster explains that the data represents log-normal values from financial market analysis and specifies target percentages for data point inclusion between the max/min lines.
  • A suggestion is made to use cubic polynomial fitting to create an envelope around the data, with a request for further clarification on the criteria for evaluating the envelope's effectiveness.
  • The original poster describes their current ad hoc method of averaging extreme values and applying damping, expressing a desire for a more robust solution.
  • A later reply introduces a working solution involving rank-type smoothing or moving quantiles, detailing a specific approach using sample sizes and damping to achieve smooth lines.
  • Another participant expresses support for the original poster's findings, indicating a shared understanding of the mathematical concepts discussed.

Areas of Agreement / Disagreement

Participants generally agree on the challenges posed by the data and the need for effective smoothing techniques. However, multiple approaches are proposed, and no consensus is reached on a single best method, reflecting ongoing exploration and refinement of ideas.

Contextual Notes

Limitations include the dependence on specific data characteristics, such as clumping and the nature of the time series, which may affect the applicability of proposed methods. The discussion also highlights the variability in participant expertise and the informal nature of the solutions presented.

Who May Find This Useful

This discussion may be useful for individuals involved in data analysis, particularly in financial contexts, who are seeking methods for visualizing and interpreting time series data through scatter plots.

expertalmost
Good morning!

I have 3 log scatter plots for which I want to establish smooth maximum and minimum lines. What is the usual mathematical method for doing that? (Image and Excel file links below.)

The black lines on the scatter plot images are hand-drawn. The third scatter plot is especially tricky and not amenable to a moving average plus stddev because of the data clumping. Note: this is time-series data, so new data constantly comes in. In other words, I cannot just use the whole data population in one shot.

Any ideas would be greatly appreciated.

Excel File: https://dl.dropboxusercontent.com/u/44057708/Three%20Scatters.xls
Image at: https://dl.dropboxusercontent.com/u/44057708/ThreeScatters.jpg
 
Can you give us a little more context? Here are some questions I have:

1. How is this data generated? What are you measuring?

2. Is it important that every single data point in one cluster lies between your smooth max and min lines? Or is it enough that the vast majority lie between the two lines?

3. What is the data rate of this data? That is, how fast is the data coming in?

4. Are there any other features you'd like to know about the data? Local peaks, for example?
 
Thank you for your time and questions! I appreciate your efforts. Here are some brief answers to your questions.

1) I work in financial market analysis, and these are the log-normal values of those series. Whether the data is actually/truly log-normal is not really a concern, as extremes are clipped and indicated as such. Using mean/stddev analysis on the third series does not work well due to the data clumping. I am looking for a solution elegant/robust/general enough for all three data sets, and I have many groups of three data sets.

2) Not every point needs to lie between my max/min. I was targeting 80% on the minimum side, due to the paucity of points there and because zero is a less critical component, and 95% on the maximum side.

3) The data is coming in slowly. Only using daily analysis now.

4) In this case, not interested in local peaks other than how well they get smoothed in the final scaling.

Hope this helps define the problem more clearly :)

Thank you again for your interest.
 
You say that the mean/std dev approach doesn't work. What if you computed a moving average on the basis of a lot more data points? For example:

1. Fit a cubic polynomial to the data. Excel will do this quite readily. Suppose the result to be $f(t)$.
2. Compute the maximum deviation from the cubic, and construct an envelope around $f(t)$ thus: $f(t) \pm \text{max dev}$. That would guarantee all the data would be in the envelope.

However, the envelope might not be tight enough. To help you more, I think I still need to know your design requirements better. By what criteria would you judge the "goodness" of the envelope?
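
For reference, here is a minimal Python sketch of this cubic-fit-plus-envelope idea, assuming NumPy is available; the arrays `t` and `y` are placeholders standing in for one of the poster's log series, not the actual data:

```python
# Rough illustration (not from the thread) of fitting a cubic f(t) and
# building an envelope f(t) +/- max deviation around the data.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)         # time index (placeholder)
y = np.log(rng.lognormal(size=200))     # stand-in data, roughly normal on a log scale

coeffs = np.polyfit(t, y, deg=3)        # 1. fit a cubic polynomial f(t)
f = np.polyval(coeffs, t)

max_dev = np.max(np.abs(y - f))         # 2. largest deviation from the cubic
upper = f + max_dev                     # envelope guaranteed to contain every point
lower = f - max_dev
```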
 
Thank you for your suggestions! You obviously know considerably more math than me and I appreciate your insights and experience. I will have to investigate cubic polynomials.

Right now I take the 25 largest/smallest of the last 100 elements and average each set. I also add a stddev amount to the max (ad hoc... yes!). Then I smooth by damping (multiplying each change by 0.1 and applying that). I was hoping there was a more elegant/robust/general solution, as I have to tweak the stddev and damping factors for different data sets.

The issue I have with using a larger data sample is the lag it introduces.

The use is quite simple. I use previous data to establish stable max/min levels so I can scale new values as they come in. That gives me a 0-1 range that is consistent and meaningful across data sets. As far as goodness goes, that is again ad hoc: no more than 10-15 percent of the values should be clipped above/below my max/min scale. So the upper black line is my 1 and the lower is my 0; as new data comes in, it is scaled to the most recent 0-1 range and then used to update the population sample. Standard time-series analysis (hopefully).

My apologies for the lengthy replies. Not having much formal training in this, I end up using more words than probably necessary. Thank you for your patience! Hope it helps clarify what I'm trying to do. :)
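
As a rough Python sketch of the ad hoc procedure described in this post (a rolling window of 100 values, averaging the 25 most extreme, a stddev cushion on the max, and 10% damping), with the understanding that the function names `update_scale` and `scale_value` are made up for illustration rather than taken from the thread:

```python
import numpy as np

def update_scale(window, prev_max, prev_min, damping=0.1, k=25):
    """window: the most recent 100 log values; prev_max/prev_min: current lines."""
    s = np.sort(np.asarray(window, dtype=float))
    raw_min = s[:k].mean()              # average of the 25 smallest
    raw_max = s[-k:].mean() + s.std()   # average of the 25 largest plus a stddev cushion
    # damp: move only 10% of the way toward the new raw values
    new_max = prev_max + damping * (raw_max - prev_max)
    new_min = prev_min + damping * (raw_min - prev_min)
    return new_max, new_min

def scale_value(x, line_max, line_min):
    """Map a new value onto the 0-1 range defined by the current min/max lines."""
    return (x - line_min) / (line_max - line_min)
```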
 
I've generated a working solution and just wanted to post it for future generations ;)

Someone pointed out that, because the data does not really fall under "standard error" assumptions due to the gaps, medians +/- stddev would not really work. I have confirmed this with many days of attempts.

Therefore, the other typical solution, as far as I can tell, is called rank-type smoothing or moving quantiles, with damping of the end result. Basically, using a sample size of 100, I take an average of the 10 largest items for the maximum and an average of the 25 smallest items for the minimum. I then dampen the changes to 10% for smoothing. This gives me smooth enough maximum and minimum lines to use as a 0-1 scale.

Hope this helps!
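
A minimal Python sketch of this moving-quantile approach as described above (window of 100, mean of the 10 largest for the max line, mean of the 25 smallest for the min line, changes damped to 10%); the class name `MovingQuantileScale` is illustrative only:

```python
from collections import deque
import numpy as np

class MovingQuantileScale:
    def __init__(self, window=100, n_top=10, n_bottom=25, damping=0.1):
        self.buf = deque(maxlen=window)       # rolling sample of recent values
        self.n_top, self.n_bottom, self.damping = n_top, n_bottom, damping
        self.max_line = None
        self.min_line = None

    def update(self, x):
        """Add a new observation and return the smoothed (max, min) lines."""
        self.buf.append(x)
        s = np.sort(np.asarray(self.buf, dtype=float))
        raw_max = s[-self.n_top:].mean()      # mean of the 10 largest
        raw_min = s[:self.n_bottom].mean()    # mean of the 25 smallest
        if self.max_line is None:             # initialise on the first call
            self.max_line, self.min_line = raw_max, raw_min
        else:                                 # damp: apply only 10% of each change
            self.max_line += self.damping * (raw_max - self.max_line)
            self.min_line += self.damping * (raw_min - self.min_line)
        return self.max_line, self.min_line
```

Feeding each new daily value through `update()` then gives the 0-1 scaling as `(x - min_line) / (max_line - min_line)`, matching the scaling use described earlier in the thread.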
 
Glad you found something that worked! I can't say I understand it - this shows that you might not be so far below me in mathematical knowledge as you thought. ;)
 
