Median interpolation on logarithmic data

The thread asks how to find the 10th percentile of a data set grouped into bins whose widths increase logarithmically. The poster applies the median interpolation formula, obtains an implausible result, and asks whether the method is suitable for logarithmic data. The reply proposes a geometric approach suited to such bins and walks through the steps for calculating the 10th percentile.
  • #1
thomas49th

Homework Statement


Hello, I have a set of data that is grouped into bins. The bin sizes increase logarithmically by 10% (1.1 x the previous bin).


Homework Equations


Bin Mean Radius   Frequency
1.03              4924
1.10              9938
1.21              14009
1.32              12269
1.44              15813
1.58              18723
1.74              21471
...               ...

My total frequency is 612574.
I used Excel's FREQUENCY function. For the first group, any data that is greater than or equal to 1.03 and less than 1.10 is put into that group. So you can write it as

1.03 <= x < 1.10

so the lower class boundary is 1.03.

The Attempt at a Solution



Using the median interpolation formula:

lower class boundary + ((n/2 - [cumulative frequency of the previous classes]) / frequency of class) * class width

So for the 10th percentile:

it lies in class 1.58.

1.58 + (((612574/2) - 56953)/18723) * (1.74 - 1.58)

= 3.7107

Which cannot be. Is this because I am using logarithmic data? Is there a more suitable way?
Thanks
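
As a quick check of which class holds the 10th percentile, the cumulative frequencies of the listed bins can be tabulated against a target rank of 0.10 * n. This is a minimal sketch that uses only the seven bins shown in the post and the stated total; the variable names are illustrative.

```python
from itertools import accumulate

# Lower class boundaries and frequencies as listed in the post (first seven bins only).
lower_bounds = [1.03, 1.10, 1.21, 1.32, 1.44, 1.58, 1.74]
frequencies = [4924, 9938, 14009, 12269, 15813, 18723, 21471]
total = 612574  # total frequency stated in the post

target_rank = 0.10 * total  # rank of the 10th percentile, not n/2
cumulative = list(accumulate(frequencies))

# First class whose cumulative frequency reaches the target rank.
class_index = next(i for i, c in enumerate(cumulative) if c >= target_rank)

print("cumulative frequencies:", cumulative)
print("target rank:", target_rank)
print("class containing the 10th percentile starts at:", lower_bounds[class_index])
```

The cumulative frequency of the classes before 1.58 comes out as 56953, the same figure used in the attempt.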
 
  • #2


Hi there,

Thank you for sharing your data and your attempt. The interpolation formula itself is fine for grouped data like this. The impossible result comes from using n/2, which is the rank of the median. For the 10th percentile the target rank is n/10, so the formula becomes

lower class boundary + ((n/10 - cumulative frequency of the previous classes) / frequency of class) * class width

Because your bins widen by a constant factor, you may also want to interpolate geometrically. This assumes the values are spread evenly in log(x) within the class rather than evenly in x, which matches the way the bins were built.

To work this through, you can follow these steps:

1. Find the target rank and confirm the class. In this case, it would be:

0.10 * 612574 = 61257.4

The cumulative frequency below 1.58 is 56953 and the cumulative frequency below 1.74 is 75676, so the 10th percentile does lie in the 1.58 class.

2. Next, calculate how far through the class the target rank falls. In this case, it would be:

(61257.4 - 56953)/18723 = 0.2299

3. Finally, interpolate between the class boundaries. Linearly, it would be:

1.58 + 0.2299 * (1.74 - 1.58) = 1.617

and geometrically:

1.58 * (1.74/1.58)^0.2299 = 1.615

Therefore, the 10th percentile of your data set is estimated to be about 1.62. With bins only 10% wide, the two interpolation methods agree to within about 0.002, so the choice between them matters far less than using the correct rank. I hope this helps. Best of luck with your research!
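
As a cross-check, the percentile of grouped data can be estimated by interpolating within the class that contains a rank of p * n. The sketch below is not taken from the thread: `grouped_percentile` and its `log_scale` flag are hypothetical names, and it assumes values are spread uniformly in x (linear) or uniformly in log(x) (geometric) inside the class. Only the bins listed in the original post are used.

```python
def grouped_percentile(edges, freqs, total, p, log_scale=False):
    """Estimate the p-th quantile (0 < p < 1) of grouped data.

    edges has one more entry than freqs: edges[i] and edges[i + 1] bound class i.
    total is the overall count, which may exceed sum(freqs) when only the
    leading classes are supplied.
    """
    target = p * total
    below = 0  # cumulative frequency of the classes before the current one
    for i, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            frac = (target - below) / f  # fraction of the way through the class
            lo, hi = edges[i], edges[i + 1]
            if log_scale:
                return lo * (hi / lo) ** frac  # interpolate in log(x)
            return lo + frac * (hi - lo)  # interpolate in x
        below += f
    raise ValueError("target rank lies beyond the supplied classes")


# Boundaries and frequencies from the post; 1.74 closes the 1.58 class.
edges = [1.03, 1.10, 1.21, 1.32, 1.44, 1.58, 1.74]
freqs = [4924, 9938, 14009, 12269, 15813, 18723]
total = 612574

linear_p10 = grouped_percentile(edges, freqs, total, 0.10)
log_p10 = grouped_percentile(edges, freqs, total, 0.10, log_scale=True)
print(round(linear_p10, 3), round(log_p10, 3))
```

Both estimates fall inside the 1.58 to 1.74 class, as any percentile located in that class must.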
 

1. What is median interpolation on logarithmic data?

Median interpolation estimates the median, or any other percentile, of grouped data by locating the class that contains the target rank and interpolating between that class's boundaries. On logarithmic data, where each class is a fixed ratio wider than the previous one, the interpolation can be carried out on the logarithms of the boundaries instead of on the boundaries themselves.

2. When is median interpolation on logarithmic data used?

It is used when only binned counts are available rather than the raw values, and the bins are spaced by a constant factor, for example to cover a large range of values with a manageable number of classes.

3. How is median interpolation on logarithmic data different from linear interpolation?

Linear interpolation assumes the values are spread uniformly across the class, so the estimate is the lower boundary plus a fraction of the class width. Logarithmic interpolation assumes the values are spread uniformly in log(x), so the estimate is the lower boundary multiplied by the ratio of the boundaries raised to that same fraction. For narrow classes the two estimates are nearly identical.

4. What are the advantages of using median interpolation on logarithmic data?

A percentile estimated this way depends only on the class counts and boundaries, so it is insensitive to extreme values in the tails. Interpolating on the logarithmic scale also treats every class in the same way when the ratio between successive boundaries is constant.

5. Are there any limitations to using median interpolation on logarithmic data?

Either form of interpolation assumes a particular spread of values within the class (uniform in x, or uniform in log(x)), which real data may not follow. The estimate also cannot be more precise than the class width allows, and it becomes unreliable when the class containing the target rank holds few observations.
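
How much the within-class assumption matters can be gauged by comparing linear and geometric interpolation across a single class. A minimal sketch, assuming the 1.58 to 1.74 class from the thread; the helper names `linear` and `geometric` are illustrative only.

```python
lo, hi = 1.58, 1.74  # boundaries of one class from the thread


def linear(frac):
    """Value at a given fraction of the class, assuming a uniform spread in x."""
    return lo + frac * (hi - lo)


def geometric(frac):
    """Value at a given fraction of the class, assuming a uniform spread in log(x)."""
    return lo * (hi / lo) ** frac


# Compare the two estimates at 101 evenly spaced fractions from 0 to 1.
fractions = [i / 100 for i in range(101)]
max_gap = max(linear(f) - geometric(f) for f in fractions)
print(f"largest gap within the class: {max_gap:.4f}")
```

For a class only about 10% wide the gap stays well below the class width, so the two assumptions give nearly the same percentile.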
