How Can I Remove Outlying Data Points from a Scatter Plot in MATLAB?

  • Thread starter Keithwkc
In summary, the user is trying to filter out data points that are too far from a given curve by replacing them with NaN's. They have tried several methods without success. A MATLAB solution using a distance-based criterion is provided to produce the desired output.
  • #1
Keithwkc
Greetings,

I have been trying to get rid of a bunch of scattered data points whose x coordinates lie more than a given distance from the x coordinates of a curve. These outlying x coordinates need to be changed into NaN's. The trouble is that many of these outlying points share the same y-axis value, and my code has only been able to remove one point from each set of points with the same y-axis value.
For example:

scattered point x-coord= [1 2 3 4 5 6 7 8 9 10]

scattered point y-coord= [2 2 10 10 2 4 4 4 5 5 ]

curve x-coord= [0 1 2 3 4 5 6 7 8 9]

curve y-coord= [0 1 2 3 4 5 6 7 8 9]

I need to change the elements of scattered point x-coord that lie more than 2 units away from the curve x-coord into NaN's, i.e.

desired scattered output x-coord= [1 2 NaN NaN NaN 6 NaN NaN NaN NaN]

Any advice on how to obtain the above "desired scattered output x-coord" will be greatly appreciated.

I have tried using xi(find(abs(xi-xj)>=1))=NaN and even a for loop that runs through every individual element:

for i=1:n %n=number of elements in the vector scattered output coord
index=find(sx==cx(i)) % sx= scattered x-coord and cx= curve x-coord
sx(find(abs(sx(index)-cy(i))))=NaN % cy= curve y coord
end



but have not been successful in obtaining the desired result.

Thank you in advance for your help.
 
  • #2




Thank you for reaching out. It seems you are trying to filter out data points that are too far from a given curve. One way to approach this is to use a distance-based criterion to identify the outliers and then replace them with NaN's. Here is an example of how you can do this in MATLAB:

% Define your data points
x = [1 2 3 4 5 6 7 8 9 10];
y = [2 2 10 10 2 4 4 4 5 5];

% Define your curve
curve_x = [0 1 2 3 4 5 6 7 8 9];
curve_y = [0 1 2 3 4 5 6 7 8 9];

% Set a distance threshold
threshold = 2;

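% Find the curve's x-coordinate at each data point's y-value.
% The desired output compares each point with the curve at the same height,
% so interpolate curve_x as a function of curve_y (the curve_y values must
% be distinct). 'extrap' covers y = 10, which lies beyond the end of the curve.
x_on_curve = interp1(curve_y, curve_x, y, 'linear', 'extrap');

% Calculate the horizontal distance between each data point and the curve
distances = abs(x - x_on_curve);

% Flag the points that are more than the threshold distance away
outliers = distances > threshold;

% Replace the outliers with NaN's
x(outliers) = NaN;

% Print the result
disp(x);

This code will give you the desired output of [1 2 NaN NaN NaN 6 NaN NaN NaN NaN]. You can adjust the threshold value to fit your specific needs.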
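
A variant worth considering: compare each point with the curve's x-value at the same y, and treat any point whose y-value falls outside the curve's y-range (here y = 10) as an outlier instead of extending the curve. This is only a minimal sketch under that assumption; the names x_on_curve, too_far, off_curve and x_filtered are illustrative and not part of the original question.

```matlab
% Scattered points and curve from the question
x = [1 2 3 4 5 6 7 8 9 10];
y = [2 2 10 10 2 4 4 4 5 5];
curve_x = 0:9;
curve_y = 0:9;
threshold = 2;

% Curve x at each point's y-value. Without 'extrap', interp1 returns NaN
% for y-values outside the range of curve_y (here y = 10).
x_on_curve = interp1(curve_y, curve_x, y);

% A point is an outlier if it is too far from the curve, or if the curve
% has no value to compare against at that y
too_far = abs(x - x_on_curve) > threshold;
off_curve = isnan(x_on_curve);

% Work on a copy so the original x stays available
x_filtered = x;
x_filtered(too_far | off_curve) = NaN;
disp(x_filtered);
```

For the example data this gives the same result as the desired output. The two approaches differ only for points above or below the ends of the curve, so which one fits depends on whether the curve can reasonably be extended.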

I hope this helps. Let me know if you have any further questions. Happy coding!
 

1. How do I identify outliers in my data?

To identify outliers, you can use statistical methods such as calculating the standard deviation and interquartile range, or visual methods such as box plots and scatter plots.
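
As a rough illustration of the statistical route, the sketch below flags outliers with MATLAB's isoutlier function, once with the interquartile-range rule and once with its default median-based rule. It assumes release R2017a or later, and the sample vector is made up for the example.

```matlab
% Hypothetical sample data with one obviously extreme value
data = [2 3 3 4 4 5 5 6 50];

% Interquartile-range rule: flag values more than 1.5*IQR beyond the quartiles
iqr_mask = isoutlier(data, 'quartiles');

% Default rule: flag values more than 3 scaled median absolute deviations
% from the median
mad_mask = isoutlier(data);

% Replace flagged values with NaN so that plots skip them
cleaned = data;
cleaned(iqr_mask) = NaN;

disp(find(iqr_mask));   % indices flagged by the IQR rule
disp(find(mad_mask));   % indices flagged by the median/MAD rule
disp(cleaned);
```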

2. Should I remove outliers from my data?

Whether or not to remove outliers depends on the nature of your data and the purpose of your analysis. In some cases, outliers may be valid data points that provide valuable insights. However, if the outliers are due to measurement errors or do not align with the overall pattern of the data, it may be appropriate to remove them.

3. What is the best method for removing outliers?

There is no one-size-fits-all method for removing outliers. Some common approaches include using the interquartile range to determine a threshold for outlier removal, using statistical models to identify and remove outliers, or manually inspecting and removing outliers. It is important to carefully consider the implications of each method and choose the most appropriate one for your data.

4. Will removing outliers affect my results?

Removing outliers can affect your results, as it changes the distribution and summary statistics of your data. It is important to carefully consider the impact of outlier removal on your analysis and communicate any changes made to your data.

5. Are there any alternatives to removing outliers?

Instead of removing outliers, you can also consider transforming your data or using robust statistical methods that are less affected by outliers. In addition, you can conduct sensitivity analyses to assess the impact of outliers on your results and make informed decisions about their inclusion in your analysis.
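
A simple sensitivity check along these lines is to compute a statistic with and without the suspect value and compare. The sketch below does this for the mean and the median on made-up data; the variable names are illustrative only.

```matlab
% Hypothetical sample data with one extreme value (50)
data = [2 3 3 4 4 5 5 6 50];
trimmed = data(1:end-1);      % same data with the extreme value left out

% The mean is pulled strongly toward the extreme value
mean_all = mean(data);
mean_trimmed = mean(trimmed);

% The median is a robust statistic and does not move for this data
median_all = median(data);
median_trimmed = median(trimmed);

fprintf('mean:   %.2f with, %.2f without\n', mean_all, mean_trimmed);
fprintf('median: %.2f with, %.2f without\n', median_all, median_trimmed);
```

If the two versions of a statistic agree closely, the outlier has little influence on that result and keeping it is usually harmless.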
