MATLAB Matlab help: indexing in for loops

AI Thread Summary
The user is experiencing an error in MATLAB when running a for loop with a large dataset, where the index unexpectedly becomes NaN, leading to an "index must be a positive integer" error. The issue arises during nested loops designed to compare data points for identifying those on the Pareto front, which is computationally intensive given the size of the dataset (up to 50,000 points). Suggestions include preallocating matrices to improve performance and considering alternative algorithms to reduce the number of comparisons needed. The user acknowledges that memory limitations on their older laptop may exacerbate these issues, indicating a need for algorithm optimization. The discussion highlights the importance of efficient coding practices in MATLAB to handle large datasets effectively.
sarawayland76
Hello!

I consider myself typically a confident Matlab user. But I have encountered a new problem that I have no idea how to solve, so I'm hoping I can find some help.

I have set up a for loop in my code to the effect of

for i = 1:n
    if MyArray(i,1) < threshold   % "threshold" stands in for some value
        % perform other calculations
    end
end

The code runs fine for small values of n. But, when I have a very large n (somewhere near 50,000), I get an error message:
? Attempted to access MyArray(NaN,1); index must be a positive integer or logical.
And, checking on the value of the index i, it has become NaN.

I don't understand how this is possible as Matlab should be incrementing i itself within the for loop. (I promise I'm not messing with the index inside the for loop!) I thought it might be a problem with the variable type of i, since I think values stored as integers have a relatively small limit, but it looked like i was stored as type long, which should be able to hold much larger than 50,000.

Does anyone have any suggestions? Your help is much appreciated!

-S
 
If you aren't touching i inside the loop then this looks really strange. Can you provide the code / MATLAB build so I can try to replicate the results? Perhaps cut down the code inside the loops to some simple operation which still produces the error message if you don't want to post the entire thing?
 
I have tried to create a simpler version of my code for posting. What this code does (or is supposed to do):
  • First, create a matrix (nrpts x 2) that contains my data (for this test code, just a bunch of random numbers). Next, create a nrpts x 1 array that contains the series 1, 2, 3, 4, etc. (this array is needed later in my real code, so it doesn't serve much purpose in the example code).
  • I have a pair of nested for loops to go through the data held in objectivePoints. The goal is to compare each pair of data points objectivePoints(i,:) to each other. I am trying to compare a point (a1, b1) to (a2, b2), and if both a1 < a2 and b1 < b2, set the index of the point (a2, b2) to zero in pointIndex.
  • Also, it's not meaningful to compare a point to itself, so there is an if check on count1 ~= count2.

I ran this code a few times, and didn't get an error every time, maybe half of the time. The error I get is (with various numbers instead of 63313):
? Attempted to access objectivePoints(63313,-2.14748e+009); index must be a positive integer or logical.
which really makes no sense, considering that I am only looking at objectivePoints(*,1) or (*,2). I occasionally get this error instead of the NaN error on my original code as well.
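A side note on the odd index value, offered as an observation rather than a diagnosis: the loop variable of a plain `for i = 1:n` is a double, which represents integers exactly far beyond 50,000, and the reported column index of -2.14748e+009 matches, to the displayed precision, the smallest 32-bit signed integer. That is consistent with some internal value having been corrupted or overflowed, but it doesn't prove it. A minimal sketch for checking these facts (the variable names are only for illustration):

```matlab
% Check what class a for-loop index has by default.
n = 5;
for i = 1:n
    loopClass = class(i);            % class of the loop variable
end

% Smallest 32-bit signed integer, converted to double for comparison.
smallestInt32 = double(intmin('int32'));

% Doubles represent every integer exactly up to 2^53.
largestExactDouble = 2^53;

disp(loopClass)
disp(smallestInt32)
```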

I am totally stumped! Please do let me know if you also encounter this error. Thanks! (Also, there is probably a far more efficient way to approach the comparison problem, so if you happen to have any great ideas on a better way to do that, let me know, since it is clearly a bit slow as I have it set up currently.)




clear all

% Number of points tested
nrpts = 1e5;

% Create example data set
objectivePoints = zeros(nrpts,2);
objectivePoints = rand(size(objectivePoints));

for k = 1:nrpts
    pointIndex(k,1) = k;
end


% Comparison test
for count1 = 1:nrpts
    for count2 = 1:nrpts
        if count1 ~= count2
            if objectivePoints(count1,1) < objectivePoints(count2,1)
                if objectivePoints(count1,2) < objectivePoints(count2,2)

                    pointIndex(count2,1) = 0;
                end
            end
        end
    end
end
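On the question of a more efficient comparison: one option, sketched here as an assumption rather than a tested fix for the error, is to keep the outer loop and replace the inner loop with a single logical-indexing step. It performs the same strict comparisons, just without the interpreted inner loop. The count1 ~= count2 check is no longer needed because a point can never be strictly less than itself. The smaller nrpts and the names dominated and remaining are chosen for illustration only.

```matlab
% Smaller data set than in the thread so the sketch runs quickly.
nrpts = 2000;
objectivePoints = rand(nrpts,2);
pointIndex = (1:nrpts)';

for count1 = 1:nrpts
    % Logical column: true for every point that is strictly larger than
    % point count1 in both coordinates.
    dominated = objectivePoints(:,1) > objectivePoints(count1,1) & ...
                objectivePoints(:,2) > objectivePoints(count1,2);
    pointIndex(dominated) = 0;
end

% Indices of the points that were never zeroed out.
remaining = pointIndex(pointIndex > 0);
```

This still makes on the order of nrpts^2 comparisons in total, so it reduces loop overhead rather than the amount of work.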
 

Ok, thanks. I have tried running the code and it is still going, but there is clearly no reason why MATLAB should be looking for a matrix index of about negative two billion, so something is probably making it explode. I suggest you try to reformulate your algorithm to do the same calculation using fewer iterations, because you're looking up matrix entries and comparing them 10^10 times (two nested for loops of 100,000 iterations each). This is definitely why it is taking so long, and probably why your code is making MATLAB break.

One thing which isn't fully clear is your overall objective. I am confused because you are in some way evaluating every pair of points against some criteria, and if the pair meets these criteria you record only the index of one of the points in the pair. I am guessing this code is used to find the index of the point which is closer to the origin than any other (according to the Manhattan metric), since if any point is found to be further away than any other point you set its index to zero?

Perhaps there is more to this than I understand.
Anyway, try preallocating your pointIndex matrix with
pointIndex = zeros(nrpts,1);

one line before the loop which fills it. If you create a large matrix one row at a time it takes ages, since MATLAB can't just add a row on: each time, it must create a new, larger matrix and copy the existing entries across.
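A minimal sketch of the preallocation suggestion, using the variable names from the posted code. The second option, building the vector with the colon operator, is an addition for comparison and not something proposed in the thread; pointIndexDirect and same are illustrative names.

```matlab
nrpts = 1e5;

% Option 1: preallocate once, then fill inside the loop.
pointIndex = zeros(nrpts,1);
for k = 1:nrpts
    pointIndex(k,1) = k;
end

% Option 2: build the same column vector directly, with no loop at all.
pointIndexDirect = (1:nrpts)';

% Both approaches give identical results.
same = isequal(pointIndex, pointIndexDirect);
disp(same)
```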

Second, I suggest you look for a different way of comparing distances. If you are actually looking for the index of the closest point, you can assign each point a Manhattan distance (x + y rather than sqrt(x^2 + y^2), because you are only allowed to travel north-south and east-west) and then find the minimum, using something like:

Code:
mdist = zeros(nrpts,1);
for i = 1:nrpts
    mdist(i) = objectivePoints(i,1) + objectivePoints(i,2);
end
[C,I] = min(mdist);
I

mdist is your distance from the origin and min finds the minimum and saves its value in C and its index in I.
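The same calculation can probably be written without the loop. The sketch below uses a small hypothetical data set; the abs call is an added assumption so that the result is still a Manhattan distance if a coordinate is negative (with rand data, as in the posted code, it makes no difference).

```matlab
% Hypothetical data: each row is one point (x, y).
objectivePoints = [3 4; 1 2; 5 0.5];

% Manhattan distance of every point from the origin, summed along each row.
mdist = sum(abs(objectivePoints), 2);

% C is the smallest distance, I is the row index of that point.
[C, I] = min(mdist);
disp([C I])
```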
 
Thanks for the reply.

I agree, this would definitely not be the best way to find the index of the closest point. The goal of the code is to locate points along the Pareto front. To illustrate, here's a diagram from Wikipedia:

http://en.wikipedia.org/wiki/File:Front_pareto.svg

All of the data points I have are the blue squares, and I'm trying to find points along the red line shown in the picture. But, as that picture shows, this code won't just return one point that is the closest, but many points. I am going through the comparison and looking at, say, point A and point C; if we see that f1(A) < f1(C) and f2(A) < f2(C), then we know that C is not on the Pareto front so I set its index to zero to remember that information.
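For what it's worth, here is a sketch of one way to make far fewer comparisons. It is not the method used in the thread: it sorts the points by f1 and then scans them once, remembering the smallest f2 seen among points with a strictly smaller f1. It assumes two objectives and the same strict test described above (C is discarded only if some A is strictly smaller in both). The data and the names onFront, bestF2, groupMin and frontIndex are hypothetical.

```matlab
% Hypothetical data: each row is one point (f1, f2).
objectivePoints = [1 5; 2 3; 3 4; 4 1; 2 6; 5 5];
nrpts = size(objectivePoints, 1);

% Sort rows by f1 (ties broken by f2); order maps back to original rows.
[sortedPts, order] = sortrows(objectivePoints);

onFront  = true(nrpts,1);
bestF2   = Inf;   % smallest f2 among points with strictly smaller f1
groupMin = Inf;   % smallest f2 within the current run of equal f1 values

for k = 1:nrpts
    % When f1 increases, the previous run becomes eligible to dominate.
    if k > 1 && sortedPts(k,1) ~= sortedPts(k-1,1)
        bestF2 = min(bestF2, groupMin);
        groupMin = Inf;
    end
    % Dominated if an earlier point is strictly smaller in f1 and in f2.
    if bestF2 < sortedPts(k,2)
        onFront(order(k)) = false;
    end
    groupMin = min(groupMin, sortedPts(k,2));
end

frontIndex = find(onFront);
disp(frontIndex')
```

The sort costs on the order of n log n operations and the scan is a single pass, compared with roughly n^2 comparisons for the nested loops.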

The problem I am having is that even with a very large number of data points, only a few will end up lying along the Pareto front. I typically end up with 20-100 points out of 50,000. Unfortunately, 20 points really isn't enough to discern the shape of the Pareto front (it ends up looking like the Wikipedia picture and not like a nice curve). This is why I have been trying to use very large numbers of points.

Your tip about using the Manhattan distance to compare the points is interesting and I'll have to keep thinking if there is a way I can use that. But for now, the problem is that
f1(A) < f1(B) AND f2(A) < f2(B) implies f1(A) + f2(A) < f1(B) + f2(B)
However, the converse isn't necessarily true.
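A small numerical counterexample, with made-up values, shows why the converse fails: A has the smaller sum but is not smaller than B in both objectives.

```matlab
% Hypothetical points, written as [f1 f2].
A = [1 9];
B = [5 6];

% A has the smaller sum (10 versus 11) ...
sumSmaller = sum(A) < sum(B);

% ... but A is not strictly smaller in both objectives, since 9 > 6.
aDominatesB = all(A < B);

disp([sumSmaller aDominatesB])
```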

Anyway, after all that trouble, I think the real problem is the scale of the loop. I often get "out of memory" errors when running Matlab (on my five-year-old laptop), so I suspect that the errors I'm getting are really more side effects of that. I will have to keep working on improving my algorithm instead.

Thanks for your help!
 