Why does my loop run slower with larger lists in Python?

Isabella Hope
TL;DR
Noticing slower execution when looping through larger Python lists. Wondering if this is normal behavior or if I’m misunderstanding how Python handles iteration internally.
Hi everyone,


I’m practicing Python and noticed something strange.
When I run a simple loop on a small list, it's fast, but when I run the same loop on a much larger list, the execution time increases more than I expected.


Example (simplified version):


Python:
data = list(range(1_000_000))
count = 0
for i in data:
    count += i

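To make the comparison concrete, this is roughly how the timing can be measured with time.perf_counter (the sizes and variable names here are just for illustration):

```python
import time

for n in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    data = list(range(n))      # pass 1: build the list
    count = 0
    for i in data:             # pass 2: iterate over it
        count += i
    elapsed = time.perf_counter() - start
    print(f"n={n:>9,}: {elapsed:.4f} s")
```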
My question:
Is this slowdown normal due to the size of the list, or is there something about Python’s loop execution that I might be misunderstanding?


I’m just trying to learn more about how Python handles iteration under the hood.
Any explanation would be appreciated.


Thanks!

Mentor note: cleaned up pasted code removing numerous span tags.
It is important to know what the algorithm is doing with the list. Some algorithms' running times grow exponentially; others do not. That has nothing to do with the computer language. You would have to rule that out before you blame the language.
 
You're right that the algorithm matters more than the language itself. I'm checking the time complexity now to see whether the slowdown comes from the way the list is being processed rather than from the language. Thanks for pointing that out.
 
Isabella Hope said:
when I run the same loop on a much larger list, the execution time increases more than I expected.
Can you be more precise? What execution time are you expecting to see?
 
Isabella Hope said:
You’re right the algorithm matters more than the language itself.
The "algorithm" here is very simple and looks linear in the size of the list--but actually you are running two linear algorithms--one to construct the list (the list(range) call), the second to iterate over it (the for loop).
 
How long does it take?
Python has a reputation of being very slow, but I would be surprised if something that simple took a noticeable time.
I have practically no personal experience with Python, so I am curious.
 
FactChecker said:
How long does it take?
Python has a reputation of being very slow, but I would be surprised if something that simple took a noticeable time.
I have practically no personal experience with Python, so I am curious.
One of the reasons Python is slow is that any variable can hold any object. That makes it necessary to check the type of the object on every read of a variable, and then fetch the object's data (stored elsewhere, possibly at an effectively random location, potentially causing a slow memory read). Integers are immutable, so if you change one, a new integer object has to be created, which means calling the memory allocator.
With "count += i" you get two variable reads and the creation of one new int object on every pass through the loop.

Python is, however, faster with small integers (CPython caches the ints from -5 to 256). Those integer objects are all created when the interpreter starts, so Python doesn't have to allocate new ones. https://datasciocean.com/en/python-tutorial/small-integer-cache-in-python/

data = list(range(1_000_000)) will probably take longer than the loop itself, and since it also creates new int objects, it too runs faster with small ints.

Both the creation of the list and the loop can also be faster for smaller lists because of CPU memory caching, so that could also be a factor, although I think the integer cache matters more.
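The small-integer cache can be observed directly in CPython (using int() on strings so the compiler doesn't fold equal literals into one constant; note this is an implementation detail, not something the language guarantees):

```python
small_a = int("100")
small_b = int("100")
print(small_a is small_b)  # True in CPython: both names point at the cached int 100

big_a = int("10000")
big_b = int("10000")
print(big_a is big_b)      # False in CPython: two distinct int objects
```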
 
PeterDonis said:
The "algorithm" here is very simple and looks linear in the size of the list--but actually you are running two linear algorithms--one to construct the list (the list(range) call), the second to iterate over it (the for loop).
Thanks for pointing that out. I actually didn't think about the fact that list(range(...)) is itself already an O(n) operation; I was only focusing on the loop. So in reality I'm timing two separate linear passes, which explains part of the slowdown. That definitely helps clarify things.
 
FactChecker said:
How long does it take?
Python has a reputation of being very slow, but I would be surprised if something that simple took a noticeable time.
I have practically no personal experience with Python, so I am curious.
Good question! On my machine, creating the list plus looping through 1,000,000 items takes a few hundred milliseconds in total. It's not "slow" in an absolute sense, but the increase becomes noticeable when I compare it with much smaller lists. I'm still trying to understand where that extra time comes from, so this discussion is really helpful.
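One comparison that helped me see where the time goes: moving the loop into C with the built-in sum() is much faster than the explicit Python-level loop, even though both are O(n):

```python
import time

data = list(range(1_000_000))

t0 = time.perf_counter()
total_loop = 0
for i in data:                 # interpreted bytecode runs for every iteration
    total_loop += i
t1 = time.perf_counter()

total_builtin = sum(data)      # the same loop, but inside the C implementation
t2 = time.perf_counter()

print(f"for loop: {t1 - t0:.4f} s")
print(f"sum():    {t2 - t1:.4f} s")
assert total_loop == total_builtin
```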
 
willem2 said:
One of the reasons Python is slow is that any variable can hold any object. That makes it necessary to check the type of the object on every read of a variable, and then fetch the object's data (stored elsewhere, possibly at an effectively random location, potentially causing a slow memory read). Integers are immutable, so if you change one, a new integer object has to be created, which means calling the memory allocator.
With "count += i" you get two variable reads and the creation of one new int object on every pass through the loop.

Python is, however, faster with small integers (CPython caches the ints from -5 to 256). Those integer objects are all created when the interpreter starts, so Python doesn't have to allocate new ones. https://datasciocean.com/en/python-tutorial/small-integer-cache-in-python/

data = list(range(1_000_000)) will probably take longer than the loop itself, and since it also creates new int objects, it too runs faster with small ints.

Both the creation of the list and the loop can also be faster for smaller lists because of CPU memory caching, so that could also be a factor, although I think the integer cache matters more.
Thanks for the detailed explanation; this makes things much clearer. I didn't realize how much overhead comes from Python treating every integer as an object, including the constant type checks and the creation of new int objects during count += i. That definitely explains why even a simple loop adds up over a million iterations.


Also, interesting point about the small-integer cache; I didn't know Python pre-allocates those objects. It makes sense now why range() performs better with small numbers. Appreciate the link too; I'll read more into this.
 
I spoke too soon when I said your example looked simple. If it is setting up a linked list of integers (data), possibly scattered around in memory, and then tracing through that linked list inside the loop, that is not so simple. The short time you mentioned does not seem so bad; it's not as though it is just incrementing an integer counter.
 
FactChecker said:
If it is setting up a linked list of integers
A Python list is not a linked list; the internal implementation at the C level is an array of pointers to the objects in the list.

There is some extra overhead while adding items to the list, since every so often a new, larger array of pointers has to be allocated and the pointers moved.
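That over-allocation can be watched with sys.getsizeof: the size of the list object jumps only occasionally as items are appended, because CPython grows the pointer array in chunks (the exact growth pattern is an implementation detail):

```python
import sys

lst = []
last_size = sys.getsizeof(lst)
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # a reallocation happened: a larger pointer array was allocated
        print(f"len={len(lst):>2}: {last_size} -> {size} bytes")
        last_size = size
```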
 
