Overly large Python Jupyter Notebook (.ipynb) files

  • Context: Python
  • Thread starter: WWGD
  • Tags: files, Python

Discussion Overview

The discussion revolves around the unexpectedly large file sizes of certain Python Jupyter Notebook (.ipynb) files compared to others that are similar in content and structure. Participants explore potential reasons for the size discrepancies, including the presence of output data and images, as well as the implications of coding practices that may lead to larger files. Additionally, a secondary question regarding the use of keyboard shortcuts within a Jupyter Notebook is raised.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Homework-related

Main Points Raised

  • Some participants suggest that the large file sizes may be due to the notebooks saving output from all cells, which can include large amounts of data or images.
  • One participant proposes that the large notebooks might contain images generated by libraries like matplotlib.
  • Another participant mentions that the size could be attributed to running infinite loops or extensive data generation in the code.
  • There is a discussion about the method to clear outputs before saving to potentially reduce file size.
  • Participants inquire about the operating system and method of launching Jupyter, suggesting that these factors may influence the experience and functionality.
  • One participant expresses that they do not use plots or complex features, relying solely on basic Python code, which raises questions about the reasons for their large file sizes.
  • Another participant shares their experience with coding practices that led to larger files due to repeated outputs from inefficient loops.

Areas of Agreement / Disagreement

The discussion contains multiple competing views regarding the reasons for the large file sizes, and no consensus is reached on a single explanation. Participants offer various hypotheses, but uncertainty remains about the exact causes.

Contextual Notes

Participants mention the potential impact of coding practices, such as infinite loops and output generation, on file sizes. There are also references to the specific environments in which Jupyter is run, which may affect user experience.

Who May Find This Useful

This discussion may be useful for users of Jupyter Notebooks who are experiencing similar issues with file sizes, as well as those interested in optimizing their coding practices and understanding the implications of output data on file management.

WWGD
TL;DR
For reasons I can't figure out, some of my Python notebook files are far larger than others with similar content.
Hi all,
I was looking at my virtual file manager in Jupyter, and in the listing of notebooks, which are all similar to each other in size and scope, some files stand out in terms of size for no apparent reason. I have some 40 notebooks. All but two range from 10 KB to 284 KB at the extremes; the other two, also notebooks with the same .ipynb extension, are 77.3 MB and 91 MB respectively. As I said, the latter two are very similar to the other 38: regular notebooks with Python code. Why would these two files be so much larger than the other 38?

EDIT: I hope it's OK to post an additional question in the same post:
How do we do a Ctrl+Alt+Delete within a Python notebook? The thing is, this notebook is part of a virtual back-end server and not part of the physical machine.
 
WWGD said:
Summary: For reasons I can't figure out, some of my Python notebook files are far larger than others with similar content.

Hi all,
I was looking at my virtual file manager in Jupyter, and in the listing of notebooks, which are all similar to each other in size and scope, some files stand out in terms of size for no apparent reason. I have some 40 notebooks. All but two range from 10 KB to 284 KB at the extremes; the other two, also notebooks with the same .ipynb extension, are 77.3 MB and 91 MB respectively. As I said, the latter two are very similar to the other 38: regular notebooks with Python code. Why would these two files be so much larger than the other 38?
Do the large ones contain images from e.g. matplotlib?

WWGD said:
EDIT: I hope it's OK to post an additional question in the same post:
How do we do a Ctrl+Alt+Delete within a Python notebook? The thing is, this notebook is part of a virtual back-end server and not part of the physical machine.
Windows, Linux, iOS...? Also how did you start Jupyter - from the command line, Anaconda Navigator...?

Try visiting http://localhost:8888/tree#running in a browser and selecting 'shutdown' on the appropriate process.

If Windows: try switching to the terminal window running the Jupyter Notebook process with Alt-Tab then hit Ctrl-C.
 
When you save the notebook, it saves the output of all of the cells, which may include large amounts of data or images. That is probably why the files are so large. If, before you save the file, you go to the Kernel menu and click "Restart & Clear Output", it will clear all of the output in the cells. Then when you save it, you will just be saving the code in the cells, not the output, which will probably make the file much smaller.
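For notebooks that are already saved and too large to open comfortably, the same clean-up can be done from outside Jupyter. The following is a minimal sketch, assuming the standard nbformat 4 layout in which each code cell keeps its results in an "outputs" list; the function name `clear_outputs` and any file names are made up for illustration.

```python
import json
from pathlib import Path


def clear_outputs(src, dst):
    """Write a copy of the notebook at src to dst with all cell outputs removed.

    Returns the file sizes in bytes as (before, after).
    """
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        # Only code cells carry outputs; markdown cells are left untouched
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    Path(dst).write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return Path(src).stat().st_size, Path(dst).stat().st_size
```

A call such as `clear_outputs("big.ipynb", "big_clean.ipynb")` (hypothetical file names) writes to a separate file, so the original stays available until the cleaned copy has been checked in Jupyter.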
 
pbuk said:
Do the large ones contain images from e.g. matplotlib?

Windows, Linux, iOS...? Also how did you start Jupyter - from the command line, Anaconda Navigator...?

Try visiting http://localhost:8888/tree#running in a browser and selecting 'shutdown' on the appropriate process.

If Windows: try switching to the terminal window running the Jupyter Notebook process with Alt-Tab then hit Ctrl-C.
Thank you for your reply. I am using Windows 10, and I start Jupyter in neither of the ways you listed: I enter 'Jupyter' into the search box, then the OS opens a command-line window and gives me access through localhost:8888. I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
 
WWGD said:
Thank you for your reply. I am using Windows 10, and I start Jupyter in neither of the ways you listed: I enter 'Jupyter' into the search box, then the OS opens a command-line window and gives me access through localhost:8888. I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
My largest Python source code file (of 55 different source code files) is 2 KB. For my purposes of merely learning Python syntax I don't need or use any IDE; I open a Win 10 command prompt window and run python.exe from that window.
 
Mark44 said:
My largest Python source code file (of 55 different source code files) is 2 KB. For my purposes of merely learning Python syntax I don't need or use any IDE; I open a Win 10 command prompt window and run python.exe from that window.
I am too used to the Jupyter notebook interface. But thanks, I will consider that.
 
WWGD said:
Thank you for your reply. I am using Windows 10, and I start Jupyter in neither of the ways you listed: I enter 'Jupyter' into the search box, then the OS opens a command-line window and gives me access through localhost:8888. I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
So does the http://localhost:8888/tree#running -> Shutdown method work for you?

There is no virtual machine involved, just Jupyter running in the background, executing your Python programs in a shell, with a web server as the interface.
 
WWGD said:
I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
This code will generate a pretty large .ipynb file:
[CODE lang="python" title="Infinite loop"]while True:
    print(1)[/CODE]
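To get a feel for how quickly printed output adds up, here is a rough sketch that mimics how a notebook stores stdout for a bounded version of the loop above. It assumes the nbformat 4 layout, where a code cell keeps printed text in a "stream" entry of its "outputs" list; the function name `stored_output_size` is made up for illustration, and real files may differ somewhat in size depending on how they are written to disk.

```python
import json


def stored_output_size(n_prints):
    """Approximate bytes of JSON needed to store n_prints printed lines of '1'."""
    # Assumed nbformat-4 shape: printed text lives in a "stream" output
    cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["for _ in range(n):\n", "    print(1)"],
        "outputs": [
            {
                "output_type": "stream",
                "name": "stdout",
                "text": ["1\n"] * n_prints,
            }
        ],
    }
    return len(json.dumps(cell).encode("utf-8"))


# Size grows roughly linearly with the number of printed lines
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} prints -> about {stored_output_size(n) / 1e6:.2f} MB")
```

Each printed line costs only a few bytes, but a loop that runs for millions of iterations before it is interrupted can plausibly leave tens of megabytes of text in the saved file.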
 
pbuk said:
So does the http://localhost:8888/tree#running -> Shutdown method work for you?

There is no virtual machine involved, just Jupyter running in the background, executing your Python programs in a shell, with a web server as the interface.
Thanks, my bad. I meant a virtual server at localhost. Thanks for the suggestion. I haven't gotten to my PC yet; I will let you know.
 
  • #10
pbuk said:
This code will generate a pretty large .ipynb file:
[CODE lang="python" title="Infinite loop"]while True:
    print(1)[/CODE]
Hmm... I now remember that I made several copies (for practice) of an algorithm to print all primes in a given range, from 2 to around 10,000. Maybe that explains it.
 
  • #11
Still nothing virtual. It's all running in the same Windows kernel in multiple processes spawned by the Jupyter server.
 
  • #12
WWGD said:
Hmm... I now remember that I made several copies (for practice) of an algorithm to print all primes in a given range, from 2 to around 10,000. Maybe that explains it.
Ya think :wink:?

If you change the extension to .json and open it up in a browser you will probably be able to see all those primes. Change back to .ipynb to open up again in Jupyter.
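Opening a file of tens of megabytes in a browser can be slow, so an alternative is to let Python report which cells hold the bulk of the saved output. This is only a sketch, assuming the usual nbformat 4 structure with a top-level "cells" list; `output_sizes` is a hypothetical helper name.

```python
import json
from pathlib import Path


def output_sizes(path):
    """Return (cell_index, approx_bytes_of_output) pairs, largest first."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    sizes = []
    for i, cell in enumerate(nb.get("cells", [])):
        # Markdown cells have no "outputs" key, so default to an empty list
        outputs = cell.get("outputs", [])
        sizes.append((i, len(json.dumps(outputs))))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

For example, `output_sizes("primes.ipynb")[:5]` (hypothetical file name) would list the five cells with the most stored output, which are the ones worth clearing or rewriting first.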
 
  • #13
Thanks again. Upon checking, I realized these were from the days before I became (a bit more) proficient with indentation issues. Let's just say I have Ctrl+C etched into my nervous system, to stop way too many infinite loops. Sadly, I am often still too impatient to sit down and write up the flowchart :(, so I keep repeating these indentation mistakes at times. This means some of the prime printouts wrote out the same prime more than once. If I were disciplined enough I would try to figure out the logic flaw. I will do it by this week's end, when I will look into indenting more carefully. Thanks.
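The original code was not posted, so the following is only a guess at the kind of indentation slip described: a "report this number" line that ends up inside the inner trial-division loop instead of after it. Both function names are hypothetical, and the lists stand in for what `print` would have sent to the notebook.

```python
def primes_buggy(lo, hi):
    """Collect what a mis-indented prime loop would print (hypothetical bug)."""
    printed = []
    for n in range(lo, hi + 1):
        for d in range(2, n):
            if n % d == 0:
                break
            # Mis-indented: this runs once for every d that does not divide n,
            # so a prime n is reported n - 2 times, and odd composites
            # such as 9 slip through before their first divisor is reached
            printed.append(n)
    return printed


def primes_fixed(lo, hi):
    """Trial division with the report moved to the loop's else clause."""
    printed = []
    for n in range(lo, hi + 1):
        for d in range(2, n):
            if n % d == 0:
                break
        else:
            # for-else: runs only when the inner loop finished without break
            printed.append(n)
    return printed


hi = 2_000  # smaller than the 10,000 in the thread, to keep the demo quick
print(len(primes_fixed(2, hi)), "lines from the fixed loop")
print(len(primes_buggy(2, hi)), "lines from the mis-indented loop")
```

With a slip like this, the number of printed lines grows much faster than the number of primes, which would be consistent with a notebook that looks ordinary but stores a very large amount of output.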
 
