Overly large Python Jupyter Notebook (.ipynb) files

  • Context: Python
  • Thread starter: WWGD
  • Tags: files, Python
SUMMARY

The discussion centers on the unexpectedly large sizes of two Python Jupyter Notebook (.ipynb) files, 77.3 MB and 91 MB, while the poster's other notebooks range from 10 KB to 284 KB. Participants identified saved cell outputs as the likely cause: a notebook stores the output of every executed cell, which can include extensive printed data or images. Here the culprit appears to be printed output from repeated runs of a prime-listing loop, some of which became infinite loops. To reduce file sizes, it is recommended to use the "Restart and clear output" option in the Kernel menu before saving. The conversation also touches on stopping Jupyter Notebook processes on Windows.

PREREQUISITES
  • Understanding of Jupyter Notebook interface and functionality
  • Familiarity with Python programming and code execution
  • Knowledge of managing processes in a Windows environment
  • Basic understanding of file formats, specifically .ipynb and .json
NEXT STEPS
  • Learn how to use the "Restart and clear output" feature in Jupyter Notebook
  • Explore methods for optimizing Jupyter Notebook file sizes
  • Investigate the impact of output data on .ipynb file sizes
  • Understand that .ipynb files are plain JSON and can be inspected as such (e.g., by renaming them to .json)
USEFUL FOR

This discussion is beneficial for data scientists, Python developers, and anyone utilizing Jupyter Notebooks for data analysis or educational purposes, particularly those facing issues with large notebook file sizes.

WWGD
Science Advisor
Homework Helper
TL;DR
For some strange-seeming reasons, some of my Python files are much larger than others with similar content
Hi all,
I was looking through my virtual file manager in Python Jupyter, and in the listing of notebooks, which are all similar to each other in length and scope, two files stand out in terms of size for no apparent reason. I have some 40 notebooks: all but 2 range from 10 KB to 284 KB at the extremes, and the other two, also notebooks with the same .ipynb extension, are 77.3 MB and 91 MB respectively. As I said, the latter two are very similar to the other 38: regular notebooks with Python code. Why would these two files be so much larger than the other 38?

EDIT: I hope it's OK to post an additional question in the same post:
How do we do a Ctrl+Alt+Delete within a Python notebook? The thing is, this notebook is part of a virtual back-end server and not part of the physical machine.
 
WWGD said:
Why would these two files be so much larger than the other 38?
Do the large ones contain images from e.g. matplotlib?

WWGD said:
How do we do a Ctrl+Alt+Delete within a Python notebook? The thing is, this notebook is part of a virtual back-end server and not part of the physical machine.
Windows, Linux, iOS...? Also how did you start Jupyter - from the command line, Anaconda Navigator...?

Try visiting http://localhost:8888/tree#running in a browser and selecting 'shutdown' on the appropriate process.

If Windows: try switching to the terminal window running the Jupyter Notebook process with Alt-Tab then hit Ctrl-C.
 
When you save the notebook, it saves the output of all of the cells, which may include large amounts of data or images. That's why the files are so large. If, before you save the file, you go to the Kernel menu and click "Restart and clear output", it will clear all of the output in the cells. Then, when you save it, you will be saving only the code in the cells, not the output, which will probably make the file much smaller.
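A minimal sketch of the same clean-up done outside Jupyter, assuming the notebook uses the nbformat 4 layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"). Because an .ipynb file is plain JSON, Python's standard json module is enough. The function and file names are hypothetical, and writing to a new file leaves the original untouched.

```python
import json

def clear_outputs(nb: dict) -> dict:
    """Empty the outputs and execution counts of every code cell (modifies nb in place)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # printed text, images, tables, errors
            cell["execution_count"] = None  # the In [n] counter shown beside the cell
    return nb

def clear_notebook_file(src: str, dst: str) -> None:
    """Read the notebook at src (plain JSON), clear its outputs, and write it to dst."""
    with open(src, encoding="utf-8") as f:
        nb = json.load(f)
    clear_outputs(nb)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")

# Hypothetical file names; writing to a new file keeps the original as a backup.
# clear_notebook_file("big.ipynb", "big_cleared.ipynb")
```

Markdown cells have no "outputs" key, so the sketch only touches code cells; the code itself (the "source" field) is left as it was.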
 
pbuk said:
Do the large ones contain images from e.g. matplotlib?

Windows, Linux, iOS...? Also how did you start Jupyter - from the command line, Anaconda Navigator...?
Thank you for your reply. I am using Windows 10, and I start Jupyter in neither of those ways: I enter 'Jupyter' into the search box, then the OS opens a command line and gives me access through localhost:8888. I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
 
WWGD said:
I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
My largest Python source code file (of 55 different source code files) is 2 KB. For my purposes of merely learning Python syntax I don't need or use any IDE; I open a Windows 10 command prompt window and run python.exe from that window.
 
Mark44 said:
For my purposes of merely learning Python syntax I don't need or use any IDE.
I am too used to the Jupyter notebook interface. But thanks, I will consider that.
 
WWGD said:
I enter 'Jupyter' into the search box, then the OS accesses the command line and gives me access through localhost 8888.
So does the http://localhost:8888/tree#running -> Shutdown method work for you?

There is no virtual machine involved, just Jupyter running in the background, running your Python programs in a shell, with a web server as the interface.
 
WWGD said:
I don't have any plots at all; I don't use anything other than Python proper in my overly large files.
This code will generate a pretty large .ipynb file:
[CODE lang="python" title="Infinite loop"]while True:
    print(1)[/CODE]
 
pbuk said:
So does the http://localhost:8888/tree#running -> Shutdown method work for you?

There is no virtual machine involved, just jupyter running in the background running your python programs in a shell and a web server as an interface.
Thanks, my bad: I meant a virtual server at localhost. Thanks for the suggestion. I haven't gotten to my PC yet; I will let you know.
 
pbuk said:
This code will generate a pretty large .ipynb file:
[CODE lang="python" title="Infinite loop"]while True:
    print(1)[/CODE]
Hmm... I remember now: I made several copies (for practice) of an algorithm to print all primes in a given range, from 2 to around 10,000. Maybe that explains it.
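As a rough check on that guess (a sketch; the sieve of Eratosthenes below is a stand-in, not the algorithm WWGD actually wrote), one can count the characters that `print(p)` would emit for every prime from 2 to 10,000:

```python
import math

def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: every prime p with 2 <= p <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off multiples of p from p*p on; smaller ones were already crossed off
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

primes = primes_up_to(10_000)
# The characters print(p) would send to the cell's output, one prime per line
text = "".join(f"{p}\n" for p in primes)
print(f"{len(primes)} primes, {len(text)} characters of output")
# 1229 primes, 5948 characters of output
```

That is roughly 6 KB per clean run, before the extra quoting and indentation the notebook's JSON adds. Reaching tens of megabytes would plausibly take a great many repeated runs or a loop that never stops, which fits the infinite loops WWGD mentions at the end of the thread.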
 
Still nothing virtual. It's all running in the same Windows kernel in multiple processes spawned by the Jupyter server.
 
WWGD said:
Hmm... I remember now: I made several copies (for practice) of an algorithm to print all primes in a given range, from 2 to around 10,000. Maybe that explains it.
Ya think? ;)

If you change the extension to .json and open the file in a browser, you will probably be able to see all those primes (an .ipynb file is already plain JSON; only the extension changes). Change it back to .ipynb to open it again in Jupyter.
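To find which cells account for the bulk of a file without scrolling through it, a short script can load the notebook as JSON and rank code cells by the size of their stored outputs. This is a hypothetical helper, again assuming the nbformat 4 layout; the byte counts are estimates, since re-serializing the outputs does not reproduce the file's exact whitespace.

```python
import json

def output_sizes(nb: dict) -> list[tuple[int, int]]:
    """Return (cell_index, approx_bytes) for each code cell's outputs, largest first."""
    sizes = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") == "code":
            # Re-serialize only this cell's outputs to estimate its share of the file
            approx = len(json.dumps(cell.get("outputs", [])).encode("utf-8"))
            sizes.append((i, approx))
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes

def report(path: str, top: int = 5) -> None:
    """Print the `top` code cells with the most stored output in the notebook at path."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    for i, approx in output_sizes(nb)[:top]:
        print(f"cell {i}: about {approx:,} bytes of output")

# Hypothetical file name:
# report("big.ipynb")
```

Cell indices count from 0 and include markdown cells, so they match the order of cells in the notebook.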
 
Thanks again. Upon checking, I realized these were from the days before I became (a bit more) proficient with indentation. Let's just say I have Ctrl+C etched into my nervous system from stopping far too many infinite loops. Sadly, I am often still too impatient to sit down and write up the flowchart :(, so I still repeat these indentation mistakes at times. This means some of the prime printouts wrote out the same prime more than once. If I were disciplined enough, I would try to figure out the logic flaw. I will do it by the end of this week, when I look into indenting more carefully. Thanks.
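WWGD's original code isn't shown, so the following is a hypothetical reconstruction of how a single indentation level can make trial division print the same prime several times. In the buggy version the append sits under an `else` attached to the `if`, so it runs once for every trial divisor that fails to divide `n`; in the correct version the `else` belongs to the inner `for` loop and runs only when no divisor was found.

```python
import math

def primes_in_range(lo: int, hi: int) -> list[int]:
    """Primes p with lo <= p <= hi by trial division (correct indentation)."""
    primes = []
    for n in range(max(lo, 2), hi + 1):
        for d in range(2, math.isqrt(n) + 1):
            if n % d == 0:
                break  # found a divisor: n is composite
        else:
            # for/else: runs only if the inner loop never hit break, so n is prime
            primes.append(n)
    return primes

def primes_buggy(lo: int, hi: int) -> list[int]:
    """Same loops with the else indented one level too far (hypothetical bug)."""
    primes = []
    for n in range(max(lo, 2), hi + 1):
        for d in range(2, math.isqrt(n) + 1):
            if n % d == 0:
                break
            else:
                primes.append(n)  # runs once per divisor d that fails to divide n
    return primes

print(primes_in_range(2, 30))         # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(primes_buggy(2, 30).count(29))  # 4
```

In the buggy version 29 is recorded four times (once each for d = 2, 3, 4, 5), 25 slips through because 2, 3 and 4 fail to divide it before 5 does, and 2 and 3 vanish because their inner loop never runs.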
 
