Should I rewrite my modules in order to implement json/pickle?

  • Context: Python 
  • Thread starter: Eclair_de_XII
  • Tags: Modules
SUMMARY

The discussion centers on the decision to rewrite existing Python modules to implement JSON or pickle for data persistence. While the participant expresses reluctance due to the complexity of their current 1,000-line module, they acknowledge the potential benefits of using structured data formats like JSON for efficiency and interoperability. Key considerations include the readability of CSV and TXT files for debugging and the complexity of the data being handled. Ultimately, the consensus suggests that unless significant performance improvements are expected, the effort to refactor may not be justified.

PREREQUISITES
  • Understanding of Python data serialization formats: JSON and pickle
  • Familiarity with data structures: arrays, objects, and tabular formats
  • Knowledge of file formats: .txt, .csv, and their use cases
  • Basic principles of software design: abstraction and data models
NEXT STEPS
  • Research the performance differences between JSON and pickle for data persistence in Python
  • Learn about Python's built-in libraries for handling CSV and JSON files
  • Explore best practices for structuring data in Python applications
  • Investigate the use of data layer modules to improve code maintainability
USEFUL FOR

Python developers, software engineers, and data analysts looking to optimize data storage and improve code efficiency through better data handling practices.

Eclair_de_XII
TL;DR
Basically, I wrote a bunch of code that web-scraped data off the internet into some objects, and stored those objects by writing them into .csv and .txt files. Is there any advantage to storing them in .json or .pkl files instead of the file types I mentioned? I am vaguely aware that the code to store the objects in the former is much less complicated than the code needed to store them in the latter. But, for example, is loading objects from .json or .pkl files faster?
I'm figuratively beating myself up for not knowing about these modules when I went to write my own. On one hand, I feel it would be a giant hassle to rewrite them just to implement some module. One of these modules is over a thousand lines long, which might be inconsequential to professional coders, but is certainly an accomplishment for an amateur. Rewriting that module alone, just to improve how efficiently it runs, is a task I don't look forward to; a silly concern, I should think, considering Python programs generally run slower than programs written in compiled languages such as C, C++, or C#. On the other hand, I feel it would be good practice to get into the habit of using efficient data-storage techniques.

I know I've made it clear that I definitely do not want to go through with this task, and that no one is forcing me to do so except myself. But my question remains: would loading objects from data-persistence files be faster or slower than loading them from normal text files?
 
It wouldn’t be materially different in loading time. JSON might be a good choice if you choose a sensible structure because you could use the data in another program without having to work out how to read it.
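For instance, here is a minimal sketch of what a sensible structure might look like for scraped records (the field names and URLs are invented for the example):

```python
import json

# Hypothetical records; any other program or language can read this file
# back without needing to know anything about the code that wrote it.
records = [
    {"url": "https://example.com/a", "title": "A", "price": 9.99},
    {"url": "https://example.com/b", "title": "B", "price": 4.50},
]

with open("scraped.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

with open("scraped.json", encoding="utf-8") as f:
    loaded = json.load(f)

assert loaded == records  # round-trips without any custom parsing code
```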

Writing a 1,000-line module that you dread refactoring is a very bad idea. Reading and writing data should have been in a separate ‘data layer’ module that could be refactored easily.

Study abstraction and data models so you do this better next time.
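As a rough illustration of the ‘data layer’ idea (the module and function names here are made up for the sketch):

```python
# datalayer.py -- hypothetical data-layer module. The rest of the program
# calls save_records()/load_records() and never touches file formats
# directly, so swapping CSV for JSON (or a database) only changes this file.
import json
from pathlib import Path


def save_records(records, path):
    """Persist records; the on-disk format is an implementation detail."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


def load_records(path):
    """Load records back, regardless of how they were stored."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```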
 
Eclair_de_XII said:
Would loading objects from data-persistence files be faster or slower than loading them from normal text files?
Probably not enough to make it worth the effort to rewrite your module to use them.
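If you want to check for yourself, here is a minimal timing sketch using only the standard library (the data shape is invented; real scraped objects may behave differently):

```python
import json
import pickle
import timeit

# Invented sample data: 10,000 small records standing in for scraped objects.
records = [{"name": f"item{i}", "value": i, "tags": ["a", "b"]}
           for i in range(10_000)]

json_blob = json.dumps(records)
pickle_blob = pickle.dumps(records)

# Time deserialization only, since loading speed is what was asked about.
t_json = timeit.timeit(lambda: json.loads(json_blob), number=100)
t_pickle = timeit.timeit(lambda: pickle.loads(pickle_blob), number=100)

print(f"json:   {t_json:.3f} s for 100 loads")
print(f"pickle: {t_pickle:.3f} s for 100 loads")
```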

There are also other considerations besides speed. CSV and TXT files are easier for humans to read (particularly CSV if the humans have a spreadsheet program handy that they can load the CSV file into). Often being able to read your data files independently of your program is very helpful for debugging. It's also helpful for others to understand what data your program is storing.
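As an illustration of that readability, here is a short sketch with the standard csv module (the file name and columns are hypothetical); the resulting file opens directly in any spreadsheet program:

```python
import csv

rows = [
    {"title": "A", "price": "9.99"},
    {"title": "B", "price": "4.50"},
]

# Write a plain CSV file that a human (or spreadsheet) can inspect as-is.
with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)

# Reading it back for debugging is just as direct.
with open("scraped.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row)  # {'title': 'A', 'price': '9.99'}, ...
```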
 
This is always the programmer's dilemma. Do I use one file format or another? Do I use a flat file or a database? Do I use key/value, JSON, CSV, or …?

The answer usually depends on many factors:

- how complex the data is: JSON favors structured, nested data, while CSV favors easy access via a spreadsheet app or a Python program;

- whether the data needs to be updated periodically: a database handles updates in place, whereas with a flat file you are left sorting the file and inserting the data, or appending it to the front or back…;

- whether some portion of the data needs to be selectively accessed: this favors a database over reading the whole file to find the data (see the sketch after this list);

- how the data will be used later on: by yourself, by others, or over the web…
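To make the selective-access point concrete, here is a minimal sketch with the standard-library sqlite3 module (the table and columns are invented for the example):

```python
import sqlite3

con = sqlite3.connect("scraped.db")
con.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, price REAL)")
con.executemany("INSERT INTO items VALUES (?, ?)",
                [("A", 9.99), ("B", 4.50)])
con.commit()

# Only the matching rows come back; no need to scan the whole file.
for row in con.execute("SELECT title, price FROM items WHERE price < 5"):
    print(row)  # ('B', 4.5)
con.close()
```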

This is why programmers dream, go mad, become managers, and then gods of data wrangling, forge empires and move the world…. I could go on but then I’d be making a molehill into a mountain fortress.
 
I don't really understand the question, and it seems to me that you do not fully understand what these file types represent.

.txt, .csv, or .json are all text files. What differentiates them is how the content is structured.
  • .txt is the general case and the content is a succession of characters without any specified structure;
  • .csv is text that is structured in a tabular form like a spreadsheet;
  • .json is text that represents structured data, like arrays and objects.
The only reason to use one versus another is portability. Any program knows what to expect from .csv and .json files, can quickly make sense of the information recorded, and can present a nice spreadsheet or expose the data as an array or object that is easy to search through.
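As an illustration, here is the same pair of invented records rendered in each structured format:

```python
import csv
import io
import json

records = [{"title": "A", "price": 9.99}, {"title": "B", "price": 4.50}]

# CSV: tabular, one row per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
# title,price
# A,9.99
# B,4.5

# JSON: structure and basic types are preserved.
print(json.dumps(records))
# [{"title": "A", "price": 9.99}, {"title": "B", "price": 4.5}]
```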

On the other hand, a .txt file could be anything, so a program can only present it as a succession of characters and let the user deal with it. But if the way you structured your data is known only to you and used only by you, it doesn't really matter. It is just that you might have reinvented the wheel for a structure that already exists, and introduced bugs that have long since been resolved in modules that have been tried and tested for a very long time.

If you have stored objects in a tabular form, it may well be advantageous to rewrite your code. You might even find that a lot of what you have written can be replaced by a lot less code. But there is also the great advice: «If it works, don't mess with it.»
 
jack action said:
tabular form
Yes, most of the auxiliary data I have stored in .csv files are pd.DataFrame, pd.Series, and dict objects.
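If so, pandas can already round-trip those objects in either format with one-liners; a minimal sketch (the file names and contents are hypothetical):

```python
import json

import pandas as pd

# Hypothetical stand-ins for the stored objects.
df = pd.DataFrame({"title": ["A", "B"], "price": [9.99, 4.50]})
meta = {"source": "example.com", "pages": 3}

# DataFrames round-trip through either format:
df.to_csv("aux.csv", index=False)
df_from_csv = pd.read_csv("aux.csv")

df.to_json("aux.json", orient="records")
df_from_json = pd.read_json("aux.json", orient="records")

# Plain dicts go straight to JSON:
with open("meta.json", "w", encoding="utf-8") as f:
    json.dump(meta, f)
```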
 
