Python: Should I rewrite my modules in order to implement json/pickle?

AI Thread Summary
The discussion centers on the challenges of rewriting a large module in Python to improve efficiency, particularly regarding data storage methods. The original poster expresses frustration over not utilizing existing modules and contemplates whether switching to data-persistence files would enhance performance compared to standard text files. It is noted that while JSON may offer structured data advantages, the loading time differences are negligible. The conversation highlights the importance of separating data handling into a dedicated module for easier refactoring and emphasizes the need to understand various file formats—.txt, .csv, and .json—each serving different purposes based on data complexity and accessibility. Ultimately, the consensus suggests that if the current implementation works, it may be best to avoid unnecessary changes, although transitioning to a more efficient structure could simplify the code significantly.
Eclair_de_XII
TL;DR Summary
Basically, I wrote a bunch of code that web-scraped data off the internet into some objects, and stored the objects by writing them into .csv and .txt files. Is there any advantage to storing them in .json or .pkl files, as opposed to the file types just mentioned? I am vaguely aware that the code to store objects in the former is much less complicated than the code needed to store them in the latter. But, for example, is loading objects from .json or .pkl files faster?
I'm figuratively beating myself up for not knowing about these modules when I went to write my own. On one hand, I feel like it would be a giant hassle to rewrite them just to implement some module. One of these modules is over a thousand lines long, which might be inconsequential to professional coders, but is certainly an accomplishment for an amateur. Rewriting that module alone makes me dread a task that might improve how efficiently it runs; a silly concern, I should think, considering that Python programs generally run slower than programs written in compiled languages such as C, C++, or C#. On the other hand, I feel like it would be good practice to get into the habit of using efficient data-storage techniques.

I am aware that I've made it clear I definitely do not want to go through with this task, and that no one is forcing me to do so except myself. But my question remains: would loading objects from data-persistence files be faster or slower than loading them from normal text files?
 
It wouldn’t be materially different in loading time. JSON might be a good choice if you choose a sensible structure because you could use the data in another program without having to work out how to read it.
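For illustration, a minimal sketch of that idea with the standard json module (the record shown is made up):

import json

# A made-up scraped record; any nesting of dicts, lists, strings,
# and numbers serializes directly.
record = {"ticker": "ABC", "prices": [101.5, 102.0], "source": "example.com"}

# Write it out as human-readable JSON...
with open("record.json", "w") as f:
    json.dump(record, f, indent=2)

# ...and any other program, Python or not, can read it back.
with open("record.json") as f:
    loaded = json.load(f)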

Writing a 1,000-line module that you dread refactoring is a very bad idea. Reading and writing data should have been in a separate ‘data layer’ module that could be refactored easily.
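A minimal sketch of what such a data layer might look like; the module name, file path, and function names here are hypothetical:

# datalayer.py -- hypothetical module: the rest of the program calls
# save_items()/load_items() and never touches file formats directly.
import json

DATA_PATH = "items.json"

def save_items(items):
    """Persist scraped items. Swapping JSON for CSV or pickle later
    means changing only this module."""
    with open(DATA_PATH, "w") as f:
        json.dump(items, f)

def load_items():
    with open(DATA_PATH) as f:
        return json.load(f)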

Study abstraction and data models so you do this better next time.
 
Eclair_de_XII said:
Would loading objects from data-persistence files be faster or slower than loading them from normal text files?
Probably not enough to make it worth the effort to rewrite your module to use them.

There are also other considerations besides speed. CSV and TXT files are easier for humans to read (particularly CSV if the humans have a spreadsheet program handy that they can load the CSV file into). Often being able to read your data files independently of your program is very helpful for debugging. It's also helpful for others to understand what data your program is storing.
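If you want to check the speed question against your own data rather than take it on faith, a rough timing sketch with the standard timeit module (the data and file names are made up):

import json
import pickle
import timeit

data = [{"id": i, "value": i * 0.5} for i in range(100_000)]

with open("data.json", "w") as f:
    json.dump(data, f)
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

def load_json():
    with open("data.json") as f:
        return json.load(f)

def load_pickle():
    with open("data.pkl", "rb") as f:
        return pickle.load(f)

print("json:  ", timeit.timeit(load_json, number=10))
print("pickle:", timeit.timeit(load_pickle, number=10))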
 
This is always the programmer's dilemma. Do I use one file format or another? Do I use a flat file or a database? Do I use key/value, or JSON, or CSV, or …?

The answer usually depends on many factors:

- how complex the data is (JSON favors structured data) versus ease of access (CSV favors easy access via a spreadsheet app or a Python program);

- whether the data needs to be updated periodically (a database handles updates in place, versus re-sorting the file and inserting, or appending to the front or back…);

- whether some portion of the data needs to be accessed selectively (a database can fetch just the matching rows, versus reading the whole file to find the data; see the sqlite3 sketch after this list);

- how the data will be used later on: by yourself, by others, over the web…
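On the selective-access point, a rough sketch using the standard-library sqlite3 module (the table and data are hypothetical):

import sqlite3

# Hypothetical table of scraped prices; a database can return one row
# without the program reading the whole file.
con = sqlite3.connect("scrapes.db")
con.execute("CREATE TABLE IF NOT EXISTS quotes (ticker TEXT, price REAL)")
con.executemany("INSERT INTO quotes VALUES (?, ?)",
                [("ABC", 101.5), ("XYZ", 42.0)])
con.commit()

# Selective access: only the matching row comes back.
row = con.execute("SELECT price FROM quotes WHERE ticker = ?",
                  ("ABC",)).fetchone()
print(row)  # (101.5,)
con.close()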

This is why programmers dream, go mad, become managers, and then gods of data wrangling, forge empires and move the world…. I could go on but then I’d be making a molehill into a mountain fortress.
 
I don't really understand the question, and it seems to me that you do not fully understand what these file types represent.

.txt, .csv, or .json are all text files. What differentiates them is how the content is structured.
  • .txt is the general case and the content is a succession of characters without any specified structure;
  • .csv is text that is structured in a tabular form like a spreadsheet;
  • .json is text that represents structured data, like arrays and objects.
The only reason to use one versus another is portability. Any program knows what to expect with .csv and .json files, can quickly make sense of the info recorded, and could present a nice spreadsheet or make the data available as an array or object to easily search through it.
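To make that concrete, here is the same made-up record in both forms, read back with the standard library:

import csv
import io
import json

# The same two made-up rows, once as CSV text and once as JSON text.
csv_text = "ticker,price\nABC,101.5\nXYZ,42.0\n"
json_text = '[{"ticker": "ABC", "price": 101.5}, {"ticker": "XYZ", "price": 42.0}]'

# csv gives flat rows of strings...
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0])     # {'ticker': 'ABC', 'price': '101.5'}  -- note: a string

# ...while json preserves types and nesting.
records = json.loads(json_text)
print(records[0])  # {'ticker': 'ABC', 'price': 101.5}    -- a float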

On the other hand, a .txt file could be anything, so a program can only present it as a succession of characters and let the user deal with it. But if the way you structured your data is only known to you and only used by you, it doesn't really matter. It is just that you may have reinvented the wheel for a structure that already exists, and introduced bugs that have already been resolved in programs/modules that have been tried and tested for a very long time.

If you have stored objects in a tabular form, maybe it would be advantageous to rewrite your code. You might even find that a lot of what you have written will be replaced with a lot less code. But there is also the great advice: «If it works, don't mess with it.»
 
jack action said:
tabular form
Yes, most of the auxiliary data I have stored in .csv files are pd.DataFrame, pd.Series, and dict objects.
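Since pandas is already in the picture, note that DataFrames carry their own one-line persistence methods, so a rewrite may shrink the code considerably. A minimal sketch (the file names are hypothetical):

import pandas as pd

df = pd.DataFrame({"ticker": ["ABC", "XYZ"], "price": [101.5, 42.0]})

# Each of these is one line; no hand-rolled file code needed.
df.to_csv("prices.csv", index=False)   # human-readable, spreadsheet-friendly
df.to_json("prices.json")              # portable structured text
df.to_pickle("prices.pkl")             # fast round trip, but Python-only

same_df = pd.read_pickle("prices.pkl")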
 