Should I rewrite my modules in order to implement json/pickle?

  • Python
  • Thread starter Eclair_de_XII
  • Tags
    Modules
In summary: As you can see, there are pros and cons to every approach. The best approach is usually something that works well for the specific situation.
  • #1
Eclair_de_XII
TL;DR Summary
Basically, I wrote a bunch of code that scraped data off the web into some objects, and stored the objects by writing them to .csv and .txt files. Is there any advantage to storing them in .json or .pkl files instead? I am vaguely aware that the code to store objects in .json or .pkl files is much less complicated than the code needed for .csv or .txt files. But, for example, is loading objects from .json or .pkl files faster?
I'm figuratively beating myself up for not knowing about these modules when I went to write my modules. On one hand, I feel like it would be a giant hassle to rewrite them just to implement some module. One of these modules is over a thousand lines long, which might be inconsequential to professional coders, but is certainly an accomplishment for an amateur. Rewriting that specific module alone just makes me not look forward to the task that might improve how efficiently it runs; a silly thought, I should think, considering Python programs generally run slower than programs written in compiled languages such as C/++/#. On the other, I feel like it would be good practice to get into the habit of using efficient data-storage techniques.

I am aware that I've made it clear that I definitely do not want to go through with this task, and I am aware that no one is forcing me to do so except for myself. But my question remains: Would loading objects from data-persistence files be faster or slower than loading them from normal text files?
 
  • #2
It wouldn’t be materially different in loading time. JSON might be a good choice if you choose a sensible structure because you could use the data in another program without having to work out how to read it.

Writing a 1,000 line module that you dread refactoring is a very bad idea. Reading and writing data should have been in a separate ‘data layer’ module that could be refactored easily.

Study abstraction and data models so you do this better next time.
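A minimal sketch of what such a separate data layer could look like, assuming the scraped objects can be represented as a list of dicts. The names `save_records` and `load_records` and the file name are hypothetical, not taken from the original module:

```python
import json
import tempfile
from pathlib import Path


def save_records(records, path):
    """Write a list of dicts to `path` as JSON."""
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


def load_records(path):
    """Read back the list of dicts written by save_records."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Example round trip in a temporary directory (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "scraped.json"
    save_records([{"name": "example", "price": 1.5}], target)
    restored = load_records(target)

print(restored)
```

If the rest of the program only ever calls these two functions, switching the storage format later means editing this one small module rather than the thousand-line one.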
 
  • #3
Eclair_de_XII said:
Would loading objects from data-persistence files be faster or slower than loading them from normal text files?
Probably not enough to make it worth the effort to rewrite your module to use them.
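One way to check this on your own machine is to time the three loaders directly. The sketch below uses made-up rows as a stand-in for the scraped data and parses from memory, so disk speed is not measured; the results will depend on the shape and size of the real data:

```python
import csv
import io
import json
import pickle
import timeit

# Hypothetical stand-in for scraped data: 10,000 small rows of strings.
rows = [{"id": str(i), "name": f"item{i}", "price": str(i * 0.5)} for i in range(10_000)]

# Serialize once, in memory, so that only the parsing step is timed.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "name", "price"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()
json_text = json.dumps(rows)
pickle_bytes = pickle.dumps(rows)


def load_csv():
    return list(csv.DictReader(io.StringIO(csv_text)))


def load_json():
    return json.loads(json_text)


def load_pickle():
    return pickle.loads(pickle_bytes)


repeats = 5
for loader in (load_csv, load_json, load_pickle):
    seconds = timeit.timeit(loader, number=repeats)
    print(f"{loader.__name__}: {seconds / repeats * 1000:.2f} ms per load")
```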

There are also other considerations besides speed. CSV and TXT files are easier for humans to read (particularly CSV if the humans have a spreadsheet program handy that they can load the CSV file into). Often being able to read your data files independently of your program is very helpful for debugging. It's also helpful for others to understand what data your program is storing.
 
  • #4
This is always the programmer's dilemma. Do I use one file format or another? Do I use a flat file or a database? Do I use key/value or JSON or CSV or …?

The answer usually depends on many factors:

- on how complex the data is (JSON favors structured data) vs ease of accessing the data (CSV favors easy access via a spreadsheet app or a Python program).

- on whether the data needs to be updated periodically, which favors a database vs sorting the file and inserting the data vs appending the data to the front or back…

- on whether some portion of the data needs to be selectively accessed, which favors a database vs reading the whole file to find the data

- on how the data will be used later on by yourself vs others vs web accessible…
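To make the database points in the list above concrete, here is a small sketch using Python's built-in `sqlite3` module. The table name, columns, and values are invented for illustration:

```python
import sqlite3

# In-memory database for illustration; pass a filename instead to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("apple", 0.5), ("pear", 0.75), ("plum", 0.25)],
)

# Update one row in place, without rewriting the rest of the data.
conn.execute("UPDATE items SET price = ? WHERE name = ?", (0.6, "apple"))

# Fetch only the rows of interest instead of reading everything.
cheap = conn.execute(
    "SELECT name, price FROM items WHERE price < ? ORDER BY name", (0.7,)
).fetchall()
print(cheap)
conn.close()
```

With a flat file, both the update and the selective read would normally mean loading the whole file and, for the update, writing it back out.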

This is why programmers dream, go mad, become managers, and then gods of data wrangling, forge empires and move the world…. I could go on but then I’d be making a molehill into a mountain fortress.
 
  • #5
I don't really understand the question, and it seems to me that you do not fully understand what these file types represent.

.txt, .csv, and .json are all text files. What differentiates them is how the content is structured.
  • .txt is the general case and the content is a succession of characters without any specified structure;
  • .csv is text that is structured in a tabular form like a spreadsheet;
  • .json is text that represents structured data, like arrays and objects.
The only reason to use one versus another is portability. Any program knows what to expect with .csv and .json files, can quickly make sense of the info recorded, and could present a nice spreadsheet or make the data available as an array or object to easily search through it.
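As a rough illustration of that portability, the sketch below writes the same two made-up records as CSV text and as JSON text using only the standard library, then parses each back:

```python
import csv
import io
import json

# Hypothetical records; values are strings so both formats round-trip identically.
records = [
    {"name": "apple", "price": "0.50"},
    {"name": "pear", "price": "0.75"},
]

# CSV: one header row, then one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# JSON: the same records as an array of objects.
json_text = json.dumps(records, indent=2)

# Either text can be parsed back by any program that knows the format.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)

print(csv_text)
print(json_text)
```

Note that CSV has no notion of types, so numbers come back as strings unless the reading code converts them; JSON keeps numbers, strings, lists, and nested objects distinct.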

On the other hand, a .txt file could be anything, so a program can only present it as a succession of characters and let the user deal with it. But if the way you structured your data is only known to you and only used by you, it doesn't really matter. It is just that you might have reinvented the wheel for a structure that already exists and introduced bugs that have already been resolved in programs/modules that have been tried & tested for a very long time.

If you have stored objects in a tabular form, maybe it would be advantageous to rewrite your code. You might even find that a lot of what you have written will be replaced with a lot less code. But there is also the great advice: «If it works, don't mess with it.»
 
  • #6
jack action said:
tabular form
Yes, most of the auxiliary data I have stored in .csv files are pd.DataFrame, pd.Series, and dict objects.
 

1. Should I rewrite my modules to implement json/pickle?

The decision to rewrite your modules to implement json/pickle should be based on several factors, such as the complexity of your data and the level of compatibility with your existing code. It may be beneficial if you plan on sharing your data with other programs or if you need to store complex data structures. However, if your data is simple and your current code works well, it may not be necessary to rewrite your modules.

2. What is the difference between json and pickle?

JSON and pickle are both ways of serializing and deserializing data, but they use different formats. JSON is a human-readable text format that is often used for transferring data between different systems, while pickle is a binary format that is specific to Python and is generally only read by Python programs.
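A short sketch of that difference in practice, using an invented record. Pickle restores Python-specific types as they were, while JSON needs them converted to basic types first:

```python
import datetime
import json
import pickle

# Hypothetical record containing types that JSON has no equivalent for.
record = {"scraped_on": datetime.date(2021, 1, 1), "pair": (1, 2)}

# pickle round-trips the date and the tuple unchanged.
from_pickle = pickle.loads(pickle.dumps(record))

# json refuses the date object outright.
json_error = None
try:
    json.dumps(record)
except TypeError as error:
    json_error = str(error)

# Converting to basic types first makes the record portable,
# at the cost of rebuilding the original types by hand after loading.
portable = {
    "scraped_on": record["scraped_on"].isoformat(),
    "pair": list(record["pair"]),
}
from_json = json.loads(json.dumps(portable))

print(from_pickle)
print(json_error)
print(from_json)
```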

3. Will rewriting my modules to implement json/pickle improve performance?

In most cases, rewriting your modules to use json/pickle will not significantly improve performance. In fact, it may even slow down your program since the data needs to be converted between formats. The performance benefit of using json/pickle is only noticeable when dealing with large or complex data structures.

4. Can I use both json and pickle in my program?

Yes, you can use both json and pickle in your program. However, it is important to keep in mind that they are not interchangeable and should be used for different purposes. JSON is more suitable for sharing data with other programs, while pickle is better for storing and retrieving data within a Python program.

5. Are there any security concerns when using json/pickle?

There are potential security concerns when using pickle since it can execute arbitrary code when loading data. It is important to only unpickle data from trusted sources to avoid any potential security risks. JSON, on the other hand, is not as vulnerable since it only contains data and does not have the ability to execute code.
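The pickle risk can be shown with a deliberately harmless sketch. The class name `Payload` is invented; its `__reduce__` method tells pickle to call `print` when the data is loaded, and a malicious file could name a far more dangerous function in the same way:

```python
import pickle


class Payload:
    """Hypothetical object whose pickle form calls a function on load."""

    def __reduce__(self):
        # pickle will call print("this ran during unpickling") when loading.
        return (print, ("this ran during unpickling",))


data = pickle.dumps(Payload())

# Loading runs the call above; the result is whatever that call returns,
# not a Payload instance.
loaded = pickle.loads(data)
print(type(loaded))
```

Parsing the same bytes with `json.loads` would simply fail with an error, because JSON parsing never calls functions named in the input.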
