Python Writing modules that require supplementary files

AI Thread Summary
When developing Python modules that require supplementary files, a key challenge is managing file accessibility for end-users, especially when using relative paths. Users may struggle to place files correctly, and automatic downloads can raise security concerns. It is suggested to provide clear instructions for file placement or allow users to specify file locations, potentially using environment variables for flexibility. Distributing data files alongside the module and utilizing setup.py for installation is recommended if the files remain consistent across versions. For efficiency, pre-processing data files and considering memory-mapped files can enhance load times.
Eclair_de_XII
TL;DR Summary
Let's say we have a module that allows the user to view sunset data, rent rates, and economic data for various regions of the United States. Let's say I have that data compiled. Would it be better for that data to be downloaded automatically when the program is run, or should it be included with the module with instructions on where to put it?
I'm working with Python modules at the moment, and I am having trouble deciding how best to include supplementary material that is accessed by my module, but not necessarily a part of it. The program works alright when I run it from the IDE, but it fails to find the files when it is run from the shell, because these files are referenced with relative paths. Let's say I want an end-user to access these files. Do I trust him or her to follow my instructions on where to place the files, or do I write more script to automatically download the files to the right locations for him or her?

1. On one hand, the user will not always know where to place the files, and they might trick the module by placing a file with the same name but incorrect contents in the right location.

2. On the other hand, the user might not be trusting of programs that automatically download things to their computer, even if they are prompted to confirm the download.

I feel like option 1 is better, because the user can view the contents of the file(s) beforehand, and the instructions are explicit enough:

Python:
from os import getcwd

# Name of the folder the user is asked to supply
folder = 'sunset_data'
message = 'Please place the "%s" folder into %s' % (folder, getcwd())
print(message)
 
These are all good questions that only you can answer based on feedback from users.

I usually consider startup costs as a key decision point. You don’t want the user to wait too long for your program to start up. Similarly, during program execution you have to decide how long the user might wait for you to do a download. If you can do downloads in the background, that’s better.

When deciding where files are saved, you could tell the user they will be saved in a subdirectory under the home directory, in a subdirectory under the current directory, or in a subdirectory in temp. Each has some pros and cons. For a single user, it makes sense to create it under home or under tmp. On a multiuser computer, saving per user under home is best, unless the data would be the same for all users, in which case tmp would be a good place.

Saving under the current directory can lead to your program pooping data files everywhere and cluttering up the filesystem, so it's not the best way to go.

You could provide user options to ask each time or just once when you need permission to place files on their machine. You could provide options of where and when to save the data and when to expire it.
 
jedishrfu said:
You don’t want the user to wait too long for your program to start up. Similarly, during program execution you have to decide how long the user might wait for you to do a download. If you can do downloads in the background, that’s better.

My program has to read a collection of twenty-something files, all of my creation. It does not produce any additional file output; it only shows information calculated from the data files. Since it depends entirely on these supplementary files, I do not think it could even run until the files are available.
 
@jedishrfu

You know, I think I had already made up my mind before I made this topic. So I appreciate the input and I will keep it in mind for future coding projects. But for this particular one, I do not think it would be a great fit. Thank you for your help and insight.
 
You could read in the files in the background and display a progress bar and partial data while they are being read. Giving the user some feedback prevents panic that the program is hung or that something else is wrong.
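As a rough illustration of that idea, the sketch below reads a list of files on a worker thread and calls a progress callback after each one. The function name `load_in_background` and the callback signature are made up for this example; a real program would probably hook the callback into whatever progress bar or GUI it uses.

```python
import threading
from pathlib import Path


def load_in_background(paths, on_progress):
    """Read files on a worker thread, reporting progress after each one.

    Returns the thread and a dict that fills up as files are read.
    (Hypothetical helper, not part of any library.)
    """
    paths = list(paths)
    results = {}

    def worker():
        for done, path in enumerate(paths, start=1):
            results[path] = Path(path).read_bytes()
            on_progress(done, len(paths))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results
```

The caller could pass a function that prints something like "Loaded 3/20 files", keep the interface responsive in the meantime, and call `thread.join()` before relying on the complete results.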
 
Is this for Windows? For Windows, you can define an .msi file that will install things (from a .zip file?) where they should go. I don't know how .exe files do the same thing, but they can.
 
Perhaps have a command-line option that allows the user to specify a path to the data file (as a single zipped archive; your program can unzip it as part of the load process), with a default to the current directory if the option is missing.
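One way this suggestion might look, using only the standard library. The option name `--data` and the archive name `sunset_data.zip` are assumptions made for the example, not anything from the original program.

```python
import argparse
import zipfile
from pathlib import Path

DEFAULT_ARCHIVE = "sunset_data.zip"  # hypothetical archive name


def parse_args(argv=None):
    """Parse the command line; --data defaults to the current directory."""
    parser = argparse.ArgumentParser(description="View regional data.")
    parser.add_argument(
        "--data",
        type=Path,
        default=Path.cwd() / DEFAULT_ARCHIVE,
        help="path to the zipped data archive (default: %(default)s)",
    )
    return parser.parse_args(argv)


def load_archive(archive):
    """Read every file in the archive into memory, keyed by member name."""
    if not archive.is_file():
        raise FileNotFoundError(f"Data archive not found: {archive}")
    with zipfile.ZipFile(archive) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        }
```

A script would then call `load_archive(parse_args().data)` once at startup. Reading straight from the archive means nothing has to be unpacked onto the user's disk.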
 
Let's say I want to let the user specify where he or she wants the data files. My code is hard-wired to look for the files in question in the paths relative to the shell's cwd. Would I have to change every line of code that reads these files from the relative paths, or is there an easier way? I ask out of pure curiosity; I'm still ambivalent on whether or not I wish to let the user decide where to save the supplementary files.
 
You could use an environment variable to hold the base directory where the files are stored and, if it is not set, fall back to a directory relative to your current directory or relative to where your program resides.
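A minimal sketch of that approach follows. The variable name `SUNSET_DATA_DIR` and the helper names are hypothetical. The point is that every file read goes through one function, so the lookup rule lives in a single place rather than in every line that opens a file.

```python
import os
from pathlib import Path

ENV_VAR = "SUNSET_DATA_DIR"  # hypothetical environment variable name


def data_dir():
    """Return the base directory that holds the data files."""
    override = os.environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    # Fallback keeps the current behaviour: a folder under the shell's cwd.
    # Path(__file__).resolve().parent would anchor it to the module instead.
    return Path.cwd() / "sunset_data"


def data_path(name):
    """Build the full path of one data file; all readers call this."""
    return data_dir() / name
```

Existing code such as `open('sunset_data/some_file.csv')` would become `open(data_path('some_file.csv'))` (the file name here is only a placeholder), which is a one-time search and replace.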

You should play with these ideas and let your users try them out. We really can't answer this question well here.

As an example, you could see how other systems do this. One is Go (golang), which has two defined environment variables, GOROOT and GOPATH, for its go command.

GOROOT locates the program and its related code and data files, and usually defaults to /usr/local/go. GOPATH locates where it will store vendor code downloaded from the internet, and usually defaults to ~/go.

If a user types "go get github.com/gizak/termui" the go command will download termui code from github into the ~/go/src (GOPATH) directory stored as ~/go/src/github.com/gizak/termui.
 
Based on your description, it appears these are data files that belong with your module and should be distributed with it. In that case, place them within a subdirectory relative to your module. For example:

Code:
/package/mymodule.py
/package/data/file1.dat
/package/data/file2.dat
/package/data/file...dat

From within your module, you can get the path to the data using something like:

Python:
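import os

# The data/ subdirectory that sits next to this module
DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')
data1_filename = os.path.join(DATA_DIR, 'file1.dat')
...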

If you distribute your module, you can specify data files within the setup.py file using the package_data attribute to make sure these files are installed in the appropriate location.
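For the layout above, a setup.py might look roughly like the sketch below. It assumes `package/` also contains an `__init__.py` so that setuptools treats it as a package; the distribution name and version are placeholders. The keyword arguments are kept in a dict only so the sketch can be read or imported without triggering a build.

```python
# setup.py (sketch) for the layout package/mymodule.py + package/data/*.dat
import sys

SETUP_KWARGS = {
    "name": "package",      # placeholder distribution name
    "version": "0.1.0",     # placeholder version
    "packages": ["package"],
    # Install every .dat file under package/data/ alongside the code.
    "package_data": {"package": ["data/*.dat"]},
}

# Only call setup() when a command (e.g. `sdist`) is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

The glob patterns in `package_data` are relative to the package directory, which is why the entry reads `data/*.dat` rather than `package/data/*.dat`.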

This is the recommended approach if your data files do not change frequently for a given module version. But if you are using different data files each time the application runs, then this approach won't work and you will have to download them each time, either to the user's current working directory (not a good idea, since they may not have write permission to it) or to a temporary directory that you create (see https://docs.python.org/3/library/tempfile.html).

One other point, if you do a lot of pre-processing of data files that is the same from instance to instance, it would be wise to run this in advance and save the processed files rather than the raw data files, to speed up load time. Also consider using a binary memory-mapped file for saving the data, as this can speed up load times dramatically.
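To make the last suggestion concrete, here is a small sketch using only the standard library. It assumes the processed data can be reduced to a flat sequence of floats; the function names are invented for the example, and whether this is actually faster depends on the data and how it is accessed.

```python
import mmap
import struct
from array import array


def save_processed(values, path):
    """Write pre-processed floats as raw 8-byte doubles (run once, in advance)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def read_value(path, index):
    """Read one double through a memory map without loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Native byte order, matching what array("d") wrote.
            return struct.unpack_from("d", mm, index * 8)[0]
```

A long-running program would normally keep the map open rather than reopening it for each lookup. Libraries such as NumPy offer a similar facility for array data, if adding a dependency is acceptable.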
 