Writing modules that require supplementary files

  • Context: Python 
  • Thread starter: Eclair_de_XII
  • Tags: files, modules, writing

Discussion Overview

The discussion revolves around the challenges of managing supplementary files required by Python modules, particularly in terms of user accessibility and file management. Participants explore various strategies for file inclusion, user permissions, and the implications of automatic downloads versus user-directed file placement.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant expresses concern about whether to trust users to place supplementary files correctly or to automate the download process, weighing the risks of user error against potential distrust of automatic downloads.
  • Another participant highlights the importance of startup costs and user wait times, suggesting that background downloads could improve user experience.
  • A participant notes that their program requires a set of files to function and cannot run without them, emphasizing the necessity of file availability.
  • Suggestions include allowing users to specify file paths through command line options or using environment variables to manage file locations.
  • One participant proposes packaging data files within the module's directory structure and using setup tools to ensure proper installation.
  • Another participant mentions the potential for pre-processing data files to enhance performance and suggests using binary memory-mapped files for efficiency.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the best approach to managing supplementary files, with multiple competing views on user trust, file management strategies, and the implications of automatic downloads versus manual placement.

Contextual Notes

Some limitations include the dependency on user permissions for file placement, the need for clear instructions, and the potential for clutter in the filesystem if files are saved inappropriately. Additionally, the discussion does not resolve the best practices for handling varying user environments or preferences.

Eclair_de_XII
TL;DR
Let's say we have a module that allows the user to view sunset data, rent rates, and economic data for various regions of the United States. Let's say I have that data compiled. Would it be better for that data to be downloaded automatically when the program is run, or should it be included with the module along with instructions on where to put it?
I'm working with Python modules at the mo', and I am having trouble deciding how best to include supplementary material that is accessed by my module but is not necessarily a part of it. The program works all right when I run it from the IDE, but it fails to find the files when it is run from the shell, because these files are referenced with relative paths. Let's say I want an end user to access these files. Do I trust him or her to follow my instructions on where to place the files, or do I write more code to download the files automatically to the right locations?

1. On one hand, the user will not always know where to place the files, and they might trick the module by placing a file with the expected name but incorrect contents in the right location.

2. On the other hand, the user might not be trusting of programs that automatically download things to their computer, even if they are prompted to confirm the download.

I feel like option 1 is better, because the user can view the contents of the file(s) beforehand, and the instructions are explicit enough:

Python:
from os import getcwd

# The folder is expected in the shell's current working directory.
folder = 'sunset_data'
message = f'Please place the "{folder}" folder into {getcwd()}'
print(message)
 
These are all good questions that only you can answer based on feedback from users.

I usually consider startup costs as a key decision point. You don’t want the user to wait too long for your program to start up. Similarly, during program execution you have to decide how long the user might wait for you to do a download. If you can do downloads in the background, that’s better.

For deciding where files are saved, you could tell the user they will be saved in a subdirectory under the home directory, in a subdirectory under the current directory, or in a subdirectory under temp. Each has pros and cons. For a single user, it makes sense to create it under home or under tmp. On a multiuser computer, saving per user under home is best, unless the data would be the same for all users, in which case tmp would be a good place.

Saving under the current directory can lead to your program dropping data files everywhere and cluttering up the filesystem, so it's not the best way to go.
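As a rough illustration of the three locations mentioned above, here is a minimal sketch using only the standard library. The application name `sunset_viewer` and the helper `candidate_data_dirs` are made up for this example; nothing in the thread prescribes them.

```python
import tempfile
from pathlib import Path

APP_NAME = "sunset_viewer"  # hypothetical application name


def candidate_data_dirs(app_name=APP_NAME):
    """Return the three candidate storage locations, keyed by kind."""
    return {
        # Per-user storage under the home directory.
        "home": Path.home() / f".{app_name}",
        # Shared temporary storage; the OS or the user may clear it.
        "temp": Path(tempfile.gettempdir()) / app_name,
        # Depends entirely on where the program was launched from.
        "cwd": Path.cwd() / app_name,
    }


if __name__ == "__main__":
    for kind, path in candidate_data_dirs().items():
        print(f"{kind:>4}: {path}")
```

Printing the candidates first, and only creating one after the user agrees, keeps the program from writing anywhere without permission.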

You could provide user options to ask each time or just once when you need permission to place files on their machine. You could provide options of where and when to save the data and when to expire it.
 
jedishrfu said:
You don’t want the user to wait too long for your program to start up. Similarly, during program execution you have to decide how long the user might wait for you to do a download. If you can do downloads in the background, that’s better.

My program has to read a collection of twenty-something files, all of my creation. It does not produce any additional file output; it only shows information calculated from the data files. Since it depends entirely on these supplementary files, I do not think it could even run until the files are available.
 
@jedishrfu

You know, I think I had already made up my mind before I made this topic. So I appreciate the input and I will keep it in mind for future coding projects. But for this particular one, I do not think it would be a great fit. Thank you for your help and insight.
 
You could read in the files in the background and display a progress bar and partial data while they are being read. Giving the user some feedback prevents a panic that the program has hung or that something else is wrong.
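One way this could look, as a sketch only: a worker thread reads the files and reports each one through a queue, while the main thread prints progress. The function names and the `read_file` callback are placeholders, since the thread does not say how the data files are parsed.

```python
import queue
import threading


def load_in_background(paths, read_file, progress):
    """Read each path on a worker thread, posting (done, total, path, data)."""
    total = len(paths)

    def worker():
        try:
            for done, path in enumerate(paths, start=1):
                progress.put((done, total, path, read_file(path)))
        finally:
            progress.put(None)  # sentinel: always sent, even after an error

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def run(paths, read_file):
    """Show progress on the main thread and return whatever was loaded."""
    progress = queue.Queue()
    thread = load_in_background(paths, read_file, progress)
    loaded = {}
    # iter(callable, sentinel) keeps calling progress.get() until it sees None.
    for done, total, path, data in iter(progress.get, None):
        loaded[path] = data  # partial data is usable from this point on
        print(f"\rLoaded {done}/{total} files", end="", flush=True)
    print()
    thread.join()
    return loaded
```

With twenty-something small files the whole load may finish almost instantly, in which case the progress display is more reassurance than necessity.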
 
Is this for Windows? For Windows, you can define an .msi file that will install things (from a .zip file?) where they should go. I don't know how .exe files do the same thing, but they can.
 
Perhaps have a command-line option that lets the user specify a path to the data file (as a single zipped archive; your program can unzip it as part of the load process), defaulting to the current directory if the option is missing.
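A minimal sketch of that idea with `argparse` and `zipfile` follows. The option name `--data` and the archive name `sunset_data.zip` are assumptions made for illustration.

```python
import argparse
import tempfile
import zipfile
from pathlib import Path

DEFAULT_ARCHIVE = "sunset_data.zip"  # hypothetical archive name


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="View regional data.")
    parser.add_argument(
        "--data",
        type=Path,
        # Fall back to the current directory when the option is missing.
        default=Path.cwd() / DEFAULT_ARCHIVE,
        help="path to the zipped data archive (default: %(default)s)",
    )
    return parser.parse_args(argv)


def unpack(archive, target_dir):
    """Extract the archive into target_dir and return the extracted paths."""
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError(f"Data archive not found: {archive}")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        names = zf.namelist()
    return [Path(target_dir) / name for name in names]


def main(argv=None):
    args = parse_args(argv)
    # Unzip into a scratch directory that is removed when the block exits.
    with tempfile.TemporaryDirectory() as scratch:
        files = unpack(args.data, scratch)
        print(f"Unpacked {len(files)} entries from {args.data}")
```

Calling `main(["--data", "/some/path/sunset_data.zip"])`, or running the script with that option, would load from the given archive; with no option it looks in the current directory.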
 
Let's say I want to let the user specify where he or she wants the data files. My code is hard-wired to look for the files in question in the paths relative to the shell's cwd. Would I have to change every line of code that reads these files from the relative paths, or is there an easier way? I ask out of pure curiosity; I'm still ambivalent on whether or not I wish to let the user decide where to save the supplementary files.
 
You could use an environment variable to hold the base directory where you will store the files and, if it is not set, fall back to a directory relative to your current directory or relative to where your program resides.
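Here is a small sketch of how that might look. It also addresses the "change every line" worry: if every read goes through one helper, only the helper needs to know where the files are. The variable name `SUNSET_DATA_DIR` and the helper names are invented for this example.

```python
import os
from pathlib import Path

ENV_VAR = "SUNSET_DATA_DIR"  # hypothetical environment variable name


def data_dir(environ=os.environ):
    """Resolve the base data directory in one place."""
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    # Fallback: a "sunset_data" folder next to this source file, which
    # does not depend on the shell's current working directory.
    return Path(__file__).resolve().parent / "sunset_data"


def data_file(name, environ=os.environ):
    """Build the full path to one data file."""
    return data_dir(environ) / name
```

Existing calls such as `open('sunset_data/file1.dat')` would then become `open(data_file('file1.dat'))`, a mechanical change that only has to be made once.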

You should play with these ideas and let your users try them out. We really can't answer this question well here.

As an example, you could see how other systems do this. One is golang, aka Go, which defines two environment variables, GOROOT and GOPATH, for its go command.

GOROOT locates the program and its related code and data files, and usually defaults to /usr/local/go. GOPATH locates where it will store vendor code downloaded from the internet, and usually defaults to ~/go.

If a user types "go get github.com/gizak/termui" the go command will download termui code from github into the ~/go/src (GOPATH) directory stored as ~/go/src/github.com/gizak/termui.
 
Based on your description, it appears these are data files that should belong with and be distributed with your module. In that case place them within a subdirectory relative to your module. For example:

Code:
/package/mymodule.py
/package/data/file1.dat
/package/data/file2.dat
/package/data/file...dat

From within your module, you can get the path to the data using something like:

Python:
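import os

# The data files live in the "data" subdirectory next to this module.
DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')
data1_filename = os.path.join(DATA_DIR, 'file1.dat')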
...

If you distribute your module, you can specify data files within the setup.py file using the package_data attribute to make sure these files are installed in the appropriate location.
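For reference, a setup.py along those lines might look like the sketch below. The distribution name and version are placeholders, and it assumes the `package` directory also contains an `__init__.py` so that setuptools treats it as a package.

```python
# setup.py (sketch; the name and version are placeholders)
from setuptools import setup

setup(
    name="mymodule-example",
    version="0.1.0",
    packages=["package"],
    # Ship every .dat file under package/data/ alongside the code.
    package_data={"package": ["data/*.dat"]},
)
```

The glob pattern is relative to the package directory, so `data/*.dat` matches the layout shown earlier in this post.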

This is the recommended approach if your data files do not change frequently for a given module version. But if you use different data files each time the application runs, this approach won't work, and you will have to download them each time, either to the user's current working directory (not a good idea, since they may not have write permission to it) or to a temporary directory which you create (see https://docs.python.org/3/library/tempfile.html).
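A sketch of the temporary-directory variant, using `tempfile` and `urllib.request` from the standard library. The function names are made up, and the URLs would be whatever location actually hosts your data.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def fetch_to_dir(urls, workdir):
    """Download each URL into workdir and return the local paths."""
    local_paths = []
    for url in urls:
        # Name the local file after the last component of the URL path.
        target = Path(workdir) / PurePosixPath(urlparse(url).path).name
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        local_paths.append(target)
    return local_paths


def total_bytes(urls):
    """Example consumer: the directory and its files vanish afterwards."""
    with tempfile.TemporaryDirectory(prefix="sunset_") as workdir:
        return sum(p.stat().st_size for p in fetch_to_dir(urls, workdir))
```

Because `TemporaryDirectory` cleans up after itself, nothing is left behind on the user's machine, at the cost of downloading again on every run.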

One other point, if you do a lot of pre-processing of data files that is the same from instance to instance, it would be wise to run this in advance and save the processed files rather than the raw data files, to speed up load time. Also consider using a binary memory-mapped file for saving the data, as this can speed up load times dramatically.
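To make the last suggestion concrete, here is a minimal sketch that assumes the processed data boils down to a flat sequence of floating-point numbers, which may or may not match your files. It writes them once as packed 8-byte doubles and later reads single values through a memory map. The function names are illustrative.

```python
import mmap
import struct
from array import array

ITEM_SIZE = struct.calcsize("d")  # size of one native double, in bytes


def preprocess(values, bin_path):
    """One-time step: store already-parsed numbers as packed native doubles."""
    with open(bin_path, "wb") as f:
        array("d", values).tofile(f)


def read_value(bin_path, index):
    """Read one value through a memory map without loading the whole file."""
    with open(bin_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return struct.unpack_from("d", mm, index * ITEM_SIZE)[0]
```

A real program would probably keep the map open for the lifetime of the process rather than reopening it for each read; the version above just keeps the example short. Note that the file uses the machine's native byte order, so it is not portable between different architectures as written.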
 