Writing modules that require supplementary files

Eclair_de_XII · Dec 5, 2020

I'm working with Python modules at the mo', and I am having trouble deciding how best to include supplementary material that is accessed by my module but is not necessarily a part of it. The program works all right when I run it from the IDE, but it fails to find the files when it is run from the shell, because the files are referenced with relative paths, which resolve against whatever the current working directory happens to be. Let's say I want an end-user to access these files. Do I trust him or her to follow my instructions on where to place the files, or do I write more script to automatically download the files to the right locations for him or her?

1. On one hand, the user will not always know where to place the files, and they might trick the module by placing a file with the same name in the right location but with incorrect contents.

2. On the other hand, the user might not be trusting of programs that automatically download things to their computer, even if they are prompted to confirm the download.

I feel like the first option (trusting the user to place the files) is better, because the user can view the contents of the file(s) beforehand, and the instructions are explicit enough:

Python:

from os import getcwd

folder='sunset_data'
message='Please place the \"%s\" folder into %s'%(folder,getcwd())
print(message)
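The worry in point 1 about a same-named file with the wrong contents could be eased by checking each file against a digest recorded when the data was created. This is only a sketch: the function names and the idea of shipping a table of expected digests with the module are assumptions, not something stated in the post.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_files(folder, expected):
    """Return names from `expected` that are missing or have the wrong digest.

    `expected` maps file names to hex digests recorded by the module author
    (a hypothetical table for this sketch).
    """
    bad = []
    for name, wanted in expected.items():
        path = Path(folder) / name
        if not path.is_file() or sha256_of(path) != wanted:
            bad.append(name)
    return bad
```

If `find_bad_files('sunset_data', expected)` returns a non-empty list, the module can name the offending files in its message instead of failing later with a confusing error.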

jedishrfu · Dec 5, 2020

These are all good questions that only you can answer based on feedback from users.

I usually consider startup costs as a key decision point. You don’t want the user to wait too long for your program to start up. Similarly, during program execution you have to decide how long the user might wait for you to do a download. If you can do downloads in the background, that’s better.

For deciding where files are saved, you could tell the user they will be saved in a subdirectory under the home directory, in a subdirectory under the current directory, or in a subdirectory in temp. Each has some pros and cons. For a single user, it makes sense to create it under home or under tmp. For a multiuser computer, saving per user under home is best, unless the data would be the same for all users, in which case tmp would be a good place.

Saving under the current directory can lead to your program pooping data files everywhere and cluttering up the filesystem, so it's not the best way to go.
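To make the three choices concrete, here is a small sketch that builds each candidate path with the standard library. The application name `sunset_app` and the function name are made up for illustration.

```python
import tempfile
from pathlib import Path

APP_NAME = "sunset_app"  # hypothetical application name


def candidate_data_dirs(app_name=APP_NAME):
    """Return the three locations discussed above, keyed by a short label."""
    return {
        "home": Path.home() / ("." + app_name),          # per-user, persistent
        "cwd": Path.cwd() / app_name,                    # wherever the program was launched
        "temp": Path(tempfile.gettempdir()) / app_name,  # system temp, may be cleaned up
    }


for label, path in candidate_data_dirs().items():
    print(f"{label:>4}: {path}")
```

Nothing is created on disk here; the program would still need to call `mkdir(parents=True, exist_ok=True)` on whichever path it settles on.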

You could provide user options to ask each time or just once when you need permission to place files on their machine. You could provide options of where and when to save the data and when to expire it.

Eclair_de_XII · Dec 5, 2020

jedishrfu said:

You don’t want the user to wait too long for your program to startup. Similarly, during program execution you have to decide how long the user might wait for you to do a download. If you can do downloads in the background that’s better.

My program has to read a collection of twenty-something files, all of my creation. It does not produce any additional file output; it only shows information calculated from the data files. Since it depends entirely on these supplementary files, I do not think it could even run until the files are available.

Eclair_de_XII · Dec 5, 2020

@jedishrfu

You know, I think I had already made up my mind before I made this topic. So I appreciate the input and I will keep it in mind for future coding projects. But for this particular one, I do not think it would be a great fit. Thank you for your help and insight.

jedishrfu · Dec 5, 2020

You could read in the files in the background and display a progress bar and partial data while they are being read. Giving the user some feedback prevents panic that the program has hung or that something else is wrong.
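One way this might look, as a rough sketch: a worker thread reads the files and reports progress through a queue, while the main thread prints a line per report. The function names are invented here, and a real program would update a GUI or progress bar rather than print.

```python
import queue
import threading


def load_in_background(paths, progress):
    """Read each file in `paths` on a worker thread.

    After each file, (files_done, total) is put on the `progress` queue.
    Returns the thread and a dict that fills with {path: bytes}.
    """
    results = {}

    def worker():
        try:
            for done, path in enumerate(paths, start=1):
                with open(path, "rb") as handle:
                    results[path] = handle.read()
                progress.put((done, len(paths)))
        finally:
            progress.put(None)  # sentinel: the worker has finished (or failed)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results


def show_progress(progress):
    """Print one line per report until the worker's sentinel arrives."""
    while True:
        report = progress.get()  # blocks until the worker reports
        if report is None:
            break
        done, total = report
        print(f"loaded {done}/{total} files")
```

Typical use would be `thread, results = load_in_background(paths, q)` followed by `show_progress(q)` and `thread.join()`, where `q` is a `queue.Queue()`.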

FactChecker · Dec 6, 2020

Is this for Windows? For Windows, you can define an .msi file that will install things (from a .zip file?) where they should go. I don't know how .exe files do the same thing, but they can.

pasmith · Dec 6, 2020

Perhaps have a command line option which allows the user to specify a path to the data file (as a single zipped archive; your program can unzip it as part of the load process), with a default to the current directory if the option is missing.
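A minimal sketch of that idea using `argparse` and `zipfile` from the standard library. The option name `--data` and the archive name `sunset_data.zip` are assumptions for illustration, and this version reads the archive members straight into memory rather than extracting them to disk.

```python
import argparse
import zipfile
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; `--data` is a hypothetical option name."""
    parser = argparse.ArgumentParser(description="Show information from data files.")
    parser.add_argument(
        "--data",
        type=Path,
        default=Path.cwd() / "sunset_data.zip",  # hypothetical archive name
        help="path to the zipped data archive (default: ./sunset_data.zip)",
    )
    return parser.parse_args(argv)


def load_archive(archive_path):
    """Return {member name: bytes} for every file in the zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        }
```

The program's entry point would then call `args = parse_args()` and `data = load_archive(args.data)`, so the rest of the code never needs to know where the archive lives.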

Eclair_de_XII · Dec 7, 2020

Let's say I want to let the user specify where he or she wants the data files. My code is hard-wired to look for the files in question in the paths relative to the shell's cwd. Would I have to change every line of code that reads these files from the relative paths, or is there an easier way? I ask out of pure curiosity; I'm still ambivalent on whether or not I wish to let the user decide where to save the supplementary files.

jedishrfu · Dec 7, 2020

You could use an environment variable to hold the base directory where the files are stored and, if it is not set, fall back to a directory relative to your current directory or relative to where your program resides.
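As a sketch of how this avoids touching every line that opens a file: route all reads through one helper that decides the base directory. The variable name `SUNSET_DATA_DIR` and both function names are made up for this example, and the fallback here is the current directory (a fallback relative to the program's own location would use `__file__` instead).

```python
import os
from pathlib import Path

ENV_VAR = "SUNSET_DATA_DIR"  # hypothetical variable name


def data_dir(environ=os.environ):
    """Return the base data directory.

    Uses the environment variable when set, otherwise falls back to a
    'sunset_data' folder in the current working directory.
    """
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path.cwd() / "sunset_data"


def data_path(name, environ=os.environ):
    """Build the full path of one data file; call this instead of hard-coding paths."""
    return data_dir(environ) / name
```

Existing calls such as `open('sunset_data/file1.dat')` would become `open(data_path('file1.dat'))`, which is a one-time mechanical change, and after that the location can be changed in a single place.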

You should play with these ideas and let your users try them out. We really can't answer this question well here.

As an example, you could see how other systems do this. One is Go (golang), which has two defined environment variables, GOROOT and GOPATH, for its go command.

GOROOT locates the program and its related code and data files, usually defaulting to /usr/local/go. GOPATH locates where it will store vendor code downloaded from the internet, usually defaulting to ~/go.

If a user types "go get github.com/gizak/termui" the go command will download termui code from github into the ~/go/src (GOPATH) directory stored as ~/go/src/github.com/gizak/termui.

lodbrok · Dec 15, 2020

Based on your description, it appears these are data files that should belong with and be distributed with your module. In that case place them within a subdirectory relative to your module. For example:

Code:

/package/mymodule.py
/package/data/file1.dat
/package/data/file2.dat
/package/data/file...dat

From within your module, you can get the path to the data using something like:

Python:

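import os

# The data/ subdirectory next to this module, independent of the shell's cwd
DATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')
data1_filename = os.path.join(DATA_DIR, 'file1.dat')
...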

If you distribute your module, you can specify data files within the setup.py file using the package_data attribute to make sure these files are installed in the appropriate location.

This is the recommended approach to use if your data files do not change frequently for a given module version. But if you are using different data files each time the application runs, then this approach won't work, and you will have to download them each time, either to the user's current working directory (not a good idea, since they may not have write permissions to it) or to a temporary directory which you create (see https://docs.python.org/3/library/tempfile.html).
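For the temporary-directory route, a sketch using `tempfile.TemporaryDirectory` might look like the following. The `fetch_into` function is a stand-in invented for this example; it writes a small file locally so the sketch runs offline, where a real program would download into the directory.

```python
import tempfile
from pathlib import Path


def fetch_into(directory):
    """Placeholder for the real download step (hypothetical).

    Here it just writes one small file so the sketch runs offline.
    """
    target = Path(directory) / "file1.dat"
    target.write_text("example contents\n")
    return target


# The directory and everything in it is deleted when the block exits.
with tempfile.TemporaryDirectory(prefix="sunset_") as tmp:
    data_file = fetch_into(tmp)
    print(data_file.read_text(), end="")
```

Because the directory is removed automatically, this suits data that is only needed for one run; data worth keeping between runs belongs somewhere persistent instead.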

One other point, if you do a lot of pre-processing of data files that is the same from instance to instance, it would be wise to run this in advance and save the processed files rather than the raw data files, to speed up load time. Also consider using a binary memory-mapped file for saving the data, as this can speed up load times dramatically.
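A small sketch of the memory-mapped idea using only the standard library: the processed values are written once as raw 64-bit floats, and later runs map the file and pull out individual values without parsing anything. The function names and the choice of little-endian doubles are assumptions for illustration, and the reader assumes a non-empty file, since an empty file cannot be mapped.

```python
import mmap
import struct


def save_floats(path, values):
    """Write `values` as raw little-endian 64-bit floats (the one-time preprocessing step)."""
    with open(path, "wb") as handle:
        handle.write(struct.pack(f"<{len(values)}d", *values))


def read_float(path, index):
    """Read a single value by position without loading the whole file."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Each value occupies 8 bytes, so the offset is index * 8.
            return struct.unpack_from("<d", mapped, index * 8)[0]
```

With NumPy available, `numpy.memmap` offers the same idea with array semantics, but that is a third-party dependency the thread does not mention.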
