Best practice for using GIT on a shared lab computer?

In summary, the conversation discusses the use of GIT in a lab environment where multiple people share the same computer and Windows account. The main issue is that it becomes difficult to track changes and determine who has made them. Various workarounds have been suggested, such as using scripts or the --author parameter when committing, but there is a need for a GIT Windows client that can handle this scenario. The conversation also touches on the importance of tracking code changes and the use of GUIs for less formal programming tasks.
  • #1
f95toli
Science Advisor
Gold Member
TL;DR Summary
How to best use GIT on a shared lab computer with one account?
Firstly, I don't expect there to be a single good answer to this question. I've done some googling and it seems there are multiple options, but I am still interested in suggestions and/or hearing what your experiences are.

We started using GIT (GitLab) a couple of years ago and it works well when we are developing SW (mainly Python for analyzing data, simulations etc.) on our office computers. However, much of the software we work on is used to control experiments in the lab. Our lab computers are (obviously) shared machines, and we use a single account on all machines which everyone uses (you can't really log on or off in the middle of a measurement run, and several people work on the same machines).

My question is whether anyone has experience of using GIT in this scenario? If so, what is "best practice"?
We haven't really settled on a way of working, and right now GIT is essentially only being used as a backup system.
Obviously, this is not ideal and means that we are not really keeping track of changes.

I know we are not the only ones with this problem and after some searching I've found a number of partial workarounds. However, they all seem to be very inconvenient and/or rely on e.g. various Linux Bash scripts, which is not really an option since we are using Win10 machines.

I suspect a partial answer is to force the use of the --author parameter when committing,

e.g.
git commit --author="Someone Unknown <unknown@example.com>"

But ideally, I would like to be able to find a GIT Windows client that can handle this...
 
  • Like
Likes Twigg
  • #2
There are no best practices that I know of. You should know how to clone/fork a git project, how to check in your changes, and how to push your changes to the master branch.

Sometimes things get out of sync and you’ll need to know how to get things resynced.

Sometimes your repo may get too large to check in and you'll have to rethink what you want saved by making a new repo. I had this issue saving large, changing binary images which made a really large repo.
 
  • #3
Is your problem that your uses conflict with others? Perhaps you need version N of some software, while others need version M.
 
  • #4
anorlunda said:
Is your problem that your uses conflict with others? Perhaps you need version N of some software, while others need version M.
Well, the problem is that we have multiple people using and modifying the same code (which continuously evolves during an experiment) using the same computer and the same Windows account while in the lab.
When we then go back to the office and continue working on the same code, there is no way to see who has done what, or even which changes are your own. Forking (when needed) also gets very messy for the same reason (who created the fork?).

For most of what we do in the lab, everyone should (ideally) be using the same version of the software; and this is also used on multiple computers.

A large part of the problem is that much of the GIT configuration (including the name of the author) is tied to the computer account; a method that asked the user to identify themselves before committing would probably solve most problems. I have seen such functionality implemented under Linux but nothing that would work under Windows.
 
  • #5
f95toli said:
My question is whether anyone has experience of using GIT in this scenario? If so, what is "best practice"?
We haven't really settled on a way of working, and right now GIT is essentially only being used as a backup system.
Obviously, this is not ideal and means that we are not really keeping track of changes.
This confuses me. Doesn't Git always track changes in source code? In a test lab situation, I would check in all code used for any significant set of test runs. It was useful when looking for a change that might have caused a problem and also during audits to prove that changes were being adequately managed.
f95toli said:
I know we are not the only ones with this problem and after some searching I've found a number of partial workarounds. However, they all seem to be very inconvenient and/or rely on e.g. various Linux Bash scripts, which is not really an option since we are using Win10 machines.
You can use scripts on Win10 machines. Python, Perl, and BAT scripts should work in a Command Prompt window.
f95toli said:
I suspect a partial answer is to force the use of the --author parameter when committing,
It might be more useful to be able to track code changes to specific change requests so that you can find the paperwork that documents the change.
f95toli said:
e.g.
git commit --author="Someone Unknown <unknown@example.com>"
This might be a good idea. We would track changes to the appropriate change request paperwork which would tell us who made the change. In reality, there were not as many people making changes at the same time as one might expect.
f95toli said:
But ideally, I would like to be able to find a GIT Windows client that can handle this...
It might have been ignorance on my part, but I never felt confident that the GUIs would give enough utility. But they were good enough for less formal use as a programmer's tool.
 
  • #6
f95toli said:
Well, the problem is that we have multiple people using and modifying the same code (which continuously evolves during an experiment) using the same computer and the same Windows account while in the lab.
When we then go back to the office and continue working on the same code, there is no way to see who has done what, or even which changes are your own. Forking (when needed) also gets very messy for the same reason (who created the fork?).
This sounds like a very challenging configuration management problem. If you can force all the programmers to check in code using a Python, Perl, or .BAT script, then you can have the script prompt for the programmer name. But there might still be a lot of problems with uncoordinated code changes.
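A minimal sketch of such a check-in script, in Python, might look something like the following (the file name, the roster of names, and the email addresses are placeholders, and it assumes git is on the PATH of the shared account):

#!/usr/bin/env python3
# checkin.py - sketch of a check-in wrapper that asks who is committing
# and passes that identity to git via --author. Names/emails are placeholders.
import subprocess
import sys

# Hypothetical roster of lab members; replace with your own.
AUTHORS = {
    "alice": "Alice Example <alice@example.com>",
    "bob": "Bob Example <bob@example.com>",
}

def main() -> int:
    user = input("Who is committing? ({}) ".format(", ".join(AUTHORS))).strip().lower()
    author = AUTHORS.get(user)
    if author is None:
        print("Unknown user, aborting commit.", file=sys.stderr)
        return 1
    message = input("Commit message: ").strip()
    if not message:
        print("Empty commit message, aborting.", file=sys.stderr)
        return 1
    # Stage everything and commit with the chosen author. Note that the committer
    # field will still be the shared account's git identity; only the author changes.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", f"--author={author}", "-m", message], check=True)
    return 0

if __name__ == "__main__":
    sys.exit(main())

Staging everything with "git add -A" is just for brevity here; in practice you would probably want the script to stage files selectively or leave staging to the user.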
 
  • #7
  • #8
You could try creating different users under Windows Subsystem for Linux (WSL) and enforcing commits from there? Not sure how this would work with authentication to GitLab though; I can't see it working through HTTPS, so you'd need SSH keys for each user on each machine.

Or you could install (probably Linux) virtual machines with individual users set up.

I don't think you will find an easy way around this because sharing accounts simply isn't best practice, or even close. It's a bit like saying 'to make it easier for everyone to get into the office we all use the same ID key card which we currently keep under the doormat; what is best practice for doing this?' :-p
 
  • #9
@pbuk, even if it isn't 'best practice', wouldn't using Mercurial suffice to serialize updates?
 
  • #10
sysprog said:
@pbuk, even if it isn't 'best practice', wouldn't using Mercurial suffice to serialize updates?
How would that be different? Hg is not going to have any more information about the user making the commit than git would have.
 
  • #11
pbuk said:
How would that be different? Hg is not going to have any more information about the user making the commit than git would have.
They're sharing a single ID on git. They're not required to use the --author parameter, and they're not consistently using it, so they don't always know who did what. They could serialize and track using multiple IDs locally with Mercurial, and upload the consensus once a day.
 
  • #12
Actually I've just thought of the obvious solution: prefix all commit messages with the user name, e.g. `@pbuk: Add unit conversion`.

Advantages:
  • easy to do
  • safe fallback if it is not done (unlike e.g. manually updating git user when you switch machines)
  • easily visible in the commit history
Disadvantages:
  • `git blame` and similar tools are not going to be helpful (although you could write a commit hook that changes the git user according to the commit prefix).
 
  • Haha
Likes sysprog
  • #13
sysprog said:
They could serialize and track using multiple IDs locally with Mercurial, and upload the consensus once a day.
But won't they still have to remember to switch IDs in Mercurial, with the added complication of now having 2 different VCSs to manage?
 
  • #14
Thanks for all the comments/suggestions.

pbuk said:
Actually I've just thought of the obvious solution: prefix all commit messages with the user name, e.g. `@pbuk: Add unit conversion`.

Advantages:
  • easy to do
  • safe fallback if it is not done (unlike e.g. manually updating git user when you switch machines)
  • easily visible in the commit history
Disadvantages:
  • `git blame` and similar tools are not going to be helpful (although you could write a commit hook that changes the git user according to the commit prefix).

Yes, that might be the easiest solution.
It would be nice if there were a more "automated" solution, but for now that might have to do.

pbuk said:
I don't think you will find an easy way around this because sharing accounts simply isn't best practice, or even close. It's a bit like saying 'to make it easier for everyone to get into the office we all use the same ID key card which we currently keep under the doormat; what is best practice for doing this?' :-p
Indeed, but shared accounts are unavoidable in a lab setting (and in many other settings as well).
We can't have a piece of software that controls a large experimental setup, is used by multiple people, and runs 24/7 be associated with a single user.

It would be nice if there were a "switch user" functionality in Windows which allowed multiple people to share the same "instance" of the Windows desktop, but where Windows was still "aware" of who the current user was and could therefore control permissions etc.
But since this is not possible we have to find workarounds.
 
  • #15
f95toli said:
It would be nice if there were a more "automated" solution, but for now that might have to do.
As I say, you could automate it with a commit hook, or even a script that rewrites the commit history after the event.
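A rough sketch of the commit-hook variant, as a post-commit hook written in Python, could look like the following (the prefix-to-author table is invented, and it assumes Git for Windows will run a Python hook via its shebang line, i.e. that a python3 interpreter is on the PATH):

#!/usr/bin/env python3
# .git/hooks/post-commit - sketch only, not a tested recipe.
# Reads the "name:" prefix from the subject of the commit that was just created
# and, if the prefix is recognised, amends the commit so the author field matches.
import subprocess

# Hypothetical prefix -> author mapping; replace with your lab members.
AUTHORS = {
    "pbuk": "P Buk <pbuk@example.com>",
    "f95toli": "F Toli <f95toli@example.com>",
}

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout.strip()

def main() -> None:
    subject = git("log", "-1", "--pretty=%s")
    prefix = subject.split(":", 1)[0].strip().lstrip("@").lower()
    author = AUTHORS.get(prefix)
    if author is None:
        return  # no recognised prefix: leave the commit untouched
    if git("log", "-1", "--pretty=%an <%ae>") == author:
        return  # author already set: stops the hook recursing when the amend re-runs it
    # Rewrite the commit we just made with the mapped author; only safe before pushing.
    subprocess.run(["git", "commit", "--amend", "--no-edit", f"--author={author}"], check=True)

if __name__ == "__main__":
    main()

Because the amend rewrites the commit, this only makes sense for commits that have not been pushed yet.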

f95toli said:
Indeed, but shared accounts are unavoidable in a lab setting (and in many other settings as well).
We can't have a piece of software that controls a large experimental setup, is used by multiple people, and runs 24/7 be associated with a single user.
But that's exactly what you do have - it's just that you have different people pretending to be that 'single user'. You shouldn't really have this control software running in userspace at all - the right way to deal with this would be for the control software to run as a service in the background. Users would log in individually using their own Windows accounts.
 
  • #16
pbuk said:
But that's exactly what you do have - it's just that you have different people pretending to be that 'single user'. You shouldn't really have this control software running in userspace at all - the right way to deal with this would be for the control software to run as a service in the background. Users would log in individually using their own Windows accounts.
"Control software" in this case means scripts (usually a Jupyter Notebook or a Matlab script) either controlling and acquiring data from a bunch of instruments (waverform generators, digitizers, oscilloscopes, spectrum analyzers et) directly via Ethernet/USB or controlling a "virtual" instrument frontpanel provided by the instrument manufacturer (which in turn controls the electronics in the instrument, this is becoming more common). This is the suite of SW we often need to develop/modify while in the lab (because you need to be able physically monitor what happens when you run your code by e.g. looking at an oscilloscope).

While the measurement is running we usually also need to plot a lot of diagnostic information in various graphs. Everyone who is in the lab needs to be able to see this information.
Obviously, a single measurement instrument can only do one thing at a time, meaning we can't have multiple users trying to use the same setup at the same time.

Hence, while I agree that a "service" model would in theory be better, it is usually not realistic.

That said, some of our newer instruments can in fact be set up so that one computer runs a server which acts as a "master instrument", and then you can have multiple clients connecting via TCP/IP.
However, this is still a "blocking" arrangement since, as mentioned above, we cannot have several people using the same measurement setup at the same time (there are also very good reasons why you don't want instruments connected to very sensitive samples suddenly jumping between settings).

It is possible to set up a "cloud access" model for this with a queue system etc. (something similar to IBM's Qiskit platform). However, such platforms are all proprietary, and there will still only be one instance of the software handling the low-level control; the latter is the type of SW we are working on in my lab.
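Just to illustrate the "blocking" point: the master-instrument model described above is essentially a TCP server that talks to one client at a time. A toy sketch (this is not any vendor's actual API; the port and the command strings are invented) would be:

# Toy illustration of a "blocking" master-instrument server: one client at a time.
# Not any vendor's actual API; port and commands are invented for this example.
import socket

HOST, PORT = "0.0.0.0", 5025  # 5025 is commonly used for SCPI-over-TCP, here just as an example

def handle(conn: socket.socket) -> None:
    # Serve one client until it disconnects; nobody else can issue commands meanwhile.
    with conn, conn.makefile("rw") as stream:
        for line in stream:
            cmd = line.strip()
            if cmd == "*IDN?":
                stream.write("ToyLab,MasterInstrument,0,0.1\n")
            else:
                stream.write(f"ERR unknown command: {cmd}\n")
            stream.flush()

def main() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((HOST, PORT))
        server.listen(1)           # queue at most one waiting connection
        while True:
            conn, _addr = server.accept()
            handle(conn)           # blocks here until this client is done

if __name__ == "__main__":
    main()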
 
  • #17
f95toli said:
However, this is still a "blocking" arrangement since, as mentioned above, we cannot have several people using the same measurement setup at the same time (there are also very good reasons why you don't want instruments connected to very sensitive samples suddenly jumping between settings).
Sorry, I don't understand the problem. You say that multiple users cannot control the "master" computer at once. Isn't that a feature, not a bug (for the reasons you list)? What's wrong with that model?
 
  • #18
pbuk said:
But won't they still have to remember to switch IDs in Mercurial, with the added complication of now having 2 different VCSs to manage?
No, each participant would use his own Mercurial ID, and only the faculty supervisor would have the password to the git account. That's simple enough. I imagine that requiring every participant to have his own git account might be exceeding the academic authority.
 
  • #19
Twigg said:
Sorry, I don't understand the problem. You say that multiple users cannot control the "master" computer at once. Isn't that a feature, not a bug (for the reasons you list)? What's wrong with that model?
I did not mean that it was a problem; I was just trying to explain why there usually isn't much point in running control software as a service as pbuk suggested.
I should have explained it better.
 
  • Like
Likes Twigg
  • #20
f95toli said:
Twigg said:
Sorry, I don't understand the problem. You say that multiple users cannot control the "master" computer at once. Isn't that a feature, not a bug (for the reasons you list)? What's wrong with that model?
I did not mean that it was a problem; I was just trying to explain why there usually isn't much point in running control software as a service as pbuk suggested.
I should have explained it better.
The single-threaded access to the 'master' computer provides adequate serialization; however, it does nothing to track who did what, which is what the participants need in order to communicate and collaborate.
 
  • #21
Do you intend to have users connect remotely to the master computer (e.g., each user uses AnyDesk/TeamViewer to get into the master computer and runs the script from there), or do you intend to have users run the script locally on their own machines, using the resources/peripherals of the master computer shared over TCP/IP?

In the latter case, I believe you do not need everyone to share one account. No?
 

Frequently asked questions

1. How can I safely share my code on a lab computer using GIT?

The best practice for using GIT on a shared lab computer is to create a separate branch for each user and to regularly commit and push changes to the remote repository. This way, each user's code will be kept separate and can be easily accessed by others.

2. Can I use a single remote repository for multiple lab computers?

Yes, you can use a single remote repository for multiple lab computers. Just make sure to regularly pull changes from the remote repository to keep all computers up to date with the latest code.

3. How can I avoid conflicts when multiple users are working on the same file?

To avoid conflicts, it is important to communicate with other users and coordinate your changes. It is also helpful to break down large files into smaller ones, so that multiple users can work on different sections without causing conflicts.

4. Is it necessary to create a README file for each project on the lab computer?

While it is not necessary, it is highly recommended to create a README file for each project on the lab computer. This file should include important information about the project, such as how to run it, any dependencies, and the purpose of the project.

5. How do I handle sensitive information in a shared lab computer environment?

If you need to include sensitive information in your code, such as API keys or passwords, it is best to use environment variables or a configuration file that is not tracked by GIT. This way, the sensitive information will not be shared with others through the remote repository.
