Git, staging and committing files

  • Thread starter fog37
  • #1
fog37
Hello,
I started learning Git and I am very clear on some of the steps in the Git workflow. For example, we initialize a folder to be a repository with
Code:
git init

We place a file called file1.txt inside the folder, make some changes to it and stage it, i.e. we make a copy of the file in the staging area using
Code:
git add file1.txt
We can then complete the process by making a commit using
Code:
git commit file1.txt
Along the way, we use commands like
Code:
git log
and
Code:
git status

I am not sure what to do from here. Is the command
Code:
git diff
with certain options the command to use? I guess we want to compare the file in the working directory with saved/committed versions of it....

Why is it important to use
Code:
git diff --cached
to compare what is in the staging area with what has been committed?

At the end of the day, once we are happy with all the changes to a file, do we use
Code:
git checkpoint hash
to take the commit/save version back into the working directory and use it/email it out?

Thank you
 
  • #2
After you commit files, the next command most people run is a push to some remote repository, both for backup/storage and for sharing code with other people (they can pull down from the remote repository).

git diff alone will show you the changes in your working directory that you have not yet staged. If you specify a file name, such as git diff ./path/to/file.java, the output is limited to that file. If the file has already been committed and you want to see what the most recent commit changed in it, name the commits explicitly, for example git diff HEAD~1 HEAD -- ./path/to/file.java.

The only difference between git diff and git diff --cached is that one gives you the full diff of un-staged files in the working directory you're in, while the other gives you the diff of any files that are staged to be committed (after a git add filename occurs, but before a commit has occurred).

I've personally never used a git checkpoint command in my 10+ years of software development; as far as I know Git has no such command, and git checkout <hash> may be what you're thinking of. Typically after you're done writing code for the day, you'll either commit to the main/master branch directly, or commit into a new branch, and then push the changes to a remote repo with a git push, as I described earlier.
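
For anyone following along, here is a minimal sketch of the commit-then-push step. It assumes a throwaway bare repository on the local disk standing in for the remote (in practice that would usually be a URL on a hosting service); the directory names, file contents, identity, and the branch name main are all made up for illustration.

```shell
set -eu
work="$(mktemp -d)"                      # scratch area, hypothetical layout
git init -q --bare "$work/remote.git"    # stand-in for a hosted remote

git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "hello" > file1.txt
git add file1.txt
git commit -q -m "Add file1.txt"

git remote add origin "$work/remote.git"
git push -q -u origin main               # publish the branch to the remote

# Someone else can now obtain the same history by cloning the remote.
git clone -q -b main "$work/remote.git" "$work/colleague"
cloned="$(cat "$work/colleague/file1.txt")"
echo "$cloned"
```

Pushing to a real remote should work the same way once origin points at its URL.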

I hope this helps!

Edit: Github.com has an excellent resource online about how to use git here: https://skills.github.com/
 
  • #3
weller said:
The difference between git diff and git diff --cached is only that one gives you the full diff of the working directory you're in, the other one gives you the diff of any files that are staged to be committed (after a git add filename occurs, but before a commit has occurred)
This isn't quite correct. Once you stage a file to be committed, git diff does not show you differences in that file any longer; only git diff --cached does. So if you have some files that are staged and some that are not, neither of those commands will show you all the differences.
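
To see this behaviour concretely, here is a small sketch in a scratch repository. The file names and identity are hypothetical; git diff HEAD is included as one way to see staged and unstaged changes together.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

printf 'one\n' > staged.txt
printf 'one\n' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Initial commit"

# Change both files, but stage only one of them.
printf 'two\n' >> staged.txt
printf 'two\n' >> unstaged.txt
git add staged.txt

plain="$(git diff --name-only)"            # working directory vs staging area
cached="$(git diff --cached --name-only)"  # staging area vs last commit
both="$(git diff HEAD --name-only)"        # working directory vs last commit

echo "git diff:          $plain"
echo "git diff --cached: $cached"
echo "git diff HEAD:     $both"
```

Each file shows up in exactly one of the first two commands, and only the third lists both.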
 
  • #4
Ah thanks for the clarification!
 
  • #5
fog37 said:
I guess we want to compare the file in the working directory with saved/committed versions of it....
Per my previous post just now, git diff only looks at changed files in the working directory that are not staged for commit. git diff --cached only looks at changes that are staged for commit. More precisely, git diff compares the working directory with the staging area, and git diff --cached compares the staging area with the last commit.

git commit has the -a option if you want to combine the "staging" and "committing" operations into one. I use this quite a lot for small changes to save time.
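
A quick sketch of what -a does and does not pick up, assuming a scratch repository with made-up file names: it stages modifications to files Git already tracks, but a brand-new file still needs git add first.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "v2 with more text" > tracked.txt   # modify a tracked file
echo "new" > untracked.txt               # a file Git has never been told about

git commit -q -a -m "Update tracked.txt" # stage and commit tracked changes only

committed="$(git show HEAD:tracked.txt)" # content stored by the new commit
remaining="$(git status --porcelain)"    # what is still uncommitted
echo "$committed"
echo "$remaining"
```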

fog37 said:
take the commit/save version back into the working directory and use it/email it out?
The commit/save version already is in your working directory. git commit doesn't change your working directory at all. It just updates what is actually committed (stored in the repository) to match your working directory. So you can use the file in your working directory without having to do anything else.
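
As a sketch of that point (scratch repository, hypothetical contents): after a commit the file on disk is untouched and identical to what the commit stored, while earlier versions stay retrievable with git show.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "first draft" > file1.txt
git add file1.txt
git commit -q -m "First draft"

echo "second draft" > file1.txt
git add file1.txt
git commit -q -m "Second draft"

on_disk="$(cat file1.txt)"               # the file you can use right away
latest="$(git show HEAD:file1.txt)"      # what the last commit stored
previous="$(git show HEAD~1:file1.txt)"  # what the commit before it stored
echo "on disk:  $on_disk"
echo "HEAD:     $latest"
echo "HEAD~1:   $previous"
```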

As for emailing files out, the whole point of a distributed VCS like git is to not have to share files that way--instead, just have a clone of your repository in some place where both you and the person you want to share files with can access it.
 
  • #6
In regards to git and the working directory: if I place 3 files in the directory that represents the repo (because it has the .git folder in it), and I type ls, I would see a list of those physical files.

However, if I created a new branch, it is possible, after typing ls, to not see the same files... Why is that?

Does Git create a different working directory (working tree) for different branches? What am I missing? I thought the working directory was the physical Windows directory with my 3 files and that was the same for all branches. I get that the commits (saved versions) can be different on different branches...

thank you!
 
  • #7
fog37 said:
if I created a new branch, it is possible, after typing ls, to not see the same files...
Just creating a new branch won't do this; creating a new branch, in itself, doesn't change your working directory. So you must be doing something else. (For what I suspect you might be doing, see below.)

fog37 said:
Does Git create a different working directory (working tree) for different branches?
No.

fog37 said:
I thought the working directory was the physical Windows directory with my 3 files and that was the same for all branches.
This is not a good way to think of it. The files in your working directory are those that correspond to whatever branch you currently have checked out, with whatever uncommitted changes you have made to those files. So checking out a different branch will change the files in your working directory. So I suspect you might be checking out a different branch, not creating a new branch.

Creating a new branch doesn't change your working directory at all; all it does is associate the current state of your working directory to the new branch.
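
This can be checked in a scratch repository. The sketch below assumes three empty files and the branch names main and feature, all chosen for illustration; git branch creates the branch without switching to it.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

touch a.txt b.txt c.txt
git add a.txt b.txt c.txt
git commit -q -m "Add three files"

before="$(ls)"
git branch feature                       # create only; we stay on main
after="$(ls)"
current="$(git symbolic-ref --short HEAD)"

# Both branch names now point at the same commit.
main_commit="$(git rev-parse main)"
feature_commit="$(git rev-parse feature)"
echo "still on: $current"
echo "files:    $after"
```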
 
  • #8
PeterDonis said:
Just creating a new branch won't do this; creating a new branch, in itself, doesn't change your working directory. So you must be doing something else. (For what I suspect you might be doing, see below.)


No.


This is not a good way to think of it. The files in your working directory are those that correspond to whatever branch you currently have checked out, with whatever uncommitted changes you have made to those files. So checking out a different branch will change the files in your working directory. So I suspect you might be checking out a different branch, not creating a new branch.

Creating a new branch doesn't change your working directory at all; all it does is associate the current state of your working directory to the new branch.
I guess my confusion lies in the fact that I think of a working directory as something static, i.e. as the folder where the actual files I am working with reside. So those files are there no matter what commits we make or not.

When we switch to a different branch (after creating it) and have a look at the working directory using ls, I get surprised that it does not exactly match what I see when I type ls while on the main branch... since I have been stuck thinking about the working directory as this folder with the actual files in it.

For example, if I create a new file while on the Feature branch, that file comes into existence and goes into the working directory... but when I explore the working directory of the main branch, that file won't be there...

Thank you
 
  • #9
Think of git as a filing cabinet and a robot filing clerk. Your working directory is your desk. Creating a new branch is asking the clerk to open a new folder in the cabinet containing a copy of the version you are currently working on. Switching to a branch is asking the clerk to clear your "desk" and initialise it with a copy of the branch to which you switched. That may cause files to appear or disappear depending on how your desk looks compared to the branch you switch to.
 
  • #10
Ibix said:
Think of git as a filing cabinet and a robot filing clerk. Your working directory is your desk. Creating a new branch is asking the clerk to open a new folder in the cabinet containing a copy of the version you are currently working on. Switching to a branch is asking the clerk to clear your "desk" and initialise it with a copy of the branch to which you switched. That may cause files to appear or disappear depending on how your desk looks compared to the branch you switch to.
Getting close to understanding thanks to your analogy :)

1711895175666.png

1711895224510.png

To paraphrase:
  • The entire file cabinet (grey box) is where the files reside.
  • Initially, the entire file cabinet has only 1 cabinet (branch 1). But we then create 3 more cabinets, i.e. three new branches (branch 2,3,4) and end up with the grey file cabinet in the image above.
  • We can only open one cabinet at a time (i.e. we can only be on a branch at a time).
  • At the very beginning, we open cabinet 1 (branch 1). All other cabinets/branches are closed. We have files A and B inside cabinet 1. These two files will be placed by Git (clerk) on our desk (working directory).
  • We then open cabinet 2 (branch 2). Cabinet 1 closes automatically. What files are inside cabinet 2? Well, magically, the clerk (Git) has placed inside cabinet 2 two files which are identical copies of file A and B. Our desk (working directory) has been cleared and now has those two exact copies of file A and B.
  • While cabinet 2 (branch 2) is open, I decide to create a new file C. File C ends up both on my desk and inside cabinet 2 (branch 2) only. My desk (working directory) reflects what is inside the currently open cabinet.
  • We close cabinet 2 (branch 2) and reopen cabinet 1 (branch 1). File C is neither inside cabinet 1 nor on my desk (which mirrors what is inside the open cabinet). This shows that the working directory is updated and only shows the files on the current branch (the currently open cabinet). In general, cabinets contain exact copies of the same files, different copies of the same files, or even new different files... My desk, i.e. the working directory, reflects only what is inside the cabinet that is currently open.
  • We reopen cabinet 2 (which closes cabinet/branch 1). We have files A,B,C. We modify file A. File A is different from file A stored in cabinet/branch 1, correct?
I know this is an analogy, but is my understanding correct?

This file cabinet conceptually represents how Git works. The actual files we created (file A,B,C) are all stored together in a folder (yellow folder) of our computer. The cabinet subdivision and the desk are Git concepts... I have seen the yellow folder on the left being the working directory though...But the desk is not the same thing as the yellow folder on the left. Using some imagery, I am envisioning like this.

1711897040970.png


The analogy does not include the staging area and commit area. How would you modify the analogy to include those concepts? The desk is the working directory. What about the commit area and the staging area?

THANK YOU!
 
  • #11
fog37 said:
Well, magically, the clerk (Git) has placed inside cabinet 2 two files which are identical copies of file A and B.
That depends - were they in whatever you copied when you created branch 2?
fog37 said:
While cabinet 2 (branch 2) is open, I decide to create a new file C. File C ends up both on my desk and inside cabinet 2 (branch 2) only.
C is only on your desk until you commit the change. Changes you make in your working directory aren't automatically reflected in the git repository/filing cabinet. But you are correct that it won't be in the other branches.
fog37 said:
The analogy does not include the staging area and commit area.
Staging is just telling git what you want to commit, and as far as I'm aware it doesn't involve copying anything. In the analogy, it's calling the file clerk over and telling it which of the files that you've changed you actually want to put in the cabinet (which is usually all of them, except assets you don't want to version control at all that you'd put in the .gitignore file). Committing is copying to the repository.

The missing step is that the .git folder in your working folder is a repository - the filing cabinet. However, that may itself be a copy of all or part of another repository elsewhere - usually on a GitHub or GitLab server. That is also a filing cabinet and a filing clerk, one that the project leadership owns. Your git talks to that git to fetch/push/pull branches to your local repository, then your git can check them out into your working directory.
 
  • #12
fog37 said:
I think of a working directory as something static, i.e. as the folder where the actual files I am working with reside.
But the actual files you are working with change if you switch branches. That's the whole point of having different branches: to have different sets of actual files you are working with.

fog37 said:
if I create a new file while on the Feature branch, that file comes into existence and goes into the working directory... but when I explore the working directory of the main branch, that file won't be there...
Well, of course, because that file is only part of the actual files you are working with on the feature branch, not on the main branch. To make it part of the actual files you are working with on the main branch, you need to merge the feature branch into the main branch.
 
  • #13
Ibix said:
Staging is just telling git what you want to commit, and as far as I'm aware it doesn't involve copying anything.
That's correct.
 
  • #14
fog37 said:
  • Initially, the entire file cabinet has only 1 cabinet (branch 1). But we then create 3 more cabinets, i.e. three new branches (branch 2,3,4) and end up with the grey file cabinet in the image above.
This isn't complete. What files were in your working directory when you created the other branches?

For example, suppose you are on branch 1, and your working directory contains one file, file1.txt, which is committed. That means branch 1 has that one file in it.

Now you create branch 2. Branch 2 now has the same set of files in it as branch 1. So branch 2 also has that one file in it, and your working directory will look the same if you switch to branch 2.

But now suppose you create another file while you are on branch 2, file2.txt, and commit it. Now branch 2 contains two files, but branch 1 still contains only one; so if you switch back to branch 1, file2.txt will no longer be in your working directory.

If you then merge branch 2 into branch 1, branch 1 will be updated to contain whatever branch 2 contains. So after the merge, file2.txt will be in branch 1 as well as branch 2.
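
The file1.txt / file2.txt scenario above can be replayed as a sketch in a scratch repository. The branch names branch1 and branch2 are stand-ins for "branch 1" and "branch 2", and the identity is a placeholder; in this simple case the merge is a fast-forward.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/branch1 # name the first branch "branch1"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "one" > file1.txt
git add file1.txt
git commit -q -m "Add file1.txt"

git checkout -q -b branch2               # create branch2 and switch to it
echo "two" > file2.txt
git add file2.txt
git commit -q -m "Add file2.txt"
on_branch2="$(ls)"

git checkout -q branch1                  # file2.txt leaves the working directory
on_branch1="$(ls)"

git merge -q branch2                     # branch1 catches up with branch2
after_merge="$(ls)"

echo "branch2:     $on_branch2"
echo "branch1:     $on_branch1"
echo "after merge: $after_merge"
```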

fog37 said:
  • We can only open one cabinet at a time (i.e. we can only be on a branch at a time).
Yes.

fog37 said:
  • At the very beginning, we open cabinet 1 (branch 1). All other cabinets/branches are closed. We have files A and B inside cabinet 1. These two files will be placed by Git (clerk) on our desk (working directory).
Yes.

fog37 said:
  • We then open cabinet 2 (branch 2). Cabinet 1 closes automatically. What files are inside cabinet 2?
It depends on what files were in your working directory when branch 2 was created. See above.

fog37 said:
  • Well, magically, the clerk (Git) has placed inside cabinet 2 two files which are identical copies of file A and B.
Not if you just switch to branch 2. If you create branch 2 when you are on branch 1 with those files, then branch 2 will also contain them. But if you just switch to branch 2, Git will not update the files in the cabinet for branch 2; it will just clear your desk and then put on your desk whatever files were already in the cabinet for branch 2.

fog37 said:
  • Our desk (working directory) has been cleared and now has those two exact copies of file A and B.
No. See above.

fog37 said:
  • While cabinet 2 (branch 2) is open, I decide to create a new file C. File C ends up both on my desk
On your desk, yes. But...

fog37 said:
  • and inside cabinet 2 (branch 2) only.
Not when you create file C. Only when you commit file C while on branch 2. Committing is what takes the files on your desk (more precisely, the ones that are staged for commit) and updates the files in the cabinet under your current branch from them.

fog37 said:
  • My desk (working directory) reflects what is inside the currently open cabinet.
Only if everything in your working directory is committed.
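
One detail the desk analogy glosses over, sketched below with hypothetical files A.txt, B.txt and C.txt: a file that Git is not tracking yet is left alone when you switch branches, so an uncommitted C stays on the desk. Only after it is committed on branch 2 does it disappear when you switch to branch 1.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/branch1 # name the first branch "branch1"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "A" > A.txt
echo "B" > B.txt
git add A.txt B.txt
git commit -q -m "Add A and B"
git checkout -q -b branch2

echo "C" > C.txt                         # on the desk only, not yet committed
git checkout -q branch1
uncommitted_view="$(ls)"                 # C.txt is still here: Git left it alone

git checkout -q branch2
git add C.txt
git commit -q -m "Add C"                 # now C.txt belongs to branch2
git checkout -q branch1
committed_view="$(ls)"                   # C.txt is gone from the desk

echo "before committing C: $uncommitted_view"
echo "after committing C:  $committed_view"
```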

fog37 said:
  • We close cabinet 2 (branch 2) and reopen cabinet 1 (branch 1). File C is neither inside cabinet 1 nor on my desk (which mirrors what is inside the open cabinet). This shows that the working directory is updated and only shows the files on the current branch (the currently open cabinet). In general, cabinets contain exact copies of the same files, different copies of the same files, or even new different files... My desk, i.e. the working directory, reflects only what is inside the cabinet that is currently open.
More or less. See above.

fog37 said:
  • We reopen cabinet 2 (which closes cabinet/branch 1). We have files A,B,C. We modify file A. File A is different from file A stored in cabinet/branch 1, correct?
Yes.

fog37 said:
The analogy does not include the staging area and commit area. How would you modify the analogy to include those concepts? The desk is the working directory. What about the commit area and the staging area?
There is no "commit area"; committing means updating the files in the cabinet (see above), which you already have in your analogy.

To add the staging area, imagine a special tray at one corner of your desk marked "staging". You have a bunch of files on your desk, all matching what is in the currently open cabinet (since everything is committed). Then you change a file, say file A. Now your desk is out of sync with what's in the cabinet.

If you decide you want to store the change you made in file A in the cabinet, you "stage" file A by putting it in the tray marked "staging". The git command that does this is git add. Then git commit updates the currently open cabinet using whatever is in the "staging" tray, and puts the files in the tray back on your desk. Assuming you haven't changed any other files on your desk, your desk is now in sync with the currently open file cabinet.
 
  • #15
Thank you PeterDonis and Ibix for the explanations and the file cabinet analogy.

The working directory idea has really been confusing me. I was thinking that the working directory was the parent directory of the .git folder. If that were the case, the working directory would be the yellow folder on the left, which it is not, since the working directory is the desk and represents a conceptual Git area just like the staging area and commit area.

1711922673050.png


As PeterDonis mentions, the files in the file cabinet are committed/saved versions of the files in the working directory. Before we commit, there are no files in cabinet 1/branch 1. The files are either on the desk or on the special side tray (staging area).

Based on the analogy, when would the desk be clean, i.e. empty (clean tree)? Would that happen when the files that were in the working directory are moved into the tray (staging area) and then moved into the file cabinet (committed), leaving the desk empty? The desk would be empty, cabinet 1 would have files and cabinet 2 (branch 2) would have the same files as cabinet 1 (branch 1). The desk, when cabinet 2 is open, would also be clean. Does it become unclean, with some files on it, when we create a new file or modify one of the files in cabinet 2?

Git may know about the files located in the yellow folder on the right (after we run git init) but that does not mean Git is tracking those files.
  • Each file in your working directory ( on the desk) can be in one of two states: tracked or untracked. Tracked files can be unmodified, modified, staged.
  • Using the analogy, a staged file: it is a file that is on the desk and there is also a copy of it inside the special tray. The copy in the tray is the staged file (ready to be committed).
  • A modified file would be a file that currently exists on our desk (working directory) and there is also a copy of the file on the special tray (staging area), but that copy (the staged file) is different from the file on the desk. A staged file can be unmodified when the file in the staging area (tray) is exactly the same as the file version on the desk.
  • An untracked file is a file that is on the desk but there aren't any copies of it either in the staging (tray) or commit area (file cabinet).
 

  • #16
fog37 said:
I was thinking that the working directory was the .git folder parent directory.
It is. More precisely, when you clone a git repository, the clone goes in a directory with the same name as the repository; that is the working directory. The .git subdirectory of that directory stores the repository data.

fog37 said:
Based on the analogy, when would the desk be clean, i.e. empty
When you first initialize a fresh repository that has had nothing committed to it. That is the only time the working directory will ever be "clean" as in empty (no files in it). There is no such thing as a "clean" working directory with nothing checked out; your working directory always reflects the state of some branch in your repository. In the desk analogy, the desk's contents always correspond to some cabinet; it is never possible to have no cabinet currently open and the desk empty.

fog37 said:
(clean tree)?
"Clean tree" just means your desk (working directory) is in sync with the currently open cabinet (the currently active branch in the repository). It doesn't mean the desk is empty.

fog37 said:
Each file in your working directory ( on the desk) can be in one of two states: tracked or untracked. Tracked files can be unmodified, modified, staged.
Yes.

fog37 said:
  • Using the analogy, a staged file: it is a file that is on the desk and there is also a copy of it inside the special tray. The copy in the tray is the staged file (ready to be committed).
Yes. (Note, though, that if you stage a file and then make further changes to it, the further changes are not automatically staged; so it is possible to have a file which has changes in it that are staged and other changes in it that are not staged. The actual "unit" as far as git is concerned here is changes--diffs--not files.)
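
A sketch of that situation, assuming a scratch repository and a hypothetical notes.txt: the staging area keeps the content as it was at git add time, and that is what the next commit records.

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Draft"

echo "draft, revised" > notes.txt
git add notes.txt                        # stage the revised content
echo "draft, revised twice" > notes.txt  # keep editing; this is NOT staged

in_tray="$(git show :notes.txt)"         # content held in the staging area
on_desk="$(cat notes.txt)"               # content in the working directory
state="$(git status --porcelain)"        # "MM": staged and unstaged changes

git commit -q -m "Revised"
committed="$(git show HEAD:notes.txt)"   # the commit took the staged content

echo "staged:    $in_tray"
echo "on desk:   $on_desk"
echo "status:    $state"
echo "committed: $committed"
```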

fog37 said:
  • A modified file would be a file that currently exists on our desk (working directory) and there is also a copy of the file on the special tray (staging area), but that copy (the staged file) is different from the file on the desk.
No. A modified file is a file on your desk which is changed from what is in the currently open cabinet, but which has not been staged.

fog37 said:
  • A staged file can be unmodified when the file in the staging area (tray) is exactly the same as the file version on the desk.
No. See above.

fog37 said:
  • An untracked file is a file that is on the desk but there aren't any copies of it either in the staging (tray) or commit area (file cabinet).
Yes.
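
These states can be inspected with git status --porcelain, which prints a two-character code per file. A minimal sketch in a scratch repository, with file names invented to match the state each one ends up in:

```shell
set -eu
cd "$(mktemp -d)"
git init -q .
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "x" > unmodified.txt
echo "x" > modified.txt
echo "x" > staged.txt
git add .
git commit -q -m "Baseline"
clean="$(git status --porcelain)"        # empty output: a clean tree

echo "change" >> modified.txt            # tracked, changed, not staged
echo "change" >> staged.txt
git add staged.txt                       # tracked, changed, staged
echo "x" > untracked.txt                 # never added

report="$(git status --porcelain)"
echo "$report"
```

In the report, the first column describes the staging area and the second the working directory: " M" marks a modified but unstaged file, "M " a staged one, and "??" an untracked one. Unmodified files are not listed at all, which is why a clean tree produces no output even though the files are still on the desk.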
 
  • #17
@fog37 have you read the git documentation? Or any detailed tutorials or references on git? The questions you are asking are questions that are answered in those sources.
 
  • #18
PeterDonis said:
@fog37 have you read the git documentation? Or any detailed tutorials or references on git? The questions you are asking are questions that are answered in those sources.
I will read better. For some reason, I know Git is supposed to make life easier with version controlling but it feels like it has been the opposite, given that I am trying to understand how Git works...
Thank you!
 
  • #19
fog37 said:
I know Git is supposed to make life easier with version controlling but it feels like it has been the opposite
Many people have that reaction. I don't know if you're forced to use git for some reason or are just trying it out, but if possible, you might want to take a look at Mercurial. It is a distributed version control system like git, but its user interface is easier for many people to grasp, and its model is somewhat simpler. For example, there is no analogue in Mercurial of the git "staging area": tracked files in the working directory are either unchanged (same as what's in the repository) or changed (meaning you need to commit to make them the same as what's in the repository).
 
  • #20
fog37 said:
I will read better. For some reason, I know Git is supposed to make life easier with version controlling
It really can. If you are just using it for your own work, then you can always work in the same directory and use Git to pull different versions into that directory as needed. A common amateurish method without Git (or anything similar) is to keep different versions in different directories with the version number in the directory name, or to tack a version number onto source file names. That can lead you to dozens, even hundreds, of nearly identical directories and files. Then, of course, the directory and file names often appear in make files, and those have to be kept up to date for different versions. It's a mess.
Even if you work on a larger system where other people are working on other parts, having your own personal Git system can really help you.
I can tell you one example where Git got me off the hook. I had developed many (hundreds?) versions of my code that worked in a larger system. I used a personal Git for my own work, but I was the only one who used Git. One day everything stopped working. I was able to look at my personal Git system and retrieve code that I knew worked in the past. When I rebuilt my program (Git does not work well with the binary files of libraries or executables), the larger system still did not work. So I could tell other people that my latest changes did not cause their current problem. Then I put my latest version back and rebuilt the library and executable. The entire thing took less than two hours.
 
  • #21
FactChecker said:
It really can.
Version control systems can, yes. But nothing you say about version control systems is specific to git. Mercurial, for example, gives the same functionality.
 