In general you don’t have to worry about this. It just works. Don’t even think of files as “renamed”; just think of them as existing, or not existing. If you remove index.html entirely, and then create an all-new index.php that holds the same content as the old index.html, was that actually any different from renaming the old index.html to the new index.php?
(This is a rhetorical question. The answer may be no, and it may be yes. Think about the cases where it’s no, and the cases where it’s yes. Then think about whether any of the cases where the answer is “yes, this is different” apply to a Git repository. Do they apply? If so, why do they apply? How did your repository get into that state?)
The thing to realize about Git is that it is not about files, and not even about branches. Git is, instead, all about commits.
Every commit, in Git, has a unique number. These are not simple counting numbers: they are not commits 1, 2, 3, and so on. But each one has a number. The number is huge, and is typically expressed in hexadecimal as a hash ID. Each commit gets its own unique hash ID. No other commit can ever have that ID: that ID, once that commit has it, is now reserved to that particular commit. The IDs are arranged so that every Git agrees that that ID is now reserved to that commit.1 This unique number is how your Git finds your commits in your own repository, and also how your Git will share this commit with some other Git repository. This numbering system imposes an iron constraint: nothing can change in any existing commit, ever. You can make all the new commits you like, but you cannot change an existing one; not even Git can do that.2
Besides their unique numbers, by which Git finds them, the thing to know about commits is that they store two parts: data and metadata. The data in a commit hold a full snapshot of every file. The metadata include things like the name and email address of the person who made the commit, when they made the commit, and so on. The metadata also include, for each commit, the hash ID of the previous commit. This strings commits together into a backwards-looking chain, so that if we know the hash ID of the last commit in the chain, we can use that to find all of the earlier commits, one by one.
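To make the data-plus-metadata picture concrete, here is a minimal sketch that builds a throwaway repository and prints the raw contents of its latest commit. The temporary directory, the file contents, and the "Example Author" identity are all assumptions made up for illustration.

```shell
set -e
# Throwaway repository in a temporary directory (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"       # made-up identity
git config user.email "author@example.com"

echo '<h1>hello</h1>' > index.html
git add index.html
git commit -q -m "first commit"
echo '<p>more</p>' >> index.html
git commit -q -a -m "second commit"

# Raw commit object: a tree line (the snapshot), a parent line
# (the previous commit's hash ID), author, committer, and the message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"

# Pull the previous commit's hash ID out of the metadata.
parent=$(echo "$commit_text" | sed -n 's/^parent //p')
echo "parent is $parent"

# List every file in the snapshot that this commit holds.
git ls-tree -r --name-only HEAD
```

Following the parent line from commit to commit is what walking the backwards-looking chain amounts to.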
The source snapshot in a commit contains every file (that exists in that commit, that is). These are stored in a frozen, read-only, Git-only format that de-duplicates the files. That way, if you save the same file thousands of times, it takes no extra space: all the commits actually share a single copy of the file. But while these files are great for retrieving a commit, they are completely useless for getting any new work done, because you can’t change them. So the files that are in Git are not the files you actually use when you do your work.
Knowing all of that, now think about this: regardless of how you achieve it, to “rename” a file, you must take some existing commit that has some files in it, extract all those files to a work area, then have Git delete that file and create a new file with a different name (but the same content as the old file). You then have Git make a new commit that, instead of containing index.html, contains index.php. Because the contents of the new file match the contents of the old file, the new commit needs no extra space to hold the new file,3 but you do need a new commit, because nothing can change the existing one.
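One way to check the claim that the renamed file takes no extra space is to compare the hash IDs of the stored file contents before and after the rename. This is a sketch in a throwaway repository; the directory, identity, and file contents are assumptions for illustration only.

```shell
set -e
repo=$(mktemp -d)                            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Author"       # made-up identity
git config user.email "author@example.com"

echo '<h1>hello</h1>' > index.html
git add index.html
git commit -q -m "add index.html"

# "Rename" the file: the new commit simply has index.php and no index.html.
git mv index.html index.php
git commit -q -m "rename index.html to index.php"

# Hash IDs of the stored file contents in the old and new commits.
old_blob=$(git rev-parse HEAD~1:index.html)
new_blob=$(git rev-parse HEAD:index.php)
echo "old commit, index.html: $old_blob"
echo "new commit, index.php:  $new_blob"
```

If the two IDs are equal, both commits refer to the same stored object, which is the de-duplication described above.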
So, in a very real sense, no commit ever renames a file. Each one just has whatever files it has. What Git does, when comparing some old commit to some new one, is look to see if the old commit deletes a file entirely, while the new commit creates an all-new file. If so, maybe Git should call that a rename instead of a delete-and-add. Git calls this process, of deciding whether some old file got renamed, rename detection. You can enable or disable rename detection when doing a
When you bring git pull into the picture, things can get more complicated.
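Before getting to git pull, the rename detection described above can be seen by comparing the same two commits twice, once with detection turned off and once with it turned on. The sketch below uses git diff with its --no-renames and -M options; the scratch repository and its contents are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Author"       # made-up identity
git config user.email "author@example.com"

echo '<h1>hello</h1>' > index.html
git add index.html
git commit -q -m "add index.html"
git mv index.html index.php
git commit -q -m "rename index.html to index.php"

# Rename detection off: one file deleted, one file added.
as_delete_add=$(git diff --no-renames --name-status HEAD~1 HEAD)
echo "$as_delete_add"

# Rename detection on: Git pairs the two files up and reports a rename.
as_rename=$(git diff -M --name-status HEAD~1 HEAD)
echo "$as_rename"
```

The two commits are identical in both runs; only the way the comparison is reported changes.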
1This particular bit of magic is done by using a cryptographic hash. Git computes this hash over the commit’s content. Since each commit has a unique date-and-time stamp and its source snapshot and all its other metadata, the hash ID of this unique content is in turn unique.
The pigeonhole principle tells us that this technique must eventually fail. The hash ID is large enough so that the actual failures are so rare that we never see one.
2This means that git commit --amend is a sort of lie. The old commit remains in the repository: what git commit --amend does is make a new and improved commit, and have Git use the new one instead of the old one.
If the old one is not ever used anywhere, Git will eventually delete it, but the details about this are beyond the scope of this answer.
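A small sketch of this footnote's point, under the assumption of a scratch repository with a made-up identity: after git commit --amend, the branch names a commit with a different hash ID, and the original commit can still be found by its old ID.

```shell
set -e
repo=$(mktemp -d)                            # hypothetical scratch repository
cd "$repo"
git init -q
git config user.name "Example Author"       # made-up identity
git config user.email "author@example.com"

echo '<h1>hello</h1>' > index.html
git add index.html
git commit -q -m "add index.html"
old=$(git rev-parse HEAD)

# "Change" the commit: Git actually builds a new one and switches to it.
git commit -q --amend -m "add index.html (better message)"
new=$(git rev-parse HEAD)

echo "before amend: $old"
echo "after amend:  $new"

# The old commit object is still stored in the repository.
old_type=$(git cat-file -t "$old")
echo "old object is still a: $old_type"
```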
3Technically, Git needs a little bit of space to record the new file’s name. Most commits need a bit of space to record the set of files committed; the exception occurs when the new commit contains exactly the same files and contents as some previous commit, so that diffing the old and new commits would say that they are 100% identical.
git pull = git fetch + another Git command
What git pull does is a bit complicated, because it runs two Git commands.4 First, it runs git fetch. This step obtains new commits from some other Git, such as the Git over on GitHub.
The full details of git fetch can get very complicated, but the simplified picture is pretty straightforward: your Git calls up some other Git. That other Git is in charge of some repository. That repository has some set of commits in it, which that other Git finds by branch names in that other repository. The other Git tells your Git about those names, and the commit hash ID associated with each name.
Your Git now checks in your own repository: Do I have that commit number? If you have that commit, under that number, this particular phase of git fetch is done. If not, your Git asks their Git to send that commit, and they immediately offer the previous commit. Your Git checks to see if you have that one, and if not, your Git asks for that one too. This repeats until they send the number of a commit that you do have. In this way, your Git gets, from their Git, all the new commits that they have, that you don’t.
Once your Git has all these commits, your Git takes their branch name(s) and renames them to become your remote-tracking names. If they have a master, your Git changes this into origin/master. If they have a develop, your Git renames this to origin/develop. So you end up with a bunch of origin/* names. Your git fetch operation now updates your remote-tracking origin/* names, so that your Git remembers where their Git’s branches were, at the time of this git fetch.5
Once git fetch has both obtained new commits and updated remote-tracking names, it’s done and quits. This allows git pull to run its second Git command.
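The fetch step can be tried out locally. In this sketch, a directory named theirs stands in for the other Git (such as the one on GitHub) and a clone named mine stands in for your repository; both names, the identity, and the file contents are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "theirs" plays the role of the other Git repository.
git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Author"       # made-up identity
git config user.email "author@example.com"
echo '<h1>hello</h1>' > index.html
git add index.html
git commit -q -m "first commit"
cd ..

# "mine" is your clone; their master shows up here as origin/master.
git clone -q theirs mine

# Someone adds a new commit to their repository.
cd theirs
echo '<p>more</p>' >> index.html
git commit -q -a -m "second commit"
their_tip=$(git rev-parse master)
cd ..

cd mine
before=$(git rev-parse master)
git fetch -q origin                          # first half of git pull
tracking=$(git rev-parse origin/master)
after=$(git rev-parse master)

echo "their master:     $their_tip"
echo "my origin/master: $tracking"
echo "my master:        $after"
```

After the fetch, origin/master has caught up with their master, while your own master has not moved; moving it is the job of the second command.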
The choice of second Git command is up to you. You can program Git to run git rebase here. The default command to use is git merge.6 In either case, git pull directs this second command to merge with, or rebase upon, the last commit obtained by the first command, in the branch you told git pull to use, as seen when git fetch asked the other Git about its branches.
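As a rough sketch of how that choice can be made: the pull.rebase configuration setting selects rebase for plain git pull in one repository, and the --rebase and --no-rebase options choose for a single run. The scratch repository here is an assumption, and the one-off commands are shown as comments because this scratch repository has no remote to pull from.

```shell
set -e
repo=$(mktemp -d)                            # hypothetical scratch repository
git init -q "$repo"
cd "$repo"

# Per-repository setting: plain "git pull" now uses git rebase
# as its second command in this repository.
git config pull.rebase true
pull_mode=$(git config --get pull.rebase)
echo "pull.rebase = $pull_mode"

# One-off choices on the command line (not run here, no remote exists):
#   git pull --rebase origin master       # use git rebase this time
#   git pull --no-rebase origin master    # use git merge this time
```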
Hence, if you tell git pull to use git rebase, then:
git pull origin master
is roughly equivalent to:
git fetch origin && git rebase origin/master
and if you tell git pull to use git merge (or don’t tell it otherwise), it’s roughly equivalent to:
git fetch origin && git merge origin/master
So what this does depends on three things:
- which second command you have it run (including any options you have git pull pass to the second command);
- what new commit(s) came in from their master to arrive in your origin/master; and
- what commits you have in your current branch, in relationship to the new commit(s).
4In the not-too-distant past, this was literally true. These days, git pull has been rewritten in C, instead of being a shell script; it now has the other commands built directly into it. But it still works the same way it used to; it’s just more efficient now.
5This information gets out of date. How fast it gets out of date depends on how fast other people add new commits to that other Git repository. In a highly active repository, your git fetch might be out of date in a few seconds or minutes. Most repositories aren’t that active, though, and if the other repository is under your control, on GitHub, you might be the only one who can add new commits to it.
6In one case, git pull will run git checkout as its second command, but that’s not something you control directly. You’ll mainly see it if you use git pull as the last step of manually cloning some repository, instead of using git fetch followed by git checkout.
Simplifying assumptions to keep this post short (if it’s not too late)
We might, here, make some simplifying assumptions:
- Your repository, locally, and the repository you have on GitHub, were in sync.
- You added one or two new commits to your repository on GitHub. (Or, some small finite number, anyway.)
- You have not done any uncommitted work locally either, and do not have existing untracked work-tree files that will interfere with the next few steps.
- You’re using git merge and allowing it to do a fast-forward.
- No other conditions block this fast-forward operation.
In this case, your git pull origin master, or whatever similar command you run, will merely find a few new commits and will therefore be able to do this fast-forward operation. This means that git merge does not have to do a merge at all.7
Your Git will at this point see that you have, currently, a file named index.html.8 Your Git will see that in the commit that you are about to move to, there is no file named index.html. There is no unsaved work in your work-tree index.html either. There is, however, a file named index.php in the new commit, and you don’t have a file named index.php in the way, in your work-tree. So your Git will remove the index.html file entirely (this is safe, because it’s saved in the current commit) and then create an all-new index.php file that holds the same content as the new commit, which of course matches the content of the existing commit.
Once Git’s index and your work-tree match the new commit (which happens by removing some old files entirely and adding some new files “from scratch”), the fast-forward “merge” operation is essentially complete. Git now changes your branch name to identify the new commit, so that your master now holds the same hash ID as your origin/master. Since your origin/master holds the same hash ID as the GitHub repository’s master, your setup and their setup are once again even.
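The whole scenario can be rehearsed locally under the simplifying assumptions listed above. In this sketch, theirs stands in for the GitHub repository and mine for your local clone; those names, the identity, and the file contents are assumptions for illustration. The --no-rebase and --ff-only options are there only to make the merge-with-fast-forward choice explicit.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "theirs" plays the role of the repository on GitHub.
git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Author"       # made-up identity
git config user.email "author@example.com"
echo '<h1>hello</h1>' > index.html
git add index.html
git commit -q -m "add index.html"
cd ..

# Your local repository, in sync with theirs at this point.
git clone -q theirs mine

# On their side, a new commit has index.php instead of index.html.
cd theirs
git mv index.html index.php
git commit -q -m "rename index.html to index.php"
their_tip=$(git rev-parse master)
cd ..

# On your side: fetch, then fast-forward your master.
cd mine
git pull -q --no-rebase --ff-only origin master

ls                                           # lists the work-tree files after the pull
mine_master=$(git rev-parse master)
mine_tracking=$(git rev-parse origin/master)
echo "their master:     $their_tip"
echo "my origin/master: $mine_tracking"
echo "my master:        $mine_master"
```

After the pull, your work-tree should hold index.php and no index.html, and all three hash IDs should agree.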
The file index.html was completely removed and the file index.php was created from scratch, but unless you have a way to tell that this actually happened, you won’t be able to distinguish this from having the existing index.html renamed in place. On a Unix or Linux system, the ways to tell are to examine the inode information (the inode number might change, though this isn’t guaranteed), to use a file system monitor (which will see the individual events rather than just the final result), and/or to use hard links so that the inode number is significant. If you are not going to do any of that, you probably won’t care whether the operation was “rename in place” or not.
7Despite Git calling it a fast-forward merge, it’s not really a merge at all. It’s just a git checkout that drags the branch name forward.
8Technically, Git knows this because there is a “copy” of index.html in Git’s index. The index tracks (keeps “copies” of) all the files that will go into the next commit that you make. (The word “copy” is in quotes here because Git uses the de-duplicated format, so that if what’s in Git’s index matches any existing file anywhere in the repository, there’s no actual copy, just the name.) Checking out a commit includes adjusting Git’s index to mirror the commit that was just checked out, so the index and the current commit normally match, at least until you start using git add or other Git commands to modify it to be ready to make another commit.