.DS_Store file is in .gitignore, but it keeps popping up in status

The (rather mild) problem here is that you have some number of existing commits in which .DS_Store exists. Note that the current status indicates at least two such files, one in the top level of the tree and one in assets:

deleted:    .DS_Store
modified:   assets/.DS_Store

These existing commits cannot be changed. They will continue to hold a copy of the .DS_Store file forever (or as long as these commits continue to exist, anyway).

You must delete these .DS_Store files (all of them) from Git’s index if you wish them not to be stored in future commits. By doing that—running git rm --cached and making a commit—you’re telling Git that when you check out one of these historical commits, it should extract the historical .DS_Store files, and when you switch from that historical commit to one of the more modern commits that lacks these .DS_Store files, Git should remove them.

Since macOS Finder will, if the file is missing, create a new .DS_Store whenever it shows the directory in a Finder window, this particular action is safe enough in this particular case. However, there are several things to be aware of that can make this trickier for other files, should you need to use git rm --cached on them as well.
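The fix described above can be sketched as a short shell session. This is a minimal sketch run in a throwaway repository created with mktemp; the repository layout, identity settings, and commit messages are assumptions made for illustration, and the -f on the first git add is only there in case a global ignore file already lists .DS_Store.

```shell
set -eu
# Throwaway repository standing in for the real one (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Reproduce the problem: .DS_Store files were committed at some point.
mkdir assets
echo "finder data" > .DS_Store
echo "finder data" > assets/.DS_Store
git add -f .DS_Store assets/.DS_Store
git commit -q -m "accidentally commit .DS_Store files"

# The fix: ignore them from now on, and remove them from the index only.
echo ".DS_Store" > .gitignore
git rm -q --cached .DS_Store assets/.DS_Store
git add .gitignore
git commit -q -m "stop tracking .DS_Store"

tracked=$(git ls-files)            # names currently in the index
status=$(git status --porcelain)   # empty when there is nothing to report
echo "tracked files: $tracked"
```

After the second commit the work-tree copies are still on disk, but they are no longer in the index, so the .gitignore entry can finally keep them out of git status.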

Optional reading: why it’s this complicated

Git’s index, which has this relatively poor name (index? what is it index-ing?), has two other names. Git also calls this thing the staging area, which refers to how you use it, and the cache, which refers to how it’s used internally. The name “cache” mostly shows up in the spelling of git rm --cached, which tells Git to remove a file from the index without also removing it from your working tree.

Okay, so we have three names for the thing. What does that tell us? Well, for one thing, it tells us that Git’s index is really important. In fact, it’s absolutely crucial. But why? Why does Git insist on shoving its index in our faces over and over again? Ultimately, the real answer to that question is just “it was a deliberate choice”—but it’s worth looking at the factors that made Linus Torvalds make this choice. This turns out to start with commits.

Git is, in the end, all about commits. It’s not about files, although files are, in a sense, contained within each commit. It’s not about branch names either, although branch names are necessary to help you (and Git) find the commits. The commits, though, are the history in the repository. They hold the files, and they form things that we sometimes call branches (see What exactly do we mean by “branch”?).

Each commit is numbered, by a big ugly hash ID number. Each commit stores two things:

  • a snapshot of all of the files that Git knew about, at the time you (or whoever) made the commit; and
  • metadata, or information about the commit itself, such as who made it, when, and why (your log message).

Inside the metadata for a commit, each commit holds the hash ID of some earlier commit, and this is what makes commits act as history. We won’t get into the details here but this is how branches actually work.

The crucial bit to know here is that no part of any commit can ever be changed. This is because the hash ID of the commit is a checksum of all of the data in the commit. Git computes these checksums when making the commits, and verifies them when extracting the commits. If they don’t match, the commit has somehow become corrupt in storage,[1] and cannot be used.
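One way to see the checksum property is to re-hash a commit object by hand and compare the result with the commit's ID. This is a minimal sketch in a throwaway repository; the file name and commit message are made up for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"

# The commit's hash ID as Git reports it.
commit_id=$(git rev-parse HEAD)

# Feed the raw commit object back through Git's hashing function.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# Human-readable view of the same object: tree, author, committer, message.
git cat-file -p HEAD
echo "reported:   $commit_id"
echo "recomputed: $recomputed"
```

Changing even one byte of the metadata or the snapshot would produce a different hash ID, which is to say a different commit.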

So, all the files stored in a commit are read-only. They are also compressed (to save space) and de-duplicated (to save space and time). If two commits in a row share 999 out of 1000 files, they literally share the files: only the one changed file has to go into storage for the later commit. But this means that the committed files are entirely useless for getting any new work done:

  • you can’t change them, and
  • most programs can’t even read them.

So Git has to extract commits to a usable area. Git calls this area your working tree or work-tree, because it’s where you do your work. The files in here are ordinary everyday files, and you can get work done.
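The de-duplication mentioned above can be observed directly, since an unchanged file has the same internal blob hash ID in both commits. A minimal sketch, assuming a throwaway repository with two hypothetical files, same.txt and changing.txt:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "unchanged content" > same.txt
echo "version 1" > changing.txt
git add same.txt changing.txt
git commit -q -m "first"

echo "version 2, now longer" > changing.txt
git add changing.txt
git commit -q -m "second"

# Blob hash IDs of each file as stored in the two commits.
same_old=$(git rev-parse HEAD~1:same.txt)
same_new=$(git rev-parse HEAD:same.txt)
chg_old=$(git rev-parse HEAD~1:changing.txt)
chg_new=$(git rev-parse HEAD:changing.txt)

echo "same.txt:     $same_old -> $same_new"
echo "changing.txt: $chg_old -> $chg_new"
```

Both commits refer to a single stored object for same.txt; only changing.txt needed a new one.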

What Git needs, then—and this is common across all version control systems; they all share this kind of setup—is:

  • committed, unchanging historical files; and
  • a work area (workspace or working tree or whatever you like) where you get work done.

In Git’s case, the work area is literally yours, to do with as you will. This means you can create files in it that you don’t want Git to save-for-all-time. This, you might assume, is where .gitignore comes in. This assumption isn’t wrong, but it’s incomplete.

[1] Storage media do fail, in the real world. Most failures are detected, but there is a chance (usually claimed to be about 1 in 10^16 or better) that some failure might be missed, giving you back bad data. Google have done analyses and found that the actual error rate tends to be higher than claimed.

The index

What Git really needs is a list of files to commit. That is, suppose you’re in your work-tree, doing your work. You create a bunch of files—maybe several hundreds, or thousands, or whatever. Two of those files should go into the next commit as new files, and the rest should be ignored.

One possible way to deal with this is to just have an “ignore these files” file, and automatically generate the list of files-to-be-committed from every file that’s not listed in the “ignore these files” file. But if you try this out, you’ll find it is error-prone. Git and several similar version control systems use, instead, an explicit “add some file(s)” command to add them to a list of files.

The index, then, could have been just a sort of manifest: these are the files to include; all other files are to be called out as untracked when you ask for status. Suppose this were the case, and you said add all files. You hadn’t told Git explicitly to ignore the .DS_Store files. They go into the list. You make a commit, and the commit has the .DS_Store files. Later, you realize that you didn’t intend to commit the .DS_Store files.

It’s too late. Those commits now exist. No matter what you do to the manifest, at most, you’re just going to omit the .DS_Store files from future commits. You can’t fix existing commits as they are read-only. At best, you can go back to all your old commits, take them out one by one, remove the .DS_Store files, and make a new and improved commit that’s otherwise the same as the original, but now lacks the .DS_Store files.

(You can in fact do all this. But it means that you need to get everyone else—all the other people that have a clone of your repository—to stop using the old commits in favor of the new and improved ones.)

Now, what makes Git’s index particularly unusual—as compared to, e.g., Mercurial’s manifest—is that having this list of files, Linus decided to expose it, and to do a couple of special tricks with it:

  • The index holds not only the names of all the files that go into the next commit—initially, populated by extracting whichever commit you git checkout—but also the internal Git blob hash ID of each such file.

  • During merges, the index expands to hold up to three files at a time, all of which have the same name.

  • The git commit command doesn’t bother to look at your work-tree.[2] Instead, it just packages up whatever is in the index, at the time you run git commit. This is very fast, because those internal blob hash IDs are how Git stores the files: in fact, they’re already there, pre-compressed and pre-de-duplicated.

  • The git add command amounts to: compress and de-duplicate these files and put them in your index, replacing any previous file of the same name, or creating a new entry if there is no previous file.[3]

  • The git rm command means remove the file from both your index and your work-tree. Adding --cached means leave the work-tree copy alone.
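The index can be inspected with git ls-files --stage, which prints the mode, blob hash ID, stage number, and name of each entry. The following is a minimal sketch in a throwaway repository (readme.txt is a made-up file name) that shows git add filling in an entry and git rm --cached taking it back out:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > readme.txt
git add readme.txt

# Each index entry: mode, blob hash ID, stage number, path.
git ls-files --stage

# The hash recorded in the index is the blob hash of the file's content.
index_blob=$(git ls-files --stage readme.txt | awk '{print $2}')
file_blob=$(git hash-object readme.txt)

# Take the entry back out of the index, leaving the work-tree copy alone.
git rm -q --cached readme.txt
remaining=$(git ls-files)
echo "index entries after git rm --cached: '$remaining'"
```

The stage number is 0 for ordinary entries; the nonzero stage numbers are what the index uses to hold the extra copies during a conflicted merge.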

One outcome of all of this is that git commit won’t commit what’s in your work-tree. You can use this property to fuss with files in your work-tree for testing purposes, without actually committing the test code. This can be good or bad; different people have different opinions about whether it’s more often good or more often bad; but that’s how Linus chose to do it, and that behavior is now embedded in the hearts and minds of many Git users.
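To illustrate, here is a minimal sketch, again in a throwaway repository with a hypothetical app.txt, in which the work-tree copy is edited after git add and the commit still records the staged version:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "staged version" > app.txt
git add app.txt                          # this content goes into the index

echo "debug hack, never staged" >> app.txt   # work-tree only

git commit -q -m "commit what is in the index"

committed=$(git show HEAD:app.txt)       # content stored in the new commit
working=$(cat app.txt)                   # content sitting in the work-tree
echo "committed: $committed"
echo "work-tree: $working"
```

The extra line stays in the work-tree as an unstaged change until it is either added or discarded.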

What this all boils down to is a relatively simple statement: The index holds, at all times, your proposed next commit, or else holds merge conflicts that are yet to be resolved so that a commit is not currently possible. If we omit the merge conflict case, you can just think of the index as holding your proposed next commit.

When checking out some existing commit, Git populates its index from that commit, then uses the populated index to fill in your work-tree with files. This means that Git is now ready to make a new commit that would exactly match the current commit.[4]

[2] For usability, this long ago got changed a bit: git commit now runs git status internally, and produces a commented-out git status section in the commit message you can edit.

[3] In fact, git add really means make the index entry match the work-tree: if you remove a work-tree file, git add can remove the index copy of that file. For instance: rm path/to/file; git add path/to/file is a long-winded way of running git rm path/to/file.

[4] If the index and current commit do match, git commit will typically refuse to make a new commit, forcing you to use git commit --allow-empty to make the commit. The new commit isn’t empty (it has whatever is in the index), but the difference from the current commit will be empty.

Summary so far

Git doesn’t make a new commit from your working tree, and therefore the contents of your .gitignore file are irrelevant to the git commit command. Instead, Git makes a new commit from whatever is in Git’s index. The contents of Git’s index are normally from the current commit.[5]

Once you’ve added some file, then, the file keeps going into new commits until you explicitly remove it. That’s true regardless of the file’s name being, or not being, in a .gitignore file. To make the file really go away, you must remove it. Then the difference between the current commit and the next commit you make will include literally removing the file.

So: What does listing file names, or directory names, or patterns, or anything that you can list in .gitignore files, in a .gitignore do? What good is it?

[5] There are some exceptions to this rule; see, e.g., Checkout another branch when there are uncommitted changes on the current branch.

What .gitignore does

There are two useful things that .gitignore (or any other exclusion file such as .git/info/exclude) does, and one kind of dangerous thing:

  • First, there are en-masse git add operations (git add --all, git add ., or git add *).[6] Or, for that matter, you could list a file pattern like *.pyc in .gitignore and then run git add file.pyc anyway. What happens here is simple: If the file isn’t already in Git’s index, and the name is in an exclusion file, git add doesn’t add it.

    This means that if a file is currently untracked—see below for the definition of untracked—it stays untracked. But if the file is already in Git’s index, the .gitignore entry has no effect.

  • Second, when you run git status, Git will often whine about various files being untracked. Listing the name or pattern in an exclusion file stops the whining.

We’ll get to the dangerous thing in a moment. Let’s define tracked now. A tracked file, in Git, is a file that is currently in Git’s index. That’s it. It’s really that simple. If a file in your work-tree is in Git’s index right now, it is tracked. If it is not in Git’s index right now, it is untracked.

Remember that Git’s index contents change! If you git checkout some commit, Git fills in its index. Those files are now tracked. If you run git add on a new file, that file goes into Git’s index. That file is now tracked. If you run git rm (with or without --cached) on a file, that file comes out of Git’s index. That file is now untracked. Of course, if you ran git rm without --cached, that file is gone from your work-tree too.[7]
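The interaction between tracked-ness and exclusion files can be tried out as follows. This is a minimal sketch in a throwaway repository; main.py and main.pyc are hypothetical files, and git add -f (force) is used here only as a way to get an ignored file into the index for the demonstration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "*.pyc" > .gitignore
echo "source code" > main.py
echo "bytecode" > main.pyc

git add .                         # en-masse add skips the ignored, untracked main.pyc
after_add=$(git ls-files)

git add -f main.pyc               # forced into the index: main.pyc is now tracked
git commit -q -m "track main.pyc despite the ignore pattern"

echo "rebuilt bytecode, different size" > main.pyc
status=$(git status --porcelain)  # still reported: the ignore pattern has no effect

git rm -q --cached main.pyc       # out of the index: untracked, and ignored, again
after_rm=$(git ls-files)

echo "after git add .:        $after_add"
echo "status while tracked:   $status"
echo "after git rm --cached:  $after_rm"
```

At every step, whether main.pyc is tracked depends only on whether it is in the index, never on what .gitignore says.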

What git status does, among other things like print out the name of your current branch, is run two git diff commands:

  • The first git diff compares the current commit to Git’s index. For every file that is the same, Git says nothing. For every file that’s different, or new or removed, Git says that the file is staged for commit, along with being modified, added, or deleted.
  • The second git diff compares what’s in Git’s index to what is in your work-tree. For every file that is the same, Git says nothing. For files that are different, Git says the file is not staged for commit. What’s a bit unusual here is that for files that are new, Git calls these files untracked. That last bit, of course, is because of the definition of an untracked file.
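The two comparisons can be run by hand. This is a rough approximation rather than a description of how git status is implemented: plain git diff does not list untracked files, so the sketch uses git ls-files --others --exclude-standard for that part. The file names are made up for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > staged.txt
echo "original" > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "base"

echo "a staged change" > staged.txt
git add staged.txt                              # index now differs from HEAD
echo "an unstaged change" > unstaged.txt        # work-tree now differs from index
echo "brand new" > untracked.txt                # exists in the work-tree only

head_vs_index=$(git diff --cached --name-status)          # staged for commit
index_vs_tree=$(git diff --name-status)                   # not staged for commit
untracked=$(git ls-files --others --exclude-standard)     # untracked, not ignored

git status --short                                        # the combined view
```

Adding untracked.txt to .gitignore would empty the third list without changing the other two.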

Listing a file in .gitignore makes git status shut up about untracked-ness. It doesn’t have any effect on the actual tracked-ness at all! It just shuts up the whining.

The last thing that listing a file name or pattern in .gitignore (or some other exclusion file) does is where things are a tiny bit dangerous. This gives Git permission to destroy such a file:

  • Suppose you’re on commit a123456..., which doesn’t have a .DS_Store file in it, and there is a .DS_Store file in your work-tree. That is, .DS_Store is currently untracked.
  • You now issue a git checkout command to check out commit 4321cab..., which does have a .DS_Store file in it.

To extract commit 4321cab..., Git will have to put a .DS_Store file into Git’s index, and then copy that file out into your work-tree. You already have a .DS_Store file in your work-tree. This file will be overwritten.

Normally, Git will stop and complain: Hey, if I extract commit 4321cab..., I’ll destroy your .DS_Store file! This gives you a chance to move it out of the way, if it’s precious data. But if you list the file as ignorable, Git will feel free to clobber it.

Since the data in a .DS_Store is rarely considered precious, this is probably OK here. But be careful in general.
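The clobbering behavior can be reproduced safely in a throwaway repository. In this minimal sketch, notes.tmp is a hypothetical file standing in for .DS_Store, and the ignore rule goes into .git/info/exclude so that it is not itself affected by the checkout.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "committed long ago" > notes.tmp
git add notes.tmp
git commit -q -m "commit that contains notes.tmp"
old=$(git rev-parse HEAD)

git rm -q --cached notes.tmp
git commit -q -m "stop tracking notes.tmp"
echo "precious local data" > notes.tmp    # untracked, and not ignored yet

# Not ignored: Git refuses to check out the old commit over the untracked file.
if git checkout -q "$old" 2>/dev/null; then refused=no; else refused=yes; fi
before=$(cat notes.tmp)

# Ignored: the same checkout now overwrites the file without complaint.
mkdir -p .git/info
echo "notes.tmp" >> .git/info/exclude
git checkout -q "$old" 2>/dev/null
after=$(cat notes.tmp)

echo "refused while not ignored: $refused"
echo "content before: $before"
echo "content after:  $after"
```

git checkout also accepts --no-overwrite-ignore, which makes it stop rather than overwrite ignored files.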

[6] The precise action of the * in a Git command depends on whether you’re using a Unix-style shell such as bash, or a DOS-style command-interpreter such as CMD.EXE, but Git itself does glob expansion, so it comes out pretty similarly. There are subtle differences we won’t cover here though.

[7] Exercises with a bit of a philosophical bent: if the file named ghost isn’t in your work-tree and isn’t in Git’s index, is the non-existent ghost file also untracked? (It is not in Git’s index, so it won’t be in the next commit, anyway.) What about a file named ghost that is in Git’s index, but isn’t in your work-tree? Is this file tracked? Will it be in the next commit?
