apply only changes to new or updated files from a git branch

First, let’s get pull out of the question, because pulling is a red herring: git pull means, roughly speaking, run git fetch, then run git merge. The action you’re concerned with here is merging.

Update: It appears unlikely that Git supports something like this directly. I am looking into a mixed approach of cherry-picking individual files via a script that looks for new and modified entries only.

Let’s go back to the basics: Git implements version control by making full snapshots of every file. These are in commits. Commits are numbered, with big ugly hash IDs. Each commit has two parts: the snapshot, and some metadata. The metadata shows who made the commit, when, and so on, and contains the number(s) of its parent commits.

Now look at your question again: you want new files and modified files. A commit, standing by itself, has no new files. It has no modified files. It has no deleted files. It just has files: it is simply a snapshot. You find something to be “new” or “modified” by comparing the snapshot to some other snapshot.

Which snapshot shall we compare, to which other snapshot? That’s the key to solving your problem: you must pick the right set of snapshots and direct Git to do the right comparisons. Do you want one single comparison? Do you want many comparisons? Which ones do you want done, when, and what do you want to do with each comparison’s result?
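To make the "a commit is just a snapshot" point concrete, here is a minimal sketch that builds a throwaway repository with two commits and compares them. The file names and the temporary directory are made up for illustration; the comparison uses the git diff-tree plumbing command.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo one > kept.txt
echo old > changed.txt
git add . && git commit -q -m "first snapshot"
first=$(git rev-parse HEAD)

echo new > changed.txt
echo two > added.txt
git add . && git commit -q -m "second snapshot"
second=$(git rev-parse HEAD)

# A commit by itself only lists files; nothing is marked new or modified.
git ls-tree -r --name-only "$second"

# "Added" and "modified" only appear once two snapshots are compared.
statuses=$(git diff-tree -r --name-status "$first" "$second")
echo "$statuses"
```

Swapping the two arguments to git diff-tree would report added.txt as deleted instead, which is the whole point: the answer depends on which pair of snapshots you pick, and in which order.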

Realizing this, and looking at how git merge itself operates, will tell you whether git merge can be helpful. Knowing this, and looking at how git cherry-pick works, will tell you whether git cherry-pick can be helpful. Or perhaps you should simply write your own tool: a script that invokes various Git plumbing commands¹ to do what you want done.


¹ Git divides its commands into porcelain, or user-facing, commands and plumbing commands. Plumbing commands are typically tools that do one specific job, while porcelain commands typically offer both options and user configuration. For instance, git pull runs two Git commands, but some users want the second command to be git rebase instead of git merge, so you can configure git pull to run git rebase instead of git merge. That in turn means that if you are 100% sure you want, in some script, to run git fetch followed by git merge, you should not use git pull because it might run git rebase instead!

Git’s attempt to divide these is not completely successful, but some commands, such as git for-each-ref and git rev-list, are definitely not end-user-oriented, while others like git log and git diff are, or try to be. The git diff command has multiple plumbing commands that implement various parts: git diff-index, git diff-tree, git diff-files. None of those read user configuration. The git diff porcelain command does read user configuration. So in a script, you would generally want to figure out which plumbing command to use, so as not to have your script break due to user configuration.
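As a rough illustration of why scripts should prefer plumbing, the sketch below renames a file in a throwaway repository and then runs the porcelain git diff and the plumbing git diff-tree with the diff.renames setting switched on and off. The file names are invented for the demo, and the -c option simply stands in for whatever a user might have in their configuration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'some content\nmore content\n' > old.txt
git add . && git commit -q -m "add old.txt"
git mv old.txt new.txt
git commit -q -m "rename to new.txt"

# Porcelain: the result depends on the user's diff.renames setting.
porcelain_on=$(git -c diff.renames=true diff --name-status HEAD~1 HEAD)
porcelain_off=$(git -c diff.renames=false diff --name-status HEAD~1 HEAD)

# Plumbing: the same setting is not consulted, so the output is stable.
plumbing_on=$(git -c diff.renames=true diff-tree -r --name-status HEAD~1 HEAD)
plumbing_off=$(git -c diff.renames=false diff-tree -r --name-status HEAD~1 HEAD)

printf 'porcelain, renames on:\n%s\n' "$porcelain_on"
printf 'porcelain, renames off:\n%s\n' "$porcelain_off"
printf 'plumbing, either way:\n%s\n' "$plumbing_on"
```

If a script does want rename detection from the plumbing command, it can ask for it explicitly with -M, so the behavior is decided by the script rather than by whoever runs it.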


git merge

Merge is a big and complicated command, but if we ignore various options, such as --squash, and edge cases such as when git merge does a fast-forward instead of merging, it ends up being relatively simple for most cases:

  • The merge operation takes two commits: the current commit, at the tip of the current branch, and some other commit, at the tip of some other branch.

  • Merge uses the commit graph to identify a merge base commit. This is the best common commit: of all the commits that are on both branches, one of them is “best”. (Technically this is the Lowest Common Ancestor of the DAG formed by the commits’ parent/child relationships in the commit graph.)

  • Merge now performs not one but two diffs. The two diffs compare the common ancestor—the merge base—to each of the two branch tip commits:

              I--J   <-- current-branch (HEAD)
             /
    ...--G--H
             \
              K--L   <-- their-branch
    

    Here, the two branch tip commits are J, the last commit on our current branch, and L, the last commit on the other branch. The best shared commit is obviously commit H, as it is where the branches rejoin in history: all commits up through and including H are on both branches, and H is “closest to the end” (non-technically speaking; for the technical definition of LCA, see the Wikipedia article on lowest common ancestors).

    These two differences, which ignore all the intervening commits, produce a set of added, modified, deleted, and/or renamed files.² When files are modified, they produce a list of changes that will transform the merge-base copy of some file into the branch-tip copy of the same file.³
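The steps in the list above can be reproduced by hand. This is only a sketch under simple assumptions: a throwaway repository, branches named main and their-branch, one commit on each side of a shared commit H, and invented file names. It finds the merge base with git merge-base and then runs the two diffs that git merge would start from.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > shared.txt
git add . && git commit -q -m "H"
git branch their-branch            # both branches now share commit H

echo ours > ours-only.txt
git add . && git commit -q -m "J"  # our side adds a file

git checkout -q their-branch
echo changed > shared.txt
git add . && git commit -q -m "L"  # their side modifies a file
git checkout -q main

# The merge base is the best commit that is on both branches (H here).
base=$(git merge-base HEAD their-branch)

# Two diffs, both starting from the merge base.
ours_changes=$(git diff-tree -r --name-status "$base" HEAD)
their_changes=$(git diff-tree -r --name-status "$base" their-branch)
printf 'what we changed:\n%s\n' "$ours_changes"
printf 'what they changed:\n%s\n' "$their_changes"
```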

Having obtained the correct differences and identified the correct files, merge’s work is now to combine the various changes. The combined changes are then applied to the snapshot from the merge base. The result is a new snapshot that takes “our changes”, but also takes “their changes”. The terminology for this is a three-way merge. See also the question “Why is a 3-way merge advantageous over a 2-way merge?”

In some cases, the combining process goes awry because Git can’t combine the two sets of changes. In these cases, git merge stops in the middle of the operation. The new snapshot is not yet made. You now have the job of figuring out the correct snapshot, arranging for that correct snapshot to be in Git’s index, and then finishing the merge.

The finished merge is special in exactly one way: instead of a single parent commit—the normal parent in our drawing above would be commit J—the merge commit that Git makes, or that you make after you resolve conflicts, has two parents, J and L. This means that a future git merge operation that uses the commit graph will find a different merge base, because now commits K-L will be on our branch. Commit H, the old merge base, won’t be the merge base in a later merge.
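You can check both claims (two parents, and a different merge base afterwards) in a scratch repository. The sketch below assumes the same invented layout as the drawing: branches main and their-branch that diverge after a shared commit H.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > shared.txt
git add . && git commit -q -m "H"
git branch their-branch

echo ours > ours-only.txt
git add . && git commit -q -m "J"

git checkout -q their-branch
echo changed > shared.txt
git add . && git commit -q -m "L"
git checkout -q main

old_base=$(git merge-base HEAD their-branch)   # commit H
git merge -q -m "merge their-branch" their-branch

# One line: the merge commit's hash followed by the hashes of its parents.
parents=$(git rev-list --parents -n 1 HEAD)
echo "$parents"

# After the merge, the merge base is the tip of their-branch, not H.
new_base=$(git merge-base HEAD their-branch)
```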

Other than this alteration to the graph, a merge commit is otherwise exactly the same as any other commit: it has some metadata, saying who made it, when, and why (the log message), and it has a snapshot. The snapshot is the one that was in Git’s index when the merge completed, just as any git commit makes a snapshot of what is in Git’s index.

What you want to do is similar, but you definitely want to discard some changes. Sometimes, you want to keep some files the way they appear in your own branch tip commit. Sometimes you don’t. If you let git merge complete the merge and make its own commit, that’s the wrong result. But git merge lets you stop the command, as if there had been a conflict, before making the new commit, using the --no-commit option. This leaves everything else set up as usual, but now you can alter what’s in Git’s index, before you finish the merge.

Will that do what you want? Maybe, if git merge's commit graph manipulations are what you want and you can come up with the right set of changes to make to Git’s index so that it contains what you want it to contain.
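Here is one possible shape for that, as a sketch only. It assumes that "new and modified files only" means "take their additions and modifications, but keep any file they deleted", which may or may not match what you need. The branch and file names are invented, and the loop assumes path names without embedded newlines (a real script would want the -z option).

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > shared.txt
echo precious > doomed.txt
git add . && git commit -q -m "H"

git checkout -q -b their-branch
echo changed > shared.txt          # they modify a file,
echo fresh > new.txt               # add a file,
git rm -q doomed.txt               # and delete a file
git add . && git commit -q -m "L"

git checkout -q main
echo ours > ours-only.txt
git add . && git commit -q -m "J"

base=$(git merge-base HEAD their-branch)

# Do the merge, but stop before the commit is made.
git merge -q --no-commit --no-ff their-branch

# Assumption: deletions on their side are unwanted, so put those files
# back into the index (and work-tree) from our own tip commit.
git diff-tree -r --name-only --diff-filter=D "$base" their-branch |
while IFS= read -r path; do
    git checkout -q HEAD -- "$path"
done

# Finish the merge; the commit records whatever is now in the index.
git commit -q -m "merge their-branch, keeping files it deleted"
```

The result is still a two-parent merge commit, so Git will treat everything on their-branch as merged from now on, including the deletion you chose not to take.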


² In Git, this is the job of the merge strategy. There is a new merge strategy being worked on now, called merge-ort, which I have yet to catch up on. For now, I’m talking about the recursive or resolve strategy, where I’m assuming a single merge base and therefore both strategies do the same thing.

³ This gets into an interesting question: what does it mean for some file to be the same file? If you look very closely at this question, it becomes harder and harder to answer. Git’s answer is usually that a file whose path name in the merge base is path/to/file must be the same file as a file in the branch-tip commit whose name is also path/to/file, but if you’ve renamed files, this gets murkier.


git cherry-pick

Cherry-picking is not as complicated as git merge: it makes a single ordinary (non-merge) commit that attempts to duplicate the changes from some other commit. The surprise here is that git cherry-pick uses Git’s merge machinery.

Remember again that each commit has the two parts: data—the snapshot—and metadata. The metadata for a cherry-pick will be the same as for any ordinary commit. But how should Git get the data?

What we want from git cherry-pick is to reproduce what happened in some existing commit. But as we just observed yet again, commits don’t store changes. How can we tell what happened in some previous commit? The answer is: we pick two previous commits and compare them. Which two previous commits do we pick? That’s obvious after a moment of thought and a look at the commit graph:

...--o--o--P--C--o--o--...   <-- some-branch
  ...
    ...--o--T   <-- current-branch (HEAD)

Here we are with commit T, our target for the cherry-pick, all checked out in our work-tree, with us ready to work on it. But the work we’d like to do now was already done, in commit C, the child of parent P. What does that mean? Well, it means that if we compare the snapshot in parent P to the snapshot in child C, we’ll see the changes we’d like to make to our target snapshot T.

So, we need a diff from P to C. Git can certainly do that: commit C contains in its metadata the (single) parent hash ID for commit P, so Git can run the diff and see what changed. Then Git can try to apply these changes to the current snapshot, in our target commit.

What people have found over time is that this “apply some other change” works better when it is treated as a three-way merge. We simply designate commit P as the “merge base”, keep T as our current commit, and use commit C (the child of our chosen merge base P and, of course, the argument to git cherry-pick) as the other commit. Git then finds what we changed from P to T, and what they changed from P to C, and combines them.

If the combining goes well, git cherry-pick makes the commit on its own. The commit it makes isn’t a merge commit though:

...--o--o--P--C--o--o--...   <-- some-branch
  ...
    ...--o--T--C'   <-- current-branch (HEAD)

Here, the new commit is an ordinary single-parent commit, with parent T. I’ve called the new commit C' because it is, sort of, a copy of commit C: the diff from T to C' is the same, sort of, as the diff from P to C. (The line numbers of various changes might differ, and if you had to resolve merge conflicts, the diffs themselves will differ in exactly those places where you did something to fix the conflict.)

You can get git cherry-pick to avoid doing the final commit, even if there are no conflicts. Just add the --no-commit argument. The cherry-pick will stop, just like merge will stop. You now have a chance to change what’s in Git’s index, so that the final commit—from git cherry-pick --continue or git commit—has contents that aren’t the result of the automated three-way merge.

What this means for you is that you will have to do the same kind of work as you would do for git merge --no-commit. You will need to determine which changes to keep and which ones to discard.
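As a sketch of what that work might look like, the example below cherry-picks a commit C that touches two files, then throws away the change to one of them before committing. The branch name some-branch and the file names keep.txt and skip.txt are invented; which changes you discard is, of course, up to you.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo v1 > keep.txt
echo v1 > skip.txt
git add . && git commit -q -m "P"

git checkout -q -b some-branch
echo v2 > keep.txt
echo v2 > skip.txt
git add . && git commit -q -m "C"  # the commit we want to copy, in part

git checkout -q main
echo other > other.txt
git add . && git commit -q -m "T"  # our target commit

# Apply C's changes to the index and work-tree, but do not commit yet.
git cherry-pick --no-commit some-branch

# Discard the part we do not want by restoring our own copy.
git checkout -q HEAD -- skip.txt

# Make the ordinary, single-parent commit from what is now in the index.
git commit -q -m "take the keep.txt change from C, leave skip.txt alone"
```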

Writing your own command

If neither of these methods seems to produce the results you want, consider doing your own git diff (using the appropriate plumbing commands, of course, so that you don’t get tripped up by user configurations). You can find the hash IDs of the appropriate commit(s) using git rev-parse, which is a very handy plumbing command. (Sometimes you can use names, but generally, it’s a good idea to turn the names into the appropriate raw hash IDs. See the gitrevisions documentation for the various ways to spell a commit hash ID for git rev-parse.)
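For instance, a script might resolve names to hash IDs up front, roughly like this. The repository and the name no-such-name are made up for the demo; the ^{commit} suffix asks git rev-parse to insist that the name refers to a commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add . && git commit -q -m "first"
echo two > file.txt
git add . && git commit -q -m "second"

# Turn names into raw hash IDs once, then use only the hashes.
tip=$(git rev-parse --verify 'main^{commit}')
parent=$(git rev-parse --verify 'main~1^{commit}')
echo "tip=$tip parent=$parent"

# With --verify --quiet, a bad name yields a failure status and no output.
if git rev-parse --verify --quiet 'no-such-name^{commit}' >/dev/null; then
    missing=found
else
    missing=absent
fi
```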

Note that git diff --name-status will show you the file names and a status letter for the result of each diff. For instance, if you diff the parent and child commits that git cherry-pick would use:

git diff <hash-of-P> <hash-of-C>

you’ll see some set of changes, and in that set of changes, you may also see an action of the form delete some file entirely or create some new file. If you enable rename detection, you may see an action of the form rename old name to new name. With git diff --name-status, you’ll see A (add new file), D (delete file), R (rename file), or M (modify file) as the status.⁴ You can use this to decide whether a file is “new”, for instance.
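In a script, the plumbing equivalent might look like the following sketch. It builds a parent and child commit in a throwaway repository (the file names are invented), lists every status letter, and then uses --diff-filter to pull out only the new and only the modified files.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo old > changed.txt
echo gone > removed.txt
git add . && git commit -q -m "P"

echo new > changed.txt
echo fresh > added.txt
git rm -q removed.txt
git add . && git commit -q -m "C"

parent=$(git rev-parse --verify 'HEAD~1^{commit}')
child=$(git rev-parse --verify 'HEAD^{commit}')

# Every changed path with its status letter.
all=$(git diff-tree -r --name-status "$parent" "$child")
echo "$all"

# Only the paths that are new, and only the paths that are modified.
new_files=$(git diff-tree -r --name-only --diff-filter=A "$parent" "$child")
modified_files=$(git diff-tree -r --name-only --diff-filter=M "$parent" "$child")
echo "new: $new_files"
echo "modified: $modified_files"
```

Because git diff-tree does no rename detection unless you ask for it, a renamed file shows up here as one deletion plus one addition, which is something to decide about deliberately if "new" matters to you.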

If you then decide you’d like to run a standard three-way merge algorithm on some specific file content, you will need to extract the three input files from three commits (or wherever they might come from). The git merge-file command will then be able to perform that merge, possibly with conflicts.
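A minimal sketch of that, assuming the three versions come from the merge base and the two branch tips, and that the two sides changed different parts of the file so that no conflict arises. The branch names, the file name notes.txt, and the scratch directory are all invented for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"   # throwaway repository for the demo
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\ntwo\nthree\nfour\nfive\n' > notes.txt
git add . && git commit -q -m "base"
git branch their-branch

printf 'ONE\ntwo\nthree\nfour\nfive\n' > notes.txt
git add . && git commit -q -m "ours: change the first line"

git checkout -q their-branch
printf 'one\ntwo\nthree\nfour\nFIVE\n' > notes.txt
git add . && git commit -q -m "theirs: change the last line"
git checkout -q main

base=$(git merge-base main their-branch)

# Extract the three input files from the three commits.
work=$(mktemp -d)
git cat-file -p "$base:notes.txt" > "$work/base"
git cat-file -p "main:notes.txt" > "$work/ours"
git cat-file -p "their-branch:notes.txt" > "$work/theirs"

# Three-way merge; the result is written over the first file named.
# The exit status is the number of conflicts, so 0 means a clean merge.
git merge-file "$work/ours" "$work/base" "$work/theirs"
merged=$(cat "$work/ours")
echo "$merged"
```

Note that with set -e in effect, a conflicted merge would stop this script, since git merge-file then exits nonzero; a real tool would check the status and handle the conflict markers itself.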

Note that this amounts to writing your own merge strategy, and in general, writing merge strategies is hard. See also footnote 2.


⁴ The full possible set of letters for a standard diff is ABCDMRT; in a merge conflict situation, you can get status code U. B is only possible if you use the -B option; C is only possible if you use the -C option; and R occurs only when rename detection is enabled, but this is a user configuration item. T represents a type-change, e.g., from file to symbolic link.
