We have a Git repository with over 400 commits, the first couple dozen of which were a lot of trial-and-error. We want to clean up these commits by squashing many down into a single commit. Naturally, git-rebase seems the way to go. My problem is that it ends up with merge conflicts, and these conflicts are not easy to resolve. I don't understand why there should be any conflicts at all, since I'm just squashing commits (not deleting or rearranging). Very likely, this demonstrates that I'm not completely understanding how git-rebase does its squashes.
Here's a modified version of the scripts I'm using:
repo_squash.sh (this is the script that is actually run):
rm -rf repo_squash
git clone repo repo_squash
cd repo_squash/
GIT_EDITOR=../repo_squash_helper.sh git rebase --strategy theirs -i bd6a09a484b8230d0810e6689cf08a24f26f287a
repo_squash_helper.sh (this script is used only by repo_squash.sh):
if grep -q "pick " $1
then
# cp $1 ../repo_squash_history.txt
# emacs -nw $1
sed -f ../repo_squash_list.txt < $1 > $1.tmp
mv $1.tmp $1
else
if grep -q "initial import" $1
then
cp ../repo_squash_new_message1.txt $1
elif grep -q "fixing bad import" $1
then
cp ../repo_squash_new_message2.txt $1
else
emacs -nw $1
fi
fi
repo_squash_list.txt: (this file is used only by repo_squash_helper.sh)
# Initial import
s/pick \(251a190\)/squash \1/g
# Leaving "Needed subdir" for now
# Fixing bad import
s/pick \(46c41d1\)/squash \1/g
s/pick \(5d7agf2\)/squash \1/g
s/pick \(3da63ed\)/squash \1/g
I'll leave the "new message" contents to your imagination. Initially, I did this without the "--strategy theirs" option (i.e., using the default strategy, which if I understand the documentation correctly is recursive, but I'm not sure which recursive strategy is used), and it also didn't work. Also, I should point out that, using the commented out code in repo_squash_helper.sh, I saved off the original file that the sed script works on and ran the sed script against it to make sure it was doing what I wanted it to do (it was). Again, I don't even know why there would be a conflict, so it wouldn't seem to matter so much which strategy is used. Any advice or insight would be helpful, but mostly I just want to get this squashing working.
Before working on our massive "real" repository, I used similar scripts on a test repository. It was a very simple repository and the test worked cleanly.
The message I get when it fails is:
Finished one cherry-pick.
# Not currently on any branch.
nothing to commit (working directory clean)
Could not apply 66c45e2... Needed subdir
This is the first pick after the first squash commit. Running git status
yields a clean working directory. If I then do a git rebase --continue
, I get a very similar message after a few more commits. If I then do it again, I get another very similar message after a couple dozen commits. If I do it yet again, this time it goes through about a hundred commits, and yields this message:
Automatic cherry-pick failed. After resolving the conflicts,
mark the corrected paths with 'git add <paths>', and
run 'git rebase --continue'
Could not apply f1de3bc... Incremental
If I then run git status
, I get:
# Not currently on any branch.
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# modified: repo/file_A.cpp
# modified: repo/file_B.cpp
#
# Unmerged paths:
# (use "git reset HEAD <file>..." to unstage)
# (use "git add/rm <file>..." as appropriate to mark resolution)
#
# both modified: repo/file_X.cpp
#
# Changed but not updated:
# (use "git add/rm <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# deleted: repo/file_Z.imp
The "both modified" bit sounds weird to me, since this was just the result of a pick. It's also worth noting that if I look at the "conflict", it boils down to a single line with one version beginning it with a [tab] character, and the other one with four spaces. This sounded like it might be an issue with how I've set up my config file, but there's nothing of the sort in it. (I did note that core.ignorecase is set to true, but evidently git-clone did that automatically. I'm not completely surprised by that considering that the original source was on a Windows machine.)
If I manually fix file_X.cpp, it then fails shortly afterward with another conflict, this time between a file (CMakeLists.txt) that one version thinks should exist and one version thinks shouldn't. If I fix this conflict by saying I do want this file (which I do), a few commits later I get another conflict (in this same file) where now there's some rather non-trivial changes. It's still only about 25% of the way through the conflicts.
I should also point out, since this might be very important, that this project started out in an svn repository. That initial history very likely was imported from that svn repository.
On a lark (influenced by Jefromi's comments), I decided to do the change my repo_squash.sh to be:
rm -rf repo_squash
git clone repo repo_squash
cd repo_squash/
git rebase --strategy theirs -i bd6a09a484b8230d0810e6689cf08a24f26f287a
And then, I just accepted the original entries, as is. I.e., the "rebase" shouldn't have changed a thing. It ended up with the same results describe previously.
Alternatively, if I omit the strategy and replace the last command with:
git rebase -i bd6a09a484b8230d0810e6689cf08a24f26f287a
I no longer get the "nothing to commit" rebase problems, but I'm still left with the other conflicts.
test_squash.sh (this is the file you actually run):
#========================================================
# Initialize directories
#========================================================
rm -rf test_squash/ test_squash_clone/
mkdir -p test_squash
mkdir -p test_squash_clone
#========================================================
#========================================================
# Create repository with history
#========================================================
cd test_squash/
git init
echo "README">README
git add README
git commit -m"Initial commit: can't easily access for rebasing"
echo "Line 1">test_file.txt
git add test_file.txt
git commit -m"Created single line file"
echo "Line 2">>test_file.txt
git add test_file.txt
git commit -m"Meant for it to be two lines"
git checkout -b dev
echo Meaningful code>new_file.txt
git add new_file.txt
git commit -m"Meaningful commit"
git checkout master
echo Conflicting meaningful code>new_file.txt
git add new_file.txt
git commit -m"Conflicting meaningful commit"
# This will conflict
git merge dev
# Fixes conflict
echo Merged meaningful code>new_file.txt
git add new_file.txt
git commit -m"Merged dev with master"
cd ..
#========================================================
# Save off a clone of the repository prior to squashing
#========================================================
git clone test_squash test_squash_clone
#========================================================
#========================================================
# Do the squash
#========================================================
cd test_squash
GIT_EDITOR=../test_squash_helper.sh git rebase -i HEAD@{7}
#========================================================
#========================================================
# Show the results
#========================================================
git log
git gc
git reflog
#========================================================
test_squash_helper.sh (used by test_sqash.sh):
# If the file has the phrase "pick " in it, assume it's the log file
if grep -q "pick " $1
then
sed -e "s/pick \(.*\) \(Meant for it to be two lines\)/squash \1 \2/g" < $1 > $1.tmp
mv $1.tmp $1
# Else, assume it's the commit message file
else
# Use our pre-canned message
echo "Created two line file" > $1
fi
P.S.: Yes, I know some of you cringe when you see me using emacs as a fall-back editor.
P.P.S.: We do know we'll have to blow away all of our clones of the existing repository after the rebase. (Along the lines of "thou shalt not rebase a repository after it's been published".)
P.P.P.S: Can anyone tell me how to add a bounty to this? I'm not seeing the option anywhere on this screen whether I'm in edit mode or view mode.
If you don't mind creating a new branch, this is how I dealt with the problem:
Being on main:
# create a new branch
git checkout -b new_clean_branch
# apply all changes
git merge original_messy_branch
# forget the commits but have the changes staged for commit
git reset --soft main
git commit -m "Squashed changes from original_messy_branch"