How can I perform a diff that ignores all comments?

Matt V. picture Matt V. · Sep 21, 2011 · Viewed 17.2k times · Source

I have a large codebase that was forked from the original project and I'm trying to track down all the differences from the original. A lot of the file edits consist of commented out debugging code and other miscellaneous comments. The GUI diff/merge tool called Meld under Ubuntu can ignore comments, but only single line comments.

Is there any other convenient way of finding only the non-comment diffs, either using a GUI tool or linux command line tools? In case it makes a difference, the code is a mixture of PHP and Javascript, so I'm primarily interested in ignoring //, /* */ and #.

Answer

kenorb picture kenorb · Mar 21, 2015

To use visual diff, you can try Meld or DiffMerge.

DiffMerge

Its rulesets and options provide for customized behavior.

GNU diffutils

From the command-line perspective, you can use --ignore-matching-lines=RE option for diff, for example:

diff -d -I '^#' -I '^ #' file1 file2

Please note that the regex has to match the corresponding line in both files and it matches every changed line in the hunk in order to work, otherwise it'll still show the difference.

Use single quotes to protect pattern from shell expanding and to escape the regex-reserved characters (e.g. brackets).

We can read in diffutils manual:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.

This behavior is also well explained by armel here.


See also:

Alternatively, check other diff apps, for example: