git grep and two substrings: cheat sheet

Advanced Git usage scenario explained

Let’s discuss three advanced cases of using “git grep”. The most common of the three is:

  • find files that contain two or more strings anywhere in the same file (in other words, all of the strings must be present).

Every time I need to do this again, I usually forget how I did it the last time, so here is a cheat sheet, ready for copy-pasting.

The other two cases are:

  • find files that contain one string but do not contain another string;

  • find files that do not contain either of the two strings.

We’ll also discuss some variations of the presented commands.

Cheat sheet

Case 1: both strings in the same file

$ git grep -e 'use Foo' --or -e 'Foo::bar' --all-match

src/foo-bar.txt:use Foo;
src/foo-bar.txt:Foo::bar;

Case 2: one string, but without the other (note that the first string is repeated two times)

$ git grep -l 'use Foo' | xargs git grep -L 'Foo::bar' | xargs git grep 'use Foo'

src/foo-no-bar.txt:use Foo;

Case 3: neither one string nor another

$ git grep -L 'use Foo' | xargs git grep -L 'Foo::bar'

src/no-foo-no-bar.txt

(The first string here is “use Foo”; the second string is “Foo::bar”).  You can use https://github.com/squadette/git-grep to play around.

In the rest of the post we go into some technical details that may be useful.

Case 1: find files that contain both strings at the same time

In the playground repository (https://github.com/squadette/git-grep) you can find four files in the src/ directory.  Only one of them (src/foo-bar.txt) contains both substrings (“use Foo” and “Foo::bar”).

Let’s look again at the command line and its output:

$ git grep -e 'use Foo' --or -e 'Foo::bar' --all-match

src/foo-bar.txt:use Foo;
src/foo-bar.txt:Foo::bar;

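The most important part of this is “--all-match”.  On its own, “--or” matches any line that contains either pattern, in any file.  Adding “--all-match” limits the output to files in which every pattern matches at least one line, which is exactly what we need.

Note, however, that variations of this command probably don’t work the way you would expect.  That’s the main reason why I cannot remember the full picture, and that’s why I had to write it down here.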
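To see what “--all-match” changes, here is a minimal sketch that builds a throwaway repository resembling the playground.  The fourth file name (src/no-foo-bar.txt) and the filler line are my assumptions; only the “use Foo;” and “Foo::bar;” lines come from the outputs shown above.

```shell
# Scratch repository that imitates the playground layout.
# Assumption: src/no-foo-bar.txt and the "nothing here" line are invented.
play=$(mktemp -d)
cd "$play" || exit 1
git init -q .
mkdir src
printf 'use Foo;\nFoo::bar;\n' > src/foo-bar.txt
printf 'use Foo;\n'            > src/foo-no-bar.txt
printf 'Foo::bar;\n'           > src/no-foo-bar.txt
printf 'nothing here\n'        > src/no-foo-no-bar.txt
git add src

# Without --all-match: every file with a line matching either pattern.
either=$(git grep -l -e 'use Foo' --or -e 'Foo::bar')

# With --all-match: only files in which each pattern matches some line.
both=$(git grep -l -e 'use Foo' --or -e 'Foo::bar' --all-match)

printf 'either:\n%s\n\nboth:\n%s\n' "$either" "$both"
```

In this sketch the first command lists three files, while the second lists only src/foo-bar.txt.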

Regex dialects and fixed strings

The two “-e” switches here specify the patterns to search for.  By default, a “basic POSIX regex” is used.  Nobody really remembers its syntax, and it’s not super-friendly to the shell, so I recommend just using the full-scale Perl-compatible syntax (“-Pe”) instead, provided your Git was built with PCRE support.

Here are some useful alternatives to “-e”.  As far as I can tell, the dialect switches (“-F”, “-P”) apply to every pattern in the command, so pick one dialect per invocation.

“-Fe string” searches for a fixed string, without any regex syntax.

“-Pe regex” searches using Perl-compatible regex syntax, the most powerful of them all.

“-we pattern” is a “word mode” modifier.  If your file contains the string “hello” and you search for “ell”, then it WON’T match in this mode.  Without “-w” it will match anywhere.

You can use this modifier in addition to the others!  (I realized that while working on this post).  For example, you can use “-Fwe quux” to find a fixed string “quux” that is not a part of another word.
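Here is a small sketch of how these switches behave.  The file names and contents are made up for illustration; nothing here comes from the playground repository.

```shell
# Throwaway repository with hypothetical one-line files.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
printf 'a.b\n'    > dot.txt
printf 'axb\n'    > letter.txt
printf 'hello\n'  > hello.txt
printf 'quux\n'   > word.txt
printf 'quuxes\n' > longer.txt
git add .

# Default (basic regex): "." matches any character, so both files match.
regex_hits=$(git grep -l -e 'a.b')
# -F: "." is taken literally, so only dot.txt matches.
fixed_hits=$(git grep -l -Fe 'a.b')

# Without -w, "ell" matches inside "hello"; with -w it does not.
plain_ell=$(git grep -l -e 'ell')
word_ell=$(git grep -l -we 'ell' || true)   # no match, so this stays empty

# -F and -w combined: "quux" only as a whole word.
fixed_word=$(git grep -l -Fwe 'quux')

printf 'regex: %s\n' $regex_hits
printf 'fixed: %s\n' $fixed_hits
printf 'word:  %s\n' $fixed_word
```

Note that “git grep” exits with status 1 when nothing matches, hence the “|| true” on the empty case.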

We won’t be talking about basic/extended POSIX regexes here (“-Ge” and “-Ee”) because, for my purposes, they are completely superseded by Perl-compatible ones.

Command output

By default, Git shows the file name and the matching lines, with the matching patterns highlighted.

Some useful additions here are:

“-n” shows line numbers;

“-l” shows just the list of files.
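As a plain-text illustration of “-n” and “-l”, here is a sketch against a throwaway repository that contains two of the playground files (recreated by hand, so treat the setup as an assumption).

```shell
# Scratch repository with two of the playground files, recreated by hand.
play=$(mktemp -d)
cd "$play" || exit 1
git init -q .
mkdir src
printf 'use Foo;\nFoo::bar;\n' > src/foo-bar.txt
printf 'use Foo;\n'            > src/foo-no-bar.txt
git add src

# -n prefixes every matching line with its line number.
with_numbers=$(git grep -n -e 'use Foo' --or -e 'Foo::bar' --all-match)
# -l prints each matching file name once and nothing else.
names_only=$(git grep -l -e 'use Foo' --or -e 'Foo::bar' --all-match)

printf '%s\n\n%s\n' "$with_numbers" "$names_only"
# src/foo-bar.txt:1:use Foo;
# src/foo-bar.txt:2:Foo::bar;
#
# src/foo-bar.txt
```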

Case 2: find files that contain one string and do not contain another

The files in our playground repository simulate a pseudo-language that has module imports (“use Foo”) and method calls (“Foo::bar”).  Suppose that we want to find the files that import the module but do not use it (maybe we want to remove unused import declaration).  One such file is src/foo-no-bar.txt.

As I said before, the approach from Case 1 does not generalize.  Unfortunately, we have to use additional shell commands to solve the problem.

$ git grep -l 'use Foo' | xargs git grep -L 'Foo::bar' | xargs git grep 'use Foo'

src/foo-no-bar.txt:use Foo;

Here is what’s going on: first, “git grep -l 'use Foo'” is executed.  It finds all files that contain “use Foo” and emits their names on stdout.

Then the “xargs” shell command reads that list and runs “git grep -L 'Foo::bar'”, appending those names as extra arguments.  Here we use the “-L” (uppercase) switch, so only the files that DO NOT contain “Foo::bar” are emitted on stdout.

Finally, we run git grep again with the help of another “xargs”, just so that it would show us the strings found.   That’s why we must repeat the first string two times.  If we want to use additional modifiers such as “-n” (show line numbers), this is where we add them.
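One caveat with the plain pipeline is that “xargs” splits its input on whitespace, so a file name with a space in it breaks the chain.  A more defensive sketch uses “-z” on the “git grep” side and “-0” on the “xargs” side.  I am assuming an “xargs” that supports “-0” and “-r” (GNU and BSD versions do, though neither flag was historically in POSIX); “-r” keeps GNU xargs from running the next command over the whole repository when the list is empty.  The file “src/unused import.txt” is a hypothetical addition to the playground.

```shell
# Scratch repository; 'src/unused import.txt' is a made-up name with a space.
play=$(mktemp -d)
cd "$play" || exit 1
git init -q .
mkdir src
printf 'use Foo;\nFoo::bar;\n' > src/foo-bar.txt
printf 'use Foo;\n'            > src/foo-no-bar.txt
printf 'use Foo;\n'            > 'src/unused import.txt'
git add src

# -z makes git grep end each file name with a NUL byte; xargs -0 reads that.
# "--" makes sure the appended names are treated as paths, not as options.
unused=$(
  git grep -lz -e 'use Foo' |
    xargs -0 -r git grep -Lz -e 'Foo::bar' -- |
    xargs -0 -r git grep -n -e 'use Foo' --
)
printf '%s\n' "$unused"
# src/foo-no-bar.txt:1:use Foo;
# src/unused import.txt:1:use Foo;
```

The last stage is also where “-n” went in, as described above.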

We see that this case is not a generalization of Case 1, but can we do it the other way round and represent Case 1 similarly to Case 2?  Well, let’s discuss Case 3 first.

Case 3: find files that contain neither one string nor another

To show files that contain neither one string nor another, we use the same approach as in Case 2:

$ git grep -L 'use Foo' | xargs git grep -L 'Foo::bar' 
src/no-foo-no-bar.txt

Here, first we run “git grep -L 'use Foo'”; it emits the list of files that don’t contain the first string (we use uppercase “-L” here).

Then we run “xargs” together with “git grep -L 'Foo::bar'”, which further filters the list of files.

There are no substrings to show in the output, so we are just satisfied with the list of files.  (You can continue chaining this command if needed.)
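The same chaining idea also seems to cover Case 1 if you swap “-L” for “-l”.  Here is a sketch that runs both variants on a hand-made copy of the playground (the fourth file name and the filler line are my assumptions).

```shell
# Scratch repository that imitates the playground layout.
# Assumption: src/no-foo-bar.txt and the "nothing here" line are invented.
play=$(mktemp -d)
cd "$play" || exit 1
git init -q .
mkdir src
printf 'use Foo;\nFoo::bar;\n' > src/foo-bar.txt
printf 'use Foo;\n'            > src/foo-no-bar.txt
printf 'Foo::bar;\n'           > src/no-foo-bar.txt
printf 'nothing here\n'        > src/no-foo-no-bar.txt
git add src

# Case 3: files without the first string, then without the second one.
neither=$(git grep -L -e 'use Foo' | xargs git grep -L -e 'Foo::bar' --)

# Case 1 written the same way: files with the first string, then with the second.
both=$(git grep -l -e 'use Foo' | xargs git grep -l -e 'Foo::bar' --)

printf 'neither: %s\nboth:    %s\n' "$neither" "$both"
# neither: src/no-foo-no-bar.txt
# both:    src/foo-bar.txt
```

Each extra “| xargs git grep -l …” or “| xargs git grep -L …” stage narrows the list by one more string.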

I want a pony

(There is no useful information in this section unless you’re into this stuff.  You can skip to the end.)

Basically what I think would be nice to have is just*:

  • git grep       -e 'use Foo' --or        -e 'Foo::bar' (Case 1);

  • git grep       -e 'use Foo' --and --not -e 'Foo::bar' (Case 2);

  • git grep --not -e 'use Foo' --and --not -e 'Foo::bar' (Case 3);

Speaking practically, maybe a different git subcommand could be created for this?  “git mgrep”, anyone?

One of the reasons why I cannot remember all of this between attempts is that it feels that maybe I just did not grasp the entire mental model of what’s going on and I just need to think about it one more time.  What if there is still some algebraically elegant construct that does what I need using only existing features of “git grep”?

By the way, let’s investigate the full design space:

  • any regex can match a string or not match a string;

  • you can use negative/positive lookahead/lookbehinds in Perl-compatible regexes; all four combinations are allowed, but some are restricted (lookbehinds could probably be only fixed-width).

  • “--and” and “--or” allow matching two patterns on the same line;

  • “--not” inverts the match (but a more precise definition is needed);

  • “-v” inverts the result of “grep match” (it seems that the “-v” flag is global; I hoped that it could be used on both sides of “--and”);

  • “-l/-L” show matching and non-matching file names (and it does negate “-v” as I would have hoped);

  • “--all-match” seems to be not generic enough, otherwise we could just have our pony; but then we have to really carefully define what exactly is a “match” here.

This is a different kind of pony, but maybe somebody has a better mental model for what’s going on here?
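To make the “same line” point concrete, here is a sketch of what the Case 2 wish actually does today, run on a hand-made copy of two playground files.  “--and --not” is evaluated per line, so the “use Foo;” line of src/foo-bar.txt still matches even though the file contains “Foo::bar” elsewhere.

```shell
# Scratch repository with two of the playground files, recreated by hand.
play=$(mktemp -d)
cd "$play" || exit 1
git init -q .
mkdir src
printf 'use Foo;\nFoo::bar;\n' > src/foo-bar.txt
printf 'use Foo;\n'            > src/foo-no-bar.txt
git add src

# Matches LINES that contain 'use Foo' and do not contain 'Foo::bar'.
line_level=$(git grep -e 'use Foo' --and --not -e 'Foo::bar')

printf '%s\n' "$line_level"
# src/foo-bar.txt:use Foo;
# src/foo-no-bar.txt:use Foo;
```

Both files show up, which is why the file-level question still needs the “xargs” pipeline from Case 2.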

Outro

I hope that this text was useful, and maybe you even learned something new.

My main creative output is currently focused on: