EXECUTIVE SUMMARY
Search for the string 'Recipe' in all files that have the .org or .html extension anywhere in the current directory or below, ensuring that the filename is prepended to all matches:
> grep -e 'Recipe' `find . \( -name "*.org" -o -name "*.html" \)` /dev/null
Same as above except all non-binary files are searched:
> grep -HIre 'Recipe' *
SUPPORTING JABBER
Consider the situation where you have many text files in a certain directory tree and you want to discover which files have particular content. Here we discuss the use of grep and find to help solve this problem. Modern versions of grep remove the need to use find, and we will discuss that method after the one applicable to more disadvantaged systems.
The grep command is used to search the contents of files. A familiar output is to have the filename prepended to each line that matches the search, for example:
> grep -e 'ground' *
photos.org: background. I also had the privilege of seeing the physical
photos.org: ground on the night of the 27th. On the morning of the 28th there
Quotes.org:going to take a lovely, simple melody and drive it into the ground. --
It is tempting to interpret the prepended filename as the overall default. However, whether the filename appears also depends on the context in which grep is used. Specifically, when grep is given a single file to search, the filename is not prepended:
> grep -e 'ground' photos.org
background. I also had the privilege of seeing the physical
ground on the night of the 27th. On the morning of the 28th there
This is reasonable behavior from the perspective of grep: since only a single file was given, there should be no doubt about which file contained the match. As we will see below, there are times when grep is given a single file but the user does not know what that file is. In these cases we want to force the filename to be identified. One way to do this is to pass grep the real file plus one other file with the following property: its contents will never match the search expression. /dev/null, which always reads as empty, is such a file. Witness the difference:
> grep -e 'ground' photos.org /dev/null
photos.org: background. I also had the privilege of seeing the physical
photos.org: ground on the night of the 27th. On the morning of the 28th there
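To try the /dev/null effect without the author's files, here is a minimal sketch in POSIX sh. It assumes a scratch directory created with mktemp and a made-up two-line photos.org; the variable names single and paired are only for illustration.

```shell
#!/bin/sh
# Sketch: grep output for a single file, with and without /dev/null.
# The scratch directory and the contents of photos.org are invented.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' 'background. I also had the privilege' \
              'ground on the night of the 27th.' > photos.org

# A single file: grep prints just the matching lines.
single=$(grep -e 'ground' photos.org)

# The real file plus /dev/null: every line gets a "photos.org:" prefix.
paired=$(grep -e 'ground' photos.org /dev/null)

printf '%s\n' "$single" "$paired"
```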
Before continuing, there are two observations to be made about the grep invocations above. First, and almost as an aside, the calls could have been written a bit more simply by dropping the -e switch and the quote marks. However, the quotes keep the shell from interpreting special characters in the pattern, and -e marks the next argument as the pattern even if it begins with a hyphen, so this construct allows for more complex search expressions. An example is to find either the word 'ground' or the word 'Recipe' in any file in the current directory:
> grep -e 'ground\|Recipe' *
photos.org: background. I also had the privilege of seeing the physical
photos.org: ground on the night of the 27th. On the morning of the 28th there
Quotes.org:going to take a lovely, simple melody and drive it into the ground. --
Recipes.org:#+TITLE: Recipes
sitemap.org: + [[file:Recipes.org][Recipes]]
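The \| alternation above is a GNU grep extension to basic regular expressions; POSIX does not define alternation in basic regular expressions. The sketch below, which assumes a scratch directory holding two invented files, puts two portable spellings next to the GNU one: -E with an extended regular expression, and one -e option per pattern.

```shell
#!/bin/sh
# Sketch: three ways to match "ground" OR "Recipe".
# The sample files are invented for illustration.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' 'drive it into the ground.' > Quotes.org
printf '%s\n' '#+TITLE: Recipes' > Recipes.org

# GNU basic regular expression: \| is a GNU extension, not POSIX.
gnu_bre=$(grep -e 'ground\|Recipe' *)

# POSIX extended regular expression: | is standard alternation.
posix_ere=$(grep -E 'ground|Recipe' *)

# Also POSIX: several patterns, one per -e option; a line matching any of them is printed.
multi_e=$(grep -e 'ground' -e 'Recipe' *)

printf '%s\n' "$posix_ere"
```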
The observation that pertains directly to the problem at hand is that the list of files for grep to search must be specified somehow. If all the files are in the same directory, then a simple wildcard expression might be all that is needed. However, sometimes the search is to be done recursively or across several directories.
The find command is useful for finding files on the system with particular characteristics. As an example, the following expression finds all files in the current directory and below that have either a .org or .html extension:
> find . \( -name "*.org" -o -name "*.html" \)
backcountry/photos.html
backcountry/readme.html
backcountry/maintenance.html
backcountry/sitemap.html
backcountry/index.html
[--snip--]
templates/rketburt-01-Level00.org
templates/rketburt-01-Level01.org
[--snip--]
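The escaped parentheses around the -o test are not decoration. find gives the implicit AND between adjacent tests higher precedence than -o, so once an action such as -print or -exec is added (as in the -exec form later in this article), the grouping decides which files the action applies to. A small sketch with three invented, empty files:

```shell
#!/bin/sh
# Sketch: why the -o test is wrapped in \( ... \).
# The three empty files are invented stand-ins for a real tree.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch notes.org page.html readme.txt

# Without grouping this parses as
#   -name "*.org" -o ( -name "*.html" -a -print )
# so only the .html file is printed.
ungrouped=$(find . -name "*.org" -o -name "*.html" -print)

# With grouping, -print applies to either kind of match.
grouped=$(find . \( -name "*.org" -o -name "*.html" \) -print | sort)

printf '%s\n' "$ungrouped" "$grouped"
```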
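Be aware that the space after the \( and before the \) proved to be vital while testing commands for this article. This is a general necessity rather than a quirk of one system: the backslash only protects each parenthesis from the shell, and find recognizes ( and ) as grouping operators only when each arrives as a separate argument.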
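One way to see why the spacing matters is to look at the words the shell actually passes along. The sketch below uses a hypothetical helper, count_args, that only prints how many arguments it received; find would receive the same split.

```shell
#!/bin/sh
# Sketch: count the arguments the shell hands to a command.
# count_args is a hypothetical helper defined only for this demonstration.
count_args() {
    echo "$#"
}

# With spaces: "(" and ")" arrive as arguments of their own, as find requires.
spaced=$(count_args \( -name "*.org" -o -name "*.html" \))

# Without spaces: the parentheses are glued onto neighbouring words,
# producing "(-name" and "*.html)", which find does not treat as grouping.
glued=$(count_args \(-name "*.org" -o -name "*.html"\))

echo "spaced: $spaced, glued: $glued"
```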
Now it is a simple matter to search the contents of multiple files. We build the file list using find embedded in backticks (`) to capture the result, then invoke grep on that list. Here is a complete example:
> grep -e 'Recipe' `find . \( -name "*.org" -o -name "*.html" \)` /dev/null
rketburt-org/Recipes.org:#+TITLE: Recipes
rketburt-org/sitemap.org: + [[file:Recipes.org][Recipes]]
rketburt/sitemap.html:<a href="Recipes.html">Recipes</a>
rketburt/Recipes.html:<title>Recipes</title>
rketburt/Recipes.html:<h1 class="title">Recipes</h1>
rketburt/index.html:<a href="Recipes.html">Recipes</a>
Note the use of /dev/null as a file argument to grep to ensure that the filename is prepended.
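The same command can be written with $(...), the POSIX form of command substitution, which behaves like backticks here but nests without extra escaping. This sketch assumes an invented site directory. Like the backtick version, it relies on the shell splitting find's output on whitespace, so it assumes no file name contains spaces or newlines. /dev/null also has a second benefit here: if find matches nothing, grep still has a file argument and does not sit waiting on standard input.

```shell
#!/bin/sh
# Sketch: the backtick command rewritten with $(...) substitution.
# The directory tree and file contents are invented for illustration.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir site
printf '%s\n' '#+TITLE: Recipes' > site/Recipes.org
printf '%s\n' '<title>Recipes</title>' > site/Recipes.html
printf '%s\n' 'nothing to see' > site/notes.txt

# The inner $(find ...) is left unquoted on purpose so that it splits into
# one argument per file; names containing whitespace would break this.
matches=$(grep -e 'Recipe' $(find . \( -name "*.org" -o -name "*.html" \)) /dev/null)

printf '%s\n' "$matches"
```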
Another way to achieve the same final result is to invoke find first and use its -exec argument to call grep. In this ordering grep is given only a single file per invocation, which leads to the missing-filename problem described earlier, so /dev/null is passed here as well. The overall syntax is also a bit more cumbersome, since {} is used to pass each result of find to grep, and the command must end with a trailing \;. An example equivalent to the backtick version above is:
> find . \( -name "*.org" -o -name "*.html" \) -exec grep -e 'Recipe' {} /dev/null \;
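Syntax and preferences aside, it is interesting to note that while these two examples produced the same end result, the one that begins with grep executed nearly 10 times faster. A likely reason is that -exec ... \; starts a separate grep process for every file found, whereas the backtick version runs grep only once.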
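POSIX find also accepts -exec ... {} +, which packs many file names into each grep invocation, much as the backtick version does, instead of starting one grep per file. The sketch below runs on invented files and only checks that both forms print the same matches; it does not reproduce the timings reported above. Because {} must come immediately before the +, /dev/null is moved in front of it.

```shell
#!/bin/sh
# Sketch: batch file names with "-exec ... {} +" instead of "-exec ... {} \;".
# The sample files are invented for illustration.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' '#+TITLE: Recipes' > Recipes.org
printf '%s\n' '<title>Recipes</title>' > Recipes.html
printf '%s\n' 'no match here' > index.html

# One grep process per file found (the form used in the text).
per_file=$(find . \( -name "*.org" -o -name "*.html" \) \
    -exec grep -e 'Recipe' {} /dev/null \; | sort)

# As few grep processes as possible: find appends many names per call.
# "{}" has to be the last word before "+", so /dev/null goes first.
batched=$(find . \( -name "*.org" -o -name "*.html" \) \
    -exec grep -e 'Recipe' /dev/null {} + | sort)

printf '%s\n' "$batched"
```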
The find command has been used above for two reasons. First, the desire was to search files that may appear in directories below the one called out; in other words, we wanted a recursive search. Second, we wanted to avoid searching non-text files, which would simply have been a time sink. The method used to exclude binary files was to limit the file extensions to just two (.org and .html). This may be exactly the behavior desired for some questions, but it may be too restrictive for others.
Modern versions of grep (GNU grep, for example) permit both recursive searching (-r) and binary file exclusion (-I). Additionally, prepending the filename can be forced (-H) even when only a single file is searched. To find all text files in or below the current directory that contain the string 'Recipe', the command is now simply:
> grep -HIre 'Recipe' *
During testing for this article, the time for grep -HIre to complete was, at best, about twice that of the grep that uses find in backticks and, at worst, over 100 times as long. This difference may have been due to system load, or possibly to the fact that there were hundreds of files together totaling nearly 2GB, each of which grep -r must at least open to decide whether it is text. Even so, there may be times when the blind search is well worth the time spent to discover something.
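Two refinements of the recursive grep -HIre command may be useful; they are sketched here against an invented tree. Using . rather than * as the starting point also searches hidden entries at the top level, which the shell's * skips. GNU grep's --include option (a GNU extension that other grep implementations may lack) restores the .org/.html restriction that find provided earlier.

```shell
#!/bin/sh
# Sketch: GNU grep recursive search, optionally limited to *.org and *.html.
# --include is a GNU grep option; the tree below is invented.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p site .hidden
printf '%s\n' '#+TITLE: Recipes' > site/Recipes.org
printf '%s\n' 'Recipe ideas' > site/todo.txt
printf '%s\n' 'Recipe draft' > .hidden/draft.org

# "." instead of "*" so that hidden entries at the top level are searched too.
everything=$(grep -HIr -e 'Recipe' . | sort)

# Limit the recursive search to the two extensions used with find earlier.
org_html=$(grep -HIr --include='*.org' --include='*.html' -e 'Recipe' . | sort)

printf '%s\n' "$org_html"
```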