GREP is a filter that searches input files, or the standard input, for lines that contain matches for one or more patterns called regular expressions and displays those matching lines.
GREP combines most features of UNIX grep and fgrep. It also has many advantages over DOS FIND besides its support for regular expressions.
GREP comes in two versions, GREP16 (16-bit) and GREP32 (32-bit). The two versions operate the same and have the same features, except that the 32-bit version supports long filenames. You may wish to rename the version you use more often to the simpler GREP.EXE; all the following instructions assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.
The full command form is one of the following:
grep [options] ["regexp"] [<inputfile] [>outputfile]
grep [options] ["regexp"] inputfiles [>outputfile]
In the first form, GREP is a filter, taking its input from the standard input (most likely piped from some other command). In the second form, GREP takes its input from any number of files, possibly specified with paths and wildcards. Please be aware that the 16-bit and 32-bit GREP programs expand wildcards slightly differently, because the 32-bit version supports long filenames. Thus the 32-bit version expands abc* to include all files, with any extension or none, whose names start with abc; with the 16-bit version you need abc*.* to get the same result.
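To make the wildcard difference concrete, here is a minimal Python sketch. It assumes, as a simplification, that the 16-bit version matches the name and the extension of an 8.3 filename separately (so * never crosses the dot, and a pattern with no dot only matches files with no extension), while the 32-bit version matches the whole long filename at once. The function names are hypothetical; this is not GREP's own code.

```python
from fnmatch import fnmatchcase


def match_long_name(pattern, filename):
    """32-bit style (assumed): * may span the dot, so abc* matches abc.txt."""
    return fnmatchcase(filename.upper(), pattern.upper())


def match_8_3(pattern, filename):
    """16-bit style (assumed): name and extension are matched separately."""
    pat_name, _, pat_ext = pattern.upper().partition(".")
    file_name, _, file_ext = filename.upper().partition(".")
    return fnmatchcase(file_name, pat_name) and fnmatchcase(file_ext, pat_ext)


for name in ["ABC", "ABCDEF.TXT", "ABC.C", "XABC.C"]:
    print(name, match_long_name("abc*", name),
          match_8_3("abc*", name), match_8_3("abc*.*", name))
```

Under this model, abc* in 16-bit style only picks up extensionless names such as ABC, which is why abc*.* is needed there.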
In both forms, the outputfile will receive the matching lines (or other output, depending on the output options). For output to the screen, omit > and outputfile.
"regexp" is a regular expression; see below for how to construct one. A regular expression is normally required on the command line; however, if you use the /F option, regular expressions will be taken from file instead of the command line.
Example:
grep -I "pic[t\s]" \proj\*.cob >prn
will examine every COBOL source file in the PROJ directory and print every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option).
You have a lot of freedom about how you enter options. You can use a leading hyphen or slash mark; you can use upper- or lower-case letters; you can leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the P3 and B options:
/p3 -b
/b/P3
/p3B
-B/P3
-P3 -b
This document always uses capital letters for the options, to make it easier to distinguish the letter l from the digit 1.
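As an illustration of that freedom, here is a minimal Python sketch of a tokenizer that reduces all five spellings above to the same settings. It is hypothetical: it only knows that /P takes a numeric argument (digits and an optional comma) and treats every other character as a one-letter switch, ignoring options such as /D and /F that take a filename.

```python
import re


def parse_options(args):
    """Hypothetical sketch: turn GREP-style option tokens into a dict."""
    opts = {}
    for arg in args:
        # Each option starts with '/' or '-'; several may share one token.
        for group in re.findall(r"[-/][^-/]+", arg):
            body = group[1:]
            i = 0
            while i < len(body):
                letter = body[i].upper()
                i += 1
                if letter == "P":
                    # /P is followed by its context-line counts, e.g. 3 or 0,4
                    count = re.match(r"[0-9,]*", body[i:]).group(0)
                    opts["P"] = count
                    i += len(count)
                else:
                    opts[letter] = True
    return opts


for spelling in ["/p3 -b", "/b/P3", "/p3B", "-B/P3", "-P3 -b"]:
    print(spelling, parse_options(spelling.split()))
```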
/?
Display a help message and exit with no further processing.
/0 or /1
/0 returns 0 if there are matches or 1 if there are no matches; /1 returns 1 for matches or 0 for no matches. For more details, see Return values below.
/D file
The file argument is optional. If you specify /D by itself (followed by a space), debugging information will be sent to the standard error stream, which is normally the screen. Since the debugging information can be voluminous, you may want to specify an output file: file must follow the D with no intervening space, and the filename ends at the next space. GREP will append to the file if it already exists.
/F file
Take regular expressions from file, one per line, instead of from the command line. The filename must follow the /F with no intervening space, and it ends at the next space. If you use a minus sign as the filename (the /F- option), GREP will accept regular expressions from standard input. Don't do this if you are redirecting input from a file!
/I
Make the search case blind. Caution: the /I option does not apply to 8-bit characters (characters 128-255). Because there are many different encoding schemes, GREP doesn't know which characters above 127 correspond to each other as upper and lower case on your computer. Therefore, if you want case-blind comparisons, you must explicitly code any 8-bit upper and lower case in your regular expression. For instance, to search for the French word "thé" in upper or lower case, code it as th[éÉ]. The letters "th", being 7-bit ASCII characters, will be handled correctly by the /I option.
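Python's re module offers a similar restriction, which makes a convenient way to try out this caveat: with re.IGNORECASE combined with re.ASCII, only the letters A-Z and a-z are folded. This is an analogy in Python, not GREP's implementation.

```python
import re

# re.ASCII limits case folding to A-Z/a-z, like the /I restriction described above.
CASE_BLIND_7BIT = re.IGNORECASE | re.ASCII

text = "LE THÉ EST CHAUD"
print(re.search("thé", text, CASE_BLIND_7BIT))     # None: é and É are not folded
print(re.search("th[éÉ]", text, CASE_BLIND_7BIT))  # explicit class finds THÉ
```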
/Q
/D
option).
/V
Before going through the options, let's take a moment to look at some of the possible output formats. By default, GREP's output is similar to that of DOS FIND:
---------- GREP.C
op_showhead = ShowNoHeads;
else if (op_showhead == ShowNoHeads)
op_showhead = ShowNoHeads;
---------- GREP_MAT.C
op_showhead == ShowNoHeads)
However, the /U option (see below) produces UNIX grep-style output like this:
GREP.C: op_showhead = ShowNoHeads;
GREP.C: else if (op_showhead == ShowNoHeads)
GREP.C: op_showhead = ShowNoHeads;
GREP_MAT.C: op_showhead == ShowNoHeads)
As you can see, the main difference is that DOS-style output shows the filename as a header above the group of matching lines from that file, while UNIX-style output puts the name of the file on every matching line.
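The two layouts differ only in where the filename goes. A minimal Python sketch, assuming matches arrive as hypothetical (filename, line) pairs in file order, produces both; the header text is copied from the sample above.

```python
def format_dos(matches):
    """DOS FIND-style: a header line per file, then that file's matching lines."""
    out, current = [], None
    for filename, line in matches:
        if filename != current:
            out.append("---------- " + filename)
            current = filename
        out.append(line)
    return out


def format_unix(matches):
    """UNIX grep-style: the filename prefixes every matching line."""
    return [f"{filename}:{line}" for filename, line in matches]


matches = [("GREP.C", " op_showhead = ShowNoHeads;"),
           ("GREP.C", " else if (op_showhead == ShowNoHeads)"),
           ("GREP_MAT.C", " op_showhead == ShowNoHeads)")]
print("\n".join(format_dos(matches)))
print("\n".join(format_unix(matches)))
```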
Now, here are the options that control what GREP outputs and how it is formatted:
/B
/C
/H
grep /H "Directory" <inputfile | other program
/L
/V
option, display the names of files that contain no matches. (This is
the same as the L option in UNIX grep.)
/N
---------- GREP.C
[ 144] op_showhead = ShowNoHeads;
[ 178] else if (op_showhead == ShowNoHeads)
[ 366] op_showhead = ShowNoHeads;
---------- GREP_MAT.C
[ 98] op_showhead == ShowNoHeads)
With both the /N and /U options, the UNIX-style output looks like this:
GREP.C:144: op_showhead = ShowNoHeads;
GREP.C:178: else if (op_showhead == ShowNoHeads)
GREP.C:366: op_showhead = ShowNoHeads;
GREP_MAT.C:98: op_showhead == ShowNoHeads)
UNIX-style output is suitable for use with the excellent freeware editor Vim.
/P before,after
Either number can be 0. For instance, use /P0,4 if you want to show every match and the four lines that follow it.
If you use the /P option, you probably want to use the /N option as well, to display line numbers. In that case, the punctuation of the line numbers will distinguish which lines are actual matches and which are displayed for context. Here is some DOS-style output from a run with the options /P1,1N set:
---------- GREP.C
143 if (opcount >= argc)
[ 144] op_showhead = ShowNoHeads;
145
177 PRTDBG "with each matching line");
[ 178] else if (op_showhead == ShowNoHeads)
179 PRTDBG "NO");
365 if (myToggle('L') || myToggle('U') || myToggle('H'))
[ 366] op_showhead = ShowNoHeads;
367 else if (myToggle('B'))
---------- GREP_MAT.C
97 op_showwhat == ShowMatchCount ||
[ 98] op_showhead == ShowNoHeads)
99 headered = TRUE;
As you can see, the actual matches have square brackets around the line numbers, and the context lines do not.
/U
There's one small difference from UNIX grep output: UNIX grep suppresses the filename when there is only one input file, but GREP assumes that if you didn't want the filename you wouldn't have specified the /U option. Neither GREP nor UNIX grep displays a filename if input comes from a file via < redirection.
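The /P context logic described above can be sketched in Python. This is a hypothetical model, not GREP's code: it assumes a match's context is simply the "before" lines preceding it and the "after" lines following it, with overlapping windows merged so no line prints twice, and it imitates the bracketed line numbers from the /P1,1N sample (the exact column widths are a guess).

```python
import re


def with_context(lines, pattern, before, after):
    """Return numbered output lines; matches get [n], context lines a plain n."""
    hits = {i for i, text in enumerate(lines) if re.search(pattern, text)}
    shown = set()
    for i in hits:
        # Add the match plus its surrounding window, clipped to the file.
        shown.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out = []
    for i in sorted(shown):
        number = i + 1  # line numbers start at 1
        if i in hits:
            out.append(f"[{number:4d}] {lines[i]}")
        else:
            out.append(f" {number:4d}  {lines[i]}")
    return out


source = ["int a;", "if (x)", "    op_showhead = ShowNoHeads;", "", "return 0;"]
print("\n".join(with_context(source, "ShowNoHeads", 1, 1)))
```

Using a set of line indexes is what keeps two nearby matches from printing their shared context lines twice.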
Some combinations of output options are logically incompatible. For instance, /H/L makes no sense. In such cases, GREP will turn off one of the incompatible options and tell you what it did. The following list of incompatibilities is given for completeness only:
/B | overrides /H; ignored with /L or /U
/C | overrides /H, /L, /N, /P
/H | ignored with /B, /C, /L, /U
/L | overrides /B, /H, /N, /P, /U; ignored with /C
/N | ignored with /C or /L
/P | ignored with /C or /L
/U | overrides /B and /H; ignored with /L
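Read together, these entries form a precedence order: /C beats /L, /L beats /U, /U beats /B, and /B beats /H, while /N and /P give way to /C or /L. The Python sketch below applies the list in that order. The processing order is an assumption that happens to agree with every entry above, and the message wording is made up; GREP's own messages and ordering may differ.

```python
# Each option maps to the options it overrides, per the list above.
OVERRIDES = {
    "C": {"H", "L", "N", "P"},
    "L": {"B", "H", "N", "P", "U"},
    "U": {"B", "H"},
    "B": {"H"},
}
# Assumed processing order, strongest option first.
PRECEDENCE = ["C", "L", "U", "B"]


def resolve(options):
    """Return (kept options, messages) after dropping incompatible ones."""
    kept = set(options)
    messages = []
    for opt in PRECEDENCE:
        if opt in kept:
            for loser in sorted(OVERRIDES[opt] & kept):
                kept.discard(loser)
                messages.append(f"/{loser} ignored because of /{opt}")
    return kept, messages


print(resolve({"H", "L"}))
```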
The following characters are special if they occur in the listed contexts:
backslash (\), always
period (.), asterisk (*), plus sign (+), and left square bracket ([), anywhere except within square brackets
caret (^), only at the beginning of the regular expression or immediately after a left square bracket
dollar sign ($), only at the end of the regular expression
minus sign (-), only between square brackets
Here are the rules for a regular expression:
Any character that is not special matches itself. To match a special character literally, put a backslash (\) in front of it.
Example: to search for the string "^abc\def", you must put backslashes before the two special characters to make GREP treat them as normal characters rather than giving them special meanings, so that \^abc\\def is your regular expression.
You can use any character from space through character 255. If using 8-bit characters on the command line, see Special rules for the command line below.
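That rule can be applied mechanically. The Python helper below is a hypothetical sketch based only on the list of special characters above: it escapes \ . * + [ wherever they occur, ^ only in the first position and $ only in the last, and leaves everything else alone. It does not add the extra command-line escapes described later.

```python
# Characters that are special anywhere outside square brackets (plus backslash).
SPECIAL_ANYWHERE = set("\\.*+[")


def grep_escape(literal):
    """Escape a literal string so a GREP-style regex matches it exactly."""
    out = []
    last = len(literal) - 1
    for i, ch in enumerate(literal):
        if ch in SPECIAL_ANYWHERE or (ch == "^" and i == 0) or (ch == "$" and i == last):
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


print(grep_escape("^abc\\def"))  # prints \^abc\\def
```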
A character class, a list of characters enclosed in square brackets ([ ]), matches any one of those characters.
Examples: [aA] will match an upper- or lower-case letter A; sno[wr]ing will match "snowing" or "snoring".
You can indicate a character range with the minus sign (-). Examples: [0-9] will match any single digit, and [a-zA-Z] will match any English letter.
To match any Western European letter (under most recent versions of Windows, in North America and Western Europe), use [a-zA-ZÀ-ÖØ-öø-ÿ].
A character class can contain both ranges and single characters, and the order doesn't matter as long as each range is written low-high.
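These classes behave the same way in Python's re module, so you can try one out before handing it to GREP. The sketch below checks the examples above with re.fullmatch on single words or characters; GREP itself looks for the pattern anywhere in a line. The Western European class assumes text in the Windows Western European code page, whose characters from À to ÿ have the same codes as in Unicode.

```python
import re

WESTERN_LETTER = "[a-zA-ZÀ-ÖØ-öø-ÿ]"

for pattern, word in [("[aA]", "A"), ("sno[wr]ing", "snowing"),
                      ("sno[wr]ing", "snoring"), ("[0-9]", "7"),
                      (WESTERN_LETTER, "é"), (WESTERN_LETTER, "×")]:
    # fullmatch: the whole word must match the pattern
    print(f"{pattern!r:22} {word!r:10} {bool(re.fullmatch(pattern, word))}")
```

The gaps in the Western European class exist to skip the multiplication sign (×) and the division sign (÷), which sit between the accented letters.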
A negative character class, with a caret (^) right after the left square bracket, matches any single character except those listed.
Examples: [^0-9 ] matches any character except a digit or a space; the[^a-z] matches "the" followed by anything except a lower-case letter.
Note: The negative character class matches any character not within the square brackets, but it does match a character. For instance, the[^a-z] matches "the" followed by something other than a lower-case letter; it does not match "the" at the end of a line, because then "the" is not followed by any character. Please see the extended example at the end of these rules for further explanation.
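The end-of-line case is easy to see with Python's re.search, which, like GREP, looks for the pattern anywhere in the line. This is a quick check in Python, not GREP itself.

```python
import re

pattern = "the[^a-z]"
for line in ["the end", "then", "bathe.", "ends with the"]:
    print(f"{line!r:16} {bool(re.search(pattern, line))}")
```

The last line does not match because nothing follows "the". Note that "bathe." does match, since a class after "the" says nothing about what comes before it.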
A plus sign (+) after a character or character class matches one or more occurrences; an asterisk (*) matches zero or more occurrences.
Examples: snor+ing matches "snoring", "snorring", "snorrring", and so on, but not "snoing". snor*ing matches "snoing", "snoring", and so on.
Used with a character class, the plus sign and asterisk match any sequence of characters from the class, not only repetitions of the same character. For instance, sno[rw]+ing matches "snowing", "snorwing", "snowrring", and so on.
Obligatory example: [A-Za-z_]+[A-Za-z0-9_]* matches a C or C++ identifier, which is at least one letter or underscore, followed by any number of letters, digits, and underscores.
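The repetition examples can be checked the same way in Python's re module, here with re.fullmatch so the whole word must match:

```python
import re

IDENTIFIER = "[A-Za-z_]+[A-Za-z0-9_]*"

checks = [
    ("snor+ing", "snorrring"), ("snor+ing", "snoing"),
    ("snor*ing", "snoing"), ("sno[rw]+ing", "snowrring"),
    (IDENTIFIER, "_tmp42"), (IDENTIFIER, "42tmp"),
]
for pattern, word in checks:
    print(f"{pattern!r:28} {word!r:12} {bool(re.fullmatch(pattern, word))}")
```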
A caret (^) at the start of a regular expression means that the pattern starts at the beginning of a line in the file(s) being searched. A dollar sign ($, ASCII 36) at the end of a regular expression means that the pattern ends at the end of a line in the file(s) being searched.
Example: ^[wW]hereas matches the word "Whereas" or "whereas" at the start of a line, but not in the middle of a line. Blanks are not ignored, so if you want to find that word whenever it's the first word of the line, you need to use a pattern like ^ *[wW]hereas to allow for indentation.
Examples: ^$ will find lines that contain no characters at all. ^ *$ will match lines that contain no characters or contain only spaces. ^ +$ will match lines that contain only spaces, but not empty lines.
Examples: ^[A-Za-z]+$ will find every line that contains nothing but English letters. ^ *[A-Za-z]+ *$ will find every line that contains exactly one English word, possibly preceded or followed by blanks.
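To see which lines each anchored pattern selects, you can run the examples over a few sample lines with Python's re.search. The sample lines are made up for illustration, and Python stands in for GREP here.

```python
import re

lines = ["Whereas the parties", "   whereas", "", "    ", "word", " two words "]
for pattern in ["^ *[wW]hereas", "^$", "^ *$", "^ +$", "^ *[A-Za-z]+ *$"]:
    hits = [line for line in lines if re.search(pattern, line)]
    print(f"{pattern!r:20} {hits}")
```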
Extended example: suppose you want to find every occurrence of the word "the". Use the /I option to make the search case blind, and concentrate on constructing the regular expressions. At first glance, [^a-z]the[^a-z] seems adequate: anything other than a letter, followed by "the", followed by anything but a letter. That lets in "the" and rules out "then" and "mother". But it also rules out "the" at the beginning or end of a line. Remember that a negative character class does insist on matching some character. So the solution is to have four regular expressions, for "the" at the beginning, middle, or end of a line, or on a line by itself:
^the[^a-z]
[^a-z]the[^a-z]
[^a-z]the$
^the$
So to search for just the occurrences of the word "the", you'd put those four lines in a file and then use the /F option on GREP.
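To check the four patterns before saving them in a file, you can run them through Python's re module. In this sketch re.IGNORECASE stands in for /I, and a line counts as a hit if any one of the patterns matches it, as the example implies. Python also offers \b, which would do the job in a single pattern, but GREP does not implement word operators such as \< (see the differences from UNIX grep below).

```python
import re

THE_PATTERNS = ["^the[^a-z]", "[^a-z]the[^a-z]", "[^a-z]the$", "^the$"]


def has_word_the(line):
    # re.IGNORECASE stands in for GREP's /I option.
    return any(re.search(p, line, re.IGNORECASE) for p in THE_PATTERNS)


for line in ["The cat sat.", "in the middle", "ends with the", "the",
             "then again", "my mother"]:
    print(f"{line!r:18} {has_word_the(line)}")
```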
When you put regular expressions in a file (GREP /F), the above rules are sufficient. But when you enter a regular expression on the command line, you also have to contend with DOS command parsing. Putting double quotes around the expression will help, but it doesn't avoid all problems.
Suppose you want to search for a character like < or |. The DOS command-line parser always gives these characters special meanings, so if you put them in a regular expression GREP will never see them. Therefore, GREP defines several escape sequences to let you make an end run around DOS:
\" is the double quote (ASCII 34)
\c is the comma (,)
\e is the escape character (ASCII 27)
\g is the greater-than sign (>)
\i is the semicolon (;)
\l is the less-than sign (<)
\q is the equal sign (=)
\s is the space character (ASCII 32)
\t is the tab character (ASCII 9, Control-I)
\v is the vertical bar (|)
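A backslash followed by a number is the character with that numeric code, written in decimal, hex (leading 0x), or octal (leading 0). Example: capital A would be \65, \0x41, or \0101.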
Finally, if your regular expression begins with a minus (-) or slash (/), GREP will try to interpret it as an option. Example: if you're searching for the string "-in-law", GREP will think you're trying to turn on the options /I, /N, and so on. To avoid this problem, use a leading backslash (\-in-law).
Remember, the rules in this section are required only to get around parsing problems on the command line. These escape sequences are not needed, and don't work, in regular expressions in a file or when you use the /F- option to enter regular expressions on separate lines from the keyboard.
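Because the command-line escapes follow a fixed table, decoding them takes only a few lines. The Python sketch below is hypothetical (decode_cmdline is not part of GREP): it assumes a numeric escape takes all the digits that follow it (hex digits after 0x, octal digits after a leading 0, otherwise decimal) and passes any other backslash pair, such as \\ or \^, through unchanged for the regular-expression rules.

```python
import re

# Letter escapes from the list above.
LETTER_ESCAPES = {'"': '"', "c": ",", "e": "\x1b", "g": ">", "i": ";",
                  "l": "<", "q": "=", "s": " ", "t": "\t", "v": "|"}

# Hex (0x..), octal (leading 0), decimal, or any single character after '\'.
TOKEN = re.compile(r"\\(0[xX][0-9A-Fa-f]+|0[0-7]*|[1-9][0-9]*|.)", re.S)


def decode_cmdline(pattern):
    """Replace GREP's command-line escapes with the characters they stand for."""
    def repl(m):
        tok = m.group(1)
        if tok[:2] in ("0x", "0X"):
            return chr(int(tok[2:], 16))
        if tok[0] == "0":
            return chr(int(tok, 8))
        if tok.isdigit():
            return chr(int(tok))
        if tok in LETTER_ESCAPES:
            return LETTER_ESCAPES[tok]
        return m.group(0)  # leave \\, \^, \- and so on for the regex rules
    return TOKEN.sub(repl, pattern)


print(decode_cmdline(r"a\sb\vc"))
```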
You can put options you use often in the ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.
Only options can be put in the environment variable. If you want to "can" a regular expression, put it in a file and put the /F file option in the environment variable.
If you have some options in the environment variable but you don't want one of them for a particular run of GREP, you don't have to edit the environment variable. You can make most changes on the command line, like this:
Suppose you have /N in the environment variable. Then if you don't want line numbers in a particular run of GREP, just specify /N on the command line for that run to cancel the /N option set in the environment variable.
The options /0 and /1, which set GREP's return value, override each other. The one specified last on the command line is effective.
The /D and /F options, if set in the environment variable, cannot be turned off on the command line. However, you can specify different files on the command line with either of those options.
A /P setting in the environment variable can be overridden by a different /P setting on the command line. You can use /P0 to request no context lines.
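The rules above amount to a simple merge: process the environment options first, then the command-line options, with most options toggling and the rest taking the last value given. Here is a minimal Python sketch under those assumptions; the function name and the (letter, value) input format are made up for illustration, and treating every remaining option as a toggle is itself an assumption.

```python
def effective_options(env_opts, cmd_opts):
    """Hypothetical sketch of combining ORS_GREP options with command-line ones.

    Each argument is a list of (letter, value) pairs in the order given,
    for example [("N", None), ("P", "0,4")].
    """
    settings = {}
    for letter, value in list(env_opts) + list(cmd_opts):
        if letter in ("0", "1"):
            settings["return"] = letter   # /0 and /1 override each other
        elif letter in ("P", "D", "F"):
            settings[letter] = value      # a later setting replaces an earlier one
        else:
            # Assumption: every other option is a simple on/off toggle.
            settings[letter] = not settings.get(letter, False)
    return settings


print(effective_options([("N", None), ("P", "2,2")], [("N", None), ("P", "0")]))
```

Note that /D and /F are never removed here, only pointed at a different file, matching the rule that they cannot be turned off from the command line.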
If you're ever in doubt about the interaction of options between the command line and the environment variable, simply type
grep /D
and GREP will tell you all the option settings in effect.
GREP's return value can be tested with IF ERRORLEVEL in a batch file.
Value | Meaning
255 | bad option, or other error on the command line
0 | program ran to completion (whether or not there were any matches)
You might want to use GREP in a batch file or a makefile and take different actions depending on whether matches were found or not. To do this, use the /0 or /1 option. The /1 option returns an error level of 1 if matches were found or 0 if there were no matches. /0 is the opposite: it returns 0 if there were matches or 1 if there were none. In other words, the /0 or /1 option gives the value you want GREP to return if matches are found.
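The return-value rules can be summarized in a few lines of Python. This is a hypothetical sketch of the documented behavior only (grep_errorlevel is a made-up name); any other error codes GREP may use are not modeled.

```python
def grep_errorlevel(matches_found, return_option=None, command_line_error=False):
    """Hypothetical sketch of GREP's documented return values.

    return_option is None, "0" or "1", mirroring the /0 and /1 options.
    """
    if command_line_error:
        return 255                          # bad option or other command-line error
    if return_option is None:
        return 0                            # ran to completion, matches or not
    value_if_found = int(return_option)     # /0 -> 0, /1 -> 1 when matches are found
    return value_if_found if matches_found else 1 - value_if_found


print(grep_errorlevel(True, "1"), grep_errorlevel(False, "1"))
```

When testing the result in a batch file, keep in mind that IF ERRORLEVEL n is true for any return value of n or higher, so IF ERRORLEVEL 1 is also true after a command-line error (255); test the higher value first.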
Regular expressions are limited to 127 input characters, and GREP will behave strangely if you enter a longer expression.
GREP's regular expressions are slightly different from UNIX grep's. Specifically, to accommodate DOS command-line parsing, GREP defines quite a few more escape sequences, like \c and \s, as well as numeric escapes. On the other hand, GREP does not (yet) implement ?, \<, \(, \{, and \| in regular expressions.
/F-
option.
/I, character classes entered in lower case were expanded incorrectly.