GREP -- Find Regular Expressions in Files

program and documentation by Stan Brown, Oak Road Systems
revised February 20, 1999
Copyright © 1986-1999 by Oak Road Systems, +1 216 371-0043

GREP is a filter that searches input files, or the standard input, for lines that contain matches for one or more patterns called regular expressions and displays those matching lines.

Why GREP?
License and warranty
System requirements
Installation
User instructions
Options
    General options
    Output options
Regular expressions
    Normal and special characters
    How to construct a regular expression
    Special rules for the command line
Environment variable
Return values
Bugs
What's new?

 

Why GREP?


The DOS filter FIND is useful for finding a given string in one or more files. But what if you want to find the word "the" in caps or lower case, without also finding "other", "then", and so on? You don't really want to search for a specific string. Rather, what you're looking for is a regular expression, namely "the" preceded and followed by something other than a letter. GREP to the rescue!

GREP combines most features of UNIX grep and fgrep. GREP has many other advantages over FIND besides using regular expressions:


License and warranty


GREP is shareware. If you use it past a 30-day evaluation period, you are morally and legally bound to register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.
 

System requirements


The 16-bit version runs under DOS 2.0 or higher, including a DOS box under Windows. The 32-bit version requires a DOS box under Windows 98, Win95, or Win NT 4.0.

The two versions operate the same and have the same features, except that the 32-bit version supports long filenames.
 


Installation


There is no special installation procedure. Simply move GREP16.EXE, GREP32.EXE, or both to any convenient directory in your path.

You may wish to rename the version you use more often to the simpler GREP.EXE. All the following user instructions will assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.
 


User instructions


For a quick summary of operating instructions, type
        grep
The full command form is one of
        grep [options] ["regexp"] [<inputfile] [>outputfile]
        grep [options] ["regexp"] inputfiles [>outputfile]
In the first form, GREP is a filter, taking its input from the standard input (most likely piped from some other command). In the second form, GREP takes its input from any number of files, possibly specified with paths and wildcards. Please be aware that the 16-bit and 32-bit GREP programs expand wildcards slightly differently because the 32-bit version supports long filenames. Thus the 32-bit version would expand abc* to include all files, with any extension or none, whose names start with abc; with the 16-bit version you need abc*.* to get the same result.

In both forms, the outputfile will receive the matching lines (or other output, depending on the output options). For output to the screen, omit > and outputfile.

"regexp" is a regular expression; see below for how to construct one. A regular expression is normally required on the command line; however, if you use the /F option, regular expressions will be taken from file instead of the command line.

Example:

        grep -I "pic[t\s]" \proj\*.cob >prn
will examine every COBOL source file in the PROJ directory and print every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option).
 

Options


GREP's operation can be modified by several options, either on the command line (before or after the regular expression) or in an environment variable (see below).

You have a lot of freedom about how you enter options. You can use a leading hyphen or slash mark; you can use upper- or lower-case letters; you can leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the P3 and B options:

        /p3 -b    /b/P3    /p3B    -B/P3    -P3 -b
This document will always use capital letters for the options, to make it easier to distinguish letter l and figure 1.
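
The freedom described above can be modeled with a short sketch. The Python function below is a hypothetical illustration, not GREP's actual parser: the name parse_options is invented here, and it handles only letter options with an optional numeric argument (such as P3). It does not handle the digit options /0 and /1 or the filename arguments of /D and /F.

```python
def parse_options(text):
    """Map each option letter (upper-cased) to its numeric argument string.

    Hypothetical sketch: '-' and '/' both introduce options, spaces are
    ignored, and digits or commas after a letter belong to that letter.
    """
    options = {}
    current = None
    for ch in text:
        if ch in "-/ ":
            current = None              # a separator ends the current option
        elif ch.isalpha():
            current = ch.upper()        # upper and lower case are equivalent
            options[current] = ""
        elif (ch.isdigit() or ch == ",") and current is not None:
            options[current] += ch      # e.g. the 3 in P3
    return options


forms = ["/p3 -b", "/b/P3", "/p3B", "-B/P3", "-P3 -b"]
for form in forms:
    print(form, "->", parse_options(form))
```

All five spellings produce the same pair of settings: option P with argument 3, and option B with no argument.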

General options

Quite a number of GREP's options control the appearance of the output, and those options are collected in a separate section below. This section explains the other options.
/?
Display a help message and exit with no further processing.
 
/0 or /1
These options control the values that GREP returns in the DOS error level. /0 returns 0 if there are matches or 1 if there are no matches; /1 returns 1 for matches or 0 for no matches. For more details, see Return values below.
 
/Dfile
Display debugging information. Debugging information includes whether you're running the 16-bit or 32-bit version, the value of the environment variable, the values of all options specified or implied, the raw and interpreted values of the regular expression(s), and details of every file scanned. This information is normally suppressed, but you may find it helpful if GREP seems to behave in a way you don't expect.

file is optional. If you specify /D by itself (followed by a space), debugging information will be sent to the standard error stream, which is normally the screen.

Since the debugging information can be voluminous, you may want to specify an output file: file must follow the D with no intervening space, and the filename ends at the next space. GREP will append to the file if it already exists.
 

/Ffile
Read one or more regular expressions from file instead of taking a single regular expression from the command line. You must enter the regular expressions one per line in the file; don't surround them with quotes. (This is similar to the F option in UNIX grep, but unlike UNIX grep, you can have multiple regular expressions in the file.)

file must follow the /F with no intervening space, and the filename ends at the next space.

If you use a minus sign as the filename (/F- option), GREP will accept regular expressions from standard input. Don't do this if you are redirecting input from a file!
 

/I
Ignore case, treating caps and lower case as matching each other. (This is the same as the I option in UNIX grep and DOS FIND.)

Caution: the /I option does not apply to 8-bit characters (characters 128-255). Because there are many different encoding schemes, GREP doesn't know which characters above 127 correspond to each other as upper and lower case on your computer. Therefore, if you want case-blind comparisons, you must explicitly code any 8-bit upper and lower case in your regular expression. For instance, to search for the French word "thé" in upper or lower case, code it as th[éÉ]. The "th", being 7-bit ASCII characters, will be handled correctly by the /I option.
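
The same limitation can be reproduced outside GREP. The sketch below uses Python's re module on byte strings as a stand-in, since there the ignore-case flag also folds only the 7-bit ASCII letters. It assumes the text is encoded in Latin-1 (or Windows-1252), where é is byte 0xE9 and É is byte 0xC9. That encoding is an assumption of this example, not something GREP requires.

```python
import re

# Assumption: Latin-1 encoding, so é is byte 0xE9 and É is byte 0xC9.
lower = "thé".encode("latin-1")
upper = "THÉ".encode("latin-1")

# With a bytes pattern, re.IGNORECASE folds only ASCII letters,
# which mirrors the limitation described for the /I option.
naive = re.compile("thé".encode("latin-1"), re.IGNORECASE)
explicit = re.compile("th[éÉ]".encode("latin-1"), re.IGNORECASE)

print("naive pattern finds upper case:   ", naive.search(upper) is not None)
print("explicit pattern finds upper case:", explicit.search(upper) is not None)
print("explicit pattern finds lower case:", explicit.search(lower) is not None)
```

The naive pattern misses the all-caps word because the accented letter is not folded. The explicit class th[éÉ] finds both spellings.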
 

/Q
Suppress the program logo and all warning messages. Error messages will still be displayed (as will debug output, if you set the /D option).
/V
Show or count the lines that don't match instead of those that do. (This is the same as the V option in UNIX grep and DOS FIND.)

Output options

This section lists GREP's options that control the appearance of the output. The other options are listed in the preceding section.

Before going through the options, let's take a moment to look at some of the possible output formats. By default, GREP's output is similar to that of DOS FIND:

        ---------- GREP.C
                op_showhead = ShowNoHeads;
                else if (op_showhead == ShowNoHeads)
                op_showhead = ShowNoHeads;

        ---------- GREP_MAT.C
                op_showhead == ShowNoHeads)
However, the /U option (see below) produces UNIX grep-style output like this:
        GREP.C:        op_showhead = ShowNoHeads;
        GREP.C:        else if (op_showhead == ShowNoHeads)
        GREP.C:        op_showhead = ShowNoHeads;
        GREP_MAT.C:        op_showhead == ShowNoHeads)
As you can see, the main difference is that DOS-style output has the filename as a header above the group of matching lines from that file, and UNIX-style output has the name of the file on every matching line.

Now, here are the options that control what GREP outputs and how it is formatted:

/B
Display a header for every file examined, even if the file contains no matches. (This option is meaningful only with DOS-style output.)
 
/C
Display only a count of the matching lines in each file, instead of the matching lines themselves. (This is the same as the C option in UNIX grep and DOS FIND.)
 
/H
Don't display any filenames as headers. This is useful when you're using GREP as a filter to extract lines from a file for processing by another program, like this:
    grep /H "Directory" <inputfile | otherprogram
/L
Display only a bare list of the names of files that contain matches, not the actual lines that match. With the /V option, display the names of files that contain no matches. (This is the same as the L option in UNIX grep.)
 
/N
Show the line number before each matching line. (This is the same as the N option in UNIX grep and DOS FIND.) DOS-style output looks like this:
    ---------- GREP.C
    [ 144]        op_showhead = ShowNoHeads;
    [ 178]        else if (op_showhead == ShowNoHeads)
    [ 366]        op_showhead = ShowNoHeads;

    ---------- GREP_MAT.C
    [  98]        op_showhead == ShowNoHeads)
With both the /N and /U options, the UNIX-style output looks like this:
    GREP.C:144:        op_showhead = ShowNoHeads;
    GREP.C:178:        else if (op_showhead == ShowNoHeads)
    GREP.C:366:        op_showhead = ShowNoHeads;
    GREP_MAT.C:98:        op_showhead == ShowNoHeads)
UNIX-style output is suitable for use with the excellent freeware editor Vim.
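
To make the difference between the two layouts concrete, here is a minimal Python sketch that renders the same matches both ways. The function names dos_style and unix_style are invented for this illustration, and the spacing is copied from the sample output above rather than from GREP's source.

```python
def dos_style(results, numbered=False):
    """Filename once as a header, then that file's matching lines."""
    out = []
    for filename, matches in results:
        if out:
            out.append("")                      # blank line between files
        out.append("---------- " + filename)
        for lineno, text in matches:
            prefix = "[%4d]" % lineno if numbered else ""
            out.append(prefix + text)
    return out


def unix_style(results, numbered=False):
    """Filename (and optionally line number) repeated on every line."""
    out = []
    for filename, matches in results:
        for lineno, text in matches:
            number = "%d:" % lineno if numbered else ""
            out.append("%s:%s%s" % (filename, number, text))
    return out


results = [
    ("GREP.C", [(144, "        op_showhead = ShowNoHeads;"),
                (178, "        else if (op_showhead == ShowNoHeads)")]),
    ("GREP_MAT.C", [(98, "        op_showhead == ShowNoHeads)")]),
]
print("\n".join(dos_style(results, numbered=True)))
print("\n".join(unix_style(results, numbered=True)))
```

Each UNIX-style line identifies its file and line number on its own, which is why that layout suits editors that jump to a match.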
 
/Pbefore,after
Show context lines before and after each match. If you omit after, GREP will show the same number of lines after each match as before. If you omit both numbers, GREP will show two lines before and two lines after.

Either number can be 0. For instance, use /P0,4 if you want to show every match and the four lines that follow it.

If you use the /P option, you probably want to use the /N option as well, to display line numbers. In that case, the punctuation of the line numbers will distinguish which lines are actual matches and which are displayed for context. Here is some DOS-style output from a run with the options /P1,1N set:

    ---------- GREP.C
      143     if (opcount >= argc)
    [ 144]        op_showhead = ShowNoHeads;
      145
      177             PRTDBG "with each matching line");
    [ 178]        else if (op_showhead == ShowNoHeads)
      179             PRTDBG "NO");
      365     if (myToggle('L') || myToggle('U') || myToggle('H'))
    [ 366]        op_showhead = ShowNoHeads;
      367     else if (myToggle('B'))

    ---------- GREP_MAT.C
       97         op_showwhat == ShowMatchCount ||
    [  98]        op_showhead == ShowNoHeads)
       99         headered = TRUE;
As you can see, the actual matches have square brackets around the line numbers, and the context lines do not.
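
The context rules can be sketched in a few lines of Python. This is a hypothetical model, not GREP's implementation: parse_context_arg and grep_with_context are names invented here, Python's re module stands in for GREP's matcher, and the number formatting is copied from the sample output above.

```python
import re


def parse_context_arg(arg):
    """'' -> (2, 2); '3' -> (3, 3); '0,4' -> (0, 4), per the /P rules above."""
    if not arg:
        return 2, 2
    before, _, after = arg.partition(",")
    return int(before), int(after) if after else int(before)


def grep_with_context(lines, pattern, before=2, after=2):
    """Return numbered lines: matches in [brackets], context lines without."""
    regex = re.compile(pattern)
    hits = {i for i, line in enumerate(lines) if regex.search(line)}
    shown = set()
    for i in hits:
        shown.update(range(max(0, i - before), min(len(lines), i + after + 1)))
    out = []
    for i in sorted(shown):
        number = i + 1                      # line numbers are 1-based
        if i in hits:
            out.append("[%4d]%s" % (number, lines[i]))
        else:
            out.append(" %4d %s" % (number, lines[i]))
    return out


sample = ["alpha", "beta", "needle one", "gamma", "delta", "epsilon", "needle two"]
before, after = parse_context_arg("1,1")
for row in grep_with_context(sample, "needle", before, after):
    print(row)
```

Overlapping context windows are merged by collecting line indexes in a set, so no line is printed twice.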
 
/U
Show the filename with each matching line, instead of just once in a separate header. This UNIX-style output is useful with editors like Vim that can automatically jump to the file that contains a match. Some examples of UNIX-style output have been given earlier in this section.

There's one small difference from UNIX grep output: UNIX grep suppresses the filename when there is only one input file, but GREP assumes that if you didn't want the filename you wouldn't have specified the /U option. Neither GREP nor UNIX grep displays a filename if input comes from a file via < redirection.

Some combinations of output options are logically incompatible. For instance, /H/L makes no sense. In such cases, GREP will turn off one of the incompatible options and tell you what it did.

The following list of incompatibilities is given for completeness only:
       /B   overrides /H; ignored with /L or /U
       /C   overrides /H, /L, /N, /P
       /H   ignored with /B, /C, /L, /U
       /L   overrides /B, /H, /N, /P, /U; ignored with /C
       /N   ignored with /C or /L
       /P   ignored with /C or /L
       /U   overrides /B and /H; ignored with /L


Regular expressions


A regular expression is essentially a string with a bunch of operators thrown in to express possibilities like "any of these characters" and "repeated".

Normal and special characters

To understand regular expressions, you need to know the difference between special characters and normal characters. (The meanings of the special characters will be explained in the next section.)

The following characters are special if they occur in the listed contexts:

Any other character, or one of the above characters not in the listed context, is a normal character. Any of the above characters also becomes a normal character if preceded by a backslash, as will be shown below.

How to construct a regular expression

Here are the rules for a regular expression:

single character
Any normal character matches itself. To match a special character, precede it with a backslash (\). Example: to search for the string "^abc\def", you must put backslashes before the two special characters to make GREP treat them as normal characters and not give them special meanings, so that \^abc\\def is your regular expression.

You can use any character from space through character 255. If using 8-bit characters on the command line, see Special rules for the command line below.
 

character class
To match any one of a group of characters, enclose them in square brackets ([ ]). Examples: [aA] will match an upper- or lower-case letter A; sno[wr]ing will match "snowing" or "snoring".

You can indicate a character range with the minus sign (-). Examples: [0-9] will match any single digit, and [a-zA-Z] will match any English letter. To match any Western European letter (under most recent versions of Windows, in North America and Western Europe), use [a-zA-ZÀ-ÖØ-öø-ÿ].

A character class can contain both ranges and single characters, and the order doesn't matter as long as each range is written low-high.
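
If you want to experiment with character classes away from GREP, the sketch below tries the examples above with Python's re module. Python is only a stand-in here, and the helper name found is invented for this illustration. The last test assumes the Latin-1 ordering of accented letters, which is the ordering the Western European example relies on.

```python
import re


def found(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


print(found(r"sno[wr]ing", "snowing"), found(r"sno[wr]ing", "snoring"))
print(found(r"sno[wr]ing", "snoting"))

# Order inside the brackets does not matter.
print(found(r"[a-z0-9_]", "_") == found(r"[_0-9a-z]", "_"))

# The Western European class is split into three ranges so that it skips
# the multiplication and division signs, which sit among the accented
# letters in the Latin-1 ordering.
western = r"[a-zA-ZÀ-ÖØ-öø-ÿ]"
print(found(western, "é"), found(western, "×"), found(western, "÷"))
```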
 

negative character class
To match any character that is not in a class, use square brackets with a caret (^). Examples: [^0-9 ] matches any character except a digit or a space; the[^a-z] matches "the" followed by anything except a lower-case letter.

Note: The negative character class matches any character not within the square brackets, but it must still match some character. For instance, the[^a-z] matches "the" followed by something other than a lower-case letter; it does not match "the" at the end of a line, because then "the" is not followed by any character. Please see the extended example at the end of these rules for further explanation.
 

repetition
A plus sign (+) after a character or character class matches one or more occurrences; an asterisk (*) matches zero or more occurrences. Examples: snor+ing matches "snoring", "snorring", "snorrring", and so on, but not "snoing". snor*ing matches "snoing", "snoring", and so on.

Used with a character class, the plus sign and asterisk match any multiple characters in the class, not only multiple occurrences of the same character. For instance, sno[rw]+ing matches "snowing", "snorwing", "snowrring", and so on.

Obligatory example: [A-Za-z_]+[A-Za-z0-9_]* matches a C or C++ identifier, which is at least one letter or underscore, followed by any number of letters, digits, and underscores.
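
The repetition examples above can be checked with a short Python sketch. Python's re module is used as a stand-in for GREP, and the helper names found and is_identifier are invented for this illustration. For the identifier test the sketch requires the whole string to match, because a plain search would happily find the identifier "lives" inside "9lives".

```python
import re


def found(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


def is_identifier(text):
    """True if the entire text is one C-style identifier."""
    return re.fullmatch(r"[A-Za-z_]+[A-Za-z0-9_]*", text) is not None


for word in ["snoing", "snoring", "snorrring"]:
    print(word, found(r"snor+ing", word), found(r"snor*ing", word))

for word in ["snowing", "snorwing", "snowrring", "snoing"]:
    print(word, found(r"sno[rw]+ing", word))

for name in ["_tmp1", "op_showhead", "9lives"]:
    print(name, is_identifier(name))
```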
 

start of line, end of line
A caret (^) at the start of a regular expression means that the pattern starts at the beginning of a line in the file(s) being searched. A dollar sign ($, ASCII 36) at the end of a regular expression means that the pattern ends at the end of a line in the file(s) being searched.

Example: ^[wW]hereas matches the word "Whereas" or "whereas" at the start of a line, but not in the middle of a line. Blanks are not ignored, so if you want to find that word whenever it's the first word of the line, you need to use a pattern like ^ *[wW]hereas to allow for indentation.

Examples: ^$ will find lines that contain no characters at all. ^ *$ will match lines that contain no characters or contain only spaces. ^ +$ will match lines that contain only spaces, but not empty lines.

Examples: ^[A-Za-z]+$ will find every line that contains nothing but English letters. ^ *[A-Za-z]+ *$ will find every line that contains exactly one English word, possibly preceded or followed by blanks.
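
The anchor examples above are easy to confirm on a handful of sample lines. The sketch below uses Python's re module as a stand-in for GREP. The sample lines and the helper name lines_matching are invented for this illustration, and each sample is a single line with no trailing newline.

```python
import re

samples = ["", "   ", "Whereas", "  whereas the", "word", "  word  ", "two words"]
patterns = [r"^$", r"^ *$", r"^ +$",
            r"^[wW]hereas", r"^ *[wW]hereas",
            r"^[A-Za-z]+$", r"^ *[A-Za-z]+ *$"]


def lines_matching(pattern, lines):
    """Return the lines in which the pattern matches somewhere."""
    return [line for line in lines if re.search(pattern, line)]


for pattern in patterns:
    print(repr(pattern), "->", lines_matching(pattern, samples))
```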

Extended example: suppose you want to find the word "the" in a file, whether in caps or lower case. You can use the /I option to make the search case blind, and concentrate on constructing the regular expressions. At first glance, [^a-z]the[^a-z] seems adequate: anything other than a letter, followed by "the", followed by anything but a letter. That lets in "the" and rules out "then" and "mother". But it also rules out "the" at the beginning or end of a line. Remember that a negative character class does insist on matching some character. So the solution is to have four regular expressions, for "the" at the beginning, middle, or end of a line, or on a line by itself:
        ^the[^a-z]
        [^a-z]the[^a-z]
        [^a-z]the$
        ^the$
So to search for just the occurrences of the word "the", you'd put those four lines in a file and then use the /F option on GREP.
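
Here is a minimal Python sketch of the same idea: a line is reported if any one of the four expressions matches it. Python's re module and its IGNORECASE flag stand in for GREP and its /I option, and the name has_word_the is invented for this illustration.

```python
import re

# The four expressions from the extended example, one per line as they
# would appear in a pattern file.
patterns = [r"^the[^a-z]", r"[^a-z]the[^a-z]", r"[^a-z]the$", r"^the$"]
compiled = [re.compile(p, re.IGNORECASE) for p in patterns]


def has_word_the(line):
    """True if any of the four expressions matches the line."""
    return any(rx.search(line) for rx in compiled)


for line in ["the", "The end", "at the end", "over the", "THE.",
             "then", "mother", "other things", "bathe"]:
    print(repr(line), has_word_the(line))
```

Only the first five sample lines are reported. The others contain the letters "the" only as part of a longer word.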

Special rules for the command line

When you enter regular expressions in a file or from the keyboard (using GREP /F), the above rules are sufficient. But when you enter a regular expression on the command line, you also have to contend with DOS command parsing. Putting double quotes around the expression will help, but it doesn't avoid all problems.

Suppose you want to search for a character like < or |. The DOS command-line parser always gives these characters special meanings, so if you put them in a regular expression GREP will never see them. Therefore, GREP defines several escape sequences to let you make an end run around DOS:

In addition, you can enter any character as a numeric sequence in decimal, hex (leading 0x), or octal (leading 0). Example: capital A would be \65, \0x41, or \0101.
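
The three numeric forms can be decoded with a few lines of Python. This is a hypothetical sketch of the rule as stated above, not GREP's own code: the name decode_numeric_escape is invented here, and the argument is the text that follows the backslash.

```python
def decode_numeric_escape(digits):
    """Decode '65', '0x41', or '0101' (the text after the backslash)."""
    if digits.startswith("0x"):
        code = int(digits[2:], 16)      # hex: leading 0x
    elif digits.startswith("0") and len(digits) > 1:
        code = int(digits, 8)           # octal: leading 0
    else:
        code = int(digits, 10)          # decimal
    return chr(code)


for form in ["65", "0x41", "0101"]:
    print("\\" + form, "->", decode_numeric_escape(form))
```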

Finally, if your regular expression begins with a minus (-) or slash (/), GREP will try to interpret it as an option. Example: if you're searching for the string "-in-law", GREP will think you're trying to turn on the options /I, /N, and so on. To avoid this problem, use a leading backslash (\-in-law).

Remember, the rules in this section are required only to get around parsing problems on the command line. These escape sequences are not needed, and don't work, in regular expressions in a file or when you use the /F- option to enter regular expressions on separate lines from the keyboard.
 


Environment variable


If you use certain options frequently, you can put them in the ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

Only options can be put in the environment variable. If you want to "can" a regular expression, put it in a file and put the /Ffile option in the environment variable.

If you have some options in the environment variable but you don't want one of them for a particular run of GREP, you don't have to edit the environment variable. You can make most changes on the command line, like this:

If you're ever in doubt about the interaction of options between the command line and the environment variable, simply type

        grep /d
and GREP will tell you all the option settings in effect.
 

Return values


By default, GREP will return one of the following values to DOS, and you can test the return value with IF ERRORLEVEL in a batch file.
 
255   bad option, or other error on the command line
0     program ran to completion (whether or not there were any matches)
 

You might want to use GREP in a batch file or a makefile and take different actions depending on whether matches were found or not. To do this, use the /0 or /1 option. The /1 option returns an error level of 1 if matches were found or 0 if there were no matches. /0 is the opposite: it returns 0 if there were matches or 1 if there were none. In other words, the /0 or /1 option gives the value you want GREP to return if matches are found.
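
The return-value rule for a run that completes without error can be summarized in a small Python model. The function name grep_return_value is invented for this illustration. It models only completed runs, so the value 255 for command-line errors is deliberately left out.

```python
def grep_return_value(matches_found, option=None):
    """Model the documented DOS error level for a run that completes.

    option is None (neither /0 nor /1 given), "0", or "1".
    """
    if option is None:
        return 0                        # default: 0 whether or not matches exist
    found_value = int(option)           # the value returned when matches are found
    return found_value if matches_found else 1 - found_value


for option in (None, "0", "1"):
    print(option, grep_return_value(True, option), grep_return_value(False, option))
```

In a batch file you would then test the result with IF ERRORLEVEL. Remember that IF ERRORLEVEL n is true for any return value of n or higher, so test for the larger value first.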
 


Bugs


Regular expressions are limited to 127 input characters, and GREP will behave strangely if you enter a longer expression.

GREP's regular expressions are slightly different from UNIX grep's. Specifically, to accommodate DOS command-line parsing, GREP defines quite a few more escape characters like \c and \s, as well as numeric escapes. On the other hand, GREP does not (yet) implement ?, \<, \(, \{, and \| in regular expressions.
 


What's new?


Here's what's new in version 4.2, the latest version. A complete revision history is also available.