History log of /freebsd-current/usr.bin/grep/util.c
Revision Date Author Comments
# e116e040 04-Nov-2023 Kyle Evans <kevans@FreeBSD.org>

grep: don't rely on implementation-defined malloc(0) behavior

The very few places that rely on malloc/calloc of a zero-size region
won't attempt to dereference it, so just return NULL rather than rolling
the dice with the underlying malloc implementation.

Reported by: brooks, Shawn Webb
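
A minimal sketch of the idea, not the actual util.c code: a zero-size request is answered with NULL before malloc(3) is ever consulted. The name xmalloc_sketch and the exit status 2 are assumptions made for illustration.

```c
#include <err.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical allocation wrapper.  A zero-size request returns NULL up
 * front, so the result never depends on what the underlying malloc(3)
 * does with a size of 0.  Callers must not dereference that NULL.
 */
void *
xmalloc_sketch(size_t size)
{
	void *ptr;

	if (size == 0)
		return (NULL);
	if ((ptr = malloc(size)) == NULL)
		err(2, "malloc");	/* exit status 2 is an assumption */
	return (ptr);
}
```

free(NULL) is a no-op, so callers can still release the result unconditionally.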


# e738085b 17-Aug-2023 Dag-Erling Smørgrav <des@FreeBSD.org>

Remove my middle name.


# 1d386b48 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 2a63c3be 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

Remove $FreeBSD$: one-line .c comment pattern

Remove /^/[*/]\s*\$FreeBSD\$.*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 24c681a7 09-Jul-2021 Mariusz Zaborski <oshogbo@FreeBSD.org>

grep: fix combination of quiet and count flag

When the quiet (-q) flag is provided, we don't expect any output.
Currently, the behavior is broken:
$ grep -cq flag util.c
1

$ grep -cs flag util.c
55

First of all, we print a number to stdout. Secondly, it just returns
0 or 1 (which is unexpected). GNU grep with c and q flags doesn't
print anything.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D31108


# 3e2d96ac 07-Feb-2021 Kyle Evans <kevans@FreeBSD.org>

grep: fix -A handling in conjunction with -m match limitation

The basic issue here is that grep, when given -m 1, would stop all
line processing once it hit the match count and exit immediately. The
problem with exiting immediately is that -A processing only happens when
subsequent lines are processed and do not match.

The fix here is relatively easy; when bsdgrep matches a line, it resets
the 'tail' of the matching context to the value supplied to -A and
dumps anything that's been queued up for -B. After the current line has
been printed and tail is reset, we check our mcount and do what's
needed. Therefore, at the time that we decide we're doing nothing, we
know that 'tail' of the context is correct and we can simply continue
on if there's still more to pick up.

With this change, we still bail out immediately if there's been no -A
flag. If -A was supplied, we signal that we should continue on. However,
subsequent lines will not even bother to try and process the line. We
have reached the match count, so even if the next line would match then
we must process it as if it hadn't. Thus, the loop in procfile() can
short-circuit and just process the line as a non-match until
procmatches() indicates that it's safe to stop.

A test has been added to reflect both that we should be picking up the
next line and that the next line should be considered a non-match even
if it should have been.

PR: 253350
MFC-after: 3 days


# f823c6dc 04-Feb-2021 Kyle Evans <kevans@FreeBSD.org>

grep: fix null pattern and empty pattern file behavior

The null pattern semantics were terrible because I tried to match gnugrep,
but I got it wrong. Let's unwind that:

- The null pattern should match every line if neither -w nor -x.
- The null pattern should match empty lines if -x.
- The null pattern should not match any lines if -w.

The first two will stop processing (shortcut) even if additional patterns
are specified. In any other case, we will continue processing other
patterns. If no other patterns are specified beside a null pattern, then
we match if neither -w nor -x are set and do not match if either of those
are specified.

The justification for -w is that it should match on a whole word, but the
null pattern does not have a whole word to match on.

Empty pattern files should never match anything, and more importantly, -v
should cause everything to be written.

PR: 253209
MFC-after: 4 days
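
The three rules for the null pattern can be written down as a small predicate. This is only a sketch with a hypothetical name, and it assumes that -x takes precedence when both -w and -x are given, which the message above does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: does the null (empty) pattern match a line of
 * length `len`, given the -w and -x flags?
 */
bool
null_pattern_matches(size_t len, bool wflag, bool xflag)
{
	if (xflag)
		return (len == 0);	/* -x: only empty lines match */
	if (wflag)
		return (false);		/* -w: there is no word to match */
	return (true);			/* neither: every line matches */
}
```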


# 2dfa4b66 08-Dec-2020 Bryan Drewery <bdrewery@FreeBSD.org>

fts_read: Handle error from a NULL return better.

This is addressing cases such as fts_read(3) encountering an [EIO]
from fchdir(2) when FTS_NOCHDIR is not set. That would otherwise be
seen as a successful traversal in some of these cases while silently
discarding expected work.

As noted in r264201, fts_read() does not set errno to 0 on a successful
EOF so it needs to be set before calling it. Otherwise we might see
a random error from one of the iterations.

gzip is ignoring most errors and could be improved separately.

Reviewed by: vangyzen
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D27184
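
A sketch of the pattern described above, assuming a system that provides fts(3): errno is cleared before every fts_read() call so that a NULL return can be told apart as either end of traversal or a real error. The function name count_files and the choice of flags are illustrative only.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fts.h>
#include <stddef.h>

/*
 * Walk `root` and count regular files.  Returns 0 after a complete
 * traversal, or -1 with errno set if fts_open() or fts_read() failed.
 */
int
count_files(char *root, size_t *count)
{
	char *paths[] = { root, NULL };
	FTS *fts;
	FTSENT *ent;
	int saved;

	*count = 0;
	if ((fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL)) == NULL)
		return (-1);
	for (;;) {
		errno = 0;	/* fts_read() may leave errno alone at EOF */
		if ((ent = fts_read(fts)) == NULL)
			break;
		if (ent->fts_info == FTS_F)
			(*count)++;
	}
	saved = errno;		/* 0 here means EOF, anything else an error */
	fts_close(fts);
	errno = saved;
	return (saved == 0 ? 0 : -1);
}
```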


# 38325e2a 25-Sep-2019 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): various fixes of empty pattern/exit code/-c behavior

When an empty pattern is encountered in the pattern list, I had previously
broken bsdgrep to count that as a "match all" and ignore any other patterns
in the list. This commit rectifies that mistake, among others:

- The -v flag semantics were not quite right; lines matched should have been
counted differently based on whether the -v flag was set or not. procline
now definitively returns whether it's matched or not, and interpreting
that result has been kicked up a level.
- Empty patterns with the -x flag were broken similarly to empty patterns
with the -w flag. The former is a whole-line match and should be more
strict, only matching blank lines. No -x and no -w will match the
empty string at the beginning of each line.
- The exit code with -L was broken, w.r.t. modern grep. Modern grep will
exit(0) if any file that didn't match was output, so our interpretation
was simply backwards. The new interpretation makes sense to me.

Tests updated and added to try and catch some of this.

This misbehavior was found by autoconf while fixing ports found in PR 229925
expecting either a more sane or a more GNU-like sed.

MFC after: 1 week


# 031f92f5 14-Jun-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): Remove redundant initialization; unconditionally assigned later


# be13c0f9 09-Jun-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): Some more int -> bool conversions and name changes

Again motivated by upcoming work to rewrite a bunch of this- single-letter
variable names and slightly misleading variable names ("lastmatches" to
indicate that the last matched) are not helpful.


# bd60b9b4 07-Jun-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): Slooowly peel away the chunky onion

(or peel off the band-aid, whatever floats your boat)

This addresses two separate issues:

1.) Nothing within bsdgrep actually knew whether it cared about line numbers
or not.

2.) The file layer knew nothing about the context in which it was being
called.

#1 is only important when we're *not* processing line-by-line. #2 is
debatably a good idea; the parsing context is only handy because that's
where we store current offset information and, as of this commit, whether or
not it needs to be line-aware.


# 66f780ae 07-Jun-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): Don't initialize fts_flags twice

Admittedly, this is a clang-scan complaint... but it wasn't wrong. fts_flags
is initialized by all cases in the switch(), which should be fairly obvious.
Annotate this anyways.


# 40f0e0b1 07-Jun-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): whoops, garbage collect the now write-only variable


# cbfff13f 07-Jun-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): Do some less dirty things with return types

Neither procfile nor grep_tree return anything meaningful to their callers.
None of the callers actually care about how many lines were matched in all
of the files they processed; it's all about "did anything match?"

This is generally just a light refactoring to remind me of what actually
matters as I'm rewriting these bits to care less about 'stuff'.


# 30dc9502 06-Jun-2018 Baptiste Daroussin <bapt@FreeBSD.org>

Remove NLS support from BSD grep

GNU grep as it is actually in base does not have any translation support
compiled in, so no functionality loss.

We do support 193 locales in base, we will never catch up on that number of
translations with bsd grep.

Removing NLS support makes bsd grep consistent with the other binaries in base
which are not translated, and also reduces the code a little bit.

Reviewed by: kevans
Approved by: kevans
Discussed with: kevans @BSDCan
Differential Revision: https://reviews.freebsd.org/D15682


# a2584d1b 03-May-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: annihilate our in-tree TRE, previously disabled by default

It was an old TRE that had plenty of bugs and no performance gain over
regex(3). I disabled it by default in r323615, and there was some confusion
about what the knob does- likely due to poor naming on my part- to the tune
of "well, it sounds like it should speed things up" (mentioned by multiple
people).

To compound this, I have no intention of maintaining a second regex
implementation. If someone would like to step up and volunteer to maintain a
lean-and-mean implementation for grep, this is OK, but we have very few
volunteers to maintain even our primary regex implementation.


# f2f0b02b 02-May-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Adjust a missed NLS reference that was invalidated by recent work

Submitted by: Dan McGregor <dan.mcgregor@usask.ca>


# 66ab2983 21-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Use grep_strdup instead of grep_malloc+strcpy


# ff415f05 21-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Fix --include/--exclude ordering issues

Prior to r332851:
* --exclude always wins out over --include
* --exclude-dir always wins out over --include-dir

r332851 broke that behavior, resulting in:
* First of --exclude, --include wins
* First of --exclude-dir, --include-dir wins

As it turns out, both behaviors are wrong by modern grep standards- the
latest rule wins. e.g.:

`grep --exclude foo --include foo 'thing' foo`
foo is included

`grep --include foo --exclude foo 'thing' foo`
foo is excluded

As tested with GNU grep 3.1.

This commit makes bsdgrep follow this behavior.

Reported by: se
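
The "latest rule wins" behaviour can be sketched with fnmatch(3) as below. The struct and function names are hypothetical, and the default used when no pattern matches (searched unless some --include pattern exists) is an assumption rather than something this message states.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* One --include or --exclude pattern, kept in command-line order. */
struct fpattern_sketch {
	const char *pat;
	bool exclude;	/* true for --exclude, false for --include */
};

/* Hypothetical helper: should the file called `name` be searched? */
bool
file_wanted(const char *name, const struct fpattern_sketch *pats, size_t n)
{
	bool wanted = true;
	size_t i;

	/* Assumption: any --include makes unmatched files unwanted. */
	for (i = 0; i < n; i++)
		if (!pats[i].exclude)
			wanted = false;

	/* No early exit: a later matching pattern overrides earlier ones. */
	for (i = 0; i < n; i++)
		if (fnmatch(pats[i].pat, name, 0) == 0)
			wanted = !pats[i].exclude;
	return (wanted);
}
```

With the patterns from the two command lines above, "--exclude foo --include foo" leaves foo wanted and "--include foo --exclude foo" leaves it unwanted.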


# e3a2abad 20-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: More trivial cleanup/style cleanup

We can avoid branching for these easily reduced patterns


# f3cf3e59 20-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Some light cleanup

There's no point checking for a bunch of file modes if we're not a
practicing believer of DIR_SKIP or DEV_SKIP.

This also reduces some style violations that were particularly ugly looking
when browsing through.


# 042db8e8 20-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Break procmatches down a little bit more

Split the matching and non-matching cases out into their own functions to
reduce future complexity. As the name implies, procmatches will eventually
process more than one match itself in the future.


# d83f17e5 19-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Add some TODOs for future work on operating on chunks


# 5ea3fdc7 19-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Clean up procmatches a little bit


# 81c634e8 19-Apr-2018 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: Split match processing out of procfile

procfile is getting kind of hairy, and it's not going to get better as we
correct some more bits that assume we process one line at a time.


# 1de7b4b8 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

various: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.


# 05ad8215 23-Aug-2017 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: add a primitive literal matcher

fgrep/grep -F will error out at runtime if compiled with a regex(3)
that does not define REG_NOSPEC or REG_LITERAL. glibc is one such regex(3)
implementation, and as it turns out they don't support literal matching at
all.

Provide a primitive literal matcher for use with glibc and other
implementations that don't support literal matching so that we don't
completely lose fgrep/grep -F if compiled against libgnuregex on stable/10,
stable/11, or other systems that we don't necessarily support.

This is a wholly unoptimized implementation with no plans to optimize it as
of now. This is due to both its use-case being primarily on unsupported
systems in the near-distant future and that it's reinventing the wheel that
we already have available as a feature of regex(3).

Reviewed by: cem, emaste, ngie
Approved by: emaste (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D12056
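
To give a feel for what "primitive" and "wholly unoptimized" mean here, a naive literal search might look like the sketch below. It is not the matcher added by this commit; literal_find is a made-up name, and the code simply compares the needle at every offset.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical naive literal search.  Returns the offset of the first
 * occurrence of `needle` in `hay`, or -1 if there is none.  An empty
 * needle matches at offset 0.
 */
long
literal_find(const char *hay, size_t haylen, const char *needle,
    size_t nlen)
{
	size_t i;

	if (nlen > haylen)
		return (-1);
	for (i = 0; i + nlen <= haylen; i++)
		if (memcmp(hay + i, needle, nlen) == 0)
			return ((long)i);
	return (-1);
}
```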


# de3d7a82 17-Aug-2017 Kyle Evans <kevans@FreeBSD.org>

bsdgrep: cast pmatch.rm_so to fix build when linking against libgnuregex

Reported by: many
Approved by: emaste (mentor)
MFC after: immediate


# 0e957942 24-Jul-2017 Kyle Evans <kevans@FreeBSD.org>

bsdgrep(1): Don't exit before processing every file

Given an empty pattern (i.e. grep "" A B), bsdgrep(1) would previously exit()
with the appropriate exit code upon encountering an empty file. Likely intended
as an optimization, but this behavior is technically incorrect since an empty
pattern should match every line.

PR: 220924
Reviewed by: emaste, cem (earlier version), ngie
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D11698


# 26e1c38f 06-Jul-2017 Kyle Evans <kevans@FreeBSD.org>

Update copyright e-mail address to @FreeBSD.org address

Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D11508


# b05c7cde 29-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: bump version number and add Kyle Evans copyright

The following changes have been made over the last couple of months:

Features:

- With bsdgrep -r, the working directory is implied if no directory is
specified
- bsdgrep will now behave as bsdgrep -r does when it's named rgrep
- bsdgrep now understands -z/--null-data to use \0 as EOL
- GNU regex compatibility is now indicated with a "GNU compatible" in
the version string

Fixes:

- --mmap no longer hangs when coming across an EOF without an
accompanying EOL
- -o/--color matching generally improved, now produces earliest /
longest matches
- Context output now more closely aligns with GNU grep
- Zero-length matches no longer exhibit broken behavior
- Every output line now honors -b/-H/-n flags

Tests have been added for previous regressions as well as other
previously untested behaviors.

Various other fixes have been committed, and refactoring for further /
later improvements has taken place.

(The original submission changed the version string to 2.5.2, but I
decided to use 2.6.0 to reflect the addition of new features.)

Submitted by: Kyle Evans <kevans91@ksu.edu>
Differential Revision: https://reviews.freebsd.org/D10982


# 8bf46064 25-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: correct assumptions to prepare for chunking

Correct a couple of minor BSD grep assumptions that are valid for line
processing but not future chunk-based processing.

Submitted by: Kyle Evans <kevans91@ksu.edu>
Reviewed by: bapt, cem
Differential Revision: https://reviews.freebsd.org/D10824


# 6d635d3b 20-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: Correct per-line line metadata printing

Metadata printing with -b, -H, or -n flags suffered from a few flaws:

1) -b/offset printing was broken when used in conjunction with -o

2) With -o, bsdgrep did not print metadata for every match/line, just
the first match of a line

3) There were no tests for this

Address these issues by outputting this data per-match if the -o flag is
specified, and prior to outputting any matches if -o but not --color,
since --color alone will not generate a new line of output for every
iteration over the matches.

To correct -b output, fudge the line offset as we're printing matches.

While here, make sure we're using grep_printline in -A context. Context
printing should *never* look at the parsing context, just the line.

The tests included do not pass with gnugrep in base due to it exhibiting
similar quirky behavior that bsdgrep previously exhibited.

Submitted by: Kyle Evans <kevans91@ksu.edu>
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D10580


# fe8c9d5b 19-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: emit more than MAX_LINE_MATCHES per line

We should not set an arbitrary cap on the number of matches on a line,
and in any case MAX_LINE_MATCHES of 32 is much too low. Instead, if we
match more than MAX_LINE_MATCHES, keep processing and matching from the
last match until all are found.

For the regression test, we produce 4096 matches (larger than we expect
we'll ever set MAX_LINE_MATCHES) and make sure we actually get 4096
lines of output with the -o flag.

We'll also make sure that every distinct line is getting its own line
number to detect line metadata not being printed as appropriate along
the way.

PR: 218811
Submitted by: Kyle Evans <kevans91@ksu.edu>
Reported by: jbeich
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D10577


# b5fc583c 15-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: don't allow negative -A / -B / -C

Previously, when given a negative -A/-B/-C argument bsdgrep would
overflow the respective context flag(s) and exhibited surprising
behavior.

Fix this by removing unsignedness of Aflag/Bflag and erroring out if
we're given a value < 0. Also adjust the type used to track 'tail'
context in procfile() so that it accurately reflects the Aflag value
rather than overflowing and losing trailing context.

This also fixes an inconsistency previously existing between -n and
-C "n" behavior. They are now both limited to LLONG_MAX, to be
consistent.

Add some test cases to make sure grep errors out properly for both
negative context values as well as non-numeric context values rather
than giving bogus matches.

Submitted by: Kyle Evans <kevans91@ksu.edu>
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D10675
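
A sketch of the kind of validation described above, using a signed type and strtoll(3). The helper name parse_context is hypothetical, and the real option parsing in grep may be structured differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical parser for a -A/-B/-C argument.  Stores the value in
 * *out and returns true on success; returns false for non-numeric,
 * negative, or out-of-range (beyond LLONG_MAX) input.
 */
bool
parse_context(const char *arg, long long *out)
{
	char *end;
	long long val;

	errno = 0;
	val = strtoll(arg, &end, 10);
	if (end == arg || *end != '\0')
		return (false);		/* not a number */
	if (errno == ERANGE || val < 0)
		return (false);		/* overflow or negative */
	*out = val;
	return (true);
}
```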


# e2127de8 05-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: don't output matches with -c, -l, -L

Refactoring done in r317703 broke -c, -l, and -L flags implying
suppression of match printing. Fortunately this is just a matter of not
doing any printing of the resulting matches and context printing was not
broken in this refactoring.

Add some regression tests since this area may still see further
refactoring, include different context flags as well even though they
were not broken in this case.

PR: 219077
Submitted by: Kyle Evans <kevans91@ksu.edu>
Reported by: markj
Reviewed by: cem, ngie
Differential Revision: https://reviews.freebsd.org/D10607


# 83fd8885 03-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: correct uninitialized variable introduced in r317703

CID: 1374747
Submitted by: Kyle Evans <kevans91@ksu.edu>


# a4f3f02b 02-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: fix -w flag matching with an empty pattern

-w flag matching with an empty pattern was generally 'broken', allowing
matches to occur on any line whether or not it actually matches -w
criteria.

This fix required a good amount of refactoring to address. procline()
is altered to *only* process the line and return whether it was a match
or not, necessary to be able to short-circuit the whole function in case
of this matchall flag. -m flag handling is moved out as well because it
suffers from the same fate as context handling if we bypass any actual
pattern matching.

The matching context (matches, mostly) didn't previously exist outside
of procline(), so we go ahead and create context object for file
processing bits to pass around. grep_printline() was created due to
this, for the scenarios where the matches don't actually matter and we
just want to print a line or two, a la flushing the context queue and
no -o or --color specified.

Damage from this broken behavior would have been mitigated by the fact
that it is unlikely users would invoke grep -w with an empty pattern.

This was identified while checking PR 105221 for problems it may
cause in BSD grep, but PR 105221 is *not* a report of this behavior.

Submitted by: Kyle Evans <kevans91 at ksu.edu>
Differential Revision: https://reviews.freebsd.org/D10433


# 945fc991 01-May-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: fix -w -v matching improperly with certain patterns

-w and -v flag matching was mostly functional but had some minor
problems:

1. -w flag processing only allowed one iteration through pattern
matching on a line. This was problematic if one pattern could match
more than once, or if there were multiple patterns and the earliest/
longest match was not the most ideal, and

2. Previous work "fixed" things to not further process a line if the
first iteration through patterns produced no matches. This is clearly
wrong if we're dealing with the more restrictive -w matching.

#2 breakage could have also occurred before recent broad rewrites, but
it would be more arbitrary based on input patterns as to whether or not
it actually affected things.

Fix both of these by forcing a retry of the patterns after advancing
just past the start of the first match if we're doing more restrictive
-w matching and we didn't get any hits to start with. Also move -v flag
processing outside of the loop so that we have a greater chance to match
in the more restrictive cases. This wasn't strictly wrong, but it could
be a little more error prone.

While here, introduce some regression tests for this behavior and fix
some excessive wrapping nearby that hindered readability. GNU grep
passes these new tests.

PR: 218467, 218811
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem, ngie
Differential Revision: https://reviews.freebsd.org/D10329


# 3f39ffc8 21-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: add BSD_GREP_FASTMATCH knob for built-in fastmatch

Bugs have been found in the fastmatch implementation as used in bsdgrep.
Some have been fixed (r316495) while fixes for others are in review
(D10098).

In comparison with the fastmatch implementation, Kyle Evans found that:

- regex(3)'s performance with literal expressions offers a speed
improvement over fastmatch

- regex(3)'s performance, both with simple BREs and EREs, seems to be
comparable

The regex implementation was imported in r226035, and the commit message
reports:

This is a temporary solution until the whole regex library is
replaced so that BSD grep development can continue and the
backported code gets some review and testing. This change only
improves scalability slightly, there is no big performance boost
yet but several minor bugs have been found and fixed.

Introduce a WITH_/WITHOUT_BSD_GREP_FASTMATCH knob to support testing
of both approaches.

PR: 175314, 194823
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: bdrewery (in part)
Differential Revision: https://reviews.freebsd.org/D10282


# e06ffa32 17-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: fix zero-length matches without the -o flag

r316477 broke zero-length matches when not using the -o flag, by
skipping over them entirely.

Add a regression test so that it doesn't break again in the future.

Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem emaste ngie
Differential Revision: https://reviews.freebsd.org/D10333


# 22130a21 17-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: remove output separators between overlapping segments

Make bsdgrep more sensitive to context overlaps. If it's printing
context that either overlaps or is immediately adjacent to another bit
of context, don't print a separator.

- Non-overlapping segments no longer have two separators between them

- Overlapping segments no longer have separators between them with
overlapping sections repeated

Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D10105


# a461896a 17-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: for -r, use the working directory if none specified

This is more sensible than the previous behaviour of grepping stdin,
and matches newer GNU grep behaviour.

PR: 216307
Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem, emaste, ngie
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/


# 5ee1ea02 17-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: add -z/--null-data support

-z treats input and output data as sequences of lines terminated by a
zero byte instead of a newline. This brings it more in line with GNU grep
and brings us closer to passing the current tests with BSD grep.

Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reviewed by: cem
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D10101


# a5ed8685 04-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: revert color changes from r316477

r316477 changed the color output to match exactly the in-tree GNU grep,
but introduces unnecessary escape sequences.

Submitted by: Kyle Evans <kevans91 at ksu.edu>
Reported by: ache
MFC after: 1 month
MFC with: r316477


# a734ae9c 04-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: Initialize vars to avoid a false positive GCC warning

Reported by: lwhsu
MFC after: 1 month
MFC with: r316477


# 87c485cf 03-Apr-2017 Ed Maste <emaste@FreeBSD.org>

bsdgrep: fix matching behaviour

- Set REG_NOTBOL if we've already matched beginning of line and we're
examining later parts

- For each pattern we examine, apply it to the remaining bits of the
line rather than (potentially) smaller subsets

- Check for REG_NOSUB after we've looked at all patterns initially
matching the line

- Keep track of the last match we made to later determine if we're
simply not matching any longer or if we need to proceed another byte
because we hit a zero-length match

- Match the earliest and longest bit of each line before moving the
beginning of what we match to further in the line, past the end of the
longest match; this generally matches how gnugrep(1) seems to behave,
and seems like pretty good behavior to me

- Finally, bail out of printing any matches if we were set to print all
(empty pattern) but -o (output matches) was set

PR: 195763, 180990, 197555, 197531, 181263, 209116
Submitted by: "Kyle Evans" <kevans91@ksu.edu>
Reviewed by: cem
MFC after: 1 month
Relnotes: Yes
Differential Revision: https://reviews.freebsd.org/D10104
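
Two of the points above, REG_NOTBOL for later parts of the line and stepping one byte past a zero-length match, can be illustrated with a single-pattern loop over regexec(3). This is a simplified sketch with a hypothetical name; it handles one pattern only and does not reproduce the multi-pattern earliest/longest selection.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count the non-empty matches of `re` in `line`, scanning left to
 * right.  `re` must have been compiled without REG_NOSUB so that the
 * match offsets are filled in.
 */
size_t
count_matches(const regex_t *re, const char *line)
{
	regmatch_t m;
	size_t count = 0;
	const char *p = line;
	int eflags = 0;

	while (*p != '\0' && regexec(re, p, 1, &m, eflags) == 0) {
		if (m.rm_eo == m.rm_so) {
			/* Zero-length match: move on by one byte. */
			if (p[m.rm_so] == '\0')
				break;
			p += m.rm_so + 1;
		} else {
			count++;
			p += m.rm_eo;	/* resume after this match */
		}
		eflags = REG_NOTBOL;	/* p is no longer the line start */
	}
	return (count);
}
```

Without REG_NOTBOL, a pattern such as ^a would match again at every restart point in a line like "aaa".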


# 33f5799a 28-Jul-2016 Ed Schouten <ed@FreeBSD.org>

Call basename() in a portable way.

Pull a copy of the filename string before calling basename(). Change the
loop to not return on its own, so we can put a free() statement at the
bottom.
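
As a rough illustration of the portable calling convention: POSIX allows basename() to modify the string it is given, so the sketch below passes it a private copy and frees that copy at a single exit point. The function name basename_equals is made up and this is not the code touched by the commit.

```c
#include <libgen.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: does the last component of `path` equal `name`?
 * Returns false if memory for the copy cannot be allocated.
 */
bool
basename_equals(const char *path, const char *name)
{
	size_t len = strlen(path) + 1;
	char *copy;
	bool eq;

	if ((copy = malloc(len)) == NULL)
		return (false);
	memcpy(copy, path, len);
	eq = (strcmp(basename(copy), name) == 0);
	free(copy);	/* single exit point, so the copy is always freed */
	return (eq);
}
```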


# 2ac5b1c7 17-Aug-2014 Gabor Kovesdan <gabor@FreeBSD.org>

- Do not look for more matching lines if -L is specified

Submitted by: eadler (based on)
MFC after: 2 weeks


# 78bef01f 17-Jul-2014 Pedro F. Giffuni <pfg@FreeBSD.org>

grep: Fix type.

Obtained from: NetBSD (CVS rev. 1.17)
MFC after: 3 days


# 66edec08 20-Jun-2014 Glen Barber <gjb@FreeBSD.org>

Fix a bug in bsdgrep(1) where patterns are not correctly
detected.

Certain criteria must be met for this bug to show up:

* the -w flag is specified, and
* neither -o nor --color are specified, and
* the pattern is part of another word in the line, and
* the other word that contains the pattern occurs first

PR: 181973
MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 924500b7 20-Dec-2012 Eitan Adler <eadler@FreeBSD.org>

Make bsdgrep behave as gnugrep and as documented: -m should only stop
reading the specific file, not any file.

Tested by: frogs (irc)
Reviewed by: gabor
Approved by: cperciva (implicit)
MFC after: 1 week


# 6f4cbf7c 06-Dec-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Match GNU behavior of exit code
- Rename variable that has a different meaning now

PR: bin/162930
Submitted by: Jan Beich <jbeich@tormail.net>
MFC after: 1 week


# ede01be2 28-Nov-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Call warnx() instead of errx() if a directory is not readable when using
a recursive search. This is the expected behavior instead of aborting.

PR: bin/162907
Submitted by: Jan Beich <jbeich@tormail.net>
MFC after: 3 days


# f0c94259 28-Nov-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Fix behavior of --null to match GNU grep

PR: bin/162906
Submitted by: Jan Beich <jbeich@tormail.net>
MFC after: 3 days


# bbf9339d 11-Oct-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Fix counting of match limit (-m)

Reported by: Nali Toja <nalitoja@gmail.com>
Approved by: delphij (mentor)


# f20f6f3f 05-Oct-2011 Gabor Kovesdan <gabor@FreeBSD.org>

Update BSD grep to the latest development version. It has some code
backported that was written for the TRE integration project in Google
Summer of Code 2011. This is a temporary solution until the whole
regex library is replaced so that BSD grep development can continue
and the backported code gets some review and testing. This change only
improves scalability slightly, there is no big performance boost yet
but several minor bugs have been found and fixed.

Approved by: delphij (mentor)
Sponsored by: Google Summer of Code 2011
MFC after: 1 week


# ad92276e 17-Aug-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Fix exclusion of directories from a recursive search
- Use FTS_SKIP for exclusion instead of custom code

Submitted by: ttsestt@gmail.com
Approved by: re (kib), delphij (mentor)


# 69a6d198 11-Jun-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Use REG_NOSUB to bypass submatch counting when not necessary. This may
yield somewhat better performance in a few cases.

Approved by: delphij (mentor)
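
For reference, a yes/no match with REG_NOSUB looks roughly like this. It is a generic regex(3) sketch with a hypothetical function name, not the grep code, and it recompiles the pattern on every call purely to stay short.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: does `line` match the extended regex `pattern`?
 * REG_NOSUB tells regcomp() that match offsets will not be requested,
 * so regexec() is called with no regmatch_t array.
 */
bool
line_matches(const char *pattern, const char *line)
{
	regex_t re;
	bool hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (false);
	hit = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return (hit);
}
```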


# cbe6b9e5 11-Jun-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Fix -w behavior
- Make -F and -w work together
- Fix --color to colorize all of the matches

PR: bin/156826
Submitted by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: delphij (mentor)


# b66a823b 07-Apr-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Adjust a comment to actual behaviour
- Makefile nit
- Add more CVS/SVN keywords to make it easier to track changes from NetBSD
in case they add further improvements

Approved by: delphij (mentor)
Obtained from: The NetBSD Project


# d841ecb3 07-Apr-2011 Gabor Kovesdan <gabor@FreeBSD.org>

- Simplify the fixed string pattern preprocessing code
- Improve readability

Approved by: delphij (mentor)
Obtained from: The NetBSD Project


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# a0ef9ad6 19-Aug-2010 Dag-Erling Smørgrav <des@FreeBSD.org>

UTFize my name.


# 3ed1008b 18-Aug-2010 Gabor Kovesdan <gabor@FreeBSD.org>

- Refactor file reading code to use pure syscalls and an internal buffer
instead of stdio. This gives BSD grep a very big performance boost,
its speed is now almost comparable to GNU grep.

Submitted by: Dimitry Andric <dimitry@andric.com>
Approved by: delphij (mentor)


# 59218eb7 15-Aug-2010 Gabor Kovesdan <gabor@FreeBSD.org>

- Revert strlcpy() changes to memcpy() because it's more efficient and the
former may be safer but in this case it doesn't add extra
safety [1]
- Fix -w option [2]
- Fix handling of GREP_OPTIONS [3]
- Fix --line-buffered
- Make stdin input imply --line-buffered so that tail -f can be piped
to grep [4]
- Imply -h if single file is grepped, this is the GNU behaviour
- Reduce locking overhead to gain some more performance [5]
- Inline some functions to help the compiler better optimize the code
- Use shortcut for empty files [6]

PR: bin/149425 [6]
Prodded by: jilles [1]
Reported by: Alex Kozlov <spam@rm-rf.kiev.ua> [2] [3],
swell.k@gmail.com [2],
poyopoyo@puripuri.plala.or.jp [4]
Submitted by: scf [5],
Shuichi KITAGUCHI <ki@hh.iij4u.or.jp> [6]
Approved by: delphij (mentor)


# 97a012f2 29-Jul-2010 Gabor Kovesdan <gabor@FreeBSD.org>

- Some minor changes to the messages to increase usefulness of error msgs

Reviewed by: hrs (Japanese catalogs),
pluknet <pluknet at gmail dot com> (Russian catalog)
Approved by: delphij (mentor)


# 55e44f51 28-Jul-2010 Gabor Kovesdan <gabor@FreeBSD.org>

- Use the traditional behaviour for filename and directory name inclusion
and exclusion patterns [1]
- Some improvements on the existing code, like replacing memcpy with
strlcpy/strcpy

Approved by: delphij (mentor)
Pointed out by: bf [1], des [1]


# 36bcf7c1 25-Jul-2010 Gabor Kovesdan <gabor@FreeBSD.org>

- Fix -l and -L by really suppressing output and just showing filenames

Submitted by: swell.k@gmail.com
Approved by: delphij (mentor)


# 27116286 25-Jul-2010 Gabor Kovesdan <gabor@FreeBSD.org>

- Fix --color behaviour to only output color sequences if stdout is a tty
or if forced mode is specified [1]
- While here, add some alternative names for the options and make them
case-insensitive
- Fix -q and -l behaviour [2]
- Some small changes to make the code easier to review

Submitted by: swell.k@gmail.com [1],
dougb [2]
Approved by: delphij (mentor)
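
The tty check for --color can be sketched with isatty(3). The enum, its values and the function name are assumptions for illustration; the option names grep actually accepts are not shown here.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical colour modes: never, only on a tty, or forced. */
enum color_mode_sketch { COLOR_NEVER, COLOR_AUTO, COLOR_ALWAYS };

/* Should colour escape sequences be written to descriptor `fd`? */
bool
use_color(enum color_mode_sketch mode, int fd)
{
	switch (mode) {
	case COLOR_ALWAYS:
		return (true);		/* forced mode ignores the tty check */
	case COLOR_AUTO:
		return (isatty(fd) == 1);
	default:
		return (false);
	}
}
```

A caller would pass STDOUT_FILENO as fd, so output piped to a file or another program stays free of escape sequences in the automatic mode.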


# 0c41ffb3 23-Jul-2010 Xin LI <delphij@FreeBSD.org>

Fix crashes when using grep -R:

- Explicitly pre-zero memory for fts_open parameters.
- Don't test against directory patterns when we are testing direct
leaf of current directory.

While I'm there plug a few memory leaks.


# 4dc88ebe 22-Jul-2010 Gabor Kovesdan <gabor@FreeBSD.org>

Add BSD grep to the base system and make it our default grep.

Deliverables: Small and clean code (1,4 KSLOC vs GNU's 8,5 KSLOC),
lower memory usage than GNU grep, GNU compatibility,
BSD license.

TODO: Performance is somewhat behind GNU grep but it is only
significant for bigger searches. The reason is complex, the
most important factor is that GNU grep uses lots of
optimizations to improve the speed of the regex library.
First, we need a modern regex library (practically by adopting
TRE), add support for GNU-style non-standard regexes and then
reevaluate the performance issues and look for bottlenecks. In
the meantime, for those, who need better performance, it is
possible to build GNU grep by setting WITH_GNU_GREP.

Approved by: delphij (mentor)
Obtained from: OpenBSD (http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/),
freegrep (http://github.com/howardjp/freegrep)
Sponsored by: Google SoC 2008
Portbuild tests run by: kris, pav, erwin
Acknowledgements to: fjoe (as SoC 2008 mentor),
everyone who helped in reviewing and testing