History log of /freebsd-10.1-release/lib/libc/regex/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


269484 03-Aug-2014 pfg

MFC r268066:
regex(3): Add support for \< and \> word delimiters

Solaris and other OSs have support for \< and \> as word
delimiters in utilities like sed(1). These are useful to
have for general compatibility with Solaris but should be
avoided for portability with other systems, including the
traditional BSDs.

Bump __FreeBSD_version as this is likely to affect some
userland utilities.

Reference:
https://www.illumos.org/issues/516

PR: bin/153257
Obtained from: Illumos
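
As an illustration of the delimiters, the following minimal sketch compiles
a basic RE containing \< and \> through regcomp(3). The helper name
contains_word_foo() is hypothetical, and the sketch assumes a libc that
implements these delimiters; elsewhere regcomp() may reject the pattern or
treat the escapes differently.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `text` contains "foo" as a whole word,
 * 0 if it does not, and -1 if the pattern could not be compiled (for
 * example on a libc that rejects \< and \>).
 */
static int
contains_word_foo(const char *text)
{
	regex_t re;
	int rc;

	assert(text != NULL);
	/* Basic RE: \< anchors at a word start, \> at a word end. */
	if (regcomp(&re, "\\<foo\\>", REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

With support present, "a foo b" should match while "food" should not,
since the latter has no word end after "foo".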


265726 09-May-2014 pfg

MFC r265202:
Remove some unreachable breaks in regex.

This is based on a much bigger cleanup done in Illumos.

Reference:
https://www.illumos.org/issues/2077


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


247596 01-Mar-2013 delphij

Fix assignment of maximum boundary.

Submitted by: Sascha Wildner <saw online de>
Obtained from: DragonFly rev fd39c81ba220f7ad6e4dc9b30d45e828cf58a1ad
MFC after: 2 weeks


232601 06-Mar-2012 theraven

Remove some duplicated copyright notices.

Approved by: dim (mentor)


227753 20-Nov-2011 theraven

Implement xlocale APIs from Darwin, mainly for use by libc++. This adds a
load of _l suffixed versions of various standard library functions that use
the global locale, making them take an explicit locale parameter. Also
adds support for per-thread locales. This work was funded by the FreeBSD
Foundation.

Please test any code you have that uses the C standard locale functions!

Reviewed by: das (gdtoa changes)
Approved by: dim (mentor)


227435 11-Nov-2011 kevlo

Converting int to wint_t leads to broken comparison of raw char
and encoded wint_t.

Spotted by: ache


227414 10-Nov-2011 kevlo

- Don't handle out-of-memory condition
- Fix types of function arguments to match their declarations

Reviewed by: delphij
Obtained from: NetBSD


213573 08-Oct-2010 uqs

mdoc: drop redundant .Pp and .LP calls

They have no effect when coming in pairs, or before .Bl/.Bd


197246 16-Sep-2009 dds

Fix an off-by-one error in the marking of the O_CH operator
following an OOR2 operator.

PR: 130504
MFC after: 2 weeks


197245 16-Sep-2009 dds

Add a couple of debugging statements.


197234 15-Sep-2009 dds

Add two test cases from PR 130504.
An additional one coming from http://www.research.att.com/~gsf/testregex/
was not added; at some point the entire AT&T regression test harness
should be imported here.
But that would also mean commitment to fix the uncovered errors.

PR: 130504
Submitted by: Chris Kuklewicz


182795 05-Sep-2008 keramida

Add two example regexps: (1) one for matching all the characters
that belong in a character class, and (2) one for matching all
the characters *not* in a character class.

Submitted by: Mark B, mkbucc at gmail.com
MFC after: 3 days


176380 18-Feb-2008 kevlo

getopt(3) returns -1, not EOF.


170528 11-Jun-2007 delphij

Diff reduction against other *BSDs: ANSIfy function
prototypes. No function changes.


169982 25-May-2007 delphij

Const'ify and ANSIfy the internal interfaces of regex(3).
This is the final change that makes libc compile with
WERROR on my amd64 crashbox.


169092 29-Apr-2007 deischen

Use C comments since we now preprocess these files with CPP.


167223 05-Mar-2007 delphij

Test cases for back references.

Obtained from: OpenBSD


167222 05-Mar-2007 delphij

Only stop evaluation of a back reference if the match length is
zero and the recursion level is too deep.

Obtained from: OpenBSD


167216 05-Mar-2007 delphij

Avoid infinite recursion on:

echo "foo foo bar bar bar baz" | sed 's/\([^ ]*\)\( *\1\)*/\1/g'

Obtained from: OpenBSD via NetBSD (rev. 1.18)


165903 09-Jan-2007 imp

Per Regents of the University of California letter, remove advertising
clause.

# If I've done so improperly on a file, please let me know.


156613 13-Mar-2006 deischen

Add each directory's symbol map file to SYM_MAPS.


156608 13-Mar-2006 deischen

Add symbol maps and initial symbol version definitions to libc.

Reviewed by: davidxu


150053 12-Sep-2005 stefanf

Use prototypes for CHIN1() and CHIN().


149180 17-Aug-2005 tjr

Fix a boundary condition error in slow() and fast() in multibyte locales:
we must allow the character beginning at "p" to be converted to a wide
character for the purposes of EOL processing and word-boundary matching.


149179 17-Aug-2005 tjr

Document the fact that word-boundary matching does not work
properly in multibyte locales.


149009 13-Aug-2005 tjr

Change OUT from -2 to CHAR_MIN-1, making it impossible for it to
inadvertently match a negative char in the RE being compiled.

This fixes compilation of "\376" (as an ERE) and "\376\376" (as a BRE).

PR: 84740
MFC after: 1 week
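
The collision can be illustrated with a short sketch, assuming a platform
where plain char is signed so that the byte \376 reads back as -2. The
names OLD_OUT, NEW_OUT and collides() are made up for this example and are
not the definitions used in the regex sources.

```c
#include <assert.h>
#include <limits.h>

#define OLD_OUT	(-2)		/* sentinel value before this change */
#define NEW_OUT	(CHAR_MIN - 1)	/* sentinel value after this change */

/* Returns 1 if a pattern byte, read as a plain char, equals the sentinel. */
static int
collides(char c, int sentinel)
{
	assert(sentinel < 0);	/* both sentinels are negative by design */
	return ((int)c == sentinel);
}
```

Because CHAR_MIN - 1 lies below every value a char can hold, no input byte
can compare equal to the new sentinel, whatever the signedness of char.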


145493 25-Apr-2005 delphij

Remove unused file.

Confirmed by: tjr [1]

[1] PERFORCE CHANGESET 57044:
http://perforce.freebsd.org/changeView.cgi?CH=57044


141846 13-Feb-2005 ru

Expand *n't contractions.


140505 20-Jan-2005 ru

Sort sections.


139437 30-Dec-2004 dds

Plug memory leak.

PR: bin/75656
MFC after: 2 weeks


137959 21-Nov-2004 tjr

Fix computation of the 'n' argument to mbrtowc (through XMBRTOWC) to avoid
reading past 'stop' in various places when converting multibyte characters.
Reading too far caused truncation to not be detected when it should have
been, eventually causing regexec() to loop infinitely with certain
combinations of patterns and strings in multibyte locales.

PR: 74020
MFC after: 4 weeks


136091 03-Oct-2004 stefanf

Directly include <runetype.h> for _CurrentRuneLocale, <_ctype.h> doesn't
include it in all cases.


134802 05-Sep-2004 tjr

Fix two problems with REG_ICASE that were introduced with the addition of
multibyte character support:
- In CHadd(), avoid writing past the end of the character set bitmap when
the opposite-case counterpart of wide characters with values less than
NC have values greater than or equal to NC.
- In CHaddtype(), fix a braino that caused alphabetic characters to be
added to all character classes! (but only with REG_ICASE)

PR: 71367


132390 19-Jul-2004 tjr

Update paths to reg*.c and regex2.h. Add a target to build regex.h.


132389 19-Jul-2004 tjr

Update for removal of cclass.h. Trim some useless targets. Invoke mkh
with "sh mkh" so it works if the script is not executable.


132388 19-Jul-2004 tjr

Update for recent changes to struct re_guts. Disable printing the contents
of OANYOF sets for the moment.


132387 19-Jul-2004 tjr

Remove unused files.


132031 12-Jul-2004 tjr

Remove an entry from the BUGS section: we have multibyte character
support now.


132019 12-Jul-2004 tjr

Make regular expression matching aware of multibyte characters. The general
idea is that we perform multibyte->wide character conversion while parsing
and compiling, then convert byte sequences to wide characters when they're
needed for comparison and stepping through the string during execution.

As with tr(1), the main complication is to efficiently represent sets of
characters in bracket expressions. The old bitmap representation is replaced
by a bitmap for the first 256 characters combined with a vector of individual
wide characters, a vector of character ranges (for [A-Z] etc.), and a vector
of character classes (for [[:alpha:]] etc.).

One other point of interest is that although the Boyer-Moore algorithm had
to be disabled in the general multibyte case, it is still enabled for UTF-8
because of its self-synchronizing nature. This greatly speeds up matching
by reducing the number of multibyte conversions that need to be done.
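
The set representation described above might be sketched as follows. The
struct layout and the names wset, range and wset_has() are assumptions made
for illustration and do not reproduce the definitions in regex2.h. The
sketch also assumes that the bitmap has already been filled in for every
member below 256, including members contributed by ranges and classes.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

#define NC	256	/* characters covered by the bitmap (assumed) */

struct range {
	wint_t	lo, hi;			/* inclusive bounds, e.g. [A-Z] */
};

struct wset {				/* hypothetical layout */
	unsigned char		bmp[NC / 8];	/* one bit per char < NC */
	const wint_t		*wides;		/* individual wide chars */
	size_t			nwides;
	const struct range	*ranges;	/* character ranges */
	size_t			nranges;
	const wctype_t		*types;		/* classes, e.g. [:alpha:] */
	size_t			ntypes;
};

/* Returns 1 if `ch` is a member of the set, 0 otherwise. */
static int
wset_has(const struct wset *s, wint_t ch)
{
	size_t i;

	assert(s != NULL);
	if ((unsigned long)ch < NC)	/* fast path: bitmap lookup */
		return ((s->bmp[ch / 8] >> (ch % 8)) & 1);
	for (i = 0; i < s->nwides; i++)
		if (s->wides[i] == ch)
			return (1);
	for (i = 0; i < s->nranges; i++)
		if (s->ranges[i].lo <= ch && ch <= s->ranges[i].hi)
			return (1);
	for (i = 0; i < s->ntypes; i++)
		if (iswctype(ch, s->types[i]))
			return (1);
	return (0);
}
```

The point of the split is that the common single-byte case stays a constant
time bit test, while only characters at or above 256 pay for the linear
scans.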


132017 12-Jul-2004 tjr

Add a new error code, REG_ILLSEQ, to indicate that a regular expression
contains an illegal multibyte character sequence.


131973 11-Jul-2004 tjr

Remove incomplete support for multi-character collating elements. Remove
unused character category calculations.


131692 06-Jul-2004 tjr

Document incorrect handling of multibyte characters.


131504 02-Jul-2004 ru

Mechanically kill hard sentence breaks.


119893 08-Sep-2003 ru

mdoc(7): Use the new feature of the .In macro.


111010 16-Feb-2003 nectar

Eliminate 61 warnings emitted at WARNS=2 (leaving 53 to go).
Only warnings that could be fixed without changing the generated object
code and without restructuring the source code have been handled.

Reviewed by: /sbin/md5


108087 19-Dec-2002 ru

mdoc(7) police: "The .Fa argument.".


108037 18-Dec-2002 ru

mdoc(7) police: "The .Fn function".


107052 18-Nov-2002 ru

libc_r wasn't so tied to libc for 22 months.


104358 02-Oct-2002 mike

Add restrict type-qualifier.


102411 25-Aug-2002 charnier

Replace various spellings with FALLTHROUGH which is lint()able


92991 22-Mar-2002 obrien

Fix the style of the SCM ID's.
I believe I have made all of libc .h's as consistent as possible.


92986 22-Mar-2002 obrien

Fix the style of the SCM ID's.
I believe I have made all of libc .c's as consistent as possible.


92971 22-Mar-2002 obrien

Back out last commit (rev 1.2). I thought I caught this file in time
when deP'ing. But I guess not.


92905 21-Mar-2002 obrien

Remove __P() usage.


92889 21-Mar-2002 obrien

Remove 'register' keyword.


89647 22-Jan-2002 ru

Fix a typo I made in revision 1.5.

Submitted by: trevor


86208 09-Nov-2001 dcs

The algorithm that computes the tables used in the BM search algorithm sometimes
accesses an array beyond its length. This only happens in the last iteration of
a loop, and the value fetched is not used then, so the bug is a relatively
innocent one. Fix this by not fetching any value on the last iteration of said
loop.

Submitted by: MKI <mki@mozone.net>
MFC after: 1 week


84306 01-Oct-2001 ru

mdoc(7) police: Use the new .In macro for #include statements.


81449 10-Aug-2001 ru

mdoc(7) police: protect trailing full stops of abbreviations
with a trailing zero-width space: `e.g.\&'.


81251 07-Aug-2001 ru

mdoc(7) police:

Avoid using parenthesis enclosure macros (.Pq and .Po/.Pc) with plain text.
Not only does this slow down the mdoc(7) processing significantly, but it also
has an undesired (in this case) effect of disabling hyphenation within the
entire enclosed block.


79754 15-Jul-2001 dd

Remove whitespace at EOL.


74870 27-Mar-2001 ru

MAN[1-9] -> MAN.


72205 09-Feb-2001 ru

mdoc(7) police: fixed the weird construct.


70966 12-Jan-2001 ru

man(7) -> mdoc(7).


70481 29-Dec-2000 ru

Prepare for mdoc(7)NG.


68820 16-Nov-2000 ru

Replace a `dagger' sign with a `double dagger' one.
The former looks ugly on grotty(1) devices.


68722 14-Nov-2000 ru

Convert this from -man to -mdoc.


62872 10-Jul-2000 green

Actually make it so this Makefile can build grot.


62857 09-Jul-2000 dcs

Add a test case for one of the bugs found on the new additions to
regex(3).


62856 09-Jul-2000 dcs

Spencer's regex(3) test code.

Obtained from: BSD/OS


62855 09-Jul-2000 dcs

altoffset() always returned whenever it recursed, because at the end
of the processing of the recursion, "scan" would be pointing to O_CH
(or O_QUEST), which would then be interpreted as being the end character
for altoffset().

We avoid this by properly increasing scan before leaving the switch.

Without this, something like (a?b?)?cc would result in a g->moffset of
1 instead of 2.

I added a case to the soon-to-be-imported regex(3) test code to catch
this error.


62854 09-Jul-2000 dcs

Since g->moffset points to the _maximum_ offset at which the must
string may be found (from the beginning of the pattern), the point
at which must is found minus that offset may actually point to some
place before the start of the text.

In that case, make start = start.

Alternatively, this could be tested for in the preceding if, but it
did not occur to me. :-)

Caught by: regex(3) test code
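
The clamping described above can be sketched as below. The helper name
earliest_start() and its parameters are hypothetical; the actual code
adjusts "start" in place rather than calling a function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper. `text` is the beginning of the input, `hit` is where
 * the "must" string was found, and `moffset` is the maximum offset of the
 * must string from the start of the pattern (-1 if unbounded).
 * Returns the earliest position at which a match could begin.
 */
static const char *
earliest_start(const char *text, const char *hit, long moffset)
{
	assert(text != NULL && hit != NULL && hit >= text);
	if (moffset < 0)		/* heuristic disabled */
		return (text);
	if (hit - text < moffset)	/* would point before the text */
		return (text);
	return (hit - moffset);
}
```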


62848 09-Jul-2000 dcs

Add some casts here and there.


62817 08-Jul-2000 dcs

Since we have modified charjump to be CHAR_MIN-based, we have to
correct the offset when we free it.

Caught by: phkmalloc
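
A minimal sketch of a CHAR_MIN-based table and of the matching correction
when it is freed. The names table_alloc(), table_bump() and table_free()
are made up for this example and do not appear in the regex sources.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define NCHARS	(CHAR_MAX - CHAR_MIN + 1)

/* Returns a zeroed table indexable by any plain char value, or NULL. */
static int *
table_alloc(void)
{
	int *base = calloc(NCHARS, sizeof(*base));

	/* Bias the pointer so that tbl[CHAR_MIN] is the first element. */
	return (base == NULL ? NULL : base - CHAR_MIN);
}

/* Bump the counter for character c, which may be negative. */
static void
table_bump(int *tbl, char c)
{
	assert(tbl != NULL);
	tbl[(int)c]++;
}

/* Undo the bias before handing the pointer back to free(). */
static void
table_free(int *tbl)
{
	if (tbl != NULL)
		free(tbl + CHAR_MIN);
}
```

Passing the biased pointer straight to free() would hand the allocator an
address it never returned, which is the kind of mistake a checking malloc
reports.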


62755 07-Jul-2000 dcs

Do not free NULL pointers.


62754 07-Jul-2000 dcs

Deal with the signed/unsigned chars issue in a more proper manner. We
use a CHAR_MIN-based array, like elsewhere in the code.

Remove a number of unused variables (some due to the above change, one
that was left after a number of optimizing steps through the source).

Brucified by: bde


62674 06-Jul-2000 dcs

I hate signed chars.^W^W^W^W^WCast to unsigned char before using signed
chars as array indices.
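
A short sketch of the cast, using a hypothetical count_bytes() helper that
is not part of the regex code. Without the cast, a byte such as \376 would
become a negative index on platforms where plain char is signed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Counts how often each byte value occurs in the string `s`. */
static void
count_bytes(const char *s, size_t counts[UCHAR_MAX + 1])
{
	assert(s != NULL && counts != NULL);
	for (; *s != '\0'; s++)
		counts[(unsigned char)*s]++;	/* never a negative index */
}
```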


62673 06-Jul-2000 dcs

Correct comment to work with test code.

Prevent out of bounds array access in some specific cases.


62670 06-Jul-2000 dcs

Use UCHAR_MAX consistently.


62417 02-Jul-2000 dcs

Fix memory leak introduced with regcomp.c rev 1.14.


62391 02-Jul-2000 dcs

Enhance the optimization provided by pre-matching. Fix style bugs with
previous commits.

At the time we search the pattern for the "must" string, we now compute
the longest offset from the beginning of the pattern at which the must
string might be found. If that offset is found to be infinite (through
use of "+" or "*"), we set it to -1 to disable the heuristics applied
later.

After we are done with pre-matching, we use that offset and the point in
the text at which the must string was found to compute the earliest
point at which the pattern might be found.

Special care should be taken here. The variable "start" is passed to the
automata-processing functions fast() and slow() to indicate the point in
the text at which they should start working from. The real beginning of
the text is passed in a struct match variable m, which is used to check
for anchors. That variable, though, is initialized with "start", so we
must not adjust "start" before "m" is properly initialized.

Simple tests showed a speed increase from 100% to 400%, but they were
biased in that regexec() was called for the whole file instead of line
by line, and parenthesized subexpressions were not searched for.

This change adds a single integer to the size of the "guts" structure,
and does not change the ABI.

Further improvements possible:

Since the speed increase observed here is so huge, one intuitive
optimization would be to introduce a bias in the function that computes
the "must" string so as to prefer a smaller string with a finite offset
over a larger one with an infinite offset. Tests have shown this to be a
bad idea, though, as the cost of false pre-matches far outweighs the
benefits of a must offset, even in biased situations.

A number of other improvements suggest themselves, though:

* identify the cases where the pattern is identical to the must
string, and avoid entering fast() and slow() in these cases.

* compute the maximum offset from the must string to the end of
the pattern, and use that to set the point at which fast() and
slow() should give up trying to find a match, and then
return to pre-matching.

* return all the way to pre-matching if a "match" was found and
later invalidated by back reference processing. Since back
references are evil and should be avoided anyway, this is of
little use.


62389 02-Jul-2000 dcs

Remove from the notes a bug that is said to have been fixed.

PR: 15561
Submitted by: Martin Kammerhofer <mkamm@gmx.net>
Confirmed by: ache


62263 29-Jun-2000 dcs

Initialize variables used by the Boyer-Moore algorithm.

This should fix core dumps when the must pattern is of length
three or less.

Bug found by: knu


62232 29-Jun-2000 dcs

Add Boyer-Moore algorithm to pre-matching test.

The BM algorithm works by scanning the pattern from right to left,
and jumping as many characters as viable based on the text's mismatched
character and the pattern's already matched suffix.

This typically enables us to test only a fraction of the text's characters,
but has a worse performance than the straight-forward method for small
patterns. Because of this, the BM algorithm will only be used if the
pattern size is at least 4 characters.

Notice that this pre-matching is done on the largest substring of the
regular expression that _must_ be present on the text for a successful
match to be possible at all.

For instance, "(xyzzy|grues)" will yield a null "must" substring, and,
therefore, not benefit from the BM algorithm at all. Because of the
lack of intelligence of the algorithm that finds the "must" string,
things like "charjump|matchjump" will also yield a null string. To
optimize that, "(char|match)jump" should be used.

The setup time (at regcomp()) for the BM algorithm will most likely
outweigh any benefits for one-time matches. Given the slow regex(3)
we have, this is unlikely to be even perceptible, though.

The size of a regex_t structure is increased by 2*sizeof(char*) +
256*sizeof(int) + strlen(must)*sizeof(int). This is all inside the
regex_t's "guts", which is allocated dynamically by regcomp(). If
allocation of either of the two tables fails, the other one is freed.
In this case, the straight-forward algorithm is used for pre-matching.

Tests exercising the code path affected have shown a speed increase of
50% for "must" strings of length four or five.

API and ABI remain unchanged by this commit.

The patch submitted on the PR was not used, as it was non-functional.

PR: 14342
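
To make the right-to-left scan and the character jump concrete, here is a
minimal sketch of the Horspool simplification of Boyer-Moore, which uses
only a per-character jump table. It is not the code from this commit, which
also builds a second table from the already matched suffix. The function
name bmh_search() is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: finds `pat` (length m) in `text` (length n) and
 * returns a pointer to the first occurrence, or NULL if there is none.
 */
static const char *
bmh_search(const char *text, size_t n, const char *pat, size_t m)
{
	size_t jump[UCHAR_MAX + 1];
	size_t i, pos;

	assert(text != NULL && pat != NULL);
	if (m == 0)
		return (text);
	if (m > n)
		return (NULL);

	/* By default a mismatch lets us skip the whole pattern length. */
	for (i = 0; i <= UCHAR_MAX; i++)
		jump[i] = m;
	/* Characters in the pattern (except the last) allow shorter jumps. */
	for (i = 0; i + 1 < m; i++)
		jump[(unsigned char)pat[i]] = m - 1 - i;

	for (pos = 0; pos + m <= n;
	    pos += jump[(unsigned char)text[pos + m - 1]]) {
		/* Compare the window against the pattern, right to left. */
		i = m;
		while (i > 0 && text[pos + i - 1] == pat[i - 1])
			i--;
		if (i == 0)
			return (text + pos);
	}
	return (NULL);
}
```

A caller that follows the threshold described above would use this only
for patterns of at least four characters and fall back to a plain scan
otherwise.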


50476 28-Aug-1999 peter

$Id$ -> $FreeBSD$


49099 26-Jul-1999 ache

remove <ctype.h> - not needed


49094 26-Jul-1999 ache

unsigned char cleanup
fix wrong index from p_simp_re()

PR: 8790
Submitted by: Alexander Viro <viro@math.psu.edu> (partially)


48794 12-Jul-1999 nik

Add $Id$, to make it simpler for members of the translation teams to
track.

The $Id$ line is normally at the bottom of the main comment block in the
man page, separated from the rest of the manpage by an empty comment,
like so;

.\" $Id$
.\"

If the immediately preceding comment is a @(#) format ID marker then
the $Id$ will line up underneath it with no intervening blank lines.
Otherwise, an additional blank line is inserted.

Approved by: bde


39327 16-Sep-1998 imp

Replace memory leaking instances of realloc with non-leaking reallocf.
In some cases replace if (a == null) a = malloc(x); else a =
realloc(a, x); with simple reallocf(a, x). Per ANSI-C, this is
guaranteed to be the same thing.

I've been running these on my system here w/o ill effects for some
time. However, the CTM-express is at part 6 of 34 for the CAM
changes, so I've not been able to do a build world with the CAM in the
tree with these changes. Shouldn't impact anything, but...
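
For readers without reallocf(3) at hand, its behaviour can be approximated
as in the sketch below. my_reallocf() and grow() are hypothetical names
used only for this illustration; the real function lives in libc.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for reallocf(3): like realloc(3), but frees the
 * original block when the resize fails, so `p = my_reallocf(p, n)` does
 * not leak the old allocation.
 */
static void *
my_reallocf(void *ptr, size_t size)
{
	void *np = realloc(ptr, size);

	if (np == NULL && size != 0)
		free(ptr);	/* free(NULL) is harmless */
	return (np);
}

/* Grow an int array to `n` elements; returns NULL (and frees) on failure. */
static int *
grow(int *a, size_t n)
{
	assert(n > 0);
	/* realloc(NULL, x) acts like malloc(x), so no special case is needed. */
	return (my_reallocf(a, n * sizeof(*a)));
}
```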


36043 14-May-1998 jb

int -> long changes that reduce the diffs with the NetBSD version to
work in a 64-bit environment.


33352 14-Feb-1998 steve

Note that '+' and '?' are not special characters in basic REs but they
can be simulated using bounds.

PR: 5708
Submitted by: Oliver Fromme <oliver.fromme@heim3.tu-clausthal.de>
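
A minimal sketch of the simulation through regcomp(3), where the bound
\{1,\} plays the role of '+' and \{0,1\} the role of '?'. The helper name
bre_match() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `text` matches the basic RE `bre`,
 * 0 if it does not, and -1 if the pattern could not be compiled.
 */
static int
bre_match(const char *bre, const char *text)
{
	regex_t re;
	int rc;

	assert(bre != NULL && text != NULL);
	/* No REG_EXTENDED flag, so the pattern is parsed as a basic RE. */
	if (regcomp(&re, bre, REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

For example, bre_match("^ab\\{1,\\}c$", "abbc") corresponds to the extended
RE ^ab+c$, and bre_match("^ab\\{0,1\\}c$", "ac") to ^ab?c$.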


30447 15-Oct-1997 bde

Removed the subdirectory paths from the definitions of MAN[1-9]. They
were a workaround for limitations in bsd.man.mk that were fixed about
2 years ago.


25401 03-May-1997 jb

Changed all paths to be relative to src/lib instead of src/lib/libc
so that all these makefiles can be used to build libc_r too.

Added .if ${LIB} == "c" tests to restrict man page builds to libc
to avoid needlessly building them with libc_r too.

Split libc Makefile into Makefile and Makefile.inc to allow the
libc_r Makefile to include Makefile.inc too.


24637 04-Apr-1997 ache

Speedup in case locale not used


19277 31-Oct-1996 ache

collate_range_cmp -> __collate_range_cmp


17782 22-Aug-1996 mpp

Correctly use .Fn instead of .Nm to reference function names
in a bunch of man pages.

Use the correct .Bx (BSD UNIX) or .At (AT&T UNIX) macros
instead of explicitly specifying the version in the text
in a bunch of man pages.


17552 12-Aug-1996 ache

Convert to newly added collate compare function


17532 12-Aug-1996 ache

Remove static collcmp, use new internal function now


17514 11-Aug-1996 ache

Use collate data for national alpha character ranges like [a-z]


17509 11-Aug-1996 ache

Short value is better for hash due to easy overflow in 8bit characters


17508 11-Aug-1996 ache

Use locale for character classes instead of hardcoded values
Misc 8bit cleanup


17141 12-Jul-1996 jkh

General -Wall warning cleanup, part I.
Submitted-By: Kent Vander Velden <graphix@iastate.edu>


14815 25-Mar-1996 ache

8bit clean fixes


11664 22-Oct-1995 phk

More cleanup.
Uhm, I also forgot: I took "EXTRA_SANITY" out of malloc.c


8870 30-May-1995 rgrimes

Remove trailing whitespace.


1849 05-Aug-1994 wollman

First crack at making libc work with the new make macros. It compiles on
my machine, and a simple static (genassym) and shared (sysctl) executable
both work. Still to be done: RPC and YP merge.


1574 27-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1573,
which included commits to RCS files with non-trunk default branches.