#	@(#)POSIX	8.1 (Berkeley) 6/6/93
# $FreeBSD$

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard.  All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1.	32V- and BSD-derived implementations of sed strip the text
	arguments of the a, c and i commands of their initial blanks,
	i.e.

	#!/bin/sed -f
	a\
		foo\
		\  indent\
		bar

	produces:

	foo
	  indent
	bar

	POSIX does not specify this behavior, as the System V versions of
	sed do not do this stripping.  The argument against stripping is
	that it is difficult to write sed scripts that have leading blanks
	if they are stripped.  The argument for stripping is that it is
	difficult to write readable sed scripts unless indentation is allowed
	and ignored, and leading whitespace is obtainable by entering a
	backslash in front of it.  This implementation follows the BSD
	historic practice.

 2.	Historical versions of sed required that the w flag be the last
	flag to an s command, as it takes an additional argument.  This
	is obvious, but not specified in POSIX.

 3.	Historical versions of sed required that whitespace follow a w
	flag to an s command.  This is not specified in POSIX.  This
	implementation permits whitespace but does not require it.

 4.	Historical versions of sed permitted any number of whitespace
	characters to follow the w command.  This is not specified in
	POSIX.  This implementation permits whitespace but does not
	require it.

 5.	The rule for the l command differs from historic practice.  Table
	2-15 includes the various ANSI C escape sequences, including \\
	for backslash.  Some historical versions of sed also displayed
	two-digit octal numbers rather than the three digits specified by
	POSIX.  The POSIX rule is a cleanup, and this implementation
	follows it.

 6.	The POSIX specification for ! does not state that a single
	command following ! may not have an address of its own, whereas
	the commands inside a {} command list may.  The specification for
	! implies that "3!/hello/p" works, but it never has historically.
	Note that

		3!{
			/hello/p
		}

	does work.

 7.	POSIX does not specify what happens with consecutive ! commands
	(e.g. /foo/!!!p).  Historic implementations allow any number of
	!'s without changing the behaviour.  (It seems logical that each
	one might reverse the behaviour.)  This implementation follows
	historic practice.

 8.	Historic versions of sed permitted commands to be separated
	by semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first
	three lines of a file.  This is not specified by POSIX.
	Note that the ; command separator is not allowed after the
	commands a, c, i, w, r, :, b, t and #, nor after the w flag
	of an s command.  This implementation follows historic practice
	and implements the ; separator.

 9.	Historic versions of sed terminated the script if EOF was reached
	during the execution of the 'n' command, i.e.:

	sed -e '
	n
	i\
	hello
	' </dev/null

	did not produce any output.  POSIX does not specify this behavior.
	This implementation follows historic practice.

10.	Deleted.

11.	Historical implementations do not output the change text of a c
	command in the case of an address range whose first line number
	is greater than the second (e.g. 3,1).  POSIX requires that the
	text be output.  Since the historic behavior doesn't seem to have
	any particular purpose, this implementation follows the POSIX
	behavior.

12.	POSIX does not specify whether address ranges are checked and
	reset if a command is not executed due to a jump.  The following
	program behaves differently depending on whether the 'c' command
	is triggered at the third line, that is, whether the text is
	output even though line 3 of the input never logically reaches
	that command.

	2,4b
	1,3c\
		text

	Historic implementations did not output the text in the above
	example.  Therefore it was believed that a range whose second
	address was never matched extended to the end of the input.
	However, the current practice adopted by this implementation,
	as well as by those from GNU and Sun, is as follows:  The text
	from the 'c' command still isn't output because the second address
	isn't actually matched; but the range is reset after all if its
	second address is a line number.  In the above example, only the
	first line of the input will be deleted.

13.	Historical implementations allow an output-suppressing #n at the
	beginning of -e arguments as well as in a script file.  POSIX
	does not specify this.  This implementation follows historical
	practice.

14.	POSIX does not explicitly specify how sed behaves if no script is
	specified.  Since the sed Synopsis permits this form of the command,
	and the language in the Description section states that the input
	is output, it seems reasonable that it behave like the cat(1)
	command.  Historic sed implementations behave differently for "ls |
	sed", where they produce no output, and "ls | sed -e#", where they
	behave like cat.  This implementation behaves like cat in both cases.

15.	The POSIX requirement to open all w files at the beginning makes
	sed behave nonintuitively when the w commands are preceded by
	addresses or are within conditional blocks.  By default this
	implementation follows historic practice and POSIX; it also
	provides the -a option, which opens the files only when they are
	needed.

16.	POSIX does not specify how escape sequences other than \n and \D
	(where D is the delimiter character) are to be treated.  This is
	reasonable; however, it also doesn't state that the backslash is
	to be discarded from the output regardless.  A strict reading of
	POSIX would be that "echo xyz | sed 's/./\a/'" would display
	"\ayz".  As historic sed implementations always discarded the
	backslash, this implementation does as well.

17.	POSIX specifies that an address can be "empty".  This implies
	that constructs like ",d", "1,d" and ",5d" are allowed.  This
	is not true for historic implementations or this implementation
	of sed.

18.	The b, t and : commands are documented in POSIX to ignore leading
	white space, but no mention is made of trailing white space.
	Historic implementations of sed assigned different locations to
	the labels "x" and "x ".  This is not useful, and leads to subtle
	programming errors, but it is historic practice and changing it
	could theoretically break working scripts.  This implementation
	follows historic practice.

19.	Although POSIX specifies that reading from files that do not exist
	from within the script must not terminate the script, it does not
	specify what happens if a write command fails.  Historic practice
	is to fail immediately if the file cannot be opened or written.
	This implementation follows historic practice.

20.	Historic practice is that the \n construct can be used for either
	string1 or string2 of the y command.  This is not specified by
	POSIX.  This implementation follows historic practice.

21.	Deleted.

22.	Historic implementations of sed ignore the RE delimiter characters
	within character classes.  This is not specified in POSIX.  This
	implementation follows historic practice.

23.	Historic implementations handle empty REs in a special way: the
	empty RE is interpreted as if it were the last RE encountered,
	whether in an address or elsewhere.  POSIX does not document this
	behavior.  For example, the command:

		sed -e /abc/s//XXX/

	substitutes XXX for the pattern abc.  The semantics of "the last
	RE" can be defined in two different ways:

	1. The last RE encountered when compiling (lexical/static scope).
	2. The last RE encountered while running (dynamic scope).

	While many historical implementations fail on programs depending
	on scope differences, the SunOS version exhibited dynamic scope
	behaviour.  This implementation uses dynamic scoping, as this seems
	the most useful and is consistent with historical practice.