# @(#)POSIX 8.1 (Berkeley) 6/6/93
# $FreeBSD$

Comments on the IEEE P1003.2 Draft 12
     Part 2: Shell and Utilities
  Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

 1. 32V and BSD derived implementations of sed strip the text
    arguments of the a, c and i commands of their initial blanks,
    i.e.

        #!/bin/sed -f
        a\
            foo\
            \ indent\
            bar

    produces:

        foo
         indent
        bar

    POSIX does not specify this behavior as the System V versions of
    sed do not do this stripping. The argument against stripping is
    that it is difficult to write sed scripts that have leading blanks
    if they are stripped. The argument for stripping is that it is
    difficult to write readable sed scripts unless indentation is allowed
    and ignored, and leading whitespace is obtainable by entering a
    backslash in front of it. This implementation follows the BSD
    historic practice.
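    The stripping rule above can be sketched as a small Python model. It
    assumes that each text line loses its leading blanks (spaces and tabs)
    before backslash processing, that a backslash keeps the character after
    it, and that a backslash at the very end of a line continues the text.
    The function aci_text and its strip_blanks switch are hypothetical
    names for illustration, not the C code of this implementation;
    strip_blanks=False only turns the stripping off and does not model any
    other System V detail.

```python
def aci_text(lines, strip_blanks=True):
    """Hypothetical model of the text argument of an a, c or i command.

    `lines` are the script lines after the "a\\" line. With strip_blanks
    (the 32V/BSD behaviour) the initial blanks of each line are dropped.
    A backslash keeps the following character, so "\\ " yields a protected
    space; a backslash at the very end of a line continues the text.
    """
    out = []
    for line in lines:
        body = line.lstrip(" \t") if strip_blanks else line
        chars, continued, i = [], False, 0
        while i < len(body):
            c = body[i]
            if c == "\\":
                if i + 1 == len(body):
                    continued = True  # trailing backslash: more text follows
                    break
                i += 1
                c = body[i]  # escaped character kept, backslash dropped
            chars.append(c)
            i += 1
        out.append("".join(chars))
        if not continued:
            break  # a line without a trailing backslash ends the text
    return "\n".join(out)


# The text lines from the example above, indented with tabs.
script_text = ["\t\tfoo\\", "\t\t\\ indent\\", "\t\tbar"]
print(aci_text(script_text))
```

    With stripping enabled the model prints foo, " indent" and bar on three
    lines: the tabs used for readability disappear, while the escaped space
    survives as the one deliberate leading blank.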

 2. Historical versions of sed required that the w flag be the last
    flag to an s command, as it takes an additional argument. This
    is obvious, but not specified in POSIX.

 3. Historical versions of sed required that whitespace follow a w
    flag to an s command. This is not specified in POSIX. This
    implementation permits whitespace but does not require it.

 4. Historical versions of sed permitted any number of whitespace
    characters to follow the w command. This is not specified in
    POSIX. This implementation permits whitespace but does not
    require it.

 5. The rule for the l command differs from historic practice. Table
    2-15 includes the various ANSI C escape sequences, including \\
    for backslash. Some historical versions of sed displayed two-digit
    octal numbers, too, rather than the three digits specified by
    POSIX. POSIX is a cleanup, and is followed by this implementation.

 6. The POSIX specification for ! does not state that a single
    command following ! must not contain an address specification,
    whereas a command list may contain address specifications. The
    specification for ! implies that "3!/hello/p" works, and it never
    has, historically. Note that

        3!{
            /hello/p
        }

    does work.

 7. POSIX does not specify what happens with consecutive ! commands
    (e.g. /foo/!!!p). Historic implementations allow any number of
    !'s without changing the behaviour. (It seems logical that each
    one might reverse the behaviour.) This implementation follows
    historic practice.

 8. Historic versions of sed permitted commands to be separated
    by semicolons, e.g. "sed -ne '1p;2p;3q'" printed the first
    three lines of a file. This is not specified by POSIX.
    Note that the ; command separator is not allowed for the commands
    a, c, i, w, r, :, b, t and #, nor at the end of a w flag in the s
    command. This implementation follows historic practice and
    implements the ; separator.

 9. Historic versions of sed terminated the script if EOF was reached
    during the execution of the 'n' command, i.e.:

        sed -e '
        n
        i\
        hello
        ' </dev/null

    did not produce any output. POSIX does not specify this behavior.
    This implementation follows historic practice.

10. Deleted.

11. Historical implementations do not output the change text of a c
    command in the case of an address range whose first line number
    is greater than the second (e.g. 3,1). POSIX requires that the
    text be output. Since the historic behavior doesn't seem to have
    any particular purpose, this implementation follows the POSIX
    behavior.

12. POSIX does not specify whether address ranges are checked and
    reset if a command is not executed due to a jump. The following
    program will behave in different ways depending on whether the
    'c' command is triggered at the third line, that is, whether the
    text is output even though line 3 of the input never logically
    encounters that command.

        2,4b
        1,3c\
        text

    Historic implementations did not output the text in the above
    example. Therefore it was believed that a range whose second
    address was never matched extended to the end of the input.
    However, the current practice adopted by this implementation,
    as well as by those from GNU and Sun, is as follows: the text
    from the 'c' command still isn't output because the second address
    isn't actually matched; but the range is reset after all if its
    second address is a line number. In the above example, only the
    first line of the input will be deleted.

13. Historical implementations allow an output-suppressing #n at the
    beginning of -e arguments as well as in a script file. POSIX
    does not specify this. This implementation follows historical
    practice.

14. POSIX does not explicitly specify how sed behaves if no script is
    specified. Since the sed Synopsis permits this form of the command,
    and the language in the Description section states that the input
    is output, it seems reasonable that it behave like the cat(1)
    command. Historic sed implementations behave differently for
    "ls | sed", where they produce no output, and "ls | sed -e#", where
    they behave like cat. This implementation behaves like cat in both
    cases.

15. The POSIX requirement to open all w files at the beginning makes
    sed behave nonintuitively when the w commands are preceded by
    addresses or are within conditional blocks. This implementation
    follows historic practice and POSIX by default, and provides the
    -a option, which opens the files only when they are needed.

16. POSIX does not specify how escape sequences other than \n and \D
    (where D is the delimiter character) are to be treated. This is
    reasonable; however, it also doesn't state that the backslash is
    to be discarded from the output regardless. A strict reading of
    POSIX would be that "echo xyz | sed 's/./\a/'" would display
    "\ayz". As historic sed implementations always discarded the
    backslash, this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
    that constructs like ",d", "1,d" and ",5d" are allowed. This
    is not true for historic implementations or this implementation
    of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
    white space, but no mention is made of trailing white space.
    Historic implementations of sed assigned different locations to
    the labels "x" and "x ". This is not useful, and leads to subtle
    programming errors, but it is historic practice and changing it
    could theoretically break working scripts. This implementation
    follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
    from within the script must not terminate the script, it does not
    specify what happens if a write command fails. Historic practice
    is to fail immediately if the file cannot be opened or written.
    This implementation follows historic practice.

20. Historic practice is that the \n construct can be used for either
    string1 or string2 of the y command. This is not specified by
    POSIX. This implementation follows historic practice.

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
    within character classes. This is not specified in POSIX. This
    implementation follows historic practice.

23. Historic implementations handle empty REs in a special way: the
    empty RE is interpreted as if it were the last RE encountered,
    whether in an address or elsewhere. POSIX does not document this
    behavior. For example, the command:

        sed -e /abc/s//XXX/

    substitutes XXX for the pattern abc. The semantics of "the last
    RE" can be defined in two different ways:

        1. The last RE encountered when compiling (lexical/static scope).
        2. The last RE encountered while running (dynamic scope).

    While many historical implementations fail on programs depending
    on scope differences, the SunOS version exhibited dynamic scope
    behaviour. This implementation does dynamic scoping, because this
    seems the most useful choice and remains consistent with historical
    practice.
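    The difference between the two scopes in note 23 can be sketched with
    a small Python model. It assumes a script made only of substitute
    commands, each with an optional non-empty address RE, where an empty
    pattern stands for "the last RE". Under the dynamic rule it assumes
    that every RE that is executed, whether or not it matches, becomes the
    last RE. The names compile_script and run and the tuple layout are
    hypothetical, and Python's re module stands in for POSIX regular
    expressions.

```python
import re

# Hypothetical model: a script is a list of (address, pattern, replacement)
# tuples standing for "[/address/]s/pattern/replacement/". address is None
# when the command has no address; pattern "" is the empty RE.


def compile_script(script, scope):
    """Resolve empty REs at compile time when scope is "static"."""
    last = None
    resolved = []
    for address, pattern, replacement in script:
        if address is not None:
            last = address
        if pattern == "":
            if scope == "static":
                if last is None:
                    raise ValueError("first RE may not be empty")
                pattern = last  # lexically preceding RE
        else:
            last = pattern
        resolved.append((address, pattern, replacement))
    return resolved


def run(script, line, scope):
    """Apply the script to one input line under "static" or "dynamic" scope."""
    last = None  # last RE executed so far (only used by dynamic scope)
    for address, pattern, replacement in compile_script(script, scope):
        if address is not None:
            last = address  # assumption: an executed address RE counts even if it fails
            if re.search(address, line) is None:
                continue  # address did not match: command skipped
        if pattern == "":  # only left unresolved under dynamic scope
            if last is None:
                raise ValueError("first RE may not be empty")
            pattern = last
        last = pattern
        line = re.sub(pattern, replacement, line, count=1)
    return line


# Models the two-line sed script:
#   /a/s/b/B/
#   s//Z/
script = [("a", "b", "B"), (None, "", "Z")]
for scope in ("static", "dynamic"):
    print(scope, run(script, "cb", scope))
```

    For the input line "cb" the model prints "static cZ" and "dynamic cb":
    the address /a/ is the last RE executed before s//Z/ runs, so dynamic
    scope searches for "a" and leaves the line alone, while static scope
    reuses the lexically preceding "b". For the /abc/s//XXX/ example both
    rules pick abc, which is why simple scripts cannot tell them apart.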