.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.de TF
.IP "" "\w'\fB\\$1\ \ \fP'u"
.PD 0
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
|
.B \-\^\-csv
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
The
.B \-\^\-csv
option causes
.I awk
to process records using (more or less) standard comma-separated values
(CSV) format.
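.PP
For example (a sketch, assuming a POSIX shell),
the first command below sets the field separator with
.B \-F
and a variable with
.BR \-v ,
and prints
.BR a+c ;
the second passes an assignment operand that takes effect
just before the standard input
.RB ( \- )
is read, and prints
.BR "1 hi" .
.PP
.EX
echo a:b:c | awk \-F: \-v sep=+ '{ print $1 sep $3 }'
echo hi | awk '{ print v, $0 }' v=1 \-
.EE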
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
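.PP
A short sketch, assuming a POSIX shell:
the first command prints the third field followed by the whole line,
.BR "c a b c" ;
the second sets
.B FS
to the null string, so the line
.B abc
has three one-character fields, and prints the field count
.RB ( NF )
and the second field,
.BR "3 b" .
.PP
.EX
echo a b c | awk '{ print $3, $0 }'
echo abc | awk 'BEGIN { FS = "" } { print NF, $2 }'
.EE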
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
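.PP
In the sketch below (assuming
.IR seq (1)
is available), the first pattern-action statement has no action,
so it prints the second line,
and the second has no pattern, so it counts every line;
a semicolon separates them.
The output is 2 followed by 3.
.PP
.EX
seq 3 | awk 'NR == 2; { n++ } END { print n }'
.EE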
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
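.PP
A minimal sketch combining several of these statements (again assuming
.IR seq (1)):
odd values are skipped with
.BR next ,
the even values are stored as array subscripts,
one element is removed with
.BR delete ,
and a
.B "for (k in a)"
loop counts the elements that remain.
It prints 2.
.PP
.EX
seq 6 | awk '$1 % 2 { next } { a[$1] = 1 } END { delete a[4]; for (k in a) n++; print n }'
.EE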
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
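.PP
A short sketch, assuming a POSIX shell.
The first command shows automatic conversion and concatenation:
\f(CW"3" + 4\fR is numeric, while \f(CW3 4\fR concatenates, so it prints
.BR "7 34" .
The second stores an element under the two subscripts
\f(CW"x"\fR and \f(CW2\fR, then splits the combined subscript back apart on
.B SUBSEP
and prints
.BR "x 2" .
.PP
.EX
awk 'BEGIN { x = "3" + 4; y = 3 4; print x, y }'
awk 'BEGIN { a["x", 2] = 1; for (k in a) { split(k, p, SUBSEP); print p[1], p[2] } }'
.EE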
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
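.PP
As a sketch (assuming
.IR sort (1)
is on the search path), both
.B print
statements below write to the same pipe because the command strings are identical,
and
.B close
waits for
.B sort
to finish, so its output appears before
.BR done :
the lines printed are a, b, and done.
.PP
.EX
awk 'BEGIN { print "b" | "sort"; print "a" | "sort"; close("sort"); print "done" }'
.EE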
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF "\fBlength(\fR[\fIv\^\fR]\fB)\fR"
.TP
\fBlength(\fR[\fIv\^\fR]\fB)\fR
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand()
random number on [0,1).
.TP
\fBsrand(\fR[\fIs\^\fR]\fB)\fR
sets seed for
.B rand
and returns the previous seed.
.TP
.BI int( x\^ )
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[\fB, \fIs\^\fR]\fB)\fR
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[\fB, \fIs\^\fR]\fB)\fR
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
\fBgensub(\fIpat\fB, \fIrepl\fB, \fIhow\fR [\fB, \fItarget\fR]\fB)\fR
replaces instances of
.I pat
in
.I target
with
.IR repl .
If
.I how
is \fB"g"\fR or \fB"G"\fR, do so globally. Otherwise,
.I how
is a number indicating which occurrence to replace.  If no
.IR target ,
use
.BR $0 .
Return the resulting string;
.I target
is not modified.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.B systime()
returns the current date and time as a standard
``seconds since the epoch'' value.
.TP
.BI strftime( fmt ", " timestamp\^ )
formats
.I timestamp
(a value in seconds since the epoch)
according to
.IR fmt ,
which is a format string as supported by
.IR strftime (3).
Both
.I timestamp
and
.I fmt
may be omitted; if no
.IR timestamp ,
the current time of day is used, and if no
.IR fmt ,
a default format of \fB"%a %b %e %H:%M:%S %Z %Y"\fR is used.
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
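.PP
The following sketch (assuming a POSIX shell, and
.IR sh (1)
for
.BR system )
exercises several of these functions.
It prints five lines:
.BR "12 hello 8" ,
.BR "5 4" ,
.BR "3 c" ,
.BR "2 hell0, w0rld" ,
and
.BR 3 .
.PP
.EX
awk 'BEGIN {
    s = "hello, world"
    print length(s), substr(s, 1, 5), index(s, "world")
    if (match(s, /o, w/)) print RSTART, RLENGTH
    n = split("a:b:c", parts, ":"); print n, parts[3]
    t = s; n = gsub(/o/, "0", t); print n, t
    print system("exit 3")
}'
.EE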
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
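.PP
A minimal sketch of reading a command's output line by line.
Testing the return value against 0, rather than using it directly as a condition,
keeps a \-1 error return from looping forever;
after the loop,
.I line
still holds the last line read, so this prints
.BR "2 b" .
.PP
.EX
awk 'BEGIN { cmd = "echo a; echo b"; while ((cmd | getline line) > 0) n++; close(cmd); print n, line }'
.EE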
.PP
The functions
.BR compl ,
.BR and ,
.BR or ,
.BR xor ,
.BR lshift ,
and
.B rshift
perform the corresponding bitwise operations on their
operands, which are first truncated to integers.
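.PP
For example, with 12 (binary 1100) and 10 (binary 1010),
the following sketch prints
.BR "8 14 6 16 4" .
It assumes an
.I awk
that provides these functions; they are not part of POSIX.
.PP
.EX
awk 'BEGIN { print and(12, 10), or(12, 10), xor(12, 10), lshift(1, 4), rshift(16, 2) }'
.EE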
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
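.PP
As a sketch, a regular expression can be held in a string variable
and used with
.BR ~ ;
the command below counts the fields that consist only of digits and prints 2.
.PP
.EX
echo 12 ab 7 | awk \-v re='^[0-9]+$' '{ for (i = 1; i <= NF; i++) if ($i ~ re) n++; print n }'
.EE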
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second, inclusive.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr ,\| expr ,\| ... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
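.PP
A short sketch of the
.B in
operator, including the parenthesized form for multiple subscripts;
it prints yes and then no.
Unlike a reference such as
.BR a[3] ,
an
.B in
test does not create the element it tests for.
.PP
.EX
awk 'BEGIN { a[1, 2] = "x"; if ((1, 2) in a) print "yes"; if (!(3 in a)) print "no" }'
.EE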
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
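.PP
For instance, the two
.B BEGIN
actions below run in order, printing
.B ab
on one line; the
.B END
action then prints the record count, 0, since
.I /dev/null
(assumed here as an empty input) supplies no records.
.PP
.EX
awk 'BEGIN { printf "a" } BEGIN { print "b" } END { print NR }' /dev/null
.EE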
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers to strings
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
If empty, blank lines separate records.
If more than one character long,
.B RS
is treated as a regular expression, and records are
separated by text matching the expression.
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default \f(CW"\e034"\fR).
.PD
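.PP
A sketch of two of these variables at work.
Assigning a field to itself rebuilds
.B $0
with the current
.BR OFS ,
so the first command prints
.B x:y:z
and then 3 (the value of
.BR NF );
the second converts a number to a string with
.B CONVFMT
and prints
.BR 3.1 .
.PP
.EX
awk 'BEGIN { OFS = ":"; $0 = "x y z"; $1 = $1; print; print NF }'
awk 'BEGIN { CONVFMT = "%.2g"; x = 3.14159; y = x ""; print y }'
.EE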
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ... }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
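.PP
As a sketch, the first function below is recursive, and the second
uses an excess parameter
.I i
as a local variable while filling an array passed by reference.
It prints 120, then
.BR "9 0" :
the global
.I i
is never set, so
.B "i + 0"
is 0.
.PP
.EX
awk 'function fact(n) { return n <= 1 ? 1 : n * fact(n \- 1) }
function squares(arr, n,    i) { for (i = 1; i <= n; i++) arr[i] = i * i }
BEGIN { print fact(5); squares(sq, 3); print sq[3], i + 0 }'
.EE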
.SH ENVIRONMENT VARIABLES
If
.B POSIXLY_CORRECT
is set in the environment, then
.I awk
follows the POSIX rules for
.B sub
and
.B gsub
with respect to consecutive backslashes and ampersands.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language, Second Edition" ,
Addison-Wesley, 2024.  ISBN 978-0-13-826972-2, 0-13-826972-6.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
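.PP
For example, string values compare as strings, so in the sketch below
\f(CW"10" < "9"\fR is true, while adding 0 to each side forces a numeric comparison,
which is false; it prints
.BR "1 0" .
.PP
.EX
awk 'BEGIN { x = "10"; y = "9"; s = (x < y); n = (x + 0 < y + 0); print s, n }'
.EE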
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
Input is expected to be UTF-8 encoded. Other multibyte
character sets are not handled.
However, in eight-bit locales,
.I awk
treats each input byte as a separate character.
.SH UNUSUAL FLOATING-POINT VALUES
.I Awk
was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
and Infinity values, which are supported by all modern floating-point
hardware.
.PP
Because
.I awk
uses
.IR strtod (3)
and
.IR atof (3)
to convert string values to double-precision floating-point values,
modern C libraries also convert strings starting with
.B inf
and
.B nan
into infinity and NaN values respectively.
This led to strange results, as with this command,
.PP
.EX
.nf
echo nancy | awk '{ print $1 + 0 }'
.fi
.EE
.PP
printing
.B nan
instead of zero.
.PP
.I Awk
now follows GNU AWK, and prefilters string values before attempting
to convert them to numbers, as follows:
.TP
.I "Hexadecimal values"
Hexadecimal values (allowed since C99) convert to zero, as they did
prior to C99.
.TP
.I "NaN values"
The two strings
.B +nan
and
.B \-nan
(case independent) convert to NaN. No others do.
(NaNs can have signs.)
.TP
.I "Infinity values"
The two strings
.B +inf
and
.B \-inf
(case independent) convert to positive and negative infinity, respectively.
No others do.
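.PP
A sketch of these rules, assuming an
.I awk
that implements them: the hexadecimal string and the unsigned
.B inf
both convert to zero, while
.B +inf
converts to infinity, which compares greater than 1e308.
It prints
.BR "0 0 1" .
.PP
.EX
echo 0x1A inf +inf | awk '{ big = ($3 + 0 > 1e308); print $1 + 0, $2 + 0, big }'
.EE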