.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.de TF
.IP "" "\w'\fB\\$1\ \ \fP'u"
.PD 0
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
|
.B \-\^\-csv
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
The
.B \-\^\-csv
option causes
.I awk
to process records using (more or less) standard comma-separated values
(CSV) format.
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF "\fBlength(\fR[\fIv\^\fR]\fB)\fR"
.TP
\fBlength(\fR[\fIv\^\fR]\fB)\fR
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand()
random number on [0,1).
.TP
\fBsrand(\fR[\fIs\^\fR]\fB)\fR
sets seed for
.B rand
and returns the previous seed.
.TP
.BI int( x\^ )
truncates to an integer value.
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)\fR
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)\fR
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
\fBgensub(\fIpat\fB, \fIrepl\fB, \fIhow\fR [\fB, \fItarget\fR]\fB)\fR
replaces instances of
.I pat
in
.I target
with
.IR repl .
If
.I how
is \fB"g"\fR or \fB"G"\fR, do so globally. Otherwise,
.I how
is a number indicating which occurrence to replace. If no
.IR target ,
use
.BR $0 .
Return the resulting string;
.I target
is not modified.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.B systime()
returns the current date and time as a standard
``seconds since the epoch'' value.
.TP
.BI strftime( fmt ", " timestamp\^ )
formats
.I timestamp
(a value in seconds since the epoch)
according to
.IR fmt ,
which is a format string as supported by
.IR strftime (3).
Both
.I timestamp
and
.I fmt
may be omitted; if no
.IR timestamp ,
the current time of day is used, and if no
.IR fmt ,
a default format of \fB"%a %b %e %H:%M:%S %Z %Y"\fR is used.
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.PP
The functions
.BR compl ,
.BR and ,
.BR or ,
.BR xor ,
.BR lshift ,
and
.B rshift
perform the corresponding bitwise operations on their
operands, which are first truncated to integer.
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second, inclusive.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr ,\| expr ,\| ... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
If empty, blank lines separate records.
If more than one character long,
.B RS
is treated as a regular expression, and records are
separated by text matching the expression.
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default 034).
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ... }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.SH ENVIRONMENT VARIABLES
If
.B POSIXLY_CORRECT
is set in the environment, then
.I awk
follows the POSIX rules for
.B sub
and
.B gsub
with respect to consecutive backslashes and ampersands.
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN	{ FS = ",[ \et]*|[ \et]+" }
	{ print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
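.PP
As a rough illustration of the preceding example, the shell sketch below
runs the same program over two made-up input lines.
The function name
.B swap_fields
and the sample data are assumptions chosen for this illustration only.
.EX
```shell
# swap_fields is a hypothetical wrapper around the example program above.
# FS matches a comma followed by optional blanks, or a run of blanks.
swap_fields() {
    awk 'BEGIN { FS = ",[ \t]*|[ \t]+" } { print $2, $1 }'
}

# Hypothetical input: one comma-separated line, one space-separated line.
{ echo "alpha, beta"; echo "gamma delta"; } | swap_fields
```
.EE
.PP
With this input the sketch prints
.B "beta alpha"
and then
.BR "delta gamma" ,
one per line, joined by the default output field separator.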
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language, Second Edition" ,
Addison-Wesley, 2024. ISBN 978-0-13-826972-2, 0-13-826972-6.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
Input is expected to be UTF-8 encoded. Other multibyte
character sets are not handled.
However, in eight-bit locales,
.I awk
treats each input byte as a separate character.
.SH UNUSUAL FLOATING-POINT VALUES
.I Awk
was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
and Infinity values, which are supported by all modern floating-point
hardware.
.PP
.I Awk
uses
.IR strtod (3)
and
.IR atof (3)
to convert string values to double-precision floating-point values,
and modern C libraries convert strings starting with
.B inf
and
.B nan
into infinity and NaN values respectively. This led to strange results,
with something like this:
.PP
.EX
.nf
echo nancy | awk '{ print $1 + 0 }'
.fi
.EE
.PP
printing
.B nan
instead of zero.
.PP
.I Awk
now follows GNU AWK, and prefilters string values before attempting
to convert them to numbers, as follows:
.TP
.I "Hexadecimal values"
Hexadecimal values (allowed since C99) convert to zero, as they did
prior to C99.
.TP
.I "NaN values"
The two strings
.B +nan
and
.B \-nan
(case independent) convert to NaN. No others do.
(NaNs can have signs.)
.TP
.I "Infinity values"
The two strings
.B +inf
and
.B \-inf
(case independent) convert to positive and negative infinity, respectively.
No others do.
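.PP
The rules above can be sketched as a small string classifier.
This is only an illustration of the documented behavior, not the code
.I awk
uses internally: the helper name
.BR classify ,
its output labels, and the simplified hexadecimal pattern are all assumptions.
.EX
```shell
# classify STRING: report how the prefilter rules described above would
# treat STRING. Hypothetical helper for illustration; it inspects the text
# only and does not depend on how the local awk converts numbers.
classify() {
    awk -v s="$1" 'BEGIN {
        s = tolower(s)                      # the rules are case independent
        if (s == "+nan" || s == "-nan")
            print "NaN"
        else if (s == "+inf")
            print "+Infinity"
        else if (s == "-inf")
            print "-Infinity"
        else if (s ~ /^[-+]?0x[0-9a-f]+$/)  # simplified hex pattern (assumption)
            print "zero"
        else
            print "ordinary conversion"
    }'
}

classify +NaN
classify nancy
```
.EE
.PP
Here
.B nancy
falls through to the ordinary conversion, which yields zero because the
string has no leading numeric prefix.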