{{redirect|AWK}}
{{Infobox programming language
|name = AWK
|paradigm = [[scripting language]], [[procedural programming|procedural]], [[Event-driven programming|event-driven]]
|year = 1977, last revised 1985
, current POSIX edition is [http://www.opengroup.org/onlinepubs/000095399/utilities/awk.html IEEE Std 1003.1-2004]
|designer = [[Alfred V. Aho|Alfred '''A'''ho]], [[Peter J. Weinberger|Peter '''W'''einberger]], and [[Brian Kernighan|Brian '''K'''ernighan]]
|typing = none; can handle strings, integers and floating point numbers; regular expressions
|implementations
= awk, GNU Awk, mawk, nawk, MKS AWK, Thompson AWK (compiler), Awka (compiler)
|dialects = ''old awk'' oawk 1977, ''new awk'' nawk 1985, GNU Awk
|influenced_by = [[C (programming language)|C
]], [[SNOBOL]]4, [[Bourne shell]]
|influenced = [[Perl
]], [[Korn Shell]] (''ksh93'', ''dtksh'', ''tksh''), [[JavaScript syntax#Origin of Syntax|JavaScript]]
|operating_system = [[Cross-platform]]
|website =[http://cm.bell-labs.com/cm/cs/awkbook/index.html
]
}}

'''AWK''' is a general purpose [[programming language]] that is designed for processing text-based data, either in files or data streams. The name AWK is derived from the [[family name]]s of its authors — [[Alfred V. Aho|Alfred '''A'''ho]], [[Peter J. Weinberger|Peter '''W'''einberger]], and [[Brian Kernighan|Brian '''K'''ernighan]]; however, it is not commonly pronounced as a string of separate letters but rather to sound the same as the name of the bird, [[auk]] (which acts as an emblem of the language such as on ''The AWK Programming Language'' book cover). <tt>awk</tt>, when written in all lowercase letters, refers to the [[Unix]] or [[Plan 9 from Bell Labs|Plan 9]] program that runs other programs written in the AWK programming language.

AWK
is an example of a [[programming language]] that extensively uses the [[String (computer science)|string]] [[datatype]], [[associative array]]s (that is, arrays indexed by key strings), and [[regular expression]]s. The power, terseness, and limitations of AWK programs and [[sed]] scripts inspired [[Larry Wall]] to write [[Perl]]. Because of their dense notation, all these languages are often used for writing [[one-liner program]]s.

AWK is one of the early tools to appear in [[Version 7 Unix]] and gained popularity as a way to add computational features to a Unix [[Pipeline (Unix)|pipeline]].
A version of the AWK language is a standard feature of nearly every modern [[Unix-like]] [[operating system]] available today. AWK is mentioned in the [[Single UNIX Specification]] as one of the mandatory utilities of a [[Unix]] [[operating system]]. Besides the [[Bourne shell]], AWK is the only other scripting language available in a [http://www.unix.org/version3/apis/cu.html standard Unix environment]. Implementations of AWK exist as installed software for almost all other operating systems.

==Structure of
AWK programs==

An AWK program is a series of

''pattern'' { ''action'' }

pairs, where ''pattern'' is typically an expression and ''action'' is a series of commands.
Each line of
input is tested against all the patterns in turn and the ''action'' executed if the expression is true.
Either the ''pattern'' or the ''action'' may be omitted.
The ''pattern'' defaults to matching every line of input.
The default ''action'' is to print the line of input.

In addition to a simple AWK expression, the ''pattern
'' can be ''BEGIN'' or ''END'' causing the ''action'' to be executed before or after all lines of input have been read, or ''pattern1, pattern2'' which matches the range of lines of input starting with a line that matches ''pattern1'' up to and including the line that matches ''pattern2'' before again trying to match against ''pattern1'' on future lines.

In addition to normal arithmetic and logical operators,
AWK expressions include the tilde operator
, ''~'', which matches a [[regular expression#POSIX modern (extended) regular expressions|regular expressions]] against a string.
As a handy
default, ''/regexp/'' without using the tilde operator matches against the current line of input.

==AWK commands==
AWK commands are the statement that is substituted for ''action'' in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of [[dynamically linked library|dynamically linked libraries]], which can also provide more functions.

For brevity, the enclosing curly braces ( ''{ }'' ) will be omitted from these examples.

===The ''print'' command===
The ''print'' command is used to
output text. The output text is always terminated with a predefined string called the output record separator (ORS) whose default value is a newline. The simplest form of this command is:

print

This displays the contents of the current line. In
AWK, lines are broken down into ''fields'', and these can be displayed separately:

; <tt>print $1</tt>
: Displays the first field of the current line
; <tt>print $1, $3</tt>
: Displays the first and third fields of the current line, separated by a predefined string
called the output field separator (OFS) whose default value is a single space character

Although these fields (''$X'') may bear resemblance to variables (the $ symbol indicates variables in perl), they actually refer to the fields of the current line. A special case, ''$0'', refers to the entire line. In fact, the commands "<tt>print</tt>" and "<tt>print $0</tt>" are identical in functionality.

The ''print'' command can also display the results of calculations and/or function calls:

print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2
)

Output may be sent to a file:

print "expression" > "file name"


===Variables and Syntax===
Variable names can use any of the characters [A-Za-z0-9_], with the exception of language keywords. The operators ''+ - * /'' are addition, subtraction, multiplication, and division, respectively. For string [[concatenation]], simply place two variables (or string constants) next to each other. It is optional to use a space in between if string constants are involved. But you can't place two variable names adjacent to each other without having a space in between. String constants are [[delimited]] by double quotes. Statements need not end with semicolons. Finally, comments can be added to programs by using ''#'' as the first character on a line.

===User-defined functions===
In a format similar to [[C (programming language)|C]], function
definitions consist of the keyword <tt>function</tt>, the function name, argument names and the function body. Here is an example of a function.

function add_three (number
, temp) {
temp = number + 3
return temp
}

This statement can be invoked as follows:

print add_three(36
) # Outputs '''39'''

Functions can have variables
that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some [[whitespace]] in the argument list before the local variables, in order to indicate where the parameters end and the local variables begin.

==Sample applications==
===Hello World===
Here is the ubiquitous "[[Hello world program]]" program written in AWK:

BEGIN { print "Hello, world!" }

===Print lines longer than 80 characters===
Print all lines longer than 80 characters. Note that the default action is to print the current line.

length > 80


<i>The AWK Programming Language</i> now specifies an explicit $0 in the length function:

length($0) > 80


===Print a count of words===
Count words in the input, and print lines, words, and characters (like
[[Wc (Unix)|wc]])

{
w += NF
c += length
+ 1
}
END { print NR, w, c }

===Sum
last word===

{ s += $NF }
END { print s + 0 }

As there is no pattern for the first line of the program, every line of input matches by default so the ''s += $NF'' action is executed.
''s'' is incremented by the numeric value of ''$NF'' which is the last word on the line as defined by AWK's field separator, by default white-space.
''NF'' is the number of fields in the current line, e.g. 4.
Since ''$4'' is the value of the fourth field, ''$NF'' is the value of the last field in the line regardless of how many fields this line has, or whether it has more or
fewer fields than surrounding lines. $ is actually a unary operator with the highest [[operator precedence]].

(If the line has no fields then ''NF'' is 0, ''$0'' is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0.)

At the end of the input the ''END'' pattern matches so ''s'' is printed.
However, since there may have been no lines of input at all, and so ''s'' has never been assigned to, it will by default be an empty string.
Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value
.
(Concatenating an empty string is to coerce from a number to a string, e.g. ''s ""''. Note, there's no operator to concatenate strings, they're just placed adjacently.)
With the coercion the program prints ''0'' on an empty input, without it an empty line is printed
.

===Match a range of input lines===

$ yes Wikipedia
| awk '{ printf "%6d %s\n", NR, $0 }' | awk 'NR % 4 == 1, NR % 4 == 3' | head -7
1 Wikipedia
2 Wikipedia
3 Wikipedia
5 Wikipedia
6 Wikipedia
7 Wikipedia
9 Wikipedia
$

The [[yes
(Unix)|yes]] command repeatidly prints the letter "y" on a line. In this case, we tell the command to print the word "Wikipedia".
The first awk command prints each line
numbered. The printf function acts like [[printf|the one in most other programming languages]].
The second awk command works
as follows:
''NR'' is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input.
''%'' is the modulo operator.
''NR % 4 == 1'' is true for the first, fifth, ninth, etc., lines of input.
Likewise, ''NR % 4 == 3'' is true for the third, seventh, eleventh, etc., lines of input.
The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3.
It then stays false until the first part matches again on line 5
. The [[head (Unix)|head]] command is then issued to only print the first seven lines of that output, thus ensuring that the program actually quits (as yes never terminates itself).

The first part of a range pattern being constantly true, e.g. ''1'', can be used to start the range at the beginning of input.
Similarly, if the second part is constantly false, e.g. ''0'', the range continues until the end of input:
/^--cut here--$/, 0

prints lines of input from the first line matching the regular expression ''^--cut here--$'' to the end.

===Calculate word frequencies===
Word frequency, uses [[associative array]]s:

BEGIN { FS="[^a-zA-Z]+"}

{ for (i=1; i<=NF; i++)
words
[tolower($i)]++
}

END { for (i in words)
print i, words[i
]
}

The BEGIN block sets the field separator to any sequence of non-alphabetic characters. Note that separators can be regular expressions. After that, we get to a bare action, which performs the action on every input line. In this case, for every field on the line, we add one to the number of times that word, first converted to lowercase, appears. Finally, in the END block, we print the words with their frequencies. The line
for (i in words)
creates a loop that goes through the array words, setting i to each <i>subscript</i> of the array. This is different from most languages, where such a loop goes through each value in the array. This means that you print the word with each count in a simple way
.

==Self-contained AWK scripts==
As with many other programming languages, self-contained AWK script can be constructed using the so-called "[[shebang (Unix)|shebang]]" syntax.

For example, a UNIX command called
<tt>hello.awk</tt> that prints the string "Hello, world!" may be built by creating a file named <tt>hello.awk</tt> containing the following lines:

#!/usr/bin/awk -f
BEGIN { print "Hello, world!"; exit
}

==AWK versions and implementations==
AWK was originally written in 1977, and distributed with [[Version 7 Unix]].

In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book ''The AWK Programming Language'', published 1988, and its implementation was made available in releases of [[UNIX System V]]. To avoid confusion with the incompatible older version, this version was sometimes known as "new awk" or ''nawk''. This implementation was released under a
[[free software license]] in 1996, and is still maintained by Brian Kernighan. (see external links below)

''BWK awk'' refers to the version by [[Brian Kernighan|Brian W. Kernighan]]. It has been dubbed the "One True AWK" because of the use of the term in association with the book<ref>[http://cm.bell-labs.com/cm/cs/awkbook/ '''The AWK Programming Language''', ISBN 0-201-07981-X.]</ref> that originally described the language, and the fact that Kernighan was one of the original authors of awk. FreeBSD refers to this version as ''one-true-awk''<ref>[http://www.freebsd.org/cgi/cvsweb.cgi/src/contrib/one-true-awk/FREEBSD-upgrade?rev=1.9&content-type=text/x-cvsweb-markup FreeBSD's work log for importing BWK awk into FreeBSD's core], dated 2005-05-16, downloaded 2006-09-20</ref>.

''gawk
'' ([[GNU]] awk) is another free software implementation. It was written before the original implementation became freely available, and is still widely used. Many [[Linux distribution]]s come with a recent version of ''gawk'' and ''gawk'' is widely recognized as the de-facto standard implementation in the [[Linux]] world; ''gawk'' version 3.0 was included as ''awk'' in [[FreeBSD]] prior to version 5.0. Subsequent versions of FreeBSD use ''BWK awk'' in order to avoid<ref>[http://www.freebsd.org/doc/en_US.ISO8859-1/articles/bsdl-gpl/ FreeBSD's view of ''GPL Advantages and Disadvantages'']</ref> the [[GNU General Public License|GPL]], a [[BSD and GPL licensing|more restrictive]] (in the sense that GPL licensed code cannot be modified to become proprietary software) license than the [[BSD licenses|BSD license]].
<ref>[http://www.freebsd.org/releases/5.0R/relnotes-i386.html#USERLAND FreeBSD 5.0 release notes] with notice of '''BWK awk''' in the base distribution</ref>


''xgawk'' is a SourceForge project<ref>[http://sourceforge.net/projects/xmlgawk/ ''xgawk'' at SourceForge]</ref> based on ''gawk''. It extends ''gawk'' with dynamically loadable libraries.

''mawk'' is a very fast AWK implementation by Mike Brennan based on a [[byte code]] interpreter.

''awka'' (whose front end is written on top of the ''mawk'' program) is a translator of awk scripts into C code. When compiled, statically including the author's libawka.a, the resulting executables are considerably sped up and according to the author's tests compare very well with other versions of awk, perl or tcl. Small scripts will turn into programs of 160-170 kB. http://awka.sourceforge.net

Downloads and further information about these versions are available from the sites listed below.

Thompson AWK or ''TAWK'' is an AWK [[compiler]] for [[MS-DOS|DOS]] and [[Microsoft Windows|Windows]], previously sold by Thompson Automation Software (which has ceased its activities).

''Jawk'' is a SourceForge project<ref>[http://sourceforge.net/projects/jawk/ ''Jawk'' at SourceForge]</ref> to implement AWK in [[Java (programming language)|Java]]. Extensions to the language are added to provide access to [[Java (programming language)|Java]] features within AWK scripts (i.e., Java threads, sockets, Collections, etc).

[[BusyBox]] includes a sparsely documented Awk implementation that appears to be complete, written by Dmitry Zakharov. This implementation is the smallest Awk implementation out there, suitable for embedded systems.

==Books==
* {{cite book
| author
=[[Alfred V. Aho]], [[Brian W. Kernighan]], and [[Peter J. Weinberger]]
| publisher=Addison-Wesley
| year= 1988
| id=ISBN 0-201-07981-X
| url=http://cm.bell-labs.com/cm/cs/awkbook/
| title=The AWK Programming Language
}} ''The book's webpage includes downloads of the current implementation of Awk and links to others.''
* {{cite book
| author
=[[Arnold Robbins]]
| url=http://www.oreilly.com/catalog/awkprog3/index.html
| edition=Edition 3
| title=Effective awk Programming
}} ''Arnold Robbins maintained the GNU Awk implementation of AWK for more than 10 years. The free GNU Awk manual was also published by O'Reilly in May 2001. Free download of this manual is possible through the following book references.''
* {{cite book
| author
=[[Arnold Robbins]]
| url=http://www.gnu.org/software/gawk/manual/html_node/index.html
| edition=Edition 3
| title=GAWK: Effective AWK Programming: A User's Guide for GNU Awk
}}
* {{cite book
| title
=sed & awk, Second Edition
| author=[[Dale Dougherty]], [[Arnold Robbins]]
|edition= Second Edition
| year = March 1997
| id=ISBN 1-56592-225-5
| url=http://www.oreilly.com/catalog/sed2/
| publisher=[[O'Reilly Media]]
}}

==References==
{{reflist
}}

==See also==
*[[sed]]
*[[List of Unix programs]]

== External links ==
*{{man|cu|awk|SUS|pattern scanning and processing language}}
*[http://cm.bell-labs.com/cm/cs/awkbook/index.html awk] maintained by [[Brian Kernighan]].
*[news:comp.lang.awk comp.lang.awk] is a [[USENET]] [[newsgroup]] dedicated to AWK.
*[http://www.gnu.org/software/gawk/gawk.html GAWK (GNU awk) webpage]
*[http://freshmeat.net/projects/mawk/ ''mawk download site'']
*[http://clio.rice.edu/djgpp/win2k/gwk311b.zip DJGPP port of Gawk 3.11b as a downloadable 768KB zipfile]
*[http://sourceforge.net/projects/xmlgawk/ ''xgawk download site'']
*[http://awka.sourceforge.net/ Awka Open Source, AWK to C Conversion Tool]
*[http://www.tasoft.com/tawk.html TAWK Compiler
]
*[http://sourceforge.net/projects/jawk/ Jawk Open Source, an implementation of AWK in Java with extensions]
*[http://www.gnulamp.com/awk.html gnulamp awk tutorial]
*[http://www.samiam.org/awk/annoyances.html AWK annoyances]; This page includes a Linux port of the MKS version of AWK.
{{unix commands}}

[[Category:Curly bracket programming languages|AWK]]
[[Category
:Domain-specific programming languages]]
[[Category:Text-oriented programming
languages]]
[[Category:Scripting languages]]
[[Category:Unix
software]]
[[Category:Free compilers and interpreters]]
[[Category:Standard Unix programs|Awk]]

[[bs:Awk]]
[[ca:Awk]]
[[cs:AWK]]
[[de:Awk]]
[[es:AWK
]]
[[fa:ای‌دبلیو‌کی]]
[[fr:Awk]]
[[ko:AWK]]
[[hr:Awk]]
[[it:Awk]]
[[hu:Awk]]
[[nl:AWK]]
[[ja:AWK]]
[[no:Awk]]
[[pl
:AWK]]
[[pt:AWK]]
[[ru:AWK]]
[[sk:AWK (programovací jazyk
)]]
[[fi:AWK]]
[[sv:Awk]]
[[th:ภาษา AWK]]
[[tg:Awk]]
[[tr:AWK]]
[[uk:AWK]]
[[zh:AWK]]