Awk Command

Source: https://likegeeks.com/awk-command/


Awk Options

Awk can take the following options:
-F fs To specify the field separator
-f file To specify a file that contains the awk script
-v var=value To assign a value to a variable before the script runs
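
The -v option is not shown again later in the article, so here is a minimal sketch of it; the variable name greeting is only an illustration:

$ awk -v greeting="Welcome" 'BEGIN {print greeting}'
Welcome

Because the assignment happens before execution, greeting is already available inside the BEGIN block.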

Read AWK Scripts

To define an awk script, use braces surrounded by single quotation marks like this:

$ awk '{print "Welcome to awk command tutorial "}'

Whatever you type, awk prints the same welcome string for every input line.
To terminate the program, press CTRL+D.

Using variables

With awk, you can process text files. Awk assigns a variable to each data field it finds:

  • $0 - for the whole line.
  • $1 - for the first field.
  • $2 - for the second field.
  • $n - for the nth field.

Whitespace characters such as spaces and tabs are the default field separators in awk.
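
Before we move on to a file, here is a small sketch of these variables with piped input; the sample text is only an illustration:

$ echo "one two three" | awk '{print $2}'
two

Here $2 refers to the second whitespace-separated field, while $0 would print the whole line.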

This is the example file we use in this article.
File name: myfile
Content:

This is a test
This is the second test
This is the third test
This is the fourth test
$ awk '{print $1}' myfile
This
This
This
This

The above example prints the first word of each line.

Sometimes the separator in a file is neither a space nor a tab but something else. You can specify it using the -F option:

$ awk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
...

This command prints the first field in the passwd file. We use the colon as a separator because the passwd file uses it.

Using Multiple Commands

To run multiple commands, separate them with a semicolon like this:

$ echo "Hello Tom" | awk '{$2="Adam"; print $0}'
Hello Adam

The first command sets the second field ($2) to Adam. The second command prints the entire line.

Reading The Script From a File

You can type your awk script in a file and specify that file using the -f option.
Our file (testfile) contains this script:

{print $1 " home at " $6}
$ awk -F: -f testfile /etc/passwd
root home at /root
daemon home at /usr/sbin
bin home at /bin
sys home at /dev
sync home at /bin
...

Here we print the username and home directory from /etc/passwd; the colon separator is again specified with the -F option.
You can also write your awk script file like this and get the same output:

{
    text = " home at "
    print $1 text $6
}

Awk Preprocessing

If you need to create a title or a header for your result, you can use the BEGIN keyword.
It runs before awk processes the data:

$ awk 'BEGIN {print "The File Contents:"} {print $0}' myfile
The File Contents:
This is a test
This is the second test
This is the third test
This is the fourth test

Awk Postprocessing

To run a script after processing the data, use the END keyword:

$ awk 'BEGIN {print "The file Contents:"}
{print $0}
END {print "File footer"}' myfile

Output:

The File Contents:
This is a test
This is the second test
This is the third test
This is the fourth test
File footer

This is useful; you can use it to add a footer, for example.
Let’s combine them together in a script file:

BEGIN {
    print "Users and thier corresponding home"
    print " UserName \t HomePath"
    print "___________ \t __________"
    FS=":"
}
{
    print $1 "  \t  " $6
}
END {
    print "The end"
}

First, the header section is created using the BEGIN keyword, where we also define the FS. Then the footer is printed in the END block.

$ awk -f myscript /etc/passwd
Users and their corresponding home
 UserName        HomePath
___________      __________
root      /root
daemon            /usr/sbin
bin       /bin
sys       /dev
sync      /bin
...
sshd      /var/run/sshd
The end

Built-in Variables

We saw that the data field variables $1, $2, $3, and so on are used to extract data fields, and we have also dealt with the field separator FS.
But these are not the only variables; there are more built-in ones.
The following list shows some of the built-in variables:

Variable Description
FIELDWIDTHS Specifies the field widths as a space-separated list of numbers
RS Specifies the record separator
FS Specifies the field separator
OFS Specifies the output field separator
ORS Specifies the output record separator

By default, the OFS variable is a single space; you can set OFS to specify the separator you need:

$ awk 'BEGIN {FS=":"; OFS="-"} {print $1, $6, $7}' /etc/passwd
root-/root-/bin/bash
daemon-/usr/sbin-/usr/sbin/nologin
bin-/bin-/usr/sbin/nologin
sys-/dev-/usr/sbin/nologin
sync-/bin-/bin/sync
...
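
The ORS variable works the same way for the separator printed after each output record. As a small sketch (the exact names depend on your passwd file), this prints the usernames on a single line separated by spaces instead of newlines:

$ awk 'BEGIN {FS=":"; ORS=" "} {print $1}' /etc/passwd
root daemon bin sys sync ...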

Sometimes the fields are laid out with fixed widths rather than separated by a delimiter. In these cases, the FIELDWIDTHS variable solves the problem.
Suppose we have this content in testfile:

1235.96521
927-8.3652
36257.8157
$ awk 'BEGIN {FIELDWIDTHS="3 4 3"} {print $1,$2,$3}' testfile
123 5.96 521
927 -8.3 652
362 57.8 157

Look at the output: there are 3 fields per line, and each field's length matches exactly what we assigned in FIELDWIDTHS.

Suppose that your data is distributed across multiple lines like the following:

Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754

In the above example, awk fails to process the fields properly because they are separated by newlines, not spaces.
You need to set FS to the newline (\n) and RS to an empty string, so blank lines are treated as record separators.

$ awk 'BEGIN {FS="\n"; RS=""} {print $1, $2, $3}' addresses
Person Name 123 High Street (222) 466-1234
Another person 487 High Street (523) 643-8754

Awesome! Now we can read the records and fields properly.

More Variables

There are some other variables that help you to get more information:

Variable Description
ARGC The number of command-line arguments
ARGV An array of the command-line arguments
ENVIRON An array of the shell environment variables and their corresponding values
FILENAME The name of the file being processed by awk
NF The number of fields in the line being processed
NR The total number of records processed so far
FNR The number of the current record within the current file
IGNORECASE When set to a non-zero value, awk ignores character case (gawk)

Let’s test them.

$ awk 'BEGIN {print ARGC, ARGV[1]}' myfile
2 myfile

The ENVIRON variable retrieves the shell environment variables like this:

$ awk 'BEGIN {print ENVIRON["PATH"]}'
/home/testuser/bin:/home/testuser/.local/bin:/home/testuser/bin:/home/testuser/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
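
NR and NF are not shown in the examples above, so here is a small sketch using the myfile example from earlier; NR is the record number and NF is the number of fields in that record:

$ awk '{print NR, NF, $0}' myfile
1 4 This is a test
2 5 This is the second test
3 5 This is the third test
4 5 This is the fourth test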

Remove duplicate lines

The following one-liner prints each non-blank line only the first time it appears; blank lines (where NF is zero) are always printed:

awk '!NF || !seen[$0]++'
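
A quick sketch of the effect, assuming a hypothetical file named dupes:

$ cat dupes
apple
banana
apple
cherry
banana
$ awk '!NF || !seen[$0]++' dupes
apple
banana
cherry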