awk

it's a spreadsheet. in your terminal. from 1977.

The awk man page is 1,600 lines long. It’s technically a full programming language. You need about six patterns and you’ll never open Excel to look at a CSV again.

You had a log file. You needed the third column. So you opened it in Excel. Or Google Sheets. You imported it. You selected “space delimited.” It asked about text qualifiers. You clicked through three dialogs. It loaded 40,000 rows and froze for eight seconds. You scrolled to column C. You selected it. You copied it. You pasted it somewhere else. You closed Excel without saving.

Or you had a CSV file and needed to sum a column. So you opened it in a spreadsheet. You scrolled to the bottom. You typed =SUM(D2:D40000). You waited while it calculated. It showed you a number. You closed the spreadsheet. That entire interaction took ninety seconds for a task that takes awk about two.

awk treats every line of text as a row and splits it into columns automatically. It’s a spreadsheet that runs in your terminal, processes millions of rows without flinching, and never asks you about text qualifiers.

Unless you’re running Windows, in which case wtf, none of this applies to you. But hey, come to the dark side: go install WSL2 and you can follow along. We’ll wait. Impatiently.

If you’re lazy like me (all sysadmins are!), then click here for the awk cheat sheet.


Print columns

awk '{print $3}' file.txt

Prints the third column of every line. $1 is the first column, $2 is the second, etc. $0 is the entire line.

awk splits on whitespace by default — spaces and tabs. It doesn’t care how many spaces are between columns. It just figures it out.
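
Don’t take my word for it. Paste this into a terminal (the input string is made up on the spot):

echo "one    two  three" | awk '{print $2}'

Three fields, wildly inconsistent spacing, and it prints two anyway.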

awk '{print $1, $4}' file.txt

First and fourth columns. The comma adds a space between them in the output.

awk '{print $1 " -> " $3}' file.txt

Adds your own text between columns. Now your output says 192.168.1.5 -> 200 instead of just two bare columns.


Use a different separator

Input separator (for CSV files)

awk -F',' '{print $2}' data.csv

-F',' sets the field separator to a comma. Now awk treats CSV columns as fields. Second column of a CSV file, one command, no import dialogs.
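
One honest caveat before a quoted field bites you: -F',' splits on every comma, including commas inside quotes. A made-up example:

echo 'id,"Smith, John",42' | awk -F',' '{print $2}'

That prints "Smith and nothing else. Half a field. For plain CSVs without quoted commas, -F',' is all you need; for the messy ones, see the csvkit note near the bottom.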

Tab-separated files

awk -F'\t' '{print $1, $3}' data.tsv

Same idea. Tab-separated? Tell awk with -F'\t'.

Colon-separated (like /etc/passwd)

awk -F':' '{print $1, $7}' /etc/passwd

Shows every username and their login shell. The /etc/passwd file is colon-separated and awk handles it without blinking.


Filter rows (only print matching lines)

Match a pattern

awk '/ERROR/' logfile.txt

Prints every line containing “ERROR.” Like grep, but with awk’s column-processing power available in the same command.

Match and extract a column

awk '/ERROR/ {print $1, $2, $NF}' logfile.txt

Lines containing “ERROR” — but only print the first column, second column, and last column ($NF). NF is a built-in variable for “number of fields.” $NF is always the last column, no matter how many columns there are.
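
If NF versus $NF feels abstract, run this (inline input, no file needed):

echo "a b c d" | awk '{print NF, $NF}'

Prints 4 d. Four fields, and the last one.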

Filter by column value

awk '$3 > 500 {print $0}' data.txt

Only print lines where the third column is greater than 500. awk understands numbers. It compares. It filters. No formulas. No conditional formatting.

awk -F':' '$1 == "admin"' /etc/passwd

Lines where the first column is exactly “admin.” Note the -F':'. Without it, awk splits on whitespace, the whole passwd line lands in $1, and the comparison never matches.
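
Related trick: ~ matches a regex against a single column instead of the whole line.

awk -F':' '$1 ~ /^admin/' /etc/passwd

Lines where the first column starts with “admin”: admin, administrator, admin2, all of them.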


Built-in variables

Variable       What it is
$0             The entire line.
$1, $2, ...    Column 1, 2, etc.
$NF            The last column.
NF             Number of fields (columns) on the current line.
NR             Line number (row number, starting at 1).
FS             Field separator (default: whitespace).
OFS            Output field separator (default: space).

These are why awk is powerful. NR gives you line numbers. NF gives you column counts. $NF gives you the last column without knowing how many columns there are.
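
Put two of them together and you get a free structure check on any file:

awk '{print NR": "NF" fields"}' file.txt

One line of output per input line: its number and its column count. Handy for spotting the one malformed row in a 40,000-line file.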


Do math

Sum a column

awk '{sum += $3} END {print sum}' data.txt

Adds up every value in column 3 and prints the total at the end. END runs after all lines are processed. This is your =SUM() without the spreadsheet.

Average a column

awk '{sum += $3; count++} END {print sum/count}' data.txt

Sum divided by count. Average. Done. No pivot tables.
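
If the file might be empty, guard the division. awk treats dividing by zero as a fatal error:

awk '{sum += $3; count++} END {if (count) print sum/count}' data.txt

Empty file, no output, no crash.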

Count lines matching a pattern

awk '/ERROR/ {count++} END {print count}' logfile.txt

How many error lines? One number. Like grep -c but extendable — you could count errors AND warnings in the same pass.
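
Here’s that same-pass trick spelled out (assuming your log says WARNING; adjust the pattern to whatever yours actually says):

awk '/ERROR/ {e++} /WARNING/ {w++} END {print "errors:", e+0, "warnings:", w+0}' logfile.txt

Two counters, one read through the file. The +0 turns a counter that never incremented into 0 instead of an empty string.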

Min and max

awk 'NR==1 {min=max=$3} $3>max {max=$3} $3<min {min=$3} END {print "min:", min, "max:", max}' data.txt

Finds the minimum and maximum of column 3 in one pass. No sorting. No scrolling. No conditional highlighting.


Format the output

Add headers

awk -F':' 'BEGIN {print "User", "Shell"} {print $1, $7}' /etc/passwd

BEGIN runs before any lines are processed. Use it to print headers, set variables, or initialize things.

Tab-separated output

awk -F',' '{print $1 "\t" $3}' data.csv

Read CSV, output tab-separated. Format conversion in one line.
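
Typing "\t" between every pair of columns gets old. OFS (from the table above) sets it once:

awk -F',' 'BEGIN {OFS="\t"} {print $1, $3}' data.csv

Same output. The comma in print now emits a tab, because you changed the output field separator in BEGIN.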

Printf for precise formatting

awk '{printf "%-20s %10.2f\n", $1, $3}' data.txt

Left-aligned name (20 chars wide), right-aligned number with two decimal places. Like printf in C. For when your output needs to line up in neat columns.


Line numbers and ranges

Add line numbers

awk '{print NR, $0}' file.txt

Prepends the line number to every line. Like cat -n but with the option to do more.

awk 'NR>=10 && NR<=20' file.txt

Lines 10 through 20. Clean. No head/tail arithmetic.

awk 'NR%2==0' file.txt

Even-numbered lines only. Modular arithmetic. In a text processor. From 1977.
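
Ranges don’t have to be numeric, either. Two patterns with a comma between them print everything from the first match through the next:

awk '/BEGIN BACKUP/,/END BACKUP/' logfile.txt

Those markers are hypothetical. Swap in whatever start and stop lines your log actually uses.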


Real-world one-liners

Disk usage report (with df)

df -h | awk 'NR>1 && $5+0 > 80 {print $6, $5}'

Filesystems over 80% full. Skips the header (NR>1), checks the percentage column ($5+0 converts “85%” to 85), prints the mount point and usage.

Top talkers in a web log

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10

Most frequent IP addresses. awk extracts the first column (IP), the rest of the pipeline counts and sorts. You just did log analysis without a SIEM.
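
Want fewer pipes? awk’s associative arrays can do the counting themselves:

awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log | sort -rn | head -10

Same result. awk tallies per IP; sort and head still handle the ranking.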

Sum file sizes from ls

ls -l *.log | awk '{total += $5} END {printf "Total: %.2f MB\n", total/1048576}'

Total size of all log files in megabytes. awk does the math, printf formats it.


The flags that actually matter

Flag / Feature     What it does
-F'SEP'            Set the field separator (comma, tab, colon, etc.).
'{print $N}'       Print column N.
'/pattern/'        Filter lines matching a regex.
'$N > X'           Filter by column value.
'{sum += $N}'      Accumulate values for math.
'BEGIN {}'         Run code before processing lines.
'END {}'           Run code after all lines are processed.
NR                 Current line number.
NF                 Number of fields on the current line.
$NF                The last column.

“But I’ll just open it in—”

Put the spreadsheet down.

“Excel can do all this.” Excel can handle about a million rows before it starts choking. awk processes multi-gigabyte files without flinching because it reads one line at a time and never loads the whole file into memory. Also, Excel costs money. awk has been free since Carter was president.

“Pandas in Python is more powerful.” Pandas is great for data science. It’s also a library that requires Python, pip, a virtual environment, an import statement, and about twelve lines of boilerplate before you can print a column. awk is one line. For quick data inspection, awk wins on sheer convenience.

“Google Sheets is free.” Google Sheets requires uploading your data to Google’s servers, waiting for it to render, and hoping your internet doesn’t hiccup while you’re looking at sensitive log files. awk runs locally, instantly, and doesn’t report your data to an advertising company.

“I use cut for columns.” cut extracts columns. That’s all it does. awk extracts columns AND filters rows AND does math AND supports conditionals AND has variables. cut is a screwdriver. awk is a toolbox.

“csvkit is made for CSV files.” csvkit is excellent and you should use it for complex CSV work. But it’s a Python package you have to install. awk is already on your machine and handles simple CSV tasks without an apt install.


awk cheat sheet

You made it. Or you skipped straight here. Either way, no judgment. Copy and paste these. Pin them. Tattoo them on your forearm. Whatever works.

What you’re doing          Command
Print a column             awk '{print $3}' file
Print multiple columns     awk '{print $1, $4}' file
CSV columns                awk -F',' '{print $2}' data.csv
Filter by pattern          awk '/ERROR/' file
Filter by column value     awk '$3 > 100' file
Last column                awk '{print $NF}' file
Sum a column               awk '{s+=$3} END {print s}' file
Average a column           awk '{s+=$3} END {print s/NR}' file
Count matches              awk '/pattern/ {c++} END {print c}' file
Add line numbers           awk '{print NR, $0}' file
Print line range           awk 'NR>=10 && NR<=20' file
Custom separator output    awk -F',' '{print $1 "\t" $3}' file
Skip header row            awk 'NR>1 {print $2}' file

The one command: awk '{print $3}' file — extract a column, instantly, no import wizard. Everything else is refinement.

Back to the top, you overachiever.