awk text
it's a spreadsheet. in your terminal. from 1977.
The awk man page is 1,600 lines long. It’s technically a full programming language. You need about six patterns and you’ll never open Excel to look at a CSV again.
You had a log file. You needed the third column. So you opened it in Excel. Or Google Sheets. You imported it. You selected “space delimited.” It asked about text qualifiers. You clicked through three dialogs. It loaded 40,000 rows and froze for eight seconds. You scrolled to column C. You selected it. You copied it. You pasted it somewhere else. You closed Excel without saving.
Or you had a CSV file and needed to sum a column. So you opened it in a spreadsheet. You scrolled to the bottom. You typed =SUM(D2:D40000). You waited while it calculated. It showed you a number. You closed the spreadsheet. That entire interaction took ninety seconds for a task that takes awk about two.
awk treats every line of text as a row and splits it into columns automatically. It’s a spreadsheet that runs in your terminal, processes millions of rows without flinching, and never asks you about text qualifiers.
Unless you’re running Windows, in which case none of this applies to you. But hey, come to the dark side: install WSL2 and you can follow along. We’ll wait. Impatiently.
If you’re lazy like me (all sysadmins are!), skip straight to the awk cheat sheet at the bottom.
Print a specific column
awk '{print $3}' file.txt
Prints the third column of every line. $1 is the first column, $2 is the second, etc. $0 is the entire line.
awk splits on whitespace by default — spaces and tabs. It doesn’t care how many spaces are between columns. It just figures it out.
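A quick sanity check with throwaway input (the names and numbers here are invented): no matter how ragged the spacing, $2 is still $2.

```shell
# Made-up input with wildly uneven spacing; awk still finds column 2
printf 'alice      42\nbob 7\n' | awk '{print $2}'
# prints:
# 42
# 7
```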
Print multiple columns
awk '{print $1, $4}' file.txt
First and fourth columns. The comma adds a space between them in the output.
Print with custom formatting
awk '{print $1 " -> " $3}' file.txt
Adds your own text between columns. Now your output says 192.168.1.5 -> 200 instead of just two bare columns.
Use a different separator
Input separator (for CSV files)
awk -F',' '{print $2}' data.csv
-F',' sets the field separator to a comma. Now awk treats CSV columns as fields. Second column of a CSV file, one command, no import dialogs.
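You don’t even need a real file to try it; here’s a two-row CSV invented on the spot:

```shell
# Fake CSV fed straight into awk; column 2 is the name field
printf 'id,name,score\n1,alice,90\n' | awk -F',' '{print $2}'
# prints:
# name
# alice
```

One caveat: -F',' splits on every comma, so quoted fields that contain commas will confuse it. That’s the point where csvkit (covered later) earns its keep.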
Tab-separated files
awk -F'\t' '{print $1, $3}' data.tsv
Same idea. Tab-separated? Tell awk with -F'\t'.
Colon-separated (like /etc/passwd)
awk -F':' '{print $1, $7}' /etc/passwd
Shows every username and their login shell. The /etc/passwd file is colon-separated and awk handles it without blinking.
Filter rows (only print matching lines)
Match a pattern
awk '/ERROR/' logfile.txt
Prints every line containing “ERROR.” Like grep, but with awk’s column-processing power available in the same command.
Match and extract a column
awk '/ERROR/ {print $1, $2, $NF}' logfile.txt
Lines containing “ERROR” — but only print the first column, second column, and last column ($NF). NF is a built-in variable for “number of fields.” $NF is always the last column, no matter how many columns there are.
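$NF shines when rows have different widths. With made-up rows of three, two, and four columns:

```shell
# $NF grabs the last column every time, whatever the width
printf 'a b c\nx y\np q r s\n' | awk '{print $NF}'
# prints:
# c
# y
# s
```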
Filter by column value
awk '$3 > 500 {print $0}' data.txt
Only print lines where the third column is greater than 500. awk understands numbers. It compares. It filters. No formulas. No conditional formatting.
awk '$1 == "admin"' /etc/passwd
Lines where the first column is exactly “admin.”
Built-in variables
| Variable | What it is |
|---|---|
| $0 | The entire line. |
| $1, $2, ... | Column 1, 2, etc. |
| $NF | The last column. |
| NF | Number of fields (columns) on the current line. |
| NR | Line number (row number, starting at 1). |
| FS | Field separator (default: whitespace). |
| OFS | Output field separator (default: space). |
These are why awk is powerful. NR gives you line numbers. NF gives you column counts. $NF gives you the last column without knowing how many columns there are.
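FS and OFS together turn awk into a separator converter. One quirk worth knowing: awk only rebuilds a line with OFS when you modify a field, so the $1=$1 below is deliberate, not a typo. Sample data invented for the demo:

```shell
# Read commas, write tabs; assigning $1 to itself forces the rebuild
printf 'name,age,city\nalice,30,oslo\n' |
  awk 'BEGIN {FS=","; OFS="\t"} {$1=$1; print}'
# prints the same rows, tab-separated
```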
Do math
Sum a column
awk '{sum += $3} END {print sum}' data.txt
Adds up every value in column 3 and prints the total at the end. END runs after all lines are processed. This is your =SUM() without the spreadsheet.
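Feed it some invented numbers to watch it work:

```shell
# Three made-up rows; the sum of column 2 is 10 + 20 + 12
printf 'a 10\nb 20\nc 12\n' | awk '{sum += $2} END {print sum}'
# prints: 42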
Average a column
awk '{sum += $3; count++} END {print sum/count}' data.txt
Sum divided by count. Average. Done. No pivot tables.
Count lines matching a pattern
awk '/ERROR/ {count++} END {print count}' logfile.txt
How many error lines? One number. Like grep -c but extendable — you could count errors AND warnings in the same pass.
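That “same pass” claim is easy to demonstrate. This sketch uses invented log lines; the +0 guards make awk print 0 instead of a blank when a pattern never matches:

```shell
# Count two patterns in a single pass over fake log lines
printf 'ERROR disk\nWARN cpu\nERROR net\nINFO ok\n' |
  awk '/ERROR/ {e++} /WARN/ {w++} END {print "errors:", e+0, "warnings:", w+0}'
# prints: errors: 2 warnings: 1
```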
Min and max
awk 'NR==1 {min=max=$3} $3>max {max=$3} $3<min {min=$3} END {print "min:", min, "max:", max}' data.txt
Finds the minimum and maximum of column 3 in one pass. No sorting. No scrolling. No conditional highlighting.
Print with formatting
Add headers
awk -F':' 'BEGIN {print "User", "Shell"} {print $1, $7}' /etc/passwd
BEGIN runs before any lines are processed. Use it to print headers, set variables, or initialize things.
Tab-separated output
awk -F',' '{print $1 "\t" $3}' data.csv
Read CSV, output tab-separated. Format conversion in one line.
Printf for precise formatting
awk '{printf "%-20s %10.2f\n", $1, $3}' data.txt
Left-aligned name (20 chars wide), right-aligned number with two decimal places. Like printf in C. For when your output needs to line up in neat columns.
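With made-up input, the alignment is visible immediately:

```shell
# Name padded to 20 chars, number right-aligned in 10 with 2 decimals
printf 'alice 3.14159\nbob 42\n' | awk '{printf "%-20s %10.2f\n", $1, $2}'
```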
Line numbers and ranges
Add line numbers
awk '{print NR, $0}' file.txt
Prepends the line number to every line. Like cat -n but with the option to do more.
Print specific line ranges
awk 'NR>=10 && NR<=20' file.txt
Lines 10 through 20. Clean. No head/tail arithmetic.
Print every other line
awk 'NR%2==0' file.txt
Even-numbered lines only. Modular arithmetic. In a text processor. From 1977.
Real-world one-liners
Disk usage report (with df)
df -h | awk 'NR>1 && $5+0 > 80 {print $6, $5}'
Filesystems over 80% full. Skips the header (NR>1), checks the percentage column ($5+0 converts “85%” to 85), prints the mount point and usage.
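The $5+0 trick deserves a closer look: adding zero forces numeric context, and awk converts the leading digits of a string, so “85%” compares as 85. Standalone, with invented percentages:

```shell
# Strings with a trailing % still compare numerically thanks to +0
printf '85%%\n42%%\n91%%\n' | awk '$1+0 > 80 {print $1}'
# prints:
# 85%
# 91%
```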
Top talkers in a web log
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
Most frequent IP addresses. awk extracts the first column (IP), the rest of the pipeline counts and sorts. You just did log analysis without a SIEM.
Sum file sizes from ls
ls -l *.log | awk '{total += $5} END {printf "Total: %.2f MB\n", total/1048576}'
Total size of all log files in megabytes. awk does the math, printf formats it.
The flags that actually matter
| Flag / Feature | What it does |
|---|---|
| -F'SEP' | Set the field separator (comma, tab, colon, etc.). |
| '{print $N}' | Print column N. |
| '/pattern/' | Filter lines matching a regex. |
| '$N > X' | Filter by column value. |
| '{sum += $N}' | Accumulate values for math. |
| 'BEGIN {}' | Run code before processing lines. |
| 'END {}' | Run code after all lines are processed. |
| NR | Current line number. |
| NF | Number of fields on current line. |
| $NF | The last column. |
“But I’ll just open it in—”
Put the spreadsheet down.
“Excel can do all this.” Excel can handle about a million rows before it starts choking. awk processes multi-gigabyte files without flinching because it reads one line at a time and never loads the whole file into memory. Also, Excel costs money. awk has been free since Carter was president.
“Pandas in Python is more powerful.” Pandas is great for data science. It’s also a library that requires Python, pip, a virtual environment, an import statement, and about twelve lines of boilerplate before you can print a column. awk is one line. For quick data inspection, awk wins on sheer convenience.
“Google Sheets is free.” Google Sheets requires uploading your data to Google’s servers, waiting for it to render, and hoping your internet doesn’t hiccup while you’re looking at sensitive log files. awk runs locally, instantly, and doesn’t report your data to an advertising company.
“I use cut for columns.” cut extracts columns. That’s all it does. awk extracts columns AND filters rows AND does math AND supports conditionals AND has variables. cut is a screwdriver. awk is a toolbox.
“csvkit is made for CSV files.” csvkit is excellent and you should use it for complex CSV work. But it’s a Python package you have to install. awk is already on your machine and handles simple CSV tasks without an apt install.
awk cheat sheet
You made it. Or you skipped straight here. Either way, no judgment. Copy and paste these. Pin them. Tattoo them on your forearm. Whatever works.
| What you’re doing | Command |
|---|---|
| Print a column | awk '{print $3}' file |
| Print multiple columns | awk '{print $1, $4}' file |
| CSV columns | awk -F',' '{print $2}' data.csv |
| Filter by pattern | awk '/ERROR/' file |
| Filter by column value | awk '$3 > 100' file |
| Last column | awk '{print $NF}' file |
| Sum a column | awk '{s+=$3} END {print s}' file |
| Average a column | awk '{s+=$3} END {print s/NR}' file |
| Count matches | awk '/pattern/ {c++} END {print c}' file |
| Add line numbers | awk '{print NR, $0}' file |
| Print line range | awk 'NR>=10 && NR<=20' file |
| Custom separator output | awk -F',' '{print $1 "\t" $3}' file |
| Skip header row | awk 'NR>1 {print $2}' file |
The one command:
awk '{print $3}' file — extract a column, instantly, no import wizard. Everything else is refinement.