Process a text file or command output line by line in Bash

One of the most common tasks when working with Bash scripts and the Linux shell is text processing: filtering, selecting, transforming, and so on.

Often, these texts come from text files such as CSV or log files. If you are an experienced user who does this daily, typing these command chains often feels like it comes from “muscle memory” more than from your brain. Most of the time, though, you need only parts of these lines, like “the 5th to the 7th field” or a regular expression match; these are usually easy to get with a combination of the well-known tools awk, grep, cut, or sed.
But how do you iterate (loop) over each line of a file in Bash and use that whole line in your processing? For example, you might want to keep only the lines that match a specific pattern, or split the script of a play into separate files per role.

This is not really a hard task. But unlike the field-level processing mentioned above, working on whole lines is not something I need often in my daily work. That is why it gives me a hard time in those rare cases: it is not stored in my own “muscle memory”, and it forces me to crawl through my memory palace and dig for it (or Google it).

The solution

Reading from a file

Sure, there are plenty of ways to solve this, including not using Bash in the first place but a real programming language like Python. But to me, the following approach has proven to be the most effective and the easiest to remember:

#!/bin/bash
while IFS="" read -r line || [[ -n "$line" ]]; do
    echo "Text read from file: $line"
done < "/my/file"

Explanation:

  • IFS=""
    prevents leading/trailing whitespace from being trimmed.
  • -r
    prevents backslash escapes from being interpreted.
  • -d ' '    (not used in line-by-line processing)
    if this is added as a parameter to read, a space terminates each input item instead of a newline. This is handy if you want to do something for each space-separated element of the input rather than for each line.
  • || [[ -n "$line" ]]
    prevents the last line from being ignored if it doesn’t end with a newline (\n): read returns a non-zero exit code when it hits EOF, but it still stores the partial line in the variable.
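To see what these options change, here is a minimal sketch that runs the loop with and without them. The temporary file and its content are made up for this demo; only the loop itself comes from the solution above.

```shell
#!/bin/bash
# Hypothetical demo input: leading spaces, a literal backslash,
# and a last line WITHOUT a trailing newline.
demo_file="$(mktemp)"
printf '  indented\\tline\nlast line without newline' > "$demo_file"

# Variant A: the loop from above; every line arrives untouched.
safe=()
while IFS="" read -r line || [[ -n "$line" ]]; do
    safe+=("$line")
done < "$demo_file"

# Variant B: default IFS, no -r, no check for an unterminated last line.
naive=()
while read line; do
    naive+=("$line")
done < "$demo_file"

printf 'safe:  [%s]\n' "${safe[@]}"
printf 'naive: [%s]\n' "${naive[@]}"

# Variant C: -d ' ' splits the input at spaces instead of newlines.
printf 'one two three' > "$demo_file"
words=()
while IFS="" read -r -d ' ' word || [[ -n "$word" ]]; do
    words+=("$word")
done < "$demo_file"
printf 'word:  [%s]\n' "${words[@]}"

rm -f "$demo_file"
```

The safe loop should report both lines exactly as written, including the leading spaces and the backslash. The naive loop should report only "indentedtline": the spaces are trimmed, the backslash is swallowed, and the unterminated last line is dropped.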

Instead of the “echo” line, you can do whatever you like with the ${line} variable, of course!
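As one possible use of the loop body, the following sketch keeps only the lines that match a pattern and counts them. The input file, its content, and the pattern are assumptions made up for this demo.

```shell
#!/bin/bash
# Hypothetical input file and pattern, created here only for the demo.
input_file="$(mktemp)"
printf '%s\n' 'INFO  service started' 'ERROR disk full' \
              'INFO  heartbeat' 'ERROR timeout' > "$input_file"
pattern='^ERROR'

matches=0
while IFS="" read -r line || [[ -n "$line" ]]; do
    # Keep only the lines that match the regular expression in $pattern.
    if [[ "$line" =~ $pattern ]]; then
        printf '%s\n' "$line"
        matches=$((matches + 1))
    fi
done < "$input_file"

echo "Kept $matches matching line(s)"
rm -f "$input_file"
```

For pure filtering, grep alone is shorter. The loop starts to pay off once you need to do more with each matching line than just print it.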

Reading another program’s output

When reading the output of another command (like find) instead of a file, the call is very similar:

#!/bin/bash
while IFS="" read -r line || [[ -n "$line" ]]; do
    echo "Text read from find output: $line"
done < <(find /some/dir -name file.txt)

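See the explanation in the file section above. The only difference is the last line: < <(…) is a process substitution, which feeds the command’s output to the loop as if it were a file.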
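One reason to prefer this form over piping the command into the loop: with a pipe, Bash by default (without the lastpipe option) runs the loop in a subshell, so variables set inside it are gone afterwards. A minimal sketch of the difference, using printf as a stand-in for find so that it runs anywhere:

```shell
#!/bin/bash
# Variant A: process substitution; the loop runs in the current shell.
count_a=0
while IFS="" read -r line || [[ -n "$line" ]]; do
    count_a=$((count_a + 1))
done < <(printf '%s\n' one two three)

# Variant B: a pipe; the loop runs in a subshell by default,
# so the changes to count_b are lost once the loop ends.
count_b=0
printf '%s\n' one two three | while IFS="" read -r line || [[ -n "$line" ]]; do
    count_b=$((count_b + 1))
done

echo "process substitution: $count_a"
echo "pipe:                 $count_b"
```

With default shell options, the first counter should end up at 3 and the second should still be 0.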

An example

The following example uses this sketch.txt file, which contains the text of a Monty Python sketch. It applies the proposed solution to split the content into separate files, one per role, plus one for everything else:

#!/bin/bash
# Sort each line of sketch.txt into a file according to its speaker prefix.
while IFS="" read -r line || [[ -n "${line}" ]]; do
    if [[ "${line}" == "Man:"* ]]; then
        printf '%s\n' "${line}" >> man_lines.txt
    elif [[ "${line}" == "Other Man:"* ]]; then
        printf '%s\n' "${line}" >> otherman_lines.txt
    else
        printf '%s\n' "${line}" >> garbage_lines.txt
    fi
done < sketch.txt

When you save this as “sketch-process.sh”, set its execution bit, and put it into the same folder as the sketch.txt file mentioned above, you will end up with three more files after you have executed “sketch-process.sh”:

  1. garbage_lines.txt
    … containing everything else, such as the “(pause)” lines
  2. man_lines.txt
    … containing the lines starting with “Man:”
  3. otherman_lines.txt
    … containing the lines starting with “Other Man:”
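If a play has more than two roles, the branches could be replaced by deriving the file name from the speaker prefix. The following is only a sketch under stated assumptions: the sample input is a made-up stand-in for sketch.txt, a role is assumed to consist of letters and spaces followed by a colon, and the lowercase expansion needs Bash 4 or newer.

```shell
#!/bin/bash
# Hypothetical stand-in for sketch.txt, created in a temporary directory.
work_dir="$(mktemp -d)"
printf '%s\n' 'Man: Hello.' 'Other Man: Hello yourself.' '(pause)' 'Man: Goodbye.' \
    > "$work_dir/sample.txt"

# Assumption: a role is letters and spaces at the start of a line, up to a colon.
role_re='^([A-Za-z ]+):'

while IFS="" read -r line || [[ -n "$line" ]]; do
    if [[ "$line" =~ $role_re ]]; then
        role="${BASH_REMATCH[1]// /}"     # "Other Man" becomes "OtherMan"
        target="${role,,}_lines.txt"      # lowercase; needs Bash 4 or newer
    else
        target="garbage_lines.txt"
    fi
    printf '%s\n' "$line" >> "$work_dir/$target"
done < "$work_dir/sample.txt"

# List the generated files; remove "$work_dir" when you are done with it.
ls "$work_dir"
```

With the sample input, this should produce the same three file names as the script above. The trade-off is that any line that happens to start with words and a colon is treated as a role.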

I hope you liked this article and that it turns out to be helpful for some! Let me know in the comments ✌