One of the most common tasks when working with Bash scripts and the Linux shell is text processing: filtering, selecting, transforming, and so on.
Often, these texts come from files like CSVs or log files. If you are an experienced user doing this on a daily basis, typing these kinds of command chains often feels like it comes from "muscle memory" more than from your brain. But most of the time you only need parts of these lines, like "the 5th to the 7th field" or some regular expression match; these are usually quite easy to catch with a combination of the well-known tools awk, grep, cut or sed.
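For instance, pulling out "the 5th to the 7th field" of a comma-separated file might look like this (a minimal sketch; file.csv is just a placeholder name):

```bash
# Print fields 5 to 7 of a comma-separated file.
cut -d',' -f5-7 file.csv

# The same with awk (fields separated by a space in the output).
awk -F',' '{ print $5, $6, $7 }' file.csv
```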
But how do you iterate (loop) over each line of a file in Bash and use that value in your processing, for example to keep only the lines that match a specific pattern, or to split the script of a play into separate files per role?
This is not really a hard task. But unlike the previously mentioned processing of only parts of those lines, handling whole lines is not something I need very often in my daily work. That's why it gives me a hard time in those rare cases when I do need it: it is not stored in my very own "muscle memory", and it forces me to crawl through my memory castle and dig for it (or to Google for it).
The solution
Reading from a file
Sure thing, there are plenty of ways to solve this, including not using Bash in the first place but a real programming language like Python. But to me, the following approach has proven to be the most effective and easiest one to remember:
```bash
#!/bin/bash

while IFS="" read -r line || [[ -n "$line" ]]; do
    echo "Text read from file: $line"
done < "/my/file"
```
Explanation:
- `IFS=""` prevents leading/trailing whitespace from being trimmed.
- `-r` prevents backslash escapes from being interpreted.
- `-d ' '` (not used in the line-by-line processing above): if this is added as a parameter to `read`, a whitespace character is used to terminate the input rather than a newline. This is handy if you want to do something for each (whitespace-separated) element of the input rather than line by line (see the short sketch after this list).
- `|| [[ -n "$line" ]]` prevents the last line from being ignored if it doesn't end with a newline (`\n`), since `read` returns a non-zero exit code when it encounters EOF.
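To illustrate the `-d ' '` variant, here is a minimal sketch (the input string "one two three" is just an example of my own) that processes its input word by word instead of line by line:

```bash
#!/bin/bash

# With -d ' ', read stops at every space instead of at the newline,
# so the loop body runs once per (whitespace-separated) word.
while IFS="" read -r -d ' ' word || [[ -n "$word" ]]; do
    echo "Word read from input: $word"
done < <(printf 'one two three')
```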
Instead of the echo line, you can of course do whatever you like with the `${line}` variable!
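As a small sketch of that (the output file name numbered.txt and the count variable are my own additions), here is the same loop writing every line to another file, prefixed with its line number:

```bash
#!/bin/bash

# Write every line of /my/file to numbered.txt, prefixed with its line number.
count=0
while IFS="" read -r line || [[ -n "$line" ]]; do
    count=$((count + 1))
    printf '%d: %s\n' "$count" "$line" >> numbered.txt
done < "/my/file"
```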
Reading another program's output
When reading the output of another command (like `find`) instead of a file, the call is very similar:
```bash
#!/bin/bash

while IFS="" read -r line || [[ -n "$line" ]]; do
    echo "Text read from find output: $line"
done < <(find /some/dir -name file.txt)
```
See the explanation in the file section above; nothing is really different here. The only new bit is the process substitution `< <( … )`, which makes the output of `find` available to the loop as if it were a file.
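In case you wonder why process substitution is used instead of simply piping `find` into the loop: with a pipe, the `while` loop runs in a subshell, so variables changed inside it are lost afterwards. A minimal sketch (reusing the `/some/dir` path from above; the `count` variable is my own addition):

```bash
#!/bin/bash

# Variant 1: pipe into the loop. The loop runs in a subshell,
# so the counter is still 0 afterwards.
count=0
find /some/dir -name file.txt | while IFS="" read -r line; do
    count=$((count + 1))
done
echo "Count after pipe: $count"                  # prints 0

# Variant 2: process substitution. The loop runs in the current shell,
# so the counter keeps its value.
count=0
while IFS="" read -r line || [[ -n "$line" ]]; do
    count=$((count + 1))
done < <(find /some/dir -name file.txt)
echo "Count after process substitution: $count"  # prints the number of results
```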
An Example
The following example uses this sketch.txt file, which contains the text of a Monty Python sketch. It applies the proposed solution to split its content into one file per role (plus one for everything else):
```bash
#!/bin/bash

# Sort each line of sketch.txt into a file per role.
while IFS="" read -r line || [[ -n "${line}" ]]; do
    if echo "${line}" | grep -E -q '^Man:'; then
        echo "${line}" >> man_lines.txt
    elif echo "${line}" | grep -E -q '^Other Man:'; then
        echo "${line}" >> otherman_lines.txt
    else
        echo "${line}" >> garbage_lines.txt
    fi
done < sketch.txt
```
When you save this as "sketch-process.sh", set its execution bit and put it into the same folder as the previously mentioned sketch.txt file, you will end up with three additional files after executing "sketch-process.sh":
- garbage_lines.txt … containing the "(pause)" lines
- man_lines.txt … containing the lines starting with "Man:"
- otherman_lines.txt … containing the lines starting with "Other Man:"
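For reference, a complete run could look like this (assuming sketch.txt and sketch-process.sh sit in the current directory):

```bash
$ chmod +x sketch-process.sh
$ ./sketch-process.sh
$ ls *_lines.txt
garbage_lines.txt  man_lines.txt  otherman_lines.txt
```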
I hope you liked this article and that it turns out to be helpful for some of you! Let me know in the comments ✌
Born in 1982, Marc Richter has been an IT enthusiast since 1994. He became addicted when he first got his hands on his family's PC and has never stopped investigating and exploring new things since.
He is married to Jennifer and a proud father of two wonderful children.
His current professional focus is DevOps and Python development.
An exhaustive bio can be found in this blog post.