Bash, reading a file line by line and extracting substrings

In summary: There are a lot of ways to do this. Here are a couple of ideas:
- Read the file into bash as an array. Each line is an element in the array. You can write the array back out as a file, or whatever.
- Use awk. This is a tool intended to parse fields [columns] of text and do stuff with the data. It was made just for this task. If you are supposed to use bash, you should, but you should also know how to use the whole toolset.
- Use sed, the stream editor, whose substitute command works like the one in vi (or vim).
  • #1
Arnoldjavs3

Homework Statement


I have a text file that I'm reading line by line, and here is an example of a line:

root pts/1 01-19 13:41 (10.0.0.48)

Homework Equations

The Attempt at a Solution


How can I extract the terminal and timeframe only?

I tried
echo $line | cut -d : -f2, 4

and it just gives me "1 10.0.0.48"
is it not just delimited by spaces?

even
echo $line | cut -d ' ' -f2,4
only gives me "pts/1", and it's poorly formatted. How can I get it to show me the terminal and timeframe only?
 
  • #2
Arnoldjavs3 said:
is it not just delimited by spaces?
You specified ":" as delimiter. I'm surprised you get a meaningful result with -f2,4, as there are just two parts separated by ":", and I get them with -f1 and -f2:

> echo "root pts/1 01-19 13:41 (10.0.0.48)" | cut -d : -f 1
root pts/1 01-19 13
> echo "root pts/1 01-19 13:41 (10.0.0.48)" | cut -d : -f 2
41 (10.0.0.48)

What you want is a separation by spaces:

> echo "root pts/1 01-19 13:41 (10.0.0.48)" | cut -d " " -f4
13:41
> echo "root pts/1 01-19 13:41 (10.0.0.48)" | cut -d " " -f3,4
01-19 13:41
> echo "root pts/1 01-19 13:41 (10.0.0.48)" | cut -d " " -f2,4
pts/1 13:41
> echo "root pts/1 01-19 13:41 (10.0.0.48)" | cut -d " " -f2,3,4
pts/1 01-19 13:41

If your input line uses a mixture of space and tabs or other whitespace characters, then things get more complicated.
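
For the mixed-whitespace case, one common approach (a sketch, not the only way) is to turn tabs into spaces and squeeze repeated spaces with tr before handing the line to cut. The input below is the sample line from post #1 with a tab and doubled spaces added purely for illustration.

```shell
# Hypothetical input: the thread's sample line, but with a tab and repeated spaces.
line=$(printf 'root\tpts/1   01-19  13:41 (10.0.0.48)')

# tr '\t' ' ' turns tabs into spaces; tr -s ' ' squeezes runs of spaces into one,
# so cut sees exactly one delimiter between fields.
fields=$(printf '%s\n' "$line" | tr '\t' ' ' | tr -s ' ' | cut -d ' ' -f2-4)
printf '%s\n' "$fields"
```

Without the two tr steps, cut would treat every extra space as an empty field and the field numbers would shift.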
 
  • #3
In my second command, I changed ' ' to " " and it worked. What gives? Why does the output differ?
 
  • #4
Doesn't make a difference in my shell, so I cannot reproduce it.
 
  • #5
Very odd... thank you for guiding me to this realization, though.
 
  • #6
Arnoldjavs3 said:
echo $line | cut -d ' ' -f2,4
If you are processing a large file, it is inefficient to set up a pipe and call a utility for every line you process; it is better to keep everything within bash.

You can make use of bash's built-in pattern matching to select the desired text, e.g., here's a start
x=${line#* }
x=${x% *}
echo "$x"

I've shown it verbosely, you can condense it to a single statement.
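
To make the two expansions concrete, here is a sketch that traces them on the sample line from post #1. The extra variables terminal and time_of_day are illustrative names, not part of the post above. Note that bash does not accept one expansion nested inside another, so a condensed version still needs an intermediate variable.

```shell
line='root pts/1 01-19 13:41 (10.0.0.48)'

x=${line#* }   # drop the shortest prefix matching "* " (the user name)
x=${x% *}      # drop the shortest suffix matching " *" (the address)
echo "$x"      # terminal, date and time remain

# Terminal alone: drop the longest suffix that starts at the first space.
terminal=${x%% *}
# Time alone: drop the longest prefix that ends at the last space.
time_of_day=${x##* }
echo "$terminal $time_of_day"
```

Everything here runs inside the shell, so no pipe or external utility is started per line.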
 
  • #7
This is just a comment: UNIX has really very good tools designed for dealing with files specifically. bash is great on a general level, but as @NascentOxygen points out, you can wind up introducing inefficient code which bogs down to a crawl on big files. There is a tool meant to do what you want: parsing lines into fields. It is done by default with no action on your part.

The tool is awk. You specify fields with $[field number]: $0 is the whole line, $1 is the first field, $2 is the second ... $(NF) is the last field, where NF is a variable that holds the number of fields.
Assume you want pts/1 (the second field) and (10.0.0.48) (the last field). One line of awk and you're done. You can write a new file as a side effect.
Code:
awk '{ print $2, $(NF) }'  myinputfilename  > myoutputfilename
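
For the original question (terminal and timeframe), the same idea would look roughly like this. It is a sketch that pipes the sample line from post #1 into awk instead of reading myinputfilename, so it can be tried without creating a file.

```shell
# awk splits each line on runs of spaces and tabs by default,
# so no delimiter option is needed.
result=$(printf '%s\n' 'root pts/1 01-19 13:41 (10.0.0.48)' | awk '{ print $2, $3, $4 }')
echo "$result"
```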

The point is not that you have to use awk; bash may be okay, but you should always consider the whole toolset to save yourself grief. Homework frequently makes you use a less friendly tool, which is the case here, and that's certainly okay.

FWIW: in bash, if you must use it, consider arrays
Code:
while read -r line
do
   arr=( $line )  # split $line into an array named arr; index 0 is the first array element.
                  # The $IFS variable supplies the delimiters [space, tab and newline by default].
                  # You can set the IFS bash internal variable to different values if you want.
   echo "${arr[1]}  ${arr[4]}"    # index 1 is really the second element in the array
done < myinputfilename  > myoutputfilename
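
As an illustration of changing IFS, here is a sketch that splits the time field on a colon. The variable names hour and minute are made up for the example. IFS is set only for the single read command, so the rest of the script keeps the default.

```shell
# Split "13:41" on ":" instead of whitespace.
# Prefixing the assignment to read limits the IFS change to that one command.
IFS=: read -r hour minute <<EOF
13:41
EOF
echo "hour=$hour minute=$minute"
```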
 
  • #8
Thanks for the clarity. Bash feels very different to wrap my head around compared with today's systems.

Just curious as well: say I have a text file (a list of medications and their info, for example) that contains some information. If I do this:

medDoses=$(cut -f3 <medFile.txt>)

Would medDoses be considered a temporary file or no? And how do you clean/remove temporary files in bash?
 
  • #9
Oops I was distracted and had to edit things... I cannot type well when distracted.

It is stored as a variable, $medDoses, in memory. Not a file at all. Try
Code:
 echo "$medDoses"
It is interesting. Not what you would expect at all.
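
What may be unexpected is that the variable holds one value per input line, and whether the line breaks survive depends on quoting. A small sketch, using invented tab-separated data in place of medFile.txt (the drug names and doses are placeholders):

```shell
# Hypothetical stand-in for medFile.txt: name, form, dose (tab-separated).
data=$(printf 'aspirin\ttablet\t100mg\nibuprofen\ttablet\t200mg\n')

# cut splits on tabs by default; -f3 selects the third column of every line.
medDoses=$(printf '%s\n' "$data" | cut -f3)

echo "$medDoses"   # quoted: newlines preserved, one dose per line
echo $medDoses     # unquoted: word splitting collapses it onto a single line
```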

Temporary files are another 'thing'. Some shell features that created temporary files in the past, e.g., pipes, are now handled entirely in memory: the kernel buffers the data that one process writes and another reads.

True temporary files on disk are easily made with the mktemp command. There are standards about how these things work (FYI POSIX is the standard's name): mkstemp is the standard C library function, and mktemp is the shell command that does the same job. Your system may have some extra tweaks. Or if it uses a really old OS, this may vary. Read man mktemp to see. mktemp creates a file with a unique filename in the directory, so that you cannot accidentally open somebody else's file, for example. Two or more processes can have one file open simultaneously, which causes problems when you do not know that happened.

Nothing beats an example:
Code:
#!/bin/bash
tmpfile=$(mktemp /tmp/myscript.XXXXXX) || exit 1  # creates an empty file with a unique name
finish()  # cleanup function: deletes the file when the script exits
{
    rm -f "$tmpfile"
}
trap finish EXIT
ls -l "$tmpfile"   # to see what you have
# [ use more commands on the file if you want ]
exit 0
The file is deleted just before the process running the bash script is destroyed by the OS. That is what exit means basically.

mkostemp, a related C library function, allows the use of flags to provide finer control over file creation parameters.
 
  • #10
Arnoldjavs3 said:
medDoses=$(cut -f3 <medFile.txt>)

Would medDoses be considered a temporary file or no? And how do you clean/remove temporary files in bash?
medDoses is a shell variable. It's not a file.
 
  • #11
Arnoldjavs3 said:
echo $line | cut -d : -f2, 4

When using a variable, it is often easy to break its value into pieces using read like this:

IFS=" " read -r f1 f2 f3 f4 f5 <<<"$line"
echo "$f2 $f4 $f1 $f3 $f5"
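
The same read-based splitting can also be done while looping over the input, so each line is broken into named fields as it is read. This is a sketch: the function name extract_fields and the field names are illustrative, and the second input line is invented to show more than one record.

```shell
# Print terminal, date and time for every line on standard input.
# "rest" soaks up any fields after the fourth.
extract_fields() {
    while read -r user terminal date time_of_day rest
    do
        echo "$terminal $date $time_of_day"
    done
}

# A here-document stands in for the real file.
extract_fields <<EOF
root pts/1 01-19 13:41 (10.0.0.48)
alice pts/2 01-19 14:05 (10.0.0.52)
EOF
```

With a real file, the call would be extract_fields < myinputfilename.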
 

FAQ: Bash, reading a file line by line and extracting substrings

1. How do I read a file line by line in Bash?

In Bash, you can use the while read loop to read a file line by line. For example: while read line; do echo $line; done < input.txt will print each line in the input.txt file.
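
A slightly more defensive form of that loop, as a sketch: IFS= keeps leading and trailing whitespace, -r stops read from interpreting backslashes, and quoting "$line" prevents word splitting and filename expansion. A here-document with made-up content stands in for input.txt.

```shell
count=0
while IFS= read -r line
do
    count=$((count + 1))
    # %s prints the line exactly as read, including leading spaces and backslashes.
    printf '%d: %s\n' "$count" "$line"
done <<'EOF'
  indented line
a\tb stays literal
EOF
```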

2. How do I extract substrings from a line in Bash?

To extract substrings from a line in Bash, you can use the cut command. For example: echo "Hello World" | cut -d " " -f 1 will print "Hello", the first substring separated by a space.

3. Can I specify the delimiter when extracting substrings in Bash?

Yes, you can specify the delimiter using the -d flag in the cut command. For example: echo "Hello,World" | cut -d "," -f 2 will print "World", the second substring separated by a comma.

4. How can I save the extracted substrings in a variable in Bash?

You can save the extracted substrings in a variable using command substitution. For example: var=$(echo "Hello World" | cut -d " " -f 1) will save "Hello" in the var variable.

5. Can I use regular expressions to extract substrings in Bash?

Yes, you can use regular expressions with the grep command to extract substrings in Bash. For example: echo "Hello World" | grep -o "H.*d" will print "Hello World", the substring that starts with "H" and ends with "d".
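
Inside bash itself, the [[ string =~ regex ]] test matches regular expressions without calling grep, and captured groups land in the BASH_REMATCH array. A sketch on the sample line from the thread (the pattern is one illustrative choice among many, and it assumes a pts/N terminal name and an HH:MM time):

```shell
line='root pts/1 01-19 13:41 (10.0.0.48)'

# Extended regex: capture the terminal (pts/N) and the HH:MM time.
re='(pts/[0-9]+) .* ([0-9]{2}:[0-9]{2})'
if [[ $line =~ $re ]]; then
    terminal=${BASH_REMATCH[1]}
    time_of_day=${BASH_REMATCH[2]}
    echo "$terminal $time_of_day"
fi
```

Keeping the pattern in a variable and leaving $re unquoted on the right-hand side makes bash treat it as a regex rather than a literal string.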
