Grep strings from a .txt file (Linux/Mac)

In summary: the thread discusses using grep and awk to extract specific fields from a text file and store them in a new file. Perl and Python are also suggested for the task, with some disagreement over which is easier to use; the importance of being able to code a working solution quickly is emphasized.
  • #1
dRic2
I have a DATA.txt file which contains a lot of info I don't need. I get the information I want by typing in my terminal
Bash:
grep freq DATA.txt
and the output is the following:

freq ( 1) = -0.193719 [THz] = -6.461768 [cm-1]
freq ( 2) = -0.193719 [THz] = -6.461768 [cm-1]
freq ( 3) = -0.193719 [THz] = -6.461768 [cm-1]
freq ( 4) = 5.968261 [THz] = 199.079745 [cm-1]
freq ( 5) = 5.968261 [THz] = 199.079745 [cm-1]
freq ( 6) = 5.968261 [THz] = 199.079745 [cm-1]

I would like to create a bash script which extracts only the numbers in the last column and stores them, each preceded by its index, in a new .txt file as follows:

1 -6.461768
2 -6.461768
3 -6.461768
4 199.079745
5 199.079745
6 199.079745

I thought of something along the lines of

Bash:
FILEIN=DATA.txt
FILEOUT=OUT.txt

while ...
do
    freq=`grep freq $FILEIN | ... something ...`

    echo ... ${freq} >> $FILEOUT
done

I don't really know how to program in bash... I just copy and modify scripts that were handed to me on similar occasions, but I don't know how to implement this one.

Thanks in advance!
Ric
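For reference, one minimal way to fill in that sketch, assuming the exact line format shown above:
Bash:
FILEIN=DATA.txt
FILEOUT=OUT.txt

n=0
> "$FILEOUT"                          # start with an empty output file
grep freq "$FILEIN" | while read -r -a fields; do
    # "fields" holds the whitespace-separated words of one matching line;
    # the cm-1 value is the next-to-last word
    n=$((n + 1))
    echo "$n ${fields[${#fields[@]}-2]}" >> "$FILEOUT"
done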
 
  • #3
If you have Perl installed (it is often installed by default), you can try this program.

Perl:
open(IN, "<DATA.txt");    # input file to scan
open(OUT, ">new.txt");    # output file for the index/value pairs
while( $line = <IN> ){
    # $1 captures the index inside "freq ( n)", $2 the value before "[cm"
    if( $line =~ /freq\s*\(\s*(\d+)\).*?(\S+) \[cm/ ){
        print OUT "$1 $2\n";
    }
}
close IN;
close OUT;
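Saved as, say, extract_freq.pl (a filename chosen here just for illustration), the script can be run from the directory containing DATA.txt:
Bash:
perl extract_freq.pl    # writes the index/value pairs to new.txt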
 
  • #4
Thanks a lot! awk works like a charm and I got everything I needed done. I'll be looking into Perl as well since I don't know what that is.
 
  • #5
I do all of these kinds of things in Python. It is much easier and more versatile than bash scripts or Perl.
 
  • #6
phyzguy said:
I do all of these kinds of things in Python. It is much easier and more versatile than bash scripts or Perl.
1) I work remotely via an ssh connection on a machine and I'm not sure if Python is installed on it...
2) I don't know Python... sad but true. I will try to learn the basics, though.
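(One quick way to check whether Python is available on the remote machine, assuming a POSIX-compatible shell:)
Bash:
# prints the interpreter's path if one is installed, and nothing otherwise
command -v python3 || command -v python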
 
  • #7
I disagree. Perl is significantly easier than Python for a scripting task like this. I have re-coded several Python scripts written by Python advocates who are excellent programmers and ended up with much simpler programs. For example, Perl makes it much easier to capture the STDOUT of a called application in a string and parse it.
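(For comparison, the shell's own idiom for capturing a command's output is command substitution; a minimal bash sketch:)
Bash:
# capture the stdout of a command in a variable...
freqs=$(grep freq DATA.txt)
# ...then parse it further, e.g. pull out the next-to-last field of each line
echo "$freqs" | awk '{ print $(NF-1) }'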
That being said, I would currently recommend that a person learn Python rather than Perl because of several other advantages it has.

Another thing I like about Perl is how easy and natural it is to include error checking and warnings. Here is the code above with some error checking:

Perl with some simple error checking and warnings:
open(IN, "<DATA.txt") or die "ERROR: Can't open DATA.txt to read";
open(OUT, ">new.txt") or die "ERROR: Can't open new.txt to write";
while( $line = <IN> ){
    $lineNum++;    # starts undefined, which Perl treats as 0
    if( $line =~ /freq\s*\(\s*(\d+)\).*?(\S+) \[cm/ ){
        print OUT "$1 $2\n";
    }elsif( $line =~ /freq/ ){
        # the line mentions freq but didn't match the expected format
        warn "Line $lineNum wasn't parsed correctly:\n$line\n";
    }
}
close IN;
close OUT;
 
  • #8
dRic2 said:
1) I work remotely via an ssh connection on a machine and I'm not sure if Python is installed on it...
2) I don't know Python... sad but true. I will try to learn the basics, though.
Most Unix systems have Python installed by default. I just find Python more capable and easier to use than Perl or shell scripts, and once you've learned Python you can do many things that aren't practical in a shell script. But these things are a matter of taste, so use whichever makes the job easier for you. I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
 
  • #9
phyzguy said:
Most Unix systems have Python installed by default. I just find Python more capable and easier to use than Perl or shell scripts, and once you've learned Python you can do many things that aren't practical in a shell script. But these things are a matter of taste, so use whichever makes the job easier for you. I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
IMHO, the big advantage of Python over Perl now (though I still think Perl is the better OS scripting language) is the enormous interest and thriving community of users. There are so many scientific hobbies, like robotics, artificial intelligence, and gaming, where Python works within an entire environment built for that hobby. In many (most?) of those hobbies, the Python code is a small fraction of the learning; most of the effort goes into learning the specific tools of the hobby (which is the fun part).
 
  • #10
A slight adjustment shows Awk to be very versatile too:
Bash:
awk '/freq/ { print $(NF-1) }' DATA.txt
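To reproduce the exact two-column output asked for in post #1, a small extension of the same one-liner works (still assuming the line format shown there):
Bash:
awk '/freq/ { n++; print n, $(NF-1) }' DATA.txt > OUT.txt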
 
  • #11
jedishrfu said:
A slight adjustment shows Awk to be very versatile too:
Bash:
awk '/freq/ { print $(NF-1) }' DATA.txt
People who do a lot of work with combinations of sed and awk are likely to go crazy. There is a good reason that Perl was developed and completely transformed that kind of work.
 
  • #12
I seldom used sed. While working on PC DOS, I happened upon the Thompson Toolkit, which provided unix commands for the PC world. In that collection was an Awk compiler that could generate exe files. I was in a slump at the time writing C utilities for our group, and Awk so transformed my productivity that I began building a lot of tools with it exclusively.

After a while, though, there came a point where some new functionality didn't really fit the Awk programming pattern, and I had to recode in another language or suffer through the added complexity of using counters and flags... to get what I wanted. This usually came up when more than one file of a different format was being processed by the Awk script.
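(The classic awk idiom for two differently formatted input files, sketched here with hypothetical file names:)
Bash:
# FNR (the per-file line number) equals NR (the global line number) only
# while awk reads the first file: build a lookup table there, then use it.
awk 'FNR == NR { table[$1] = $2; next }
     $1 in table { print $1, table[$1] }' keys.txt data2.txt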

I tried Perl once, ran into some strange error, and decided that awk suited me better. Also, I don't recall Perl having an implementation on PC DOS at the time. One thing I did like was Perl's enhanced regular expressions, which awk didn't implement, letting you match words or classes of characters.

More recently, I've recoded some of my awk scripts into Python with good results and better scalability. But I still view Awk fondly and do one-offs with it when I can.
 
  • #13
My largest single use of awk and sed was in converting a lot of FORTRAN code to Ada. I used a multi-step sequence of alternating sed and awk passes to perform the routine conversions before going in for the final conversion by hand. A while later, another task was to automate the generation of code for accessing data at memory locations in FORTRAN, C, and Ada: starting from any of the three languages, the other two had to be generated, and the data structures allowed were very flexible. By that time I had learned Perl and was able to completely automate the process. I don't believe I could have attempted it using sed and awk without ending up in an asylum.
 
  • #14
Program transcription is a painful art. I've done it a few times.

Once, years ago, I had a timesharing Fortran program in a dialect different from the Honeywell Fortran-Y we had installed on our mainframe. When I started seeing that I was not converting similar lines of code consistently, and realized an automated approach could reduce the errors injected by misreading what a line of code was doing, I wrote a conversion tool in the Text Executive programming language to facilitate the conversion.

I left that job when it was half finished but going well. My successor decided instead to take what I had converted and go manual from there. The end result was it never got implemented and the project failed.

More recently, I wrote a small awk script to do a MATLAB-to-Julia conversion, which worked well, but I never got to the point of actually testing the converted code, as there was little interest in our group in investing in Julia.

I also considered a Fortran-to-Julia conversion, to break away from legacy Fortran code and make it more accessible as a Julia program. The one area that gave me trouble was Fortran's use of common blocks, which has no direct Julia equivalent. There was some Julia structure that could approximate one, but I stopped at that point too, for lack of management interest.
 
  • #15
phyzguy said:
I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
Yes, I agree. Here I'm working with software that interfaces well with simple bash scripts, and the online community seems to use only bash scripts. That's why I'm sticking with it... I don't have preferences, since I'm a newbie. I'll look into Python, as suggested, because it seems a valuable programming tool to know, but I'll keep using bash scripts for this one.
 
  • #16
phyzguy said:
I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
A simple program should normally be much easier to code and get working. KISS: Keep It Simple, Stupid. And it is hard to beat a language made especially for the specific type of application. IMHO, for most scripting, Perl has advantages that Python cannot match.
 

1. What is Grep and how does it work?

grep is a command-line tool on Linux, macOS, and other Unix-like systems, used to search for specific patterns or strings within text files. It matches each line of the file against a regular expression and prints the lines that match.
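For example, using the DATA.txt file from the thread above:

Bash:
# print every line matching the regular expression:
# "freq" followed by a parenthesized number
grep 'freq *( *[0-9]' DATA.txt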

2. How do I use Grep to search for a specific string in a .txt file?

To search for a specific string in a .txt file, use the command "grep [string] [file name]" in the terminal. This searches the specified file for the given string and displays any matching lines.
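For instance:

Bash:
grep freq DATA.txt    # print every line of DATA.txt containing "freq"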

3. Can I use Grep to search for multiple strings in a .txt file?

Yes, you can search for multiple strings at once by giving the "-e" option before each one. For example, "grep -e [string1] -e [string2] [file name]" prints the lines that match either string.
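For example, with the thread's file:

Bash:
# print lines containing either "THz" or "cm-1"
grep -e THz -e cm-1 DATA.txt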

4. How do I ignore case sensitivity when using Grep?

To ignore case when using grep, use the "-i" option. The search then becomes case-insensitive and matches both upper- and lower-case versions of the given string.
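For example:

Bash:
grep -i FREQ DATA.txt    # matches "freq", "Freq", "FREQ", ...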

5. Can Grep be used to search for patterns in multiple files at once?

Yes, grep can search multiple files at once: list several file names, or use the shell's "*" wildcard. For example, "grep [string] *.txt" searches for the given string in all .txt files in the current directory (the shell expands the wildcard, and grep prefixes each match with the file it came from).
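For example:

Bash:
# search every .txt file in the current directory; each match is prefixed
# with the name of the file it came from
grep freq *.txt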
