Grep strings from a .txt file (linux/mac)

  • Thread starter: dRic2
  • Tags: File, Strings

Discussion Overview

The discussion revolves around extracting specific numerical data from a text file using various programming and scripting languages, particularly focusing on the use of grep, awk, Perl, and Python. Participants share their approaches, experiences, and preferences regarding these tools in the context of data manipulation tasks.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant describes their need to extract specific frequency data from a text file using grep and seeks help in creating a bash script.
  • Another participant suggests using awk for its ability to easily access specific columns in the data.
  • A third participant proposes a Perl solution, highlighting its capabilities for error checking and parsing output.
  • Some participants express a preference for Python over bash or Perl, citing its versatility and ease of use, while others argue that Perl is simpler for certain scripting tasks.
  • Several participants share personal experiences with awk, Perl, and Python, discussing their strengths and weaknesses in different contexts.
  • One participant mentions the importance of community support for Python, suggesting it may be more beneficial to learn despite personal preferences for Perl.
  • There are multiple suggestions for using awk with slight variations in syntax to achieve similar results.
  • Some participants reflect on their historical use of awk and sed, discussing their experiences with programming and data conversion tasks.

Areas of Agreement / Disagreement

Participants express differing opinions on the best tool for the task, with some advocating for Python, others for Perl, and some for awk. There is no consensus on which language is superior, as preferences vary based on individual experiences and specific use cases.

Contextual Notes

Participants mention limitations such as uncertainty about the availability of Python on remote machines and varying levels of familiarity with the discussed programming languages.

Who May Find This Useful

Individuals interested in data extraction and manipulation using command-line tools, as well as those exploring different programming languages for scripting tasks.

dRic2
I have a DATA.txt file which contains lots of useless info. I get the information I want by typing in my terminal
Bash:
grep freq DATA.txt
and the output is the following:

freq ( 1) = -0.193719 [THz] = -6.461768 [cm-1]
freq ( 2) = -0.193719 [THz] = -6.461768 [cm-1]
freq ( 3) = -0.193719 [THz] = -6.461768 [cm-1]
freq ( 4) = 5.968261 [THz] = 199.079745 [cm-1]
freq ( 5) = 5.968261 [THz] = 199.079745 [cm-1]
freq ( 6) = 5.968261 [THz] = 199.079745 [cm-1]

I would like to create a bash script which extracts only the numbers of the last column and stores them in a new .txt file as follows:

1 -6.461768
2 -6.461768
3 -6.461768
4 199.079745
5 199.079745
6 199.079745

I thought of something along the lines of

Bash:
FILEIN=DATA.txt
FILEOUT=OUT.txt

while ...
freq = `grep freq $FILEIN | ... something ...`

echo ... ${freq} >> $FILEOUT
done

I don't really know how to program in bash... I just copy and modify scripts that were handed to me in similar occasions. But I don't know how to implement this one.

Thanks in advance!
Ric
 
If you have Perl installed (it is often installed by default), you can try this program.

Perl:
open(IN, "<DATA.txt");
open(OUT, ">new.txt");
while( $line = <IN> ){
    if( $line =~ /freq\s*\(\s*(\d+)\).*?(\S+) \[cm/ ){
        print OUT "$1 $2\n";
    }
}
close IN;
close OUT;
 
Thanks a lot! awk works like a charm and I got everything I needed done. I'll be looking into Perl as well since I don't know what that is.
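For reference, an awk approach to this task might look like the sketch below. It assumes every matching line has the layout shown in the grep output above: the mode number sits between the first pair of parentheses, and the cm-1 value is the second-to-last whitespace-separated field. The function name `extract_freqs` is hypothetical, not something from the thread.

```shell
# Hypothetical helper: print "index value" for every line containing "freq".
# Assumes lines like: freq ( 1) = -0.193719 [THz] = -6.461768 [cm-1]
extract_freqs() {
    awk '/freq/ {
        idx = $0
        sub(/^[^(]*\([ \t]*/, "", idx)   # drop everything up to "(" and its padding
        sub(/\).*$/, "", idx)            # drop ")" and everything after it
        print idx, $(NF-1)               # second-to-last field is the cm-1 value
    }' "$1"
}
```

It could then be called as `extract_freqs DATA.txt > OUT.txt`.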
 
I do all of these kinds of things in Python. It is much easier and more versatile than bash scripts or Perl.
 
phyzguy said:
I do all of these kinds of things in Python. It is much easier and more versatile than bash scripts or Perl.
1) I work remotely via an ssh connection on a machine and I'm not sure if Python is installed on it...
2) I don't know Python... sad but true. I will try to learn the basics though
 
I disagree. Perl is significantly easier than Python for a scripting task like this. I have re-coded several Python scripts done by Python advocates who are excellent programmers and ended up with much simpler programs. For example, Perl makes it much easier to capture the STDOUT output of a called application in a string and parse it.
That being said, I would currently recommend that a person learn Python rather than Perl because of several other advantages it has.

Another thing I like about Perl is how easy and natural it is to include error checking and warnings. Here is the code above with some error checking:

Perl (with some simple error checking and warnings):
open(IN, "<DATA.txt") or die "ERROR: Can't open DATA.txt to read";
open(OUT, ">new.txt") or die "ERROR: Can't open new.txt to write";
while( $line = <IN> ){
    $lineNum++;
    if( $line =~ /freq\s*\(\s*(\d+)\).*?(\S+) \[cm/ ){
        print OUT "$1 $2\n";
    }elsif( $line =~ /freq/ ){
        warn "Line $lineNum wasn't parsed correctly:\n$line\n";
    }
}
close IN;
close OUT;
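For comparison, the same two checks (fail if the input can't be read, warn about "freq" lines that don't have the expected shape) can be sketched with a shell function wrapping awk. This is only an illustration under the same layout assumptions as the Perl pattern; the name `extract_freqs_checked` is hypothetical, and writing to "/dev/stderr" from awk is assumed to work as it does on typical Linux and macOS systems.

```shell
# Hypothetical helper: extract_freqs_checked INPUT OUTPUT
extract_freqs_checked() {
    [ -r "$1" ] || { echo "ERROR: can't open $1 to read" >&2; return 1; }
    awk '/freq/ {
        idx = $0
        # keep only the digits between "freq (" and ")"
        ok = sub(/^.*freq[ \t]*\([ \t]*/, "", idx) && sub(/\).*$/, "", idx)
        if (ok && idx ~ /^[0-9]+$/ && $NF ~ /^\[cm/) {
            print idx, $(NF-1)
        } else {
            # a "freq" line that does not match the expected layout
            printf "Line %d was not parsed correctly:\n%s\n", NR, $0 > "/dev/stderr"
        }
    }' "$1" > "$2"
}
```

A call such as `extract_freqs_checked DATA.txt new.txt` would write the parsed pairs to new.txt and print any warnings to the terminal.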
 
dRic2 said:
1) I work remotely via an ssh connection on a machine and I'm not sure if Python is installed on it...
2) I don't know Python... sad but true. I will try to learn the basics though
Most Unix systems have Python installed by default. I just find Python more capable and easier to use than Perl or shell scripts, and once you've learned Python you can do many more things that you can't do with scripting languages. But these things are a matter of taste, so use whichever makes the job easier for you. I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
 
phyzguy said:
Most Unix systems have Python installed by default. I just find Python more capable and easier to use than Perl or shell scripts, and once you've learned Python you can do many more things that you can't do with scripting languages. But these things are a matter of taste, so use whichever makes the job easier for you. I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
IMHO, the great advantage now of Python over Perl (which I still think is a better OS scripting language) is the great interest and thriving community of users. There are so many scientific hobbies, like robotics, artificial intelligence, gaming, etc., where Python works within an entire environment for that hobby. In many (most?) of those hobbies, the Python code is a small fraction of the learning and most of the effort is in learning how to use the specific tools for the hobby (which is the fun part).
 
  • #10
A slight adjustment shows Awk to be very versatile too:
Bash:
awk -e '/freq/ { print $4 }' data.txt
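Which field number to print depends on how awk splits the line on whitespace. A quick way to check is to list the fields of the first matching line, as in this sketch (the helper name `show_fields` is hypothetical). Assuming the layout from the first post, the cm-1 value lands in field 8, which is also `$(NF-1)`, so the field number in a command like the one above may need adjusting to the data.

```shell
# Hypothetical helper: list the numbered fields of the first line containing "freq"
show_fields() {
    awk '/freq/ {
        for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i
        exit    # stop after the first matching line
    }' "$1"
}
```

Running `show_fields DATA.txt` prints one `$N=value` line per field.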
 
  • #11
jedishrfu said:
A slight adjustment shows Awk to be very versatile too:
Bash:
awk -e '/freq/ { print $4 }' data.txt
People who do a lot of work with combinations of sed and awk are likely to go crazy. There is a good reason that Perl was developed and completely transformed that kind of work.
 
  • #12
I seldom used sed. While working on PC DOS, I happened upon the Thompson toolkit, which provided unix commands for the PC world. In that collection was the Awk compiler, which could generate exe files. I was in a slump at the time doing C utilities for our group, and Awk transformed my productivity so much that I began writing a lot of tools using it exclusively.

After a while, though, there came a point where some new functionality didn't really fit the Awk programming pattern and I had to recode in another language or suffer through the added complexity of using counters and flags... to get what I wanted. This usually came up when more than one file of a different format was being processed by the Awk script.

I tried Perl once and ran into some strange error and decided that awk suited me better. Also I don’t recall Perl having an implementation on PC DOS at the time. One thing I did like was the enhanced regular expressions in Perl that awk didn’t implement where you could identify words or classes of characters.

More recently, I’ve recoded some of my awk scripts into python with good results and better scalability. But I still view Awk fondly and do one offs with it when I can.
 
  • #13
My largest single use of awk and sed was in the conversion of a lot of FORTRAN code to Ada. I used a several-step sequence of alternating sed and awk steps to perform a lot of the routine conversions before I went in for the final conversion by hand. A while later, another task was to automate code generation to access data in memory locations in the languages of FORTRAN, C, and Ada. Starting from any of the three languages, the other two had to be generated. The data structures allowed were very flexible. By that time, I had learned Perl and was able to completely automate the process. I don't believe that I could have attempted it using sed and awk without ending up in an asylum.
 
  • #14
Program transcription is a painful art. I've done it a few times.

Once, years ago I had a Timesharing Fortran program in a dialect different from the Honeywell Fortran-Y that we had installed on our mainframe. I wrote a conversion tool in the Text Executive programming language to facilitate the conversion when I started seeing that I was not being consistent when converting similar lines of code and realized an automated approach could reduce injected errors through misunderstanding what a line of code was doing.

I left that job when it was half finished but going well. My successor decided instead to take what I had converted and go manual from there. The end result was it never got implemented and the project failed.

More recently, I wrote a small awk script to do a MatLab to Julia conversion which worked well but I never got to the point of actually testing the converted code as there was little interest in our group to invest in Julia.

I also considered a Fortran to Julia conversion to break away from legacy Fortran code and make it more accessible as a Julia program. The one area that gave me trouble was Fortran's use of common blocks and Julia's lack of global variables. However there was some Julia structure that could approximate a global but I stopped at that point too for lack of management interest.
 
  • #15
phyzguy said:
I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
Yes, I agree. Here I'm working with software that is well interfaced with simple bash scripts, and the community online seems to only use bash scripts. That's why I'm sticking to it... I don't have preferences since I'm a newbie... I'll look into Python, as suggested, because it seems a valuable programming tool to know, but I'll keep using bash scripts for this one.
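For anyone who wants to stay with a plain shell loop in the shape of the while-loop from the first post, one possible sketch is below. It assumes every line matched by grep has the layout shown there; the function name `freq_loop` is hypothetical. The function body runs in a subshell so that `set -f` and the variables don't leak into the calling script.

```shell
# Hypothetical helper: freq_loop INPUT OUTPUT
freq_loop() (
    filein=$1
    fileout=$2
    set -f                           # no filename expansion when splitting "[THz]" etc.
    grep freq "$filein" | while read -r line; do
        idx=${line#*\(}              # text after the first "("
        idx=${idx%%\)*}              # text before the first ")"
        idx=${idx##* }               # drop padding spaces in front of the number
        set -- $line                 # split the line into words $1, $2, ...
        [ $# -ge 2 ] || continue     # skip lines too short to hold a value
        shift $(($# - 2))            # keep only the last two words
        printf '%s %s\n' "$idx" "$1" # $1 is now the value in front of "[cm-1]"
    done > "$fileout"
)
```

Calling `freq_loop DATA.txt OUT.txt` overwrites OUT.txt with one "index value" pair per line.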
 
  • #16
phyzguy said:
I think the most important thing is not how simple or elegant the program is. The important thing is how long it takes you to get the program coded up and working.
A simple program should normally be much easier to code and get working. KISS: Keep It Simple Stupid. And it is hard to beat a language that is made especially for the specific type of application. IMHO, for most scripting, Perl has advantages that Python can not match.
 