Creating a Subset of Data with Awk Command in Unix

  • Thread starter: RJLiberator
  • Tags: Unix

Discussion Overview

The discussion focuses on using the awk command in Unix to create a subset of data from an original file, specifically targeting records with longitudes between -112 and -102. The context includes homework assistance and practical coding examples.

Discussion Character

  • Homework-related
  • Technical explanation
  • Exploratory

Main Points Raised

  • One participant seeks to understand how to use awk to filter records based on longitude values, mentioning the need for a command that selects column 1 from the original file.
  • Another participant suggests creating a sample data file to test the awk command and emphasizes the importance of hands-on practice with the utility.
  • A participant expresses uncertainty about the data format in column 1, indicating they will verify if the values are integers.
  • There is a suggestion to search online for tutorials on awk to gain a better understanding of its functionality.
  • A participant proposes a command structure but is unsure how to correctly implement the range filtering using the && operator.
  • One participant successfully shares a working awk command that filters the desired records, demonstrating the correct syntax for the range condition.
  • Another participant reiterates the successful command and notes that the print command can be omitted since printing the whole record is the default action in awk.

Areas of Agreement / Disagreement

Participants generally agree on the approach to using awk for filtering data, with some expressing uncertainty about specific syntax and data formats along the way. After several attempts and suggestions, the thread ends with a working command, and the only refinement offered is that the explicit print action is optional.

Contextual Notes

Participants mention the need to confirm the data format in column 1, which may affect the implementation of the awk command. There are also references to varying levels of familiarity with awk and Unix commands among participants.

Who May Find This Useful

Individuals learning to use awk for data manipulation in Unix, particularly those working on homework or practical coding exercises related to data filtering.

RJLiberator

Homework Statement


Use awk to create a subset of the original data file, called subset.gmt, that contains only the records with longitudes between -112 and -102. Hint: you can use the operator && to tell awk to look for more than one pattern at once.

Homework Equations

The Attempt at a Solution



I need an awk command that reads column 1 (longitude) of an existing data file and keeps only the records whose values lie between -112 and -102.

So, I understand we start the command as such:

awk 'NR>1{print $1...}' originalfile.txt >! newoutput.ps

The column I want to filter on is column 1 of the file.
But how do I write the command so that it pulls only the records between -112 and -102?
Is it an if statement? My professor mentions the Boolean operator &&, but I've searched the internet with no luck...

Thank you.
 
Presumably, the values in column 1 will not be all neat integers?

I think your first move should be to invent your own sample data file containing a representative selection of records, including a couple where the relevant field is, say, -112. Then focus on devising the code for selecting those records where the desired field equals that value, -112.

Once you have this working, you can build on it to select for a range of values, and not necessarily integers. Looking at short sample awk scripts in your class notes or online is the best way to learn how this can be achieved.

You have an awk utility at home that you can keep testing your script on? Study manuals all you like, but nothing is as valuable to learning as the immediacy of being able to "guess & test".
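
As a rough illustration of that first step, here is a minimal sketch. The file name sample.txt and its two-column layout (longitude, then latitude) are made up for the example and are not taken from the actual homework data.

```shell
# Hypothetical sample data: longitude in column 1, latitude in column 2.
cat > sample.txt <<'EOF'
-115.20 35.10
-112 36.40
-108.75 38.00
-102 40.25
-99.50 41.00
EOF

# Step 1: select only the records whose first field equals -112.
# A pattern with no action prints the whole matching record.
awk '$1 == -112' sample.txt
# prints: -112 36.40
```

Once the equality test behaves as expected, the same pattern slot can hold a comparison such as >= or <= instead of ==.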
 
Yes, I have a virtual machine that lets me run Linux to study Unix and the awk commands here.

I can't confirm yet whether they are neat integers. They may be, and I will be able to confirm that tomorrow. :)

I will check through some sample class code, but after doing so earlier, I didn't have much luck in finding a way to isolate part of the column results in such a manner.
What could the && operator be used for?
 
Google will find plenty of resources, e.g., search on "unix awk tutorial".
 
A friend of mine suggested something along the lines of:

awk 'NR>1{print $1 -112&&-102}' originalfile.txt >! newoutput.ps

but I'm not sure how to make it 'search' for the values in between -112 and -102.

Perhaps:

awk -F'[:}]' '$(NF-1) >= -112 && $(NF-1) <= -102' file > output.txt
 
Boom.
Got it.
Sample code:
Code:
awk '$1>=-112&&$1<=-102{print}' EX1.csh > subset11.gmt

This command tests column 1 of each record and prints every record whose value lies between -112 and -102, inclusive.
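
To see how the range condition treats non-integer values and the two endpoints, here is a small sketch on invented data. The file sample.txt and its contents are assumptions for illustration only; the real input in this thread is EX1.csh.

```shell
# Hypothetical sample data: longitude in column 1, latitude in column 2.
cat > sample.txt <<'EOF'
-115.20 35.10
-112.00 36.40
-108.75 38.00
-102 40.25
-101.99 41.00
EOF

# Keep records whose longitude is >= -112 AND <= -102.
# && requires both comparisons to be true for the same record.
awk '$1 >= -112 && $1 <= -102 { print }' sample.txt > subset.gmt

# Show the result: the two endpoint records are kept,
# while -115.20 and -101.99 fall outside the range.
cat subset.gmt
```

Because the comparisons use >= and <=, records sitting exactly on -112 or -102 are included. Switching to > and < would exclude them.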
 
RJLiberator said:
Sample code:
Code:
awk '$1>=-112&&$1<=-102{print}' EX1.csh > subset11.gmt

This command tests column 1 of each record and prints every record whose value lies between -112 and -102, inclusive.

Looks good. You can even omit the {print} action, because printing the whole record is awk's default action when a pattern has no action attached.
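
A quick way to check that claim is to run both forms on the same input and compare the results. The sketch below uses an invented file, sample.txt, and the output file names are also made up for the example.

```shell
# Hypothetical sample data: longitude in column 1, latitude in column 2.
cat > sample.txt <<'EOF'
-115.20 35.10
-112.00 36.40
-108.75 38.00
-102 40.25
-101.99 41.00
EOF

# Same pattern, once with an explicit action and once without.
awk '$1 >= -112 && $1 <= -102 { print }' sample.txt > with_print.txt
awk '$1 >= -112 && $1 <= -102' sample.txt > without_print.txt

# cmp exits with status 0 when the two files are byte-for-byte identical.
cmp with_print.txt without_print.txt && echo "identical output"
```

Keeping { print } is harmless and some people find it more readable, so which form to use is mostly a matter of taste.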
 
