What is the purpose of the awk command in relation to file.txt?

  • Thread starter: frankliuao
SUMMARY

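The awk command `awk '{$1=$1}1' file.txt` prints each line of file.txt with its leading whitespace removed; it writes to standard output and does not modify the file itself. Assigning to any field, even with the no-op `$1=$1`, causes awk to rebuild `$0` by joining the fields with the output field separator (a single space by default). Because default field splitting ignores leading and trailing whitespace, the rebuilt line has none, and runs of whitespace between fields collapse to a single space as a side effect. The trailing `1` is a pattern that is always true and has no action, so the default action, printing the line, applies. Alternative methods using Perl and sed, such as `perl -pi.bak -e 's/^\s+//' file.txt` and `sed -i 's/^ *//' file.txt`, are also discussed; these edit the file in place and leave interior spacing alone.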

PREREQUISITES
  • Understanding of awk syntax and functionality
  • Familiarity with regular expressions
  • Basic knowledge of command-line interfaces
  • Experience with file manipulation in Unix/Linux environments
NEXT STEPS
  • Research the differences between awk and Perl for text processing tasks
  • Learn about awk field separators and how to customize them
  • Explore advanced awk idioms and their practical applications
  • Investigate the performance implications of using sed versus awk for file manipulation
USEFUL FOR

This discussion is beneficial for system administrators, developers, and data analysts who require efficient methods for text processing and file manipulation in Unix/Linux environments.

frankliuao
awk '{$1=$1}1' file.txt

This command deletes the leading spaces from each line of file.txt.

But why? Can anyone tell me what $1=$1 does and what the single 1 does?

Thanks,
 


That's just an awk idiom that forces it to recompute the value of $1. It's deleting leading spaces because by default, awk ignores them when reading values into vars. Personally I'd never use that syntax; it's ugly (it's intentionally obfuscated -- there is a more explicit way to write it) and won't always work depending on filename. Perl is better for that sort of job.

perl -pi.bak -e 's/^\s+//' file.txt

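This will remove leading spaces and tabs from each line of file.txt, keeping a backup of the original as file.txt.bak just in case. (Since \s also matches the newline, lines that contain only whitespace are removed entirely.)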
 


Here is another Perl one-liner (without -i it prints to standard output instead of editing the file):

perl -p -e 's/^ *//' file.txt

and a sed one (this pattern strips leading spaces only, not tabs):

sed -i 's/^ *//' file.txt
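
To compare what these one-liners actually strip, here is a minimal sketch that runs on standard input rather than on file.txt, so nothing is edited in place. The sample line and the variable names are made up for illustration, and the `[[:space:]]` variant is a suggested alternative, not something from the posts above.

```shell
# Hypothetical sample line: two spaces, a tab, then text with wide interior spacing.
line=$(printf '  \tfoo   bar')

# Pattern from the post above: strips leading spaces only, so the tab survives.
spaces_only=$(printf '%s\n' "$line" | sed 's/^ *//')

# POSIX character class: strips leading spaces and tabs, keeps interior spacing.
all_blank=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//')

# The awk idiom for comparison: it also collapses the interior run of spaces.
awk_out=$(printf '%s\n' "$line" | awk '{$1=$1}1')

# Brackets make any remaining leading or trailing whitespace visible.
printf '[%s]\n' "$spaces_only" "$all_blank" "$awk_out"
```

The sed variants leave the spacing between foo and bar alone, while the awk idiom rewrites the whole line, which matters if the interior layout is significant.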
 
frankliuao: I have not tried your awk command yet (so I could be wrong), but the $1=$1 appears to redefine the beginning of the current line ($0) with the first string ($1); therefore, any leading spaces in $0 are removed. The "1" following the right-hand brace might be a typographic mistake, and therefore might be ignored (?). Does it make any difference in your output, if you remove the appended 1?

Actually, I am rather surprised if your awk command prints anything, because the default action is to print $0 if no action is supplied (the portion inside the braces). But your awk command contains an action, and it is not print $0. Therefore, it seems you would need to explicitly tell it to print $0, if you want it to print the lines. Otherwise, it would print nothing.
 


You piqued my curiosity, so I checked.

With the "1" it works.
Without it nothing is printed.

So it appears the "1" specifies that $0 should be printed.
I'm not sure where to find that in the manuals though.
 
Interesting, I like Serena. Thanks for checking that. Did it really remove leading spaces? If so, does awk '' file.txt remove leading spaces, or not?
 


awk '' file.txt does not print anything.
But awk '{$2=$2}1' file.txt has the same effect as awk '{$1=$1}1' file.txt.
In particular they also replace all non-leading sequences of white space by a single white space.

Edit: and both awk '1' file.txt and awk '{}1' file.txt have the same effect.
They print the original lines.
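
These observations can be reproduced without a file. The following is a rough sketch, assuming a POSIX shell and awk; the sample text and variable names are invented for the illustration. It also shows one case where $2=$2 and $1=$1 differ: a line with a single field.

```shell
# Made-up sample: leading spaces plus uneven spacing between fields.
sample=$(printf '   alice   betty\n  carol  dave   erin')

first=$(printf '%s\n' "$sample" | awk '{$1=$1}1')    # assign to field 1, then print
second=$(printf '%s\n' "$sample" | awk '{$2=$2}1')   # assign to field 2, then print
plain=$(printf '%s\n' "$sample" | awk '1')           # no assignment: lines unchanged
silent=$(printf '%s\n' "$sample" | awk '{$1=$1}')    # no "1": nothing is printed

# Caveat: on a one-field line, $2=$2 creates an empty second field,
# so the rebuilt line ends with a trailing separator.
one_field=$(printf 'solo\n' | awk '{$2=$2}1')

printf '%s\n' "$first"
printf '[%s]\n' "$one_field"
```

For lines with at least two fields the two assignments give identical output, which matches what was reported above.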
 


This bugged me, so I started reading the manual.
My conclusion is that

awk '{$1=$1}1' file.txt

is shorthand for:

awk '
{$1=$1}
1
' file.txt

That is, the first line automatically matches, since no pattern has been specified.
In the second line the "1" is an expression pattern that is evaluated as true, meaning it always matches.
Since the action is left out on this line, the action defaults to {print}.
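
If that reading is right, the terse form should behave the same as versions that spell out the print. A small sketch to check this, assuming a POSIX shell and awk, with invented sample input and variable names:

```shell
# Invented input with leading spaces on both lines.
input=$(printf '    one  two\n  three')

# The idiom as posted: an action-only rule followed by a pattern-only rule.
terse=$(printf '%s\n' "$input" | awk '{$1=$1}1')

# Same two rules, with the default action of the second rule written out.
two_rules=$(printf '%s\n' "$input" | awk '{ $1 = $1 } 1 { print }')

# A single rule that rebuilds the record and prints it explicitly.
one_rule=$(printf '%s\n' "$input" | awk '{ $1 = $1; print $0 }')

printf '%s\n' "$terse"
```

All three variables end up holding the same two trimmed lines, so the single-rule form may be the easier one to read in a script.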
 


That leaves the strange effect of {$1=$1}.
For that I found in the manual:

From the POSIX standard:
The awk utility shall denote the first field in a record $1, the second $2, and so on.
The symbol $0 shall refer to the entire record;
setting any other field causes the re-evaluation of $0.
Assigning to $0 shall reset the values of all other fields and the NF built-in variable.


It's not very specific, but apparently assigning to any of the fields $1, $2, etcetera, has the effect of joining all fields together and assigning that to $0.
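
One way to see the joining step is to change the output field separator, OFS, which awk uses when it rebuilds $0. This is only a sketch under the assumption of a POSIX awk; the sample line and variable names are made up.

```shell
# Made-up line with leading spaces and uneven gaps.
line='  alice   betty  charlie'

# With OFS set to "-", the field assignment rebuilds $0 using that separator.
joined=$(printf '%s\n' "$line" | awk -v OFS='-' '{$1=$1}1')

# Without any field assignment, $0 is never rebuilt, so OFS has no effect.
untouched=$(printf '%s\n' "$line" | awk -v OFS='-' '1')

printf '[%s]\n' "$joined" "$untouched"
```

The first result is alice-betty-charlie, while the second is the original line with all of its spacing intact.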
 
  • #10


It is not "assigning" it to $0, it's reevaluating $0. I explained a few posts back, it's just some ugly awk idiom. Some people use it out of habit because they don't know what it does. Those who understand what it's doing only use it as a sort of shortcut to demonstrate their command of awk.

It's really just a bug in awk. $1=$1 makes the interpreter think you've changed $1, though you obviously haven't, so it reevaluates it. In so doing, leading spaces are trimmed because that's what awk does -- reads whitespace separated fields into $1..$n, trimming the whitespace as it does.
 
  • #11


justsomeguy said:
It is not "assigning" it to $0, it's reevaluating $0. I explained a few posts back, it's just some ugly awk idiom. Some people use it out of habit because they don't know what it does. Those who understand what it's doing only use it as a sort of shortcut to demonstrate their command of awk.

It's really just a bug in awk. $1=$1 makes the interpreter think you've changed $1, though you obviously haven't, so it reevaluates it. In so doing, leading spaces are trimmed because that's what awk does -- reads whitespace separated fields into $1..$n, trimming the whitespace as it does.

What's the difference between "assigning" and "reevaluating"?
And how do you explain that not only $1 is changed, but all other white space in the line is compressed as well?
 
  • #12


I like Serena said:
What's the difference between "assigning" and "reevaluating"?

An evaluation can change each time it is executed, that's what it means. Assignments, barring external influence, do not change their value (their evaluation).

Perhaps it's too subtle of a difference to get into.

I like Serena said:
And how do you explain that not only $1 is changed, but all other white space in the line is compressed as well?

That is what awk *does*. That is its purpose. All input to awk is processed into fields; by default, fields are separated by whitespace. This has nothing to do with the idiomatic assignment of one of the fields to itself.
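
A quick sketch of that splitting behavior, assuming a POSIX shell and awk (the comma-separated sample and the variable names are invented here). It also shows that with a non-default separator the leading space stays inside the first field, so the self-assignment no longer trims it.

```shell
# Default splitting: leading whitespace is ignored, fields split on whitespace runs.
line=' alice betty charlie dave'
default_split=$(printf '%s\n' "$line" | awk '{ print NF ":" $1 ":" $2 }')

# Comma as separator (-F,): the leading space is now part of field 1.
comma_split=$(printf ' alice,betty\n' | awk -F, '{ print NF ":[" $1 "]" }')

# Rebuilding the record joins the fields with OFS (still a space),
# but the space inside field 1 is kept.
comma_idiom=$(printf ' alice,betty\n' | awk -F, '{$1=$1}1')

printf '%s\n' "$default_split" "$comma_split"
printf '[%s]\n' "$comma_idiom"
```

So the trimming seen earlier depends on the default field separator being in effect when the line is read.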

" alice betty charlie dave" as input to awk always results in $1='alice', $2='betty', etc. unless you change the field separator (FS variable).
 
