What is the purpose of the awk command in relation to file.txt?

  • Thread starter: frankliuao
AI Thread Summary
The awk command `awk '{$1=$1}1' file.txt` prints each line of file.txt with its leading whitespace removed. The assignment `$1=$1` leaves the value of $1 unchanged, but assigning to any field makes awk rebuild the record $0 from its fields, joined by the output field separator (a single space by default). Under the default field splitting, leading and trailing whitespace belong to no field, so they disappear, and every run of whitespace between fields collapses to a single space. The trailing `1` is a pattern that is always true; since it has no action attached, awk applies the default action and prints the rebuilt record. Posters in the thread consider the idiom obscure and suggest Perl or sed when only the leading whitespace should be removed.
frankliuao
awk '{$1=$1}1' file.txt

This command deletes the leading spaces from each line of file.txt.

But why? Can anyone tell me what $1=$1 does, and what the single 1 does?

Thanks,
 


That's just an awk idiom that forces it to recompute the value of $1. It's deleting leading spaces because by default, awk ignores them when reading values into vars. Personally I'd never use that syntax; it's ugly (it's intentionally obfuscated -- there is a more explicit way to write it) and won't always work depending on filename. Perl is better for that sort of job.

perl -pi.bak -e 's/^\s+//' file.txt

This will remove leading spaces and tabs from each line of file.txt, saving a backup of the original as file.txt.bak just in case.
 


Here is another Perl one-liner

perl -p -e 's/^ *//' file.txt

and a sed one

sed -i 's/^ *//' file.txt
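
To compare the alternatives, here is a minimal sketch, assuming a POSIX shell with sed and awk on the PATH. The sample line is made up for illustration. It uses the POSIX class [[:space:]] so that tabs are trimmed as well, and it leaves out -i so that nothing is edited in place.

```sh
# Made-up sample: leading space, tab, space, then two words separated by three spaces.
sample="$(printf ' \t alice   betty')"

# sed removes only the leading whitespace; the inner run of spaces survives.
trimmed_sed="$(printf '%s\n' "$sample" | sed 's/^[[:space:]]*//')"

# The awk idiom also trims, but it additionally squeezes the inner spaces.
trimmed_awk="$(printf '%s\n' "$sample" | awk '{$1=$1}1')"

printf '[%s]\n' "$trimmed_sed"   # [alice   betty]
printf '[%s]\n' "$trimmed_awk"   # [alice betty]
```

So the sed and Perl versions are the safer choice when the spacing inside the line matters.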
 
frankliuao: I have not tried your awk command yet (so I could be wrong), but the $1=$1 appears to redefine the beginning of the current line ($0) with the first string ($1); therefore, any leading spaces in $0 are removed. The "1" following the right-hand brace might be a typographic mistake, and therefore might be ignored (?). Does it make any difference in your output, if you remove the appended 1?

Actually, I am rather surprised if your awk command prints anything, because the default action is to print $0 if no action is supplied (the portion inside the braces). But your awk command contains an action, and it is not print $0. Therefore, it seems you would need to explicitly tell it to print $0, if you want it to print the lines. Otherwise, it would print nothing.
 


You piqued my curiosity so I checked.

With the "1" it works.
Without it nothing is printed.

So it appears the "1" specifies that $0 should be printed.
I'm not sure where to find that in the manuals though.
 
Interesting, I like Serena. Thanks for checking that. Did it really remove leading spaces? If so, does awk '' file.txt remove leading spaces, or not?
 


awk '' file.txt does not print anything.
But awk '{$2=$2}1' file.txt has the same effect as awk '{$1=$1}1' file.txt.
In particular they also replace all non-leading sequences of white space by a single white space.

Edit: and both awk '1' file.txt and awk '{}1' file.txt have the same effect.
They print the original lines.
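
These observations can be reproduced without a file by piping in a single line. This is only a sketch, assuming a POSIX shell and awk; the sample line and the variable names are made up for illustration.

```sh
# Made-up sample with leading spaces and uneven spacing between three fields.
line='  a   b  c'

with_one="$(printf '%s\n' "$line" | awk '{$1=$1}1')"      # assignment plus the trailing 1
without_one="$(printf '%s\n' "$line" | awk '{$1=$1}')"    # assignment only, nothing prints
second_field="$(printf '%s\n' "$line" | awk '{$2=$2}1')"  # assigning $2 instead of $1
only_one="$(printf '%s\n' "$line" | awk '1')"             # no assignment at all

printf '[%s]\n' "$with_one"      # [a b c]
printf '[%s]\n' "$without_one"   # []
printf '[%s]\n' "$second_field"  # [a b c]
printf '[%s]\n' "$only_one"      # [  a   b  c]
```

Note that {$2=$2} matches {$1=$1} here only because the line already has at least two fields; on a one-field line, assigning $2 would create an empty second field.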
 


This bugged me, so I started reading the manual.
My conclusion is that

awk '{$1=$1}1' file.txt

is shorthand for:

awk '
{$1=$1}
1
' file.txt

That is, the first line automatically matches, since no pattern has been specified.
In the second line the "1" is an expression pattern that is evaluated as true, meaning it always matches.
Since the action is left out on this line, the action defaults to {print}.
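
If that reading is right, spelling out the print should give the same output as the terse form. A minimal sketch, assuming a POSIX shell and awk, with a made-up sample line:

```sh
# Made-up sample line.
line='  a   b  c'

# The idiom: an action with no pattern, then a pattern with no action.
idiom="$(printf '%s\n' "$line" | awk '{$1=$1}1')"

# Same thing with the default action written out after the always-true pattern.
spelled_out="$(printf '%s\n' "$line" | awk '{ $1 = $1 } 1 { print $0 }')"

# Or a single rule that does both steps explicitly.
single_rule="$(printf '%s\n' "$line" | awk '{ $1 = $1; print }')"

printf '[%s]\n' "$idiom" "$spelled_out" "$single_rule"   # [a b c] three times
```

The single-rule version is probably the easiest to read, at the cost of a few extra characters.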
 


That leaves the strange effect of {$1=$1}.
For that I found in the manual:

From the POSIX standard:
The awk utility shall denote the first field in a record $1, the second $2, and so on.
The symbol $0 shall refer to the entire record;
setting any other field causes the re-evaluation of $0.
Assigning to $0 shall reset the values of all other fields and the NF built-in variable.


It's not very specific, but apparently assigning to any of the fields $1, $2, etcetera, has the effect of joining all fields together and assigning that to $0.
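
One way to see the joining at work is to change the output field separator OFS, which awk places between fields when it rebuilds $0. A small sketch, assuming a POSIX shell and awk; the sample line and the choice of '-' are arbitrary.

```sh
# Made-up sample line.
line='  a   b  c'

# After the assignment, $0 is rebuilt as $1 OFS $2 OFS $3.
rejoined="$(printf '%s\n' "$line" | awk -v OFS='-' '{$1=$1}1')"

# Without any field assignment, OFS is never applied to $0.
untouched="$(printf '%s\n' "$line" | awk -v OFS='-' '1')"

printf '[%s]\n' "$rejoined"    # [a-b-c]
printf '[%s]\n' "$untouched"   # [  a   b  c]
```

With the default OFS of a single space, the same rebuild is what turns the line into "a b c".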
 
  • #10


It is not "assigning" it to $0, it's reevaluating $0. I explained a few posts back, it's just some ugly awk idiom. Some people use it out of habit because they don't know what it does. Those who understand what it's doing only use it as a sort of shortcut to demonstrate their command of awk.

It's really just a bug in awk. $1=$1 makes the interpreter think you've changed $1, though you obviously haven't, so it reevaluates it. In so doing, leading spaces are trimmed because that's what awk does -- reads whitespace separated fields into $1..$n, trimming the whitespace as it does.
 
  • #11


justsomeguy said:
It is not "assigning" it to $0, it's reevaluating $0. I explained a few posts back, it's just some ugly awk idiom. Some people use it out of habit because they don't know what it does. Those who understand what it's doing only use it as a sort of shortcut to demonstrate their command of awk.

It's really just a bug in awk. $1=$1 makes the interpreter think you've changed $1, though you obviously haven't, so it reevaluates it. In so doing, leading spaces are trimmed because that's what awk does -- reads whitespace separated fields into $1..$n, trimming the whitespace as it does.

What's the difference between "assigning" and "reevaluating"?
And how do you explain that not only $1 is changed, but all other white space in the line is compressed as well?
 
  • #12


I like Serena said:
What's the difference between "assigning" and "reevaluating"?

An evaluation can give a different result each time it is executed; that's what it means. Assignments, barring external influence, do not change their value (their evaluation).

Perhaps it's too subtle of a difference to get into.

I like Serena said:
And how do you explain that not only $1 is changed, but all other white space in the line is compressed as well?

That is what awk *does*. That is its purpose. All input to awk is split into fields; by default, fields are separated by whitespace. This has nothing to do with the idiomatic assignment of one of the fields to itself.
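
The splitting can be checked directly. The sketch below assumes a POSIX shell and awk and uses the sample line from this post. It also prints $0 in brackets, which shows that splitting by itself leaves the record text as it was read; the squeezed output only appears once a field is assigned.

```sh
# Sample line from the post, with its leading space.
line=' alice betty charlie dave'

# Default splitting: the leading space belongs to no field.
fields="$(printf '%s\n' "$line" | awk '{ print $1 "," $2 "," NF }')"

# $0 itself still holds the original text, leading space included.
record="$(printf '%s\n' "$line" | awk '{ print "[" $0 "]" }')"

# Changing the field separator with -F changes how fields are cut.
comma_fs="$(printf 'alice,betty\n' | awk -F, '{ print $2 }')"

printf '%s\n' "$fields"     # alice,betty,4
printf '%s\n' "$record"     # [ alice betty charlie dave]
printf '%s\n' "$comma_fs"   # betty
```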

" alice betty charlie dave" as input to awk always results in $1='alice', $2='betty', etc. unless you change the field separator (FS variable).
 